Lecture Notes 2008d
FINANCIAL ENGINEERING
A brief introduction using the Matlab system
Fall 2008
Contents
6 Volatility
6.1 Some general features
    Historical volatility
    Implied volatility
    The implied volatility surface
    Two modeling approaches
6.2 Autoregressive conditional heteroscedasticity
    The Arch model
    The Garch model
    The Garch likelihood
    Estimation examples
    Other extensions
    Garch option pricing
    Utility based option pricing
    Distribution based
    The Heston and Nandi model
6.3 The stochastic volatility framework
    The Hull and White model
    The Stein and Stein model
    The Heston model
    Girsanov's theorem and option pricing
    Example: The Heston model
    The PDE approach
    The Feynman-Kac link
    Example: The Heston model
    Estimation and filtering
    Calibration
    Calibration example
6.4 The local volatility model
    Interpolation methods
    Implied densities
    Local volatilities
References
Index
A.1 Screenshots of the Matlab Excel Builder
A.2 The folders created by …
A.3 Screenshot of the … add-in
5.1 …: Simulation and density function
7.1 …: Yields based on the volatility surface
7.2 …: Calibration of the Nelson-Siegel formula to a Nelson-Siegel-Svensson parametrization
7.5 …: The price path of a payoff based on the Hull-White tree for the short rate
7.6 …: Implementation of the Black-Karasinski Hull-White tree when American features are present
7.7 …: Correlation structure and principal component model using a Hull-White interest rate tree
7.8 …: A Kalman filter wrapper for the multi-factor analysis of yield curve movements
Apparently, the random variable X counts the number of heads thrown, while
the random variable Y counts the absolute difference between the heads and
tails throws.
Of course this implies that the probabilistic behavior of the random variable
will depend solely on the probabilistic behavior of the states of the world. In
particular, we can write the probability (although we have not yet formally
defined what a probability is)

Pr[X(ω) = x] = Pr[{ω : X(ω) = x}]
In the example above the sample space was small and discrete, and for that
reason the analysis was pretty much straightforward. Unfortunately, this is not
usually the case, and a typical sample space is not discrete. If the sample space
is continuous, expressions like Pr[ω] for elements ω ∈ Ω will be mostly zero,
and therefore not of much interest.
Therefore, rather than assigning probabilities to elements of Ω we need to
assign them to subsets of Ω. A natural question that follows is the following:
can we really assign probabilities to any subset of Ω, no matter how weird and
complicated it is? The answer to this question is generally no. We can construct
sets, like the Vitali set, for which we cannot define probabilities.¹ Subsets of
Ω to which probabilities can be consistently assigned must form special families
of sets.

¹ Note that this does not mean that the probability is zero: it means that even if we
assume that the probability is zero we are driven to paradoxes. In fact, the probability
of such a set cannot exist, and we cannot allow such a set to be considered for that
purpose.
It turns out that σ-algebras are just the families of sets we need to define
probabilities upon, as they are nice enough not to lead us to complications and
paradoxes. Probabilities will be well defined on elements of F . The elements of
a σ-algebra are therefore called events.
As we will see, probabilities are just special cases of a large and very im-
portant class of set functions called measures. It is measures in general that are
defined on σ-algebras. The pair (Ω, F ) is called a measurable space, to indicate
the fact that it is prepared to “be measured”.
Example 2. For a sample space Ω there will exist many σ-algebras, and some
will be larger than others. For the specific sample space of the previous example,
where a coin is tossed three times, some σ-algebras that we may define are the
following
1. The minimal σ-algebra is F₀ = {∅, Ω}. It is apparent that this is the
smallest possible collection that satisfies the conditions.
2. F₁ = {∅, {HHH, HHT, HTH, HTT}, {THH, THT, TTH, TTT}, Ω} is
another σ-algebra on Ω. Apparently F₀ ⊆ F₁.
3. The powerset of Ω is also a σ-algebra, F∞ = P(Ω). In fact this is the
largest (or maximal) σ-algebra that we can define on a discrete set. If Ω
were continuous, say Ω = ℝ, the powerset would not be a σ-algebra, as it
includes elements like the Vitali set. The largest useful σ-algebra that we
use in this case is the Borel σ-algebra.
An example of a collection that is not a σ-algebra is {∅, {HHH}, Ω}, since it
does not contain the complement of {HHH}.
GENERATED σ -ALGEBRAS
So far we have defined σ-algebras and we have shown ways to describe them
by expressing some property of their elements. We can also define a σ-algebra
based on a collection of reference subsets of Ω. In particular, a random variable
X generates on Ω the σ-algebra of pre-images of Borel sets,

F(X) = F_X = {X⁻¹(B) : B ∈ B}

Here B runs over the Borel σ-algebra B = B(ℝ) rather than over all subsets
of ℝ, as there are sets that we need to exclude (like the Vitali set). The Borel
σ-algebra is here the natural choice.

Example 3. Following our coin example, the random variable X (the number of
heads) will generate the σ-algebra that consists of ∅ and all unions of the
elementary events {X = 0}, {X = 1}, {X = 2} and {X = 3}.
Definitions of probability date back as far as Carneades of Cyrene (214-
129 BC), a prominent student at Plato's Academy. More recently, Abraham De
Moivre (1711) and Pierre-Simon Laplace (1812) also attempted to formalize
the everyday notion of "probability". Modern probability took off in the 1930s,
largely inspired by the axiomatic foundation of probability on measure theory
by Andrei Nikolayevich Kolmogorov.²
MEASURABLE FUNCTIONS
Based on the notion of measurable sets, we can turn to functions that map from
one measurable space (Ω, F) to another measurable space (Ψ, G), and that have
the property that pre-images of measurable sets in the destination space Ψ are
measurable sets in the departure space Ω. That is, a function

f : Ω −→ Ψ

is a (F, G)-measurable function if f⁻¹(G) ∈ F for all G ∈ G.
Example 5. In our coin example we defined two random variables, X and Y , from
the sample space of three coin tosses
X, Y : Ω −→ R
Each one of these random variables will induce a measurable space on Ω,
through the σ-algebra it generates.
PROBABILITY MEASURES
As we indicated in the last subsections, measures are the mathematical equiva-
lent of our everyday notion of “measure”. In the context of stochastic processes
we are not interested in general measures, but in a small subset: the probability
measures.
CONDITIONAL PROBABILITY
The conditional probability is one of the main building blocks of probability the-
ory, and deals with situations where some partial knowledge about the outcome
of the experiment “shrinks” the sample space.
Given a probability space (Ω, F, P), consider two events A, F ∈ F. If we assume
that a randomly selected sample point ω lies in A, we want to investigate the
probability that ω ∈ F. Since we know that ω ∈ A, the sample space has shrunk
to A ⊆ Ω, and the appropriate σ-algebra is constructed as F_A = {F ⊆ A : F =
G ∩ A, G ∈ F}. The members of F_A are conditional events, that is to say
event F is the event G conditional on event A. We denote the conditional events
as F = G|A.
It is not hard to verify that FA is indeed a σ-algebra on A.
1. The empty set ∅ = (∅ ∩ A) ∈ F_A trivially.
2. Also, for an element (G|A) ∈ F_A the complement (in the set A) is (G|A)ᶜ =
Gᶜ ∩ A ∈ F_A, since Gᶜ ∈ F.
3. Finally, the countable union ⋃_{i∈I}(G_i|A) = ⋃_{i∈I}(G_i ∩ A) = (⋃_{i∈I} G_i) ∩ A ∈
F_A.
Therefore (A, FA ) is a measurable space.
Definition 10. Consider a probability space (Ω, F , P) and an event A ∈ F with
P(A) > 0. The conditional probability is defined, for all F ∈ F, as

P_A(F) = P(F|A) = P(F ∩ A) / P(A)
We can verify easily that P_A is a probability measure on (Ω, F), which makes
(Ω, F, P_A) a probability space.³ For all events F ∈ F where P(F ∩ A) = 0, the
conditional probability P(F|A) = 0. This means that these two events cannot
happen at the same time.

³ This is indeed an example of different probability measures defined on the same
measurable space.
We argued above that by conditioning on the event A we shrink the measur-
able space (Ω, F ) to the smaller measurable space (A, FA ). In fact, equipped
with the measure PA , the latter becomes a probability space. It is easy to verify
that PA is a probability measure on (A, FA ), since P(A|A) = 1. Thus, we can
claim that by conditioning on A the probability space (Ω, F , P) shrinks to the
probability space (A, FA , PA ).
We can also successively condition on a family of events A₁, A₂, …, A_n. In
fact, we can derive the following useful identity

P(A₁ ∩ A₂ ∩ ⋯ ∩ A_n) = P(A₁) · P(A₂|A₁) ⋯ P(A_n|A₁, …, A_{n−1})
Another consequence of the definition is the celebrated Bayes' theorem, which
states that if {F_i ∈ F : i ∈ I} is a collection of mutually exclusive events with
⋃_{i∈I} F_i = Ω, and A ∈ F is another event, then

P(F_ℓ|A) = P(F_ℓ)P(A|F_ℓ) / Σ_{i∈I} P(F_i)P(A|F_i)
Bayes' theorem is extensively used to update expectations and forecasts as new
evidence is gathered. This is an example of the filtering problem.
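As a small numerical illustration with made-up numbers, suppose the market can
be in one of three regimes F_i, and let A be the event of observing a large
price drop; Bayes' theorem gives the updated regime probabilities:

% Bayes' theorem with illustrative, made-up numbers: three regimes F_i
prior = [0.50 0.30 0.20];                    % prior probabilities P(F_i)
like  = [0.01 0.05 0.20];                    % likelihoods P(A|F_i)
post  = prior.*like / sum(prior.*like);      % posterior P(F_i|A)
disp(post)                                   % updated regime probabilities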
EXPECTATIONS
Given a probability space (Ω, F, P), consider an F-measurable random variable X,
and assume that the random variable is integrable,

∫_Ω |X(ω)| dP(ω) < ∞
A very important quantity is the expectation of X.
Definition 11. The expectation of X with respect to the probability measure P
is given by the integral
EX = ∫_Ω X(ω) dP(ω) = ∫_{ℝⁿ} x dP_X(x)
The conditional expectation given a sub-σ-algebra G ⊂ F is a random
variable E[X|G ] that has the properties
1. E[X|G ] is G -measurable
2. For all G ∈ G: ∫_G E[X|G] dP = ∫_G X dP
The conditional expectation is a random variable, since for different ω ∈ Ω the
quantity E[X|G ] will be different.
One can use the Radon-Nikodym derivative to compute expectations under
different equivalent probability measures. In particular, expectations under Q are
written as
E^Q X = ∫_ℝ x dQ(x) = ∫_ℝ x (dQ/dP)(x) dP(x) = ∫_ℝ x M(x) dP(x) = E^P[M(X)X]
Example 6. Let us revisit our coin experiment, where we flip 3 times. The state
space will collect all possible outcomes
Ω = {HHH, HHT, HTH, …, TTT}

We define the collection of random variables
X(t, ω) = number of H in the first t throws
These random variables define a stochastic process on T = {0, 1, 2, 3}. In this
simple case we can tabulate them and keep track of their behavior for all times
and sample points.
We can fix time, say t = 2, and concentrate on the random variable X₂(ω), which
is given by the corresponding horizontal slice of the table above. Alternatively,
we can fix a sample point, say ω = THT, and keep track of the sample path
t ↦ X(t, THT):

t           0  1  2  3
X(t, THT)   0  0  1  1
Example 8. In our coin example the σ-algebras generated by the random vari-
ables Xt are the following
F(X₀) = {∅, Ω}
F(X₁) = {∅, {HHH, HHT, HTT, HTH}, {THH, THT, TTH, TTT}, Ω}
F(X₂) = {∅, {HHH, HHT}, {TTH, TTT}, {HTH, HTT, THH, THT},
         all complements, all unions, all intersections}
The corresponding filtrations will include all unions, intersections and comple-
ments of the individual algebras, namely
F0 = F (X0 )
F1 = F (X0 ) ⊗ F (X1 )
F2 = F (X0 ) ⊗ F (X1 ) ⊗ F (X2 )
For example the set {HTH, HTT} belongs to neither F(X₁) nor F(X₂),
but it belongs to F₂, since

{HTH, HTT} = {HHH, HHT, HTH, HTT} ∩ {HTH, HTT, THH, THT}
Intuitively the element represents the event “first toss is a head and second toss
is a tail”. Since Xt measures the number of heads, this event cannot be decided
upon by just observing X1 or by just observing X2 , but it can be deduced by
observing both. In particular, it is equivalent with the event “one head up to
time t = 1” (the event in F (X1 )) and (intersection) “one head up to time t = 2”
(the event in F (X2 )).
DISTRIBUTIONS OF A PROCESS
Based on the probability space (Ω, F, P) we can define the finite-dimensional
distributions of the process X_t. For any collection of times {t_i}_{i=1}^m and
Borel events {F_i}_{i=1}^m in B(ℝⁿ), the distribution

µ_{t₁,…,t_m}(F₁, …, F_m) = P[X_{t₁} ∈ F₁, …, X_{t_m} ∈ F_m]
characterizes the process and determines many important (but not all) properties.
The inverse question is of importance too: Given a set of distributions, is there
a stochastic process that exhibits them?
Kolmogorov’s extension theorem gives an answer to that question. Suppose
that for all ℓ = 1, 2, …, and for all finite sets of times {t_i}_{i=1}^ℓ in T, we can
provide a probability measure µ_{t₁,t₂,…,t_ℓ} on (ℝ^{nℓ}, B(ℝ^{nℓ})) that satisfies the
following consistency conditions

1. For all Borel sets {F_i}_{i=1}^ℓ in ℝⁿ the recursive extension

µ_{t₁,…,t_ℓ}(F₁ × ⋯ × F_ℓ) = µ_{t₁,…,t_ℓ,t_{ℓ+1}}(F₁ × ⋯ × F_ℓ × ℝⁿ)
[Figure: successive levels of approximation of a Brownian motion path on [0, 1].]
The "tent shaped" functions (f_k^n for each n and k) are the following integrals
over the interval [0, 1]

f_k^n(t) = ∫_0^t g_k^n(u) du
Finally, the Brownian motion is defined as the sum over all appropriate n
and k, that is to say

B(t) = Σ_{n=0}^∞ Σ_{k odd, k ≤ 2ⁿ} f_k^n(t) · B_k^n
as the functions g_k^n show. Listing 1.1 shows how the function is implemented:
a function call gives the support of the Brownian motion over [0, 1] as a vector,
and also the first levels of approximation as the columns of a matrix.
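A minimal sketch of such an implementation, assuming the standard Schauder tent
functions for g_k^n, could be:

% Levy-Ciesielski construction of a Brownian path on [0,1] (a sketch)
nlev = 10;                            % number of refinement levels
t = linspace(0,1,1025)';              % support of the Brownian motion
B = t*randn;                          % level 0: the straight line t*Z
paths = zeros(numel(t), nlev+1); paths(:,1) = B;
for n = 1:nlev
    for k = 1:2:2^n-1                 % odd k below 2^n
        c = k/2^n; h = 1/2^n;         % tent center and half-width
        tent = 2^(-(n+1)/2) * max(0, 1 - abs(t-c)/h);  % Schauder function
        B = B + tent*randn;           % independent N(0,1) coefficient
    end
    paths(:,n+1) = B;                 % store the n-th approximation
end
plot(t, paths)                        % the columns show the refinement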
A process X_t is Markov if

P[X_s ∈ F | F_t] = P[X_s ∈ F | X_t]

for all s > t > 0, which means that the conditional distribution depends only
on the latest value of the process, and not on the whole history. Remember
the difference between the σ-algebras F_t, which belong to the filtration of
the process and therefore include the history, and F(X_t), which is generated
by a single observation at time t. For that reason Markov processes are coined
"memory-less". The Brownian motion is Markov, once again by its definition.
A Feller semigroup is a family of linear mappings indexed by t ≥ 0,

P_t : C(ℝ) −→ C(ℝ)

where C(ℝ) is the family of continuous functions that vanish at infinity, such
that

1. P₀ is the identity map
2. the P_t are contraction mappings, ‖P_t‖ ≤ 1 for all t ≥ 0
3. P_t has the semigroup property, P_{t+s} = P_t ∘ P_s for all t, s ≥ 0, and
4. the limit lim_{t↓0} ‖P_t f − f‖ = 0 holds for all f ∈ C(ℝ)
A Feller transition density is a density that is associated with a Feller
semigroup. A Markov process with a Feller transition function is called a Feller
process.
The Brownian motion is also a process that has almost surely continuous
sample paths. This is due to Kolmogorov's continuity theorem, which states
that if for all t ∈ T we can find constants α, β, γ > 0 such that

E|X_{t₁} − X_{t₂}|^α ≤ γ |t₁ − t₂|^{1+β}

then X_t has continuous paths (or at least a version that does). For the Brownian
motion E|B_{t₁} − B_{t₂}|⁴ = 3|t₁ − t₂|², and therefore B_t will have continuous
sample paths.
If the drift and volatility are constant, the process X_t = µt + σB_t for a Brownian
motion {Bt }t>0 will be a diffusion. More generally the instantaneous drift and
volatility do not have to be constant, but can depend on the location X t and the
time t. Diffusions are then given as solutions to stochastic differential equations.
[Figure 1.2: a Brownian motion trajectory, with successive zooms into the path.]
The total variation of a process over [0, t] is the limit of Σ_k |X_{t_{k+1}} − X_{t_k}|,
and the quadratic variation the limit of Σ_k |X_{t_{k+1}} − X_{t_k}|², for partitions {t_i}
of the time interval [0, t] where sup |t_{k+1} − t_k| → 0. For "normal" functions the
total variation would be the length of the curve; this means that to draw a
Brownian motion trajectory on a finite interval we would need an infinite
amount of ink. Also, the quadratic variation of "normal" functions is zero, since
they are not infinitely volatile in arbitrarily small intervals.
When we consider a Brownian motion path, it is impossible to find an interval
over which it is monotonic, no matter how much we zoom into the trajectory.
Therefore we cannot split a Brownian motion path in two parts with a line that
is not vertical.
Figure 1.2 gives a trajectory of a Brownian motion and illustrates how wild the
path is by successively zooming in the process.
dX_t/dt = µ(t, X_t) + "noise terms"

The solution of such a differential equation could be represented, once again
loosely, as

X_t = X₀ + ∫_0^t µ(s, X_s) ds + ∫_0^t "noise terms" ds
If we write the “noise terms” in terms of a Brownian motion, say B t , we have
a process that has given drift and volatility, called an Itō diffusion
X_t = X₀ + ∫_0^t µ(s, X_s) ds + ∫_0^t σ(s, X_s) dB_s
The last integral, called an Itō integral with respect to a Brownian motion, is
not readily defined, and we must clarify what we actually mean by it. Before we do
so, note that we usually write the above expression in a shorthand “differential”
form as
dXt = µ(t, Xt )dt + σ(t, Xt )dBt
It is obvious that, unlike ordinary (Riemann or Lebesgue) integrals, Itō integrals
have a probabilistic interpretation, since they depend on a stochastic process. To
give a simple motivating example, a Brownian motion can be represented as an Itō
integral as

B_t = ∫_0^t dB_s
For the dyadic partitions tk of the time interval [0, t], we define the Itō integral
as the limit of the random variables
ω ↦ Σ_{k≥0} f(t_k, ω) [B_{t_{k+1}} − B_{t_k}](ω)
The limit is taken with respect to the L²-norm ‖f‖² = ∫ |f(s)|² ds. More precisely,
we first define the integral for simple, step-like functions, then extend it to
bounded functions φ_n, and finally move to more general functions f, such that
1. (t, ω) ↦ f(t, ω) is B ⊗ F-measurable
2. f(t, ω) is F_t-adapted
3. E ∫_0^t f²(s, ω) ds < ∞
The final property ensures that the function is L²-integrable, and allows the
required limits to be well defined using the Itō isometry, which states

E[ ( ∫_0^t f(s, ω) dB_s(ω) )² ] = E[ ∫_0^t f²(s, ω) ds ]
It is easy to see that the Stratonovich integral is not an F_{t_k}-adapted random
variable, since we need to know the value of the process at the future time point
t_{k+1} in order to ascertain the value of the Riemann sums. For that reason
it is not used as often as the Itō representation in financial mathematics,⁵ but
it has better convergence properties (due to the midpoint approximation of the
integral) and it is used when one needs to simulate stochastic processes. In fact,
when the process is an Itō diffusion (see below) the two stochastic integrals are
related by

∫_0^t σ(s, X_s) ∘ dB_s = ∫_0^t σ(s, X_s) dB_s + ½ ∫_0^t (∂σ(s, X_s)/∂x) σ(s, X_s) ds
An Itō process is a stochastic process on the same filtered space, of the form

X_t = X₀ + ∫_0^t µ(s, ω) ds + ∫_0^t σ(s, ω) dB_s

E^x[µ⋆(t, ω) | F_t^Y] = µ(t, Y_t^x), and σ⋆²(t, ω) = σ²(t, Y_t^x)
ITŌ’S FORMULA
Itō’s formula or Itō’s lemma is one of the fundamental tools that we have in
stochastic calculus. It plays the rôle that the chain rule plays in normal calculus.
Just like the chain rule is used to solve ODEs or PDEs, a clever application of
Itō's formula can significantly simplify a SDE. We consider an Itō process X_t
and a function g(t, x) that is twice continuously differentiable; then Y_t = g(t, X_t)
is again an Itō process, with

dY_t = ∂g/∂t (t, X_t) dt + ∂g/∂x (t, X_t) dX_t + ½ ∂²g/∂x² (t, X_t) (dX_t)²

The "trick" is that the square (dX_t)² is computed using the rules

dt · dt = dt · dB_t = 0 and dB_t · dB_t = dt
One can easily prove Itō's formula based on a Taylor's expansion of the
function g.⁶ In particular, one can write for Δt > 0 the quantity ΔX_t = X_{t+Δt} −
X_t = µ(t, X_t)Δt + σ(t, X_t)(B_{t+Δt} − B_t) + o(Δt). Taking powers of the Brownian
increments ΔB_t = B_{t+Δt} − B_t yields

E[ΔB_t] = 0
E[(ΔB_t)²] = Δt
E[(ΔB_t)ⁿ] = o(Δt) for all n ≥ 3

This implies that the random variable (ΔB_t)² will have expected value equal to
Δt and variance of order o(Δt). A consequence is that in the limit (ΔB_t)² → Δt,
since the variance goes to zero. Now the Taylor's expansion for ΔY_t = g(t +
Δt, X_{t+Δt}) − g(t, X_t) will give
[Figure: a Brownian motion path B_t together with the Itō integral ∫_0^t B_s dB_s.]
By taking integrals of both sides, and recognizing that ∫_0^t d(½B_s²) = ½B_t², we
can solve for the Itō integral in question

∫_0^t B_s dB_s = ½B_t² − ½t
Example 10. The most widely used model for a stock price, say S t , satisfies
the SDE for a geometric Brownian motion with constant expected return µ and
volatility σ, given by
dSt = µSt dt + σSt dBt
This corresponds loosely to the ODE dS_t/dt = αS_t, which grows exponentially.
[Figure 1.4: simulated stock price trajectories under the geometric Brownian
motion assumption.]
Applying Itō's formula to log S_t yields the solution

S_t = S₀ exp{(µ − ½σ²)t + σB_t}
Note that under the geometric Brownian assumption the price of the asset is
always positive, an attractive feature in line with the property of limited liability
of stocks. Some stock price trajectories for different ω ∈ Ω are given in figure
1.4.
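Such trajectories can be generated directly from the closed-form solution, without
discretizing the SDE; a minimal sketch with illustrative parameter values:

% Simulate geometric Brownian motion paths from the exact solution
mu = 0.10; sigma = 0.30; S0 = 1;      % illustrative parameters
T = 1; n = 250; npaths = 8;
t  = linspace(0,T,n+1)';
B  = [zeros(1,npaths); cumsum(sqrt(T/n)*randn(n,npaths))];
S  = S0*exp((mu - sigma^2/2)*t*ones(1,npaths) + sigma*B);
plot(t, S)                            % the prices remain positive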
GENERATORS
Say we are given a Brownian motion B_t on a filtered space, and a SDE that
describes the motion of a stochastic process X_t, say

dX_t = µ(X_t) dt + σ(X_t) dB_t
Definition 14. Given an Itō diffusion X_t, the (infinitesimal) generator of the pro-
cess, denoted with A, is defined for all functions f ∈ C⁽²⁾ as the limit

A f(x) = lim_{Δt↓0} [E^x f(X_{Δt}) − f(x)] / Δt = L f(x)
STOPPING TIMES
Definition 15. A stopping time is a random variable τ : Ω → [0, ∞] such that the
event {ω : τ(ω) ≤ t} belongs to F_t for all t ≥ 0.

For a stopping time τ with E^x τ < ∞, Dynkin's formula states that

E^x f(X_τ) = f(x) + E^x[ ∫_0^τ A f(X_s) ds ]

Here, f ∈ C⁽²⁾ and also has compact support. Note that in the above integral
the upper bound is a random variable. Dynkin's formula can be used to assess
when a process is expected to be stopped, that is to say the expectation E^x[τ(ω)].
Example 11. For example, say that we are holding a stock with current price S,
and dynamics that follow the geometric Brownian motion dS_t = µS_t dt + σS_t dB_t.
We want to know how long we should expect to wait before our asset will
be worth at least S̄ > S. Mathematically, we are interested in the first exit time
from the set [0, S̄]
τ = inf{t > 0 : St > S̄}
We cannot directly apply Dynkin's formula, since we need E^S τ < ∞ and we
are not sure about that. For example the asset might have a negative drift and
exponentially drop towards zero. We can define instead the exit times from the
set [a, S̄], for 0 < a < S,

τ_a = inf{t > 0 : S_t ∉ (a, S̄)}

and denote with p_a the probability that the process exits at S̄. Passing to the
limit a → 0 we can retrieve the probability of ever reaching our target of S̄
(when µ < σ²/2), namely p_a → p = (S/S̄)^{1−2µ/σ²}. The probability 1 − p of never
reaching the target becomes higher for lower expected returns or higher volatility.
If µ > σ²/2 then the expected returns are high enough for all sample paths
to eventually breach the target S̄, since S_t → ∞ as t → ∞, almost surely. The
process will exit with probability 1, but Eτ might still be ∞.
In this case we consider the function f(x) = log x. Our objective now is for
the generator to be constant, in order to simplify the integral ∫_0^{τ_a} A f(X_s) ds. In
particular

A f(x) = µ − ½σ²
Dynkin's formula will yield in this case (once again for the exit times τ_a)

p_a · log S̄ + (1 − p_a) · log a = log S + (µ − ½σ²) E^S[ ∫_0^{τ_a} dt ]

Passing to the limit this time will give the expected stopping time

E^S τ_a → E^S τ = log(S̄/S) / (µ − σ²/2)   as a → 0
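A sketch of a Monte Carlo check of this formula (paths are simulated on a fine
grid, so the passage time is located only approximately; the parameter values
are illustrative):

% Check E[tau] = log(Sbar/S)/(mu - sigma^2/2) when mu > sigma^2/2
mu = 0.15; sigma = 0.30; S = 1; Sbar = 1.5;     % illustrative values
dt = 1e-3; npaths = 2000; tau = zeros(npaths,1);
for i = 1:npaths
    x = log(S);
    while x < log(Sbar)               % first passage of the log-price
        x = x + (mu - sigma^2/2)*dt + sigma*sqrt(dt)*randn;
        tau(i) = tau(i) + dt;
    end
end
fprintf('MC %6.3f vs exact %6.3f\n', mean(tau), log(Sbar/S)/(mu-sigma^2/2))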
The Feynman-Kac formula states that the expectation

u(t, x) = E^x[ exp{−∫_0^t g(X_s) ds} f(X_t) ]

satisfies the partial differential equation

∂u(t, x)/∂t = A u(t, x) − g(x) u(t, x)

with boundary condition u(0, x) = f(x).
The Feynman-Kac formula has been very successful in financial mathematics,
as it can represent stochastic discount factors through the exponential
exp{−∫_0^t g(X_s) ds}.
Example 12. Suppose that the interest rate rt follows an Itō diffusion given by
dr_t = θ(ρ − r_t) dt + σ√(r_t) dB_t
Also suppose that we have an investment that depends on the level of the interest
rate, for example a house, with value given by H(r). This implies that the property
value will also follow an Itō diffusion, with dynamics given by Itō’s formula. At a
future time T , the house price will be H(rT ), which is of course unknown today.
We are interested in buying the property at time T, which means that we are
interested in the present value of H(r_T), namely

u(T, x) = E^x[ exp{−∫_0^T r_s ds} · H(r_T) ]
Say that we have a project with uncertain payoffs that depend on the evo-
lution of a variable X_t, which has current value X₀ = x. The dynamics are
dX_t = µ dt + σ dB_t, and the project will pay f(X_T) = aX_T² + b. We are interested
in establishing the present value, discounted at a constant rate R,

E^x[ exp{−RT} · (aX_T² + b) ]

Following Feynman-Kac, this expectation solves the PDE

∂u(t, x)/∂t = µ ∂u(t, x)/∂x + ½σ² ∂²u(t, x)/∂x² − R u(t, x)

with boundary condition u(0, x) = ax² + b.
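For this particular payoff the expectation is also available in closed form, since
X_T ~ N(x + µT, σ²T) and E[X_T²] = (x + µT)² + σ²T; a sketch comparing the closed
form with a Monte Carlo estimate (illustrative parameters):

% Present value of exp(-R*T)*(a*X_T^2 + b) under dX = mu*dt + sigma*dB
a = 2; b = 1; R = 0.05; mu = 0.10; sigma = 0.25; x = 1; T = 2;
m = x + mu*T; v = sigma^2*T;              % moments of X_T ~ N(m, v)
exact = exp(-R*T)*(a*(m^2 + v) + b);      % closed-form present value
XT = m + sqrt(v)*randn(1e6,1);            % Monte Carlo draws of X_T
fprintf('exact %8.5f vs MC %8.5f\n', exact, exp(-R*T)*mean(a*XT.^2 + b))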
Example 14. Let’s say that we want to verify the above claim that E Q [Yt ] =
EP [Mt Yt ], stated by Girsanov’s theorem. For example, let’s take the random
variable Yt = log St . Under Q the logarithm will be given by Itō’s formula as
Y_t = log S₀ − ½σ²t + σB_t^Q

and since B_t^Q is a Q-martingale, the expectation E^Q[Y_t] = log S₀ − ½σ²t. Under
P we have to consider the process
Z_t = M_t Y_t
Itō’s formula (applied on the function f(x, y) = x · y) will give us the dynamics
for Zt , namely
dZ_t = M_t dY_t + Y_t dM_t + dY_t dM_t
which actually produces
dZ_t = M_t [ (µ − ½σ²) dt + σ dB_t ]
     + [ log S₀ + (µ − ½σ²)t + σB_t ] M_t [−λ dB_t]
     + [ (µ − ½σ²) dt + σ dB_t ] M_t [−λ dB_t]

where M_t = exp{−½λ²t − λB_t}.
And the two expectations are apparently the same. Observe though how much easier
it was to compute the expectation under Q. Girsanov's theorem can be a valuable
tool when one wants to simplify complex expectations, just by casting them under
a different measure.
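The equality of the two expectations can also be checked by simulation; a sketch,
assuming the market price of risk λ = µ/σ that removes the drift in this example:

% Verify E^Q[Y_t] = E^P[M_t Y_t] for Y_t = log(S_t) by Monte Carlo
mu = 0.12; sigma = 0.30; S0 = 1; t = 1; lambda = mu/sigma;
B = sqrt(t)*randn(1e6,1);                       % B_t under P
Y = log(S0) + (mu - sigma^2/2)*t + sigma*B;     % log-price under P
M = exp(-lambda^2*t/2 - lambda*B);              % the density M_t
fprintf('E^P[M Y] = %8.5f vs E^Q[Y] = %8.5f\n', ...
        mean(M.*Y), log(S0) - sigma^2*t/2)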
In this chapter we will use some of the previous results to establish the Black-
Scholes (BS) paradigm. We will assume a frictionless market where assets prices
follow geometric Brownian motions, and we will investigate the pricing of deriva-
tive contracts. The seminal papers of Black and Scholes (1973) and Merton
(1973) (also collected in the excellent volume of Merton, 1992) defined the area
and sparked thousands of research articles on the fair pricing and hedging of a
variety of contracts.
The original derivation of the BS formula is based on a replicating portfo-
lio that ensures that no arbitrage opportunities are allowed. Say that we are
interested in pricing a claim that has payoffs that depend on the value of the
underlying asset at some fixed future date T . The idea is to construct a portfo-
lio, using the underlying asset and the risk free bond, that replicates the price
path of that claim, and therefore its payoffs. If we achieve that, then the claim in
question is redundant, in the sense that we can replicate it exactly. In addition,
the value of the claim must equal the value of the portfolio, otherwise arbitrage
opportunities would arise.
1. The underlying asset price follows a geometric Brownian motion, dS_t =
µS_t dt + σS_t dB_t. The parameter µ gives the expected asset return, while σ
is the return volatility.
2. There is a risk free asset which grows at a constant rate r, which applies
for both borrowing and lending. There is no bound to the size of funds that
can be invested or borrowed risk-free.
3. Trading is continuous in time, both for the risk free asset, the underlying
asset and all derivatives. This means that any portfolios can be dynamically
rebalanced continuously.
4. All assets are infinitely divisible and there is an inelastic supply at the spot
price, that is to say the assets are infinitely liquid. Therefore, the actions of
any investor are not sufficient to cause price moves.
5. There are no taxes or any transaction costs. There are no market makers
or bid-ask spreads. The spot price is the single price where an unlimited
number of shares can be bought. Short selling is also allowed.
A derivative security is a contract that offers some payoffs at a future (matu-
rity) time T , that depend on the value of the underlying asset at the time, say
Π(ST ). We are interested in establishing the fair value Pt of such a security at
all times before maturity, that is the process {Pt : 0 6 t 6 T }.
H_t^F = V_t − H_t^S S_t

Therefore we will only keep the process of the shares held, H = {H_t : t ≥ 0},
as the trading strategy. Also, in this case the dynamics of the portfolio value are
given by

dV_t = H_t dS_t + r(V_t − H_t S_t) dt
Say for a minute that we knew the pricing formula for the derivative price,
Pt = f(t, St ). We can then define a trading strategy, where the number of shares
held at each time t is given by
H_t = ∂f(t, S_t)/∂S
We have selected this particular trading strategy because it sets the volatility
of the portfolio value, Vt , equal to the volatility of the derivative value, Pt . We
call this a hedging or replicating strategy and the portfolio the hedging or
replicating portfolio.
ARBITRAGE OPPORTUNITIES
We claim that if the portfolio has the same volatility dynamics it should also
offer the same return. Otherwise arbitrage opportunities will emerge.
An arbitrage opportunity is a trading strategy J that has the following four
properties (for a stopping time T > 0)
1. The strategy J is self-financing, that is there are no external cash inflows
or outflows. We can move funds from one asset to another, but we cannot
introduce new funds.
2. Vt (J) = 0, that is we can engage in the portfolio at time t with no initial
investment. This means that we can borrow all funds needed to set up the
initial strategy at the risk free rate, without investing any funds of our own.
3. V_T(J) ≥ 0, it is impossible to be losing money at time T. The worst outcome
is that we end up with zero funds, but we did not invest any funds in the
first place.
4. P[V_T(J) > 0] > 0, there is a positive probability that we will actually be
making a profit at time T.
V_t(Θ) = −P_t + H_t S_t + Φ_t
Using Itō's formula for P_t = f(t, S_t) and the stochastic differential equation for
S_t, we can write after some algebra (which incidentally cancels the drift µ)

dV_t = −[ f_t(t, S_t) + rS_t f_S(t, S_t) + (σ²S_t²/2) f_SS(t, S_t) − V_t r − P_t r ] dt   (2.2)
P_t = H_t S_t + Φ_t
will also hold at all times. This means that using the trading strategy H t we
create a portfolio that will track (or mimic) the process Pt . Therefore we do not
really need to introduce derivatives in the BS world, as their trajectories and
payoffs can be replicated by using a carefully selected trading strategy. For that
reason we say that in the BS world derivatives are redundant securities. This of
course only holds under the strict BS assumptions, and does not generally hold
in any market. It certainly does not hold in the real world where markets are
subject to a number of frictions and imperfections.
As we search for markets and models where securities can be hedged, we
need to introduce the notion of market completeness. We will say that a market
is called complete if all claims can be replicated. A market that is complete will
of course be arbitrage-free, but the inverse is not true. There are many markets
that are arbitrage free but incomplete. One can speculate that the real world
markets fall within this category: claims cannot be perfectly replicated due to
market imperfections, and these imperfections also make arbitrage opportunities
scarce and short lived.
We set a probability space (Ω, F , P), under which the price process is de-
fined. In financial mathematics, an equivalent martingale measure (EMM) is a
measure Q equivalent to the objective one P, under which all discounted as-
set prices form martingales. Therefore for the discounting factor B t , any price
process Vt will satisfy
The fundamental theorem of asset pricing states the following two propositions:

1. The market is arbitrage-free if and only if there exists at least one equivalent
martingale measure.
2. An arbitrage-free market is complete if and only if the equivalent martingale
measure is unique.
We are looking for these equivalent probability measures under which the dis-
counted prices will form martingales, which means that under the EMM the
dynamics of the assets will be
dS_t^j = r S_t^j dt + Σ_{i=1}^N σ_{j,i} S_t^j dB_t^{i,Q}
Using Girsanov’s theorem we can actually find the instantaneous drift under
Q, which will be given by
E^Q[dS_t^j] = E^P[ (1 + dM_t/M_t) dS_t^j ]
           = µ_j S_t^j dt + E^P[ (−Σ_{i=1}^N λ_t^i dB_t^i)(Σ_{i=1}^N σ_{j,i} dB_t^i) ] S_t^j

Since the Brownian motions are mutually independent, we can simplify the above
expression to

E^Q[dS_t^j] = [ µ_j − Σ_{i=1}^N λ_t^i σ_{j,i} ] S_t^j dt = r S_t^j dt
which has to be satisfied for all t > 0 and for all j = 1, . . . , M. Therefore the
parameters λ = {λi : i = 1, . . . , N} will be constant, and they must satisfy the
system of M equations with N unknowns
Σ · λ = µ − r1
This system can have no solutions, a unique solution, or an infinite number
of solutions, depending on the rank of the matrix Σ. If the system is inconsistent,
that is if rank(Σ) is smaller than the rank of the augmented matrix [Σ, µ − r1],
then the system does not admit a solution. This means that there does not exist
an equivalent martingale measure, and due to the fundamental theorem of asset
pricing it is implied that arbitrage trading strategies can be constructed using a
portfolio of the M stocks. If the system is consistent and rank(Σ) < N, then there
exists an infinite number of vectors λ that are solutions to the system. Each one
of these solutions will define an equivalent martingale measure and the market is
arbitrage-free but incomplete. Finally, if the system is consistent and rank(Σ) = N,
then the solution to the system is unique. This unique λ will define a unique EMM
and the market will
be complete. In that case, any other asset that depends on the Brownian motions
B(t) can be replicated using the M assets in the market.
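A small numerical illustration (with a made-up volatility matrix Σ and drift
vector µ): the rank conditions and, when it exists, the vector of market prices
of risk can be checked directly:

% Market prices of risk: solve Sigma*lambda = mu - r*1
Sigma = [0.20 0.05; 0.10 0.30];   % M = 2 assets, N = 2 Brownian motions
mu = [0.08; 0.10]; r = 0.03; rhs = mu - r;
if rank([Sigma rhs]) > rank(Sigma)
    disp('no solution: no EMM, arbitrage opportunities exist')
elseif rank(Sigma) < size(Sigma,2)
    disp('infinitely many solutions: arbitrage-free but incomplete')
else
    lambda = Sigma\rhs;           % unique EMM: the market is complete
    disp(lambda')
end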
These equalities offer us three options to evaluate the value of the derivative at
time t = 0.
Since B_T^Q is normally distributed, after some algebra the expectation simplifies
to

P₀ = (e^{−σ²T/2} / √(2πT)) ∫_{−d}^∞ S₀ exp(σB − B²/(2T)) dB
   − (e^{−rT} / √(2πT)) ∫_{−d}^∞ K exp(−B²/(2T)) dB

In the above expression d = [log(S₀/K) + (r − σ²/2)T] / σ. Evaluating the two
integrals will eventually lead to the Black-Scholes formula.
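A sketch that evaluates the two integrals numerically and compares the result
with the closed-form Black-Scholes price (illustrative parameters; Phi denotes
the standard normal cdf):

% The two integrals above against the closed-form BS call price
S0 = 100; K = 100; r = 0.05; sigma = 0.2; T = 0.5;
Phi = @(x) 0.5*erfc(-x/sqrt(2));      % standard normal cdf
d  = (log(S0/K) + (r - sigma^2/2)*T)/sigma;
I1 = exp(-sigma^2*T/2)/sqrt(2*pi*T)* ...
     quadgk(@(B) S0*exp(sigma*B - B.^2/(2*T)), -d, Inf);
I2 = exp(-r*T)/sqrt(2*pi*T)*quadgk(@(B) K*exp(-B.^2/(2*T)), -d, Inf);
d1 = (log(S0/K) + (r + sigma^2/2)*T)/(sigma*sqrt(T)); d2 = d1 - sigma*sqrt(T);
fprintf('integrals %8.5f vs BS %8.5f\n', ...
        I1 - I2, S0*Phi(d1) - K*exp(-r*T)*Phi(d2))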
with S_t following the risk-neutral dynamics. We shall also define the function
v(t, s) = E^Q[ exp{−∫_0^t r ds} g(S_t) | S₀ = s ], implying that in fact we are interested
in the value V₀ = v(T, S₀). Following the Feynman-Kac approach (see section
1.7) the function v(t, s) solves a parabolic PDE that depends on the dynamics
of the asset price process under Q (since the expectation is taken under Q)

∂v(t, s)/∂t = rs ∂v(t, s)/∂s + ½σ²s² ∂²v(t, s)/∂s² − r v(t, s)
with initial condition v(0, s) = g(s). This is just the Black-Scholes partial dif-
ferential equation (2.3), after we change the time variable to the time-to-
maturity, which transforms the BS-PDE terminal condition into an initial one.
Exercise timing
Exotic contracts can be classified with respect to their exercise times and their
payoff structure. European-style contracts can be exercised only on the matu-
rity date, while American derivatives can be exercised at any point before the
maturity date. That is to say, a three-month American put with strike price 30p
gives the holder the right to sell the underlying asset for 30p at any point she
chooses over the next three months.
Payoff structures
Apart from the standard calls and puts there can be a wide range of structures
that define the payoffs of the contract. The simplest deviation is the digital option
(also called binary or all-or-nothing option), where the payoff is a fixed amount
if the underlying is above or below the strike price. For example a two month
digital call with strike 60p will pay $1 if the value of the underlying is above
60p after two months. In that sense it is a standard bet on the future level of
the underlying asset price.
Another popular option is the cross option, where the underlying asset is
quoted in one currency but the payoffs (and the strike price) are denominated in
another. For example, British Airways is traded on the London Stock Exchange
and is priced in British pounds, but a US based investor will want the strike
price and the payoff in US dollars. Therefore, if Xt is the USD/GBP exchange
rate, and St is the BA price in London (quoted in GBP), then a European call
will have payoffs of the form (ST XT − K )+ , where the strike price is quoted in
USD. Therefore the writer of this option is also exposed to exchange rate risks
and the correlation between the exchange rate and the underlying asset returns.
A quanto option will address this dependence by setting the exchange rate that
will be used for the conversion beforehand, say X ? . Therefore the payoffs will
only depend on the fluctuations of the underlying asset, given by (S T X ? − K )+ .
The cross option described above is an example of an option that depends
on more than one underlying asset. Other exotics share this feature, like the
exchange option that allows the holder to exchange one asset for another, a
basket option that uses a portfolio of assets as the underlying asset, or the
rainbow option that depends on the performance of a collection of assets. An
example of a rainbow option is a European put where the payoffs are computed
using the worst of ten stocks.

¹ Just as Bermuda is between Europe and the US.
Other contracts have features that involve other derivatives, like the com-
pound option that is an option to buy or sell another option. In this case you
can have a call on a call, a put on a call, etc. The swing option lets the holder
decide if she will use the option as a call or as a put, at a pre-specified number
of time points. Typically the holder is not allowed to use all options as calls
or puts, and some provisions are in place to ensure that a mix is actually used.
The chooser option is a variant that allows the holder to decide if the option
will pay off as a call or a put. This decision must be made at some point before
maturity.
If the option is of the European type, one can retrieve its price by using either
the PDE or by simulating the expectation. When the number of underlying assets
is small it is usually faster to solve the PDE numerically, but as the number of
assets grows these numerical methods become increasingly slow. It is typically
stated that if the number of assets is larger than four, then simulation methods
become more efficient.
Path dependence
For options with early exercise features one has to make decisions on the ex-
ercise times. This decision will be dependent on the complete price path of
the underlying asset, and not only on its value at maturity. Some other option
contracts exhibit more explicit or stronger path dependence.
A barrier option has one or more predefined price levels (the barriers). Reach-
ing these barriers can either activate ("knock-in" barrier) or deactivate ("knock-
out" barrier) the contract. Say, for example, that the current price of the under-
lying asset is 47p, and consider a six month call option with strike 55p and a
knock-in barrier at 35p. In order for payoffs to be realized on maturity, not only
does the price have to end up higher than the 55p strike price, but the contract
must also have been activated beforehand, that is the price needs to have fallen
below 35p at some point before maturity. Monitoring of barrier options is not
usually continuous, but takes place at some predefined time points that are
typically equally spaced. The payoff of a Parisian option will depend on the time
that is spent beyond the corresponding barriers, in order to smooth discontinuities.
Lookback options have payoffs that depend not on the terminal value of the
underlying asset, but on the maximum or the minimum value over a predefined
period. Once again in most cases this maximum or minimum is taken over a
discrete set of time points. The special case where the maximum or minimum
over the whole price path is considered yields the Russian option. An Asian
option will have payoffs that depend on the average (arithmetic or geometric)
of the price over a time period, rather than a single value. Therefore an Asian
option is less sensitive to the level of the price at maturity, and harder to
manipulate near the expiration date.
We will use some Greek letters for the derivatives involved, namely ∆ = ∂V/∂S (the
Delta), Γ = ∂²V/∂S² (the Gamma) and Θ = −∂V/∂t (the Theta). Then we can write the
BS-PDE as

−Θ + rS∆ + ½σ²S²Γ = rV
More importantly, a Taylor's expansion of the value function V(t, S) over a
small time interval Δt and a small price change ΔS yields

ΔV = (∂V/∂t)Δt + (∂V/∂S)ΔS + ½(∂²V/∂S²)(ΔS)² + o(Δt, (ΔS)²)
   ⇒ ΔV ≈ −ΘΔt + ∆ΔS + ½Γ(ΔS)²
The Delta of the derivative or the portfolio will therefore represent its sensitivity
with respect to changes in the underlying asset. In continuous time trading,
holding ∆ units of the underlying asset at all times is sufficient to replicate the
path and payoffs of the portfolio value. Θ will be the time decay of this value,
representing the changes as we move closer to maturity, even if the underlying
asset does not move. When trading takes place in discrete time, there is going to
be some misalignment between the two values, and higher order derivatives can
be used to correct for that. In addition, the Γ controls the size of the hedging
error when one uses the wrong volatility for pricing and/or hedging. This is an
important feature, as the volatility is the only parameter in the BS PDE that is
not directly observed and has to be estimated.
In the BS framework there are also some parameters that are considered con-
stant, namely the volatility σ, the risk free rate r, and the dividend yield q.
Therefore one can write the value function as V_t = V(t, S; σ, r, q), and practi-
tioners use the derivatives of the value function with respect to these parameters
as a proxy for the respective sensitivities. In particular ν or κ = ∂V/∂σ (the Vega or
Kappa²), ρ = ∂V/∂r (the Rho), and φ = ∂V/∂q (the Phi).
With the increased popularity of exotic contracts that are particularly sen-
sitive to some parameter values a new set of sensitivities is sometimes used,
although very rarely. These sensitivities are implemented via higher order Tay-
lor's expansions of the value function. Running out of Greek letters, these sensi-
tivities have taken just odd-sounding names or have borrowed their names from
quantum mechanics, like the Speed ∂³V/∂S³, the Charm ∂²V/∂S∂t, the Color
∂³V/∂S²∂t, the Vanna ∂²V/∂S∂σ, and the Volga ∂²V/∂σ².
No matter what Greek or non-Greek letters are used, the objective is the
same: to enhance the portfolio with a number of contracts that result in a position
that is neutral with respect to some Greek. This turns out to be a simple exercise,
as portfolios are linear combinations of assets and this carries through to their
sensitivities. Say that we are planning to merge two portfolios with values V_t¹
and V_t² into one with value V_t^{1+2}, and suppose that we are interested in some
sensitivity Λ (where Λ could be ∆, Γ, …). It follows that, since Λ is actually a
derivative,

Λ_t^{1+2} = Λ_t¹ + Λ_t²

² Vega is not a Greek letter, and for that reason this sensitivity is also found in the
literature as Kappa.
The simplest asset that we can use to enhance our portfolio in order to
achieve some immunization is the underlying asset itself, S t . Trivially, the val-
uation function of the asset is V (t, S; σ, r, q) = S, and therefore the Delta of
∂S
the asset ∆S = ∂S = 1, while all other sensitivities are equal to zero. The
argument above indicates that by augmenting our portfolio with more units of
the underlying we will change the Delta of the composite position. In order to
immunize other sensitivities we will need to construct a position that incorpo-
rates derivative contracts, with the plain vanilla calls and puts being the most
readily available candidates. For that reason we will now investigate the Greeks
of these simple options and examine how we can use them to achieve Greek-
neutrality. Listing 2.1 gives the Matlab function that produces the price and the
major Greeks for the Black-Scholes option pricing model, for both calls and puts.
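A minimal sketch of such a function, without dividends, could look as follows
(cp = +1 for a call, cp = −1 for a put; it is a sketch rather than Listing 2.1
itself):

function [P, delta, gamma, theta, vega, rho] = bsgreeks(S, K, r, sigma, t, cp)
% BSGREEKS  Black-Scholes price and major Greeks (a sketch, no dividends).
%   t is the time to maturity; cp = +1 for a call, -1 for a put.
Phi = @(x) 0.5*erfc(-x/sqrt(2));               % standard normal cdf
phi = @(x) exp(-x.^2/2)/sqrt(2*pi);            % standard normal pdf
d1 = (log(S/K) + (r + sigma^2/2)*t)/(sigma*sqrt(t));
d2 = d1 - sigma*sqrt(t);
P     = cp*(S*Phi(cp*d1) - K*exp(-r*t)*Phi(cp*d2));
delta = cp*Phi(cp*d1);
gamma = phi(d1)/(S*sigma*sqrt(t));             % identical for calls and puts
vega  = S*phi(d1)*sqrt(t);
theta = -S*phi(d1)*sigma/(2*sqrt(t)) - cp*r*K*exp(-r*t)*Phi(cp*d2);
rho   = cp*K*t*exp(-r*t)*Phi(cp*d2);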
THE DELTA
Say that we start with a portfolio with value V and Delta ∆^V. As we noted above we
can adjust the Delta of a portfolio by adding or removing units of the underlying
asset. In particular, if we add w_S units of the asset, the Delta of the portfolio
will become

∆^{V+S} = ∆^V + w_S ∆^S = ∆^V + w_S

In order to achieve Delta-neutrality, ∆^{V+S} = 0, we will need to short ∆^V units
of the underlying asset (w_S = −∆^V). Note that adding or removing funds from
the risk-free bank account does not have any impact on the Greeks. We can
therefore adjust the bank balance with the proceeds of this transaction.
FIGURE 2.1: Behavior of a call option Delta. Part (a) gives the behavior of the
delta of options with specifications {K , r, σ} = {100, 0.02, 0.20}, and three dif-
ferent times to maturity: t = 0.05 (solid), t = 0.25 (dashed) and t = 0.50 (dotted).
Part (b) gives the behavior of the delta as the time to maturity increases, for a
contract which is at-the-money (S = 100, solid), in-the-money (S = 95, dashed),
and out-of-the-money (S = 105, dotted).
A position that is Delta-neutral will not change in value for small asset
price changes (but it will change due to the time decay, as Θ^V dictates). Of
course after a small change in the asset price the value of ∆^{V+S} will change, as
∂∆^{V+S}/∂S = Γ^V. In order to maintain a Delta neutral portfolio, one has to rebal-
ance it in a continuous fashion, employing a dynamic Delta hedging strategy.
In the BS framework European calls (and puts) are priced in closed form as
in equation (2.4). Taking the derivative with respect to the price S yields the
Deltas for calls and puts,

∆^C = N(d₊) and ∆^P = N(d₊) − 1
The values of Delta for a European call option, across different spot prices and
different maturities is displayed in figure 2.1. The Delta for deep-in-the-money
options is equal to one, as exercise appears very likely and the seller of the
option will need to hold one unit of asset in order to deliver. For options that
are deep-out-of-the-money exercise is unlikely and the seller of the option will
not need to carry the asset, making the Delta equal to zero. As the time to
maturity increases the Deltas of in- and out-of-the-money contracts converge
towards the at-the-money Delta.
In practice, the hedging portfolio cannot be rebalanced continuously (even in the
ideal case where the markets are frictionless). Figure
2.2 illustrates dynamic Delta hedging in a simulated BS world, while in 2.3 the
actual strategy is presented step-by-step. Initially we sell one call option with
strike price K = 100 (at-the-money) and four months to maturity for $2.25. In
order to hedge it we need to purchase ∆ = 0.55 shares, and we will need to
borrow $52.72 to carry out this transaction.
As the price of the underlying asset drops, the Delta of the call follows suit.
We are therefore selling our holdings gradually, recovering some funds for our
bank balance. Eventually the price recovers and we build up the asset holdings
once more. In discrete time intervals the option price changes are not matched
exactly by changes in our portfolio value. In particular these discrepancies are
larger for large moves of the underlying. Overall the hedging portfolio will mimic
the process of the call option to a large extent, but not exactly. In this simulation
run we are left with a profit of $0.12.
Increasing the frequency of trades will decrease the volatility of this hedging
error, and of course at the limit the replicating strategy is exact. If from one
transaction to the next the Delta does not move a lot, we would expect the
impact of discrete hedging to be small. On the other hand, the impact will be
most severe in the areas where the Delta itself changes rapidly. The second
order sensitivity with respect to the price, the Gamma, is in fact summarizing
these effects.
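A sketch of such a discrete-time hedging experiment, built on the bsgreeks
sketch above (the parameters are illustrative: a short at-the-money call, hedged
at a finite number of rebalancing dates):

% Discrete dynamic Delta hedging of a short call in a simulated BS world
S = 100; K = 100; r = 0.02; mu = 0.10; sigma = 0.20; T = 1/3; n = 80;
dt = T/n;
[P, D] = bsgreeks(S, K, r, sigma, T, +1);
bank = P - D*S;                        % sell the call, buy Delta shares
for i = 1:n-1
    S = S*exp((mu - sigma^2/2)*dt + sigma*sqrt(dt)*randn);
    [Pnew, Dnew] = bsgreeks(S, K, r, sigma, T - i*dt, +1);
    bank = bank*exp(r*dt) - (Dnew - D)*S;  % rebalance the share holdings
    D = Dnew;
end
S = S*exp((mu - sigma^2/2)*dt + sigma*sqrt(dt)*randn);  % final step
PnL = bank*exp(r*dt) + D*S - max(S - K, 0);  % hedging error at maturity
fprintf('hedging error = %+6.3f\n', PnL)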
GAMMA
The gamma of a portfolio is defined as the second order derivative of the portfolio
value with respect to the price, or equivalently as the first order sensitivity of
the portfolio Delta with respect to the price. As we already mentioned above, we
expect the Delta of a portfolio to change across time, as the price of the asset
changes. Gamma will give us a quantitative insight on the magnitude of these
changes.3
We have already analyzed how a portfolio can be made Delta-neutral, by
taking a position in the underlying asset. In order to achieve Gamma-neutrality,
the underlying asset is not sufficient. This is due to the fact that

Γ^S = ∂²S/∂S² = 0
This indicates that we need instruments that are nonlinear with respect to the
underlying asset price, in order to achieve Gamma-neutrality. Options are perfect
candidates for this job. On the other hand, the fact that Γ^S = 0 has some benefits,
as it implies that after we have made the portfolio Gamma-neutral we can turn
to achieving Delta-neutrality by taking a position in the underlying asset. The
zero value of Gamma will not be affected by this position. We call the strategy
where we are neutral with respect to both Delta and Gamma simultaneously
dynamic Delta-Gamma hedging.
Say that we hold a portfolio with value V and given Delta and Gamma, ∆ V
and Γ V respectively. We follow a two step procedure where we first achieve
Gamma-neutrality, using a liquid contract with known sensitivities. For instance
we can employ a European call option with price C and known Greeks ∆ C and
Γ C . In the second step we will use the underlying asset, which has price S to
achieve delta neutrality (recall that ∆S = 1 and Γ S = 0). The resulting portfolio
will be Delta-Gamma neutral.
³ Delta will also change as time passes, even if the asset price remains the same. The
Charm ∂²V/∂S∂t would quantify this impact. Generally speaking, the impact of the asset
price changes captured by the Gamma is more significant than the Delta changes cap-
tured by the Charm. This happens because the magnitude of the squared Brownian
increment (captured by Gamma) is of order Δt, while the Charm captures effects of
order Δt^{3/2}.
[Figure 2.2: dynamic Delta hedging in a simulated BS world.]
FIGURE 2.3: Sample output of the dynamic Delta hedging procedure. A call option
is sold at time t = 0 and is subsequently Delta hedged to maturity
[Figure 2.4: Gamma across the underlying price (70-130) and time to maturity (0.0-1.0).]
We want to buy $w_C$ units of the option. This makes the value of our composite position equal to $V + C$, and most importantly it will have a Gamma equal to $\Gamma^{V+C} = \Gamma^V + w_C\,\Gamma^C$. Therefore, to achieve Gamma-neutrality we need to hold $w_C = -\Gamma^V/\Gamma^C$ units of the option.
The Delta of the new portfolio is of course $\Delta^{V+C} = \Delta^V - \frac{\Gamma^V}{\Gamma^C}\,\Delta^C$. To make the position Delta-neutral we want to also hold $w_S = -\Delta^{V+C}$ shares of the underlying asset.
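As a minimal sketch of this calculation (the Greeks below are hypothetical placeholders, not values from the text):

% hypothetical Greeks of our position V and of a traded call C
GV = -2.5;  DV = -0.40;      % Gamma and Delta of the position
GC =  0.05; DC =  0.55;      % Gamma and Delta of the hedging option
wC = -GV/GC;                 % options needed for Gamma-neutrality
wS = -(DV + wC*DC);          % shares needed for Delta-neutrality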
For European call and put options the value of Gamma is given by
$$\Gamma^C = \frac{N'(d_+)}{S\sigma\sqrt{t}}$$
Graphically, figure 2.4 gives Gamma across different moneyness and maturity
levels. Apparently the Gamma is significant for contracts that are at-the-money.
In particular, the Gamma of at-the-money options goes to infinity as maturity
approaches. This is due to the discontinuity of the derivative of the payoff func-
tion.
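The shape of figure 2.4 can be reproduced by evaluating the Gamma formula above over a range of spot prices; the parameter values in this sketch are illustrative assumptions:

% BS Gamma of a European option across the spot price
S = 70:0.5:130; K = 100; r = 0.02; sigma = 0.20; t = 0.25;
dp = (log(S/K) + (r + 0.5*sigma^2)*t) ./ (sigma*sqrt(t));
Npdf = exp(-0.5*dp.^2) / sqrt(2*pi);     % the density N'(d+)
gamma = Npdf ./ (S * sigma * sqrt(t));   % peaks at-the-money
plot(S, gamma)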
[Figure 2.5: processes for the Delta-Gamma hedging experiment, plotted over the three-month horizon.]
[Figure 2.6: histograms of the hedging errors of the Delta and the Delta-Gamma strategies, with 25 rebalances.]
These changes of Delta will be proportional to the derivative $\frac{\partial \Delta^V}{\partial S} = \frac{\partial^2 V}{\partial S^2} = \Gamma^V$. Therefore, if we construct a position that has $\Gamma = \Delta = 0$ we form a portfolio that will maintain a position which is (approximately) neutral for larger price changes and, since the price is diffusive, for longer periods of time.⁴
Of course, as we mentioned above, we cannot implement such a position using the underlying asset alone, and we will need an instrument that exhibits non-zero Gamma. Typically we use liquid call and put options that are around-the-money to do so. In figure 2.5 we repeat the experiment of figure 2.2 using a Delta-Gamma neutral strategy this time. We sell one call option with strike K = 100 and construct a Delta-Gamma hedge that uses, apart from the underlying asset, a call option. We could use an option with a constant strike price throughout the time to maturity, but there is always the risk that as the underlying price fluctuates this option might become deep-in- or deep-out-of-the-money. Such an option will have $\Gamma^C \approx 0$ (see figure 2.4), and our position in options $w_C = -\Gamma^V/\Gamma^C \to \pm\infty$. To get around this problem, at each point in time we use a call option with a strike price equal to 105% of the value of the underlying asset at that point, $K^\star = 1.05\,S_t$. This essentially means that when we rebalance we sell the options we might hold and invest in a brand new contract.⁵
Figure 2.5 gives the processes for this experiment. In subfigure (c) it is easy to see that the Delta-Gamma changes in the portfolio follow the changes of the hedged instrument much more closely than the portfolio of figure 2.2, which was only Delta-neutral. This improvement in replication accuracy is also illustrated in subfigure (d), where the two processes are virtually indistinguishable.
⁴ There is also an error associated with Delta changes as time passes, proportional to the Charm $\partial\Delta/\partial t$, but these effects are typically small and deterministic.
⁵ Of course if transaction costs were present this would not be the optimal strategy.
We can also repeat the above experiments to assess the average performance of simple Delta and Delta-Gamma hedging. Here we create 10,000 simulations of the underlying asset and option prices, and implement the two hedging
strategies. The table below gives the summary statistics for the hedging errors,
when we hedge 10, 25 or 50 times during the three-month interval to expiration.
Figure 2.6 presents the corresponding histograms for the two hedging strategies,
when we rebalance 25 times.
Hedges Strategy Mean St Dev Min Max Skew Kurt
10 ∆ −0.01 0.52 −3.34 +1.76 −0.47 4.60
∆&Γ −0.12 0.35 −3.76 +1.45 −3.17 19.1
25 ∆ +0.00 0.33 −1.77 +1.30 −0.24 4.30
∆&Γ −0.02 0.17 −2.11 +2.22 −2.63 28.9
50 ∆ −0.00 0.24 −1.50 +1.00 −0.27 4.77
∆&Γ −0.01 0.11 −1.05 +2.28 +0.63 53.3
Itō’s formula will give us the dynamics for the quantity $V_t^H = V(t, S_t; \sigma^H)$ that gives us the value of Delta that we wish to maintain. In particular,
$$\mathrm{d}V_t^H = \Big[\Theta_t^H + \tfrac{1}{2}(\sigma^A)^2 S_t^2\,\Gamma_t^H\Big]\mathrm{d}t + \Delta_t^H\,\mathrm{d}S_t$$
We can solve the above expression for the last square bracket and substitute into the expression for the bank balance dynamics (2.7). Using (2.6) and the fact that $V_T^H = \Pi(S_T)$ and $\Delta_T^H = \Pi'(S_T)$, we can then write the final bank balance in a parsimonious way as
$$H_T^F = \exp(rT)\big(V_0^I - V_0^H\big) + \Pi(S_T) - S_T\,\Pi'(S_T) - \frac{1}{2}\int_0^T \big[(\sigma^A)^2 - (\sigma^H)^2\big]\exp(-rt)\,S_t^2\,\Gamma_t^H\,\mathrm{d}t$$
[Figure 2.7: the three trajectories of the experiment, where the underlying moves up, down or sideways over the life of the option.]
price does not trend upwards or downwards (as this would render the option in-
or out-of-the-money).
Figure 2.7 gives an example. We are asked to quote an at-the-money Euro-
pean call (S0 = K = $100) with maturity three months.⁷ The actual volatility
over the life of the option is σ A = 15%, which indicates that the fair value of
this contract is V0A = $2.74. We agree to sell this option at V0I = $3.73, which
implies a volatility σ I = 20%. Essentially the option is overpriced by $0.99. The
figure illustrates three possible trajectories, where the underlying asset moves
up, down or sideways over the life of the option.
We might know the future actual volatility, in which case we can select
σ H = 15%. If we do not, we can hedge at the implied volatility σ H = 20%. The
following table gives the profits realized using each sample path, with 5,000 rebalances over the three-month period (about 60 per day). One can observe that
in the case where the asset does not trend, using σ I outperforms σ A . Also, note
that when the asset moves sideways, even such a frequent rehedging strategy
is not identical to the continuous one.
P&L when asset moves
up down sideways
σ H = σ A = 15% +$0.99 +$0.99 +$0.92
σ H = σ I = 20% +$0.51 +$0.57 +$1.44
VEGA
We have already highlighted the dependence of derivative contracts on the
volatility of the underlying asset. The BS methodology makes the assumption
⁷ The drift of the underlying asset is µ = 8%, and the risk-free rate of interest is r = 2%.
that the volatility is constant across time, but practitioners routinely compute
the sensitivity of their portfolios with respect to the underlying volatility, and
in some cases try to hedge against volatility changes. Of course, in order to be
precise one should start with a model that specifies a process for the volatility
of the asset, and not the BS framework where the volatility is constant. Then,
sensitivities with respect to the spot volatility are in principle computed in a straightforward manner, exactly as we compute the BS Delta.
In practice, practitioners use the Black-Scholes Vega instead. It might appear
counterintuitive to use the derivative with respect to a constant, but it offers a
good (first order) approximation. Unless the rebalancing intervals are too long,
or the volatility behaves in an erratic or discontinuous way, the Vega is fairly
robust and easy to compute and use. We follow the last subsection and consider
the value of the portfolio as a function of the volatility σ (in addition to (t, S)),
V = V(t, S; σ). Then, applying Taylor’s expansion yields
$$\Delta V = \Theta\,\Delta t + \Delta\,\Delta S + \frac{1}{2}\Gamma\,\Delta S^2 + \nu\,\Delta\sigma + o(\Delta t, \Delta S^2, \Delta\sigma)$$
The underlying asset price does not depend explicitly on the volatility, rendering $\nu^S = 0$. Once more we need to rely on nonlinear contracts, such as
options, to make a portfolio Vega-neutral. If we want to achieve a joint Gamma-Vega-neutral hedge, we will have to use two different derivative securities.
Say that we use two options with prices C1 and C2 , with known deltas
(∆C1 and ∆C2 ), Gammas (Γ C1 and Γ C2 ) and Vegas (ν C1 and ν C2 ). We also use the
underlying asset to achieve delta neutrality (of course ∆ S = 1 and Γ S = ν S = 0).
We want to buy wC1 and wC2 units of the two derivative securities to achieve
Gamma-Vega neutrality. We are therefore faced with the system
$$w_{C_1} = -\frac{\Gamma^V\nu^{C_2} - \Gamma^{C_2}\nu^V}{\Gamma^{C_1}\nu^{C_2} - \Gamma^{C_2}\nu^{C_1}}, \qquad w_{C_2} = -\frac{\Gamma^{C_1}\nu^V - \Gamma^V\nu^{C_1}}{\Gamma^{C_1}\nu^{C_2} - \Gamma^{C_2}\nu^{C_1}}$$
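The same weights can be obtained by solving the 2×2 linear system directly in Matlab; the Greeks below are hypothetical placeholders:

% Gamma-Vega neutral holdings of the two options
GV = -2.5;  vV = -30;                  % portfolio Gamma and Vega
G1 =  0.05; v1 =  12;                  % Gamma and Vega of C1
G2 =  0.02; v2 =  25;                  % Gamma and Vega of C2
w  = -[G1 G2; v1 v2] \ [GV; vV];       % w(1) = wC1, w(2) = wC2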
After that we can adjust our holdings of the underlying asset to make our
position Delta-neutral as well. For a European call or put option the BS value
of Vega is given by
$$\nu^C = S\sqrt{t}\,N'(d_+)$$
Graphically, the Vega across different moneyness and maturity levels is given
in figure 2.8. It is straightforward to observe that Vega, like Gamma, is more
pronounced for at-the-money options. Unlike Gamma though, the Vega drops as we move closer to maturity. Thus, to achieve Vega-neutrality one should incorporate long-dated at-the-money options in her portfolio.
[Figure 2.8: Vega across the underlying price (70-130) and time to maturity.]
$$\frac{\partial f(t,S)}{\partial t} + (r-q)S\,\frac{\partial f(t,S)}{\partial S} + \frac{1}{2}\sigma^2 S^2\,\frac{\partial^2 f(t,S)}{\partial S^2} = r\,f(t,S)$$
The prices and the Greeks can be computed easily following the same steps.
In particular we can summarize the most useful Greeks in the following catalogue,
where h = +1 for calls and h = −1 for puts, and
$$d_\pm = \frac{\log(S_0/K) + (r - q \pm \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}$$
• Option price: P = V(t, S)
• Theta: $\Theta = -\frac{\partial V(t,S)}{\partial t}$
In essence, holding the foreign currency will depreciate at the risk-free rate differential. Therefore it is straightforward to confirm that the formulas for the option prices and their Greeks above will hold, with $r \to r_d$ and $q \to r_f$.
In the above expression the dots represent parameters that govern the volatility dynamics, and $f_\cdot$ are the corresponding pricing functions.
If we now consider an at-the-money option, where the strike is set at the forward price $K_{ATM} = S\exp(rT)$, then the HW formula will give
$$f_{HW}(t, S; K_{ATM}, r, \cdots) = \mathrm{E}^Q\left[ S\left( 2N\Big(\frac{\bar\sigma}{2}\sqrt{T-t}\Big) - 1 \right)\right]$$
with $\bar\sigma$ the average volatility over the life of the option. On the other hand, if $P_{ATM}$ is the observed price, the ATM implied volatility will solve
$$P_{ATM} = f_{BS}(t, S; K, r, \hat\sigma_{ATM}) = S\left( 2N\Big(\frac{\hat\sigma_{ATM}}{2}\sqrt{T-t}\Big) - 1 \right)$$
Assuming that the HW model is the correct model, $f_{HW}(t, S; K_{ATM}, r, \cdots) = P_{ATM}$, we have the relationship⁸
$$\hat\sigma_{ATM} \approx \mathrm{E}^Q\,\bar\sigma$$
⁸ Intuitively this means that the volatility risk is diversifiable, or that investors are indifferent to the level of volatility risk. We will come back to these issues later.
Thus the implied ATM volatility is approximately equal to the expected average
volatility over the life of the option.
Leptokurtosis
It has been long observed that asset returns follow a distribution which is far from
normal, in particular one that exhibits a substantial degree of excess kurtosis
or fat tails (Fama, 1965). These fat tails seem to be more pronounced for short
investment horizons (ie intraday, daily or weekly returns), and they tend to
gradually die out for longer ones (ie monthly, quarterly or annual returns). A
distribution with high kurtosis is consistent with the presence of an implied
volatility smile, as it attaches higher probabilities to extreme events, compared
to the normal distribution. If the at-the-money implied volatility is used, then the
BS formula will underprice out-of-the-money puts and calls. A higher implied
volatility is needed for the BS formula to match the market prices. Merton (1976)
among others, notes that a mixture of normal distributions can exhibit fat tails
relative to the normal, and therefore models that result in such distributions
can be used in order to improve on the BS option pricing results. Most (if not all) modern option pricing models do exactly that to some extent: expressing calendar returns as a mixture of normal distributions.
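A small simulation illustrates the point; the mixture below (equal weights on two normals with different volatilities) is an arbitrary example:

% kurtosis of an equally weighted mixture of N(0,0.5^2) and N(0,1.5^2)
n = 1e6;
s = 0.5 + (rand(n,1) > 0.5);           % volatility is 0.5 or 1.5
x = s .* randn(n,1);                   % draws from the mixture
k = mean(x.^4) / mean(x.^2)^2          % kurtosis well above the normal 3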
Skewness
Apart from exhibiting fat tails, some asset return series also exhibit significant
skewness. For stocks and indices this skewness is typically negative, highlight-
ing the fact that the speed at which stock prices drop is higher than the speed at which they grow (although they tend to grow for longer periods than they decline). For cur-
rencies the skew is not generally one sided, swinging from positive to negative
and back, over periods of time. The asymmetries of the implied volatility skew can
be attributed to the skewness of the underlying asset returns. If prices are more likely to drop by a large amount than to rise, one would expect out-of-the-money
puts to be relatively more expensive than out-of-the-money calls. Black (1972)
suggests that volatilities and asset returns are negatively correlated, naming
this phenomenon the leverage effect or Fisher-Black effect. Falling stock prices
imply an increased leverage on firms, which is presumed by agents to entail
more uncertainty, and therefore volatility. This asymmetry can generate skewed
returns, but is not always sufficient to explain the very steep implied skews
we observe in (especially index) options markets. A second component that is needed is to accommodate market crashes arriving as jumps in the asset price
process, or even just fears of such crashes (the crash-o-phobia of Bates, 1998).
Volatility features
The fact that volatility is not constant is well documented, and allowing it to be
time varying is perhaps the simplest way to construct models that mix normal
distributions. Empirically, it appears that volatility in the market comes in cy-
cles, where low volatility periods are followed by high volatility episodes. This
feature is known in the literature as volatility clustering. The Arch, Garch and
Egarch families⁹, as well as models with stochastic volatility, have been used in
the literature to model the time variation of volatility and model volatility clus-
tering. The survey of Ghysels et al. (1996) gives a good overview of volatility
models from a modeling perspective. Local volatility models take a completely
different approach, as they focus solely on the pricing and hedging of derivatives,
preferring to keep volatility time-varying but deterministic rather than stochastic
(Dupire, 1994). We will discuss these extensions in chapter 6.
The variation of volatility can be linked to the arrivals of information, and
high trading volume (Mandelbrot and Taylor, 1967; Karpoff, 1987, among others).
One can argue that trading does not take place in a uniform fashion across time:
new information will result in a more dense trading pattern with higher trading
volumes, which in turn result in higher volatilities.
Price discontinuities
Even allowing the volatility to be time varying cannot accommodate very sharp changes in the stock price, typically crashes, which, although very rare events, have a significant impact on the behavior of the market. On October 19th, 1987, the S&P500 index lost about 20% of its value within a day and without any significant warning. If the market were to follow the Black-Scholes assumption of a GBM with constant volatility, such an event should happen once in $10^{87}$ years.¹⁰ Even if we allow the volatility to vary wildly, a model with continuous sample paths that will exhibit such a behavior is not plausible.
Starting with Merton (1976), researchers have been augmenting the diffusive part of the price process with jumps.
⁹ Arch here stands for autoregressive conditional heteroscedasticity (Engle, 1982), Garch stands for generalized Arch (Bollerslev, 1986), and Egarch for exponential Garch (Nelson, 1991).
¹⁰ This is a very long time. For comparison, the age of our universe is estimated to be about $10^{10}$ years.
The Black and Scholes (1973, BS) partial differential equation (PDE) is, as we
saw, one of the most fundamental relationships in finance. It is as close to a law
as we can get in a discipline that deals with human activities. The importance
of the expression stems from the fact that it must be satisfied by all derivative
contracts, independently of their contractual features. In some special cases, for
example when the contract in question is a European-style option, the solution
of the PDE can be computed in closed-form, but this is not the general case.
In many real situations we will have to approximate the solution of the PDE numerically.
If t denotes the time to maturity and S = S(t) is the value of the underlying asset, the BS model assumes that S follows a geometric Brownian motion
$$\mathrm{d}S_t = \mu S_t\,\mathrm{d}t + \sigma S_t\,\mathrm{d}W_t$$
It follows that, for any derivative contract, the pricing function f = f(t, S) will satisfy the BS PDE
$$\frac{\partial f(t,S)}{\partial t} = rS\,\frac{\partial f(t,S)}{\partial S} + \frac{1}{2}\sigma^2 S^2\,\frac{\partial^2 f(t,S)}{\partial S^2} - r\,f(t,S) \qquad (3.1)$$
subject, for a European call, to the initial condition f(0, S) = max(S − K, 0).
Finite difference methods (FDMs) is the generic term for a large number of procedures that can be used for solving a (partial) differential equation, which have as a common denominator some discretization scheme that approximates the derivatives involved. The derivative of a function h at a point $\bar x$ can be defined through any one of the limits
$$\frac{\mathrm{d}h(\bar x)}{\mathrm{d}x} = \lim_{\Delta x\to 0}\frac{h(\bar x + \Delta x) - h(\bar x)}{\Delta x} = \lim_{\Delta x\to 0}\frac{h(\bar x) - h(\bar x - \Delta x)}{\Delta x} = \lim_{\Delta x\to 0}\frac{h(\bar x + \Delta x) - h(\bar x - \Delta x)}{2\Delta x}$$
For a differentiable function all three limits are equal, and suggest three can-
didates for discrete approximations for the derivative. In particular we can con-
struct:
1. The right limit yields the forward differences approximation scheme
$$\frac{\mathrm{d}h(\bar x)}{\mathrm{d}x} \approx \frac{h(\bar x + \Delta x) - h(\bar x)}{\Delta x}$$
2. The left limit yields the backward differences approximation scheme
$$\frac{\mathrm{d}h(\bar x)}{\mathrm{d}x} \approx \frac{h(\bar x) - h(\bar x - \Delta x)}{\Delta x}$$
3. The central limit yields the central differences approximation scheme
$$\frac{\mathrm{d}h(\bar x)}{\mathrm{d}x} \approx \frac{h(\bar x + \Delta x) - h(\bar x - \Delta x)}{2\Delta x}$$
These schemes are illustrated in figure 3.1, where the true derivative is also given for comparison. Of course the approximation quality will depend on the
salient features of the particular function, and in fact, it turns out to be closely
related to the behaviour of higher order derivatives.
Let us now assume that we have discretized the support of h using a uniform grid $\{x_i\}_{i=-\infty}^{\infty}$ with $x_i = x_0 + i\,\Delta x$, and define the values of the function $h_i = h(x_i)$. Then, we can introduce the corresponding difference operators $D_+$, $D_-$ and $D_0$, and rewrite the difference approximations in shorthand¹ as
¹ For us these operators serve as a neat shorthand for the derivative approximations, but there is, in fact, a whole area of difference calculus that investigates and exploits their properties.
[Figure 3.1: the forward, backward and central difference approximations at $\bar x$, together with the true derivative $\mathrm{d}h(\bar x)/\mathrm{d}x$.]
Forward: $D_+ h_i = \dfrac{h_{i+1} - h_i}{\Delta x}$
Backward: $D_- h_i = \dfrac{h_i - h_{i-1}}{\Delta x}$
Central: $D_0 h_i = \dfrac{h_{i+1} - h_{i-1}}{2\Delta x}$
What are the properties of these schemes and which one is more accurately
representing the true derivative? A first inspection of figure 3.1 reveals that
the central differences approximation is closer to the true derivative, but is this
generally true? In order to formally assess the quality of the approximations we
will use Taylor expansions of h around the point xi , that is to say the expansions
of the points $h_{i\pm1}$:
$$D_+h_i = \frac{\mathrm{d}h(x_i)}{\mathrm{d}x} + o(\Delta x), \qquad D_-h_i = \frac{\mathrm{d}h(x_i)}{\mathrm{d}x} + o(\Delta x), \qquad D_0h_i = \frac{\mathrm{d}h(x_i)}{\mathrm{d}x} + o(\Delta x^2)$$
In the above expressions we introduce the big-O notation, where $o(\Delta x^n)$ includes all terms of order $\Delta x^n$ and smaller.² Now since $|\Delta x^2| \ll |\Delta x|$ around zero, it follows that $|o(\Delta x^2)| \ll |o(\Delta x)|$, which means that central differences are more accurate than forward or backward differences. We say that central differences are second order accurate while forward and backward differences are first order accurate.
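These orders of accuracy are easy to verify numerically; the test function in the sketch below ($h(x) = e^x$ at $\bar x = 1$) is our own choice, not the text's:

% errors of the three schemes for h(x) = exp(x)
h = @(x) exp(x);  xb = 1;                             % true derivative is exp(xb)
for dx = [1e-1 1e-2 1e-3]
    ef = abs((h(xb+dx) - h(xb))/dx        - exp(xb)); % forward
    eb = abs((h(xb) - h(xb-dx))/dx        - exp(xb)); % backward
    ec = abs((h(xb+dx) - h(xb-dx))/(2*dx) - exp(xb)); % central
    fprintf('dx=%7.0e  fwd=%8.1e  bwd=%8.1e  cntr=%8.1e\n', dx, ef, eb, ec)
end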
Therefore, without any further information on the function, we should use
central differences where possible. If we have some extra information, perhaps
using one-sided derivatives might be beneficial. Such cases could arise when the
drift term dominates the PDE, or alternatively when the volatility is very small.
In our setting though we will concentrate on approximations that use central
differences as their backbone.
The BS PDE also involves second order derivatives, on top of the first order
ones. We therefore need to establish an approximation scheme for these second
derivatives. Once we achieve that, we will be able to proceed to the actual
discretization of the BS PDE (3.1). Since we are trying to establish second order
accuracy, we are looking for a scheme that approximates the second derivatives
using central differences.
It turns out that an excellent choice is an approximation that takes central differences twice, over a half-step $\Delta x/2$:
$$D^2 h_i = \frac{h_{i+1} - 2h_i + h_{i-1}}{\Delta x^2}$$
Using the same substitutions from the Taylor expansions as above yields
$$D^2 h_i = \frac{\mathrm{d}^2h(x_i)}{\mathrm{d}x^2} + o(\Delta x^2)$$
Therefore, we conclude that the operator $D^2$ is second order accurate. In addition, $D^2$ has the advantage that in order to compute it we use the same values that were needed for the first difference $D_0$, namely $h_{i\pm1}$, together with the value $h_i$.
² Formally, a function g = g(x) is $o(\Delta x^n)$ if the ratio $|g(x)|/|\Delta x^n| < C < \infty$ (meaning that it is bounded) as $x \to 0$. Intuitively, g(x) approaches zero at the same speed as $\Delta x^n$. We say that g is of order n.
$$\mathcal{L}f(t,x) = \alpha(t,x)\,\frac{\partial f(t,x)}{\partial x} + \beta(t,x)\,\frac{\partial^2 f(t,x)}{\partial x^2} + \gamma(t,x)\,f(t,x)$$
for general functionals α, β and γ. Therefore the BS PDE will be of the general form
$$\frac{\partial f(t,S)}{\partial t} = \mathcal{L}f(t,S) \qquad (3.2)$$
Suppose that we work on a grid $x = \{x_j\}_{j=-\infty}^{+\infty}$, with constant grid spacing equal to $\Delta x$. We will concentrate on an initial value problem, and therefore as-
sume that the function extends over the whole real line. The problem of boundary
conditions will be addressed later in this chapter. We will also define the value
function at the grid points, fj (t) = f(t, xj ), j = −∞, . . . , +∞. We construct the
discretized operator by applying the differences $D_0$ and $D^2$.
In the above expression the functionals $\alpha_j$, $\beta_j$ and $\gamma_j$ are just the restrictions of α, β and γ on the grid point $x_j$. Substituting the difference operators gives
$$\frac{\partial f_j(t)}{\partial t} = L f_j(t) \iff \frac{\partial f_j(t)}{\partial t} = q_j^+(t)\,f_{j+1}(t) + q_j^0(t)\,f_j(t) + q_j^-(t)\,f_{j-1}(t) \qquad (3.3)$$
The functionals $q_j^\pm(t)$ and $q_j^0(t)$ depend on the structure of the PDE and are given by
$$q_j^\pm(t) = \pm\alpha_j(t)\,\frac{1}{2\Delta x} + \beta_j(t)\,\frac{1}{\Delta x^2}, \qquad q_j^0(t) = \gamma_j(t) - 2\beta_j(t)\,\frac{1}{\Delta x^2}$$
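For constant functionals the discretized operator is straightforward to assemble as a sparse tridiagonal matrix; the parameter values below are placeholders used only for illustration:

% assemble the discretized operator over an Nx-point grid
Nx = 200;  dx = 0.01;
alpha = 0.03;  beta = 0.02;  gamma = -0.05;  % constant functionals
qp = +alpha/(2*dx) + beta/dx^2;              % q-plus
qm = -alpha/(2*dx) + beta/dx^2;              % q-minus
q0 = gamma - 2*beta/dx^2;                    % q-zero
Q  = spdiags(ones(Nx,1)*[qm q0 qp], -1:1, Nx, Nx);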
There is a large number of solvers for such systems. We will consider methods
that apply time discretization as well, and therefore work on a two-dimensional
grid.
THE GRID
In equation (3.4) we converted the PDE in question into a system of infinitely many ODEs. Apparently it is not feasible in practice to numerically solve systems with an infinite number of equations. We will therefore need to truncate the grid and consider a subset with $N_x$ elements, $x = \{x_j\}_{j=1}^{N_x}$.
Figure 3.2 illustrates such a grid, together with a view of a function surface
that we could reconstruct over that grid. It is important to note that neither the
space nor the time grid have to be uniform. One can, and in some cases should,
consider non-uniform grids based on some qualitative properties of the PDE in
hand.
[Figures 3.2-3.3: the (t, x) grid with spacings Δt and Δx; in the explicit scheme the values $f_{j\pm1}^i$ and $f_j^i$ determine $f_j^{i+1}$.]
$$\frac{f_j^{i+1} - f_j^i}{\Delta t} = L f_j(t_i) = q_j^+ f_{j+1}^i + q_j^0 f_j^i + q_j^- f_{j-1}^i \qquad (3.5)$$
We can explicitly solve⁴ the above expression for $f_j^{i+1}$, which yields the recursive relationship
$$f_j^{i+1} = f_j^i + \Delta t\,\big(q_j^+ f_{j+1}^i + q_j^0 f_j^i + q_j^- f_{j-1}^i\big)$$
³ Just a lot messier.
⁴ Hence the name!
Essentially, the values $f_{j\pm1}^i$ and $f_j^i$ determine the next period’s value $f_j^{i+1}$. This is schematically depicted in figure 3.3. In matrix form, the updating takes place as
$$f^{i+1} = (I + Q\Delta t)\cdot f^i$$
Now we turn to the BS PDE (3.1) and apply this discretization scheme. To simplify the expressions we perform the change of variable x = log S. This will transform the PDE into one with constant coefficients, namely $\alpha = r - \sigma^2/2$, $\beta = \sigma^2/2$ and $\gamma = -r$, giving
$$q_i^\pm = \pm\frac{\alpha}{2\Delta x} + \frac{\sigma^2}{2\Delta x^2}, \qquad q_i^0 = -r - \frac{\sigma^2}{\Delta x^2}$$
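For concreteness, these constants are trivial to compute; the parameter values below are illustrative assumptions, not taken from the text:

% constant coefficients of the log-price BS equation
r = 0.05;  sigma = 0.20;  dx = 0.01;     % illustrative values
a  = r - 0.5*sigma^2;                    % the drift alpha of x = log S
qp = +a/(2*dx) + sigma^2/(2*dx^2);
qm = -a/(2*dx) + sigma^2/(2*dx^2);
q0 = -r - sigma^2/dx^2;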
Define the error between the FDM prices $f^i$ and the true solution $\tilde f^i$ at each time step as
$$\varepsilon^i = f^i - \tilde f^i$$
We can investigate the convergence $f^i \to \tilde f^i$ by inspecting the $\ell^\infty$-norm, namely⁵ $\|\varepsilon\| = \max_{j=1\ldots N_x}|f_j^i - f(t_i, x_j)|$. Apparently, if the maximum (absolute) value converges to zero, then all other values will do so as well, and the FDM prices will converge to the true ones.
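In code the convergence check is a one-liner; the two vectors below are illustrative stand-ins for the FDM and the exact prices:

% sup-norm of the FDM error at a given time step
f_true = [1.00; 0.52; 0.11];             % exact prices over the grid
f_fdm  = [1.01; 0.50; 0.12];             % FDM approximations
err = max(abs(f_fdm - f_true));          % l-infinity norm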
Before we move to the inspection of the global errors, we first examine the
local truncation error, defined as the discrepancy between the true parabolic
⁵ In some cases it is more convenient to work with the $\ell^1$-, $\ell^2$- or $\ell^p$-norm. The choice largely depends on the problem in hand. See XXXX for details.
PDE (3.2) and the approximated one (3.5), evaluated at the true pricing function at the point $(t_i, x_j)$
$$\tau_j^i = \left[\frac{\partial}{\partial t}f(t_i, x_j) - \mathcal{L}f(t_i, x_j)\right] - \left[\frac{f(t_{i+1}, x_j) - f(t_i, x_j)}{\Delta t} - L f(t_i, x_j)\right]$$
The definitions and the properties of the difference operators yield that the truncation error $\tau_j^i = o(\Delta t, \Delta x^2)$. We therefore say that the explicit method is
first order accurate in time and second order accurate in space. Intuitively, this
truncation error would tell us how errors will be created over one step, if we
start from the correct function values. Any scheme that offers order of accuracy
greater than zero is called consistent.
Of course, even if small errors are created over a given time step, they
can still accumulate as we move from one time step to the next. It is possible
that they produce feedback effects, producing errors that grow exponentially in
time, destroying the approximate solutions and creating oscillatory or explosive
behaviour. On the other hand we might construct a FDM that has errors that
behave in a “nice” way, without feedback effects. The notion of stability captures
these ideas.
One intuitive way of looking at stability is through the Courant-Friedrichs-Lewy (CFL) condition⁶, which is based on the notion of the domain of dependence. If we have a function f(t, S), then the domain of dependence of the point $(t^\star, S^\star)$ is the set of points whose values can affect $f(t^\star, S^\star)$.
The CFL criterion states that if a numerical scheme is stable, then the true
domain of dependence must be smaller than the domain of dependence of the
approximating scheme.
In parabolic PDEs the domain of dependence of the process is unbounded,
since information travels instantaneously across all values. The domain of de-
pendence of the explicit FDM is bounded, since each value at time t i+1 will only
depend on three of its neighbouring values at time $t_i$. Therefore, according to the CFL criterion, in order for the scheme to be stable the condition $\Delta t = o(\Delta x^2)$ must be satisfied.⁷ Therefore the explicit scheme will not be unconditionally stable, and will need very small time discretization steps to offer stability.
The connection between local errors, global errors and stability is given by
the Lax equivalence theorem which states that a FDM which is consistent and
stable will be convergent. This means that the explicit method is not (always)
convergent.
⁶ Stated in 1928, long before any stability issues were discussed in this context. Richardson initiated FDM schemes as far back as 1922 for weather prediction, but did not discover any stability problems.
⁷ This means that the time grid must become finer a lot faster than the space grid, for the information to rapidly reach remote values.
[Figure 3.4: the implicit scheme relates $f_{j\pm1}^{i+1}$ and $f_j^{i+1}$ at time $t_{i+1}$ to $f_j^i$ at time $t_i$.]
One way to overcome the stability issues is to use a backward time step.
Rather than taking a forward time step at time ti , we take a backward step from
time ti+1 . This is equivalent to computing the space derivatives at time t i+1 as
shown below
$$\frac{f_j^{i+1} - f_j^i}{\Delta t} = q_j^+ f_{j+1}^{i+1} + q_j^0 f_j^{i+1} + q_j^- f_{j-1}^{i+1}$$
This equation relates three quantities at time $t_{i+1}$ and one quantity at time $t_i$, which is schematically given in figure 3.4.
Since we are facing one equation with three unknowns we cannot explicitly
give a solution, but we can form a system.
$$-q_j^+\Delta t\,f_{j+1}^{i+1} + \big(1 - q_j^0\Delta t\big)f_j^{i+1} - q_j^-\Delta t\,f_{j-1}^{i+1} = f_j^i$$
Note that the number of system equations will be equal to the number of unknowns. In matrix form, the system can be written as
$$(I - Q\Delta t)\cdot f^{i+1} = f^i + g^{i+1}\Delta t$$
The same line of argument we used for the explicit method will give us the order of accuracy of the implicit scheme, the errors being again $o(\Delta t, \Delta x^2)$. On the other hand, since the value at time $t_{i+1}$ depends on the whole set of prices $f^i$ at time $t_i$, the domain of dependence of the implicit scheme is unbounded. From the CFL criterion it follows that the implicit scheme is unconditionally stable.
[Figure 3.5: the Crank-Nicolson stencil, combining the values at both time levels over half-steps Δt/2.]
BOUNDARIES
The above treatment of parabolic PDEs assumed that the space extends over
the real line. Essentially this implies that the matrices involved are of infinite
dimensions. Of course in practice we will be faced with finite grids. Sometimes,
as is the case with barrier options, boundary conditions will be explicitly imposed
by the nature of the derivative contract. In other cases, when the derivative has
early exercise features, the boundary is not explicitly defined and is free in the
sense that it is determined simultaneously with the solution of the PDE.
There are two different kinds of fixed boundary conditions: Dirichlet conditions set $f(t_B, x_B) = f_B$, that is, the value of the function is known on the boundary. Neumann conditions set $\frac{\partial f(t_B, x_B)}{\partial x} = \phi_B$, that is, the derivative is known on the boundary. In the second case we will need to devise an approximation scheme that exhibits $o(\Delta x^2)$ accuracy; if we do not achieve that, then contaminated values will diffuse and eventually corrupt the function values at all grid points. This means that we must use finite difference schemes that achieve $o(\Delta x^2)$, like central differences.
Say that we construct a finite space grid $\{x_i\}_{i=0}^{N_x}$, which essentially discretizes the interval $[x_0, x_{N_x}]$. Most of the elements of the matrix Q are not affected by the boundary conditions, and the matrix is still tridiagonal. The only parts that are determined by the fixed boundary conditions are the first and last rows. Thus Q will have the form
$$Q = \begin{pmatrix}
\mathsf{F} & \mathsf{F} & & & & & \\
q_1^- & q_1^0 & q_1^+ & & & & \\
& q_2^- & q_2^0 & q_2^+ & & & \\
& & \ddots & \ddots & \ddots & & \\
& & q_j^- & q_j^0 & q_j^+ & & \\
& & & \ddots & \ddots & \ddots & \\
& & & & q_{N_x-1}^- & q_{N_x-1}^0 & q_{N_x-1}^+ \\
& & & & & \mathsf{F} & \mathsf{F}
\end{pmatrix}$$
where the elements marked F in the first and last rows are the ones determined by the boundary conditions.
We start with a Dirichlet condition at $f_{N_x+1}^i = f(t_i, x_{N_x+1}) = f_B^i$. This point is utilized in the explicit scheme when the value $f_{N_x}^{i+1}$ at time $t_{i+1}$ is calculated. In particular
$$f_{N_x}^{i+1} = q_{N_x}^+\Delta t\,f_B^i + \big(1 + q_{N_x}^0\Delta t\big)f_{N_x}^i + q_{N_x}^-\Delta t\,f_{N_x-1}^i$$
Therefore, in matrix form, the updating equation for the explicit scheme becomes
$$f^{i+1} = (I + Q\Delta t)\cdot f^i + g^i\Delta t$$
These values can be used in the approximation schemes to set up the last row of Q, namely $(0, \cdots, 0,\ q_{N_x+1}^+ + q_{N_x+1}^-,\ q_{N_x+1}^0)$, and the last element of $g^i$ equal to $2\,q_{N_x+1}^+\,\phi_B^i\,\Delta x$. Similarly, a Neumann boundary condition at $x_0$ will set the first row of Q to $(q_0^0,\ q_0^+ + q_0^-,\ 0, \cdots, 0)$, and the first element of $g^i$ to $-2\,q_0^-\,\phi_B^i\,\Delta x$.
The initialization part of the PDE solver just decomposes the parameter structure and constructs the log-price and the time grids. The payoff function returns the payoff values, the boundary values and the derivatives on the boundaries. Therefore both Dirichlet and Neumann conditions can be accommodated. The tridiagonal Q matrix will be constructed according to whether we have specified Dirichlet or Neumann boundary conditions. We use the switch boundtype that keeps the boundary type as a two-element vector. At this stage we assume that the same boundary applies to all time steps. The Matlab code for the PDE solver is given in listing 3.3.
Here we use the Matlab backslash operator, A\B = inv(A)*B. The snippet in 3.4 illustrates how the function can be called to compute the price of a European put, and plots the pricing function.⁸ Setting the elements of boundtype to one will implement the PDE solver with Dirichlet boundary conditions.
⁸ We will implement the solver using Neumann conditions, and therefore we pass the boundary values as NaN. Actually, for the put price the corresponding Dirichlet boundary condition is not time homogeneous, and our solver will need slight modifications to accommodate time inhomogeneous boundaries.
LISTING 3.3:
pde_bs.m: θ-method solver for the Black-Scholes PDE.
% pde_bs.m
function [xv, tv, FT] = pde_bs(f, p)
theta = p.theta;                          % for theta-method
r     = p.r;                              % risk free rate
sigma = p.sigma;                          % volatility
a     = r - 0.5*sigma*sigma;              % drift of the log-price
T     = p.t;                              % maturity
Nt    = p.tnumber;                        % time intervals
dt    = T/Nt;                             % time grid size
tv    = (0:dt:T)';                        % time grid
bx    = p.xboundary;                      % max log-price
Nx    = p.xnumber;                        % log-price intervals
dx    = bx/Nx;                            % log-price grid size
xv    = (-bx:dx:bx)';                     % log-price grid
btype = p.boundtype;                      % boundary types (two-element vector)
[f0, bf, df] = feval(f, xv, p);           % payoff, boundaries, derivatives
Qp = +0.5*a/dx + 0.5*sigma*sigma/dx/dx;   % q-plus
Qm = -0.5*a/dx + 0.5*sigma*sigma/dx/dx;   % q-minus
Q0 = -r - sigma*sigma/dx/dx;              % q-zero
% matrix Q
N = length(xv);
Q = diag(Q0*ones(N,1)) + diag(Qp*ones(N-1,1),1) + diag(Qm*ones(N-1,1),-1);
g = zeros(N,1);                           % vector of constants
% boundary conditions
if btype(1)                               % Dirichlet at the bottom
    g(1) = Qm*bf(1);
else                                      % Neumann at the bottom
    Q(1,2) = Qp + Qm;  g(1) = -2*dx*Qm*df(1);
end
if btype(2)                               % Dirichlet at the top
    g(N) = Qp*bf(2);
else                                      % Neumann at the top
    Q(N,N-1) = Qp + Qm;  g(N) = 2*dx*Qp*df(2);
end
FT = zeros(N, Nt+1);                      % grid of results
FT(:,1) = f0;                             % initial condition
% loop through time with the theta-method
A = eye(N) - theta*Q*dt;
B = eye(N) + (1-theta)*Q*dt;
for tnx = 1:Nt
    FT(:,tnx+1) = A \ (B*FT(:,tnx) + g*dt);
end
LISTING 3.4:
pde_bs_impl.m: Implementation of the θ-method solver.
% pde_bs_impl.m
clear; clc;
p.theta     = 0.5;      % for theta-method
p.r         = 0.05;     % risk free rate
p.sigma     = 0.25;     % volatility
p.t         = 0.25;     % maturity
p.K         = 1.00;     % strike price
p.tnumber   = 50;       % time intervals
p.xboundary = 0.50;     % max log-price
p.xnumber   = 50;       % log-price intervals
p.boundtype = [0 0];    % boundary type (Neumann)
% call the PDE solver
[xv, tv, FT] = pde_bs(@f_put, p);
% make a 3D plot of the results
surf(exp(xv), tv, FT');
FIGURE 3.6: Early exercise region for an American put. The time-price space
is separated into two parts. If the boundary is crossed then exercise becomes
optimal.
[In the figure: the free boundary separates the early exercise region, where L f(t, S) > 0 and f(t, S) = Π(S), from the no-exercise region, where L f(t, S) = 0 and f(t, S) > Π(S); axes: time to maturity against log-price.]
(where the option can be exercised at a predefined set of times). With small
changes the PDE solver we constructed can take care of these features.
Essentially, the holder of the option has to make a decision at these time
points: exercise early and receive the intrinsic value, or wait and continue holding
the option. In terms of PDE jargon, the problem is now a free-boundary problem. There is a boundary, which is at this point unknown to us, that separates the region of (t, S) where early exercise is optimal from the region where it is optimal to wait. Figure 3.6 illustrates these regions. Thus, within the “waiting
optimal” region the BS PDE is satisfied, while outside the boundary f(t, S) will
be equal to the payoff function Π(S).
The boundary function is unknown, but it has a known property: it will
be the first point at which f(t, S) = Π(S). This follows from a no-arbitrage
argument that gives that the pricing function has to be smooth and not exhibit
discontinuities. In terms of the pricing function, it will satisfy
The BS PDE is satisfied within the no-exercise region, while the pricing function equals the payoff within the exercise region. Equation (3.10) reflects that: within the exercise region L f(t, S) > 0, while within the no-exercise region f(t, S) > Π(S). Equations (3.8-3.9) cover these possibilities.
This indicates that a strategy to compute the option price when early exercise is allowed will be to set
$$f_j^i = \max\big(\hat f_j^i,\ \Pi(S_j)\big)$$
where $\hat f$ is the price if no exercise takes place. Therefore the option holder’s strategy is implemented: the holder will compare the value of the option if she did not exercise with the price if she does; the option value will be the maximum of the two. Although the above approach is straightforward in the explicit method
case, it is not so in the other methods where a system has to be solved. In these
cases we are looking for solutions of a system subject to a set of inequality
conditions. In the most general θ-scheme, the system has the form
$$(I - \theta Q\Delta t)\cdot f^{i+1} \geq (I + (1-\theta)Q\Delta t)\cdot f^i + g\Delta t, \qquad f^{i+1} \geq \Pi(S)$$
where S is the vector of the grid prices of the underlying asset, the inequalities are taken element-wise, and for each element one of the two holds as an equality.
Such systems cannot be explicitly solved, but there are iterative methods, like the projected successive over-relaxation or PSOR method. Given a system, a starting value $x^{(0)}$, and a relaxation parameter ω ∈ (0, 2), the PSOR method updates⁹
$$x_i^{(k+1)} = \max\left( c_i,\ (1-\omega)x_i^{(k)} + \frac{\omega}{a_{ii}}\Big( b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)} \Big)\right)$$
The PSOR procedure is implemented in the short code given in 3.5. The programme will solve $Ax \geq b$ and $x \geq c$, while one of the two relations will hold as an equality for each element. The initial value is xinit. The function returns the solution vector x, and an indicator vector ex of the elements where the second equality holds; in our case the early exercise points. The θ-method solver has to be adjusted to accommodate the early exercise, and the code is given in 3.6. We introduce the relaxation parameter omega and call the PSOR procedure. We demand accuracy of $10^{-6}$, while we allow for 100 iterations to achieve that.
The snippet in listing 3.7 implements the pricing of a European and an American
put and examines the results. The strike price is $1.05. To make the differences
clearer the interest rate is set to 10%. The results are given in figure 3.7; the
American option prices approach the payoff function for small values of the spot
price, while European prices cross. Early exercise will be optimal if the spot
price is below $0.90, where the American prices touch the payoff function.
BARRIER FEATURES
European vanilla options (calls and puts) are exercised on maturity, and have
payoffs that depend on the final value of the underlying asset. Barrier options
have an extra feature: the option might not be active to maturity, depending on
whether or not the barrier has been triggered. Denote the barrier level with B.
The jargon for barrier options specifies the impact of the barrier as follows
• Up: there is an upper barrier, or Down: there is a lower barrier
• In: the contract is not activated before the barrier is triggered, or Out: if the
barrier is breached the contract is cancelled
Therefore we can have eight standard combinations:
{Up, Down} -and- {In, Out} {Calls, Puts}
⁹ A value 0 < ω < 1 corresponds to under-relaxation, ω = 1 is the Gauss-Seidel algorithm, while 1 < ω < 2 corresponds to over-relaxation. In our case we want to use a value that implements over-relaxation.
LISTING 3.5:
psor.m: PSOR method.
% psor.m
function [x, ex] = psor(A, b, c, xinit, omega, tol, maxit)
n    = length(b);            % length of vectors
x    = xinit;                % initial x
k    = 0;                    % number of iterations
flag = 1;                    % stopping flag
while flag
    k = k + 1;               % next iteration
    xinit = x;               % old value
    for i = 1:n              % update new value
        x(i) = max(c(i), x(i) + ...
            omega*(b(i) - A(i,:)*x)/A(i,i));
    end
    % change small enough or too many iterations
    if norm(x - xinit) < tol || k > maxit
        flag = 0;
    end
end
ex = (x == c) & (x > 0);     % the early exercise region
LISTING 3.6:
pde_bs_amer.m: θ-method solver with early exercise.
% pde_bs_amer.m
function [xv, tv, FT, EX] = pde_bs_amer(f, p)
omega = p.omega;                          % for PSOR
lines 3-37 of pde_bs.m
% exercise regions
EX = 0*FT;  EX(:,1) = (f0 > 0);
% loop through time
for tnx = 1:Nt
    % theta method
    A = eye(N) - theta*Q*dt;
    b = (eye(N) + (1-theta)*Q*dt)*FT(:,tnx) + g*dt;
    [w, ex] = psor(A, b, f0, FT(:,tnx), omega, 1e-6, 100);
    FT(:,tnx+1) = w;  EX(:,tnx+1) = ex;
end
[Figure 3.7: European and American put values and the payoff function, plotted against the asset price.]
LISTING 3.7: pde_bs_amer_impl.m: Pricing a European and an American put.
% pde_bs_amer_impl.m
lines 2-12 of pde_bs_impl.m
p.K     = 1.05;              % strike price for this experiment
p.r     = 0.10;              % risk free rate for this experiment
p.omega = 1.50;              % for PSOR
% call the PDE solver for the American put option
[xv, tv, FTA, EX] = pde_bs_amer(@f_put, p);
% call the PDE solver for the European put option
[xv, tv, FTE] = pde_bs(@f_put, p);
FTA = FTA(:,end);            % last American prices
FTE = FTE(:,end);            % last European prices
S   = exp(xv);               % asset prices
% plot option values and the payoff function
plot(S, FTA, S, FTE, S, max(p.K - S, 0));
Barrier options are examples of path dependent contracts, since the final
payoffs depend on the price path before maturity. This path dependence is con-
sidered mild, since we are not interested in the actual levels, but only in the behavior relative to the barrier, i.e. whether the barrier is triggered.
For example, consider an up-and-out call, where the spot price of the under-
lying is S0 = $85, the strike price is K = $105 and the barrier is at B = $120.
This contract will pay off only if the price of the underlying remains below $120
for the life of the option. If St > B at any t, then the payoffs (and the value of
the contract) become zero. One can see that we should expect this contract to
have some strange behaviour when the price is around the barrier level.
Contrast an up-and-in call with the same specifications. For the contract to
pay anything, the price has to reach at least St = $120 for some t (but might
drop in later times).
Now suppose that an investor holds both contracts, and observe that (for any sample path) the barrier can either be triggered or not. Thus, when one is active the other one is not. Holding both of them replicates the vanilla call. Therefore,
$$C_{\text{up-and-in}} + C_{\text{up-and-out}} = C_{\text{vanilla}}$$
In the above examples the barrier contract was monitored continuously. For
such contracts closed-form solutions exist. In practice though, barrier options are monitored discretely, that is to say one examines where the underlying spot
price is with respect to the barrier at a discrete set of points. For example a
barrier contract might be monitored on the closing of each Friday. Monitoring
can have a substantial impact on the pricing of barrier options. For that reason
numerical methods are employed to price barrier options.
An up-and-out option will follow the BS PDE, where a boundary will exist
at St = B (in fact f(t, S) = 0 for all S > B). This feature can be very easily
implemented in the finite difference schemes that we discussed. In particular,
the barrier will be active only on the monitoring dates, and a PDE with no barriers¹⁰ will be solved. Essentially, we can compute the updated values $f^{i+1}$ normally, and then impose the condition $f(t_i, x_j) = 0$ if $t_i$ is a monitoring date and $\exp(x_j) > B$.
A Matlab listing that implements pricing of up- and down-and-out calls and puts is given in 3.8. The snippet that calls this function is given in 3.9.
LISTING 3.8:
pde_bs_barr.m: Solver with barrier features.
% pde_bs_barr.m
function [xv, tv, FT] = pde_bs_barr(f, p)
barrT = p.barriertimes;        % barrier monitoring times
barrL = p.barrierlevel;        % barrier level
barrd = p.barrierdirection;    % up (1) or down (0)
% other stuff: lines 3-34 of pde_bs.m
% up or down multiplier
if barrd                       % up-and-out
    barr = (exp(xv) <= barrL);
else                           % down-and-out
    barr = (exp(xv) >= barrL);
end
% grid of results
FT = zeros(N, Nt+1);
FT(:,1) = f0 .* barr;          % initial condition
% loop through time
A = eye(N) - theta*Q*dt;
B = eye(N) + (1-theta)*Q*dt;
for tnx = 1:Nt
    FT(:,tnx+1) = A \ (B*FT(:,tnx) + g*dt);
    if any(abs(tv(tnx+1) - barrT) < 0.5*dt)  % monitoring date
        FT(:,tnx+1) = FT(:,tnx+1) .* barr;   % apply the barrier
    end
end
LISTING 3.9:
pde_bs_barr_impl.m: Implementation for a discretely monitored barrier option.
% pde_bs_barr_impl.m
p.barriertimes     = [0.05:0.05:0.25];   % barrier times
p.barrierlevel     = 1.20;               % barrier level
p.barrierdirection = 1;                  % up (1) or down (0)
lines 2-12 of pde_bs_impl.m
% call the PDE solver for barrier options
[xv, tv, FT] = pde_bs_barr(@f_put, p);
% create a surface plot
surf(exp(xv), tv, FT');
be equal to
$$\Delta = \frac{\partial f}{\partial S} = \exp(-x)\,\frac{\partial f}{\partial x}, \qquad \Gamma = \frac{\partial^2 f}{\partial S^2} = \exp(-2x)\left(\frac{\partial^2 f}{\partial x^2} - \frac{\partial f}{\partial x}\right)$$
The derivatives with respect to the log-price x can be computed using finite
differences on the grid (in fact they have been computed already when solving the
PDE). Note that, since we approximate all quantities using central differences,
the first and last grid points will be lost.
The snippet in 3.10 shows how the Greeks can be computed over a grid, while
figure 3.8 gives the output. In order to make clear the effect of early exercise
we use a relatively high interest rate of 10%. We also implement a relatively dense (100 × 100) grid over (t, S) to ensure that the derivatives are accurate.
Observe that the Deltas of both options approach their minimum values of −1
in a continuous way. The Gammas, on the other hand, show different patterns
with the American Gamma jumping to zero.
Even if we use a stable FDM method, like the Crank-Nicolson, computing the Greeks does not always give stable results. For example figure 3.9 presents the Greeks for the same American and European put options as figure 3.8, but with
which are magnified when the Gamma is numerically approximated. Note that
the instability is introduced by reducing the time steps; the log-price grid is
still based on 100 subintervals. In other cases explosive Greeks are an outcome
of the contract specifications. For instance a barrier option will exhibit Deltas
that behave very erratically around the barrier, since the pricing function is not
differentiable there.
LISTING 3.10:
pde_bs_greeks_impl.m: PDE approximations for the Greeks.
% pde_bs_greeks_impl.m
p.theta     = 0.5;       % for theta-method
p.omega     = 1.5;       % for PSOR
p.r         = 0.10;      % risk free rate
p.sigma     = 0.25;      % volatility
p.t         = 0.25;      % maturity
p.K         = 1.05;      % strike price
p.tnumber   = 100;       % time intervals
p.xboundary = 0.50;      % max log-price
p.xnumber   = 100;       % log-price intervals
p.boundtype = [0 0];     % boundary type
[xv, tv, FTA, EX] = pde_bs_amer(@f_put, p);
[xv, tv, FTE]     = pde_bs(@f_put, p);
A = FTA(:,end);          % current American prices
E = FTE(:,end);          % current European prices
x = xv(2:end-1);         % truncate grid
S = exp(x);              % spot prices
Dx = xv(2) - xv(1);      % log-price grid step
% first derivatives with respect to the log-price
Ax = (A(3:end) - A(1:end-2)) / (2*Dx);
Ex = (E(3:end) - E(1:end-2)) / (2*Dx);
% second derivatives with respect to the log-price
Axx = (A(3:end) - 2*A(2:end-1) + A(1:end-2)) / Dx^2;
Exx = (E(3:end) - 2*E(2:end-1) + E(1:end-2)) / Dx^2;
DA = Ax ./ S;                    % American Delta
DE = Ex ./ S;                    % European Delta
GA = (Axx - Ax) ./ S ./ S;       % American Gamma
GE = (Exx - Ex) ./ S ./ S;       % European Gamma
% plots of Deltas and Gammas
subplot(1,2,1); plot(S, DA, S, DE);
subplot(1,2,2); plot(S, GA, S, GE);
[Figure 3.8: Deltas (left) and Gammas (right) of the American and European puts, against the spot price.]
$$\mathcal{L}f = \alpha_x\frac{\partial f}{\partial x} + \alpha_y\frac{\partial f}{\partial y} + \beta_x\frac{\partial^2 f}{\partial x^2} + \beta_y\frac{\partial^2 f}{\partial y^2} + \beta_{xy}\frac{\partial^2 f}{\partial x\partial y} + \gamma f$$
$$f_{j\pm1,k\pm1} = f_{j,k} + (\pm\Delta x)\frac{\partial f_{j,k}}{\partial x} + (\pm\Delta y)\frac{\partial f_{j,k}}{\partial y} + \frac{1}{2}\frac{\partial^2 f_{j,k}}{\partial x^2}\Delta x^2 + \frac{1}{2}\frac{\partial^2 f_{j,k}}{\partial y^2}\Delta y^2 + (\pm\Delta x)(\pm\Delta y)\frac{\partial^2 f_{j,k}}{\partial x\partial y} + o(\Delta x^3, \Delta y^3)$$
The operator
[Figure 3.9: Deltas (left) and Gammas (right) when only 10 time steps are used: the approximations become unstable.]
This uses four points to approximate the cross derivative, but it is not the only way to do so.¹¹ In any case we can write the discretized operator
If we consider an (Nx , Ny )-point grid over (x, y), then we can construct the
matrix Q which will be (Nx × Ny , Nx × Ny ). The prices f(t, xj , yk ) actually form
a matrix F(t) for a given t, but we prefer to think of them as a vector f = f (t)
produced by stacking the columns of this matrix. Therefore, the price f(t, x j , yk )
will be mapped to the (k − 1)Nx + j element of f
$$f = \big( f(t,x_1,y_1),\ f(t,x_2,y_1),\ \ldots,\ f(t,x_{N_x},y_1),\ f(t,x_1,y_2),\ \ldots,\ f(t,x_{N_x},y_{N_y}) \big)^\top$$
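In Matlab the stacking and its inverse are each a single reshape; the grid sizes below are the ones used in figure 3.10:

% stack an Nx-by-Ny matrix of prices into a vector and map it back
Nx = 6;  Ny = 4;  F = rand(Nx, Ny);      % illustrative values
fvec = F(:);                             % fvec((k-1)*Nx + j) = F(j,k)
F2   = reshape(fvec, Nx, Ny);            % recovers F exactly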
¹¹ For example Ikonen and Toivanen (2004) give an alternative.
F F 0 0 0 0 F F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F F F 0 0 0 F F F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 F F F 0 0 0 F F F 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 F F F 0 0 0 F F F 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 F F F 0 0 0 F F F 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 F F 0 0 0 0 F F 0 0 0 0 0 0 0 0 0 0 0 0
F F 0 0 0 0 F F 0 0 0 0 F F 0 0 0 0 0 0 0 0 0 0
F F F 0 0 0 F F F 0 0 0 F F F 0 0 0 0 0 0 0 0 0
0 F F F 0 0 0 F F F 0 0 0 F F F 0 0 0 0 0 0 0 0
0 0 F F F 0 0 0 F F F 0 0 0 F F F 0 0 0 0 0 0 0
0 0 0 F F F 0 0 0 F F F 0 0 0 F F F 0 0 0 0 0 0
0 0 0 0 F F 0 0 0 0 F F 0 0 0 0 F F 0 0 0 0 0 0
0 0 0 0 0 0 F F 0 0 0 0 F F 0 0 0 0 F F 0 0 0 0
0 0 0 0 0 0 F F F 0 0 0 F F F 0 0 0 F F F 0 0 0
0 0 0 0 0 0 0 ♦ ♠ ♥ 0 0 0 ♣ F ♣ 0 0 0 ♥ ♠ ♦ 0 0
0 0 0 0 0 0 0 0 F F F 0 0 0 F F F 0 0 0 F F F 0
0 0 0 0 0 0 0 0 0 F F F 0 0 0 F F F 0 0 0 F F F
0 0 0 0 0 0 0 0 0 0 F F 0 0 0 0 F F 0 0 0 0 F F
0 0 0 0 0 0 0 0 0 0 0 0 F F 0 0 0 0 F F 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 F F F 0 0 0 F F F 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 F F F 0 0 0 F F F 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 F F F 0 0 0 F F F 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 F F F 0 0 0 F F F
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 F F 0 0 0 0 F F
Matrix Q will now be a block-tridiagonal matrix, with block elements that are
tridiagonal themselves. Also Q is a banded matrix, meaning that all elements
can be included within a band around the main diagonal. The structure is given
in figure 3.10 for an approximation that uses six points to discretize x and four
points to discretize y. The elements ♣ give the elements that reflect moves to
j ± 1, used for the derivatives with respect to x; the elements ♠ reflect moves
to k ± 1, used for the derivatives with respect to y; and the elements reflect
moves to (j ± 1, k ± 1), used for the cross derivative.
BOUNDARY CONDITIONS
The boundary conditions will have an effect on these matrices. In particular, the
first and last rows of all matrices will depend on boundary conditions on x. In
addition, all elements of the block matrices B and F will depend on the boundary
conditions imposed on y. The generic sum that Q implements is given by
$$\frac{\partial f_{j,k}}{\partial t} = q_{(-,-)}f_{j-1,k-1} + q_{(0,-)}f_{j,k-1} + q_{(+,-)}f_{j+1,k-1} + q_{(-,0)}f_{j-1,k} + q_{(0,0)}f_{j,k} + q_{(+,0)}f_{j+1,k} + q_{(-,+)}f_{j-1,k+1} + q_{(0,+)}f_{j,k+1} + q_{(+,+)}f_{j+1,k+1}$$
where the coefficients are given by the following quantities (with the elements that correspond to figure 3.10 also indicated)
(♣): $q_{(\pm,0)} = \pm\alpha_x/(2\Delta x) + \sigma_x^2/(2\Delta x^2)$
(♠): $q_{(0,\pm)} = \pm\alpha_y/(2\Delta y) + \sigma_y^2/(2\Delta y^2)$
(F): $q_{(0,0)} = -r - \sigma_x^2/\Delta x^2 - \sigma_y^2/\Delta y^2$
(♦): $q_{(+,+)} = q_{(-,-)} = +\rho\sigma_x\sigma_y/(4\Delta x\Delta y)$
(♥): $q_{(-,+)} = q_{(+,-)} = -\rho\sigma_x\sigma_y/(4\Delta x\Delta y)$
Boundary conditions will influence the first and last rows of each block, as
this is where the boundaries of x are positioned. The whole first and last blocks
will be also affected, since this is where the boundaries of y are positioned.
The first and last rows of these particular blocks will correspond to the corner
boundaries. Also, the boundaries will specify a matrix of constants G, just like
the vector of constants we constructed in the univariate case.
For Neumann conditions the elements (1, 2) and $(N_x, N_x-1)$ of each block are given by $q_{(+,\cdot)} + q_{(-,\cdot)}$. Of course a similar relationship will hold for all elements of the (1, 2) and $(N_y, N_y-1)$ blocks, which will have elements given by $q_{(\cdot,+)} + q_{(\cdot,-)}$. Apparently the (1, 2) and $(N_x, N_x-1)$ elements of these particular blocks will depend on both boundary conditions, and also on the boundary condition across the diagonal. The values for these elements will be given by $q_{(+,+)} + q_{(-,+)} + q_{(+,-)} + q_{(-,-)}$.
The elements of the matrix G will also be determined by the Neumann conditions, for $k = 2,\ldots,N_y-1$ and $j = 2,\ldots,N_x-1$. Say that $\phi_{x,(j,k)} = \frac{\partial f(x_j,y_k)}{\partial x}$, and $\phi_{y,(j,k)} = \frac{\partial f(x_j,y_k)}{\partial y}$.
The other four points have similar expressions. We will vectorize the constraints by stacking the columns of G into the vector g.
If we include the impact of the boundary conditions (and keep in mind that
they might be time varying), the system of ODEs that will give us an approximate
solution to the two-dimensional PDE is now given by
$$\frac{\partial f(t)}{\partial t} = Q\cdot f(t) + g(t) \qquad (3.12)$$
If the boundary conditions are homogeneous, g(t) = g, then the solution of the system is
$$f(t) = \exp(Qt)\,f(0) + Q^{-1}\big(\exp(Qt) - I\big)\,g$$
Once again, if the boundaries are homogeneous in time, the scheme can be written as
$$(I - \theta Q\Delta t)\cdot f^{i+1} = (I + (1-\theta)Q\Delta t)\cdot f^i + g\Delta t$$
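A single step of this scheme is then one sparse linear solve; the toy operator below is a stand-in for the full two-dimensional Q, used only for illustration:

% one theta-method time step (toy 4-point system as a stand-in for Q)
theta = 0.5;  dt = 0.01;
Q = spdiags(ones(4,1)*[1 -2 1], -1:1, 4, 4);   % stand-in operator
g = zeros(4,1);  fold = [0; 1; 1; 0];          % stand-in constants and prices
I = speye(size(Q));
fnew = (I - theta*dt*Q) \ ((I + (1-theta)*dt*Q)*fold + g*dt);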
In theory solving this system does not present any difficulties, but in practice it might not be feasible since Q is not tridiagonal. For that reason a number of
alternating direction implicit (ADI) and local one-dimensional (LOD, also known
as Soviet splitting) schemes are typically used. Such schemes do not solve over
all dimensions simultaneously, but instead split each time step into substeps,
and assume that over each substep the system moves across a single direction.
Therefore at each substep one has to solve a system that is indeed tridiagonal.
where we have defined $\beta_x^\star = \beta_x - \frac{\beta_{xy}}{2}\frac{\Delta x}{\Delta y}$ and $\beta_y^\star = \beta_y - \frac{\beta_{xy}}{2}\frac{\Delta y}{\Delta x}$, to save some space. Now we can put down the approximation
$$\Big[1 - \alpha_x D_x - \beta_x^\star D_x^2 - \frac{\gamma}{3}\Big]\Big[1 - \beta_{xy}D_{xy}^2 - \frac{\gamma}{3}\Big]\Big[1 - \alpha_y D_y - \beta_y^\star D_y^2 - \frac{\gamma}{3}\Big]f^{i+1}$$
$$= \Big[1 + \alpha_x D_x + \beta_x^\star D_x^2 + \frac{\gamma}{3}\Big]\Big[1 + \beta_{xy}D_{xy}^2 + \frac{\gamma}{3}\Big]\Big[1 + \alpha_y D_y + \beta_y^\star D_y^2 + \frac{\gamma}{3}\Big]f^i$$
It is tedious to go through the algebra, but one can show that the approxi-
mation of the operators is at least of second order in time and both directions.
Therefore the results are not expected to deteriorate due to this operator split-
ting. In the Peaceman and Rachford (1955) scheme we implement the
following three steps, solving for auxiliary values $f^\star$ and $f^{\star\star}$
$$\Big[1 - \alpha_y D_y - \beta_y^\star D_y^2 - \frac{\gamma}{3}\Big] f^\star = \Big[1 + \alpha_y D_y + \beta_y^\star D_y^2 + \frac{\gamma}{3}\Big] f^i$$
$$\Big[1 - \beta_{xy} D_{xy}^2 - \frac{\gamma}{3}\Big] f^{\star\star} = \Big[1 + \beta_{xy} D_{xy}^2 + \frac{\gamma}{3}\Big] f^\star$$
$$\Big[1 - \alpha_x D_x - \beta_x^\star D_x^2 - \frac{\gamma}{3}\Big] f^{i+1} = \Big[1 + \alpha_x D_x + \beta_x^\star D_x^2 + \frac{\gamma}{3}\Big] f^{\star\star}$$
For the D’yakonov scheme (see Marchuk, 1990; McKee, Wall, and Wilson,
1996) we use a slightly different splitting where at the first step we produce the
complete right-hand-side
$$\Big[1 - \alpha_y D_y - \beta_y^\star D_y^2 - \frac{\gamma}{3}\Big] f^\star = \Big[1 + \alpha_x D_x + \beta_x^\star D_x^2 + \frac{\gamma}{3}\Big]\Big[1 + \beta_{xy} D_{xy}^2 + \frac{\gamma}{3}\Big]\Big[1 + \alpha_y D_y + \beta_y^\star D_y^2 + \frac{\gamma}{3}\Big] f^i$$
$$\Big[1 - \beta_{xy} D_{xy}^2 - \frac{\gamma}{3}\Big] f^{\star\star} = f^\star$$
$$\Big[1 - \alpha_x D_x - \beta_x^\star D_x^2 - \frac{\gamma}{3}\Big] f^{i+1} = f^{\star\star}$$
In both cases the operations are implemented using matrices that can be cast
in tridiagonal form by permutations of their elements. In the multidimensional
PDE problems, one has to take special care when dealing with the boundary
conditions, as it may be confusing. Also, some decisions have to be made on the
corners, which are affected by boundary conditions on more than one dimension.
We will make the assumption that both assets follow geometric Brownian motions, with correlation parameter ρ. The pricing function will then satisfy the two-dimensional version of the BS PDE.
[Figure: the value surface of the two-asset option over the (x, y) grid.]
LISTING 3.11: Payoff and boundaries for a two-asset option.
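As an illustration of what such a function can look like, here is a sketch that assumes, purely for concreteness, a call on the maximum of the two assets with strike p.K, with the boundary values set equal to the payoff; this is not necessarily the contract used in the original listing.

% payoff and boundary values for a two-asset option (illustrative sketch)
% Sx, Sy: grid vectors for the two assets; p.K: strike (assumed field)
function [F, bc] = two_asset_payoff(Sx, Sy, p)
[X, Y]   = meshgrid(Sx, Sy);             % X: Sx across rows, Y: Sy down columns
F        = max(max(X, Y) - p.K, 0);      % call on the maximum of two assets
bc.xlow  = F(:, 1);   bc.xhigh = F(:, end);   % values on the x boundaries
bc.ylow  = F(1, :);   bc.yhigh = F(end, :);   % values on the y boundaries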
3.6 EXTENSIONS
Apart from the contracts and the techniques we discussed, there is a very large number of exotic options with features that can be implemented within the PDE framework. Sometimes we will need to extend the dimensionality of the problem to accommodate these special features. For example, in many cases a rebate is offered when the barrier is triggered. This makes sure that breaching the barrier will not leave you empty handed. It is straightforward to handle such rebates in the finite differences procedure.
Other contracts attempt to cushion the barrier effect and the discontinuities it creates. For example, in Parisian options the barrier is triggered only if the barrier remains breached for a given (cumulative) time. To solve for this option we need to introduce an extra variable, namely the cumulative time that the barrier has been breached, say τ.
LISTING 3.12: Solver for a two dimensional PDE (part I).
LISTING 3.13: Solver for a two dimensional PDE (part II).
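The core of any such solver is the assembly of the two-dimensional operators that act on the stacked vector f. A minimal sketch using Kronecker products is given below; it assumes uniform grids, constant coefficients, boundary rows left untreated, and the x index running fastest in the stacked vector.

% assembling 2-D difference operators with Kronecker products (sketch)
Nx = 50; Ny = 40; dx = 0.1; dy = 0.1;     % illustrative grid sizes
ex  = ones(Nx, 1); ey = ones(Ny, 1);
Dx  = spdiags([-ex ex], [-1 1], Nx, Nx) / (2*dx);    % d/dx, centered
Dxx = spdiags([ex -2*ex ex], -1:1, Nx, Nx) / dx^2;   % d2/dx2
Dy  = spdiags([-ey ey], [-1 1], Ny, Ny) / (2*dy);    % d/dy, centered
Dyy = spdiags([ey -2*ey ey], -1:1, Ny, Ny) / dy^2;   % d2/dy2
Ax  = kron(speye(Ny), Dxx);    % acts along x on the stacked vector
Ay  = kron(Dyy, speye(Nx));    % acts along y on the stacked vector
Axy = kron(Dy, Dx);            % mixed derivative d2/dxdy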
Clearly, the derivative price will now be a function f(S, t, τ). Also, τ will evolve as dτ = dt if St > B, and dτ = 0 otherwise. The price will satisfy a different PDE within each domain
St < B :  ft(S, t, τ) + rS fS(S, t, τ) + ½σ²S² fSS(S, t, τ) = rf(S, t, τ)
St > B :  ft(S, t, τ) + fτ(S, t, τ) + rS fS(S, t, τ) + ½σ²S² fSS(S, t, τ) = rf(S, t, τ)
LISTING 3.14: Implementation of the two dimensional solver.
To solve for this contract we would need a grid over a 3-D region, and of course a more complex set of boundary conditions needs to be specified.
Another group of problems that can be attacked using PDEs arises when single asset models with more than one factor are considered. For example, one might want to price derivative contracts under the Heston (1993) stochastic volatility model, where
dS(t) = µS(t)dt + √v(t) S(t) dBs(t)
dv(t) = κ[v̄ − v(t)]dt + φ√v(t) dBv(t)
dBs(t) dBv(t) = ρdt
The PDE approach can also be applied in such a setting, although as the dimensionality increases the implementation becomes infeasible (and simulation-based methods are typically preferred).
Following the success of the Black and Scholes (1973) model on pricing and
hedging derivative contracts, there has been a surge of research on models that
can capture the stylized facts of asset and derivative markets. Although the BS
paradigm is elegant and intuitive, it still maintains a number of assumptions
that are too restrictive. In particular, the assumption of identically distributed
and independent Gaussian innovations clearly contradicts empirical evidence.
When developing specifications that relax these assumptions, academics and
practitioners alike discovered that apart from the BS case, very few models offer
European option prices in closed form. Being able to rapidly compute European
call and put prices is paramount, since typically a theoretical model will be
calibrated on a set of prices that come from options markets. The parameter
values retrieved from this calibration will be used to price and devise hedging
strategies for more exotic contracts.
It turned out that, in many interesting cases, even though derivative prices or
the risk-neutral density cannot be explicitly computed, the characteristic function
of the log-returns is tractable. Based on this quantity, researchers did indeed link
the characteristic function to the European call and put price, via an application
of Fourier transforms (see Heston, 1993; Bates, 1998; Madan, Carr, and Chang,
1998; Carr and Madan, 1999; Duffie, Pan, and Singleton, 2000; Bakshi and
Madan, 2000, inter alia for different modeling approaches).
This is called the risk-neutral or risk adjusted probability measure. This measure
need not be unique, given the current set of bond and asset prices, unless the
market is complete, but all derivative contracts will have a no-arbitrage price
that is equal to their discounted expected payoffs under this measure. That is to say, a European call option will satisfy
Pcall = B(T) · E[(S(T) − K)⁺]
Under the BS assumptions Q will be unique, and X(T ) will follow a Gaussian
distribution under both P and Q. Under more general assumptions this need
not be the case. Since we are interested in the pricing of derivatives we are
going to ignore the true probability measure from now on, and focus instead
on the qualities and characteristics of the risk-neutral measure. Therefore all
expectations are assumed to be under Q, unless explicitly stated otherwise.
FOURIER TRANSFORMS
One of the most important tools for solving PDEs is the Fourier transform of
a function f(x). In particular, we define as the Fourier transform of f(x) a new
function φ(u), such that
F[f](u) = φ(u) = ∫_R exp(iux) f(x) dx
where i = √−1 is the imaginary unit. It turns out that each function f defines a unique transform φ, and this transform is invertible: if we are given φ we can retrieve the original function f, using the inverse Fourier transform
F⁻¹[φ](x) = f(x) = (1/2π) ∫_R exp(−iux) φ(u) du
There can be some confusion, as different disciplines define the Fourier transform slightly differently, setting exp(±iux) the other way round, or multiplying both integrals with 1/√(2π) to produce symmetric expressions. Here we use the definition that Matlab implements, but one always has to verify what a computer language offers.
Fourier transforms have some properties that make them invaluable tools for the solution of differential equations, the most important being that the transform is a linear operator,
F[af + bg](u) = a F[f](u) + b F[g](u)
CHARACTERISTIC FUNCTIONS
If f is the probability density function of a random variable, say X(t), then its Fourier transform is called the characteristic function of the random variable. It is also convenient to represent the characteristic function as an expectation, namely
φ(t, u) = E exp(iuX(t))
Characteristic functions are typically covered in most statistics textbooks.1 Since functions and their Fourier transforms uniquely define each other, the
1
A good reference for characteristic functions and their properties is Kendall and Stuart (1977, ch. 4).
characteristic function will have enough information to uniquely define the prob-
ability distribution of the random variable. In particular, the inverse Fourier
transform will determine the probability density function.
In many cases it is tractable to solve for the characteristic function of a
random variable or a process, rather than the probability density itself. A large
and very flexible class of processes, the Lévy processes, are in fact defined
through their characteristic functions.
Characteristic functions have more important properties. By taking deriva-
tives at the origin u = 0, one can retrieve successive moments of the random
variable, as
E[X(t)ⁿ] = i⁻ⁿ ∂ⁿφ(t, u)/∂uⁿ |_{u=0}
This means that qualitative properties of the distribution, such as the volatility,
skewness and kurtosis can be ascertained directly from the characteristic func-
tion. In addition, it becomes straightforward to implement calibration methods
that are based on the moments.
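As a small illustration, the first two moments can be approximated numerically by central differences of the characteristic function at u = 0. The sketch below assumes a normal characteristic function, written inline for the example.

% moments from the characteristic function via central differences (sketch)
phi = @(u) exp(1i*0.08*u - 0.5*0.25^2*u.^2);  % normal: mean 8%, vol 25%
du  = 1e-4;                                   % differentiation step
m1  = real((phi(du) - phi(-du)) / (2*1i*du));           % E[X]
m2  = real(-(phi(du) - 2*phi(0) + phi(-du)) / du^2);    % E[X^2]
fprintf('mean = %g, variance = %g\n', m1, m2 - m1^2);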
The characteristic function has the property φ(t, −u) = φ̄(t, u), with z̄ denoting the complex conjugate of z. Thus, the real part is an even function over u, while the
imaginary part is odd. This is in line with the fact that the probability density
is a real valued function, since to achieve that when integrating over the real
line the imaginary parts must cancel out. One can use this property to write the
Fourier inversion that recovers the probability density function as
f(t, x) = (1/π) ∫_0^∞ Re[exp(−iux) φ(t, u)] du
The cumulative density function is of course (the function ℵ(·) is the indicator function)
F(t, x) = P[X(t) ≤ x] = E[ℵ(X(t) ≤ x)] = ∫_{−∞}^{x} f(t, s) ds
It is also possible to recover the cumulative density function, from the charac-
teristic function
F(t, x) = 1/2 + (1/2π) ∫_0^∞ [exp(iux)φ(t, −u) − exp(−iux)φ(t, u)]/(iu) du
= 1/2 − (1/π) ∫_0^∞ Re[exp(−iux)φ(t, u)/(iu)] du
The order of integration can be reversed as follows (details on how exactly this is carried out can be found in the next section, where the same approach is implemented in an option pricing framework)
φη(t, u) = ∫_R ∫_z^∞ exp((iu − η)x) f(t, z) dx dz
= ∫_R [exp((iu − η)z)/(η − iu)] f(t, z) dz = φ(t, u + iη)/(η − iu)
We can therefore compute the cumulative probability by “un-damping” this characteristic function, in effect computing
F(t, x) = exp(ηx) F^η(t, x) = (exp(ηx)/2π) ∫_R exp(−iux) φη(t, u) du
FIGURE 4.1: Damping the Fourier transform to avoid the singularity at the origin. The integrand for the normal inverse Gaussian distribution with parameters {µ, α, β, δ} = {8%, 7.00, −3.50, 0.25} is presented for different values of the damping parameter η. The real (imaginary) part is given in blue (green). The dashed thick line gives the integrand for η = 0.01 ≈ 0, which diverges at zero. The solid thick line presents the integrand for η = 1, while the solid thin line assumes η = 10. Two different horizons of one day and one month are presented, to illustrate the change in the tail behavior as the maturity is decreased.
We specify that this relationship holds under the risk neutral measure because it is most likely that the market we are working in is incomplete. If this is the case, then we are not able to specify the no-arbitrage prices of options solely on the information embedded in (4.2) that is specified under P. We need a change of measure technique,2 as well as a number of preference parameters that will allow us to determine the equivalent measure Q. In order to sidestep these issues we can assume that the process in (4.2) is defined under Q. The only constraint that must be imposed is that the expectation of the discounted asset price equals its current value,
B(T) · E[S(T)] = S(0) ⟹ E[exp(X(T))] = 1/B(T)
If we assume that the characteristic function φ(T, u) of log S(T) is given to us, then the above constraint can be expressed as a constraint on the characteristic function, that is to say
φ(T, −i) = 1/B(T)
There have been two methods that compute European calls and puts through
the characteristic function. Following the seminal work of Bakshi and Madan
2
For example Girsanov’s theorem (Øksendal, 2003), or the Esscher transform (Gerber
and Shiu, 1994), can be used to define equivalent martingale measures.
The second integral is just the probability P[log S(T) > log K], and since φ(t, u) is the characteristic function of log S(T) this will be equal to
Π₂ = ∫_{log K}^∞ f(T, x) dx = 1/2 + (1/π) ∫_0^∞ Re[exp(−iu log K) φ(T, u)/(iu)] du
To compute the first integral we use the trick of multiplying and dividing the expression as follows
∫_{log K}^∞ exp(x) f(T, x) dx = [∫_{log K}^∞ exp(x) f(T, x) dx / ∫_{−∞}^∞ exp(x) f(T, x) dx] · ∫_{−∞}^∞ exp(x) f(T, x) dx    (4.3)
If we define the normalized density
f⋆(T, x) = exp(x) f(T, x) / ∫_{−∞}^∞ exp(x) f(T, x) dx
then the fraction in (4.3) can be expressed as ∫_{log K}^∞ f⋆(T, x) dx. The Fourier transform of f⋆(T, x) is given by
φ⋆(T, u) = ∫_R exp(iux) f⋆(T, x) dx = φ(T, u − i)/φ(T, −i)
Putting everything together will yield the European call option price, which
has the same structure as the Black-Scholes formula, where instead of the cu-
mulative normal values we have Π1 and Π2 . To summarize
FIGURE 4.2: The region of integration: {k ∈ (−∞, +∞), x ∈ (k, +∞)}, or equivalently {x ∈ (−∞, +∞), k ∈ (−∞, x)}.
The Fourier transform of the damped call price is
ψη(T, u) = ∫_R exp(iuk) exp(ηk) Pcall(k) dk
= B(T) ∫_R exp(iuk) exp(ηk) ∫_k^∞ (exp(x) − exp(k)) f(T, x) dx dk
We will change the order of integration, and therefore the integration limits
will change from (k, x) ∈ (−∞, +∞) × (k, +∞) to (x, k) ∈ (−∞, +∞) × (−∞, x),
as shown in figure 4.2. Then
ψη(T, u) = B(T) ∫_R ∫_{−∞}^x exp(iuk + ηk + x) f(T, x) dk dx
− B(T) ∫_R ∫_{−∞}^x exp(iuk + ηk + k) f(T, x) dk dx
LISTING 4.1: phi_normal.m: Characteristic function of the normal distribution.
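A minimal sketch of what this routine can look like, written directly from the formula for φ(t, u) given below (the structure p is assumed to carry the fields t, r and sigma):

% phi_normal.m (sketch): characteristic function of the normal model
function y = phi_normal(u, p)
t     = p.t;                                   % horizon
r     = p.r;                                   % risk free rate
sigma = p.sigma;                               % volatility
a     = r - 0.5*sigma^2;                       % risk-neutral drift
y     = exp(1i*t*a*u - 0.5*t*sigma^2*u.^2);    % phi(t,u)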
To retrieve the original call price we just need to apply the inverse Fourier transform on ψη(T, u)
Pcall(k) = F⁻¹[ψη](k) = (exp(−ηk)/2π) ∫_R exp(−iuk) ψη(T, u) du
Option prices are of course real numbers, and that implies that the Fourier
transform ψ η (T , u) must have odd imaginary and even real parts. Therefore we
can simplify the pricing formula to
Pcall(k) = (exp(−ηk)/π) ∫_0^∞ Re[exp(−iuk) ψη(T, u)] du    (4.4)
The choice of the parameter η determines how fast the integrand approaches zero. Admissible values for η are the ones for which |ψη(T, 0)| < ∞, which in turn implies that E[S(T)^{η+1}] < ∞, or equivalently that the (η + 1)-th moment exists and is finite. For more information on the choice of η see Carr and Madan (1999) and Lee (2004b).
LISTING 4.2: phi_nig.m: Characteristic function of the normal inverse Gaussian distribution.
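A sketch along the same lines, written from the NIG formula given below (the fields p.alpha, p.beta, p.delta are assumed):

% phi_nig.m (sketch): characteristic function of the NIG model
function y = phi_nig(u, p)
t = p.t; r = p.r;
alpha = p.alpha; beta = p.beta; delta = p.delta;
a = r - delta*sqrt(alpha^2 - beta^2) ...
      + delta*sqrt(alpha^2 - (beta+1)^2);      % martingale correction
y = exp(1i*t*a*u + t*delta*sqrt(alpha^2 - beta^2) ...
      - t*delta*sqrt(alpha^2 - (beta + 1i*u).^2));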
The expectation can be cast in terms of the characteristic function of X(T), giving the constraint
a = r − (1/T) log φ(T, −i)
This constraint will ensure that under risk-neutrality the asset will grow at the same rate as the risk free asset.
The characteristic function for the normal distribution, implemented in listing 4.1, is given by
φ(t, u) = exp(itau − ½tσ²u²)
for a = r − ½σ². The characteristic function of the NIG distribution is given in listing 4.2
φ(t, u) = exp(itau + tδ√(α² − β²) − tδ√(α² − (β + iu)²))
In this case the parameter a = r − δ√(α² − β²) + δ√(α² − (β + 1)²).
The integral will be approximated with a quadrature, and here we will use the
trapezoidal rule.
FIGURE 4.3: Numerical Fourier inversion using quadrature. The integral ∫_0^∞ Re[exp(−iux)φ(T, u)] du is approximated, where φ(T, u) is the characteristic function of the normal distribution, with µ = 8%, σ = 25% and T = 30/365. The upper integration bound is ū = 50. Results for x = 5% and x = 15%, as well as Δu = 10 and Δu = 5, are given.
(a) x = 5%, Δu = 10 (b) x = 5%, Δu = 5
In particular, we start by truncating the interval [0, ∞), over which the characteristic function is integrated. We select a point ū that is large enough for the contribution of the integral after this point to be negligible. Then we discretize the interval [0, ū] into N subintervals with spacing Δu, that is we produce the points u = {u_j = jΔu : j = 0, . . . , N}. For a given maturity T we denote the integrand with h⋆(x, u) = exp(−iux)h(u), and produce the values at the grid points h_j(x) = h⋆(x, u_j). Then, the trapezoidal approximation to the integral is given by
∫_0^∞ h(x, u) du ≈ Σ_{j=0}^{N} h_j(x) Δu − ½ [h_0(x) + h_N(x)] Δu
Therefore, in order to carry out the numerical integration, one has to make two ad hoc choices, namely the upper integration bound ū and the grid spacing Δu. Selecting ū can be guided by the speed of decay of the characteristic
LISTING 4.3: cf_int.m: Trapezoidal integration of a characteristic function.
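A minimal sketch of such a script, using the phi_normal sketch given earlier and recovering the density on a grid of log-returns:

% cf_int.m (sketch): density recovery by trapezoidal Fourier inversion
p.t = 30/365; p.r = 0.08; p.sigma = 0.25;  % illustrative parameters
ubar = 200; du = 0.5;                      % truncation point and spacing
u    = 0:du:ubar;                          % quadrature abscissas
x    = -0.5:0.01:0.5;                      % log-returns of interest
f    = zeros(size(x));
for k = 1:length(x)
    h    = real(exp(-1i*u*x(k)) .* phi_normal(u, p));  % integrand values
    f(k) = (sum(h)*du - 0.5*(h(1)+h(end))*du) / pi;    % trapezoidal rule
end
plot(x, f)                                 % recovered density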
function. A good choice of Δu on the other hand can be a little trickier, since the quantity exp{iux} = cos(ux) + i sin(ux) will be oscillatory, with a frequency that increases with x. Figure 4.3 gives the quadrature approximations for the case of the normal distribution. The characteristic function corresponds to a density with mean µ = 8% p.a. and volatility σ = 20% p.a., and the maturity is one month, T = 30/365. We have set the upper quadrature bound to ū = 50, and use two different grid sizes, a “coarse” one, Δu = 10, and a “fine” one, Δu = 5. We investigate the integration for x = 5% and for x = 15%. In the first case the integrand is not oscillatory and even the “coarse” approximation captures the integral fairly accurately. When x = 15% the integrand oscillates and a “finer” grid is required. This example illustrates that one must be careful and cautious when setting up numerical integration procedures that automatically select the values for ū and Δu.
In order to reconstruct the probability density function we need to repeat the procedure outlined above for different values of x. This is carried out in listing 4.3 for the normal distribution. The results are plotted in figure 4.4 in logarithmic scale for different values of ū and Δu. One can verify that if we are interested in the central part of the distribution, then a coarse grid is sufficient, while the results are not particularly sensitive to the choice of the upper integration bound. On the other hand, if we want to compute the probability density at the tails, then we need to implement a very fine grid over a large support interval.
There is a distinct and very important relationship between the fat tails and the decay of the characteristic function. In particular, the higher the kurtosis of the distribution, the slower the characteristic function decays towards zero as u increases. This has some implications on the implementation of the numerical inversion.
FIGURE 4.4: The recovered probability density function in logarithmic scale, for different values of ū and Δu. The reconstruction can still be inaccurate around the tails; we need to reduce the grid size to increase the overall accuracy. Observe that the right tail is slightly oscillatory even when {Δu, ū} = {1, 400}.
This integral can then be used to retrieve the probability density function at
the point x, or a European call option price with log-strike price x. Typically,
we want to compute the integral for many different values of the parameter x,
in order to reconstruct the probability density function or the implied volatility
smile. Using the approach we outlined above, we must perform as many numerical integrations as the number of abscissas over x.
for all k = 1, . . . , N. The number of operations needed for the FFT is of order O(N log N). For comparison, if we wanted to compute the above sums separately and independently it would take O(N²) operations, meaning that in order to double the number of points the number of operations would increase fourfold. With the FFT the computational burden increases a bit more than two times. This substantial speedup is the reason that has made the FFT popular in computational finance, since we typically need to evaluate thousands of Fourier inversions when calibrating models to observed volatility surfaces.
The input of the FFT is a vector z ∈ CN , and the output is a new vector
z ? ∈ CN . Each element of z ? will keep the sum for the corresponding value of
k. Our task is therefore to cast the integral approximation in a form that can be
computed using FFT.
The first step is of course to discretize the interval [0, ū] using N equidistant points, and say that we set u = {(j − 1)Δu : j = 1, . . . , N}. Therefore the trapezoidal approximation to the integral is given by
½ exp(−iu_1 x) h(u_1) Δu + exp(−iu_2 x) h(u_2) Δu + exp(−iu_3 x) h(u_3) Δu + · · ·
+ exp(−iu_{N−1} x) h(u_{N−1}) Δu + ½ exp(−iu_N x) h(u_N) Δu
Thus, if we set α = (½, 1, 1, . . . , 1, ½)′ and h_j = h(u_j), we can write the approximation as the sum
Σ_{j=1}^{N} exp(−iu_j x) α_j h_j Δu
Since the FFT will also return an (N × 1) vector, we should set the procedure to produce values for a set x = {x_1 + (k − 1)Δx : k = 1, . . . , N}. We typically want these values to be symmetric around zero,4 and therefore we can set x_1 = −(N/2)Δx. The approximating sum for these values of x will therefore be given by
z⋆_k = Σ_{j=1}^{N} exp(−iu_j x_k) α_j h_j Δu = Σ_{j=1}^{N} exp(−iΔuΔx(j − 1)(k − 1)) exp(−iu_j x_1) α_j h_j Δu
which is of the FFT form above when the grid spacings are tied by ΔuΔx = 2π/N.
4
When we invert to construct a probability density we are typically interested in the density at log-returns symmetric around the peak, which will be close to zero. If we invert for option pricing purposes, we can normalize the current price to one. Then each value of x will correspond to a log-strike price, and we typically want to retrieve option prices which are in-, at- and out-of-the-money. The at-the-money level will be around the current log-price, which is of course zero.
LISTING 4.4: fft_call.m: Call pricing using the FFT.
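A minimal sketch of such a pricer is given below. It assumes the spot is normalized to S(0) = 1, takes a handle phi(u) for the characteristic function of log S(T), and uses the standard damped-transform expression together with the FFT grid constraint ΔuΔk = 2π/N; the function name and its interface are illustrative.

% FFT call pricing (sketch), in the spirit of Carr and Madan (1999)
function [k, c] = fft_call_sketch(phi, r, T, eta, N, du)
dk  = 2*pi/(N*du);              % log-strike spacing forced by the FFT
b   = N*dk/2;                   % log-strikes will span [-b, b)
u   = (0:N-1)*du;               % integration grid
k   = -b + (0:N-1)*dk;          % output log-strike grid
psi = exp(-r*T) * phi(u - (eta+1)*1i) ./ ...
      (eta^2 + eta - u.^2 + 1i*(2*eta+1)*u);   % damped call transform
w   = ones(1, N); w([1 N]) = 0.5;              % trapezoidal weights
z   = exp(1i*u*b) .* psi .* w * du;            % shift so that k(1) = -b
c   = exp(-eta*k)/pi .* real(fft(z));          % call prices on the grid

For example, with the phi_normal sketch above one could call [k, c] = fft_call_sketch(@(v) phi_normal(v, p), p.r, p.t, 1.5, 4096, 0.25).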
LISTING 4.5: frft.m: Fractional FFT.
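The following sketch implements the construction described below (and in Bailey and Swarztrauber, 1991) for a row vector z; the quadratic phase factors ε₁ and ε₂ appear as e1 and e2.

% frft.m (sketch): computes f(k) = sum_j z(j) exp(-2*pi*1i*a*(j-1)*(k-1))
function f = frft(z, a)
n  = length(z);
e1 = exp(-pi*1i*a*((0:n-1).^2));     % quadratic phase, pre-multiplication
e2 = exp( pi*1i*a*((n:-1:1).^2));    % wrap-around part of the kernel
z1 = [z.*e1, zeros(1, n)];           % padded, damped input
z2 = [1./e1, e2];                    % circular convolution kernel
f  = ifft(fft(z1) .* fft(z2));       % fast circular convolution
f  = e1 .* f(1:n);                   % keep first n points, post-multiply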
Only a small number of the 512 output values are actually within the ±30% range which we might be interested in. One way that can result in smaller output grids is increasing the FFT size N. We have chosen the upper integration bound in a way that the characteristic function is virtually zero outside the interval. Therefore, when we increase N we just “pad with zeros” the input vector z. For example, if we append the 512-point vector with 7680 zeros we will implement an 8192-point FFT, which will return a more acceptable output grid of 0.0039. But of course applying an FFT which is 16 times longer will have a serious impact on the speed of the method.
The fractional FFT method, outlined in Chourdakis (2005), addresses this issue. The fractional FFT (FRFT) with parameter α will compute the more general expression
z⋆_k = Σ_{j=1}^{N} exp(−2πiα(j − 1)(k − 1)) · z_j    (4.6)
2. Based on these auxiliary vectors create the two (2N × 1) vectors z 1 and z 2
5
For proofs and discussion on the FRFT also see Bailey and Swarztrauber (1991, 1994).
LISTING 4.6: frft_call.m: Call pricing using the FRFT.
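Relative to the FFT sketch given earlier only a few lines change: both grids are chosen freely and the fft call is replaced by the fractional transform. A sketch of the changed lines follows, with variables as in the FFT sketch and kbar denoting the desired log-strike range (both names are illustrative).

% changes relative to the FFT pricer (sketch)
du    = ubar/N;                 % integration grid, chosen freely
dk    = 2*kbar/N;               % log-strike grid, also chosen freely
alpha = du*dk/(2*pi);           % fractional parameter
c     = exp(-eta*k)/pi .* real(frft(z, alpha));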
z₁ = [z ⊙ ε₁ ; 0]   and   z₂ = [1 ⊘ ε₁ ; ε₂]
(where ⊙ and ⊘ denote elementwise multiplication and division).
3. Compute the FFTs of the two vectors, z⋆₁ = FFT(z₁) and z⋆₂ = FFT(z₂).
4. The N-point FRFT will be the first N elements of the inverse FFT
z⋆ = ε₁ ⊙ IFFT(z⋆₁ ⊙ z⋆₂)
We can now easily adapt the recipes of the previous section to accommodate the fractional FFT. We can now choose the two grid sizes freely, and set the fractional parameter α = ΔuΔx/(2π). Thus, we need to change the corresponding steps of the recipes to:
Run the fractional FFT on z with fractional parameter ΔuΔx/(2π), that is z⋆ = FRFT(z, ΔuΔx/(2π)).
Listing 4.6 implements the fractional FFT based option pricing. Chourdakis (2005) gives details on the accuracy of this method for option pricing, based on a number of experiments that compare the fractional to the standard FFT. Figure 4.6 gives an example that is based on the normal distribution. One can observe the exceptional accuracy of both methods: an 8192-point FFT is contrasted to a 512-point FRFT.
FIGURE 4.6: Accuracy of the FFT and FRFT call pricing methods for the normal model, contrasting an 8192-point FFT with a 512-point FRFT; the differences are of the order of 10⁻¹⁵.
for h_j = h(u_j), and α = (½, 1, 1, . . . , 1, ½)′. What we have done is approximate the whole integrand as a piecewise linear function. This integrand is the product of two terms: the first, exp(−iux), is a combination of trigonometric functions, and will be highly oscillatory, especially for large values of |x|; the second, h(u), is also oscillatory but typically very mildly so, and is also independent of x.
It therefore makes sense to approximate only the second component as a piecewise linear function, and leave the first part intact. We therefore split the integral into N − 1 sub-integrals
∫_{u_1}^{u_N} exp(−iux) h(u) du = Σ_{j=1}^{N−1} ∫_{u_j}^{u_{j+1}} exp(−iux) h(u) du
where over each subinterval we approximate
h(u) ≈ h_j + (Δh_j/Δu_j)(u − u_j) = a_j + b_j u   for u_j ≤ u ≤ u_{j+1}
LISTING 4.7: frft_integrate.m: Integration over an interval using the FRFT.
4.7 SUMMARY
To summarize, for large classes of models closed form solutions are not available even for European style options, but the characteristic function is available in closed form. For example, models where the logarithmic price is Lévy (Madan
et al., 1998; Carr, Geman, Madan, and Yor, 2002), Garch models (Heston and
Nandi, 2000), affine models (Heston, 1993; Duffie et al., 2000; Bates, 2000, 1998),
regime switching models (Chourdakis, 2002) or stochastic volatility Lévy models
(Carr, Geman, Madan, and Yor, 2003; Carr and Wu, 2004) fall within this category.
Fourier transform methods can be applied to recover numerically European
call and put prices from the Fourier transform of the modified call. Therefore
such models can be rapidly calibrated to a set of observed options contracts,
as we will investigate in the next chapter on volatility. The FFT method and its fractional variant are well suited to perform this inversion. Also, one can
use these methods to invert the characteristic function itself, thus recovering
numerically the probability density function. This can in turn be used to set up
numerical procedures for pricing American style or other exotic contracts, for
example as in Andricopoulos, Widdicks, Duck, and Newton (2003).
It is typical in many, if not all, financial applications to face models that depend
on one or more parameter values, which have to be somehow determined. For
example, if we are making the assumption that the stock price we are investi-
gating follows a homogeneous geometric Brownian motion, then we would be
interested in estimating the expected return and the corresponding volatility.
Then we could produce forecasts, option prices, confidence intervals and risk
measures for an investment on this asset.
At this point we must remind ourselves that not all of the above operations
are carried out under the same measure. This fact will largely determine which
data will be appropriate to facilitate a calibration method. Some parameters,
such as the drift in the Black-Scholes framework, are not the same under the
objective and the pricing measure, while some others, such as the volatility, are.
In particular, if our ultimate goal is pricing, we must place ourselves under
the pricing measure and use instruments that are also determined under the
same measure. In this way the prices that we produce will be consistent with
the prices that we use as inputs, and we will not leave any room for arbitrage.
The dynamics recovered under this data set will not be the real dynamics of the
underlying asset: instead, they will be consistent with the attitude of investors
against risk, and thus modified accordingly. In general, drifts will be lower,
volatilities will be higher, and jumps will be more frequent and more severe.
When pricing assets, investors behave as if this were the case, precisely because these are the scenarios that they dislike.
On the other hand, if our goal is forecasting or risk management, we are
interested in the real asset dynamics. We do not want the parameters to be
contaminated by risk aversion, and the appropriate data in this case would be
actual asset prices. We will base our forecasts of their future behaviour on the real historical movements of the assets.
Nevertheless, there are situations where we might want (or have to) use both
probability measures jointly. As derivative prices are forward looking we might
want to augment our information set with their prices, in order to produce more
accurate forecasts. From an “academic” point of view, since the distance between
FIGURE 5.1: The intuition behind the likelihood function. Samples are drawn from the true (blue) density, and the estimated (black) densities attain L = −17.00 for N = 10, and N(0.83, 1.82) with L = −100.98 for N = 50.
(a) N = 10 (b) N = 50
the two probability measures depends on the risk premiums, we might want to
identify these premiums for different risk components. For instance, we might
want to quantify the price of volatility risk versus the price of jump risk. Finally,
in some situations we do not observe the underlying asset directly. This is the
case in fixed income markets, where we can attempt to identify the true dynamics
using time series of bonds which are evaluated under the pricing measure.
In this chapter we will focus on the case where calibration is carried out using
a time series of historical values. There is a plethora of methods available, but
we will focus on the most popular one, the maximum likelihood estimation (MLE)
technique. We will not focus on deriving the properties of MLE, but will rather
refer to Davidson and MacKinnon (1985) and Hamilton (1994). These books also
give a detailed analysis of variants of MLE, as well as of alternative methods of moments. For an introduction to Bayesian techniques, a good starting point is Zellner (1995).
To select the maximum log-likelihood we need to set the first order conditions,
namely that
∂L(θ̂; x)/∂θ = 0
The second order conditions will dictate that for the likelihood to be actually
maximized, the K × K Hessian matrix
1
To be more precise, X contains the random variables that are conditional on their
history. That is to say, the random variable Xt is conditional on the realizations of all
values that preceded it, namely {Xt−1 , Xt−2 , . . . , X1 }.
2
There are practical as well as theoretical reasons for doing so. Imagine having a sample
of 1000 observations from the red density of figure 5.1(b) on page 132, where each has
a likelihood of about 0.1 = 10⁻¹. Then the likelihood of the sample would be of the order 10⁻¹⁰⁰⁰, small enough to confuse the best of computers. But the log-likelihood
(with base 10 for simplicity) is −1000, a much more manageable figure. This is a
practical issue; the theoretical benefits include the computation of standard errors as
described in the text.
H = ∂²L(θ̂; x)/∂θ∂θ′   is negative definite
The maximization of the log-likelihood function can be carried out analytically
in some special cases, but we typically employ some algorithm to produce θ̂
numerically. The choice of the appropriate algorithm will depend on the nature
of the likelihood function: if it is relatively well behaved, then a standard hill
climbing algorithm will be sufficient. In more complex cases, where the likelihood
exhibits local maxima or is even undefined for specific parameter sets, one needs
to resort to other techniques such as genetic algorithms or other simulation based
methods.
Figure 5.1 illustrates the intuition behind the likelihood function. Samples
are drawn from the blue distribution (for simplicity we assume that the sample
elements are independent and identically distributed) of lengths N = 10 and
N = 50. To compute the corresponding likelihood values, one has to compute
the density value at the sample points as shown. The red curves give a density
that is far away from the true one, and we can see that overall the function
values are lower. We numerically maximize the log-likelihood and estimate the
density that has produced the data, which is given in black. When the sample
is small, the estimated density is not close to the true data generating process,
but it will converge as the sample size increases.
E[∂L(θ⋆; X)/∂θ] = 0
Note that the random variable in the above expectation is the data sample X .
For IID processes, maximum likelihood estimation can be viewed as setting the
empirical expectation of this score to zero.
In the same light we define the (Fisher) information matrix as minus the expectation of the second derivative of the log-likelihood, evaluated again at the true parameter point
I(θ⋆) = −E[∂²L(θ⋆; X)/∂θ∂θ′]
As before, the Hessian matrix produces an estimate of the information matrix
which is based on the sample. The information matrix will be by construction
positive definite, and therefore invertible.
It turns out that we can also say something about the covariance matrix of the score. In fact, it will be equal to the information matrix
V[∂L(θ⋆; X)/∂θ] = E[(∂L(θ⋆; X)/∂θ)(∂L(θ⋆; X)/∂θ)′] = I(θ⋆)
What is the correct way to view these expectations and variances? Say that
we knew the true parameter set, and we constructed a zillion sample paths based
on these parameters, each one of length T . If we compute the score vector based
on each one of these samples, we would find that the average of each element
is zero and that the covariance matrix is given by the information matrix.
The information matrix plays another important role, as its positive definiteness is a necessary condition for all other asymptotic properties to carry through. An estimator is unbiased if its expectation equals the true parameter value,
E θ̂(X) = θ⋆
The maximum likelihood estimator is not generally unbiased, and this appar-
ently is not a good thing. But the maximum likelihood estimator is consistent,
which means that as the sample size increases the bias drops to zero. Further-
more, the variance of the estimator’s distribution also drops to zero, indicating
that the maximum likelihood estimator will converge to the true value as the
sample size increases, or more formally that plim_{T→∞} θ̂(X) = θ⋆.
It also turns out that the distribution of the MLE is Gaussian, with covariance
matrix equal to the inverse of the Fisher information matrix evaluated at the true
parameter value. We can therefore write
θ̂(X) ∼ N(θ⋆, I(θ⋆)⁻¹)
Furthermore, the variance I (θ ? )−1 of the MLE is equal to the so called Cramér-
Rao lower bound, which states that no other unbiased estimator will have smaller
variance than the MLE. This also makes the maximum likelihood estimator
asymptotically efficient. In practice we do not know the value of I (θ ? ) and use
an estimate instead, for example one based on the Hessian of the log-likelihood.
Z(x) = (θ̂(x) − θ†)/√(I(θ†)⁻¹)
LISTING 5.1: arma_sim.m, arma_lik.m and arma_example.m: Simulation and maximum likelihood estimation of ARMA models.
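A minimal sketch of the likelihood part for an ARMA(1,1) model, x_t = c + a x_{t−1} + b u_{t−1} + u_t with Gaussian innovations, is given below; the recursion is started at the unconditional mean (assuming |a| < 1), and the final comment shows one possible optimizer call.

% arma11_lik.m (sketch): Gaussian log-likelihood of an ARMA(1,1)
function L = arma11_lik(theta, x)
c = theta(1); a = theta(2); b = theta(3); s = theta(4);
T = length(x); L = 0;
xlag = c/(1-a);                    % start at the unconditional mean
ulag = 0;                          % no presample shock
for t = 1:T
    e    = x(t) - c - a*xlag - b*ulag;            % prediction error
    L    = L - 0.5*log(2*pi*s^2) - 0.5*(e/s)^2;   % Gaussian log-density
    xlag = x(t); ulag = e;
end
% usage: theta = fminsearch(@(v) -arma11_lik(v, x), [0; 0.5; 0; 0.3]);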
FIGURE 5.2: Distributions of the maximum likelihood estimators of the ARMA parameters.
(a) constant (c) (b) AR parameter (α) (c) MA parameter (β) (d) volatility (σ)
             T     c        α        β        σ
True               0.00     0.75     −0.10    0.30
Mean         50    −0.002   0.64     −0.013   0.27
             500   0.000    0.74     −0.093   0.30
Volatility   50    0.056    0.20     0.24     0.030
             500   0.012    0.044    0.063    0.010
Skewness     50    0.001    −1.8     0.57     0.076
             500   −0.12    −0.34    0.067    0.025
Kurtosis     50    8.1      8.8      4.7      2.9
             500   3.5      3.1      3.0      3.0
for the sample sizes T = 50 and T = 500 respectively. Observe the high negative correlation between the estimator of the autoregressive and the moving average terms. As these two parameters compete to capture the same features of the data,3 the estimated parameters tend to come in high/low pairs.
LÉVY MODELS
Lévy models can be easily estimated using the MLE approach, by inverting
the characteristic function using the FFT or fractional FFT methods of chapter
4. We can invert the characteristic function directly to produce the probability
density, or we can invert for the cumulative density and then use numerical
differentiation. Although the second method appears to be more cumbersome,
it is often more stable. This happens in the case of Lévy models because the
density typically exhibits a very sharp peak, which the direct transform might
fail to capture.
Irrespective of the method we choose to construct the density, the maxi-
mization of the log-likelihood should be straightforward. As Lévy models are
time-homogeneous, the returns are identically distributed. If we denote with
f(x; θ) the probability density of the Lévy model, then the log-likelihood can be
easily computed over a series of returns {x1 , . . . , xT } as
L(θ; x) = Σ_{t=1}^{T} log f(x_t|θ)
Generally speaking, as the FFT method will produce a dense grid for the
probability density function we only have to call the Fourier inversion once at
each likelihood evaluation and interpolate between those points. This renders
MLE quite an efficient method for the estimation of Lévy processes. 4
In our example we will be using the cumulative density function, recovered
with the code of listing 4.8. We use data of the S&P500 index.
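A sketch of such a likelihood function follows. It assumes a hypothetical helper invert_density(theta) that performs one Fourier inversion per likelihood evaluation and returns the density fg on a grid xg (whether directly or via numerical differentiation of the cumulative density).

% levy_lik.m (sketch): log-likelihood of a Levy model via one inversion
function L = levy_lik(theta, x)
[xg, fg] = invert_density(theta);       % hypothetical: one FFT-type inversion
fx = interp1(xg, fg, x, 'linear');      % density at the observed returns
fx = max(fx, 1e-12);                    % guard against zero or negative values
L  = sum(log(fx));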
3
An AR(1) process can be written as an MA(∞) one and vice versa. Therefore a series
that is generated by an AR(1) data generating process will produce MA(1) estimators
as a first order approximation, if the estimated model is misspecified.
4
Some popular Lévy models admit closed form expressions for the probability density
function. In principle this means that one can avoid the FFT step altogether and use
the closed form instead. It turns out that in the majority of cases these densities are
expressed in terms of special functions, which can be more expensive to compute (over
the data set) than a single FFT!
observation eq: Xt = ax + bx Yt + εt
transition eq: Yt = ay + by Yt−1 + ηt
Observe that if the filtered density ft−1|t−1 is Gaussian, then the prediction for
Yt will also follow a Gaussian distribution, as the convolution of normals.
In the correction step we incorporate the new observation X t = xt which
updates the information set Ft = {Xt = xt } ∪ Ft−1 . We then write
ft|t (yt ) = P[Yt ∈ dyt |Ft ] = P[Yt ∈ dyt |{Xt ∈ dxt } ∪ Ft−1 ]
µt|t−1 = ay + by µt−1|t−1
vt|t−1 = b²y vt−1|t−1 + σ²η
5
Aficionados of Bayesian statistical inference will recognize gt|t−1(xt) as the normalization constant, which would probably be ignored. But in our setting it is not ignored; in fact it facilitates the maximum likelihood estimation of the parameters.
Prediction
µt|t−1 = ay + by µt−1|t−1
vt|t−1 = b²y vt−1|t−1 + σ²η
Correction
Kt = vt|t−1 bx / (vt|t−1 b²x + σ²ε)
µt|t = µt|t−1 + Kt (xt − ax − bx µt|t−1)
vt|t = vt|t−1 − Kt vt|t−1 bx
6
Regarding the notation, “∝” stands for “proportional to”; that is x ∝ y means that
x = C y for some constant C . Here we know that the resulting expression is a Gaussian
density, and therefore we are just interested in the structure of the exponential rather
than the constant that ensures that the total probability is equal to one.
LISTING 5.2: kalman_filter_1d.m: One dimensional Kalman filter.
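A minimal sketch of such a filter, written directly from the prediction and correction equations above, which also accumulates the log-likelihood; the parameter structure p and its field names (ax, bx, ay, by, se, sh for the two intercepts, slopes and error standard deviations) are illustrative.

% kalman1d.m (sketch): filter and log-likelihood for
%   x(t) = ax + bx*y(t) + eps(t),   y(t) = ay + by*y(t-1) + eta(t)
function [m, v, L] = kalman1d(p, x)
T  = length(x); m = zeros(T,1); v = zeros(T,1); L = 0;
mp = p.ay/(1 - p.by);                 % start at the unconditional moments
vp = p.sh^2/(1 - p.by^2);
for t = 1:T
    xm   = p.ax + p.bx*mp;            % predicted observation mean
    xv   = p.bx^2*vp + p.se^2;        % predicted observation variance
    L    = L - 0.5*log(2*pi*xv) - 0.5*(x(t) - xm)^2/xv;  % likelihood
    K    = vp*p.bx/xv;                % Kalman gain
    m(t) = mp + K*(x(t) - xm);        % correction: mean
    v(t) = vp - K*vp*p.bx;            % correction: variance
    mp   = p.ay + p.by*m(t);          % prediction: mean
    vp   = p.by^2*v(t) + p.sh^2;      % prediction: variance
end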
The series of interest is Yt , and its dynamics are known, but we cannot observe
Yt directly. Instead, we observe a noisy version Xt , which we use to filter out
the path of Yt . In the figure, the observed series are given by the blue crosses,
and based on these we filter out µt|t , which is given in red. With green we give
the true path of Yt, which we want to reconstruct. We also give the two standard deviation shaded area µt|t ± 2√vt|t.
However, in this experiment we have assumed that the parameters of the
dynamic system are known, which is not the case in practice. In a real life
situation, we would be given the series of observations X t , and we would be asked
to estimate the parameter vector θ, and then filter out the latent component.
Multivariate systems
The filter described above can be easily extended to vector processes; the matrix
algebra is a bit more involved, but the ideas remain the same. 7 Exogenous
explanatory variables can be also included to the observation equation. In its
most general form, the Kalman filter equations are given by
observation eq: X t = A Z t + Bx Y t + ε t
transition eq: Y t = By Y t−1 + ηt
FIGURE 5.3: Kalman filtering example (kalman_filter_1d.m). The latent series Yt that we want to reconstruct is given in green, and the observed series Xt is given by the blue crosses. The filtered series µt|t is given in red, together with the two standard deviation shaded area µt|t ± 2√vt|t.
(a) true parameters (b) MLE parameters
Prediction
µt|t−1 = By µt−1|t−1
Vt|t−1 = By Vt−1|t−1 B′y + Ση
Correction
Kt = Vt|t−1 B′x (Bx Vt|t−1 B′x + Σε)⁻¹
µt|t = µt|t−1 + Kt (xt − A zt − Bx µt|t−1)
Vt|t = Vt|t−1 − Kt Bx Vt|t−1
In terms of the conditional covariances (with Wt the innovation), the correction can also be written as
Kt = ⟨Xt, Yt|Ft−1⟩ ⟨Wt, Wt|Ft−1⟩⁻¹
µt|t = µt|t−1 + Kt wt
Vt|t = Vt|t−1 − Kt ⟨Wt, Wt|Ft−1⟩ K′t
LISTING 5.3: kalman_filter.m: The N-dimensional Kalman filter.
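The body of the multivariate filter follows the same pattern. One prediction-correction cycle, written from the equations above, could be sketched as follows; Se and Sh stand for the covariance matrices Σε and Ση, and mup, Vp denote the predicted moments (all names illustrative).

% one multivariate Kalman step (sketch)
K   = Vp*Bx' / (Bx*Vp*Bx' + Se);      % Kalman gain
mu  = mup + K*(x - A*z - Bx*mup);     % corrected mean
V   = Vp - K*Bx*Vp;                   % corrected covariance
mup = By*mu;                          % next predicted mean
Vp  = By*V*By' + Sh;                  % next predicted covariance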
the linearization error will be an effect of Jensen's inequality. For instance, the transition equation is linearized around the point (µt−1|t−1, 0), which yields the approximation
f(Yt−1, ηt) ≈ f(µt−1|t−1, 0) + Fy (Yt−1 − µt−1|t−1) + Fη ηt
where Fy and Fη are the matrices of first derivatives (the Jacobians) of the function f, with respect to the corresponding elements. Now the system has a linear form, and taking the expectation and the covariance produces the set of prediction equations
µt|t−1 = f(µt−1|t−1, 0)
Vt|t−1 = Fy Vt−1|t−1 F′y + Fη Ση F′η
We can apply the same idea to produce the correction step, namely
Once again the matrices Fy and Fη denote the appropriate Jacobians, in this
case of the function h.
The unscented Kalman filter will iterate through the subsequent times t =
1, 2, . . . , T , applying a prediction and correction step.
For the prediction step we compute M = 2N a + 1 sigma points, based on
a
columns of the Cholesky decomposition of the covariance matrix V t−1|t−1 (which
we denote here with a square root). The points are computed by horizontally
concatenating as follows
q q
ς at−1|t−1 = µat−1|t−1 ; µ at−1|t−1 + γ Vt−1|t−1
a
; µat−1|t−1 − γ Vt−1|t−1
a
This will be a (N a × M) matrix. The idea behind this computation is that the
M-point sample that is produced by the columns of ς at−1|t−1 exhibits mean and
covariance of µ at−1|t−1 and Vt−1|t−1
a
, respectively. It can be viewed as a minimal
Monte-Carlo simulation of a sample with given first two moments. Thus we can
view the sigma points as
[Y ,1] [Y ,2] [Y ,M]
ς t−1|t−1 ς t−1|t−1 · · · ς t−1|t−1
[ε,1] [ε,2] a,[ε,M]
ς at−1|t−1 =
ς t−1|t−1 ς t−1|t−1 · · · ς t−1|t−1
[η,1] [η,2] [η,M]
ς t−1|t−1 ς t−1|t−1 · · · ς t−1|t−1
a concatenation of samples from the state variable and the error terms, given
the information at time t − 1.
Based on these sigma points one can produce now the predictions for the
state variable and its covariance, by taking the sample moments of the function
f , applied at the sigma points
ς^{[Y,m]}_{t|t−1} = f(ς^{[Y,m]}_{t−1|t−1}, ς^{[η,m]}_{t−1|t−1})   for m = 1, 2, . . . , M
µ_{t|t−1} = (1/M) Σ_{m=1}^{M} ς^{[Y,m]}_{t|t−1}
V_{t|t−1} = (1/M) Σ_{m=1}^{M} (ς^{[Y,m]}_{t|t−1} − µ_{t|t−1})(ς^{[Y,m]}_{t|t−1} − µ_{t|t−1})′
LISTING 5.4: unscented_filter.m: The unscented Kalman filter.
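A small sketch of the sigma point construction is given below. With the equal 1/M weights used above, the scaling γ = √(M/2) is one consistent choice: it makes the equally-weighted sample mean and covariance of the points match µ and V exactly.

% sigma points for the unscented filter (sketch)
% mu: (Na x 1) augmented mean, V: (Na x Na) augmented covariance
Na = length(mu);
M  = 2*Na + 1;                        % number of sigma points
g  = sqrt(M/2);                       % scaling consistent with 1/M weights
S  = chol(V)';                        % lower-triangular square root of V
sp = [mu, repmat(mu,1,Na) + g*S, repmat(mu,1,Na) - g*S];   % (Na x M)
% mean(sp,2) recovers mu; the 1/M-weighted covariance of sp recovers V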
Based on these quantities we can now update the mean and covariance of the latent variable, incorporating the new information as follows
Kt = ⟨Xt, Yt|Ft−1⟩ ⟨Wt, Wt|Ft−1⟩⁻¹
µt|t = µt|t−1 + Kt wt
Vt|t = Vt|t−1 − Kt ⟨Wt, Wt|Ft−1⟩ K′t
In this chapter we will investigate the modeling of volatility, and its implications
on derivative pricing. We will start with some stylized facts of the historical and
implied volatility, which will benchmark any forecasting or pricing methodology.
We will then give an overview of Garch-type volatility filters and discuss how
the parameters can be estimated using maximum likelihood. We will see that
although Garch filters do a very good job in filtering and forecasting volatility,
they fall somewhat short in the derivatives pricing arena. These shortcomings
stem from the fact that Garch, by construction, is set up in discrete time, while
modern pricing theory is set up under continuous time assumptions.
Two families of volatility models will be introduced for pricing and hedging.
Stochastic volatility models extend the Black-Scholes methodology by introduc-
ing an extra diffusion that models volatility. Local volatility models, on the other
hand, take a different point of view, and make volatility a non-linear function
of time and the underlying asset. Of course each approach has some benefits
but also some limitations, and for that reason we contrast and compare these
methods.
It is important to note that this chapter deals exclusively with equity volatility, and to some extent exchange rate volatility. These processes are typically represented using some variants of random walk models. Fixed income securities models, and their volatility structures, will be covered in a later chapter.
FIGURE 6.1: Dow Jones industrial average (DJIA) weekly returns and yearly historical volatility. The (annualized) volatility is computed over non-overlapping 52-week periods from the beginning of 1930 to 2005.
HISTORICAL VOLATILITY
Volatility in financial markets varies over time. This is one of the most docu-
mented stylized facts of asset prices. For example, figure 6.1(a) gives a very long
series of weekly returns1 on the Dow Jones industrial average index (DJIA, or
just “the Dow”). Subfigure 6.1(b) presents the (annualized) standard deviation of
consecutive and non-overlapping 52-week intervals, a proxy of the realized DJIA
volatility over yearly periods. One can readily observe this time variability of
the realized volatility, and in fact we can easily associate it with distinct events,
like the Great Depression (early 30s), the Second World War (late 30s/early
40s), the Oil Crisis (mid 70s), and the Russian Crisis (late 90s).
If we compute the summary statistics of the DJIA returns, we will find that
the unconditional distribution exhibits fat tails (high kurtosis). In particular,
the kurtosis of this sample is k = 8.61. The variability of volatility can cause
fat tails in the unconditional distribution, even if the conditional returns are
normally distributed. To illustrate this point, consider a simple example where
the volatility can take only two values, σt = σ1 = 10% or σt = σ2 = 40%, and both
means are zero. Say that we denote with fN (x; µ, σ) the corresponding normal
probability density functions.
Also, suppose that p1 = 75% of the time returns are drawn from a normal2
r ∝ fN (r; 0, σ1 ), and in the other p2 = 25% of the time they are drawn from a
second normal r ∝ fN (r; 0, σ2 ). If we consider the unconditional distribution, its
probability density function will be a mixture of the two normal distributions,
and in fact
1
Here by returns we actually mean log-returns; that is, if St is the time-series of DJIA values, rt = log St − log St−1.
2
Here the notation x ∝ f(x; · · · ) means that x is distributed as a random variable that
has a probability density function given by f(x; · · · ).
155(6.1)
r ∝ p1 fN(r; 0, σ1) + p2 fN(r; 0, σ2)

FIGURE 6.2: This figure illustrates the different kurtosis and skewness patterns that can be generated by mixing two normal distributions. In both figures σ1 = 10% and σ2 = 40%. In subfigure (a) the two means are equal, µ1 = µ2 = 0, a setting that can generate fat tails but not skewness. In subfigure (b) µ1 = 5% and µ2 = −15%, generating negative skewness in addition to the fat tails. (a) µ1 = µ2; (b) µ1 ≠ µ2.
Figure 6.2(a) illustrates exactly this point, and gives the two conditional
normals and the unconditional distribution. One can easily compute the statistics
for the unconditional returns, and in particular the unconditional volatility σ =
21.7%, and the kurtosis k = 8.7 > 3.
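The following short Matlab experiment (a sketch of ours, not one of the chapter's listings) draws returns from this two-regime mixture and verifies that the unconditional sample exhibits excess kurtosis, even though each regime is conditionally normal.

% mixture of two normals: fat tails without skewness
p1 = 0.75; s1 = 0.10;              % low volatility regime
s2 = 0.40;                         % high volatility regime
N  = 1e6;                          % number of draws
reg = rand(N,1) < p1;              % regime indicator
r = reg.*(s1*randn(N,1)) + (~reg).*(s2*randn(N,1));
sigma = std(r);                    % close to sqrt(p1*s1^2+(1-p1)*s2^2) = 21.8%
k = mean((r - mean(r)).^4)/var(r)^2;   % kurtosis, close to 8.6 > 3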
Inspecting figure 6.1(b), one can also observe that the historical realized
volatility does not swing wildly, but exhibits a cyclical pattern. In particular, it
appears that volatility exhibits high autocorrelation, with low (high) volatility
periods more likely to be followed by more low (high) volatility periods. In the
literature these patterns are often described as volatility clusters. Having said
that, the volatility process appears to be stationary, in the sense that it remains
between bounds, an intuitive feature.3 We can imagine that there is some long
run volatility that serves as an attractor, with the spot volatility hovering around
this level.
IMPLIED VOLATILITY
In chapter 2 we gave a quick introduction to the notion of the implied volatility (IV), denoted with σ̂. In particular, given an observed European call or put option price Pobs, the IV will equate it to the theoretical Black-Scholes value, solving the equation

Pobs = fBS(t, S0; T, K, r, σ̂)
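Numerically, backing out σ̂ is a one-dimensional root finding exercise. A minimal sketch, assuming the Financial Toolbox function blsprice is available and using hypothetical contract values:

S = 100; K = 100; r = 0.05; T = 0.5;    % hypothetical contract
Pobs = 7.50;                            % observed call price
% solve fBS(sigma) = Pobs for the implied volatility
iv = fzero(@(s) blsprice(S, K, r, T, s) - Pobs, 0.2);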
3 The intuition stems from the fact that, unlike prices themselves, market volatility cannot increase without bounds. Even if we are asked to provide some estimate for the volatility of the DJIA in 1,000 years, we would probably come up with a value that reflects current volatility bounds. If we are asked to estimate the level of the DJIA in 1,000 years' time, we would produce a very large number.
FIGURE 6.3: The log-SPX level and the VIX level (a); VIX changes against log-SPX changes (b).
5 Also found in the literature as bust and boom, bear and bull, or recession and expansion, depending on the journal or publication one is reading.
6 Of course this is a very crude method. Bouchaud and Potters (2001) give a formal empirical investigation based on a large number of stocks and indices, and find that this negative correlation is more pronounced for indices, but more persistent for individual stocks.
7 We follow here the standard Modigliani and Miller (1958) capital structure approach, where stocks and bonds represent different ways of splitting firm ownership. Say that a company is worth $100m, with $10m in stock and the rest ($90m) in bonds. If the value of the firm increases by 1% up to $101m, the value of the stock will increase to $11m to reflect that increase (since the debt value cannot change). This implies a 10% rise in the stock price.
The table below summarizes the trade-offs between the Garch and the stochastic volatility (SV) approaches.

                           Garch                 SV
current volatility         known                 unknown
conditional volatility     computable            unknown
volatility randomness      no extra source       extra source
volatility price of risk   set internally        set externally
time frame                 discrete              continuous
incompleteness             discrete time         extra diffusions
option pricing             very limited          available
historical calibration     maximum likelihood    hard
calibration to options     hard                  transforms
rt |Ft−1 ∝ fN (rt ; µt , σt )
but having a different volatility σt , and possibly a different mean µt . This volatility
is updated using a mechanism that ensures that at each period t − 1 we can
ascertain the parameters of next period’s returns, σt and µt , based on past returns
alone. In probability jargon we say that both σt and µt are Ft−1 -adapted.
rt = µ + εt,   εt ∼ N(0, ht),   ht = ω + γε²t−1
In this model the conditional return is indeed normally distributed, rt|Ft−1 ∝ fN(rt; µ, √ht), and the volatility is Ft−1-adapted since it is a function of εt−1 = rt−1 − µ which is known at time t − 1. Also, if the volatility at time t − 1 is large, then it will be more likely to draw a large (in absolute terms) εt. Therefore an Arch(1) process will exhibit some autocorrelation in the volatility. In order to ensure that the variance is positive we need to impose the restrictions ω, γ > 0.
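A tiny simulation (a sketch with illustrative parameter values) makes the volatility clustering of the Arch(1) process visible:

omega = 0.1; gamma = 0.6; mu = 0;   % illustrative Arch(1) values
T = 1000; r = zeros(T,1);
h = omega/(1 - gamma);              % start at the long run variance
for t = 1:T
    e = sqrt(h)*randn;              % conditional shock
    r(t) = mu + e;
    h = omega + gamma*e^2;          % Arch(1) variance update
end
plot(r)                             % large moves cluster together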
We can write volatility forecasts ht+s|t = E[ε²t+s|Ft] = Et ε²t+s by backward substitution as

ht+s|t = Et ε²t+s = ω + γEt ε²t+s−1 = ω + γht+s−1|t

which yields the forecasts (using also ht+1|t = ht+1, which is known at time t)

ht+s|t = ω (1 − γ^(s−1))/(1 − γ) + γ^(s−1) ht+1
rt = µ + εt,   εt ∼ N(0, ht),   ht = ω + βht−1 + γε²t−1
The additional constraint β > 0 is sufficient to keep the variance positive. This seemingly small addition is equivalent to an Arch(∞) structure, which becomes clear if we back-substitute the conditional variances, yielding for s lags

ht = ω (1 − β^s)/(1 − β) + β^s ht−s + γε²t−1 + γβε²t−2 + · · · + γβ^(s−1)ε²t−s

If β < 1, then we can let s → ∞, giving the Arch(∞) form of the Garch(1,1) model

ht = ω/(1 − β) + γε²t−1 + γβε²t−2 + γβ²ε²t−3 + · · ·
The impact of lagged errors decays exponentially as we move further back in the past of the series. The Garch(1,1) model has been extremely popular amongst econometricians and practitioners that need to either filter or forecast volatility. The natural generalization Garch(p,q) includes p lags of the squared error terms and q lagged variances.
Once again we can derive the volatility forecasts using forward substitution, in particular

ht+s|t = Et ε²t+s = ω + βEt ht+s−1 + γEt ε²t+s−1 = ω + (β + γ)ht+s−1|t

which is the same form we encountered in the Arch case, with γ replaced by β + γ. Therefore we can compute forecasts for the variance and the integrated variance, if we denote κ = β + γ (the so called persistence parameter)

ht+s|t = ω/(1 − κ) + κ^(s−1) ( ht+1 − ω/(1 − κ) )

Ht,s = (s − 1) ω/(1 − κ) + (κ − κ^s)/(1 − κ) ( ht+1 − ω/(1 − κ) )

The long run (or unconditional) variance is now given by h* = ω/(1 − β − γ). In order for the variance to remain well defined we need to impose the constraint β + γ < 1.
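These forecast recursions are straightforward to vectorize in Matlab (a sketch; h1 stands for ht+1 and the parameter values are illustrative):

omega = 0.02; beta = 0.91; gamma = 0.08;   % illustrative Garch(1,1) values
kappa = beta + gamma;                      % persistence
hbar  = omega/(1 - kappa);                 % long run variance
h1    = 1.5*hbar;                          % current variance h_{t+1}
s     = (1:52)';                           % forecast horizons
hfc   = hbar + kappa.^(s-1)*(h1 - hbar);   % variance forecasts h_{t+s|t}
Hts   = (s-1)*hbar + (kappa - kappa.^s)/(1-kappa)*(h1 - hbar);  % integrated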
The fact that conditionally the random variables rt|Ft−1 are normally distributed allows one to compute the likelihood for a given set of parameters θ = {µ, ω, β, γ}. Often we set the long run variance h* equal to the sample variance σ̄², and therefore set ω = σ̄²(1 − β − γ). This makes sense if our sample is fairly long, and can significantly help the numerical optimization algorithm. In that case the parameter vector to be estimated is θ = {µ, β, γ}. In order to start the recursive algorithm that computes the Garch variance we also need an initial value for h0. We can also use h0 = σ̄², or we can add h0 to the parameter vector and let it be estimated.
In the Garch process we defined above, the parameter µ is not the expected rate of return. In particular, as the asset price is lognormally distributed, St = St−1 exp(rt), the expected price is Et−1 St = St−1 exp(µ + ½ht). Therefore, if we want µ to denote the constant expected return, then we need to set up the Garch equation as

rt = µ − ½ht + εt,   εt ∼ N(0, ht)
The next steps, implemented in listing 6.1, show how the likelihood can be
computed for a given set of parameters θ and a sample r = {r t }. The popularity
of the Garch model stems from the fact that this likelihood is computed rapidly
and can be easily and quickly maximized. The ideas behind maximum likelihood
estimation were covered in detail in chapter 5.
1. If they are not part of θ, we set the parameters ω = σ̄²(1 − β − γ) and h0 = σ̄².
2. Based on the parameters {µ, ω, β, γ} and the initial value h0, we filter the volatility series, applying the Garch(1,1) recursion

εt = rt − µ + ½ht
ht+1 = ω + βht + γε²t

3. Now we have the variance series, which allows us to compute the log-likelihood of each observation rt. Since rt|Ft−1 ∼ N(µ − ½ht, ht)

log L(rt|θ) = −ε²t/(2ht) − ½ log ht − ½ log 2π

4. Finally, adding up will give the log-likelihood of the sample

log L(r|θ) = Σ_{t=1}^T log L(rt|θ)

The maximum likelihood estimates θ̂ satisfy the first order conditions

∂ log L(r|θ)/∂θ |θ=θ̂ = 0

The Hessian matrix of second derivatives can help us produce the asymptotic standard errors

Ĥ = ∂² log L(r|θ)/∂θ∂θ′ |θ=θ̂
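A compact implementation of these four steps could look as follows (a minimal sketch in the spirit of listing 6.1, which is not reproduced here; the function returns the negative log-likelihood for use with a minimizer, and the name is ours):

function ll = garch_llf(par, r)
% Garch(1,1) negative log-likelihood
mu = par(1); beta = par(2); gamma = par(3);
omega = var(r)*(1 - beta - gamma);   % long run variance restriction
h  = var(r);                         % initial variance h0
ll = 0;
for t = 1:length(r)
    e  = r(t) - mu + 0.5*h;                       % step 2: filter
    ll = ll - 0.5*(e^2/h + log(h) + log(2*pi));   % step 3: likelihood
    h  = omega + beta*h + gamma*e^2;              % Garch(1,1) recursion
end
ll = -ll;                            % negate for minimization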
Estimation examples
As an example, we will estimate two time series using the Garch(1,1) process for
the volatility. We start with the long DJIA index sampled weekly from 1930 to
2004 (plotted in figure 6.1(a)), and then move to the shorter SPX index sampled
daily from 1990 to mid-2007 (plotted in figure 6.3). Listing 6.2 shows how the
log-likelihood can be optimized.
The estimation is done using the Optimization Toolbox in Matlab, although
any hill climbing algorithm will do in that simple case. We use constrained
optimization to ensure that β and γ are bounded between zero and one. Also we
want to ensure that κ = β + γ < 1. The standard errors are produced using the
Hessian matrix that is estimated by the toolbox.8 We also use the restriction on
the long run variance, and set the initial variance equal to the sample variance.
The maximum likelihood parameters are given below (all in percentage terms),
with standard errors in parentheses.
             DJIA      SPX
µ            0.19      0.05
            (0.02)    (0.01)
β           91.32     93.93
            (0.94)    (0.53)
γ            7.66      5.42
            (0.76)    (0.45)
κ = β + γ   98.98     99.45
Both time series give similar estimated values. If we write the error term εt = √ht ηt for ηt ∼ N(0, 1), then ε²t = ht η²t, and since Eη²t = 1 we can write ε²t = ht(1 + ut) where now Eut = 0 (but of course ut is not normal). Then the Garch(1,1) variance process can be cast in an autoregressive AR(1) form

ht = ω + κht−1 + γht−1ut−1

Since the estimated persistence κ is very close to one, this indicates that volatility behaves as a near unit root process.9 In such a process shocks to the volatility are near permanent, and the process reverts very slowly towards the long run variance.10
These parameter estimates are typical of Garch estimations, and the near
integrated behavior has been the topic of substantial research through the 80s
and the 90s. A number of researchers introduced Garch variants that exhibit long
memory, such as the fractionally integrated Garch (Figarch) of Baillie, Bollerslev,
9 In fact, if we trust the standard errors we are not able to reject the hypothesis κ = 1.
10 A Garch process with β + γ = 1 is called integrated Garch (Igarch), and is equivalent to the exponentially weighted moving average (EWMA) specification, where the variance is updated as σ²t = λσ²t−1 + (1 − λ)ε²t−1. In this case the volatility behaves as a random walk.
and Mikkelsen (1993). Others acknowledge that models with structural breaks in the variance process can exhibit spuriously high persistence (Lamoureux and Lastrapes, 1990), and produce models that exhibit large swings in the long run variance attractor (Hamilton and Susmel, 1994; Dueker, 1997).
Figure 6.5 gives the filtered volatility for both cases. This is a by-product of the likelihood evaluation. For comparison, the historical volatility (of figure 6.1) and the implied volatility VIX index (of figure 6.3) are also presented. The filtered volatilities are computed using the maximum likelihood parameter estimates. One point worth making is that the implied volatility overestimates the true volatility, as illustrated in subfigure (b), where the VIX index is above the filtered volatility for most of the time. This is due to the fact that implied volatility can be thought of as a volatility forecast under an equivalent martingale measure, rather than a true forecast. There will be different risk premiums embedded in the implied volatility, rendering it a biased estimator or forecast of the true volatility.
OTHER EXTENSIONS
Apart from the simple Garch(1,1) model that we already presented, there have
been scores of modifications and extensions, tailor made to fit the stylized facts
of asset prices. We will give here a few useful alternatives.
In the standard Garch model we assumed that conditional returns are normally distributed, and write εt = √ht ηt, with ηt ∼ N(0, 1). The likelihood function was based on this assumption. It is straightforward to use another distribution for ηt; if it has a density function that is known in closed form, then it is straightforward to modify the likelihood function appropriately. Of course it might be necessary to normalize the distribution to ensure that Eηt = 0 and Eη²t = 1. A popular choice is the Student-t distribution, which can accommodate fat tails in the conditional returns through its degrees of freedom parameter ν. We can augment the parameter vector θ with ν, and the third step of the likelihood evaluation will now become
3*. Now we have the variance series, which allows us to compute the log-likelihood of each observation

log L(rt|θ) = log( Γ((ν+1)/2) / (√((ν−2)π) Γ(ν/2)) ) − ½ log ht − ((ν+1)/2) log( 1 + (rt − µ)²/((ν−2)ht) )
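In code, only the per-observation likelihood term changes relative to the normal case (a sketch; e and h are the residual and variance from the Garch filter, and gammaln is Matlab's log-gamma function):

% Student-t log-likelihood contribution of one observation (nu > 2)
c  = gammaln((nu+1)/2) - gammaln(nu/2) - 0.5*log((nu-2)*pi);
ll = ll + c - 0.5*log(h) - (nu+1)/2*log(1 + e^2/((nu-2)*h));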
A popular asymmetric extension is the GJR model of Glosten, Jagannathan, and Runkle (1993), where the variance recursion is augmented with an asymmetric term

ht = ω + βht−1 + γε²t−1 + γ*I(εt−1 < 0)ε²t−1

The function I(x) is the indicator function. Therefore, if γ* > 0 a negative return will increase the conditional variance more than a positive one (γ + γ* instead of γ).11 But even with the GJR approach we will not have the situation illustrated in figure 6.3(b), where positive returns will actually have a negative impact on the volatility.
The Egarch model of Nelson (1991) takes a more direct approach, as it uses raw rather than squared returns. This implies that the sign is not lost and will have an impact. In order to get around the non-negativity issue he models the logarithm of the variance

log ht = ω + β log ht−1 + γ(|ηt−1| + θηt−1),   ηt = εt/√ht
11 Other asymmetric extensions include the threshold model of Zakoian (1994) and the quadratic Garch of Sentana (1995).
LISTING 6.3: Egarch likelihood function.
function ll = egarch_llf(par, data)
% Egarch(1,1) negative log-likelihood
mu    = par(1);   % mean
omega = par(2);   % constant of the log-variance equation
beta  = par(3);   % persistence
gamma = par(4);   % size effect
theta = par(5);   % sign (leverage) effect
N = length(data);
logh = log(var(data));              % initialize at the sample variance
ll = 0;
for i = 1:N
    h   = exp(logh);                % variance level
    e   = data(i) - mu;             % residual
    eta = e/sqrt(h);                % standardized residual
    ll  = ll - 0.5*(e^2/h + log(h) + log(2*pi));  % update likelihood
    logh = omega + beta*logh + gamma*(abs(eta) + theta*eta);  % update log-variance
end
ll = -ll;                           % negative log-likelihood for minimization
if isnan(ll) || ~isfinite(ll)       % safeguard: return a large value if the
    ll = 1e10;                      % parameters produce an absurd likelihood
end
In the Egarch approach γθ < 0 will be consistent with figure 6.3(b), as higher
returns will lower volatility. Listing 6.3 shows an implementation of the Egarch
likelihood function. As there are no constraints in the Egarch maximization, the
hill climbing algorithm might attempt to compute the likelihood for absurd pa-
rameter values as it tries to find the optimum. There are a couple of tricks in the
code that ensure that a likelihood value will be returned. The implementation
for the optimization resembles listing 6.2, but we shall use unconstrained opti-
mization. The maximum likelihood parameters are given below for the two time
series
rt = µt + εt,   εt = Ht^(1/2) ηt,   ηt ∼ N(0, I)
The matrix Ht^(1/2) can be thought of as the one obtained from the Cholesky factorization of the covariance matrix Ht. The covariance matrix can be updated in a form that is analogous to the univariate Garch(1,1)
12 The most widely used forms are the VEC specification of Bollerslev, Engle, and Wooldridge (1988), and the BEKK specification of Engle and Kroner (1995). A recent survey of different approaches and methods is Bauwens, Laurent, and Rombouts (2006).
Ht = Ω + B ∘ Ht−1 + A ∘ (εt−1 ε′t−1)

where ∘ denotes the element-by-element (Hadamard) product. In this case the (i, j)-th element of the covariance matrix will depend on its lagged value and on the product ε(i)t−1 ε(j)t−1. Of course more general forms are possible, with covariances that depend on different lagged covariances or error products.
To illustrate the multivariate Garch, we will use an example that is based on the Capital Asset Pricing Model (CAPM). In particular, asset returns will depend on the covariance with the market and the market premium, which in turn will depend on the market variance. If we denote with rAt, rMt and rFt the asset, market and risk free rates of return, then we can write the CAPM relationships as
Since Et−1 rMt − rFt = λEt−1(εMt)², the above system simplifies to
for µQ chosen in a way that makes the discounted price a martingale under Q, St = EQt[exp(−rΔt)St+1]. But not all is lost: we just need to impose some more structure that will eventually constrain our choices for Q. Here we will outline two methods to achieve that, but since derivative pricing typically takes place in a continuous time setting, we will not dwell on the details.
resembles the impact of the idiosyncratic versus the systematic risk in asset
pricing models.
Option prices can be computed from the Euler equations, which state that the price at time t of a random claim that is realized at time T > t, say XT, is given by (see for example Barone-Adesi, Engle, and Mancini, 2004)

Xt = Et[ (UW(T, WT)/UW(t, Wt)) XT ]

Essentially, the Euler equation weights each outcome with its impact on the marginal rate of substitution, before taking expectations. The price of a European call option would then be equal to

Pt = Et[ (UW(T, WT)/UW(t, Wt)) (ST − K)⁺ ]

Note that in the above expression there is no talk of equivalent measures. All expectations are taken directly under P. Nevertheless, if we think of the marginal rate of substitution as a Radon-Nikodym derivative, then we can define the equivalent probability measure.
Of course, in general it is straightforward neither to specify the appropriate utility nor to compute the expectation in closed form, but things are substantially simplified if we consider power utility functions. In fact, we will arrive at the Esscher transform, which has been very successful in actuarial sciences. This is described in detail in Gerber and Shiu (1994).
Distribution based
The second method takes a more direct approach. Suppose that the log price follows the standard Garch(1,1) model

Δ log St = µ − ½ht + √ht ηt
ht = ω + βht−1 + γht−1η²t−1

Rather than trying to derive the risk neutral measure, we define it as the one under which the random variable

ηQt = ηt − (r − µ)/√ht

is a standardized normal, which renders the discounted asset price a martingale. Then under risk neutrality the asset log price follows

Δ log St = r − ½ht + √ht ηQt
ht = ω + βht−1 + γht−1( ηQt−1 + (r − µ)/√ht−1 )²
The Heston and Nandi model
Heston and Nandi (2000) propose a modification of the Garch dynamics

Δ log St = r + λht − ½ht + √ht ηt
ht = ω + βht−1 + γ( ηt−1 − δ√ht−1 )²

Here the bilinearity in the variance process is broken. That is to say, the product √ht−1 ηt−1 is not present, and ηt−1, which is a standardized normal series, appears in the variance update alone.
We set ηQt = ηt + λ√ht, and define the probability measure Q as one that is equivalent to P, and under which ηQt ∼ N(0, 1). Then, the asset price process under Q will satisfy

Δ log St = r − ½ht + √ht ηQt
ht = ω + βht−1 + γ( ηQt−1 − δQ√ht−1 )²

for δQ = δ + λ.
Unlike the standard Garch model, the Heston and Nandi (2000) modification allows one to compute the characteristic function as a closed form recursion. Then, option prices or risk neutral densities can be easily computed using the methods described in chapter 4.
The leverage effect is accommodated by allowing the asset return and the volatility innovations to be correlated, Et dBt dBvt = ρdt.
Derivative prices will have a pricing function that will depend on the volatility, on top of time and the underlying asset price

Pt = f(t, St, vt)
where the average variance over the life of the derivative in question is defined as

v̄ = (1/(T − t)) ∫_t^T vs ds

and f(v̄) is the probability density of the average variance process.
For example, in the original Hull and White (1987) article the variance is
assumed to follow a geometric Brownian motion, which is uncorrelated with the
asset price process.
dSt = µSt dt + √vt St dBt
dvt = θvt dt + φvt dBvt
In this case, HW give a series approximation for the option price, which is based
on the moments of the average variance.
The HW model was the first approach (together with Wiggins, 1987) towards a pricing formula for SV models, but the model they propose does not capture the desired features of realized volatilities. In particular, under the geometric Brownian motion dynamics, the variance will be lognormally distributed. In the long run, the volatility paths will either explode towards infinity, or they will fall to zero, depending on the parameter values. Volatility in the HW model does not exhibit mean reversion and is not stationary. As maturities increase, the variance of our volatility forecasts increases without bound.
This process was later extended in Schöbel and Zhu (1999) by allowing the two BM processes to be correlated. The volatility process follows a normal distribution for each maturity, and therefore can cross zero. This implies that the true correlation (that is Et dSt dσt) changes sign when this happens. This can be an undesirable property of the model.
Schöbel and Zhu (1999) compute the characteristic function of the log-price

φ(T, u) = exp( iu log(S0) + iuµT − iu ρ(σ0² + ξT)/(2ξ) ) × exp( ½D(T; s1, s3)σ0² + B(T; s1, s2, s3)σ0 + C(T; s1, s2, s3) )

The functions D, B and C are solutions of a system of ODEs, and are given in a closed (but complicated) form in the appendix of Schöbel and Zhu (1999).
The Heston model
In the model proposed by Heston (1993) the variance follows a square root process

dSt = µSt dt + √vt St dBt
dvt = θ(v̄ − vt)dt + ξ√vt dBvt,   Et dBt dBvt = ρdt

The Heston model has a number of attractive features and a convenient parameterization. In particular, the variance process is always non-negative, and is actually strictly positive if 2θv̄ > ξ². The volatility-of-volatility parameter ξ controls the kurtosis, while the correlation parameter ρ can be used to set the skewness of the density of asset returns. The variance process exhibits mean reversion, having as an attractor the long run variance parameter v̄. The parameter θ defines the strength of mean reversion, and dictates how quickly the volatility skew flattens out.
This model belongs to the more general class of affine models of Duffie et al. (2000), and the characteristic function of the log-price is given in closed form. In particular it has an exponential-affine form14
14
We use the negative square root in d, found in Gatheral (2006), unlike the original
formulation in Heston (1993). Albrecher, Mayer, Schoutens, and Tistaert (2007) discuss
this choice and show that the two are equivalent, but using the negative root offers
higher stability for long maturities. The problem arises due to the branch cuts of the
complex logarithm in C (u, T ). A description of the problem and a different approach
can be found in Kahl and Jäckel (2005).
LISTING 6.4: phi_heston.m: Characteristic function of the Heston model.
% phi_heston.m
function y = phi_heston(u, ps)
t     = ps.t;       % maturity
r     = ps.r;       % risk free rate of return
v     = ps.v;       % current variance
vbar  = ps.vbar;    % long run variance
theta = ps.theta;   % speed of mean reversion
xi    = ps.xi;      % volatility of volatility
rho   = ps.rho;     % correlation
d = sqrt((1i*rho*xi*u - theta).^2 + xi^2*(1i*u + u.^2));  % negative root
h = theta - 1i*rho*xi*u;
g = (h - d)./(h + d);
e = exp(-d*t);
DL = log((1 - g.*e)./(1 - g));
C = 1i*r*t*u + theta*vbar/xi^2*((h - d)*t - 2*DL);
D = (h - d)/xi^2.*(1 - e)./(1 - g.*e);
y = exp(C + D*v);
The characteristic function of the Heston model is given in listing 6.4. This can be used to compute European style vanilla calls and puts using the transform methods outlined in chapter 4. We will be using this approach later in this chapter to calibrate the Heston model to a set of observed option prices.
with initial value M0 = 1. The solution of this SDE is the exponential martingale (with respect to P), which has the form

Mt = exp( ∫₀^t Φs dBs − ½∫₀^t Φ²s ds + ∫₀^t Ψs dBvs − ½∫₀^t Ψ²s ds )
The processes Φt and Ψt are Ft -adapted, and therefore they can be functions
of (t, St , vt ). Based on this exponential martingale we can define a probability
measure Q, which is equivalent to P. In fact, every choice of processes Φ t and
Ψt will produce a different equivalent measure. The only constraint we need to
impose on these processes is that the discounted underlying asset price must
form a martingale under Q, which then becomes an equivalent martingale mea-
sure (EMM). The fundamental theorem of asset pricing postulates that if this is the case, then there will be no arbitrage opportunities in the market. It turns out that this constraint is not sufficient to identify both processes, something that we should anticipate since the market is incomplete and there will not be a unique EMM.
The EMM will be defined via its Radon-Nikodym derivative with respect to the true measure,

dQ/dP |t = Mt

If ΥT is a FT-measurable random variable, then expectations under the equivalent measure will be given as

EQt ΥT = EPt[ (MT/Mt) ΥT ]
It is useful to compute the expectations over an infinitesimal interval dt, as this will help us compute the drifts and volatilities under Q. In particular we will have

EQt dΥt = EPt[ ((Mt + dMt)/Mt) dΥt ] = EPt[ (1 + dMt/Mt) dΥt ] = EPt dΥt + EPt[ (Φt dBt + Ψt dBvt) dΥt ]
We can employ the above relationship to compute the drifts and the volatilities of the asset returns under Q. This verifies that under equivalent probability measures the drifts are adjusted but the volatilities are not. Now an EMM will be one that satisfies

EQt (dSt/St) = r dt
The function ΞS(t, S, v) is the market price of risk, the Sharpe ratio of the underlying asset. In order to construct a system we need a second equation, and essentially we have the freedom to choose the market price of volatility risk. Thus if we select a function EQt dvt = αQ(t, S, v)dt, which will be the variance drift under risk neutrality, we can set up a second equation

ρΦt + Ψt = −( α(vt) − αQ(t, St, vt) )/β(vt) = −Ξv(t, St, vt)

where Ξv(t, S, v) will be the price of volatility risk.
The market risk premium Ξ S will be typically positive, as the underlying
asset will offer expected returns that are higher than the risk free rate. This
reflects the fact that investors prefer higher returns, but are risk averse against
declining prices. When it comes to volatility, we would expect investors to prefer
lower volatility, and be risk averse against volatility increases. This indicates
that it would make sense to select α Q in a way that implies a negative risk pre-
mium Ξ v , and one that does not increase with volatility. Essentially this means
that α Q > α. In practice we will have to find a convenient parameterization for
α Q or Ξ v that leads to expressions that admit solutions, and at the same time
restrict the family of admissible EMMs. The parameter values cannot be deter-
mined from the dynamics of the underlying asset, but they can be recovered from
observed derivative prices.
If we solve the above system we can find the processes Φt and Ψt, and through them the appropriate EMM, as follows

Φt = ( −ΞS + ρΞv )/(1 − ρ²)
Ψt = ( −Ξv + ρΞS )/(1 − ρ²)
Finally, derivative prices can be written as expectations under Q, where the asset dynamics are the risk neutral ones derived above.

Example: The Heston model
In the Heston model the variance drift and diffusion functions are

α(v) = θ(v̄ − v),   β(v) = ξ√v
The price of risk is determined by the risk free rate and the asset price dynamics

ΞS(t, S, v) = (µ − r)/√v

We are free to select the price of volatility risk. Say we set it equal to

Ξv(t, S, v) = (κ/ξ)√v

for a parameter κ ≤ 0 (to conform with agents that are averse towards higher volatility). The resulting adjustment to the variance drift, −Ξv(t, S, v)β(v) = −κv, is then positive and increasing with the variance. In addition, such a risk premium will lead to risk neutral dynamics that have the same form as the dynamics under P.
Girsanov's theorem will give the process under Q

dSt = rSt dt + √vt St dBQt
dvt = αQ(vt)dt + ξ√vt dBv,Qt

The risk neutral variance drift is αQ(vt) = θ(v̄ − vt) − Ξv(t, St, vt)ξ√vt = θ(v̄ − vt) − κvt. Then we can rewrite the dynamics

dSt = rSt dt + √vt St dBQt
dvt = θQ(v̄Q − vt)dt + ξ√vt dBQ,vt

for the parameters θQ = θ + κ and v̄Q = θv̄/(θ + κ). Due to their risk aversion, manifested through the parameter κ ≤ 0, investors behave as if the long run volatility is higher than it really is, and as if volatility exhibits higher persistence.
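For intuition, these risk neutral dynamics can be simulated with a simple Euler scheme (a sketch of ours; the variance is truncated at zero to keep the square root well defined, and the parameter values are illustrative):

S0 = 100; v0 = 0.04; r = 0.05;                  % illustrative values
thetaQ = 2.0; vbarQ = 0.05; xi = 0.3; rho = -0.6;
T = 1; N = 252; dt = T/N;
S = S0; v = v0;
for i = 1:N
    z1 = randn;
    z2 = rho*z1 + sqrt(1 - rho^2)*randn;        % correlated shocks
    vp = max(v, 0);                             % truncate the variance
    S  = S*exp((r - 0.5*vp)*dt + sqrt(vp*dt)*z1);
    v  = v + thetaQ*(vbarQ - vp)*dt + xi*sqrt(vp*dt)*z2;
end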
Applying Itô's lemma on the pricing function gives the dynamics of the derivative price, dXt = αXt dt + βX,St dBt + βX,vt dBvt, where

αXt = ∂f/∂t + α(vt) ∂f/∂v + ½vt S²t ∂²f/∂S² + ½β²(vt) ∂²f/∂v² + ρ√vt St β(vt) ∂²f/∂S∂v

βX,St = √vt St ∂f/∂S

βX,vt = β(vt) ∂f/∂v
From this expression it is apparent that if we construct a portfolio using only the underlying asset and the bank account, we will not be able to replicate the price process Xt, since the risk source Bvt cannot be reproduced. The market that is based only on these instruments is incomplete, since the derivative cannot be replicated.
But we can dynamically complete the market using another derivative X*, with pricing function f*(t, S, v). This will work of course if X* actually depends on the BM Bvt, which is typically the case.15 In practice we would perhaps replicate X (say a barrier option), using the risk free asset, the underlying asset and a liquid derivative X* (say a vanilla at the money option).
Following BS, we short X and construct a portfolio of the underlying stock and the other derivative. We want to select the weights of this portfolio in a way that makes it risk free. Then it should grow at the risk free rate.
Say that at each point in time we hold ∆t units of the underlying asset and ∆*t units of the derivative, forming the portfolio Πt = Xt − ∆tSt − ∆*tX*t. Substituting for the dynamics of dXt, dSt and dX*t will give the portfolio dynamics

dΠt = (· · ·)dt + ( √vt St ∂f/∂S − ∆t √vt St − ∆*t √vt St ∂f*/∂S ) dBt + ( β(vt) ∂f/∂v − ∆*t β(vt) ∂f*/∂v ) dBvt
If we select the portfolio weights that make the parentheses equal to zero, then we have constructed a risk free portfolio. The solution is obviously16

∆*t = (∂f/∂v) (∂f*/∂v)⁻¹
∆t = ∂f/∂S − ∆*t ∂f*/∂S

And since the portfolio will now be risk free, it will also have to grow at the risk free rate of return

dΠt = rΠt dt = r(Xt − ∆tSt − ∆*tX*t)dt
We should expect that the drifts will give the PDE that we are looking for, but at the moment we have a medley of partial derivatives of both pricing functions f and f*. Nevertheless, we can carry on setting the portfolio drifts equal, which yields

αX + µS ∂f/∂S − ∆µS − ∆*αX* − ∆*µS ∂f*/∂S = r(f − ∆S − ∆*f*)

Since ∆ + ∆* ∂f*/∂S = ∂f/∂S, the drift of the underlying asset µ will cancel out, resembling the BS scenario. Furthermore, if we substitute the hedging weights ∆ and ∆* and rearrange to separate the starred from the non-starred elements

λ = ( αX + rS ∂f/∂S − rf ) / (∂f/∂v) = ( αX* + rS ∂f*/∂S − rf* ) / (∂f*/∂v) = λ*
The following line of argument is the most important part of the derivation, and the most tricky to understand at first reading: In the above expression the ratio λ (which depends only on f) is equal to the ratio λ* (which depends only on f*). Recall that f and f* are the pricing functions of two arbitrary derivatives, which means that the above ratio will be the same for all derivative contracts. If we selected another derivative contract X**, then for its pricing function λ = λ**, which implies λ = λ* = λ**, etc. This means that although

16 Apparently, for the solution to exist we need ∂f*/∂v ≠ 0. This corresponds to our previous remark that a forward contract cannot serve as the hedging instrument.
where the dynamics of the underlying asset and its volatility are given by the SDEs

dSt = rSt dt + √vt St dBQt
dvt = αQ(t, St, vt)dt + β(vt)dBQ,vt
EQt dBQt dBQ,vt = ρdt

with the drift of the variance process given by αQ(t, S, v) = α(v) − Ξv(t, S, v)β(v). The price of volatility risk is Ξv.
Using the PDE approach we concluded that the pricing function f(t, S, v) will solve the PDE

∂f/∂t + αQ(t, S, v) ∂f/∂v + ½vS² ∂²f/∂S² + ½β²(v) ∂²f/∂v² + ρ√v S β(v) ∂²f/∂S∂v + rS ∂f/∂S = rf
17
An identical line of argument is used in fixed income securities, which we will follow
in chapter XX.
The free functional λ that we introduced in the PDE approach can be interpreted as the total volatility risk premium. For investors that are averse towards high volatility λ ≤ 0. In the Heston model the PDE becomes

∂f/∂t + {θ(v̄ − v) − λ(t, S, v)} ∂f/∂v + ½vS² ∂²f/∂S² + ½ξ²v ∂²f/∂v² + ρvSξ ∂²f/∂S∂v + rS ∂f/∂S = rf

In his original paper, Heston assumes λ(t, S, v) to be proportional to the variance v

λ(t, S, v) = λv

Essentially, following our previous discussion, this indicates that the equivalent function Ξv in the EMM approach will be

Ξv(t, S, v) = λ(t, S, v)/β(v) = (λ/ξ)√v

This means that the parameter λ of the PDE approach has exactly the same interpretation as κ. This choice for λ sets the PDE

∂f/∂t + θQ(v̄Q − v) ∂f/∂v + ½vS² ∂²f/∂S² + ½ξ²v ∂²f/∂v² + ρvSξ ∂²f/∂S∂v + rS ∂f/∂S = rf

The boundary conditions also need to be specified. Following Heston (1993), for a European call option
CALIBRATION
Even if we estimate the parameters of a stochastic volatility model using historical time series of asset returns, not all parameters would be useful for the purpose of derivative pricing. This happens because the estimated parameters would be the ones under the true probability measure, while investors will use some adjusted parameters to price derivatives. In particular, for stochastic volatility models the drift of the variance will be a modification of the true one, which is done by setting the price of volatility risk. To recover this price of risk, one should consult some existing derivative prices.
For that reason, practitioners and (to some extent) academics prefer to use only derivative prices, and calibrate the model based on a set of liquid options.
A standard setting is one where a derivatives desk wants to sell an exotic option and then hedge its exposure, using, say, a stochastic volatility model. The desk would look at the market prices of liquid European calls and puts, and would calibrate the pricing function to these prices. Such parameters are the risk neutral ones, and can therefore be used unmodified to price and hedge the exotic option. In a sense, they are a generalization of the BS implied volatilities: practitioners want to price the exotic contract in a way that is consistent with the observed vanillas.
If the calibrated model was the one that actually generated the data, then these implied parameters should be stationary through time, and their variability should be due to measurement errors alone. In practice of course this is not the case, and practitioners tend to recalibrate some parameters every day (and sometimes more often).
To implement this calibration we will need to minimize some measure of distance between the theoretical model prices and the prices of observed options. Say that we have a pricing function P(τ, K; θ) = P(τ, K; S0, r; θ), where θ denotes the set of unobserved parameters that we need to extract. Also denote with σ(τ, K; S0, r; θ) the implied volatility of that theoretical price, and with P*(τ, K) and σ*(τ, K) the observed market price and implied volatility. For example, in Heston's case θ = {v0, θ, v̄, ξ, ρ}. There are many objective functions that one can use for the minimization, the most popular having a weighted sum of squares form

G(θ) = Σᵢ Σⱼ wᵢ,ⱼ ( P(τᵢ, Kⱼ; θ) − P*(τᵢ, Kⱼ) )²
The weights wᵢ,ⱼ can be used to different ends. Sometimes the choice of wᵢ,ⱼ reflects the liquidity of different options, using a measure such as the bid-ask spread. In other cases one wants to give more weight to options that are near-the-money (using for example the Gamma), or to options with shorter maturities. One might also want to implement a weighting scheme based on the options' Vega, in order to mimic an objective function that is cast in the implied volatility space.
Recovering the parameter set θ is not a trivial problem, as the objective function can (and in many cases does) exhibit multiple local minima. This is a common feature of inverse problems like this calibration exercise. Typically some regularization is implemented, in order to make the problem well posed for standard hill climbing algorithms. A popular example is Tikhonov-Phillips regularization (see Lagnado and Osher, 1997; Crépey, 2003, for an illustration), where the objective function is replaced by a penalized version of the form

G̃(θ) = G(θ) + α ‖θ − θ0‖²

for a prior parameter vector θ0 and a penalty weight α.
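In code the regularization is a one-line modification of the objective (a sketch; G denotes the sum of squares function above, theta0 the prior parameter vector and alpha the penalty weight):

% Tikhonov-Phillips regularized objective
Gtil = @(theta) G(theta) + alpha*norm(theta - theta0)^2;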
Calibration example
As an example we will fit Heston’s stochastic volatility model to a set of observed
option prices. In particular, we are going to use contracts on the SP500 index
written on April 24, 2007. The objective function that we will use is just the sum
of squared differences between model and observed prices. Listing 6.5 gives the
code that computes the objective function. The prices are computed using the
fractional FFT (see chapter 4), and the integration bounds are automatically
selected to reflect the decay of the integrand ψ(u, T ).19
The snippet 6.6 shows how this objective function can be implemented to
calibrate Heston’s model using a set of observed put prices on the SP500 index.
There are eight different maturities in the data set, ranging from 13 to 594 days.
The sum of squared differences between the theoretical and observed prices is
minimized, and for this example we did not use any weighting scheme. Figure
6.6 shows the observed option prices and the corresponding fitted values. The
table below gives the calibrated parameters θ̂ = {v0, θ, v̄, ξ, ρ}

v0     0.0219
θ      5.5292
v̄      0.0229
ξ      1.0895
ρ     −0.6459
18
This is particularly true for calibrating local volatility models which have a large
number of parameters. We will discuss this family of models in the next section.
19
As Kahl and Jäckel (2005) show, the characteristic function of the Heston model for
large parguments decays as A exp(−uC )/u2 times a cosine (whereRA = ψ(0, T ) and
∞
C = 1 − ρ2 (v0 + θv̄T )/ξ). We can therefore bound the integral z |ψ(u, T )|du by
|A| exp(−C z)/z. The solution of exp(w)w = x is Lambert’s W function which is imple-
mented in Matlab through the Symbolic Math Toolbox. If this toolbox is not available
we have to devise a different strategy to set the upper integration bound, for example
using the moments expansion for the characteristic function. If everything else fails
we can just set a ‘large value’ for the upper integration bound, or set up an adaptive
integration scheme.
LISTING 6.5: ssq_heston.m: Sum of squares for the Heston model.
% ssq_heston.m
function y = ssq_heston(par, pf, data)
ps.v = par(1);     ps.vbar = par(2);
ps.theta = par(3); ps.xi   = par(4);
ps.rho = par(5);
% data columns (layout assumed): call/put flag, maturity in days,
% strike, price, spot, rate in percent
CP = data(:,1); T = data(:,2)/252; K = data(:,3);
P  = data(:,4); S = data(:,5);     r = data(:,6)/100;
% normalize prices and strikes for S = 1
Pn = P./S; Kn = K./S;
% select the different maturities
[Tu, iu] = unique(T); ru = r(iu); Nu = length(Tu);
y = 0;                             % will keep the output SSQ
for n = 1:Nu                       % loop through maturities
    ps.t = Tu(n);                  % set Heston parameters
    ps.r = ru(n);
    iC = (T==Tu(n)) & (CP==1);     % select calls
    iP = (T==Tu(n)) & (CP~=1);     % select puts
    % set the upper integration bound via lambertw; if the function
    % lambertw is not available set pf.ubar to a large value
    a  = real(phi_heston(1e-8, ps));
    ac = sqrt(1 - ps.rho^2)*(ps.v + ps.theta*ps.vbar*ps.t)/ps.xi;
    pf.ubar = lambertw(a*ac/1e-8)/ac;
    % run the FRFT pricing engine for Heston (chapter 4)
    [Kv, Cv] = frft_call(@phi_heston, ps, pf);
    % construct strikes and put prices (from parity)
    Kv = exp(Kv); Pv = Cv + exp(-ru(n)*Tu(n))*Kv - 1;
    Cf = interp1(Kv, Cv, Kn(iC));  % interpolate at the sample strikes
    Pf = interp1(Kv, Pv, Kn(iP));
    % update the sum of squares
    y = y + sum((Cf - Pn(iC)).^2) + sum((Pf - Pn(iP)).^2);
end
y = log(y);                        % take logs to help the optimization
LISTING 6.6: calib_heston.m: Calibration of the Heston model.
% calib_heston.m
% import option prices data (file name assumed)
data = xlsread('SPX_20070424.xls');
% set up parameters for the FRFT
pf.eta   = 0.25;    % grid spacing parameter
pf.N     = 512;     % number of FFT points
pf.kappa = 1;       % fractional FFT parameter
% initial parameter set {v0, vbar, theta, xi, rho} (indicative values)
par = [0.02 0.02 2.00 0.50 -0.50];
% options for the optimization
opt = optimset('LargeScale', 'off', 'Display', 'iter');
par = fmincon(@ssq_heston, par, [], [], [], [], ...
      [0.001 0.001 0.01 0.01 -1.00], ...
      [0.500 0.500 20.0 5.00 1.00], ...
      [], opt, pf, data);
FIGURE 6.6: Calibrated option prices for Heston's model. The red circles give the observed put prices, while the blue dots are the theoretical prices based on Heston's model that minimize the squared errors.
FIGURE 6.7: The ill-posed inverse problem in Heston's case. Subfigure (a) gives the objective function G(θ) that is minimized to calibrate the parameters. Subfigure (b) presents the isoquants of this function, together with the minimum point attained using numerical optimization. Observe that all points that are roughly across the red line are indistinguishable. The regularized function is given in (c), while (d) shows its isoquants. Observe that the regularized function is better behaved. (a) the function G(θ); (b) isoquants of G(θ); axes θ and ξ.
set of values, indicating that it is very hard to precisely identify the optimal
parameter combination. It is apparent that combinations of values across the
red line in 6.7.b will give values for the objective function that are very close.
This means that based on this set of vanilla options the combinations (θ, ξ) =
(5.0, 1.0), (10.0, 1.7) or (15.0, 2.5) are pretty much indistinguishable.
One way around this problem would be to enhance the information, by in-
cluding more contracts such as forward starting options or cliquet options. 20
20 A forward starting option is an option that has some features that are not determined until a future time. For example, one could buy (and pay today for) a put option with three years maturity, but where the strike price will be determined as the level of the SP500 after one year. Essentially one buys today what is going to be an ATM put in a year's time. A cliquet or ratchet option is somewhat similar, resembling a basket of forward starting options. For example I could have a contract where every year the
In that way estimates will be biased towards combinations where the prior value is ξ0. For example, the estimation results of Bakshi, Cao, and Chen (1997) based on option prices, and the joint estimation of returns and volatility in Pan (1997), indicate a value of ξ ≈ 0.40. Therefore, if we set ξ0 = 0.40 and α = 0.005, the objective function to be minimized is the one given in figure 6.7(c,d). The optimal values are now given in the following table
v0     0.0200
θ      3.5260
v̄      0.0232
ξ      0.7310
ρ     −0.7048
The new objective function at the optimal is G̃(θ̂) = 0.0099 which implies a
sum of squares value G(θ̂) = 0.0094, which is not far from the unconditional
optimization result.
LISTING 6.7: nadwat.m: Nadaraya-Watson smoother.
% nadwat.m
function zi = nadwat(x, y, z, w, xi, yi, hx, hy)
% bivariate Nadaraya-Watson smoother with Gaussian kernels
N = length(x);
if isempty(w)
    w = ones(N,1);         % equal weights if none are supplied
end
Ki  = 0;                   % accumulated kernel weights
Kiz = 0;                   % accumulated weighted observations
for j = 1:N
    xe = exp(-0.5*((xi - x(j))/hx).^2)/(sqrt(2*pi)*hx);   % kernel in x
    ye = exp(-0.5*((yi - y(j))/hy).^2)/(sqrt(2*pi)*hy);   % kernel in y
    Kiz = Kiz + w(j)*z(j)*xe.*ye;
    Ki  = Ki  + w(j)*xe.*ye;
end
zi = Kiz./Ki;              % smoothed values on the (xi, yi) grid
As vanilla options are expressed via the risk neutral expectation of the random
variable ST , local volatility models attempt to construct the function σ(t, S) that
is consistent with the implied risk neutral densities for different maturities. The
methodology of local volatility models follows the one on implied risk neutral
densities, originating in the pioneering work of Breeden and Litzenberger (1978).
These methods are inherently nonparametric, and rely on a large number
of option contracts that span different strikes and maturities. In reality there
is only a relatively small set of observed option prices that is traded, and for
that reason some interpolation or smoothing techniques must be employed to
artificially reconstruct the true pricing function or the volatility surface. Of course
this implies that the results will be sensitive to the particular method that is
used. Also, care has to be taken to ensure that the resulting prices are arbitrage
free.
INTERPOLATION METHODS
There are many interpolation methods that one can use on the implied volatility surface. As second order derivatives of the corresponding pricing function are required, it is paramount that the surface is sufficiently smooth. In fact, it is common practice to sacrifice the perfect fit in order to ensure smoothness, which suggests that we are actually implementing an implied volatility smoother rather than an interpolator. Within this obvious tradeoff we have to select the degree of fit versus smoothness, which is more of an art than a science.
One popular approach is to use a family of known functions, and reconstruct the volatility surface as a weighted sum of them. As an example we can use the radial basis function (RBF) expansion

f(x) = c0 + c′x + Σ_{n=1}^N λn φ(‖x − xn‖)
LISTING 6.8: imp_vol.m: Implied volatility surface smoothing.
% imp_vol.m
data = xlsread('SPX_20070424.xls');          % file name assumed
CP = data(:,1); T = data(:,2)/252; K = data(:,3);
P  = data(:,4); S = data(:,5);     r = data(:,6)/100;
[IV, IVi] = bs_iv(S, K, r, T, CP, P);        % implied vols (helper assumed)
[Tu, iu] = unique(T); ru = r(iu);
% create the output grids
dK = 10; dT = 0.05;
Ko = 1300:dK:1650; To = dT:dT:2;
NKo = length(Ko); NTo = length(To);
[Ko, To] = meshgrid(Ko, To);
Kvo = reshape(Ko, [NKo*NTo, 1]);             % vectorize
Tvo = reshape(To, [NKo*NTo, 1]);
Tmo = log(Tvo);                              % transform
Kmo = log(S(1)./Kvo)./sqrt(Tvo);
% prepare the actual data
Tm = log(T);                                 % transform
Km = log(S(1)./K)./sqrt(T);
% Nadaraya-Watson smoother (alternative)
% IVo = nadwat(Km, Tm, IV, [], Kmo, Tmo, 0.05, 0.10);
% Radial Basis Function smoother
coef = rbfcreate([Km'; Tm'], IV', ...
       'RBFFunction', 'multiquadric', 'RBFSmooth', 0.005);
IVo = rbfinterp([Kmo'; Tmo'], coef)';
% interpolate risk free rates for the output grid
rvo = interp1(Tu, ru, Tvo, 'linear', 'extrap');
% compute prices and reshape into matrices
Pvo = blsprice(S(1), Kvo, rvo, Tvo, IVo);
IVo = reshape(IVo, [NTo, NKo]);
Po  = reshape(Pvo, [NTo, NKo]);
ro  = reshape(rvo, [NTo, NKo]);
The points that we observe are given at the nodes xn, for n = 1, . . . , N. The radial function φ(x) will determine how the impact of the value at each node behaves. Common radial functions include the Gaussian φ(x) = exp(−x²/(2σ²)) and the multiquadratic φ(x) = √(1 + (x/σ)²), among others.21 The values of the parameters c0, c and λn are determined using the observed value function at the nodes xn and the required degree of smoothness. Figure 6.8(a) presents a set of implied volatilities smoothed using the RBF method.

FIGURE 6.8: Implied volatility surfaces smoothed using the radial basis function (RBF, left) and the Nadaraya-Watson (NW, right) methods. The corresponding local volatility surfaces and the implied probability density functions for different maturities are also shown. (a) implied volatility (RBF); (b) implied volatility (NW).

21 The parameter σ is user defined. In Matlab the RBF interpolation is implemented in the package of Alex Chirokov that can be downloaded at www.mathworks.com/matlabcentral.
[Figure: static arbitrage checks on the smoothed call price surfaces, across strike and maturity. (a) vertical spreads (RBF); (b) vertical spreads (NW); (c) butterfly spreads (RBF); (d) butterfly spreads (NW).]
LISTING 6.9: tests_vol.m: Tests for static arbitrage.
% tests_vol.m
imp_vol       % load data and smooth the volatility surface (listing 6.8)
% vertical spreads
VS = (Po(:,1:end-1) - Po(:,2:end))/dK;
VS = (VS >= 0) & (VS <= 1);
% butterfly spreads
BS = Po(:,1:end-2) - 2*Po(:,2:end-1) + Po(:,3:end);
BS = (BS >= 0);
% calendar spreads
CS = Po(2:end,:) - Po(1:end-1,:);
CS = (CS >= 0);
f(x) = ( Σ_{n=1}^N wn yn exp(−(x − xn)′H(x − xn)) ) / ( Σ_{n=1}^N wn exp(−(x − xn)′H(x − xn)) )

where yn is the observed value at the point xn, and the matrix H = diag(h1, . . . , hd) of bandwidths is user defined. This is implemented for the two-dimensional case in listing 6.7. Figure 6.8(b) gives the implied volatility surface smoothed using the Nadaraya-Watson method.
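Using listing 6.7, smoothing the observed implied volatilities onto a regular grid could look like the following (a sketch; the bandwidths are illustrative and the variable names follow the listing):

% smooth observed implied vols IV at points (K, T) onto the grid (Ko, To)
hK = 0.05; hT = 0.10;                        % illustrative bandwidths
IVo = nadwat(K, T, IV, [], Ko, To, hK, hT);  % equal weights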
Of course the smoothed or interpolated volatility surface can be mapped
to call and put prices using the Black-Scholes formula. There is also a num-
ber of restrictions that one needs to take into account when constructing the
volatility surface. In particular, it is important to verify that the resulting prices
do not permit arbitrage opportunities. As shown in Carr and Madan (2005) it
is straightforward to rule out static arbitrage by checking the prices of sim-
ple vertical spreads, butterflies and calendar spreads. More precisely, having
constructed a grid of call prices for different strikes 0 = K0, K1, K2, . . . and maturities 0 = T0, T1, T2, . . ., with Ci,j = fBS(t, S; Ki, Tj, r, σ̂(Ki, Tj)), we need to construct the following quantities

1. Vertical spreads VSi,j = (Ci−1,j − Ci,j)/(Ki − Ki−1). There should be 0 ≤ VSi,j ≤ 1 for all i, j = 0, 1, . . .
2. Butterfly spreads BSi,j = Ci−1,j − ((Ki+1 − Ki−1)/(Ki+1 − Ki)) Ci,j + ((Ki − Ki−1)/(Ki+1 − Ki)) Ci+1,j. There should be BSi,j ≥ 0 for all i, j = 0, 1, . . .
3. Calendar spreads CSi,j = Ci,j+1 − Ci,j. There should be CSi,j ≥ 0 for all i, j = 0, 1, . . .

A popular parametric form for each maturity slice is the SVI ('stochastic volatility inspired') parametrization of Gatheral (2004) for the total implied variance,

σ̂²(k)T = α + β( ρ(k − µ) + √((k − µ)² + σ²) )

where k = log(K/F). This form always remains positive and ensures that it grows in a linear fashion for extreme log-strikes. In particular Gatheral (2004) shows that α controls the variance level, β controls the angle between the two asymptotes, σ controls the smoothness around the turning point, ρ controls the orientation of the skew, and µ shifts the skew across the moneyness level.
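A direct implementation of the SVI slice (a sketch; the parameter names follow the text):

% SVI total implied variance as a function of the log-strike k
svi = @(k, alpha, beta, rho, mu, sigma) ...
      alpha + beta*(rho*(k - mu) + sqrt((k - mu).^2 + sigma^2));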
IMPLIED DENSITIES
Based on the implied volatility function σ̂(T, K) the empirical pricing function is easily determined via the Black-Scholes formula

P(T, K) = fBS(t, S0; T, K, r, σ̂(T, K))

It has been recognized, since Breeden and Litzenberger (1978), that the empirical pricing function can reveal information on the risk neutral probability density that is implied by the market. In particular, if Qt(S) is this risk neutral probability measure of the underlying asset with horizon t, then the call price can be written as the expectation

P(T, K) = exp(−rT) ∫_K^∞ (S − K) dQT(S)

If we differentiate twice with respect to the strike price, using the Leibniz rule

∂/∂t ∫_{α(t)}^{β(t)} g(x, t)dx = g(β(t), t) dβ(t)/dt − g(α(t), t) dα(t)/dt + ∫_{α(t)}^{β(t)} ∂g(x, t)/∂t dx
LISTING 6.10: dloc_vol.m: Construction of implied densities and the local volatility surface.
% dloc_vol.m
imp_vol       % load data and smooth the volatility surface (listing 6.8)
% 1st derivative w.r.t. maturity T
D1T = (Po(2:end,:) - Po(1:end-1,:))/dT;
% 1st derivative w.r.t. strike K
D1K = (Po(:,3:end) - Po(:,1:end-2))/(2*dK);
% 2nd derivative w.r.t. strike K
D2K = (Po(:,3:end) - 2*Po(:,2:end-1) + Po(:,1:end-2))/dK^2;
% implied probability density function
Flo = exp(ro(2:end,2:end-1).*To(2:end,2:end-1)).*D2K(2:end,:);
% local volatility function (Dupire)
VL = D1T(:,2:end-1) + ro(2:end,2:end-1).*Ko(2:end,2:end-1).*D1K(2:end,:);
VL = VL./(0.5*Ko(2:end,2:end-1).^2.*D2K(2:end,:));
VL = sqrt(VL);
we obtain the Breeden and Litzenberger (1978) expression for the implied probability density function

dQT(S) = exp(rT) ∂²P(T, K)/∂K² |K=S    (6.1)
One can recognize that the above expression is the price of 1/(ΔK)² units of a very tight butterfly spread around S, like the one used in the static arbitrage tests above. The relation between the butterfly spread and the risk neutral probability density is well known amongst practitioners, and can be used to isolate the exposure to specific ranges of the underlying. We carry out this approximation in listing 6.10, and the resulting densities are presented in figures 6.8(e,f).
LOCAL VOLATILITIES
A natural question that follows is whether or not a process exists that is consis-
tent with the sequence of implied risk neutral densities. After all, Kolmogorov’s
extension theorem 1.3 postulates that given a collection of transition densities
such a process might exist. Dupire (1994) recognized that one might be able to
find a diffusion which is consistent with the observed option prices, constructing
We can integrate the above expression twice with respect to K which will even-
tually yield the PDE22
The above links the local volatility model with prices of observed call options 23
and in principle it could be used to extract the local volatility function σ(t, S)
from a set of observed contracts. Unfortunately, there is a number of practical
problems with this approach, which stem from the fact that the local volatility is
a function of first and second derivatives of the pricing function P(T , K ). For a
start, there is only a relatively small number of calls and puts available at any
point in time, which means that we will need to set up some interpolation before
we carry out the necessary numerical differentiation using finite differences.
Therefore, our results will be dependent on the interpolation scheme that we
use.
In addition, the observed option prices are “noisy”, and interpolating through
their values will cause its own problems. Numerical differentiation is unsta-
ble at the interpolating nodes, and attempting to take the second derivative
is a guarantee for disaster, with the resulting local volatility surfaces varying
22 During the second integration we use the identity K ∂²P(T, K)/∂K² = ∂/∂K( K ∂P(T, K)/∂K ) − ∂P(T, K)/∂K.
23 And also put options, through the put-call parity.
If one assumes a form for the implied volatility function, either using an interpolator or a smoother, it is possible to express the local volatility σ(t, S) in terms of the implied volatility σ̂(T, K)|T=t,K=S. This is of course feasible since the pricing function is P(T, K) = fBS(t, S0; T, K, r, σ̂(T, K)), which can be differentiated analytically with respect to the strike K and the maturity T. It is actually more convenient to work with the moneyness y = log(K/F) = log(K/S) + rT, and also consider the implied total variance as a function of the maturity and the moneyness, w(T, y) = T σ̂²(T, K). Then, as shown in Gatheral (2006), the local variance can be computed as

σ²(T, y) = ( ∂w/∂T ) / ( 1 − (y/w) ∂w/∂y + ((4y² − 4w − w²)/(16w²)) (∂w/∂y)² + ½ ∂²w/∂y² )

where all derivatives are evaluated at (T, y).
factor that will affect the bond yield. When the yield of an instrument is quoted,
it is important to know what compounding method has been used, in order to
truly compare bonds.
In particular, let Pt denote the price of an instrument at time t (measured in
years). The simple yield y1 (t1 , t2 ), between two dates t1 and t2 , satisfies
$$\frac{P_{t_2}}{P_{t_1}} = 1 + y_1(t_1,t_2)(t_2 - t_1) \;\Rightarrow\; y_1(t_1,t_2) = \frac{1}{t_2-t_1}\left(\frac{P_{t_2}}{P_{t_1}} - 1\right)$$
The simple yield is the return of an investment equal to $P_{t_1}$ that is initiated at
time $t_1$, and is then liquidated at time $t_2$ for a price $P_{t_2}$. There is no intermediate
reinvestment of any possible proceeds.
Of course, one could sell the instrument at the intermediate time $t^\star = \frac{t_1+t_2}{2}$
for a price $P_{t^\star}$, and reinvest this amount for the remaining time to $t_2$. Say that the
yield of this strategy is denoted $y_2(t_1,t_2)$. In that case the two simple investments
will satisfy
$$\frac{P_{t_2}}{P_{t^\star}} = 1 + y_2(t_1,t_2)(t_2 - t^\star),\qquad \frac{P_{t^\star}}{P_{t_1}} = 1 + y_2(t_1,t_2)(t^\star - t_1)$$
Multiplying the two will give the yield if we compound twice during the life of the
bond, namely
$$\frac{P_{t_2}}{P_{t_1}} = \left(1 + y_2(t_1,t_2)\frac{t_2-t_1}{2}\right)^2 \;\Rightarrow\; y_2(t_1,t_2) = \frac{2}{t_2-t_1}\left[\left(\frac{P_{t_2}}{P_{t_1}}\right)^{1/2} - 1\right]$$
More generally, if we compound m times over the life of the bond we can
follow the same procedure to deduce the yield $y_m(t_1,t_2)$
$$\frac{P_{t_2}}{P_{t_1}} = \left(1 + y_m(t_1,t_2)\frac{t_2-t_1}{m}\right)^m \;\Rightarrow\; y_m(t_1,t_2) = \frac{m}{t_2-t_1}\left[\left(\frac{P_{t_2}}{P_{t_1}}\right)^{1/m} - 1\right]$$
Letting $m\to\infty$ gives the continuously compounded yield, $y_\infty(t_1,t_2) = \frac{1}{t_2-t_1}\log\frac{P_{t_2}}{P_{t_1}}$.
We will work with the continuously compounded return from now on, and we
will drop the subscript ∞, writing instead $y_\infty(t,T) = y(t,T) = y_t(T)$. We will
also denote with $P(t,T) = P_t(T)$ the price at time t of a bond that matures at time T.
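As a quick numerical check, the yields under increasing compounding frequencies converge to this limit (prices and dates illustrative):

P1 = 0.90;  P2 = 1.00;          % purchase and liquidation prices
tau = 2;                        % years between the two dates
m  = [1 2 4 12 52 250];         % number of compounding periods
ym = m/tau.*((P2/P1).^(1./m) - 1);
yinf = log(P2/P1)/tau;          % the continuously compounded limit
disp([ym yinf])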
$$\frac{100}{P_{t_1}} = (1 + 0.055\times 0.366)\,(1 + 0.055\times 0.5)^3$$
The financial toolbox of Matlab has a number of functions that convert be-
tween different day count conventions, and the appropriate discount factors.
Here, since the markets are set up in continuous time we will use continuous
compounding.
² More information can be found at the International Swaps and Derivatives Association
website (www.isda.org), and in particular in ISDA (1998).
FIGURE 7.1: Examples of yield curves using the Nelson-Siegel-Svensson
parametrization. The parametric form is able to produce curves that exhibit the
basic yield curve shapes (flat, normal, inverted and humped; yield against maturity).
Nelson and Siegel (1987) and Svensson (1994), collectively denoted with NSS,
discuss various parametric forms of the yield curve, summarized in the form
$$y(\tau) = \beta_0 + \beta_1\frac{1 - e^{-\tau/\tau_1}}{\tau/\tau_1} + \beta_2\left[\frac{1 - e^{-\tau/\tau_1}}{\tau/\tau_1} - e^{-\tau/\tau_1}\right] + \beta_3\left[\frac{1 - e^{-\tau/\tau_2}}{\tau/\tau_2} - e^{-\tau/\tau_2}\right]$$
LISTING 7.1: nelson_siegel_svensson.m: Yields based on the Nelson-Siegel-
Svensson parametrization.

function y = nelson_siegel_svensson(T, par)
% Nelson-Siegel-Svensson yields at maturities T, for parameters
% collected in the structure par (fields beta0..beta3, tau1, tau2)
b0 = par.beta0;  b1 = par.beta1;
b2 = par.beta2;  b3 = par.beta3;
t1 = par.tau1;   t2 = par.tau2;
h1 = (1 - exp(-T/t1))./(T/t1);               % slope component
h2 = h1 - exp(-T/t1);                        % first hump/trough
h3 = (1 - exp(-T/t2))./(T/t2) - exp(-T/t2);  % second hump/trough
y  = b0 + b1*h1 + b2*h2 + b3*h3;
Listing 7.1 gives a simple Matlab code that implements the NSS formula.
Individual yield curves can be used to calibrate this formula and retrieve the
corresponding parameters.
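For instance, a 'normal' upward sloping curve like the ones in figure 7.1 can be produced as follows (parameter values illustrative):

par.beta0 = 5.0;  par.beta1 = -1.5;  par.beta2 = 0.5;
par.beta3 = 0.0;  par.tau1  = 2.0;   par.tau2  = 5.0;
T = 0.25:0.25:10;
y = nelson_siegel_svensson(T, par);
plot(T, y);  xlabel('maturity');  ylabel('yield');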
In this chapter we are interested in the construction of mathematical models
that have two desirable and very significant features. On one hand, they should
have the potential to reproduce observed yield curves. In addition, they must be
able to capture the evolution of the yield curve through time, in order to offer
reliable prices for derivative contracts that are based on future yields or bond
prices.
[Figure: the observed yield curves over 2001–2007, plotted against time and maturity.]
LISTING 7.2: calibrate_ns.m: Calibration of the Nelson-Siegel formula to a
yield curve.

function par = calibrate_ns(T, y)
% Fit the Nelson-Siegel form (the Svensson extension switched off)
% to an observed yield curve (T, y) by nonlinear least squares.
opt = optimset('Display', 'none');
p   = lsqnonlin(@ssq, [mean(y) 0 0 1], [], [], opt);
par = makepar(p);
    function sq = ssq(q)
        sq = nelson_siegel_svensson(T, makepar(q)) - y;
    end
    function par = makepar(q)
        par.beta0 = q(1);  par.beta1 = q(2);  par.beta2 = q(3);
        par.beta3 = 0;     par.tau1  = q(4);  par.tau2  = 1;
    end
end
[Figure: time series of the calibrated Nelson-Siegel parameters β0, β1 and τ1 over 2002–2006.]
which are the fixed rates of return that are set and reserved at time t, but will
be applicable over a future time period.
In particular, say that we select two points on the yield curve, for bonds that
mature at times T ? and T , with T > T ? > t. The prices of these bonds will be
Pt (T ) and Pt (T ? ), respectively. Now assume that we are interested in setting the
forward (continuously compounded) rate of interest for an investment that will
commence at time T ? and will mature at T , which we will denote with ft (T ? , T ).
Consider the following two investments over the period [t, T ]:
1. Buy one risk free bond that matures at time T . This will cost P t (T ) today,
and will deliver one pound at time T .
2. Buy Pt (T )/Pt (T ? ) units of the risk free bond that matures at time T ? . Also
enter a forward contract to invest risk-free over the period [T ? , T ], at the
rate ft (T ? , T ). This strategy will also cost Pt (T ) today, as it is free to enter
a forward contract. The first leg will deliver Pt (T )/Pt (T ? ) pounds at time T ? ,
which will be invested at the forward rate. Therefore at time T this strategy
will deliver exp{ft (T ? , T ) · (T − T ? )} · Pt (T )/Pt (T ? ).
These two strategies have the same initial cost to set up, the same maturity,
and are both risk free. Therefore they should deliver the same amount on the
maturity date T , otherwise arbitrage opportunities would arise. For example, if
the second strategy was delivering more than one pound at time T , then one
would borrow Pt (T ) at the risk free rate to enter the second strategy with zero
cost at time zero.
Therefore, the arbitrage free forward rate will satisfy
$$P_t(T) = P_t(T^\star)\cdot\exp\{-f_t(T^\star,T)\cdot(T-T^\star)\} \;\Rightarrow\; f_t(T^\star,T) = -\frac{1}{T-T^\star}\log\frac{P_t(T)}{P_t(T^\star)}$$
If we let the time between the two maturities shrink to zero, by letting
for example $T^\star \to T$, we define the (instantaneous) forward rate. This is
essentially the short rate that we can reserve today, but which will apply at time T
$$f_t(T) = -\lim_{T^\star\uparrow T}\frac{\log P_t(T) - \log P_t(T^\star)}{T - T^\star} = -\frac{\partial \log P_t(T)}{\partial T} = y_t(T) + (T-t)\frac{\partial y_t(T)}{\partial T}$$
Forward rates for different maturities define the forward curve. There is a corre-
spondence between the yield and forward curves, and knowing one leads to the
other.
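As a small numerical illustration, the forward curve can be recovered from a discretely sampled yield curve with a finite difference (here T holds a grid of maturities and y the corresponding yields at t = 0; names illustrative):

dy = gradient(y, T);            % numerical derivative dy/dT
f  = y + T.*dy;                 % instantaneous forward rates
plot(T, y, T, f)
legend('yield curve', 'forward curve')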
Our objective is to establish prices for bonds with different maturities. The only
constraint that we need to take into account is that the prices of these bonds
must rule out any arbitrage opportunities. In all generality, the price (at time t)
of a bond with maturity T can depend at most on the time t and the short rate
level rt , that is to say
Pt (T ) = g(t, r(t); T )
This formalizes the statement we made above, that the bond is a derivative on
the short rate. It appears that the setting is similar to the one in equity derivative
pricing, if we consider the short rate as the underlying asset. In particular we
can see the analogy: the short rate $r_t$ plays the role of the underlying asset $S_t$,
and the bond pricing function $g(t,r_t;T)$ the role of the option pricing function.
Although the two settings appear to be very similar, there is a very significant
difference: Unlike equities, the short rate is not a traded asset. This means that
we cannot buy or sell the short rate, and therefore we cannot construct the
necessary risk free positions that produced the Black-Scholes PDE. The market,
as we constructed it, is incomplete.
In fact, the pricing of bonds has more common features with the pricing
of options under stochastic volatility, where again we introduced a non-traded
factor (the volatility of the equity returns). Then (section 6.3) we constructed a
portfolio of two options, in order to solve for the price of volatility risk. Here we
will use the same trick, namely to construct a portfolio of two bonds with different
maturities, and investigate the conditions that would make it (instantaneously)
risk free. This will naturally introduce the price of short rate risk that will be
unknown; we will be able to determine this price of risk by calibrating the model
on the observed yield curve, in the same spirit as the calibration of SV models
on the implied volatility surface. These are summarized in the following table
                      equity SV       fixed income
non-traded asset:     volatility      short rate
used to hedge:        2 options       2 bonds
calibrate on:         IV surface      yield curve
Now we invoke the same line of argument that we used in section 6.3. In
order to set up the above relationship we did not explicitly specify a particular
pair of bonds, and it will therefore hold for any pair of maturities. Thus, for any
set of maturities $T_1, T_2, T_3, T_4, \dots$ we can write
$$\frac{\alpha_t(T_1) - g(t,r_t;T_1)\,r_t}{\beta_t(T_1)} = \frac{\alpha_t(T_2) - g(t,r_t;T_2)\,r_t}{\beta_t(T_2)} = \cdots$$
Therefore the ratio cannot depend on the particular bond maturity; it can at
most depend on $(t, r_t)$, say that it is equal to $\lambda(t, r_t)$. This means that we can
write
$$\frac{\alpha_t(T) - g(t,r_t;T)\,r_t}{\beta_t(T)} = \lambda(t,r_t)$$
for any maturity T . Essentially we have managed to derive the PDE that the
bond pricing formula has to satisfy, in order to rule out arbitrage opportunities.
We can thus drop the maturity T , as it is not affecting the PDE in any way, and
write
$$\frac{\partial g(t,r)}{\partial t} + \{\mu(t,r) - \lambda(t,r)\sigma(t,r)\}\frac{\partial g(t,r)}{\partial r} + \frac{1}{2}\sigma^2(t,r)\frac{\partial^2 g(t,r)}{\partial r^2} = g(t,r)\,r$$
This PDE is called the term structure PDE, and a boundary condition is
needed in order to solve it analytically or numerically. For a zero-coupon bond
that matures at time T the boundary condition for this PDE will be g(T, r; T) = 1.
Although the PDE is called the term structure PDE, we never used the fact that
the instruments are actually bonds. The quantities $T_j$ can be thought of as indices
for different interest rate sensitive instruments: bond options, caps, floors or
swaptions will all satisfy the term structure PDE. In general, any contingent
claim that promises to pay Φ(r(T )) at time T will satisfy the same PDE, with
boundary condition
g(T , r) = Φ(r)
It is apparent that if we require λ(t, r) to be bounded for all t, then the above
expectation will also be bounded. This is a feature that is shared by most models
for the short rate.
Since we are observing bonds which are priced under Q, it is impossible
to explicitly disentangle λ from the true short rate drift µ. The best we can do
is recover the risk adjusted drift µ − λσ.
In the Vasicek framework the short rate is Gaussian, a feature that leads to
closed form solutions for a number of instruments. For that reason the Vasicek
specification is still used by some practitioners today. In particular, the short
rate follows the Ornstein-Uhlenbeck process
$$dr_t = \theta(\bar r - r_t)\,dt + \sigma\,dB_t$$
If we assume a constant price of risk λ(t, r) = λ, then under risk neutrality the
dynamics of the short rate are
$$dr_t = \theta\left(\bar r^Q - r_t\right)dt + \sigma\,dB_t^Q$$
for $\bar r^Q = \bar r + \sigma\lambda/\theta$. This indicates that, as investors are risk averse, they behave as
if the long run attractor of the short rate is higher than what it actually is. The
pricing functions of interest rate sensitive securities will satisfy the PDE
$$\frac{\partial g}{\partial t} + \theta(\bar r^Q - r)\frac{\partial g}{\partial r} + \frac{1}{2}\sigma^2\frac{\partial^2 g}{\partial r^2} = g\,r$$
Substituting the exponentially affine guess $g(t,r;T) = \exp\{C(t;T) + D(t;T)\,r\}$
and collecting the terms that multiply r and the terms free of r gives two square
brackets.
As the PDE has to be satisfied for all initial spot rates r, we conclude that
both square brackets must be equal to zero, and that C(T ; T ) = D(T ; T ) = 0.
Therefore we recover a system of ODEs for the functionals C and D, namely³
$$C_t(t;T) + \theta\bar r^Q D(t;T) + \frac{1}{2}\sigma^2 D^2(t;T) = 0$$
$$D_t(t;T) - \theta D(t;T) = 1$$
with boundary conditions C(T;T) = 0 and D(T;T) = 0.
The solution of the above system will give the Vasicek bond pricing formula,
namely
$$D(t;T) = -\frac{1 - \exp\{-\theta(T-t)\}}{\theta}$$
$$C(t;T) = -\frac{\left[D(t;T) + (T-t)\right]\left[\theta^2\bar r^Q - \sigma^2/2\right]}{\theta^2} - \frac{\sigma^2 D^2(t;T)}{4\theta}$$
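A direct transcription of these functionals into Matlab (the function name is illustrative):

function P = vasicek_bond(r, t, T, theta, rbarQ, sigma)
% Vasicek zero coupon bond price P = exp{C(t;T) + D(t;T) r}
D = -(1 - exp(-theta*(T - t)))/theta;
C = -(D + (T - t)).*(theta^2*rbarQ - sigma^2/2)/theta^2 ...
    - sigma^2*D.^2/(4*theta);
P = exp(C + D.*r);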
One important feature of the Vasicek model is the mean reversion it exhibits.
In particular, the short rate of interest is attracted towards a long run value r̄.
The strength of this mean reversion is controlled by the parameter θ. Intuitively,
the half-life of the conditional expectation is $\log 2/\theta \approx 0.69/\theta$, which means that
if the short rate is at level $r_t$ at time t, then it is expected to cover half of its
distance from the long run value within that time. The main shortfall of the Vasicek model is that it
permits the short rate to take negative values. This happens because the short
rate is normally distributed, and therefore can take values over the real line.
As bond prices are exponentially affine with the short rate, and future short
rates are normally distributed, it is easy to infer that future bond prices will
follow the lognormal distribution. Therefore bond options will be priced with
formulas similar to the Black-Scholes one for equity options. In particular, the
price of a call option with strike price K that matures at time τ, written on a
zero coupon bond that pays one pound at time T > τ will be equal to
³ Here we follow the approach outlined in Duffie and Kan (1996) for general affine
structures. Such systems of ODEs that are 'linear-quadratic' are known as Riccati
equations.
LOGNORMAL MODELS
The main shortcoming of the Vasicek model is that it permits negative nominal
interest rates. One straightforward way around this problem is to cast the prob-
lem in terms of the logarithm of the short rate. The first application of this idea
can be found in Dothan (1978) model, which specifies
$$dr_t = \theta r_t\,dt + \sigma r_t\,dB_t,\qquad\text{or}\qquad d\log r_t = \left(\theta - \tfrac{1}{2}\sigma^2\right)dt + \sigma\,dB_t$$
Here the short rate follows a geometric Brownian motion, just like the
underlying stock in the Black-Scholes paradigm. The short rate is log-normally
distributed, and therefore takes only positive values. On the other hand, there
is no mean reversion present, and the long run forecast for the short rate will
either be explosive (if θ > σ 2 /2) or zero (if θ < σ 2 /2). For that reason the Dothan
model is not popular for modeling purposes.
Another approach is casting the logarithm of the short rate to follow the
Ornstein-Uhlenbeck process, giving rise to the exponential Vasicek model
$$d\log r_t = \theta(\log\bar r - \log r_t)\,dt + \sigma\,dB_t$$
The CIR model is able to capture most of the desired properties of short rate
models. In particular, the short rate follows the square root process
$$dr_t = \theta(\bar r - r_t)\,dt + \sigma\sqrt{r_t}\,dB_t$$
The process is mean reverting, with the long run attractor equal to r̄.
The speed of mean reversion is controlled by the parameter θ. As the short rate
increases, its volatility also increases, at a degree which is dictated by σ. CIR
show that the transition density of the process is a non-central chi-square. In
particular,
$$2c\,r_T\,|\,r_t \sim \chi^2\!\left(\frac{4\theta\bar r}{\sigma^2},\; 2c\,r_t\,e^{-\theta(T-t)}\right),\qquad\text{with } c = \frac{2\theta}{\sigma^2\left(1 - e^{-\theta(T-t)}\right)}$$
Having the transition density in closed form allows us to calibrate the parameters
to a set of historical data. Unfortunately, the short rate is not directly observed,
but practitioners use yields of bonds with short maturities as a proxy for the
dynamics. More elaborate methods involve (Kalman) filtering and are discussed
later.
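A minimal maximum likelihood sketch based on this transition density; the function name, the parameter transformations and the use of fminsearch are choices of this illustration (ncx2pdf requires the Statistics Toolbox):

function phat = cir_mle(r, dt)
% Calibrate the CIR parameters [theta, rbar, sigma] to a short rate
% proxy r sampled every dt years, via the exact transition density.
p0   = [0.5, mean(r), 0.1];
phat = fminsearch(@nll, p0);
    function L = nll(p)
        theta = abs(p(1));  rbar = abs(p(2));  sigma = abs(p(3));
        c  = 2*theta/(sigma^2*(1 - exp(-theta*dt)));
        q  = 4*theta*rbar/sigma^2;             % degrees of freedom
        nc = 2*c*exp(-theta*dt)*r(1:end-1);    % noncentrality
        % density of r(t+dt) given r(t): 2c * ncx2pdf(2c r, q, nc)
        L  = -sum(log(2*c*ncx2pdf(2*c*r(2:end), q, nc)));
    end
end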
One can readily compute the expected value and the variance of the short
rate process, in particular
$$E[r_T\,|\,r_t] = \bar r + (r_t - \bar r)\,e^{-\theta(T-t)}$$
$$\mathrm{Var}[r_T\,|\,r_t] = \frac{\sigma^2 r_t}{\theta}\left(e^{-\theta(T-t)} - e^{-2\theta(T-t)}\right) + \frac{\sigma^2\bar r}{2\theta}\left(1 - e^{-\theta(T-t)}\right)^2$$
[Figure: a simulated path of the short rate (rate against time) and the corresponding time series of yield curves (yield against time and maturity).]
Option prices also take a (relatively) simple form, being dependent on the
cumulative densities of non-central chi-square distributions.
The Ho and Lee model specifies the short rate dynamics with
$$dr_t = \theta_t\,dt + \sigma\,dB_t$$
The bond price, and hence the yield curve, could be expressed in terms of the
time varying drift functional $\theta_t$, if it were known. It turns out that it is more
convenient to use the forward curve instead. In particular, applying the Leibnitz
rule for differentiation yields
$$f_t(T) = -\frac{\partial\log P_t(T)}{\partial T} = r_t + \int_t^T \theta_u\,du - \frac{\sigma^2}{2}(T-t)^2$$
As bond prices are lognormally distributed, options on these bonds can be priced
using a formula that is analogous to the Black-Scholes one.
The Hull and White extension adds mean reversion to this setting, with
dynamics $dr_t = (\theta_t - \alpha r_t)\,dt + \sigma\,dB_t$. Now the short rate will revert towards
$\theta_t/\alpha$, with $t \mapsto \theta_t$ a deterministic function of time. Although negative rates are
permitted, in many cases the presence of mean reversion ensures that their
probabilities are fairly small.
Using exactly the same arguments as the ones in the Ho-Lee case, we can
solve for the functional $\theta_T$ in terms of the forward curve $f_t(T)$
$$\theta_T = \frac{\partial f_t(T)}{\partial T} + \alpha f_t(T) + \frac{\sigma^2}{2\alpha}\left(1 - e^{-2\alpha(T-t)}\right)$$
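Numerically, and at t = 0, this can be evaluated directly on a discretized forward curve (for example the one implied by the fit of listing 7.1; the names f, T, alpha and sigma are illustrative):

dfdT  = gradient(f, T);         % slope of the forward curve
theta = dfdT + alpha*f + sigma^2/(2*alpha)*(1 - exp(-2*alpha*T));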
LISTING 7.3: hw_create.m: Create Hull-White trees for the short rate.

function tree = hw_create(t, P, alpha, sigma)
% Build a Hull-White trinomial tree for the short rate.
%   t     : equally spaced time grid, t(1) = 0
%   P     : discount factors, P(n) = P(0, t(n)), with P(1) = 1
%   alpha : speed of mean reversion;  sigma : volatility
% For clarity this sketch uses a constant time step and constant
% parameters; time varying steps, mean reversion and volatility
% are handled analogously.
N  = length(t) - 1;
dt = t(2) - t(1);
dx = sigma*sqrt(3*dt);                 % space step
jmax = ceil(0.184/(alpha*dt));         % width where the tree stops growing
j  = (-jmax:jmax)';  u = alpha*j*dt;
% normal (up/straight/down) branching probabilities
pu = 1/6 + u.*(u-1)/2;  pm = 2/3 - u.^2;  pd = 1/6 + u.*(u+1)/2;
% modified branching at the edges forces mean reversion
pu(end) = 7/6 + u(end)*(u(end)-3)/2;   % top node moves 0,-1,-2
pm(end) = -1/3 - u(end)*(u(end)-2);
pd(end) = 1/6 + u(end)*(u(end)-1)/2;
pd(1)   = 7/6 + u(1)*(u(1)+3)/2;       % bottom node moves 0,+1,+2
pm(1)   = -1/3 - u(1)*(u(1)+2);
pu(1)   = 1/6 + u(1)*(u(1)+1)/2;
M = 2*jmax + 1;  x = j*dx;
Q = zeros(M, N+1);  Q(jmax+1, 1) = 1;  % Arrow-Debreu prices
a = zeros(1, N);                       % level shifts fitted to the curve
for n = 1:N
    % shift a(n) so that the tree reprices the discount factor P(n+1)
    a(n) = (log(Q(:,n)'*exp(-x*dt)) - log(P(n+1)))/dt;
    disc = exp(-(x + a(n))*dt);        % one period discount per node
    for i = 1:M                        % propagate Arrow-Debreu prices
        if Q(i,n) == 0, continue, end
        if i == M,     k = [M M-1 M-2];
        elseif i == 1, k = [3 2 1];
        else           k = [i+1 i i-1];
        end
        p = [pu(i) pm(i) pd(i)];
        Q(k,n+1) = Q(k,n+1) + Q(i,n)*disc(i)*p';
    end
end
tree.x = x;  tree.a = a;  tree.Q = Q;
tree.pu = pu;  tree.pm = pm;  tree.pd = pd;  tree.dt = dt;
$$\zeta = \{i\Delta\zeta : i = -m,\dots,0,\dots,m\}$$
Typically, from the point $\bar\zeta_i = i\Delta\zeta$ the process can move to the nodes $\{\zeta_{i+1}, \zeta_i, \zeta_{i-1}\}$.
Then, one can solve a system that matches the instantaneous drift and volatility
for the probabilities $\{p^+, p^0, p^-\}$
$$p^+\Delta\zeta - p^-\Delta\zeta = -\alpha i\,\Delta\zeta\,\Delta t$$
$$p^+(\Delta\zeta)^2 + p^-(\Delta\zeta)^2 = \sigma^2\Delta t + \alpha^2 i^2(\Delta\zeta)^2(\Delta t)^2$$
$$p^+ + p^0 + p^- = 1$$
Solving this system gives
$$p^+ = \frac{1}{6} + \frac{1}{2}\alpha i\Delta t\,[\alpha i\Delta t - 1],\quad p^0 = \frac{2}{3} - (\alpha i\Delta t)^2,\quad p^- = \frac{1}{6} + \frac{1}{2}\alpha i\Delta t\,[\alpha i\Delta t + 1]$$
At the edges of the tree the branching geometry is modified, in order to force
mean reversion. At the top node the process moves to $\{\zeta_i, \zeta_{i-1}, \zeta_{i-2}\}$ with
$$p^0 = \frac{7}{6} + \frac{1}{2}\alpha i\Delta t\,[\alpha i\Delta t - 3]$$
$$p^- = -\frac{1}{3} - \alpha i\Delta t\,[\alpha i\Delta t - 2]$$
$$p^{--} = \frac{1}{6} + \frac{1}{2}\alpha i\Delta t\,[\alpha i\Delta t - 1]$$
and at the bottom node it moves to $\{\zeta_{i+2}, \zeta_{i+1}, \zeta_i\}$ with
$$p^{++} = \frac{1}{6} + \frac{1}{2}\alpha i\Delta t\,[\alpha i\Delta t + 1]$$
$$p^+ = -\frac{1}{3} - \alpha i\Delta t\,[\alpha i\Delta t + 2]$$
$$p^0 = \frac{7}{6} + \frac{1}{2}\alpha i\Delta t\,[\alpha i\Delta t + 3]$$
As we noted, in such cases the tree will not grow and the next set of nodes will
also have m+1 elements. The top half of listing 7.3 implements this method for a
more general setting, when the time steps, the mean reversion and the volatility
are all time varying. It makes sense to select the value of m as the first one for
which the volatility geometry changes from the standard one to the ones that
force mean reversion:
$$m = \max\left\{\mathrm{round}\left[(1-\alpha\Delta t)\,\frac{\bar\zeta_i}{\Delta\zeta}\right] + 1\right\}$$
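A small numerical check of the branching probabilities against the two moment conditions they are constructed to match (all values illustrative):

alpha = 0.25;  dt = 1/12;  i = 3;  sigma = 0.01;
u  = alpha*i*dt;
pu = 1/6 + u*(u - 1)/2;  p0 = 2/3 - u^2;  pd = 1/6 + u*(u + 1)/2;
dz = sigma*sqrt(3*dt);                  % space step
m1 = (pu - pd)*dz;                      % instantaneous drift
m2 = (pu + pd)*dz^2;                    % second moment
% both residuals should be zero
[m1 + alpha*i*dz*dt,  m2 - sigma^2*dt - (alpha*i*dz*dt)^2]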
Conditioning on the state at time T − 1, and using the definition of the bank
account process, allows us to expand the conditional expectation as
$$E_t^Q\left[\frac{1}{B_T}\,\Big|\,c_T = k\right] = \sum_{j=1}^{K_{T-1}} E_t^Q\left[\frac{1}{B_{T-1}\,e^{r_{T-1}\Delta t}}\,\Big|\,c_T = k,\,c_{T-1} = j\right] P_t^Q[c_{T-1} = j\,|\,c_T = k]$$
$$= \sum_{j=1}^{K_{T-1}} e^{-r_j\Delta t}\,E_t^Q\left[\frac{1}{B_{T-1}}\,\Big|\,c_T = k,\,c_{T-1} = j\right] P_t^Q[c_{T-1} = j\,|\,c_T = k]$$
$$= \sum_{j=1}^{K_{T-1}} e^{-r_j\Delta t}\,E_t^Q\left[\frac{1}{B_{T-1}}\,\Big|\,c_{T-1} = j\right] P_t^Q[c_{T-1} = j\,|\,c_T = k]$$
Bayes' rule inverts the conditional probability,
$$P_t^Q[c_{T-1} = j\,|\,c_T = k] = P_t^Q[c_T = k\,|\,c_{T-1} = j]\,\frac{P_t^Q[c_{T-1} = j]}{P_t^Q[c_T = k]}$$
and therefore, writing $p_t(j,k) = P_t^Q[c_T = k\,|\,c_{T-1} = j]$,
$$Q_t(k,T) = \sum_{j=1}^{K_{T-1}} e^{-r_j\Delta t}\,p_t(j,k)\,E_t^Q\left[\frac{1}{B_{T-1}}\,\Big|\,c_{T-1} = j\right] P_t^Q[c_{T-1} = j] = \sum_{j=1}^{K_{T-1}} e^{-r_j\Delta t}\,p_t(j,k)\,Q_t(j,T-1)$$
Overall, the construction of an interest rate tree resembles the local volatility
models for equity derivatives. In both frameworks we attempt to exactly replicate
a market implied curve or surface. One has to keep in mind the dangers of over-
fitting, which would introduce spurious qualities into the model. In many cases
market quotes of illiquid instruments can severely distort the model behavior.
The Black and Karasinski model casts the logarithm of the short rate as the
mean reverting process $d\log r_t = (\theta_t - \alpha\log r_t)\,dt + \sigma\,dB_t$. This specification
exhibits mean reversion, and through the exponential transformation ensures that
the short rate remains positive. As with all lognormal models, the Black-Karasinski
model implies an explosive expectation for the bank account, but since in practice
the implementation is done over a finite tree, this drawback is not severe.
[FIGURE 7.5: (a) the yield curve (yield against maturity); (b) the short rate tree (short rate against time).]
FIGURE 7.6: Price path for a ten year 5.50% coupon bearing bond. The price
paths are consistent with the yield curve of figure (7.5), modeled using the
Black-Karasinski process.
[(a) the full ten year period; (b) the initial two year period (bond price against time).]
[FIGURE 7.7: price paths for the European and the American version of the two year put option (option price against time).]
Listing 7.6 shows how the HW tree building methodology is applied in the
Black-Karasinski case. The yield curve of figure (7.5.a) is assumed, and a HW
tree is constructed. We use Δt = 1/16 over the first three years, Δt = 1/8 from
the third to the tenth, and Δt = 1/2 for the remaining twenty years. A view of
the resulting interest rate tree is given in figure (7.5.b), where this uneven time
discretization is apparent.
Of course, to value such a simple bond we do not need to construct the
complete price path, and in fact we do not need to construct a HW tree at
all. The fair price can be determined by using the yield curve alone, just by
discounting all cashflows. The price path is needed though if we want to value
an option on this ten year bond.
As an example we consider a two year put, with strike price K = $80. To
price this option we need the distribution of the bond price after two years.
Figure (7.6.b) gives the possible bond prices and the corresponding price paths
for the two year period. Essentially, the put option gives us the right to sell
the bond at the strike price if the interest rates after two years are too high.
The price paths for a European and an American version are illustrated in figure
(7.7); the corresponding prices are PE = $3.08 and PA = $3.46, indicating an
early exercise premium of $0.38, which is actually more than 10% of the option
price. The red points (in figures 7.6.b and 7.7.b) indicate the scenarios where
early exercise is optimal. We can observe how the coupon payments affect the
exercise boundary, as we would prefer to exercise immediately after the coupon
payment is realized.
CALIBRATION ISSUES
All models with time varying parameters can be cast in a binomial/trinomial form
that approximates the short rate movements. Nevertheless, although the tree will
always adjust to match the current yield curve, following the HW procedure,
there are still parameters to be identified.
As an example, in the Black-Karasinski examples above we assumed α = 0.25
and σ = 0.20, but without any justification of these values. There are two options
in setting values for such free parameters, based on historical yield curves or
based on derivative prices.
Given a set of historical yield curves, one can produce estimates for the
speed of mean reversion and the volatility parameters. This can be done by
proxying the historical (unobserved) spot rate with a yield of a relatively short
maturity, and then applying maximum likelihood estimation methods. Of course
one has to keep in mind that such an exercise is carried out under the physical
probability measure, which might or might not have an impact, depending on the
exact parameterization of the price of risk. Even better, one can treat the spot
rate as unobservable, and use Kalman filtering techniques that draw information
from the whole yield curve. Such an approach allows one to jointly recover the
parameters under both physical and risk adjusted measures.
If derivative prices on interest rate sensitive instruments are available and
liquid, then their prices can be used to also provide estimates for the fixed
parameters. This is typically done by minimizing the squared price differences,
or the differences between model and actual implied volatilities. This seems to
be the method of choice amongst practitioners, but care must be taken to avoid
pitfalls.
In spirit, the second approach is very similar to the standard calibration of
stochastic volatility models to derivative prices, and is also subject to the im-
plementation difficulties that are associated with such models. In particular, one
main obstacle is model identification, where different sets of parameters produce
the same optimal objective function.
Typically, a model will be calibrated on a set of interest rate caps and
swaptions, which are instruments that are sensitive to the volatility of the interest
rate. In virtually all short rate models the terminal volatility is the outcome of
the two quantities we want to retrieve, namely the speed of mean reversion and
the volatility of the innovations. Decreasing the speed has more or less the same
effect as increasing the volatility, and calibrating these quantities is not a well
identified problem. As in the stochastic volatility example, there is a locus of
parameter pairs that produce the same optimal fit, and we have no information
to distinguish between them. Surprisingly many practitioners choose to ignore
this issue, selecting the first set of points that their numerical optimizer returns.
But this can be the source of severe mispricing of other, more exotic, contracts
that will be valued using the calibrated parameters.
In an ideal world, one would get around this issue by also calibrating to
derivatives that are sensitive to future transition densities, such as forward start-
ing swaptions, but unfortunately such contracts are generally not available, and
very illiquid when they exist. Another option is to use the same regularization
techniques that we outlined in the stochastic volatility case, where the prior
parameters anchor the calibration.
TABLE 7.1: Correlations of yields for different maturities. Bonds with longer matu-
rities exhibit relatively higher correlation. Listing 7.7 gives the relevant Matlab
code.
        1m    3m    6m    1y    2y    3y    5y    7y   10y   20y
1m    1.00  0.56  0.40  0.28  0.21  0.19  0.18  0.16  0.15  0.10
3m          1.00  0.76  0.55  0.42  0.40  0.36  0.32  0.30  0.24
6m                1.00  0.85  0.68  0.64  0.59  0.55  0.52  0.44
1y                      1.00  0.87  0.83  0.78  0.74  0.71  0.64
2y                            1.00  0.97  0.92  0.88  0.85  0.77
3y                                  1.00  0.96  0.93  0.89  0.83
5y                                        1.00  0.98  0.96  0.90
7y                                              1.00  0.98  0.94
10y                                                   1.00  0.96
20y                                                         1.00
In practice, this correlation might be high, but it is not perfect as the one factor
family suggests. For example, table 7.1 presents the historical correlation
of various bonds with different maturities. Although the correlation is positive
across the board, its magnitude varies substantially at different horizons. In
particular, the long end of the yield curve is much more strongly correlated than
the short end: the ten and twenty year bonds move pretty much in unison, with
correlation over 95%, while the one and three month bonds exhibit about half of
this dependence. Also, each maturity exhibits correlations that decay as we
consider bonds with increasing maturity differences. For example, the two year
bond is more strongly correlated with the three year than with the seven year
instrument.
One way of increasing the number of free parameters is by considering multi-
factor models. For example, we can consider the interest rate to be the sum of
two simple Vasicek processes, by setting
$$r_t = x_t^{(1)} + x_t^{(2)},\qquad\text{where}\qquad dx_t^{(j)} = \theta^{(j)}\left(\bar x^{(j)} - x_t^{(j)}\right)dt + \sigma^{(j)}\,dB_t^{(j)}$$
The two processes $x_t^{(1)}$ and $x_t^{(2)}$ are called factors, and are in principle
unobserved. In the general specification we can also assume the factors to exhibit
some correlation ρ. If we maintain that our model is affine, then we can postulate
that the yield is again a linear combination
$$y_t(t+\tau_i) = A(t;t+\tau_i) + B_1(t;t+\tau_i)\,x_t^{(1)} + B_2(t;t+\tau_i)\,x_t^{(2)}$$
$$\Rightarrow\quad dy_t(t+\tau_i) = B_1(t;t+\tau_i)\,dx_t^{(1)} + B_2(t;t+\tau_i)\,dx_t^{(2)}$$
After some extremely boring calculations we can derive the correlation of the
changes of yields with maturities $\tau_1$ and $\tau_2$, as
$$\rho(\tau_1,\tau_2) = \pm\sqrt{1 - \frac{(1-\rho^2)\left(B_{11}B_{22} - B_{21}B_{12}\right)^2\sigma_1^2\sigma_2^2}{\left(B_{11}^2\sigma_1^2 + B_{21}^2\sigma_2^2 + 2\rho B_{11}B_{21}\sigma_1\sigma_2\right)\left(B_{12}^2\sigma_1^2 + B_{22}^2\sigma_2^2 + 2\rho B_{12}B_{22}\sigma_1\sigma_2\right)}}$$
We use the shorthand notation Bij = Bi (t; t + τj ). The sign of the correlation is
positive if
$$\rho > -\frac{B_{11}B_{12}\sigma_1^2 + B_{21}B_{22}\sigma_2^2}{\left(B_{11}B_{22} + B_{21}B_{12}\right)\sigma_1\sigma_2}$$
Therefore from the expression above we can assess the implied correlation
of special cases. In particular, setting ρ = 1 will render ρ(τ1 , τ2 ) = 1 as the two
factors are driven by the same Brownian motion. Specifying uncorrelated factors
will always produce positive correlation across the maturities. Large negative
correlations produce a peculiar effect: as we set ρ = −1 we observe that the
yield correlation will be either perfectly positive or perfectly negative, depending
on the maturity pair.⁵
⁵ We mention this peculiarity because it is fairly common for a correlation of ρ = −1 to
be calibrated from cap or swaption prices. It is very unlikely that such a value reflects
the interest rate dynamics, and it is a feature that is more likely to point towards more
complex dynamics for the rate and its volatility that go beyond the affine setting.
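A short script that evaluates this implied correlation for an illustrative two factor parametrization, using the Vasicek yield loadings $B(\theta,\tau) = (1-e^{-\theta\tau})/(\theta\tau)$ (all values below are assumptions for the example):

th = [0.1 1.0];  sg = [0.01 0.015];  rho = -0.5;  tau = [2 10];
B  = @(theta, t) (1 - exp(-theta*t))./(theta*t);
Bm = [B(th(1),tau(1)) B(th(1),tau(2));        % Bm(i,j): loading of
      B(th(2),tau(1)) B(th(2),tau(2))];       % factor i at maturity j
v1 = Bm(1,1)^2*sg(1)^2 + Bm(2,1)^2*sg(2)^2 + 2*rho*Bm(1,1)*Bm(2,1)*sg(1)*sg(2);
v2 = Bm(1,2)^2*sg(1)^2 + Bm(2,2)^2*sg(2)^2 + 2*rho*Bm(1,2)*Bm(2,2)*sg(1)*sg(2);
cv = Bm(1,1)*Bm(1,2)*sg(1)^2 + Bm(2,1)*Bm(2,2)*sg(2)^2 ...
     + rho*sg(1)*sg(2)*(Bm(1,1)*Bm(2,2) + Bm(2,1)*Bm(1,2));
rho12 = cv/sqrt(v1*v2)                        % implied yield correlation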
explain changes in the yield curve behavior through time, rather than the yield
curve level. We will denote with y(t; τ) the yield of a bond with maturity τ, recorded
at time t = 1, 2, .... Therefore, each yield change $\Delta y_j = y(t;\tau_j) - y(t-1;\tau_j)$ is
written as the weighted sum of n factors
$$\Delta y_j = c_j + \sum_{i=1}^{n}\ell_{j,i}\,f_i + \eta_j$$
The coefficients $\ell_{j,i}$ are called factor loadings, and essentially determine the
sensitivity of yield $y_j$ to factor $f_i$; $c_j$ is a constant, $c_j = E[\Delta y_j]$. If we assume
that the factors are uncorrelated, and that they are normalized with zero mean and
unit variance, then we can write the covariance of different yields as
$$\mathrm{Cov}(\Delta y_j, \Delta y_k) = \sum_{m=1}^{n}\ell_{j,m}\,\ell_{k,m}$$
Therefore, if we denote with L the matrix that collects all factor loadings, then
the covariance matrix Σ of the yield changes will be equal to $\Sigma = LL'$. The
loadings can be recovered from the eigenvalue decomposition of this matrix,
$$\Sigma = VMV^{-1}$$
where V collects the (orthonormal) eigenvectors and M is the diagonal matrix of
eigenvalues.
Using this representation we can write the yield changes in terms of the
elements of the eigenvector matrix V and the eigenvalues in M
$$\Delta y_j = c_j + v_{j,1}\sqrt{m_1}\,f_1 + \cdots + v_{j,i}\sqrt{m_i}\,f_i + \cdots + v_{j,n}\sqrt{m_n}\,f_n$$
It is therefore intuitive that factors that are associated with higher eigenvalues
will contribute more to the total variability of the series. In particular, if we
consider the overall variance of all yields, then we can write it as the sum of all
eigenvalues
$$\sum_{j=1}^{n}\mathrm{Var}(\Delta y_j) = \sum_{j=1}^{n}\sum_{i=1}^{n} v_{j,i}^2\,m_i = \sum_{i=1}^{n} m_i$$
where the last equality follows from the fact that the eigenvectors are normalized
to unit length. In factor analysis we choose to use only the largest n̄ eigenvalues.
LISTING 7.7: princcomp.m: Correlation structure and principal component
analysis of yield curve movements.

% princcomp.m
mtrx = [1/12 3/12 6/12 1 2 3 5 7 10 20];  % maturities
data = xlsread('yields.xls');             % get the data (file name assumed)
rates = data(:,2:end);
drate = diff(rates);                      % yield changes
figure
surf(mtrx, 1:size(rates,1), rates)        % data surface
xlabel('maturity');  ylabel('time');
% correlation structure of interest rate changes
CR = corrcoef(drate);
disp('Correlation structure:')
for ix = 1:length(mtrx)
    fprintf(' %4.2f', CR(ix,:));  fprintf('\n');
end
% principal component analysis
[C, S, M] = princomp(drate);
fprintf('\nEigenvalue decomposition\n');
fprintf('factor  eigv%%  cumul%%\n');
fprintf('%2.0f  %6.1f  %6.1f\n', [1:length(mtrx); ...
    100*M'/sum(M); 100*cumsum(M')/sum(M)]);
figure
plot(mtrx, C(:,1:3), '-o', 'LineWidth', 2)
xlabel('maturity');  ylabel('factor loading');
legend('the first factor', 'the second factor', 'the third factor');
The number of factors that we retain should be chosen so as to make the variance of
the remainder component $\eta_j$ small. As a rule of thumb, n̄ should ensure that at
least 95% of the total variance is explained by the corresponding factors. If one
finds that such a value of n̄ is large compared to the total number of variables,
it is evidence that factor analysis might not be appropriate for this case.
factor                    1     2     3     4     5     6     7     8     9     10
eigenvalue (% of sum)  81.7   8.8   4.6   2.2   1.1   0.6   0.3   0.3   0.2    0.2
cumulative (% of sum)  81.7  90.6  95.2  97.3  98.4  98.9  99.3  99.6  99.8  100.0

TABLE 7.2: Relative magnitude of the eigenvalues for the decomposition of the
correlation matrix. The first three factors are responsible for over 95% of the
yield variability.
FIGURE 7.8: Yield curve factor loadings. A principal component analysis is applied
to changes of yields over different maturities. The three factors that correspond
to the level, the slope and the convexity are clearly identified.
The recipe for principal component analysis is illustrated in listing 7.7. The
relative contribution of the j-th factor, together with the cumulative contribution
of the first j factors are given in table 7.2. For example, the third factor explains
4.6% of the variability of interest rate changes, while the first three factors explain
95.2%. We can therefore adopt a three-factor model as an approximation that
explains sufficiently well the yield curve dynamics.
One of the benefits of factor analysis, especially in interest rate modeling,
is the intuition that it can offer. Figure 7.8 plots the factor loadings for the first
three factors, against maturities. One can think of these curves as the impact
on the yield curve of a shock to the j-th factor, with magnitude $\sqrt{m_j}$. For example,
a shock of the first factor will shift the yields of all maturities in the same direction.
This will essentially move the whole yield curve upwards or downwards,
and for that reason we coin the first factor the level factor. The second and third
factors, in the same spirit, tilt and bend the curve, and correspond to the slope
and the convexity of the yield curve.
KALMAN FILTERING
Principal component analysis of yields is a quick-and-dirty method of isolating
the factors present in yield curve moves, and quantifying their impact. But at the
end of the day it is a statistical technique, with no robust structure behind the
dynamics of the factors. When we take the covariance matrix of yield changes,
we implicitly make an assumption on their dynamics, namely that these changes
are stationary (and therefore yields follow unit root processes).
Kalman filtering techniques can be applied for a substantial class of models,
and in particular the relatively large affine family. As an example we will inves-
tigate an OU factor setup, which we will calibrate on a set of historical yield
curves. As we will now have a complete model to describe the bond yields and
their dynamics, the parameters and the corresponding risk premia can be jointly
recovered.
To be more concrete, assume that, under the physical measure, the factors
are specified as
$$dx_t^{(j)} = -\theta^{(j)}\,x_t^{(j)}\,dt + \sigma^{(j)}\,dB_t^{(j)}$$
That is to say, the factors behave as OU processes with mean reversion
level at zero. The short rate will be given as the sum of these processes plus
a constant term, $r_t = c + \sum_j x_t^{(j)}$. We can assume a constant price of risk that
will, after some algebra, eventually contribute to the constant term. We can then
rewrite the risk adjusted short rate as $r_t = c + \lambda + \sum_j x_t^{(j)}$; the risk adjusted
dynamics for the factors are given by the stochastic differential equations
$$dx_t^{(j)} = \theta^{(j)}\left(\lambda^{(j)} - x_t^{(j)}\right)dt + \sigma^{(j)}\,dB_t^{Q,(j)}$$
with $B_t^{Q,(j)}$ being a Q-Brownian motion.
With the further assumption that these Brownian motions are independent
across the factors, we can write the bond price as the product of Vasicek prices.
Notice that this is slightly different to the form presented during the discussion
of the Vasicek model, as here τ denotes the time to maturity, while there T
denoted the maturity date. We change the notation slightly to economize on
space here.
⁶ Less than three years in this case.
Therefore, the yields for different maturities are given by
$$y_t(\tau) = c\tau + \sum_j \tilde C^{(j)}(t;\tau) + \sum_j \tilde D^{(j)}(t;\tau)\,x_t^{(j)}$$
with the functions C̃ and D̃ given by the corresponding Vasicek functionals (the
superscripts are removed to further ease the notation). Discretizing each factor
exactly over a time step Δt gives the transition equation $x_{t+\Delta t} = \beta x_t + \eta_t$, with
$$\beta = \exp\{-\theta\Delta t\},\qquad \sigma_\eta^2 = \frac{\sigma^2}{2\theta}\left(1 - \exp\{-2\theta\Delta t\}\right)$$
The measurement equations are based on the observed yields of different
maturities, and will be linear in the factors. Of course the model prices will
not match the observed historical yields exactly, and error terms need to be
introduced, due to the fact that every model is just an approximation.
As an illustration, consider a two factor specification with the two Brownian
motions uncorrelated. The prices of risk are assumed zero, and therefore the
dynamics under the risk adjusted probability measure remain unaltered. The
instantaneous rate will be the sum of the two factors, and the yield curve will
be an affine function of them.
Listing 7.8 implements a wrapper that converts the inputs of the Gaussian
N-factor model to the form that is expected by the Kalman filter algorithm of
listing 5.3. To implement the filter, there must be some discrepancy between
model and observed yields, and for that reason we add a Gaussian noise ε to
each yield, with standard deviation σε = 0.1%. Otherwise we can solve the yield
curve for the factor values, and there would not be much filtering involved! The
simulated yield curves that serve as the input are given in figure XXX, together
with the simulated and filtered factor paths.
Given a well specified model and the true parameter values, the Kalman
filter does an outstanding job in recovering the factor trajectories. Of course for
the filter to be of any practical relevance, it will have to provide us with decent
parameter estimates as well. We therefore turn to investigating the performance
of the Kalman filter, if the parameter set is unknown.
Just like the standard Kalman filter, we will address the estimation problem
with the maximum likelihood approach. But unlike the standard
LISTING 7.8: kf_wrapper.m: A Kalman filter wrapper for the multi-factor
Gaussian model.

function [L, x, S] = kf_wrapper(p, Y, tau, dt)
% Map the N-factor Gaussian (OU) model into state space form and
% feed it to the Kalman filter of listing 5.3.  The parameter
% packing and the kalman_filter interface are assumptions of this
% sketch: p = [theta lambda sigma] for each factor, then [c s_eps];
% Y holds the observed yields (one row per date), tau the maturities.
NF = (length(p) - 2)/3;
M  = length(tau);  tau = tau(:);
A  = zeros(NF);  Q = zeros(NF);  Z = zeros(M, NF);
d  = p(end-1)*ones(M, 1);              % constant c in the short rate
for j = 1:NF
    theta = p(3*j-2);  lambda = p(3*j-1);  sig = p(3*j);
    % transition equation: exact discretization of the OU factor
    A(j,j) = exp(-theta*dt);
    Q(j,j) = 0.5*sig^2*(1 - exp(-2*theta*dt))/theta;
    % measurement equation: Vasicek functionals under Q
    D = -(1 - exp(-theta*tau))/theta;
    C = -(D + tau)*(theta^2*lambda - sig^2/2)/theta^2 ...
        - sig^2*D.^2/(4*theta);
    Z(:,j) = -D./tau;                  % yield loading on factor j
    d = d - C./tau;                    % constant yield contribution
end
H = p(end)^2*eye(M);                   % measurement noise variance
kf.A = A;  kf.Q = Q;  kf.Z = Z;  kf.d = d;  kf.H = H;
[L, x, S] = kalman_filter(kf, Y);      % filter of listing 5.3
But we can lock at time t an interest rate which will be applied over the
interval (S, T). This will be the forward rate F(t; S, T).
As S → T we have the instantaneous forward rate F(t; T).
No arbitrage indicates that the (continuously compounded) forward rate is
$$F(t;S,T) = -\frac{\log P(t;T) - \log P(t;S)}{T - S}$$
[Figure: short/forward rate against time.]
$$\log P(t;T) - \log P(t;S) = -\int_S^T F(t;s)\,ds$$
Therefore we are facing a system of infinite SDEs, with the initial forward
curve as a boundary condition
Of course, some relationships will ensure that no arbitrage is permitted.
In particular, suppose that the forward rate dynamics are given by
$$dF(t,T) = \mu(t,T)\,dt + \sigma(t,T)\,dB_t$$
The functions
$$\mu^\star(t,T) = -\int_t^T \mu(t,s)\,ds,\qquad \sigma^\star(t,T) = -\int_t^T \sigma(t,s)\,ds$$
will then determine the dynamics of the log bond prices.
If we use the bank account for discounting (as the numéraire), we expect
the discounted bonds to form martingales
$$\frac{P(t;T)}{B(t)} = E\left[\frac{P(T;T)}{B(T)}\right] = E\left[\frac{1}{B(T)}\right]$$
[Figure: simulated distributions (frequency against rate and against bond price).]
FIGURE 7.11: Cash flows for interest rate caplets and caps (simulated interest
rate paths and the corresponding cash flows, plotted against time).
$$f(r(t),t) = E_t^Q\left[\exp\left(-\int_t^T r(s)\,ds\right)\Phi(r(T))\right] = E_t^Q\left[\frac{B(t)}{B(T)}\,\Phi(r(T))\right]$$
If the discount factor B(t)/B(T) were independent of the payoff we would be
able to split the expectation; but since we are dealing with stochastic rates, it
is not.
Implicitly we are using the bank account as the numéraire, but there is
nothing special with this choice
As shown in Geman, el Karoui, and Rochet (1995), one can choose any
positive asset as the numéraire
For each numéraire there exists an equivalent measure, under which every
asset is a martingale
That is to say, if N(t) is the process of the numéraire, then there exists a
measure N induced by this numéraire, such that
$$\frac{X(t)}{N(t)} = E_t^N\left[\frac{X(T)}{N(T)}\right]$$
for any asset process X(t). Then, we can express the value at t as $X(t) = N(t)\,E_t^N\left[\frac{X(T)}{N(T)}\right]$.
Given a problem, a good numéraire choice can simplify things enormously
For example, we can use the bond that matures at time T as the numéraire;
then all asset prices are given in terms of this asset (rather than currency units)
If T is the measure induced by this bond, we can write the payoffs as
$$f(r(t),t) = P(t;T)\,E_t^T\left[\frac{\Phi(r(T))}{P(T;T)}\right] = P(t;T)\,E_t^T\left[\Phi(r(T))\right]$$
¹ Computers that don't run Matlab will need the Matlab Component Runtime (MCR)
set of libraries, which is freely available.
The second step is to get the Windows Platform SDK (Windows Server 2003
R2) from the Microsoft website. You should download the installer; it is important
to select a custom install, and to set as the target directory the location where
Matlab looks for some necessary files. You don't need to install all components;
the required ones are the Microsoft Windows Core SDK (build environment and
libraries) and the Microsoft Data Access Services (MDAC) SDK.
We must now save the project, and click on Build to actually build the component.
Two sub-folders are created:
(a) folder: \XLSPricer\distrib
(b) folder: \XLSPricer\src
Insert a Module, change the name of this module (at its properties) to BSPricerMain,
and insert the code given in listing A.3.
We now need to turn to the GUI, which can be as in the screenshot A.3. The
components need some event handlers, that will respond to the activation of the
form and to user input. All this code must reside within the BSPricerForm form
code. When the form is activated some initial values must be set, which is done in
listing A.4. The user can click either the compute button (listing A.5) or the
close button (listing A.6).
    Set NewMenuItem = ToolsMenu.Controls.Add(Type:=msoControlButton)
    NewMenuItem.Caption = "BS Pricer..."
    NewMenuItem.OnAction = "LoadPricer"
End Sub

Private Sub RemovePricerMenuItem()
    Dim CmdBar As CommandBar
    Dim Ctrl As CommandBarControl
    On Error Resume Next
    Set CmdBar = Application.CommandBars("Worksheet Menu Bar")
    Set Ctrl = CmdBar.Controls("BS Pricer...")
    Ctrl.Delete
End Sub
The BS Pricer item should now be present in the menu. Invoking this item should
allow us to run the DLL and compute option prices. A screenshot of the add-in
in action is given in figure A.3.
Albrecher, H., P. Mayer, W. Schoutens, and J. Tistaert (2007, January). The little
Heston trap. Wilmott Magazine, 83–92.
Andricopoulos, A. D., M. Widdicks, P. W. Duck, and D. P. Newton (2003). Uni-
versal option valuation using quadrature methods. Journal of Financial Eco-
nomics 67, 447–471.
Bachelier, L. (1900). Théorie de la Spéculation. Gauthier-Villars.
Bailey, D. H. and P. N. Swarztrauber (1991). The fractional Fourier transform
and applications. SIAM Review 33(3), 389–404.
Bailey, D. H. and P. N. Swarztrauber (1994). A fast method for the numerical
evaluation of continuous Fourier and Laplace transforms. SIAM Journal on
Scientific Computing 15(5), 1105–1110.
Baillie, R. T., T. Bollerslev, and H. O. Mikkelsen (1993). Fractionally integrated
generalized autoregressive conditional heteroscedasticity. Journal of Econo-
metrics.
Bajeux, I. and J. C. Rochet (1996). Dynamic spanning: Are options an appropriate
instrument? Mathematical Finance 6, 1–16.
Bakshi, G., C. Cao, and Z. Chen (1997). Empirical performance of alternative
option pricing models. The Journal of Finance 5, 2003–2049.
Bakshi, G. and D. Madan (2000). Spanning and derivative-security valuation.
Journal of Financial Economics 55, 205–238.
Barle, S. and N. Cakici (1998). How to grow a smiling tree. Journal of Financial
Engineering 7 (2), 127–146.
Barndorff-Nielsen, O. E. (1998). Processes of normal inverse Gaussian type.
Finance and Stochastics 2, 41–68.
Barone-Adesi, G., R. Engle, and L. Mancini (2004). GARCH options in incomplete
markets. Working Paper.
Bates, D. S. (1998). Pricing options under jump diffusion processes. Technical
Report 37/88, The Wharton School, University of Pennsylvania.
Bates, D. S. (2000). Post-’87 crash fears in S&P500 futures options. Journal of
Econometrics 94, 181–238.
Duffie, D., J. Pan, and K. Singleton (2000). Transform analysis and asset pricing
for affine jump–diffusions. Econometrica 68, 1343–1376.
Dupire, B. (1993). Pricing and hedging with smiles. In Proceedings of the AFFI
Conference, La Baule.
Dupire, B. (1994). Pricing with a smile. RISK 7 (1), 18–20.
Engle, R. (1982). Autoregressive conditional heteroskedasticity with estimates
of the variance of U.K. inflation. Econometrica 50, 987–1008.
Engle, R. and F. K. Kroner (1995). Multivariate simultaneous generalized ARCH.
Econometric Theory 11, 122–150.
Eraker, B., M. Johannes, and N. Polson (2001). MCMC analysis of diffusion
models with application to finance. Journal of Business and Economic Statis-
tics 19(2), 177–91.
Fama, E. F. (1965). The behavior of stock market prices. Journal of Business 38,
34–105.
Feller, W. E. (1951). Two singular diffusion problems. Annals of Mathematics 54,
173–182.
Figlewski, S. and X. Wang (2000). Is the “leverage effect” a leverage effect?
Working Paper, SSRN 256109.
Gallant, A. R. and G. Tauchen (1993). SNP: A program for nonparametric time
series analysis. version 8.3 user’s guide. Working Paper, University of North
Carolina.
Gatheral, J. (1997). Delta hedging with uncertain volatility. In I. Nelken (Ed.),
Volatility in the Capital Markets: State-of-the-Art Techniques for Modeling,
Managing, and Trading Volatility. Glenlake Publishing Company.
Gatheral, J. (2004). A parsimonious arbitrage-free implied volatility parameter-
ization with application to the valuation of volatility derivatives. In Global
Derivatives and Risk Management.
Gatheral, J. (2006). The Volatility Surface: A Practitioner’s Guide. New York,
NY: Wiley Finance.
Geman, H., N. el Karoui, and J.-C. Rochet (1995). Changes of numéraire changes
of probability measure and option pricing. Journal of Applied Probability 32,
443–458.
Gerber, H. U. and E. S. W. Shiu (1994). Option pricing by Esscher transforms.
Transactions of the Society of Actuaries XLVI, 99–191.
Ghysels, E., A. Harvey, and E. Renault (1996). Stochastic volatility. In G. Mad-
dala and C. Rao (Eds.), Handbook of Statistics, 14, Statistical Methods in
Finance. North Holland.
Glosten, L. R., R. Jagannathan, and D. Runkle (1993). On the relation between
the expected value and the volatility of the nominal excess return on stocks.
Journal of Finance 48(5), 1779–1801.
Hamilton, J. D. (1994). Time Series Analysis. Princeton, NJ: Princeton University
Press.
Hamilton, J. D. and R. Susmel (1994). Autoregressive conditional heteroscedas-
ticity and changes in regime. Journal of Econometrics 64, 307–333.
generalized error distribution, 143
Girsanov theorem, 150, 152
Igarch model, 141
implied density, 167, 173
    Breeden-Litzenberger method, 175
implied tree, 174
implied volatility, 131
    and Delta, 134
    and expected volatility, 132
    and moneyness, 134
    and realized volatility, 142
    skew, 134
    smile, 134
    surface, 133
        dynamics, 134
        sticky Delta, 135
        sticky strike, 135
    SVI parameterization, 172
inverse problem, 162
Itō formula, 186
Kalman filter, 161, 192
Kolmogorov backward equation, 175
Kolmogorov extension theorem, 174
Kolmogorov forward equation, 175
Leibniz rule, 173
leverage effect, 133, 143, 149
liquidity preferences theory, 189
local volatility, 167, 174
    function, 174
    PDE representation, 175
long memory, 141
marginal rate of substitution, 147
    as Radon-Nikodym derivative, 148
market segmentation, 189
Markov chain, 148, 161
Markov Chain Monte Carlo, 160
maturity date, 35
maximum likelihood estimation, 127
    standard errors, 139
mean reversion, 192
measurable space, 3
measure, 3
mixture of distributions, 130
model risk, 162
no-arbitrage tests, 169
Novikov condition, 188
one-factor model, 184
Ornstein-Uhlenbeck process, 151, 190
overfitting, 201
Particle filter, 161
penalty function, 162
preferred habitat, 189
price of interest rate risk, 185
price of risk, 145
prior information, 162
Radon-Nikodym derivative, 154, 188
    marginal rate of substitution, 148
random variable, 1
redundant claim, 35
regimes
    of volatility, 132
regularization, 162
    Tikhonov-Phillips, 162
risk aversion, 150
risk premium, 150
    time varying, 146
sample path, 1
sample point, see sample path
sample space, see state space
Sharpe ratio, 145, 154
short rate
    stylized facts, 189
short rate model, see one-factor model
σ algebra, 3
    generated, 4
Simulated Method of Moments, 160
smoothing, 167, 176
    Nadaraya-Watson, 168
    radial basis function, 167
sovereign bond, 177
SP500 index, 132, 140
square root process, 151
“square root” process, 192
state space, 1
stochastic volatility, 149, 185
    and Garch, 135
    calibration, 161, 165
    estimation, 160
    PDE, 156
    replicating portfolio, 157
Student-t, 142
swaption, 187
term structure PDE, 187
transform methods, 152
underlying asset, 35
utility function, 147
Vasicek model, 190
Vega, 162
vertical spread, 169
Vitali set, 2
VIX index, 132, 141, 142
    and financial crises, 132
    and realized volatility, 132
volatility
    and correlation with returns, 132, 143, 149
    and financial crises, 130
    attractor, 131