Lecture Notes 2008d

Uploaded by Zouaouia Hallam

Kyriakos Chourdakis

FINANCIAL ENGINEERING
A brief introduction using the Matlab system

Fall 2008

 
Contents

1 Elements of stochastic calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1


1.1 The sample space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
σ-algebras and Borel sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Generated σ-algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
σ-algebras and information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Measures and probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Measurable functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Probability measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Equivalent probability measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Conditional probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Stochastic processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Filtration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Distributions of a process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 Brownian motion and diffusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Properties of the Brownian motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
The Brownian motion is a martingale . . . . . . . . . . . . . . . . . . . . . . . . . . 17
The Brownian motion is Gaussian, Markov and continuous . . . . . 18
The Brownian motion is a diffusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
The Brownian motion is wild . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Dealing with diffusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5 Stochastic differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Variation processes and the Itō integral . . . . . . . . . . . . . . . . . . . . . . . . 21
The Stratonovich integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Itō diffusions and Itō processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Itō’s formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.6 The partial differential equation approach . . . . . . . . . . . . . . . . . . . . . . 27
Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Stopping times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.7 The Feynman-Kac formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

1.8 Girsanov’s transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2 The Black-Scholes world . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35


2.1 The original derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
The Black-Scholes assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
The replicating portfolio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Arbitrage opportunities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
The Black-Scholes partial differential equation . . . . . . . . . . . . . . . . . . 39
2.2 The fundamental theorem of asset pricing . . . . . . . . . . . . . . . . . . . . . . 40
The fundamental theorem of asset pricing and Girsanov’s theorem 40
A second derivation of the Black-Scholes formula . . . . . . . . . . . . . . . 42
Expectation under the true measure P . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Expectation under the risk neutral measure Q . . . . . . . . . . . . . . . . . . 43
The Feynman-Kac form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.3 Exotic options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Exercise timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Payoff structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Path dependence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.4 The Greeks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
The Delta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Dynamic Delta hedging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Dynamic Delta-Gamma hedging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Gamma and uncertain volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Vega . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Dividends and foreign exchange options . . . . . . . . . . . . . . . . . . . . . . . . 63
2.5 Implied volatilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.6 Stylized facts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Leptokurtosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Skewness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Volatility features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Price discontinuities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

3 Finite difference methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69


3.1 Derivative approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.2 Parabolic PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
A PDE as a system of ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
The grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Explicit finite differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Stability and convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Implicit finite differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
The Crank-Nicolson and the θ-method . . . . . . . . . . . . . . . . . . . . . . . . . 80
Boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.3 A PDE solver in Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Plain vanilla options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

Early exercise features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Barrier features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Computing the Greeks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.4 Multidimensional PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Finite difference approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Alternating direction implicit methods . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.5 A two-dimensional solver in Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
3.6 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

4 Transform methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107


4.1 The setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Fourier transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Characteristic functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
The “dampened” cumulative density . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4.2 Option pricing using transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
The Delta-Probability decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
The Fourier transform of the modified call . . . . . . . . . . . . . . . . . . . . . . 114
4.3 An example in Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
The characteristic functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Numerical Fourier inversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
4.4 Applying Fast Fourier Transform methods . . . . . . . . . . . . . . . . . . . . . . 121
FFT inversion for the probability density function . . . . . . . . . . . . . . 123
FFT inversion for European call option pricing . . . . . . . . . . . . . . . . . 123
4.5 The fractional FFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.6 Adaptive FFT methods and other tricks . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

5 Historical estimation and filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131


5.1 The likelihood function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.2 Properties of the ML estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
The score and the information matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Consistency and asymptotic normality . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Hypothesis testing and confidence intervals . . . . . . . . . . . . . . . . . . . . 136
5.3 Some examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Linear ARMA models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Lévy models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.4 Likelihood ratio tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.5 The Kalman filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
The filtering procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Maximum likelihood estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Generalizations and extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Multivariate systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Extended Kalman filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Unscented Kalman filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149


6 Volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
6.1 Some general features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Historical volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Implied volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
The implied volatility surface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Two modeling approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
6.2 Autoregressive conditional heteroscedasticity . . . . . . . . . . . . . . . . . . . 159
The Arch model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
The Garch model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
The Garch likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
Estimation examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Other extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
Garch option pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Utility based option pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Distribution based . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
The Heston and Nandi model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
6.3 The stochastic volatility framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
The Hull and White model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
The Stein and Stein model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
The Heston model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
Girsanov’s theorem and option pricing . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Example: The Heston model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
The PDE approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
The Feynman-Kac link . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Example: The Heston model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
Estimation and filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Calibration example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
6.4 The local volatility model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Interpolation methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
Implied densities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Local volatilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

7 Fixed income securities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203


7.1 Yields and compounding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
7.2 The yield curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
The Nelson-Siegel-Svensson parametrization . . . . . . . . . . . . . . . . . . . 205
The dynamics of the yield curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
The forward curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
7.3 The short rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
Short rate and bond pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
The hedging portfolio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
The price of risk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
7.4 One-factor short rate models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
The Vasicek model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216

Lognormal models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
The CIR model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
7.5 Models with time varying parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 220
The Ho-Lee model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
The Hull-White model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
Interest rate trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
Calibration of interest rate trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
The first stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
The second stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
Pricing and price paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
The Black-Karasinski model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Calibration issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
7.6 Multi-factor models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Factors and principal component analysis . . . . . . . . . . . . . . . . . . . . . . 237
Kalman filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
A multi-factor Gaussian example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
7.7 Forward rate models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
Calibration of HJM models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
Short versus forward rate models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
7.8 Bond derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
The Black-76 formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
7.9 Changes of numéraire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
7.10 The Libor market model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250

8 Credit risk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251

A Using Matlab with Microsoft Excel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253


A.1 Setting up Matlab with the C/C++ compiler . . . . . . . . . . . . . . . . . . . 254
A.2 Writing the Matlab functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
A.3 Writing the VBA code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
A.4 The Excel add-in . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
A.5 Invoking and packaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271



Figures

1.1 Construction of a Brownian motion trajectory. . . . . . . . . . . . . . . . . . . . . 16


1.2 Zooming into a Brownian motion sample path. . . . . . . . . . . . . . . . . . . . 20
1.3 A sample path of an Itō integral. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.4 Asset price trajectories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.1 Behavior of a Call option Delta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50


2.2 Dynamic Delta hedging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.3 Sample output of the dynamic Delta hedging procedure. A call
option is sold at time t = 0 and is subsequently Delta hedged
to maturity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.4 Behavior of a call option Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.5 Dynamic Delta-Gamma hedging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.6 Histograms for Delta and Delta-Gamma hedging . . . . . . . . . . . . . . . . 57
2.7 Delta hedging with uncertain volatility . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.8 Behavior of a call option Vega . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

3.1 Finite difference approximation schemes. . . . . . . . . . . . . . . . . . . . . . . . . 71


3.2 A two-dimensional grid. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.3 The Explicit FDM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.4 The Implicit FDM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.5 The Crank-Nicolson FDM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.6 Early exercise region for American options. . . . . . . . . . . . . . . . . . . . . . . 85
3.7 European versus American option prices. . . . . . . . . . . . . . . . . . . . . . . . . 89
3.8 Greeks for American and European puts. . . . . . . . . . . . . . . . . . . . . . . . . 94
3.9 Oscillations of the Greeks in FDM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.10 The structure of the Q-matrix that approximates a
two-dimensional diffusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.11 Two-dimensional PDE grid. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

4.1 “Damping” the transform. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112


4.2 Finite difference approximation schemes. . . . . . . . . . . . . . . . . . . . . . . . . 115

4.3 Numerical Fourier inversion using quadrature . . . . . . . . . . . . . . . . . . . 118


4.4 Normal density function using Fourier inversion . . . . . . . . . . . . . . . . . 120
4.5 Normal inverse Gaussian density function using Fourier inversion 121
4.6 Comparison of the FFT and the fractional FFT . . . . . . . . . . . . . . . . . . 127

5.1 Example of likelihood functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132


5.2 Bias and asymptotic normality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.3 Kalman filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

6.1 Historical volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154


6.2 Mixtures of normals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.3 The VIX volatility index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6.4 Implied volatility surface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.5 Filtered volatility for DJIA and SPX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
6.6 Calibrated option prices for Heston’s model . . . . . . . . . . . . . . . . . . . . . . 189
6.7 The ill-posed inverse problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
6.8 Implied and local volatilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
6.9 Static arbitrage tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

7.1 Yield curves using the Nelson-Siegel-Svensson parametrization . 206


7.2 Historical yield curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
7.3 Historical Nelson-Siegel parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
7.4 Simulation of CIR yield curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
7.5 Calibration of the Black-Karasinski model . . . . . . . . . . . . . . . . . . . . . . . 231
7.6 Price path for a ten year bond . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
7.7 Price path for bond options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
7.8 Yield curve factor loadings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
7.9 Yield and one-year forward curves. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
7.10 Pull-to-par and bond options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
7.11 Cash flows for interest rate caplets and caps. . . . . . . . . . . . . . . . . . . . . 248
7.12 Typical Black volatilities for caplets and caps. . . . . . . . . . . . . . . . . . . . 249

A.1 Screenshots of the Matlab Excel Builder . . . . . . . . . . . . . . . . . . . . . . . . 256
A.2 The folders created by [name illegible] . . . . . . . . . . . . . . . . . . . . . . . . . . 257
A.3 Screenshot of the [name illegible] add-in . . . . . . . . . . . . . . . . . . . . . . . . 262



Listings

1.1 [filename illegible] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.1 Black-Scholes Greeks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.2 Dynamic Delta hedging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.1 Payoff and boundaries for a call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.2 Payoff and boundaries for a put . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.3 θ-method solver for the Black-Scholes PDE . . . . . . . . . . . . . . . . . . . . . 84
3.4 Implementation of the θ-method solver . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.5 PSOR method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.6 θ-method solver with early exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.7 Implementation of PSOR for an American put . . . . . . . . . . . . . . . . . . . 89
3.8 Solver with barrier features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.9 Implementation for a discretely monitored barrier option . . . . . . . . . 92
3.10 PDE approximations for the Greeks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.11 Payoff and boundaries for a two-asset option . . . . . . . . . . . . . . . . . . . . 101
3.12 Solver for a two dimensional PDE (part I) . . . . . . . . . . . . . . . . . . . . . . . 103
3.13 Solver for a two dimensional PDE (part II) . . . . . . . . . . . . . . . . . . . . . . 104
3.14 Implementation of the two dimensional solver . . . . . . . . . . . . . . . . . . . 105

4.1 Characteristic function of the normal distribution . . . . . . . . . . . . . . . . 116
4.2 Characteristic function of the normal inverse Gaussian distribution 117
4.3 Trapezoidal integration of a characteristic function . . . . . . . . . . . . . . 119
4.4 Call pricing using the FFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.5 Fractional FFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.6 Call pricing using the FRFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.7 Integration over an interval using the FRFT . . . . . . . . . . . . . . . . . . . . . 126
4.8 Transform a characteristic function into a cumulative density
function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

5.1 Simulation and maximum likelihood estimation of ARMA models . . 137
5.2 One dimensional Kalman filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
5.3 The N-dimensional Kalman filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
5.4 The unscented Kalman filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

6.1 Garch likelihood function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
6.2 Estimation of a Garch model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
6.3 Egarch likelihood function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
6.4 Characteristic function of the Heston model . . . . . . . . . . . . . . . . . . . . . 177
6.5 Sum of squares for the Heston model . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.6 Calibration of the Heston model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
6.7 Nadaraya-Watson smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
6.8 Implied volatility surface smoother . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
6.9 Tests for static arbitrage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
6.10 Construction of implied densities and the local volatility surface . . 198

7.1 Yields based on the Nelson-Siegel-Svensson parametrization . . . . . 207
7.2 Calibration of the Nelson-Siegel formula to a yield curve . . . . . . . . . 208
7.3 Create Hull-White trees for the short rate . . . . . . . . . . . . . . . . . . . . . . . 224
7.4 Compute the price path of a payoff based on a Hull-White tree
for the short rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
7.5 The price path of a payoff based on the Hull-White tree when
American features are present . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
7.6 Implementation of the Black-Karasinski model using a
Hull-White interest rate tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
7.7 Correlation structure and principal component analysis of yield
curve movements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
7.8 A Kalman filter wrapper for the multi-factor Gaussian model . . . . . 244

A.1 Matlab file [filename illegible] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
A.2 VBA module [name illegible] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
A.3 VBA Activation Handlers [name illegible] . . . . . . . . . . . . . . . . . . . . . . . . 258
A.4
VBA User Input Handlers I (]
RO&^>>Rn%!>R Y ) . . . . . . . . . . . . . . . . . . . . . . . 259
A.5
VBA User Input Handlers II (]
RO&^>>Rn%!>R Y ) . . . . . . . . . . . . . . . . . . . . . . 260
A.6
A.7 VBA Add-in installation (

 a
 
& :
T o!>RS _ !!>S ) . . . . . . . . . . . . . . . . . . . . . . . . 261
261



1 Elements of stochastic calculus

In order to understand the evolution of asset prices in continuous time in general,


and in the Black-Scholes framework in particular, one needs some tools from
stochastic calculus. In this part we will give an overview of the main ideas from
a probabilistic point of view. In the next chapter we will put these ideas to work in developing derivative pricing within the Black-Scholes paradigm.
Our objective at this stage is to construct stochastic processes in continuous
time that have the potential to capture the probabilistic/dynamic behavior of
assets. We want to be as rigorous as possible in our definitions without leaving
any exploitable loopholes, but we don’t want to be too abstract. The theory is
covered (in increasing mathematical complexity) in Øksendal (2003), the two vol-
umes of Rogers and Williams (1994a,b) and Protter (2004). Expositions that have
some elements of maths relating to finance are (again in increasing complexity)
Hull (2003), Neftci (2000), Bingham and Kiesel (2000), and Shreve (2004a,b).

1.1 THE SAMPLE SPACE


We like to think of stochastic processes (or asset prices in our case) as the
outcome of an experiment or as the result of the state of nature. Each state of
the world is a configuration that potentially affects the value of the stochastic
process.
Definition 1. We will denote the set of all states of the world with Ω and we
call it the state space or sample space. The elements ω ∈ Ω are called the
states of the world, sample points or sample paths.
Of course these states of the world are very complicated multidimensional
configurations, and are typically not even numerical. In most cases they are not
directly revealed to us (the observer).
Definition 2. A random variable quantifies the outcome of the experiment, by
mapping events to real numbers (or vectors), and this is what we actually observe.

Therefore, a random variable X is just a function


X : Ω −→ Rn : ω −→ X(ω)
Example 1. Say that we toss a coin three times; the sample space
will be the set (with 23 = 8 elements)
Ω = {HHH, HHT , HT H, . . . T T H, T T T }
This sample space is not numerical, but we can define the random variables X
and Y in the following way
1. X = X(ω) = h, where h is the number of H in ω
2. Y = Y (ω) = |h − t|, where h, t are the numbers of H, T in ω
In the following table we summarize the possible sample space outcomes, to-
gether with the corresponding values of two different random variables X and Y

ω HHH HHT HTH THH HTT THT TTH TTT


X (ω) 3 2 2 2 1 1 1 0
Y (ω) 3 1 1 1 1 1 1 3

Apparently, the random variable X counts the number of heads thrown, while
the random variable Y counts the absolute difference between the heads and
tails throws.
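Everything in this small example can be checked mechanically. The notes otherwise use Matlab; the sketch below does the same bookkeeping in Python purely for illustration, enumerating the sample space and evaluating both random variables at every sample point.

```python
from itertools import product

# Sample space of three coin tosses, e.g. "HHT"
omega_space = ["".join(t) for t in product("HT", repeat=3)]

# The two random variables as functions on the sample space
def X(w):
    return w.count("H")                      # number of heads thrown

def Y(w):
    return abs(w.count("H") - w.count("T"))  # |heads - tails|

for w in omega_space:
    print(w, X(w), Y(w))
```

The printed table reproduces the one above, e.g. X(HHT) = 2 and Y(HHT) = 1.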
Of course this implies that the probabilistic behavior of the random variable
will depend solely on the probabilistic behavior of the states of the world. In
particular, we can write the probability (although we have not formally defined
yet what a probability is)
Pr[X(ω) = x] = Pr[{ω : X(ω) = x}]
In the example above the sample space was small and discrete, and for that
reason the analysis was pretty much straightforward. Unfortunately, this is not
usually the case, and a typical sample space is not discrete. If the sample space
is continuous, expressions like Pr[ω] for elements ω ∈ Ω will be mostly zero,
and therefore not of much interest.
Therefore, rather than assigning probabilities to elements of Ω we need to
assign them to subsets of Ω. A natural question that follows is the following:
can we really assign probabilities to any subset of Ω, no matter how weird and
complicated it is? The answer to this question is generally no. We can construct
sets, like the Vitali set, for which we cannot define probabilities. 1 Subsets of
1
Note that this does not mean that the probability is zero, it means that even if we
assume that the probability is zero we are driven to paradoxes. In fact, the probability
of such a set can not exist, and we cannot allow such a set to be considered for that
purpose.

the sample space Ω that are nice enough to allow us to define probabilities on
them are called sigma algebras.

σ -ALGEBRAS AND BOREL SETS


Definition 3. A subset of the power set F ⊆ P(Ω) is called a σ-algebra on Ω if it contains the sample space, and complements and countable unions also belong to the set F :
1. Ω ∈ F
2. F ∈ F ⇒ F^c ∈ F (complements)
3. F1 , F2 , . . . ∈ F ⇒ ⋃_{i=1}^∞ Fi ∈ F (countable unions)

It turns out that σ-algebras are just the families of sets we need to define
probabilities upon, as they are nice enough not to lead us to complications and
paradoxes. Probabilities will be well defined on elements of F . The elements of
a σ-algebra are therefore called events.
As we will see, probabilities are just special cases of a large and very im-
portant class of set functions called measures. It is measures in general that are
defined on σ-algebras. The pair (Ω, F ) is called a measurable space, to indicate
the fact that it is prepared to “be measured”.
Example 2. For a sample space Ω there will exist many σ-algebras, and some
will be larger than others. For the specific sample space of the previous example,
where a coin is tossed three times, some σ-algebras that we may define are the
following
1. The minimal σ-algebra is F0 = {∅, Ω}. It is apparent that this is the
smallest possible set that will satisfy the conditions.
2. F1 = {∅, {HHH, HHT , HT H, HT T }, {T HH, T HT , T T H, T T T }, Ω} is
another σ-algebra on Ω. Apparently F0 ⊆ F1 .
3. The powerset of Ω is also a σ-algebra, F∞ = P(Ω). In fact this is the largest (or maximal) σ-algebra that we can define on a discrete set. If Ω were continuous, say Ω = R, the powerset is no longer a useful choice, as it contains pathological elements like the Vitali set, on which no sensible probability can be defined. The largest useful σ-algebra that we use in this case is the Borel σ-algebra.
An example of a set that is not a σ-algebra is

E = {∅, {HHH}, {HT H}, {T HH}, {T T H}, Ω}

since the complement of {HHH} ∈ E does not belong to it: {HHH}^c = Ω \ {HHH} ∉ E.
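On a finite sample space the defining conditions can be checked by brute force. The sketch below (Python, illustrative only; the notes use Matlab elsewhere) tests closure under complements and pairwise unions, which on a finite space is equivalent to closure under countable unions, confirming that the minimal family passes while the family E above fails.

```python
from itertools import product, combinations

omega = frozenset("".join(t) for t in product("HT", repeat=3))

def is_sigma_algebra(family, omega):
    """Brute-force check of the sigma-algebra conditions on a finite space."""
    fam = {frozenset(s) for s in family}
    if frozenset() not in fam or omega not in fam:
        return False
    if any(omega - s not in fam for s in fam):   # closed under complements?
        return False
    # on a finite space, closure under pairwise unions suffices
    return all(a | b in fam for a, b in combinations(fam, 2))

F0 = [set(), omega]                                      # the minimal sigma-algebra
E  = [set(), {"HHH"}, {"HTH"}, {"THH"}, {"TTH"}, omega]  # the family above
print(is_sigma_algebra(F0, omega), is_sigma_algebra(E, omega))  # True False
```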
We usually work with subsets of the real numbers, and, as we hinted in the
previous subsection, σ-algebras that are defined on the real numbers (or any
other Euclidean space) are very important. We saw that when we want to define
a large σ-algebra, the powerset is not an option, since it includes pathological
cases. The Borel algebra takes its place for such sets.
Definition 4. More formally, the Borel (σ-)algebra is the smallest σ-algebra that
contains all open sets of R (or Rn ).


Roughly speaking, Borel sets are constructed from open intervals in R, by


taking in addition all possible unions, intersections and complements. We denote
the Borel algebra with B = B(Rn ).
In fact, it is very difficult to find a set that does not belong to the Borel
algebra, and the ones that don’t are so complicated that we cannot enumerate
their elements.

GENERATED σ -ALGEBRAS
So far we have defined σ-algebras and we have shown ways to describe them
by expressing some property of their elements. We can also define a σ-algebra
based on a collection of reference subsets of Ω.

Definition 5. In particular, given a family G of subsets of Ω, there is a σ-algebra


which is the smallest one that contains G . This is the σ-algebra generated by
G , and we denote it with F (G ) = FG .

The generated σ-algebra will be equal to the intersection of all σ-algebras


that contain G (since it is the smallest one with that property)
F (G ) = ⋂ {F : F is a σ-algebra on Ω, and G ⊂ F }

A random variable can also create a σ-algebra. Given a random variable X,


there is a σ-algebra which is the smallest one that contains the pre-image of X

{X −1 (G) : G ⊂ Rn , and G is open}

This is the σ-algebra generated by X, and it is denoted by

F (X) = FX = {X −1 (B) : B ∈ B}

Example 3. Following our coin example, the random variable X will generate the
σ-algebra

FX = {∅, {HHH}, {HHT , HT H, T HH}, {HT T , T HT , T T H},


{T T T }, all complements, all unions, all intersections}

It should be straightforward to verify that FX


1. is a σ-algebra
2. contains all sets X −1 (G), for G ∈ B
3. is the smallest such set
For Y , the generated σ-algebra is

FY = {∅, {HHT , HT H, T HH, HT T , T HT , T T H},


{HHH, T T T }, all complements, all unions, all intersections}
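These generated σ-algebras can also be constructed mechanically: the preimages X −1 ({x}) partition Ω into "atoms", and every element of F (X) is a union of atoms. The Python sketch below (illustrative only, not from the notes) builds both families and confirms their sizes: 2^4 = 16 sets for X (four atoms) and 2^2 = 4 sets for Y (two atoms).

```python
from itertools import product

omega = frozenset("".join(t) for t in product("HT", repeat=3))

def sigma_generated(rv, omega):
    """Smallest sigma-algebra containing every preimage rv^{-1}({x})."""
    part = {}
    for w in omega:                       # atoms = preimages of single values
        part.setdefault(rv(w), set()).add(w)
    atoms = [frozenset(s) for s in part.values()]
    fam = set()                           # collect all unions of atoms
    for mask in range(2 ** len(atoms)):
        u = frozenset()
        for i, a in enumerate(atoms):
            if (mask >> i) & 1:
                u |= a
        fam.add(u)
    return fam

FX = sigma_generated(lambda w: w.count("H"), omega)                      # from X
FY = sigma_generated(lambda w: abs(w.count("H") - w.count("T")), omega)  # from Y
print(len(FX), len(FY))   # 16 4
```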

σ -ALGEBRAS AND INFORMATION
In the theory of stochastic processes σ-algebras are closely linked with infor-
mation.
Intuitively, the generated σ-algebra FX captures the information we acquire
by observing realizations of the random variable X. Knowing the realization
X = x allows us to decide in which elements of the σ-algebra FX the sample
point ω belongs. The sample point ω is the one that created the realization
X(ω) = x.
If we have two random variables X, Y on the same measurable space (Ω, F ), and FY ⊆ FX , then knowing the realization of X gives us enough information to determine the realization of Y , without observing it directly. In particular, there exists a function f such that Y = f(X).
If in addition the reverse inclusion does not hold, and FX ⊄ FY ,
then this function is not invertible, and knowledge of the realization of Y does
not determine X uniquely. That is to say observing Y does not offer us enough
information to infer the value of X.
In the case where the σ-algebras are the same, FY = FX , then the two
variables contain exactly the same information: observing one is the same as
observing the other.

Example 4. In our coin example it is easy to confirm that FY ⊆ FX , but FX ⊄


FY . This will mean that if we are given the realization of the random variable
X, then we should be able to uniquely determine the realization of the random
variable Y , but not vice versa.
As an example, say that we observe X = 2. Then we know that the sam-
ple point ω that was selected from the sample space will belong to set
F = {HHT , HT H, T HH}, since only for these points X(ω) = 2. It is easy
to verify that F ∈ FX and also F ∈ FY . In fact, for all ω ∈ F we have
Y (ω) = 1. Therefore observing X = 2 uniquely determines the value of Y = 1.
On the other hand, say that we observe the random variable Y , and we have the realization Y = 3, indicating that the sample point selected belongs to the set F* = {HHH, TTT}. Now, of course, F* ∈ FY but it does not belong to the σ-algebra generated by X, that is F* ∉ FX . In fact, if we are given Y = 3 we are not given enough information to decide if X = 3 or X = 0.
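This asymmetry can be confirmed by direct enumeration: every value of X pins down a single value of Y , while the value Y = 3 is compatible with both X = 0 and X = 3. A short illustrative check in Python (the notes otherwise use Matlab):

```python
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]
X = lambda w: w.count("H")
Y = lambda w: abs(w.count("H") - w.count("T"))

x_to_y, y_to_x = {}, {}
for w in omega:
    x_to_y.setdefault(X(w), set()).add(Y(w))
    y_to_x.setdefault(Y(w), set()).add(X(w))

# X determines Y: each realization of X is compatible with exactly one Y
print(all(len(ys) == 1 for ys in x_to_y.values()))   # True
# ... but Y = 3 leaves X undetermined
print(sorted(y_to_x[3]))                             # [0, 3]
```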

1.2 MEASURES AND PROBABILITY


In the previous section we paved the way for the introduction of the measure
function. We introduced the sample space and the sets of its subsets that form
σ-algebras, which are exactly the sets that are well-behaved enough to be
measured. We saw that things are relatively straightforward when the sample
space is discrete, and a natural σ-algebra is the powerset. When the sample
space is continuous we have to be more careful when constructing σ-algebras,


as there are sets that we need to exclude (like the Vitali set). The Borel algebra
is here the natural choice.
Definitions of the probability date back as far as Carneades of Cyrene (214-
129BC), a prominent student at Plato’s academy. More recently, Abraham De
Moivre (1711) and Pierre-Simon Laplace (1812) have also attempted to formalize
the everyday notion of “probability”. Modern probability took off in the 1930s,
largely inspired by the axiomatic foundations on measure theory laid by Andrei Nikolaevich Kolmogorov.2

Definition 6. Given a measurable space (Ω, F ) we can define a measure µ as a function that maps elements of the σ-algebra to the non-negative extended real numbers, µ : F → [0, ∞], and also has the following two properties:
1. the measure of the empty set is zero: µ(∅) = 0
2. the measure of a countable union of disjoint sets is the sum of their measures, also called σ-additivity: F1 , F2 , . . . ∈ F , and Fi ∩ Fj = ∅ for all i ≠ j ⇒ µ(⋃_{i=1}^∞ Fi ) = Σ_{i=1}^∞ µ(Fi )

The measure can be thought of as the mathematical equivalent of our every-


day notion of “measure”, as in the length of line segments, the volume of solid
bodies, the probability of events, the time needed to travel between points, and
so on.
After augmenting the measurable space with a measure, the triplet (Ω, F , µ)
is called a measure space. The subsets of Ω that are elements of F are called
measurable sets, indicating that they can be potentially measured by µ. Note
that we can define more than one measure on the same measurable space,
creating a whole array of measure spaces (Ω, F , µ1 ), (Ω, F , µ2 ), and so on.

MEASURABLE FUNCTIONS
Based on the notion of measurable sets, we can turn to functions that map from
one measurable space (Ω, F ) to another measurable space (Ψ, G ). Of special interest are functions with the property that their pre-images of measurable sets in the destination space Ψ are also measurable sets in the departure space Ω.

Definition 7. A function f that maps from a measure space (Ω, F ) to (Ψ, G )

f : Ω −→ Ψ

is a (F , G )-measurable function if

for all G ∈ G we have f −1 (G) ∈ F


2
In the words of Kolmogorov: “The theory of probability as [a] mathematical discipline
can and should be developed from axioms in exactly the same way as geometry and
algebra.”

If the function f maps from Ω to the Euclidean space Ψ = Rn , augmented with
the Borel σ-algebra G = B(Rn ), then we call the function just F -measurable
(shortened to F -meas)
A random variable X is indeed a function X : Ω → Rn , therefore we can talk
of measurable random variables. In particular, by the definition of the generated
σ-algebra, a random variable will always be measurable to the σ-algebra it
generates, X is FX -meas.
If in addition f maps from a Euclidean space Ω = Rm , also augmented with the corresponding Borel algebra, F = B(Rm ), then the function is called just measurable.

Example 5. In our coin example we defined two random variables, X and Y , from
the sample space of three coin tosses

X, Y : Ω −→ R

Each one of these random variables will induce a measurable space on Ω, through the σ-algebra it generates:

(Ω, FX ) for X, and (Ω, FY ) for Y

By construction, X is FX -meas and Y is FY -meas. On the other hand, while


Y is FX -meas, X is not an FY -meas random variable. In terms of information, Y being FX -meas means that knowing X will determine Y , but not the other way round.

PROBABILITY MEASURES
As we indicated in the last subsections, measures are the mathematical equiva-
lent of our everyday notion of “measure”. In the context of stochastic processes
we are not interested in general measures, but in a small subset: the probability
measures.

Definition 8. A probability measure P is just a measure on (Ω, F ), with the


added property that
P(Ω) = 1
The measure space (Ω, F , P) is called a probability space

Therefore, for a function P : F → R to be a probability measure there are three requirements
1. P(∅) = 0
2. P(Ω) = 1
3. for all F1 , F2 , . . . ∈ F with Fi ∩ Fj = ∅ for all i ≠ j: P(⋃_{i=1}^∞ Fi ) = Σ_{i=1}^∞ P(Fi )


It is obvious that probability measures are not unique on a measurable space.


Given (Ω, F ) we can define different probability spaces (Ω, F , P 1 ), (Ω, F , P2 ),
and so on.
Given a probability space and a random variable X : Ω → R n , we can
define a probability measure on the Euclidean space R n endowed with its Borel
algebra, (Rn , B(Rn )), in the following way

PX : B(Rn ) −→ [0, 1] : PX (B) = P(X −1 (B))

It is straightforward to verify that PX is a probability measure on (Rn , B(Rn )).


It is important to remember that the probability measure is defined on events
of the sample space, but it induces a probability measure on the real numbers
through random variables. That means that the same random variable X can in-
duce different probability measures on Rn , based on different probability spaces.
For the same B ∈ B(Rn )

(Ω, F , P) PX (B) = P(X −1 (B))


(Ω, F , Q) QX (B) = Q(X −1 (B))
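For a concrete illustration, take P to be the fair-coin measure on the three-toss space and Q a hypothetical biased-coin measure with probability 0.75 of heads (an assumed value, used only for this sketch). The same random variable X then induces two different measures on R; in Python (illustrative, the notes otherwise use Matlab):

```python
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]
X = lambda w: w.count("H")

def coin_measure(p_head):
    """Probability of each sample point under three independent tosses."""
    return {w: p_head ** w.count("H") * (1 - p_head) ** w.count("T")
            for w in omega}

P = coin_measure(0.5)    # fair-coin measure
Q = coin_measure(0.75)   # hypothetical biased measure (assumption)

def induced(mu):
    """P_X({k}) = mu(X^{-1}({k})) for each value k of X."""
    out = {}
    for w, pw in mu.items():
        out[X(w)] = out.get(X(w), 0.0) + pw
    return out

print(induced(P))   # the Binomial(3, 1/2) weights
print(induced(Q))   # the Binomial(3, 3/4) weights
```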

In practice we cannot manipulate the sample space directly, since typically we might not even know what the sample space is. Instead, we assume that
the sample space exists and a measurable space is well defined, but we work
with the induced probability measure. Furthermore, with some abuse of notation,
we also denote this induced measure with P.
Different induced probability measures PX , QX , . . . will then be due to differ-
ent measures P, Q, . . . on the measurable space (Ω, F ). These different measures
can be associated with differences of beliefs, differences of behavior, or other is-
sues.
For example, in finance investors are interested in the probabilistic behavior
of a speculative asset price, which is a random variable S : Ω → R + . In a simple
setting, all investors might know and agree on the true (induced) probability
measure P, but they might behave as if the probability measure was a different
one, say Q. There can be many ways that this discrepancy can be theoretically
explained: we will see that it can be a consequence of the risk aversion of
investors, market frictions like transaction costs or liquidity constraints, or other
causes.

EQUIVALENT PROBABILITY MEASURES


As we pointed out in the previous subsection, each random variable can induce
a multitude of different probability measures. We can categorize different prob-
ability measures according to some of their properties. It turns out that the most
important of these classifications is the one that looks at the sets to which probability measures assign zero probability. Measures that agree on these sets are
called equivalent.

Definition 9. Given two probability measures P, Q on a measurable space (Ω, F ), we say that Q is absolutely continuous with respect to P, and we write Q ≪ P, if

P(F ) = 0 ⇒ Q(F ) = 0 for all F ∈ F

The Radon-Nikodym derivative of Q with respect to P is defined as

M = dQ/dP

which is well defined (as a P-almost surely unique random variable, by the Radon-Nikodym theorem) whenever Q ≪ P. If Q ≪ P and P ≪ Q the probability measures are called equivalent, and we write P ∼ Q.
Absolute continuity implies that impossible events under P will also be impossible under Q. If the measures are equivalent then they agree on the subsets of Ω that have zero probability.

CONDITIONAL PROBABILITY
The conditional probability is one of the main building blocks of probability the-
ory, and deals with situations where some partial knowledge about the outcome
of the experiment “shrinks” the sample space.
Consider a probability space (Ω, F , P) and two events A, F ∈ F . If we assume that a randomly selected sample ω ∈ A, we want to investigate the probability that ω ∈ F . Since we know that ω ∈ A, the sample space has shrunk to A ⊆ Ω, and the appropriate sigma algebra is constructed as FA = {F ⊆ A : F = G ∩ A, G ∈ F }. The members of FA are conditional events, that is to say event F is the event G conditional on event A. We denote the conditional events as F = G|A.
It is not hard to verify that FA is indeed a σ-algebra on A.
1. The empty set ∅ = (∅ ∩ A) ∈ FA trivially.
2. Also, for an element (G|A) ∈ FA the complement (in the set A) (G|A)^c = G^c ∩ A ∈ FA , since G^c ∈ F .
3. Finally, the countable union ⋃_{i∈I} (Gi |A) = ⋃_{i∈I} (Gi ∩ A) = (⋃_{i∈I} Gi ) ∩ A ∈ FA .
Therefore (A, FA ) is a measurable space.
Definition 10. Consider a probability space (Ω, F , P) and an event A ∈ F with
P(A) > 0. The conditional probability is defined, for all F ∈ F , as
P(F ∩ A)
PA (F ) = P(F |A) =
P(A)
We can verify easily that PA is a probability measure on (Ω, F ), which makes
(Ω, F , PA ) a probability space.3 For all events F ∈ F where P(F ∩ A) = 0, the
3
This is indeed an example of different probability measures defined on the same mea-
surable space.


conditional probability P(F |A) = 0. This means that these two events cannot
happen at the same time.
We argued above that by conditioning on the event A we shrink the measur-
able space (Ω, F ) to the smaller measurable space (A, FA ). In fact, equipped
with the measure PA , the latter becomes a probability space. It is easy to verify
that PA is a probability measure on (A, FA ), since P(A|A) = 1. Thus, we can
claim that by conditioning on A the probability space (Ω, F , P) shrinks to the
probability space (A, FA , PA ).
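The defining ratio P(F |A) = P(F ∩ A)/P(A) is easy to evaluate on the coin-toss space. In the Python sketch below (illustrative only) the events, "first toss is a head" and "at least two heads", are chosen purely for demonstration:

```python
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]
P = {w: 1 / 8 for w in omega}          # the fair-coin measure

def prob(event):
    return sum(P[w] for w in event)

def cond_prob(F, A):
    """P(F | A) = P(F n A) / P(A), defined when P(A) > 0."""
    return prob(set(F) & set(A)) / prob(A)

A = {w for w in omega if w[0] == "H"}          # first toss is a head
F = {w for w in omega if w.count("H") >= 2}    # at least two heads
print(cond_prob(F, A))   # 0.75, versus the unconditional P(F) = 0.5
```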
We can also successively condition on a family of events A1 , A2 , . . . , An . In fact, we can derive the following useful identity
P(A1 ∩ A2 ∩ · · · ∩ An ) = P(A1 ) · P(A2 |A1 ) · P(A3 |A1 ∩ A2 ) · · · P(An |A1 ∩ · · · ∩ An−1 )
Another consequence of the definition is the celebrated Bayes' theorem, which states that if Fi ∈ F , i ∈ I, is a collection of disjoint events with ⋃_{i∈I} Fi = Ω, and A ∈ F is another event with P(A) > 0, then

P(Fℓ |A) = P(Fℓ ) P(A|Fℓ ) / Σ_{i∈I} P(Fi ) P(A|Fi )

Bayes' theorem is extensively used to update expectations and forecasts based on new evidence as this is gathered. This is an example of the filtering problem.
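Bayes' theorem can be verified numerically on the same space. In the illustrative Python sketch below the partition is {at least two heads, at most one head}, the evidence A is "first toss is a head", and the theorem's output is compared with the conditional probability computed directly from the definition:

```python
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]
P = {w: 1 / 8 for w in omega}                      # fair-coin measure

def prob(ev):
    return sum(P[w] for w in ev)

def cond(F, A):
    return prob(set(F) & set(A)) / prob(A)

F1 = {w for w in omega if w.count("H") >= 2}       # at least two heads
F2 = set(omega) - F1                               # its complement: a partition
A  = {w for w in omega if w[0] == "H"}             # the evidence

# Bayes: posterior from priors P(Fi) and likelihoods P(A | Fi)
posterior = (prob(F1) * cond(A, F1)
             / (prob(F1) * cond(A, F1) + prob(F2) * cond(A, F2)))
print(posterior, cond(F1, A))   # both equal 0.75
```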

EXPECTATIONS
Given a probability space (Ω, F , P), consider an F -meas random variable X,
and assume that the random variable is integrable
∫Ω |X(ω)| · dP(ω) < ∞

A very important quantity is the expectation of X.
Definition 11. The expectation of X with respect to the probability measure P
is given by the integral
EX = ∫Ω X(ω) · dP(ω) = ∫Rn x · dPX (x)
The conditional expectation given a sub-σ-algebra G ⊂ F is a random
variable E[X|G ] that has the properties
1. E[X|G ] is G -measurable
2. For all G ∈ G : ∫G E[X|G ] dP = ∫G X dP
The conditional expectation is a random variable, since for different ω ∈ Ω the
quantity E[X|G ] will be different.
One can use the Radon-Nikodym derivative to compute expectations under
different equivalent probability measures. In particular, expectations under Q are
written as
Z Z Z
dQ
EQ X = xdQ(x) = x dP(x) = xM(x)dP(x) = EP [M(X)X]
Ω R dP Ω
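On a discrete sample space the Radon-Nikodym derivative is simply the ratio of the two probability mass functions, so the identity EQ X = EP [M(X)X] can be checked term by term. In the Python sketch below (illustrative only) Q is an arbitrary biased-coin measure; the bias 0.75 is an assumption made for the example:

```python
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]
X = lambda w: w.count("H")

def coin_measure(p):
    return {w: p ** w.count("H") * (1 - p) ** (3 - w.count("H")) for w in omega}

P, Q = coin_measure(0.5), coin_measure(0.75)

# discrete Radon-Nikodym derivative: M(w) = Q({w}) / P({w})
M = {w: Q[w] / P[w] for w in omega}

E_Q_direct   = sum(Q[w] * X(w) for w in omega)           # E_Q[X]
E_Q_weighted = sum(P[w] * M[w] * X(w) for w in omega)    # E_P[M X]
print(E_Q_direct, E_Q_weighted)   # both 2.25 = 3 * 0.75
```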

INDEPENDENCE
• Two events F1 , F2 ∈ F are independent if
P(F1 ∩ F2 ) = P(F1 ) · P(F2 )
• Two σ-algebras F1 , F2 are independent if all pairs F1 ∈ F1 and F2 ∈ F2
are independent
• Two random variables X1 and X2 are independent if the corresponding gen-
erated σ-algebras FX1 and FX2 are independent
• If X is G -measurable, then E[X|G ] = X
• If X is independent of G , then E[X|G ] = EX
• If H ⊂ G then E[E[X|G ]|H ] = E[X|H ] (tower property)
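The first definition is easy to test on the coin space: under the fair-coin measure the events "first toss is a head" and "second toss is a head" are independent, whereas "first toss is a head" and "at least two heads" are not. A Python sketch with illustrative events (the notes otherwise use Matlab):

```python
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]
P = {w: 1 / 8 for w in omega}

def prob(ev):
    return sum(P[w] for w in ev)

F1 = {w for w in omega if w[0] == "H"}          # first toss is a head
F2 = {w for w in omega if w[1] == "H"}          # second toss is a head
G  = {w for w in omega if w.count("H") >= 2}    # at least two heads

print(abs(prob(F1 & F2) - prob(F1) * prob(F2)) < 1e-12)  # True: independent
print(abs(prob(F1 & G) - prob(F1) * prob(G)) < 1e-12)    # False: dependent
```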

1.3 STOCHASTIC PROCESS


Of course a random variable is sufficient if we want to describe uncertainty at a
single point in time. For example, we can assume that an asset price at a future
date is a random variable that depends on the state of the world on that date.
But typically we are interested not only in this static profile of the asset price, but also in the dynamics that might lead there.
Therefore, by collecting a number of random variables, resembling the asset
price at different times, we construct a stochastic process.
Definition 12. A stochastic process is a parameterized family of random vari-
ables {X(t)}t∈T , where all random variables are defined on the same probability
space (Ω, F , P)
Xt : Ω −→ Rn
In our setting the subscript t denotes time, but it could well be a spatial
coordinate. The set T will determine if the stochastic process is defined in
continuous or in discrete time. In particular, if T = {0, 1, 2, . . .} then we have
a discrete time process, while if T = [0, ∞) the process is cast in continuous
time.
There are two different ways to look at the realizations of a stochastic pro-
cess.
1. If we fix time we have a random variable
ω −→ X(t, ω), for all ω ∈ Ω
2. If we fix a state of the world ω we have the trajectory or path
t −→ X(t, ω), for all t ∈ T
There are also different ways to denote a stochastic process, and we use the
one that clarifies the way we view it at the time, for example X t , X(t, ω), Xt (ω),
or X(ω)(t).


Example 6. Let us revisit our coin experiment, where we flip a coin three times. The state
space will collect all possible outcomes
Ω = {HHH, HHT , HT H, . . . , T T T }
We define the collection of random variables
X(t, ω) = number of H in the first t throws
These random variables define a stochastic process on T = {0, 1, 2, 3}. In this
simple case we can tabulate them and keep track of its behavior for all times
and sample points

ω HHH HHT HTH THH HTT THT TTH TTT


X (0, ω) 0 0 0 0 0 0 0 0
X (1, ω) 1 1 1 0 1 0 0 0
X (2, ω) 2 2 1 1 1 1 0 0
X (3, ω) 3 2 2 2 1 1 1 0

We can fix time, say t = 2, and concentrate on the random variable X 2 (ω) which
is given by the horizontal slice of the table above

ω HHH HHT HTH THH HTT THT TTH TTT


X (2, ω) 2 2 1 1 1 1 0 0

Alternatively we can fix the sample point, say ω = T HT , and concentrate on


the function Xt (T HT )

t 0 1 2 3
X (t, T HT ) 0 0 1 1
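Both views, the horizontal slice at fixed t and the path at fixed ω, can be reproduced from a single function of (t, ω). A short illustrative sketch in Python:

```python
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]

def X(t, w):
    """Number of H in the first t throws of the sample point w."""
    return w[:t].count("H")

# View 1: fix t = 2 and vary the state -> a random variable
slice_t2 = {w: X(2, w) for w in omega}
print(slice_t2["HHT"], slice_t2["TTH"])   # 2 0

# View 2: fix w = "THT" and vary t -> a trajectory (path)
path = [X(t, "THT") for t in range(4)]
print(path)   # [0, 0, 1, 1]
```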

Example 7. On the same probability space we can define another stochastic


process, say Y (t), where Y (t, ω) = 1 if we have thrown an even number of H up to time t,
0 otherwise (where zero is considered an even number). In that case the possible
values of the process are given in the following table

ω HHH HHT HTH THH HTT THT TTH TTT


Y (0, ω) 1 1 1 1 1 1 1 1
Y (1, ω) 0 0 0 1 0 1 1 1
Y (2, ω) 1 1 0 0 0 0 1 1
Y (3, ω) 0 1 1 1 0 0 0 1

FILTRATION
We discussed in the previous section how σ-algebras can be associated with
information. In particular, we noted that if a random variable X is F -measurable,
then we can determine the value of X(ω) without knowing the exact value of the
sample point ω, but by merely knowing in which sets F ∈ F the sample point
belongs to. In the context of stochastic processes information changes: typically
information is accumulated and a filtration is defined, but sometimes information
can also be destroyed. Therefore, the σ-algebras with respect to which the random variables X(t) are measurable must evolve to reflect that.

Definition 13. Consider a probability space (Ω, F , P). A filtration is a collection


of non-decreasing σ-algebras on Ω

F = {Ft }t∈T with Ft1 ⊆ Ft2 for all t1 , t2 ∈ T , t1 ≤ t2

where of course Ft ⊆ F for all t ∈ T


The quadruple (Ω, F, F , P) is called a filtered space.

A stochastic process Xt is called adapted (or Ft -adapted if the filtration


is ambiguous) if all random variables Xt are Ft -measurable. In the previous
section we discussed how a random variable generates a σ-algebra which keeps
the information gather by observing the realization of this random variable.
Here each collection of random variables {Xs }s6t will generate a σ-algebra (for
each t ∈ T ). This collection of σ-algebras is called the natural filtration of the
stochastic process Xt .
We denote this filtration by Ft = σ(Xs : 0 ≤ s ≤ t), and in fact it is
the smallest filtration that makes Xt adapted. It represents the accumulated
information we gather by observing the process Xt up to time t. Note that this is
in fact different from the σ-algebra generated by the random variable X t alone,
in fact F (Xt ) ⊆ Ft .
Intuitively when ω ∈ Ω was chosen, the complete path {Xt }t∈T was chosen
as well, but this path has not been completely revealed to us. Our information
consists only of the part {Xs }0≤s≤t . Based on this information we cannot pinpoint precisely which ω was selected, but we can tell with certainty whether ω belongs to some specific subsets of Ω that form the natural filtration Ft .
Another process Yt (ω) will be Ft -adapted if we can ascertain with certainty
the value Yt by observing Xt . There are two ways of looking at this dependence
1. There exist functions {ft }t∈T such that Yt = ft ({Xs }0≤s≤t ) for all t. The
value Yt is a deterministic function of the history of Xt up to time t.
2. The natural filtration of Yt is subsumed in the natural filtration of Xt , that
is to say F (Yt ) ⊆ F (Xt ) for all t ∈ T .

Example 8. In our coin example the σ-algebras generated by the random vari-
ables Xt are the following


F (X0 ) = {∅, Ω}
F (X1 ) = {∅, {HHH, HHT , HT T , HT H}, {T HH, T HT , T T H, T T T }, Ω}
F (X2 ) = {∅, {HHH, HHT }, {T T H, T T T }, {HT H, HT T , T HH, T HT },
all complements, all unions, all intersections}

The corresponding filtrations will include all unions, intersections and comple-
ments of the individual algebras, namely

F0 = F (X0 )
F1 = F (X0 ) ⊗ F (X1 )
F2 = F (X0 ) ⊗ F (X1 ) ⊗ F (X2 )

For example the set {HTH, HTT} belongs to neither F (X1 ) nor F (X2 ),
but it belongs to F2 , since

{HTH, HTT} = {HHH, HHT, HTT, HTH} ∩ {HTH, HTT, THH, THT}

where {HHH, HHT, HTT, HTH} ∈ F (X1 ) and {HTH, HTT, THH, THT} ∈ F (X2 ).

Intuitively this set represents the event “first toss is a head and second toss
is a tail”. Since Xt measures the number of heads, this event cannot be decided
upon by just observing X1 or by just observing X2 , but it can be deduced by
observing both. In particular, it is equivalent to the intersection of the event
“one head up to time t = 1” (the event in F (X1 )) with the event “one head up
to time t = 2” (the event in F (X2 )).
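The coin example is small enough to enumerate by machine. The sketch below (Python, for illustration; names are mine, the notes' listings use Matlab) rebuilds the events generated by each Xt and checks the intersection just described.

```python
from itertools import product

# sample space: all sequences of three coin tosses
omega = {"".join(p) for p in product("HT", repeat=3)}

# X_t = number of heads in the first t tosses
def X(t, w):
    return w[:t].count("H")

# the event {X_t = k}: an atom of the sigma-algebra F(X_t)
def event(t, k):
    return {w for w in omega if X(t, w) == k}

A = event(1, 1)   # "one head up to t = 1"
B = event(2, 1)   # "one head up to t = 2"

# their intersection pins down "first toss H, second toss T"
print(sorted(A & B))  # ['HTH', 'HTT']
```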

DISTRIBUTIONS OF A PROCESS
Based on the probability space (Ω, F , P) we can define the finite-dimensional
distributions of the process Xt . For any collection of times t1 , . . . , tm and
Borel events F1 , . . . , Fm in B(Rn ), the distribution

P(Xt1 ∈ F1 , Xt2 ∈ F2 , . . . , Xtm ∈ Fm )

characterizes the process and determines many important (but not all) of its
properties. The inverse question is of importance too: given a set of distributions,
is there a stochastic process that exhibits them?
Kolmogorov’s extension theorem gives an answer to that question. Suppose
that for all ` = 1, 2, . . ., and for every finite set of times t1 , . . . , t` in T , we can
provide a probability measure µt1 ,t2 ,...,t` on (Rn` , B(Rn` )) that satisfies the following
consistency conditions:
1. For all Borel sets F1 , . . . , F` in Rn the recursive extension

µt1 ,t2 ,...,t` (F1 × F2 × · · · × F` ) = µt1 ,t2 ,...,t` ,t`+1 (F1 × F2 × · · · × F` × Rn )

2. For all Borel sets F1 , . . . , F` in Rn , and for all permutations ℘ on the set
{1, 2, . . . , `}

µt℘(1) ,t℘(2) ,...,t℘(`) (F1 × F2 × · · · × F` ) = µt1 ,t2 ,...,t` (F℘−1 (1) × F℘−1 (2) × · · · × F℘−1 (`) )
Then, there exists a probability space (Ω, F , P) and a stochastic process Xt from
Ω to Rn , which has the measures µ as its finite-dimensional distributions, that is
to say for all ` = 1, 2, . . .

P(Xt1 ∈ F1 , Xt2 ∈ F2 , · · · , Xt` ∈ F` ) = µt1 ,t2 ,...,t` (F1 × F2 × · · · × F` )
Kolmogorov’s extension theorem gives a very small set of conditions that lead
to the existence of stochastic processes. This can be very useful, as we do not
need to explicitly construct a process from scratch. Indeed, it is easy to prove
the existence of many processes, like ones with infinitely divisible finite-dimensional
distributions, based on this theorem. The most important stochastic process is
undoubtedly the Brownian motion.

1.4 BROWNIAN MOTION AND DIFFUSIONS


There are many different definitions and characterizations of the Brownian mo-
tion.4 Here, in order to utilize Kolmogorov’s extension theorem, we will define
Brownian motion by invoking its transition density.
For simplicity we will only consider the one-dimensional case, but it is
straightforward to see the generalization to more dimensions. We define the
Gaussian transition density with parameter t for all x, y ∈ R, which essentially
describes the probability mass of moving from point x to y over a time interval
of length t

p(x, y; t) = exp( −(x − y)² / (2t) ) / √(2πt)
For any ` = 1, 2, . . ., all times t1 , t2 , . . . , t` , and all Borel sets F1 , F2 , . . . , F` in
R we also define the probability measures µ on R` in the following way

µt1 ,t2 ,...,t` (F1 × F2 × · · · × F` )
  = ∫F1 ×F2 ×···×F` p(x, x1 ; t1 ) · p(x1 , x2 ; t2 − t1 ) · · · p(x`−1 , x` ; t` − t`−1 ) dx1 dx2 · · · dx`


4
The name Brownian motion is in honor of the botanist Robert Brown (1773-1858) who
did extensive botanic research in Australia and observed the random movements of
particles within pollen, the first well documented example of Brownian motion. The
stochastic process is also called Wiener process, in honor of Norbert Wiener (1894-
1964) who studied extensively the properties of the process.

FIGURE 1.1: Construction of a Brownian motion trajectory.
It is easy to verify that the assumptions of Kolmogorov’s extension theorem
are satisfied, which means that there exists a probability space (Ω, F , P) and a
process with Gaussian increments, which we denote by {Bt }t≥0 and define as a
Brownian motion [BM] (started at B0 = x).
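The consistency conditions of the extension theorem boil down here to the Chapman-Kolmogorov property of the Gaussian kernel, ∫ p(x, z; s) p(z, y; t) dz = p(x, y; s + t). A quick numerical check (Python, for illustration; the grid and parameter values are mine):

```python
import numpy as np

def p(x, y, t):
    """Gaussian transition density with parameter t."""
    return np.exp(-(x - y) ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)

x, y, s, t = 0.3, -0.7, 0.4, 0.9
z = np.linspace(-12.0, 12.0, 20001)      # intermediate-point grid

# integrate the product of kernels over the intermediate point z
lhs = np.sum(p(x, z, s) * p(z, y, t)) * (z[1] - z[0])
rhs = p(x, y, s + t)
print(lhs, rhs)   # the two values agree
```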
We can also construct the Brownian motion directly, as an infinite sum of
random “tent shaped” functions. To this end we will need an infinite collection of
standard normal random variables Bkn , for all natural numbers n = 0, 1, . . . and
all odd numbers k with k ≤ 2^n . We first define the auxiliary functions gkn (u),
which are piecewise constant

g10 (u) = 1,   gkn (u) = {  2^((n−1)/2)   for 2^(−n) (k − 1) < u ≤ 2^(−n) k
                           −2^((n−1)/2)   for 2^(−n) k < u ≤ 2^(−n) (k + 1)
                            0             elsewhere
The “tent shaped” functions fkn (for each n and k) are the following integrals
over the interval [0, 1]

fkn (t) = ∫_0^t gkn (u) du
Finally, the Brownian motion is defined as the sum over all appropriate n
and k, that is to say

B(t) = Σ_{n=0}^∞ Σ_{k odd, k ≤ 2^n} fkn (t) · Bkn

This is of course a function B : [0, 1] → R. Essentially, at each level n a finer
correction is added to the existing function, with an impact which falls as 2^(−n),

LISTING 1.1: Construction of a Brownian motion trajectory

function [t, B] = brownian(nmax)
% Brownian motion on [0,1] as a sum of random "tent" functions.
% t : support of the trajectory
% B : columns hold the successive levels of approximation
t = (0:2^nmax)' / 2^nmax;
B = zeros(length(t), nmax+1);
B(:,1) = t * randn;                    % level n = 0: f(t) = t
for n = 1:nmax
    B(:,n+1) = B(:,n);
    for k = 1:2:2^n                    % loop through odd k
        % tent f_k^n, peak 2^(-(n+1)/2) at u = k*2^(-n)
        f = max(0, 2^((n-1)/2) * (2^(-n) - abs(t - k*2^(-n))));
        B(:,n+1) = B(:,n+1) + f * randn;
    end
end

as the functions gkn show. Listing 1.1 shows how the construction can be
implemented (a function call returns the support of the Brownian motion over
[0, 1] as a vector, together with the successive levels of approximation as the
columns of a matrix). The construction of the Brownian motion in this way is
illustrated in figure 1.1, which shows the approximations for levels n = 2, n = 5
and n = 10.
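For readers without Matlab, the same tent-function construction can be sketched in Python (variable and function names here are mine, not the notes'):

```python
import numpy as np

def brownian_levels(nmax, seed=None):
    """Brownian path on [0, 1] built from random 'tent' functions.

    Returns the grid t and a matrix whose column n holds the
    approximation after levels 0..n have been added.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(2 ** nmax + 1) / 2 ** nmax
    B = np.zeros((t.size, nmax + 1))
    B[:, 0] = t * rng.standard_normal()          # level 0: f(t) = t
    for n in range(1, nmax + 1):
        B[:, n] = B[:, n - 1]
        for k in range(1, 2 ** n, 2):            # odd k <= 2^n
            # tent of height 2^(-(n+1)/2) peaking at u = k * 2^(-n)
            f = np.maximum(0.0, 2 ** ((n - 1) / 2)
                           * (2.0 ** -n - np.abs(t - k * 2.0 ** -n)))
            B[:, n] += f * rng.standard_normal()
    return t, B

t, B = brownian_levels(10, seed=0)
print(B.shape)   # (1025, 11): 2^10 + 1 grid points, 11 levels
```

Every approximation starts at zero, since all the tent functions vanish at t = 0.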

PROPERTIES OF THE BROWNIAN MOTION


Having defined and constructed the Brownian motion, we now turn to
investigating some of its important properties. We will assume that {Bt }t≥0 is a
Brownian motion and Ft is its natural filtration.

The Brownian motion is a martingale


Given a filtered space (Ω, F , {Ft }t≥0 , P), a stochastic process Xt is a martingale if
1. Xt is adapted to the filtration


2. The process is integrable, E[|Xt |] < ∞, for all t > 0


3. The conditional expectation E[Xs |Ft ] = Xt , for all s > t > 0
It is not hard to verify that the Brownian motion is a martingale with respect
to its natural filtration. This means that the conditional expected increments of
a Brownian motion are zero, or that the best forecast one can provide is just the
current value
E[Bs − Bt |Ft ] = 0, or E[Bs |Ft ] = Bt
Also, one can easily show that Bt2 − t is a martingale.
Lévy’s theorem also states the converse: given a filtered space, if {Xt }t≥0 is
a continuous martingale, and Xt² − t is also a martingale, then Xt is a Brownian
motion. If we drop the second part, and instead require that E[Xs² |Ft ] = β(s − t)
for an adapted function β, then the time-changed process Xβ(t) will satisfy the
requirements of Lévy’s theorem. We can then conclude that every continuous
martingale can be represented as a time-changed Brownian motion.
Another important martingale is the exponential martingale process, given
by Mt = exp( θBt − ½θ²t ) for any parameter value θ ∈ R.
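Since Bt ~ N(0, t), the martingale property at the initial time reduces to E[Mt ] = 1, which is easy to confirm by simulation (Python sketch; parameter values are mine):

```python
import numpy as np

rng = np.random.default_rng(42)
theta, t = 0.8, 2.0

# sample B_t ~ N(0, t) and average the exponential martingale
B_t = np.sqrt(t) * rng.standard_normal(1_000_000)
M_t = np.exp(theta * B_t - 0.5 * theta**2 * t)
print(M_t.mean())   # close to 1
```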

The Brownian Motion is Gaussian, Markov and continuous


By its definition, the Brownian motion is Gaussian, that is, for all times
t1 , t2 , . . . , t` ∈ T the random vector (Bt1 , Bt2 , . . . , Bt` ) has a multivariate normal
distribution.
A Markov process has the property that

P(Xs ∈ F |Ft ) = P Xs ∈ F |F (Xt )

for all s ≥ t ≥ 0, which means that the conditional distribution depends only
on the latest value of the process, and not on the whole history. Remember
the difference between the σ-algebras Ft , which belong to the filtration of
the process and therefore include the history, and F (Xt ), which is generated
by a single observation at time t. For that reason Markov processes are called
“memory-less”. The Brownian motion is Markov, once again by its definition.
A Feller semigroup is a family of linear mappings indexed by t > 0

Pt : C (R) −→ C (R)

where C (R) is the family of continuous functions that vanish at infinity, such
that
1. P0 is the identity map
2. Pt are contraction mappings, ||Pt || 6 1 for all t > 0
3. Pt has the semigroup property, Pt+s = Pt ◦ Ps for all t, s > 0, and
4. The limit limt↓0 ||Pt f − f|| = 0, for all f ∈ C (R)
A Feller transition density is a density that is associated with a Feller
semigroup. A Markov process with a Feller transition function is called a Feller

process. One can verify that the Brownian motion is indeed a Feller process, for
the Feller semigroup which is defined as

Pt f(x) = ∫R p(x, y; t) f(y) dy

Based on the Feller semigroup, expectations of functions of the Brownian motion
will be given by

E[f(Bt+s )|Ft ] = Ps f(Bt ) = ∫R p(Bt , y; s) f(y) dy

The Brownian motion is also a process that has almost surely continuous
sample paths. This is due to Kolmogorov’s continuity theorem, which states
that if for all t ∈ T we can find constants α, β, γ > 0 such that

E|Xt1 − Xt2 |^α ≤ γ|t1 − t2 |^(1+β) , for all 0 ≤ t1 , t2 ≤ t

then Xt has continuous paths (or at least a continuous version). For the Brownian
motion E|Bt1 − Bt2 |⁴ = 3|t1 − t2 |² and therefore Bt will have continuous sample paths.

The Brownian motion is a diffusion


A Markov process with continuous sample paths is called a diffusion. A diffusion
{Xt }t≥0 is “characterized” by its local drift µ and volatility σ. Loosely speaking,
for small ∆t we write the instantaneous drift and volatility

E[Xt+∆t − Xt |Ft ] = µ(Xt ) · ∆t + o(∆t)
E[(Xt+∆t − Xt − µ(Xt )∆t)² |Ft ] = σ²(Xt ) · ∆t + o(∆t)

If the drift and volatility are constant, the process Xt = µt + σBt for a Brownian
motion {Bt }t≥0 will be a diffusion. More generally the instantaneous drift and
volatility do not have to be constant, but can depend on the location Xt and the
time t. Diffusions are then given as solutions to stochastic differential equations.

The Brownian motion is wild


If we fix the sample point ω ∈ Ω, a Brownian motion as a function of time
t → B(t, ω) is a lot wilder than most “normal” functions. We have shown already,
using Kolmogorov’s continuity theorem, that a sample path of the Brownian
motion is almost surely continuous, but it turns out that it is nowhere
differentiable.
The total variation of a Brownian motion trajectory is unbounded, and the
quadratic variation is non-zero:

Σ |Btk+1 − Btk | −→ ∞,    Σ |Btk+1 − Btk |² −→ t

FIGURE 1.2: Zooming into a Brownian motion sample path.
for partitions {tk } of the time interval [0, t], where sup |tk+1 − tk | → 0. For “normal”
functions the total variation would be the length of the curve; this means that to
draw a Brownian motion trajectory on a finite interval we would need an infinite
amount of ink. Also, the quadratic variation of “normal” functions is zero, since
they are not infinitely volatile in arbitrarily small intervals.
When we consider a Brownian motion path, it is impossible to find an interval
on which it is monotonic, no matter how much we zoom into the trajectory. Therefore
we cannot split a Brownian motion path in two parts with a line that is not vertical.
Figure 1.2 gives a trajectory of a Brownian motion and illustrates how wild the
path is by successively zooming in.
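Both variation limits can be seen numerically: on dyadic partitions of [0, 1] the quadratic variation of a simulated path settles near t = 1, while the total variation keeps growing as the grid refines (Python sketch, illustrative; the exact numbers depend on the simulated path):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2 ** 18                          # finest dyadic grid on [0, 1]
dB = np.sqrt(1.0 / N) * rng.standard_normal(N)
B = np.concatenate([[0.0], np.cumsum(dB)])

tv, qv = [], []
for n in (6, 10, 14, 18):            # coarser to finer partitions
    incr = np.diff(B[:: 2 ** (18 - n)])
    tv.append(np.abs(incr).sum())    # total variation: blows up
    qv.append((incr ** 2).sum())     # quadratic variation: -> t = 1
    print(n, tv[-1], qv[-1])
```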

DEALING WITH DIFFUSIONS


As we mentioned in the previous subsection, diffusions arise as solutions to
stochastic differential equations. In finance we typically use diffusions to model
factors such as stocks prices, interest rates, volatility, and others, that affect the
value of financial contracts. There are three techniques for solving problems that
relate to diffusions:
1. The stochastic differential equation [SDE] approach
2. The partial differential equation [PDE] approach
3. The martingale approach
All approaches are in principle interchangeable, but in practice some are
more suited for particular problems. As a matter of fact, in finance we use all

three to tackle different situations. PDEs offer a “global” view of the problem in
hand, while the other two approaches offer a more probabilistic “local” view.

1.5 STOCHASTIC DIFFERENTIAL EQUATIONS


A stochastic differential equation [SDE] resembles a normal differential equa-
tion, but some parts or some parameters are assumed random. Therefore, the
solution is not a deterministic function but some sort of a generalized, “stochas-
tic” function. The calculus of such functions is called Itō calculus, in honor of
Kiyoshi Itō (1915-). Loosely speaking, one can represent a SDE as

dXt
= µ(t, Xt ) + “noise terms”
dt
The solution of such a differential equation could be represented, once again
loosely, as

Xt = X0 + ∫_0^t µ(s, Xs ) ds + ∫_0^t “noise terms” ds
If we write the “noise terms” in terms of a Brownian motion, say Bt , we have
a process that has given drift and volatility, called an Itō diffusion

Xt = X0 + ∫_0^t µ(s, Xs ) ds + ∫_0^t σ(s, Xs ) dBs

The last integral, called an Itō integral with respect to a Brownian motion, is
not readily defined, and we must clarify what we actually mean by it. Before we
do so, note that we usually write the above expression in the shorthand
“differential” form

dXt = µ(t, Xt )dt + σ(t, Xt )dBt
It is obvious that unlike normal (Riemann or Lebesgue) integrals, Itō integrals
have a probabilistic interpretation, since they depend on a stochastic process. To
give a simple motivating example, a Brownian motion can be represented as an
Itō integral as

Bt = ∫_0^t dBs

VARIATION PROCESSES AND THE ITŌ INTEGRAL


Before we turn to the definition of the Itō integral, we need to give some more
information on the variation processes, some of which we have already encountered
when discussing the properties of the Brownian motion. For any process
Xt (ω) we define the p-th order variation process, which we denote by ⟨X, X⟩_t^(p) ,
as the probability limit
⟨X, X⟩_t^(p) = plim Σ_{tk ≤ t} |Xtk+1 (ω) − Xtk (ω)|^p   as ∆tk → 0

for a dyadic partition tk of [0, t].
Therefore, the quadratic variation of the Brownian motion will be the (prob-
ability) limit ⟨B, B⟩_t = ⟨B, B⟩_t^(2) = plim Σ |∆Btk |². We have already seen that
the quadratic variation ⟨B, B⟩_t = t, since

E[ Σ (∆Btk )² − t ] = 0
E[ Σ (∆Btk )² − t ]² = 2 Σ (∆tk )² → 0

We usually write the above expression in shorthand as (dBt (ω))² = dt.
If Bt were a “normal” function, then the Itō integral could be written in its
Riemann sense, by using the derivative of Bt

∫_0^t σ(s, Xs ) dBs = ∫_0^t σ(s, Xs ) (dBu /du)|_{u=s} ds

Here the function t → Bt (ω) is nowhere differentiable, and therefore we cannot
express the integral in such a simple form, but we can think of it as the limit
of Riemann sums. When we fix the sample point ω ∈ Ω, the random variable
Xt = Xt (ω) becomes a function over time t (albeit a wild and weird one), allowing
these Riemann sums to be defined. This indicates that stochastic integrals will
also be random variables, since in that sense they are mappings

ω −→ ∫_0^t f(s, ω) dBs (ω) ∈ R

For the dyadic partitions tk of the time interval [0, t], we define the Itō integral
as the limit of the random variables

ω −→ Σ_{k≥0} f(tk , ω)[Btk+1 − Btk ](ω)
The limit is taken with respect to the L²-norm ‖f‖² = E ∫ |f(s, ω)|² ds. More precisely
we first define the integral for simple, step-like functions, then extend it to
bounded functions φn , and finally move to more general functions f, such that
1. (t, ω) → f(t, ω) is B ⊗ F -measurable
2. f(t, ω) is Ft -adapted
3. E ∫_0^t f²(s, ω) ds < ∞
The final property ensures that the function is L²-integrable, and allows the
required limits to be well defined using the Itō isometry, which states

E[ ( ∫_0^t f(s, ω) dBs (ω) )² ] = E[ ∫_0^t f²(s, ω) ds ]

In particular, for a sequence of bounded functions φn that converges (in L²) to f,

E ∫_0^t [f(s, ω) − φn (s, ω)]² ds −→ 0   as n −→ ∞

the corresponding stochastic integrals will also converge.
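For a deterministic integrand the isometry is easy to test by simulation: with f(s) = s, the variance of ∫_0^t s dBs should equal ∫_0^t s² ds = t³/3. A Python sketch (illustrative; parameters are mine):

```python
import numpy as np

rng = np.random.default_rng(7)
t, n, paths = 1.0, 200, 20_000
s = np.linspace(0.0, t, n, endpoint=False)      # left endpoints
dB = np.sqrt(t / n) * rng.standard_normal((paths, n))

# left-point Riemann sums approximating the integral on each path
I = (s * dB).sum(axis=1)
print(I.mean(), I.var(), t**3 / 3)   # mean ≈ 0, variance ≈ 1/3
```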

THE STRATONOVICH INTEGRAL


One important observation when computing the Itō integral is that the left endpoint
of each subinterval in the partition was used to evaluate the integrand
f. If the integrand were well behaved, then it would not make any difference if we
took the right endpoint or the midpoint instead, but since the integrand has infinite
variation it matters.
If we used the averaged endpoints instead, then the resulting random variable is called
the Stratonovich stochastic integral, denoted by ∫_0^t σ(s, Xs ) ◦ dBs , which is the
limit of the random variables

ω −→ Σ_{k≥0} ½[f(tk+1 , ω) + f(tk , ω)] · [Btk+1 − Btk ](ω)

It is easy to see that the Stratonovich integral is not an Ftk -adapted random
variable, since we need to know the value of the process at the future time point
tk+1 in order to ascertain the value of the Riemann sums. For that reason
it is not used as often as the Itō representation in financial mathematics,5 but
it has better convergence properties (due to the midpoint approximation of the
integral) and it is used when one needs to simulate stochastic processes. In fact,
when the process is an Itō diffusion (see below) the two stochastic integrals are
related

∫_0^t σ(s, Xs ) ◦ dBs = ∫_0^t σ(s, Xs ) dBs + ½ ∫_0^t (∂σ(s, Xs )/∂x) σ(s, Xs ) ds

ITŌ DIFFUSIONS AND ITŌ PROCESSES


Consider a Brownian motion Bt on a filtered space (Ω, F , {Ft }t≥0 , P), where
{Ft }t≥0 is the filtration it generates, and two Ft -adapted functions µ and σ.
As we noted before, an Itō diffusion is a stochastic process on this space of
the form

Xt = X0 + ∫_0^t µ(s, Xs ) ds + ∫_0^t σ(s, Xs ) dBs
We need a few conditions that ensure regularity and that solutions for the SDE
exist and do not explode in finite time
5
But it can be used for example if t is a spatial rather than a time coordinate, since
then we could actually observe the complete realization in one go.

1. The Itō isometry E ∫_0^t σ²(s, Xs ) ds < ∞, for all times t ≥ 0
2. There exist constants A, B such that for x, y ∈ R, s ∈ [0, t]

|µ(s, x)| + |σ(s, x)| ≤ A(1 + |x|)
|µ(s, x) − µ(s, y)| + |σ(s, x) − σ(s, y)| ≤ B|x − y|

An Itō process is a stochastic process on the same filtered space, of the form

Xt = X0 + ∫_0^t µ(s, ω) ds + ∫_0^t σ(s, ω) dBs

In Itō diffusions the information is generated by the Brownian motion; now we
let the information be more general, as long as the Brownian motion remains
a martingale. We consider a filtration Gt that makes Bt a martingale, and assume
that µ and σ are Gt -adapted. Instead of the integrability and isometry assumptions
we now need

P[ ∫_0^t |µ(s, ω)| ds < ∞ for all t ≥ 0 ] = 1
P[ ∫_0^t σ²(s, ω) ds < ∞ for all t ≥ 0 ] = 1

Itō processes generalize Itō diffusions in two ways.


1. Information: we can have more information than just the one we gather by
observing the SDE, but this information should not make the Brownian
motion predictable.
2. Dependence: drift and volatility can depend on the whole history, rather
than the latest value of the process, Xt .
Unlike Itō diffusions, Itō processes are not always Markov. A diffusion dX =
µ(t, X)dt + σ(t, X)dB will coincide in law with a process dY = µ⋆(t, ω)dt +
σ⋆(t, ω)dB if

E^x [µ⋆(t, ω)|F_t^Y ] = µ(t, Y_t^x ), and σ⋆²(t, ω) = σ²(t, Y_t^x )

which essentially states that the process is Markov.

ITŌ’S FORMULA
Itō’s formula or Itō’s lemma is one of the fundamental tools that we have in
stochastic calculus. It plays the rôle that the chain rule plays in normal calculus.
Just like the chain rule is used to solve ODEs or PDEs, a clever application of
Itō’s formula can significantly simplify a SDE. We consider an Itō process

dXt = µ(t, ω)dt + σ(t, ω)dBt

A function g(t, x) in C^(1,2) (once differentiable in t and twice differentiable in x)
will define a new Itō process through the transformation Yt = g(t, Xt ). Itō’s formula
describes the dynamics of Yt in terms of the drift and volatility of Xt , and the
derivatives of the transformation g. In particular, the SDE for Yt is given by

dYt = (∂g/∂t)(t, Xt ) dt + (∂g/∂x)(t, Xt ) dXt + ½ (∂²g/∂x²)(t, Xt ) (dXt )²

The “trick” is that the square (dXt )² is computed using the rules

dt · dt = dt · dBt = 0, and (dBt )² = dt

One can easily prove Itō’s formula based on a Taylor expansion of the
function g.6 In particular, one can write for ∆t > 0 the quantity ∆Xt = Xt+∆t −
Xt = µ(t, Xt )∆t + σ(t, Xt )(Bt+∆t − Bt ) + o(∆t). Taking powers of the Brownian
increments ∆Bt = Bt+∆t − Bt yields

E∆Bt = 0
E(∆Bt )² = ∆t
E(∆Bt )^n = o(∆t) for all n ≥ 3

This implies that the random variable (∆Bt )² will have expected value equal to
∆t and variance of order o(∆t). A consequence is that in the limit (∆Bt )² → ∆t,
since the variance goes to zero. Now the Taylor expansion for ∆Yt = g(t +
∆t, Xt+∆t ) − g(t, Xt ) gives

∆Yt = (∂g(t, Xt )/∂t) ∆t + (∂g(t, Xt )/∂x) ∆Xt + ½ (∂²g(t, Xt )/∂x²) (∆Xt )² + o(∆t)

Passing to the limit yields Itō’s formula.
Example 9. Itō’s formula can be used to simplify SDEs and cast them in a form
that is easier to solve explicitly. Say for example that we are interested in the
stochastic integral

∫_0^t Bs dBs

where Bt is a standard Brownian motion. We will consider the function g(t, x) =
x²/2, and define the process Yt = g(t, Bt ) = ½Bt². Using Itō’s formula we can
specify the dynamics of this process, namely

dYt = 0 · dt + Bt dBt + ½ · 1 · (dBt )² = Bt dBt + ½ dt

In other words we can write

d( ½Bt² ) = ½ dt + Bt dBt
6
This Taylor’s expansion is valid, since g is a function that is sufficiently smooth.

FIGURE 1.3: A sample path of an Itō integral, showing the Brownian path Bt and the integral ∫_0^t Bs dBs .

By taking integrals of both sides, and recognizing that ∫_0^t d( ½Bs² ) = ½Bt², we
can solve for the Itō integral in question

∫_0^t Bs dBs = ½Bt² − ½t

A trajectory Bt (ω) for an element ω ∈ Ω, and the corresponding solution
∫_0^t Bs (ω) dBs (ω), are given in figure 1.3.
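The identity of example 9 can be verified on a simulated path: as the partition is refined, the left-point Riemann sums of ∫ B dB approach the closed form ½Bt² − ½t (Python sketch, illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2 ** 16
dB = np.sqrt(1.0 / n) * rng.standard_normal(n)
B = np.concatenate([[0.0], np.cumsum(dB)])
target = 0.5 * B[-1] ** 2 - 0.5      # closed form (B_1^2 - 1)/2

errs = []
for level in (4, 8, 12, 16):
    Bc = B[:: 2 ** (16 - level)]     # same path, coarser partition
    riemann = (Bc[:-1] * np.diff(Bc)).sum()
    errs.append(abs(riemann - target))
    print(level, riemann, target)
```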

Example 10. The most widely used model for a stock price, say St , satisfies
the SDE for a geometric Brownian motion with constant expected return µ and
volatility σ, given by

dSt = µSt dt + σSt dBt

This corresponds loosely to the ODE dSt /dt = µSt , which grows exponentially.
function g(t, x) = log x. Using Itō’s formula we can construct the SDE for s t =
log St  
1 2
dst = µ − σ dt + σdBt
2
This SDE has constant coefficients and can be readily integrated to give

st = s0 + ( µ − ½σ² ) t + σBt

We can cast this expression back to the asset price itself

St = S0 exp( ( µ − ½σ² ) t + σBt )

FIGURE 1.4: Asset price trajectories.
Note that under the geometric Brownian assumption the price of the asset is
always positive, an attractive feature in line with the property of limited liability
of stocks. Some stock price trajectories for different ω ∈ Ω are given in figure
1.4.
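The closed-form solution makes simulation trivial: draw Bt ~ N(0, t) and apply the formula. A Monte Carlo check (Python, illustrative; parameter values are mine) of E[St ] = S0 e^{µt} and of positivity:

```python
import numpy as np

rng = np.random.default_rng(11)
S0, mu, sigma, t = 1.0, 0.08, 0.25, 1.0

# sample the terminal price via the exact solution
B_t = np.sqrt(t) * rng.standard_normal(1_000_000)
S_t = S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * B_t)

print(S_t.mean(), S0 * np.exp(mu * t))   # both ≈ 1.083
print(S_t.min() > 0)                     # prices stay positive
```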

1.6 THE PARTIAL DIFFERENTIAL EQUATION APPROACH
In the stochastic differential equation approach we typically consider a function
of an Itō diffusion, and then construct the dynamics of the process under this
transformation. In many applications we are not interested in the actual paths
of the process, but in the expectation of some function at a future date. In the
PDE approach we investigate the transition mechanics of the process, and based
on that the transition mechanics of the expectation in hand.

GENERATORS
Say we are given a Brownian motion Bt on a filtered space. For a SDE that
describes the motion of a stochastic process Xt , say


dXt = µ(t, Xt )dt + σ(t, Xt )dBt

we can define an elliptic operator which is applied to twice-differentiable
functions f ∈ C^(2)

L f(x) = ( µ(t, x) d/dx + ½ σ²(t, x) d²/dx² ) f(x)
This elliptic operator is also the infinitesimal generator of the process, which is
given formally in the following definition.

Definition 14. Given an Itō diffusion Xt , the (infinitesimal) generator of the pro-
cess, denoted by A , is defined for all functions f ∈ C^(2) as the limit

A f(x) = lim_{∆t↓0} [ E^x f(X∆t ) − f(x) ] / ∆t = L f(x)

It is Kolmogorov’s backward equation that gives us the expectation of the
function f at a future date, as the solution of a partial differential equation.
In particular, if we denote the expectation by g(t, x) = E[f(XT )|Xt = x], for
f ∈ C^(2) we have

−∂g(t, x)/∂t = A_x g(t, x) = µ(t, x) ∂g(t, x)/∂x + ½ σ²(t, x) ∂²g(t, x)/∂x²
The subscript x of the generator in the above expression just indicates that
the derivatives are partial and taken with respect to x. The final condition for
Kolmogorov’s backward PDE will be g(T , x) = f(x).
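For a standard Brownian motion (µ = 0, σ = 1) the generator reduces to A f = ½f''; with f(x) = x² we have E^x f(B∆t ) = x² + ∆t, so the difference quotient in definition 14 equals ½f''(x) = 1 for every ∆t. A quick numerical check (Python, illustrative; the quadrature grid is mine):

```python
import numpy as np

def generator_quotient(f, x, dt, m=400_001):
    """(E^x f(B_dt) - f(x)) / dt for standard Brownian motion,
    with the expectation computed on a standard normal grid."""
    z = np.linspace(-10.0, 10.0, m)
    w = np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)   # N(0,1) density
    Ef = np.sum(f(x + np.sqrt(dt) * z) * w) * (z[1] - z[0])
    return (Ef - f(x)) / dt

for dt in (0.1, 0.01, 0.001):
    print(dt, generator_quotient(lambda y: y**2, 0.5, dt))
# each quotient is close to 1 = (1/2) d^2/dx^2 of x^2
```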
There is a very intuitive way of viewing Kolmogorov’s equation. If we consider
the expectation Et = g(t, Xt ) = E[f(XT )|Ft ] as a stochastic process, then we can
observe (by applying Itō’s formula) that Kolmogorov’s backward PDE sets the
drift of Et equal to zero, rendering Et a martingale. This means that expectations
we form at time t are unbiased, in the sense that we do not anticipate any special
information that will change them.
If we use the indicator function, then the expectation becomes the conditional
probability that the diffusion will take values in a set F at time T ≥ t. In
particular, if we denote this conditional probability by f(t, x; T , F ) = P(XT ∈
F |Xt = x) = E[ℵ(XT ∈ F )|Xt = x], then Kolmogorov’s backward PDE takes the
form

−∂f(t, x; T , F )/∂t = µ(t, x) ∂f(t, x; T , F )/∂x + ½ σ²(t, x) ∂²f(t, x; T , F )/∂x²
for t 6 T , with a terminal condition f(T , x; T , F ) = ℵ(x ∈ F ) (which is equal to
one if x ∈ F and zero otherwise). Therefore this PDE describes the evolution
of the probability that we will end up in a certain set of states at some future
time T . It is called a backward PDE because we start with a terminal condition
at the future time T and integrate backwards to the present time t.

Kolmogorov’s forward equation, also known as the Fokker-Planck equation, con-
siders the transition density p(t, x; T , y) = P(XT ∈ dy|Xt = x). It postulates
that

∂p(t, x; T , y)/∂T = −∂[ µ(T , y) p(t, x; T , y) ]/∂y + ½ ∂²[ σ²(T , y) p(t, x; T , y) ]/∂y²

for T ≥ t, with an initial condition p(t, x; t, y) = δ(x − y) (the Dirac delta).
The PDE gives the evolution of the distribution of XT given the current state
Xt = x. It is called a forward PDE because we start from the state at the current
time t and integrate forwards towards the future time T .

STOPPING TIMES
Definition 15. A stopping time is a random variable

τ : Ω → [0, ∞], such that {ω : τ(ω) 6 t} ∈ Ft for all t > 0

That is to say, a stopping time is defined by an event that is Ft -measurable.
This means that at any time t ≥ 0 we can ascertain with certainty whether or
not the event {τ ≤ t} has happened. Examples of stopping times are first hitting
times, first exit times from a set, and so on. Given a stopping time we can define
the stopped process, which is simply X̃t = X_{min(t,τ)} .
Say that τ is a stopping time with E^x τ < ∞, meaning that the process will
be stopped at some point in the future almost surely. Dynkin’s formula gives
expectations at a stopping time τ, as

E^x f(Xτ ) = f(x) + E^x ∫_0^τ A f(Xs ) ds

Here, f ∈ C (2) , and also has compact support. Note that in the above integral
the upper bound is a random variable. Dynkin’s formula can be used to assess
when a process is expected to be stopped, that is to say the expectation E x [τ(ω)].

Example 11. For example, say that we are holding a stock with current price S,
and dynamics that follow the geometric Brownian motion

dSt = µSt dt + σSt dBt

We want to know how long we should expect to wait before our asset is worth
at least S̄ > S. Mathematically, we are interested in the first exit time from the
set [0, S̄)

τ = inf{t ≥ 0 : St ≥ S̄}
We cannot directly apply Dynkin’s formula, since we need E s τ < ∞ and we
are not sure about that. For example the asset might have a negative drift and
exponentially drop towards zero. We can define instead the exit times from the
set [a, S̄]


τa = inf{t ≥ 0 : St ≥ S̄ or St ≤ a}


The expected exit time from a compact set is indeed bounded, and therefore
Dynkin’s formula can now be applied.
Here it will be useful to recall the solution of the SDE for the geometric
Brownian motion

St = S0 exp( ( µ − ½σ² ) t + σBt )
Suppose that µ < σ²/2, which means that the expected returns of the asset
are not large enough for the price to be expected to grow. In this case, as t → ∞,
the asset price St → 0 along almost every trajectory. Then, every trajectory will
exit the set, at the latest through the lower bound a. Of course, the process might
hit the upper bound S̄ first. For that reason, say that the probability P(Sτa = a) = pa ,
and of course P(Sτa = S̄) = 1 − pa .
Consider the function f(x) = x^(1−2µ/σ²) . This might appear to be an odd choice,
but we have selected this function because when we apply the generator A

A f(x) = µ (1 − 2µ/σ²) x^(1−2µ/σ²) − ½ σ² (2µ/σ²)(1 − 2µ/σ²) x^(1−2µ/σ²) = 0

Therefore Dynkin’s formula yields (for the exit times τa )


2 2 2
pa · S̄ 1−2µ/σ + (1 − pa ) · a1−2µ/σ = S 1−2µ/σ

Passing to the limit a → 0 we can retrieve the probability of never reaching our
 1−2µ/σ 2
target of S̄, namely pa → p = SS̄ . This probability become higher for
lower expected returns or higher volatility.
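The limiting value (S/S̄)^(1−2µ/σ²) is the probability that the target S̄ is ever reached, so its complement is the probability of never reaching it. A minimal Python sketch (our own illustration; the parameter values are arbitrary) evaluates these probabilities and confirms the comparative statics:

```python
def prob_hit(S, Sbar, mu, sigma):
    """P(a GBM started at S ever reaches Sbar > S), valid when mu < sigma**2 / 2."""
    assert mu < 0.5 * sigma**2, "the a -> 0 limit was taken under mu < sigma^2/2"
    return (S / Sbar) ** (1.0 - 2.0 * mu / sigma**2)

p_never = 1.0 - prob_hit(100.0, 150.0, mu=0.01, sigma=0.3)
p_never_low_mu = 1.0 - prob_hit(100.0, 150.0, mu=-0.02, sigma=0.3)   # lower drift
p_never_high_vol = 1.0 - prob_hit(100.0, 150.0, mu=0.01, sigma=0.5)  # higher volatility
print(p_never, p_never_low_mu, p_never_high_vol)
```

Both alternatives raise the probability of never reaching the target, in line with the comparative statics above.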
If µ > σ²/2 then the expected returns are high enough for all sample paths
to eventually breach the target S̄, since St → ∞ as t → ∞, almost surely. The
process will exit with probability 1, but Eτ might still be ∞.
In this case we consider the function f(x) = log x. Our objective now is for
the generator to be constant, in order to simplify the integral ∫₀^τa A f(Xs )ds. In
particular

    A f(x) = µ − ½σ²
Dynkin’s formula will yield in this case (once again for the exit times τa )

    pa · log S̄ + (1 − pa ) · log a = log S + (µ − ½σ²) ES ∫₀^τa dt

Passing to the limit this time will give the expected stopping time

    ES τa → ES τ = log(S̄/S) / (µ − σ²/2),  as a → 0
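This expectation is easy to check by simulation. The Python sketch below (ours, with arbitrary parameters satisfying µ > σ²/2) simulates the log-price as a Brownian motion with drift µ − σ²/2 and compares the average first crossing time of the barrier log(S̄/S) with the Dynkin answer:

```python
import numpy as np

# Illustrative parameters (ours), with mu > sigma^2/2 so that every path reaches Sbar
rng = np.random.default_rng(7)
mu, sigma = 0.12, 0.2
S0, Sbar = 100.0, 150.0
nu = mu - 0.5 * sigma**2          # drift of the log-price
b = np.log(Sbar / S0)             # barrier in log terms

dt, n_steps, n_paths = 0.02, 3000, 1000
# The log-price is a Brownian motion with drift nu and volatility sigma
X = np.cumsum(rng.normal(nu * dt, sigma * np.sqrt(dt), (n_paths, n_steps)), axis=1)

hit = X >= b
tau = (hit.argmax(axis=1) + 1) * dt      # first grid time at or above the barrier
tau = tau[hit.any(axis=1)]               # drop the (very rare) paths that never cross

mc_mean = tau.mean()
theory = b / nu                          # Dynkin: log(Sbar/S0)/(mu - sigma^2/2)
print(mc_mean, theory)
```

The small upward bias in the simulated mean comes from monitoring the barrier only on the time grid.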

1.7 THE FEYNMAN-KAC FORMULA
The Feynman-Kac formula generalizes Kolmogorov’s backward equation, and
provides the connection that we need between the SDE and PDE approaches.
It is named after Richard Feynman (1918-1988) and Marek Kac (1914-1984), but
was published by Kac in 1949. It gives expectations not only of a functional of
the terminal value of the process, but also some functionals that are computed
at intermediate points.
In particular, we start with an Itō diffusion which is associated with a
generator A , and we also consider two functions f ∈ C(2) and g ∈ C , continuous
and lower bounded. We are interested in computing an expectation of the form

    u(t, x) = Ex [ exp{ −∫₀^t g(Xs )ds } · f(Xt ) ]

The Feynman-Kac formula states that this expectation satisfies the partial
differential equation

    ∂/∂t u(t, x) = A u(t, x) − g(x)u(t, x)

with boundary condition u(0, x) = f(x).
The Feynman-Kac formula has been very successful in financial mathematics,
as it can represent stochastic discount factors through the exponential
exp{ −∫₀^t g(Xs )ds }.

Example 12. Suppose that the interest rate rt follows an Itō diffusion given by

drt = θ(ρ − rt )dt + σ rt dBt

Also suppose that we have an investment that depends on the level of the interest
rate, for example a house, with value given by H(r). This implies that the property
value will also follow an Itō diffusion, with dynamics given by Itō’s formula. At a
future time T , the house price will be H(rT ), which is of course unknown today.
We are interested in buying the property at time T , which means that we are
interested in the present value of H(rT ), namely

    u(t, x) = Ex [ exp{ −∫₀^T rs ds } · H(rT ) ]

Say that we have a project with uncertain payoffs that depend on the evolution
of a variable Xt , which has current value X0 = x. The dynamics are
dX = µdt + σdB, and the project will pay f(XT ) = aXT² + b. We are interested
in establishing the present value

    Ex [ exp{−RT } · (aXT² + b) ]

This will be equal to u(T , x), where u satisfies the PDE


    ∂/∂t u(t, x) = µ ∂/∂x u(t, x) + ½σ² ∂²/∂x² u(t, x) − Ru(t, x)

with boundary condition u(0, x) = ax² + b.
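To make the Feynman-Kac representation concrete, the following Python sketch (our own; all numbers are made up) prices this quadratic payoff by Monte Carlo and against the closed form that is available here, since XT is normal with mean x + µT and variance σ²T, so that E[XT²] = (x + µT)² + σ²T:

```python
import numpy as np

rng = np.random.default_rng(1)
x, mu, sigma = 2.0, 0.05, 0.4     # current value and dynamics of X_t (made-up numbers)
a, b, R, T = 1.5, 3.0, 0.03, 2.0  # payoff a*X_T^2 + b, discount rate R

# Monte Carlo: X_T is exactly normal with mean x + mu*T and variance sigma^2*T
XT = rng.normal(x + mu * T, sigma * np.sqrt(T), size=500_000)
mc_value = np.exp(-R * T) * np.mean(a * XT**2 + b)

# Closed form, using E[X_T^2] = (x + mu*T)^2 + sigma^2*T for a normal X_T
exact = np.exp(-R * T) * (a * ((x + mu * T)**2 + sigma**2 * T) + b)
print(mc_value, exact)
```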

1.8 GIRSANOV’S TRANSFORMATION


As we saw in section 1.2 the same measurable space (Ω, F ) can support dif-
ferent probability measures. We also saw how the Radon-Nikodym derivative
can be used to compute expectations under different equivalent measures. Gir-
sanov’s theorem gives us the tools to specify this Radon-Nikodym derivative
for Itō processes. In particular, consider an Itō process on the filtered space
(Ω, {Ft }0≤t≤T , F , P), that solves the SDE

    dXt = µ(t, ω)dt + σ(t, ω)dBt

According to Girsanov’s theorem, for each equivalent measure Q ∼ P there
exist Ft -adapted processes λ such that the process

    BtQ (ω) = Bt (ω) + ∫₀^t λ(s, ω)ds

is a Brownian motion in (Ω, {FtQ }0≤t≤T , F , Q), where FtQ = σ(BsQ : 0 ≤ s ≤ t)
is the σ-algebra generated by BtQ .
If we define the Ft -adapted function α as
α(t, ω) = µ(t, ω) − λ(t, ω) · σ(t, ω)
then the process Xt can be written as a stochastic differential equation under Q
as
dXt = α(t, ω)dt + σ(t, ω)dBtQ
This means that if we are given an equivalent measure we can explicitly solve
for the function λ(t, ω) and write down the SDE that Xt will solve under the
new measure.
Girsanov’s theorem also allows us the inverse construction: given an adapted
function λ(t, ω) we can explicitly construct an equivalent probability measure
under which Bt (ω) + ∫₀^t λ(s, ω)ds is a Brownian motion. We define the exponential
martingale

    Mt = exp{ −∫₀^t λ(s, ω)dBs − ½∫₀^t λ²(s, ω)ds }
Based on this exponential martingale we define the following measure on (Ω, Ft )

    Q(F ) = ∫F Mt (ω)dP(ω) = E [ Mt · 1(ω ∈ F ) ] ,  for all F ∈ Ft

which we represent as dQ(ω) = Mt (ω) · dP(ω), or in terms of the Radon-Nikodym
derivative dQ/dP |Ft = Mt . It follows that

1. Q is a probability measure on Ft .
2. The process BtQ = ∫₀^t λ(s, ω)ds + Bt is a Brownian motion on the filtered
space (Ω, F, F , Q).
3. The Itō process can be written in SDE form as

    dXt = α(t, ω)dt + σ(t, ω)dBtQ

Essentially under the new measure the Itō process will have a different drift
α, but the same volatility as the original one: this is the Girsanov transformation.
The process Bt will not be a Brownian motion under the new measure, since we
select elements ω ∈ Ω using different probability weights. On the other hand,
it turns out that the process ∫₀^t λ(s, ω)ds + Bt will be a Brownian motion.
In practice we are not really interested in the probability distribution or
the dynamics on the set Ω, but rather in the distribution of Ft -measurable
random variables Yt (ω). The Radon-Nikodym derivative will allow us to express
expectations under different equivalent probability measures. In particular,

    EQ [Yt ] = EP [Mt · Yt ]

This is the relation that is routinely used in financial economics, as we very often
want to change probability measures so that they adjust with respect to the risk
aversion profile of the agents, or with respect to different numéraire securities.
Example 13. Say that the price of an asset follows a geometric Brownian motion
dSt = µSt dt + σSt dBt
Here Bt is a Brownian motion under the filtered space (Ω, F, F , P). We can con-
struct a new probability measure Q, under which the asset price is a martingale.
We can write the process above as
    dSt = St σ ( (µ/σ) dt + dBt )

Therefore, if we set λ = µ/σ, then we are looking for the probability measure that
makes the process

    BtQ = λt + Bt

a Brownian motion under Q. Then the asset price process under Q will be given by the SDE
dSt = σSt dBtQ
Girsanov’s theorem tells us that such a probability measure on Ω exists. If we
want to take expectations under this equivalent measure we need to construct
the exponential martingale
 
    Mt = exp{ −½λ²t − λBt }
2
Then, for any Ft -measurable random variable Yt we can write EQ [Yt ] =
EP [Mt Yt ].


Example 14. Let’s say that we want to verify the above claim that EQ [Yt ] =
EP [Mt Yt ], stated by Girsanov’s theorem. For example, let’s take the random
variable Yt = log St . Under Q the logarithm will be given by Itō’s formula as

    Yt = log S0 − ½σ²t + σBtQ
and since BtQ is a Q-martingale, the expectation EQ [Yt ] = log S0 − ½σ²t. Under
P we have to consider the process

Zt = M t Yt

Itō’s formula (applied on the function f(x, y) = x · y) will give us the dynamics
for Zt , namely

    dZt = Mt dYt + Yt dMt + dYt dMt

which, using dMt = −λMt dBt and dYt = (µ − ½σ²)dt + σdBt , actually produces

    dZt = exp{−½λ²t − λBt } [ (µ − ½σ²)dt + σdBt ]
        + [ log S0 + (µ − ½σ²)t + σBt ] exp{−½λ²t − λBt } [−λdBt ]
        + [ (µ − ½σ²)dt + σdBt ] exp{−½λ²t − λBt } [−λdBt ]

The solution to the above SDE is written as

    Zt = Z0 + ∫₀^t exp{−½λ²s − λBs } ( µ − ½σ² − λσ ) ds
       + ∫₀^t exp{−½λ²s − λBs } [ σ − λ log S0 − λ(µ − ½σ²)s − λσBs ] dBs
Taking expectations, and using that Zt = Mt Yt , Z0 = M0 Y0 = log S0 , λ = µ/σ,
and the fact that E[Ms ] = 1 while the dBs integral has zero expectation, will yield

    EP [Mt Yt ] = log S0 + ∫₀^t ( µ − ½σ² − λσ ) ds = log S0 − ½σ²t

And the two expectations are indeed the same. Observe though how much easier
it was to compute the expectation under Q. Girsanov’s theorem can be a valuable
tool when one wants to simplify complex expectations, just by casting them under
a different measure.
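Example 14 can also be verified by brute force. The Python sketch below (ours; arbitrary parameters) draws Bt under P, builds Mt and Yt , and checks that the simulated EP [Mt Yt ] matches the analytic value log S0 − ½σ²t:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, S0, t = 0.10, 0.3, 100.0, 1.0
lam = mu / sigma                                   # lambda = mu/sigma as in Example 13

B = rng.normal(0.0, np.sqrt(t), size=400_000)      # B_t under P
Y = np.log(S0) + (mu - 0.5 * sigma**2) * t + sigma * B   # Y_t = log S_t under P
M = np.exp(-0.5 * lam**2 * t - lam * B)            # Radon-Nikodym derivative M_t

lhs = (M * Y).mean()                               # Monte Carlo estimate of E^P[M_t Y_t]
rhs = np.log(S0) - 0.5 * sigma**2 * t              # E^Q[Y_t], computed analytically
print(lhs, rhs)
```

Note also that the sample mean of M is close to one, as it must be for a Radon-Nikodym derivative.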

2
The Black-Scholes world

In this chapter we will use some of the previous results to establish the Black-
Scholes (BS) paradigm. We will assume a frictionless market where asset prices
follow geometric Brownian motions, and we will investigate the pricing of deriva-
tive contracts. The seminal papers of Black and Scholes (1973) and Merton
(1973) (also collected in the excellent volume in Merton, 1992) defined the area
and sparked thousands of research articles on the fair pricing and hedging of a
variety of contracts.
The original derivation of the BS formula is based on a replicating portfo-
lio that ensures that no arbitrage opportunities are allowed. Say that we are
interested in pricing a claim that has payoffs that depend on the value of the
underlying asset at some fixed future date T . The idea is to construct a portfo-
lio, using the underlying asset and the risk free bond, that replicates the price
path of that claim, and therefore its payoffs. If we achieve that, then the claim in
question is redundant, in the sense that we can replicate it exactly. In addition,
the value of the claim must equal the value of the portfolio, otherwise arbitrage
opportunities would arise.

2.1 THE ORIGINAL DERIVATION


In this section we lay down the assumptions for the BS formula. We also give
some important definitions on trading strategies, market completeness and ar-
bitrage. We conclude by illustrating that the market under these assumptions is
complete, by constructing the corresponding replicating portfolio.

THE BLACK-SCHOLES ASSUMPTIONS


We fix a filtered space (Ω, F, F , P), and a Brownian motion on that space, say
Bt . We will maintain the following assumptions:

1. The asset price follows a geometric Brownian motion, that is to say

dSt = µSt dt + σSt dBt (2.1)

The parameter µ gives the expected asset return, while σ is the return volatil-
ity.
2. There is a risk free asset which grows at a constant rate r, which applies
for both borrowing and lending. There is no bound to the size of funds that
can be invested or borrowed risk-free.
3. Trading is continuous in time, for the risk free asset, the underlying asset,
and all derivatives. This means that any portfolio can be dynamically
rebalanced continuously.
4. All assets are infinitely divisible and there is an inelastic supply at the spot
price, that is to say the assets are infinitely liquid. Therefore, the actions of
any investor are not sufficient to cause price moves.
5. There are no taxes or any transaction costs. There are no market makers
or bid-ask spreads. The spot price is the single price where an unlimited
number of shares can be bought. Short selling is also allowed.
A derivative security is a contract that offers some payoffs at a future (matu-
rity) time T , that depend on the value of the underlying asset at the time, say
Π(ST ). We are interested in establishing the fair value Pt of such a security at
all times before maturity, that is the process {Pt : 0 ≤ t ≤ T }.

THE REPLICATING PORTFOLIO


Of course the derivative price at time t will depend only on information available
at the time, that is Pt must be Ft -adapted. Also, the asset price is Markovian,
which indicates that Pt should not depend on the history of the asset price, but
only on the latest value St . We can therefore write the price of the derivative as
a function Pt = f(t, St ). The function f is the unknown pricing formula.
If we actually had the functional form of f(t, S), an application of Itō’s formula
would provide us with the derivative price dynamics

    dPt = [ ∂/∂t f(t, St ) + µSt ∂/∂S f(t, St ) + ½σ²St² ∂²/∂S² f(t, St ) ] dt
        + σSt ∂/∂S f(t, St ) dBt
Although we don’t actually know f(t, S) explicitly yet, we will later use the dy-
namics above to construct a partial differential equation that the pricing function
has to satisfy. To produce the PDE we will need to introduce some terminology,
including trading strategies and arbitrage opportunities.
We will construct portfolios that we rebalance in time, and we will keep
track of them using a trading strategy Ht . Since we must make all rebalancing
decisions based on the available information, the trading strategy will be an
Ft -adapted process as well. Our investment instruments are the underlying and
the risk free asset, therefore the trading strategy H = {(HtS , HtF ) : t ≥ 0} where
HtS keeps track of the number of shares held, and HtF is the amount invested in
the risk free asset (that is the bank balance) at time t. The value of the portfolio
that is generated by the trading strategy is denoted with Vt = Vt (H).
A self-financing trading strategy is one where no funds can enter or exit the
portfolio. All changes in the value are due to changes in the price of the assets
that compose it. In this case we don’t really need to keep track of the holdings
of both assets, since they are related via

HtF = Vt − HtS St

Therefore we will only keep the process of the shares held, H = {Ht : t ≥ 0},
as the trading strategy. Also, in this case the dynamics of the portfolio value are
given by

dVt = Ht dSt + (Vt − Ht St ) rdt = (Ht St µ + Vt r − Ht St r) dt + σHt St dBt

Say for a minute that we knew the pricing formula for the derivative price,
Pt = f(t, St ). We can then define a trading strategy, where the number of shares
held at each time t is given by

    Ht = ∂/∂S f(t, St )
We have selected this particular trading strategy because it sets the volatility
of the portfolio value, Vt , equal to the volatility of the derivative value, Pt . We
call this a hedging or replicating strategy and the portfolio the hedging or
replicating portfolio.

ARBITRAGE OPPORTUNITIES
We claim that if the portfolio has the same volatility dynamics it should also
offer the same return. Otherwise arbitrage opportunities will emerge.
An arbitrage opportunity is a trading strategy J that has the following four
properties (for a stopping time T > 0)
1. The strategy J is self-financing, that is there are no external cash inflows
or outflows. We can move funds from one asset to another, but we cannot
introduce new funds.
2. V0 (J) = 0, that is we can engage in the portfolio at time 0 with no initial
investment. This means that we can borrow all funds needed to set up the
initial strategy at the risk free rate, without investing any funds of our own.
3. VT (J) ≥ 0, that is it is impossible to be losing money at time T . The worst
outcome is that we end up with zero funds, but we did not invest any funds
in the first place.
4. P(VT (J) > 0) > 0, that is there is a positive probability that we will actually
be making a profit at time T .


An arbitrage opportunity is a risk free money making device, since with


no initial investment we have a probability to make a profit, without running
any risk of realizing losses. Finance theory assumes that exploitable arbitrage
opportunities do not exist when pricing claims.
Now say that at time 0 we engage in the following self-financing strategy
Θ, where
1. we are short (we have sold) one derivative contract,
2. we hold Ht shares, and
3. we keep an amount Φt in the risk free bank account.
Thus, our holdings at any time t > 0 will have value

Vt (Θ) = −Pt + Ht St + Φt

We want to keep the initial gross investment equal to zero, V0 = 0, and
therefore our initial bank balance will be Φ0 = P0 − H0 S0 . We also want to
maintain the strategy self-financing, and therefore all changes in the value of
our portfolio must come through changes in the assets themselves,

dVt = −dPt + Ht dSt + dΦt = −dPt + Ht dSt + (Vt + Pt − Ht St ) rdt

Using Itō’s formula for Pt = f(t, St ) and the stochastic differential equation for
St we can write after some algebra (which incidentally cancels the drifts µ)
 
    dVt = − [ ft (t, St ) + rSt fS (t, St ) + ½σ²St² fSS (t, St ) − Vt r − Pt r ] dt    (2.2)

The trading strategy Θ is self-financing, and its initial value is V0 (Θ) = 0.
Therefore it has two of the four requirements that we set for an arbitrage
opportunity. In order to avoid such opportunities, we want to verify that there
exists no stopping time τ, such that Vτ (Θ) ≥ 0 and P(Vτ (Θ) > 0) > 0.
The value of the trading strategy will evolve in a deterministic way, as illus-
trated in the above relationship where no stochastic term is present. Therefore,
if the term in the brackets is equal to zero for all t, then dVt = 0 which implies
Vt = V0 = 0 for all t. Then, apparently, no arbitrage opportunities are present
since P(Vτ (Θ) > 0) = 0.
We can also show that this condition is also necessary. Say that τ > 0 is the
first time that the term in brackets of (2.2) becomes non-zero, and say that it is
negative, implying a positive dVτ . Since f is continuous in both arguments, there
will be an interval (τ, τ + Δt) on which the portfolio value remains positive,
and therefore the value of the portfolio Vτ+(Δt/2) (Θ) > 0, which indicates an
arbitrage opportunity. If at τ the value of the portfolio becomes negative, then
we can implement the inverse trading strategy for which Vτ+(Δt/2) (−Θ) > 0, and
again reach an arbitrage opportunity.

THE BLACK-SCHOLES PARTIAL DIFFERENTIAL EQUATION
In the previous subsection we concluded that the value of the composite portfolio
Θ must be Vt (Θ) = 0 for all t > 0, otherwise arbitrage opportunities will
be present. Then, equation (2.2) will give the celebrated Black-Scholes partial
differential equation (BS-PDE), namely
    ∂/∂t f(t, S) + rS ∂/∂S f(t, S) + ½σ²S² ∂²/∂S² f(t, S) = r f(t, S)    (2.3)
must be satisfied by the derivative pricing function f(t, S). This is one of the
fundamental relationships in financial economics, as it has to be obeyed by any
derivative contract. It shows that the price of the derivative can be replicated
by a dynamically balanced portfolio that consists of the underlying asset and
a risk free bank account, and is actually independent of the expected return on
the underlying asset µ.
As we pointed out, in order to derive the BS-PDE we did not make any
assumptions on the nature of the contract, meaning that the PDE will be satisfied
by all derivatives. The nature of the particular contract will specify the terminal
condition of the PDE. Indeed, we know that on the maturity date
    PT = Π(ST ) ⇒ f(T , S) = Π(S)
In their paper BS present the case of a European call option, a contract that
gives the holder the right (but not the obligation) to purchase a share at a fixed
price K on the maturity date. Then, the terminal condition becomes f(T , S) =
max(S − K , 0) = (S − K )+ . In this case BS show how the PDE can be solved
analytically and produce the Black-Scholes formula, which is the particular
pricing function f(t, S) for this contract

    f(t, S) = S · N(d+ ) − K · exp{−r(T − t)} · N(d− )    (2.4)

where N(·) is the cumulative (standardized) normal distribution function, and d±
are given by

    d± = [ log(S/K ) + (r ± ½σ²)(T − t) ] / ( σ√(T − t) )
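As an illustration, formula (2.4) takes only a few lines of code; the sketch below is our own Python translation (the notes otherwise use Matlab). For S = K = 100, r = 5%, σ = 20% and one year to maturity the call is worth about 10.45:

```python
import math

def bs_call(S, K, r, sigma, tau):
    """Black-Scholes price (2.4) of a European call, tau = time to maturity."""
    d_plus = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d_minus = d_plus - sigma * math.sqrt(tau)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal cdf
    return S * N(d_plus) - K * math.exp(-r * tau) * N(d_minus)

price = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(price)    # about 10.45
```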
Typically we prefer to work with time to maturity, and we use the change
of variable t ↦ T − t (abusing the notation slightly). It is also convenient to
define the log-price, setting the variable s = log S. Some elementary calculus
produces the BS-PDE under these variable changes, a differential equation with
constant coefficients

    −∂/∂t f(t, s) + (r − ½σ²) ∂/∂s f(t, s) + ½σ² ∂²/∂s² f(t, s) = r f(t, s)    (2.5)

Apart from rendering this expression easier for numerical methods to handle
(since the coefficients are constant), we now have a PDE with an initial condition
rather than a terminal one, namely f(0, s) = (exp(s) − K )+ . In this form, the
BS-PDE is a standard convection-diffusion partial differential equation, a form
that has been studied extensively in classical and quantum physics.
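Because (2.5) has constant coefficients, even a naive explicit finite-difference scheme handles it well. The following Python sketch (our construction; grid and step sizes are ad hoc, with dt kept inside the explicit stability bound ds²/σ²) marches the initial condition forward in time-to-maturity and recovers a price close to formula (2.4):

```python
import math
import numpy as np

S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0

# Log-price grid centred on log(S0); t is time-to-maturity, so we march forward
s = np.linspace(math.log(S0) - 3.0, math.log(S0) + 3.0, 301)
ds = s[1] - s[0]
dt = 0.0005                      # well inside the explicit stability bound ds^2/sigma^2
n_steps = round(T / dt)

f = np.maximum(np.exp(s) - K, 0.0)          # initial condition f(0, s) = (e^s - K)^+
for step in range(1, n_steps + 1):
    t = step * dt
    conv = (r - 0.5 * sigma**2) * (f[2:] - f[:-2]) / (2 * ds)
    diff = 0.5 * sigma**2 * (f[2:] - 2 * f[1:-1] + f[:-2]) / ds**2
    f[1:-1] += dt * (conv + diff - r * f[1:-1])
    f[0] = 0.0                                       # deep out-of-the-money boundary
    f[-1] = math.exp(s[-1]) - K * math.exp(-r * t)   # deep in-the-money boundary

price = float(np.interp(math.log(S0), s, f))
print(price)    # should land close to the Black-Scholes value of about 10.45
```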


2.2 THE FUNDAMENTAL THEOREM OF ASSET PRICING
Since Vt (Θ) = 0 for all times t > 0, the relationship

P t = H t St + Φ t

will also hold at all times. This means that using the trading strategy H t we
create a portfolio that will track (or mimic) the process Pt . Therefore we do not
really need to introduce derivatives in the BS world, as their trajectories and
payoffs can be replicated by using a carefully selected trading strategy. For that
reason we say that in the BS world derivatives are redundant securities. This of
course only holds under the strict BS assumption, and does not generally hold
in any market. It certainly does not hold in the real world where markets are
subject to a number of frictions and imperfections.
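The replication argument can be illustrated numerically. The Python sketch below (ours, with ad hoc parameters) delta-hedges a short call along simulated paths, rebalancing at discrete dates; the terminal hedging error is small relative to the option premium and shrinks as rebalancing becomes more frequent, even though the asset drift µ differs from r:

```python
import math
import numpy as np

ncdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))

def bs_call(S, K, r, sigma, tau):
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return S * ncdf(d1) - K * math.exp(-r * tau) * ncdf(d1 - sigma * math.sqrt(tau))

def mean_hedge_error(n_steps, n_paths=400, mu=0.10, r=0.05, sigma=0.2,
                     S0=100.0, K=100.0, T=1.0, seed=3):
    """Average |V_T - payoff| of a discretely rebalanced replicating portfolio."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, S0)
    V = np.full(n_paths, float(bs_call(S0, K, r, sigma, T)))  # start with the premium
    for i in range(n_steps):
        tau = T - i * dt
        d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
        H = ncdf(d1)                      # shares held: the Black-Scholes delta
        cash = V - H * S                  # the remainder sits in the bank account
        S = S * np.exp((mu - 0.5 * sigma**2) * dt
                       + sigma * math.sqrt(dt) * rng.normal(size=n_paths))
        V = H * S + cash * math.exp(r * dt)
    return float(np.mean(np.abs(V - np.maximum(S - K, 0.0))))

e_weekly, e_daily = mean_hedge_error(52), mean_hedge_error(252)
print(e_weekly, e_daily)   # the error shrinks as rebalancing becomes more frequent
```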
As we search for markets and models where securities can be hedged, we
need to introduce the notion of market completeness. We say that a market
is complete if all claims can be replicated. A market that is complete will
of course be arbitrage-free, but the inverse is not true. There are many markets
that are arbitrage free but incomplete. One can speculate that the real world
markets fall within this category: claims cannot be perfectly replicated due to
market imperfections, and these imperfections also make arbitrage opportunities
scarce and short lived.
We set a probability space (Ω, F , P), under which the price process is defined.
In financial mathematics, an equivalent martingale measure (EMM) is a measure
Q equivalent to the objective one P, under which all discounted asset prices form
martingales. Therefore for the discounting factor Bt , any price process Vt will
satisfy

    V0 = EQ [Bt Vt ] for all t ≥ 0

The fundamental theorem of asset pricing states the following two proposi-
tions:

There exists an EMM ⇔ There are no arbitrage opportunities


There exists a unique EMM ⇔ The market is complete

THE FUNDAMENTAL THEOREM OF ASSET PRICING AND GIRSANOV’S THEOREM
Girsanov’s theorem is a very useful companion to the fundamental theorem of
asset pricing, as it provides us with the link between different equivalent prob-
ability measures. A typical approach would be to assume a process for an asset
under the true probability measure. This will specify the true dynamics of an
asset or a collection of assets, that is to say the process that we would produce

based on time series of the prices. Then, we use Girsanov’s theorem and try to
specify the Radon-Nikodym derivative that produces discounted asset prices that
form martingales. Of course there might be more than one probability measures
with that feature, but if we manage to find one then we can conclude that the
system is arbitrage-free. If we show that such a measure does not exist, then we
know that the system as it stands offers some arbitrage opportunities, and then
we can proceed to find them. Unfortunately, the fundamental theorem of asset
pricing does not always guide us towards these opportunities, but sometimes it
can offer useful insights to identify them.
Suppose that we are facing a market that offers a risk free rate r, and a
collection of M stocks. There are N sources of uncertainty, represented by N
independent Brownian motions B(t) = {Bti : i = 1, . . . , N}. If we collect the
asset prices in an (M × 1) vector S(t) = {Sj (t) : j = 1, . . . , M}, then we can
write

    dS(t) = µ S(t)dt + [Σ · dB(t)] S(t)
The (M × N) matrix Σ will determine the correlation structure of the assets.
In fact, the covariance matrix of the stocks will be given by the product Σ · Σ′ .
Essentially, each asset will satisfy the SDE

    dSj (t) = µj Sj (t)dt + ∑_{i=1}^N σj,i Sj (t)dBi (t)

We want to establish whether or not we can find a trading strategy using
these M stocks that will be an arbitrage opportunity. To this end we will examine
the probability measures that are equivalent to the true one. In particular, all
equivalent measures will have a Radon-Nikodym derivative Mt that satisfies

    dMt = −Mt ∑_{i=1}^N λit dBi (t)

We are looking for those equivalent probability measures under which the
discounted prices will form martingales, which means that under the EMM the
dynamics of the assets will be

    dSj (t) = rSj (t)dt + ∑_{i=1}^N σj,i Sj (t)dBiQ (t)

Using Girsanov’s theorem we can actually find the instantaneous drift under
Q, which will be given by

    EQ [dSj (t)] = EP [ (1 + dMt /Mt ) dSj (t) ]
                = µj Sj (t)dt − EP [ ( ∑_{i=1}^N λit dBi (t) ) ( ∑_{i=1}^N σj,i dBi (t) ) ] Sj (t)

Since the Brownian motions are mutually independent, we can simplify the above
expression to

    EQ [dSj (t)] = ( µj − ∑_{i=1}^N λit σj,i ) Sj (t)dt = rSj (t)dt

which has to be satisfied for all t ≥ 0 and for all j = 1, . . . , M. Therefore the
parameters λ = {λi : i = 1, . . . , N} will be constant, and they must satisfy the
system of M equations with N unknowns

    Σ · λ = µ − r1
This system can have no solutions, a unique solution, or an infinite number
of solutions, depending on the rank of the matrix Σ. If the vector µ − r1 does
not lie in the column space of Σ, that is if rank(Σ) < rank([Σ, µ − r1]), then the
system will not admit a solution. This means that there does not exist an
equivalent martingale measure, and due to the fundamental theorem of asset
pricing it is implied that arbitrage trading strategies can be constructed using
a portfolio of the M stocks. If the system is consistent but rank(Σ) < N, then
there exists an infinite number of vectors λ that are solutions to the system.
Each one of these solutions will define an equivalent martingale measure; the
market is arbitrage-free but incomplete. Finally, if the system is consistent and
rank(Σ) = N then the solution to the system is unique. This unique λ will define
a unique EMM and the market will be complete. In that case, any other asset
that depends on the Brownian motions B(t) can be replicated using the M assets
in the market.
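A quick numerical illustration (ours, with made-up coefficients) of the complete case M = N = 2, where Σ · λ = µ − r1 pins down a unique market price of risk:

```python
import numpy as np

r = 0.03
mu = np.array([0.08, 0.10])            # drifts of the two stocks
Sigma = np.array([[0.2, 0.0],          # loadings on the two Brownian motions
                  [0.1, 0.3]])

rhs = mu - r                            # the vector mu - r*1
assert np.linalg.matrix_rank(Sigma) == 2   # rank N: a unique EMM exists

lam = np.linalg.solve(Sigma, rhs)       # market prices of risk: [0.25, 0.15]
print(lam)
```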

A SECOND DERIVATION OF THE BLACK-SCHOLES FORMULA


Let us now consider the simple case where there is only one risky stock in
the market, with the dynamics given in equation (2.1). Then, it follows that the
coefficient of Girsanov’s transformation will solve

    σ · λ = µ − r ⇒ λ = (µ − r)/σ
Therefore the coefficient λ is the Sharpe ratio of the risky asset. The Sharpe ratio
is a measure of the risk premium per unit of volatility risk, and represents the
compensation that investors demand for holding the stock which has uncertain
payoffs. In this case, since λ is unique, the market will be complete.
Girsanov’s theorem will define the equivalent martingale probability measure
Q as the one with Radon-Nikodym derivative the exponential martingale

    dQ/dP |Ft = Mt = exp{ −½λ²t − λBt }
It follows that the discounted price process of any other asset must form a
Q-martingale as well. In particular we can consider a European-style contract
that delivers an amount g(ST ) at time T , a payoff that depends explicitly on the
price of the underlying stock at the time. The value of this claim at all times
0 ≤ t ≤ T will satisfy
    Vt = exp{−r(T − t)} EQt [g(ST )] = exp{−r(T − t)} EPt [(MT /Mt ) · g(ST )]

These equalities offer us three options to evaluate the value of the derivative at
time t = 0.

Expectation under the true measure P

Under P the asset price and the Radon-Nikodym derivative at time T are
functions of BT , and the price of the derivative at time t = 0 can be written,
using the second equality, as

    V0 = exp(−rT ) EP [ exp{ −½λ²T − λBT } · g( S0 exp{ (µ − ½σ²)T + σBT } ) ]

For general functions g(·) this expectation can be computed by simply simulating
values for BT from the normal distribution with mean zero and variance T .

Expectation under the risk neutral measure Q

A much simpler approach is to use the fact that the dynamics of the underlying
asset under Q are known, and in fact

    dSt = rSt dt + σSt dBtQ

Therefore using the first equality we can express the price of the derivative as

    V0 = exp(−rT ) EQ [ g( S0 exp{ (r − ½σ²)T + σBTQ } ) ]

Now the process {BtQ }t≥0 is a Brownian motion under Q, and therefore once
again we can draw from the normal distribution with zero mean and variance T
to simulate the values of BTQ . The two expressions will of course yield the same
result, but the latter is substantially simpler.
In particular, in the case of a standard European call option, the price will
satisfy

    P0 = exp(−rT ) EQ [ ( S0 exp{ (r − ½σ²)T + σBTQ } − K )+ ]

Since BTQ is normally distributed, after some algebra the expectation simplifies
to

    P0 = [ exp(−σ²T /2) / √(2πT ) ] ∫_{−d}^{∞} S0 exp( σB − B²/(2T ) ) dB
       − [ exp(−rT ) / √(2πT ) ] ∫_{−d}^{∞} K exp( −B²/(2T ) ) dB

In the above expression d = [ log(S0 /K ) + (r − σ²/2)T ] /σ. Evaluating the two
integrals will eventually lead to the Black-Scholes formula.
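The risk neutral expectation above is straightforward to check by simulation. The Python sketch below (ours) draws BTQ, averages the discounted payoffs, and compares the estimate with the closed-form value (2.4):

```python
import math
import numpy as np

S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0

# Monte Carlo under Q: average the discounted call payoffs
rng = np.random.default_rng(11)
BQ = rng.normal(0.0, math.sqrt(T), size=1_000_000)        # B_T^Q ~ N(0, T)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * BQ)
mc_price = math.exp(-r * T) * float(np.mean(np.maximum(ST - K, 0.0)))

# Closed form (2.4) for comparison
N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
d_plus = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
bs_price = S0 * N(d_plus) - K * math.exp(-r * T) * N(d_plus - sigma * math.sqrt(T))
print(mc_price, bs_price)    # the two should agree to within a few cents
```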


The Feynman-Kac form


A third approach would invoke the Feynman-Kac formula. In particular we can
write the first expectation of the valuation formula as

    V0 = EQ₀ [ exp{ −∫₀^T r ds } g(ST ) ]

with St following the risk neutral dynamics. We shall also define the function
v(t, s) = EQ [ exp{ −∫₀^t r ds } g(St ) | S0 = s ], implying that in fact we are interested
in the value V0 = v(T , S0 ). Following the Feynman-Kac approach (see section
1.7) the function v(t, s) solves the parabolic PDE that depends on the dynamics
of the asset price process under Q (since the expectation is taken under Q)

    ∂/∂t v(t, s) = rs ∂/∂s v(t, s) + ½σ²s² ∂²/∂s² v(t, s) − rv(t, s)

with initial condition v(0, s) = g(s). This is just the Black-Scholes partial
differential equation (2.3), after we change the time variable to the
time-to-maturity, which transforms the BS-PDE terminal condition into an
initial one.

2.3 EXOTIC OPTIONS


The power of the fundamental theorem of asset pricing is unleashed when one
considers pricing contracts that are more complicated than the simple European
calls and puts. There is a very large and fairly liquid market for contracts that
are called exotic, in the sense that they exhibit features that are non-standard. In
practice, the role of a trader is to create tailor-made contracts for her clients,
and the role of the financial engineer is to produce benchmark prices for these
contracts that are arbitrage-free, and also present ways to hedge the exposure
of the trading book using available liquid contracts, like the underlying assets
and standard calls and puts.
The fundamental theorem of asset pricing will dictate that no matter how
complicated the payoff structure, the no-arbitrage price will be equal to the
discounted expected payoffs under the equivalent martingale measure Q. Some-
times it is more convenient to simulate these payoffs under Q, or to evaluate the
expectation in closed form, but in other cases solving the PDE might be more
efficient.

Exercise timing
Exotic contracts can be classified with respect to their exercise times and their
payoff structure. European-style contracts can be exercised only on the matu-
rity date, while American derivatives can be exercised at any point before the
maturity date. That is to say, a three-month American put with strike price 30p
gives the holder the right to sell the underlying asset for 30p at any point she

wishes in the next three months. In this case the holder will have to determine
the optimal exercise point. Bermudan options are somewhat between the Euro-
pean and the American ones,¹ and allow the holder to exercise at a predefined
set of equally spaced points. For example if the put option described above was
a Bermudan one, perhaps it could offer weekly exercising at the closing of each
Friday during the next three months. Once again every Friday the holder must
decide if it is optimal to exercise or to wait for the next exercise point.
The shout option is slightly more complicated, as the holder has the option
to lock-in one or more prices up to the maturity date (that is by “shouting” to
the seller), and use the price they choose to compute the payoffs at maturity. For
example, if the put was a one-shout option, and after six weeks the underlying
price is 22p the holder has the opportunity to “shout” and lock in that price.
Therefore, if on the maturity date the price is 26p the holder will choose which
value will be used to compute the payoffs, in this case 22p which gives payoffs
(30p − 22p)+ = 8p per share.
Typically, one computes the prices of contracts with exotic exercise structure
using the partial differential equation. In most cases this PDE has to be solved
numerically. In chapter 3 we will give an overview of some methods that are used
to numerically solve for the price of the option, the optimal exercise strategy and
the hedging parameters.

Payoff structures
Apart from the standard calls and puts there can be a wide range of structures
that define the payoffs of the contract. The simplest deviation is the digital option
(also called binary or all-or-nothing option), where the payoff is a fixed amount
if the underlying is above or below the strike price. For example a two month
digital call with strike 60p will pay $1 if the value of the underlying is above
60p after two months. In that sense it is a standard bet on the future level of
the underlying asset price.
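Under the Black-Scholes dynamics the digital call even admits a simple closed form: the discounted risk-neutral probability of finishing in-the-money, exp(-rT) N(d_-) per unit of payout. A brief sketch (in Python rather than Matlab; the 62p spot and the other inputs are invented for illustration):

```python
import math

def digital_call(S, K, r, sigma, T, payout=1.0):
    """Cash-or-nothing call: pays `payout` if S_T > K.
    Price = exp(-rT) * payout * N(d_minus) under Black-Scholes."""
    d_minus = (math.log(S / K) + (r - 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    N = 0.5 * (1.0 + math.erf(d_minus / math.sqrt(2.0)))
    return math.exp(-r * T) * payout * N

# a two-month digital call struck at 60p, spot at 62p (illustrative inputs)
print(round(digital_call(0.62, 0.60, 0.02, 0.25, 2 / 12), 4))
```

The price is simply the stake times the discounted probability of winning the bet, which is what one would expect for an all-or-nothing contract.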
Another popular option is the cross option, where the underlying asset is
quoted in one currency but the payoffs (and the strike price) are denominated in
another. For example, British Airways are traded in the London stock exchange
and are priced in British pounds, but a US based investor will want the strike
price and the payoff in US dollars. Therefore, if Xt is the USD/GBP exchange
rate, and St is the BA price in London (quoted in GBP), then a European call
will have payoffs of the form (ST XT − K )+ , where the strike price is quoted in
USD. Therefore the writer of this option is also exposed to exchange rate risks
and the correlation between the exchange rate and the underlying asset returns.
A quanto option will address this dependence by setting the exchange rate that
will be used for the conversion beforehand, say X ? . Therefore the payoffs will
only depend on the fluctuations of the underlying asset, given by (S T X ? − K )+ .
The cross option described above is an example of an option that depends
on more than one underlying asset. Other exotics share this feature, like the
¹ Just as Bermuda is between Europe and the US.


exchange option that allows the holder to exchange one asset for another, a
basket option that uses a portfolio of assets as the underlying asset, or the
rainbow option that depends on the performance of a collection of assets. An
example of a rainbow option is a European put where the payoffs are computed
using the worst of ten stocks.
Other contracts have features that involve other derivatives, like the com-
pound option that is an option to buy or sell another option. In this case you
can have a call on a call, a put on a call, etc. The swing option lets the holder
decide if she will use the option as a call or as a put, at a pre-specified number
of time points. Typically the holder is not allowed to use all options as calls
or puts, and some provisions are in place to ensure that a mix is actually used.
The chooser option is a variant that allows the holder to decide if the option
will pay off as a call or a put. This decision must be made at some point before
maturity.
If the option is of the European type, one can retrieve its price by using either
the PDE or by simulating the expectation. When the number of underlying assets
is small it is usually faster to numerically solve the PDE, but as the number of
assets grows these numerical methods become increasingly slow. It is typically
stated that if the number of assets is larger than four, then simulation methods
become more efficient.
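For instance, a two-asset "worst-of" put can be priced by jointly simulating correlated lognormal paths under Q and discounting the average payoff. A sketch (in Python; the function name and all parameter values are our own, for illustration only):

```python
import math
import random

def worst_of_put_mc(S1, S2, K, r, sig1, sig2, rho, T, n=100_000, seed=7):
    """MC price of a European put on the worst of two correlated GBM assets:
    discounted payoff (K - min(S1_T, S2_T))^+ under Q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        s1 = S1 * math.exp((r - 0.5 * sig1 ** 2) * T + sig1 * math.sqrt(T) * z1)
        s2 = S2 * math.exp((r - 0.5 * sig2 ** 2) * T + sig2 * math.sqrt(T) * z2)
        total += max(K - min(s1, s2), 0.0)
    return math.exp(-r * T) * total / n

print(round(worst_of_put_mc(100, 100, 100, 0.02, 0.2, 0.3, 0.5, 0.5), 2))
```

With ρ = 1 and identical volatilities the two assets coincide and the price collapses to that of a vanilla put; lowering the correlation makes the worst-of put more valuable.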

Path dependence
For options with early exercise features one has to make decisions on the ex-
ercise times. This decision will be dependent on the complete price path of
the underlying asset, and not only on its value at maturity. Some other option
contracts exhibit more explicit or stronger path dependence.
A barrier option has one or more predefined price levels (the barriers). Reach-
ing these barriers can either activate ("knock-in" barrier) or deactivate ("knock-
out" barrier) the contract. Say, for example, that the current price of the under-
lying asset is 47p, and consider a six month call option with strike 55p and a
knock-in barrier at 35p. In order for payoffs to be realized on maturity, not only
the price has to end up higher than the 55p strike price, but the contract must
have been activated beforehand, that is the price needs to have fallen below
35p at some point before maturity. Monitoring of barrier options is not usually
continuous, but takes place on some predefined time points that are typically
equally spaced. The payoff of a Parisian option will depend on the time that is
spent beyond the corresponding barriers, in order to smooth discontinuities.
Lookback options have payoffs that depend not on the terminal value of the
underlying asset, but on the maximum or the minimum value over a predefined
period. Once again in most cases this maximum or minimum is taken over a
discrete set of time points. The special case where the maximum or minimum
over the whole price path is considered yields the Russian option. An Asian
option will have payoffs that depend on the average (arithmetic or geometric)
of the price over a time period, rather than a single value. Therefore an Asian

option could be a call with payoffs that depend on the average daily prices of
the underlying over the month prior to maturity.
Path dependence is easily accommodated using simulation methods, as sam-
ple paths of the underlying can be produced and the payoff can be computed
over each path. Nevertheless, one would still set up the relevant PDEs if this
was possible. Sometimes to specify the PDE one must define some auxiliary
variables, for example the “time above the knock out barrier” in the case of a
Parisian option.
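As an illustration of the simulation route, an arithmetic-average Asian call can be priced by generating discretely monitored paths and averaging the discounted payoffs. A Python sketch with illustrative parameters:

```python
import math
import random

def asian_call_mc(S0, K, r, sigma, T, steps=50, n=50_000, seed=1):
    """MC price of an arithmetic-average Asian call; the average is taken
    over `steps` equally spaced monitoring dates, as is typical in practice."""
    rng = random.Random(seed)
    dt = T / steps
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    total = 0.0
    for _ in range(n):
        s, running = S0, 0.0
        for _ in range(steps):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            running += s
        total += max(running / steps - K, 0.0)
    return math.exp(-r * T) * total / n

print(round(asian_call_mc(100, 100, 0.02, 0.2, 0.25), 2))
```

Because averaging dampens the volatility of the payoff, the Asian call is worth noticeably less than the corresponding European call on the same asset.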

2.4 THE GREEKS


So far we have addressed the problem of finding the no-arbitrage price of
a derivative contract, under the assumptions that underpin the Black-Scholes
paradigm. We showed that in that case the market is complete, and any contract
can be replicated, at least in principle. Now we will look at these replicating
strategies more closely, and investigate a number of different hedging strategies.
We will take two different views, illustrated in the next two settings:
1. A trader at a financial institution ABC wants to give a quote for a deriva-
tive, most probably one with exotic features. The trader will investigate the
trading strategies that would, in theory at least, replicate the payoffs of this
derivative. In theory, following the BS procedure we shall hold H_t shares at
all times, as discussed in section 2.1. In practice, as trading is not continuous
and markets are not frictionless, this replication will not be exact. The quote
she will produce will be the replicating costs, plus a premium for the risk
she runs due to imperfect hedging, plus a fee for her time and bonus.
2. An investor XYZ is holding a portfolio of assets that depend on one or
more risk factors. She wants to enter some options positions that will hedge
her position against adverse moves of these factors, perhaps in the form of
exotic options purchased from the financial institution above. Of course this
insurance will come at a premium, and she wants to investigate the cost of
different protection levels. For example, if her portfolio is well diversified, the
market will be a natural factor she is exposed to. She will consider enhancing
her portfolio with derivative contracts that are written on a market index.
It is important to observe that the value of the derivative that ABC has sold
will obey the same partial differential equation that the portfolio of XYZ does.
This follows from the absence of arbitrage opportunities that would otherwise
occur. If we assume that there is a single underlying source of risk, summarized
by the asset St , then any portfolio or derivative contract with value Vt can be
expressed as a function of S_t, namely V_t = V(t, S_t). This function will satisfy
the Black-Scholes PDE

    ∂V/∂t (t, S) + r S ∂V/∂S (t, S) + (1/2) σ² S² ∂²V/∂S² (t, S) = r V(t, S)


We will use some Greek letters for the derivatives involved, namely ∆ = ∂V/∂S
(the Delta), Γ = ∂²V/∂S² (the Gamma) and Θ = -∂V/∂t (the Theta). Then we can
write the BS-PDE as

    -Θ + r S ∆ + (1/2) σ² S² Γ = r V

More importantly, a Taylor expansion of the value function V(t, S) over a
small time interval ∆t and a small price change ∆S yields

    ∆V = (∂V/∂t) ∆t + (∂V/∂S) ∆S + (1/2) (∂²V/∂S²) (∆S)² + o(∆t, (∆S)²)
    ⇒ ∆V ≈ -Θ ∆t + ∆ ∆S + (1/2) Γ (∆S)²
The Delta of the derivative or the portfolio will therefore represent its sensitivity
with respect to changes in the underlying asset. In continuous time trading,
holding ∆ units of the underlying asset at all times is sufficient to replicate the
path and payoffs of the portfolio value. Θ will be the time decay of this value,
representing the changes as we move closer to maturity, even if the underlying
asset does not move. When trading takes place in discrete time, there is going to
be some misalignment between the two values, and higher order derivatives can
be used to correct for that. In addition, the Γ controls the size of the hedging
error when one uses the wrong volatility for pricing and/or hedging. This is an
important feature, as the volatility is the only parameter in the BS PDE that is
not directly observed and has to be estimated.
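The quality of the second order approximation is easy to verify against the exact Black-Scholes value, with the Greeks obtained here by finite differences. A Python sketch (the bump sizes and parameter values are illustrative):

```python
import math

def bs_call(S, K, r, sigma, T):
    """Closed-form Black-Scholes call price (T = time to maturity)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

S, K, r, sigma, T = 100.0, 100.0, 0.02, 0.20, 0.25
h = 1e-3
delta = (bs_call(S + h, K, r, sigma, T) - bs_call(S - h, K, r, sigma, T)) / (2 * h)
gamma = (bs_call(S + h, K, r, sigma, T) - 2 * bs_call(S, K, r, sigma, T)
         + bs_call(S - h, K, r, sigma, T)) / h ** 2
# Theta = -dV/dt: a calendar step of dt shortens the time to maturity by dt
theta = (bs_call(S, K, r, sigma, T + h) - bs_call(S, K, r, sigma, T - h)) / (2 * h)

dS, dt = 1.0, 1.0 / 252.0
exact = bs_call(S + dS, K, r, sigma, T - dt) - bs_call(S, K, r, sigma, T)
approx = -theta * dt + delta * dS + 0.5 * gamma * dS ** 2
print(round(exact, 4), round(approx, 4))
```

For this one-day, one-unit move the Taylor approximation tracks the exact repricing to within a fraction of a cent; the residual is exactly the higher-order effect that the Speed and the Charm would capture.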
In the BS framework there are also some parameters that are considered con-
stant, namely the volatility σ, the risk-free rate r, and the dividend yield q.
Therefore one can write the value function as V_t = V(t, S; σ, r, q), and practi-
tioners use the derivatives of the value function with respect to these parameters
as proxies for the respective sensitivities. In particular ν or κ = ∂V/∂σ (the Vega
or Kappa²), ρ = ∂V/∂r (the Rho), and φ = ∂V/∂q (the Phi).
With the increased popularity of exotic contracts that are particularly sen-
sitive to some parameter values, a new set of sensitivities is sometimes used,
although rarely. These sensitivities are implemented via higher order Taylor
expansions of the value function. Running out of Greek letters, these sensi-
tivities have taken just odd-sounding names or have borrowed their names from
quantum mechanics, like the Speed ∂³V/∂S³, the Charm ∂²V/∂S∂t, the Color
∂³V/∂S²∂t, the Vanna ∂²V/∂S∂σ, and the Volga ∂²V/∂σ².
No matter what Greek or non-Greek letters are used, the objective is the
same: to enhance the portfolio with a number of contracts that result in a position
that is neutral with respect to some Greek. This turns out to be a simple exercise,
as portfolios are linear combinations of assets and this carries through to their
sensitivities. Say that we are planning to merge two portfolios with values V_t^1

² Vega is not a Greek letter, and for that reason this sensitivity is also found in the
literature as Kappa.


LISTING 2.1: bsgreeks.m: Black-Scholes Greeks.

function [P, D, G, V] = bsgreeks(S, K, r, sigma, t, CP)
t   = t + eps;
d1  = log(S./K) + (r + 0.5*sigma.^2).*t;
d1  = d1./sigma./sqrt(t);
d2  = d1 - sigma.*sqrt(t);
Nd1 = normcdf(CP.*d1);
Nd2 = normcdf(CP.*d2);
nd1 = normpdf(d1);
P   = CP.*(S.*Nd1 - K.*exp(-r.*t).*Nd2);  % price
D   = CP.*Nd1;                            % Delta
G   = nd1./S./sigma./sqrt(t);             % Gamma
V   = nd1.*S.*sqrt(t);                    % Vega

and V_t^2 into one with value V_t^{1+2}, and suppose that we are interested in some
sensitivity Ψ (where Ψ could be ∆, Γ, ...). It follows that, since Ψ is actually a
derivative,

    Ψ_t^{1+2} = Ψ_t^1 + Ψ_t^2

The simplest asset that we can use to enhance our portfolio in order to
achieve some immunization is the underlying asset itself, S_t. Trivially, the val-
uation function of the asset is V(t, S; σ, r, q) = S, and therefore the Delta of
the asset is ∆^S = ∂S/∂S = 1, while all other sensitivities are equal to zero. The
argument above indicates that by augmenting our portfolio with more units of
the underlying we will change the Delta of the composite position. In order to
immunize other sensitivities we will need to construct a position that incorpo-
rates derivative contracts, with the plain vanilla calls and puts being the most
readily available candidates. For that reason we will now investigate the Greeks
of these simple options and examine how we can use them to achieve Greek-
neutrality. Listing 2.1 gives the Matlab function that produces the price and the
major Greeks for the Black-Scholes option pricing model, for both calls and puts.

THE DELTA
Say that we start with a portfolio with value V and Delta ∆^V. As we noted above
we can adjust the Delta of a portfolio by adding or removing units of the underlying
asset. In particular, if we add w_S units of the asset, the Delta of the portfolio
will become

    ∆^{V+S} = ∆^V + w_S ∆^S = ∆^V + w_S

In order to achieve Delta-neutrality, ∆^{V+S} = 0, we will need w_S = -∆^V, that is,
to short ∆^V units of the underlying asset. Note that adding or removing funds from
the risk-free bank account does not have any impact on the Greeks. We can therefore
adjust the bank balance with the proceeds of this transaction.


FIGURE 2.1: Behavior of a call option Delta. Part (a) gives the behavior of the
delta of options with specifications {K , r, σ} = {100, 0.02, 0.20}, and three dif-
ferent times to maturity: t = 0.05 (solid), t = 0.25 (dashed) and t = 0.50 (dotted).
Part (b) gives the behavior of the delta as the time to maturity increases, for a
contract which is at-the-money (S = 100, solid), in-the-money (S = 95, dashed),
and out-of-the-money (S = 105, dotted).

[Two panels plotting the delta against the underlying price in (a) Across Prices,
and against the time to maturity in (b) Across Time]

A position that is Delta-neutral will not change in value for small asset
price changes (but it will change due to the passage of time, as Θ^V dictates). Of
course after a small change in the asset price the value of ∆^{V+S} will change, as
∂∆^{V+S}/∂S = Γ^V. In order to maintain a Delta-neutral portfolio, one has to rebal-
ance it in a continuous fashion, employing a dynamic Delta hedging strategy.
In the BS framework European calls (and puts) are priced in closed form as
in equation (2.4). Taking the derivative with respect to the price S yields the
Delta for calls and puts

    Calls: ∆^C = N(d_+)        Puts: ∆^P = N(d_+) - 1

The values of Delta for a European call option, across different spot prices and
different maturities, are displayed in figure 2.1. The Delta for deep-in-the-money
options is equal to one, as exercise appears very likely and the seller of the
option will need to hold one unit of asset in order to deliver. For options that
are deep-out-of-the-money exercise is unlikely and the seller of the option will
not need to carry the asset, making the Delta equal to zero. As the time to
maturity increases the Deltas of in- and out-of-the-money contracts converge
towards the at-the-money Delta.
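These limiting values are easy to confirm numerically; a short Python sketch, using the figure 2.1 specifications with a short time to maturity:

```python
import math

def call_delta(S, K, r, sigma, T):
    """Black-Scholes call Delta, N(d_plus)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))

K, r, sigma, T = 100.0, 0.02, 0.20, 0.05
for S in (70.0, 100.0, 130.0):      # deep OTM, ATM, deep ITM
    print(S, round(call_delta(S, K, r, sigma, T), 4))
```

The deep out-of-the-money Delta is essentially zero and the deep in-the-money Delta essentially one; the put Delta follows immediately as ∆^P = ∆^C - 1.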

Dynamic Delta hedging


A seller of an option that maintains a Delta-neutral position at all times is repli-
cating the contract and should end up with a zero bank balance, no matter what
the path of the underlying asset is. Of course in practice one cannot rebalance


LISTING 2.2: bsDhedge.m: Dynamic Delta hedging.

% bsDhedge.m
S0    = 100;                  % initial price
K     = 100;                  % strike price
r     = 0.02;                 % interest rate
mu    = 0.05;                 % drift
sigma = 0.10;                 % volatility
T     = 0.25;                 % maturity
N     = 25;                   % number of hedges
dt    = T/N;                  % time intervals
dW    = sqrt(dt)*randn(N,1);  % white noise
Tv    = (0:N)'*dt;            % keeps track of time
dlnS  = (mu - 0.5*sigma^2)*dt + sigma*dW;   % dlogS process
lnS   = [0; cumsum(dlnS)];    % logS process
S     = S0*exp(lnS);          % price process
% option price and Greeks process
[C, D] = bsgreeks(S, K, r, sigma, T-Tv, 1);
BB     = zeros(N+1,1);        % bank balance (B/B)
BB(1)  = C(1) - D(1)*S(1);    % initial B/B
for tndx = 2:N                % loop through time
  INTEREST = BB(tndx-1)*(exp(r*dt)-1);        % interest
  DDIFF    = D(tndx) - D(tndx-1);             % stock needed
  BOUGHT   = DDIFF*S(tndx);                   % funds needed
  BB(tndx) = BB(tndx-1) + INTEREST - BOUGHT;  % B/B
end
INTEREST = BB(N)*(exp(r*dt)-1);               % final interest payment
DELIVER  = (S(N+1) > K);                      % stock delivered?
DDIFF    = DELIVER - D(N);                    % stock needed
BOUGHT   = DDIFF*S(N+1);                      % funds needed
BB(N+1)  = BB(N) + INTEREST - BOUGHT + DELIVER*K;  % last B/B

continuously (even in the ideal case where the markets are frictionless). Figure
2.2 illustrates dynamic Delta hedging in a simulated BS world, while in 2.3 the
actual strategy is presented step-by-step. Initially we sell one call option with
strike price K = 100 (at-the-money) and three months to maturity for $2.25. In
order to hedge it we need to purchase ∆ = 0.55 shares, and we will need to
borrow $52.72 to carry out this transaction.
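These figures can be reproduced from the closed form. The rate and volatility are not stated explicitly at this point, so the values below (r = 2%, σ = 10%, a quarter of a year to maturity) are inferred assumptions that happen to match the reported numbers:

```python
import math

def bs_call(S, K, r, sigma, T):
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def call_delta(S, K, r, sigma, T):
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))

S, K, r, sigma, T = 100.0, 100.0, 0.02, 0.10, 0.25   # assumed inputs
price  = bs_call(S, K, r, sigma, T)      # premium received for the call
delta  = call_delta(S, K, r, sigma, T)   # shares bought to hedge
borrow = delta * S - price               # shortfall funded from the bank
print(round(price, 2), round(delta, 2), round(borrow, 2))
```

The premium, hedge ratio and loan come out at roughly $2.25, 0.55 shares and $52.72 respectively, in line with the narrative above.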
As the price of the underlying asset drops, the Delta of the call follows suit.
We are therefore selling our holdings gradually, recovering some funds for our
bank balance. Eventually the price recovers and we build up the asset holdings
once more. In discrete time intervals the option price changes are not matched
exactly by changes in our portfolio value. In particular these discrepancies are
larger for large moves of the underlying. Overall the hedging portfolio will mimic


the process of the call option to a large extent, but not exactly. In this simulation
run we are left with a profit of $0.12.
Increasing the frequency of trades will decrease the volatility of this hedging
error, and of course at the limit the replicating strategy is exact. If from one
transaction to the next the Delta does not move a lot, we would expect the
impact of discrete hedging to be small. On the other hand, the impact will be
most severe in the areas where the Delta itself changes rapidly. The second
order sensitivity with respect to the price, the Gamma, is in fact summarizing
these effects.

GAMMA
The gamma of a portfolio is defined as the second order derivative of the portfolio
value with respect to the price, or equivalently as the first order sensitivity of
the portfolio Delta with respect to the price. As we already mentioned above, we
expect the Delta of a portfolio to change across time, as the price of the asset
changes. Gamma will give us quantitative insight into the magnitude of these
changes.³
We have already analyzed how a portfolio can be made Delta-neutral, by
taking a position in the underlying asset. In order to achieve Gamma-neutrality,
the underlying asset is not sufficient. This is due to the fact that

    Γ^S = ∂²S/∂S² = 0
This indicates that we need instruments that are nonlinear with respect to the
underlying asset price, in order to achieve Gamma-neutrality. Options are perfect
candidates for this job. On the other hand, the fact that Γ^S = 0 has some benefits,
as it implies that after we have made the portfolio Gamma-neutral we can turn
to achieving Delta-neutrality by taking a position in the underlying asset. The
zero value of Gamma will not be affected by this position. We call the strategy
where we are neutral with respect to both Delta and Gamma simultaneously
dynamic Delta-Gamma hedging.
Say that we hold a portfolio with value V and given Delta and Gamma, ∆^V
and Γ^V respectively. We follow a two step procedure where we first achieve
Gamma-neutrality, using a liquid contract with known sensitivities. For instance
we can employ a European call option with price C and known Greeks ∆^C and
Γ^C. In the second step we will use the underlying asset, which has price S, to
achieve Delta-neutrality (recall that ∆^S = 1 and Γ^S = 0). The resulting portfolio
will be Delta-Gamma neutral.
³ Delta will also change as time passes, even if the asset price remains the same. The
Charm ∂²V/∂S∂t would quantify this impact. Generally speaking, the impact of asset
price changes captured by the Gamma is more significant than the Delta changes cap-
tured by the Charm. This happens because the magnitude of the squared Brownian
increment (captured by Gamma) is of order o(∆t), while the Charm captures effects of
order o(∆t^{3/2}).

FIGURE 2.2: Dynamic Delta hedging of a call option. At time zero we sell a
European call with strike price K = 100, and we Delta hedge it 25 times over
its life. The underlying asset process at the hedging times is given in (a), and
the number of shares that we need to hold are given in (b). Subfigure (c) gives
the corresponding call price and (d) our bank balance. As the option expires
out-of-the money we are not asked to deliver at maturity, and the option expires
worthless. In (e) changes in the option price and changes in the hedging portfolio
are compared. Subfigure (f) illustrates the replication error between the hedging
portfolio (solid) and the option (dashed).

[Six panels: (a) Underlying price; (b) Delta (shares held); (c) Call price; (d) Bank
balance; (e) Option/portfolio changes; (f) Replication]


FIGURE 2.3: Sample output of the dynamic Delta hedging procedure. A call option
is sold at time t = 0 and is subsequently Delta hedged to maturity
[Step-by-step table of the hedging strategy: at each rebalancing date the option
is repriced, the Delta position is adjusted, and the bank balance is updated]

FIGURE 2.4: Behavior of a call option Gamma. Part (a) gives the behavior of the
Gamma of options with specifications {K, r, σ} = {100, 0.02, 0.20}, and three
different times to maturity: t = 0.05 (solid), t = 0.25 (dashed) and t = 0.50
(dotted). Part (b) gives the behavior of the Gamma as the time to maturity
increases, for a contract which is at-the-money (S = 100, solid), in-the-money
(S = 95, dashed), and out-of-the-money (S = 105, dotted).

[Two panels plotting the gamma against the underlying price in (a) Across Prices,
and against the time to maturity in (b) Across Time]

We want to buy w_C units of the option. This makes the value of our composite position equal to V + C, and most importantly it will have a Gamma equal to Γ^{V+C} = Γ^V + w_C Γ^C. Therefore, to achieve Gamma-neutrality we need to hold w_C = −Γ^V / Γ^C units of the option.
The Delta of the new portfolio is of course Δ^{V+C} = Δ^V − (Γ^V / Γ^C) Δ^C. To make the position Delta-neutral we want to also hold w_S = −Δ^{V+C} shares of the underlying asset.
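As a quick numerical illustration (a Python sketch rather than Matlab; the Greek values below are made up for the example), the two hedge ratios follow directly from the Greeks of the position and of the traded option:

```python
def gamma_delta_neutral_weights(gamma_V, delta_V, gamma_C, delta_C):
    """Units of the option (w_C) and of the underlying (w_S) that make
    the combined position Gamma- and Delta-neutral."""
    w_C = -gamma_V / gamma_C           # kills the Gamma of the position
    w_S = -(delta_V + w_C * delta_C)   # kills the remaining Delta
    return w_C, w_S

# hypothetical position Greeks: short a call (Delta -0.55, Gamma -0.04),
# hedged with a traded call (Delta 0.45, Gamma 0.05)
w_C, w_S = gamma_delta_neutral_weights(-0.04, -0.55, 0.05, 0.45)
assert abs(-0.04 + w_C * 0.05) < 1e-12          # Gamma-neutral
assert abs(-0.55 + w_C * 0.45 + w_S) < 1e-12    # Delta-neutral
```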
For European call and put options the value of Gamma is given by

Γ^C = N′(d+) / (S σ √t)
Graphically, figure 2.4 gives Gamma across different moneyness and maturity
levels. Apparently the Gamma is significant for contracts that are at-the-money.
In particular, the Gamma of at-the-money options goes to infinity as maturity
approaches. This is due to the discontinuity of the derivative of the payoff func-
tion.
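The behavior described here is easy to verify numerically; the sketch below (Python, using the same parameters as figure 2.4) evaluates Γ = N′(d+)/(Sσ√t) and shows the at-the-money Gamma growing as maturity shrinks:

```python
from math import log, sqrt, exp, pi

def bs_gamma(S, K, r, sigma, t):
    """Black-Scholes Gamma, N'(d+) / (S sigma sqrt(t)); t = time to maturity."""
    d_plus = (log(S / K) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    pdf = exp(-0.5 * d_plus ** 2) / sqrt(2.0 * pi)   # N'(d+)
    return pdf / (S * sigma * sqrt(t))

S = K = 100.0; r = 0.02; sigma = 0.20
gammas = [bs_gamma(S, K, r, sigma, t) for t in (0.50, 0.25, 0.05, 0.01)]
# the at-the-money Gamma keeps increasing as maturity approaches
assert all(g1 < g2 for g1, g2 in zip(gammas, gammas[1:]))
```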

Dynamic Delta-Gamma hedging


As Gamma is the sensitivity of the Delta with respect to the underlying price S,
we can use a Delta-Gamma neutral strategy to construct a replicating portfolio
which is second order accurate in S. When we Delta hedge over a discrete
time interval we introduce replication errors since the Delta of our position will
not remain equal to zero as the time changes over this rebalancing interval.


FIGURE 2.5: Dynamic Delta-Gamma hedging of a call option. At time zero we


sell a European call with strike price K = 100, and we Delta-Gamma hedge it
25 times over its life. To do so we use the underlying stock and a call option
which has at all points a strike price that is 105% the current spot price. The
underlying asset process is the same as in figure 2.2. Subfigure (a) gives the
number of options (solid) and shares (dashed) that we need to hold to maintain
Delta-Gamma neutrality. The dotted line gives the number of shares that Delta
hedge (as in figure 2.2). Subfigure (b) gives the bank balance if we Delta-Gamma
hedge (solid) or just Delta hedge (dotted). In (c) changes in the option price and
changes in the hedging portfolio are compared. Crosses give the Delta-Gamma
hedging deviations, while circles correspond to pure Delta hedging. Finally,
subfigure (d) illustrates the replication error between the hedging portfolio (solid) and the option (dashed), which are virtually indistinguishable. The dotted line gives the process of the Delta hedging portfolio.

(a) Shares and options held (b) Bank balance
(c) Option/portfolio changes (d) Replication

FIGURE 2.6: Comparison of the replication errors for Delta and Delta-Gamma neutral positions. The histograms are based on 10,000 simulations of the underlying asset price. For each run a call option with strike price K = 100 was Delta or Delta-Gamma hedged, as in figures 2.2 and 2.5.

(a) Delta hedging (b) Delta-Gamma hedging

These changes of Delta will be proportional to the derivative ∂Δ^V/∂S = ∂²V/∂S² = Γ^V. Therefore, if we construct a position that has Γ = Δ = 0 we form a portfolio that will maintain a position which is (approximately) neutral for larger price changes, and, since the price is diffusive, for longer periods of time.4
Of course as we mentioned above we cannot implement such a position using
the underlying asset alone, and we will need an instrument that exhibits non-zero
Gamma. Typically we use liquid call and put options that are around-the-money
to do so. In figure 2.5 we repeat the experiment of figure 2.2 using a Delta-Gamma
neutral strategy this time. We sell one call option with strike K = 100 and
construct a Delta-Gamma hedge that uses, apart from the underlying asset, a call
option. We could use an option with a constant strike price throughout the time to
maturity, but there is always the risk that as the underlying price fluctuates this
option might become deep-in- or deep-out-of-the-money. Such an option will
have Γ^C ≈ 0 (see figure 2.4), and our position in options w_C = −Γ^V/Γ^C → ±∞. To get around this problem, at each point in time we use a call option that has a strike price equal to 105% of the current value of the underlying asset, K* = 1.05 S_t. This essentially means that when we rebalance we sell the options we might hold and invest in a brand new contract.5
Figure 2.5 gives the processes for this experiment. In subfigure (c) it is easy to see that the Delta-Gamma changes in the portfolio follow the changes of the hedged instrument a lot more closely than the portfolio of figure 2.2, which was only Delta neutral. This improvement in replication accuracy is also illustrated in subfigure (d), where the two processes are virtually indistinguishable.
4 There is also an error associated with Delta changes as time passes, proportional to the Charm ∂Δ/∂t, but these effects are typically small and deterministic.
5 Of course if transaction costs were present this would not be the optimal strategy.


We can also repeat the above experiments to assess the average performance
of simple Delta and Delta-Gamma hedging. Here we create 10,000 simulations
of the underlying asset and option prices, and implement the two hedging strategies. The table below gives the summary statistics for the hedging errors,
when we hedge 10, 25 or 50 times during the three-month interval to expiration.
Figure 2.6 presents the corresponding histograms for the two hedging strategies,
when we rebalance 25 times.
Hedges Strategy Mean St Dev Min Max Skew Kurt
10 ∆ −0.01 0.52 −3.34 +1.76 −0.47 4.60
∆&Γ −0.12 0.35 −3.76 +1.45 −3.17 19.1
25 ∆ +0.00 0.33 −1.77 +1.30 −0.24 4.30
∆&Γ −0.02 0.17 −2.11 +2.22 −2.63 28.9
50 ∆ −0.00 0.24 −1.50 +1.00 −0.27 4.77
∆&Γ −0.01 0.11 −1.05 +2.28 +0.63 53.3

In comparison, the Delta-Gamma neutral strategy gives pricing errors that


are a lot more concentrated around zero, but with significant outliers. This is
illustrated in figure 2.6 and the table above. In particular, for all hedging fre-
quencies Delta-Gamma hedging produces half the standard deviation of the
errors. On the other hand, for some paths of the underlying asset, implementing
Delta-Gamma hedging produces outliers. This is also confirmed by the table, where the minimum and maximum values and the kurtosis indicate extremely
fat tails. Of course, this behavior is dependent on the exact implementation of
the hedging strategies, that is to say which instruments are used and how the
rebalancing points are selected.

Gamma and uncertain volatility


The BS-PDE (2.3) depends on two parameters, the risk free rate r, which is
a quantity that is directly observable, and on the volatility σ, which is not.
Typically, an options writer will sell contracts based on a conservative estimate
of the volatility, say σ̂ and subsequently hedge it. It is therefore natural to ask
what will the implications be if we hedge our position using a wrong value for σ.
It turns out that Gamma has another important role to play, as it will determine
the impact of this misspecification. The approach that we follow here is outlined
in Carr (2002) and Gatheral (1997, 2006), among others.
To put things concretely, say that the true process for the underlying asset
is given by the SDE
dS_t = μ S_t dt + σ^A S_t dB_t
where the superscript σ A denotes the actual volatility. We take the position of
the writer of a European-style option that offers a payoff Π(S T ) at time T , and
consider its valuation as a function not only of (t, S), but also of the volatility
σ. For that reason we denote the value of this derivative with V (t, S; σ). We
therefore consider a family of pricing functions for different values of σ, where
all satisfy the BS-PDE

∂V(t, S; σ)/∂t + rS ∂V(t, S; σ)/∂S + ½ σ² S² ∂²V(t, S; σ)/∂S² = rV(t, S; σ)
with the appropriate boundary condition V (T , S; σ) = Π(S).
Let us assume that we are asked to sell one such contract, and we quote a
price that solves the BS-PDE for a volatility parameter σ I , which we call the
implied volatility. We can write our quote as V0I = V (0, S0 ; σ I ).
After selling the contract we proceed with the Delta hedging approach. In
particular, we implement a self-financing trading strategy H = {(H_t^S, H_t^F): t ≥ 0}, where we hold H_t^S = Δ_t^H = (∂/∂S) V(t, S_t; σ^H) units of the underlying asset at each time point, and keep a risk-free bank balance of H_t^F. Note that when we compute the Delta of the contract we use a third volatility, σ^H, that is to say the hedging volatility.
The initial bank account balance will be

H_0^F = V(0, S_0; σ^I) − [(∂/∂S) V(0, S_0; σ^H)] S_0    (2.6)
and the bank account dynamics will be affected at each time by the amount
needed to purchase (or sell) stocks to maintain Delta-neutrality, and by interest
payments. In particular, at time t + dt the Delta has changed to H tS + dHtS ,
indicating that we will need to purchase dHtS shares. The price of each share is
of course St + dSt when we make this purchase. Also, over this period we will
gain an amount rHtF dt due to the interest on the bank balance. Putting these
two together we can write the dynamics of the bank account balance as 6

dH_t^F = −dH_t^S (S_t + dS_t) + rH_t^F dt
       = −d(H_t^S S_t) + H_t^S dS_t + rH_t^F dt
       = −d(Δ_t^H S_t) + Δ_t^H dS_t + rH_t^F dt
The solution of the above SDE can be written as

exp(−rT) H_T^F − H_0^F = −exp(−rT) Δ_T^H S_T + Δ_0^H S_0
    + ∫_0^T exp(−rt) (Δ_t^H dS_t − r Δ_t^H S_t dt)    (2.7)

Itō’s formula will give us the dynamics for the quantity V_t^H = V(t, S_t; σ^H) that gives us the value of Delta that we wish to maintain. In particular,

dV_t^H = (Θ_t^H + ½ (σ^A)² S_t² Γ_t^H) dt + Δ_t^H dS_t

Since V(t, S; σ^H) satisfies the BS-PDE, Θ^H + rS Δ^H + ½ (σ^H)² S² Γ^H = rV^H, we can write
6 The second equality follows from d(X_t Y_t) = X_t dY_t + Y_t dX_t + dX_t dY_t, while the others rely largely on d(exp(−rt) X_t) = exp(−rt) dX_t − r exp(−rt) X_t dt.

 
dV_t^H = (rV_t^H + ½ [(σ^A)² − (σ^H)²] S_t² Γ_t^H) dt + Δ_t^H dS_t − Δ_t^H r S_t dt
2

We can solve the above expression for the last square bracket and substitute
into the expression for the bank balance dynamics (2.7). This will produce

exp(−rT) H_T^F − H_0^F = −exp(−rT) Δ_T^H S_T + Δ_0^H S_0
    + ∫_0^T exp(−rt) (dV_t^H − r V_t^H dt)
    − ½ ∫_0^T [(σ^A)² − (σ^H)²] exp(−rt) S_t² Γ_t^H dt

Now we can use (2.6) and the fact that V_T^H = Π(S_T) and Δ_T^H = Π′(S_T) to write the final bank balance in a parsimonious way as

H_T^F = exp(rT) (V_0^I − V_0^H) + Π(S_T) − S_T Π′(S_T)
    − ½ ∫_0^T [(σ^A)² − (σ^H)²] exp(r(T − t)) S_t² Γ_t^H dt

Also, at time T we are holding Δ_T^H = Π′(S_T) shares that we will sell, and also deliver the payoff of the derivative contract Π(S_T). Overall, our profit (or loss) from the Delta hedging strategy will be

P&L = exp(rT) [V(0, S_0; σ^I) − V(0, S_0; σ^H)]
    + ½ ∫_0^T [(σ^H)² − (σ^A)²] exp(r(T − t)) S_t² Γ_t^H dt    (2.8)

Equation (2.8) is very interesting for a number of reasons. If we happen to know (or are able to estimate fairly accurately) the actual volatility σ^A that will prevail over the life of the contract, then by Delta hedging we can lock in as profit the difference between the quote V(0, S_0; σ^I) and the fair value V(0, S_0; σ^A), irrespective of the path of the underlying asset. To do so we should use the actual volatility to compute the Delta of our strategy, V_t^H = V_t^A for all 0 ≤ t ≤ T. This happens of course because in this case our dynamically rebalanced portfolio replicates the true payoff Π(S_T).
It is likely though that we will not know σ A , and in this case we might choose
to hedge using the implied volatility σ I . Then, the first part of P&L vanishes,
and the final profits will depend on the path of the underlying asset, and will
therefore be uncertain. In fact, the sign of the profit will depend on the sign
of ΓtH . For standard calls and puts Γ > 0, which implies that we will always
realize profits if once again the implied (and here also hedging) volatility is
greater than the realized one, σ I = σ H > σ A . The Gamma for standard calls and
puts resembles the underlying probability density, and has its peak around-the-
money. This means that the realized profits will be maximum if the underlying

FIGURE 2.7: Delta hedging with uncertain volatility. An at-the-money put is sold, and subsequently Delta-hedged using the implied volatility. Different trajectories of the underlying asset will generate different profits, with the highest when the asset does not trend.

(a) Asset trajectories (b) Shares long

price does not trend upwards or downwards (as this would render the option in-
or out-of-the-money).
Figure 2.7 gives an example. We are asked to quote an at-the-money Euro-
pean call (S0 = K = $100) with maturity three months.7 The actual volatility
over the life of the option is σ A = 15%, which indicates that the fair value of
this contract is V0A = $2.74. We agree to sell this option at V0I = $3.73, which
implies a volatility σ I = 20%. Essentially the option is overpriced by $0.99. The
figure illustrates three possible trajectories, where the underlying asset moves
up, down or sideways over the life of the option.
We might know the future actual volatility, in which case we can select
σ^H = 15%. If we do not, we can hedge at the implied volatility σ^H = 20%. The following table gives the profits realized along each sample path, with 5,000 rebalances over the three month period (about 60 per day). One can observe that
in the case where the asset does not trend, using σ I outperforms σ A . Also, note
that when the asset moves sideways, even such a frequent rehedging strategy
is not identical to the continuous one.
P&L when asset moves
up down sideways
σ H = σ A = 15% +$0.99 +$0.99 +$0.92
σ H = σ I = 20% +$0.51 +$0.57 +$1.44
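A simulation along these lines is easy to set up. The sketch below (Python rather than Matlab; the GBM path, the random seed and the number of rebalances are choices of this illustration, not the book's exact experiment) sells the put at the implied volatility of 20% and Delta hedges it at the actual volatility of 15%; the terminal P&L then comes out close to the locked-in mispricing of roughly $0.99, as equation (2.8) predicts.

```python
import random
from math import log, sqrt, exp, erf

def N(x):                                  # standard normal CDF
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_put(S, K, T, r, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return K * exp(-r * T) * N(-d2) - S * N(-d1)

def put_delta(S, K, T, r, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    return N(d1) - 1.0

S0 = K = 100.0; T = 0.25; r = 0.02; mu = 0.08
sig_A, sig_I = 0.15, 0.20                  # actual and implied volatility
n = 5000; dt = T / n
random.seed(12345)                         # arbitrary seed for reproducibility

# sell the put at the implied-volatility quote, hedge at the actual volatility
cash = bs_put(S0, K, T, r, sig_I)
S = S0
pos = put_delta(S0, K, T, r, sig_A)        # shares held (negative: short)
cash -= pos * S
for i in range(1, n + 1):
    cash *= exp(r * dt)                    # interest on the bank balance
    S *= exp((mu - 0.5 * sig_A ** 2) * dt
             + sig_A * sqrt(dt) * random.gauss(0.0, 1.0))
    tau = T - i * dt
    if tau > 1e-9:
        new_pos = put_delta(S, K, tau, r, sig_A)
    else:
        new_pos = -1.0 if S < K else 0.0   # Delta of the payoff at expiry
    cash -= (new_pos - pos) * S            # rebalance at the new price
    pos = new_pos
pnl = cash + pos * S - max(K - S, 0.0)     # liquidate shares, pay the payoff
```

Hedging at the implied volatility instead (passing sig_I to `put_delta`) makes the P&L path-dependent, as in the table above.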

VEGA
We have already highlighted the dependence of derivative contracts on the
volatility of the underlying asset. The BS methodology makes the assumption
7
The drift of the underlying asset is µ = 8%, and the risk free rate of interest is r = 2%.


that the volatility is constant across time, but practitioners routinely compute
the sensitivity of their portfolios with respect to the underlying volatility, and
in some cases try to hedge against volatility changes. Of course, in order to be
precise one should start with a model that specifies a process for the volatility
of the asset, and not the BS framework where the volatility is constant. Then,
sensitivities with respect to the spot volatility are in principle computed in a straightforward manner, exactly as we compute the BS Delta.
In practice, practitioners use the Black-Scholes Vega instead. It might appear
counterintuitive to use the derivative with respect to a constant, but it offers a
good (first order) approximation. Unless the rebalancing intervals are too long,
or the volatility behaves in an erratic or discontinuous way, the Vega is fairly
robust and easy to compute and use. We follow the last subsection and consider
the value of the portfolio as a function of the volatility σ (in addition to (t, S)),
V = V (t, S; σ). Then, applying Taylor’s expansion yields

∆V = Θ ∆t + ∆ ∆S + ½ Γ (∆S)² + ν ∆σ + o(∆t, (∆S)², ∆σ)
The underlying asset price does not depend explicitly on the volatility, rendering ν^S = 0. Once more we need to rely on nonlinear contracts, such as options, to make a portfolio Vega-neutral. If we want to achieve a joint Gamma-Vega-neutral hedge, we will have to use two different derivative securities.
Say that we use two options with prices C1 and C2, with known Deltas (Δ^{C1} and Δ^{C2}), Gammas (Γ^{C1} and Γ^{C2}) and Vegas (ν^{C1} and ν^{C2}). We also use the underlying asset to achieve Delta neutrality (of course Δ^S = 1 and Γ^S = ν^S = 0). We want to buy w_{C1} and w_{C2} units of the two derivative securities to achieve Gamma-Vega neutrality. We are therefore faced with the system

Γ^{V+C1+C2} = Γ^V + w_{C1} Γ^{C1} + w_{C2} Γ^{C2} = 0
ν^{V+C1+C2} = ν^V + w_{C1} ν^{C1} + w_{C2} ν^{C2} = 0

This will identify the holdings of the two derivatives

w_{C1} = −(Γ^V ν^{C2} − Γ^{C2} ν^V) / (Γ^{C1} ν^{C2} − Γ^{C2} ν^{C1}),    w_{C2} = −(Γ^{C1} ν^V − Γ^V ν^{C1}) / (Γ^{C1} ν^{C2} − Γ^{C2} ν^{C1})

After that we can adjust our holdings of the underlying asset to make our position Delta-neutral as well. For a European call or put option the BS value of Vega is given by

ν^C = S √t N′(d+)
Graphically, the Vega across different moneyness and maturity levels is given in figure 2.8. It is straightforward to observe that Vega, like Gamma, is more pronounced for at-the-money options. Unlike Gamma though, the Vega drops as we move closer to the maturity. Thus, to achieve Vega neutrality one should incorporate long dated at-the-money options in her portfolio.
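The two-equation system above is linear in the holdings and can be solved directly; a small Python sketch (the Greek values are made up for the illustration) mirrors the closed-form expressions:

```python
def gamma_vega_weights(gV, vV, g1, v1, g2, v2):
    """Holdings (w1, w2) of two options that set the portfolio Gamma and
    Vega to zero:  gV + w1*g1 + w2*g2 = 0  and  vV + w1*v1 + w2*v2 = 0."""
    det = g1 * v2 - g2 * v1
    w1 = -(gV * v2 - g2 * vV) / det
    w2 = -(g1 * vV - gV * v1) / det
    return w1, w2

# hypothetical Greeks of the position (V) and of two traded options
gV, vV = -0.040, -18.0
g1, v1 = 0.050, 20.0
g2, v2 = 0.030, 9.0
w1, w2 = gamma_vega_weights(gV, vV, g1, v1, g2, v2)
assert abs(gV + w1 * g1 + w2 * g2) < 1e-9   # Gamma-neutral
assert abs(vV + w1 * v1 + w2 * v2) < 1e-9   # Vega-neutral
```

The Delta of the combined position is then offset with w_S = −(Δ^V + w1 Δ^{C1} + w2 Δ^{C2}) shares of the underlying asset.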

FIGURE 2.8: Behavior of a call option Vega. Part (a) gives the behavior of the Vega
of options with specifications {K , r, σ} = {100, 0.02, 0.20}, and three different
times to maturity: t = 0.05 (solid), t = 0.25 (dashed) and t = 0.50 (dotted).
Part (b) gives the behavior of the Vega as the time to maturity increases, for a
contract which is at-the-money (S = 100, solid), in-the-money (S = 95, dashed),
and out-of-the-money (S = 105, dotted).

(a) Across Prices (b) Across Time

DIVIDENDS AND FOREIGN EXCHANGE OPTIONS


In the above analysis we have ignored the impact of dividends, just to keep
things simple. When a stock pays a continuous dividend at a constant rate q,
the process of the underlying asset under Q is given by the GBM

dSt = (r − q)St dt + σSt dBtQ

Derivatives will be given once again as expectations P0 = exp(−rT )EQ Π(ST ),


and their pricing function Pt = f(t, St ) will satisfy the PDE

∂f(t, S)/∂t + (r − q) S ∂f(t, S)/∂S + ½ σ² S² ∂²f(t, S)/∂S² = r f(t, S)
The prices and the Greeks can be computed easily following the same steps.
In particular we can summarize the most useful Greeks in the following catalogue,
where h = +1 for calls and h = −1 for puts, and

d± = [log(S/K) + (r − q ± σ²/2)(T − t)] / (σ √(T − t))

• Option price P = V(t, S):
  P = hS e^{−q(T−t)} N(hd+) − hK e^{−r(T−t)} N(hd−)
• Delta Δ = ∂V(t, S)/∂S:
  Δ = h e^{−q(T−t)} N(hd+)
• Theta Θ = −∂V(t, S)/∂t:
  Θ = S N′(d+) σ e^{−q(T−t)} / (2√(T − t)) − hqS N(hd+) e^{−q(T−t)} + hrK e^{−r(T−t)} N(hd−)
• Gamma Γ = ∂²V(t, S)/∂S²:
  Γ = N′(d+) e^{−q(T−t)} / (S σ √(T − t))
• Vega ν = ∂V(t, S; σ)/∂σ:
  ν = S √(T − t) N′(d+) e^{−q(T−t)}
• Rho ρ = ∂V(t, S; r)/∂r:
  ρ = hK (T − t) e^{−r(T−t)} N(hd−)
• Dividend-rho ρ_q = ∂V(t, S; q)/∂q:
  ρ_q = −hS (T − t) e^{−q(T−t)} N(hd+)
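The catalogue translates almost line-by-line into code. The sketch below (Python; the parameter values are chosen only for the check) implements the price, Delta, Gamma and Vega with a continuous dividend yield, and verifies put-call parity, which the h = ±1 convention makes automatic:

```python
from math import log, sqrt, exp, erf, pi

def N(x):                                  # standard normal CDF
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def npdf(x):                               # standard normal density N'(x)
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def bs_greeks(S, K, tau, r, q, sigma, h):
    """Price, Delta, Gamma and Vega with dividend yield q;
    h = +1 for a call, h = -1 for a put; tau = T - t."""
    dp = (log(S / K) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    dm = dp - sigma * sqrt(tau)
    price = h * S * exp(-q * tau) * N(h * dp) - h * K * exp(-r * tau) * N(h * dm)
    delta = h * exp(-q * tau) * N(h * dp)
    gamma = npdf(dp) * exp(-q * tau) / (S * sigma * sqrt(tau))
    vega = S * sqrt(tau) * npdf(dp) * exp(-q * tau)
    return price, delta, gamma, vega

call = bs_greeks(100.0, 100.0, 0.25, 0.02, 0.0, 0.20, +1)
put = bs_greeks(100.0, 100.0, 0.25, 0.02, 0.0, 0.20, -1)
# put-call parity: C - P = S e^{-q tau} - K e^{-r tau}
assert abs((call[0] - put[0]) - (100.0 - 100.0 * exp(-0.02 * 0.25))) < 1e-10
# the call and the put share the same Gamma and Vega
assert abs(call[2] - put[2]) < 1e-12 and abs(call[3] - put[3]) < 1e-12
```

For foreign exchange options the same function applies with r = r^d and q = r^f, as noted below.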

Similar expressions can be derived for foreign exchange rates. In particular, the exchange rate (denominated in the domestic currency) under risk neutrality is assumed to follow the GBM

dS_t = (r^d − r^f) S_t dt + σ S_t dB_t^Q

In essence, a holding of the foreign currency depreciates at the risk free rate differential. Therefore it is straightforward to confirm that the formulas for the option prices and their Greeks above will hold, with r ↦ r^d and q ↦ r^f.

2.5 IMPLIED VOLATILITIES


As we have pointed out a few times already, the parameters of the BS formula are
all considered to be F0 -measurable by assumption. In reality though, although
the current price, the strike price, the maturity and the interest rate are observed
at time t = 0, the volatility σ of the asset price is not. Since an array of call
and put options are also available with prices that are observed at time zero,
one will naturally attempt to invert numerically the BS formula and construct
a series of implied volatilities {σ̂(T , K )} across different maturities and strike
prices. Bajeux and Rochet (1996) show that there is a one-to-one relationship
between implied volatilities and option prices. As pointed out in Dupire (1994),
these implied volatilities will indicate how the underlying asset should vibrate
in the BS world, in order for the contract to be priced correctly. Following our
discussion in the previous section, these volatilities would be natural candidates
to compute the Delta that will hedge the option.
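Since the BS price is strictly increasing in σ, this numerical inversion is a one-dimensional root search; the sketch below (Python, using bisection for robustness) round-trips a price back to its volatility:

```python
from math import log, sqrt, exp, erf

def N(x):                                  # standard normal CDF
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    return S * N(d1) - K * exp(-r * T) * N(d1 - sigma * sqrt(T))

def implied_vol(price, S, K, T, r, lo=1e-6, hi=5.0, tol=1e-10):
    """Invert the BS formula by bisection; the call price is strictly
    increasing in sigma, so the root is unique."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# round-trip: price a call at sigma = 0.25, then recover sigma from the price
p = bs_call(100.0, 105.0, 0.5, 0.02, 0.25)
iv = implied_vol(p, 100.0, 105.0, 0.5, 0.02)
assert abs(iv - 0.25) < 1e-6
```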

If the assumptions underlying the BS formula were correct, then all prices
would be priced according to the BS formula, and therefore we should extract
the same implied volatility from all options (that is for any maturity and strike
price combination). It turns out though, that these implied volatilities are not
constant, and in fact exhibit some very clear and persistent patterns. We will
see that these patterns can be attributed to actual volatilities that are time
varying, to discontinuities in the asset price process, to hedging demands for
specific option contracts, and to liquidity premiums for some specific groups of
options.
The variation of volatility is a well documented feature of asset returns, and
models that incorporate stochastic volatilities, first introduced in Hull and White (1987, HW), give the theoretical background to interpret the implied volatility as the expectation of the average future (realized) volatility over the life of the option.
Say that we are considering the price of a European call, that is Π(S_T) = (S_T − K)+. We assume that the future volatility σ_t is stochastic but independent of the stock price, and also has zero price of risk.8 The main idea of HW is to condition on the average variance over the life of the option, namely the random variable

γ² = (1/T) ∫_0^T σ_t² dt
Then, using the tower property we can write the option price as

P_0 = exp(−rT) E^Q Π(S_T) = exp(−rT) E^Q_γ [E^Q[Π(S_T) | γ]]

where the outermost expectation is with respect to all possible realizations of γ.


It turns out that the conditional option prices are equal to their Black-Scholes counterparts, with σ ↦ γ. Thus, we can write the HW prices as a weighted sum of BS prices

f_HW(t, S; K, T, r, ···) = E^Q_γ f_BS(t, S; K, r, γ)

In the above expression the dots represent parameters that govern the volatility
dynamics, and f· are the corresponding pricing functions.
If we now consider an at-the-money option, where the strike is set at the forward price K_ATM = S exp(rT), then the HW formula will give

f_HW(t, S; K_ATM, r, ···) = E^Q [S (2N((γ/2)√(T − t)) − 1)]

On the other hand, if P_ATM is the observed price, the ATM implied volatility will solve

P_ATM = f_BS(t, S; K_ATM, r, σ̂_ATM) = S (2N((σ̂_ATM/2)√(T − t)) − 1)
2
Assuming that the HW model is the correct model, fHW (t, S; KAT M , r, · · · ) =
PAT M , we have the relationship
8
Intuitively this means that the volatility risk is diversifiable, or that investors are
indifferent to the level of volatility risk. We will come back to these issues later.

N((σ̂_ATM/2)√(T − t)) = E^Q [N((γ/2)√(T − t))]

Assuming short maturities, t ≈ T, the cumulative normal distribution function is approximately linear around zero, which yields the approximate relationship

σ̂_ATM ≈ E^Q √( (1/(T − t)) ∫_t^T σ_s² ds )

Thus the implied ATM volatility is approximately equal to the expected average
volatility over the life of the option.
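This mixing result can be checked numerically. In the sketch below (Python; the two-point distribution for the average volatility is an assumption of the illustration) the HW price is the average of two BS prices, and the ATM implied volatility recovered from it sits close to the expected average volatility of 20%:

```python
from math import log, sqrt, exp, erf

def N(x):                                # standard normal CDF
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    return S * N(d1) - K * exp(-r * T) * N(d1 - sigma * sqrt(T))

S, T, r = 100.0, 0.1, 0.02
K = S * exp(r * T)                       # strike at the forward price (ATM)
scenarios = [0.10, 0.30]                 # equally likely average vols gamma
hw_price = sum(bs_call(S, K, T, r, g) for g in scenarios) / len(scenarios)

# invert the BS formula for the implied volatility by bisection
lo, hi = 1e-6, 2.0
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if bs_call(S, K, T, r, mid) < hw_price:
        lo = mid
    else:
        hi = mid
iv = 0.5 * (lo + hi)
assert abs(iv - 0.20) < 0.005            # close to the expected average vol
```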

2.6 STYLIZED FACTS


The log-normality of the asset price distribution, a result of the GBM that
underlies the BS derivation, is not a satisfactory assumption. In fact, it has been
documented that equity prices do not follow such a distribution ever since the PhD dissertation of Bachelier (1900). Nonetheless, the BS methodology results in a formula that is intuitive and very easy to implement in practice, and
therefore it is widely used both for academic and practical purposes. In fact,
options in exchanges are actually quoted with their implied volatilities rather
than in dollar or sterling terms. In addition, the fact that the volatility of the
underlying asset and the risk free rate of return are assumed constant, simplifies
the exposition, by forcing the markets to be complete.
Testing the BS model gives rise to many theoretical and practical problems.
If we use actual option prices to carry out such tests, we cannot distinguish
between the potential mis-specifications of the pricing formula and market in-
efficiencies. The joint hypothesis that the correct model is used and that the
markets are efficient is necessarily tested (for a discussion see for example Hull,
2003). The fact that at any time a parameter of the BS model is actually un-
observed further complicates things, as it is not clear which one to use. A third
problem arises from the possible asynchronicity of the equity, bond and option markets. If trading does not take place simultaneously, or the markets are very thin, it is questionable whether the assumption of completeness is satisfactory. Not
having data on synchronous transactions in liquid markets distorts the results.
The patterns of the implied volatilities summarize many of the failures of the
BS model, and researchers have been looking at them closely since good quality
data became available. An early analysis is the seminal paper of Rubinstein
(1985), where different patterns of implied volatilities emerge, depending largely
on the particular period that was used, with predominantly a U-shaped pattern
with the lowest point at-the-money. In the more recent work of Rubinstein (1994)
and Jackwerth and Rubinstein (1996) implied volatilities tend to be higher for
out-of-the-money puts and lower for out-of-the-money calls. This emerging pattern of implied volatilities with respect to different measures of moneyness is

* +-,/.0+213-4 567.0+98:84<; =>56@?>4A21/40=>.CB@D>3:5:3<=E? =:=-1GFIH:H<J:J:JKBL=-?>4<1+2; M:=N 30AOBP;>40= H


67(2.6) Q #RO& $>S%!%TVU>! WRX$>SO&T
often encountered in the literature as the implied volatility smile, skew or smirk.
If we create a three-dimensional view of the implied volatility with respect to the
moneyness and the time to maturity we construct the implied volatility surface.
Figure XX presents such a surface based on options data on the FTSE100.
These implied volatility patterns can be attributed to some of the best doc-
umented stylized facts of the distribution and dynamics of asset returns (two
excellent surveys are Bollerslev, Engle, and Nelson, 1994, and Ghysels, Harvey,
and Renault, 1996). Below we shall give a small overview of these features and
discuss how they are reflected on the implied volatility surface.

Leptokurtosis
It has been long observed that asset returns follow a distribution which is far from
normal, in particular one that exhibits a substantial degree of excess kurtosis
or fat tails (Fama, 1965). These fat tails seem to be more pronounced for short
investment horizons (ie intraday, daily or weekly returns), and they tend to
gradually die out for longer ones (ie monthly, quarterly or annual returns). A
distribution with high kurtosis is consistent with the presence of an implied
volatility smile, as it attaches higher probabilities to extreme events, compared
to the normal distribution. If the at-the-money implied volatility is used, then the
BS formula will underprice out-of-the-money puts and calls. A higher implied
volatility is needed for the BS formula to match the market prices. Merton (1976)
among others, notes that a mixture of normal distributions can exhibit fat tails
relative to the normal, and therefore models that result in such distributions
can be used in order to improve on the BS option pricing results. Most (if not
all) modern option pricing models to some extent do exactly that: expressing calendar returns as a mixture of normal distributions.

Skewness
Apart from exhibiting fat tails, some asset return series also exhibit significant
skewness. For stocks and indices this skewness is typically negative, highlighting the fact that the speed at which stock prices drop is higher than the speed at which they grow (although they tend to grow for longer periods than they decline). For currencies the skew is not generally one sided, swinging from positive to negative
and back, over periods of time. The asymmetries of the implied volatility skew can
be attributed to the skewness of the underlying asset returns. If prices are more likely to drop by a large amount than to rise, one would expect out-of-the-money
puts to be relatively more expensive than out-of-the-money calls. Black (1972)
suggests that volatilities and asset returns are negatively correlated, naming
this phenomenon the leverage effect or Fisher-Black effect. Falling stock prices
imply an increased leverage on firms, which is presumed by agents to entail
more uncertainty, and therefore volatility. This asymmetry can generate skewed
returns, but is not always sufficient to explain the very steep implied skews
we observe in (especially index) options markets. A second component that is

* +-,/.0+213-4 567.0+98:84<; =>56@?>4A21/40=>.CB@D>3:5:3<=E? =:=-1GFIH:H<J:J:JKBL=-?>4<1+2; M:=N 30AOBP;>40= H


 !"#%$& '()"  68(2.6)

needed is accommodating for market crashes arriving as jumps in the asset price
process, or even just fears of such crashes (the crash-o-phobia of Bates, 1998).

Volatility features
The fact that volatility is not constant is well documented, and allowing it to be
time varying is perhaps the simplest way to construct models that mix normal
distributions. Empirically, it appears that volatility in the market comes in cy-
cles, where low volatility periods are followed by high volatility episodes. This
feature is known in the literature as volatility clustering. The Arch, Garch and
Egarch families,9 as well as models with stochastic volatility, have been used in
the literature to model the time variation of volatility and model volatility clus-
tering. The survey of Ghysels et al. (1996) gives a good overview of volatility
models from a modeling perspective. Local volatility models take a completely
different approach, as they focus solely on the pricing and hedging of derivatives,
preferring to keep volatility time-varying but deterministic rather than stochastic
(Dupire, 1994). We will discuss these extensions in chapter 6.
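The clustering mechanism of the Arch/Garch family is easy to see in simulation. Below is a minimal sketch (in Python rather than the Matlab used in these notes, and with purely illustrative parameter values) of the Garch(1,1) recursion h_t = ω + α r²_{t−1} + β h_{t−1}: a large shock raises next period's conditional variance, producing the low/high volatility cycles described above.

```python
import random

def simulate_garch(n, omega=0.00002, alpha=0.08, beta=0.90, seed=42):
    """Simulate returns r_t = sqrt(h_t) * z_t with Garch(1,1) variance
    h_t = omega + alpha * r_{t-1}^2 + beta * h_{t-1}.
    Parameter values here are illustrative, not estimates."""
    rng = random.Random(seed)
    h = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    returns, variances = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        r = (h ** 0.5) * z
        returns.append(r)
        variances.append(h)
        h = omega + alpha * r * r + beta * h   # variance reacts to the last shock
    return returns, variances

rets, hs = simulate_garch(2000)
```

Since α + β < 1 the variance mean-reverts, but because β is close to one a burst of volatility decays only slowly, which is exactly the clustering effect.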
The variation of volatility can be linked to the arrivals of information, and
high trading volume (Mandelbrot and Taylor, 1967; Karpoff, 1987, among others).
One can argue that trading does not take place in a uniform fashion across time:
new information will result in a more dense trading pattern with higher trading
volumes, which in turn result in higher volatilities.

Price discontinuities
Even allowing the volatility to be time varying cannot accommodate very
sharp changes in the stock price, typically crashes, which although very rare
events have a significant impact on the behavior of the market. On October 19th,
1987, the S&P500 index lost about 20% of its value within a day and without any
significant warning. If the market were to follow the Black-Scholes assumption
of a GBM with constant volatility, such an event should happen once in 10^87
years.10 Even if we allow the volatility to vary wildly, a model with continuous
sample paths that will exhibit such behavior is not plausible.
Starting with Merton (1976), researchers have been augmenting the diffusive
part of the price process with jump components.

9 Arch here stands for autoregressive conditional heteroscedasticity (Engle, 1982), Garch stands for generalized Arch (Bollerslev, 1986), and Egarch for exponential Garch (Nelson, 1991).
10 This is a very long time. For comparison, the age of our universe is estimated to be about 10^10 years.

3
Finite difference methods

The Black and Scholes (1973, BS) partial differential equation (PDE) is, as we
saw, one of the most fundamental relationships in finance. It is as close to a law
as we can get in a discipline that deals with human activities. The importance
of the expression stems from the fact that it must be satisfied by all derivative
contracts, independently of their contractual features. In some special cases, for
example when the contract in question is a European-style option, the solution
of the PDE can be computed in closed-form, but this is not the general case.
In many real situations we will have to approximate the solution of the PDE
numerically.
If t denotes time and S = S(t) is the value of the underlying asset, the BS
model assumes that S follows a geometric Brownian motion

dS(t) = µS(t)dt + σS(t)dB(t)

It follows that, for any derivative contract, the pricing function f = f(t, S) will
satisfy the BS PDE

−∂f(t, S)/∂t + µS ∂f(t, S)/∂S + (1/2)σ²S² ∂²f(t, S)/∂S² = rf(t, S)    (3.1)
where S is the price of the asset and t is the time to maturity. Equation (3.1)
is not sufficient to uniquely specify f, initial and perhaps a number of boundary
conditions are also needed for (3.1) to admit a unique solution. In fact, different
derivative contracts will impose different initial and boundary conditions, but
(3.1) must be satisfied by all of them. For example, the standard call option will
impose the initial condition

f(0, S) = max(S − K , 0)

Finite difference methods (FDMs) is the generic term for a large number of
procedures that can be used for solving a (partial) differential equation, which
have as a common denominator some discretization scheme that approximates the

required derivatives. In this chapter we will give an overview of these methods


and also examine some examples that illustrate the methodology in financial
engineering. Thomas (1995) gives a detailed overview of different approaches,
together with exhaustive analysis of the consistency, convergence and stability
issues. Wilmott, Dewynne, and Howison (1993) present FDMs within an option
pricing framework.

3.1 DERIVATIVE APPROXIMATIONS


Before we turn to the fully fledged PDE (3.1), let us assume for a moment that
we are given a one-dimensional function h = h(x). Our goal is to provide some
estimate of the derivative of h at the point x̄, namely h′(x̄) = dh(x̄)/dx. We can express
the derivative using three different expressions that involve limits towards x̄:

dh(x̄)/dx = lim_{∆x→0} (h(x̄ + ∆x) − h(x̄))/∆x
         = lim_{∆x→0} (h(x̄) − h(x̄ − ∆x))/∆x
         = lim_{∆x→0} (h(x̄ + ∆x) − h(x̄ − ∆x))/(2∆x)

For a differentiable function all three limits are equal, and suggest three can-
didates for discrete approximations for the derivative. In particular we can con-
struct:
1. The right limit yields the forward differences approximation scheme
   dh(x̄)/dx ≈ (h(x̄ + ∆x) − h(x̄))/∆x
2. The left limit yields the backward differences approximation scheme
   dh(x̄)/dx ≈ (h(x̄) − h(x̄ − ∆x))/∆x
3. The central limit yields the central differences approximation scheme
   dh(x̄)/dx ≈ (h(x̄ + ∆x) − h(x̄ − ∆x))/(2∆x)
These schemes are illustrated in figure 3.1, where the true derivative is also
given for comparison. Of course the approximation quality will depend on the
salient features of the particular function, and in fact, it turns out to be closely
related to the behaviour of higher order derivatives.
Let us now assume that we have discretized the support of h using a uniform
grid, {x_i}, i = −∞, . . . , +∞, with x_i = x_0 + i·∆x, and define the values of the function
h_i = h(x_i). Then, we can introduce the corresponding difference operators D+,
D− and D0, and rewrite the difference approximations in shorthand1 as
1 For us these operators serve as a neat shorthand for the derivative approximations, but there is, in fact, a whole area of difference calculus that investigates and exploits their properties.



FIGURE 3.1: Finite difference approximation schemes. The forward (green), backward (blue) and central (red) differences approximation schemes, together with the true derivative (dashed).

Forward:  D+ h_i = (h_{i+1} − h_i)/∆x
Backward: D− h_i = (h_i − h_{i−1})/∆x
Central:  D0 h_i = (h_{i+1} − h_{i−1})/(2∆x)

What are the properties of these schemes and which one is more accurately
representing the true derivative? A first inspection of figure 3.1 reveals that
the central differences approximation is closer to the true derivative, but is this
generally true? In order to formally assess the quality of the approximations we
will use Taylor expansions of h around the point xi , that is to say the expansions
of the points hi±1 :

h_{i+1} = h_i + ∆x dh(x_i)/dx + (∆x²/2) d²h(x_i)/dx² + (∆x³/6) d³h(x_i)/dx³ + · · ·
h_{i−1} = h_i − ∆x dh(x_i)/dx + (∆x²/2) d²h(x_i)/dx² − (∆x³/6) d³h(x_i)/dx³ + · · ·
Substituting the corresponding values in the approximation schemes will yield
the important relationships




D+ h_i = dh(x_i)/dx + o(∆x)
D− h_i = dh(x_i)/dx + o(∆x)
D0 h_i = dh(x_i)/dx + o(∆x²)
In the above expressions we introduce the big-O notation, where o(∆xⁿ)
includes all terms of order ∆xⁿ and smaller.2 Now since |∆x²| ≪ |∆x| around
zero, it follows that the terms |o(∆x²)| ≪ |o(∆x)|, which means that central
differences are more accurate than forward or backward differences. We say
that central differences are second order accurate while forward and backward
differences are first order accurate.
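These accuracy orders are easy to verify numerically. The sketch below (in Python for illustration; the test function h(x) = eˣ is an arbitrary choice, not from the notes) halves ∆x and checks that the forward-difference error roughly halves while the central-difference error drops by a factor of about four.

```python
import math

def forward(h, x, dx):
    return (h(x + dx) - h(x)) / dx             # first order accurate

def central(h, x, dx):
    return (h(x + dx) - h(x - dx)) / (2 * dx)  # second order accurate

h, dh = math.exp, math.exp     # h(x) = e^x, so the exact derivative is e^x
x = 1.0
err = lambda scheme, dx: abs(scheme(h, x, dx) - dh(x))

rf = err(forward, 0.1) / err(forward, 0.05)    # ~2: error halves with dx
rc = err(central, 0.1) / err(central, 0.05)    # ~4: error quarters with dx
```

The ratios match the o(∆x) and o(∆x²) claims: halving the step improves the forward scheme by one binary digit, the central scheme by two.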
Therefore, without any further information on the function, we should use
central differences where possible. If we have some extra information, perhaps
using one-sided derivatives might be beneficial. Such cases could arise when the
drift term dominates the PDE, or alternatively when the volatility is very small.
In our setting though we will concentrate on approximations that use central
differences as their backbone.
The BS PDE also involves second order derivatives, on top of the first order
ones. We therefore need to establish an approximation scheme for these second
derivatives. Once we have achieved that, we will be able to proceed to the actual
discretization of the BS PDE (3.1). Since we are trying to establish second order
accuracy, we are looking for a scheme that approximates the second derivatives
using central differences.
It turns out that an excellent choice is an approximation that takes central
differences twice over a half-step ∆x/2:

D² h_i = (dh(x_{i+1/2})/dx − dh(x_{i−1/2})/dx)/∆x = ((h_{i+1} − h_i)/∆x − (h_i − h_{i−1})/∆x)/∆x = (h_{i+1} − 2h_i + h_{i−1})/∆x²

Using the same substitutions from the Taylor expansions as above yields

D² h_i = d²h(x_i)/dx² + o(∆x²)
Therefore, we conclude that the operator D² is second order accurate. In addition,
D² has the advantage that in order to compute it we use the same values that
were needed for the first difference D0, namely h_{i±1}, and the value h_i.
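A quick numerical check of the second-difference operator (again an illustrative Python sketch; h = sin is an arbitrary test function, for which the exact second derivative is known):

```python
import math

def second_diff(h, x, dx):
    # central difference taken twice over a half-step:
    # (h_{i+1} - 2 h_i + h_{i-1}) / dx^2
    return (h(x + dx) - 2.0 * h(x) + h(x - dx)) / (dx * dx)

x = 0.7
approx = second_diff(math.sin, x, 1e-3)
exact = -math.sin(x)           # (sin x)'' = -sin x
err = abs(approx - exact)      # of order dx^2, here around 1e-8
```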

2 Formally, if a function g = g(x) is o(∆xⁿ) then the limit of the ratio |g(x)|/|∆xⁿ| < C < ∞ (meaning that it is bounded) as ∆x → 0. Intuitively, g(x) approaches zero at the same speed as ∆xⁿ. We say that g is of order n.



3.2 PARABOLIC PDES
The BS PDE belongs to a wide and well documented class of PDEs called
parabolic partial differential equations. Many important natural phænomena
are associated with parabolic PDEs, ranging from the heat equation to
Schrödinger's description of quantum mechanics. In order to simplify the subsequent
notation we use the shorthand elliptic operator

L f(t, x) = α(t, x) ∂f(t, x)/∂x + β(t, x) ∂²f(t, x)/∂x² + γ(t, x) f(t, x)
for general functionals α, β and γ. Therefore the BS PDE will be of the general
form
∂f(t, S)/∂t = L f(t, S)    (3.2)
Suppose that we work on a grid x = {x_j}, j = −∞, . . . , +∞, with constant grid spacing
equal to ∆x. We will concentrate on an initial value problem, and therefore assume
that the function extends over the whole real line. The problem of boundary
conditions will be addressed later in this chapter. We will also define the value
function at the grid points, f_j(t) = f(t, x_j), for j = −∞, . . . , +∞. We construct the
discretized operator by applying the differences D0 and D²
discretized operator by applying the differences D0 and D2

Lfj (t) = αj (t) · D0 fj (t) + βj (t) · D2 fj (t) + γj (t) · fj (t)

In the above expression the functionals α_j, β_j and γ_j are just the restrictions of
α, β and γ on the grid point x_j. Substituting the difference operators gives

Lf_j(t) = α_j(t) · (f_{j+1}(t) − f_{j−1}(t))/(2∆x) + β_j(t) · (f_{j+1}(t) − 2f_j(t) + f_{j−1}(t))/∆x² + γ_j(t) · f_j(t)
Our goal was to construct a discretized operator that, in some sense, converges
to the actual operator as the discretization becomes finer, or somehow
“L → L ”. Essentially, since we want to establish convergence we will need
a measure of distance between the operators. We will discuss these issues in
more detail in section 3.2, following the introduction to the explicit method. After
establishing this convergence we will move forward and approximate the PDE
itself at the point x_j with

∂f_j(t)/∂t = Lf_j(t) ⇔
∂f_j(t)/∂t = q_j^+(t) · f_{j+1}(t) + q_j^0(t) · f_j(t) + q_j^−(t) · f_{j−1}(t)    (3.3)

The functionals q_j^±(t) and q_j^0(t) depend on the structure of the PDE and are
given by




q_j^±(t) = ±α_j(t)/(2∆x) + β_j(t)/∆x²
q_j^0(t) = γ_j(t) − 2β_j(t)/∆x²

A PDE AS A SYSTEM OF ODES


Since (3.3) will hold for all grid points {x_j}, j = −∞, . . . , +∞, we have represented the
discretized problem (3.3) as a system of ODEs, which can be cast in matrix form
for f(t) = {f_j(t)}, j = −∞, . . . , +∞:

∂f(t)/∂t = Q(t) · f(t)    (3.4)

subject to the initial condition f(0) = {f(x_j, 0)}. Equation (3.4) describes the
evolution of the pricing function f as the time to maturity increases. From this
point on we will make the additional assumption that the matrix Q(t) is time-
invariant, Q(t) = Q. Time dependence can be accommodated in a straightforward
way in the numerical implementation. The matrix Q is tridiagonal, in particular
    ⎡   . . .                                         ⎤
Q = ⎢   q_{−1}^−  q_{−1}^0  q_{−1}^+  0        0      ⎥
    ⎢   0         q_0^−     q_0^0     q_0^+    0      ⎥
    ⎢   0         0         q_1^−     q_1^0    q_1^+  ⎥
    ⎣                                 . . .           ⎦
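As an illustration of the discretized operator, the sketch below (Python, with arbitrarily chosen constant functionals α, β, γ, not values from the notes) applies the tridiagonal stencil q^+, q^0, q^− to f(x) = sin x and compares the result with the exact elliptic operator L f = αf′ + βf″ + γf:

```python
import math

# arbitrarily chosen constant coefficients for the elliptic operator
alpha, beta, gamma = 0.1, 0.5, -0.05
dx, N = 0.01, 201
xs = [j * dx for j in range(N)]

qp = alpha / (2 * dx) + beta / dx ** 2    # q_j^+
qm = -alpha / (2 * dx) + beta / dx ** 2   # q_j^-
q0 = gamma - 2 * beta / dx ** 2           # q_j^0

f = [math.sin(x) for x in xs]
# tridiagonal stencil applied at the interior grid points
Lf = [qp * f[j + 1] + q0 * f[j] + qm * f[j - 1] for j in range(1, N - 1)]
# exact operator: alpha*f' + beta*f'' + gamma*f, with f = sin, f'' = -sin
exact = [alpha * math.cos(x) + (gamma - beta) * math.sin(x) for x in xs[1:-1]]
err = max(abs(a - b) for a, b in zip(Lf, exact))   # of order dx^2
```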

There is a large number of solvers for such systems. We will consider methods
that apply time discretization as well, and therefore work on a two-dimensional
grid.

THE GRID
In equation (3.4) we converted the PDE in question into a system of infinitely many
ODEs. Clearly it is not feasible in practice to numerically solve systems
with an infinite number of equations. We will therefore need to truncate the grid
and consider a subset with Nx elements x = {x_j}, j = 1, . . . , Nx. This means that we will
need to take special care on the treatment of the numerical approximations at
the artificial boundaries x_1 and x_{Nx}. We will discuss these issues in detail in
section 3.2.
Also, to construct a two-dimensional grid we need to discretize across time
as well, using Nt points that define subintervals of constant width ∆t, {t_i}, i = 1, . . . , Nt.
Figure 3.2 illustrates such a grid, together with a view of a function surface
that we could reconstruct over that grid. It is important to note that neither the
space nor the time grid has to be uniform. One can, and in some cases should,
consider non-uniform grids based on some qualitative properties of the PDE at
hand.



FIGURE 3.2: A two-dimensional grid.




EXPLICIT FINITE DIFFERENCES

FIGURE 3.3: The Explicit FDM.
As we noted, equation (3.4) describes the dynamic evolution of derivative


prices, subject to initial and perhaps boundary conditions. Our time discretiza-
tion has that objective as well: given the pricing function values at time t i we
should be able to determine the function values at time t i+1 . Therefore, starting
from the initial values at time t0 = 0, we recursively produce the values at t1 ,
t2 , and so on.
At first glance using central differences is not feasible, since the central
difference at the time point t0 needs the values at t1 and t−1 to be determined,
but the values at t−1 are unavailable. On the other hand forward differences in
time will do, as we only need the values at t0 and t1 to form them. In order to
condense notation we will use f_j^i to denote the value f(t_i, x_j). We also assume a
uniform grid with spacings ∆x and ∆t, although it is not a lot harder to work
over non-uniform grids.3 Then, by approximating ∂f(t_i, x_j)/∂t ≈ (f_j^{i+1} − f_j^i)/∆t, we derive the
explicit finite difference method

(f_j^{i+1} − f_j^i)/∆t = Lf_j(t_i) = q_j^+ f_{j+1}^i + q_j^0 f_j^i + q_j^− f_{j−1}^i    (3.5)

We can explicitly solve4 the above expression for f_j^{i+1}, which yields the
recursive relationship
3 Just a lot messier.
4 Hence the name!



f_j^{i+1} = q_j^+ ∆t f_{j+1}^i + (1 + q_j^0 ∆t) f_j^i + q_j^− ∆t f_{j−1}^i

Essentially, the values f_{j±1}^i and f_j^i determine the next period's value f_j^{i+1}. This is
schematically depicted in figure 3.3. In matrix form, the updating takes place as

f^{i+1} = (I + Q∆t) · f^i    (3.6)
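A matrix-free version of the update (3.6) only needs the three stencil weights per point. A small sketch (Python; the coefficient values and the Dirichlet pad values are made up purely for illustration):

```python
def explicit_step(f, qp, q0, qm, dt, fb_lo, fb_hi):
    # one update f_j <- f_j + dt*(qp*f_{j+1} + q0*f_j + qm*f_{j-1}),
    # padding the grid ends with known Dirichlet values fb_lo, fb_hi
    ext = [fb_lo] + list(f) + [fb_hi]
    return [ext[j] + dt * (qp * ext[j + 1] + q0 * ext[j] + qm * ext[j - 1])
            for j in range(1, len(ext) - 1)]

# with q0 = -(qp + qm) the stencil annihilates constants, so a flat
# function consistent with the boundary values is left unchanged
f_new = explicit_step([1.0] * 5, qp=2.0, q0=-4.0, qm=2.0,
                      dt=0.01, fb_lo=1.0, fb_hi=1.0)
```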

Now we turn to the BS PDE (3.1) and apply this discretization scheme. To
simplify the expressions we perform the change of variable x = log S. This will
transform the PDE into one with constant coefficients, namely

−∂f(t, x)/∂t + α ∂f(t, x)/∂x + (1/2)σ² ∂²f(t, x)/∂x² = rf(t, x)

with α = r − σ²/2. The coefficients q_j^± and q_j^0 in the system of ODEs (3.4), which
also determine the explicit scheme (3.6), become

q_j^± = ±α/(2∆x) + σ²/(2∆x²)
q_j^0 = −r − σ²/∆x²
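One useful sanity check on these coefficients (an observation added here, not from the notes): their sum is exactly −r, because the drift and diffusion parts cancel across the row. Applied to a flat pricing function, the explicit update therefore just discounts it by (1 − r∆t) per step. In Python:

```python
r, sigma, dx = 0.05, 0.2, 0.01          # illustrative parameter values
alpha = r - 0.5 * sigma ** 2            # drift of the log-price

qp = alpha / (2 * dx) + sigma ** 2 / (2 * dx ** 2)   # q_j^+
qm = -alpha / (2 * dx) + sigma ** 2 / (2 * dx ** 2)  # q_j^-
q0 = -r - sigma ** 2 / dx ** 2                       # q_j^0

# the +/- alpha terms and the sigma^2/dx^2 terms cancel, leaving -r
rowsum = qp + q0 + qm
```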

STABILITY AND CONVERGENCE


By constructing a FDM, like the explicit scheme, we use derivative approxima-
tions to reconstruct the true, but unknown, pricing function f(t, S). The outcome
of the FDM is a set of prices at time t_i, namely f^i = {f_j^i}, j = 1, . . . , Nx, for all
i = 1, . . . , Nt. The natural question is of course how close the values f_j^i are to
the true prices f(t_i, x_j). If we are to use such a scheme in practice we need to
be convinced that somehow “f_j^i → f(t_i, x_j)” as the discretization becomes finer.
Also, if we are to put some trust in this approximation we should have an idea
about the order of this convergence.
One straightforward way would be to examine how the pointwise errors
between the true prices and their approximation behave. If we denote the true
prices with f̃^i = {f(t_i, x_j)}, j = 1, . . . , Nx, then the errors in question would be the differences

ε^i = f^i − f̃^i

We can investigate the convergence f^i → f̃^i by inspecting the ℓ∞-norm, namely5
‖ε^i‖ = max_{j=1...Nx} |f_j^i − f(t_i, x_j)|. Apparently, if the maximum (absolute) value
converges to zero, then all other values will do so as well, and the FDM prices will
converge to the true ones.
Before we move to the inspection of the global errors, we first examine the
local truncation error, defined as the discrepancy between the true parabolic
5 In some cases it is more convenient to work with the ℓ1-, ℓ2- or ℓp-norm. The choice largely depends on the problem in hand. See XXXX for details.




PDE (3.2) and the approximated one (3.5), evaluated at the true pricing function
at the point (ti , xj )
τ_j^i = (∂f(t_i, x_j)/∂t − L f(t_i, x_j)) − ((f(t_{i+1}, x_j) − f(t_i, x_j))/∆t − Lf(t_i, x_j))

The definitions and the properties of the difference operators yield that the
truncation error τ_j^i = o(∆t, ∆x²). We therefore say that the explicit method is
first order accurate in time and second order accurate in space. Intuitively, this
truncation error would tell us how errors will be created over one step, if we
start from the correct function values. Any scheme that offers order of accuracy
greater than zero is called consistent.
Of course, even if small errors are created over a given time step, they
can still accumulate as we move from one time step to the next. It is possible
that they produce feedback effects, producing errors that grow exponentially in
time, destroying the approximate solutions and creating oscillatory or explosive
behaviour. On the other hand we might construct a FDM that has errors that
behave in a “nice” way, without feedback effects. The notion of stability captures
these ideas.
One intuitive way of looking at stability is through the Courant-Friedrichs-
Lewy (CFL) condition6 , which is based on the notion of the domain of depen-
dence. If we have a function f(t, S), then the domain of dependence of the point
(t*, S*) is the set of points

F(t*, S*) = {(t, S) : t > t* and f(t, S) depends on the value f(t*, S*)}

The CFL criterion states that if a numerical scheme is stable, then the true
domain of dependence must be smaller than the domain of dependence of the
approximating scheme.
In parabolic PDEs the domain of dependence of the process is unbounded,
since information travels instantaneously across all values. The domain of de-
pendence of the explicit FDM is bounded, since each value at time t i+1 will only
depend on three of its neighbouring values at time t_i. Therefore, according to the
CFL criterion, in order for the scheme to be stable the condition ∆t = o(∆x²)
must be satisfied.7 Therefore the explicit scheme will not be unconditionally
stable, and will need very small time discretization steps to offer stability.
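The conditional stability is easy to demonstrate on the plain heat equation u_t = u_xx, where the explicit update is u_j ← u_j + ν(u_{j+1} − 2u_j + u_{j−1}) with ν = ∆t/∆x². A hedged sketch (Python; grid sizes and ν values are chosen only to illustrate the threshold ν ≤ 1/2):

```python
def explicit_heat(nu, steps, n=21):
    # explicit scheme for u_t = u_xx with nu = dt/dx^2 and zero boundaries;
    # the initial sawtooth is the mode the scheme damps (or amplifies) most
    u = [0.0] + [(-1.0) ** j for j in range(1, n - 1)] + [0.0]
    for _ in range(steps):
        v = u[:]                                   # current time level
        for j in range(1, n - 1):
            u[j] = v[j] + nu * (v[j + 1] - 2.0 * v[j] + v[j - 1])
    return max(abs(x) for x in u)

grow = explicit_heat(0.7, 60)    # nu > 1/2: the sawtooth mode is amplified
decay = explicit_heat(0.4, 60)   # nu <= 1/2: max-norm never increases
```

With ν ≤ 1/2 each new value is a convex combination of its neighbours, so the maximum can never grow; above the threshold the feedback described in the text produces an oscillating, exploding solution.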
The connection between local errors, global errors and stability is given by
the Lax equivalence theorem which states that a FDM which is consistent and
stable will be convergent. This means that the explicit method is not (always)
convergent.
6
Stated in 1928, long before any stability issues were discussed in this context. Richard-
son initiated FDM schemes as back as 1922 for weather prediction, but did not discover
any stability problems.
7
This means that the time grid must become finer a lot faster than the space grid, for
the information to rapidly reach remote values.



IMPLICIT FINITE DIFFERENCES

FIGURE 3.4: The Implicit FDM.

One way to overcome the stability issues is to use a backward time step.
Rather than taking a forward time step at time t_i, we take a backward step from
time t_{i+1}. This is equivalent to computing the space derivatives at time t_{i+1} as
shown below

(f_j^{i+1} − f_j^i)/∆t = q_j^+ f_{j+1}^{i+1} + q_j^0 f_j^{i+1} + q_j^− f_{j−1}^{i+1}

This equation relates three quantities at time t_{i+1} and one quantity at time t_i,
which is schematically given in figure 3.4.
Since we are facing one equation with three unknowns we cannot explicitly
give a solution, but we can form a system:

−q_j^+ ∆t f_{j+1}^{i+1} + (1 − q_j^0 ∆t) f_j^{i+1} − q_j^− ∆t f_{j−1}^{i+1} = f_j^i

Note that the number of system equations will be equal to the number of unknowns.
In matrix form, the system can be written as

f^i = (I − Q∆t) · f^{i+1} ⇔ f^{i+1} = (I − Q∆t)^{−1} · f^i    (3.7)
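In practice one never forms (I − Q∆t)^{−1} explicitly: since the matrix is tridiagonal, the system can be solved in O(N) operations with the Thomas algorithm. A sketch (Python; the coefficients qp, q0, qm are hypothetical values, not from the notes):

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system: a = sub-, b = main, c = super-diagonal.
    a[0] and c[-1] are unused; returns the solution vector."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                 # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# one implicit step: (I - Q*dt) f_new = f_old, for hypothetical q values
qp, q0, qm, dt, n = 2.0, -4.0, 2.0, 0.01, 5
a = [-qm * dt] * n                 # sub-diagonal of (I - Q dt)
b = [1.0 - q0 * dt] * n            # main diagonal
c = [-qp * dt] * n                 # super-diagonal
f_old = [1.0] * n
f_new = thomas(a, b, c, f_old)
```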

The same line of argument we used for the explicit method will give us the
order of accuracy of the implicit scheme, the errors being again o(∆t, ∆x²). On
the other hand, since the value at time t_{i+1} depends on the whole set of prices f^i
at time t_i, the domain of dependence of the implicit scheme is unbounded. From
the CFL criterion it follows that the implicit scheme is unconditionally stable.




THE CRANK-NICOLSON AND THE θ-METHOD

FIGURE 3.5: The Crank-Nicolson FDM.

Although the implicit scheme is unconditionally stable, it still offers convergence
of order o(∆t, ∆x²). The first order convergence in time is due to the
nature of the derivative approximation. We can increase this order to two by
setting up a central difference scheme in time. We will use a time step of ∆t/2, as
we did in the approximation of the second order derivative.
This is equivalent to taking the space derivatives at the midpoint between t_i
and t_{i+1}. This yields the Crank-Nicolson scheme
(f_j^{i+1} − f_j^i)/∆t = q_j^+ · (f_{j+1}^i + f_{j+1}^{i+1})/2 + q_j^0 · (f_j^i + f_j^{i+1})/2 + q_j^− · (f_{j−1}^i + f_{j−1}^{i+1})/2
Apparently the Crank-Nicolson scheme will relate six points, illustrated in
figure 3.5. Another approach is to simply add one-half times equation (3.6) and
one-half times the first of equations (3.7). This yields again the Crank-Nicolson
scheme, in matrix form

(I − (1/2) Q∆t) · f^{i+1} = (I + (1/2) Q∆t) · f^i
Since it uses centered differences to approximate all derivatives, the errors in the
Crank-Nicolson scheme are o(∆t², ∆x²). Therefore, the Crank-Nicolson scheme
is second order accurate both in time and space. In addition, like the implicit



scheme, the Crank-Nicolson scheme has unbounded domain of dependence, and
is therefore unconditionally stable.
Rather than using weights of 1/2 to balance the explicit and implicit schemes,
one can use different values. This gives rise to the θ-method, which encompasses
all schemes described so far. In particular, the θ-method in matrix form will be

(I − θQ∆t) · f^{i+1} = (I + (1 − θ)Q∆t) · f^i

It is straightforward to verify that θ = 0 yields the explicit scheme, θ = 1 yields
the implicit scheme, and θ = 1/2 yields the Crank-Nicolson scheme.
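Putting the pieces together, the θ-method can be condensed into a few dozen lines. The sketch below (in Python rather than the notes' Matlab; the grid sizes, the domain half-width L and the Dirichlet boundary values for a call are choices made here for illustration) prices a European call with Crank-Nicolson (θ = 1/2) and compares it against the Black-Scholes closed form:

```python
import math

def bs_call(S, K, r, sig, T):
    # Black-Scholes closed form, used as the benchmark
    d1 = (math.log(S / K) + (r + 0.5 * sig * sig) * T) / (sig * math.sqrt(T))
    d2 = d1 - sig * math.sqrt(T)
    N = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def tridiag_solve(a, b, c, d):
    # Thomas algorithm (a: sub-, b: main, c: super-diagonal)
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def theta_call(S0, K, r, sig, T, th=0.5, Nx=400, Nt=200, L=3.0):
    # theta-method on the log-price grid x in [log(S0)-L, log(S0)+L]
    x0, dx, dt = math.log(S0), 2.0 * L / Nx, T / Nt
    xs = [x0 - L + j * dx for j in range(Nx + 1)]
    al = r - 0.5 * sig * sig
    qp = al / (2 * dx) + sig ** 2 / (2 * dx ** 2)
    qm = -al / (2 * dx) + sig ** 2 / (2 * dx ** 2)
    q0 = -r - sig ** 2 / dx ** 2
    f = [max(math.exp(x) - K, 0.0) for x in xs]       # initial condition
    sub = [-th * dt * qm] * (Nx - 1)                  # rows of (I - th Q dt)
    main = [1.0 - th * dt * q0] * (Nx - 1)
    sup = [-th * dt * qp] * (Nx - 1)
    for i in range(1, Nt + 1):
        # Dirichlet values at the new time-to-maturity level
        lo, hi = 0.0, math.exp(xs[-1]) - K * math.exp(-r * i * dt)
        d = [f[j] + (1 - th) * dt * (qp * f[j + 1] + q0 * f[j] + qm * f[j - 1])
             for j in range(1, Nx)]
        d[0] += th * dt * qm * lo       # fold known boundary values in
        d[-1] += th * dt * qp * hi
        f = [lo] + tridiag_solve(sub, main, sup, d) + [hi]
    return f[Nx // 2]                   # grid point at x0 = log(S0)

exact = bs_call(100.0, 100.0, 0.05, 0.2, 0.5)
approx = theta_call(100.0, 100.0, 0.05, 0.2, 0.5)
```

With these grid sizes the finite difference price lands within a few cents of the closed form, consistent with second order accuracy in both time and space.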

BOUNDARIES
The above treatment of parabolic PDEs assumed that the space extends over
the real line. Essentially this implies that the matrices involved are of infinite
dimensions. Of course in practice we will be faced with finite grids. Sometimes,
as is the case with barrier options, boundary conditions will be explicitly imposed
by the nature of the derivative contract. In other cases, when the derivative has
early exercise features, the boundary is not explicitly defined and is free in the
sense that it is determined simultaneously with the solution of the PDE.
There are two different kinds of fixed boundary conditions: Dirichlet conditions
set f(t_B, x_B) = f_B, that is, the value of the function is known on the boundary.
Neumann conditions set ∂f(t_B, x_B)/∂x = φ_B, that is, the derivative is known on the
boundary. In the second case we will need to devise an approximation scheme
that exhibits o(∆x²) accuracy; if we do not achieve that, then contaminated values
will diffuse and eventually corrupt the function values at all grid points.
This means that we must use finite difference schemes that achieve o(∆x²), like
central differences.
Say that we construct a finite space grid {x_i}, i = 0, . . . , Nx, which essentially discretizes
the interval [x_0, x_{Nx}]. Most of the elements of the matrix Q are not affected by the
boundary conditions, and the matrix is still tridiagonal. The only parts that are
determined by the fixed boundary conditions are the first and last rows. Thus Q
will have the form
    ⎡  ★           ★           0                            ⎤
    ⎢  q_1^−       q_1^0       q_1^+       0                ⎥
    ⎢  0           q_2^−       q_2^0       q_2^+       0    ⎥
    ⎢              . . .                                    ⎥
Q = ⎢  0           q_j^−       q_j^0       q_j^+       0    ⎥
    ⎢              . . .                                    ⎥
    ⎢  0           q_{Nx−1}^−  q_{Nx−1}^0  q_{Nx−1}^+       ⎥
    ⎣  0           ★           ★                            ⎦

where the values marked ★ are determined by the boundary conditions.




We start with a Dirichlet condition at f_{Nx+1}^i = f(t_i, x_{Nx+1}) = f_B^i. This point is
utilized in the explicit scheme when the value f_{Nx}^{i+1} at time t_{i+1} is calculated. In
particular

f_{Nx}^{i+1} = q_{Nx}^+ ∆t f_B^i + (1 + q_{Nx}^0 ∆t) f_{Nx}^i + q_{Nx}^− ∆t f_{Nx−1}^i

Therefore, in matrix form, the updating equation for the explicit scheme becomes

f^{i+1} = (I + Q∆t) · f^i + g^i ∆t

where the last row of Q is (0, · · · , 0, q_{Nx}^−, q_{Nx}^0), and g^i is an (Nx + 1) × 1
vector of zeros, with the last element equal to q_{Nx}^+ f_B^i. Similarly, a Dirichlet
boundary condition at x_0 will set the first row of Q to (q_0^0, q_0^+, 0, · · · , 0), and the
first element of g^i to q_0^− f_B^i.
Within the implicit scheme this boundary would appear in the (Nx + 1)-equation system
that determines the function values at time t_{i+1}

f_{Nx}^i = −q_{Nx}^+ ∆t f_B^{i+1} + (1 − q_{Nx}^0 ∆t) f_{Nx}^{i+1} − q_{Nx}^− ∆t f_{Nx−1}^{i+1}

Therefore, when the Crank-Nicolson method is implemented, f_B^i will affect
both pricing formulas at t_i and t_{i+1}. Similar formulas can be easily computed for
the lower boundary x_0, where the ghost point x_{−1} is introduced.
When a Neumann condition is imposed at x_{Nx}, we apply central differences
at the point x_{Nx} to approximate ∂f(t_i, x_{Nx})/∂x = φ_B^i, which yields

f_{Nx+1}^i = f_{Nx−1}^i + 2φ_B^i ∆x

These values can be used in the approximation schemes to set up the last row of
Q, namely (0, · · · , 0, q_{Nx}^− + q_{Nx}^+, q_{Nx}^0), and the last element of g^i equal to
2 q_{Nx}^+ φ_B^i ∆x. Similarly, a Neumann boundary condition at x_0 will set the first
row of Q to (q_0^0, q_0^+ + q_0^−, 0, · · · , 0), and the first element of g^i to −2 q_0^− φ_B^i ∆x.
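The ghost-point elimination can be checked mechanically: using the ghost value directly and using the folded row with its extra constant must give the same number. A small sketch (Python, with hypothetical coefficient values):

```python
# hypothetical coefficients for the last grid point
qp, q0, qm, dx, phi = 1.5, -3.2, 1.5, 0.1, 0.7

def row_with_ghost(f_prev, f_last):
    # raw stencil using the ghost value f_ghost = f_prev + 2*phi*dx
    f_ghost = f_prev + 2.0 * phi * dx
    return qp * f_ghost + q0 * f_last + qm * f_prev

def row_folded(f_prev, f_last):
    # ghost point eliminated: row (qm + qp, q0) plus constant 2*qp*phi*dx
    return (qm + qp) * f_prev + q0 * f_last + 2.0 * qp * phi * dx

lhs = row_with_ghost(2.0, 3.0)
rhs = row_folded(2.0, 3.0)
```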

3.3 A PDE SOLVER IN MATLAB


PLAIN VANILLA OPTIONS
In this section we will build a Matlab example that implements the θ-method.
We assume that the dynamics of the underlying asset are the ones that govern
the BS paradigm. We will need a payoff function G. If we also assume Dirichlet
boundary conditions, then the same function will determine the Dirichlet boundaries.
If Neumann conditions are specified, then the same function should also
give the derivatives on the boundaries. Since we have set up the PDE in terms of
the log-price, which we denote in the solver with x, the derivatives will be equal
to ∂f/∂x = (∂f/∂S)(∂S/∂x) = (∂f/∂S) exp(x). A call option will be implemented by the function in
listing 3.1. A put option is implemented in listing 3.2.8




LISTING 3.1: fBSCall.m: Payoff and boundaries for a call.

% fBSCall.m
function [v, bv, dv] = fBSCall(x, p)
K  = p.K;
dx = x(2) - x(1);
v  = max( exp(x) - K, 0 );     % payoff function
bv = NaN;                      % boundary values
dv = [0, exp(x(end))];         % boundary derivative

LISTING 3.2: fBSPut.m: Payoff and boundaries for a put.

% fBSPut.m
function [v, bv, dv] = fBSPut(x, p)
K  = p.K;
dx = x(2) - x(1);
v  = max( K - exp(x), 0 );     % payoff function
bv = NaN;                      % boundary values
dv = [-exp(x(1)), 0];          % boundary derivative


The initialization part of the PDE solver just decomposes the structure and
constructs the log-price and the time grids. The function returns the payoff values,
the boundary values and the derivatives on the boundaries. Therefore both
Dirichlet and Neumann conditions can be accommodated for. The tridiagonal Q
matrix will be constructed according to whether we have specified Dirichlet or
Neumann boundary conditions. We use the switch boundtype that keeps the
boundary type. At this stage we assume that the same boundary applies to all
time steps. The Matlab code for the PDE solver is given in listing 3.3.
Here we use the Matlab backslash operator A\B = inv(A) ∗ B. The snippet in
3.4 illustrates how the function can be called to compute the price of a European
put, and plots the pricing function. Setting boundtype = 1 will implement the
PDE solver with Dirichlet boundary conditions.

EARLY EXERCISE FEATURES


In many cases the derivative in hand has early exercise features, either American
(where the option can be exercised at any point prior to maturity), or Bermudan

8 We will implement the solver using Neumann conditions, and therefore we pass the boundary values as NaN. Actually, for the put price the corresponding Dirichlet boundary condition is not time homogeneous, and our solver will need slight modifications to accommodate time inhomogeneous boundaries.




LISTING 3.3: pdesbs.m: θ-method solver for the Black-Scholes PDE.

% pdesbs.m
function [xv, tv, FT] = pdesbs(f, p)
theta = p.theta;                % theta for theta-method
r     = p.r;                    % risk free rate
sigma = p.sigma;                % volatility
a     = r - .5*sigma*sigma;
T     = p.t;                    % maturity
Nt    = p.tnumber;              % time intervals
dt    = T/Nt;                   % time grid size
tv    = [0:dt:T];               % time grid
bx    = p.xboundary;            % max log-price
Nx    = p.xnumber;              % log-price intervals
dx    = 2*bx/Nx;                % log-price grid size
xv    = [-bx:dx:bx];            % log-price grid
boundtype = p.boundtype;
[f0, bf0, df0] = feval(f, xv, p);            % f call
Qp = .5*a/dx + .5*sigma*sigma/dx/dx;         % Q-plus
Qm = -.5*a/dx + .5*sigma*sigma/dx/dx;        % Q-minus
Q0 = -r - sigma*sigma/dx/dx;                 % Q-zero
% matrix Q
Q = diag(Q0*ones(Nx+1,1)) + diag(Qp*ones(Nx,1),+1) + ...
    diag(Qm*ones(Nx,1),-1);
g = zeros(Nx+1,1);              % vector of constants
% boundary conditions
if boundtype                    % Dirichlet conditions
    g(1)   = Qm*bf0(1);         % top constant
    g(end) = Qp*bf0(2);         % bottom constant
else                            % Neumann conditions
    Q(1,2)     = Qm + Qp;       % top Q-matrix
    Q(Nx+1,Nx) = Qm + Qp;       % bottom Q-matrix
    g(1)   = -2*dx*Qm*df0(1);   % top constant
    g(end) =  2*dx*Qp*df0(2);   % bottom constant
end
FT = zeros(Nx+1,Nt+1);          % grid of results
FT(:,1) = f0';                  % initial condition
for tndx = 2:Nt+1               % loop through time
    FT(:,tndx) = ( eye(Nx+1) - theta*Q*dt ) \ ( ( eye(Nx+1) ...
        + (1-theta)*Q*dt ) * FT(:,tndx-1) + g*dt );
end


LISTING 3.4: pde_bs_impl.m: Implementation of the θ-method solver.
% pde_bs_impl.m
clear;
clc;
p.theta     = 0.50;   % for theta-method
p.r         = 0.05;   % risk free rate
p.sigma     = 0.30;   % volatility
p.t         = 0.25;   % maturity
p.K         = 0.95;   % strike price
p.tnumber   = 50;     % time intervals
p.xboundary = 0.50;   % max log-price
p.xnumber   = 50;     % log-price intervals
p.boundtype = 0;      % boundary type
% call the PDE solver
[xv, tv, FT] = pde_bs(@f_put, p);
% make a 3D plot of the results
surf(exp(xv), tv, FT');

FIGURE 3.6: Early exercise region for an American put. The time-price space
is separated into two parts. If the boundary is crossed then exercise becomes
optimal.

[Figure: time to maturity (horizontal) against log-price (vertical). The free
boundary separates the early-exercise region, where L f(t, S) > 0 and
f(t, S) = Π(S), from the no-exercise region, where L f(t, S) = 0 and
f(t, S) > Π(S).]


(where the option can be exercised at a predefined set of times). With small
changes the PDE solver we constructed can take care of these features.
Essentially, the holder of the option has to make a decision at these time
points: exercise early and receive the intrinsic value, or wait and continue holding
the option. In terms of PDE jargon, the problem is now a free-boundary problem.
There is a boundary, unknown to us at this point, which separates the
region of (t, S) where early exercise is optimal from the region where it
is optimal to wait. Figure 3.6 illustrates these regions. Thus, within the “waiting
optimal” region the BS PDE is satisfied, while outside the boundary f(t, S) will
be equal to the payoff function Π(S).
The boundary function is unknown, but it has a known property: it will
be the first point at which f(t, S) = Π(S). This follows from a no-arbitrage
argument which implies that the pricing function has to be smooth and cannot
exhibit discontinuities. The pricing function will satisfy

L f(t, S) > 0 (3.8)


f(t, S) > Π(S) (3.9)
L f(t, S) · (f(t, S) − Π(S)) = 0 (3.10)

The BS PDE is satisfied within the no-exercise region, while the pricing function
equals the payoff within the exercise region. Equation (3.10) reflects that: at each
point one of the two conditions holds with equality. Within the exercise region
L f(t, S) > 0, while within the no-exercise region f(t, S) > Π(S). Equations
(3.8-3.9) cover these possibilities.
This indicates that a strategy to compute the option price when early exercise
is allowed will be to set

fji = max( f̂ji , Π(Sj ) )

where f̂ is the price if no exercise takes place. Therefore the option holder’s
strategy is implemented: the holder will compare the value of the option if she
did not exercise with the price if she does; the option value will be the maximum
of the two. Although the above approach is straightforward in the explicit method
case, it is not so in the other methods where a system has to be solved. In these
cases we are looking for solutions of a system subject to a set of inequality
conditions. In the most general θ-scheme, the system has the form

(I − θQΔt) · f^{i+1} > (I + (1 − θ)QΔt) · f^{i}

f^{i+1} > Π(S)

[(I − θQΔt) · f^{i+1} − (I + (1 − θ)QΔt) · f^{i}]' [f^{i+1} − Π(S)] = 0

where S is the vector of the grid prices of the underlying asset, and the inequality
is taken element-wise.
Such systems cannot be explicitly solved, but there are iterative methods,
like the projected successive over-relaxation or PSOR method. Given a system

A·x > b
x>c
[A · x − b] [x − c] = 0

a starting value x (0) , and a relaxation parameter ω ∈ (0, 2), the PSOR method
updates⁹

x_i^{(k+1)} = max( c_i , (1 − ω) x_i^{(k)} + (ω / a_ii) [ b_i − Σ_{j=1}^{i−1} a_ij x_j^{(k+1)} − Σ_{j=i+1}^{n} a_ij x_j^{(k)} ] )
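As a quick sanity check of the update rule above, here is a minimal, illustrative implementation in Python (the Matlab version used in these notes follows in listing 3.5); the small test system and all names are hypothetical:

```python
def psor(A, b, c, x0, omega=1.2, tol=1e-10, max_iter=500):
    """Projected SOR for the LCP:  A x >= b,  x >= c,  (A x - b).(x - c) = 0."""
    n = len(b)
    x = list(x0)
    for _ in range(max_iter):
        x_old = list(x)
        for i in range(n):
            # the residual uses already-updated components j < i
            residual = b[i] - sum(A[i][j] * x[j] for j in range(n))
            x[i] = max(c[i], x[i] + omega * residual / A[i][i])
        if max(abs(x[i] - x_old[i]) for i in range(n)) < tol:
            break
    return x

# tiny LCP where the constraint binds in the first component
A = [[2.0, -1.0], [-1.0, 2.0]]
b = [1.0, 1.0]
c = [1.5, 0.0]
x = psor(A, b, c, [0.0, 0.0])
# x[0] is pinned at c[0] = 1.5, and the second row then gives x[1] = (1 + 1.5)/2 = 1.25
```

The update x(i) + ω·residual/a_ii is algebraically the same as the (1 − ω)x + ω(...) form displayed above.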
The PSOR procedure is implemented in the short code given in listing 3.5. The
programme will solve A · x > b and x > c, while one of the two equalities will
strictly hold for each element. The initial value is xinit. The function returns the
solution vector x, and an indicator vector ex of the elements where the second
equality holds; in our case the early exercise points. The solver has to be ad-
justed to accommodate the early exercise, and the code is given in listing 3.6. We
construct the matrix A and the vector b and call the PSOR procedure. We demand
accuracy of 10⁻⁶, while we allow for 100 iterations to achieve that.
The snippet in listing 3.7 implements the pricing of a European and an American
put and examines the results. The strike price is $1.05. To make the differences
clearer the interest rate is set to 10%. The results are given in figure 3.7; the
American option prices approach the payoff function for small values of the spot
price, while the European prices cross below it. Early exercise will be optimal if the spot
price is below $0.90, where the American prices touch the payoff function.

BARRIER FEATURES
European vanilla options (calls and puts) are exercised on maturity, and have
payoffs that depend on the final value of the underlying asset. Barrier options
have an extra feature: the option might not be active to maturity, depending on
whether or not the barrier has been triggered. Denote the barrier level with B.
The jargon for barrier options specifies the impact of the barrier as follows
• Up: there is an upper barrier, or Down: there is a lower barrier
• In: the contract is not activated before the barrier is triggered, or Out: if the
barrier is breached the contract is cancelled
Therefore we can have eight standard combinations

( Up / Down ) -and- ( In / Out )  ( Calls / Puts )
9
A value 0 < ω < 1 corresponds to under-relaxation, ω = 1 is the Gauss-Seidel
algorithm, while 1 < ω < 2 corresponds to over-relaxation. In our case we want to use
a value that implements over-relaxation.


LISTING 3.5: psor.m: PSOR method.
% psor.m
function [x, ex] = psor(A, b, c, xinit, omega, tol, maxi)
n    = length(b);             % length of vectors
x    = xinit;                 % initialize x
k    = 0;                     % number of iterations
flag = 1;                     % stopping flag
while flag
    k = k + 1;                % next iteration
    xinit = x;                % old value
    for i = 1:n               % update new value
        x(i) = max(c(i), x(i) + ...
            omega*(b(i) - A(i,:)*x)/A(i,i));
    end
    % change small enough or too many iterations
    if (norm(x - xinit) < tol) | (k > maxi)
        flag = 0;
    end
end
ex = (x == c) & (x > 0);      % the early exercise region

LISTING 3.6: pde_bs_amer.m: θ-method solver with early exercise.
% pde_bs_amer.m
function [xv, tv, FT, E] = pde_bs_amer(f, p)
omega = p.omega;              % for PSOR

% ... lines 3-37 of pde_bs.m ...

% exercise region
E = FT;
E(:,1) = (f0 > 0);
% loop through time
for tndx = 2:Nt+1
    % theta-method
    A = eye(Nx+1) - theta*Q*dt;
    b = (eye(Nx+1) + (1-theta)*Q*dt)*FT(:,tndx-1) + g*dt;
    [w, ex] = psor(A, b, f0, FT(:,tndx-1), omega, 1e-6, 100);
    FT(:,tndx) = w;
    E(:,tndx)  = ex;
end

FIGURE 3.7: European versus American option prices. The American option will
reach the payoff function, while the price of the European contract can cross
below that level.

[Plot: option value against asset price for the European put, the American put,
and the payoff function.]

LISTING 3.7: pde_bs_amer_impl.m: Implementation of PSOR for an American
put.
% pde_bs_amer_impl.m

% ... lines 2-12 of pde_bs_impl.m ...

% call the PDE solver for the American option
[xv, tv, FTA, E] = pde_bs_amer(@f_put, p);
% call the PDE solver for the European option
[xv, tv, FTE] = pde_bs(@f_put, p);
PA = FTA(:,end);              % last American prices
PE = FTE(:,end);              % last European prices
S0 = exp(xv);                 % asset prices
% plot prices against the payoff function
plot(S0, PA, S0, PE, S0, max(p.K - S0, 0));


Barrier options are examples of path dependent contracts, since the final
payoffs depend on the price path before maturity. This path dependence is con-
sidered mild, since we are not interested in the actual levels, but only in the
behavior relative to the barrier, i.e. whether the barrier is triggered.
For example, consider an up-and-out call, where the spot price of the under-
lying is S0 = $85, the strike price is K = $105 and the barrier is at B = $120.
This contract will pay off only if the price of the underlying remains below $120
for the life of the option. If St > B at any t, then the payoffs (and the value of
the contract) become zero. One can see that we should expect this contract to
have some strange behaviour when the price is around the barrier level.
Contrast an up-and-in call with the same specifications. For the contract to
pay anything, the price has to reach at least St = $120 for some t (but might
drop in later times).
Now suppose that an investor holds both contracts, and observes that (for any
sample path) the barrier can either be triggered or not. Thus, when one is active
the other one is not. Holding both of them replicates the vanilla call. Therefore,

PU&O + PU&I = PCall , and PD&O + PD&I = PPut

In the above examples the barrier contract was monitored continuously. For
such contracts closed-form solutions exist. In practice though, barrier options
are monitored discretely, that is to say one examines where the underlying spot
price is with respect to the barrier at a discrete set of points. For example a
barrier contract might be monitored on the closing of each Friday. Monitoring
can have a substantial impact on the pricing of barrier options. For that reason
numerical methods are employed to price barrier options.
An up-and-out option will follow the BS PDE, where a boundary will exist
at St = B (in fact f(t, S) = 0 for all S > B). This feature can be very easily
implemented in the finite difference schemes that we discussed. In particular,
the barrier will be active only on the monitoring dates, and a PDE with no
barriers¹⁰ will be solved between them. Essentially, we can compute the updated
values f^{i+1} normally, and then impose the condition f(ti , xj ) = 0 if ti is a
monitoring date and exp(xj ) > B.
A Matlab listing that implements pricing of up- and down-and-out calls and
puts is given in 3.8. The snippet that calls this function is given in 3.9.
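The monitoring step amounts to masking the solution vector on monitoring dates only. A minimal sketch of this logic (in Python for illustration; the barrier level, grid and function names here are hypothetical, not from the notes):

```python
from math import exp

B = 1.20                                      # barrier level (illustrative)
x_grid = [-0.5 + 0.1 * j for j in range(11)]  # log-price grid

def apply_up_and_out(f, t, monitoring_dates):
    """Zero the solution above the barrier, but only on monitoring dates."""
    if t in monitoring_dates:
        return [fj if exp(xj) <= B else 0.0 for fj, xj in zip(f, x_grid)]
    return f

f = [1.0] * 11                     # stand-in for the updated PDE solution
f = apply_up_and_out(f, 0.25, {0.05, 0.15, 0.25})
```

On non-monitoring dates the vector passes through unchanged, which is exactly why discrete monitoring prices differ from the continuously monitored closed forms.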

COMPUTING THE GREEKS


Using finite differences is very useful when one is looking for the hedge parame-
ters, in particular the option’s Delta and Gamma. Given that they are quantities
that are defined as derivatives with respect to the price, one can compute them
rapidly over the given grid. Some care has to be taken here, as we have imple-
mented the discretization in log-prices. The Delta and Gamma of the option will
10
Of course there will be barriers imposed at the extreme values that are necessary to
discretize the state space.


LISTING 3.8: pde_bs_barr.m: Solver with barrier features.
% pde_bs_barr.m
function [xv, tv, FT] = pde_bs_barr(f, p)
barrT = p.barriertimes;       % barrier times
barrL = p.barrierlevel;       % barrier level
barrD = p.barrierdirection;   % up (1) or down (0)

% ... other stuff: common lines of pde_bs.m ...

% boundary conditions: zero out the barrier-side boundary
if boundtype                  % Dirichlet conditions
    g(1)   = Qm*bf0(1)*barrD;
    g(end) = Qp*bf0(2)*(1-barrD);
else                          % Neumann conditions
    Q(1,2)     = Qm + Qp;
    Q(Nx+1,Nx) = Qm + Qp;
    g(1)   = -2*dx*Qm*df0(1)*barrD;
    g(end) = +2*dx*Qp*df0(2)*(1-barrD);
end
% knock-out multiplier
if barrD                      % up-and-out
    barrX = (exp(xv) <= barrL);
else                          % down-and-out
    barrX = (exp(xv) >= barrL);
end
% grids of results
FT = zeros(Nx+1,Nt+1);
FT(:,1) = f0.*barrX;          % initial condition
if barrT == 0                 % continuous monitoring
    barrT = tv;
else                          % adjust monitoring points
    barrT = interp1(tv, tv, barrT, 'nearest');
end
% loop through time
for tndx = 2:Nt+1
    ft = (eye(Nx+1) - theta*Q*dt) \ ((eye(Nx+1) + (1 - ...
        theta)*Q*dt)*FT(:,tndx-1) + g*dt);
    if sum(tv(tndx) == barrT)  % if monitoring date
        FT(:,tndx) = ft.*barrX;
    else
        FT(:,tndx) = ft;
    end
end


LISTING 3.9: pde_bs_barr_impl.m: Implementation for a discretely monitored
barrier option.
% pde_bs_barr_impl.m
p.barriertimes     = [0:0.05:0.25];  % barrier times
p.barrierlevel     = 1.20;           % barrier level
p.barrierdirection = 1;              % up (1) or down (0)

% ... lines 2-12 of pde_bs_impl.m ...

% call the PDE solver for barrier options
[xv, tv, FT] = pde_bs_barr(@f_put, p);
% create a surface plot
surf(exp(xv), tv, FT');

be equal to

Δ = ∂f/∂S = exp(−x) ∂f/∂x

Γ = ∂²f/∂S² = exp(−2x) ( ∂²f/∂x² − ∂f/∂x )

The derivatives with respect to the log-price x can be computed using finite
differences on the grid (in fact they have been computed already when solving the
PDE). Note that, since we approximate all quantities using central differences,
the first and last grid points will be lost.
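To see that these chain-rule formulas recover the correct Delta and Gamma, here is a small self-contained check (in Python for illustration; the notes' own Matlab snippet is listing 3.10). With the test function f(S) = S², so that f(x) = exp(2x) on the log-price grid, the exact values are Δ = 2S and Γ = 2:

```python
from math import exp

dx = 1e-3
xs = [-0.5 + dx * j for j in range(1001)]
f = [exp(2 * x) for x in xs]        # test function: f(S) = S^2 with S = e^x

delta, gamma = [], []
for j in range(1, len(xs) - 1):
    fx  = (f[j + 1] - f[j - 1]) / (2 * dx)            # df/dx, central
    fxx = (f[j + 1] - 2 * f[j] + f[j - 1]) / dx**2    # d2f/dx2, central
    S = exp(xs[j])
    delta.append(fx / S)                 # exp(-x) * df/dx
    gamma.append((fxx - fx) / S**2)      # exp(-2x) * (d2f/dx2 - df/dx)
# for f(S) = S^2 the exact values are delta = 2S and gamma = 2
```

Note how the loop runs over the interior points only, which is the "first and last grid points are lost" effect mentioned above.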
The snippet in 3.10 shows how the Greeks can be computed over a grid, while
figure 3.8 gives the output. In order to make clear the effect of early exercise
we use a relatively high interest rate of 10%. We also implement a relatively
dense (100 × 100) grid over (t, S) to ensure that the derivatives are accurate.
Observe that the Deltas of both options approach their minimum values of −1
in a continuous way. The Gammas, on the other hand, show different patterns
with the American Gamma jumping to zero.
Even if we use a stable FDM method, like the Crank-Nicolson, computing
the Greeks does not always give stable results. For example figure 3.9 presents
the Greeks for the same American and European put options as figure 3.8, but with
the time steps decreased to 10. The Delta is apparently computed with errors,
which are magnified when the Gamma is numerically approximated. Note that
the instability is introduced by reducing the time steps; the log-price grid is
still based on 100 subintervals. In other cases explosive Greeks are an outcome
of the contract specifications. For instance a barrier option will exhibit Deltas
that behave very erratically around the barrier, since the pricing function is not
differentiable there.


LISTING 3.10: pde_bs_greeks_impl.m: PDE approximations for the Greeks.
% pde_bs_greeks_impl.m
p.theta     = 0.50;   % for theta-method
p.omega     = 1.50;   % for PSOR
p.r         = 0.10;   % risk free rate
p.sigma     = 0.30;   % volatility
p.t         = 0.10;   % maturity
p.K         = 1.05;   % strike price
p.tnumber   = 100;    % time intervals
p.xboundary = 0.30;   % max log-price
p.xnumber   = 100;    % log-price intervals
p.boundtype = 0;      % boundary type
[xv, tv, FT, EX] = pde_bs_amer(@f_put, p);
[xv, tv, FET]    = pde_bs(@f_put, p);
A = FT(:,end);        % current American prices
E = FET(:,end);       % current European prices
x0 = xv(2:end-1);     % truncated grid
S0 = exp(x0);         % spot
Dx = xv(2) - xv(1);   % log-price grid step
% first derivatives with respect to log-price
A1 = (A(3:end) - A(1:end-2))/(2*Dx);
E1 = (E(3:end) - E(1:end-2))/(2*Dx);
% second derivatives with respect to log-price
A2 = (A(3:end) - 2*A(2:end-1) + A(1:end-2))/(Dx^2);
E2 = (E(3:end) - 2*E(2:end-1) + E(1:end-2))/(Dx^2);
DA = A1./S0;                 % American delta
DE = E1./S0;                 % European delta
GA = (A2 - A1)./S0./S0;      % American gamma
GE = (E2 - E1)./S0./S0;      % European gamma
% plots of deltas and gammas
subplot(1,2,1); plot(exp(x0), DA, exp(x0), DE);
subplot(1,2,2); plot(exp(x0), GA, exp(x0), GE);

3.4 MULTIDIMENSIONAL PDES


In many cases the problem in hand can only be cast in a PDE form that has more
than one space dimension. This can be the case of a derivative that depends
on more than one asset, a derivative on a single asset that exhibits
stochastic volatility, or even a derivative in a BS world that is strongly path-
dependent.
Typically the PDE will still be a parabolic one, with a multidimensional
elliptic operator. For example in the two-dimensional case the operator on the
function f = f(t, x, y) will be

FIGURE 3.8: Greeks for American and European puts. A European and an Amer-
ican put are priced using the Crank-Nicolson method on a (100, 100) grid over
(t, S), and the Greeks are computed using finite differences. The Greeks for the
European put are given in red and for the American put in blue.

[Two panels: delta (left) and gamma (right) against the spot price.]

L f = αx ∂f/∂x + αy ∂f/∂y + βx ∂²f/∂x² + βy ∂²f/∂y² + βxy ∂²f/(∂x∂y) + γ f

Apparently we will need to discretise both dimensions to approximate the


elliptic operator. The price function at the typical grid point will now be fj,k (t) =
f(t, xj , yk ). The single-variable derivatives pose no real problem; we just need to
take some care when computing the cross derivative approximation. For example,
one can use the Taylor expansion of the values fj±1,k±1 and fj±1,k∓1

fj±1,k±1 = fj,k + (±Δx) ∂fj,k/∂x + (±Δy) ∂fj,k/∂y
    + (1/2) Δx² ∂²fj,k/∂x² + (1/2) Δy² ∂²fj,k/∂y²
    + (±Δx)(±Δy) ∂²fj,k/(∂x∂y) + o(Δx³, Δy³)

The operator

D²xy fj,k ≈ ( fj+1,k+1 + fj−1,k−1 − fj+1,k−1 − fj−1,k+1 ) / (4 Δx Δy)

will approximate the cross derivative ∂²f(t, xj , yk )/(∂x∂y).
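The four-point stencil is easy to verify numerically on any smooth function; a quick check (in Python for illustration, with an arbitrary test function):

```python
from math import sin, cos

def f(x, y):
    return sin(x) * y * y            # smooth test function

dx = dy = 1e-3
x0, y0 = 0.3, 0.7
d2xy = (f(x0 + dx, y0 + dy) + f(x0 - dx, y0 - dy)
        - f(x0 + dx, y0 - dy) - f(x0 - dx, y0 + dy)) / (4 * dx * dy)
exact = 2 * y0 * cos(x0)             # analytic mixed derivative
err = abs(d2xy - exact)
```

The error is of order Δx² + Δy², consistent with the o(Δx³, Δy³) remainder in the Taylor expansion above.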

FIGURE 3.9: Oscillations of the Greeks in FDM. A European and an American put
are priced using the Crank-Nicolson method on a (10, 100) grid over (t, S), and
the Greeks are computed using finite differences. The Greeks for the European
put are given in red and for the American put in blue.

[Two panels: delta (left) and gamma (right) against the spot price.]

This uses four points to approximate the cross derivative, but it is not the
only way to do so.11 In any case we can write the discretized operator

L = αx Dx + αy Dy + βx D2x + βy D2y + βxy D2xy + γ

If we consider an (Nx , Ny )-point grid over (x, y), then we can construct the
matrix Q which will be (Nx × Ny , Nx × Ny ). The prices f(t, xj , yk ) actually form
a matrix F(t) for a given t, but we prefer to think of them as a vector f = f (t)
produced by stacking the columns of this matrix. Therefore, the price f(t, x j , yk )
will be mapped to the (k − 1)Nx + j element of f
 
f = ( f(t, x1 , y1 ), f(t, x2 , y1 ), …, f(t, xNx , y1 ), f(t, x1 , y2 ), …, f(t, xNx , yNy ) )′
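The stacking convention and the (k − 1)Nx + j index map can be written out in a few lines; a minimal sketch (in Python for illustration, with a small hypothetical grid):

```python
Nx, Ny = 6, 4

def stack_index(j, k):
    """Position (1-based) of grid point (x_j, y_k) in the stacked vector."""
    return (k - 1) * Nx + j

# stack the columns of the Nx-by-Ny matrix F into the vector f
F = [[(j, k) for k in range(1, Ny + 1)] for j in range(1, Nx + 1)]
f = [F[j - 1][k - 1] for k in range(1, Ny + 1) for j in range(1, Nx + 1)]
# f[stack_index(j, k) - 1] recovers the matrix entry F(j, k)
```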

11
For example Ikonen and Toivanen (2004) give an alternative.


FIGURE 3.10: The structure of the Q-matrix that approximates a two-dimensional


diffusion.

 
F F 0 0 0 0 F F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 F F F 0 0 0 F F F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 
 0 F F F 0 0 0 F F F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 
 0 0 F F F 0 0 0 F F F 0 0 0 0 0 0 0 0 0 0 0 0 0 
 
 0 0 0 F F F 0 0 0 F F F 0 0 0 0 0 0 0 0 0 0 0 0 
 
 0 0 0 0 F F 0 0 0 0 F F 0 0 0 0 0 0 0 0 0 0 0 0 
 
 F F 0 0 0 0 F F 0 0 0 0 F F 0 0 0 0 0 0 0 0 0 0 
 
 F F F 0 0 0 F F F 0 0 0 F F F 0 0 0 0 0 0 0 0 0 
 
 0 F F F 0 0 0 F F F 0 0 0 F F F 0 0 0 0 0 0 0 0 
 
 0 0 F F F 0 0 0 F F F 0 0 0 F F F 0 0 0 0 0 0 0 
 
 0 0 0 F F F 0 0 0 F F F 0 0 0 F F F 0 0 0 0 0 0 
 
 
 0 0 0 0 F F 0 0 0 0 F F 0 0 0 0 F F 0 0 0 0 0 0 
 
 0 0 0 0 0 0 F F 0 0 0 0 F F 0 0 0 0 F F 0 0 0 0 
 
 0 0 0 0 0 0 F F F 0 0 0 F F F 0 0 0 F F F 0 0 0 
 
 0 0 0 0 0 0 0  ♠  0 0 0 ♣ F ♣ 0 0 0  ♠  0 0 
 
 0 0 0 0 0 0 0 0 F F F 0 0 0 F F F 0 0 0 F F F 0 
 
 0 0 0 0 0 0 0 0 0 F F F 0 0 0 F F F 0 0 0 F F F 
 
 0 0 0 0 0 0 0 0 0 0 F F 0 0 0 0 F F 0 0 0 0 F F 
 
 0 0 0 0 0 0 0 0 0 0 0 0 F F 0 0 0 0 F F 0 0 0 0 
 
 0 0 0 0 0 0 0 0 0 0 0 0 F F F 0 0 0 F F F 0 0 0 
 
 0 0 0 0 0 0 0 0 0 0 0 0 0 F F F 0 0 0 F F F 0 0 
 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 F F F 0 0 0 F F F 0 
 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 F F F 0 0 0 F F F 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 F F 0 0 0 0 F F

Matrix Q will now be a block-tridiagonal matrix, with block elements that are
tridiagonal themselves. Also Q is a banded matrix, meaning that all elements
can be included within a band around the main diagonal. The structure is given
in figure 3.10 for an approximation that uses six points to discretize x and four
points to discretize y. The ♣ elements reflect moves to j ± 1, used for the
derivatives with respect to x; the ♠ elements reflect moves to k ± 1, used for the
derivatives with respect to y; and the  elements reflect moves to (j ± 1, k ± 1),
used for the cross derivative.

FINITE DIFFERENCE APPROACHES


We have managed to represent the solution of the two-dimensional BS PDE as
a system of ODEs, which is very similar to the approach we took when we first
discussed the one-dimensional problem. The difference is of course that now the
size of the matrix Q is larger by one order of magnitude. Nevertheless, we can
represent the problem as

∂f (t)
= Q · f (t)
∂t
By using the same arguments we can once again construct explicit,
implicit and θ-methods. For example, the Crank-Nicolson scheme will be of the
usual form

(I − (1/2) QΔt) · f^{i+1} = (I + (1/2) QΔt) · f^{i}
As an example consider an option with payoffs that depend on two correlated
assets that follow geometric Brownian motions. The BS PDE in terms of the log-
prices x and y will be

∂f(t, x, y)/∂t = αx ∂f(t, x, y)/∂x + αy ∂f(t, x, y)/∂y
    + (1/2) σx² ∂²f(t, x, y)/∂x² + (1/2) σy² ∂²f(t, x, y)/∂y²
    + ρ σx σy ∂²f(t, x, y)/(∂x∂y) − r f(t, x, y)

We also assume that a set of Neumann conditions is specified at each bound-
ary, namely the corresponding derivatives ∂f/∂x (∂f/∂y resp.) being equal to φx,1 and
φx,Nx (φy,1 and φy,Ny resp.).
To build Q we essentially consider an (Ny , Ny ) tridiagonal matrix, where
changes in y are captured, which has elements that are (N x , Nx ) matrices where
changes in x are captured. We can represent Q in the following form, where all
submatrices have dimensions (Nx , Nx )
 
    ⎡ D  B              ⎤
    ⎢ C  D  E           ⎥
    ⎢    C  D  E        ⎥
Q = ⎢       ⋱  ⋱  ⋱     ⎥   (3.11)
    ⎢          C  D  E  ⎥
    ⎣             F  D  ⎦
BOUNDARY CONDITIONS
The boundary conditions will have an effect on these matrices. In particular, the
first and last rows of all matrices will depend on boundary conditions on x. In
addition, all elements of the block matrices B and F will depend on the boundary
conditions imposed on y. The generic sum that Q implements is given by

∂fj,k
= q(−,−) fj−1,k−1 + q(0,−) fj,k−1 + q(+,−) fj+1,k−1
∂t
+ q(−,0) fj−1,k + q(0,0) fj,k + q(+,0) fj+1,k
+ q(−,+) fj−1,k+1 + q(0,+) fj,k+1 + q(+,+) fj+1,k+1


where the coefficients are given by the following quantities (with the elements
that correspond to figure 3.10 also indicated)
(♣): q(±,0) = ±αx /(2Δx) + σx²/(2Δx²)
(♠): q(0,±) = ±αy /(2Δy) + σy²/(2Δy²)
(F): q(0,0) = −r − σx²/Δx² − σy²/Δy²
(): q(+,+) = q(−,−) = +ρσx σy /(4ΔxΔy)
(): q(−,+) = q(+,−) = −ρσx σy /(4ΔxΔy)
Boundary conditions will influence the first and last rows of each block, as
this is where the boundaries of x are positioned. The whole first and last blocks
will be also affected, since this is where the boundaries of y are positioned.
The first and last rows of these particular blocks will correspond to the corner
boundaries. Also, the boundaries will specify a matrix of constants G, just like
the vector of constants we constructed in the univariate case.
For Neumann conditions the elements (1, 2) and (Nx , Nx − 1) of each block
are given by q(+,·) + q(−,·) . Of course a similar relationship will hold for all
elements of the (1, 2) and (Ny , Ny − 1) block, which will have elements given by
q(·,+) + q(·,−) . Apparently the (1, 2) and (Nx , Nx − 1) elements of these particular
blocks will be dependent on both boundary conditions, and also the boundary
condition across the diagonal. The values for these elements will be given by
q(+,+) + q(−,+) + q(+,−) + q(−,−) .
The elements of the matrix G will also be determined by the Neumann
conditions for k = 2, . . . , Ny − 1 and j = 2, . . . , Nx − 1. Say that
φx,(j,k) = ∂f(xj , yk )/∂x, and φy,(j,k) = ∂f(xj , yk )/∂y.

G(1, k) = −2q(−,0) φx,(1,k) Δx,    G(Nx , k) = +2q(+,0) φx,(Nx ,k) Δx

G(j, 1) = −2q(0,−) φy,(j,1) Δy,    G(j, Ny ) = +2q(0,+) φy,(j,Ny ) Δy
The corner elements of G will be determined by both boundary conditions,
as well as the boundary across the diagonal. For example, the element (1, 1) will
be

G(1, 1) = −2q(−,0) φx,(1,1) Δx − 2q(0,−) φy,(1,1) Δy
          + q(−,−) ( −φx,(1,1) − φy,(1,1) ) √(Δx² + Δy²)

The other corner points have similar expressions. We will vectorize the constraints
by stacking the columns of G into the vector g.
If we include the impact of the boundary conditions (and keep in mind that
they might be time varying), the system of ODEs that will give us an approximate
solution to the two-dimensional PDE is now given by
∂f (t)
= Q · f (t) + g(t) (3.12)
∂t
If the boundary conditions are homogeneous, g(t) = g, then the solution of the
system is

f (t) = exp(Qt) · f (0) + Q⁻¹ · [exp(Qt) − I] · g
We also cast system (3.12) in the θ-form, and approximate it as the solution
of the updating scheme

(I − θ·QΔt) · f^{i+1} = (I + (1 − θ)·QΔt) · f^{i} + θ·g^{i+1} + (1 − θ)·g^{i}

Once again if the boundaries are homogeneous in time the scheme can be written
as
(I − θ·QΔt) · f^{i+1} = (I + (1 − θ)·QΔt) · f^{i} + g
In theory solving this system presents no new difficulties, but in practice
it might not be feasible since Q is not tridiagonal. For that reason a number of
alternating direction implicit (ADI) and local one-dimensional (LOD, also known
as Soviet splitting) schemes are typically used. Such schemes do not solve over
all dimensions simultaneously, but instead split each time step into substeps,
and assume that over each substep the system moves across a single direction.
Therefore at each substep one has to solve a system that is indeed tridiagonal.

ALTERNATING DIRECTION IMPLICIT METHODS


To understand the ADI methods it is intuitive to write down the Crank-Nicolson
system in terms of operators

[1 − αx Dx − αy Dy − βx* D²x − βy* D²y − βxy D²xy − γ] f^{i+1}
    = [1 + αx Dx + αy Dy + βx* D²x + βy* D²y + βxy D²xy + γ] f^{i}

where we have defined βx* = βx − (βxy /2)(Δx/Δy) and βy* = βy − (βxy /2)(Δy/Δx), to
save some space. Now we can put down the approximation

[1 − αx Dx − βx* D²x − γ/3] [1 − βxy D²xy − γ/3] [1 − αy Dy − βy* D²y − γ/3] f^{i+1}
    = [1 + αx Dx + βx* D²x + γ/3] [1 + βxy D²xy + γ/3] [1 + αy Dy + βy* D²y + γ/3] f^{i}
It is tedious to go through the algebra, but one can show that the approxi-
mation of the operators is at least of second order in time and both directions.
Therefore the results are not expected to deteriorate due to this operator split-
ting. In the Peaceman and Rachford (1955) scheme we implement the
following three steps, solving for auxiliary values f* and f**

[1 − αy Dy − βy* D²y − γ/3] f*      = [1 + αy Dy + βy* D²y + γ/3] f^{i}
[1 − βxy D²xy − γ/3] f**            = [1 + βxy D²xy + γ/3] f*
[1 − αx Dx − βx* D²x − γ/3] f^{i+1} = [1 + αx Dx + βx* D²x + γ/3] f**


For the D’yakonov scheme (see Marchuk, 1990; McKee, Wall, and Wilson,
1996) we use a slightly different splitting, where at the first step we produce the
complete right-hand side

[1 − αy Dy − βy* D²y − γ/3] f* = [1 + αx Dx + βx* D²x + γ/3] [1 + βxy D²xy + γ/3]
                                 [1 + αy Dy + βy* D²y + γ/3] f^{i}
[1 − βxy D²xy − γ/3] f**            = f*
[1 − αx Dx − βx* D²x − γ/3] f^{i+1} = f**
In both cases the operations are implemented using matrices that can be cast
in tridiagonal form by permutations of their elements. In the multidimensional
PDE problems, one has to take special care when dealing with the boundary
conditions, as it may be confusing. Also, some decisions have to be made on the
corners, which are affected by boundary conditions on more than one dimension.

3.5 A TWO-DIMENSIONAL SOLVER IN MATLAB


We now turn to the implementation of a PDE solver in two space dimensions,
using the θ-method. Essentially this is more of a book-keeping exercise, where
we need to consider the structure of matrix Q, and especially the boundary
conditions. Here we will focus on boundary conditions of the Neumann type.
We will assume that the payoff function returns not only the function values,
but also the derivative over the boundaries, together with the derivatives at the
corner points across the diagonal directions. Figure 3.11 shows the positions
of these boundaries. Each horizontal slice gives the (x, y)-grid at a different
time point. The colored points denote the boundary and initial values that are
necessary to solve the PDE numerically. In particular the bottom slice, at t = 0,
gives the set of initial conditions that need to be specified to start the algorithm.
In the next time periods the boundaries are illustrated. The blue points show the
boundaries at x = x1 and x = xNx , while the green points show the boundaries
at y = y1 and y = yNy. At the black (corner) point both boundary conditions will have an impact. Essentially these points illustrate where matrix G has potentially non-zero elements. The elements of matrix Q that are affected lie just within
these points.
As an example we will use a correlation option, which is essentially a Eu-
ropean call option on the minimum price of two underlying assets. The payoff
function of this derivative is

Π(S1, S2) = max(min(S1, S2) − K, 0)

We will make the assumption that both assets follow geometric Brownian mo-
tions, with correlation parameter ρ. The pricing function will satisfy the two-

FIGURE 3.11: Schematic representation of the evolution of a two-dimensional
PDE. The points where initial and boundary conditions have to be specified
are also illustrated. red points: initial conditions (t = 0) active; blue points:
boundary conditions on x active; green points: boundary conditions on y active;
and black points: both boundary conditions active.


LISTING 3.11:
callmin.m: Payoff and boundaries for a two-asset option.
[The Matlab source of this listing is not legible in this copy.]


dimensional Black-Scholes PDE, which in terms of the log-prices x1 = log S1 and x2 = log S2 can be written as

−∂f(t, x1, x2)/∂t + α1 ∂f(t, x1, x2)/∂x1 + α2 ∂f(t, x1, x2)/∂x2 + ½σ1² ∂²f(t, x1, x2)/∂x1² + ½σ2² ∂²f(t, x1, x2)/∂x2² + ρσ1σ2 ∂²f(t, x1, x2)/∂x1∂x2 − r f(t, x1, x2) = 0

for α1 = r − ½σ1² and α2 = r − ½σ2². As this is a European style contract, the
Neumann boundaries across x1 and x2 will be such that the derivative at each one of these points is the same through time. Therefore, and since the payoff function is piecewise linear, there is no point in explicitly computing the partial derivative at each point, since we can do that numerically. Listing 3.11 gives the Matlab code that returns the payoff function and the vectors of derivatives. One can verify that the derivatives are computed numerically rather than explicitly, in all directions.
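Purely as an illustration of the same computation, the payoff and its numerical boundary derivatives can be sketched in Python/NumPy as follows (names and layout are my own, not those of the Matlab listing):

```python
import numpy as np

def callmin_payoff(xv, yv, K):
    """Payoff max(min(S1, S2) - K, 0) on a grid of log-prices xv, yv, plus
    one-sided numerical derivatives along each boundary (Neumann data)."""
    sx, sy = np.meshgrid(np.exp(xv), np.exp(yv), indexing="ij")
    f = np.maximum(np.minimum(sx, sy) - K, 0.0)
    dx, dy = xv[1] - xv[0], yv[1] - yv[0]
    bounds = {
        "x_lo": (f[1, :] - f[0, :]) / dx,     # df/dx at the lower x boundary
        "x_hi": (f[-1, :] - f[-2, :]) / dx,   # df/dx at the upper x boundary
        "y_lo": (f[:, 1] - f[:, 0]) / dy,     # df/dy at the lower y boundary
        "y_hi": (f[:, -1] - f[:, -2]) / dy,   # df/dy at the upper y boundary
    }
    return f, bounds
```

Since the payoff is piecewise linear, these one-sided differences are exact away from the kink, which is precisely why the notes compute them numerically.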
Listings 3.12-3.13 give the Matlab code that implements the θ-method to solve the two-dimensional BS PDE. In the first part we set up the matrices that will serve as the blocks that make up the Q-matrix. Given the small number of nonzero elements, all matrix definitions and manipulations are done using sparse matrix commands. These block matrices correspond to the blocks B . . . F in equation 3.11. A further matrix keeps the constraints, as discussed in section 3.4, while its reshaped (stacked) form corresponds to vector g.
The Matlab code that actually implements the Crank-Nicolson method to price the correlation option is given in listing 3.14. Two assets are considered that exhibit different volatilities. The discretization grid across the two dimensions is constructed using 51×51 points. The call has half a year to maturity, and we use 30 time steps to compute the price. Therefore we will need to solve 30 systems of 2601 equations with 2601 unknowns to arrive at the result: a substantial computational demand.
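The book-keeping behind the Q-matrix can be sketched with Kronecker products: with the grid stacked so that x runs fastest, the one-dimensional operators combine as kron(I, Qx) + kron(Qy, I), plus a Kronecker product of first-difference matrices for the cross term. The following Python/NumPy fragment (dense matrices and invented coefficient names; the notes use Matlab sparse commands, and boundary rows would still need the corrections discussed above) illustrates the assembly:

```python
import numpy as np

def diff1(n, h):
    """Central first-difference operator; boundary rows are left incomplete,
    as boundary conditions are handled separately."""
    D = np.zeros((n, n))
    i = np.arange(n - 1)
    D[i, i + 1] = 0.5 / h
    D[i + 1, i] = -0.5 / h
    return D

def diff2(n, h):
    """Central second-difference operator."""
    D = np.zeros((n, n))
    i = np.arange(n)
    D[i, i] = -2.0 / h**2
    i = np.arange(n - 1)
    D[i, i + 1] = 1.0 / h**2
    D[i + 1, i] = 1.0 / h**2
    return D

def build_Q(nx, ny, hx, hy, ax, ay, bx, by, bxy, r):
    """Two-dimensional operator on the stacked vector f(:), x running fastest,
    including the cross-derivative and discounting terms."""
    Ix, Iy = np.eye(nx), np.eye(ny)
    Qx = ax * diff1(nx, hx) + bx * diff2(nx, hx)
    Qy = ay * diff1(ny, hy) + by * diff2(ny, hy)
    return (np.kron(Iy, Qx) + np.kron(Qy, Ix)
            + bxy * np.kron(diff1(ny, hy), diff1(nx, hx))
            - r * np.eye(nx * ny))
```

Applied to a function that is linear in x and y, the interior rows of Q return exactly the drift terms minus the discounting, which is a convenient way to check the stacking order.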

3.6 EXTENSIONS
Apart from the contracts and the techniques we discussed, there is a very large number of exotic options with features that can be implemented within the PDE framework. Sometimes we will need to extend the dimensionality of the problem to accommodate these special features. For example, in many cases a rebate is offered when the barrier is triggered. This makes sure that breaching the barrier will not leave you empty-handed. It is straightforward to handle such rebates in the finite differences procedure.
Other contracts attempt to cushion the barrier effect and the discontinuities it creates. For example, in Parisian options the barrier is triggered only if it remains breached for a given (cumulative) time. To solve for this option we need to introduce an extra variable, namely the cumulative time that the barrier has been breached, say τ.


LISTING 3.12:
Solver for a two-dimensional PDE (part I).
[The Matlab source of this listing is not legible in this copy.]


LISTING 3.13:
Solver for a two-dimensional PDE (part II).
[The Matlab source of this listing is not legible in this copy.]

Apparently, the derivative price will now be a function f(S, t, τ). Also, τ will evolve as an ODE, with dτ = dt if S(t) > B and dτ = 0 otherwise. The price will satisfy a different PDE within each domain

S(t) < B :  ft(S, t, τ) + rS fS(S, t, τ) + ½σ²S² fSS(S, t, τ) = r f(S, t, τ)
S(t) > B :  ft(S, t, τ) + rS fS(S, t, τ) + fτ(S, t, τ) + ½σ²S² fSS(S, t, τ) = r f(S, t, τ)


LISTING 3.14:
Implementation of the two-dimensional solver.
[The Matlab source of this listing is not legible in this copy.]

To solve for this contract we would need a grid over a 3-D region, and of course
a more complex set of boundary conditions needs to be specified.
Another group of problems that can be attacked using PDEs arises when single-asset models with more than one factor are considered. For example, one might want to price derivative contracts under the Heston (1993) stochastic volatility model, where

dS(t) = µS(t)dt + √v(t) S(t)dBs(t)
dv(t) = κ[v̄ − v(t)]dt + φ√v(t) dBv(t)
dBs(t)dBv(t) = ρdt

Here a derivative will apparently depend on the current volatility as well as the price, having a pricing function f(t, S, v). Therefore, the PDE that will be satisfied by such a contract will be a two-dimensional one.
Finally, in modeling fixed-income or credit related securities (and their derivatives) one might need to resort to multi-factor specifications, for example a corporate bond being a function of an M-dimensional state vector x(t) that has dynamics expressed via a stochastic differential equation

dx(t) = µ(t, x(t))dt + Σ(t, x(t)) · dB(t)

The PDE approach can also be applied in such a setting, although as the dimensionality increases the implementation becomes infeasible (and simulation-based methods are typically preferred).



4
Transform methods

Following the success of the Black and Scholes (1973) model on pricing and
hedging derivative contracts, there has been a surge of research on models that
can capture the stylized facts of asset and derivative markets. Although the BS
paradigm is elegant and intuitive, it still maintains a number of assumptions
that are too restrictive. In particular, the assumption of identically distributed
and independent Gaussian innovations clearly contradicts empirical evidence.
When developing specifications that relax these assumptions, academics and
practitioners alike discovered that apart from the BS case, very few models offer
European option prices in closed form. Being able to rapidly compute European
call and put prices is paramount, since typically a theoretical model will be
calibrated on a set of prices that come from options markets. The parameter
values retrieved from this calibration will be used to price and devise hedging
strategies for more exotic contracts.
It turned out that, in many interesting cases, even though derivative prices or
the risk-neutral density cannot be explicitly computed, the characteristic function
of the log-returns is tractable. Based on this quantity, researchers did indeed link
the characteristic function to the European call and put price, via an application
of Fourier transforms (see Heston, 1993; Bates, 1998; Madan, Carr, and Chang,
1998; Carr and Madan, 1999; Duffie, Pan, and Singleton, 2000; Bakshi and
Madan, 2000, inter alia for different modeling approaches).

4.1 THE SETUP


Assume that we are interested in an economy where there exists an asset with price process S(t). We also assume that there is a risk-free asset, offering a deterministic rate of return r(t), implying a set of bond prices B(t) = exp(−∫₀ᵗ r(s)ds). We start our analysis with the logarithmic return over a maturity T, say X(T) = log(S(T)/S(0)). We understand that as a random variable, X(T) will be distributed according to a probability measure P, the true or objective probability

measure. Also, we assume that there exists an equivalent probability measure


Q, under which the discounted price will form a martingale

B(T ) · EQ S(T ) = S(0) (4.1)

This is called the risk-neutral or risk adjusted probability measure. This measure
need not be unique, given the current set of bond and asset prices, unless the
market is complete, but all derivative contracts will have a no-arbitrage price
that is equal to their discounted expected payoffs under this measure. That is
to say, a European call option will satisfy

Pcall = B(T ) · EQ max(S(T ) − K , 0)

Under the BS assumptions Q will be unique, and X(T ) will follow a Gaussian
distribution under both P and Q. Under more general assumptions this need
not be the case. Since we are interested in the pricing of derivatives we are
going to ignore the true probability measure from now on, and focus instead
on the qualities and characteristics of the risk-neutral measure. Therefore all
expectations are assumed to be under Q, unless explicitly stated otherwise.

FOURIER TRANSFORMS
One of the most important tools for solving PDEs is the Fourier transform of
a function f(x). In particular, we define as the Fourier transform of f(x) a new
function φ(u), such that
F[f](u) = φ(u) = ∫_R exp(iux) f(x) dx

where i = √−1 is the imaginary unit. It turns out that each function f defines
a unique transform φ, and this transform is invertible: if we are given φ we can
retrieve the original function f, using the inverse Fourier transform
F⁻¹[φ](x) = f(x) = (1/2π) ∫_R exp(−iux) φ(u) du
There can be some confusion, as different disciplines define the Fourier transform slightly differently, setting exp(±iux) the other way round, or multiplying both integrals with 1/√(2π) to result in symmetric expressions. Here we use the definition that Matlab implements, but one has to always verify what a computer language offers.
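As a quick numeric sanity check of this convention (a Python/NumPy sketch, not part of the notes), the transform of the standard normal density under the definition above should equal exp(−u²/2):

```python
import numpy as np

def fourier_transform(f, x, u):
    """Trapezoid approximation of phi(u) = integral of exp(iux) f(x) dx over the grid x."""
    w = np.exp(1j * np.outer(u, x)) * f(x)
    dx = x[1] - x[0]
    return np.sum(0.5 * (w[:, 1:] + w[:, :-1]) * dx, axis=1)

x = np.linspace(-10.0, 10.0, 4001)
u = np.array([0.0, 0.5, 1.0, 2.0])
pdf = lambda s: np.exp(-0.5 * s**2) / np.sqrt(2.0 * np.pi)
phi = fourier_transform(pdf, x, u)   # should be close to exp(-u^2/2)
```

The imaginary part comes out numerically zero here because the density is symmetric around the origin.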
Fourier transforms have some properties that make them invaluable tools for
solutions of differential equations, the most important being that the transform
is a linear operator

F[af1 + bf2 ](u) = aF[f1 ](u) + bF[f2 ](u)

and that the Fourier transform of a derivative is given by

 
F[dⁿf(x)/dxⁿ](u) = (iu)ⁿ F[f](u)

if all derivatives up to order n decay to zero for large |x|.


To illustrate the point, consider the BS PDE in terms of the logarithms
x = log S, namely

∂f(t, x)/∂t = α ∂f(t, x)/∂x + ½σ² ∂²f(t, x)/∂x² − r f(t, x)
If we apply the Fourier transform (with respect to x) on both sides, then the left-hand side becomes

F[∂f(t, x)/∂t](u) = ∫_R exp(ixu) ∂f(t, x)/∂t dx = ∂/∂t ∫_R exp(ixu) f(t, x) dx = ∂φ(t, u)/∂t

while the right-hand side yields

F[(α ∂/∂x + ½σ² ∂²/∂x² − r) f(t, x)](u) = (α(iu) + ½σ²(iu)² − r) φ(t, u)

Therefore, by applying the Fourier transform we actually transformed a complicated second-order PDE into a simple first-order ODE, which has a straightforward solution

∂φ(t, u)/∂t = (αiu − ½σ²u² − r) φ(t, u)
⟹ φ(t, u) = φ(0, u) · exp(αiut − ½σ²u²t − rt)

with φ(0, u) the initial condition.

CHARACTERISTIC FUNCTIONS
If f is a probability density function that measures a random variable, say X(t),
then its Fourier transform is called the characteristic function of the random
variable. It is also convenient to represent the characteristic function as an
expectation, namely
φ(t, u) = E exp(iuX(t))
Characteristic functions are typically covered in most statistics textbooks; a good reference for characteristic functions and their properties is Kendall and Stuart (1977, ch. 4). Since functions and their Fourier transforms uniquely define each other, the


characteristic function will have enough information to uniquely define the prob-
ability distribution of the random variable. In particular, the inverse Fourier
transform will determine the probability density function.
In many cases it is tractable to solve for the characteristic function of a
random variable or a process, rather than the probability density itself. A large
and very flexible class of processes, the Lévy processes, are in fact defined
through their characteristic functions.
Characteristic functions have more important properties. By taking deriva-
tives at the origin u = 0, one can retrieve successive moments of the random
variable, as
E[X(t)ⁿ] = i⁻ⁿ [∂ⁿφ(t, u)/∂uⁿ]_{u=0}

This means that qualitative properties of the distribution, such as the volatility,
skewness and kurtosis can be ascertained directly from the characteristic func-
tion. In addition, it becomes straightforward to implement calibration methods
that are based on the moments.
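This property is easy to verify numerically. The sketch below (Python, illustrative only, not from the notes) approximates the derivatives of a Gaussian characteristic function at u = 0 with central differences and recovers the first two raw moments:

```python
import numpy as np
from math import comb

def moment_from_cf(phi, n, h=1e-3):
    """n-th raw moment via i^(-n) * d^n phi/du^n at u = 0 (central differences)."""
    d = sum((-1) ** k * comb(n, k) * phi((n / 2 - k) * h) for k in range(n + 1))
    return ((1j) ** (-n) * d / h**n).real

mu, sigma = 0.3, 0.8
cf = lambda u: np.exp(1j * mu * u - 0.5 * sigma**2 * u**2)
m1 = moment_from_cf(cf, 1)   # should be close to mu
m2 = moment_from_cf(cf, 2)   # should be close to mu^2 + sigma^2
```

Central moments such as the variance, and hence skewness and kurtosis, then follow from the raw moments in the usual way.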
The characteristic function has the property φ(t, −u) = φ̄(t, u), with z̄ denoting the complex conjugate of z. Thus, the real part is an even function of u, while the imaginary part is odd. This is in line with the fact that the probability density is a real valued function, since to achieve that when integrating over the real line the imaginary parts must cancel out. One can use this property to write the Fourier inversion that recovers the probability density function as

f(t, x) = (1/π) ∫₀^∞ Re[exp(−iux) φ(t, u)] du
The cumulative density function is of course (with 𝟙(·) the indicator function)

F(t, x) = P[X(t) ≤ x] = E[𝟙(X(t) ≤ x)] = ∫_{−∞}^x f(t, s) ds

It is also possible to recover the cumulative density function from the characteristic function

F(t, x) = 1/2 + (1/2π) ∫₀^∞ [exp(iux)φ(t, −u) − exp(−iux)φ(t, u)]/(iu) du
        = 1/2 − (1/π) ∫₀^∞ Re[exp(−iux)φ(t, u)/(iu)] du

THE “DAMPENED” CUMULATIVE DENSITY


Computing the cumulative density as above can be very cumbersome, and the
approach does not lend itself naturally to the FFT techniques that we will
discuss later. One main drawback is the fact that the integrand diverges at zero,
rendering the numerical integration unstable in many cases. Here we present
a technique which allows us to rewrite the cumulative density as a Fourier

transform that is well defined. We will use exactly the same trick in the next
section, in order to compute a call option price as a single and numerically
tractable Fourier transform.
We introduce the damping factor η > 0, and define the dampened cumulative probability as

F^η(t, x) = exp(−ηx) P[X(t) ≤ x]

It is possible to derive the Fourier transform of this function, say φ^η(t, u), as follows

φ^η(t, u) = ∫_R exp(iux) F^η(t, x) dx = ∫_R exp(iux) exp(−ηx) P[X(t) ≤ x] dx = ∫_R ∫_{−∞}^x exp(iux − ηx) f(t, z) dz dx

The order of integration can be reversed as follows (details on how exactly this is carried out can be found in the next section, where the same approach is implemented in an option pricing framework)

φ^η(t, u) = ∫_R ∫_z^∞ exp((iu − η)x) f(t, z) dx dz = ∫_R [exp((iu − η)z)/(η − iu)] f(t, z) dz = φ(t, u + iη)/(η − iu)
We can therefore compute the cumulative probability by “un-damping” this transform, in effect computing

F(t, x) = exp(ηx) F^η(t, x) = (exp(ηx)/2π) ∫_R exp(−iux) φ^η(t, u) du

The choice of η is important, as it will determine how accurate numerical


implementations will be. A small value for η will eliminate the singularity theo-
retically, but it might not reduce its impact around zero sufficiently for numerical
purposes. If η is too large, then the characteristic function can be pushed towards
zero, which will not allow us to accurately reconstruct its shape and integrate it
with precision. Typically, a value of η in the region of 1 to 5 gives satisfactory
results. In figure 4.1 one can see the impact of the damping parameter η on the
function to be integrated. As we move from η ≈ 0 to η = 1 the function becomes
progressively better behaved, and therefore easier to numerically integrate. But
if we keep damping we run the risk of pushing the whole integrand close to zero,
as it is illustrated when we set η = 10.
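The whole recipe can be checked numerically for the standard normal, whose CDF is known in closed form through the error function. The following Python sketch is based on the expressions above, with an assumed damping value η = 1.5 (nothing here is from the notes):

```python
import numpy as np
from math import erf, sqrt

def cdf_via_damping(x, eta=1.5, L=40.0, n=200001):
    """Standard-normal CDF recovered from the dampened transform
    phi_eta(u) = phi(u + i*eta)/(eta - i*u), then un-damped by exp(eta*x)."""
    u = np.linspace(-L, L, n)
    phi = np.exp(-0.5 * (u + 1j * eta) ** 2)          # normal cf at u + i*eta
    integrand = np.exp(-1j * u * x) * phi / (eta - 1j * u)
    du = u[1] - u[0]
    val = np.sum(0.5 * (integrand[1:] + integrand[:-1])) * du
    return float((np.exp(eta * x) * val / (2.0 * np.pi)).real)

x0 = 0.3
approx = cdf_via_damping(x0)
exact = 0.5 * (1.0 + erf(x0 / sqrt(2.0)))
```

Note that the integrand is perfectly regular at u = 0, which is exactly what the damping buys us.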

4.2 OPTION PRICING USING TRANSFORMS


It is also possible to recover European option prices from the characteristic
function of the log-return. We assume that under the risk neutral measure, the
logarithm of the price satisfies



FIGURE 4.1: Damping the Fourier transform to avoid the singularity at the origin. The integrand for the normal inverse Gaussian distribution with parameters {µ, α, β, δ} = {8%, 7.00, −3.50, 0.25} is presented for different values of the damping parameter η. The real (imaginary) part is given in blue (green). The dashed thick line gives the integrand for η = 0.01 ≈ 0, which diverges at zero. The solid thick line presents the integrand for η = 1, while the solid thin line assumes η = 10. Two different horizons, of one day and one month, are presented to illustrate the change in the tail behavior as the maturity is decreased. (a) t = 1/365; (b) t = 30/365.

log S(T ) = log S(0) + X(T ) (4.2)

We specify that this relationship holds under the risk neutral measure because it is most likely that the market we are working on is incomplete. If this is the case, then we are not able to specify the no-arbitrage prices of options solely on the information embedded in (4.2) that is specified under P. We need a change of measure technique,² as well as a number of preference parameters that will allow us to determine the equivalent measure Q. In order to sidestep these issues we can assume that the process in (4.2) is defined under Q. The only constraint that must be imposed is that

B(T) · E[S(T)] = S(0) ⟹ E[exp(X(T))] = 1/B(T)
If we assume that the characteristic function φ(T , u) of log S(T ) is given to us,
then the above constraint can be expressed as a constraint on the characteristic
function, that is to say
φ(T, −i) = 1/B(T)
There have been two methods that compute European calls and puts through
the characteristic function. Following the seminal work of Bakshi and Madan
² For example Girsanov’s theorem (Øksendal, 2003), or the Esscher transform (Gerber and Shiu, 1994), can be used to define equivalent martingale measures.

(2000) the call/put option price is expressed in a form that resembles the Black-
Scholes formula, for example
Pcall = S(0)Π1 − K B(T )Π2

where the quantities Π1 and Π2 depend on the particular characteristic function.


This approach offers the same intuition as the Black-Scholes formula, where Π 1
is the option delta, and Π2 is the risk neutral probability of exercise.
Although the above expression is elegant and intuitive, it does not lend itself to numerical implementation. More recently, Carr and Madan (1999)
develop the Fourier transform of the (modified) European option price directly,
by expressing (with k = log K the log-strike)
Pcall = exp(−ηk) · F −1 [ψ(T , u; η)](k)

where ψ(t, u) is a function of the characteristic function φ(t, u), and η is an


auxiliary dampening parameter.

THE DELTA-PROBABILITY DECOMPOSITION


The Delta-Probability decomposition of the European price has its roots in the
work of Heston (1993) on stochastic volatility, although in Heston’s original
paper the decomposition is not proved in its generality. Bakshi and Madan
(2000) provide a general approach where derivative payoffs are spanned using
trigonometric functions. Here we will provide a heuristic proof for the special
case of a European call option (see also Heston and Nandi, 2000, for details).
Assuming that the probability density function under the risk neutral measure
Q for the time t log-price is f(t, x), we can write the European call option price
as the expected value of its payoffs, as

Pcall = B(T) · E max(S(T) − K, 0)
      = B(T) ∫_{log K}^∞ (exp(x) − K) f(T, x) dx
      = B(T) ∫_{log K}^∞ exp(x) f(T, x) dx − B(T) K ∫_{log K}^∞ f(T, x) dx

The second integral is just the probability P[log S(T ) > log K ], and since
φ(t, u) is the characteristic function of log S(T ) this will be equal to
Π2 = ∫_{log K}^∞ f(T, x) dx = 1/2 + (1/π) ∫₀^∞ Re[exp(−iu log K) φ(T, u)/(iu)] du
To compute the first integral we use the trick of multiplying and dividing
the expression as follows
∫_{log K}^∞ exp(x) f(T, x) dx = [∫_{log K}^∞ exp(x) f(T, x) dx / ∫_{−∞}^∞ exp(x) f(T, x) dx] · ∫_{−∞}^∞ exp(x) f(T, x) dx    (4.3)

* +-,/.0+213-4 567.0+98:84<; =>56@?>4A21/40=>.CB@D>3:5:3<=E? =:=-1GFIH:H<J:J:JKBL=-?>4<1+2; M:=N 30AOBP;>40= H


 !"#%$& '()"  114(4.2)
Note that the quantity ∫_{−∞}^∞ exp(x) f(T, x) dx = S(0)/B(T), due to the risk-neutrality restriction (4.1). Also, the fraction in the above expression is by construction between zero and one, therefore it can be interpreted as some probability. In particular, if we define

f⋆(T, x) = exp(x) f(T, x) / ∫_{−∞}^∞ exp(x) f(T, x) dx
then the fraction in (4.3) can be expressed as ∫_{log K}^∞ f⋆(T, x) dx. The Fourier transform of f⋆(T, x) is given by

φ⋆(T, u) = ∫_R exp(iux) f⋆(T, x) dx = φ(T, u − i)/φ(T, −i)

We can now define the quantity


Π1 = ∫_{log K}^∞ f⋆(T, x) dx = 1/2 + (1/π) ∫₀^∞ Re[exp(−iu log K) φ(T, u − i)/(iu φ(T, −i))] du

Putting everything together will yield the European call option price, which
has the same structure as the Black-Scholes formula, where instead of the cu-
mulative normal values we have Π1 and Π2 . To summarize

Pcall = S(0)Π1 − K B(T )Π2


where

Π1 = 1/2 + (1/π) ∫₀^∞ Re[exp(−iu log K) φ(T, u − i)/(iu φ(T, −i))] du
Π2 = 1/2 + (1/π) ∫₀^∞ Re[exp(−iu log K) φ(T, u)/(iu)] du
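The decomposition can be checked numerically against the Black-Scholes formula by plugging in the Gaussian characteristic function of log S(T). The Python sketch below uses illustrative parameter values and a plain trapezoid rule (the notes work in Matlab):

```python
import numpy as np
from math import erf, exp, log, sqrt, pi

S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 0.5
a = r - 0.5 * sigma**2
cf = lambda u: np.exp(1j * u * (log(S0) + a * T) - 0.5 * sigma**2 * T * u**2)

u = np.linspace(1e-8, 200.0, 400001)   # start just above the singular point u = 0
du = u[1] - u[0]
k = log(K)
trap = lambda y: np.sum(0.5 * (y[1:] + y[:-1])) * du

Pi2 = 0.5 + trap(np.real(np.exp(-1j * u * k) * cf(u) / (1j * u))) / pi
Pi1 = 0.5 + trap(np.real(np.exp(-1j * u * k) * cf(u - 1j) / (1j * u * cf(-1j)))) / pi
price = S0 * Pi1 - K * exp(-r * T) * Pi2

# closed-form Black-Scholes for comparison
N = lambda d: 0.5 * (1.0 + erf(d / sqrt(2.0)))
d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
d2 = d1 - sigma * sqrt(T)
bs = S0 * N(d1) - K * exp(-r * T) * N(d2)
```

In the Gaussian case Π1 and Π2 reproduce N(d1) and N(d2), so the two prices agree to the accuracy of the quadrature.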

THE FOURIER TRANSFORM OF THE MODIFIED CALL


The Delta-probability decomposition gives an intuitive expression for the value of the European call, but it is not very efficient operationally, since the integrals required are not defined at the point u = 0. More recently, Carr and Madan (1999) developed the characteristic function of a modified price itself. In particular, if we introduce a parameter η, we can define the modified call price as³

P^η_call(k) = exp(ηk) Pcall(k)

which we consider a function of the log-strike price k = log K. We assume that the maturity of the option is fixed at T. The Fourier transform of the modified call, say ψ^η(T, u) = F[P^η_call](u), is given by
³ We need the parameter η to modify the original call price, since the original call price is not square integrable.

FIGURE 4.2: The change in the order of integration: the region (k, x) ∈ (−∞, +∞) × (k, +∞) is equivalent to (x, k) ∈ (−∞, +∞) × (−∞, x).

ψ^η(T, u) = ∫_R exp(iuk) P^η_call(k) dk = B(T) ∫_R exp(iuk) exp(ηk) [∫_k^∞ (exp(x) − exp(k)) f(T, x) dx] dk

We will change the order of integration, and therefore the integration limits
will change from (k, x) ∈ (−∞, +∞) × (k, +∞) to (x, k) ∈ (−∞, +∞) × (−∞, x),
as shown in figure 4.2. Then
ψ^η(T, u) = B(T) ∫_R ∫_{−∞}^x exp(iuk + ηk + x) f(T, x) dk dx − B(T) ∫_R ∫_{−∞}^x exp(iuk + ηk + k) f(T, x) dk dx

Now since η ≠ 0, both inner integrals will vanish as k → −∞ (precisely the reason we introduced this parameter), and the Fourier transform becomes

ψ^η(T, u) = B(T) ∫_R [1/(iu + η) − 1/(iu + η + 1)] exp((iu + η + 1)x) f(T, x) dx
          = B(T) φ(T, u − i(η + 1)) / [(iu + η)(iu + η + 1)]
This is a closed form expression of the Fourier transform of the modified call
price, in terms of the characteristic function of the log-price. Therefore, to retrieve


LISTING 4.1:
phi_normal.m: Characteristic function of the normal distribution.
% phi_normal.m
function y = phi_normal(u, p)
t     = p.t;
r     = p.r;
sigma = p.sigma;
a = r - 0.5*sigma^2;
y = exp(i*t*a*u - 0.5*t*sigma^2*u.^2);

the original call price we just need to apply the inverse Fourier transform on
ψ η (T , u)
Pcall(k) = exp(−ηk) F⁻¹[ψ^η](k) = (exp(−ηk)/2π) ∫_R exp(−iuk) ψ^η(T, u) du

Option prices are of course real numbers, and that implies that the Fourier transform ψ^η(T, u) must have odd imaginary and even real parts. Therefore we can simplify the pricing formula to

P_call(k) = [ exp(−ηk) / π ] ∫_0^∞ Re[ exp(−iuk) ψ^η(T, u) ] du        (4.4)

The choice of the parameter η determines how fast the integrand approaches zero. Admissible values for η are the ones for which |ψ^η(T, 0)| < ∞, which in turn implies that E[S(T)^{η+1}] < ∞, or equivalently that the (η + 1)-th moment of S(T) exists and is finite. For more information on the choice of η see Carr and Madan (1999) and Lee (2004b).
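As a numerical sanity check on formula (4.4), the following Python/NumPy sketch (a hypothetical translation of the Matlab approach in this chapter, with illustrative parameters that are not from the text) prices a Black-Scholes call by damped Fourier inversion and compares it with the closed form:

```python
import numpy as np
from math import log, sqrt, exp, erf, pi

# illustrative parameters (not from the text)
S0, K, r, sigma, T, eta = 1.0, 1.05, 0.05, 0.25, 0.5, 1.5
a = r - 0.5 * sigma**2                       # risk-neutral drift of the log-price

def phi(u):
    """Characteristic function of log S(T) in the Black-Scholes case."""
    return np.exp(1j * u * (log(S0) + a * T) - 0.5 * sigma**2 * T * u**2)

def psi(u):
    """Fourier transform of the modified call: B(T) phi(u-i(eta+1)) / ((iu+eta)(iu+eta+1))."""
    return exp(-r * T) * phi(u - 1j * (eta + 1)) / ((1j * u + eta) * (1j * u + eta + 1))

# equation (4.4), evaluated with a fine trapezoidal quadrature
k = log(K)
u = np.linspace(0.0, 200.0, 200001)
du = u[1] - u[0]
g = np.real(np.exp(-1j * u * k) * psi(u))
call_ft = exp(-eta * k) / pi * (g.sum() - 0.5 * (g[0] + g[-1])) * du

# Black-Scholes closed form for comparison
Ncdf = lambda t: 0.5 * (1 + erf(t / sqrt(2)))
d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
d2 = d1 - sigma * sqrt(T)
call_bs = S0 * Ncdf(d1) - K * exp(-r * T) * Ncdf(d2)
print(call_ft, call_bs)
```

The two prices agree to high accuracy, confirming that the damping by exp(ηk) leaves the recovered call price unchanged.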

4.3 AN EXAMPLE IN MATLAB


In the following subsections we will give examples of transform methods that are
based on the normal and the normal inverse Gaussian (NIG) distribution (see
for example Barndorff-Nielsen, 1998, for details).

THE CHARACTERISTIC FUNCTIONS


We want to set up a model under the risk neutral measure Q of the form

S(T ) = S(0) exp{aT + X(T )}

where X(T ) is a random variable with a given characteristic function. If we


assume that the interest rate is constant, then under Q

E[exp{aT + X(T)} | F(0)] = exp{rT}  ⇒  a = r − (1/T) log E[exp{X(T)} | F(0)]




LISTING 4.2: phi_nig.m: Characteristic function of the normal inverse Gaussian distribution.
% phi_nig.m
function y = phi_nig(u, p)
t     = p.t;
r     = p.r;
delta = p.delta;
alpha = p.alpha;
beta  = p.beta;
a = r - delta*sqrt(alpha^2-beta^2) + delta*sqrt(alpha^2-(beta+1)^2);
y = exp(i*t*a*u + t*delta*(sqrt(alpha^2-beta^2) - sqrt(alpha^2-(beta+i*u).^2)));

The expectation can be cast in terms of the characteristic function of X(T), giving the constraint

a = r − (1/T) log φ(T, −i)

This constraint will ensure that under risk-neutrality the asset grows at the same rate as the risk-free asset.
The characteristic function of the normal distribution, implemented in listing 4.1, is given by

φ(t, u) = exp( itau − ½ tσ²u² )

for a = r − ½σ². The characteristic function of the NIG distribution is given in listing 4.2

φ(t, u) = exp( itau + tδ√(α² − β²) − tδ√(α² − (β + iu)²) )

In this case the parameter a = r − δ√(α² − β²) + δ√(α² − (β + 1)²).
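Both characteristic functions, with the martingale correction for a built in, can be sketched in Python/NumPy as follows (a hypothetical translation of the Matlab listings; the parameter values are illustrative). The check φ(T, −i) = exp(rT) verifies the constraint derived above:

```python
import numpy as np

r, T = 0.05, 30 / 365                  # illustrative rate and maturity

def phi_normal(u, sigma=0.25):
    u = np.asarray(u, dtype=complex)
    a = r - 0.5 * sigma**2             # martingale correction
    return np.exp(1j * T * a * u - 0.5 * T * sigma**2 * u**2)

def phi_nig(u, alpha=7.0, beta=-3.5, delta=0.25):
    u = np.asarray(u, dtype=complex)
    g = np.sqrt(alpha**2 - beta**2)
    a = r - delta * g + delta * np.sqrt(alpha**2 - (beta + 1)**2)
    return np.exp(1j * T * a * u + T * delta * (g - np.sqrt(alpha**2 - (beta + 1j * u)**2)))

# martingale check: phi(T, -i) = E[exp X(T)] must equal exp(rT)
print(phi_normal(-1j), phi_nig(-1j), np.exp(r * T))
```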

NUMERICAL FOURIER INVERSION


Say that we are interested in inverting the characteristic function to produce
the probability density function or to compute option prices. To do so we need
the value, at the point x, of an integral of the form
f(t, x) = ∫_0^∞ exp(−iux) h(u) du

The integral will be approximated with a quadrature, and here we will use the
trapezoidal rule.



FIGURE 4.3: Numerical Fourier inversion using quadrature. The integral ∫ Re[exp(−iux)φ(T, u)]du is approximated, where φ(T, u) is the characteristic function of the normal distribution, with µ = 8%, σ = 25% and T = 30/365. The upper integration bound is ū = 50. Results for x = 5% and x = 15%, as well as Δu = 10 and Δu = 5 are given.
(a) x = 5%, Δu = 10  (b) x = 5%, Δu = 5  (c) x = 15%, Δu = 10  (d) x = 15%, Δu = 5

In particular, we start by truncating the interval [0, ∞), over which the characteristic function is integrated. We select a point ū that is large enough for the contribution of the integral after this point to be negligible. Then we discretize the interval [0, ū] into N subintervals with spacing Δu, that is we produce the points u = {u_j = jΔu : j = 0, . . . , N}. For a given maturity T we denote the integrand with h*(x, u) = exp(−iux)h(u), and produce the values at the grid points h_j(x) = h*(x, u_j). Then, the trapezoidal approximation to the integral is given by

∫_0^∞ h(x, u) du ≈ Σ_{j=0}^{N} h_j(x) Δu − ½ [ h_0(x) + h_N(x) ] Δu
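The trapezoidal rule above can be sketched in Python/NumPy (a hypothetical translation of the Matlab approach; the parameters are illustrative), recovering the normal density from its characteristic function:

```python
import numpy as np

mu, sigma, T = 0.08, 0.25, 30 / 365           # illustrative parameters
m, s = mu * T, sigma * np.sqrt(T)             # mean and std of the log-return
phi = lambda u: np.exp(1j * u * m - 0.5 * s**2 * u**2)

def density(x, ubar=400.0, du=0.05):
    """f(T, x) = (1/pi) * int_0^ubar Re[exp(-iux) phi(u)] du, by the trapezoidal rule."""
    u = np.arange(0.0, ubar + du, du)
    g = np.real(np.exp(-1j * u * x) * phi(u))
    return (g.sum() - 0.5 * (g[0] + g[-1])) * du / np.pi

x = 0.05
true_pdf = np.exp(-0.5 * ((x - m) / s)**2) / (s * np.sqrt(2 * np.pi))
print(density(x), true_pdf)
```

With a sufficiently large ū and a fine Δu the inverted density matches the true normal density to many digits.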

Therefore, in order to carry out the numerical integration, one has to make two ad hoc choices, namely the upper integration bound ū and the grid spacing Δu. Selecting ū can be guided by the speed of decay of the characteristic




LISTING 4.3: cf_int.m: Trapezoidal integration of a characteristic function.
% cf_int.m
p.t     = 30/365;
p.r     = 0.08;
p.sigma = 0.25;
Du = 1;                             % grid size over the char func
theta = [0:Du:100];                 % char func discretization
x = [-0.40:0.05:0.40];              % output grid for density
Nx = length(x);
pdf = zeros(Nx, 1);                 % keeps density values pdf(x)
for j = 1:Nx                        % loop over x-values
    % the integrand values
    y = exp(-i*x(j)*theta).*phi_normal(theta, p);
    y = real(y);
    I = sum(y) - 0.5*(y(1)+y(end)); % trapezoidal rule
    pdf(j) = I*Du/pi;               % the output
end

function. A good choice of Δu, on the other hand, can be a little trickier, since the quantity exp{iux} = cos(ux) + i sin(ux) will be oscillatory, with a frequency that increases with x. Figure 4.3 gives the quadrature approximations for the case of the normal distribution. The characteristic function corresponds to a density with mean µ = 8% p.a. and volatility σ = 20% p.a., and the maturity is one month, T = 30/365. We have set the upper quadrature bound to ū = 50, and use two different grid sizes, a "coarse" one Δu = 10 and a "fine" one Δu = 5. We investigate the integration for x = 5% and for x = 15%. In the first case the integrand is not oscillatory, and even the "coarse" approximation captures the integral fairly accurately. When x = 15% the integrand oscillates and a "finer" grid is required. This example illustrates that one must be careful when setting up numerical integration procedures that automatically select the values for ū and Δu.
In order to reconstruct the probability density function we need to repeat the procedure outlined above for different values of x. This is carried out in listing 4.3 for the normal distribution. The results are plotted in figure 4.4 in logarithmic scale for different values of ū and Δu. One can verify that if we are interested in the central part of the distribution, then a coarse grid is sufficient, while the results are not particularly sensitive to the choice of the upper integration bound. On the other hand, if we want to compute the probability density at the tails, then we need to implement a very fine grid over a large support interval.
There is a distinct and very important relationship between the fat tails and
the decay of the characteristic function. In particular, the higher the kurtosis
of the distribution, the slower the characteristic function decays towards zero
as u increases. This has some implications on the implementation of the nu-




FIGURE 4.4: Probability density function using Fourier inversion. Logarithmic plots of the integral f(T, x) = (1/π) ∫_0^∞ Re[exp(−iux)φ(T, u)]du, where φ is the normal characteristic function, together with the true normal density. The parameter set is {µ, σ, T} = {8%, 25%, 30/365}. Results for ū = 50 and ū = 100, as well as Δu = 10 and Δu = 1 are given.
(a) ū = 50, Δu = 10  (b) ū = 100, Δu = 10  (c) ū = 50, Δu = 1  (d) ū = 100, Δu = 1

merical integration, as we must be ready to integrate over a very long support.


On the other hand, the oscillations introduced by the complex exponential do
not depend on the characteristic function itself, therefore we will need a fine
grid to accurately compute the density around the tails. This indicates that we
will potentially have to carry out numerical quadratures with many hundreds of
thousands of grid points.
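The link between fat tails and slow decay of the characteristic function is easy to verify numerically. The following Python/NumPy fragment (an illustration, not from the text) compares the moduli of the normal and NIG characteristic functions at u = 100, with the normal volatility matched to the NIG's 23.4%:

```python
import numpy as np

T, u = 30 / 365, 100.0

# modulus of the normal CF (the drift term has modulus one and is ignored)
sigma = 0.234                                  # matched to the NIG's 23.4% volatility
mod_norm = np.exp(-0.5 * T * sigma**2 * u**2)

# modulus of the NIG CF with the parameters used in the text
alpha, beta, delta = 7.0, -3.5, 0.25
mod_nig = abs(np.exp(T * delta * (np.sqrt(alpha**2 - beta**2)
                                  - np.sqrt(alpha**2 - (beta + 1j * u)**2))))
print(mod_norm, mod_nig)
```

At u = 100 the normal characteristic function is already numerically zero, while the NIG one is still of order 10⁻¹, which is why the NIG inversion needs a much longer integration interval.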
This is illustrated with the NIG distribution. We consider a parameter set
{µ, T , α, β, δ} = {8%, 30/365, 7.00, −3.50, 0.25}. Taking derivatives of the char-
acteristic function gives the volatility, skewness and excess kurtosis, which are
23.4%, −1.21 and 3.96 respectively. Therefore the distribution exhibits negative
skewness and excess kurtosis of a magnitude that is observed in asset returns.
We want to investigate the stability of the Fourier inversion, and figure 4.5 gives
the results for different combinations of 4u and ū. Here we can clearly see the
effect of different choices of these parameters. As we increase the integration
interval the oscillations in the probability function are reduced, but the function



FIGURE 4.5: Probability density function using Fourier inversion. Logarithmic plots of the integral f(T, x) = (1/π) ∫_0^∞ Re[exp(−iux)φ(T, u)]du, where φ is the normal inverse Gaussian characteristic function. The parameter set is {µ, α, β, δ, T} = {8%, 7.00, −3.50, 0.25, 30/365}. Results for Δu = 10 and ū = 100 (green), ū = 200 (blue) and ū = 400 (red). The density for Δu = 1 and ū = 400 (black) is also given.

can still be inaccurate around the tails. We need to reduce the grid size to increase the overall accuracy. Observe that the right tail is slightly oscillatory even when {Δu, ū} = {1, 400}.

4.4 APPLYING FAST FOURIER TRANSFORM METHODS
In the previous section we implemented a numerical integration method that
approximates an integral of the form
y(x) = ∫_0^∞ exp(−iux) h(u) du

This integral can then be used to retrieve the probability density function at
the point x, or a European call option price with log-strike price x. Typically,
we want to compute the integral for many different values of the parameter x,
in order to reconstruct the probability density function or the implied volatility
smile. Using the approach we outlined above, we must perform as many numerical integrations as the number of abscissas over x.




The Fast Fourier Transform (FFT) is a numerical procedure that simultaneously computes N sums of the form

z*_k = Σ_{j=1}^{N} exp( −(2πi/N)(j − 1)(k − 1) ) · z_j        (4.5)

for all k = 1, . . . , N. The number of operations needed for the FFT is of order O(N log N). For comparison, if we wanted to compute the above sums separately and independently it would take O(N²) operations, meaning that in order to double the number of points the number of operations would increase fourfold. With the FFT the computational burden increases a bit more than two times. This substantial speedup is the reason that has made the FFT popular in computational finance, since we typically need to evaluate thousands of Fourier inversions when calibrating models to observed volatility surfaces.
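That the FFT computes exactly the sums in (4.5) can be checked in Python/NumPy against a brute-force O(N²) evaluation (an illustration with random input):

```python
import numpy as np

N = 64
rng = np.random.default_rng(0)
z = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# direct O(N^2) evaluation of the sums in (4.5), with zero-based indices
idx = np.arange(N)
direct = np.array([np.sum(np.exp(-2j * np.pi * idx * k / N) * z) for k in range(N)])
fast = np.fft.fft(z)
print(np.max(np.abs(direct - fast)))
```

The two results agree to machine precision, while the FFT evaluation scales as O(N log N).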
The input of the FFT is a vector z ∈ CN , and the output is a new vector
z ? ∈ CN . Each element of z ? will keep the sum for the corresponding value of
k. Our task is therefore to cast the integral approximation in a form that can be
computed using FFT.
The first step is of course to discretize the interval [0, ū] using N equidistant points, say u = {(j − 1)Δu : j = 1, . . . , N}. The trapezoidal approximation to the integral is then given by

½ exp(−iu₁x)h(u₁)Δu + exp(−iu₂x)h(u₂)Δu + exp(−iu₃x)h(u₃)Δu + · · ·
+ exp(−iu_{N−1}x)h(u_{N−1})Δu + ½ exp(−iu_N x)h(u_N)Δu

Thus, if we set α = (½, 1, 1, . . . , 1, ½)′ and h_j = h(u_j), we can write the approximation as the sum

Σ_{j=1}^{N} exp(−iu_j x) α_j h_j Δu

Since the FFT will also return an (N × 1) vector, we should set the procedure to produce values for a set x = {x₁ + (k − 1)Δx : k = 1, . . . , N}. We typically want these values to be symmetric around zero,⁴ and therefore we can set x₁ = −(N/2)Δx. The approximating sum for these values of x will therefore be given by
⁴ When we invert to construct a probability density we are typically interested in the density at log-returns symmetric around the peak, which will be close to zero. If we invert for option pricing purposes, we can normalize the current price to one. Then each value of y will correspond to a log-strike price, and we typically want to retrieve option prices which are in-, at- and out-of-the-money. The at-the-money level will be around the current log-price, which is of course zero.



y_k = Σ_{j=1}^{N} exp(−iu_j x_k) α_j h_j Δu
    = Σ_{j=1}^{N} exp( −i(j − 1)(k − 1)ΔuΔx ) exp( −i(j − 1)x₁Δu ) α_j h_j Δu
    = Σ_{j=1}^{N} exp( −i(j − 1)(k − 1)ΔuΔx ) z_j

where z_j = exp(−ix₁u_j) α_j h_j Δu. The sum above will be of the FFT form (4.5) only if ΔuΔx = 2π/N. This sets a constraint on the relationship between the characteristic function input grid size and the output log-return or log-strike grid size. This completes the integral approximation; it is now straightforward to invert for the probability density function or for call prices.

FFT INVERSION FOR THE PROBABILITY DENSITY FUNCTION

To summarize, in order to invert the characteristic function we need to take the following steps (with ⊙ we denote element-by-element vector multiplication):
1. Input the grid sizes Δu and Δx, as well as the number of integration points N. Make sure that they satisfy ΔuΔx = 2π/N.
2. Construct the vectors u = {(j − 1)Δu : j = 1, . . . , N} and x = {−(N/2)Δx + (k − 1)Δx : k = 1, . . . , N}.
3. Compute the vector z = Δu exp(−ix₁u) ⊙ φ(T, u).
4. For the trapezoidal rule set z₁ = z₁/2 and z_N = z_N/2.
5. Run the FFT on z, that is z* = FFT(z).
6. Compute the density function values y = (1/π) Re[z*].
7. Output the pair (x, y): the value y_k is the probability density for the log-return x_k, for k = 1, . . . , N.
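The seven steps above can be sketched in Python/NumPy (an illustration with hypothetical parameters; the Δu weight is folded into z as in step 3):

```python
import numpy as np

# FFT grids obeying du * dx = 2*pi/N
N, du = 1024, 0.25
dx = 2 * np.pi / (N * du)
j = np.arange(N)
u = j * du
x = -N * dx / 2 + j * dx

# normal characteristic function of the log-return
mu, sigma, T = 0.08, 0.25, 30 / 365
mT, sT = mu * T, sigma * np.sqrt(T)
phi = np.exp(1j * u * mT - 0.5 * sT**2 * u**2)

z = du * np.exp(-1j * x[0] * u) * phi        # step 3, with the du weight
z[0] *= 0.5
z[-1] *= 0.5                                 # step 4: trapezoidal end weights
y = np.real(np.fft.fft(z)) / np.pi           # steps 5 and 6

k = int(np.argmin(np.abs(x - mT)))           # grid point nearest the mean
true_pdf = np.exp(-0.5 * ((x[k] - mT) / sT)**2) / (sT * np.sqrt(2 * np.pi))
print(y[k], true_pdf)
```

A single FFT call returns the density on the whole x grid at once.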

FFT INVERSION FOR EUROPEAN CALL OPTION PRICING

The inversion of the characteristic function to compute option prices is very similar. We just need to also compute the Fourier transform of the modified call before invoking the FFT. The steps are the following (with ⊙ we denote element-by-element vector multiplication and with ⊘ element-by-element division):
1. Input the grid sizes Δu and Δx, as well as the number of integration points N. Make sure that they satisfy ΔuΔx = 2π/N. Also input the "dampening parameter" η for the modified call.
2. Construct the vectors u = {(j − 1)Δu : j = 1, . . . , N} and x = {−(N/2)Δx + (k − 1)Δx : k = 1, . . . , N}.
3. Construct the Fourier transform of the modified call

ψ = exp(−rT) φ(T, u − i(η + 1)) ⊘ [ (iu + η) ⊙ (iu + η + 1) ]




LISTING 4.4: fft_call.m: Call pricing using the FFT.
% fft_call.m
function [x, y] = fft_call(cf, pcf, pf)
% values for FFT implementation
h = pf.eta; N = pf.N; ubar = pf.ubar;
% parameter values
r = pcf.r; t = pcf.t;
Du = ubar/(N-1); Dx = 2*pi/N/Du;          % grid sizes
xbar = -N*Dx/2;
u = [0:N-1]*Du; x = [0:N-1]*Dx + xbar;
% compute the psi-function
z = feval(cf, u-i*(h+1), pcf);
z = exp(-r*t)*z./(h+h^2-u.^2+i*(2*h+1)*u);
z = exp(-i*u*xbar).*z*Du;                 % integrand
% trapezoidal rule
z(1) = 0.5*z(1); z(end) = 0.5*z(end);
w = real(fft(z));                         % FFT
y = exp(-h*x).*w/pi;                      % output

4. Compute the vector z = Δu exp(−ix₁u) ⊙ ψ.
5. For the trapezoidal rule set z₁ = z₁/2 and z_N = z_N/2.
6. Run the FFT on z, that is z* = FFT(z).
7. Compute the option values y = (1/π) exp(−ηx) ⊙ Re[z*].
8. Output the pair (x, y): the value y_k is the price of the call option with log-strike price x_k, for k = 1, . . . , N.
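The call pricing recipe can be sketched the same way in Python/NumPy (an illustration with hypothetical parameters), checked against the Black-Scholes closed form at the grid strike nearest K = 1.05:

```python
import numpy as np
from math import log, sqrt, exp, erf

S0, r, sigma, T, eta = 1.0, 0.05, 0.25, 0.5, 1.5
N, du = 4096, 0.05
dx = 2 * np.pi / (N * du)                    # the constraint du * dx = 2*pi/N
j = np.arange(N)
u = j * du
x = -N * dx / 2 + j * dx                     # log-strike grid, symmetric around zero

a = r - 0.5 * sigma**2
phi = lambda v: np.exp(1j * v * a * T - 0.5 * sigma**2 * T * v**2)
psi = exp(-r * T) * phi(u - 1j * (eta + 1)) / ((1j * u + eta) * (1j * u + eta + 1))

z = du * np.exp(-1j * x[0] * u) * psi        # step 4, with the du weight
z[0] *= 0.5
z[-1] *= 0.5                                 # trapezoidal end weights
calls = np.exp(-eta * x) * np.real(np.fft.fft(z)) / np.pi

# Black-Scholes closed form at the grid strike nearest K = 1.05
Ncdf = lambda t: 0.5 * (1 + erf(t / sqrt(2)))
k = int(np.argmin(np.abs(x - log(1.05))))
K = exp(x[k])
d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
d2 = d1 - sigma * sqrt(T)
bs = S0 * Ncdf(d1) - K * exp(-r * T) * Ncdf(d2)
print(calls[k], bs)
```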
The inversion of the Fourier transform for the modified call is implemented in listing 4.4, as fft_call(cf, pcf, pf), in order to retrieve a set of log-strike prices and the corresponding call prices. One needs to specify the corresponding characteristic function cf, for example phi_normal or phi_nig. The parameters for the characteristic function are passed through the structure pcf, while parameters for the FFT inversion are passed through pf.

4.5 THE FRACTIONAL FFT


The restriction ΔuΔx = 2π/N that has to be satisfied when applying the FFT hampers the flexibility of the method. One naturally wants a fine grid when integrating over the characteristic function, but a small Δu can result in very coarse output grids. For example, a 512-point integration over the interval [0, 100] would offer a good approximation to invert for the normal distribution of figure 4.4. This implies Δu = 0.1957, and in order for the FFT to be applied we have to set Δx = 2π/(NΔu) = 0.0627, with x₁ = −16.05. This means that only a very



LISTING 4.5: frft.m: Fractional FFT.
% frft.m
function f = frft(x, a)
N  = size(x, 2);
e1 = exp(-pi*i*a*([0:N-1].^2));
e2 = exp( pi*i*a*([N:-1:1].^2));
z1 = [x.*e1, zeros(1,N)];
z2 = [1./e1, e2];
fz1 = fft(z1);
fz2 = fft(z2);
fz  = fz1.*fz2;
ifz = ifft(fz);
f   = e1.*ifz(1,1:N);

small number of the 512 output values are actually within the ±30% which we
might be interested in.
One way to obtain a finer output grid is to increase the FFT size N. We have chosen the upper integration bound in a way that the characteristic function is virtually zero outside the interval. Therefore, when we increase N we just "pad with zeros" the input vector z. For example, if we append the 512-point vector with 7680 zeros we will implement an 8192-point FFT, which will return a more acceptable output grid spacing of 0.0039. But of course applying an FFT which is 16 times longer will have a serious impact on the speed of the method.
The fractional FFT method, outlined in Chourdakis (2005), addresses this issue. The fractional FFT (FRFT) with parameter α will compute the more general expression

z*_k = Σ_{j=1}^{N} exp( −2πiα(j − 1)(k − 1) ) · z_j        (4.6)

for all k = 1, . . . , N. In order to implement an N-point FRFT one needs to invoke three times a standard 2N-point FFT,⁵ but the freedom of selecting Δu and Δx independently can actually improve the speed for a given degree of accuracy.
In particular, the following steps implement an N-point FRFT, coded in listing 4.5:
1. Create the (N × 1) vectors ε₁ and ε₂

ε₁ = { exp(−iπα(j − 1)²) : j = 1, . . . , N }
ε₂ = { exp(+iπα(N − j + 1)²) : j = 1, . . . , N }

2. Based on these auxiliary vectors create the two (2N × 1) vectors z₁ and z₂
⁵ For proofs and discussion on the FRFT also see Bailey and Swarztrauber (1991, 1994).




LISTING 4.6: frft_call.m: Call pricing using the FRFT.
% frft_call.m
function [x, y] = frft_call(cf, pcf, pf)
% values for FRFT implementation
h = pf.eta; N = pf.N; ubar = pf.ubar; xbar = pf.xbar;
% parameter values
r = pcf.r; t = pcf.t;
Du = ubar/(N-1); Dx = 2*xbar/N;           % grid sizes
xbar = -N*Dx/2;
alpha = Du*Dx/2/pi;                       % fractional parameter
u = [0:N-1]*Du; x = [0:N-1]*Dx + xbar;
% compute the psi-function
z = feval(cf, u-i*(h+1), pcf);
z = exp(-r*t)*z./(h+h^2-u.^2+i*(2*h+1)*u);
z = exp(-i*u*xbar).*z*Du;                 % integrand
% trapezoidal rule
z(1) = 0.5*z(1); z(end) = 0.5*z(end);
w = real(frft(z, alpha));                 % FRFT
y = exp(-h*x).*w/pi;                      % output

   
z ε1 1 ε1
z1 = and z 2 =
0 ε2

3. Apply the FFTs on these vectors

z ?1 = FFT(z 1 ) and z ?2 = FFT(z 2 )

4. The N-point FRFT will be the first N elements of the inverse FFT

z ? = ε1 IFFT(z ?1 z ?2 )
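The four steps can be translated into a short Python/NumPy function (a sketch paralleling the Matlab listing 4.5) and verified against a brute-force evaluation of (4.6):

```python
import numpy as np

def frft(z, a):
    """N-point fractional FFT of z: sum_j z_j exp(-2*pi*i*a*(j-1)*(k-1)),
    computed with three 2N-point FFTs (Bailey-Swarztrauber)."""
    N = len(z)
    j = np.arange(N)
    e1 = np.exp(-1j * np.pi * a * j**2)
    e2 = np.exp(1j * np.pi * a * np.arange(N, 0, -1)**2)
    z1 = np.concatenate((z * e1, np.zeros(N, dtype=complex)))
    z2 = np.concatenate((1.0 / e1, e2))
    return e1 * np.fft.ifft(np.fft.fft(z1) * np.fft.fft(z2))[:N]

# brute-force check of (4.6) with an arbitrary fractional parameter
N, a = 128, 0.0123
rng = np.random.default_rng(1)
z = rng.standard_normal(N) + 1j * rng.standard_normal(N)
jj = np.arange(N)
direct = np.array([np.sum(z * np.exp(-2j * np.pi * a * jj * k)) for k in range(N)])
print(np.max(np.abs(frft(z, a) - direct)))
```

The identity (j − 1)(k − 1) = [(j − 1)² + (k − 1)² − (k − j)²]/2 is what turns the sum into a circular convolution that the 2N-point FFTs can evaluate.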

We can now easily adapt the recipes of the previous section to accommodate the fractional FFT. We can now choose the two grid sizes freely, and set the fractional parameter α = ΔuΔx/(2π). Thus, we need to change the corresponding step of the recipes to:

Run the fractional FFT on z with fractional parameter ΔuΔx/(2π), that is z* = FRFT(z, ΔuΔx/(2π)).

Listing 4.6 implements the fractional FFT based option pricing. Chourdakis (2005) gives details on the accuracy of this method for option pricing, based on a number of experiments that compare the fractional to the standard FFT. Figure 4.6 gives an example that is based on the normal distribution. One can observe the exceptional accuracy of both methods: an 8192-point FFT is contrasted to a 512-point FRFT.



FIGURE 4.6: Comparison of the FFT and the fractional FFT, based on the Black-Scholes model. The figure shows the errors between option prices computed using the transform methods and their closed form values, for different strike prices. The blue line gives the errors of the standard FFT method, while the red line gives the errors of the fractional FFT. All values are ×10⁻¹⁵.

4.6 ADAPTIVE FFT METHODS AND OTHER TRICKS
There is a number of ways in which adaptive integration can be employed within
the fractional FFT framework. In particular, as the fractional FFT allows us to
integrate over an arbitrary region, it is natural to consider splitting the support of
the characteristic function into subintervals and apply the transform sequentially.
Also, we might consider improving the accuracy of each integration segment. In the previous sections we worked with the trapezoidal rule, essentially approximating

∫_{u₁}^{u_N} exp(−iux)h(u)du ≈ Σ_{j=1}^{N} exp(−iu_j x) α_j h_j Δu

for h_j = h(u_j) and α = (½, 1, 1, . . . , 1, ½)′. What we have done is approximate the whole integrand as a piecewise linear function. This integrand is the product of two terms: the first, exp(−iux), is a combination of trigonometric functions,




and will be highly oscillatory, especially for large values of |x|; the second, h(u),
is also oscillatory but typically very mildly and also independent of x.
It therefore makes sense to approximate only the second component as a piecewise linear function, and leave the first part intact. We therefore split the integral into N − 1 sub-integrals

∫_{u₁}^{u_N} exp(−iux)h(u)du = Σ_{j=1}^{N−1} ∫_{u_j}^{u_{j+1}} exp(−iux)h(u)du

We then use the linear approximation within each subinterval

h(u) ≈ h_j + (u − u_j)(h_{j+1} − h_j)/(u_{j+1} − u_j) = a_j + b_j u   for u_j ≤ u ≤ u_{j+1}

Thus, each subintegral can be computed as

∫_{u_j}^{u_{j+1}} exp(−iux)(a_j + b_j u)du = [ exp(−iux) ( (a_j + b_j u)/(−ix) − b_j/(−ix)² ) ]_{u=u_j}^{u_{j+1}}
  = (i/x) [ h_{j+1} exp(−iu_{j+1}x) − h_j exp(−iu_j x) ]
  + (b_j/x²) [ exp(−iu_{j+1}x) − exp(−iu_j x) ]
Luckily, the first square brackets cancel out sequentially as we sum over the sub-integrals, which gives us, after some straightforward algebra,

∫_{u₁}^{u_N} exp(−iux)h(u)du
  ≈ (i/x) [ h_N exp(−iu_N x) − h₁ exp(−iu₁ x) ] + (1/x²) Σ_{j=1}^{N} c_j exp(−iu_j x)

with c = (0 − b₁, b₁ − b₂, b₂ − b₃, . . . , b_{N−2} − b_{N−1}, b_{N−1} − 0). Recall that b_j = (h_{j+1} − h_j)/Δu. The sum in the above expression can now be computed using the fractional FFT procedure.
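The piecewise-linear (Filon-type) formula above can be checked numerically in Python/NumPy against a very fine trapezoidal reference, here with an illustrative smooth integrand h(u) = exp(−u²/2):

```python
import numpy as np

h = lambda u: np.exp(-0.5 * u**2)              # a smooth, mildly varying factor
u = np.linspace(0.0, 5.0, 201)                 # coarse grid: N = 201 points
du = u[1] - u[0]
x = 3.7                                        # a nonzero evaluation point

hv = h(u)
b = np.diff(hv) / du                                         # slopes b_j
c = np.concatenate(([0.0], b)) - np.concatenate((b, [0.0]))  # c_j = b_{j-1} - b_j
approx = (1j / x) * (hv[-1] * np.exp(-1j * u[-1] * x) - hv[0] * np.exp(-1j * u[0] * x)) \
         + np.sum(c * np.exp(-1j * u * x)) / x**2

# dense trapezoidal reference of the same integral
uu = np.linspace(0.0, 5.0, 200001)
g = np.exp(-1j * uu * x) * h(uu)
ref = (g.sum() - 0.5 * (g[0] + g[-1])) * (uu[1] - uu[0])
print(abs(approx - ref))
```

The only error in the closed formula is the linear-interpolation error of h, so the coarse 201-point grid already reproduces the fine reference closely, even though exp(−iux) oscillates within each subinterval.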
It might seem initially that the above expression will diverge as x −→ 0.
This is not the case. In fact, by twice applying l’Hôpital’s rule as needed, one
can show that the expression converges to the trapezoidal rule we obtained in
the previous section.
Listing 4.7 shows an implementation of this integration using the fractional FFT. This method is used directly in listing 4.8, which shows how an adaptive integration technique can be used to invert the characteristic function and recover the cumulative density.




LISTING 4.7: frft_integrate.m: Integration over an interval using the FRFT.
% frft_integrate.m
function y = frft_integrate(f, u, x)
du = u(2)-u(1);              % step of the support
dx = x(2)-x(1);              % step of the results
f0 = f(1); fN = f(end);      % first and last elements
x0 = x(1);
u0 = u(1); uN = u(end);
b  = diff(f)/du;             % b-points
c  = [0 b] - [b 0];          % c-points
h  = c.*exp(-i*x0*u);        % vector passed to FRFT
a  = 0.5*du*dx/pi;           % FRFT parameter
h1 = frft(h, a);
y  = i*(fN*exp(-i*uN*x) - f0*exp(-i*u0*x))./x;
y  = y + h1.*exp(-i*u0*(x-x0))./(x.^2);

4.7 SUMMARY
To summarize, for large classes of models closed form solutions are not available even for European style options, but the characteristic function is available in
closed form. For example, models where the logarithmic price is Lévy (Madan
et al., 1998; Carr, Geman, Madan, and Yor, 2002), Garch models (Heston and
Nandi, 2000), affine models (Heston, 1993; Duffie et al., 2000; Bates, 2000, 1998),
regime switching models (Chourdakis, 2002) or stochastic volatility Lévy models
(Carr, Geman, Madan, and Yor, 2003; Carr and Wu, 2004) fall within this category.
Fourier transform methods can be applied to recover numerically European
call and put prices from the Fourier transform of the modified call. Therefore
such models can be rapidly calibrated to a set of observed options contracts,
as we will investigate in the next chapter on volatility. The FFT method or
its fractional variant are well suited to perform this inversion. Also, one can
use these methods to invert the characteristic function itself, thus recovering
numerically the probability density function. This can in turn be used to set up
numerical procedures for pricing American style or other exotic contracts, for
example as in Andricopoulos, Widdicks, Duck, and Newton (2003).




LISTING 4.8: cfcdf.m: Transform a characteristic function into a cumulative density function.
% cfcdf.m
function [x, y, yv] = cfcdf(cf, p, x, varargin)
ErrTol  = 1e-8;                % acceptable CDF improvement
MaxIter = 1000;                % max number of iterations
a = 5;                         % dampening factor (value illegible in the source; any a>0 works)
K = 8; N = K;                  % FRFT points
A = linspace(-x, 0, N+1); A = A(1:end-1);
B = linspace( 0, x, N+1); B = B(2:end);
flag = 1; iter = 1;            % stopping flag
ucr = 0; umx = 0.25;           % integration interval
y = 0; yv = [];
while flag
    yBe = Bint(ucr, ucr+umx);  % get density over [0,+inf)
    yAe = Aint(ucr, ucr+umx);  % get density over (-inf,0]
    ye  = [yAe, yBe];
    y   = y + ye;
    yv  = [yv; y];
    if max(abs(ye)) < ErrTol || iter >= MaxIter
        flag = 0;
    end
    ucr  = ucr + umx;
    iter = iter + 1;
    umx  = umx*2;
end
y(N+1:end) = 1 - y(N+1:end);
x = [A, B];
% dampen and integrate over negatives
function z = Aint(x1, x2)
    u  = linspace(x1, x2, N);
    ua = u + i*a;
    fa = feval(cf, ua, p);
    fb = fa./(a - i*u);
    z  = frft_integrate(fb, u, A);
    z  = -exp(a*A).*real(z)/pi;
end
% dampen and integrate over positives
function z = Bint(x1, x2)
    u  = linspace(x1, x2, N);
    ua = u - i*a;
    fa = feval(cf, ua, p);
    fb = fa./(a + i*u);
    z  = frft_integrate(fb, u, B);
    z  = -exp(-a*B).*real(z)/pi;
end
end



5
Historical estimation and filtering

It is typical in many, if not all, financial applications to face models that depend
on one or more parameter values, which have to be somehow determined. For
example, if we are making the assumption that the stock price we are investi-
gating follows a homogeneous geometric Brownian motion, then we would be
interested in estimating the expected return and the corresponding volatility.
Then we could produce forecasts, option prices, confidence intervals and risk
measures for an investment on this asset.
At this point we must remind ourselves that not all of the above operations
are carried out under the same measure. This fact will largely determine which
data will be appropriate to facilitate a calibration method. Some parameters,
such as the drift in the Black-Scholes framework, are not the same under the
objective and the pricing measure, while some others, such as the volatility, are.
In particular, if our ultimate goal is pricing, we must place ourselves under
the pricing measure and use instruments that are also determined under the
same measure. In this way the prices that we produce will be consistent with
the prices that we use as inputs, and we will not leave any room for arbitrage.
The dynamics recovered under this data set will not be the real dynamics of the
underlying asset: instead, they will be consistent with the attitude of investors
against risk, and thus modified accordingly. In general, drifts will be lower, volatilities will be higher, and jumps will be more frequent and more severe. When pricing assets, investors behave as if this were the case, precisely because these are the scenarios that they dislike.
On the other hand, if our goal is forecasting or risk management, we are
interested in the real asset dynamics. We do not want the parameters to be
contaminated by risk aversion, and the appropriate data in this case would be
actual asset prices. Based on the real historical movements of assets we will
base our forecasts for their future behaviour.
Nevertheless, there are situations where we might want (or have to) use both
probability measures jointly. As derivative prices are forward looking we might
want to augment our information set with their prices, in order to produce more
accurate forecasts. From an “academic” point of view, since the distance between

FIGURE 5.1: Examples of density and likelihood functions. A sample of size N is drawn from the N(1.00, 2.00) distribution, presented with the green points on the horizontal axis. Three densities are also presented, together with the corresponding sample values. The blue curve gives the true N(1.00, 2.00), which gives a log-likelihood L(N = 10) = −19.37 and L(N = 50) = −101.54; the red curve gives the curve for N(0.00, 4.00), which is far from the true density and has a low log-likelihood L(N = 10) = −23.60 and L(N = 50) = −121.55; finally, the black curve is the one that maximizes the likelihood, N(0.08, 1.32) with L = −17.00 for N = 10, and N(0.83, 1.82) with L = −100.98 for N = 50.
(a) N = 10  (b) N = 50

the two probability measures depends on the risk premiums, we might want to
identify these premiums for different risk components. For instance, we might
want to quantify the price of volatility risk versus the price of jump risk. Finally,
in some situations we do not observe the underlying asset directly. This is the
case in fixed income markets, where we can attempt to identify the true dynamics
using time series of bonds which are evaluated under the pricing measure.
In this chapter we will focus on the case where calibration is carried out using
a time series of historical values. There is a plethora of methods available, but
we will focus on the most popular one, the maximum likelihood estimation (MLE)
technique. We will not focus on deriving the properties of MLE, but will rather
refer to Davidson and MacKinnon (1985) and Hamilton (1994). These books also
give a detailed analysis of variants of MLE, as well as of alternative methods of
moments. For an introduction to Bayesian techniques, a good starting point is
Zellner (1995).

5.1 THE LIKELIHOOD FUNCTION


Suppose that we have at our disposal a time series of observations, say x =
{x1 , . . . , xT }, and a model which we assume has produced these observations. We

will denote with large X the random variables that are produced by the model,[1]
and with small x the realizations that make up our sample. We will collect all
parameters of this model in a (K × 1) vector θ. Our objective is twofold: we
want to find an estimator θ̂ of θ which is based on our data set, but we also
want to produce some confidence intervals on θ̂, acknowledging the fact that
our data set is finite and thus our produced estimators are not equal to the true
parameters of the data generating process. The likelihood function is a measure
of fit that will allow us to fulfill these objectives.
To implement the likelihood function we scan through the sample, and pretend
that we are standing at each point in time. Suppose that we are currently at
the t-th observation, with 1 ≤ t ≤ T . Given a value of the parameter set, θ, we
produce the conditional density fXt+1 (x|θ, x1 , . . . , xt ), which we abbreviate with
ft+1|t (x). Notice that we only use the information that was available at time t.
Essentially we are asking the question: when we were at time t in the past, what
would our forecast for time t + 1 have looked like? Then we go to the next time period
t + 1 and see how good our forecasting density was: if we forecasted rather well,
then the value of the density ft+1|t (xt+1 ) will be high; if our forecasting density
was poor, then ft+1|t (xt+1 ) would be close to zero.
The value ft+1|t (xt+1 ) is the likelihood of the point xt+1 , seen as a function
of the parameter vector. To make this more explicit we introduce the notation
ft+1|t (θ; x) for this likelihood. In order to construct the likelihood of the sample,
we take the product Π_{t=1}^{T−1} ft+1|t (θ; x). The maximum likelihood estimator θ̂ will
be the one that maximizes the sample likelihood. Since this maximum will be
the same under an increasing transformation, and for some other properties that
we will discuss shortly, we typically work with the log-likelihood of the sample[2]
L(θ; x) = Σ_{t=1}^{T−1} log f(xt+1 |θ, x1 , . . . , xt )

To select the maximum log-likelihood we need to set the first order conditions,
namely that

∂L(θ̂; x)/∂θ = 0
The second order conditions will dictate that for the likelihood to be actually
maximized, the K × K Hessian matrix
[1] To be more precise, X contains the random variables that are conditional on their
history. That is to say, the random variable Xt is conditional on the realizations of all
values that preceded it, namely {Xt−1 , Xt−2 , . . . , X1 }.
[2] There are practical as well as theoretical reasons for doing so. Imagine having a sample
of 1000 observations from the red density of figure 5.1(b), where each has a likelihood
of about 0.1 = 10⁻¹. Then the likelihood of the sample would be of the order 10⁻¹⁰⁰⁰,
small enough to confuse the best of computers. But the log-likelihood (with base 10 for
simplicity) is −1000, a much more manageable figure. This is a practical issue; the
theoretical benefits include the computation of standard errors as described in the text.


H = ∂²L(θ̂; x)/∂θ∂θ′ is negative definite
The maximization of the log-likelihood function can be carried out analytically
in some special cases, but we typically employ some algorithm to produce θ̂
numerically. The choice of the appropriate algorithm will depend on the nature
of the likelihood function: if it is relatively well behaved, then a standard hill
climbing algorithm will be sufficient. In more complex cases, where the likelihood
exhibits local maxima or is even undefined for specific parameter sets, one needs
to resort to other techniques such as genetic algorithms or other simulation based
methods.
Figure 5.1 illustrates the intuition behind the likelihood function. Samples
are drawn from the blue distribution (for simplicity we assume that the sample
elements are independent and identically distributed) of lengths N = 10 and
N = 50. To compute the corresponding likelihood values, one has to compute
the density value at the sample points as shown. The red curves give a density
that is far away from the true one, and we can see that overall the function
values are lower. We numerically maximize the log-likelihood and estimate the
density that has produced the data, which is given in black. When the sample
is small, the estimated density is not close to the true data generating process,
but it will converge as the sample size increases.
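This construction is easy to replicate. The Python sketch below (an illustrative translation of mine — the listings in these notes are written in Matlab) draws an i.i.d. sample from N(1.00, 2.00), as in figure 5.1, and compares the log-likelihood under the true density, under a density far from the truth, and under the closed-form Gaussian MLE:

```python
import numpy as np

def gaussian_loglik(x, mu, sigma):
    # log-likelihood of an i.i.d. Gaussian sample under N(mu, sigma^2)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=50)        # sample from the true N(1.00, 2.00)

L_true  = gaussian_loglik(x, 1.0, 2.0)   # true data generating density
L_wrong = gaussian_loglik(x, 0.0, 4.0)   # density far from the truth
mu_hat, s_hat = x.mean(), x.std()        # Gaussian MLE in closed form
L_best  = gaussian_loglik(x, mu_hat, s_hat)
```

By construction the fitted parameters give the highest log-likelihood of the three, which is precisely the ordering that the figure illustrates.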

5.2 PROPERTIES OF THE ML ESTIMATORS


Maximum likelihood estimators share some very appealing asymptotic proper-
ties. Asymptotic in this context means that these properties hold at the limit,
when the sample size approaches infinity. Therefore, one would tend to con-
sider them more valid for large data samples. Unfortunately, how large a “large”
sample should be is not set in stone, and depends on the data generating process.
For that reason it is always a good idea to verify the validity of any claims that are based
on these properties via a small simulation experiment. Here we will go through
some fundamental properties and will see how they can be used to make inference
on the quality of the estimators.
We make the fundamental assumption that our model is correctly specified.
This means that there is a data generating process which we guess correctly
up to the parameter values. If our model is misspecified all properties go out of
the window, even asymptotically. By using so-called bootstrap techniques we
can take some steps towards testing our hypotheses and constructing confidence
intervals, while taking into account possible misspecification.
We will denote the true value of the parameter set with θ⋆ . This is the
set that has actually generated the series we observe. For us though, this is a
random variable as we do not observe it directly. In fact, it is the qualities of
this random variable that we intend to quantify.

THE SCORE AND THE INFORMATION MATRIX
As we said, we produce the maximum likelihood estimator by setting ∂L(θ; x)/∂θ =
0. This derivative is also called the score function, and the first order condition
corresponds to an important property of the score, namely that its expectation,
at the true parameter set is zero

E[∂L(θ⋆ ; X )/∂θ] = 0
Note that the random variable in the above expectation is the data sample X .
For IID processes, maximum likelihood estimation can be viewed as setting the
empirical expectation of this score to zero.
In the same light we define the (Fisher) information matrix as minus the
expectation of the second derivative of the log-likelihood, evaluated again
at the true parameter point

I (θ⋆ ) = −E[∂²L(θ⋆ ; X )/∂θ∂θ′ ]
As before, the Hessian matrix produces an estimate of the information matrix
which is based on the sample. The information matrix will be by construction
positive definite, and therefore invertible.
It turns out that we can also say something on the covariance matrix of the
score. In fact, it will be equal to the information matrix
V = E[(∂L(θ⋆ ; X )/∂θ)(∂L(θ⋆ ; X )/∂θ)′ ] = I (θ⋆ )

What is the correct way to view these expectations and variances? Say that
we knew the true parameter set, and we constructed a zillion sample paths based
on these parameters, each one of length T . If we compute the score vector based
on each one of these samples, we would find that the average of each element
is zero and that the covariance matrix is given by the information matrix.
The information matrix plays another important role, as its positive definiteness
is a necessary condition for all other asymptotic properties to carry
through.
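Both statements can be verified with a small simulation. In the Python sketch below (my own illustration, not one of the notes' listings) the data are i.i.d. N(µ, σ²) with σ known, so the score with respect to µ is Σt (xt − µ)/σ² and the information is T/σ²; across many simulated samples the score averages to roughly zero and its variance comes out close to the information:

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true, sigma, T, n_paths = 0.5, 1.0, 200, 2000

scores = np.empty(n_paths)
for i in range(n_paths):
    x = rng.normal(mu_true, sigma, size=T)
    # score of the Gaussian log-likelihood w.r.t. mu, at the true value
    scores[i] = np.sum(x - mu_true) / sigma**2

info = T / sigma**2            # Fisher information for mu
mean_score = scores.mean()     # close to zero across samples
var_score = scores.var()       # close to the information
```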

CONSISTENCY AND ASYMPTOTIC NORMALITY


Let's say that we have a method of producing estimators, not necessarily maximum
likelihood. If we produce different samples with the true parameter set, we will
obviously end up with a different estimated value each time, and let us denote
with θ̂(x) the estimated parameter set that is generated by the sample x. Of
course we cannot carry this experiment out, since we do not know the true
parameter set, but we can pose the question: if we produce a zillion alternative


samples, do we expect the average of their estimators to be equal to the true


one? An estimation method is called unbiased if this is true, namely that

E[θ̂(X )] = θ⋆

The maximum likelihood estimator is not generally unbiased, and this appar-
ently is not a good thing. But the maximum likelihood estimator is consistent,
which means that as the sample size increases the bias drops to zero. Further-
more, the variance of the estimator’s distribution also drops to zero, indicating
that the maximum likelihood estimator will converge to the true value as the
sample size increases, or more formally that

plim θ̂(X ) = θ⋆ as the sample size increases

It also turns out that the distribution of the MLE is Gaussian, with covariance
matrix equal to the inverse of the Fisher information matrix evaluated at the true
parameter value. We can therefore write

θ̂(X ) ∼ N(θ⋆ , I (θ⋆ )⁻¹)

Furthermore, the variance I (θ⋆ )⁻¹ of the MLE is equal to the so-called Cramér-
Rao lower bound, which states that no other unbiased estimator will have smaller
variance than the MLE. This also makes the maximum likelihood estimator
asymptotically efficient. In practice we do not know the value of I (θ⋆ ) and use
an estimate instead, for example one based on the Hessian of the log-likelihood.

HYPOTHESIS TESTING AND CONFIDENCE INTERVALS


For large samples we can utilize the efficiency and asymptotic normality of the
maximum likelihood estimator, and produce confidence intervals that are based
on the normal distribution. In particular, based on the sample we can test the
hypothesis
H0 : θ⋆ = θ†
against the alternative that θ⋆ ≠ θ† . Under the null the maximum likelihood
estimates will be asymptotically distributed as

θ̂(X ) ∼ N(θ† , I (θ† )⁻¹)

We are therefore naturally led to the statistic

Z (x) = (θ̂(x) − θ† ) / √(I (θ† )⁻¹)

which is distributed as a standardized normal. We would reject the null hypothesis at, say, the 95% confidence level if |Z (x)| > 1.96.
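As a sketch, the test takes only a couple of lines of code; the estimate, hypothesized value, and information below are made-up placeholder numbers rather than estimates from any data set discussed in the text:

```python
import math

def z_statistic(theta_hat, theta_0, info):
    # z-statistic for H0: theta* = theta_0 (scalar case); info = I(theta_0)
    return (theta_hat - theta_0) / math.sqrt(1.0 / info)

z = z_statistic(theta_hat=0.82, theta_0=0.75, info=400.0)
reject_95 = abs(z) > 1.96        # two-sided test at the 95% level
```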


LISTING 5.1: Simulation and maximum likelihood estimation of ARMA models.
(The Matlab source of this listing did not survive the text extraction.)

5.3 SOME EXAMPLES


FIGURE 5.2: Bias and asymptotic normality of maximum likelihood estimators.
Sample paths of an ARMA(1,1) model xt = c + αxt−1 + εt + βεt−1 are simulated,
and the parameters are subsequently estimated. 1,000 short samples (T = 50)
and 1,000 longer samples (T = 500) were generated. The graphs give the dis-
tribution of the estimators for the short sample size in green, and for the longer
sample size in blue. The true parameter values are in red. The table presents
some summary statistics that correspond to the distributions.
(a) constant (c)    (b) AR parameter (α)    (c) MA parameter (β)    (d) volatility (σ)

                T      c        α        β        σ
True                   0.00     0.75    −0.10     0.30
Mean            50    −0.002    0.64    −0.013    0.27
                500    0.000    0.74    −0.093    0.30
Volatility      50     0.056    0.20     0.24     0.030
                500    0.012    0.044    0.063    0.010
Skewness        50     0.001   −1.8      0.57     0.076
                500   −0.12    −0.34     0.067    0.025
Kurtosis        50     8.1      8.8      4.7      2.9
                500    3.5      3.1      3.0      3.0
Ž#rŽ‘N’#“”!•vrŽY––“@—˜ ”!•y™ “š‘“r˜ ›yœ ’”’@˜™˜˜#‘>ž|ŸŸ@j›~˜#™ “@‘Ž—¡˜¢’r𛣗 “r˜Ÿ


139(5.3) Q #RO& $>S%!%TVU>! WRX$>SO&T
LINEAR ARMA MODELS
Suppose that the data generating process has autoregressive and moving aver-
age terms, and for simplicity assume that both effects are of the first order. Then,
we can write the process as

xt = c + αxt−1 + εt + βεt−1 ,   εt ∼ N(0, σ²)


Since our sample is finite, we have to make an extra assumption on the values
just before the starting date, namely x0 and ε0 . We will set them equal to their
expected values, that is to say x0 = c/(1 − α) and ε0 = 0.
The parameter set in this case is θ = {c, α, β, σ}. Each observation is
Gaussian conditional on its past, and in particular

xt |θ, xt−1 , . . . , x0 ∼ N(c + αxt−1 + βεt−1 , σ²)


Therefore we can compute the log-likelihood function of the sample as
L(θ; x) = −(T/2) log(2πσ²) − Σ_{t=1}^{T} (xt − c − αxt−1 − βεt−1 )² / (2σ²)

If we have enough time on our hands we could in principle maximize this


function analytically, but it is more convenient to carry out the maximization
numerically. Listing 5.1 shows how this can be done in a very simple way. In
fact, listing 5.1 conducts a small experiment that illustrates the bias and non-
normality of maximum likelihood estimators in small samples.
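Since the Matlab source of listing 5.1 is illegible in this copy, the Python sketch below re-creates its two ingredients under the same assumptions (pre-sample values set to their expected values, Gaussian innovations); the function names are mine. Any off-the-shelf numerical optimizer can then be pointed at arma_loglik to produce the maximum likelihood estimates:

```python
import numpy as np

def arma_sim(theta, T, rng):
    # simulate x_t = c + a x_{t-1} + e_t + b e_{t-1}, with e_t ~ N(0, s^2)
    c, a, b, s = theta
    x = np.empty(T)
    x_prev, e_prev = c / (1.0 - a), 0.0     # x_0 and e_0 at expected values
    for t in range(T):
        e = rng.normal(0.0, s)
        x[t] = c + a * x_prev + e + b * e_prev
        x_prev, e_prev = x[t], e
    return x

def arma_loglik(theta, x):
    # Gaussian conditional log-likelihood of the ARMA(1,1) model
    c, a, b, s = theta
    x_prev, e_prev, ll = c / (1.0 - a), 0.0, 0.0
    for xt in x:
        e = xt - c - a * x_prev - b * e_prev    # filtered residual
        ll += -0.5 * np.log(2 * np.pi * s**2) - e**2 / (2 * s**2)
        x_prev, e_prev = xt, e
    return ll

rng = np.random.default_rng(42)
theta_true = (0.0, 0.75, -0.10, 0.30)       # parameters of the experiment
x = arma_sim(theta_true, 500, rng)
ll_true = arma_loglik(theta_true, x)
```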
We assume a known parameter set, θ = {0, 0.75, −0.10, 0.30}, and simulate
paths of length T = 50 and T = 500. In total we simulate 1000 paths for each
length. We then estimate the parameters using maximum likelihood. Consistency
will indicate that although in small samples the estimators might be biased, as
the sample grows the mean should converge to the true value, while the estima-
tor variance should decrease. Due to the asymptotic normality the distribution
should become gradually closer to the Gaussian one.
Figure 5.2 gives the results of this experiment, and we can observe that this
is indeed the case. The bias is more pronounced for the autoregressive and
the moving average parameters, with their means biased towards zero for the
smaller sample. The non-Gaussian nature of the estimators is also apparent,
with the kurtosis being consistently high. As the sample size increases, the
estimator densities become tighter and markedly more symmetric. The volatility,
although slightly biased for the smaller sample, is very accurately estimated,
exhibiting very small standard deviation. This is a more general feature: drifts
are very sensitive to the actual path, as they largely depend on the first and the
last observation. Volatilities, on the other hand, draw their information from the
overall variability of the path, and their estimators converge a lot faster.
Note that the fact that asymptotically the parameters are Gaussian does
not imply that they are independent. Indeed, the correlation matrices of the
parameter estimates are

   
[  1      0.048   0.000  −0.038 ]       [  1      0.052  −0.015  −0.042 ]
[  0.048  1      −0.71    0.081 ]       [  0.052  1      −0.77    0.099 ]
[  0.000 −0.71    1      −0.044 ]  and  [ −0.015 −0.77    1      −0.030 ]
[ −0.038  0.081  −0.044   1     ]       [ −0.042  0.099  −0.030   1     ]

for the sample sizes T = 50 and T = 500 respectively. Observe the high negative
correlation between the estimators of the autoregressive and the moving average
terms. As these two parameters compete to capture the same features of the
data,[3] the estimated parameters tend to come in high/low pairs.

LÉVY MODELS
Lévy models can be easily estimated using the MLE approach, by inverting
the characteristic function using the FFT or fractional FFT methods of chapter
4. We can invert the characteristic function directly to produce the probability
density, or we can invert for the cumulative density and then use numerical
differentiation. Although the second method appears to be more cumbersome,
it is often more stable. This happens in the case of Lévy models because the
density typically exhibits a very sharp peak, which the direct transform might
fail to capture.
Irrespective of the method we choose to construct the density, the maxi-
mization of the log-likelihood should be straightforward. As Lévy models are
time-homogeneous, the returns are identically distributed. If we denote with
f(x; θ) the probability density of the Lévy model, then the log-likelihood can be
easily computed over a series of returns {x1 , . . . , xT } as
L(θ; x) = Σ_{t=1}^{T} log f(xt |θ)

Generally speaking, as the FFT method will produce a dense grid for the
probability density function we only have to call the Fourier inversion once at
each likelihood evaluation and interpolate between those points. This renders
MLE quite an efficient method for the estimation of Lévy processes.[4]
In our example we will be using the cumulative density function, recovered
with the code of listing 4.8. We use data on the S&P500 index.

[3] An AR(1) process can be written as an MA(∞) one and vice versa. Therefore a series
that is generated by an AR(1) data generating process will produce MA(1) estimators
as a first order approximation, if the estimated model is misspecified.
[4] Some popular Lévy models admit closed form expressions for the probability density
function. In principle this means that one can avoid the FFT step altogether and use
the closed form instead. It turns out that in the majority of cases these densities are
expressed in terms of special functions, which can be more expensive to compute (over
the data set) than a single FFT!
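To make the procedure concrete, here is a rough Python sketch of the direct density route (the notes themselves recover the cumulative density with listing 4.8, which is not reproduced here). A single FFT inversion of the characteristic function produces the density on a dense grid, and the log-likelihood is obtained by interpolating at the observed returns; the grid parameters are illustrative choices:

```python
import numpy as np

def density_from_cf(cf, N=2**12, eta=0.25):
    # recover f(x) = (1/pi) Re int_0^inf e^{-iux} cf(u) du on a uniform grid
    u = np.arange(N) * eta                   # frequency grid
    b = np.pi / eta                          # half-width of the return grid
    x = -b + np.arange(N) * (2.0 * b / N)    # return grid
    w = np.ones(N); w[0] = 0.5               # trapezoidal weights
    f = np.real(np.fft.fft(cf(u) * np.exp(1j * b * u) * w * eta)) / np.pi
    return x, f

def loglik_from_cf(sample, cf):
    # one FFT per likelihood evaluation, then interpolate at the data
    xg, fg = density_from_cf(cf)
    f = np.interp(sample, xg, fg)
    return np.sum(np.log(np.maximum(f, 1e-300)))
```

Maximizing loglik_from_cf over the parameters hidden inside cf then yields the MLE. With the Gaussian characteristic function exp(iuµ − σ²u²/2) the recovered grid reproduces the normal density, which makes for a convenient sanity check.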

5.4 LIKELIHOOD RATIO TESTS
5.5 THE KALMAN FILTER
Many financial quantities can (and will) be dependent on factors that are unob-
served. Market sentiment, asset volatility, liquidity, the instantaneous risk free
rate and the position on the business cycle are just a few examples of such
unobserved factors which have an impact on the behaviour of other series. For
some of these quantities one can develop proxies, but they themselves will serve
at best as an unbiased but noisy manifestation of the real underlying quantity.
To deal with such problems various optimal filters have been developed, with
the Kalman filter being the most popular of all. Essentially, the Kalman filter is
described by a pair of linear equations, which specify the dynamics of the latent
and the observed variables. In particular, in its simplest form we can write

observation eq: Xt = ax + bx Yt + εt
transition eq: Yt = ay + by Yt−1 + ηt

with Gaussian error terms εt and ηt .


The information set at our disposal consists of the observations Ft = {Xs :
s ≤ t}, but we are really interested in the dynamics of the process Yt . The
filter provides us with estimates µt|s = E[Yt |Fs ], together with the conditional
variances vt|s = E[(Yt − µt|s )²|Fs ]. As it turns out the conditional distributions
are Gaussian, and are therefore summarized by these two moments.

THE FILTERING PROCEDURE


Kalman filtering consists of two steps, which are commonly referred to as the
prediction and correction steps. A recursive algorithm produces the conditional
distribution of Yt |Ft , based on the conditional distribution Yt |Ft−1 . Therefore,
given an initial Gaussian distribution for Y0 , one can move forward and produce
the filtered conditional distributions for all times t.
We begin with some notation. The conditional distribution of Yt given the
information at time s is denoted with Ft|s (y) = P[Yt ≤ y|Fs ], and the corresponding
conditional density with ft|s (y) = (d/dy) Ft|s (y), so that ft|s (y) dy = P[Yt ∈ dy|Fs ].
Using this notation, the prediction step will provide us with the conditional density
ft|t−1 , as a function of the filtered density ft−1|t−1 ; the correction step will incorporate
the new observation xt and update the density ft|t−1 to produce the new filtered
density ft|t . We will use gt|s (x) for the corresponding density of Xt , with
gt|s (x) dx = P[Xt ∈ dx|Fs ]. With φ(z; µ, σ) we denote the Gaussian density with
mean µ and variance σ².
For the prediction step we employ the formula for conditional probabilities,
conditioning on the value of Yt−1


ft|t−1 (yt ) = P[Yt ∈ dyt |Ft−1 ]
           = ∫ℝ P[Yt ∈ dyt |Yt−1 ∈ dyt−1 , Ft−1 ] P[Yt−1 ∈ dyt−1 |Ft−1 ] dyt−1
           = ∫ℝ φ(yt ; ay + by yt−1 , ση ) ft−1|t−1 (yt−1 ) dyt−1

Observe that if the filtered density ft−1|t−1 is Gaussian, then the prediction for
Yt will also follow a Gaussian distribution, as the convolution of normals.
In the correction step we incorporate the new observation X t = xt which
updates the information set Ft = {Xt = xt } ∪ Ft−1 . We then write

ft|t (yt ) = P[Yt ∈ dyt |Ft ] = P[Yt ∈ dyt |{Xt ∈ dxt } ∪ Ft−1 ]

Bayes’ formula will then provide us with the alternative representation

ft|t (yt ) = P[Xt ∈ dxt |{Yt ∈ dyt } ∪ Ft−1 ] · P[Yt ∈ dyt |Ft−1 ] / P[Xt ∈ dxt |Ft−1 ]
          = φ(xt ; ax + bx yt , σε ) · ft|t−1 (yt ) / gt|t−1 (xt )

In this expression gt|t−1 (xt ) is treated as a constant, as it is not dependent on


the variable yt .[5] Once again, if ft|t−1 is Gaussian, it follows that ft|t will also be
normally distributed, as the product of two Gaussian densities.
We have in fact shown that if the distribution of the initial state variable
Y1 |F0 is Gaussian, then the filtered distribution Y1 |F1 will also be Gaussian,
which in turn implies that Y2 |F1 , Y2 |F2 , Y3 |F2 , . . . , Yt+1 |Ft , Yt+1 |Ft+1 , and
so on, are all normally distributed.
This implies that our filtering procedure can be implemented by keeping
track of the conditional means and variances, rather than the whole distribution.
In fact, the prediction step will give the quantities µt|t−1 and vt|t−1 in terms of
µt−1|t−1 and vt−1|t−1 using iterated expectations on the transition equation. In
particular, we can write
    
µt|t−1 = E[Yt |Ft−1 ] = E[E[Yt |Yt−1 , Ft−1 ]|Ft−1 ]

with the outer expectation integrating over Yt−1 . The transition equation yields
the inner expectation, E[Yt |Yt−1 , Ft−1 ] = ay + by Yt−1 . The same procedure can
be applied to the variance, and the resulting predicted moments are given by

µt|t−1 = ay + by µt−1|t−1
vt|t−1 = by² vt−1|t−1 + ση²
[5] Aficionados of Bayesian statistical inference will recognize gt|t−1 (xt ) as the normalization
constant which would probably be ignored. But in our setting it is not ignored; in fact
it facilitates the maximum likelihood estimation of the parameters.

The correction step will update the means and variances using the observa-
tion Xt = xt . In particular, computing the product of the two Gaussian densities
gives[6]

ft|t (yt ) ∝ exp{−(xt − ax − bx yt )²/(2σε²)} · exp{−(yt − µt|t−1 )²/(2vt|t−1 )}
         ∝ exp{−(yt − µt|t )²/(2vt|t )}

for the quantities

µt|t = [vt|t−1 bx (xt − ax ) + σε² µt|t−1 ] / (vt|t−1 bx² + σε²)
vt|t = vt|t−1 σε² / (vt|t−1 bx² + σε²)
In the literature the above expressions are more compactly written in terms
of the Kalman gain Kt , namely

Kt = vt|t−1 bx / (vt|t−1 bx² + σε²)
µt|t = µt|t−1 + Kt (xt − ax − bx µt|t−1 )
vt|t = vt|t−1 − Kt vt|t−1 bx
This concludes the recursive scheme. Given the values for the parameter set
θ = {ax , bx , ay , by , σε , ση } and an initial Gaussian distribution for Y1 |F0 , we
can filter out the expected path of Yt , together with its estimated variance. This
initial distribution is typically taken as the unconditional distribution of the
latent variable, that is to say a Gaussian with mean µ1|0 = ay /(1 − by ) and
variance v1|0 = ση²/(1 − by²).
For completeness we give the filtering equations in the box below

Prediction
µt|t−1 = ay + by µt−1|t−1
vt|t−1 = by² vt−1|t−1 + ση²
Correction
Kt = vt|t−1 bx / (vt|t−1 bx² + σε²)
µt|t = µt|t−1 + Kt (xt − ax − bx µt|t−1 )
vt|t = vt|t−1 − Kt vt|t−1 bx
[6] Regarding the notation, "∝" stands for "proportional to"; that is, x ∝ y means that
x = C y for some constant C . Here we know that the resulting expression is a Gaussian
density, and therefore we are just interested in the structure of the exponential rather
than the constant that ensures that the total probability is equal to one.


LISTING 5.2: One dimensional Kalman filter with log-likelihood computation.
(The Matlab source of this listing did not survive the text extraction.)

MAXIMUM LIKELIHOOD ESTIMATION


The quantity gt|t−1 (xt ) is very useful in this setting, since it represents the
likelihood of observation xt , given the information at the previous point in time,
Ft−1 . With that, we can devise a maximum likelihood estimation strategy for
the parameters θ, by searching for the parameter set that maximizes the log-
likelihood of the sample
θ̂ = arg maxθ Σ_{t=1}^{T} log gt|t−1 (xt ; θ)

In particular, we can write the conditional likelihood of each observation by


conditioning on Yt = yt
gt|t−1 (xt ) = ∫ℝ φ(xt ; ax + bx yt , σε ) φ(yt ; µt|t−1 , √vt|t−1 ) dyt

which once again, as a convolution, is indeed Gaussian. Iterated expectation of
the observation equation will provide us with the mean (µ^x_{t|t−1}) and variance
(v^x_{t|t−1}) of the conditional likelihood

µ^x_{t|t−1} = ax + bx µt|t−1
v^x_{t|t−1} = bx² vt|t−1 + σε²

Listing 5.2 gives an implementation of the one dimensional Kalman filter,


together with the log-likelihood computation. In figure 5.3 we present an example
of the filter output. The system of equations that has been implemented is

observation: Xt = 0.00 + 1.00Yt + εt , σε = 0.50


transition: Yt = 0.00 + 0.90Yt−1 + ηt , ση = 0.20

The series of interest is Yt , and its dynamics are known, but we cannot observe
Yt directly. Instead, we observe a noisy version Xt , which we use to filter out
the path of Yt . In the figure, the observed series are given by the blue crosses,
and based on these we filter out µt|t , which is given in red. With green we give
the true path of Yt , which we want to reconstruct. We also give the two standard

deviation shaded area µt|t ± 2√vt|t .
However, in this experiment we have assumed that the parameters of the
dynamic system are known, which is not the case in practice. In a real life
situation, we would be given the series of observations X t , and we would be asked
to estimate the parameter vector θ, and then filter out the latent component.
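As the Matlab source of listing 5.2 is illegible in this copy, the Python sketch below (function and variable names are mine) runs the scalar recursions above, accumulating the log-likelihood of each observation from its conditional density gt|t−1 , and applies them to a simulated path of the example system:

```python
import numpy as np

def kalman_filter_1d(p, x):
    # p = (ax, bx, ay, by, se, sh) for X_t = ax + bx Y_t + eps_t,
    # Y_t = ay + by Y_{t-1} + eta_t, eps ~ N(0, se^2), eta ~ N(0, sh^2)
    ax, bx, ay, by, se, sh = p
    m = ay / (1.0 - by)              # mu_{1|0}: unconditional mean
    v = sh**2 / (1.0 - by**2)        # v_{1|0}: unconditional variance
    mu_f = np.empty(len(x)); v_f = np.empty(len(x)); ll = 0.0
    for t, xt in enumerate(x):
        # likelihood of x_t given F_{t-1}: N(ax + bx m, bx^2 v + se^2)
        mx, vx = ax + bx * m, bx**2 * v + se**2
        ll += -0.5 * (np.log(2 * np.pi * vx) + (xt - mx)**2 / vx)
        # correction step
        K = v * bx / vx
        m, v = m + K * (xt - mx), v - K * v * bx
        mu_f[t], v_f[t] = m, v
        # prediction step for the next period
        m, v = ay + by * m, by**2 * v + sh**2
    return mu_f, v_f, ll

# simulate the example system and filter it
rng = np.random.default_rng(7)
p = (0.0, 1.0, 0.0, 0.9, 0.5, 0.2)
y, obs, yt = np.empty(100), np.empty(100), 0.0
for t in range(100):
    yt = 0.9 * yt + rng.normal(0.0, 0.2)   # latent series
    y[t] = yt
    obs[t] = yt + rng.normal(0.0, 0.5)     # noisy observation
mu_f, v_f, ll = kalman_filter_1d(p, obs)
```

Maximizing the returned log-likelihood over p would then deliver the MLE discussed in the text.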

GENERALIZATIONS AND EXTENSIONS


This paradigm is now extended in two directions: first we introduce multivariate
filtering systems, and we then consider a more general nonlinear framework.

Multivariate systems
The filter described above can be easily extended to vector processes; the matrix
algebra is a bit more involved, but the ideas remain the same.[7] Exogenous
explanatory variables can also be included in the observation equation. In its
most general form, the Kalman filter equations are given by

observation eq: X t = A Z t + Bx Y t + ε t
transition eq: Y t = By Y t−1 + ηt

Here, X t is a (Nx × 1) vector of the observed variables, Y t collects the Ny latent


factors, and Z t holds the Nz explanatory variables. In addition, the white noise
processes ε t and ηt possess covariance matrices Σε and Ση respectively.
Once again the filtering equations will be Gaussian, but now in a multivariate
setting. We will denote the conditional mean with µt|s = E[Y t |Fs ], with corresponding
covariance matrix Vt|s = E[(Y t − µt|s )(Y t − µt|s )′ |Fs ]. The Kalman filter updating equations
are given in the box below as the counterparts to the univariate case
are given in the box below as the counterparts to the univariate case
7
See Hamilton (1994) for an excellent exposition, and Caines (1988) for a more detailed
treatment of linear stochastic systems.

For copies, comments, help etc. visit http://www.theponytail.net/



FIGURE 5.3: Kalman filtering example. The latent series Yt that we want to
reconstruct is given in green, and the observed series Xt is given by the blue
crosses. The filtered series µt|t is given in red, together with the two standard
deviation shaded area µt|t ± 2√vt|t . (a) true parameters; (b) MLE parameters.
Prediction
µ t|t−1 = By µ t−1|t−1
Vt|t−1 = By Vt−1|t−1 B′y + Ση
Correction
Kt = Vt|t−1 B′x (Bx Vt|t−1 B′x + Σε )−1
µ t|t = µ t|t−1 + Kt (x t − A z t − Bx µ t|t−1 )
Vt|t = Vt|t−1 − Kt Bx Vt|t−1
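The recursions in the box can be checked with a few lines of code. The snippet below is an illustrative NumPy translation of one prediction/correction cycle (the notes themselves use Matlab; the function name and all numbers here are our own assumptions, not from the text):

```python
import numpy as np

def kalman_step(mu, V, x, z, A, Bx, By, Se, Sh):
    """One prediction/correction cycle of the multivariate Kalman filter."""
    # prediction: propagate the latent mean and covariance
    mu_p = By @ mu
    V_p = By @ V @ By.T + Sh
    # correction: Kalman gain, then update with the innovation
    K = V_p @ Bx.T @ np.linalg.inv(Bx @ V_p @ Bx.T + Se)
    mu_c = mu_p + K @ (x - A @ z - Bx @ mu_p)
    V_c = V_p - K @ Bx @ V_p
    return mu_c, V_c
```

For a scalar system with By = 0.9, Ση = 0.04, Bx = 1 and Σε = 0.25, a prior (µ, V) = (0, 1) and an observation x = 0.5 shrink the posterior variance below the predicted one, as expected.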

If we set the innovation process


W t = X t − A Z t − Bx µt|t−1
and denote with w t its realization, then we can write the correction step above
in an alternative, sometimes more intuitive form, as

Correction
Kt = ⟨X t , Y t |Ft−1 ⟩ ⟨W t , W t |Ft−1 ⟩−1
µt|t = µ t|t−1 + Kt w t
Vt|t = Vt|t−1 − Kt ⟨W t , W t |Ft−1 ⟩ Kt′

with ⟨·, ·|·⟩ a shorthand for the conditional covariance matrix.


Listing 5.3 implements the general form of the multivariate Kalman filter.
Such filters are typically used in yield curve modeling. For that reason, an
implementation example of the multivariate Kalman filter can be found in section
7.6.
The standard Kalman filter system is linear in nature and Gaussian. Another
important feature (or limitation) is that the latent variable enters the observation
through the drift. A number of extensions have been proposed in the literature
which attempt to relax some of these restrictions. As a rule, these extensions
involve approximations at one point or the other, and therefore their performance
is as good as this approximation. A nonlinear version of the system can be written
as follows, where we ignore the impact of exogenous variables for simplicity

observation eq: X t = h(Y t , ε t )
 transition eq: Y t = f(Y t−1 , ηt )
The idea is to propagate the mean and covariance matrix by prediction and
correction steps, just like the standard Kalman filter. The differences between
the approaches is in the way these moments are computed.

Extended Kalman filter


The extended Kalman filter takes a simple linearization approach, by approxi-
mating a nonlinear function with its first order approximation. This means that


LISTING 5.3: kalmanfilter.m: The N-dimensional Kalman filter

% kalmanfilter.m
function [m, s, L] = kalmanfilter(p, x, z)
A  = p.A;  Bx = p.Bx; Se = p.Se;   % observation equation
By = p.By; Sh = p.Sh;              % transition equation
mC = p.m0; SC = p.S0;              % initial moments
T  = size(x, 1);
m  = zeros(T, length(mC)); s = m; L = 0;
for t = 1:T
    % prediction step
    mP = By*mC;                    % update mean
    SP = By*SC*By' + Sh;           % update variance
    % likelihood evaluation
    w  = x(t,:)' - A*z(t,:)' - Bx*mP;
    SW = Bx*SP*Bx' + Se;
    L  = L + log(det(SW)) + w'*(SW\w);
    % correction step
    K  = SP*Bx'/SW;                % Kalman gain
    mC = mP + K*w;                 % update mean
    SC = SP - K*Bx*SP;             % update variance
    % store
    m(t,:) = mC';
    s(t,:) = sqrt(diag(SC))';
end

the linearization error will be an effect of Jensen’s inequality. For instance, the
transition equation is linearized around the point (µ t−1|t−1 , 0), which yields the
approximation
   
f(Y t−1 , ηt ) ≈ f(µ t−1|t−1 , 0) + Fy (Y t−1 − µ t−1|t−1 ) + Fη ηt

where Fy and Fη are the matrices of first derivatives (the Jacobians) of the
function f , with respect to the corresponding elements. Now the system has a
linear form, and taking the expectation and the covariance produces the set of
prediction equations

µ t|t−1 = f(µ t−1|t−1 , 0)
Vt|t−1 = Fy Vt−1|t−1 F′y + Fη Ση F′η

We can apply the same idea to produce the correction step, namely

Kt = Vt|t−1 H′x (Hx Vt|t−1 H′x + Hε Σε H′ε )−1
µ t|t = µ t|t−1 + Kt (x t − Hx µ t|t−1 )
Vt|t = Vt|t−1 − Kt Hx Vt|t−1

Once again the matrices Hx and Hε denote the appropriate Jacobians, in this
case of the function h.
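To make the linearization concrete, here is a minimal scalar sketch of the EKF prediction step in Python (the notes use Matlab). The transition f(y, η) = 0.9 tanh(y) + η and all numbers are made-up illustrations whose Jacobians happen to be available in closed form:

```python
import math

def ekf_predict(mu, V, S_eta):
    """EKF prediction for the scalar transition f(y, eta) = 0.9*tanh(y) + eta."""
    f = lambda y, eta: 0.9 * math.tanh(y) + eta
    Fy = 0.9 * (1.0 - math.tanh(mu) ** 2)   # df/dy evaluated at (mu, 0)
    Feta = 1.0                              # df/deta evaluated at (mu, 0)
    mu_p = f(mu, 0.0)                       # propagate the mean through f
    V_p = Fy * V * Fy + Feta * S_eta * Feta # linearized variance propagation
    return mu_p, V_p
```

Because the mean is pushed through the nonlinear f while the variance only sees the Jacobians, the approximation error is exactly the Jensen effect discussed above.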

Unscented Kalman filter


Although the standard Kalman filter is optimal in the space of linear models, the
extended Kalman filter is suboptimal in the much wider space of nonlinear mod-
els. The unscented Kalman filter takes a different route in producing estimates
for the mean and the covariance of the relevant random variables. This was in-
troduced in Julier and Uhlmann (1996, 1997) and has since gained popularity in
nonlinear filtering.
The main idea behind the unscented filter can be summarized in contrast to
the extended Kalman filter approach: Both transition and measurement equa-
tions are nonlinear functions of random variables. In the extended Kalman filter,
the means and the covariances of these nonlinear functions are approximated by
linearizing the functions themselves. In the unscented Kalman filter, the means
and covariances are computed by creating first a small sample of points from the
appropriate multivariate Gaussian distribution, and then transform these points
via these nonlinear functions. These sample points are called sigma points in
the unscented filtering jargon. It turns out that in many cases this approach
provides a much more robust approximation to the true dynamics. In addition,
the unscented implementation does not require taking derivatives, which can be
a cumbersome procedure and requires sufficiently smooth functions.
It is clear from the Kalman filtering equations that all that is needed for the
prediction and correction steps are the following quantities:
1. For the prediction step, produce µ t|t−1 and Vt|t−1 .
2. For the correction step, compute the covariance between the next observation
and the latent variable, ⟨X t , Y t |Ft−1 ⟩.
3. For the correction step, compute the variance of the innovation, ⟨W t , W t |Ft−1 ⟩.
The unscented filter produces these quantities in an intuitive and simple to
implement way. To do so, we need to introduce the augmented state vector Y at ,
which concatenates the state process and the innovations, and has length N a

Y at = [ Y t ; ε t ; ηt ]

We will denote with µ at|s and Vat|s the mean and covariance matrix of this aug-
mented state vector. The unscented Kalman filter will iterate these two moments
forward; to retrieve the moments of the latent variable Yt , one has to extract the
corresponding first rows of µ at|s , and the top left submatrix of Vat|s , respectively.


The augmented state vector moments are initialized with

µ a0|0 = [ µ 0|0 ; 0 ; 0 ]   and   Va0|0 = [ V0|0 0 0 ; 0 Σε 0 ; 0 0 Ση ]

The unscented Kalman filter will iterate through the subsequent times t =
1, 2, . . . , T , applying a prediction and correction step.
For the prediction step we compute M = 2N a + 1 sigma points, based on
a
columns of the Cholesky decomposition of the covariance matrix V t−1|t−1 (which
we denote here with a square root). The points are computed by horizontally
concatenating as follows
 q q 
ς at−1|t−1 = µat−1|t−1 ; µ at−1|t−1 + γ Vt−1|t−1
a
; µat−1|t−1 − γ Vt−1|t−1
a

This will be a (N a × M) matrix. The idea behind this computation is that the
M-point sample that is produced by the columns of ς at−1|t−1 exhibits mean and
covariance of µ at−1|t−1 and Vat−1|t−1 , respectively. It can be viewed as a minimal
Monte-Carlo simulation of a sample with given first two moments. Thus we can
view the sigma points as

               [ ς [Y ,1]   ς [Y ,2]   · · ·   ς [Y ,M] ]
ς at−1|t−1 =   [ ς [ε,1]   ς [ε,2]   · · ·   ς [ε,M] ]
               [ ς [η,1]   ς [η,2]   · · ·   ς [η,M] ]

(all blocks evaluated at t − 1|t − 1): a concatenation of samples from the state
variable and the error terms, given the information at time t − 1.
Based on these sigma points one can produce now the predictions for the
state variable and its covariance, by taking the sample moments of the function
f , applied at the sigma points

ς [Y ,m]t|t−1 = f( ς [Y ,m]t−1|t−1 , ς [η,m]t−1|t−1 )   for m = 1, 2, . . . , M

µ t|t−1 = (1/M) Σm=1..M ς [Y ,m]t|t−1

Vt|t−1 = (1/M) Σm=1..M ( ς [Y ,m]t|t−1 − µ t|t−1 ) ( ς [Y ,m]t|t−1 − µ t|t−1 )′

This completes the prediction step.
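The sigma-point construction is easy to check numerically. The sketch below (Python for illustration; the notes use Matlab) uses the variant implemented in Listing 5.4, which drops the centre point and keeps 2N equally weighted columns; by construction the sample then reproduces the first two moments exactly:

```python
import numpy as np

mu = np.array([0.1, -0.2])              # illustrative mean
V = np.array([[0.05, 0.01],
              [0.01, 0.03]])            # illustrative covariance

N = len(mu)
L = np.linalg.cholesky(V)               # lower triangular, V = L @ L.T
# 2N sigma points: mu plus/minus sqrt(N) times each Cholesky column
sig = np.hstack([mu[:, None] + np.sqrt(N) * L,
                 mu[:, None] - np.sqrt(N) * L])

# with equal weights 1/(2N) the sample moments are exact
m_hat = sig.mean(axis=1)
V_hat = (sig - m_hat[:, None]) @ (sig - m_hat[:, None]).T / (2 * N)
```

The same columns can then be pushed through any nonlinear f to obtain the predicted mean and covariance, which is the whole point of the method.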


Applying the function h at the sigma points will produce the forecast for the
observation at time t. Using these forecasts allows us to form the innovations
at the forecasted sigma points, and thus the covariance matrices ⟨X t , Y t |Ft−1 ⟩
and ⟨W t , W t |Ft−1 ⟩. In particular


LISTING 5.4: unscentedfilter.m: The unscented Kalman filter

% unscentedfilter.m
function [mt, St] = unscentedfilter(p, data)
f  = p.f;                            % transition function
h  = p.h;                            % measurement function
Se = p.SigmaEpsilon;                 % measurement error covariance
Sh = p.SigmaEta;                     % transition error covariance
m0 = p.m0; S0 = p.S0;                % initial moments
Ny = size(m0,1); Ne = size(Se,1);
Nh = size(Sh,1); N  = Ny + Ne + Nh;
W  = diag(0.5/N*ones(2*N,1));        % sigma point weights
ma = [m0; zeros(Ne,1); zeros(Nh,1)]; % augmented mean
Sa = [S0 zeros(Ny,Ne) zeros(Ny,Nh)
      zeros(Ne,Ny) Se zeros(Ne,Nh)
      zeros(Nh,Ny) zeros(Nh,Ne) Sh]; % augmented covariance
mt = []; St = [];
for t = 1:length(data)
    % construct sigma points
    C   = sqrt(N)*chol(Sa)';
    sA  = repmat(ma, [1, 2*N]) + [C, -C];
    sY  = sA(1:Ny,:);
    sE  = sA(Ny+1:Ny+Ne,:);
    sH  = sA(Ny+Ne+1:end,:);
    % projection
    sYP = f(sY, sH);
    mY  = repmat(sum(sYP*W, 2), [1, 2*N]);
    SYY = (sYP-mY)*W*(sYP-mY)';
    % correction
    sX  = h(sYP, sE);
    sW  = repmat(data(t,:)', [1, 2*N]) - sX;
    mX  = repmat(sum(sX*W, 2), [1, 2*N]);
    mW  = repmat(sum(sW*W, 2), [1, 2*N]);
    SXY = (sYP-mY)*W*(sX-mX)';
    SWW = (sW-mW)*W*(sW-mW)';
    G   = SXY/SWW;                   % Kalman gain
    ma(1:Ny)      = mY(:,1) + G*mW(:,1);
    Sa(1:Ny,1:Ny) = SYY - G*SWW*G';
    % store
    mt = [mt; ma(1:Ny)'];
    St = [St; sqrt(diag(Sa(1:Ny,1:Ny)))'];
end


ς [X ,m]t|t−1 = h( ς [Y ,m]t|t−1 , ς [ε,m]t−1|t−1 )   for m = 1, 2, . . . , M

ς [W ,m]t|t−1 = x t − ς [X ,m]t|t−1

µ Xt|t−1 = (1/M) Σm=1..M ς [X ,m]t|t−1

µ Wt|t−1 = (1/M) Σm=1..M ς [W ,m]t|t−1

⟨X t , Y t |Ft−1 ⟩ = (1/M) Σm=1..M ( ς [X ,m]t|t−1 − µ Xt|t−1 ) ( ς [Y ,m]t|t−1 − µ t|t−1 )′

⟨W t , W t |Ft−1 ⟩ = (1/M) Σm=1..M ( ς [W ,m]t|t−1 − µ Wt|t−1 ) ( ς [W ,m]t|t−1 − µ Wt|t−1 )′

Based on these quantities we can now update the mean and covariance of
the latent variable, incorporating the new information as follows
Kt = ⟨X t , Y t |Ft−1 ⟩ ⟨W t , W t |Ft−1 ⟩−1
µ t|t = µ t|t−1 + Kt w t
Vt|t = Vt|t−1 − Kt ⟨W t , W t |Ft−1 ⟩ Kt′

This completes the correction step and the recursion algorithm.



6
Volatility

In this chapter we will investigate the modeling of volatility, and its implications
on derivative pricing. We will start with some stylized facts of the historical and
implied volatility, which will benchmark any forecasting or pricing methodology.
We will then give an overview of Garch-type volatility filters and discuss how
the parameters can be estimated using maximum likelihood. We will see that
although Garch filters do a very good job in filtering and forecasting volatility,
they fall somewhat short in the derivatives pricing arena. These shortcomings
stem from the fact that Garch, by construction, is set up in discrete time, while
modern pricing theory is set up under continuous time assumptions.
Two families of volatility models will be introduced for pricing and hedging.
Stochastic volatility models extend the Black-Scholes methodology by introduc-
ing an extra diffusion that models volatility. Local volatility models, on the other
hand, take a different point of view, and make volatility a non-linear function
of time and the underlying asset. Of course each approach has some benefits
but also some limitations, and for that reason we contrast and compare these
methods.
It is important to note that this chapter deals exclusively with equity volatil-
ity, and to some extent exchange rate volatility. These processes are typically
represented using some variants of random walk models. Fixed income securities
models, and their volatility structures, will be covered in a later chapter.

6.1 SOME GENERAL FEATURES


This first section will cover some stylized features of volatility. We will dif-
ferentiate between historical and implied volatilities. Although the qualitative
properties of these two are similar, their quantitative aspects might differ sub-
stantially, as they are specified under two different (but nevertheless equivalent)
probability measures.
FIGURE 6.1: Dow Jones industrial average (DJIA) weekly returns and yearly
historical volatility. The (annualized) volatility is computed over non-overlapping
52-week periods from the beginning of 1930 to 2005. (a) weekly DJIA returns;
(b) annual volatility.

HISTORICAL VOLATILITY
Volatility in financial markets varies over time. This is one of the most docu-
mented stylized facts of asset prices. For example, figure 6.1(a) gives a very long
series of weekly returns1 on the Dow Jones industrial average index (DJIA, or
just “the Dow”). Subfigure 6.1(b) presents the (annualized) standard deviation of
consecutive and non-overlapping 52-week intervals, a proxy of the realized DJIA
volatility over yearly periods. One can readily observe this time variability of
the realized volatility, and in fact we can easily associate it with distinct events,
like the Great Depression (early 30s), the Second World War (late 30s/early
40s), the Oil Crisis (mid 70s), and the Russian Crisis (late 90s).
If we compute the summary statistics of the DJIA returns, we will find that
the unconditional distribution exhibits fat tails (high kurtosis). In particular,
the kurtosis of this sample is k = 8.61. The variability of volatility can cause
fat tails in the unconditional distribution, even if the conditional returns are
normally distributed. To illustrate this point, consider a simple example where
the volatility can take only two values, σt = σ1 = 10% or σt = σ2 = 40%, and both
means are zero. Say that we denote with fN (x; µ, σ) the corresponding normal
probability density functions.
Also, suppose that p1 = 75% of the time returns are drawn from a normal2
r ∝ fN (r; 0, σ1 ), and in the other p2 = 25% of the time they are drawn from a
second normal r ∝ fN (r; 0, σ2 ). If we consider the unconditional distribution, its
probability density function will be a mixture of the two normal distributions,
and in fact
1
Here by returns we actually mean log-returns, that is if St is the time-series of DJIA
values, rt = log St − log St−1 .
2
Here the notation x ∝ f(x; · · · ) means that x is distributed as a random variable that
has a probability density function given by f(x; · · · ).



FIGURE 6.2: This figure illustrates the different kurtosis and skewness patterns
that can be generated by mixing two normal distributions. In both figures σ1 =
10% and σ2 = 40%. In subfigure (a) the two means are equal, µ1 = µ2 = 0, a
setting that can generate fat tails but not skewness. In subfigure (b) µ1 = 5%
and µ2 = −15%, generating negative skewness in addition to the fat tails.
(a) µ1 = µ2 ; (b) µ1 ≠ µ2 .

r ∝ p1 fN (r; 0, σ1 ) + p2 fN (r; 0, σ2 )
Figure 6.2(a) illustrates exactly this point, and gives the two conditional
normals and the unconditional distribution. One can easily compute the statistics
for the unconditional returns, and in particular the unconditional volatility σ =
21.7%, and the kurtosis k = 8.7 > 3.
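These moments follow directly from the mixture formulas: the variance is p1 σ1² + p2 σ2², and since each zero-mean normal has fourth moment 3σ⁴, the kurtosis is Σ pi 3σi⁴ / (Σ pi σi²)². A quick check (Python, for illustration) reproduces the numbers quoted above up to rounding:

```python
import numpy as np

p = np.array([0.75, 0.25])     # mixing probabilities p1, p2
s = np.array([0.10, 0.40])     # sigma_1 and sigma_2, both means zero

var = np.sum(p * s ** 2)       # unconditional variance
m4 = np.sum(p * 3.0 * s ** 4)  # unconditional fourth moment
sigma = np.sqrt(var)           # about 21.8%
kurt = m4 / var ** 2           # about 8.6, well above the normal's 3
```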
Inspecting figure 6.1(b), one can also observe that the historical realized
volatility does not swing wildly, but exhibits a cyclical pattern. In particular, it
appears that volatility exhibits high autocorrelation, with low (high) volatility
periods more likely to be followed by more low (high) volatility periods. In the
literature these patterns are often described as volatility clusters. Having said
that, the volatility process appears to be stationary, in the sense that it remains
between bounds, an intuitive feature.3 We can imagine that there is some long
run volatility that serves as an attractor, with the spot volatility hovering around
this level.

IMPLIED VOLATILITY
In chapter 2 we gave a quick introduction to the notion of the implied volatility
(IV), denoted with σ̂. In particular, given an observed European call or put option
price Pobs , the IV will equate it to the theoretical Black-Scholes value, solving
the equation
3
The intuition stems from the fact that, unlike prices themselves, market volatility can
not increase without bounds. Even if we are asked to provide some estimate for the
volatility of DJIA in 1,000 years, we would probably come up with a value that reflects
current volatility bounds. If we are asked to estimate the level of DJIA in 1,000 years’
time, we would produce a very large number.



FIGURE 6.3: The S&P500 index (SPX, in blue) and the implied volatility index
(VIX, in green) are given in subfigure (a). Subfigure (b) presents a scatterplot of
the corresponding differences, illustrating the return/volatility correlation.
(a) levels; (b) differences.

σ̂ ∈ R+ : Pobs = fBS (S, K , T , r, σ̂)


We also showed that short at-the-money IV reflects expected volatility (under
the equivalent martingale measure that prices the corresponding option).
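Since the Black-Scholes call price is strictly increasing in σ, the defining equation above can be solved with a simple bisection. A self-contained Python sketch (illustrative; a production implementation would typically use a faster Newton-type scheme):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

def implied_vol(P_obs, S, K, T, r, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection for sigma-hat: the call price is increasing in sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < P_obs:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Inverting a price that was generated with a known σ recovers that σ to high accuracy, which is a useful sanity check for any IV routine.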
As expected, implied volatility shows similar patterns to the historical re-
alized volatility. In particular, time series of IV for a particular contract with
fixed maturity exhibit an autoregressive structure and clusters. Figure 6.3(a)
gives the S&P500 index as well as the implied volatility index VIX, released
by the Chicago Board Options Exchange (CBOE). The VIX is computed as a
weighted average of option prices that bracket 30 days to maturity, with more
weight given to options that are at-the-money.4 These options are written on
the S&P500 index, and the data span a period from January 1990 (when the
VIX index was first released) up to April 2007. The VIX has been coined as
investors’ “fear gauge”, and figure 6.3(a) certainly illustrates that. Just like the
realized volatility (discussed in the previous subsection), the VIX increases in
periods where significant events cause the market to go into turmoil. We can
clearly see the first Gulf war (8/90-2/91), the East Asian crisis (5-8/97), the
Russian crisis and the collapse of Long-Term Capital Management (5-9/98), the
9/11 attacks (11/01), and the buildup to the invasion of Iraq (3/03). In all these
episodes the market level declined.
Based on our catalogue of high volatility episodes that we devised above
(using the VIX or the historical realized volatility), it is apparent that they were
accompanied by periods of low or negative returns. Each one of these clusters
is a chapter of market turmoil. This suggests that there might be some negative
correlation between the market returns and their contemporaneous volatility.
Periods of high volatility are accompanied by low returns, while returns are
higher when volatility is low. These two market regimes reflect the bad and
4
The step-by-step construction of the VIX index is given in the White Paper CBOE
(2003).

good times in the market.5 It is easy to investigate the validity of this claim,
by a simple scatterplot of the realized volatility against market returns, as in
figure 6.3(b). The negative relationship is apparent, and is verified by a simple
regression that indicates a relationship of the form ∆σ̂ = 0.03% − 0.89 · ∆µ̂. This
indicates that a 1% drop in the market index is accompanied (on average) by a
0.89% rise of the market volatility.6
This negative correlation is often coined the leverage effect, as it can be
theoretically explained by the degree of leverage that underlies the capital
structure of the firm. In particular, as the firm value is the sum of debt and
equity, a shock to the value of the firm will have an impact on the stock price
that depends on the leverage. If the firm has been financed by issuing stock
alone, then a 1% increase in the firm value will result in a 1% rise in the stock price;
on the other hand, if the firm is levered, the impact on the stock price can be
a lot higher than 1%, depending on the leverage.7 Thus, higher leverage will
produce higher stock price volatility. In addition, a negative stock price shock
will increase leverage, implying that negative returns imply higher volatility,
and hence the negative correlation. Early research (for example Christie, 1982)
indicate that there is indeed a relationship between this correlation and the
balance sheet, but more recent evidence indicates that this effect cannot really
explain the magnitude of the asymmetry that is observed or implied from options
markets (Figlewski and Wang, 2000). It appears that this negative relationship
might be better attributed to the erratic behavior of market participants during
market downturns.
Whatever its reasons, this negative relationship between asset returns and
their volatilities manifests itself as negative skewness in the unconditional
distribution. Coming back to our toy model that we used to investigate kur-
tosis, assume that now returns can come from two normals with parameters
(µ1 , σ1 ) = (5%, 10%) (the good times), or (µ2 , σ2 ) = (−15%, 40%) (the bad times).
Figure 6.2(b) presents the unconditional distribution for this setting. Straight-
forward calculations can reveal that the skewness of the unconditional returns
is s = −1.37, while kurtosis is similar, k = 8.28.
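The skewness and kurtosis quoted here can be verified with the same mixture algebra as before; because the mixture mean is zero for these parameters, raw and central moments coincide (Python, for illustration):

```python
import numpy as np

p = np.array([0.75, 0.25])      # good and bad times
mu = np.array([0.05, -0.15])
s = np.array([0.10, 0.40])

m1 = np.sum(p * mu)                               # mixture mean, zero here
m2 = np.sum(p * (s ** 2 + mu ** 2)) - m1 ** 2     # variance
m3 = np.sum(p * (mu ** 3 + 3 * mu * s ** 2))      # third moment about zero
m4 = np.sum(p * (mu ** 4 + 6 * mu ** 2 * s ** 2 + 3 * s ** 4))
skew = m3 / m2 ** 1.5           # about -1.37
kurt = m4 / m2 ** 2             # about 8.28
```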

5
Also found in the literature as bust and boom, bear and bull, or recession and expan-
sion, depending on the journal or publication one is reading.
6
Of course this is a very crude method. Bouchaud and Potters (2001) give a formal
empirical investigation based on a large number of stocks and indices, and find that this
negative correlation is more pronounced for indices, but more persistent for individual
stocks.
7
We follow here the standard Modigliani and Miller (1958) capital structure approach,
where stocks and bonds represent different ways of splitting firm ownership. Say that
a company is worth $100m, with $10m in stock and the rest ($90m) in bonds. If the
value of the firm increases by 1% up to $101m, the value of the stock will increase to
$11m to reflect that increase (since the debt value cannot change). This will imply a
10% rise in the stock price.



FIGURE 6.4: Implied volatilities σ̂, plotted against maturity T and either
strike prices K (subfigure a) or against the corresponding Deltas ∆ =
N( log(F /K )/(σ̂√T ) + σ̂√T /2 ) (subfigure b). (a) against strikes; (b) against
delta.

The implied volatility surface


Given an underlying asset, at any point in time there will exist a number of
options, spanning a range of strike prices for different times to maturity. Each
one of these options can be inverted to deliver an implied volatility σ̂(K , T ).
A three-dimensional scatterplot of these implied volatilities gives the implied
volatility surface, which has some distinct and very interesting features. Such a
scatter plot is given in figure 6.4(a), for options computed on the S&P500 index.
One can readily observe an implied volatility skew for each maturity level.
IV is higher for small strikes, which correspond to in-the-money calls or out-
of-the-money puts, and declines as we move towards higher strikes. Another
observation is that this skew is more pronounced for options with shorter matu-
rities, and flattens out for long dated options. The monotonic relation between
the BS price and volatility indicates that out-of-the-money puts appear to carry
a higher premium than the corresponding out-of-the-money calls. More recently,
the volatility skew is presented against other forms of moneyness that remove
some of the maturity effect. Figure 6.4(b) gives such an example, where the same
IV surface is re-parameterized with respect to the Delta of the appropriate call.
In that case at-the-money contracts will be mapped to a Delta of ∆ = 1/2.
If the BS model was the correct one, that is to say if log-returns were nor-
mally distributed with constant volatility, IV surfaces would be flat. The shape
of the IV surface can point towards these deviations from normality, and in fact
it can reveal the risk-neutral distribution that is consistent with the observed
implied volatilities. In particular, across moneyness the skew pattern we outline
above is typical of index options, and to some extent stock options. Currency
options can also exhibit a U-shaped pattern of implied volatilities, coined the
volatility smile. Such a pattern was also encountered in stock and index op-
tions before the 1987 stock market crash (see Rubinstein, 1985, 1994; Jackwerth
and Rubinstein, 1996, for details). A volatility smile will be consistent with a
distribution that exhibits fat tails, since in that case it would be more likely to

exercise out-of-the-money puts or calls. To reproduce a volatility skew, one will
need a distribution that is not only leptokurtic but also skewed.
The volatility surface itself is not stable across time. The dynamics of the
surface are investigated in Skiadopoulos, Hodges, and Clewlow (2000), and more
recently in Cont and da Fonseca (2002). Assumptions on these dynamics are
going to affect the Delta hedging schemes that can be employed. Derman (1999)
discusses such hedging rules, namely the sticky strike, the sticky Delta or the
sticky local volatility strategy.
One challenge is to construct a theoretical model that can replicate the shape
and the dynamics of the IV surface.

TWO MODELING APPROACHES


To model the time varying nature of the asset return volatility, typically one has
to choose between a Garch and a SV approach. Each one has its benefits, but
also some shortfalls and peculiarities. Generally speaking, the Garch family is
more suited for historical estimation and risk management purposes, while the
stochastic volatility is better adapted towards derivative pricing and hedging.
The following table gives a quick comparison of the two families. In the next
sections we will give more details.

Garch SV
current volatility known unknown
conditional volatility computable unknown
volatility randomness no extra source extra source
volatility price of risk set internally set externally
time frame discrete continuous
incompleteness discrete time extra diffusions
option pricing very limited available
historical calibration maximum likelihood hard
calibration to options hard transforms

6.2 AUTOREGRESSIVE CONDITIONAL


HETEROSCEDASTICITY
In the previous section we pointed out that a mixture of normal distributions has
the potential to produce distributions that exhibit skewness and excess kurto-
sis. Also, by investigating historical realized returns and implied volatilities, we
concluded that market volatility is time varying and cyclical. Autoregressive con-
ditional heteroscedasticity models build exactly on these points. The definitive
reference is Hamilton (1994).
Assume a probability space (Ω, F , P), and say that we are interested in
modeling a series of returns rt = rt (ω) for t = 1, . . . , T and ω ∈ Ω. The


information that is gathered up to period t is represented by the filtration Ft =
σ(rs : 0 ≤ s ≤ t). The conditional distribution is normal

rt |Ft−1 ∝ fN (rt ; µt , σt )

but having a different volatility σt , and possibly a different mean µt . This volatility
is updated using a mechanism that ensures that at each period t − 1 we can
ascertain the parameters of next period’s returns, σt and µt , based on past returns
alone. In probability jargon we say that both σt and µt are Ft−1 -adapted.

THE ARCH MODEL


Engle (1982) set up a process which he coined Arch(1), standing for autoregres-
sive conditional heteroscedasticity of order one. In particular

rt = µ + εt
εt ∼ N(0, ht )
ht = ω + γ ε²t−1

In this model the conditional return is indeed normally distributed, rt |Ft−1 ∝
fN (rt ; µ, √ht ), and the volatility is Ft−1 -adapted since it is a function of εt−1 =
rt−1 − µ which is known at time t − 1. Also, if the volatility at time t − 1 is large,
then it will be more likely to draw a large (in absolute terms) εt . Therefore an
Arch(1) will exhibit some autocorrelation in the volatility. In order to ensure that
the volatility is positive we need to impose the restrictions ω, γ > 0.
We can write volatility forecasts ht+s|t = E[ε²t+s |Ft ] = Et ε²t+s by backward
substitution as

ht+s|t = Et ε²t+s = ω + γ Et ε²t+s−1 = ω + γ ht+s−1|t

which yields the forecasts (using also ht+1|t = ht+1 which is known at time t)

ht+s|t = ω(1 + γ + · · · + γ^(s−2) ) + γ^(s−1) ht+1|t


= ω (1 − γ^(s−1))/(1 − γ) + γ^(s−1) ht+1 = ω/(1 − γ) + γ^(s−1) (ht+1 − ω/(1 − γ))
The above expression also indicates that the constraint γ < 1 is needed to
ensure that the volatility process is not explosive. In that case, the long run
expectation for the volatility is h⋆ = ω/(1 − γ). The expected integrated variance,
Ht,s = E[Σ_{k=1}^s ε²t+k |Ft ] = Σ_{k=1}^s ht+k|t , will be given by
Ht,s = (s − 1) ω/(1 − γ) + ((γ − γ^s)/(1 − γ)) (ht+1 − ω/(1 − γ))
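The forecast recursion is easy to check numerically. The following Python snippet (an illustrative sketch, not one of the original Matlab listings; parameter values are arbitrary) iterates ht+s|t = ω + γht+s−1|t and compares the result with the closed-form expression above.

```python
def arch1_forecast(omega, gamma, h1, s):
    # iterate h_{t+s|t} = omega + gamma * h_{t+s-1|t},
    # starting from h_{t+1|t} = h1, which is known at time t
    h = h1
    for _ in range(s - 1):
        h = omega + gamma * h
    return h

omega, gamma, h1 = 0.05, 0.30, 0.20
for s in (1, 5, 50):
    # closed form: omega*(1-gamma^(s-1))/(1-gamma) + gamma^(s-1)*h1
    closed = omega * (1 - gamma ** (s - 1)) / (1 - gamma) + gamma ** (s - 1) * h1
    assert abs(arch1_forecast(omega, gamma, h1, s) - closed) < 1e-12

# for large s the forecast settles at the long-run level omega/(1-gamma)
assert abs(arch1_forecast(omega, gamma, h1, 500) - omega / (1 - gamma)) < 1e-12
```

The long-run level ω/(1 − γ) is the fixed point of the recursion, which is why the forecast decays geometrically towards it at rate γ.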
The Arch(1) model can be easily extended to one of order p (an Arch(p)
model), by allowing the variance to depend on more lagged values of ε
ht = ω + γ1 ε²t−1 + γ2 ε²t−2 + · · · + γp ε²t−p
For this process to avoid explosive volatility we need the constraint γ1 + · · · + γp < 1.
The Arch(∞) is the natural extension where the whole history of error terms
affects our volatility forecast. Actually, early research on Arch models indicated
that a large number of lags are required to capture the dynamics of asset volatil-
ity, pointing towards some Arch(∞) structure. This gave eventually rise to the
Garch extension.

THE GARCH MODEL


The Garch model (generalized Arch) of Bollerslev (1986) extends the Arch family
by adding dependence on past variances. For example, the popular Garch(1,1)
specifies

rt = µ + ε t
εt ∼ N(0, ht )
ht = ω + βht−1 + γε²t−1

The additional constraint β > 0 is sufficient to keep the variance positive. This
seemingly small addition is equivalent to an Arch(∞) structure, which is clear
if we back-substitute the conditional variances, which yields for s lags

ht = ω (1 − β^s)/(1 − β) + β^s ht−s + γε²t−1 + γβε²t−2 + · · · + γβ^(s−1) ε²t−s
If β < 1, then we can let s → ∞, giving the Arch(∞) form of the Garch(1,1)
model

ht = ω/(1 − β) + γε²t−1 + γβε²t−2 + γβ²ε²t−3 + · · ·
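This equivalence can be verified numerically. The Python sketch below (illustrative, numpy assumed; not one of the original Matlab listings) runs the Garch(1,1) recursion on a simulated shock series and reproduces the final variance from the Arch(∞) weights.

```python
import numpy as np

rng = np.random.default_rng(42)
omega, beta, gamma = 0.05, 0.80, 0.10
n = 400
eps2 = rng.standard_normal(n) ** 2           # squared shocks

# recursive Garch(1,1) variance, started at omega/(1-beta)
h = np.empty(n + 1)
h[0] = omega / (1 - beta)
for t in range(n):
    h[t + 1] = omega + beta * h[t] + gamma * eps2[t]

# Arch(infinity) form: h_t = omega/(1-beta) + gamma * sum_j beta^(j-1) eps2_{t-j}
weights = gamma * beta ** np.arange(n)       # gamma, gamma*beta, gamma*beta^2, ...
h_inf = omega / (1 - beta) + np.sum(weights * eps2[::-1])

assert abs(h[-1] - h_inf) < 1e-10
```

Starting the recursion at ω/(1 − β) makes the two representations match exactly; any other start-up value only matters through a β^n term that dies out.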
The impact of lagged errors decays exponentially as we move further back in the
past of the series. The Garch(1,1) model has been extremely popular amongst
econometricians and practitioners that need to either filter or forecast volatility.
The natural generalization Garch(p,q) includes p lags of the squared error terms
and q lagged variances.
Once again we can derive the volatility forecasts using forward substitution,
in particular
ht+s|t = Et ε²t+s = ω + βEt ht+s−1 + γEt ε²t+s−1 = ω + (β + γ)ht+s−1|t

which is the same form we encountered in the Arch case, with γ replaced by β + γ. Therefore
we can compute forecasts for the variance and the integrated variance if we
denote κ = β + γ (the so called persistence parameter)
ht+s|t = ω/(1 − κ) + κ^(s−1) (ht+1 − ω/(1 − κ))
Ht,s = (s − 1) ω/(1 − κ) + ((κ − κ^s)/(1 − κ)) (ht+1 − ω/(1 − κ))

The long run (or unconditional) variance is now given by h⋆ = ω/(1 − β − γ). In
order for the variance to remain well defined we need to impose the constraint
β + γ < 1.

THE GARCH LIKELIHOOD


In order to use a Garch model we need to know the parameters of the process,
namely {µ, β, γ, ω}. We can estimate these parameters based on a time series of
historical returns r = {r1 , . . . , rT }. If we denote with ft−1 (rt ) = P[rt ∈ dr|Ft−1 ]
the conditional density, then the likelihood of the sample is given by the product
L (r) = Π_{t=1}^T ft−1 (rt ). We usually employ the logarithm of this expression,
the log-likelihood
log L (r) = Σ_{t=1}^T log ft−1 (rt )

The fact that conditionally the random variables rt |Ft−1 are normally dis-
tributed, allows one to compute the likelihood for a given set of parameters
θ = {µ, ω, β, γ}. Often we set the long run variance h? equal to the sample
variance σ̄ 2 , and therefore set ω = σ̄ 2 (1 − β − γ). This makes sense if our sample
is fairly long, and can significantly help the numerical optimization algorithm.
In that case the parameter vector to be estimated is θ = {µ, β, γ}. In order to
start the recursive algorithm that computes the Garch variance we also need an
initial value for h0 . We can also use h0 = σ̄², or we can add h0 to the parameter
vector and let it be estimated.
In the Garch process we defined above, the parameter µ is not the expected
rate of return. In particular, as the asset price is lognormally distributed, St =
St−1 exp(rt ), the expected price is Et−1 St = St−1 exp(µ + ½ht ). Therefore, if
we want µ to denote the constant expected return, then we need to set up the
Garch equation as

rt = µ − ½ht + εt ,  εt ∼ N(0, ht )
2
The next steps, implemented in listing 6.1, show how the likelihood can be
computed for a given set of parameters θ and a sample r = {r t }. The popularity
of the Garch model stems from the fact that this likelihood is computed rapidly
and can be easily and quickly maximized. The ideas behind maximum likelihood
estimation were covered in detail in chapter 5.
1. If they are not part of θ, we set the parameters ω = σ̄²(1 − β − γ) and
h0 = σ̄².
2. Based on the parameters {µ, ω, β, γ} and the initial value h 0 , we filter the
volatility series, applying the Garch(1,1) recursion
εt = rt − µ + ½ht
ht+1 = ω + βht + γε²t

LISTING 6.1: garch11lik.m: Garch likelihood function.


% garch11lik.m
function [li, h] = garch11lik(par, data)
vbar = var(data);
m = par(1);                       % mu
b = par(2);                       % beta
c = par(3);                       % gamma
a = vbar*(1 - b - c);             % omega restricted
N = size(data, 1);
h = zeros(N+1, 1);
spoth = vbar;                     % current variance
h(1) = spoth;                     % initial variance
li = 0.0;                         % the likelihood
for indx = 1:N
    % epsilon
    error = data(indx) - m + 0.5*spoth;
    % update likelihood
    li = li + (error^2)/spoth + log(spoth);
    % update conditional variance
    spoth = a + b*spoth + c*(error^2);
    % store conditional variance
    h(indx+1) = spoth;
end
% discard last variance forecast
h = h(1:N);

3. Now we have the variance series which allows us to compute the log-
likelihood of each observation rt . Since rt |Ft−1 ∼ N(µ, ht )

log L (rt |θ) = −(rt − µ)²/(2ht ) − ½ log ht − ½ log 2π
4. Finally adding up will give the log-likelihood of the sample
log L (r|θ) = Σ_{t=1}^T log L (rt |θ)
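The four steps above can be condensed into a few lines of Python (an illustrative sketch, numpy assumed; the Matlab version is in listing 6.1). The recursion uses the −½ht mean convention introduced earlier.

```python
import numpy as np

def garch11_loglik(mu, beta, gamma, r, h0=None, omega=None):
    # Gaussian Garch(1,1) log-likelihood of a return sample r (steps 1-4)
    r = np.asarray(r, dtype=float)
    sbar2 = r.var()
    if omega is None:               # step 1: tie the long-run variance
        omega = sbar2 * (1.0 - beta - gamma)
    h = sbar2 if h0 is None else h0
    ll = 0.0
    hs = []
    for rt in r:                    # step 2: filter the variance series
        hs.append(h)
        eps = rt - mu + 0.5 * h     # epsilon with the -h/2 mean convention
        # step 3: per-observation Gaussian log-likelihood
        ll += -0.5 * eps ** 2 / h - 0.5 * np.log(h) - 0.5 * np.log(2 * np.pi)
        h = omega + beta * h + gamma * eps ** 2
    return ll, np.array(hs)         # step 4: ll is the summed log-likelihood

rng = np.random.default_rng(7)
r = 0.001 + 0.01 * rng.standard_normal(500)
ll, hs = garch11_loglik(0.001, 0.90, 0.05, r)
assert np.isfinite(ll) and np.all(hs > 0)
```

Maximizing `ll` over (µ, β, γ) with any numerical optimizer reproduces the estimation step discussed next.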

The maximization of the log-likelihood is typically carried out numerically, using a
hill climbing algorithm. Press, Flannery, Teukolsky, and Vetterling (1992) describe
a number of such algorithms. We will denote with θ̂ the parameter vector that
maximizes the sample log-likelihood. Essentially the first order conditions set
the Jacobian equal to zero

∂ log L (r|θ)/∂θ |θ=θ̂ = 0

The Hessian matrix of second derivatives can help us produce the asymptotic
standard errors
Ĥ = ∂² log L (r|θ) / (∂θ ∂θ′) |θ=θ̂

The covariance matrix of θ̂ is given by the inverse of the Hessian (Hamilton,
1994, gives methods to estimate H).

Estimation examples
As an example, we will estimate two time series using the Garch(1,1) process for
the volatility. We start with the long DJIA index sampled weekly from 1930 to
2004 (plotted in figure 6.1(a)), and then move to the shorter SPX index sampled
daily from 1990 to mid-2007 (plotted in figure 6.3). Listing 6.2 shows how the
log-likelihood can be optimized.
The estimation is done using the Optimization Toolbox in Matlab, although
any hill climbing algorithm will do in that simple case. We use constrained
optimization to ensure that β and γ are bounded between zero and one. Also we
want to ensure that κ = β + γ < 1. The standard errors are produced using the
Hessian matrix that is estimated by the toolbox.8 We also use the restriction on
the long run variance, and set the initial variance equal to the sample variance.
The maximum likelihood parameters are given below (all in percentage terms),
with standard errors in parentheses.

              DJIA       SPX
µ             0.19       0.05
             (0.02)     (0.01)
β            91.32      93.93
             (0.94)     (0.53)
γ             7.66       5.42
             (0.76)     (0.45)
κ = β + γ    98.98      99.45

Both time series give similar estimated values. If we write the error term εt =
√ht ηt for ηt ∼ N(0, 1), then ε²t = ht η²t and since Eη²t = 1 we can write ε²t =
ht (1 + ut ) where now Eut = 0 (but of course ut is not normal). Then the Garch(1,1)
variance process can be cast in an autoregressive AR(1) form

ht = ω + κht−1 + γht−1 ut−1

The importance of the coefficient κ now becomes apparent, as it will determine
the decay of variance shocks. In both time series κ ≈ 1, which indicates
8 The optimization toolbox actually updates estimates of the Hessian and the output is
not always reliable. Some care has to be taken here, and the standard errors should be
as the score, or outer product method, etc.

LISTING 6.2: garch11impl.m: Estimation of a Garch model.


% garch11impl.m
% load data
%data = xlsread('DJIA-weekly.xls'); t0 = 52;  % DJIA
data = xlsread('SPX.xls'); t0 = 252;          % SPX
t = x2mdate(data(2:end,1));                   % time
y = diff(log(data(:,2)));                     % log-returns
% set initial values
mm = mean(y);                                 % mu
b  = 0.90;                                    % beta
c  = 0.10;                                    % gamma
par = [mm b c];
% constraints
parL = [-0.10 0.001 0.001];                   % positive
parU = [ 0.10 0.999 0.999];                   % less than one
cA = [0 1 1];                                 % beta+gamma<1
cB = [0.999];
% optimization options
optopt = optimset('MaxIter', 100, 'Display', 'iter', ...
                  'LargeScale', 'off');
[x, fval, exitflag, output, lambda, grad, hess] = ...
    fmincon(@garch11lik, par, cA, cB, [], [], ...
            parL, parU, [], optopt, y);
par = x;                                      % estimates
stderrs = diag(sqrt(inv(hess)));              % standard errors
[li, h] = garch11lik(par, y);                 % filter at estimates
% output variance time series
figure(1);
plot(t, 100*sqrt(h*t0));
grid on;
datetick('x', 'yy');

that volatility behaves as a near unit root process.9 In such a process shocks
to the volatility are near permanent, and the process is reverting very slowly
towards the long run variance.10
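One way to appreciate how slowly such a process mean reverts is to compute the half-life of a variance shock, the horizon s at which κ^s = ½. A quick illustrative Python computation with the κ estimates above:

```python
import math

def half_life(kappa):
    # horizon s at which kappa**s = 1/2, i.e. half of a variance shock remains
    return math.log(0.5) / math.log(kappa)

print(round(half_life(0.9898)))   # DJIA, weekly: about 68 weeks
print(round(half_life(0.9945)))   # SPX, daily: about 126 trading days
```

So a variance shock needs roughly 68 weeks (DJIA) or 126 trading days (SPX) before half of it has decayed.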
These parameter estimates are typical of Garch estimations, and the near
integrated behavior has been the topic of substantial research through the 80s
and the 90s. A number of researchers introduced Garch variants that exhibit long
memory, such as the fractionally integrated Garch (Figarch) of Baillie, Bollerslev,
9 In fact, if we trust the standard errors we are not able to reject the hypothesis κ = 1.
10 A Garch process with β + γ = 1 is called integrated Garch (Igarch), and is equivalent to
the exponentially weighted moving average (EWMA) specification, where the variance
is updated as σ²t = λσ²t−1 + (1 − λ)ε²t−1 . In this case the volatility behaves as a random
walk.
FIGURE 6.5: Filtered volatility for the DJIA and the SPX index. In subfigure (a)
the Garch variance (blue) of weekly DJIA returns is plotted with the historical
realized volatility (red). In (b) the Garch variance (blue) of daily SPX returns is
plotted with the implied volatility VIX index (red).

(a) DJIA volatility                        (b) SPX volatility

and Mikkelsen (1993). Others acknowledge that models with structural breaks
in the variance process can exhibit spuriously high persistence (Lamoureux and
Lastrapes, 1990), and produce models that exhibit large swings in the long run
variance attractor (Hamilton and Susmel, 1994; Dueker, 1997).
Figure 6.5 gives the filtered volatility for both cases. This is a by-product of
the likelihood evaluation. For comparison, the historical volatility (of figure 6.1)
and the implied volatility VIX index (of figure 6.3) are also presented. The filtered
volatilities are computed using the maximum likelihood parameter estimates. One
point worth making is that the implied volatility overestimates the true volatility,
as illustrated in subfigure (b), where the VIX index is above the filtered volatility
for most of the time. This is due to the fact that implied volatility can be thought of
as a volatility forecast under an equivalent martingale measure, rather than a true
forecast. There will be different risk premiums embedded in the implied volatility,
rendering it a biased estimator or forecast of the true volatility.

OTHER EXTENSIONS
Apart from the simple Garch(1,1) model that we already presented, there have
been scores of modifications and extensions, tailor made to fit the stylized facts
of asset prices. We will give here a few useful alternatives.
In the standard Garch model we assumed that conditional returns are normally
distributed, and wrote εt = √ht ηt , with ηt ∼ N(0, 1). The likelihood function
was based on this assumption. It is straightforward to use another distribution
for ηt ; if it has a density function that is known in closed form, then it
is straightforward to modify the likelihood function appropriately. Of course it
might be necessary to normalize the distribution to ensure that Eηt = 0 and
Eη²t = 1. A popular choice is the Student-t distribution which can accommodate
conditional fat tails. The density function of the Student-t distribution with ν
degrees of freedom is

ft (x; ν) = [Γ((ν + 1)/2) / (√(πν) Γ(ν/2))] (1 + x²/ν)^(−(ν+1)/2)
As the t distribution has variance ν/(ν − 2), we can set the density of ηt equal to

ηt ∝ √((ν − 2)/ν) ft (√((ν − 2)/ν) ηt ; ν)

We can augment the parameter vector θ with ν, and the third step of the like-
lihood evaluation will now become
3′. Now we have the variance series which allows us to compute the log-
likelihood of each observation

log L (rt |θ) = log[√(ν − 2) Γ((ν + 1)/2) / (ν √π Γ(ν/2))] − ½ log ht
− ((ν + 1)/2) log(1 + (rt − µ)² (ν − 2)/(ht ν²))

Garch models based on normal or t distributed errors do not exhibit skewness.


Nelson (1991) considers the generalized error distribution (GED) which can
potentially capture skewed errors. Having said that, this approach does not
model the leverage effect directly. The GJR-Garch model, introduced in Glosten,
Jagannathan, and Runkle (1993), uses a dummy variable to assume different
impact of positive and negative news on the variance process. In particular
ht = ω + βht−1 + γε²t−1 + γ⋆ I(εt−1 ≤ 0) ε²t−1

The function I(x) is the indicator function. Therefore, if γ⋆ > 0 a negative return
will increase the conditional variance more than a positive one (γ + γ⋆ instead of
γ).11 But even with the GJR approach we will not have the situation illustrated
in figure 6.3(b), where positive returns will actually have a negative impact on
the volatility.
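A minimal sketch of the GJR update makes the asymmetry explicit (Python, illustrative parameter values; not part of the original listings):

```python
def gjr_update(h_prev, eps_prev, omega=1e-6, beta=0.90,
               gamma=0.05, gamma_star=0.08):
    # one GJR-Garch step: negative shocks carry the extra gamma_star loading
    indicator = 1.0 if eps_prev <= 0 else 0.0
    return omega + beta * h_prev + (gamma + gamma_star * indicator) * eps_prev ** 2

h = 1e-4
h_up = gjr_update(h, +0.02)   # positive 2% shock
h_dn = gjr_update(h, -0.02)   # negative 2% shock of the same size
assert h_dn > h_up            # leverage effect: bad news raises variance more
```

The difference between the two updated variances is exactly γ⋆ε², so the sign of the shock matters, but a positive return still (weakly) raises the variance.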
The Egarch model of Nelson (1991) takes a more direct approach, as it uses
raw rather than squared returns. This implies that the sign is not lost and will
have an impact. In order to get around the non-negativity issue he models the
logarithm of the variance
log ht = ω + β log ht−1 + γ (|ηt−1 | + θηt−1 ),  ηt = εt /√ht
11 Other asymmetric extensions include the threshold model of Zakoian (1994) and the
quadratic Garch of Sentana (1995).
LISTING 6.3: egarch11lik.m: Egarch likelihood function.
% egarch11lik.m
function [li, h] = egarch11lik(par, data)
vbar = var(data);
m = par(1);                       % mu
a = par(2);                       % omega
b = par(3);                       % beta
c = par(4);                       % gamma
w = par(5);                       % theta
N = size(data, 1);
h = zeros(N+1, 1);
spoth = vbar;                     % current variance
h(1) = spoth;                     % initial variance
li = 0.0;                         % likelihood
for indx = 1:N
    spoth = spoth + eps;
    % epsilon
    error = data(indx) - m + 0.5*spoth;
    % update likelihood
    li = li + (error^2)/spoth + log(spoth);
    % normalize for eta
    eta = error/sqrt(spoth);
    % update variance
    logh = log(spoth);
    logh = a + b*logh + c*(abs(eta) + w*eta);
    spoth = exp(logh);
    h(indx+1) = spoth;
end
h = h(1:N);
% if variance failed set a large value
% (perhaps due to absurd parameter values)
if isnan(li)
    li = 1e6;
end

In the Egarch approach γθ < 0 will be consistent with figure 6.3(b), as higher
returns will lower volatility. Listing 6.3 shows an implementation of the Egarch
likelihood function. As there are no constraints in the Egarch maximization, the
hill climbing algorithm might attempt to compute the likelihood for absurd pa-
rameter values as it tries to find the optimum. There are a couple of tricks in the
code that ensure that a likelihood value will be returned. The implementation
for the optimization resembles listing 6.2, but we shall use unconstrained opti-
mization. The maximum likelihood parameters are given below for the two time
series
            DJIA         SPX
µ           0.14%        0.04%
           (0.02%)      (0.01%)
ω          -0.2968      -0.2612
           (0.0204)     (0.0128)
β           0.9785       0.9826
           (0.0021)     (0.0013)
γ           0.1678       0.1249
           (0.0109)     (0.0075)
θ          -0.4093      -0.6145
           (0.0402)     (0.0587)
As expected, the product γθ < 0, supporting the negative returns/volatility re-
lationship. The filtered variances are similar to the ones in figure 6.5.
Asset pricing models typically assert that market volatility is a measure of
systematic risk, and that the expected return should be adjusted accordingly. If
r is the risk free rate of return, then popular modifications to the Garch equation
are the so called Garch-in-mean models
rt = r + λ√ht − ½ht + εt ,  εt ∼ N(0, ht )
rt = r + λht − ½ht + εt ,  εt ∼ N(0, ht )
The parameter λ in the above expressions denotes the price of risk. Note that
in the first alternative the asset exhibits constant Sharpe ratio.
Garch models can also be extended to more dimensions. In that case the
covariance matrix is updated at each time step. In the univariate case we needed
to take some care to ensure that the variance remained positive; now, in an
analogous fashion, we must make sure that the covariance matrix is positive
definite. This is not a trivial task. Also, in the general case a large number of
parameters have to be estimated, and we usually estimate restricted versions in
order to reduce the dimensionality.12 In general, a multivariate Garch(1,1) will
be of the form

r t = µt + ε t
εt = Ht^(1/2) ηt
ηt ∼ N(0, I)

The matrix Ht^(1/2) can be thought of as the one obtained from the Cholesky fac-
torization of the covariance matrix Ht . The covariance matrix can be updated in
a form that is analogous to the univariate Garch(1,1)
12
The most widely used forms are the VEC specification of Bollerslev, Engle, and
Wooldridge (1988), and the BEKK specification of Engle and Kroner (1995). A re-
cent survey of different approaches and methods is Bauwens, Laurent, and Rombouts
(2006).

Ht = Ω + B ⊙ Ht−1 + A ⊙ (εt−1 ε′t−1 )

In this case the (i, j)-th element of the covariance matrix will depend on its
lagged value and on the product ε(i)t−1 ε(j)t−1 . Of course more general forms are
possible, with covariances that depend on different lagged covariances or error
products.
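One simulation step of a bivariate version can be sketched as follows (Python, numpy assumed; the coefficient matrices are arbitrary illustrative choices). Note that if Ω, B and G are themselves positive semi-definite, every term of the update is positive semi-definite by the Schur product theorem, which is one simple way to keep Ht a valid covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative coefficient matrices, all positive (semi-)definite
Omega = np.array([[1e-6, 2e-7], [2e-7, 1e-6]])
B     = np.array([[0.90, 0.88], [0.88, 0.90]])   # element-wise persistence
G     = np.array([[0.05, 0.04], [0.04, 0.05]])   # element-wise shock loadings

H = np.array([[1e-4, 4e-5], [4e-5, 1e-4]])       # current covariance matrix
for _ in range(100):
    eps = np.linalg.cholesky(H) @ rng.standard_normal(2)   # eps_t = H^(1/2) eta_t
    H = Omega + B * H + G * np.outer(eps, eps)             # Hadamard (*) updates

assert np.allclose(H, H.T)                      # symmetry is preserved
assert np.all(np.linalg.eigvalsh(H) > 0)        # and H stays positive definite
```

The Cholesky factor plays the role of Ht^(1/2) in the text: multiplying it into a vector of independent standard normals produces shocks with covariance Ht.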
To illustrate the multivariate Garch, we will use an example that is based on
the Capital Asset Pricing Model (CAPM). In particular, asset returns will depend
on the covariance with the market and the market premium, which in turn will
depend on the market variance. If we denote with rtA , rtM and rtF the asset, market
and risk free rates of return, then we can write the CAPM relationships as

rtA = rtF + [Et−1 (εtA εtM ) / Et−1 (εtM )²] Et−1 (rtM − rtF ) + εtA
rtM = rtF + λEt−1 (εtM )² + εtM

Since Et−1 (rtM − rtF ) = λEt−1 (εtM )², the above system simplifies to

rtA − rtF = λEt−1 (εtA εtM ) + εtA
rtM − rtF = λEt−1 (εtM )² + εtM

We can estimate the above specification using a multivariate Garch approach,


taking into account that the covariance and the variances can be time varying.
If we define

r⋆t = [rtA − rtF ; rtM − rtF ],  εt = [εtA ; εtM ],  Ht = Et−1 (εt ε′t )

then we can estimate the process (with 17 parameters)

r⋆t = [α1 ; α2 ] + [λ1,1 λ1,2 λ1,3 ; λ2,1 λ2,2 λ2,3 ] g(Ht ) + εt ,  εt ∼ N(0, Ht )
Ht = [ω1,1 ω1,2 ; ω1,2 ω2,2 ] + [β1,1 β1,2 ; β1,2 β2,2 ] ⊙ Ht−1 + [γ1,1 γ1,2 ; γ1,2 γ2,2 ] ⊙ (εt−1 ε′t−1 )

The function g(Ht ) = (Ht(1,1) , Ht(2,2) , Ht(1,2) )′ takes the unique elements of the
covariance matrix and puts them in a vector form.
If the conditional CAPM with time varying risk premiums is sufficient to
explain the asset and market returns, then the following restrictions should be
satisfied

[α1 ; α2 ] = [0 ; 0],  [λ1,1 λ1,2 λ1,3 ; λ2,1 λ2,2 λ2,3 ] = λ [0 0 1 ; 0 1 0]
The restrictions can be tested with a likelihood ratio test.
GARCH OPTION PRICING
The Garch family of models has been the workhorse of volatility modeling and
has had many applications in testing, forecasting, and risk management. Applications
within a pricing framework, on the other hand, have been very limited.
The reason is that Garch models are set-up in discrete time, and for that reason
the underlying market is incomplete.
This means that replicating portfolios do not exist for derivative assets. In-
tuitively, this is due to the fact that the state-space is too dense compared to
the time-space (where rebalancing takes place). Over a time step the asset price
can ‘jump’ to an infinite number of values, and it is impossible to construct a
position that will hedge against all possibilities. In contrast, when trading takes
place continuously the asset price diffuses from one level to the next, giving us
the opportunity to create a dynamic Delta hedging strategy.
This is not a feature of Garch models alone; all models that are set up in
discrete time and have continuous support will share the same drawback. Even
in the simple model where the asset log-price follows a random walk model in
discrete time the market is incomplete. This implies that there is not a unique
way to identify the risk adjusted probability measure in discrete time models.
For example, there is nothing to stop us from specifying

St+1 = St exp(µ + εt+1 ),  εt+1 ∼ N(0, σ²), under P
St+1 = St exp(µQ + εQt+1 ),  εQt+1 ∼ t(ν), under Q

for µQ chosen in a way that makes the discounted price a martingale under Q,
St = EQt [exp(−r∆t)St+1 ]. But not all is lost: we just need to impose some more
structure that will eventually constrain our choices for Q. Here we will outline
two methods to achieve that, but since derivative pricing typically takes place
in a continuous time setting, we will not dwell on the details.

1. We might impose assumptions on the utility structure. Assuming a certain


utility form will set the family of equivalent measures. In particular, the
parameters of the utility function may be recovered from the true stock and
risk free expected returns.
2. We can assume that the density structure has to be maintained, that is to
say if errors are normally distributed under P, then they must be normally
distributed under Q.

Utility based option pricing


In our first approach we will assume a utility function U(t, Wt ), which measures
utility of wealth Wt realized at time t. We will also need the relationship between
wealth Wt and the underlying asset price St (an early source for this approach
is Brennan, 1979). For example, if the underlying asset is a wide index, then
one might assume that investors' wealth is highly correlated with this index. If
the underlying asset is a small stock, then the correlation will be smaller. This

* +-,/.0+213-4 567.0+98:84<; =>56@?>4A21/40=>.CB@D>3:5:3<=E? =:=-1GFIH:H<J:J:JKBL=-?>4<1+2; M:=N 30AOBP;>40= H


 !"#%$& '()"  172(6.2)

resembles the impact of the idiosyncratic versus the systematic risk in asset
pricing models.
Option prices can be computed from the Euler equations, which state that
the price at time t of a random claim that is realized at time T > t, say X T , is
given by (see for example Barone-Adesi, Engle, and Mancini, 2004)
Xt = Et [ (UW (T , WT ) / UW (t, Wt )) XT ]

Essentially, the Euler equation weights each outcome with its impact on the
marginal rate of substitution, before taking expectations. The price of a European
call option would be then equal to
Pt = Et [ (UW (T , WT ) / UW (t, Wt )) (ST − K )+ ]

Note that in the above expression there is no talk of equivalent measures. All
expectations are taken directly under P. Nevertheless, if we think of the marginal
rate of substitution as a Radon-Nikodym derivative, then we can define the
equivalent probability measure.
Of course, in general it is straightforward neither to specify the appropriate
utility nor to compute the expectation in closed form, but things are
substantially simplified if we consider power utility functions. In fact, we will
arrive at the Esscher transform, which has been very successful in actuarial
sciences. This is described in detail in Gerber and Shiu (1994).

Distribution based
The second method takes a more direct approach. Suppose that the log price
follows the standard Garch(1,1) model

∆ log St = µ − ½ht + √ht ηt
ht = ω + βht−1 + γht−1 η²t−1

Rather than trying to derive it, we define the risk neutral probability measure
as the one under which the random variable

ηQt = ηt − (r − µ)/√ht

is a standard normal variate, rendering the discounted price a martingale. Then
under risk neutrality the asset log price follows

∆ log St = r − ½ht + √ht ηQt
ht = ω + βht−1 + γht−1 (ηQt−1 + (r − µ)/√ht−1 )²
This approach is pretty much the one described in Duan (1995). Derivatives are
computed, in the usual way, as the expectation under risk neutrality. The benefit of
this approach is that standardized errors remain normally distributed even after
the probability measure change. Equivalently, we can say that the Black-Scholes
formula holds for options with one period to maturity.
One major drawback of the standard Garch model is that the expectation
that prices derivatives is not generally computable in closed form. Of course
simulation based techniques can be employed, but they will be time consuming.
An alternative, presented in Duan, Gauthier, and Simonato (1999) can be used.
In this approach the state space is discretized, and a Markov chain is used to
approximate the Garch dynamics.
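The simulation route can be sketched directly (Python, numpy assumed; an illustrative implementation of the risk-neutral dynamics above, not part of the original notes). Setting β = γ = 0 freezes the variance, in which case the simulated price must agree with the Black-Scholes formula, which provides a handy sanity check.

```python
import numpy as np
from math import log, sqrt, exp, erf

def bs_call(S, K, r, sigma, T):
    # Black-Scholes call price; N(.) built from the error function
    N = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * N(d1) - K * exp(-r * T) * N(d2)

def duan_mc_call(S0, K, r, mu, omega, beta, gamma, h1, nsteps, npaths, seed=0):
    # Monte Carlo call price under the risk-neutral Garch dynamics:
    # dlogS = r - h/2 + sqrt(h) z,  h' = omega + beta h + gamma h (z + (r-mu)/sqrt(h))^2
    rng = np.random.default_rng(seed)
    logS = np.full(npaths, log(S0))
    h = np.full(npaths, h1)
    for _ in range(nsteps):
        z = rng.standard_normal(npaths)          # eta^Q, N(0,1) under Q
        logS += r - 0.5 * h + np.sqrt(h) * z
        h = omega + beta * h + gamma * h * (z + (r - mu) / np.sqrt(h)) ** 2
    payoff = np.maximum(np.exp(logS) - K, 0.0)
    return exp(-r * nsteps) * payoff.mean()

# sanity check: with beta = gamma = 0 the variance is constant (= omega)
r_day, h_day, n = 0.0002, 0.0001, 60
mc = duan_mc_call(100.0, 100.0, r_day, r_day, h_day, 0.0, 0.0, h_day, n, 200_000)
bs = bs_call(100.0, 100.0, r_day, sqrt(h_day), n)
assert abs(mc - bs) < 0.1
```

With nonzero β and γ the same loop prices options under genuinely time-varying volatility, at the cost of simulation noise and run time, which is exactly the drawback the Markov chain and Heston-Nandi approaches address.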
The Garch variant introduced in Heston and Nandi (2000) circumvents the
computability issue, and we present their approach in the following subsection.

The Heston and Nandi model


Heston and Nandi (2000) propose a similar class of Garch-type processes

∆ log St = r + λht − ½ht + √ht ηt
ht = ω + βht−1 + γ (ηt−1 − δ√ht−1 )²

Here the bilinearity in the variance process is broken. That is to say, the product
√ht−1 ηt−1 is not present and ηt−1 , which is a standardized normal series,
appears in the variance update alone.
We set ηQt = ηt + λ√ht , and define the probability measure Q as one that
is equivalent to P, and also ηQt ∼ N(0, 1) under this measure. Then, the asset
price process under Q will satisfy

∆ log St = r − ½ht + √ht ηQt
ht = ω + βht−1 + γ (ηQt−1 − δQ √ht−1 )²

for δQ = δ + λ.
Unlike the standard Garch model, the Heston and Nandi (2000) modification
allows one to compute the characteristic function as a closed form recursion.
Then, option prices or risk neutral probabilities can be easily computed using the methods
described in chapter 4.

6.3 THE STOCHASTIC VOLATILITY FRAMEWORK


As we pointed out in the previous section, for option pricing purposes continuous
time stochastic volatility models are immensely more popular than models set up

in discrete time.13 The generic stochastic volatility process is described by a


system of two SDEs

dSt = µt St dt + vt St dBt
dvt = α(vt ) dt + β(vt ) dBvt

The leverage effect is accommodated by allowing the asset return and the volatil-
ity innovations to be correlated

E[dBt dBvt ] = ρ dt

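For intuition, here is a bare-bones Euler discretization of this system in Python (an illustrative sketch; the drift and diffusion functions chosen below are arbitrary mean-reverting choices, not a calibrated model). The correlated driver is built from two independent normals via Bvt = ρBt + √(1 − ρ²) B̄t.

```python
import numpy as np

def simulate_sv(S0, v0, mu, alpha, beta_fn, rho, dt, nsteps, seed=0):
    # Euler scheme for dS = mu*S dt + v*S dB and dv = alpha(v) dt + beta(v) dB^v,
    # where the two Brownian increments have correlation rho
    rng = np.random.default_rng(seed)
    S, v = S0, v0
    path = [S0]
    for _ in range(nsteps):
        z1, z2 = rng.standard_normal(2)
        dB = np.sqrt(dt) * z1
        dBv = rho * dB + np.sqrt(1.0 - rho ** 2) * np.sqrt(dt) * z2
        S += mu * S * dt + v * S * dB
        v += alpha(v) * dt + beta_fn(v) * dBv
        v = max(v, 1e-8)      # crude guard: keep the volatility positive
        path.append(S)
    return np.array(path)

# illustrative choices: mean-reverting volatility with constant vol-of-vol
path = simulate_sv(100.0, 0.2, 0.05,
                   alpha=lambda v: 4.0 * (0.2 - v),
                   beta_fn=lambda v: 0.1,
                   rho=-0.7, dt=1.0 / 252, nsteps=252)
assert path.shape == (253,)
```

A negative ρ, as above, produces the leverage effect: down-moves in the price tend to coincide with up-moves in the volatility.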
Derivative prices will have a pricing function that will depend on the volatility,
on top of time and the underlying asset price

Pt = f(t, St , vt )

When we introduce stochastic volatilities we move away from the Black-


Scholes paradigm, where there was a single Brownian motion that generates
uncertainty and markets are complete. Stochastic volatility models are driven
by two Brownian motions, and for that reason the market that is constructed by
the risk free and the underlying asset is not complete. This means that there is
an infinite number of derivative prices that rule out arbitrage.
Having said that, if we augment the hedging instruments with one derivative
contract the market becomes complete. Therefore derivatives will now have to
be priced in relation to each other, as well as the underlying asset. This is one
of the reasons that models with stochastic volatilities are calibrated to observed
derivative prices.
Another way of viewing this issue is through the notion of volatility risk,
introduced by the second BM Bvt (and in particular the part B̄t of Bvt that is
orthogonal to Bt , since we can write Bvt = ρBt + √(1 − ρ²) B̄t ). The underlying
asset does not depend on this BM, and therefore the risk generated by this
BM is not actually priced within the asset price. On the other hand, of course,
the risk of Bt is embedded in the risk premium µt − rt . Investors might be risk
averse towards this risk, and although this risk aversion is not manifested in the
market for the underlying asset, it will be manifested in the options market as
these contracts depend on vt directly. Using one derivative we can identify the
risk premium, and then we can price all other derivatives accordingly.
We will describe two approaches that reach derivative prices, one that im-
plements Girsanov’s theorem and one that constructs a hedging portfolio in the
spirit of BS. Before we do that, we will go through some standard stochastic
volatility models that have been proposed, discussing some of their properties
and features. We will just present a selected few here, to illustrate the motivation,
as they try to capture the stylized features of volatility processes.
13 Stochastic volatility models can also be set in discrete time, like the specification
described in Harvey, Ruiz, and Shephard (1994). They are used for historical estimation
but are not popular for derivative pricing, just like their Garch-type counterparts.
THE HULL AND WHITE MODEL
The first stochastic volatility model was introduced in Hull and White (1987,
HW). HW recognized that if investors are indifferent towards volatility risk (that
is vt is not correlated with the consumption that enters their utility function),
and volatility is independent of the underlying asset price process, then one can
integrate out the volatility path, and write the price of an option as a weighted
average. In particular, if we are pricing a European call option, then it is sufficient
to condition on the average variance over the life of the option
PHW = ∫₀^∞ fBS(t, St; v̄) f(v̄) dv̄

where the average variance over the life of the derivative in question is defined as
v̄ = 1/(T − t) ∫ₜ^T vs ds
and f(v̄) is the probability density of the average variance process.
For example, in the original Hull and White (1987) article the variance is
assumed to follow a geometric Brownian motion, which is uncorrelated with the
asset price process.

dSt = µSt dt + √vt St dBt
dvt = θvt dt + φvt dBtv

In this case, HW give a series approximation for the option price, which is based
on the moments of the average variance.
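Since the mixing result holds whenever volatility is independent of the asset shocks, it can also be checked by brute force simulation: draw geometric Brownian motion variance paths, average them over the life of the option, and average the resulting BS prices. The sketch below is only illustrative; all parameter values are hypothetical and blsprice (Financial Toolbox) is assumed for the BS formula.

```matlab
% Monte Carlo sketch of the HW mixing result: simulate GBM variance
% paths, form the average variance over [0,T], and mix the BS prices.
S0 = 100; K = 100; r = 0.04; T = 0.5;   % hypothetical contract
v0 = 0.04; theta = 0.0; phi = 0.5;      % GBM variance parameters
M = 20000; N = 100; dt = T/N;
v = v0*ones(M,1); vbar = zeros(M,1);
for n = 1:N
    vbar = vbar + v*dt/T;               % running average variance
    v = v.*exp((theta - 0.5*phi^2)*dt + phi*sqrt(dt)*randn(M,1));
end
PHW = mean(blsprice(S0, K, r, T, sqrt(vbar)));
```

Note that the zero-correlation assumption is essential here: with correlated shocks the conditioning argument behind the mixing formula breaks down.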
The HW model was the first approach (together with Wiggins, 1987) towards
a pricing formula for SV models, but the model they propose does not capture
the desired features of realized volatilities. In particular, under the geometric
Brownian motion dynamics, variance will be lognormally distributed. In the long
run, the volatility paths will either explode towards infinity, or they will fall to
zero, depending on the parameter values. Volatility in the HW model does not
exhibit mean reversion and is not stationary. As maturities increase, the variance
of our volatility forecasts increases without bound.
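The unbounded forecast dispersion is easy to quantify: a geometric Brownian motion with drift θ and volatility-of-volatility φ has Var[vt] = v0² e^{2θt}(e^{φ²t} − 1), which diverges with the horizon even when θ = 0. A quick check with hypothetical values:

```matlab
% Dispersion of the variance forecast under dv = theta*v dt + phi*v dB.
v0 = 0.04; theta = 0; phi = 0.5;
t = [1 5 25];                                   % horizons in years
varv = v0^2*exp(2*theta*t).*(exp(phi^2*t) - 1)  % grows without bound
```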

THE STEIN AND STEIN MODEL


The Stein and Stein (1991, SS) model remedies the long run behavior of the HW
specification. In particular, rather than a geometric Brownian motion, SS use an
Ornstein–Uhlenbeck (OU) process. The OU process exhibits mean reversion,
and for that reason has a long run stationary distribution, which is actually
normal. SS model the volatility rather than the variance

dSt = µSt dt + σt St dBt


dσt = θ(σ̄ − σt )dt + ξdBtσ


This process was later extended in Schöbel and Zhu (1999) by allowing
the two BM processes to be correlated. The volatility process follows a normal
distribution for each maturity, and therefore can cross zero. This implied that
the true correlation (that is Et dSt dσt ) changes sign when this happens. This can
be an undesirable property of the model.
Schöbel and Zhu (1999) compute the characteristic function of the log-price
 
φ(T, u) = exp{ iu log(S0) + iuµT − (iuρ/(2ξ)) (σ0² + ξ²T) }
        × exp{ ½ D(T; s1, s3)σ0² + B(T; s1, s2, s3)σ0 + C(T; s1, s2, s3) }

The functions D, B and C are solutions of a system of ODEs, and are given in
a closed (but complicated) form in the appendix of Schöbel and Zhu (1999).

THE HESTON MODEL


By far the most popular model with stochastic volatility is Heston (1993). The
variance follows the square root process (also called a Feller process, developed
in Feller, 1951), which is also used as the building block for the Cox, Ingersoll,
and Ross (1985) model for the term structure of interest rates. The dynamics are
given by

dSt = µSt dt + √vt St dBt
dvt = θ(v̄ − vt)dt + ξ√vt dBtv
Et[dBt dBtv] = ρ dt

The Heston model has a number of attractive features and a convenient pa-
rameterization. In particular, the variance process is always non-negative, and
is actually strictly positive if 2θv̄ > ξ 2 . The volatility-of-volatility parameter ξ
controls the kurtosis, while the correlation parameter ρ can be used to set the
skewness of the density of asset returns. The variance process exhibits mean re-
version, having as an attractor the long run variance parameter v̄. The parameter
θ defines the strength of mean reversion, and dictates how quickly the volatility
skew flattens out.
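Pricing below is done with the characteristic function, but a simple Euler scheme conveys the dynamics. The max(v, 0) guard ("full truncation") is one common fix for the discretized variance dipping below zero; it is not part of Heston's model itself, and the parameter values are purely illustrative.

```matlab
% Euler sketch of the Heston dynamics over one year of daily steps.
S = 1; v = 0.04; mu = 0.05;
theta = 2; vbar = 0.04; xi = 0.5; rho = -0.7;
dt = 1/252;
for n = 1:252
    z1 = randn; z2 = rho*z1 + sqrt(1 - rho^2)*randn; % correlated shocks
    vp = max(v, 0);                                  % full truncation
    S  = S*exp((mu - 0.5*vp)*dt + sqrt(vp*dt)*z1);
    v  = v + theta*(vbar - vp)*dt + xi*sqrt(vp*dt)*z2;
end
```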
This model belongs to the more general class of affine models of Duffie et al.
(2000), and the characteristic function of the log-price is given in closed form. In
particular it has an exponential-affine form14
14
We use the negative square root in d, found in Gatheral (2006), unlike the original
formulation in Heston (1993). Albrecher, Mayer, Schoutens, and Tistaert (2007) discuss
this choice and show that the two are equivalent, but using the negative root offers
higher stability for long maturities. The problem arises due to the branch cuts of the
complex logarithm in C (u, T ). A description of the problem and a different approach
can be found in Kahl and Jäckel (2005).


LISTING 6.4: phi_heston.m: Characteristic function of the Heston model.

% phi_heston.m
function y = phi_heston(u, p)
t     = p.t;
r     = p.r;
v0    = p.v0;
vbar  = p.vbar;
theta = p.theta;
xi    = p.xi;
rho   = p.rho;
d  = - sqrt((i*rho*xi*u - theta).^2 + xi^2*u.*(i+u));
h  = theta - i*rho*xi*u + d;
g  = h./(h - 2*d);
ed = exp(t*d);
LG = log((1 - g.*ed)./(1 - g));
C  = i*r*t*u + theta*vbar/xi^2*(h*t - 2*LG);
D  = h/xi^2.*((1 - ed)./(1 - g.*ed));
y  = exp(C + v0*D);

φ(u, T ) = exp {C(u, T ) + D(u, T )v0 + iu log S0 }


with
C(u, T) = iuµT + (θv̄/ξ²) [ (θ − iuρξ + d)T − 2 log( (1 − g exp(dT)) / (1 − g) ) ]
D(u, T) = ((θ − iuρξ + d)/ξ²) · (1 − exp(dT)) / (1 − g exp(dT))
g = (θ − iuρξ + d) / (θ − iuρξ − d)
d = −√( (iuρξ − θ)² + ξ²u(i + u) )

The characteristic function of the Heston model is given in listing 6.4. This can
be used to compute European style vanilla calls and puts using the transform
methods outlined in chapter 4. We will be using this approach later in this
chapter to calibrate the Heston model to a set of observed option prices.
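A quick sanity check of the routine in listing 6.4: any characteristic function equals one at u = 0. The snippet assumes the parameter structure carries the fields t, r, v0, vbar, theta, xi and rho, with hypothetical values.

```matlab
p.t = 0.5;   p.r = 0.04; p.v0 = 0.04; p.vbar = 0.04;
p.theta = 2; p.xi = 0.5; p.rho = -0.7;
y = phi_heston(0, p)    % = 1
```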

GIRSANOV’S THEOREM AND OPTION PRICING


We will now turn to the problem of option pricing, and discuss the two main
methods. We start with an implementation of Girsanov’s theorem, and then we
will investigate the hedging structure that will give us the corresponding PDE.
We set up a filtered space {Ω, F, F , P} and two correlated Brownian motions
with respect to P, Bt and Btv . Based on these BMs we now consider a general
stochastic volatility specification


dSt = µt St dt + √vt St dBt
dvt = α(vt)dt + β(vt)dBtv
EPt[dBt dBtv] = ρ dt

In order to apply Girsanov's theorem we define the process Mt via the
stochastic differential equation

dMt = Φt Mt dBt + Mt Ψt dBtv

with initial value M0 = 1. The solution of this SDE is the exponential martingale
(with respect to P), which has the form
Mt = exp{ ∫₀ᵗ Φs dBs − ½ ∫₀ᵗ Φs² ds + ∫₀ᵗ Ψs dBsv − ½ ∫₀ᵗ Ψs² ds }
The processes Φt and Ψt are Ft -adapted, and therefore they can be functions
of (t, St , vt ). Based on this exponential martingale we can define a probability
measure Q, which is equivalent to P. In fact, every choice of processes Φ t and
Ψt will produce a different equivalent measure. The only constraint we need to
impose on these processes is that the discounted underlying asset price must
form a martingale under Q, which then becomes an equivalent martingale mea-
sure (EMM). The fundamental theorem of asset pricing postulates that if this is
the case, then there will be no arbitrage opportunities in the market. It turns
out that this constraint is not sufficient to identify both processes, something
that we should anticipate since the market is incomplete and there will not be
a unique EMM.
The EMM will be defined via its Radon-Nikodym derivative with respect to
the true measure,
(dQ/dP)|t = Mt
If ΥT is an FT-measurable random variable, then expectations under the equivalent
measure will be given as
EQt[ΥT] = EPt[ (MT/Mt) ΥT ]
Mt
It is useful to compute the expectations over an infinitesimal interval dt, as this
will help us compute the drifts and volatilities under Q. In particular we will
have
EQt[dΥt] = EPt[ ((Mt + dMt)/Mt) dΥt ]
         = EPt[ (1 + dMt/Mt) dΥt ] = EPt[dΥt] + EPt[ (Φt dBt + Ψt dBtv) dΥt ]
We can employ the above relationship to compute the drifts and the volatil-
ities of the asset returns under Q

EQt[dSt/St] = (µ + Φt√vt + ρΨt√vt) dt,   and   EQt[(dSt/St)²] = vt dt
The drift and volatility of the variance process are
EQt[dvt] = (α(vt) + ρΦt β(vt) + Ψt β(vt)) dt,   and   EQt[(dvt)²] = β²(vt) dt

This verifies that under equivalent probability measures the drifts are adjusted
but volatilities are not. Now an EMM will be one that satisfies
EQt[dSt/St] = r dt

This constraint yields a relationship between Φt and Ψt
Φt + ρΨt = −(µ − r)/√vt = −ΞS(t, St, vt)

The function Ξ S (t, S, v) is the market price of risk, the Sharpe ratio of the
underlying asset. In order to construct a system we need a second equation, and
essentially we have the freedom to choose the market price of volatility risk.
Thus if we select a function EQt[dvt] = αQ(t, S, v) dt, which will be the variance drift
under risk neutrality, we can set up a second equation
ρΦt + Ψt = −(α(vt) − αQ(t, St, vt))/β(vt) = −Ξv(t, St, vt)
where Ξ v (t, S, v) will be the price of volatility risk.
The market risk premium Ξ S will be typically positive, as the underlying
asset will offer expected returns that are higher than the risk free rate. This
reflects the fact that investors prefer higher returns, but are risk averse against
declining prices. When it comes to volatility, we would expect investors to prefer
lower volatility, and be risk averse against volatility increases. This indicates
that it would make sense to select α Q in a way that implies a negative risk pre-
mium Ξ v , and one that does not increase with volatility. Essentially this means
that α Q > α. In practice we will have to find a convenient parameterization for
α Q or Ξ v that leads to expressions that admit solutions, and at the same time
restrict the family of admissible EMMs. The parameter values cannot be deter-
mined from the dynamics of the underlying asset, but they can be recovered from
observed derivative prices.
If we solve the above system we can find the processes Φ t and Ψt , and
through them the appropriate EMM, as follows
Φt = (−ΞS + ρΞv) / (1 − ρ²)
Ψt = (−Ξv + ρΞS) / (1 − ρ²)
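It is straightforward to verify that this pair solves the system Φt + ρΨt = −ΞS and ρΦt + Ψt = −Ξv; a numerical check with arbitrary values:

```matlab
% Check the inversion of [1 rho; rho 1]*[Phi; Psi] = -[XiS; XiV].
rho = -0.6; XiS = 0.4; XiV = -0.2;         % arbitrary values
Phi = (-XiS + rho*XiV)/(1 - rho^2);
Psi = (-XiV + rho*XiS)/(1 - rho^2);
e1 = Phi + rho*Psi + XiS;                  % ~0 up to rounding
e2 = rho*Phi + Psi + XiV;                  % ~0 up to rounding
```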
Finally, derivative prices can be written as expectations under Q, where the
asset dynamics are


dSt = rSt dt + √vt St dBtQ
dvt = αQ(t, St, vt)dt + β(vt)dBtQ,v
EQt[dBtQ dBtQ,v] = ρ dt

where α Q (t, St , vt ) = α(vt ) − Ξ v (t, St , vt )β(vt ).

Example: The Heston model


In Heston's model the variance drift and volatility are given by
α(v) = θ(v̄ − v)
β(v) = ξ√v

The price of risk is determined by the risk free rate and the asset price
dynamics
ΞS(t, S, v) = (µ − r)/√v
We are free to select the price of volatility risk. Say we set it equal to
Ξv(t, S, v) = (κ/ξ)√v
for a parameter κ ≤ 0 (to conform with agents that are averse towards higher
volatility). Then the risk premium will be negative, and its magnitude will grow
with volatility. In addition, such a risk premium will lead to risk neutral
dynamics that have the same form as the dynamics under P.
Girsanov's theorem will give the process under Q
dSt = rSt dt + √vt St dBtQ
dvt = αQ(vt)dt + ξ√vt dBtv,Q
The risk neutral variance drift is αQ(vt) = θ(v̄ − vt) − Ξv(t, St, vt)ξ√vt. Then we
can rewrite the dynamics

dSt = rSt dt + √vt St dBtQ
dvt = θQ(v̄Q − vt)dt + ξ√vt dBtQ,v
for the parameters θQ = θ + κ and v̄Q = θv̄/(θ + κ). Due to their risk aversion, mani-
fested through the parameter κ ≤ 0, investors behave as if the long run volatility
is higher than it really is, and as if volatility exhibits higher persistence.
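A small numerical illustration of this adjustment (values hypothetical):

```matlab
% With kappa <= 0, mean reversion slows down and the long run
% variance rises under the pricing measure.
theta = 4.0; vbar = 0.04; kappa = -1.0;
thetaQ = theta + kappa;               % 3.0: slower mean reversion
vbarQ  = theta*vbar/(theta + kappa);  % 0.0533...: higher long run var
```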

THE PDE APPROACH


Alternatively, we can take a route that follows the BS methodology, where a
derivative is shorted and subsequently hedged. This will give rise to the PDE
representation of the price. In the BS world with constant volatility, it was

sufficient to use the underlying asset and the money market account to achieve
the hedge. Here, as we have one more source of risk, these two instruments will
not be sufficient to eliminate volatility risk. To hedge our short derivative we will
use the money market account, the underlying asset and one extra derivative.
Consider a derivative X, and denote its pricing function with f(t, S, v). The
functional form of f will depend on the particulars of the contract, such as ma-
turity, payoff structure, optionality, etc. Therefore, the process for this derivative
will be given by Xt = f(t, St , vt ). Following the BS argument, if we knew the
functional form of f, we could compute the dynamics of the derivative price using
Itō’s formula (for two dimensions)
 
dXt = (αtX + µSt ∂f/∂S) dt + βtX,S dBt + βtX,v dBtv

where
αtX = ∂f/∂t + α(vt) ∂f/∂v + ½ vt St² ∂²f/∂S² + ½ β²(vt) ∂²f/∂v² + ρ√vt St β(vt) ∂²f/∂S∂v
βtX,S = √vt St ∂f/∂S
βtX,v = β(vt) ∂f/∂v
From this expression it is apparent that if we construct a portfolio using only
the underlying asset and the bank account, we will not be able to replicate the
price process Xt , since the risk source Btv cannot be reproduced. The market that
is based only on these instruments is incomplete, since the derivative cannot
be replicated.
But we can dynamically complete the market using another derivative X*,
with pricing function f*(t, S, v). This will work of course if X* actually depends
on the BM Btv, which is typically the case.15 In practice we would perhaps
replicate X (say a barrier option), using the risk free asset, the underlying asset
and a liquid derivative X* (say a vanilla at-the-money option).
Following BS, we short X and construct a portfolio of the underlying stock
and the other derivative. We want to select the weights of this portfolio in a way
that makes it risk free. Then it should grow at the risk free rate.
Say that at each point in time we hold ∆t units of the underlying asset and
∆t* units of the derivative. Then the change in our portfolio value Πt will be

dΠt = dXt − ∆t dSt − ∆t* dXt*


15
It is sufficient that the pricing function depends on v, so that the derivative ∂f/∂v ≠ 0,
as we will see below. For example a forward contract is a derivative but it would not
depend on Btv, since its pricing function f(t, S, v) = S exp(r(T − t)) does not depend
on v.


Substituting for the dynamics of dXt, dSt and dXt* will give the portfolio
dynamics

dΠt = (. . .)dt
    + ( √vt St ∂f/∂S − ∆t √vt St − ∆t* √vt St ∂f*/∂S ) dBt
    + ( β(vt) ∂f/∂v − ∆t* β(vt) ∂f*/∂v ) dBtv

If we select the portfolio weights that make the parentheses equal to zero,
then we have constructed a risk free portfolio. The solution is obviously16
∆t* = (∂f/∂v) / (∂f*/∂v)
∆t = ∂f/∂S − ∆t* ∂f*/∂S
And since the portfolio is now risk free, it will also have to grow at the risk
free rate of return
dΠt = rΠt dt = r(Xt − ∆t St − ∆t* Xt*) dt
We should expect that the drifts will give the PDE that we are looking for, but
at the moment we have a medley of partial derivatives of both pricing functions
f and f*. Nevertheless, we can carry on setting the portfolio drifts equal, which
yields

αX + µS ∂f/∂S − ∆µS − ∆* αX* − ∆* µS ∂f*/∂S = r(f − ∆S − ∆* f*)
Since ∆ + ∆* ∂f*/∂S = ∂f/∂S, the drift of the underlying asset µ will cancel out,
resembling the BS scenario. Furthermore, if we substitute the hedging weights
∆ and ∆* and rearrange to separate the starred from the non-starred elements

λ = (αX + rS ∂f/∂S − rf) / (∂f/∂v) = (αX* + rS ∂f*/∂S − rf*) / (∂f*/∂v) = λ*

The following line of argument is the most important part of the derivation,
and the most tricky to understand at first reading: in the above expression the
ratio λ (which depends only on f) is equal to the ratio λ* (which depends only
on f*). Recall that f and f* are the pricing functions of two arbitrary derivatives,
which means that the above ratio will be the same for all derivative contracts.
If we selected another derivative contract X**, then for its pricing
function λ = λ**, which implies λ = λ* = λ**, etc. This means that although
16
Apparently, for the solution to exist we need ∂f*/∂v ≠ 0. This corresponds to our previous
remark that a forward contract cannot serve as the hedging instrument.

λ can depend on (t, S, v), it cannot depend on the particular features of each
derivative contract (since if it did, it wouldn't be equal for all of them). We
therefore conclude that
λ = λ* = λ** = · · · = λ(t, S, v)
That is very important, because it means that all derivatives will satisfy the
same ratio, which can be rewritten as a single PDE17
(αX + rS ∂f/∂S − rf) / (∂f/∂v) = λ(t, S, v)
⇒ ∂f/∂t + {α(v) − λ(t, S, v)} ∂f/∂v + ½ vS² ∂²f/∂S² + ½ β²(v) ∂²f/∂v² + ρ√v S β(v) ∂²f/∂S∂v + rS ∂f/∂S = rf
As always, the boundary conditions of this PDE will define which contract is
actually priced. In particular, the terminal condition is f(T, S, v) = Π(S), with Π(S)
the payoff of the derivative.

The Feynman-Kac link


It is very instructive to pause at this point and verify the links that connect the
two approaches. Using Girsanov’s theorem we built the EMM and we concluded
that a derivative, say with payoff XT = f(T , ST , vT ) = Π(ST ), will be priced as
the expectation under the EMM
X0 = exp(−rT) EQ0[XT] = exp(−rT) EQ0[Π(ST)]

where the dynamics of the underlying asset and its volatility are given by the
SDEs

dSt = rSt dt + √vt St dBtQ
dvt = αQ(t, St, vt)dt + β(vt)dBtQ,v
EQt[dBtQ dBtQ,v] = ρ dt
with the drift of the variance process given by α Q (t, S, v) = α(v)−Ξ v (t, S, v)β(v).
The price of volatility risk is Ξ v .
Using the PDE approach we concluded that the pricing function f(t, S, v)
will solve the PDE
∂f/∂t + α*(t, S, v) ∂f/∂v + ½ vS² ∂²f/∂S² + ½ β²(v) ∂²f/∂v² + ρ√v S β(v) ∂²f/∂S∂v + rS ∂f/∂S = rf
17
An identical line of argument is used in fixed income securities, which we will follow
in chapter XX.


with α*(t, S, v) = α(v) − λ(t, S, v) and boundary condition f(T, S, v) = Π(S).


The Feynman-Kac formula links the two approaches, as it casts the solution
of the PDE as an expectation under the dynamics dictated by Girsanov’s theorem.
In fact, it follows that αQ(t, S, v) = α*(t, S, v), which implies that

λ(t, S, v) = Ξ v (t, S, v)β(v) = α(v) − α Q (t, S, v)

The free functional λ that we introduced in the PDE approach can be interpreted
as the total volatility risk premium. For investors that are averse towards high
volatility, λ ≤ 0.

Example: The Heston model


If we implement the PDE using the Heston (1993) dynamics, the derivative
pricing function will satisfy
∂f/∂t + {θ(v̄ − v) − λ(t, S, v)} ∂f/∂v + ½ vS² ∂²f/∂S² + ½ ξ²v ∂²f/∂v² + ρξvS ∂²f/∂S∂v + rS ∂f/∂S = rf
In his original paper, Heston assumes λ(t, S, v) to be proportional to the
variance v
λ(t, S, v) = λv
Essentially, following our previous discussion, this indicates that the equivalent
function Ξv in the EMM approach will be
Ξv(t, S, v) = λ(t, S, v)/β(v) = (λ/ξ)√v

This means that the parameter λ of the PDE approach has exactly the same
interpretation as κ. This choice for λ sets the PDE
∂f/∂t + θQ(v̄Q − v) ∂f/∂v + ½ vS² ∂²f/∂S² + ½ ξ²v ∂²f/∂v² + ρξvS ∂²f/∂S∂v + rS ∂f/∂S = rf
The boundary conditions also need to be specified. Following Heston (1993),
for a European call option

f(S, v, T) = (S − K)+
∂f/∂S(∞, v, t) = 1
f(0, v, t) = 0
f(S, ∞, t) = S
∂f/∂t(S, 0, t) + θQ v̄Q ∂f/∂v(S, 0, t) + rS ∂f/∂S(S, 0, t) = rf(S, 0, t)
∂S

ESTIMATION AND FILTERING


Since in stochastic volatility models the volatility is unobserved, it is generally
very hard to estimate the parameters based on historical asset returns, and filter
the unobserved volatility process. People have used a number of approaches, for
some of which we give references below. For more details see the surveys of
Ghysels et al. (1996) and Javaheri (2005).
1. Indirect inference: Estimating a deterministic model, for example via Arch or
Egarch, and then studying the dynamics of the filtered volatility (Nelson,
1990, 1991).
2. Simulation based methods: Although the conditional moments or the likeli-
hood are not available in closed form, they can be simulated. Of course, a
simulation has to be run between all time steps, which makes these proce-
dures computationally intensive and very time consuming. Examples include
• Efficient Method of Moments (EMM), e.g. Gallant and Tauchen (1993)
• Simulated Maximum Likelihood (SMM), e.g. Sandmann and Koopman
(1998)
• Markov Chain Monte Carlo (MCMC), e.g. Eraker, Johannes, and Polson (2001)
• (Unscented-) Particle Filter, (PF, UPF), e.g. van der Merwe, de Freitas,
Doucet, and Wan (2001)
3. Kalman filter methods: The classical Kalman filter is not directly applicable,
but it can be used in some cases after a transformation. Versions of the
extended Kalman filter have also been employed.
4. Likelihood Approximation Methods: The likelihood can be approximated for
the affine class of models, constructing an updating procedure for the char-
acteristic function (Bates, 2005). Alternatively, the volatility process itself
can be approximated using a Markov chain, as in Chourdakis (2002).

CALIBRATION
Even if we estimate the parameters of a stochastic volatility model using his-
torical time series of asset returns, not all parameters would be useful for the
purpose of derivative pricing. This happens because the estimated parameters


would be the ones under the true probability measure, while investors will use
some adjusted parameters to price derivatives. In particular, for stochastic volatil-
ity models the drift of the variance will be a modification of the true one, which
is done by setting the price of volatility risk. To recover this price of risk, one
should consult some existing derivative prices.
For that reason, practitioners and (to some extent) academics prefer to use
only derivative prices, and calibrate the model based on a set of liquid options.
A standard setting is one where a derivatives desk wants to sell an exotic option
and then hedge its exposure, say using a stochastic volatility model.
The desk would look at the market prices of liquid European calls and puts, and
would calibrate the pricing function to these prices. Such parameters are the risk
neutral ones, and therefore can be used unmodified to price and hedge the exotic
option. In a sense, they are a generalization of the BS implied volatilities. In a
way, practitioners want to price the exotic contract in a way that is consistent
with the observed vanillas.
If the calibrated model was the one that actually generated the data, then
these implied parameters should be stationary through time, and their variability
should be due to measurement errors alone. In practice of course this is not
the case, and practitioners tend to recalibrate some parameters every day (and
sometimes more often).
To implement this calibration we will need to minimize some measure of dis-
tance between the theoretical model prices and the prices of observed options.
Say that we have a pricing function P(τ, K; θ) = P(τ, K; S0, r; θ), where θ de-
notes the set of unobserved parameters that we need to extract. Also denote with
σ(τ, K; S0, r; θ) the implied volatility of that theoretical price, and with P*(τ, K)
and σ*(τ, K) the observed market price and implied volatility. For example, in
Heston's case θ = {v0, θ, v̄, ξ, ρ}. There are many objective functions that one
can use for the minimization, the most popular having a weighted sum of squares
form
G(θ) = Σi Σj wi,j [ P(τi, Kj; θ) − P*(τi, Kj) ]²

The weights wi,j can be used to different ends. Sometimes the choice of wi,j
reflects the liquidity of different options using a measure such as the bid-ask
spread. In other cases one wants to give more weight to options that are near-
the-money (using for example the Gamma), or to options with shorter maturities.
In other cases one might want to implement a weighting scheme based on the
options’ Vega, in order to mimic an objective function that is cast in the implied
volatility space.
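The rationale for Vega weights is the first order relation P − P* ≈ V·(σ − σ*), so weighting squared price errors by 1/V² approximates squared implied volatility errors. A sketch for one option, with hypothetical values and blsvega (Financial Toolbox) assumed:

```matlab
S0 = 1; K = 0.95; r = 0.04; tau = 0.25; sig = 0.20;
vega = blsvega(S0, K, r, tau, sig);
w = 1/vega^2;      % weight for this option's squared price error
```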
Recovering the parameter set θ is not a trivial problem, as the objective
function can (and in many cases does) exhibit multiple local minima. This is
a common feature of inverse problems like this calibration exercise. Typically
some regularization is implemented, in order to make the problem well posed
for standard hill climbing algorithms. A popular example is Tikhonov-Phillips
regularization (see Lagnado and Osher, 1997; Crépey, 2003, for an illustration),
where the objective function is replaced by

G̃(θ) = G(θ) + α · g²(θ, θ0)
for a regularization parameter α. The role of the penalty function g(θ, θ0) is to
keep the parameter vector θ as close as possible to a vector that is based on
some prior information θ0. Depending on the particular pricing form,
non-smoothness penalties are also sometimes imposed.18
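A common concrete choice takes g as the Euclidean distance to the prior vector. In the sketch below G is a handle to the unregularized objective, and α (alpha) and θ0 (par0) are user-chosen; none of this is prescribed by the text.

```matlab
% Tikhonov-style regularized objective around a prior par0.
Gtilde = @(par) G(par) + alpha*norm(par - par0)^2;
```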
From a finance point of view, the issue of multiple optima highlights the
existence of model risk (Ben Hamida and Cont, 2005). Based on the finite set
of option prices, different model parameters are indistinguishable. Using one
set over another to price an exotic contract which might be sensitive to them
can introduce losses. More generally, given the increasing arsenal of theoretical
pricing models, different model classes can give identical fit for vanilla contracts
(for details see Schoutens, Simons, and Tistaert, 2004).

Calibration example
As an example we will fit Heston’s stochastic volatility model to a set of observed
option prices. In particular, we are going to use contracts on the SP500 index
written on April 24, 2007. The objective function that we will use is just the sum
of squared differences between model and observed prices. Listing 6.5 gives the
code that computes the objective function. The prices are computed using the
fractional FFT (see chapter 4), and the integration bounds are automatically
selected to reflect the decay of the integrand ψ(u, T ).19
The snippet 6.6 shows how this objective function can be implemented to
calibrate Heston’s model using a set of observed put prices on the SP500 index.
There are eight different maturities in the data set, ranging from 13 to 594 days.
The sum of squared differences between the theoretical and observed prices is
minimized, and for this example we did not use any weighting scheme. Figure
6.6 shows the observed option prices and the corresponding fitted values. The
table below gives the calibrated parameters θ̂ = {v0 , θ, v̄, ξ, ρ}
v0 0.0219
θ 5.5292
v̄ 0.0229
ξ 1.0895
ρ -0.6459
18
This is particularly true for calibrating local volatility models which have a large
number of parameters. We will discuss this family of models in the next section.
19
As Kahl and Jäckel (2005) show, the characteristic function of the Heston model for
large arguments decays as A exp(−uC)/u² times a cosine, where A = ψ(0, T) and
C = √(1 − ρ²)(v0 + θv̄T)/ξ. We can therefore bound the tail integral ∫z^∞ |ψ(u, T)|du by
|A| exp(−Cz)/z. The solution of exp(w)w = x is Lambert's W function which is imple-
mented in Matlab through the Symbolic Math Toolbox. If this toolbox is not available
we have to devise a different strategy to set the upper integration bound, for example
using the moments expansion for the characteristic function. If everything else fails
we can just set a 'large value' for the upper integration bound, or set up an adaptive
integration scheme.


LISTING 6.5: ssq_heston.m: Sum of squares for the Heston model.

% ssq_heston.m
function y = ssq_heston(par, pf, data)
ps.v0    = par(1); ps.vbar = par(2);
ps.theta = par(3); ps.xi   = par(4);
ps.rho   = par(5);
eta = pf.eta; tol = 10^(-pf.kappa);
CP = data(:,1); T  = data(:,2)/365; K = data(:,3);
P  = data(:,4); S0 = data(:,5);     r = data(:,6)/100;
% normalize prices and strikes for S0 = 1
Pn = P./S0; Kn = K./S0;
% select different maturities
[Tu, u] = unique(T); ru = r(u); Nu = length(Tu);
y = 0;                              % will keep output SSQ
for n = 1:Nu                        % loop through T
    ps.t = Tu(n);                   % set Heston parameters
    ps.r = ru(n);
    sC = (T==Tu(n)) & (CP > 0);     % select calls
    sP = (T==Tu(n)) & (CP < 0);     % select puts
    % set parameters for FRFT
    % NB: if the function lambertw (Symbolic Math Toolbox) is not
    % available, set pf.ubar to a large value for the upper bound
    a0 = real(phi_heston(-i*(eta+1), ps)/eta/(eta+1));
    a1 = sqrt(1 - ps.rho^2)*...
         (ps.v0 + ps.theta*ps.vbar*ps.t)/ps.xi;
    pf.ubar = lambertw(a0*a1/tol)/a1;
    pf.kbar = 1.0*max(abs(log(Kn(sC|sP))));
    % run FRFT pricing engine for Heston
    [Kv, Cv] = frft_call(@phi_heston, ps, pf);
    % construct strikes and put prices (from parity)
    Kv = exp(Kv); Pv = Cv + exp(-ru(n)*Tu(n)).*Kv - 1;
    Cf = interp1(Kv, Cv, Kn(sC));   % interpolate sample
    Pf = interp1(Kv, Pv, Kn(sP));
    % update SSQ
    y = y + sum((Cf - Pn(sC)).^2) + sum((Pf - Pn(sP)).^2);
end
y = log(y);                         % take log to help optimization


LISTING 6.6: calib_heston.m: Calibration of the Heston model.

% calib_heston.m
% import option prices data
data = xlsread('SP500_options.xls');   % SP500 quotes, April 24, 2007
% set up parameters for the FRFT
pf.eta   = 1.50;   % Carr-Madan parameter
pf.N     = 512;    % number of FFT points
pf.kappa = 2;      % upper integration parameter
% initial parameter set
par = [0.02 0.05 5.00 1.00 -0.90];
% options for the optimization
opt = optimset('LargeScale', 'off', 'Display', 'iter');
par = fmincon(@ssq_heston, par, [], [], [], [], ...
      [0.005 0.005 0.10 0.10 -0.90], ...
      [0.500 0.500 20.0 5.00 -0.40], ...
      [], opt, pf, data);
FIGURE 6.6: Calibrated option prices for Heston's model. The red circles give
the observed put prices, while the blue dots are the theoretical prices based on
Heston's model that minimize the squared errors.

These parameters are typical of a calibration procedure, and give an objective
function value of G(θ̂) = 0.0090. The question is of course whether or not
these parameters are unique, well defined and stable. In figure 6.7(a) we show
the function G(θ) for different combinations of (θ, ξ), keeping the rest of the
parameters at their estimated values. There appears to be a “valley” across a


FIGURE 6.7: The ill-posed inverse problem in Heston’s case. Subfigure (a) gives
the objective function that is minimized to calibrate the parameters. Subfigure
(b) presents the isoquants of this function, together with the minimum point
attained using numerical optimization. Observe that all points that are roughly
across the red line are indistinguishable. The regularized function is given in
(c), while (d) shows its isoquants. Observe that the regularized function is better
behaved than the original one.
[Panels: (a) the function G(θ); (b) isoquants of G(θ); (c) the function G̃(θ); (d) isoquants of G̃(θ); axes θ and ξ.]

set of values, indicating that it is very hard to precisely identify the optimal
parameter combination. It is apparent that combinations of values across the
red line in 6.7.b will give values for the objective function that are very close.
This means that based on this set of vanilla options the combinations (θ, ξ) =
(5.0, 1.0), (10.0, 1.7) or (15.0, 2.5) are pretty much indistinguishable.
One way around this problem would be to enhance the information, by in-
cluding more contracts such as forward starting options or cliquet options.[20]
[20] A forward starting option is an option that has some features that are not determined
until a future time. For example, one could buy (and pay today for) a put option with
three years maturity, but where the strike price will be determined as the level of
SP500 after one year. Essentially one buys today what is going to be an ATM put
in a year’s time. A cliquet or ratchet option is somewhat similar, resembling a basket
of forward starting options. For example I could have a contract where every year the

These are contracts that depend on the dynamics of the transition densities for
the volatility, and not only on the densities themselves as vanillas do. For ex-
ample, a forward starting option would depend on the joint distribution of the
volatilities at the starting and the maturity times. Alternatively, if such exotic
contracts do not exist or they are not liquid enough to offer uncontaminated
values, one could stick with vanilla options and use a regularization technique.
This demands some prior view on some parameter values, which could be based
on historical evidence or analysts’ forecasts. As an example, in Heston’s model
the parameter ξ is the same under both the objective and the risk neutral mea-
sure. Based on an estimate ξ0 using historical series of returns and/or option
values, one can set up the objective function

G̃(θ) = G(θ) + α(ξ − ξ0 )2

In that way estimates will be biased towards combinations where the prior value
is ξ0 . For example, the estimation results of Bakshi, Cao, and Chen (1997) based
on option prices and the joint estimation of returns and volatility in Pan (1997),
indicate a value of ξ ≈ 0.40. Therefore, if we set ξ0 = 0.40 and α = 0.005, the
objective function to be minimized is the one given in figure 6.7(c,d). The optimal
values are now given in the following table

v0 0.0200
θ 3.5260
v̄ 0.0232
ξ 0.7310
ρ -0.7048

The new objective function at the optimal is G̃(θ̂) = 0.0099 which implies a
sum of squares value G(θ̂) = 0.0094, which is not far from the unconditional
optimization result.
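The regularization above amounts to adding a quadratic penalty to the pricing objective. A minimal sketch, where G is a stand-in for the Heston sum-of-squares objective (the constant used in the test is purely illustrative) and alpha, xi0 take the values from the text:

```python
def regularized_objective(G, theta, xi, alpha=0.005, xi0=0.40):
    """G_tilde = G + alpha*(xi - xi0)^2: bias the calibration towards a
    prior estimate xi0 of the vol-of-vol, which is measure-invariant."""
    return G(theta, xi) + alpha * (xi - xi0) ** 2
```

Parameter combinations along the flat “valley” of G now receive different penalties, so the minimizer becomes better identified.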

6.4 THE LOCAL VOLATILITY MODEL


Stochastic volatility models take the view that there is an extra Brownian motion
that is responsible for volatility changes. This extra source of randomness creates
a market that is incomplete, where options are not redundant securities. Practi-
cally, this means that in order to hedge a position one needs to hedge against
volatility risk as well as market risk. Local volatility models take a completely
different view. No extra source of randomness is introduced, and the markets
remain complete. In order to account for the implied volatility skew there is a
nonlinear (but deterministic) volatility structure
[20, cont.] payoff is determined and paid, and the strike price is readjusted according to the new SP500 level.


LISTING 6.7: nadwat.m: Nadaraya-Watson smoother.

% nadwat.m
function zi = nadwat(x, y, z, w, xi, yi, hx, hy)
N = length(x);
if isempty(w)
    w = ones(N,1);
end
zi1 = 0;
zi2 = 0;
for k = 1:N
    xe = exp(-0.5/hx^2*(xi-x(k)).^2)/sqrt(2*pi)/hx;
    ye = exp(-0.5/hy^2*(yi-y(k)).^2)/sqrt(2*pi)/hy;
    zi1 = zi1 + z(k)*w(k)*xe.*ye;
    zi2 = zi2 + w(k)*xe.*ye;
end
zi = zi1./zi2;

dSt = rSt dt + σ(t, St )St dBt

As vanilla options are expressed via the risk neutral expectation of the random
variable ST , local volatility models attempt to construct the function σ(t, S) that
is consistent with the implied risk neutral densities for different maturities. The
methodology of local volatility models follows the one on implied risk neutral
densities, originating in the pioneering work of Breeden and Litzenberger (1978).
These methods are inherently nonparametric, and rely on a large number
of option contracts that span different strikes and maturities. In reality there
is only a relatively small set of observed option prices that is traded, and for
that reason some interpolation or smoothing techniques must be employed to
artificially reconstruct the true pricing function or the volatility surface. Of course
this implies that the results will be sensitive to the particular method that is
used. Also, care has to be taken to ensure that the resulting prices are arbitrage
free.

INTERPOLATION METHODS
There are many interpolation methods that one can use on the implied volatility
surface. As second order derivatives of the corresponding pricing function are
required, it is paramount that the surface is sufficiently smooth. In fact, it is
common practice to sacrifice the perfect fit in order to ensure smoothness, which
suggests that we are actually implementing an implied volatility smoother rather
than an interpolator. Within this obvious tradeoff we have to select the degree
of fit versus smoothness, which is more of an art than a science.
One popular approach is to use a family of known functions, and reconstruct
the volatility surface as a weighted sum of them. As an example we can use


LISTING 6.8: imp_vol.m: Implied volatility surface smoothing.
[Matlab code garbled in the source extraction: the listing loads the option data, backs out Black-Scholes implied volatilities with bsiv, builds strike/maturity output grids in transformed coordinates, smooths the surface with either the Nadaraya-Watson smoother (nadwat) or radial basis functions (rbfcreate/rbfinterp), interpolates the risk-free rate onto the output maturities, and maps the smoothed volatilities back to call prices.]

the radial basis function (RBF) interpolation, where we reconstruct an unknown
function using the form

    f(x) = c0 + c·x + Σ_{n=1}^N λn φ(||x − xn||)

The points that we observe are given at the nodes xn, for n = 1, . . . , N. The radial
function φ(x) will determine how the impact of the value at each node behaves.
Common radial functions include the Gaussian φ(x) = exp(−x²/(2σ²)) and the

FIGURE 6.8: Implied volatilities smoothed with the radial basis function (RBF,
left) and the Nadaraya-Watson (NW, right) methods. The corresponding local
volatility surfaces and the implied probability density functions for different
horizons are also presented.
[Panels: (a) implied volatility (RBF); (b) implied volatility (NW); (c) local volatility (RBF); (d) local volatility (NW); (e) implied density (RBF); (f) implied density (NW).]

multiquadratic function φ(x) = √(1 + (x/σ)²), among others.[21] The values of the
parameters c0 , c and λn are determined using the observed value function at the
nodes x n and the required degree of smoothness. Figure 6.8(a) presents a set
[21] The parameter σ is user defined. In Matlab the RBF interpolation is implemented in the package of Alex Chirokov that can be downloaded at http://www.mathworks.com/matlabcentral/.

FIGURE 6.9: Static arbitrage tests for the smoothed implied volatility functions
of figure 6.8. Vertical, butterfly and calendar spreads are constructed and their
prices are examined. Green dots represent spreads that have admissible prices,
while red dots indicate spreads that offer arbitrage opportunities as they are
violating the corresponding bounds.
[Panels: (a) vertical spreads (RBF); (b) vertical spreads (NW); (c) butterfly spreads (RBF); (d) butterfly spreads (NW); (e) calendar spreads (RBF); (f) calendar spreads (NW); axes strike × maturity.]

of observed implied volatilities together with the smoothed surface constructed
using the RBF interpolation method. The implementation is given in listing 6.8.
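To illustrate the mechanics, here is a one-dimensional RBF interpolation sketch with a Gaussian radial function. For brevity it omits the polynomial term c0 + c·x and any smoothing, so it interpolates the nodes exactly; all names are illustrative, and this is not the Matlab RBF package referenced in footnote 21:

```python
import math

def rbf_fit(xs, ys, sigma=1.0):
    """Fit f(x) = sum_n lam_n * phi(|x - x_n|) with Gaussian phi by solving
    the N x N linear system phi(|x_i - x_j|) lam = y."""
    n = len(xs)
    phi = lambda r: math.exp(-r * r / (2.0 * sigma * sigma))
    A = [[phi(abs(xs[i] - xs[j])) for j in range(n)] for i in range(n)]
    b = list(ys)
    # naive Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            for c in range(col, n):
                A[row][c] -= f * A[col][c]
            b[row] -= f * b[col]
    lam = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(A[row][c] * lam[c] for c in range(row + 1, n))
        lam[row] = (b[row] - s) / A[row][row]
    return lambda x: sum(l * phi(abs(x - xn)) for l, xn in zip(lam, xs))
```

In practice one deliberately under-fits (the smoothing parameter in the listing) to keep the surface twice differentiable.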
The Nadaraya-Watson (NW) smoother is another popular choice. Here the
approximating function takes the form


LISTING 6.9: test_vol.m: Tests for static arbitrage.

% test_vol.m
imp_vol;  % load data and smooth volatility surface
% vertical spreads
VS = (Co(:,1:end-1) - Co(:,2:end))/dX;
VS = (VS>=0) & (VS<=1);
% butterfly spreads
BS = Co(:,3:end) - 2*Co(:,2:end-1) + Co(:,1:end-2);
BS = (BS>=0);
% calendar spreads
CS = Co(2:end,:) - Co(1:end-1,:);
CS = (CS>=0);

    f(x) = [ Σ_{n=1}^N wn yn exp(−(x − xn)′H(x − xn)) ] / [ Σ_{n=1}^N wn exp(−(x − xn)′H(x − xn)) ]

where yn is the observed value at the point x n , and the matrix H = diag(h1 , . . . , hn )
is user defined. This is implemented for the two-dimensional case in listing 6.7.
Figure 6.8(b) gives the implied volatility surface smoothed using the Nadaraya-
Watson method.
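A pure-Python sketch of the same estimator at a single query point, with independent Gaussian kernels of bandwidths hx and hy playing the role of the diagonal matrix H (the names are illustrative; this is not the listing's nadwat.m):

```python
import math

def nw_smooth(xs, ys, zs, hx, hy, xq, yq, ws=None):
    """Nadaraya-Watson estimate at (xq, yq): a kernel-weighted average of the
    observed values zs at the sample points (xs, ys), with optional weights ws."""
    if ws is None:
        ws = [1.0] * len(zs)
    num = den = 0.0
    for x, y, z, w in zip(xs, ys, zs, ws):
        k = math.exp(-0.5 * ((xq - x) / hx) ** 2 - 0.5 * ((yq - y) / hy) ** 2)
        num += w * z * k
        den += w * k
    return num / den
```

Being a weighted average, the estimate always lies between the smallest and largest observed value, which is one reason the method is robust to noisy quotes.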
Of course the smoothed or interpolated volatility surface can be mapped
to call and put prices using the Black-Scholes formula. There is also a num-
ber of restrictions that one needs to take into account when constructing the
volatility surface. In particular, it is important to verify that the resulting prices
do not permit arbitrage opportunities. As shown in Carr and Madan (2005) it
is straightforward to rule out static arbitrage by checking the prices of sim-
ple vertical spreads, butterflies and calendar spreads. More precisely, having
constructed a grid of call prices for different strikes 0 = K0 , K1 , K2 , . . . and ma-
turities 0 = T0, T1, T2, . . ., with Ci,j = fBS(t, S; Ki, Tj, r, σ̂(Ki, Tj)), we need to
construct the following quantities
1. Vertical spreads VSi,j = (Ci−1,j − Ci,j)/(Ki − Ki−1). There should be 0 ≤ VSi,j ≤ 1 for all i, j = 0, 1, . . .
2. Butterfly spreads BSi,j = Ci−1,j − [(Ki+1 − Ki−1)/(Ki+1 − Ki)] Ci,j + [(Ki − Ki−1)/(Ki+1 − Ki)] Ci+1,j. There should be BSi,j ≥ 0 for all i, j = 0, 1, . . .
3. Calendar spreads CSi,j = Ci,j+1 − Ci,j. There should be CSi,j ≥ 0 for all i, j = 0, 1, . . .
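The three checks translate directly into code on a grid of call prices; a sketch (the grid layout C[i][j], indexed by strike i and maturity j, is illustrative):

```python
def passes_spread_tests(K, T, C, tol=1e-12):
    """Check vertical, butterfly and calendar spread bounds on a call-price
    grid C[i][j] for strikes K[i] and maturities T[j]; True if no arbitrage."""
    ok = True
    nK, nT = len(K), len(T)
    for j in range(nT):
        for i in range(1, nK):          # vertical spreads must lie in [0, 1]
            vs = (C[i - 1][j] - C[i][j]) / (K[i] - K[i - 1])
            ok = ok and (-tol <= vs <= 1.0 + tol)
        for i in range(1, nK - 1):      # butterfly spreads must be non-negative
            bs = (C[i - 1][j]
                  - (K[i + 1] - K[i - 1]) / (K[i + 1] - K[i]) * C[i][j]
                  + (K[i] - K[i - 1]) / (K[i + 1] - K[i]) * C[i + 1][j])
            ok = ok and (bs >= -tol)
    for i in range(nK):                  # calendar spreads must be non-negative
        for j in range(nT - 1):
            ok = ok and (C[i][j + 1] - C[i][j] >= -tol)
    return ok
```

A small tolerance is sensible in practice, since smoothed prices sit well inside bid-ask spreads and tiny violations are rarely exploitable.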
In figure 6.9 we construct these tests for the resulting volatility surfaces
based on the two smoothing methods, implemented in listing 6.9. With green
dots we denote the points where no arbitrage opportunities exist, while red dots
represent arbitrage opportunities. Both RBF and NW methods yield prices that
pass the vertical spread tests. The NW smoother produces a very small number of
very short away-from-the-money prices that allow the setup of butterfly spreads

with negative value. Both methods fail the calendar spread test for far-out-of-
the-money calls with very short maturities. Nevertheless, the bid-ask spreads in
these areas are wide enough to ensure that these opportunities are not actually
exploitable. Overall the results are very good, but if needed one can incorpo-
rate these tests within the fitting procedures, and thus find smoothed volatility
surfaces that by construction pass all three arbitrage tests.
Another important feature of the implied volatility is that it should behave
in a linear fashion for extreme log-strikes (Lee, 2004a; Gatheral, 2004). This
indicates that it makes sense to extrapolate the implied volatility linearly to
extend outside the region of observed prices.
Apart from these nonparametric methods one can set up parametric curves
to fit the implied volatility skew. A parametric form might be less accurate, but
it can offer a more robust fit where the resulting prices are by construction
free of arbitrage. Gatheral (2004) proposes an implied variance function for each
maturity horizon, coined the stochastic volatility inspired (SVI) parameterization,
of the form
    v(k; α, β, σ, ρ, µ) = α + β [ ρ(k − µ) + √((k − µ)² + σ²) ]

where k = log(K /F ). This form always remains positive and ensures that it
grows in a linear fashion for extreme log-strikes. In particular Gatheral (2004)
shows that α controls for the variance level, β controls the angle between the
two asymptotes, σ controls the smoothness around the turning point, ρ controls
the orientation of the skew, and µ shifts the skew across the moneyness level.
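The SVI slice is straightforward to evaluate; a sketch with the parameters named after the text:

```python
import math

def svi_variance(k, alpha, beta, sigma, rho, mu):
    """SVI implied-variance slice v(k) = alpha + beta*(rho*(k - mu)
    + sqrt((k - mu)^2 + sigma^2)), with k the log-strike log(K/F)."""
    return alpha + beta * (rho * (k - mu) + math.sqrt((k - mu) ** 2 + sigma ** 2))
```

For large |k − µ| the slice grows like β(1 ± ρ)|k − µ|, which gives the two linear asymptotes mentioned above.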

IMPLIED DENSITIES
Based on the implied volatility function σ̂(T , K ) the empirical pricing function
is easily determined via the Black-Scholes formula

    P(T, K) = fBS(t, S0; T, K, r, σ̂(T, K))

It has been recognized, since Breeden and Litzenberger (1978), that the empirical
pricing function can reveal information on the risk neutral probability density
that is implied by the market. In particular, if Qt (S) is this risk neutral probability
measure of the underlying asset with horizon t, then the call price can be written
as the expectation
    P(T, K) = exp(−rT) ∫_K^∞ (S − K) dQT(S)

If we differentiate twice with respect to the strike price, using the Leibniz
rule
    (∂/∂t) ∫_{α(t)}^{β(t)} g(x, t) dx = g(β(t), t) dβ(t)/dt − g(α(t), t) dα(t)/dt + ∫_{α(t)}^{β(t)} (∂/∂t) g(x, t) dx


LISTING 6.10: loc_vol.m: Construction of implied densities and the local volatility surface.
[Matlab code garbled in the source extraction: the listing runs imp_vol.m, forms the first derivatives of the smoothed call-price grid with respect to maturity and strike and the second derivative with respect to strike by finite differences, recovers the implied density as exp(rT) times the second strike derivative, and assembles the local volatility surface from Dupire's formula.]

we obtain the Breeden and Litzenberger (1978) expression for the implied prob-
ability density function

    dQT(S) = exp(rT) ∂²P(T, K)/∂K² |_{K=S}        (6.1)
It is easy to compute this derivative numerically, and therefore approximate


the implied density using central differences. In particular

    dQT(S) ≈ exp(rT) [P(T, S − ΔK) − 2P(T, S) + P(T, S + ΔK)] / (ΔK)²

One can recognize that the above expression is the price of 1/(ΔK)² units of a
very tight butterfly spread around S, like the one used in the static arbitrage
tests above. The relation between the butterfly spread and the risk neutral
probability density is well known amongst practitioners, and can be used to
isolate the exposure to specific ranges of the underlying. We carry out this
approximation in listing 6.10, and the resulting densities are presented in figures
6.8(e,f).
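The butterfly approximation of (6.1) in code; a sketch, where P is any smooth single-maturity pricing function (a callable, not the price grid of the listings):

```python
import math

def implied_density(P, S, dK, r, T):
    """Breeden-Litzenberger density at S: exp(rT) times the second strike
    derivative of the pricing function, via a tight butterfly spread."""
    return math.exp(r * T) * (P(S - dK) - 2.0 * P(S) + P(S + dK)) / (dK * dK)
```

For quoted prices dK cannot be taken arbitrarily small: the second difference amplifies quote noise, which is exactly why the surface is smoothed first.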

LOCAL VOLATILITIES
A natural question that follows is whether or not a process exists that is consis-
tent with the sequence of implied risk neutral densities. After all, Kolmogorov’s
extension theorem 1.3 postulates that given a collection of transition densities
such a process might exist. Dupire (1994) recognized that one might be able to
find a diffusion which is consistent with the observed option prices, constructing

the so called local volatility model, where the return volatility is a deterministic
function of time and the underlying asset. In a series of papers Derman and Kani
(1994), Derman, Kani, and Chriss (1996), and Derman, Kani, and Zou (1996) out-
line the use of the local volatility function for pricing and hedging, while Barle
and Cakici (1998) present a method of constructing an implied trinomial tree
that is consistent with observed option prices.
The dynamics of the underlying asset (under the risk neutral measure) are
given by
dSt = rSt dt + σ(t, St )St dBt (6.2)
The popularity of the local volatility approach stems from the fact that the
steps taken in the derivation of the Black-Scholes PDE can be replicated, since
the local volatility function σ(t, S) is deterministic. In particular, the markets
remain complete as there is only one source of uncertainty that can be hedged
out using the underlying asset and the risk free bank account.
The pricing function for any derivative under the local volatility dynamics
will therefore satisfy a PDE that resembles the Black-Scholes one
    ∂f(t, S)/∂t + rS ∂f(t, S)/∂S + (1/2) σ²(t, S) S² ∂²f(t, S)/∂S² = r f(t, S)
Of course, having a functional form for the volatility will mean that closed form
expressions are unattainable even for plain vanilla contracts. Nevertheless, it
is straightforward to modify the finite difference methods that we outlined in
chapter 4 (for example the θ-method in listing 3.3) to account for the local
volatility structure.
Dupire (1993) notes that if the diffusion (6.2) is consistent with the risk
neutral densities (6.1), then the risk neutral densities must satisfy the forward
Kolmogorov equation (see section 1.6). In particular, if we denote the transition
density with f^Q(t, K; T, S) = Q(ST ∈ dS | St = K), then the forward
Kolmogorov equation will take the form
    ∂f^Q(t, K; T, S)/∂T = − ∂/∂K [ rK f^Q(t, K; T, S) ] + (1/2) ∂²/∂K² [ σ²(T, K) K² f^Q(t, K; T, S) ]
Given the Breeden-Litzenberger representation of the densities (6.1), we can
write
    f^Q(t, K; T, S) = exp(rT) ∂²P(T, K)/∂K²
By taking the derivative with respect to T , and substituting in the forward
equation we have, after some simplifications, the following
 
    r ∂²P(T, K)/∂K² + ∂³P(T, K)/∂T∂K² + rK ∂/∂K [ ∂²P(T, K)/∂K² ]
        − (1/2) ∂²/∂K² [ σ²(T, K) K² ∂²P(T, K)/∂K² ] = 0


We can integrate the above expression twice with respect to K, which will eventually yield the PDE[22]

    ∂P(T, K)/∂T + rK ∂P(T, K)/∂K − (1/2) σ²(T, K) K² ∂²P(T, K)/∂K² = c0(T) K + c1(T)
The functionals c0 (T ) and c1 (T ) appear as integration constants with respect to
K , and need to be identified using some boundary behavior. In particular, we can
use that as the strike price increases, K → ∞, the call prices and all derivatives
will decay to zero. This will happen because the risk neutral probability density
f Q (t, K ; T , S) decays as S → ∞. In that case the left hand side that involves
the derivatives will equal zero, which implies that c0(T) = c1(T) = 0 for all
maturities T. The Dupire PDE is therefore

    ∂P(T, K)/∂T + rK ∂P(T, K)/∂K − (1/2) σ²(T, K) K² ∂²P(T, K)/∂K² = 0
This partial differential equation resembles the Black-Scholes PDE, and
is actually its adjoint in the sense that Kolmogorov’s backward and forward
equations are. The Black-Scholes PDE will give the evolution of the call price
as we approach maturity and as the underlying asset changes, keeping the
strike and maturity constant. The Dupire PDE is satisfied by a call option as
the maturity and the strike price change, keeping the current time and current
spot price constant.
We can solve the above expression for the local volatility function

    σ(T, K) = sqrt( [ ∂P(T, K)/∂T + rK ∂P(T, K)/∂K ] / [ (1/2) K² ∂²P(T, K)/∂K² ] )

The above links the local volatility model with prices of observed call options[23]
and in principle it could be used to extract the local volatility function σ(t, S)
from a set of observed contracts. Unfortunately, there is a number of practical
problems with this approach, which stem from the fact that the local volatility is
a function of first and second derivatives of the pricing function P(T , K ). For a
start, there is only a relatively small number of calls and puts available at any
point in time, which means that we will need to set up some interpolation before
we carry out the necessary numerical differentiation using finite differences.
Therefore, our results will be dependent on the interpolation scheme that we
use.
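Given a smoothed pricing function, the formula can be evaluated by finite differences. A sketch that inherits exactly the sensitivity problems just described when P is noisy (P here is a smooth callable P(T, K), not a raw quote grid):

```python
import math

def dupire_local_vol(P, T, K, r, dT=1e-3, dK=1e-2):
    """Local volatility sigma(T, K) from call prices P(T, K) via central
    finite differences in Dupire's formula."""
    dPdT = (P(T + dT, K) - P(T - dT, K)) / (2.0 * dT)
    dPdK = (P(T, K + dK) - P(T, K - dK)) / (2.0 * dK)
    d2PdK2 = (P(T, K + dK) - 2.0 * P(T, K) + P(T, K - dK)) / (dK * dK)
    return math.sqrt((dPdT + r * K * dPdK) / (0.5 * K * K * d2PdK2))
```

As a sanity check, feeding in Black-Scholes prices with a flat volatility recovers that same volatility.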
In addition, the observed option prices are “noisy”, and interpolating through
their values will cause its own problems. Numerical differentiation is unsta-
ble at the interpolating nodes, and attempting to take the second derivative
is a guarantee for disaster, with the resulting local volatility surfaces varying
[22] During the second integration we use the identity K ∂²P(T, K)/∂K² = ∂/∂K [ K ∂P(T, K)/∂K ] − ∂P(T, K)/∂K.
[23] And also put options through the put-call parity.
23
And also put options through the put-call parity.

wildly. For these reasons practitioners prefer to use a smoothing method, like
the Nadaraya-Watson or the Radial Basis Function that we outlined above.
The construction of the local volatility is given in listing 6.10, together with the
density extraction. Figures 6.8(c,d) give the local volatility surface for the two
smoothing procedures. One can observe that the two surfaces look very similar,
with the RBF producing somewhat smoother time derivatives. Of course this will
depend largely on the parameters that define the smoothing procedure, which
are chosen ad hoc.
The local volatility function can also play the role of the risk neutral estimator
of the instantaneous future volatility at time T , if the underlying asset level at
this future time is equal to K . As shown in Derman and Kani (1998)
 

    σ²(T, K) = E^Q[ (dST)² | ST = K ] = E^Q[ lim_{Δt↓0} (S_{T+Δt} − K)² | ST = K ]

If one assumes a form for the implied volatility function, either using an
interpolator or a smoother, it is possible to express the local volatility σ(t, S) in
terms of the implied volatility σ̂(T , K )|T =t,K =S . This is of course feasible since
the pricing function

P(T , K ) = fBS (t, S0 ; T , K , r, σ̂ (T , K ))

which can be differentiated analytically with respect to the strike K and the
maturity T. It is actually more convenient to work with the moneyness y =
log(K/F) = log(K/S) − rT, and also consider the implied total variance as
a function of the maturity and the moneyness w(T, y) = T σ̂²(T, K). Then, as
shown in Gatheral (2006) the local variance can be easily computed as
    σ²(T, y) = [ ∂w(T, y)/∂T ] /
        [ 1 − (y/w(T, y)) ∂w(T, y)/∂y + ((4y² − 4w(T, y) − w²(T, y))/(16 w²(T, y))) (∂w(T, y)/∂y)² + (1/2) ∂²w(T, y)/∂y² ]
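A sketch of this computation for a smooth total-variance surface w(T, y), with all derivatives taken by central differences:

```python
def local_variance(w, T, y, dT=1e-4, dy=1e-4):
    """Local variance from the total implied variance surface w(T, y),
    y = log-moneyness, using the formula above with finite differences."""
    wv = w(T, y)
    dwdT = (w(T + dT, y) - w(T - dT, y)) / (2.0 * dT)
    dwdy = (w(T, y + dy) - w(T, y - dy)) / (2.0 * dy)
    d2wdy2 = (w(T, y + dy) - 2.0 * wv + w(T, y - dy)) / (dy * dy)
    den = (1.0 - y / wv * dwdy
           + (4.0 * y * y - 4.0 * wv - wv * wv) / (16.0 * wv * wv) * dwdy ** 2
           + 0.5 * d2wdy2)
    return dwdT / den
```

For a flat surface w(T, y) = σ²T all the y-derivatives vanish and the local variance collapses to σ², as it should.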



7
Fixed income securities

Fixed income securities[1] promise to pay a stream of fixed amounts at predefined
points in time. Typically, the issuers of these securities are either governments
(sovereign bonds) or corporations (corporate bonds). Bonds are debt instruments,
used by governments or corporations to borrow money from investors.
Zero-coupon bonds offer a single fixed payment, called the face value of the
bond Π, on the maturity date T . Coupon bearing bonds also promise to pay a
stream of cash flows, the coupons, in addition to the face value. In particular, a
c%-coupon bond will pay an amount equal to (c/100)Π of the face value per year.
Typically payments are made in two semi-annual installments of (1/2)(c/100)Π each.
Instruments with short maturities, like the US Treasury bills, are typically
zero coupon. Longer maturity instruments, like the US Treasury bonds, are typ-
ically coupon bearing. Corporate bonds also typically bear coupons. In most
cases, when a new coupon bearing instrument is introduced, the coupons are
chosen as for the instrument to sell at par. This means that its initial price is
approximately equal to its face value. Then, the coupon reflects the rate of in-
terest: for example if a sovereign 6% 30-year bond with face value $100 is issued
and sells at par, then the buyer will lend the government today $100, and will
receive (1/2) × (6/100) × $100 = $3 every six months for the next 30 years, plus $100
on the bond maturity.

7.1 YIELDS AND COMPOUNDING


Of course, the bond will almost never sell at exactly its par value. The yield
of the bond is the equivalent constant rate of interest that is able to replicate
all cashflows to maturity, when investing an amount equal to the current bond
price. It is obvious that the frequency with which one reinvests the proceeds will be a
[1] In this chapter we will call all fixed income securities bonds, although in reality the word “bond” is reserved for instruments with relatively long maturities. Shorter instruments in the US are called “bills” or “notes”.

factor that will affect the bond yield. When the yield of an instrument is quoted,
it is important to know what compounding method has been used, in order to
truly compare bonds.
In particular, let Pt denote the price of an instrument at time t (measured in
years). The simple yield y1 (t1 , t2 ), between two dates t1 and t2 , satisfies
 
    Pt2/Pt1 = 1 + y1(t1, t2)(t2 − t1)  ⇒  y1(t1, t2) = (Pt2/Pt1 − 1)/(t2 − t1)

The simple yield is the return of an investment equal to Pt1 that is initiated at
time t1 , and is then liquidated at time t2 for a price Pt2 . There is no intermediate
reinvestment of any possible proceeds.
Of course one could sell the instrument at the intermediate time t* = (t1 + t2)/2

for a price Pt*, and reinvest this amount for the remaining time to t2. Say that the
yield of this strategy is denoted y2 (t1 , t2 ). In that case the two simple investments
will satisfy

    Pt2/Pt* = 1 + y2(t1, t2)(t2 − t*)
    Pt*/Pt1 = 1 + y2(t1, t2)(t* − t1)

Multiplying the two will give the yield if we compound twice during the life of the
bond, namely

    Pt2/Pt1 = [1 + y2(t1, t2)(t2 − t1)/2]²  ⇒  y2(t1, t2) = [2/(t2 − t1)] [(Pt2/Pt1)^(1/2) − 1]

More generally, if we compound m times over the life of the bond we can
follow the same procedure to deduce the yield ym (t1 , t2 )
    Pt2/Pt1 = [1 + ym(t1, t2)(t2 − t1)/m]^m  ⇒  ym(t1, t2) = [m/(t2 − t1)] [(Pt2/Pt1)^(1/m) − 1]

If we pass to the limit m → ∞ we can recover the continuously compounded
return y∞(t1, t2), using the fact that lim_{m→∞} (1 + a/m)^m = exp(a):

    Pt2/Pt1 = lim_{m→∞} [1 + y∞(t1, t2)(t2 − t1)/m]^m = exp(y∞(t1, t2)(t2 − t1))
      ⇒  y∞(t1, t2) = [1/(t2 − t1)] log(Pt2/Pt1)

We will work with the continuously compounded return from now on, and we will
drop the subscript ∞, writing instead y∞(t, T) = y(t, T) = yt(T). We will also
denote with P(t, T) = Pt(T) the price at time t of a bond that matures at time T.
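The effect of the compounding frequency is easy to verify numerically. The
following sketch (in Python for brevity, although the listings in these notes
are otherwise in Matlab; the prices and dates are made up) computes the yields
ym for increasing m and checks that they approach the continuously compounded
yield.

```python
import math

def yield_m(p1, p2, dt, m):
    # yield compounded m times over the life of the bond
    return (m / dt) * ((p2 / p1) ** (1.0 / m) - 1.0)

def yield_cont(p1, p2, dt):
    # continuously compounded yield
    return math.log(p2 / p1) / dt

# an instrument bought at 90 and redeemed at 100 after two years
p1, p2, dt = 90.0, 100.0, 2.0
ys = [yield_m(p1, p2, dt, m) for m in (1, 2, 4, 12, 365)]
yc = yield_cont(p1, p2, dt)
# ys decreases towards yc as the compounding becomes more frequent
```

Note that for a positive return every ym exceeds y∞, and daily compounding is
already within a basis point of the continuous limit.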

Different instruments are quoted using different market conventions. In addi-
tion to the compounding, there are also conventions in the way time intervals are
computed. For example, some instruments are quoted assuming that the month
has 30 days and the year 360 (the 30/360 convention). This means that in order
to convert months to days we assume that each month has 30 days, and to con-
vert days to years we assume that each year has 360 days. Other instruments
might be quoted using the ACT/365, ACT/ACT or, more rarely, other conventions.2
If we are dealing with coupon bearing bonds, then we would need to decom-
pose the coupon payments and take their present values individually.
As an example, say that we are interested in a zero-coupon bond that has
22 months and 12 days to maturity, and we are quoted a yield of 5.5%. What is
the value of this bond today, assuming that the face value is $100? If this bond
is compounded semiannually and the 30/360 convention is used, then we would
compute the time interval as

22m + 12d = 1y + 10m + 12d = 1y + 300d + 12d = 1y + 312d = 1.866y

Compounding will take place at the points t0 = 0.366y, t1 = 0.866y, t2 =
1.366y and t3 = 1.866y. Therefore, if we denote with P the price of the bond
today, it will satisfy

    100/P = (1 + 0.055 × 0.366) × (1 + 0.055 × 0.5)^3 ≈ 1.1067

which gives P ≈ $90.36.

The financial toolbox of Matlab has a number of functions that convert be-
tween different day count conventions, and the appropriate discount factors.
Here, since the markets are set up in continuous time we will use continuous
compounding.
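The arithmetic of the worked example can be replicated in a few lines. The
sketch below is in Python rather than the Matlab toolbox functions mentioned
above, and simply hard-codes the 30/360 conversion for this particular bond.

```python
# 22 months and 12 days to maturity under the 30/360 convention:
# each month counts as 30 days, each year as 360 days
tau = (22 * 30 + 12) / 360.0        # about 1.8667 years
y = 0.055                           # quoted yield, compounded semiannually
stub = tau - 3 * 0.5                # odd first period of about 0.3667 years
growth = (1 + y * stub) * (1 + y * 0.5) ** 3
price = 100.0 / growth              # roughly 90.36 dollars
```

The price agrees with the value computed in the text to within rounding of the
stub period.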

7.2 THE YIELD CURVE


At each point in time t we have the opportunity to invest in instruments of
different maturities τ, each offering a particular yield y(t, τ). The mapping τ →
y(t, t+τ) is called the yield curve. Essentially it represents the annualized return
that is guaranteed by a zero-coupon bond with maturity τ. Observed yield curves
are typically upward sloping, with the yields for long maturities being higher
than the short ones. Such yield curves are called normal to illustrate that this
pattern is the most common. Having said that, flat or inverted (downward sloping)
yield curves are also occasionally observed. A humped yield curve pattern is
rarely encountered.

THE NELSON-SIEGEL-SVENSSON PARAMETRIZATION

2 More information can be found at the International Swaps and Derivatives
Association website (www.isda.org), and in particular in ISDA (1998).


FIGURE 7.1: Examples of yield curves using the Nelson-Siegel-Svensson
parametrization. The parametric form is able to produce curves that exhibit the
basic yield curve shapes. [The plot shows yield against maturity (0-10 years)
for flat, normal, inverted and humped curves.]

Nelson and Siegel (1987) and Svensson (1994), collectively denoted with NSS,
discuss various parametric forms of the yield curve, summarized in the form

    τ → β0 + β1 · (1 − e^(−τ/τ1)) / (τ/τ1) + β2 · e^(−τ/τ2) (1 − e^(−τ/τ2)) / (τ/τ2)
There are five parameters in the NSS expression, which control different aspects
of the yield curve shape. In particular, β0 can be used to shift the yield curve
up and down, therefore defining the yields for long maturities. β1 controls the
amount of curvature that the curve exhibits, while β2 is responsible for a potential
hump. The parameters τ1 and τ2 will determine at which maturities the curvature
and the hump are most pronounced. Figure 7.1 shows some basic yield curve
patterns that can be produced using the NSS approach. The original paper of
Nelson and Siegel (1987) used only the first two components, allowing for level
and curvature effects. Svensson (1994) augmented the formula with the hump
component.
In reality we only observe yields for a relatively small number of maturi-
ties, and the quotes can be contaminated with noise. This can be due to non
synchronous trading, illiquidity and other microstructure issues. The NSS func-
tionals can be used to smooth the observed yield curve, and to interpolate for
maturities that are not directly traded. Also, generalized versions of the NSS
approach can be used to study the dynamics of the yield curve in time.
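The NSS functional itself is a one-liner in any language. The following sketch
(in Python, with illustrative parameter values) evaluates the parametric form
and verifies its two limits: the short end of the curve tends to β0 + β1 + β2,
while the long end tends to the level parameter β0.

```python
import math

def nss_yield(tau, b0, b1, b2, t1, t2):
    # Nelson-Siegel-Svensson yield for a maturity tau (in years)
    x1, x2 = tau / t1, tau / t2
    return (b0
            + b1 * (1.0 - math.exp(-x1)) / x1
            + b2 * math.exp(-x2) * (1.0 - math.exp(-x2)) / x2)

# an upward sloping ("normal") curve: b1 < 0 pulls the short end down
pars = dict(b0=0.05, b1=-0.02, b2=0.005, t1=2.0, t2=0.5)
curve = [nss_yield(tau, **pars) for tau in (0.25, 1, 2, 5, 10)]
```

With these made-up parameters the curve rises from the short end towards the
long run level of 5%.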


LISTING 7.1: nelsonsiegelsvensson.m: Yields based on the Nelson-Siegel-
Svensson parametrization.

function y = nelsonsiegelsvensson(tau, par)
    b0 = par.beta0;
    b1 = par.beta1;
    b2 = par.beta2;
    t1 = par.tau1;
    t2 = par.tau2;
    y  = b0 + b1*(1-exp(-tau./t1))./(tau./t1) ...
            + b2*exp(-tau./t2).*(1-exp(-tau./t2))./(tau./t2);

Listing 7.1 gives a simple Matlab code that implements the NSS formula.
Individual yield curves can be used to calibrate this formula and retrieve the
corresponding parameters.
In this chapter we are interested in the construction of mathematical models
that have two desirable and very significant features. On one hand, they should
have the potential to reproduce observed yield curves. In addition, they must be
able to capture the evolution of the yield curve through time, in order to offer
reliable prices for derivative contracts that are based on future yields or bond
prices.

THE DYNAMICS OF THE YIELD CURVE


Figure 7.2 gives historical yield curves over the period 2001-07. A few casual
observations can be made, which can offer valuable insight on the stylized facts
that a fixed income model should adhere to. It appears that the yield curve is
indeed typically upward sloping, with a few instances where it is flat or slightly
inverted. There is no significant “hump-ness” in this particular dataset. The short
end of the yield curve appears to be a lot more volatile than the long end, which
is relatively stable. Also, yields of different maturities do not tend to move in
opposite directions; on the contrary, they seem to be quite strongly correlated.
We can use these yields to recover the parameters of the NSS formula. In this
particular instance we assume that β2 = τ2 = 0, as the yields in the dataset
do not exhibit a humped pattern. An example of how one can calibrate these
parameters is given in listing 7.2. Figure 7.3 shows these parameter estimates
through time.
The parameter β0 corresponds to the maximum yield across different matu-
rities. As the long term bond yields gradually dropped throughout the sample
period, β0 also decreases to reflect that. As illustrated by the time path of the
parameter β1 , the yield curve became slightly more convex in the first half of the
period, flattening quite rapidly afterwards. Parameter τ 1 shows that the short
end of the yield curve rose steeply between mid-2002 to mid-2003.

FIGURE 7.2: Historical yield curve dynamics. The corresponding Matlab code can
be found in listing 7.7. [The surface plot shows yields against maturity and
time, over the period 2001-2007.]

LISTING 7.2: Calibration of the Nelson-Siegel formula to a yield curve.

function [x, par] = calibratens(tau, y)
    par = [];
    opt = optimset('Display', 'none');
    x   = lsqnonlin(@ssq, [mean(y), -1.2, 0.2], [], [], opt);
    function sq = ssq(p)
        par.beta0 = p(1);
        par.beta1 = p(2);
        par.beta2 = 0;
        par.tau1  = p(3);
        par.tau2  = 1;
        sq = nelsonsiegelsvensson(tau, par) - y;
    end
end

THE FORWARD CURVE


Different points on the yield curve provide us with the risk free rates of return,
for investments that commence ‘now’ (that is time t in our notation), and mature
at different times in the future. The yield curve also defines the forward rates,

FIGURE 7.3: Historical Nelson-Siegel parameters. The Nelson and Siegel (1987)
formula β0 + β1 (1 − exp(−τ/τ1)) / (τ/τ1) is calibrated to the yields of figure
7.2 and the parameters are presented below. The level parameter β0, the
convexity magnitude parameter β1 and the convexity steepness parameter τ1 are
given. [Three panels plot β0, β1 and τ1 against time, over 2002-2006.]

which are the fixed rates of return that are set and reserved at time t, but will
be applicable over a future time period.
In particular, say that we select two points on the yield curve, for bonds that
mature at times T* and T, with T > T* > t. The prices of these bonds will be
Pt(T) and Pt(T*), respectively. Now assume that we are interested in setting the
forward (continuously compounded) rate of interest for an investment that will
commence at time T* and will mature at T, which we will denote with ft(T*, T).
Consider the following two investments over the period [t, T]:
1. Buy one risk free bond that matures at time T. This will cost Pt(T) today,
   and will deliver one pound at time T.
2. Buy Pt(T)/Pt(T*) units of the risk free bond that matures at time T*. Also
   enter a forward contract to invest risk-free over the period [T*, T], at the
   rate ft(T*, T). This strategy will also cost Pt(T) today, as it is free to
   enter a forward contract. The first leg will deliver Pt(T)/Pt(T*) pounds at
   time T*, which will be invested at the forward rate. Therefore at time T this
   strategy will deliver exp{ft(T*, T) · (T − T*)} · Pt(T)/Pt(T*).


These two strategies have the same initial cost to set up, the same maturity,
and are both risk free. Therefore they should deliver the same amount on the
maturity date T , otherwise arbitrage opportunities would arise. For example, if
the second strategy was delivering more than one pound at time T , then one
would borrow Pt (T ) at the risk free rate to enter the second strategy with zero
cost at time zero.
Therefore, the arbitrage free forward rate will satisfy

    Pt(T) = Pt(T*) · exp{−ft(T*, T) · (T − T*)}
    ⇒ ft(T*, T) = − [1/(T − T*)] · log[Pt(T)/Pt(T*)]

If we let the time between the two maturities shrink down to zero, by letting
for example T* → T, we define the (instantaneous) forward rate. This is
essentially the short rate that we can reserve today, but will be applied at
time T

    ft(T) = − lim T*↑T [log Pt(T) − log Pt(T*)] / (T − T*)
          = − ∂ log Pt(T)/∂T = yt(T) + (T − t) ∂yt(T)/∂T
Forward rates for different maturities define the forward curve. There is a corre-
spondence between the yield and forward curves, and knowing one leads to the
other.
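The correspondence between bond prices, yields and forward rates can be checked
with a few lines of code. The sketch below (Python, assuming a flat 5% yield
curve for illustration) recovers the forward rate from two zero-coupon bond
prices; on a flat curve every forward rate equals the common yield.

```python
import math

def bond_price(y, t, T):
    # price at time t of a zero-coupon bond maturing at T,
    # under continuous compounding at yield y
    return math.exp(-y * (T - t))

def forward_rate(p_near, p_far, t_near, t_far):
    # forward rate f_t(T*, T) implied by today's prices of the
    # bonds maturing at T* (p_near) and T (p_far)
    return -math.log(p_far / p_near) / (t_far - t_near)

y = 0.05
p1 = bond_price(y, 0.0, 1.0)          # matures at T* = 1
p2 = bond_price(y, 0.0, 2.0)          # matures at T  = 2
f = forward_rate(p1, p2, 1.0, 2.0)    # equals 0.05 on a flat curve
```

The no-arbitrage identity Pt(T) = Pt(T*) exp{−f (T − T*)} holds by construction.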

7.3 THE SHORT RATE


Historically, the first family of models introduced in the fixed income literature
were the so called short rate or one-factor models. The main underlying as-
sumption is that there is a unique Brownian motion that is responsible for the
uncertainty in the economy. More formally, we start with a filtered probability
space (Ω, F , {Ft }t>0 , P), and consider a Brownian motion Bt with respect to
this probability measure.
The main ingredient of the one-factor model is the short rate process, that is
to say the process of the instantaneous risk free rate rt. Essentially, this is
the rate offered by the bank account or current account, which is not fixed for
any period of time, but is nevertheless risk free during the infinitesimal period
(t, t + dt). The investor is not bound for any maturity and can withdraw funds
from (or add funds to) this account freely, without incurring any penalties. This
is in contrast to other financial assets, like the ones introduced in the Black-
Scholes paradigm, where the return over this infinitesimal period is random.
Of course, investing in the bank account over a longer period of time is not
a risk free investment, since the short rate will change. Having said that, one
can show that if the short rate is the process that drives the economy, then bonds

with different maturities can be priced in a consistent way that does not permit
arbitrage opportunities. This means that eventually we will show that all bonds
can be priced relative to each other in a unique way. Intuitively, one can think
of bonds as derivatives which are contingent on the future realizations of the
short rate.
To put things more concretely, the current account will satisfy the ordinary
differential equation

    dBt = Bt rt dt  ⇒  Bt = B0 exp{ ∫_0^t rs ds }

SHORT RATE AND BOND PRICING


As we argued above, the short rate process evolves in a stochastic way, and the
uncertainty is described by a Brownian motion Bt . We can therefore cast the
short rate process as an SDE

drt = µ(t, rt )dt + σ(t, rt )dBt

Our objective is to establish prices for bonds with different maturities. The only
constraint that we need to take into account is that the prices of these bonds
must rule out any arbitrage opportunities. In all generality, the price (at time t)
of a bond with maturity T can depend at most on the time t and the short rate
level rt , that is to say
Pt (T ) = g(t, r(t); T )
This formalizes the statement we made above, that the bond is a derivative on
the short rate. It appears that the setting is similar to the one in equity derivative
pricing, if we consider the short rate as the underlying asset. In particular we
can see the analogy

    equity price:    dSt = µ(t, St) dt + σ(t, St) dBt
    interest rate:   drt = µ(t, rt) dt + σ(t, rt) dBt

In both cases we want to establish a derivative pricing relationship

equity derivative: Pt = g(t, St )


bond price: Pt (T ) = g(t, rt ; T )

Although the two settings appear to be very similar, there is a very significant
difference: Unlike equities, the short rate is not a traded asset. This means that
we cannot buy or sell the short rate, and therefore we cannot construct the
necessary risk free positions that produced the Black-Scholes PDE. The market,
as we constructed it, is incomplete.
In fact, the pricing of bonds has more common features with the pricing
of options under stochastic volatility, where again we introduced a non-traded
factor (the volatility of the equity returns). Then (section 6.3) we constructed a


portfolio of two options, in order to solve for the price of volatility risk. Here we
will use the same trick, namely to construct a portfolio of two bonds with different
maturities, and investigate the conditions that would make it (instantaneously)
risk free. This will naturally introduce the price of short rate risk that will be
unknown; we will be able to determine this price of risk by calibrating the model
on the observed yield curve, in the same spirit as the calibration of SV models
on the implied volatility surface. These are summarized in the following table
                          equity SV        fixed income
    non-traded asset:     volatility       short rate
    used to hedge:        2 options        2 bonds
    calibrate on:         IV surface       yield curve

THE HEDGING PORTFOLIO


Let us consider two bonds with maturities T1 and T2 , and say that their prices
are given by the functions Pt(Tj) = g(t, rt; Tj) = gj(t, rt), for j = 1, 2. Applying
Itō’s formula to the pricing function will give the dynamics of the bond prices,
namely
dPt (Tj ) = αt (Tj )dt + βt (Tj )dBt
with

    αt(Tj) = ∂gj(t, rt)/∂t + µ(t, rt) ∂gj(t, rt)/∂r + ½ σ²(t, rt) ∂²gj(t, rt)/∂r²
    βt(Tj) = σ(t, rt) ∂gj(t, rt)/∂r
Note that both bonds will depend on the same Brownian motion B t , as this is
the only source of uncertainty that affects the bond dynamics through the short
rate.
Say that we sell the first bond and buy ∆t units of the second one. The
portfolio will have value Πt = Pt (T1 ) − ∆t Pt (T2 ), and will obey the SDE
dΠt = dPt (T1 ) − ∆t dPt (T2 )
Our aim is to construct a risk free portfolio; therefore, to eliminate depen-
dence on dBt we choose the portfolio as
    ∆t = βt(T1)/βt(T2) = [∂g1(t, rt)/∂r] / [∂g2(t, rt)/∂r]
Then the portfolio will evolve according to the ordinary differential equation
 
βt (T1 )
dΠt = αt (T1 ) − αt (T2 ) dt (7.1)
βt (T2 )
Since the portfolio is now risk free it must grow as the current account, at
rate rt . If that were not the case, arbitrage opportunities would appear. This
means that

    dΠt = Πt rt dt = [g1(t, rt) − (βt(T1)/βt(T2)) g2(t, rt)] rt dt    (7.2)

Equating (7.1) and (7.2) yields the consistency relationship

    [αt(T1) − g(t, rt; T1) rt] / βt(T1) = [αt(T2) − g(t, rt; T2) rt] / βt(T2)

Now we invoke the same line of argument that we used in section 6.3. In
order to set up the above relationship we did not explicitly specify a particular
pair of bonds, and it will therefore hold for any pair of maturities. Thus, for any
set of maturities T1 , T2 , T3 , T4 , . . . we can write

    [αt(T1) − g(t, rt; T1) rt] / βt(T1) = [αt(T2) − g(t, rt; T2) rt] / βt(T2)
    = [αt(T3) − g(t, rt; T3) rt] / βt(T3) = [αt(T4) − g(t, rt; T4) rt] / βt(T4) = ···

Therefore the ratio cannot depend on the particular bond maturities; it can at
most depend on (t, rt ), say that it is equal to λ(t, rt ). This means that we can
write

    [αt(T) − g(t, rt; T) rt] / βt(T) = λ(t, rt)
for any maturity T . Essentially we have managed to derive the PDE that the
bond pricing formula has to satisfy, in order to rule out arbitrage opportunities.
We can thus drop the maturity T , as it is not affecting the PDE in any way, and
write

    ∂g(t, r)/∂t + {µ(t, r) − λ(t, r)σ(t, r)} ∂g(t, r)/∂r
                + ½ σ²(t, r) ∂²g(t, r)/∂r² = g(t, r) r
This PDE is called the term structure PDE, and a boundary condition is
needed in order to solve it analytically or numerically. For a zero-coupon bond
that matures at time T the boundary condition for this PDE will be g(T , r; T ) =
1. Although the PDE is called the term structure PDE, we never used the fact that
the instruments are actually bonds. The quantities Tj can be thought as indices
for different interest rate sensitive instruments: bond options, caps, floors or
swaptions will all satisfy the term structure PDE. In general, any contingent
claim that promises to pay Φ(r(T )) at time T will satisfy the same PDE, with
boundary condition
g(T , r) = Φ(r)


THE PRICE OF RISK


The price of risk functional λ(t, r) can be freely selected, as long as it does not
permit arbitrage opportunities. Intuitively, it seems to be a good idea to ensure
that the function of the spot rate r → λ(t, r) remains bounded for all times t.
This would ensure that the coefficients of the PDE will not explode at any finite
time, and therefore a solution will exist.
Another way of viewing this kind of restriction is by considering the equiv-
alent probability measure, under which pricing takes place. Essentially, if we
denote with µQ(t, r) = µ(t, r) − λ(t, r)σ(t, r), then the PDE becomes

    ∂g(t, r)/∂t + µQ(t, r) ∂g(t, r)/∂r + ½ σ²(t, r) ∂²g(t, r)/∂r² = g(t, r) r
This PDE will be solved subject to the boundary condition g(T , r) = Φ(r). For
example, for zero-coupon bonds that mature at T we will have Φ(r) = 1.
The Feynman-Kac theorem postulates that the solution of this PDE can be
expressed as an expectation
    g(t, r) = E^Q [ exp{ −∫_t^T rs ds } Φ(rT) | rt = r ]

If BQ is a Brownian motion under Q, then the process for rt is given by

    drt = µQ(t, rt) dt + σ(t, rt) dBtQ
        = µ(t, rt) dt + σ(t, rt) {dBtQ − λ(t, rt) dt}

The probability measure Q should be equivalent to the true measure P, otherwise
arbitrage opportunities would be possible (this is due to the fundamental
theorem of asset pricing). We can also write BtQ = Bt + ∫_0^t λ(s, rs) ds, which
suggests that the process λt = λ(t, rt) determines, through Girsanov's theorem,
the Radon-Nikodym derivative of the risk adjusted measure with respect to the
true one

    dQ/dP |Ft = exp{ −∫_0^t λ(s, rs) dBs − ½ ∫_0^t λ²(s, rs) ds }

In order for this to be a valid measure, the Novikov condition must be
satisfied, namely that the following expectation is finite for all t

    E exp{ ½ ∫_0^t λ²(s, rs) ds } < ∞

It is apparent that if we require λ(t, r) to be bounded for all t, then the above
expectation will also be bounded. This is a feature that is shared by most models
for the short rate.
Since we are observing bonds which are priced under Q, it is impossible
to explicitly decompose λ from the true short rate drift µ. The best we can do,

given this information, is calibrating the short rate model under risk neutrality,
and therefore recovering µQ. If our purpose is to price interest rate sensitive
securities this does not pose a problem, as pricing will also take place under Q.
Having said that, we might be interested in the true short rate process, perhaps
for risk management which takes place under the objective probability measure.
In that case we can recover the price of risk and the true drift using filtering
methods, for example a version of the Kalman filter.
There is a very extensive literature that investigates the determinants of the
yield curve, trying to explain why it takes its various shapes and what makes
it evolve, for example from a normal to an inverted one. The same factors will
of course also influence the risk premium λ(t, r). Some of the standard term
structure theories include the following
1. The pure expectations hypothesis assumes that bonds are perfect substi-
tutes. Bond prices are determined from the expectations of future short rates.
As the short rate evolves and these expectations vary, the yield curve will
shift to accommodate them. Very high spot rates could therefore imply an
inverted yield curve.
2. Market segmentation takes an opposite view. Short and long bonds are not
substitutes, due to taxation and different investor objectives. For example
pension funds might only be interested in the long end of the curve, while
hedge funds could be willing to invest in short maturity instruments. The
prices for bonds of different yield ranges are determined independently.
3. Somewhat between the above two extreme points lies the theory of preferred
habitat. Investors forecast future rates, but also have a set investment hori-
zon, demanding an extra premium to invest in bonds outside their preferred
maturity ranges. As short term investors outnumber long term ones, prices
of long maturity bonds will be relatively lower, rendering a normal term
structure. This will be inverted if expectations change sufficiently.
4. The liquidity preferences theory goes one step further and states that in-
vestors will demand an extra premium for having their money tied up for a
longer period. Long maturity bonds will therefore have to offer higher yields
to reflect this premium.
As is naturally expected, all factors will influence the term structure behavior
to some extent at each point in time.

7.4 ONE-FACTOR SHORT RATE MODELS


Following the discussion of the previous section, we are looking for specifications
for the short rate under Q. From now on we will be working only under the risk
neutral measure, therefore unless otherwise stated we will drop the superscript
Q. Our objective is to consider parametric forms µ(t, r) and σ(t, r) that define the
dynamics of the SDE for the short rate


drt = µ(t, rt )dt + σ(t, rt )dBt

In selecting µ and σ we need to keep in mind some stylized facts of interest


rates, and some desirable properties of interest rate models
• The short rate and yields for all maturities are always positive.
• The process is stationary, in the sense that there is a long run distribution
  for the short rate. This indicates that the short rate should not be allowed
  to increase without bound, and some sort of mean reversion should be present.
• As interest rates increase they become more volatile.
• The term structure of interest rates can be upward sloping, downward sloping
or humped. The model should be capable of producing different yield curve
shapes.
• The short end of the yield curve is substantially more volatile than the long
end. The long end appears to evolve in a much smoother way.
• Yields for different maturities are correlated (and ones for adjacent maturities
very strongly correlated), but not perfectly so.
• Finally, for an interest rate model to be operational, it should offer bond
and bond derivatives in closed form (or at least in a form that is readily
computable).

THE VASICEK MODEL


The first generation of one-factor models assumed a time-homogeneous structure
that lead to tractable expressions for bond prices. The Vasiček (1977) model casts
the short rate as an Ornstein-Uhlenbeck process, namely

drt = θ r̄ − rt dt + σdBt

In the Vasicek framework the short rate is Gaussian, a feature that leads to
closed form solutions for a number of instruments. For that reason the Vasicek
specification is still used by some practitioners today. In particular

    rt | r0 ∼ N( r̄ + exp{−θt}(r0 − r̄), σ²(1 − exp{−2θt})/(2θ) )

If we assume a constant price of risk λ(t, r) = λ, then under risk neutrality the
dynamics of the short rate are

drt = θ r̄ Q − rt dt + σdBtQ

for r̄Q = r̄ − λσ/θ. This indicates that as investors are risk averse (λ is
typically negative), they behave as
if the long run attractor of the short rate is higher than what it actually is. The
pricing functions of interest rate sensitive securities will satisfy the PDE

∂g(t, r)  ∂g(t, r) 1 2 ∂2 g(t, r)


+ θ r̄ Q − r + σ = g(t, r)r
∂t ∂r 2 ∂r 2

For example, in the case of a bond that matures at time T the terminal
condition will be g(T, r; T) = 1, and we guess the solution of the PDE to be of
the exponential affine form

    g(t, r; T) = exp{C(t; T) + D(t; T) · r}

If we substitute this expression in the PDE we can write

    [Ct + θr̄Q D + σ²D²/2] + [Dt − θD − 1] r = 0

As the PDE has to be satisfied for all initial spot rates r, we conclude that
both square brackets must be equal to zero, and that C(T ; T ) = D(T ; T ) = 0.
Therefore we recover a system of ODEs for the functionals C and D, namely 3

    Ct(t; T) + θr̄Q D(t; T) + ½ σ²D²(t; T) = 0
    Dt(t; T) − θD(t; T) = 1
    C(T; T) = 0
    D(T; T) = 0

The solution of the above system will give the Vasicek bond pricing formula,
namely

    D(t; T) = − [1 − exp{−θ(T − t)}] / θ
    C(t; T) = − [D(t; T) + (T − t)] · [θ²r̄Q − σ²/2] / θ² − σ²D²(t; T) / (4θ)
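The affine formula is easy to put to work. The sketch below (in Python; the
parameter values are made up for illustration) implements the functions C and D
and checks three sanity limits: the price at maturity is one, the yield of a
very short bond approaches the current short rate, and the yield of a very long
bond tends to r̄Q − σ²/(2θ²).

```python
import math

def vasicek_bond(r, tau, theta, rbar_q, sigma):
    # zero-coupon bond price P = exp(C + D r), tau = time to maturity
    if tau == 0.0:
        return 1.0
    D = -(1.0 - math.exp(-theta * tau)) / theta
    C = (-(D + tau) * (theta**2 * rbar_q - 0.5 * sigma**2) / theta**2
         - sigma**2 * D**2 / (4.0 * theta))
    return math.exp(C + D * r)

theta, rbar_q, sigma, r = 0.86, 0.05, 0.02, 0.04
p5 = vasicek_bond(r, 5.0, theta, rbar_q, sigma)
y5 = -math.log(p5) / 5.0   # five-year zero yield implied by the model
```

The implied yield curve interpolates between the current short rate at the
short end and the long run level at the long end.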
One important feature of the Vasicek model is the mean reversion it exhibits.
In particular, the short rate of interest is attracted towards a long run value r̄.
The strength of this mean reversion is controlled by the parameter θ. Intuitively,
the half life of the conditional expectation is log 2/θ, which means that if the short
rate is at level rt at time t, then it is expected to cover half its distance from the
long run value in log 2/θ ≈ 0.69/θ years. The main shortfall of the Vasicek model is that it
permits the short rate to take negative values. This happens because the short
rate is normally distributed, and therefore can take values over the real line.
As bond prices are exponentially affine with the short rate, and future short
rates are normally distributed, it is easy to infer that future bond prices will
follow the lognormal distribution. Therefore bond options will be priced with
formulas similar to the Black-Scholes one for equity options. In particular, the
price of a call option with strike price K that matures at time τ, written on a
zero coupon bond that pays one pound at time T > τ will be equal to
3
Here we follow the approach outlined in Duffie and Kan (1996) for general affine
structures. Such systems of ODEs that are ‘linear-quadratic’ are known as Riccati
equations.


    Ct(τ, K; T) = Pt(T) N(d+) − K Pt(τ) N(d−)

    where d± = (1/σ*) log[ Pt(T) / (K · Pt(τ)) ] ± σ*/2

    and σ* = (σ/θ) [1 − exp{−θ(T − τ)}] · sqrt[ (1 − exp{−2θ(τ − t)}) / (2θ) ]
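The option formula needs nothing beyond the normal CDF, which can be built from
the error function. The following sketch (in Python; all inputs are made up,
and time is measured from t = 0) implements it and checks the deep
in-the-money limit: as the strike goes to zero the call is worth the
underlying bond.

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def vasicek_bond_call(p_T, p_tau, tau, T, K, theta, sigma):
    # call maturing at tau, written on a zero-coupon bond maturing at T > tau;
    # p_T and p_tau are today's prices of the T- and tau-maturity bonds
    s = (sigma / theta) * (1.0 - math.exp(-theta * (T - tau))) \
        * math.sqrt((1.0 - math.exp(-2.0 * theta * tau)) / (2.0 * theta))
    d_plus = math.log(p_T / (K * p_tau)) / s + 0.5 * s
    d_minus = d_plus - s
    return p_T * norm_cdf(d_plus) - K * p_tau * norm_cdf(d_minus)

# illustrative inputs: a one-year option on a two-year bond
call = vasicek_bond_call(0.90, 0.95, 1.0, 2.0, 0.94, theta=0.86, sigma=0.02)
```

As in the Black-Scholes case, the price is decreasing in the strike K.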

LOGNORMAL MODELS
The main shortcoming of the Vasicek model is that it permits negative nominal
interest rates. One straightforward way around this problem is to cast the prob-
lem in terms of the logarithm of the short rate. The first application of this idea
can be found in Dothan (1978) model, which specifies
 
    drt = θ rt dt + σ rt dBt,   or   d log rt = (θ − ½σ²) dt + σ dBt

Here the short rate follows the geometric Brownian motion, just like the un-
derlying stock in the Black-Scholes paradigm. The short rate is log-normally
distributed, and therefore takes only positive values. On the other hand, there
is no mean reversion present, and the long run forecast for the short rate will
either be explosive (if θ > σ 2 /2) or zero (if θ < σ 2 /2). For that reason the Dothan
model is not popular for modeling purposes.
Another approach is casting the logarithm of the short rate to follow the
Ornstein-Uhlenbeck process, giving rise to the exponential Vasicek model

d log rt = θ(log r̄ − log rt )dt + σdBt

In section 7.5 we will discuss the numerical implementation of a popular exten-


sion of this model, due to Black and Karasinski (1991).
An important feature of all lognormal models is the so-called “explosive” be-
havior of the bank account (see for example the discussion in Brigo and Mercurio,
2001; Sandmann and Sondermann, 1997). Loosely speaking, if the yield is log-
normally distributed, then the expected bank account is given by an expression
of the form
EBt = E exp{exp{Z }}, with Z ∼ N(µZ , σZ2 )
It turns out that this expectation is infinite for all values of µ Z and σZ . That
means that, according to lognormal models, even investing for a very short hori-
zon (where the yield is approximately normal) offers infinite expected returns.
Technically speaking, the right tail of the lognormal distribution does not decay
fast enough, and this is the reason for the infinite expectation.

THE CIR MODEL


The most popular member across the one factor model family is without doubt
the one proposed in Cox et al. (1985, CIR). The short rate follows the “square

root” or Feller process.4

    drt = θ(r̄ − rt) dt + σ √rt dBt

The CIR model is able to capture most of the desired properties of short rate
models. The process is mean reverting, with the long run attractor equal to r̄.
The speed of mean reversion is controlled by the parameter θ. As the short rate
increases, its volatility also increases, at a degree which is dictated by σ. CIR
show that the transition density of the process is a non-central chi-square. In
particular,

2c rT | rt ∼ χ² ( 4θr̄/σ² , 2rt c exp{−θ(T − t)} ),   with   c = 2θ / ( σ² (1 − exp{−θ(T − t)}) )

Having the transition density in closed form allows us to calibrate the parameters
to a set of historical data. Unfortunately, the short rate is not directly observed,
but practitioners use yields of bonds with short maturities as a proxy for the
dynamics. More elaborate methods involve (Kalman) filtering and are discussed
later.
One can readily compute the expected value and the variance of the short
rate process, in particular

E [rT | rt] = r̄ + exp{−θ(T − t)} (rt − r̄)

V [rT | rt] = (σ²/(2θ)) (1 − exp{−θ(T − t)}) [ r̄ + (2rt − r̄) exp{−θ(T − t)} ]

Also, as the forecasting horizon increases, the stationary (unconditional) distri-
bution of the short rate is Gamma
 
rt ∼ Γ ( 2θ/σ² , 2θr̄/σ² )
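As a sanity check, the closed-form moments must agree with those implied by the non-central chi-square law: if 2c rT ∼ χ²(k, λ) then E[2c rT] = k + λ and V[2c rT] = 2(k + 2λ). A short Python sketch with arbitrary (uncalibrated) parameter values:

```python
import math

# Arbitrary CIR parameters for the check (not calibrated values)
theta, rbar, sigma = 0.5, 0.05, 0.1
rt, t, T = 0.03, 0.0, 2.0

tau = T - t
c = 2 * theta / (sigma**2 * (1 - math.exp(-theta * tau)))
k = 4 * theta * rbar / sigma**2                  # degrees of freedom
lam = 2 * rt * c * math.exp(-theta * tau)        # non-centrality

# Moments of r_T recovered from the chi-square law: 2c r_T ~ chi2(k, lam)
mean_chi2 = (k + lam) / (2 * c)
var_chi2 = 2 * (k + 2 * lam) / (2 * c)**2

# Closed-form conditional moments of the square root process
mean_cf = rbar + math.exp(-theta * tau) * (rt - rbar)
var_cf = (sigma**2 / (2 * theta)) * (1 - math.exp(-theta * tau)) \
         * (rbar + (2 * rt - rbar) * math.exp(-theta * tau))

print(mean_chi2, mean_cf)   # the two computations should agree
print(var_chi2, var_cf)
```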

The instantaneous variance of the square root process is proportional to its


level. For that reason, if the short rate reaches zero the stochastic component
disappears, and the process will revert towards its positive long run mean. Therefore
the CIR model rules out negative short rates, rt ≥ 0 for all t. In particular,
Feller (1951) shows that if the condition 2θr̄ > σ 2 is satisfied, then the mean
reversion is strong enough for the process never to reach zero. In that case the
inequality is strict, rt > 0 for all t.
CIR provide a bond pricing formula which also takes the exponentially affine
form
4 Discussed in detail in Feller (1951).


FIGURE 7.4: Simulation of CIR yield curves. (a) short rate; (b) yield curves.

Pt(T) = exp ( C(t; T) + D(t; T) · rt ),   with

C(t; T) = (2θr̄/σ²) log [ 2γ exp{(θ + γ)(T − t)/2} / ( (θ + γ)(exp{γ(T − t)} − 1) + 2γ ) ]

D(t; T) = −2 (exp{γ(T − t)} − 1) / ( (θ + γ)(exp{γ(T − t)} − 1) + 2γ )

γ = √(θ² + 2σ²)
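The discount bond formula is easy to code up. The following Python sketch (parameter values are arbitrary) checks two basic properties: the price at zero maturity is one, and prices decay as the maturity grows.

```python
import math

def cir_bond(r, theta, rbar, sigma, tau):
    """Zero coupon bond price P_t(t+tau) under CIR,
    P = exp(C + D*r) in the notation of the text."""
    gamma = math.sqrt(theta**2 + 2 * sigma**2)
    denom = (theta + gamma) * (math.exp(gamma * tau) - 1) + 2 * gamma
    C = (2 * theta * rbar / sigma**2) \
        * math.log(2 * gamma * math.exp((theta + gamma) * tau / 2) / denom)
    D = -2 * (math.exp(gamma * tau) - 1) / denom
    return math.exp(C + D * r)

# Illustrative parameters (not calibrated)
p0 = cir_bond(0.04, 0.5, 0.05, 0.1, 0.0)    # zero maturity: price is 1
p5 = cir_bond(0.04, 0.5, 0.05, 0.1, 5.0)
p10 = cir_bond(0.04, 0.5, 0.05, 0.1, 10.0)
print(p0, p5, p10)
```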

Option prices also take a (relatively) simple form, depending on the
cumulative distribution functions of non-central chi-square distributions

Ct(τ, K; T) = Pt(T) χ²(d1; ν1, ν2) − K Pt(τ) χ²(d2; ν1, ν3)

with

d1 = 2r* [φ1 + φ2 + D(τ; T)],   d2 = 2r* [φ1 + φ2]

ν1 = 4θr̄/σ²,   ν2 = 2φ1² rt exp{γ(τ − t)} / (φ1 + φ2 + D(τ; T)),   ν3 = 2φ1² rt exp{γ(τ − t)} / (φ1 + φ2)

φ1 = 2γ / ( σ² (exp{γ(τ − t)} − 1) ),   φ2 = (θ + γ)/σ²

γ = √(θ² + 2σ²),   r* = (log K − C(τ; T)) / D(τ; T)

7.5 MODELS WITH TIME VARYING PARAMETERS


The one factor models we described above have a finite number of parameters.
Although some models can give flexible yield curve shapes, and conform to the
stylized facts (for example CIR), they cannot match the observed yield curve

exactly. The problem is of course that a large (or infinite if we decide to interpo-
late) number of bonds have to be matched using a finite number of parameters
and a given parametric form: the distance between model and observed prices
can be minimized but not set to zero.
This means that for (practically) all maturities the bonds will be mispriced.
This might not appear to be a critical drawback, as one is not required to trade
at the model price. It becomes a more serious flaw when one considers derivatives,
where small discrepancies will be magnified, and in fact arbitrage opportunities
will emerge.
Models with time varying parameters were set up to capture any initial
yield curve, and are therefore at least arbitrage-free when it comes to pricing
fixed income derivatives. Such models assume that one or more parameters
are deterministic functions of time, carefully chosen in a way that ensures
that the term structure is perfectly replicated. Therefore a standard input
for such models is the current observed yield curve. State-of-the-art
variants can also be calibrated on implied volatility curves (from caplets, caps or
swaption prices).

THE HO-LEE MODEL


The first one-factor model with time varying parameters proposed in the literature
was that of Ho and Lee (1986). The underlying assumption is that the short rate of interest
follows a simple random walk with drift

drt = θt dt + σdBt

The drift t 7→ θt is a deterministic function of time. In particular, the bond prices


will satisfy
Pt(T) = Et exp{ − ∫_t^T rs ds }
      = Et exp{ − ∫_t^T ( rt + ∫_t^s θu du + σ ∫_t^s dBu ) ds }

Changing the order of integration we conclude that

Pt(T) = Et exp{ −(T − t) rt − ∫_t^T ∫_t^s θu du ds − σ ∫_t^T ∫_t^s dBu ds }
      = Et exp{ −(T − t) rt − ∫_t^T ∫_u^T θu ds du − σ ∫_t^T ∫_u^T ds dBu }
      = Et exp{ −(T − t) rt − ∫_t^T (T − u) θu du − σ ∫_t^T (T − u) dBu }

The last integral is actually a normally distributed random variable, following


Itō’s isometry

σ ∫_t^T (T − u) dBu ∼ N ( 0, σ² ∫_t^T (T − u)² du ) = N ( 0, (σ²/3)(T − t)³ )

Therefore the expectation can be computed in closed form, implying a yield


to maturity
Pt(T) = exp{ −(T − t) rt − ∫_t^T (T − u) θu du + (σ²/6)(T − t)³ }

⇒ yt(T) = rt + ∫_t^T ((T − u)/(T − t)) θu du − (σ²/6)(T − t)²

The above expression in fact is the functional form for the yield curve, if the time
varying drift functional θt was known. It turns out that it is more convenient
to use the forward curve instead. In particular, applying the Leibniz rule for
differentiation yields

ft(T) = − ∂ log Pt(T)/∂T = rt + ∫_t^T θu du − (σ²/2)(T − t)²

If we differentiate the forward curve with respect to maturity we can achieve an


expression for θt:

θT = ∂ft(T)/∂T + σ²(T − t)
Knowing the drift functional can lead to prices for bonds and bond options
that are easy to compute. Using the above relationship we can write
Pt(T) = exp{ −(T − t) rt − ∫_t^T ft(s) ds − (σ²/3)(T − t)³ }

As bond prices are lognormally distributed, options on these bonds can be priced
using a formula that is analogous to the Black-Scholes one.
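A useful consistency check of the drift inversion: if today's forward curve is linear, ft(T) = a + b(T − t), then θu = b + σ²(u − t), and substituting back into the bond price expression must recover Pt(T) = exp{−∫ ft(s)ds}. A Python sketch (the linear curve and all parameter values are arbitrary), using the variance adjustment σ²(T − t)³/6 that comes out of the lognormal expectation:

```python
import math

a, b, sigma = 0.03, 0.002, 0.01   # illustrative forward curve f(T) = a + b*T
t, T = 0.0, 5.0
tau = T - t

# theta recovered from the forward curve: theta_u = f'(u) + sigma^2 (u - t)
def theta(u):
    return b + sigma**2 * (u - t)

# integral of (T - u) * theta(u) over [t, T] by midpoint quadrature
n = 100000
h = tau / n
integral = sum((T - (t + (i + 0.5) * h)) * theta(t + (i + 0.5) * h) * h
               for i in range(n))

rt = a  # the short rate equals the time-t forward, f_t(t)
price_model = math.exp(-tau * rt - integral + sigma**2 * tau**3 / 6)
price_curve = math.exp(-(a * tau + b * tau**2 / 2))  # exp(-int f)
print(price_model, price_curve)   # the two prices should coincide
```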

THE HULL-WHITE MODEL


The breakthrough of the Ho-Lee model was that it provided a structure where
the observed yield curve is perfectly matched, not allowing for arbitrage op-
portunities between model and observed prices. Having said that, it has two
significant drawbacks, as there is no mean reversion present and the normality
assumption allows negative nominal interest rates. In particular, not exhibiting
mean reversion means that the distribution for the short rate widens with the
time horizon, and the probability of negative rates increases. Hull and White
(1990) take the Ho-Lee model one step further, and construct a model that ex-
hibits mean reversion in the spirit of the Vasicek framework. For that reason the
Hull-White model is also known as the extended Vasicek model.
The short rate is given by

drt = (θt − αrt ) dt + σdBt

Now the short rate will revert towards θt /α, with t 7→ θt a deterministic function
of time. Although negative rates are permitted, in many cases the presence of
mean reversion ensures that their probabilities are fairly small.
Using exactly the same arguments as the ones in the Ho-Lee case, we can
solve for the functional θT in terms of the forward curve ft (T )
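Carrying out the same computation as in the Ho-Lee case, the drift functional that reproduces the observed forward curve is the standard Hull and White (1990) result (stated here for constant α and σ):

θT = ∂ft(T)/∂T + α ft(T) + (σ²/(2α)) (1 − exp{−2α(T − t)})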

INTEREST RATE TREES


Models with time varying parameters give bond and bond option prices that are
expressed as integrals of the forward curve. In practice, such models and their
extensions are implemented through trees. In particular, the seminal papers of
Hull and White (1994, 1996, henceforth HW) show how one can produce trinomial
trees that will approximate a generic model of the form

dξt = (θt − αt ξt) dt + σt dBt
rt = φt(ξt)

The state variable ξt follows a generalized Ornstein-Uhlenbeck process. The


mean reversion level, the speed of mean reversion and the volatility are allowed
to be deterministic functions of time. The short rate process is given as transfor-
mation of this state variable. Typical transformations are the identity φ t (ξ) = ξ
and the exponential φt (ξ) = exp{ξ}.

CALIBRATION OF INTEREST RATE TREES


The calibration of an interest rate tree using the HW methodology is carried
out in two stages, first building an auxiliary tree and then adjusting it to match
the observed yield curve.

The first stage


In the first stage a trinomial tree is built that reverts to zero, approximating the
diffusion
dζt = −αt ζt dt + σt dBt
As an example we will assume that the mean reversion parameter and the
volatility are constant, but extensions are straightforward, if one wishes to render
them time-varying.
The tree that approximates the process ζt is constructed recursively. Let us
assume that the tree has been constructed up to time t, and denote its
discretized values with ζ̄i, for i = −m̄, . . . , m̄. We will show how to select the nodes
and the transition probabilities that will grow this tree to time t + ∆t. The first
step is to select the grid spacing across the state space, for which HW suggest

∆ζ = σ √(3∆t)


LISTING 7.3: Create Hull-White trees for the short rate. [Matlab code garbled in this copy.]
We will also assume that the discretization across time is done in equal time
steps, and therefore the space step ∆ζ is also the same through time. The
implementation in listing 7.3 relaxes all these assumptions and constructs a tree
with time varying αt and σt, and also allows for variable time steps.
We then construct the grid at time t, which extends across 2m + 1 elements
(the choice of m will be discussed shortly):

ζ̄ = { i∆ζ : i = −m, . . . , 0, . . . , m }

Typically, from the point ζ̄i = i∆ζ the process can move to the nodes {ζ̄i+1, ζ̄i, ζ̄i−1}.
Then, one can solve a system that matches the instantaneous drift and volatility
for the probabilities {p+, p0, p−}

p+ ∆ζ − p− ∆ζ = −αi∆ζ∆t
p+ (∆ζ)² + p− (∆ζ)² = σ²∆t + α²i²(∆ζ)²(∆t)²
p+ + p0 + p− = 1

The solution sets

p+ = 1/6 + (1/2) αi∆t (αi∆t − 1)
p0 = 2/3 − α²i²(∆t)²
p− = 1/6 + (1/2) αi∆t (αi∆t + 1)
for all i = −m, . . . , m. If all these probabilities are positive, then the tree will
grow and will have 2(m + 1) + 1 elements in the next time period.
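These branch probabilities can be verified mechanically: they must sum to one and reproduce the conditional mean and second moment of the discretized process. A Python sketch (the values of α, σ and ∆t are arbitrary):

```python
import math

alpha, sigma, dt = 0.1, 0.01, 0.25   # illustrative values
dz = sigma * math.sqrt(3 * dt)       # grid spacing suggested by HW

def standard_probs(i):
    """Branch probabilities for the standard (up/mid/down) geometry
    at node i, matching drift -alpha*i*dz*dt and variance sigma^2*dt."""
    M = alpha * i * dt
    p_up = 1/6 + 0.5 * M * (M - 1)
    p_mid = 2/3 - M**2
    p_down = 1/6 + 0.5 * M * (M + 1)
    return p_up, p_mid, p_down

for i in (-3, 0, 3):
    pu, pm, pd = standard_probs(i)
    # the three moment conditions: total probability, drift, second moment
    print(i, pu + pm + pd,
          (pu - pd) * dz, -alpha * i * dz * dt,
          (pu + pd) * dz**2, sigma**2 * dt + (alpha * i * dz * dt)**2)
```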
Encountering negative probabilities indicates that the mean reversion of the
tree is too strong at these nodes for this particular transition structure. The
geometry of the tree will then change, and the tree will stop growing. For example,
a negative probability at the upper end of the grid indicates that the mean reversion
is pulling the process towards zero quite strongly, and we therefore have to change
the geometry of the tree and consider transitions towards the nodes {ζ̄i, ζ̄i−1, ζ̄i−2}.
Of course, due to symmetry we will encounter the same problem at the other end of
the grid, suggesting transitions towards the nodes {ζ̄i+2, ζ̄i+1, ζ̄i}.
Solving for these alternative transition geometries yields

p0 = 7/6 + (1/2) αi∆t (αi∆t − 3)
p− = −1/3 − αi∆t (αi∆t − 2)
p−− = 1/6 + (1/2) αi∆t (αi∆t − 1)
and


p++ = 1/6 + (1/2) αi∆t (αi∆t + 1)
p+ = −1/3 − αi∆t (αi∆t + 2)
p0 = 7/6 + (1/2) αi∆t (αi∆t + 3)
As we noted, in such cases the tree will not grow and the next set of nodes will
also have 2m + 1 elements. The top half of listing 7.3 implements this method for a
more general setting, where the time steps, the mean reversion and the volatility
are all time varying. It makes sense to select the value of m as the first node index
for which the transition geometry changes from the standard one to the ones that
force mean reversion; HW suggest the smallest integer m that satisfies

m α∆t > 0.184

The second stage


In the second stage the nodes of the tree that replicates the process ζ t are
shifted up or down in order to match the dynamics that price bonds exactly.
This creates the trinomial tree that will be approximating ξt . To this end, when
calibrating a HW tree we make use of the so called Arrow-Debreu (AD) state
prices, which we define now.
An Arrow-Debreu security is a generic contingent claim that will pay one
monetary unit if a certain event is realized at a particular point in time. Oth-
erwise the AD security pays nothing. The AD state price is the price of this
security. It is easy to see that AD securities can be used as building blocks to
construct more complex payoffs.
In models with a continuous state space, the Arrow-Debreu security is a
European style contract that pays off the Dirac delta function on its maturity.
In the context of HW trees the state space is discretized, as at time T the short
rate can take one out of KT possible values. Then, an AD security will pay one
pound if the short rate is at its k-th value at time T , and zero otherwise. We
denote with Qt (k, T ) the price of this AD security at time zero, the AD state
price.
One can readily observe that if we purchase all Arrow-Debreu securities that
mature at time T , then we are sure to receive one pound on that date. Effectively
we have constructed the payoff of the risk free bond. Arbitrage arguments will
then indicate that the sum of all AD state prices across states will equal the
price of a zero coupon bond.
Σ_{k=1}^{K_T} Qt(k, T) = Pt(T)

We can also construct an inductive relationship that links AD securities with


successive maturities T and T +1. In particular, like any other security, under the

risk neutral probability measure, discounted AD securities will form martingales.
Suppose that the tree can take one of Kt different values at time t, and denote
with ct the actual level of the tree at that time, with ct ∈ {1, 2, . . . , Kt }. Then
we can write
   
Qt(k, T) = EQt[QT(k, T)] = EQt[ I(cT = k)/BT ] = EQt[ 1/BT | cT = k ] PQt[cT = k]
BT BT

Conditioning on the state at time T − 1, and using the definition of the bank
account process, allows us to expand the conditional expectation as
 
EQt[ 1/BT | cT = k ]
 = Σ_{j=1}^{K_{T−1}} EQt[ 1/(B_{T−1} e^{r_{T−1}∆t}) | cT = k, c_{T−1} = j ] PQt[c_{T−1} = j | cT = k]
 = Σ_{j=1}^{K_{T−1}} e^{−r_j ∆t} EQt[ 1/B_{T−1} | cT = k, c_{T−1} = j ] PQt[c_{T−1} = j | cT = k]
 = Σ_{j=1}^{K_{T−1}} e^{−r_j ∆t} EQt[ 1/B_{T−1} | c_{T−1} = j ] PQt[c_{T−1} = j | cT = k]

Bayes' rule will provide us with

PQt[c_{T−1} = j | cT = k] = PQt[cT = k | c_{T−1} = j] · PQt[c_{T−1} = j] / PQt[cT = k]

The quantity pt(j, k) = PQt[cT = k | c_{T−1} = j] is just the (risk neutral) transition
probability of moving from state j to state k at time t. The AD state price is
then simplified to

Qt(k, T) = Σ_{j=1}^{K_{T−1}} e^{−r_j ∆t} pt(j, k) EQt[ 1/B_{T−1} | c_{T−1} = j ] PQt[c_{T−1} = j]
         = Σ_{j=1}^{K_{T−1}} e^{−r_j ∆t} pt(j, k) Qt(j, T − 1)

In the above expression rj = φ(ξj), with ξj = ζj + π, where π is the shift applied to the nodes of that time step.


Essentially, in order to fit the observed yield curve one has to solve numer-
ically for the value of π at each maturity horizon T . If one also renders the
volatility and/or the speed of mean reversion time varying, then the parameters
have to be calibrated on a richer set of data that will identify these parame-
ter values. Typically, deterministic volatility functions are chosen as to match
implied volatilities that are derived from caps or swaptions.
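The forward induction above works on any discrete chain. The Python sketch below (a hypothetical two-state short rate with made-up rates and transition matrix, not a calibrated HW tree) propagates the AD state prices for three periods and checks that their sum matches the discount factor obtained by brute-force path enumeration:

```python
import math
from itertools import product

dt = 1.0
rates = [0.02, 0.06]                 # hypothetical short rate states
p = [[0.7, 0.3], [0.4, 0.6]]         # hypothetical transition matrix
q0 = [1.0, 0.0]                      # start in state 0 with certainty

def ad_step(q):
    """One forward-induction step:
    Q(k,T) = sum_j exp(-r_j dt) p(j,k) Q(j,T-1)."""
    return [sum(math.exp(-rates[j] * dt) * p[j][k] * q[j]
                for j in range(2)) for k in range(2)]

q = q0
for _ in range(3):
    q = ad_step(q)
bond_ad = sum(q)                     # price of a 3-step zero coupon bond

# Brute force: enumerate all paths c0=0, c1, c2, discounting r at each step
bond_bf = 0.0
for c1, c2 in product(range(2), repeat=2):
    prob = p[0][c1] * p[c1][c2]
    disc = math.exp(-(rates[0] + rates[c1] + rates[c2]) * dt)
    bond_bf += prob * disc
print(bond_ad, bond_bf)   # the two bond prices should coincide
```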


LISTING 7.4: Compute the price path of a payoff based on a Hull-White tree for the short rate. [Matlab code garbled in this copy.]

Overall, the construction of an interest rate tree resembles the local volatility
models for equity derivatives. In both frameworks we attempt to exactly replicate
a market implied curve or surface. One has to keep in mind the dangers of over-
fitting, which would introduce spurious qualities into the model. In many cases
market quotes of illiquid instruments can severely distort the model behavior.

Pricing and price paths


After the tree has been fitted to the yield curve, we can proceed to pricing
various interest rate sensitive instruments, such as bond options, interest rate
caps, floors, swaps or swaptions. Essentially we can find the fair value of a given
stream of contingent cashflows, in a way that is consistent with the prices of
risk free bonds.
If there are no early exercise features, prices of contingent claims can be
computed by summing up the corresponding AD security prices. In many cases
we are not only interested in the fair value of the contract, but also in its price
path. For example, in order to find the fair value of a put option with three year
maturity, which is written on a ten-year bond, we need to consider the price
paths of the ten-year bond, in order to ascertain the option payoffs. Price paths
can be easily computed by iterating backwards through the tree, starting from
the terminal date. Listing 7.4 shows how this can be easily implemented. To
allow for early exercise one just has to check if early exercise is optimal at each
tree node (implemented in listing 7.5).
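The backward pass with an early exercise check can be sketched in a few lines. The following Python example uses a stripped-down binomial tree with made-up parameters (it is not the HW tree of listing 7.5), but the max(continuation, intrinsic) comparison at each node is exactly the ingredient that listing 7.5 adds:

```python
import math

# Hypothetical binomial tree for a generic underlying (made-up parameters)
r, dt, u, d, q = 0.03, 0.5, 1.1, 0.9, 0.55
steps, s0, strike = 4, 100.0, 100.0

def put_price(american):
    """Backward induction for a put; j counts the number of up moves."""
    disc = math.exp(-r * dt)
    values = [max(strike - s0 * u**j * d**(steps - j), 0.0)
              for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):
        new = []
        for j in range(n + 1):
            cont = disc * (q * values[j + 1] + (1 - q) * values[j])
            if american:
                spot = s0 * u**j * d**(n - j)
                cont = max(cont, strike - spot)   # early exercise check
            new.append(cont)
        values = new
    return values[0]

print(put_price(False), put_price(True))
```

By construction the American value can never fall below the European one, since the early exercise check only ever raises node values.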


LISTING 7.5: The price path of a payoff based on the Hull-White tree when American features are present. [Matlab code garbled in this copy.]

THE BLACK-KARASINSKI MODEL


The most popular special case of this very general specification is the Black and
Karasinski (1991) model, which is in spirit similar to the exponential Vasicek
model with time varying parameters. Here

dξt = α(θt − ξt) dt + σdBt
rt = exp{ξt}

This specification exhibits mean reversion, and through the exponential trans-
formation ensures that the short rate remains positive. As with all lognormal
models, the Black-Karasinski model implies an explosive expectation for the
bank account, but since in practice the implementation is done over a finite tree,
this drawback is not severe.


LISTING 7.6: Implementation of the Black-Karasinski model using a Hull-White interest rate tree. [Matlab code garbled in this copy.]

FIGURE 7.5: Calibration of a tree for the short rate that implements the Black-Karasinski model. The Hull-White framework is implemented. (a) yield curve; (b) short rate tree.


FIGURE 7.6: Price path for a ten year 5.50% coupon bearing bond. The price
paths are consistent with the yield curve of figure (7.5), modeled using the
Black-Karasinski process.

(a) ten year bond; (b) initial two year period

FIGURE 7.7: Price path for a two year put option, written on the ten year coupon
bearing bond of figure (7.6). The strike price is set at $80.

(a) European style; (b) American style

Listing 7.6 shows how the HW tree building methodology is applied in the
Black-Karasinski case. The yield curve of figure (7.5.a) is assumed, and a HW tree
is constructed. We use ∆t = 1/16 over the first three years, ∆t = 1/8 from
the third to the tenth, and ∆t = 1/2 for the remaining twenty years. A view of
the resulting interest rate tree is given in figure (7.5.b), where this uneven time
discretization is apparent.
Of course, to value such a simple bond we do not need to construct the
complete price path, and in fact we do not need to construct a HW tree at
all. The fair price can be determined by using the yield curve alone, just by
discounting all cashflows. The price path is needed though if we want to value
an option on this ten year bond.
As an example we consider a two year put, with strike price K = $80. To
price this option we need the distribution of the bond price after two years.
Figure (7.6.b) gives the possible bond prices and the corresponding price paths
for the two year period. Essentially, the put option gives us the right to sell
the bond at the strike price if the interest rates after two years are too high.
The price paths for a European and an American version are illustrated in figure
(7.7); the corresponding prices are PE = $3.08 and PA = $3.46, indicating an
early exercise premium of $0.38, which is actually more than 10% of the option
price. The red points (in figures 7.6.b and 7.7.b) indicate the scenarios where
early exercise is optimal. We can observe how the coupon payments affect the
exercise boundary, as we would prefer to exercise immediately after the coupon
payment is realized.

CALIBRATION ISSUES
All models with time varying parameters can be cast in a binomial/trinomial form
that approximates the short rate movements. Nevertheless, although the tree will


always adjust to match the current yield curve, following the HW procedure,
there are still parameters to be identified.
As an example, in the Black-Karasinski examples above we assumed α = 0.25
and σ = 0.20, but without any justification of these values. There are two options
in setting values for such free parameters, based on historical yield curves or
based on derivative prices.
Given a set of historical yield curves, one can produce estimates for the
speed of mean reversion and the volatility parameters. This can be done by
proxying the historical (unobserved) spot rate with a yield of a relatively short
maturity, and then applying maximum likelihood estimation methods. Of course
one has to keep in mind that such an exercise is carried out under the physical
probability measure, which might or might not have an impact, depending on the
exact parameterization of the price of risk. Even better, one can maintain the spot
rate unobservable, and use Kalman filtering techniques that draw information
from the whole yield curve. Such an approach allows one to jointly recover the
parameters under both physical and risk adjusted measures.
If derivative prices on interest rate sensitive instruments are available and
liquid, then their prices can be used to also provide estimates for the fixed
parameters. This is typically done by minimizing the squared price differences,
or the differences between model and actual implied volatilities. This seems to
be the method of choice amongst practitioners, but care must be taken to avoid
pitfalls.
In spirit, the second approach is very similar to the standard calibration of
stochastic volatility models to derivative prices, and is also subject to the im-
plementation difficulties that are associated with such models. In particular, one
main obstacle is model identification, where different sets of parameters produce
the same optimal objective function.
Typically, a model will be calibrated on a set of interest rate caps and
swaptions, which are instruments that are sensitive to the volatility of the interest
rate. In virtually all short rate models the terminal volatility is the outcome of
the two quantities we want to retrieve, namely the speed of mean reversion and
the volatility of the innovations. Decreasing the speed has more or less the same
effect as increasing the volatility, and calibrating these quantities is not a well
identified problem. As in the stochastic volatility example, there is a locus of
parameter pairs that produce the same optimal fit, and we have no information
to distinguish between them. Surprisingly many practitioners choose to ignore
this issue, selecting the first set of points that their numerical optimizer returns.
But this can be the source of severe mispricing of other, more exotic, contracts
that will be valued using the calibrated parameters.
In an ideal world, one would get around this issue by also calibrating to
derivatives that are sensitive to future transition densities, such as forward start-
ing swaptions, but unfortunately such contracts are generally not available, and
very illiquid when they exist. Another option is to use the same regularization
techniques that we outlined in the stochastic volatility case, where the prior pa-

rameters can be selected as the maximum likelihood estimates of the historical
ones.
Overall, one has to remember that calibration is more of an art than science
(although well disguised as science).

7.6 MULTI-FACTOR MODELS


Short rate models are also known as one factor models, as there is only one
source of uncertainty in the economy, which is represented by the short rate.
Although short rate models are very useful for some applications, they are not
sufficient for others. In particular, the presence of a single factor implies that
yields of all maturities are perfectly correlated (albeit with different volatilities).
This is easily illustrated in the case of the exponentially affine models, like
Vasicek or CIR, where the yield is a linear function of the current short rate.
Yields for two different maturities τ1 and τ2 , and their dynamics, are given by

yt (t + τi ) = A(t; t + τi ) + B(t; t + τi ) rt ⇒ dyt (t + τi ) = B(t; t + τi ) drt

TABLE 7.1: Correlations of yields for different maturities. Bonds with longer matu-
rities exhibit relatively higher correlation. Listing 7.7 gives the relevant Matlab
code.

        1m    3m    6m    1y    2y    3y    5y    7y    10y   20y
1m     1.00  0.56  0.40  0.28  0.21  0.19  0.18  0.16  0.15  0.10
3m           1.00  0.76  0.55  0.42  0.40  0.36  0.32  0.30  0.24
6m                 1.00  0.85  0.68  0.64  0.59  0.55  0.52  0.44
1y                       1.00  0.87  0.83  0.78  0.74  0.71  0.64
2y                             1.00  0.97  0.92  0.88  0.85  0.77
3y                                   1.00  0.96  0.93  0.89  0.83
5y                                         1.00  0.98  0.96  0.90
7y                                               1.00  0.98  0.94
10y                                                    1.00  0.96
20y                                                          1.00

Therefore, the correlation of the two yields is

Et{dyt(t + τ1) dyt(t + τ2)} / √( Et[dyt(t + τ1)]² · Et[dyt(t + τ2)]² )
  = B(t; t + τ1) B(t; t + τ2) / √( B²(t; t + τ1) · B²(t; t + τ2) ) = 1

In practice, this correlation might be high, but is not perfect as the one fac-
tor family suggests. For example, table 7.1 presents the historical correlation
of various bonds with different maturities. Although the correlation is positive


across the board, its magnitude varies substantially at different horizons. In
particular, the long end of the yield curve is much more strongly correlated than
the short end: the ten and twenty year bonds move pretty much in unison, with
correlation over 95%, while the one and three month rates exhibit about half of
this dependence. Also, each maturity exhibits correlations that decay as we
consider bonds with increasing maturity differences. For example, the two year
bond is more strongly correlated with the three year rather than the seven year
instruments.
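This perfect co-movement is easy to check numerically. The sketch below simulates a Vasicek short rate and confirms that changes of two yields are perfectly correlated; it is written in Python with made-up parameter values (the text's own listings use Matlab), so the names and numbers are purely illustrative.

```python
import math, random

random.seed(42)

# illustrative Vasicek parameters (made-up, not calibrated)
theta, xbar, sigma, dt = 0.2, 0.05, 0.01, 1.0/252

def B(tau):
    # loading of the yield of maturity tau on the short rate
    return (1 - math.exp(-theta*tau)) / (theta*tau)

# simulate the short rate; the A(t; t+tau) terms drop out of yield *changes*
r, y1, y2 = 0.05, [], []
for _ in range(10000):
    r += theta*(xbar - r)*dt + sigma*math.sqrt(dt)*random.gauss(0, 1)
    y1.append(B(1.0)*r)
    y2.append(B(10.0)*r)

dy1 = [b - a for a, b in zip(y1, y1[1:])]
dy2 = [b - a for a, b in zip(y2, y2[1:])]

def corr(u, v):
    n = len(u)
    mu, mv = sum(u)/n, sum(v)/n
    cuv = sum((a-mu)*(b-mv) for a, b in zip(u, v))
    cu = sum((a-mu)**2 for a in u)
    cv = sum((b-mv)**2 for b in v)
    return cuv / math.sqrt(cu*cv)

print(round(corr(dy1, dy2), 6))   # 1.0 -- yields of all maturities co-move
```

Both yields are linear in rt, so their changes are exact multiples of drt and the sample correlation is one regardless of the simulated path.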
One way of increasing the number of free parameters is by considering multi-
factor models. For example, we can consider the interest rate to be the sum of
two simple Vasicek processes, by setting
rt = xt(1) + xt(2),  where

dxt(j) = θ(j) ( x̄(j) − xt(j) ) dt + σ(j) dBt(j)
The two processes xt(1) and xt(2) are called factors, and are in principle
unobserved. In the general specification we can also assume the factors to exhibit
some correlation ρ. If we maintain that our model is affine, then we can postulate
that the yield is again a linear combination

yt(t+τi) = A(t; t+τi) + B1(t; t+τi) xt(1) + B2(t; t+τi) xt(2)
⇒ dyt(t+τi) = B1(t; t+τi) dxt(1) + B2(t; t+τi) dxt(2)

After some extremely boring calculations we can derive the correlation of the
changes of yields with maturities τ1 and τ2, as

ρ(τ1, τ2) = ± √( 1 − [ (1−ρ²)(B11 B22 − B21 B12)² σ1² σ2² ] /
    [ (B11² σ1² + B21² σ2² + 2ρ B11 B21 σ1 σ2)(B12² σ1² + B22² σ2² + 2ρ B12 B22 σ1 σ2) ] )

We use the shorthand notation Bij = Bi (t; t + τj ). The sign of the correlation is
positive if

ρ > − ( B11 B12 σ1² + B21 B22 σ2² ) / ( (B11 B22 + B21 B12) σ1 σ2 )
Therefore from the expression above we can assess the implied correlation
of special cases. In particular, setting ρ = 1 will render ρ(τ1 , τ2 ) = 1 as the two
factors are driven by the same Brownian motion. Specifying uncorrelated factors
will always produce positive correlation across the maturities. Large negative
correlations produce a peculiar effect: as we set ρ = −1 we observe that the
yield correlation will be either perfectly positive or perfectly negative, depending
on the maturity pair.5
5
We mention this peculiarity because it is fairly common for a correlation of ρ = −1 to
be calibrated from cap or swaption prices. It is very unlikely that such a value reflects
the interest rate dynamics, and it is a feature that is more likely to point towards more
complex dynamics for the rate and its volatility that go beyond the affine setting.
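The special cases are easy to verify numerically. The sketch below (Python, with made-up parameter values) computes the correlation of dyt(t+τ1) and dyt(t+τ2) directly from the loadings Bij = Bi(t; t+τj) and the factor covariance, and confirms that ρ = 1 gives perfectly correlated yields while uncorrelated factors give a positive correlation below one.

```python
import math

def B(theta, tau):
    # loading of a factor with mean reversion theta on the yield of maturity tau
    return (1 - math.exp(-theta * tau)) / (theta * tau)

def yield_corr(tau1, tau2, th1, th2, s1, s2, rho):
    # correlation of dy(tau1) and dy(tau2), computed from the covariance
    # of the two factor shocks rather than from the closed form
    B11, B21 = B(th1, tau1), B(th2, tau1)
    B12, B22 = B(th1, tau2), B(th2, tau2)
    cov = B11*B12*s1**2 + B21*B22*s2**2 + rho*s1*s2*(B11*B22 + B21*B12)
    v1  = B11**2*s1**2 + B21**2*s2**2 + 2*rho*B11*B21*s1*s2
    v2  = B12**2*s1**2 + B22**2*s2**2 + 2*rho*B12*B22*s1*s2
    return cov / math.sqrt(v1 * v2)

print(round(yield_corr(1, 5, 0.05, 0.2, 0.01, 0.03, 1.0), 6))  # 1.0
print(0 < yield_corr(1, 5, 0.05, 0.2, 0.01, 0.03, 0.0) < 1)    # True
```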

Postulating the short rate to be the sum of factors is not the only way to
construct multi-factor models. For example, in an early article Brennan and
Schwartz (1982) consider a model where the long run interest rate is also
stochastic, serving as the second factor. Intuitively, long swings in the short
rate are determined by the latter process, which exhibits weak mean reversion
and is largely a proxy for the business cycle; it determines the behavior of the
long end of the yield curve, while the short end is determined by the process
that reverts faster towards the long run short rate process. In another paper,
Longstaff and Schwartz (1992) propose a model where the second factor is the
volatility of the short rate. The mean reverting nature of the short rate implies
that stochastic volatility will have a higher impact on the short end of the curve.
In order to achieve a perfect match to a given yield curve Brigo and Mercurio
(2001) describe a method that enhances the multi-factor specification, by adding
a deterministic function of time φ(t)
rt = xt(1) + xt(2) + φ(t)
Brigo and Mercurio (2001) show how one can retrieve the deterministic function
φ(t) for a variety of processes, including multi-factor Vasicek and CIR.

FACTORS AND PRINCIPAL COMPONENT ANALYSIS


The historical relative moves of yields for various maturities also provide mo-
tivation for using multi-factor specifications. In particular, although yields are
strongly correlated, they don’t always move in the same direction. In fact, there
are periods when the yield curve remains relatively flat, and episodes of steeply
rising yield curves. A single-factor model is not adequate to reproduce such pat-
terns, as it will always generate yields that move together across all maturities.
Principal Component Analysis (PCA) techniques can be employed to explore
the variability and correlations of various yields. PCA has as input a set of N
correlated series, and decomposes them into N uncorrelated components, which
are called the factors. In order to do so, the covariance structure is computed and
its eigen-structure is produced. The eigenvectors with the largest eigenvalues
point towards the most important factors, and can be utilized to investigate which
proportion of the variability of the original series is explained by individual
factors. Typically, one looks for a set of factors that will explain 90-95% of the
total variability.
Also, yields are strongly autocorrelated through time. It then makes perfect
sense to work with the time-differenced series: in essence the factors will then


explain changes in the yield curve behavior through time, rather than the yield
curve level. We will denote with y(t; τ) the yield of bond with maturity τ, recorded
at time t = 1, 2, . . ..
Therefore, each yield change ∆yj = y(t; τj) − y(t−1; τj) is written as the
weighted sum of n factors

∆yj = cj + ℓj,1 f1 + · · · + ℓj,i fi + · · · + ℓj,n fn

The coefficients ℓj,i are called factor loadings, and essentially determine the
sensitivity of yield yj to factor fi, and cj is a constant, cj = E∆yj. If we assume
that the factors are uncorrelated, and they are normalized with zero mean and
unit variance, then we can write the covariance of different yields as

Cov(∆yj, ∆yk) = Σm=1…n ℓj,m ℓk,m

Therefore, if we denote with L the matrix that collects all factor loadings, then
the covariance matrix Σ of the yield changes will be equal to

Σ = L L′

Given that the covariance matrix is not singular, an eigen-decomposition will
produce a matrix V with the linearly independent eigenvectors, together with a
diagonal matrix of eigenvalues M, such that

Σ = V M V⁻¹

But as Σ is symmetric, the eigenvectors form an orthonormal matrix, V⁻¹ = V′,
which implies essentially that the factor loadings matrix can be expressed as

L = V √M

Using this representation we can write the yield changes in terms of the
elements of the eigenvector matrix V and the eigenvalues in M

∆yj = cj + vj,1 √m1 f1 + · · · + vj,i √mi fi + · · · + vj,n √mn fn

It is therefore intuitive that factors that are associated with higher eigenvalues
will contribute more to the total variability of the series. In particular, if we
consider the overall variance of all yields, then we can write it as the sum of all
eigenvalues

Σj=1…n Var(∆yj) = Σj=1…n Σi=1…n v²j,i mi = Σi=1…n mi

where the last equality follows from the fact that the eigenvectors are normalized
to unit length. In factor analysis we select to use only the largest n̄ eigenvalues,


LISTING 7.7: Correlation structure and principal component analysis of yield
curve movements.

% correlation structure and PCA of yield curve movements
mtrx  = [1 3 6 12 24 36 60 84 120 240]/12;  % maturities
data  = xlsread('rates.xls');               % fetch the data
dates = x2mdate(data(:,1));
rate  = data(:,2:end);
drate = diff(rate);
% data surface
figure(1);
surf(mtrx, dates, rate);
datetick('y', 10);
% correlation structure of interest rate changes
CR = corrcoef(drate);
disp('Correlation Structure:');
for ix = 1:length(mtrx)
    fprintf('%5.2f ', CR(:,ix));
    fprintf('\n');
end
% principal component analysis
[CF, SC, LT] = princomp(drate);
fprintf('\nEigenvalue Decomposition\n');
fprintf('factor eigvl cumul\n');
fprintf('%2d %5.1f %5.1f\n', [1:length(mtrx); ...
    100*LT'/sum(LT); 100*cumsum(LT')/sum(LT)]);
figure(2);
CF = CF(:,1:3);
SC = SC(:,1:3);
plot(CF, '-o', 'LineWidth', 1);
grid on;
xlabel('maturity');
ylabel('factor loading');
legend('the first factor', 'the second factor', ...
    'the third factor');

and implicitly approximate the contributions of the rest as being independent
across all maturities

∆yj = cj + vj,1 √m1 f1 + · · · + vj,i √mi fi + · · · + vj,n̄ √mn̄ fn̄ + ηj

The number of factors that we retain should be chosen so as to make the variance
of the remainder component ηj small. As a rule of thumb, n̄ should ensure that at
least 95% of the total variance is explained by the corresponding factors. If one
finds that such a value of n̄ is large compared to the total number of variables,
it is evidence that factor analysis might not be appropriate for this case.
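The eigenvalue bookkeeping above — covariance, eigen-decomposition, loadings L = V√M and explained-variance shares — can be sketched on synthetic data as follows (Python/NumPy with made-up loadings; the chapter's own implementation is the Matlab code of listing 7.7):

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic yield changes driven by three latent factors (made-up loadings)
n_obs, n_fac, n_mat = 5000, 3, 10
L_true = rng.normal(size=(n_mat, n_fac))
dy = rng.normal(size=(n_obs, n_fac)) @ L_true.T

# eigen-decomposition of the covariance matrix
Sigma = np.cov(dy, rowvar=False)
m, V = np.linalg.eigh(Sigma)            # ascending eigenvalues
m, V = m[::-1], V[:, ::-1]              # sort descending

# loadings L = V*sqrt(M) reproduce the covariance: Sigma = L L'
L = V * np.sqrt(np.clip(m, 0.0, None))
print(np.allclose(L @ L.T, Sigma))      # True

# share of total variance explained by the leading factors
share = np.cumsum(m) / m.sum()
print(np.round(share[2], 4))            # three factors explain ~100% here
```

Since the synthetic data have exactly three sources of variation, the first three eigenvalues carry essentially all of the variance; on real yield data the cutoff is the 90–95% rule of thumb above.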


factor 1 2 3 4 5 6 7 8 9 10
eigenvalue (% of sum) 81.7 8.8 4.6 2.2 1.1 0.6 0.3 0.3 0.2 0.2
cumulative (% of sum) 81.7 90.6 95.2 97.3 98.4 98.9 99.3 99.6 99.8 100.0
TABLE 7.2: Relative magnitude of the eigenvalues for the decomposition of the
correlation matrix. The first three factors are responsible for over 95% of the
yield variability.

FIGURE 7.8: Yield curve factor loadings. A principal component analysis is applied
to changes of yields over different maturities. The three factors that correspond
to the level, the slope and the convexity are clearly identified.

The recipe for principal component analysis is illustrated in listing 7.7. The
relative contribution of the j-th factor, together with the cumulative contribution
of the first j factors are given in table 7.2. For example, the third factor explains
4.6% of the variability of interest rate changes, while the first three factors explain
95.2%. We can therefore adopt a three-factor model as an approximation that
explains sufficiently well the yield curve dynamics.
One of the benefits of factor analysis, especially in interest rate modeling,
is the intuition that it can offer. Figure 7.8 plots the factor loadings for the first
three factors, against maturities. One can think of these curves as the impact
on the yield curve of a j-th factor shock, with magnitude √mj. For example, a
shock of the first factor will shift the yields of all maturities in the same direc-
tion. This will essentially move the whole yield curve upwards or downwards,
and for that reason we coin the first factor as the level factor. This of course

is a rough statement, as it is obvious that such shifts will not be parallel, with
longer maturities being more responsive. Nevertheless, we can apply the same
reasoning for the second factor: now a positive factor shock will shift yields of
short maturities6 upwards, and at the same time will shift longer yields down-
wards. Such a shock will “rotate” or change the “slope” of the yield curve, and
for that reason we call it the slope factor. One can now imagine why the third
factor is called the convexity factor, as it affects the “hump” of the yield curve.

KALMAN FILTERING
Principal component analysis of yields is a quick-and-dirty method of isolating
the factors present in yield curve moves, and quantifying their impact. But at the
end of the day it is a statistical technique, with no robust structure behind the
dynamics of the factors. When we take the covariance matrix of yield changes,
we implicitly make an assumption on their dynamics, namely that these changes
are stationary (and therefore yields follow unit root processes).
Kalman filtering techniques can be applied for a substantial class of models,
and in particular the relatively large affine family. As an example we will inves-
tigate an OU factor setup, which we will calibrate on a set of historical yield
curves. As we will now have a complete model to describe the bond yields and
their dynamics, the parameters and the corresponding risk premia can be jointly
recovered.
To be more concrete, assume that, under the physical measure, the factors
are specified as
(j) (j) (j)
dxt = −θ (j) xt dt + σ (j) dBt
That is to say, the factors behave as OU processes with mean reversion level
at zero. The short rate will be given as the sum of these processes plus a
constant term, rt = c + Σj xt(j). We can assume a constant price of risk that
will eventually, after some algebra, contribute to the constant term. We can then
rewrite the risk adjusted short rate as rt = c + λ + Σj xt(j); the risk adjusted
dynamics for the factors are given by the stochastic differential equations
dxt(j) = θ(j) ( λ(j) − xt(j) ) dt + σ(j) dBtQ,(j)

with BtQ,(j) being a Q-Brownian motion.
With the further assumption that these Brownian motions are independent
across the factors, we can write the bond price as the product of Vasicek prices,
having the form
6
Less than three years in this case.

Pt(τ) = EtQ exp{ − ∫t^{t+τ} rs ds }
      = exp{ −(c + λ)τ + Σj C(j)(t; τ) + Σj D(j)(t; τ) xt(j) }

Notice that this is slightly different from the form presented during the discussion
of the Vasicek model: here τ denotes the time to maturity, while there T
denoted the maturity date. We change the notation slightly to economize on
space here.
Therefore, the yields for different maturities are given by

yt(τ) = cτ + Σj C̃(j)(t; τ) + Σj D̃(j)(t; τ) xt(j)

with the functions C̃ and D̃ given below (the superscripts are removed to
further ease the notation)

C̃(t; τ) = [1 − D̃(t; τ)] [θ²λ − σ²/2] / θ² + [σ D̃(t; τ)]² τ / (4θ)

D̃(t; τ) = (1 − exp{−θτ}) / (θτ)
Given that we have a set of observed historical yields at our disposal, we can
set up a state space representation of the multi-factor process, which we can
estimate using the Kalman filter. For the exposition we will present an example
of the two factor case, but generalizations are straightforward.
The transition equations for the factors will take place under the physical
measure. Both factors evolve in continuous time and are unobserved, but given
the OU structure their distribution over a discrete time interval ∆t will be
Gaussian. We can therefore write what will be the transition equations of the
Kalman filter as
xt+∆t(j) = β(j) xt(j) + ηt(j)

with the coefficients given by (once again we discard the superscripts)

β = exp{−θ∆t}
ση² = σ²/(2θ) · (1 − exp{−2θ∆t})
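Because the OU transition is known exactly, this discretization introduces no bias for any ∆t. A quick consistency check (Python, with assumed illustrative parameters): iterating x → βx + η reproduces the stationary variance σ²/(2θ) of the continuous-time process.

```python
import math

# illustrative OU parameters (assumed values)
theta, sigma, dt = 0.25, 0.02, 1.0/12

beta    = math.exp(-theta*dt)
var_eta = sigma**2/(2*theta) * (1 - math.exp(-2*theta*dt))

# iterating x -> beta*x + eta gives stationary variance var_eta/(1 - beta^2),
# which must equal sigma^2/(2*theta) of the continuous-time OU process
stat_var = var_eta / (1 - beta**2)
print(abs(stat_var - sigma**2/(2*theta)) < 1e-12)   # True
```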

The measurement equations are based on the observed yields of different
maturities, and will be linear in the factors. Of course the model prices will
not match the observed historical yields exactly, and error terms need to be
introduced. This will be due to the fact that every model is just an approximation

of reality, and is therefore mis-specified to some extent. We will denote with εt,k
the error term at time t that corresponds to the yield of maturity τk. Thus, we
write

yt(τk) = cτk + Σj C̃(j)(t; τk) + Σj D̃(j)(t; τk) xt(j) + εt,k

A MULTI-FACTOR GAUSSIAN EXAMPLE


We illustrate the use of Kalman filtering with an example that implements the
multi-factor version of the Vasicek model described above. In particular, we sim-
ulate a pair of factors that obey the system of stochastic differential equations
(1) (1) (1)
dxt = −0.05 xt dt + 0.010 dBt
(2) (2) (2)
dxt = −0.20 xt dt + 0.030 dBt

with the two Brownian motions uncorrelated. The prices of risk are assumed
zero, and therefore the dynamics under the risk adjusted probability measure
remain unaltered. The instantaneous rate will be the sum of the two factors, and
the yield curve will be an affine function of them.
Listing 7.8 implements a wrapper that converts the inputs of the Gaussian
N-factor model to the form that is expected by the Kalman filter algorithm of
listing 5.3. To implement the filter, there must be some discrepancy between
model and observed yields, and for that reason we add a Gaussian noise ε to
each yield, with standard deviation σε = 0.1%; otherwise we could solve the yield
curve for the factor values, and there would not be much filtering involved! The
simulated yield curves that serve as the input are given in figure XXX, together
with the simulated and filtered factor paths.
Given a well specified model and the true parameter values, the Kalman
filter does an outstanding job in recovering the factor trajectories. Of course for
the filter to be of any practical relevance, it will have to provide us with decent
parameter estimates as well. We therefore turn to investigating the performance
of the Kalman filter when the parameter set is unknown.
Just like the standard Kalman filter, we will address the estimation problem
with the maximum likelihood approach. But unlike the standard

7.7 FORWARD RATE MODELS


A more recent family of models utilizes the forward curve for modeling purposes,
in contrast to the models described so far, which are driven by the short rate.
Investing over the period (t, T) can be seen as similar to investing first over
(t, S) and then reinvesting over (S, T). Of course we don't know at time t what
interest rate will prevail at time S for a bond that matures at T; therefore the
second strategy is not risk free.


LISTING 7.8: A Kalman filter wrapper for the multi-factor Gaussian model.

% nf_wrapper.m
function [L, X, S] = nf_wrapper(p, N, dT, T, y)
p  = abs(p); NT = length(T); m0 = zeros(N,1);
v0 = m0; b = m0; a = m0; v = m0;
C  = zeros(NT,1); D = zeros(NT,N);
for j = 1:N                % loop over factors
    % allocate parameters
    theta = p(1+4*(j-1)); xbar = p(2+4*(j-1));
    sig   = p(3+4*(j-1)); ris  = p(4+4*(j-1));
    % setup model and get bond yields
    m0(j) = xbar;
    v0(j) = 0.5*sig^2/theta;       % stationary variance
    b(j)  = exp(-theta*dT);        % exact OU discretization
    a(j)  = xbar*(1-b(j));
    v(j)  = v0(j)*(1-b(j)^2);
    for k = 1:NT           % loop over maturities
        t = T(k);
        D(k,j) = (1-exp(-theta*t))/theta/t;
        C(k) = C(k) + ...
            (xbar+ris-0.5*sig^2/theta^2)*(1-D(k,j)) + ...
            0.25*sig^2*t/theta*D(k,j)^2;
    end
end
% construct Kalman filter inputs (fields as in the filter of listing 5.3)
kf.CX = a;  kf.CY = C;  kf.H = D;
kf.AX = diag(b);  kf.Q  = diag(v);
kf.R  = p(end)*eye(NT);
kf.X0 = m0; kf.P0 = diag(v0);
[L, X, S] = kalman_filter(kf, y);

But we can lock in at time t an interest rate which will be applied over the
interval (S, T). This is the forward rate F(t; S, T). As S → T we have the
instantaneous forward rate F(t; T). No arbitrage indicates that the (continuously
compounded) forward rate is

F(t; S, T) = − ( log P(t; T) − log P(t; S) ) / (T − S)
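As a small numerical illustration (with hypothetical bond prices), locking in the forward rate over (S, T) works as follows:

```python
import math

# hypothetical discount bond prices observed today
P_S, S = 0.97, 1.0    # P(t; S), one-year bond
P_T, T = 0.93, 2.0    # P(t; T), two-year bond

# continuously compounded forward rate for the period (S, T)
F = -(math.log(P_T) - math.log(P_S)) / (T - S)
print(round(F, 4))    # 0.0421

# sanity check: discounting the short bond at F over (S, T)
# replicates the long bond
assert abs(P_S * math.exp(-F * (T - S)) - P_T) < 1e-12
```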
Forward rates and bond prices
The instantaneous forward rates are closely linked to bonds.
In the limit, as we split the interval (S, T) into subintervals, we reach



FIGURE 7.9: Yield and one-year forward curves.

log P(t; T) − log P(t; S) = − ∫S^T F(t; s) ds

which yields for S → t

log P(t; T) = − ∫t^T F(t; s) ds
⇒ P(t; T) = exp{ − ∫t^T F(t; s) ds }
⇒ F(t; T) = − ∂ log P(t; T) / ∂T
Even though multi-factor models consider a larger set of parameters to be
calibrated, they are still finite. Therefore a perfect fit to the initial yield curve
cannot be ensured. Also, we are increasingly interested in matching volatility
structures implied from cap/swaption prices. Forward rate models have the
whole yield curve as an input (and possibly a volatility curve as well).
These models were introduced in Heath, Jarrow, and Morton (1992, HJM):
we exploit the link between bond prices and the forward curve. In essence we
model each bond maturity with a separate SDE.


Therefore we are facing a system of infinitely many SDEs, with the initial
forward curve as a boundary condition. Of course, some relationships will ensure
that no arbitrage is permitted. In particular, if the forward rate dynamics are
given by

dF (t; T ) = µ(t, T )dt + σ(t, T )dW (t)

Itō's formula gives the bond dynamics

dP(t; T) / P(t; T) = [ r(t) + μ*(t, T) + ½ ‖σ*(t, T)‖² ] dt + σ*(t, T) dW(t)

The functions μ* and σ* are given by

μ*(t, T) = − ∫t^T μ(t, s) ds
σ*(t, T) = − ∫t^T σ(t, s) ds

If we use the current account for discounting (as the numéraire), we expect
the discounted bonds to form martingales

P(t; T) / B(t) = E[ P(T; T) / B(T) ] = E[ 1 / B(T) ]

HJM show that the no-arbitrage condition is

μ(t, T) = σ(t, T) ∫t^T σ(t, s) ds
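As a quick illustration of the condition, a flat volatility σ(t, s) = σ̄ immediately gives the drift μ(t, T) = σ̄²(T − t). The sketch below (Python, made-up parameter values) checks the condition for an exponentially damped volatility, where the integral is still available in closed form:

```python
import math

# HJM drift condition mu(t,T) = sigma(t,T) * int_t^T sigma(t,s) ds,
# checked for an exponentially damped (Vasicek-type) volatility;
# sig0 and a are illustrative values
sig0, a = 0.015, 0.1
t, T = 0.0, 5.0

def vol(s):
    return sig0 * math.exp(-a * (s - t))

# midpoint rule for the integral of sigma(t, s) over (t, T)
n = 100000
h = (T - t) / n
integral = sum(vol(t + (i + 0.5) * h) for i in range(n)) * h

mu_numeric = vol(T) * integral
mu_closed  = sig0*math.exp(-a*(T-t)) * sig0*(1 - math.exp(-a*(T-t)))/a
print(abs(mu_numeric - mu_closed) < 1e-10)   # True
```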

CALIBRATION OF HJM MODELS


HJM models need two ingredients: the initial forward rate curve and the
volatility structure.
The forward curve follows from the yield curve, and is specified under
risk neutrality. Since the forwards are given as derivatives of the yield curve,
one has to be careful when constructing the yield curve: many instruments on
top of bonds are also used for that (e.g. swaps, futures).
The volatility structure can be specified using PCA on past yield curves;
volatility structures are the same under all measures. Volatilities implied from
derivatives can also be used.

SHORT VERSUS FORWARD RATE MODELS
Short rate models
⊕ Markovian in nature, easy to model
⊕ Many derivative prices in closed form
⊕ Tree building, finite differences or simulation: all easy
⊖ Arbitrage not ruled out
⊖ Volatility can be hard to model
Forward rate models
⊕ No arbitrage by nature
⊕ Very flexible volatility structures
⊕ Easy to include many factors
⊖ Short rate generally non-Markovian
⊖ No Feynman-Kač representations
⊖ No trees or finite differences; only simulations

7.8 BOND DERIVATIVES


There is a large number of bond and interest rate derivatives with a liquid
market: forwards, swaps, bond options, caplets, floorlets, caps, floors and
swaptions are some examples. The pricing of bond options is most important,
since prices of caplets and floorlets can be expressed as options on a zero
coupon bond, while swaptions can be expressed as options on a coupon paying
bond. Unlike equity options, bond options have some distinctive features that
arise from the nature of interest rates. For example, bond prices are known both
at the current time t and at maturity T, a feature known as pull to par.

THE BLACK-76 FORMULA


Black (1976, B76) in an influential paper considered the pricing of options on
commodities. Commodities have specific cycles, storage and availability costs
that are not captured in the standard BS methodology. With some modifications
the B76 formula is used in the market to quote bonds, caps, swaptions, etc.
The B76 formula assumes that bonds or interest rates (depending on the
instrument) are log-normally distributed, with σ being the measure of volatility.
Implied volatility curves are constructed for all these instruments. See Brigo
and Mercurio (2001) for all relevant formulas.
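A minimal sketch of the B76 call formula is given below (Python; the inputs are made-up numbers, and the function is a generic textbook form — actual market quoting of caplets or swaptions additionally involves accruals, day counts and annuity factors):

```python
import math

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def black76_call(F, K, T, sig, df):
    # generic Black (1976) call on a forward F with discount factor df
    d1 = (math.log(F / K) + 0.5 * sig**2 * T) / (sig * math.sqrt(T))
    d2 = d1 - sig * math.sqrt(T)
    return df * (F * norm_cdf(d1) - K * norm_cdf(d2))

# made-up inputs: forward rate 5%, strike 5%, 20% lognormal volatility
price = black76_call(0.05, 0.05, 1.0, 0.20, 0.95)
print(round(price, 6))   # roughly 0.0038
```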


[Figure: histograms of the interest rate distribution and of the corresponding
bond price distribution.]
FIGURE 7.10: Pull-to-par and bond options.

[Figure: a realization of rates for a 9x1 year ATM cap (top panel) and the
resulting cap cash flows through time (bottom panel).]
FIGURE 7.11: Cash flows for interest rate caplets and caps.

[Figure: implied Black volatility (%) for caplets and for caps, plotted against
maturity.]
FIGURE 7.12: Typical Black volatilities for caplets and caps.

7.9 CHANGES OF NUMÉRAIRE


In general, if the payoffs are given as Φ(r(T)), then the price at time t is given
by

f(r(t), t) = EtQ[ exp{ − ∫t^T r(s) ds } Φ(r(T)) ]
           = EtQ[ (B(t)/B(T)) Φ(r(T)) ]
If the discounting factor B(t)/B(T) were independent of the payoffs we would
be able to split the expectation. But since we are dealing with stochastic rates,
it is not. Implicitly we are using the bank account as the numéraire, but there
is nothing special about this choice.
As shown in Geman, el Karoui, and Rochet (1995), one can choose any
positive asset as the numéraire. For each numéraire there exists an equivalent
measure, under which every asset deflated by the numéraire is a martingale.
That is to say, if N(t) is the process of the numéraire, then there exists a
measure N induced by this numéraire, such that


X(t)/N(t) = EtN[ X(T)/N(T) ]

for any asset process X(t). Then, we can express the value at t as
X(t) = N(t) EtN[ X(T)/N(T) ].
Given a problem, a good numéraire choice can simplify things enormously.
For example, we can use the bond that matures at time T as the numéraire;
then all asset prices are given in terms of this asset (rather than currency units).
If T is the measure induced by this bond, we can write the price as

f(r(t), t) = P(t; T) EtT[ Φ(r(T)) / P(T; T) ] = P(t; T) EtT[ Φ(r(T)) ]

To make this approach operational we need to find under which measure T
all bonds discounted with P(t; T) form martingales.

7.10 THE LIBOR MARKET MODEL


A variant of the HJM model, which constructs lognormal rates, was proposed in
Brace, Ga̧tarek, and Musiela (1995) and Miltersen, Sandmann, and Sondermann
(1997). Since it produces prices that agree with the B76 market quotes, the
model has been coined the "market" model. It uses forward rates of fixed
maturity, rather than the instantaneous forward rate, since prices become
explosive in that case. Typically the 3 month Libor rate is used as the underlying.
This is the model of choice for many practitioners.



8
Credit risk
A
Using Matlab with Microsoft Excel

In many practical situations one needs to export Matlab functionality to a
spreadsheet programme like Microsoft Excel. Fortunately, Microsoft Windows
provides the functionality via COM (Component Object Model) objects. Using
the Matlab compiler one can build a standalone COM component in the form of
a dynamically linked library (DLL) which can be invoked from Visual Basic for
Applications (VBA), the programming language used throughout Excel.
By using VBA one can construct an Excel add-in which can be exported
to any computer running Excel. One of the main benefits is that all required
Matlab functions are exported, and therefore the host computer need not have
Matlab installed.1 Also, Graphical User Interfaces (GUIs) can be constructed
easily using VBA. The main reference of this appendix is MathWorks (2005,
chapter 4.18).
In this appendix we will describe the procedure, and we will produce functions
that implement the Black and Scholes (1973) pricing model for calls and puts.
There are four+one steps in creating the Excel-Matlab link.
0 Set up the C/C++ compiler to work with Matlab.
1 Write the functions in Matlab and create the COM component.
2 Write the VBA code that communicates with the DLL and performs the op-
erations.
3 Create the GUI in Excel.
4 Put everything in a package that can be readily installed on any computer
with Excel.

1
Computers that don’t run Matlab will need the Matlab Component Runtime (MCR) set
of libraries which is freely available.
 !"#%$& '()"  254(A.2)

A.1 SETTING UP MATLAB WITH THE C/C++ COMPILER

Before starting the procedure we have to ensure that the Matlab C compiler is
properly set up. Running mbuild -setup at the Matlab command prompt will
allow us to select the compiler we want to use. Since we need to compile COM
allow us to select the compiler we want to use. Since we need to compile COM
components we will need the Microsoft Visual C/C++ compiler in our system. It
is truly unbelievable but Microsoft is giving away the compiler for free as part
of the Visual C++ 2005 Express Edition (VC). This is recorded as VC version
8.0, and it is only compatible with Matlab 7.3.2 You can download VC at
msdn.microsoft.com/vstudio/express/visualc/. You should download and run
the installer vcsetup.exe, and install it to the default directory

C:\Program Files\Microsoft Visual Studio 8\

The second step is to get the Windows Platform SDK (Windows Server 2003
R2) from the web at msdn.microsoft.com/vstudio/express/visualc/usingpsdk/.
You should download the installer PSDK-x86.exe. It is important to select a
custom install and put as the target directory

C:\Program Files\Microsoft Visual Studio 8\VC\PlatformSDK\

This is where Matlab looks for some necessary files. You don't need to install
all components. The required ones are the following:
1. Microsoft Windows Core SDK → Build Environment → Build Environment (x86 32-bit)
2. Microsoft Windows Core SDK → Redistributable Components
3. Microsoft Data Access Services (MDAC) SDK → Build Environment → Build Environment (x86 32-bit)
4. Debugging Tools for Windows

After everything is installed we are ready to set up the Matlab compiler. At
the Matlab prompt just input mbuild -setup and then select the appropriate
compiler.
2
If you run an earlier version of Matlab, like 7.0.4, you will need VC 6.0, 7.0 or 7.1.
VC 7.1 is shipped with the Visual C++ Toolkit 2003, which is not officially supported
but is out there on the net.


LISTING A.1: Matlab file bl_bs_call.m
% bl_bs_call.m
function [P, D] = bl_bs_call(S, K, r, T, sigma)
d1 = log(S./K) + (r + 0.5*sigma.^2).*T;
d1 = d1./sigma./sqrt(T);
d2 = d1 - sigma.*sqrt(T);
N  = normcdf(d1);
D  = normcdf(d2);
P  = S.*N - K.*exp(-r.*T).*D;

LISTING A.2: Matlab file xls_bs_put.m


% xls_bs_put.m
function [P,D] = xls_bs_put(S0, K, r, T, sigma)
d1 = log(S0./K) + (r + 0.5*sigma.^2).*T;
d1 = d1./sigma./sqrt(T);
d2 = d1 - sigma.*sqrt(T);
N = normcdf(-d1);
D = normcdf(-d2);
P = -S0.*N + K.*exp(-r.*T).*D;

A.2 WRITING THE MATLAB FUNCTIONS


We will now write the Matlab functions that implement the standard BS prices
and some hedging parameters. We are interested in passing whole arrays as
arguments, and also want arrays to be returned. There are many ways of doing
this, but to illustrate how different arrays are passed we will create two functions,
shown in listings A.1 and A.2. Both functions should be straightforward. The set
of parameters is input (perhaps some in vector form) and the prices and deltas
are returned.
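As a quick sanity check of the formulas these functions implement, the same computation can be sketched outside Matlab. The following Python fragment (an illustration only, not part of the add-in) mirrors the two listings, with the standard library's `NormalDist().cdf` standing in for Matlab's `normcdf`, and verifies put-call parity for a representative set of parameters:

```python
from math import exp, log, sqrt
from statistics import NormalDist

def bs_call(S0, K, r, T, sigma):
    # Mirrors xls_bs_call.m: C = S0*N(d1) - K*exp(-r*T)*N(d2)
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    N = NormalDist().cdf
    return S0 * N(d1) - K * exp(-r * T) * N(d2)

def bs_put(S0, K, r, T, sigma):
    # Mirrors xls_bs_put.m: P = -S0*N(-d1) + K*exp(-r*T)*N(-d2)
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    N = NormalDist().cdf
    return -S0 * N(-d1) + K * exp(-r * T) * N(-d2)

# Put-call parity, C - P = S0 - K*exp(-r*T), must hold identically.
S0, K, r, T, sigma = 100.0, 100.0, 0.05, 0.5, 0.15
C = bs_call(S0, K, r, T, sigma)
P = bs_put(S0, K, r, T, sigma)
assert abs((C - P) - (S0 - K * exp(-r * T))) < 1e-9
```

If parity fails, one of the d1/d2 lines has been mistyped; this is a convenient smoke test before wiring the functions into a COM component.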
The second step is to create the COM component using the Matlab Excel
builder. This is invoked by running mxltool at the Matlab command prompt. A
window for the Matlab builder will then open. Go to File > New Project to
start a new Builder project. We set the parameters as shown in figure A.1.

Component name: BSPricer
Class name: BSPricerclass
Project version: 1.0

The next step is to add the files xls_bs_call.m and xls_bs_put.m to the
project, by clicking on Add File. We must now save the project, and click on
Build > COM Object to actually build the component. Two subfolders
are created in the BSPricer folder, as shown in figure A.2.


FIGURE A.1: Screenshots of the Matlab Excel Builder

FIGURE A.2: The folders created by mxltool

(a) folder: \BSPricer\distrib

(b) folder: \BSPricer\src

A.3 WRITING THE VBA CODE


Open Excel and go to Tools > Macro > Visual Basic Editor. We will now
need to tell Excel that the Matlab libraries shall become available. Within VBA
go to Tools > References and select the necessary libraries which are now
available, the one we wrote and one that contains general Matlab utilities:

BSPricer 1.0 Type Library
MWComUtil 1.0 Type Library
Now we will need to write some VBA code that initializes the add-in and also
define some global variables that will be kept between function calls in Excel.
To do that we need a module. Right-click on VBAProject (Book1) > Insert >
Module.


LISTING A.3: VBA module (PricerMain)
' PricerMain
Public myPricer As BSPricer.BSPricerclass
Public myMUtil As MWUtil
Public bModuleInitialized As Boolean
Public StrikesRange As Range
Public CallPricesRange As Range
Public DeltasRange As Range
Public S0, sigma, InterestRate, Maturity As Double

Private Sub LoadPricer()
    Dim myForm As PricerForm
    On Error GoTo HandleError
    Call InitPricer
    Set myForm = New PricerForm
    Call myForm.Show
    Exit Sub
HandleError:
    MsgBox (Err.Description)
End Sub

Private Sub InitPricer()
    If Not bModuleInitialized Then
        On Error GoTo Handle_Error
        If myMUtil Is Nothing Then
            Set myMUtil = New MWUtil
            Call myMUtil.MWInitApplication(Application)
        End If
        If myPricer Is Nothing Then
            Set myPricer = New BSPricer.BSPricerclass
        End If
        S0 = 100
        sigma = 0.15
        InterestRate = 0.05
        Maturity = 0.5
        bModuleInitialized = True
        Exit Sub
Handle_Error:
        bModuleInitialized = False
    End If
End Sub

Change the name of this module (at its properties) to PricerMain,
and insert the code given in listing A.3.


LISTING A.4: VBA Activation Handlers (PricerForm)
' PricerForm
Private Sub UserForm_Activate()
    On Error GoTo HandleError
    S0Box.Value = S0
    SigmaBox.Value = sigma
    InterestRateBox.Value = InterestRate
    MaturityBox.Value = Maturity
    If Not StrikesRange Is Nothing Then
        StrikesRangeBox.Text = StrikesRange.Address
    End If
    If Not CallPricesRange Is Nothing Then
        CallPricesRangeBox.Text = CallPricesRange.Address
    End If
    Exit Sub
HandleError:
    MsgBox (Err.Description)
End Sub

We now need to turn to the GUI, which can be as in the screenshot A.3. The
components need some event handlers that will respond to the activation of the
form and to user input. All this code must reside within the PricerForm form code.
When the form is activated some initial values must be set, which is done in
listing A.4. The user can click either the compute button (listing A.5) or the
about button (listing A.6).

A.4 THE EXCEL ADD-IN


The last part is to create the code that puts the add-in in Excel. Right-click
on ThisWorkbook > View Code and add the code of listing A.7. This code
will install and uninstall the add-in in the Tools menu of Excel. A button
is added which invokes the LoadPricer subroutine when clicked.
Now we can save the add-in in the BSPricer\distrib directory, ready
for packaging. Note that the Excel file has to be saved as an Excel add-in file
(.xla).

A.5 INVOKING AND PACKAGING


To check that everything is all right, we can close Excel and reopen it. Within
Tools > Add-Ins we should be able to locate BSPricer.xla.

LISTING A.5: VBA User Input Handlers I (PricerForm)
' PricerForm
Private Sub OKButton_Click()
    Dim Rng As Range
    If myPricer Is Nothing Then GoTo Exit_Form
    On Error Resume Next
    Set Rng = Range(StrikesRangeBox.Text)
    If Err <> 0 Then
        MsgBox "Invalid range for strike prices"
        Exit Sub
    End If
    Set StrikesRange = Rng
    Set Rng = Range(CallPricesRangeBox.Text)
    If Err <> 0 Then
        MsgBox "Invalid range for call prices"
        Exit Sub
    End If
    Set CallPricesRange = Rng
    Set Rng = Range(DeltasRangeBox.Text)
    If Err <> 0 Then
        MsgBox "Invalid range for deltas"
        Exit Sub
    End If
    Set DeltasRange = Rng
    If CallPricesRange.Cells.Count <> _
            StrikesRange.Cells.Count Then
        MsgBox "Selections must have the same size"
        Exit Sub
    End If
    S0 = CDbl(S0Box.Text)
    sigma = CDbl(SigmaBox.Text)
    InterestRate = CDbl(InterestRateBox.Text)
    Maturity = CDbl(MaturityBox.Text)
    Call myPricer.xls_bs_call(2, CallPricesRange, _
        DeltasRange, S0, StrikesRange, _
        InterestRate, Maturity, sigma)
    GoTo Exit_Form
HandleError:
    MsgBox (Err.Description)
Exit_Form:
    Unload Me
End Sub
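Note the calling convention used when the handler finally calls into the component: the Builder-generated COM method receives the number of requested outputs first, then one argument per output (the Excel ranges, which are filled in place), and only then the ordinary Matlab inputs. A minimal sketch of that argument shuffling, written in Python purely for illustration (all names hypothetical, not part of the add-in):

```python
def com_call(func, nargout, *slots_and_inputs):
    """Emulate the Builder convention: the first nargout extra arguments
    are output slots that receive the function's results in order."""
    out_slots = list(slots_and_inputs[:nargout])   # e.g. the Excel ranges
    inputs = slots_and_inputs[nargout:]            # the ordinary inputs
    results = func(*inputs)
    for i in range(nargout):
        out_slots[i].append(results[i])            # "write" into the slot
    return out_slots

# A toy stand-in for the compiled pricer: returns (price, delta-like output).
toy_pricer = lambda S0, K: (S0 - K, 0.5)

# Analogue of pricer(2, PricesRange, DeltasRange, S0, K): ask for two outputs.
prices, deltas = com_call(toy_pricer, 2, [], [], 105.0, 100.0)
assert prices == [5.0] and deltas == [0.5]
```

This is why the VBA call passes the literal 2 before the two range objects: dropping it, or reordering the ranges and inputs, makes the COM call fail even though the underlying Matlab function is unchanged.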


LISTING A.6: VBA User Input Handlers II (PricerForm)
' PricerForm
Private Sub AboutButton_Click()
    Dim myAbout As AboutForm
    Set myAbout = New AboutForm
    myAbout.Show
End Sub

LISTING A.7: VBA Add-in installation (ThisWorkbook)
' ThisWorkbook
Private Sub Workbook_AddinInstall()
    Call AddPricerMenuItem
End Sub

Private Sub Workbook_AddinUninstall()
    Call RemovePricerMenuItem
End Sub

Private Sub AddPricerMenuItem()
    Dim ToolsMenu As CommandBarPopup
    Dim NewMenuItem As CommandBarButton
    Call RemovePricerMenuItem
    Set ToolsMenu = _
        Application.CommandBars(1).FindControl(ID:=30007)
    If ToolsMenu Is Nothing Then Exit Sub
    Set NewMenuItem = _
        ToolsMenu.Controls.Add(Type:=msoControlButton)
    NewMenuItem.Caption = "BS Pricer..."
    NewMenuItem.OnAction = "LoadPricer"
End Sub

Private Sub RemovePricerMenuItem()
    Dim CmdBar As CommandBar
    Dim Ctrl As CommandBarControl
    On Error Resume Next
    Set CmdBar = Application.CommandBars(1)
    Set Ctrl = CmdBar.FindControl(ID:=30007)
    Call Ctrl.Controls("BS Pricer...").Delete
End Sub


Then a BS Pricer... item should now be present in the menu. Invoking this
item should allow us to run the DLL and compute option prices. A screenshot
of the add-in in action is given in figure A.3.

FIGURE A.3: Screenshot of the BSPricer add-in.

Going back to Matlab and the Excel Builder tool, we can package the
component. Matlab will put together the DLL and the .xla file, and will create an
executable that registers the dynamic library with Windows.
We can now ship the add-in and use it with a computer that does not have
Matlab installed, but the host computer must have the freely available Matlab
Component Runtime (MCR) libraries. The MCR must be the same version as the
Matlab that created our file. We can build the MCR with our Matlab installation
by using buildmcr, and then ship MCRInstaller.exe with our add-in.
Note that we only need to do this once: after the host computer has the MCR
properly set up, we can add more add-ins, given that they have been created
using the same Matlab version.



References

Albrecher, H., P. Mayer, W. Schoutens, and J. Tistaert (2007, January). The little
Heston trap. Wilmott Magazine, 83–92.
Andricopoulos, A. D., M. Widdicks, P. W. Duck, and D. P. Newton (2003). Uni-
versal option valuation using quadrature methods. Journal of Financial Eco-
nomics 67, 447–471.
Bachelier, L. (1900). Théorie de la Spéculation. Gauthier-Villars.
Bailey, D. H. and P. N. Swarztrauber (1991). The fractional fourier transform
and applications. SIAM Review 33(3), 389–404.
Bailey, D. H. and P. N. Swarztrauber (1994). A fast method for the numerical
evaluation of continuous fourier and laplace transforms. SIAM Journal on
Scientific Computing 15(5), 1105–1110.
Baillie, R. T., T. Bollerslev, and H. O. Mikkelsen (1993). Fractionally integrated
generalized autoregressive conditional heteroscedasticity. Journal of Econo-
metrics.
Bajeux, I. and J. C. Rochet (1996). Dynamic spanning: Are options an appropriate
instrument? Mathematical Finance 6, 1–16.
Bakshi, G., C. Cao, and Z. Chen (1997). Empirical performance of alternative
option pricing models. The Journal of Finance 5, 2003–2049.
Bakshi, G. and D. Madan (2000). Spanning and derivative-security valuation.
Journal of Financial Economics 55, 205–238.
Barle, S. and N. Cakici (1998). How to grow a smiling tree. Journal of Financial
Engineering 7 (2), 127–146.
Barndorff-Nielsen, O. E. (1998). Processes of normal inverse Gaussian type.
Finance and Stochastics 2, 41–68.
Barone-Adesi, G., R. Engle, and L. Mancini (2004). GARCH options in incomplete
markets. Working Paper.
Bates, D. S. (1998). Pricing options under jump diffusion processes. Technical
Report 37/88, The Wharton School, University of Pennsylvania.
Bates, D. S. (2000). Post-’87 crash fears in S&P500 futures options. Journal of
Econometrics 94, 181–238.

Bates, D. S. (2005). Maximum likelihood estimation of latent affine processes.
Review of Financial Studies, forthcoming.
Bauwens, L., S. Laurent, and J. Rombouts (2006). Multivariate GARCH models:
a survey. Journal of Applied Econometrics 21(1), 79–109.
Ben Hamida, S. and R. Cont (2005). Recovering volatility from option prices by
evolutionary computation. Journal of Computational Finance 8(4), XX–XX.
Bingham, N. H. and R. Kiesel (2000). Risk-Neutral Valuation. London, UK:
Springer-Verlag.
Black, F. (1972). Capital market equilibrium with restricted borrowing. Journal
of Business 45, 444–455.
Black, F. (1976). The pricing of commodity contracts. Journal of Financial Eco-
nomics 3(1), 167–79.
Black, F. and P. Karasinski (1991). Bond and option prices when short rates are
lognormal. Financial Analyst Journal (Jul-Aug), 52–59.
Black, F. and M. Scholes (1973). The pricing of options and corporate liabilities.
Journal of Political Economy 81, 637–659.
Bollerslev, T., R. Engle, and D. Nelson (1994). ARCH models. In R. Engle
and D. McFadden (Eds.), Handbook of Econometrics, IV. Amsterdam: North
Holland.
Bollerslev, T., R. F. Engle, and J. M. Wooldridge (1988, February). A capital
asset pricing model with time varying covariances. Journal of Political Econ-
omy 96(1), 116–131.
Bollerslev, T. R. (1986). Generalized autoregressive conditional heteroscedas-
ticity. Journal of Econometrics 31, 307–327.
Bouchaud, J.-P. and M. Potters (2001). More stylized facts of financial markets:
Leverage effect and downside correlation. Physica A 299, 60–70.
Brace, A., D. Ga̧tarek, and M. Musiela (1995). The market model of interest rate
dynamics. Working paper, University of South Wales, Australia.
Breeden, D. T. and R. Litzenberger (1978). Prices of state contingent claims
implicit in option prices. Journal of Business 51, 621–51.
Brennan, M. J. (1979). The pricing of contingent claims in discrete time models.
The Journal of Finance 34, 53–68.
Brennan, M. J. and E. Schwartz (1982). An equilibrium model of bond pricing and
a test of market efficiency. Journal of Financial and Quantitative Analysis 3,
301–329.
Brigo, D. and F. Mercurio (2001). Interest Rate Models: Theory and Practice.
New York, NY: Springer Verlag.
Caines, P. (1988). Linear Stochastic Systems. Probability and Mathematical
Statistic. New York, NY: John Wiley and Sons.
Carr, P. (2002). Frequently asked questions in option pricing theory. Technical
report, forth. Journal of Derivatives.
Carr, P., H. Geman, D. Madan, and M. Yor (2002). The fine structure of asset
returns: An empirical investigation. Journal of Business 75(2), 305–332.
Carr, P., H. Geman, D. Madan, and M. Yor (2003). Stochastic volatility for Lévy
processes. Mathematical Finance 13(3), 345–382.

Carr, P. and D. Madan (1999). Option valuation using the Fast Fourier Transform.
Journal of Computational Finance 3, 463–520.
Carr, P. and D. Madan (2005). A note on sufficient conditions for no arbitrage.
Finance Research Letters 2, 125–130.
Carr, P. and L. Wu (2004). Time-changed Lévy processes and option pricing.
Journal of Financial Economics 71(1), 113–141.
CBOE (2003). VIXr CBOE volatility index. White Paper, Chicago Board Options
Exchange.
Chourdakis, K. (2002). Continuous time regime switching models and applica-
tions in estimating processes with stochastic volatility and jumps. Technical
Report 464, Queen Mary, University of London.
Chourdakis, K. (2005). Option pricing using the Fractional FFT. Journal of
Computational Finance 8(2), 1–18.
Christie, A. (1982). The stochastic behavior of common stock variances: Value,
leverage and interest rate effects. Journal of Financial Economics 3, 407–432.
Cont, R. and J. da Fonseca (2002). Dynamics of implied volatility surfaces.
Quantitative Finance 2(1), 45–60.
Cox, J. C., J. E. Ingersoll, and S. A. Ross (1985). A theory of the term structure
of interest rates. Econometrica 53, 385–407.
Crépey, S. (2003). Calibration of the local volatility function in a generalized
Black-Scholes model using Tikhonov regularization. SIAM Journal of Math-
ematical Analysis 34, 1183–1206.
Davidson, R. and J. G. MacKinnon (1985, February). The interpretation of test
statistics. Canadian Journal of Economics 18(1), 38–57.
Derman, E. (1999). Regimes of volatility. RISK 12(4), 55–59.
Derman, E. and I. Kani (1994). Riding on a smile. RISK 7 (2), 32–39.
Derman, E. and I. Kani (1998). Stochastic implied trees: Arbitrage pricing with
stochastic term and strike structure of volatility. International Journal of The-
oretical and Applied Finance 1(1), 61–110.
Derman, E., I. Kani, and N. Chriss (1996). Implied trinomial trees of the volatility
smile. Journal of Derivatives 3(4), 7–22.
Derman, E., I. Kani, and J. Z. Zou (1996, July). The local volatility surface:
Unlocking the information in index options pricing. Financial Analysts Journal,
25–36.
Dothan, U. (1978). On the term structure of interest rates. Journal of Financial
Economics 6(1), 59–69.
Duan, J.-C. (1995). The Garch option pricing model. Mathematical Finance 5(1),
13–32.
Duan, J.-C., G. Gauthier, and J.-G. Simonato (1999). An analytical approximation
for the Garch option pricing model. Journal of Computational Finance 2, 75–
116.
Dueker, M. (1997). Markov switching in GARCH processes and mean-reverting
stock-market volatility. Journal of Business and Economic Statistics 15, 26–34.
Duffie, D. and R. Kan (1996). A yield-factor model of interest rates. Mathematical
Finance 6(4), 379–406.


Duffie, D., J. Pan, and K. Singleton (2000). Transform analysis and asset pricing
for affine jump–diffusions. Econometrica 68, 1343–1376.
Dupire, B. (1993). Pricing and hedging with smiles. In Proceedings of the AFFI
Conference, La Baule.
Dupire, B. (1994). Pricing with a smile. RISK 7 (1), 18–20.
Engle, R. (1982). Autoregressive conditional heteroskedasticity with estimates
of the variance of U.K. inflation. Econometrica 50, 987–1008.
Engle, R. and F. K. Kroner (1995). Multivariate simultaneous generalized ARCH.
Econometric Theory 11, 122–150.
Eraker, B., M. Johannes, and N. Polson (2001). MCMC analysis of diffusion
models with application to finance. Journal of Business and Economic Statis-
tics 19(2), 177–91.
Fama, E. F. (1965). The behavior of stock market prices. Journal of Business 38,
34–105.
Feller, W. E. (1951). Two singular diffusion problems. Annals of Mathematics 54,
173–182.
Figlewski, S. and X. Wang (2000). Is the “leverage effect” a leverage effect?
Working Paper, SSRN 256109.
Gallant, A. R. and G. Tauchen (1993). SNP: A program for nonparametric time
series analysis. version 8.3 user’s guide. Working Paper, University of North
Carolina.
Gatheral, J. (1997). Delta hedging with uncertain volatility. In I. Nelken (Ed.),
Volatility in the Capital Markets: State-of-the-Art Techniques for Modeling,
Managing, and Trading Volatility. Glenlake Publishing Company.
Gatheral, J. (2004). A parsimonious arbitrage-free implied volatility parameter-
ization with application to the valuation of volatility derivatives. In Global
Derivatives and Risk Management.
Gatheral, J. (2006). The Volatility Surface: A Practitioner’s Guide. New York,
NY: Wiley Finance.
Geman, H., N. el Karoui, and J.-C. Rochet (1995). Changes of numéraire changes
of probability measure and option pricing. Journal of Applied Probability 32,
443–458.
Gerber, H. U. and E. S. W. Shiu (1994). Option pricing by Esscher transforms.
Transactions of the Society of Actuaries XLVI, 99–191.
Ghysels, E., A. Harvey, and E. Renault (1996). Stochastic volatility. In G. Mad-
dala and C. Rao (Eds.), Handbook of Statistics, 14, Statistical Methods in
Finance. North Holland.
Glosten, L. R., R. Jagannathan, and D. Runkle (1993). On the relation between
the expected value and the volatility of the nominal excess return on stocks.
Journal of Finance 48(5), 1779–1801.
Hamilton, J. D. (1994). Time Series Analysis. Princeton, NJ: Princeton University
Press.
Hamilton, J. D. and R. Susmel (1994). Autoregressive conditional heteroscedas-
ticity and changes in regime. Journal of Econometrics 64, 307–333.

Harvey, A., E. Ruiz, and N. Shephard (1994). Multivariate stochastic variance
models. Review of Economic Studies 61, 247–264.
Heath, D., R. Jarrow, and A. Morton (1992). Bond pricing and the term structure
of interest rates: A new methodology. Econometrica 60(1), 77–105.
Heston, S. L. (1993). A closed-form solution for options with stochastic volatility
with applications to bond and currency options. Review of Financial Studies 6,
327–344.
Heston, S. L. and S. Nandi (2000). A closed-form GARCH option pricing model.
Review of Financial Studies, forthcoming.
Ho, T. S. Y. and S.-B. Lee (1986). Term structure movements and pricing interest
rate contingent claims. Journal of Finance 41, 1011–1029.
Hull, J. C. (2003). Options, Futures and Other Derivatives. (5th ed.). New Jersey,
NJ: Prentice Hall.
Hull, J. C. and A. White (1987). The pricing of options with stochastic volatilities.
The Journal of Finance 42, 281–300.
Hull, J. C. and A. White (1990). Pricing interest rate derivative securities. Review
of Financial Studies 3(4), 573–592.
Hull, J. C. and A. White (1994). Numerical procedures for implementing term
structure models I. Journal of Derivatives 2, 7–16.
Hull, J. C. and A. White (1996). Using Hull-White interest rate trees. Journal of
Derivatives, 26–36.
Ikonen, S. and J. Toivanen (2004). Operator splitting methods for American option
pricing. Applied Mathematics Letters 17, 809–814.
ISDA (1998). EMU and market conventions: Recent developments. International
Swaps and Derivatives Association document BS:9951.1.
Jackwerth, J. C. and M. Rubinstein (1996). Recovering probability distributions
from options prices. The Journal of Finance 51, 1611–1631.
Javaheri, A. (2005). Inside Volatility Arbitrage: The Secrets of Skewness. Hobo-
ken, NJ: Wiley.
Julier, S. J. and J. K. Uhlmann (1996). A general method for approximating
nonlinear transformations of probability distributions. Technical report.
Julier, S. J. and J. K. Uhlmann (1997). A new extension of the Kalman filter to
nonlinear systems. In International Symposium on Aerospace/Defense Sens-
ing, Simulation and Controls, pp. 182–193.
Kahl, C. and P. Jäckel (2005, September). Not-so-complex logarithms in the
Heston model. Wilmott Magazine, 94–103.
Karpoff, J. (1987). The relation between price changes and trading volume: A
survey. Journal of Financial and Quantitative Analysis 22, 109–126.
Kendall, M. and A. Stuart (1977). The Advanced Theory of Statistics. (4th ed.),
Volume I. London, U.K.: Charles Griffin and Co.
Lagnado, R. and S. Osher (1997). A technique for calibrating derivative secu-
rity pricing models: Numerical solutions of an inverse problem. Journal of
Computational Finance 1(1), 13–25.


Lamoureux, C. G. and W. D. Lastrapes (1990). Persistence in variance, structural
change, and the GARCH model. Journal of Business and Economic Statistics 23,
225–234.
Lee, R. (2004a). The moment formula for implied volatility at extreme strikes.
Journal of Mathematical Finance 14(3), 469–480.
Lee, R. (2004b). Option pricing by transform methods: extensions, unification
and error control. Journal of Computational Finance 7 (3), 51–86.
Longstaff, F. A. and E. S. Schwartz (1992). Interest rate volatility and the term
structure: A two factor general equilibrium model. Journal of Finance 47 (4),
1259–1282.
Madan, D., P. Carr, and E. Chang (1998). The variance gamma process and
option pricing. European Finance Review 2, 79–105.
Mandelbrot, B. and H. Taylor (1967). On the distribution of stock price differ-
ences. Operations Research 15, 1057–1062.
Marchuk, G. I. (1990). Splitting and alternating direction methods. In N. Holland
(Ed.), Handbook of Numerical Analysis, Volume 1, pp. 197–462. Amsterdam,
Holland.
MathWorks (2005). Matlab Builder for Excel 1.2.5 (User’s Guide). The Math-
Works.
McKee, S., D. P. Wall, and S. K. Wilson (1996). An alternating direction implicit
scheme for parabolic equations mixed derivative and convective terms. Journal
of Computational Physics 126(1), 64–76.
Merton, R. (1976). Option pricing when the underlying stock returns are dis-
continuous. Journal of Financial Economics 4, 125–144.
Merton, R. C. (1973). Theory of rational option pricing. Bell Journal of Economics
and Management Sciences 4, 141–183.
Merton, R. C. (1992). Continuous Time Finance. (2nd ed.). Blackwell Publishing.
Miltersen, K., K. Sandmann, and D. Sondermann (1997). Closed form solutions
for term structure derivatives with lognormal interest rates. Journal of Fi-
nance 52(1), 409–30.
Modigliani, F. and M. Miller (1958). The cost of capital, corporation finance,
and the theory of investment. American Economic Review 48, 261–297.
Neftci, S. N. (2000). Introduction to the Mathematics of Financial Derivatives.
(2nd ed.). Academic Press.
Nelson, C. R. and A. F. Siegel (1987). Parsimonious modeling of yield curves.
Journal of Business 60(4), 473–489.
Nelson, D. B. (1990). Arch models as diffusion approximations. Journal of Econo-
metrics 45, 7–39.
Nelson, D. B. (1991). Conditional heteroscedasticity in asset returns: A new
approach. Econometrica 59, 347–370.
Øksendal, B. (2003). Stochastic Differential Equations. (6th ed.). New York, NY:
Springer-Verlag.
Pan, J. (1997). Stochastic volatility with reset at jumps. Permanent Working
Paper.

Peaceman, D. W. and J. H. H. Rachford (1955). The numerical solution of
parabolic and elliptic differential equations. Journal of the Society of In-
dustrial and Applied Mathematics 3, 28–45.
Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling (1992). Nu-
merical Recipes in C: The Art of Scientific Computing (2nd ed.). Cambridge
University Press.
Protter, P. E. (2004). Stochastic Integration and Differential Equations. (2nd
ed.). New York, NY: Springer-Verlag.
Rogers, L. C. G. and D. Williams (1994a). Diffusions, Markov Processes and
Martingales. Volume 1: Foundations (2nd ed.). Cambridge, UK: Cambridge
University Press.
Rogers, L. C. G. and D. Williams (1994b). Diffusions, Markov Processes and
Martingales. Volume 2: Itō Calculus (2nd ed.). Cambridge, UK: Cambridge
University Press.
Rubinstein, M. (1985). Nonparametric tests of alternative option pricing models
using all reported trades and quotes on the 30 most active CBOE from August
23, 1976 through August 31, 1978. The Journal of Finance 40, 455–480.
Rubinstein, M. (1994). Implied binomial trees. Journal of Finance 49, 771–818.
Sandmann, G. and S. J. Koopman (1998). Estimation of stochastic volatility
models via monte carlo maximum likelihood. Journal of Econometrics 87 (2),
271–301.
Sandmann, K. and D. Sondermann (1997). A note on the stability of lognor-
mal interest rate models and the pricing of eurodollar futures. Mathematical
Finance 7 (2), 119–125.
Schöbel, R. and J. Zhu (1999). Stochastic volatility using an Ornstein-Uhlenbeck
process: An extension. European Finance Review 3, 23–46.
Schoutens, W., E. Simons, and J. Tistaert (2004, March). A perfect calibration!
Now what? Wilmott Magazine, XX–XX.
Sentana, E. (1995). Quadratic Garch models. Review of Economic Studies 62,
639–661.
Shreve, S. (2004a). Stochastic Methods in Finance v1: The Binomial Asset
Pricing Model. New York, NY: Springer-Verlag.
Shreve, S. (2004b). Stochastic Methods in Finance v2: Continuous Time Models.
New York, NY: Springer-Verlag.
Skiadopoulos, G., S. Hodges, and L. Clelow (2000). Dynamics of the S&P500
implied volatility surface. Review of Derivatives Research 3, 263–282.
Stein, E. M. and J. C. Stein (1991). Stock price distributions with stochastic
volatility: An analytic approach. Review of Financial Studies 4, 727–752.
Svensson, L. (1994). Estimating and interpreting forward interest rates: Sweden
1992-4. Discussion Paper 1051, Centre for Economic Policy Research.
Thomas, J. W. (1995). Numerical Partial Differential Equations. Number 22 in
Texts in Applied Mathematics. New York, NY: Springer.
van der Merwe, R., N. de Freitas, A. Doucet, and E. Wan (2001). The unscented
particle filter. In Advances in Neural Information Processing Systems 13.


Vasiček, O. A. (1977). An equilibrium characterization of the term structure.
Journal of Financial Economics 5, 177–188.
Wiggins, J. B. (1987). Option values under stochastic volatility: Theory and
empirical estimates. Journal of Financial Economics 19, 351–372.
Wilmott, P., J. Dewynne, and S. Howison (1993). Option Pricing. Mathematical
Models and Computation. Oxford, UK: Oxford Financial Press.
Zakoian, M. (1994). Threshold heteroscedastic models. Journal of Economic
Dynamics and Control 18, 931–955.
Zellner, A. (1995). Introduction to Bayesian Inference in Econometrics. Chich-
ester, UK: John Wiley and Sons.



Index

The hook for the index entries.

affine, 190, 193
affine models, 152
arbitrage, 35, 150
  static, 169
Arch model, 136
Asymmetric Garch model, 143

Black-Scholes model, 149
Black-Scholes PDE, 175
bond
  coupon, 177
  face value, 177
  par, 177
  yield, 177
bond option, 187
Borel algebra, 3
Brownian motion, 186
butterfly spread, 169

calendar spread, 169
cap, 187
Capital Asset Pricing Model (CAPM), 145
capital structure, 133
CBOE, 132
characteristic function, 149
CIR model, 192
compounding, 178
  continuously, 178
corporate bond, 177
current account, 184

Dothan model, 191
Dow Jones index, 130, 140

Efficient Method of Moments, 160
Egarch model, 143, 160
equivalent martingale measure, 142, 154
equivalent probability measure, 188
equivalent probability measures, 153
Esscher transform, 148
Euler equation, 147
event, 3
expectation hypothesis, 188
explosive bank account, 192
exponential martingale, 153
exponential Vasicek model, 191
exponentially weighted moving average model, 141

fat tails, 130
Feller process, 192
Feynman-Kac formula, 159
Figarch model, 141
fixed income security, 177
floor, 187
forward curve, 184
forward rate, 182
  instantaneous, 184
fundamental theorem of asset pricing, 188

Garch
  and stochastic volatility, 135
Garch model, 137
  and Arch(∞), 137
  and incomplete markets, 146
  and persistent volatility, 137, 141
  fractionally integrated, 141
  Garch(1,1), 137
  GED, 143
  in-mean, 145
  maximum likelihood estimation, 138
  multidimensional, 145
  non Gaussian, 142
  skewness, 143
  standard errors, 139
  volatility forecasts, 137
generalized error distribution, 143
Girsanov theorem, 150, 152

Igarch model, 141
implied density, 167, 173
  Breeden-Litzenberger method, 175
implied tree, 174
implied volatility, 131
  and Delta, 134
  and expected volatility, 132
  and moneyness, 134
  and realized volatility, 142
  dynamics, 134
  skew, 134
  smile, 134
  sticky Delta, 135
  sticky strike, 135
  surface, 133
  SVI parameterization, 172
inverse problem, 162
Itō formula, 186

Kalman filter, 161, 192
Kolmogorov backward equation, 175
Kolmogorov extension theorem, 174
Kolmogorov forward equation, 175

Leibniz rule, 173
leverage effect, 133, 143, 149
liquidity preferences theory, 189
local volatility, 167, 174
  function, 174
  PDE representation, 175
long memory, 141

marginal rate of substitution, 147
  as Radon-Nikodym derivative, 148
market segmentation, 189
Markov chain, 148, 161
Markov Chain Monte Carlo, 160
maturity date, 35
maximum likelihood estimation, 127
  standard errors, 139
mean reversion, 192
measurable space, 3
measure, 3
mixture of distributions, 130
model risk, 162

no-arbitrage tests, 169
Novikov condition, 188

one-factor model, 184
Ornstein-Uhlenbeck process, 151, 190
overfitting, 201

Particle filter, 161
penalty function, 162
preferred habitat, 189
price of interest rate risk, 185
price of risk, 145
prior information, 162

Radon-Nikodym derivative, 154, 188
  marginal rate of substitution, 148
random variable, 1
redundant claim, 35
regimes
  of volatility, 132
regularization, 162
  Tikhonov-Phillips, 162
risk aversion, 150
risk premium, 150
  time varying, 146

sample path, 1
sample point, see sample path
sample space, see state space
Sharpe ratio, 145, 154
short rate
  stylized facts, 189
short rate model, see one-factor model
σ algebra, 3
  generated, 4
Simulated Method of Moments, 160
smoothing, 167, 176
  Nadaraya-Watson, 168
  radial basis function, 167
sovereign bond, 177
SP500 index, 132, 140
square root process, 151, 192
state space, 1
stochastic volatility, 149, 185
  and Garch, 135
  calibration, 161, 165
  estimation, 160
  PDE, 156
  replicating portfolio, 157
Student-t, 142
swaption, 187

term structure PDE, 187
transform methods, 152

underlying asset, 35
utility function, 147

Vasicek model, 190
Vega, 162
vertical spread, 169
Vitali set, 2
VIX index, 132, 141, 142
  and financial crises, 132
  and realized volatility, 132
volatility
  and correlation with returns, 132, 143, 149
  and financial crises, 130
  attractor, 131
  clusters, 131
  cyclical, 131
  long memory, 141
  persistence, 141
  time varying, 130
volatility risk, 150

yield curve, 179
  historical, 181
  parametric forms, 179
  shapes, 179
  theories of, 188

zero-coupon bond, 177

