The document titled 'Quantitative Economics with Python' by Thomas J. Sargent and John Stachurski provides a comprehensive overview of various economic concepts and mathematical techniques using Python. It covers topics such as geometric series, modeling COVID-19, linear algebra, QR decomposition, complex numbers, circulant matrices, and singular value decomposition. The content is structured into chapters with detailed examples and applications relevant to economics and data analysis.


Quantitative Economics with Python

Thomas J. Sargent & John Stachurski

Jul 06, 2022


CONTENTS

I Tools and Techniques 5


1 Geometric Series for Elementary Economics 7
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Key Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Example: The Money Multiplier in Fractional Reserve Banking . . . . . . . . . . . . . . . . . . . . . 8
1.4 Example: The Keynesian Multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 Example: Interest Rates and Present Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6 Back to the Keynesian Multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2 Modeling COVID-19 25
2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2 The SIR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5 Ending Lockdown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3 Linear Algebra 35
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.4 Solving Systems of Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.6 Further Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.8 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

4 QR Decomposition 59
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 Matrix Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3 Gram-Schmidt process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.4 Some Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.5 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.6 Using QR Decomposition to Compute Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.7 𝑄𝑅 and PCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

5 Complex Numbers and Trigonometry 67


5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2 De Moivre’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.3 Applications of de Moivre’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

6 Circulant Matrices 77

6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2 Constructing a Circulant Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.3 Connection to Permutation Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.4 Examples with Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.5 Associated Permutation Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.6 Discrete Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

7 Singular Value Decomposition (SVD) 95


7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.2 The Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.3 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.4 Reduced Versus Full SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7.5 Digression: Polar Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.6 Principal Components Analysis (PCA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.7 Relationship of PCA to SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.8 PCA with Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.9 Connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.10 Dynamic Mode Decomposition (DMD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.11 Representation 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.12 Representation 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.13 Representation 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.14 Using Fewer Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.15 Source for Some Python Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

II Elementary Statistics 115


8 Elementary Probability with Matrices 117
8.1 Sketch of Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
8.2 Digression: What Does Probability Mean? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
8.3 Representing Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
8.4 Univariate Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
8.5 Bivariate Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
8.6 Marginal Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
8.7 Conditional Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
8.8 Statistical Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.9 Means and Variances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.10 Classic Trick for Generating Random Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.11 Some Discrete Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
8.12 Geometric distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
8.13 Continuous Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
8.14 A Mixed Discrete-Continuous Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.15 Matrix Representation of Some Bivariate Distributions . . . . . . . . . . . . . . . . . . . . . . . . . 134
8.16 A Continuous Bivariate Random Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
8.17 Sum of Two Independently Distributed Random Variables . . . . . . . . . . . . . . . . . . . . . . . . 148
8.18 Transition Probability Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
8.19 Coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
8.20 Copula Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
8.21 Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

9 Univariate Time Series with Matrix Algebra 157


9.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.2 Samuelson’s model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.3 Adding a random term . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

9.4 A forward-looking model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

10 LLN and CLT 167


10.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
10.2 Relationships . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
10.3 LLN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
10.4 CLT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
10.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
10.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

11 Two Meanings of Probability 185


11.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
11.2 Frequentist Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
11.3 Bayesian Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

12 Multivariate Hypergeometric Distribution 201


12.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
12.2 The Administrator’s Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
12.3 Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

13 Multivariate Normal Distribution 211


13.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
13.2 The Multivariate Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
13.3 Bivariate Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
13.4 Trivariate Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
13.5 One Dimensional Intelligence (IQ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
13.6 Information as Surprise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
13.7 Cholesky Factor Magic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
13.8 Math and Verbal Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
13.9 Univariate Time Series Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
13.10 Stochastic Difference Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
13.11 Application to Stock Price Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
13.12 Filtering Foundations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
13.13 Classic Factor Analysis Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
13.14 PCA and Factor Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244

14 Heavy-Tailed Distributions 249


14.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
14.2 Visual Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
14.3 Failure of the LLN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
14.4 Classifying Tail Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
14.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
14.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

15 Fault Tree Uncertainties 267


15.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
15.2 Log normal distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
15.3 The Convolution Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
15.4 Approximating Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
15.5 Convolving Probability Mass Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
15.6 Failure Tree Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
15.7 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
15.8 Failure Rates Unknown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
15.9 Waste Hoist Failure Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279

16 Introduction to Artificial Neural Networks 283
16.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
16.2 A Deep (but not Wide) Artificial Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
16.3 Calibrating Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
16.4 Back Propagation and the Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
16.5 Training Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
16.6 Example 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
16.7 How Deep? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
16.8 Example 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291

17 Randomized Response Surveys 295


17.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
17.2 Warner’s Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
17.3 Comparing Two Survey Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
17.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303

18 Expected Utilities of Random Responses 305


18.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
18.2 Privacy Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
18.3 Zoo of Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
18.4 Respondent’s Expected Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
18.5 Utilitarian View of Survey Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
18.6 Criticisms of Proposed Privacy Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
18.7 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319

III Linear Programming 321


19 Linear Programming 323
19.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
19.2 Objective Function and Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
19.3 Example 1: Production Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
19.4 Example 2: Investment Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
19.5 Standard Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
19.6 Computations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
19.7 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
19.8 Duality Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333

20 Optimal Transport 339


20.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
20.2 The Optimal Transport Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
20.3 The Linear Programming Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
20.4 The Dual Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
20.5 The Python Optimal Transport Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350

21 Von Neumann Growth Model (and a Generalization) 355


21.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
21.2 Model Ingredients and Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
21.3 Dynamic Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
21.4 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
21.5 Interpretation as Two-player Zero-sum Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365

IV Introduction to Dynamics 371
22 Dynamics in One Dimension 373
22.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
22.2 Some Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
22.3 Graphical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
22.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
22.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388

23 AR(1) Processes 393


23.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
23.2 The AR(1) Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
23.3 Stationarity and Asymptotic Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
23.4 Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
23.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
23.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400

24 Finite Markov Chains 407


24.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
24.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
24.3 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
24.4 Marginal Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
24.5 Irreducibility and Aperiodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
24.6 Stationary Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
24.7 Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
24.8 Computing Expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
24.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
24.10 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428

25 Inventory Dynamics 431


25.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
25.2 Sample Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
25.3 Marginal Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
25.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
25.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438

26 Linear State Space Models 441


26.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441
26.2 The Linear State Space Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
26.3 Distributions and Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
26.4 Stationarity and Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
26.5 Noisy Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
26.6 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
26.7 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
26.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
26.9 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461

27 Samuelson Multiplier-Accelerator 463


27.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
27.2 Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
27.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
27.4 Stochastic Shocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
27.5 Government Spending . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
27.6 Wrapping Everything Into a Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
27.7 Using the LinearStateSpace Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488

27.8 Pure Multiplier Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
27.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499

28 Kesten Processes and Firm Dynamics 501


28.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
28.2 Kesten Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
28.3 Heavy Tails . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
28.4 Application: Firm Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
28.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508
28.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510

29 Wealth Distribution Dynamics 513


29.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
29.2 Lorenz Curves and the Gini Coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
29.3 A Model of Wealth Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
29.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518
29.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
29.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
29.7 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525

30 A First Look at the Kalman Filter 529


30.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
30.2 The Basic Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530
30.3 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
30.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
30.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
30.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543

31 Shortest Paths 549


31.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
31.2 Outline of the Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 550
31.3 Finding Least-Cost Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552
31.4 Solving for Minimum Cost-to-Go . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
31.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
31.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557

V Search 561
32 Job Search I: The McCall Search Model 563
32.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
32.2 The McCall Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564
32.3 Computing the Optimal Policy: Take 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
32.4 Computing the Optimal Policy: Take 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572
32.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573
32.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574

33 Job Search II: Search and Separation 579


33.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579
33.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 580
33.3 Solving the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581
33.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583
33.5 Impact of Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586
33.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587
33.7 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 588

34 Job Search III: Fitted Value Function Iteration 591
34.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 591
34.2 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592
34.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594
34.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596
34.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597

35 Job Search IV: Correlated Wage Offers 599


35.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
35.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
35.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
35.4 Unemployment Duration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
35.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607
35.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607

36 Job Search V: Modeling Career Choice 609


36.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
36.2 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610
36.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 612
36.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617
36.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618

37 Job Search VI: On-the-Job Search 623


37.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623
37.2 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624
37.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625
37.4 Solving for Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 628
37.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
37.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632

VI Consumption, Savings and Capital 635


38 Cass-Koopmans Model 637
38.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637
38.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 638
38.3 Planning Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640
38.4 Shooting Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
38.5 Setting Initial Capital to Steady State Capital . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646
38.6 A Turnpike Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648
38.7 A Limiting Infinite Horizon Economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 650
38.8 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652

39 Cass-Koopmans Competitive Equilibrium 653


39.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 653
39.2 Review of Cass-Koopmans Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654
39.3 Competitive Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655
39.4 Market Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655
39.5 Firm Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656
39.6 Household Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 657
39.7 Computing a Competitive Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 658
39.8 Yield Curves and Hicks-Arrow Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 666

40 Cake Eating I: Introduction to Optimal Saving 669


40.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 669

40.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 670
40.3 The Value Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 671
40.4 The Optimal Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 673
40.5 The Euler Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 674
40.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 676
40.7 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677

41 Cake Eating II: Numerical Methods 679


41.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 679
41.2 Reviewing the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 680
41.3 Value Function Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 680
41.4 Time Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 688
41.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 688
41.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 689

42 Optimal Growth I: The Stochastic Optimal Growth Model 695


42.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 695
42.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 696
42.3 Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 700
42.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708
42.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 709

43 Optimal Growth II: Accelerating the Code with Numba 711


43.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711
43.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 712
43.3 Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 713
43.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 717
43.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718

44 Optimal Growth III: Time Iteration 723


44.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
44.2 The Euler Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724
44.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 726
44.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731
44.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 732

45 Optimal Growth IV: The Endogenous Grid Method 735


45.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735
45.2 Key Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 736
45.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737

46 The Income Fluctuation Problem I: Basic Model 743


46.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 743
46.2 The Optimal Savings Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 744
46.3 Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 746
46.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747
46.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 752
46.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754

47 The Income Fluctuation Problem II: Stochastic Returns on Assets 759


47.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 759
47.2 The Savings Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 760
47.3 Solution Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 761
47.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763
47.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 768

47.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 769

VII Bayes Law 771


48 Non-Conjugate Priors 773
48.1 Unleashing MCMC on a Binomial Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 774
48.2 Prior Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 776
48.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 780
48.4 Alternative Prior Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 785
48.5 Posteriors Via MCMC and VI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 789
48.6 Non-conjugate Prior Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 795

49 Posterior Distributions for AR(1) Parameters 811


49.1 PyMC Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 814
49.2 Numpyro Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817

50 Forecasting an AR(1) Process 821


50.1 A Univariate First-Order Autoregressive Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 822
50.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 823
50.3 Predictive Distributions of Path Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 824
50.4 A Wecker-Like Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 825
50.5 Using Simulations to Approximate a Posterior Distribution . . . . . . . . . . . . . . . . . . . . . . . 825
50.6 Calculating Sample Path Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 827
50.7 Original Wecker Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 828
50.8 Extended Wecker Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 830
50.9 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 832

VIII Information 835


51 Job Search VII: Search with Learning 837
51.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 837
51.2 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 838
51.3 Take 1: Solution by VFI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 841
51.4 Take 2: A More Efficient Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 846
51.5 Another Functional Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 847
51.6 Solving the RWFE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 847
51.7 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 848
51.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 849
51.9 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 849
51.10 Appendix A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 851
51.11 Appendix B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 853
51.12 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 857

52 Likelihood Ratio Processes 869


52.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 869
52.2 Likelihood Ratio Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 870
52.3 Nature Permanently Draws from Density g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 871
52.4 Peculiar Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 873
52.5 Nature Permanently Draws from Density f . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 874
52.6 Likelihood Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 875
52.7 Kullback–Leibler Divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 880
52.8 Sequels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 883

53 Computing Mean of a Likelihood Ratio Process 885
53.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 885
53.2 Mathematical Expectation of Likelihood Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 886
53.3 Importance sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 888
53.4 Selecting a Sampling Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 889
53.5 Approximating a cumulative likelihood ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 890
53.6 Distribution of Sample Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 891
53.7 More Thoughts about Choice of Sampling Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 893

54 A Problem that Stumped Milton Friedman 899


54.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 899
54.2 Origin of the Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 900
54.3 A Dynamic Programming Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 901
54.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 906
54.5 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 908
54.6 Comparison with Neyman-Pearson Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 914
54.7 Sequels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 916

55 Exchangeability and Bayesian Updating 917


55.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 917
55.2 Independently and Identically Distributed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 918
55.3 A Setting in Which Past Observations Are Informative . . . . . . . . . . . . . . . . . . . . . . . . . 919
55.4 Relationship Between IID and Exchangeable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 920
55.5 Exchangeability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 921
55.6 Bayes’ Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 921
55.7 More Details about Bayesian Updating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 922
55.8 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 925
55.9 Sequels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 931

56 Likelihood Ratio Processes and Bayesian Learning 933


56.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 933
56.2 The Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 934
56.3 Likelihood Ratio Process and Bayes’ Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 935
56.4 Sequels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 939

57 Bayesian versus Frequentist Decision Rules 941


57.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 942
57.2 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 942
57.3 Frequentist Decision Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 945
57.4 Bayesian Decision Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 950
57.5 Was the Navy Captain’s Hunch Correct? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 957
57.6 More Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 959
57.7 Distribution of Bayesian Decision Rule’s Time to Decide . . . . . . . . . . . . . . . . . . . . . . . . 959
57.8 Probability of Making Correct Decision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 962
57.9 Distribution of Likelihood Ratios at Frequentist’s 𝑡 . . . . . . . . . . . . . . . . . . . . . . . . . . . 964

IX LQ Control 967
58 LQ Control: Foundations 969
58.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 969
58.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 970
58.3 Optimality – Finite Horizon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 972
58.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 975
58.5 Extensions and Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 980

58.6 Further Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 982
58.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 990
58.8 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 991

59 Lagrangian for LQ Control 999


59.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 999
59.2 Undiscounted LQ DP Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1000
59.3 Lagrangian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1001
59.4 State-Costate Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1002
59.5 Reciprocal Pairs Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1002
59.6 Schur decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1003
59.7 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1004
59.8 Other Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1009
59.9 Discounted Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1010

60 Eliminating Cross Products 1013


60.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1013
60.2 Undiscounted Dynamic Programming Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1013
60.3 Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1014
60.4 Duality table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1015

61 The Permanent Income Model 1017


61.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1017
61.2 The Savings Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1018
61.3 Alternative Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1025
61.4 Two Classic Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1028
61.5 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1031
61.6 Appendix: The Euler Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1031

62 Permanent Income II: LQ Techniques 1033


62.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1033
62.2 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1034
62.3 The LQ Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1036
62.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1037
62.5 Two Example Economies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1040

63 Production Smoothing via Inventories 1051


63.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1051
63.2 Example 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1056
63.3 Inventories Not Useful . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1057
63.4 Inventories Useful but are Hardwired to be Zero Always . . . . . . . . . . . . . . . . . . . . . . . . . 1058
63.5 Example 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1059
63.6 Example 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1060
63.7 Example 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1061
63.8 Example 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1063
63.9 Example 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1064
63.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1067
63.11 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1068

X Multiple Agent Models 1073


64 Schelling’s Segregation Model 1075
64.1 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1075
64.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1076

64.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1077
64.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1081
64.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1081

65 A Lake Model of Employment and Unemployment 1089


65.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1089
65.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1090
65.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1092
65.4 Dynamics of an Individual Worker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1097
65.5 Endogenous Job Finding Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1099
65.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1106
65.7 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1107

66 Rational Expectations Equilibrium 1117


66.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1117
66.2 Defining Rational Expectations Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1119
66.3 Computation of an Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1122
66.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1124
66.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1126

67 Stability in Linear Rational Expectations Models 1131


67.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1132
67.2 Linear Difference Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1132
67.3 Illustration: Cagan’s Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1134
67.4 Some Python Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1136
67.5 Alternative Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1138
67.6 Another Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1139
67.7 Log money Supply Feeds Back on Log Price Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1142
67.8 Big 𝑃 , Little 𝑝 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1145
67.9 Fun with SymPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1147

68 Markov Perfect Equilibrium 1149


68.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1149
68.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1150
68.3 Linear Markov Perfect Equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1151
68.4 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1154
68.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1157
68.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1160

69 Uncertainty Traps 1167


69.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1167
69.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1168
69.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1171
69.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1172
69.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1173
69.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1175

70 The Aiyagari Model 1181


70.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1181
70.2 The Economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1182
70.3 Firms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1183
70.4 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1184

XI Asset Pricing and Finance 1191
71 Asset Pricing: Finite State Models 1193
71.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1193
71.2 Pricing Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1194
71.3 Prices in the Risk-Neutral Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1195
71.4 Risk Aversion and Asset Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1199
71.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1208
71.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1209

72 Competitive Equilibria with Arrow Securities 1213


72.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1213
72.2 The setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1214
72.3 Recursive Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1215
72.4 State Variable Degeneracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1216
72.5 Markov Asset Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1216
72.6 General Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1218
72.7 Python Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1222
72.8 Finite Horizon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1233

73 Heterogeneous Beliefs and Bubbles 1239


73.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1239
73.2 Structure of the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1240
73.3 Solving the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1242
73.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1247
73.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1248

XII Data and Empirics 1251


74 Pandas for Panel Data 1253
74.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1253
74.2 Slicing and Reshaping Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1254
74.3 Merging Dataframes and Filling NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1259
74.4 Grouping and Summarizing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1265
74.5 Final Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1271
74.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1271
74.7 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1272

75 Linear Regression in Python 1277


75.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1277
75.2 Simple Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1278
75.3 Extending the Linear Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1284
75.4 Endogeneity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1286
75.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1290
75.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1290
75.7 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1291

76 Maximum Likelihood Estimation 1295


76.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1295
76.2 Set Up and Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1296
76.3 Conditional Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1299
76.4 Maximum Likelihood Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1301
76.5 MLE with Numerical Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1303
76.6 Maximum Likelihood Estimation with statsmodels . . . . . . . . . . . . . . . . . . . . . . . . . 1308

76.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1312
76.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1313
76.9 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1313

XIII Auctions 1317


77 First-Price and Second-Price Auctions 1319
77.1 First-Price Sealed-Bid Auction (FPSB) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1319
77.2 Second-Price Sealed-Bid Auction (SPSB) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1320
77.3 Characterization of SPSB Auction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1320
77.4 Uniform Distribution of Private Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1321
77.5 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1321
77.6 First price sealed bid auction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1321
77.7 Second Price Sealed Bid Auction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1322
77.8 Python Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1322
77.9 Revenue Equivalence Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1324
77.10 Calculation of Bid Price in FPSB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1326
77.11 𝜒² Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1327
77.12 5 Code Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1330
77.13 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1334

78 Multiple Good Allocation Mechanisms 1335


78.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1335
78.2 Ascending Bids Auction for Multiple Goods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1335
78.3 A Benevolent Planner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1336
78.4 Equivalence of Allocations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1336
78.5 Ascending Bid Auction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1336
78.6 Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1337
78.7 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1339
78.8 A Python Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1347
78.9 Robustness Checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1356
78.10 A Groves-Clarke Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1368
78.11 An Example Solved by Hand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1369
78.12 Another Python Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1372

XIV Other 1379


79 Troubleshooting 1381
79.1 Fixing Your Local Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1381
79.2 Reporting an Issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1382

80 References 1383

81 Execution Statistics 1385

Bibliography 1387

Index 1395

Quantitative Economics with Python

This website presents a set of lectures on quantitative economic modeling, designed and written by Thomas J. Sargent
and John Stachurski.
For an overview of the series, see this page
• Tools and Techniques
– Geometric Series for Elementary Economics
– Modeling COVID 19
– Linear Algebra
– QR Decomposition
– Complex Numbers and Trigonometry
– Circulant Matrices
– Singular Value Decomposition (SVD)
• Elementary Statistics
– Elementary Probability with Matrices
– Univariate Time Series with Matrix Algebra
– LLN and CLT
– Two Meanings of Probability
– Multivariate Hypergeometric Distribution
– Multivariate Normal Distribution
– Heavy-Tailed Distributions
– Fault Tree Uncertainties
– Introduction to Artificial Neural Networks
– Randomized Response Surveys
– Expected Utilities of Random Responses
• Linear Programming
– Linear Programming
– Optimal Transport
– Von Neumann Growth Model (and a Generalization)
• Introduction to Dynamics
– Dynamics in One Dimension
– AR1 Processes
– Finite Markov Chains
– Inventory Dynamics
– Linear State Space Models
– Samuelson Multiplier-Accelerator
– Kesten Processes and Firm Dynamics
– Wealth Distribution Dynamics


– A First Look at the Kalman Filter


– Shortest Paths
• Search
– Job Search I: The McCall Search Model
– Job Search II: Search and Separation
– Job Search III: Fitted Value Function Iteration
– Job Search IV: Correlated Wage Offers
– Job Search V: Modeling Career Choice
– Job Search VI: On-the-Job Search
• Consumption, Savings and Capital
– Cass-Koopmans Model
– Cass-Koopmans Competitive Equilibrium
– Cake Eating I: Introduction to Optimal Saving
– Cake Eating II: Numerical Methods
– Optimal Growth I: The Stochastic Optimal Growth Model
– Optimal Growth II: Accelerating the Code with Numba
– Optimal Growth III: Time Iteration
– Optimal Growth IV: The Endogenous Grid Method
– The Income Fluctuation Problem I: Basic Model
– The Income Fluctuation Problem II: Stochastic Returns on Assets
• Bayes Law
– Non-Conjugate Priors
– Posterior Distributions for AR(1) Parameters
– Forecasting an AR(1) process
• Information
– Job Search VII: Search with Learning
– Likelihood Ratio Processes
– Computing Mean of a Likelihood Ratio Process
– A Problem that Stumped Milton Friedman
– Exchangeability and Bayesian Updating
– Likelihood Ratio Processes and Bayesian Learning
– Bayesian versus Frequentist Decision Rules
• LQ Control
– LQ Control: Foundations
– Lagrangian for LQ Control
– Eliminating Cross Products


– The Permanent Income Model


– Permanent Income II: LQ Techniques
– Production Smoothing via Inventories
• Multiple Agent Models
– Schelling’s Segregation Model
– A Lake Model of Employment and Unemployment
– Rational Expectations Equilibrium
– Stability in Linear Rational Expectations Models
– Markov Perfect Equilibrium
– Uncertainty Traps
– The Aiyagari Model
• Asset Pricing and Finance
– Asset Pricing: Finite State Models
– Competitive Equilibria with Arrow Securities
– Heterogeneous Beliefs and Bubbles
• Data and Empirics
– Pandas for Panel Data
– Linear Regression in Python
– Maximum Likelihood Estimation
• Auctions
– First-Price and Second-Price Auctions
– Multiple Good Allocation Mechanisms
• Other
– Troubleshooting
– References
– Execution Statistics

Previous website
While this new site will receive all future updates, you may still view the old site here for the next month.

Part I

Tools and Techniques

CHAPTER

ONE

GEOMETRIC SERIES FOR ELEMENTARY ECONOMICS

Contents

• Geometric Series for Elementary Economics


– Overview
– Key Formulas
– Example: The Money Multiplier in Fractional Reserve Banking
– Example: The Keynesian Multiplier
– Example: Interest Rates and Present Values
– Back to the Keynesian Multiplier

1.1 Overview

The lecture describes important ideas in economics that use the mathematics of geometric series.
Among these are
• the Keynesian multiplier
• the money multiplier that prevails in fractional reserve banking systems
• interest rates and present values of streams of payouts from assets
(As we shall see below, the term multiplier comes down to meaning sum of a convergent geometric series)
These and other applications prove the truth of the wise crack that
“in economics, a little knowledge of geometric series goes a long way”
Below we’ll use the following imports:

%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
import sympy as sym
from sympy import init_printing, latex
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D


1.2 Key Formulas

To start, let 𝑐 be a real number that lies strictly between −1 and 1.


• We often write this as 𝑐 ∈ (−1, 1).
• Here (−1, 1) denotes the collection of all real numbers that are strictly less than 1 and strictly greater than −1.
• The symbol ∈ means in or belongs to the set after the symbol.
We want to evaluate geometric series of two types – infinite and finite.

1.2.1 Infinite Geometric Series

The first type of geometric series that interests us is the infinite series

1 + 𝑐 + 𝑐² + 𝑐³ + ⋯

where ⋯ means that the series continues without end.


The key formula is

1 + 𝑐 + 𝑐² + 𝑐³ + ⋯ = 1/(1 − 𝑐)    (1.1)

To prove key formula (1.1), multiply both sides by (1 − 𝑐) and verify that if 𝑐 ∈ (−1, 1), then the outcome is the equation 1 = 1.
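As a quick numerical sanity check on (1.1), we can compare a long but finite partial sum with the closed form; the value 𝑐 = 0.9 and the truncation point below are arbitrary choices for illustration.

c = 0.9                                          # any value strictly between -1 and 1
partial_sum = sum(c**k for k in range(1_000))    # long but finite version of the series
print(partial_sum, 1 / (1 - c))                  # both are approximately 10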

1.2.2 Finite Geometric Series

The second series that interests us is the finite geometric series

1 + 𝑐 + 𝑐² + 𝑐³ + ⋯ + 𝑐^𝑇

where 𝑇 is a positive integer.


The key formula here is

1 + 𝑐 + 𝑐² + 𝑐³ + ⋯ + 𝑐^𝑇 = (1 − 𝑐^(𝑇+1)) / (1 − 𝑐)
Remark: The above formula works for any value of the scalar 𝑐. We don’t have to restrict 𝑐 to be in the set (−1, 1).
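The finite-sum formula is just as easy to verify numerically, and here 𝑐 need not lie in (−1, 1); the values below are arbitrary.

c, T = 1.5, 10
lhs = sum(c**t for t in range(T + 1))     # direct summation
rhs = (1 - c**(T + 1)) / (1 - c)          # closed-form expression
print(lhs, rhs)                           # the two numbers agree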
We now move on to describe some famous economic applications of geometric series.

1.3 Example: The Money Multiplier in Fractional Reserve Banking

In a fractional reserve banking system, banks hold only a fraction 𝑟 ∈ (0, 1) of cash behind each deposit receipt that
they issue
• In recent times
– cash consists of pieces of paper issued by the government and called dollars or pounds or …
– a deposit is a balance in a checking or savings account that entitles the owner to ask the bank for immediate
payment in cash


• When the UK and France and the US were on either a gold or silver standard (before 1914, for example)
– cash was a gold or silver coin
– a deposit receipt was a bank note that the bank promised to convert into gold or silver on demand; (sometimes
it was also a checking or savings account balance)
Economists and financiers often define the supply of money as an economy-wide sum of cash plus deposits.
In a fractional reserve banking system (one in which the reserve ratio 𝑟 satisfies 0 < 𝑟 < 1), banks create money by
issuing deposits backed by fractional reserves plus loans that they make to their customers.
A geometric series is a key tool for understanding how banks create money (i.e., deposits) in a fractional reserve system.
The geometric series formula (1.1) is at the heart of the classic model of the money creation process – one that leads us
to the celebrated money multiplier.

1.3.1 A Simple Model

There is a set of banks named 𝑖 = 0, 1, 2, ….


Bank 𝑖’s loans 𝐿𝑖 , deposits 𝐷𝑖 , and reserves 𝑅𝑖 must satisfy the balance sheet equation (because balance sheets balance):

𝐿𝑖 + 𝑅𝑖 = 𝐷𝑖    (1.2)

The left side of the above equation is the sum of the bank’s assets, namely, the loans 𝐿𝑖 it has outstanding plus its reserves
of cash 𝑅𝑖 .
The right side records bank 𝑖’s liabilities, namely, the deposits 𝐷𝑖 held by its depositors; these are IOU’s from the bank to
its depositors in the form of either checking accounts or savings accounts (or before 1914, bank notes issued by a bank
stating promises to redeem note for gold or silver on demand).
Each bank 𝑖 sets its reserves to satisfy the equation

𝑅𝑖 = 𝑟𝐷𝑖 (1.3)

where 𝑟 ∈ (0, 1) is its reserve-deposit ratio or reserve ratio for short


• the reserve ratio is either set by a government or chosen by banks for precautionary reasons
Next we add a theory stating that bank 𝑖 + 1’s deposits depend entirely on loans made by bank 𝑖, namely

𝐷𝑖+1 = 𝐿𝑖 (1.4)

Thus, we can think of the banks as being arranged along a line with loans from bank 𝑖 being immediately deposited in
𝑖+1
• in this way, the debtors to bank 𝑖 become creditors of bank 𝑖 + 1
Finally, we add an initial condition about an exogenous level of bank 0’s deposits

𝐷0 is given exogenously

We can think of 𝐷0 as being the amount of cash that a first depositor put into the first bank in the system, bank number
𝑖 = 0.
Now we do a little algebra.
Combining equations (1.2) and (1.3) tells us that

𝐿𝑖 = (1 − 𝑟)𝐷𝑖 (1.5)


This states that bank 𝑖 loans a fraction (1 − 𝑟) of its deposits and keeps a fraction 𝑟 as cash reserves.
Combining equation (1.5) with equation (1.4) tells us that

𝐷𝑖+1 = (1 − 𝑟)𝐷𝑖 for 𝑖 ≥ 0

which implies that

𝐷𝑖 = (1 − 𝑟)^𝑖 𝐷0 for 𝑖 ≥ 0    (1.6)

Equation (1.6) expresses 𝐷𝑖 as the 𝑖 th term in the product of 𝐷0 and the geometric series

1, (1 − 𝑟), (1 − 𝑟)², ⋯

Therefore, the sum of all deposits in our banking system 𝑖 = 0, 1, 2, … is

∑_{𝑖=0}^∞ (1 − 𝑟)^𝑖 𝐷0 = 𝐷0 / (1 − (1 − 𝑟)) = 𝐷0 / 𝑟    (1.7)

1.3.2 Money Multiplier

The money multiplier is a number that tells the multiplicative factor by which an exogenous injection of cash into bank 0 leads to an increase in the total deposits in the banking system.
Equation (1.7) asserts that the money multiplier is 1/𝑟.
• An initial deposit of cash of 𝐷0 in bank 0 leads the banking system to create total deposits of 𝐷0/𝑟.
• The initial deposit 𝐷0 is held as reserves, distributed throughout the banking system according to 𝐷0 = ∑_{𝑖=0}^∞ 𝑅𝑖.
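The money creation process can also be simulated directly by iterating on (1.4) and (1.5); the reserve ratio and initial deposit below are hypothetical values chosen only for illustration.

r = 0.1       # hypothetical reserve ratio
D0 = 100      # hypothetical initial cash deposit in bank 0

total_deposits = 0
D = D0
for i in range(600):           # banks i = 0, 1, 2, ... (truncated)
    total_deposits += D
    D = (1 - r) * D            # bank i's loans become bank i+1's deposits
print(total_deposits, D0 / r)  # both are approximately 1000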

1.4 Example: The Keynesian Multiplier

The famous economist John Maynard Keynes and his followers created a simple model intended to determine national
income 𝑦 in circumstances in which
• there are substantial unemployed resources, in particular excess supply of labor and capital
• prices and interest rates fail to adjust to make aggregate supply equal demand (e.g., prices and interest rates are
frozen)
• national income is entirely determined by aggregate demand

1.4.1 Static Version

An elementary Keynesian model of national income determination consists of three equations that describe aggregate
demand for 𝑦 and its components.
The first equation is a national income identity asserting that consumption 𝑐 plus investment 𝑖 equals national income 𝑦:

𝑐+𝑖=𝑦

The second equation is a Keynesian consumption function asserting that people consume a fraction 𝑏 ∈ (0, 1) of their
income:

𝑐 = 𝑏𝑦


The fraction 𝑏 ∈ (0, 1) is called the marginal propensity to consume.


The fraction 1 − 𝑏 ∈ (0, 1) is called the marginal propensity to save.
The third equation simply states that investment is exogenous at level 𝑖.
• exogenous means determined outside this model.
Substituting the second equation into the first gives (1 − 𝑏)𝑦 = 𝑖.
Solving this equation for 𝑦 gives

𝑦 = (1/(1 − 𝑏)) 𝑖

The quantity 1/(1 − 𝑏) is called the investment multiplier or simply the multiplier.
Applying the formula for the sum of an infinite geometric series, we can write the above equation as

𝑦 = 𝑖 ∑_{𝑡=0}^∞ 𝑏^𝑡

where 𝑡 is a nonnegative integer.
So we arrive at the following equivalent expressions for the multiplier:

1/(1 − 𝑏) = ∑_{𝑡=0}^∞ 𝑏^𝑡

The expression ∑_{𝑡=0}^∞ 𝑏^𝑡 motivates an interpretation of the multiplier as the outcome of a dynamic process that we describe next.
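Before turning to a dynamic version, here is a small numerical illustration of these equivalent expressions for output; the values of 𝑏 and 𝑖 are arbitrary.

b, i = 0.6, 0.3
y_closed_form = i / (1 - b)                        # output from the multiplier formula
y_series = i * sum(b**t for t in range(200))       # truncated geometric-series version
print(y_closed_form, y_series)                     # both are approximately 0.75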

1.4.2 Dynamic Version

We arrive at a dynamic version by interpreting the nonnegative integer 𝑡 as indexing time and changing our specification
of the consumption function to take time into account
• we add a one-period lag in how income affects consumption
We let 𝑐𝑡 be consumption at time 𝑡 and 𝑖𝑡 be investment at time 𝑡.
We modify our consumption function to assume the form

𝑐𝑡 = 𝑏𝑦𝑡−1

so that 𝑏 is the marginal propensity to consume (now) out of last period’s income.
We begin with an initial condition stating that

𝑦−1 = 0

We also assume that

𝑖𝑡 = 𝑖 for all 𝑡 ≥ 0

so that investment is constant over time.


It follows that

𝑦0 = 𝑖 + 𝑐0 = 𝑖 + 𝑏𝑦−1 = 𝑖


and

𝑦1 = 𝑐1 + 𝑖 = 𝑏𝑦0 + 𝑖 = (1 + 𝑏)𝑖

and

𝑦2 = 𝑐2 + 𝑖 = 𝑏𝑦1 + 𝑖 = (1 + 𝑏 + 𝑏²)𝑖

and more generally

𝑦𝑡 = 𝑏𝑦𝑡−1 + 𝑖 = (1 + 𝑏 + 𝑏² + ⋯ + 𝑏^𝑡)𝑖

or

𝑦𝑡 = ((1 − 𝑏^(𝑡+1)) / (1 − 𝑏)) 𝑖

Evidently, as 𝑡 → +∞,

𝑦𝑡 → (1/(1 − 𝑏)) 𝑖
Remark 1: The above formula is often applied to assert that an exogenous increase in investment of Δ𝑖 at time 0 ignites
a dynamic process of increases in national income by successive amounts

Δ𝑖, (1 + 𝑏)Δ𝑖, (1 + 𝑏 + 𝑏²)Δ𝑖, ⋯

at times 0, 1, 2, ….
Remark 2 Let 𝑔𝑡 be an exogenous sequence of government expenditures.
If we generalize the model so that the national income identity becomes

𝑐𝑡 + 𝑖𝑡 + 𝑔𝑡 = 𝑦𝑡

then a version of the preceding argument shows that the government expenditures multiplier is also 1/(1 − 𝑏), so that a permanent increase in government expenditures ultimately leads to an increase in national income equal to the multiplier times the increase in government expenditures.

1.5 Example: Interest Rates and Present Values

We can apply our formula for geometric series to study how interest rates affect values of streams of dollar payments that
extend over time.
We work in discrete time and assume that 𝑡 = 0, 1, 2, … indexes time.
We let 𝑟 ∈ (0, 1) be a one-period net nominal interest rate
• if the nominal interest rate is 5 percent, then 𝑟 = .05
A one-period gross nominal interest rate 𝑅 is defined as

𝑅 = 1 + 𝑟 ∈ (1, 2)

• if 𝑟 = .05, then 𝑅 = 1.05


Remark: The gross nominal interest rate 𝑅 is an exchange rate or relative price of dollars between times 𝑡 and 𝑡 + 1.
The units of 𝑅 are dollars at time 𝑡 + 1 per dollar at time 𝑡.
When people borrow and lend, they trade dollars now for dollars later or dollars later for dollars now.
The price at which these exchanges occur is the gross nominal interest rate.


• If I sell 𝑥 dollars to you today, you pay me 𝑅𝑥 dollars tomorrow.


• This means that you borrowed 𝑥 dollars from me at a gross interest rate 𝑅 and a net interest rate 𝑟.
We assume that the net nominal interest rate 𝑟 is fixed over time, so that 𝑅 is the gross nominal interest rate at times
𝑡 = 0, 1, 2, ….
Two important geometric sequences are

1, 𝑅, 𝑅², ⋯    (1.8)

and

1, 𝑅^(−1), 𝑅^(−2), ⋯    (1.9)

Sequence (1.8) tells us how dollar values of an investment accumulate through time.
Sequence (1.9) tells us how to discount future dollars to get their values in terms of today’s dollars.

1.5.1 Accumulation

Geometric sequence (1.8) tells us how one dollar invested and re-invested in a project with gross one period nominal rate
of return accumulates
• here we assume that net interest payments are reinvested in the project
• thus, 1 dollar invested at time 0 pays interest 𝑟 dollars after one period, so we have 𝑟 + 1 = 𝑅 dollars at time 1
• at time 1 we reinvest 1 + 𝑟 = 𝑅 dollars and receive interest of 𝑟𝑅 dollars at time 2 plus the principal 𝑅 dollars, so we receive 𝑟𝑅 + 𝑅 = (1 + 𝑟)𝑅 = 𝑅² dollars at the end of period 2
• and so on
Evidently, if we invest 𝑥 dollars at time 0 and reinvest the proceeds, then the sequence

𝑥, 𝑥𝑅, 𝑥𝑅², ⋯

tells how our account accumulates at dates 𝑡 = 0, 1, 2, ….

1.5.2 Discounting

Geometric sequence (1.9) tells us how much future dollars are worth in terms of today’s dollars.
Remember that the units of 𝑅 are dollars at 𝑡 + 1 per dollar at 𝑡.
It follows that
• the units of 𝑅^(−1) are dollars at 𝑡 per dollar at 𝑡 + 1
• the units of 𝑅^(−2) are dollars at 𝑡 per dollar at 𝑡 + 2
• and so on; the units of 𝑅^(−𝑗) are dollars at 𝑡 per dollar at 𝑡 + 𝑗
So if someone has a claim on 𝑥 dollars at time 𝑡 + 𝑗, it is worth 𝑥𝑅^(−𝑗) dollars at time 𝑡 (e.g., today).
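For instance, here is a minimal computation of what a claim on 𝑥 dollars 𝑗 periods from now is worth today, assuming a net rate of 𝑟 = 0.05.

r = 0.05          # net nominal interest rate (assumed for illustration)
R = 1 + r
x, j = 100, 10    # a claim on 100 dollars payable 10 periods from now
print(x * R**(-j))   # its value in today's dollars, roughly 61.4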


1.5.3 Application to Asset Pricing

A lease requires a payments stream of 𝑥𝑡 dollars at times 𝑡 = 0, 1, 2, … where


𝑥𝑡 = 𝐺^𝑡 𝑥0
where 𝐺 = (1 + 𝑔) and 𝑔 ∈ (0, 1).
Thus, lease payments increase at 𝑔 percent per period.
For a reason soon to be revealed, we assume that 𝐺 < 𝑅.
The present value of the lease is

𝑝0 = 𝑥0 + 𝑥1/𝑅 + 𝑥2/𝑅² + ⋯
   = 𝑥0 (1 + 𝐺𝑅^(−1) + 𝐺²𝑅^(−2) + ⋯)
   = 𝑥0 · 1/(1 − 𝐺𝑅^(−1))

where the last line uses the formula for an infinite geometric series.
Recall that 𝑅 = 1 + 𝑟 and 𝐺 = 1 + 𝑔 and that 𝑅 > 𝐺 and 𝑟 > 𝑔 and that 𝑟 and 𝑔 are typically small numbers, e.g., .05 or .03.
Use the Taylor series of 1/(1 + 𝑟) about 𝑟 = 0, namely,

1/(1 + 𝑟) = 1 − 𝑟 + 𝑟² − 𝑟³ + ⋯

and the fact that 𝑟 is small to approximate 1/(1 + 𝑟) ≈ 1 − 𝑟.
Use this approximation to write 𝑝0 as

𝑝0 = 𝑥0 · 1/(1 − 𝐺𝑅^(−1))
   = 𝑥0 · 1/(1 − (1 + 𝑔)(1 − 𝑟))
   = 𝑥0 · 1/(1 − (1 + 𝑔 − 𝑟 − 𝑟𝑔))
   ≈ 𝑥0 · 1/(𝑟 − 𝑔)

where the last step uses the approximation 𝑟𝑔 ≈ 0.
The approximation

𝑝0 = 𝑥0/(𝑟 − 𝑔)

is known as the Gordon formula for the present value or current price of an infinite payment stream 𝑥0 𝐺^𝑡 when the nominal one-period interest rate is 𝑟 and when 𝑟 > 𝑔.
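To get a rough feel for the quality of the Gordon approximation, we can compare it with the exact value 𝑥0/(1 − 𝐺𝑅^(−1)) of the infinite lease; the parameter values below are illustrative only.

x_0, r, g = 1.0, 0.05, 0.03
G, R = 1 + g, 1 + r
exact = x_0 / (1 - G / R)     # exact present value of the infinite lease
gordon = x_0 / (r - g)        # Gordon approximation
print(exact, gordon)          # approximately 52.5 versus 50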
We can also extend the asset pricing formula so that it applies to finite leases.
Let the payment stream on the lease now be 𝑥𝑡 for 𝑡 = 1, 2, … , 𝑇, where again

𝑥𝑡 = 𝐺^𝑡 𝑥0

The present value of this lease is:

𝑝0 = 𝑥0 + 𝑥1/𝑅 + ⋯ + 𝑥𝑇 /𝑅^𝑇
   = 𝑥0 (1 + 𝐺𝑅^(−1) + ⋯ + 𝐺^𝑇 𝑅^(−𝑇))
   = 𝑥0 (1 − 𝐺^(𝑇+1) 𝑅^(−(𝑇+1))) / (1 − 𝐺𝑅^(−1))


Applying the Taylor series to 𝑅^(−(𝑇+1)) about 𝑟 = 0 we get:

1/(1 + 𝑟)^(𝑇+1) = 1 − 𝑟(𝑇 + 1) + (1/2) 𝑟²(𝑇 + 1)(𝑇 + 2) + ⋯ ≈ 1 − 𝑟(𝑇 + 1)

Similarly, applying the Taylor series to 𝐺^(𝑇+1) about 𝑔 = 0:

(1 + 𝑔)^(𝑇+1) = 1 + (𝑇 + 1)𝑔(1 + 𝑔)^𝑇 + (𝑇 + 1)𝑇 𝑔²(1 + 𝑔)^(𝑇−1) + ⋯ ≈ 1 + (𝑇 + 1)𝑔

Thus, we get the following approximation:

𝑝0 = 𝑥0 (1 − (1 + (𝑇 + 1)𝑔)(1 − 𝑟(𝑇 + 1))) / (1 − (1 − 𝑟)(1 + 𝑔))

Expanding:

𝑝0 = 𝑥0 ((𝑇 + 1)²𝑟𝑔 + 𝑟(𝑇 + 1) − 𝑔(𝑇 + 1)) / (𝑟 − 𝑔 + 𝑟𝑔)
   = 𝑥0 (𝑇 + 1)((𝑇 + 1)𝑟𝑔 + 𝑟 − 𝑔) / (𝑟 − 𝑔 + 𝑟𝑔)
   ≈ 𝑥0 (𝑇 + 1)(𝑟 − 𝑔)/(𝑟 − 𝑔) + 𝑥0 𝑟𝑔(𝑇 + 1)/(𝑟 − 𝑔)
   = 𝑥0 (𝑇 + 1) + 𝑥0 𝑟𝑔(𝑇 + 1)/(𝑟 − 𝑔)

We could have also approximated by removing the second term 𝑥0 𝑟𝑔(𝑇 + 1)/(𝑟 − 𝑔) when 𝑇 is relatively small compared to 1/(𝑟𝑔) to get 𝑥0 (𝑇 + 1) as in the finite stream approximation.
We will plot the true finite stream present-value and the two approximations, under different values of 𝑇 , and 𝑔 and 𝑟 in
Python.
First we plot the true finite stream present-value after computing it below

# True present value of a finite lease


def finite_lease_pv_true(T, g, r, x_0):
G = (1 + g)
R = (1 + r)
return (x_0 * (1 - G**(T + 1) * R**(-T - 1))) / (1 - G * R**(-1))
# First approximation for our finite lease

def finite_lease_pv_approx_1(T, g, r, x_0):


p = x_0 * (T + 1) + x_0 * r * g * (T + 1) / (r - g)
return p

# Second approximation for our finite lease


def finite_lease_pv_approx_2(T, g, r, x_0):
return (x_0 * (T + 1))

# Infinite lease
def infinite_lease(g, r, x_0):
G = (1 + g)
R = (1 + r)
return x_0 / (1 - G * R**(-1))

Now that we have defined our functions, we can plot some outcomes.
First we study the quality of our approximations


def plot_function(axes, x_vals, func, args):


axes.plot(x_vals, func(*args), label=func.__name__)

T_max = 50

T = np.arange(0, T_max+1)
g = 0.02
r = 0.03
x_0 = 1

our_args = (T, g, r, x_0)


funcs = [finite_lease_pv_true,
finite_lease_pv_approx_1,
finite_lease_pv_approx_2]
## the three functions we want to compare

fig, ax = plt.subplots()
ax.set_title('Finite Lease Present Value $T$ Periods Ahead')
for f in funcs:
plot_function(ax, T, f, our_args)
ax.legend()
ax.set_xlabel('$T$ Periods Ahead')
ax.set_ylabel('Present Value, $p_0$')
plt.show()

Evidently our approximations perform well for small values of 𝑇 .


However, holding 𝑔 and 𝑟 fixed, our approximations deteriorate as 𝑇 increases.
Next we compare the infinite and finite duration lease present values over different lease lengths 𝑇 .

# Convergence of infinite and finite


T_max = 1000
T = np.arange(0, T_max+1)
fig, ax = plt.subplots()
ax.set_title('Infinite and Finite Lease Present Value $T$ Periods Ahead')
f_1 = finite_lease_pv_true(T, g, r, x_0)
f_2 = np.full(T_max+1, infinite_lease(g, r, x_0))
ax.plot(T, f_1, label='T-period lease PV')
ax.plot(T, f_2, '--', label='Infinite lease PV')
ax.set_xlabel('$T$ Periods Ahead')
ax.set_ylabel('Present Value, $p_0$')
ax.legend()
plt.show()

The graph above shows how as duration 𝑇 → +∞, the value of a lease of duration 𝑇 approaches the value of a perpetual
lease.
Now we consider two different views of what happens as 𝑟 and 𝑔 covary

# First view
# Changing r and g
fig, ax = plt.subplots()
ax.set_title('Value of lease of length $T$')
ax.set_ylabel('Present Value, $p_0$')
ax.set_xlabel('$T$ periods ahead')
T_max = 10
T=np.arange(0, T_max+1)

rs, gs = (0.9, 0.5, 0.4001, 0.4), (0.4, 0.4, 0.4, 0.5),


comparisons = ('$\gg$', '$>$', r'$\approx$', '$<$')
for r, g, comp in zip(rs, gs, comparisons):
ax.plot(finite_lease_pv_true(T, g, r, x_0), label=f'r(={r}) {comp} g(={g})')

ax.legend()
plt.show()


This graph gives a big hint for why the condition 𝑟 > 𝑔 is necessary if a lease of length 𝑇 = +∞ is to have finite value.
For fans of 3-d graphs the same point comes through in the following graph.
If you aren’t enamored of 3-d graphs, feel free to skip the next visualization!

# Second view
fig = plt.figure()
T = 3
ax = fig.gca(projection='3d')
r = np.arange(0.01, 0.99, 0.005)
g = np.arange(0.011, 0.991, 0.005)

rr, gg = np.meshgrid(r, g)
z = finite_lease_pv_true(T, gg, rr, x_0)

# Removes points where undefined


same = (rr == gg)
z[same] = np.nan
surf = ax.plot_surface(rr, gg, z, cmap=cm.coolwarm,
antialiased=True, clim=(0, 15))
fig.colorbar(surf, shrink=0.5, aspect=5)
ax.set_xlabel('$r$')
ax.set_ylabel('$g$')
ax.set_zlabel('Present Value, $p_0$')
ax.view_init(20, 10)
ax.set_title('Three Period Lease PV with Varying $g$ and $r$')
plt.show()

/tmp/ipykernel_11195/2419678664.py:4: MatplotlibDeprecationWarning: Calling gca() with keyword arguments was deprecated in Matplotlib 3.4. Starting two minor releases later, gca() will take no keyword arguments. The gca() function should only be used to get the current axes, or if no axes exist, create new axes with default keyword arguments. To create a new axes with non-default arguments, use plt.axes() or plt.subplot().
  ax = fig.gca(projection='3d')


We can use a little calculus to study how the present value 𝑝0 of a lease varies with 𝑟 and 𝑔.
We will use a library called SymPy.
SymPy enables us to do symbolic math calculations including computing derivatives of algebraic equations.
We will illustrate how it works by creating a symbolic expression that represents our present value formula for an infinite
lease.
After that, we’ll use SymPy to compute derivatives

# Creates algebraic symbols that can be used in an algebraic expression


g, r, x0 = sym.symbols('g, r, x0')
G = (1 + g)
R = (1 + r)
p0 = x0 / (1 - G * R**(-1))
init_printing(use_latex='mathjax')
print('Our formula is:')
p0

Our formula is:

𝑥0 / (− (𝑔 + 1)/(𝑟 + 1) + 1)

print('dp0 / dg is:')
dp_dg = sym.diff(p0, g)
dp_dg


dp0 / dg is:

𝑥0 / ((𝑟 + 1)(− (𝑔 + 1)/(𝑟 + 1) + 1)²)

print('dp0 / dr is:')
dp_dr = sym.diff(p0, r)
dp_dr

dp0 / dr is:

− 𝑥0 (𝑔 + 1) / ((𝑟 + 1)²(− (𝑔 + 1)/(𝑟 + 1) + 1)²)

We can see that 𝜕𝑝0/𝜕𝑟 < 0 as long as 𝑟 > 𝑔, 𝑟 > 0 and 𝑔 > 0 and 𝑥0 is positive, so 𝜕𝑝0/𝜕𝑟 will always be negative.
Similarly, 𝜕𝑝0/𝜕𝑔 > 0 as long as 𝑟 > 𝑔, 𝑟 > 0 and 𝑔 > 0 and 𝑥0 is positive, so 𝜕𝑝0/𝜕𝑔 will always be positive.

1.6 Back to the Keynesian Multiplier

We will now go back to the case of the Keynesian multiplier and plot the time path of 𝑦𝑡 , given that consumption is a
constant fraction of national income, and investment is fixed.

# Function that calculates a path of y


def calculate_y(i, b, g, T, y_init):
y = np.zeros(T+1)
y[0] = i + b * y_init + g
for t in range(1, T+1):
y[t] = b * y[t-1] + i + g
return y

# Initial values
i_0 = 0.3
g_0 = 0.3
# 2/3 of income goes towards consumption
b = 2/3
y_init = 0
T = 100

fig, ax = plt.subplots()
ax.set_title('Path of Aggregate Output Over Time')
ax.set_xlabel('$t$')
ax.set_ylabel('$y_t$')
ax.plot(np.arange(0, T+1), calculate_y(i_0, b, g_0, T, y_init))
# Output predicted by geometric series
ax.hlines(i_0 / (1 - b) + g_0 / (1 - b), xmin=-1, xmax=101, linestyles='--')
plt.show()


In this model, income grows over time, until it gradually converges to the infinite geometric series sum of income.
We now examine what will happen if we vary the so-called marginal propensity to consume, i.e., the fraction of income
that is consumed

bs = (1/3, 2/3, 5/6, 0.9)

fig,ax = plt.subplots()
ax.set_title('Changing Consumption as a Fraction of Income')
ax.set_ylabel('$y_t$')
ax.set_xlabel('$t$')
x = np.arange(0, T+1)
for b in bs:
y = calculate_y(i_0, b, g_0, T, y_init)
ax.plot(x, y, label=r'$b=$'+f"{b:.2f}")
ax.legend()
plt.show()


Increasing the marginal propensity to consume 𝑏 increases the path of output over time.
Now we will compare the effects on output of increases in investment and government spending.

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(6, 10))


fig.subplots_adjust(hspace=0.3)

x = np.arange(0, T+1)
values = [0.3, 0.4]

for i in values:
y = calculate_y(i, b, g_0, T, y_init)
ax1.plot(x, y, label=f"i={i}")
for g in values:
y = calculate_y(i_0, b, g, T, y_init)
ax2.plot(x, y, label=f"g={g}")

axes = ax1, ax2


param_labels = "Investment", "Government Spending"
for ax, param in zip(axes, param_labels):
ax.set_title(f'An Increase in {param} on Output')
ax.legend(loc ="lower right")
ax.set_ylabel('$y_t$')
ax.set_xlabel('$t$')
plt.show()


Notice here, whether government spending increases from 0.3 to 0.4 or investment increases from 0.3 to 0.4, the shifts
in the graphs are identical.



CHAPTER

TWO

MODELING COVID 19

Contents

• Modeling COVID 19
– Overview
– The SIR Model
– Implementation
– Experiments
– Ending Lockdown

2.1 Overview

This is a Python version of the code for analyzing the COVID-19 pandemic provided by Andrew Atkeson.
See, in particular
• NBER Working Paper No. 26867
• COVID-19 Working papers and code
The purpose of his notes is to introduce economists to quantitative modeling of infectious disease dynamics.
Dynamics are modeled using a standard SIR (Susceptible-Infected-Removed) model of disease spread.
The model dynamics are represented by a system of ordinary differential equations.
The main objective is to study the impact of suppression through social distancing on the spread of the infection.
The focus is on US outcomes but the parameters can be adjusted to study other countries.
We will use the following standard imports:

%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
from numpy import exp

We will also use SciPy’s numerical routine odeint for solving differential equations.


from scipy.integrate import odeint

This routine calls into compiled code from the FORTRAN library odepack.

2.2 The SIR Model

In the version of the SIR model we will analyze there are four states.
All individuals in the population are assumed to be in one of these four states.
The states are: susceptible (S), exposed (E), infected (I) and removed (R).
Comments:
• Those in state R have been infected and either recovered or died.
• Those who have recovered are assumed to have acquired immunity.
• Those in the exposed group are not yet infectious.

2.2.1 Time Path

The flow across states follows the path 𝑆 → 𝐸 → 𝐼 → 𝑅.


All individuals in the population are eventually infected when the transmission rate is positive and 𝑖(0) > 0.
The interest is primarily in
• the number of infections at a given time (which determines whether or not the health care system is overwhelmed)
and
• how long the caseload can be deferred (hopefully until a vaccine arrives)
Using lower case letters for the fraction of the population in each state, the dynamics are

ṡ(𝑡) = −𝛽(𝑡) 𝑠(𝑡) 𝑖(𝑡)
ė(𝑡) = 𝛽(𝑡) 𝑠(𝑡) 𝑖(𝑡) − 𝜎𝑒(𝑡)        (2.1)
ı̇(𝑡) = 𝜎𝑒(𝑡) − 𝛾𝑖(𝑡)

In these equations,
• 𝛽(𝑡) is called the transmission rate (the rate at which individuals bump into others and expose them to the virus).
• 𝜎 is called the infection rate (the rate at which those who are exposed become infected)
• 𝛾 is called the recovery rate (the rate at which infected people recover or die).
• the dot symbol ẏ represents the time derivative 𝑑𝑦/𝑑𝑡.
We do not need to model the fraction 𝑟 of the population in state 𝑅 separately because the states form a partition.
In particular, the “removed” fraction of the population is 𝑟 = 1 − 𝑠 − 𝑒 − 𝑖.
We will also track 𝑐 = 𝑖 + 𝑟, which is the cumulative caseload (i.e., all those who have or have had the infection).
The system (2.1) can be written in vector form as

𝑥̇ = 𝐹 (𝑥, 𝑡), 𝑥 ∶= (𝑠, 𝑒, 𝑖) (2.2)

for suitable definition of 𝐹 (see the code below).


2.2.2 Parameters

Both 𝜎 and 𝛾 are thought of as fixed, biologically determined parameters.


As in Atkeson’s note, we set
• 𝜎 = 1/5.2 to reflect an average incubation period of 5.2 days.
• 𝛾 = 1/18 to match an average illness duration of 18 days.
The transmission rate is modeled as
• 𝛽(𝑡) ∶= 𝑅(𝑡)𝛾 where 𝑅(𝑡) is the effective reproduction number at time 𝑡.
(The notation is slightly confusing, since 𝑅(𝑡) is different to 𝑅, the symbol that represents the removed state.)

2.3 Implementation

First we set the population size to match the US.

pop_size = 3.3e8

Next we fix parameters as described above.

γ = 1 / 18
σ = 1 / 5.2

Now we construct a function that represents 𝐹 in (2.2)

def F(x, t, R0=1.6):


"""
Time derivative of the state vector.

* x is the state vector (array_like)


* t is time (scalar)
* R0 is the effective transmission rate, defaulting to a constant

"""
s, e, i = x

# New exposure of susceptibles


β = R0(t) * γ if callable(R0) else R0 * γ
ne = β * s * i

# Time derivatives
ds = - ne
de = ne - σ * e
di = σ * e - γ * i

return ds, de, di

Note that R0 can be either constant or a given function of time.


The initial conditions are set to


# initial conditions of s, e, i
i_0 = 1e-7
e_0 = 4 * i_0
s_0 = 1 - i_0 - e_0

In vector form the initial condition is

x_0 = s_0, e_0, i_0

We solve for the time path numerically using odeint, at a sequence of dates t_vec.

def solve_path(R0, t_vec, x_init=x_0):


"""
Solve for i(t) and c(t) via numerical integration,
given the time path for R0.

"""
G = lambda x, t: F(x, t, R0)
s_path, e_path, i_path = odeint(G, x_init, t_vec).transpose()

c_path = 1 - s_path - e_path # cumulative cases


return i_path, c_path

2.4 Experiments

Let’s run some experiments using this code.


The time period we investigate will be 550 days, or around 18 months:

t_length = 550
grid_size = 1000
t_vec = np.linspace(0, t_length, grid_size)

2.4.1 Experiment 1: Constant R0 Case

Let’s start with the case where R0 is constant.


We calculate the time path of infected people under different assumptions for R0:

R0_vals = np.linspace(1.6, 3.0, 6)


labels = [f'$R0 = {r:.2f}$' for r in R0_vals]
i_paths, c_paths = [], []

for r in R0_vals:
i_path, c_path = solve_path(r, t_vec)
i_paths.append(i_path)
c_paths.append(c_path)

Here’s some code to plot the time paths.


def plot_paths(paths, labels, times=t_vec):

fig, ax = plt.subplots()

for path, label in zip(paths, labels):


ax.plot(times, path, label=label)

ax.legend(loc='upper left')

plt.show()

Let’s plot current cases as a fraction of the population.

plot_paths(i_paths, labels)

As expected, lower effective transmission rates defer the peak of infections.


They also lead to a lower peak in current cases.
Here are cumulative cases, as a fraction of population:

plot_paths(c_paths, labels)


2.4.2 Experiment 2: Changing Mitigation

Let’s look at a scenario where mitigation (e.g., social distancing) is successively imposed.
Here’s a specification for R0 as a function of time.

def R0_mitigating(t, r0=3, η=1, r_bar=1.6):


R0 = r0 * exp(- η * t) + (1 - exp(- η * t)) * r_bar
return R0

The idea is that R0 starts off at 3 and falls to 1.6.


This is due to progressive adoption of stricter mitigation measures.
The parameter η controls the rate, or the speed at which restrictions are imposed.
We consider several different rates:

η_vals = 1/5, 1/10, 1/20, 1/50, 1/100


labels = [fr'$\eta = {η:.2f}$' for η in η_vals]

This is what the time path of R0 looks like at these alternative rates:

fig, ax = plt.subplots()

for η, label in zip(η_vals, labels):


ax.plot(t_vec, R0_mitigating(t_vec, η=η), label=label)

ax.legend()
plt.show()


Let’s calculate the time path of infected people:

i_paths, c_paths = [], []

for η in η_vals:
R0 = lambda t: R0_mitigating(t, η=η)
i_path, c_path = solve_path(R0, t_vec)
i_paths.append(i_path)
c_paths.append(c_path)

These are current cases under the different scenarios:

plot_paths(i_paths, labels)

Here are cumulative cases, as a fraction of population:

plot_paths(c_paths, labels)


2.5 Ending Lockdown

The following replicates additional results by Andrew Atkeson on the timing of lifting lockdown.
Consider these two mitigation scenarios:
1. 𝑅𝑡 = 0.5 for 30 days and then 𝑅𝑡 = 2 for the remaining 17 months. This corresponds to lifting lockdown in 30
days.
2. 𝑅𝑡 = 0.5 for 120 days and then 𝑅𝑡 = 2 for the remaining 14 months. This corresponds to lifting lockdown in 4
months.
The parameters considered here start the model with 25,000 active infections and 75,000 agents already exposed to the
virus and thus soon to be contagious.

# initial conditions
i_0 = 25_000 / pop_size
e_0 = 75_000 / pop_size
s_0 = 1 - i_0 - e_0
x_0 = s_0, e_0, i_0

Let’s calculate the paths:

R0_paths = (lambda t: 0.5 if t < 30 else 2,


lambda t: 0.5 if t < 120 else 2)

labels = [f'scenario {i}' for i in (1, 2)]

i_paths, c_paths = [], []

for R0 in R0_paths:
i_path, c_path = solve_path(R0, t_vec, x_init=x_0)
i_paths.append(i_path)
c_paths.append(c_path)

Here is the number of active infections:


plot_paths(i_paths, labels)

What kind of mortality can we expect under these scenarios?


Suppose that 1% of cases result in death

ν = 0.01

This is the cumulative number of deaths:

paths = [path * ν * pop_size for path in c_paths]


plot_paths(paths, labels)

This is the daily death rate:

paths = [path * ν * γ * pop_size for path in i_paths]


plot_paths(paths, labels)


Pushing the peak of the curve further into the future may reduce cumulative deaths if a vaccine is found.



CHAPTER

THREE

LINEAR ALGEBRA

Contents

• Linear Algebra
– Overview
– Vectors
– Matrices
– Solving Systems of Equations
– Eigenvalues and Eigenvectors
– Further Topics
– Exercises
– Solutions

3.1 Overview

Linear algebra is one of the most useful branches of applied mathematics for economists to invest in.
For example, many applied problems in economics and finance require the solution of a linear system of equations, such
as
𝑦1 = 𝑎𝑥1 + 𝑏𝑥2
𝑦2 = 𝑐𝑥1 + 𝑑𝑥2

or, more generally,

𝑦1 = 𝑎11 𝑥1 + 𝑎12 𝑥2 + ⋯ + 𝑎1𝑘 𝑥𝑘


⋮ (3.1)
𝑦𝑛 = 𝑎𝑛1 𝑥1 + 𝑎𝑛2 𝑥2 + ⋯ + 𝑎𝑛𝑘 𝑥𝑘

The objective here is to solve for the “unknowns” 𝑥1 , … , 𝑥𝑘 given 𝑎11 , … , 𝑎𝑛𝑘 and 𝑦1 , … , 𝑦𝑛 .
When considering such problems, it is essential that we first consider at least some of the following questions
• Does a solution actually exist?
• Are there in fact many solutions, and if so how should we interpret them?
• If no solution exists, is there a best “approximate” solution?


• If a solution exists, how should we compute it?


These are the kinds of topics addressed by linear algebra.
In this lecture we will cover the basics of linear and matrix algebra, treating both theory and computation.
We admit some overlap with this lecture, where operations on NumPy arrays were first explained.
Note that this lecture is more theoretical than most, and contains background material that will be used in applications as
we go along.
Let’s start with some imports:

%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
from scipy.interpolate import interp2d
from scipy.linalg import inv, solve, det, eig

3.2 Vectors

A vector of length 𝑛 is just a sequence (or array, or tuple) of 𝑛 numbers, which we write as 𝑥 = (𝑥1 , … , 𝑥𝑛 ) or 𝑥 =
[𝑥1 , … , 𝑥𝑛 ].
We will write these sequences either horizontally or vertically as we please.
(Later, when we wish to perform certain matrix operations, it will become necessary to distinguish between the two)
The set of all 𝑛-vectors is denoted by ℝ𝑛 .
For example, ℝ2 is the plane, and a vector in ℝ2 is just a point in the plane.
Traditionally, vectors are represented visually as arrows from the origin to the point.
The following figure represents three vectors in this manner

fig, ax = plt.subplots(figsize=(10, 8))


# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')

ax.set(xlim=(-5, 5), ylim=(-5, 5))


ax.grid()
vecs = ((2, 4), (-3, 3), (-4, -3.5))
for v in vecs:
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='blue',
shrink=0,
alpha=0.7,
width=0.5))
ax.text(1.1 * v[0], 1.1 * v[1], str(v))
plt.show()


3.2.1 Vector Operations

The two most common operators for vectors are addition and scalar multiplication, which we now describe.
As a matter of definition, when we add two vectors, we add them element-by-element

$$
x + y =
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} +
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} :=
\begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}
$$

Scalar multiplication is an operation that takes a number 𝛾 and a vector 𝑥 and produces

$$
\gamma x :=
\begin{bmatrix} \gamma x_1 \\ \gamma x_2 \\ \vdots \\ \gamma x_n \end{bmatrix}
$$

Scalar multiplication is illustrated in the next figure

fig, ax = plt.subplots(figsize=(10, 8))


# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')

ax.set(xlim=(-5, 5), ylim=(-5, 5))


x = (2, 2)
ax.annotate('', xy=x, xytext=(0, 0),
arrowprops=dict(facecolor='blue',
shrink=0,
alpha=1,
width=0.5))
ax.text(x[0] + 0.4, x[1] - 0.2, '$x$', fontsize='16')

scalars = (-2, 2)
x = np.array(x)

for s in scalars:
v = s * x
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='red',
shrink=0,
alpha=0.5,
width=0.5))
ax.text(v[0] + 0.4, v[1] - 0.2, f'${s} x$', fontsize='16')
plt.show()


In Python, a vector can be represented as a list or tuple, such as x = (2, 4, 6), but is more commonly represented
as a NumPy array.
One advantage of NumPy arrays is that scalar multiplication and addition have very natural syntax

x = np.ones(3) # Vector of three ones


y = np.array((2, 4, 6)) # Converts tuple (2, 4, 6) into array
x + y

array([3., 5., 7.])

4 * x

array([4., 4., 4.])


3.2.2 Inner Product and Norm

The inner product of vectors 𝑥, 𝑦 ∈ ℝ𝑛 is defined as

𝑥′𝑦 ∶= ∑_{𝑖=1}^𝑛 𝑥𝑖 𝑦𝑖

Two vectors are called orthogonal if their inner product is zero.
The norm of a vector 𝑥 represents its “length” (i.e., its distance from the zero vector) and is defined as

‖𝑥‖ ∶= √(𝑥′𝑥) ∶= (∑_{𝑖=1}^𝑛 𝑥𝑖²)^(1/2)

The expression ‖𝑥 − 𝑦‖ is thought of as the distance between 𝑥 and 𝑦.


Continuing on from the previous example, the inner product and norm can be computed as follows

np.sum(x * y) # Inner product of x and y

12.0

np.sqrt(np.sum(x**2)) # Norm of x, take one

1.7320508075688772

np.linalg.norm(x) # Norm of x, take two

1.7320508075688772

3.2.3 Span

Given a set of vectors 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } in ℝ𝑛 , it’s natural to think about the new vectors we can create by performing
linear operations.
New vectors created in this manner are called linear combinations of 𝐴.
In particular, 𝑦 ∈ ℝ𝑛 is a linear combination of 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } if

𝑦 = 𝛽1 𝑎1 + ⋯ + 𝛽𝑘 𝑎𝑘 for some scalars 𝛽1 , … , 𝛽𝑘

In this context, the values 𝛽1 , … , 𝛽𝑘 are called the coefficients of the linear combination.
The set of linear combinations of 𝐴 is called the span of 𝐴.
The next figure shows the span of 𝐴 = {𝑎1 , 𝑎2 } in ℝ3 .
The span is a two-dimensional plane passing through these two points and the origin.

fig = plt.figure(figsize=(10, 8))


ax = fig.gca(projection='3d')

x_min, x_max = -5, 5


y_min, y_max = -5, 5

α, β = 0.2, 0.1

ax.set(xlim=(x_min, x_max), ylim=(x_min, x_max), zlim=(x_min, x_max),


xticks=(0,), yticks=(0,), zticks=(0,))

gs = 3
z = np.linspace(x_min, x_max, gs)
x = np.zeros(gs)
y = np.zeros(gs)
ax.plot(x, y, z, 'k-', lw=2, alpha=0.5)
ax.plot(z, x, y, 'k-', lw=2, alpha=0.5)
ax.plot(y, z, x, 'k-', lw=2, alpha=0.5)

# Fixed linear function, to generate a plane


def f(x, y):
return α * x + β * y

# Vector locations, by coordinate


x_coords = np.array((3, 3))
y_coords = np.array((4, -4))
z = f(x_coords, y_coords)
for i in (0, 1):
ax.text(x_coords[i], y_coords[i], z[i], f'$a_{i+1}$', fontsize=14)

# Lines to vectors
for i in (0, 1):
x = (0, x_coords[i])
y = (0, y_coords[i])
z = (0, f(x_coords[i], y_coords[i]))
ax.plot(x, y, z, 'b-', lw=1.5, alpha=0.6)

# Draw the plane


grid_size = 20
xr2 = np.linspace(x_min, x_max, grid_size)
yr2 = np.linspace(y_min, y_max, grid_size)
x2, y2 = np.meshgrid(xr2, yr2)
z2 = f(x2, y2)
ax.plot_surface(x2, y2, z2, rstride=1, cstride=1, cmap=cm.jet,
linewidth=0, antialiased=True, alpha=0.2)
plt.show()

/tmp/ipykernel_15617/266575435.py:2: MatplotlibDeprecationWarning: Calling gca() with keyword arguments was deprecated in Matplotlib 3.4. Starting two minor releases later, gca() will take no keyword arguments. The gca() function should only be used to get the current axes, or if no axes exist, create new axes with default keyword arguments. To create a new axes with non-default arguments, use plt.axes() or plt.subplot().
  ax = fig.gca(projection='3d')


Examples

If 𝐴 contains only one vector 𝑎1 ∈ ℝ2 , then its span is just the scalar multiples of 𝑎1 , which is the unique line passing
through both 𝑎1 and the origin.
If 𝐴 = {𝑒1 , 𝑒2 , 𝑒3 } consists of the canonical basis vectors of ℝ3 , that is

$$
e_1 := \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
e_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
e_3 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
$$

then the span of 𝐴 is all of ℝ3 , because, for any 𝑥 = (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 , we can write

𝑥 = 𝑥1 𝑒1 + 𝑥2 𝑒2 + 𝑥3 𝑒3

Now consider 𝐴0 = {𝑒1 , 𝑒2 , 𝑒1 + 𝑒2 }.


If 𝑦 = (𝑦1 , 𝑦2 , 𝑦3 ) is any linear combination of these vectors, then 𝑦3 = 0 (check it).


Hence 𝐴0 fails to span all of ℝ3 .
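We can confirm this numerically: stacking the vectors of 𝐴0 as columns and computing the matrix rank (one convenient check, using np.linalg.matrix_rank) shows that they span only a two-dimensional subspace of ℝ3.

import numpy as np

e1 = np.array([1, 0, 0])
e2 = np.array([0, 1, 0])
A0 = np.column_stack((e1, e2, e1 + e2))   # vectors of A0 as columns
print(np.linalg.matrix_rank(A0))          # 2, so A0 does not span R^3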

3.2.4 Linear Independence

As we’ll see, it’s often desirable to find families of vectors with relatively large span, so that many vectors can be described
by linear operators on a few vectors.
The condition we need for a set of vectors to have a large span is what’s called linear independence.
In particular, a collection of vectors 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } in ℝ𝑛 is said to be
• linearly dependent if some strict subset of 𝐴 has the same span as 𝐴.
• linearly independent if it is not linearly dependent.
Put differently, a set of vectors is linearly independent if no vector is redundant to the span and linearly dependent
otherwise.
To illustrate the idea, recall the figure that showed the span of vectors {𝑎1 , 𝑎2 } in ℝ3 as a plane through the origin.
If we take a third vector 𝑎3 and form the set {𝑎1 , 𝑎2 , 𝑎3 }, this set will be
• linearly dependent if 𝑎3 lies in the plane
• linearly independent otherwise
As another illustration of the concept, since ℝ𝑛 can be spanned by 𝑛 vectors (see the discussion of canonical basis vectors
above), any collection of 𝑚 > 𝑛 vectors in ℝ𝑛 must be linearly dependent.
The following statements are equivalent to linear independence of 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } ⊂ ℝ𝑛
1. No vector in 𝐴 can be formed as a linear combination of the other elements.
2. If 𝛽1 𝑎1 + ⋯ + 𝛽𝑘 𝑎𝑘 = 0 for scalars 𝛽1 , … , 𝛽𝑘 , then 𝛽1 = ⋯ = 𝛽𝑘 = 0.
(The zero in the first expression is the origin of ℝ𝑛 )
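As a quick numerical check (this snippet is our own addition, not part of the original text), we can test linear independence by stacking candidate vectors as columns of a matrix and computing its rank with NumPy; here 𝑎1 and 𝑎2 are the two vectors from the span figure above.

import numpy as np

a1 = np.array([3, 4, 0.2 * 3 + 0.1 * 4])       # the two vectors from the figure above
a2 = np.array([3, -4, 0.2 * 3 + 0.1 * (-4)])
a3_in_plane = 2 * a1 + 3 * a2                  # lies in the span of {a1, a2}
a3_outside = np.array([1.0, 1.0, 1.0])         # does not lie in that plane

for a3 in (a3_in_plane, a3_outside):
    A = np.column_stack((a1, a2, a3))
    print(np.linalg.matrix_rank(A))            # 2 => dependent, 3 => independent

A rank equal to the number of columns indicates linear independence.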

3.2.5 Unique Representations

Another nice thing about sets of linearly independent vectors is that each element in the span has a unique representation
as a linear combination of these vectors.
In other words, if 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } ⊂ ℝ𝑛 is linearly independent and

𝑦 = 𝛽1 𝑎1 + ⋯ + 𝛽𝑘 𝑎𝑘

then no other coefficient sequence 𝛾1 , … , 𝛾𝑘 will produce the same vector 𝑦.


Indeed, if we also have 𝑦 = 𝛾1 𝑎1 + ⋯ + 𝛾𝑘 𝑎𝑘 , then

(𝛽1 − 𝛾1 )𝑎1 + ⋯ + (𝛽𝑘 − 𝛾𝑘 )𝑎𝑘 = 0

Linear independence now implies 𝛾𝑖 = 𝛽𝑖 for all 𝑖.


3.3 Matrices

Matrices are a neat way of organizing data for use in linear operations.
An 𝑛 × 𝑘 matrix is a rectangular array 𝐴 of numbers with 𝑛 rows and 𝑘 columns:

$$
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nk}
\end{bmatrix}
$$

Often, the numbers in the matrix represent coefficients in a system of linear equations, as discussed at the start of this
lecture.
For obvious reasons, the matrix 𝐴 is also called a vector if either 𝑛 = 1 or 𝑘 = 1.
In the former case, 𝐴 is called a row vector, while in the latter it is called a column vector.
If 𝑛 = 𝑘, then 𝐴 is called square.
The matrix formed by replacing 𝑎𝑖𝑗 by 𝑎𝑗𝑖 for every 𝑖 and 𝑗 is called the transpose of 𝐴 and denoted 𝐴′ or 𝐴⊤ .
If 𝐴 = 𝐴′ , then 𝐴 is called symmetric.
For a square matrix 𝐴, the 𝑛 elements of the form 𝑎𝑖𝑖 for 𝑖 = 1, … , 𝑛 are called the principal diagonal.
𝐴 is called diagonal if the only nonzero entries are on the principal diagonal.
If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then 𝐴 is called the identity matrix
and denoted by 𝐼.

3.3.1 Matrix Operations

Just as was the case for vectors, a number of algebraic operations are defined for matrices.
Scalar multiplication and addition are immediate generalizations of the vector case:

$$
\gamma A = \gamma
\begin{bmatrix}
a_{11} & \cdots & a_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} & \cdots & a_{nk}
\end{bmatrix}
:=
\begin{bmatrix}
\gamma a_{11} & \cdots & \gamma a_{1k} \\
\vdots & \vdots & \vdots \\
\gamma a_{n1} & \cdots & \gamma a_{nk}
\end{bmatrix}
$$

and

$$
A + B =
\begin{bmatrix}
a_{11} & \cdots & a_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} & \cdots & a_{nk}
\end{bmatrix}
+
\begin{bmatrix}
b_{11} & \cdots & b_{1k} \\
\vdots & \vdots & \vdots \\
b_{n1} & \cdots & b_{nk}
\end{bmatrix}
:=
\begin{bmatrix}
a_{11} + b_{11} & \cdots & a_{1k} + b_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} + b_{n1} & \cdots & a_{nk} + b_{nk}
\end{bmatrix}
$$

In the latter case, the matrices must have the same shape in order for the definition to make sense.
We also have a convention for multiplying two matrices.
The rule for matrix multiplication generalizes the idea of inner products discussed above and is designed to make multi-
plication play well with basic linear operations.
If 𝐴 and 𝐵 are two matrices, then their product 𝐴𝐵 is formed by taking as its 𝑖, 𝑗-th element the inner product of the 𝑖-th
row of 𝐴 and the 𝑗-th column of 𝐵.
There are many tutorials to help you visualize this operation, such as this one, or the discussion on the Wikipedia page.
If 𝐴 is 𝑛 × 𝑘 and 𝐵 is 𝑗 × 𝑚, then to multiply 𝐴 and 𝐵 we require 𝑘 = 𝑗, and the resulting matrix 𝐴𝐵 is 𝑛 × 𝑚.
As perhaps the most important special case, consider multiplying 𝑛 × 𝑘 matrix 𝐴 and 𝑘 × 1 column vector 𝑥.


According to the preceding rule, this gives us an 𝑛 × 1 column vector

$$
Ax =
\begin{bmatrix}
a_{11} & \cdots & a_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} & \cdots & a_{nk}
\end{bmatrix}
\begin{bmatrix}
x_1 \\ \vdots \\ x_k
\end{bmatrix}
:=
\begin{bmatrix}
a_{11} x_1 + \cdots + a_{1k} x_k \\
\vdots \\
a_{n1} x_1 + \cdots + a_{nk} x_k
\end{bmatrix}
\qquad (3.2)
$$

Note: 𝐴𝐵 and 𝐵𝐴 are not generally the same thing.

Another important special case is the identity matrix.


You should check that if 𝐴 is 𝑛 × 𝑘 and 𝐼 is the 𝑘 × 𝑘 identity matrix, then 𝐴𝐼 = 𝐴.
If 𝐼 is the 𝑛 × 𝑛 identity matrix, then 𝐼𝐴 = 𝐴.
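As a quick sanity check of these two claims (the matrix below is our own example, not from the text):

import numpy as np

A = np.arange(6).reshape(2, 3)                # a 2 x 3 matrix, so n = 2 and k = 3
print(np.allclose(A @ np.identity(3), A))     # A I = A with the k x k identity
print(np.allclose(np.identity(2) @ A, A))     # I A = A with the n x n identity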

3.3.2 Matrices in NumPy

NumPy arrays are also used as matrices, and have fast, efficient functions and methods for all the standard matrix oper-
ations1 .
You can create them manually from tuples of tuples (or lists of lists) as follows

A = ((1, 2),
(3, 4))

type(A)

tuple

A = np.array(A)

type(A)

numpy.ndarray

A.shape

(2, 2)

The shape attribute is a tuple giving the number of rows and columns — see here for more discussion.
To get the transpose of A, use A.transpose() or, more simply, A.T.
There are many convenient functions for creating common matrices (matrices of zeros, ones, etc.) — see here.
Since operations are performed elementwise by default, scalar multiplication and addition have very natural syntax

A = np.identity(3)
B = np.ones((3, 3))
2 * A

1 Although there is a specialized matrix data type defined in NumPy, it’s more standard to work with ordinary NumPy arrays. See this discussion.


array([[2., 0., 0.],


[0., 2., 0.],
[0., 0., 2.]])

A + B

array([[2., 1., 1.],


[1., 2., 1.],
[1., 1., 2.]])

To multiply matrices we use the @ symbol.


In particular, A @ B is matrix multiplication, whereas A * B is element-by-element multiplication.
See here for more discussion.

3.3.3 Matrices as Maps

Each 𝑛 × 𝑘 matrix 𝐴 can be identified with a function 𝑓(𝑥) = 𝐴𝑥 that maps 𝑥 ∈ ℝ𝑘 into 𝑦 = 𝐴𝑥 ∈ ℝ𝑛 .
These kinds of functions have a special property: they are linear.
A function 𝑓 ∶ ℝ𝑘 → ℝ𝑛 is called linear if, for all 𝑥, 𝑦 ∈ ℝ𝑘 and all scalars 𝛼, 𝛽, we have

𝑓(𝛼𝑥 + 𝛽𝑦) = 𝛼𝑓(𝑥) + 𝛽𝑓(𝑦)

You can check that this holds for the function 𝑓(𝑥) = 𝐴𝑥 + 𝑏 when 𝑏 is the zero vector and fails when 𝑏 is nonzero.
In fact, it’s known that 𝑓 is linear if and only if there exists a matrix 𝐴 such that 𝑓(𝑥) = 𝐴𝑥 for all 𝑥.
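Here is a small numerical illustration of this linearity property (the matrix, vectors, and scalars below are our own choices):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])          # maps R^2 into R^3
f = lambda x: A @ x

x = np.array([1.0, -1.0])
y = np.array([0.5, 2.0])
α, β = 0.3, -1.7

print(np.allclose(f(α * x + β * y), α * f(x) + β * f(y)))   # True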

3.4 Solving Systems of Equations

Recall again the system of equations (3.1).


If we compare (3.1) and (3.2), we see that (3.1) can now be written more conveniently as

𝑦 = 𝐴𝑥 (3.3)

The problem we face is to determine a vector 𝑥 ∈ ℝ𝑘 that solves (3.3), taking 𝑦 and 𝐴 as given.
This is a special case of a more general problem: Find an 𝑥 such that 𝑦 = 𝑓(𝑥).
Given an arbitrary function 𝑓 and a 𝑦, is there always an 𝑥 such that 𝑦 = 𝑓(𝑥)?
If so, is it always unique?
The answer to both these questions is negative, as the next figure shows

def f(x):
return 0.6 * np.cos(4 * x) + 1.4

xmin, xmax = -1, 1


x = np.linspace(xmin, xmax, 160)

y = f(x)
ya, yb = np.min(y), np.max(y)

fig, axes = plt.subplots(2, 1, figsize=(10, 10))

for ax in axes:
# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')

ax.set(ylim=(-0.6, 3.2), xlim=(xmin, xmax),


yticks=(), xticks=())

ax.plot(x, y, 'k-', lw=2, label='$f$')


ax.fill_between(x, ya, yb, facecolor='blue', alpha=0.05)
ax.vlines([0], ya, yb, lw=3, color='blue', label='range of $f$')
ax.text(0.04, -0.3, '$0$', fontsize=16)

ax = axes[0]

ax.legend(loc='upper right', frameon=False)


ybar = 1.5
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.05, 0.8 * ybar, '$y$', fontsize=16)
for i, z in enumerate((-0.35, 0.35)):
ax.vlines(z, 0, f(z), linestyle='--', alpha=0.5)
ax.text(z, -0.2, f'$x_{i}$', fontsize=16)

ax = axes[1]

ybar = 2.6
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.04, 0.91 * ybar, '$y$', fontsize=16)

plt.show()


In the first plot, there are multiple solutions, as the function is not one-to-one, while in the second there are no solutions,
since 𝑦 lies outside the range of 𝑓.
Can we impose conditions on 𝐴 in (3.3) that rule out these problems?
In this context, the most important thing to recognize about the expression 𝐴𝑥 is that it corresponds to a linear combination
of the columns of 𝐴.
In particular, if 𝑎1 , … , 𝑎𝑘 are the columns of 𝐴, then

𝐴𝑥 = 𝑥1 𝑎1 + ⋯ + 𝑥𝑘 𝑎𝑘

Hence the range of 𝑓(𝑥) = 𝐴𝑥 is exactly the span of the columns of 𝐴.


We want the range to be large so that it contains arbitrary 𝑦.
As you might recall, the condition that we want for the span to be large is linear independence.
A happy fact is that linear independence of the columns of 𝐴 also gives us uniqueness.


Indeed, it follows from our earlier discussion that if {𝑎1 , … , 𝑎𝑘 } are linearly independent and 𝑦 = 𝐴𝑥 = 𝑥1 𝑎1 +⋯+𝑥𝑘 𝑎𝑘 ,
then no 𝑧 ≠ 𝑥 satisfies 𝑦 = 𝐴𝑧.

3.4.1 The Square Matrix Case

Let’s discuss some more details, starting with the case where 𝐴 is 𝑛 × 𝑛.
This is the familiar case where the number of unknowns equals the number of equations.
For arbitrary 𝑦 ∈ ℝ𝑛 , we hope to find a unique 𝑥 ∈ ℝ𝑛 such that 𝑦 = 𝐴𝑥.
In view of the observations immediately above, if the columns of 𝐴 are linearly independent, then their span, and hence
the range of 𝑓(𝑥) = 𝐴𝑥, is all of ℝ𝑛 .
Hence there always exists an 𝑥 such that 𝑦 = 𝐴𝑥.
Moreover, the solution is unique.
In particular, the following are equivalent
1. The columns of 𝐴 are linearly independent.
2. For any 𝑦 ∈ ℝ𝑛 , the equation 𝑦 = 𝐴𝑥 has a unique solution.
The property of having linearly independent columns is sometimes expressed as having full column rank.

Inverse Matrices

Can we give some sort of expression for the solution?


If 𝑦 and 𝐴 are scalar with 𝐴 ≠ 0, then the solution is 𝑥 = 𝐴−1 𝑦.
A similar expression is available in the matrix case.
In particular, if square matrix 𝐴 has full column rank, then it possesses a multiplicative inverse matrix 𝐴−1 , with the
property that 𝐴𝐴−1 = 𝐴−1 𝐴 = 𝐼.
As a consequence, if we pre-multiply both sides of 𝑦 = 𝐴𝑥 by 𝐴−1 , we get 𝑥 = 𝐴−1 𝑦.
This is the solution that we’re looking for.

Determinants

Another quick comment about square matrices is that to every such matrix we assign a unique number called the deter-
minant of the matrix — you can find the expression for it here.
If the determinant of 𝐴 is not zero, then we say that 𝐴 is nonsingular.
Perhaps the most important fact about determinants is that 𝐴 is nonsingular if and only if 𝐴 is of full column rank.
This gives us a useful one-number summary of whether or not a square matrix can be inverted.


3.4.2 More Rows than Columns

This is the 𝑛 × 𝑘 case with 𝑛 > 𝑘.


This case is very important in many settings, not least in the setting of linear regression (where 𝑛 is the number of
observations, and 𝑘 is the number of explanatory variables).
Given arbitrary 𝑦 ∈ ℝ𝑛 , we seek an 𝑥 ∈ ℝ𝑘 such that 𝑦 = 𝐴𝑥.
In this setting, the existence of a solution is highly unlikely.
Without much loss of generality, let’s go over the intuition focusing on the case where the columns of 𝐴 are linearly
independent.
It follows that the span of the columns of 𝐴 is a 𝑘-dimensional subspace of ℝ𝑛 .
This span is very “unlikely” to contain arbitrary 𝑦 ∈ ℝ𝑛 .
To see why, recall the figure above, where 𝑘 = 2 and 𝑛 = 3.
Imagine an arbitrarily chosen 𝑦 ∈ ℝ3 , located somewhere in that three-dimensional space.
What’s the likelihood that 𝑦 lies in the span of {𝑎1 , 𝑎2 } (i.e., the two dimensional plane through these points)?
In a sense, it must be very small, since this plane has zero “thickness”.
As a result, in the 𝑛 > 𝑘 case we usually give up on existence.
However, we can still seek the best approximation, for example, an 𝑥 that makes the distance ‖𝑦 − 𝐴𝑥‖ as small as
possible.
To solve this problem, one can use either calculus or the theory of orthogonal projections.
The solution is known to be 𝑥̂ = (𝐴′ 𝐴)−1 𝐴′ 𝑦 — see for example chapter 3 of these notes.

3.4.3 More Columns than Rows

This is the 𝑛 × 𝑘 case with 𝑛 < 𝑘, so there are fewer equations than unknowns.
In this case there are either no solutions or infinitely many — in other words, uniqueness never holds.
For example, consider the case where 𝑘 = 3 and 𝑛 = 2.
Thus, the columns of 𝐴 consists of 3 vectors in ℝ2 .
This set can never be linearly independent, since it is possible to find two vectors that span ℝ2 .
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two.
For example, let’s say that 𝑎1 = 𝛼𝑎2 + 𝛽𝑎3 .
Then if 𝑦 = 𝐴𝑥 = 𝑥1 𝑎1 + 𝑥2 𝑎2 + 𝑥3 𝑎3 , we can also write

𝑦 = 𝑥1 (𝛼𝑎2 + 𝛽𝑎3 ) + 𝑥2 𝑎2 + 𝑥3 𝑎3 = (𝑥1 𝛼 + 𝑥2 )𝑎2 + (𝑥1 𝛽 + 𝑥3 )𝑎3

In other words, uniqueness fails.


3.4.4 Linear Equations with SciPy

Here’s an illustration of how to solve linear equations with SciPy’s linalg submodule.
All of these routines are Python front ends to time-tested and highly optimized FORTRAN code

A = ((1, 2), (3, 4))


A = np.array(A)
y = np.ones((2, 1)) # Column vector
det(A) # Check that A is nonsingular, and hence invertible

-2.0

A_inv = inv(A) # Compute the inverse


A_inv

array([[-2. , 1. ],
[ 1.5, -0.5]])

x = A_inv @ y # Solution
A @ x # Should equal y

array([[1.],
[1.]])

solve(A, y) # Produces the same solution

array([[-1.],
[ 1.]])

Observe how we can solve for 𝑥 = 𝐴−1 𝑦 by either via inv(A) @ y, or using solve(A, y).
The latter method uses a different algorithm (LU decomposition) that is numerically more stable, and hence should almost
always be preferred.
To obtain the least-squares solution 𝑥̂ = (𝐴′ 𝐴)−1 𝐴′ 𝑦, use scipy.linalg.lstsq(A, y).
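As a minimal sketch (the overdetermined system below is made up for illustration), we can check that lstsq agrees with the normal-equations formula:

import numpy as np
from scipy.linalg import lstsq

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])                      # n = 3 equations, k = 2 unknowns
y = np.array([1.0, 2.0, 2.0])

x_normal = np.linalg.solve(A.T @ A, A.T @ y)    # (A'A)^{-1} A'y via the normal equations
x_lstsq, *_ = lstsq(A, y)                       # SciPy's least-squares routine

print(np.allclose(x_normal, x_lstsq))           # True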

3.5 Eigenvalues and Eigenvectors

Let 𝐴 be an 𝑛 × 𝑛 square matrix.


If 𝜆 is scalar and 𝑣 is a non-zero vector in ℝ𝑛 such that

𝐴𝑣 = 𝜆𝑣

then we say that 𝜆 is an eigenvalue of 𝐴, and 𝑣 is an eigenvector.


Thus, an eigenvector of 𝐴 is a vector such that when the map 𝑓(𝑥) = 𝐴𝑥 is applied, 𝑣 is merely scaled.
The next figure shows two eigenvectors (blue arrows) and their images under 𝐴 (red arrows).
As expected, the image 𝐴𝑣 of each 𝑣 is just a scaled version of the original


A = ((1, 2),
(2, 1))
A = np.array(A)
evals, evecs = eig(A)
evecs = evecs[:, 0], evecs[:, 1]

fig, ax = plt.subplots(figsize=(10, 8))


# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')
ax.grid(alpha=0.4)

xmin, xmax = -3, 3


ymin, ymax = -3, 3
ax.set(xlim=(xmin, xmax), ylim=(ymin, ymax))

# Plot each eigenvector


for v in evecs:
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='blue',
shrink=0,
alpha=0.6,
width=0.5))

# Plot the image of each eigenvector


for v in evecs:
v = A @ v
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='red',
shrink=0,
alpha=0.6,
width=0.5))

# Plot the lines they run through


x = np.linspace(xmin, xmax, 3)
for v in evecs:
a = v[1] / v[0]
ax.plot(x, a * x, 'b-', lw=0.4)

plt.show()


The eigenvalue equation is equivalent to (𝐴 − 𝜆𝐼)𝑣 = 0, and this has a nonzero solution 𝑣 only when the columns of
𝐴 − 𝜆𝐼 are linearly dependent.
This in turn is equivalent to stating that the determinant is zero.
Hence to find all eigenvalues, we can look for 𝜆 such that the determinant of 𝐴 − 𝜆𝐼 is zero.
This problem can be expressed as one of solving for the roots of a polynomial in 𝜆 of degree 𝑛.
This in turn implies the existence of 𝑛 solutions in the complex plane, although some might be repeated.
Some nice facts about the eigenvalues of a square matrix 𝐴 are as follows
1. The determinant of 𝐴 equals the product of the eigenvalues.
2. The trace of 𝐴 (the sum of the elements on the principal diagonal) equals the sum of the eigenvalues.
3. If 𝐴 is symmetric, then all of its eigenvalues are real.
4. If 𝐴 is invertible and 𝜆1 , … , 𝜆𝑛 are its eigenvalues, then the eigenvalues of 𝐴−1 are 1/𝜆1 , … , 1/𝜆𝑛 .
A corollary of the first statement is that a matrix is invertible if and only if all its eigenvalues are nonzero.
Using SciPy, we can solve for the eigenvalues and eigenvectors of a matrix as follows

A = ((1, 2),
(2, 1))

A = np.array(A)


evals, evecs = eig(A)
evals

array([ 3.+0.j, -1.+0.j])

evecs

array([[ 0.70710678, -0.70710678],


[ 0.70710678, 0.70710678]])

Note that the columns of evecs are the eigenvectors.


Since any scalar multiple of an eigenvector is an eigenvector with the same eigenvalue (check it), the eig routine normalizes
the length of each eigenvector to one.
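Using the A and evals computed just above, we can also confirm facts 1 and 2 from the list of properties (this quick check is our own addition):

print(np.prod(evals), np.linalg.det(A))    # product of eigenvalues vs determinant
print(np.sum(evals), np.trace(A))          # sum of eigenvalues vs trace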

3.5.1 Generalized Eigenvalues

It is sometimes useful to consider the generalized eigenvalue problem, which, for given matrices 𝐴 and 𝐵, seeks generalized
eigenvalues 𝜆 and eigenvectors 𝑣 such that
𝐴𝑣 = 𝜆𝐵𝑣

This can be solved in SciPy via scipy.linalg.eig(A, B).


Of course, if 𝐵 is square and invertible, then we can treat the generalized eigenvalue problem as an ordinary eigenvalue
problem 𝐵−1 𝐴𝑣 = 𝜆𝑣, but this is not always the case.
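A minimal sketch of the generalized problem in SciPy (the matrices are our own examples; since 𝐵 is invertible here, the eigenvalues coincide with those of 𝐵⁻¹𝐴, possibly in a different order):

import numpy as np
from scipy.linalg import eig

A = np.array([[1.0, 2.0], [2.0, 1.0]])
B = np.array([[2.0, 0.0], [0.0, 1.0]])

evals, evecs = eig(A, B)                          # solves A v = λ B v
print(evals)
print(np.linalg.eigvals(np.linalg.inv(B) @ A))    # same eigenvalues, since B is invertible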

3.6 Further Topics

We round out our discussion by briefly mentioning several other important topics.

3.6.1 Series Expansions



Recall the usual summation formula for a geometric progression, which states that if $|a| < 1$, then $\sum_{k=0}^{\infty} a^k = (1 - a)^{-1}$.
A generalization of this idea exists in the matrix setting.

Matrix Norms

Let 𝐴 be a square matrix, and let


$$\|A\| := \max_{\|x\| = 1} \|Ax\|$$

The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand side is a matrix norm — in
this case, the so-called spectral norm.
For example, for a square matrix 𝑆, the condition ‖𝑆‖ < 1 means that 𝑆 is contractive, in the sense that it pulls all vectors
towards the origin2 .
2 Suppose that ‖𝑆‖ < 1. Take any nonzero vector 𝑥, and let 𝑟 ∶= ‖𝑥‖. We have ‖𝑆𝑥‖ = 𝑟‖𝑆(𝑥/𝑟)‖ ≤ 𝑟‖𝑆‖ < 𝑟 = ‖𝑥‖. Hence every point is
pulled towards the origin.


Neumann’s Theorem

Let 𝐴 be a square matrix and let 𝐴𝑘 ∶= 𝐴𝐴𝑘−1 with 𝐴1 ∶= 𝐴.


In other words, 𝐴𝑘 is the 𝑘-th power of 𝐴.
Neumann’s theorem states the following: If ‖𝐴𝑘 ‖ < 1 for some 𝑘 ∈ ℕ, then 𝐼 − 𝐴 is invertible, and

$$(I - A)^{-1} = \sum_{k=0}^{\infty} A^k \qquad (3.4)$$

Spectral Radius

A result known as Gelfand’s formula tells us that, for any square matrix 𝐴,

$$\rho(A) = \lim_{k \to \infty} \|A^k\|^{1/k}$$

Here 𝜌(𝐴) is the spectral radius, defined as max𝑖 |𝜆𝑖 |, where {𝜆𝑖 }𝑖 is the set of eigenvalues of 𝐴.
As a consequence of Gelfand’s formula, if all eigenvalues are strictly less than one in modulus, there exists a 𝑘 with
‖𝐴𝑘 ‖ < 1.
In which case (3.4) is valid.
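The following sketch (with a contractive matrix of our own choosing) illustrates both Gelfand's formula and the Neumann series numerically:

import numpy as np

A = np.array([[0.4, 0.1],
              [0.3, 0.2]])                       # spectral radius is 0.5 < 1
ρ = max(abs(np.linalg.eigvals(A)))
print(ρ)

# Gelfand's formula: ||A^k||^{1/k} approaches ρ(A) as k grows
for k in (1, 5, 20, 80):
    print(np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1 / k))

# Neumann series: the partial sums approximate (I - A)^{-1}
S = sum(np.linalg.matrix_power(A, k) for k in range(50))
print(np.allclose(S, np.linalg.inv(np.identity(2) - A)))   # True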

3.6.2 Positive Definite Matrices

Let 𝐴 be a symmetric 𝑛 × 𝑛 matrix.


We say that 𝐴 is
1. positive definite if 𝑥′ 𝐴𝑥 > 0 for every 𝑥 ∈ ℝ𝑛 ∖ {0}
2. positive semi-definite or nonnegative definite if 𝑥′ 𝐴𝑥 ≥ 0 for every 𝑥 ∈ ℝ𝑛
Analogous definitions exist for negative definite and negative semi-definite matrices.
It is notable that if 𝐴 is positive definite, then all of its eigenvalues are strictly positive, and hence 𝐴 is invertible (with
positive definite inverse).
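A simple way to check these claims numerically (the matrix is our own example) is to look at the eigenvalues of a symmetric matrix:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                    # symmetric
print(np.linalg.eigvalsh(A))                  # all strictly positive => positive definite
print(np.linalg.eigvalsh(np.linalg.inv(A)))   # the inverse is positive definite too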

3.6.3 Differentiating Linear and Quadratic Forms

The following formulas are useful in many economic contexts. Let


• 𝑧, 𝑥 and 𝑎 all be 𝑛 × 1 vectors
• 𝐴 be an 𝑛 × 𝑛 matrix
• 𝐵 be an 𝑚 × 𝑛 matrix and 𝑦 be an 𝑚 × 1 vector
Then
1. $\frac{\partial a'x}{\partial x} = a$
2. $\frac{\partial Ax}{\partial x} = A'$
3. $\frac{\partial x'Ax}{\partial x} = (A + A')x$
4. $\frac{\partial y'Bz}{\partial y} = Bz$
5. $\frac{\partial y'Bz}{\partial B} = yz'$
Exercise 3.7.1 below asks you to apply these formulas.
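Before turning to the exercise, here is a quick finite-difference check of formula 3 (the matrix, the point 𝑥, and the step size are our own choices):

import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
x = np.array([1.0, -2.0])
h = 1e-6

quad = lambda v: v @ A @ v
grad_numeric = np.array([(quad(x + h * e) - quad(x - h * e)) / (2 * h)
                         for e in np.identity(2)])

print(grad_numeric)        # numerical gradient of x'Ax
print((A + A.T) @ x)       # formula 3: (A + A')x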

3.6.4 Further Reading

The documentation of the scipy.linalg submodule can be found here.


Chapters 2 and 3 of the Econometric Theory contain a discussion of linear algebra along the same lines as above, with
solved exercises.
If you don’t mind a slightly abstract approach, a nice intermediate-level text on linear algebra is [Janich94].

3.7 Exercises

Exercise 3.7.1
Let 𝑥 be a given 𝑛 × 1 vector and consider the problem
$$v(x) = \max_{y,u} \{-y'Py - u'Qu\}$$

subject to the linear constraint

𝑦 = 𝐴𝑥 + 𝐵𝑢

Here
• 𝑃 is an 𝑛 × 𝑛 matrix and 𝑄 is an 𝑚 × 𝑚 matrix
• 𝐴 is an 𝑛 × 𝑛 matrix and 𝐵 is an 𝑛 × 𝑚 matrix
• both 𝑃 and 𝑄 are symmetric and positive semidefinite
(What must the dimensions of 𝑦 and 𝑢 be to make this a well-posed problem?)
One way to solve the problem is to form the Lagrangian

ℒ = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]

where 𝜆 is an 𝑛 × 1 vector of Lagrange multipliers.


Try applying the formulas given above for differentiating quadratic and linear forms to obtain the first-order conditions
for maximizing ℒ with respect to 𝑦, 𝑢 and minimizing it with respect to 𝜆.
Show that these conditions imply that
1. 𝜆 = −2𝑃 𝑦.
2. The optimizing choice of 𝑢 satisfies 𝑢 = −(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥.
3. The function 𝑣 satisfies 𝑣(𝑥) = −𝑥′ 𝑃 ̃ 𝑥 where 𝑃 ̃ = 𝐴′ 𝑃 𝐴 − 𝐴′ 𝑃 𝐵(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴.
As we will see, in economic contexts Lagrange multipliers often are shadow prices.

Note: If we don’t care about the Lagrange multipliers, we can substitute the constraint into the objective function, and
then just maximize −(𝐴𝑥 + 𝐵𝑢)′ 𝑃 (𝐴𝑥 + 𝐵𝑢) − 𝑢′ 𝑄𝑢 with respect to 𝑢. You can verify that this leads to the same
maximizer.


3.8 Solutions

Solution to Exercise 3.7.1


We have an optimization problem:
$$v(x) = \max_{y,u} \{-y'Py - u'Qu\}$$

s.t.

𝑦 = 𝐴𝑥 + 𝐵𝑢

with primitives
• 𝑃 be a symmetric and positive semidefinite 𝑛 × 𝑛 matrix
• 𝑄 be a symmetric and positive semidefinite 𝑚 × 𝑚 matrix
• 𝐴 an 𝑛 × 𝑛 matrix
• 𝐵 an 𝑛 × 𝑚 matrix
The associated Lagrangian is:

𝐿 = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]

Step 1.
Differentiating Lagrangian equation w.r.t y and setting its derivative equal to zero yields
$$\frac{\partial L}{\partial y} = -(P + P')y - \lambda = -2Py - \lambda = 0,$$
since P is symmetric.
Accordingly, the first-order condition for maximizing L w.r.t. y implies

𝜆 = −2𝑃 𝑦

Step 2.
Differentiating Lagrangian equation w.r.t. u and setting its derivative equal to zero yields
$$\frac{\partial L}{\partial u} = -(Q + Q')u + B'\lambda = -2Qu + B'\lambda = 0$$
Substituting 𝜆 = −2𝑃 𝑦 gives

𝑄𝑢 + 𝐵′ 𝑃 𝑦 = 0

Substituting the linear constraint 𝑦 = 𝐴𝑥 + 𝐵𝑢 into above equation gives

𝑄𝑢 + 𝐵′ 𝑃 (𝐴𝑥 + 𝐵𝑢) = 0

(𝑄 + 𝐵′ 𝑃 𝐵)𝑢 + 𝐵′ 𝑃 𝐴𝑥 = 0
which is the first-order condition for maximizing 𝐿 w.r.t. 𝑢.
Thus, the optimal choice of u must satisfy

𝑢 = −(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥 ,


which follows from the definition of the first-order conditions for Lagrangian equation.
Step 3.
Rewriting our problem by substituting the constraint into the objective function, we get

$$v(x) = \max_{u} \{-(Ax + Bu)'P(Ax + Bu) - u'Qu\}$$

Since we know the optimal choice of u satisfies 𝑢 = −(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥, then

𝑣(𝑥) = −(𝐴𝑥 + 𝐵𝑢)′ 𝑃 (𝐴𝑥 + 𝐵𝑢) − 𝑢′ 𝑄𝑢 𝑤𝑖𝑡ℎ 𝑢 = −(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥

To evaluate the function


𝑣(𝑥) = −(𝐴𝑥 + 𝐵𝑢)′ 𝑃 (𝐴𝑥 + 𝐵𝑢) − 𝑢′ 𝑄𝑢
= −(𝑥′ 𝐴′ + 𝑢′ 𝐵′ )𝑃 (𝐴𝑥 + 𝐵𝑢) − 𝑢′ 𝑄𝑢
= −𝑥′ 𝐴′ 𝑃 𝐴𝑥 − 𝑢′ 𝐵′ 𝑃 𝐴𝑥 − 𝑥′ 𝐴′ 𝑃 𝐵𝑢 − 𝑢′ 𝐵′ 𝑃 𝐵𝑢 − 𝑢′ 𝑄𝑢
= −𝑥′ 𝐴′ 𝑃 𝐴𝑥 − 2𝑢′ 𝐵′ 𝑃 𝐴𝑥 − 𝑢′ (𝑄 + 𝐵′ 𝑃 𝐵)𝑢

For simplicity, denote by 𝑆 ∶= (𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴, then 𝑢 = −𝑆𝑥.


Regarding the second term −2𝑢′ 𝐵′ 𝑃 𝐴𝑥,

−2𝑢′ 𝐵′ 𝑃 𝐴𝑥 = −2𝑥′ 𝑆 ′ 𝐵′ 𝑃 𝐴𝑥
= 2𝑥′ 𝐴′ 𝑃 𝐵(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥

Notice that the term (𝑄 + 𝐵′ 𝑃 𝐵)−1 is symmetric as both P and Q are symmetric.
Regarding the third term −𝑢′ (𝑄 + 𝐵′ 𝑃 𝐵)𝑢,

−𝑢′ (𝑄 + 𝐵′ 𝑃 𝐵)𝑢 = −𝑥′ 𝑆 ′ (𝑄 + 𝐵′ 𝑃 𝐵)𝑆𝑥


= −𝑥′ 𝐴′ 𝑃 𝐵(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥

Hence, the summation of second and third terms is 𝑥′ 𝐴′ 𝑃 𝐵(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥.
This implies that

𝑣(𝑥) = −𝑥′ 𝐴′ 𝑃 𝐴𝑥 − 2𝑢′ 𝐵′ 𝑃 𝐴𝑥 − 𝑢′ (𝑄 + 𝐵′ 𝑃 𝐵)𝑢


= −𝑥′ 𝐴′ 𝑃 𝐴𝑥 + 𝑥′ 𝐴′ 𝑃 𝐵(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥
= −𝑥′ [𝐴′ 𝑃 𝐴 − 𝐴′ 𝑃 𝐵(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴]𝑥

Therefore, the solution to the optimization problem 𝑣(𝑥) = −𝑥′ 𝑃 ̃ 𝑥 follows the above result by denoting 𝑃 ̃ ∶= 𝐴′ 𝑃 𝐴 −
𝐴′ 𝑃 𝐵(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴
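As a numerical check of this solution (all of the matrices below are randomly generated, purely for illustration), we can verify that the value attained at the candidate optimum equals $-x'\tilde{P}x$:

import numpy as np

np.random.seed(0)
n, m = 3, 2
A = np.random.randn(n, n)
B = np.random.randn(n, m)
P = np.identity(n)                  # symmetric and positive semidefinite
Q = 2 * np.identity(m)              # symmetric and positive definite
x = np.random.randn(n)

S = np.linalg.solve(Q + B.T @ P @ B, B.T @ P @ A)   # so that u = -S x
u = -S @ x
y = A @ x + B @ u

P_tilde = A.T @ P @ A - A.T @ P @ B @ S
print(-y @ P @ y - u @ Q @ u)       # value of the objective at the candidate optimum
print(-x @ P_tilde @ x)             # -x' P̃ x — the two numbers should agree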



CHAPTER

FOUR

QR DECOMPOSITION

4.1 Overview

This lecture describes the QR decomposition and how it relates to


• Orthogonal projection and least squares
• A Gram-Schmidt process
• Eigenvalues and eigenvectors
We’ll write some Python code to help consolidate our understandings.

4.2 Matrix Factorization

The QR decomposition (also called the QR factorization) of a matrix is a decomposition of a matrix into the product of
an orthogonal matrix and a triangular matrix.
A QR decomposition of a real matrix 𝐴 takes the form

𝐴 = 𝑄𝑅

where
• 𝑄 is an orthogonal matrix (so that 𝑄𝑇 𝑄 = 𝐼)
• 𝑅 is an upper triangular matrix
We’ll use a Gram-Schmidt process to compute a QR decomposition
Because doing so is so educational, we’ll write our own Python code to do the job

4.3 Gram-Schmidt process

We’ll start with a square matrix 𝐴.


If a square matrix 𝐴 is nonsingular, then a 𝑄𝑅 factorization is unique.
We’ll deal with a rectangular matrix 𝐴 later.
Actually, our algorithm will work with a rectangular 𝐴 that is not square.


4.3.1 Gram-Schmidt process for square 𝐴

Here we apply a Gram-Schmidt process to the columns of matrix 𝐴.


In particular, let

𝐴 = [ 𝑎1 𝑎2 ⋯ 𝑎𝑛 ]

Let || · || denote the L2 norm.


The Gram-Schmidt algorithm repeatedly combines the following two steps in a particular order
• normalize a vector to have unit norm
• orthogonalize the next vector
To begin, we set 𝑢1 = 𝑎1 and then normalize:
$$u_1 = a_1, \quad e_1 = \frac{u_1}{\|u_1\|}$$

We orthogonalize first to compute 𝑢2 and then normalize to create 𝑒2 :

$$u_2 = a_2 - (a_2 \cdot e_1) e_1, \quad e_2 = \frac{u_2}{\|u_2\|}$$

We invite the reader to verify that 𝑒1 is orthogonal to 𝑒2 by checking that 𝑒1 ⋅ 𝑒2 = 0.


The Gram-Schmidt procedure continues iterating.
Thus, for 𝑘 = 2, … , 𝑛 − 1 we construct
$$u_{k+1} = a_{k+1} - (a_{k+1} \cdot e_1)e_1 - \cdots - (a_{k+1} \cdot e_k)e_k,
\quad e_{k+1} = \frac{u_{k+1}}{\|u_{k+1}\|}$$

Here (𝑎𝑗 ⋅ 𝑒𝑖 ) can be interpreted as the linear least squares regression coefficient of 𝑎𝑗 on 𝑒𝑖
• it is the inner product of 𝑎𝑗 and 𝑒𝑖 divided by the inner product of 𝑒𝑖 where 𝑒𝑖 ⋅ 𝑒𝑖 = 1, as normalization has assured
us.
• this regression coefficient has an interpretation as being a covariance divided by a variance
It can be verified that
$$
A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
  = \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix}
    \begin{bmatrix}
      a_1 \cdot e_1 & a_2 \cdot e_1 & \cdots & a_n \cdot e_1 \\
      0             & a_2 \cdot e_2 & \cdots & a_n \cdot e_2 \\
      \vdots        & \vdots        & \ddots & \vdots \\
      0             & 0             & \cdots & a_n \cdot e_n
    \end{bmatrix}
$$

Thus, we have constructed the decomposition

𝐴 = 𝑄𝑅

where

$$Q = \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix}$$

and
$$
R = \begin{bmatrix}
      a_1 \cdot e_1 & a_2 \cdot e_1 & \cdots & a_n \cdot e_1 \\
      0             & a_2 \cdot e_2 & \cdots & a_n \cdot e_2 \\
      \vdots        & \vdots        & \ddots & \vdots \\
      0             & 0             & \cdots & a_n \cdot e_n
    \end{bmatrix}
$$


4.3.2 𝐴 not square

Now suppose that 𝐴 is an 𝑛 × 𝑚 matrix where 𝑚 > 𝑛.


Then a 𝑄𝑅 decomposition is

$$
A = \begin{bmatrix} a_1 & a_2 & \cdots & a_m \end{bmatrix}
  = \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix}
    \begin{bmatrix}
      a_1 \cdot e_1 & a_2 \cdot e_1 & \cdots & a_n \cdot e_1 & a_{n+1} \cdot e_1 & \cdots & a_m \cdot e_1 \\
      0             & a_2 \cdot e_2 & \cdots & a_n \cdot e_2 & a_{n+1} \cdot e_2 & \cdots & a_m \cdot e_2 \\
      \vdots        & \vdots        & \ddots & \vdots        & \vdots            & \ddots & \vdots \\
      0             & 0             & \cdots & a_n \cdot e_n & a_{n+1} \cdot e_n & \cdots & a_m \cdot e_n
    \end{bmatrix}
$$

which implies that

𝑎1 = (𝑎1 ⋅ 𝑒1 )𝑒1
𝑎2 = (𝑎2 ⋅ 𝑒1 )𝑒1 + (𝑎2 ⋅ 𝑒2 )𝑒2
⋮ ⋮
𝑎𝑛 = (𝑎𝑛 ⋅ 𝑒1 )𝑒1 + (𝑎𝑛 ⋅ 𝑒2 )𝑒2 + ⋯ + (𝑎𝑛 ⋅ 𝑒𝑛 )𝑒𝑛
𝑎𝑛+1 = (𝑎𝑛+1 ⋅ 𝑒1 )𝑒1 + (𝑎𝑛+1 ⋅ 𝑒2 )𝑒2 + ⋯ + (𝑎𝑛+1 ⋅ 𝑒𝑛 )𝑒𝑛
⋮ ⋮
𝑎𝑚 = (𝑎𝑚 ⋅ 𝑒1 )𝑒1 + (𝑎𝑚 ⋅ 𝑒2 )𝑒2 + ⋯ + (𝑎𝑚 ⋅ 𝑒𝑛 )𝑒𝑛

4.4 Some Code

Now let’s write some homemade Python code to implement a QR decomposition by deploying the Gram-Schmidt process
described above.

import numpy as np
from scipy.linalg import qr

def QR_Decomposition(A):
    n, m = A.shape # get the shape of A

    Q = np.empty((n, n)) # initialize matrix Q
    u = np.empty((n, n)) # initialize matrix u

    u[:, 0] = A[:, 0]
    Q[:, 0] = u[:, 0] / np.linalg.norm(u[:, 0])

    for i in range(1, n):

        u[:, i] = A[:, i]
        for j in range(i):
            u[:, i] -= (A[:, i] @ Q[:, j]) * Q[:, j] # get each u vector

        Q[:, i] = u[:, i] / np.linalg.norm(u[:, i]) # compute each e vector

    R = np.zeros((n, m))
    for i in range(n):
        for j in range(i, m):
            R[i, j] = A[:, j] @ Q[:, i]

    return Q, R


The preceding code is fine but can benefit from some further housekeeping.
We want to do this because later in this notebook we want to compare results from using our homemade code above with
the code for a QR that the Python scipy package delivers.
There can be sign differences between the 𝑄 and 𝑅 matrices produced by different numerical algorithms.
All of these are valid QR decompositions because of how the sign differences cancel out when we compute 𝑄𝑅.
However, to make the results from our homemade function and the QR module in scipy comparable, let’s require that
𝑄 have positive diagonal entries.
We do this by adjusting the signs of the columns in 𝑄 and the rows in 𝑅 appropriately.
To accomplish this we’ll define a pair of functions.

def diag_sign(A):
    "Compute the signs of the diagonal of matrix A"

    D = np.diag(np.sign(np.diag(A)))

    return D

def adjust_sign(Q, R):
    """
    Adjust the signs of the columns in Q and rows in R to
    impose positive diagonal of Q
    """

    D = diag_sign(Q)

    Q[:, :] = Q @ D
    R[:, :] = D @ R

    return Q, R

4.5 Example

Now let’s do an example.

A = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])


# A = np.array([[1.0, 0.5, 0.2], [0.5, 0.5, 1.0], [0.0, 1.0, 1.0]])
# A = np.array([[1.0, 0.5, 0.2], [0.5, 0.5, 1.0]])

array([[1., 1., 0.],


[1., 0., 1.],
[0., 1., 1.]])

Q, R = adjust_sign(*QR_Decomposition(A))


array([[ 0.70710678, -0.40824829, -0.57735027],


[ 0.70710678, 0.40824829, 0.57735027],
[ 0. , -0.81649658, 0.57735027]])

array([[ 1.41421356, 0.70710678, 0.70710678],


[ 0. , -1.22474487, -0.40824829],
[ 0. , 0. , 1.15470054]])

Let’s compare outcomes with what the scipy package produces

Q_scipy, R_scipy = adjust_sign(*qr(A))

print('Our Q: \n', Q)
print('\n')
print('Scipy Q: \n', Q_scipy)

Our Q:
[[ 0.70710678 -0.40824829 -0.57735027]
[ 0.70710678 0.40824829 0.57735027]
[ 0. -0.81649658 0.57735027]]

Scipy Q:
[[ 0.70710678 -0.40824829 -0.57735027]
[ 0.70710678 0.40824829 0.57735027]
[ 0. -0.81649658 0.57735027]]

print('Our R: \n', R)
print('\n')
print('Scipy R: \n', R_scipy)

Our R:
[[ 1.41421356 0.70710678 0.70710678]
[ 0. -1.22474487 -0.40824829]
[ 0. 0. 1.15470054]]

Scipy R:
[[ 1.41421356 0.70710678 0.70710678]
[ 0. -1.22474487 -0.40824829]
[ 0. 0. 1.15470054]]

The above outcomes give us the good news that our homemade function agrees with what scipy produces.
Now let’s do a QR decomposition for a rectangular matrix 𝐴 that is 𝑛 × 𝑚 with 𝑚 > 𝑛.

A = np.array([[1, 3, 4], [2, 0, 9]])

Q, R = adjust_sign(*QR_Decomposition(A))
Q, R


(array([[ 0.4472136 , -0.89442719],


[ 0.89442719, 0.4472136 ]]),
array([[ 2.23606798, 1.34164079, 9.8386991 ],
[ 0. , -2.68328157, 0.4472136 ]]))

Q_scipy, R_scipy = adjust_sign(*qr(A))


Q_scipy, R_scipy

(array([[ 0.4472136 , -0.89442719],


[ 0.89442719, 0.4472136 ]]),
array([[ 2.23606798, 1.34164079, 9.8386991 ],
[ 0. , -2.68328157, 0.4472136 ]]))

4.6 Using QR Decomposition to Compute Eigenvalues

Now for a useful fact about the QR algorithm.


The following iterations on the QR decomposition can be used to compute eigenvalues of a square matrix 𝐴.
Here is the algorithm:
1. Set 𝐴0 = 𝐴 and form 𝐴0 = 𝑄0 𝑅0
2. Form 𝐴1 = 𝑅0 𝑄0 . Note that 𝐴1 is similar to 𝐴0 (easy to verify) and so has the same eigenvalues.
3. Form 𝐴1 = 𝑄1 𝑅1 (i.e., form the 𝑄𝑅 decomposition of 𝐴1 ).
4. Form 𝐴2 = 𝑅1 𝑄1 and then 𝐴2 = 𝑄2 𝑅2 .
5. Iterate to convergence.
6. Compute eigenvalues of 𝐴 and compare them to the diagonal values of the limiting 𝐴𝑛 found from this process.
Remark: this algorithm is close to one of the most efficient ways of computing eigenvalues!
Let’s write some Python code to try out the algorithm

def QR_eigvals(A, tol=1e-12, maxiter=1000):
    "Find the eigenvalues of A using QR decomposition."

    A_old = np.copy(A)
    A_new = np.copy(A)

    diff = np.inf
    i = 0
    while (diff > tol) and (i < maxiter):
        A_old[:, :] = A_new
        Q, R = QR_Decomposition(A_old)

        A_new[:, :] = R @ Q

        diff = np.abs(A_new - A_old).max()
        i += 1

    eigvals = np.diag(A_new)

    return eigvals


Now let’s try the code and compare the results with what scipy.linalg.eigvals gives us
Here goes

# experiment this with one random A matrix


A = np.random.random((3, 3))

sorted(QR_eigvals(A))

[0.3642660469068212, 0.49117719304268104, 1.1907905618674441]

Compare with the scipy package.

sorted(np.linalg.eigvals(A))

[0.3642660469071797, 0.49117719304232105, 1.1907905618674457]

4.7 𝑄𝑅 and PCA

There are interesting connections between the 𝑄𝑅 decomposition and principal components analysis (PCA).
Here are some.
1. Let 𝑋 ′ be a 𝑘 × 𝑛 random matrix where the 𝑗th column is a random draw from 𝒩(𝜇, Σ) where 𝜇 is 𝑘 × 1 vector
of means and Σ is a 𝑘 × 𝑘 covariance matrix. We want 𝑛 >> 𝑘 – this is an “econometrics example”.
2. Form 𝑋 ′ = 𝑄𝑅 where 𝑄 is 𝑘 × 𝑘 and 𝑅 is 𝑘 × 𝑛.
3. Form the eigenvalues of 𝑅𝑅′ , i.e., we’ll compute 𝑅𝑅′ = 𝑃 ̃ Λ𝑃 ̃ ′ .
4. Form $X'X = Q \tilde{P} \Lambda \tilde{P}' Q'$ and compare it with the eigen decomposition $X'X = P \hat{\Lambda} P'$.
5. It will turn out that $\Lambda = \hat{\Lambda}$ and that $P = Q\tilde{P}$.
Let’s verify conjecture 5 with some Python code.
Start by simulating a random (𝑛, 𝑘) matrix 𝑋.

k = 5
n = 1000

# generate some random moments
μ = np.random.random(size=k)
C = np.random.random((k, k))
Σ = C.T @ C

# X is random matrix where each column follows multivariate normal dist.
X = np.random.multivariate_normal(μ, Σ, size=n)

X.shape

(1000, 5)


Let’s apply the QR decomposition to 𝑋 ′ .

Q, R = adjust_sign(*QR_Decomposition(X.T))

Check the shapes of 𝑄 and 𝑅.

Q.shape, R.shape

((5, 5), (5, 1000))

Now we can construct 𝑅𝑅′ = 𝑃 ̃ Λ𝑃 ̃ ′ and form an eigen decomposition.

RR = R @ R.T

λ, P_tilde = np.linalg.eigh(RR)
Λ = np.diag(λ)

We can also apply the decomposition to $X'X = P \hat{\Lambda} P'$.

XX = X.T @ X

λ_hat, P = np.linalg.eigh(XX)
Λ_hat = np.diag(λ_hat)

Compare the eigenvalues, which are on the diagonals of Λ and Λ̂.

λ, λ_hat

(array([1.46666089e+00, 1.56022587e+02, 5.03047121e+02, 1.01131984e+03,


8.48616704e+03]),
array([1.46666089e+00, 1.56022587e+02, 5.03047121e+02, 1.01131984e+03,
8.48616704e+03]))

Let’s compare 𝑃 and 𝑄𝑃 ̃ .


Again we need to be careful about sign differences between the columns of 𝑃 and 𝑄𝑃 ̃ .

QP_tilde = Q @ P_tilde

np.abs(P @ diag_sign(P) - QP_tilde @ diag_sign(QP_tilde)).max()

4.385380947269368e-15

Let’s verify that 𝑋 ′ 𝑋 can be decomposed as 𝑄𝑃 ̃ Λ𝑃 ̃ ′ 𝑄′ .

QPΛPQ = Q @ P_tilde @ Λ @ P_tilde.T @ Q.T

np.abs(QPΛPQ - XX).max()

5.4569682106375694e-12

CHAPTER

FIVE

COMPLEX NUMBERS AND TRIGONOMETRY

Contents

• Complex Numbers and Trigonometry


– Overview
– De Moivre’s Theorem
– Applications of de Moivre’s Theorem

5.1 Overview

This lecture introduces some elementary mathematics and trigonometry.


Useful and interesting in their own right, these concepts reap substantial rewards when studying dynamics generated by
linear difference equations or linear differential equations.
For example, these tools are keys to understanding outcomes attained by Paul Samuelson (1939) [Sam39] in his classic
paper on interactions between the investment accelerator and the Keynesian consumption function, our topic in the lecture
Samuelson Multiplier Accelerator.
In addition to providing foundations for Samuelson’s work and extensions of it, this lecture can be read as a stand-alone
quick reminder of key results from elementary high school trigonometry.
So let’s dive in.

5.1.1 Complex Numbers

A complex number has a real part 𝑥 and a purely imaginary part 𝑦.


The Euclidean, polar, and trigonometric forms of a complex number 𝑧 are:

𝑧 = 𝑥 + 𝑖𝑦 = 𝑟𝑒𝑖𝜃 = 𝑟(cos 𝜃 + 𝑖 sin 𝜃)

The second equality above is known as Euler’s formula


• Euler contributed many other formulas too!
The complex conjugate 𝑧 ̄ of 𝑧 is defined as

𝑧 ̄ = 𝑥 − 𝑖𝑦 = 𝑟𝑒−𝑖𝜃 = 𝑟(cos 𝜃 − 𝑖 sin 𝜃)


The value 𝑥 is the real part of 𝑧 and 𝑦 is the imaginary part of 𝑧.



The symbol $|z| = \sqrt{\bar{z} \cdot z} = r$ represents the modulus of 𝑧.
The value 𝑟 is the Euclidean distance of vector (𝑥, 𝑦) from the origin:

$$r = |z| = \sqrt{x^2 + y^2}$$

The value 𝜃 is the angle of (𝑥, 𝑦) with respect to the real axis.
Evidently, the tangent of 𝜃 is $\frac{y}{x}$.
Therefore,

$$\theta = \tan^{-1}\left(\frac{y}{x}\right)$$
Three elementary trigonometric functions are

$$\cos\theta = \frac{x}{r} = \frac{e^{i\theta} + e^{-i\theta}}{2}, \quad
\sin\theta = \frac{y}{r} = \frac{e^{i\theta} - e^{-i\theta}}{2i}, \quad
\tan\theta = \frac{y}{x}$$
We’ll need the following imports:

%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
from sympy import *

5.1.2 An Example

Consider the complex number $z = 1 + \sqrt{3}i$.

For $z = 1 + \sqrt{3}i$, $x = 1$, $y = \sqrt{3}$.

It follows that $r = 2$ and $\theta = \tan^{-1}(\sqrt{3}) = \pi/3 = 60^o$.

Let's use Python to plot the trigonometric form of the complex number $z = 1 + \sqrt{3}i$.

# Abbreviate useful values and functions


π = np.pi

# Set parameters
r = 2
θ = π/3
x = r * np.cos(θ)
x_range = np.linspace(0, x, 1000)
θ_range = np.linspace(0, θ, 1000)

# Plot
fig = plt.figure(figsize=(8, 8))
ax = plt.subplot(111, projection='polar')

ax.plot((0, θ), (0, r), marker='o', color='b') # Plot r


ax.plot(np.zeros(x_range.shape), x_range, color='b') # Plot x
ax.plot(θ_range, x / np.cos(θ_range), color='b') # Plot y


ax.plot(θ_range, np.full(θ_range.shape, 0.1), color='r') # Plot θ

ax.margins(0) # Let the plot starts at origin

ax.set_title("Trigonometry of complex numbers", va='bottom',


fontsize='x-large')

ax.set_rmax(2)
ax.set_rticks((0.5, 1, 1.5, 2)) # Less radial ticks
ax.set_rlabel_position(-88.5) # Get radial labels away from plotted line

ax.text(θ, r+0.01 , r'$z = x + iy = 1 + \sqrt{3}\, i$') # Label z


ax.text(θ+0.2, 1 , '$r = 2$') # Label r
ax.text(0-0.2, 0.5, '$x = 1$') # Label x
ax.text(0.5, 1.2, r'$y = \sqrt{3}$') # Label y
ax.text(0.25, 0.15, r'$\theta = 60^o$') # Label θ

ax.grid(True)
plt.show()


5.2 De Moivre’s Theorem

de Moivre’s theorem states that:

$$(r(\cos\theta + i\sin\theta))^n = r^n e^{in\theta} = r^n(\cos n\theta + i\sin n\theta)$$

To prove de Moivre’s theorem, note that


$$(r(\cos\theta + i\sin\theta))^n = \left(re^{i\theta}\right)^n$$

and compute.


5.3 Applications of de Moivre’s Theorem

5.3.1 Example 1

We can use de Moivre's theorem to show that $r = \sqrt{x^2 + y^2}$.


We have
$$
\begin{aligned}
1 &= e^{i\theta} e^{-i\theta} \\
  &= (\cos\theta + i\sin\theta)(\cos(-\theta) + i\sin(-\theta)) \\
  &= (\cos\theta + i\sin\theta)(\cos\theta - i\sin\theta) \\
  &= \cos^2\theta + \sin^2\theta \\
  &= \frac{x^2}{r^2} + \frac{y^2}{r^2}
\end{aligned}
$$
and thus

𝑥2 + 𝑦 2 = 𝑟 2

We recognize this as a theorem of Pythagoras.

5.3.2 Example 2

Let 𝑧 = 𝑟𝑒𝑖𝜃 and 𝑧 ̄ = 𝑟𝑒−𝑖𝜃 so that 𝑧 ̄ is the complex conjugate of 𝑧.


(𝑧, 𝑧)̄ form a complex conjugate pair of complex numbers.
Let 𝑎 = 𝑝𝑒𝑖𝜔 and 𝑎̄ = 𝑝𝑒−𝑖𝜔 be another complex conjugate pair.
For each element of a sequence of integers 𝑛 = 0, 1, 2, … , we want to evaluate $x_n = a z^n + \bar{a} \bar{z}^n$.
To do so, we can apply de Moivre’s formula.
Thus,

$$
\begin{aligned}
x_n &= a z^n + \bar{a} \bar{z}^n \\
    &= p e^{i\omega} (re^{i\theta})^n + p e^{-i\omega} (re^{-i\theta})^n \\
    &= p r^n e^{i(\omega + n\theta)} + p r^n e^{-i(\omega + n\theta)} \\
    &= p r^n \left[\cos(\omega + n\theta) + i\sin(\omega + n\theta) + \cos(\omega + n\theta) - i\sin(\omega + n\theta)\right] \\
    &= 2 p r^n \cos(\omega + n\theta)
\end{aligned}
$$

5.3.3 Example 3

This example provides machinery that is at the heart of Samuelson's analysis of his multiplier-accelerator model [Sam39].
Thus, consider a second-order linear difference equation

𝑥𝑛+2 = 𝑐1 𝑥𝑛+1 + 𝑐2 𝑥𝑛

whose characteristic polynomial is

𝑧 2 − 𝑐1 𝑧 − 𝑐 2 = 0


or

(𝑧 2 − 𝑐1 𝑧 − 𝑐2 ) = (𝑧 − 𝑧1 )(𝑧 − 𝑧2 ) = 0

has roots $z_1, z_2$.

A solution is a sequence $\{x_n\}_{n=0}^{\infty}$ that satisfies the difference equation.

Under the following circumstances, we can apply our example 2 formula to solve the difference equation
• the roots 𝑧1 , 𝑧2 of the characteristic polynomial of the difference equation form a complex conjugate pair
• the values 𝑥0 , 𝑥1 are given initial conditions
To solve the difference equation, recall from example 2 that

𝑥𝑛 = 2𝑝𝑟𝑛 cos (𝜔 + 𝑛𝜃)

where 𝜔, 𝑝 are coefficients to be determined from information encoded in the initial conditions 𝑥1 , 𝑥0 .
Since 𝑥0 = 2𝑝 cos 𝜔 and 𝑥1 = 2𝑝𝑟 cos (𝜔 + 𝜃) the ratio of 𝑥1 to 𝑥0 is

$$\frac{x_1}{x_0} = \frac{r\cos(\omega + \theta)}{\cos\omega}$$

We can solve this equation for 𝜔 then solve for 𝑝 using 𝑥0 = 2𝑝𝑟0 cos (𝜔 + 𝑛𝜃).
With the sympy package in Python, we are able to solve and plot the dynamics of 𝑥𝑛 given different values of 𝑛.
In this example, we set the initial values: $r = 0.9$, $\theta = \frac{1}{4}\pi$, $x_0 = 4$, $x_1 = r \cdot 2\sqrt{2} = 1.8\sqrt{2}$.
We first numerically solve for 𝜔 and 𝑝 using nsolve in the sympy package based on the above initial condition:

# Set parameters
r = 0.9
θ = π/4
x0 = 4
x1 = 2 * r * sqrt(2)

# Define symbols to be calculated
ω, p = symbols('ω p', real=True)

# Solve for ω
## Note: we choose the solution near 0
eq1 = Eq(x1/x0 - r * cos(ω+θ) / cos(ω), 0)
ω = nsolve(eq1, ω, 0)
ω = float(ω)
print(f'ω = {ω:1.3f}')

# Solve for p
eq2 = Eq(x0 - 2 * p * cos(ω), 0)
p = nsolve(eq2, p, 0)
p = float(p)
print(f'p = {p:1.3f}')

ω = 0.000
p = 2.000


Using the code above, we compute that 𝜔 = 0 and 𝑝 = 2.


Then we plug in the values we solve for 𝜔 and 𝑝 and plot the dynamic.

# Define range of n
max_n = 30
n = np.arange(0, max_n+1, 0.01)

# Define x_n
x = lambda n: 2 * p * r**n * np.cos(ω + n * θ)

# Plot
fig, ax = plt.subplots(figsize=(12, 8))

ax.plot(n, x(n))
ax.set(xlim=(0, max_n), ylim=(-5, 5), xlabel='$n$', ylabel='$x_n$')

# Set x-axis in the middle of the plot


ax.spines['bottom'].set_position('center')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')

ticklab = ax.xaxis.get_ticklabels()[0] # Set x-label position


trans = ticklab.get_transform()
ax.xaxis.set_label_coords(31, 0, transform=trans)

ticklab = ax.yaxis.get_ticklabels()[0] # Set y-label position


trans = ticklab.get_transform()
ax.yaxis.set_label_coords(0, 5, transform=trans)

ax.grid()
plt.show()


5.3.4 Trigonometric Identities

We can obtain a complete suite of trigonometric identities by appropriately manipulating polar forms of complex numbers.
We’ll get many of them by deducing implications of the equality
𝑒𝑖(𝜔+𝜃) = 𝑒𝑖𝜔 𝑒𝑖𝜃
For example, we’ll calculate identities for
cos (𝜔 + 𝜃) and sin (𝜔 + 𝜃).
Using the sine and cosine formulas presented at the beginning of this lecture, we have:
$$\cos(\omega + \theta) = \frac{e^{i(\omega+\theta)} + e^{-i(\omega+\theta)}}{2}$$

$$\sin(\omega + \theta) = \frac{e^{i(\omega+\theta)} - e^{-i(\omega+\theta)}}{2i}$$
We can also obtain the trigonometric identities as follows:
cos (𝜔 + 𝜃) + 𝑖 sin (𝜔 + 𝜃) = 𝑒𝑖(𝜔+𝜃)
= 𝑒𝑖𝜔 𝑒𝑖𝜃
= (cos 𝜔 + 𝑖 sin 𝜔)(cos 𝜃 + 𝑖 sin 𝜃)
= (cos 𝜔 cos 𝜃 − sin 𝜔 sin 𝜃) + 𝑖(cos 𝜔 sin 𝜃 + sin 𝜔 cos 𝜃)
Since both real and imaginary parts of the above formula should be equal, we get:
cos (𝜔 + 𝜃) = cos 𝜔 cos 𝜃 − sin 𝜔 sin 𝜃
sin (𝜔 + 𝜃) = cos 𝜔 sin 𝜃 + sin 𝜔 cos 𝜃
The equations above are also known as the angle sum identities. We can verify the equations using the simplify
function in the sympy package:


# Define symbols
ω, θ = symbols('ω θ', real=True)

# Verify
print("cos(ω)cos(θ) - sin(ω)sin(θ) =",
simplify(cos(ω)*cos(θ) - sin(ω) * sin(θ)))
print("cos(ω)sin(θ) + sin(ω)cos(θ) =",
simplify(cos(ω)*sin(θ) + sin(ω) * cos(θ)))

cos(ω)cos(θ) - sin(ω)sin(θ) = cos(θ + ω)


cos(ω)sin(θ) + sin(ω)cos(θ) = sin(θ + ω)

5.3.5 Trigonometric Integrals

We can also compute the trigonometric integrals using polar forms of complex numbers.
For example, we want to solve the following integral:
$$\int_{-\pi}^{\pi} \cos(\omega)\sin(\omega)\,d\omega$$

Using Euler’s formula, we have:

$$
\begin{aligned}
\int \cos(\omega)\sin(\omega)\,d\omega
  &= \int \frac{(e^{i\omega} + e^{-i\omega})}{2} \, \frac{(e^{i\omega} - e^{-i\omega})}{2i} \, d\omega \\
  &= \frac{1}{4i} \int e^{2i\omega} - e^{-2i\omega} \, d\omega \\
  &= \frac{1}{4i} \left( \frac{-i}{2} e^{2i\omega} - \frac{i}{2} e^{-2i\omega} + C_1 \right) \\
  &= -\frac{1}{8} \left[ \left(e^{i\omega}\right)^2 + \left(e^{-i\omega}\right)^2 - 2 \right] + C_2 \\
  &= -\frac{1}{8} \left(e^{i\omega} - e^{-i\omega}\right)^2 + C_2 \\
  &= \frac{1}{2} \left( \frac{e^{i\omega} - e^{-i\omega}}{2i} \right)^2 + C_2 \\
  &= \frac{1}{2} \sin^2(\omega) + C_2
\end{aligned}
$$

and thus:

$$\int_{-\pi}^{\pi} \cos(\omega)\sin(\omega)\,d\omega = \frac{1}{2}\sin^2(\pi) - \frac{1}{2}\sin^2(-\pi) = 0$$

We can verify the analytical as well as numerical results using integrate in the sympy package:

# Set initial printing


init_printing()

ω = Symbol('ω')
print('The analytical solution for integral of cos(ω)sin(ω) is:')
integrate(cos(ω) * sin(ω), ω)


The analytical solution for integral of cos(ω)sin(ω) is:

print('The numerical solution for the integral of cos(ω)sin(ω) \


from -π to π is:')
integrate(cos(ω) * sin(ω), (ω, -π, π))

The numerical solution for the integral of cos(ω)sin(ω) from -π to π is:

5.3.6 Exercises

We invite the reader to verify analytically and with the sympy package the following two equalities:
$$\int_{-\pi}^{\pi} \cos(\omega)^2 \, d\omega = \pi$$

$$\int_{-\pi}^{\pi} \sin(\omega)^2 \, d\omega = \pi$$



CHAPTER

SIX

CIRCULANT MATRICES

6.1 Overview

This lecture describes circulant matrices and some of their properties.


Circulant matrices have a special structure that connects them to useful concepts including
• convolution
• Fourier transforms
• permutation matrices
Because of these connections, circulant matrices are widely used in machine learning, for example, in image processing.
We begin by importing some Python packages

import numpy as np
from numba import njit
import matplotlib.pyplot as plt
%matplotlib inline

np.set_printoptions(precision=3, suppress=True)

6.2 Constructing a Circulant Matrix

To construct an 𝑁 × 𝑁 circulant matrix, we need only the first row, say,

[𝑐0 𝑐1 𝑐2 𝑐3 𝑐4 ⋯ 𝑐𝑁−1 ] .

After setting entries in the first row, the remaining rows of a circulant matrix are determined as follows:
$$
C = \begin{bmatrix}
c_0 & c_1 & c_2 & c_3 & c_4 & \cdots & c_{N-1} \\
c_{N-1} & c_0 & c_1 & c_2 & c_3 & \cdots & c_{N-2} \\
c_{N-2} & c_{N-1} & c_0 & c_1 & c_2 & \cdots & c_{N-3} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
c_3 & c_4 & c_5 & c_6 & c_7 & \cdots & c_2 \\
c_2 & c_3 & c_4 & c_5 & c_6 & \cdots & c_1 \\
c_1 & c_2 & c_3 & c_4 & c_5 & \cdots & c_0
\end{bmatrix} \qquad (6.1)
$$
It is also possible to construct a circulant matrix by creating the transpose of the above matrix, in which case only the first
column needs to be specified.
Let’s write some Python code to generate a circulant matrix.


@njit
def construct_cirlulant(row):

    N = row.size

    C = np.empty((N, N))

    for i in range(N):
        C[i, i:] = row[:N-i]
        C[i, :i] = row[N-i:]

    return C

# a simple case when N = 3
construct_cirlulant(np.array([1., 2., 3.]))

array([[1., 2., 3.],


[3., 1., 2.],
[2., 3., 1.]])

6.2.1 Some Properties of Circulant Matrices

Here are some useful properties:


Suppose that 𝐴 and 𝐵 are both circulant matrices. Then it can be verified that
• The transpose of a circulant matrix is a circulant matrix.
• 𝐴 + 𝐵 is a circulant matrix
• 𝐴𝐵 is a circulant matrix
• 𝐴𝐵 = 𝐵𝐴
Now consider a circulant matrix with first row

𝑐 = [𝑐0 𝑐1 ⋯ 𝑐𝑁−1 ]

and consider a vector

𝑎 = [𝑎0 𝑎1 ⋯ 𝑎𝑁−1 ]

The convolution of vectors 𝑐 and 𝑎 is defined as the vector 𝑏 = 𝑐 ∗ 𝑎 with components


$$b_k = \sum_{i=0}^{n-1} c_{k-i} a_i \qquad (6.2)$$

We use ∗ to denote convolution via the calculation described in equation (6.2).


It can be verified that the vector 𝑏 satisfies

𝑏 = 𝐶𝑇 𝑎

where 𝐶 𝑇 is the transpose of the circulant matrix defined in equation (6.1).
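Here is a quick check of this claim (the vectors 𝑐 and 𝑎 are our own examples), using the construct_cirlulant function defined above and a direct evaluation of equation (6.2) with the index 𝑘 − 𝑖 taken modulo 𝑁:

c = np.array([1., 2., 3., 4.])
a = np.array([0.5, -1., 2., 1.5])
N = c.size

C = construct_cirlulant(c)

# direct evaluation of b_k = Σ_i c_{k-i} a_i, wrapping the index k - i around mod N
b = np.array([sum(c[(k - i) % N] * a[i] for i in range(N)) for k in range(N)])

print(np.allclose(b, C.T @ a))   # True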


6.3 Connection to Permutation Matrix

A good way to construct a circulant matrix is to use a permutation matrix.


Before defining a permutation matrix, we’ll define a permutation.
A permutation of a set of the set of non-negative integers {0, 1, 2, …} is a one-to-one mapping of the set into itself.
A permutation of a set {1, 2, … , 𝑛} rearranges the 𝑛 integers in the set.
A permutation matrix is obtained by permuting the rows of an 𝑛 × 𝑛 identity matrix according to a permutation of the
numbers 1 to 𝑛.
Thus, every row and every column contain precisely a single 1 with 0 everywhere else.
Every permutation corresponds to a unique permutation matrix.
For example, the 𝑁 × 𝑁 matrix

$$
P = \begin{bmatrix}
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 \\
1 & 0 & 0 & 0 & \cdots & 0
\end{bmatrix} \qquad (6.3)
$$

serves as a cyclic shift operator that, when applied to an 𝑁 × 1 vector ℎ, shifts entries in rows 2 through 𝑁 up one row
and shifts the entry in row 1 to row 𝑁 .
Eigenvalues of the cyclic shift permutation matrix 𝑃 defined in equation (6.3) can be computed by constructing

$$
P - \lambda I = \begin{bmatrix}
-\lambda & 1 & 0 & 0 & \cdots & 0 \\
0 & -\lambda & 1 & 0 & \cdots & 0 \\
0 & 0 & -\lambda & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 \\
1 & 0 & 0 & 0 & \cdots & -\lambda
\end{bmatrix}
$$

and solving

$$\det(P - \lambda I) = (-1)^N \lambda^N - 1 = 0$$

Eigenvalues 𝜆𝑖 can be complex.


Magnitudes ∣ 𝜆𝑖 ∣ of these eigenvalues 𝜆𝑖 all equal 1.
Thus, singular values of the permutation matrix 𝑃 defined in equation (6.3) all equal 1.
It can be verified that permutation matrices are orthogonal matrices:

𝑃𝑃′ = 𝐼


6.4 Examples with Python

Let’s write some Python code to illustrate these ideas.

@njit
def construct_P(N):

    P = np.zeros((N, N))

    for i in range(N-1):
        P[i, i+1] = 1
    P[-1, 0] = 1

    return P

P4 = construct_P(4)
P4

array([[0., 1., 0., 0.],


[0., 0., 1., 0.],
[0., 0., 0., 1.],
[1., 0., 0., 0.]])

# compute the eigenvalues and eigenvectors


λ, Q = np.linalg.eig(P4)

for i in range(4):
    print(f'λ{i} = {λ[i]:.1f} \nvec{i} = {Q[i, :]}\n')

λ0 = -1.0+0.0j
vec0 = [-0.5+0.j 0. +0.5j 0. -0.5j -0.5+0.j ]

λ1 = 0.0+1.0j
vec1 = [ 0.5+0.j -0.5+0.j -0.5-0.j -0.5+0.j]

λ2 = 0.0-1.0j
vec2 = [-0.5+0.j 0. -0.5j 0. +0.5j -0.5+0.j ]

λ3 = 1.0+0.0j
vec3 = [ 0.5+0.j 0.5-0.j 0.5+0.j -0.5+0.j]

In graphs below, we shall portray eigenvalues of a shift permutation matrix in the complex plane.
These eigenvalues are uniformly distributed along the unit circle.
They are the 𝑛 roots of unity, meaning they are the 𝑛 numbers 𝑧 that solve 𝑧 𝑛 = 1, where 𝑧 is a complex number.
In particular, the 𝑛 roots of unity are

2𝜋𝑗𝑘
𝑧 = exp ( ), 𝑘 = 0, … , 𝑁 − 1
𝑁
where 𝑗 denotes the purely imaginary unit number.


fig, ax = plt.subplots(2, 2, figsize=(10, 10))

for i, N in enumerate([3, 4, 6, 8]):

    row_i = i // 2
    col_i = i % 2

    P = construct_P(N)
    λ, Q = np.linalg.eig(P)

    circ = plt.Circle((0, 0), radius=1, edgecolor='b', facecolor='None')
    ax[row_i, col_i].add_patch(circ)

    for j in range(N):
        ax[row_i, col_i].scatter(λ[j].real, λ[j].imag, c='b')

    ax[row_i, col_i].set_title(f'N = {N}')
    ax[row_i, col_i].set_xlabel('real')
    ax[row_i, col_i].set_ylabel('imaginary')

plt.show()


For a vector of coefficients $\{c_i\}_{i=0}^{n-1}$, eigenvectors of 𝑃 are also eigenvectors of

$$C = c_0 I + c_1 P + c_2 P^2 + \cdots + c_{N-1} P^{N-1}.$$

Consider an example in which 𝑁 = 8 and let 𝑤 = 𝑒−2𝜋𝑗/𝑁 .


It can be verified that the matrix 𝐹8 of eigenvectors of 𝑃8 is
$$
F_8 = \begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & w & w^2 & \cdots & w^7 \\
1 & w^2 & w^4 & \cdots & w^{14} \\
1 & w^3 & w^6 & \cdots & w^{21} \\
1 & w^4 & w^8 & \cdots & w^{28} \\
1 & w^5 & w^{10} & \cdots & w^{35} \\
1 & w^6 & w^{12} & \cdots & w^{42} \\
1 & w^7 & w^{14} & \cdots & w^{49}
\end{bmatrix}
$$

The matrix $F_8$ defines a Discrete Fourier Transform.

To convert it into an orthogonal eigenvector matrix, we can simply normalize it by dividing every entry by $\sqrt{8}$.


• stare at the first column of 𝐹8 above to convince yourself of this fact


The eigenvalues corresponding to each eigenvector are $\{w^j\}_{j=0}^{7}$ in order.

def construct_F(N):

    w = np.e ** (-complex(0, 2*np.pi/N))

    F = np.ones((N, N), dtype=complex)
    for i in range(1, N):
        F[i, 1:] = w ** (i * np.arange(1, N))

    return F, w

F8, w = construct_F(8)


(0.7071067811865476-0.7071067811865475j)

F8

array([[ 1. +0.j , 1. +0.j , 1. +0.j , 1. +0.j ,


1. +0.j , 1. +0.j , 1. +0.j , 1. +0.j ],
[ 1. +0.j , 0.707-0.707j, 0. -1.j , -0.707-0.707j,
-1. -0.j , -0.707+0.707j, -0. +1.j , 0.707+0.707j],
[ 1. +0.j , 0. -1.j , -1. -0.j , -0. +1.j ,
1. +0.j , 0. -1.j , -1. -0.j , -0. +1.j ],
[ 1. +0.j , -0.707-0.707j, -0. +1.j , 0.707-0.707j,
-1. -0.j , 0.707+0.707j, 0. -1.j , -0.707+0.707j],
[ 1. +0.j , -1. -0.j , 1. +0.j , -1. -0.j ,
1. +0.j , -1. -0.j , 1. +0.j , -1. -0.j ],
[ 1. +0.j , -0.707+0.707j, 0. -1.j , 0.707+0.707j,
-1. -0.j , 0.707-0.707j, -0. +1.j , -0.707-0.707j],
[ 1. +0.j , -0. +1.j , -1. -0.j , 0. -1.j ,
1. +0.j , -0. +1.j , -1. -0.j , 0. -1.j ],
[ 1. +0.j , 0.707+0.707j, -0. +1.j , -0.707+0.707j,
-1. -0.j , -0.707-0.707j, 0. -1.j , 0.707-0.707j]])


# normalize
Q8 = F8 / np.sqrt(8)

# verify the orthogonality (unitarity)


Q8 @ np.conjugate(Q8)

array([[ 1.+0.j, -0.+0.j, -0.+0.j, -0.+0.j, -0.+0.j, 0.+0.j, 0.+0.j,


0.+0.j],
[-0.-0.j, 1.+0.j, -0.+0.j, -0.+0.j, -0.+0.j, -0.+0.j, 0.+0.j,
0.+0.j],
[-0.-0.j, -0.-0.j, 1.+0.j, -0.+0.j, -0.+0.j, -0.+0.j, 0.+0.j,
0.+0.j],
[-0.-0.j, -0.-0.j, -0.-0.j, 1.+0.j, -0.+0.j, -0.+0.j, -0.+0.j,
-0.+0.j],
[-0.-0.j, -0.-0.j, -0.-0.j, -0.-0.j, 1.+0.j, -0.+0.j, -0.+0.j,
-0.+0.j],
[ 0.-0.j, -0.-0.j, -0.-0.j, -0.-0.j, -0.-0.j, 1.+0.j, -0.+0.j,
-0.+0.j],
[ 0.-0.j, 0.-0.j, 0.-0.j, -0.-0.j, -0.-0.j, -0.-0.j, 1.+0.j,
-0.+0.j],
[ 0.-0.j, 0.-0.j, 0.-0.j, -0.-0.j, -0.-0.j, -0.-0.j, -0.-0.j,
1.+0.j]])

Let's verify that the 𝑘th column of $Q_8$ is an eigenvector of $P_8$ with eigenvalue $w^k$.

P8 = construct_P(8)

diff_arr = np.empty(8, dtype=complex)

for j in range(8):
    diff = P8 @ Q8[:, j] - w ** j * Q8[:, j]
    diff_arr[j] = diff @ diff.T


diff_arr

array([ 0.+0.j, -0.+0.j, -0.+0.j, -0.+0.j, -0.+0.j, -0.+0.j, -0.+0.j,


-0.+0.j])


6.5 Associated Permutation Matrix

Next, we execute calculations to verify that the circulant matrix 𝐶 defined in equation (6.1) can be written as

𝐶 = 𝑐0 𝐼 + 𝑐1 𝑃 + ⋯ + 𝑐𝑛−1 𝑃 𝑛−1

and that every eigenvector of 𝑃 is also an eigenvector of 𝐶.


We illustrate this for 𝑁 = 8 case.

c = np.random.random(8)

array([0.873, 0.241, 0.701, 0.546, 0.32 , 0.88 , 0.282, 0.474])

C8 = construct_cirlulant(c)

Compute 𝑐0 𝐼 + 𝑐1 𝑃 + ⋯ + 𝑐𝑛−1 𝑃 𝑛−1 .

N = 8

C = np.zeros((N, N))
P = np.eye(N)

for i in range(N):
    C += c[i] * P
    P = P8 @ P

array([[0.873, 0.241, 0.701, 0.546, 0.32 , 0.88 , 0.282, 0.474],


[0.474, 0.873, 0.241, 0.701, 0.546, 0.32 , 0.88 , 0.282],
[0.282, 0.474, 0.873, 0.241, 0.701, 0.546, 0.32 , 0.88 ],
[0.88 , 0.282, 0.474, 0.873, 0.241, 0.701, 0.546, 0.32 ],
[0.32 , 0.88 , 0.282, 0.474, 0.873, 0.241, 0.701, 0.546],
[0.546, 0.32 , 0.88 , 0.282, 0.474, 0.873, 0.241, 0.701],
[0.701, 0.546, 0.32 , 0.88 , 0.282, 0.474, 0.873, 0.241],
[0.241, 0.701, 0.546, 0.32 , 0.88 , 0.282, 0.474, 0.873]])

C8

array([[0.873, 0.241, 0.701, 0.546, 0.32 , 0.88 , 0.282, 0.474],


[0.474, 0.873, 0.241, 0.701, 0.546, 0.32 , 0.88 , 0.282],
[0.282, 0.474, 0.873, 0.241, 0.701, 0.546, 0.32 , 0.88 ],
[0.88 , 0.282, 0.474, 0.873, 0.241, 0.701, 0.546, 0.32 ],
[0.32 , 0.88 , 0.282, 0.474, 0.873, 0.241, 0.701, 0.546],
[0.546, 0.32 , 0.88 , 0.282, 0.474, 0.873, 0.241, 0.701],
[0.701, 0.546, 0.32 , 0.88 , 0.282, 0.474, 0.873, 0.241],
[0.241, 0.701, 0.546, 0.32 , 0.88 , 0.282, 0.474, 0.873]])

Now let’s compute the difference between two circulant matrices that we have constructed in two different ways.


np.abs(C - C8).max()

0.0

The eigenvector of $P_8$ associated with eigenvalue $w^k$ (the $k$th column of $Q_8$, $k = 0, 1, \ldots, 7$) is also an eigenvector of $C_8$, associated with eigenvalue $\sum_{h=0}^{7} c_h w^{hk}$.

_C8 = np.zeros(8, dtype=complex)

for j in range(8):
    for k in range(8):
        _C8[j] += c[k] * w ** (j * k)


_C8

array([4.316+0.j , 0.051-0.019j, 0.21 -0.1j , 1.056+0.82j ,


0.035-0.j , 1.056-0.82j , 0.21 +0.1j , 0.051+0.019j])

We can verify this by comparing C8 @ Q8[:, j] with _C8[j] * Q8[:, j].

# verify
for j in range(8):
    diff = C8 @ Q8[:, j] - _C8[j] * Q8[:, j]
    print(diff)

[-0.+0.j -0.+0.j -0.+0.j -0.+0.j -0.+0.j -0.+0.j -0.+0.j -0.+0.j]


[-0.-0.j -0.-0.j -0.-0.j -0.-0.j -0.-0.j -0.-0.j -0.-0.j -0.+0.j]
[ 0.-0.j 0.-0.j -0.-0.j -0.-0.j -0.-0.j -0.-0.j -0.-0.j -0.-0.j]
[-0.-0.j 0.-0.j -0.-0.j 0.-0.j -0.+0.j 0.-0.j -0.-0.j -0.+0.j]
[ 0.+0.j -0.-0.j 0.+0.j -0.-0.j 0.+0.j -0.-0.j 0.+0.j -0.-0.j]
[-0.+0.j -0.-0.j 0.+0.j -0.-0.j 0.-0.j -0.+0.j 0.-0.j 0.+0.j]
[-0.+0.j -0.-0.j 0.-0.j 0.-0.j 0.-0.j 0.-0.j 0.-0.j 0.-0.j]
[0.-0.j 0.-0.j 0.-0.j 0.-0.j 0.-0.j 0.-0.j 0.+0.j 0.+0.j]
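As a further cross-check (a minimal sketch, not part of the original lecture code), the eigenvalues $\sum_{h=0}^{7} c_h w^{hk}$ that we just computed by hand are the discrete Fourier transform of the first row $c$; assuming, as above, that $w = e^{-2\pi i/8}$, NumPy's FFT delivers them directly.

# the eigenvalues of a circulant matrix equal the DFT of its first row
fft_eigs = np.fft.fft(c)
print(np.allclose(fft_eigs, _C8))   # expected: True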

6.6 Discrete Fourier Transform

The Discrete Fourier Transform (DFT) allows us to represent a discrete time sequence as a weighted sum of complex
sinusoids.
Consider a sequence of $N$ real numbers $\{x_j\}_{j=0}^{N-1}$.

The Discrete Fourier Transform maps $\{x_j\}_{j=0}^{N-1}$ into a sequence of complex numbers $\{X_k\}_{k=0}^{N-1}$


where

$$X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi \frac{kn}{N} i}$$

def DFT(x):
    "The discrete Fourier transform."

    N = len(x)
    w = np.e ** (-complex(0, 2*np.pi/N))

    X = np.zeros(N, dtype=complex)
    for k in range(N):
        for n in range(N):
            X[k] += x[n] * w ** (k * n)

    return X

Consider the following example.

$$x_n = \begin{cases} 1/2 & n = 0, 1 \\ 0 & \text{otherwise} \end{cases}$$

x = np.zeros(10)
x[0:2] = 1/2
x

array([0.5, 0.5, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ])

Apply a discrete Fourier transform.

X = DFT(x)
X


array([ 1. +0.j , 0.905-0.294j, 0.655-0.476j, 0.345-0.476j,


0.095-0.294j, -0. +0.j , 0.095+0.294j, 0.345+0.476j,
0.655+0.476j, 0.905+0.294j])

We can plot magnitudes of a sequence of numbers and the associated discrete Fourier transform.

def plot_magnitude(x=None, X=None):

data = []
names = []
xs = []
if (x is not None):
data.append(x)
names.append('x')
xs.append('n')
if (X is not None):
data.append(X)
names.append('X')
xs.append('j')

num = len(data)
for i in range(num):
n = data[i].size
plt.figure(figsize=(8, 3))
plt.scatter(range(n), np.abs(data[i]))
plt.vlines(range(n), 0, np.abs(data[i]), color='b')

plt.xlabel(xs[i])
plt.ylabel('magnitude')
plt.title(names[i])
plt.show()

plot_magnitude(x=x, X=X)


The inverse Fourier transform transforms a Fourier transform $X$ of $x$ back to $x$.

The inverse Fourier transform is defined as

$$x_n = \sum_{k=0}^{N-1} \frac{1}{N} X_k e^{2\pi\left(\frac{kn}{N}\right) i}, \quad n = 0, 1, \ldots, N-1$$

def inverse_transform(X):

    N = len(X)
    w = np.e ** (complex(0, 2*np.pi/N))

    x = np.zeros(N, dtype=complex)
    for n in range(N):
        for k in range(N):
            x[n] += X[k] * w ** (k * n) / N

    return x

inverse_transform(X)


array([ 0.5+0.j, 0.5-0.j, -0. -0.j, -0. -0.j, -0. -0.j, -0. -0.j,
-0. +0.j, -0. +0.j, -0. +0.j, -0. +0.j])
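As a sanity check (a minimal sketch; np.fft uses the same sign and scaling conventions assumed above), we can compare the hand-rolled transforms with NumPy's built-in FFT routines.

# compare the loops above with NumPy's FFT implementation
print(np.allclose(DFT(x), np.fft.fft(x)))                 # expected: True
print(np.allclose(inverse_transform(X), np.fft.ifft(X)))  # expected: True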

Another example is

$$x_n = 2 \cos\left(2\pi \frac{11}{40} n\right), \quad n = 0, 1, 2, \cdots, 19$$

Since $N = 20$, we cannot use an integer multiple of $\frac{1}{20}$ to represent the frequency $\frac{11}{40}$.

To handle this, we shall end up using all $N$ of the available frequencies in the DFT.

Since $\frac{11}{40}$ is between $\frac{10}{40}$ and $\frac{12}{40}$ (each of which is an integer multiple of $\frac{1}{20}$), the complex coefficients in the DFT have their largest magnitudes at $k = 5, 6, 15, 16$, not just at a single frequency.

N = 20
x = np.empty(N)

for j in range(N):
    x[j] = 2 * np.cos(2 * np.pi * 11 * j / 40)

X = DFT(x)


plot_magnitude(x=x, X=X)


What happens if we change the last example to $x_n = 2 \cos\left(2\pi \frac{10}{40} n\right)$?

Note that $\frac{10}{40}$ is an integer multiple of $\frac{1}{20}$.

N = 20
x = np.empty(N)

for j in range(N):
    x[j] = 2 * np.cos(2 * np.pi * 10 * j / 40)

X = DFT(x)


plot_magnitude(x=x, X=X)

If we represent the discrete Fourier transform as a matrix, we discover that it equals the matrix 𝐹𝑁 of eigenvectors of the
permutation matrix 𝑃𝑁 .
We can use the example where $x_n = 2 \cos\left(2\pi \frac{11}{40} n\right), \; n = 0, 1, 2, \cdots, 19$ to illustrate this.

N = 20
x = np.empty(N)

for j in range(N):
    x[j] = 2 * np.cos(2 * np.pi * 11 * j / 40)

x

array([ 2. , -0.313, -1.902, 0.908, 1.618, -1.414, -1.176, 1.782,


0.618, -1.975, -0. , 1.975, -0.618, -1.782, 1.176, 1.414,
-1.618, -0.908, 1.902, 0.313])

First use the summation formula to transform 𝑥 to 𝑋.

X = DFT(x)
X


array([2. +0.j , 2. +0.558j, 2. +1.218j, 2. +2.174j, 2. +4.087j,


2.+12.785j, 2.-12.466j, 2. -3.751j, 2. -1.801j, 2. -0.778j,
2. -0.j , 2. +0.778j, 2. +1.801j, 2. +3.751j, 2.+12.466j,
2.-12.785j, 2. -4.087j, 2. -2.174j, 2. -1.218j, 2. -0.558j])

Now let's evaluate the outcome of postmultiplying the eigenvector matrix $F_{20}$ by the vector $x$, a product that we claim should equal the Fourier transform of the sequence $\{x_n\}_{n=0}^{N-1}$.

F20, _ = construct_F(20)


F20 @ x

array([2. +0.j , 2. +0.558j, 2. +1.218j, 2. +2.174j, 2. +4.087j,


2.+12.785j, 2.-12.466j, 2. -3.751j, 2. -1.801j, 2. -0.778j,
2. -0.j , 2. +0.778j, 2. +1.801j, 2. +3.751j, 2.+12.466j,
2.-12.785j, 2. -4.087j, 2. -2.174j, 2. -1.218j, 2. -0.558j])

Similarly, the inverse DFT can be expressed as the inverse DFT matrix $F_{20}^{-1}$.

F20_inv = np.linalg.inv(F20)
F20_inv @ X

array([ 2. +0.j, -0.313+0.j, -1.902-0.j, 0.908+0.j, 1.618+0.j,


-1.414-0.j, -1.176-0.j, 1.782-0.j, 0.618+0.j, -1.975+0.j,
-0. -0.j, 1.975+0.j, -0.618+0.j, -1.782-0.j, 1.176-0.j,
1.414+0.j, -1.618+0.j, -0.908-0.j, 1.902-0.j, 0.313+0.j])



CHAPTER

SEVEN

SINGULAR VALUE DECOMPOSITION (SVD)

In addition to regular packages contained in Anaconda by default, this lecture also requires:

!pip install quandl

import numpy as np
import numpy.linalg as LA
import matplotlib.pyplot as plt
%matplotlib inline
import quandl as ql
import pandas as pd

7.1 Overview

The singular value decomposition is a work-horse in applications of least squares projection that form a foundation for
some important machine learning methods.
This lecture describes the singular value decomposition and two of its uses:
• principal components analysis (PCA)
• dynamic mode decomposition (DMD)
Each of these can be thought of as a data-reduction procedure designed to capture salient patterns by projecting data onto
a limited set of factors.

7.2 The Setup

Let 𝑋 be an 𝑚 × 𝑛 matrix of rank 𝑟.


Necessarily, 𝑟 ≤ min(𝑚, 𝑛).
In this lecture, we’ll think of 𝑋 as a matrix of data.
• each column is an individual – a time period or person, depending on the application
• each row is a random variable measuring an attribute of a time period or a person, depending on the application
We’ll be interested in two cases
• A short and fat case in which 𝑚 << 𝑛, so that there are many more columns than rows.
• A tall and skinny case in which 𝑚 >> 𝑛, so that there are many more rows than columns.


We’ll apply a singular value decomposition of 𝑋 in both situations.


In the first case in which there are many more observations 𝑛 than random variables 𝑚, we learn about a joint distribution
by taking averages across observations of functions of the observations.
Here we’ll look for patterns by using a singular value decomposition to do a principal components analysis (PCA).
In the second case in which there are many more random variables 𝑚 than observations 𝑛, we’ll proceed in a different
way.
We’ll again use a singular value decomposition, but now to do a dynamic mode decomposition (DMD)

7.3 Singular Value Decomposition

A singular value decomposition of an 𝑚 × 𝑛 matrix 𝑋 of rank 𝑟 ≤ min(𝑚, 𝑛) is

𝑋 = 𝑈 Σ𝑉 𝑇

where
$$U U^T = I, \quad U^T U = I, \quad V V^T = I, \quad V^T V = I$$
where
• $U$ is an $m \times m$ matrix whose columns are eigenvectors of $X X^T$
• $V$ is an $n \times n$ matrix whose columns are eigenvectors of $X^T X$
• Σ is an 𝑚 × 𝑛 matrix in which the first 𝑟 places on its main diagonal are positive numbers 𝜎1 , 𝜎2 , … , 𝜎𝑟 called
singular values; remaining entries of Σ are all zero
• The 𝑟 singular values are square roots of the eigenvalues of the 𝑚 × 𝑚 matrix 𝑋𝑋 𝑇 and the 𝑛 × 𝑛 matrix 𝑋 𝑇 𝑋
• When $U$ is a complex-valued matrix, $U^T$ denotes the conjugate-transpose or Hermitian-transpose of $U$, meaning that $U_{ij}^T$ is the complex conjugate of $U_{ji}$.
• Similarly, when $V$ is a complex-valued matrix, $V^T$ denotes the conjugate-transpose or Hermitian-transpose of $V$.
In what is called a full SVD, the shapes of 𝑈 , Σ, and 𝑉 are (𝑚, 𝑚), (𝑚, 𝑛), (𝑛, 𝑛), respectively.
There is also an alternative shape convention called an economy or reduced SVD .
Thus, note that because we assume that 𝑋 has rank 𝑟, there are only 𝑟 nonzero singular values, where 𝑟 = rank(𝑋) ≤
min (𝑚, 𝑛).
A reduced SVD uses this fact to express $U$, $\Sigma$, and $V$ as matrices with shapes $(m, r)$, $(r, r)$, $(n, r)$.
Sometimes, we will use a full SVD in which $U$, $\Sigma$, and $V$ have shapes $(m, m)$, $(m, n)$, $(n, n)$.
Caveat: The properties

$$U U^T = I, \quad U^T U = I, \quad V V^T = I, \quad V^T V = I$$

apply to a full SVD but not to a reduced SVD.

In the tall-skinny case in which $m >> n$, for a reduced SVD

$$U U^T \neq I, \quad U^T U = I, \quad V V^T = I, \quad V^T V = I$$


while in the short-fat case in which $m << n$, for a reduced SVD

$$U U^T = I, \quad U^T U = I, \quad V V^T \neq I, \quad V^T V = I$$

When we study Dynamic Mode Decomposition below, we shall want to remember this caveat because we’ll be using
reduced SVD’s to compute key objects.

7.4 Reduced Versus Full SVD

Earlier, we mentioned full and reduced SVD’s.


You can read about reduced and full SVD here: https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html
In a full SVD
• 𝑈 is 𝑚 × 𝑚
• Σ is 𝑚 × 𝑛
• 𝑉 is 𝑛 × 𝑛
In a reduced SVD
• 𝑈 is 𝑚 × 𝑟
• Σ is 𝑟 × 𝑟
• 𝑉 is 𝑛 × 𝑟
Let's do a small exercise to compare full and reduced SVD's.
First, let's study a case in which 𝑚 = 5 > 𝑛 = 2.
(This is a small example of the tall-skinny case that will concern us when we study Dynamic Mode Decompositions below.)

import numpy as np
X = np.random.rand(5,2)
U, S, V = np.linalg.svd(X,full_matrices=True) # full SVD
Uhat, Shat, Vhat = np.linalg.svd(X,full_matrices=False) # economy SVD
print('U, S, V ='), U, S, V

U, S, V =

(None,
array([[-0.37541622, -0.19584961, -0.67837623, -0.4401461 , -0.40839036],
[-0.55849099, -0.35735004, 0.65235117, -0.3671743 , -0.00312385],
[-0.42904804, 0.86421085, 0.11626597, 0.02008157, -0.23481127],
[-0.48271863, -0.27745485, -0.10739446, 0.81450895, -0.12265046],
[-0.36062582, 0.10051012, -0.2986508 , -0.08732897, 0.87351479]]),
array([1.55734944, 0.39671477]),
array([[-0.96609979, -0.25816894],
[-0.25816894, 0.96609979]]))

print('Uhat, Shat, Vhat = '), Uhat, Shat, Vhat


Uhat, Shat, Vhat =

(None,
array([[-0.37541622, -0.19584961],
[-0.55849099, -0.35735004],
[-0.42904804, 0.86421085],
[-0.48271863, -0.27745485],
[-0.36062582, 0.10051012]]),
array([1.55734944, 0.39671477]),
array([[-0.96609979, -0.25816894],
[-0.25816894, 0.96609979]]))

rr = np.linalg.matrix_rank(X)
print('rank of X - '), rr

rank of X -

(None, 2)

Properties:
• Where $U$ is constructed via a full SVD, $U^T U = I_{m\times m}$ and $U U^T = I_{m\times m}$
• Where $\hat U$ is constructed via a reduced SVD, although $\hat U^T \hat U = I_{r\times r}$, it happens that $\hat U \hat U^T \neq I_{m\times m}$
We illustrate these properties for our example with the following code cells.

UTU = U.T@U
UUT = [email protected]
print('UUT, UTU = '), UUT, UTU

UUT, UTU =

(None,
array([[ 1.00000000e+00, 7.25986945e-17, -4.29313316e-17,
-7.16514089e-20, -3.42603458e-17],
[ 7.25986945e-17, 1.00000000e+00, 1.64603035e-16,
-6.51859739e-17, -8.41387647e-18],
[-4.29313316e-17, 1.64603035e-16, 1.00000000e+00,
9.85891939e-18, -1.53238976e-16],
[-7.16514089e-20, -6.51859739e-17, 9.85891939e-18,
1.00000000e+00, -4.63853856e-17],
[-3.42603458e-17, -8.41387647e-18, -1.53238976e-16,
-4.63853856e-17, 1.00000000e+00]]),
array([[ 1.00000000e+00, 6.09772859e-18, 1.03210775e-16,
7.46020143e-17, 9.69958745e-18],
[ 6.09772859e-18, 1.00000000e+00, 1.78870202e-16,
1.51877577e-17, 3.14064924e-17],
[ 1.03210775e-16, 1.78870202e-16, 1.00000000e+00,
8.72751834e-17, -6.17786179e-17],
[ 7.46020143e-17, 1.51877577e-17, 8.72751834e-17,
1.00000000e+00, 3.04189949e-17],
[ 9.69958745e-18, 3.14064924e-17, -6.17786179e-17,
3.04189949e-17, 1.00000000e+00]]))


UhatUhatT = [email protected]
UhatTUhat = Uhat.T@Uhat
print('UhatUhatT, UhatTUhat= '), UhatUhatT, UhatTUhat

UhatUhatT, UhatTUhat=

(None,
array([[ 0.17929441, 0.27965344, -0.00818377, 0.23555983, 0.11569991],
[ 0.27965344, 0.43961124, -0.06920632, 0.36874251, 0.16548898],
[-0.00818377, -0.06920632, 0.93094262, -0.03267001, 0.24158774],
[ 0.23555983, 0.36874251, -0.03267001, 0.30999847, 0.14619378],
[ 0.11569991, 0.16548898, 0.24158774, 0.14619378, 0.14015327]]),
array([[1.00000000e+00, 6.09772859e-18],
[6.09772859e-18, 1.00000000e+00]]))

Remark: The cells above illustrate the application of the full_matrices=True and full_matrices=False options. Using full_matrices=False returns a reduced singular value decomposition. This option implements an optimal reduced-rank approximation of a matrix, in the sense of minimizing the Frobenius norm of the discrepancy between the approximating matrix and the matrix being approximated. Optimality in this sense is established in the celebrated Eckart–Young theorem. See https://en.wikipedia.org/wiki/Low-rank_approximation.
When we study Dynamic Mode Decompositions below, it will be important for us to remember these properties of full and reduced SVD's in such tall-skinny cases.
Let’s do another exercise, but now we’ll set 𝑚 = 2 < 5 = 𝑛

import numpy as np
X = np.random.rand(2,5)
U, S, V = np.linalg.svd(X,full_matrices=True) # full SVD
Uhat, Shat, Vhat = np.linalg.svd(X,full_matrices=False) # economy SVD
print('U, S, V ='), U, S, V

U, S, V =

(None,
array([[ 0.75354378, -0.65739773],
[ 0.65739773, 0.75354378]]),
array([1.52371698, 0.78047845]),
array([[ 0.24335076, 0.20429337, 0.41002952, 0.69252016, 0.50133447],
[ 0.47340092, 0.16756132, 0.56260509, -0.076725 , -0.65222969],
[-0.35798988, -0.64724402, 0.63701494, -0.16367735, 0.14261877],
[-0.61776202, 0.05930924, -0.01976992, 0.58720849, -0.51927627],
[-0.45484648, 0.71256229, 0.3304125 , -0.37805425, 0.18240677]]))

print('Uhat, Shat, Vhat = '), Uhat, Shat, Vhat

Uhat, Shat, Vhat =

(None,
array([[ 0.75354378, -0.65739773],
[ 0.65739773, 0.75354378]]),
array([1.52371698, 0.78047845]),
array([[ 0.24335076, 0.20429337, 0.41002952, 0.69252016, 0.50133447],
[ 0.47340092, 0.16756132, 0.56260509, -0.076725 , -0.65222969]]))

rr = np.linalg.matrix_rank(X)
print('rank X = '), rr

rank X =

(None, 2)

7.5 Digression: Polar Decomposition

A singular value decomposition (SVD) is related to the polar decomposition of $X$

$$X = SQ$$

where

$$S = U \Sigma U^T, \qquad Q = U V^T$$

and $S$ is evidently a symmetric matrix and $Q$ is an orthogonal matrix.
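As a quick numerical illustration (a minimal sketch for a square matrix, where the formulas above apply directly; the variable names are ours), we can build $S$ and $Q$ from a full SVD and confirm the three claimed properties.

# verify the polar decomposition X = S Q for a square random matrix
X_sq = np.random.rand(4, 4)
U, σ, VT = np.linalg.svd(X_sq)
S = U @ np.diag(σ) @ U.T          # symmetric factor
Q = U @ VT                        # orthogonal factor
print(np.allclose(X_sq, S @ Q))            # X = S Q
print(np.allclose(S, S.T))                 # S is symmetric
print(np.allclose(Q @ Q.T, np.eye(4)))     # Q is orthogonal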

7.6 Principal Components Analysis (PCA)

Let’s begin with a case in which 𝑛 >> 𝑚, so that we have many more observations 𝑛 than random variables 𝑚.
The matrix 𝑋 is short and fat in an 𝑛 >> 𝑚 case as opposed to a tall and skinny case with 𝑚 >> 𝑛 to be discussed
later.
We regard 𝑋 as an 𝑚 × 𝑛 matrix of data:

$$X = [X_1 \mid X_2 \mid \cdots \mid X_n]$$

where for $j = 1, \ldots, n$ the column vector $X_j = \begin{bmatrix} X_{1j} \\ X_{2j} \\ \vdots \\ X_{mj} \end{bmatrix}$ is a vector of observations on variables $\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}$.
In a time series setting, we would think of columns 𝑗 as indexing different times at which random variables are observed,
while rows index different random variables.
In a cross section setting, we would think of columns 𝑗 as indexing different individuals for which random variables are
observed, while rows index different random variables.
The number of positive singular values equals the rank of matrix 𝑋.
Arrange the singular values in decreasing order.
Arrange the positive singular values on the main diagonal of the matrix $\Sigma$ and collect them in a vector $\sigma_R$.
Set all other entries of Σ to zero.


7.7 Relationship of PCA to SVD

To relate an SVD to a PCA (principal component analysis) of a data set $X$, first construct the SVD of the data matrix $X$:

$$X = U \Sigma V^T = \sigma_1 U_1 V_1^T + \sigma_2 U_2 V_2^T + \cdots + \sigma_r U_r V_r^T \tag{7.1}$$

where

𝑈 = [𝑈1 |𝑈2 | … |𝑈𝑚 ]

$$V^T = \begin{bmatrix} V_1^T \\ V_2^T \\ \vdots \\ V_n^T \end{bmatrix}$$
In equation (7.1), each of the 𝑚 × 𝑛 matrices 𝑈𝑗 𝑉𝑗𝑇 is evidently of rank 1.
Thus, we have

$$X = \sigma_1 \begin{pmatrix} U_{11} V_1^T \\ U_{21} V_1^T \\ \cdots \\ U_{m1} V_1^T \end{pmatrix} + \sigma_2 \begin{pmatrix} U_{12} V_2^T \\ U_{22} V_2^T \\ \cdots \\ U_{m2} V_2^T \end{pmatrix} + \ldots + \sigma_r \begin{pmatrix} U_{1r} V_r^T \\ U_{2r} V_r^T \\ \cdots \\ U_{mr} V_r^T \end{pmatrix} \tag{7.2}$$

Here is how we would interpret the objects in the matrix equation (7.2) in a time series context:
• 𝑉𝑘𝑇 = [𝑉𝑘1 𝑉𝑘2 … 𝑉𝑘𝑛 ] for each 𝑘 = 1, … , 𝑛 is a time series {𝑉𝑘𝑗 }𝑛𝑗=1 for the 𝑘th principal component
• $U_k = \begin{bmatrix} U_{1k} \\ U_{2k} \\ \vdots \\ U_{mk} \end{bmatrix}$, $k = 1, \ldots, m$, is a vector of loadings of variables $X_i$ on the $k$th principal component, $i = 1, \ldots, m$
• $\sigma_k$ for each $k = 1, \ldots, r$ is the strength of the $k$th principal component
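To make equation (7.1) concrete (a minimal sketch with simulated data, not part of the original lecture code), we can rebuild a small matrix from its rank-one pieces $\sigma_k U_k V_k^T$ and check that the sum reproduces $X$.

# reconstruct X from its rank-one SVD components
X_demo = np.random.rand(3, 6)
U, σ, VT = np.linalg.svd(X_demo, full_matrices=False)
X_sum = sum(σ[k] * np.outer(U[:, k], VT[k, :]) for k in range(len(σ)))
print(np.allclose(X_demo, X_sum))   # expected: True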

7.8 PCA with Eigenvalues and Eigenvectors

We now use an eigen decomposition of a sample covariance matrix to do PCA.


Let 𝑋𝑚×𝑛 be our 𝑚 × 𝑛 data matrix.
Let’s assume that sample means of all variables are zero.
We can assure this by pre-processing the data by subtracting sample means.
Define the sample covariance matrix Ω as

Ω = 𝑋𝑋 𝑇

Then use an eigen decomposition to represent Ω as follows:

Ω = 𝑃 Λ𝑃 𝑇

Here
• 𝑃 is 𝑚 × 𝑚 matrix of eigenvectors of Ω
• Λ is a diagonal matrix of eigenvalues of Ω


We can then represent 𝑋 as

𝑋 = 𝑃𝜖

where

𝜖𝜖𝑇 = Λ.

We can verify that

𝑋𝑋 𝑇 = 𝑃 Λ𝑃 𝑇 .

It follows that we can represent the data matrix as

$$X = \begin{bmatrix} X_1 | X_2 | \ldots | X_m \end{bmatrix} = \begin{bmatrix} P_1 | P_2 | \ldots | P_m \end{bmatrix} \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_m \end{bmatrix} = P_1 \epsilon_1 + P_2 \epsilon_2 + \ldots + P_m \epsilon_m$$
where

𝜖𝜖𝑇 = Λ.

To reconcile the preceding representation with the PCA that we obtained through the SVD above, we first note that $\epsilon_j^2 = \lambda_j \equiv \sigma_j^2$.

Now define $\tilde\epsilon_j = \frac{\epsilon_j}{\sqrt{\lambda_j}}$, which evidently implies that $\tilde\epsilon_j \tilde\epsilon_j^T = 1$.

Therefore

$$X = \sqrt{\lambda_1} P_1 \tilde\epsilon_1 + \sqrt{\lambda_2} P_2 \tilde\epsilon_2 + \ldots + \sqrt{\lambda_m} P_m \tilde\epsilon_m = \sigma_1 P_1 \tilde\epsilon_1 + \sigma_2 P_2 \tilde\epsilon_2 + \ldots + \sigma_m P_m \tilde\epsilon_m ,$$

which evidently agrees with

$$X = \sigma_1 U_1 V_1^T + \sigma_2 U_2 V_2^T + \ldots + \sigma_r U_r V_r^T$$

provided that we set

• $U_j = P_j$ (the loadings of variables on principal components)
• $V_k^T = \tilde\epsilon_k$ (the principal components)

Since there are several possible ways of computing $P$ and $U$ for a given data matrix $X$, depending on the algorithms used, we might have sign differences or different orderings of eigenvectors.
We can resolve such ambiguities about 𝑈 and 𝑃 by
1. sorting eigenvalues and singular values in descending order
2. imposing positive diagonals on 𝑃 and 𝑈 and adjusting signs in 𝑉 𝑇 accordingly
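The following sketch (an illustration with simulated, de-meaned data; it assumes the eigenvalues are distinct, which holds generically) checks the two claims numerically: squared singular values equal the eigenvalues of $\Omega = XX^T$, and the left singular vectors agree with the eigenvectors of $\Omega$ up to sign.

# compare the SVD of X with the eigendecomposition of Ω = X X^T
X_demo = np.random.rand(3, 8)
X_demo = X_demo - X_demo.mean(axis=1, keepdims=True)   # de-mean each row

U, σ, VT = np.linalg.svd(X_demo, full_matrices=False)
λ, P = np.linalg.eigh(X_demo @ X_demo.T)
λ, P = λ[::-1], P[:, ::-1]                              # put in descending order

print(np.allclose(λ, σ**2))                 # eigenvalues = squared singular values
print(np.allclose(np.abs(P), np.abs(U)))    # eigenvectors match up to sign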

7.9 Connections

To pull things together, it is useful to assemble and compare some formulas presented above.
First, consider the following SVD of an 𝑚 × 𝑛 matrix:

𝑋 = 𝑈 Σ𝑉 𝑇


Compute:

$$\begin{aligned} XX^T &= U\Sigma V^T V \Sigma^T U^T \\ &\equiv U\Sigma\Sigma^T U^T \\ &\equiv U\Lambda U^T \end{aligned}$$

Thus, $U$ in the SVD is the matrix $P$ of eigenvectors of $XX^T$ and $\Sigma\Sigma^T$ is the matrix $\Lambda$ of eigenvalues.

Second, let's compute

$$\begin{aligned} X^TX &= V\Sigma^T U^T U\Sigma V^T \\ &= V\Sigma^T\Sigma V^T \end{aligned}$$

Thus, the matrix $V$ in the SVD is the matrix of eigenvectors of $X^TX$.


Summarizing and fitting things together, we have the eigen decomposition of the sample covariance matrix

𝑋𝑋 𝑇 = 𝑃 Λ𝑃 𝑇

where 𝑃 is an orthogonal matrix.


Further, from the SVD of 𝑋, we know that

𝑋𝑋 𝑇 = 𝑈 ΣΣ𝑇 𝑈 𝑇

where 𝑈 is an orthogonal matrix.


Thus, 𝑃 = 𝑈 and we have the representation of 𝑋

𝑋 = 𝑃 𝜖 = 𝑈 Σ𝑉 𝑇

It follows that

𝑈 𝑇 𝑋 = Σ𝑉 𝑇 = 𝜖

Note that the preceding implies that

𝜖𝜖𝑇 = Σ𝑉 𝑇 𝑉 Σ𝑇 = ΣΣ𝑇 = Λ,

so that everything fits together.


Below we define a class DecomAnalysis that wraps PCA and SVD for a given data matrix X.

class DecomAnalysis:
    """
    A class for conducting PCA and SVD.
    """

    def __init__(self, X, n_component=None):

        self.X = X

        self.Ω = (X @ X.T)

        self.m, self.n = X.shape
        self.r = LA.matrix_rank(X)

        if n_component:
            self.n_component = n_component
        else:
            self.n_component = self.m

    def pca(self):

        λ, P = LA.eigh(self.Ω)    # columns of P are eigenvectors

        ind = sorted(range(λ.size), key=lambda x: λ[x], reverse=True)

        # sort by eigenvalues
        self.λ = λ[ind]
        P = P[:, ind]
        self.P = P @ diag_sign(P)

        self.Λ = np.diag(self.λ)

        self.explained_ratio_pca = np.cumsum(self.λ) / self.λ.sum()

        # compute the N by T matrix of principal components
        self.ε = self.P.T @ self.X

        P = self.P[:, :self.n_component]
        ε = self.ε[:self.n_component, :]

        # transform data
        self.X_pca = P @ ε

    def svd(self):

        U, σ, VT = LA.svd(self.X)

        ind = sorted(range(σ.size), key=lambda x: σ[x], reverse=True)

        # sort by singular values
        d = min(self.m, self.n)

        self.σ = σ[ind]
        U = U[:, ind]
        D = diag_sign(U)
        self.U = U @ D
        VT[:d, :] = D @ VT[ind, :]
        self.VT = VT

        self.Σ = np.zeros((self.m, self.n))
        self.Σ[:d, :d] = np.diag(self.σ)

        σ_sq = self.σ ** 2
        self.explained_ratio_svd = np.cumsum(σ_sq) / σ_sq.sum()

        # slicing matrices by the number of components to use
        U = self.U[:, :self.n_component]
        Σ = self.Σ[:self.n_component, :self.n_component]
        VT = self.VT[:self.n_component, :]

        # transform data
        self.X_svd = U @ Σ @ VT

    def fit(self, n_component):

        # pca
        P = self.P[:, :n_component]
        ε = self.ε[:n_component, :]

        # transform data
        self.X_pca = P @ ε

        # svd
        U = self.U[:, :n_component]
        Σ = self.Σ[:n_component, :n_component]
        VT = self.VT[:n_component, :]

        # transform data
        self.X_svd = U @ Σ @ VT

def diag_sign(A):
    "Compute the signs of the diagonal of matrix A"

    D = np.diag(np.sign(np.diag(A)))

    return D

We also define a function that prints out information so that we can compare decompositions obtained by different algorithms.

def compare_pca_svd(da):
    """
    Compare the outcomes of PCA and SVD.
    """

    da.pca()
    da.svd()

    print('Eigenvalues and Singular values\n')
    print(f'λ = {da.λ}\n')
    print(f'σ^2 = {da.σ**2}\n')
    print('\n')

    # loading matrices
    fig, axs = plt.subplots(1, 2, figsize=(14, 5))
    plt.suptitle('loadings')
    axs[0].plot(da.P.T)
    axs[0].set_title('P')
    axs[0].set_xlabel('m')
    axs[1].plot(da.U.T)
    axs[1].set_title('U')
    axs[1].set_xlabel('m')
    plt.show()

    # principal components
    fig, axs = plt.subplots(1, 2, figsize=(14, 5))
    plt.suptitle('principal components')
    axs[0].plot(da.ε.T)
    axs[0].set_title('ε')
    axs[0].set_xlabel('n')
    axs[1].plot(da.VT[:da.r, :].T * np.sqrt(da.λ))
    axs[1].set_title('$V^T*\sqrt{\lambda}$')
    axs[1].set_xlabel('n')
    plt.show()

For an example of PCA applied to analyzing the structure of intelligence tests, see the lecture Multivariable Normal Distribution.
Look at the parts of that lecture that describe and illustrate the classic factor analysis model.

7.10 Dynamic Mode Decomposition (DMD)

We turn to the case in which 𝑚 >> 𝑛.


Here an 𝑚 × 𝑛 data matrix 𝑋̃ contains many more random variables 𝑚 than observations 𝑛.
This tall and skinny case is associated with Dynamic Mode Decomposition.
Dynamic mode decomposition was introduced by [Sch10].
You can read more about Dynamic Mode Decomposition here [KBBWP16] and here [BK19] (section 7.2).
We want to fit a first-order vector autoregression

𝑋𝑡+1 = 𝐴𝑋𝑡 + 𝐶𝜖𝑡+1 (7.3)

where 𝜖𝑡+1 is the time 𝑡 + 1 instance of an i.i.d. 𝑚 × 1 random vector with mean vector zero and identity covariance
matrix and
where the 𝑚 × 1 vector 𝑋𝑡 is
$$X_t = \begin{bmatrix} X_{1,t} & X_{2,t} & \cdots & X_{m,t} \end{bmatrix}^T \tag{7.4}$$

and where 𝑇 again denotes complex transposition and 𝑋𝑖,𝑡 is an observation on variable 𝑖 at time 𝑡.
We want to fit equation (7.3).
Our data are organized in an 𝑚 × (𝑛 + 1) matrix 𝑋̃

𝑋̃ = [𝑋1 ∣ 𝑋2 ∣ ⋯ ∣ 𝑋𝑛 ∣ 𝑋𝑛+1 ]

where for 𝑡 = 1, … , 𝑛 + 1, the 𝑚 × 1 vector 𝑋𝑡 is given by (7.4).


Thus, we want to estimate a system (7.3) that consists of 𝑚 least squares regressions of everything on one lagged value
of everything.
The 𝑖’th equation of (7.3) is a regression of 𝑋𝑖,𝑡+1 on the vector 𝑋𝑡 .
We proceed as follows.
From 𝑋,̃ we form two 𝑚 × 𝑛 matrices

𝑋 = [𝑋1 ∣ 𝑋2 ∣ ⋯ ∣ 𝑋𝑛 ]

and

𝑋 ′ = [𝑋2 ∣ 𝑋3 ∣ ⋯ ∣ 𝑋𝑛+1 ]


Here ′ does not indicate matrix transposition but instead is part of the name of the matrix 𝑋 ′ .
In forming 𝑋 and 𝑋 ′ , we have in each case dropped a column from 𝑋,̃ the last column in the case of 𝑋, and the first
column in the case of 𝑋 ′ .
Evidently, 𝑋 and 𝑋 ′ are both 𝑚 × 𝑛 matrices.
We denote the rank of 𝑋 as 𝑝 ≤ min(𝑚, 𝑛).
Two possible cases are
• 𝑛 >> 𝑚, so that we have many more time series observations 𝑛 than variables 𝑚
• 𝑚 >> 𝑛, so that we have many more variables 𝑚 than time series observations 𝑛
At a general level that includes both of these special cases, a common formula describes the least squares estimator 𝐴 ̂ of
𝐴 for both cases, but important details differ.
The common formula is

𝐴̂ = 𝑋′𝑋+ (7.5)

where 𝑋 + is the pseudo-inverse of 𝑋.


Formulas for the pseudo-inverse differ for our two cases.
When 𝑛 >> 𝑚, so that we have many more time series observations 𝑛 than variables 𝑚 and when 𝑋 has linearly
independent rows, 𝑋𝑋 𝑇 has an inverse and the pseudo-inverse 𝑋 + is

𝑋 + = 𝑋 𝑇 (𝑋𝑋 𝑇 )−1

Here 𝑋 + is a right-inverse that verifies 𝑋𝑋 + = 𝐼𝑚×𝑚 .


In this case, our formula (7.5) for the least-squares estimator of the population matrix of regression coefficients 𝐴 becomes

𝐴 ̂ = 𝑋 ′ 𝑋 𝑇 (𝑋𝑋 𝑇 )−1

This formula is widely used in economics to estimate vector autoregressions.

The right side is proportional to the empirical cross second moment matrix of $X_{t+1}$ and $X_t$ times the inverse of the second moment matrix of $X_t$.

This least-squares formula is widely used in econometrics.
Tall-Skinny Case:
When 𝑚 >> 𝑛, so that we have many more variables 𝑚 than time series observations 𝑛 and when 𝑋 has linearly
independent columns, 𝑋 𝑇 𝑋 has an inverse and the pseudo-inverse 𝑋 + is

𝑋 + = (𝑋 𝑇 𝑋)−1 𝑋 𝑇

Here 𝑋 + is a left-inverse that verifies 𝑋 + 𝑋 = 𝐼𝑛×𝑛 .


In this case, our formula (7.5) for a least-squares estimator of 𝐴 becomes

𝐴 ̂ = 𝑋 ′ (𝑋 𝑇 𝑋)−1 𝑋 𝑇 (7.6)

This is the case that we are interested in here.


If we use formula (7.6) to calculate $\hat A X$, we find that

$$\hat A X = X'$$

so that the regression equation fits perfectly, the usual outcome in an underdetermined least-squares model.
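The following sketch (a minimal illustration with made-up data; the names X_tilde, X, X_prime, A_hat are ours, not the lecture's) shows this perfect in-sample fit in a small $m >> n$ example.

# perfect fit of the VAR in the underdetermined (m >> n) case
m, n = 12, 5
X_tilde = np.random.rand(m, n + 1)
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

A_hat = X_prime @ np.linalg.pinv(X)        # A_hat = X' X^+
print(np.allclose(A_hat @ X, X_prime))     # expected: True -- the regression fits exactly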


Thus, we want to fit equation (7.3) in a situation in which we have a number 𝑛 of observations that is small relative to the
number 𝑚 of variables that appear in the vector 𝑋𝑡 .
To reiterate and offer an idea about how we can efficiently calculate the pseudo-inverse 𝑋 + , as our estimator 𝐴 ̂ of 𝐴 we
form an 𝑚 × 𝑚 matrix that solves the least-squares best-fit problem

$$\hat A = \operatorname{argmin}_{\check A} \| X' - \check A X \|_F \tag{7.7}$$

where $\| \cdot \|_F$ denotes the Frobenius norm of a matrix.


The minimizer of the right side of equation (7.7) is

𝐴̂ = 𝑋′𝑋+ (7.8)

where the (possibly huge) 𝑛 × 𝑚 matrix 𝑋 + = (𝑋 𝑇 𝑋)−1 𝑋 𝑇 is again a pseudo-inverse of 𝑋.


The 𝑖th row of 𝐴 ̂ is an 𝑚 × 1 vector of regression coefficients of 𝑋𝑖,𝑡+1 on 𝑋𝑗,𝑡 , 𝑗 = 1, … , 𝑚.
For some situations that we are interested in, 𝑋 𝑇 𝑋 can be close to singular, a situation that can make some numerical
algorithms be error-prone.
To confront that possibility, we’ll use efficient algorithms for computing and for constructing reduced rank approximations
of 𝐴 ̂ in formula (7.6).
An efficient way to compute the pseudo-inverse 𝑋 + is to start with a singular value decomposition

𝑋 = 𝑈 Σ𝑉 𝑇 (7.9)

We can use the singular value decomposition (7.9) efficiently to construct the pseudo-inverse 𝑋 + by recognizing the
following string of equalities.
$$\begin{aligned} X^+ &= (X^TX)^{-1}X^T \\ &= (V\Sigma U^T U \Sigma V^T)^{-1} V \Sigma U^T \\ &= (V\Sigma\Sigma V^T)^{-1} V\Sigma U^T \\ &= V\Sigma^{-1}\Sigma^{-1}V^T V \Sigma U^T \\ &= V \Sigma^{-1} U^T \end{aligned} \tag{7.10}$$

(Since we are in the 𝑚 >> 𝑛 case in which 𝑉 𝑇 𝑉 = 𝐼 in a reduced SVD, we can use the preceding string of equalities
for a reduced SVD as well as for a full SVD.)
Thus, we shall construct a pseudo-inverse 𝑋 + of 𝑋 by using a singular value decomposition of 𝑋 in equation (7.9) to
compute

𝑋 + = 𝑉 Σ−1 𝑈 𝑇 (7.11)

where the matrix Σ−1 is constructed by replacing each non-zero element of Σ with 𝜎𝑗−1 .

We can use formula (7.11) together with formula (7.8) to compute the matrix 𝐴 ̂ of regression coefficients.
Thus, our estimator 𝐴 ̂ = 𝑋 ′ 𝑋 + of the 𝑚 × 𝑚 matrix of coefficients 𝐴 is

𝐴 ̂ = 𝑋 ′ 𝑉 Σ−1 𝑈 𝑇 (7.12)

In addition to doing that, we’ll eventually use dynamic mode decomposition to compute a rank 𝑟 approximation to 𝐴,̂
where 𝑟 < 𝑝.
Remark: We described and illustrated a reduced singular value decomposition above, and compared it with a full
singular value decomposition. In our Python code, we’ll typically use a reduced SVD.
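Continuing with the simulated X, X', and A_hat from the sketch above (again a minimal illustration, not the authors' production code), we can form the pseudo-inverse from a reduced SVD as in equation (7.11) and confirm that it agrees with np.linalg.pinv and reproduces the same estimator $\hat A$.

# build X^+ = V Σ^{-1} U^T from a reduced SVD and compare with np.linalg.pinv
U, σ, VT = np.linalg.svd(X, full_matrices=False)
X_plus = VT.T @ np.diag(1 / σ) @ U.T
print(np.allclose(X_plus, np.linalg.pinv(X)))   # expected: True

A_hat_svd = X_prime @ X_plus                    # equation (7.12)
print(np.allclose(A_hat_svd, A_hat))            # same least-squares estimator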
Next, we describe alternative representations of our first-order linear dynamic system.


7.11 Representation 1

In this representation, we shall use a full SVD of 𝑋.


We use the 𝑚 columns of 𝑈 , and thus the 𝑚 rows of 𝑈 𝑇 , to define a 𝑚 × 1 vector 𝑏̃𝑡 as follows

𝑏̃𝑡 = 𝑈 𝑇 𝑋𝑡 (7.13)

and

𝑋𝑡 = 𝑈 𝑏̃𝑡 (7.14)

(Here we use the notation 𝑏 to remind ourselves that we are creating a basis vector.)
Since we are using a full SVD, 𝑈 𝑈 𝑇 is an 𝑚 × 𝑚 identity matrix.
So it follows from equation (7.13) that we can reconstruct $X_t$ from $\tilde b_t$ via equation (7.14).
• Equation (7.13) serves as an encoder that rotates the 𝑚 × 1 vector 𝑋𝑡 to become an 𝑚 × 1 vector 𝑏̃𝑡
• Equation (7.14) serves as a decoder that recovers the 𝑚 × 1 vector 𝑋𝑡 by rotating the 𝑚 × 1 vector 𝑏̃𝑡
Define a transition matrix for a rotated $m \times 1$ state $\tilde b_t$ by

$$\tilde A = U^T \hat A U \tag{7.15}$$

We can evidently recover $\hat A$ from

$$\hat A = U \tilde A U^T$$

Dynamics of the rotated $m \times 1$ state $\tilde b_t$ are governed by

$$\tilde b_{t+1} = \tilde A \tilde b_t$$

To construct forecasts $\overline X_t$ of future values of $X_t$ conditional on $X_1$, we can apply decoders (i.e., rotators) to both sides of this equation and deduce

$$\overline X_{t+1} = U \tilde A^t U^T X_1$$

where we use $\overline X_t$ to denote a forecast.

7.12 Representation 2

This representation is related to one originally proposed by [Sch10].


It can be regarded as an intermediate step to a related and perhaps more useful representation 3.
As with Representation 1, we continue to
• use a full SVD and not a reduced SVD
As we observed and illustrated earlier in this lecture, for a full SVD $U U^T$ and $U^T U$ are both identity matrices; but under a reduced SVD of $X$, $U U^T$ is not an identity matrix.

As we shall see, a full SVD is too confining for what we ultimately want to do, namely, to handle situations in which $U U^T$ is not an identity matrix because we use a reduced SVD of $X$.
But for now, let’s proceed under the assumption that both of the preceding two requirements are satisfied.


Form an eigendecomposition of the $m \times m$ matrix $\tilde A = U^T \hat A U$ defined in equation (7.15):

$$\tilde A = W \Lambda W^{-1} \tag{7.16}$$

where $\Lambda$ is a diagonal matrix of eigenvalues and $W$ is an $m \times m$ matrix whose columns are eigenvectors corresponding to rows (eigenvalues) in $\Lambda$.

When $U U^T = I_{m\times m}$, as is true with a full SVD of $X$, it follows that

$$\hat A = U \tilde A U^T = U W \Lambda W^{-1} U^T \tag{7.17}$$

Evidently, according to equation (7.17), the diagonal matrix Λ contains eigenvalues of 𝐴 ̂ and corresponding eigenvectors
of 𝐴 ̂ are columns of the matrix 𝑈 𝑊 .
Thus, the systematic (i.e., not random) parts of the 𝑋𝑡 dynamics captured by our first-order vector autoregressions are
described by

𝑋𝑡+1 = 𝑈 𝑊 Λ𝑊 −1 𝑈 𝑇 𝑋𝑡

Multiplying both sides of the above equation by 𝑊 −1 𝑈 𝑇 gives

𝑊 −1 𝑈 𝑇 𝑋𝑡+1 = Λ𝑊 −1 𝑈 𝑇 𝑋𝑡

or

𝑏̂𝑡+1 = Λ𝑏̂𝑡

where now our encoder is

𝑏̂𝑡 = 𝑊 −1 𝑈 𝑇 𝑋𝑡

and our decoder is

𝑋𝑡 = 𝑈 𝑊 𝑏̂𝑡

We can use this representation to construct a predictor 𝑋 𝑡+1 of 𝑋𝑡+1 conditional on 𝑋1 via:

𝑋 𝑡+1 = 𝑈 𝑊 Λ𝑡 𝑊 −1 𝑈 𝑇 𝑋1 (7.18)

In effect, [Sch10] defined an 𝑚 × 𝑚 matrix Φ𝑠 as

Φ𝑠 = 𝑈 𝑊 (7.19)

and represented equation (7.18) as

𝑋 𝑡+1 = Φ𝑠 Λ𝑡 Φ+
𝑠 𝑋1 (7.20)

Components of the basis vector $\hat b_t = W^{-1} U^T X_t \equiv \Phi_s^+ X_t$ are often called DMD modes, or sometimes also DMD projected modes.
We turn next to an alternative representation suggested by Tu et al. [TRL+14], one that is more appropriate to use when,
as in practice is typically the case, we use a reduced SVD.


7.13 Representation 3

Departing from the procedures used to construct Representations 1 and 2, each of which deployed a full SVD, we now
use a reduced SVD.
Again, we let 𝑝 ≤ min(𝑚, 𝑛) be the rank of 𝑋.
Construct a reduced SVD

𝑋 = 𝑈̃ Σ̃ 𝑉 ̃ 𝑇 ,

where now $\tilde U$ is $m \times p$, $\tilde\Sigma$ is $p \times p$, and $\tilde V^T$ is $p \times n$.


Our minimum-norm least-squares estimator approximator of 𝐴 now has representation

𝐴 ̂ = 𝑋 ′ 𝑉 ̃ Σ̃ −1 𝑈̃ 𝑇

Paralleling a step in Representation 1, define a transition matrix for a rotated $p \times 1$ state $\tilde b_t$ by

$$\tilde A = \tilde U^T \hat A \tilde U \tag{7.21}$$

Because we are now working with a reduced SVD, $\tilde U \tilde U^T \neq I$, so that $\hat A \neq \tilde U \tilde A \tilde U^T$ and we can't simply recover $\hat A$ from $\tilde A$ and $\tilde U$.
Nevertheless, hoping for the best, we persist and construct an eigendecomposition of what is now a 𝑝 × 𝑝 matrix 𝐴:̃

𝐴 ̃ = 𝑊 Λ𝑊 −1 (7.22)

Mimicking our procedure in Representation 2, we cross our fingers and compute the 𝑚 × 𝑝 matrix

Φ̃ 𝑠 = 𝑈̃ 𝑊 (7.23)

that corresponds to (7.19) for a full SVD.

At this point, it is interesting to compute $\hat A \tilde\Phi_s$:

$$\begin{aligned} \hat A \tilde\Phi_s &= (X' \tilde V \tilde\Sigma^{-1} \tilde U^T)(\tilde U W) \\ &= X' \tilde V \tilde\Sigma^{-1} W \\ &\neq (\tilde U W)\Lambda = \tilde\Phi_s \Lambda \end{aligned}$$

That $\hat A \tilde\Phi_s \neq \tilde\Phi_s \Lambda$ means that, unlike the corresponding situation in Representation 2, columns of $\tilde\Phi_s = \tilde U W$ are not eigenvectors of $\hat A$ corresponding to eigenvalues $\Lambda$.

But in a quest for eigenvectors of $\hat A$ that we can compute with a reduced SVD, let's define

$$\Phi \equiv \hat A \tilde\Phi_s = X' \tilde V \tilde\Sigma^{-1} W$$

It turns out that columns of Φ are eigenvectors of 𝐴,̂ a consequence of a result established by Tu et al. [TRL+14].
To present their result, for convenience we’ll drop the tilde ⋅ ̃ above 𝑈 , 𝑉 , and Σ and adopt the understanding that each of
them is computed with a reduced SVD.
Thus, we now use the notation that the 𝑚 × 𝑝 matrix Φ is defined as

Φ = 𝑋 ′ 𝑉 Σ−1 𝑊 (7.24)


Proposition: The $p$ columns of $\Phi$ are eigenvectors of $\hat A$.

Proof: From formula (7.24) we have

$$\begin{aligned} \hat A \Phi &= (X' V \Sigma^{-1} U^T)(X' V \Sigma^{-1} W) \\ &= X' V \Sigma^{-1} \tilde A W \\ &= X' V \Sigma^{-1} W \Lambda \\ &= \Phi \Lambda \end{aligned}$$

Thus, we have deduced that

$$\hat A \Phi = \Phi \Lambda \tag{7.25}$$

Let $\phi_i$ be the $i$th column of $\Phi$ and $\lambda_i$ be the corresponding $i$th eigenvalue of $\tilde A$ from decomposition (7.22).
Writing out the $m \times 1$ vectors on both sides of equation (7.25) and equating them gives

$$\hat A \phi_i = \lambda_i \phi_i .$$

Thus, $\phi_i$ is an eigenvector of $\hat A$ that corresponds to eigenvalue $\lambda_i$ of $\tilde A$.


This concludes the proof.
Also see [BK19] (p. 238)

7.13.1 Decoder of 𝑋 as linear projection

From eigendecomposition (7.25) we can represent 𝐴 ̂ as

𝐴 ̂ = ΦΛΦ+ . (7.26)

From formula (7.26) we can deduce the reduced dimension dynamics

𝑏̌𝑡+1 = Λ𝑏̌𝑡

where

𝑏̌𝑡 = Φ+ 𝑋𝑡 (7.27)

Since Φ has 𝑝 linearly independent columns, the generalized inverse of Φ is

Φ† = (Φ𝑇 Φ)−1 Φ𝑇

and so

𝑏̌ = (Φ𝑇 Φ)−1 Φ𝑇 𝑋 (7.28)

𝑏̌ is recognizable as the matrix of least squares regression coefficients of the matrix 𝑋 on the matrix Φ and

𝑋̌ = Φ𝑏̌

is the least squares projection of 𝑋 on Φ.


By virtue of least-squares projection theory discussed at https://python-advanced.quantecon.org/orth_proj.html, we can represent $X$ as the sum of the projection $\check X$ of $X$ on $\Phi$ plus a matrix of errors.


To verify this, note that the least squares projection 𝑋̌ is related to 𝑋 by

𝑋 = Φ 𝑏̌ + 𝜖

where $\epsilon$ is an $m \times n$ matrix of least squares errors satisfying the least squares orthogonality conditions $\epsilon^T \Phi = 0$ or

$$(X - \Phi \check b)^T \Phi = 0_{n \times p} \tag{7.29}$$

Rearranging the orthogonality conditions (7.29) gives $X^T \Phi = \check b^T \Phi^T \Phi$, which implies formula (7.28).

7.13.2 Alternative algorithm

There is a better way to compute the 𝑝 × 1 vector 𝑏̌𝑡 than provided by formula (7.27).
In particular, the following argument from [BK19] (page 240) provides a computationally efficient way to compute 𝑏̌𝑡 .
For convenience, we’ll do this first for time 𝑡 = 1.
For 𝑡 = 1, we have

𝑋1 = Φ𝑏̌1 (7.30)

where 𝑏̌1 is an 𝑟 × 1 vector.


Recall from representation 1 above that 𝑋1 = 𝑈 𝑏̃1 , where 𝑏̃1 is the time 1 basis vector for representation 1.
It then follows from equation (7.24) that

𝑈 𝑏̃1 = 𝑋 ′ 𝑉 Σ−1 𝑊 𝑏̌1

and consequently

𝑏̃1 = 𝑈 𝑇 𝑋 ′ 𝑉 Σ−1 𝑊 𝑏̌1

Recall that from equation (7.12), 𝐴 ̃ = 𝑈 𝑇 𝑋 ′ 𝑉 Σ−1 .


It then follows that

̃ 𝑏̌
𝑏̃1 = 𝐴𝑊 1

and therefore, by the eigendecomposition (7.16) of 𝐴,̃ we have

𝑏̃1 = 𝑊 Λ𝑏̌1

Consequently,

𝑏̌1 = (𝑊 Λ)−1 𝑏̃1

or

𝑏̌1 = (𝑊 Λ)−1 𝑈 𝑇 𝑋1 , (7.31)

which is computationally more efficient than the following instance of equation (7.27) for computing the initial vector 𝑏̌1 :

𝑏̌1 = Φ+ 𝑋1 (7.32)

Users of DMD sometimes call components of the basis vector 𝑏̌𝑡 = Φ+ 𝑋𝑡 ≡ (𝑊 Λ)−1 𝑈 𝑇 𝑋𝑡 the exact DMD modes.


Conditional on 𝑋𝑡 , we can compute our decoded 𝑋̌ 𝑡+𝑗 , 𝑗 = 1, 2, … from either

𝑋̌ 𝑡+𝑗 = ΦΛ𝑗 Φ+ 𝑋𝑡 (7.33)

or

𝑋̌ 𝑡+𝑗 = ΦΛ𝑗 (𝑊 Λ)−1 𝑈 𝑇 𝑋𝑡 . (7.34)

We can then use $\check X_{t+j}$ to forecast $X_{t+j}$.
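To tie Representation 3 together (a minimal sketch, continuing with the simulated X, X', X_tilde, and A_hat introduced above; it illustrates the formulas in this section rather than reproducing the authors' code), we can compute $\tilde A$, its eigendecomposition, the modes $\Phi$, verify the proposition $\hat A \Phi = \Phi \Lambda$, and produce a one-step forecast.

# a compact sketch of the DMD formulas in this section
U, σ, VT = np.linalg.svd(X, full_matrices=False)                 # reduced SVD of X
A_tilde = U.conj().T @ X_prime @ VT.conj().T @ np.diag(1 / σ)    # Ã = U^T X' V Σ^{-1}
Λ, W = np.linalg.eig(A_tilde)                                    # eigenvalues / eigenvectors of Ã
Φ = X_prime @ VT.conj().T @ np.diag(1 / σ) @ W                   # modes, equation (7.24)

print(np.allclose(A_hat @ Φ, Φ * Λ))                             # Â Φ = Φ Λ, equation (7.25)

b1 = np.linalg.inv(W @ np.diag(Λ)) @ U.conj().T @ X_tilde[:, 0]  # b̌_1 = (W Λ)^{-1} U^T X_1
X2_hat = Φ @ (Λ * b1)                                            # X̌_2 = Φ Λ b̌_1
print(np.abs(X2_hat - X_tilde[:, 1]).max())                      # ≈ 0: in-sample fit is exact here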

7.14 Using Fewer Modes

Some of the preceding formulas assume that we have retained all 𝑝 modes associated with the positive singular values of
𝑋.
We can adjust our formulas to describe a situation in which we instead retain only the 𝑟 < 𝑝 largest singular values.
In that case, we simply replace Σ with the appropriate 𝑟 × 𝑟 matrix of singular values, 𝑈 with the 𝑚 × 𝑟 matrix whose columns correspond to the 𝑟 largest singular values, and 𝑉 with the 𝑛 × 𝑟 matrix whose columns correspond to the 𝑟 largest singular values.
Counterparts of all of the salient formulas above then apply.

7.15 Source for Some Python Code

You can find a Python implementation of DMD here:


https://mathlab.github.io/PyDMD/



Part II

Elementary Statistics

CHAPTER

EIGHT

ELEMENTARY PROBABILITY WITH MATRICES

This lecture uses matrix algebra to illustrate some basic ideas about probability theory.
After providing somewhat informal definitions of the underlying objects, we’ll use matrices and vectors to describe
probability distributions.
Among the concepts that we'll be studying are
• a joint probability distribution
• marginal distributions associated with a given joint distribution
• conditional probability distributions
• statistical independence of two random variables
• joint distributions associated with a prescribed set of marginal distributions
– couplings
– copulas
• the probability distribution of a sum of two independent random variables
– convolution of marginal distributions
• parameters that define a probability distribution
• sufficient statistics as data summaries
We'll use a matrix to represent a bivariate probability distribution and a vector to represent a univariate probability distribution.
As usual, we’ll start with some imports

# !pip install prettytable

import numpy as np
import matplotlib.pyplot as plt
import prettytable as pt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('retina')
%matplotlib inline



8.1 Sketch of Basic Concepts

We’ll briefly define what we mean by a probability space, a probability measure, and a random variable.
For most of this lecture, we sweep these objects into the background, but they are there underlying the other objects that
we’ll mainly focus on.
Let Ω be a set of possible underlying outcomes and let 𝜔 ∈ Ω be a particular underlying outcome.
Let 𝒢 ⊂ Ω be a subset of Ω.
Let ℱ be a collection of such subsets 𝒢 ⊂ Ω.
The pair Ω, ℱ forms our probability space on which we want to put a probability measure.
A probability measure 𝜇 maps a set of possible underlying outcomes 𝒢 ∈ ℱ into a scalar number between 0 and 1
• this is the “probability” that 𝑋 belongs to 𝐴, denoted by Prob{𝑋 ∈ 𝐴}.
A random variable 𝑋(𝜔) is a function of the underlying outcome 𝜔 ∈ Ω.
The random variable 𝑋(𝜔) has a probability distribution that is induced by the underlying probability measure 𝜇 and
the function 𝑋(𝜔):

$$\textrm{Prob}(X \in A) = \int_{\mathcal{G}} \mu(\omega) d\omega \tag{8.1}$$

where 𝒢 is the subset of Ω for which 𝑋(𝜔) ∈ 𝐴.


We call this the induced probability distribution of random variable 𝑋.

8.2 Digression: What Does Probability Mean?

Before diving in, we’ll say a few words about what probability theory means and how it connects to statistics.
These are topics that are also touched on in the quantecon lectures https://python.quantecon.org/prob_meaning.html and https://python.quantecon.org/navy_captain.html.
For much of this lecture we’ll be discussing fixed “population” probabilities.
These are purely mathematical objects.
To appreciate how statisticians connect probabilities to data, the key is to understand the following concepts:
• A single draw from a probability distribution
• Repeated independently and identically distributed (i.i.d.) draws of “samples” or “realizations” from the same
probability distribution
• A statistic defined as a function of a sequence of samples
• An empirical distribution or histogram (a binned empirical distribution) that records observed relative frequencies
• The idea that a population probability distribution is what we anticipate relative frequencies will be in a long
sequence of i.i.d. draws. Here the following mathematical machinery makes precise what is meant by anticipated
relative frequencies
– Law of Large Numbers (LLN)
– Central Limit Theorem (CLT)


Scalar example
Consider the following discrete distribution
$$X \sim \{f_i\}_{i=0}^{I-1}, \quad f_i \geqslant 0, \quad \sum_i f_i = 1$$

Draw a sample $x_0, x_1, \ldots, x_{N-1}$, i.e., $N$ draws of $X$ from $\{f_i\}_{i=0}^{I-1}$.


What do “identical” and “independent” mean in IID or iid (“identically and independently distributed”)?
• “identical” means that each draw is from the same distribution.
• “independent” means that the joint distribution equals the product of the marginal distributions, i.e.,

$$\begin{aligned} \textrm{Prob}\{x_0 = i_0, x_1 = i_1, \ldots, x_{N-1} = i_{N-1}\} &= \textrm{Prob}\{x_0 = i_0\} \cdot \cdots \cdot \textrm{Prob}\{x_{N-1} = i_{N-1}\} \\ &= f_{i_0} f_{i_1} \cdot \cdots \cdot f_{i_{N-1}} \end{aligned}$$
Consider the empirical distribution:

$$\begin{aligned} i &= 0, \ldots, I-1, \\ N_i &= \text{number of times } X = i, \\ N &= \sum_{i=0}^{I-1} N_i \quad \text{total number of draws}, \\ \tilde f_i &= \frac{N_i}{N} \sim \text{frequency of draws for which } X = i \end{aligned}$$
Key ideas that justify connecting probability theory with statistics are laws of large numbers and central limit theorems
LLN:
• A Law of Large Numbers (LLN) states that 𝑓𝑖̃ → 𝑓𝑖 as 𝑁 → ∞
CLT:
• A Central Limit Theorem (CLT) describes a rate at which 𝑓𝑖̃ → 𝑓𝑖
Remarks
• For “frequentist” statisticians, anticipated relative frequency is all that a probability distribution means.
• But for a Bayesian it means something more or different.

8.3 Representing Probability Distributions

A probability distribution Prob(𝑋 ∈ 𝐴) can be described by its cumulative distribution function (CDF)

𝐹𝑋 (𝑥) = Prob{𝑋 ≤ 𝑥}.

Sometimes, but not always, a random variable can also be described by a density function $f(x)$ that is related to its CDF by

$$\textrm{Prob}\{X \in B\} = \int_{t\in B} f(t) dt$$

$$F(x) = \int_{-\infty}^{x} f(t) dt$$
Here 𝐵 is a set of possible 𝑋’s whose probability we want to compute.
When a probability density exists, a probability distribution can be characterized either by its CDF or by its density.
For a discrete-valued random variable


• the number of possible values of 𝑋 is finite or countably infinite


• we replace a density with a probability mass function, a non-negative sequence that sums to one
• we replace integration with summation in the formula like (8.1) that relates a CDF to a probability mass function
In this lecture, we mostly discuss discrete random variables.
Doing this enables us to confine our tool set basically to linear algebra.
Later we’ll briefly discuss how to approximate a continuous random variable with a discrete random variable.

8.4 Univariate Probability Distributions

We’ll devote most of this lecture to discrete-valued random variables, but we’ll say a few things about continuous-valued
random variables.

8.4.1 Discrete random variable

Let $X$ be a discrete random variable that takes possible values $i = 0, 1, \ldots, I-1 = \bar X$.


Here, we choose the maximum index 𝐼 − 1 because of how this aligns nicely with Python’s index convention.
Define $f_i \equiv \textrm{Prob}\{X = i\}$ and assemble the non-negative vector

$$f = \begin{bmatrix} f_0 \\ f_1 \\ \vdots \\ f_{I-1} \end{bmatrix} \tag{8.2}$$

for which $f_i \in [0, 1]$ for each $i$ and $\sum_{i=0}^{I-1} f_i = 1$.
This vector defines a probability mass function.
The distribution (8.2) has parameters $\{f_i\}_{i=0,1,\cdots,I-2}$ since $f_{I-1} = 1 - \sum_{i=0}^{I-2} f_i$.
These parameters pin down the shape of the distribution.
(Sometimes 𝐼 = ∞.)
Such a “non-parametric” distribution has as many “parameters” as there are possible values of the random variable.
We often work with special distributions that are characterized by a small number of parameters.
In these special parametric distributions,

𝑓𝑖 = 𝑔(𝑖; 𝜃)

where 𝜃 is a vector of parameters that is of much smaller dimension than 𝐼.


Remarks:
• The concept of parameter is intimately related to the notion of sufficient statistic.
• Sufficient statistics are nonlinear functions of a data set.
• Sufficient statistics are designed to summarize all information about the parameters that is contained in the big
data set.
• They are important tools that AI uses to reduce the size of a big data set


• R. A. Fisher provided a sharp definition of information – see https://en.wikipedia.org/wiki/Fisher_information


An example of a parametric probability distribution is a geometric distribution.
It is described by

𝑓𝑖 = Prob{𝑋 = 𝑖} = (1 − 𝜆)𝜆𝑖 , 𝜆 ∈ [0, 1], 𝑖 = 0, 1, 2, …



Evidently, $\sum_{i=0}^{\infty} f_i = 1$.
Let $\theta$ be a vector of parameters of the distribution described by $f$, then

$$f_i(\theta) \geq 0, \quad \sum_{i=0}^{\infty} f_i(\theta) = 1$$
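As a quick numerical check (a minimal sketch; the truncation point for the infinite sum is an arbitrary choice of ours), we can verify that the geometric probabilities $(1-\lambda)\lambda^i$ sum to one and compute the implied mean.

# check the geometric distribution numerically (truncate the infinite sum)
λ, I_max = 0.5, 200
i = np.arange(I_max)
f = (1 - λ) * λ ** i
print(f.sum())                        # ≈ 1
print((i * f).sum(), λ / (1 - λ))     # mean ≈ λ / (1 - λ)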

8.4.2 Continuous random variable

Let $X$ be a continuous random variable that takes values $X \in \tilde X \equiv [X_L, X_U]$ whose distribution has parameters $\theta$.

$$\textrm{Prob}\{X \in A\} = \int_{x\in A} f(x; \theta) \, dx; \quad f(x; \theta) \geq 0$$

where $A$ is a subset of $\tilde X$ and

$$\textrm{Prob}\{X \in \tilde X\} = 1$$

8.5 Bivariate Probability Distributions

We’ll now discuss a bivariate joint distribution.


To begin, we restrict ourselves to two discrete random variables.
Let 𝑋, 𝑌 be two discrete random variables that take values:

$$X \in \{0, \ldots, I-1\}$$

$$Y \in \{0, \ldots, J-1\}$$
Then their joint distribution is described by a matrix

$$F_{I\times J} = [f_{ij}]_{i\in\{0,\ldots,I-1\}, \, j\in\{0,\ldots,J-1\}}$$

whose elements are

𝑓𝑖𝑗 = Prob{𝑋 = 𝑖, 𝑌 = 𝑗} ≥ 0

where

∑ ∑ 𝑓𝑖𝑗 = 1
𝑖 𝑗


8.6 Marginal Probability Distributions

The joint distribution induces marginal distributions

$$\textrm{Prob}\{X = i\} = \sum_{j=0}^{J-1} f_{ij} = \mu_i, \quad i = 0, \ldots, I-1$$

$$\textrm{Prob}\{Y = j\} = \sum_{i=0}^{I-1} f_{ij} = \nu_j, \quad j = 0, \ldots, J-1$$
For example, let the joint distribution over $(X, Y)$ be

$$F = \begin{bmatrix} .25 & .1 \\ .15 & .5 \end{bmatrix} \tag{8.3}$$
Then marginal distributions are:
Prob{𝑋 = 0} = .25 + .1 = .35
Prob{𝑋 = 1} = .15 + .5 = .65
Prob{𝑌 = 0} = .25 + .15 = .4
Prob{𝑌 = 1} = .1 + .5 = .6
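The same marginal calculations can be done with NumPy by summing the joint matrix (8.3) across rows and columns (a minimal sketch of ours; rows index $X$ and columns index $Y$).

# marginal distributions from the joint distribution (8.3)
F = np.array([[0.25, 0.1],
              [0.15, 0.5]])
print(F.sum(axis=1))   # marginal of X: [0.35, 0.65]
print(F.sum(axis=0))   # marginal of Y: [0.4, 0.6]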
Digression: If two random variables 𝑋, 𝑌 are continuous and have joint density 𝑓(𝑥, 𝑦), then marginal distributions can
be computed by

$$f(x) = \int f(x, y) dy$$

$$f(y) = \int f(x, y) dx$$


8.7 Conditional Probability Distributions

Conditional probabilities are defined according to

$$\textrm{Prob}\{A \mid B\} = \frac{\textrm{Prob}\{A \cap B\}}{\textrm{Prob}\{B\}}$$
where 𝐴, 𝐵 are two events.
For a pair of discrete random variables, we have the conditional distribution

$$\textrm{Prob}\{X = i \mid Y = j\} = \frac{f_{ij}}{\sum_i f_{ij}} = \frac{\textrm{Prob}\{X = i, Y = j\}}{\textrm{Prob}\{Y = j\}}$$
where 𝑖 = 0, … , 𝐼 − 1, 𝑗 = 0, … , 𝐽 − 1.
Note that

$$\sum_i \textrm{Prob}\{X = i \mid Y = j\} = \frac{\sum_i f_{ij}}{\sum_i f_{ij}} = 1$$
Remark: The mathematics of conditional probability implies Bayes' Law:

$$\textrm{Prob}\{X = i \mid Y = j\} = \frac{\textrm{Prob}\{X = i, Y = j\}}{\textrm{Prob}\{Y = j\}} = \frac{\textrm{Prob}\{Y = j \mid X = i\}\,\textrm{Prob}\{X = i\}}{\textrm{Prob}\{Y = j\}}$$
For the joint distribution (8.3)

$$\textrm{Prob}\{X = 0 \mid Y = 1\} = \frac{.1}{.1 + .5} = \frac{.1}{.6}$$
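The same conditional probability can be computed by normalizing the columns of the joint matrix (a minimal sketch, reusing the array F defined in the snippet above).

# conditional distribution Prob{X = i | Y = j}: normalize each column of F
cond_X_given_Y = F / F.sum(axis=0)
print(cond_X_given_Y[0, 1])   # Prob{X = 0 | Y = 1} = 0.1 / 0.6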


8.8 Statistical Independence

Random variables $X$ and $Y$ are statistically independent if

$$\textrm{Prob}\{X = i, Y = j\} = f_i g_j$$

where

$$\textrm{Prob}\{X = i\} = f_i \ge 0, \quad \sum_i f_i = 1$$
$$\textrm{Prob}\{Y = j\} = g_j \ge 0, \quad \sum_j g_j = 1$$

Conditional distributions are

$$\textrm{Prob}\{X = i \mid Y = j\} = \frac{f_i g_j}{\sum_i f_i g_j} = \frac{f_i g_j}{g_j} = f_i$$
$$\textrm{Prob}\{Y = j \mid X = i\} = \frac{f_i g_j}{\sum_j f_i g_j} = \frac{f_i g_j}{f_i} = g_j$$
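Under independence the joint matrix is the outer product of the two marginals, which is easy to check numerically. A minimal sketch (with made-up marginals f and g for illustration):

import numpy as np

f = np.array([0.2, 0.8])          # marginal of X (illustrative values)
g = np.array([0.3, 0.5, 0.2])     # marginal of Y (illustrative values)

F = np.outer(f, g)                # joint under independence: f_i * g_j

# each conditional distribution of X given Y = j equals the marginal f
cond = F / F.sum(axis=0, keepdims=True)
print(np.allclose(cond, f[:, None]))   # True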

8.9 Means and Variances

The mean and variance of a discrete random variable 𝑋 are


$$\mu_X \equiv \mathbb{E}[X] = \sum_k k \, \textrm{Prob}\{X = k\}$$
$$\sigma_X^2 \equiv \mathbb{D}[X] = \sum_k (k - \mathbb{E}[X])^2 \, \textrm{Prob}\{X = k\}$$

A continuous random variable having density $f_X(x)$ has mean and variance

$$\mu_X \equiv \mathbb{E}[X] = \int_{-\infty}^{\infty} x f_X(x)\, dx$$
$$\sigma_X^2 \equiv \mathbb{D}[X] = \mathbb{E}[(X - \mu_X)^2] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\, dx$$
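For a discrete distribution these population moments are one-line dot products. A minimal sketch (illustrative values only):

import numpy as np

k = np.array([0, 1, 2, 3])            # support
p = np.array([0.1, 0.4, 0.3, 0.2])    # probabilities (sum to 1)

μ = k @ p                             # mean: Σ k Prob{X = k}
σ2 = ((k - μ)**2) @ p                 # variance: Σ (k - μ)² Prob{X = k}
print(μ, σ2)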

8.10 Classic Trick for Generating Random Numbers

Suppose we have at our disposal a pseudo-random number generator that draws a uniform random variable, i.e., one with probability distribution

$$\textrm{Prob}\{\tilde{X} = i\} = \frac{1}{I}, \quad i = 0, \ldots, I-1$$

How can we transform $\tilde{X}$ to get a random variable $X$ for which $\textrm{Prob}\{X = i\} = f_i$, $i = 0, \ldots, I-1$, where $f_i$ is an arbitrary discrete probability distribution on $i = 0, 1, \ldots, I-1$?
The key tool is the inverse of a cumulative distribution function (CDF).
Observe that the CDF of a distribution is monotone and non-decreasing, taking values between 0 and 1.
We can draw a sample of a random variable 𝑋 with a known CDF as follows:
• draw a random variable 𝑢 from a uniform distribution on [0, 1]
• pass the sample value of 𝑢 into the “inverse” target CDF for 𝑋


• 𝑋 has the target CDF


Thus, knowing the “inverse” CDF of a distribution is enough to simulate from this distribution.
NOTE: The “inverse” CDF needs to exist for this method to work.
The inverse CDF is

$$F^{-1}(u) \equiv \inf\{x \in \mathbb{R} : F(x) \ge u\} \qquad (0 < u < 1)$$

Here we use infimum because a CDF is a non-decreasing and right-continuous function.


Thus, suppose that
• 𝑈 is a uniform random variable 𝑈 ∈ [0, 1]
• We want to sample a random variable 𝑋 whose CDF is 𝐹 .
It turns out that if we draw uniform random numbers $U$ and then compute $X$ from

$$X = F^{-1}(U),$$

then $X$ is a random variable with CDF $F_X(x) = F(x) = \textrm{Prob}\{X \le x\}$.


We’ll verify this in the special case in which $F$ is continuous and bijective so that its inverse function exists and can be denoted by $F^{-1}$.
Note that
𝐹𝑋 (𝑥) = Prob {𝑋 ≤ 𝑥}
= Prob {𝐹 −1 (𝑈 ) ≤ 𝑥}
= Prob {𝑈 ≤ 𝐹 (𝑥)}
= 𝐹 (𝑥)

where the last equality occurs because 𝑈 is distributed uniformly on [0, 1] while 𝐹 (𝑥) is a constant given 𝑥 that also lies
on [0, 1].
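For an arbitrary discrete distribution the same idea amounts to searching the cumulative sum of the probabilities. The sketch below (a minimal illustration, not part of the lecture's own code) builds such a sampler with np.searchsorted:

import numpy as np

def draw_discrete(f, n):
    """Draw n samples from a discrete pmf f on {0, 1, ..., I-1} via the inverse CDF."""
    cdf = np.cumsum(f)                       # F_i = f_0 + ... + f_i
    u = np.random.rand(n)                    # uniform draws on [0, 1]
    return np.searchsorted(cdf, u)           # smallest i with F_i >= u

f = np.array([0.2, 0.5, 0.3])                # illustrative pmf
samples = draw_discrete(f, 1_000_000)
print(np.bincount(samples) / len(samples))   # ≈ [0.2, 0.5, 0.3]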
Let’s use numpy to compute some examples.
Example: A continuous geometric (exponential) distribution
Let 𝑋 follow a geometric distribution, with parameter 𝜆 > 0.
Its density function is

$$f(x) = \lambda e^{-\lambda x}$$

Its CDF is

$$F(x) = \int_0^x \lambda e^{-\lambda v}\, dv = 1 - e^{-\lambda x}$$

Let 𝑈 follow a uniform distribution on [0, 1].


𝑋 is a random variable such that 𝑈 = 𝐹 (𝑋).
The distribution of $X$ can be deduced from

$$
\begin{aligned}
U = F(X) &= 1 - e^{-\lambda X} \\
\implies 1 - U &= e^{-\lambda X} \\
\implies \log(1 - U) &= -\lambda X \\
\implies X &= \frac{\log(1 - U)}{-\lambda}
\end{aligned}
$$


Let's draw $u$ from $U[0, 1]$ and calculate $x = \frac{\log(1-u)}{-\lambda}$.
We’ll check whether 𝑋 seems to follow a continuous geometric (exponential) distribution.
Let’s check with numpy.

n, λ = 1_000_000, 0.3

# draw uniform numbers


u = np.random.rand(n)

# transform
x = -np.log(1-u)/λ

# draw from the exponential distribution directly, for comparison

x_g = np.random.exponential(1 / λ, n)

# plot and compare


plt.hist(x, bins=100, density=True)
plt.show()

plt.hist(x_g, bins=100, density=True, alpha=0.6)


plt.show()


Geometric distribution
Let $X$ be distributed geometrically, that is

$$\textrm{Prob}(X = i) = (1 - \lambda)\lambda^i, \quad \lambda \in (0, 1), \quad i = 0, 1, \ldots$$

$$\sum_{i=0}^{\infty} \textrm{Prob}(X = i) = 1 \longleftrightarrow (1 - \lambda)\sum_{i=0}^{\infty} \lambda^i = \frac{1 - \lambda}{1 - \lambda} = 1$$

Its CDF is given by


$$
\begin{aligned}
\textrm{Prob}(X \le i) &= (1 - \lambda)\sum_{j=0}^{i} \lambda^j \\
&= (1 - \lambda)\left[\frac{1 - \lambda^{i+1}}{1 - \lambda}\right] \\
&= 1 - \lambda^{i+1} \\
&= F(X) = F_i
\end{aligned}
$$

Again, let 𝑈̃ follow a uniform distribution and we want to find 𝑋 such that 𝐹 (𝑋) = 𝑈̃ .
Let’s deduce the distribution of 𝑋 from

$$
\begin{aligned}
\tilde{U} = F(X) &= 1 - \lambda^{x+1} \\
1 - \tilde{U} &= \lambda^{x+1} \\
\log(1 - \tilde{U}) &= (x + 1)\log\lambda \\
\frac{\log(1 - \tilde{U})}{\log\lambda} &= x + 1 \\
\frac{\log(1 - \tilde{U})}{\log\lambda} - 1 &= x
\end{aligned}
$$

However, $x = \frac{\log(1 - \tilde{U})}{\log\lambda} - 1$ will generally not be an integer, whereas the geometric random variable takes integer values $x \ge 0$.


So let

$$x = \left\lceil \frac{\log(1 - \tilde{U})}{\log\lambda} - 1 \right\rceil$$

where ⌈.⌉ is the ceiling function.


Thus 𝑥 is the smallest integer such that the discrete geometric CDF is greater than or equal to 𝑈̃ .
We can verify that 𝑥 is indeed geometrically distributed by the following numpy program.
Note: The exponential distribution is the continuous analog of the geometric distribution.

n, λ = 1_000_000, 0.8

# draw uniform numbers


u = np.random.rand(n)

# transform
x = np.ceil(np.log(1-u)/np.log(λ) - 1)

# draw geometric distributions


x_g = np.random.geometric(1-λ, n)

# plot and compare


plt.hist(x, bins=150, density=True)
plt.show()

np.random.geometric(1-λ, n).max()

56

np.log(0.4)/np.log(0.3)


0.7610560044063083

plt.hist(x_g, bins=150, density=True, alpha=0.6)


plt.show()

8.11 Some Discrete Probability Distributions

Let’s write some Python code to compute means and variances of some univariate random variables.
We’ll use our code to
• compute population means and variances from the probability distribution
• generate a sample of 𝑁 independently and identically distributed draws and compute sample means and variances
• compare population and sample means and variances

8.12 Geometric distribution

$$\textrm{Prob}(X = k) = (1 - p)^{k-1} p, \quad k = 1, 2, \ldots$$

$$\mathbb{E}(X) = \frac{1}{p}, \qquad \mathbb{D}(X) = \frac{1 - p}{p^2}$$
We draw observations from the distribution and compare the sample mean and variance with the theoretical results.


# specify parameters
p, n = 0.3, 1_000_000

# draw observations from the distribution


x = np.random.geometric(p, n)

# compute sample mean and variance


μ_hat = np.mean(x)
σ2_hat = np.var(x)

print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat)

# compare with theoretical results


print("\nThe population mean is: ", 1/p)
print("The population variance is: ", (1-p)/(p**2))

The sample mean is: 3.333938


The sample variance is: 7.759151412156005

The population mean is: 3.3333333333333335


The population variance is: 7.777777777777778

8.12.1 Newcomb–Benford distribution

The Newcomb–Benford law fits many data sets, e.g., reports of incomes to tax authorities, in which the leading digit is
more likely to be small than large.
See https://en.wikipedia.org/wiki/Benford%27s_law
A Benford probability distribution is
$$\textrm{Prob}\{X = d\} = \log_{10}(d + 1) - \log_{10}(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$
where 𝑑 ∈ {1, 2, ⋯ , 9} can be thought of as a first digit in a sequence of digits.
This is a well defined discrete distribution since we can verify that probabilities are nonnegative and sum to 1.
$$\log_{10}\left(1 + \frac{1}{d}\right) \ge 0, \qquad \sum_{d=1}^{9}\log_{10}\left(1 + \frac{1}{d}\right) = 1$$

The mean and variance of a Benford distribution are


$$\mathbb{E}[X] = \sum_{d=1}^{9} d \log_{10}\left(1 + \frac{1}{d}\right) \simeq 3.4402$$
$$\mathbb{V}[X] = \sum_{d=1}^{9} (d - \mathbb{E}[X])^2 \log_{10}\left(1 + \frac{1}{d}\right) \simeq 6.0565$$

We verify the above and compute the mean and variance using numpy.

Benford_pmf = np.array([np.log10(1+1/d) for d in range(1,10)])


k = np.array(range(1,10))

# mean


mean = np.sum(Benford_pmf * k)

# variance
var = np.sum([(k-mean)**2 * Benford_pmf])

# verify sum to 1
print(np.sum(Benford_pmf))
print(mean)
print(var)

0.9999999999999999
3.440236967123206
6.056512631375667

# plot distribution
plt.plot(range(1,10), Benford_pmf, 'o')
plt.title('Benford\'s distribution')
plt.show()


8.12.2 Pascal (negative binomial) distribution

Consider a sequence of independent Bernoulli trials.


Let 𝑝 be the probability of success.
Let 𝑋 be a random variable that represents the number of failures before we get 𝑟 success.
Its distribution is
$$X \sim NB(r, p)$$
$$\textrm{Prob}(X = k; r, p) = \binom{k + r - 1}{r - 1} p^r (1 - p)^k$$
Here, we choose from among 𝑘 + 𝑟 − 1 possible outcomes because the last draw is by definition a success.
We compute the mean and variance to be

$$\mathbb{E}(X) = \frac{r(1 - p)}{p}$$
$$\mathbb{V}(X) = \frac{r(1 - p)}{p^2}$$
# specify parameters
r, p, n = 10, 0.3, 1_000_000

# draw observations from the distribution


x = np.random.negative_binomial(r, p, n)

# compute sample mean and variance


μ_hat = np.mean(x)
σ2_hat = np.var(x)

print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat)
print("\nThe population mean is: ", r*(1-p)/p)
print("The population variance is: ", r*(1-p)/p**2)

The sample mean is: 23.33375


The sample variance is: 77.91651893750002

The population mean is: 23.333333333333336


The population variance is: 77.77777777777779

8.13 Continuous Random Variables

8.13.1 Univariate Gaussian distribution

We write

𝑋 ∼ 𝑁 (𝜇, 𝜎2 )

to indicate the probability distribution


$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}$$

In the example below, we set $\mu = 0$, $\sigma = 0.1$.


# specify parameters
μ, σ = 0, 0.1

# specify number of draws


n = 1_000_000

# draw observations from the distribution


x = np.random.normal(μ, σ, n)

# compute sample mean and variance


μ_hat = np.mean(x)
σ_hat = np.std(x)

print("The sample mean is: ", μ_hat)


print("The sample standard deviation is: ", σ_hat)

The sample mean is: -2.6699866495693146e-06


The sample standard deviation is: 0.09988310440282286

# compare absolute deviations with a small tolerance
print(np.abs(μ - μ_hat) < 1e-3)
print(np.abs(σ - σ_hat) < 1e-3)

True
True

8.13.2 Uniform Distribution

$$X \sim U[a, b]$$
$$f(x) = \begin{cases} \frac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

The population mean and variance are

$$\mathbb{E}(X) = \frac{a + b}{2}, \qquad \mathbb{V}(X) = \frac{(b - a)^2}{12}$$
# specify parameters
a, b = 10, 20

# specify number of draws


n = 1_000_000

# draw observations from the distribution


x = a + (b-a)*np.random.rand(n)

# compute sample mean and variance


μ_hat = np.mean(x)
σ2_hat = np.var(x)



print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat)
print("\nThe population mean is: ", (a+b)/2)
print("The population variance is: ", (b-a)**2/12)

The sample mean is: 15.00222274370156


The sample variance is: 8.339607328148443

The population mean is: 15.0


The population variance is: 8.333333333333334

8.14 A Mixed Discrete-Continuous Distribution

We’ll motivate this example with a little story.


Suppose that to apply for a job you take an interview and either pass or fail it.
You have a 5% chance of passing the interview, and if you pass, your daily salary will be uniformly distributed on the interval 300~400.
We can describe your daily salary as a discrete-continuous variable with the following probabilities:

$$P(X = 0) = 0.95$$
$$P(300 \le X \le 400) = \int_{300}^{400} f(x)\, dx = 0.05, \qquad f(x) = 0.0005$$
Let’s start by generating a random sample and computing sample moments.

x = np.random.rand(1_000_000)
# x[x > 0.95] = 100*x[x > 0.95]+300
x[x > 0.95] = 100*np.random.rand(len(x[x > 0.95]))+300
x[x <= 0.95] = 0

μ_hat = np.mean(x)
σ2_hat = np.var(x)

print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat)

The sample mean is: 17.548232806538643


The sample variance is: 5877.121811309432

The analytical mean and variance can be computed:


$$
\begin{aligned}
\mu &= \int_{300}^{400} x f(x)\, dx \\
&= 0.0005 \int_{300}^{400} x\, dx \\
&= 0.0005 \times \frac{1}{2} x^2 \Big|_{300}^{400}
\end{aligned}
$$

$$
\begin{aligned}
\sigma^2 &= 0.95 \times (0 - 17.5)^2 + \int_{300}^{400} (x - 17.5)^2 f(x)\, dx \\
&= 0.95 \times 17.5^2 + 0.0005 \int_{300}^{400} (x - 17.5)^2\, dx \\
&= 0.95 \times 17.5^2 + 0.0005 \times \frac{1}{3}(x - 17.5)^3 \Big|_{300}^{400}
\end{aligned}
$$

mean = 0.0005*0.5*(400**2 - 300**2)


var = 0.95*17.5**2+0.0005/3*((400-17.5)**3-(300-17.5)**3)
print("mean: ", mean)
print("variance: ", var)

mean: 17.5
variance: 5860.416666666666

8.15 Matrix Representation of Some Bivariate Distributions

Let’s use matrices to represent a joint distribution, conditional distribution, marginal distribution, and the mean and
variance of a bivariate random variable.
The table below illustrates a probability distribution for a bivariate random variable.

$$F = [f_{ij}] = \begin{bmatrix} 0.3 & 0.2 \\ 0.1 & 0.4 \end{bmatrix}$$

Marginal distributions are

$$\textrm{Prob}(X = i) = \sum_j f_{ij} = u_i$$
$$\textrm{Prob}(Y = j) = \sum_i f_{ij} = v_j$$

Below we draw some samples and confirm that the “sampling” distribution agrees well with the “population” distribution.
Sample results:

# specify parameters
xs = np.array([0, 1])
ys = np.array([10, 20])
f = np.array([[0.3, 0.2], [0.1, 0.4]])
f_cum = np.cumsum(f)

# draw random numbers


p = np.random.rand(1_000_000)
x = np.vstack([xs[1]*np.ones(p.shape), ys[1]*np.ones(p.shape)])
# map to the bivariate distribution

x[0, p < f_cum[2]] = xs[1]


x[1, p < f_cum[2]] = ys[0]

x[0, p < f_cum[1]] = xs[0]


x[1, p < f_cum[1]] = ys[1]



x[0, p < f_cum[0]] = xs[0]
x[1, p < f_cum[0]] = ys[0]
print(x)

[[ 0. 0. 0. ... 0. 1. 0.]
[10. 20. 10. ... 10. 20. 10.]]

Here, we use exactly the inverse CDF technique to generate a sample from the joint distribution $F$.

# marginal distribution
xp = np.sum(x[0, :] == xs[0])/1_000_000
yp = np.sum(x[1, :] == ys[0])/1_000_000

# print output
print("marginal distribution for x")
xmtb = pt.PrettyTable()
xmtb.field_names = ['x_value', 'x_prob']
xmtb.add_row([xs[0], xp])
xmtb.add_row([xs[1], 1-xp])
print(xmtb)

print("\nmarginal distribution for y")


ymtb = pt.PrettyTable()
ymtb.field_names = ['y_value', 'y_prob']
ymtb.add_row([ys[0], yp])
ymtb.add_row([ys[1], 1-yp])
print(ymtb)

marginal distribution for x


+---------+----------+
| x_value | x_prob |
+---------+----------+
| 0 | 0.499959 |
| 1 | 0.500041 |
+---------+----------+

marginal distribution for y


+---------+----------+
| y_value | y_prob |
+---------+----------+
| 10 | 0.399779 |
| 20 | 0.600221 |
+---------+----------+

# conditional distributions
xc1 = x[0, x[1, :] == ys[0]]
xc2 = x[0, x[1, :] == ys[1]]
yc1 = x[1, x[0, :] == xs[0]]
yc2 = x[1, x[0, :] == xs[1]]

xc1p = np.sum(xc1 == xs[0])/len(xc1)


xc2p = np.sum(xc2 == xs[0])/len(xc2)
yc1p = np.sum(yc1 == ys[0])/len(yc1)
yc2p = np.sum(yc2 == ys[0])/len(yc2)

# print output
print("conditional distribution for x")
xctb = pt.PrettyTable()
xctb.field_names = ['y_value', 'prob(x=0)', 'prob(x=1)']
xctb.add_row([ys[0], xc1p, 1-xc1p])
xctb.add_row([ys[1], xc2p, 1-xc2p])
print(xctb)

print("\nconditional distribution for y")


yctb = pt.PrettyTable()
yctb.field_names = ['x_value', 'prob(y=10)', 'prob(y=20)']
yctb.add_row([xs[0], yc1p, 1-yc1p])
yctb.add_row([xs[1], yc2p, 1-yc2p])
print(yctb)

conditional distribution for x


+---------+--------------------+---------------------+
| y_value | prob(x=0) | prob(x=1) |
+---------+--------------------+---------------------+
| 10 | 0.7501469561932967 | 0.24985304380670326 |
| 20 | 0.3333205602603041 | 0.6666794397396959 |
+---------+--------------------+---------------------+

conditional distribution for y


+---------+---------------------+---------------------+
| x_value | prob(y=10) | prob(y=20) |
+---------+---------------------+---------------------+
| 0 | 0.5998351864852918 | 0.40016481351470823 |
| 1 | 0.19975562003915678 | 0.8002443799608432 |
+---------+---------------------+---------------------+

Let’s calculate population marginal and conditional probabilities using matrix algebra.
$$
\begin{array}{c|cc|c}
 & y_1 & y_2 & x \\
\hline
x_1 & 0.3 & 0.2 & 0.5 \\
x_2 & 0.1 & 0.4 & 0.5 \\
\hline
y & 0.4 & 0.6 & 1
\end{array}
$$

(1) Marginal distribution:

$$
\begin{array}{c|cc}
\text{var} & \text{var}_1 & \text{var}_2 \\
\hline
x & 0.5 & 0.5 \\
y & 0.4 & 0.6
\end{array}
$$

(2) Conditional distribution:

$$
\begin{array}{c|cc}
 & x_1 & x_2 \\
\hline
y = y_1 & \frac{0.3}{0.4} = 0.75 & \frac{0.1}{0.4} = 0.25 \\
y = y_2 & \frac{0.2}{0.6} \approx 0.33 & \frac{0.4}{0.6} \approx 0.67
\end{array}
$$

$$
\begin{array}{c|cc}
 & y_1 & y_2 \\
\hline
x = x_1 & \frac{0.3}{0.5} = 0.6 & \frac{0.2}{0.5} = 0.4 \\
x = x_2 & \frac{0.1}{0.5} = 0.2 & \frac{0.4}{0.5} = 0.8
\end{array}
$$
These population objects closely resemble sample counterparts computed above.
Let’s wrap some of the functions we have used in a Python class for a general discrete bivariate joint distribution.

class discrete_bijoint:

def __init__(self, f, xs, ys):


'''initialization
-----------------
parameters:
f: the bivariate joint probability matrix
xs: values of x vector
ys: values of y vector
'''
self.f, self.xs, self.ys = f, xs, ys

def joint_tb(self):
'''print the joint distribution table'''
xs = self.xs
ys = self.ys
f = self.f
jtb = pt.PrettyTable()
jtb.field_names = ['x_value/y_value', *ys, 'marginal sum for x']
for i in range(len(xs)):
jtb.add_row([xs[i], *f[i, :], np.sum(f[i, :])])
jtb.add_row(['marginal_sum for y', *np.sum(f, 0), np.sum(f)])
print("\nThe joint probability distribution for x and y\n", jtb)
self.jtb = jtb

def draw(self, n):


'''draw random numbers
----------------------
parameters:
n: number of random numbers to draw
'''
xs = self.xs
ys = self.ys
f_cum = np.cumsum(self.f)
p = np.random.rand(n)
x = np.empty([2, p.shape[0]])
lf = len(f_cum)
lx = len(xs)-1
ly = len(ys)-1
for i in range(lf):
x[0, p < f_cum[lf-1-i]] = xs[lx]
x[1, p < f_cum[lf-1-i]] = ys[ly]
if ly == 0:
lx -= 1
ly = len(ys)-1
else:
ly -= 1
self.x = x
self.n = n

def marg_dist(self):
'''marginal distribution'''
x = self.x
xs = self.xs
ys = self.ys
n = self.n
xmp = [np.sum(x[0, :] == xs[i])/n for i in range(len(xs))]
ymp = [np.sum(x[1, :] == ys[i])/n for i in range(len(ys))]

# print output
xmtb = pt.PrettyTable()
ymtb = pt.PrettyTable()
xmtb.field_names = ['x_value', 'x_prob']
ymtb.field_names = ['y_value', 'y_prob']
for i in range(max(len(xs), len(ys))):
if i < len(xs):
xmtb.add_row([xs[i], xmp[i]])
if i < len(ys):
ymtb.add_row([ys[i], ymp[i]])
xmtb.add_row(['sum', np.sum(xmp)])
ymtb.add_row(['sum', np.sum(ymp)])
print("\nmarginal distribution for x\n", xmtb)
print("\nmarginal distribution for y\n", ymtb)

self.xmp = xmp
self.ymp = ymp

def cond_dist(self):
'''conditional distribution'''
x = self.x
xs = self.xs
ys = self.ys
n = self.n
xcp = np.empty([len(ys), len(xs)])
ycp = np.empty([len(xs), len(ys)])
for i in range(max(len(ys), len(xs))):
if i < len(ys):
xi = x[0, x[1, :] == ys[i]]
idx = xi.reshape(len(xi), 1) == xs.reshape(1, len(xs))
xcp[i, :] = np.sum(idx, 0)/len(xi)
if i < len(xs):
yi = x[1, x[0, :] == xs[i]]
idy = yi.reshape(len(yi), 1) == ys.reshape(1, len(ys))
ycp[i, :] = np.sum(idy, 0)/len(yi)

# print output
xctb = pt.PrettyTable()
yctb = pt.PrettyTable()
xctb.field_names = ['x_value', *xs, 'sum']
yctb.field_names = ['y_value', *ys, 'sum']
for i in range(max(len(xs), len(ys))):
if i < len(ys):
xctb.add_row([ys[i], *xcp[i], np.sum(xcp[i])])
if i < len(xs):
yctb.add_row([xs[i], *ycp[i], np.sum(ycp[i])])



print("\nconditional distribution for x\n", xctb)
print("\nconditional distribution for y\n", yctb)

self.xcp = xcp
self.xyp = ycp

Let’s apply our code to some examples.


Example 1

# joint
d = discrete_bijoint(f, xs, ys)
d.joint_tb()

The joint probability distribution for x and y


+--------------------+-----+--------------------+--------------------+
| x_value/y_value | 10 | 20 | marginal sum for x |
+--------------------+-----+--------------------+--------------------+
| 0 | 0.3 | 0.2 | 0.5 |
| 1 | 0.1 | 0.4 | 0.5 |
| marginal_sum for y | 0.4 | 0.6000000000000001 | 1.0 |
+--------------------+-----+--------------------+--------------------+

# sample marginal
d.draw(1_000_000)
d.marg_dist()

marginal distribution for x


+---------+----------+
| x_value | x_prob |
+---------+----------+
| 0 | 0.499154 |
| 1 | 0.500846 |
| sum | 1.0 |
+---------+----------+

marginal distribution for y


+---------+----------+
| y_value | y_prob |
+---------+----------+
| 10 | 0.398873 |
| 20 | 0.601127 |
| sum | 1.0 |
+---------+----------+

# sample conditional
d.cond_dist()

conditional distribution for x


+---------+--------------------+---------------------+-----+
| x_value | 0 | 1 | sum |
+---------+--------------------+---------------------+-----+


| 10 | 0.7498677523923655 | 0.25013224760763453 | 1.0 |
| 20 | 0.3327949002457051 | 0.6672050997542949 | 1.0 |
+---------+--------------------+---------------------+-----+

conditional distribution for y


+---------+---------------------+--------------------+-----+
| y_value | 10 | 20 | sum |
+---------+---------------------+--------------------+-----+
| 0 | 0.5992178766472872 | 0.4007821233527128 | 1.0 |
| 1 | 0.19920494523266633 | 0.8007950547673337 | 1.0 |
+---------+---------------------+--------------------+-----+

Example 2

xs_new = np.array([10, 20, 30])


ys_new = np.array([1, 2])
f_new = np.array([[0.2, 0.1], [0.1, 0.3], [0.15, 0.15]])
d_new = discrete_bijoint(f_new, xs_new, ys_new)
d_new.joint_tb()

The joint probability distribution for x and y


+--------------------+---------------------+------+---------------------+
| x_value/y_value | 1 | 2 | marginal sum for x |
+--------------------+---------------------+------+---------------------+
| 10 | 0.2 | 0.1 | 0.30000000000000004 |
| 20 | 0.1 | 0.3 | 0.4 |
| 30 | 0.15 | 0.15 | 0.3 |
| marginal_sum for y | 0.45000000000000007 | 0.55 | 1.0 |
+--------------------+---------------------+------+---------------------+

d_new.draw(1_000_000)
d_new.marg_dist()

marginal distribution for x


+---------+----------+
| x_value | x_prob |
+---------+----------+
| 10 | 0.29917 |
| 20 | 0.400588 |
| 30 | 0.300242 |
| sum | 1.0 |
+---------+----------+

marginal distribution for y


+---------+----------+
| y_value | y_prob |
+---------+----------+
| 1 | 0.448634 |
| 2 | 0.551366 |
| sum | 1.0 |
+---------+----------+

d_new.cond_dist()


conditional distribution for x


+---------+--------------------+---------------------+---------------------+-----+
| x_value | 10 | 20 | 30 | sum |
+---------+--------------------+---------------------+---------------------+-----+
| 1 | 0.4433257399127128 | 0.22248648118510858 | 0.3341877789021786 | 1.0 |
| 2 | 0.1818737462955641 | 0.5455051635392825 | 0.27262109016515346 | 1.0 |
+---------+--------------------+---------------------+---------------------+-----+

conditional distribution for y


+---------+---------------------+--------------------+-----+
| y_value | 1 | 2 | sum |
+---------+---------------------+--------------------+-----+
| 10 | 0.664809305745897 | 0.335190694254103 | 1.0 |
| 20 | 0.24917121830908565 | 0.7508287816909144 | 1.0 |
| 30 | 0.4993571852039355 | 0.5006428147960645 | 1.0 |
+---------+---------------------+--------------------+-----+

8.16 A Continuous Bivariate Random Vector

A two-dimensional Gaussian distribution has joint density

$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\exp\left[-\frac{1}{2(1 - \rho^2)}\left(\frac{(x - \mu_1)^2}{\sigma_1^2} - \frac{2\rho(x - \mu_1)(y - \mu_2)}{\sigma_1\sigma_2} + \frac{(y - \mu_2)^2}{\sigma_2^2}\right)\right]$$
We start with a bivariate normal distribution pinned down by

$$\mu = \begin{bmatrix} 0 \\ 5 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 5 & 0.2 \\ 0.2 & 1 \end{bmatrix}$$

# define the joint probability density function


def func(x, y, μ1=0, μ2=5, σ1=np.sqrt(5), σ2=np.sqrt(1), ρ=.2/np.sqrt(5*1)):
A = (2 * np.pi * σ1 * σ2 * np.sqrt(1 - ρ**2))**(-1)
B = -1 / 2 / (1 - ρ**2)
C1 = (x - μ1)**2 / σ1**2
C2 = 2 * ρ * (x - μ1) * (y - μ2) / σ1 / σ2
C3 = (y - μ2)**2 / σ2**2
return A * np.exp(B * (C1 - C2 + C3))

μ1 = 0
μ2 = 5
σ1 = np.sqrt(5)
σ2 = np.sqrt(1)
ρ = .2 / np.sqrt(5 * 1)

x = np.linspace(-10, 10, 1_000)


y = np.linspace(-10, 10, 1_000)
x_mesh, y_mesh = np.meshgrid(x, y, indexing="ij")

Joint Distribution
Let’s plot the population joint density.


# %matplotlib notebook

fig = plt.figure()
ax = plt.axes(projection='3d')

surf = ax.plot_surface(x_mesh, y_mesh, func(x_mesh, y_mesh), cmap='viridis')


plt.show()

# %matplotlib notebook

fig = plt.figure()
ax = plt.axes(projection='3d')

curve = ax.contour(x_mesh, y_mesh, func(x_mesh, y_mesh), zdir='x')


plt.ylabel('y')
ax.set_zlabel('f')
ax.set_xticks([])
plt.show()


Next we can simulate from a built-in numpy function and calculate a sample marginal distribution from the sample mean
and variance.

μ= np.array([0, 5])
σ= np.array([[5, .2], [.2, 1]])
n = 1_000_000
data = np.random.multivariate_normal(μ, σ, n)
x = data[:, 0]
y = data[:, 1]

Marginal distribution

plt.hist(x, bins=1_000, alpha=0.6)


μx_hat, σx_hat = np.mean(x), np.std(x)
print(μx_hat, σx_hat)
x_sim = np.random.normal(μx_hat, σx_hat, 1_000_000)
plt.hist(x_sim, bins=1_000, alpha=0.4, histtype="step")
plt.show()

-0.0001294339629668653 2.2338665663818036


plt.hist(y, bins=1_000, density=True, alpha=0.6)


μy_hat, σy_hat = np.mean(y), np.std(y)
print(μy_hat, σy_hat)
y_sim = np.random.normal(μy_hat, σy_hat, 1_000_000)
plt.hist(y_sim, bins=1_000, density=True, alpha=0.4, histtype="step")
plt.show()

4.998845851883271 1.0005847916711021

Conditional distribution


The population conditional distribution is

$$[X \mid Y = y] \sim \mathbb{N}\left[\mu_X + \rho\sigma_X\frac{y - \mu_Y}{\sigma_Y},\; \sigma_X^2(1 - \rho^2)\right]$$
$$[Y \mid X = x] \sim \mathbb{N}\left[\mu_Y + \rho\sigma_Y\frac{x - \mu_X}{\sigma_X},\; \sigma_Y^2(1 - \rho^2)\right]$$

Let’s approximate the joint density by discretizing and mapping the approximating joint density into a matrix.
We can compute the discretized marginal density by just using matrix algebra and noting that

$$\textrm{Prob}\{X = i \mid Y = j\} = \frac{f_{ij}}{\sum_i f_{ij}}$$

Fix 𝑦 = 0.

# discretized marginal density


x = np.linspace(-10, 10, 1_000_000)
z = func(x, y=0) / np.sum(func(x, y=0))
plt.plot(x, z)
plt.show()

The mean and variance are computed by

$$\mathbb{E}[X \mid Y = j] = \sum_i i \,\textrm{Prob}\{X = i \mid Y = j\} = \sum_i i \frac{f_{ij}}{\sum_i f_{ij}}$$
$$\mathbb{D}[X \mid Y = j] = \sum_i \left(i - \mu_{X \mid Y = j}\right)^2 \frac{f_{ij}}{\sum_i f_{ij}}$$

Let’s draw from a normal distribution with above mean and variance and check how accurate our approximation is.


# discretized mean
μx = np.dot(x, z)

# discretized standard deviation


σx = np.sqrt(np.dot((x - μx)**2, z))

# sample
zz = np.random.normal(μx, σx, 1_000_000)
plt.hist(zz, bins=300, density=True, alpha=0.3, range=[-10, 10])
plt.show()

Fix 𝑥 = 1.

y = np.linspace(0, 10, 1_000_000)


z = func(x=1, y=y) / np.sum(func(x=1, y=y))
plt.plot(y,z)
plt.show()


# discretized mean and standard deviation


μy = np.dot(y,z)
σy = np.sqrt(np.dot((y - μy)**2, z))

# sample
zz = np.random.normal(μy,σy,1_000_000)
plt.hist(zz, bins=100, density=True, alpha=0.3)
plt.show()

We compare with the analytically computed parameters and note that they are close.


print(μx, σx)
print(μ1 + ρ * σ1 * (0 - μ2) / σ2, np.sqrt(σ1**2 * (1 - ρ**2)))

print(μy, σy)
print(μ2 + ρ * σ2 * (1 - μ1) / σ1, np.sqrt(σ2**2 * (1 - ρ**2)))

-0.9997518414498433 2.22658413316977
-1.0 2.227105745132009
5.039999456960771 0.9959851265795592
5.04 0.9959919678390986

8.17 Sum of Two Independently Distributed Random Variables

Let $X, Y$ be two independent discrete random variables that take values in $\bar{X}, \bar{Y}$, respectively.
Define a new random variable 𝑍 = 𝑋 + 𝑌 .
Evidently, $Z$ takes values from $\bar{Z}$ defined as follows:

$$\bar{X} = \{0, 1, \ldots, I-1\}; \qquad f_i = \textrm{Prob}\{X = i\}$$
$$\bar{Y} = \{0, 1, \ldots, J-1\}; \qquad g_j = \textrm{Prob}\{Y = j\}$$
$$\bar{Z} = \{0, 1, \ldots, I+J-2\}; \qquad h_k = \textrm{Prob}\{X + Y = k\}$$

Independence of $X$ and $Y$ implies that

$$h_k = \textrm{Prob}\{X = 0, Y = k\} + \textrm{Prob}\{X = 1, Y = k-1\} + \ldots + \textrm{Prob}\{X = k, Y = 0\}$$
$$h_k = f_0 g_k + f_1 g_{k-1} + \ldots + f_{k-1} g_1 + f_k g_0 \quad \text{for} \quad k = 0, 1, \ldots, I+J-2$$

Thus, we have:

$$h_k = \sum_{i=0}^{k} f_i g_{k-i} \equiv f * g$$

where 𝑓 ∗ 𝑔 denotes the convolution of the 𝑓 and 𝑔 sequences.


Similarly, for two random variables $X, Y$ with densities $f_X, g_Y$, the density of $Z = X + Y$ is

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x) g_Y(z - x)\, dx \equiv f_X * g_Y$$

where 𝑓𝑋 ∗ 𝑔𝑌 denotes the convolution of the 𝑓𝑋 and 𝑔𝑌 functions.
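For discrete distributions the convolution formula is exactly what np.convolve computes. A minimal sketch (with made-up pmfs f and g for illustration):

import numpy as np

f = np.array([0.1, 0.6, 0.3])       # pmf of X on {0, 1, 2}
g = np.array([0.5, 0.5])            # pmf of Y on {0, 1}

h = np.convolve(f, g)               # pmf of Z = X + Y on {0, ..., 3}
print(h, h.sum())                   # h_k = Σ_i f_i g_{k-i}, and h sums to 1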

8.18 Transition Probability Matrix

Consider the following joint probability distribution of two random variables.


Let 𝑋, 𝑌 be discrete random variables with joint distribution

Prob{𝑋 = 𝑖, 𝑌 = 𝑗} = 𝜌𝑖𝑗

where 𝑖 = 0, … , 𝐼 − 1; 𝑗 = 0, … , 𝐽 − 1 and

$$\sum_i \sum_j \rho_{ij} = 1, \qquad \rho_{ij} \ge 0$$


An associated conditional distribution is

$$\textrm{Prob}\{Y = j \mid X = i\} = \frac{\rho_{ij}}{\sum_j \rho_{ij}} = \frac{\textrm{Prob}\{Y = j, X = i\}}{\textrm{Prob}\{X = i\}}$$

We can define a transition probability matrix


$$p_{ij} = \textrm{Prob}\{Y = j \mid X = i\} = \frac{\rho_{ij}}{\sum_j \rho_{ij}}$$

where

$$\begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix}$$

The first row is the probability of 𝑌 = 𝑗, 𝑗 = 0, 1 conditional on 𝑋 = 0.


The second row is the probability of 𝑌 = 𝑗, 𝑗 = 0, 1 conditional on 𝑋 = 1.
Note that

• $\sum_j p_{ij} = \frac{\sum_j \rho_{ij}}{\sum_j \rho_{ij}} = 1$, so each row of the transition matrix is a probability distribution (not so for each column).
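Normalizing each row of a joint distribution by its row sum produces the transition matrix in one numpy line. A minimal sketch, reusing the 2 × 2 joint matrix from above (illustrative only):

import numpy as np

ρ = np.array([[0.3, 0.2],
              [0.1, 0.4]])                       # joint distribution ρ_ij

P = ρ / ρ.sum(axis=1, keepdims=True)             # p_ij = ρ_ij / Σ_j ρ_ij
print(P)                                         # [[0.6 0.4], [0.2 0.8]]
print(P.sum(axis=1))                             # each row sums to 1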

8.19 Coupling

Start with a joint distribution

$$f_{ij} = \textrm{Prob}\{X = i, Y = j\}, \quad i = 0, \cdots, I-1; \; j = 0, \cdots, J-1$$

stacked into an $I \times J$ matrix, e.g., for $I = 2$, $J = 2$,

$$\begin{bmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{bmatrix}$$

From the joint distribution, we have shown above that we obtain unique marginal distributions.
Now we’ll try to go in a reverse direction.
We’ll find that from two marginal distributions, we can usually construct more than one joint distribution that verifies these marginals.
Each of these joint distributions is called a coupling of the two marginal distributions.
Let’s start with marginal distributions

$$\textrm{Prob}\{X = i\} = \sum_j f_{ij} = \mu_i, \quad i = 0, \cdots, I-1$$
$$\textrm{Prob}\{Y = j\} = \sum_i f_{ij} = \nu_j, \quad j = 0, \cdots, J-1$$

Given two marginal distributions, $\mu$ for $X$ and $\nu$ for $Y$, a joint distribution $f_{ij}$ with these marginals is said to be a coupling of $\mu$ and $\nu$.
Example:


Consider the following bivariate example.

Prob{𝑋 = 0} =1 − 𝑞 = 𝜇0
Prob{𝑋 = 1} =𝑞 = 𝜇1
Prob{𝑌 = 0} =1 − 𝑟 = 𝜈0
Prob{𝑌 = 1} =𝑟 = 𝜈1
where 0 ≤ 𝑞 < 𝑟 ≤ 1

We construct two couplings.


The first coupling of our two marginal distributions is the joint distribution

$$f_{ij} = \begin{bmatrix} (1-q)(1-r) & (1-q)r \\ q(1-r) & qr \end{bmatrix}$$

To verify that it is a coupling, we check that

(1 − 𝑞)(1 − 𝑟) + (1 − 𝑞)𝑟 + 𝑞(1 − 𝑟) + 𝑞𝑟 = 1


$$\mu_0 = (1-q)(1-r) + (1-q)r = 1 - q$$
$$\mu_1 = q(1-r) + qr = q$$
$$\nu_0 = (1-q)(1-r) + q(1-r) = 1 - r$$
$$\nu_1 = (1-q)r + qr = r$$

A second coupling of our two marginal distributions is the joint distribution

$$f_{ij} = \begin{bmatrix} 1-r & r-q \\ 0 & q \end{bmatrix}$$

To verify that this is a coupling, note that

1−𝑟+𝑟−𝑞+𝑞 =1
𝜇0 = 1 − 𝑞
𝜇1 = 𝑞
𝜈0 = 1 − 𝑟
𝜈1 = 𝑟

Thus, our two proposed joint distributions have the same marginal distributions.
But the joint distributions differ.
Thus, multiple joint distributions [𝑓𝑖𝑗 ] can have the same marginals.
Remark:
• Couplings are important in optimal transport problems and in Markov processes.

8.20 Copula Functions

Suppose that $X_1, X_2, \ldots, X_N$ are $N$ random variables and that


• their marginal distributions are 𝐹1 (𝑥1 ), 𝐹2 (𝑥2 ), … , 𝐹𝑁 (𝑥𝑁 ), and
• their joint distribution is 𝐻(𝑥1 , 𝑥2 , … , 𝑥𝑁 )


Then there exists a copula function 𝐶(⋅) that verifies

𝐻(𝑥1 , 𝑥2 , … , 𝑥𝑁 ) = 𝐶(𝐹1 (𝑥1 ), 𝐹2 (𝑥2 ), … , 𝐹𝑁 (𝑥𝑁 )).

We can obtain

$$C(u_1, u_2, \ldots, u_N) = H[F_1^{-1}(u_1), F_2^{-1}(u_2), \ldots, F_N^{-1}(u_N)]$$

In a reverse direction of logic, given univariate marginal distributions 𝐹1 (𝑥1 ), 𝐹2 (𝑥2 ), … , 𝐹𝑁 (𝑥𝑁 ) and a
copula function 𝐶(⋅), the function 𝐻(𝑥1 , 𝑥2 , … , 𝑥𝑁 ) = 𝐶(𝐹1 (𝑥1 ), 𝐹2 (𝑥2 ), … , 𝐹𝑁 (𝑥𝑁 )) is a coupling of
𝐹1 (𝑥1 ), 𝐹2 (𝑥2 ), … , 𝐹𝑁 (𝑥𝑁 ).
Thus, for given marginal distributions, we can use a copula function to determine a joint distribution when the associated
univariate random variables are not independent.
Copula functions are often used to characterize dependence of random variables.
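As a concrete illustration, the sketch below (not from the lecture; it uses the simplest possible copula, the independence copula $C(u_1, u_2) = u_1 u_2$, together with two illustrative exponential marginals) evaluates the implied joint CDF $H(x_1, x_2) = C(F_1(x_1), F_2(x_2))$:

import numpy as np

def F1(x, λ=1.0):
    """Exponential CDF, first marginal (illustrative choice)."""
    return 1 - np.exp(-λ * x)

def F2(x, λ=2.0):
    """Exponential CDF, second marginal (illustrative choice)."""
    return 1 - np.exp(-λ * x)

def C(u1, u2):
    """Independence copula: joint CDF of two independent uniforms."""
    return u1 * u2

def H(x1, x2):
    """Joint CDF induced by the copula and the two marginals."""
    return C(F1(x1), F2(x2))

print(H(1.0, 0.5))   # Prob{X1 <= 1, X2 <= 0.5} under independence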
Discrete marginal distribution
As mentioned above, for two given marginal distributions there can be more than one coupling.
For example, consider two random variables 𝑋, 𝑌 with distributions

Prob(𝑋 = 0) = 0.6,
Prob(𝑋 = 1) = 0.4,
Prob(𝑌 = 0) = 0.3,
Prob(𝑌 = 1) = 0.7,

For these two random variables there can be more than one coupling.
Let’s first generate X and Y.

# define parameters
mu = np.array([0.6, 0.4])
nu = np.array([0.3, 0.7])

# number of draws
draws = 1_000_000

# generate draws from uniform distribution


p = np.random.rand(draws)

# generate draws of X and Y via uniform distribution


x = np.ones(draws)
y = np.ones(draws)
x[p <= mu[0]] = 0
x[p > mu[0]] = 1
y[p <= nu[0]] = 0
y[p > nu[0]] = 1

# calculate parameters from draws


q_hat = sum(x[x == 1])/draws
r_hat = sum(y[y == 1])/draws

# print output
print("distribution for x")
xmtb = pt.PrettyTable()
xmtb.field_names = ['x_value', 'x_prob']


xmtb.add_row([0, 1-q_hat])
xmtb.add_row([1, q_hat])
print(xmtb)

print("distribution for y")


ymtb = pt.PrettyTable()
ymtb.field_names = ['y_value', 'y_prob']
ymtb.add_row([0, 1-r_hat])
ymtb.add_row([1, r_hat])
print(ymtb)

distribution for x
+---------+----------+
| x_value | x_prob |
+---------+----------+
| 0 | 0.600175 |
| 1 | 0.399825 |
+---------+----------+
distribution for y
+---------+----------+
| y_value | y_prob |
+---------+----------+
| 0 | 0.300562 |
| 1 | 0.699438 |
+---------+----------+

Let’s now take our two marginal distributions, one for 𝑋, the other for 𝑌 , and construct two distinct couplings.
For the first joint distribution:

Prob(𝑋 = 𝑖, 𝑌 = 𝑗) = 𝑓𝑖𝑗

where
$$[f_{ij}] = \begin{bmatrix} 0.18 & 0.42 \\ 0.12 & 0.28 \end{bmatrix}$$
Let’s use Python to construct this joint distribution and then verify that its marginal distributions are what we want.

# define parameters
f1 = np.array([[0.18, 0.42], [0.12, 0.28]])
f1_cum = np.cumsum(f1)

# number of draws
draws1 = 1_000_000

# generate draws from uniform distribution


p = np.random.rand(draws1)

# generate draws of first coupling via uniform distribution


c1 = np.vstack([np.ones(draws1), np.ones(draws1)])
# X=0, Y=0
c1[0, p <= f1_cum[0]] = 0
c1[1, p <= f1_cum[0]] = 0
# X=0, Y=1
c1[0, (p > f1_cum[0])*(p <= f1_cum[1])] = 0


c1[1, (p > f1_cum[0])*(p <= f1_cum[1])] = 1
# X=1, Y=0
c1[0, (p > f1_cum[1])*(p <= f1_cum[2])] = 1
c1[1, (p > f1_cum[1])*(p <= f1_cum[2])] = 0
# X=1, Y=1
c1[0, (p > f1_cum[2])*(p <= f1_cum[3])] = 1
c1[1, (p > f1_cum[2])*(p <= f1_cum[3])] = 1

# calculate parameters from draws


f1_00 = sum((c1[0, :] == 0)*(c1[1, :] == 0))/draws1
f1_01 = sum((c1[0, :] == 0)*(c1[1, :] == 1))/draws1
f1_10 = sum((c1[0, :] == 1)*(c1[1, :] == 0))/draws1
f1_11 = sum((c1[0, :] == 1)*(c1[1, :] == 1))/draws1

# print output of first joint distribution


print("first joint distribution for c1")
c1_mtb = pt.PrettyTable()
c1_mtb.field_names = ['c1_x_value', 'c1_y_value', 'c1_prob']
c1_mtb.add_row([0, 0, f1_00])
c1_mtb.add_row([0, 1, f1_01])
c1_mtb.add_row([1, 0, f1_10])
c1_mtb.add_row([1, 1, f1_11])
print(c1_mtb)

first joint distribution for c1


+------------+------------+----------+
| c1_x_value | c1_y_value | c1_prob |
+------------+------------+----------+
| 0 | 0 | 0.179646 |
| 0 | 1 | 0.420357 |
| 1 | 0 | 0.120022 |
| 1 | 1 | 0.279975 |
+------------+------------+----------+

# calculate parameters from draws


c1_q_hat = sum(c1[0, :] == 1)/draws1
c1_r_hat = sum(c1[1, :] == 1)/draws1

# print output
print("marginal distribution for x")
c1_x_mtb = pt.PrettyTable()
c1_x_mtb.field_names = ['c1_x_value', 'c1_x_prob']
c1_x_mtb.add_row([0, 1-c1_q_hat])
c1_x_mtb.add_row([1, c1_q_hat])
print(c1_x_mtb)

print("marginal distribution for y")


c1_ymtb = pt.PrettyTable()
c1_ymtb.field_names = ['c1_y_value', 'c1_y_prob']
c1_ymtb.add_row([0, 1-c1_r_hat])
c1_ymtb.add_row([1, c1_r_hat])
print(c1_ymtb)


marginal distribution for x


+------------+--------------------+
| c1_x_value | c1_x_prob |
+------------+--------------------+
| 0 | 0.6000030000000001 |
| 1 | 0.399997 |
+------------+--------------------+
marginal distribution for y
+------------+---------------------+
| c1_y_value | c1_y_prob |
+------------+---------------------+
| 0 | 0.29966800000000005 |
| 1 | 0.700332 |
+------------+---------------------+

Now, let’s construct another joint distribution that is also a coupling of $X$ and $Y$

$$[f_{ij}] = \begin{bmatrix} 0.3 & 0.3 \\ 0 & 0.4 \end{bmatrix}$$

# define parameters
f2 = np.array([[0.3, 0.3], [0, 0.4]])
f2_cum = np.cumsum(f2)

# number of draws
draws2 = 1_000_000

# generate draws from uniform distribution


p = np.random.rand(draws2)

# generate draws of first coupling via uniform distribution


c2 = np.vstack([np.ones(draws2), np.ones(draws2)])
# X=0, Y=0
c2[0, p <= f2_cum[0]] = 0
c2[1, p <= f2_cum[0]] = 0
# X=0, Y=1
c2[0, (p > f2_cum[0])*(p <= f2_cum[1])] = 0
c2[1, (p > f2_cum[0])*(p <= f2_cum[1])] = 1
# X=1, Y=0
c2[0, (p > f2_cum[1])*(p <= f2_cum[2])] = 1
c2[1, (p > f2_cum[1])*(p <= f2_cum[2])] = 0
# X=1, Y=1
c2[0, (p > f2_cum[2])*(p <= f2_cum[3])] = 1
c2[1, (p > f2_cum[2])*(p <= f2_cum[3])] = 1

# calculate parameters from draws


f2_00 = sum((c2[0, :] == 0)*(c2[1, :] == 0))/draws2
f2_01 = sum((c2[0, :] == 0)*(c2[1, :] == 1))/draws2
f2_10 = sum((c2[0, :] == 1)*(c2[1, :] == 0))/draws2
f2_11 = sum((c2[0, :] == 1)*(c2[1, :] == 1))/draws2

# print output of second joint distribution


print("first joint distribution for c2")
c2_mtb = pt.PrettyTable()
c2_mtb.field_names = ['c2_x_value', 'c2_y_value', 'c2_prob']
c2_mtb.add_row([0, 0, f2_00])


c2_mtb.add_row([0, 1, f2_01])
c2_mtb.add_row([1, 0, f2_10])
c2_mtb.add_row([1, 1, f2_11])
print(c2_mtb)

first joint distribution for c2


+------------+------------+----------+
| c2_x_value | c2_y_value | c2_prob |
+------------+------------+----------+
| 0 | 0 | 0.300074 |
| 0 | 1 | 0.299807 |
| 1 | 0 | 0.0 |
| 1 | 1 | 0.400119 |
+------------+------------+----------+

# calculate parameters from draws


c2_q_hat = sum(c2[0, :] == 1)/draws2
c2_r_hat = sum(c2[1, :] == 1)/draws2

# print output
print("marginal distribution for x")
c2_x_mtb = pt.PrettyTable()
c2_x_mtb.field_names = ['c2_x_value', 'c2_x_prob']
c2_x_mtb.add_row([0, 1-c2_q_hat])
c2_x_mtb.add_row([1, c2_q_hat])
print(c2_x_mtb)

print("marginal distribution for y")


c2_ymtb = pt.PrettyTable()
c2_ymtb.field_names = ['c2_y_value', 'c2_y_prob']
c2_ymtb.add_row([0, 1-c2_r_hat])
c2_ymtb.add_row([1, c2_r_hat])
print(c2_ymtb)

marginal distribution for x


+------------+-----------+
| c2_x_value | c2_x_prob |
+------------+-----------+
| 0 | 0.599881 |
| 1 | 0.400119 |
+------------+-----------+
marginal distribution for y
+------------+---------------------+
| c2_y_value | c2_y_prob |
+------------+---------------------+
| 0 | 0.30007399999999995 |
| 1 | 0.699926 |
+------------+---------------------+

We have verified that both joint distributions, 𝑐1 and 𝑐2 , have identical marginal distributions of 𝑋 and 𝑌 , respectively.
So they are both couplings of 𝑋 and 𝑌 .


8.21 Time Series

Suppose that there are two time periods.


• 𝑡 = 0 “today”
• 𝑡 = 1 “tomorrow”
Let 𝑋(0) be a random variable to be realized at 𝑡 = 0, 𝑋(1) be a random variable to be realized at 𝑡 = 1.
Suppose that

$$\textrm{Prob}\{X(0) = i, X(1) = j\} = f_{ij} \ge 0, \quad i = 0, \cdots, I-1$$

$$\sum_i \sum_j f_{ij} = 1$$

𝑓𝑖𝑗 is a joint distribution over [𝑋(0), 𝑋(1)].


A conditional distribution is

$$\textrm{Prob}\{X(1) = j \mid X(0) = i\} = \frac{f_{ij}}{\sum_j f_{ij}}$$

Remark:
• This is a key formula for a theory of optimally predicting a time series.
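To make the formula concrete, here is a minimal sketch (with an illustrative joint matrix, not from the lecture) that turns a joint distribution over $[X(0), X(1)]$ into the conditional distribution used for prediction:

import numpy as np

# illustrative joint distribution over (X(0), X(1))
f = np.array([[0.3, 0.2],
              [0.1, 0.4]])

# Prob{X(1) = j | X(0) = i}: normalize each row by Prob{X(0) = i}
cond = f / f.sum(axis=1, keepdims=True)
print(cond)

# the predictor E[X(1) | X(0) = i] is then a row-wise expectation
values = np.array([0, 1])          # possible values of X(1)
print(cond @ values)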



CHAPTER

NINE

UNIVARIATE TIME SERIES WITH MATRIX ALGEBRA

Contents

• Univariate Time Series with Matrix Algebra


– Overview
– Samuelson’s model
– Adding a random term
– A forward looking model

9.1 Overview

This lecture uses matrices to solve some linear difference equations.


As a running example, we’ll study a second-order linear difference equation that was the key technical tool in Paul
Samuelson’s 1939 article [Sam39] that introduced the multiplier-accelerator model.
This model became the workhorse that powered early econometric versions of Keynesian macroeconomic models in the
United States.
You can read about the details of that model in this QuantEcon lecture.
(That lecture also describes some technicalities about second-order linear difference equations.)
We’ll also study a “perfect foresight” model of stock prices that involves solving a “forward-looking” linear difference
equation.
We will use the following imports:

import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import cm
plt.rcParams["figure.figsize"] = (11, 5) #set default figure size


9.2 Samuelson’s model

Let 𝑡 = 0, ±1, ±2, … index time.


For 𝑡 = 1, 2, 3, … , 𝑇 suppose that

𝑦𝑡 = 𝛼0 + 𝛼1 𝑦𝑡−1 + 𝛼2 𝑦𝑡−2 (9.1)

where we assume that 𝑦0 and 𝑦−1 are given numbers that we take as initial conditions.
In Samuelson’s model, 𝑦𝑡 stood for national income or perhaps a different measure of aggregate activity called gross
domestic product (GDP) at time 𝑡.
Equation (9.1) is called a second-order linear difference equation.
But actually, it is a collection of 𝑇 simultaneous linear equations in the 𝑇 variables 𝑦1 , 𝑦2 , … , 𝑦𝑇 .

Note: To be able to solve a second-order linear difference equation, we require two boundary conditions that can take
the form either of two initial conditions or two terminal conditions or possibly one of each.

Let’s write our equations as a stacked system


$$
\underbrace{\begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-\alpha_1 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-\alpha_2 & -\alpha_1 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & -\alpha_2 & -\alpha_1 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & -\alpha_2 & -\alpha_1 & 1
\end{bmatrix}}_{\equiv A}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \\ y_T \end{bmatrix}
=
\underbrace{\begin{bmatrix}
\alpha_0 + \alpha_1 y_0 + \alpha_2 y_{-1} \\
\alpha_0 + \alpha_2 y_0 \\
\alpha_0 \\
\alpha_0 \\
\vdots \\
\alpha_0
\end{bmatrix}}_{\equiv b}
$$
or

𝐴𝑦 = 𝑏

where
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}$$
Evidently 𝑦 can be computed from

𝑦 = 𝐴−1 𝑏

The vector 𝑦 is a complete time path {𝑦𝑡 }𝑇𝑡=1 .


Let’s put Python to work on an example that captures the flavor of Samuelson’s multiplier-accelerator model.
We’ll set parameters equal to the same values we used in this QuantEcon lecture.

T = 80

# parameters
α0 = 10.0
α1 = 1.53
α2 = -.9

y_1 = 28.  # y_{-1}
y0 = 24.

158 Chapter 9. Univariate Time Series with Matrix Algebra


Quantitative Economics with Python

Now we construct 𝐴 and 𝑏.

A = np.identity(T)  # The T x T identity matrix

for i in range(T):

    if i-1 >= 0:
        A[i, i-1] = -α1

    if i-2 >= 0:
        A[i, i-2] = -α2

b = np.full(T, α0)
b[0] = α0 + α1 * y0 + α2 * y_1
b[1] = α0 + α2 * y0

Let’s look at the matrix 𝐴 and the vector 𝑏 for our example.

A, b

(array([[ 1. , 0. , 0. , ..., 0. , 0. , 0. ],
[-1.53, 1. , 0. , ..., 0. , 0. , 0. ],
[ 0.9 , -1.53, 1. , ..., 0. , 0. , 0. ],
...,
[ 0. , 0. , 0. , ..., 1. , 0. , 0. ],
[ 0. , 0. , 0. , ..., -1.53, 1. , 0. ],
[ 0. , 0. , 0. , ..., 0.9 , -1.53, 1. ]]),
array([ 21.52, -11.6 , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ,
10. , 10. , 10. , 10. , 10. , 10. , 10. , 10. ]))

Now let’s solve for the path of 𝑦.


If 𝑦𝑡 is GNP at time 𝑡, then we have a version of Samuelson’s model of the dynamics for GNP.
To solve 𝑦 = 𝐴−1 𝑏 we can either invert 𝐴 directly, as in

A_inv = np.linalg.inv(A)

y = A_inv @ b

or we can use np.linalg.solve:

y_second_method = np.linalg.solve(A, b)

Here make sure the two methods give the same result, at least up to floating point precision:

np.allclose(y, y_second_method)


True

Note: In general, np.linalg.solve is more numerically stable than using np.linalg.inv directly. However,
stability is not an issue for this small example. Moreover, we will repeatedly use A_inv in what follows, so there is added
value in computing it directly.

Now we can plot.

plt.plot(np.arange(T)+1, y)
plt.xlabel('t')
plt.ylabel('y')

plt.show()

The steady state value $y^*$ of $y_t$ is obtained by setting $y_t = y_{t-1} = y_{t-2} = y^*$ in (9.1), which yields

$$y^* = \frac{\alpha_0}{1 - \alpha_1 - \alpha_2}$$

If we set the initial values to $y_0 = y_{-1} = y^*$, then $y_t$ will be constant:

y_star = α0 / (1 - α1 - α2)
y_1_steady = y_star  # y_{-1}
y0_steady = y_star

b_steady = np.full(T, α0)
b_steady[0] = α0 + α1 * y0_steady + α2 * y_1_steady
b_steady[1] = α0 + α2 * y0_steady

y_steady = A_inv @ b_steady

plt.plot(np.arange(T)+1, y_steady)
plt.xlabel('t')


plt.ylabel('y')

plt.show()

9.3 Adding a random term

To generate some excitement, we’ll follow in the spirit of the great economists Eugen Slutsky and Ragnar Frisch and
replace our original second-order difference equation with the following second-order stochastic linear difference
equation:

𝑦𝑡 = 𝛼0 + 𝛼1 𝑦𝑡−1 + 𝛼2 𝑦𝑡−2 + 𝑢𝑡 (9.2)

where 𝑢𝑡 ∼ 𝑁 (0, 𝜎𝑢2 ) and is IID, meaning independent and identically distributed.
We’ll stack these 𝑇 equations into a system cast in terms of matrix algebra.
Let’s define the random vector
$$u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}$$

With $A, b, y$ defined as above, now assume that $y$ is governed by the system

𝐴𝑦 = 𝑏 + 𝑢

The solution for 𝑦 becomes

𝑦 = 𝐴−1 (𝑏 + 𝑢)

Let’s try it out in Python.


σ_u = 2.

u = np.random.normal(0, σ_u, size=T)
y = A_inv @ (b + u)

plt.plot(np.arange(T)+1, y)
plt.xlabel('t')
plt.ylabel('y')

plt.show()

The above time series looks a lot like (detrended) GDP series for a number of advanced countries in recent decades.
We can simulate 𝑁 paths.

N = 100

for i in range(N):
    col = cm.viridis(np.random.rand())  # Choose a random color from viridis
    u = np.random.normal(0, σ_u, size=T)
    y = A_inv @ (b + u)
    plt.plot(np.arange(T)+1, y, lw=0.5, color=col)

plt.xlabel('t')
plt.ylabel('y')

plt.show()


Also consider the case when 𝑦0 and 𝑦−1 are at steady state.

N = 100

for i in range(N):
    col = cm.viridis(np.random.rand())  # Choose a random color from viridis
    u = np.random.normal(0, σ_u, size=T)
    y_steady = A_inv @ (b_steady + u)
    plt.plot(np.arange(T)+1, y_steady, lw=0.5, color=col)

plt.xlabel('t')
plt.ylabel('y')

plt.show()


9.4 A forward looking model

Samuelson’s model is backwards looking in the sense that we give it initial conditions and let it run.
Let’s now turn to a model that is forward looking.
We apply similar linear algebra machinery to study a perfect foresight model widely used as a benchmark in macroeco-
nomics and finance.
As an example, we suppose that 𝑝𝑡 is the price of a stock and that 𝑦𝑡 is its dividend.
We assume that 𝑦𝑡 is determined by second-order difference equation that we analyzed just above, so that

𝑦 = 𝐴−1 (𝑏 + 𝑢)

Our perfect foresight model of stock prices is


$$p_t = \sum_{j=0}^{T-t} \beta^j y_{t+j}, \qquad \beta \in (0, 1)$$

where 𝛽 is a discount factor.


The model asserts that the price of the stock at 𝑡 equals the discounted present values of the (perfectly foreseen) future
dividends.
Form
$$
\underbrace{\begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ \vdots \\ p_T \end{bmatrix}}_{\equiv p}
=
\underbrace{\begin{bmatrix}
1 & \beta & \beta^2 & \cdots & \beta^{T-1} \\
0 & 1 & \beta & \cdots & \beta^{T-2} \\
0 & 0 & 1 & \cdots & \beta^{T-3} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}}_{\equiv B}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_T \end{bmatrix}
$$

β = .96

# construct B
B = np.zeros((T, T))

for i in range(T):
    B[i, i:] = β ** np.arange(0, T-i)

array([[1. , 0.96 , 0.9216 , ..., 0.04314048, 0.04141486,


0.03975826],
[0. , 1. , 0.96 , ..., 0.044938 , 0.04314048,
0.04141486],
[0. , 0. , 1. , ..., 0.04681041, 0.044938 ,
0.04314048],
...,
[0. , 0. , 0. , ..., 1. , 0.96 ,
0.9216 ],
[0. , 0. , 0. , ..., 0. , 1. ,
0.96 ],
[0. , 0. , 0. , ..., 0. , 0. ,
1. ]])


σ_u = 0.
u = np.random.normal(0, σ_u, size=T)
y = A_inv @ (b + u)
y_steady = A_inv @ (b_steady + u)

p = B @ y

plt.plot(np.arange(0, T)+1, y, label='y')


plt.plot(np.arange(0, T)+1, p, label='p')
plt.xlabel('t')
plt.ylabel('y/p')
plt.legend()

plt.show()

Can you explain why the trend of the price is downward over time?
Also consider the case when 𝑦0 and 𝑦−1 are at the steady state.

p_steady = B @ y_steady

plt.plot(np.arange(0, T)+1, y_steady, label='y')


plt.plot(np.arange(0, T)+1, p_steady, label='p')
plt.xlabel('t')
plt.ylabel('y/p')
plt.legend()

plt.show()



CHAPTER

TEN

LLN AND CLT

Contents

• LLN and CLT


– Overview
– Relationships
– LLN
– CLT
– Exercises
– Solutions

10.1 Overview

This lecture illustrates two of the most important theorems of probability and statistics: The law of large numbers (LLN)
and the central limit theorem (CLT).
These beautiful theorems lie behind many of the most fundamental results in econometrics and quantitative economic
modeling.
The lecture is based around simulations that show the LLN and CLT in action.
We also demonstrate how the LLN and CLT break down when the assumptions they are based on do not hold.
In addition, we examine several useful extensions of the classical theorems, such as
• The delta method, for smooth functions of random variables.
• The multivariate case.
Some of these extensions are presented as exercises.
We’ll need the following imports:

%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import random
import numpy as np
from scipy.stats import t, beta, lognorm, expon, gamma, uniform, cauchy


from scipy.stats import gaussian_kde, poisson, binom, norm, chi2
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.collections import PolyCollection
from scipy.linalg import inv, sqrtm

10.2 Relationships

The CLT refines the LLN.


The LLN gives conditions under which sample moments converge to population moments as sample size increases.
The CLT provides information about the rate at which sample moments converge to population moments as sample size
increases.

10.3 LLN

We begin with the law of large numbers, which tells us when sample averages will converge to their population means.

10.3.1 The Classical LLN

The classical law of large numbers concerns independent and identically distributed (IID) random variables.
Here is the strongest version of the classical LLN, known as Kolmogorov’s strong law.
Let 𝑋1 , … , 𝑋𝑛 be independent and identically distributed scalar random variables, with common distribution 𝐹 .
When it exists, let 𝜇 denote the common mean of this sample:

𝜇 ∶= 𝔼𝑋 = ∫ 𝑥𝐹 (𝑑𝑥)

In addition, let

$$\bar{X}_n := \frac{1}{n}\sum_{i=1}^{n} X_i$$

Kolmogorov’s strong law states that, if 𝔼|𝑋| is finite, then

ℙ {𝑋̄ 𝑛 → 𝜇 as 𝑛 → ∞} = 1 (10.1)

What does this last expression mean?


Let’s think about it from a simulation perspective, imagining for a moment that our computer can generate perfect random
samples (which of course it can’t).
Let’s also imagine that we can generate infinite sequences so that the statement 𝑋̄ 𝑛 → 𝜇 can be evaluated.
In this setting, (10.1) should be interpreted as meaning that the probability of the computer producing a sequence where
𝑋̄ 𝑛 → 𝜇 fails to occur is zero.


10.3.2 Proof

The proof of Kolmogorov’s strong law is nontrivial – see, for example, theorem 8.3.5 of [Dud02].
On the other hand, we can prove a weaker version of the LLN very easily and still get most of the intuition.
The version we prove is as follows: If 𝑋1 , … , 𝑋𝑛 is IID with 𝔼𝑋𝑖2 < ∞, then, for any 𝜖 > 0, we have

ℙ {|𝑋̄ 𝑛 − 𝜇| ≥ 𝜖} → 0 as 𝑛→∞ (10.2)

(This version is weaker because we claim only convergence in probability rather than almost sure convergence, and assume
a finite second moment)
To see that this is so, fix 𝜖 > 0, and let 𝜎2 be the variance of each 𝑋𝑖 .
Recall the Chebyshev inequality, which tells us that

$$\mathbb{P}\left\{|\bar{X}_n - \mu| \ge \epsilon\right\} \le \frac{\mathbb{E}[(\bar{X}_n - \mu)^2]}{\epsilon^2} \tag{10.3}$$
Now observe that
$$
\begin{aligned}
\mathbb{E}[(\bar{X}_n - \mu)^2] &= \mathbb{E}\left\{\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)\right]^2\right\} \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\mathbb{E}(X_i - \mu)(X_j - \mu) \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\mathbb{E}(X_i - \mu)^2 \\
&= \frac{\sigma^2}{n}
\end{aligned}
$$
Here the crucial step is at the third equality, which follows from independence.
Independence means that if 𝑖 ≠ 𝑗, then the covariance term 𝔼(𝑋𝑖 − 𝜇)(𝑋𝑗 − 𝜇) drops out.
As a result, 𝑛2 − 𝑛 terms vanish, leading us to a final expression that goes to zero in 𝑛.
Combining our last result with (10.3), we come to the estimate

$$\mathbb{P}\left\{|\bar{X}_n - \mu| \ge \epsilon\right\} \le \frac{\sigma^2}{n\epsilon^2} \tag{10.4}$$
The claim in (10.2) is now clear.
Of course, if the sequence 𝑋1 , … , 𝑋𝑛 is correlated, then the cross-product terms 𝔼(𝑋𝑖 − 𝜇)(𝑋𝑗 − 𝜇) are not necessarily
zero.
While this doesn’t mean that the same line of argument is impossible, it does mean that if we want a similar result then
the covariances should be “almost zero” for “most” of these terms.
In a long sequence, this would be true if, for example, 𝔼(𝑋𝑖 − 𝜇)(𝑋𝑗 − 𝜇) approached zero when the difference between
𝑖 and 𝑗 became large.
In other words, the LLN can still work if the sequence 𝑋1 , … , 𝑋𝑛 has a kind of “asymptotic independence”, in the sense
that correlation falls to zero as variables become further apart in the sequence.
This idea is very important in time series analysis, and we’ll come across it again soon enough.
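The bound (10.4) is easy to check by simulation. The sketch below (illustrative only; it uses standard normal draws, so σ² = 1) compares the empirical frequency of large deviations with the Chebyshev bound σ²/(nε²):

import numpy as np

np.random.seed(0)
μ, σ2, ε = 0.0, 1.0, 0.1
num_reps = 10_000

for n in (100, 400, 1600):
    # sample means of n IID N(0, 1) draws, repeated num_reps times
    sample_means = np.random.randn(num_reps, n).mean(axis=1)
    freq = np.mean(np.abs(sample_means - μ) >= ε)    # empirical LHS of (10.4)
    bound = σ2 / (n * ε**2)                          # Chebyshev bound
    print(n, freq, min(bound, 1.0))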


10.3.3 Illustration

Let’s now illustrate the classical IID law of large numbers using simulation.
In particular, we aim to generate some sequences of IID random variables and plot the evolution of 𝑋̄ 𝑛 as 𝑛 increases.
Below is a figure that does just this (as usual, you can click on it to expand it).
It shows IID observations from three different distributions and plots 𝑋̄ 𝑛 against 𝑛 in each case.
The dots represent the underlying observations 𝑋𝑖 for 𝑖 = 1, … , 100.
In each of the three cases, convergence of 𝑋̄ 𝑛 to 𝜇 occurs as predicted

n = 100

# Arbitrary collection of distributions


distributions = {"student's t with 10 degrees of freedom": t(10),
"β(2, 2)": beta(2, 2),
"lognormal LN(0, 1/2)": lognorm(0.5),
"γ(5, 1/2)": gamma(5, scale=2),
"poisson(4)": poisson(4),
"exponential with λ = 1": expon(1)}

# Create a figure and some axes


num_plots = 3
fig, axes = plt.subplots(num_plots, 1, figsize=(10, 20))

# Set some plotting parameters to improve layout


bbox = (0., 1.02, 1., .102)
legend_args = {'ncol': 2,
'bbox_to_anchor': bbox,
'loc': 3,
'mode': 'expand'}
plt.subplots_adjust(hspace=0.5)

for ax in axes:
# Choose a randomly selected distribution
name = random.choice(list(distributions.keys()))
distribution = distributions.pop(name)

# Generate n draws from the distribution


data = distribution.rvs(n)

# Compute sample mean at each n


sample_mean = np.empty(n)
for i in range(n):
sample_mean[i] = np.mean(data[:i+1])

# Plot
ax.plot(list(range(n)), data, 'o', color='grey', alpha=0.5)
axlabel = '$\\bar X_n$ for $X_i \sim$' + name
ax.plot(list(range(n)), sample_mean, 'g-', lw=3, alpha=0.6, label=axlabel)
m = distribution.mean()
ax.plot(list(range(n)), [m] * n, 'k--', lw=1.5, label='$\mu$')
ax.vlines(list(range(n)), m, data, lw=0.2)
ax.legend(**legend_args, fontsize=12)

plt.show()


The three distributions are chosen at random from a selection stored in the dictionary distributions.

10.4 CLT

Next, we turn to the central limit theorem, which tells us about the distribution of the deviation between sample averages
and population means.

10.4.1 Statement of the Theorem

The central limit theorem is one of the most remarkable results in all of mathematics.
In the classical IID setting, it tells us the following:
If the sequence 𝑋1 , … , 𝑋𝑛 is IID, with common mean 𝜇 and common variance 𝜎2 ∈ (0, ∞), then
$$\sqrt{n}\,(\bar X_n - \mu) \stackrel{d}{\to} N(0, \sigma^2) \quad \text{as} \quad n \to \infty \qquad (10.5)$$

Here $\stackrel{d}{\to} N(0, \sigma^2)$ indicates convergence in distribution to a centered (i.e., zero mean) normal with standard deviation $\sigma$.

10.4.2 Intuition

The striking implication of the CLT is that for any distribution with finite second moment, the simple operation of adding
independent copies always leads to a Gaussian curve.
A relatively simple proof of the central limit theorem can be obtained by working with characteristic functions (see, e.g.,
theorem 9.5.6 of [Dud02]).
The proof is elegant but almost anticlimactic, and it provides surprisingly little intuition.
In fact, all of the proofs of the CLT that we know are similar in this respect.
Why does adding independent copies produce a bell-shaped distribution?
Part of the answer can be obtained by investigating the addition of independent Bernoulli random variables.
In particular, let 𝑋𝑖 be binary, with ℙ{𝑋𝑖 = 0} = ℙ{𝑋𝑖 = 1} = 0.5, and let 𝑋1 , … , 𝑋𝑛 be independent.
Think of $X_i = 1$ as a “success”, so that $Y_n = \sum_{i=1}^n X_i$ is the number of successes in 𝑛 trials.
The next figure plots the probability mass function of 𝑌𝑛 for 𝑛 = 1, 2, 4, 8

fig, axes = plt.subplots(2, 2, figsize=(10, 6))


plt.subplots_adjust(hspace=0.4)
axes = axes.flatten()
ns = [1, 2, 4, 8]
dom = list(range(9))

for ax, n in zip(axes, ns):


b = binom(n, 0.5)
ax.bar(dom, b.pmf(dom), alpha=0.6, align='center')
ax.set(xlim=(-0.5, 8.5), ylim=(0, 0.55),
xticks=list(range(9)), yticks=(0, 0.2, 0.4),
title=f'$n = {n}$')

plt.show()


When 𝑛 = 1, the distribution is flat — one success or no successes have the same probability.
When 𝑛 = 2 we can either have 0, 1 or 2 successes.
Notice the peak in probability mass at the mid-point 𝑘 = 1.
The reason is that there are more ways to get 1 success (“fail then succeed” or “succeed then fail”) than to get zero or two
successes.
Moreover, the two trials are independent, so the outcomes “fail then succeed” and “succeed then fail” are just as likely as
the outcomes “fail then fail” and “succeed then succeed”.
(If there were positive correlation, say, then “succeed then fail” would be less likely than “succeed then succeed”.)
Here, already we have the essence of the CLT: addition under independence leads probability mass to pile up in the middle
and thin out at the tails.
For 𝑛 = 4 and 𝑛 = 8 we again get a peak at the “middle” value (halfway between the minimum and the maximum
possible value).
The intuition is the same — there are simply more ways to get these middle outcomes.
If we continue, the bell-shaped curve becomes even more pronounced.
We are witnessing the binomial approximation of the normal distribution.
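A quick way to see this approximation numerically (a minimal sketch; the value of 𝑛 and the grid of 𝑘 values are arbitrary choices) is to compare the Binomial(𝑛, 0.5) probability mass function with a normal density having the same mean 𝑛/2 and variance 𝑛/4:

import numpy as np
from scipy.stats import binom, norm

n = 40
k = np.arange(10, 31)
b = binom(n, 0.5).pmf(k)                # binomial probabilities
g = norm(n/2, np.sqrt(n/4)).pdf(k)      # matching normal density
print(np.max(np.abs(b - g)))            # small once n is moderately large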


10.4.3 Simulation 1

Since the CLT seems almost magical, running simulations that verify its implications is one good way to build intuition.
To this end, we now perform the following simulation
1. Choose an arbitrary distribution 𝐹 for the underlying observations 𝑋𝑖 .

2. Generate independent draws of $Y_n := \sqrt{n}\,(\bar X_n - \mu)$.
3. Use these draws to compute some measure of their distribution — such as a histogram.
4. Compare the latter to 𝑁 (0, 𝜎2 ).
Here’s some code that does exactly this for the exponential distribution 𝐹 (𝑥) = 1 − 𝑒−𝜆𝑥 .
(Please experiment with other choices of 𝐹 , but remember that, to conform with the conditions of the CLT, the distribution
must have a finite second moment.)

# Set parameters
n = 250 # Choice of n
k = 100000 # Number of draws of Y_n
distribution = expon(2) # Exponential distribution, λ = 1/2
μ, s = distribution.mean(), distribution.std()

# Draw underlying RVs. Each row contains a draw of X_1,..,X_n


data = distribution.rvs((k, n))
# Compute mean of each row, producing k draws of \bar X_n
sample_means = data.mean(axis=1)
# Generate observations of Y_n
Y = np.sqrt(n) * (sample_means - μ)

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
xmin, xmax = -3 * s, 3 * s
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
ax.plot(xgrid, norm.pdf(xgrid, scale=s), 'k-', lw=2, label='$N(0, \sigma^2)$')
ax.legend()

plt.show()


Notice the absence of for loops — every operation is vectorized, meaning that the major calculations are all shifted to
highly optimized C code.
The fit to the normal density is already tight and can be further improved by increasing n.
You can also experiment with other specifications of 𝐹 .

10.4.4 Simulation 2

Our next simulation is somewhat like the first, except that we aim to track the distribution of $Y_n := \sqrt{n}\,(\bar X_n - \mu)$ as 𝑛
increases.
In the simulation, we’ll be working with random variables having 𝜇 = 0.
Thus, when 𝑛 = 1, we have 𝑌1 = 𝑋1 , so the first distribution is just the distribution of the underlying random variable.

For 𝑛 = 2, the distribution of 𝑌2 is that of $(X_1 + X_2)/\sqrt{2}$, and so on.
What we expect is that, regardless of the distribution of the underlying random variable, the distribution of 𝑌𝑛 will smooth
out into a bell-shaped curve.
The next figure shows this process for 𝑋𝑖 ∼ 𝑓, where 𝑓 was specified as the convex combination of three different beta
densities.
(Taking a convex combination is an easy way to produce an irregular shape for 𝑓.)
In the figure, the closest density is that of 𝑌1 , while the furthest is that of 𝑌5

beta_dist = beta(2, 2)

def gen_x_draws(k):
"""
Returns a flat array containing k independent draws from the
distribution of X, the underlying random variable. This distribution
is itself a convex combination of three beta distributions.
"""
bdraws = beta_dist.rvs((3, k))
# Transform rows, so each represents a different distribution
bdraws[0, :] -= 0.5
bdraws[1, :] += 0.6
bdraws[2, :] -= 1.1
# Set X[i] = bdraws[j, i], where j is a random draw from {0, 1, 2}
js = np.random.randint(0, 3, size=k)
X = bdraws[js, np.arange(k)]
# Rescale, so that the random variable is zero mean
m, sigma = X.mean(), X.std()
return (X - m) / sigma

nmax = 5
reps = 100000
ns = list(range(1, nmax + 1))

# Form a matrix Z such that each column is reps independent draws of X


Z = np.empty((reps, nmax))
for i in range(nmax):
Z[:, i] = gen_x_draws(reps)
# Take cumulative sum across columns
S = Z.cumsum(axis=1)
# Multiply j-th column by sqrt j
Y = (1 / np.sqrt(ns)) * S

# Plot
fig = plt.figure(figsize = (10, 6))
ax = fig.add_subplot(projection='3d')

a, b = -3, 3
gs = 100
xs = np.linspace(a, b, gs)

# Build verts
greys = np.linspace(0.3, 0.7, nmax)
verts = []
for n in ns:
density = gaussian_kde(Y[:, n-1])
ys = density(xs)
verts.append(list(zip(xs, ys)))

poly = PolyCollection(verts, facecolors=[str(g) for g in greys])


poly.set_alpha(0.85)
ax.add_collection3d(poly, zs=ns, zdir='x')

ax.set(xlim3d=(1, nmax), xticks=(ns), ylabel='$Y_n$', zlabel='$p(y_n)$',


xlabel=("n"), yticks=((-3, 0, 3)), ylim3d=(a, b),
zlim3d=(0, 0.4), zticks=((0.2, 0.4)))
ax.invert_xaxis()
# Rotates the plot 30 deg on z axis and 45 deg on x axis
ax.view_init(30, 45)
plt.show()



As expected, the distribution smooths out into a bell curve as 𝑛 increases.


We leave you to investigate the details of the plotting code above if you wish to know more.
If you run the code from an ordinary IPython shell, the figure should pop up in a window that you can rotate with your mouse, giving different views on the density sequence.


10.4.5 The Multivariate Case

The law of large numbers and central limit theorem work just as nicely in multidimensional settings.
To state the results, let’s recall some elementary facts about random vectors.
A random vector X is just a sequence of 𝑘 random variables (𝑋1 , … , 𝑋𝑘 ).
Each realization of X is an element of ℝ𝑘 .
A collection of random vectors X1 , … , X𝑛 is called independent if, given any 𝑛 vectors x1 , … , x𝑛 in ℝ𝑘 , we have

ℙ{X1 ≤ x1 , … , X𝑛 ≤ x𝑛 } = ℙ{X1 ≤ x1 } × ⋯ × ℙ{X𝑛 ≤ x𝑛 }

(The vector inequality X ≤ x means that 𝑋𝑗 ≤ 𝑥𝑗 for 𝑗 = 1, … , 𝑘)


Let 𝜇𝑗 ∶= 𝔼[𝑋𝑗 ] for all 𝑗 = 1, … , 𝑘.
The expectation 𝔼[X] of X is defined to be the vector of expectations:

$$
\mathbb{E}[\mathbf{X}] :=
\begin{pmatrix} \mathbb{E}[X_1] \\ \mathbb{E}[X_2] \\ \vdots \\ \mathbb{E}[X_k] \end{pmatrix}
=
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_k \end{pmatrix}
=: \mu
$$

The variance-covariance matrix of random vector X is defined as

Var[X] ∶= 𝔼[(X − 𝜇)(X − 𝜇)′ ]

Expanding this out, we get

$$
\operatorname{Var}[\mathbf{X}] =
\begin{pmatrix}
\mathbb{E}[(X_1 - \mu_1)(X_1 - \mu_1)] & \cdots & \mathbb{E}[(X_1 - \mu_1)(X_k - \mu_k)] \\
\mathbb{E}[(X_2 - \mu_2)(X_1 - \mu_1)] & \cdots & \mathbb{E}[(X_2 - \mu_2)(X_k - \mu_k)] \\
\vdots & \vdots & \vdots \\
\mathbb{E}[(X_k - \mu_k)(X_1 - \mu_1)] & \cdots & \mathbb{E}[(X_k - \mu_k)(X_k - \mu_k)]
\end{pmatrix}
$$

The 𝑗, 𝑘-th term is the scalar covariance between 𝑋𝑗 and 𝑋𝑘 .


With this notation, we can proceed to the multivariate LLN and CLT.
Let X1 , … , X𝑛 be a sequence of independent and identically distributed random vectors, each one taking values in ℝ𝑘 .
Let 𝜇 be the vector 𝔼[X𝑖 ], and let Σ be the variance-covariance matrix of X𝑖 .
Interpreting vector addition and scalar multiplication in the usual way (i.e., pointwise), let

$$\bar{\mathbf{X}}_n := \frac{1}{n} \sum_{i=1}^n \mathbf{X}_i$$

In this setting, the LLN tells us that

ℙ {X̄ 𝑛 → 𝜇 as 𝑛 → ∞} = 1 (10.6)

Here X̄ 𝑛 → 𝜇 means that ‖X̄ 𝑛 − 𝜇‖ → 0, where ‖ ⋅ ‖ is the standard Euclidean norm.


The CLT tells us that, provided Σ is finite,
$$\sqrt{n}\,(\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} N(0, \Sigma) \quad \text{as} \quad n \to \infty \qquad (10.7)$$
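As a minimal numerical sketch of the multivariate LLN (the particular mean vector, covariance matrix and sample sizes below are arbitrary illustrative choices), the sample mean vector settles down on 𝜇 as 𝑛 grows:

import numpy as np

np.random.seed(0)
μ = np.array([1.0, -2.0])               # population mean vector
Σ = np.array([[2.0, 0.6],
              [0.6, 1.0]])              # population covariance matrix

for n in (10, 1_000, 100_000):
    X = np.random.multivariate_normal(μ, Σ, size=n)   # n IID draws in R²
    print(n, X.mean(axis=0))                          # sample mean vector approaches μ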


10.5 Exercises

Exercise 10.5.1
One very useful consequence of the central limit theorem is as follows.
Assume the conditions of the CLT as stated above.
If 𝑔 ∶ ℝ → ℝ is differentiable at 𝜇 and 𝑔′ (𝜇) ≠ 0, then
$$\sqrt{n}\,\{g(\bar X_n) - g(\mu)\} \stackrel{d}{\to} N\left(0, g'(\mu)^2 \sigma^2\right) \quad \text{as} \quad n \to \infty \qquad (10.8)$$

This theorem is used frequently in statistics to obtain the asymptotic distribution of estimators — many of which can be
expressed as functions of sample means.
(These kinds of results are often said to use the “delta method”.)
The proof is based on a Taylor expansion of 𝑔 around the point 𝜇.
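An informal sketch of that expansion (ignoring the remainder term) shows where the factor $g'(\mu)^2$ comes from:

$$
\sqrt{n}\,\{g(\bar X_n) - g(\mu)\}
\approx g'(\mu)\,\sqrt{n}\,(\bar X_n - \mu)
\stackrel{d}{\to} g'(\mu)\,N(0, \sigma^2) = N(0, g'(\mu)^2 \sigma^2)
$$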
Taking the result as given, let the distribution 𝐹 of each 𝑋𝑖 be uniform on [0, 𝜋/2] and let 𝑔(𝑥) = sin(𝑥).

Derive the asymptotic distribution of $\sqrt{n}\,\{g(\bar X_n) - g(\mu)\}$ and illustrate convergence in the same spirit as the program
discussed above.
What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
What is the source of the problem?

Exercise 10.5.2
Here’s a result that’s often used in developing statistical tests, and is connected to the multivariate central limit theorem.
If you study econometric theory, you will see this result used again and again.
Assume the setting of the multivariate CLT discussed above, so that
1. X1 , … , X𝑛 is a sequence of IID random vectors, each taking values in ℝ𝑘 .
2. 𝜇 ∶= 𝔼[X𝑖 ], and Σ is the variance-covariance matrix of X𝑖 .
3. The convergence
$$\sqrt{n}\,(\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} N(0, \Sigma) \qquad (10.9)$$
is valid.
In a statistical setting, one often wants the right-hand side to be standard normal so that confidence intervals are easily
computed.
This normalization can be achieved on the basis of three observations.
First, if X is a random vector in ℝ𝑘 and A is constant and 𝑘 × 𝑘, then

Var[AX] = A Var[X]A′

Second, by the continuous mapping theorem, if $\mathbf{Z}_n \stackrel{d}{\to} \mathbf{Z}$ in $\mathbb{R}^k$ and A is constant and 𝑘 × 𝑘, then

$$A\mathbf{Z}_n \stackrel{d}{\to} A\mathbf{Z}$$


Third, if S is a 𝑘 × 𝑘 symmetric positive definite matrix, then there exists a symmetric positive definite matrix Q, called
the inverse square root of S, such that

QSQ′ = I

Here I is the 𝑘 × 𝑘 identity matrix.
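As a minimal sketch of what an inverse square root looks like in code (the matrix S below is an arbitrary illustrative choice), one way to obtain Q and verify that $QSQ' = I$ is

import numpy as np
from scipy.linalg import sqrtm, inv

S = np.array([[2.0, 0.4],
              [0.4, 1.0]])    # an arbitrary symmetric positive definite matrix
Q = inv(sqrtm(S))             # inverse square root of S
print(Q @ S @ Q.T)            # approximately the identity matrix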


Putting these things together, your first exercise is to show that if Q is the inverse square root of Σ, then
$$\mathbf{Z}_n := \sqrt{n}\,Q(\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} \mathbf{Z} \sim N(0, I)$$

Applying the continuous mapping theorem one more time tells us that
$$\|\mathbf{Z}_n\|^2 \stackrel{d}{\to} \|\mathbf{Z}\|^2$$

Given the distribution of Z, we conclude that


$$n\,\|Q(\bar{\mathbf{X}}_n - \mu)\|^2 \stackrel{d}{\to} \chi^2(k) \qquad (10.10)$$

where 𝜒2 (𝑘) is the chi-squared distribution with 𝑘 degrees of freedom.


(Recall that 𝑘 is the dimension of X𝑖 , the underlying random vectors.)
Your second exercise is to illustrate the convergence in (10.10) with a simulation.
In doing so, let

$$\mathbf{X}_i := \begin{pmatrix} W_i \\ U_i + W_i \end{pmatrix}$$

where
• each 𝑊𝑖 is an IID draw from the uniform distribution on [−1, 1].
• each 𝑈𝑖 is an IID draw from the uniform distribution on [−2, 2].
• 𝑈𝑖 and 𝑊𝑖 are independent of each other.
Hints:
1. scipy.linalg.sqrtm(A) computes the square root of A. You still need to invert it.
2. You should be able to work out Σ from the preceding information.

10.6 Solutions

Solution to Exercise 10.5.1


Here is one solution

"""
Illustrates the delta method, a consequence of the central limit theorem.
"""

# Set parameters
n = 250
replications = 100000

distribution = uniform(loc=0, scale=(np.pi / 2))
μ, s = distribution.mean(), distribution.std()

g = np.sin
g_prime = np.cos

# Generate obs of sqrt{n} (g(X_n) - g(μ))


data = distribution.rvs((replications, n))
sample_means = data.mean(axis=1) # Compute mean of each row
error_obs = np.sqrt(n) * (g(sample_means) - g(μ))

# Plot
asymptotic_sd = g_prime(μ) * s
fig, ax = plt.subplots(figsize=(10, 6))
xmin = -3 * g_prime(μ) * s
xmax = -xmin
ax.set_xlim(xmin, xmax)
ax.hist(error_obs, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
lb = "$N(0, g'(\mu)^2 \sigma^2)$"
ax.plot(xgrid, norm.pdf(xgrid, scale=asymptotic_sd), 'k-', lw=2, label=lb)
ax.legend()
plt.show()

What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
In this case, the mean 𝜇 of this distribution is 𝜋/2, and since 𝑔′ = cos, we have 𝑔′ (𝜇) = 0.
Hence the conditions of the delta theorem are not satisfied.

Solution to Exercise 10.5.2


First we want to verify the claim that


$$\sqrt{n}\,Q(\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} N(0, I)$$

This is straightforward given the facts presented in the exercise.


Let

$$\mathbf{Y}_n := \sqrt{n}\,(\bar{\mathbf{X}}_n - \mu) \quad \text{and} \quad \mathbf{Y} \sim N(0, \Sigma)$$

By the multivariate CLT and the continuous mapping theorem, we have


$$Q\mathbf{Y}_n \stackrel{d}{\to} Q\mathbf{Y}$$

Since linear combinations of normal random variables are normal, the vector QY is also normal.
Its mean is clearly 0, and its variance-covariance matrix is

Var[QY] = QVar[Y]Q′ = QΣQ′ = I

In conclusion, $Q\mathbf{Y}_n \stackrel{d}{\to} Q\mathbf{Y} \sim N(0, I)$, which is what we aimed to show.
Now we turn to the simulation exercise.
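Before looking at the code, note how Σ can be worked out from the setup: with $W_i \sim U[-1, 1]$ and $U_i \sim U[-2, 2]$ independent, $\operatorname{Var}(W_i) = 2^2/12 = 1/3$ and $\operatorname{Var}(U_i) = 4^2/12 = 4/3$, so

$$
\Sigma =
\begin{pmatrix}
\operatorname{Var}(W_i) & \operatorname{Cov}(W_i, U_i + W_i) \\
\operatorname{Cov}(W_i, U_i + W_i) & \operatorname{Var}(U_i + W_i)
\end{pmatrix}
=
\begin{pmatrix} 1/3 & 1/3 \\ 1/3 & 5/3 \end{pmatrix}
$$

which is exactly the matrix built from vw and vu in the code below.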
Our solution is as follows

# Set parameters
n = 250
replications = 50000
dw = uniform(loc=-1, scale=2) # Uniform(-1, 1)
du = uniform(loc=-2, scale=4) # Uniform(-2, 2)
sw, su = dw.std(), du.std()
vw, vu = sw**2, su**2
Σ = ((vw, vw), (vw, vw + vu))
Σ = np.array(Σ)

# Compute Σ^{-1/2}
Q = inv(sqrtm(Σ))

# Generate observations of the normalized sample mean


error_obs = np.empty((2, replications))
for i in range(replications):
# Generate one sequence of bivariate shocks
X = np.empty((2, n))
W = dw.rvs(n)
U = du.rvs(n)
# Construct the n observations of the random vector
X[0, :] = W
X[1, :] = W + U
# Construct the i-th observation of Y_n
error_obs[:, i] = np.sqrt(n) * X.mean(axis=1)

# Premultiply by Q and then take the squared norm


temp = Q @ error_obs
chisq_obs = np.sum(temp**2, axis=0)

# Plot
fig, ax = plt.subplots(figsize=(10, 6))

xmax = 8
ax.set_xlim(0, xmax)
xgrid = np.linspace(0, xmax, 200)
lb = "Chi-squared with 2 degrees of freedom"
ax.plot(xgrid, chi2.pdf(xgrid, 2), 'k-', lw=2, label=lb)
ax.legend()
ax.hist(chisq_obs, bins=50, density=True)
plt.show()



CHAPTER

ELEVEN

TWO MEANINGS OF PROBABILITY

11.1 Overview

This lecture illustrates two distinct interpretations of a probability distribution


• A frequentist interpretation as relative frequencies anticipated to occur in a large i.i.d. sample
• A Bayesian interpretation as a personal opinion (about a parameter or list of parameters) after seeing a collection
of observations
We recommend watching this video about hypothesis testing within the frequentist approach

https://youtu.be/8JIe_cz6qGA

After you watch that video, please watch the following video on the Bayesian approach to constructing coverage intervals

https://youtu.be/Pahyv9i_X2k

After you are familiar with the material in these videos, this lecture uses the Socratic method to help consolidate your
understanding of the different questions that are answered by
• a frequentist confidence interval
• a Bayesian coverage interval
We do this by inviting you to write some Python code.
It would be especially useful if you tried doing this after each question that we pose for you, before proceeding to read
the rest of the lecture.
We provide our own answers as the lecture unfolds, but you’ll learn more if you try writing your own code before reading
and running ours.
Code for answering questions:
In addition to what’s in Anaconda, this lecture will deploy the following library:

pip install prettytable

To answer our coding questions, we’ll start with some imports

import numpy as np
import pandas as pd
import prettytable as pt
import matplotlib.pyplot as plt
from scipy.stats import binom

import scipy.stats as st
%matplotlib inline

Empowered with these Python tools, we’ll now explore the two meanings described above.

11.2 Frequentist Interpretation

Consider the following classic example.


The random variable 𝑋 takes on possible values 𝑘 = 0, 1, 2, … , 𝑛 with probabilities

$$\textrm{Prob}(X = k \,|\, \theta) = \frac{n!}{k!(n-k)!} \theta^k (1-\theta)^{n-k}$$

where the fixed parameter 𝜃 ∈ (0, 1).


This is called the binomial distribution.
Here
• 𝜃 is the probability that one toss of a coin will be a head, an outcome that we encode as 𝑌 = 1.
• 1 − 𝜃 is the probability that one toss of the coin will be a tail, an outcome that we denote 𝑌 = 0.
• 𝑋 is the total number of heads that came up after flipping the coin 𝑛 times.
Consider the following experiment:
Take 𝐼 independent sequences of 𝑛 independent flips of the coin
Notice the repeated use of the adjective independent:
• we use it once to describe that we are drawing 𝑛 independent times from a Bernoulli distribution with parameter
𝜃 to arrive at one draw from a Binomial distribution with parameters 𝜃, 𝑛.
• we use it again to describe that we are then drawing 𝐼 sequences of 𝑛 coin draws.
Let 𝑦ℎ𝑖 ∈ {0, 1} be the realized value of 𝑌 on the ℎth flip during the 𝑖th sequence of flips.
Let $\sum_{h=1}^n y_h^i$ denote the total number of times heads come up during the 𝑖th sequence of 𝑛 independent coin flips.

Let $f_k^I$ record the fraction of samples of length 𝑛 for which $\sum_{h=1}^n y_h^i = k$:

$$f_k^I = \frac{\textrm{number of samples of length } n \textrm{ for which } \sum_{h=1}^n y_h^i = k}{I}$$
The probability Prob(𝑋 = 𝑘|𝜃) answers the following question:
• As 𝐼 becomes large, in what fraction of 𝐼 independent draws of 𝑛 coin flips should we anticipate 𝑘 heads to occur?
As usual, a law of large numbers justifies this answer.

Exercise 11.2.1
1. Please write a Python class to compute 𝑓𝑘𝐼
2. Please use your code to compute 𝑓𝑘𝐼 , 𝑘 = 0, … , 𝑛 and compare them to Prob(𝑋 = 𝑘|𝜃) for various values of 𝜃, 𝑛
and 𝐼
3. With the Law of Large Numbers in mind, use your code to say something about the relationship between $f_k^I$ and $\textrm{Prob}(X = k \,|\, \theta)$ as 𝐼 grows


Solution to Exercise 11.2.1

class frequentist:

def __init__(self, θ, n, I):

'''
initialization
-----------------
parameters:
θ : probability that one toss of a coin will be a head with Y = 1
n : number of independent flips in each independent sequence of draws
I : number of independent sequence of draws

'''

self.θ, self.n, self.I = θ, n, I

def binomial(self, k):

'''compute the theoretical probability for specific input k'''

θ, n = self.θ, self.n
self.k = k
self.P = binom.pmf(k, n, θ)

def draw(self):

'''draw n independent flips for I independent sequences'''

θ, n, I = self.θ, self.n, self.I


sample = np.random.rand(I, n)
Y = (sample <= θ) * 1
self.Y = Y

def compute_fk(self, kk):

'''compute f_{k}^I for specific input k'''

Y, I = self.Y, self.I
K = np.sum(Y, 1)
f_kI = np.sum(K == kk) / I
self.f_kI = f_kI
self.kk = kk

def compare(self):

'''compute and print the comparison'''

n = self.n
comp = pt.PrettyTable()
comp.field_names = ['k', 'Theoretical', 'Frequentist']
self.draw()
for i in range(n):
self.binomial(i+1)

self.compute_fk(i+1)
comp.add_row([i+1, self.P, self.f_kI])
print(comp)

θ, n, k, I = 0.7, 20, 10, 1_000_000

freq = frequentist(θ, n, I)

freq.compare()

+----+------------------------+-------------+
| k | Theoretical | Frequentist |
+----+------------------------+-------------+
| 1 | 1.6271660538000033e-09 | 0.0 |
| 2 | 3.606884752589999e-08 | 0.0 |
| 3 | 5.04963865362601e-07 | 1e-06 |
| 4 | 5.007558331512455e-06 | 6e-06 |
| 5 | 3.7389768875293014e-05 | 2.8e-05 |
| 6 | 0.00021810698510587546 | 0.000214 |
| 7 | 0.001017832597160754 | 0.000992 |
| 8 | 0.003859281930901185 | 0.003738 |
| 9 | 0.012006654896137007 | 0.011793 |
| 10 | 0.030817080900085007 | 0.030752 |
| 11 | 0.065369565545635 | 0.06523 |
| 12 | 0.11439673970486108 | 0.114338 |
| 13 | 0.1642619852172365 | 0.165208 |
| 14 | 0.19163898275344246 | 0.191348 |
| 15 | 0.17886305056987967 | 0.178536 |
| 16 | 0.1304209743738704 | 0.130838 |
| 17 | 0.07160367220526209 | 0.071312 |
| 18 | 0.027845872524268643 | 0.027988 |
| 19 | 0.006839337111223871 | 0.006858 |
| 20 | 0.0007979226629761189 | 0.00082 |
+----+------------------------+-------------+

From the table above, can you see the law of large numbers at work?

Let’s do some more calculations.


Comparison with different 𝜃
Now we fix

𝑛 = 20, 𝑘 = 10, 𝐼 = 1, 000, 000

We’ll vary 𝜃 from 0.01 to 0.99 and plot outcomes against 𝜃.

θ_low, θ_high, npt = 0.01, 0.99, 50


thetas = np.linspace(θ_low, θ_high, npt)
P = []
f_kI = []
for i in range(npt):
freq = frequentist(thetas[i], n, I)
freq.binomial(k)

freq.draw()
freq.compute_fk(k)
P.append(freq.P)
f_kI.append(freq.f_kI)

fig, ax = plt.subplots(figsize=(8, 6))


ax.grid()
ax.plot(thetas, P, 'k-.', label='Theoretical')
ax.plot(thetas, f_kI, 'r--', label='Fraction')
plt.title(r'Comparison with different $\theta$', fontsize=16)
plt.xlabel(r'$\theta$', fontsize=15)
plt.ylabel('Fraction', fontsize=15)
plt.tick_params(labelsize=13)
plt.legend()
plt.show()

Comparison with different 𝑛


Now we fix 𝜃 = 0.7, 𝑘 = 10, 𝐼 = 1, 000, 000 and vary 𝑛 from 1 to 100.
Then we’ll plot outcomes.

n_low, n_high, nn = 1, 100, 50


ns = np.linspace(n_low, n_high, nn, dtype='int')

P = []
f_kI = []
for i in range(nn):
freq = frequentist(θ, ns[i], I)
freq.binomial(k)
freq.draw()
freq.compute_fk(k)
P.append(freq.P)
f_kI.append(freq.f_kI)

fig, ax = plt.subplots(figsize=(8, 6))


ax.grid()
ax.plot(ns, P, 'k-.', label='Theoretical')
ax.plot(ns, f_kI, 'r--', label='Frequentist')
plt.title(r'Comparison with different $n$', fontsize=16)
plt.xlabel(r'$n$', fontsize=15)
plt.ylabel('Fraction', fontsize=15)
plt.tick_params(labelsize=13)
plt.legend()
plt.show()

Comparison with different 𝐼


Now we fix 𝜃 = 0.7, 𝑛 = 20, 𝑘 = 10 and vary $\log_{10}(I)$ from 2 to 6.


I_log_low, I_log_high, nI = 2, 6, 200


log_Is = np.linspace(I_log_low, I_log_high, nI)
Is = np.power(10, log_Is).astype(int)
P = []
f_kI = []
for i in range(nI):
freq = frequentist(θ, n, Is[i])
freq.binomial(k)
freq.draw()
freq.compute_fk(k)
P.append(freq.P)
f_kI.append(freq.f_kI)

fig, ax = plt.subplots(figsize=(8, 6))


ax.grid()
ax.plot(Is, P, 'k-.', label='Theoretical')
ax.plot(Is, f_kI, 'r--', label='Fraction')
plt.title(r'Comparison with different $I$', fontsize=16)
plt.xlabel(r'$I$', fontsize=15)
plt.ylabel('Fraction', fontsize=15)
plt.tick_params(labelsize=13)
plt.legend()
plt.show()

From the above graphs, we can see that 𝐼, the number of independent sequences, plays an important role.


When 𝐼 becomes larger, the difference between theoretical probability and frequentist estimate becomes smaller.
Also, as long as 𝐼 is large enough, changing 𝜃 or 𝑛 does not substantially change the accuracy with which the observed fraction approximates $\textrm{Prob}(X = k \,|\, \theta)$.
The Law of Large Numbers is at work here.
For each independent sequence drawn, the indicator that exactly 𝑘 heads occur is a binary random variable $\rho_{k,i}$, $i = 1, 2, \ldots, I$; these indicators form an i.i.d. sequence with mean $\textrm{Prob}(X = k \,|\, \theta)$ and variance

$$\textrm{Prob}(X = k \,|\, \theta) \cdot (1 - \textrm{Prob}(X = k \,|\, \theta)).$$

So, by the LLN, the average of $\rho_{k,i}$ converges to

$$E[\rho_{k,i}] = \textrm{Prob}(X = k \,|\, \theta) = \frac{n!}{k!(n-k)!} \theta^k (1-\theta)^{n-k}$$

as 𝐼 goes to infinity.

11.3 Bayesian Interpretation

We again use a binomial distribution.


But now we don’t regard 𝜃 as being a fixed number.
Instead, we think of it as a random variable.
𝜃 is described by a probability distribution.
But now this probability distribution means something different than a relative frequency that we can anticipate to occur
in a large i.i.d. sample.
Instead, the probability distribution of 𝜃 is now a summary of our views about likely values of 𝜃 either
• before we have seen any data at all, or
• before we have seen more data, after we have seen some data
Thus, suppose that, before seeing any data, you have a personal prior probability distribution saying that

$$P(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}$$

where 𝐵(𝛼, 𝛽) is a beta function , so that 𝑃 (𝜃) is a beta distribution with parameters 𝛼, 𝛽.

Exercise 11.3.1
a) Please write down the likelihood function for a sample of length 𝑛 from a binomial distribution with parameter 𝜃.
b) Please write down the posterior distribution for 𝜃 after observing one flip of the coin.
c) Please pretend that the true value of 𝜃 = .4 and that someone who doesn’t know this has a beta prior distribution with
parameters with 𝛽 = 𝛼 = .5.
d) Please write a Python class to simulate this person’s personal posterior distribution for 𝜃 for a single sequence of 𝑛
draws.
e) Please plot the posterior distribution for 𝜃 as a function of 𝜃 as 𝑛 grows as 1, 2, ….
f) For various 𝑛’s, please describe and compute a Bayesian coverage interval for the interval [.45, .55].
g) Please tell what question a Bayesian coverage interval answers.


h) Please compute the posterior probability that 𝜃 ∈ [.45, .55] for various values of sample size 𝑛.
i) Please use your Python class to study what happens to the posterior distribution as 𝑛 → +∞, again assuming that the
true value of 𝜃 = .4, though it is unknown to the person doing the updating via Bayes’ Law.

Solution to Exercise 11.3.1


a) Please write down the likelihood function and the posterior distribution for 𝜃 after observing one flip of our coin.
Suppose the outcome is Y.
The likelihood function is:

𝐿(𝑌 |𝜃) = Prob(𝑋 = 𝑌 |𝜃) = 𝜃𝑌 (1 − 𝜃)1−𝑌

b) Please write the posterior distribution for 𝜃 after observing one flip of our coin.
The prior distribution is

$$\textrm{Prob}(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}$$

We can derive the posterior distribution for 𝜃 via

$$
\begin{aligned}
\textrm{Prob}(\theta \,|\, Y) &= \frac{\textrm{Prob}(Y \,|\, \theta)\,\textrm{Prob}(\theta)}{\textrm{Prob}(Y)}
= \frac{\textrm{Prob}(Y \,|\, \theta)\,\textrm{Prob}(\theta)}{\int_0^1 \textrm{Prob}(Y \,|\, \theta)\,\textrm{Prob}(\theta)\,d\theta} \\
&= \frac{\theta^Y (1-\theta)^{1-Y} \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}}{\int_0^1 \theta^Y (1-\theta)^{1-Y} \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}\,d\theta} \\
&= \frac{\theta^{Y+\alpha-1}(1-\theta)^{1-Y+\beta-1}}{\int_0^1 \theta^{Y+\alpha-1}(1-\theta)^{1-Y+\beta-1}\,d\theta}
\end{aligned}
$$

which means that

Prob(𝜃|𝑌 ) ∼ Beta(𝛼 + 𝑌 , 𝛽 + (1 − 𝑌 ))
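As a quick numerical sanity check of this conjugacy result (a minimal sketch; the prior parameters and the observed flip below are arbitrary choices), we can normalize the product of likelihood and prior numerically and compare it with the Beta(𝛼 + 𝑌, 𝛽 + 1 − 𝑌) density:

import numpy as np
from scipy.stats import beta
from scipy.integrate import quad

α, β, Y = 2, 2, 1                      # arbitrary prior parameters; one observed head
unnorm = lambda θ: θ**Y * (1 - θ)**(1 - Y) * beta(α, β).pdf(θ)   # likelihood × prior
const, _ = quad(unnorm, 0, 1)          # normalizing constant
θ_grid = np.linspace(0.05, 0.95, 5)
print(unnorm(θ_grid) / const)          # numerically normalized posterior
print(beta(α + Y, β + 1 - Y).pdf(θ_grid))   # conjugate Beta posterior (should match)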

c) Please pretend that the true value of 𝜃 = .4 and that someone who doesn’t know this has a beta prior with 𝛽 = 𝛼 = .5.
d) Please write a Python class to simulate this person’s personal posterior distribution for 𝜃 for a single sequence of 𝑛
draws.

class Bayesian:

def __init__(self, θ=0.4, n=1_000_000, α=0.5, β=0.5):


"""
Parameters:
----------
θ : float, ranging from [0,1].
probability that one toss of a coin will be a head with Y = 1

n : int.
number of independent flips in an independent sequence of draws


α&β : int or float.
parameters of the prior distribution on θ

"""
self.θ, self.n, self.α, self.β = θ, n, α, β
self.prior = st.beta(α, β)

def draw(self):
"""
simulate a single sequence of draws of length n, given probability θ

"""
array = np.random.rand(self.n)
self.draws = (array < self.θ).astype(int)

def form_single_posterior(self, step_num):


"""
form a posterior distribution after observing the first step_num elements of␣
↪the draws

Parameters
----------
step_num: int.
number of steps observed to form a posterior distribution

Returns
------
the posterior distribution for sake of plotting in the subsequent steps

"""
heads_num = self.draws[:step_num].sum()
tails_num = step_num - heads_num

return st.beta(self.α+heads_num, self.β+tails_num)

def form_posterior_series(self,num_obs_list):
"""
form a series of posterior distributions that form after observing different␣
↪number of draws.

Parameters
----------
num_obs_list: a list of int.
a list of the number of observations used to form a series of␣
↪posterior distributions.

"""
self.posterior_list = []
for num in num_obs_list:
self.posterior_list.append(self.form_single_posterior(num))

e) Please plot the posterior distribution for 𝜃 as a function of 𝜃 as 𝑛 grows from 1, 2, ….

Bay_stat = Bayesian()
Bay_stat.draw()


num_list = [1, 2, 3, 4, 5, 10, 20, 30, 50, 70, 100, 300, 500, 1000,   # this line for finite n
            5000, 10_000, 50_000, 100_000, 200_000, 300_000]          # this line for approximately infinite n

Bay_stat.form_posterior_series(num_list)

θ_values = np.linspace(0.01, 1, 100)

fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(θ_values, Bay_stat.prior.pdf(θ_values), label='Prior Distribution',
        color='k', linestyle='--')

for ii, num in enumerate(num_list[:14]):


ax.plot(θ_values, Bay_stat.posterior_list[ii].pdf(θ_values),
        label='Posterior with n = %d' % num)

ax.set_title('P.D.F of Posterior Distributions', fontsize=15)


ax.set_xlabel(r"$\theta$", fontsize=15)

ax.legend(fontsize=11)
plt.show()

f) For various 𝑛’s, please describe and compute .05 and .95 quantiles for posterior probabilities.

lower_bound = [ii.ppf(0.05) for ii in Bay_stat.posterior_list[:14]]
upper_bound = [ii.ppf(0.95) for ii in Bay_stat.posterior_list[:14]]

interval_df = pd.DataFrame()
interval_df['upper'] = upper_bound
interval_df['lower'] = lower_bound
interval_df.index = num_list[:14]
interval_df = interval_df.T
interval_df

            1         2         3        4         5        10        20  \
upper  0.998457  0.999132  0.937587  0.83472  0.739366  0.814884  0.629953
lower  0.228520  0.430741  0.235534  0.16528  0.127776  0.347322  0.280091

            30        50        70       100       300       500      1000
upper  0.61429  0.516104  0.526749  0.481969  0.436961  0.444479  0.436759
lower  0.32360  0.292234  0.334679  0.322252  0.344599  0.372306  0.385621

As 𝑛 increases, we can see that Bayesian coverage intervals narrow and move toward 0.4.
g) Please tell what question a Bayesian coverage interval answers.
The Bayesian coverage interval gives the range of 𝜃 that corresponds to the [𝑝1 , 𝑝2 ] quantiles of the cumulative distribution function (CDF) of the posterior distribution.
To construct the coverage interval we first compute a posterior distribution of the unknown parameter 𝜃.
If the CDF is 𝐹 (𝜃), then the Bayesian coverage interval [𝑎, 𝑏] for the interval [𝑝1 , 𝑝2 ] is described by

𝐹 (𝑎) = 𝑝1 , 𝐹 (𝑏) = 𝑝2

h) Please compute the posterior probability that 𝜃 ∈ [.45, .55] for various values of sample size 𝑛.

left_value, right_value = 0.45, 0.55

posterior_prob_list = [ii.cdf(right_value) - ii.cdf(left_value)
                       for ii in Bay_stat.posterior_list]

fig, ax = plt.subplots(figsize=(8, 5))


ax.plot(posterior_prob_list)
ax.set_title('Posterior Probability that ' + r"$\theta$" + ' Ranges from %.2f to %.2f'
             % (left_value, right_value), fontsize=13)
ax.set_xticks(np.arange(0, len(posterior_prob_list), 3))
ax.set_xticklabels(num_list[::3])
ax.set_xlabel('Number of Observations', fontsize=11)

plt.show()


Notice that in the graph above the posterior probability that 𝜃 ∈ [.45, .55] typically exhibits a hump shape as 𝑛 increases.
Two opposing forces are at work.
The first force is that the individual adjusts his belief as he observes new outcomes, so his posterior probability distribution becomes more and more realistic, which explains the rise of the posterior probability.
However, [.45, .55] actually excludes the true 𝜃 = .4 that generates the data.
As a result, the posterior probability drops as larger and larger samples refine his posterior probability distribution of 𝜃.
The descent only looks precipitous because of the horizontal scale of the graph, on which the number of observations increases disproportionately from one tick to the next.
When the number of observations becomes large enough, our Bayesian becomes so confident about 𝜃 that he considers
𝜃 ∈ [.45, .55] very unlikely.
That is why we see a nearly horizontal line when the number of observations exceeds 500.
i) Please use your Python class to study what happens to the posterior distribution as 𝑛 → +∞, again assuming that the
true value of 𝜃 = .4, though it is unknown to the person doing the updating via Bayes’ Law.
Using the Python class we made above, we can see the evolution of posterior distributions as 𝑛 approaches infinity.

fig, ax = plt.subplots(figsize=(10, 6))

for ii, num in enumerate(num_list[14:]):


ii += 14
ax.plot(θ_values, Bay_stat.posterior_list[ii].pdf(θ_values),
label='Posterior with n=%d thousand' % (num/1000))

ax.set_title('P.D.F of Posterior Distributions', fontsize=15)



ax.set_xlabel(r"$\theta$", fontsize=15)
ax.set_xlim(0.3, 0.5)

ax.legend(fontsize=11)
plt.show()

As 𝑛 increases, we can see that the probability density functions concentrate on 0.4, the true value of 𝜃.
Here the posterior mean converges to 0.4 while the posterior standard deviation converges to 0 from above.
To show this, we compute the means and variances statistics of the posterior distributions.

mean_list = [ii.mean() for ii in Bay_stat.posterior_list]


std_list = [ii.std() for ii in Bay_stat.posterior_list]

fig, ax = plt.subplots(1, 2, figsize=(14, 5))

ax[0].plot(mean_list)
ax[0].set_title('Mean Values of Posterior Distribution', fontsize=13)
ax[0].set_xticks(np.arange(0, len(mean_list), 3))
ax[0].set_xticklabels(num_list[::3])
ax[0].set_xlabel('Number of Observations', fontsize=11)

ax[1].plot(std_list)
ax[1].set_title('Standard Deviations of Posterior Distribution', fontsize=13)
ax[1].set_xticks(np.arange(0, len(std_list), 3))
ax[1].set_xticklabels(num_list[::3])
ax[1].set_xlabel('Number of Observations', fontsize=11)

plt.show()


How shall we interpret the patterns above?


The answer is encoded in the Bayesian updating formulas.
It is natural to extend the one-step Bayesian update to an 𝑛-step Bayesian update.

$$
\begin{aligned}
\textrm{Prob}(\theta \,|\, k) &= \frac{\textrm{Prob}(\theta, k)}{\textrm{Prob}(k)}
= \frac{\textrm{Prob}(k \,|\, \theta)\,\textrm{Prob}(\theta)}{\textrm{Prob}(k)}
= \frac{\textrm{Prob}(k \,|\, \theta)\,\textrm{Prob}(\theta)}{\int_0^1 \textrm{Prob}(k \,|\, \theta)\,\textrm{Prob}(\theta)\,d\theta} \\
&= \frac{\binom{N}{k}(1-\theta)^{N-k}\theta^k \, \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}}
       {\int_0^1 \binom{N}{k}(1-\theta)^{N-k}\theta^k \, \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}\,d\theta} \\
&= \frac{(1-\theta)^{\beta+N-k-1}\,\theta^{\alpha+k-1}}
       {\int_0^1 (1-\theta)^{\beta+N-k-1}\,\theta^{\alpha+k-1}\,d\theta} \\
&= Beta(\alpha + k, \beta + N - k)
\end{aligned}
$$
A beta distribution with 𝛼 and 𝛽 has the following mean and variance.
The mean is $\frac{\alpha}{\alpha+\beta}$.

The variance is $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$.

• 𝛼 can be viewed as the number of successes


• 𝛽 can be viewed as the number of failures
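As a quick check of these two formulas against scipy (a minimal sketch with arbitrary parameter values):

from scipy.stats import beta

α, β = 3, 7                                  # arbitrary illustrative values
print(beta(α, β).mean(), α / (α + β))        # both equal 0.3
print(beta(α, β).var(), α * β / ((α + β)**2 * (α + β + 1)))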
The random variables 𝑘 and 𝑁 − 𝑘 are governed by Binomial Distribution with 𝜃 = 0.4.
Call this the true data generating process.
According to the Law of Large Numbers, for a large number of observations, observed frequencies of 𝑘 and 𝑁 − 𝑘
will be described by the true data generating process, i.e., the population probability distribution that we assumed when
generating the observations on the computer. (See Exercise 11.2.1).
Consequently, the mean of the posterior distribution converges to 0.4 and the variance withers to zero.

upper_bound = [ii.ppf(0.95) for ii in Bay_stat.posterior_list]


lower_bound = [ii.ppf(0.05) for ii in Bay_stat.posterior_list]

fig, ax = plt.subplots(figsize=(10, 6))



ax.scatter(np.arange(len(upper_bound)), upper_bound, label='95 th Quantile')
ax.scatter(np.arange(len(lower_bound)), lower_bound, label='05 th Quantile')

ax.set_xticks(np.arange(0, len(upper_bound), 2))


ax.set_xticklabels(num_list[::2])
ax.set_xlabel('Number of Observations', fontsize=12)
ax.set_title('Bayesian Coverage Intervals of Posterior Distributions', fontsize=15)

ax.legend(fontsize=11)
plt.show()

After observing a large number of outcomes, the posterior distribution collapses around 0.4.
Thus, the Bayesian statistician comes to believe that 𝜃 is near .4.
As shown in the figure above, as the number of observations grows, the Bayesian coverage intervals (BCIs) become
narrower and narrower around 0.4.
However, if you take a closer look, you will find that the centers of the BCIs are not exactly 0.4, due to the persistent
influence of the prior distribution and the randomness of the simulation path.



CHAPTER

TWELVE

MULTIVARIATE HYPERGEOMETRIC DISTRIBUTION

Contents

• Multivariate Hypergeometric Distribution


– Overview
– The Administrator’s Problem
– Usage

12.1 Overview

This lecture describes how an administrator deployed a multivariate hypergeometric distribution in order to assess the fairness of a procedure for awarding research grants.
In the lecture we’ll learn about
• properties of the multivariate hypergeometric distribution
• first and second moments of a multivariate hypergeometric distribution
• using a Monte Carlo simulation of a multivariate normal distribution to evaluate the quality of a normal approxi-
mation
• the administrator’s problem and why the multivariate hypergeometric distribution is the right tool

12.2 The Administrator’s Problem

An administrator in charge of allocating research grants is in the following situation.


To help us forget details that are none of our business here and to protect the anonymity of the administrator and the
subjects, we call research proposals balls and continents of residence of authors of a proposal a color.
There are 𝐾𝑖 balls (proposals) of color 𝑖.
There are 𝑐 distinct colors (continents of residence).
Thus, 𝑖 = 1, 2, … , 𝑐
So there is a total of $N = \sum_{i=1}^c K_i$ balls.
All 𝑁 of these balls are placed in an urn.


Then 𝑛 balls are drawn randomly.


The selection procedure is supposed to be color blind meaning that ball quality, a random variable that is supposed to
be independent of ball color, governs whether a ball is drawn.
Thus, the selection procedure is supposed randomly to draw 𝑛 balls from the urn.
The 𝑛 balls drawn represent successful proposals and are awarded research funds.
The remaining 𝑁 − 𝑛 balls receive no research funds.

12.2.1 Details of the Awards Procedure Under Study

Let 𝑘𝑖 be the number of balls of color 𝑖 that are drawn.


Things have to add up, so $\sum_{i=1}^c k_i = n$.
Under the hypothesis that the selection process judges proposals on their quality, and that quality is independent of the author's continent of residence, the administrator views the outcome of the selection procedure as a random vector
$$X = \begin{pmatrix} k_1 \\ k_2 \\ \vdots \\ k_c \end{pmatrix}.$$

To evaluate whether the selection procedure is color blind the administrator wants to study whether the particular re-
alization of 𝑋 drawn can plausibly be said to be a random draw from the probability distribution that is implied by the
color blind hypothesis.
The appropriate probability distribution is the one described here.
Let’s now instantiate the administrator’s problem, while continuing to use the colored balls metaphor.
The administrator has an urn with 𝑁 = 238 balls.
157 balls are blue, 11 balls are green, 46 balls are yellow, and 24 balls are black.
So (𝐾1 , 𝐾2 , 𝐾3 , 𝐾4 ) = (157, 11, 46, 24) and 𝑐 = 4.
15 balls are drawn without replacement.
So 𝑛 = 15.
The administrator wants to know the probability distribution of outcomes

$$X = \begin{pmatrix} k_1 \\ k_2 \\ \vdots \\ k_4 \end{pmatrix}.$$

In particular, he wants to know whether a particular outcome - in the form of a 4 × 1 vector of integers recording the numbers of blue, green, yellow, and black balls, respectively - contains evidence against the hypothesis that the selection process is fair, which here means color blind, so that the awards truly are random draws without replacement from the population of 𝑁 balls.
The right tool for the administrator’s job is the multivariate hypergeometric distribution.


12.2.2 Multivariate Hypergeometric Distribution

Let’s start with some imports.

%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import matplotlib.cm as cm
import numpy as np
from scipy.special import comb
from scipy.stats import normaltest
from numba import njit, prange

To recapitulate, we assume there are in total 𝑐 types of objects in an urn.


If there are 𝐾𝑖 type 𝑖 object in the urn and we take 𝑛 draws at random without replacement, then the numbers of type 𝑖
objects in the sample (𝑘1 , 𝑘2 , … , 𝑘𝑐 ) has the multivariate hypergeometric distribution.
Note again that $N = \sum_{i=1}^c K_i$ is the total number of objects in the urn and $n = \sum_{i=1}^c k_i$.
Notation
We use the following notation for binomial coefficients: $\binom{m}{q} = \frac{m!}{(m-q)!\,q!}$.

The multivariate hypergeometric distribution has the following properties:


Probability mass function:

$$\Pr\{X_i = k_i \ \forall i\} = \frac{\prod_{i=1}^c \binom{K_i}{k_i}}{\binom{N}{n}}$$

Mean:

$$\mathbb{E}(X_i) = n \frac{K_i}{N}$$

Variances and covariances:

$$\operatorname{Var}(X_i) = n \frac{N-n}{N-1} \frac{K_i}{N} \left(1 - \frac{K_i}{N}\right)$$

$$\operatorname{Cov}(X_i, X_j) = -n \frac{N-n}{N-1} \frac{K_i}{N} \frac{K_j}{N}$$
To do our work for us, we’ll write an Urn class.

class Urn:

def __init__(self, K_arr):


"""
Initialization given the number of each type i object in the urn.

Parameters
----------
K_arr: ndarray(int)
number of each type i object.
"""

self.K_arr = np.array(K_arr)
self.N = np.sum(K_arr)


self.c = len(K_arr)

def pmf(self, k_arr):


"""
Probability mass function.

Parameters
----------
k_arr: ndarray(int)
number of observed successes of each object.
"""

K_arr, N = self.K_arr, self.N

k_arr = np.atleast_2d(k_arr)
n = np.sum(k_arr, 1)

num = np.prod(comb(K_arr, k_arr), 1)


denom = comb(N, n)

pr = num / denom

return pr

def moments(self, n):


"""
Compute the mean and variance-covariance matrix for
multivariate hypergeometric distribution.

Parameters
----------
n: int
number of draws.
"""

K_arr, N, c = self.K_arr, self.N, self.c

# mean
μ = n * K_arr / N

# variance-covariance matrix
Σ = np.full((c, c), n * (N - n) / (N - 1) / N ** 2)
for i in range(c-1):
Σ[i, i] *= K_arr[i] * (N - K_arr[i])
for j in range(i+1, c):
Σ[i, j] *= - K_arr[i] * K_arr[j]
Σ[j, i] = Σ[i, j]

Σ[-1, -1] *= K_arr[-1] * (N - K_arr[-1])

return μ, Σ

def simulate(self, n, size=1, seed=None):


"""
Simulate a sample from multivariate hypergeometric
distribution where at each draw we take n objects


from the urn without replacement.

Parameters
----------
n: int
number of objects for each draw.
size: int(optional)
sample size.
seed: int(optional)
random seed.
"""

K_arr = self.K_arr

gen = np.random.Generator(np.random.PCG64(seed))
sample = gen.multivariate_hypergeometric(K_arr, n, size=size)

return sample

12.3 Usage

12.3.1 First example

Apply this to an example from wiki:


Suppose there are 5 black, 10 white, and 15 red marbles in an urn. If six marbles are chosen without replacement, the
probability that exactly two of each color are chosen is

$$P(2 \ \text{black}, 2 \ \text{white}, 2 \ \text{red}) = \frac{\binom{5}{2}\binom{10}{2}\binom{15}{2}}{\binom{30}{6}} = 0.079575596816976$$
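As a quick independent check of this number (a minimal sketch using scipy.special.comb, which is already among the imports above):

from scipy.special import comb

p = comb(5, 2) * comb(10, 2) * comb(15, 2) / comb(30, 6)
print(p)   # approximately 0.0795756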

# construct the urn


K_arr = [5, 10, 15]
urn = Urn(K_arr)

Now use the Urn Class method pmf to compute the probability of the outcome 𝑋 = (2 2 2)

k_arr = [2, 2, 2] # array of number of observed successes


urn.pmf(k_arr)

array([0.0795756])

We can use the code to compute probabilities of a list of possible outcomes by constructing a 2-dimensional array k_arr
and pmf will return an array of probabilities for observing each case.

k_arr = [[2, 2, 2], [1, 3, 2]]


urn.pmf(k_arr)

array([0.0795756, 0.1061008])

Now let’s compute the mean vector and variance-covariance matrix.


n = 6
μ, Σ = urn.moments(n)

array([1., 2., 3.])

array([[ 0.68965517, -0.27586207, -0.4137931 ],


[-0.27586207, 1.10344828, -0.82758621],
[-0.4137931 , -0.82758621, 1.24137931]])

12.3.2 Back to The Administrator’s Problem

Now let’s turn to the grant administrator’s problem.


Here the array of numbers of 𝑖 objects in the urn is (157, 11, 46, 24).

K_arr = [157, 11, 46, 24]


urn = Urn(K_arr)

Let’s compute the probability of the outcome (10, 1, 4, 0).

k_arr = [10, 1, 4, 0]
urn.pmf(k_arr)

array([0.01547738])

We can compute the probabilities of three possible outcomes by constructing a 2-dimensional array k_arr with one row per outcome and utilizing the method pmf of the Urn class.

k_arr = [[5, 5, 4 ,1], [10, 1, 2, 2], [13, 0, 2, 0]]


urn.pmf(k_arr)

array([6.21412534e-06, 2.70935969e-02, 1.61839976e-02])

Now let’s compute the mean and variance-covariance matrix of 𝑋 when 𝑛 = 6.

n = 6 # number of draws
μ, Σ = urn.moments(n)

# mean
μ

array([3.95798319, 0.27731092, 1.15966387, 0.60504202])


# variance-covariance matrix
Σ

array([[ 1.31862604, -0.17907267, -0.74884935, -0.39070401],


[-0.17907267, 0.25891399, -0.05246715, -0.02737417],
[-0.74884935, -0.05246715, 0.91579029, -0.11447379],
[-0.39070401, -0.02737417, -0.11447379, 0.53255196]])

We can simulate a large sample and verify that sample means and covariances closely approximate the population means
and covariances.

size = 10_000_000
sample = urn.simulate(n, size=size)

# mean
np.mean(sample, 0)

array([3.9575241, 0.2772789, 1.1598499, 0.6053471])

# variance covariance matrix


np.cov(sample.T)

array([[ 1.31769043, -0.17888995, -0.74840691, -0.39039358],


[-0.17888995, 0.25887754, -0.05253011, -0.02745748],
[-0.74840691, -0.05253011, 0.9158426 , -0.11490558],
[-0.39039358, -0.02745748, -0.11490558, 0.53275664]])

Evidently, the sample means and covariances approximate their population counterparts well.

12.3.3 Quality of Normal Approximation

To judge the quality of a multivariate normal approximation to the multivariate hypergeometric distribution, we draw
a large sample from a multivariate normal distribution with the mean vector and covariance matrix for the correspond-
ing multivariate hypergeometric distribution and compare the simulated distribution with the population multivariate
hypergeometric distribution.

sample_normal = np.random.multivariate_normal(μ, Σ, size=size)

def bivariate_normal(x, y, μ, Σ, i, j):

μ_x, μ_y = μ[i], μ[j]


σ_x, σ_y = np.sqrt(Σ[i, i]), np.sqrt(Σ[j, j])
σ_xy = Σ[i, j]

x_μ = x - μ_x
y_μ = y - μ_y

ρ = σ_xy / (σ_x * σ_y)


z = x_μ**2 / σ_x**2 + y_μ**2 / σ_y**2 - 2 * ρ * x_μ * y_μ / (σ_x * σ_y)
denom = 2 * np.pi * σ_x * σ_y * np.sqrt(1 - ρ**2)

return np.exp(-z / (2 * (1 - ρ**2))) / denom

@njit
def count(vec1, vec2, n):
size = sample.shape[0]

count_mat = np.zeros((n+1, n+1))


for i in prange(size):
count_mat[vec1[i], vec2[i]] += 1

return count_mat

c = urn.c
fig, axs = plt.subplots(c, c, figsize=(14, 14))

# grids for plotting the bivariate Gaussian


x_grid = np.linspace(-2, n+1, 100)
y_grid = np.linspace(-2, n+1, 100)
X, Y = np.meshgrid(x_grid, y_grid)

for i in range(c):
axs[i, i].hist(sample[:, i], bins=np.arange(0, n, 1), alpha=0.5,
               density=True, label='hypergeom')
axs[i, i].hist(sample_normal[:, i], bins=np.arange(0, n, 1), alpha=0.5,
               density=True, label='normal')

axs[i, i].legend()
axs[i, i].set_title('$k_{' +str(i+1) +'}$')
for j in range(c):
if i == j:
continue

# bivariate Gaussian density function


Z = bivariate_normal(X, Y, μ, Σ, i, j)
cs = axs[i, j].contour(X, Y, Z, 4, colors="black", alpha=0.6)
axs[i, j].clabel(cs, inline=1, fontsize=10)

# empirical multivariate hypergeometric distribution


count_mat = count(sample[:, i], sample[:, j], n)
axs[i, j].pcolor(count_mat.T/size, cmap='Blues')
axs[i, j].set_title('$(k_{' +str(i+1) +'}, k_{' + str(j+1) + '})$')

plt.show()


The diagonal graphs plot the marginal distributions of 𝑘𝑖 for each 𝑖 using histograms.
Note the substantial differences between hypergeometric distribution and the approximating normal distribution.
The off-diagonal graphs plot the empirical joint distribution of 𝑘𝑖 and 𝑘𝑗 for each pair (𝑖, 𝑗).
The darker the blue, the more data points are contained in the corresponding cell. (Note that 𝑘𝑖 is on the x-axis and 𝑘𝑗 is
on the y-axis).
The contour maps plot the bivariate Gaussian density function of (𝑘𝑖 , 𝑘𝑗 ) with the population mean and covariance given
by slices of 𝜇 and Σ that we computed above.
Let’s also test the normality for each 𝑘𝑖 using scipy.stats.normaltest that implements D’Agostino and Pearson’s
test that combines skew and kurtosis to form an omnibus test of normality.
The null hypothesis is that the sample follows normal distribution.
normaltest returns an array of p-values associated with tests for each 𝑘𝑖 sample.


test_multihyper = normaltest(sample)
test_multihyper.pvalue

array([0., 0., 0., 0.])

As we can see, all the p-values are almost 0 and the null hypothesis is soundly rejected.
By contrast, the sample from normal distribution does not reject the null hypothesis.

test_normal = normaltest(sample_normal)
test_normal.pvalue

array([0.20342918, 0.37411736, 0.32851877, 0.44077682])

The lesson to take away from this is that the normal approximation is imperfect.



CHAPTER

THIRTEEN

MULTIVARIATE NORMAL DISTRIBUTION

Contents

• Multivariate Normal Distribution


– Overview
– The Multivariate Normal Distribution
– Bivariate Example
– Trivariate Example
– One Dimensional Intelligence (IQ)
– Information as Surprise
– Cholesky Factor Magic
– Math and Verbal Intelligence
– Univariate Time Series Analysis
– Stochastic Difference Equation
– Application to Stock Price Model
– Filtering Foundations
– Classic Factor Analysis Model
– PCA and Factor Analysis

13.1 Overview

This lecture describes a workhorse in probability theory, statistics, and economics, namely, the multivariate normal
distribution.
In this lecture, you will learn formulas for
• the joint distribution of a random vector 𝑥 of length 𝑁
• marginal distributions for all subvectors of 𝑥
• conditional distributions for subvectors of 𝑥 conditional on other subvectors of 𝑥
We will use the multivariate normal distribution to formulate some useful models:


• a factor analytic model of an intelligence quotient, i.e., IQ


• a factor analytic model of two independent inherent abilities, say, mathematical and verbal.
• a more general factor analytic model
• Principal Components Analysis (PCA) as an approximation to a factor analytic model
• time series generated by linear stochastic difference equations
• optimal linear filtering theory

13.2 The Multivariate Normal Distribution

This lecture defines a Python class MultivariateNormal to be used to generate marginal and conditional distri-
butions associated with a multivariate normal distribution.
For a multivariate normal distribution it is very convenient that
• conditional expectations equal linear least squares projections
• conditional distributions are characterized by multivariate linear regressions
We apply our Python class to some examples.
We use the following imports:

%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
from numba import njit
import statsmodels.api as sm

Assume that an 𝑁 × 1 random vector 𝑧 has a multivariate normal probability density.


This means that the probability density takes the form
$$f(z; \mu, \Sigma) = (2\pi)^{-\left(\frac{N}{2}\right)} \det(\Sigma)^{-\frac{1}{2}} \exp\left(-.5\,(z-\mu)' \Sigma^{-1} (z-\mu)\right)$$

where $\mu = Ez$ is the mean of the random vector 𝑧 and $\Sigma = E(z-\mu)(z-\mu)'$ is the covariance matrix of 𝑧.
The covariance matrix Σ is symmetric and positive definite.

@njit
def f(z, μ, Σ):
"""
The density function of multivariate normal distribution.

Parameters
---------------
z: ndarray(float, dim=2)
random vector, N by 1
μ: ndarray(float, dim=1 or 2)
the mean of z, N by 1
Σ: ndarray(float, dim=2)
the covariance matrix of z, N by N
"""



z = np.atleast_2d(z)
μ = np.atleast_2d(μ)
Σ = np.atleast_2d(Σ)

N = z.size

temp1 = np.linalg.det(Σ) ** (-1/2)


temp2 = np.exp(-.5 * (z - μ).T @ np.linalg.inv(Σ) @ (z - μ))

return (2 * np.pi) ** (-N/2) * temp1 * temp2

For some integer 𝑘 ∈ {1, … , 𝑁 − 1}, partition 𝑧 as

$$z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix},$$

where 𝑧1 is an (𝑁 − 𝑘) × 1 vector and 𝑧2 is a 𝑘 × 1 vector.


Let
$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

be corresponding partitions of 𝜇 and Σ.


The marginal distribution of 𝑧1 is
• multivariate normal with mean 𝜇1 and covariance matrix Σ11 .
The marginal distribution of 𝑧2 is
• multivariate normal with mean 𝜇2 and covariance matrix Σ22 .
The distribution of 𝑧1 conditional on 𝑧2 is
• multivariate normal with mean
$$\hat\mu_1 = \mu_1 + \beta (z_2 - \mu_2)$$

and covariance matrix

$$\hat\Sigma_{11} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} = \Sigma_{11} - \beta \Sigma_{22} \beta'$$

where

$$\beta = \Sigma_{12} \Sigma_{22}^{-1}$$

is an (𝑁 − 𝑘) × 𝑘 matrix of population regression coefficients of the (𝑁 − 𝑘) × 1 random vector 𝑧1 − 𝜇1 on the 𝑘 × 1


random vector 𝑧2 − 𝜇2 .
The following class constructs a multivariate normal distribution instance with two methods.
• a method partition computes 𝛽, taking 𝑘 as an input
• a method cond_dist computes either the distribution of 𝑧1 conditional on 𝑧2 or the distribution of 𝑧2 conditional
on 𝑧1


class MultivariateNormal:
"""
Class of multivariate normal distribution.

Parameters
----------
μ: ndarray(float, dim=1)
the mean of z, N by 1
Σ: ndarray(float, dim=2)
the covariance matrix of z, N by N

Arguments
---------
μ, Σ:
see parameters
μs: list(ndarray(float, dim=1))
list of mean vectors μ1 and μ2 in order
Σs: list(list(ndarray(float, dim=2)))
2 dimensional list of covariance matrices
Σ11, Σ12, Σ21, Σ22 in order
βs: list(ndarray(float, dim=1))
list of regression coefficients β1 and β2 in order
"""

def __init__(self, μ, Σ):


"initialization"
self.μ = np.array(μ)
self.Σ = np.atleast_2d(Σ)

def partition(self, k):


"""
Given k, partition the random vector z into a size k vector z1
and a size N-k vector z2. Partition the mean vector μ into
μ1 and μ2, and the covariance matrix Σ into Σ11, Σ12, Σ21, Σ22
correspondingly. Compute the regression coefficients β1 and β2
using the partitioned arrays.
"""
μ = self.μ
Σ = self.Σ

self.μs = [μ[:k], μ[k:]]


self.Σs = [[Σ[:k, :k], Σ[:k, k:]],
[Σ[k:, :k], Σ[k:, k:]]]

self.βs = [self.Σs[0][1] @ np.linalg.inv(self.Σs[1][1]),


self.Σs[1][0] @ np.linalg.inv(self.Σs[0][0])]

def cond_dist(self, ind, z):


"""
Compute the conditional distribution of z1 given z2, or reversely.
Argument ind determines whether we compute the conditional
distribution of z1 (ind=0) or z2 (ind=1).

Returns
---------
μ_hat: ndarray(float, ndim=1)
The conditional mean of z1 or z2.
Σ_hat: ndarray(float, ndim=2)
The conditional covariance matrix of z1 or z2.
"""
β = self.βs[ind]
μs = self.μs
Σs = self.Σs

μ_hat = μs[ind] + β @ (z - μs[1-ind])


Σ_hat = Σs[ind][ind] - β @ Σs[1-ind][1-ind] @ β.T

return μ_hat, Σ_hat

Let’s put this code to work on a suite of examples.


We begin with a simple bivariate example; after that we’ll turn to a trivariate example.
We’ll compute population moments of some conditional distributions using our MultivariateNormal class.
For fun we’ll also compute sample analogs of the associated population regressions by generating simulations and then
computing linear least squares regressions.
We’ll compare those linear least squares regressions for the simulated data to their population counterparts.

13.3 Bivariate Example

We start with a bivariate normal distribution pinned down by

$$
\mu = \begin{bmatrix} .5 \\ 1.0 \end{bmatrix}, \qquad
\Sigma = \begin{bmatrix} 1 & .5 \\ .5 & 1 \end{bmatrix}
$$

μ = np.array([.5, 1.])
Σ = np.array([[1., .5], [.5 ,1.]])

# construction of the multivariate normal instance


multi_normal = MultivariateNormal(μ, Σ)

k = 1 # choose partition

# partition and compute regression coefficients


multi_normal.partition(k)
multi_normal.βs[0],multi_normal.βs[1]

(array([[0.5]]), array([[0.5]]))

Let’s illustrate the fact that you can regress anything on anything else.
We have computed everything we need to compute two regression lines, one of 𝑧2 on 𝑧1 , the other of 𝑧1 on 𝑧2 .
We’ll represent these regressions as

𝑧1 = 𝑎 1 + 𝑏 1 𝑧2 + 𝜖 1

and

𝑧2 = 𝑎 2 + 𝑏 2 𝑧1 + 𝜖 2


where we have the population least squares orthogonality conditions

𝐸𝜖1 𝑧2 = 0

and

𝐸𝜖2 𝑧1 = 0

Let’s compute 𝑎1 , 𝑎2 , 𝑏1 , 𝑏2 .

beta = multi_normal.βs

a1 = μ[0] - beta[0]*μ[1]
b1 = beta[0]

a2 = μ[1] - beta[1]*μ[0]
b2 = beta[1]

Let’s print out the intercepts and slopes.


For the regression of 𝑧1 on 𝑧2 we have

print ("a1 = ", a1)


print ("b1 = ", b1)

a1 = [[0.]]
b1 = [[0.5]]

For the regression of 𝑧2 on 𝑧1 we have

print ("a2 = ", a2)


print ("b2 = ", b2)

a2 = [[0.75]]
b2 = [[0.5]]

Now let’s plot the two regression lines and stare at them.

z2 = np.linspace(-4,4,100)

a1 = np.squeeze(a1)
b1 = np.squeeze(b1)

a2 = np.squeeze(a2)
b2 = np.squeeze(b2)

z1 = b1*z2 + a1

z1h = z2/b2 - a2/b2

fig = plt.figure(figsize=(12,12))
ax = fig.add_subplot(1, 1, 1)
ax.set(xlim=(-4, 4), ylim=(-4, 4))
ax.spines['left'].set_position('center')
ax.spines['bottom'].set_position('zero')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
plt.ylabel('$z_1$', loc = 'top')
plt.xlabel('$z_2$,', loc = 'right')
plt.title('two regressions')
plt.plot(z2,z1, 'r', label = "$z_1$ on $z_2$")
plt.plot(z2,z1h, 'b', label = "$z_2$ on $z_1$")
plt.legend()
plt.show()

The red line is the expectation of 𝑧1 conditional on 𝑧2 .


The intercept and slope of the red line are

print("a1 = ", a1)


print("b1 = ", b1)

a1 = 0.0
b1 = 0.5

The blue line is the expectation of 𝑧2 conditional on 𝑧1 .


The intercept and slope of the blue line are

print("-a2/b2 = ", - a2/b2)


print("1/b2 = ", 1/b2)

-a2/b2 = -1.5
1/b2 = 2.0

We can use these regression lines or our code to compute conditional expectations.
Let’s compute the mean and variance of the distribution of 𝑧2 conditional on 𝑧1 = 5.
After that we’ll reverse what are on the left and right sides of the regression.

# compute the cond. dist. of z2 given z1


ind = 1
z1 = np.array([5.]) # given z1

μ2_hat, Σ2_hat = multi_normal.cond_dist(ind, z1)


print('μ2_hat, Σ2_hat = ', μ2_hat, Σ2_hat)

μ2_hat, Σ2_hat = [3.25] [[0.75]]

Now let’s compute the mean and variance of the distribution of 𝑧1 conditional on 𝑧2 = 5.

# compute the cond. dist. of z1


ind = 0
z2 = np.array([5.]) # given z2

μ1_hat, Σ1_hat = multi_normal.cond_dist(ind, z2)


print('μ1_hat, Σ1_hat = ', μ1_hat, Σ1_hat)

μ1_hat, Σ1_hat = [2.5] [[0.75]]

Let’s compare the preceding population mean and variance with outcomes from drawing a large sample and then regressing
𝑧1 − 𝜇1 on 𝑧2 − 𝜇2 .
We know that
𝐸𝑧1 |𝑧2 = (𝜇1 − 𝛽𝜇2 ) + 𝛽𝑧2
which can be arranged to
𝑧1 − 𝜇1 = 𝛽 (𝑧2 − 𝜇2 ) + 𝜖,
We anticipate that for larger and larger sample sizes, estimated OLS coefficients will converge to 𝛽 and the estimated
variance of 𝜖 will converge to Σ̂ 1 .


n = 1_000_000 # sample size

# simulate multivariate normal random vectors


data = np.random.multivariate_normal(μ, Σ, size=n)
z1_data = data[:, 0]
z2_data = data[:, 1]

# OLS regression
μ1, μ2 = multi_normal.μs
results = sm.OLS(z1_data - μ1, z2_data - μ2).fit()

Let’s compare the preceding population 𝛽 with the OLS sample estimate on 𝑧2 − 𝜇2

multi_normal.βs[0], results.params

(array([[0.5]]), array([0.49951561]))

Let’s compare our population Σ̂ 1 with the degrees-of-freedom adjusted estimate of the variance of 𝜖

Σ1_hat, results.resid @ results.resid.T / (n - 1)

(array([[0.75]]), 0.7499568468007555)

Lastly, let’s compute the estimate of 𝐸 [𝑧1 ∣ 𝑧2 ] and compare it with 𝜇1̂ .

μ1_hat, results.predict(z2 - μ2) + μ1

(array([2.5]), array([2.49806245]))

Thus, in each case, for our very large sample size, the sample analogues closely approximate their population counterparts.
A Law of Large Numbers explains why sample analogues approximate population objects.

13.4 Trivariate Example

Let’s apply our code to a trivariate example.


We’ll specify the mean vector and the covariance matrix as follows.

μ = np.random.random(3)
C = np.random.random((3, 3))
Σ = C @ C.T # positive semi-definite

multi_normal = MultivariateNormal(μ, Σ)

μ, Σ

(array([0.81665356, 0.64520806, 0.89203644]),


array([[2.3328485 , 0.69063069, 1.74677053],
[0.69063069, 0.2965043 , 0.59653754],
[1.74677053, 0.59653754, 1.37811334]]))


k = 1
multi_normal.partition(k)

Let’s compute the distribution of 𝑧1 conditional on $z_2 = \begin{bmatrix} 2 \\ 5 \end{bmatrix}$.

ind = 0
z2 = np.array([2., 5.])

μ1_hat, Σ1_hat = multi_normal.cond_dist(ind, z2)

n = 1_000_000
data = np.random.multivariate_normal(μ, Σ, size=n)
z1_data = data[:, :k]
z2_data = data[:, k:]

μ1, μ2 = multi_normal.μs
results = sm.OLS(z1_data - μ1, z2_data - μ2).fit()

As above, we compare population and sample regression coefficients, the conditional covariance matrix, and the conditional mean vector in that order.

multi_normal.βs[0], results.params

(array([[-1.71053213, 2.00793873]]), array([-1.70978271, 2.0076083 ]))

Σ1_hat, results.resid @ results.resid.T / (n - 1)

(array([[0.00678627]]), 0.006790326919375728)

μ1_hat, results.predict(z2 - μ2) + μ1

(array([6.74777753]), array([6.74743547]))

Once again, sample analogues do a good job of approximating their populations counterparts.

13.5 One Dimensional Intelligence (IQ)

Let’s move closer to a real-life example, namely, inferring a one-dimensional measure of intelligence called IQ from a list
of test scores.
The 𝑖th test score 𝑦𝑖 equals the sum of an unknown scalar IQ 𝜃 and a random variable 𝑤𝑖 .

𝑦𝑖 = 𝜃 + 𝜎𝑦 𝑤𝑖 , 𝑖 = 1, … , 𝑛

The distribution of IQ’s for a cross-section of people is a normal random variable described by

𝜃 = 𝜇𝜃 + 𝜎𝜃 𝑤𝑛+1 .

We assume that the noises $\{w_i\}_{i=1}^{N}$ in the test scores are IID and not correlated with IQ.


We also assume that $\{w_i\}_{i=1}^{n+1}$ are i.i.d. standard normal:

$$
w = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \\ w_{n+1} \end{bmatrix} \sim N(0, I_{n+1})
$$

The following system describes the (𝑛 + 1) × 1 random vector 𝑋 that interests us:

$$
X = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \\ \theta \end{bmatrix}
  = \begin{bmatrix} \mu_\theta \\ \mu_\theta \\ \vdots \\ \mu_\theta \\ \mu_\theta \end{bmatrix}
  + \begin{bmatrix}
      \sigma_y & 0 & \cdots & 0 & \sigma_\theta \\
      0 & \sigma_y & \cdots & 0 & \sigma_\theta \\
      \vdots & \vdots & \ddots & \vdots & \vdots \\
      0 & 0 & \cdots & \sigma_y & \sigma_\theta \\
      0 & 0 & \cdots & 0 & \sigma_\theta
    \end{bmatrix}
    \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \\ w_{n+1} \end{bmatrix},
$$

or equivalently,

$$
X = \mu_\theta \mathbf{1}_{n+1} + D w
$$

where $X = \begin{bmatrix} y \\ \theta \end{bmatrix}$, $\mathbf{1}_{n+1}$ is a vector of 1s of size 𝑛 + 1, and 𝐷 is an (𝑛 + 1) × (𝑛 + 1) matrix.
Let’s define a Python function that constructs the mean 𝜇 and covariance matrix Σ of the random vector 𝑋 that we know
is governed by a multivariate normal distribution.
As arguments, the function takes the number of tests 𝑛, the mean 𝜇𝜃 and the standard deviation 𝜎𝜃 of the IQ distribution,
and the standard deviation of the randomness in test scores 𝜎𝑦 .

def construct_moments_IQ(n, μθ, σθ, σy):

μ_IQ = np.full(n+1, μθ)

D_IQ = np.zeros((n+1, n+1))


D_IQ[range(n), range(n)] = σy
D_IQ[:, n] = σθ

Σ_IQ = D_IQ @ D_IQ.T

return μ_IQ, Σ_IQ, D_IQ

Now let’s consider a specific instance of this model.


Assume we have recorded 50 test scores and we know that 𝜇𝜃 = 100, 𝜎𝜃 = 10, and 𝜎𝑦 = 10.
We can compute the mean vector and covariance matrix of 𝑋 easily with our construct_moments_IQ function as
follows.

n = 50
μθ, σθ, σy = 100., 10., 10.

μ_IQ, Σ_IQ, D_IQ = construct_moments_IQ(n, μθ, σθ, σy)


μ_IQ, Σ_IQ, D_IQ

(array([100., 100., 100., 100., 100., 100., 100., 100., 100., 100., 100.,
100., 100., 100., 100., 100., 100., 100., 100., 100., 100., 100.,
100., 100., 100., 100., 100., 100., 100., 100., 100., 100., 100.,
100., 100., 100., 100., 100., 100., 100., 100., 100., 100., 100.,
100., 100., 100., 100., 100., 100., 100.]),
array([[200., 100., 100., ..., 100., 100., 100.],
[100., 200., 100., ..., 100., 100., 100.],
[100., 100., 200., ..., 100., 100., 100.],
...,
[100., 100., 100., ..., 200., 100., 100.],
[100., 100., 100., ..., 100., 200., 100.],
[100., 100., 100., ..., 100., 100., 100.]]),
array([[10., 0., 0., ..., 0., 0., 10.],
[ 0., 10., 0., ..., 0., 0., 10.],
[ 0., 0., 10., ..., 0., 0., 10.],
...,
[ 0., 0., 0., ..., 10., 0., 10.],
[ 0., 0., 0., ..., 0., 10., 10.],
[ 0., 0., 0., ..., 0., 0., 10.]]))

We can now use our MultivariateNormal class to construct an instance, then partition the mean vector and covariance matrix as we wish.
We want to regress IQ, the random variable 𝜃 (what we don’t know), on the vector 𝑦 of test scores (what we do know).
We choose k=n so that 𝑧1 = 𝑦 and 𝑧2 = 𝜃.

multi_normal_IQ = MultivariateNormal(μ_IQ, Σ_IQ)

k = n
multi_normal_IQ.partition(k)

Using the generator multivariate_normal, we can make one draw of the random vector from our distribution and
then compute the distribution of 𝜃 conditional on our test scores.
Let’s do that and then print out some pertinent quantities.

x = np.random.multivariate_normal(μ_IQ, Σ_IQ)
y = x[:-1] # test scores
θ = x[-1] # IQ

# the true value


θ

104.54899674277466

The method cond_dist takes test scores 𝑦 as input and returns the conditional normal distribution of the IQ 𝜃.
In the following code, ind sets the variables on the right side of the regression.
Given the way we have defined the vector 𝑋, we want to set ind=1 in order to make 𝜃 the left side variable in the
population regression.

ind = 1
multi_normal_IQ.cond_dist(ind, y)

(array([104.49298531]), array([[1.96078431]]))


The first number is the conditional mean 𝜇𝜃̂ and the second is the conditional variance Σ̂ 𝜃 .
How do additional test scores affect our inferences?
To shed light on this, we compute a sequence of conditional distributions of 𝜃 by varying the number of test scores in the
conditioning set from 1 to 𝑛.
We’ll make a pretty graph showing how our judgment of the person’s IQ changes as more test results come in.

# array for containing moments


μθ_hat_arr = np.empty(n)
Σθ_hat_arr = np.empty(n)

# loop over number of test scores


for i in range(1, n+1):
# construction of multivariate normal distribution instance
μ_IQ_i, Σ_IQ_i, D_IQ_i = construct_moments_IQ(i, μθ, σθ, σy)
multi_normal_IQ_i = MultivariateNormal(μ_IQ_i, Σ_IQ_i)

# partition and compute conditional distribution


multi_normal_IQ_i.partition(i)
scores_i = y[:i]
μθ_hat_i, Σθ_hat_i = multi_normal_IQ_i.cond_dist(1, scores_i)

# store the results


μθ_hat_arr[i-1] = μθ_hat_i[0]
Σθ_hat_arr[i-1] = Σθ_hat_i[0, 0]

# transform variance to standard deviation


σθ_hat_arr = np.sqrt(Σθ_hat_arr)

μθ_hat_lower = μθ_hat_arr - 1.96 * σθ_hat_arr


μθ_hat_higher = μθ_hat_arr + 1.96 * σθ_hat_arr

plt.hlines(θ, 1, n+1, ls='--', label='true $θ$')


plt.plot(range(1, n+1), μθ_hat_arr, color='b', label='$\hat{μ}_{θ}$')
plt.plot(range(1, n+1), μθ_hat_lower, color='b', ls='--')
plt.plot(range(1, n+1), μθ_hat_higher, color='b', ls='--')
plt.fill_between(range(1, n+1), μθ_hat_lower, μθ_hat_higher,
color='b', alpha=0.2, label='95%')

plt.xlabel('number of test scores')


plt.ylabel('$\hat{θ}$')
plt.legend()

plt.show()


The solid blue line in the plot above shows 𝜇𝜃̂ as a function of the number of test scores that we have recorded and conditioned on.
The blue area shows the span that comes from adding or subtracting 1.96𝜎̂𝜃 from 𝜇𝜃̂ .
Therefore, 95% of the probability mass of the conditional distribution falls in this range.
The value of the random 𝜃 that we drew is shown by the black dotted line.
As more and more test scores come in, our estimate of the person’s 𝜃 becomes more and more reliable.
By staring at the changes in the conditional distributions, we see that adding more test scores makes 𝜃 ̂ settle down and approach 𝜃.
Thus, each 𝑦𝑖 adds information about 𝜃.
If we were to drive the number of tests 𝑛 → +∞, the conditional standard deviation 𝜎̂𝜃 would converge to 0 at rate $\frac{1}{n^{.5}}$.
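As a quick check of this rate (a sketch that is not part of the lecture), we can reuse the array σθ_hat_arr computed in the loop above: multiplying it by √𝑛 should give a sequence that levels off (here it approaches 𝜎𝑦 = 10) if 𝜎̂𝜃 indeed shrinks like $\frac{1}{n^{.5}}$.

# σ̂_θ scaled by √n should be roughly constant once n is moderately large
ns = np.arange(1, n + 1)
print((np.sqrt(ns) * σθ_hat_arr)[::10])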

13.6 Information as Surprise

By using a different representation, let’s look at things from a different perspective.


We can represent the random vector 𝑋 defined above as

𝑋 = 𝜇𝜃 1𝑛+1 + 𝐶𝜖, 𝜖 ∼ 𝑁 (0, 𝐼)

where 𝐶 is a lower triangular Cholesky factor of Σ so that

Σ ≡ 𝐷𝐷′ = 𝐶𝐶 ′

and

𝐸𝜖𝜖′ = 𝐼.

It follows that

𝜖 ∼ 𝑁 (0, 𝐼).

Let 𝐺 = 𝐶 −1


𝐺 is also lower triangular.


We can compute 𝜖 from the formula

𝜖 = 𝐺 (𝑋 − 𝜇𝜃 1𝑛+1 )

This formula confirms that the orthonormal vector 𝜖 contains the same information as the non-orthogonal vector
(𝑋 − 𝜇𝜃 1𝑛+1 ).
We can say that 𝜖 is an orthogonal basis for (𝑋 − 𝜇𝜃 1𝑛+1 ).
Let 𝑐𝑖 be the 𝑖th element in the last row of 𝐶.
Then we can write

𝜃 = 𝜇𝜃 + 𝑐1 𝜖1 + 𝑐2 𝜖2 + ⋯ + 𝑐𝑛 𝜖𝑛 + 𝑐𝑛+1 𝜖𝑛+1 (13.1)

The mutual orthogonality of the 𝜖𝑖 ’s provides us with an informative way to interpret them in light of equation (13.1).
Thus, relative to what is known from tests 1, … , 𝑖 − 1, 𝑐𝑖 𝜖𝑖 is the amount of new information about 𝜃 brought by test number 𝑖.
Here new information means surprise or what could not be predicted from earlier information.
Formula (13.1) also provides us with an enlightening way to express conditional means and conditional variances that we
computed earlier.
In particular,

𝐸 [𝜃 ∣ 𝑦1 , … , 𝑦𝑘 ] = 𝜇𝜃 + 𝑐1 𝜖1 + ⋯ + 𝑐𝑘 𝜖𝑘

and

$$
Var\left(\theta \mid y_1, \dots, y_k\right) = c_{k+1}^2 + c_{k+2}^2 + \cdots + c_{n+1}^2 .
$$

C = np.linalg.cholesky(Σ_IQ)
G = np.linalg.inv(C)

ε = G @ (x - μθ)

cε = C[n, :] * ε

# compute the sequence of μθ and Σθ conditional on y1, y2, ..., yk


μθ_hat_arr_C = np.array([np.sum(cε[:k+1]) for k in range(n)]) + μθ
Σθ_hat_arr_C = np.array([np.sum(C[n, i+1:n+1] ** 2) for i in range(n)])

To confirm that these formulas give the same answers that we computed earlier, we can compare the means and variances of 𝜃 conditional on $\{y_i\}_{i=1}^k$ with what we obtained above using the formulas implemented in the class MultivariateNormal built on our original representation of conditional distributions for multivariate normal distributions.

# conditional mean
np.max(np.abs(μθ_hat_arr - μθ_hat_arr_C)) < 1e-10

True

# conditional variance
np.max(np.abs(Σθ_hat_arr - Σθ_hat_arr_C)) < 1e-10


True

13.7 Cholesky Factor Magic

Evidently, the Cholesky factorization automatically computes the population regression coefficients and associated statistics that are produced by our MultivariateNormal class.
The Cholesky factorization computes these things recursively.
Indeed, in formula (13.1),
• the random variable 𝑐𝑖 𝜖𝑖 is information about 𝜃 that is not contained by the information in 𝜖1 , 𝜖2 , … , 𝜖𝑖−1
• the coefficient 𝑐𝑖 is the simple population regression coefficient of 𝜃 − 𝜇𝜃 on 𝜖𝑖 (see the simulation check below)
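Here is a simulation-based check of the second claim (a sketch that is not part of the lecture), reusing n, μθ and the Cholesky factor C computed above: with 𝜖 ∼ 𝑁(0, 𝐼), the population regression coefficient of 𝜃 − 𝜇𝜃 on 𝜖𝑖 equals Cov(𝜃, 𝜖𝑖 ), which should match the 𝑖th entry of the last row of 𝐶.

# draw ε ~ N(0, I), build X = μ_θ 1 + C ε row by row, and estimate the
# regression coefficient of θ - μ_θ on each ε_i; since each ε_i has unit
# variance, this coefficient is just the sample covariance of θ and ε_i
n_draws = 100_000
ε_draws = np.random.randn(n_draws, n + 1)
x_draws = μθ + ε_draws @ C.T          # each row is one draw of X
θ_draws = x_draws[:, -1]              # the last component of X is θ

sample_coeffs = (θ_draws - μθ) @ ε_draws / n_draws

# differences should be small relative to the coefficients and shrink as n_draws grows
np.max(np.abs(sample_coeffs - C[n, :]))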

13.8 Math and Verbal Intelligence

We can alter the preceding example to be more realistic.


There is ample evidence that IQ is not a scalar.
Some people are good in math skills but poor in language skills.
Other people are good in language skills but poor in math skills.
So now we shall assume that there are two dimensions of IQ, 𝜃 and 𝜂.
These determine average performances in math and language tests, respectively.
We observe math scores $\{y_i\}_{i=1}^{n}$ and language scores $\{y_i\}_{i=n+1}^{2n}$.

When 𝑛 = 2, we assume that outcomes are draws from a multivariate normal distribution with representation

$$
X = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ \theta \\ \eta \end{bmatrix}
  = \begin{bmatrix} \mu_\theta \\ \mu_\theta \\ \mu_\eta \\ \mu_\eta \\ \mu_\theta \\ \mu_\eta \end{bmatrix}
  + \begin{bmatrix}
      \sigma_y & 0 & 0 & 0 & \sigma_\theta & 0 \\
      0 & \sigma_y & 0 & 0 & \sigma_\theta & 0 \\
      0 & 0 & \sigma_y & 0 & 0 & \sigma_\eta \\
      0 & 0 & 0 & \sigma_y & 0 & \sigma_\eta \\
      0 & 0 & 0 & 0 & \sigma_\theta & 0 \\
      0 & 0 & 0 & 0 & 0 & \sigma_\eta
    \end{bmatrix}
    \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \\ w_5 \\ w_6 \end{bmatrix}
$$

where $w = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_6 \end{bmatrix}$ is a standard normal random vector.
We construct a Python function construct_moments_IQ2d to construct the mean vector and covariance matrix of
the joint normal distribution.

def construct_moments_IQ2d(n, μθ, σθ, μη, ση, σy):

μ_IQ2d = np.empty(2*(n+1))
μ_IQ2d[:n] = μθ
μ_IQ2d[2*n] = μθ
μ_IQ2d[n:2*n] = μη
μ_IQ2d[2*n+1] = μη
D_IQ2d = np.zeros((2*(n+1), 2*(n+1)))


D_IQ2d[range(2*n), range(2*n)] = σy
D_IQ2d[:n, 2*n] = σθ
D_IQ2d[2*n, 2*n] = σθ
D_IQ2d[n:2*n, 2*n+1] = ση
D_IQ2d[2*n+1, 2*n+1] = ση

Σ_IQ2d = D_IQ2d @ D_IQ2d.T

return μ_IQ2d, Σ_IQ2d, D_IQ2d

Let’s put the function to work.

n = 2
# mean and variance of θ, η, and y
μθ, σθ, μη, ση, σy = 100., 10., 100., 10, 10

μ_IQ2d, Σ_IQ2d, D_IQ2d = construct_moments_IQ2d(n, μθ, σθ, μη, ση, σy)


μ_IQ2d, Σ_IQ2d, D_IQ2d

(array([100., 100., 100., 100., 100., 100.]),


array([[200., 100., 0., 0., 100., 0.],
[100., 200., 0., 0., 100., 0.],
[ 0., 0., 200., 100., 0., 100.],
[ 0., 0., 100., 200., 0., 100.],
[100., 100., 0., 0., 100., 0.],
[ 0., 0., 100., 100., 0., 100.]]),
array([[10., 0., 0., 0., 10., 0.],
[ 0., 10., 0., 0., 10., 0.],
[ 0., 0., 10., 0., 0., 10.],
[ 0., 0., 0., 10., 0., 10.],
[ 0., 0., 0., 0., 10., 0.],
[ 0., 0., 0., 0., 0., 10.]]))

# take one draw


x = np.random.multivariate_normal(μ_IQ2d, Σ_IQ2d)
y1 = x[:n]
y2 = x[n:2*n]
θ = x[2*n]
η = x[2*n+1]

# the true values


θ, η

(104.87169696314103, 107.9697815380512)

We first compute the joint normal distribution of (𝜃, 𝜂).

multi_normal_IQ2d = MultivariateNormal(μ_IQ2d, Σ_IQ2d)

k = 2*n # the length of data vector


multi_normal_IQ2d.partition(k)

multi_normal_IQ2d.cond_dist(1, [*y1, *y2])

(array([101.17829519, 105.80501858]),
array([[33.33333333, 0. ],
[ 0. , 33.33333333]]))

Now let’s compute distributions of 𝜃 and 𝜂 separately conditional on various subsets of test scores.
It will be fun to compare outcomes with the help of an auxiliary function cond_dist_IQ2d that we now construct.

def cond_dist_IQ2d(μ, Σ, data):

n = len(μ)

multi_normal = MultivariateNormal(μ, Σ)
multi_normal.partition(n-1)
μ_hat, Σ_hat = multi_normal.cond_dist(1, data)

return μ_hat, Σ_hat

Let’s see how things work for an example.

for indices, IQ, conditions in [([*range(2*n), 2*n], 'θ', 'y1, y2, y3, y4'),
([*range(n), 2*n], 'θ', 'y1, y2'),
([*range(n, 2*n), 2*n], 'θ', 'y3, y4'),
([*range(2*n), 2*n+1], 'η', 'y1, y2, y3, y4'),
([*range(n), 2*n+1], 'η', 'y1, y2'),
([*range(n, 2*n), 2*n+1], 'η', 'y3, y4')]:

μ_hat, Σ_hat = cond_dist_IQ2d(μ_IQ2d[indices], Σ_IQ2d[indices][:, indices], x[indices[:-1]])
print(f'The mean and variance of {IQ} conditional on {conditions: <15} are ' +
f'{μ_hat[0]:1.2f} and {Σ_hat[0, 0]:1.2f} respectively')

The mean and variance of θ conditional on y1, y2, y3, y4  are 101.18 and 33.33 respectively
The mean and variance of θ conditional on y1, y2          are 101.18 and 33.33 respectively
The mean and variance of θ conditional on y3, y4          are 100.00 and 100.00 respectively
The mean and variance of η conditional on y1, y2, y3, y4  are 105.81 and 33.33 respectively
The mean and variance of η conditional on y1, y2          are 100.00 and 100.00 respectively
The mean and variance of η conditional on y3, y4          are 105.81 and 33.33 respectively

Evidently, math tests provide no information about 𝜂 and language tests provide no information about 𝜃.


13.9 Univariate Time Series Analysis

We can use the multivariate normal distribution and a little matrix algebra to present foundations of univariate linear time
series analysis.
Let 𝑥𝑡 , 𝑦𝑡 , 𝑣𝑡 , 𝑤𝑡+1 each be scalars for 𝑡 ≥ 0.
Consider the following model:

𝑥0 ∼ 𝑁 (0, 𝜎02 )
𝑥𝑡+1 = 𝑎𝑥𝑡 + 𝑏𝑤𝑡+1 , 𝑤𝑡+1 ∼ 𝑁 (0, 1) , 𝑡 ≥ 0
𝑦𝑡 = 𝑐𝑥𝑡 + 𝑑𝑣𝑡 , 𝑣𝑡 ∼ 𝑁 (0, 1) , 𝑡 ≥ 0

We can compute the moments of 𝑥𝑡

1. $E x_{t+1}^2 = a^2 E x_t^2 + b^2, \quad t \geq 0$, where $E x_0^2 = \sigma_0^2$
2. $E x_{t+j} x_t = a^j E x_t^2, \quad \forall t, \; \forall j$

Given some 𝑇 , we can formulate the sequence $\{x_t\}_{t=0}^T$ as a random vector

$$
X = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_T \end{bmatrix}
$$

and the covariance matrix Σ𝑥 can be constructed using the moments we have computed above.
Similarly, we can define

$$
Y = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_T \end{bmatrix}, \qquad
v = \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ v_T \end{bmatrix}
$$
and therefore

𝑌 = 𝐶𝑋 + 𝐷𝑉

where 𝐶 and 𝐷 are both diagonal matrices with constant 𝑐 and 𝑑 as diagonal respectively.
Consequently, the covariance matrix of 𝑌 is

Σ𝑦 = 𝐸𝑌 𝑌 ′ = 𝐶Σ𝑥 𝐶 ′ + 𝐷𝐷′

By stacking 𝑋 and 𝑌 , we can write

$$
Z = \begin{bmatrix} X \\ Y \end{bmatrix}
$$

and

$$
\Sigma_z = E Z Z' = \begin{bmatrix} \Sigma_x & \Sigma_x C' \\ C \Sigma_x & \Sigma_y \end{bmatrix}
$$

Thus, the stacked sequences {𝑥𝑡 }𝑇𝑡=0 and {𝑦𝑡 }𝑇𝑡=0 jointly follow the multivariate normal distribution 𝑁 (0, Σ𝑧 ).

# as an example, consider the case where T = 3


T = 3


# variance of the initial distribution x_0


σ0 = 1.

# parameters of the equation system


a = .9
b = 1.
c = 1.0
d = .05

# construct the covariance matrix of X


Σx = np.empty((T+1, T+1))

Σx[0, 0] = σ0 ** 2
for i in range(T):
Σx[i, i+1:] = Σx[i, i] * a ** np.arange(1, T+1-i)
Σx[i+1:, i] = Σx[i, i+1:]

Σx[i+1, i+1] = a ** 2 * Σx[i, i] + b ** 2

Σx

array([[1. , 0.9 , 0.81 , 0.729 ],


[0.9 , 1.81 , 1.629 , 1.4661 ],
[0.81 , 1.629 , 2.4661 , 2.21949 ],
[0.729 , 1.4661 , 2.21949 , 2.997541]])

# construct the covariance matrix of Y


C = np.eye(T+1) * c
D = np.eye(T+1) * d

Σy = C @ Σx @ C.T + D @ D.T

# construct the covariance matrix of Z


Σz = np.empty((2*(T+1), 2*(T+1)))

Σz[:T+1, :T+1] = Σx
Σz[:T+1, T+1:] = Σx @ C.T
Σz[T+1:, :T+1] = C @ Σx
Σz[T+1:, T+1:] = Σy

Σz

array([[1. , 0.9 , 0.81 , 0.729 , 1. , 0.9 ,


0.81 , 0.729 ],
[0.9 , 1.81 , 1.629 , 1.4661 , 0.9 , 1.81 ,
1.629 , 1.4661 ],
[0.81 , 1.629 , 2.4661 , 2.21949 , 0.81 , 1.629 ,
2.4661 , 2.21949 ],
[0.729 , 1.4661 , 2.21949 , 2.997541, 0.729 , 1.4661 ,
2.21949 , 2.997541],
[1. , 0.9 , 0.81 , 0.729 , 1.0025 , 0.9 ,
0.81 , 0.729 ],
[0.9 , 1.81 , 1.629 , 1.4661 , 0.9 , 1.8125 ,
1.629 , 1.4661 ],
[0.81 , 1.629 , 2.4661 , 2.21949 , 0.81 , 1.629 ,
2.4686 , 2.21949 ],
[0.729 , 1.4661 , 2.21949 , 2.997541, 0.729 , 1.4661 ,
2.21949 , 3.000041]])

# construct the mean vector of Z


μz = np.zeros(2*(T+1))

The following Python code lets us sample random vectors 𝑋 and 𝑌 .


This is going to be very useful for doing the conditioning to be used in the fun exercises below.

z = np.random.multivariate_normal(μz, Σz)

x = z[:T+1]
y = z[T+1:]

13.9.1 Smoothing Example

This is an instance of a classic smoothing calculation whose purpose is to compute 𝐸𝑋 ∣ 𝑌 .


An interpretation of this example is
• 𝑋 is a random sequence of hidden Markov state variables 𝑥𝑡
• 𝑌 is a sequence of observed signals 𝑦𝑡 bearing information about the hidden state

# construct a MultivariateNormal instance


multi_normal_ex1 = MultivariateNormal(μz, Σz)
x = z[:T+1]
y = z[T+1:]

# partition Z into X and Y


multi_normal_ex1.partition(T+1)

# compute the conditional mean and covariance matrix of X given Y=y

print("X = ", x)
print("Y = ", y)
print(" E [ X | Y] = ", )

multi_normal_ex1.cond_dist(0, y)

X = [1.04425612 1.12782548 0.54973228 0.49249607]


Y = [1.1415452 1.12814704 0.65694875 0.44324541]
E [ X | Y] =

(array([1.1389275 , 1.12708895, 0.65750761, 0.44361577]),


array([[2.48875094e-03, 5.57449314e-06, 1.24861729e-08, 2.80235835e-11],
[5.57449314e-06, 2.48876343e-03, 5.57452116e-06, 1.25113941e-08],
[1.24861729e-08, 5.57452116e-06, 2.48876346e-03, 5.58575339e-06],
[2.80235835e-11, 1.25113941e-08, 5.58575339e-06, 2.49377812e-03]]))

13.9.2 Filtering Exercise

Compute 𝐸 [𝑥𝑡 ∣ 𝑦𝑡−1 , 𝑦𝑡−2 , … , 𝑦0 ].


To do so, we need to first construct the mean vector and the covariance matrix of the subvector [𝑥𝑡 , 𝑦0 , … , 𝑦𝑡−2 , 𝑦𝑡−1 ].
For example, let’s say that we want the conditional distribution of 𝑥3 .

t = 3

# mean of the subvector


sub_μz = np.zeros(t+1)

# covariance matrix of the subvector


sub_Σz = np.empty((t+1, t+1))

sub_Σz[0, 0] = Σz[t, t] # x_t


sub_Σz[0, 1:] = Σz[t, T+1:T+t+1]
sub_Σz[1:, 0] = Σz[T+1:T+t+1, t]
sub_Σz[1:, 1:] = Σz[T+1:T+t+1, T+1:T+t+1]

sub_Σz

array([[2.997541, 0.729 , 1.4661 , 2.21949 ],


[0.729 , 1.0025 , 0.9 , 0.81 ],
[1.4661 , 0.9 , 1.8125 , 1.629 ],
[2.21949 , 0.81 , 1.629 , 2.4686 ]])

multi_normal_ex2 = MultivariateNormal(sub_μz, sub_Σz)


multi_normal_ex2.partition(1)

sub_y = y[:t]

multi_normal_ex2.cond_dist(0, sub_y)

(array([0.59205609]), array([[1.00201996]]))


13.9.3 Prediction Exercise

Compute 𝐸 [𝑦𝑡 ∣ 𝑦𝑡−𝑗 , … , 𝑦0 ].


As in the filtering exercise above, we will construct the mean vector and covariance matrix of the subvector
[𝑦𝑡 , 𝑦0 , … , 𝑦𝑡−𝑗−1 , 𝑦𝑡−𝑗 ].
For example, we take a case in which 𝑡 = 3 and 𝑗 = 2.

t = 3
j = 2

sub_μz = np.zeros(t-j+2)
sub_Σz = np.empty((t-j+2, t-j+2))

sub_Σz[0, 0] = Σz[T+t+1, T+t+1]


sub_Σz[0, 1:] = Σz[T+t+1, T+1:T+t-j+2]
sub_Σz[1:, 0] = Σz[T+1:T+t-j+2, T+t+1]
sub_Σz[1:, 1:] = Σz[T+1:T+t-j+2, T+1:T+t-j+2]

sub_Σz

array([[3.000041, 0.729 , 1.4661 ],


[0.729 , 1.0025 , 0.9 ],
[1.4661 , 0.9 , 1.8125 ]])

multi_normal_ex3 = MultivariateNormal(sub_μz, sub_Σz)


multi_normal_ex3.partition(1)

sub_y = y[:t-j+1]

multi_normal_ex3.cond_dist(0, sub_y)

(array([0.91359083]), array([[1.81413617]]))

13.9.4 Constructing a Wold Representation

Now we’ll apply Cholesky decomposition to decompose Σ𝑦 = 𝐻𝐻 ′ and form

𝜖 = 𝐻 −1 𝑌 .

Then we can represent 𝑦𝑡 as

𝑦𝑡 = ℎ𝑡,𝑡 𝜖𝑡 + ℎ𝑡,𝑡−1 𝜖𝑡−1 + ⋯ + ℎ𝑡,0 𝜖0 .

H = np.linalg.cholesky(Σy)

H

array([[1.00124922, 0. , 0. , 0. ],
[0.8988771 , 1.00225743, 0. , 0. ],
[0.80898939, 0.89978675, 1.00225743, 0. ],
[0.72809046, 0.80980808, 0.89978676, 1.00225743]])

ε = np.linalg.inv(H) @ y

ε

array([ 1.14012093, 0.10308573, -0.35734549, -0.1484755 ])

y

array([1.1415452 , 1.12814704, 0.65694875, 0.44324541])

This example is an instance of what is known as a Wold representation in time series analysis.
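As a quick consistency check (a sketch that is not part of the lecture), reconstructing 𝑦 from the orthogonalized shocks should recover the simulated path up to floating point round-off, since 𝑦 = 𝐻𝜖 by construction.

# the reconstruction error should be at the level of machine precision
np.max(np.abs(H @ ε - y))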

13.10 Stochastic Difference Equation

Consider the stochastic second-order linear difference equation

$$
y_t = \alpha_0 + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + u_t
$$

where $u_t \sim N(0, \sigma_u^2)$ and

$$
\begin{bmatrix} y_{-1} \\ y_0 \end{bmatrix} \sim N(\tilde{\mu}_y, \tilde{\Sigma}_y)
$$

It can be written as a stacked system $A y = b + u$, where

$$
\underset{\equiv A}{\underbrace{
\begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-\alpha_1 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-\alpha_2 & -\alpha_1 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & -\alpha_2 & -\alpha_1 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & -\alpha_2 & -\alpha_1 & 1
\end{bmatrix}}}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \\ y_T \end{bmatrix}
=
\underset{\equiv b}{\underbrace{
\begin{bmatrix}
\alpha_0 + \alpha_1 y_0 + \alpha_2 y_{-1} \\
\alpha_0 + \alpha_2 y_0 \\
\alpha_0 \\
\alpha_0 \\
\vdots \\
\alpha_0
\end{bmatrix}}}
+ \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ \vdots \\ u_T \end{bmatrix}
$$

We can compute 𝑦 by solving the system

$$
y = A^{-1} (b + u)
$$

We have

$$
\begin{aligned}
\mu_y &= A^{-1} \mu_b \\
\Sigma_y &= A^{-1} E \left[ (b - \mu_b + u)(b - \mu_b + u)' \right] (A^{-1})' \\
         &= A^{-1} (\Sigma_b + \Sigma_u) (A^{-1})'
\end{aligned}
$$

where

$$
\mu_b = \begin{bmatrix}
\alpha_0 + \alpha_1 \mu_{y_0} + \alpha_2 \mu_{y_{-1}} \\
\alpha_0 + \alpha_2 \mu_{y_0} \\
\alpha_0 \\
\vdots \\
\alpha_0
\end{bmatrix}
$$

$$
\Sigma_b = \begin{bmatrix}
C \tilde{\Sigma}_y C' & \mathbf{0}_{2 \times (N-2)} \\
\mathbf{0}_{(N-2) \times 2} & \mathbf{0}_{(N-2) \times (N-2)}
\end{bmatrix},
\qquad
C = \begin{bmatrix} \alpha_2 & \alpha_1 \\ 0 & \alpha_2 \end{bmatrix}
$$

$$
\Sigma_u = \begin{bmatrix}
\sigma_u^2 & 0 & \cdots & 0 \\
0 & \sigma_u^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_u^2
\end{bmatrix}
$$
# set parameters
T = 80
T = 160
# coefficients of the second order difference equation
α0 = 10
α1 = 1.53
α2 = -.9

# variance of u
σu = 1.
σu = 10.

# distribution of y_{-1} and y_{0}


μy_tilde = np.array([1., 0.5])
Σy_tilde = np.array([[2., 1.], [1., 0.5]])

# construct A and its inverse


A = np.zeros((T, T))

for i in range(T):
A[i, i] = 1

if i-1 >= 0:
A[i, i-1] = -α1

if i-2 >= 0:
A[i, i-2] = -α2

A_inv = np.linalg.inv(A)

# compute the mean vectors of b and y


μb = np.full(T, α0, dtype=float)   # float so that the increments below are not truncated
μb[0] += α1 * μy_tilde[1] + α2 * μy_tilde[0]
μb[1] += α2 * μy_tilde[1]

μy = A_inv @ μb

# compute the covariance matrices of b and y


Σu = np.eye(T) * σu ** 2

Σb = np.zeros((T, T))

C = np.array([[α2, α1], [0, α2]])


Σb[:2, :2] = C @ Σy_tilde @ C.T

Σy = A_inv @ (Σb + Σu) @ A_inv.T
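As a rough check on these calculations (a sketch that is not part of the lecture), note that for these parameter values the difference equation is stationary, so far enough out the mean 𝜇𝑦 and the diagonal of Σ𝑦 should be close to the stationary mean $\alpha_0 / (1 - \alpha_1 - \alpha_2)$ and the standard AR(2) stationary variance.

# compare the last entries of μy and Σy with the stationary values; by t = T
# the influence of the initial conditions has essentially died out
mean_ss = α0 / (1 - α1 - α2)                                      # roughly 27.03
var_ss = (1 - α2) * σu**2 / ((1 + α2) * ((1 - α2)**2 - α1**2))    # roughly 1497
μy[-1], mean_ss, Σy[-1, -1], var_ss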


13.11 Application to Stock Price Model

Let

$$
p_t = \sum_{j=0}^{T-t} \beta^j y_{t+j}
$$

Form

$$
\underset{\equiv p}{\underbrace{
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ \vdots \\ p_T \end{bmatrix}}}
=
\underset{\equiv B}{\underbrace{
\begin{bmatrix}
1 & \beta & \beta^2 & \cdots & \beta^{T-1} \\
0 & 1 & \beta & \cdots & \beta^{T-2} \\
0 & 0 & 1 & \cdots & \beta^{T-3} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}}}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_T \end{bmatrix}
$$

we have

$$
\mu_p = B \mu_y
$$

$$
\Sigma_p = B \Sigma_y B'
$$

β = .96

# construct B
B = np.zeros((T, T))

for i in range(T):
B[i, i:] = β ** np.arange(0, T-i)

Denote

$$
z = \begin{bmatrix} y \\ p \end{bmatrix}
  = \underset{\equiv D}{\underbrace{\begin{bmatrix} I \\ B \end{bmatrix}}} \, y
$$

Thus, $\{y_t\}_{t=1}^T$ and $\{p_t\}_{t=1}^T$ jointly follow the multivariate normal distribution $N(\mu_z, \Sigma_z)$, where

$$
\mu_z = D \mu_y
$$

$$
\Sigma_z = D \Sigma_y D'
$$

D = np.vstack([np.eye(T), B])

μz = D @ μy
Σz = D @ Σy @ D.T

We can simulate paths of 𝑦𝑡 and 𝑝𝑡 and compute the conditional mean 𝐸 [𝑝𝑡 ∣ 𝑦𝑡−1 , 𝑦𝑡 ] using the MultivariateNormal class.

z = np.random.multivariate_normal(μz, Σz)
y, p = z[:T], z[T:]


cond_Ep = np.empty(T-1)

sub_μ = np.empty(3)
sub_Σ = np.empty((3, 3))
for t in range(2, T+1):
sub_μ[:] = μz[[t-2, t-1, T-1+t]]
sub_Σ[:, :] = Σz[[t-2, t-1, T-1+t], :][:, [t-2, t-1, T-1+t]]

multi_normal = MultivariateNormal(sub_μ, sub_Σ)


multi_normal.partition(2)

cond_Ep[t-2] = multi_normal.cond_dist(1, y[t-2:t])[0][0]

plt.plot(range(1, T), y[1:], label='$y_{t}$')


plt.plot(range(1, T), y[:-1], label='$y_{t-1}$')
plt.plot(range(1, T), p[1:], label='$p_{t}$')
plt.plot(range(1, T), cond_Ep, label='$Ep_{t}|y_{t}, y_{t-1}$')

plt.xlabel('t')
plt.legend(loc=1)
plt.show()

In the above graph, the green line is what the price of the stock would be if people had perfect foresight about the path of dividends, while the red line is the conditional expectation 𝐸𝑝𝑡 |𝑦𝑡 , 𝑦𝑡−1 , which is what the price would be if people did not have perfect foresight but were optimally predicting future dividends on the basis of the information 𝑦𝑡 , 𝑦𝑡−1 at time 𝑡.


13.12 Filtering Foundations

Assume that 𝑥0 is an 𝑛 × 1 random vector and that 𝑦0 is a 𝑝 × 1 random vector determined by the observation equation

𝑦0 = 𝐺𝑥0 + 𝑣0 , 𝑥0 ∼ 𝒩(𝑥0̂ , Σ0 ), 𝑣0 ∼ 𝒩(0, 𝑅)

where 𝑣0 is orthogonal to 𝑥0 , 𝐺 is a 𝑝 × 𝑛 matrix, and 𝑅 is a 𝑝 × 𝑝 positive definite matrix.


We consider the problem of someone who
• observes 𝑦0
• does not observe 𝑥0 ,
• knows 𝑥0̂ , Σ0 , 𝐺, 𝑅 and therefore the joint probability distribution of the vector $\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$
• wants to infer 𝑥0 from 𝑦0 in light of what he knows about that joint probability distribution.
Therefore, the person wants to construct the probability distribution of 𝑥0 conditional on the random vector 𝑦0 .
The joint distribution of $\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$ is multivariate normal 𝒩(𝜇, Σ) with

$$
\mu = \begin{bmatrix} \hat{x}_0 \\ G \hat{x}_0 \end{bmatrix}, \qquad
\Sigma = \begin{bmatrix} \Sigma_0 & \Sigma_0 G' \\ G \Sigma_0 & G \Sigma_0 G' + R \end{bmatrix}
$$

By applying an appropriate instance of the above formulas for the mean vector 𝜇1̂ and covariance matrix Σ̂ 11 of 𝑧1
conditional on 𝑧2 , we find that the probability distribution of 𝑥0 conditional on 𝑦0 is 𝒩(𝑥0̃ , Σ̃ 0 ) where

𝛽0 = Σ0 𝐺′ (𝐺Σ0 𝐺′ + 𝑅)−1
𝑥0̃ = 𝑥0̂ + 𝛽0 (𝑦0 − 𝐺𝑥0̂ )
Σ̃ 0 = Σ0 − Σ0 𝐺′ (𝐺Σ0 𝐺′ + 𝑅)−1 𝐺Σ0
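Here is a direct numerical check of these formulas (a sketch that is not part of the lecture; the values of 𝐺, 𝑅, 𝑥0̂ , Σ0 and 𝑦0 below are made up purely for illustration), cross-checked against the MultivariateNormal class.

# made-up single-period example
G_chk = np.array([[1., 2.]])
R_chk = np.array([[0.5]])
x0_hat_chk = np.array([1., 0.])
Σ0_chk = np.array([[1., .3], [.3, 2.]])     # symmetric positive definite
y0_chk = np.array([1.5])

# apply the formulas directly
K_chk = G_chk @ Σ0_chk @ G_chk.T + R_chk    # G Σ0 G' + R
β0_chk = Σ0_chk @ G_chk.T @ np.linalg.inv(K_chk)
x0_tilde_chk = x0_hat_chk + β0_chk @ (y0_chk - G_chk @ x0_hat_chk)
Σ0_tilde_chk = Σ0_chk - Σ0_chk @ G_chk.T @ np.linalg.inv(K_chk) @ G_chk @ Σ0_chk

# the same conditional moments via the MultivariateNormal class
μ_chk = np.hstack([x0_hat_chk, G_chk @ x0_hat_chk])
Σ_chk = np.block([[Σ0_chk, Σ0_chk @ G_chk.T], [G_chk @ Σ0_chk, K_chk]])
multi_normal_chk = MultivariateNormal(μ_chk, Σ_chk)
multi_normal_chk.partition(2)

# the two computations should agree
multi_normal_chk.cond_dist(0, y0_chk), (x0_tilde_chk, Σ0_tilde_chk)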

13.12.1 Step toward dynamics

Now suppose that we are in a time series setting and that we have the one-step state transition equation

𝑥1 = 𝐴𝑥0 + 𝐶𝑤1 , 𝑤1 ∼ 𝒩(0, 𝐼)

where 𝐴 is an 𝑛 × 𝑛 matrix and 𝐶 is an 𝑛 × 𝑚 matrix.


It follows that the probability distribution of 𝑥1 conditional on 𝑦0 is

𝑥1 |𝑦0 ∼ 𝒩(𝐴𝑥0̃ , 𝐴Σ̃ 0 𝐴′ + 𝐶𝐶 ′ )

Define
𝑥1̂ = 𝐴𝑥0̃
Σ1 = 𝐴Σ̃ 0 𝐴′ + 𝐶𝐶 ′


13.12.2 Dynamic version

Suppose now that for 𝑡 ≥ 0, $\{x_{t+1}, y_t\}_{t=0}^\infty$ are governed by the equations

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1


𝑦𝑡 = 𝐺𝑥𝑡 + 𝑣𝑡

where as before 𝑥0 ∼ 𝒩(𝑥0̂ , Σ0 ), 𝑤𝑡+1 is the 𝑡 + 1th component of an i.i.d. stochastic process distributed as 𝑤𝑡+1 ∼ 𝒩(0, 𝐼), 𝑣𝑡 is the 𝑡th component of an i.i.d. process distributed as 𝑣𝑡 ∼ 𝒩(0, 𝑅), and the $\{w_{t+1}\}_{t=0}^\infty$ and $\{v_t\}_{t=0}^\infty$ processes are orthogonal at all pairs of dates.
The logic and formulas that we applied above imply that the probability distribution of $x_t$ conditional on $y_0, y_1, \ldots, y_{t-1} \equiv y^{t-1}$ is

$$
x_t \mid y^{t-1} \sim \mathcal{N}\left(A \tilde{x}_{t-1}, \; A \tilde{\Sigma}_{t-1} A' + C C'\right)
$$

where $\{\tilde{x}_t, \tilde{\Sigma}_t\}_{t=1}^\infty$ can be computed by iterating on the following equations starting from 𝑡 = 1 and initial conditions for 𝑥0̃ , Σ̃ 0 computed as we have above:

$$
\begin{aligned}
\Sigma_t &= A \tilde{\Sigma}_{t-1} A' + C C' \\
\hat{x}_t &= A \tilde{x}_{t-1} \\
\beta_t &= \Sigma_t G' (G \Sigma_t G' + R)^{-1} \\
\tilde{x}_t &= \hat{x}_t + \beta_t (y_t - G \hat{x}_t) \\
\tilde{\Sigma}_t &= \Sigma_t - \Sigma_t G' (G \Sigma_t G' + R)^{-1} G \Sigma_t
\end{aligned}
$$

If we shift the first equation forward one period and then substitute the expression for Σ̃ 𝑡 on the right side of the fifth
equation into it we obtain

Σ𝑡+1 = 𝐶𝐶 ′ + 𝐴Σ𝑡 𝐴′ − 𝐴Σ𝑡 𝐺′ (𝐺Σ𝑡 𝐺′ + 𝑅)−1 𝐺Σ𝑡 𝐴′ .

This is a matrix Riccati difference equation that is closely related to another matrix Riccati difference equation that appears
in a quantecon lecture on the basics of linear quadratic control theory.
That equation has the form

𝑃𝑡−1 = 𝑅 + 𝐴′ 𝑃𝑡 𝐴 − 𝐴′ 𝑃𝑡 𝐵(𝐵′ 𝑃𝑡 𝐵 + 𝑄)−1 𝐵′ 𝑃𝑡 𝐴.

Stare at the two preceding equations for a moment or two: the first is a matrix difference equation for a conditional covariance matrix, the second is a matrix difference equation in the matrix appearing in a quadratic form for an intertemporal cost or value function.
Although the two equations are not identical, they display striking family resemblances.
• the first equation tells dynamics that work forward in time
• the second equation tells dynamics that work backward in time
• while many of the terms are similar, one equation seems to apply matrix transformations to some matrices that play similar roles in the other equation
The family resemblance of these two equations reflects a transcendent duality between control theory and filtering theory.


13.12.3 An example

We can use the Python class MultivariateNormal to construct examples.


Here is an example for a single period problem at time 0

G = np.array([[1., 3.]])
R = np.array([[1.]])

x0_hat = np.array([0., 1.])


Σ0 = np.array([[1., .5], [.3, 2.]])

μ = np.hstack([x0_hat, G @ x0_hat])
Σ = np.block([[Σ0, Σ0 @ G.T], [G @ Σ0, G @ Σ0 @ G.T + R]])

# construction of the multivariate normal instance


multi_normal = MultivariateNormal(μ, Σ)

multi_normal.partition(2)

# the observation of y
y0 = 2.3

# conditional distribution of x0
μ1_hat, Σ11 = multi_normal.cond_dist(0, y0)
μ1_hat, Σ11

(array([-0.078125, 0.803125]),
array([[ 0.72098214, -0.203125 ],
[-0.403125 , 0.228125 ]]))

A = np.array([[0.5, 0.2], [-0.1, 0.3]])


C = np.array([[2.], [1.]])

# conditional distribution of x1
x1_cond = A @ μ1_hat
Σ1_cond = C @ C.T + A @ Σ11 @ A.T
x1_cond, Σ1_cond

(array([0.1215625, 0.24875 ]),


array([[4.12874554, 1.95523214],
[1.92123214, 1.04592857]]))


13.12.4 Code for Iterating

Here is code for solving a dynamic filtering problem by iterating on our equations, followed by an example.

def iterate(x0_hat, Σ0, A, C, G, R, y_seq):

p, n = G.shape

T = len(y_seq)
x_hat_seq = np.empty((T+1, n))
Σ_hat_seq = np.empty((T+1, n, n))

x_hat_seq[0] = x0_hat
Σ_hat_seq[0] = Σ0

for t in range(T):
xt_hat = x_hat_seq[t]
Σt = Σ_hat_seq[t]
μ = np.hstack([xt_hat, G @ xt_hat])
Σ = np.block([[Σt, Σt @ G.T], [G @ Σt, G @ Σt @ G.T + R]])

# filtering
multi_normal = MultivariateNormal(μ, Σ)
multi_normal.partition(n)
x_tilde, Σ_tilde = multi_normal.cond_dist(0, y_seq[t])

# forecasting
x_hat_seq[t+1] = A @ x_tilde
Σ_hat_seq[t+1] = C @ C.T + A @ Σ_tilde @ A.T

return x_hat_seq, Σ_hat_seq

iterate(x0_hat, Σ0, A, C, G, R, [2.3, 1.2, 3.2])

(array([[0. , 1. ],
[0.1215625 , 0.24875 ],
[0.18680212, 0.06904689],
[0.75576875, 0.05558463]]),
array([[[1. , 0.5 ],
[0.3 , 2. ]],

[[4.12874554, 1.95523214],
[1.92123214, 1.04592857]],

[[4.08198663, 1.99218488],
[1.98640488, 1.00886423]],

[[4.06457628, 2.00041999],
[1.99943739, 1.00275526]]]))

The iterative algorithm just described is a version of the celebrated Kalman filter.
We describe the Kalman filter and some applications of it in A First Look at the Kalman Filter.


13.13 Classic Factor Analysis Model

The factor analysis model widely used in psychology and other fields can be represented as

𝑌 = Λ𝑓 + 𝑈

where
1. 𝑌 is 𝑛 × 1 random vector, 𝐸𝑈 𝑈 ′ = 𝐷 is a diagonal matrix,
2. Λ is 𝑛 × 𝑘 coefficient matrix,
3. 𝑓 is 𝑘 × 1 random vector, 𝐸𝑓𝑓 ′ = 𝐼,
4. 𝑈 is 𝑛 × 1 random vector, and 𝑈 ⟂ 𝑓 (i.e., 𝐸𝑈 𝑓 ′ = 0 )
5. It is presumed that 𝑘 is small relative to 𝑛; often 𝑘 is only 1 or 2, as in our IQ examples.
This implies that

Σ𝑦 = 𝐸𝑌 𝑌 ′ = ΛΛ′ + 𝐷
𝐸𝑌 𝑓 ′ = Λ
𝐸𝑓𝑌 ′ = Λ′

Thus, the covariance matrix Σ𝑌 is the sum of a diagonal matrix 𝐷 and a positive semi-definite matrix ΛΛ′ of rank 𝑘.
This means that all covariances among the 𝑛 components of the 𝑌 vector are intermediated by their common dependencies on the 𝑘 < 𝑛 factors.
Form

$$
Z = \begin{pmatrix} f \\ Y \end{pmatrix}
$$

the covariance matrix of the expanded random vector 𝑍 can be computed as

$$
\Sigma_z = E Z Z' = \begin{pmatrix} I & \Lambda' \\ \Lambda & \Lambda \Lambda' + D \end{pmatrix}
$$

In the following, we first construct the mean vector and the covariance matrix for the case where 𝑁 = 10 and 𝑘 = 2.

N = 10
k = 2

We set the coefficient matrix Λ and the covariance matrix of 𝑈 to be

$$
\Lambda = \begin{pmatrix}
1 & 0 \\
\vdots & \vdots \\
1 & 0 \\
0 & 1 \\
\vdots & \vdots \\
0 & 1
\end{pmatrix},
\qquad
D = \begin{pmatrix}
\sigma_u^2 & 0 & \cdots & 0 \\
0 & \sigma_u^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_u^2
\end{pmatrix}
$$

where the first half of the first column of Λ is filled with 1s and the second half with 0s, and symmetrically for the second column.
𝐷 is a diagonal matrix with parameter $\sigma_u^2$ on the diagonal.


Λ = np.zeros((N, k))
Λ[:N//2, 0] = 1
Λ[N//2:, 1] = 1

σu = .5
D = np.eye(N) * σu ** 2

# compute Σy
Σy = Λ @ Λ.T + D

We can now construct the mean vector and the covariance matrix for 𝑍.

μz = np.zeros(k+N)

Σz = np.empty((k+N, k+N))

Σz[:k, :k] = np.eye(k)


Σz[:k, k:] = Λ.T
Σz[k:, :k] = Λ
Σz[k:, k:] = Σy

z = np.random.multivariate_normal(μz, Σz)

f = z[:k]
y = z[k:]

multi_normal_factor = MultivariateNormal(μz, Σz)


multi_normal_factor.partition(k)

Let’s compute the conditional distribution of the hidden factor 𝑓 on the observations 𝑌 , namely, 𝑓 ∣ 𝑌 = 𝑦.

multi_normal_factor.cond_dist(0, y)

(array([0.37829706, 0.31441423]),
array([[0.04761905, 0. ],
[0. , 0.04761905]]))

We can verify that the conditional mean $E[f \mid Y = y] = B Y$ where $B = \Lambda' \Sigma_y^{-1}$.

B = Λ.T @ np.linalg.inv(Σy)

B @ y

array([0.37829706, 0.31441423])

Similarly, we can compute the conditional distribution 𝑌 ∣ 𝑓.

multi_normal_factor.cond_dist(1, f)

(array([0.34553632, 0.34553632, 0.34553632, 0.34553632, 0.34553632,


0.37994724, 0.37994724, 0.37994724, 0.37994724, 0.37994724]),
array([[0.25, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0.25, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0.25, 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.25, 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0.25, 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0.25, 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0.25, 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.25, 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.25, 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.25]]))

It can be verified that the mean is $\Lambda I^{-1} f = \Lambda f$.

Λ @ f

array([0.34553632, 0.34553632, 0.34553632, 0.34553632, 0.34553632,


0.37994724, 0.37994724, 0.37994724, 0.37994724, 0.37994724])

13.14 PCA and Factor Analysis

To learn about Principal Components Analysis (PCA), please see this lecture Singular Value Decompositions.
For fun, let’s apply a PCA decomposition to a covariance matrix Σ𝑦 that in fact is governed by our factor-analytic model.
Technically, this means that the PCA model is misspecified. (Can you explain why?)
Nevertheless, this exercise will let us study how well the first two principal components from a PCA can approximate the
conditional expectations 𝐸𝑓𝑖 |𝑌 for our two factors 𝑓𝑖 , 𝑖 = 1, 2 for the factor analytic model that we have assumed truly
governs the data on 𝑌 we have generated.
So we compute the PCA decomposition

$$
\Sigma_y = P \tilde{\Lambda} P'
$$

where Λ̃ is a diagonal matrix.


We have

𝑌 = 𝑃𝜖

and

𝜖 = 𝑃 ′𝑌

Note that we will arrange the eigenvectors in 𝑃 in the descending order of eigenvalues.

λ_tilde, P = np.linalg.eigh(Σy)

# arrange the eigenvectors by eigenvalues

ind = sorted(range(N), key=lambda x: λ_tilde[x], reverse=True)

P = P[:, ind]
λ_tilde = λ_tilde[ind]

Λ_tilde = np.diag(λ_tilde)

print('λ_tilde =', λ_tilde)

λ_tilde = [5.25 5.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25]

# verify the orthogonality of eigenvectors


np.abs(P @ P.T - np.eye(N)).max()

4.440892098500626e-16

# verify the eigenvalue decomposition is correct


P @ Λ_tilde @ P.T

array([[1.25, 1. , 1. , 1. , 1. , 0. , 0. , 0. , 0. , 0. ],
[1. , 1.25, 1. , 1. , 1. , 0. , 0. , 0. , 0. , 0. ],
[1. , 1. , 1.25, 1. , 1. , 0. , 0. , 0. , 0. , 0. ],
[1. , 1. , 1. , 1.25, 1. , 0. , 0. , 0. , 0. , 0. ],
[1. , 1. , 1. , 1. , 1.25, 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 1.25, 1. , 1. , 1. , 1. ],
[0. , 0. , 0. , 0. , 0. , 1. , 1.25, 1. , 1. , 1. ],
[0. , 0. , 0. , 0. , 0. , 1. , 1. , 1.25, 1. , 1. ],
[0. , 0. , 0. , 0. , 0. , 1. , 1. , 1. , 1.25, 1. ],
[0. , 0. , 0. , 0. , 0. , 1. , 1. , 1. , 1. , 1.25]])

ε = P.T @ y

print("ε = ", ε)

ε = [ 0.88819283 0.73820417 -0.18694674 0.33595973 -1.08939006 -0.05565398


0.57290059 0.38344127 -0.37376177 -0.14303457]

# print the values of the two factors

print('f = ', f)

f = [0.34553632 0.37994724]

Below we’ll plot several things


• the 𝑁 values of 𝑦
• the 𝑁 values of the principal components 𝜖
• the value of the first factor 𝑓1 plotted only for the first 𝑁 /2 observations of 𝑦 for which it receives a non-zero
loading in Λ
• the value of the second factor 𝑓2 plotted only for the final 𝑁 /2 observations for which it receives a non-zero loading
in Λ


plt.scatter(range(N), y, label='y')
plt.scatter(range(N), ε, label='$\epsilon$')
plt.hlines(f[0], 0, N//2-1, ls='--', label='$f_{1}$')
plt.hlines(f[1], N//2, N-1, ls='-.', label='$f_{2}$')
plt.legend()

plt.show()

Consequently, the first two 𝜖𝑗 correspond to the largest two eigenvalues.


Let’s look at them, after which we’ll look at 𝐸𝑓|𝑦 = 𝐵𝑦

ε[:2]

array([0.88819283, 0.73820417])

# compare with Ef|y


B @ y

array([0.37829706, 0.31441423])

The fraction of variance in 𝑦𝑡 explained by the first two principal components can be computed as below.

λ_tilde[:2].sum() / λ_tilde.sum()

0.84

Compute

$$
\hat{Y} = P_j \epsilon_j + P_k \epsilon_k
$$

where 𝑃𝑗 and 𝑃𝑘 correspond to the largest two eigenvalues.


y_hat = P[:, :2] @ ε[:2]

In this example, it turns out that the projection 𝑌 ̂ of 𝑌 on the first two principal components does a good job of approximating 𝐸𝑓 ∣ 𝑦.
We confirm this in the following plot of 𝑓, 𝐸𝑦 ∣ 𝑓, 𝐸𝑓 ∣ 𝑦, and 𝑦 ̂ on the coordinate axis versus 𝑦 on the ordinate axis.

plt.scatter(range(N), Λ @ f, label='$Ey|f$')
plt.scatter(range(N), y_hat, label='$\hat{y}$')
plt.hlines(f[0], 0, N//2-1, ls='--', label='$f_{1}$')
plt.hlines(f[1], N//2, N-1, ls='-.', label='$f_{2}$')

Efy = B @ y
plt.hlines(Efy[0], 0, N//2-1, ls='--', color='b', label='$Ef_{1}|y$')
plt.hlines(Efy[1], N//2, N-1, ls='-.', color='b', label='$Ef_{2}|y$')
plt.legend()

plt.show()

The covariance matrix of 𝑌 ̂ can be computed by first constructing the covariance matrix of 𝜖 and then use the upper left
block for 𝜖1 and 𝜖2 .

Σεjk = (P.T @ Σy @ P)[:2, :2]

Pjk = P[:, :2]

Σy_hat = Pjk @ Σεjk @ Pjk.T


print('Σy_hat = \n', Σy_hat)

Σy_hat =
[[1.05 1.05 1.05 1.05 1.05 0. 0. 0. 0. 0. ]
[1.05 1.05 1.05 1.05 1.05 0. 0. 0. 0. 0. ]
[1.05 1.05 1.05 1.05 1.05 0. 0. 0. 0. 0. ]
[1.05 1.05 1.05 1.05 1.05 0. 0. 0. 0. 0. ]
[1.05 1.05 1.05 1.05 1.05 0. 0. 0. 0. 0. ]
[0. 0. 0. 0. 0. 1.05 1.05 1.05 1.05 1.05]
[0. 0. 0. 0. 0. 1.05 1.05 1.05 1.05 1.05]
[0. 0. 0. 0. 0. 1.05 1.05 1.05 1.05 1.05]
[0. 0. 0. 0. 0. 1.05 1.05 1.05 1.05 1.05]
[0. 0. 0. 0. 0. 1.05 1.05 1.05 1.05 1.05]]



CHAPTER

FOURTEEN

HEAVY-TAILED DISTRIBUTIONS

Contents

• Heavy-Tailed Distributions
– Overview
– Visual Comparisons
– Failure of the LLN
– Classifying Tail Properties
– Exercises
– Solutions

In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon


!pip install --upgrade yfinance

14.1 Overview

Most commonly used probability distributions in classical statistics and the natural sciences have either bounded support
or light tails.
When a distribution is light-tailed, extreme observations are rare and draws tend not to deviate too much from the mean.
Having internalized these kinds of distributions, many researchers and practitioners use rules of thumb such as “outcomes
more than four or five standard deviations from the mean can safely be ignored.”
However, some distributions encountered in economics have far more probability mass in the tails than distributions like
the normal distribution.
With such heavy-tailed distributions, what would be regarded as extreme outcomes for someone accustomed to thin
tailed distributions occur relatively frequently.
Examples of heavy-tailed distributions observed in economic and financial settings include
• the income distributions and the wealth distribution (see, e.g., [Vil96], [BB18]),
• the firm size distribution ([Axt01], [Gab16]),
• the distribution of returns on holding assets over short time horizons ([Man63], [Rac03]), and

• the distribution of city sizes ([RRGM11], [Gab16]).


These heavy tails turn out to be important for our understanding of economic outcomes.
As one example, the heaviness of the tail in the wealth distribution is one natural measure of inequality.
It matters for taxation and redistribution policies, as well as for flow-on effects for productivity growth, business cycles,
and political economy
• see, e.g., [AR02], [GSS03], [BEGS18] or [AKM+18].
This lecture formalizes some of the concepts introduced above and reviews the key ideas.
Let’s start with some imports:

%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
import quantecon as qe

The following two lines can be added to avoid an annoying FutureWarning, and prevent a specific compatibility issue
between pandas and matplotlib from causing problems down the line:

from pandas.plotting import register_matplotlib_converters


register_matplotlib_converters()

14.2 Visual Comparisons

One way to build intuition on the difference between light and heavy tails is to plot independent draws and compare them
side-by-side.

14.2.1 A Simulation

The figure below shows a simulation. (You will be asked to replicate it in the exercises.)
The top two subfigures each show 120 independent draws from the normal distribution, which is light-tailed.
The bottom subfigure shows 120 independent draws from the Cauchy distribution, which is heavy-tailed.
In the top subfigure, the standard deviation of the normal distribution is 2, and the draws are clustered around the mean.
In the middle subfigure, the standard deviation is increased to 12 and, as expected, the amount of dispersion rises.
The bottom subfigure, with the Cauchy draws, shows a different pattern: tight clustering around the mean for the great
majority of observations, combined with a few sudden large deviations from the mean.
This is typical of a heavy-tailed distribution.

14.2.2 Heavy Tails in Asset Returns

Next let’s look at some financial data.


Our aim is to plot the daily change in the price of Amazon (AMZN) stock for the period from 1st January 2015 to 1st
November 2019.
This equates to daily returns if we set dividends aside.
The code below produces the desired plot using Yahoo financial data via the yfinance library.

import yfinance as yf
import pandas as pd

s = yf.download('AMZN', '2015-1-1', '2019-11-1')['Adj Close']

r = s.pct_change()

fig, ax = plt.subplots()

ax.plot(r, linestyle='', marker='o', alpha=0.5, ms=4)


ax.vlines(r.index, 0, r.values, lw=0.2)

ax.set_ylabel('returns', fontsize=12)
ax.set_xlabel('date', fontsize=12)

plt.show()

[*********************100%***********************] 1 of 1 completed

Five of the 1217 observations are more than 5 standard deviations from the mean.
Overall, the figure is suggestive of heavy tails, although not to the same degree as the Cauchy distribution in the figure above.
If, however, one takes tick-by-tick data rather than daily data, the heavy-tailedness of the distribution increases further.


14.3 Failure of the LLN

One impact of heavy tails is that sample averages can be poor estimators of the underlying mean of the distribution.
To understand this point better, recall our earlier discussion of the Law of Large Numbers, which considered IID 𝑋1 , … , 𝑋𝑛 with common distribution 𝐹 .
If 𝔼|𝑋𝑖 | is finite, then the sample mean $\bar{X}_n := \frac{1}{n} \sum_{i=1}^n X_i$ satisfies

$$
\mathbb{P}\left\{\bar{X}_n \to \mu \text{ as } n \to \infty\right\} = 1 \tag{14.1}
$$

where $\mu := \mathbb{E} X_i = \int x F(dx)$ is the common mean of the sample.


The condition $\mathbb{E}|X_i| = \int |x| F(dx) < \infty$ holds in most cases but can fail if the distribution 𝐹 is very heavy tailed.
For example, it fails for the Cauchy distribution.
Let’s have a look at the behavior of the sample mean in this case, and see whether or not the LLN is still valid.

from scipy.stats import cauchy

np.random.seed(1234)
N = 1_000

distribution = cauchy()

fig, ax = plt.subplots()
data = distribution.rvs(N)

# Compute sample mean at each n


sample_mean = np.empty(N)
for n in range(1, N):
sample_mean[n] = np.mean(data[:n])

# Plot
ax.plot(range(N), sample_mean, alpha=0.6, label='$\\bar X_n$')

ax.plot(range(N), np.zeros(N), 'k--', lw=0.5)


ax.legend()

plt.show()


The sequence shows no sign of converging.


Will convergence occur if we take 𝑛 even larger?
The answer is no.
To see this, recall that the characteristic function of the Cauchy distribution is

$$
\phi(t) = \mathbb{E} e^{itX} = \int e^{itx} f(x) \, dx = e^{-|t|} \tag{14.2}
$$

Using independence, the characteristic function of the sample mean becomes

$$
\begin{aligned}
\mathbb{E} e^{it \bar{X}_n}
  &= \mathbb{E} \exp \left\{ i \frac{t}{n} \sum_{j=1}^n X_j \right\} \\
  &= \mathbb{E} \prod_{j=1}^n \exp \left\{ i \frac{t}{n} X_j \right\} \\
  &= \prod_{j=1}^n \mathbb{E} \exp \left\{ i \frac{t}{n} X_j \right\} = [\phi(t/n)]^n
\end{aligned}
$$

In view of (14.2), this is just 𝑒−|𝑡| .


Thus, in the case of the Cauchy distribution, the sample mean itself has the very same Cauchy distribution, regardless of
𝑛!
In particular, the sequence 𝑋̄ 𝑛 does not converge to any point.
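A quick numerical illustration of this point (a sketch that is not part of the lecture): quantiles of the sample mean of 𝑛 Cauchy draws line up with quantiles of the Cauchy distribution itself, whatever the value of 𝑛, up to simulation error.

np.random.seed(0)
n, reps = 100, 50_000
sample_means = cauchy().rvs((reps, n)).mean(axis=1)

# empirical quantiles of the sample mean vs quantiles of the Cauchy distribution;
# each pair of numbers should roughly agree
for q in (0.25, 0.5, 0.75, 0.9):
    print(q, np.quantile(sample_means, q).round(2), cauchy().ppf(q).round(2))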

14.4 Classifying Tail Properties

To keep our discussion precise, we need some definitions concerning tail properties.
We will focus our attention on the right hand tails of nonnegative random variables and their distributions.
The definitions for left hand tails are very similar and we omit them to simplify the exposition.


14.4.1 Light and Heavy Tails

A distribution 𝐹 on ℝ+ is called heavy-tailed if

$$
\int_0^\infty \exp(tx) F(dx) = \infty \quad \text{for all } t > 0. \tag{14.3}
$$

We say that a nonnegative random variable 𝑋 is heavy-tailed if its distribution 𝐹 (𝑥) ∶= ℙ{𝑋 ≤ 𝑥} is heavy-tailed.
This is equivalent to stating that its moment generating function 𝑚(𝑡) ∶= 𝔼 exp(𝑡𝑋) is infinite for all 𝑡 > 0.
• For example, the lognormal distribution is heavy-tailed because its moment generating function is infinite everywhere on (0, ∞) (a numerical illustration appears after this list).
A distribution 𝐹 on ℝ+ is called light-tailed if it is not heavy-tailed.
A nonnegative random variable 𝑋 is light-tailed if its distribution 𝐹 is light-tailed.
• Example: Every random variable with bounded support is light-tailed. (Why?)
• Example: If 𝑋 has the exponential distribution, with cdf 𝐹 (𝑥) = 1 − exp(−𝜆𝑥) for some 𝜆 > 0, then its moment
generating function is finite whenever 𝑡 < 𝜆. Hence 𝑋 is light-tailed.
One can show that if 𝑋 is light-tailed, then all of its moments are finite.
The contrapositive is that if some moment is infinite, then 𝑋 is heavy-tailed.
The latter condition is not necessary, however.
• Example: the lognormal distribution is heavy-tailed but every moment is finite.
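The examples above can be illustrated numerically (a sketch that is not part of the lecture): for the exponential distribution the sample analogue of 𝔼 exp(𝑡𝑋) settles down near its finite population value 𝜆/(𝜆 − 𝑡), while for the heavy-tailed lognormal distribution it keeps growing with the sample size for any 𝑡 > 0.

np.random.seed(0)
t = 0.5

for m in (1_000, 100_000, 1_000_000):
    x_exp = np.random.exponential(scale=1.0, size=m)          # exponential with λ = 1
    x_ln = np.random.lognormal(mean=0.0, sigma=1.0, size=m)   # lognormal, heavy-tailed
    # the first number stays near λ/(λ - t) = 2; the second grows with m
    # (and varies wildly across seeds)
    print(m, np.mean(np.exp(t * x_exp)), np.mean(np.exp(t * x_ln)))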

14.4.2 Pareto Tails

One specific class of heavy-tailed distributions has been found repeatedly in economic and social phenomena: the class
of so-called power laws.
Specifically, given 𝛼 > 0, a nonnegative random variable 𝑋 is said to have a Pareto tail with tail index 𝛼 if

$$
\lim_{x \to \infty} x^\alpha \, \mathbb{P}\{X > x\} = c. \tag{14.4}
$$

Evidently (14.4) implies the existence of positive constants 𝑏 and 𝑥̄ such that ℙ{𝑋 > 𝑥} ≥ 𝑏𝑥−𝛼 whenever 𝑥 ≥ 𝑥.̄
The implication is that ℙ{𝑋 > 𝑥} converges to zero no faster than 𝑥−𝛼 .
In some sources, a random variable obeying (14.4) is said to have a power law tail.
The primary example is the Pareto distribution, which has distribution

$$
F(x) =
\begin{cases}
1 - (\bar{x}/x)^\alpha & \text{if } x \geq \bar{x} \\
0 & \text{if } x < \bar{x}
\end{cases}
\tag{14.5}
$$

for some positive constants 𝑥̄ and 𝛼.


It is easy to see that if 𝑋 ∼ 𝐹 , then ℙ{𝑋 > 𝑥} satisfies (14.4).
Thus, in line with the terminology, Pareto distributed random variables have a Pareto tail.
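A quick numerical check of (14.4) (a sketch that is not part of the lecture): drawing from a Pareto distribution with 𝑥̄ = 1 and 𝛼 = 2 by inverting the cdf (14.5), the scaled tail probability $x^\alpha \mathbb{P}\{X > x\}$ should be roughly constant, equal to $\bar{x}^\alpha = 1$ up to simulation error.

np.random.seed(0)
α, x_bar = 2.0, 1.0
X = x_bar / np.random.uniform(size=1_000_000) ** (1 / α)   # inverse-cdf sampling

for x in (2, 5, 10, 20):
    print(x, x**α * np.mean(X > x))     # each value should be close to 1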


14.4.3 Rank-Size Plots

One graphical technique for investigating Pareto tails and power laws is the so-called rank-size plot.
This kind of figure plots log size against log rank of the population (i.e., location in the population when sorted from
smallest to largest).
Often just the largest 5 or 10% of observations are plotted.
For a sufficiently large number of draws from a Pareto distribution, the plot generates a straight line. For distributions
with thinner tails, the data points are concave.
A discussion of why this occurs can be found in [NOM04].
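In code, a rank-size plot can be produced along the following lines (a sketch that is not part of the lecture and not the exercise solution; here the sample is drawn from an exponential distribution, which is light-tailed, so the plotted points bend rather than lying on a straight line).

sample = np.sort(np.random.exponential(size=10_000))
tail = sample[-1_000:]                    # keep the largest 10% of observations
ranks = np.arange(len(tail), 0, -1)       # rank 1 for the largest observation

fig, ax = plt.subplots()
ax.loglog(ranks, tail, 'o', ms=2, alpha=0.5)
ax.set_xlabel('log rank')
ax.set_ylabel('log size')
plt.show()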
The figure below provides one example, using simulated data.
The rank-size plots shows draws from three different distributions: folded normal, chi-squared with 1 degree of freedom
and Pareto.
The Pareto sample produces a straight line, while the lines produced by the other samples are concave.
You are asked to reproduce this figure in the exercises.
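If you would rather see the construction spelled out before attempting the exercise, here is a minimal sketch (with an arbitrary simulated sample, not the lecture's own code) of how a rank-size plot can be built by hand: sort the sample, attach ranks, keep the largest observations and plot on log-log axes.

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)                                  # arbitrary seed
sample = np.random.pareto(a=1.5, size=1_000) + 1    # Pareto sample with x̄ = 1

size = np.sort(sample)[::-1]                        # sizes, largest first
rank = np.arange(1, len(size) + 1)                  # rank 1 = largest observation
keep = rank <= int(0.1 * len(size))                 # keep only the top 10%

fig, ax = plt.subplots()
ax.loglog(rank[keep], size[keep], 'o', markersize=3, alpha=0.5)
ax.set_xlabel("log rank")
ax.set_ylabel("log size")
plt.show()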

14.5 Exercises

Exercise 14.5.1
Replicate the figure presented above that compares normal and Cauchy draws.
Use np.random.seed(11) to set the seed.

Exercise 14.5.2
Prove: If 𝑋 has a Pareto tail with tail index 𝛼, then 𝔼[𝑋^𝑟] = ∞ for all 𝑟 ≥ 𝛼.

Exercise 14.5.3
Repeat Exercise 14.5.1, but replace the three distributions (two normal, one Cauchy) with three Pareto distributions using
different choices of 𝛼.
For 𝛼, try 1.15, 1.5 and 1.75.
Use np.random.seed(11) to set the seed.

Exercise 14.5.4
Replicate the rank-size plot figure presented above.
If you like you can use the function qe.rank_size from the quantecon library to generate the plots.
Use np.random.seed(13) to set the seed.

Exercise 14.5.5


There is an ongoing argument about whether the firm size distribution should be modeled as a Pareto distribution or a
lognormal distribution (see, e.g., [FDGA+04], [KLS18] or [ST19a]).
This sounds esoteric but has real implications for a variety of economic phenomena.
To illustrate this fact in a simple way, let us consider an economy with 100,000 firms, an interest rate of r = 0.05 and
a corporate tax rate of 15%.
Your task is to estimate the present discounted value of projected corporate tax revenue over the next 10 years.
Because we are forecasting, we need a model.
We will suppose that
1. the number of firms and the firm size distribution (measured in profits) remain fixed and
2. the firm size distribution is either lognormal or Pareto.
Present discounted value of tax revenue will be estimated by
1. generating 100,000 draws of firm profit from the firm size distribution,
2. multiplying by the tax rate, and
3. summing the results with discounting to obtain present value.
The Pareto distribution is assumed to take the form (14.5) with 𝑥̄ = 1 and 𝛼 = 1.05.
(The value of the tail index 𝛼 is plausible given the data [Gab16].)
To make the lognormal option as similar as possible to the Pareto option, choose its parameters such that the mean and
median of both distributions are the same.
Note that, for each distribution, your estimate of tax revenue will be random because it is based on a finite number of
draws.
To take this into account, generate 100 replications (evaluations of tax revenue) for each of the two distributions and
compare the two samples by
• producing a violin plot visualizing the two samples side-by-side and
• printing the mean and standard deviation of both samples.
For the seed use np.random.seed(1234).
What differences do you observe?
(Note: a better approach to this problem would be to model firm dynamics and try to track individual firms given the
current distribution. We will discuss firm dynamics in later lectures.)

14.6 Solutions

Solution to Exercise 14.5.1

n = 120
np.random.seed(11)

fig, axes = plt.subplots(3, 1, figsize=(6, 12))

for ax in axes:
    ax.set_ylim((-120, 120))

s_vals = 2, 12

for ax, s in zip(axes[:2], s_vals):
    data = np.random.randn(n) * s
    ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4)
    ax.vlines(list(range(n)), 0, data, lw=0.2)
    ax.set_title(f"draws from $N(0, \sigma^2)$ with $\sigma = {s}$", fontsize=11)

ax = axes[2]
distribution = cauchy()
data = distribution.rvs(n)
ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4)
ax.vlines(list(range(n)), 0, data, lw=0.2)
ax.set_title("draws from the Cauchy distribution", fontsize=11)

plt.subplots_adjust(hspace=0.25)

plt.show()


Solution to Exercise 14.5.2


Let 𝑋 have a Pareto tail with tail index 𝛼 and let 𝐹 be its cdf.
Fix 𝑟 ≥ 𝛼.
As discussed after (14.4), we can take positive constants 𝑏 and 𝑥̄ such that

ℙ{𝑋 > 𝑥} ≥ 𝑏𝑥^{−𝛼}  whenever 𝑥 ≥ 𝑥̄

But then

𝔼𝑋^𝑟 = 𝑟 ∫₀^∞ 𝑥^{𝑟−1} ℙ{𝑋 > 𝑥} 𝑑𝑥 ≥ 𝑟 ∫₀^{𝑥̄} 𝑥^{𝑟−1} ℙ{𝑋 > 𝑥} 𝑑𝑥 + 𝑟 ∫_{𝑥̄}^{∞} 𝑥^{𝑟−1} 𝑏𝑥^{−𝛼} 𝑑𝑥.

We know that ∫_{𝑥̄}^{∞} 𝑥^{𝑟−𝛼−1} 𝑑𝑥 = ∞ whenever 𝑟 − 𝛼 − 1 ≥ −1.
Since 𝑟 ≥ 𝛼, we have 𝔼𝑋^𝑟 = ∞.

Solution to Exercise 14.5.3

from scipy.stats import pareto

np.random.seed(11)

n = 120
alphas = [1.15, 1.50, 1.75]

fig, axes = plt.subplots(3, 1, figsize=(6, 8))

for (a, ax) in zip(alphas, axes):
    ax.set_ylim((-5, 50))
    data = pareto.rvs(size=n, scale=1, b=a)
    ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4)
    ax.vlines(list(range(n)), 0, data, lw=0.2)
    ax.set_title(f"Pareto draws with $\\alpha = {a}$", fontsize=11)

plt.subplots_adjust(hspace=0.4)

plt.show()


Solution to Exercise 14.5.4


First let’s generate the data for the plots:

sample_size = 1000
np.random.seed(13)
z = np.random.randn(sample_size)

data_1 = np.abs(z)
data_2 = np.exp(z)
data_3 = np.exp(np.random.exponential(scale=1.0, size=sample_size))

data_list = [data_1, data_2, data_3]

Now we plot the data:

fig, axes = plt.subplots(3, 1, figsize=(6, 8))


axes = axes.flatten()
labels = ['$|z|$', '$\exp(z)$', 'Pareto with tail index $1.0$']

for data, label, ax in zip(data_list, labels, axes):
    rank_data, size_data = qe.rank_size(data)
    ax.loglog(rank_data, size_data, 'o', markersize=3.0, alpha=0.5, label=label)
    ax.set_xlabel("log rank")
    ax.set_ylabel("log size")
    ax.legend()

fig.subplots_adjust(hspace=0.4)

plt.show()


Solution to Exercise 14.5.5


To do the exercise, we need to choose the parameters 𝜇 and 𝜎 of the lognormal distribution to match the mean and median
of the Pareto distribution.
Here we understand the lognormal distribution as that of the random variable exp(𝜇 + 𝜎𝑍) when 𝑍 is standard normal.
The mean and median of the Pareto distribution (14.5) with 𝑥̄ = 1 are

mean = 𝛼/(𝛼 − 1)   and   median = 2^{1/𝛼}


Using the corresponding expressions for the lognormal distribution leads us to the equations

𝛼/(𝛼 − 1) = exp(𝜇 + 𝜎²/2)   and   2^{1/𝛼} = exp(𝜇)
which we solve for 𝜇 and 𝜎 given 𝛼 = 1.05.
Here is code that generates the two samples, produces the violin plot and prints the mean and standard deviation of the
two samples.

num_firms = 100_000
num_years = 10
tax_rate = 0.15
r = 0.05

β = 1 / (1 + r) # discount factor

x_bar = 1.0
α = 1.05

def pareto_rvs(n):
    "Uses a standard method to generate Pareto draws."
    u = np.random.uniform(size=n)
    y = x_bar / (u**(1/α))
    return y

Let’s compute the lognormal parameters:

μ = np.log(2) / α
σ_sq = 2 * (np.log(α/(α - 1)) - np.log(2)/α)
σ = np.sqrt(σ_sq)

Here’s a function to compute a single estimate of tax revenue for a particular choice of distribution dist.

def tax_rev(dist):
    tax_raised = 0
    for t in range(num_years):
        if dist == 'pareto':
            π = pareto_rvs(num_firms)
        else:
            π = np.exp(μ + σ * np.random.randn(num_firms))
        tax_raised += β**t * np.sum(π * tax_rate)
    return tax_raised

Now let’s generate the violin plot.

num_reps = 100
np.random.seed(1234)

tax_rev_lognorm = np.empty(num_reps)
tax_rev_pareto = np.empty(num_reps)

for i in range(num_reps):
    tax_rev_pareto[i] = tax_rev('pareto')
    tax_rev_lognorm[i] = tax_rev('lognorm')

fig, ax = plt.subplots()



data = tax_rev_pareto, tax_rev_lognorm

ax.violinplot(data)

plt.show()

Finally, let’s print the means and standard deviations.

tax_rev_pareto.mean(), tax_rev_pareto.std()

(1458729.0546623734, 406089.3613661567)

tax_rev_lognorm.mean(), tax_rev_lognorm.std()

(2556174.8615230713, 25586.44456513965)

Looking at the output of the code, our main conclusion is that the Pareto assumption leads to a lower mean and greater
dispersion.



CHAPTER

FIFTEEN

FAULT TREE UNCERTAINTIES

15.1 Overview

This lecture puts elementary tools to work to approximate probability distributions of the annual failure rates of a system
consisting of a number of critical parts.
We’ll use log normal distributions to approximate probability distributions of critical component parts.
To approximate the probability distribution of the sum of 𝑛 log normal probability distributions that describes the failure
rate of the entire system, we’ll compute the convolution of those 𝑛 log normal probability distributions.
We’ll use the following concepts and tools:
• log normal distributions
• the convolution theorem that describes the probability distribution of the sum of independent random variables
• fault tree analysis for approximating a failure rate of a multi-component system
• a hierarchical probability model for describing uncertain probabilities
• Fourier transforms and inverse Fourier transforms as efficient ways of computing convolutions of sequences
For more about Fourier transforms see this quantecon lecture Circulant Matrices as well as these lectures Covariance
Stationary Processes and Estimation of Spectra.
El-Shanawany, Ardron, and Walker [ESAW18] and Greenfield and Sargent [GS93] used some of the methods described
here to approximate probabilities of failures of safety systems in nuclear facilities.
These methods respond to some of the recommendations made by Apostolakis [Apo90] for constructing procedures for
quantifying uncertainty about the reliability of a safety system.
We’ll start by bringing in some Python machinery.

!pip install tabulate

Requirement already satisfied: tabulate in /usr/share/miniconda3/envs/quantecon/lib/python3.9/site-packages (0.8.9)

import numpy as np
from numpy import fft
import matplotlib.pyplot as plt
import scipy as sc
from scipy.signal import fftconvolve
from tabulate import tabulate


import time
%matplotlib inline

np.set_printoptions(precision=3, suppress=True)

15.2 Log normal distribution

If a random variable 𝑥 follows a normal distribution with mean 𝜇 and variance 𝜎², then the natural exponential of 𝑥, say
𝑦 = exp(𝑥), follows a log normal distribution with parameters 𝜇, 𝜎².
Notice that we said parameters and not mean and variance 𝜇, 𝜎².
• 𝜇 and 𝜎² are the mean and variance of 𝑥 = log(𝑦)
• they are not the mean and variance of 𝑦
• instead, the mean of 𝑦 is 𝑒^{𝜇 + 𝜎²/2} and the variance of 𝑦 is (𝑒^{𝜎²} − 1)𝑒^{2𝜇+𝜎²}
A log normal random variable 𝑦 is nonnegative.
The density for a log normal random variate 𝑦 is

𝑓(𝑦) = (1 / (𝑦𝜎√(2𝜋))) exp( −(log 𝑦 − 𝜇)² / (2𝜎²) )

for 𝑦 ≥ 0.
Important features of a log normal random variable are
• mean: 𝑒^{𝜇 + 𝜎²/2}
• variance: (𝑒^{𝜎²} − 1)𝑒^{2𝜇+𝜎²}
• median: 𝑒^{𝜇}
• mode: 𝑒^{𝜇 − 𝜎²}
• .95 quantile: 𝑒^{𝜇 + 1.645𝜎}
• .95–.05 quantile ratio: 𝑒^{2 × 1.645𝜎}

Recall the following stability property of two independent normally distributed random variables:
If 𝑥1 is normal with mean 𝜇1 and variance 𝜎1² and 𝑥2 is independent of 𝑥1 and normal with mean 𝜇2 and variance 𝜎2²,
then 𝑥1 + 𝑥2 is normally distributed with mean 𝜇1 + 𝜇2 and variance 𝜎1² + 𝜎2².
Independent log normal distributions have a different stability property.
The product of independent log normal random variables is also log normal.
In particular, if 𝑦1 is log normal with parameters (𝜇1, 𝜎1²) and 𝑦2 is log normal with parameters (𝜇2, 𝜎2²), then the product
𝑦1𝑦2 is log normal with parameters (𝜇1 + 𝜇2, 𝜎1² + 𝜎2²).
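A short simulation (a sketch with made-up parameter values, not part of the lecture's code) makes the stability property concrete: the log of the product 𝑦1𝑦2 should be approximately normal with mean 𝜇1 + 𝜇2 and variance 𝜎1² + 𝜎2².

import numpy as np

np.random.seed(0)
μ1, σ1 = 0.5, 0.8          # made-up parameters
μ2, σ2 = 1.0, 0.6

n = 1_000_000
y1 = np.random.lognormal(μ1, σ1, n)
y2 = np.random.lognormal(μ2, σ2, n)
log_prod = np.log(y1 * y2)

print(log_prod.mean(), μ1 + μ2)            # both ≈ 1.5
print(log_prod.var(), σ1**2 + σ2**2)       # both ≈ 1.0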
Note: While the product of two log normal distributions is log normal, the sum of two log normal distributions is not
log normal.
This observation sets the stage for the challenge that confronts us in this lecture, namely, to approximate probability
distributions of sums of independent log normal random variables.
To compute the probability distribution of the sum of two log normal distributions, we can use the following convolution
property of a probability distribution that is a sum of independent random variables.


15.3 The Convolution Property

Let 𝑥 be a random variable with probability density 𝑓(𝑥), where 𝑥 ∈ R.


Let 𝑦 be a random variable with probability density 𝑔(𝑦), where 𝑦 ∈ R.
Let 𝑥 and 𝑦 be independent random variables and let 𝑧 = 𝑥 + 𝑦 ∈ R.
Then the probability distribution of 𝑧 is

ℎ(𝑧) = (𝑓 ∗ 𝑔)(𝑧) ≡ ∫_{−∞}^{∞} 𝑓(𝜏)𝑔(𝑧 − 𝜏) 𝑑𝜏

where (𝑓 ∗ 𝑔) denotes the convolution of the two functions 𝑓 and 𝑔.


If the random variables are both nonnegative, then the above formula specializes to

ℎ(𝑧) = (𝑓 ∗ 𝑔)(𝑧) ≡ ∫₀^∞ 𝑓(𝜏)𝑔(𝑧 − 𝜏) 𝑑𝜏

Below, we’ll use a discretized version of the preceding formula.


In particular, we’ll replace both 𝑓 and 𝑔 with discretized counterparts, normalized to sum to 1 so that they are probability
distributions.
• by discretized we mean an equally spaced sampled version
Then we’ll use the following version of the above formula

ℎ_𝑛 = (𝑓 ∗ 𝑔)_𝑛 = ∑_{𝑚=0}^{𝑛} 𝑓_𝑚 𝑔_{𝑛−𝑚},   𝑛 ≥ 0

to compute a discretized version of the probability distribution of the sum of two random variables, one with probability
mass function 𝑓, the other with probability mass function 𝑔.
Before applying the convolution property to sums of log normal distributions, let’s practice on some simple discrete
distributions.
To take one example, let’s consider the following two probability distributions

𝑓𝑗 = Prob(𝑋 = 𝑗), 𝑗 = 0, 1

and

𝑔𝑗 = Prob(𝑌 = 𝑗), 𝑗 = 0, 1, 2, 3

and

ℎ𝑗 = Prob(𝑍 ≡ 𝑋 + 𝑌 = 𝑗), 𝑗 = 0, 1, 2, 3, 4

The convolution property tells us that

ℎ=𝑓 ∗𝑔 =𝑔∗𝑓

Let’s compute an example using numpy.convolve and scipy.signal.fftconvolve.


f = [.75, .25]
g = [0., .6, 0., .4]
h = np.convolve(f,g)
hf = fftconvolve(f,g)

print("f = ", f, ", np.sum(f) = ", np.sum(f))


print("g = ", g, ", np.sum(g) = ", np.sum(g))
print("h = ", h, ", np.sum(h) = ", np.sum(h))
print("hf = ", hf, ",np.sum(hf) = ", np.sum(hf))

f = [0.75, 0.25] , np.sum(f) = 1.0


g = [0.0, 0.6, 0.0, 0.4] , np.sum(g) = 1.0
h = [0. 0.45 0.15 0.3 0.1 ] , np.sum(h) = 1.0
hf = [0. 0.45 0.15 0.3 0.1 ] ,np.sum(hf) = 1.0000000000000002

A little later we’ll explain some advantages that come from using scipy.signal.fftconvolve rather than numpy.convolve.
They provide the same answers but scipy.signal.fftconvolve is much faster.
That’s why we rely on it later in this lecture.

15.4 Approximating Distributions

We’ll construct an example to verify that discretized distributions can do a good job of approximating samples drawn
from underlying continuous distributions.
We’ll start by generating samples of size 25000 of three independent log normal random variates as well as pairwise and
triple-wise sums.
Then we’ll plot histograms and compare them with convolutions of appropriate discretized log normal distributions.

## create sums of two and three log normal random variates: ssum2 = s1 + s2 and ssum3 = s1 + s2 + s3

mu1, sigma1 = 5., 1. # mean and standard deviation


s1 = np.random.lognormal(mu1, sigma1, 25000)

mu2, sigma2 = 5., 1. # mean and standard deviation


s2 = np.random.lognormal(mu2, sigma2, 25000)

mu3, sigma3 = 5., 1. # mean and standard deviation


s3 = np.random.lognormal(mu3, sigma3, 25000)

ssum2 = s1 + s2

ssum3 = s1 + s2 + s3

count, bins, ignored = plt.hist(s1, 1000, density=True, align='mid')


count, bins, ignored = plt.hist(ssum2, 1000, density=True, align='mid')

count, bins, ignored = plt.hist(ssum3, 1000, density=True, align='mid')


samp_mean2 = np.mean(s2)
pop_mean2 = np.exp(mu2+ (sigma2**2)/2)

pop_mean2, samp_mean2, mu2, sigma2

(244.69193226422038, 245.64259335142356, 5.0, 1.0)

Here are helper functions that create a discretized version of a log normal probability density function.

def p_log_normal(x,μ,σ):
    p = 1 / (σ*x*np.sqrt(2*np.pi)) * np.exp(-1/2*((np.log(x) - μ)/σ)**2)
    return p

def pdf_seq(μ,σ,I,m):
    x = np.arange(1e-7,I,m)
    p_array = p_log_normal(x,μ,σ)
    p_array_norm = p_array/np.sum(p_array)
    return p_array,p_array_norm,x

Now we shall set a grid length 𝐼 and a grid increment size 𝑚 for our discretizations.
Note: We set 𝐼 equal to a power of two because we want to be free to use a Fast Fourier Transform to compute a
convolution of two sequences (discrete distributions).
We recommend experimenting with different values of the power 𝑝 of 2.
Setting it to 15 rather than 12, for example, improves how well the discretized probability mass function approximates
the original continuous probability density function being studied.

p=15
I = 2**p # Truncation value
m = .1 # increment size


## Cell to check -- note what happens when we don't normalize!
## Things match up without adjustment; compare with the cell above.

p1,p1_norm,x = pdf_seq(mu1,sigma1,I,m)
## compute number of points to evaluate the probability mass function
NT = x.size

plt.figure(figsize = (8,8))
plt.subplot(2,1,1)
plt.plot(x[:NT],p1[:NT],label = '')
plt.xlim(0,2500)
count, bins, ignored = plt.hist(s1, 1000, density=True, align='mid')

plt.show()


# Compute mean from discretized pdf and compare with the theoretical value

mean= np.sum(np.multiply(x[:NT],p1_norm[:NT]))
meantheory = np.exp(mu1+.5*sigma1**2)
mean, meantheory

(244.6905989830291, 244.69193226422038)


15.5 Convolving Probability Mass Functions

Now let’s use the convolution theorem to compute the probability distribution of a sum of the two log normal random
variables we have parameterized above.
We’ll also compute the probability of a sum of three log normal distributions constructed above.
Before we do these things, we shall explain our choice of Python algorithm to compute a convolution of two sequences.
Because the sequences that we convolve are long, we use the scipy.signal.fftconvolve function rather than
the numpy.convolve function.
These two functions give virtually equivalent answers but for long sequences scipy.signal.fftconvolve is much
faster.
The program scipy.signal.fftconvolve uses fast Fourier transforms and their inverses to calculate convolu-
tions.
Let’s define the Fourier transform and the inverse Fourier transform.
The Fourier transform of a sequence {𝑥_𝑡}_{𝑡=0}^{𝑇−1} is a sequence of complex numbers {𝑥(𝜔_𝑗)}_{𝑗=0}^{𝑇−1} given by

𝑥(𝜔_𝑗) = ∑_{𝑡=0}^{𝑇−1} 𝑥_𝑡 exp(−𝑖𝜔_𝑗 𝑡)    (15.1)

where 𝜔_𝑗 = 2𝜋𝑗/𝑇 for 𝑗 = 0, 1, …, 𝑇 − 1.
The inverse Fourier transform of the sequence {𝑥(𝜔_𝑗)}_{𝑗=0}^{𝑇−1} is

𝑥_𝑡 = 𝑇^{−1} ∑_{𝑗=0}^{𝑇−1} 𝑥(𝜔_𝑗) exp(𝑖𝜔_𝑗 𝑡)    (15.2)

The sequences {𝑥_𝑡}_{𝑡=0}^{𝑇−1} and {𝑥(𝜔_𝑗)}_{𝑗=0}^{𝑇−1} contain the same information.
The pair of equations (15.1) and (15.2) tell how to recover one series from its Fourier partner.
The program scipy.signal.fftconvolve deploys the theorem that a convolution of two sequences {𝑓𝑘 }, {𝑔𝑘 }
can be computed in the following way:
• Compute Fourier transforms 𝐹 (𝜔), 𝐺(𝜔) of the {𝑓𝑘 } and {𝑔𝑘 } sequences, respectively
• Form the product 𝐻(𝜔) = 𝐹 (𝜔)𝐺(𝜔)
• The convolution of 𝑓 ∗ 𝑔 is the inverse Fourier transform of 𝐻(𝜔)
The fast Fourier transform and the associated inverse fast Fourier transform execute these calculations very quickly.
This is the algorithm that scipy.signal.fftconvolve uses.
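To make these three steps concrete, here is a minimal hand-rolled sketch of FFT-based convolution (not the actual scipy implementation, which handles many more details): zero-pad both sequences to the length of the full convolution, multiply their transforms, and invert. It should match numpy.convolve up to floating point error.

import numpy as np

def fft_convolve(f, g):
    "Convolve two sequences by multiplying their zero-padded Fourier transforms."
    n = len(f) + len(g) - 1               # length of the full convolution
    F = np.fft.fft(f, n)                  # fft with a length argument zero-pads
    G = np.fft.fft(g, n)
    return np.real(np.fft.ifft(F * G))    # drop tiny imaginary round-off

f = [.75, .25]
g = [0., .6, 0., .4]
print(fft_convolve(f, g))                 # ≈ [0.   0.45 0.15 0.3  0.1 ]
print(np.convolve(f, g))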
Let’s do a warmup calculation that compares the times taken by numpy.convolve and scipy.signal.fftconvolve.

p1,p1_norm,x = pdf_seq(mu1,sigma1,I,m)
p2,p2_norm,x = pdf_seq(mu2,sigma2,I,m)
p3,p3_norm,x = pdf_seq(mu3,sigma3,I,m)

tic = time.perf_counter()

c1 = np.convolve(p1_norm,p2_norm)
c2 = np.convolve(c1,p3_norm)

toc = time.perf_counter()

tdiff1 = toc - tic

tic = time.perf_counter()

c1f = fftconvolve(p1_norm,p2_norm)
c2f = fftconvolve(c1f,p3_norm)
toc = time.perf_counter()

tdiff2 = toc - tic

print("time with np.convolve = ", tdiff1, "; time with fftconvolve = ", tdiff2)

time with np.convolve = 78.51970149099998 ; time with fftconvolve = 0.11671551499966881

Judging by the timings above, the FFT-based method is roughly three orders of magnitude faster than numpy.convolve for sequences of this length.
Now let’s plot our computed probability mass function approximation for the sum of two log normal random variables
against the histogram of the sample that we formed above.

NT= np.size(x)

plt.figure(figsize = (8,8))
plt.subplot(2,1,1)
plt.plot(x[:NT],c1f[:NT]/m,label = '')
plt.xlim(0,5000)

count, bins, ignored = plt.hist(ssum2, 1000, density=True, align='mid')


# plt.plot(P2P3[:10000],label = 'FFT method',linestyle = '--')

plt.show()


NT= np.size(x)
plt.figure(figsize = (8,8))
plt.subplot(2,1,1)
plt.plot(x[:NT],c2f[:NT]/m,label = '')
plt.xlim(0,5000)

count, bins, ignored = plt.hist(ssum3, 1000, density=True, align='mid')


# plt.plot(P2P3[:10000],label = 'FFT method',linestyle = '--')

plt.show()


## Let's compute the mean of the discretized pdf


mean= np.sum(np.multiply(x[:NT],c1f[:NT]))
# meantheory = np.exp(mu1+.5*sigma1**2)
mean, 2*meantheory

(489.38109740938546, 489.38386452844077)

## Let's compute the mean of the discretized pdf


mean= np.sum(np.multiply(x[:NT],c2f[:NT]))
# meantheory = np.exp(mu1+.5*sigma1**2)
mean, 3*meantheory

(734.0714863312252, 734.0757967926611)

15.6 Failure Tree Analysis

We shall soon apply the convolution theorem to compute the probability of a top event in a failure tree analysis.
Before applying the convolution theorem, we first describe the model that connects constituent events to the top event whose
failure rate we seek to quantify.
The model is an example of the widely used failure tree analysis described by El-Shanawany, Ardron, and Walker
[ESAW18].
To construct the statistical model, we repeatedly use what is called the rare event approximation.
We want to compute the probability of an event 𝐴 ∪ 𝐵.
• the union 𝐴 ∪ 𝐵 is the event that 𝐴 OR 𝐵 occurs
A law of probability tells us that 𝐴 OR 𝐵 occurs with probability

𝑃 (𝐴 ∪ 𝐵) = 𝑃 (𝐴) + 𝑃 (𝐵) − 𝑃 (𝐴 ∩ 𝐵)

where the intersection 𝐴 ∩ 𝐵 is the event that 𝐴 AND 𝐵 both occur and the union 𝐴 ∪ 𝐵 is the event that 𝐴 OR 𝐵
occurs.


If 𝐴 and 𝐵 are independent, then

𝑃 (𝐴 ∩ 𝐵) = 𝑃 (𝐴)𝑃 (𝐵)

If 𝑃 (𝐴) and 𝑃 (𝐵) are both small, then 𝑃 (𝐴)𝑃 (𝐵) is even smaller.
The rare event approximation is

𝑃 (𝐴 ∪ 𝐵) ≈ 𝑃 (𝐴) + 𝑃 (𝐵)

This approximation is widely used in evaluating system failures.
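As a quick numerical sanity check (with made-up numbers), suppose 𝑃(𝐴) = 𝑃(𝐵) = 10⁻⁴ and the events are independent; the approximation error is then only 𝑃(𝐴)𝑃(𝐵) = 10⁻⁸.

P_A = P_B = 1e-4                    # made-up small failure probabilities

exact = P_A + P_B - P_A * P_B       # exact union probability under independence
approx = P_A + P_B                  # rare event approximation

print(exact, approx, (approx - exact) / exact)   # relative error ≈ 5e-5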

15.7 Application

A system has been designed with the feature that a system failure occurs when any of its 𝑛 critical components fails.
The failure probability 𝑃 (𝐴𝑖 ) of each event 𝐴𝑖 is small.
We assume that failures of the components are statistically independent random variables.
We repeatedly apply a rare event approximation to obtain the following formula for the probability of a system failure:

𝑃 (𝐹 ) ≈ 𝑃 (𝐴1 ) + 𝑃 (𝐴2 ) + ⋯ + 𝑃 (𝐴𝑛 )

or
𝑃(𝐹) ≈ ∑_{𝑖=1}^{𝑛} 𝑃(𝐴𝑖)    (15.3)

Probabilities for each event are recorded as failure rates per year.

15.8 Failure Rates Unknown

Now we come to the problem that really interests us, following [ESAW18] and Greenfield and Sargent [GS93] in the
spirit of Apostolakis [Apo90].
The constituent probabilities or failure rates 𝑃 (𝐴𝑖 ) are not known a priori and have to be estimated.
We address this problem by specifying probabilities of probabilities that capture one notion of not knowing the
constituent probabilities that are inputs into a failure tree analysis.
Thus, we assume that a system analyst is uncertain about the failure rates 𝑃 (𝐴𝑖 ), 𝑖 = 1, … , 𝑛 for components of a system.
The analyst copes with this situation by regarding the system’s failure probability 𝑃(𝐹) and each of the component
probabilities 𝑃(𝐴𝑖) as random variables.
• the dispersion of the probability distribution of 𝑃(𝐴𝑖) characterizes the analyst’s uncertainty about the failure
probability 𝑃(𝐴𝑖)
• the dispersion of the implied probability distribution of 𝑃 (𝐹 ) characterizes his uncertainty about the probability
of a system’s failure.
This leads to what is sometimes called a hierarchical model in which the analyst has probabilities about the probabilities
𝑃 (𝐴𝑖 ).
The analyst formalizes his uncertainty by assuming that
• the failure probability 𝑃 (𝐴𝑖 ) is itself a log normal random variable with parameters (𝜇𝑖 , 𝜎𝑖 ).


• failure rates 𝑃 (𝐴𝑖 ) and 𝑃 (𝐴𝑗 ) are statistically independent for all pairs with 𝑖 ≠ 𝑗.
The analyst calibrates the parameters (𝜇𝑖 , 𝜎𝑖 ) for the failure events 𝑖 = 1, … , 𝑛 by reading reliability studies in engineering
papers that have studied historical failure rates of components that are as similar as possible to the components being used
in the system under study.
The analyst assumes that such information about the observed dispersion of annual failure rates, or times to failure, can
inform him of what to expect about parts’ performances in his system.
The analyst assumes that the random variables 𝑃 (𝐴𝑖 ) are statistically mutually independent.
The analyst wants to approximate a probability mass function and cumulative distribution function of the systems failure
probability 𝑃 (𝐹 ).
• We say probability mass function because of how we discretize each random variable, as described earlier.
The analyst calculates the probability mass function for the top event 𝐹 , i.e., a system failure, by repeatedly applying
the convolution theorem to compute the probability distribution of a sum of independent log normal random variables, as
described in equation (15.3).

15.9 Waste Hoist Failure Rate

We’ll work with an example close to a real-world application by assuming that 𝑛 = 14.


The example estimates the annual failure rate of a critical hoist at a nuclear waste facility.
A regulatory agency wants the system to be designed in a way that makes the failure rate of the top event small with high
probability.
This example is Design Option B-2 (Case I) described in Table 10 on page 27 of [GS93].
The table describes parameters 𝜇𝑖 , 𝜎𝑖 for fourteen log normal random variables that consist of seven pairs of random
variables that are identically and independently distributed.
• Within a pair, parameters 𝜇𝑖 , 𝜎𝑖 are the same
• As described in table 10 of [GS93] p. 27, parameters of log normal distributions for the seven unique probabilities
𝑃 (𝐴𝑖 ) have been calibrated to be the values in the following Python code:

mu1, sigma1 = 4.28, 1.1947


mu2, sigma2 = 3.39, 1.1947
mu3, sigma3 = 2.795, 1.1947
mu4, sigma4 = 2.717, 1.1947
mu5, sigma5 = 2.717, 1.1947
mu6, sigma6 = 1.444, 1.4632
mu7, sigma7 = -.040, 1.4632

Note: Because the failure rates are all very small, log normal distributions with the above parameter values actually
describe 𝑃(𝐴𝑖) times 10⁻⁹.
So the probabilities that we’ll put on the 𝑥 axis of the probability mass function and associated cumulative distribution
function should be multiplied by 10⁻⁹.
To extract a table that summarizes computed quantiles, we’ll use a helper function

def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return idx


We compute the required thirteen convolutions in the following code.


(Please feel free to try different values of the power parameter 𝑝 that we use to set the number of points in our grid for
constructing the probability mass functions that discretize the continuous log normal distributions.)
We’ll plot a counterpart to the cumulative distribution function (CDF) in figure 5 on page 29 of [GS93] and we’ll also
present a counterpart to their Table 11 on page 28.

p=15
I = 2**p # Truncation value
m = .05 # increment size

p1,p1_norm,x = pdf_seq(mu1,sigma1,I,m)
p2,p2_norm,x = pdf_seq(mu2,sigma2,I,m)
p3,p3_norm,x = pdf_seq(mu3,sigma3,I,m)
p4,p4_norm,x = pdf_seq(mu4,sigma4,I,m)
p5,p5_norm,x = pdf_seq(mu5,sigma5,I,m)
p6,p6_norm,x = pdf_seq(mu6,sigma6,I,m)
p7,p7_norm,x = pdf_seq(mu7,sigma7,I,m)
p8,p8_norm,x = pdf_seq(mu7,sigma7,I,m)
p9,p9_norm,x = pdf_seq(mu7,sigma7,I,m)
p10,p10_norm,x = pdf_seq(mu7,sigma7,I,m)
p11,p11_norm,x = pdf_seq(mu7,sigma7,I,m)
p12,p12_norm,x = pdf_seq(mu7,sigma7,I,m)
p13,p13_norm,x = pdf_seq(mu7,sigma7,I,m)
p14,p14_norm,x = pdf_seq(mu7,sigma7,I,m)

tic = time.perf_counter()

c1 = fftconvolve(p1_norm,p2_norm)
c2 = fftconvolve(c1,p3_norm)
c3 = fftconvolve(c2,p4_norm)
c4 = fftconvolve(c3,p5_norm)
c5 = fftconvolve(c4,p6_norm)
c6 = fftconvolve(c5,p7_norm)
c7 = fftconvolve(c6,p8_norm)
c8 = fftconvolve(c7,p9_norm)
c9 = fftconvolve(c8,p10_norm)
c10 = fftconvolve(c9,p11_norm)
c11 = fftconvolve(c10,p12_norm)
c12 = fftconvolve(c11,p13_norm)
c13 = fftconvolve(c12,p14_norm)

toc = time.perf_counter()

tdiff13 = toc - tic

print("time for 13 convolutions = ", tdiff13)

time for 13 convolutions = 6.735937851999552

d13 = np.cumsum(c13)
Nx = 1400
plt.figure()


plt.plot(x[0:int(Nx/m)], d13[0:int(Nx/m)])  # plot the CDF; indices are scaled by the increment size m

plt.hlines(0.5,min(x),Nx,linestyles='dotted',colors = {'black'})
plt.hlines(0.9,min(x),Nx,linestyles='dotted',colors = {'black'})
plt.hlines(0.95,min(x),Nx,linestyles='dotted',colors = {'black'})
plt.hlines(0.1,min(x),Nx,linestyles='dotted',colors = {'black'})
plt.hlines(0.05,min(x),Nx,linestyles='dotted',colors = {'black'})
plt.ylim(0,1)
plt.xlim(0,Nx)
plt.xlabel("$x10^{-9}$",loc = "right")
plt.show()

x_1 = x[find_nearest(d13,0.01)]
x_5 = x[find_nearest(d13,0.05)]
x_10 = x[find_nearest(d13,0.1)]
x_50 = x[find_nearest(d13,0.50)]
x_66 = x[find_nearest(d13,0.665)]
x_85 = x[find_nearest(d13,0.85)]
x_90 = x[find_nearest(d13,0.90)]
x_95 = x[find_nearest(d13,0.95)]
x_99 = x[find_nearest(d13,0.99)]
x_9978 = x[find_nearest(d13,0.9978)]

print(tabulate([
['1%',f"{x_1}"],
['5%',f"{x_5}"],
['10%',f"{x_10}"],
['50%',f"{x_50}"],
['66.5%',f"{x_66}"],
['85%',f"{x_85}"],
['90%',f"{x_90}"],
['95%',f"{x_95}"],
['99%',f"{x_99}"],
['99.78%',f"{x_9978}"]],
headers = ['Percentile', 'x * 1e-9']))


Percentile x * 1e-9
------------ ----------
1% 76.15
5% 106.5
10% 128.2
50% 260.55
66.5% 338.55
85% 509.4
90% 608.8
95% 807.6
99% 1470.2
99.78% 2474.85

The above table agrees closely with column 2 of Table 11 on p. 28 of [GS93].


Discrepancies are probably due to slight differences in the number of digits retained in inputting 𝜇𝑖 , 𝜎𝑖 , 𝑖 = 1, … , 14 and
in the number of points deployed in the discretizations.



CHAPTER

SIXTEEN

INTRODUCTION TO ARTIFICIAL NEURAL NETWORKS

!pip install --upgrade jax jaxlib


!conda install -y -c plotly plotly plotly-orca retrying

Note: If you are running this on Google Colab the above cell will present an error. This is because Google Colab doesn’t
use Anaconda to manage the Python packages. However this lecture will still execute as Google Colab has plotly
installed.

16.1 Overview

Substantial parts of machine learning and artificial intelligence are about


• approximating an unknown function with a known function
• estimating the known function from a set of data on the left- and right-hand variables
This lecture describes the structure of a plain vanilla artificial neural network (ANN) of a type that is widely used to
approximate a function 𝑓 that maps 𝑥 in a space 𝑋 into 𝑦 in a space 𝑌 .
To introduce elementary concepts, we study an example in which 𝑥 and 𝑦 are scalars.
We’ll describe the following concepts that are brick and mortar for neural networks:
• a neuron
• an activation function
• a network of neurons
• A neural network as a composition of functions
• back-propagation and its relationship to the chain rule of differential calculus


16.2 A Deep (but not Wide) Artificial Neural Network

We describe a “deep” neural network of “width” one.


Deep means that the network composes a large number of functions organized into nodes of a graph.
Width refers to the number of variables on the right hand side of the function being approximated.
Setting “width” to one means that the network composes just univariate functions.
Let 𝑥 ∈ ℝ be a scalar and 𝑦 ∈ ℝ be another scalar.
We assume that 𝑦 is a nonlinear function of 𝑥:

𝑦 = 𝑓(𝑥)

We want to approximate 𝑓(𝑥) with another function that we define recursively.


For a network of depth 𝑁 ≥ 1, each layer 𝑖 = 1, … 𝑁 consists of
• an input 𝑥𝑖
• an affine function 𝑤𝑖 𝑥𝑖 + 𝑏𝑖, where 𝑤𝑖 is a scalar weight placed on the input 𝑥𝑖 and 𝑏𝑖 is a scalar bias
• an activation function ℎ𝑖 that takes (𝑤𝑖 𝑥𝑖 + 𝑏𝑖 ) as an argument and produces an output 𝑥𝑖+1
An example of an activation function ℎ is the sigmoid function
ℎ(𝑧) = 1 / (1 + 𝑒^{−𝑧})
Another popular activation function is the rectified linear unit (ReLU) function

ℎ(𝑧) = max(0, 𝑧)

Yet another activation function is the identity function

ℎ(𝑧) = 𝑧

As activation functions below, we’ll use the sigmoid function for layers 1 to 𝑁 − 1 and the identity function for layer 𝑁 .
To approximate a function 𝑓(𝑥) we construct an approximation 𝑓̂(𝑥) by proceeding as follows.
Let

𝑙𝑖 (𝑥) = 𝑤𝑖 𝑥 + 𝑏𝑖 .

We construct 𝑓̂ by iterating on compositions of functions ℎ𝑖 ∘ 𝑙𝑖:

𝑓(𝑥) ≈ 𝑓̂(𝑥) = ℎ_𝑁 ∘ 𝑙_𝑁 ∘ ℎ_{𝑁−1} ∘ 𝑙_{𝑁−1} ∘ ⋯ ∘ ℎ_1 ∘ 𝑙_1(𝑥)

If 𝑁 > 1, we call the right side a “deep” neural net.


The larger is the integer 𝑁 , the “deeper” is the neural net.
Evidently, if we know the parameters {𝑤𝑖, 𝑏𝑖}_{𝑖=1}^{𝑁}, then we can compute 𝑓̂(𝑥) for a given 𝑥 = 𝑥̃ by iterating on the
recursion

𝑥_{𝑖+1} = ℎ𝑖 ∘ 𝑙𝑖(𝑥𝑖),   𝑖 = 1, …, 𝑁    (16.1)

starting from 𝑥_1 = 𝑥̃.

The value of 𝑥_{𝑁+1} that emerges from this iterative scheme equals 𝑓̂(𝑥̃).
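As a preview of the computations that appear later in the lecture, here is a minimal NumPy sketch (with made-up weights and biases, not the lecture's own implementation) of the recursion (16.1), using sigmoid activations in layers 1 to 𝑁 − 1 and the identity in layer 𝑁.

import numpy as np

def σ(z):
    "Sigmoid activation function."
    return 1 / (1 + np.exp(-z))

def f_hat(x, w, b):
    "Evaluate the width-one network of depth N = len(w) by iterating on (16.1)."
    N = len(w)
    for i in range(N):
        z = w[i] * x + b[i]                    # the affine map l_i
        x = z if i == N - 1 else σ(z)          # identity in the final layer
    return x

# made-up parameters for a depth-3 network
w = [1.5, -2.0, 3.0]
b = [0.1, 0.3, -0.2]
print(f_hat(0.5, w, b))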
