Linear Algebra and Learning from Data


a note from the author

This message and this new textbook are about an established subject—linear algebra—leading to the much newer
subject of deep learning. May I express separate thoughts about those two subjects, and connect them. Two other
subjects are essential to success—statistics and optimization—and the book shows how and where they play a crucial
part.

Linear Algebra is completely accepted as basic to the undergraduate curriculum. But I don’t see that its surge
in importance is fully recognized. Multivariable algebra is far more widely used than multivariable calculus. Our
students are really missing out if our teaching is limited to matrix manipulation. It is factorization of the matrices that
we need in applications—into orthogonal and diagonal and triangular matrices.

Deep Learning is a particularly successful application to understanding data. It constructs a learning function
F(v) = w. The data vectors are v, and their meaning is w. F is constructed from a training set of known pairs v
and w. The word deep indicates that F is a composition F_L(. . . (F_1(v))) of L simple steps (the “depth” is L). Each
step involves a matrix A_i, a vector b_i, and a fixed nonlinear activation function : often F_i(v_{i-1}) = max(A_i v_{i-1} + b_i, 0), the ReLU nonlinearity applied componentwise. The matrices A_i and the vectors b_i are optimized to reproduce F(v) = w on the known training data, leading to good
accuracy on the unseen data.
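
A minimal sketch of this forward pass in NumPy. The depth, layer sizes, and random weights below are illustrative assumptions, not values from the book:

    import numpy as np

    def learning_function(v, weights, biases):
        # F(v) = F_L(...(F_1(v))) : each step is v -> max(A_i v + b_i, 0),
        # the ReLU activation applied componentwise
        for A, b in zip(weights, biases):
            v = np.maximum(A @ v + b, 0)
        return v

    rng = np.random.default_rng(0)
    sizes = [4, 8, 8, 2]      # assumed layer widths (depth L = 3)
    weights = [rng.standard_normal((m, n)) for n, m in zip(sizes, sizes[1:])]
    biases = [rng.standard_normal(m) for m in sizes[1:]]

    v = rng.standard_normal(4)                  # a data vector v
    w = learning_function(v, weights, biases)   # its computed meaning w

In training, the entries of each A_i and b_i are adjusted (by the optimization methods of Part VI) until F(v) reproduces the known w across the training set.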

This is the course I now teach : Math 18.065. And students come to it, knowing that the two subjects are important
for their future. They learn quite a lot about linear algebra, and they see how optimization finds those matrices Ai
in the learning function. Research labs and companies have data to analyze and understand, and this deep learning
approach has become widespread. Students learn key ideas from statistics, to measure the success of the learning
function F .

The course needs an instructor who wants to help. It begins with linear algebra—matrix factorizations A = QR
from Gram-Schmidt orthogonalization and S = QΛQ^T from eigenvalues and A = UΣV^T from singular values.
This is the heart of the subject and you could not teach any mathematics that is more useful.
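
As a quick illustration, the three factorizations can be computed and verified in a few lines of NumPy. The 3 by 3 matrix is a made-up example; it is symmetric, so S = A and all three factorizations apply to it:

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

    Q, R = np.linalg.qr(A)           # A = QR : orthonormal Q, triangular R
    lam, Qe = np.linalg.eigh(A)      # S = QΛQ^T : eigenvalues of symmetric A
    U, sig, Vt = np.linalg.svd(A)    # A = UΣV^T : singular values

    assert np.allclose(Q @ R, A)
    assert np.allclose(Qe @ np.diag(lam) @ Qe.T, A)
    assert np.allclose(U @ np.diag(sig) @ Vt, A)

Each assert checks that the computed factors multiply back to A.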

To help instructors and students, the 2018 lectures were recorded for MIT’s OpenCourseWare. They will be on
ocw.mit.edu in mid-April. It’s now 2019 and we have the textbook and more experience with the course. I would be
happy to send you the new 2019 videos when SIAM sends you a sample copy of the book.

Gilbert Strang
Department of Mathematics, MIT



Preface and Acknowledgments vi

Part I : Highlights of Linear Algebra 1


I.1 Multiplication Ax Using Columns of A 2
I.2 Matrix-Matrix Multiplication AB 9
I.3 The Four Fundamental Subspaces 14
I.4 Elimination and A = LU 21
I.5 Orthogonal Matrices and Subspaces 29
I.6 Eigenvalues and Eigenvectors 36
I.7 Symmetric Positive Definite Matrices 44
I.8 Singular Values and Singular Vectors in the SVD 56
I.9 Principal Components and the Best Low Rank Matrix 71
I.10 Rayleigh Quotients and Generalized Eigenvalues 81
I.11 Norms of Vectors and Functions and Matrices 88
I.12 Factoring Matrices and Tensors : Positive and Sparse 97

Part II : Computations with Large Matrices 113


II.1 Numerical Linear Algebra 115
II.2 Least Squares : Four Ways 124
II.3 Three Bases for the Column Space 138
II.4 Randomized Linear Algebra 146

Part III : Low Rank and Compressed Sensing 159


III.1 Changes in A^{-1} from Changes in A 160
III.2 Interlacing Eigenvalues and Low Rank Signals 168
III.3 Rapidly Decaying Singular Values 178
III.4 Split Algorithms for ℓ^2 + ℓ^1 184
III.5 Compressed Sensing and Matrix Completion 195

Part IV : Special Matrices 203


IV.1 Fourier Transforms : Discrete and Continuous 204
IV.2 Shift Matrices and Circulant Matrices 213
IV.3 The Kronecker Product A ⊗ B 221
IV.4 Sine and Cosine Transforms from Kronecker Sums 228
IV.5 Toeplitz Matrices and Shift Invariant Filters 232
IV.6 Graphs and Laplacians and Kirchhoff’s Laws 239
IV.7 Clustering by Spectral Methods and k-means 245
IV.8 Completing Rank One Matrices 255
IV.9 The Orthogonal Procrustes Problem 257
IV.10 Distance Matrices 259
Part V : Probability and Statistics 263
V.1 Mean, Variance, and Probability 264
V.2 Probability Distributions 275
V.3 Moments, Cumulants, and Inequalities of Statistics 284
V.4 Covariance Matrices and Joint Probabilities 294
V.5 Multivariate Gaussian and Weighted Least Squares 304
V.6 Markov Chains 311

Part VI : Optimization 321


VI.1 Minimum Problems : Convexity and Newton’s Method 324
VI.2 Lagrange Multipliers = Derivatives of the Cost 333
VI.3 Linear Programming, Game Theory, and Duality 338
VI.4 Gradient Descent Toward the Minimum 344
VI.5 Stochastic Gradient Descent and ADAM 359

Part VII : Learning from Data 371


VII.1 The Construction of Deep Neural Networks 375
VII.2 Convolutional Neural Nets 387
VII.3 Backpropagation and the Chain Rule 397
VII.4 Hyperparameters : The Fateful Decisions 407
VII.5 The World of Machine Learning 413

Books on Machine Learning 416

Eigenvalues and Singular Values : Rank One 417

Codes and Algorithms for Numerical Linear Algebra 418

Counting Parameters in the Basic Factorizations 419

Index of Authors 420

Index 423

Index of Symbols 432
