Your Paper
You
May 12, 2025
Abstract
Your abstract.
1 Introduction
Let $M = A^T A$; then $M$ is a symmetric positive semi-definite matrix.
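As a quick numerical sanity check (an illustration only, not part of the argument), one can verify with a randomly generated $A$ that $M = A^T A$ is symmetric with non-negative eigenvalues; the snippet below uses NumPy.
\begin{verbatim}
# Sanity check (illustration only): for a random A, M = A^T A should be
# symmetric and positive semi-definite.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
M = A.T @ A

assert np.allclose(M, M.T)                      # symmetry
assert np.all(np.linalg.eigvalsh(M) >= -1e-12)  # eigenvalues numerically non-negative
\end{verbatim}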
Theorem 1.1. Eigenvectors of $M$ corresponding to distinct eigenvalues are pairwise orthogonal.
Proof. Let $\lambda_1, \lambda_2$ be two eigenvalues of $M$ with corresponding eigenvectors $x, y$, so that
\[
M x = \lambda_1 x, \qquad M y = \lambda_2 y, \qquad M^T = M.
\]
Then
\[
\lambda_1 x^T y = (M x)^T y = x^T M^T y = x^T M y = \lambda_2 x^T y
\quad\Rightarrow\quad (\lambda_1 - \lambda_2)\, x^T y = 0.
\]
Therefore, if $\lambda_1 \neq \lambda_2$, then $x^T y = 0$, i.e.\ $x$ is orthogonal to $y$.
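The statement can be illustrated numerically; the sketch below assumes a random $A$ (so the eigenvalues of $M$ are distinct with probability one) and checks that eigenvectors belonging to different eigenvalues are orthogonal. NumPy's eigh returns an orthonormal basis in any case, so this is only a consistency check.
\begin{verbatim}
# Illustration of Theorem 1.1: eigenvectors of M = A^T A belonging to
# distinct eigenvalues have (numerically) zero inner product.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 4))
M = A.T @ A

eigvals, eigvecs = np.linalg.eigh(M)  # columns of eigvecs are eigenvectors of M
for i in range(len(eigvals)):
    for j in range(i + 1, len(eigvals)):
        if not np.isclose(eigvals[i], eigvals[j]):
            assert abs(eigvecs[:, i] @ eigvecs[:, j]) < 1e-10
\end{verbatim}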
Theorem 1.2. The spectral norm of $M$ equals the largest eigenvalue of $M$.
Proof. Let $\lambda$ be an eigenvalue of $M$ with corresponding eigenvector $v$. Then
\[
(M^T M) v = (A^T A A^T A) v = A^T A (\lambda v) = \lambda (A^T A v) = \lambda^2 v,
\]
so $v$ is also an eigenvector of $M^T M$, with eigenvalue $\lambda^2$. Since $M$ is symmetric, it admits the eigendecomposition
\[
M = V \Lambda V^T, \qquad M^T M = V \Lambda V^T V \Lambda V^T = V \Lambda^T \Lambda V^T = V \Lambda^2 V^T,
\]
where $V$ is the orthogonal matrix of eigenvectors and $\Lambda$ the diagonal matrix of eigenvalues. Hence
\[
\|M\|_2 = \sqrt{\lambda_{\max}(M^T M)} = \sqrt{\lambda_{\max}(M)^2} = \lambda_{\max}(M),
\]
where the last equality holds because $M$ is positive semi-definite and its eigenvalues are therefore non-negative.
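A short numerical illustration of the theorem (with a random $A$; illustration only):
\begin{verbatim}
# Illustration of Theorem 1.2: the spectral norm of M = A^T A (its largest
# singular value) coincides with its largest eigenvalue.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 5))
M = A.T @ A

spectral_norm = np.linalg.norm(M, 2)            # largest singular value of M
largest_eigenvalue = np.linalg.eigvalsh(M)[-1]  # eigvalsh sorts ascending
assert np.isclose(spectral_norm, largest_eigenvalue)
\end{verbatim}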
2 Current Approach
2.1 Notation
$L$ is the learning rate and $w$ is the optimal coefficient vector obtained in the regression problem, with
\[
L = \|A^T A\|_2 \qquad \text{and} \qquad w = (A^T A)^{-1} A^T b.
\]
To center the data, let $x^c_i = x_i - \mu$ for $i = 1, 2, \dots, n$, where $\mu = \frac{1}{n}\sum_{i=1}^n x_i$ is the mean. We then scale each column by the inverse of its standard deviation, the standard deviations being collected in the diagonal matrix $D$:
à = ĀD−1 = (A − µ.1)D−1
After standardizing the matrix, the corresponding learning rate and coefficient vector are denoted $L'$ and $w'$.
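The sketch below shows how the quantities of this subsection could be computed on synthetic data; the names ($A$, $b$, $D$, $L$, $w$, and their primed counterparts) follow the text, but the data itself and the NumPy realisation are assumptions for illustration.
\begin{verbatim}
# Minimal sketch of the notation in Section 2.1 on synthetic data
# (assumption: columns of A are features, b is the target vector).
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 4)) * np.array([1.0, 5.0, 0.5, 10.0])  # columns on different scales
b = rng.standard_normal(50)

# learning rate and coefficient vector for the raw matrix
L = np.linalg.norm(A.T @ A, 2)
w = np.linalg.solve(A.T @ A, A.T @ b)

# centering: subtract the column means mu
mu = A.mean(axis=0)
A_bar = A - mu

# standardizing: divide each column by its standard deviation, collected in D
D = np.diag(A_bar.std(axis=0))
A_tilde = A_bar @ np.linalg.inv(D)

# counterparts after standardization
L_prime = np.linalg.norm(A_tilde.T @ A_tilde, 2)
w_prime = np.linalg.solve(A_tilde.T @ A_tilde, A_tilde.T @ b)
\end{verbatim}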
2.2 Approach
\begin{align*}
w' = (\tilde{A}^T \tilde{A})^{-1} \tilde{A}^T b &= (D^{-1} \bar{A}^T \bar{A} D^{-1})^{-1} \tilde{A}^T b \\
&= \bigl(D (\bar{A}^T \bar{A})^{-1} D\bigr) (D^{-1} \bar{A}^T) b \\
&= D (\bar{A}^T \bar{A})^{-1} \bar{A}^T b.
\end{align*}
Therefore,
\begin{align}
L' \|w'\|_2 &= \|\tilde{A}^T \tilde{A}\|_2 \,\|(\tilde{A}^T \tilde{A})^{-1} \tilde{A}^T b\|_2 \nonumber \\
&= \|D^{-1} \bar{A}^T \bar{A} D^{-1}\|_2 \,\|D (\bar{A}^T \bar{A})^{-1} \bar{A}^T b\|_2 \nonumber \\
&\le \|D^{-1}\|_2^2 \,\|D\|_2 \,\|\bar{A}^T \bar{A}\|_2 \,\|(\bar{A}^T \bar{A})^{-1} \bar{A}^T b\|_2, \tag{1}
\end{align}
where the inequality follows from the submultiplicativity of the spectral norm.
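As a numerical check, one can verify both the closed form for $w'$ and bound (1); the snippet reuses the synthetic setup from the sketch in Section 2.1, which is an assumption rather than data from the paper.
\begin{verbatim}
# Check of the closed form w' = D (A_bar^T A_bar)^{-1} A_bar^T b and of
# bound (1) on synthetic data (illustration only).
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 4)) * np.array([1.0, 5.0, 0.5, 10.0])
b = rng.standard_normal(50)

mu = A.mean(axis=0)
A_bar = A - mu
D = np.diag(A_bar.std(axis=0))
A_tilde = A_bar @ np.linalg.inv(D)

# w' computed directly and via the closed form
w_prime = np.linalg.solve(A_tilde.T @ A_tilde, A_tilde.T @ b)
w_prime_closed = D @ np.linalg.solve(A_bar.T @ A_bar, A_bar.T @ b)
assert np.allclose(w_prime, w_prime_closed)

# left- and right-hand sides of bound (1)
lhs = np.linalg.norm(A_tilde.T @ A_tilde, 2) * np.linalg.norm(w_prime, 2)
rhs = (np.linalg.norm(np.linalg.inv(D), 2) ** 2
       * np.linalg.norm(D, 2)
       * np.linalg.norm(A_bar.T @ A_bar, 2)
       * np.linalg.norm(np.linalg.solve(A_bar.T @ A_bar, A_bar.T @ b), 2))
assert lhs <= rhs + 1e-9
\end{verbatim}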