
Feature Standardization: Optimization Algorithms in Deep Learning

You
May 12, 2025

Abstract
Your abstract.

1 Introduction
Let $M = A^T A$; then $M$ is symmetric and positive semi-definite.
Theorem 1.1. Eigenvectors of $M$ corresponding to distinct eigenvalues are pairwise orthogonal.
Proof. Let $\lambda_1, \lambda_2$ be two eigenvalues of $M$ with corresponding eigenvectors $x, y$, so that
\[
Mx = \lambda_1 x, \qquad My = \lambda_2 y, \qquad M^T = M.
\]
Then
\[
\lambda_1 x^T y = (Mx)^T y = x^T M^T y = x^T M y = x^T (\lambda_2 y) = \lambda_2 x^T y
\]
\[
\Rightarrow \; (\lambda_1 - \lambda_2)\, x^T y = 0.
\]
Therefore $x^T y = 0$, i.e. $x$ is perpendicular to $y$, whenever $\lambda_1 \neq \lambda_2$.
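As a quick numerical sketch of Theorem 1.1 (an illustration, not part of the original argument; the random matrix A below is a hypothetical stand-in):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
M = A.T @ A                      # symmetric positive semi-definite

# np.linalg.eigh handles symmetric matrices; with probability one the
# eigenvalues of a random M are distinct, so the theorem applies.
eigvals, eigvecs = np.linalg.eigh(M)

# Gram matrix of the eigenvectors: the off-diagonal entries (the
# pairwise dot products x^T y) vanish.
print(np.allclose(eigvecs.T @ eigvecs, np.eye(4)))   # True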

Theorem 1.2. The spectral norm of $M$ equals the largest eigenvalue of $M$.


Proof. Let $\lambda$ be an eigenvalue of $M$ with eigenvector $v$. Then
\[
(M^T M)v = (A^T A A^T A)v = A^T A(\lambda v) = \lambda (A^T A v) = \lambda^2 v,
\]
so $v$ is also an eigenvector of $M^T M$, with eigenvalue $\lambda^2$.

Since $M$ is symmetric, it admits the eigendecomposition
\[
M = V \Lambda V^T, \qquad M^T M = V \Lambda V^T V \Lambda V^T = V \Lambda^2 V^T,
\]
where $V$ is the orthogonal matrix of eigenvectors of $M$ and $\Lambda$ the diagonal matrix of its eigenvalues. Hence
\[
\|M\|_2 = \sqrt{\lambda_{\max}(M^T M)} = \sqrt{\lambda_{\max}(M)^2} = \lambda_{\max}(M),
\]
since the eigenvalues of $M = A^T A$ are nonnegative.
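The claim can likewise be checked numerically (a minimal sketch; the random A is again hypothetical):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
M = A.T @ A

# ord=2 gives the spectral norm (largest singular value) of a matrix
spectral_norm = np.linalg.norm(M, 2)
largest_eigenvalue = np.linalg.eigvalsh(M).max()
print(np.isclose(spectral_norm, largest_eigenvalue))  # True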

2 Current Approach
2.1 Notation
$L$ is the learning rate and $w$ is the optimal coefficient vector obtained in the regression problem, with
\[
L = \|A^T A\|_2 \qquad \text{and} \qquad w = (A^T A)^{-1} A^T b.
\]


Consider centering the data: $x^c_i = x_i - \mu$ for $i = 1, 2, \dots, n$, where $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$ is the mean. We then multiply by the inverse of the diagonal matrix $D$ that collects the standard deviation of each column:
\[
\tilde{A} = \bar{A} D^{-1} = (A - \mathbf{1}\mu^T) D^{-1}.
\]

After standardizing the matrix, the corresponding learning rate and coefficient vector are denoted $L'$ and $w'$.
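The standardization above can be sketched in NumPy as follows (the design matrix A and target b are hypothetical placeholders; the column scales are exaggerated to make the effect visible):

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((100, 3)) * [1.0, 10.0, 0.1]  # columns on different scales
b = rng.standard_normal(100)

mu = A.mean(axis=0)                  # column means
D = np.diag(A.std(axis=0))           # diagonal matrix of column standard deviations

A_bar = A - mu                       # centered matrix  A-bar
A_tilde = A_bar @ np.linalg.inv(D)   # standardized matrix  A-tilde = A-bar D^{-1}

L = np.linalg.norm(A.T @ A, 2)           # L = ||A^T A||_2
w = np.linalg.solve(A.T @ A, A.T @ b)    # w = (A^T A)^{-1} A^T b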

2.2 Approach

\[
(\tilde{A}^T \tilde{A})^{-1} \tilde{A}^T b = (D^{-1} \bar{A}^T \bar{A} D^{-1})^{-1} \tilde{A}^T b
= \big(D (\bar{A}^T \bar{A})^{-1} D\big)\big(D^{-1} \bar{A}^T\big) b
= D (\bar{A}^T \bar{A})^{-1} \bar{A}^T b.
\]

Therefore,

\[
L' \|w'\|_2 = \|\tilde{A}^T \tilde{A}\|_2 \, \|(\tilde{A}^T \tilde{A})^{-1} \tilde{A}^T b\|_2
= \|D^{-1} \bar{A}^T \bar{A} D^{-1}\|_2 \, \|D (\bar{A}^T \bar{A})^{-1} \bar{A}^T b\|_2
\le \|D^{-1}\|_2^2 \, \|D\|_2 \, \|\bar{A}^T \bar{A}\|_2 \, \|(\bar{A}^T \bar{A})^{-1} \bar{A}^T b\|_2. \tag{1}
\]
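As a sanity check on the derivation and on inequality (1), a self-contained NumPy sketch (same hypothetical A and b as in the earlier snippet):

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((100, 3)) * [1.0, 10.0, 0.1]
b = rng.standard_normal(100)

A_bar = A - A.mean(axis=0)
D = np.diag(A.std(axis=0))
A_tilde = A_bar @ np.linalg.inv(D)

# Closed form derived above: (A~^T A~)^{-1} A~^T b = D (A-bar^T A-bar)^{-1} A-bar^T b
w_prime = np.linalg.solve(A_tilde.T @ A_tilde, A_tilde.T @ b)
closed_form = D @ np.linalg.solve(A_bar.T @ A_bar, A_bar.T @ b)
print(np.allclose(w_prime, closed_form))              # True

# Left- and right-hand sides of inequality (1)
lhs = np.linalg.norm(A_tilde.T @ A_tilde, 2) * np.linalg.norm(w_prime)
rhs = (np.linalg.norm(np.linalg.inv(D), 2) ** 2
       * np.linalg.norm(D, 2)
       * np.linalg.norm(A_bar.T @ A_bar, 2)
       * np.linalg.norm(np.linalg.solve(A_bar.T @ A_bar, A_bar.T @ b)))
print(lhs <= rhs)                                     # True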
