Week 16 Lecture 01 02 SVD and CUR (Example)

The document discusses the concept of dimensionality reduction in data mining, particularly focusing on the representation of data in lower-dimensional subspaces. It explains how to compress data while maintaining essential information, using matrix decomposition techniques such as Singular Value Decomposition (SVD). The goal is to uncover hidden correlations, reduce noise, and facilitate easier data processing and visualization.

Note to other teachers and users of these slides: We would be delighted if you found our material useful for your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org

Mining of Massive Datasets


Jure Leskovec, Anand Rajaraman, Jeff Ullman
Stanford University
http://www.mmds.org
Acknowledgment
Jure Leskovec, Anand Rajaraman, Jeff Ullman
 Assumption: Data lies on or near a low d-dimensional subspace
 The axes of this subspace are an effective representation of the data
 Compress / reduce dimensionality:
▪ 10^6 rows; 10^3 columns; no updates
▪ Random access to any cell(s); small error: OK

(The slide shows an example matrix that is really "2-dimensional": every row can be reconstructed by scaling [1 1 1 0 0] or [0 0 0 1 1].)
 Q: What is the rank of a matrix A?
 A: The number of linearly independent columns of A
 For example:
▪ Matrix A = [1 2 1; -2 -3 1; 3 5 0] has rank r = 2
▪ Why? The first two rows are linearly independent, so the rank is at least 2, but all three rows are linearly dependent (the first is equal to the sum of the second and third), so the rank must be less than 3.
 Why do we care about low rank?
▪ We can write the rows of A in terms of two "basis" vectors: [1 2 1] and [-2 -3 1]
▪ The new coordinates of the rows are then: [1 0], [0 1], [1 -1] (a quick check follows below)
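For illustration, a minimal NumPy sketch (assuming the 3x3 matrix and the two basis vectors above) that checks the rank and recovers the new coordinates of each row:

import numpy as np

A = np.array([[ 1,  2, 1],
              [-2, -3, 1],
              [ 3,  5, 0]], dtype=float)
print(np.linalg.matrix_rank(A))          # 2

# Express every row of A in the basis {[1 2 1], [-2 -3 1]}
# by solving basis.T @ coords = row (least squares).
basis = np.array([[ 1,  2, 1],
                  [-2, -3, 1]], dtype=float)
coords, *_ = np.linalg.lstsq(basis.T, A.T, rcond=None)
print(coords.T)                          # rows: [1 0], [0 1], [1 -1]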
 Cloud of points in 3D space:
▪ Think of the point positions as a matrix A, with one row per point (points A, B, C)
 We can rewrite the coordinates more efficiently!
▪ Old basis vectors: [1 0 0], [0 1 0], [0 0 1]
▪ New basis vectors: [1 2 1], [-2 -3 1]
▪ Then point A has new coordinates [1 0], B: [0 1], C: [1 1]
▪ Notice: we reduced the number of coordinates per point!
 The goal of dimensionality reduction is to discover the axes of the data!

Rather than representing every point with 2 coordinates, we represent each point with 1 coordinate (corresponding to the position of the point on the red line).

By doing this we incur a bit of error, as the points do not lie exactly on the line.


Why reduce dimensions?
 Discover hidden correlations/topics
▪ e.g., words that commonly occur together
 Remove redundant and noisy features
▪ not all words are useful
 Interpretation and visualization
 Easier storage and processing of the data


A[m×n] = U[m×r] Σ[r×r] (V[n×r])T
 A: Input data matrix
▪ m×n matrix (e.g., m documents, n terms)
 U: Left singular vectors
▪ m×r matrix (m documents, r concepts)
 Σ: Singular values
▪ r×r diagonal matrix (strength of each 'concept')
(r: rank of the matrix A)
 V: Right singular vectors
▪ n×r matrix (n terms, r concepts)
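As a concrete illustration (example code, not part of the slides), np.linalg.svd with full_matrices=False returns exactly these "economy" factors:

import numpy as np

m, n = 7, 5
A = np.random.rand(m, n)

U, s, VT = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)                        # r x r diagonal matrix of singular values

print(U.shape, Sigma.shape, VT.shape)     # (7, 5) (5, 5) (5, 5) -- here r = min(m, n)
print(np.allclose(A, U @ Sigma @ VT))     # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(n)))    # True: columns of U are orthonormal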
(Figure: A, an m×n matrix, drawn as the product U · Σ · VT.)


(Figure: A, an m×n matrix, drawn as a sum of rank-1 pieces.)
A ≈ σ1 u1 v1T + σ2 u2 v2T + …
σi … scalar
ui … vector
vi … vector
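The same identity written out numerically; a short sketch (assumed example) that rebuilds A as a sum of rank-1 terms σi ui viT:

import numpy as np

A = np.random.rand(6, 4)
U, s, VT = np.linalg.svd(A, full_matrices=False)

B = np.zeros_like(A)
for i in range(len(s)):
    B += s[i] * np.outer(U[:, i], VT[i, :])   # sigma_i * u_i * v_i^T

print(np.allclose(A, B))                      # True: A equals the sum of its rank-1 terms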
It is always possible to decompose a real matrix A into A = U Σ VT, where
 U, Σ, V: unique
 U, V: column orthonormal
▪ UTU = I; VTV = I (I: identity matrix)
▪ (columns are orthogonal unit vectors)
 Σ: diagonal
▪ entries (singular values) are positive, and sorted in decreasing order (σ1 ≥ σ2 ≥ ... ≥ 0)

Nice proof of uniqueness: http://www.mpi-inf.mpg.de/~bast/ir-seminar-ws04/lecture2.pdf
 A = U Σ VT - example: Users to Movies
(columns: Matrix, Alien, Serenity, Casablanca, Amelie)

        Matrix Alien Serenity Casablanca Amelie
SciFi      1     1      1         0        0
           3     3      3         0        0
           4     4      4         0        0
           5     5      5         0        0
Romance    0     2      0         4        4
           0     0      0         5        5
           0     1      0         2        2

A = U Σ VT, where the factors describe "concepts" (AKA latent dimensions, AKA latent factors).


 A = U Σ VT - example: Users to Movies
(columns: Matrix, Alien, Serenity, Casablanca, Amelie; the first four users are SciFi fans, the last three Romance fans)

     A        =        U          x       Σ        x        VT
1 1 1 0 0        0.13  0.02 -0.01     12.4  0   0      0.56  0.59  0.56  0.09  0.09
3 3 3 0 0        0.41  0.07 -0.03      0   9.5  0      0.12 -0.02  0.12 -0.69 -0.69
4 4 4 0 0        0.55  0.09 -0.04      0    0  1.3     0.40 -0.80  0.40  0.09  0.09
5 5 5 0 0        0.68  0.11 -0.05
0 2 0 4 4        0.15 -0.59  0.65
0 0 0 5 5        0.07 -0.73 -0.67
0 1 0 2 2        0.07 -0.29  0.32
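The factors above can be reproduced, up to rounding and sign, with a few lines of NumPy (a sketch, not part of the original slides):

import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, VT = np.linalg.svd(A, full_matrices=False)
print(np.round(s, 1))           # approx. [12.4  9.5  1.3  0.  0.]
print(np.round(U[:, :3], 2))    # matches the U above (up to sign)
print(np.round(VT[:3, :], 2))   # matches the V^T above (up to sign)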
 A = U Σ VT - example: the first concept is the "SciFi-concept" and the second is the "Romance-concept" (same decomposition as above).
 A = U Σ VT - example: U is the "user-to-concept" similarity matrix; its first two columns correspond to the SciFi-concept and the Romance-concept (same decomposition as above).
 A = U Σ VT - example: the first singular value, 12.4, is the "strength" of the SciFi-concept (same decomposition as above).
 A = U Σ VT - example: V is the "movie-to-concept" similarity matrix; the first row of VT is the SciFi-concept (same decomposition as above).
'movies', 'users' and 'concepts':
 U: user-to-concept similarity matrix
 V: movie-to-concept similarity matrix
 Σ: its diagonal elements give the 'strength' of each concept


(Figure: users plotted by Movie 1 rating vs. Movie 2 rating, with v1, the first right singular vector, drawn through the point cloud.)
 Instead of using two coordinates (x, y) to describe point locations, let's use only one coordinate z
 A point's position is its location along the vector v1
 How to choose v1? Minimize the reconstruction error
 Goal: Minimize the sum of reconstruction errors:

Σ_{i=1..N} Σ_{j=1..D} (xij - zij)²

▪ where the xij are the "old" and the zij are the "new" coordinates
 SVD gives the 'best' axis to project on:
▪ 'best' = minimizing the reconstruction errors
 In other words, the axis of minimum reconstruction error (a numerical sketch follows below)
(Figure: v1, the first right singular vector, in the Movie 1 rating vs. Movie 2 rating plane.)
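A numerical sketch of this objective (assumed example, using the ratings matrix from the running example): project the user points onto v1 and measure the total squared reconstruction error:

import numpy as np

X = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, VT = np.linalg.svd(X, full_matrices=False)
v1 = VT[0]                        # first right singular vector (unit length)

z = X @ v1                        # 1-D coordinate of each point along v1
X_rec = np.outer(z, v1)           # reconstruction back in the original space
err = np.sum((X - X_rec) ** 2)    # sum_ij (x_ij - z_ij)^2

print(err)                        # equals sigma_2^2 + sigma_3^2 + ...
print(np.sum(s[1:] ** 2))         # same value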
 A = U Σ VT - example:
▪ V: "movie-to-concept" matrix
▪ U: "user-to-concept" matrix
(Same decomposition as above. Figure: v1, the first right singular vector, in the Movie 1 rating vs. Movie 2 rating plane.)
 A = U Σ VT - example: the first singular value also measures the variance ('spread') of the points along the v1 axis (same decomposition as above).
A = U Σ VT - example:
 U Σ gives the coordinates of the points (users) along the projection axes

Projection of the users on the "SciFi" axis = the first column of U Σ:

U Σ =
1.61  0.19 -0.01
5.08  0.66 -0.03
6.82  0.85 -0.05
8.43  1.04 -0.06
1.86 -5.60  0.84
0.86 -6.93 -0.87
0.86 -2.75  0.41

(A check follows below.)
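A quick check (assumed example code) that U·Σ really is the coordinate matrix, i.e. U Σ = A V:

import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, VT = np.linalg.svd(A, full_matrices=False)
coords = U * s                            # same as U @ np.diag(s), i.e. U Sigma

print(np.allclose(coords, A @ VT.T))      # True: U Sigma = A V
print(np.round(coords[:, 0], 2))          # each user's coordinate on the "SciFi" axis
                                          # (matches the first column above, up to sign)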
More details
 Q: How exactly is dimensionality reduction done?
 A: Set the smallest singular values to zero. Here σ3 = 1.3 is set to zero, and the corresponding third column of U and third row of VT are dropped, leaving a rank-2 approximation:

1 1 1 0 0      0.13  0.02
3 3 3 0 0      0.41  0.07
4 4 4 0 0      0.55  0.09      12.4  0         0.56  0.59  0.56  0.09  0.09
5 5 5 0 0  ≈   0.68  0.11   x   0   9.5   x    0.12 -0.02  0.12 -0.69 -0.69
0 2 0 4 4      0.15 -0.59
0 0 0 5 5      0.07 -0.73
0 1 0 2 2      0.07 -0.29
More details
 Q: How exactly is dimensionality reduction done?
 A: Set the smallest singular values to zero. Multiplying the truncated factors back together gives the rank-2 approximation B ≈ A:

1 1 1 0 0       0.92  0.95  0.92  0.01  0.01
3 3 3 0 0       2.91  3.01  2.91 -0.01 -0.01
4 4 4 0 0       3.90  4.04  3.90  0.01  0.01
5 5 5 0 0   ≈   4.82  5.00  4.82  0.03  0.03
0 2 0 4 4       0.70  0.53  0.70  4.11  4.11
0 0 0 5 5      -0.69  1.34 -0.69  4.78  4.78
0 1 0 2 2       0.32  0.23  0.32  2.01  2.01

Frobenius norm: ǁMǁF = √(Σij Mij²)
The error ǁA-BǁF = √(Σij (Aij-Bij)²) is "small"
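A minimal sketch of exactly this step (assumed example code): keep the two largest singular values, rebuild B, and measure ǁA-BǁF:

import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, VT = np.linalg.svd(A, full_matrices=False)

k = 2
B = U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]   # rank-2 approximation of A

print(np.round(B, 2))                       # close to the table above
print(np.linalg.norm(A - B, 'fro'))         # approx. 1.3, i.e. the dropped sigma_3
print(s[2])                                 # approx. 1.3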
A = U Σ VT (the full decomposition)
B = U S VT, where S keeps the largest singular values of Σ and zeroes out the rest;
B is the best approximation of A among matrices of that rank.


 Theorem:
Let A = U Σ VT and B = U S VT where
S = diagonal r×r matrix with si = σi (i = 1…k), else si = 0;
then B is a best rank(B) = k approximation to A
What do we mean by "best":
▪ B is a solution to minB ǁA-BǁF where rank(B) = k

Here Σ = diag(σ11, …, σrr) and ǁA-Bǁ²F = Σij (Aij - Bij)²
Details!
 Theorem: Let A = U Σ VT (σ1 ≥ σ2 ≥ …, rank(A) = r),
then B = U S VT
▪ S = diagonal r×r matrix where si = σi (i = 1…k), else si = 0
is a best rank-k approximation to A:
▪ B is a solution to minB ǁA-BǁF where rank(B) = k

 We will need 2 facts:
▪ ǁMǁ²F = Σi qii² where M = P Q R is the SVD of M
▪ U Σ VT - U S VT = U (Σ - S) VT
Details!
 We will need 2 facts:
▪ ǁMǁ²F = Σk qkk² where M = P Q R is the SVD of M
▪ U Σ VT - U S VT = U (Σ - S) VT

We apply:
-- P column orthonormal
-- R row orthonormal
-- Q is diagonal


Details!
 A = U Σ VT, B = U S VT (σ1 ≥ σ2 ≥ … ≥ 0, rank(A) = r)
▪ S = diagonal r×r matrix where si = σi (i = 1…k), else si = 0
then B is a solution to minB ǁA-BǁF, rank(B) = k
 Why?
Using U Σ VT - U S VT = U (Σ - S) VT:

min_{B, rank(B)=k} ǁA-BǁF = min ǁΣ - SǁF = min_{si} √( Σ_{i=1..r} (σi - si)² )

 We want to choose the si to minimize Σi (σi - si)²
 The solution is to set si = σi (i = 1…k) and the other si = 0:

min_{si} Σ_{i=1..r} (σi - si)² = min_{si} [ Σ_{i=1..k} (σi - si)² + Σ_{i=k+1..r} σi² ] = Σ_{i=k+1..r} σi²
Equivalent:
'spectral decomposition' of the matrix:

1 1 1 0 0
3 3 3 0 0
4 4 4 0 0
5 5 5 0 0   =   [ u1  u2 ]  x  diag(σ1, σ2)  x  [ v1T ; v2T ]
0 2 0 4 4
0 0 0 5 5
0 1 0 2 2


Equivalent:
'spectral decomposition' of the matrix (k terms):

1 1 1 0 0
3 3 3 0 0
4 4 4 0 0
5 5 5 0 0   =   σ1 u1 v1T + σ2 u2 v2T + …      (each ui is n×1, each viT is 1×m)
0 2 0 4 4
0 0 0 5 5       Assume: σ1 ≥ σ2 ≥ σ3 ≥ … ≥ 0
0 1 0 2 2

Why is setting small σi to 0 the right thing to do?
Vectors ui and vi are unit length, so σi scales them.
So, zeroing small σi introduces less error.
Q: How many σs to keep?
A: Rule-of-thumb:
keep 80-90% of the 'energy' = Σi σi²

1 1 1 0 0
3 3 3 0 0
4 4 4 0 0
5 5 5 0 0   =   σ1 u1 v1T + σ2 u2 v2T + …
0 2 0 4 4
0 0 0 5 5       Assume: σ1 ≥ σ2 ≥ σ3 ≥ …
0 1 0 2 2
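One possible way to apply this rule of thumb in code (a sketch, not from the slides; the 90% threshold and the helper name are assumptions):

import numpy as np

def choose_k(singular_values, energy_fraction=0.9):
    """Smallest k whose leading singular values retain the requested energy."""
    energies = singular_values ** 2
    cumulative = np.cumsum(energies) / np.sum(energies)
    return int(np.searchsorted(cumulative, energy_fraction) + 1)

s = np.array([12.4, 9.5, 1.3])   # singular values from the running example
print(choose_k(s, 0.9))          # 2: keeping sigma_1 and sigma_2 retains ~99% of the energy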


 To compute the SVD:
▪ O(nm²) or O(n²m) (whichever is less)
 But:
▪ less work if we just want the singular values,
▪ or if we want only the first k singular vectors,
▪ or if the matrix is sparse
 Implemented in linear algebra packages like
▪ LINPACK, Matlab, SPlus, Mathematica, ...
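For example, when only the first k singular vectors of a large sparse matrix are needed, a routine such as scipy.sparse.linalg.svds avoids ever densifying the matrix (illustrative sketch, assuming SciPy is available):

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

A = sparse_random(10000, 1000, density=0.001, format='csr', random_state=0)

k = 10
U, s, VT = svds(A, k=k)             # only the k largest singular triplets
print(U.shape, s.shape, VT.shape)   # (10000, 10) (10,) (10, 1000)
# note: svds returns the singular values in ascending order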


 SVD: A = U Σ VT: unique
▪ U: user-to-concept similarities
▪ V: movie-to-concept similarities
▪ Σ: strength of each concept
 Dimensionality reduction:
▪ keep the few largest singular values (80-90% of the 'energy')
▪ SVD picks up linear correlations


 SVD gives us:
▪ A = U Σ VT
 Eigen-decomposition:
▪ A = X Λ XT
▪ A is symmetric
▪ U, V, X are orthonormal (UTU = I)
▪ Λ, Σ are diagonal
 Now let's calculate:
▪ AAT = U Σ VT (U Σ VT)T = U Σ VT (V ΣT UT) = U Σ ΣT UT
▪ ATA = V ΣT UT (U Σ VT) = V ΣT Σ VT
 This shows how to compute the SVD using an eigenvalue decomposition:
▪ AAT = U Σ ΣT UT has the eigen-decomposition form X Λ XT, so the eigenvectors of AAT give U
▪ ATA = V ΣT Σ VT has the form X Λ XT, so the eigenvectors of ATA give V
▪ In both cases the eigenvalues are the squared singular values
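A quick numerical check (assumed example code) that the eigenvalues of ATA are the squared singular values:

import numpy as np

A = np.random.rand(7, 5)
U, s, VT = np.linalg.svd(A, full_matrices=False)

eigvals, eigvecs = np.linalg.eigh(A.T @ A)   # A^T A is symmetric, so use eigh
eigvals = eigvals[::-1]                      # eigh returns ascending order

print(np.allclose(eigvals, s ** 2))          # True: eigenvalues of A^T A are sigma_i^2
# the corresponding eigenvectors are the columns of V (up to sign)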
 Q: Find users that like 'Matrix'
 A: Map the query into 'concept space' - how?
(Using the same Users-to-Movies decomposition as above; movies are Matrix, Alien, Serenity, Casablanca, Amelie. A sketch of the mapping follows below.)
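One common way to do this (a sketch under the assumption that the query q is a row vector of movie ratings): multiply q by V to map it into concept space, then compare it with the other users there:

import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, VT = np.linalg.svd(A, full_matrices=False)
V = VT[:2, :].T                     # keep the 2 strongest concepts

q = np.array([5, 0, 0, 0, 0.0])     # a user who only rated 'Matrix'
q_concept = q @ V                   # query mapped into concept space
users_concept = A @ V               # all users in concept space

# cosine similarity between the query and every user, in concept space
sims = users_concept @ q_concept / (
    np.linalg.norm(users_concept, axis=1) * np.linalg.norm(q_concept))
print(np.round(sims, 2))            # the SciFi users score highest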
+ Optimal low-rank approximation in terms of Frobenius norm
- Interpretability problem:
▪ a singular vector specifies a linear combination of all input columns or rows
- Lack of sparsity:
▪ singular vectors are dense!


Frobenius norm:
ǁXǁF = √(Σij Xij²)

 Goal: Express A as a product of matrices C, U, R and make ǁA - C·U·RǁF small
 "Constraints" on C and R: C is a set of actual columns of A and R is a set of actual rows of A


 The "link" matrix U is built from the pseudo-inverse of the intersection of C and R (details below).


 Sampling columns (similarly for rows):

Note this is a randomized algorithm; the same column can be sampled more than once. A sketch of the sampling step follows below.
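A sketch of the column-sampling step, assuming the standard CUR construction in which columns are drawn with probability proportional to their squared norm and then rescaled (the helper name is ours):

import numpy as np

def sample_columns(A, c, seed=0):
    """Sample c columns of A with probability proportional to squared column norm."""
    rng = np.random.default_rng(seed)
    probs = np.sum(A ** 2, axis=0) / np.sum(A ** 2)
    idx = rng.choice(A.shape[1], size=c, replace=True, p=probs)  # duplicates possible
    # Rescale each sampled column so that C C^T is an unbiased estimate of A A^T.
    C = A[:, idx] / np.sqrt(c * probs[idx])
    return C, idx

A = np.random.rand(100, 20)
C, idx = sample_columns(A, c=8)
print(C.shape, idx)   # (100, 8) and the (possibly repeated) column indices

Rows are sampled the same way by applying the procedure to AT.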
 Let W be the "intersection" of the sampled columns C and rows R
▪ Let the SVD of W be W = X Z YT
 Then: U = W+ = Y Z+ XT
▪ Z+: reciprocals of the non-zero singular values: Z+ii = 1/Zii
▪ W+ is the "pseudoinverse"

Why does the pseudoinverse work?
If W = X Z YT, then W-1 = (YT)-1 Z-1 X-1.
Due to orthonormality, X-1 = XT and (YT)-1 = Y.
Since Z is diagonal, (Z-1)ii = 1/Zii.
Thus, if W is nonsingular, the pseudoinverse is the true inverse.

(Figure: W sits at the intersection of the sampled columns C and rows R inside A; U = W+.)
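Putting the pieces together, a simplified end-to-end sketch (our assumptions: squared-norm sampling, rescaling omitted for brevity, U = W+ computed with np.linalg.pinv, which uses the SVD internally):

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

col_p = np.sum(A ** 2, axis=0) / np.sum(A ** 2)
row_p = np.sum(A ** 2, axis=1) / np.sum(A ** 2)
cols = rng.choice(A.shape[1], size=4, replace=True, p=col_p)
rows = rng.choice(A.shape[0], size=4, replace=True, p=row_p)

C = A[:, cols]                    # sampled actual columns
R = A[rows, :]                    # sampled actual rows
W = A[np.ix_(rows, cols)]         # intersection of C and R
U = np.linalg.pinv(W)             # U = W^+

A_cur = C @ U @ R
print(np.linalg.norm(A - A_cur, 'fro'))   # reconstruction error of this CUR sketch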
 For example:
▪ Select c = O(k log k / ε²) columns of A using the ColumnSelect algorithm
▪ Select r = O(k log k / ε²) rows of A using the ColumnSelect algorithm
▪ Set U = W+
 Then, with probability 98%:
ǁA - C·U·RǁF ≤ (2 + ε) · ǁA - AkǁF   (CUR error vs. best rank-k SVD error)

In practice:
pick 4k columns/rows for a "rank-k" approximation
+ Easy interpretation
• since the basis vectors are actual columns and rows
+ Sparse basis
• since the basis vectors are actual columns and rows (an actual column is sparse, a singular vector is dense)
- Duplicate columns and rows
• columns of large norms will be sampled many times


 If we want to get rid of the duplicates:
▪ throw them away
▪ scale (multiply) the remaining columns/rows by the square root of the number of duplicates

(Figure: duplicate columns Cd and rows Rd of A are collapsed into single columns Cs and rows Rs; construct a small U from them.)


SVD: A = U Σ VT
▪ A: huge but sparse; U, V: big and dense; Σ: sparse and small

CUR: A = C U R
▪ A: huge but sparse; C, R: big but sparse; U: dense but small
 DBLP bibliographic data
▪ Author-to-conference big sparse matrix
▪ Aij: number of papers published by author i at conference j
▪ 428K authors (rows), 3659 conferences (columns)
▪ Very sparse
 Want to reduce dimensionality
▪ How much time does it take?
▪ What is the reconstruction error?
▪ How much space do we need?
