Week 16 Lecture 01 02 SVD and CUR (Example)

The document discusses the concept of dimensionality reduction in data mining, particularly focusing on the representation of data in lower-dimensional subspaces. It explains how to compress data while maintaining essential information, using matrix decomposition techniques such as Singular Value Decomposition (SVD). The goal is to uncover hidden correlations, reduce noise, and facilitate easier data processing and visualization.

Note to other teachers and users of these slides: We would be delighted if you found our material useful for your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org

Mining of Massive Datasets


Jure Leskovec, Anand Rajaraman, Jeff Ullman
Stanford University
http://www.mmds.org
Acknowledgment
Jure Leskovec, Anand Rajaraman, Jeff Ullman
 Assumption: Data lies on or near a low d-dimensional subspace
 The axes of this subspace are an effective representation of the data
 Compress / reduce dimensionality:
▪ 10^6 rows; 10^3 columns; no updates
▪ Random access to any cell(s); small error: OK

(The slide shows an example matrix that is really "2-dimensional": every row can be reconstructed by scaling [1 1 1 0 0] or [0 0 0 1 1].)
 Q: What is the rank of a matrix A?
 A: The number of linearly independent columns of A
 For example:
▪ Matrix A = [1 2 1; -2 -3 1; 3 5 0] has rank r = 2
▪ Why? The first two rows are linearly independent, so the rank is at least 2, but all three rows are linearly dependent (the first is equal to the sum of the second and third), so the rank must be less than 3.
 Why do we care about low rank?
▪ We can write the rows of A in terms of two "basis" vectors: [1 2 1] and [-2 -3 1]
▪ The new coordinates of the rows are then: [1 0], [0 1], [1 -1] (a quick check follows below)
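For illustration, a minimal NumPy sketch (assuming the 3x3 matrix and the two basis vectors above) that checks the rank and recovers the new coordinates of each row:

import numpy as np

A = np.array([[ 1,  2, 1],
              [-2, -3, 1],
              [ 3,  5, 0]], dtype=float)
print(np.linalg.matrix_rank(A))          # 2

# Express every row of A in the basis {[1 2 1], [-2 -3 1]}
# by solving basis.T @ coords = row (least squares).
basis = np.array([[ 1,  2, 1],
                  [-2, -3, 1]], dtype=float)
coords, *_ = np.linalg.lstsq(basis.T, A.T, rcond=None)
print(coords.T)                          # rows: [1 0], [0 1], [1 -1]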
 Cloud of points in 3D space:
▪ Think of the point positions as a matrix A, with one row per point (points A, B, C)
 We can rewrite the coordinates more efficiently!
▪ Old basis vectors: [1 0 0], [0 1 0], [0 0 1]
▪ New basis vectors: [1 2 1], [-2 -3 1]
▪ Then point A has new coordinates [1 0], B: [0 1], C: [1 1]
▪ Notice: we reduced the number of coordinates per point!
 The goal of dimensionality reduction is to discover the axes of the data!

Rather than representing every point with 2 coordinates, we represent each point with 1 coordinate (corresponding to the position of the point on the red line).

By doing this we incur a bit of error, as the points do not lie exactly on the line.


Why reduce dimensions?
 Discover hidden correlations/topics
▪ e.g., words that commonly occur together
 Remove redundant and noisy features
▪ not all words are useful
 Interpretation and visualization
 Easier storage and processing of the data


A[m×n] = U[m×r] Σ[r×r] (V[n×r])T
 A: Input data matrix
▪ m×n matrix (e.g., m documents, n terms)
 U: Left singular vectors
▪ m×r matrix (m documents, r concepts)
 Σ: Singular values
▪ r×r diagonal matrix (strength of each 'concept')
(r: rank of the matrix A)
 V: Right singular vectors
▪ n×r matrix (n terms, r concepts)
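As a concrete illustration (example code, not part of the slides), np.linalg.svd with full_matrices=False returns exactly these "economy" factors:

import numpy as np

m, n = 7, 5
A = np.random.rand(m, n)

U, s, VT = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)                        # r x r diagonal matrix of singular values

print(U.shape, Sigma.shape, VT.shape)     # (7, 5) (5, 5) (5, 5) -- here r = min(m, n)
print(np.allclose(A, U @ Sigma @ VT))     # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(n)))    # True: columns of U are orthonormal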
(Figure: A, an m×n matrix, drawn as the product U · Σ · VT.)


(Figure: A, an m×n matrix, drawn as a sum of rank-1 pieces.)
A ≈ σ1 u1 v1T + σ2 u2 v2T + …
σi … scalar
ui … vector
vi … vector
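The same identity written out numerically; a short sketch (assumed example) that rebuilds A as a sum of rank-1 terms σi ui viT:

import numpy as np

A = np.random.rand(6, 4)
U, s, VT = np.linalg.svd(A, full_matrices=False)

B = np.zeros_like(A)
for i in range(len(s)):
    B += s[i] * np.outer(U[:, i], VT[i, :])   # sigma_i * u_i * v_i^T

print(np.allclose(A, B))                      # True: A equals the sum of its rank-1 terms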
It is always possible to decompose a real matrix A into A = U Σ VT, where
 U, Σ, V: unique
 U, V: column orthonormal
▪ UTU = I; VTV = I (I: identity matrix)
▪ (columns are orthogonal unit vectors)
 Σ: diagonal
▪ entries (singular values) are positive, and sorted in decreasing order (σ1 ≥ σ2 ≥ ... ≥ 0)

Nice proof of uniqueness: http://www.mpi-inf.mpg.de/~bast/ir-seminar-ws04/lecture2.pdf
 A = U Σ VT - example: Users to Movies
(columns: Matrix, Alien, Serenity, Casablanca, Amelie)

        Matrix Alien Serenity Casablanca Amelie
SciFi      1     1      1         0        0
           3     3      3         0        0
           4     4      4         0        0
           5     5      5         0        0
Romance    0     2      0         4        4
           0     0      0         5        5
           0     1      0         2        2

A = U Σ VT, where the factors describe "concepts" (AKA latent dimensions, AKA latent factors).


 A = U Σ VT - example: Users to Movies
(columns: Matrix, Alien, Serenity, Casablanca, Amelie; the first four users are SciFi fans, the last three Romance fans)

     A        =        U          x       Σ        x        VT
1 1 1 0 0        0.13  0.02 -0.01     12.4  0   0      0.56  0.59  0.56  0.09  0.09
3 3 3 0 0        0.41  0.07 -0.03      0   9.5  0      0.12 -0.02  0.12 -0.69 -0.69
4 4 4 0 0        0.55  0.09 -0.04      0    0  1.3     0.40 -0.80  0.40  0.09  0.09
5 5 5 0 0        0.68  0.11 -0.05
0 2 0 4 4        0.15 -0.59  0.65
0 0 0 5 5        0.07 -0.73 -0.67
0 1 0 2 2        0.07 -0.29  0.32
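The factors above can be reproduced, up to rounding and sign, with a few lines of NumPy (a sketch, not part of the original slides):

import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, VT = np.linalg.svd(A, full_matrices=False)
print(np.round(s, 1))           # approx. [12.4  9.5  1.3  0.  0.]
print(np.round(U[:, :3], 2))    # matches the U above (up to sign)
print(np.round(VT[:3, :], 2))   # matches the V^T above (up to sign)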
 A = U Σ VT - example: the first concept is the "SciFi-concept" and the second is the "Romance-concept" (same decomposition as above).
 A = U Σ VT - example: U is the "user-to-concept" similarity matrix; its first two columns correspond to the SciFi-concept and the Romance-concept (same decomposition as above).
 A = U Σ VT - example: the first singular value, 12.4, is the "strength" of the SciFi-concept (same decomposition as above).
 A = U Σ VT - example: V is the "movie-to-concept" similarity matrix; the first row of VT is the SciFi-concept (same decomposition as above).
'movies', 'users' and 'concepts':
 U: user-to-concept similarity matrix
 V: movie-to-concept similarity matrix
 Σ: its diagonal elements give the 'strength' of each concept


(Figure: users plotted by Movie 1 rating vs. Movie 2 rating, with v1, the first right singular vector, drawn through the point cloud.)
 Instead of using two coordinates (x, y) to describe point locations, let's use only one coordinate z
 A point's position is its location along the vector v1
 How to choose v1? Minimize the reconstruction error
 Goal: Minimize the sum of reconstruction errors:

Σ_{i=1..N} Σ_{j=1..D} (xij - zij)²

▪ where the xij are the "old" and the zij are the "new" coordinates
 SVD gives the 'best' axis to project on:
▪ 'best' = minimizing the reconstruction errors
 In other words, the axis of minimum reconstruction error (a numerical sketch follows below)
(Figure: v1, the first right singular vector, in the Movie 1 rating vs. Movie 2 rating plane.)
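A numerical sketch of this objective (assumed example, using the ratings matrix from the running example): project the user points onto v1 and measure the total squared reconstruction error:

import numpy as np

X = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, VT = np.linalg.svd(X, full_matrices=False)
v1 = VT[0]                        # first right singular vector (unit length)

z = X @ v1                        # 1-D coordinate of each point along v1
X_rec = np.outer(z, v1)           # reconstruction back in the original space
err = np.sum((X - X_rec) ** 2)    # sum_ij (x_ij - z_ij)^2

print(err)                        # equals sigma_2^2 + sigma_3^2 + ...
print(np.sum(s[1:] ** 2))         # same value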
 A = U Σ VT - example:
▪ V: "movie-to-concept" matrix
▪ U: "user-to-concept" matrix
(Same decomposition as above. Figure: v1, the first right singular vector, in the Movie 1 rating vs. Movie 2 rating plane.)
 A = U Σ VT - example: the first singular value also measures the variance ('spread') of the points along the v1 axis (same decomposition as above).
A = U Σ VT - example:
 U Σ gives the coordinates of the points (users) along the projection axes

Projection of the users on the "SciFi" axis = the first column of U Σ:

U Σ =
1.61  0.19 -0.01
5.08  0.66 -0.03
6.82  0.85 -0.05
8.43  1.04 -0.06
1.86 -5.60  0.84
0.86 -6.93 -0.87
0.86 -2.75  0.41

(A check follows below.)
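A quick check (assumed example code) that U·Σ really is the coordinate matrix, i.e. U Σ = A V:

import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, VT = np.linalg.svd(A, full_matrices=False)
coords = U * s                            # same as U @ np.diag(s), i.e. U Sigma

print(np.allclose(coords, A @ VT.T))      # True: U Sigma = A V
print(np.round(coords[:, 0], 2))          # each user's coordinate on the "SciFi" axis
                                          # (matches the first column above, up to sign)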
More details
 Q: How exactly is dimensionality reduction done?
 A: Set the smallest singular values to zero. Here σ3 = 1.3 is set to zero, and the corresponding third column of U and third row of VT are dropped, leaving a rank-2 approximation:

1 1 1 0 0      0.13  0.02
3 3 3 0 0      0.41  0.07
4 4 4 0 0      0.55  0.09      12.4  0         0.56  0.59  0.56  0.09  0.09
5 5 5 0 0  ≈   0.68  0.11   x   0   9.5   x    0.12 -0.02  0.12 -0.69 -0.69
0 2 0 4 4      0.15 -0.59
0 0 0 5 5      0.07 -0.73
0 1 0 2 2      0.07 -0.29
More details
 Q: How exactly is dimensionality reduction done?
 A: Set the smallest singular values to zero. Multiplying the truncated factors back together gives the rank-2 approximation B ≈ A:

1 1 1 0 0       0.92  0.95  0.92  0.01  0.01
3 3 3 0 0       2.91  3.01  2.91 -0.01 -0.01
4 4 4 0 0       3.90  4.04  3.90  0.01  0.01
5 5 5 0 0   ≈   4.82  5.00  4.82  0.03  0.03
0 2 0 4 4       0.70  0.53  0.70  4.11  4.11
0 0 0 5 5      -0.69  1.34 -0.69  4.78  4.78
0 1 0 2 2       0.32  0.23  0.32  2.01  2.01

Frobenius norm: ǁMǁF = √(Σij Mij²)
The error ǁA-BǁF = √(Σij (Aij-Bij)²) is "small"
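A minimal sketch of exactly this step (assumed example code): keep the two largest singular values, rebuild B, and measure ǁA-BǁF:

import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, VT = np.linalg.svd(A, full_matrices=False)

k = 2
B = U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]   # rank-2 approximation of A

print(np.round(B, 2))                       # close to the table above
print(np.linalg.norm(A - B, 'fro'))         # approx. 1.3, i.e. the dropped sigma_3
print(s[2])                                 # approx. 1.3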
A = U Σ VT (the full decomposition)
B = U S VT, where S keeps the largest singular values of Σ and zeroes out the rest;
B is the best approximation of A among matrices of that rank.


 Theorem:
Let A = U Σ VT and B = U S VT where
S = diagonal r×r matrix with si = σi (i = 1…k), else si = 0;
then B is a best rank(B) = k approximation to A
What do we mean by "best":
▪ B is a solution to minB ǁA-BǁF where rank(B) = k

Here Σ = diag(σ11, …, σrr) and ǁA-Bǁ²F = Σij (Aij - Bij)²
Details!
 Theorem: Let A = U Σ VT (σ1 ≥ σ2 ≥ …, rank(A) = r),
then B = U S VT
▪ S = diagonal r×r matrix where si = σi (i = 1…k), else si = 0
is a best rank-k approximation to A:
▪ B is a solution to minB ǁA-BǁF where rank(B) = k

 We will need 2 facts:
▪ ǁMǁ²F = Σi qii² where M = P Q R is the SVD of M
▪ U Σ VT - U S VT = U (Σ - S) VT
Details!
 We will need 2 facts:
▪ ǁMǁ²F = Σk qkk² where M = P Q R is the SVD of M
▪ U Σ VT - U S VT = U (Σ - S) VT

We apply:
-- P column orthonormal
-- R row orthonormal
-- Q is diagonal


Details!
 A = U Σ VT, B = U S VT (σ1 ≥ σ2 ≥ … ≥ 0, rank(A) = r)
▪ S = diagonal r×r matrix where si = σi (i = 1…k), else si = 0
then B is a solution to minB ǁA-BǁF, rank(B) = k
 Why?
Using U Σ VT - U S VT = U (Σ - S) VT:

min_{B, rank(B)=k} ǁA-BǁF = min ǁΣ - SǁF = min_{si} √( Σ_{i=1..r} (σi - si)² )

 We want to choose the si to minimize Σi (σi - si)²
 The solution is to set si = σi (i = 1…k) and the other si = 0:

min_{si} Σ_{i=1..r} (σi - si)² = min_{si} [ Σ_{i=1..k} (σi - si)² + Σ_{i=k+1..r} σi² ] = Σ_{i=k+1..r} σi²
Equivalent:
'spectral decomposition' of the matrix:

1 1 1 0 0
3 3 3 0 0
4 4 4 0 0
5 5 5 0 0   =   [ u1  u2 ]  x  diag(σ1, σ2)  x  [ v1T ; v2T ]
0 2 0 4 4
0 0 0 5 5
0 1 0 2 2


Equivalent:
'spectral decomposition' of the matrix (k terms):

1 1 1 0 0
3 3 3 0 0
4 4 4 0 0
5 5 5 0 0   =   σ1 u1 v1T + σ2 u2 v2T + …      (each ui is n×1, each viT is 1×m)
0 2 0 4 4
0 0 0 5 5       Assume: σ1 ≥ σ2 ≥ σ3 ≥ … ≥ 0
0 1 0 2 2

Why is setting small σi to 0 the right thing to do?
Vectors ui and vi are unit length, so σi scales them.
So, zeroing small σi introduces less error.
Q: How many σs to keep?
A: Rule-of-thumb:
keep 80-90% of the 'energy' = Σi σi²

1 1 1 0 0
3 3 3 0 0
4 4 4 0 0
5 5 5 0 0   =   σ1 u1 v1T + σ2 u2 v2T + …
0 2 0 4 4
0 0 0 5 5       Assume: σ1 ≥ σ2 ≥ σ3 ≥ …
0 1 0 2 2
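One possible way to apply this rule of thumb in code (a sketch, not from the slides; the 90% threshold and the helper name are assumptions):

import numpy as np

def choose_k(singular_values, energy_fraction=0.9):
    """Smallest k whose leading singular values retain the requested energy."""
    energies = singular_values ** 2
    cumulative = np.cumsum(energies) / np.sum(energies)
    return int(np.searchsorted(cumulative, energy_fraction) + 1)

s = np.array([12.4, 9.5, 1.3])   # singular values from the running example
print(choose_k(s, 0.9))          # 2: keeping sigma_1 and sigma_2 retains ~99% of the energy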


 To compute the SVD:
▪ O(nm²) or O(n²m) (whichever is less)
 But:
▪ less work if we just want the singular values,
▪ or if we want only the first k singular vectors,
▪ or if the matrix is sparse
 Implemented in linear algebra packages like
▪ LINPACK, Matlab, SPlus, Mathematica, ...
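For example, when only the first k singular vectors of a large sparse matrix are needed, a routine such as scipy.sparse.linalg.svds avoids ever densifying the matrix (illustrative sketch, assuming SciPy is available):

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

A = sparse_random(10000, 1000, density=0.001, format='csr', random_state=0)

k = 10
U, s, VT = svds(A, k=k)             # only the k largest singular triplets
print(U.shape, s.shape, VT.shape)   # (10000, 10) (10,) (10, 1000)
# note: svds returns the singular values in ascending order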


 SVD: A = U Σ VT: unique
▪ U: user-to-concept similarities
▪ V: movie-to-concept similarities
▪ Σ: strength of each concept
 Dimensionality reduction:
▪ keep the few largest singular values (80-90% of the 'energy')
▪ SVD picks up linear correlations


 SVD gives us:
▪ A = U Σ VT
 Eigen-decomposition:
▪ A = X Λ XT
▪ A is symmetric
▪ U, V, X are orthonormal (UTU = I)
▪ Λ, Σ are diagonal
 Now let's calculate:
▪ AAT = U Σ VT (U Σ VT)T = U Σ VT (V ΣT UT) = U Σ ΣT UT
▪ ATA = V ΣT UT (U Σ VT) = V ΣT Σ VT
 This shows how to compute the SVD using an eigenvalue decomposition:
▪ AAT = U Σ ΣT UT has the eigen-decomposition form X Λ XT, so the eigenvectors of AAT give U
▪ ATA = V ΣT Σ VT has the form X Λ XT, so the eigenvectors of ATA give V
▪ In both cases the eigenvalues are the squared singular values
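A quick numerical check (assumed example code) that the eigenvalues of ATA are the squared singular values:

import numpy as np

A = np.random.rand(7, 5)
U, s, VT = np.linalg.svd(A, full_matrices=False)

eigvals, eigvecs = np.linalg.eigh(A.T @ A)   # A^T A is symmetric, so use eigh
eigvals = eigvals[::-1]                      # eigh returns ascending order

print(np.allclose(eigvals, s ** 2))          # True: eigenvalues of A^T A are sigma_i^2
# the corresponding eigenvectors are the columns of V (up to sign)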
 Q: Find users that like 'Matrix'
 A: Map the query into 'concept space' - how?
(Using the same Users-to-Movies decomposition as above; movies are Matrix, Alien, Serenity, Casablanca, Amelie. A sketch of the mapping follows below.)
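One common way to do this (a sketch under the assumption that the query q is a row vector of movie ratings): multiply q by V to map it into concept space, then compare it with the other users there:

import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, VT = np.linalg.svd(A, full_matrices=False)
V = VT[:2, :].T                     # keep the 2 strongest concepts

q = np.array([5, 0, 0, 0, 0.0])     # a user who only rated 'Matrix'
q_concept = q @ V                   # query mapped into concept space
users_concept = A @ V               # all users in concept space

# cosine similarity between the query and every user, in concept space
sims = users_concept @ q_concept / (
    np.linalg.norm(users_concept, axis=1) * np.linalg.norm(q_concept))
print(np.round(sims, 2))            # the SciFi users score highest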
+ Optimal low-rank approximation in terms of Frobenius norm
- Interpretability problem:
▪ a singular vector specifies a linear combination of all input columns or rows
- Lack of sparsity:
▪ singular vectors are dense!


Frobenius norm:
ǁXǁF = √(Σij Xij²)

 Goal: Express A as a product of matrices C, U, R and make ǁA - C·U·RǁF small
 "Constraints" on C and R: C is a set of actual columns of A and R is a set of actual rows of A


 The "link" matrix U is built from the pseudo-inverse of the intersection of C and R (details below).


 Sampling columns (similarly for rows):

Note this is a randomized algorithm; the same column can be sampled more than once. A sketch of the sampling step follows below.
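A sketch of the column-sampling step, assuming the standard CUR construction in which columns are drawn with probability proportional to their squared norm and then rescaled (the helper name is ours):

import numpy as np

def sample_columns(A, c, seed=0):
    """Sample c columns of A with probability proportional to squared column norm."""
    rng = np.random.default_rng(seed)
    probs = np.sum(A ** 2, axis=0) / np.sum(A ** 2)
    idx = rng.choice(A.shape[1], size=c, replace=True, p=probs)  # duplicates possible
    # Rescale each sampled column so that C C^T is an unbiased estimate of A A^T.
    C = A[:, idx] / np.sqrt(c * probs[idx])
    return C, idx

A = np.random.rand(100, 20)
C, idx = sample_columns(A, c=8)
print(C.shape, idx)   # (100, 8) and the (possibly repeated) column indices

Rows are sampled the same way by applying the procedure to AT.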
 Let W be the "intersection" of the sampled columns C and rows R
▪ Let the SVD of W be W = X Z YT
 Then: U = W+ = Y Z+ XT
▪ Z+: reciprocals of the non-zero singular values: Z+ii = 1/Zii
▪ W+ is the "pseudoinverse"

Why does the pseudoinverse work?
If W = X Z YT, then W-1 = (YT)-1 Z-1 X-1.
Due to orthonormality, X-1 = XT and (YT)-1 = Y.
Since Z is diagonal, (Z-1)ii = 1/Zii.
Thus, if W is nonsingular, the pseudoinverse is the true inverse.

(Figure: W sits at the intersection of the sampled columns C and rows R inside A; U = W+.)
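Putting the pieces together, a simplified end-to-end sketch (our assumptions: squared-norm sampling, rescaling omitted for brevity, U = W+ computed with np.linalg.pinv, which uses the SVD internally):

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

col_p = np.sum(A ** 2, axis=0) / np.sum(A ** 2)
row_p = np.sum(A ** 2, axis=1) / np.sum(A ** 2)
cols = rng.choice(A.shape[1], size=4, replace=True, p=col_p)
rows = rng.choice(A.shape[0], size=4, replace=True, p=row_p)

C = A[:, cols]                    # sampled actual columns
R = A[rows, :]                    # sampled actual rows
W = A[np.ix_(rows, cols)]         # intersection of C and R
U = np.linalg.pinv(W)             # U = W^+

A_cur = C @ U @ R
print(np.linalg.norm(A - A_cur, 'fro'))   # reconstruction error of this CUR sketch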
 For example:
▪ Select c = O(k log k / ε²) columns of A using the ColumnSelect algorithm
▪ Select r = O(k log k / ε²) rows of A using the ColumnSelect algorithm
▪ Set U = W+
 Then, with probability 98%:
ǁA - C·U·RǁF ≤ (2 + ε) · ǁA - AkǁF   (CUR error vs. best rank-k SVD error)

In practice:
pick 4k columns/rows for a "rank-k" approximation
+ Easy interpretation
• since the basis vectors are actual columns and rows
+ Sparse basis
• since the basis vectors are actual columns and rows (an actual column is sparse, a singular vector is dense)
- Duplicate columns and rows
• columns of large norms will be sampled many times


 If we want to get rid of the duplicates:
▪ throw them away
▪ scale (multiply) the remaining columns/rows by the square root of the number of duplicates

(Figure: duplicate columns Cd and rows Rd of A are collapsed into single columns Cs and rows Rs; construct a small U from them.)


SVD: A = U Σ VT
▪ A: huge but sparse; U, V: big and dense; Σ: sparse and small

CUR: A = C U R
▪ A: huge but sparse; C, R: big but sparse; U: dense but small
 DBLP bibliographic data
▪ Author-to-conference big sparse matrix
▪ Aij: number of papers published by author i at conference j
▪ 428K authors (rows), 3659 conferences (columns)
▪ Very sparse
 Want to reduce dimensionality
▪ How much time does it take?
▪ What is the reconstruction error?
▪ How much space do we need?
