CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec,...

61
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu

Transcript of CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec,...

Page 1: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

CS246: Mining Massive DatasetsJure Leskovec, Stanford University

http://cs246.stanford.edu

Page 2: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Often, our data can be represented by an !-by-" matrix

¡ And this matrix can be closely approximated by the product of three matrices that share a small common dimension #

21/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets

A Um

rnVT

n

r

~~

´ ´S

r

Page 3: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Compress / reduce dimensionality:§ 106 rows; 103 columns; no updates§ Random access to any cell(s); small error: OK

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 3

Note: The above matrix is really “2-dimensional.” All rows can be reconstructed by scaling [1 1 1 0 0] or [0 0 0 1 1]

New representation[1 0][2 0][1 0][5 0][0 2][0 3][0 1]

Page 4: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

There are hidden, or latent factors, latent dimensions that – to a close approximation –explain why the values are as they appear in the data matrix

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 4

Page 5: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

The axes of these dimensions can be chosen by:§ The first dimension is the direction in which the

points exhibit the greatest variance

§ The second dimension is the direction, orthogonal to the first, in which points show the greatest variance

§ And so on…, until you have enough dimensions that variance is really low

51/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets

Page 6: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Q: What is rank of a matrix A?¡ A: Number of linearly independent rows of A¡ Cloud of points 3D space:

§ Think of point positionsas a matrix:

¡ We can rewrite coordinates more efficiently!§ Old basis vectors: [1 0 0] [0 1 0] [0 0 1]§ New basis vectors: [1 2 1] [-2 -3 1]§ Then A has new coordinates: [1 0], B: [0 1], C: [1 -1]

§ Notice: We reduced the number of coordinates!

1 row per point:

ABC

A

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 6

Page 7: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Goal of dimensionality reduction is to discover the axis of data!

Rather than representingevery point with 2 coordinateswe represent each point with1 coordinate (corresponding tothe position of the point on the red line).

By doing this we incur a bit oferror as the points do not exactly lie on the line

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 7

Page 8: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent
Page 9: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Gives a decomposition of any matrix into a product of three matrices:

¡ There are strong constraints on the form of each of these matrices§ Results in a decomposition that is unique

¡ From this decomposition, you can choose any number ! of intermediate concepts (latent factors) in a way that minimizes the reconstruction error

91/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets

A Um

rnVT

n

r

~

´ ´S

r

Page 10: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ A: Input data matrix§ m x n matrix (e.g., m documents, n terms)

¡ U: Left singular vectors § m x r matrix (m documents, r concepts)

¡ S: Singular values§ r x r diagonal matrix (strength of each ‘concept’)

(r : rank of the matrix A)¡ V: Right singular vectors

§ n x r matrix (n terms, r concepts)1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 10

Am

n

Sm

n

U

VT

»

T

rrr

Page 11: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

11

Am

n

» +

s1u1v1 s2u2v2

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets

σi … scalarui … vectorvi … vector

T

If we set s2 = 0, then the greencolumns may as well not exist.

Page 12: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

It is always possible to decompose a real matrix A into A = U S VT , where

¡ U, S, V: unique¡ U, V: column orthonormal§ UT U = I; VT V = I (I: identity matrix)§ (Columns are orthogonal unit vectors)

¡ S: diagonal§ Entries (singular values) are positive,

and sorted in decreasing order (σ1 ³ σ2 ³ ... ³ 0)

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 12

Nice proof of uniqueness: http://www.mpi-inf.mpg.de/~bast/ir-seminar-ws04/lecture2.pdf

Page 13: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Consider a matrix. What does SVD do?

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 13

=SciFi

Romance

Matrix

Alie

n

Serenity

Casablan

ca

Amelie

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

Sm

n

U

VT

“Concepts” AKA Latent dimensionsAKA Latent factors

Page 14: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ A = U S VT - example: Users to Movies

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 14

=SciFi

x x

Matrix

Alie

n

Serenity

Casablan

ca

Amelie

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32

12.4 0 00 9.5 00 0 1.3

0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09

Romance

Page 15: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ A = U S VT - example: Users to Movies

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 15

SciFi-conceptRomance-concept

=SciFi

x x

Matrix

Alie

n

Serenity

Casablan

ca

Amelie

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32

12.4 0 00 9.5 00 0 1.3

0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09

Romance

Page 16: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ A = U S VT - example:

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 16

Romance-concept

U is “user-to-concept” similarity matrix

SciFi-concept

=SciFi

x x

Matrix

Alie

n

Serenity

Casablan

ca

Amelie

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32

12.4 0 00 9.5 00 0 1.3

0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09

Romance

Page 17: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ A = U S VT - example:

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 17

SciFi

SciFi-concept

“strength” of the SciFi-concept

=SciFi

x x

Matrix

Alie

n

Serenity

Casablan

ca

Amelie

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32

12.4 0 00 9.5 00 0 1.3

0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09

Romance

Page 18: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ A = U S VT - example:

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 18

SciFi-concept

V is “movie-to-concept”similarity matrix

SciFi-concept

=SciFi

x x

Matrix

Alie

n

Serenity

Casablan

ca

Amelie

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32

12.4 0 00 9.5 00 0 1.3

0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09

Romance

Page 19: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 19

‘movies’, ‘users’ and ‘concepts’:¡ U: user-to-concept matrix

¡ V: movie-to-concept matrix

¡ S: its diagonal elements: ‘strength’ of each concept

Page 20: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent
Page 21: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 21

v1

first right singular vector

Movie 1 rating

Mov

ie 2

ratin

g

¡ Instead of using two coordinates (", $) to describe point locations, let’s use only one coordinate

¡ Point’s position is its location along vector &'

Page 22: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ A = U S VT - example:§ V: “movie-to-concept” matrix§ U: “user-to-concept” matrix

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 22

v1

first right singular vector

Movie 1 rating

Mov

ie 2

ratin

g

= x x

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32

12.4 0 00 9.5 00 0 1.3

0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09

Page 23: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ A = U S VT - example:

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 23

v1

first right singular vector

Movie 1 rating

Mov

ie 2

ratin

g

variance (‘spread’) on the v1 axis

= x x

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32

12.4 0 00 9.5 00 0 1.3

0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09

Page 24: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

A = U S VT - example:¡ U S: Gives the coordinates

of the points in the projection axis

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 24

v1

first right singular vector

Movie 1 rating

Mov

ie 2

ratin

g

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

1.61 0.19 -0.015.08 0.66 -0.036.82 0.85 -0.058.43 1.04 -0.061.86 -5.60 0.840.86 -6.93 -0.870.86 -2.75 0.41

Projection of users on the “Sci-Fi” axis (U S) T:

Page 25: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

More details¡ Q: How is dim. reduction done?

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 25

= x x

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32

12.4 0 00 9.5 00 0 1.3

0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09

Page 26: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

More details¡ Q: How exactly is dim. reduction done?¡ A: Set smallest singular values to zero

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 26

= x x

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32

12.4 0 00 9.5 00 0 1.3

0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09

Page 27: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

More details¡ Q: How exactly is dim. reduction done?¡ A: Set smallest singular values to zero

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 27

x x

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32

12.4 0 00 9.5 00 0 1.3

0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09

»

Page 28: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

More details¡ Q: How exactly is dim. reduction done?¡ A: Set smallest singular values to zero

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 28

x x

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32

12.4 0 00 9.5 00 0 1.3

0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09

»

This is Rank 2 approximation to A. We could also do Rank 1 approx.The larger the rank the more accurate the approximation.

Page 29: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

More details¡ Q: How exactly is dim. reduction done?¡ A: Set smallest singular values to zero

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 29

» x x

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

0.13 0.020.41 0.070.55 0.090.68 0.110.15 -0.590.07 -0.730.07 -0.29

12.4 0 0 9.5

0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.69

This is Rank 2 approximation to A. We could also do Rank 1 approx.The larger the rank the more accurate the approximation.

Page 30: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

More details¡ Q: How exactly is dim. reduction done?¡ A: Set smallest singular values to zero

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 30

»

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

0.92 0.95 0.92 0.01 0.012.91 3.01 2.91 -0.01 -0.013.90 4.04 3.90 0.01 0.014.82 5.00 4.82 0.03 0.030.70 0.53 0.70 4.11 4.11-0.69 1.34 -0.69 4.78 4.780.32 0.23 0.32 2.01 2.01

Frobenius norm:

ǁMǁF = ÖΣij Mij2 ǁA-BǁF = Ö Σij (Aij-Bij)2

is “small”

This is Rank 2 approximation to A. We could also do Rank 1 approx.The larger the rank the more accurate the approximation

Reconstructed data matrix B

Page 31: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Fact: SVD gives ‘best’ axis to project on:§ ‘best’ = minimizing the sum of reconstruction errors

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 31

A USigma

VT=

B USigma

VT=

B is best approximation of A:

! − # $ = & !'( − #'()

'(

Page 32: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ SVD: A= U S VT: unique§ U: user-to-concept similarities§ V: movie-to-concept similarities§ S : strength of each concept

¡ Q: So what’s a good value for r?¡ Let the energy of a set of singular values be the sum of

their squares.¡ Pick r so the retained singular values have at least 90%

of the total energy.

¡ Back to our example:§ With singular values 12.4, 9.5, and 1.3, total energy = 245.7 § If we drop 1.3, whose square is only 1.7, we are left with

energy 244, or over 99% of the total1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 33

Page 33: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent
Page 34: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ How the SVD is actually computed?

¡ First we need a method for finding the principal eigenvalue (the largest one) and the

corresponding eigenvector of a symmetric matrix

§ ! is symmetric if "#$ = "$# for all # and $¡ Method:§ Start with any “guess eigenvector” '0§ Construct ')*+ = ,'-

| ,'- |for / = 0, 1, …

§ || … || denotes the Frobenius norm

§ Stop when consecutive '/ show little change

351/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets

Page 35: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

36

M = 1 22 3 x0 = 1

1

Mx0

||Mx0||= 3

5 /Ö34 = 0.510.86 = x1

Mx1

||Mx1||= 2.23

3.60 /Ö17.93 = 0.530.85 = x2

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets

…..

Page 36: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Once you have the principal eigenvector !, you find its eigenvalue l by l = !$%!.§ In proof: We know !l = %! if l is the eigenvalue;

multiply both sides by !$ on the left.§ Since !$! = 1we have l = !$%!

¡ Example: If we take xT = [0.53, 0.85], then l =

37

][[ ][0.53 0.85] 1 22 3

0.530.85

= 4.25

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets

Page 37: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Eliminate the portion of the matrix M that can be generated by the first eigenpair, l and ":

#∗:= #– �))*¡ Recursively find the principal eigenpair for #∗,

eliminate the effect of that pair, and so on

¡ Example:

38

M* = [ ] -0.19 0.090.09 0.07– 4.25 [ ]0.53

0.85 [0.53 0.85]1 22 3 = [ ]

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets

Page 38: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Start by supposing ! = $S%&¡ '( = (*S+()( = (+()(S(*( = +S*(§ Why? (1) Rule for transpose of a product; (2) the

transpose of the transpose and the transpose of a diagonal matrix are both the identity function

¡ !&! = %S$&$S%& = %S-%&§ Why? * is orthonormal, so *(* is an identity matrix§ Also note that S2 is a diagonal matrix whose /-th

element is the square of the /-th element of S¡ !&!% = %S-%&% = %S-§ Why? + is also orthonormal

391/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets

Page 39: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Starting with ("#")% = %S2, note that therefore the )-th column of % is an eigenvector of "#", and its eigenvalue is the )-th element of S2

¡ Thus, we can find % and S by finding the eigenpairs for "#"§ Once we have the eigenvalues in S2, we can find

the singular values by taking the square root of these eigenvalues

¡ Symmetric argument, starting with ""#, gives us *

401/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets

Page 40: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ To compute the full SVD using specialized methods:§ O(nm2) or O(n2m) (whichever is less)

¡ But:§ Less work, if we just want singular values§ or if we want first k singular vectors§ or if the matrix is sparse

¡ Implemented in linear algebra packages like§ LINPACK, Matlab, SPlus, Mathematica ...

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 41

Page 41: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent
Page 42: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Q: Find users that like ‘Matrix’¡ A: Map query into a ‘concept space’ – how?

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 43

=SciFi

Romnce

x x

Matrix

Alie

n

Serenity

Casablan

ca

Amelie

1 1 1 0 03 3 3 0 04 4 4 0 05 5 5 0 00 2 0 4 40 0 0 5 50 1 0 2 2

0.13 0.02 -0.010.41 0.07 -0.030.55 0.09 -0.040.68 0.11 -0.050.15 -0.59 0.650.07 -0.73 -0.670.07 -0.29 0.32

12.4 0 00 9.5 00 0 1.3

0.56 0.59 0.56 0.09 0.090.12 -0.02 0.12 -0.69 -0.690.40 -0.80 0.40 0.09 0.09

Page 43: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Q: Find users that like ‘Matrix’¡ A: Map query into a ‘concept space’ – how?

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 44

5 0 0 0 0

q =

MatrixAlie

n

v1

q

v2

Matrix

Alie

n

Serenity

Casablan

ca

Amelie

Project into concept space:Inner product with each ‘concept’ vector vi

Page 44: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Q: Find users that like ‘Matrix’¡ A: Map query into a ‘concept space’ – how?

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 45

v1

q

q*v15 0 0 0 0

Matrix

Alie

n

Serenity

Casablan

ca

Amelie

v2

MatrixAlie

n

q =

Project into concept space:Inner product with each ‘concept’ vector vi

Page 45: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

Compactly, we have:qconcept = q V

E.g.:

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 46

movie-to-conceptsimilarities (V)

=

SciFi-concept

5 0 0 0 0

Matrix

Alie

n

Serenity

Casablan

ca

Amelie

q =

0.56 0.120.59 -0.020.56 0.120.09 -0.690.09 -0.69

x 2.8 0.6

Page 46: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ How would the user d that rated (‘Alien’, ‘Serenity’) be handled?dconcept = d V

E.g.:

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 47

movie-to-conceptsimilarities (V)

=

SciFi-concept

0 4 5 0 0

Matrix

Alie

n

Serenity

Casablan

ca

Amelie

q =

0.56 0.120.59 -0.020.56 0.120.09 -0.690.09 -0.69

x 5.2 0.4

Page 47: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Observation: User d that rated (‘Alien’, ‘Serenity’) will be similar to user q that rated (‘Matrix’), although d and q have zero ratings in common!

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 48

0 4 5 0 0

d =

SciFi-concept

5 0 0 0 0

q =

Matrix

Alie

n

Serenity

Casablan

ca

Amelie

Zero ratings in common Similarity > 0

2.8 0.6

5.2 0.4

Page 48: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

+ Optimal low-rank approximationin terms of Frobenius norm- Interpretability problem:§ A singular vector specifies a linear

combination of all input columns or rows- Lack of sparsity:§ Singular vectors are dense!

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 49

=U

SVT

Page 49: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent
Page 50: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ It is common for the matrix ! that we wish to decompose to be very sparse

¡ But " and # from a SVD decomposition will not be sparse

¡ CUR decomposition solves this problem by using only (randomly chosen) rows and columns of !

511/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets

Page 51: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Goal: Express ! as a product of matrices ",$, %Make ǁ! − " · $ · %ǁ) small

¡ “Constraints” on " and %:

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 52

A C U R

Frobenius norm:

ǁXǁF = Ö Σij Xij2

Page 52: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Goal: Express ! as a product of matrices ",$, %Make ǁ! − " · $ · %ǁ) small

¡ “Constraints” on " and %:

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 53

Pseudo-inverse of the intersection of " and%

A C U R

Frobenius norm:

ǁXǁF = Ö Σij Xij2

Page 53: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Let ! be the “intersection” of sampled columns " and rows #§ Let SVD of ! = &'()

¡ Then: * = !+= (',&)§ Z+: reciprocals of non-zero

singular values: Z+ii =1/ Zii

§ Def: W+ is the pseudoinverse

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 54

Why the intersection? These are high magnitude numbersWhy pseudoinverse works?- = ./0)then -12 = 03 12/12.12Due to orthonomality: .12 = .3, 012 = 03Since Z is diagonal Z12 = 1//88Thus, if W is nonsingular, pseudoinverse is the true inverse

A W=

columns, C

row

s, R

intersection

Page 54: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ To decrease the expected error between ! and its decomposition, we must pick rows and columns in a nonuniform manner

¡ The importance of a row or column of ! is the square of its Frobinius norm.§ That is, the sum of the squares of its elements.

¡ When picking rows and columns, the probabilities must be proportional to importance.

¡ Example: [3,4,5] has importance 50, and [3,0,1] has importance 10, so pick the first 5 times as often as the second

551/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets

Page 55: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Sampling columns (similarly for rows):

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 56

Note this is a randomized algorithm, same column can be sampled more than once

Page 56: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Rough and imprecise intuition behind CUR§ CUR is more likely to pick points away from the origin

§ Assuming smooth data with no outliers these are the directions of maximum variation

¡ Example: Assume we have 2 clouds at an angle§ SVD dimensions are orthogonal and thus will be in the

middle of the two clouds§ CUR will find the two clouds (but will be redundant)

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 57

Singular vectorActual column

Page 57: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ For example:§ Select ! = $ % &'( %

)* columns of A using ColumnSelect algorithm

§ Select + = $ % &'( %)* rows of A using

ColumnSelect algorithm§ Set , = -.

¡ Then: / − 123 4 ≤ 2 + 8 / − /9 4with probability 98%

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 58

In practice:Pick 4k cols/rowsfor a “rank-k” approximation

SVD errorCUR error

Page 58: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

+ Easy interpretation• Since the basis vectors are actual

columns and rows+ Sparse basis

• Since the basis vectors are actual columns and rows

- Duplicate columns and rows• Columns of large norms will be sampled many

times

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 59

Singular vectorActual column

Page 59: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 60

SVD: A = U S VT

Huge but sparse Big and dense

CUR: A = C U RHuge but sparse Big but sparse

dense but small

sparse and small

Page 60: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ DBLP bibliographic data§ Author-to-conference big sparse matrix§ Aij: Number of papers published by author i at

conference j§ 428K authors (rows), 3659 conferences (columns)

§ Very sparse¡ Want to reduce dimensionality§ How much time does it take?

§ What is the reconstruction error?§ How much space do we need?

1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 61

Page 61: CS246: Mining Massive Datasets Jure Leskovec, … · CS246: Mining Massive Datasets Jure Leskovec, ... Romance ix n y a lie ... 0 0 0 5 5 0 1 0 2 2 S m n U VT “Concepts” AKA Latent

¡ Accuracy:§ 1 – relative sum squared errors

¡ Space ratio: § #output matrix entries / #input matrix entries

¡ CPU time1/25/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets 62

SVDCURCUR no duplicates

SVDCURCUR no dup

Sun, Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM ’07.

CUR

SVD