SINGULAR VALUE DECOMPOSITION
Transcript of hari/teaching/bigdata/14_SVD.pdf
-
SVD: SINGULAR VALUE DECOMPOSITION
-
Review …
Last time
Recommendation Systems
Decompositions of the Utility Matrix
Gradient Descent
Today
Singular Value Decomposition
-
Dimensionality reduction
Utility Matrix, $M$, is low rank: Singular Value Decomposition
$M \in \mathbb{R}^{m \times n}$
$M = U V$, with $U \in \mathbb{R}^{m \times d}$, $V \in \mathbb{R}^{d \times n}$
-
Singular Value Decomposition
SWISS ARMY KNIFE OF LINEAR ALGEBRA
-
Goal: Given an $m \times n$ matrix $A$, compute

$A = U \Sigma V^T = \sum_{j=1}^{\min(m,n)} \sigma_j u_j v_j^T$

(dimensions: $A$ is $m \times n$, $U$ is $m \times \min(m,n)$, $\Sigma$ is $\min(m,n) \times \min(m,n)$, $V^T$ is $\min(m,n) \times n$)

$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(m,n)} \ge 0$ are the singular values of $A$,
$u_1, u_2, \ldots$ are orthonormal, the left singular vectors of $A$, and
$v_1, v_2, \ldots$ are orthonormal, the right singular vectors of $A$.
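The following minimal numpy sketch (not from the slides; the random test matrix is my own choice) computes this factorization and checks the stated properties:

```python
import numpy as np

# Compute A = U Sigma V^T and verify the properties listed above.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # economy-size SVD
print(np.all(s[:-1] >= s[1:]) and np.all(s >= 0))  # sigma_1 >= ... >= 0
print(np.allclose(A, U @ np.diag(s) @ Vt))         # A is reconstructed exactly
print(np.allclose(U.T @ U, np.eye(4)),             # left singular vectors
      np.allclose(Vt @ Vt.T, np.eye(4)))           # and right ones, orthonormal
```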
-
Singular Value Decomposition
Closely related problems:
Eigenvalue decomposition: $A \approx V \Lambda V^{-1}$
Spanning columns or rows (CUR decomposition): $A \approx C U R$
Applications:
Principal Component Analysis: form an empirical covariance matrix from some collection of statistical data; by computing the singular value decomposition of the matrix, you find the directions of maximal variance (see the sketch after this list).
Finding spanning columns or rows: collect statistical data in a large matrix; by finding a set of spanning columns, you can identify some variables that "explain" the data (say, a small collection of genes among a set of recorded genomes, or a small number of stocks in a portfolio).
Relaxed solutions to $k$-means clustering: relaxed solutions can be found via the singular value decomposition.
PageRank: the principal eigenvector.
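To make the PCA item concrete, here is a small sketch of my own, using the (equivalent) SVD of the centered data matrix rather than forming the covariance matrix explicitly; the data is synthetic:

```python
import numpy as np

# PCA sketch: the right singular vectors of the centered data matrix are the
# directions of maximal variance; sigma_i^2 / (m - 1) are the variances.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 3))  # synthetic data
Xc = X - X.mean(axis=0)                       # center each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print("principal directions (rows):\n", Vt)
print("explained variances:", s**2 / (len(X) - 1))
```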
-
Singular values, intuition
[Figure: blue circles are $m$ data points in 2D.]
Take the SVD of the $m \times 2$ data matrix:
$v_1$, the 1st (right) singular vector, is the direction of maximal variance;
$\sigma_1$ measures how much of the data variance is explained by the first singular vector.
$v_2$, the 2nd (right) singular vector, is the direction of maximal variance after removing the projection of the data along the first singular vector;
$\sigma_2$ measures how much of the data variance is explained by the second singular vector.
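A toy numpy version of this picture, with a synthetic elongated point cloud of my own:

```python
import numpy as np

# Toy 2-D point cloud: recover v1, v2 and sigma_1, sigma_2 from the m x 2 matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2)) @ np.array([[3.0, 0.5], [0.5, 0.4]])
X -= X.mean(axis=0)               # center, so the directions measure variance
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print("v1 (max variance):        ", Vt[0])
print("v2 (orthogonal remainder):", Vt[1])
print("sigma_1, sigma_2:", s)     # sigma_1 >> sigma_2 for this elongated cloud
```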
-
SVD - Interpretation
$X = U \Sigma V^T$ - example:

$\underbrace{\begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 2 & 2 & 2 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \\ 5 & 5 & 5 & 0 & 0 \\ 0 & 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix}}_{X} = \underbrace{\begin{bmatrix} 0.18 & 0 \\ 0.36 & 0 \\ 0.18 & 0 \\ 0.90 & 0 \\ 0 & 0.53 \\ 0 & 0.80 \\ 0 & 0.27 \end{bmatrix}}_{U} \times \underbrace{\begin{bmatrix} 9.64 & 0 \\ 0 & 5.29 \end{bmatrix}}_{\Sigma} \times \underbrace{\begin{bmatrix} 0.58 & 0.58 & 0.58 & 0 & 0 \\ 0 & 0 & 0 & 0.71 & 0.71 \end{bmatrix}}_{V^T}$

The first row of $V^T$ is $v_1^T$.
-
SVD - Interpretation
$X = U \Sigma V^T$ - same example as above.
$\sigma_1 = 9.64$ measures the variance ("spread") of the data on the $v_1$ axis.
-
SVD - Interpretation
$X = U \Sigma V^T$ - same example as above.
$U \Sigma$ gives the coordinates of the points in the projection axes $v_1$, $v_2$.
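These three slides can be reproduced directly with numpy (the signs of the singular vectors may differ from the slides):

```python
import numpy as np

# The slides' 7 x 5 example, reproduced; rows of U * Sigma are the coordinates
# of the data points (rows of X) along the v1 and v2 axes.
X = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0], [0, 0, 0, 2, 2], [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(np.round(s[:2], 2))          # [9.64, 5.29], as on the slides
coords = U[:, :2] * s[:2]          # per-point coordinates on the v1, v2 axes
print(np.round(coords, 2))
```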
-
Dimensionality reduction
Set the smallest singular values to zero: in the factorization above, zero out $\sigma_2 = 5.29$ and keep only $\sigma_1 = 9.64$.
-
Dimensionality reduction
$X \approx u_1 \sigma_1 v_1^T$:

$\begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 2 & 2 & 2 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \\ 5 & 5 & 5 & 0 & 0 \\ 0 & 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix} \approx \begin{bmatrix} 0.18 \\ 0.36 \\ 0.18 \\ 0.90 \\ 0 \\ 0 \\ 0 \end{bmatrix} \times 9.64 \times \begin{bmatrix} 0.58 & 0.58 & 0.58 & 0 & 0 \end{bmatrix}$
-
Dimensionality reduction
$\begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 2 & 2 & 2 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \\ 5 & 5 & 5 & 0 & 0 \\ 0 & 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix} \approx \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 2 & 2 & 2 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \\ 5 & 5 & 5 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$
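Continuing the numpy check above, zeroing out $\sigma_2$ reproduces the right-hand side:

```python
import numpy as np

# Best rank-1 approximation (keep sigma_1, u_1, v_1 only) of the same example.
X = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0], [0, 0, 0, 2, 2], [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X1 = s[0] * np.outer(U[:, 0], Vt[0])   # sigma_1 * u_1 * v_1^T
print(np.round(X1, 2))                 # the 2/3/1 block becomes all zeros
```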
-
Relation to Eigenvalue Decomposition
Eigenvalue decomposition: Can only be applied to certain classes of square matrices
Given an SVD of a matrix $M$:
$M^T M = (V \Sigma^T U^T)(U \Sigma V^T) = V (\Sigma^T \Sigma) V^T$
$M M^T = (U \Sigma V^T)(V \Sigma^T U^T) = U (\Sigma \Sigma^T) U^T$
The columns of $V$ are eigenvectors of $M^T M$.
The columns of $U$ are eigenvectors of $M M^T$.
The non-zero elements of $\Sigma$ are the square roots of the non-zero eigenvalues of $M M^T$ or $M^T M$.
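A quick numerical confirmation of these identities:

```python
import numpy as np

# sigma_i^2 of M are the eigenvalues of M^T M (columns of V its eigenvectors,
# up to sign); an analogous check works for M M^T and the columns of U.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(M, full_matrices=False)
evals = np.linalg.eigvalsh(M.T @ M)        # eigenvalues in ascending order
print(np.allclose(np.sort(s**2), evals))   # True
```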
-
Calculating inverses with SVD
Let $A$ be an $n \times n$ matrix.
Then $U$, $\Sigma$, and $V$ are also $n \times n$.
$U$ and $V$ are orthogonal, so their inverses are equal to their transposes.
$\Sigma$ is diagonal, so its inverse is the diagonal matrix whose elements are the inverses of the elements of $\Sigma$:

$A^{-1} = V \begin{bmatrix} 1/\sigma_1 & & \\ & \ddots & \\ & & 1/\sigma_n \end{bmatrix} U^T$
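A small numpy check of this formula (the 2x2 matrix is my own example):

```python
import numpy as np

# A^{-1} = V diag(1/sigma_1, ..., 1/sigma_n) U^T for a nonsingular square A.
A = np.array([[4.0, 1.0], [2.0, 3.0]])
U, s, Vt = np.linalg.svd(A)
A_inv = Vt.T @ np.diag(1.0 / s) @ U.T
print(np.allclose(A_inv @ A, np.eye(2)))  # True
```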
-
Calculating inverses
If one of the $\sigma_i$ is zero, or so small that its value is dominated by round-off error, then there is a problem.
The more of the $\sigma_i$ that have this problem, the "more singular" $A$ is.
The SVD gives a way of determining how singular $A$ is.
The concept of "how singular" $A$ is connects to the condition number of $A$:
the condition number of $A$ is the ratio of its largest singular value to its smallest, $\kappa(A) = \sigma_1 / \sigma_n$.
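In numpy terms:

```python
import numpy as np

# The condition number of A is sigma_1 / sigma_n.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
print(s[0] / s[-1])                      # ratio of extreme singular values
print(np.linalg.cond(A))                 # numpy's 2-norm condition number agrees
```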
-
Concepts you should know
Null space of $A$: $\{ x \mid A x = 0 \}$
Range of $A$: $\{ b \mid A x = b \text{ for some vector } x \}$
Rank of $A$: the dimension of the range of $A$
The Singular Value Decomposition constructs orthonormal bases for the range and null space of a matrix:
The columns of $U$ which correspond to non-zero singular values of $A$ are an orthonormal set of basis vectors for the range of $A$.
The columns of $V$ which correspond to zero singular values form an orthonormal basis for the null space of $A$ (illustrated in the snippet below).
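A numpy illustration on a small rank-deficient matrix of my own:

```python
import numpy as np

# Reading orthonormal bases for range(A) and null(A) off the SVD.
A = np.array([[1.0, 1.0, 0.0],
              [2.0, 2.0, 0.0],
              [0.0, 0.0, 0.0]])
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))      # numerical rank (here 1)
range_basis = U[:, :r]          # columns of U with non-zero singular values
null_basis = Vt[r:, :].T        # columns of V with zero singular values
print(np.allclose(A @ null_basis, 0))   # True: null-space vectors are annihilated
```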
-
Computing the SVD
Reduce the matrix $M$ to a bidiagonal matrix (a sketch of this phase follows the list):
Householder reflections
QR decomposition
Compute the SVD of the bidiagonal matrix:
Iterative methods
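For concreteness, here is a rough sketch of the first phase, Golub-Kahan bidiagonalization with Householder reflections; this is my own illustration (assuming $m \ge n$), not code from the course:

```python
import numpy as np

def bidiagonalize(A):
    """Golub-Kahan bidiagonalization via Householder reflections (m >= n).
    Returns U, B, V with A = U @ B @ V.T and B upper bidiagonal."""
    B = A.astype(float).copy()
    m, n = B.shape
    U, V = np.eye(m), np.eye(n)
    for j in range(n):
        # Left reflection: zero out B[j+1:, j], below the diagonal.
        x = B[j:, j].copy()
        x[0] += np.copysign(np.linalg.norm(x), x[0])
        if np.linalg.norm(x) > 0:
            v = x / np.linalg.norm(x)
            B[j:, :] -= 2.0 * np.outer(v, v @ B[j:, :])
            U[:, j:] -= 2.0 * np.outer(U[:, j:] @ v, v)
        # Right reflection: zero out B[j, j+2:], right of the superdiagonal.
        if j < n - 2:
            x = B[j, j + 1:].copy()
            x[0] += np.copysign(np.linalg.norm(x), x[0])
            if np.linalg.norm(x) > 0:
                v = x / np.linalg.norm(x)
                B[:, j + 1:] -= 2.0 * np.outer(B[:, j + 1:] @ v, v)
                V[:, j + 1:] -= 2.0 * np.outer(V[:, j + 1:] @ v, v)
    return U, B, V

A = np.random.default_rng(0).standard_normal((5, 4))
U, B, V = bidiagonalize(A)
print(np.allclose(A, U @ B @ V.T))  # True; B is bidiagonal up to round-off
```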
-
Randomized SVD
-
Goal: Given an $m \times n$ matrix $A$, for large $m, n$, we seek to compute a rank-$k$ approximation, with $k \ll \min(m,n)$:

$A \approx U \Sigma V^T = \sum_{j=1}^{k} \sigma_j u_j v_j^T$

(dimensions: $A$ is $m \times n$, $U$ is $m \times k$, $\Sigma$ is $k \times k$, $V^T$ is $k \times n$)

$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k \ge 0$ are the (approximate) singular values of $A$,
$u_1, u_2, \ldots, u_k$ are orthonormal, the (approximate) left singular vectors of $A$, and
$v_1, v_2, \ldots, v_k$ are orthonormal, the (approximate) right singular vectors of $A$.
-
Randomized SVD
1. Draw an $n \times k$ Gaussian random matrix $\Omega$
2. Form the $m \times k$ sample matrix $Y = A \Omega$
3. Form an $m \times k$ orthonormal matrix $Q$ such that $Y = Q R$
4. Form the $k \times n$ matrix $B = Q^T A$
5. Compute the SVD of the small matrix $B$: $B = \hat{U} \Sigma V^T$
6. Form the matrix $U = Q \hat{U}$
Computational costs?
Steps 2, 4: $k$ matrix-vector products with $A$.
Steps 3, 5, 6: dense operations on matrices of size $m \times k$ and $k \times n$ (a numpy transcription of steps 1-6 follows).
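A direct numpy transcription of the six steps; this is a sketch without the oversampling discussed later, and `randomized_svd` and its test matrix are my own names and choices:

```python
import numpy as np

def randomized_svd(A, k, rng=None):
    """Steps 1-6 of the basic randomized SVD, as a plain numpy sketch."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    Omega = rng.standard_normal((n, k))       # 1. n x k Gaussian random matrix
    Y = A @ Omega                             # 2. m x k sample matrix
    Q, _ = np.linalg.qr(Y)                    # 3. m x k orthonormal matrix, Y = QR
    B = Q.T @ A                               # 4. k x n small matrix
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)  # 5. SVD of B
    U = Q @ U_hat                             # 6. approximate left singular vectors
    return U, s, Vt

# Usage: an exactly rank-20 test matrix is recovered to machine precision.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 20)) @ rng.standard_normal((20, 500))
U, s, Vt = randomized_svd(A, k=20, rng=1)
print(np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A))  # ~1e-15
```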
-
Computational Costs
If $A$ can fit in RAM:
Cost dominated by the $2mnk$ flops required for steps 2 and 4.
If $A$ cannot fit in RAM:
Standard approaches suffer.
Randomized SVD is successful as long as the matrices of size $m \times k$ and $k \times n$ fit in RAM (a streaming sketch follows).
Parallelization:
Steps 2 and 4 permit $k$-way parallelization.
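One way to see why $A$ need not fit in RAM: steps 2 and 4 touch $A$ only through matrix products, so $A$ can be streamed block by block. A hypothetical helper of my own, where `row_blocks` is any re-iterable sequence of horizontal slices of $A$:

```python
import numpy as np

def stream_sample(row_blocks, Omega):
    """Step 2 out of core: Y = A @ Omega accumulated one row block at a time."""
    return np.vstack([block @ Omega for block in row_blocks])

def stream_project(row_blocks, Q):
    """Step 4 out of core: B = Q.T @ A accumulated over the same row blocks."""
    B, row = None, 0
    for block in row_blocks:
        Qb = Q[row:row + block.shape[0], :]   # rows of Q matching this block
        B = Qb.T @ block if B is None else B + Qb.T @ block
        row += block.shape[0]
    return B

# Usage with an in-memory stand-in for blocks read from disk:
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 300))
blocks = [A[i:i + 250] for i in range(0, 1000, 250)]
Omega = rng.standard_normal((300, 10))
Y = stream_sample(blocks, Omega)
Q, _ = np.linalg.qr(Y)
B = stream_project(blocks, Q)
print(np.allclose(Y, A @ Omega), np.allclose(B, Q.T @ A))  # True True
```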
-
Probabilistic Error Analysis
The error of the method is defined as
$e_k = \| A - A_k \|$
$e_k$ is a random variable whose theoretical minimum value is $\sigma_{k+1} = \min \{ \| A - A_k \| : A_k \text{ has rank } k \}$.
Ideally, we would like $e_k$ to be close to $\sigma_{k+1}$ with high probability.
This is not true: the expectation of $e_k / \sigma_{k+1}$ is large, and it has very large variance.
-
Oversampling …
Oversample a little. If $p$ is a small integer (think $p = 5$), then we often can bound $e_{k+p}$ by something close to $\sigma_{k+1}$:

$\mathbb{E} \, \| A - A_{k+p} \| \le \left( 1 + \frac{k}{p-1} \right) \sigma_{k+1} + \frac{e \sqrt{k+p}}{p} \left( \sum_{j=k+1}^{\min(m,n)} \sigma_j^2 \right)^{1/2}$
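A small experiment of my own suggesting how quickly modest oversampling closes the gap between the achieved error and $\sigma_{k+1}$:

```python
import numpy as np

# Error of the rank-(k+p) randomized sketch versus the optimal sigma_(k+1),
# on a synthetic matrix with geometrically decaying singular values.
rng = np.random.default_rng(0)
m, n, k = 200, 100, 10
Uo, _ = np.linalg.qr(rng.standard_normal((m, n)))
Vo, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 0.8 ** np.arange(n)                    # sigma_(j+1) = 0.8^j
A = Uo @ np.diag(sigma) @ Vo.T
for p in (1, 2, 5, 10):
    Omega = rng.standard_normal((n, k + p))    # oversampled Gaussian sketch
    Q, _ = np.linalg.qr(A @ Omega)
    err = np.linalg.norm(A - Q @ (Q.T @ A), 2) # e_(k+p) = ||A - Q Q^T A||
    print(f"p={p:2d}  error={err:.3e}  sigma_(k+1)={sigma[k]:.3e}")
```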