Scalable Machine Learning: Matrix factorization

Transcript of gobie.csb.pitt.edu/SML/MatrixFactorization.pdf (55 pages)

Page 1

Scalable Machine Learning

Matrix factorization

Page 2

Matrix factorization/decomposition in ML

Matrix factorizations arise as solutions to various practical problems:
- Matrix completion
- Missing value estimation
- Representation learning

Page 3

Recommender systems

There are two basic types of recommender systems:
- Items have side information. Example: Pandora. Many possible solutions, some of which can involve matrix factorization.
- Items have no side information. Also called collaborative filtering. Example: Netflix. We can only use other people's ratings, so we need to do matrix factorization either explicitly or implicitly.

Netflix: 100,480,507 ratings that 480,189 users gave to 17,770 movies.

1 million dollars goes to fancy matrix factorization!

Page 4

Matrix factorization/decomposition in math

Given a matrix Y, write Y as a product of 2 or 3 other matrices.

Page 5

Some useful matrix decompositions

Many decompositions apply only to square matrices and may require additional conditions.

- Eigen (diagonalization): square matrix, sometimes exists
- Cholesky: square matrix, sometimes exists

A strong condition for the existence of various decompositions is that a matrix is positive semi-definite, often written as Y ⪰ 0:

x^T Y x ≥ 0 for all x    (1)

Page 6

Square matrices in data analysis

- In data analysis matrices are actual data, not linear operators.
- If you are dealing with a square matrix it is most likely a sample covariance: Y^T Y or Y Y^T.
- Useful fact: all such matrices are positive semi-definite.
- Interesting fact: the converse is also true. All positive semi-definite matrices are inner products in some space. This is important for kernel learning methods like SVM.
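
As an illustration of the definition in (1) and the sample-covariance fact above, here is a minimal numpy sketch (toy data, not from the lecture) that forms Y^T Y and checks that it is positive semi-definite:

```python
import numpy as np

# Toy data matrix (features x samples); any real Y works here.
rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 8))

C = Y.T @ Y                       # a "sample covariance"-style square matrix

# Positive semi-definite: all eigenvalues are >= 0 (up to floating-point error)...
eigvals = np.linalg.eigvalsh(C)
print(eigvals.min() >= -1e-10)    # True

# ...equivalently, x^T C x >= 0 for any x, as in equation (1).
x = rng.standard_normal(8)
print(x @ C @ x >= 0)             # True
```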

Page 7

SVD

Most data matrices are not square! Any matrix, no matter how weird, has a Singular Value Decomposition.

U and V are orthonormal, so that U^T U = I and V^T V = I (assuming m ≥ n).

Page 8

SVD

Y = U D V^T    (2)

D is diagonal with d_i ≥ 0. By convention we order D so that d_i ≥ d_{i+1}.

Figure adapted from the Wikipedia “SVD” article.
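
A minimal numpy check of the decomposition (toy matrix, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((10, 6))              # m x n with m >= n

U, d, Vt = np.linalg.svd(Y, full_matrices=False)

print(np.allclose(Y, U @ np.diag(d) @ Vt))    # Y = U D V^T
print(np.allclose(U.T @ U, np.eye(6)))        # U^T U = I
print(np.allclose(Vt @ Vt.T, np.eye(6)))      # V^T V = I
print(np.all(np.diff(d) <= 0))                # singular values in decreasing order
```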

Page 9

Why? (Proof sketch)

- Y may be weird, but both Y^T Y and Y Y^T are nice: they are both positive semi-definite.
- So they both have eigen-decompositions P D P^{-1} = P D P^T, where D is non-negative and the eigenvectors in P are orthogonal.
- The eigenvectors of Y Y^T and Y^T Y give the left (U) and right (V) singular vectors of Y.
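
A quick numerical sanity check of this sketch on toy data (the eigenvectors of Y Y^T match the left singular vectors up to sign):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 5))

U, d, Vt = np.linalg.svd(Y, full_matrices=False)

# Eigen-decomposition of Y Y^T: its nonzero eigenvalues are the squared singular
# values, and the corresponding eigenvectors are the left singular vectors (up to sign).
evals, evecs = np.linalg.eigh(Y @ Y.T)
top = np.argsort(evals)[::-1][:5]

print(np.allclose(evals[top], d**2))
print(np.allclose(np.abs(evecs[:, top]), np.abs(U), atol=1e-6))
```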

Page 13

Algebraic intuition

SVD as a sum of rank-1 matrices. Each entry y_{i,j} of Y can be written as

y_{i,j} = ∑_{k=1}^{n} d_k U_{i,k} V^T_{k,j}    (3)

Figure adapted from the Wikipedia “SVD” article.
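
Equivalently, Y is a sum of rank-1 outer products d_k u_k v_k^T; a small numpy sketch on a toy matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((7, 5))

U, d, Vt = np.linalg.svd(Y, full_matrices=False)

# Rebuild Y as a sum of rank-1 matrices d_k * u_k * v_k^T.
Y_rebuilt = sum(d[k] * np.outer(U[:, k], Vt[k, :]) for k in range(len(d)))
print(np.allclose(Y, Y_rebuilt))    # True
```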

Page 14

Why is the SVD useful?

[Diagram: Y approximated by U, D, and V^T truncated to the top k components (dimensions n, p, k).]

Theorem: zeroing out all but the top k singular values in D gives the best rank-k approximation to the original matrix.

By “best” we mean in terms of the squared Frobenius norm, where Ŷ is the approximation:

||Y − Ŷ||_F^2 = ∑_{i,j} (Y_{i,j} − Ŷ_{i,j})^2

Plausibility argument:

||Y||_F^2 = tr(Y^T Y) = ∑_i d_i^2

where the d_i^2 are the eigenvalues of Y^T Y and the d_i are the singular values of Y.
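
A small numpy illustration of the theorem on toy data (the truncated SVD beats an arbitrary rank-k factorization in squared Frobenius error):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 20))
k = 5

U, d, Vt = np.linalg.svd(Y, full_matrices=False)
Y_k = U[:, :k] @ np.diag(d[:k]) @ Vt[:k, :]        # zero out all but the top k singular values

err_svd = np.linalg.norm(Y - Y_k, 'fro')**2

# Any other rank-k factorization has at least as much squared Frobenius error.
A, B = rng.standard_normal((30, k)), rng.standard_normal((k, 20))
err_random = np.linalg.norm(Y - A @ B, 'fro')**2

print(err_svd, err_random, err_svd <= err_random)  # True
```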

Page 18

An example: single cell RNAseq

- Most clinical data are tissue samples.
- Tissue is composed of different cell-types. Immune: 20 cell-types. Liver: 5 cell-types.
- We often want to know which cell-types are affected by the disease/drug.

Page 19

An example: single cell RNAseq

Page 20

Some single cell data

[Heatmap of single-cell expression for a panel of marker genes (Tnfrsf4, Cd4, Cd8a, Cd8b1, Gzma, C1qa, ...); color scale 0–6.]

Data has lots of 0s: either the gene is not on in that cell-type, or we failed to capture it.

Page 21

Some single cell data

[Heatmaps: raw data (left, color scale 0–6) and top rank-1 approximation (right, color scale 0–4) for the same gene panel.]

Page 22

Some single cell data

[Heatmaps: raw data, top rank-1 approximation, second rank-1 approximation, and rank-2 approximation for the same gene panel.]

Page 23

Some single cell data

[Heatmaps: raw data, rank-5 approximation, and rank-10 approximation for the same gene panel.]

Page 24

Filling in missing values

- The CD8 molecule is made from 2 different genes (Cd8a and Cd8b1).
- They only work together, and each cell either makes neither or both.

[Scatter plots of Cd8b1 vs Cd8a expression per cell: raw data (left) and rank-10 approximation (right).]

Page 25

What rank do we need?

[Heatmap: rank-10 approximation for the same gene panel.]

Why 10? Often the number of components is determined using the elbow plot.

[Elbow plot: variance explained vs. component index (1–20).]

Page 26

Singular values of a randomized matrix

[Elbow plots of variance explained vs. component index: original data (left) and the same matrix with randomized entries (right); the randomized matrix's curve is nearly flat, around 0.05 for every component.]
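
One way to produce such a null reference is to permute the entries within each row and recompute the spectrum; a minimal sketch on toy data (not the lecture's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data with one strong shared pattern across the first 30 rows.
Y = rng.standard_normal((100, 500))
v = rng.standard_normal(500)
Y[:30] += 2.0 * v

def variance_explained(M):
    Mc = M - M.mean(axis=1, keepdims=True)          # center each row
    d = np.linalg.svd(Mc, compute_uv=False)
    return d**2 / np.sum(d**2)

# "Randomized" matrix: permute the entries within each row independently.
Y_perm = np.apply_along_axis(rng.permutation, 1, Y)

print(variance_explained(Y)[:10])       # drops sharply after the first component
print(variance_explained(Y_perm)[:10])  # roughly flat: no component stands out
```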

Page 27

SVD vs PCA

In data analysis we frequently hear the term “Principal Component Analysis” or PCA.

- SVD is a matrix decomposition with a mathematical definition and can be applied to any matrix.
- PCA is a data analysis technique that reduces to applying SVD to data pre-processed in a specific way: we subtract the mean for each row (also called centering).
- If the mean is 0, then the rank-k approximations capture the variance!

[Scatter plots of mydata[, 1] vs mydata[, 2] with the first and second singular vectors drawn: centered data (left) and uncentered data (right).]
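
A minimal sketch of the centering-then-SVD recipe on toy data (not the lecture's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((50, 300)) + 3.0            # features x samples, nonzero mean

Yc = Y - Y.mean(axis=1, keepdims=True)              # subtract each row's mean (centering)
U, d, Vt = np.linalg.svd(Yc, full_matrices=False)

# With centered data, squared singular values are proportional to variance explained.
print(d**2 / np.sum(d**2))

# Cross-check against the eigenvalues of the covariance-style matrix Yc Yc^T.
eigvals = np.sort(np.linalg.eigvalsh(Yc @ Yc.T))[::-1]
print(np.allclose(eigvals, d**2))
```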


Page 29

SVD vs PCA

In PCA we also typically analyze just U or just V, and only care about the first few singular vectors. We assume the data is low rank, up to some error.

[Pairwise PCA scatter plots, colored by group (1–13). Cells PCA: PC1 (23.91%), PC2 (17.11%), PC3 (10.33%), PC4 (6.37%). Genes PCA: PC1 (24.72%), PC2 (16.05%), PC3 (11.46%), PC4 (7.37%).]

Page 30

Minimizing error and maximizing variance

SVD approximations can be viewed as:
- minimizing the squared error, or
- maximizing the variance along the singular vector directions.

Both views are equivalent and can be used to derive optimization algorithms.

Page 31

Projection intuition

Page 32

Two views of SVD

For a rank-1 approximation we have:

min_{u,v} ||Y − u d v^T||_F^2,  u ∈ R^{n×1}, v ∈ R^{m×1}    (4)
subject to ||u|| = 1, ||v|| = 1

This is the same as:

max_{u,v} u^T Y v    (5)

A simple algorithm for finding the first singular vectors:

1. Initialize a vector v with L2 norm 1.
2. Iterate:
   - u ← argmax_u u^T Y v subject to ||u||_2^2 ≤ 1
   - v ← argmax_v u^T Y v subject to ||v||_2^2 ≤ 1
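
Under the norm constraints each argmax has a closed form (normalize Y v and Y^T u), so the iteration is essentially power iteration. A minimal numpy sketch on a toy matrix:

```python
import numpy as np

def first_singular_vectors(Y, n_iter=100, seed=0):
    """Alternating maximization of u^T Y v with ||u|| = ||v|| = 1."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(Y.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = Y @ v
        u /= np.linalg.norm(u)        # argmax over u for fixed v
        v = Y.T @ u
        v /= np.linalg.norm(v)        # argmax over v for fixed u
    return u, u @ Y @ v, v            # left vector, top singular value, right vector

# Compare against numpy's SVD.
Y = np.random.default_rng(1).standard_normal((40, 25))
u, d1, v = first_singular_vectors(Y)
print(np.isclose(d1, np.linalg.svd(Y, compute_uv=False)[0]))   # True
```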

Page 33

SVD complexity

- The complexity of computing a full SVD (assuming m > n) is O(mn^2).
- If we want just the first k components we can do it in O(mnk).
- A very popular method is Randomized SVD (RSVD).

Page 34

RSVD

A ≈ Q Q^T A = Q B = Q Ũ Σ V^T = U Σ V^T    (6)

Here Q is a tall orthonormal matrix whose columns approximately span the range of A, B = Q^T A is small, B = Ũ Σ V^T is its (cheap) SVD, and U = Q Ũ.
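
A minimal randomized-SVD sketch along the lines of equation (6), using a Gaussian test matrix and no power iterations (an illustrative toy, not a production implementation):

```python
import numpy as np

def rsvd(A, k, oversample=10, seed=0):
    """Randomized SVD: sketch the range of A with a random matrix, then SVD the small projection."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + oversample))   # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)          # orthonormal basis Q with A ~ Q Q^T A
    B = Q.T @ A                             # small (k + oversample) x n matrix
    U_tilde, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_tilde)[:, :k], s[:k], Vt[:k, :]               # U = Q @ U_tilde

# Toy low-rank-plus-noise matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 10)) @ rng.standard_normal((10, 300))
A += 0.01 * rng.standard_normal(A.shape)

U, s, Vt = rsvd(A, k=10)
print(s)
print(np.linalg.svd(A, compute_uv=False)[:10])   # leading singular values agree closely
```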

Page 35

Denoising variant of SVD

We can approximate our matrix Y as a sum of low-rank and sparse matrices. This is often called robust PCA, or rPCA.

A ≈ L + S    (7)

This reduces to
- penalizing S entry-wise with the L1 norm, and
- penalizing the singular values of L with the L1 norm, forcing some of them to 0.
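
One relaxed formulation is min_{L,S} ½||Y − L − S||_F^2 + τ||L||_* + λ||S||_1, which can be attacked by alternating the two closed-form thresholding steps. This is a rough sketch under that assumption, not the exact algorithm from the slides:

```python
import numpy as np

def soft(X, t):
    """Entry-wise soft-thresholding (the prox of the L1 penalty)."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def rpca_sketch(Y, tau=1.0, lam=0.1, n_iter=200):
    """Alternate closed-form updates: singular value thresholding for L, soft-thresholding for S."""
    L = np.zeros_like(Y)
    S = np.zeros_like(Y)
    for _ in range(n_iter):
        U, d, Vt = np.linalg.svd(Y - S, full_matrices=False)
        L = U @ np.diag(soft(d, tau)) @ Vt      # shrink singular values of the low-rank part
        S = soft(Y - L, lam)                    # shrink entries of the sparse part
    return L, S

# Toy example: rank-2 signal plus a few large outliers.
rng = np.random.default_rng(0)
Y = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 40))
outliers = (rng.random(Y.shape) < 0.02) * 10.0
L, S = rpca_sketch(Y + outliers)
print(np.linalg.matrix_rank(L, tol=1e-6), np.count_nonzero(S))
```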

Page 36

rPCA example

Using rPCA for background subtraction.

Figure from "Low-Rank Modeling and Its Applications in Image Analysis".

Page 37

General matrix decomposition / factor analysis

We want to approximate a (features) × (samples) matrix Y_{f×s} as a product of two (sometimes three) low-rank matrices:

Y_{f×s} = L_{f×k} F_{k×s} + E    (8)

We generally refer to F as the factors and L as the loadings; E is the error.

What do we mean by approximate? We want to maximize the likelihood of Y. We may have some priors on L and F.

Page 39

General matrix decomposition

min_{L,F} Loss(Y, LF) + P_L(L) + P_F(F)    (9)

- We want to minimize some loss.
- We may want to penalize and constrain the factors and loadings.
- The choices are derived from a likelihood formulation.

Page 41

Consider the least squares error

min_{L,F} ||Y − LF||_F^2    (10)

What probabilistic assumption does least-squares loss correspond to?

Absent any other assumptions this is solved by the SVD: we set L = U_{1:k} D_{1:k}^{1/2} and F = D_{1:k}^{1/2} V_{1:k}^T.

*Note that the scaling by D is arbitrary: L = U and F = D V^T give the same reconstruction error.

Page 42

Matrix factorization for prediction

If all we care about is the loss there are many equivalent solutions:

Y_{n×m} ≈ L_{n×k} F_{k×m} = (L B_{k×k})(B^{-1}_{k×k} F)    (11)

Multiplying L and F by some invertible matrix (and its inverse) gives the same predicted Y.

But SVD is unique?? How does that make sense?
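
A small numpy sketch of both points: build L and F from the truncated SVD as on the previous slide, then rotate them by an invertible B without changing the reconstruction (B here is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 20))
k = 5

U, d, Vt = np.linalg.svd(Y, full_matrices=False)
L = U[:, :k] @ np.diag(np.sqrt(d[:k]))      # loadings:  U_{1:k} D^{1/2}
F = np.diag(np.sqrt(d[:k])) @ Vt[:k, :]     # factors:   D^{1/2} V^T_{1:k}

B = rng.standard_normal((k, k))             # any invertible k x k matrix
L2, F2 = L @ B, np.linalg.inv(B) @ F

print(np.allclose(L @ F, L2 @ F2))          # same predicted Y, different factors/loadings
```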

Page 43

Matrix factorization for prediction

What about the loss? Least squares loss is pretty standard:

min_{L,F} ||Y − LF||_F^2    (12)

If we care about predicting missing values, is this the best choice?

Hint: there are two possible problems!

Page 44

Beyond prediction: representation learning

We may want something more from our factorization than just predictions. The most famous machine learning dataset: 70,000 handwritten digits, each a 28 × 28 pixel image (784 pixels per image).

Approximating as rank-k

Page 45

Using SVD to understand your data

- The MNIST dataset is a 70,000 × 784 matrix.
- Each row is “really” one of just 10 digits!
- Does that correspond to the SVD representations? Do the singular vectors correspond to digits?

Singular vectors viewed as 28 × 28 pixel images:

Not really! If aliens are looking at MNIST and trying to understand handwritten digits, they shouldn't use SVD!
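
A sketch of how one might look at this, assuming scikit-learn's fetch_openml and matplotlib are available (downloading MNIST takes a while; we subsample for speed):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml

# Load MNIST as a 70,000 x 784 matrix (downloads on first use); subsample for speed.
X = fetch_openml('mnist_784', version=1, as_frame=False).data.astype(float)
X = X[:10000]

Xc = X - X.mean(axis=0)                    # remove the mean image
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# View the leading right singular vectors as 28 x 28 images.
fig, axes = plt.subplots(1, 6, figsize=(12, 2))
for i, ax in enumerate(axes):
    ax.imshow(Vt[i].reshape(28, 28), cmap='gray')
    ax.set_title(f'SV {i + 1}')
    ax.axis('off')
plt.show()
```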

Page 46

The why of representation learning?

Our data is generated by some process – there is a latent representation that captures that process.

- For MNIST the latent factors are digits.
- For Netflix the latent factors are movie attributes.
- For single cell data the latent factors are cell-types.

Can we recover them from matrix factorization?

Why would we want to?

Example: if the latent factors correspond to physical variables we can ask questions about causality. If they are linear combinations of real variables, that doesn't make sense!

Page 47

SVD does not generally lead to a mechanistically correct model

General factor analysis problem:

Y_{f×s} = L_{f×k} F_{k×s}    (13)

We can solve by SVD to minimize error, but:

- We hope that the individual vectors F_i are meaningful. SVD only guarantees minimum error.
- In fact, the F_i we get from SVD are by construction orthogonal. The real mechanistic model has no such restriction.
- Our SVD factors cannot in general capture the true latent structure.
- We can ask for the loadings L to be sparse (have lots of 0s) and positive:

minimize ||Y − LF||_F^2  subject to  L > 0,  ||L||_1 < t

More on this next week.
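
Non-negative matrix factorization is one concrete example of such a constrained factorization; a hedged sketch using scikit-learn (illustrative only, not necessarily the specific method the course covers next week):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Toy non-negative data built from sparse, positive loadings and factors.
L_true = rng.random((100, 4)) * (rng.random((100, 4)) < 0.3)
F_true = rng.random((4, 60))
Y = L_true @ F_true + 0.01 * rng.random((100, 60))

model = NMF(n_components=4, init='nndsvda', max_iter=500)
L = model.fit_transform(Y)        # non-negative loadings (many entries driven to 0)
F = model.components_             # non-negative factors

print(L.min() >= 0, F.min() >= 0)
print(np.mean(L == 0))            # fraction of exact zeros in the recovered loadings
```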

Page 48

More than two datasets?

- Canonical correlation. In the base case, defined for 2 datasets.
  - One of the dimensions is “aligned”.
  - Same set of biological samples but different assays. Aligned dimension: samples.
  - Same assay (gene expression) but different sets of samples. Aligned dimension: genes.
- Tensor factorization. Any number of datasets.
  - Considering individual datasets, two of the dimensions are “aligned”.
  - Example: different genomic assays in multiple cell-types.

Page 49

Canonical Correlation Analysis (CCA)

Basic idea: given datasets X_{p×n} and Y_{p×m}, find a linear combination of the columns of X (weights a) and a linear combination of the columns of Y (weights b) such that the correlation between the two combinations is maximized:

(a′, b′) = argmax_{a,b} corr(a^T X, b^T Y)    (14)

With no additional constraints, CCA has a closed-form solution in terms of eigenvectors of X^T X, X^T Y, and Y^T Y.

Integrating single-cell transcriptomic data across different conditions, technologies, and species
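
A small illustration with scikit-learn's CCA on toy paired data (an illustrative sketch; the single-cell application in the cited paper involves much more than this):

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
# Two "views" of the same samples that share one latent signal.
shared = rng.standard_normal(200)
X = np.column_stack([shared + 0.5 * rng.standard_normal(200) for _ in range(10)])
Y = np.column_stack([shared + 0.5 * rng.standard_normal(200) for _ in range(8)])

cca = CCA(n_components=2)
Xc, Yc = cca.fit_transform(X, Y)          # canonical variates for each view

# The first pair of canonical variates should be highly correlated.
print(np.corrcoef(Xc[:, 0], Yc[:, 0])[0, 1])
```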

Page 50

CCA for single cell

Page 51

ENCODE dataset

A large collection of molecular profiles of different cell-types. The data is (cell-type) × (molecular assay) × (genomic position). Not all assays are available for all cell-types.

Page 52

Tensor factorization

This can be represented as a 3-dimensional tensor with many missing values.

Deep Tensor Factorization for the Imputation of Thousands of Missing Epigenetics Experiments
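
For intuition, here is a rough CP (rank-R) tensor factorization by alternating least squares on a small dense toy tensor. This is a naive sketch; the cited papers use far more elaborate, missing-value-aware models:

```python
import numpy as np

def unfold(T, mode):
    """Matricize a 3-way tensor along the given mode (C-order columns)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(U, V):
    """Column-wise Kronecker product: rows indexed by (row of U, row of V)."""
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, U.shape[1])

def cp_als(T, rank, n_iter=200, seed=0):
    """Naive CP decomposition T[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r]."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((dim, rank)) for dim in T.shape)
    for _ in range(n_iter):
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# Toy tensor: (cell-type) x (assay) x (position) with true rank 3.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.standard_normal((10, 3)), rng.standard_normal((6, 3)), rng.standard_normal((50, 3))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)

A, B, C = cp_als(T, rank=3)
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))   # relative reconstruction error (should be small)
```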

Page 53

Non-linear factorizations

We can think of matrix factorization as a prediction problem. If our model is

Y_{f×s} = L_{f×k} F_{k×s} + E    (15)

then elements of Y can be predicted as linear combinations of F, with the coefficients given by L. Now we can replace the linear function (multiplication by L) by some non-linear function.

Why would we want to do this?

Page 54

Comparing linear and non-linear models

Next week: We will learn about constrained versions of matrix factorization.

PREDICTD: PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition. Nature Communications, 2018.

Page 55: Scalable Machine Learninggobie.csb.pitt.edu/SML/MatrixFactorization.pdf · Scalable Machine Learning Matrix factorization. ... I Representation learning. Recommender systems There