Sparse PCA via Bipartite Matchings - GitHub...


Transcript of Sparse PCA via Bipartite Matchings - GitHub...

Page 1: Sparse PCA via Bipartite Matchings - GitHub Pages
megasthenis.github.io/repository/NIPS2015-SPCAviaBipartiteMatchings-Poster.pdf
Sparse PCA via Bipartite Matchings
Megasthenis Asteris,

Theorem II: Algo Guarantees (Full rank)

Input: i) n × d input data matrix S (or covariance A = 1/n · S^T S), ii) k: # of components, iii) s: # nnz entries/component, iv) accuracy ε ∈ (0, 1), v) r: rank of approximation.

Output: X_{(r)} ∈ X_k such that

    \mathrm{Tr}(X_{(r)}^T A X_{(r)}) \ge (1 - \epsilon) \cdot \mathrm{OPT} - 2 \cdot k \cdot \|A - \tilde{A}\|_2,

in time T_SKETCH(r) + T_SVD(r) + O((4/ε)^{r·k} · d · (s·k)^2), where Ã denotes the rank-r sketch of A. The extra error depends on the quality of the sketch; the extra time is spent computing it.

Sparse PCA via Bipartite Matchings
Megasthenis Asteris, Dimitris Papailiopoulos, Anastasios Kyrillidis, Alex Dimakis

[ Multiple Sparse Components ]

Given a covariance matrix A, find the direction of maximum variance that is a linear combination of only a few variables (a sparse vector; NP-hard):

[Bar charts: (left) cumulative explained variance over components 1-6 for k = 6 components, s = 10 nnz/component, comparing TPower, EM-SPCA, SpanSPCA, and SPCABiPart (+20.99% for SPCABiPart); (right) total cumulative explained variance vs. number of target components 2-8, s = 10 nnz/component, with SPCABiPart improvements of +10.01%, +16.22%, +17.94%, +19.10%, +20.99%, +23.57%, +24.51%.]

Solution I: Deflation vs. Solution II: Joint Optimization, illustrated on the matrix

    A = \begin{bmatrix} 1 & 0 & 0 & \epsilon \\ 0 & \delta & 0 & 0 \\ 0 & 0 & \delta & 0 \\ \epsilon & 0 & 0 & 1 \end{bmatrix}

(the poster repeats the matrix with the entries selected by each solution highlighted).

[ One approach: Deflation ]

[ Sparse PCA ]

    x^\star = \arg\max_{x \in \mathcal{X}} x^T A x, \quad
    \mathcal{X} = \{ x \in \mathbb{R}^d : \|x\|_2 = 1, \|x\|_0 = s \}

Find multiple sparse components with disjoint support sets:

Example: NY Times text corpus
- Find 8 components, each 10-sparse.
- Sparse disjoint components are interpreted as distinct topics.

[ SPCA on a Low Dim Sketch ]

    \mathcal{X}_k = \{ X \in \mathbb{R}^{d \times k} : \|X_j\|_2 = 1, \|X_j\|_0 = s \ \forall j, \ \mathrm{supp}(X_i) \cap \mathrm{supp}(X_j) = \emptyset \ \forall i \neq j \}

    X^\star = \arg\max_{X \in \mathcal{X}_k} \mathrm{Tr}(X^T A X)    (MultiSPCA)

Pipeline: sketch the n × d data matrix S down to an r × d matrix S̃ (or, equivalently, compute a rank-r approximation Ã of A = 1/n · S^T S, e.g., via exact or approximate SVD), then run the SPCA algorithm on the sketch to obtain X_{(r)}. Two levels of approximation: the extra error depends on the quality of the sketch; the extra time is spent computing the sketch.
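The sketching step can be illustrated with a truncated SVD. A minimal sketch, assuming the function name `rank_r_sketch` and the synthetic data (neither is from the poster):

```python
import numpy as np

def rank_r_sketch(S, r):
    """Rank-r sketch of the n x d data matrix S via truncated SVD
    (one option; the poster allows exact or approximate SVD)."""
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    return (U[:, :r] * sing[:r]) @ Vt[:r]

# Synthetic data that is exactly rank 3, so the sketch is lossless here;
# on real data the sketch incurs error proportional to ||A - A_tilde||_2.
rng = np.random.default_rng(0)
n, d, r = 50, 8, 3
S = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))
A = S.T @ S / n
S_r = rank_r_sketch(S, r)
A_tilde = S_r.T @ S_r / n
print(np.allclose(A, A_tilde))  # True, since S is exactly rank r
```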

[ In Practice ]
- Taking too long? Run our algorithm and stop it at any time.
  → Ignore the theoretical guarantees.
  → It still finds solutions with higher explained variance than deflation-based methods.

Example: Leukemia dataset
- # samples n = 72, dimension d = 12582 (probe sets).
- Compare to deflation using TPower, EM-SPCA, and SpanSPCA for the single-component SPCA problem.

Problem: Given a 4 × 4 PSD matrix A, find two 2-sparse components x_1, x_2 with disjoint supports that maximize x_1^T A x_1 + x_2^T A x_2.

Joint optimization (supports {1, 2} and {3, 4}):

    \lambda_{\max}\begin{pmatrix} 1 & 0 \\ 0 & \delta \end{pmatrix} + \lambda_{\max}\begin{pmatrix} \delta & 0 \\ 0 & 1 \end{pmatrix} = 1 + 1 = 2

Deflation (supports {1, 4} and {2, 3}):

    \lambda_{\max}\begin{pmatrix} 1 & \epsilon \\ \epsilon & 1 \end{pmatrix} + \lambda_{\max}\begin{pmatrix} \delta & 0 \\ 0 & \delta \end{pmatrix} = 1 + \epsilon + \delta \ll 2

Compute components one-by-one:
- Compute one sparse PC.
- Remove used variables from the dataset.
- Repeat.
Simple, but suboptimal.
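The gap between deflation and joint optimization can be checked numerically. A minimal sketch, assuming the values ε = δ = 0.1 and the helper name `best_2sparse_value` (both illustrative):

```python
import numpy as np

eps, delta = 0.1, 0.1  # assumed small values for the poster's eps, delta
A = np.array([[1.0,  0.0,   0.0,   eps],
              [0.0,  delta, 0.0,   0.0],
              [0.0,  0.0,   delta, 0.0],
              [eps,  0.0,   0.0,   1.0]])

def best_2sparse_value(A, support):
    """Max of x^T A x over unit vectors supported on `support`:
    the top eigenvalue of the corresponding principal submatrix."""
    sub = A[np.ix_(support, support)]
    return np.linalg.eigvalsh(sub).max()

# Deflation: the first sparse PC grabs the {0, 3} block (value 1 + eps),
# leaving only {1, 2} (value delta) for the second component.
deflation = best_2sparse_value(A, [0, 3]) + best_2sparse_value(A, [1, 2])
# Joint optimization: supports {0, 1} and {2, 3} achieve 1 + 1 = 2.
joint = best_2sparse_value(A, [0, 1]) + best_2sparse_value(A, [2, 3])
print(deflation, joint)  # deflation = 1 + eps + delta = 1.2, joint = 2.0
```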

Theorem I: Algo Guarantees (Low rank)

Input: i) d × d rank-r PSD matrix A, ii) k: desired # of components, iii) s: # nnz entries/component, iv) accuracy ε ∈ (0, 1).

Output: X̂ ∈ X_k such that

    \mathrm{Tr}(\hat{X}^T A \hat{X}) \ge (1 - \epsilon) \cdot \mathrm{OPT},

in time T_SVD(r) + O((4/ε)^{r·k} · d · (s·k)^2).

Observation I: If we knew the support sets I_1, …, I_k, we could determine the optimal value based on Cauchy-Schwarz (the supports are unknown; the task is to find them):

    \sum_{j=1}^{k} \langle \hat{X}_j, W_j \rangle^2 = \sum_{j=1}^{k} \sum_{i \in I_j} W_{ij}^2.
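The identity above can be verified numerically: with X̂_j proportional to W_j restricted to I_j, the inner product squared equals the sum of squared W entries on the support. A minimal sketch with illustrative random W and supports:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 2))
supports = [[0, 1], [2, 3]]          # assumed disjoint supports I_1, I_2

lhs = 0.0
for j, I in enumerate(supports):
    x = np.zeros(6)
    x[I] = W[I, j]                   # X_j proportional to W_j on I_j ...
    x /= np.linalg.norm(x)           # ... normalized to unit length
    lhs += np.dot(x, W[:, j]) ** 2
rhs = sum((W[I, j] ** 2).sum() for j, I in enumerate(supports))
print(np.isclose(lhs, rhs))  # True
```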

[Figure: complete bipartite graph G. Left: k groups U_1, …, U_k of s vertices each (u_1^{(j)}, …, u_s^{(j)}); group j represents the support of the j-th column. Right: d vertices v_1, …, v_d representing the d dimensions. The edge between any vertex of group j and v_i has weight W_{ij}^2 (edge weights based on the input).]

Consider the complete bipartite graph G on k · s + d vertices.

Maximum Weight Matching on G:
- Each vertex on the left is mapped to a vertex on the right → indices are assigned to each "support set".
- Each right vertex is used at most once → support sets are disjoint.
- Maximum weight = maximum objective in (✳).
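The matching step can be solved with an off-the-shelf assignment solver. A minimal sketch using SciPy, assuming the helper name `supports_via_matching` and the trick of giving each component s left-hand "slots" (illustrative, not the poster's implementation):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def supports_via_matching(W, s):
    """Recover disjoint supports I_1..I_k of size s by maximum-weight
    bipartite matching: k*s left vertices (s slots per component)
    against d right vertices (the d dimensions)."""
    d, k = W.shape
    # Edge weight between any slot of component j and dimension i is W[i, j]^2.
    weights = np.repeat((W ** 2).T, s, axis=0)       # shape (k*s, d)
    rows, cols = linear_sum_assignment(weights, maximize=True)
    supports = [[] for _ in range(k)]
    for slot, i in zip(rows, cols):
        supports[slot // s].append(i)                # slot belongs to component slot // s
    return supports

W = np.array([[3.0, 0.0], [2.0, 0.0], [0.0, 3.0],
              [0.0, 2.0], [0.5, 0.5], [0.0, 0.0]])
# Each component gets its two heaviest dimensions: {0, 1} and {2, 3}.
print(supports_via_matching(W, 2))
```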

[ Subroutine ]

    \hat{X} = \arg\max_{X \in \mathcal{X}_k} \sum_{j=1}^{k} \langle X_j, W_j \rangle^2

I_1, …, I_k: disjoint support sets of the k components (columns of X̂).

In reality, data is not low rank. However, it may be close to low rank:
→ the spectrum of A may be sharply decaying,
→ so A is well approximated by a low-rank matrix.

[ Our Algorithm ]

Observation I: The matrix A is PSD and can be decomposed as A = V V^T. Then, for any c ∈ R^r with ‖c‖_2 = 1,

    x^T A x = \|V^T x\|_2^2 \ge \langle V^T x, c \rangle^2,

and in turn the variational characterization

    x^T A x = \max_{c \in S_2^{r-1}} \langle x, V c \rangle^2.
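The variational characterization can be checked numerically: equality in Cauchy-Schwarz holds at c = V^T x / ‖V^T x‖. A minimal sketch with illustrative random V and x:

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.standard_normal((5, 3))   # A = V V^T is PSD with rank r = 3
A = V @ V.T
x = rng.standard_normal(5)

# <V^T x, c>^2 <= ||V^T x||^2 for unit c, with equality at
# c = V^T x / ||V^T x||, recovering x^T A x = max_c <x, V c>^2.
c = V.T @ x
c /= np.linalg.norm(c)
print(np.isclose(x @ A @ x, (x @ (V @ c)) ** 2))  # True
```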

Observation II: For multiple components,

    \max_{X \in \mathcal{X}_k} \mathrm{Tr}(X^T A X) = \max_{X \in \mathcal{X}_k} \ \max_{C : C_j \in S_2^{r-1} \ \forall j} \ \sum_{j=1}^{k} \langle X_j, V C_j \rangle^2.

SPCA as a "double" maximization. Fix the value of the r × k variable C and let W = V C. For fixed W, solve

    \hat{X} = \arg\max_{X \in \mathcal{X}_k} \sum_{j=1}^{k} \langle X_j, W_j \rangle^2

(this subproblem can be solved; how, later).
→ SPCA reduces to determining the optimal C.
→ C is a low-dimensional r × k variable (smaller than X): sample to find the best.

[ Algorithm ]

Think of the d × d matrix A as having rank r; for now, r < d.

Input: d × d rank-r PSD matrix A.
- Initialize an empty collection 𝒳.
- Compute V = Chol(A) (d × r).
- For i = 1 : O((4/ε)^{r·k}):
    - Sample C (an r × k variable; each column is unit-norm).
    - Compute W = V C.
    - Solve X̂ = \arg\max_{X \in \mathcal{X}_k} \sum_{j=1}^{k} \langle X_j, W_j \rangle^2.
    - Add X̂ to the collection 𝒳.
- Output: best solution in the collection 𝒳.

Still much better than naive brute force.
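Putting the pieces together, the sampling loop might look as follows. A minimal sketch under stated assumptions: the sample count is a fixed stand-in for the poster's O((4/ε)^{r·k}), an eigendecomposition replaces Chol(A), and the function name is illustrative; the matching subroutine is inlined.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def spca_bipart(A, k, s, r, n_samples=200, seed=0):
    """Sketch of the sampling loop; n_samples stands in for the
    poster's O((4/eps)^(r*k)) candidate count."""
    rng = np.random.default_rng(seed)
    vals, vecs = np.linalg.eigh(A)              # A = V V^T via eigendecomposition
    V = vecs[:, -r:] * np.sqrt(np.maximum(vals[-r:], 0.0))
    d = A.shape[0]
    best_X, best_val = None, -np.inf
    for _ in range(n_samples):
        C = rng.standard_normal((r, k))
        C /= np.linalg.norm(C, axis=0)          # unit-norm columns
        W = V @ C                               # d x k
        # Subroutine: supports via maximum-weight matching ...
        weights = np.repeat((W ** 2).T, s, axis=0)
        rows, cols = linear_sum_assignment(weights, maximize=True)
        X = np.zeros((d, k))
        for slot, i in zip(rows, cols):
            X[i, slot // s] = W[i, slot // s]   # ... then X_j ∝ W_j on I_j
        norms = np.linalg.norm(X, axis=0)
        X /= np.where(norms > 0, norms, 1.0)    # unit-norm columns
        val = np.trace(X.T @ A @ X)
        if val > best_val:
            best_val, best_X = val, X
    return best_X, best_val
```

On the 4 × 4 toy matrix from the deflation example, the returned components have disjoint 2-sparse supports and an objective value bounded by the optimum, 2.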

[ Summary ]
- First algorithm for multi-component SPCA with disjoint supports; operates by recasting MultiSPCA into multiple instances of the bipartite maximum weight matching problem.
- Provable approximation guarantees.
- Complexity: low-order polynomial in the ambient dimension d, but exponential in the intrinsic dimension r (it separates ambient and intrinsic dimension).

[ Algorithm ]

Input: d × k matrix W; s: # nnz entries per column of X̂.
1. Construct the bipartite graph G as above.
2. Compute a maximum weight matching to determine the supports I_1, …, I_k.
3. Compute each column of X̂ for the given support based on Cauchy-Schwarz.
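Step 3 follows directly from Cauchy-Schwarz: once I_j is fixed, ⟨X_j, W_j⟩ is maximized over unit vectors supported on I_j by taking X_j proportional to W_j restricted to I_j. A minimal sketch (the helper name is illustrative):

```python
import numpy as np

def columns_from_supports(W, supports):
    """Given supports I_1..I_k, build X_hat column by column:
    X_j is W_j restricted to I_j, normalized to unit length."""
    d, k = W.shape
    X = np.zeros((d, k))
    for j, I in enumerate(supports):
        X[I, j] = W[I, j]
        X[:, j] /= np.linalg.norm(X[:, j])
    return X

W = np.array([[3.0, 0.0], [4.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
X = columns_from_supports(W, [[0, 1], [2, 3]])
# First column is [3, 4] / 5 = [0.6, 0.8] on its support, zero elsewhere.
print(X[:, 0])
```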