
Transcript
Page 1: Sparse PCA via Bipartite Matchings (megasthenis.github.io/repository/NIPS2015-SPCAviaBipartiteMatchings-Poster.pdf)

Theorem II: Algo Guarantees (Full rank)

Input: i) n × d input data matrix S (or covariance A = 1/n · SᵀS), ii) k: # of components, iii) s: # nnz entries/component, iv) accuracy ε ∈ (0, 1), v) r: rank of approximation.

Output: X_(r) ∈ X_k such that

    Tr(X_(r)ᵀ A X_(r)) ≥ (1 − ε) · OPT − 2 · k · ‖A − A_(r)‖₂,

in time T_SKETCH(r) + T_SVD(r) + O((4/ε)^(r·k) · d · (s·k)²).

Sparse PCA via Bipartite Matchings. Megasthenis Asteris, Dimitris Papailiopoulos, Anastasios Kyrillidis, Alex Dimakis

[ Multiple Sparse Components ]

Given a covariance matrix A, find the direction of maximum variance as a linear combination of only a few variables:

[Figure: cumulative explained variance per component (1 to 6), y-axis 0 to 3000; k = 6 components, s = 10 nnz/component; methods: TPower, EM-SPCA, SpanSPCA, SPCABiPart; SPCABiPart gains +20.99%.]

[Figure: total cumulative explained variance vs. number of target components (2 to 8), y-axis 0 to 3500; s = 10 nnz/component; methods: SPCABiPart, SpanSPCA, EM-SPCA, TPower; SPCABiPart gains range from +10.01% at k = 2 through +16.22%, +17.94%, +19.10%, +20.99%, +23.57% up to +24.51% at k = 8.]

A = [ 1   0   0   ε ]
    [ 0   γ   0   0 ]
    [ 0   0   γ   0 ]
    [ ε   0   0   1 ]

Solution I: Deflation

Solution II: Joint Optimization

[ One approach: Deflation ]

[ Sparse PCA ]

x⋆ = arg max_{x ∈ X} xᵀAx,    X = { x ∈ ℝᵈ : ‖x‖₂ = 1, ‖x‖₀ = s }
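To make the constraint set X concrete, here is a minimal brute-force sketch of the single-component objective (not the poster's algorithm, just a baseline): enumerate every s-element support, take the top eigenvector of the corresponding principal submatrix. Function names are mine; assumes NumPy. Feasible only for tiny d.

```python
import itertools
import numpy as np

def sparse_pca_bruteforce(A, s):
    """Solve max x^T A x over unit vectors x with s nonzeros by
    enumerating all s-element supports (exponential in d; toy use only)."""
    d = A.shape[0]
    best_val, best_x = -np.inf, None
    for I in itertools.combinations(range(d), s):
        I = list(I)
        w, U = np.linalg.eigh(A[np.ix_(I, I)])   # eigenvalues ascending
        if w[-1] > best_val:
            best_val = w[-1]
            best_x = np.zeros(d)
            best_x[I] = U[:, -1]                 # top eigenvector on support I
    return best_val, best_x
```

For example, on A = [[1, 0.5], [0.5, 1]] with s = 2 this returns the top eigenvalue 1.5.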

Find multiple sparse components with disjoint support sets:

Example: NY Times text corpus
- Find 8 components, each 10-sparse.
- Sparse disjoint components are interpreted as distinct topics.

[ SPCA on a Low-Dim Sketch ]

X_k = { X ∈ ℝ^(d×k) : ‖X_j‖₂ = 1, ‖X_j‖₀ = s ∀ j;  supp(X_i) ∩ supp(X_j) = ∅ ∀ i ≠ j }

X⋆ = arg max_{X ∈ X_k} Tr(XᵀAX)    (MultiSPCA)

[Diagram: S → SKETCH → S̃; A = 1/n · SᵀS; the SPCA Algo is run on the sketched covariance and outputs X_(r).]

[ In Practice ]
- Taking too long? Run our algorithm and stop it at any time.
  → Ignore the theoretical guarantees.
  → It still finds solutions with higher explained variance compared to deflation-based methods.

Example: Leukemia dataset
- # samples n = 72, dimension d = 12582 (probe sets).
- Compare to deflation using TPower, EM-SPCA and SpanSPCA for the single-component SPCA problem.

Solution I (deflation), supports {1, 4} and {2, 3}:

    λ_max([1 ε; ε 1]) + λ_max([γ 0; 0 γ]) = (1 + ε) + γ ≪ 2

Solution II (joint optimization), supports {1, 2} and {3, 4}:

    λ_max([1 0; 0 γ]) + λ_max([γ 0; 0 1]) = 1 + 1 = 2

Problem: Given a 4 × 4 PSD matrix A, find two 2-sparse components x₁, x₂ with disjoint supports that maximize x₁ᵀAx₁ + x₂ᵀAx₂.
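The gap in this toy example can be checked numerically. A small sketch with illustrative values ε = 0.1, γ = 0.8 (the exact symbols and magnitudes are garbled in this transcript, so these are reconstructions, not the poster's numbers):

```python
import numpy as np

eps, gam = 0.1, 0.8                       # illustrative: small eps, gam < 1
A = np.array([[1.0, 0.0, 0.0, eps],
              [0.0, gam, 0.0, 0.0],
              [0.0, 0.0, gam, 0.0],
              [eps, 0.0, 0.0, 1.0]])

def lam_max(M):
    return np.linalg.eigvalsh(M)[-1]      # largest eigenvalue

# Deflation greedily grabs support {1, 4} first (value 1 + eps),
# leaving {2, 3} with value gam: total 1 + eps + gam.
deflation = lam_max(A[np.ix_([0, 3], [0, 3])]) + lam_max(A[np.ix_([1, 2], [1, 2])])

# Joint optimization uses supports {1, 2} and {3, 4}: total 1 + 1 = 2.
joint = lam_max(A[np.ix_([0, 1], [0, 1])]) + lam_max(A[np.ix_([2, 3], [2, 3])])

print(deflation)  # ≈ 1.9
print(joint)      # ≈ 2.0
```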

Compute components one by one:
- Compute one sparse PC.
- Remove the used variables from the dataset.
- Repeat.
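A minimal sketch of this deflation scheme, using exhaustive support search as a toy stand-in for the single-component solvers (TPower, EM-SPCA, SpanSPCA) that the poster compares against; function names are mine, not the poster's. Assumes NumPy.

```python
import itertools
import numpy as np

def best_sparse_pc(A, s, allowed):
    """Best s-sparse unit vector supported inside 'allowed' coordinates,
    found by exhaustive search (toy stand-in for a real SPCA solver)."""
    best = (-np.inf, None, None)
    for I in itertools.combinations(sorted(allowed), s):
        w, U = np.linalg.eigh(A[np.ix_(I, I)])
        if w[-1] > best[0]:
            best = (w[-1], I, U[:, -1])
    return best

def deflation_spca(A, k, s):
    """Compute one sparse PC, remove its variables, repeat k times."""
    d = A.shape[0]
    allowed = set(range(d))
    total, components = 0.0, []
    for _ in range(k):
        val, I, u = best_sparse_pc(A, s, allowed)
        x = np.zeros(d)
        x[list(I)] = u
        components.append(x)
        total += val
        allowed -= set(I)        # used variables leave the dataset
    return total, components
```

On the 4 × 4 toy matrix above, `deflation_spca(A, 2, 2)` returns roughly 1 + ε + γ, short of the jointly optimal value 2.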


Theorem I: Algo Guarantees (Low rank)

Input: i) d × d rank-r PSD matrix A, ii) k: desired # of components, iii) s: # nnz entries/component, iv) accuracy ε ∈ (0, 1).

Output: X ∈ X_k such that

    Tr(XᵀAX) ≥ (1 − ε) · OPT,

in time T_SVD(r) + O((4/ε)^(r·k) · d · (s·k)²).

Observation I: If we knew the support sets I₁, …, I_k, we could determine the optimal value based on Cauchy-Schwarz:

    Σ_{j=1}^{k} ⟨X̂_j, W_j⟩² = Σ_{j=1}^{k} Σ_{i ∈ I_j} W_ij².
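To make Observation I concrete, a small sketch (names mine): given disjoint supports, the optimal column X̂_j is W_j restricted to I_j and normalized to unit length, contributing Σ_{i ∈ I_j} W_ij² to the objective.

```python
import numpy as np

def optimal_given_supports(W, supports):
    """Given disjoint supports I_1, ..., I_k, build the Cauchy-Schwarz-
    optimal X: column j is W_j restricted to I_j, scaled to unit norm."""
    d, k = W.shape
    X = np.zeros((d, k))
    value = 0.0
    for j, I in enumerate(supports):
        I = list(I)
        w = W[I, j]
        X[I, j] = w / np.linalg.norm(w)   # unit-norm, supported on I_j
        value += float(np.sum(w ** 2))    # = <X_j, W_j>^2 at the optimum
    return X, value

W = np.array([[1.0, 0.0],
              [2.0, 0.0],
              [0.0, 3.0]])
X, value = optimal_given_supports(W, [(0, 1), (2,)])
print(value)                              # 14.0 = (1^2 + 2^2) + 3^2
```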

[Figure: complete bipartite graph G. Left vertices u₁⁽¹⁾, …, u_s⁽¹⁾, …, u₁⁽ᵏ⁾, …, u_s⁽ᵏ⁾, grouped into U₁, …, U_k; right vertices v₁, …, v_d; every edge from a vertex of group U_j to v_i carries weight W_ij².]

Consider the complete bipartite graph G on k · s + d vertices.

Maximum Weight Matching on G:
- Each vertex on the left is mapped to a vertex on the right. → s indices are assigned to each "support set".
- Each right vertex is used at most once. → Support sets are disjoint.
- Maximum weight = maximum objective in (✳).

[ Subroutine ]

X̂ = arg max_{X ∈ X_k} Σ_{j=1}^{k} ⟨X_j, W_j⟩²    (✳)

I₁, …, I_k: disjoint support sets of the k components (columns of X̂).

In reality, data is not low rank; however, it may be close to low rank:
→ The spectrum of A may be sharply decaying.
→ A is well approximated by a low-rank matrix.

[ Our Algorithm ]

A = VVᵀ

xᵀAx = ‖Vᵀx‖₂² ≥ ⟨Vᵀx, c⟩²    ∀ c ∈ ℝʳ : ‖c‖₂ = 1

xᵀAx = max_{c ∈ S₂^(r−1)} ⟨x, Vc⟩²
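A quick numerical check of this characterization (a sketch, assumes NumPy): the maximizing c is Vᵀx normalized, which recovers ⟨x, Vc⟩² = ‖Vᵀx‖₂² = xᵀAx.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 6, 2
V = rng.standard_normal((d, r))
A = V @ V.T                                  # rank-r PSD matrix
x = rng.standard_normal(d)
x /= np.linalg.norm(x)

c = V.T @ x
c /= np.linalg.norm(c)                       # the maximizing unit vector

lhs = x @ A @ x
rhs = (x @ (V @ c)) ** 2                     # <x, Vc>^2 at the optimum
print(np.isclose(lhs, rhs))                  # True
```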

Observation I: The matrix A is PSD and can be decomposed into A = VVᵀ.

For multiple components…

Observation II

In turn, a variational characterization is the following:

    max_{X ∈ X_k} Tr(XᵀAX) = max_{X ∈ X_k} max_{C : C_j ∈ S₂^(r−1) ∀ j} Σ_{j=1}^{k} ⟨X_j, V C_j⟩²,    where A = VVᵀ.

Fix the value of the r × k variable C, and let W = VC.
→ SPCA reduces to determining the optimal C.
→ C is a low-dimensional variable: sample it to find the best.

X̂ = arg max_{X ∈ X_k} Σ_{j=1}^{k} ⟨X_j, W_j⟩²

Input: rank-r d × d PSD matrix A.
- Initialize empty collection S.
- Compute V = Chol(A) (d × r).
- For i = 1 : O((4/ε)^(r·k)):
    - Sample C.
    - Compute W = VC.
    - Solve X̂ = arg max_{X ∈ X_k} Σ_{j=1}^{k} ⟨X_j, W_j⟩².
    - Add X̂ to the collection S.
Output: Best solution in collection S.
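A compact runnable sketch of this loop (assumes NumPy and SciPy). SciPy's `linear_sum_assignment` plays the role of the maximum-weight-matching subroutine, and C is sampled uniformly on the sphere rather than from the ε-net of the theorem, so the formal guarantee does not carry over to this sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_subroutine(W, s):
    """argmax over X in X_k of sum_j <X_j, W_j>^2 via bipartite matching:
    k*s left 'slots' (s per component) vs d right coordinates."""
    d, k = W.shape
    weights = np.repeat((W ** 2).T, s, axis=0)    # slot (j,t) <-> coord i: W[i,j]^2
    rows, cols = linear_sum_assignment(weights, maximize=True)
    X = np.zeros((d, k))
    for slot, i in zip(rows, cols):
        X[i, slot // s] = W[i, slot // s]
    norms = np.linalg.norm(X, axis=0)
    return X / np.where(norms > 0, norms, 1.0)    # unit-norm columns

def spca_low_rank(V, k, s, n_samples=500, seed=0):
    """Sample C, set W = V C, solve the subroutine, keep the best X."""
    rng = np.random.default_rng(seed)
    A = V @ V.T
    best_val, best_X = -np.inf, None
    for _ in range(n_samples):
        C = rng.standard_normal((V.shape[1], k))
        C /= np.linalg.norm(C, axis=0)            # columns on the unit sphere
        X = matching_subroutine(V @ C, s)
        val = float(np.trace(X.T @ A @ X))
        if val > best_val:
            best_val, best_X = val, X
    return best_val, best_X
```

For instance, with A = VVᵀ = diag(1, 3, 2, 0.5), k = 2 and s = 1, the best pair of disjoint 1-sparse components captures 3 + 2 = 5.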

[ Algorithm ]

Think of the d × d matrix A as having rank r. For now, r < d.

[ Summary ]
- First algorithm for multi-component SPCA with disjoint supports; operates by recasting MultiSPCA into multiple instances of the bipartite maximum-weight matching problem.
- Provable approximation guarantees.
- Complexity: low-order polynomial in the ambient dimension d, but exponential in the intrinsic dimension r.

[ Algorithm ]
Input: d × k matrix W (s: # nnz entries per column of X̂).
1. Construct the bipartite graph G as above.
2. Compute a maximum weight matching to determine the supports I₁, …, I_k.
3. Compute each column of X̂ for the given supports based on Cauchy-Schwarz.
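The three steps can be sketched with SciPy's Hungarian solver standing in for the maximum-weight matching (a sketch, assumes SciPy; s copies of each left group vertex encode "assign s indices per support set"), and the result checked against brute force over disjoint supports:

```python
import itertools
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_supports(W, s):
    """Steps 1-2: maximum weight matching on the bipartite graph whose
    left side has s slot-vertices per component and whose right side has
    one vertex per coordinate, with edge weight W[i, j]^2."""
    d, k = W.shape
    weights = np.repeat((W ** 2).T, s, axis=0)     # (k*s) x d weight matrix
    rows, cols = linear_sum_assignment(weights, maximize=True)
    supports = [[] for _ in range(k)]
    for slot, i in zip(rows, cols):
        supports[slot // s].append(i)
    return supports

rng = np.random.default_rng(1)
W = rng.standard_normal((6, 2))
s = 2
supports = matching_supports(W, s)

# Step 3 / Cauchy-Schwarz: the objective for given supports is
# sum_j sum_{i in I_j} W[i, j]^2.
value = sum(float(np.sum(W[I, j] ** 2)) for j, I in enumerate(supports))

# Sanity check: brute force over all ordered pairs of disjoint 2-element supports.
best = max(
    sum(float(np.sum(W[list(I), j] ** 2)) for j, I in enumerate(pair))
    for pair in itertools.permutations(itertools.combinations(range(6), s), 2)
    if set(pair[0]).isdisjoint(pair[1])
)
print(np.isclose(value, best))                     # True
```

The matching is exact here: the assignment problem optimizes the same objective as the exhaustive search over disjoint supports, which is what makes the subroutine polynomial-time.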