Rounding Sum of Squares Relaxations


Transcript of Rounding Sum of Squares Relaxations

Page 1: Rounding Sum of Squares Relaxations

Rounding Sum of Squares Relaxations
Boaz Barak – Microsoft Research

Joint work with Jonathan Kelner (MIT) and David Steurer (Cornell)

Workshop on Semidefinite Programming and Graph Algorithms, February 10-14, 2014

Page 2: Rounding Sum of Squares Relaxations

This talk is about:

• Semidefinite programming, the SOS/Positivstellensatz method
• Proof complexity
• The Unique Games Conjecture
• Graph partitioning, small-set expansion
• Machine learning
• Cryptography… (in spirit)

Page 3: Rounding Sum of Squares Relaxations

Sum-of-Squares (SOS) Algorithm [Shor '87, Parrilo '00, Nesterov '00, Lasserre '01]

Motivation: sometimes a polynomial can have exponentially many local minima…

… but there can still be a short proof of a bound on its global optimum…

… and this proof can be found efficiently via semidefinite programming (SDP).

Pages 4-5: Rounding Sum of Squares Relaxations

SOS Algorithm: for low degree $d$ we consider the program

$$\max_{x \in \mathbb{R}^n} P(x) \quad \text{s.t.} \quad P_1(x) = \dots = P_k(x) = 0$$

SOS proof that $P(x) \le \nu$: polynomials $Q_1, \dots, Q_k$ and SOS polynomials $S, S'$ such that

$$(\nu - P)\,S = \sum_i P_i Q_i + S' + 1$$

Degree of the proof: the maximum degree of the polynomials appearing in this identity. [Grigoriev-Vorobjov '99]

Theorem [Shor '87, Parrilo '00, Nesterov '00, Lasserre '01]:
1) A proof of degree $d$ can be found in time $n^{O(d)}$.
2) Can find in time $n^{O(d)}$ the minimum $\nu$ such that there is a degree-$d$ proof that $P(x) \le \nu$.

Positivstellensatz: all true bounds have an SOS proof. [Artin '27, Krivine '64, Stengle '74]

Upshot: we can optimize in time $n^{O(d)}$ over programs that admit degree-$d$ proofs.

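To make the theorem concrete, here is a minimal sketch (mine, not from the talk) of how a low-degree SOS bound is computed as an SDP, for the illustrative single-variable polynomial p(x) = x⁴ − 2x², using the cvxpy library: we search for the largest ν such that p − ν is a sum of squares, i.e. equals m(x)ᵀQm(x) for a PSD matrix Q over the monomial basis m(x) = (1, x, x²).

```python
# Minimal sketch (illustrative example, not from the talk): certify a lower
# bound on p(x) = x^4 - 2x^2 by finding nu and a PSD Gram matrix Q with
#   p(x) - nu = m(x)^T Q m(x),  m(x) = (1, x, x^2).
# Matching coefficients of 1, x, x^2, x^3, x^4 gives linear constraints on Q.
import cvxpy as cp

Q = cp.Variable((3, 3), symmetric=True)  # Gram matrix over the basis (1, x, x^2)
nu = cp.Variable()                       # candidate lower bound on p

constraints = [
    Q >> 0,                        # Q is PSD, so m^T Q m is a sum of squares
    Q[0, 0] == -nu,                # constant coefficient of p - nu
    2 * Q[0, 1] == 0,              # coefficient of x
    2 * Q[0, 2] + Q[1, 1] == -2,   # coefficient of x^2
    2 * Q[1, 2] == 0,              # coefficient of x^3
    Q[2, 2] == 1,                  # coefficient of x^4
]

cp.Problem(cp.Maximize(nu), constraints).solve()
print(nu.value)  # ~ -1.0 = min p, via the certificate x^4 - 2x^2 + 1 = (x^2 - 1)^2
```

In general the Gram matrix is indexed by the roughly $n^{d/2}$ monomials of degree at most $d/2$, which is where the $n^{O(d)}$ running time in the theorem comes from.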

Pages 6-7: Rounding Sum of Squares Relaxations

Recap: we can optimize in time $n^{O(d)}$ over programs $\max_{x \in \mathbb{R}^n} P(x)$ s.t. $P_1(x) = \dots = P_k(x) = 0$ that admit degree-$d$ SOS proofs.

Can't hope for low-degree proofs always: this captures SAT, CLIQUE, 3COL, MAX-CUT, etc…

But maybe often? Essentially only one (robust) lower bound is known, showing that some instances require proofs of high degree. [Grigoriev '01]

Applications:
• Optimizing polynomials with non-negative coefficients over the sphere.
• Algorithms for the quantum separability problem. [Brandao-Harrow '13]
• Sparse coding: learning dictionaries beyond the $\sqrt{n}$ sparsity barrier.
• Finding sparse vectors in subspaces.
• An approach to refuting the Unique Games Conjecture.

This talk: a general method for analyzing the SOS algorithm. [B-Kelner-Steurer '13]


Pages 8-10: Rounding Sum of Squares Relaxations

Finding an optimal $x$ is hard, so we consider an easier problem: "finding a needle in a needle-stack."

Given many $x$'s (nearly) maximizing $P$, find a single $x^*$ with value close to the maximum. That is, a combiner maps a (multi)set $S$ of solutions $x$ with $P(x) \ge \nu$ and $P_1(x) = \dots = P_k(x) = 0$ to a single $x^*$ with value close to $\nu$.

Non-trivial combiner: depends only on the low-degree marginals of $S$:

$$\{\mathbb{E}_{x \sim S}\, x_{i_1} \cdots x_{i_k}\}_{i_1, \dots, i_k \in [n]}$$

[B-Kelner-Steurer '13]: transform "simple" non-trivial combiners into an algorithm for the original problem.

Idea in a nutshell: simple combiners will output a solution even when fed "fake marginals."


Pseudoexpectations (aka "fake marginals")

Def [Lasserre '01]: a degree-$d$ pseudoexpectation is an operator $\tilde{\mathbb{E}}$ mapping every polynomial $P$ of degree $\le d$ to a number $\tilde{\mathbb{E}} P$, satisfying:
• Normalization: $\tilde{\mathbb{E}}\, 1 = 1$
• Linearity: $\tilde{\mathbb{E}}(aP + bQ) = a\,\tilde{\mathbb{E}} P + b\,\tilde{\mathbb{E}} Q$ for $P, Q$ of degree $\le d$
• Positivity: $\tilde{\mathbb{E}}\, P^2 \ge 0$ for $P$ of degree $\le d/2$

Fundamental fact: a degree-$d$ SOS proof that $P \ge 0$ implies $\tilde{\mathbb{E}} P \ge 0$ for any degree-$d$ pseudoexpectation operator $\tilde{\mathbb{E}}$.

Take-home message:
• A pseudoexpectation "looks like" a real expectation to low-degree polynomials.
• We can efficiently find a pseudoexpectation matching any given polynomial constraints (when one exists).
• Proofs about real random variables can often be "lifted" to pseudoexpectations.
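Concretely, a degree-$d$ pseudoexpectation is specified by finitely many numbers (its values on monomials, the "fake moments"), and positivity is exactly the condition that the associated moment matrix is PSD. A small numpy sketch (mine; single variable, degree 4) of this check:

```python
# Sketch (mine): a degree-4 pseudoexpectation in one variable is a vector of
# "fake moments" m = (E~[1], E~[x], E~[x^2], E~[x^3], E~[x^4]). Positivity of
# E~[q(x)^2] for all q of degree <= 2 is equivalent to PSD-ness of the moment
# matrix indexed by the basis (1, x, x^2).
import numpy as np

def is_pseudoexpectation(m):
    if not np.isclose(m[0], 1.0):              # normalization: E~[1] = 1
        return False
    M = np.array([[m[0], m[1], m[2]],
                  [m[1], m[2], m[3]],
                  [m[2], m[3], m[4]]])         # moment matrix over (1, x, x^2)
    return np.linalg.eigvalsh(M).min() >= -1e-9

# True moments of the uniform distribution on {-1, +1} pass:
print(is_pseudoexpectation([1, 0, 1, 0, 1]))   # True
# E~[x^2] = 1 with E~[x^4] = 0 fails: positivity of E~[(x^2 - 1)^2] forces
# E~[x^4] >= 2*E~[x^2] - 1 = 1.
print(is_pseudoexpectation([1, 0, 1, 0, 0]))   # False
```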


Problem: given a low-degree $P$, maximize $P(x)$ subject to $P_1(x) = \dots = P_k(x) = 0$.

Combining ⇒ Rounding

Page 11: Rounding Sum of Squares Relaxations

[B-Kelner-Steurer '13]: transform "simple" non-trivial combiners into an algorithm for the original problem.

Non-trivial combiner: an algorithm $C$ whose input is $P$ together with (the low-degree moments of) a random variable $X$ over $\mathbb{R}^n$ satisfying the constraints with value at least $\nu$, and whose output is an $x^*$ with value close to $\nu$.

Corollary: in this case we can find $x^*$ efficiently:

• Use the SOS SDP to find a pseudoexpectation matching the input conditions.

• Use $C$ to round this SDP solution into an actual solution $x^*$.

Crucial observation: if the proof that $x^*$ is a good solution is within the SOS framework, then it holds even when $C$ is fed a pseudoexpectation.
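As an example of a proof that "lifts": the Cauchy-Schwarz inequality holds verbatim for pseudoexpectations, because its usual proof is itself a sum-of-squares argument (a standard lemma, written out here for illustration):

```latex
% Pseudoexpectation Cauchy-Schwarz: for polynomials P, Q of degree <= d/2
% and any degree-d pseudoexpectation \tilde{\mathbb{E}},
\[
  \tilde{\mathbb{E}}[PQ]
    \;\le\; \sqrt{\tilde{\mathbb{E}}[P^2]}\,\sqrt{\tilde{\mathbb{E}}[Q^2]} .
\]
% Proof: positivity applied to the square (tP - Q)^2 gives, for every real t,
\[
  0 \;\le\; \tilde{\mathbb{E}}\big[(tP - Q)^2\big]
    \;=\; t^2\,\tilde{\mathbb{E}}[P^2] - 2t\,\tilde{\mathbb{E}}[PQ] + \tilde{\mathbb{E}}[Q^2],
\]
% and a quadratic in t that is nonnegative everywhere has nonpositive
% discriminant: \tilde{\mathbb{E}}[PQ]^2 \le \tilde{\mathbb{E}}[P^2]\,\tilde{\mathbb{E}}[Q^2].
```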

Pages 12-13: Rounding Sum of Squares Relaxations

Example application: Dictionary Learning / Sparse Coding [Olshausen-Field '96]

Goal: given examples of the form $y = \sum_i W_i a_i$, where the coefficient vector $W$ is sparse, recover the set $\{a_1, \dots, a_m\}$ of dictionary vectors: find the "right" representation of the observed data.

LOTS of work: an important primitive in machine learning, vision, neuroscience, …

Previous best (rigorous) results required very sparse coefficient vectors. [Spielman-Wang-Wright '12, Arora-Moitra-Ge '13, Agrawal-Anandkumar-Jain-Netrapalli-Tandon '13]

We show: much denser coefficient vectors suffice* (even in the non-independent, overcomplete case).

* In quasipolynomial time; a smaller sparsity bound suffices in polynomial time.


Pages 14-15: Rounding Sum of Squares Relaxations

Goal: given examples of the form $y = \sum_i W_i a_i$ with sparse $W$, recover the set $\{a_1, \dots, a_m\}$ of dictionary vectors.

Achieve this in 3 steps:
(1) Find a program such that every $x$ maximizing it is close to one of the $\pm a_i$'s.
(2) Give a combining algorithm that turns moments of a distribution over maximizers into a vector close to one of the $\pm a_i$'s.
(3) Show that the arguments in (1) and (2) fall under the SOS framework.

The result generalizes to the overcomplete, non-independent case. For simplicity, assume $m = n$, the $a_i$'s form an orthonormal basis, and the $W_i$'s are i.i.d. random variables that are nonzero with probability $\mu$.

Step 1. Consider the polynomial

$$P(x) = \mathbb{E}\,\langle y, x \rangle^4 = \mathbb{E}\Big(\sum_i W_i \langle a_i, x \rangle\Big)^4$$

(which we can approximate arbitrarily well from examples). Opening the parentheses we get

$$P(x) \le \mu \sum_i \langle a_i, x \rangle^4 + 2\mu^2 \Big(\sum_i \langle a_i, x \rangle^2\Big)^2 = \mu \sum_i \langle a_i, x \rangle^4 + o(\mu)\,\|x\|^4 .$$

Corollary: a unit vector $x$ (nearly) maximizes $P$ only if it is close to some $\pm a_i$. Establishes (1)!

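A small numpy sketch of this simplified setting (the specific coefficient distribution and parameter values are mine): generate samples $y = \sum_i W_i a_i$ and estimate $P(x) = \mathbb{E}\langle y, x\rangle^4$ empirically; dictionary columns nearly maximize the estimate while generic unit vectors do not.

```python
# Sketch of the simplified model and Step 1 (parameter values are mine):
# orthonormal dictionary a_1..a_n, samples y = sum_i W_i a_i with W_i i.i.d.,
# zero w.p. 1 - mu and +-1 w.p. mu/2 each (so E[W_i^4] = mu).
import numpy as np

rng = np.random.default_rng(0)
n, mu, num_samples = 64, 0.05, 100_000

A = np.linalg.qr(rng.standard_normal((n, n)))[0]   # random orthonormal columns
W = rng.choice([-1.0, 0.0, 1.0], size=(num_samples, n), p=[mu/2, 1 - mu, mu/2])
Y = W @ A.T                                        # row j is the sample y_j

def P_hat(x):
    """Empirical estimate of P(x) = E <y, x>^4."""
    return np.mean((Y @ x) ** 4)

print(P_hat(A[:, 0]))   # ~ mu = 0.05: a dictionary column nearly maximizes P
x = rng.standard_normal(n); x /= np.linalg.norm(x)
print(P_hat(x))         # ~ 3*mu/n + 3*mu**2, noticeably smaller than mu
```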

Page 16: Rounding Sum of Squares Relaxations

Step 2. Let $S$ be a distribution over unit vectors such that every $x$ in its support is close to some $\pm a_i$.

Pick a set of random (standard Gaussian) vectors $c$. For each $c$, let $M$ be the reweighted second-moment matrix $M = \mathbb{E}_{x \sim S}\,\langle c, x\rangle^2\, x x^\top$. Our combining algorithm outputs the top eigenvector of $M$. If the weight $\langle c, a_i\rangle^2$ of some index $i$ is sufficiently larger than all the others, then $M$ is (up to scaling) close to $a_i a_i^\top$ and we'll succeed; this happens with noticeable probability over the choice of $c$, so we can repeat with fresh vectors. Establishes (2)!

Step 3. Showing that the arguments in (1) and (2) fall under the SOS framework amounts to slightly tedious but straightforward computations.
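A numpy sketch of this combining step in the toy setting (mine): we feed the combiner the true moments of the uniform distribution over $\{\pm a_1, \dots, \pm a_n\}$ in place of SOS pseudo-moments, and recover the column whose Gaussian weight $\langle c, a_i\rangle^2$ happens to be largest.

```python
# Sketch (mine) of the combiner on the true moments of the uniform
# distribution S over {+-a_1, ..., +-a_n}: with the reweighting <c, x>^2, the
# matrix M = E_{x~S} <c,x>^2 x x^T equals (1/n) sum_i <c,a_i>^2 a_i a_i^T,
# whose top eigenvector is the a_i with the largest weight <c, a_i>^2.
import numpy as np

rng = np.random.default_rng(1)
n = 64
A = np.linalg.qr(rng.standard_normal((n, n)))[0]   # orthonormal dictionary

c = rng.standard_normal(n)            # random Gaussian direction
w = (A.T @ c) ** 2                    # weights <c, a_i>^2
M = (A * w) @ A.T / n                 # M = (1/n) sum_i w_i a_i a_i^T
top = np.linalg.eigh(M)[1][:, -1]     # top eigenvector (eigh sorts ascending)
print(abs(top @ A[:, np.argmax(w)]))  # ~ 1.0: recovered +-a_i for the max-weight i
```

In the actual algorithm the expectation is a pseudoexpectation obtained from the SOS SDP, and the SOS-lifted version of this argument shows the rounding still works.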

Page 17: Rounding Sum of Squares Relaxations

A personal overview of the Unique Games Conjecture

Unique Games Conjecture: the UG/SSE problem is NP-hard. [Khot '02, Raghavendra-Steurer '08]

Reasons to believe:
• "Standard crypto heuristic": people tried to solve it and couldn't.
• Very clean picture of the complexity landscape: simple algorithms are optimal. [Khot '02 … Raghavendra '08 …]
• Simple polynomial-time algorithms can't refute it. [Khot-Vishnoi '04]
• Simple subexponential algorithms can't refute it. [B-Gopalan-Håstad-Meka-Raghavendra-Steurer '12]

Reasons to suspect:
• Random instances are easy via a simple algorithm. [Arora-Khot-Kolla-Steurer-Tulsiani-Vishnoi '05]
• Subexponential algorithm. [Arora-B-Steurer '10]
• Quasipolynomial algorithm on the Khot-Vishnoi instances. [Kolla '10]
• The SOS proof system solves all candidate hard instances. [B-Brandao-Harrow-Kelner-Steurer-Zhou '12]
• SOS is useful for the sparse vector problem; candidate algorithm for the search problem. [B-Kelner-Steurer '13]

Page 18: Rounding Sum of Squares Relaxations

Conclusions

• Sum of Squares is a powerful algorithmic framework that can yield strong results for the right problems. (Contrast with previous results on SDP/LP hierarchies, which showed lower bounds when using either the wrong hierarchy or the wrong problem.)

• The "combiner" view lets us focus on the features of the problem rather than the details of the relaxation.

• SOS seems particularly useful for problems with some geometric structure, including several problems related to unique games and machine learning.

• We still have only a rudimentary understanding of when SOS works and when it doesn't.

• Other connections between proof complexity and approximation algorithms?

Page 20: Rounding Sum of Squares Relaxations

Other Results

Sparse vector problem: recover a sparse vector planted in a $d$-dimensional subspace of $\mathbb{R}^n$, given an arbitrary basis for the subspace.

Worst case: recovery when the vector is sufficiently sparse. (Motivation: machine learning, optimization [Demanet-Hand '13]; the worst-case variant is the algorithmic bottleneck in the UG/SSE algorithm of [Arora-B-Steurer '10].)

Random case: recovery under a much weaker sparsity requirement, improving on [Demanet-Hand '13].

[Brandao-Harrow '12]: using our techniques, find a separable quantum state maximizing a "local operations and classical communication" (LOCC) measurement.
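A quick numpy illustration (mine) of the polynomial proxy for sparsity behind this line of work: among unit vectors, the 4-norm is large exactly for sparse vectors, since $\|x\|_4^4 \ge 1/k$ when $x$ has $k$ nonzeros (by Cauchy-Schwarz), while a fully spread $x$ has $\|x\|_4^4 = 1/n$.

```python
# Illustration (mine): the 4-norm separates sparse from spread unit vectors,
# the kind of low-degree objective the SOS approach can handle over a subspace.
import numpy as np

n, k = 1000, 10
sparse = np.zeros(n); sparse[:k] = 1 / np.sqrt(k)   # k-sparse unit vector
spread = np.ones(n) / np.sqrt(n)                    # fully spread unit vector
print(np.sum(sparse ** 4))   # 1/k = 0.1
print(np.sum(spread ** 4))   # 1/n = 0.001
```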