Average-Case Analysis of High-Dimensional Block-Sparse Recovery and Regression for Arbitrary Designs
Waheed U. Bajwa (Rutgers University), Marco F. Duarte (UMass Amherst), Robert Calderbank (Duke University)



Sparsity-Aware Linear Inference Problems


• Linear Regression: Estimate $\beta$ from $y = X\beta + z$, with Gaussian modeling error $z$. Lasso: $\widehat{\beta} = \arg\min_{\beta}\ \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$.

• Sparse Recovery (Compressive Sensing): Recover $\beta$ from $y = X\beta$. Basis Pursuit: $\widehat{\beta} = \arg\min_{\beta}\ \|\beta\|_1$ subject to $y = X\beta$.
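A minimal sketch of these two programs using cvxpy; the dimensions, noise level, and value of lambda are illustrative assumptions rather than the poster's settings.

import numpy as np
import cvxpy as cp

# Illustrative problem sizes (not the poster's values).
n, p = 50, 200
rng = np.random.default_rng(0)
X = rng.standard_normal((n, p)) / np.sqrt(n)
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + 0.01 * rng.standard_normal(n)

beta = cp.Variable(p)

# Lasso: least squares plus an l1 penalty (lambda chosen arbitrarily here).
lam = 0.1
lasso = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - X @ beta) + lam * cp.norm1(beta)))
lasso.solve()

# Basis pursuit: minimize the l1 norm subject to exact (noiseless) consistency.
y_clean = X @ beta_true
bp = cp.Problem(cp.Minimize(cp.norm1(beta)), [X @ beta == y_clean])
bp.solve()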

Modeling Correlated Coefficients via Group Sparsity

Vector $\beta$ partitioned into r groups (blocks) of size m: $\beta = [\beta_1^T, \ldots, \beta_r^T]^T$, $\beta_i \in \mathbb{R}^m$, $p = rm$. Matrix X partitioned into r submatrices: $X = [X_1, \ldots, X_r]$, $X_i \in \mathbb{R}^{n \times m}$.

• Group Lasso: $\widehat{\beta} = \arg\min_{\beta}\ \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda \sum_{i=1}^{r} \|\beta_i\|_2$.

• Block Basis Pursuit: $\widehat{\beta} = \arg\min_{\beta}\ \sum_{i=1}^{r} \|\beta_i\|_2$ subject to $y = X\beta$. (Both programs are sketched in code below.)
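The group-sparse programs only change the regularizer to a sum of per-block l2 norms; a sketch extending the previous one, again with illustrative sizes and an arbitrary lambda.

import numpy as np
import cvxpy as cp

# Illustrative sizes (much smaller than the poster's p = 5000, m = 10, r = 500).
n, m, r = 60, 4, 50
p = m * r
rng = np.random.default_rng(1)
X = rng.standard_normal((n, p)) / np.sqrt(n)

# Block-sparse ground truth with 3 active blocks.
beta_true = np.zeros(p)
for i in (2, 10, 37):
    beta_true[i*m:(i+1)*m] = rng.standard_normal(m)
y = X @ beta_true + 0.01 * rng.standard_normal(n)

beta = cp.Variable(p)
# Sum of per-block l2 norms: the mixed l1/l2 norm that induces group sparsity.
group_norm = cp.sum(cp.hstack([cp.norm(beta[i*m:(i+1)*m], 2) for i in range(r)]))

# Group lasso (lambda is an arbitrary illustrative choice).
lam = 0.1
group_lasso = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - X @ beta) + lam * group_norm))
group_lasso.solve()

# Block basis pursuit on noiseless data.
bbp = cp.Problem(cp.Minimize(group_norm), [X @ beta == X @ beta_true])
bbp.solve()

For m = 1 both programs reduce to the standard Lasso and Basis Pursuit above.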

Existing theoretical guarantees rely on subdictionaries (column submatrices of the sensing/design matrix) being well conditioned and either:
• directly check conditioning with combinatorial computation, or
• indirectly check conditioning and provide pessimistic bounds.

It is possible to avoid these issues by switching to a statistical performance measurement setting:
• endowing $\beta$ with a "uniform" distribution over all sparse vectors, and
• considering the average-case subdictionary conditioning.

Recent work provides theoretical guarantees for problems with standard sparsity that hold with high probability and require only simple matrix metrics; this poster extends such guarantees to block sparsity.


Block-Sparse Recovery

Theorem: Assume that $\beta$ is drawn "uniformly" over the set of k-block-sparse signals and X satisfies the BIC. As long as k satisfies the stated bound, block basis pursuit returns $\beta$ with probability at least $1 - 4p^{-4\log 2}$.

Comparison to basis pursuit with standard sparsity [Tropp 2008]: analogous guarantees hold when the coherence (the maximum inner product between distinct columns) is sufficiently small; performance is matched for m = 1.

Block-Sparse Linear Regression

Theorem: Assume that $\beta$ is drawn "uniformly" over the set of k-block-sparse signals and X satisfies the BIC. As long as k satisfies the stated bound, the estimate returned by the group lasso with the prescribed regularization parameter $\lambda$ obeys the stated error bound with the stated probability.

Comparison to the standard lasso [Candès and Plan 2009]: the same performance is obtained only if the coefficients are statistically independent (even within each group); without independence, a stronger requirement is needed.

Discussion

• As long as the dictionary coherences are sufficiently small, the spectral norm $\|X\|_2$ is the only matrix metric affecting the size of well-conditioned subdictionaries.

• Tight frames provide the smallest spectral norm ($\|X\|_2^2 = p/n$ for unit-norm columns); therefore, they admit the largest well-conditioned subdictionaries (see the worked note after this list).

• Results translate to the Multiple Measurement Vector (MMV) setting: Kronecker-structured matrices with translatable norm/coherence metrics [Candès and Plan 2009].
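A short worked justification of the tight-frame remark (a standard argument stated here for completeness, not quoted from the poster): for $X \in \mathbb{R}^{n \times p}$ with unit-norm columns,

% The squared Frobenius norm is the sum of squared column norms, ||X||_F^2 = p,
% and it also equals the sum of the n squared singular values. Hence
\|X\|_2^2 = \sigma_{\max}^2(X) \;\ge\; \frac{1}{n}\sum_{i=1}^{n}\sigma_i^2(X) = \frac{\|X\|_F^2}{n} = \frac{p}{n},
% with equality if and only if all singular values are equal, i.e.,
% X X^T = (p/n) I_n, which is exactly the tight-frame case. Minimizing the
% spectral norm in this way maximizes the size of the well-conditioned random
% subdictionaries promised by the conditioning theorem.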


Full version of this paper: http://arxiv.org/pdf/1309.5310/

Average-Case Subdictionary Conditioning Metrics


• Intra-Block Coherence: how far each block Gram matrix $X_i^T X_i$ is from the identity (near-orthonormality of each block)
• Inter-Block Coherence: how strongly distinct blocks are correlated, via $X_i^T X_j$, $i \neq j$
• Spectral Norm: $\|X\|_2$, the largest singular value of X
(Hedged forms of these definitions are given below.)
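Hedged forms of the three metrics, following the standard block-coherence definitions in the block-sparse literature (the exact normalizations in the arXiv version may differ slightly): with $X = [X_1, \ldots, X_r]$ and each $X_i \in \mathbb{R}^{n \times m}$ having unit-norm columns,

\mu_{\mathrm{intra}} = \max_{1 \le i \le r} \big\| X_i^T X_i - I_m \big\|_2,
\qquad
\mu_{\mathrm{inter}} = \max_{i \ne j} \big\| X_i^T X_j \big\|_2,
\qquad
\|X\|_2 = \sigma_{\max}(X).
% The BIC asks that both coherences be sufficiently small: each block is
% nearly orthonormal and distinct blocks are nearly orthogonal to one another.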

Block Incoherence Condition (BIC): Blocks are each close to orthogonal and are sufficiently incoherent with one another.


• The size of the largest well-conditioned subdictionaries scales inversely with the sensing/design matrix spectral norm.
• The coherence measures do not affect the size of the well-conditioned subdictionaries (other than through the BIC).

“Uniform” Distribution over Block-Sparse Signals


• Block support of $\beta$ is distributed uniformly among the k-subsets of {1, …, r}.
• Entries of $\beta$ have zero median (i.e., its entries take positive and negative signs with equal probability).
• Nonzero blocks of $\beta$ have statistically independent “directions”: the normalized blocks $\beta_i / \|\beta_i\|_2$, for $i$ in the support, are independent and uniformly distributed on the unit sphere in $\mathbb{R}^m$.
(A sampling sketch in code follows.)
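A minimal sampling sketch of this "uniform" block-sparse model; the block magnitudes and the dimensions are illustrative assumptions, since the poster only pins down the support, sign, and direction distributions.

import numpy as np

def sample_block_sparse(r, m, k, rng):
    """Draw a block-sparse vector following the 'uniform' model above:
    support uniform over k-subsets of {1,...,r}, nonzero blocks with i.i.d.
    directions uniform on the sphere (which also makes every entry's sign
    symmetric, i.e., zero median)."""
    beta = np.zeros(r * m)
    support = rng.choice(r, size=k, replace=False)   # uniform k-subset (0-indexed)
    for i in support:
        g = rng.standard_normal(m)
        direction = g / np.linalg.norm(g)            # uniform on the unit sphere in R^m
        magnitude = 1.0                              # illustrative; the poster leaves magnitudes general
        beta[i*m:(i+1)*m] = magnitude * direction
    return beta, np.sort(support)

rng = np.random.default_rng(0)
beta, support = sample_block_sparse(r=500, m=10, k=20, rng=rng)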


Sparsity-Aware Linear Inference Simulations


• Generated 2000 random matrices with normalized columns, p = 5000, m = 10, r = 500, n = 858 [Rao, Recht, Nowak 2012]

• Scaled the matrix spectral norm (via the SVD) over a set of multipliers o.
• The resulting design/sensing matrices had their columns renormalized, and the block coherence metrics were computed.
• Average performance was measured over 1000 uniformly drawn block-sparse signals.

Experiment 1: Measure sparse recovery performance for matrices selected to have matching coherence values among several spectral norm multipliers.

Experiment 2: Measure performance for matrices with extremal (maximum/minimum) coherences for each spectral norm multiplier.

(A code sketch of the matrix-generation pipeline follows.)
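A rough sketch of the matrix-generation pipeline described above; the way the spectrum is reshaped, the multiplier values, and the reduced dimensions are assumptions for illustration, not the poster's exact recipe.

import numpy as np

def make_design(n, m, r, multiplier, rng):
    """Gaussian matrix with unit-norm columns whose singular-value profile is
    spread out by `multiplier` via the SVD, then columns renormalized. Only a
    plausible stand-in for the poster's spectral-norm scaling step."""
    p = m * r
    X = rng.standard_normal((n, p))
    X /= np.linalg.norm(X, axis=0)                             # normalize columns
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    profile = multiplier ** (-np.arange(n) / max(n - 1, 1))    # ratio s_max/s_min = multiplier
    s_new = profile * (np.linalg.norm(s) / np.linalg.norm(profile))  # keep total energy
    X = U @ np.diag(s_new) @ Vt
    return X / np.linalg.norm(X, axis=0)                       # renormalize columns

def block_coherences(X, m, r):
    """Intra- and inter-block coherences (standard definitions assumed)."""
    blocks = [X[:, i*m:(i+1)*m] for i in range(r)]
    intra = max(np.linalg.norm(B.T @ B - np.eye(m), 2) for B in blocks)
    inter = max(np.linalg.norm(blocks[i].T @ blocks[j], 2)
                for i in range(r) for j in range(i + 1, r))
    return intra, inter

rng = np.random.default_rng(0)
X = make_design(n=86, m=10, r=50, multiplier=3.0, rng=rng)
print(np.linalg.norm(X, 2), block_coherences(X, m=10, r=50))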


Theorem (Average-Case Subdictionary Conditioning): Assume that X satisfies the BIC and S is a k-subset drawn from {1, …, r} uniformly at random. Then, as long as k satisfies the stated bound, the singular values $\sigma_i$ of the block subdictionary $X_S$ obey the stated two-sided bound for i = 1, …, km, with probability over the choice of the set S of at least $1 - 2p^{-4\log 2}$. (A Monte Carlo sketch follows.)
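The theorem can be probed empirically with a simple Monte Carlo check of random block subdictionaries; the sizes are illustrative, and this only estimates the singular-value range rather than verifying the theorem's constants.

import numpy as np

def subdictionary_extremes(X, m, r, k, trials, rng):
    """Draw uniformly random k-subsets of blocks and record the extreme
    singular values of the n x km column submatrix X_S."""
    lo, hi = [], []
    for _ in range(trials):
        S = rng.choice(r, size=k, replace=False)
        cols = np.concatenate([np.arange(i*m, (i+1)*m) for i in S])
        s = np.linalg.svd(X[:, cols], compute_uv=False)
        lo.append(s.min())
        hi.append(s.max())
    return np.array(lo), np.array(hi)

rng = np.random.default_rng(0)
X = rng.standard_normal((86, 500))
X /= np.linalg.norm(X, axis=0)                        # unit-norm columns
lo, hi = subdictionary_extremes(X, m=10, r=50, k=5, trials=200, rng=rng)
print(lo.min(), hi.max())   # empirical range of subdictionary singular values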


[Diagram: the linear model $y = X\beta + z$, with y the observation vector, X the design/sensing matrix, $\beta$ the coefficients (k nonzero blocks/entries), and z the Gaussian modeling error.]

Numerical Results
[Plots: probability of exact recovery vs. number of nonzero blocks k (two panels; spectral-norm multipliers o = 1, …, 4 and o = 1, …, 9) and regression error (average of 1000 trials) vs. k (multipliers o = 3, …, 7). Panel titles: Sparse Recovery, Sparse Regression. Dashed: lowest coherence; solid: highest coherence.]