
ℓ1-OPTIMIZATION AND ITS VARIOUS THRESHOLDS IN COMPRESSED SENSING

Mihailo Stojnic

Purdue University, West Lafayette, IN; e-mail: [email protected]

ABSTRACT

Recently, [5, 9] theoretically analyzed the success of a polynomial-time ℓ1-optimization algorithm in solving an under-determined system of linear equations. In a large-dimensional and statistical context, [5, 9] proved that if the number of equations (measurements, in compressed sensing terminology) in the system is proportional to the length of the unknown vector, then there is a sparsity (number of non-zero elements of the unknown vector), also proportional to the length of the unknown vector, such that ℓ1-optimization succeeds in solving the system. In this paper, we consider an alternative performance analysis of ℓ1-optimization and demonstrate that the proportionality constants it provides in certain cases match or improve on the best currently known ones from [8, 9].

Index Terms: compressed sensing, ℓ1-optimization

1. INTRODUCTION

In the last several years the area of compressed sensing has been the subject of extensive research. The breakthrough results of [5] and [9] theoretically demonstrated that in certain applications (e.g., signal processing in sensor networks) classical sampling at the Nyquist rate may not be necessary to perfectly recover signals. These results generated an enormous amount of research, with possible applications ranging from high-dimensional geometry, image reconstruction, single-pixel camera design, decoding of linear codes, channel estimation in wireless communications, to machine learning, data-streaming algorithms, DNA micro-arrays, magnetoencephalography, etc. (more on compressed sensing problems, their importance, and the wide spectrum of applications can be found in the excellent references [1, 7, 12, 18-20, 28, 30]).

In this paper we are interested in the mathematical background of certain compressed sensing problems. As is well known, these problems are very easy to pose and very difficult to solve. Namely, they are as simple as the following: we would like to find x such that

Ax = y    (1)

where A is an m × n (m < n) measurement matrix and y is an m × 1 measurement vector. The standard compressed sensing context assumes that x is an n × 1 unknown k-sparse vector (by a k-sparse vector we mean a vector that has at most k nonzero components). The main topic of this paper will be compressed sensing of the so-called ideally sparse signals (more on the so-called approximately sparse signals can be found in, e.g., [6, 31]). Throughout the paper we will mostly assume no special structure on the sparse signal (more on the very relevant cases of sparse signals with special structure can be found in [1, 14, 17, 25-27]). Also, in the rest of the paper we will assume the so-called linear regime, i.e., we will assume that k = βn and that the number of measurements is m = αn, where α and β are absolute constants independent of n.

A very successful approach to solving (1) that recently attracted a great deal of attention is called ℓ1-optimization. The basic ℓ1-optimization algorithm finds x in (1) by solving the following ℓ1-norm minimization problem

min ‖x‖1
subject to Ax = y.    (2)
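As a purely illustrative aside (this sketch is ours and is not taken from [5, 9, 24]), the program in (2) can be recast as a linear program and handed to an off-the-shelf solver; the function name l1_min and the problem sizes below are arbitrary choices.

```python
# A minimal sketch of solving (2) as a linear program (illustrative only).
# min ||x||_1 s.t. Ax = y is rewritten with auxiliary variables t >= |x|.
import numpy as np
from scipy.optimize import linprog

def l1_min(A, y):
    """Solve min ||x||_1 subject to Ax = y via linear programming."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])      # minimize sum(t)
    I = np.eye(n)
    A_ub = np.block([[I, -I], [-I, -I]])                # x - t <= 0, -x - t <= 0
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])             # Ax = y
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return res.x[:n]

# Recover a k-sparse x0 from m < n random Gaussian measurements.
rng = np.random.default_rng(0)
n, m, k = 100, 50, 10
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
print(np.max(np.abs(l1_min(A, A @ x0) - x0)))           # small in this regime
```

In this sketch the regime m = 50, k = 10, n = 100 sits well inside the empirically observed recovery region, so the reconstruction error is typically at the solver's tolerance.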

Quite remarkably, in [5] the authors were able to show that if α and n are given, and the matrix A is given and satisfies a special property called the restricted isometry property (RIP), then any unknown vector x with no more than k = βn non-zero elements (where β is an absolute constant dependent on α and explicitly calculated in [5]) can be recovered by solving (2). As expected, this assumes that y was in fact generated by that x and given to us. The case when the available measurements are noisy versions of y is also of interest [5, 29]. Although that case is not of primary interest in the present paper, it is worth mentioning that the recent popularity of ℓ1-optimization in compressed sensing is significantly due to its robustness with respect to noisy measurements. (Of course, the main reason for its popularity is its ability to solve (1) for a very wide range of matrices A.)

Clearly, having the matrix A satisfy the RIP condition is of critical importance for the previous claim to hold (more on the importance of the RIP condition can be found in [4]). For several classes of random matrices (e.g., matrices with i.i.d. zero-mean Gaussian, Bernoulli, or even general sub-Gaussian components) the RIP condition is satisfied with overwhelming probability [2, 5]. (By overwhelming probability we in this paper mean a probability that is within a quantity exponentially decaying in n of 1.) However, the RIP is only a sufficient condition for ℓ1-optimization to produce the solution of (1).

Instead of characterizing the m × n matrix A through the RIP condition, in [8, 9] the author associates a certain polytope with the matrix A. Namely, [8, 9] consider the polytope obtained by projecting the regular n-dimensional cross-polytope using the matrix A. It turns out that a necessary and sufficient condition for (2) to produce the solution of (1) is that this polytope associated with the matrix A is k-neighborly [8, 9]. Using high-dimensional geometry it is further shown in [9] that if the matrix A is a random m × n ortho-projector matrix, then with overwhelming probability the polytope obtained by projecting the standard n-dimensional cross-polytope by A is k-neighborly. The precise relation between m and k needed for this to happen is characterized in [8, 9] as well.


It should be noted that one usually considers success of (2) in finding the solution of (1) for any given x. It is also of interest to consider success of (2) in finding the solution of (1) for almost any given x. To make a distinction between these cases we recall the following definitions from [9, 10].

Clearly, for any given constant α ≤ 1 there is a maximum allowable value of the constant β such that (2) finds the solution of (1) with overwhelming probability for any x. This maximum allowable value of the constant β is called the strong threshold (see [9]). We will denote the value of the strong threshold by βs. Similarly, for any given constant α ≤ 1 one can define the sectional threshold as the maximum allowable value of the constant β such that (2) finds the solution of (1) with overwhelming probability for any x with a given fixed location of non-zero components (see [9]). In a similar fashion one can then denote the value of the sectional threshold by βsec. Finally, for any given constant α ≤ 1 one can define the weak threshold as the maximum allowable value of the constant β such that (2) finds the solution of (1) with overwhelming probability for any x with a given fixed location of non-zero components and a given fixed combination of the signs of its elements (see [9]). In a similar fashion one can then denote the value of the weak threshold by βw. In this paper we determine the values of βs, βsec, βw for the entire range of α, i.e., for 0 < α ≤ 1, for a specific group of randomly generated matrices A.

2. KEY THEOREMS

In this section we introduce two useful theorems that turn out to be of key importance for the analysis of the success of ℓ1-optimization. First we recall a null-space characterization of the matrix A that guarantees that the solutions of (1) and (2) coincide. The following theorem from [23] provides this characterization (similar characterizations can be found in [11, 25, 31, 32]; furthermore, if instead of ℓ1 one, for example, uses ℓq-optimization (0 < q < 1) in (2), then characterizations similar to the ones from [11, 25, 31, 32] can be derived as well [16]).

Theorem 1 (Null-space characterization; general x) Assume that an m × n measurement matrix A is given. Let x be a k-sparse vector whose non-zero components can be both positive and negative. Further, assume that y = Ax and that w is an n × 1 vector. Let K be any subset of {1, 2, . . . , n} such that |K| = k and let K_i denote the i-th element of K. Further, let K̄ = {1, 2, . . . , n} \ K. Let 1 be a 2^k × k sign matrix; each element of the matrix 1 is either 1 or −1 and no two rows are identical. Let 1_j be the j-th row of the matrix 1. Then (2) will produce the solution of (1) if

(∀w ∈ R^n | Aw = 0) and ∀K, ∀j:   −1_j w_K < ∑_{i=1}^{n−k} |w_{K̄_i}|.    (3)

Remark: The following simplification of the previous theorem is also well known. Let w ∈ R^n be such that Aw = 0. Let |w|_(i) be the i-th smallest magnitude of the elements of w. Set w̃ = (|w|_(1), |w|_(2), . . . , |w|_(n))^T. If (∀w | Aw = 0) ∑_{i=n−k+1}^{n} w̃_i ≤ ∑_{i=1}^{n−k} w̃_i, where w̃_i is the i-th element of w̃, then the solutions of (1) and (2) coincide.
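To illustrate the condition in the remark (this is our own sanity-check sketch, not part of the analysis in [23, 24]), one can sample random directions from the null space of a Gaussian matrix A and test whether the k largest magnitudes sum to at most the remaining ones; sampling of course cannot certify the condition, since Theorem 1 requires it for all nonzero w in the null space.

```python
# Illustrative check of the simplified null-space condition from the remark.
import numpy as np
from scipy.linalg import null_space

def condition_holds(w, k):
    """True if the k largest magnitudes of w sum to at most the rest."""
    mags = np.sort(np.abs(w))          # w-tilde: magnitudes in nondecreasing order
    return mags[-k:].sum() <= mags[:-k].sum()

rng = np.random.default_rng(1)
m, n, k = 50, 100, 10
A = rng.standard_normal((m, n))
N = null_space(A)                      # orthonormal basis of the (n - m)-dim null space
# Sample random null-space directions w = N v and test the condition.
print(all(condition_holds(N @ rng.standard_normal(N.shape[1]), k)
          for _ in range(10000)))
```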

Having a matrix A such that (3) holds would be enough for the solutions of (2) and (1) to coincide. If one assumes that m and k are proportional to n (the case of interest in this paper), then the construction of deterministic matrices A that satisfy (3) is not an easy task (in fact, one may say that it is one of the most fundamental open problems in the area of theoretical compressed sensing). However, turning to random matrices significantly simplifies things. As we will see later in the paper, random matrices A whose null space is uniformly distributed in the Grassmannian will turn out to be a very convenient choice. The following phenomenal result from [15] that relates to such matrices is of key importance in what follows.

Theorem 2 ([15], Escape through a mesh) Let S be a subset of the unit Euclidean sphere S^{n−1} in R^n. Let Y be a random (n − m)-dimensional subspace of R^n, distributed uniformly in the Grassmannian with respect to the Haar measure. Let

w(S) = E sup_{w∈S} (h^T w)    (4)

where h is a random column vector in R^n with i.i.d. N(0, 1) components. Assume that w(S) < √m − 1/(4√m). Then

P(Y ∩ S = ∅) > 1 − 3.5 e^{−(√m − 1/(4√m) − w(S))² / 18}.    (5)

Remark: Gordon's original constant 3.5 was substituted by 2.5 in [21]. Both constants are fine for our subsequent analysis.
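For a rough numerical feel for Theorem 2 (an illustrative sketch of ours, not part of the analysis in [15, 21, 24]), the following snippet evaluates the lower bound on the right-hand side of (5) for a few values of m when w(S) is a fixed fraction of √m; the function name and the chosen fraction are arbitrary.

```python
# Evaluating the probability lower bound in (5) for given m and w(S).
import numpy as np

def escape_prob_lower_bound(m, wS, c=3.5):
    """Right-hand side of (5); requires w(S) < sqrt(m) - 1/(4 sqrt(m))."""
    gap = np.sqrt(m) - 1.0 / (4.0 * np.sqrt(m)) - wS
    if gap <= 0:
        raise ValueError("bound requires w(S) < sqrt(m) - 1/(4 sqrt(m))")
    return 1.0 - c * np.exp(-gap ** 2 / 18.0)

# With w(S) = 0.5 * sqrt(m), the bound approaches 1 as m grows.
for m in (100, 1000, 10000):
    print(m, escape_prob_lower_bound(m, 0.5 * np.sqrt(m)))
```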

3. PROBABILISTIC ANALYSIS OF THE NULL-SPACE CHARACTERIZATION

In this section we probabilistically analyze the validity of the null-space characterization given in Theorem 1. In the first subsection we show how one can obtain the values of the strong threshold βs for the entire range 0 < α ≤ 1 based on such an analysis. In the second subsection we determine the values of the weak and the sectional thresholds.

3.1. Strong threshold

As noted in [21], Theorem 2 can be used to probabilistically analyze (3). Namely, let S in (4) be

Ss = {w ∈ S^{n−1} | ∑_{i=n−k+1}^{n} w̃_i ≥ ∑_{i=1}^{n−k} w̃_i}    (6)

where, as earlier, the notation w̃ is used to denote the vector obtained by sorting the absolute values of the elements of w in non-decreasing order. (Here and later in the paper, we assume that k is chosen such that there is an 0 < α ≤ 1 such that the solutions of (1) and (2) coincide.) Let Y be an (n − m)-dimensional subspace of R^n uniformly distributed in the Grassmannian. Furthermore, let Y be the null space of A. Then as long as w(Ss) < √m − 1/(4√m), Y will miss Ss (i.e., (3) will be satisfied) with probability no smaller than the one given in (5). More precisely, if α = m/n is a constant (the case of interest in this paper), n, m are large, and w(Ss) is smaller than but proportional to √m, then P(Y ∩ Ss = ∅) −→ 1. This in turn is equivalent to having

P(∀w ∈ R^n | Aw = 0,  ∑_{i=n−k+1}^{n} w̃_i < ∑_{i=1}^{n−k} w̃_i) −→ 1

which according to Theorem 1 (or, more precisely, according to the remark after Theorem 1) means that the solutions of (1) and (2) coincide with probability 1. For any given value of α ∈ (0, 1] a threshold value of β can then be determined as the maximum β such that w(Ss) < √m − 1/(4√m). That maximum β will be exactly the value of the strong threshold βs. If one is only concerned with finding a possible value for βs, it is easy to note that instead of computing w(Ss) it is sufficient to find an upper bound on it. However, as we will soon see, to determine as good values of βs as possible, the upper bound on w(Ss) should be as tight as possible. The main contribution of this work and [24] is a fairly precise estimate of w(Ss).


Let |h|_(i) be the i-th smallest magnitude of the elements of h. Set h̃ = (|h|_(1), |h|_(2), . . . , |h|_(n))^T. Further, let z ∈ R^n be a column vector such that z_i = 1 for 1 ≤ i ≤ (n − k) and z_i = −1 for n − k + 1 ≤ i ≤ n. Using Lagrange duality it is shown in [24] that there is a cs = (1 − θs)n ≤ (n − k) such that

w(Ss) = E max_{w∈Ss} ∑_{i=1}^{n} h̃_i|w_i| ≤ O(√n e^{−n·const}) + √( E ∑_{i=cs+1}^{n} h̃_i² − (E(h̃^T z) − E ∑_{i=1}^{cs} h̃_i)² / (n − cs) )    (7)

where h̃_i is the i-th element of the vector h̃ and const is an absolute constant independent of n. When n is large, the first term on the right-hand side of the inequality in (7) goes to zero. Finding a good cs and computing the second term on the right-hand side of the inequality in (7) is then enough to determine a valid upper bound on w(Ss) and therefore to compute the values of the strong threshold. Such results are established in [24], relying on [3, 22]. The following theorem from [24] then summarizes how an attainable value of the strong threshold can be computed.

Theorem 3 (Strong threshold) Let A be an m × n measurement matrix in (1) with the null space uniformly distributed in the Grassmannian. Let the unknown x in (1) be k-sparse. Let k, m, n be large and let α = m/n and βs = k/n be constants independent of m and n. Let erfinv be the inverse of the standard error function associated with a zero-mean unit-variance Gaussian random variable. Further, let ε > 0 be an arbitrarily small constant and θ̂s (βs ≤ θ̂s ≤ 1) be the solution of

(1 − ε) (√(2/π) e^{−(erfinv(1−θs))²} − 2√(2/π) e^{−(erfinv(1−βs))²}) / θs − √2 erfinv((1 + ε)(1 − θs)) = 0.    (8)

If α and βs further satisfy

α > (1/√(2π)) (√(2π) + 2√2 erfinv(1−θ̂s) e^{−(erfinv(1−θ̂s))²} − √(2π)(1 − θ̂s)) − (√(2/π) e^{−(erfinv(1−θ̂s))²} − 2√(2/π) e^{−(erfinv(1−βs))²})² / θ̂s    (9)

then the solutions of (1) and (2) coincide with overwhelming probability.

Proof 1: Follows from the previous discussion and the analysis presented in [24].

The results for the strong threshold obtained from the above theorem, as well as the best currently known ones from [8, 9], are presented in Figure 1. As can be seen, the threshold results obtained from the previous analysis are comparable to those from [8, 9] in a large portion of the range for α. For the values of α that are close to 1 the threshold values from Theorem 3 are slightly better than those from [8, 9]. For α −→ 1 we have β ≈ 0.24, which matches the value obtained in [13] and is in fact optimal.

Fig. 1. Strong threshold, ℓ1-optimization (β/α versus α; curves: Donoho [8, 9] and present paper).
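As a purely empirical companion to these curves (our own illustration, not the computation prescribed by Theorem 3), one can fix α and estimate the fraction of randomly drawn k-sparse instances that (2) recovers as β grows. Random instances probe weak-threshold-type behaviour (random support and signs) rather than the strong threshold, and the problem size, number of trials, and function names below are arbitrary.

```python
# Empirical recovery-rate sweep for l1-minimization (illustrative only).
import numpy as np
from scipy.optimize import linprog

def l1_min(A, y):
    """Solve min ||x||_1 subject to Ax = y via linear programming."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])
    I = np.eye(n)
    res = linprog(c,
                  A_ub=np.block([[I, -I], [-I, -I]]), b_ub=np.zeros(2 * n),
                  A_eq=np.hstack([A, np.zeros((m, n))]), b_eq=y,
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return res.x[:n]

def recovery_rate(n, alpha, beta, trials=20, seed=0):
    """Fraction of random k-sparse instances exactly recovered by (2)."""
    rng = np.random.default_rng(seed)
    m, k = int(alpha * n), int(beta * n)
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((m, n)) / np.sqrt(m)
        x0 = np.zeros(n)
        x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
        hits += np.linalg.norm(l1_min(A, A @ x0) - x0) < 1e-6
    return hits / trials

# Sweep beta/alpha for alpha = 0.5; the success rate drops sharply
# in the vicinity of the weak-threshold curve (cf. Figure 2).
for ratio in (0.2, 0.3, 0.4, 0.5, 0.6):
    print(ratio, recovery_rate(n=80, alpha=0.5, beta=0.5 * ratio))
```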

3.2. Weak and sectional thresholds

In this subsection we present the weak and sectional equivalents of Theorem 3. We start with the weak threshold theorem.

Theorem 4 (Weak threshold) Let A be an m × n measurement matrix in (1) with the null space uniformly distributed in the Grassmannian. Let the unknown x in (1) be k-sparse. Further, let the locations and signs of the nonzero elements of x be arbitrarily chosen but fixed. Let k, m, n be large and let α = m/n and βw = k/n be constants independent of m and n. Let erfinv be the inverse of the standard error function associated with a zero-mean unit-variance Gaussian random variable. Further, let ε > 0 be an arbitrarily small constant and θ̂w (βw ≤ θ̂w ≤ 1) be the solution of

(1 − ε)(1 − βw) √(2/π) e^{−(erfinv((1−θw)/(1−βw)))²} / θw − √2 erfinv((1 + ε)(1 − θw)/(1 − βw)) = 0.    (10)

If α and βw further satisfy

α > ((1 − βw)/√(2π)) (√(2π) + 2√2 erfinv((1−θ̂w)/(1−βw)) e^{−(erfinv((1−θ̂w)/(1−βw)))²} − √(2π)(1 − θ̂w)/(1 − βw)) + βw − ((1 − βw) √(2/π) e^{−(erfinv((1−θ̂w)/(1−βw)))²})² / θ̂w    (11)

then the solutions of (1) and (2) coincide with overwhelming probability.

Proof 2: Follows from the analysis presented in [24].

The results for the weak threshold obtained from the above theorem, as well as the best currently known ones from [8, 9], are presented in Figure 2. As can be seen, the threshold results obtained from the previous analysis match those from [8, 9].

Fig. 2. Weak threshold, ℓ1-optimization (β/α versus α; curves: Donoho [8, 9] and present paper).

Similarly we have the following sectional threshold theorem.

Theorem 5 (Sectional threshold) Let A be an m × n measurement matrix in (1) with the null space uniformly distributed in the Grassmannian. Let the unknown x in (1) be k-sparse. Further, let the locations of the nonzero elements of x be arbitrarily chosen but fixed. Let k, m, n be large and let α = m/n and βsec = k/n be constants independent of m and n. Let erfinv be the inverse of the standard error function associated with a zero-mean unit-variance Gaussian random variable. Further, let ε > 0 be an arbitrarily small constant and θ̂sec (βsec ≤ θ̂sec ≤ 1) be the solution of

(1 − ε)(1 − βsec) (√(2/π) e^{−(erfinv((1−θsec)/(1−βsec)))²} − √(2/π) βsec/(1 − βsec)) / θsec − √2 erfinv((1 + ε)(1 − θsec)/(1 − βsec)) = 0.    (12)

If α and βsec further satisfy

α > ((1 − βsec)/√(2π)) (√(2π) + 2√2 erfinv((1−θ̂sec)/(1−βsec)) e^{−(erfinv((1−θ̂sec)/(1−βsec)))²} − √(2π)(1 − θ̂sec)/(1 − βsec)) + βsec − ((1 − βsec) √(2/π) e^{−(erfinv((1−θ̂sec)/(1−βsec)))²} − √(2/π) βsec)² / θ̂sec    (13)

then the solutions of (1) and (2) coincide with overwhelming probability.

Proof 3: Follows from the analysis presented in [24].

The results for the sectional threshold obtained from the above theorem, as well as the best currently known ones from [8, 9], are presented in Figure 3. As can be seen, the threshold results obtained from the previous analysis slightly improve on those from [8, 9].

Fig. 3. Sectional threshold, ℓ1-optimization (β/α versus α; curves: Donoho [8, 9] and present paper).

4. CONCLUSION

In this paper we considered recovery of sparse signals from a reduced number of linear measurements. We provided a theoretical performance analysis of a classical polynomial-time ℓ1-optimization algorithm. Under the assumption that the measurement matrix A has its null space distributed uniformly in the Grassmannian, we derived lower bounds on the values of the recoverable strong, weak, and sectional thresholds in the so-called linear regime, i.e., in the regime where the recoverable sparsity is proportional to the length of the unknown vector. The obtained threshold results are comparable to the best currently known ones.

5. REFERENCES

[1] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde. Model-based compressive sensing. Available online at http://www.dsp.ece.rice.edu/cs/.
[2] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3), 2008.
[3] A. Barvinok and A. Samorodnitsky. Random weighting, asymptotic counting, and inverse isoperimetry. Israel Journal of Mathematics, 158:159-191, 2007.
[4] E. Candes. The restricted isometry property and its implications for compressed sensing. Compte Rendus de l'Academie des Sciences, Paris, Series I, 346, pages 589-592, 2008.
[5] E. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. on Information Theory, 52:489-509, 2006.
[6] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approximation. Journal of the American Mathematical Society, 22(1), January 2009.
[7] S. F. Cotter and B. D. Rao. Sparse channel estimation via matching pursuit with application to equalization. IEEE Trans. on Communications, 50(3), 2002.
[8] D. Donoho. Neighborly polytopes and sparse solutions of underdetermined linear equations. Technical report, Department of Statistics, Stanford University, 2004.
[9] D. Donoho. High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension. Discrete and Computational Geometry, 35(4):617-652, 2006.
[10] D. Donoho and J. Tanner. Thresholds for the recovery of sparse solutions via l1 minimization. Proc. Conference on Information Sciences and Systems, March 2006.
[11] D. L. Donoho and X. Huo. Uncertainty principles and ideal atomic decompositions. IEEE Trans. Inform. Theory, 47(7):2845-2862, November 2001.
[12] M. Duarte, M. Davenport, D. Takhar, J. Laska, T. Sun, K. Kelly, and R. Baraniuk. Single-pixel imaging via compressive sampling. IEEE Signal Processing Magazine, 25(2), 2008.
[13] C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of LP decoding. STOC: Proceedings of the thirty-ninth annual ACM Symposium on Theory of Computing, 2007.
[14] Y. C. Eldar and M. Mishali. Robust recovery of signals from a structured union of subspaces. 2008. Available at arXiv:0807.4581.
[15] Y. Gordon. On Milman's inequality and random subspaces which escape through a mesh in R^n. Geometric Aspects of Functional Analysis, Israel Seminar 1986-87, Lecture Notes in Mathematics, 1317, 1988.
[16] R. Gribonval and M. Nielsen. Sparse representations in unions of bases. IEEE Trans. Inform. Theory, 49(12):3320-3325, December 2003.
[17] M. A. Khajehnejad, A. G. Dimakis, W. Xu, and B. Hassibi. Sparse recovery of positive signals with minimal expansion. Preprint, 2009. Available at arXiv:0902.4045.
[18] O. Milenkovic, R. Baraniuk, and T. Simunic-Rosing. Compressed sensing meets bioinformatics: a new DNA microarray architecture. Information Theory and Applications Workshop, 2007.
[19] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. 2007. Available online at http://www.dsp.ece.rice.edu/cs/.
[20] J. Romberg. Imaging via compressive sampling. IEEE Signal Processing Magazine, 25(2):14-20, 2008.
[21] M. Rudelson and R. Vershynin. On sparse reconstruction from Fourier and Gaussian measurements. Communications on Pure and Applied Mathematics, 61(8), 2007.
[22] S. M. Stigler. The asymptotic distribution of the trimmed mean. Annals of Statistics, 1:472-477, 1973.
[23] M. Stojnic. A simple performance analysis of ℓ1-optimization in compressed sensing. ICASSP, International Conference on Acoustics, Speech and Signal Processing, April 2009.
[24] M. Stojnic. Various thresholds for ℓ1-optimization in compressed sensing. Preprint, 2009. Available at arXiv:0907.3666.
[25] M. Stojnic, F. Parvaresh, and B. Hassibi. On the reconstruction of block-sparse signals with an optimal number of measurements. IEEE Trans. on Signal Processing, August 2009.
[26] J. Tropp, A. C. Gilbert, and M. Strauss. Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit. Signal Processing, August 2005.
[27] E. van den Berg and M. P. Friedlander. Joint-sparse recovery from multiple measurements. Preprint, 2009. Available at arXiv:0904.2051.
[28] H. Vikalo, F. Parvaresh, and B. Hassibi. On sparse recovery of compressed DNA microarrays. Asilomar Conference, November 2007.
[29] M. J. Wainwright. Sharp thresholds for high-dimensional and noisy recovery of sparsity. Proc. Allerton Conference on Communication, Control, and Computing, September 2006.
[30] J. Wright and Y. Ma. Dense error correction via ℓ1-minimization. Available online at http://www.dsp.ece.rice.edu/cs/.
[31] W. Xu and B. Hassibi. Compressed sensing over the Grassmann manifold: A unified analytical framework. 2008. Available online at http://www.dsp.ece.rice.edu/cs/.
[32] Y. Zhang. When is missing data recoverable? Available online at http://www.dsp.ece.rice.edu/cs/.
