
arXiv:cs/0111050v1 [cs.DS] 19 Nov 2001

    Smoothed Analysis of Algorithms:

    Why the Simplex Algorithm Usually Takes Polynomial Time ∗

    Daniel A. Spielman †

Department of Mathematics, Massachusetts Institute of Technology

    Shang-Hua Teng ‡

Akamai Technologies Inc. and Department of Computer Science, University of Illinois at Urbana-Champaign

    September 13, 2018

    Abstract

We introduce the smoothed analysis of algorithms, which is a hybrid of the worst-case and average-case analysis of algorithms. Essentially, we study the performance of algorithms under small random perturbations of their inputs. We show that the simplex algorithm has polynomial smoothed complexity.

∗ An extended abstract of this paper appeared in the Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, pp. 296-305, 2001.

† Partially supported by an Alfred P. Sloan Foundation Fellowship, and NSF CAREER award CCR-9701304. Much of this paper was written during a Junior Faculty Research Leave sponsored by the M.I.T. School of Science.

‡ Partially supported by an Alfred P. Sloan Foundation Fellowship, and NSF grant CCR-9972532. Part of this work was done while visiting the department of mathematics at M.I.T.


Contents

1 Introduction
  1.1 Linear Programming and the Simplex Method
  1.2 Smoothed Analysis of Algorithms
  1.3 Notation
  1.4 The Shadow Vertex Pivot Rule
  1.5 Our Results
  1.6 Outline of the paper
2 Gaussian Perturbations
  2.1 Well-scaled polytopes
3 The Main Argument
  3.1 Integral Formulation
  3.2 Focusing on one simplex
  3.3 Division into Distance and Angle
  3.4 Bounding the distance
  3.5 Bounding the Angle
  3.6 Combining the bounds
4 Bounding the Distance to a Random Simplex
  4.1 Changes of variables
  4.2 Analysis in the Prism
  4.3 Random point in an open prism
  4.4 Close prisms have close measure
5 Bounding the Angle
6 The exponential bound
7 Two Phase Simplex Algorithm
  7.1 The Phase I Program
  7.2 A Thought Experiment
  7.3 Projection from Simplex(b1, . . . , bd) to Simplex(a1, . . . , ad)
  7.4 Analysis of the thought experiment
  7.5 Many good choices
  7.6 Analysis of the two-phase simplex algorithm
8 Discussion and Open Questions
  8.1 Practicality of the analysis
  8.2 Further analysis of the simplex algorithm
  8.3 Degeneracy
  8.4 Smoothed Analysis
9 Acknowledgements
A Inequalities
B The Condition Number κ

    1 Introduction

The Analysis of Algorithms community has been challenged by the existence of remarkable algorithms that are known by scientists and engineers to work well in practice, but whose theoretical analyses are negative or inconclusive. The root of this problem is that algorithms are usually analyzed in one of two ways: by worst-case or average-case analysis. Worst-case analysis can improperly suggest that an algorithm will perform poorly by examining its performance under incredibly contrived circumstances that may never occur in practice. On the other hand, while many algorithms perform unusually well on random inputs considered in average-case analysis, randomly generated inputs often bear little resemblance to those actually encountered in practice.

We propose an analysis that we call smoothed analysis that can help explain the success of many algorithms that neither worst-case nor average-case analysis can. In smoothed analysis, we measure the performance of an algorithm under slight random perturbations of arbitrary inputs. In particular, we consider absolute and relative Gaussian perturbations of inputs to algorithms that take real and complex inputs, and we measure the running time of algorithms in terms of the input size and the variance of the Gaussian perturbations.

We show that the simplex method has polynomial smoothed complexity. The simplex method is the classic example of an algorithm that is known to perform well in practice but which takes exponential time in the worst case [KM72, Mur80, GS79, Gol83, AC78, Jer73, AZ99]. In the late 1970's and early 1980's the simplex method was shown to converge in expected polynomial time on various distributions of random inputs by researchers including Borgwardt, Smale, Haimovich, Adler, Karp, Shamir, Megiddo, and Todd [Bor80, Bor77, Sma83, Hai83, AKS87, AM85, Tod86]. However, the last 20 years of research in probability, combinatorics and numerical analysis have taught us that these random instances have very special properties that one might not expect to find in practice.

For every matrix A with entries of absolute value at most 1, every vector z, and every vector y whose entries are 1 or −1, we show that the simplex method using the shadow-vertex pivot rule almost always takes time polynomial in 1/σ and the sizes of A and z to solve

    maximize    z^T x
    subject to  (A + σG)x ≤ y,    (1)

where σG is a matrix of independently chosen Gaussian random variables of mean 0 and standard deviation σ.

We note that if A is well-scaled, then the solution to this program is an approximation to the solution of the original.
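As a concrete illustration of the perturbed program (1), the following sketch (ours, not the paper's; the instance sizes, the choice y = 1, and the use of scipy.optimize.linprog are assumptions of this illustration) draws one Gaussian perturbation σG and solves the resulting linear program.

```python
import numpy as np
from scipy.optimize import linprog

def solve_perturbed_lp(A, y, z, sigma, rng):
    """Solve max z^T x subject to (A + sigma*G) x <= y for one draw of G."""
    G = rng.standard_normal(A.shape)            # i.i.d. mean-0, variance-1 entries
    res = linprog(-z, A_ub=A + sigma * G, b_ub=y,
                  bounds=[(None, None)] * len(z), method="highs")
    return res                                  # res.fun is -(optimal value) when res.status == 0

rng = np.random.default_rng(0)
m, d = 8, 3
A = rng.uniform(-1, 1, (m, d))                  # entries of absolute value at most 1
y = np.ones(m)                                  # every y_i = 1, so the origin is feasible
z = rng.standard_normal(d)
res = solve_perturbed_lp(A, y, z, sigma=0.1, rng=rng)
print(res.status, None if res.status != 0 else -res.fun)
```

Since linprog minimizes, the objective is negated; re-solving over many independent draws of G and averaging a cost measure is exactly the experiment smoothed analysis reasons about.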

1.1 Linear Programming and the Simplex Method

It is difficult to overstate the importance of linear programming to optimization. Linear programming problems arise in innumerable industrial contexts. Moreover, linear programming is often used as a fundamental step in other optimization algorithms. In a linear programming problem, one is asked to maximize or minimize a linear function over a polyhedral region. The general form of a linear program is

    maximize    z^T x
    subject to  Ax ≤ y,    (2)

where A is an m-by-d matrix, z is a d-vector, and y is an m-vector.

Perhaps one reason that we see so many linear programs is that we can solve them efficiently. In 1947, Dantzig [Dan51] introduced the simplex method, which was the first practical approach to solving linear programs and which remains widely used today. To state it roughly, the simplex method proceeds by walking from one vertex to another of the polyhedron defined by the inequalities in (2). At each step, it walks to a vertex that is better with respect to the objective function. The algorithm will either determine that the constraints are unsatisfiable, determine that the objective function is unbounded, or reach a vertex from which it cannot make progress, which necessarily optimizes the objective function.

Because of its great importance, other algorithms for linear programming have been invented. In 1979, Khachiyan [Kha79] applied the ellipsoid algorithm to linear programming and proved that it always converged in polynomial time. However, the ellipsoid algorithm has not been competitive with the simplex method in practice. In contrast, the interior-point method introduced in 1984 by Karmarkar [Kar84] always runs in polynomial time and has performed very well: variations of the interior point method are competitive with and occasionally superior to the simplex method in practice.

In spite of half a century of attempts to unseat it, the simplex method remains the most popular method for solving linear programs. However, there has been no satisfactory theoretical explanation of its excellent performance. A fascinating approach to understanding the performance of the simplex method has been the attempt to prove that there always exists a short walk from each vertex to the optimal vertex. The Hirsch conjecture states that there should always be a walk of length at most m − d. Significant progress on this conjecture was made by Kalai and Kleitman [KK92], who proved that there always exists a walk of length at most m^{log₂ d + 2}. However, the existence of such a short walk does not imply that the simplex method will find it.

A simplex method is not completely defined until one specifies its pivot rule: the method by which it decides which vertex to walk to when it has many to choose from. There is no deterministic pivot rule under which the simplex method is known to take a sub-exponential number of steps. In fact, for almost every deterministic pivot rule there is a family of polytopes on which it is known to take an exponential number of steps [KM72, Mur80, GS79, Gol83, AC78, Jer73]. (See [AZ99] for a survey and a unified construction of these polytopes.) The best present analysis of randomized pivot rules shows that they take expected time m^{O(√d)} [Kal92, MSW96], which is quite far from the polynomial complexity observed in practice. This inconsistency between the exponential worst-case behavior of the simplex method and its everyday practicality leaves us wanting a reasonable average-case analysis.

Various average-case analyses of the simplex method have been performed. Most relevant to this paper is the analysis of Borgwardt [Bor77, Bor80], who proved that the simplex method with the shadow vertex pivot rule runs in expected polynomial time for polytopes whose constraints are drawn independently from spherically symmetric distributions (e.g. Gaussian distributions centered at the origin). Independently, Smale [Sma83, Sma82] proved bounds on the expected running time of Lemke's self-dual parametric simplex algorithm on linear programming problems chosen from a spherically-symmetric distribution. Smale's analysis was substantially improved by Megiddo [Meg86].

Another model of random linear programs was studied in a line of research initiated independently by Haimovich [Hai83] and Adler [Adl83]. Their works considered the maximum over matrices of the expected time taken by parametric simplex methods to solve linear programs over these matrices in which the directions of the inequalities are chosen at random. While this framework considers the maximum of an average, it does not fit into the smoothed complexity framework because the random choice of directions of inequalities cannot be viewed as a perturbation, as different choices yield radically different linear programs. They roughly proved that parametric simplex methods would take expected linear time to solve phase II linear programming problems, even when conditioned on the problems being feasible. Complete algorithms for linear programming under this model were analyzed by Todd [Tod86] and by Adler and Megiddo [AM85], who proved quadratic bounds on the expected time taken by the lexicographic Lemke algorithm to solve linear complementarity problems derived from linear programming problems. Further average-case results were obtained in [AKS87].

While these average-case analyses are significant accomplishments, they do not explain the performance of the simplex method in practice. Problem instances encountered in practice may have little resemblance to those generated at random. Moreover, it is now well-understood that random combinatorial objects have many special properties. Edelman [Ede92] writes on this point:

    What is a mistake is to psychologically link a random matrix with the intuitive notion of a "typical" matrix or the vague concept of "any old matrix."

    1.2 Smoothed Analysis of Algorithms

We introduce the smoothed analysis of algorithms in the hope that it will succeed in explaining the good practical performance of many algorithms for which worst-case and average-case analyses have failed. Our first application of the smoothed analysis of algorithms will be to the simplex method. We will consider the running time of the simplex method on inputs of the form

    maximize    z^T x
    subject to  (A + σG)x ≤ y,    (3)

where we let A be arbitrary and G be a matrix of independently chosen Gaussian random variables of mean 0 and variance 1. If we let σ go to 0, then we obtain the worst-case complexity of the simplex method; whereas, if we let σ be so large that σG swamps out A, we obtain the average-case analyzed by Borgwardt. By choosing polynomially small σ, this analysis combines advantages of worst-case and average-case analysis.

In a smoothed analysis of an algorithm, we assume that the inputs to the algorithm are subject to slight random perturbations, and we measure the complexity of the algorithm in terms of the input size and the variance of the perturbations. If an algorithm has low smoothed complexity, then one should expect it to work well in practice, since most real-world problems are generated from data that is inherently noisy. Another way of thinking about smoothed complexity is to observe that if an algorithm has low smoothed complexity, then one must be unlucky to choose an input instance on which it performs poorly.

We now provide some definitions for the smoothed analysis of algorithms that take real or complex inputs. For an algorithm A and input x, let

    C(A, x)

be a complexity measure of A on input x. Let X be the domain of inputs to A, and let X_n be the set of inputs of size n. The size of an input can be measured in various ways. Standard measures are the number of real variables contained in the input and the sums of the bit-lengths of the variables. Using this notation, one can say that A has worst-case C-complexity f(n) if

    max_{x ∈ X_n} C(A, x) = f(n).

Given a family of distributions μ_n on X_n, we say that A has average-case C-complexity f(n) under μ if

    E_{x ←_{μ_n} X_n} [C(A, x)] = f(n).

Similarly, we say that A has smoothed C-complexity f(n, σ) if

    max_{x ∈ X_n} E_g [ C(A, x + σ max(x) g) ] = f(n, σ),

where g is a vector of Gaussian random variables of mean 0 and variance 1 and max(x) is the maximum entry in x.
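The definition suggests a direct, if expensive, Monte Carlo experiment. The sketch below is a minimal illustration under assumptions we introduce: the callable algorithm_cost stands in for C(A, ·) and the finite list inputs stands in for X_n; neither is an object from the paper. It nests an empirical expectation over g inside a maximum over inputs, mirroring the formula above.

```python
import numpy as np

def smoothed_complexity_estimate(algorithm_cost, inputs, sigma, trials, rng):
    """Monte Carlo estimate of max_x E_g[ C(A, x + sigma * max(x) * g) ].

    `algorithm_cost` (standing for C(A, .), e.g. a pivot counter) and
    `inputs` (a finite stand-in for X_n) are hypothetical placeholders."""
    worst = 0.0
    for x in inputs:
        scale = sigma * np.max(x)                  # max(x): maximum entry of x
        costs = [algorithm_cost(x + scale * rng.standard_normal(x.shape))
                 for _ in range(trials)]
        worst = max(worst, float(np.mean(costs)))  # expectation inside, max outside
    return worst
```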

    In Section 8, we present some generalizations of the definition of smoothed complexitythat might prove useful.

    1.3 Notation

In this section, we review the notational conventions we use in this paper. The reader should probably skip this section and just refer to it as necessary.

• For an event A, we let [A] denote the function that is 1 if A is true and 0 otherwise.

• For a measurable set B and a measure ρ, we write x ←_ρ B to indicate that x is a random variable taking values in B according to the distribution induced by ρ, which is ρ(x)/ρ(B).

• We use bold letters like x, a and ω to denote vectors.

• We use 〈x|y〉 to denote the inner product of vectors x and y.

• S^{d−1} denotes the unit sphere in IR^d.

• We will often represent a plane in IR^d by spherical coordinates (ω, r), where ω ∈ S^{d−1} and r ∈ IR. The points on this plane are those x such that 〈x|ω〉 = r.

For the following definitions, we let b1, . . . , bm denote a set of vectors in IR^d.

• Span(b1, . . . , bk) denotes the subspace spanned by b1, . . . , bk.

• Aff(b1, . . . , bk) denotes the hyperplane that is the affine span of b1, . . . , bk.

• ConvHull(b1, . . . , bm) denotes the convex hull of b1, . . . , bm.

• Assuming b1, . . . , bk are affinely independent, Simplex(b1, . . . , bk) denotes the simplex ConvHull(b1, . . . , bk).

    1.4 The Shadow Vertex Pivot Rule

Gass and Saaty [GS55] introduced the shadow vertex algorithm as a parametric approach to solving linear programs. Starting at a vertex x, the algorithm chooses some objective function, t, that is optimized by x. The algorithm then slowly transforms t into the objective function of the linear program, z, finding the vertex that optimizes each intermediate objective function. By remembering t, this provides a pivot rule, known as the Gass-Saaty rule. The vertices encountered during the resulting walk are contained in the set of vectors that optimize functions in the span of t and z. The algorithm gets its name from the fact that if one projects the polytope to the plane spanned by t and z, then these vertices are pre-images of the corners of the resulting polygon. Borgwardt's analysis used the shadow vertex algorithm, and much of the material in this section can be found in his book [Bor80]. It is included here for the reader's convenience.

The shadow-vertex pivot rule seems most natural in the polar formulation of linear programming. To understand this formulation, first recall what it means for a point x to be a solution to

    maximize    z^T x
    subject to  〈ai|x〉 ≤ yi,  for 1 ≤ i ≤ m.    (4)

Claim 1.4.1 A point x is a solution to (4) if and only if

1. x is feasible, that is, 〈ai|x〉 ≤ yi, for 1 ≤ i ≤ m,

2. x is extremal: there exists a set B of inequalities such that 〈ai|x〉 = yi for i ∈ B, and

3. x is optimal: z can be expressed as

    z = ∑_{i ∈ B} αi ai, with αi ≥ 0.

Condition (3) says that z lies in the cone from the origin through the simplex spanned by {ai : i ∈ B}. In general, x is usually a vertex of the polytope defined by (4) and B is a set of d linearly independent inequalities; in which case the simplex spanned by {ai : i ∈ B} has co-dimension 1. This condition inspires the polar formulation.

Whenever a polytope contains the origin, its polar is defined. The polytope corresponding to (4) contains the origin if and only if each yi > 0, in which case we say that the origin is feasible. In this case, we can assume without loss of generality that each yi = 1, and we can define the polar by

Definition 1.4.2 Let P be the polytope defined by

    〈ai|x〉 ≤ 1, for 1 ≤ i ≤ m.

The polar of P, denoted P̄, is defined to be the convex hull of a1, . . . , am.

Thus, facets of the primal correspond to vertices of the polar, and vertices of the primal correspond to facets of the polar. If the primal is a simple polytope (one in which each vertex is the intersection of exactly d planes), then the polar is a simplicial polytope (one in which each face is a simplex). A description of the simplex that is polar to the solution to (4) is given by the following definition:

Definition 1.4.3 For vectors a1, . . . , am, and a unit vector z, we let

    optSimp_{z;y}(a1, . . . , am)

denote the simplex Simplex({ai : i ∈ B}), where B is as in Claim 1.4.1. If the program (4) is unbounded or infeasible, then we let optSimp_{z;y}(a1, . . . , am) be the empty set.

If the origin is feasible, then optSimp_{z;y}(a1, . . . , am) is a facet of the convex hull of a1, . . . , am, and every facet of the convex hull optimizes some objective function. If some yi < 0, then we do not obtain as nice a characterization of optSimp_{z;y}(a1, . . . , am).

In the event that each yi = 1, the linear program has the polar formulation:

    maximize    α
    subject to  αz ∈ ConvHull(a1, . . . , am).    (5)

Lemma 1.4.4 Assuming each yi = 1, the solution to (5) is a solution to (4). Moreover, the facet of ConvHull(a1, . . . , am) in which αz lies is polar to the optimal vertex of (4).

Proof Since the origin is feasible, the solution to (5) will have α > 0. Moreover, there will exist a set of d linearly independent points B such that

    αz ∈ Simplex({ai : i ∈ B}).

The points {ai : i ∈ B} lie in the plane determined by x, where x is the solution to

    〈ai|x〉 = 1, for i ∈ B.

As Simplex({ai : i ∈ B}) is a facet of ConvHull(a1, . . . , am), the plane x is a supporting plane, which means

    〈ai|x〉 ≤ 1, for 1 ≤ i ≤ m.

Thus, x satisfies conditions (1) and (2) of Claim 1.4.1. Finally, we note that αz ∈ Simplex({ai : i ∈ B}) implies that αz is a non-negative linear combination of {ai : i ∈ B}. Since α > 0, z is a positive linear combination of these points.
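To make the polar formulation concrete, the following sketch (ours, not the paper's; the instance and the use of scipy are assumptions of this illustration) solves program (5) directly as a linear program over convex-combination weights: it maximizes α subject to αz being a convex combination of a1, . . . , am.

```python
import numpy as np
from scipy.optimize import linprog

def polar_opt(a, z):
    """Solve (5): maximize alpha s.t. alpha*z is a convex combination of
    the rows of a.  Variables: (alpha, lambda_1, ..., lambda_m)."""
    m, d = a.shape
    # equalities: a^T lambda - alpha z = 0  and  sum(lambda) = 1
    A_eq = np.vstack([np.hstack([-z[:, None], a.T]),
                      np.hstack([[0.0], np.ones(m)])])
    b_eq = np.concatenate([np.zeros(d), [1.0]])
    c = np.zeros(m + 1); c[0] = -1.0            # maximize alpha
    bounds = [(None, None)] + [(0, None)] * m   # lambda >= 0
    return linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")

a = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])   # origin in hull interior
print(polar_opt(a, z=np.array([1.0, 1.0]) / np.sqrt(2)).x[0])   # about 0.707
```

The indices of the nonzero weights lambda_i in the solution identify the facet containing αz, i.e. the set B of Claim 1.4.1.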

For linear programs in which the origin is feasible, the shadow-vertex simplex algorithm can be seen to walk along the exterior of the convex hull of a1, . . . , am. We assume that it begins knowing a facet of the convex hull (how one finds a starting facet or handles a program with infeasible origin is discussed in Section 7). It then chooses some point inside this facet, and forms the unit vector in the direction of this point, which we will call t. The algorithm then examines the directions that are encountered as it transforms t into z, and tracks the facets encountered along the way. To make this formal, let

    t_β = (βt + (1 − β)z) / ‖βt + (1 − β)z‖;

so t_1 = t and t_0 = z. As β goes from 1 to 0, the shadow-vertex simplex algorithm encounters the facets

    optSimp_{t_β;y}(a1, . . . , am).

Of course, the algorithm does not actually modify β continuously, but rather computes each facet along the path from the facet previously encountered.
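As an illustration of the shadow itself, the sketch below (again ours, under assumed data, not the paper's algorithm) projects perturbed points onto Span(t, z) and counts the vertices of the resulting polygon; when the origin is feasible, these are exactly the pre-images of the facets the shadow-vertex walk can visit.

```python
import numpy as np
from scipy.spatial import ConvexHull

def shadow_size(points, t, z):
    """Count vertices of the projection (shadow) of ConvHull(points) onto
    Span(t, z).  Each 2-D hull vertex is the image of a point optimizing
    some direction in the span."""
    u1 = z / np.linalg.norm(z)
    q = t - np.dot(t, u1) * u1                  # orthogonalize t against z
    u2 = q / np.linalg.norm(q)
    shadow = points @ np.column_stack([u1, u2]) # project to the plane
    return len(ConvexHull(shadow).vertices)

rng = np.random.default_rng(1)
m, d, sigma = 50, 4, 0.1
a = rng.uniform(-1, 1, (m, d)) / np.sqrt(d)     # centers of norm at most 1
b = a + sigma * rng.standard_normal((m, d))     # Gaussian perturbation
t, z = rng.standard_normal(d), rng.standard_normal(d)
print(shadow_size(b, t, z))
```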

For general linear programs, the simplices encountered by the shadow vertex algorithm do not necessarily lie on the convex hull of the ai's. So, we introduce the notation

    optSimps_{z,t;y}(a1, . . . , am)  def=  ⋃_{v ∈ Span(z,t)} { optSimp_{v;y}(a1, . . . , am) },

which captures the set of simplices the shadow vertex algorithm could encounter during its walk. If each yi = 1, then the simplices encountered by the algorithm are exactly those that have non-empty intersection with the plane Span(t, z).

1.5 Our Results

We consider linear programming problems of the form

    maximize    z^T x
    subject to  〈ai|x〉 ≤ yi,  for 1 ≤ i ≤ m,    (6)

where each yi ∈ {1, −1} and ‖ai‖ ≤ 1. We remark that these restrictions do not change the family of linear programs expressible, and that it is simple to transform any linear program into one of this form: first map x ↦ x + v for some vector v so that none of the yi are zero, then divide each ai by |yi|, and finally map x ↦ x · max_i ‖ai‖.
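The three normalization steps are mechanical; the sketch below (ours; the choice of the shift vector v is an assumption, and it must merely avoid making any yi zero) carries them out for an explicit instance.

```python
import numpy as np

def normalize_lp(A, y, v):
    """Rewrite A x <= y in the form <a_i|x> <= y_i with y_i in {+1, -1}
    and ||a_i|| <= 1, following the three steps in the text.  The shift
    vector v is a hypothetical choice with no coordinate of y - A v zero."""
    v = np.asarray(v, dtype=float)
    y1 = y - A @ v                        # step 1: x -> x + v, so no y_i is 0
    assert np.all(y1 != 0)
    A2 = A / np.abs(y1)[:, None]          # step 2: divide row i by |y_i|
    y2 = np.sign(y1)                      # now each y_i is +1 or -1
    s = np.max(np.linalg.norm(A2, axis=1))
    A3 = A2 / s                           # step 3: x -> x * s, so ||a_i|| <= 1
    # a point x is feasible for (A3, y2) iff x/s + v is feasible for (A, y)
    return A3, y2, v, s

A = np.array([[2.0, 0.0], [0.0, 3.0]])
y = np.array([4.0, -6.0])
print(normalize_lp(A, y, v=[0.5, -2.5]))
```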

We study the perturbations of these programs given by

    maximize    z^T x
    subject to  〈bi|x〉 ≤ yi,  for 1 ≤ i ≤ m,    (7)

where bi = ai + gi and gi is a Gaussian random vector of variance σ² centered at the origin. Thus, bi is a Gaussian random vector of variance σ² centered at ai. Throughout the paper, we will assume 0 ≤ σ ≤ 1.

Our first result, Theorem 3.0.1, says that, for every program of form (6) and for every vector t, the expected size of the projection of the polytope defined by the equations (7) onto the plane spanned by t and z is polynomial in m, the dimension, and 1/σ. To state this in terms of smoothed complexity, let C(S) count the number of items in the set S. Let X_{d,m} be the set of inputs in d dimensions with m planes:

    X_{d,m} = { (t, z, y; a1, . . . , am) : t ∈ IR^d, z ∈ IR^d, y ∈ {±1}^m, ai ∈ IR^d, and ‖ai‖ ≤ 1, for 1 ≤ i ≤ m }.

Then, our result says that the smoothed C-complexity of optSimps_{t,z;y}(a1, . . . , am) is polynomial in m, d, and 1/σ, where we only apply perturbations to the ai's. This roughly corresponds to analyzing Phase II of a simplex algorithm. In Section 7, we describe a simple two-phase shadow-vertex simplex algorithm and use Theorem 3.0.1 as a black box to show that it takes expected time polynomial in m, d, and 1/σ.

As a sanity check on whether it is reasonable to consider slight perturbations of linear programs, we ask whether slight perturbations can have a large impact on the solutions of the programs. We demonstrate that for a large family of linear programs, slight perturbations have little impact on their solutions. In Section 2.1, we show that if the polytope corresponding to the equations (6) contains a ball of radius r and is contained in a concentric ball of radius R, then for σ < r/dR the solution to (7) is an approximation to the solution of (6), with high probability.

1.6 Outline of the paper

In Section 2 we review some basic facts about Gaussian perturbations and demonstrate that small perturbations have little effect on well-scaled polytopes. In Section 3, we present and prove the main theorem of the paper. Technical results required for the proof appear in Sections 4, 5, and 6. Sections 4 and 6 should be relatively self-contained. The application of our main theorem to the analysis of a two-phase simplex algorithm is given in Section 7. Some elementary inequalities used in the proofs are proved in Section A. A discussion of these results and suggestions for further work on smoothed analysis appears in Section 8. Finally, an index of notation appears at the end of the paper.

2 Gaussian Perturbations

In this section, we formally describe how we perturb linear programming problems and explore how these perturbations change polytopes.

Recall that a d-dimensional Gaussian distribution with variance σ² centered at a point a is defined by the probability density function

    μ(b) = (1/√(2πσ²))^d · e^{−‖a−b‖²/2σ²}.

Throughout the paper, we will assume 0 ≤ σ ≤ 1. For a polytope P_a given by the equations

    〈ai|x〉 ≤ yi,

where yi ∈ {1, −1} and ‖ai‖ ≤ 1, we will examine polytopes P_b defined by

    〈bi|x〉 ≤ yi,

where bi = ai + gi and gi is a Gaussian random vector of variance σ² centered at the origin. Thus, bi is a Gaussian random vector of variance σ² centered at ai. We remark that P_b is a simple polytope with probability 1. That is, each vertex of P_b is the intersection of exactly d hyperplanes.

We will make frequent use of the following fundamental fact about Gaussian distributions.

Lemma 2.0.1 Let g be a d-dimensional Gaussian random vector of variance σ² centered at the origin. Then,

    Pr[‖g‖ > kσ] ≤ e^{−(k²−d)/2} (k²/d)^{d/2−1}.

Proof To bound the probability that a gi has large norm, we observe that ‖gi‖²/σ² has the well-studied chi-square distribution with d degrees of freedom. A crude application of the bounds in [GTM69] is

    Pr[‖gi‖² > vσ²] ≤ e^{−(v−d)/2} (v/d)^{d/2−1}.

Therefore,

    Pr[‖gi‖ > kσ] = Pr[‖gi‖² > k²σ²] ≤ e^{−(k²−d)/2} (k²/d)^{d/2−1}.
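A quick numerical check of Lemma 2.0.1 (ours; the values of d, σ, and the sample size are arbitrary) compares the empirical tail of ‖g‖ with the stated bound:

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma, n = 6, 0.3, 200_000
norms = np.linalg.norm(sigma * rng.standard_normal((n, d)), axis=1)
for k in (3.0, 4.0, 5.0):
    empirical = np.mean(norms > k * sigma)      # Pr[||g|| > k*sigma]
    bound = np.exp(-(k**2 - d) / 2) * (k**2 / d) ** (d / 2 - 1)
    print(f"k={k}: empirical {empirical:.2e}, bound {bound:.2e}")
```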

Corollary 2.0.2 Let b1, . . . , bm be d-dimensional Gaussian random vectors of variance σ² centered at a1, . . . , am respectively. Assuming d ≥ 3 and k ≥ 3d,

    Pr[∃i : dist(ai, bi) ≥ kσ] ≤ m e^{−k²/4}.

In particular, for k = 8d√(log(m/σ)), this probability is at most m e^{−16d² log(m/σ)}.

We will make use of this corollary throughout the paper, and we will fix the value of k throughout the paper to

    k = 8d√(log(m/σ)).

We will frequently assume that dist(ai, bi) ≤ kσ. As the probability that this assumption is false is at most m e^{−16d² log(m/σ)}, and P_b has at most (m choose d) vertices, the estimate of the expected number of steps taken by the simplex algorithm that we make under this assumption can be off by at most an additive

    m e^{−16d² log(m/σ)} · (m choose d) < 1.
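To get a feel for how small this additive term is, a one-line computation (ours; the values of m, d, and σ are illustrative) evaluates it:

```python
import math

m, d, sigma = 50, 4, 0.1
err = m * math.exp(-16 * d**2 * math.log(m / sigma)) * math.comb(m, d)
print(err < 1, err)   # astronomically far below 1
```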

2.1 Well-scaled polytopes

In this section, we show that if the polytope defined by (6) is well-scaled and contains the origin, then it is close to the polytope defined by (7). Note that the origin is inside the polytope if and only if yi = 1 for all i. The results in this section are not necessary for the results in the rest of the paper. Rather, they should be viewed as a sanity check: they demonstrate that perturbations are reasonable for a rich family of linear programs.

For every polytope in d dimensions, there is a linear transformation under which the polytope contains a ball of radius 1 and is contained in a ball of radius d with the same center (cf. [GLS91, 3.1.9]). In this section, we assume that the polytope P_a is contained between two concentric balls, with the origin as the center, of radii 1 and R, respectively. In particular, we assume that, for all i,

    1/R ≤ ‖ai‖ ≤ 1.

Let P_a be the polytope defined by a1, . . . , am and P_b be the polytope defined by b1, . . . , bm, i.e., the polytope {〈bi|x〉 ≤ 1, 1 ≤ i ≤ m}. P_b is simple with probability 1; that is, every vertex of P_b is the intersection of exactly d hyperplanes.

We will now show that P_a and P_b do not differ too much in space.

Lemma 2.1.1 Let a1, . . . , am be planes that determine a polytope P_a that contains the ball of radius 1 around the origin and is contained in the ball of radius R around the origin. Let g1, . . . , gm be Gaussian random vectors of variance σ² centered at the origin. Let bi = ai + gi and let P_b be the polytope given by planes b1, . . . , bm. Then for σ < ǫ/Rk, where we recall k = 8d√(log(m/σ)), the following hold with probability at least 1 − m e^{−16d² log(m/σ)}:

(a) P_b contains the ball centered at the origin of radius 1 − ǫ/R.

(b) P_b is contained in a ball centered at the origin of radius R/(1 − ǫ).

(c) For every point in P_b there is a point in P_a at distance at most 2ǫR.

(d) For every direction, given by a unit vector z,

    (1 − ǫ) opt_z(a1, . . . , am) ≤ opt_z(b1, . . . , bm) ≤ opt_z(a1, . . . , am)/(1 − ǫ),

where by opt_z(a1, . . . , am) we mean the solution to (6) assuming yi = 1 for each i.

(e) For every vertex v of P_b and every linear function z optimized by v, the angle of v to z is at most arccos((1 − ǫ)/(R + ǫ)).

Proof These statements all follow if all the gi's have norm less than ǫ/R, which Lemma 2.0.1 implies happens with probability at least 1 − m e^{−16d² log(m/σ)}.

We now prove Part (a). First note that the distance from the origin to the plane defined by 〈a|x〉 = 1 is 1/‖a‖. With probability at least 1 − m e^{−16d² log(m/σ)}, we have

    ‖bi‖ ≤ ‖ai‖ + ‖gi‖ ≤ 1 + ǫ/R;

so, the distance from the plane polar to bi to the origin is at least

    1/(1 + ǫ/R) ≥ 1 − ǫ/R.

We now turn to the proof of parts (b) and (c), assuming that each gi has norm less than ǫ/R. Consider any direction, given by a unit vector t. We will show that the intersection of the ray in direction t with P_a will not be far from its intersection with P_b. Let ai be the first plane that the ray intersects, and let

    〈αt|ai〉 = 1;

so, the intersection is at distance α from the origin. We now show that the intersection of the ray with P_b cannot be further than α/(1 − ǫ) from the origin. We compute

    〈αt|bi〉 = 〈αt|ai〉 + 〈αt|gi〉 ≥ 1 − α‖gi‖ ≥ 1 − αǫ/R ≥ 1 − ǫ.

So, the intersection of ray t with P_b has distance at most α/(1 − ǫ) from the origin. This proves (b).

Assuming (b), we can now apply the same reasoning with the roles of P_a and P_b reversed. We thereby prove part (c).

We now prove Part (d). Let u(b) be the vertex that achieves opt_z(b1, . . . , bm). Let u(a) be the point of intersection of the ray from the origin through u(b) with P_a. From the proof of Part (c), we know that

    ‖u(b)‖ ≤ ‖u(a)‖/(1 − ǫ).

Therefore,

    opt_z(b1, . . . , bm) ≤ opt_z(a1, . . . , am)/(1 − ǫ),

because opt_z(b1, . . . , bm) is equal to 〈z|u(b)〉. Similarly, let v(a) be the vertex that achieves opt_z(a1, . . . , am), and let v(b) be the point of intersection of the ray from the origin through v(a) with P_b. We have

    ‖v(a)‖ ≤ ‖v(b)‖/(1 − ǫ).

Therefore,

    opt_z(a1, . . . , am) ≤ opt_z(b1, . . . , bm)/(1 − ǫ).

To prove part (e), let b1, . . . , bd be the planes that meet at vertex v. We think of these bi's as being points on the plane defined by v. Let p be the point of intersection of this plane with the ray from the origin through z; so, 〈v|p〉 = 1.

By Claim 1.4.1, we know that p is contained in Simplex(b1, . . . , bd). Thus,

    ‖p‖ ≤ max_i ‖bi‖ ≤ 1 + ǫ/R.

From part (b), we know that ‖v‖ ≤ R/(1 − ǫ). Letting θ be the angle between z and v, we compute

    cos(θ) = 〈p|v〉 / (‖p‖ ‖v‖) ≥ 1 / ((R/(1 − ǫ))(1 + ǫ/R)) = (1 − ǫ)/(R + ǫ).

3 The Main Argument

Our main result is that for any two vectors t and z, the expected size of

    optSimps_{z,t;y}(b1, . . . , bm)

is polynomial in m, d, and 1/σ. To bound the number of simplices in this set, let q be a unit vector in the span of t and z that is perpendicular to z. Then, let

    z_θ = z cos(θ) + q sin(θ).

We want to bound the size of

    E_{b1,...,bm} [ | ⋃_θ { optSimp_{z_θ;y}(b1, . . . , bm) } | ].    (8)

Our approach is to consider N points located uniformly around the circle, θ_0, . . . , θ_{N−1}, and to compute

    E_{b1,...,bm} [ | ⋃_{i ∈ [0,...,N−1]} { optSimp_{z_{θ_i};y}(b1, . . . , bm) } | ]    (9)

for N sufficiently large that it is exceedingly unlikely that any simplices have been missed. We will bound (9) by counting those i for which z_{θ_i} and z_{θ_{i+1}} have different optimal simplices. That is, we use

    | ⋃_{i ∈ [0,...,N−1]} { optSimp_{z_{θ_i};y}(b1, . . . , bm) } |
      = | { i : optSimp_{z_{θ_i};y}(b1, . . . , bm) ≠ optSimp_{z_{θ_{(i+1) mod N}};y}(b1, . . . , bm) } |.

Theorem 3.1.2 implies that for every i,

    Pr_{b1,...,bm} [ optSimp_{z_{θ_i};y}(b1, . . . , bm) ≠ optSimp_{z_{θ_{(i+1) mod N}};y}(b1, . . . , bm) ] ≤ f(d,m,σ)/N + m e^{−16d² log(m/σ)},

where f(d,m,σ) is polynomial in d, m, and 1/σ. In particular, it shows that for every i, if optSimp_{z_{θ_i};y}(b1, . . . , bm) is non-empty, then the probability that it contains a cone of angle greater than 2π/N around z_{θ_i} is at least

    1 − f(d,m,σ)/N − m e^{−16d² log(m/σ)}.

So, the probability that a pair of consecutive rays z_{θ_i} and z_{θ_{i+1}} have different optimizing simplices is at most

    f(d,m,σ)/N + m e^{−16d² log(m/σ)}.

Thus,

    E_{b1,...,bm} [ | ⋃_θ { optSimp_{z_θ;y}(b1, . . . , bm) } | ] ≤ f(d,m,σ) + N m e^{−16d² log(m/σ)}.
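The counting argument above discretizes the circle of objectives and watches for changes of the optimal simplex between consecutive directions. The sketch below (ours, not from the paper; the instance, the tolerance, and the choice yi = 1 are illustrative assumptions) counts the distinct sets of tight constraints encountered as the objective rotates through N directions in Span(t, z), the quantity in (9).

```python
import numpy as np
from scipy.optimize import linprog

def distinct_optimal_facets(b, t, z, N, tol=1e-8):
    """Count distinct tight-constraint sets (optimal simplices) as the
    objective rotates through N directions in Span(t, z); cf. (9)."""
    u1 = z / np.linalg.norm(z)
    q = t - np.dot(t, u1) * u1
    u2 = q / np.linalg.norm(q)
    facets = set()
    for theta in np.linspace(0.0, 2 * np.pi, N, endpoint=False):
        z_theta = np.cos(theta) * u1 + np.sin(theta) * u2
        res = linprog(-z_theta, A_ub=b, b_ub=np.ones(len(b)),
                      bounds=[(None, None)] * b.shape[1], method="highs")
        if res.status == 0:                     # skip unbounded directions
            facets.add(tuple(np.flatnonzero(res.slack < tol)))
    return len(facets)

rng = np.random.default_rng(2)
m, d, sigma = 40, 3, 0.1
a = rng.uniform(-1, 1, (m, d)) / np.sqrt(d)     # centers of norm at most 1
b = a + sigma * rng.standard_normal((m, d))
print(distinct_optimal_facets(b, rng.standard_normal(d),
                              rng.standard_normal(d), N=720))
```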

Theorem 3.0.1 Let z and t be unit vectors and let a1, . . . , am be points of norm at most 1. Let μ1, . . . , μm be Gaussian measures of variance σ² < 1 centered at a1, . . . , am respectively, and let b1, . . . , bm be random points distributed independently according to μ1, . . . , μm. Then, the expected size of

    optSimps_{z,t;y}(b1, . . . , bm)

is at most f(d,m,σ) + 3, where

    f(d,m,σ) = 2π² · 10¹² (d+1) d⁴ m² (1 + 8d√(log(m/σ)) σ)¹¹ / σ¹².

Proof We follow the argument outlined in this section. We define

    θ_0 = ( σ / (18 d⁶ m^{2d} log(md/σ)) )³,  and  N = 7/θ_0.

The angle between z_{θ_i} and z_{θ_{i+1}} is 2π/N. Applying Theorem 3.1.2 with ǫ = 2π/N, we find that if optSimp_{z_{θ_i};y}(b1, . . . , bm) is non-empty, then the chance that this simplex contains a cone of radius ǫ around z_{θ_i}, and therefore around z_{θ_{i+1}}, is at least

    1 − (2π/N) ( 2 · 10¹² (d+1) d⁴ m² (1 + 8d√(log(m/σ)) σ)¹¹ / σ¹² ) − m e^{−11d² log(m/σ)}.

Therefore, the probability that z_{θ_{i+1}} is not optimized by the simplex that optimizes z_{θ_i} is at most

    (2π/N) ( 2 · 10¹² (d+1) d⁴ m² (1 + 8d√(log(m/σ)) σ)¹¹ / σ¹² ) + m e^{−11d² log(m/σ)}.

So, the expected number of simplices that optimize rays in {z_{θ_0}, . . . , z_{θ_{N−1}}} is at most

    f(m,d,σ) + N m e^{−11d² log(m/σ)}.

We now compute

    N m e^{−11d² log(m/σ)} = 7 ( 18 d⁶ m^{2d} log(md/σ) / σ )³ m e^{−11d² log(m/σ)}
      = e^{log 7} m e^{3(log 18 + 6 log d + 2d log m + log log(md/σ) + log(1/σ))} e^{−11d² log(m/σ)}
      ≤ e^{log 7} m e^{3 · 5d log(m/σ)} e^{−33d log(m/σ)}
      ≤ 1,

where we reach this conclusion assuming m ≥ d ≥ 3.

So, the number of simplices that optimize rays in z_{θ_0}, . . . , z_{θ_{N−1}} is at most

    f(m,d,σ) + 1.

On the other hand, the z_{θ_i} for which optSimp_{z_{θ_i};y}(b1, . . . , bm) is the empty set (where the polytope is unbounded) are contiguous. So, these do not add to the size of the shadow. To bound the probability that these N points have missed a simplex entirely, we use the main result of Section 6. By applying Theorem 6.0.1 with θ = θ_0, we find that the probability that the intersection of any simplex with Span(t, z) subtends an angle less than

    θ_0 ≥ 2π/N

is at most

    θ_0^{1/3} 2d² √e (1 + σ√(d log(1/θ_0)))² (m choose d)(d choose 2) / σ
      ≤ θ_0^{1/3} d⁴ m^d 2(1 + d log(1/θ_0))
      ≤ θ_0^{1/3} d⁴ m^d 2(1 + 9d² log(md/σ))
      ≤ θ_0^{1/3} d⁴ m^d 4(9d² log(md/σ))
      ≤ 2m^{−d}
      ≤ 1/(m choose d).

As this bad event occurs with probability less than the reciprocal of the total number of possible simplices, it can add at most 1 to the expected size of

    optSimps_{z,t;y}(b1, . . . , bm).

3.1 Integral Formulation

Let z be a unit vector and let b1, . . . , bm be the perturbations of a1, . . . , am. In this section, we will show that it is unlikely that z makes a small angle with any vector in the boundary of optSimp_{z;y}(b1, . . . , bm). That is, there is probably a cone around z of reasonably large angle that is strictly contained in the cone from the origin through optSimp_{z;y}(b1, . . . , bm). The proof has two parts: we show that the point of intersection of the ray through z with optSimp_{z;y}(b1, . . . , bm) is probably far from the boundary of the simplex, and we show that it is unlikely that z intersects the simplex at a small angle. To make this formal, we define

Definition 3.1.1 For a unit vector z and vectors b1, . . . , bd for which the ray from the origin through z intersects the simplex spanned by b1, . . . , bd,

    ang(z, ∂Simplex(b1, . . . , bd))

is the minimum of the angle of z to v, where v ranges over the boundary of Simplex(b1, . . . , bd). We use the convention

    ang(z, ∅) = ∞.

Let p(1)_z(ǫ) be the probability that z is ǫ-close in angle to the boundary of the simplex it intersects:

    p(1)_z(ǫ)  def=  Pr_{b1,...,bm} [ ang(z, ∂optSimp_{z;y}(b1, . . . , bm)) < ǫ ].

Note that if the constraints are infeasible or the program is unbounded, then optSimp_{z;y}(b1, . . . , bm) is empty, and the angle is defined to be ∞.

The main result of this section is

Theorem 3.1.2 Let z be a unit vector and let a1, . . . , am be points of norm at most 1. Let μ1, . . . , μm be Gaussian measures of variance σ² centered at a1, . . . , am respectively, and let b1, . . . , bm be random points distributed independently according to μ1, . . . , μm. Then,

    p(1)_z(ǫ) ≤ ǫ · 2 · 10¹² (d+1) d⁴ m² (1 + kσ)¹¹ / σ¹² + m e^{−11d² log(m/σ)}.

To prove Theorem 3.1.2, we analyze an integral expression for p(1)_z(ǫ). To simplify the notation in this integral, let Π(d, m) be the set of all permutations on m elements in which the first d elements are sorted and the last m − d elements are sorted. For example,

    Π(2, 4) = {(1, 2, 3, 4), (1, 3, 2, 4), (1, 4, 2, 3), (2, 3, 1, 4), (2, 4, 1, 3), (3, 4, 1, 2)}.

The other notational convention we introduce is

Definition 3.1.3 For an event A, we let [A] denote the function that is 1 if A is true and 0 otherwise.

Let Cone(b_{π(1)}, . . . , b_{π(d)}) denote the cone from the origin through Simplex(b_{π(1)}, . . . , b_{π(d)}).

Definition 3.1.4 For a permutation π, let

    CH_y(π(1), . . . , π(d) | b1, . . . , bm)

be the event that there exist ω ∈ S^{d−1} and r such that

    〈ω|b_{π(i)}〉 = y_{π(i)} r, for 1 ≤ i ≤ d, and
    〈ω|b_{π(i)}〉 ≤ y_{π(i)} r, for 1 ≤ i ≤ m.

In the case that each yi = 1, this is equivalent to the condition that Simplex(b_{π(1)}, . . . , b_{π(d)}) be a facet of the convex hull of b1, . . . , bm.

By Claim 1.4.1 and Definition 1.4.3, we have

    [Simplex(b_{π(1)}, . . . , b_{π(d)}) = optSimp_{z;y}(b1, . . . , bm)]
      = [CH_y(π(1), . . . , π(d) | b1, . . . , bm)] · [z ∈ Cone(b_{π(1)}, . . . , b_{π(d)})].

That is, (ω, 1/r) is the polar coordinate representation of x from Claim 1.4.1. Using this notation, we write:

Lemma 3.1.5

    p(1)_z(ǫ) = ∑_{π ∈ Π(d,m)} ∫_{b1,...,bm} [CH_y(π(1), . . . , π(d) | b1, . . . , bm)] [z ∈ Cone(b_{π(1)}, . . . , b_{π(d)})]
        · [ang(z, ∂Simplex(b_{π(1)}, . . . , b_{π(d)})) < ǫ] μ1(b1) · · · μm(bm) db1 · · · dbm.

Proof For each choice of π(1), . . . , π(d), the probability that the simplex spanned by the vectors indexed by π(1), . . . , π(d) is optimal for z is

    ∫_{b1,...,bm} [CH_y(π(1), . . . , π(d) | b1, . . . , bm)] [z ∈ Cone(b_{π(1)}, . . . , b_{π(d)})] μ1(b1) · · · μm(bm) db1 · · · dbm.

By multiplying by

    [ang(z, ∂Simplex(b_{π(1)}, . . . , b_{π(d)})) < ǫ],

we mask out all those configurations in which the angle is at least ǫ. Note that in the configurations in which the program is unbounded or infeasible, we will have

    [CH_y(π(1), . . . , π(d) | b1, . . . , bm)] [z ∈ Cone(b_{π(1)}, . . . , b_{π(d)})] = 0;

so, these will not contribute to the integral. This is appropriate because in these cases we have defined the angle to be infinite.

3.2 Focusing on one simplex

Our analysis will be by brute force: we will bound the integral for every choice of π. We apply more brute force by introducing a change of variables that allows us to restrict b_{π(1)}, . . . , b_{π(d)} to an almost arbitrary plane. In the new variables, we specify the plane in which b_{π(1)}, . . . , b_{π(d)} lie using polar coordinates, and we then specify their locations using coordinates in that plane.

To ease comprehension, we will make this change of variables more concrete. We specify a hyperplane in IR^d that does not contain the origin by an ω ∈ S^{d−1}, the points of norm 1 in IR^d, and an r > 0, and say that a point x lies in the hyperplane if 〈ω|x〉 = r. To choose a canonical set of coordinates for each (d−1)-dimensional hyperplane specified by (ω, r), first fix some unit vector, say z, and fix an arbitrary coordinatization of the hyperplane tangent to the unit sphere at z. We use the coordinates in this plane to construct a system of grid lines on the unit sphere that always intersect at right angles, such as depicted in Figure 1. To do this, form grid lines in the hyperplane parallel to its axes, and then project these to the sphere using stereographic projection from −z. (Recall that under this projection, a point on the plane maps to the point on the sphere that intersects the line through this point and −z.) As stereographic projection is a conformal map, the images of the grid lines will only intersect at right angles. For any hyperplane specified by (ω, 1), we locate its origin at the point at which it is tangent to the unit sphere, and we derive its system of coordinates from the grid lines on the sphere going through that point. Coordinates for a hyperplane specified by (ω, r) can be chosen analogously by dilating the unit sphere. In particular, the origin of the coordinates in the hyperplane is the closest point on the hyperplane to the origin of IR^d. For hyperplanes specified by (ω, r) where ω is in the direction of −z, the grid lines on the sphere suffer a singularity and the change of variables is not well-defined; however, this is a family of measure zero, so it does not affect our integration.

Figure 1: A system of grid lines on the unit sphere, intersecting at right angles, obtained by stereographic projection.

The Jacobian of this change of variables is computed by a famous theorem of integral geometry due to Blaschke [Bla35] (for more modern treatments, see [Mil71] or [San76, 12.24]).

Theorem 3.2.1 (Blaschke) For b1, . . . , bd variables taking values in IR^d, c1, . . . , cd taking values in IR^{d−1}, ω ∈ S^{d−1}, and r ∈ IR, let

    P : (b1, . . . , bd) ↦ ((ω, r), c1, . . . , cd)

denote the map from the d points b1, . . . , bd to the plane through those points, represented in spherical coordinates as (ω, r), and the locations c1, . . . , cd of those points on the plane. The Jacobian of the map P is

    |J(P)| = Vol(Simplex(c1, . . . , cd)).

That is,

    db1 · · · dbd = Vol(Simplex(c1, . . . , cd)) dω dr dc1 · · · dcd.
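A Monte Carlo sanity check of Theorem 3.2.1 in d = 2 (ours; the Gaussian test density and the truncation radii R, C are arbitrary choices) integrates the same function in both coordinate systems: the weighted integral over lines (φ, r) and positions c1, c2, with Jacobian |c1 − c2|, should reproduce the direct integral, which is 1 for a probability density.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):                                       # standard 2-D Gaussian density
    return np.exp(-0.5 * np.sum(x * x, axis=-1)) / (2 * np.pi)

n, R, C = 1_000_000, 6.0, 6.0
phi = rng.uniform(0, 2 * np.pi, n)              # line normal omega = (cos, sin)
r = rng.uniform(0, R, n)                        # distance of line from origin
c1, c2 = rng.uniform(-C, C, n), rng.uniform(-C, C, n)
omega = np.stack([np.cos(phi), np.sin(phi)], axis=1)
perp = np.stack([-np.sin(phi), np.cos(phi)], axis=1)
b1 = r[:, None] * omega + c1[:, None] * perp    # point with coordinate c1
b2 = r[:, None] * omega + c2[:, None] * perp
volume = 2 * np.pi * R * (2 * C) ** 2           # sampled (phi, r, c1, c2) volume
print(np.mean(g(b1) * g(b2) * np.abs(c1 - c2)) * volume)   # approximately 1.0
```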

Let ν^{ω,r}_{π(i)}(ci) denote the restriction of μ_{π(i)} to the hyperplane (ω, r), re-normalized to have norm 1. This can be viewed as the induced distribution in the hyperplane. The restriction of a Gaussian distribution to a hyperplane is a Gaussian distribution with the same variance centered at the projection of the center of the original to the hyperplane; so, ν^{ω,r}_{π(i)} has variance σ² and is centered at â_{π(i)}, the projection of a_{π(i)} to the hyperplane (ω, r). Let ψ_{π(i)}(ω, r) be such that for ci in hyperplane (ω, r) and b_{π(i)} the corresponding point in the ambient space, we have

    μ_{π(i)}(b_{π(i)}) = ψ_{π(i)}(ω, r) ν^{ω,r}_{π(i)}(ci).

That is, ψ_{π(i)}(ω, r) corresponds to the probability that b_{π(i)} lies in the plane (ω, r).

To help express p(1)_z(ǫ) in these new variables, we let

    Cone(ω, r, c1, . . . , cd)  def=  Cone(b1, . . . , bd).

Lemma 3.2.2

    p(1)_z(ǫ) = ∑_{π ∈ Π(d,m)} ∫_{ω,r} ∏_{i=d+1}^{m} ( ∫_{b_{π(i)}} [〈ω|b_{π(i)}〉 ≤ y_{π(i)} r] μ_{π(i)}(b_{π(i)}) db_{π(i)} ) ·
        ∫_{c1,...,cd} [z ∈ Cone(ω, r, c1, . . . , cd)] [ang(z, ∂Simplex(c1, . . . , cd)) < ǫ] ·
        Vol(Simplex(c1, . . . , cd)) ( ∏_{i=1}^{d} ψ_{π(i)}(ω, r) ν^{ω,r}_{π(i)}(ci) dci ) dω dr.

Proof We let (ω, r) be the plane through b_{π(1)}, . . . , b_{π(d)}, and let c1, . . . , cd be the coordinates of these points in that plane. The probability that each of the points b_{π(d+1)}, . . . , b_{π(m)} satisfies the constraint 〈ω|b_{π(i)}〉 ≤ y_{π(i)} r is

    ∏_{i=d+1}^{m} ( ∫_{b_{π(i)}} [〈ω|b_{π(i)}〉 ≤ y_{π(i)} r] μ_{π(i)}(b_{π(i)}) db_{π(i)} ).

Thus, the density on Aff(b_{π(1)}, . . . , b_{π(d)}), given that it is a facet of the linear program, is

    ∏_{i=d+1}^{m} ( ∫_{b_{π(i)}} [〈ω|b_{π(i)}〉 ≤ y_{π(i)} r] μ_{π(i)}(b_{π(i)}) db_{π(i)} ) ( ∏_{i=1}^{d} ψ_{π(i)}(ω, r) ).

To condition on Simplex(b_{π(1)}, . . . , b_{π(d)}) being optimal for z and having z make an angle at most ǫ from its boundary, we multiply by

    ∫_{c1,...,cd} [z ∈ Cone(ω, r, c1, . . . , cd)] [ang(z, ∂Simplex(c1, . . . , cd)) < ǫ] · Vol(Simplex(c1, . . . , cd)) ( ∏_{i=1}^{d} ν^{ω,r}_{π(i)}(ci) dci ).

From this expression, one can see that the distribution on b_{π(d+1)}, . . . , b_{π(m)} can influence the distribution of (ω, r), but once ω and r are fixed, they do not affect the distribution of the points on that plane.

3.3 Division into Distance and Angle

Our bound on ang(z, ∂Simplex(c1, . . . , cd)) will have two parts. We first let p_{ω,r} denote the intersection of plane (ω, r) with the ray through z, and we bound

    dist(p_{ω,r}, ∂Simplex(b_{π(1)}, . . . , b_{π(d)})),

which denotes the distance from p_{ω,r} to the boundary of the simplex. We then bound the angle between z and ω. We combine these bounds with the following lemma:

Figure 2: The vectors z, ω, the point p, a boundary point u, the distance h, and the angle θ used in the proof of Lemma 3.3.1.

Lemma 3.3.1 Let z be a unit vector defining a ray that intersects Simplex(b_{π(1)}, . . . , b_{π(d)}), let p_{ω,r} be the point of intersection, and let ω be the unit normal to Aff(b_{π(1)}, . . . , b_{π(d)}). Let

    h = dist(p_{ω,r}, ∂Simplex(b_{π(1)}, . . . , b_{π(d)})).

Then,

    ang(z, ∂Simplex(b_{π(1)}, . . . , b_{π(d)})) ≥ arctan( h〈ω|z〉 / (‖p_{ω,r}‖ + h) ).

Proof Let θ = ang(z, ∂Simplex(b_{π(1)}, . . . , b_{π(d)})), and let u be a point on the boundary of the simplex. Figure 2 describes our computation of ang(z, u). The distance from u to the ray through z is at least h〈ω|z〉, and the length of the projection of u onto the ray through z is at most ‖p_{ω,r}‖ + h, so the tangent of θ is at least

    h〈ω|z〉 / (‖p_{ω,r}‖ + h).

Using this lemma, we can bound p(1)_z using the following lemma.

Lemma 3.3.2 For ǫ < 1/100,

    p(1)_z(ǫ) ≤ p(2)_z(ǫ) + m e^{−16d² log(m/σ)},

where

    p(2)_z(ǫ)  def=  ∑_{π ∈ Π(d,m)} ∫_{ω,r} ∏_{i=d+1}^{m} ( ∫_{b_{π(i)}} [〈ω|b_{π(i)}〉 ≤ y_{π(i)} r] μ_{π(i)}(b_{π(i)}) db_{π(i)} ) ·
        ∫_{c1,...,cd} [z ∈ Cone(ω, r, c1, . . . , cd)] ·
        [ .99 · dist(p_{ω,r}, ∂Simplex(c1, . . . , cd)) 〈ω|z〉 / (2(1 + kσ)) < ǫ ] ·
        Vol(Simplex(c1, . . . , cd)) ( ∏_{i=1}^{d} ψ_{π(i)}(ω, r) ν^{ω,r}_{π(i)}(ci) dci ) dω dr,

which is roughly the probability that

    .99 · dist(p_{ω,r}, ∂Simplex(b_{π(1)}, . . . , b_{π(d)})) 〈ω|z〉 / (2(1 + kσ))

is less than ǫ.

Proof Directly applying Lemma 3.3.1, we find

    p(1)_z(ǫ) ≤ ∑_{π ∈ Π(d,m)} ∫_{ω,r} ∏_{i=d+1}^{m} ( ∫_{b_{π(i)}} [〈ω|b_{π(i)}〉 ≤ y_{π(i)} r] μ_{π(i)}(b_{π(i)}) db_{π(i)} ) ·
        ∫_{c1,...,cd} [z ∈ Cone(ω, r, c1, . . . , cd)] ·
        [ arctan( dist(p_{ω,r}, ∂Simplex(c1, . . . , cd)) 〈ω|z〉 / (‖p_{ω,r}‖ + dist(p_{ω,r}, ∂Simplex(c1, . . . , cd))) ) < ǫ ] ·
        Vol(Simplex(c1, . . . , cd)) ( ∏_{i=1}^{d} ψ_{π(i)}(ω, r) ν^{ω,r}_{π(i)}(ci) dci ) dω dr.

We simplify the arctan expression in this integral by using

    arctan(x) < 1/100  ⟹  arctan(x) > (99/100)x,

and the fact that the probability that

    ‖p_{ω,r}‖ ≥ 1 + kσ,  or  dist(p_{ω,r}, ∂Simplex(c1, . . . , cd)) ≥ 1 + kσ

is at most m e^{−16d² log(m/σ)}.

To simplify our analysis of p(2)_z(ǫ), we define

    K(π, ω, r, ǫ) = ∏_{i=d+1}^{m} ( ∫_{b_{π(i)}} [〈ω|b_{π(i)}〉 ≤ y_{π(i)} r] μ_{π(i)}(b_{π(i)}) db_{π(i)} ) ·
        ∫_{c1,...,cd} [z ∈ Cone(ω, r, c1, . . . , cd)] · [dist(p_{ω,r}, ∂Simplex(c1, . . . , cd)) < ǫ] ·
        Vol(Simplex(c1, . . . , cd)) ∏_{i=1}^{d} ψ_{π(i)}(ω, r) ν^{ω,r}_{π(i)}(ci) dci.

So, we have

    p(2)_z(ǫ) = ∑_{π ∈ Π(d,m)} ∫_{ω,r} K( π, ω, r, (ǫ/〈ω|z〉) · (2(1 + kσ)/.99) ) dω dr.

As a sanity check, we note that

    ∑_{π ∈ Π(d,m)} ∫_{ω,r} K(π, ω, r, ∞) dω dr ≤ 1.

The reason we do not get equality is that some directions may not be optimized by any simplex, or the constraints may be infeasible.

3.4 Bounding the distance

Roughly speaking, we want to bound

    Pr_{b1,...,bm} [ dist(p_{ω,r}, ∂optSimp_{z;y}(b1, . . . , bm)) < ǫ ].

In particular, for every choice of π, ω, and r, we bound

    K(π, ω, r, ǫ)/K(π, ω, r, ∞).

Once we fix π, ω, and r, the terms

    ∏_{i=d+1}^{m} ( ∫_{b_{π(i)}} [〈ω|b_{π(i)}〉 < y_{π(i)} r] μ_{π(i)}(b_{π(i)}) db_{π(i)} )  and  ∏_{i=1}^{d} ψ_{π(i)}(ω, r)

become constants. Using the observation that

    [z ∈ Cone(ω, r, c1, . . . , cd)] ⟺ [p_{ω,r} ∈ Simplex(c1, . . . , cd)],

we see that the integral

    ∫_{c1,...,cd} [p_{ω,r} ∈ Simplex(c1, . . . , cd)] [dist(p_{ω,r}, ∂Simplex(c1, . . . , cd)) < ǫ] · Vol(Simplex(c1, . . . , cd)) ∏_{i=1}^{d} ν^{ω,r}_{π(i)}(ci) dci

is proportional to K(π, ω, r, ǫ). So, Theorem 4.0.1, which is proved in Section 4, implies

    K(π, ω, r, ǫ)/K(π, ω, r, ∞) ≤ (d+1) ǫ ( 97d²(1 + kσ)³ / σ⁴ )² + m e^{−16d² log(m/σ)}.    (10)

3.5 Bounding the Angle

Let ω be the unit normal to optSimp_{z;y}(b1, . . . , bm). We show that it is unlikely that 〈ω|z〉 is small. In particular, we show that the probability that 〈ω|z〉 is less than ǫ is at most proportional to ǫ². Our argument uses quite a bit of brute force: we show that this is true under the assumption that

    optSimp_{z;y}(b1, . . . , bm) = Simplex(b_{π(1)}, . . . , b_{π(d)}),

for any π. For any π, we can express the probability that 〈ω|z〉 < ǫ by

    ( ∫_{ω ∈ S^{d−1} : 0 ≤ 〈ω|z〉 < ǫ} ∫_{r>0} K(π, ω, r, ∞) dω dr ) / ( ∫_{ω ∈ S^{d−1}} ∫_{r>0} K(π, ω, r, ∞) dω dr ).    (11)

To simplify this expression, let

    J(π, ω, r)  def=  K(π, ω, r, ∞);

so,

    J(π, ω, r) = ∏_{i=d+1}^{m} ( ∫_{b_{π(i)}} [〈ω|b_{π(i)}〉 ≤ y_{π(i)} r] μ_{π(i)}(b_{π(i)}) db_{π(i)} ) ·
        ∫_{c1,...,cd} [z ∈ Cone(ω, r, c1, . . . , cd)] Vol(Simplex(c1, . . . , cd)) · ∏_{i=1}^{d} ψ_{π(i)}(ω, r) ν^{ω,r}_{π(i)}(ci) dci.

In Section 5, we prove Lemma 5.0.3, which shows that for every choice of π, the quantity (11) is at most ǫ²c1 + c0, for the constants c1 and c0 defined in the proof of Theorem 3.1.2 below.

3.6 Combining the bounds

Lemma 3.6.1 Let L(ω, ǫ) be a non-negative function, non-decreasing in ǫ, for which

    L(ω, ǫ)/L(ω, ∞) ≤ min(1, c2 ǫ + c0) for every ω, and
    ∫_{ω : 0 ≤ 〈ω|z〉 < ǫ} L(ω, ∞) dω ≤ (ǫ²c1 + c0) ∫_{ω : 0 ≤ 〈ω|z〉} L(ω, ∞) dω for every ǫ > 0.

Then,

    ( ∫_{ω : 0 ≤ 〈ω|z〉} L(ω, ǫ/〈ω|z〉) dω ) / ( ∫_{ω : 0 ≤ 〈ω|z〉} L(ω, ∞) dω ) ≤ ǫ(4c2 + ǫ)c1 + (4c2 + 2)c0.

Proof We compute

    ∫_{ω : 0 ≤ 〈ω|z〉} L(ω, ǫ/〈ω|z〉) dω / ∫_{ω : 0 ≤ 〈ω|z〉} L(ω, ∞) dω
      ≤ ∫_{ω : 0 ≤ 〈ω|z〉} L(ω, ∞) min(1, c2(ǫ/〈ω|z〉) + c0) dω / ∫_{ω : 0 ≤ 〈ω|z〉} L(ω, ∞) dω
      ≤ c0 + ǫc2 ( ∫_{ω : ǫ ≤ 〈ω|z〉} L(ω, ∞)(1/〈ω|z〉) dω / ∫_{ω : 0 ≤ 〈ω|z〉} L(ω, ∞) dω )
           + ( ∫_{ω : 0 ≤ 〈ω|z〉 < ǫ} L(ω, ∞) dω / ∫_{ω : 0 ≤ 〈ω|z〉} L(ω, ∞) dω );

the second hypothesis bounds the last term by ǫ²c1 + c0 and, applied over dyadic ranges of 〈ω|z〉, bounds the middle term, which yields the claim.

Proof of Theorem 3.1.2 Lemma 5.0.3 shows that, for every π, the second hypothesis of Lemma 3.6.1 is satisfied with

    c1 = (2 · 10⁷) m² (1 + kσ)⁴ / σ⁴,  and  c0 = m e^{−16d² log(m/σ)}.

Moreover, equation (10) says that for all π, ω and r,

    K(π, ω, r, ǫ)/K(π, ω, r, ∞) < ǫ c2 + c0,

where

    c2 = (d+1) ( 97d²(1 + kσ)³ / σ⁴ )².

So, we can apply Lemma 3.6.1 to show that for all π,

    ( ∫_{ω,r} K( π, ω, r, (ǫ/〈ω|z〉)(2(1 + kσ)/.99) ) dω dr ) / ( ∫_{ω,r} K(π, ω, r, ∞) dω dr )
      ≤ ǫ (4c2 + 2ǫ(1 + kσ)/.99) c1 (2(1 + kσ)/.99) + (4c2 + 2)c0.

Summing over π, we get

    p(2)_z(ǫ) ≤ ǫ (4c2 + 2ǫ(1 + kσ)/.99) c1 (2(1 + kσ)/.99) + (4c2 + 2)c0.

Applying Lemma 3.3.2, we then obtain, for ǫ < 1/100,

    p(1)_z(ǫ) ≤ ǫ (4c2 + 2ǫ(1 + kσ)/.99) c1 (2(1 + kσ)/.99) + (4c2 + 3)c0
      ≤ ǫ · 2 · 10¹² (d+1) d⁴ m² (1 + kσ)¹¹ / σ¹² + ( 4(d+1) · 97² d⁴ (1 + kσ)⁶ / σ⁸ + 3 ) m e^{−16d² log(m/σ)}.

To simplify the last part of this expression, we use

    log( 4(d+1) · 97² d⁴ (1 + kσ)⁶ / σ⁸ + 3 )
      ≤ log( (d+1) · 4 · 10⁴ · d⁴ (1 + kσ)⁶ / σ⁸ )
      ≤ log(d+1) + log(4 · 10⁴) + 4 log(d) + 6 log(9d√(log(m/σ))) + 8 log(1/σ)
      ≤ log(4 · 10⁴) + 6 log(9) + 4 log(d) + 6 log(d) + 6 log(√(log(m/σ))) + 8 log(1/σ)
      ≤ 24 + 4 log(m) + 6 log(m) + 6 log(m/σ) + 8 log(1/σ)
      ≤ 24 + 16 log(m/σ)
      ≤ 5d² log(m/σ),

for m − 1 ≥ d ≥ 3. So, we obtain the bound

    p(1)_z(ǫ) ≤ ǫ · 2 · 10¹² (d+1) d⁴ m² (1 + kσ)¹¹ / σ¹² + m e^{−11d² log(m/σ)}.

Of course, if the original polytope were contained between two spheres as described in Section 2.1, then this analysis would be unnecessary and much tighter bounds could be derived.

4 Bounding the Distance to a Random Simplex

The main result of this section essentially says that if one is given a point p in IR^d and d + 1 Gaussian random points whose variance is not too small and whose centers are not too far from p, then, given that p lies inside the simplex spanned by the d + 1 points, it is unlikely that p is very close to the boundary of that simplex. Intuitively, this should be easy to prove: if p happens to lie too close to a bounding facet of the simplex, then we should be able to remedy the situation by slightly perturbing the points that define the facet (see Figure 3). Moreover, a slight perturbation will not substantially change the probability of the configuration. Our proof follows this intuition, although it might not share its simplicity.

Figure 3: By perturbing the vertices on the facet close to p, one can move the facet away from p.

One slight difference between the problem we study in this section and that described in the paragraph above is that, in our situation, the d + 1 points have a joint distribution that favors simplices of large volume. In this situation, we can prove that the probability that p has distance ǫ from the boundary of the simplex is linear in ǫ. If the d + 1 points were distributed independently, we would obtain a bound of O(ǫ log(1/ǫ)).
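The event in question is easy to simulate. The sketch below (ours, not from the paper) uses independent Gaussian points in d = 2, the variant for which the text predicts a bound of O(ǫ log(1/ǫ)) rather than the linear bound of Theorem 4.0.1, and estimates the conditional probability that p lies within ǫ of the triangle's boundary; the centers, σ, and sample size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, p = 0.5, np.zeros(2)
centers = rng.uniform(-0.5, 0.5, (3, 2))           # centers of norm at most 1

def dist_if_inside(c, p):
    """Return dist(p, boundary of triangle c) if p is inside, else None."""
    signed = []
    for i in range(3):
        a, b = c[i], c[(i + 1) % 3]
        n = np.array([b[1] - a[1], a[0] - b[0]])   # edge normal
        n /= np.linalg.norm(n)
        signed.append(float(np.dot(p - a, n)))
    if abs(sum(np.sign(s) for s in signed)) == 3:  # p on same side of all edges
        return min(abs(s) for s in signed)
    return None

dists = []
for _ in range(50_000):
    h = dist_if_inside(centers + sigma * rng.standard_normal((3, 2)), p)
    if h is not None:
        dists.append(h)
dists = np.array(dists)
for eps in (0.01, 0.02, 0.04):
    print(eps, np.mean(dists < eps))   # grows roughly linearly, up to log factors
```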

Theorem 4.0.1  Let $p \in \mathbb{R}^d$ have norm at most $1+k\sigma$. Let $\nu_0,\ldots,\nu_d$ be Gaussian distributions with variance $\sigma^2$ and centers $\hat{a}_0,\ldots,\hat{a}_d$, each of norm at most 1. Let $c_0,\ldots,c_d$ be random points distributed according to
\[
\kappa\,[p \in \mathrm{Simplex}(c_0,\ldots,c_d)]\,\mathrm{Vol}(\mathrm{Simplex}(c_0,\ldots,c_d))\prod_{i=0}^d \nu_i(c_i),
\]
where $\kappa$ is chosen so that
\[
\kappa \int_{c_0,\ldots,c_d} [p \in \mathrm{Simplex}(c_0,\ldots,c_d)]\,\mathrm{Vol}(\mathrm{Simplex}(c_0,\ldots,c_d))\prod_{i=0}^d \nu_i(c_i) = 1.
\]

Let
\[
p^{(4)}(\epsilon) = \kappa \int_{c_0,\ldots,c_d} [p \in \mathrm{Simplex}(c_0,\ldots,c_d)]\,[\mathrm{dist}(p, \partial\,\mathrm{Simplex}(c_0,\ldots,c_d)) < \epsilon]\cdot
\mathrm{Vol}(\mathrm{Simplex}(c_0,\ldots,c_d))\prod_{i=0}^d \nu_i(c_i).
\]
Then, given that $p$ lies inside $\mathrm{Simplex}(c_0,\ldots,c_d)$, the probability that $p$ has distance less than $\epsilon$ from $\partial\,\mathrm{Simplex}(c_0,\ldots,c_d)$ is
\[
p^{(4)}(\epsilon) \le (d+1)\left(\frac{97d^2(1+k\sigma)^3}{\sigma^4}\right)^2\epsilon + me^{-16d^2\log(m/\sigma)}.
\]

We will prove this bound by considering each facet of $\partial\,\mathrm{Simplex}(c_0,\ldots,c_d)$ individually, and multiplying the resulting bound by $d+1$. Thus, the theorem follows immediately from Lemma 4.2.1, which shows that
\[
p^{(5)}(\epsilon) \le \left(\frac{97d^2(1+k\sigma)^3}{\sigma^4}\right)^2\epsilon + me^{-16d^2\log(m/\sigma)},
\]
where we let
\[
p^{(5)}(\epsilon) = I(\epsilon)/I(\infty)
\]
and
\[
I(\epsilon) = \int_{c_0,\ldots,c_d} [p \in \mathrm{Simplex}(c_0,\ldots,c_d)]\,[\mathrm{dist}(p, \mathrm{Aff}(c_1,\ldots,c_d)) < \epsilon]\cdot
\mathrm{Vol}(\mathrm{Simplex}(c_0,\ldots,c_d))\prod_{i=0}^d \nu_i(c_i),
\]
so that $p^{(5)}(\epsilon)$ measures the probability that $p$ has distance less than $\epsilon$ from the span of $c_1,\ldots,c_d$, given that $p$ lies inside $\mathrm{Simplex}(c_0,\ldots,c_d)$.

    4.1 Changes of variables

Our first step will be to map the variables $(c_0,\ldots,c_d)$ to new variables $(e_0, \bar{e}, f_2,\ldots,f_d)$ in a way that will greatly reduce the parameter space we have to consider in the integral.

Definition 4.1.1  We first change to a coordinate system in which $p$ is the origin by mapping $(c_0,\ldots,c_d)$ to $(e_0,\ldots,e_d)$ by
\[
e_i = c_i - p, \quad \text{for } 0 \le i \le d.
\]
We then introduce a new set of variables $(\bar{e}, f_2,\ldots,f_d)$ that describe $(e_1,\ldots,e_d)$ by their relations to their centroid by setting
\[
\bar{e} = \frac{1}{d}\sum_{i=1}^d e_i, \quad\text{and}\quad f_i = e_i - \bar{e}, \ \text{for } i = 2,\ldots,d.
\]
Let $Q(f_2,\ldots,f_d)$ be the plane of co-dimension 1 spanned by $f_2,\ldots,f_d$, and let $q(f_2,\ldots,f_d)$ be the unit normal to $Q(f_2,\ldots,f_d)$.

We now examine how to express $p^{(5)}(\epsilon)$ in terms of $\bar{e}$ and $f_2,\ldots,f_d$. In our analysis, we will fix $f_2,\ldots,f_d$ arbitrarily and then bound the integral in $\bar{e}$ and $e_0$. For convenience, we let
\[
f_1 = e_1 - \bar{e} = -\sum_{i=2}^d f_i.
\]

Claim 4.1.2  In the new variables, we have
\[
\mathrm{dist}(p, \mathrm{Aff}(c_1,\ldots,c_d)) = |\langle\bar{e}|q(f_2,\ldots,f_d)\rangle|
\]
and
\[
\mathrm{Vol}(\mathrm{Simplex}(c_0,\ldots,c_d)) = \mathrm{Vol}(\mathrm{Simplex}(f_1,\ldots,f_d))\,|\langle e_0 - \bar{e}|q(f_2,\ldots,f_d)\rangle|/d.
\]

Proof  To prove the first equality, note that
\[
\mathrm{Aff}(c_1,\ldots,c_d) - p = \mathrm{Aff}(e_1,\ldots,e_d),
\]
so the distance from $p$ to $\mathrm{Aff}(c_1,\ldots,c_d)$ is the distance from $\mathrm{Aff}(e_1,\ldots,e_d)$ to the origin. Now, $\mathrm{Aff}(f_1,\ldots,f_d)$ is parallel to $\mathrm{Aff}(e_1,\ldots,e_d)$, so they have the same unit normal. As $\bar{e}$ lies on $\mathrm{Aff}(e_1,\ldots,e_d)$, the inner product of $\bar{e}$ with the unit normal is the distance from the plane to the origin. The second equality follows from the fact that the volume of a pyramid in $d$ dimensions is the volume of the base of the pyramid times its height divided by $d$.
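As a quick sanity check of Claim 4.1.2, the following sketch (ours; random data, arbitrary dimensions) verifies the volume identity numerically:

```python
# Numerical check of Vol(Simplex(c_0..c_d)) = Vol(Simplex(f_1..f_d)) * |<e_0-ebar|q>| / d.
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
d = 4
c = rng.normal(size=(d + 1, d))            # the points c_0, ..., c_d
p = rng.normal(size=d)
e = c - p                                  # e_i = c_i - p
ebar = e[1:].mean(axis=0)                  # centroid of e_1, ..., e_d
f = e[1:] - ebar                           # f_1, ..., f_d (they sum to zero)

# Volume of the full-dimensional simplex: |det of edge vectors| / d!
vol_c = abs(np.linalg.det(c[1:] - c[0])) / factorial(d)

# (d-1)-dimensional volume of Simplex(f_1..f_d) via the Gram determinant.
A = f[1:] - f[0]
vol_f = np.sqrt(np.linalg.det(A @ A.T)) / factorial(d - 1)

# Unit normal q to the hyperplane spanned by the f_i.
_, _, vt = np.linalg.svd(f)                # rows of f have rank d-1
q = vt[-1]
print(np.isclose(vol_c, vol_f * abs(np.dot(e[0] - ebar, q)) / d))   # True
```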

To translate the measures from the variables $c_1,\ldots,c_d$ into measures for $\bar{e},f_2,\ldots,f_d$, we let
\[
\eta(e_0) = \nu_0(e_0 + p),
\]
and
\[
\rho_{f_2,\ldots,f_d}(\bar{e}) = \prod_{i=1}^d \nu_i(\bar{e} + f_i + p).
\]

Claim 4.1.3  For every $f_2,\ldots,f_d$, the measure $\rho_{f_2,\ldots,f_d}(\bar{e})$ is a function of $f_2,\ldots,f_d$ times a Gaussian of variance $\sigma^2/d$ centered at
\[
v \stackrel{\mathrm{def}}{=} \frac{1}{d}\sum_{i=1}^d \hat{a}_i - p.
\]

Proof  First, note that $c_i = \bar{e} + f_i + p$. Now,
\[
\prod_{i=1}^d \nu_i(\bar{e} + f_i + p)
= e^{-\frac{1}{2\sigma^2}\left(\sum_{i=1}^d \|\hat{a}_i - (\bar{e}+f_i+p)\|^2\right)}
\]
\[
= e^{-\frac{1}{2\sigma^2}\left(\sum_{i=1}^d \|(\hat{a}_i-p) - (\bar{e}+f_i)\|^2\right)}
\]
\[
= e^{-\frac{1}{2\sigma^2}\sum_{i=1}^d\left(\langle(\hat{a}_i-p)|(\hat{a}_i-p)\rangle - 2\langle(\hat{a}_i-p)|\bar{e}+f_i\rangle + \langle\bar{e}+f_i|\bar{e}+f_i\rangle\right)}
\]
\[
= e^{-\frac{1}{2\sigma^2}\left(\sum_{i=1}^d\left(\langle(\hat{a}_i-p)|(\hat{a}_i-p)\rangle - 2\langle(\hat{a}_i-p)|\bar{e}\rangle + \langle\bar{e}|\bar{e}\rangle\right) + \sum_{i=1}^d\langle f_i|f_i - 2(\hat{a}_i-p)\rangle\right)}
\]
\[
= e^{-\frac{1}{2\sigma^2}\left(d\left\|\frac{1}{d}\sum_{i=1}^d (\hat{a}_i - p) - \bar{e}\right\|^2 + \sum_{i=1}^d \langle f_i|f_i - 2(\hat{a}_i - p)\rangle\right)}
\]
\[
= e^{-\frac{d}{2\sigma^2}\left\|\frac{1}{d}\sum_{i=1}^d (\hat{a}_i - p) - \bar{e}\right\|^2}\,
e^{-\frac{1}{2\sigma^2}\sum_{i=1}^d \langle f_i|f_i - 2(\hat{a}_i - p)\rangle},
\]
where the fourth equality follows from $\sum_{i=1}^d f_i = 0$.
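The completing-the-square step above is easy to check numerically. The following sketch (ours, with arbitrary random data) confirms that, as a function of $\bar{e}$, the log of the product differs from the claimed Gaussian centered at $v$ only by a constant, so log-density differences in $\bar{e}$ must match exactly:

```python
# Numerical check of Claim 4.1.3 (illustrative only).
import numpy as np

rng = np.random.default_rng(2)
d, sigma = 4, 0.3
a_hat = rng.normal(size=(d, d))            # the centers \hat{a}_1, ..., \hat{a}_d
p = rng.normal(size=d)
f = rng.normal(size=(d, d))
f -= f.mean(axis=0)                        # enforce sum_i f_i = 0

def log_rho(ebar):
    return sum(-np.sum((a_hat[i] - (ebar + f[i] + p))**2) / (2 * sigma**2)
               for i in range(d))

v = a_hat.mean(axis=0) - p                 # claimed center
e1, e2 = rng.normal(size=d), rng.normal(size=d)
lhs = log_rho(e1) - log_rho(e2)
rhs = (-d / (2 * sigma**2)) * (np.sum((v - e1)**2) - np.sum((v - e2)**2))
print(np.isclose(lhs, rhs))                # True
```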

The following definition will help us understand the condition
\[
[0 \in \mathrm{Simplex}(e_0, \bar{e}+f_1,\ldots,\bar{e}+f_d)].
\]

Definition 4.1.4  For a simplex $S$ and a vector $w$ not in the plane of $S$, let $\mathrm{Prism}_S(w)$ denote the open prism obtained by translating $S$ by non-negative multiples of $w$. That is,
\[
\mathrm{Prism}_S(w) = \{s + \gamma w \ \text{such that}\ s \in S \ \text{and}\ \gamma \ge 0\}.
\]
For example, if $S$ is a line segment with end-points $(-1,0)$ and $(1,0)$, and $w = (1,1)$, then the resulting open prism is depicted in Figure 4.

Lemma 4.1.5
\[
[0 \in \mathrm{Simplex}(e_0, \bar{e}+f_1,\ldots,\bar{e}+f_d)] \iff \bar{e} \in \mathrm{Prism}_{-S}(-e_0),
\]
where $S = \mathrm{Simplex}(f_1,\ldots,f_d)$.

Proof  The convex hull of $(e_0, \bar{e}+f_1,\ldots,\bar{e}+f_d)$ contains the origin if and only if the ray from the origin through $-e_0$ intersects $\bar{e} + S$. That is, if there exists a $\gamma \ge 0$ such that
\[
-\gamma e_0 \in \bar{e} + S,
\]
which is equivalent to
\[
\bar{e} \in -\gamma e_0 - S,
\]
which is the definition of $\mathrm{Prism}_{-S}(-e_0)$.

Figure 4: The open prism $\mathrm{Prism}_{\mathrm{Simplex}((-1,0),(1,0))}((1,1))$.
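This equivalence is also easy to test empirically. The sketch below (ours; random data, not from the paper) reduces both membership tests to a linear solve followed by sign checks and asserts that the two sides agree:

```python
# Empirical check of Lemma 4.1.5 (illustrative only).
import numpy as np

def origin_in_simplex(pts):
    """Origin in the hull of the rows of pts (d+1 points in R^d)?"""
    M = np.vstack([pts.T, np.ones(len(pts))])
    lam = np.linalg.solve(M, np.append(np.zeros(pts.shape[1]), 1.0))
    return bool(np.all(lam >= 0))

def in_prism(ebar, f, e0):
    """ebar in Prism_{-S}(-e0), i.e. ebar = -s - gamma*e0 with s in S, gamma >= 0.
    Solve sum_i lam_i f_i + gamma e0 = -ebar together with sum_i lam_i = 1."""
    d = len(ebar)
    M = np.vstack([np.column_stack([f.T, e0]), np.append(np.ones(d), 0.0)])
    sol = np.linalg.solve(M, np.append(-ebar, 1.0))
    return bool(np.all(sol >= 0))          # all lam_i >= 0 and gamma >= 0

rng = np.random.default_rng(3)
d = 3
for _ in range(1000):
    f = rng.normal(size=(d, d))
    f -= f.mean(axis=0)                    # sum_i f_i = 0
    e0, ebar = rng.normal(size=d), rng.normal(size=d)
    assert origin_in_simplex(np.vstack([e0, ebar + f])) == in_prism(ebar, f, e0)
```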

We now make one last change of variables in which we express $e_0$ as a sum of $e_Q$, a vector in $Q(f_2,\ldots,f_d)$, and a multiple of $q(f_2,\ldots,f_d)$. That is, we let
\[
e_0 = e_Q + hq(f_2,\ldots,f_d),
\]
where $e_Q \in Q(f_2,\ldots,f_d)$ and $h \in \mathbb{R}$. An alternate expression for $h$ is
\[
h = \langle e_0|q(f_2,\ldots,f_d)\rangle.
\]
Since $\bar{e}$ and $e_0$ lie on opposite sides of $Q(f_2,\ldots,f_d)$, we can write
\[
|\langle e_0 - \bar{e}|q(f_2,\ldots,f_d)\rangle| = |h| + |\langle\bar{e}|q(f_2,\ldots,f_d)\rangle|.
\]
We let $\lambda_{e_Q}(h)$ denote the induced measure on $h$:
\[
\lambda_{e_Q}(h) = \eta(e_Q + hq(f_2,\ldots,f_d)).
\]

We now express $I(\epsilon)$ in the new variables.

Lemma 4.1.6
\[
I(\epsilon) \sim \int_{f_2,\ldots,f_d,e_Q} \int_{\bar{e},h}
\left[\bar{e} \in \mathrm{Prism}_{\mathrm{Simplex}(-f_1,\ldots,-f_d)}(-e_0)\right]
\left[|\langle\bar{e}|q(f_2,\ldots,f_d)\rangle| \le \epsilon\right]\cdot
\]
\[
\bigl(|h| + |\langle\bar{e}|q(f_2,\ldots,f_d)\rangle|\bigr)\,
\mathrm{Vol}(\mathrm{Simplex}(f_1,\ldots,f_d))\cdot
\rho_{f_2,\ldots,f_d}(\bar{e})\,\lambda_{e_Q}(h)\,
d\bar{e}\,dh\,df_2\cdots df_d\,de_Q.
\]

Proof  The changes of variables in this section are merely linear transformations, and so their Jacobian is a constant. The integral formula is merely the expression of $I(\epsilon)$ in the new variables, using the simplifications described in this section.

4.2 Analysis in the Prism

Lemma 4.2.1  Under the conditions of Theorem 4.0.1,
\[
p^{(5)}(\epsilon) \le \epsilon\left(\frac{97d^2(1+k\sigma)^3}{\sigma^4}\right)^2 + me^{-16d^2\log(m/\sigma)}.
\]

Proof  Let
\[
I_1(f_2,\ldots,f_d,e_Q,\epsilon)
= \int_{\bar{e},h}
\left[\bar{e} \in \mathrm{Prism}_{\mathrm{Simplex}(-f_1,\ldots,-f_d)}(-e_0)\right]
\left[|\langle\bar{e}|q(f_2,\ldots,f_d)\rangle| \le \epsilon\right]\cdot
\bigl(|h| + |\langle\bar{e}|q(f_2,\ldots,f_d)\rangle|\bigr)\,
\rho_{f_2,\ldots,f_d}(\bar{e})\,\lambda_{e_Q}(h)\,d\bar{e}\,dh,
\]
so
\[
I(\epsilon) \sim \int_{f_2,\ldots,f_d,e_Q} I_1(f_2,\ldots,f_d,e_Q,\epsilon)\,
\mathrm{Vol}(\mathrm{Simplex}(f_1,\ldots,f_d))\,df_2\cdots df_d\,de_Q.
\]

To get the upper bound on $p^{(5)}(\epsilon)$, we will apply Lemma 4.2.2. First, we need to observe that if $\mathrm{dist}(\hat{a}_i, c_i) < k\sigma$ for each $i$, then

1. each $c_i$ has norm at most $1 + k\sigma$;

2. $p$ has norm at most $1 + k\sigma$;

3. $e_0$ has norm at most $\|p\| + \|c_0\| \le 2 + 2k\sigma$;

4. so, $e_Q$ has norm at most $2 + 2k\sigma$;

5. the center of $\eta$ has norm at most $\|p\| + \|\hat{a}_0\| \le 2 + k\sigma$;

6. so, the center of $\lambda_{e_Q}$ has norm at most $2 + k\sigma$;

7. the center of $\rho$ has norm at most $\max_i \|\hat{a}_i\| + \|p\| \le 2 + k\sigma$;

8. the maximum of $\|c_i - c_j\|$ is at most $2 + 2k\sigma$;

9. so, $\|f_i\| = \|e_i - \bar{e}\| \le 2 + 2k\sigma$.

As the probability that there exists an $i$ such that $\mathrm{dist}(\hat{a}_i, c_i) \ge k\sigma$ is less than $me^{-16d^2\log(m/\sigma)}$, we can assume that the probability that one of (3), (6), (7), or (9) fails to hold is at most $me^{-16d^2\log(m/\sigma)}$. Given that each of these holds, Lemma 4.2.2 implies that
\[
\frac{I_1(f_2,\ldots,f_d,e_Q,\epsilon)}{I_1(f_2,\ldots,f_d,e_Q,\infty)} \le \epsilon\left(\frac{97d^2(1+k\sigma)^3}{\sigma^4}\right)^2.
\]

So,
\[
p^{(5)}(\epsilon) = \frac{I(\epsilon)}{I(\infty)}
= \frac{\int_{f_2,\ldots,f_d,e_Q} I_1(f_2,\ldots,f_d,e_Q,\epsilon)\,\mathrm{Vol}(\mathrm{Simplex}(f_1,\ldots,f_d))\,df_2\cdots df_d\,de_Q}
       {\int_{f_2,\ldots,f_d,e_Q} I_1(f_2,\ldots,f_d,e_Q,\infty)\,\mathrm{Vol}(\mathrm{Simplex}(f_1,\ldots,f_d))\,df_2\cdots df_d\,de_Q}
\le \epsilon\left(\frac{97d^2(1+k\sigma)^3}{\sigma^4}\right)^2 + me^{-16d^2\log(m/\sigma)}.
\]

Lemma 4.2.2  Let $S$ be a simplex containing the origin with corners $f_1,\ldots,f_d \in \mathbb{R}^d$, each of which has norm at most $2+2k\sigma$. Let $Q$ be the plane $\mathrm{Aff}(f_1,\ldots,f_d)$ and let $q$ be the unit normal to the plane. Let $e_Q$ be a vector in $Q$ of norm at most $2+2k\sigma$. Let $\lambda(h)$ be a univariate Gaussian of variance $\sigma^2$ with a mean of absolute value at most $2+k\sigma$, and let $\rho$ be a $d$-dimensional Gaussian of variance $\sigma^2/d$ centered at a point of norm at most $2+k\sigma$. Let
\[
B_h = \mathrm{Prism}_{-S}(hq + e_Q).
\]
If $\bar{e}$ and $h$ have joint distribution
\[
\kappa\,\rho(\bar{e})\lambda(h)\bigl(|\langle\bar{e}|q\rangle| + h\bigr),
\]
where $\kappa$ is chosen so that
\[
\kappa\int_{h=0}^\infty\int_{\bar{e}\in B_h} \lambda(h)\rho(\bar{e})\bigl(|\langle\bar{e}|q\rangle| + h\bigr)\,dh\,d\bar{e} = 1;
\]
then, given that $\bar{e} \in B_h$, the probability that the distance from $\bar{e}$ to $Q$ is less than $\epsilon$, for $\epsilon < 10^{-7}$, is
\[
\kappa\int_{h=0}^\infty\int_{\bar{e}\in B_h} [\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,\lambda(h)\rho(\bar{e})\bigl(|\langle\bar{e}|q\rangle| + h\bigr)\,dh\,d\bar{e}
< \epsilon\left(\frac{97d^2(1+k\sigma)^3}{\sigma^4}\right)^2.
\]

Proof  We first write
\[
\kappa\int_{h=0}^\infty\int_{\bar{e}\in B_h} [\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,\lambda(h)\rho(\bar{e})\bigl(|\langle\bar{e}|q\rangle| + h\bigr)\,dh\,d\bar{e} \tag{13}
\]
\[
= \frac{\int_{h=0}^\infty\int_{\bar{e}\in B_h} [\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,\lambda(h)\rho(\bar{e})\bigl(|\langle\bar{e}|q\rangle| + h\bigr)\,dh\,d\bar{e}}
       {\int_{h=0}^\infty\int_{\bar{e}\in B_h} \lambda(h)\rho(\bar{e})\bigl(|\langle\bar{e}|q\rangle| + h\bigr)\,dh\,d\bar{e}}
\]
\[
= \frac{\int_{h=0}^\infty\int_{\bar{e}\in B_h} [\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,\lambda(h)\rho(\bar{e})\,|\langle\bar{e}|q\rangle|\,dh\,d\bar{e}
      + \int_{h=0}^\infty\int_{\bar{e}\in B_h} [\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,\lambda(h)\rho(\bar{e})\,h\,dh\,d\bar{e}}
       {\int_{h=0}^\infty\int_{\bar{e}\in B_h} \lambda(h)\rho(\bar{e})\,|\langle\bar{e}|q\rangle|\,dh\,d\bar{e}
      + \int_{h=0}^\infty\int_{\bar{e}\in B_h} \lambda(h)\rho(\bar{e})\,h\,dh\,d\bar{e}}.
\]

To make sense of these integrals, we observe that
\[
\int_{\bar{e}\in B_h} \rho(\bar{e})\,d\bar{e} = \rho(B_h),
\]
\[
\int_{\bar{e}\in B_h} [\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,\rho(\bar{e})\,d\bar{e}
= \Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,\rho(B_h),
\]
\[
\int_{\bar{e}\in B_h} |\langle\bar{e}|q\rangle|\,\rho(\bar{e})\,d\bar{e}
= \mathbf{E}_{\bar{e}\leftarrow_\rho B_h}[\langle\bar{e}|q\rangle]\,\rho(B_h),
\]
\[
\int_{\bar{e}\in B_h} [\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,|\langle\bar{e}|q\rangle|\,\rho(\bar{e})\,d\bar{e}
\le \epsilon\,\Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,\rho(B_h),
\]
where the inequality follows from the fact that $[\,|\langle\bar{e}|q\rangle| < \epsilon\,] = 1$ implies $|\langle\bar{e}|q\rangle| < \epsilon$. Using these identities, we can re-write the integrals in this fraction as

\[
\frac{\int_{h=0}^\infty\int_{\bar{e}\in B_h} [\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,\lambda(h)\rho(\bar{e})\,|\langle\bar{e}|q\rangle|\,dh\,d\bar{e}}
     {\int_{h=0}^\infty\int_{\bar{e}\in B_h} \lambda(h)\rho(\bar{e})\,|\langle\bar{e}|q\rangle|\,dh\,d\bar{e}}
\le \frac{\epsilon\int_{h=0}^\infty \Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,\rho(B_h)\lambda(h)\,dh}
         {\int_{h=0}^\infty \mathbf{E}_{\bar{e}\leftarrow_\rho B_h}[\langle\bar{e}|q\rangle]\,\rho(B_h)\lambda(h)\,dh}
\]
and
\[
\frac{\int_{h=0}^\infty\int_{\bar{e}\in B_h} [\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,\lambda(h)\rho(\bar{e})\,h\,dh\,d\bar{e}}
     {\int_{h=0}^\infty\int_{\bar{e}\in B_h} \lambda(h)\rho(\bar{e})\,h\,dh\,d\bar{e}}
= \frac{\int_{h=0}^\infty h\,\Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,\rho(B_h)\lambda(h)\,dh}
       {\int_{h=0}^\infty h\,\rho(B_h)\lambda(h)\,dh}.
\]

By Lemma 4.2.8, the first of these fractions is at most
\[
9500\epsilon^2/\delta_0 + 55000\epsilon^2\log(2/\epsilon)/\delta_0^2,
\]
and by Lemma 4.2.7, the second is at most
\[
\epsilon\,8e^3/\delta_0,
\]
where
\[
\delta_0 = \frac{\beta_0}{\sqrt{1+\beta_0^2}}\,\sigma^2/\bigl(2d(4+3k\sigma+\sigma/\sqrt{d})\bigr),
\quad\text{and}\quad
\beta_0 = \frac{\sigma^2}{12d(1+k\sigma)^2}.
\]
For $\epsilon < 10^{-7}$, we have
\[
\epsilon\log(2/\epsilon) < 20\cdot 10^{-6},
\]
so
\[
9500\epsilon^2/\delta_0 + 55000\epsilon^2\log(2/\epsilon)/\delta_0^2 \le \epsilon/(2\delta_0) + \epsilon/(2\delta_0^2).
\]

By applying the inequality
\[
\frac{a_1 + b_1}{a_2 + b_2} \le \max\left(\frac{a_1}{a_2}, \frac{b_1}{b_2}\right), \tag{14}
\]
which is valid for positive $a_2$ and $b_2$, we bound (13) by
\[
\epsilon\max\bigl(8e^3/\delta_0,\ 1/(2\delta_0) + 1/(2\delta_0^2)\bigr)
\le \epsilon\max\bigl(8e^3/\delta_0,\ 1/\delta_0^2\bigr).
\]

Using the definitions of $\delta_0$ and $\beta_0$, we find that
\[
\beta_0 = \frac{\sigma^2}{12d(1+k\sigma)^2} \quad\text{implies}\quad \beta_0 \le 1/12,
\quad\text{which implies}\quad \sqrt{1+\beta_0^2} \le 100/99.
\]
So, we can bound $\delta_0$ by
\[
\delta_0 = \frac{\beta_0}{\sqrt{1+\beta_0^2}}\,\frac{\sigma^2}{d(8+6k\sigma+2\sigma/\sqrt{d})}
\ge \frac{.99\,\beta_0\,\sigma^2}{8d(1+k\sigma)}
\ge \frac{.99\,\sigma^4}{96d^2(1+k\sigma)^3}
\ge \frac{\sigma^4}{97d^2(1+k\sigma)^3}.
\]

Using this bound, we compute
\[
\max\bigl(8e^3/\delta_0,\ 1/\delta_0^2\bigr)
\le \max\left(\frac{5800d^2(1+k\sigma)^3}{\sigma^4},\ \left(\frac{97d^2(1+k\sigma)^3}{\sigma^4}\right)^2\right)
\le \left(\frac{97d^2(1+k\sigma)^3}{\sigma^4}\right)^2,
\]
for $\sigma \le 1$. The theorem follows.

The intuition behind this bound is as follows: if $h$ is not too small, then the slope of $B_h$ is not too small. In Section 4.3 we will show that if the slope of $B_h$ is not too small, then it is unlikely that a point chosen from $B_h$ is near the plane $Q$. We then show in Section 4.4 that it is unlikely that $h$ is too small. Our analysis will be aided by the following definitions.

Definition 4.2.3  Set
\[
\beta_0 \stackrel{\mathrm{def}}{=} \frac{\sigma^2}{12d(1+k\sigma)^2},
\qquad
h_0 \stackrel{\mathrm{def}}{=} \beta_0\|e_Q\|,
\quad\text{and}\quad
\delta_0 \stackrel{\mathrm{def}}{=} \frac{\beta_0}{\sqrt{1+\beta_0^2}}\,\sigma^2/\bigl(2d(4+3k\sigma+\sigma/\sqrt{d})\bigr).
\]

Claim 4.2.4  Under the conditions of Lemma 4.2.2,
\[
h_0 \le \frac{\sigma^2}{6d(1+k\sigma)}.
\]

We will evaluate these integrals by breaking them into two regions split at $h_0$ and applying the following inequalities, which are proved using the results of Section 4.3.

Lemma 4.2.5  Under the conditions of Lemma 4.2.2, for $h \ge h_0$,
\[
\Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,] < e\epsilon/\delta_0, \quad\text{and} \tag{15}
\]
\[
\mathbf{E}_{\bar{e}\leftarrow_\rho B_h}[\langle\bar{e}|q\rangle] \ge \delta_0/4e. \tag{16}
\]
For $h \le h_0$,
\[
\Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,] < \left(\frac{h_0}{h}\right)\frac{e\epsilon}{\delta_0}, \quad\text{and} \tag{17}
\]
\[
\mathbf{E}_{\bar{e}\leftarrow_\rho B_h}[\langle\bar{e}|q\rangle] \ge \left(\frac{h}{h_0}\right)\frac{\delta_0}{4e}. \tag{18}
\]

Proof  By applying Lemma 4.3.1 with $a = 2+k\sigma$, $b = 2+2k\sigma$ and $\tau = \sigma/\sqrt{d}$, we find
\[
\Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,] < \frac{e\epsilon}{\delta}
\quad\text{and}\quad
\mathbf{E}_{\bar{e}\leftarrow_\rho B_h}[\langle\bar{e}|q\rangle] \ge \frac{\delta}{4e},
\]
where
\[
\delta = \frac{\beta}{\sqrt{1+\beta^2}}\,\sigma^2/\bigl(2d(4+3k\sigma+\sigma/\sqrt{d})\bigr)
\quad\text{and}\quad
\beta = h/\|e_Q\|.
\]
The inequalities for $h \ge h_0$ follow from the fact that $\delta$ is a monotone increasing function of $h$. The inequalities for $h \le h_0$ are proved by writing
\[
\frac{e\epsilon}{\delta} = \frac{e\epsilon}{\delta_0}\,\frac{\delta_0}{\delta},
\quad\text{and}\quad
\frac{\delta}{4e} = \frac{\delta_0}{4e}\,\frac{\delta}{\delta_0},
\]
and observing that, for $\beta < \beta_0$,
\[
\frac{\delta_0}{\delta} = \frac{\beta_0\sqrt{1+\beta^2}}{\beta\sqrt{1+\beta_0^2}} \le \frac{\beta_0}{\beta} = \frac{h_0}{h}.
\]

In the analysis that follows, we will consider situations in which the random variable $h$ is distributed according to
\[
\zeta(h) = \lambda(h)\rho(B_h),
\]
and we will need to show that it is unlikely that $h$ is close to 0. This fact is a consequence of the main lemma of Section 4.4.

Lemma 4.2.6  Under the conditions of Lemma 4.2.2, let
\[
\zeta(h) = \lambda(h)\rho(B_h).
\]
Then, $\zeta$ satisfies
\[
\zeta(h') \ge \zeta(h)/(2e^2), \quad\text{for } 0 < h < h' < h_0. \tag{19}
\]

Proof  The proof of (19) has two parts: we first apply Lemma 4.4.1 with $\tau = \sigma/\sqrt{d}$, $a = 2+k\sigma$, $b = 2+2k\sigma$, and $k \ge 4$ to show that
\[
\rho(B_{h'}) \ge \rho(B_h)/(2e), \quad\text{for } 0 < h < h' < h_0.
\]
We then apply Lemma A.0.4 to show that, for $0 < h < h' < h_0$,
\[
\lambda(h)/\lambda(h') \le e^{h_0(2(2+k\sigma)+h_0)/2\sigma^2}
\le e^{\frac{\sigma^2}{6d(1+k\sigma)}\left(4+2k\sigma+\frac{\sigma^2}{6d(1+k\sigma)}\right)/(2\sigma^2)}
\le e^{\frac{4+2k\sigma+\frac{\sigma^2}{6d(1+k\sigma)}}{12d(1+k\sigma)}}
\le e^{1/3 + 1/(6d) + 1/(72d^2)}
\le e.
\]
Combining these two bounds we obtain a proof of (19).

Lemma 4.2.7
\[
\frac{\int_{h=0}^\infty h\,\Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,\rho(B_h)\lambda(h)\,dh}
     {\int_{h=0}^\infty h\,\rho(B_h)\lambda(h)\,dh}
\le \epsilon\,8e^3/\delta_0.
\]

Proof  For $h \ge h_0$ we apply (15) to show
\[
\frac{\int_{h=h_0}^\infty h\,\lambda(h)\rho(B_h)\Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,dh}
     {\int_{h=h_0}^\infty h\,\lambda(h)\rho(B_h)\,dh}
< e\epsilon/\delta_0. \tag{20}
\]
For $h \le h_0$ we use (17) to show
\[
\int_{h=0}^{h_0} h\,\lambda(h)\rho(B_h)\Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,dh
\le \frac{e\epsilon h_0}{\delta_0}\int_{h=0}^{h_0} \lambda(h)\rho(B_h)\,dh;
\]
whereas,
\[
\int_{h=0}^{h_0} h\,\lambda(h)\rho(B_h)\,dh
\ge \int_{h=h_0/2}^{h_0} h\,\lambda(h)\rho(B_h)\,dh
\ge (h_0/2)\int_{h=h_0/2}^{h_0} \lambda(h)\rho(B_h)\,dh.
\]
Using Lemma 4.2.6 and part (b) of Lemma A.0.7 we find
\[
\int_{h=h_0/2}^{h_0} \lambda(h)\rho(B_h)\,dh \ge \frac{1}{4e^2}\int_{h=0}^{h_0} \lambda(h)\rho(B_h)\,dh, \tag{21}
\]

so,
\[
\frac{\int_{h=0}^{h_0} h\,\lambda(h)\rho(B_h)\Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,dh}
     {\int_{h=0}^{h_0} h\,\lambda(h)\rho(B_h)\,dh}
\le \frac{(e\epsilon h_0/\delta_0)\int_{h=0}^{h_0} \lambda(h)\rho(B_h)\,dh}
         {(h_0/8e^2)\int_{h=0}^{h_0} \lambda(h)\rho(B_h)\,dh}
\le \epsilon\,8e^3/\delta_0. \tag{22}
\]
Combining equations (20) and (22) and using inequality (14) we conclude
\[
\frac{\int_{h=0}^\infty h\,\lambda(h)\rho(B_h)\Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,dh}
     {\int_{h=0}^\infty h\,\lambda(h)\rho(B_h)\,dh}
\le \epsilon\,8e^3/\delta_0. \tag{23}
\]

Lemma 4.2.8
\[
\frac{\epsilon\int_{h=0}^\infty \Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,\rho(B_h)\lambda(h)\,dh}
     {\int_{h=0}^\infty \mathbf{E}_{\bar{e}\leftarrow_\rho B_h}[\langle\bar{e}|q\rangle]\,\rho(B_h)\lambda(h)\,dh}
< 9500\epsilon^2/\delta_0 + 55000\epsilon^2\log(2/\epsilon)/\delta_0^2.
\]

Proof  As before, we will split these integrals at $h_0$; but, this time we will also split them at $\epsilon h_0$. For $h \ge h_0$, we apply (15) and (16) to show
\[
\frac{\epsilon\int_{h=h_0}^\infty \lambda(h)\rho(B_h)\Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,dh}
     {\int_{h=h_0}^\infty \lambda(h)\rho(B_h)\mathbf{E}_{\bar{e}\leftarrow_\rho B_h}[\langle\bar{e}|q\rangle]\,dh}
\le \epsilon^2\,4e^2/\delta_0^2. \tag{24}
\]
For $h \le h_0$ we use (18) to show
\[
\int_{h=0}^{h_0} \lambda(h)\rho(B_h)\mathbf{E}_{\bar{e}\leftarrow_\rho B_h}[\langle\bar{e}|q\rangle]\,dh
\ge \int_{h=h_0/2}^{h_0} \lambda(h)\rho(B_h)\mathbf{E}_{\bar{e}\leftarrow_\rho B_h}[\langle\bar{e}|q\rangle]\,dh
\ge \frac{\delta_0}{8e}\int_{h=h_0/2}^{h_0} \lambda(h)\rho(B_h)\,dh
\ge \frac{\delta_0}{32e^3}\int_0^{h_0} \lambda(h)\rho(B_h)\,dh, \tag{25}
\]
where the last inequality follows from (21). We then use (17) to bound the corresponding term in the numerator by

\[
\epsilon\int_{h=\epsilon h_0}^{h_0} \lambda(h)\rho(B_h)\Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,dh
\le (\epsilon^2 e/\delta_0)\,h_0\int_{h=\epsilon h_0}^{h_0} \lambda(h)\rho(B_h)/h\,dh
\le (\epsilon^2 e/\delta_0)\bigl(4e^2\log(2/\epsilon) + 2\bigr)\int_{h=0}^{h_0} \lambda(h)\rho(B_h)\,dh,
\]

where the last inequality follows from Lemma 4.2.6 and part (c) of Lemma A.0.7. Combining this inequality with (25) we find
\[
\frac{\epsilon\int_{h=\epsilon h_0}^{h_0} \lambda(h)\rho(B_h)\Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,dh}
     {\int_{h=0}^{h_0} \lambda(h)\rho(B_h)\mathbf{E}_{\bar{e}\leftarrow_\rho B_h}[\langle\bar{e}|q\rangle]\,dh}
\le \frac{(\epsilon^2 e/\delta_0)\bigl(4e^2\log(2/\epsilon) + 2\bigr)}{\delta_0/(32e^3)}
\le \frac{\epsilon^2\,32e^4\bigl(31\log(2/\epsilon)\bigr)}{\delta_0^2}, \quad\text{since } \log(2/\epsilon) \ge 1,
\]
\[
\le 55000\epsilon^2\log(2/\epsilon)/\delta_0^2. \tag{26}
\]

Finally,
\[
\epsilon\int_{h=0}^{\epsilon h_0} \lambda(h)\rho(B_h)\Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,dh
\le \epsilon\int_{h=0}^{\epsilon h_0} \lambda(h)\rho(B_h)\,dh
\le \epsilon^2\,2e^2\int_{h=0}^{h_0} \lambda(h)\rho(B_h)\,dh,
\]
where the last inequality follows from Lemma 4.2.6 and part (a) of Lemma A.0.7. So,
\[
\frac{\epsilon\int_{h=0}^{\epsilon h_0} \lambda(h)\rho(B_h)\Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,dh}
     {\int_{h=0}^{h_0} \lambda(h)\rho(B_h)\mathbf{E}_{\bar{e}\leftarrow_\rho B_h}[\langle\bar{e}|q\rangle]\,dh}
\le \frac{\epsilon^2\,2e^2}{\delta_0/(32e^3)}
\le 9500\epsilon^2/\delta_0. \tag{27}
\]

Combining (24), (26) and (27), and applying inequality (14), we find
\[
\frac{\epsilon\int_{h=0}^\infty \Pr_{\bar{e}\leftarrow_\rho B_h}[\,|\langle\bar{e}|q\rangle| < \epsilon\,]\,\rho(B_h)\lambda(h)\,dh}
     {\int_{h=0}^\infty \mathbf{E}_{\bar{e}\leftarrow_\rho B_h}[\langle\bar{e}|q\rangle]\,\rho(B_h)\lambda(h)\,dh}
\le 9500\epsilon^2/\delta_0 + 55000\epsilon^2\log(2/\epsilon)/\delta_0^2. \tag{28}
\]

4.3 Random point in an open prism

For an open prism
\[
B = \mathrm{Prism}_S(e),
\]
we say that the slope of $B$ is the slope of $e$ relative to the hyperplane containing its base $S$. That is, if $Q$ is the plane containing $S$ and $q$ is the unit normal to $Q$, then the slope of $B$ is
\[
\frac{\langle q|e\rangle}{\|e - q\langle q|e\rangle\|}.
\]
For example, in two dimensions if $Q$ is the $x$-axis, then the slope of $B$ is the standard slope of the vector $e$.

Lemma 4.3.1  Let $Q$ be a plane in $\mathbb{R}^d$ of co-dimension 1 that contains the origin, and let $q$ be the normal to $Q$. Let $B$ be an open prism in $\mathbb{R}^d$ with a base $S \subset Q$, where $S$ is contained in the ball of radius $b$ centered at the origin. Let $\rho$ be a Gaussian measure of variance $\tau^2$ centered at a point $v$ of norm at most $a$. If $B$ has slope $\beta$ then,

(a) $\Pr_{\bar{e}\leftarrow_\rho B}[\,|\langle\bar{e}|q\rangle| < \epsilon\delta\,] < e\epsilon$, and

(b) $\mathbf{E}_{\bar{e}\leftarrow_\rho B}[\langle\bar{e}|q\rangle] \ge \delta/4e$,

where
\[
\delta = \bigl(\beta/\sqrt{1+\beta^2}\bigr)\,\tau^2/2(a+b+\tau).
\]

Proof  Let $e$ be the vector of norm $\tau^2/2(a+b+\tau)$ such that
\[
B = \mathrm{Prism}_S(e).
\]
We observe that $B$ is the union of simplices $S + \gamma e$ that are isomorphic to $S$. For $\gamma \in [0,1]$, we will see that
\[
\rho(S + \gamma e) \ge \rho(S)/e.
\]
To see this, we compare the measures of $(S + \gamma e)$ with $S$ point-wise. Since each point $s \in S$ has distance at most $(a+b)$ to $v$ and $\|\gamma e\| \le \tau^2/2(a+b+\tau)$, Lemma A.0.4 allows us to conclude that
\[
\rho(s)/\rho(s + \gamma e)
\le e^{\frac{\|\gamma e\|(2(a+b+\tau)+\|\gamma e\|)}{2\tau^2}}
\le e^{\frac{(\tau^2/2(a+b+\tau))(2(a+b+\tau))}{2\tau^2} + \frac{(\tau^2/2(a+b+\tau))^2}{2\tau^2}}
\le e^{\frac{1}{2} + \frac{\tau^2}{4(a+b+\tau)^2}}
\le e^{\frac{1}{2} + \frac{1}{4}}
\le e,
\]

from which the bound on the ratio of $\rho(S+\gamma e)$ to $\rho(S)$ follows. The angle of $e$ to $Q$ is $\arctan(\beta)$, so the height of $e$ above $Q$ is
\[
\sin(\arctan(\beta))\,\|e\| = \bigl(\beta/\sqrt{1+\beta^2}\bigr)\,\tau^2/2(a+b+\tau).
\]
Thus, for $\bar{e} \in (S + \gamma e)$ the height of $\bar{e}$ above $Q$ is
\[
\langle\bar{e}|q\rangle = \gamma\bigl(\beta/\sqrt{1+\beta^2}\bigr)\,\tau^2/2(a+b+\tau) = \gamma\delta.
\]

If we let $\bar{B} = \bigcup_{\gamma\in[0,1]}(S + \gamma e)$, then we have
\[
\Pr_{\bar{e}\leftarrow_\rho B}[\,|\langle\bar{e}|q\rangle| < \epsilon\delta\,]
\le \Pr_{\bar{e}\leftarrow_\rho \bar{B}}[\,|\langle\bar{e}|q\rangle| < \epsilon\delta\,]
= \frac{\int_{\gamma=0}^\epsilon \rho(S+\gamma e)\,d\gamma}{\int_{\gamma=0}^1 \rho(S+\gamma e)\,d\gamma}
\le e\epsilon,
\]

where the last inequality follows from Lemma A.0.7 with
\[
f(\gamma) = \rho(S + \gamma e).
\]
This proves part (a). To prove part (b), we use part (a) to observe that
\[
\Pr_{\bar{e}\leftarrow_\rho B}[\langle\bar{e}|q\rangle > \delta/2e] \ge 1/2,
\]
from which (b) immediately follows.
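Part (a) can be checked empirically in two dimensions. The following Monte Carlo sketch (ours; one arbitrary instance satisfying the lemma's hypotheses, not a proof) conditions a Gaussian point on lying in a prism of slope $\beta$ over the $x$-axis and observes that its height falls below $\epsilon\delta$ well under an $e\epsilon$ fraction of the time:

```python
# 2-D Monte Carlo check of Lemma 4.3.1(a) (illustrative only).
import numpy as np

rng = np.random.default_rng(4)
a = b = 1.5; tau = 0.4; beta = 0.2; eps = 0.05
delta = (beta / np.sqrt(1 + beta**2)) * tau**2 / (2 * (a + b + tau))
v = np.array([0.5, 0.0])                   # center of rho, ||v|| <= a

pts = v + tau * rng.normal(size=(4_000_000, 2))
x, y = pts[:, 0], pts[:, 1]
gamma = y / beta                           # (x, y) = (s, 0) + gamma*(1, beta)
s = x - gamma
in_prism = (gamma >= 0) & (np.abs(s) < b)  # base S = (-b, b) on the x-axis
frac = np.mean(y[in_prism] < eps * delta)  # heights are y = <e|q>
print(frac, "<", np.e * eps)               # observed well under e*eps
```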

4.4 Close prisms have close measure

In the following lemma, we compare the measures of open prisms with the same base and nearby defining vectors.

Lemma 4.4.1  Let $Q$ be a plane in $\mathbb{R}^d$ of co-dimension 1 that contains the origin, and let $q$ be the normal to $Q$. For $a, b > 1$, let $S \subset Q$ be a simplex containing the origin that is contained inside a ball of radius $b$. Let $\rho$ be a Gaussian measure of variance $\tau^2$ centered at a point $v$ of norm at most $a$. Let $e_Q$ be a point on $Q$, and let $h$ and $h'$ be such that
\[
0 < h < h' < \beta_0\|e_Q\|,
\]
where
\[
\beta_0 < \frac{\tau^2}{3(a+b+4\tau)^2}.
\]
Then,
\[
\rho(B_{h'}) > \rho(B_h)/(2e),
\]
where
\[
B_h = \mathrm{Prism}_S(hq + e_Q), \quad\text{and}\quad B_{h'} = \mathrm{Prism}_S(h'q + e_Q).
\]

Proof  Let $R_0$ be the 2-plane perpendicular to $Q$ containing the origin and $e_Q$. We will compare the volumes of $B_h$ and $B_{h'}$ by integrating by 2-planes parallel to $R_0$. In each such plane $R$, we have
\[
B_h \cap R = \mathrm{Prism}_{S\cap R}(hq_R + e_R),
\]
where $q_R$ is the unit normal to $Q \cap R$ and $e_R$ is the projection of $e_Q$ onto $R$. Note that $S \cap R$ is just a line segment, so $B_h \cap R$ is an open parallelogram as depicted in Figure 4. In $R$, let the $x$-axis be the intersection of $Q$ with $R$, so that $q_R$ is a unit vector in the direction of the $y$-axis. In the plane $R$, we have

1. the interval $S \cap R$ lies inside the interval $[(-b,0),(b,0)]$,

2. the restriction of $\rho$ to $R$ is a Gaussian distribution of variance $\tau^2$ centered at the projection of $v$ to $R$, which is a point of norm at most $a$, and

3. $h/\|e_R\| = h/\|e_Q\|$ and $h'/\|e_R\| = h'/\|e_Q\|$.

These conditions allow us to apply Lemma 4.4.2 to conclude
\[
\rho(B_{h'} \cap R) > \rho(B_h \cap R)/2e.
\]
The proof now follows by integrating over $R$.

Figure 5: The open parallelograms $B_h$ and $B_{h'}$, with base endpoints $b_0$, $b_1$ and the cut-off $w$.

Lemma 4.4.2  Let $a > 1$ and $b > 1$. Then, for every

1. $b_0$ and $b_1$ such that $-b < b_0 < b_1 < b$,

2. $v \in \mathbb{R}^2$ with $\|v\| < a$,

3. $e_Q > 0$, and

4. $h$ and $h'$ such that $0 < h < h' < \beta_0 e_Q$,

where
\[
\beta_0 < \frac{\tau^2}{3(a+b+4\tau)^2},
\]
we have
\[
\rho(B_{h'}) > \rho(B_h)/(2e),
\]
where $\rho$ is a Gaussian measure with variance $\tau^2$ centered at $v$, and $B_h$ is the open parallelogram $\mathrm{Prism}_{[(b_0,0),(b_1,0)]}((e_Q,h))$ and $B_{h'}$ is the corresponding parallelogram.

Proof  For any $x$ greater than $b_0$, let $l_h(x)$ be the least $y$ such that $(x,y) \in B_h$ and let $u_h(x)$ be the greatest $y$ such that $(x,y) \in B_h$. Define $l_{h'}(x)$ and $u_{h'}(x)$ similarly. In this notation, we have
\[
\rho(B_h) = \int_{x=-\infty}^\infty\int_{y=-\infty}^\infty [(x,y)\in B_h]\,\rho(x,y)\,dy\,dx
= \int_{x=b_0}^\infty\int_{y=l_h(x)}^{u_h(x)} \rho(x,y)\,dy\,dx.
\]
From Figure 5 we immediately see that
\[
\int_{x=b_0}^{b_1}\int_{y=l_h(x)}^{u_h(x)} \rho(x,y)\,dy\,dx
\le \int_{x=b_0}^{b_1}\int_{y=l_{h'}(x)}^{u_{h'}(x)} \rho(x,y)\,dy\,dx,
\]

since for $x \le b_1$ we have $l_h(x) = l_{h'}(x) = 0$ and $u_{h'}(x) > u_h(x)$.

We will break the rest of the integral into two parts: one in which $B_h$ and $B_{h'}$ are close, and another whose contribution is exceedingly small. Let $w = \max(b_1,a) + 4\tau \le (b+a) + 4\tau$. We will now show that
\[
\int_{x=b_1}^w\int_{y=l_h(x)}^{u_h(x)} \rho(x,y)\,dy\,dx
\le e\int_{x=b_1}^w\int_{y=l_{h'}(x)}^{u_h(x)-l_h(x)+l_{h'}(x)} \rho(x,y)\,dy\,dx
\le e\int_{x=b_1}^w\int_{y=l_{h'}(x)}^{u_{h'}(x)} \rho(x,y)\,dy\,dx.
\]
The last inequality follows from the fact that
\[
u_{h'}(x) - l_{h'}(x) > u_h(x) - l_h(x).
\]
To prove the first inequality, we note that the ratio of the first two of these integrals is at least
\[
\min_{y\in[l_h(x),u_h(x)]} \frac{\rho(x, y - l_h(x) + l_{h'}(x))}{\rho(x,y)}.
\]
To lower bound this quantity, let $y \in [l_h(x), u_h(x)]$ for some $x \in [b_1, w]$, and let $y' = y - l_h(x) + l_{h'}(x)$. Now examine the distances between $v$, $(x,y)$, and $(x,y')$. First notice that the distance from $v$ to such an $(x,y)$ is at most the sum of the following three distances: (1) the distance from $v$ to the origin; (2) $w$, which is an upper bound on $x$; and (3) the maximum possible $y$ for $x \le w$, which is achieved at $x = w$. The first quantity is at most $a$ by our assumption. The second quantity is at most $(a+b) + 4\tau$. The last quantity is bounded by $\beta_0(w - b_0) \le \beta_0(a + 2b + 4\tau)$.

Let $s$ be this sum,
\[
s = (2+\beta_0)a + (1+2\beta_0)b + 4(1+\beta_0)\tau.
\]
So, the maximum distance from $v$ to any point on the segment from $(x, l_h(x))$ to $(x, l_{h'}(x) + u_h(x) - l_h(x))$ is at most $s$.

Secondly, we note that
\[
x - b_1 \le w - b_1 \le a + b + 4\tau.
\]
This implies that
\[
0 < l_h(x) < l_{h'}(x) < \beta_0(a+b+4\tau).
\]
Therefore the distance between $(x,y)$ and $(x,y')$ is at most $\delta = \beta_0(a+b+4\tau)$. So, by Lemma A.0.4,
\[
\min_{y\in[l_h(x),u_h(x)]} \frac{\rho(x, y + l_{h'}(x) - l_h(x))}{\rho(x,y)} \ge e^{-\frac{(s+\delta)\delta}{\tau^2}}.
\]
To ensure that this quantity is at least $e^{-1}$, we need to show $(s+\delta)\delta \le \tau^2$. Our assumption on $\beta_0$ implies $\beta_0 \le 1/3$; so
\[
s + \delta = (2+2\beta_0)a + (1+3\beta_0)b + 4(1+2\beta_0)\tau \le 3(a+b+4\tau),
\]
and
\[
(s+\delta)\delta \le 3\beta_0(a+b+4\tau)^2 \le \tau^2.
\]

Finally, we will show that
\[
\int_{x=w}^\infty\int_{y=-\infty}^\infty [(x,y)\in B_h]\,\rho(x,y)\,dy\,dx
\le \int_{x=b_1}^w\int_{y=-\infty}^\infty [(x,y)\in B_h]\,\rho(x,y)\,dy\,dx.
\]
First, we will compare
\[
\int_{x=w-2\tau}^w\int_{y=l_h(x)}^{u_h(x)} \rho(x,y)\,dy\,dx
\quad\text{with}\quad
\int_{x=w}^{w+2\tau}\int_{y=l_h(x)}^{u_h(x)} \rho(x,y)\,dy\,dx.
\]
We note that the areas over which these two integrals are taken are isomorphic under translation. Thus, we can lower bound the ratio of the integrals of $\rho$ over these two regions by lower-bounding the ratios of corresponding points under the isomorphism. To this end, let $v_1$ be a point in the first region and let $v_2$ be the isomorphic point in the second. We now bound the distance from $v$ to these points. Suppose $v = (x_a, y_a)$. Let $L$ be the line through $v$ perpendicular to the line through $v_1$ and $v_2$. We need to show that $v_1$ and $v_2$ lie on the same side of $L$. We will prove this by showing that the $x$-intercept of $L$ is at most $w - 2\tau$. Because the slope of the line through $v_1$ and $v_2$ is at most $\beta_0$, the $x$-intercept of $L$ is at most $x_a + \beta_0 y_a$. Thus, we need to show $x_a + \beta_0 y_a < w - 2\tau$, which is equivalent to
\[
\beta_0 y_a < (w - x_a) - 2\tau.
\]
We first show that $\beta_0 < \frac{2\tau}{a}$. By the assumption of the Lemma, we have $\beta_0 < \frac{\tau^2}{3(a+b+4\tau)^2} < \tau^2/a^2$. If $\tau/a \le 1$, then $\beta_0 < \tau^2/a^2 \le \tau/a \le \frac{2\tau}{a}$. If $\tau/a > 1$, then $\beta_0 \le \frac{2\tau}{a}$ is implied by $\beta_0 < 1/3$.

Using $w - x_a > 4\tau$ and $y_a \le a$, we derive $\beta_0 \le \frac{2\tau}{a} \le \frac{(w-x_a)-2\tau}{y_a}$, which is what we needed.

Let $r_1$ be the distance from $v$ to $v_1$ and let $r_2$ be the distance from $v$ to $v_2$. Let $\beta$ be the distance from $v$ to the line through $v_1$ and $v_2$ and let $\gamma$ be the distance from $v_1$ to $L$. Then
\[
r_2^2 - r_1^2 \ge \bigl(\beta^2 + (\gamma+2\tau)^2\bigr) - \bigl(\beta^2 + \gamma^2\bigr)
= (\gamma+2\tau)^2 - \gamma^2
\ge 2\tau(2\gamma + 2\tau)
\ge (2\tau)^2.
\]
Thus, we can conclude that
\[
\rho(v_2)/\rho(v_1) \le e^{-\frac{r_2^2 - r_1^2}{2\tau^2}} \le e^{-2},
\]
and therefore that the ratio of the two integrals is at most $e^{-2}$. As similar bounds hold for the ratio of the integrals between $w + 2i\tau$ and $w + 2(i+1)\tau$ for $i \ge 1$, we compute
\[
\frac{\int_{x=w}^\infty\int_{y=l_h(x)}^{u_h(x)} \rho(x,y)\,dy\,dx}
     {\int_{x=w-2\tau}^w\int_{y=l_h(x)}^{u_h(x)} \rho(x,y)\,dy\,dx}
< e^{-2} + e^{-4} + e^{-6} + \cdots \le 1.
\]

Combining our bounds on the integrals over $[b_0,b_1]$, $[b_1,w]$, and $[w,\infty)$, we conclude that $\rho(B_h) \le 2e\,\rho(B_{h'})$, which proves the lemma.
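Lemma 4.4.2 is also straightforward to probe by simulation. The sketch below (ours; one arbitrary two-dimensional instance satisfying the hypotheses, with parameters chosen only for illustration) estimates the Gaussian measures of the two open parallelograms by Monte Carlo and checks that raising $h$ to $h'$ within the allowed range does not shrink the measure by more than a factor of $2e$:

```python
# Monte Carlo sketch of Lemma 4.4.2 in 2-D (illustrative only).
import numpy as np

def rho_B(h, b0, b1, eQ, v, tau, n=2_000_000, seed=5):
    rng = np.random.default_rng(seed)
    pts = v + tau * rng.normal(size=(n, 2))
    x, y = pts[:, 0], pts[:, 1]
    gamma = y / h                          # (x, y) = (s, 0) + gamma*(eQ, h)
    s = x - gamma * eQ
    return np.mean((gamma >= 0) & (s > b0) & (s < b1))

a = b = 2.0; tau = 0.5; eQ = 1.0
beta0 = tau**2 / (3 * (a + b + 4 * tau)**2)
h, hp = 0.3 * beta0 * eQ, 0.9 * beta0 * eQ     # 0 < h < h' < beta0 * eQ
v = np.array([0.7, -0.4])                      # ||v|| < a
lo = rho_B(h, -1.0, 1.0, eQ, v, tau)
hi = rho_B(hp, -1.0, 1.0, eQ, v, tau)
print(hi > lo / (2 * np.e))                    # True, as the lemma predicts
```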