
Bounding Probability of Small Deviation: A Fourth Moment Approach

Simai He∗, Jiawei Zhang†, and Shuzhong Zhang‡

December 13, 2007

Abstract

In this paper we study the problem of bounding the value of the probability distribution function of a random variable X at E[X] + a, where a is a small quantity in comparison with E[X], by means of the second and the fourth moments of X. In this particular context, many classical inequalities yield only trivial bounds. By studying the primal-dual moments-generating conic optimization problems, we obtain upper bounds for Prob {X ≥ E[X] + a}, Prob {X ≥ 0}, and Prob {X ≥ a} respectively, where we assume the knowledge of the first, second and fourth moments of X. These bounds are proved to be tightest possible. As an application, we demonstrate that the new probability bounds lead to a substantial sharpening and simplification of a recent result and its analysis by Feige ([7], 2006); they also lead to new properties of the distribution of the cut values for the max-cut problem. We expect the new probability bounds to be useful in many other applications.

Keywords: probability of small deviation, fourth moment of a random variable, sum of random variables.

MSC subject classification: 60E15, 78M05, 60G50.

∗Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong. Email: [email protected].

†Department of Information, Operations, and Management Sciences, Stern School of Business, New York University, New York, USA. Email: [email protected].

‡Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong. Email: [email protected]. Research supported by Hong Kong RGC Earmarked Grants CUHK418505 and CUHK418406.


1 Introduction

For a random variable X ∈ R, we consider the problem of upper bounding

Prob {X ≥ E[X] + a} (1.1)

for a given real a. This problem has been studied extensively in the literature. Based on available information about the distribution of X, various inequalities have been developed, including the well-known Markov inequality and the Chebyshev inequality. These inequalities, stated as (1.3) and (1.4) below, have been extremely useful. However, these two inequalities by themselves could sometimes be too weak to yield useful results, especially when a is small or zero. This motivates us to develop stronger probability inequalities that can handle small deviations.

1.1 Our results

In Section 2, we start our investigation by developing upper bounds for Prob {X ≥ 0} that are relatively simple functions of the first, second, and fourth moments of X. In particular, we prove that, for any v > 0,

Prob {X ≥ 0} ≤ 1 − (4/9)(2√3 − 3)(−2M1/v + 3M2/v² − M4/v⁴). (1.2)

Here and throughout the paper, we denote Mm = E[X^m]. The above result is Theorem 2.3 of the current paper. The bound provided by (1.2) has a relatively simple closed-form expression, and we have the freedom to choose any v > 0 in the bound. Therefore, it is quite convenient to use this bound as long as the information about M1, M2, and M4 is available. The study of this type of probability bounds is motivated by a lemma used in He et al. [9], which is a special case of (1.2) when M1 = 0 with a specific choice of v.

The assumptions behind inequality (1.2) are minimal. In fact, we do not even require that E[X] ≤ 0; that is to say, we can estimate the probability that X ≥ 0 even when E[X] ≥ 0. Moreover, inequality (1.2) is non-trivial, i.e., the right hand side is less than 1, as long as E[X] < 0 or E[X]²·E[X⁴] ≤ E[X²]³. This is in contrast to many other probability inequalities in the literature, as we shall see in the next subsection.

The bound provided by (1.2), however, is not necessarily tight. It is of interest to know whether or not the bound can be further improved. We settle this issue by presenting in Theorem 2.8 a tight upper bound, which is thus the best possible bound given the moment information. As it turns out, the bound (1.2) is a very good one in view of the tight bound; it is even tight under a certain condition.

After settling the issue of the probability bound for Prob {X ≥ 0}, it is natural to consider the bound for Prob {X ≥ a} using the same moment information. This extension is useful and is nontrivial to establish. When E[X] = 0, we are able to provide a tight bound for Prob {X ≥ a} using the information of M2 and M4. The result is presented in Theorem 2.11.


Of course, inequality (1.2) may not be immediately applicable if the second and the fourth moments of X are not directly available. Fortunately, in many applications, as we shall demonstrate in this paper, it is relatively straightforward to compute or bound the second and the fourth moments.

In Section 3 and Section 4, we provide several examples to demonstrate the applicability of Theorem 2.3. Our first example regards the sum of n independent random variables. In particular, given n independent random variables X1, X2, · · · , Xn, we provide upper bounds on

Prob {∑_{i=1}^n Xi ≥ E[∑_{i=1}^n Xi] + a}.

The bounds are particularly useful when a is a relatively small non-negative real. Here the random variables Xi could be bounded from both sides, or from below only. As a special case of this result, we obtain the following bound: if each Xi is non-negative with expectation 1, then

Prob {∑_{i=1}^n Xi ≥ n + 1} ≤ 7/8.

This strengthens the main result of a recent paper by Feige [7]. In [7], a weaker upper bound of 12/13 is proved by using a completely different approach, and the proof is considerably more involved and lengthy.
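As a quick sanity check of the 7/8 claim, here is a small Monte Carlo sketch of our own (not part of the paper's proof; the Exp(1) instance and the sample sizes are arbitrary choices):

```python
import random

# Monte Carlo illustration: for nonnegative X_i with E[X_i] = 1, the bound
# above says Prob{X_1 + ... + X_n >= n + 1} <= 7/8. We try i.i.d. Exp(1)
# variables as one concrete (hypothetical) instance.
random.seed(0)
n, trials = 10, 20000
hits = sum(1 for _ in range(trials)
           if sum(random.expovariate(1.0) for _ in range(n)) >= n + 1)
estimate = hits / trials
print(estimate)   # stays well below 7/8 for this instance
```

A single instance can only illustrate consistency with the bound, of course; it says nothing about tightness over all distributions.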

In Section 4 we also apply Theorem 2.3 to the well-known weighted maximum-cut problem. Given an undirected graph G = (V, E) where each edge (u, v) has a weight w_{uv}, we wish to partition the vertices of G into two sets S1 and S2 so as to maximize the total weight of the edges (u, v) such that u ∈ S1 and v ∈ S2. A simple solution to this problem is to independently and equiprobably assign each vertex of G to either S1 or S2. We denote the total weight of edges with end-points in different sets by W. It is clear that the expected value of W is exactly (1/2)∑_{(u,v)∈E} w_{uv}. By applying Theorem 2.3, we can show that

Prob {W ≥ (1/2)∑_{(u,v)∈E} w_{uv}} > (2√3 − 3)/15

and

Prob {W ≥ (1/2 + 0.0036/|V|)∑_{(u,v)∈E} w_{uv}} > 1.2%.

Both bounds seem to be new. Furthermore, the second bound implies that for any graph, there exists a cut such that the total weight of edges in the cut is at least (1/2 + 0.0036/|V|)∑_{(u,v)∈E} w_{uv}.


1.2 Related Literature

In the literature, there are several probability inequalities based on moment information. For example, if X assumes only non-negative values, then

Prob {X ≥ a} ≤ E[X]/a. (1.3)

This is the well-known Markov inequality and gives the tightest possible bound when we know only that X is non-negative and has a given expectation. If the standard deviation of X, denoted by σ, is also available, and t > 0, then we have

Prob {X ≥ E[X] + tσ} ≤ 1/(1 + t²). (1.4)

This inequality is often referred to as the (one-sided) Chebyshev inequality. Both inequalities (1.3) and (1.4) have been extremely useful.
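To see concretely how (1.4) degenerates for small deviations while a fourth-moment bound of the type developed in Section 2 does not, here is a small numerical sketch of our own (the ±1 example is ours, not the paper's):

```python
import math

# As t -> 0, the one-sided Chebyshev bound 1/(1 + t^2) tends to 1 and says
# nothing about Prob{X >= E[X]}. The fourth-moment bound (2.11) with M1 = 0
# stays informative. Illustrated for a symmetric +-1 variable
# (M1 = 0, M2 = M4 = 1), where Prob{X >= 0} = 1/2 exactly.
def fourth_moment_bound(M2, M4):
    # sup over v > 0 of 3*M2/v^2 - M4/v^4 equals (9/4)*M2^2/M4
    # (substitute w = 1/v^2 and maximize the quadratic 3*M2*w - M4*w^2).
    return 1.0 - (4.0 / 9.0) * (2.0 * math.sqrt(3.0) - 3.0) * 2.25 * M2 * M2 / M4

chebyshev_at_zero = 1.0 / (1.0 + 0.0 ** 2)   # trivial bound of 1
b = fourth_moment_bound(1.0, 1.0)            # = 4 - 2*sqrt(3), about 0.536
print(chebyshev_at_zero, b)
```

The value 4 − 2√3 ≈ 0.536 is a genuine (non-trivial) upper bound on the true probability 1/2 here, whereas Chebyshev gives only the trivial 1.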

If we know the first three moments of X, it is shown in Bertsimas and Popescu [4] that,

Prob {X > (1 + δ)E[X]} ≤
  min( C_M²/(C_M² + δ²), (1/(1 + δ))·D_M²/(D_M² + (C_M² − δ)²) ), if δ > C_M²;
  (1/(1 + δ))·(D_M² + (1 + δ)(C_M² − δ))/(D_M² + (1 + C_M²)(C_M² − δ)), if δ ≤ C_M², (1.5)

where C_M² = (M2 − M1²)/M1² and D_M² = (M1M3 − M2²)/M1⁴, and this bound is tight. Tight bounds on Prob {X < (1 − δ)M1} and Prob {|X − M1| > δM1} are also provided in [4]. These inequalities are potentially useful for bounding small deviation probabilities as well, i.e., when δ is small. However, we noticed that in several applications that we consider in this paper, it is harder to estimate M3 than M4. Furthermore, the bound provided by inequality (1.5) could be as weak as Markov's and Chebyshev's bounds, for instance, for the problem considered by Feige [7]. We shall discuss this in more detail later.

Zelen [17] showed that, if the first four moments of X are known, then

Prob {X ≥ E[X] + tσ} ≤ (1 + t² + (t² − tκ3 − 1)²/(κ4 − κ3² − 1))⁻¹ for t ≥ (κ3 + √(κ3² + 4))/2, (1.6)

where κm = Mm/σ^m.

There are also probability inequalities that use absolute moments of the random variable X. Let

νm = E[|X − E[X]|^m].

Cantelli [3] showed that for m > 0,

Prob {|X − E[X]| ≥ a} ≤ (ν_{2m} − νm²)/(ν_{2m} − νm² + (a^m − νm)²) for a ≥ (ν_{2m}/νm)^{1/m}.


When m = 2, the above inequality reduces to the well-known (two-sided) Chebyshev inequality. Von Mises [13] proved that, for m > k > 0,

Prob {|X| ≥ a} ≤ (J^m − νm)/(J^m − a^m) for a ≥ (νm/νk)^{1/(m−k)},

where J is the root, different from a, of the equation

(J^m − a^m)/(J^k − a^k) = (νm − a^m)/(νk − a^k).

Unfortunately, it is clear from the conditions attached to the above inequalities of Zelen [17], Cantelli [3], and Von Mises [13] that none of them is applicable for bounding probabilities when the deviation is very small.

In a recent paper, He et al. [9] studied SDP relaxations for certain quadratic optimization problems. The main results establish the gap between the SDP relaxations and the quadratic optimization problems. As a key to their main results, they established the following inequality:

Prob {X ≥ E[X]} ≤ 1 − (9/20) · σ⁴/E[(X − E[X])⁴].

This inequality is a special case of Theorem 2.3. The current paper is partly motivated by [9].

Our paper is also related to Berger [2], which uses fourth moment information to bound the absolute value of a random variable. More specifically, it is shown in [2] that, for all q > 0,

E[|X|] ≥ (3√3/(2√q)) · (E[X²] − E[X⁴]/q).

This result has been used by Berger to bound the absolute value of a weighted sum of {+1, −1} unbiased random variables, and to achieve tight bounds for the total discrepancy of a set system.

Our results can be viewed as solutions to a special class of moment problems. Moment problems concern deriving bounds on the probability that a certain random variable belongs to a given set, given information on some of its moments. The study of moment problems has a long history; see Bertsimas and Popescu [4] for a brief review of this area. The tight bounds derived in our paper use an optimization method and duality theory. This duality approach was proposed independently and simultaneously by Isii [10] and Karlin [11]. Bertsimas and Popescu [4] show that, for univariate random variables, the dual of the moment problem can be formulated as a semidefinite program (SDP). This result is important because SDP problems can be solved in polynomial time within any prescribed accuracy. They also discuss the complexity of solving the dual moment problem for multivariate random variables. Recent results on moment problems can also be found in [5], [15], and [12].

The work by Bertsimas and Popescu [4] seems to have settled the moment problems for univariate random variables, i.e., given the information of the moments, one can compute the desired probability bound efficiently by solving an SDP. However, such bounds may not be conveniently used because of the lack of simple closed-form expressions.


2 The Moment Problem: Duality Approach

Let us start our discussion by considering the problem:

Z¹_P = max Prob {X ≥ 0}
 s.t. E[X] = M1
      E[X²] = M2
      E[X⁴] = M4,

or equivalently,

Z¹_P = max_{F(·)} ∫_{x≥0} 1 · dF(x)
 s.t. ∫_{x∈R} 1 · dF(x) = 1
      ∫_{x∈R} x · dF(x) = M1
      ∫_{x∈R} x² · dF(x) = M2
      ∫_{x∈R} x⁴ · dF(x) = M4, (2.7)

where the variable of this infinite dimensional optimization problem is the probability measure F(·). The dual problem of (2.7) is given as follows:

Z¹_D = min y0 + M1·y1 + M2·y2 + M4·y4
 s.t. g(x) := y0 + y1·x + y2·x² + y4·x⁴ ≥ 1_{x≥0}, ∀x ∈ R. (2.8)

We first define a feasible solution to the dual problem (2.8).

Lemma 2.1. For any u > v > 0, let c = 1/((u+v)³(u−v)) > 0 and d = 2v/(u+v)³ > 0. Define

y0 = cu⁴ + du², y1 = 2du, y2 = d − 2cu², y4 = c. (2.9)

Then (y0, y1, y2, y4) is feasible to problem (2.8) if u ≤ ((1+√3)/2)v.

Proof. If (y0, y1, y2, y4) is defined as in (2.9), then

g(x) = cx4 + (d− 2cu2)x2 + 2dux + cu4 + du2 = c(x2 − u2)2 + d(x + u)2.

It is clear that g(x) ≥ 0 for all x ∈ R. It is left to verify that g(x) ≥ 1 for all x ≥ 0. We first observe that

g(0) = (u⁴ + 2u³v − 2u²v²)/((u + v)³(u − v)).

Thus, g(0) ≥ 1 reduces to v² + 2uv − 2u² ≥ 0, which is true by the assumption that u ≤ ((1+√3)/2)v. Therefore, g(0) ≥ 1.

Notice that

g(x) = (x + u)² · (c(x − u)² + d).


Since c(x − u)² + d > 0, we have g(x) = 0 if and only if x = −u < 0. Thus x = −u < 0 is the only global minimizer of g(x), and hence also a local minimizer. Since g(x) is a polynomial of degree four, it has at most two local minimizers, including x = −u. We denote the other local minimizer by z. If z < 0, then g(x) is increasing for x ≥ 0, and thus g(x) ≥ g(0) ≥ 1. Therefore, we assume that z > 0 > −u. It follows that z must be the largest root of g′(x) = 0. But

g′(x) = 4cx(x² − u²) + 2d(x + u),

and the largest root of g′(x) = 0 is u/2 + √(u²/4 − d/(2c)). Therefore,

z = u/2 + √(u²/4 − d/(2c)) = u/2 + √(u²/4 − v(u − v)) = u/2 + |v − u/2| = v.

The last equality holds since u ≤ ((1+√3)/2)v. Now it is straightforward to verify that g(z) = g(v) = 1. Finally, we observe that the global minimum of g(x) on [0, ∞) is attained at either x = 0 or x = z. Therefore, g(x) ≥ min{g(0), g(z)} ≥ 1 for all x ≥ 0. This completes the proof.

In Lemma 2.1, if we choose u = ((1+√3)/2)v, then we have

Corollary 2.2. For any v > 0, define

y0 = 1, y1 = (8/9)(2√3 − 3)v⁻¹, y2 = (4/3)(3 − 2√3)v⁻², y4 = (4/9)(2√3 − 3)v⁻⁴. (2.10)

Then (y0, y1, y2, y4) is feasible to problem (2.8) with an objective value

1 − (4/9)(2√3 − 3)(−2M1/v + 3M2/v² − M4/v⁴).

Corollary 2.2 immediately leads to our first main result.

Theorem 2.3. For any v > 0,

Prob {X ≥ 0} ≤ 1 − (4/9)(2√3 − 3)(−2E[X]/v + 3E[X²]/v² − E[X⁴]/v⁴). (2.11)

In Theorem 2.3, we have the freedom to choose any v > 0. In particular, we could choose the v that maximizes the function

−2M1/v + 3M2/v² − M4/v⁴.

Such a v can be obtained by solving the equation

−M1v³ + 3M2v² − 2M4 = 0,

if a solution exists.

Notice that, even if we choose the best v, the bound provided in Theorem 2.3 is not necessarily tight. In what follows, we develop a tight bound for Prob {X ≥ 0}, given the first, second, and fourth moments of X. We begin with the case where the bound in Theorem 2.3 is tight. When M2 = M1² or M4 = M2², the distribution of X can be easily identified from the first two moments, and the result in Theorem 2.8 follows easily; therefore we assume M2 > M1² and M4 > M2² for the remaining part of this section.

Let

α = √((M4 − M2²)/(M2 − M1²)) > 0

and

Vmin = √((√3 − 1)M2 + ((7 − 4√3)/4)M1²) + ((2 − √3)/2)M1 ≥ 0. (2.12)

Lemma 2.4. If M2/M1² ≥ M4/M2² and α ≥ ((√3 − 1)/2)Vmin, then

Prob {X ≥ 0} ≤ 1 − (4/9)(2√3 − 3) sup_{v>0} (−2E[X]/v + 3E[X²]/v² − E[X⁴]/v⁴).

And the bound is tight.

Proof. The inequality has been established in Theorem 2.3. We need only to show that the bound is tight. In view of Lemma 2.1, it is sufficient to find a feasible solution to problem (2.7) with an objective value that is equal to the right hand side of the bound.

If M1 < 0, define f(x) = −M1x³ + 3M2x² − 2M4. Then it can be verified that f(x) is strictly increasing when x ≥ 0. By assumption, M2/M1² ≥ M4/M2², and thus

f((√3 − 1)M2/(−M1)) = 2(M2³/M1² − M4) ≥ 0.

On the other hand, by (2.12), Vmin is a solution of the equation x² − (2 − √3)M1x − (√3 − 1)M2 = 0. Thus

f(Vmin) = (√3 − 2)M1²Vmin² − (4√3 − 7)M1M2Vmin + (3√3 − 3)M2² − 2M4
        = (√3 − 2)M1²Vmin² + (√3 − 2)M2((√3 − 1)M2 − Vmin²) + (3√3 − 3)M2² − 2M4
        = 2M2² − 2M4 + (2 − √3)(M2 − M1²)Vmin²
        ≤ 0,

where the last inequality holds because of the assumption that Vmin ≤ (√3 + 1)α. By the monotonicity of f(x) when x ≥ 0, we must have Vmin ≤ (√3 − 1)M2/(−M1). Furthermore, there must exist a unique v ∈ [Vmin, (√3 − 1)M2/(−M1)] such that f(v) = 0. For simplicity, in what follows, we assume that v satisfies this condition. Also, let u = ((1+√3)/2)v.


We now define a random variable

X = −u, with probability q := (4/9)(2√3 − 3)(−2M1/v + 3M2/v² − M4/v⁴);
X = 0, with probability 1 − p − q;
X = v, with probability p := ((6 − 2√3)/3)M2/v² − ((6 − 2√3)/9)M4/v⁴ + ((4√3 − 3)/9)M1/v.

We now show that X defines a feasible solution to problem (2.7).

First of all, by the fact that f(v) = 0, i.e., M4 = (1/2)(−M1v³ + 3M2v²), we have

q = (2/3)(2√3 − 3)(−M1/v + M2/v²)

and

p = ((3 − √3)/3)M2/v² + (√3/3)M1/v.

Therefore q ≥ 0 and p ≥ 0, since v ≤ (√3 − 1)M2/(−M1). Furthermore,

p + q = (2 − √3)M1/v + (√3 − 1)M2/v² ≤ 1,

since v ≥ Vmin. Therefore, X is indeed a well-defined random variable.

It is easy to check that

E[X] = −qu + pv
     = −((2/3)(2√3 − 3)(−M1/v + M2/v²))·((1+√3)/2)v + (((3 − √3)/3)M2/v² + (√3/3)M1/v)·v
     = M1,

E[X²] = qu² + pv²
      = ((2/3)(2√3 − 3)(−M1/v + M2/v²))·((1+√3)/2)²v² + (((3 − √3)/3)M2/v² + (√3/3)M1/v)·v²
      = M2, and

E[X⁴] = qu⁴ + pv⁴
      = ((2/3)(2√3 − 3)(−M1/v + M2/v²))·((1+√3)/2)⁴v⁴ + (((3 − √3)/3)M2/v² + (√3/3)M1/v)·v⁴
      = −(1/2)M1v³ + (3/2)M2v² = M4.

Therefore, X is feasible to problem (2.7).

Finally, since u ≥ v > 0, we have

Prob {X ≥ 0} = Prob {X ≠ −u} = 1 − q = 1 − (4/9)(2√3 − 3)(−2M1/v + 3M2/v² − M4/v⁴).

This completes the proof of the lemma for the case M1 < 0.

For the case M1 ≥ 0 the proof is completely parallel, except that the solution of f(v) = 0 exists in the range v ∈ [Vmin, M2/M1]. The details are omitted here.

In order to get a tight bound for cases that are not covered in Lemma 2.4, we need to define different primal and dual variables, which are summarized in the following three lemmas.


Lemma 2.5. If M2/M1² ≥ M4/M2² and α ≤ ((√3 − 1)/2)Vmin, then

Prob {X ≥ 0} ≤ 1/2 + (1/2)·(α + 2M1)/√(4M2 + α² + 4M1α).

And the bound is tight.

Proof. Define

s = √(4M2 + α² + 4αM1), z = α + 2M1, u = (s + α)/2, v = (s − α)/2.

From the assumption α ≤ ((√3 − 1)/2)Vmin, we have

(5 + 3√3)α² − M1α − M2 ≤ 0.

It follows that

s = √(4M2 + α² + 4αM1) ≥ (3 + 2√3)α,

and thus

u = (s + α)/2 ≤ ((1 + √3)/2)·(s − α)/2 = ((1 + √3)/2)v,

which also implies that u > v > 0. Thus, by Lemma 2.1, the function

g(x) = c(x² − u²)² + d(x + u)² = cx⁴ + (d − 2cu²)x² + 2dux + cu⁴ + du²,

with c = 1/((u+v)³(u−v)) > 0 and d = 2v/(u+v)³ > 0, (implicitly) defines a feasible solution to problem (2.8). The corresponding dual objective value is

cM4 + (d − 2cu²)M2 + 2duM1 + cu⁴ + du² = (s + z)/(2s),

where we use the facts that

M2 = (s² − α²)/4 − M1α, M4 = (M2 − M1²)α² + M2², c = s⁻³α⁻¹, d = s⁻² − s⁻³α.

On the other hand, we define

X = −u (< 0), with probability q := (s − z)/(2s); X = v (> 0), with probability p := (s + z)/(2s).

We shall show that X is a feasible solution to problem (2.7). It is obvious that p + q = 1. Also, by the fact that M2 ≥ M1², we have

s = √(4M2 + α² + 4αM1) ≥ |α + 2M1| = |z|.


Therefore, p, q ≥ 0. Thus, X is a well-defined random variable.

Furthermore,

E[X] = −uq + vp = −((s + α)/2)·(s − α − 2M1)/(2s) + ((s − α)/2)·(s + α + 2M1)/(2s) = M1,

E[X²] = u²q + v²p = ((s + α)²/4)·(s − α − 2M1)/(2s) + ((s − α)²/4)·(s + α + 2M1)/(2s) = (s² − α²)/4 − M1α = M2, and

E[X⁴] = u⁴q + v⁴p = ((s + α)⁴/16)·(s − α − 2M1)/(2s) + ((s − α)⁴/16)·(s + α + 2M1)/(2s)
      = (M2 + αM1)(M2 + α² + αM1) − αM1(2M2 + α² + 2αM1)
      = (M2 − M1²)α² + M2²
      = M4.

Finally,

Prob {X ≥ 0} = Prob {X = v} = (s + z)/(2s),

which is equal to the dual objective value. This completes the proof of the lemma.

Lemma 2.6. If M2/M1² ≤ M4/M2² and M1 < 0, then

Prob {X ≥ 0} ≤ 1 − M1²/M2.

And the bound is tight.

Proof. The inequality is the well-known Chebyshev inequality, which is known to be tight; see, for example, Bertsimas and Popescu [4].

Lemma 2.7. If M2/M1² ≤ M4/M2² and M1 > 0, then the trivial bound Prob {X ≥ 0} ≤ 1 is actually tight.

Proof. A primal solution X with objective value 1 can be constructed as follows. For any t ≥ M1 > 0, define the random variable Xt by the two-point distribution

Xt = 0, with probability 1 − M1/t; Xt = t, with probability M1/t.

We have E[Xt] = M1, E[Xt²] = tM1, and E[Xt⁴] = t³M1.

Consider the function f(x) = x³/M1², which is convex when x > 0. Notice that M2 ≥ M1², f(M1²) = M1⁴, and f(M2) ≤ M4; hence the line passing through (M1², M1⁴) and (M2, M4) intersects the graph of f(x) at some t ≥ M2. Thus there exists a p ∈ [0, 1] such that

p(M1², M1⁴) + (1 − p)(t, f(t)) = (M2, M4).


Let Y be a Bernoulli random variable which takes the value 1 with probability p, and let

X = Y·X_{M1} + (1 − Y)·X_{t/M1},

where Y is independent of X_{t/M1} and X_{M1}. Then

(E[X], E[X²], E[X⁴]) = p(E[X_{M1}], E[X_{M1}²], E[X_{M1}⁴]) + (1 − p)(E[X_{t/M1}], E[X_{t/M1}²], E[X_{t/M1}⁴])
                     = p(M1, M1², M1⁴) + (1 − p)(M1, t, f(t))
                     = (M1, M2, M4).

Because X ≥ 0, this gives a feasible solution of the primal problem (2.7) with objective value 1. Since 1 is an upper bound for Z¹_P, we conclude that Z¹_P = 1 and that X is an optimal primal solution.

For the dual problem (2.8), y0 = 1, y1 = y2 = y4 = 0 is obviously a feasible solution with objective value 1. Because Z¹_D ≥ Z¹_P = 1, this is an optimal dual solution.

Lemma 2.6 and Lemma 2.7 indicate that when the fourth moment of X, i.e., M4, becomes sufficiently large, the fourth moment information is no longer useful in bounding the probability that X ≥ 0.

The following theorem summarizes the results we obtained above.

Theorem 2.8.

Prob {X ≥ 0} ≤
  1 − M1²/M2, if M4/M2² ≥ M2/M1² and M1 < 0;
  1, if M4/M2² ≥ M2/M1² and M1 > 0;
  1 − (4(2√3 − 3)/9)·sup_{v>0} (−2M1/v + 3M2/v² − M4/v⁴), if M4/M2² < M2/M1² and α ≥ ((√3 − 1)/2)Vmin;
  1/2 + (1/2)·(α + 2M1)/√(4M2 + α² + 4M1α), if M4/M2² < M2/M1² and α ≤ ((√3 − 1)/2)Vmin, (2.13)

where α := √((M4 − M2²)/(M2 − M1²)) and Vmin := √((√3 − 1)M2 + ((7 − 4√3)/4)M1²) + ((2 − √3)/2)M1. Furthermore, the bound is tight, i.e., there exists an X such that the inequality (2.13) holds as an equality.

Now we consider the special case where E[X] = 0. As we have mentioned in the introduction, He et al. [9] established the following inequality:

Prob {X ≥ E[X]} ≤ 1 − (9/20) · σ⁴/E[(X − E[X])⁴],

which has been a key to the study of SDP relaxations for a certain class of quadratic optimization problems. Here we show that this inequality can be strengthened by using Theorem 2.3.

Corollary 2.9. If E[X] = 0, then

sup_X {Prob (X ≥ 0)} =
  1 − (2√3 − 3)M2²/M4, if M4/M2² ≥ (3√3 − 3)/2;
  1/2 + √(1/4 − 1/(3 + M4/M2²)), if M4/M2² ≤ (3√3 − 3)/2.


Proof. Notice that if M1 = E[X] = 0, then Vmin = √((√3 − 1)M2) and α = √((M4 − M2²)/M2). Therefore, the condition (3√3 − 3)/2 ≤ M4/M2² is equivalent to α ≥ ((√3 − 1)/2)Vmin. The corollary follows by noting that

max_{v>0} (3M2/v² − M4/v⁴) = (9/4)·M2²/M4.

By applying Corollary 2.9, we can obtain a non-trivial bound for the probability that X ≥ a when E[X] = 0, given the information on M2 and M4.

Corollary 2.10. If E[X] = 0 and a ≥ 0, then

Prob {X ≥ a} ≤ 1 − (2√3 − 3)·(M2 + a²)²/(M4 + 6a²M2 + a⁴).

Proof. Let Y be a random variable independent of X, where Y takes only one of the two values a or −a, each with probability one half. Let Z = X + Y. Then

E[Z] = 0,
E[Z²] = E[X²] + E[Y²] = M2 + a²,
E[Z⁴] = E[X⁴] + 6E[X²]a² + a⁴ = M4 + 6a²M2 + a⁴.

Then, by Corollary 2.9,

Prob {Z ≥ 0} ≤ 1 − (2√3 − 3)·(M2 + a²)²/(M4 + 6a²M2 + a⁴).

However,

Prob {Z ≥ 0} = (Prob {X ≥ a} + Prob {X ≥ −a})/2 ≥ Prob {X ≥ a}.

The desired inequality follows.

The bound proved in Corollary 2.10 is not tight in general. A tight bound is summarized in the following theorem. Its proof, which is quite technical and similar to the proof of Theorem 2.8, is provided in the appendix.

Theorem 2.11. Let K = M4/M2² and L = M2/a². If E[X] = 0 and a ≥ 0, then

Prob {X ≥ a} ≤
  M2/(M2 + a²), if K ≥ L + 1/L − 1;
  (M4 − M2²)/(M4 − 2M2a² + a⁴), if K ≤ L + 1/L − 1 and L < 1;
  1/2 + √(1/4 − 1/(3 + M4/M2²)), if K ≤ L + 1/L − 1, L ≥ 1, and 1/√L ≥ √(K − 1) + √(K² + 2K − 3) − (√(K + 3) − √(K − 1))/2;
  min{P(v) | v ≥ a}, otherwise,


where

P(v) = 1 − (−M4 + M2(3v² + 2av + a²) + a²v² + 2av³) / (a⁴/4 + a³v + 4a²v² + 6av³ + (9/4)v⁴ + (3v³ + 4av² + 2a²v)·√(v²/4 + (a + v)²/2)).

And the bounds are tight.

3 Small Deviation Bound for Sum of Independent Random Variables

In this section, we consider the problem of bounding the probability of small deviations for a sum of independent random variables. In particular, consider n independent random variables X1, X2, · · · , Xn, each with a mean of zero. Let S = ∑_{i=1}^n Xi. We are interested in the probability that S < ∆ for some given constant ∆. For this purpose, we may directly apply Theorem 2.11; then we need to estimate E[S²] and E[S⁴]. We may also apply Theorem 2.8; in this case, we need to estimate E[(S − ∆)²] and E[(S − ∆)⁴]. We demonstrate below how this can be done.

We consider two cases. In the first case, the random variables Xi are uniformly bounded from both sides. In the second case, we assume that the random variables are uniformly bounded only from below.

Given two nonnegative constants c1 and c2, define

s(c1, c2) := max{c1² + 4c1, c2² − 4c2, c1² + c2² − 4c1c2 − 4(c2 − c1)}.

Our first result is summarized below.

Theorem 3.1. Consider n independent random variables X1, X2, · · · , Xn. Assume that ∆ > 0 is a given constant. Also assume that E[Xi] = 0 and that there exist two nonnegative constants c1 and c2 such that −c1∆ ≤ Xi ≤ c2∆. Let S = ∑_{i=1}^n Xi. Then

Prob {S < ∆} ≥ F1(∆, c1, c2) ≥ F2(c1, c2), (3.14)

where

F1(∆, c1, c2) = (4(2√3 − 3)/9)·inf_{D>0} ( √(6(D∆² + ∆⁴)/(3D² + (6 + s(c1, c2))D∆² + ∆⁴)) + (9/4)·(D + ∆²)²/(3D² + (6 + s(c1, c2))D∆² + ∆⁴) )

and

F2(c1, c2) = 4(2√3 − 3)·(s(c1, c2) + 2)/(s(c1, c2)² + 12s(c1, c2) + 24).


Proof. First of all, we can assume without loss of generality that Xi follows a two-point distribution for every i = 1, 2, · · · , n. In particular, given that E[Xi] = 0, we assume that there exist ai, bi ≥ 0 such that

Xi = −ai, with probability bi/(ai + bi); Xi = bi, with probability ai/(ai + bi).

It follows that E[Xi²] = aibi and E[Xi⁴] = aibi(ai² − aibi + bi²). Let D denote the variance of S, i.e., D = ∑_{i=1}^n E[Xi²] = ∑_{i=1}^n aibi. Therefore, E[(S − ∆)²] = D + ∆². Furthermore,

E[(S − ∆)⁴] = E[S⁴] − 4∆E[S³] + 6∆²E[S²] + ∆⁴
            = ∑_{i=1}^n E[Xi⁴] + 6∑_{i<j} E[Xi²]E[Xj²] − 4∆∑_{i=1}^n E[Xi³] + 6∆²∑_{i=1}^n E[Xi²] + ∆⁴
            = ∑_{i=1}^n (E[Xi⁴] − 4∆E[Xi³] − 3(E[Xi²])²) + 3D² + 6∆²D + ∆⁴
            = 3D² + 6∆²D + ∆⁴ + ∑_{i=1}^n aibi(ai² + bi² − 4aibi − 4∆(bi − ai)).

Notice that ai² + bi² − 4aibi − 4∆(bi − ai) is a convex function of ai when bi is fixed, and is convex in bi when ai is fixed. Therefore, an optimal solution to the optimization problem

max_{0 ≤ ai ≤ c1∆, 0 ≤ bi ≤ c2∆} (ai² + bi² − 4aibi − 4∆(bi − ai))

is attained in the set {(0, 0), (0, c2∆), (c1∆, 0), (c1∆, c2∆)}. Thus, we conclude that

ai² + bi² − 4aibi − 4∆(bi − ai) ≤ s(c1, c2)∆².

It then follows that

E[(S − ∆)⁴] ≤ 3D² + 6∆²D + ∆⁴ + s(c1, c2)D∆².

Thus by Theorem 2.3, we have, for any v > 0, that

Prob {S − ∆ < 0} ≥ (4/9)(2√3 − 3)·(2∆/v + 3(D + ∆²)/v² − (3D² + 6∆²D + ∆⁴ + s(c1, c2)D∆²)/v⁴).

In particular, we choose v such that

v⁻² = (3/2)·(D + ∆²)/(3D² + (6 + s(c1, c2))D∆² + ∆⁴).

Then we must have

Prob {S − ∆ < 0} ≥ (4/9)(2√3 − 3)·( √(6(D∆² + ∆⁴)/(3D² + (6 + s(c1, c2))D∆² + ∆⁴)) + (9/4)·(D + ∆²)²/(3D² + (6 + s(c1, c2))D∆² + ∆⁴) ) ≥ F1(∆, c1, c2).


Furthermore, it is clear that

    F_1(∆, c_1, c_2) ≥ inf_{D>0} (2√3 − 3) (D + ∆^2)^2 / ( 3D^2 + (6 + s(c_1, c_2)) D ∆^2 + ∆^4 )
                    ≥ (2√3 − 3) · 4(s(c_1, c_2) + 2) / ( s(c_1, c_2)^2 + 12 s(c_1, c_2) + 24 ) = F_2(c_1, c_2),

where the last inequality uses the fact that if we let t = ∆^2/(D + ∆^2), then

    ( 3D^2 + (6 + x) D ∆^2 + ∆^4 ) / (D + ∆^2)^2 = 3 + x t(1 − t) − 2t^2 ≤ 3 + x^2/( 4(x + 2) )

for any x > −2. This completes the proof of the theorem.
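The elementary inequality used in the last step, and the closed form of F_2 that it yields, can be checked numerically. The following Python snippet is our own sanity check, not part of the original proof; the grid resolution and the sample values of x are arbitrary:

```python
import math

def q(t, x):
    # (3D^2 + (6+x) D Delta^2 + Delta^4) / (D + Delta^2)^2, with t = Delta^2/(D + Delta^2)
    return 3 + x * t * (1 - t) - 2 * t ** 2

# 3 + x t(1-t) - 2t^2 <= 3 + x^2/(4(x+2)) on t in [0, 1], for sample values x > -2
for x in (0.0, 1.0, 5.0, 20.0):
    bound = 3 + x ** 2 / (4 * (x + 2))
    assert all(q(i / 1000, x) <= bound + 1e-12 for i in range(1001))

# two ways of writing F2 agree:
# (2 sqrt(3) - 3)/(3 + s^2/(4(s+2))) == (2 sqrt(3) - 3) * 4(s+2)/(s^2 + 12 s + 24)
s = 5.0
f2_a = (2 * math.sqrt(3) - 3) / (3 + s ** 2 / (4 * (s + 2)))
f2_b = (2 * math.sqrt(3) - 3) * 4 * (s + 2) / (s ** 2 + 12 * s + 24)
assert abs(f2_a - f2_b) < 1e-12
```

Running the block raises no assertion, confirming both the grid inequality and the algebraic identity behind F_2.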

Now we consider the case where the random variables X_i are bounded from below only. We obtain a result similar to Theorem 3.1.

Theorem 3.2. Consider n independent random variables X_1, X_2, …, X_n. Assume that ∆ > 0 is a given constant. Also assume that E[X_i] = 0 and there exists a constant c > 0 such that X_i ≥ −c∆ for every i. Let S = Σ_{i=1}^n X_i. Then for any τ > 0,

    Prob{S < ∆} ≥ e^{−1/τ} F_1(∆, c, τ max(1, c)) ≥ e^{−1/τ} F_2(c, τ max(1, c)).    (3.15)

Proof. Once again, we assume without loss of generality that there exist a_i, b_i ≥ 0 such that

    X_i = −a_i with probability b_i/(a_i + b_i), and X_i = b_i with probability a_i/(a_i + b_i).

By assumption, a_i ≤ c∆. We also assume without loss of generality that b_1 ≥ b_2 ≥ … ≥ b_n. We consider a fixed τ > 0 and define

    N = max{ 0, max{ k | b_k ≥ τ(a_1 + a_2 + … + a_k), 1 ≤ k ≤ n } }.

Let a = Σ_{i=1}^N a_i; if N = 0, then let a = 0. If N < n, then for every i > N,

    b_i ≤ b_{N+1} ≤ τ Σ_{i=1}^{N+1} a_i ≤ τ(a + a_{N+1}) ≤ τ(a + c∆).

For any i ≤ N, b_i ≥ b_N ≥ τa. Thus, if N > 0, then

    Prob{ Σ_{i=1}^N X_i = −a } = Π_{i=1}^N Prob{X_i = −a_i}
                               = Π_{i=1}^N ( 1 − a_i/(a_i + b_i) ) ≥ Π_{i=1}^N ( 1 − a_i/(a_i + τa) )
                               ≥ Π_{i=1}^N e^{−a_i/(τa)} = e^{−1/τ}.


Let Y = Σ_{i=N+1}^n X_i. Because for each i > N, a_i ≤ c∆ ≤ c(a + ∆) and b_i ≤ max(1, c)τ(a + ∆), by Theorem 3.1 we know that

    Prob{S < ∆} ≥ Prob{ Σ_{i=1}^N X_i = −a } · Prob{Y < a + ∆}
               ≥ e^{−1/τ} F_1(∆, c, τ max{1, c}) ≥ e^{−1/τ} F_2(c, τ max{1, c}).

The proof is completed.

Theorem 3.2 generalizes an inequality that was proved by Feige [7]. In particular, if every X_i is non-negative with expectation 1, then Feige proved that

    Prob{ Σ_{i=1}^n X_i ≥ n + 1 } ≤ 12/13.

For this special case, Theorem 3.2 implies a stronger result than the above inequality.

Corollary 3.3. Consider n independent random variables X_1, X_2, …, X_n, each with mean zero. If X_i ≥ −1 for all i = 1, 2, …, n, then we have that

    Prob{ Σ_{i=1}^n X_i < 1 } ≥ e^{−1/5} (1/3)(2√3 − 3) ≥ 1/8.

Proof. We apply Theorem 3.2 with c = 1 and ∆ = 1. We choose τ = 5, and thus s(c, τ) = 5. In this case,

    F_1(∆, c, τ) = inf_{D>0} (4/9)(2√3 − 3) ( √( 6(D + 1)/(3D^2 + 11D + 1) ) + (9/4)(D + 1)^2/(3D^2 + 11D + 1) ).

However,

    √( 6(D + 1)/(3D^2 + 11D + 1) ) + (9/4)(D + 1)^2/(3D^2 + 11D + 1)

is a decreasing function of D when D ≥ 0. By letting D go to infinity, we have that

    F_1(∆, c, τ) ≥ (4/9)(2√3 − 3) · (3/4) = (2√3 − 3)/3.

By Theorem 3.2, we have

    Prob{S < ∆} ≥ e^{−1/τ} F_1(∆, c, τ) = e^{−1/5} (2√3 − 3)/3,

which completes the proof of the corollary.
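The numerical constants in the corollary can be verified directly. The Python snippet below is our own check (the grid of D values is arbitrary): it confirms that the D-expression above decreases toward its limit 3/4, and that e^{−1/5}(2√3 − 3)/3 indeed exceeds 1/8:

```python
import math

# the corollary's constant: e^{-1/5} * (2*sqrt(3) - 3) / 3, compared with 1/8
lower = math.exp(-1 / 5) * (2 * math.sqrt(3) - 3) / 3

# the D-expression from the proof, at s(c, tau) = 5 and Delta = 1
def expr(D):
    denom = 3 * D ** 2 + 11 * D + 1
    return math.sqrt(6 * (D + 1) / denom) + 9 / 4 * (D + 1) ** 2 / denom

# numerically decreasing on a grid, with limit 3/4 for large D
vals = [expr(i / 10) for i in range(1, 2000)]
assert all(x >= y - 1e-9 for x, y in zip(vals, vals[1:]))
assert abs(expr(1e9) - 3 / 4) < 1e-3
assert lower > 1 / 8          # approximately 0.1267 > 0.125
```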


It would be interesting to see how strong a bound can be obtained if we apply Markov's, Chebyshev's, or Bertsimas and Popescu's (three-moment) inequality to the problem considered in Corollary 3.3. Consider the following example, where the X_i's are i.i.d. and take the values 0 and 2 with probability 1/2 each. Then for the random variable X = Σ_{i=1}^n X_i and δ = 1/n, we have M_1 = n, M_2 = n^2 + n, and M_3 = n^3 + 3n^2. Since C_M^2 = 1/n = δ, when n → ∞ the value f_1(C_M^2, D_M^2, δ) = n/(n + 1) approaches 1. Therefore, the three-moment inequality alone is not good enough to yield a good bound for the problem.
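The moment values quoted for this example are easy to confirm by exhaustive enumeration for a small n; the following Python check (ours, with n = 6 chosen arbitrarily) does so:

```python
from itertools import product

n = 6
# X = sum of n i.i.d. variables, each taking values 0 and 2 with probability 1/2
outcomes = [sum(w) for w in product((0, 2), repeat=n)]
m1 = sum(x for x in outcomes) / 2 ** n
m2 = sum(x ** 2 for x in outcomes) / 2 ** n
m3 = sum(x ** 3 for x in outcomes) / 2 ** n
assert m1 == n                      # M1 = n
assert m2 == n ** 2 + n             # M2 = n^2 + n
assert m3 == n ** 3 + 3 * n ** 2    # M3 = n^3 + 3n^2
```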

4 Applications

In many applications and rounding algorithms, Chernoff-type bounds or other similar inequalities can be applied to yield claims of the following spirit: if n > N(δ, ε), then for n independent samples X_i (1 ≤ i ≤ n) of a random variable X it follows that Prob{ max_{1≤i≤n} X_i ≤ (1 − δ)EX } ≤ ε. However, it follows from our analysis that the δ can be dropped, and we can claim the following:

Lemma 4.1. If a random variable X has kurtosis κ = E(X − EX)^4/( E(X − EX)^2 )^2 − 3, then with n = ( (2√3 + 3)/3 )(κ + 3) log(1/ε) many samples, by Theorem 2.3,

    Prob{ max_{1≤i≤n} X_i ≤ EX } ≤ ε

and

    Prob{ min_{1≤i≤n} X_i ≥ EX } ≤ ε.

That is to say, when a distribution's kurtosis κ can be estimated or upper bounded, Θ(κ log(1/ε)) many samples guarantee that, with high probability 1 − ε, we are able to draw one sample whose value is at least as good as the expected value of the distribution.

Proof. Because for each i, Prob{X_i ≤ EX} ≤ 1 − (2√3 − 3)/(κ + 3), we have

    Prob{ max_{1≤i≤n} X_i ≤ EX } ≤ ( 1 − (2√3 − 3)/(κ + 3) )^n ≤ exp( −n(2√3 − 3)/(κ + 3) ).

Thus if n ≥ ( (2√3 + 3)/3 )(κ + 3) log(1/ε), we have Prob{ max_{1≤i≤n} X_i ≤ EX } ≤ ε. The other inequality is symmetric.
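The sample-count bound can be checked numerically: taking n = ⌈((2√3 + 3)/3)(κ + 3) log(1/ε)⌉ makes the failure-probability bound above at most ε. A small Python check of this (ours; the (κ, ε) pairs are arbitrary):

```python
import math

two_r3_m3 = 2 * math.sqrt(3) - 3        # the constant 2*sqrt(3) - 3

# 1/(2*sqrt(3) - 3) = (2*sqrt(3) + 3)/3, the factor appearing in the sample count
assert abs(1 / two_r3_m3 - (2 * math.sqrt(3) + 3) / 3) < 1e-12

for kappa, eps in ((0.0, 0.01), (5.0, 0.001), (50.0, 0.05)):
    n = math.ceil((2 * math.sqrt(3) + 3) / 3 * (kappa + 3) * math.log(1 / eps))
    # per-sample bound Prob{X_i <= EX} <= 1 - (2*sqrt(3) - 3)/(kappa + 3), raised to n
    p_fail = (1 - two_r3_m3 / (kappa + 3)) ** n
    assert p_fail <= eps
```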

Also, for sums of independent random variables we have the following:

Lemma 4.2. If X_1, …, X_n are independent random variables with EX_i = 0, EX_i^2 = D, and EX_i^4 ≤ (κ + 3)( EX_i^2 )^2, then we have that

    (2√3 − 3)/(3 + κ/n) ≤ Prob{ Σ_{i=1}^n X_i ≥ 0 } ≤ 1 − (2√3 − 3)/(3 + κ/n).


Proof. Let X = Σ_{i=1}^n X_i, D_X = Var(X) = n Var(X_i) = nD, and τ = (κ + 3)D^2. Then

    τ_X := EX^4 = Σ_{i=1}^n EX_i^4 + 6 Σ_{i<j} EX_i^2 EX_j^2 ≤ nτ + 3D_X^2 − 3nD^2 = ( 3n^2 − 3n + n(κ + 3) )D^2.

Thus

    Prob{X ≥ EX} ≤ 1 − (2√3 − 3) D_X^2/τ_X ≤ 1 − (2√3 − 3)/(3 + κ/n).

The other inequality follows by symmetry.
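For Rademacher variables (X_i = ±1, so D = 1 and κ = −2), the bound on τ_X in the proof holds with equality, which can be confirmed by enumeration. A Python check of this (our own illustration):

```python
from itertools import product

# Rademacher variables: X_i = +-1 w.p. 1/2, so E X_i^2 = D = 1, E X_i^4 = 1, kappa = -2
kappa = -2
for n in range(2, 8):
    s4 = sum(sum(w) ** 4 for w in product((-1, 1), repeat=n)) / 2 ** n
    assert s4 == 3 * n ** 2 - 2 * n                       # exact fourth moment of the sum
    assert s4 <= (3 * n ** 2 - 3 * n + n * (kappa + 3))   # the tau_X bound (equality here)
```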

Now we consider the weighted maximum cut problem. In this problem, we are given an undirected graph G = (V, E) where each edge (u, v) has a weight w_{u,v}, and the goal is to partition the vertices of G into two sets S_1 and S_2 so as to maximize the total weight of the edges (u, v) such that u ∈ S_1 and v ∈ S_2. This problem is NP-hard, but admits a polynomial-time 0.878-approximation algorithm; see Goemans and Williamson [6]. Prior to the celebrated result of Goemans and Williamson, the best known approximation ratio for the maximum cut problem was 1/2 for the weighted version, and 1/2 + 1/(2δ) for the unweighted version, where δ denotes the maximum degree of a vertex.

It is well known that a simple 1/2-approximation algorithm can be obtained by independently and equiprobably assigning each vertex of G to either S_1 or S_2. Indeed, if we denote the total weight of edges with end-points in different sets by W, then it is clear that

    E[W] = (1/2) Σ_{(u,v)∈E} w_{u,v} := (1/2) W_tot,    (4.16)

which of course is no less than half of the maximum cut weight.

Equation (4.16) has a stronger implication: for any graph, there exists a cut whose weight is at least half of the total weight of the edges of the graph. However, two interesting questions remain:

• There are O(2^{|V|}) many cuts for a graph. Among all the possible cuts, how many of them have a weight larger than W_tot/2?

• Is it possible to show that there always exists a cut with a weight higher than αW_tot for some α > 1/2?

When the graph is unweighted, the answer to the second question is "yes" with α = 1/2 + 1/(2n), and this bound is the best possible; see Haglin and Venkatesan [8]. The result is obtained by proving the existence of a matching of a certain size, which also gives a linear-time algorithm to find a cut with weight larger than (1/2 + 1/(2n))W_tot.

Now we shall answer the above two questions for a general weighted graph by using the simplerandomized algorithm described earlier, together with the moment bound developed in this paper.


We now formalize the randomized algorithm slightly. We define |V| independent random binary variables X_1, …, X_{|V|}, so that for each node i ∈ V, X_i takes the value 1 or −1 with probability 1/2 each. Thus, X_i = 1 indicates that node i is assigned to the set S_1, and X_i = −1 that it is assigned to S_2. Then we have

    W = Σ_{i<j} w_{i,j} (1 − X_i X_j)/2.
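The random-cut formula above is easy to exercise on a toy instance: enumerating all ±1 assignments of a small weighted graph confirms E[W] = W_tot/2. The following Python sketch (ours; the example graph and its weights are arbitrary) does exactly that:

```python
from itertools import product

# hypothetical 4-vertex weighted graph: {(i, j): w_ij}
w = {(0, 1): 2.0, (0, 2): 1.0, (1, 2): 3.0, (2, 3): 1.5}
w_tot = sum(w.values())

def cut_value(x):
    # x[i] in {-1, +1}; edge (i, j) contributes w_ij * (1 - x_i x_j) / 2
    return sum(wij * (1 - x[i] * x[j]) / 2 for (i, j), wij in w.items())

values = [cut_value(x) for x in product((-1, 1), repeat=4)]
mean_w = sum(values) / len(values)
assert abs(mean_w - w_tot / 2) < 1e-12   # E[W] = W_tot / 2 under uniform assignment
```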

For convenience, we also define

    Y = W − (1/2) W_tot,

so that E[Y] = 0. We now estimate the second and the fourth moments of the random variable Y:

    E[Y^2] = (1/4) E[ ( Σ_{i<j} w_{i,j} X_i X_j )^2 ] = (1/4) Σ_{i<j} w_{i,j}^2 ≥ W_tot^2/( 2|V|^2 ).    (4.17)

It can also be shown that

    E[Y^4] ≤ 15 ( E[Y^2] )^2.    (4.18)

Therefore, it follows immediately from Corollary 2.9 that

    Prob{ W ≥ (1/2) W_tot } = Prob{−Y ≤ 0} ≥ (2√3 − 3) ( E[Y^2] )^2 / E[Y^4] ≥ (2√3 − 3)/15.

Denote ∆ = ( E[Y^2]/√15 )^{1/2} > √(E[Y^2])/2 and let Z = t∆ − Y with t ≥ 0. Then we have E[Z] = t∆, E[Z^2] = E[Y^2] + t^2 ∆^2, and

    E[Z^4] = E[Y^4] − 4t∆ E[Y^3] + 6t^2 ∆^2 E[Y^2] + t^4 ∆^4
           ≤ E[Y^4] + 4t∆ ( E[Y^2] )^{1/2} ( E[Y^4] )^{1/2} + 6t^2 ∆^2 E[Y^2] + t^4 ∆^4
           ≤ E[Y^4] + 4√15 t∆ ( E[Y^2] )^{3/2} + 6t^2 ∆^2 E[Y^2] + t^4 ∆^4
           ≤ 15( E[Y^2] )^2 + 4√15 t∆ ( E[Y^2] )^{3/2} + 6t^2 ∆^2 E[Y^2] + t^4 ∆^4
           = ( 15 + 4·15^{1/4} t + (6/√15) t^2 + (1/15) t^4 ) ( E[Y^2] )^2.

Thus, by Theorem 2.8, we have, for any v > 0,

    Prob{Y ≥ t∆} = Prob{Z ≤ 0} ≥ (4/9)(2√3 − 3) ( −2E[Z]/v + 3E[Z^2]/v^2 − E[Z^4]/v^4 ).

In particular, if we choose v = 10∆ and t = 0.01, then

    Prob{Y ≥ t∆} > 1.2%.

It follows that

    Prob{ W ≥ ( 1/2 + 0.0036/|V| ) W_tot } > 1.2%.

To summarize, we have proved the following:


Theorem 4.3. For any weighted graph, the following two statements are true.

1) Among all possible cuts of the graph, at least (2√3 − 3)/15 > 3% of them have a cut value larger than half of the total weight of the edges of the graph.

2) There exists a cut whose weight is at least ( 1/2 + 0.0036/|V| ) times the total weight of the edges of the graph.

References

[1] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. MPS/SIAM Ser. Optim. 2, SIAM, Philadelphia, 2001.

[2] B. Berger. The Fourth Moment Method. SIAM Journal on Computing 26, pp. 1188–1207, 1999.

[3] F.P. Cantelli. Intorno ad un teorema fondamentale della teoria del rischio. Boll. Assoc. Attuar. Ital. (Milan), pp. 1–23, 1910.

[4] D. Bertsimas and I. Popescu. Optimal Inequalities in Probability Theory: A Convex Optimization Approach. SIAM Journal on Optimization, 2005.

[5] D. Bertsimas and I. Popescu. On the Relation Between Option and Stock Prices: A Convex Optimization Approach. Operations Research 50, No. 2, pp. 358–374, 2002.

[6] M.X. Goemans and D.P. Williamson. Improved Approximation Algorithms for Maximum Cut and Satisfiability Problems Using Semidefinite Programming. Journal of the ACM 42, pp. 1115–1145, 1995.

[7] U. Feige. On Sums of Independent Random Variables with Unbounded Variances, and Estimating the Average Degree in a Graph. SIAM Journal on Computing, 2006.

[8] D.J. Haglin and S.M. Venkatesan. Approximation and Intractability Results for the Maximum Cut Problem and Its Variants. IEEE Transactions on Computers 40, pp. 110–113, 1991.

[9] S. He, Z.-Q. Luo, J. Nie, and S. Zhang. Semidefinite Relaxation Bounds for Indefinite Homogeneous Quadratic Optimization. Working Paper, 2007.

[10] K. Isii. The extrema of probability determined by generalized moments. I. Bounded random variables. Ann. Inst. Statist. Math. 12, pp. 164–168, 1960.

[11] S. Karlin and W.J. Studden. Tchebycheff Systems: With Applications in Analysis and Statistics. Pure Appl. Math. 15, Interscience, John Wiley and Sons, New York, 1966.

[12] J.B. Lasserre. Bounds on measures satisfying moment conditions. Ann. Appl. Probab. 12, pp. 1114–1137, 2002.

[13] R. von Mises. The Limits of a Distribution Function if Two Expected Values Are Given. Ann. Math. Statist. 10, pp. 99–104, 1939.

[14] Yu. Nesterov. Structure of Non-Negative Polynomials and Optimization Problems. Preprint DP 9749, Louvain-la-Neuve, Belgium, 1997.

[15] I. Popescu. A Semidefinite Programming Approach to Optimal Moment Bounds for Convex Classes of Distributions. Mathematics of Operations Research, 2005.

[16] J. Smith. Generalized Chebyshev Inequalities: Theory and Applications in Decision Analysis. Operations Research 43, pp. 807–825, 1995.

[17] M. Zelen. Bounds on a Distribution Function That Are Functions of Moments to Order Four. J. Res. Nat. Bur. Stand. 53, pp. 377–381, 1954.


[13] R. Von Mises. The Limits of a Distribution Function if Two Expected Values Are Given. Ann.Math. Statist. 10, pp. 99 – 104, 1939.

[14] Yu. Nesterov. Structure of Non-Negative Polynomial and Optimization Problems. Preprint DP9749, Louvain-la-Neuve, Belgium, 1997.

[15] I. Popescu. A Semidefinite Programming Approach to Optimal Moment Bounds for ConvexClasses of Distributions. Mathematics of Operation Research, 2005.

[16] J. Smith. Generalized Chebyshev Inequalities: Theory and Applications in Decision Analysis.Operations Research, 43, 807 – 825, 1995.

[17] M. Zelen. Bounds on a Distribution Function That Are Functions of Moments to Order Four.J. Res. Nat. Bur. Stand., 53, pp. 377 – 381, 1954.

A Proof of Theorem 2.11

For Theorem 2.11, the primal problem is

    Z_P^2 = max Prob{X ≥ a}
            s.t. E[X] = 0, E[X^2] = M_2, E[X^4] = M_4,

or equivalently,

    Z_P^2 = max_{F(·)} ∫_{x ≥ a} 1 · dF(x)
            s.t. ∫_{x∈R} 1 · dF(x) = 1,
                 ∫_{x∈R} x · dF(x) = 0,
                 ∫_{x∈R} x^2 · dF(x) = M_2,
                 ∫_{x∈R} x^4 · dF(x) = M_4.    (A.19)

Its dual problem in this setting can be written as

    Z_D^2 = min y_0 + 0·y_1 + M_2·y_2 + M_4·y_4
            s.t. g(x) := y_0 + y_1·x + y_2·x^2 + y_4·x^4 ≥ 1_{x ≥ a}, for all x ∈ R.    (A.20)

Lemma A.1. For any u > v > a, let c = 1/( (u+v)^3 (u−v) ) > 0 and d = 2v/(u+v)^3 > 0. Define

    y_0 = c u^4 + d u^2,  y_1 = 2du,  y_2 = d − 2c u^2,  y_4 = c.    (A.21)

Then (y_0, y_1, y_2, y_4) is feasible to problem (A.20) if u ≤ v/2 + √( v^2/4 + (a+v)^2/2 ).


Proof. If (y_0, y_1, y_2, y_4) is defined by (A.21), then

    g(x) = c x^4 + (d − 2c u^2) x^2 + 2du x + c u^4 + d u^2 = c(x^2 − u^2)^2 + d(x + u)^2.

It is clear that g(x) ≥ 0 for all x ∈ R. It is left to verify that g(x) ≥ 1 for all x ≥ a. The condition u ≤ v/2 + √( v^2/4 + (a+v)^2/2 ) implies that

    2u^2 − 2uv ≤ (v + a)^2.

We first observe that

    ( g(a) − 1 )/c = (a^2 − u^2)^2 + (a + u)^2 (2uv − 2v^2) − (u + v)^3 (u − v)
                   = a^4 − 2a^2 (u^2 − uv + v^2) + 4av(u^2 − uv) + v^2 (v^2 + 2uv − 2u^2)
                   = (a^2 − v^2)^2 − 2(u^2 − uv)(v − a)^2
                   ≥ (a^2 − v^2)^2 − (v + a)^2 (v − a)^2
                   = 0.

Therefore g(a) ≥ 1. Notice that

    g(x) = (x + u)^2 · ( c(x − u)^2 + d ).

Since c(x − u)^2 + d > 0, we have g(x) = 0 if and only if x = −u < 0. Thus x = −u < 0 is the only global minimizer of g(x), and hence one of its local minimizers. Since g(x) is a polynomial of degree four, it has at most two local minimizers, including x = −u. We denote the other local minimizer by z. If z < a, then g(x) is increasing for x ≥ a, and thus g(x) ≥ g(a) ≥ 1. Therefore, we assume that z > 0 > −u, and z must be the largest root of g′(x) = 0. But

    g′(x) = 4c x (x^2 − u^2) + 2d(x + u),

and the largest root of g′(x) = 0 is u/2 + √( u^2/4 − d/(2c) ). Therefore,

    z = u/2 + √( u^2/4 − d/(2c) ) = u/2 + √( u^2/4 − v(u − v) ) = u/2 + |v − u/2| = v.

The last equality holds since

    u ≤ v/2 + √( v^2/4 + (a + v)^2/2 ) ≤ 2v.

Now it is straightforward to verify that g(z) = g(v) = 1.

Finally, we observe that the minimum of g(x) over [a, ∞) is attained either at x = a or at x = z. Therefore, g(x) ≥ min{ g(a), g(z) } ≥ 1 for all x ≥ a. This completes the proof.
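Lemma A.1 can also be checked numerically: for parameters satisfying the stated condition, g is nonnegative everywhere and at least 1 on [a, ∞), with g(v) = 1. The following Python grid check is our own illustration (the values a = 1, v = 1.5, the choice of u, and the grids are arbitrary):

```python
import math

# Lemma A.1 parameters: pick u with v < u <= v/2 + sqrt(v^2/4 + (a+v)^2/2)
a, v = 1.0, 1.5
u_max = v / 2 + math.sqrt(v ** 2 / 4 + (a + v) ** 2 / 2)
u = min(u_max, 2.0)
c = 1 / ((u + v) ** 3 * (u - v))
d = 2 * v / (u + v) ** 3

def g(x):
    # the dual polynomial c (x^2 - u^2)^2 + d (x + u)^2
    return c * (x ** 2 - u ** 2) ** 2 + d * (x + u) ** 2

assert all(g(-5 + i / 100) >= -1e-12 for i in range(1001))   # g >= 0 on [-5, 5]
assert all(g(a + i / 100) >= 1 - 1e-9 for i in range(1001))  # g >= 1 on [a, a + 10]
assert abs(g(v) - 1) < 1e-9                                  # the minimum on [a, inf) is g(v) = 1
```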


Proof of Theorem 2.11. When M_4 = M_2^2, the distribution has to be

    X = −√M_2 with probability 1/2, and X = √M_2 with probability 1/2.

Since

    Prob{X ≥ a} = 0 if L < 1, and Prob{X ≥ a} = 1/2 if L ≥ 1,

Theorem 2.11 holds under this condition.

If M_4 > M_2^2, the given condition on the moment information implies that strong duality holds. Thus problem (A.19) is equivalent to

    Z_P^2 = max_{F(·)} ∫_{x ≥ a} 1 · dF(x)
            s.t. ∫_{x∈R} 1 · dF(x) = 1,
                 ∫_{x∈R} x · dF(x) = 0,
                 ∫_{x∈R} x^2 · dF(x) = M_2,
                 ∫_{x∈R} x^4 · dF(x) ≤ M_4.    (A.22)

Case 1: When K ≥ L + 1/L − 1, we define

    X = −M_2/a with probability 1/(L+1), and X = a with probability L/(L+1),

which is always well defined. Furthermore,

    E[X] = −(M_2/a)·1/(L+1) + a·L/(L+1) = −aL/(L+1) + aL/(L+1) = 0,
    E[X^2] = (M_2^2/a^2)·1/(L+1) + a^2·L/(L+1) = a^2 L^2/(L+1) + a^2 L/(L+1) = a^2 L = M_2,
    E[X^4] = a^4 L^4/(L+1) + a^4 L/(L+1) = a^4 L(L^2 − L + 1) = (M_2^2/L)(L^2 − L + 1) ≤ M_2^2 K = M_4.

Therefore X is feasible to problem (A.22) with objective value Prob{X ≥ a} = L/(L+1).

We now define

    g(x) = ( (ax + M_2)/(a^2 + M_2) )^2,

which is feasible to problem (A.20). The corresponding dual objective value is

    ( a/(a^2 + M_2) )^2 M_2 + ( M_2/(a^2 + M_2) )^2 = L/(L+1)^2 + L^2/(L+1)^2 = L/(L+1).

Therefore, when K ≥ L + 1/L − 1, the inequality in Theorem 2.11 holds and is tight.
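The Case 1 construction is straightforward to verify numerically. The Python snippet below is our own check; the instance a = 1, M_2 = 2 and the particular K inside the Case 1 regime are arbitrary:

```python
# Case 1 of Theorem 2.11: K >= L + 1/L - 1, where L = M2/a^2 and K = M4/M2^2
a, M2 = 1.0, 2.0                    # hypothetical instance, so L = 2
L = M2 / a ** 2
K = (L + 1 / L - 1) + 0.5           # any K in the Case 1 regime
M4 = K * M2 ** 2

# two-point primal solution: -M2/a w.p. 1/(L+1), a w.p. L/(L+1)
pts = [(-M2 / a, 1 / (L + 1)), (a, L / (L + 1))]

def moment(k):
    return sum(prob * x ** k for x, prob in pts)

assert abs(moment(1)) < 1e-12                   # mean zero
assert abs(moment(2) - M2) < 1e-12              # second moment equals M2
assert moment(4) <= M4 + 1e-12                  # fourth moment at most M4
tail = sum(prob for x, prob in pts if x >= a)
assert abs(tail - L / (L + 1)) < 1e-12          # Prob{X >= a} = L/(L+1)
```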

Case 2: When L < 1 and K ≤ L + 1/L − 1, define

    g(x) = [ (a^2 − M_2)/(a^4 − 2a^2 M_2 + M_4) · (x^2 − a^2) + 1 ]^2,


which is a feasible solution of problem (A.20). The corresponding dual objective value is

    ( (a^2 − M_2)/(a^4 − 2a^2 M_2 + M_4) )^2 M_4 + 2 (a^2 − M_2)(M_4 − a^2 M_2)/(a^4 − 2a^2 M_2 + M_4)^2 · M_2 + (M_4 − a^2 M_2)^2/(a^4 − 2a^2 M_2 + M_4)^2
    = ( (a^2 − M_2)/(a^4 − 2a^2 M_2 + M_4) )^2 (M_4 − M_2^2) + ( (a^2 − M_2) M_2/(a^4 − 2a^2 M_2 + M_4) + (M_4 − a^2 M_2)/(a^4 − 2a^2 M_2 + M_4) )^2
    = ( (M_4 − M_2^2)/(a^4 − 2a^2 M_2 + M_4)^2 ) ( (M_4 − M_2^2) + (a^2 − M_2)^2 )
    = (M_4 − M_2^2)/(a^4 − 2a^2 M_2 + M_4).

Now we define v = √( (a^2 M_2 − M_4)/(a^2 − M_2) ), p = (M_4 − M_2^2)/(M_4 − 2M_2 a^2 + a^4), and

    X = v with probability q = ( 1 − p(1 + a/v) )/2;
        −v with probability r = ( 1 − p(1 − a/v) )/2;
        a with probability p.

Since

    a^2 M_2 − M_4 = (1/L)M_2^2 − K M_2^2 ≥ (1 − L)M_2^2 > 0,

the value v is well defined. We observe that

    p = (M_4 − M_2^2)/(M_4 − 2M_2 a^2 + a^4) = (M_4 − M_2^2)/( (M_4 − M_2^2) + (M_2 − a^2)^2 ),

thus 0 < p < 1. The conditions L < 1 and K ≤ L + 1/L − 1 imply that

    a^2 (M_4 − M_2^2)^2 + (a^2 − M_2)^3 M_4 = M_2^5 ( (K − 1)^2/L + K(1 − L)^3/L^3 )
        ≤ M_2^5 ( (1 − L)^4/L^3 + (1 − L)^3 (L^2 − L + 1)/L^4 )
        = M_2^5 (1 − L)^3/L^4
        = a^2 M_2 (a^2 − M_2)^3,

which is equivalent to

    a^2/v^2 = a^2 (a^2 − M_2)/(a^2 M_2 − M_4) ≤ ( (a^2 − M_2)^2/(M_4 − M_2^2) )^2 = (1/p − 1)^2.

Since

    1 − a/v ≤ 1 + a/v ≤ 1/p,


we have q, r ≥ 0. Noticing that p + q + r = 1, the distribution of X is well defined. Furthermore,

    E[X] = (q − r)v + pa = −pa + pa = 0,
    E[X^2] = (q + r)v^2 + pa^2 = (1 − p)v^2 + pa^2 = ( (a^2 − M_2)(a^2 M_2 − M_4) + a^2 (M_4 − M_2^2) )/( a^4 − 2a^2 M_2 + M_4 ) = M_2,
    E[X^4] = (q + r)v^4 + pa^4 = ( (a^2 M_2 − M_4)^2 + a^4 (M_4 − M_2^2) )/( a^4 − 2a^2 M_2 + M_4 ) = M_4.

Thus X is feasible to problem (A.19). Since

    v = √( a^2 − (a^4 + M_4 − 2a^2 M_2)/(a^2 − M_2) ) < a,

the corresponding primal objective value is

    Prob{X ≥ a} = p = (M_4 − M_2^2)/(M_4 − 2M_2 a^2 + a^4).

Therefore, the inequality in Theorem 2.11 is tight when L < 1 and K ≤ L + 1/L − 1.
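The Case 2 three-point construction can likewise be verified numerically. The Python snippet below is our own check; the instance a = 2, M_2 = 2, M_4 = 5 is an arbitrary point of the Case 2 regime (L = 0.5 < 1, K = 1.25 ≤ L + 1/L − 1 = 1.5):

```python
import math

# Case 2 regime: L = M2/a^2 < 1 and K = M4/M2^2 <= L + 1/L - 1
a, M2, M4 = 2.0, 2.0, 5.0
denom = a ** 4 - 2 * a ** 2 * M2 + M4
p = (M4 - M2 ** 2) / denom
v = math.sqrt((a ** 2 * M2 - M4) / (a ** 2 - M2))
q = (1 - p * (1 + a / v)) / 2
r = (1 - p * (1 - a / v)) / 2
pts = [(v, q), (-v, r), (a, p)]

def moment(k):
    return sum(prob * x ** k for x, prob in pts)

assert v < a and q >= 0 and r >= 0
assert abs(p + q + r - 1) < 1e-12
assert abs(moment(1)) < 1e-12              # mean zero
assert abs(moment(2) - M2) < 1e-12         # matches M2
assert abs(moment(4) - M4) < 1e-12         # matches M4
```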

Case 3: When L ≥ 1, K ≤ L + 1/L − 1, and

    1/√L ≥ √( K − 1 + √(K^2 + 2K − 3) ) − ( √(K+3) − √(K−1) )/2,

define

    u = ( √(M_4/M_2 + 3M_2) + √(M_4/M_2 − M_2) )/2 = √M_2 ( √(K+3) + √(K−1) )/2,
    v = ( √(M_4/M_2 + 3M_2) − √(M_4/M_2 − M_2) )/2 = √M_2 ( √(K+3) − √(K−1) )/2,
    p = 1/2 + √( 1/4 − 1/(3 + K) ) = ( √(K+3) + √(K−1) )/( 2√(K+3) ),
    q = 1/2 − √( 1/4 − 1/(3 + K) ) = ( √(K+3) − √(K−1) )/( 2√(K+3) ).

Because M_4 ≥ M_2^2 for any distribution, these values are well defined. It follows from the definition that u > v > 0. From the assumptions K ≤ L + 1/L − 1 and L ≥ 1, we have

    v = ( √(K+3) − √(K−1) )/2 · √M_2 = 2√M_2/( √(K+3) + √(K−1) ) ≥ 2√M_2/( (L+1)/√L + (L−1)/√L ) = √M_2/√L = a.

The assumption 1/√L ≥ √( K − 1 + √(K^2 + 2K − 3) ) − ( √(K+3) − √(K−1) )/2 implies that

    ( ( √(K+3) − √(K−1) )/2 + 1/√L )^2 ≥ K − 1 + √(K^2 + 2K − 3) = ( √(K+3) + √(K−1) ) √(K−1).


Therefore

    2u(u − v) ≤ (a + v)^2,

which implies that u ≤ v/2 + √( v^2/4 + (a+v)^2/2 ). It follows from Lemma A.1 that the function

    g(x) = c(x^2 − u^2)^2 + d(x + u)^2 = c x^4 + (d − 2c u^2) x^2 + 2du x + c u^4 + d u^2

with

    c = 1/( (u+v)^3 (u−v) ) = 1/( (K+3) M_2^2 √( (K+3)(K−1) ) ) > 0

and

    d = 2v/(u+v)^3 = ( √(K+3) − √(K−1) )/( M_2 (K+3) √(K+3) ) > 0

defines a feasible solution to problem (A.20). Denote t = √(K^2 + 2K − 3); then d = c M_2 (t − K + 1) and u^2 = (K + 1 + t) M_2/2. The corresponding dual objective value is

    c M_4 + (d − 2c u^2) M_2 + c u^4 + d u^2
    = c M_2^2 ( K + (t − K + 1) − (K + 1 + t) + (K + 1 + t)^2/4 + (K + 1 + t)(t − K + 1)/2 )
    = c M_2^2 ( (3/4) t^2 + ( (K + 3)/2 ) t − (K^2 + 2K − 3)/4 )
    = c M_2^2 t (t + K + 3)/2
    = 1/2 + t/( 2(K + 3) ) = p.

Now we define

    X = −u with probability q, and X = v with probability p,

which is well defined since K ≥ 1 > 0. Furthermore,

    E[X] = pv − qu = 0,
    E[X^2] = pv^2 + qu^2 = pv(u + v) = M_2,
    E[X^4] = pv^4 + qu^4 = pv(u + v)(u^2 − uv + v^2) = M_2^2 K = M_4.

Therefore, the inequality in Theorem 2.11 is tight when L ≥ 1, K ≤ L + 1/L − 1, and

    1/√L ≥ √( K − 1 + √(K^2 + 2K − 3) ) − ( √(K+3) − √(K−1) )/2.

Case 4: Now we consider the case when L ≥ 1, K ≤ L + 1/L − 1, and

    1/√L < √( K − 1 + √(K^2 + 2K − 3) ) − ( √(K+3) − √(K−1) )/2.


Our main goal is to prove that there exists a v ≥ a, with corresponding u, that generates a feasible dual solution and also satisfies the following conditions (which are the crucial conditions for the feasibility of the primal solution, and for the primal objective value to match the dual objective value):

    1. au ≤ M_2 ≤ vu;

    2. −M_4 + M_2(3v^2 + 2av + a^2) + a^2 v^2 + 2a v^3 = (u + v)^2 (u − v)(M_2 + av)/(a + u).

Define

    W(K) = √( K − 1 + √(K^2 + 2K − 3) ) − ( √(K+3) − √(K−1) )/2;
    S(K) = √( K − 1 + √(K^2 + 2K − 3) );
    V(K) = ( √(K+3) − √(K−1) )/2;
    U(K) = ( √(K+3) + √(K−1) )/2;
    u(v) = v/2 + √( v^2/4 + (a + v)^2/2 );
    t(v) = √( v^2/4 + (a + v)^2/2 ).

Because K ≥ 1, there exists a unique b ≥ 1 such that K = b^2 + 1/b^2 − 1, and

    √( K − 1 + √(K^2 + 2K − 3) ) − 2/( √(K+3) + √(K−1) ) = √( 2(b^2 − 1) ) − 1/b,

which is a monotonically increasing function of b. Since L + 1/L − 1 ≥ K = b^2 + 1/b^2 − 1, and L, b^2 ≥ 1, we can conclude that L ≥ b^2. If b^2 ≥ 2, then KL ≥ b^2 (b^2 + 1/b^2 − 1) ≥ 3. If b^2 < 2, then

    KL ≥ ( b^2 + 1/b^2 − 1 )/( √(2(b^2 − 1)) − 1/b )^2 ≥ 3.

Therefore, the assumptions L ≥ 1, K ≤ L + 1/L − 1 and 1/√L < W(K) guarantee that KL ≥ 3 and L ≥ 2. Also, since b is monotonically increasing in K, W(K) is a monotonically increasing function of K. Since W(K) is a continuous function with W(1) = −1, there exists 1 ≤ K_0 < K such that W(K_0) = 1/√L.

From Lemma A.1, for any v ≥ a > 0, by abusing the notation, let u = u(v) and t = t(v); the function g_v(x) = c(x^2 − u^2)^2 + d(x + u)^2 with d = 2v/(u+v)^3 and c = 1/( (u+v)^3 (u−v) ) is feasible for problem (A.20). Noticing that d = 2cv(u − v) and u^2 − uv = (a + v)^2/2, the corresponding dual objective value is

    P(v) = ( M_4 + M_2( 2v(u − v) − 2u^2 ) + u^4 + 2v(u − v)u^2 )/( (u + v)^3 (u − v) )
         = 1 − ( −M_4 + M_2(3v^2 + 2av + a^2) + a^2 v^2 + 2a v^3 )/( (u + v)^3 (u − v) ).


Let

    f(v) = −M_4 + M_2(3v^2 + 2av + a^2) + a^2 v^2 + 2a v^3 and h(v) = (u + v)^3 (u − v),

so that P(v) = 1 − f(v)/h(v). Then we have

    f′(v) = 2(3v + a)(M_2 + av),

and

    h′(v) = (u + v)^3 (u′ − 1) + 3(u + v)^2 (u − v)(u′ + 1)
          = (u + v)^2 ( 2(2u − v)u′ + 2u − 4v )
          = (u + v)^2 ( 4t(v) ( 1/2 + (2a + 3v)/(4t(v)) ) + v + 2t(v) − 4v )
          = 2(u + v)^2 ( a + 2t(v) ).

Therefore, using u(a) = 2a, t(a) = 3a/2, h(a) = 27a^4 and h′(a) = 72a^3,

    P′(a) = ( h′(a) f(a) − h(a) f′(a) )/h^2(a)
          = ( 72a^3 (−M_4 + 6a^2 M_2 + 3a^4) − 27a^4 · 8a(M_2 + a^2) )/h^2(a)
          = 72a^3 ( 3a^2 M_2 − M_4 )/h^2(a)
          = 72a^5 M_2 (3 − KL)/h^2(a) ≤ 0.

Also, for all v,

    (u + v)(a + 2t) − (3v + a)(a + u) = ( (3/2)v + t )(a + 2t) − (3v + a)( a + v/2 + t ) = 2t^2 − a^2 − 2va − (3/2)v^2 = 0.

Let

    v_0 = V(K_0)√M_2 and u_0 = U(K_0)√M_2.

Then, because

    a = (1/√L)√M_2 = W(K_0)√M_2,

we have

    u_0 (u_0 − v_0) = M_2 √(K_0 − 1) · ( √(K_0+3) + √(K_0−1) )/2 = (a + v_0)^2/2.

Since u_0 ≥ v_0, we have that

    u_0 = v_0/2 + √( v_0^2/4 + (a + v_0)^2/2 ) = u(v_0).


Noticing that a + v_0 = S(K_0)√M_2 and a + 2t(v_0) = a + 2u_0 − v_0 = ( S(K_0) + 2√(K_0−1) )√M_2, we have

    ( f(v_0) + M_4 − K_0 M_2^2 ) h′(v_0) − h(v_0) f′(v_0)
    = 2(u_0 + v_0)^2 M_2^{5/2} [ ( S(K_0) + 2√(K_0−1) )( −K_0 + 2V(K_0)^2 + S(K_0)^2 + V(K_0)^2 ( S(K_0)^2 − V(K_0)^2 ) )
          − ( U(K_0)^2 − V(K_0)^2 )( 2V(K_0) + S(K_0) ) V(K_0) ( U(K_0) + W(K_0) ) ]
    = 2(u_0 + v_0)^2 M_2^{5/2} [ ( S(K_0) + 2√(K_0−1) ) √(K_0^2 + 2K_0 − 3) √(K_0+3) V(K_0)
          − √(K_0^2 + 2K_0 − 3) V(K_0) ( S(K_0) + √(K_0−1) )( S(K_0) + √(K_0+3) − √(K_0−1) ) ]
    = 2(u_0 + v_0)^2 M_2^{5/2} V(K_0) √(K_0^2 + 2K_0 − 3) [ ( S(K_0) + 2√(K_0−1) ) √(K_0+3)
          − ( S(K_0) + √(K_0−1) )( S(K_0) + √(K_0+3) − √(K_0−1) ) ]
    = 0.

Therefore,

    P′(v_0) = ( h′(v_0) f(v_0) − h(v_0) f′(v_0) )/h^2(v_0) = (K_0 − K) M_2^2 h′(v_0)/h^2(v_0) < 0.

Define v_1 = ( √(3L^2 + 2L) − L − 1 )a, and the corresponding u_1 = u(v_1) = La. Letting r = √(3L^2 + 2L), we have that

    f(v_1) + M_4 − (L + 1/L − 1)M_2^2
    = a^4 [ −(L^3 − L^2 + L) + L( 12L^2 + 12L + 3 − 6(L+1)r + 2r − 2(L+1) + 1 ) + ( 4L^2 + 4L + 1 − 2(L+1)r )( 2r − 2L − 1 ) ]
    = a^4 ( (6L^2 + 10L + 4)r − (9L^3 + 21L^2 + 13L + 1) ).

Therefore,

    ( f(v_1) + M_4 − (L + 1/L − 1)M_2^2 ) h′(v_1) − h(v_1) f′(v_1)
    = 2(u_1 + v_1)^2 a^5 [ ( (6L^2 + 10L + 4)r − (9L^3 + 21L^2 + 13L + 1) )(3L + 2 − r) − (r − 1)(2L + 1 − r)(3r − 3L − 2)(r − 1) ]
    = 2(u_1 + v_1)^2 a^5 [ (27L^3 + 63L^2 + 45L + 9)r − (45L^4 + 123L^3 + 113L^2 + 37L + 2)
          − (27L^3 + 63L^2 + 45L + 9)r + (45L^4 + 123L^3 + 113L^2 + 37L + 2) ]
    = 0,

which implies that

    P′(v_1) = ( h′(v_1) f(v_1) − h(v_1) f′(v_1) )/h^2(v_1) = ( L^3 + L − L^2 − KL^2 ) a^4 h′(v_1)/h^2(v_1) ≥ 0.


Since L ≥ 2, K_0 ≥ 1 and K_0 L ≥ 3, we have that

    v_1 ≥ √L V( max(1, 3/L) ) a ≥ √L V(K_0) a = v_0.

Since L ≥ 1, we have v_1 ≥ a. Because P′(v) is a continuous function, there exists a v ∈ [max(a, v_0), v_1] such that P′(v) = 0. Let u = u(v) and t = t(v). Because u(v) is a monotonically increasing function of v, we have that

    au ≤ au_1 = M_2 = u_0 v_0 ≤ uv.

Since f′(v)h(v) = h′(v)f(v), it follows that

    f(v)/(M_2 + av) = 2(3v + a) f(v)/f′(v) = 2(3v + a) h(v)/h′(v) = (u^2 − v^2)(3v + a)/(a + 2t).

Since

    (u + v)(a + 2t) = (3v + a)(a + u),

we have

    f(v)/(M_2 + av) = (u^2 − v^2)(3v + a)/(a + 2t) = (u^2 − v^2)(u + v)/(a + u).

Therefore the corresponding dual objective value is

    P = P(v) = 1 − (M_2 + av)/( (a + u)(u + v) ).

By the definition of t and u, it is straightforward to verify that

    (3v^2 + 2au − a^2)(a + u) = ( (3/2)v^3 + 5av^2 + (5/2)a^2 v ) + ( a^2 + 2av + 3v^2 )t = 2(u + v)^2 (u − v).

Therefore,

    M_4 = −f(v) + 2a v^3 + a^2 v^2 + M_2(3v^2 + 2av + a^2)
        = 2a v^3 + a^2 v^2 + M_2(3v^2 + 2av + a^2) − (M_2 + av)(u + v)^2 (u − v)/(a + u)
        = 2a v^3 + a^2 v^2 + M_2(3v^2 + 2av + a^2) − (M_2 + av)( 3v^2 + 2au − a^2 )/2
        = M_2( v^2 + a^2 + (v + a)^2/2 − au + av ) + av( u^2 − uv − au )
        = M_2( v^2 + a^2 + u^2 − uv − au + av ) + av( u^2 − uv − au )
        = M_2( v^2 + a^2 ) − a^2 v^2 + (M_2 + av)(u − a)(u − v)
        = M_2( v^2 + a^2 ) − a^2 v^2 + (1 − P)(u^2 − a^2)(u^2 − v^2)
        = ( M_2 − (1 − P)u^2 )( v^2 + a^2 ) − P a^2 v^2 + (1 − P)u^4.

Now we define the distribution of X as

    X = v with probability q = ( M_2 − P a^2 − (1 − P)u^2 )/( v^2 − a^2 );
        −u with probability p = 1 − P;
        a with probability r = ( P v^2 + (1 − P)u^2 − M_2 )/( v^2 − a^2 ).


Because

    au ≤ M_2 ≤ uv,

it follows that

    ( u^2 − M_2 )/( u^2 − a^2 ) ≤ P ≤ ( u^2 − M_2 )/( u^2 − v^2 ).

Therefore, q, r ≥ 0. Since

    f(v) = (u^2 − v^2)(u + v)(M_2 + av)/(a + u) ≥ 0

and u > v, we have that P ≤ 1; therefore p ≥ 0 and the distribution of X is well defined. Furthermore,

    E[X] = qv − pu + ra = ( M_2 + P av − (1 − P)u^2 )/(v + a) − (1 − P)u = 0,
    E[X^2] = q v^2 + p u^2 + r a^2 = M_2,
    E[X^4] = q v^4 + p u^4 + r a^4 = ( M_2 − (1 − P)u^2 )( v^2 + a^2 ) − P a^2 v^2 + (1 − P)u^4 = M_4.

Therefore, X is feasible to problem (A.19).

Finally, since v ≥ a, the primal objective value is Prob{X ≥ a} = q + r = P, which matches the dual objective value of the dual feasible solution corresponding to v. Therefore, we have that

    Z_P^2 = Z_D^2 = P.

By Lemma A.1, any v′ ≥ a corresponds to a dual feasible solution with objective value P(v′); therefore P(v′) ≥ P, and

    P = P(v) = min{ P(v′) | v′ ≥ a }.  □
