Polynomial Optimization Problems
— Approximation Algorithms and Applications
LI, Zhening
A Thesis Submitted in Partial Fulfillment
of the Requirements for the Degree of
Doctor of Philosophy
in
Systems Engineering and Engineering Management
The Chinese University of Hong Kong
June 2011
Thesis/Assessment Committee
Professor Duan Li (Chair)
Professor Shuzhong Zhang (Thesis Advisor)
Professor Anthony Man-Cho So (Committee Member)
Professor Yinyu Ye (External Examiner)
Abstract
A polynomial optimization problem is to optimize a generic multivariate polynomial function, subject to suitable polynomial equality and inequality constraints. Such problem formulations date back to the 19th century, when the relationship between nonnegative polynomials and sums of squares was discussed by Hilbert. Polynomial
optimization is one of the fundamental problems in the field of optimization, and has
applications in a large range of areas, including biomedical engineering, control theory,
graph theory, investment science, material science, numerical linear algebra, quantum
mechanics, signal processing, speech recognition, etc. This thesis presents a study
of some important subclasses of polynomial optimization problems arising from vari-
ous applications. The focus is on optimizing a high degree polynomial function, over
some commonly encountered constraint sets, such as the Euclidean ball, the Euclidean
sphere, the intersection of co-centered ellipsoids, the binary hypercube, as well as a
combination of them. Specifically, five classes of models are discussed, i.e., optimiz-
ing a multilinear function with quadratic constraints, a homogeneous polynomial with
quadratic constraints, a general polynomial with convex constraints, a general polyno-
mial with binary constraints, and a homogeneous polynomial with binary and spherical
constraints. All the problems under consideration are NP-hard in general. The main
contribution of this thesis is on the design and analysis of polynomial-time approxima-
tion algorithms with guaranteed worst-case performance ratios. These approximation
ratios are dependent on the problem dimensions only, and the new results improve
some of the existing results in the literature. In each class of these optimization mod-
els, some application examples are discussed and results of numerical experiments are
reported, revealing good practical performance of the proposed algorithms for solving
some randomly generated test instances.
Acknowledgement
I would never have finished this work, or even started it, if it were not for my father, who
instilled in me a passion for learning and striving. Almost eight years have passed since my father's sudden passing, and he has always stayed in my heart, supporting
and encouraging me during the whole journey of my doctoral work. All the difficulties
have become quite manageable, and my confidence in myself has never been lost.
It is a great fortune to have Professor Shuzhong Zhang as my doctoral advisor, who
got me excited about various interesting research topics, as well as moralities and other
human qualities. I would like to express my heartfelt thanks to him, for his guidance, constant support and much help when in need, as well as for the many enjoyable hours of discussions that we have had over the last four years. I am greatly indebted to him.
I would like to express my gratitude to those who gave me the possibility to complete this work, especially to my former colleague, Professor Simai He, from whom I have benefited tremendously through his insightful comments, valuable suggestions and efforts in discussing technical problems. I would also like to thank Professors Duan Li, Anthony So, and Yinyu Ye for serving on my thesis committee.
I would also like to extend my gratitude to the professors, supporting staff members and fellow postgraduate students at the Department of Systems Engineering and Engineering Management for providing me with technical and non-technical support.
It has been a great pleasure to work with them. Besides, this work would not have
been possible without the general support from The Chinese University of Hong Kong.
Finally, I would like to express my deepest gratitude to the three most important
women in my family, whose love and support enabled me to complete this work. To my
dear mother: thank you for continuously supporting and taking care of me; to my dear wife: thank you for your patience, trust and encouragement; and to my dear daughter:
thank you for bringing me so much indelible happiness, I owe you a great deal!
This work is dedicated to my father
Contents
Abstract
Acknowledgement
1 Introduction
1.1 History
1.2 Applications
1.3 Algorithms
1.4 Main Contributions
2 Notations and Preliminaries
2.1 Notations and Models
2.1.1 Objective Functions
2.1.2 Constraint Sets
2.1.3 Models and Organization
2.2 Tensor Operations
2.3 Approximation Algorithms
2.4 Randomized Algorithms
2.5 Semidefinite Programming
3 Multilinear Form Optimization with Quadratic Constraints
3.1 Introduction
3.2 Multilinear Form with Spherical Constraints
3.3 Multilinear Form with Ellipsoidal Constraints
3.4 Applications
3.4.1 Singular Values of Trilinear Forms
3.4.2 Rank-One Approximation of Tensors
3.5 Numerical Experiments
3.5.1 Randomly Simulated Data
3.5.2 Data with Known Optimal Solutions
4 Homogeneous Form Optimization with Quadratic Constraints
4.1 Introduction
4.2 Homogeneous Form with Spherical Constraint
4.3 Homogeneous Form with Ellipsoidal Constraints
4.4 Mixed Form with Quadratic Constraints
4.4.1 Mixed Form with Spherical Constraints
4.4.2 Mixed Form with Ellipsoidal Constraints
4.5 Applications
4.5.1 Eigenvalues and Approximation of Tensors
4.5.2 Density Approximation in Quantum Physics
4.6 Numerical Experiments
4.6.1 Randomly Simulated Data
4.6.2 Comparison with Sum of Squares Method
5 Polynomial Optimization with Convex Constraints
5.1 Introduction
5.2 Polynomial with Ball Constraint
5.2.1 Homogenization
5.2.2 Multilinear Form Relaxation
5.2.3 Homogenizing Components Adjustment
5.2.4 Feasible Solution Assembling
5.3 Polynomial with Ellipsoidal Constraints
5.4 Polynomial with General Convex Constraints
5.5 Applications
5.5.1 Portfolio Selection with Higher Moments
5.5.2 Sensor Network Localization
5.6 Numerical Experiments
5.6.1 Randomly Simulated Data
5.6.2 Local Improvements
6 Polynomial Optimization with Binary Constraints
6.1 Introduction
6.2 Multilinear Form with Binary Constraints
6.3 Homogeneous Form with Binary Constraints
6.4 Mixed Form with Binary Constraints
6.5 Polynomial with Binary Constraints
6.6 Applications
6.6.1 Cut-Norm of Tensors
6.6.2 Maximum Complete Satisfiability
6.6.3 Box-Constrained Diophantine Equation
6.7 Numerical Experiments
6.7.1 Randomly Simulated Data
6.7.2 Data of Low-Rank Tensors
7 Homogeneous Form Optimization with Mixed Constraints
7.1 Introduction
7.2 Multilinear Form with Binary and Spherical Constraints
7.3 Homogeneous Form with Binary and Spherical Constraints
7.4 Mixed Form with Binary and Spherical Constraints
7.5 Applications
7.5.1 Matrix Combinatorial Problem
7.5.2 Vector-Valued Maximum Cut
8 Conclusion and Recent Developments
Bibliography
Chapter 1
Introduction
The polynomial optimization problem is the following generic optimization model:

(POP)  min p(x)
       s.t. f_i(x) ≤ 0, i = 1, 2, . . . , m_1,
            g_j(x) = 0, j = 1, 2, . . . , m_2,
            x = (x_1, x_2, · · · , x_n)^T ∈ R^n,

where p(x), f_i(x) (i = 1, 2, . . . , m_1) and g_j(x) (j = 1, 2, . . . , m_2) are multivariate
polynomial functions. This problem is a fundamental model in the field of optimization,
and has applications in a wide range of areas. Many algorithms have been proposed
for subclasses of (POP ), and specialized software packages have been developed.
1.1 History
The modern history of polynomial optimization may date back to the 19th century when
the relationship between nonnegative polynomial function and the sum of squares of
polynomials was studied. Given a multivariate polynomial function that takes only
nonnegative values over the real numbers, can it be represented as a sum of squares
of polynomial functions? Hilbert [54] gave a concrete answer in 1888, asserting that the only cases in which every nonnegative polynomial is a sum of squares are: univariate polynomials; multivariate quadratic polynomials; and bivariate quartic polynomials. Later, Hilbert's 17th problem, one of the famous 23 problems posed in his celebrated 1900 address, asked whether every nonnegative polynomial can be expressed as a sum of squares of rational functions, i.e., a quotient of sums of squares. Given a multivariate
polynomial function that takes only nonnegative values over the real numbers, can it be
represented as a sum of squares of rational functions? This was solved in the affirmative by Artin [8] in 1927. A continuous and constructive algorithm was later found by Delzell [30] in 1984. About 10 years ago, Lasserre [70, 71] and Parrilo [93, 94] proposed a method called the sum of squares (SOS) to solve general polynomial optimization problems. The method is based on the fact that deciding whether a given polynomial
is a sum of squares can be reduced to the feasibility of a semidefinite program (SDP).
The SOS approach has a strong theoretical appeal, as it can in principle solve any
polynomial optimization problem to any given accuracy.
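To make this reduction concrete, here is a minimal hand-worked sketch (our own illustration, not a general SOS solver): with the monomial vector z = (1, x, x²)^T, a univariate polynomial p of degree 4 is a sum of squares exactly when p(x) = z^T Q z for some positive semidefinite Gram matrix Q, and searching for such a Q is an SDP feasibility problem.

    import numpy as np

    # Hand-constructed Gram matrix for p(x) = x^4 + 2x^2 + 1 with z = (1, x, x^2)^T.
    Q = np.array([[1.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0],
                  [1.0, 0.0, 1.0]])
    assert np.all(np.linalg.eigvalsh(Q) >= -1e-12)        # Q is positive semidefinite

    x = 1.7                                               # verify p(x) = z^T Q z at a point
    z = np.array([1.0, x, x**2])
    assert np.isclose(z @ Q @ z, x**4 + 2 * x**2 + 1)
    # Since Q = v v^T with v = (1, 0, 1)^T, this certifies p(x) = (1 + x^2)^2.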
1.2 Applications
Polynomial optimization has wide applications; to name a few areas: biomedical engineering, control theory, graph theory, investment science, material science, numerical linear algebra, quantum mechanics, signal processing, and speech recognition. It is basically impossible to list, even very partially, the success stories of (POP), simply due to the sheer volume of the literature. To motivate our study, below we shall nonetheless mention some sample applications to illustrate the usefulness of (POP).
Polynomial optimization has immediate applications in investment science. For instance, the celebrated mean-variance model was proposed by Markowitz [81] as early as 1952; there, the portfolio selection problem is modeled by minimizing the variance of the investments subject to a target return. In control theory, Roberts and Newmann [107]
studied polynomial optimization of stochastic feedback control for stable plants. In
diffusion magnetic resonance imaging (MRI), Barmpoutis et al. [14] presented a case
for the fourth order tensor approximation. In fact, there is a large class of (POP) models arising from tensor approximations and decompositions, which originated from applications in psychometrics and chemometrics (see an excellent survey by Kolda and Bader [68]). Polynomial optimization also has applications in signal processing.
Maricic et al. [79] proposed a quartic polynomial model for blind channel equalization
in digital communication, and Qi and Teo [101] conducted global optimization for high
degree polynomial minimization models arising from signal processing. In quantum physics, Dahl et al. [27] proposed a polynomial optimization model to verify whether a physical system is entangled or not, an important problem in the field.
Gurvits [42] showed that the entanglement verification is NP-hard in general. In fact,
the model discussed in [27] is related to the nonnegative quadratic mappings studied
by Luo et al. [76].
Among generic polynomial functions, homogeneous polynomials play an important
role in approximation theory (see e.g., two recent papers by Kroo and Szabados [69] and
Varju [117]). Essentially their results state that the homogeneous polynomial functions
are fairly ‘dense’ among continuous functions in a certain well-defined sense. As such,
optimization of homogeneous polynomials becomes important. As an example, Ghosh
et al. [39] formulated a fiber detection problem in diffusion MRI by maximizing a homogeneous polynomial function subject to the Euclidean spherical constraint, i.e.,

(HS)  max f(x)
      s.t. ‖x‖² = 1, x ∈ R^n.
The constraint of (HS) is a typical polynomial equality constraint. In this case, the degree of the homogeneous polynomial f(x) may be high. This particular model (HS) appears widely, as in the following examples. In material sciences, Soare et al. [110]
proposed some 4th, 6th and 8th order homogeneous polynomials to model the plastic
anisotropy of orthotropic sheet metal. In statistics, Micchelli and Olsen [82] considered
a maximum-likelihood estimation model in speech recognition. In numerical linear
algebra, (HS) is the formulation of an interesting problem: the eigenvalues of tensors
(see Qi [99, 100] and Ni et al. [91]). Another widely encountered application of (HS) concerns the best rank-one approximation of higher order tensors (see [67, 68]).
In fact, Markowitz’s mean-variance model [81] mentioned previously is also opti-
mization on a homogeneous polynomial, in particular, a quadratic form. Recently, an
intensified discussion on investment models involving more than the first two moments
(for instance to include the skewness and the kurtosis of the investment returns) have
been another source of inspiration underlying polynomial optimizations. Mandelbrot
and Hudson [78] made a strong case against a ‘normal view’ of the investment returns.
The use of higher moments in portfolio selection becomes quite necessary. Along that
line, several authors proposed investment models incorporating the higher moments,
e.g., de Athayde and Flore [10], Prakash et al. [96], Jondeau and Rockinger [60], and
Kleniati et al. [64]. However, in those models, the polynomial functions involved are no longer homogeneous. In particular, a very general model in [64] is

max  α ∑_{i=1}^n μ_i x_i − β ∑_{i,j=1}^n σ_{ij} x_i x_j + γ ∑_{i,j,k=1}^n ς_{ijk} x_i x_j x_k − δ ∑_{i,j,k,ℓ=1}^n κ_{ijkℓ} x_i x_j x_k x_ℓ
s.t. ∑_{i=1}^n x_i = 1,  x ≥ 0,  x ∈ R^n,
where (μ_i), (σ_{ij}), (ς_{ijk}) and (κ_{ijkℓ}) are the first four central moments of the given n assets. The nonnegative parameters α, β, γ, δ measure the investor's preference over the four moments, and they sum up to one, i.e., α + β + γ + δ = 1.
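As a small sketch of how such an objective is evaluated in practice (our own illustration with randomly generated moment data; all names are ours), each term is a tensor-vector contraction:

    import numpy as np

    def moment_objective(x, mu, sigma, sigma3, kappa, alpha, beta, gamma, delta):
        # alpha * sum mu_i x_i - beta * sum sigma_ij x_i x_j
        # + gamma * sum sigma3_ijk x_i x_j x_k - delta * sum kappa_ijkl x_i x_j x_k x_l
        return (alpha * mu @ x
                - beta * np.einsum('ij,i,j->', sigma, x, x)
                + gamma * np.einsum('ijk,i,j,k->', sigma3, x, x, x)
                - delta * np.einsum('ijkl,i,j,k,l->', kappa, x, x, x, x))

    n = 4
    rng = np.random.default_rng(0)
    mu, sigma = rng.random(n), rng.random((n, n))
    sigma3, kappa = rng.random((n, n, n)), rng.random((n, n, n, n))
    x = np.full(n, 1.0 / n)            # a feasible portfolio: nonnegative, sums to one
    print(moment_objective(x, mu, sigma, sigma3, kappa, 0.4, 0.3, 0.2, 0.1))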
Besides investment science, many other important applications of polynomial optimization involve an objective that is intrinsically inhomogeneous. Another example is the least squares formulation of the sensor network localization problem proposed in Luo and Zhang [77]. Specifically, the problem takes the form of

min  ∑_{i,j∈S} (‖x_i − x_j‖² − d_{ij}²)² + ∑_{i∈S, j∈A} (‖x_i − a_j‖² − d_{ij}²)²
s.t. x_i ∈ R³, i ∈ S,
where A and S denote the set of anchor nodes and sensor nodes respectively, d_{ij} (i ∈ S, j ∈ S ∪ A) are (possibly noisy) distance measurements, a_j (j ∈ A) denote the known positions of the anchor nodes, and x_i (i ∈ S) represent the positions of the sensor nodes to be estimated.
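A minimal sketch of this objective (our own illustration; data are randomly generated, and the objective vanishes at the true sensor positions when the measurements are noiseless):

    import numpy as np

    def localization_objective(X, anchors, d_ss, d_sa):
        # X: |S| x 3 sensor positions; anchors: |A| x 3 known anchor positions;
        # d_ss, d_sa: measured sensor-sensor and sensor-anchor distances.
        obj = 0.0
        for i in range(len(X)):
            for j in range(len(X)):
                obj += (np.sum((X[i] - X[j])**2) - d_ss[i, j]**2)**2
            for j in range(len(anchors)):
                obj += (np.sum((X[i] - anchors[j])**2) - d_sa[i, j]**2)**2
        return obj

    rng = np.random.default_rng(0)
    X_true, anchors = rng.random((5, 3)), rng.random((3, 3))
    d_ss = np.linalg.norm(X_true[:, None] - X_true[None, :], axis=2)
    d_sa = np.linalg.norm(X_true[:, None] - anchors[None, :], axis=2)
    print(localization_objective(X_true, anchors, d_ss, d_sa))   # 0.0 at the truth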
Apart from the continuous models discussed above, polynomial optimization over variables taking discrete values, in particular binary variables, is also widely studied: for example, maximizing a polynomial function over variables each taking the value 1 or −1, i.e.,

(PB)  max p(x)
      s.t. x_i ∈ {1, −1}, i = 1, 2, . . . , n.
This type of problem can be found in a great variety of application domains. Indeed, (PB) has been investigated extensively in the quadratic case, due to its connections to various graph partitioning problems, e.g., the maximum cut problem [40]. When the degree of the polynomial goes higher, the hypergraph max-cover problem is a well studied example. Given a hypergraph H = (V, E), with V being the set of vertices and E the set of hyperedges (subsets of V), where each hyperedge e ∈ E is associated with a real-valued weight w(e), the problem is to find a subset S of the vertex set V such that the total weight of the hyperedges covered by S is maximized. Denote by x_i ∈ {0, 1} (i = 1, 2, . . . , n) the indicator of whether vertex i is selected in S. The problem is then max_{x∈{0,1}^n} ∑_{e∈E} w(e) ∏_{i∈e} x_i. By the simple variable transformation x_i → (x_i + 1)/2, the problem is transformed into (PB), and vice versa, as the brute-force check below illustrates.
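A small brute-force check of this equivalence on a toy instance (our own illustration; the hyperedge weights are arbitrary):

    import itertools

    n = 4
    edges = {(0, 1): 2.0, (1, 2, 3): -1.0, (0, 3): 3.0}       # hyperedge -> weight

    def cover_value(x01):                                     # x in {0,1}^n
        return sum(w for e, w in edges.items() if all(x01[i] == 1 for i in e))

    def pb_value(x):                                          # x in {1,-1}^n
        value = 0.0
        for e, w in edges.items():
            prod = 1.0
            for i in e:
                prod *= (x[i] + 1) / 2                        # substitution x_i -> (x_i+1)/2
            value += w * prod
        return value

    best01 = max(cover_value(x) for x in itertools.product((0, 1), repeat=n))
    best_pb = max(pb_value(x) for x in itertools.product((1, -1), repeat=n))
    assert abs(best01 - best_pb) < 1e-9                       # same optimal value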
Note that the model (PB) is a fundamental problem in integer programming. As such it has received attention in the literature (see e.g., [43, 44]). It is also known as the Fourier support graph problem. Mathematically, a polynomial function p : {−1, 1}^n → R has the Fourier expansion p(x) = ∑_{S⊆{1,2,...,n}} p̂(S) ∏_{i∈S} x_i, whose support is also called the Fourier support graph. Assuming that p(x) has only succinctly (polynomially) many non-zero Fourier coefficients p̂(S), can we compute the maximum value of p(x) over the discrete hypercube {1, −1}^n, or alternatively can we find a good approximate solution in polynomial time? The latter question actually motivates the discrete polynomial optimization models studied in this thesis. In general, (PB) is closely related to finding the maximum weighted independent set in a graph. In fact, any instance of (PB) can be transformed into the maximum weighted independent set problem, and this is also the most commonly used technique in the literature for solving (PB) (see e.g., [12, 106]). The transformation uses the concept of a conflict graph of a 0-1 polynomial function; for details, one is referred to [21, 9]. Beyond its connection to graph problems, (PB) also has applications in neural networks [58, 21, 6], error-correcting codes [21, 97], etc. In fact, Bruck and Blaum [21] revealed a natural equivalence among the model (PB), maximum likelihood decoding of error-correcting codes, and finding the global maximum of a neural network. Recently, Khot and Naor [63] showed that it has applications in the problem of refuting random k-CNF formulas [32, 35, 33, 34].
If the objective polynomial function in (PB) is homogeneous, the quadratic case has likewise been studied extensively, e.g., [40, 88, 90, 5], and the homogeneous cubic case is discussed by Khot and Naor [63]. Another interesting problem of this class is the ∞ ↦ 1-norm of a matrix F = (F_{ij}), studied by Alon and Naor [5], i.e.,

‖F‖_{∞↦1} = max  ∑_{1≤i≤n_1, 1≤j≤n_2} F_{ij} x_i y_j
             s.t. x ∈ {1, −1}^{n_1}, y ∈ {1, −1}^{n_2}.

It is quite natural to extend the ∞ ↦ 1-norm to higher order tensors. In particular, the ∞ ↦ 1-norm of a d-th order tensor F = (F_{i_1 i_2···i_d}) can be defined as

max  ∑_{1≤i_1≤n_1, 1≤i_2≤n_2, ··· , 1≤i_d≤n_d} F_{i_1 i_2···i_d} x^1_{i_1} x^2_{i_2} ··· x^d_{i_d}
s.t. x^k ∈ {1, −1}^{n_k}, k = 1, 2, . . . , d.
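For intuition, the ∞ ↦ 1-norm of a small matrix can be computed by brute force over the two binary hypercubes (our own sketch; the enumeration is exponential and serves only as an illustration):

    import itertools
    import numpy as np

    def inf_to_one_norm(F):
        n1, n2 = F.shape
        best = -np.inf
        for x in itertools.product((1, -1), repeat=n1):
            for y in itertools.product((1, -1), repeat=n2):
                best = max(best, np.array(x) @ F @ np.array(y))
        return best

    F = np.array([[1.0, -2.0], [3.0, 4.0]])
    print(inf_to_one_norm(F))        # max of x^T F y over x, y in {1,-1}^n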
Another generalization of the matrix ∞ ↦ 1-norm is to extend the entry F_{ij} of the matrix F to a symmetric matrix A_{ij} ∈ R^{m×m}, i.e., the problem of

max  λ_max (∑_{1≤i≤n_1, 1≤j≤n_2} x_i y_j A_{ij})
s.t. x ∈ {1, −1}^{n_1}, y ∈ {1, −1}^{n_2},

where λ_max indicates the largest eigenvalue of a matrix. If the matrix A_{ij} ∈ R^{m_1×m_2} is not restricted to be symmetric, we may instead maximize the largest singular value, i.e.,

max  σ_max (∑_{1≤i≤n_1, 1≤j≤n_2} x_i y_j A_{ij})
s.t. x ∈ {1, −1}^{n_1}, y ∈ {1, −1}^{n_2}.

These two problems are actually equivalent to

max  ∑_{1≤i≤n_1, 1≤j≤n_2, 1≤k,ℓ≤m} F_{ijkℓ} x_i y_j z_k z_ℓ
s.t. x ∈ {1, −1}^{n_1}, y ∈ {1, −1}^{n_2},
     ‖z‖ = 1, z ∈ R^m,

and

max  ∑_{1≤i≤n_1, 1≤j≤n_2, 1≤k≤m_1, 1≤ℓ≤m_2} F_{ijkℓ} x_i y_j z_k w_ℓ
s.t. x ∈ {1, −1}^{n_1}, y ∈ {1, −1}^{n_2},
     ‖z‖ = ‖w‖ = 1, z ∈ R^{m_1}, w ∈ R^{m_2},

respectively, where F = (F_{ijkℓ}) is a fourth order tensor whose (i, j, k, ℓ)-th entry is the (k, ℓ)-th entry of the matrix A_{ij}. These two special models of (POP) extend polynomial integer programming problems to mixed integer programming problems, another important subclass of (POP) studied in this thesis.
1.3 Algorithms
Polynomial optimization problems are typically non-convex and highly nonlinear. In
most cases, (POP ) is NP-hard, even for very special instances, such as maximizing a
cubic polynomial over a sphere (see Nesterov [90]), maximizing a quadratic form in
binary variables (see e.g., Goemans and Williamson [40]), etc. The reader is referred
to de Klerk [65] for a survey on the computational complexity issues of polynomial
optimization over some simple constraint sets. In the case that the constraint set is
a simplex and the objective polynomial has a fixed degree, it is possible to derive
polynomial-time approximation schemes (PTAS) (see de Klerk et al. [66]), albeit the
result is viewed mostly as a theoretical one. In almost all practical situations, the problem is difficult to solve, both theoretically and numerically. However, the search for general and efficient algorithms for polynomial optimization has been a priority for many mathematical optimizers and researchers in various applications.
Perhaps the very first attempt at solving polynomial optimization problems is to treat them as general nonlinear programming problems, for which many algorithms and software packages are available, including KNITRO, BARON, IPOPT, SNOPT, and the Matlab optimization toolbox. However, these algorithms and solvers are not tailor-made for polynomial optimization problems, and so their performance may vary greatly from instance to instance. One direct approach is to apply the method of Lagrange multipliers to reach a set of multivariate polynomial equations, namely the Karush-Kuhn-Tucker (KKT) system that provides the necessary conditions for optimality (see e.g., [122, 39, 57]). In [39], the authors apply special algorithms for that purpose, such as the subdivision methods proposed by Mourrain and Pavone [84] and the generalized normal form algorithms designed by Mourrain and Trebuchet [85]. However, the shortcomings of these methods become apparent when the degree of the polynomial is high.
Generic solution methods based on nonlinear programming and global optimization
have been studied and tested (see e.g., Qi [98] and Qi et al. [102], and the references
therein). Recently, a tensor eigenvalue based method for a global polynomial optimization problem was also studied by Qi et al. [103]. Moreover, Parpas and Rustem [92], and
Maringer and Parpas [80] proposed diffusion-based methods to solve the non-convex
polynomial optimization models arising from portfolio selection involving higher mo-
ments. For polynomial integer programming models, e.g., (PB), the most commonly
used technique in the literature is transforming them to the maximum weighted inde-
pendent set problems (see e.g., [12, 106]), by using the concept of a conflict graph of a
0-1 polynomial function.
The sum of squares (SOS) approach has been a major systematic approach for solving general polynomial optimization problems. It was proposed by Lasserre [70, 71] and Parrilo [93, 94], and significant research on the SOS method has been conducted in the past ten years. The SOS method has a strong theoretical appeal, by constructing
a sequence of semidefinite programming (SDP) relaxations of the given polynomial
optimization problem in such a way that the corresponding optimal values are monotone
and converge to the optimal value of the original problem. Thus it can in principle solve
any instance of (POP ) to any given accuracy. For univariate polynomial optimization,
Nesterov [89] showed that the SOS method in combination with the SDP solution has polynomial-time complexity. This is also true for unconstrained multivariate quadratic polynomials and bivariate quartic polynomials, for which nonnegativity is equivalent to being a sum of squares. In general, however, the SDP problems required to be solved by the SOS method may grow very large, and the method is not practical when the problem dimension is high. At any rate, thanks to the recently developed efficient SDP solvers (e.g.,
SeDuMi of Sturm [112], SDPT3 of Toh et al. [115]), the SOS method appears to be
attractive. Henrion and Lasserre [52] developed a specialized tool known as GloptiPoly (the latest version, GloptiPoly 3, can be found in Henrion et al. [53]) for finding a global optimal solution of polynomial optimization problems via the SOS method, built upon Matlab and SeDuMi. For an overview of the recent theoretical developments, we refer
to the excellent survey by Laurent [72].
On the other hand, the intractability of general polynomial optimization motivates the search for suboptimal, or more formally, approximate solutions. In the
case that the objective polynomial is quadratic, a well known example is the semidef-
inite programming relaxation and randomization approach for the max-cut problem
due to Goemans and Williamson [40], where essentially a 0.878-approximation ratio of the model max_{x∈{1,−1}^n} x^T F x is shown, with F being the Laplacian of a given graph.
Note that the approach in [40] has been generalized subsequently by many authors,
including Nesterov [88], Ye [118, 119], Nemirovski et al. [87], Zhang [120], Charikar
and Wirth [24], Alon and Naor [5], Zhang and Huang [121], Luo et al. [75], and He et
al. [50]. In particular, when the matrix F is only known to be positive semidefinite, Nesterov [88] derived a 0.636-approximation bound for max_{x∈{1,−1}^n} x^T F x. For a general diagonal-free matrix F, Charikar and Wirth [24] derived an Ω(1/log n)-approximation bound, while inapproximability results are discussed by Arora et al. [7]. For the matrix ∞ ↦ 1-norm problem max_{x∈{1,−1}^{n_1}, y∈{1,−1}^{n_2}} x^T F y, Alon and Naor [5] derived a 0.56-approximation bound. We remark that all these approximation bounds remain hitherto the best available ones. In continuous polynomial optimization, Nemirovski et al. [87] proposed an Ω(1/log m)-approximation bound for maximizing a quadratic form over the intersection of m co-centered ellipsoids. Their models are further studied and generalized by Luo et al. [75] and He et al. [50].
Among all the successful approximation stories mentioned above, the objective polynomials are all quadratic. By contrast, there are only a few approximation results in the literature when the degree of the objective polynomial is greater than two. Perhaps the very first one is due to de Klerk et al. [66], who derived a PTAS for optimizing a fixed degree homogeneous polynomial over a simplex, which turns out to yield a PTAS for optimizing a fixed degree even form (a homogeneous polynomial with only even exponents) over the spherical constraint. Later, Barvinok [15] showed that optimizing a certain class of polynomials over the spherical constraint also admits a randomized PTAS. Note that the results in [66, 15] apply only when the objective polynomial has some special structure. A quite general result is due to Khot and Naor [63], who showed how to estimate the optimal value of the problem max_{x∈{1,−1}^n} ∑_{1≤i,j,k≤n} F_{ijk} x_i x_j x_k with (F_{ijk}) being square-free, i.e., F_{ijk} = 0 whenever two of the indices are equal. Specifically, they presented a polynomial-time randomized procedure that yields an estimated value no less than Ω(√(log n / n)) times the optimal value. Two recent papers (Luo and
)times the optimal value. Two recent papers (Luo and
Zhang [77], and Ling et al. [73]) discussed polynomial optimization problems with the
degree of objective polynomial being four, and start a whole new research on approxi-
mation algorithms for high degree polynomial optimizations, which are essentially the
main subject in this thesis. Luo and Zhang [77] considered quartic optimization, and
showed that optimizing a homogenous quartic form over the intersection of some co-
centered ellipsoids is essentially equivalent to its (quadratic) SDP relaxation problem,
which is itself also NP-hard. However, this gives a handle on the design of approx-
imation algorithms with provable worst-case approximation ratios. Ling et al. [73]
considered a special quartic optimization model. Basically, the problem is to minimize
a biquadratic function over two spherical constraints. In [73], approximate solutions
as well as exact solutions using the SOS method are considered. The approximation
bounds in [73] are indeed comparable to the bound in [77], although they are dealing
with two different models. Very recently, Zhang et al. [123] and Ling al. [74] further
studied biquadratic function optimization over quadratic constraints. The relations
with its bilinear SDP relaxation are discussed, based on which they derived some data
dependent approximation bounds.
Meanwhile, when the objective function of (POP) is a high degree inhomogeneous polynomial, we have not seen any approximation results so far, even in the relative sense (for a discussion on relative approximation algorithms, see Section 2.3). As a matter of fact, all the successful polynomial-time approximation algorithms with provable approximation ratios in the literature, e.g., the quadratic, cubic and quartic models mentioned above, depend on the homogeneity in a crucial way. Technically, a homogeneous polynomial function allows one to scale the overall function value along a given direction, which is an essential operation in proving the quality bound of an approximation algorithm. Thus, extending the solution methods and the corresponding analysis from homogeneous polynomial optimization to general inhomogeneous polynomials is not straightforward. This triggers our search for approximate solutions of (POP) with an inhomogeneous polynomial objective, which is one of the targets of this thesis.
1.4 Main Contributions
This thesis is concerned with some important and widely used subclasses of polynomial
optimization problems, including optimization of a multilinear function with quadratic
constraints, a homogeneous polynomial with quadratic constraints, a general polyno-
mial with convex constraints, a general polynomial with binary constraints, and a ho-
mogeneous polynomial with binary and spherical constraints. A detailed description of the problems studied is given in Section 2.1.3. All these problems are NP-hard in
general, and the focus is on the design and analysis of polynomial-time approximation
algorithms with provable worst-case performance ratios. We also discuss the appli-
cations of these models, and the numerical performance of the proposed algorithms.
Specifically, our contributions are highlighted as follows.
1. We propose approximation algorithms for optimization of any fixed degree homo-
geneous polynomial with quadratic constraints, which is the first such result for
approximation algorithms of polynomial optimization problems with an arbitrary
degree. The approximation ratios depend only on the dimensions of the problems
concerned. Compared with the existing results for high degree polynomial optimization, our approximation ratios improve the previous ones when specialized to their particular degrees.
2. We establish systematic linking identities between multilinear functions and homogeneous polynomials, and thus establish the same approximation ratios for homogeneous polynomial optimization as for the corresponding multilinear form relaxation problems.
3. We propose a general scheme to handle inhomogeneous polynomial optimization through the method of homogenization, and thus establish the same approximation ratios (in the relative sense) for inhomogeneous polynomial optimization as for the corresponding homogeneous polynomial relaxation problems. This yields the first approximation bound for general inhomogeneous polynomial optimization of high degree.
4. We propose several decomposition routines for polynomial optimization over different types of constraint sets, and derive approximation bounds for multilinear form optimization with respect to its lower degree relaxation problems.
5. With the availability of our proposed approximation algorithms, we illustrate
some potential modeling opportunities with the new optimization models.
This thesis is organized as follows. First, in Chapter 2, we introduce the notations and models, and provide the necessary preparations for understanding the whole thesis. Then, from Chapter 3 to Chapter 7, we discuss five subclasses of polynomial optimization problems, one subclass per chapter (a detailed description of these subclasses is given in Section 2.1). In each of these five chapters, polynomial-time approximation algorithms with provable worst-case performance ratios are proposed to solve the models concerned, followed by a discussion of their applications and/or a report on the numerical performance of the proposed algorithms. Finally, in Chapter 8, we summarize the main results of this thesis, and discuss some recent developments and future research topics.
Chapter 2
Notations and Preliminaries
2.1 Notations and Models
Throughout this thesis, we exclusively use boldface letters to denote vectors, matrices, and tensors (e.g., the decision variable x, the data matrix Q, and the tensor form F), while the usual non-bold letters are reserved for scalars (e.g., x_1 being the first component of the vector x, and Q_{ij} being an entry of the matrix Q).
2.1.1 Objective Functions
The objective functions of the optimization models studied in this thesis are all multivariate polynomial functions. The following multilinear tensor function (or multilinear form) plays a major role in the discussion:

Function T   F(x^1, x^2, · · · , x^d) := ∑_{1≤i_1≤n_1, 1≤i_2≤n_2, ··· , 1≤i_d≤n_d} F_{i_1 i_2···i_d} x^1_{i_1} x^2_{i_2} ··· x^d_{i_d},
where x^k ∈ R^{n_k} for k = 1, 2, . . . , d, and the letter 'T' signifies the notion of a tensor. In shorthand notation we denote by F = (F_{i_1 i_2···i_d}) ∈ R^{n_1×n_2×···×n_d} the corresponding d-th order tensor, and by F its multilinear form. The term 'multilinear' means that if one fixes (x^2, x^3, · · · , x^d) in the function F, then it is a linear function of x^1, and so on.
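As a minimal computational sketch (our own illustration in Python/numpy), Function T for, say, d = 3 can be evaluated by contracting the tensor against the three vectors, and the multilinearity is easy to verify:

    import numpy as np

    rng = np.random.default_rng(0)
    n1, n2, n3 = 2, 3, 4
    F = rng.standard_normal((n1, n2, n3))              # a third order tensor
    x1, x2, x3 = (rng.standard_normal(n) for n in (n1, n2, n3))

    # F(x1, x2, x3) = sum_{i,j,k} F_ijk * x1_i * x2_j * x3_k
    value = np.einsum('ijk,i,j,k->', F, x1, x2, x3)

    # Fixing (x2, x3), the form is linear in x1.
    assert np.isclose(np.einsum('ijk,i,j,k->', F, 2 * x1, x2, x3), 2 * value)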
Closely related to the tensor form F is a general d-th degree homogeneous polynomial function f(x), where x ∈ R^n. We call the tensor form F = (F_{i_1 i_2···i_d}) super-symmetric (see [67]) if each of its components F_{i_1 i_2···i_d} is invariant under all permutations of i_1, i_2, · · · , i_d. As any homogeneous quadratic function uniquely determines
a symmetric matrix, a given d-th degree homogeneous polynomial function f(x) also uniquely determines a super-symmetric tensor form. In particular, if we denote a d-th degree homogeneous polynomial function as

Function H   f(x) := ∑_{1≤i_1≤i_2≤···≤i_d≤n} F′_{i_1 i_2···i_d} x_{i_1} x_{i_2} ··· x_{i_d},

then its corresponding super-symmetric tensor form can be written as F = (F_{i_1 i_2···i_d}) ∈ R^{n^d}, with F_{i_1 i_2···i_d} ≡ F′_{i_1 i_2···i_d} / |Π(i_1, i_2, · · · , i_d)|, where |Π(i_1, i_2, · · · , i_d)| is the number of distinct permutations of the indices i_1, i_2, · · · , i_d. This super-symmetric tensor representation is indeed unique. Let F be the multilinear form defined by the super-symmetric tensor F; then we have f(x) = F(x, x, · · · , x), with x repeated d times. The letter 'H'
here is used to emphasize that the polynomial function in question is homogeneous.
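This construction can be checked numerically; a minimal sketch for d = 3 (our own illustration) builds the super-symmetric tensor from the coefficients F′ and verifies f(x) = F(x, x, x):

    import numpy as np
    from itertools import combinations_with_replacement, permutations

    n = 3
    rng = np.random.default_rng(0)
    # Coefficients F'_{ijk} of a cubic form, one per index triple i <= j <= k.
    coef = {idx: rng.standard_normal()
            for idx in combinations_with_replacement(range(n), 3)}

    F = np.zeros((n, n, n))
    for idx, c in coef.items():
        perms = set(permutations(idx))       # distinct permutations of the indices
        for p in perms:
            F[p] = c / len(perms)            # spread the coefficient evenly

    x = rng.standard_normal(n)
    f_x = sum(c * x[i] * x[j] * x[k] for (i, j, k), c in coef.items())
    assert np.isclose(f_x, np.einsum('ijk,i,j,k->', F, x, x, x))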
We shall also consider in this thesis the following mixed form:

Function M   f(x^1, x^2, · · · , x^s) := F(x^1, · · · , x^1, x^2, · · · , x^2, · · · , x^s, · · · , x^s),

where x^k appears d_k times in the argument list, with d_1 + d_2 + · · · + d_s = d and x^k ∈ R^{n_k} for k = 1, 2, . . . , s; here F ∈ R^{n_1^{d_1} × n_2^{d_2} × ··· × n_s^{d_s}} is a d-th order tensor form, and the letter 'M' signifies the notion of a mixed polynomial form. We may without loss of generality assume that F has the partial symmetry property, namely, for any fixed (x^2, x^3, · · · , x^s), the d_1-th order tensor form F(·, · · · , ·, x^2, · · · , x^2, · · · , x^s, · · · , x^s), with the first d_1 positions left open, is super-symmetric, and so on.
Beyond the homogeneous polynomial functions described above, we also study in this thesis the generic multivariate inhomogeneous polynomial function. An n-dimensional d-th degree polynomial function can be explicitly written as a summation of homogeneous polynomial functions of decreasing degrees, as follows:

Function P   p(x) := ∑_{k=1}^d f_k(x) + f_0 = ∑_{k=1}^d F_k(x, x, · · · , x) + f_0,

where x ∈ R^n, f_0 ∈ R, and f_k(x) = F_k(x, x, · · · , x) (with x repeated k times) is a homogeneous polynomial function of degree k for k = 1, 2, . . . , d; the letter 'P' signifies the notion of a polynomial. One natural way to deal with an inhomogeneous polynomial function is through homogenization; that is, we introduce a new variable, denoted by x_h in this thesis and actually set to be 1, to yield a homogeneous form:

p(x) = ∑_{k=1}^d f_k(x) + f_0 = ∑_{k=1}^d f_k(x) x_h^{d−k} + f_0 x_h^d = f(x̄),
where f(x̄) is an (n + 1)-dimensional d-th degree homogeneous polynomial function in the variable x̄ ∈ R^{n+1}. Throughout this thesis, the 'bar' notation over boldface lowercase letters, e.g., x̄, is reserved for an (n + 1)-dimensional vector, with the underlying letter x referring to the vector of its first n components, and the subscript 'h' (as in x_h) referring to its last component. For instance, if x̄ = (x_1, x_2, · · · , x_n, x_{n+1})^T ∈ R^{n+1}, then x = (x_1, x_2, · · · , x_n)^T ∈ R^n and x_h = x_{n+1} ∈ R.
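As a small worked illustration (our own example), the cubic polynomial

p(x_1, x_2) = x_1^3 + 2 x_1 x_2 + 3

homogenizes, with d = 3, to

f(x̄) = x_1^3 + 2 x_1 x_2 x_h + 3 x_h^3,   x̄ = (x_1, x_2, x_h)^T ∈ R^3,

where each degree-k piece f_k(x) is multiplied by x_h^{3−k}; setting x_h = 1 recovers p(x) = f(x, 1).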
Throughout, we adhere to the notation F for a multilinear form (Function T) defined by a tensor form F, to f for a homogeneous polynomial (Function H) or a mixed homogeneous form (Function M), and to p for a generic (inhomogeneous) polynomial function (Function P). Without loss of generality we assume that n_1 ≤ n_2 ≤ · · · ≤ n_d in the tensor form F ∈ R^{n_1×n_2×···×n_d}, and that n_1 ≤ n_2 ≤ · · · ≤ n_s in the tensor form F ∈ R^{n_1^{d_1}×n_2^{d_2}×···×n_s^{d_s}}. We also assume that at least one component of the tensor form (F in Functions T, H and M, and F_d in Function P) is nonzero, to avoid triviality.
2.1.2 Constraint Sets
The most commonly used constraint sets for polynomial optimization problems are studied in this thesis. Specifically, we consider the following types of constraint sets:

Constraint B   {x ∈ R^n | x_i² = 1, i = 1, 2, . . . , n} =: B^n;
Constraint B̄   {x ∈ R^n | x_i² ≤ 1, i = 1, 2, . . . , n} =: B̄^n;
Constraint S   {x ∈ R^n | ‖x‖ := (x_1² + x_2² + · · · + x_n²)^{1/2} = 1} =: S^n;
Constraint S̄   {x ∈ R^n | ‖x‖ ≤ 1} =: S̄^n;
Constraint Q   {x ∈ R^n | x^T Q_i x ≤ 1, i = 1, 2, . . . , m};
Constraint G   {x ∈ R^n | x ∈ G}.
The notion ‘B’ signifies the binary variables or binary constraints, and ‘S’ signifies
the Euclidean spherical constraint, with ‘B’ (hypercube) and ‘S’ (the Euclidean ball)
signifying their convex hulls respectively. The norm notation ‘‖ · ‖’ in this thesis is the
2-norm (the Euclidean norm) unless otherwise specified, including those for vectors, ma-
trices and tensors. In particular, the norm of the tensor F = (Fi1i2···id) ∈ Rn1×n2×···×nd
is defined as
‖F ‖ :=
√ ∑1≤i1≤n1,1≤i2≤n2,··· ,1≤id≤nd
Fi1i2···id2 .
The notion ‘Q’ signifies the quadratic constraints, and we focus on convex quadratic
constraints in this thesis, or specifically the case of co-centered ellipsoids, i.e., Qi 0
for i = 1, 2, . . . , m and ∑_{i=1}^m Q_i ≻ 0. A general convex compact set in R^n is also discussed in this thesis, denoted by the notation 'G'. Constraints B̄, S̄, Q and G are convex, while Constraints B and S are non-convex. Obviously, Constraint G is a generalization of Constraint Q, and Constraint Q is in turn a generalization of Constraint S̄ as well as of Constraint B̄.
2.1.3 Models and Organization
All the polynomial optimization models discussed in this thesis are maximization problems; the results for most of their minimization counterparts can be derived similarly. The names of the models simply combine the names of the objective functions described in Section 2.1.1 with the names of the constraint sets described in Section 2.1.2, the latter appearing as subscripts. For example, model (TS) is to maximize a multilinear tensor function (Function T) under spherical constraints (Constraint S), and model (MBS) is to maximize a mixed polynomial form (Function M) under binary constraints (Constraint B) mixed with variables under spherical constraints (Constraint S).
In Chapter 3, we discuss models for optimizing a multilinear form with quadratic constraints, including (TS) and (TQ). In Chapter 4, we discuss models for optimizing a homogeneous polynomial or a mixed form with quadratic constraints, including (HS), (HQ), (MS) and (MQ). General polynomial optimization models, including (PS), (PQ) and (PG), are discussed in Chapter 5. Chapter 6 deals with binary integer programming models, including (TB), (HB), (MB) and (PB). Chapter 7 deals with mixed integer programming models, including (TBS), (HBS) and (MBS). All these models are listed below for quick reference.
Chapter 3:

(TS)  max F(x^1, x^2, · · · , x^d)
      s.t. x^k ∈ S^{n_k}, k = 1, 2, . . . , d;

(TQ)  max F(x^1, x^2, · · · , x^d)
      s.t. (x^k)^T Q^k_{i_k} x^k ≤ 1, k = 1, 2, . . . , d, i_k = 1, 2, . . . , m_k,
           x^k ∈ R^{n_k}, k = 1, 2, . . . , d.

Chapter 4:

(HS)  max f(x)
      s.t. x ∈ S^n;

(HQ)  max f(x)
      s.t. x^T Q_i x ≤ 1, i = 1, 2, . . . , m,
           x ∈ R^n;

(MS)  max f(x^1, x^2, · · · , x^s)
      s.t. x^k ∈ S^{n_k}, k = 1, 2, . . . , s;

(MQ)  max f(x^1, x^2, · · · , x^s)
      s.t. (x^k)^T Q^k_{i_k} x^k ≤ 1, k = 1, 2, . . . , s, i_k = 1, 2, . . . , m_k,
           x^k ∈ R^{n_k}, k = 1, 2, . . . , s.

Chapter 5:

(PS)  max p(x)
      s.t. x ∈ S̄^n;

(PQ)  max p(x)
      s.t. x^T Q_i x ≤ 1, i = 1, 2, . . . , m,
           x ∈ R^n;

(PG)  max p(x)
      s.t. x ∈ G.

Chapter 6:

(TB)  max F(x^1, x^2, · · · , x^d)
      s.t. x^k ∈ B^{n_k}, k = 1, 2, . . . , d;

(HB)  max f(x)
      s.t. x ∈ B^n;

(MB)  max f(x^1, x^2, · · · , x^s)
      s.t. x^k ∈ B^{n_k}, k = 1, 2, . . . , s;

(PB)  max p(x)
      s.t. x ∈ B^n.

Chapter 7:

(TBS) max F(x^1, x^2, · · · , x^d, y^1, y^2, · · · , y^{d′})
      s.t. x^k ∈ B^{n_k}, k = 1, 2, . . . , d,
           y^ℓ ∈ S^{m_ℓ}, ℓ = 1, 2, . . . , d′;

(HBS) max f(x, y)
      s.t. x ∈ B^n,
           y ∈ S^m;

(MBS) max f(x^1, x^2, · · · , x^s, y^1, y^2, · · · , y^t)
      s.t. x^k ∈ B^{n_k}, k = 1, 2, . . . , s,
           y^ℓ ∈ S^{m_ℓ}, ℓ = 1, 2, . . . , t.
As before, we also assume the tensor forms of the objective functions in (HBS) and (MBS) to have the partial symmetry property, and that m_1 ≤ m_2 ≤ · · · ≤ m_{d′} in (TBS) and m_1 ≤ m_2 ≤ · · · ≤ m_t in (MBS).
In each chapter mentioned above, we discuss the computational complexity of the models concerned, and focus on polynomial-time approximation algorithms with worst-case performance ratios, followed by discussions of their applications and/or the numerical performance of the proposed algorithms. All the numerical computations are conducted on an Intel Pentium 4 2.80GHz computer with 2GB of RAM, with Matlab 7.7.0 (R2008b) as the supporting software. Let d_1 + d_2 + · · · + d_s = d and d′_1 + d′_2 + · · · + d′_t = d′ in the above mentioned models. The degrees of the objective polynomials in these models, d and d + d′, are understood as fixed constants in our subsequent discussions. We are able to propose polynomial-time approximation algorithms for all these models, and the approximation ratios depend only on the dimensions (including the number of variables and the number of constraints) of the problems concerned.
The remaining sections of this chapter provide some necessary preparations for a better understanding of the main subjects of this thesis. The topics include elementary introductions to tensor operations, approximation algorithms, randomized algorithms, and semidefinite programming.
2.2 Tensor Operations
A tensor is a multidimensional array. More formally, a d-th order tensor is an element
of the tensor product of d vector spaces, each of which has its own coordinate system.
Each entry of a d-th order tensor has d associated indices. A first order tensor is a vector, a second order tensor is a matrix, and tensors of order three or higher are called higher order tensors.
This section describes a few tensor operations commonly used in this thesis. For a general review of other tensor operations, the reader is referred to [68]. The tensor inner product is denoted by '•', and is the summation of products of all corresponding entries. For example, if F^1, F^2 ∈ R^{n_1×n_2×···×n_d}, then

F^1 • F^2 := ∑_{1≤i_1≤n_1, 1≤i_2≤n_2, ··· , 1≤i_d≤n_d} F^1_{i_1 i_2···i_d} · F^2_{i_1 i_2···i_d}.

As mentioned before, the norm of a tensor is then defined as ‖F‖ := √(F • F). Notice that the tensor inner product and tensor norm also apply to vectors and matrices, since they are lower order tensors.
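A quick numerical sketch of these definitions (our own illustration): the inner product is an entrywise sum of products, and the tensor norm coincides with the 2-norm of the vectorized tensor.

    import numpy as np

    rng = np.random.default_rng(0)
    F1 = rng.standard_normal((2, 3, 4))
    F2 = rng.standard_normal((2, 3, 4))

    inner = (F1 * F2).sum()                  # F1 . F2, summing entrywise products
    norm = np.sqrt((F1 * F1).sum())          # ||F1|| = sqrt(F1 . F1)
    assert np.isclose(norm, np.linalg.norm(F1.reshape(-1)))   # equals vectorized 2-norm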
The modes of a tensor refer to its coordinate systems. For example, the following fourth order tensor G ∈ R^{2×2×3×2}, with entries

G_{1111} = 1,  G_{1112} = 2,  G_{1121} = 3,  G_{1122} = 4,  G_{1131} = 5,  G_{1132} = 6,
G_{1211} = 7,  G_{1212} = 8,  G_{1221} = 9,  G_{1222} = 10, G_{1231} = 11, G_{1232} = 12,
G_{2111} = 13, G_{2112} = 14, G_{2121} = 15, G_{2122} = 16, G_{2131} = 17, G_{2132} = 18,
G_{2211} = 19, G_{2212} = 20, G_{2221} = 21, G_{2222} = 22, G_{2231} = 23, G_{2232} = 24,

has 4 modes, named mode 1, mode 2, mode 3 and mode 4. In case a tensor is a matrix, it has only two modes, usually called the column and row modes. The indices of an entry of a tensor form a sequence of integers, one assigned from each mode.
The first widely used tensor operation is tensor rewriting, which appears frequently in this thesis. Namely, by combining a set of modes into one mode, a tensor can be rewritten as a new tensor of lower order. For example, by combining modes 3 and 4 together and putting them into the last mode of the new tensor, the tensor G can be rewritten as a third order tensor G′ ∈ R^{2×2×6}, with entries

G′_{111} = 1,  G′_{112} = 2,  G′_{113} = 3,  G′_{114} = 4,  G′_{115} = 5,  G′_{116} = 6,
G′_{121} = 7,  G′_{122} = 8,  G′_{123} = 9,  G′_{124} = 10, G′_{125} = 11, G′_{126} = 12,
G′_{211} = 13, G′_{212} = 14, G′_{213} = 15, G′_{214} = 16, G′_{215} = 17, G′_{216} = 18,
G′_{221} = 19, G′_{222} = 20, G′_{223} = 21, G′_{224} = 22, G′_{225} = 23, G′_{226} = 24.
By combining modes 2, 3 and 4 together, the tensor G is rewritten as the 2 × 12 matrix

( 1  2  3  4  5  6  7  8  9 10 11 12
 13 14 15 16 17 18 19 20 21 22 23 24 );

and by combining all the modes together, G becomes the 24-dimensional vector (1, 2, . . . , 24)^T, which is the same as the vectorization of the tensor.
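When the entries are stored in lexicographic (row-major) index order, as above, tensor rewriting is exactly a reshape; a minimal numpy sketch reproducing the examples (our own illustration):

    import numpy as np

    # G[i,j,k,l] holds the entry G_{(i+1)(j+1)(k+1)(l+1)} from the text.
    G = np.arange(1, 25).reshape(2, 2, 3, 2)

    G1 = G.reshape(2, 2, 6)       # combine modes 3 and 4: G' in R^{2x2x6}
    print(G1[0, 0])               # [1 2 3 4 5 6]

    G2 = G.reshape(2, 12)         # combine modes 2, 3 and 4: the 2 x 12 matrix
    g = G.reshape(24)             # combine all modes: vectorization
    print(G2[1, 0], g[-1])        # 13 24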
Another commonly used tensor operation is the mode switch, which swaps the positions of two modes. It is very much like the transpose of a matrix, which switches the positions of row and column. Accordingly, a mode switch changes the sequence of indices of the entries of a tensor. For example, by switching mode 1 and mode 3 of G, the tensor G is changed to G′′ ∈ R^{3×2×2×2}, with entries defined by

G′′_{ijkℓ} := G_{kjiℓ}  ∀ j, k, ℓ = 1, 2, i = 1, 2, 3.

By default, among all the tensors discussed in this thesis, we assume the modes have been switched (in fact reordered) so that the dimensions are in non-decreasing order.
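In numpy terms (our own sketch), a mode switch is a transpose of the index axes:

    import numpy as np

    G = np.arange(1, 25).reshape(2, 2, 3, 2)
    Gpp = G.transpose(2, 1, 0, 3)            # switch modes 1 and 3
    assert Gpp.shape == (3, 2, 2, 2)
    assert Gpp[2, 0, 1, 1] == G[1, 0, 2, 1]  # G''_{ijkl} = G_{kjil}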
Another widely used operation is multiplying a tensor by a vector. For example, the tensor G has the associated multilinear function G(x, y, z, w), where the variables x, y, w ∈ R^2 and z ∈ R^3; the four modes of G correspond to the four argument positions of the function G. For a given vector w = (w_1, w_2)^T, its multiplication with G in mode 4 turns G into G′′′ ∈ R^{2×2×3}, whose entries are defined by

G′′′_{ijk} := G_{ijk1} w_1 + G_{ijk2} w_2  ∀ i, j = 1, 2, k = 1, 2, 3,

which is simply the inner product of the vector w and G_{ijk·} := (G_{ijk1}, G_{ijk2})^T. For example, if w = (1, 1)^T, then G′′′ has entries

G′′′_{111} = 3,  G′′′_{112} = 7,  G′′′_{113} = 11, G′′′_{121} = 15, G′′′_{122} = 19, G′′′_{123} = 23,
G′′′_{211} = 27, G′′′_{212} = 31, G′′′_{213} = 35, G′′′_{221} = 39, G′′′_{222} = 43, G′′′_{223} = 47.

Its corresponding multilinear function is in fact G(x, y, z, w), with the underlying variables x, y, z. We often use G(·, ·, ·, w) to denote this new multilinear function.
This type of multiplication extends to multiplying a tensor by a matrix, or even by another tensor. For example, if we multiply the tensor G by a given matrix Z ∈ R^{3×2} in modes 3 and 4, then we get a second order tensor (matrix) in R^{2×2}, whose (i, j)-th entry is

G_{ij··} • Z = ∑_{k=1}^3 ∑_{ℓ=1}^2 G_{ijkℓ} Z_{kℓ}  ∀ i, j = 1, 2.

Its corresponding multilinear function is denoted by G(·, ·, Z). In general, if a d-th order tensor is multiplied by a d′-th order tensor (d′ ≤ d) in appropriate modes, the product is a (d − d′)-th order tensor. In particular, if d = d′, then this multiplication is simply the tensor inner product.
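Both kinds of multiplication are single tensor contractions; a minimal numpy sketch (our own illustration) reproduces the entries of G′′′ computed above and the matrix G(·, ·, Z):

    import numpy as np

    G = np.arange(1, 25).reshape(2, 2, 3, 2)

    w = np.array([1.0, 1.0])
    G3 = np.einsum('ijkl,l->ijk', G, w)      # multiply by w in mode 4
    print(G3[0, 0, 0], G3[0, 1, 2])          # 3.0 23.0, matching the text

    Z = np.ones((3, 2))
    M = np.einsum('ijkl,kl->ij', G, Z)       # multiply by Z in modes 3 and 4
    print(M)                                 # the 2x2 matrix G(., ., Z)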
2.3 Approximation Algorithms
Approximation algorithms are algorithms designed to find approximate solutions to
optimization problems. In general, approximation algorithms are often associated with NP-hard problems: since it is unlikely that there exist polynomial-time exact algorithms for solving NP-hard problems, one settles for polynomial-time suboptimal solutions. Approximation algorithms are also used for problems where exact polynomial-
time algorithms are possible but are too expensive to compute due to the size of the
problem. Usually, an approximation algorithm is associated with an approximation
ratio, which is a provable value measuring the quality of the solution found.
Approximation algorithms are widely used in combinatorial optimization, typically for various graph problems. Let us describe a well known example, the vertex cover problem, to appreciate the notion of an approximation algorithm. Given an undirected graph G = (V, E) and a nonnegative cost associated with each vertex, find a minimum-cost subset of vertices that covers all the edges, i.e., a set V′ ⊆ V such that every edge has at least one endpoint in V′.
The vertex cover problem is NP-hard (see e.g., [38]), even for the cardinality vertex cover, which is the case where the cost associated with each vertex is 1. There is a very simple algorithm for the cardinality vertex cover problem: pick any uncovered edge e ∈ E, select both of its incident vertices, and then remove all the edges covered by these two vertices; continue this process until every edge is removed, and output all the selected vertices. This runs in at most |E| steps, which is polynomial in the input dimension max{|V|, |E|}. The algorithm may not find an optimal cover; however, we can show that the number of vertices it selects is at most twice the optimal value of the cardinality vertex cover. In fact, any optimal cover must cover every edge picked (not removed) by the algorithm, and thus must include at least one of its two incident vertices; hence the optimal cover includes at least half of the vertices selected by the algorithm. This is a typical approximation algorithm, with approximation ratio 2; a minimal implementation is sketched below.
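A direct implementation of this 2-approximation (our own sketch; edges are given as vertex pairs):

    def vertex_cover_2approx(edges):
        cover = set()
        for u, v in edges:                   # pick any edge not yet covered
            if u not in cover and v not in cover:
                cover.update((u, v))         # select both incident vertices
        return cover                         # at most twice the optimal size

    edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
    print(vertex_cover_2approx(edges))       # {1, 2, 3, 4}; an optimal cover is {1, 4}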
We shall now formally define approximation algorithms and approximation ratios. Throughout this thesis, for any maximization problem (P) defined as max_{x∈X} p(x), we use v(P) to denote its optimal value, and v̲(P) to denote the optimal value of its minimization counterpart, i.e.,

v(P) := max_{x∈X} p(x)  and  v̲(P) := min_{x∈X} p(x).
Definition 2.3.1 Approximation algorithm and approximation ratio:

1. A maximization problem max_{x∈X} p(x) admits a polynomial-time approximation algorithm with approximation ratio τ ∈ (0, 1], if v(P) ≥ 0 and a feasible solution x ∈ X can be found in polynomial time such that p(x) ≥ τ v(P);

2. A minimization problem min_{x∈X} p(x) admits a polynomial-time approximation algorithm with approximation ratio µ ∈ [1, ∞), if v̲(P) ≥ 0 and a feasible solution x ∈ X can be found in polynomial time such that p(x) ≤ µ v̲(P).
It is easy to see that the larger the τ, the better the ratio for a maximization problem, and the smaller the µ, the better the ratio for a minimization problem; in short, the closer to one, the better the ratio. However, sometimes a problem may be so hard that no polynomial-time algorithm can approximate the optimal value within any positive constant factor unless P = NP. A typical example is the maximum independent set problem, even though the closely related cardinality vertex cover problem has a very simple 2-approximation algorithm. In those unfortunate cases, we resort to approximation algorithms with relative approximation ratios.
Definition 2.3.2 Approximation algorithm and relative approximation ratio:

1. A maximization problem max_{x∈X} p(x) admits a polynomial-time approximation algorithm with relative approximation ratio τ ∈ (0, 1], if a feasible solution x ∈ X can be found in polynomial time such that p(x) − v̲(P) ≥ τ (v(P) − v̲(P)), or equivalently v(P) − p(x) ≤ (1 − τ)(v(P) − v̲(P));

2. A minimization problem min_{x∈X} p(x) admits a polynomial-time approximation algorithm with relative approximation ratio µ ∈ [1, ∞), if a feasible solution x ∈ X can be found in polynomial time such that v(P) − p(x) ≥ (1/µ)(v(P) − v̲(P)), or equivalently p(x) − v̲(P) ≤ (1 − 1/µ)(v(P) − v̲(P)).
As with the usual approximation ratio, the closer the relative approximation ratio is to one, the better. For a maximization problem, if we know for sure that the optimal value of its minimization counterpart is nonnegative, i.e., v̲(P) ≥ 0, then a relative approximation ratio readily implies a usual approximation ratio, since p(x) ≥ τ v(P) + (1 − τ) v̲(P) ≥ τ v(P). This is not rare, as many optimization problems always have nonnegative objective functions in real applications, e.g., various graph partition problems. Of course, there are several other ways of defining the approximation quality to measure the performance of approximate solutions (see e.g., [61, 11]).
We would like to point out that the approximation ratios defined here are for worst-case scenarios, and it might be hard or even impossible to find an instance that exactly attains the ratio when applying the algorithms. Thus an approximation algorithm with a better approximation ratio does not necessarily perform better in practice than one with a worse ratio. In reality, many approximation algorithms have approximation ratios far from one, if they have one at all, and the ratios may approach zero as the dimensions of the problems become large. Perhaps it is more appropriate to view the approximation guarantee as a measure that forces us to explore deeper into the structure of the problem and discover more powerful tools to exploit this structure. In addition, an algorithm with a theoretical assurance should be viewed as useful guidance that can be fine-tuned to suit the type of instances arising from a specific application.
As mentioned in Section 2.1.3, all optimization models considered in this thesis are maximization problems. Thus we reserve the Greek letter τ to indicate the approximation ratio, which is a key ingredient throughout this thesis. The approximation ratios presented in this thesis are in general not universal constants; they involve the problem dimensions and the notation Ω. Here, $\Omega(f(n))$ denotes a quantity for which there are positive universal constants $\alpha$ and $n_0$ such that the quantity is at least $\alpha f(n)$ for all $n\ge n_0$; as usual, $O(f(n))$ denotes a quantity for which there are positive universal constants $\alpha$ and $n_0$ such that the quantity is at most $\alpha f(n)$ for all $n\ge n_0$.
2.4 Randomized Algorithms
A randomized algorithm is an algorithm that employs a degree of randomness as part of its operation. The algorithm typically takes a certain probability distribution as an auxiliary input to guide its execution, in the hope of achieving good performance on average, or good performance with high probability. Formally, the algorithm's performance is a random variable: either the running time, or the output (or both) are random.
Historically, the first randomized algorithm was a method developed by Rabin [104] for the closest pair problem in computational geometry. The study of randomized algorithms was spurred by the 1977 discovery of a randomized primality test (determining whether a number is prime) by Solovay and Strassen [111]. Soon afterwards, Rabin [105] demonstrated that Miller's 1976 primality test [83] can be turned into a randomized algorithm. At that time, no practical deterministic algorithm for primality was known.
A well known application in which randomness is useful is quicksort. Any deterministic version of this algorithm requires $O(n^2)$ time in the worst case to sort n different numbers (denoted by the set S); e.g., the straightforward method of comparing all the pairs requires $n(n-1)/2$ comparisons. However, if we assume that the given n different numbers arrive in a sequence uniformly distributed over all the n! distinct orderings, then the quicksort algorithm sorts the sequence in expected $O(n\log n)$ time. The algorithm chooses an element of S uniformly at random as a pivot, compares the pivot with the other elements and groups them into two sets $S_1$ (those bigger than the pivot) and $S_2$ (those smaller than the pivot), and then applies the same process to sort $S_1$ and $S_2$; this process continues until the whole set is sorted.
To see why quicksort costs $O(n\log n)$ time on average, let us without loss of generality assume $S=\{1,2,\ldots,n\}$. Denote by $x_{ij}$ the indicator random variable of the event that elements i and j are compared during a run of quicksort. The expected total number of comparisons is then
$$\mathrm{E}\left[\sum_{1\le i<j\le n}x_{ij}\right]=\sum_{1\le i<j\le n}\mathrm{E}[x_{ij}].$$
Next we compute $\mathrm{E}[x_{ij}]$, which is the probability that i and j are ever compared by quicksort. This happens if and only if either i or j is the first pivot selected by quicksort from the set $\{i,i+1,\ldots,j-1,j\}$ (assuming i < j), and this probability is $2/(j-i+1)$.
Therefore, the expected number of comparisons is
$$\sum_{1\le i<j\le n}\mathrm{E}[x_{ij}]=\sum_{1\le i<j\le n}\frac{2}{j-i+1}=2(n+1)\sum_{k=1}^n\frac{1}{k}-4n=O(n\log n).$$
Note that this expected-time argument relies on choosing the pivots uniformly at random; the worst-case running time is still $O(n^2)$, which occurs when we are unlucky enough to pick the largest remaining element as the pivot every time.
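The following minimal Python sketch (illustrative, not from the thesis) implements randomized quicksort as just described; for distinct inputs, its expected number of comparisons follows the $O(n\log n)$ bound derived above.

```python
import random

def quicksort(S):
    """Randomized quicksort on a list of distinct numbers (sketch)."""
    if len(S) <= 1:
        return S
    pivot = random.choice(S)                 # pivot chosen uniformly at random
    S1 = [x for x in S if x > pivot]         # those bigger than the pivot
    S2 = [x for x in S if x < pivot]         # those smaller than the pivot
    return quicksort(S2) + [pivot] + quicksort(S1)

print(quicksort([3, 1, 4, 5, 9, 2, 6]))      # [1, 2, 3, 4, 5, 6, 9]
```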
There are other success stories in which randomized algorithms help. For example, the volume of a convex body can be estimated by a randomized algorithm to arbitrary precision in polynomial-time [31], while no deterministic algorithm can do the same [13].
When applied to NP-hard optimization problems, randomized algorithms are often combined with approximation algorithms to prove performance ratios, in expectation or with high probability. A simple example is a randomized approximation algorithm with approximation ratio 0.5 for the max-cut problem: given an undirected graph G = (V,E) with a nonnegative weight associated with each edge, find a partition (cut) of V into two disjoint sets A and B, so that the total weight of the edges connecting one vertex in A and one vertex in B is maximized. This is one of the well known NP-hard problems [38]. The simple algorithm is as follows: for each vertex, independently toss a fair coin, and put the vertex into A on heads or into B on tails. It is easy to see that the probability of any given edge connecting A and B is exactly 1/2. By the linearity of expectation, the expected total weight of this cut is exactly half of the total weight of all the edges, which is at least half of the maximum cut.
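A sketch of this coin-tossing algorithm in Python (the dictionary edge representation is an assumption for illustration):

```python
import random

def random_cut(n, w):
    """0.5-approximation for max-cut in expectation (sketch).

    w maps undirected edges (i, j) to nonnegative weights; each vertex
    is assigned to side A or side B by an independent fair coin toss.
    """
    in_A = [random.random() < 0.5 for _ in range(n)]
    return sum(wij for (i, j), wij in w.items() if in_A[i] != in_A[j])

# Each edge crosses the cut with probability 1/2, so the expected value
# is half the total edge weight, hence at least half the maximum cut.
print(random_cut(4, {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.5, (0, 3): 0.5}))
```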
The currently best known approximation ratio for the max-cut problem is 0.878, another celebrated randomized-algorithm result, due to Goemans and Williamson [40], which we shall elaborate on in Section 2.5.
We conclude this section by defining polynomial-time randomized approximation algorithms, in the spirit of Definition 2.3.1 and Definition 2.3.2. We shall not spell out all the randomized versions, but only a typical one; the others can be described similarly.
Definition 2.4.1 A maximization problem $\max_{x\in X}p(x)$ admits a polynomial-time randomized approximation algorithm with approximation ratio $\tau\in(0,1]$, if $v(P)\ge 0$ and one of the following two facts holds:
1. A feasible solution $x\in X$ can be found in polynomial-time, such that $\mathrm{E}[p(x)]\ge\tau\,v(P)$;
2. A feasible solution $x\in X$ can be found in polynomial-time, such that $p(x)\ge\tau\,v(P)$ with probability at least $1-\epsilon$ for any given $\epsilon\in(0,1)$.
2.5 Semidefinite Programming
Semidefinite programming (SDP) is a subfield of convex optimization concerned with the optimization of a linear objective function over the intersection of the cone of positive semidefinite matrices and an affine space. It can be viewed as an extension of the well known linear programming problem, where the vector of variables is replaced by a symmetric matrix, and the nonnegative orthant is replaced by the cone of positive semidefinite matrices. It is a special case of the so-called conic programming problems, specialized to the cone of positive semidefinite matrices.
The standard formulation of an SDP problem is
$$(PSP)\quad\begin{array}{ll}\sup & C\bullet X\\ \mathrm{s.t.} & A_i\bullet X=b_i,\ i=1,2,\ldots,m,\\ & X\succeq 0,\end{array}$$
where the data C and $A_i$ ($i=1,2,\ldots,m$) are symmetric matrices, $b_i$ ($i=1,2,\ldots,m$) are scalars, the dot product '$\bullet$' is the usual matrix inner product introduced in Section 2.2, and '$X\succeq 0$' means the matrix X is positive semidefinite.
The dual problem of (PSP) is
$$(DSP)\quad\begin{array}{ll}\inf & b^{\mathrm T}y\\ \mathrm{s.t.} & \sum_{i=1}^m y_iA_i-Z=C,\\ & Z\succeq 0.\end{array}$$
An SDP problem is called strictly feasible if its feasible region has a nonempty interior; this is also known as the Slater condition. We now state the strong duality theorem of SDP; for its proof one is referred to Vandenberghe and Boyd [116] and Helmberg [51].
Theorem 2.5.1 The following hold for (PSP) and (DSP):
1. If (DSP) is strictly feasible, then v(PSP) = v(DSP). If in addition (DSP) is bounded below, then this optimal value is attained by a feasible $X^*$ of (PSP);
2. If (PSP) is strictly feasible, then v(PSP) = v(DSP). If in addition (PSP) is bounded above, then this optimal value is attained by a feasible $(Z^*,y^*)$ of (DSP);
3. Suppose one of (PSP) and (DSP) is strictly feasible and has a bounded optimal value. Then a feasible X of (PSP) and a feasible (Z,y) of (DSP) form a pair of optimal solutions to their respective problems if and only if $C\bullet X=b^{\mathrm T}y$, or equivalently $X\bullet Z=0$;
4. If both (PSP) and (DSP) are strictly feasible, then v(PSP) = v(DSP), and this optimal value is attained by some feasible $X^*$ of (PSP) and $(Z^*,y^*)$ of (DSP).
For convenience, an SDP problem is often specified in a slightly different but equivalent form. For example, linear expressions involving nonnegative scalar variables may be added to the program specification. This remains an SDP, because each such variable can be incorporated into the matrix X as a diagonal entry ($X_{ii}$ for some i), whose nonnegativity is already guaranteed by $X\succeq 0$; to decouple it from the rest of the matrix, the constraints $X_{ij}=0$ can be added for all $j\ne i$. As another example, note that for any $n\times n$ positive semidefinite matrix X, there exists a set of vectors $v^1,v^2,\cdots,v^n$ such that $X_{ij}=(v^i)^{\mathrm T}v^j$ for all $1\le i,j\le n$. Therefore, SDP problems are often formulated in terms of linear expressions on scalar products of vectors. Given the solution of the SDP in standard form, the vectors $v^1,v^2,\cdots,v^n$ can be recovered in $O(n^3)$ time, e.g., using the Cholesky decomposition of X.
There are several types of algorithms for solving SDP problems. These algorithms output solutions up to an additive error ε in time polynomial in the problem dimensions and $\ln(1/\epsilon)$. Interior point methods are the most popular and widely used ones. Many efficient SDP solvers based on interior point methods have been developed, including SeDuMi of Sturm [112], SDPT3 of Toh et al. [115], SDPA of Fujisawa et al. [37], CSDP of Borchers [19], DSDP of Benson and Ye [17], and so on.
SDP is of growing interest for several reasons. Many practical problems in operations research and combinatorial optimization can be modeled or approximated as SDP problems. In automatic control theory, SDP is used in the context of linear matrix inequalities. All linear programming problems can be expressed as SDP problems, and via hierarchies of SDP problems the solutions of polynomial optimization problems can be approximated. Besides, SDP has been used in the design of optimal experiments, and it can aid in the design of quantum computing circuits.
SDP has a wide range of practical applications. One of its most significant applications is in designing approximate solutions to combinatorial optimization problems, starting from the seminal work of Goemans and Williamson [40], who essentially proposed a polynomial-time randomized approximation algorithm with approximation ratio 0.878 for the max-cut problem. The algorithm uses SDP relaxation and randomization techniques, whose ideas have been revised and generalized in solving various quadratic programming problems [88, 118, 119, 87, 120, 24, 5, 121, 75, 50] and even quartic polynomial optimization problems [77, 73]. We now elaborate on the max-cut algorithm of Goemans and Williamson.
As described in Section 2.4, the max-cut problem is to partition the vertices of an undirected graph G = (V,E) with nonnegative weights on its edges into two disjoint sets, so that the total weight of the edges connecting the two sets is maximized. Denote $\{1,2,\ldots,n\}$ to be the set of vertices. Let $w_{ij}\ge 0$ be the weight of the edge connecting vertices i and j for all $i\ne j$, with $w_{ij}=0$ if there is no edge between i and j or if $i=j$. If we let $x_i$ ($i=1,2,\ldots,n$) be the binary variable indicating whether vertex i is in the first set ($x_i=1$) or the second set ($x_i=-1$), then max-cut is the following quadratic integer programming problem:
$$(MC)\quad\begin{array}{ll}\max & \sum_{1\le i,j\le n}w_{ij}(1-x_ix_j)/4\\ \mathrm{s.t.} & x_i\in\{1,-1\},\ i=1,2,\ldots,n.\end{array}$$
The problem is NP-hard (see e.g., Garey and Johnson [38]). Now, by introducing a matrix X with $X_{ij}$ replacing $x_ix_j$, the constraint set is equivalent to $\mathrm{diag}(X)=e$, $X\succeq 0$, $\mathrm{rank}(X)=1$. A straightforward SDP relaxation drops the rank-one constraint, which yields
$$(SMC)\quad\begin{array}{ll}\max & \sum_{1\le i,j\le n}w_{ij}(1-X_{ij})/4\\ \mathrm{s.t.} & \mathrm{diag}(X)=e,\ X\succeq 0.\end{array}$$
The algorithm first solves (SMC) to get an optimal solution $X^*$, then randomly generates an n-dimensional vector following the zero-mean multivariate normal distribution
$$\xi\sim\mathcal{N}(0_n,X^*),$$
and lets $x_i=\mathrm{sign}(\xi_i)$ for $i=1,2,\ldots,n$. Note that generating a zero-mean normal random vector with covariance matrix $X^*$ can be done by multiplying $(X^*)^{\frac12}$ with a vector whose components are n i.i.d. standard normal random variables. Here, the sign function takes value 1 for nonnegative numbers and −1 for negative numbers. The output cut (the solution x) is random and may not be optimal; however, it can be shown that
$$\mathrm{E}[x_ix_j]=\frac{2}{\pi}\arcsin X^*_{ij}\quad\forall\,1\le i,j\le n,$$
which further leads to
$$\mathrm{E}\left[\sum_{1\le i,j\le n}\frac{w_{ij}(1-x_ix_j)}{4}\right]\ge 0.878\,v(SMC)\ge 0.878\,v(MC).$$
This yields a 0.878-approximation ratio for the max-cut problem, significantly improving the previous best known ratio of 0.5 introduced in Section 2.4.
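A sketch of the rounding step in Python, assuming an optimal $X^*$ of (SMC) has already been obtained from an SDP solver (e.g., one of the solvers named above); the edge dictionary and trial count are illustrative assumptions.

```python
import numpy as np

def gw_round(Xstar, w, trials=100):
    """Goemans-Williamson rounding, given an optimal solution Xstar of (SMC).

    Draws xi ~ N(0, Xstar) via a Cholesky factor, takes signs, and keeps
    the best cut over several trials; w maps undirected edges to weights.
    """
    n = Xstar.shape[0]
    L = np.linalg.cholesky(Xstar + 1e-9 * np.eye(n))      # square-root factor
    best = -np.inf
    for _ in range(trials):
        x = np.where(L @ np.random.randn(n) >= 0, 1, -1)  # x_i = sign(xi_i)
        cut = sum(wij for (i, j), wij in w.items() if x[i] != x[j])
        best = max(best, cut)
    return best
```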
We conclude this section, as well as this chapter, by introducing another example of the SDP relaxation and randomization technique, for solving quadratically constrained quadratic programming (QCQP) as in Nemirovski et al. [87]. The problem is
$$(QP)\quad\begin{array}{ll}\max & x^{\mathrm T}Fx\\ \mathrm{s.t.} & x^{\mathrm T}Q_ix\le 1,\ i=1,2,\ldots,m,\\ & x\in\mathbb{R}^n,\end{array}$$
where $Q_i\succeq 0$ for $i=1,2,\ldots,m$ and $\sum_{i=1}^mQ_i\succ 0$. Remark that this is exactly the model (HQ) when d = 2, whose algorithm will be used later in this thesis.
By the same method of replacing $x_ix_j$ with $X_{ij}$ and dropping the rank-one constraint, we obtain the standard SDP relaxation of (QP):
$$(SQP)\quad\begin{array}{ll}\max & F\bullet X\\ \mathrm{s.t.} & Q_i\bullet X\le 1,\ i=1,2,\ldots,m,\\ & X\succeq 0.\end{array}$$
A polynomial-time randomized approximation algorithm runs as follows:
1. Solve (SQP) to get an optimal solution $X^*$;
2. Randomly generate a vector $\xi\sim\mathcal{N}(0_n,X^*)$;
3. Compute $t=\max_{1\le i\le m}\sqrt{\xi^{\mathrm T}Q_i\xi}$ and output the solution $\hat{x}=\xi/t$.
A probabilistic analysis shows that
$$\hat{x}^{\mathrm T}F\hat{x}\ge\Omega\left(1/\log m\right)v(SQP)\ge\Omega\left(1/\log m\right)v(QP)$$
holds with probability at least some universal constant. Thus, by running this algorithm $O(\log(1/\epsilon))$ times and picking the best solution, the approximation bound $\Omega(1/\log m)$ is attained with probability at least $1-\epsilon$. For details, one is referred to Nemirovski et al. [87] and He et al. [50].
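The three steps translate directly into code; the sketch below assumes the optimal $X^*$ of (SQP) is supplied by an external SDP solver, and all names are illustrative.

```python
import numpy as np

def qcqp_round(Xstar, F, Qs, trials=50):
    """Randomized rounding for (QP), given an optimal Xstar of (SQP) (sketch).

    F and each Q in Qs are symmetric n x n arrays; the returned x satisfies
    x' Q_i x <= 1 for all i by construction of the scaling factor t.
    """
    n = Xstar.shape[0]
    L = np.linalg.cholesky(Xstar + 1e-9 * np.eye(n))
    best_x, best_val = None, -np.inf
    for _ in range(trials):                          # O(log(1/eps)) repetitions
        xi = L @ np.random.randn(n)                  # xi ~ N(0, Xstar)
        t = max(np.sqrt(xi @ Q @ xi) for Q in Qs)    # feasibility scaling
        x = xi / t
        val = x @ F @ x
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val
```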
Chapter 3
Multilinear Form Optimization
with Quadratic Constraints
3.1 Introduction
The first subclass of polynomial optimization problems studied in this thesis consists of the following multilinear form optimization problems with quadratic constraints. Specifically, the models include maximizing a multilinear form under spherical constraints
$$(TS)\quad\begin{array}{ll}\max & F(x^1,x^2,\cdots,x^d)\\ \mathrm{s.t.} & x^k\in S^{n_k},\ k=1,2,\ldots,d,\end{array}$$
and maximizing a multilinear form over co-centered ellipsoidal constraints
$$(TQ)\quad\begin{array}{ll}\max & F(x^1,x^2,\cdots,x^d)\\ \mathrm{s.t.} & (x^k)^{\mathrm T}Q^k_{i_k}x^k\le 1,\ k=1,2,\ldots,d,\ i_k=1,2,\ldots,m_k,\\ & x^k\in\mathbb{R}^{n_k},\ k=1,2,\ldots,d,\end{array}$$
where $Q^k_{i_k}\succeq 0$ and $\sum_{i_k=1}^{m_k}Q^k_{i_k}\succ 0$ for $k=1,2,\ldots,d$, $i_k=1,2,\ldots,m_k$.
It is easy to see that the optimal value of (TS), denoted by v(TS), is positive by the assumption that F is not a zero tensor. Moreover, (TS) is equivalent to its ball-constrained version
$$\begin{array}{ll}\max & F(x^1,x^2,\cdots,x^d)\\ \mathrm{s.t.} & \|x^k\|\le 1,\ x^k\in\mathbb{R}^{n_k},\ k=1,2,\ldots,d.\end{array}$$
This is because we can always scale the decision variables so that $\|x^k\|=1$ for all $1\le k\le d$ without decreasing the objective. Thus (TS) is a special case of (TQ).
Homogeneous polynomial functions play an important role in approximation theory. In a certain well-defined sense, homogeneous polynomials are fairly dense among all continuous functions (see e.g., [117, 69]). Multilinear forms are a special class of homogeneous polynomials. In fact, one of the main reasons for us to study multilinear form optimization is its strong connection to homogeneous polynomial optimization in deriving approximation bounds, whose details will be discussed in Chapter 4. This connection creates a new approach to handling polynomial optimization problems, in which the fundamental issue is the optimization of a multilinear tensor form. Chen et al. [25] establish the tightness of the multilinear form relaxation for maximizing a homogeneous polynomial over the spherical constraint, which makes the study of multilinear form optimization all the more important.
Low degree cases of (TS) are often encountered: when d = 1, its optimal solution is $F/\|F\|$ due to the Cauchy-Schwartz inequality; when d = 2, (TS) amounts to computing the spectrum norm of the matrix F, for which efficient algorithms are readily available. As we shall prove later, (TS) is already NP-hard when d = 3; the focus of this chapter is therefore to design polynomial-time approximation algorithms with worst-case performance ratios for any fixed degree d. The novel idea in handling a multilinear form of high degree is to reduce its degree, which leads to a relaxed multilinear form optimization problem of lower degree. Just as any matrix can be treated as a long vector, any higher order tensor can be rewritten as a tensor whose order is reduced by one (see the tensor operations in Section 2.2), and thus its corresponding multilinear form can be rewritten with its degree reduced by one. After solving the lower degree problem, we need to decompose its solution to make it feasible for the higher degree problem. Specific decomposition methods are therefore required, and they constitute the main contribution of this chapter.
For the model (TQ): when d = 1, it can be formulated as a second order cone program (SOCP), which can be solved in polynomial-time (see e.g., [20, 86]); when d = 2, it can be formulated as the quadratically constrained quadratic programming problem discussed in Section 2.5, which is NP-hard in general. Nemirovski et al. [87] proposed a polynomial-time randomized approximation algorithm with approximation ratio $\Omega(1/\log m)$ based on SDP relaxation and randomization, and this algorithm serves as a basis in analyzing our algorithms and approximation ratios.
We discuss approximation algorithms for (TS) in Section 3.2, followed by those for (TQ) in Section 3.3. Some application examples of the models concerned are discussed in Section 3.4. Finally, the numerical performance of the proposed algorithms is reported in Section 3.5.
3.2 Multilinear Form with Spherical Constraints
Let us first consider the following optimization model
$$(TS)\quad\begin{array}{ll}\max & F(x^1,x^2,\cdots,x^d)\\ \mathrm{s.t.} & x^k\in S^{n_k},\ k=1,2,\ldots,d,\end{array}$$
where $n_1\le n_2\le\cdots\le n_d$. A special case of (TS) is worth noting, as it plays an important role in our algorithms.

Proposition 3.2.1 If d = 2, then (TS) can be solved in polynomial-time, with $v(TS)\ge\|F\|/\sqrt{n_1}$.
Proof. The problem is essentially $\max_{x\in S^{n_1},\,y\in S^{n_2}}x^{\mathrm T}Fy$. For any fixed y, the corresponding optimal x must be $Fy/\|Fy\|$ due to the Cauchy-Schwartz inequality, and accordingly,
$$x^{\mathrm T}Fy=\left(\frac{Fy}{\|Fy\|}\right)^{\mathrm T}Fy=\|Fy\|=\sqrt{y^{\mathrm T}F^{\mathrm T}Fy}.$$
Thus the problem is equivalent to $\max_{y\in S^{n_2}}y^{\mathrm T}F^{\mathrm T}Fy$, which is solved by the largest eigenvalue and a corresponding eigenvector of the positive semidefinite matrix $F^{\mathrm T}F$. We then have
$$\lambda_{\max}(F^{\mathrm T}F)\ge\mathrm{tr}(F^{\mathrm T}F)/\mathrm{rank}(F^{\mathrm T}F)\ge\|F\|^2/n_1,$$
which implies $v(TS)=\sqrt{\lambda_{\max}(F^{\mathrm T}F)}\ge\|F\|/\sqrt{n_1}$. $\square$
However, for any degree d ≥ 3, (TS) becomes NP-hard.
Proposition 3.2.2 If d = 3, then (TS) is NP-hard.
Proof. We first quote a result of Nesterov [90], which states that
$$\begin{array}{ll}\max & \sum_{k=1}^m(x^{\mathrm T}A_kx)^2\\ \mathrm{s.t.} & x\in S^n\end{array}$$
is NP-hard. Now consider a special case of (TS): d = 3, $n_1=n_2=n_3=n$, and $F\in\mathbb{R}^{n^3}$ satisfies $F_{ijk}=F_{jik}$ for all $1\le i,j,k\le n$. The objective function of (TS) can be written as
$$F(x,y,z)=\sum_{i,j,k=1}^nF_{ijk}x_iy_jz_k=\sum_{k=1}^nz_k\left(\sum_{i,j=1}^nF_{ijk}x_iy_j\right)=\sum_{k=1}^nz_k\,(x^{\mathrm T}A_ky),$$
where the symmetric matrix $A_k\in\mathbb{R}^{n\times n}$ has its (i, j)-th entry equal to $F_{ijk}$ for all $1\le i,j,k\le n$. By the Cauchy-Schwartz inequality, (TS) is equivalent to
$$\begin{array}{ll}\max & \sum_{k=1}^n(x^{\mathrm T}A_ky)^2\\ \mathrm{s.t.} & x,y\in S^n.\end{array}$$
We need only to show that the optimal value of the above problem is always attainable at x = y. To see why, denote by $(\hat{x},\hat{y})$ any optimal solution pair, with optimal value $v^*$. If $\hat{x}=\pm\hat{y}$, then the claim is true; otherwise $\hat{x}\ne\pm\hat{y}$, and in particular $\hat{x}+\hat{y}\ne 0$. Let us denote $w:=(\hat{x}+\hat{y})/\|\hat{x}+\hat{y}\|$. Since $(\hat{x},\hat{y})$ must be a KKT point, there exist $(\lambda,\mu)$ such that
$$\sum_{k=1}^n\hat{x}^{\mathrm T}A_k\hat{y}\;A_k\hat{y}=\lambda\hat{x}\quad\text{and}\quad\sum_{k=1}^n\hat{x}^{\mathrm T}A_k\hat{y}\;A_k\hat{x}=\mu\hat{y}.$$
Pre-multiplying $\hat{x}^{\mathrm T}$ to the first equation and $\hat{y}^{\mathrm T}$ to the second yields $\lambda=\mu=v^*$. Summing up the two equations, pre-multiplying $w^{\mathrm T}$, and then scaling lead us to
$$\sum_{k=1}^n\hat{x}^{\mathrm T}A_k\hat{y}\;w^{\mathrm T}A_kw=v^*.$$
By applying the Cauchy-Schwartz inequality to the above equality, we have
$$v^*\le\left(\sum_{k=1}^n(\hat{x}^{\mathrm T}A_k\hat{y})^2\right)^{\frac12}\left(\sum_{k=1}^n(w^{\mathrm T}A_kw)^2\right)^{\frac12}=\sqrt{v^*}\left(\sum_{k=1}^n(w^{\mathrm T}A_kw)^2\right)^{\frac12},$$
which implies that (w,w) is also an optimal solution. The problem is then reduced to Nesterov's quartic model, and its NP-hardness thus follows. $\square$
In the remainder of this section, we focus on approximation algorithms for (TS) with general degree d. To illustrate the main idea of the algorithms, let us first work with the case d = 3, i.e.,
$$(TS)\quad\begin{array}{ll}\max & F(x,y,z)=\sum_{1\le i\le n_1,\,1\le j\le n_2,\,1\le k\le n_3}F_{ijk}x_iy_jz_k\\ \mathrm{s.t.} & x\in S^{n_1},\ y\in S^{n_2},\ z\in S^{n_3}.\end{array}$$
Denote $W=xy^{\mathrm T}$; then
$$\|W\|^2=\mathrm{tr}(WW^{\mathrm T})=\mathrm{tr}(xy^{\mathrm T}yx^{\mathrm T})=\mathrm{tr}(x^{\mathrm T}xy^{\mathrm T}y)=\|x\|^2\|y\|^2=1.$$
Model (TS) can now be relaxed to
$$\begin{array}{ll}\max & F(W,z)=\sum_{1\le i\le n_1,\,1\le j\le n_2,\,1\le k\le n_3}F_{ijk}W_{ij}z_k\\ \mathrm{s.t.} & W\in S^{n_1\times n_2},\ z\in S^{n_3}.\end{array}$$
Notice that the above problem is exactly (TS) with d = 2, which can be solved in polynomial-time by Proposition 3.2.1. Denote its optimal solution by $(\hat{W},\hat{z})$. Clearly $F(\hat{W},\hat{z})\ge v(TS)$. The key step is to recover a solution $(\hat{x},\hat{y})$ from the matrix $\hat{W}$. Below we introduce two basic decomposition routines: one based on randomization and the other on eigen-decomposition. They play a fundamental role in our proposed algorithms; all solution methods to be developed later rely on these two routines as a basis.
Decomposition Routine 3.2.1
• INPUT: matrices $M\in\mathbb{R}^{n_1\times n_2}$, $W\in S^{n_1\times n_2}$.
1. Construct
$$\tilde{W}=\begin{pmatrix}I_{n_1\times n_1} & W\\ W^{\mathrm T} & W^{\mathrm T}W\end{pmatrix}\succeq 0.$$
2. Randomly generate
$$\begin{pmatrix}\xi\\ \eta\end{pmatrix}\sim\mathcal{N}(0_{n_1+n_2},\tilde{W})$$
and repeat if necessary, until $\xi^{\mathrm T}M\eta\ge M\bullet W$ and $\|\xi\|\|\eta\|\le O(\sqrt{n_1})$.
3. Compute $x=\xi/\|\xi\|$ and $y=\eta/\|\eta\|$.
• OUTPUT: vectors $x\in S^{n_1}$, $y\in S^{n_2}$.
Now, let $M=F(\cdot,\cdot,\hat{z})$ and $W=\hat{W}$ in applying the above decomposition routine. For the randomly generated $(\xi,\eta)$, we have
$$\mathrm{E}[F(\xi,\eta,\hat{z})]=\mathrm{E}[\xi^{\mathrm T}M\eta]=M\bullet\hat{W}=F(\hat{W},\hat{z}).$$
He et al. [50] establish that if f(x) is a homogeneous quadratic form and x is drawn from a zero-mean multivariate normal distribution, then there is a universal constant $\theta\ge 0.03$ such that
$$\mathrm{Prob}\{f(x)\ge\mathrm{E}[f(x)]\}\ge\theta.$$
Since $\xi^{\mathrm T}M\eta$ is a homogeneous quadratic form of the normal random vector $(\xi^{\mathrm T},\eta^{\mathrm T})^{\mathrm T}$, we know
$$\mathrm{Prob}\{\xi^{\mathrm T}M\eta\ge M\bullet\hat{W}\}=\mathrm{Prob}\{F(\xi,\eta,\hat{z})\ge\mathrm{E}[F(\xi,\eta,\hat{z})]\}\ge\theta.$$
Moreover, by using a property of normal random vectors (see Lemma 3.1 of [77]), we have
$$\mathrm{E}\left[\|\xi\|^2\|\eta\|^2\right]=\mathrm{E}\left[\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\xi_i^2\eta_j^2\right]=\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\left(\mathrm{E}[\xi_i^2]\,\mathrm{E}[\eta_j^2]+2(\mathrm{E}[\xi_i\eta_j])^2\right)=\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\left((W^{\mathrm T}W)_{jj}+2W_{ij}^2\right)=(n_1+2)\,\mathrm{tr}(W^{\mathrm T}W)=n_1+2.$$
By applying the Markov inequality, for any t > 0,
$$\mathrm{Prob}\{\|\xi\|^2\|\eta\|^2\ge t\}\le\mathrm{E}\left[\|\xi\|^2\|\eta\|^2\right]/t=(n_1+2)/t.$$
Therefore, by the union inequality for the probability of joint events, we have
$$\begin{aligned}\mathrm{Prob}\left\{F(\xi,\eta,\hat{z})\ge F(\hat{W},\hat{z}),\ \|\xi\|^2\|\eta\|^2\le t\right\}&\ge 1-\mathrm{Prob}\left\{F(\xi,\eta,\hat{z})<F(\hat{W},\hat{z})\right\}-\mathrm{Prob}\left\{\|\xi\|^2\|\eta\|^2>t\right\}\\ &\ge 1-(1-\theta)-(n_1+2)/t=\theta/2,\end{aligned}$$
where we let $t=2(n_1+2)/\theta$. Thus we have
$$F(x,y,\hat{z})\ge\frac{F(\hat{W},\hat{z})}{\sqrt{t}}\ge v(TS)\sqrt{\frac{\theta}{2(n_1+2)}},$$
obtaining an $\Omega(1/\sqrt{n_1})$-approximation ratio.
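A compact numpy sketch of DR 3.2.1 follows; the jitter added before the Cholesky factorization and the retry cap are implementation details not part of the routine itself.

```python
import numpy as np

def dr321(M, W, max_tries=10000):
    """Decomposition Routine 3.2.1 (randomized), a sketch.

    Given M and W with ||W||_F = 1, draws (xi, eta) ~ N(0, Wtilde) until
    xi' M eta >= M . W and ||xi||^2 ||eta||^2 <= t; returns unit vectors.
    """
    n1, n2 = W.shape
    theta = 0.03                                   # universal constant of [50]
    t = 2.0 * (n1 + 2) / theta                     # threshold from the analysis
    Wtilde = np.block([[np.eye(n1), W], [W.T, W.T @ W]])
    L = np.linalg.cholesky(Wtilde + 1e-9 * np.eye(n1 + n2))
    target = np.sum(M * W)                         # the inner product M . W
    for _ in range(max_tries):
        z = L @ np.random.randn(n1 + n2)
        xi, eta = z[:n1], z[n1:]
        if xi @ M @ eta >= target and (xi @ xi) * (eta @ eta) <= t:
            return xi / np.linalg.norm(xi), eta / np.linalg.norm(eta)
    raise RuntimeError("sampling failed; increase max_tries")
```

By the analysis above, each trial succeeds with probability at least θ/2, so only a constant number of draws is needed in expectation.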
Below we present an alternative (and deterministic) decomposition routine.
Decomposition Routine 3.2.2
• INPUT: a matrix $M\in\mathbb{R}^{n_1\times n_2}$.
1. Find an eigenvector $\hat{y}$ corresponding to the largest eigenvalue of $M^{\mathrm T}M$.
2. Compute $x=M\hat{y}/\|M\hat{y}\|$ and $y=\hat{y}/\|\hat{y}\|$.
• OUTPUT: vectors $x\in S^{n_1}$, $y\in S^{n_2}$.
This decomposition routine literally follows the proof of Proposition 3.2.1, which tells us that $x^{\mathrm T}My\ge\|M\|/\sqrt{n_1}$. Thus we have
$$F(x,y,\hat{z})=x^{\mathrm T}My\ge\frac{\|M\|}{\sqrt{n_1}}=\max_{Z\in S^{n_1\times n_2}}\frac{M\bullet Z}{\sqrt{n_1}}\ge\frac{M\bullet\hat{W}}{\sqrt{n_1}}=\frac{F(\hat{W},\hat{z})}{\sqrt{n_1}}\ge\frac{v(TS)}{\sqrt{n_1}}.$$
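DR 3.2.2 amounts to extracting the top singular pair of M; the following sketch does so via numpy's SVD, which is equivalent to the stated eigen-decomposition of $M^{\mathrm T}M$ (an implementation choice, not prescribed by the thesis).

```python
import numpy as np

def dr322(M):
    """Decomposition Routine 3.2.2 (deterministic), a sketch.

    The top right singular vector of M is an eigenvector for the largest
    eigenvalue of M'M; then x = My/||My||, and x'My equals the spectral
    norm of M, which is at least ||M||_F / sqrt(n1) when n1 <= n2.
    """
    _, _, Vt = np.linalg.svd(M)
    y = Vt[0, :]
    x = M @ y
    return x / np.linalg.norm(x), y
```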
The complexity of DR 3.2.1 is $O(n_1n_2\log(1/\epsilon))$ to obtain the quality-assured solution with probability $1-\epsilon$, and that of DR 3.2.2 is $O(\max\{n_1^3,n_1n_2\})$. However, DR 3.2.2 is very easy to implement, and is deterministic. Both DR 3.2.1 and DR 3.2.2 lead to the following approximation result in terms of the order of the approximation ratio.
Theorem 3.2.3 If d = 3, then (TS) admits a polynomial-time approximation algorithm with approximation ratio $1/\sqrt{n_1}$.
Now we proceed to the case of general d. Let $X=x^1(x^d)^{\mathrm T}$; then (TS) can be relaxed to
$$(\overline{TS})\quad\begin{array}{ll}\max & F(X,x^2,x^3,\cdots,x^{d-1})\\ \mathrm{s.t.} & X\in S^{n_1\times n_d},\ x^k\in S^{n_k},\ k=2,3,\ldots,d-1.\end{array}$$
Clearly it is a model of the type (TS) with degree d − 1. Suppose $(\overline{TS})$ can be solved approximately in polynomial-time with approximation ratio τ, i.e., we find $(\hat{X},\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1})$ with
$$F(\hat{X},\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1})\ge\tau\,v(\overline{TS})\ge\tau\,v(TS).$$
Observing that $F(\cdot,\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1},\cdot)$ is an $n_1\times n_d$ matrix, using DR 3.2.2 we can find $(\hat{x}^1,\hat{x}^d)$ such that
$$F(\hat{x}^1,\hat{x}^2,\cdots,\hat{x}^d)\ge F(\hat{X},\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1})/\sqrt{n_1}\ge n_1^{-\frac12}\,\tau\,v(TS).$$
By induction, this leads to the following:

Theorem 3.2.4 (TS) admits a polynomial-time approximation algorithm with approximation ratio τ(TS), where
$$\tau(TS):=\left(\prod_{k=1}^{d-2}n_k\right)^{-\frac12}.$$
Below we summarize the above recursive procedure for solving (TS), as in Theorem 3.2.4. Remark that the approximation performance ratio of this algorithm is tight: for the special example $F(x^1,x^2,\cdots,x^d)=\sum_{i=1}^nx^1_ix^2_i\cdots x^d_i$, the algorithm can be made to return a solution with approximation ratio exactly τ(TS).
Algorithm 3.2.3
• INPUT: a d-th order tensor $F\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_d}$ with $n_1\le n_2\le\cdots\le n_d$.
1. Rewrite F as a (d − 1)-th order tensor $F'\in\mathbb{R}^{n_2\times n_3\times\cdots\times n_{d-1}\times n_dn_1}$ by combining its first and last modes into one, and placing it in the last mode of F′, i.e.,
$$F_{i_1,i_2,\cdots,i_d}=F'_{i_2,i_3,\cdots,i_{d-1},(i_1-1)n_d+i_d}\quad\forall\,1\le i_1\le n_1,\ 1\le i_2\le n_2,\ \cdots,\ 1\le i_d\le n_d.$$
2. For (TS) with the (d − 1)-th order tensor F′: if d − 1 = 2, then apply DR 3.2.2 with input $F'=M$ and output $(\hat{x}^2,\hat{x}^{1,d})=(x,y)$; otherwise obtain a solution $(\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1},\hat{x}^{1,d})$ by recursion.
3. Compute the matrix $M'=F(\cdot,\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1},\cdot)$ and rewrite the vector $\hat{x}^{1,d}$ as a matrix $\hat{X}\in S^{n_1\times n_d}$.
4. Apply either DR 3.2.1 or DR 3.2.2, with input $(M',\hat{X})=(M,W)$ and output $(\hat{x}^1,\hat{x}^d)=(x,y)$.
• OUTPUT: a feasible solution $(\hat{x}^1,\hat{x}^2,\cdots,\hat{x}^d)$.
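To make the recursion concrete, here is a minimal numpy sketch of Algorithm 3.2.3 that uses the deterministic DR 3.2.2 in every decomposition step (so the merged-mode vector returned by the recursion is not needed in Step 4); function names and the driver example are illustrative only.

```python
import numpy as np

def dr322(M):
    # DR 3.2.2 as sketched above: top singular pair of M
    U, _, Vt = np.linalg.svd(M)
    return U[:, 0], Vt[0, :]

def algorithm_323(F):
    """Sketch of Algorithm 3.2.3: approximate max of F(x1,...,xd) on spheres.

    Input: numpy array F of shape (n1, ..., nd); output: unit vectors.
    """
    if F.ndim == 2:
        return list(dr322(F))
    n1, nd = F.shape[0], F.shape[-1]
    # Step 1: merge the first and last modes into one trailing mode
    Fp = np.moveaxis(F, 0, -2).reshape(F.shape[1:-1] + (n1 * nd,))
    # Step 2: solve the degree-(d-1) problem recursively
    sol = algorithm_323(Fp)
    middle = sol[:-1]            # x2, ..., x_{d-1}; merged vector is unused
    # Steps 3-4: contract F with the middle vectors, then split (x1, xd)
    Mp = F
    for v in middle:
        Mp = np.tensordot(Mp, v, axes=([1], [0]))   # contract current 2nd mode
    x1, xd = dr322(Mp)
    return [x1] + list(middle) + [xd]

# Tiny usage example on a random third order tensor
F = np.random.randn(3, 4, 5)
x, y, z = algorithm_323(F)
print(np.einsum('ijk,i,j,k->', F, x, y, z))          # feasible objective value
```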
3.3 Multilinear Form with Ellipsoidal Constraints
In this section, we consider a generalization of the optimization model discussed in Section 3.2 to general ellipsoidal constraints. Specifically, the model is
$$(TQ)\quad\begin{array}{ll}\max & F(x^1,x^2,\cdots,x^d)\\ \mathrm{s.t.} & (x^k)^{\mathrm T}Q^k_{i_k}x^k\le 1,\ k=1,2,\ldots,d,\ i_k=1,2,\ldots,m_k,\\ & x^k\in\mathbb{R}^{n_k},\ k=1,2,\ldots,d,\end{array}$$
where $Q^k_{i_k}\succeq 0$ and $\sum_{i_k=1}^{m_k}Q^k_{i_k}\succ 0$ for $k=1,2,\ldots,d$, $i_k=1,2,\ldots,m_k$.
Let us start with the case d = 2, and suppose $F(x^1,x^2)=(x^1)^{\mathrm T}Fx^2$. Denote
$$y=\begin{pmatrix}x^1\\ x^2\end{pmatrix},\quad F'=\begin{pmatrix}0_{n_1\times n_1} & \frac{F}{2}\\ \frac{F^{\mathrm T}}{2} & 0_{n_2\times n_2}\end{pmatrix},\quad Q_i=\begin{pmatrix}Q^1_i & 0_{n_1\times n_2}\\ 0_{n_2\times n_1} & 0_{n_2\times n_2}\end{pmatrix}\ \text{for all }1\le i\le m_1,$$
and
$$Q_i=\begin{pmatrix}0_{n_1\times n_1} & 0_{n_1\times n_2}\\ 0_{n_2\times n_1} & Q^2_{i-m_1}\end{pmatrix}\ \text{for all }m_1+1\le i\le m_1+m_2.$$
Then (TQ) is equivalent to
$$\begin{array}{ll}\max & y^{\mathrm T}F'y\\ \mathrm{s.t.} & y^{\mathrm T}Q_iy\le 1,\ i=1,2,\ldots,m_1+m_2,\\ & y\in\mathbb{R}^{n_1+n_2}.\end{array}$$
This is the QCQP problem discussed in Section 2.5, which can be solved approximately by a polynomial-time randomized algorithm with approximation ratio $\Omega\left(\frac{1}{\log(m_1+m_2)}\right)$ (see e.g., Nemirovski et al. [87] and He et al. [50]).
We now proceed to the higher order cases. To illustrate the essential ideas, we shall focus on the case d = 3; the extension to any higher order can be done by induction. In the case d = 3 we may explicitly write (TQ) as
$$(TQ)\quad\begin{array}{ll}\max & F(x,y,z)\\ \mathrm{s.t.} & x^{\mathrm T}Q_ix\le 1,\ i=1,2,\ldots,m_1,\\ & y^{\mathrm T}P_jy\le 1,\ j=1,2,\ldots,m_2,\\ & z^{\mathrm T}R_kz\le 1,\ k=1,2,\ldots,m_3,\\ & x\in\mathbb{R}^{n_1},\ y\in\mathbb{R}^{n_2},\ z\in\mathbb{R}^{n_3},\end{array}$$
where $Q_i\succeq 0$ for all $1\le i\le m_1$, $P_j\succeq 0$ for all $1\le j\le m_2$, $R_k\succeq 0$ for all $1\le k\le m_3$, and $\sum_{i=1}^{m_1}Q_i\succ 0$, $\sum_{j=1}^{m_2}P_j\succ 0$, $\sum_{k=1}^{m_3}R_k\succ 0$.
Combining the constraints on x and y, we have
$$\mathrm{tr}(Q_ixy^{\mathrm T}P_jyx^{\mathrm T})=\mathrm{tr}(x^{\mathrm T}Q_ix\,y^{\mathrm T}P_jy)=x^{\mathrm T}Q_ix\cdot y^{\mathrm T}P_jy\le 1.$$
Denoting $W=xy^{\mathrm T}$, (TQ) can be relaxed to
$$(\overline{TQ})\quad\begin{array}{ll}\max & F(W,z)\\ \mathrm{s.t.} & \mathrm{tr}(Q_iWP_jW^{\mathrm T})\le 1,\ i=1,2,\ldots,m_1,\ j=1,2,\ldots,m_2,\\ & z^{\mathrm T}R_kz\le 1,\ k=1,2,\ldots,m_3,\\ & W\in\mathbb{R}^{n_1\times n_2},\ z\in\mathbb{R}^{n_3}.\end{array}$$
Observe that for any $W\in\mathbb{R}^{n_1\times n_2}$,
$$\mathrm{tr}(Q_iWP_jW^{\mathrm T})=\mathrm{tr}(Q_i^{\frac12}WP_j^{\frac12}P_j^{\frac12}W^{\mathrm T}Q_i^{\frac12})=\left\|Q_i^{\frac12}WP_j^{\frac12}\right\|^2\ge 0,$$
and that for any $W\ne 0$,
$$\sum_{1\le i\le m_1,\,1\le j\le m_2}\mathrm{tr}(Q_iWP_jW^{\mathrm T})=\mathrm{tr}\left(\left(\sum_{i=1}^{m_1}Q_i\right)W\left(\sum_{j=1}^{m_2}P_j\right)W^{\mathrm T}\right)=\left\|\left(\sum_{i=1}^{m_1}Q_i\right)^{\frac12}W\left(\sum_{j=1}^{m_2}P_j\right)^{\frac12}\right\|^2>0.$$
Indeed, it is easy to verify that $\mathrm{tr}(Q_iWP_jW^{\mathrm T})=(\mathrm{vec}(W))^{\mathrm T}(Q_i\otimes P_j)\mathrm{vec}(W)$, which implies that $\mathrm{tr}(Q_iWP_jW^{\mathrm T})\le 1$ is actually a convex quadratic constraint for W.
Thus, $(\overline{TQ})$ is exactly in the form of (TQ) with d = 2. Therefore we are able to find a feasible solution $(\hat{W},\hat{z})$ of $(\overline{TQ})$ in polynomial-time, such that
$$F(\hat{W},\hat{z})\ge\Omega\left(\frac{1}{\log(m_1m_2+m_3)}\right)v(\overline{TQ})\ge\Omega\left(\frac{1}{\log m}\right)v(TQ),\tag{3.1}$$
where $m=\max\{m_1,m_2,m_3\}$. Let us fix $\hat{z}$; then $F(\cdot,\cdot,\hat{z})$ is a matrix. Our next step is to generate $(\hat{x},\hat{y})$ from $\hat{W}$. For this purpose, we first introduce the following lemma.
Lemma 3.3.1 Suppose $Q_i\in\mathbb{R}^{n\times n}$, $Q_i\succeq 0$ for all $1\le i\le m$, and $\sum_{i=1}^mQ_i\succ 0$. The following SDP problem
$$(PS)\quad\begin{array}{ll}\min & \sum_{i=1}^mt_i\\ \mathrm{s.t.} & \mathrm{tr}(UQ_i)\le 1,\ i=1,2,\ldots,m,\\ & t_i\ge 0,\ i=1,2,\ldots,m,\\ & \begin{pmatrix}U & I_{n\times n}\\ I_{n\times n} & \sum_{i=1}^mt_iQ_i\end{pmatrix}\succeq 0\end{array}$$
has an attainable optimal solution with optimal value equal to n.
Proof. Straightforward computation shows that the dual of (PS) is
$$(DS)\quad\begin{array}{ll}\max & -\sum_{i=1}^ms_i-2\,\mathrm{tr}(Z)\\ \mathrm{s.t.} & \mathrm{tr}(XQ_i)\le 1,\ i=1,2,\ldots,m,\\ & s_i\ge 0,\ i=1,2,\ldots,m,\\ & \begin{pmatrix}X & Z\\ Z^{\mathrm T} & \sum_{i=1}^ms_iQ_i\end{pmatrix}\succeq 0.\end{array}$$
Observe that (DS) indeed resembles (PS). Since $\sum_{i=1}^mQ_i\succ 0$, both (PS) and (DS) satisfy the Slater condition, and thus both of them have attainable optimal solutions satisfying the strong duality relationship, i.e., v(PS) = v(DS). Let $(U^*,t^*)$ be an optimal solution of (PS). Clearly $U^*\succ 0$, and by the Schur complement relationship we have $\sum_{i=1}^mt^*_iQ_i\succeq(U^*)^{-1}$. Therefore,
$$v(PS)=\sum_{i=1}^mt^*_i\ge\sum_{i=1}^mt^*_i\,\mathrm{tr}(U^*Q_i)=\mathrm{tr}\left(U^*\sum_{i=1}^mt^*_iQ_i\right)\ge\mathrm{tr}\left(U^*(U^*)^{-1}\right)=n.\tag{3.2}$$
Observe that for any dual feasible solution (X,Z,s) we always have $-\sum_{i=1}^ms_i\le-\mathrm{tr}\left(X\sum_{i=1}^ms_iQ_i\right)$. Hence the following problem is a relaxation of (DS):
$$(RS)\quad\begin{array}{ll}\max & -\mathrm{tr}(XY)-2\,\mathrm{tr}(Z)\\ \mathrm{s.t.} & \begin{pmatrix}X & Z\\ Z^{\mathrm T} & Y\end{pmatrix}\succeq 0.\end{array}$$
Consider any feasible solution (X,Y,Z) of (RS). Let $X=P^{\mathrm T}DP$ be an orthonormal decomposition with $D=\mathrm{Diag}(d_1,d_2,\cdots,d_n)$ and $P^{-1}=P^{\mathrm T}$. Notice that $(D,Y',Z'):=(PXP^{\mathrm T},PYP^{\mathrm T},PZP^{\mathrm T})$ is also a feasible solution for (RS) with the same objective value. By the feasibility, it follows that $d_iY'_{ii}-(Z'_{ii})^2\ge 0$ for $i=1,2,\ldots,n$. Therefore,
$$-\mathrm{tr}(XY)-2\,\mathrm{tr}(Z)=-\mathrm{tr}(DY')-2\,\mathrm{tr}(Z')=-\sum_{i=1}^nd_iY'_{ii}-2\sum_{i=1}^nZ'_{ii}\le-\sum_{i=1}^n(Z'_{ii})^2-2\sum_{i=1}^nZ'_{ii}=-\sum_{i=1}^n(Z'_{ii}+1)^2+n\le n.$$
This implies that $v(DS)\le v(RS)\le n$. Combining this with (3.2) and the strong duality relationship, it follows that $v(PS)=v(DS)=n$. $\square$
We then have the following decomposition method, to be called DR 3.3.1, as a
further extension of DR 3.2.1. It plays a similar role in Algorithm 3.3.2 as DR 3.2.1 or
DR 3.2.2 does in Algorithm 3.2.3.
Decomposition Routine 3.3.1
• INPUT: matrices $Q_i\in\mathbb{R}^{n_1\times n_1}$, $Q_i\succeq 0$ for all $1\le i\le m_1$ with $\sum_{i=1}^{m_1}Q_i\succ 0$; $P_j\in\mathbb{R}^{n_2\times n_2}$, $P_j\succeq 0$ for all $1\le j\le m_2$ with $\sum_{j=1}^{m_2}P_j\succ 0$; $W\in\mathbb{R}^{n_1\times n_2}$ with $\mathrm{tr}(Q_iWP_jW^{\mathrm T})\le 1$ for all $1\le i\le m_1$ and $1\le j\le m_2$; and $M\in\mathbb{R}^{n_1\times n_2}$.
1. Solve the SDP problem
$$\begin{array}{ll}\min & \sum_{i=1}^{m_1}t_i\\ \mathrm{s.t.} & \mathrm{tr}(UQ_i)\le 1,\ i=1,2,\ldots,m_1,\\ & t_i\ge 0,\ i=1,2,\ldots,m_1,\\ & \begin{pmatrix}U & I_{n_1\times n_1}\\ I_{n_1\times n_1} & \sum_{i=1}^{m_1}t_iQ_i\end{pmatrix}\succeq 0\end{array}$$
to get an optimal solution: a matrix U and scalars $t_1,t_2,\cdots,t_{m_1}$.
2. Construct
$$\tilde{W}=\begin{pmatrix}U & W\\ W^{\mathrm T} & W^{\mathrm T}\left(\sum_{i=1}^{m_1}t_iQ_i\right)W\end{pmatrix}\succeq 0.$$
3. Randomly generate
$$\begin{pmatrix}\xi\\ \eta\end{pmatrix}\sim\mathcal{N}(0_{n_1+n_2},\tilde{W})$$
and repeat if necessary, until $\xi^{\mathrm T}M\eta\ge M\bullet W$, $\xi^{\mathrm T}Q_i\xi\le O(\log m_1)$ for all $1\le i\le m_1$, and $\eta^{\mathrm T}P_j\eta\le O(n_1\log m_2)$ for all $1\le j\le m_2$.
4. Compute $x=\xi/\sqrt{\max_{1\le i\le m_1}\xi^{\mathrm T}Q_i\xi}$ and $y=\eta/\sqrt{\max_{1\le j\le m_2}\eta^{\mathrm T}P_j\eta}$.
• OUTPUT: vectors $x\in\mathbb{R}^{n_1}$, $y\in\mathbb{R}^{n_2}$.
The computational complexity of DR 3.3.1 depends on the algorithm for solving the SDP problem (PS), which has $O(n_1^2)$ variables and $O(m_1)$ constraints. In addition, it requires $O(n_2(n_1m_1+n_2m_2)\log(1/\epsilon))$ other operations to get the quality-assured solution with probability $1-\epsilon$.
Lemma 3.3.2 Under the input of DR 3.3.1, we can find $x\in\mathbb{R}^{n_1}$ and $y\in\mathbb{R}^{n_2}$ by a polynomial-time randomized algorithm, satisfying $x^{\mathrm T}Q_ix\le 1$ for all $1\le i\le m_1$ and $y^{\mathrm T}P_jy\le 1$ for all $1\le j\le m_2$, such that
$$x^{\mathrm T}My\ge\frac{1}{\sqrt{n_1}}\,\Omega\left(\frac{1}{\sqrt{\log m_1\log m_2}}\right)M\bullet W.$$
Proof. Following the randomization procedure in Step 3 of DR 3.3.1, by Lemma 3.3.1 we have, for any $1\le i\le m_1$ and $1\le j\le m_2$,
$$\mathrm{E}[\xi^{\mathrm T}Q_i\xi]=\mathrm{tr}(Q_iU)\le 1,$$
$$\mathrm{E}[\eta^{\mathrm T}P_j\eta]=\mathrm{tr}\left(P_jW^{\mathrm T}\left(\sum_{i=1}^{m_1}t_iQ_i\right)W\right)=\sum_{i=1}^{m_1}t_i\,\mathrm{tr}(P_jW^{\mathrm T}Q_iW)\le\sum_{i=1}^{m_1}t_i=n_1.$$
So et al. [109] have established that if ξ is a normal random vector and $Q\succeq 0$, then for any $\alpha>0$,
$$\mathrm{Prob}\{\xi^{\mathrm T}Q\xi\ge\alpha\,\mathrm{E}[\xi^{\mathrm T}Q\xi]\}\le 2e^{-\frac{\alpha}{2}}.$$
Applying this result we have
$$\mathrm{Prob}\{\xi^{\mathrm T}Q_i\xi\ge\alpha_1\}\le\mathrm{Prob}\{\xi^{\mathrm T}Q_i\xi\ge\alpha_1\,\mathrm{E}[\xi^{\mathrm T}Q_i\xi]\}\le 2e^{-\frac{\alpha_1}{2}},$$
$$\mathrm{Prob}\{\eta^{\mathrm T}P_j\eta\ge\alpha_2n_1\}\le\mathrm{Prob}\{\eta^{\mathrm T}P_j\eta\ge\alpha_2\,\mathrm{E}[\eta^{\mathrm T}P_j\eta]\}\le 2e^{-\frac{\alpha_2}{2}}.$$
Moreover, $\mathrm{E}[\xi^{\mathrm T}M\eta]=M\bullet W$. Now let $x=\xi/\sqrt{\alpha_1}$ and $y=\eta/\sqrt{\alpha_2n_1}$, and we have
$$\begin{aligned}&\mathrm{Prob}\left\{x^{\mathrm T}My\ge\frac{M\bullet W}{\sqrt{\alpha_1\alpha_2n_1}},\ x^{\mathrm T}Q_ix\le 1\ \forall\,1\le i\le m_1,\ y^{\mathrm T}P_jy\le 1\ \forall\,1\le j\le m_2\right\}\\ &\ge 1-\mathrm{Prob}\{\xi^{\mathrm T}M\eta<M\bullet W\}-\sum_{i=1}^{m_1}\mathrm{Prob}\{\xi^{\mathrm T}Q_i\xi>\alpha_1\}-\sum_{j=1}^{m_2}\mathrm{Prob}\{\eta^{\mathrm T}P_j\eta>\alpha_2n_1\}\\ &\ge 1-(1-\theta)-m_1\cdot 2e^{-\frac{\alpha_1}{2}}-m_2\cdot 2e^{-\frac{\alpha_2}{2}}=\theta/2,\end{aligned}$$
where $\alpha_1:=2\ln(8m_1/\theta)$ and $\alpha_2:=2\ln(8m_2/\theta)$. Since $\alpha_1\alpha_2=O(\log m_1\log m_2)$, the desired (x,y) can be found with high probability in multiple trials. $\square$
Let us now turn back to (TQ). If we let $W=\hat{W}$ and $M=F(\cdot,\cdot,\hat{z})$ in applying Lemma 3.3.2, then in polynomial-time we can find $(\hat{x},\hat{y})$ satisfying the constraints of (TQ), such that
$$F(\hat{x},\hat{y},\hat{z})=\hat{x}^{\mathrm T}M\hat{y}\ge\frac{1}{\sqrt{n_1}}\,\Omega\left(\frac{1}{\sqrt{\log m_1\log m_2}}\right)M\bullet\hat{W}\ge\frac{1}{\sqrt{n_1}}\,\Omega\left(\frac{1}{\log m}\right)F(\hat{W},\hat{z}).$$
Combined with (3.1), we thus prove the following result.
Theorem 3.3.3 If d = 3, then (TQ) admits a polynomial-time randomized approximation algorithm with approximation ratio $\frac{1}{\sqrt{n_1}}\,\Omega\left(\frac{1}{\log^2m}\right)$, where $m=\max\{m_1,m_2,m_3\}$.
This result can be generalized to the model (TQ) of any fixed degree d.

Theorem 3.3.4 (TQ) admits a polynomial-time randomized approximation algorithm with approximation ratio τ(TQ), where
$$\tau(TQ):=\left(\prod_{k=1}^{d-2}n_k\right)^{-\frac12}\Omega\left(\log^{-(d-1)}m\right),$$
and $m=\max_{1\le k\le d}m_k$.
Proof. We again take recursive steps. Denoting $W=x^1(x^d)^{\mathrm T}$, (TQ) is relaxed to
$$(\overline{TQ})\quad\begin{array}{ll}\max & F(W,x^2,x^3,\cdots,x^{d-1})\\ \mathrm{s.t.} & \mathrm{tr}(Q^1_{i_1}WQ^d_{i_d}W^{\mathrm T})\le 1,\ i_1=1,2,\ldots,m_1,\ i_d=1,2,\ldots,m_d,\\ & (x^k)^{\mathrm T}Q^k_{i_k}x^k\le 1,\ k=2,3,\ldots,d-1,\ i_k=1,2,\ldots,m_k,\\ & W\in\mathbb{R}^{n_1\times n_d},\ x^k\in\mathbb{R}^{n_k},\ k=2,3,\ldots,d-1.\end{array}$$
Notice that $(\overline{TQ})$ is exactly in the form of (TQ) of degree d − 1, by treating W as a vector of dimension $n_1n_d$. By recursion, with high probability we can find a feasible solution $(\hat{W},\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1})$ of $(\overline{TQ})$ in polynomial-time, such that
$$F(\hat{W},\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1})\ge\left(\prod_{k=2}^{d-2}n_k\right)^{-\frac12}\Omega\left(\log^{-(d-2)}\max\{m,m_1m_d\}\right)v(\overline{TQ})\ge\left(\prod_{k=2}^{d-2}n_k\right)^{-\frac12}\Omega\left(\log^{-(d-2)}m\right)v(TQ).$$
Now we fix $(\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1})$, and let $M=F(\cdot,\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1},\cdot)$ and $W=\hat{W}$ in applying Lemma 3.3.2. Then we are able to find $(\hat{x}^1,\hat{x}^d)$ satisfying the constraints of (TQ), such that
$$F(\hat{x}^1,\hat{x}^2,\cdots,\hat{x}^d)\ge\frac{1}{\sqrt{n_1}}\,\Omega\left(\frac{1}{\log m}\right)F(\hat{W},\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1})\ge\tau(TQ)\,v(TQ).\qquad\square$$
Summarizing, the recursive procedure for solving the general (TQ) (Theorem 3.3.4) is highlighted as follows:

Algorithm 3.3.2
• INPUT: a d-th order tensor $F\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_d}$ with $n_1\le n_2\le\cdots\le n_d$; matrices $Q^k_{i_k}\in\mathbb{R}^{n_k\times n_k}$, $Q^k_{i_k}\succeq 0$ and $\sum_{i_k=1}^{m_k}Q^k_{i_k}\succ 0$ for all $1\le k\le d$ and $1\le i_k\le m_k$.
1. Rewrite F as a (d − 1)-th order tensor $F'\in\mathbb{R}^{n_2\times n_3\times\cdots\times n_{d-1}\times n_dn_1}$ by combining its first and last modes into one, and placing it in the last mode of F′, i.e.,
$$F_{i_1,i_2,\cdots,i_d}=F'_{i_2,i_3,\cdots,i_{d-1},(i_1-1)n_d+i_d}\quad\forall\,1\le i_1\le n_1,\ 1\le i_2\le n_2,\ \cdots,\ 1\le i_d\le n_d.$$
2. Compute matrices $P_{i_1,i_d}=Q^1_{i_1}\otimes Q^d_{i_d}$ for all $1\le i_1\le m_1$ and $1\le i_d\le m_d$.
3. For (TQ) with the (d − 1)-th order tensor F′ and matrices $Q^k_{i_k}$ ($2\le k\le d-1$, $1\le i_k\le m_k$) and $P_{i_1,i_d}$ ($1\le i_1\le m_1$, $1\le i_d\le m_d$): if d − 1 = 2, then apply the SDP relaxation and randomization procedure (Nemirovski et al. [87]) to obtain an approximate solution $(\hat{x}^2,\hat{x}^{1,d})$; otherwise obtain a solution $(\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1},\hat{x}^{1,d})$ by recursion.
4. Compute the matrix $M'=F(\cdot,\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1},\cdot)$ and rewrite the vector $\hat{x}^{1,d}$ as a matrix $\hat{X}\in\mathbb{R}^{n_1\times n_d}$.
5. Apply DR 3.3.1 with input $(Q^1_i,Q^d_j,\hat{X},M')=(Q_i,P_j,W,M)$ for all $1\le i\le m_1$ and $1\le j\le m_d$, and output $(\hat{x}^1,\hat{x}^d)=(x,y)$.
• OUTPUT: a feasible solution $(\hat{x}^1,\hat{x}^2,\cdots,\hat{x}^d)$.
3.4 Applications
As we mentioned at the beginning of this chapter, one of the main reasons to study multilinear form optimization is its strong connection to homogeneous polynomial optimization in deriving approximation bounds, which will be discussed in the next chapter. Apart from that, these models also have versatile applications of their own. In this section we present two problems and show that they are readily formulated by the polynomial optimization models in this chapter.
3.4.1 Singular Values of Trilinear Forms
Trilinear forms play an increasingly important role in many parts of analysis, e.g., in Fourier analysis, where they appear in the guise of paracommutators and compensated quantities (see a survey by Peng and Wong [95]). The problem of singular values of trilinear forms is the following (see also [18]). Denote by $H_1$, $H_2$ and $H_3$ three separable Hilbert spaces over the field $\mathbb{K}$, where $\mathbb{K}$ stands either for the real or the complex numbers, and denote a trilinear form $F:H_1\times H_2\times H_3\mapsto\mathbb{K}$. The spectrum norm of the trilinear form F is then given by the following maximization problem:
$$\begin{array}{lll}\|F\|_S:= & \sup & |F(x,y,z)|\\ & \mathrm{s.t.} & \|x\|\le 1,\ \|y\|\le 1,\ \|z\|\le 1,\\ & & x\in H_1,\ y\in H_2,\ z\in H_3.\end{array}$$
More generally, one can state the problem of the stationary values of the functional $|F(x,y,z)|$ under the same conditions; these stationary values are called the singular values of the trilinear form F. Bernhardsson and Peetre [18] showed, in the binary case, that $\|F\|_S^2$ is among the roots of a certain algebraic equation, called the millennial equation, thought of as a generalization of the time-honored secular equation in the case of matrices. Another approach to singular values is given by De Lathauwer et al. [28].
When the Hilbert spaces are specialized to finite dimensional Euclidean spaces, i.e., $H_i=\mathbb{R}^{n_i}$ for $i=1,2,3$, and the field $\mathbb{K}$ is restricted to the reals, the problem of computing the largest singular value $\|F\|_S$ is equivalent to (TS) with d = 3. This is because one can always replace $(x,y,z)$ by $(-x,y,z)$ if the objective value is negative, so the absolute value sign in $|F(x,y,z)|$ can be omitted; moreover, we can scale the decision variables so that $\|x\|=\|y\|=\|z\|=1$ without decreasing the objective. According to Proposition 3.2.2, the problem of computing $\|F\|_S$ is NP-hard already in this real case. Together with Theorem 3.2.3, the spectrum norm of a trilinear form can be approximated in polynomial-time within a factor of $1/\sqrt{\min\{n_1,n_2,n_3\}}$.
3.4.2 Rank-One Approximation of Tensors
Decompositions of higher order tensors (i.e., tensors of order at least 3) have versatile applications in psychometrics, chemometrics, signal processing, computer vision, numerical analysis, data mining, neuroscience, graph analysis, and elsewhere (see e.g., the excellent survey by Kolda and Bader [68]). The earliest story of tensor decomposition dates back to 1927, when Hitchcock [55, 56] proposed the idea of the polyadic form of a tensor. Today, tensor decomposition is most widely used in the form of the canonical decomposition (CANDECOMP) of Carroll and Chang [23] and the parallel factors (PARAFAC) of Harshman [45], in short the CP decomposition. A CP decomposition decomposes a tensor into a summation of rank-one tensors, i.e., tensors which can be written as outer products of vectors (see e.g., [67]). Specifically, for a d-th order tensor $F=(F_{i_1i_2\cdots i_d})\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_d}$ and a given positive integer r, its CP decomposition is
$$F\approx\sum_{i=1}^rx^1_i\otimes x^2_i\otimes\cdots\otimes x^d_i,$$
where $x^k_i\in\mathbb{R}^{n_k}$ for $i=1,2,\ldots,r$, $k=1,2,\ldots,d$. Exact recovery of a rank-one decomposition is in general impossible, due to various reasons, e.g., data errors. Thus the following optimization problem for the CP decomposition arises naturally, i.e., to minimize the norm of the difference:
$$\begin{array}{ll}\min & \left\|F-\sum_{i=1}^rx^1_i\otimes x^2_i\otimes\cdots\otimes x^d_i\right\|\\ \mathrm{s.t.} & x^k_i\in\mathbb{R}^{n_k},\ i=1,2,\ldots,r,\ k=1,2,\ldots,d.\end{array}$$
In particular, the case r = 1 corresponds to the best rank-one approximation of a tensor, i.e.,
$$(TA)\quad\begin{array}{ll}\min & \|F-x^1\otimes x^2\otimes\cdots\otimes x^d\|\\ \mathrm{s.t.} & x^k\in\mathbb{R}^{n_k},\ k=1,2,\ldots,d.\end{array}$$
By scaling, we may require each $x^k$ to have unit norm; then (TA) is equivalent to
$$\begin{array}{ll}\min & \|F-\lambda\,x^1\otimes x^2\otimes\cdots\otimes x^d\|\\ \mathrm{s.t.} & \lambda\in\mathbb{R},\ x^k\in S^{n_k},\ k=1,2,\ldots,d.\end{array}$$
For any fixed $x^k\in S^{n_k}$ ($k=1,2,\ldots,d$), optimizing the objective function of (TA) with respect to λ gives
$$\begin{aligned}\min_{\lambda\in\mathbb{R}}\|F-\lambda\,x^1\otimes x^2\otimes\cdots\otimes x^d\|&=\min_{\lambda\in\mathbb{R}}\sqrt{\|F\|^2-2\lambda\,F\bullet(x^1\otimes x^2\otimes\cdots\otimes x^d)+\lambda^2\|x^1\otimes x^2\otimes\cdots\otimes x^d\|^2}\\ &=\min_{\lambda\in\mathbb{R}}\sqrt{\|F\|^2-2\lambda\,F(x^1,x^2,\cdots,x^d)+\lambda^2}\\ &=\sqrt{\|F\|^2-\left(F(x^1,x^2,\cdots,x^d)\right)^2},\end{aligned}$$
where the minimum is attained at $\lambda=F(x^1,x^2,\cdots,x^d)$. Thus (TA) is equivalent to
$$\begin{array}{ll}\max & |F(x^1,x^2,\cdots,x^d)|\\ \mathrm{s.t.} & x^k\in S^{n_k},\ k=1,2,\ldots,d,\end{array}$$
which is the same as (TS) discussed in Section 3.2. Similar derivations can also be found in [29, 122, 67].
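A quick numerical check of this reduction (a sketch in numpy, assuming nothing beyond the identity just derived): for unit vectors the best scalar is $\lambda^*=F(x^1,\cdots,x^d)$, and the residual equals $\sqrt{\|F\|^2-F(x^1,\cdots,x^d)^2}$.

```python
import numpy as np

F = np.random.randn(3, 4, 5)
xs = [np.random.randn(n) for n in F.shape]
xs = [v / np.linalg.norm(v) for v in xs]              # unit-norm factors
val = np.einsum('ijk,i,j,k->', F, *xs)                # F(x1, x2, x3)
rank1 = val * np.einsum('i,j,k->ijk', *xs)            # lambda* x1 (x) x2 (x) x3
residual = np.linalg.norm(F - rank1)
print(residual, np.sqrt(np.linalg.norm(F) ** 2 - val ** 2))   # the two agree
```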
3.5 Numerical Experiments
In this section we test the numerical performance of the approximation algorithms proposed in this chapter. As mentioned in Section 2.1.3, all the numerical computations reported in this thesis are performed on an Intel Pentium 4 CPU 2.80GHz computer with 2GB of RAM, and the supporting software is Matlab 7.7.0 (R2008b). The experiments in this section focus on the model (TS) with d = 4, or equivalently, on finding the best rank-one approximation of a fourth order tensor as discussed in Section 3.4.2. Specifically, we test the following problem:
$$(ETS)\quad\begin{array}{ll}\max & F(x,y,z,w)=\sum_{1\le i,j,k,\ell\le n}F_{ijk\ell}\,x_iy_jz_kw_\ell\\ \mathrm{s.t.} & x,y,z,w\in S^n.\end{array}$$
46 3 Multilinear Form Optimization with Quadratic Constraints
3.5.1 Randomly Simulated Data
A fourth order tensor F is generated randomly, with its $n^4$ entries i.i.d. standard normals. Basically, we have a choice to make in the recursion of Algorithm 3.2.3, yielding two different methods, both of which call the deterministic routine DR 3.2.2. Method 1 follows the standard recursive procedure of Algorithm 3.2.3, and its first relaxation problem is
$$\bar{v}_1=\begin{array}[t]{ll}\max & F(Z,w)=\sum_{1\le i,j,k,\ell\le n}F_{ijk\ell}\,Z_{ijk}w_\ell\\ \mathrm{s.t.} & Z\in S^{n\times n\times n},\ w\in S^n.\end{array}$$
After obtaining its optimal solution $(Z^*,w^*)$, we fix $w^*$ and the problem reduces to a trilinear case of (ETS), on which the recursion goes on. The objective value of the approximate solution obtained is denoted by $v_1$, and the ratio $\tau_1:=v_1/\bar{v}_1$ is also computed. Method 2 chooses the other relaxation as its first step, namely
$$\bar{v}_2=\begin{array}[t]{ll}\max & F(X,Z)=\sum_{1\le i,j,k,\ell\le n}F_{ijk\ell}\,X_{ij}Z_{k\ell}\\ \mathrm{s.t.} & X,Z\in S^{n\times n}.\end{array}$$
After obtaining its optimal solution $(X^*,Z^*)$, we first fix $Z^*$ and apply DR 3.2.2 to decompose $X^*$ into $\hat{x},\hat{y}\in S^n$, and then fix $(\hat{x},\hat{y})$ and apply DR 3.2.2 again to decompose $Z^*$ into $\hat{z},\hat{w}\in S^n$, resulting in a feasible solution. We also compute its objective value $v_2$ and the ratio $\tau_2:=v_2/\bar{v}_2$.
According to Theorem 3.2.4, Method 1 enjoys a theoretical worst-case performance ratio of 1/n. Method 2 follows Algorithm 3.2.3 in a similar fashion, only choosing a different recursion; it also enjoys a worst-case ratio of 1/n, which can be proven along the lines of Theorem 3.2.4. From the simulation results in Table 3.1, the objective values of the feasible solutions produced by the two methods are indeed very close. However, Method 2 computes a much better upper bound on v(ETS), and thus ends up with a better approximation ratio. The numerical results in Table 3.1 suggest that the observed performance ratio of Method 1 is about 2/n, while that of Method 2 is about $1/\sqrt{n}$. The main difference between the upper bounds on v(ETS) ($\bar{v}_1$ vs. $\bar{v}_2$) lies in the first relaxation: by Proposition 3.2.1 we may guess that $\bar{v}_1=\Omega(\|F\|/\sqrt{n})$ while $\bar{v}_2=\Omega(\|F\|/n)$, which may account for the large gap between $\bar{v}_1$ and $\bar{v}_2$. Consequently, it is quite possible that the true value of v(ETS) is closer to the solution values ($v_1$ and $v_2$) than to the optimal value of the relaxed problem ($\bar{v}_2$); the real quality of the solutions produced is possibly much better than what is shown by the upper bounds.
Table 3.1: Numerical results (average of 10 instances) of (ETS)

n        2      5      10     20     30      40      50      60      70
v1       2.61   5.64   8.29   9.58   12.55   13.58   15.57   17.65   18.93
v2       2.69   6.57   7.56   10.87  11.74   13.89   14.56   17.10   17.76
v̄1       3.84   12.70  34.81  93.38  169.08  258.94  360.89  472.15  594.13
v̄2       2.91   9.46   20.46  39.40  59.55   79.53   99.61   119.77  140.03
τ1 (%)   67.97  44.41  23.81  10.26  7.42    5.24    4.31    3.74    3.19
τ2 (%)   92.44  69.45  36.95  27.59  19.71   17.47   14.62   14.28   12.68
n·τ1     1.36   2.22   2.38   2.05   2.23    2.10    2.16    2.24    2.23
n·τ2     1.85   3.47   3.69   5.52   5.91    6.99    7.31    8.57    8.88
√n·τ2    1.31   1.55   1.17   1.23   1.08    1.10    1.03    1.11    1.06
Table 3.2: CPU seconds (average of 10 instances) for solving (ETS)

n         5     10    20    30    40    50    60    70    80   90   100   150
Method 1  0.01  0.01  0.02  0.06  0.20  0.45  0.95  1.94  3.04 5.08 8.04  58.4
Method 2  0.01  0.02  1.13  12.6  253   517   2433  9860  ∞    ∞    ∞     ∞
Although Method 2 clearly outperforms Method 1 in terms of the upper bound on v(ETS), it requires much more computational time. The most expensive part of Method 2 is its first relaxation, which computes the largest eigenvalue and a corresponding eigenvector of an $n^2\times n^2$ matrix; in comparison, the corresponding part of Method 1 involves only an $n\times n$ matrix. Evidence in Table 3.2 shows that Method 1 finds a good quality solution very quickly even for large problems. We remark that for n = 100, the size of the input data is already of the magnitude of $10^8$.
3.5.2 Data with Known Optimal Solutions
The upper bounds appear to be quite loose in general, as one may observe from the previous numerical results. To test how good the solutions are without referring to the computed upper bounds, in this subsection we report test results where the problem instances are constructed in such a way that the optimal solutions are known. By this we hope to get some impression, from a different angle, of the quality of the approximate solutions produced by our algorithms.

We first randomly generate a vector $a\in S^n$, and generate m symmetric matrices $A_i\in\mathbb{R}^{n\times n}$ ($1\le i\le m$) with their eigenvalues lying in the interval [−1, 1] and $A_ia=a$ for $i=1,2,\ldots,m$. Then, we randomly generate a vector $b\in S^n$, and m symmetric matrices $B_i\in\mathbb{R}^{n\times n}$ ($1\le i\le m$) with their eigenvalues in the interval [−1, 1] and $B_ib=b$ for $i=1,2,\ldots,m$. Define
$$F(x,y,z,w)=\sum_{i=1}^m\left(x^{\mathrm T}A_iy\cdot z^{\mathrm T}B_iw\right).$$
For this particular multilinear form F(x,y,z,w), it is easy to see that (ETS) has an optimal solution (a,a,b,b) with optimal value m.
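One way to generate such matrices (a sketch; the thesis does not spell out the construction) is to fix the eigenvalue 1 along a and draw the remaining spectrum uniformly from [−1, 1] on the orthogonal complement of a.

```python
import numpy as np

def random_constrained_matrix(a):
    """Random symmetric A with A a = a and remaining eigenvalues in [-1, 1]."""
    n = len(a)
    # Orthonormal basis whose first column spans a: QR of [a | random columns];
    # the trailing columns are an orthonormal basis of a's complement.
    Q = np.linalg.qr(np.column_stack([a, np.random.randn(n, n - 1)]))[0][:, 1:]
    u = np.random.uniform(-1.0, 1.0, n - 1)        # spectrum on the complement
    return np.outer(a, a) + Q @ (u[:, None] * Q.T)

n, m = 50, 5
a = np.random.randn(n); a /= np.linalg.norm(a)
b = np.random.randn(n); b /= np.linalg.norm(b)
A = [random_constrained_matrix(a) for _ in range(m)]
B = [random_constrained_matrix(b) for _ in range(m)]
print(np.allclose(A[0] @ a, a), np.allclose(B[0] @ b, b))         # True True
# With F(x,y,z,w) = sum_i (x'A_i y)(z'B_i w), the value at (a,a,b,b) is m:
print(sum((a @ Ai @ a) * (b @ Bi @ b) for Ai, Bi in zip(A, B)))   # = m
```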
We generate such random instances with n = 50 for various m, and apply Method 1 to solve them. Since the optimal values are known, it is possible to compute the exact performance ratios, i.e., $v_1/m$. For each m, 200 random instances are generated and tested. The results are shown in Table 3.3, and suggest that our algorithm works very well: the observed performance ratios are much better than the theoretical worst-case bounds. Indeed, whenever m ≥ 50 our algorithm always finds an optimal solution.

Table 3.3: Numerical ratios of (ETS) with known optima for n = 50

m                         5    10   20   30   40   50   100  150  200
Minimal ratio (%)         50   66   43   37   37   100  100  100  100
Maximal ratio (%)         100  100  100  100  100  100  100  100  100
Average ratio (%)         97   86   76   87   97   100  100  100  100
Optimality instances (%)  7    10   35   71   94   100  100  100  100
Chapter 4
Homogeneous Form Optimization
with Quadratic Constraints
4.1 Introduction
This chapter studies the optimization of an important class of polynomial functions, namely homogeneous polynomials, or forms. The constraint set is defined by homogeneous quadratic polynomial equalities or inequalities. Specifically, the models include maximizing a homogeneous form over the Euclidean sphere
$$(HS)\quad\begin{array}{ll}\max & f(x)\\ \mathrm{s.t.} & x\in S^n,\end{array}$$
and maximizing a homogeneous form over the intersection of co-centered ellipsoids
$$(HQ)\quad\begin{array}{ll}\max & f(x)\\ \mathrm{s.t.} & x^{\mathrm T}Q_ix\le 1,\ i=1,2,\ldots,m,\\ & x\in\mathbb{R}^n,\end{array}$$
where $Q_i\succeq 0$ for $i=1,2,\ldots,m$, and $\sum_{i=1}^mQ_i\succ 0$.

As a general extension, we also consider the optimization of mixed forms, i.e.,
$$\text{Function }M\quad f(x^1,x^2,\cdots,x^s)=F(\underbrace{x^1,x^1,\cdots,x^1}_{d_1},\underbrace{x^2,x^2,\cdots,x^2}_{d_2},\cdots,\underbrace{x^s,x^s,\cdots,x^s}_{d_s}),$$
where $d=d_1+d_2+\cdots+d_s$ is deemed a fixed constant, and the d-th order tensor $F\in\mathbb{R}^{n_1^{d_1}\times n_2^{d_2}\times\cdots\times n_s^{d_s}}$ has the partial symmetric property.
The mixed form optimization models include
$$(MS)\quad\begin{array}{ll}\max & f(x^1,x^2,\cdots,x^s)\\ \mathrm{s.t.} & x^k\in S^{n_k},\ k=1,2,\ldots,s;\end{array}$$
$$(MQ)\quad\begin{array}{ll}\max & f(x^1,x^2,\cdots,x^s)\\ \mathrm{s.t.} & (x^k)^{\mathrm T}Q^k_{i_k}x^k\le 1,\ k=1,2,\ldots,s,\ i_k=1,2,\ldots,m_k,\\ & x^k\in\mathbb{R}^{n_k},\ k=1,2,\ldots,s,\end{array}$$
where $Q^k_{i_k}\succeq 0$ and $\sum_{i_k=1}^{m_k}Q^k_{i_k}\succ 0$ for $k=1,2,\ldots,s$, $i_k=1,2,\ldots,m_k$.
The model (MS) generalizes (HS) and (TS) of Section 3.2, and (MQ) generalizes (HQ) and (TQ) of Section 3.3. When the degree of the polynomial objective is odd, (HS) is equivalent to its ball-constrained version
$$\begin{array}{ll}\max & f(x)\\ \mathrm{s.t.} & \|x\|\le 1,\ x\in\mathbb{R}^n.\end{array}$$
This is because we can always replace x by −x if the objective value is negative, and we can scale the vector x along its direction to bring it onto $S^n$ without decreasing the objective. Thus, for odd d, (HS) is a special case of (HQ). However, if d is even, the optimal value of (HS) may be negative, while that of (HQ) is always nonnegative since 0 is a feasible solution of (HQ). In the former case, the tensor F is called negative definite, i.e., f(x) < 0 for all $x\ne 0$.
The model (HS) is in general NP-hard. When d = 1, (HS) has a closed-form solution, due to the Cauchy-Schwartz inequality; when d = 2, (HS) amounts to computing the largest eigenvalue of the symmetric matrix F; however, (HS) becomes NP-hard when d = 3, as first proven by Nesterov [90]. Interestingly, when d ≥ 3, the model (HS) can also be regarded as computing the largest eigenvalue of the super-symmetric tensor F, just like the case d = 2 (see e.g., Qi [98]). Luo and Zhang [77] proposed the first polynomial-time randomized approximation algorithm with relative approximation ratio $\Omega(1/n^2)$ for the case d = 4, based on its quadratic SDP relaxation and randomization techniques.
For the model (HQ): when d = 1, it can be formulated as a standard SOCP problem, which is solvable in polynomial-time; when d = 2, it is the well known QCQP problem discussed in Section 2.5, which is NP-hard in general. Nemirovski et al. [87] proposed a polynomial-time randomized approximation algorithm with approximation ratio $\Omega(1/\log m)$ based on SDP relaxation and randomization, and this ratio is actually tight; when d = 4, Luo and Zhang [77] established the relationship between (HQ) and its quadratic SDP relaxation, and proposed a polynomial-time approximation algorithm for the case where the number of constraints is one. Meanwhile, Ling et al. [73] proposed the bi-quadratic optimization model, which is exactly the model (MS) with d = 4 and $d_1=d_2=2$. In particular, they established the equivalence between (MS) and its quadratic SDP relaxation, based on which they proposed a polynomial-time randomized approximation algorithm with relative approximation ratio $\Omega(1/n_2^2)$.
For the model (MS), the computational complexity is similar to that of its special cases (TS) and (HS): it is solvable in polynomial-time when d ≤ 2, and is NP-hard when d ≥ 3, as claimed in Section 4.4.1. Moreover, when d ≥ 4 and all $d_k$ ($1\le k\le s$) are even, there is no polynomial-time approximation algorithm with a positive approximation ratio unless P = NP; this is verified for the simple case d = 4 and $d_1=d_2=2$ by Ling et al. [73]. The complexity of (MQ) is the same as that of (HQ), i.e., solvable in polynomial-time when d = 1 and NP-hard when d ≥ 2. Meanwhile, the special case of (MQ) with d = 4 and $d_1=d_2=2$ is the biquadratic form optimization over quadratic constraints, studied by Zhang et al. [123] and Ling et al. [74]. In their work, the relationship between biquadratic optimization and its bilinear SDP relaxation is established, and some data-dependent approximation bounds are derived.
In this chapter, we present polynomial-time approximation algorithms with guaranteed worst-case performance ratios for the models concerned. Our algorithms work for any fixed degree d, and the approximation ratios improve all the previous works specialized to their particular degrees. The major breakthrough in our work is the multilinear tensor form relaxation, instead of the quadratic SDP relaxation methods of [77, 73]. The relaxed multilinear optimization problems admit the polynomial-time approximation algorithms discussed in Chapter 3. After solving the relaxed problem, we merge the relaxed variables into one feasible solution by means of a linking identity, and argue that the quality ratio deteriorates only by a constant factor; this is the main contribution of this chapter.
The approximation algorithms for (HS) are presented in Section 4.2, followed by those for (HQ) in Section 4.3. The models (MS) and (MQ) are studied in Section 4.4. In Section 4.5, we discuss some applications of the models presented in this chapter. Finally, the numerical performance of the proposed algorithms is reported in Section 4.6.
4.2 Homogeneous Form with Spherical Constraint
The first model in this chapter is to maximize a homogeneous polynomial function of fixed degree d over the sphere, i.e.,
$$(HS)\quad\begin{array}{ll}\max & f(x)\\ \mathrm{s.t.} & x\in S^n.\end{array}$$
Let F be the super-symmetric tensor satisfying $F(\underbrace{x,x,\cdots,x}_{d})=f(x)$. Then (HS) can be relaxed to the multilinear form optimization model (TS) discussed in Chapter 3, as follows:
$$(\overline{HS})\quad\begin{array}{ll}\max & F(x^1,x^2,\cdots,x^d)\\ \mathrm{s.t.} & x^k\in S^n,\ k=1,2,\ldots,d.\end{array}$$
Theorem 3.2.4 asserts that $(\overline{HS})$ can be solved approximately in polynomial-time, with approximation ratio $n^{-\frac{d-2}{2}}$. The key step is to draw a feasible solution of (HS) from the approximate solution of $(\overline{HS})$. For this purpose, we establish the following link between (HS) and $(\overline{HS})$.
Lemma 4.2.1 Suppose $x^1,x^2,\ldots,x^d\in\mathbb{R}^n$, and $\xi_1,\xi_2,\cdots,\xi_d$ are i.i.d. random variables, each taking values 1 and −1 with equal probability 1/2. For any super-symmetric d-th order tensor F and function $f(x)=F(x,x,\cdots,x)$, it holds that
$$\mathrm{E}\left[\prod_{i=1}^d\xi_i\,f\left(\sum_{k=1}^d\xi_kx^k\right)\right]=d!\,F(x^1,x^2,\cdots,x^d).$$
Proof. First we observe that
$$\mathrm{E}\left[\prod_{i=1}^d\xi_i\,f\left(\sum_{k=1}^d\xi_kx^k\right)\right]=\mathrm{E}\left[\prod_{i=1}^d\xi_i\sum_{1\le k_1,k_2,\cdots,k_d\le d}F\left(\xi_{k_1}x^{k_1},\xi_{k_2}x^{k_2},\cdots,\xi_{k_d}x^{k_d}\right)\right]=\sum_{1\le k_1,k_2,\cdots,k_d\le d}\mathrm{E}\left[\prod_{i=1}^d\xi_i\prod_{j=1}^d\xi_{k_j}\right]F\left(x^{k_1},x^{k_2},\cdots,x^{k_d}\right).$$
If $(k_1,k_2,\cdots,k_d)$ is a permutation of $(1,2,\ldots,d)$, then
$$\mathrm{E}\left[\prod_{i=1}^d\xi_i\prod_{j=1}^d\xi_{k_j}\right]=\mathrm{E}\left[\prod_{i=1}^d\xi_i^2\right]=1;$$
otherwise, there must be an index $k_0$ with $1\le k_0\le d$ and $k_0\ne k_j$ for all $1\le j\le d$, in which case
$$\mathrm{E}\left[\prod_{i=1}^d\xi_i\prod_{j=1}^d\xi_{k_j}\right]=\mathrm{E}[\xi_{k_0}]\,\mathrm{E}\left[\prod_{1\le i\le d,\,i\ne k_0}\xi_i\prod_{j=1}^d\xi_{k_j}\right]=0.$$
Since the number of different permutations of $(1,2,\ldots,d)$ is d!, by taking into account the super-symmetric property of the tensor F, the claimed relation follows. $\square$
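Since each $\xi_i$ takes only the values ±1, the expectation in Lemma 4.2.1 is a finite average over the $2^d$ sign patterns, which makes the identity easy to verify numerically. The following sketch (illustrative, with a randomly symmetrized tensor) does so for d = 3.

```python
import math
from itertools import product
import numpy as np

n, d = 4, 3
G = np.random.randn(n, n, n)
# super-symmetrize G by averaging over all 3! index permutations
F = sum(np.transpose(G, p) for p in
        [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]) / 6.0

def f(x):                                     # f(x) = F(x, x, x)
    return np.einsum('ijk,i,j,k->', F, x, x, x)

xs = [np.random.randn(n) for _ in range(d)]
lhs = np.mean([np.prod(s) * f(sum(si * xi for si, xi in zip(s, xs)))
               for s in product([-1, 1], repeat=d)])
rhs = math.factorial(d) * np.einsum('ijk,i,j,k->', F, *xs)
print(lhs, rhs)                               # the two values coincide
```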
When d is odd, the identity in Lemma 4.2.1 can be rewritten as
$$d!\,F(x^1,x^2,\cdots,x^d)=\mathrm{E}\left[\prod_{i=1}^d\xi_i\,f\left(\sum_{k=1}^d\xi_kx^k\right)\right]=\mathrm{E}\left[f\left(\sum_{k=1}^d\left(\prod_{i\ne k}\xi_i\right)x^k\right)\right].$$
Since $\xi_1,\xi_2,\cdots,\xi_d$ are i.i.d. random variables taking values 1 or −1, by randomization we may find a particular binary vector $\beta\in\mathbb{B}^d$, such that
$$f\left(\sum_{k=1}^d\left(\prod_{i\ne k}\beta_i\right)x^k\right)\ge d!\,F(x^1,x^2,\cdots,x^d).\tag{4.1}$$
We remark that d is considered a constant parameter in this thesis; therefore, searching over all the combinations can be done, in principle, in constant time.
Let $\hat{x}=\sum_{k=1}^d\left(\prod_{i\ne k}\beta_i\right)x^k$, and $x=\hat{x}/\|\hat{x}\|$. By the triangle inequality, we have $\|\hat{x}\|\le d$, and thus
$$f(x)\ge d!\,d^{-d}\,F(x^1,x^2,\cdots,x^d).$$
Combining this with Theorem 3.2.4, we have:

Theorem 4.2.2 When $d\ge 3$ is odd, (HS) admits a polynomial-time approximation algorithm with approximation ratio τ(HS), where
$$\tau(HS):=d!\,d^{-d}\,n^{-\frac{d-2}{2}}=\Omega\left(n^{-\frac{d-2}{2}}\right).$$
The algorithm for approximately solving (HS) with odd d is highlighted below.

Algorithm 4.2.1
• INPUT: a d-th order super-symmetric tensor $F\in\mathbb{R}^{n^d}$.
1. Apply Algorithm 3.2.3 to solve the problem
$$\begin{array}{ll}\max & F(x^1,x^2,\cdots,x^d)\\ \mathrm{s.t.} & x^k\in S^n,\ k=1,2,\ldots,d\end{array}$$
approximately, with input F and output $(\hat{x}^1,\hat{x}^2,\cdots,\hat{x}^d)$.
2. Compute $\beta=\arg\max_{\xi\in\mathbb{B}^d}f\left(\sum_{k=1}^d\xi_k\hat{x}^k\right)$, or randomly generate β uniformly on $\mathbb{B}^d$ and repeat if necessary, until $f\left(\sum_{k=1}^d\beta_k\hat{x}^k\right)\ge d!\,F(\hat{x}^1,\hat{x}^2,\cdots,\hat{x}^d)$.
3. Compute $x=\sum_{k=1}^d\beta_k\hat{x}^k\Big/\left\|\sum_{k=1}^d\beta_k\hat{x}^k\right\|$.
• OUTPUT: a feasible solution $x\in S^n$.
We remark that it is unnecessary to enumerate all $2^d$ possible combinations in Step 2 of Algorithm 4.2.1, as (4.1) suggests that a simple randomization process will serve the same purpose, especially when d is large. In the latter case, we end up with a polynomial-time randomized approximation algorithm; otherwise Algorithm 4.2.1 is deterministic, and runs in polynomial-time for fixed d.
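Steps 2–3 admit a direct implementation by enumeration, shown in the sketch below (with f supplied as a callable; names are illustrative):

```python
import numpy as np
from itertools import product

def merge_solution(f, xs):
    """Steps 2-3 of Algorithm 4.2.1 for odd d (sketch): enumerate all sign
    vectors beta in B^d, keep the combination maximizing f, and normalize."""
    combos = (sum(b * xk for b, xk in zip(beta, xs))
              for beta in product([-1, 1], repeat=len(xs)))
    x = max(combos, key=f)
    return x / np.linalg.norm(x)
```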
When d is even, the only easy case of (HS) appears when d = 2, and even worse,
we have the following:
Proposition 4.2.3 If d = 4, then there is no polynomial-time approximation algorithm
with a positive approximation ratio for (HS) unless P = NP .
Proof. Let f(x) = F(x, x, x, x) with F being super-symmetric. We say the quartic form F(x, x, x, x) is positive semidefinite if F(x, x, x, x) ≥ 0 for all x ∈ R^n. It is well known that checking the positive semidefiniteness of F(x, x, x, x) is co-NP-complete. If we were able to find a polynomial-time approximation algorithm with a positive approximation ratio τ ∈ (0, 1] for v* = max_{x∈S^n} −F(x, x, x, x), then this algorithm could be used to check the positive semidefiniteness of F(x, x, x, x). To see why, suppose this algorithm returns a feasible solution x̂ with −F(x̂, x̂, x̂, x̂) > 0; then F(x, x, x, x) is not positive semidefinite. Otherwise the algorithm must return a feasible solution x̂ with 0 ≥ −F(x̂, x̂, x̂, x̂) ≥ τ v*, which implies v* ≤ 0; hence, F(x, x, x, x) is positive semidefinite in this case. Therefore, such an algorithm cannot exist unless P = NP. □
This negative result rules out any polynomial-time approximation algorithm with a positive absolute approximation ratio for (HS) when d ≥ 4 is even. Thus we can only speak of a relative approximation ratio. The following algorithm applies to (HS) when d is even.
Algorithm 4.2.2

• INPUT: a d-th order super-symmetric tensor F ∈ R^{n^d}.

1 Choose any vector x^0 ∈ S^n and define a d-th order super-symmetric tensor H ∈ R^{n^d} with respect to the homogeneous polynomial h(x) = (x^T x)^{d/2}.

2 Apply Algorithm 3.2.3 to solve the problem
    max F(x^1, x^2, ..., x^d) − f(x^0) H(x^1, x^2, ..., x^d)
    s.t. x^k ∈ S^n, k = 1, 2, ..., d
  approximately, with input F − f(x^0) H and output (x̂^1, x̂^2, ..., x̂^d).

3 Compute β = arg max_{ξ∈B^d, ∏_{k=1}^d ξ_k = 1} f(∑_{k=1}^d ξ_k x̂^k / ‖∑_{k=1}^d ξ_k x̂^k‖).

4 Compute x = arg max{ f(x^0), f(∑_{k=1}^d β_k x̂^k / ‖∑_{k=1}^d β_k x̂^k‖) }.

• OUTPUT: a feasible solution x ∈ S^n.
Theorem 4.2.4 When d ≥ 4 is even, (HS) admits a polynomial-time approximation
algorithm with relative approximation ratio τ(HS).
Proof. Denote by H the super-symmetric tensor associated with the homogeneous polynomial h(x) = ‖x‖^d = (x^T x)^{d/2}. Explicitly, if Π denotes the set of all permutations of (1, 2, ..., d), then

H(x^1, x^2, ..., x^d) = (1/|Π|) ∑_{(i_1,i_2,...,i_d)∈Π} ((x^{i_1})^T x^{i_2}) ((x^{i_3})^T x^{i_4}) ... ((x^{i_{d−1}})^T x^{i_d}).

For any x^k ∈ S^n (k = 1, 2, ..., d), we have |H(x^1, x^2, ..., x^d)| ≤ 1 by applying the Cauchy-Schwarz inequality termwise.
Pick any fixed x^0 ∈ S^n, and consider the following problem:

(H̃S) max F(x^1, x^2, ..., x^d) − f(x^0) H(x^1, x^2, ..., x^d)
     s.t. x^k ∈ S^n, k = 1, 2, ..., d.

Applying Theorem 3.2.4 we obtain a solution (x̂^1, x̂^2, ..., x̂^d) in polynomial-time, with

F(x̂^1, x̂^2, ..., x̂^d) − f(x^0) H(x̂^1, x̂^2, ..., x̂^d) ≥ τ(TS) v(H̃S),

where τ(TS) := n^{−(d−2)/2}. Let us first work on the case that

f(x^0) − v̲(HS) ≤ (τ(TS)/4) (v(HS) − v̲(HS)),   (4.2)

where v̲(HS) denotes the minimum value of (HS).
Since |H(x̂^1, x̂^2, ..., x̂^d)| ≤ 1, we have

F(x̂^1, ..., x̂^d) − v̲(HS) H(x̂^1, ..., x̂^d)
= F(x̂^1, ..., x̂^d) − f(x^0) H(x̂^1, ..., x̂^d) + (f(x^0) − v̲(HS)) H(x̂^1, ..., x̂^d)
≥ τ(TS) v(H̃S) − (f(x^0) − v̲(HS))
≥ τ(TS) (v(HS) − f(x^0)) − (τ(TS)/4) (v(HS) − v̲(HS))
≥ (τ(TS)(1 − τ(TS)/4) − τ(TS)/4) (v(HS) − v̲(HS))
≥ (τ(TS)/2) (v(HS) − v̲(HS)),

where the second inequality is due to the fact that the optimal solution of (HS) is feasible for (H̃S), and hence v(H̃S) ≥ v(HS) − f(x^0).
On the other hand, let ξ_1, ξ_2, ..., ξ_d be i.i.d. random variables, each taking values 1 and −1 with equal probability 1/2. By symmetry, we have Prob{∏_{i=1}^d ξ_i = 1} = Prob{∏_{i=1}^d ξ_i = −1} = 1/2. Applying Lemma 4.2.1 we know

d! (F(x̂^1, ..., x̂^d) − v̲(HS) H(x̂^1, ..., x̂^d))
= E[∏_{i=1}^d ξ_i (f(∑_{k=1}^d ξ_k x̂^k) − v̲(HS) h(∑_{k=1}^d ξ_k x̂^k))]
= E[f(∑_{k=1}^d ξ_k x̂^k) − v̲(HS) ‖∑_{k=1}^d ξ_k x̂^k‖^d | ∏_{i=1}^d ξ_i = 1] Prob{∏_{i=1}^d ξ_i = 1}
  − E[f(∑_{k=1}^d ξ_k x̂^k) − v̲(HS) ‖∑_{k=1}^d ξ_k x̂^k‖^d | ∏_{i=1}^d ξ_i = −1] Prob{∏_{i=1}^d ξ_i = −1}
≤ (1/2) E[f(∑_{k=1}^d ξ_k x̂^k) − v̲(HS) ‖∑_{k=1}^d ξ_k x̂^k‖^d | ∏_{i=1}^d ξ_i = 1],

where the last inequality is due to the fact that

f(∑_{k=1}^d ξ_k x̂^k) − v̲(HS) ‖∑_{k=1}^d ξ_k x̂^k‖^d ≥ 0,

since ∑_{k=1}^d ξ_k x̂^k / ‖∑_{k=1}^d ξ_k x̂^k‖ ∈ S^n. Thus by randomization, we can find β ∈ B^d with ∏_{i=1}^d β_i = 1, such that

(1/2) (f(∑_{k=1}^d β_k x̂^k) − v̲(HS) ‖∑_{k=1}^d β_k x̂^k‖^d) ≥ d! (τ(TS)/2) (v(HS) − v̲(HS)).

By letting x = ∑_{k=1}^d β_k x̂^k / ‖∑_{k=1}^d β_k x̂^k‖, and noticing ‖∑_{k=1}^d β_k x̂^k‖ ≤ d, we have

f(x) − v̲(HS) ≥ d! τ(TS) (v(HS) − v̲(HS)) / ‖∑_{k=1}^d β_k x̂^k‖^d ≥ τ(HS) (v(HS) − v̲(HS)).
Recall that the above inequality is derived under the condition that (4.2) holds. In case (4.2) does not hold, then

f(x^0) − v̲(HS) > (τ(TS)/4) (v(HS) − v̲(HS)) ≥ τ(HS) (v(HS) − v̲(HS)).   (4.3)

By picking x = arg max{f(x), f(x^0)} as in Step 4 of Algorithm 4.2.2, regardless of whether (4.2) or (4.3) holds, we uniformly have f(x) − v̲(HS) ≥ τ(HS) (v(HS) − v̲(HS)). □
4.3 Homogeneous Form with Ellipsoidal Constraints
We proceed to a further generalization of the optimization models, to include general ellipsoidal constraints:

(HQ) max f(x)
     s.t. x^T Q_i x ≤ 1, i = 1, 2, ..., m,
          x ∈ R^n,

where f(x) is a homogeneous polynomial of degree d, Q_i ⪰ 0 for i = 1, 2, ..., m, and ∑_{i=1}^m Q_i ≻ 0.
If we relax (HQ) to a multilinear form optimization problem like (TQ), we have

(H̄Q) max F(x^1, x^2, ..., x^d)
     s.t. (x^k)^T Q_i x^k ≤ 1, k = 1, 2, ..., d, i = 1, 2, ..., m,
          x^k ∈ R^n, k = 1, 2, ..., d.

Theorem 3.3.4 asserts an approximate solution for (H̄Q); together with Lemma 4.2.1, we propose the following algorithm for approximately solving (HQ), no matter whether d is odd or even.
Algorithm 4.3.1

• INPUT: a d-th order super-symmetric tensor F ∈ R^{n^d}, matrices Q_i ∈ R^{n×n} with Q_i ⪰ 0 for all 1 ≤ i ≤ m and ∑_{i=1}^m Q_i ≻ 0.

1 Apply Algorithm 3.3.2 to solve the problem
    max F(x^1, x^2, ..., x^d)
    s.t. (x^k)^T Q_i x^k ≤ 1, k = 1, 2, ..., d, i = 1, 2, ..., m,
         x^k ∈ R^n, k = 1, 2, ..., d
  approximately, and get a feasible solution (x̂^1, x̂^2, ..., x̂^d).

2 Compute x = arg max{ f((1/d) ∑_{k=1}^d ξ_k x̂^k) : ξ ∈ B^d }.

• OUTPUT: a feasible solution x ∈ R^n.
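A sketch of Step 2, including the 1/d scaling that the feasibility bound (4.5) below relies on, might look as follows; the evaluator f_eval and the block solution xhats are assumed to come from elsewhere, and the function name is illustrative.

```python
import itertools
import numpy as np

def round_with_scaling(f_eval, xhats, Qs):
    # Step 2 of Algorithm 4.3.1: pick the sign vector xi in B^d that
    # maximizes f((1/d) * sum_k xi_k xhat^k).  The scaling by 1/d is what
    # the bound (4.5) uses to keep the combination inside every ellipsoid.
    d = len(xhats)
    best_val, best_x = -np.inf, None
    for xi in itertools.product([-1.0, 1.0], repeat=d):
        x = sum(s * v for s, v in zip(xi, xhats)) / d
        val = f_eval(x)
        if val > best_val:
            best_val, best_x = val, x
    for Q in Qs:  # sanity check of feasibility, cf. (4.5)
        assert best_x @ Q @ best_x <= 1 + 1e-9
    return best_x
```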
Although Algorithm 4.3.1 applies to the model (HQ) for both odd and even d, the approximation results differ, as the following theorems claim.
Theorem 4.3.1 When d ≥ 3 is odd, (HQ) admits a polynomial-time randomized ap-
proximation algorithm with approximation ratio τ(HQ), where
τ(HQ) := d! d^{−d} n^{−(d−2)/2} Ω(log^{−(d−1)} m) = Ω(n^{−(d−2)/2} log^{−(d−1)} m).
Proof. According to Theorem 3.3.4 we can find a feasible solution (x̂^1, x̂^2, ..., x̂^d) of (H̄Q) in polynomial-time, such that

F(x̂^1, x̂^2, ..., x̂^d) ≥ τ(H̄Q) v(H̄Q) ≥ τ(H̄Q) v(HQ),   (4.4)

where τ(H̄Q) := n^{−(d−2)/2} Ω(log^{−(d−1)} m). By (4.1), we can find a binary vector β ∈ B^d in polynomial-time, such that

f(∑_{i=1}^d β_i x̂^i) ≥ d! F(x̂^1, x̂^2, ..., x̂^d).
Notice that for any 1 ≤ k ≤ m,

(∑_{i=1}^d β_i x̂^i)^T Q_k (∑_{j=1}^d β_j x̂^j) = ∑_{i,j=1}^d β_i (x̂^i)^T Q_k β_j x̂^j
= ∑_{i,j=1}^d (β_i Q_k^{1/2} x̂^i)^T (β_j Q_k^{1/2} x̂^j)
≤ ∑_{i,j=1}^d ‖β_i Q_k^{1/2} x̂^i‖ ‖β_j Q_k^{1/2} x̂^j‖
= ∑_{i,j=1}^d √((x̂^i)^T Q_k x̂^i) √((x̂^j)^T Q_k x̂^j) ≤ ∑_{i,j=1}^d 1 · 1 = d².   (4.5)
If we denote x = (1/d) ∑_{i=1}^d β_i x̂^i, then x is a feasible solution for (HQ), satisfying

f(x) ≥ d^{−d} d! F(x̂^1, x̂^2, ..., x̂^d) ≥ d^{−d} d! τ(H̄Q) v(HQ) = τ(HQ) v(HQ). □
Theorem 4.3.2 When d ≥ 4 is even, (HQ) admits a polynomial-time randomized
approximation algorithm with relative approximation ratio τ(HQ).
Proof. First, we observe that v(H̄Q) ≥ v(HQ), and that v(H̄Q) = −v̲(H̄Q) ≥ −v̲(HQ), since negating one variable x^k in (H̄Q) flips the sign of the objective. Therefore, 2 v(H̄Q) ≥ v(HQ) − v̲(HQ). Let (x̂^1, x̂^2, ..., x̂^d) be the feasible solution for (H̄Q) as in the proof of Theorem 4.3.1, satisfying (4.4). According to (4.5), it follows that x = arg max{ f((1/d) ∑_{k=1}^d ξ_k x̂^k) : ξ ∈ B^d } is feasible for (HQ), where ξ_1, ξ_2, ..., ξ_d are i.i.d. random variables, each taking values 1 and −1 with equal probability 1/2. Therefore, by Lemma 4.2.1 we have

2 d! F(x̂^1, x̂^2, ..., x̂^d) = 2 E[∏_{i=1}^d ξ_i f(∑_{k=1}^d ξ_k x̂^k)]
= E[f(∑_{k=1}^d ξ_k x̂^k) − d^d v̲(HQ) | ∏_{i=1}^d ξ_i = 1] − E[f(∑_{k=1}^d ξ_k x̂^k) − d^d v̲(HQ) | ∏_{i=1}^d ξ_i = −1]
≤ E[f(∑_{k=1}^d ξ_k x̂^k) − d^d v̲(HQ) | ∏_{i=1}^d ξ_i = 1]
≤ d^d f(x) − d^d v̲(HQ),

where the first inequality holds because f(∑_{k=1}^d ξ_k x̂^k) − d^d v̲(HQ) = d^d (f((1/d) ∑_{k=1}^d ξ_k x̂^k) − v̲(HQ)) ≥ 0 by the feasibility of (1/d) ∑_{k=1}^d ξ_k x̂^k. According to (4.4), this implies that

f(x) − v̲(HQ) ≥ 2 d^{−d} d! F(x̂^1, x̂^2, ..., x̂^d) ≥ 2 τ(HQ) v(H̄Q) ≥ τ(HQ) (v(HQ) − v̲(HQ)). □
4.4 Mixed Form with Quadratic Constraints
In this section, we further extend polynomial optimization models to the mixed forms.
Specifically, we study the following two models:
(MS) max f(x^1, x^2, ..., x^s)
     s.t. x^k ∈ S^{n_k}, k = 1, 2, ..., s;

(MQ) max f(x^1, x^2, ..., x^s)
     s.t. (x^k)^T Q^k_{i_k} x^k ≤ 1, k = 1, 2, ..., s, i_k = 1, 2, ..., m_k,
          x^k ∈ R^{n_k}, k = 1, 2, ..., s,

where Q^k_{i_k} ⪰ 0 and ∑_{i_k=1}^{m_k} Q^k_{i_k} ≻ 0 for k = 1, 2, ..., s, i_k = 1, 2, ..., m_k. Here we assume that n_1 ≤ n_2 ≤ ... ≤ n_s.
Here f(x^1, x^2, ..., x^s) is a mixed form whose degree in the block variable x^k ∈ R^{n_k} is d_k, with total degree d = d_1 + d_2 + ... + d_s. Both (TS) in Section 3.2 and (HS) in Section 4.2 are special cases of (MS), and both (TQ) in Section 3.3 and (HQ) in Section 4.3 are special cases of (MQ). In particular, the bi-quadratic optimization model discussed in Ling et al. [73] is the special case of (MS) with d = 4 and d_1 = d_2 = 2.
4.4.1 Mixed Form with Spherical Constraints
Let us study the optimization model (MS). First, we have the following hardness result.

Proposition 4.4.1 If d = 3, then (MS) is NP-hard.

Proof. We need to verify the NP-hardness in the three possible cases of d = 3: (i) d_1 = 3; (ii) d_1 = 2 and d_2 = 1; and (iii) d_1 = d_2 = d_3 = 1. The case d_1 = 3 is exactly (HS) with d = 3, whose NP-hardness was proven by Nesterov [90], and the case d_1 = d_2 = d_3 = 1 is exactly (TS) with d = 3, whose NP-hardness is proven in Proposition 3.2.2.
When d_1 = 2 and d_2 = 1, consider the special case n_1 = n_2 = n with F ∈ R^{n^3} satisfying F_{ijk} = F_{jik} for all 1 ≤ i, j, k ≤ n; the proof of Proposition 3.2.2 shows that the following form of (TS) is NP-hard:

(TS) max F(x, y, z)
     s.t. x, y, z ∈ S^n.

We are going to show that the optimal value of (TS) is equal to the optimal value of its special case

(MS) max F(x, x, z)
     s.t. x, z ∈ S^n.
It is obvious that v(TS) ≥ v(MS). Now choose any optimal solution (x*, y*, z*) of (TS) and compute the matrix M = F(·, ·, z*). Since M is symmetric, we can compute in polynomial-time an eigenvector x̂ corresponding to the eigenvalue λ of the largest absolute value (which is also the largest singular value). Observe that

|F(x̂, x̂, z*)| = |x̂^T M x̂| = |λ| = max_{x,y∈S^n} x^T M y = max_{x,y∈S^n} F(x, y, z*) = F(x*, y*, z*) = v(TS),

which implies that either (x̂, x̂, z*) or (x̂, x̂, −z*) is an optimal solution of (TS). Therefore v(TS) ≤ v(MS), and this claims v(TS) = v(MS). If (MS) could be solved in polynomial-time, then its optimal solution would also be an optimal solution of (TS); i.e., (TS) would be solvable in polynomial-time, leading to a contradiction. □
Now, we focus on polynomial-time approximation algorithms as before. Similar to the relaxation in Section 4.2 for handling homogeneous polynomial optimization, if we relax (MS) to its multilinear form relaxation (M̄S), which is of the type (TS), then by Theorem 3.2.4 we are able to find (x̂^1, x̂^2, ..., x̂^d) with ‖x̂^k‖ = 1 for all 1 ≤ k ≤ d in polynomial-time, such that

F(x̂^1, x̂^2, ..., x̂^d) ≥ τ(M̄S) v(MS),   (4.6)

where

τ(M̄S) := (∏_{k=1}^{s−1} n_k^{d_k} / n_{s−1})^{−1/2}   if d_s = 1;
τ(M̄S) := (∏_{k=1}^{s} n_k^{d_k} / n_s²)^{−1/2}   if d_s ≥ 2.
In order to draw a feasible solution of (MS) from (x̂^1, x̂^2, ..., x̂^d), we need to apply the link identity in Lemma 4.2.1 more carefully. Approximation results comparable to those for (HS) can be derived similarly.
Theorem 4.4.2 If d ≥ 3 and one of the d_k (k = 1, 2, ..., s) is odd, then (MS) admits a polynomial-time approximation algorithm with approximation ratio τ(MS), where

τ(MS) := τ(M̄S) ∏_{1≤k≤s, d_k≥3} d_k! d_k^{−d_k} = Ω(τ(M̄S))
= ∏_{1≤k≤s, d_k≥3} d_k! d_k^{−d_k} (∏_{k=1}^{s−1} n_k^{d_k} / n_{s−1})^{−1/2}   if d_s = 1;
= ∏_{1≤k≤s, d_k≥3} d_k! d_k^{−d_k} (∏_{k=1}^{s} n_k^{d_k} / n_s²)^{−1/2}   if d_s ≥ 2.
To avoid messy notation and to better convey the main ideas of the algorithm and the proof, here we only consider a special case of (MS), whose treatment is easily extended to the general (MS):

(EMS) max F(x, x, x, x, y, y, z, z, z)
      s.t. x ∈ S^{n_1}, y ∈ S^{n_2}, z ∈ S^{n_3}.

By (4.6), we are able to find x̂^1, x̂^2, x̂^3, x̂^4 ∈ S^{n_1}, ŷ^1, ŷ^2 ∈ S^{n_2}, and ẑ^1, ẑ^2, ẑ^3 ∈ S^{n_3} in polynomial-time, such that

F(x̂^1, x̂^2, x̂^3, x̂^4, ŷ^1, ŷ^2, ẑ^1, ẑ^2, ẑ^3) ≥ τ(M̄S) v(EMS).
Let us first fix (ŷ^1, ŷ^2, ẑ^1, ẑ^2, ẑ^3) and work on the problem

max F(x, x, x, x, ŷ^1, ŷ^2, ẑ^1, ẑ^2, ẑ^3)
s.t. x ∈ S^{n_1}.

Using the same argument as in the proof of Theorem 4.2.2, we are able to find x̄ ∈ S^{n_1}, such that either F(x̄, x̄, x̄, x̄, ŷ^1, ŷ^2, ẑ^1, ẑ^2, ẑ^3) or F(x̄, x̄, x̄, x̄, ŷ^1, ŷ^2, −ẑ^1, ẑ^2, ẑ^3) is no less than 4! 4^{−4} F(x̂^1, x̂^2, x̂^3, x̂^4, ŷ^1, ŷ^2, ẑ^1, ẑ^2, ẑ^3); in the latter case, we use −ẑ^1 to update ẑ^1. Here the even degree (d_1 = 4) of x causes no trouble, as we can always move the negative sign into ẑ^1. We call this process an adjustment of the variable x. Up to this point the approximation bound is 4! 4^{−4} τ(M̄S).
Next we work on the adjustment of the variable y, and consider the problem

max |F(x̄, x̄, x̄, x̄, y, y, ẑ^1, ẑ^2, ẑ^3)|
s.t. y ∈ S^{n_2}.

This is the problem of the largest absolute eigenvalue of a symmetric matrix, and can be solved in polynomial-time; a sketch is given below. Denote its optimal solution by ȳ, update ẑ^1 with −ẑ^1 if necessary, and we keep the approximation bound 4! 4^{−4} τ(M̄S) for the solution (x̄, x̄, x̄, x̄, ȳ, ȳ, ẑ^1, ẑ^2, ẑ^3).
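The adjustment of y amounts to a symmetric eigenvalue computation. A minimal sketch, assuming the matrix M = F(x̄, x̄, x̄, x̄, ·, ·, ẑ^1, ẑ^2, ẑ^3) has already been formed by contracting the fixed blocks:

```python
import numpy as np

def adjust_quadratic_block(M):
    # Given the symmetric matrix M obtained by fixing all other blocks,
    # max_{y in S^n} |y^T M y| is attained at a unit eigenvector of the
    # eigenvalue with the largest absolute value (which also equals the
    # largest singular value of M).
    M = (M + M.T) / 2            # enforce exact symmetry numerically
    w, V = np.linalg.eigh(M)
    k = int(np.argmax(np.abs(w)))
    return V[:, k], float(w[k])  # ybar and the signed value ybar^T M ybar
```

If the returned value is negative, the sign is moved into ẑ^1 exactly as in the adjustment of x.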
The last adjustment, of the variable z, is straightforward. Similar to the adjustment of the variable x, by focusing on

max F(x̄, x̄, x̄, x̄, ȳ, ȳ, z, z, z)
s.t. z ∈ S^{n_3},

we can find z̄ ∈ S^{n_3} in polynomial-time, such that the solution (x̄, x̄, x̄, x̄, ȳ, ȳ, z̄, z̄, z̄) admits an approximation bound 3! 3^{−3} 4! 4^{−4} τ(M̄S).

We remark here that the variable z is adjusted last, since we cannot move the negative sign to other, already adjusted variables if the degree of z is even. That is why Theorem 4.4.2 requires one of the d_k to be odd: we can always let a variable with an odd degree be adjusted last.
However, if all the d_k (k = 1, 2, ..., s) are even, we can only hope for a relative approximation ratio. Indeed, in the simplest such case, d = 4 with d_1 = d_2 = 2, the bi-quadratic optimization model max_{x∈S^{n_1}, y∈S^{n_2}} F(x, x, y, y) does not admit any polynomial-time approximation algorithm with a positive approximation ratio, by Ling et al. [73]. Before working on this even case, let us first introduce the following extension of Lemma 4.2.1.
Lemma 4.4.3 Suppose x^k ∈ R^{n_1} (1 ≤ k ≤ d_1), x^k ∈ R^{n_2} (d_1 + 1 ≤ k ≤ d_1 + d_2), ..., x^k ∈ R^{n_s} (d_1 + d_2 + ... + d_{s−1} + 1 ≤ k ≤ d_1 + d_2 + ... + d_s = d), and ξ_1, ξ_2, ..., ξ_d are i.i.d. random variables, each taking values 1 and −1 with equal probability 1/2. Denote

x^1_ξ = ∑_{k=1}^{d_1} ξ_k x^k,  x^2_ξ = ∑_{k=d_1+1}^{d_1+d_2} ξ_k x^k,  ...,  x^s_ξ = ∑_{k=d_1+d_2+...+d_{s−1}+1}^{d} ξ_k x^k.   (4.7)

For any partial-symmetric d-th order tensor F ∈ R^{n_1^{d_1} × n_2^{d_2} × ... × n_s^{d_s}} and function

f(x^1, x^2, ..., x^s) = F(x^1, ..., x^1, x^2, ..., x^2, ..., x^s, ..., x^s)   (with d_k copies of x^k for k = 1, 2, ..., s),

it holds that

E[∏_{i=1}^d ξ_i f(x^1_ξ, x^2_ξ, ..., x^s_ξ)] = ∏_{k=1}^s d_k! · F(x^1, x^2, ..., x^d).

This lemma is easy to prove by applying Lemma 4.2.1 s times, once for each block of variables.
Theorem 4.4.4 If d ≥ 4 and all the d_k (k = 1, 2, ..., s) are even, then (MS) admits a polynomial-time approximation algorithm with relative approximation ratio τ(MS), where

τ(MS) := τ(M̄S) ∏_{k=1}^s d_k! d_k^{−d_k} = Ω(τ(M̄S))
= (∏_{k=1}^s d_k! d_k^{−d_k}) (∏_{k=1}^{s−1} n_k^{d_k} / n_{s−1})^{−1/2}   if d_s = 1;
= (∏_{k=1}^s d_k! d_k^{−d_k}) (∏_{k=1}^{s} n_k^{d_k} / n_s²)^{−1/2}   if d_s ≥ 2.
Proof. Denote by H ∈ R^{n_1^{d_1} × n_2^{d_2} × ... × n_s^{d_s}} the partial-symmetric d-th order tensor associated with the mixed form

h(x^1, x^2, ..., x^s) = ∏_{k=1}^s ‖x^k‖^{d_k} = ∏_{k=1}^s ((x^k)^T x^k)^{d_k/2}.

Choose any fixed x̄^k ∈ S^{n_k} for k = 1, 2, ..., s, and consider the following problem:

(M̃S) max F(x^1, x^2, ..., x^d) − f(x̄^1, x̄^2, ..., x̄^s) H(x^1, x^2, ..., x^d)
     s.t. ‖x^k‖ = 1, k = 1, 2, ..., d.

Applying Theorem 3.2.4 we obtain a solution (x̂^1, x̂^2, ..., x̂^d) with ‖x̂^k‖ = 1 for k = 1, 2, ..., d in polynomial-time, such that

F(x̂^1, ..., x̂^d) − f(x̄^1, x̄^2, ..., x̄^s) H(x̂^1, ..., x̂^d) ≥ τ(M̄S) v(M̃S).
Let us start with the case that

f(x̄^1, x̄^2, ..., x̄^s) − v̲(MS) ≤ (τ(M̄S)/4) (v(MS) − v̲(MS)).   (4.8)

Noticing that |H(x̂^1, ..., x̂^d)| ≤ 1, we have

F(x̂^1, ..., x̂^d) − v̲(MS) H(x̂^1, ..., x̂^d)
= F(x̂^1, ..., x̂^d) − f(x̄^1, ..., x̄^s) H(x̂^1, ..., x̂^d) + (f(x̄^1, ..., x̄^s) − v̲(MS)) H(x̂^1, ..., x̂^d)
≥ τ(M̄S) v(M̃S) − (f(x̄^1, ..., x̄^s) − v̲(MS))
≥ τ(M̄S) (v(MS) − f(x̄^1, ..., x̄^s)) − (τ(M̄S)/4) (v(MS) − v̲(MS))
≥ (τ(M̄S)(1 − τ(M̄S)/4) − τ(M̄S)/4) (v(MS) − v̲(MS))
≥ (τ(M̄S)/2) (v(MS) − v̲(MS)),

where the second inequality is due to the fact that the optimal solution of (MS) is feasible for (M̃S).
On the other hand, using the notation of (4.7) and applying Lemma 4.4.3, we have

∏_{k=1}^s d_k! (F(x̂^1, ..., x̂^d) − v̲(MS) H(x̂^1, ..., x̂^d))
= E[∏_{i=1}^d ξ_i (f(x̂^1_ξ, x̂^2_ξ, ..., x̂^s_ξ) − v̲(MS) h(x̂^1_ξ, x̂^2_ξ, ..., x̂^s_ξ))]
= E[f(x̂^1_ξ, ..., x̂^s_ξ) − v̲(MS) ∏_{k=1}^s ‖x̂^k_ξ‖^{d_k} | ∏_{i=1}^d ξ_i = 1] Prob{∏_{i=1}^d ξ_i = 1}
  − E[f(x̂^1_ξ, ..., x̂^s_ξ) − v̲(MS) ∏_{k=1}^s ‖x̂^k_ξ‖^{d_k} | ∏_{i=1}^d ξ_i = −1] Prob{∏_{i=1}^d ξ_i = −1}
≤ (1/2) E[f(x̂^1_ξ, ..., x̂^s_ξ) − v̲(MS) ∏_{k=1}^s ‖x̂^k_ξ‖^{d_k} | ∏_{i=1}^d ξ_i = 1],

where the last inequality is due to f(x̂^1_ξ, ..., x̂^s_ξ) − v̲(MS) ∏_{k=1}^s ‖x̂^k_ξ‖^{d_k} ≥ 0, since (x̂^1_ξ/‖x̂^1_ξ‖, x̂^2_ξ/‖x̂^2_ξ‖, ..., x̂^s_ξ/‖x̂^s_ξ‖) is feasible for (MS).
Thus, there is a binary vector β ∈ B^d with ∏_{i=1}^d β_i = 1, such that

(1/2) (f(x̂^1_β, x̂^2_β, ..., x̂^s_β) − v̲(MS) ∏_{k=1}^s ‖x̂^k_β‖^{d_k}) ≥ ∏_{k=1}^s d_k! (τ(M̄S)/2) (v(MS) − v̲(MS)).

By letting x̃^k = x̂^k_β / ‖x̂^k_β‖ for k = 1, 2, ..., s, and noticing ‖x̂^k_β‖ ≤ d_k, we have

f(x̃^1, x̃^2, ..., x̃^s) − v̲(MS) ≥ τ(M̄S) ∏_{k=1}^s d_k! ‖x̂^k_β‖^{−d_k} (v(MS) − v̲(MS)) ≥ τ(MS) (v(MS) − v̲(MS)).

Recall that the above inequality is derived under the condition that (4.8) holds. In case (4.8) does not hold, then we have

f(x̄^1, x̄^2, ..., x̄^s) − v̲(MS) > (τ(M̄S)/4) (v(MS) − v̲(MS)) ≥ τ(MS) (v(MS) − v̲(MS)).

By picking (x^1, x^2, ..., x^s) = arg max{ f(x̃^1, x̃^2, ..., x̃^s), f(x̄^1, x̄^2, ..., x̄^s) }, we uniformly have f(x^1, x^2, ..., x^s) − v̲(MS) ≥ τ(MS) (v(MS) − v̲(MS)). □
4.4.2 Mixed Form with Ellipsoidal Constraints
Finally, let us discuss the most general model (MQ) of homogeneous polynomial optimization with quadratic constraints. We have results similar to those for (MS) in Section 4.4.1.
Theorem 4.4.5 If d ≥ 3 and one of dk (k = 1, 2, . . . , s) is odd, then (MQ) admits a
polynomial-time randomized approximation algorithm with approximation ratio τ(MQ),
where
τ(MQ) := τ(M̄S) Ω(log^{−(d−1)} m) ∏_{k=1}^s d_k! d_k^{−d_k} = Ω(τ(M̄S) log^{−(d−1)} m)
= (∏_{k=1}^s d_k! d_k^{−d_k}) (∏_{k=1}^{s−1} n_k^{d_k} / n_{s−1})^{−1/2} Ω(log^{−(d−1)} m)   if d_s = 1;
= (∏_{k=1}^s d_k! d_k^{−d_k}) (∏_{k=1}^{s} n_k^{d_k} / n_s²)^{−1/2} Ω(log^{−(d−1)} m)   if d_s ≥ 2,

and m = max_{1≤k≤s} m_k.
The proof of Theorem 4.4.5 is very similar to that of Theorem 4.4.2, where a typical example is illustrated; here we only highlight the main ideas. First we relax (MQ) to a multilinear form optimization of the type (TQ), and find a feasible solution for it with approximation ratio τ(M̄S) Ω(log^{−(d−1)} m). Then, Lemma 4.4.3 serves as a bridge from that solution to a feasible solution for (MQ). Specifically, we may adjust the blocks of the solution one by one; during each adjustment, we apply Lemma 4.2.1 once, with the approximation ratio deteriorating by no more than a factor of d_k! d_k^{−d_k}. After s adjustments, we are able to get a feasible solution for (MQ) with performance ratio τ(MQ). Besides, the feasibility of the solution so obtained is guaranteed by (4.5).
Theorem 4.4.6 If d ≥ 4 and all dk (k = 1, 2, . . . , s) are even, then (MQ) admits a
polynomial-time randomized approximation algorithm with relative approximation ratio
τ(MQ).
Proof. The proof is analogous to that of Theorem 4.3.2. The main differences are: (i) we use Lemma 4.4.3 instead of invoking Lemma 4.2.1 directly; and (ii) we use f((1/d_1) x^1_ξ, (1/d_2) x^2_ξ, ..., (1/d_s) x^s_ξ) instead of f((1/d) ∑_{k=1}^d ξ_k x̂^k) during the randomization process. □
4.5 Applications
To better appreciate the homogeneous polynomial optimization models presented in this chapter, in this section we present a few examples arising from various applications. In particular, we shall discuss applications of the models (HS) and (MS).
4.5.1 Eigenvalues and Approximation of Tensors
Similar to the eigenvalues of matrices, the concept of eigenvalues has been extended to higher order tensors (see e.g., [98, 99, 100, 91]); in fact, the concept becomes richer for tensors than for matrices. Qi [98] proposed several definitions of tensor eigenvalues, among which the most popular and straightforward one is the Z-eigenvalue. For a given d-th order super-symmetric tensor F ∈ R^{n^d}, a Z-eigenvalue λ ∈ R with its corresponding eigenvector x ∈ R^n is defined as a solution of the following system:

F(x, x, ..., x, ·) = λx   (with d − 1 copies of x),
x^T x = 1.
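In computational terms, a Z-eigenpair can be verified directly from the defining system. A minimal sketch, assuming F is stored as a super-symmetric numpy array:

```python
import numpy as np

def tensor_apply(F, x):
    # The vector F(x, ..., x, .), contracting d-1 of the d modes with x;
    # for a super-symmetric F it does not matter which modes are contracted.
    T = F
    for _ in range(F.ndim - 1):
        T = T @ x
    return T

def is_z_eigenpair(F, lam, x, tol=1e-8):
    # Check the defining system F(x, ..., x, .) = lam * x and x^T x = 1.
    return (np.linalg.norm(tensor_apply(F, x) - lam * x) <= tol
            and abs(x @ x - 1.0) <= tol)
```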
Notice that the Z-eigenvalues are the usual eigenvalues of a symmetric matrix when the order of the tensor is 2. It was proven in Qi [98] that Z-eigenvalues exist for an even order real super-symmetric tensor F, and that F is positive definite if and only if all of its Z-eigenvalues are positive, in analogy with the matrix case. Thus, the smallest Z-eigenvalue of an even order super-symmetric tensor F is an important indicator of the positive definiteness of F. Conversely, the largest Z-eigenvalue can serve as an indicator of negative definiteness, and computing it is exactly the model (HS). In general, the optimal value and any optimal solution of (HS) are the largest Z-eigenvalue and a corresponding eigenvector of the tensor F, no matter whether d is even or odd. By Theorem 4.2.2, the largest Z-eigenvalue of an odd order super-symmetric tensor F can be approximated within a factor of d! d^{−d} n^{−(d−2)/2}. For an even order tensor, this approximation ratio is in the relative sense; however, if we know in advance that the given even order tensor is positive semidefinite, we also have an approximation factor of d! d^{−d} n^{−(d−2)/2} for its largest Z-eigenvalue.
Regarding tensor approximation, in Section 3.4.2 we discussed the best rank-one decomposition of a tensor. In case the given tensor F ∈ R^{n^d} is super-symmetric, the corresponding best rank-one approximation problem is

min ‖F − x ⊗ x ⊗ ... ⊗ x‖   (with d copies of x)
s.t. x ∈ R^n.

Applying the same technique discussed in Section 3.4.2, we can equivalently reformulate the above problem as

max F(x, x, ..., x)
s.t. x ∈ S^n,

which is identical to the largest Z-eigenvalue problem and to (HS). In fact, when d is odd, if we denote its optimal solution (the largest Z-eigenvector) by x̂ and its optimal value (the largest Z-eigenvalue) by λ = F(x̂, x̂, ..., x̂), then the best rank-one approximation of the super-symmetric tensor F is λ x̂ ⊗ x̂ ⊗ ... ⊗ x̂ (with d copies of x̂).
4.5.2 Density Approximation in Quantum Physics
An interesting problem in physics is to give a precise characterization of entanglement in a quantum system. Entanglement describes the types of correlations between subsystems of the full quantum system that go beyond the statistical correlations found in a classical composite system. Specifically, it gives rise to a matrix approximation problem; the following formulation is proposed in Dahl et al. [27].
Denote by ∆^n_+ the set of all n×n positive semidefinite matrices with trace 1, i.e., ∆^n_+ := {A ∈ R^{n×n} | A ⪰ 0, tr(A) = 1} (sometimes also called the matrix simplex). Using the matrix decomposition method (see e.g., Sturm and Zhang [113]), it is not hard to verify that the extreme points of ∆^n_+ are exactly the rank-one matrices; specifically, ∆^n_+ = conv{xx^T | x ∈ S^n}. If n = n_1 n_2, where n_1 and n_2 are two given positive integers, then we call a matrix A ∈ ∆^n_+ separable if A can be written as a convex combination

A = ∑_{i=1}^m λ_i B_i ⊗ C_i

for some positive integer m, matrices B_i ∈ ∆^{n_1}_+ and C_i ∈ ∆^{n_2}_+ for i = 1, 2, ..., m, and nonnegative scalars λ_i (i = 1, 2, ..., m) with ∑_{i=1}^m λ_i = 1. For given n_1 and n_2, denote by ∆^{n,⊗}_+ the set of all separable matrices of order n = n_1 n_2. The density approximation problem is the following: given a density matrix A ∈ ∆^n_+, find a separable density matrix X ∈ ∆^{n,⊗}_+ which is closest to A, i.e., solve the minimization model

(DA) min ‖X − A‖
     s.t. X ∈ ∆^{n,⊗}_+.

This projection problem is in general NP-hard, the main difficulty lying in the structure of ∆^{n,⊗}_+. An important property of ∆^{n,⊗}_+ is that all its extreme points are symmetric rank-one matrices (x⊗y)(x⊗y)^T with x ∈ S^{n_1} and y ∈ S^{n_2} (see the proof of Theorem 2.2 in [27]), i.e.,

∆^{n,⊗}_+ = conv{ (x⊗y)(x⊗y)^T | x ∈ S^{n_1}, y ∈ S^{n_2} }.
Instead, we may then turn to the projection subproblem of (DA), namely to find the projection of A onto the extreme points of ∆^{n,⊗}_+:

min ‖(x⊗y)(x⊗y)^T − A‖
s.t. x ∈ S^{n_1}, y ∈ S^{n_2}.

Straightforward computation shows that

‖(x⊗y)(x⊗y)^T − A‖² = 1 − 2 A • (x⊗y)(x⊗y)^T + ‖A‖².

Therefore the projection subproblem is equivalent to

max A • (x⊗y)(x⊗y)^T
s.t. x ∈ S^{n_1}, y ∈ S^{n_2},

which is exactly the model (MS) with d = 4 and d_1 = d_2 = 2.
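The identity above follows from ‖(x⊗y)(x⊗y)^T‖ = 1 and tr(A) = 1, and is elementary to verify numerically. A small sketch (illustrative data only):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 3, 4

x = rng.standard_normal(n1); x /= np.linalg.norm(x)   # x in S^{n1}
y = rng.standard_normal(n2); y /= np.linalg.norm(y)   # y in S^{n2}

B = rng.standard_normal((n1 * n2, n1 * n2))
A = B @ B.T
A /= np.trace(A)            # a density matrix: A >= 0, tr(A) = 1

v = np.kron(x, y)           # the unit vector x (tensor) y
X = np.outer(v, v)          # extreme point (x(x)y)(x(x)y)^T of the separable set

lhs = np.linalg.norm(X - A, 'fro') ** 2
rhs = 1 - 2 * np.sum(A * X) + np.linalg.norm(A, 'fro') ** 2
print(lhs, rhs)             # the two values agree
```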
4.6 Numerical Experiments
In this section we present the numerical performance of the approximation algorithms proposed in this chapter. In particular, the model (HQ) with d = 4 is tested, i.e.,

(EHQ) max f(x) = ∑_{1≤i,j,k,ℓ≤n} F_{ijkℓ} x_i x_j x_k x_ℓ
      s.t. x^T Q_i x ≤ 1, i = 1, 2, ..., m,
           x ∈ R^n,
Table 4.1: Numerical results of (EHQ) for n = 10 and m = 30

Instance        1      2      3      4      5      6      7      8      9      10
100·v           0.65   0.77   0.32   0.27   0.73   0.42   0.52   0.64   0.98   1.04
100·v̄          4.96   4.53   4.75   5.05   5.86   5.32   5.00   5.19   5.07   5.92
τ (%)          13.10  17.00   6.74   5.35  12.46   7.89  10.40  12.33  19.33  17.57
n ln³m · τ     51.56  66.88  26.51  21.04  49.01  31.06  40.92  48.52  76.05  69.12
where the fourth order tensor F is super-symmetric and the matrices Q_i are positive semidefinite for i = 1, 2, ..., m. In the tests, cvx v1.2 (Grant and Boyd [41]) is called for solving the SDP problems whenever applicable.
4.6.1 Randomly Simulated Data
For the data of (EHQ), a fourth order tensor F′ is randomly generated, whose n⁴ entries follow i.i.d. standard normal distributions. We then symmetrize F′ to form a super-symmetric tensor F by averaging the related entries. As to the constraints, we generate m matrices Q′_i ∈ R^{n×n} (i = 1, 2, ..., m) independently, whose entries also follow i.i.d. standard normal distributions, and then let Q_i = (Q′_i)^T Q′_i for i = 1, 2, ..., m.
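The instance generation just described can be sketched as follows (an illustrative reconstruction, not the original test code):

```python
import itertools, math
import numpy as np

rng = np.random.default_rng(2)
n, m, d = 10, 30, 4

# Random fourth order tensor, symmetrized by averaging over all index
# permutations to make it super-symmetric.
Fp = rng.standard_normal((n,) * d)
F = sum(np.transpose(Fp, p) for p in itertools.permutations(range(d)))
F /= math.factorial(d)

# Constraint matrices Q_i = (Q'_i)^T Q'_i, positive semidefinite by construction.
Qs = []
for _ in range(m):
    Qp = rng.standard_normal((n, n))
    Qs.append(Qp.T @ Qp)
```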
Given the particular structure of (EHQ), rather than directly applying Algorithm 4.3.1 to solve it, we use a simplified method. First (EHQ) is relaxed to

max F(X, X) = ∑_{1≤i,j,k,ℓ≤n} F_{ijkℓ} X_{ij} X_{kℓ}
s.t. tr(Q_i X Q_j X^T) ≤ 1, i = 1, 2, ..., m, j = 1, 2, ..., m,
     X ∈ R^{n×n},

which is a standard quadratic program, and can be solved approximately by SDP relaxation and randomization (see e.g., [75] or Section 2.5). The optimal value of the SDP relaxation problem is denoted by v̄, which we use as an upper bound of v(EHQ). We then apply DR 3.3.1 to decompose this approximate solution into x̂, ŷ ∈ R^n. Finally we pick the vector with the best objective value of f from {0, x̂, ŷ, (x̂ + ŷ)/2, (x̂ − ŷ)/2} as the output. This objective value is denoted by v, and the ratio τ := v/v̄ is also computed.
By following essentially the same proof, this simplified method also enjoys a worst-case relative performance ratio of Ω(1/(n log³ m)), similar to what Theorem 4.3.2 asserts. For n = 10 and m = 30, we randomly generate 10 instances of (EHQ); the solution results are shown in Table 4.1. In Table 4.2, the absolute approximation ratios for various n and m are shown. Remark that the dimensions of the problems that can be efficiently solved using our algorithms are not large, due to the limitation of solving large SDP relaxation problems.

Table 4.2: Numerical ratios (average of 10 instances) of (EHQ)

n                2      5      8     10     12
τ (%), m = 1    90.2   57.9   73.3   66.2   60.0
τ (%), m = 5    65.6   28.3   22.5   29.1   17.1
τ (%), m = 10   60.4   22.3   14.6   16.0    8.9
τ (%), m = 30   59.4   17.8   10.2   12.2    9.2
4.6.2 Comparison with Sum of Squares Method
In this subsection, we compare our solution method with the so-called sum of squares (SOS) method [70, 71] for solving (EHQ). Due to the limitations of current SDP solvers, our method works only for small size problems. Since the SOS approach works quite efficiently for small size polynomial optimization problems, it is interesting to know how the SOS method would perform on these randomly generated instances of (EHQ). In particular, we use GloptiPoly 3 of Henrion et al. [53].
We randomly generated 10 instances of (EHQ). By using the first SDP relaxation (Lasserre's procedure [70]), GloptiPoly 3 found global optimal solutions for 4 instances, and obtained upper bounds on the optimal values for the other 6 instances. In the latter case, however, no feasible solutions are generated, while our algorithm always finds feasible solutions with a guaranteed approximation ratio; in this sense the two approaches are complementary to each other. Moreover, GloptiPoly 3 always yields a better upper bound than v̄ for our test instances, which helps to yield better approximation ratios. The average ratio is 0.112 using the upper bound v̄, and 0.262 using the upper bound produced by GloptiPoly 3 (see Table 4.3).
Table 4.3: Numerical results of (EHQ) compared with SOS for n = 12 and m = 30

Instance               1      2      3      4      5      6      7      8      9      10
100·v                  0.30   0.76   0.43   0.76   0.70   0.49   0.81   0.34   0.29   0.62
100·v̄                 4.75   4.47   5.21   5.20   4.59   4.81   5.23   5.12   5.89   4.78
100·v_SOS              2.05   2.02   2.43   2.41   1.86   2.02   1.99   2.24   2.83   1.88
Optimality of v_SOS    No     No     Yes    Yes    No     Yes    No     Yes    No     No
v/v̄ (%)               6.32  17.00   8.25  14.62  15.25  10.19  15.49   6.64   4.92  12.97
v/v_SOS (%)           14.63  37.62  17.70  31.54  37.63  24.26  40.70  15.18  10.25  32.98

To conclude this section as well as this chapter, we remark that the proposed algorithms are actually practical, and they produce high quality solutions. The worst-case performance analysis offers a theoretical 'safety net', which is usually far from the typical performance. Moreover, it is of course possible to further improve the solutions by some local search procedure, e.g., the projection gradient method [22] or the maximum block improvement method [25].
Chapter 5
Polynomial Optimization with
Convex Constraints
5.1 Introduction
This chapter tackles an important and useful extension of the models studied in previous chapters: allowing the objective function to be a generic inhomogeneous polynomial function. As is evident, many important applications of polynomial optimization involve an objective that is intrinsically inhomogeneous. Specifically, we consider the following problems:
(PS) max p(x)
     s.t. x ∈ S̄^n;

(PQ) max p(x)
     s.t. x^T Q_i x ≤ 1, i = 1, 2, ..., m,
          x ∈ R^n,

where S̄^n := {x ∈ R^n | ‖x‖ ≤ 1} denotes the unit Euclidean ball, Q_i ⪰ 0 for i = 1, 2, ..., m, and ∑_{i=1}^m Q_i ≻ 0. It is obvious that (PQ) is an extension of (PS). We also consider in this chapter a much more general framework of polynomial optimization over a general convex compact set; i.e., for a given convex compact set G ⊂ R^n, the problem

(PG) max p(x)
     s.t. x ∈ G.
The model (PS) can be solved in polynomial-time when d ≤ 2, and becomes NP-hard when d ≥ 3. Even worse, for d ≥ 3 there is no polynomial-time approximation algorithm with a positive approximation ratio unless P = NP, as we shall argue later. Therefore, this whole chapter focuses on relative approximation algorithms. The inapproximability of (PS) differs greatly from that of the homogeneous model (HS) discussed in Section 4.2, since when d is odd, (HS) admits a polynomial-time approximation algorithm with a positive approximation ratio by Theorem 4.2.2. Consequently, the optimization of an inhomogeneous polynomial is much harder than that of a homogeneous one. The complexity of (PQ) and (PG) is similar: they are solvable in polynomial-time only when d = 1 and are NP-hard when d ≥ 2. This is because (PG) generalizes (PQ), and (PQ) generalizes (HQ), an NP-hard problem when d ≥ 2 (see the discussion in Section 4.1).
Extending the solution methods and the corresponding analysis from homogeneous polynomial optimization to general inhomogeneous polynomials is not straightforward. As a matter of fact, so far all the successful approximation algorithms with provable approximation ratios in the literature, e.g., the quadratic models considered in [88, 87, 120, 75, 50] and the quartic models considered in [73, 77], depend on homogeneity in a crucial way. Technically, a homogeneous polynomial allows one to scale the overall function value along a given direction, which is an essential operation in proving the quality bound of approximation algorithms. The current chapter breaks its path from the preceding practices by directly dealing with a homogenizing variable. Although homogenization is a natural way to deal with inhomogeneous polynomial functions, it is quite a different matter when it comes to worst-case performance ratio analysis. In fact, the usual homogenization does not lead to any assured performance ratio. In this chapter we point out a specific route around this difficulty, which in fact provides a general scheme to approximately solve such problems via homogenization.
In Section 5.2, we start by analyzing the model where the constraint set is the Euclidean ball, i.e., the model (PS). We propose polynomial-time approximation algorithms with guaranteed relative approximation ratios, which serve as a basis for the subsequent analysis. In Section 5.3, the discussion is extended to the problem where the constraint set is the intersection of a finite number of co-centered ellipsoids, i.e., the model (PQ), and relative approximation algorithms are proposed as well. In Section 5.4, approximation bounds are derived even for some very general optimization models (PG), e.g., the optimization of a polynomial over a polytope. It turns out that for such general problems it is still possible to derive relative approximation ratios, which depend on the problem dimensions only; the tool we use is the Löwner-John ellipsoid. In Section 5.5, we discuss some applications of the models presented in this chapter. Finally, we report our numerical experiment results in Section 5.6. As this chapter is concerned with relative approximation ratios, we may without loss of generality assume the polynomial function p(x) to have no constant term, i.e., p(0) = 0.
5.2 Polynomial with Ball Constraint
Our first model in this chapter is to maximize a generic multivariate polynomial function subject to the Euclidean ball constraint, i.e.,

(PS) max p(x)
     s.t. x ∈ S̄^n.

Since we assume p(x) to have no constant term, and 0 is a feasible solution, the optimal value of this problem is obviously nonnegative, i.e., v(PS) ≥ 0.
The complexity of solving (PS) can be summarized by the following proposition.

Proposition 5.2.1 If d ≤ 2, then (PS) can be solved in polynomial-time; otherwise, if d ≥ 3, then (PS) is NP-hard, and there is no polynomial-time approximation algorithm with a positive approximation ratio unless P = NP.

Proof. For d ≤ 2, (PS) is a standard trust region subproblem, which is well known to be solvable in polynomial-time (see e.g., [113, 114] and the references therein). For d ≥ 3, in the special case where p(x) is a homogeneous cubic form, it is easy to see that (PS) is equivalent to max_{x∈S^n} p(x), which is shown to be NP-hard by Nesterov [90].
Let us now consider a special class of (PS) when d = 3:

v(α) = max f(x) − α‖x‖²
       s.t. x ∈ S̄^n,

where α ≥ 0, and f(x) is a homogeneous cubic form associated with a nonzero super-symmetric tensor F ∈ R^{n×n×n}. If v(α) > 0, then its optimal solution x* satisfies

f(x*) − α‖x*‖² = ‖x*‖³ f(x*/‖x*‖) − α‖x*‖² = ‖x*‖² (‖x*‖ f(x*/‖x*‖) − α) > 0.

Thus by the optimality of x*, we have ‖x*‖ = 1. If we choose α = ‖F‖ ≥ max_{x∈S^n} f(x), then v(α) = 0; since otherwise we would have v(α) > 0 and ‖x*‖ = 1, with

v(α) = f(x*) − α‖x*‖² ≤ max_{x∈S^n} f(x) − α ≤ 0,

which is a contradiction. Moreover, v(0) > 0 simply because F is a nonzero tensor, and it is also easy to see that v(α) is non-increasing as α ≥ 0 increases. Hence, there is a threshold α₀ ∈ [0, ‖F‖], such that v(α) > 0 if 0 ≤ α < α₀, and v(α) = 0 if α ≥ α₀.
Suppose there exists a polynomial-time approximation algorithm with a positive approximation ratio τ for (PS) when d ≥ 3. Then for every α ≥ 0, we can find z ∈ S̄^n in polynomial-time, such that g(α) := f(z) − α‖z‖² ≥ τ v(α). It is obvious that g(α) ≥ 0 since v(α) ≥ 0. Together with the fact that g(α) ≤ v(α), we have that g(α) > 0 if and only if v(α) > 0, and g(α) = 0 if and only if v(α) = 0. Therefore, the threshold α₀ also satisfies g(α) > 0 if 0 ≤ α < α₀, and g(α) = 0 if α ≥ α₀. By applying bisection search over the interval [0, ‖F‖] with this polynomial-time approximation algorithm, we can find α₀ and z ∈ S^n in polynomial-time, such that f(z) − α₀‖z‖² = 0. This implies that z is an optimal solution of the problem max_{x∈S^n} f(x) with optimal value α₀, an NP-hard problem as mentioned at the beginning of the proof. Therefore, such an approximation algorithm cannot exist unless P = NP. □
The negative result in Proposition 5.2.1 rules out any polynomial-time approximation algorithm with a positive approximation ratio for (PS). However, a positive relative approximation ratio is still possible, and this is the main subject of this section. Below we shall first present a polynomial-time algorithm for approximately solving (PS), which admits a (relative) worst-case performance ratio. In fact, we present here a general scheme aimed at solving the polynomial optimization (PS). This scheme breaks down into the following four major steps:

1. Introduce an equivalent model with the objective being a homogeneous form;
2. Solve a relaxed model with the objective being a multilinear form;
3. Adjust to get a solution based on the solution of the relaxed model;
4. Assemble a solution for the original inhomogeneous model.

Some of these steps can be designed separately. The algorithm below is one realization of the general scheme for solving (PS), with each step carried out by a specific procedure. We first present the specialized algorithm, and then, in the remainder of the section, we elaborate on these four general steps, and prove that in combination they lead to a polynomial-time approximation algorithm with a quality-assured solution.
Algorithm 5.2.1

• INPUT: an n-dimensional d-th degree polynomial function p(x).

1 Rewrite p(x) − p(0) = F(x̄, x̄, ..., x̄) (d copies of x̄ = (x; x_h)) when x_h = 1 as in (5.2), with F being an (n+1)-dimensional d-th order super-symmetric tensor.

2 Apply Algorithm 3.2.3 to solve the problem
    max F(x̄^1, x̄^2, ..., x̄^d)
    s.t. x̄^k ∈ S^{n+1}, k = 1, 2, ..., d
  approximately, with input F and output (ȳ^1, ȳ^2, ..., ȳ^d), where ȳ^k = (y^k; y^k_h).

3 Compute (z̄^1, z̄^2, ..., z̄^d) = arg max{ F((ξ_1 y^1/d; 1), (ξ_2 y^2/d; 1), ..., (ξ_d y^d/d; 1)) : ξ ∈ B^d }, where (v; 1) denotes the (n+1)-dimensional vector whose first n components form v and whose last component is 1.

4 Compute z = arg max{ p(0); p(z(β)/z_h(β)), β ∈ B^d and β_1 = ∏_{k=2}^d β_k = 1 }, with z̄(β) := β_1(d+1)z̄^1 + ∑_{k=2}^d β_k z̄^k = (z(β); z_h(β)).

• OUTPUT: a feasible solution z ∈ S̄^n.
In Step 2 of Algorithm 5.2.1, Algorithm 3.2.3, which is a deterministic polynomial-time algorithm, is called to approximately solve the spherically constrained multilinear form optimization problem. Notice that the degree of the polynomial p(x) is deemed a fixed parameter in this thesis; thus Algorithm 5.2.1 runs in polynomial-time, and is deterministic too. Our main result in this section is the following:
Theorem 5.2.2 (PS) admits a polynomial-time approximation algorithm with relative
approximation ratio τ(PS), where
τ(PS) := 2^{−5d/2} (d+1)! d^{−2d} (n+1)^{−(d−2)/2} = Ω(n^{−(d−2)/2}).
Although homogenization is a natural way to deal with inhomogeneous polynomials, the worst-case performance ratio does not follow straightforwardly. What is lacking is that an inhomogeneous polynomial does not allow one to scale the overall function value along a given direction, which is, however, an essential operation in proving the quality bound of approximation algorithms (see e.g., [87, 75, 50, 77]). Below we study in detail how a particular implementation of the four steps of the scheme (which becomes Algorithm 5.2.1) leads to the promised worst-case relative performance ratio in Theorem 5.2.2. As we shall see later, our solution scheme can be applied to solve the very general polynomial optimization model (PG).
5.2.1 Homogenization
The method of homogenization depends on the form of the polynomial p(x). In this discussion we assume p(x) to have no constant term, although Algorithm 5.2.1 applies to any polynomial. If p(x) is given as a summation of homogeneous polynomial functions of different degrees, i.e., p(x) = ∑_{k=1}^d f_k(x) where f_k(x) (1 ≤ k ≤ d) is a homogeneous polynomial function of degree k, then we may first write

f_k(x) = F_k(x, x, ..., x)   (with k copies of x)   (5.1)

with F_k being a k-th order super-symmetric tensor. Then by introducing a homogenizing variable x_h, which is always set to 1, we may rewrite p(x) as

p(x) = ∑_{k=1}^d f_k(x) = ∑_{k=1}^d f_k(x) x_h^{d−k} = ∑_{k=1}^d F_k(x, x, ..., x) x_h^{d−k}
     = F((x; x_h), (x; x_h), ..., (x; x_h))   (with d copies)
     = F(x̄, x̄, ..., x̄) = f(x̄),   (5.2)

where F is an (n+1)-dimensional d-th order super-symmetric tensor, whose last component (the entry with all indices equal to n+1) is 0, since p(x) has no constant term.
If the polynomial p(x) is given as a summation of monomials, we should first group the monomials according to their degrees, and then rewrite the summation in each group as a homogeneous polynomial function. After that, we proceed according to (5.1) and (5.2) to obtain the tensor form F, as required.
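Numerically, the homogenization (5.2) can be realized without forming F explicitly, since f(x; x_h) = x_h^d p(x/x_h) whenever x_h ≠ 0. A small sketch with a hypothetical example polynomial:

```python
import numpy as np

d = 3  # the degree of p

def p(x):
    # A hypothetical example polynomial with no constant term:
    # p(x) = 2*x1^3 + x1*x2 - 3*x2.
    return 2 * x[0] ** 3 + x[0] * x[1] - 3 * x[1]

def f(xbar):
    # Homogenization of p: f(x; x_h) = x_h^d * p(x / x_h), valid for x_h != 0.
    # Setting x_h = 1 recovers p, matching (5.2).
    x, xh = xbar[:-1], xbar[-1]
    return xh ** d * p(x / xh)

x = np.array([0.3, -0.5])
xbar = np.append(x, 1.0)
print(p(x), f(xbar))                  # equal: f(x; 1) = p(x)
print(f(2 * xbar), 2 ** d * f(xbar))  # equal: f is homogeneous of degree d
```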
Finally in this step, we may equivalently reformulate (PS) as

(P̄S) max f(x̄)
     s.t. x̄ = (x; x_h),
          x ∈ S̄^n, x_h = 1.

Obviously, we have v(P̄S) = v(PS) and v̲(P̄S) = v̲(PS).
5.2.2 Multilinear Form Relaxation
Multilinear form relaxation has proven to be effective, as discussed in Chapter 4; specifically, Lemma 4.2.1 and Lemma 4.4.3 are the key link formulae. Now we relax (P̄S) to an inhomogeneous multilinear form optimization problem:

(TPS) max F(x̄^1, x̄^2, ..., x̄^d)
      s.t. x̄^k = (x^k; x^k_h), k = 1, 2, ..., d,
           x^k ∈ S̄^n, x^k_h = 1, k = 1, 2, ..., d.

Obviously, we have v(TPS) ≥ v(P̄S) = v(PS). Before proceeding, let us first settle the computational complexity issue of solving (TPS).
Proposition 5.2.3 (TPS) is NP-hard whenever d ≥ 3.

Proof. Notice that in Proposition 3.2.2, we proved that the following problem is NP-hard:

max F(x, y, z)
s.t. x, y, z ∈ S^n.

If d = 3, then in the special case where F has the form F_{n+1,j,k} = F_{i,n+1,k} = F_{i,j,n+1} = 0 for all 1 ≤ i, j, k ≤ n+1, (TPS) is equivalent to the above model, and thus is NP-hard. □
(TPS) is still difficult to solve, and moreover it remains inhomogeneous, since x^k_h is required to be 1. To the best of our knowledge, no polynomial-time approximation algorithm is available in the literature to solve this problem. We shall further relax the constraints x^k_h = 1, and introduce the following parameterized and homogenized problem:

(T̄PS(t)) max F(x̄^1, x̄^2, ..., x̄^d)
         s.t. ‖x̄^k‖ ≤ t, x̄^k ∈ R^{n+1}, k = 1, 2, ..., d.

Obviously, (TPS) can be relaxed to (T̄PS(√2)), since if x̄ is feasible for (TPS) then ‖x̄‖² = ‖x‖² + x_h² ≤ 1 + 1 = 2. Consequently, v(T̄PS(√2)) ≥ v(TPS).
Both the objective and the constraints are now homogeneous, and it is easy to see that for all t > 0, the problems (T̄PS(t)) are equivalent to one another by a simple scaling. Moreover, (T̄PS(1)) is equivalent to

max F(x̄^1, x̄^2, ..., x̄^d)
s.t. x̄^k ∈ S^{n+1}, k = 1, 2, ..., d,

which is in the form of (TS) discussed in Section 3.2. By using Algorithm 3.2.3 and applying Theorem 3.2.4, (T̄PS(1)) admits a polynomial-time approximation algorithm with approximation ratio (n+1)^{−(d−2)/2}. Therefore, for all t > 0, (T̄PS(t)) also admits a polynomial-time approximation algorithm with approximation ratio (n+1)^{−(d−2)/2}, and v(T̄PS(t)) = t^d v(T̄PS(1)). After this relaxation step (Step 2 in Algorithm 5.2.1), we are able to find a feasible solution (ȳ^1, ȳ^2, ..., ȳ^d) of (T̄PS(1)) in polynomial-time, such that

F(ȳ^1, ȳ^2, ..., ȳ^d) ≥ (n+1)^{−(d−2)/2} v(T̄PS(1))
                     = 2^{−d/2} (n+1)^{−(d−2)/2} v(T̄PS(√2))
                     ≥ 2^{−d/2} (n+1)^{−(d−2)/2} v(TPS).   (5.3)
Algorithm 3.2.3 is the engine that enables the second step of our scheme. In fact, any polynomial-time approximation algorithm for (T̄PS(1)) can be used as an engine to yield a realization (algorithm) of our scheme. As will become evident later, any improvement of the approximation ratio of (T̄PS(1)) leads to an improvement of the relative approximation ratio in Theorem 5.2.2. For example, recently So [108] improved the approximation bound of (T̄PS(1)) to Ω((log n / n)^{(d−2)/2}) (though the algorithm is mainly of theoretical interest), and consequently the relative approximation ratio under our scheme improves to Ω((log n / n)^{(d−2)/2}) too. Of course, one may apply any other favorite algorithm to solve the relaxation (T̄PS(1)). For instance, the alternating least squares (ALS) algorithm (see e.g., [68] and the references therein) and the maximum block improvement (MBI) method of Chen et al. [25] can be other alternatives for the second step.
5.2.3 Homogenizing Components Adjustment
The approximate solution (ȳ^1, ȳ^2, ..., ȳ^d) of (T̄PS(1)) satisfies ‖ȳ^k‖ ≤ 1 for all 1 ≤ k ≤ d, which implies ‖y^k‖ ≤ 1; but in general we have no control on the size of y^k_h, and thus (ȳ^1, ȳ^2, ..., ȳ^d) may not be a feasible solution for (TPS). The following lemma plays a linking role in our analysis, ensuring that the construction of a feasible solution for the inhomogeneous model (TPS) is possible.
Lemma 5.2.4 Suppose x̄^k = (x^k; x^k_h) ∈ R^{n+1} with |x^k_h| ≤ 1 for all 1 ≤ k ≤ d. Let η_1, η_2, ..., η_d be independent random variables, each taking values 1 and −1 with E[η_k] = x^k_h for all 1 ≤ k ≤ d, and let ξ_1, ξ_2, ..., ξ_d be i.i.d. random variables, each taking values 1 and −1 with equal probability 1/2. If the last component of the tensor F is 0, then

E[∏_{k=1}^d η_k F((η_1 x^1; 1), (η_2 x^2; 1), ..., (η_d x^d; 1))] = F(x̄^1, x̄^2, ..., x̄^d),   (5.4)

and

E[F((ξ_1 x^1; 1), (ξ_2 x^2; 1), ..., (ξ_d x^d; 1))] = 0.   (5.5)
Proof. The claimed equations readily result from the following observations:

E[∏_{k=1}^d η_k F((η_1 x^1; 1), (η_2 x^2; 1), ..., (η_d x^d; 1))]
= E[F((η_1² x^1; η_1), (η_2² x^2; η_2), ..., (η_d² x^d; η_d))]   (multilinearity of F)
= F(E[(x^1; η_1)], E[(x^2; η_2)], ..., E[(x^d; η_d)])   (independence of the η_k's)
= F(x̄^1, x̄^2, ..., x̄^d),

and

E[F((ξ_1 x^1; 1), (ξ_2 x^2; 1), ..., (ξ_d x^d; 1))]
= F(E[(ξ_1 x^1; 1)], E[(ξ_2 x^2; 1)], ..., E[(ξ_d x^d; 1)])   (independence of the ξ_k's)
= F((0; 1), (0; 1), ..., (0; 1))   (zero mean of the ξ_k's)
= 0,

where the last equality is due to the fact that the last component of F is 0. □
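Both identities involve expectations over at most 2^d outcomes, so they can be checked by exact enumeration. A minimal sketch of (5.4), assuming numpy (the tensor construction is illustrative):

```python
import itertools, math
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 2

# A super-symmetric tensor on R^{n+1} whose last component
# F_{n+1,...,n+1} is zero (illustrative construction).
G = rng.standard_normal((n + 1,) * d)
F = sum(np.transpose(G, p) for p in itertools.permutations(range(d)))
F /= math.factorial(d)
F[(n,) * d] = 0.0

def multilinear(F, vecs):
    T = F
    for v in reversed(vecs):
        T = T @ v
    return float(T)

# Points xbar^k = (x^k; x_h^k) with |x_h^k| <= 1.
xbars = [np.append(rng.standard_normal(n), rng.uniform(-1, 1)) for _ in range(d)]

# Exact expectation in (5.4): eta_k = +/-1 independently with E[eta_k] = x_h^k,
# i.e., Prob{eta_k = 1} = (1 + x_h^k)/2.
lhs = 0.0
for eta in itertools.product([-1.0, 1.0], repeat=d):
    prob = math.prod((1 + e * xb[-1]) / 2 for e, xb in zip(eta, xbars))
    vecs = [np.append(e * xb[:-1], 1.0) for e, xb in zip(eta, xbars)]
    lhs += prob * np.prod(eta) * multilinear(F, vecs)

print(lhs, multilinear(F, xbars))  # the two values agree up to rounding
```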
Lemma 5.2.4 suggests that one may enumerate the 2^d possible combinations ((ξ_1 y^1; 1), (ξ_2 y^2; 1), ..., (ξ_d y^d; 1)) and pick the one with the largest value of the function F (or use a simple randomization procedure), to generate a feasible solution for the inhomogeneous multilinear form optimization (TPS) from a feasible solution for the homogeneous multilinear form optimization (T̄PS(1)), with a controlled quality deterioration. This plays a key role in proving the approximation ratio for (TPS), which is a byproduct of this section.
Theorem 5.2.5 (TPS) admits a polynomial-time approximation algorithm with approximation ratio 2^{−3d/2} (n+1)^{−(d−2)/2}.
Proof. Let (ȳ^1, ȳ^2, ..., ȳ^d) be the feasible solution found in Step 2 of Algorithm 5.2.1 satisfying (5.3), and let η = (η_1, η_2, ..., η_d)^T with all η_k's being independent and taking values 1 and −1 such that E[η_k] = y^k_h. By applying Lemma 5.2.4, (5.4) explicitly implies

F(ȳ^1, ȳ^2, ..., ȳ^d)
= −∑_{β∈B^d, ∏_{k=1}^d β_k=−1} Prob{η = β} F((β_1 y^1; 1), (β_2 y^2; 1), ..., (β_d y^d; 1))
  + ∑_{β∈B^d, ∏_{k=1}^d β_k=1} Prob{η = β} F((β_1 y^1; 1), (β_2 y^2; 1), ..., (β_d y^d; 1)),

and (5.5) explicitly implies

∑_{β∈B^d} F((β_1 y^1; 1), (β_2 y^2; 1), ..., (β_d y^d; 1)) = 0.

Combining the above two equalities, for any constant c, we have

F(ȳ^1, ȳ^2, ..., ȳ^d)
= ∑_{β∈B^d, ∏_{k=1}^d β_k=−1} (c − Prob{η = β}) F((β_1 y^1; 1), (β_2 y^2; 1), ..., (β_d y^d; 1))
  + ∑_{β∈B^d, ∏_{k=1}^d β_k=1} (c + Prob{η = β}) F((β_1 y^1; 1), (β_2 y^2; 1), ..., (β_d y^d; 1)).   (5.6)
If we let

c = max_{β∈B^d, ∏_{k=1}^d β_k = −1} Prob{η = β},

then the coefficient of each term in (5.6) is nonnegative. Therefore we are able to find β′ ∈ B^d, such that

F((β′_1 y^1; 1), (β′_2 y^2; 1), ..., (β′_d y^d; 1)) ≥ τ_0 F(ȳ^1, ȳ^2, ..., ȳ^d),   (5.7)

where

τ_0 = (∑_{β∈B^d, ∏β_k=1} (c + Prob{η = β}) + ∑_{β∈B^d, ∏β_k=−1} (c − Prob{η = β}))^{−1}
    ≥ (2^{d−1} c + ∑_{β∈B^d, ∏β_k=1} Prob{η = β} + (2^{d−1} − 1) c)^{−1}
    ≥ (2^{d−1} + 1 + 2^{d−1} − 1)^{−1} = 2^{−d}.

Let us denote z̄^k := (β′_k y^k; 1) for k = 1, 2, ..., d. Since ‖β′_k y^k‖ = ‖y^k‖ ≤ 1, we know that (z̄^1, z̄^2, ..., z̄^d) is a feasible solution for (TPS). Combining with (5.3), we have

F(z̄^1, z̄^2, ..., z̄^d) ≥ τ_0 F(ȳ^1, ȳ^2, ..., ȳ^d) ≥ 2^{−d} 2^{−d/2} (n+1)^{−(d−2)/2} v(TPS) = 2^{−3d/2} (n+1)^{−(d−2)/2} v(TPS). □
One may notice that our proposed algorithm for solving (TPS) is very similar to Steps 2 and 3 of Algorithm 5.2.1, with only a minor modification at Step 3: namely, we choose a solution in arg max{ F((β_1 y^1; 1), (β_2 y^2; 1), ..., (β_d y^d; 1)) : β ∈ B^d }, instead of choosing a solution in arg max{ F((β_1 y^1/d; 1), (β_2 y^2/d; 1), ..., (β_d y^d/d; 1)) : β ∈ B^d }. The reason for dividing by d at Step 3 of Algorithm 5.2.1 (to solve (PS)) will become clear later. Finally, we remark again that it is unnecessary to enumerate all possible 2^d combinations in this step, as (5.6) suggests that a simple randomization process will serve the same purpose, especially when d is large. In the latter case, we end up with a polynomial-time randomized approximation algorithm; otherwise, the procedure is deterministic and runs in polynomial-time.
5.2.4 Feasible Solution Assembling
Finally we come to the last step of the scheme. In Step 4 of Algorithm 5.2.1, a polarization formula z̄(β) = β_1(d+1)z̄^1 + ∑_{k=2}^d β_k z̄^k with β ∈ B^d and β_1 = ∏_{k=2}^d β_k = 1 is proposed. In fact, searching over all β ∈ B^d may possibly improve the solution, although the worst-case performance ratio remains the same. Moreover, one may choose z̄^1 or any other z̄^k to play the same role here; alternatively one may enumerate β_ℓ(d+1)z̄^ℓ + ∑_{1≤k≤d, k≠ℓ} β_k z̄^k over all β ∈ B^d and 1 ≤ ℓ ≤ d, and take the best possible solution; again, this does not change the theoretical performance ratio. The polarization formula at Step 4 of Algorithm 5.2.1 works for any fixed degree d, and we shall now complete the final stage of the proof of Theorem 5.2.2.
Specifically, we shall prove that by letting

z = arg max{ p(0); p(z(β)/z_h(β)), β ∈ B^d and β_1 = ∏_{k=2}^d β_k = 1 }

with z̄(β) = β_1(d+1)z̄^1 + ∑_{k=2}^d β_k z̄^k, we have

p(z) − v̲(PS) ≥ τ(PS) (v(PS) − v̲(PS)).   (5.8)

First, the solution (z̄^1, z̄^2, ..., z̄^d) as established at Step 3 of Algorithm 5.2.1 satisfies ‖z^k‖ ≤ 1/d (notice we divided by d in each term at Step 3) and z^k_h = 1 for k = 1, 2, ..., d. The same argument as in the proof of Theorem 5.2.5 shows that

F(z̄^1, z̄^2, ..., z̄^d) ≥ d^{−d} 2^{−3d/2} (n+1)^{−(d−2)/2} v(TPS) ≥ 2^{−3d/2} d^{−d} (n+1)^{−(d−2)/2} v(PS).   (5.9)

It is easy to see that

2 ≤ |z_h(β)| ≤ 2d and ‖z(β)‖ ≤ (d+1)/d + (d−1)/d = 2.   (5.10)
Thus z(β)/z_h(β) is a feasible solution for (PS), and so f(z̄(β)/z_h(β)) ≥ v̲(P̄S) = v̲(PS). Moreover, we shall argue below that

β_1 = 1 implies f(z̄(β)) ≥ (2d)^d v̲(PS).   (5.11)

If this were not the case, then f(z̄(β)/(2d)) < v̲(PS) ≤ 0. Notice that β_1 = 1 implies z_h(β) > 0, and thus we have

f(z̄(β)/z_h(β)) = (2d/z_h(β))^d f(z̄(β)/(2d)) ≤ f(z̄(β)/(2d)) < v̲(PS),

which contradicts the feasibility of z(β)/z_h(β).
Suppose ξ_1, ξ_2, ..., ξ_d are i.i.d. random variables, each taking values 1 and −1 with equal probability 1/2. By the link Lemma 4.2.1, and noticing that f(z̄(−ξ)) = f(−z̄(ξ)) = (−1)^d f(z̄(ξ)), we have

d! F((d+1)z̄^1, z̄^2, ..., z̄^d) = E[∏_{k=1}^d ξ_k f(z̄(ξ))]
= (1/4) E[f(z̄(ξ)) | ξ_1 = 1, ∏_{k=2}^d ξ_k = 1] − (1/4) E[f(z̄(ξ)) | ξ_1 = 1, ∏_{k=2}^d ξ_k = −1]
  − (1/4) E[f(z̄(ξ)) | ξ_1 = −1, ∏_{k=2}^d ξ_k = 1] + (1/4) E[f(z̄(ξ)) | ξ_1 = −1, ∏_{k=2}^d ξ_k = −1]
= (1/4) E[f(z̄(ξ)) | ξ_1 = 1, ∏_{k=2}^d ξ_k = 1] − (1/4) E[f(z̄(ξ)) | ξ_1 = 1, ∏_{k=2}^d ξ_k = −1]
  − (1/4) E[f(z̄(−ξ)) | ξ_1 = 1, ∏_{k=2}^d ξ_k = (−1)^{d−1}] + (1/4) E[f(z̄(−ξ)) | ξ_1 = 1, ∏_{k=2}^d ξ_k = (−1)^d].

By inserting and canceling a constant term, the above expression further leads to

d! F((d+1)z̄^1, z̄^2, ..., z̄^d)
= (1/4) E[f(z̄(ξ)) − (2d)^d v̲(PS) | ξ_1 = 1, ∏_{k=2}^d ξ_k = 1]
  − (1/4) E[f(z̄(ξ)) − (2d)^d v̲(PS) | ξ_1 = 1, ∏_{k=2}^d ξ_k = −1]
  + ((−1)^{d−1}/4) E[f(z̄(ξ)) − (2d)^d v̲(PS) | ξ_1 = 1, ∏_{k=2}^d ξ_k = (−1)^{d−1}]
  + ((−1)^d/4) E[f(z̄(ξ)) − (2d)^d v̲(PS) | ξ_1 = 1, ∏_{k=2}^d ξ_k = (−1)^d]
≤ (1/2) E[f(z̄(ξ)) − (2d)^d v̲(PS) | ξ_1 = 1, ∏_{k=2}^d ξ_k = 1],   (5.12)

where the last inequality is due to (5.11). Therefore, there is a binary vector β′ ∈ B^d with β′_1 = ∏_{k=2}^d β′_k = 1, such that

f(z̄(β′)) − (2d)^d v̲(PS) ≥ 2 d! F((d+1)z̄^1, z̄^2, ..., z̄^d) ≥ 2^{−3d/2+1} (d+1)! d^{−d} (n+1)^{−(d−2)/2} v(PS),

where the last step is due to (5.9).
Below we argue that z = arg max{ p(0); p(z(β)/z_h(β)), β ∈ B^d and β_1 = ∏_{k=2}^d β_k = 1 } satisfies (5.8). In fact, if −v̲(PS) ≥ τ(PS) (v(PS) − v̲(PS)), then 0 trivially satisfies (5.8), and so does z in this case. Otherwise, if −v̲(PS) < τ(PS) (v(PS) − v̲(PS)), then we have

v(PS) > (1 − τ(PS)) (v(PS) − v̲(PS)) ≥ (v(PS) − v̲(PS))/2,

which implies

f(z̄(β′)/(2d)) − v̲(PS) ≥ (2d)^{−d} 2^{−3d/2+1} (d+1)! d^{−d} (n+1)^{−(d−2)/2} v(PS) ≥ τ(PS) (v(PS) − v̲(PS)).

The above inequality also implies that f(z̄(β′)/(2d)) > 0. Recall that β′_1 = 1 implies z_h(β′) > 0, and thus 2d/z_h(β′) ≥ 1 by (5.10). Therefore, we have

p(z) ≥ p(z(β′)/z_h(β′)) = f(z̄(β′)/z_h(β′)) = (2d/z_h(β′))^d f(z̄(β′)/(2d)) ≥ f(z̄(β′)/(2d)).

This shows that z satisfies (5.8) in both cases, which concludes the whole proof. □
5.3 Polynomial with Ellipsoidal Constraints
In this section, we consider an extension of (PS), namely

(PQ) max p(x)
     s.t. x^T Q_i x ≤ 1, i = 1, 2, ..., m,
          x ∈ R^n,

where Q_i ⪰ 0 for i = 1, 2, ..., m, and ∑_{i=1}^m Q_i ≻ 0. Since p(x) is assumed to have no constant term, we know that v̲(PQ) ≤ 0 ≤ v(PQ).
Here, as in Section 5.2, we propose a polynomial-time randomized algorithm for approximately solving (PQ), with a worst-case relative performance ratio. The main algorithm and the approximation result of this section are the following.
Algorithm 5.3.1

• INPUT: an n-dimensional d-th degree polynomial function p(x), matrices Q_i ∈ R^{n×n} with Q_i ⪰ 0 for all 1 ≤ i ≤ m and ∑_{i=1}^m Q_i ≻ 0.

1 Rewrite p(x) − p(0) = F(x̄, x̄, ..., x̄) (d copies of x̄ = (x; x_h)) when x_h = 1 as in (5.2), with F being an (n+1)-dimensional d-th order super-symmetric tensor.

2 Apply Algorithm 3.3.2 to solve the problem
    max F(x̄^1, x̄^2, ..., x̄^d)
    s.t. (x̄^k)^T Q̄_i x̄^k ≤ 1, k = 1, 2, ..., d, i = 1, 2, ..., m,
  with Q̄_i = [Q_i, 0; 0^T, 1], approximately, and get a feasible solution (ȳ^1, ȳ^2, ..., ȳ^d).

3 Compute (z̄^1, z̄^2, ..., z̄^d) = arg max{ F((ξ_1 y^1/d; 1), (ξ_2 y^2/d; 1), ..., (ξ_d y^d/d; 1)) : ξ ∈ B^d }.

4 Compute z = arg max{ p(0); p(z(β)/z_h(β)), β ∈ B^d and β_1 = ∏_{k=2}^d β_k = 1 }, with z̄(β) := β_1(d+1)z̄^1 + ∑_{k=2}^d β_k z̄^k = (z(β); z_h(β)).

• OUTPUT: a feasible solution z ∈ R^n.
Theorem 5.3.1 (PQ) admits a polynomial-time randomized approximation algorithm
with relative approximation ratio τ(PQ), where
τ(PQ) := 2^{−5d/2} (d+1)! d^{−2d} (n+1)^{−(d−2)/2} Ω(log^{−(d−1)} m) = Ω(n^{−(d−2)/2} log^{−(d−1)} m).
Our scheme for solving the general polynomial optimization model (PQ) is similar to that for solving (PS) in Section 5.2. The main difference lies in Step 2, where a different relaxation model requires a different solution method to cope with; the method in question is Algorithm 3.3.2. The proof of Theorem 5.3.1 is similar to that of Theorem 5.2.2; here we only illustrate the main ideas and skip the details.
By homogenizing p(x), which has no constant term, we may rewrite (PQ) as

(P̄Q) max f(x̄)
     s.t. x̄ = (x; x_h),
          x^T Q_i x ≤ 1, x ∈ R^n, i = 1, 2, ..., m,
          x_h = 1,

which can be relaxed to the inhomogeneous multilinear form problem

(TPQ) max F(x̄^1, x̄^2, ..., x̄^d)
      s.t. x̄^k = (x^k; x^k_h), k = 1, 2, ..., d,
           (x^k)^T Q_i x^k ≤ 1, x^k ∈ R^n, k = 1, 2, ..., d, i = 1, 2, ..., m,
           x^k_h = 1, k = 1, 2, ..., d,

where F(x̄, x̄, ..., x̄) = f(x̄) with F being super-symmetric. We then further relax (TPQ) to the multilinear form optimization model (T̄PQ(√2)), where

(T̄PQ(t)) max F(x̄^1, x̄^2, ..., x̄^d)
         s.t. (x̄^k)^T Q̄_i x̄^k ≤ t², k = 1, 2, ..., d, i = 1, 2, ..., m,
              x̄^k ∈ R^{n+1}, k = 1, 2, ..., d,

and Q̄_i = [Q_i, 0; 0^T, 1] for i = 1, 2, ..., m.
By Theorem 3.3.4, for any t > 0, (T̄PQ(t)) admits a polynomial-time randomized approximation algorithm with approximation ratio (n+1)^{−(d−2)/2} Ω(log^{−(d−1)} m), and v(T̄PQ(t)) = t^d v(T̄PQ(1)). Thus the approximate solution (ȳ^1, ȳ^2, ..., ȳ^d) found by Step 2 of Algorithm 5.3.1 satisfies

F(ȳ^1, ȳ^2, ..., ȳ^d) ≥ (n+1)^{−(d−2)/2} Ω(log^{−(d−1)} m) v(T̄PQ(1))
                     = (√2)^{−d} (n+1)^{−(d−2)/2} Ω(log^{−(d−1)} m) v(T̄PQ(√2))
                     ≥ 2^{−d/2} (n+1)^{−(d−2)/2} Ω(log^{−(d−1)} m) v(TPQ).

Noticing that (y^k_h)² ≤ (ȳ^k)^T Q̄_1 ȳ^k ≤ 1 for k = 1, 2, ..., d, we again apply Lemma 5.2.4 to (ȳ^1, ȳ^2, ..., ȳ^d), and use the same argument as in the proof of Theorem 5.2.5.
Let c = maxβ∈Bd,∏dk=1 βk=−1 Prob η = β, where η = (η1, η2, · · · , ηd)T and its com-
ponents are independent random variables, each taking values 1 and −1 with E[ηk] = ykh
for k = 1, 2, . . . , d. Then we are able to find a binary vector β′ ∈ Bd, such that
F
((β′1y
1
1
),
(β′2y
2
1
), · · · ,
(β′dy
d
1
))≥ τ0F (y1, y2, · · · , yd)
≥ 2−dF (y1, y2, · · · , yd)
≥ 2−3d2 (n+ 1)−
d−22 Ω
(log−(d−1)m
)v(TPQ).
This proves the following theorem as a byproduct.
Theorem 5.3.2 (T̄PQ) admits a polynomial-time randomized approximation algorithm
with approximation ratio 2^{-3d/2} (n + 1)^{-(d-2)/2} Ω(log^{-(d-1)} m).
To prove the main theorem in this section (Theorem 5.3.1), we only need to check
the feasibility of z generated by Algorithm 5.3.1, while the worst-case performance
ratio can be proven by an argument similar to that in Section 5.2.4. Indeed, the vectors
z̄^k = (z^k; z_h^k) computed at Step 3 of Algorithm 5.3.1 satisfy
    (z^k)^T Q_i z^k ≤ 1/d^2  ∀ 1 ≤ i ≤ m, 1 ≤ k ≤ d.
For any binary vector β ∈ B^d, as z̄(β) = β_1(d + 1)z̄^1 + ∑_{k=2}^d β_k z̄^k, we have
2 ≤ |z_h(β)| ≤ 2d. Noticing by the Cauchy-Schwarz inequality,
    |(z^j)^T Q_i z^k| ≤ ‖Q_i^{1/2} z^j‖ · ‖Q_i^{1/2} z^k‖ ≤ 1/d^2  ∀ 1 ≤ i ≤ m, 1 ≤ j, k ≤ d,
it follows that
    (z(β))^T Q_i z(β) ≤ 2d · 2d · 1/d^2 = 4  ∀ 1 ≤ i ≤ m.
Thus z(β)/zh(β) is a feasible solution for (PQ), which implies z is also feasible.
To conclude this section, we remark here that (PQ) includes as a special case the
optimization of a general polynomial function over a central-symmetric polytope:
    max p(x)
    s.t. −1 ≤ (a^i)^T x ≤ 1, i = 1, 2, . . . , m,
         x ∈ R^n,
with rank(a^1, a^2, · · · , a^m) = n. Indeed, the constraint −1 ≤ (a^i)^T x ≤ 1 is equivalent to
x^T (a^i(a^i)^T) x ≤ 1, and the rank condition guarantees ∑_{i=1}^m a^i(a^i)^T ≻ 0; see the sketch below.
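To make the reduction concrete, the following Python sketch (a hypothetical random instance; names are illustrative) builds the matrices Q_i = a^i(a^i)^T and verifies numerically that the two descriptions of the feasible region coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 6                      # hypothetical sizes
A = rng.standard_normal((m, n))  # rows are the vectors a^i (rank n with high probability)

# Ellipsoidal description: Q_i = a^i (a^i)^T, so x^T Q_i x = ((a^i)^T x)^2
Qs = [np.outer(a, a) for a in A]
assert np.all(np.linalg.eigvalsh(sum(Qs)) > 0)  # sum of the Q_i is positive definite

for _ in range(1000):
    x = rng.uniform(-2, 2, size=n)
    in_polytope = np.all(np.abs(A @ x) <= 1)
    in_ellipsoids = all(x @ Q @ x <= 1 for Q in Qs)
    assert in_polytope == in_ellipsoids
```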
5.4 Polynomial with General Convex Constraints
In this section we study the polynomial optimization model in a generic constraint format:
(PG) max p(x)
     s.t. x ∈ G,
where G ⊂ R^n is a given convex compact set. As before, we derive polynomial-time
approximation algorithms for solving (PG). Our approach makes use of the well-known
Löwner-John ellipsoids (see e.g., [20, 86]), described in the following theorem.
Theorem 5.4.1 Let G ⊂ R^n be a convex compact set with non-empty interior.
1. There exists a unique largest-volume ellipsoid {Ax + a | x ∈ S^n} ⊂ G, whose n-times
   linear-size larger ellipsoid satisfies {nAx + a | x ∈ S^n} ⊃ G; if in addition G is
   central-symmetric, then {√n Ax + a | x ∈ S^n} ⊃ G;
2. There exists a unique smallest-volume ellipsoid {Bx + b | x ∈ S^n} ⊃ G, whose n-times
   linear-size smaller ellipsoid satisfies {Bx/n + b | x ∈ S^n} ⊂ G; if in addition G
   is central-symmetric, then {Bx/√n + b | x ∈ S^n} ⊂ G.
Armed with the above theorem, if we are able to find the Lowner-John ellipsoid
(either the inner or the outer) of the feasible region G in polynomial-time, then the
following algorithm approximately solves (PG) with a worst-case performance ratio.
Algorithm 5.4.1
• INPUT: an n-dimensional d-th degree polynomial function p(x) and a set G ⊂ Rn.
1 Find a scalar t ∈ R, a vector b ∈ R^n, and a matrix A ∈ R^{n×m} with rank(A) = m ≤ n,
  such that the two co-centered ellipsoids E1 = {Au + b | u ∈ S^m} and E2 = {tAu + b | u ∈ S^m}
  satisfy E1 ⊂ G ⊂ E2.
2 Compute the polynomial function p_0(u) = p(Au + b) of variable u ∈ R^m.
3 Apply Algorithm 5.2.1 with input p_0(u) and output y ∈ S^m.
4 Compute z = Ay + b.
• OUTPUT: a feasible solution z ∈ G.
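As an illustration of Step 1, consider the simple case where G is an axis-aligned box. A minimal sketch, assuming G = {x : |x_i| ≤ u_i} (so b = 0, A = diag(u), and t = √n by the central-symmetric case of Theorem 5.4.1):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
u = rng.uniform(0.5, 2.0, size=n)   # hypothetical box half-widths
A, b, t = np.diag(u), np.zeros(n), np.sqrt(n)

# E1 = {A v : ||v|| <= 1} is inscribed in the box: |(A v)_i| = u_i |v_i| <= u_i.
# E2 = {t A v : ||v|| <= 1} covers the box: for |x_i| <= u_i, the point
# v = A^{-1} x has ||v||^2 = sum (x_i/u_i)^2 <= n, i.e. x lies in t*E1.
for _ in range(1000):
    x = rng.uniform(-u, u)                     # a point of G
    assert np.linalg.norm(np.linalg.solve(A, x)) <= t + 1e-12
```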
The key result in this section is the following theorem.
Theorem 5.4.2 If S^n ⊂ G ⊂ tS^n := {x ∈ R^n | ‖x‖ ≤ t} for some t ≥ 1, then (PG)
admits a polynomial-time approximation algorithm with relative approximation ratio
τ(PG)(t), where
    τ(PG)(t) := 2^{-2d} (d + 1)! d^{-2d} (n + 1)^{-(d-2)/2} (t^2 + 1)^{-d/2} = Ω(n^{-(d-2)/2} t^{-d}).
Proof. By homogenizing the objective function of (PG), we get the equivalent problem
(P̄G) max f(x̄)
     s.t. x̄ = (x; x_h),
          x ∈ G, x_h = 1,
where f(x̄) = p(x) if x_h = 1, and f(x̄) is an (n + 1)-dimensional homogeneous poly-
nomial function of degree d. If we write f(x̄) = F(x̄, x̄, · · · , x̄) (d copies of x̄) with F being
super-symmetric, then (P̄G) can be relaxed to the inhomogeneous multilinear form problem
(T̄PG) max F(x̄^1, x̄^2, · · · , x̄^d)
      s.t. x̄^k = (x^k; x_h^k), k = 1, 2, . . . , d,
           x^k ∈ G, x_h^k = 1, k = 1, 2, . . . , d.
Recall that in Section 5.2.2, we have defined
(T̄PS(t)) max F(x̄^1, x̄^2, · · · , x̄^d)
         s.t. ‖x̄^k‖ ≤ t, x̄^k ∈ R^{n+1}, k = 1, 2, . . . , d.
As x^k ∈ G ⊂ tS^n, it follows that ‖x̄^k‖ ≤ √(t^2 + 1) in (T̄PG). Therefore, (T̄PS(√(t^2 + 1)))
is a relaxation of (T̄PG), and v(T̄PS(√(t^2 + 1))) ≥ v(T̄PG) ≥ v(P̄G) = v(PG). The rest of
the proof follows similarly as that in Section 5.2.4. Specifically, we are able to construct
a feasible solution x ∈ S^n ⊂ G in polynomial-time with a relative performance ratio
τ(PG)(t).
Observe that any ellipsoid can be linearly transformed to the Euclidean ball. By a
variable transformation if necessary, we are led to the main result in this section.
Corollary 5.4.3 Given a bounded set G ⊂ R^n, if two co-centered ellipsoids E1 =
{Au + b | u ∈ S^n} and E2 = {tAu + b | u ∈ S^n} can be found in polynomial-time,
satisfying E1 ⊂ G ⊂ E2, then (PG) admits a polynomial-time approximation algorithm
with relative approximation ratio τ(PG)(t).
We remark that in fact the set G in Theorem 5.4.2 and Corollary 5.4.3 does not
need to be convex, as long as the two required ellipsoids are in place. However, the
famous Lowner-John theorem guarantees the existence of such inner and outer ellipsoids
required in Corollary 5.4.3 for any convex compact set, with t = n for G being non-
central-symmetric, and t =√n for G being central-symmetric. Thus, if we are able to
find a pair of ellipsoids (E1, E2) in polynomial-time for G, then (PG) can be solved by a
polynomial-time approximation algorithm with relative approximation ratio τ(PG)(t).
Indeed, it is possible to compute in polynomial-time the Lowner-John ellipsoids in
several interesting cases. Below is a list of such cases (assuming G is bounded); for the
details one is referred to [20, 86]:
• G = {x ∈ R^n | (a^i)^T x ≤ b_i, i = 1, 2, . . . , m};
• G = conv{x^1, x^2, · · · , x^m}, where x^i ∈ R^n for i = 1, 2, . . . , m;
• G = ⋂_{i=1}^m E_i, where E_i is an ellipsoid in R^n for i = 1, 2, . . . , m;
• G = conv{⋃_{i=1}^m E_i}, where E_i is an ellipsoid in R^n for i = 1, 2, . . . , m;
• G = ∑_{i=1}^m E_i := {∑_{i=1}^m x^i | x^i ∈ E_i, i = 1, 2, . . . , m}, where E_i is an ellipsoid in
  R^n for i = 1, 2, . . . , m.
By Corollary 5.4.3 and the computability of the Löwner-John ellipsoids discussed
above, we conclude that for (PG) with the constraint set G being any of the above
cases, there is a polynomial-time approximation algorithm with a relative approx-
imation quality assurance. In particular, the ratio is τ(PG)(√m) = Ω(n^{-(d-2)/2} m^{-d/2}) for
the last case, and is τ(PG)(n) = Ω(n^{-(3d-2)/2}) for the other cases.
We also remark that (PQ) : max_{x^T Q_i x ≤ 1, i=1,2,...,m} p(x), discussed in Section 5.3,
may in principle be solved by directly applying Corollary 5.4.3 as well. If we adopt
that approach (Algorithm 5.4.1), then the relative approximation ratio is τ(PG)(√n) =
Ω(n^{-(2d-2)/2}), which prevails if m is exceedingly large. Taking the better of the two, the
quality ratio in Theorem 5.3.1 can be improved to Ω(max{n^{-(d-2)/2} log^{-(d-1)} m, n^{-(2d-2)/2}}).
Our investigation quite naturally leads to a question which is of a general geometric
interest itself. Consider the intersection of m co-centered ellipsoids in Rn as a geometric
structure. Denote E_{m,n} to be the collection of all such structures, or more specifically
    E_{m,n} := { ⋂_{i=1}^m {x ∈ R^n | x^T Q_i x ≤ 1} | Q_i ⪰ 0 for i = 1, 2, . . . , m and ∑_{i=1}^m Q_i ≻ 0 }.
For any central-symmetric and convex compact set G ⊂ R^n centered at b, there exist
E_{m,n} ∈ E_{m,n} and t ≥ 1, such that b + E_{m,n} ⊂ G ⊂ b + tE_{m,n}. Obviously, one can
naturally define
    t(G; m, n) := inf{ t | ∃ E_{m,n} ∈ E_{m,n} such that b + E_{m,n} ⊂ G ⊂ b + tE_{m,n} },
    θ(m, n) := sup{ t(G; m, n) | G ⊂ R^n is convex compact and central-symmetric }.
The famous Löwner-John theorem states that θ(1, n) = √n. Naturally, θ(∞, n) = 1,
because any central-symmetric convex set can be expressed as the intersection of an
infinite number of co-centered ellipsoids. It is interesting to compute θ(m, n) for general
m and n. Of course, it is trivial to observe that θ(m, n) is monotonically decreasing in m
for any fixed n. In any case, if we are able to compute θ(m, n) and find the corresponding
E_{m,n} in polynomial-time, then Theorem 5.3.1 suggests a polynomial-time randomized
approximation algorithm for (PG) with relative approximation ratio (θ(m, n))^{-d} τ(PQ) =
Ω((θ(m, n))^{-d} n^{-(d-2)/2} log^{-(d-1)} m).
5.5 Applications
The polynomial optimization models studied in this chapter are quite general and have
versatile applications. In order to better appreciate these models as well as the approx-
imation algorithms presented, in this section we discuss a few detailed examples
arising from real applications, and show that they are readily formulated by the inho-
mogeneous polynomial optimization models in this chapter.
5.5.1 Portfolio Selection with Higher Moments
The portfolio selection problem dates back to the early 1950s, when the seminal
mean-variance model was proposed by Markowitz [81]. Essentially, in Markowitz's
model, the mean of the portfolio return is treated as the 'gain' factor, while the variance
of the portfolio return is treated as the 'risk' factor. By minimizing the risk subject to
a certain target of reward, the mean-variance model is as follows:
(MV) min x^T Σ x
     s.t. μ^T x = μ_0,
          e^T x = 1, x ≥ 0, x ∈ R^n,
where μ and Σ are the mean vector and covariance matrix of the n given assets, respec-
tively, and e is the all-one vector. This model and its variations have been studied
extensively throughout the history of portfolio management. Despite its popularity and orig-
inality, the mean-variance model certainly has drawbacks. An important one is that it
neglects the higher-moments information of the portfolio. Mandelbrot and Hudson [78]
made a strong case against a 'normal view' of the investment returns. The use of higher
moments in portfolio selection thus becomes quite necessary, i.e., involving more than the
first two moments (e.g., the skewness and the kurtosis of the investment returns) if they
are also available. This problem has been receiving much attention in the literature
(see e.g., de Athayde and Flore [10], Prakash et al. [96], Jondeau and Rockinger [60],
Kleniati et al. [64], and the references therein). In particular, a very general model
in [64] is
(PM) max α μ^T x − β x^T Σ x + γ ∑_{i,j,k=1}^n ς_{ijk} x_i x_j x_k − δ ∑_{i,j,k,ℓ=1}^n κ_{ijkℓ} x_i x_j x_k x_ℓ
     s.t. e^T x = 1, x ≥ 0, x ∈ R^n,
where μ, Σ, (ς_{ijk}), (κ_{ijkℓ}) are the first four central moments of the n given assets. The
nonnegative parameters α, β, γ, δ measure the investor's preference for the four moments,
and they sum up to one, i.e., α + β + γ + δ = 1.
In fact, the mean-variance model (MV) can be taken as a special case of (PM) with
γ = δ = 0. The model (PM) falls essentially within the framework of our model (PG), as
the constraint set is convex and compact. By directly applying Corollary 5.4.3 and the
discussion on its applicability to a polytope, it admits a polynomial-time approximation
algorithm with relative approximation ratio Ω(n^{-5}).
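For concreteness, the following Python sketch estimates the four moment tensors from a hypothetical matrix R of historical returns (T periods × n assets) and evaluates the (PM) objective at a given weight vector; all names and data are illustrative.

```python
import numpy as np

def pm_objective(R, x, alpha, beta, gamma, delta):
    """Evaluate the (PM) objective from sample returns R (T x n)."""
    T, n = R.shape
    mu = R.mean(axis=0)
    D = R - mu                                          # demeaned returns
    Sigma = D.T @ D / T                                 # covariance
    S = np.einsum('ti,tj,tk->ijk', D, D, D) / T         # co-skewness tensor
    K = np.einsum('ti,tj,tk,tl->ijkl', D, D, D, D) / T  # co-kurtosis tensor
    return (alpha * mu @ x
            - beta * x @ Sigma @ x
            + gamma * np.einsum('ijk,i,j,k->', S, x, x, x)
            - delta * np.einsum('ijkl,i,j,k,l->', K, x, x, x, x))

rng = np.random.default_rng(0)
R = rng.standard_normal((500, 6))    # hypothetical return history
x = np.ones(6) / 6                   # equally weighted, feasible: e^T x = 1, x >= 0
print(pm_objective(R, x, 0.4, 0.3, 0.2, 0.1))   # preference weights sum to one
```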
5.5.2 Sensor Network Localization
Suppose that in a certain specified region G ⊂ R^3 there are a set of anchor nodes, denoted by
A, and a set of sensor nodes, denoted by S. What we know are the positions of the
anchor nodes a^j ∈ G (j ∈ A), and the (possibly noisy) distance measurements between
anchor nodes and sensor nodes, and between two different sensor nodes, denoted by
d_{ij} (i ∈ S, j ∈ S ∪ A). The task is to estimate the positions of the unknown sensor
nodes x^i ∈ G (i ∈ S). Luo and Zhang [77] proposed a least squares formulation of this
sensor network localization problem. Specifically, the problem takes the form of
(SNL) min ∑_{i,j∈S} (‖x^i − x^j‖^2 − d_{ij}^2)^2 + ∑_{i∈S, j∈A} (‖x^i − a^j‖^2 − d_{ij}^2)^2
      s.t. x^i ∈ G, i ∈ S.
Notice that the objective function of (SNL) is an inhomogeneous quartic polynomial
function. If the specified region G is well formed, say the Euclidean ball, an ellipsoid, a
polytope, or any other convex compact set that can be sandwiched by two co-centered
ellipsoids, then (SNL) can be fit into the model (PG) in the following way. Suppose
E1 ⊂ G ⊂ E2 with E1 and E2 being two co-centered ellipsoids; we know by the
Löwner-John theorem that E2 is at most three times larger than E1 in linear size
(for the Euclidean ball or an ellipsoid the factor is 1, for a central-symmetric G it is less than
√3, and for a general convex compact G it is less than 3). Denote the number of
sensor nodes by n = |S|, and denote x = ((x^1)^T, (x^2)^T, · · · , (x^n)^T)^T ∈ R^{3n}. Then
x ∈ G × G × · · · × G (n copies), and this feasible region can be sandwiched by the two co-centered
sets E1 × E1 × · · · × E1 and E2 × E2 × · · · × E2, which are both intersections of n co-
centered ellipsoids, i.e., belonging to E_{n,3n}. According to the discussion at the end
of Section 5.4, and noticing that in this case t(G; n, 3n) ≤ 3 is a constant, (SNL) admits
a polynomial-time randomized approximation algorithm with relative approximation
ratio Ω(1/(n log^3 n)).
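The (SNL) objective itself is straightforward to evaluate; a minimal sketch in Python, with hypothetical anchor positions, sensor estimates, and a dictionary of distance measurements:

```python
import numpy as np

def snl_objective(X, anchors, dist):
    """Least-squares (SNL) objective. X: dict sensor -> position estimate (R^3);
    anchors: dict anchor -> known position; dist: dict (i, j) -> measured d_ij,
    where i is a sensor and j is a sensor or an anchor."""
    total = 0.0
    for (i, j), d in dist.items():
        pj = X[j] if j in X else anchors[j]
        total += (np.sum((X[i] - pj) ** 2) - d ** 2) ** 2
    return total

anchors = {'a1': np.array([0., 0., 0.]), 'a2': np.array([1., 0., 0.])}
X = {'s1': np.array([0.4, 0.3, 0.1]), 's2': np.array([0.7, 0.2, 0.5])}
dist = {('s1', 's2'): 0.55, ('s1', 'a1'): 0.52, ('s2', 'a2'): 0.62}
print(snl_objective(X, anchors, dist))   # to be minimized over the sensor positions
```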
5.6 Numerical Experiments
In this section, we present some preliminary test results for the approximation al-
gorithms proposed in this chapter, to give the readers an impression about how our
algorithms work in practice. We shall focus on (PS) with d = 4, specifically, the model
being tested is
(EPS) max p(x) = F_4(x, x, x, x) + F_3(x, x, x) + F_2(x, x) + F_1(x)
      s.t. x ∈ S^n,
where F_4 ∈ R^{n^4}, F_3 ∈ R^{n^3}, F_2 ∈ R^{n^2} and F_1 ∈ R^n are super-symmetric tensors of
orders 4, 3, 2 and 1, respectively.
5.6.1 Randomly Simulated Data
A fourth order tensor F'_4 is generated randomly, with its n^4 entries following i.i.d. standard
normals. We then symmetrize F'_4 to form the super-symmetric tensor F_4 by averaging
the related entries. The other lower order tensors F_3, F_2 and F_1 are generated in
the same manner. We then apply Algorithm 5.2.1 to get a feasible solution with its
objective value denoted by v, which has a guaranteed worst-case performance ratio.
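The symmetrization step amounts to averaging each entry over all permutations of its indices; a minimal numpy sketch:

```python
import math
import numpy as np
from itertools import permutations

def symmetrize(F):
    """Average a tensor over all index permutations to make it super-symmetric."""
    d = F.ndim
    return sum(np.transpose(F, p) for p in permutations(range(d))) / math.factorial(d)

rng = np.random.default_rng(0)
F4 = symmetrize(rng.standard_normal((8, 8, 8, 8)))   # super-symmetric by construction
assert np.allclose(F4, np.transpose(F4, (1, 0, 3, 2)))
```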
For the purpose of making a comparison, we also compute an upper bound of the
optimal value of (EPS). Like in (5.2), we may let F(x̄, x̄, x̄, x̄) = f(x̄) = p(x) when
x_h = 1, where F ∈ R^{(n+1)^4} is super-symmetric. (EPS) can then be relaxed to
    max F(x̄, x̄, x̄, x̄)
    s.t. ‖x̄‖ ≤ √2, x̄ ∈ R^{n+1}.
Let y = vec(x̄x̄^T) ∈ R^{(n+1)^2}, and rewrite F as an (n + 1)^2 × (n + 1)^2 matrix F'. (EPS)
is further relaxed to
    max F'(y, y) = y^T F' y
    s.t. ‖y‖ ≤ 2, y ∈ R^{(n+1)^2}.
The optimal value of the above problem is v̄ = 4λ_max(F'), which is taken as an upper
bound of v(EPS).
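A minimal sketch of this upper bound, assuming the homogenized super-symmetric tensor Fbar has already been formed as in (5.2):

```python
import numpy as np

def upper_bound(Fbar):
    """v_bar = 4 * lambda_max(F'), where F' is the (n+1)^2 x (n+1)^2 matrix
    unfolding of the homogenized 4th-order tensor Fbar (assumed given)."""
    m = Fbar.shape[0] ** 2            # (n+1)^2
    Fp = Fbar.reshape(m, m)           # symmetric, since Fbar is super-symmetric
    return 4 * np.linalg.eigvalsh(Fp).max()
```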
By Theorem 5.2.2, Algorithm 5.2.1 possesses a theoretical worst-case relative perfor-
mance ratio of 2^{-10} · 5! · 4^{-8} (n + 1)^{-1} = Ω(1/n). The numerical simulation results for
(EPS) are listed in Table 5.1. Based on the observations, comparing with the upper
bound v̄ (which might be very loose), the absolute performance ratio τ := v/v̄ behaves like
Ω(1/√n), rather than the theoretical relative ratio Ω(1/n).
Table 5.1: Numerical results (average of 10 instances) of (EPS)
n      3      5      10     20     30     40     50     60     70
v      0.342  0.434  0.409  0.915  0.671  0.499  0.529  0.663  0.734
v̄      10.5   16.1   26.7   51.7   74.4   97.8   121.1  143.6  167.1
τ (%)  3.257  2.696  1.532  1.770  0.902  0.510  0.437  0.462  0.439
n·τ    0.098  0.135  0.153  0.354  0.271  0.204  0.218  0.277  0.307
√n·τ   0.056  0.060  0.048  0.079  0.049  0.032  0.031  0.036  0.037
With regard to the computational effort, we report that Algorithm 5.2.1 ran fairly
fast. For instance, for n = 70 we were able to get a feasible solution within seconds,
while computing the upper bound v̄ cost much more computational time. For n ≥ 80,
however, our computer ran out of memory in the experiments, a problem
purely due to the sheer size of the input data.
5.6.2 Local Improvements
The theoretical worst-case performance ratios that we have developed so far are cer-
tainly very conservative, as observed in the previous subsection. It is desirable to
design a more realistic procedure to assess how good the solutions actually are. One
point to note is that we can always improve the quality of the solution by applying a
local improvement procedure to our heuristic solutions. In the Matlab environment,
such a local search procedure is readily available, e.g., the fmincon function in Matlab
7.7.0 (R2008b), which finds a local KKT point starting from the feasible solution that
we provide. In our experiments, we find that the fmincon function works well at least
for low dimensional problems. In particular, for our test cases, it works quite stably
up to n = 10.
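Outside Matlab, the same kind of local refinement can be sketched with SciPy; a minimal example, assuming the feasible region is the Euclidean unit ball and p is a callable polynomial objective (both hypothetical here):

```python
import numpy as np
from scipy.optimize import minimize

def local_improve(p, z):
    """Refine a feasible starting point z by local search over {x : ||x|| <= 1}."""
    res = minimize(lambda x: -p(x),             # maximize p by minimizing -p
                   x0=z,
                   method='SLSQP',
                   constraints=[{'type': 'ineq',
                                 'fun': lambda x: 1.0 - x @ x}])
    return res.x, -res.fun                      # local KKT point and its value
```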
In order to evaluate the true quality of our approximate solutions, it is desirable to
probe the optimal values, instead of using the loose upper bounds. For this purpose
we set up the following experiments, restricting ourselves to
the low dimensional cases, say n ≤ 10. First we take the feasible approximate solution
(which has objective value v) as a starting point, followed by the local improvement
procedure of the fmincon function, to obtain a KKT solution, and denote its objective
value by v_f. Then we use a brute-force approach to randomly sample 1,000 feasible
solutions, followed by the same local improvement via the fmincon function in Matlab. We then
pick the best one as a proxy of the true optimal solution, and denote its objective
value by v*_f. This is doable for the case n ≤ 10 in our computational environment.
For the cases n = 5 and n = 10, we generate 10 random instances
of (EPS) each. The solutions obtained, as described above, are shown in Table 5.2 and
Table 5.3, respectively.

Table 5.2: Numerical objectives of (EPS) with local improvements for n = 5
Instance  1      2      3      4      5      6      7      8      9      10
v         0.35   0.47   0.08   0.40   0.17   0.13   0.07   1.78   0.32   0.53
v_f       4.16   4.85   4.24   3.99   4.28   6.49   6.46   6.42   5.14   6.84
v*_f      4.16   4.85   4.24   3.99   4.28   6.49   6.46   6.42   5.14   6.84
v̄         14.33  14.92  14.88  15.62  17.59  14.34  15.60  19.12  13.63  15.01

Table 5.3: Numerical objectives of (EPS) with local improvements for n = 10
Instance  1      2      3      4      5      6      7      8      9      10
v         1.51   1.24   0.80   0.28   0.09   0.12   0.30   0.36   0.35   0.61
v_f       8.98   7.74   9.74   8.71   8.14   11.24  9.82   7.75   9.18   11.08
v*_f      9.70   7.74   9.74   8.90   8.14   11.24  9.82   7.85   9.18   11.08
v̄         26.67  24.88  28.06  28.75  27.82  26.99  26.92  27.75  27.83  27.10

The results are quite telling: Algorithm 5.2.1 together with
fmincon yields near-optimal solutions, at least for low dimensional problems. However,
for problems in high dimensions, a stable local improvement procedure is a nontrivial
task; interested readers are referred to a recent paper by Chen et al. [25].
Chapter 6
Polynomial Optimization with
Binary Constraints
6.1 Introduction
We shift the focus from the continuous optimization models of the previous chapters to discrete
optimization. In fact, a very large class of discrete optimization problems have their
objectives and constraints being polynomials, e.g., the graph partition problems and the
network flow problems. In particular, this chapter is concerned with models of opti-
mizing a polynomial function subject to binary constraints, with the objective focusing
on the four types of polynomial functions mentioned in Section 2.1.1. Specifically, the
models are
(TB) max F (x1,x2, · · · ,xd)
s.t. xk ∈ Bnk , k = 1, 2, . . . , d;
(HB) max f(x)
s.t. x ∈ Bn;
(MB) max f(x1,x2, · · · ,xs)
s.t. xk ∈ Bnk , k = 1, 2, . . . , s;
(PB) max p(x)
s.t. x ∈ Bn.
These four models are discussed sequentially, each in one section of this chapter. Each
model generalizes the previous one, and each generalization requires its own approach
and techniques to cope with. The last model, (PB), is indeed a very general discrete
optimization model, since in principle it can be used to model the following general
polynomial optimization problem in discrete values:
    max p(x)
    s.t. x_i ∈ {a_{i1}, a_{i2}, · · · , a_{im_i}}, i = 1, 2, . . . , n.
We also discuss polynomial optimizations over hypercubes as byproducts of this
chapter. They are the models (TB̄), (HB̄), (MB̄) and (PB̄), i.e., the respective models (TB),
(HB), (MB) and (PB) with the binary set B^n replaced by the solid hypercube B̄^n.
All the models are unfortunately NP-hard when the degree of the objective poly-
nomial d ≥ 2, albeit they are trivial when d = 1. This is because each one includes
computing the matrix ∞ → 1-norm (see e.g., [5]) as a subclass, i.e.,
    ‖F‖_{∞→1} = max (x^1)^T F x^2
               s.t. x^1 ∈ B^{n_1}, x^2 ∈ B^{n_2},
which is also the exact model of (TB) when d = 2. The matrix ∞ → 1-norm is related
to the so-called matrix cut-norm; the current best polynomial-time approximation ratio
for the matrix ∞ → 1-norm as well as the matrix cut-norm is 2 ln(1+√2)/π ≈ 0.56, due to
Alon and Naor [5]. Huang and Zhang [59] considered similar problems for complex
discrete variables and derived constant approximation ratios. When d = 3, (TB) is
a slight generalization of the model considered by Khot and Naor [63], where F is
assumed to be super-symmetric (implying n_1 = n_2 = n_3) and square-free (F_{ijk} = 0
whenever two of the three indices are equal). The approximation bound of the optimal
value given in [63] is Ω(√(log n_1 / n_1)).
For the model (HB), its NP-hardness for d = 2 can also be derived by a reduction from the
max-cut problem, where the matrix F is the Laplacian of a given graph. In a seminal
work by Goemans and Williamson [40], a polynomial-time randomized approximation
algorithm is given with approximation ratio 0.878, by the well-known SDP relaxation
and randomization technique. The method was then generalized by Nesterov, who in [88]
proved a 0.63-approximation ratio for (HB) when the matrix F is positive semidefinite.
A more general result is due to Charikar and Wirth [24], where an Ω(1/log n)-
approximation ratio for (HB) is proposed when the matrix F is diagonal-free. If the
degree of the objective polynomial goes higher, the only approximation result in the
literature is due to Khot and Naor [63] for homogeneous cubic polynomials,
where an Ω(√(log n / n))-approximation bound is provided when the tensor F is square-
free. In fact, square-freeness (or, in the matrix case, being diagonal-free) is in a sense a necessary
condition for deriving polynomial-time approximation algorithms (see e.g., [4]). Even in
the quadratic case, there is no polynomial-time approximation algorithm with a positive
approximation ratio for the general model (HB) unless P = NP.
In this chapter we propose polynomial-time randomized approximation algorithms
with provable worst-case performance ratios for all the models mentioned at the begin-
ning, provided that the degree of the objective polynomial is fixed. Section 6.2 discusses
the model (TB). Essentially, we apply a similar approach as in Chapter 3, by relaxing
the multilinear form objective to a lower order multilinear form. However, the discrete
nature makes the problems quite different from the continuous ones in Chapter 3, and a
novel decomposition routine is proposed in order to derive the approximation bound.
Section 6.3 and Section 6.4 discuss the models (HB) and (MB), respectively. Both of them
use multilinear form relaxations, armed with two different versions of link identities, in
order to preserve the approximation bounds under the square-free property. The general
model (PB) is discussed in Section 6.5, where the homogenization technique of Chap-
ter 5 is modified and applied. All these approximation algorithms can be applied to
polynomial optimizations over hypercubes, and we also summarize those results in Section 6.5
as byproducts. Some specific applications of the discrete models and approxi-
mation algorithms proposed in this chapter are discussed in Section 6.6. Finally,
we report our numerical experiment results in Section 6.7.
6.2 Multilinear Form with Binary Constraints
Our first discrete model in question is to maximize a multilinear function in binary
variables, specifically
(TB) max F (x1,x2, · · · ,xd)
s.t. xk ∈ Bnk , k = 1, 2, . . . , d,
where n1 ≤ n2 ≤ · · · ≤ nd.
This model is NP-hard when d ≥ 2, and we shall propose polynomial-time random-
ized approximation algorithms with worst-case performance ratios. The case of d = 2
is to compute ‖F‖_{∞→1}, whose best known approximation bound is 2 ln(1+√2)/π ≈ 0.56, due to
Alon and Naor [5]. It also serves as the basis of our subsequent analysis. When d = 3,
Khot and Naor [63] proposed a randomized procedure to approximate the optimal value of
(TB) in polynomial-time, with approximation bound Ω(√(log n_1 / n_1)).
Our approximation algorithm works for general degree d based on recursion, and is
fairly simple. We may take any approximation algorithm for the d = 2 case, say the
algorithm by Alon and Naor [5], as a basis. When d = 3, noticing that any n_1 × n_2 × n_3
third order tensor can be rewritten as an n_1n_2 × n_3 matrix by combining its first and
second modes, (TB) can be relaxed to
    max F(X, x^3)
    s.t. X ∈ B^{n_1n_2}, x^3 ∈ B^{n_3}.
This problem is the exact form of (TB) when d = 2, which can be solved approximately
with approximation ratio 2 ln(1+√2)/π. Denote its approximate solution by (X̂, x̂^3).
The next key step is to recover (x̂^1, x̂^2) from X̂. For this purpose, we introduce the
following decomposition routine, which plays a fundamental role in our algorithms for
binary variables, similar to DR 3.2.1, DR 3.2.2 and DR 3.3.1 in Chapter 3.
Decomposition Routine 6.2.1
• INPUT: matrices M ∈ R^{n_1×n_2}, X ∈ B^{n_1×n_2}.
1 Construct
      X̃ = [ I_{n_1×n_1}    X/√n_1
            X^T/√n_1     X^TX/n_1 ] ⪰ 0.
2 Randomly generate
      (ξ; η) ∼ N(0_{n_1+n_2}, X̃)
  and compute x^1 = sign(ξ) and x^2 = sign(η); repeat if necessary, until
      (x^1)^T M x^2 ≥ 2/(π√n_1) M • X.
• OUTPUT: vectors x^1 ∈ B^{n_1}, x^2 ∈ B^{n_2}.
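A minimal Python sketch of DR 6.2.1; the matrix X̃ is positive semidefinite by construction (it is a Gram matrix), so the Gaussian sampling is well defined. Names are illustrative:

```python
import numpy as np

def dr_621(M, X, rng=np.random.default_rng(0)):
    """Decomposition routine: recover binary x1, x2 from X in B^{n1 x n2}."""
    n1, n2 = X.shape
    cov = np.block([[np.eye(n1),        X / np.sqrt(n1)],
                    [X.T / np.sqrt(n1), X.T @ X / n1   ]])   # the PSD matrix X~
    target = 2.0 / (np.pi * np.sqrt(n1)) * np.sum(M * X)     # 2/(pi sqrt(n1)) M . X
    while True:
        g = rng.multivariate_normal(np.zeros(n1 + n2), cov, check_valid='ignore')
        x1 = np.where(g[:n1] >= 0, 1.0, -1.0)    # sign rounding
        x2 = np.where(g[n1:] >= 0, 1.0, -1.0)
        if x1 @ M @ x2 >= target:
            return x1, x2
```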
The complexity of DR 6.2.1 is O(n_1n_2) for each trial, in expectation. Now, if we
let (M, X) = (F(·, ·, x̂^3), X̂) and apply DR 6.2.1, then we can prove that the output
(x̂^1, x̂^2) satisfies
    E[F(x̂^1, x̂^2, x̂^3)] = E[(x̂^1)^T M x̂^2] ≥ 2 M • X̂/(π√n_1) = 2 F(X̂, x̂^3)/(π√n_1)
                        ≥ (4 ln(1+√2)/(π^2 √n_1)) v(TB),
which yields an approximation ratio for d = 3. By a recursive procedure, this approxi-
mation algorithm is readily extended to solve (TB) with any fixed degree d.
Theorem 6.2.1 (TB) admits a polynomial-time randomized approximation algorithm
with approximation ratio τ(TB), where
    τ(TB) := (2/π)^{d-1} ln(1+√2) (∏_{k=1}^{d-2} n_k)^{-1/2} = Ω((∏_{k=1}^{d-2} n_k)^{-1/2}).
Proof. The proof is based on mathematical induction on the degree d. For the case
of d = 2, it is exactly the algorithm by Alon and Naor [5]. For general d ≥ 3, let
X = x^1(x^d)^T, and (TB) is then relaxed to
(T̃B) max F(X, x^2, x^3, · · · , x^{d-1})
     s.t. X ∈ B^{n_1n_d},
          x^k ∈ B^{n_k}, k = 2, 3, . . . , d − 1,
where we treat X as an n_1n_d-dimensional vector, and F ∈ R^{n_1n_d×n_2×n_3×···×n_{d-1}} as a
(d − 1)-th order tensor. Observe that (T̃B) is the exact form of (TB) in degree d − 1,
and so by induction we can find X̂ ∈ B^{n_1n_d} and x̂^k ∈ B^{n_k} (k = 2, 3, . . . , d − 1) in
polynomial-time, such that
    F(X̂, x̂^2, x̂^3, . . . , x̂^{d-1}) ≥ (2/π)^{d-2} ln(1+√2) (∏_{k=2}^{d-2} n_k)^{-1/2} v(T̃B)
                                  ≥ (2/π)^{d-2} ln(1+√2) (∏_{k=2}^{d-2} n_k)^{-1/2} v(TB).
Rewrite X̂ as an n_1 × n_d matrix, construct
    X̃ = [ I_{n_1×n_1}     X̂/√n_1
          X̂^T/√n_1      X̂^TX̂/n_1 ] ⪰ 0,
as in DR 6.2.1, and randomly generate
    (ξ; η) ∼ N(0_{n_1+n_d}, X̃).
Let x^1 = sign(ξ) and x^d = sign(η). Noticing that the diagonal components of X̃ are
all ones, by an expectation identity in Goemans and Williamson [40], it follows that
    E[x^1_i x^d_j] = (2/π) arcsin(X̂_{ij}/√n_1) = (2/π) X̂_{ij} arcsin(1/√n_1)  ∀ 1 ≤ i ≤ n_1, 1 ≤ j ≤ n_d,
where the last equality is due to |X̂_{ij}| = 1. Let Q = F(·, x̂^2, x̂^3, · · · , x̂^{d-1}, ·) be a matrix,
and we have
    E[F(x^1, x̂^2, · · · , x^d)] = E[∑_{1≤i≤n_1, 1≤j≤n_d} x^1_i Q_{ij} x^d_j]
                              = ∑_{1≤i≤n_1, 1≤j≤n_d} Q_{ij} E[x^1_i x^d_j]
                              = ∑_{1≤i≤n_1, 1≤j≤n_d} Q_{ij} (2/π) X̂_{ij} arcsin(1/√n_1)
                              = (2/π) arcsin(1/√n_1) ∑_{1≤i≤n_1, 1≤j≤n_d} Q_{ij} X̂_{ij}
                              = (2/π) arcsin(1/√n_1) F(X̂, x̂^2, x̂^3, · · · , x̂^{d-1})      (6.1)
                              ≥ (2/(π√n_1)) (2/π)^{d-2} ln(1+√2) (∏_{k=2}^{d-2} n_k)^{-1/2} v(TB)
                              = (2/π)^{d-1} ln(1+√2) (∏_{k=1}^{d-2} n_k)^{-1/2} v(TB),
where the inequality uses arcsin x ≥ x for x ∈ [0, 1].
Thus x1 and xd can be found by a randomization process, which concludes the induction
step.
To summarize this section, the algorithm for solving the general model (TB) is presented
below. This algorithm is similar to Algorithm 3.2.3, with the major differences lying in the
different decomposition routines and the computability of the case d = 2.
Algorithm 6.2.2
• INPUT: a d-th order tensor F ∈ R^{n_1×n_2×···×n_d} with n_1 ≤ n_2 ≤ · · · ≤ n_d.
1 Rewrite F as a (d − 1)-th order tensor F' ∈ R^{n_2×n_3×···×n_{d-1}×n_dn_1} by combining its
  first and last modes into one, and placing it in the last mode of F', i.e.,
      F_{i_1,i_2,··· ,i_d} = F'_{i_2,i_3,··· ,i_{d-1},(i_1-1)n_d+i_d}  ∀ 1 ≤ i_k ≤ n_k, k = 1, 2, . . . , d.
2 For (TB) with the (d − 1)-th order tensor F': if d − 1 = 2, then apply the SDP
  relaxation and randomization procedure of Alon and Naor [5] to obtain an approx-
  imate solution (x̂^2, x̂^{1,d}); otherwise obtain a solution (x̂^2, x̂^3, · · · , x̂^{d-1}, x̂^{1,d}) by
  recursion.
3 Compute the matrix M' = F(·, x̂^2, x̂^3, · · · , x̂^{d-1}, ·) and rewrite the vector x̂^{1,d} as
  a matrix X̂ ∈ B^{n_1×n_d}.
4 Apply DR 6.2.1, with input (M', X̂) = (M, X) and output (x̂^1, x̂^d) = (x^1, x^2).
• OUTPUT: a feasible solution (x̂^1, x̂^2, · · · , x̂^d).
6.3 Homogeneous Form with Binary Constraints
We now consider the model of maximizing a general homogeneous polynomial function
in binary variables, i.e.,
(HB) max f(x)
     s.t. x ∈ B^n,
where f(x) is a d-th degree homogeneous polynomial with associated super-symmetric
tensor F ∈ R^{n^d}.
When d = 2, an Ω(1/log n)-approximation ratio for (HB) was proposed by Charikar and
Wirth [24] when the matrix F is diagonal-free; when d = 3, an Ω(√(log n / n))-
approximation bound for the optimal value of (HB) was provided by Khot and Naor [63]
when the tensor F is square-free. We remark that the square-free property is a
necessary condition for deriving the approximation ratios. Even in the quadratic and
cubic cases of (HB), there is no polynomial-time approximation algorithm with a
positive approximation ratio unless P = NP (see [4]).
As before, we propose polynomial-time randomized approximation algorithms for
(HB) for any fixed degree d. Like for the model (TS), the key link from the multilinear form
F(x^1, x^2, · · · , x^d) to the homogeneous polynomial f(x) is Lemma 4.2.1. The approx-
imation ratios for (HB) hold under the square-free condition. This is because under
such a condition, the decision variables effectively appear in multilinear form. Hence, one
can replace any point in the hypercube (B̄^n) by one of its vertices (B^n) without de-
creasing its objective value, due to the linearity. Before presenting our main results in
this section, we first study a property of square-free polynomials in binary variables,
which will be used frequently in this chapter and the next chapter (Chapter 7).
Lemma 6.3.1 If the polynomial function p(x) is square-free and z ∈ B̄^n, then x ∈ B^n and
x̄ ∈ B^n can be found in polynomial-time, such that p(x) ≤ p(z) ≤ p(x̄).
Proof. Since p(x) is square-free, by fixing x_2, x_3, · · · , x_n as constants and taking x_1 as
an independent variable, we may write
    p(x) = g_1(x_2, x_3, · · · , x_n) + x_1 g_2(x_2, x_3, · · · , x_n).
Let
    x_1 = −1 if g_2(z_2, z_3, · · · , z_n) ≥ 0, and x_1 = 1 if g_2(z_2, z_3, · · · , z_n) < 0.
Then
    p((x_1, z_2, z_3, · · · , z_n)^T) ≤ p(z).
Repeat the same procedure for z_2, z_3, · · · , z_n, letting them be replaced by binary scalars
x_2, x_3, · · · , x_n, respectively. Then x = (x_1, x_2, · · · , x_n)^T ∈ B^n satisfies p(x) ≤ p(z).
Using a similar procedure, we may find x̄ ∈ B^n with p(x̄) ≥ p(z).
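A minimal sketch of this rounding procedure, for a square-free polynomial given as a black-box callable (since p is linear in each coordinate, checking the two endpoints per coordinate suffices):

```python
import numpy as np

def round_to_vertex(p, z, maximize=True):
    """Lemma 6.3.1 sketch: move z in [-1,1]^n to a vertex of {-1,1}^n without
    decreasing (if maximize) or increasing (otherwise) the square-free p."""
    x = np.asarray(z, dtype=float).copy()
    for i in range(len(x)):
        x[i] = 1.0
        v_plus = p(x)
        x[i] = -1.0
        v_minus = p(x)
        better = v_plus >= v_minus if maximize else v_plus <= v_minus
        x[i] = 1.0 if better else -1.0
    return x
```

Each coordinate is fixed once, so the procedure needs only 2n evaluations of p, matching the polynomial-time claim of the lemma.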
Lemma 6.3.1 actually gives a polynomial-time procedure for finding a point in
B^n to replace a point in B̄^n, without decreasing (or increasing) its function value. Now,
armed with Lemma 6.3.1 and the link Lemma 4.2.1, we present the main results of this
section.
Theorem 6.3.2 If f(x) is square-free and d ≥ 3 is odd, then (HB) admits a polynomial-
time randomized approximation algorithm with approximation ratio τ(HB), where
    τ(HB) := (2/π)^{d-1} ln(1+√2) d! d^{-d} n^{-(d-2)/2} = Ω(n^{-(d-2)/2}).
Proof. Let f(x) = F(x, x, · · · , x) (d copies of x) with F being super-symmetric, and (HB) can be
relaxed to
(H̃B) max F(x^1, x^2, · · · , x^d)
     s.t. x^k ∈ B^n, k = 1, 2, . . . , d.
By Theorem 6.2.1 we are able to find a set of binary vectors (x̂^1, x̂^2, · · · , x̂^d) in
polynomial-time, such that
    F(x̂^1, x̂^2, · · · , x̂^d) ≥ (2/π)^{d-1} ln(1+√2) n^{-(d-2)/2} v(H̃B) ≥ (2/π)^{d-1} ln(1+√2) n^{-(d-2)/2} v(HB).
When d is odd, let ξ_1, ξ_2, · · · , ξ_d be i.i.d. random variables, each taking values 1 and
−1 with equal probability 1/2. Then by Lemma 4.2.1 it follows that
    d! F(x̂^1, x̂^2, · · · , x̂^d) = E[∏_{i=1}^d ξ_i f(∑_{k=1}^d ξ_k x̂^k)] = E[f(∑_{k=1}^d (∏_{i≠k} ξ_i) x̂^k)].
Thus we may find a binary vector β ∈ B^d, such that
    f(∑_{k=1}^d (∏_{i≠k} β_i) x̂^k) ≥ d! F(x̂^1, x̂^2, . . . , x̂^d) ≥ (2/π)^{d-1} ln(1+√2) d! n^{-(d-2)/2} v(HB).
Now we notice that (1/d) ∑_{k=1}^d (∏_{i≠k} β_i) x̂^k ∈ B̄^n, because for all 1 ≤ j ≤ n,
    |(1/d) ∑_{k=1}^d (∏_{i≠k} β_i) x̂^k_j| ≤ (1/d) ∑_{k=1}^d |(∏_{i≠k} β_i) x̂^k_j| = 1.      (6.2)
Since f(x) is square-free, by Lemma 6.3.1 we are able to find x ∈ B^n in polynomial-
time, such that
    f(x) ≥ f((1/d) ∑_{k=1}^d (∏_{i≠k} β_i) x̂^k) = d^{-d} f(∑_{k=1}^d (∏_{i≠k} β_i) x̂^k) ≥ τ(HB) v(HB).
Theorem 6.3.3 If f(x) is square-free and d ≥ 4 is even, then (HB) admits a polynomial-
time randomized approximation algorithm with relative approximation ratio τ(HB).
Proof. Like in the proof of Theorem 6.3.2, by relaxing (HB) to (H̃B), we are able to
find a set of binary vectors (x̂^1, x̂^2, · · · , x̂^d) with
    F(x̂^1, x̂^2, · · · , x̂^d) ≥ (2/π)^{d-1} ln(1+√2) n^{-(d-2)/2} v(H̃B).
Besides, we observe that v(HB) ≤ v(H̃B) and −v̲(HB) ≤ −v̲(H̃B) = v(H̃B). Therefore
    2 v(H̃B) ≥ v(HB) − v̲(HB).
Let ξ_1, ξ_2, · · · , ξ_d be i.i.d. random variables, each taking values 1 and −1 with equal
probability 1/2. Using a similar argument to (6.2), we have (1/d) ∑_{k=1}^d ξ_k x̂^k ∈ B̄^n. Then by
Lemma 6.3.1, there exists x ∈ B^n such that
    f((1/d) ∑_{k=1}^d ξ_k x̂^k) ≥ f(x) ≥ v̲(HB).
Applying Lemma 4.2.1, we have
    (1/2) E[ f((1/d) ∑_{k=1}^d ξ_k x̂^k) − v̲(HB) | ∏_{i=1}^d ξ_i = 1 ]
    ≥ (1/2) E[ f((1/d) ∑_{k=1}^d ξ_k x̂^k) − v̲(HB) | ∏_{i=1}^d ξ_i = 1 ]
      − (1/2) E[ f((1/d) ∑_{k=1}^d ξ_k x̂^k) − v̲(HB) | ∏_{i=1}^d ξ_i = −1 ]
    = E[ ∏_{i=1}^d ξ_i ( f((1/d) ∑_{k=1}^d ξ_k x̂^k) − v̲(HB) ) ]
    = d^{-d} E[ ∏_{i=1}^d ξ_i f(∑_{k=1}^d ξ_k x̂^k) ] − v̲(HB) E[ ∏_{i=1}^d ξ_i ]
    = d^{-d} d! F(x̂^1, x̂^2, . . . , x̂^d) ≥ τ(HB) v(H̃B) ≥ (τ(HB)/2) (v(HB) − v̲(HB)),
where the first inequality holds since f((1/d) ∑_{k=1}^d ξ_k x̂^k) − v̲(HB) ≥ 0.
Thus we may find a binary vector β ∈ B^d with ∏_{i=1}^d β_i = 1, such that
    f((1/d) ∑_{k=1}^d β_k x̂^k) − v̲(HB) ≥ τ(HB) (v(HB) − v̲(HB)).
Noticing that (1/d) ∑_{k=1}^d β_k x̂^k ∈ B̄^n and applying Lemma 6.3.1, by the square-free property
of f(x), we are able to find x̄ ∈ B^n with
    f(x̄) − v̲(HB) ≥ f((1/d) ∑_{k=1}^d β_k x̂^k) − v̲(HB) ≥ τ(HB) (v(HB) − v̲(HB)).
To conclude this section, we summarize below the algorithm for approximately solving
(HB), regardless of whether d is odd or even.
Algorithm 6.3.1
• INPUT: a d-th order super-symmetric square-free tensor F ∈ R^{n^d}.
1 Apply Algorithm 6.2.2 to solve the problem
      max F(x^1, x^2, · · · , x^d)
      s.t. x^k ∈ B^n, k = 1, 2, . . . , d
  approximately, with input F and output (x̂^1, x̂^2, · · · , x̂^d).
2 Compute x̃ = arg max { f((1/d) ∑_{k=1}^d ξ_k x̂^k), ξ ∈ B^d }.
3 Apply the procedure in Lemma 6.3.1, with input x̃ ∈ B̄^n and polynomial function
  f(x), and output x ∈ B^n satisfying f(x) ≥ f(x̃).
• OUTPUT: a feasible solution x ∈ B^n.
6.4 Mixed Form with Binary Constraints
We further move on to consider the mixed-form discrete polynomial optimization
model
(MB) max f(x^1, x^2, · · · , x^s)
     s.t. x^k ∈ B^{n_k}, k = 1, 2, . . . , s,
where the function f is associated with a tensor F ∈ R^{n_1^{d_1}×n_2^{d_2}×···×n_s^{d_s}} with the partial
symmetric property, n_1 ≤ n_2 ≤ · · · ≤ n_s, and d = d_1 + d_2 + · · · + d_s is deemed a fixed
constant. This model is a generalization of (TB) in Section 6.2 and (HB) in Section 6.3,
making the model applicable to a wider range of practical problems.
Here again we focus on polynomial-time approximation algorithms. Similar to the
approach in dealing with (HB), we relax the objective function f(x^1, x^2, · · · , x^s) of
(MB) to a multilinear function, which leads to (TB). After solving (TB) approximately
by Theorem 6.2.1, we are able to adjust the solutions one by one, using Lemma 4.4.3.
The following approximation results are presented, which are comparable to those in
Section 6.3.
Theorem 6.4.1 If f(x1,x2, · · · ,xs) is square-free in each xk (k = 1, 2, . . . , s), d ≥ 3
and one of dk (k = 1, 2, . . . , s) is odd, then (MB) admits a polynomial-time randomized
approximation algorithm with approximation ratio τ(MB), where
    τ(MB) := τ(MS) (2/π)^{d-1} ln(1+√2) ∏_{k=1}^s (d_k!/d_k^{d_k}) = Ω(τ(MS))
           = { (2/π)^{d-1} ln(1+√2) (∏_{k=1}^s d_k!/d_k^{d_k}) ((∏_{k=1}^{s-1} n_k^{d_k})/n_{s-1})^{-1/2},  d_s = 1,
             { (2/π)^{d-1} ln(1+√2) (∏_{k=1}^s d_k!/d_k^{d_k}) ((∏_{k=1}^s n_k^{d_k})/n_s^2)^{-1/2},       d_s ≥ 2.
Proof. Like in the proof of Theorem 6.3.2, by relaxing (MB) to (TB), we are able to
find a set of binary vectors (x̂^1, x̂^2, · · · , x̂^d) with
    F(x̂^1, x̂^2, · · · , x̂^d) ≥ τ(MB) (∏_{k=1}^s d_k^{d_k}/d_k!) v(MB).
Let ξ = (ξ_1, ξ_2, · · · , ξ_d)^T, whose components are i.i.d. random variables, taking values
1 and −1 with equal probability 1/2. Similar to (4.7), we denote
    x^1_ξ = ∑_{k=1}^{d_1} ξ_k x̂^k,  x^2_ξ = ∑_{k=d_1+1}^{d_1+d_2} ξ_k x̂^k,  · · · ,  x^s_ξ = ∑_{k=d_1+d_2+···+d_{s-1}+1}^{d} ξ_k x̂^k.
Without loss of generality, we assume d_1 to be odd. By applying Lemma 4.4.3 we have
    ∏_{k=1}^s d_k! F(x̂^1, x̂^2, · · · , x̂^d) = E[∏_{i=1}^d ξ_i f(x^1_ξ, x^2_ξ, · · · , x^s_ξ)] = E[f(∏_{i=1}^d ξ_i x^1_ξ, x^2_ξ, · · · , x^s_ξ)].
Therefore we are able to find a binary vector β ∈ B^d, such that
    f(∏_{i=1}^d β_i x^1_β/d_1, x^2_β/d_2, · · · , x^s_β/d_s) ≥ ∏_{k=1}^s (d_k! d_k^{-d_k}) F(x̂^1, x̂^2, · · · , x̂^d) ≥ τ(MB) v(MB).
Similar to (6.2), it is not hard to verify that ∏_{i=1}^d β_i x^1_β/d_1 ∈ B̄^{n_1}, and x^k_β/d_k ∈ B̄^{n_k} for
k = 2, 3, . . . , s. By the square-free property of the function f and applying Lemma 6.3.1,
we are able to find a set of binary vectors (x^1, x^2, · · · , x^s) in polynomial-time, such that
    f(x^1, x^2, · · · , x^s) ≥ f(∏_{i=1}^d β_i x^1_β/d_1, x^2_β/d_2, · · · , x^s_β/d_s) ≥ τ(MB) v(MB).
Theorem 6.4.2 If f(x1,x2, · · · ,xs) is square-free in each xk (k = 1, 2, . . . , s), d ≥ 4
and all dk (k = 1, 2, . . . , s) are even, then (MB) admits a polynomial-time randomized
approximation algorithm with relative approximation ratio τ(MB).
Proof. The proof is analogous to that of Theorem 6.3.3. The main differences are:
(i) we use Lemma 4.4.3 instead of invoking Lemma 4.2.1 directly; and (ii) we use
f(x^1_ξ/d_1, x^2_ξ/d_2, · · · , x^s_ξ/d_s) instead of f((1/d) ∑_{k=1}^d ξ_k x̂^k) during the randomization
process.
6.5 Polynomial with Binary Constraints
Finally, we consider the binary integer programming model of optimizing a generic
(inhomogeneous) polynomial function, i.e.,
(PB) max p(x)
     s.t. x ∈ B^n.
Extending the approximation algorithms and the corresponding analysis from homo-
geneous polynomial optimization to general inhomogeneous polynomials is not straight-
forward. Technically it is also a way to get around the square-free property, which is
a requirement for all the homogeneous polynomial optimizations discussed in the previous
sections. The analysis here is similar to that in Chapter 5, dealing directly with ho-
mogenization. An important observation here is that p(x) can always be rewritten as
a square-free polynomial, since we have x_i^2 = 1 for i = 1, 2, . . . , n, which allows us
to reduce the power of x_i to 0 or 1 in each monomial of p(x). We now propose the
following algorithm for approximately solving (PB).
Algorithm 6.5.1
• INPUT: an n-dimensional d-th degree polynomial function p(x).
1 Rewrite p(x) as a square-free polynomial function p_0(x), and then rewrite p_0(x) −
  p_0(0) = F(x̄, x̄, · · · , x̄) (d copies of x̄) when x_h = 1 as in (5.2), with F being an (n + 1)-
  dimensional d-th order super-symmetric tensor.
2 Apply Algorithm 6.2.2 to solve the problem
      max F(x̄^1, x̄^2, · · · , x̄^d)
      s.t. x̄^k ∈ B^{n+1}, k = 1, 2, . . . , d
  approximately, with input F and output (û^1, û^2, · · · , û^d).
3 Compute (z̄^1, z̄^2, · · · , z̄^d) = arg max { F((ξ_1u^1/d; 1), (ξ_2u^2/d; 1), · · · , (ξ_du^d/d; 1)), ξ ∈ B^d },
  where u^k denotes the first n components of û^k.
4 Compute z = arg max { p_0(0); p_0(z(β)/z_h(β)), β ∈ B^d and β_1 = ∏_{k=2}^d β_k = 1 },
  with z̄(β) = β_1(d + 1)z̄^1 + ∑_{k=2}^d β_k z̄^k.
5 Apply the procedure in Lemma 6.3.1, with input z ∈ B̄^n and polynomial function
  p_0(x), and output y ∈ B^n satisfying p_0(y) ≥ p_0(z).
• OUTPUT: a feasible solution y ∈ B^n.
Before presenting the main result and analyzing Algorithm 6.5.1, we first study
another property of square-free polynomials: the overall average of the
function values on the support set B^n is zero. This plays an important role in
analyzing the algorithm for (PB).
Lemma 6.5.1 If the polynomial function p(x) in (PB) : max_{x∈B^n} p(x) is square-free
and has no constant term, then v̲(PB) ≤ 0 ≤ v(PB), and a binary vector x ∈ B^n can
be found in polynomial-time with p(x) ≥ 0.
Proof. Let ξ_1, ξ_2, · · · , ξ_n be i.i.d. random variables, each taking values 1 and −1 with
equal probability 1/2. For any monomial F_{i_1i_2...i_k} x_{i_1} x_{i_2} · · · x_{i_k} of degree k (1 ≤ k ≤ d)
of p(x), by the square-free property, it follows that
    E[F_{i_1i_2...i_k} ξ_{i_1} ξ_{i_2} · · · ξ_{i_k}] = F_{i_1i_2···i_k} E[ξ_{i_1}] E[ξ_{i_2}] · · · E[ξ_{i_k}] = 0.
This implies E[p(ξ)] = 0, and consequently v̲(PB) ≤ 0 ≤ v(PB). By a randomization
process, a binary vector x ∈ B^n can be found in polynomial-time with p(x) ≥ 0.
main result in this section.
Theorem 6.5.2 (PB) admits a polynomial-time randomized approximation algorithm
with relative approximation ratio τ(PB), where
    τ(PB) := [ln(1+√2) / (2(1 + e)π^{d-1})] (d + 1)! d^{-2d} (n + 1)^{-(d-2)/2} = Ω(n^{-(d-2)/2}).
Proof. The main idea of the proof is quite similar to that of Theorem 5.2.2. However,
the discrete nature of the problem as well as the non-convex feasible region requires us
to be more careful in dealing with the specific details. As we are working with a relative
approximation ratio, by Step 1 of Algorithm 6.5.1, we may assume that p(x) is square-
free and has no constant term. Then by homogenization as in (5.2),
    p(x) = F((x; x_h), (x; x_h), · · · , (x; x_h)) = F(x̄, x̄, · · · , x̄) = f(x̄)    (d copies),
where f(x̄) = p(x) if x_h = 1, and f(x̄) is an (n+1)-dimensional homogeneous polynomial
function with associated super-symmetric tensor F ∈ R^{(n+1)^d} whose last component
is 0. (PB) is then equivalent to
    max f(x̄)
    s.t. x̄ = (x; x_h), x ∈ B^n, x_h = 1,
which can be relaxed to an instance of (TB) as follows:
(P̃B) max F(x̄^1, x̄^2, · · · , x̄^d)
     s.t. x̄^k ∈ B^{n+1}, k = 1, 2, . . . , d.
Let (û^1, û^2, · · · , û^d) be the feasible solution for (P̃B) found by Theorem 6.2.1 with
    F(û^1, û^2, · · · , û^d) ≥ (2/π)^{d-1} ln(1+√2) (n + 1)^{-(d-2)/2} v(P̃B)
                          ≥ (2/π)^{d-1} ln(1+√2) (n + 1)^{-(d-2)/2} v(PB).
Denote v̄^k = û^k/d for k = 1, 2, . . . , d, and consequently
    F(v̄^1, v̄^2, · · · , v̄^d) = d^{-d} F(û^1, û^2, · · · , û^d) ≥ (2/π)^{d-1} ln(1+√2) d^{-d} (n + 1)^{-(d-2)/2} v(PB).
Notice that for all 1 ≤ k ≤ d, |v̄^k_h| = |û^k_h/d| = 1/d ≤ 1, and the last component of the tensor
F is 0. By applying Lemma 5.2.4, it follows that
    E[∏_{k=1}^d η_k F((η_1v^1; 1), (η_2v^2; 1), · · · , (η_dv^d; 1))] = F(v̄^1, v̄^2, · · · , v̄^d)
and
    E[F((ξ_1v^1; 1), (ξ_2v^2; 1), · · · , (ξ_dv^d; 1))] = 0,
where v^k denotes the first n components of v̄^k, η_1, η_2, . . . , η_d are independent random
variables, each taking values 1 and −1 with E[η_k] = v̄^k_h for k = 1, 2, . . . , d, and
ξ_1, ξ_2, · · · , ξ_d are i.i.d. random variables, each taking values 1 and −1 with equal
probability 1/2. Combining the two identities, we
have, for any constant c, the following identity:
    F(v̄^1, v̄^2, · · · , v̄^d)
    = ∑_{β∈B^d, ∏_{k=1}^d β_k=-1} (c − Prob{η = β}) F((β_1v^1; 1), (β_2v^2; 1), · · · , (β_dv^d; 1))
      + ∑_{β∈B^d, ∏_{k=1}^d β_k=1} (c + Prob{η = β}) F((β_1v^1; 1), (β_2v^2; 1), · · · , (β_dv^d; 1)).
If we let c = max_{β∈B^d, ∏_{k=1}^d β_k=-1} Prob{η = β}, then the coefficient of each term F in
the above is nonnegative. Therefore, a binary vector β′ ∈ B^d can be found, such that
    F((β′_1v^1; 1), (β′_2v^2; 1), · · · , (β′_dv^d; 1)) ≥ τ_0 F(v̄^1, v̄^2, · · · , v̄^d),
with
    τ_0 = [∑_{β∈B^d, ∏_{k=1}^d β_k=1} (c + Prob{η = β}) + ∑_{β∈B^d, ∏_{k=1}^d β_k=-1} (c − Prob{η = β})]^{-1}
        ≥ (2^d c + 1)^{-1} ≥ (2^d (1/2 + 1/(2d))^d + 1)^{-1} ≥ 1/(1 + e),
where c ≤ (1/2 + 1/(2d))^d is applied, since E[η_k] = v̄^k_h = ±1/d for k = 1, 2, . . . , d.
Denote z̄^k = (z^k; z̄^k_h) = (β′_kv^k; 1) for k = 1, 2, . . . , d, and we have
    F(z̄^1, z̄^2, · · · , z̄^d) ≥ τ_0 F(v̄^1, v̄^2, · · · , v̄^d) ≥ (2/π)^{d-1} [ln(1+√2)/(1 + e)] d^{-d} (n + 1)^{-(d-2)/2} v(PB).
For any β ∈ B^d, denote z̄(β) = β_1(d + 1)z̄^1 + ∑_{k=2}^d β_k z̄^k. By noticing z̄^k_h = 1 and
|z^k_i| = |v^k_i| = |û^k_i|/d = 1/d for all 1 ≤ k ≤ d and 1 ≤ i ≤ n, it follows that
    2 ≤ |z_h(β)| ≤ 2d  and  |z_i(β)| ≤ (d + 1)/d + (d − 1)/d = 2  ∀ 1 ≤ i ≤ n.
Thus z(β)/z_h(β) ∈ B̄^n. By Lemma 6.3.1, there exists x′ ∈ B^n, such that
    v̲(PB) ≤ p(x′) ≤ p(z(β)/z_h(β)) = f(z̄(β)/z_h(β)).
Moreover, we shall argue below that
    β_1 = 1 ⟹ f(z̄(β)) ≥ (2d)^d v̲(PB).      (6.3)
If this were not the case, then by Lemma 6.5.1, f(z̄(β)/(2d)) < v̲(PB) ≤ 0. Notice that
β_1 = 1 implies z_h(β) > 0, and thus we have
    f(z̄(β)/z_h(β)) = (2d/z_h(β))^d f(z̄(β)/(2d)) ≤ f(z̄(β)/(2d)) < v̲(PB),
which is a contradiction.
Suppose ξ = (ξ_1, ξ_2, · · · , ξ_d)^T, whose components are i.i.d. random variables, each
taking values 1 and −1 with equal probability 1/2. Noticing that (6.3) holds and using
the same argument as for (5.12), we get
    (1/2) E[ f(z̄(ξ)) − (2d)^d v̲(PB) | ξ_1 = 1, ∏_{k=2}^d ξ_k = 1 ] ≥ d! F((d + 1)z̄^1, z̄^2, · · · , z̄^d).
Therefore, a binary vector β″ ∈ B^d with β″_1 = ∏_{k=2}^d β″_k = 1 can be found, such that
    f(z̄(β″)) − (2d)^d v̲(PB) ≥ 2d! F((d + 1)z̄^1, z̄^2, · · · , z̄^d)
                             ≥ (2/π)^{d-1} [2 ln(1+√2)/(1 + e)] (d + 1)! d^{-d} (n + 1)^{-(d-2)/2} v(PB).
By Lemma 6.5.1, a binary vector x′ ∈ B^n can be found in polynomial-time with
p(x′) ≥ 0. Moreover, as z(β″)/z_h(β″) ∈ B̄^n, by Lemma 6.3.1, another binary vector
x″ ∈ B^n can be found in polynomial-time with p(x″) ≥ p(z(β″)/z_h(β″)). Below we
shall prove that at least one of x′ and x″ satisfies
    p(x) − v̲(PB) ≥ τ(PB) (v(PB) − v̲(PB)).      (6.4)
Indeed, if −v̲(PB) ≥ τ(PB) (v(PB) − v̲(PB)), then x′ satisfies (6.4) in this case. Other-
wise we shall have −v̲(PB) < τ(PB) (v(PB) − v̲(PB)), and then
    v(PB) > (1 − τ(PB)) (v(PB) − v̲(PB)) ≥ (v(PB) − v̲(PB))/2,
which implies
    f(z̄(β″)/(2d)) − v̲(PB) ≥ (2d)^{-d} (2/π)^{d-1} [2 ln(1+√2)/(1 + e)] (d + 1)! d^{-d} (n + 1)^{-(d-2)/2} v(PB)
                           ≥ τ(PB) (v(PB) − v̲(PB)).
The above inequality also implies that f(z̄(β″)/(2d)) > 0. Recall that β″_1 = 1 implies
z_h(β″) > 0. Therefore,
    p(x″) ≥ p(z(β″)/z_h(β″)) = f(z̄(β″)/z_h(β″)) = (2d/z_h(β″))^d f(z̄(β″)/(2d)) ≥ f(z̄(β″)/(2d)),
which implies x″ satisfies (6.4). Finally, arg max{p(x′), p(x″)} satisfies (6.4) in both
cases.
We remark that (PB) is indeed a very general discrete optimization model. For
example, it can be used to model the following general polynomial optimization problem
in discrete values:
(PD) max p(x)
     s.t. x_i ∈ {a_{i1}, a_{i2}, · · · , a_{im_i}}, i = 1, 2, . . . , n.
To see this, we observe that by adopting the Lagrange interpolation technique and
letting
    x_i = ∑_{j=1}^{m_i} a_{ij} ∏_{1≤k≤m_i, k≠j} (u_i − k)/(j − k)  ∀ 1 ≤ i ≤ n,
the original decision variables can be equivalently transformed so that
    u_i = j ⟹ x_i = a_{ij}  ∀ 1 ≤ i ≤ n, 1 ≤ j ≤ m_i,
where u_i ∈ {1, 2, . . . , m_i}, which can be further represented by ⌈log_2 m_i⌉ independent
binary variables. Combining these two steps of substitution, (PD) is then reformu-
lated as (PB), with the degree of its objective polynomial function no larger than
max_{1≤i≤n} d(m_i − 1), and the dimension of its decision variables being ∑_{i=1}^n ⌈log_2 m_i⌉.
In many real world applications, the data {a_{i1}, a_{i2}, · · · , a_{im_i}} (i = 1, 2, . . . , n) in (PD)
are arithmetic sequences. Then it is much easier to transform (PD) to (PB), without
going through the Lagrange interpolation: the transformation keeps the degree of the objective
polynomial function unchanged, and the dimension of the decision variables is ∑_{i=1}^n ⌈log_2 m_i⌉.
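A minimal sketch of the binary-encoding step, mapping bits in {−1, 1} to an index u_i ∈ {1, ..., m_i} (values above m_i must be handled somehow; clipping is one modeling choice, assumed here for brevity):

```python
import math

def num_bits(m):
    """Binary variables needed to encode u in {1, ..., m}: ceil(log2 m)."""
    return math.ceil(math.log2(m))

def decode(bits, m):
    """Map bits in {-1, 1}^L to an index in {1, ..., m} (clipping the overflow)."""
    u = 1 + sum(((b + 1) // 2) << j for j, b in enumerate(bits))
    return min(u, m)

assert num_bits(5) == 3 and decode([1, -1, 1], 5) == 5   # 1 + (1 + 4) = 6 -> clipped
```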
Finally, we remark that all the approximation algorithms proposed in this chapter
are also applicable to polynomial optimizations over hypercubes (B̄^n), i.e., the
models (TB̄), (HB̄), (MB̄) and (PB̄), which are the respective models (TB),
(HB), (MB) and (PB) with B replaced by B̄. In particular, the square-free conditions are no
longer required for homogeneous form objectives and mixed form objectives. Therefore
Algorithm 6.3.1 and Algorithm 6.5.1 can be made simpler, without going through the
process in Lemma 6.3.1. We now conclude this section, as well as the theoretical part
of this chapter, with the following theorem, stated without proof.
Theorem 6.5.3 The following approximation results hold for polynomial optimizations
over hypercubes:
1. (TB̄) admits a polynomial-time randomized approximation algorithm with approx-
   imation ratio τ(TB);
2. If d ≥ 3 is odd, then (HB̄) admits a polynomial-time randomized approximation
   algorithm with approximation ratio τ(HB); otherwise, d ≥ 4 is even, and (HB̄)
   admits a polynomial-time randomized approximation algorithm with relative ap-
   proximation ratio τ(HB);
3. If one of d_k (k = 1, 2, . . . , s) is odd, then (MB̄) admits a polynomial-time ran-
   domized approximation algorithm with approximation ratio τ(MB); otherwise all
   d_k (k = 1, 2, . . . , s) are even, and (MB̄) admits a polynomial-time randomized
   approximation algorithm with relative approximation ratio τ(MB);
4. (PB̄) admits a polynomial-time randomized approximation algorithm with relative
   approximation ratio τ(PB).
6.6 Applications
The models studied in this chapter have versatile applications. Given the generic
nature of the discrete polynomial optimization models, this point is perhaps self-evident.
However, we believe it is helpful to present a few examples at this point with more
details, to illustrate the potential modeling opportunities with the new optimization
models. We shall present three problems in this section and show that they are readily
formulated by the discrete polynomial optimization models in this chapter.
6.6.1 Cut-Norm of Tensors
The concept of cut-norm is initially defined on a real matrix A = (Aij) ∈ Rn1×n2 ,
denoted by ‖A‖C , the maximum over all I ⊂ 1, 2, . . . , n1 and J ⊂ 1, 2, . . . , n2, of
the quantity |∑
i∈I,j∈J Aij |. This concept plays a major role in the design of efficient
approximation algorithms for dense graph and matrix problems (see e.g., [36, 3]). Alon
and Naor in [5] proposed a polynomial-time randomized approximation algorithm that
approximates the cut-norm with a factor at least 0.56, which is currently the best
available approximation ratio. Since a matrix is a second order tensor, it is natural to
extend the cut-norm to general higher order tensors, e.g., a recent paper by Kannan [62].
Specifically, given a d-th order tensor F = (Fi1i2···id) ∈ Rn1×n2×···×nd , its cut-norm is
defined as
‖F ‖C := maxIk⊂1,2,...,nk, k=1,2,...,d
∣∣∣∣∣∣∑
ik∈Ik, k=1,2,...,d
Fi1i2···id
∣∣∣∣∣∣ .In fact, the cut-norm ‖F ‖C is closely related to ‖F ‖∞7→1, which is exactly in the
form of (TB). By Theorem 6.2.1, there is a polynomial-time randomized approximation
algorithm which computes ‖F‖_{∞→1} within a factor of at least Ω((∏_{k=1}^{d-2} n_k)^{-1/2}), where
we assume n_1 ≤ n_2 ≤ · · · ≤ n_d. The following proposition asserts that the cut-norm of
a general d-th order tensor can also be approximated within a factor of Ω((∏_{k=1}^{d-2} n_k)^{-1/2}).
Proposition 6.6.1 For any d-th order tensor F ∈ R^{n_1×n_2×···×n_d}, ‖F‖_C ≤ ‖F‖_{∞→1} ≤
2^d ‖F‖_C.
Proof. Recall that ‖F‖_{∞→1} = max_{x^k∈B^{n_k}, k=1,2,...,d} F(x^1, x^2, · · · , x^d). For any x^k ∈
B^{n_k} (k = 1, 2, . . . , d), it follows that
    F(x^1, x^2, · · · , x^d) = ∑_{1≤i_k≤n_k, k=1,2,...,d} F_{i_1i_2···i_d} x^1_{i_1} x^2_{i_2} · · · x^d_{i_d}
    = ∑_{β∈B^d} ∑_{i_k∈{j | x^k_j=β_k, 1≤j≤n_k}, k=1,2,...,d} F_{i_1i_2···i_d} x^1_{i_1} x^2_{i_2} · · · x^d_{i_d}
    = ∑_{β∈B^d} ∏_{1≤k≤d} β_k ∑_{i_k∈{j | x^k_j=β_k, 1≤j≤n_k}, k=1,2,...,d} F_{i_1i_2···i_d}
    ≤ ∑_{β∈B^d} |∑_{i_k∈{j | x^k_j=β_k, 1≤j≤n_k}, k=1,2,...,d} F_{i_1i_2···i_d}|
    ≤ ∑_{β∈B^d} ‖F‖_C = 2^d ‖F‖_C,
which implies ‖F‖_{∞→1} ≤ 2^d ‖F‖_C.
Observe that ‖F‖_C = max_{z^k∈{0,1}^{n_k}, k=1,2,...,d} |F(z^1, z^2, · · · , z^d)|. For any z^k ∈
{0, 1}^{n_k} (k = 1, 2, . . . , d), let z^k = (e + x^k)/2, where e is the all-one vector. Clearly
x^k ∈ B^{n_k} for k = 1, 2, . . . , d, and thus
    F(z^1, z^2, · · · , z^d) = F((e + x^1)/2, (e + x^2)/2, · · · , (e + x^d)/2)
    = [F(e, e, · · · , e) + F(x^1, e, · · · , e) + · · · + F(x^1, x^2, · · · , x^d)] / 2^d
    ≤ (1/2^d) · ‖F‖_{∞→1} · 2^d = ‖F‖_{∞→1},
which implies ‖F‖_C ≤ ‖F‖_{∞→1}.
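For small instances, both quantities can be checked against Proposition 6.6.1 by brute force; a sketch for the matrix case (d = 2), with a hypothetical random instance:

```python
import numpy as np
from itertools import chain, combinations, product

def powerset(s):
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def norm_inf_to_1(A):
    n1, n2 = A.shape
    return max(np.array(x) @ A @ np.array(y)
               for x in product((-1, 1), repeat=n1)
               for y in product((-1, 1), repeat=n2))

def cut_norm(A):
    n1, n2 = A.shape
    return max(abs(A[np.ix_(list(I), list(J))].sum())
               for I in powerset(range(n1)) for J in powerset(range(n2)))

A = np.random.default_rng(7).standard_normal((4, 4))
assert cut_norm(A) <= norm_inf_to_1(A) <= 2 ** 2 * cut_norm(A)   # 2^d with d = 2
```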
6.6.2 Maximum Complete Satisfiability
The usual maximum satisfiability problem (see e.g., [38]) is to find the boolean values
of the literals, so as to maximize the total weighted sum of the satisfied clauses. The
key point of the problem is that each clause is in disjunctive form, namely if one of
the literals is assigned the TRUE value, then the clause is satisfied. If the clauses
are instead conjunctive, then this form of satisfiability problem is easy to solve. However,
if not all the clauses can be satisfied, and we alternatively look for an assignment
that maximizes the weighted sum of the satisfied clauses, then the problem is quite
different. To make a distinction from the usual Max-SAT problem, let us call the new
problem maximum complete satisfiability, abbreviated as Max-C-SAT.
It is immediately clear that Max-C-SAT is NP-hard, since we can easily reduce the
max-cut problem to it. The reduction can be done as follows. For each edge (v_i, v_j) we
consider two clauses {x_i, x̄_j} and {x̄_i, x_j}, both having weight w_{ij}. Then the Max-C-
SAT solution leads to a solution for the max-cut problem.
Now consider an instance of the Max-C-SAT problem with m clauses, each clause
containing no more than d literals. Suppose that clause k (1 ≤ k ≤ m) has the form
    {x_{k_1}, x_{k_2}, · · · , x_{k_{s_k}}, x̄_{k′_1}, x̄_{k′_2}, . . . , x̄_{k′_{t_k}}},
where s_k + t_k ≤ d, associated with a weight w_k ≥ 0 for k = 1, 2, . . . , m. Then, the
Max-C-SAT problem can be formulated in the form of (PB) as
    max ∑_{k=1}^m w_k ∏_{j=1}^{s_k} (1 + x_{k_j})/2 · ∏_{i=1}^{t_k} (1 − x_{k′_i})/2
    s.t. x ∈ B^n.
According to Theorem 6.5.2 and the nonnegativity of the objective function, the above
problem admits a polynomial-time randomized approximation algorithm with approx-
imation ratio Ω(n^{-(d-2)/2}), which is independent of the number of clauses m.
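The (PB) objective above is easy to assemble; a minimal sketch with a hypothetical clause list (pos/neg are index lists of plain and negated literals):

```python
def max_c_sat_value(x, clauses):
    """Weighted count of completely satisfied clauses at x in {-1,1}^n.
    Each product term equals 1 iff every literal of the clause is satisfied."""
    total = 0.0
    for w, pos, neg in clauses:
        term = w
        for j in pos:
            term *= (1 + x[j]) / 2
        for i in neg:
            term *= (1 - x[i]) / 2
        total += term
    return total

clauses = [(2.0, [0, 2], [1]), (1.5, [1], [0, 3])]   # hypothetical instance
print(max_c_sat_value([1, -1, 1, -1], clauses))      # first clause satisfied: 2.0
```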
6.6.3 Box-Constrained Diophantine Equation
Solving a system of linear equations where the variables are integers and constrained
to a hypercube is an important problem in discrete optimization and linear algebra.
Examples of applications include the classical Frobenius problem (see e.g., [2, 16]) and
the market split problem [26], as well as engineering applications in integrated circuit
design and video signal processing. For more details, one is referred to Aardal et al. [1].
Essentially, the problem is to find an integer-valued x ∈ Z^n with 0 ≤ x ≤ u, such that
Ax = b. The problem can be formulated by the least squares method as
(DE) max −(Ax − b)^T(Ax − b)
     s.t. x ∈ Z^n, 0 ≤ x ≤ u.
According to the discussion at the end of Section 6.5, the above problem can be re-
formulated in the form of (PB), whose objective function is a quadratic polynomial and whose
number of decision variables is ∑_{i=1}^n ⌈log_2(u_i + 1)⌉. By applying Theorem 6.5.2, (DE)
admits a polynomial-time randomized approximation algorithm with a constant relative
approximation ratio.
Generally speaking, Diophantine equations are polynomial equations. Box-
constrained polynomial equations can also be formulated by the least squares method,
as in (DE). Suppose the highest degree of the polynomial equations is d. Then this
least squares problem can be reformulated in the form of (PB), with the degree of the
objective polynomial being 2d and the number of decision variables being ∑_{i=1}^n ⌈log_2(u_i + 1)⌉.
By applying Theorem 6.5.2, this problem admits a polynomial-time randomized
approximation algorithm with relative approximation ratio Ω((∑_{i=1}^n log u_i)^{-(d-1)}).
6.7 Numerical Experiments
In this section we are going to test the numerical performance of the algorithms pro-
posed in this chapter. Our experiments focus on the model (TB) with d = 4 as a typical
case. Specifically the problem to be tested is
(ETB) max F (x,y, z,w) =∑
1≤i,j,k,`≤n Fijk` xiyjzkw`
s.t. x,y, z,w ∈ Bn.
6.7.1 Randomly Simulated Data
The input data of (ETB) is generated in the same way as that of (ETS), with entries
of F following i.i.d. standard normals. The first relaxation model for Algorithm 6.2.2
to approximately solve (ETB) is
(ETB) max F (X,w) =∑
1≤i,j,k,`≤n Fijk`Xijkw`
s.t. X ∈ Bn×n×n, w ∈ Bn,
which can be solved approximately using SDP relaxation and randomization method
proposed by Alon and Naor [5]. However, the size of the SDP relaxation problem is
6.7 Numerical Experiments 119
(n3 + n) × (n3 + n), which is intractable for current SDP solvers even when n = 8.
Therefore in our testings, we further relax the above problem to
max F (X) =∑
1≤i,j,k,`≤n Fijk`Xijk`
s.t. X ∈ Bn×n×n×n,
whose optimal solution is trivially sign (F ) with optimal value vB := ‖F ‖1. This
optimal solution can be rewritten as an n3 × n matrix, followed by applying DR 6.2.1
to get a feasible solution of (ETB). Then we can apply the recursion procedures of
Algorithm 6.2.2 to get a feasible solution of the original model (ETB), with its objective
value being denoted by v.
According to Theorem 6.2.1, the theoretical worst-case performance ratio of Algorithm
6.2.2 applied to (ETB) is $\Omega(1/n)$. The theoretical ratio for the above method is in
fact $\Omega(1/n^{1.5})$ because of the deeper relaxation, which can be proven by the same
argument as in Theorem 6.2.1. However, this deeper relaxation allows us to skip the SDP
relaxation of (ETB) and makes the method applicable in large dimensions. In general,
the trivial upper bound on v(ETB) generated by this method, $v_B$, may not be good, and
we may seek a tighter one. For this purpose we turn to the model (TS) discussed in
Section 3.2. Noticing that an n-dimensional binary vector has norm $\sqrt{n}$, we may also
relax (ETB) to
\[
\max\; F(x, y, z, w) = \sum_{1 \le i, j, k, \ell \le n} F_{ijk\ell}\, x_i y_j z_k w_\ell
\quad \text{s.t. } \|x\| = \|y\| = \|z\| = \|w\| = \sqrt{n},\; x, y, z, w \in \mathbb{R}^n,
\]
which can be further relaxed to
\[
\max\; F(X, Z) = \sum_{1 \le i, j, k, \ell \le n} F_{ijk\ell}\, X_{ij} Z_{k\ell}
\quad \text{s.t. } \|X\| = \|Z\| = n,\; X, Z \in \mathbb{R}^{n \times n}.
\]
The above problem is a largest singular value problem, whose optimal value can be
computed efficiently. Denote its optimal value by $v_S$, which is taken as another
upper bound on v(ETB).
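Both upper bounds are cheap to compute. A minimal sketch, assuming numpy and our own mode-pairing convention for the unfolding (the (x, y) modes form the rows and the (z, w) modes form the columns):

```python
import numpy as np

n = 13
F = np.random.randn(n, n, n, n)

# Trivial bound: v_B = ||F||_1, attained by X = sign(F) over B^{n x n x n x n}.
v_B = np.abs(F).sum()

# Spectral bound: with ||X|| = ||Z|| = n, the bilinear relaxation is maximized
# at n^2 times the largest singular value of the (n^2 x n^2) unfolding of F.
M = F.reshape(n * n, n * n)
v_S = n * n * np.linalg.svd(M, compute_uv=False)[0]

print(v_B, v_S)   # for n = 13, roughly 22700 and 4300, matching Table 6.1
```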
The numerical results of 10 randomly generated instances for the upper bounds $v_B$
and $v_S$, as well as the objective values v of the approximate solutions generated, are
listed in Table 6.1, which clearly shows that $v_S$ outperforms $v_B$ significantly.
Table 6.1: Numerical upper bounds of v(ETB) for n = 13

Instance   1      2      3      4      5      6      7      8      9      10
v          619    637    603    664    682    572    613    662    591    752
v_B        22742  22588  22775  22711  22827  22905  22593  22966  22789  22678
v_S        4251   4314   4346   4368   4294   4338   4295   4330   4330   4303
Table 6.2: Numerical ratios (average of 10 instances) of (ETB)

n          5      10     20     30     40     50     60     70     80     90
τ (%)      35.42  18.51  9.94   7.06   5.45   4.09   3.93   3.06   2.99   2.58
τ · n      1.77   1.85   1.99   2.12   2.18   2.04   2.36   2.14   2.39   2.32
τ · n^0.9  1.51   1.47   1.47   1.51   1.51   1.38   1.56   1.40   1.54   1.48
τ · n^0.5  0.79   0.59   0.44   0.39   0.35   0.29   0.30   0.26   0.27   0.25
Therefore, in the following general tests, we shall choose $v_S$ as our candidate upper
bound to test the quality of the approximate solution, i.e., $\tau := v / v_S$. The simulation
results are listed in Table 6.2. By observation, the performance ratio is better than
$\Omega(1/n)$ and is quite close to $\Omega(1/n^{0.9})$; it is clearly better than the theoretical ratio
$\Omega(1/n^{1.5})$.
The computational cost of our method is quite low. In fact, for n = 80, we are able
to get a feasible solution within 2 minutes, while computing the upper bound $v_S$ costs
much more time. For n ≥ 95, however, our computer runs out of memory in the
experiments, a problem purely due to the sheer size of the input data.
6.7.2 Data of Low-Rank Tensors
The numerical tests conducted so far are based on data generated from i.i.d. standard
normals. It would be interesting to investigate the practicability of our algorithms
under other data settings. In particular, we shall test some low-rank tensors.
As mentioned in Section 3.4.2 (see also [68]), a fourth order tensor has rank r if it
can be written as a sum of r rank-one tensors, but cannot be written as a sum of r − 1
rank-one tensors. Specifically, the data we generate here is
\[
F := \sum_{i=1}^{r} a^1_i \otimes a^2_i \otimes a^3_i \otimes a^4_i,
\]
where the vectors $a^k_i$ (k = 1, 2, 3, 4, i = 1, 2, …, r) are independent of each other,
with entries following i.i.d. standard normals.
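A minimal sketch of this data generation, assuming numpy:

```python
import numpy as np

def random_rank_r_tensor(n: int, r: int) -> np.ndarray:
    """F = sum_{i=1}^r a^1_i (x) a^2_i (x) a^3_i (x) a^4_i with i.i.d. standard
    normal factor vectors: a fourth order tensor of rank at most r."""
    F = np.zeros((n, n, n, n))
    for _ in range(r):
        a1, a2, a3, a4 = np.random.randn(4, n)        # four factor vectors
        F += np.einsum('i,j,k,l->ijkl', a1, a2, a3, a4)
    return F
```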
Table 6.3: Numerical ratios (average of 10 instances) of (ETB) with low-rank tensors

r (rank)          1     2     3     4     5     6     7     8     9     10    15    20
τ (%) for n = 10  34.6  30.9  28.0  32.4  26.7  25.8  27.0  28.3  27.2  26.5  25.9  26.2
τ (%) for n = 20  14.8  15.0  14.0  15.7  11.3  11.1  11.2  11.8  12.1  11.7  11.2  12.0
τ (%) for n = 30  9.1   7.3   7.5   6.9   7.2   7.2   6.6   6.6   7.2   7.7   6.2   5.7
We again use the method discussed in the previous subsection to approximately
solve the model (ETB), and compare its objective value v with the upper bound $v_S$, i.e.,
$\tau = v / v_S$. The performance ratios for these data settings are shown in Table 6.3 for
n = 10, 20 and 30. By observation, we find that low-rank tensors F improve the
approximation ratios significantly: the lower the rank of the tensor F, the better the
performance ratio.
Chapter 7
Homogeneous Form Optimization with Mixed Constraints
7.1 Introduction
This chapter brings most of the results in previous chapters together to discuss mixed
integer programming problems. The objective functions are all homogeneous polynomial
functions, while the constraints are a combination of the two most widely used ones, the
spherical constraint and the binary constraint. In particular, the models considered
include:
\[
\begin{aligned}
(TBS) \quad \max\;& F(x^1, x^2, \ldots, x^d, y^1, y^2, \ldots, y^{d'}) \\
\text{s.t. } & x^k \in B^{n_k},\; k = 1, 2, \ldots, d, \\
& y^\ell \in S^{m_\ell},\; \ell = 1, 2, \ldots, d';
\end{aligned}
\]
\[
\begin{aligned}
(HBS) \quad \max\;& f(x, y) \\
\text{s.t. } & x \in B^n,\; y \in S^m;
\end{aligned}
\]
\[
\begin{aligned}
(MBS) \quad \max\;& f(x^1, x^2, \ldots, x^s, y^1, y^2, \ldots, y^t) \\
\text{s.t. } & x^k \in B^{n_k},\; k = 1, 2, \ldots, s, \\
& y^\ell \in S^{m_\ell},\; \ell = 1, 2, \ldots, t.
\end{aligned}
\]
The model (MBS) is a generalization of the models (TBS) and (HBS). In fact, it can
also be taken as a generalization of most of the homogeneous polynomial optimization
models discussed in previous chapters, namely (TS) of Chapter 3, (HS) and (MS) of
Chapter 4, and (TB), (HB) and (MB) of Chapter 6 as well.
These mixed models have versatile applications, e.g., the matrix combinatorial
problem and the vector-valued max-cut problem, whose details will be discussed in
Section 7.5. Essentially, in many discrete optimization problems, if the objective to be
optimized is extended from a scalar to a vector or a matrix, then we may turn to
optimizing the Euclidean norm of the vector, or the spectral norm of the matrix, which
leads to the mixed integer programming models proposed above.
All these models are NP-hard in general, even in the simplest case of one spherical
constraint and one binary constraint, i.e., the model (TBS) with d = d' = 1. As we will
see later, this case is actually equivalent to the maximization of a positive semidefinite form
in binary variables, which includes max-cut as a subproblem and is thus NP-hard. In
fact, this simplest form of (TBS) serves as a basis for all these mixed integer programming
models. By using this basis and mathematical induction, we are able to derive
polynomial-time randomized approximation algorithms with worst-case performance
ratios for (TBS) with any fixed degree. The techniques are similar to those of Chapter 3,
and two types of decomposition routines are called, one for decomposing the
spherical constraints, and one for decomposing the binary constraints. Moreover,
in order to extend the results from (TBS) to (HBS) and (MBS), the multilinear tensor
form relaxation method is again applied. Armed with the link lemmas (Lemma 4.2.1
and Lemma 4.4.3), we are able to derive approximation algorithms under some mild
square-free conditions.
This chapter is organized as follows. We shall discuss the models (TBS), (HBS) and
(MBS) in Sections 7.2, 7.3 and 7.4, respectively, and propose polynomial-time randomized
approximation algorithms with provable approximation ratios or relative approximation
ratios for the respective models. In Section 7.5, we shall discuss a few specific
problems where these mixed models can be directly applied. For ease of reading,
in this chapter we shall exclusively use the vector x (∈ $B^n$) to denote discrete variables,
and the vector y (∈ $S^m$) to denote continuous variables. Throughout our discussion, we
shall fix the degree of the objective polynomial function in these mixed models, d + d',
to be a constant.
7.2 Multilinear Form with Binary and Spherical Constraints
Our first mixed model is to maximize a multilinear function, with some variables being
binary and the others on unit spheres, namely,
\[
\begin{aligned}
(TBS) \quad \max\;& F(x^1, x^2, \ldots, x^d, y^1, y^2, \ldots, y^{d'}) \\
\text{s.t. } & x^k \in B^{n_k},\; k = 1, 2, \ldots, d, \\
& y^\ell \in S^{m_\ell},\; \ell = 1, 2, \ldots, d',
\end{aligned}
\]
where $n_1 \le n_2 \le \cdots \le n_d$ and $m_1 \le m_2 \le \cdots \le m_{d'}$. This model is a generalization of
(TS) in Section 3.2 and (TB) in Section 6.2.
The simplest case of (TBS), d = d' = 1, is worth mentioning, as it plays an essential role
in the whole chapter. Based on this case, we shall derive polynomial-time approximation
algorithms with worst-case performance ratios for (TBS) with any fixed degree d + d'.
Proposition 7.2.1 If d = d' = 1, then (TBS) is NP-hard, and admits a polynomial-time
randomized approximation algorithm with approximation ratio $\sqrt{2/\pi}$.
Proof. When d = d' = 1, (TBS) can be written as
\[
(TBS) \quad \max\; x^{\mathrm{T}} F y \quad \text{s.t. } x \in B^{n_1},\; y \in S^{m_1}.
\]
For any fixed x in (TBS), the corresponding optimal y must be $F^{\mathrm{T}}x / \|F^{\mathrm{T}}x\|$ due to
the Cauchy-Schwarz inequality, and accordingly,
\[
x^{\mathrm{T}} F y = \frac{x^{\mathrm{T}} F F^{\mathrm{T}} x}{\|F^{\mathrm{T}}x\|} = \|F^{\mathrm{T}}x\| = \sqrt{x^{\mathrm{T}} F F^{\mathrm{T}} x}.
\]
Thus (TBS) is equivalent to
\[
\max\; x^{\mathrm{T}} F F^{\mathrm{T}} x \quad \text{s.t. } x \in B^{n_1}.
\]
Noticing that the matrix $F F^{\mathrm{T}}$ is positive semidefinite, the above problem includes the
max-cut problem (see e.g., [40]) as a subclass; therefore it is NP-hard. Moreover, according
to the result of Nesterov [88], it admits a polynomial-time randomized approximation
algorithm (SDP relaxation and randomization) with approximation ratio $2/\pi$. This
implies that (TBS) admits a polynomial-time randomized approximation algorithm with
approximation ratio $\sqrt{2/\pi}$.
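The procedure in this proof can be sketched as follows, assuming cvxpy with an SDP-capable solver is available; the function name and the number of rounding trials are our own choices.

```python
import cvxpy as cp
import numpy as np

def bilinear_binary_sphere(F: np.ndarray, trials: int = 100):
    """Approximately solve max x^T F y over x in B^n, y in S^m via Nesterov's
    SDP relaxation of max x^T (F F^T) x and random hyperplane rounding."""
    n = F.shape[0]
    A = F @ F.T                                  # positive semidefinite matrix
    X = cp.Variable((n, n), PSD=True)
    cp.Problem(cp.Maximize(cp.trace(A @ X)), [cp.diag(X) == 1]).solve()
    w, U = np.linalg.eigh(X.value)
    V = U * np.sqrt(np.maximum(w, 0))            # V V^T = X
    best_x, best_val = None, -np.inf
    for _ in range(trials):
        x = np.sign(V @ np.random.randn(n))      # x = sign(v) with v ~ N(0, X)
        x[x == 0] = 1.0
        val = x @ A @ x
        if val > best_val:
            best_x, best_val = x, val
    y = F.T @ best_x                             # optimal y for the chosen x
    return best_x, y / np.linalg.norm(y)
```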
Proposition 7.2.1 is the foundation of the basic relaxation used in solving (TBS)
recursively for general degrees d and d'. In proceeding to the higher degree cases, for the
recursion on d, with discrete variables $x^k$ (k = 1, 2, …, d), DR 6.2.1 is applied in
each recursive step; while for the recursion on d' with continuous variables $y^\ell$ (ℓ =
1, 2, …, d'), two decomposition routines in Section 3.2 are readily available, namely
the eigenvalue decomposition approach DR 3.2.2 and the randomized decomposition
approach DR 3.2.1; either one of them serves the purpose here. The main result in
this section is the following:
Theorem 7.2.2 (TBS) admits a polynomial-time randomized approximation algorithm
with approximation ratio τ(TBS), where
\[
\tau(TBS) := \left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} \left(\prod_{k=1}^{d-1} n_k \prod_{\ell=1}^{d'-1} m_\ell\right)^{-\frac{1}{2}}
= \Omega\left(\left(\prod_{k=1}^{d-1} n_k \prod_{\ell=1}^{d'-1} m_\ell\right)^{-\frac{1}{2}}\right).
\]
Proof. The proof is based on mathematical induction on the degree d + d', with
Proposition 7.2.1 serving as the base of the induction when d + d' = 2 (i.e., d = d' = 1).
For general d + d' ≥ 3, if d' ≥ 2, let $Y = y^1 (y^{d'})^{\mathrm{T}}$. Noticing that $\|Y\|^2 =
\|y^1\|^2 \|y^{d'}\|^2 = 1$, similar to the relaxation in the proof of Theorem 3.2.4, (TBS) can be
relaxed to a problem of degree d + d' − 1, i.e.,
\[
\begin{aligned}
\max\;& F(x^1, x^2, \ldots, x^d, Y, y^2, y^3, \ldots, y^{d'-1}) \\
\text{s.t. } & x^k \in B^{n_k},\; k = 1, 2, \ldots, d, \\
& Y \in S^{m_1 m_{d'}},\; y^\ell \in S^{m_\ell},\; \ell = 2, 3, \ldots, d' - 1.
\end{aligned}
\]
By induction, a feasible solution $(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{Y}, \hat{y}^2, \hat{y}^3, \ldots, \hat{y}^{d'-1})$ can be found in
polynomial time, such that
\[
F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{Y}, \hat{y}^2, \hat{y}^3, \ldots, \hat{y}^{d'-1})
\ge \left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} \left(\prod_{k=1}^{d-1} n_k \prod_{\ell=2}^{d'-1} m_\ell\right)^{-\frac{1}{2}} v(TBS).
\]
Let us denote the matrix $Q = F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \cdot, \hat{y}^2, \hat{y}^3, \ldots, \hat{y}^{d'-1}, \cdot) \in \mathbb{R}^{m_1 \times m_{d'}}$. Then
by Proposition 3.2.1 (used in DR 3.2.2), $\max_{y^1 \in S^{m_1},\, y^{d'} \in S^{m_{d'}}} (y^1)^{\mathrm{T}} Q\, y^{d'}$ can be solved
in polynomial time, with its optimal solution $(\hat{y}^1, \hat{y}^{d'})$ satisfying
\[
F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'}) = (\hat{y}^1)^{\mathrm{T}} Q\, \hat{y}^{d'} \ge \|Q\| / \sqrt{m_1}.
\]
By the Cauchy-Schwarz inequality, it follows that
\[
\|Q\| = \max_{Y \in S^{m_1 m_{d'}}} Q \bullet Y \ge Q \bullet \hat{Y}
= F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{Y}, \hat{y}^2, \hat{y}^3, \ldots, \hat{y}^{d'-1}).
\]
Thus we conclude that
\[
F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'})
\ge \|Q\| / \sqrt{m_1}
\ge F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{Y}, \hat{y}^2, \hat{y}^3, \ldots, \hat{y}^{d'-1}) / \sqrt{m_1}
\ge \tau(TBS)\, v(TBS).
\]
For d + d' ≥ 3 and d ≥ 2, let $X = x^1 (x^d)^{\mathrm{T}}$, and (TBS) can be relaxed to the other
case of degree d − 1 + d', i.e.,
\[
\begin{aligned}
\max\;& F(X, x^2, x^3, \ldots, x^{d-1}, y^1, y^2, \ldots, y^{d'}) \\
\text{s.t. } & X \in B^{n_1 n_d},\; x^k \in B^{n_k},\; k = 2, 3, \ldots, d - 1, \\
& y^\ell \in S^{m_\ell},\; \ell = 1, 2, \ldots, d'.
\end{aligned}
\]
By induction, it admits a polynomial-time randomized approximation algorithm with
approximation ratio $\left(\frac{2}{\pi}\right)^{\frac{2d-3}{2}} \left(\prod_{k=2}^{d-1} n_k \prod_{\ell=1}^{d'-1} m_\ell\right)^{-\frac{1}{2}}$. In order to decompose X into
$x^1$ and $x^d$, we conduct the randomization procedure in Step 2 of DR 6.2.1,
which deteriorates the bound by an additional factor of $\frac{2}{\pi \sqrt{n_1}}$ in expectation, as shown
in (6.1). Combining these two factors, we are led to the ratio τ(TBS).
We end this section by summarizing the algorithm for solving (TBS) below.

Algorithm 7.2.1

• INPUT: a (d + d')-th order tensor $F \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d \times m_1 \times m_2 \times \cdots \times m_{d'}}$ with
$n_1 \le n_2 \le \cdots \le n_d$ and $m_1 \le m_2 \le \cdots \le m_{d'}$.

1. Rewrite F as a matrix $M \in \mathbb{R}^{n_1 n_2 \cdots n_d \times m_1 m_2 \cdots m_{d'}}$ by combining its first d modes
into the matrix rows, and its last d' modes into the matrix columns.
2. Apply the procedure in Proposition 7.2.1, with input M and output $x \in B^{n_1 n_2 \cdots n_d}$.
3. Rewrite the vector x as a d-th order tensor $X \in B^{n_1 \times n_2 \times \cdots \times n_d}$ and compute the
d'-th order tensor $F' = F(X, \cdot, \cdot, \ldots, \cdot) \in \mathbb{R}^{m_1 \times m_2 \times \cdots \times m_{d'}}$.
4. Apply Algorithm 3.2.3, with input F' and output $(y^1, y^2, \ldots, y^{d'})$.
5. Compute the d-th order tensor $F'' = F(\cdot, \cdot, \ldots, \cdot, y^1, y^2, \ldots, y^{d'}) \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$.
6. Apply Algorithm 6.2.2, with input F'' and output $(x^1, x^2, \ldots, x^d)$.

• OUTPUT: a feasible solution $(x^1, x^2, \ldots, x^d, y^1, y^2, \ldots, y^{d'})$.
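In tensor terms, the reshaping in Steps 1, 3 and 5 can be sketched as below; solve_bilinear, algorithm_3_2_3 and algorithm_6_2_2 are hypothetical stand-ins for the procedure of Proposition 7.2.1, Algorithm 3.2.3 and Algorithm 6.2.2, passed in as callables so the sketch stays self-contained.

```python
import numpy as np

def algorithm_7_2_1(F, ns, ms, solve_bilinear, algorithm_3_2_3, algorithm_6_2_2):
    """A sketch of the driver: F has shape ns + ms, i.e. the first d modes are
    the binary ones and the last d' modes are the spherical ones."""
    d = len(ns)
    M = F.reshape(int(np.prod(ns)), int(np.prod(ms)))   # step 1: unfold to a matrix
    x = solve_bilinear(M)                               # step 2: x in B^{n_1...n_d}
    X = x.reshape(ns)                                   # step 3: fold back to a tensor
    Fp = np.tensordot(X, F, axes=(list(range(d)), list(range(d))))  # F' = F(X, ., ..., .)
    ys = algorithm_3_2_3(Fp)                            # step 4: spherical solutions
    Fpp = F
    for y in reversed(ys):                              # step 5: contract last d' modes
        Fpp = np.tensordot(Fpp, y, axes=([Fpp.ndim - 1], [0]))
    xs = algorithm_6_2_2(Fpp)                           # step 6: binary solutions
    return xs, ys
```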
7.3 Homogeneous Form with Binary and Spherical Constraints

We further extend the mixed model of the previous section to the homogeneous polynomial
case, namely,
\[
(HBS) \quad \max\; f(x, y) \quad \text{s.t. } x \in B^n,\; y \in S^m,
\]
where $f(x, y) = F(\underbrace{x, x, \ldots, x}_{d}, \underbrace{y, y, \ldots, y}_{d'})$, and $F \in \mathbb{R}^{n^d \times m^{d'}}$ is a (d + d')-th order
tensor with the partial symmetric property. This model is a generalization of the model
(HS) in Section 4.2 and the model (HB) in Section 6.3. We shall derive polynomial-time
approximation algorithms with worst-case performance ratios. The method here is
again multilinear form relaxation to (TBS), which admits a polynomial-time randomized
approximation algorithm by Theorem 7.2.2. Then, by applying Lemma 4.2.1 as a link,
together with the square-free property of the discrete variables x, we are led to the
following results regarding (HBS).
Theorem 7.3.1 If f(x, y) is square-free in x, and either d or d' is odd, then (HBS)
admits a polynomial-time randomized approximation algorithm with approximation
ratio τ(HBS), where
\[
\tau(HBS) := \left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} d!\, d^{-d}\, d'!\, d'^{-d'}\, n^{-\frac{d-1}{2}} m^{-\frac{d'-1}{2}}
= \Omega\left(n^{-\frac{d-1}{2}} m^{-\frac{d'-1}{2}}\right).
\]
Proof. As in the proof of Theorem 6.3.2, by relaxing (HBS) to (TBS), we are able to
find $(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'})$ with $\hat{x}^k \in B^n$ for all 1 ≤ k ≤ d and $\hat{y}^\ell \in S^m$ for all
1 ≤ ℓ ≤ d' in polynomial time, such that
\[
F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'})
\ge (2/\pi)^{\frac{2d-1}{2}}\, n^{-\frac{d-1}{2}} m^{-\frac{d'-1}{2}}\, v(HBS).
\]
Let $\xi_1, \xi_2, \ldots, \xi_d, \eta_1, \eta_2, \ldots, \eta_{d'}$ be i.i.d. random variables, each taking values 1 and −1
with equal probability 1/2. By applying Lemma 4.4.3 (or Lemma 4.2.1 twice), we have
\[
\mathrm{E}\left[\prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j\, f\left(\sum_{k=1}^{d} \xi_k \hat{x}^k, \sum_{\ell=1}^{d'} \eta_\ell \hat{y}^\ell\right)\right]
= d!\, d'!\, F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'}). \tag{7.1}
\]
Thus we are able to find binary vectors $\beta \in B^d$ and $\beta' \in B^{d'}$, such that
\[
\prod_{i=1}^{d} \beta_i \prod_{j=1}^{d'} \beta'_j\, f\left(\sum_{k=1}^{d} \beta_k \hat{x}^k, \sum_{\ell=1}^{d'} \beta'_\ell \hat{y}^\ell\right)
\ge d!\, d'!\, F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'}).
\]
Denote
\[
(\tilde{x}, \tilde{y}) :=
\begin{cases}
\left(\prod_{i=1}^{d} \beta_i \prod_{j=1}^{d'} \beta'_j \sum_{k=1}^{d} \beta_k \hat{x}^k,\;
\sum_{\ell=1}^{d'} \beta'_\ell \hat{y}^\ell\right) & \text{if } d \text{ is odd}, \\[2ex]
\left(\sum_{k=1}^{d} \beta_k \hat{x}^k,\;
\prod_{i=1}^{d} \beta_i \prod_{j=1}^{d'} \beta'_j \sum_{\ell=1}^{d'} \beta'_\ell \hat{y}^\ell\right) & \text{if } d' \text{ is odd}.
\end{cases}
\]
Noticing $\|\tilde{y}\| \le d'$ and combining the previous two inequalities, it follows that
\[
f\left(\frac{\tilde{x}}{d}, \frac{\tilde{y}}{\|\tilde{y}\|}\right)
\ge d^{-d} d'^{-d'} \prod_{i=1}^{d} \beta_i \prod_{j=1}^{d'} \beta'_j\,
f\left(\sum_{k=1}^{d} \beta_k \hat{x}^k, \sum_{\ell=1}^{d'} \beta'_\ell \hat{y}^\ell\right)
\ge \tau(HBS)\, v(HBS).
\]
Denote $\hat{y} = \tilde{y} / \|\tilde{y}\| \in S^m$. Since $\tilde{x}/d \in \bar{B}^n$ by a similar argument as (6.2), and $f(x, \hat{y})$
is square-free in x, by applying Lemma 6.3.1, $\hat{x} \in B^n$ can be found in polynomial time,
such that
\[
f(\hat{x}, \hat{y}) \ge f(\tilde{x}/d, \hat{y}) \ge \tau(HBS)\, v(HBS).
\]
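The identity (7.1) that drives this proof can be checked numerically by enumerating the sign vectors. A minimal sketch for d = 2, d' = 1, with a randomly generated tensor symmetrized in its x-modes:

```python
import itertools
from math import factorial
import numpy as np

d, dp, n, m = 2, 1, 4, 3
F = np.random.randn(n, n, m)
F = (F + F.transpose(1, 0, 2)) / 2          # partial symmetry in the x-modes

def F_multi(x1, x2, y1):                    # multilinear form F(x^1, x^2, y^1)
    return np.einsum('ijk,i,j,k->', F, x1, x2, y1)

xs = [np.random.randn(n) for _ in range(d)]
ys = [np.random.randn(m) for _ in range(dp)]

lhs = 0.0
for signs in itertools.product([1, -1], repeat=d + dp):
    xi, eta = signs[:d], signs[d:]
    x = sum(s * v for s, v in zip(xi, xs))
    y = sum(s * v for s, v in zip(eta, ys))
    lhs += np.prod(signs) * F_multi(x, x, y)   # f(x, y) = F(x, x, y)
lhs /= 2 ** (d + dp)                           # expectation over i.i.d. signs
rhs = factorial(d) * factorial(dp) * F_multi(xs[0], xs[1], ys[0])
print(np.isclose(lhs, rhs))                    # True
```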
We remark that in Theorem 7.3.1, if d' = 2 and d is odd, then the factor $d'!\, d'^{-d'}$ in
τ(HBS) can be removed by the same argument as in the proof of Theorem 4.4.2 (basically,
the corresponding adjustment is an eigenvalue problem), and this improves the ratio
τ(HBS) to $\left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} d!\, d^{-d}\, n^{-\frac{d-1}{2}} m^{-\frac{1}{2}}$. Now we present the approximation result for the
even degree case.
Theorem 7.3.2 If f(x,y) is square-free in x, and both d and d′ are even, then (HBS)
admits a polynomial-time randomized approximation algorithm with relative approxi-
mation ratio τ(HBS).
Proof. Following the same argument as in the proof of Theorem 7.3.1, we get (7.1),
which implies
\[
\mathrm{E}\left[\prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j\, f\left(\sum_{k=1}^{d} \xi_k \hat{x}^k, \sum_{\ell=1}^{d'} \eta_\ell \hat{y}^\ell\right)\right]
\ge \left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} d!\, d'!\, n^{-\frac{d-1}{2}} m^{-\frac{d'-1}{2}}\, v(HBS).
\]
Denote $x_\xi := \frac{1}{d} \sum_{k=1}^{d} \xi_k \hat{x}^k$ and $y_\eta := \frac{1}{d'} \sum_{\ell=1}^{d'} \eta_\ell \hat{y}^\ell$. Clearly we have
\[
\mathrm{E}\left[\prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j\, f(x_\xi, y_\eta)\right] \ge \tau(HBS)\, v(HBS).
\]
Pick any fixed $\hat{y} \in S^m$ and consider the following problem:
\[
(\widetilde{HBS}) \quad \max\; f(x, \hat{y}) \quad \text{s.t. } x \in B^n.
\]
Since $f(x, \hat{y})$ is square-free in x and has no constant term, by Lemma 6.5.1, a binary
vector $\hat{x} \in B^n$ can be found in polynomial time with
\[
f(\hat{x}, \hat{y}) \ge 0 \ge \underline{v}(\widetilde{HBS}) \ge \underline{v}(HBS),
\]
where $\underline{v}(\cdot)$ denotes the optimal value of the corresponding minimization problem.
Next we shall argue that $f(x_\xi, y_\eta) \ge \underline{v}(HBS)$. If this were not the case, then
$f(x_\xi, y_\eta) < \underline{v}(HBS) \le 0$. Noticing $\|y_\eta\| \le 1$, this leads to
\[
f\left(x_\xi, y_\eta / \|y_\eta\|\right) = \|y_\eta\|^{-d'} f(x_\xi, y_\eta) \le f(x_\xi, y_\eta) < \underline{v}(HBS).
\]
Also noticing $x_\xi \in \bar{B}^n$, by applying Lemma 6.3.1, a binary vector $x \in B^n$ can be found
with
\[
\underline{v}(HBS) \le f\left(x, y_\eta / \|y_\eta\|\right) \le f\left(x_\xi, y_\eta / \|y_\eta\|\right) < \underline{v}(HBS),
\]
resulting in a contradiction.
Since $f(x_\xi, y_\eta) - \underline{v}(HBS) \ge 0$, it follows that
\[
\begin{aligned}
\frac{1}{2}\, \mathrm{E}\left[f(x_\xi, y_\eta) - \underline{v}(HBS) \,\middle|\, \prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j = 1\right]
&\ge \frac{1}{2}\, \mathrm{E}\left[f(x_\xi, y_\eta) - \underline{v}(HBS) \,\middle|\, \prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j = 1\right] \\
&\quad - \frac{1}{2}\, \mathrm{E}\left[f(x_\xi, y_\eta) - \underline{v}(HBS) \,\middle|\, \prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j = -1\right] \\
&= \mathrm{E}\left[\prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j \left(f(x_\xi, y_\eta) - \underline{v}(HBS)\right)\right] \\
&= \mathrm{E}\left[\prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j\, f(x_\xi, y_\eta)\right]
\ge \tau(HBS)\, v(HBS).
\end{aligned}
\]
Thus we are able to find $\beta \in B^d$ and $\beta' \in B^{d'}$ with $\prod_{i=1}^{d} \beta_i \prod_{j=1}^{d'} \beta'_j = 1$, such that
\[
f(x_\beta, y_{\beta'}) - \underline{v}(HBS) \ge 2\, \tau(HBS)\, v(HBS).
\]
Denote $\bar{y} = y_{\beta'} / \|y_{\beta'}\| \in S^m$. Since $x_\beta \in \bar{B}^n$, by Lemma 6.3.1, a binary vector $\bar{x} \in B^n$
can be found in polynomial time with $f(\bar{x}, \bar{y}) \ge f(x_\beta, \bar{y})$.
Below we shall prove that either $(\hat{x}, \hat{y})$ or $(\bar{x}, \bar{y})$ satisfies
\[
f(x, y) - \underline{v}(HBS) \ge \tau(HBS)\left(v(HBS) - \underline{v}(HBS)\right). \tag{7.2}
\]
Indeed, if $-\underline{v}(HBS) \ge \tau(HBS)\left(v(HBS) - \underline{v}(HBS)\right)$, then $(\hat{x}, \hat{y})$ satisfies (7.2) in this
case, since $f(\hat{x}, \hat{y}) \ge 0$. Otherwise, if $-\underline{v}(HBS) < \tau(HBS)\left(v(HBS) - \underline{v}(HBS)\right)$, then
\[
v(HBS) > \left(1 - \tau(HBS)\right)\left(v(HBS) - \underline{v}(HBS)\right) \ge \left(v(HBS) - \underline{v}(HBS)\right)/2,
\]
which implies
\[
f(x_\beta, y_{\beta'}) - \underline{v}(HBS) \ge 2\, \tau(HBS)\, v(HBS) \ge \tau(HBS)\left(v(HBS) - \underline{v}(HBS)\right).
\]
The above inequality also implies that $f(x_\beta, y_{\beta'}) > 0$. Therefore, we have
\[
f(\bar{x}, \bar{y}) \ge f(x_\beta, \bar{y}) = \|y_{\beta'}\|^{-d'} f(x_\beta, y_{\beta'}) \ge f(x_\beta, y_{\beta'}),
\]
which implies that $(\bar{x}, \bar{y})$ satisfies (7.2). Finally, $\arg\max\{f(\hat{x}, \hat{y}), f(\bar{x}, \bar{y})\}$ satisfies (7.2)
in both cases.
7.4 Mixed Form with Binary and Spherical Constraints
The final story of polynomial optimization problems in this thesis brings together a
host of models discussed in previous sections and chapters, as the generalization of a
large family, which includes (TS), (HS), (MS), (TB), (HB), (MB), (TBS) and (HBS)
all as its subclasses. The model is to maximize a mixed form over variables with binary
constraints, mixed with variables with spherical constraints, i.e.,
\[
\begin{aligned}
(MBS) \quad \max\;& f(x^1, x^2, \ldots, x^s, y^1, y^2, \ldots, y^t) \\
\text{s.t. } & x^k \in B^{n_k},\; k = 1, 2, \ldots, s, \\
& y^\ell \in S^{m_\ell},\; \ell = 1, 2, \ldots, t,
\end{aligned}
\]
where the function f is associated with a tensor
$F \in \mathbb{R}^{n_1^{d_1} \times n_2^{d_2} \times \cdots \times n_s^{d_s} \times m_1^{d'_1} \times m_2^{d'_2} \times \cdots \times m_t^{d'_t}}$
with the partial symmetric property, $n_1 \le n_2 \le \cdots \le n_s$ and $m_1 \le m_2 \le \cdots \le m_t$, and
$d = d_1 + d_2 + \cdots + d_s$ and $d' = d'_1 + d'_2 + \cdots + d'_t$ are deemed fixed constants.

We shall derive polynomial-time approximation algorithms for this general model.
By relaxing (MBS) to the multilinear form optimization model (TBS) and solving
it approximately using Theorem 7.2.2, we may further adjust the solution variables one
by one using the link Lemma 4.2.1 or Lemma 4.4.3, leading to the following general
results in two settings.
Theorem 7.4.1 If $f(x^1, x^2, \ldots, x^s, y^1, y^2, \ldots, y^t)$ is square-free in each $x^k$ (k =
1, 2, …, s), and one of $d_k$ (k = 1, 2, …, s) or one of $d'_\ell$ (ℓ = 1, 2, …, t) is odd, then
(MBS) admits a polynomial-time randomized approximation algorithm with approximation
ratio τ(MBS), where
\[
\tau(MBS) := \left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}}
\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}}
\prod_{1 \le \ell \le t,\; d'_\ell \ge 3} \frac{d'_\ell!}{{d'_\ell}^{d'_\ell}}
\left(\frac{\prod_{k=1}^{s} n_k^{d_k} \prod_{\ell=1}^{t} m_\ell^{d'_\ell}}{n_s m_t}\right)^{-\frac{1}{2}}
= \Omega\left(\left(\frac{\prod_{k=1}^{s} n_k^{d_k} \prod_{\ell=1}^{t} m_\ell^{d'_\ell}}{n_s m_t}\right)^{-\frac{1}{2}}\right).
\]
Proof. The proof is analogous to that of Theorem 7.3.1. We first relax (MBS) to (TBS)
and get its approximate solution $(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'})$ using Theorem 7.2.2.
Let $\xi_1, \xi_2, \ldots, \xi_d, \eta_1, \eta_2, \ldots, \eta_{d'}$ be i.i.d. random variables, each taking values 1 and −1
with equal probability 1/2. By applying Lemma 4.4.3, we have
\[
\mathrm{E}\left[\prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j\, f\left(x^1_\xi, x^2_\xi, \ldots, x^s_\xi, y^1_\eta, y^2_\eta, \ldots, y^t_\eta\right)\right]
= \prod_{k=1}^{s} d_k! \prod_{\ell=1}^{t} d'_\ell!\;
F\left(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'}\right), \tag{7.3}
\]
where
\[
x^1_\xi := \sum_{k=1}^{d_1} \xi_k \hat{x}^k,\quad
x^2_\xi := \sum_{k=d_1+1}^{d_1+d_2} \xi_k \hat{x}^k,\quad \ldots,\quad
x^s_\xi := \sum_{k=d_1+d_2+\cdots+d_{s-1}+1}^{d} \xi_k \hat{x}^k, \tag{7.4}
\]
and
\[
y^1_\eta := \sum_{\ell=1}^{d'_1} \eta_\ell \hat{y}^\ell,\quad
y^2_\eta := \sum_{\ell=d'_1+1}^{d'_1+d'_2} \eta_\ell \hat{y}^\ell,\quad \ldots,\quad
y^t_\eta := \sum_{\ell=d'_1+d'_2+\cdots+d'_{t-1}+1}^{d'} \eta_\ell \hat{y}^\ell. \tag{7.5}
\]
In (7.3), as one of $d_k$ (k = 1, 2, …, s) or one of $d'_\ell$ (ℓ = 1, 2, …, t) is odd, we are
able to move $\prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j$ into the coefficient of the corresponding vector ($x^k_\xi$ or $y^\ell_\eta$,
whichever is appropriate) in the function f. The other derivations are essentially the same
as in the proof of Theorem 7.3.1. Besides, we only lose a factor of $d'_\ell! / {d'_\ell}^{d'_\ell}$ in τ(MBS)
when $d'_\ell \ge 3$; this is because when $d'_\ell \le 2$, the corresponding adjustments can be done
without deteriorating the ratio, as in the proof of Theorem 4.4.2.
Theorem 7.4.2 If $f(x^1, x^2, \ldots, x^s, y^1, y^2, \ldots, y^t)$ is square-free in each $x^k$ (k =
1, 2, …, s), and all $d_k$ (k = 1, 2, …, s) and all $d'_\ell$ (ℓ = 1, 2, …, t) are even, then (MBS)
admits a polynomial-time randomized approximation algorithm with relative approximation
ratio τ(MBS), where
\[
\tau(MBS) := \left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}}
\left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}} \prod_{\ell=1}^{t} \frac{d'_\ell!}{{d'_\ell}^{d'_\ell}}\right)
\left(\frac{\prod_{k=1}^{s} n_k^{d_k} \prod_{\ell=1}^{t} m_\ell^{d'_\ell}}{n_s m_t}\right)^{-\frac{1}{2}}
= \Omega\left(\left(\frac{\prod_{k=1}^{s} n_k^{d_k} \prod_{\ell=1}^{t} m_\ell^{d'_\ell}}{n_s m_t}\right)^{-\frac{1}{2}}\right).
\]
Proof. The proof is analogous to that of Theorem 7.3.2. The main differences are: (i)
we use (7.3) instead of (7.1); and (ii) we use
$f\left(\frac{x^1_\xi}{d_1}, \frac{x^2_\xi}{d_2}, \ldots, \frac{x^s_\xi}{d_s}, \frac{y^1_\eta}{d'_1}, \frac{y^2_\eta}{d'_2}, \ldots, \frac{y^t_\eta}{d'_t}\right)$
instead of $f(x_\xi, y_\eta)$, where $\left(x^1_\xi, x^2_\xi, \ldots, x^s_\xi, y^1_\eta, y^2_\eta, \ldots, y^t_\eta\right)$ are defined in (7.4) and (7.5).
7.5 Applications
The generality of the mixed integer polynomial optimization models studied in this
chapter gives rise to some succinct and interesting problems, apart from their versatile
applications. Nevertheless, it should be useful and helpful to present a few examples at
this point in more detail, to illustrate the potential modeling opportunities offered by
the new optimization models. In this section, we shall discuss the matrix combinatorial
problem and an extended version of the max-cut problem, and show that they are
readily formulated as the mixed integer programming models of this chapter.
7.5.1 Matrix Combinatorial Problem
We discuss a succinct and interesting matrix combinatorial problem: given n matrices
$A_i \in \mathbb{R}^{m_1 \times m_2}$ for i = 1, 2, …, n, find a binary combination of them that maximizes
the combined matrix in terms of spectral norm. Specifically, we consider the following
optimization model:
\[
(MCP) \quad \max\; \sigma_{\max}\left(\sum_{i=1}^{n} x_i A_i\right)
\quad \text{s.t. } x_i \in \{1, -1\},\; i = 1, 2, \ldots, n,
\]
where $\sigma_{\max}$ denotes the largest singular value of a matrix. Problem (MCP) is NP-hard,
even in the special case of $m_2 = 1$. In this case, the matrix $A_i$ is replaced by an
$m_1$-dimensional vector $a_i$, and the spectral norm becomes the Euclidean norm
of a vector. The vector version of the combinatorial problem is then
\[
\max\; \left\|\sum_{i=1}^{n} x_i a_i\right\| \quad \text{s.t. } x_i \in \{1, -1\},\; i = 1, 2, \ldots, n.
\]
This is equivalent to the model (TBS) with d = d' = 1, whose NP-hardness is asserted
by Proposition 7.2.1.

Turning back to the general matrix version (MCP), the problem has the equivalent
formulation
\[
\max\; (y^1)^{\mathrm{T}} \left(\sum_{i=1}^{n} x_i A_i\right) y^2
\quad \text{s.t. } x \in B^n,\; y^1 \in S^{m_1},\; y^2 \in S^{m_2},
\]
which is essentially the model (TBS) with d = 1 and d' = 2:
\[
\max\; F(x, y^1, y^2) \quad \text{s.t. } x \in B^n,\; y^1 \in S^{m_1},\; y^2 \in S^{m_2},
\]
where the trilinear function F is associated with the third order tensor $F \in \mathbb{R}^{n \times m_1 \times m_2}$
whose (i, j, k)-th entry is the (j, k)-th entry of the matrix $A_i$. According to Theorem 7.2.2,
the largest matrix (in terms of the spectral norm in the (MCP) formulation) can be
approximated within a factor of $\sqrt{\frac{2}{\pi \min\{m_1, m_2\}}}$.
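A small sketch of this construction, assuming numpy; it also recovers, for a fixed binary combination, the optimal spherical variables as the leading singular vectors:

```python
import numpy as np

n, m1, m2 = 4, 3, 5
A = [np.random.randn(m1, m2) for _ in range(n)]

# Stack the matrices into the third order tensor F with F[i, j, k] = (A_i)_{jk}.
F = np.stack(A, axis=0)

x = np.sign(np.random.randn(n))                 # a binary combination in B^n
combined = np.einsum('ijk,i->jk', F, x)         # sum_i x_i A_i
sigma_max = np.linalg.svd(combined, compute_uv=False)[0]
# For this fixed x, the optimal y^1, y^2 in the (TBS) formulation are the
# leading left and right singular vectors of the combined matrix.
U, s, Vt = np.linalg.svd(combined)
y1, y2 = U[:, 0], Vt[0, :]
```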
If the given n matrices $A_i$ (i = 1, 2, …, n) are symmetric, then the maximization
criterion can be set to the largest eigenvalue instead of the largest singular value, i.e.,
\[
\max\; \lambda_{\max}\left(\sum_{i=1}^{n} x_i A_i\right)
\quad \text{s.t. } x_i \in \{1, -1\},\; i = 1, 2, \ldots, n.
\]
It is also easy to formulate this problem as the model (HBS) with d = 1 and d' = 2:
\[
\max\; F(x, y, y) \quad \text{s.t. } x \in B^n,\; y \in S^m,
\]
whose optimal value can also be approximated within a factor of $\sqrt{\frac{2}{\pi m}}$ by Theorem 7.3.1
and the remark that follows it.
7.5.2 Vector-Valued Maximum Cut
Consider an undirected graph G = (V, E), where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of
vertices and $E \subseteq V \times V$ is the set of edges. On each edge $e \in E$ there is an
associated weight, which in this case is a nonnegative vector, i.e., $w_e \in \mathbb{R}^m$, $w_e \ge 0$
for all $e \in E$. The problem now is to find a cut in such a way that the total sum of
the weights, which is a vector in this case, has maximum norm. More formally, this
problem can be formulated as
\[
\max_{C \text{ is a cut of } G}\; \left\|\sum_{e \in C} w_e\right\|.
\]
Note that the usual max-cut problem is the special case of the above model where each
weight $w_e \ge 0$ is a scalar. Similar to the scalar case (see [40]), we may reformulate the
above problem in binary variables as
\[
\max\; \left\|\sum_{1 \le i, j \le n} x_i x_j w'_{ij}\right\| \quad \text{s.t. } x \in B^n,
\]
where
\[
w'_{ij} =
\begin{cases}
-w_{ij} & i \ne j, \\
-w_{ij} + \sum_{k=1}^{n} w_{ik} & i = j.
\end{cases} \tag{7.6}
\]
Observing the Cauchy-Schwarz inequality, we further formulate the above problem as
\[
\max\; \left(\sum_{1 \le i, j \le n} x_i x_j w'_{ij}\right)^{\mathrm{T}} y = F(x, x, y)
\quad \text{s.t. } x \in B^n,\; y \in S^m.
\]
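A minimal sketch of building the modified weights (7.6) and evaluating the cut-vector norm, assuming the edge weights are stored as a symmetric n × n × m numpy array with zero diagonal:

```python
import numpy as np

def modified_weights(w: np.ndarray) -> np.ndarray:
    """w'_ij = -w_ij for i != j and -w_ii + sum_k w_ik for i = j, as in (7.6)."""
    n = w.shape[0]
    wp = -w.copy()
    for i in range(n):
        wp[i, i] = -w[i, i] + w[i].sum(axis=0)   # row sum over k of w_ik
    return wp

def cut_vector_norm(x: np.ndarray, wp: np.ndarray) -> float:
    """|| sum_{i,j} x_i x_j w'_ij || for a cut encoded by x in {-1, 1}^n."""
    return float(np.linalg.norm(np.einsum('ijm,i,j->m', wp, x, x)))
```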
This is exactly the form of (HBS) with d = 2 and d' = 1. Although the square-free
property in x does not hold for this model (which is a condition of Theorem 7.3.1), one
can still replace any point in the hypercube ($\bar{B}^n$) by one of its vertices (in $B^n$) without
decreasing the objective function value, since the matrix $F(\cdot, \cdot, e_k) = \left((w'_{ij})_k\right)_{n \times n}$ is
diagonally dominant for k = 1, 2, …, m. Therefore, the vector-valued max-cut problem
admits an approximation ratio of $\frac{1}{2}\left(\frac{2}{\pi}\right)^{\frac{3}{2}} n^{-\frac{1}{2}}$ by Theorem 7.3.1.
If the weights on the edges are positive semidefinite matrices (i.e., $W_{ij} \in \mathbb{R}^{m \times m}$,
$W_{ij} \succeq 0$ for all $(i, j) \in E$), then the matrix-valued max-cut problem can also be
formulated as
\[
\max\; \lambda_{\max}\left(\sum_{1 \le i, j \le n} x_i x_j W'_{ij}\right) \quad \text{s.t. } x \in B^n,
\]
where $W'_{ij}$ is defined similarly to (7.6); or equivalently,
\[
\max\; y^{\mathrm{T}} \left(\sum_{1 \le i, j \le n} x_i x_j W'_{ij}\right) y
\quad \text{s.t. } x \in B^n,\; y \in S^m,
\]
which is the model (HBS) with d = d' = 2. Similar to the vector-valued case, by the
diagonal dominance property and Theorem 7.3.2, the above problem admits an approximation
ratio of $\frac{1}{4}\left(\frac{2}{\pi}\right)^{\frac{3}{2}} (mn)^{-\frac{1}{2}}$. Notice that Theorem 7.3.2 only asserts a relative
approximation ratio; however, for this problem the optimal value of its minimization
counterpart is obviously nonnegative, and thus a relative approximation ratio implies a
usual approximation ratio.
Chapter 8
Conclusion and Recent Developments
This thesis discusses various subclasses of polynomial optimization problems, with a
focus on deriving polynomial-time approximation algorithms with worst-case performance
guarantees. These subclasses cover many frequently encountered constraints
in the literature, such as the Euclidean spherical constraints, the Euclidean ball
constraints, the ellipsoidal constraints, the binary constraints, and mixtures of them. The
objective functions range from multilinear tensor functions and homogeneous polynomials
to general inhomogeneous polynomials. Multilinear tensor function optimization plays
the key role in these algorithms, whose ideas are based on lower order multilinear form
relaxations and decomposition routines. Connections between multilinear functions,
homogeneous polynomials, and inhomogeneous polynomials are established so as to
preserve the approximation ratios. All the approximation results are listed in Table 8.1. The
applications of these polynomial optimization models are discussed, which opens the
door to many potential modeling opportunities. Reports on numerical tests show
that the proposed algorithms are actually very effective and typically produce
high quality solutions; the worst-case performance analysis offers a theoretical 'safety
net', which is usually far from the typical performance. Table 8.1 summarizes the whole
structure of the thesis and the approximation ratios.
Table 8.1: Thesis organization and theoretical approximation ratios

Section  Model  Theorem(s)    Approximation performance ratio
3.2      (TS)   3.2.4         $\left(\prod_{k=1}^{d-2} n_k\right)^{-1/2}$
3.3      (TQ)   3.3.4         $\left(\prod_{k=1}^{d-2} n_k\right)^{-1/2} \Omega\left(\log^{-(d-1)} \max_{1 \le k \le d} m_k\right)$
4.2      (HS)   4.2.2, 4.2.4  $d!\, d^{-d}\, n^{-\frac{d-2}{2}}$
4.3      (HQ)   4.3.1, 4.3.2  $d!\, d^{-d}\, n^{-\frac{d-2}{2}}\, \Omega\left(\log^{-(d-1)} m\right)$
4.4.1    (MS)   4.4.2, 4.4.4  $\left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}}\right)\left(\frac{\prod_{k=1}^{s-1} n_k^{d_k}}{n_{s-1}}\right)^{-1/2}$;  $\left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}}\right)\left(\frac{\prod_{k=1}^{s} n_k^{d_k}}{n_s^{2}}\right)^{-1/2}$
4.4.2    (MQ)   4.4.5, 4.4.6  $\left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}}\right)\left(\frac{\prod_{k=1}^{s-1} n_k^{d_k}}{n_{s-1}}\right)^{-1/2} \Omega\left(\log^{-(d-1)} \max_{1 \le k \le s} m_k\right)$;  $\left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}}\right)\left(\frac{\prod_{k=1}^{s} n_k^{d_k}}{n_s^{2}}\right)^{-1/2} \Omega\left(\log^{-(d-1)} \max_{1 \le k \le s} m_k\right)$
5.2      (PS)   5.2.2         $2^{-\frac{5d}{2}} (d+1)!\, d^{-2d} (n+1)^{-\frac{d-2}{2}}$
5.3      (PQ)   5.3.1         $2^{-\frac{5d}{2}} (d+1)!\, d^{-2d} (n+1)^{-\frac{d-2}{2}}\, \Omega\left(\log^{-(d-1)} m\right)$
5.4      (PG)   5.4.2, 5.4.3  $2^{-2d} (d+1)!\, d^{-2d} (n+1)^{-\frac{d-2}{2}} (t^2+1)^{-\frac{d}{2}}$
6.2      (TB)   6.2.1         $\left(\frac{2}{\pi}\right)^{d-1} \ln\left(1+\sqrt{2}\right) \left(\prod_{k=1}^{d-2} n_k\right)^{-1/2}$
6.3      (HB)   6.3.2, 6.3.3  $\left(\frac{2}{\pi}\right)^{d-1} \ln\left(1+\sqrt{2}\right) d!\, d^{-d}\, n^{-\frac{d-2}{2}}$
6.4      (MB)   6.4.1, 6.4.2  $\left(\frac{2}{\pi}\right)^{d-1} \ln\left(1+\sqrt{2}\right) \left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}}\right)\left(\frac{\prod_{k=1}^{s-1} n_k^{d_k}}{n_{s-1}}\right)^{-1/2}$;  $\left(\frac{2}{\pi}\right)^{d-1} \ln\left(1+\sqrt{2}\right) \left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}}\right)\left(\frac{\prod_{k=1}^{s} n_k^{d_k}}{n_s^{2}}\right)^{-1/2}$
6.5      (PB)   6.5.2         $\frac{\ln\left(1+\sqrt{2}\right)}{2(1+e)\pi^{d-1}} (d+1)!\, d^{-2d} (n+1)^{-\frac{d-2}{2}}$
7.2      (TBS)  7.2.2         $\left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} \left(\prod_{k=1}^{d-1} n_k \prod_{\ell=1}^{d'-1} m_\ell\right)^{-1/2}$
7.3      (HBS)  7.3.1, 7.3.2  $\left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} d!\, d^{-d}\, d'!\, d'^{-d'}\, n^{-\frac{d-1}{2}} m^{-\frac{d'-1}{2}}$
7.4      (MBS)  7.4.1, 7.4.2  $\left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} \left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}} \prod_{\ell=1}^{t} \frac{d'_\ell!}{{d'_\ell}^{d'_\ell}}\right)\left(\frac{\prod_{k=1}^{s} n_k^{d_k} \prod_{\ell=1}^{t} m_\ell^{d'_\ell}}{n_s m_t}\right)^{-1/2}$
Most of the results presented in this thesis have been documented and submitted for
publication in the research papers [47, 48, 49], which are all joint works with He and
Zhang. Chapter 3 and Chapter 4 are mainly based on [47], Chapter 5 is mainly based
on [48], and Chapter 6 and Chapter 7 are mainly based on [49]. The results not only
enhance approximation algorithms for high degree polynomial optimization, but
also open up a wide range of new research topics for modeling and novel solution
methods. This research has attracted some follow-up studies on the topic. For
instance, So [108] improved the approximation ratios of the models (TS) and (HS) to
$\Omega\left(\prod_{k=1}^{d-2} \sqrt{\frac{\log n_k}{n_k}}\right)$ and $\Omega\left(\left(\frac{\log n}{n}\right)^{\frac{d-2}{2}}\right)$, respectively. Very recently, He et al. [46]
proposed some fairly simple randomized approaches, which improved the approximation
ratios of homogeneous polynomial optimization with spherical constraints and/or
binary constraints; the orders of these ratios are comparable to those in [108]. Apart
from the improvements of the approximation ratios, Chen et al. [25] established a
tightness result for the multilinear form relaxation of the model (HS), and also derived
some local improvement algorithms for solving general polynomial optimization
problems, especially the model (PQ). Meanwhile, many other research topics are
currently under investigation, including the extension of polynomial optimization to
complex variables; the minimization counterparts of the models discussed in this thesis;
inapproximability results for these models; and, of course, issues arising from practical
applications of these models.
Bibliography
[1] K. Aardal, C. A. J. Hurkens, and A. K. Lenstra, Solving a System of Linear Diophantine Equations with Lower and Upper Bounds on the Variables, Mathematics of Operations Research, 25, 427–442, 2000.
[2] J. L. R. Alfonsín, The Diophantine Frobenius Problem, Oxford University Press, Oxford, UK, 2005.
[3] N. Alon, W. F. de la Vega, R. Kannan, and M. Karpinski, Random Sampling and Approximation of MAX-CSP Problems, Proceedings of the 34th Annual ACM Symposium on Theory of Computing, 232–239, 2002.
[4] N. Alon, K. Makarychev, Y. Makarychev, and A. Naor, Quadratic Forms on Graphs, Inventiones Mathematicae, 163, 499–522, 2006.
[5] N. Alon and A. Naor, Approximating the Cut-Norm via Grothendieck's Inequality, SIAM Journal on Computing, 35, 787–803, 2006.
[6] N. Ansari and E. Hou, Computational Intelligence for Optimization, Kluwer Academic Publishers, Norwell, MA, 1997.
[7] S. Arora, E. Berger, E. Hazan, G. Kindler, and M. Safra, On Non-Approximability for Quadratic Programs, Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, 206–215, 2005.
[8] E. Artin, Über die Zerlegung Definiter Funktionen in Quadrate, Abhandlungen aus dem Mathematischen Seminar der Universität Hamburg, 5, 100–115, 1927.
[9] A. Atamtürk, G. L. Nemhauser, and M. W. P. Savelsbergh, Conflict Graphs in Solving Integer Programming Problems, European Journal of Operational Research, 121, 40–55, 2000.
[10] G. M. de Athayde and R. G. Flores, Jr., Incorporating Skewness and Kurtosis in Portfolio Optimization: A Multidimensional Efficient Set, in S. Satchell and A. Scowcroft (Eds.), Advances in Portfolio Construction and Implementation, Butterworth-Heinemann, Oxford, UK, Chapter 10, 243–257, 2003.
[11] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and M. Protasi, Complexity and Approximation: Combinatorial Optimization Problems and their Approximability Properties, Springer-Verlag, Berlin, Germany, 1999.
[12] M. L. Balinski, On a Selection Problem, Management Science, 17, 230–231, 1970.
[13] I. Bárány and Z. Füredi, Computing the Volume Is Difficult, Discrete & Computational Geometry, 2, 319–326, 1987.
[14] A. Barmpoutis, B. Jian, B. C. Vemuri, and T. M. Shepherd, Symmetric Positive 4th Order Tensors & Their Estimation from Diffusion Weighted MRI, Proceedings of the 20th International Conference on Information Processing in Medical Imaging, 308–319, 2007.
[15] A. Barvinok, Integration and Optimization of Multivariate Polynomials by Restriction onto a Random Subspace, Foundations of Computational Mathematics, 7, 229–244, 2006.
[16] D. Beihoffer, J. Hendry, A. Nijenhuis, and S. Wagon, Faster Algorithms for Frobenius Numbers, The Electronic Journal of Combinatorics, 12, R27, 2005.
[17] S. J. Benson and Y. Ye, Algorithm 875: DSDP5—Software for Semidefinite Programming, ACM Transactions on Mathematical Software, 34, Article 16, 2008.
[18] B. Bernhardsson and J. Peetre, Singular Values of Trilinear Forms, Experimental Mathematics, 10, 509–517, 2001.
[19] B. Borchers, CSDP, A C Library for Semidefinite Programming, Optimization Methods and Software, 11, 613–623, 1999.
[20] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004.
[21] J. Bruck and M. Blaum, Neural Networks, Error-Correcting Codes, and Polynomials over the Binary n-Cube, IEEE Transactions on Information Theory, 35, 976–987, 1989.
[22] P. H. Calamai and J. J. Moré, Projected Gradient Methods for Linearly Constrained Problems, Mathematical Programming, 39, 93–116, 1987.
[23] J. D. Carroll and J.-J. Chang, Analysis of Individual Differences in Multidimensional Scaling via an N-Way Generalization of "Eckart-Young" Decomposition, Psychometrika, 35, 283–319, 1970.
[24] M. Charikar and A. Wirth, Maximizing Quadratic Programs: Extending Grothendieck's Inequality, Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, 54–60, 2004.
[25] B. Chen, S. He, Z. Li, and S. Zhang, Maximum Block Improvement and Polynomial Optimization, Technical Report, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, 2011.
[26] G. Cornuéjols and M. Dawande, A Class of Hard Small 0-1 Programs, INFORMS Journal on Computing, 11, 205–210, 1999.
[27] G. Dahl, J. M. Leinaas, J. Myrheim, and E. Ovrum, A Tensor Product Matrix Approximation Problem in Quantum Physics, Linear Algebra and Its Applications, 420, 711–725, 2007.
[28] L. De Lathauwer, B. De Moor, and J. Vandewalle, A Multilinear Singular Value Decomposition, SIAM Journal on Matrix Analysis and Applications, 21, 1253–1278, 2000.
[29] L. De Lathauwer, B. De Moor, and J. Vandewalle, On the Best Rank-1 and Rank-(R1, R2, …, RN) Approximation of Higher-Order Tensors, SIAM Journal on Matrix Analysis and Applications, 21, 1324–1342, 2000.
[30] C. N. Delzell, A Continuous, Constructive Solution to Hilbert's 17th Problem, Inventiones Mathematicae, 76, 365–384, 1984.
[31] M. Dyer, A. M. Frieze, and R. Kannan, A Random Polynomial-Time Algorithm for Approximating the Volume of Convex Bodies, Journal of the ACM, 38, 1–17, 1991.
[32] U. Feige, Relations between Average Case Complexity and Approximation Complexity, Proceedings of the 34th Annual ACM Symposium on Theory of Computing, 534–543, 2002.
[33] U. Feige, J. H. Kim, and E. Ofek, Witnesses for Non-Satisfiability of Dense Random 3CNF Formulas, Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, 497–508, 2006.
[34] U. Feige and E. Ofek, Easily Refutable Subformulas of Large Random 3CNF Formulas, Theory of Computing, 3, 25–43, 2007.
[35] J. Friedman, A. Goerdt, and M. Krivelevich, Recognizing More Unsatisfiable Random k-SAT Instances Efficiently, SIAM Journal on Computing, 35, 408–430, 2005.
[36] A. M. Frieze and R. Kannan, Quick Approximation to Matrices and Applications, Combinatorica, 19, 175–200, 1999.
[37] K. Fujisawa, M. Kojima, K. Nakata, and M. Yamashita, SDPA (SemiDefinite Programming Algorithm) User's Manual—Version 6.2.0, Research Report B-308, Department of Mathematical and Computing Sciences, Tokyo Institute of Technology, Tokyo, Japan, 1995.
[38] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Company, New York, NY, 1979.
[39] A. Ghosh, E. Tsigaridas, M. Descoteaux, P. Comon, B. Mourrain, and R. Deriche, A Polynomial Based Approach to Extract the Maxima of an Antipodally Symmetric Spherical Function and Its Application to Extract Fiber Directions from the Orientation Distribution Function in Diffusion MRI, Proceedings of the 11th International Conference on Medical Image Computing and Computer Assisted Intervention, 237–248, 2008.
[40] M. X. Goemans and D. P. Williamson, Improved Approximation Algorithms for Maximum Cut and Satisfiability Problems Using Semidefinite Programming, Journal of the ACM, 42, 1115–1145, 1995.
[41] M. Grant and S. Boyd, CVX: Matlab Software for Disciplined Convex Programming, Version 1.2, http://cvxr.com/cvx, 2010.
[42] L. Gurvits, Classical Deterministic Complexity of Edmonds' Problem and Quantum Entanglement, Proceedings of the 35th Annual ACM Symposium on Theory of Computing, 10–19, 2003.
[43] P. L. Hammer and S. Rudeanu, Boolean Methods in Operations Research, Springer-Verlag, New York, NY, 1968.
[44] P. Hansen, Methods of Nonlinear 0-1 Programming, Annals of Discrete Mathematics, 5, 53–70, 1979.
[45] R. A. Harshman, Foundations of the PARAFAC Procedure: Models and Conditions for an "Explanatory" Multi-Modal Factor Analysis, UCLA Working Papers in Phonetics, 16, 1–84, 1970.
[46] S. He, B. Jiang, Z. Li, and S. Zhang, Probability Bounds for Polynomial Functions in Random Variables, Technical Report, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, 2011.
[47] S. He, Z. Li, and S. Zhang, Approximation Algorithms for Homogeneous Polynomial Optimization with Quadratic Constraints, Mathematical Programming, Series B, 125, 353–383, 2010.
[48] S. He, Z. Li, and S. Zhang, General Constrained Polynomial Optimization: An Approximation Approach, Technical Report, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, 2010.
[49] S. He, Z. Li, and S. Zhang, Approximation Algorithms for Discrete Polynomial Optimization, Technical Report, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, 2010.
[50] S. He, Z.-Q. Luo, J. Nie, and S. Zhang, Semidefinite Relaxation Bounds for Indefinite Homogeneous Quadratic Optimization, SIAM Journal on Optimization, 19, 503–523, 2008.
[51] C. Helmberg, Semidefinite Programming for Combinatorial Optimization, ZIB-Report 00-34, Konrad-Zuse-Zentrum für Informationstechnik Berlin, Berlin, Germany, 2000.
[52] D. Henrion and J. B. Lasserre, GloptiPoly: Global Optimization over Polynomials with Matlab and SeDuMi, ACM Transactions on Mathematical Software, 29, 165–194, 2003.
[53] D. Henrion, J. B. Lasserre, and J. Löfberg, GloptiPoly 3: Moments, Optimization and Semidefinite Programming, Optimization Methods and Software, 24, 761–779, 2009.
[55] F. L. Hitchcock, The Expression of a Tensor or a Polyadic as a Sum of Products,Journal of Mathematical Physics, 6, 164–189, 1927. 44
[56] F. L. Hitchcock, Multilple Invariants and Generalized Rank of a p-Way Matrixor Tensor, Journal of Mathematical Physics, 6, 39–79, 2007. 44
[57] P. M. J. van den Hof, C. Scherer, and P. S. C. Heuberger, Model-Based Con-trol: Bridging Rigorous Theory and Advanced Technology, 49–68. Springer-Verlag,Berlin, Germany, 2009. 7
[58] J. J. Hopfield and D. W. Tank, “Neural” Computation of Decisions in Optimiza-tion Problem, Biological Cybernetics, 52, 141–152, 1985. 5
[59] Y. Huang and S. Zhang, Approximation Algorithms for Indefinite ComplexQuadratic Maximization Problems, Science China Mathematics, 53, 2697–2708,2010. 98
[60] E. Jondeau and M. Rockinger, Optimal Portfolio Allocation under Higher Mo-ments, European Financial Management, 12, 29–55, 2006. 3, 92
[61] V. Kann, On the Approximability of NP-Complete Optimization Problems, Ph.D.Dissertation, Royal Institute of Technology, Stockholm, Sweden, 1992. 22
[62] R. Kannan, Spectral Methods for Matrices and Tensors, Proceedings of the 42ndAnnual ACM Symposium on Theory of Computing, 1–12, 2010. 115
[63] S. Khot and A. Naor, Linear Equations Modulo 2 and the L1 Diameter of ConvexBodies, Proceedings of the 48th Annual IEEE Symposium on Foundations ofComputer Science, 318–328, 2007. 5, 9, 98, 100, 103
[64] P. M. Kleniati, P. Parpas, and B. Rustem Partitioning Procedure for PolynomialOptimization: Application to Portfolio Decisions with Higher Order Moments,COMISEF Working Papers Series, WPS-023, 2009. 3, 4, 92
[65] E. de Klerk, The Complexity of Optimizing over a Simplex, Hypercube or Sphere:A Short Survey, Central European Journal of Operations Research, 16, 111–125,2008. 6
[66] E. de Klerk, M. Laurent, and P. A. Parrilo, A PTAS for the Minimization ofPolynomials of Fixed Degree over the Simplex, Theoretical Computer Science,261, 210–225, 2006. 6, 9
[67] E. Kofidis and Ph. Regalia, On the Best Rank-1 Approximation of Higher OrderSupersymmetric Tensors, SIAM Journal on Matrix Analysis and Applications,23, 863–884, 2002. 3, 12, 44, 45
[68] T. G. Kolda and B. W. Bader, Tensor Decompositions and Applications, SIAMReview, 51, 455–500, 2009. 2, 3, 18, 44, 79, 120
[69] A. Kroo and J. Szabados, Joackson-Type Theorems in Homogeneous Approxima-tion, Journal of Approximation Theory, 152, 1–19, 2008. 3, 30
[70] J. B. Lasserre, Global Optimization with Polynomials and the Problem of Mo-ments, SIAM Journal on Optimization, 11, 796–817, 2001. 2, 7, 70
[71] J. B. Lasserre, Polynomials Nonnegative on a Grid and Discrete Representations, Transactions of the American Mathematical Society, 354, 631–649, 2002.
[72] M. Laurent, Sums of Squares, Moment Matrices and Optimization over Polynomials, in M. Putinar and S. Sullivant (Eds.), Emerging Applications of Algebraic Geometry, The IMA Volumes in Mathematics and Its Applications, 149, 1–114, 2009.
[73] C. Ling, J. Nie, L. Qi, and Y. Ye, Biquadratic Optimization over Unit Spheres and Semidefinite Programming Relaxations, SIAM Journal on Optimization, 20, 1286–1310, 2009.
[74] C. Ling, X. Zhang, and L. Qi, Semidefinite Relaxation Approximation for Multivariate Bi-Quadratic Optimization with Quadratic Constraints, Numerical Linear Algebra with Applications, DOI: 10.1002/nla.781, 2011.
[75] Z.-Q. Luo, N. D. Sidiropoulos, P. Tseng, and S. Zhang, Approximation Bounds for Quadratic Optimization with Homogeneous Quadratic Constraints, SIAM Journal on Optimization, 18, 1–28, 2007.
[76] Z.-Q. Luo, J. F. Sturm, and S. Zhang, Multivariate Nonnegative Quadratic Mappings, SIAM Journal on Optimization, 14, 1140–1162, 2004.
[77] Z.-Q. Luo and S. Zhang, A Semidefinite Relaxation Scheme for Multivariate Quartic Polynomial Optimization with Quadratic Constraints, SIAM Journal on Optimization, 20, 1716–1736, 2010.
[78] B. B. Mandelbrot and R. L. Hudson, The (Mis)Behavior of Markets: A Fractal View of Risk, Ruin, and Reward, Basic Books, New York, NY, 2004.
[79] B. Maricic, Z.-Q. Luo, and T. N. Davidson, Blind Constant Modulus Equalization via Convex Optimization, IEEE Transactions on Signal Processing, 51, 805–818, 2003.
[80] D. Maringer and P. Parpas, Global Optimization of Higher Order Moments in Portfolio Selection, Journal of Global Optimization, 43, 219–230, 2009.
[81] H. M. Markowitz, Portfolio Selection, Journal of Finance, 7, 79–91, 1952.
[82] C. A. Micchelli and P. Olsen, Penalized Maximum-Likelihood Estimation, the Baum-Welch Algorithm, Diagonal Balancing of Symmetric Matrices and Applications to Training Acoustic Data, Journal of Computational and Applied Mathematics, 119, 301–331, 2000.
[83] G. L. Miller, Riemann's Hypothesis and Tests for Primality, Journal of Computer and System Sciences, 13, 300–317, 1976.
[84] B. Mourrain and J. P. Pavone, Subdivision Methods for Solving Polynomial Equations, Journal of Symbolic Computation, 44, 292–306, 2009.
[85] B. Mourrain and P. Trébuchet, Generalized Normal Forms and Polynomial System Solving, Proceedings of the 2005 International Symposium on Symbolic and Algebraic Computation, 253–260, 2005.
[86] A. Nemirovski, Lectures on Modern Convex Optimization, The H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 2005.
[87] A. Nemirovski, C. Roos, and T. Terlaky, On Maximization of Quadratic Form over Intersection of Ellipsoids with Common Center, Mathematical Programming, Series A, 86, 463–473, 1999.
[88] Yu. Nesterov, Semidefinite Relaxation and Nonconvex Quadratic Optimization, Optimization Methods and Software, 9, 141–160, 1998.
[89] Yu. Nesterov, Squared Functional Systems and Optimization Problems, in H. Frenk, K. Roos, T. Terlaky, and S. Zhang (Eds.), High Performance Optimization, Kluwer Academic Press, Dordrecht, The Netherlands, 405–440, 2000.
[90] Yu. Nesterov, Random Walk in a Simplex and Quadratic Optimization over Convex Polytopes, CORE Discussion Paper 2003/71, Université catholique de Louvain, Louvain-la-Neuve, Belgium, 2003.
[91] Q. Ni, L. Qi, and F. Wang, An Eigenvalue Method for Testing Positive Definiteness of a Multivariate Form, IEEE Transactions on Automatic Control, 53, 1096–1107, 2008.
[92] P. Parpas and B. Rustem, Global Optimization of the Scenario Generation and Portfolio Selection Problems, Proceedings of the International Conference on Computational Science and Its Applications, 908–917, 2006.
[93] P. A. Parrilo, Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization, Ph.D. Dissertation, California Institute of Technology, Pasadena, CA, 2000.
[94] P. A. Parrilo, Semidefinite Programming Relaxations for Semialgebraic Problems, Mathematical Programming, Series B, 96, 293–320, 2003.
[95] L. Peng and M. W. Wong, Compensated Compactness and Paracommutators, Journal of the London Mathematical Society, 62, 505–520, 2000.
[96] A. J. Prakash, C.-H. Chang, and T. E. Pactwa, Selecting a Portfolio with Skewness: Recent Evidence from US, European, and Latin American Equity Markets, Journal of Banking & Finance, 27, 1375–1390, 2003.
[97] M. Purser, Introduction to Error-Correcting Codes, Artech House, Norwood, MA, 1995.
[98] L. Qi, Extrema of a Real Polynomial, Journal of Global Optimization, 30, 405–433, 2004.
[99] L. Qi, Eigenvalues of a Real Supersymmetric Tensor, Journal of Symbolic Computation, 40, 1302–1324, 2005.
[100] L. Qi, Eigenvalues and Invariants of Tensors, Journal of Mathematical Analysis and Applications, 325, 1363–1377, 2007.
[101] L. Qi and K. L. Teo, Multivariate Polynomial Minimization and Its Applications in Signal Processing, Journal of Global Optimization, 26, 419–433, 2003.
[102] L. Qi, Z. Wan, and Y.-F. Yang, Global Minimization of Normal Quadratic Polynomials Based on Global Descent Directions, SIAM Journal on Optimization, 15, 275–302, 2004.
[103] L. Qi, F. Wang, and Y. Wang, Z-eigenvalue Methods for a Global Polynomial Optimization Problem, Mathematical Programming, Series A, 118, 301–316, 2009.
[104] M. O. Rabin, Probabilistic Algorithms, in J. F. Traub (Ed.), Algorithms and Complexity: New Directions and Recent Results, Academic Press, New York, NY, 21–39, 1976.
[105] M. O. Rabin, Probabilistic Algorithm for Testing Primality, Journal of Number Theory, 12, 128–138, 1980.
[106] J. M. W. Rhys, A Selection Problem of Shared Fixed Costs and Network Flows, Management Science, 17, 200–207, 1970.
[107] A. P. Roberts and M. M. Newmann, Polynomial Optimization of Stochastic Feedback Control for Stable Plants, IMA Journal of Mathematical Control & Information, 5, 243–257, 1988.
[108] A. M.-C. So, Deterministic Approximation Algorithms for Sphere Constrained Homogeneous Polynomial Optimization Problems, Mathematical Programming, Series B, DOI: 10.1007/s10107-011-0464-0, 2011.
[109] A. M.-C. So, Y. Ye, and J. Zhang, A Unified Theorem on SDP Rank Reduction, Mathematics of Operations Research, 33, 910–920, 2008.
[110] S. Soare, J. W. Yoon, and O. Cazacu, On the Use of Homogeneous Polynomials to Develop Anisotropic Yield Functions with Applications to Sheet Forming, International Journal of Plasticity, 24, 915–944, 2008.
[111] R. M. Solovay and V. Strassen, A Fast Monte-Carlo Test for Primality, SIAM Journal on Computing, 6, 84–85, 1977.
[112] J. F. Sturm, SeDuMi 1.02, A Matlab Toolbox for Optimization over Symmetric Cones, Optimization Methods and Software, 11 & 12, 625–653, 1999.
[113] J. F. Sturm and S. Zhang, On Cones of Nonnegative Quadratic Functions, Mathematics of Operations Research, 28, 246–267, 2003.
[114] W. Sun and Y.-X. Yuan, Optimization Theory and Methods: Nonlinear Programming, Springer-Verlag, New York, NY, 2006.
[115] K. C. Toh, M. J. Todd, and R. H. Tütüncü, SDPT3—A Matlab Software Package for Semidefinite Programming, Version 1.3, Optimization Methods and Software, 11, 545–581, 1999.
[116] L. Vandenberghe and S. Boyd, Semidefinite Programming, SIAM Review, 38, 49–95, 1996.
[117] P. P. Varjú, Approximation by Homogeneous Polynomials, Constructive Approximation, 26, 317–337, 2007.
[118] Y. Ye, Approximating Quadratic Programming with Bound and Quadratic Constraints, Mathematical Programming, 84, 219–226, 1999.
[119] Y. Ye, Approximating Global Quadratic Optimization with Convex Quadratic Constraints, Journal of Global Optimization, 15, 1–17, 1999.
[120] S. Zhang, Quadratic Maximization and Semidefinite Relaxation, Mathematical Programming, Series A, 87, 453–465, 2000.
[121] S. Zhang and Y. Huang, Complex Quadratic Optimization and Semidefinite Programming, SIAM Journal on Optimization, 16, 871–890, 2006.
[122] T. Zhang and G. H. Golub, Rank-One Approximation to High Order Tensors, SIAM Journal on Matrix Analysis and Applications, 23, 534–550, 2001.
[123] X. Zhang, C. Ling, and L. Qi, Semidefinite Relaxation Bounds for Bi-Quadratic Optimization Problems with Quadratic Constraints, Journal of Global Optimization, 49, 293–311, 2011.