Polynomial Optimization Problems
— Approximation Algorithms and Applications
LI, Zhening
A Thesis Submitted in Partial Fulfillment
of the Requirements for the Degree of
Doctor of Philosophy
in
Systems Engineering and Engineering Management
The Chinese University of Hong Kong
June 2011
Thesis/Assessment Committee
Professor Duan Li (Chair)
Professor Shuzhong Zhang (Thesis Advisor)
Professor Anthony Man-Cho So (Committee Member)
Professor Yinyu Ye (External Examiner)
Abstract
A polynomial optimization problem is to optimize a generic multivariate polynomial function, subject to suitable polynomial equality and inequality constraints. Such problem formulations date back to the 19th century, when the relationship between nonnegative polynomials and sums of squares was discussed by Hilbert. Polynomial
optimization is one of the fundamental problems in the field of optimization, and has
applications in a large range of areas, including biomedical engineering, control theory,
graph theory, investment science, material science, numerical linear algebra, quantum
mechanics, signal processing, speech recognition, etc. This thesis presents a study
of some important subclasses of polynomial optimization problems arising from vari-
ous applications. The focus is on optimizing a high degree polynomial function, over
some commonly encountered constraint sets, such as the Euclidean ball, the Euclidean
sphere, the intersection of co-centered ellipsoids, the binary hypercube, as well as a
combination of them. Specifically, five classes of models are discussed, i.e., optimiz-
ing a multilinear function with quadratic constraints, a homogeneous polynomial with
quadratic constraints, a general polynomial with convex constraints, a general polyno-
mial with binary constraints, and a homogeneous polynomial with binary and spherical
constraints. All the problems under consideration are NP-hard in general. The main
contribution of this thesis is on the design and analysis of polynomial-time approxima-
tion algorithms with guaranteed worst-case performance ratios. These approximation
ratios are dependent on the problem dimensions only, and the new results improve
some of the existing results in the literature. In each class of these optimization mod-
els, some application examples are discussed and results of numerical experiments are
reported, revealing good practical performance of the proposed algorithms for solving
some randomly generated test instances.
Acknowledgement
I would never have finished this work, or even started it, if it were not for my father, who
instilled in me a passion for learning and striving. Almost eight years have passed since my father's sudden passing, and he has always stayed in my heart, supporting
and encouraging me during the whole journey of my doctoral work. All the difficulties
have become quite manageable, and my confidence in myself has never been lost.
It is a great fortune to have Professor Shuzhong Zhang as my doctoral advisor, who
got me excited about various interesting research topics, as well as moralities and other
human qualities. I would like to express my heartfelt thanks to him, for his guidance, constant support and much help when in need, as well as for the many enjoyable hours of discussions that we have had over the last four years. I am greatly indebted to him.
I would like to express my gratitude to those who gave me the possibility to complete this work, especially to my former colleague, Professor Simai He, from whom I have benefited tremendously through his insightful comments, valuable suggestions and efforts in discussing technical problems. I would also like to thank Professors Duan Li, Anthony So, and Yinyu Ye for serving on my thesis committee.
I would also like to extend my gratitude to the professors, supporting staff members and fellow postgraduate students at the Department of Systems Engineering and Engineering Management for providing me with technical and non-technical support.
It has been a great pleasure to work with them. Besides, this work would not have
been possible without the general support from The Chinese University of Hong Kong.
Finally, I would like to express my deepest gratitude to the three most important
women in my family, whose love and support enabled me to complete this work. To my
dear mother: thank you for continuously supporting and taking care of me; to my dear wife: thank you for your patience, trust and encouragement; and to my dear daughter:
thank you for bringing me so much indelible happiness, I owe you a great deal!
This work is dedicated to my father
Contents
Abstract
Acknowledgement
1 Introduction
1.1 History
1.2 Applications
1.3 Algorithms
1.4 Main Contributions
2 Notations and Preliminaries
2.1 Notations and Models
2.1.1 Objective Functions
2.1.2 Constraint Sets
2.1.3 Models and Organization
2.2 Tensor Operations
2.3 Approximation Algorithms
2.4 Randomized Algorithms
2.5 Semidefinite Programming
3 Multilinear Form Optimization with Quadratic Constraints
3.1 Introduction
3.2 Multilinear Form with Spherical Constraints
3.3 Multilinear Form with Ellipsoidal Constraints
3.4 Applications
3.4.1 Singular Values of Trilinear Forms
3.4.2 Rank-One Approximation of Tensors
3.5 Numerical Experiments
3.5.1 Randomly Simulated Data
3.5.2 Data with Known Optimal Solutions
4 Homogeneous Form Optimization with Quadratic Constraints
4.1 Introduction
4.2 Homogeneous Form with Spherical Constraint
4.3 Homogeneous Form with Ellipsoidal Constraints
4.4 Mixed Form with Quadratic Constraints
4.4.1 Mixed Form with Spherical Constraints
4.4.2 Mixed Form with Ellipsoidal Constraints
4.5 Applications
4.5.1 Eigenvalues and Approximation of Tensors
4.5.2 Density Approximation in Quantum Physics
4.6 Numerical Experiments
4.6.1 Randomly Simulated Data
4.6.2 Comparison with Sum of Squares Method
5 Polynomial Optimization with Convex Constraints
5.1 Introduction
5.2 Polynomial with Ball Constraint
5.2.1 Homogenization
5.2.2 Multilinear Form Relaxation
5.2.3 Homogenizing Components Adjustment
5.2.4 Feasible Solution Assembling
5.3 Polynomial with Ellipsoidal Constraints
5.4 Polynomial with General Convex Constraints
5.5 Applications
5.5.1 Portfolio Selection with Higher Moments
5.5.2 Sensor Network Localization
5.6 Numerical Experiments
5.6.1 Randomly Simulated Data
5.6.2 Local Improvements
6 Polynomial Optimization with Binary Constraints
6.1 Introduction
6.2 Multilinear Form with Binary Constraints
6.3 Homogeneous Form with Binary Constraints
6.4 Mixed Form with Binary Constraints
6.5 Polynomial with Binary Constraints
6.6 Applications
6.6.1 Cut-Norm of Tensors
6.6.2 Maximum Complete Satisfiability
6.6.3 Box-Constrained Diophantine Equation
6.7 Numerical Experiments
6.7.1 Randomly Simulated Data
6.7.2 Data of Low-Rank Tensors
7 Homogeneous Form Optimization with Mixed Constraints
7.1 Introduction
7.2 Multilinear Form with Binary and Spherical Constraints
7.3 Homogeneous Form with Binary and Spherical Constraints
7.4 Mixed Form with Binary and Spherical Constraints
7.5 Applications
7.5.1 Matrix Combinatorial Problem
7.5.2 Vector-Valued Maximum Cut
8 Conclusion and Recent Developments
Bibliography
Chapter 1
Introduction
The polynomial optimization problem is the following generic optimization model:

(POP)  min p(x)
       s.t. f_i(x) ≤ 0, i = 1, 2, . . . , m_1,
            g_j(x) = 0, j = 1, 2, . . . , m_2,
            x = (x_1, x_2, · · · , x_n)^T ∈ R^n,

where p(x), f_i(x) (i = 1, 2, . . . , m_1) and g_j(x) (j = 1, 2, . . . , m_2) are multivariate
polynomial functions. This problem is a fundamental model in the field of optimization,
and has applications in a wide range of areas. Many algorithms have been proposed
for subclasses of (POP ), and specialized software packages have been developed.
1.1 History
The modern history of polynomial optimization may date back to the 19th century when
the relationship between nonnegative polynomial function and the sum of squares of
polynomials was studied. Given a multivariate polynomial function that takes only
nonnegative values over the real numbers, can it be represented as a sum of squares
of polynomial functions? Hilbert [54] gave a concrete answer in 1888, asserting that the only cases in which every nonnegative polynomial is a sum of squares are: univariate polynomials; multivariate quadratic polynomials; and bivariate quartic polynomials. Later, Hilbert's 17th problem, one of the famous 23 problems posed in his celebrated 1900 address, asked whether every nonnegative polynomial can be expressed as a sum of squares of rational functions, i.e., a quotient of sums of squares. Given a multivariate
polynomial function that takes only nonnegative values over the real numbers, can it be
represented as a sum of squares of rational functions? This was solved in the affirmative by Artin [8] in 1927. A continuous and constructive algorithm was later found by Delzell [30] in 1984. About 10 years ago, Lasserre [70, 71] and Parrilo [93, 94] proposed a method called the sum of squares (SOS) to solve general polynomial optimization problems. The method is based on the fact that deciding whether a given polynomial
is a sum of squares can be reduced to the feasibility of a semidefinite program (SDP).
The SOS approach has a strong theoretical appeal, as it can in principle solve any
polynomial optimization problem to any given accuracy.
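To make this reduction concrete, here is a minimal hand-worked sketch (our own illustration, not a general SOS solver): with the monomial vector z = (1, x, x²)^T, a univariate polynomial p of degree 4 is a sum of squares exactly when p(x) = z^T Q z for some positive semidefinite Gram matrix Q, and searching for such a Q is an SDP feasibility problem.

    import numpy as np

    # Hand-constructed Gram matrix for p(x) = x^4 + 2x^2 + 1 with z = (1, x, x^2)^T.
    Q = np.array([[1.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0],
                  [1.0, 0.0, 1.0]])
    assert np.all(np.linalg.eigvalsh(Q) >= -1e-12)        # Q is positive semidefinite

    x = 1.7                                               # verify p(x) = z^T Q z at a point
    z = np.array([1.0, x, x**2])
    assert np.isclose(z @ Q @ z, x**4 + 2 * x**2 + 1)
    # Since Q = v v^T with v = (1, 0, 1)^T, this certifies p(x) = (1 + x^2)^2.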
1.2 Applications
Polynomial optimization has wide applications; to name a few areas: biomedical engineering, control theory, graph theory, investment science, material science, numerical linear algebra, quantum mechanics, signal processing, and speech recognition. It is basically impossible to list, even very partially, the success stories of (POP), simply due to the sheer volume of the literature. To motivate our study, below we shall nonetheless mention some sample applications to illustrate the usefulness of (POP).
Polynomial optimization has immediate applications in investment science. For instance, the celebrated mean-variance model was proposed by Markowitz [81] as early as 1952; there, the portfolio selection problem is modeled by minimizing the variance of the investments subject to a target return. In control theory, Roberts and Newmann [107]
studied polynomial optimization of stochastic feedback control for stable plants. In
diffusion magnetic resonance imaging (MRI), Barmpoutis et al. [14] presented a case
for the fourth order tensor approximation. In fact, there is a large class of (POP) models arising from tensor approximations and decompositions, which originated from applications in psychometrics and chemometrics (see an excellent survey by Kolda and Bader [68]). Polynomial optimization also has applications in signal processing.
Maricic et al. [79] proposed a quartic polynomial model for blind channel equalization
in digital communication, and Qi and Teo [101] conducted global optimization for high
degree polynomial minimization models arising from signal processing. In quantum physics, Dahl et al. [27] proposed a polynomial optimization model to verify whether a physical system is entangled or not, an important problem in the field.
Gurvits [42] showed that the entanglement verification is NP-hard in general. In fact,
the model discussed in [27] is related to the nonnegative quadratic mappings studied
by Luo et al. [76].
Among generic polynomial functions, homogeneous polynomials play an important
role in approximation theory (see e.g., two recent papers by Kroo and Szabados [69] and
Varju [117]). Essentially their results state that the homogeneous polynomial functions
are fairly ‘dense’ among continuous functions in a certain well-defined sense. As such,
optimization of homogeneous polynomials becomes important. As an example, Ghosh
et al. [39] formulated a fiber detection problem in diffusion MRI by maximizing a homogeneous polynomial function subject to the Euclidean spherical constraint, i.e.,

(HS)  max f(x)
      s.t. ‖x‖² = 1, x ∈ R^n.
The constraint of (HS) is a typical polynomial equality constraint. In this case, the degree of the homogeneous polynomial f(x) may be high. This particular model (HS) appears widely, as in the following examples. In material sciences, Soare et al. [110]
proposed some 4th, 6th and 8th order homogeneous polynomials to model the plastic
anisotropy of orthotropic sheet metal. In statistics, Micchelli and Olsen [82] considered
a maximum-likelihood estimation model in speech recognition. In numerical linear
algebra, (HS) is the formulation of an interesting problem: the eigenvalues of tensors
(see Qi [99, 100] and Ni et al. [91]). Another widely encountered application of (HS) concerns the best rank-one approximation of higher order tensors (see [67, 68]).
In fact, Markowitz’s mean-variance model [81] mentioned previously is also opti-
mization on a homogeneous polynomial, in particular, a quadratic form. Recently, an
intensified discussion on investment models involving more than the first two moments
(for instance to include the skewness and the kurtosis of the investment returns) have
been another source of inspiration underlying polynomial optimizations. Mandelbrot
and Hudson [78] made a strong case against a ‘normal view’ of the investment returns.
The use of higher moments in portfolio selection becomes quite necessary. Along that
line, several authors proposed investment models incorporating the higher moments,
e.g., de Athayde and Flore [10], Prakash et al. [96], Jondeau and Rockinger [60], and
Kleniati et al. [64]. However, in those models, the polynomial functions involved are no longer homogeneous. In particular, a very general model in [64] is

max  α ∑_{i=1}^n μ_i x_i − β ∑_{i,j=1}^n σ_{ij} x_i x_j + γ ∑_{i,j,k=1}^n ς_{ijk} x_i x_j x_k − δ ∑_{i,j,k,ℓ=1}^n κ_{ijkℓ} x_i x_j x_k x_ℓ
s.t. ∑_{i=1}^n x_i = 1,  x ≥ 0,  x ∈ R^n,
where (μ_i), (σ_{ij}), (ς_{ijk}) and (κ_{ijkℓ}) are the first four central moments of the given n assets. The nonnegative parameters α, β, γ, δ measure the investor's preference over the four moments, and they sum up to one, i.e., α + β + γ + δ = 1.
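As a small sketch of how such an objective is evaluated in practice (our own illustration with randomly generated moment data; all names are ours), each term is a tensor-vector contraction:

    import numpy as np

    def moment_objective(x, mu, sigma, sigma3, kappa, alpha, beta, gamma, delta):
        # alpha * sum mu_i x_i - beta * sum sigma_ij x_i x_j
        # + gamma * sum sigma3_ijk x_i x_j x_k - delta * sum kappa_ijkl x_i x_j x_k x_l
        return (alpha * mu @ x
                - beta * np.einsum('ij,i,j->', sigma, x, x)
                + gamma * np.einsum('ijk,i,j,k->', sigma3, x, x, x)
                - delta * np.einsum('ijkl,i,j,k,l->', kappa, x, x, x, x))

    n = 4
    rng = np.random.default_rng(0)
    mu, sigma = rng.random(n), rng.random((n, n))
    sigma3, kappa = rng.random((n, n, n)), rng.random((n, n, n, n))
    x = np.full(n, 1.0 / n)            # a feasible portfolio: nonnegative, sums to one
    print(moment_objective(x, mu, sigma, sigma3, kappa, 0.4, 0.3, 0.2, 0.1))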
Besides investment science, many other important applications of polynomial optimization involve an objective that is intrinsically inhomogeneous. Another example is the least squares formulation of the sensor network localization problem proposed in Luo and Zhang [77]. Specifically, the problem takes the form of

min  ∑_{i,j∈S} (‖x_i − x_j‖² − d_{ij}²)² + ∑_{i∈S, j∈A} (‖x_i − a_j‖² − d_{ij}²)²
s.t. x_i ∈ R³, i ∈ S,
where A and S denote the set of anchor nodes and sensor nodes respectively, d_{ij} (i ∈ S, j ∈ S ∪ A) are (possibly noisy) distance measurements, a_j (j ∈ A) denote the known positions of the anchor nodes, and x_i (i ∈ S) represent the positions of the sensor nodes to be estimated.
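A minimal sketch of this objective (our own illustration; data are randomly generated, and the objective vanishes at the true sensor positions when the measurements are noiseless):

    import numpy as np

    def localization_objective(X, anchors, d_ss, d_sa):
        # X: |S| x 3 sensor positions; anchors: |A| x 3 known anchor positions;
        # d_ss, d_sa: measured sensor-sensor and sensor-anchor distances.
        obj = 0.0
        for i in range(len(X)):
            for j in range(len(X)):
                obj += (np.sum((X[i] - X[j])**2) - d_ss[i, j]**2)**2
            for j in range(len(anchors)):
                obj += (np.sum((X[i] - anchors[j])**2) - d_sa[i, j]**2)**2
        return obj

    rng = np.random.default_rng(0)
    X_true, anchors = rng.random((5, 3)), rng.random((3, 3))
    d_ss = np.linalg.norm(X_true[:, None] - X_true[None, :], axis=2)
    d_sa = np.linalg.norm(X_true[:, None] - anchors[None, :], axis=2)
    print(localization_objective(X_true, anchors, d_ss, d_sa))   # 0.0 at the truth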
Apart from the continuous models discussed above, polynomial optimization over variables taking discrete values, in particular binary variables, is also widely studied: for example, maximizing a polynomial function over variables each taking the value 1 or −1, i.e.,

(PB)  max p(x)
      s.t. x_i ∈ {1, −1}, i = 1, 2, . . . , n.
This type of problem can be found in a great variety of application domains. Indeed, (PB) has been investigated extensively in the quadratic case, due to its connections to various graph partitioning problems, e.g., the maximum cut problem [40]. When the degree of the polynomial goes higher, the hypergraph max-cover problem is a well studied example. Given a hypergraph H = (V, E), with V being the set of vertices and E the set of hyperedges (subsets of V), where each hyperedge e ∈ E is associated with a real-valued weight w(e), the problem is to find a subset S of the vertex set V such that the total weight of the hyperedges covered by S is maximized. Denote by x_i ∈ {0, 1} (i = 1, 2, . . . , n) the indicator of whether vertex i is selected in S. The problem is then max_{x∈{0,1}^n} ∑_{e∈E} w(e) ∏_{i∈e} x_i. By the simple variable transformation x_i → (x_i + 1)/2, the problem is transformed into (PB), and vice versa, as the brute-force check below illustrates.
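A small brute-force check of this equivalence on a toy instance (our own illustration; the hyperedge weights are arbitrary):

    import itertools

    n = 4
    edges = {(0, 1): 2.0, (1, 2, 3): -1.0, (0, 3): 3.0}       # hyperedge -> weight

    def cover_value(x01):                                     # x in {0,1}^n
        return sum(w for e, w in edges.items() if all(x01[i] == 1 for i in e))

    def pb_value(x):                                          # x in {1,-1}^n
        value = 0.0
        for e, w in edges.items():
            prod = 1.0
            for i in e:
                prod *= (x[i] + 1) / 2                        # substitution x_i -> (x_i+1)/2
            value += w * prod
        return value

    best01 = max(cover_value(x) for x in itertools.product((0, 1), repeat=n))
    best_pb = max(pb_value(x) for x in itertools.product((1, -1), repeat=n))
    assert abs(best01 - best_pb) < 1e-9                       # same optimal value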
Note that the model (PB) is a fundamental problem in integer programming. As such it has received attention in the literature (see e.g., [43, 44]). It is also known as the Fourier support graph problem. Mathematically, a polynomial function p : {−1, 1}^n → R has the Fourier expansion p(x) = ∑_{S⊆{1,2,...,n}} p̂(S) ∏_{i∈S} x_i, whose support is also called the Fourier support graph. Assuming that p(x) has only succinctly (polynomially) many non-zero Fourier coefficients p̂(S), can we compute the maximum value of p(x) over the discrete hypercube {1, −1}^n, or alternatively can we find a good approximate solution in polynomial time? The latter question actually motivates the discrete polynomial optimization models studied in this thesis. In general, (PB) is closely related to finding the maximum weighted independent set in a graph. In fact, any instance of (PB) can be transformed into the maximum weighted independent set problem, and this is also the most commonly used technique in the literature for solving (PB) (see e.g., [12, 106]). The transformation uses the concept of a conflict graph of a 0-1 polynomial function; for details, one is referred to [21, 9]. Beyond its connection to graph problems, (PB) also has applications in neural networks [58, 21, 6], error-correcting codes [21, 97], etc. In fact, Bruck and Blaum [21] revealed a natural equivalence among the model (PB), maximum likelihood decoding of error-correcting codes, and finding the global maximum of a neural network. Recently, Khot and Naor [63] showed that it has applications in the problem of refuting random k-CNF formulas [32, 35, 33, 34].
If the objective polynomial function in (PB) is homogeneous, the quadratic case has likewise been studied extensively, e.g., [40, 88, 90, 5], and the homogeneous cubic case is discussed by Khot and Naor [63]. Another interesting problem of this class is the ∞ ↦ 1-norm of a matrix F = (F_{ij}), studied by Alon and Naor [5], i.e.,

‖F‖_{∞↦1} = max  ∑_{1≤i≤n_1, 1≤j≤n_2} F_{ij} x_i y_j
             s.t. x ∈ {1, −1}^{n_1}, y ∈ {1, −1}^{n_2}.

It is quite natural to extend the ∞ ↦ 1-norm to higher order tensors. In particular, the ∞ ↦ 1-norm of a d-th order tensor F = (F_{i_1 i_2···i_d}) can be defined as

max  ∑_{1≤i_1≤n_1, 1≤i_2≤n_2, ··· , 1≤i_d≤n_d} F_{i_1 i_2···i_d} x^1_{i_1} x^2_{i_2} ··· x^d_{i_d}
s.t. x^k ∈ {1, −1}^{n_k}, k = 1, 2, . . . , d.
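For intuition, the ∞ ↦ 1-norm of a small matrix can be computed by brute force over the two binary hypercubes (our own sketch; the enumeration is exponential and serves only as an illustration):

    import itertools
    import numpy as np

    def inf_to_one_norm(F):
        n1, n2 = F.shape
        best = -np.inf
        for x in itertools.product((1, -1), repeat=n1):
            for y in itertools.product((1, -1), repeat=n2):
                best = max(best, np.array(x) @ F @ np.array(y))
        return best

    F = np.array([[1.0, -2.0], [3.0, 4.0]])
    print(inf_to_one_norm(F))        # max of x^T F y over x, y in {1,-1}^n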
Another generalization of the matrix ∞ ↦ 1-norm is to extend the entry F_{ij} of the matrix F to a symmetric matrix A_{ij} ∈ R^{m×m}, i.e., the problem of

max  λ_max (∑_{1≤i≤n_1, 1≤j≤n_2} x_i y_j A_{ij})
s.t. x ∈ {1, −1}^{n_1}, y ∈ {1, −1}^{n_2},

where λ_max indicates the largest eigenvalue of a matrix. If the matrix A_{ij} ∈ R^{m_1×m_2} is not restricted to be symmetric, we may instead maximize the largest singular value, i.e.,

max  σ_max (∑_{1≤i≤n_1, 1≤j≤n_2} x_i y_j A_{ij})
s.t. x ∈ {1, −1}^{n_1}, y ∈ {1, −1}^{n_2}.

These two problems are actually equivalent to

max  ∑_{1≤i≤n_1, 1≤j≤n_2, 1≤k,ℓ≤m} F_{ijkℓ} x_i y_j z_k z_ℓ
s.t. x ∈ {1, −1}^{n_1}, y ∈ {1, −1}^{n_2},
     ‖z‖ = 1, z ∈ R^m,

and

max  ∑_{1≤i≤n_1, 1≤j≤n_2, 1≤k≤m_1, 1≤ℓ≤m_2} F_{ijkℓ} x_i y_j z_k w_ℓ
s.t. x ∈ {1, −1}^{n_1}, y ∈ {1, −1}^{n_2},
     ‖z‖ = ‖w‖ = 1, z ∈ R^{m_1}, w ∈ R^{m_2},

respectively, where F = (F_{ijkℓ}) is a fourth order tensor whose (i, j, k, ℓ)-th entry is the (k, ℓ)-th entry of the matrix A_{ij}. These two special models of (POP) extend polynomial integer programming problems to mixed integer programming problems, another important subclass of (POP) studied in this thesis.
1.3 Algorithms
Polynomial optimization problems are typically non-convex and highly nonlinear. In
most cases, (POP ) is NP-hard, even for very special instances, such as maximizing a
cubic polynomial over a sphere (see Nesterov [90]), maximizing a quadratic form in
binary variables (see e.g., Goemans and Williamson [40]), etc. The reader is referred
to de Klerk [65] for a survey on the computational complexity issues of polynomial
optimization over some simple constraint sets. In the case that the constraint set is
a simplex and the objective polynomial has a fixed degree, it is possible to derive
polynomial-time approximation schemes (PTAS) (see de Klerk et al. [66]), albeit the
result is viewed mostly as a theoretical one. In almost all practical situations, the problem is difficult to solve, both theoretically and numerically. However, the search for general and efficient algorithms for polynomial optimization has been a priority for many mathematical optimizers and researchers in various applications.
Perhaps the very first attempt at solving polynomial optimization problems is to treat them as general nonlinear programming problems, for which many algorithms and software packages are available, including KNITRO, BARON, IPOPT, SNOPT, and the Matlab optimization toolbox. However, these algorithms and solvers are not tailor-made for polynomial optimization problems, and so their performance may vary greatly from instance to instance. One direct approach is to apply the method of Lagrange multipliers to reach a set of multivariate polynomial equations, namely the Karush-Kuhn-Tucker (KKT) system that provides the necessary conditions for optimality (see e.g., [122, 39, 57]). In [39], the authors apply special algorithms for that purpose, such as the subdivision methods proposed by Mourrain and Pavone [84] and the generalized normal form algorithms designed by Mourrain and Trebuchet [85]. However, the shortcomings of these methods become apparent when the degree of the polynomial is high.
Generic solution methods based on nonlinear programming and global optimization
have been studied and tested (see e.g., Qi [98] and Qi et al. [102], and the references
therein). Recently, a tensor eigenvalue based method for a global polynomial optimization problem was also studied by Qi et al. [103]. Moreover, Parpas and Rustem [92], and
Maringer and Parpas [80] proposed diffusion-based methods to solve the non-convex
polynomial optimization models arising from portfolio selection involving higher mo-
ments. For polynomial integer programming models, e.g., (PB), the most commonly
used technique in the literature is transforming them to the maximum weighted inde-
pendent set problems (see e.g., [12, 106]), by using the concept of a conflict graph of a
0-1 polynomial function.
The sum of squares (SOS) approach has been a major systematic approach for solving general polynomial optimization problems. It was proposed by Lasserre [70, 71] and Parrilo [93, 94], and significant research on the SOS method has been conducted in the past ten years. The SOS method has a strong theoretical appeal, by constructing
a sequence of semidefinite programming (SDP) relaxations of the given polynomial
optimization problem in such a way that the corresponding optimal values are monotone
and converge to the optimal value of the original problem. Thus it can in principle solve
any instance of (POP ) to any given accuracy. For univariate polynomial optimization,
Nesterov [89] showed that the SOS method in combination with the SDP solution has polynomial-time complexity. This is also true for unconstrained multivariate quadratic polynomials and bivariate quartic polynomials, for which nonnegativity is equivalent to being a sum of squares. In general, however, the SDP problems required to be solved by the SOS method may grow very large, and the method is not practical when the problem dimension is high. At any rate, thanks to the recently developed efficient SDP solvers (e.g.,
SeDuMi of Sturm [112], SDPT3 of Toh et al. [115]), the SOS method appears to be
attractive. Henrion and Lasserre [52] developed a specialized tool known as GloptiPoly (the latest version, GloptiPoly 3, can be found in Henrion et al. [53]) for finding a global optimal solution of polynomial optimization problems via the SOS method, built upon Matlab and SeDuMi. For an overview of the recent theoretical developments, we refer
to the excellent survey by Laurent [72].
On the other hand, the intractability of general polynomial optimization motivates the search for suboptimal, or more formally, approximate solutions. In the
case that the objective polynomial is quadratic, a well known example is the semidef-
inite programming relaxation and randomization approach for the max-cut problem
due to Goemans and Williamson [40], where essentially a 0.878-approximation ratio of the model max_{x∈{1,−1}^n} x^T F x is shown, with F being the Laplacian of a given graph.
Note that the approach in [40] has been generalized subsequently by many authors,
including Nesterov [88], Ye [118, 119], Nemirovski et al. [87], Zhang [120], Charikar
and Wirth [24], Alon and Naor [5], Zhang and Huang [121], Luo et al. [75], and He et
al. [50]. In particular, when the matrix F is only known to be positive semidefinite, Nesterov [88] derived a 0.636-approximation bound for max_{x∈{1,−1}^n} x^T F x. For a general diagonal-free matrix F, Charikar and Wirth [24] derived an Ω(1/log n)-approximation bound, while inapproximability results are discussed by Arora et al. [7]. For the matrix ∞ ↦ 1-norm problem max_{x∈{1,−1}^{n_1}, y∈{1,−1}^{n_2}} x^T F y, Alon and Naor [5] derived a 0.56-approximation bound. We remark that all these approximation bounds remain hitherto the best available ones. In continuous polynomial optimization, Nemirovski et al. [87] proposed an Ω(1/log m)-approximation bound for maximizing a quadratic form over the intersection of m co-centered ellipsoids. Their models are further studied and generalized by Luo et al. [75] and He et al. [50].
Among all the successful approximation stories mentioned above, the objective polynomials are all quadratic. By contrast, there are only a few approximation results in the literature when the degree of the objective polynomial is greater than two. Perhaps the very first one is due to de Klerk et al. [66], who derived a PTAS for optimizing a fixed degree homogeneous polynomial over a simplex, which turns out to yield a PTAS for optimizing a fixed degree even form (a homogeneous polynomial with only even exponents) over the spherical constraint. Later, Barvinok [15] showed that optimizing a certain class of polynomials over the spherical constraint also admits a randomized PTAS. Note that the results in [66, 15] apply only when the objective polynomial has some special structure. A quite general result is due to Khot and Naor [63], who showed how to estimate the optimal value of the problem max_{x∈{1,−1}^n} ∑_{1≤i,j,k≤n} F_{ijk} x_i x_j x_k with (F_{ijk}) being square-free, i.e., F_{ijk} = 0 whenever two of the indices are equal. Specifically, they presented a polynomial-time randomized procedure that yields an estimated value no less than Ω(√(log n / n)) times the optimal value. Two recent papers (Luo and
)times the optimal value. Two recent papers (Luo and
Zhang [77], and Ling et al. [73]) discussed polynomial optimization problems with the
degree of objective polynomial being four, and start a whole new research on approxi-
mation algorithms for high degree polynomial optimizations, which are essentially the
main subject in this thesis. Luo and Zhang [77] considered quartic optimization, and
showed that optimizing a homogenous quartic form over the intersection of some co-
centered ellipsoids is essentially equivalent to its (quadratic) SDP relaxation problem,
which is itself also NP-hard. However, this gives a handle on the design of approx-
imation algorithms with provable worst-case approximation ratios. Ling et al. [73]
considered a special quartic optimization model. Basically, the problem is to minimize
a biquadratic function over two spherical constraints. In [73], approximate solutions
as well as exact solutions using the SOS method are considered. The approximation
bounds in [73] are indeed comparable to the bound in [77], although they are dealing
with two different models. Very recently, Zhang et al. [123] and Ling al. [74] further
studied biquadratic function optimization over quadratic constraints. The relations
with its bilinear SDP relaxation are discussed, based on which they derived some data
dependent approximation bounds.
Meanwhile, when the objective function of (POP) is a high degree inhomogeneous polynomial, we have not seen any approximation results so far, even in the relative sense (for a discussion on relative approximation algorithms, see Section 2.3). As a matter of fact, all the successful polynomial-time approximation algorithms with provable approximation ratios in the literature, e.g., the quadratic, cubic and quartic models mentioned above, depend on the homogeneity in a crucial way. Technically, a homogeneous polynomial function allows one to scale the overall function value along a given direction, which is an essential operation in proving the quality bound of an approximation algorithm. Thus, extending the solution methods and the corresponding analysis from homogeneous polynomial optimization to general inhomogeneous polynomials is not straightforward. This triggers our search for approximate solutions of (POP) with an inhomogeneous polynomial objective, which is one of the targets of this thesis.
1.4 Main Contributions
This thesis is concerned with some important and widely used subclasses of polynomial
optimization problems, including optimization of a multilinear function with quadratic
constraints, a homogeneous polynomial with quadratic constraints, a general polyno-
mial with convex constraints, a general polynomial with binary constraints, and a ho-
mogeneous polynomial with binary and spherical constraints. A detailed description of the problems studied is given in Section 2.1.3. All these problems are NP-hard in
general, and the focus is on the design and analysis of polynomial-time approximation
algorithms with provable worst-case performance ratios. We also discuss the appli-
cations of these models, and the numerical performance of the proposed algorithms.
Specifically, our contributions are highlighted as follows.
1. We propose approximation algorithms for optimization of any fixed degree homo-
geneous polynomial with quadratic constraints, which is the first such result for
approximation algorithms of polynomial optimization problems with an arbitrary
degree. The approximation ratios depend only on the dimensions of the problems
concerned. Compared with the existing results for high degree polynomial optimization, our approximation ratios improve the previous ones when specialized to their particular degrees.
2. We establish systematic linking identities between multilinear functions and homogeneous polynomials, and thus establish the same approximation ratios for homogeneous polynomial optimization as for the corresponding multilinear form relaxation problems.
3. We propose a general scheme to handle inhomogeneous polynomial optimization through the method of homogenization, and thus establish the same approximation ratios (in the relative sense) for inhomogeneous polynomial optimization as for the corresponding homogeneous polynomial relaxation problems. This yields the first approximation bound for general inhomogeneous polynomial optimization of high degree.
4. We propose several decomposition routines for polynomial optimization over different types of constraint sets, and derive approximation bounds for multilinear form optimization with respect to its lower degree relaxation problems.
5. With the availability of our proposed approximation algorithms, we illustrate
some potential modeling opportunities with the new optimization models.
This thesis is organized as follows. First, in Chapter 2, we introduce the notations and models, and provide the necessary preparations for understanding the whole thesis. Then, from Chapter 3 to Chapter 7, we discuss five subclasses of polynomial optimization problems, one subclass per chapter (a detailed description of these subclasses is given in Section 2.1). In each of these five chapters, polynomial-time approximation algorithms with provable worst-case performance ratios are proposed to solve the models concerned, followed by a discussion of their applications and/or a report on the numerical performance of the proposed algorithms. Finally, in Chapter 8, we summarize the main results of this thesis, and discuss some recent developments and future research topics.
Chapter 2
Notations and Preliminaries
2.1 Notations and Models
Throughout this thesis, we exclusively use boldface letters to denote vectors, matrices, and tensors (e.g., the decision variable x, the data matrix Q, and the tensor form F), while the usual non-bold letters are reserved for scalars (e.g., x_1 being the first component of the vector x, and Q_{ij} being an entry of the matrix Q).
2.1.1 Objective Functions
The objective functions of the optimization models studied in this thesis are all multivariate polynomial functions. The following multilinear tensor function (or multilinear form) plays a major role in the discussion:

Function T   F(x^1, x^2, · · · , x^d) := ∑_{1≤i_1≤n_1, 1≤i_2≤n_2, ··· , 1≤i_d≤n_d} F_{i_1 i_2···i_d} x^1_{i_1} x^2_{i_2} ··· x^d_{i_d},
where x^k ∈ R^{n_k} for k = 1, 2, . . . , d, and the letter 'T' signifies the notion of a tensor. In shorthand notation we denote by F = (F_{i_1 i_2···i_d}) ∈ R^{n_1×n_2×···×n_d} the corresponding d-th order tensor, and by F its multilinear form. The term 'multilinear' means that if one fixes (x^2, x^3, · · · , x^d) in the function F, then it is a linear function of x^1, and so on.
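As a minimal computational sketch (our own illustration in Python/numpy), Function T for, say, d = 3 can be evaluated by contracting the tensor against the three vectors, and the multilinearity is easy to verify:

    import numpy as np

    rng = np.random.default_rng(0)
    n1, n2, n3 = 2, 3, 4
    F = rng.standard_normal((n1, n2, n3))              # a third order tensor
    x1, x2, x3 = (rng.standard_normal(n) for n in (n1, n2, n3))

    # F(x1, x2, x3) = sum_{i,j,k} F_ijk * x1_i * x2_j * x3_k
    value = np.einsum('ijk,i,j,k->', F, x1, x2, x3)

    # Fixing (x2, x3), the form is linear in x1.
    assert np.isclose(np.einsum('ijk,i,j,k->', F, 2 * x1, x2, x3), 2 * value)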
Closely related to the tensor form F is a general d-th degree homogeneous polynomial function f(x), where x ∈ R^n. We call the tensor form F = (F_{i_1 i_2···i_d}) super-symmetric (see [67]) if each of its components F_{i_1 i_2···i_d} is invariant under all permutations of i_1, i_2, · · · , i_d. As any homogeneous quadratic function uniquely determines
a symmetric matrix, a given d-th degree homogeneous polynomial function f(x) also uniquely determines a super-symmetric tensor form. In particular, if we denote a d-th degree homogeneous polynomial function as

Function H   f(x) := ∑_{1≤i_1≤i_2≤···≤i_d≤n} F′_{i_1 i_2···i_d} x_{i_1} x_{i_2} ··· x_{i_d},

then its corresponding super-symmetric tensor form can be written as F = (F_{i_1 i_2···i_d}) ∈ R^{n^d}, with F_{i_1 i_2···i_d} ≡ F′_{i_1 i_2···i_d} / |Π(i_1, i_2, · · · , i_d)|, where |Π(i_1, i_2, · · · , i_d)| is the number of distinct permutations of the indices i_1, i_2, · · · , i_d. This super-symmetric tensor representation is indeed unique. Let F be the multilinear form defined by the super-symmetric tensor F; then we have f(x) = F(x, x, · · · , x), with x repeated d times. The letter 'H'
here is used to emphasize that the polynomial function in question is homogeneous.
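This construction can be checked numerically; a minimal sketch for d = 3 (our own illustration) builds the super-symmetric tensor from the coefficients F′ and verifies f(x) = F(x, x, x):

    import numpy as np
    from itertools import combinations_with_replacement, permutations

    n = 3
    rng = np.random.default_rng(0)
    # Coefficients F'_{ijk} of a cubic form, one per index triple i <= j <= k.
    coef = {idx: rng.standard_normal()
            for idx in combinations_with_replacement(range(n), 3)}

    F = np.zeros((n, n, n))
    for idx, c in coef.items():
        perms = set(permutations(idx))       # distinct permutations of the indices
        for p in perms:
            F[p] = c / len(perms)            # spread the coefficient evenly

    x = rng.standard_normal(n)
    f_x = sum(c * x[i] * x[j] * x[k] for (i, j, k), c in coef.items())
    assert np.isclose(f_x, np.einsum('ijk,i,j,k->', F, x, x, x))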
We shall also consider in this thesis the following mixed form:

Function M   f(x^1, x^2, · · · , x^s) := F(x^1, · · · , x^1, x^2, · · · , x^2, · · · , x^s, · · · , x^s),

where x^k appears d_k times in the argument list, with d_1 + d_2 + · · · + d_s = d and x^k ∈ R^{n_k} for k = 1, 2, . . . , s; here F ∈ R^{n_1^{d_1} × n_2^{d_2} × ··· × n_s^{d_s}} is a d-th order tensor form, and the letter 'M' signifies the notion of a mixed polynomial form. We may without loss of generality assume that F has the partial symmetry property, namely, for any fixed (x^2, x^3, · · · , x^s), the d_1-th order tensor form F(·, · · · , ·, x^2, · · · , x^2, · · · , x^s, · · · , x^s), with the first d_1 positions left open, is super-symmetric, and so on.
Beyond the homogeneous polynomial functions described above, we also study in this thesis the generic multivariate inhomogeneous polynomial function. An n-dimensional d-th degree polynomial function can be explicitly written as a summation of homogeneous polynomial functions of decreasing degrees, as follows:

Function P   p(x) := ∑_{k=1}^d f_k(x) + f_0 = ∑_{k=1}^d F_k(x, x, · · · , x) + f_0,

where x ∈ R^n, f_0 ∈ R, and f_k(x) = F_k(x, x, · · · , x) (with x repeated k times) is a homogeneous polynomial function of degree k for k = 1, 2, . . . , d; the letter 'P' signifies the notion of a polynomial. One natural way to deal with an inhomogeneous polynomial function is through homogenization; that is, we introduce a new variable, denoted by x_h in this thesis and actually set to be 1, to yield a homogeneous form:

p(x) = ∑_{k=1}^d f_k(x) + f_0 = ∑_{k=1}^d f_k(x) x_h^{d−k} + f_0 x_h^d = f(x̄),
where f(x̄) is an (n + 1)-dimensional d-th degree homogeneous polynomial function in the variable x̄ ∈ R^{n+1}. Throughout this thesis, the 'bar' notation over boldface lowercase letters, e.g., x̄, is reserved for an (n + 1)-dimensional vector, with the underlying letter x referring to the vector of its first n components, and the subscript 'h' (as in x_h) referring to its last component. For instance, if x̄ = (x_1, x_2, · · · , x_n, x_{n+1})^T ∈ R^{n+1}, then x = (x_1, x_2, · · · , x_n)^T ∈ R^n and x_h = x_{n+1} ∈ R.
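As a small worked illustration (our own example), the cubic polynomial

p(x_1, x_2) = x_1^3 + 2 x_1 x_2 + 3

homogenizes, with d = 3, to

f(x̄) = x_1^3 + 2 x_1 x_2 x_h + 3 x_h^3,   x̄ = (x_1, x_2, x_h)^T ∈ R^3,

where each degree-k piece f_k(x) is multiplied by x_h^{3−k}; setting x_h = 1 recovers p(x) = f(x, 1).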
Throughout, we adhere to the notation F for a multilinear form (Function T) defined by a tensor form F, to f for a homogeneous polynomial (Function H) or a mixed homogeneous form (Function M), and to p for a generic (inhomogeneous) polynomial function (Function P). Without loss of generality we assume that n_1 ≤ n_2 ≤ · · · ≤ n_d in the tensor form F ∈ R^{n_1×n_2×···×n_d}, and that n_1 ≤ n_2 ≤ · · · ≤ n_s in the tensor form F ∈ R^{n_1^{d_1}×n_2^{d_2}×···×n_s^{d_s}}. We also assume that at least one component of the tensor form (F in Functions T, H and M, and F_d in Function P) is nonzero, to avoid triviality.
2.1.2 Constraint Sets
The most commonly used constraint sets for polynomial optimization problems are studied in this thesis. Specifically, we consider the following types of constraint sets:

Constraint B   {x ∈ R^n | x_i² = 1, i = 1, 2, . . . , n} =: B^n;
Constraint B̄   {x ∈ R^n | x_i² ≤ 1, i = 1, 2, . . . , n} =: B̄^n;
Constraint S   {x ∈ R^n | ‖x‖ := (x_1² + x_2² + · · · + x_n²)^{1/2} = 1} =: S^n;
Constraint S̄   {x ∈ R^n | ‖x‖ ≤ 1} =: S̄^n;
Constraint Q   {x ∈ R^n | x^T Q_i x ≤ 1, i = 1, 2, . . . , m};
Constraint G   {x ∈ R^n | x ∈ G}.
The notion ‘B’ signifies the binary variables or binary constraints, and ‘S’ signifies
the Euclidean spherical constraint, with ‘B’ (hypercube) and ‘S’ (the Euclidean ball)
signifying their convex hulls respectively. The norm notation ‘‖ · ‖’ in this thesis is the
2-norm (the Euclidean norm) unless otherwise specified, including those for vectors, ma-
trices and tensors. In particular, the norm of the tensor F = (Fi1i2···id) ∈ Rn1×n2×···×nd
is defined as
‖F ‖ :=
√ ∑1≤i1≤n1,1≤i2≤n2,··· ,1≤id≤nd
Fi1i2···id2 .
The notion ‘Q’ signifies the quadratic constraints, and we focus on convex quadratic
constraints in this thesis, or specifically the case of co-centered ellipsoids, i.e., Qi 0
for i = 1, 2, . . . , m and ∑_{i=1}^m Q_i ≻ 0. A general convex compact set in R^n is also discussed in this thesis, denoted by the notation 'G'. Constraints B̄, S̄, Q and G are convex, while Constraints B and S are non-convex. Obviously, Constraint G is a generalization of Constraint Q, and Constraint Q is in turn a generalization of Constraint S̄ as well as of Constraint B̄.
2.1.3 Models and Organization
All the polynomial optimization models discussed in this thesis are maximization problems; the results for most of their minimization counterparts can be derived similarly. The names of the models simply combine the names of the objective functions described in Section 2.1.1 with the names of the constraint sets described in Section 2.1.2, the latter appearing as subscripts. For example, model (TS) is to maximize a multilinear tensor function (Function T) under spherical constraints (Constraint S), and model (MBS) is to maximize a mixed polynomial form (Function M) under binary constraints (Constraint B) mixed with variables under spherical constraints (Constraint S).
In Chapter 3, we discuss models for optimizing a multilinear form with quadratic constraints, including (TS) and (TQ). In Chapter 4, we discuss models for optimizing a homogeneous polynomial or a mixed form with quadratic constraints, including (HS), (HQ), (MS) and (MQ). General polynomial optimization models, including (PS), (PQ) and (PG), are discussed in Chapter 5. Chapter 6 deals with binary integer programming models, including (TB), (HB), (MB) and (PB). Chapter 7 deals with mixed integer programming models, including (TBS), (HBS) and (MBS). All these models are listed below for quick reference.
Chapter 3:

(TS)  max F(x^1, x^2, · · · , x^d)
      s.t. x^k ∈ S^{n_k}, k = 1, 2, . . . , d;

(TQ)  max F(x^1, x^2, · · · , x^d)
      s.t. (x^k)^T Q^k_{i_k} x^k ≤ 1, k = 1, 2, . . . , d, i_k = 1, 2, . . . , m_k,
           x^k ∈ R^{n_k}, k = 1, 2, . . . , d.

Chapter 4:

(HS)  max f(x)
      s.t. x ∈ S^n;

(HQ)  max f(x)
      s.t. x^T Q_i x ≤ 1, i = 1, 2, . . . , m,
           x ∈ R^n;

(MS)  max f(x^1, x^2, · · · , x^s)
      s.t. x^k ∈ S^{n_k}, k = 1, 2, . . . , s;

(MQ)  max f(x^1, x^2, · · · , x^s)
      s.t. (x^k)^T Q^k_{i_k} x^k ≤ 1, k = 1, 2, . . . , s, i_k = 1, 2, . . . , m_k,
           x^k ∈ R^{n_k}, k = 1, 2, . . . , s.

Chapter 5:

(PS)  max p(x)
      s.t. x ∈ S̄^n;

(PQ)  max p(x)
      s.t. x^T Q_i x ≤ 1, i = 1, 2, . . . , m,
           x ∈ R^n;

(PG)  max p(x)
      s.t. x ∈ G.

Chapter 6:

(TB)  max F(x^1, x^2, · · · , x^d)
      s.t. x^k ∈ B^{n_k}, k = 1, 2, . . . , d;

(HB)  max f(x)
      s.t. x ∈ B^n;

(MB)  max f(x^1, x^2, · · · , x^s)
      s.t. x^k ∈ B^{n_k}, k = 1, 2, . . . , s;

(PB)  max p(x)
      s.t. x ∈ B^n.

Chapter 7:

(TBS) max F(x^1, x^2, · · · , x^d, y^1, y^2, · · · , y^{d′})
      s.t. x^k ∈ B^{n_k}, k = 1, 2, . . . , d,
           y^ℓ ∈ S^{m_ℓ}, ℓ = 1, 2, . . . , d′;

(HBS) max f(x, y)
      s.t. x ∈ B^n,
           y ∈ S^m;

(MBS) max f(x^1, x^2, · · · , x^s, y^1, y^2, · · · , y^t)
      s.t. x^k ∈ B^{n_k}, k = 1, 2, . . . , s,
           y^ℓ ∈ S^{m_ℓ}, ℓ = 1, 2, . . . , t.
As before, we also assume the tensor forms of the objective functions in (HBS) and (MBS) to have the partial symmetry property, and that m_1 ≤ m_2 ≤ · · · ≤ m_{d′} in (TBS) and m_1 ≤ m_2 ≤ · · · ≤ m_t in (MBS).
In each chapter mentioned above, we discuss the computational complexity of the models concerned, and focus on polynomial-time approximation algorithms with worst-case performance ratios, followed by discussions of their applications and/or the numerical performance of the proposed algorithms. All the numerical computations are conducted on an Intel Pentium 4 2.80GHz computer with 2GB of RAM, with Matlab 7.7.0 (R2008b) as the supporting software. Let d_1 + d_2 + · · · + d_s = d and d′_1 + d′_2 + · · · + d′_t = d′ in the above mentioned models. The degrees of the objective polynomials in these models, d and d + d′, are understood as fixed constants in our subsequent discussions. We are able to propose polynomial-time approximation algorithms for all these models, and the approximation ratios depend only on the dimensions (including the number of variables and the number of constraints) of the problems concerned.
The remaining sections of this chapter provide some necessary preparations for a better understanding of the main subjects of this thesis. The topics include elementary introductions to tensor operations, approximation algorithms, randomized algorithms, and semidefinite programming.
2.2 Tensor Operations
A tensor is a multidimensional array. More formally, a d-th order tensor is an element
of the tensor product of d vector spaces, each of which has its own coordinate system.
Each entry of a d-th order tensor has d associated indices. A first order tensor is a vector, a second order tensor is a matrix, and tensors of order three or higher are called higher order tensors.
This section describes a few tensor operations commonly used in this thesis. For a general review of other tensor operations, the reader is referred to [68]. The tensor inner product is denoted by '•', and is the summation of products of all corresponding entries. For example, if F^1, F^2 ∈ R^{n_1×n_2×···×n_d}, then

F^1 • F^2 := ∑_{1≤i_1≤n_1, 1≤i_2≤n_2, ··· , 1≤i_d≤n_d} F^1_{i_1 i_2···i_d} · F^2_{i_1 i_2···i_d}.

As mentioned before, the norm of a tensor is then defined as ‖F‖ := √(F • F). Notice that the tensor inner product and tensor norm also apply to vectors and matrices, since they are lower order tensors.
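A quick numerical sketch of these definitions (our own illustration): the inner product is an entrywise sum of products, and the tensor norm coincides with the 2-norm of the vectorized tensor.

    import numpy as np

    rng = np.random.default_rng(0)
    F1 = rng.standard_normal((2, 3, 4))
    F2 = rng.standard_normal((2, 3, 4))

    inner = (F1 * F2).sum()                  # F1 . F2, summing entrywise products
    norm = np.sqrt((F1 * F1).sum())          # ||F1|| = sqrt(F1 . F1)
    assert np.isclose(norm, np.linalg.norm(F1.reshape(-1)))   # equals vectorized 2-norm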
The modes of a tensor refer to its coordinate systems. For example, the following fourth order tensor G ∈ R^{2×2×3×2}, with entries

G_{1111} = 1,  G_{1112} = 2,  G_{1121} = 3,  G_{1122} = 4,  G_{1131} = 5,  G_{1132} = 6,
G_{1211} = 7,  G_{1212} = 8,  G_{1221} = 9,  G_{1222} = 10, G_{1231} = 11, G_{1232} = 12,
G_{2111} = 13, G_{2112} = 14, G_{2121} = 15, G_{2122} = 16, G_{2131} = 17, G_{2132} = 18,
G_{2211} = 19, G_{2212} = 20, G_{2221} = 21, G_{2222} = 22, G_{2231} = 23, G_{2232} = 24,

has 4 modes, named mode 1, mode 2, mode 3 and mode 4. In case a tensor is a matrix, it has only two modes, usually called the column and row modes. The indices of an entry of a tensor form a sequence of integers, one assigned from each mode.
The first widely used tensor operation is tensor rewriting, which appears frequently in this thesis. Namely, by combining a set of modes into one mode, a tensor can be rewritten as a new tensor of lower order. For example, by combining modes 3 and 4 together and putting them into the last mode of the new tensor, the tensor G can be rewritten as a third order tensor G′ ∈ R^{2×2×6}, with entries

G′_{111} = 1,  G′_{112} = 2,  G′_{113} = 3,  G′_{114} = 4,  G′_{115} = 5,  G′_{116} = 6,
G′_{121} = 7,  G′_{122} = 8,  G′_{123} = 9,  G′_{124} = 10, G′_{125} = 11, G′_{126} = 12,
G′_{211} = 13, G′_{212} = 14, G′_{213} = 15, G′_{214} = 16, G′_{215} = 17, G′_{216} = 18,
G′_{221} = 19, G′_{222} = 20, G′_{223} = 21, G′_{224} = 22, G′_{225} = 23, G′_{226} = 24.
By combining modes 2, 3 and 4 together, the tensor G is rewritten as the 2 × 12 matrix

( 1  2  3  4  5  6  7  8  9 10 11 12
 13 14 15 16 17 18 19 20 21 22 23 24 );

and by combining all the modes together, G becomes the 24-dimensional vector (1, 2, . . . , 24)^T, which is the same as the vectorization of the tensor.
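When the entries are stored in lexicographic (row-major) index order, as above, tensor rewriting is exactly a reshape; a minimal numpy sketch reproducing the examples (our own illustration):

    import numpy as np

    # G[i,j,k,l] holds the entry G_{(i+1)(j+1)(k+1)(l+1)} from the text.
    G = np.arange(1, 25).reshape(2, 2, 3, 2)

    G1 = G.reshape(2, 2, 6)       # combine modes 3 and 4: G' in R^{2x2x6}
    print(G1[0, 0])               # [1 2 3 4 5 6]

    G2 = G.reshape(2, 12)         # combine modes 2, 3 and 4: the 2 x 12 matrix
    g = G.reshape(24)             # combine all modes: vectorization
    print(G2[1, 0], g[-1])        # 13 24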
Another commonly used tensor operation is the mode switch, which swaps the positions of two modes. It is very much like the transpose of a matrix, which switches the positions of row and column. Accordingly, a mode switch changes the sequence of indices of the entries of a tensor. For example, by switching mode 1 and mode 3 of G, the tensor G is changed to G′′ ∈ R^{3×2×2×2}, with entries defined by

G′′_{ijkℓ} := G_{kjiℓ}  ∀ j, k, ℓ = 1, 2, i = 1, 2, 3.

By default, among all the tensors discussed in this thesis, we assume the modes have been switched (in fact reordered) so that the dimensions are in non-decreasing order.
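In numpy terms (our own sketch), a mode switch is a transpose of the index axes:

    import numpy as np

    G = np.arange(1, 25).reshape(2, 2, 3, 2)
    Gpp = G.transpose(2, 1, 0, 3)            # switch modes 1 and 3
    assert Gpp.shape == (3, 2, 2, 2)
    assert Gpp[2, 0, 1, 1] == G[1, 0, 2, 1]  # G''_{ijkl} = G_{kjil}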
Another widely used operation is multiplying a tensor by a vector. For example, the tensor G has the associated multilinear function G(x, y, z, w), where the variables x, y, w ∈ R^2 and z ∈ R^3; the four modes of G correspond to the four argument positions of the function G. For a given vector w = (w_1, w_2)^T, its multiplication with G in mode 4 turns G into G′′′ ∈ R^{2×2×3}, whose entries are defined by

G′′′_{ijk} := G_{ijk1} w_1 + G_{ijk2} w_2  ∀ i, j = 1, 2, k = 1, 2, 3,

which is simply the inner product of the vector w and G_{ijk·} := (G_{ijk1}, G_{ijk2})^T. For example, if w = (1, 1)^T, then G′′′ has entries

G′′′_{111} = 3,  G′′′_{112} = 7,  G′′′_{113} = 11, G′′′_{121} = 15, G′′′_{122} = 19, G′′′_{123} = 23,
G′′′_{211} = 27, G′′′_{212} = 31, G′′′_{213} = 35, G′′′_{221} = 39, G′′′_{222} = 43, G′′′_{223} = 47.

Its corresponding multilinear function is in fact G(x, y, z, w), with the underlying variables x, y, z. We often use G(·, ·, ·, w) to denote this new multilinear function.
This type of multiplication extends to multiplying a tensor by a matrix, or even by another tensor. For example, if we multiply the tensor G by a given matrix Z ∈ R^{3×2} in modes 3 and 4, then we get a second order tensor (matrix) in R^{2×2}, whose (i, j)-th entry is

G_{ij··} • Z = ∑_{k=1}^3 ∑_{ℓ=1}^2 G_{ijkℓ} Z_{kℓ}  ∀ i, j = 1, 2.

Its corresponding multilinear function is denoted by G(·, ·, Z). In general, if a d-th order tensor is multiplied by a d′-th order tensor (d′ ≤ d) in appropriate modes, the product is a (d − d′)-th order tensor. In particular, if d = d′, then this multiplication is simply the tensor inner product.
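Both kinds of multiplication are single tensor contractions; a minimal numpy sketch (our own illustration) reproduces the entries of G′′′ computed above and the matrix G(·, ·, Z):

    import numpy as np

    G = np.arange(1, 25).reshape(2, 2, 3, 2)

    w = np.array([1.0, 1.0])
    G3 = np.einsum('ijkl,l->ijk', G, w)      # multiply by w in mode 4
    print(G3[0, 0, 0], G3[0, 1, 2])          # 3.0 23.0, matching the text

    Z = np.ones((3, 2))
    M = np.einsum('ijkl,kl->ij', G, Z)       # multiply by Z in modes 3 and 4
    print(M)                                 # the 2x2 matrix G(., ., Z)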
2.3 Approximation Algorithms
Approximation algorithms are algorithms designed to find approximate solutions to
optimization problems. In general, approximation algorithms are often associated with NP-hard problems: since it is unlikely that there exist polynomial-time exact algorithms for solving NP-hard problems, one settles for polynomial-time suboptimal solutions. Approximation algorithms are also used for problems where exact polynomial-
time algorithms are possible but are too expensive to compute due to the size of the
problem. Usually, an approximation algorithm is associated with an approximation
ratio, which is a provable value measuring the quality of the solution found.
Approximation algorithms are widely used in combinatorial optimization, typically for various graph problems. Let us describe a well known example, the vertex cover problem, to appreciate the notion of an approximation algorithm. Given an undirected graph G = (V, E) and a nonnegative cost associated with each vertex, find a minimum-cost subset of vertices that covers all the edges, i.e., a set V′ ⊆ V such that every edge has at least one endpoint in V′.
The vertex cover problem is NP-hard (see e.g., [38]), even for the cardinality vertex cover, which is the case where the cost associated with each vertex is 1. There is a very simple algorithm for the cardinality vertex cover problem: pick any uncovered edge e ∈ E, select both of its incident vertices, and then remove all the edges covered by these two vertices; continue this process until every edge is removed, and output all the selected vertices. This runs in at most |E| steps, which is polynomial in the input dimension max{|V|, |E|}. The algorithm may not find an optimal cover; however, we can show that the number of vertices it selects is at most twice the optimal value of the cardinality vertex cover. In fact, any optimal cover must cover every edge picked (not removed) by the algorithm, and thus must include at least one of its two incident vertices; hence the optimal cover includes at least half of the vertices selected by the algorithm. This is a typical approximation algorithm, with approximation ratio 2; a minimal implementation is sketched below.
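A direct implementation of this 2-approximation (our own sketch; edges are given as vertex pairs):

    def vertex_cover_2approx(edges):
        cover = set()
        for u, v in edges:                   # pick any edge not yet covered
            if u not in cover and v not in cover:
                cover.update((u, v))         # select both incident vertices
        return cover                         # at most twice the optimal size

    edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
    print(vertex_cover_2approx(edges))       # {1, 2, 3, 4}; an optimal cover is {1, 4}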
We shall now formally define approximation algorithms and approximation ratios. Throughout this thesis, for any maximization problem (P) defined as max_{x∈X} p(x), we use v(P) to denote its optimal value, and v̲(P) to denote the optimal value of its minimization counterpart, i.e.,

v(P) := max_{x∈X} p(x)  and  v̲(P) := min_{x∈X} p(x).
Definition 2.3.1 Approximation algorithm and approximation ratio:

1. A maximization problem max_{x∈X} p(x) admits a polynomial-time approximation algorithm with approximation ratio τ ∈ (0, 1], if v(P) ≥ 0 and a feasible solution x ∈ X can be found in polynomial time such that p(x) ≥ τ v(P);

2. A minimization problem min_{x∈X} p(x) admits a polynomial-time approximation algorithm with approximation ratio µ ∈ [1, ∞), if v̲(P) ≥ 0 and a feasible solution x ∈ X can be found in polynomial time such that p(x) ≤ µ v̲(P).
It is easy to see that the larger the τ, the better the ratio for a maximization problem, and the smaller the µ, the better the ratio for a minimization problem; in short, the closer to one, the better the ratio. However, sometimes a problem may be so hard that no polynomial-time algorithm can approximate the optimal value within any positive constant factor unless P = NP. A typical example is the maximum independent set problem, even though the closely related cardinality vertex cover problem has a very simple 2-approximation algorithm. In those unfortunate cases, we resort to approximation algorithms with relative approximation ratios.
Definition 2.3.2 Approximation algorithm and relative approximation ratio:

1. A maximization problem max_{x∈X} p(x) admits a polynomial-time approximation algorithm with relative approximation ratio τ ∈ (0, 1], if a feasible solution x ∈ X can be found in polynomial time such that p(x) − v̲(P) ≥ τ (v(P) − v̲(P)), or equivalently v(P) − p(x) ≤ (1 − τ)(v(P) − v̲(P));

2. A minimization problem min_{x∈X} p(x) admits a polynomial-time approximation algorithm with relative approximation ratio µ ∈ [1, ∞), if a feasible solution x ∈ X can be found in polynomial time such that v(P) − p(x) ≥ (1/µ)(v(P) − v̲(P)), or equivalently p(x) − v̲(P) ≤ (1 − 1/µ)(v(P) − v̲(P)).
As with the usual approximation ratio, the closer the relative approximation ratio is to one, the better. For a maximization problem, if we know for sure that the optimal value of its minimization counterpart is nonnegative, i.e., v̲(P) ≥ 0, then a relative approximation ratio readily implies a usual approximation ratio, since p(x) ≥ τ v(P) + (1 − τ) v̲(P) ≥ τ v(P). This is not rare, as many optimization problems always have nonnegative objective functions in real applications, e.g., various graph partition problems. Of course, there are several other ways of defining the approximation quality to measure the performance of approximate solutions (see e.g., [61, 11]).
We would like to point out that the approximation ratios defined here are for worst-case scenarios, and it might be hard or even impossible to find an instance that exactly attains the ratio when applying the algorithms. Thus an approximation algorithm with a better approximation ratio does not necessarily perform better in practice than one with a worse ratio. In reality, many approximation algorithms have approximation ratios far from one, if they have one at all, and the ratios may approach zero as the dimensions of the problems become large. Perhaps it is more appropriate to view the approximation guarantee as a measure that forces us to explore deeper into the structure of the problem and discover more powerful tools to exploit this structure. In addition, an algorithm with a theoretical assurance should be viewed as useful guidance that can be fine-tuned to suit the type of instances arising from a specific application.
As mentioned in Section 2.1.3, all optimization models considered in this thesis are maximization problems. Thus we reserve the Greek letter τ to indicate the approximation ratio, which is a key ingredient throughout this thesis. The approximation ratios presented in this thesis are in general not universal constants; they involve the problem dimensions and the notation Ω. Here, $\Omega(f(n))$ denotes a quantity for which there are positive universal constants $\alpha$ and $n_0$ such that the quantity is at least $\alpha f(n)$ for all $n\ge n_0$; as usual, $O(f(n))$ denotes a quantity for which there are positive universal constants $\alpha$ and $n_0$ such that the quantity is at most $\alpha f(n)$ for all $n\ge n_0$.
2.4 Randomized Algorithms
A randomized algorithm is an algorithm that employs a degree of randomness as part of its operation. The algorithm typically takes a certain probability distribution as an auxiliary input to guide its execution, in the hope of achieving good performance on average, or good performance with high probability. Formally, the algorithm's performance is a random variable: either the running time, or the output (or both) are random.
Historically, the first randomized algorithm was a method developed by Rabin [104] for the closest pair problem in computational geometry. The study of randomized algorithms was spurred by the 1977 discovery of a randomized primality test (determining whether a number is prime) by Solovay and Strassen [111]. Soon afterwards, Rabin [105] demonstrated that Miller's 1976 primality test [83] can be turned into a randomized algorithm. At that time, no practical deterministic algorithm for primality was known.
A well known application in which randomness is useful is quicksort. Any deterministic version of this algorithm requires $O(n^2)$ time in the worst case to sort n different numbers (denoted by the set S); e.g., the straightforward method of comparing all the pairs requires $n(n-1)/2$ comparisons. However, if we assume that the given n different numbers arrive in a sequence uniformly distributed over all the n! distinct orderings, then the quicksort algorithm sorts the sequence in expected $O(n\log n)$ time. The algorithm chooses an element of S uniformly at random as a pivot, compares the pivot with the other elements and groups them into two sets $S_1$ (those bigger than the pivot) and $S_2$ (those smaller than the pivot), and then applies the same process to sort $S_1$ and $S_2$; this process continues until the whole set is sorted.
To see why quicksort costs $O(n\log n)$ time on average, let us without loss of generality assume $S=\{1,2,\ldots,n\}$. Denote by $x_{ij}$ the indicator random variable of the event that elements i and j are compared during a run of quicksort. The expected total number of comparisons is then
$$\mathrm{E}\left[\sum_{1\le i<j\le n}x_{ij}\right]=\sum_{1\le i<j\le n}\mathrm{E}[x_{ij}].$$
Next we compute $\mathrm{E}[x_{ij}]$, which is the probability that i and j are ever compared by quicksort. This happens if and only if either i or j is the first pivot selected by quicksort from the set $\{i,i+1,\ldots,j-1,j\}$ (assuming i < j), and this probability is $2/(j-i+1)$.
Therefore, the expected number of comparisons is
$$\sum_{1\le i<j\le n}\mathrm{E}[x_{ij}]=\sum_{1\le i<j\le n}\frac{2}{j-i+1}=2(n+1)\sum_{k=1}^n\frac{1}{k}-4n=O(n\log n).$$
Note that this expected-time argument relies on choosing the pivots uniformly at random; the worst-case running time is still $O(n^2)$, which occurs when we are unlucky enough to pick the largest remaining element as the pivot every time.
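The following minimal Python sketch (illustrative, not from the thesis) implements randomized quicksort as just described; for distinct inputs, its expected number of comparisons follows the $O(n\log n)$ bound derived above.

```python
import random

def quicksort(S):
    """Randomized quicksort on a list of distinct numbers (sketch)."""
    if len(S) <= 1:
        return S
    pivot = random.choice(S)                 # pivot chosen uniformly at random
    S1 = [x for x in S if x > pivot]         # those bigger than the pivot
    S2 = [x for x in S if x < pivot]         # those smaller than the pivot
    return quicksort(S2) + [pivot] + quicksort(S1)

print(quicksort([3, 1, 4, 5, 9, 2, 6]))      # [1, 2, 3, 4, 5, 6, 9]
```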
There are other success stories in which randomized algorithms help. For example, the volume of a convex body can be estimated by a randomized algorithm to arbitrary precision in polynomial-time [31], while no deterministic algorithm can do the same [13].
When applied to NP-hard optimization problems, randomized algorithms are often combined with approximation algorithms to prove performance ratios, in expectation or with high probability. A simple example is a randomized approximation algorithm with approximation ratio 0.5 for the max-cut problem: given an undirected graph G = (V,E) with a nonnegative weight associated with each edge, find a partition (cut) of V into two disjoint sets A and B, so that the total weight of the edges connecting one vertex in A and one vertex in B is maximized. This is one of the well known NP-hard problems [38]. The simple algorithm is as follows: for each vertex, independently toss a fair coin, and put the vertex into A on heads or into B on tails. It is easy to see that the probability of any given edge connecting A and B is exactly 1/2. By the linearity of expectation, the expected total weight of this cut is exactly half of the total weight of all the edges, which is at least half of the maximum cut.
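A sketch of this coin-tossing algorithm in Python (the dictionary edge representation is an assumption for illustration):

```python
import random

def random_cut(n, w):
    """0.5-approximation for max-cut in expectation (sketch).

    w maps undirected edges (i, j) to nonnegative weights; each vertex
    is assigned to side A or side B by an independent fair coin toss.
    """
    in_A = [random.random() < 0.5 for _ in range(n)]
    return sum(wij for (i, j), wij in w.items() if in_A[i] != in_A[j])

# Each edge crosses the cut with probability 1/2, so the expected value
# is half the total edge weight, hence at least half the maximum cut.
print(random_cut(4, {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.5, (0, 3): 0.5}))
```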
The currently best known approximation ratio for the max-cut problem is 0.878, another celebrated randomized-algorithm result, due to Goemans and Williamson [40], which we shall elaborate on in Section 2.5.
We conclude this section by defining polynomial-time randomized approximation algorithms, in the spirit of Definition 2.3.1 and Definition 2.3.2. We shall not spell out all the randomized versions, but only a typical one; the others can be described similarly.
Definition 2.4.1 A maximization problem $\max_{x\in X}p(x)$ admits a polynomial-time randomized approximation algorithm with approximation ratio $\tau\in(0,1]$, if $v(P)\ge 0$ and one of the following two facts holds:
1. A feasible solution $x\in X$ can be found in polynomial-time, such that $\mathrm{E}[p(x)]\ge\tau\,v(P)$;
2. A feasible solution $x\in X$ can be found in polynomial-time, such that $p(x)\ge\tau\,v(P)$ with probability at least $1-\epsilon$ for any given $\epsilon\in(0,1)$.
2.5 Semidefinite Programming
Semidefinite programming (SDP) is a subfield of convex optimization concerned with the optimization of a linear objective function over the intersection of the cone of positive semidefinite matrices and an affine space. It can be viewed as an extension of the well known linear programming problem, where the vector of variables is replaced by a symmetric matrix, and the nonnegative orthant is replaced by the cone of positive semidefinite matrices. It is a special case of the so-called conic programming problems, specialized to the cone of positive semidefinite matrices.
The standard formulation of an SDP problem is
$$(PSP)\quad\begin{array}{ll}\sup & C\bullet X\\ \mathrm{s.t.} & A_i\bullet X=b_i,\ i=1,2,\ldots,m,\\ & X\succeq 0,\end{array}$$
where the data C and $A_i$ ($i=1,2,\ldots,m$) are symmetric matrices, $b_i$ ($i=1,2,\ldots,m$) are scalars, the dot product '$\bullet$' is the usual matrix inner product introduced in Section 2.2, and '$X\succeq 0$' means the matrix X is positive semidefinite.
The dual problem of (PSP) is
$$(DSP)\quad\begin{array}{ll}\inf & b^{\mathrm T}y\\ \mathrm{s.t.} & \sum_{i=1}^m y_iA_i-Z=C,\\ & Z\succeq 0.\end{array}$$
An SDP problem is called strictly feasible if its feasible region has a nonempty interior; this is also known as the Slater condition. We now state the strong duality theorem of SDP; for its proof one is referred to Vandenberghe and Boyd [116] and Helmberg [51].
Theorem 2.5.1 The following hold for (PSP) and (DSP):
1. If (DSP) is strictly feasible, then v(PSP) = v(DSP). If in addition (DSP) is bounded below, then this optimal value is attained by a feasible $X^*$ of (PSP);
2. If (PSP) is strictly feasible, then v(PSP) = v(DSP). If in addition (PSP) is bounded above, then this optimal value is attained by a feasible $(Z^*,y^*)$ of (DSP);
3. Suppose one of (PSP) and (DSP) is strictly feasible and has a bounded optimal value. Then a feasible X of (PSP) and a feasible (Z,y) of (DSP) form a pair of optimal solutions to their respective problems if and only if $C\bullet X=b^{\mathrm T}y$, or equivalently $X\bullet Z=0$;
4. If both (PSP) and (DSP) are strictly feasible, then v(PSP) = v(DSP), and this optimal value is attained by some feasible $X^*$ of (PSP) and $(Z^*,y^*)$ of (DSP).
For convenience, an SDP problem is often specified in a slightly different but equivalent form. For example, linear expressions involving nonnegative scalar variables may be added to the program specification. This remains an SDP, because each such variable can be incorporated into the matrix X as a diagonal entry ($X_{ii}$ for some i), whose nonnegativity is already guaranteed by $X\succeq 0$; to decouple it from the rest of the matrix, the constraints $X_{ij}=0$ can be added for all $j\ne i$. As another example, note that for any $n\times n$ positive semidefinite matrix X, there exists a set of vectors $v^1,v^2,\cdots,v^n$ such that $X_{ij}=(v^i)^{\mathrm T}v^j$ for all $1\le i,j\le n$. Therefore, SDP problems are often formulated in terms of linear expressions on scalar products of vectors. Given the solution of the SDP in standard form, the vectors $v^1,v^2,\cdots,v^n$ can be recovered in $O(n^3)$ time, e.g., using the Cholesky decomposition of X.
There are several types of algorithms for solving SDP problems. These algorithms output solutions up to an additive error ε in time polynomial in the problem dimensions and $\ln(1/\epsilon)$. Interior point methods are the most popular and widely used ones. Many efficient SDP solvers based on interior point methods have been developed, including SeDuMi of Sturm [112], SDPT3 of Toh et al. [115], SDPA of Fujisawa et al. [37], CSDP of Borchers [19], DSDP of Benson and Ye [17], and so on.
SDP is of growing interest for several reasons. Many practical problems in operations research and combinatorial optimization can be modeled or approximated as SDP problems. In automatic control theory, SDP is used in the context of linear matrix inequalities. All linear programming problems can be expressed as SDP problems, and via hierarchies of SDP problems the solutions of polynomial optimization problems can be approximated. Besides, SDP has been used in the design of optimal experiments, and it can aid in the design of quantum computing circuits.
SDP has a wide range of practical applications. One of its most significant applications is in designing approximate solutions to combinatorial optimization problems, starting from the seminal work of Goemans and Williamson [40], who essentially proposed a polynomial-time randomized approximation algorithm with approximation ratio 0.878 for the max-cut problem. The algorithm uses SDP relaxation and randomization techniques, whose ideas have been revised and generalized in solving various quadratic programming problems [88, 118, 119, 87, 120, 24, 5, 121, 75, 50] and even quartic polynomial optimization problems [77, 73]. We now elaborate on the max-cut algorithm of Goemans and Williamson.
As described in Section 2.4, the max-cut problem is to partition the vertices of an undirected graph G = (V,E) with nonnegative weights on its edges into two disjoint sets, so that the total weight of the edges connecting the two sets is maximized. Denote $\{1,2,\ldots,n\}$ to be the set of vertices. Let $w_{ij}\ge 0$ be the weight of the edge connecting vertices i and j for all $i\ne j$, with $w_{ij}=0$ if there is no edge between i and j or if $i=j$. If we let $x_i$ ($i=1,2,\ldots,n$) be the binary variable indicating whether vertex i is in the first set ($x_i=1$) or the second set ($x_i=-1$), then max-cut is the following quadratic integer programming problem:
$$(MC)\quad\begin{array}{ll}\max & \sum_{1\le i,j\le n}w_{ij}(1-x_ix_j)/4\\ \mathrm{s.t.} & x_i\in\{1,-1\},\ i=1,2,\ldots,n.\end{array}$$
The problem is NP-hard (see e.g., Garey and Johnson [38]). Now, by introducing a matrix X with $X_{ij}$ replacing $x_ix_j$, the constraint set is equivalent to $\mathrm{diag}(X)=e$, $X\succeq 0$, $\mathrm{rank}(X)=1$. A straightforward SDP relaxation drops the rank-one constraint, which yields
$$(SMC)\quad\begin{array}{ll}\max & \sum_{1\le i,j\le n}w_{ij}(1-X_{ij})/4\\ \mathrm{s.t.} & \mathrm{diag}(X)=e,\ X\succeq 0.\end{array}$$
The algorithm first solves (SMC) to get an optimal solution $X^*$, then randomly generates an n-dimensional vector following the zero-mean multivariate normal distribution
$$\xi\sim\mathcal{N}(0_n,X^*),$$
and lets $x_i=\mathrm{sign}(\xi_i)$ for $i=1,2,\ldots,n$. Note that generating a zero-mean normal random vector with covariance matrix $X^*$ can be done by multiplying $(X^*)^{\frac12}$ with a vector whose components are n i.i.d. standard normal random variables. Here, the sign function takes value 1 for nonnegative numbers and −1 for negative numbers. The output cut (the solution x) is random and may not be optimal; however, it can be shown that
$$\mathrm{E}[x_ix_j]=\frac{2}{\pi}\arcsin X^*_{ij}\quad\forall\,1\le i,j\le n,$$
which further leads to
$$\mathrm{E}\left[\sum_{1\le i,j\le n}\frac{w_{ij}(1-x_ix_j)}{4}\right]\ge 0.878\,v(SMC)\ge 0.878\,v(MC).$$
This yields a 0.878-approximation ratio for the max-cut problem, significantly improving the previous best known ratio of 0.5 introduced in Section 2.4.
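A sketch of the rounding step in Python, assuming an optimal $X^*$ of (SMC) has already been obtained from an SDP solver (e.g., one of the solvers named above); the edge dictionary and trial count are illustrative assumptions.

```python
import numpy as np

def gw_round(Xstar, w, trials=100):
    """Goemans-Williamson rounding, given an optimal solution Xstar of (SMC).

    Draws xi ~ N(0, Xstar) via a Cholesky factor, takes signs, and keeps
    the best cut over several trials; w maps undirected edges to weights.
    """
    n = Xstar.shape[0]
    L = np.linalg.cholesky(Xstar + 1e-9 * np.eye(n))      # square-root factor
    best = -np.inf
    for _ in range(trials):
        x = np.where(L @ np.random.randn(n) >= 0, 1, -1)  # x_i = sign(xi_i)
        cut = sum(wij for (i, j), wij in w.items() if x[i] != x[j])
        best = max(best, cut)
    return best
```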
We conclude this section, as well as this chapter, by introducing another example of the SDP relaxation and randomization technique, for solving quadratically constrained quadratic programming (QCQP) as in Nemirovski et al. [87]. The problem is
$$(QP)\quad\begin{array}{ll}\max & x^{\mathrm T}Fx\\ \mathrm{s.t.} & x^{\mathrm T}Q_ix\le 1,\ i=1,2,\ldots,m,\\ & x\in\mathbb{R}^n,\end{array}$$
where $Q_i\succeq 0$ for $i=1,2,\ldots,m$ and $\sum_{i=1}^mQ_i\succ 0$. Remark that this is exactly the model (HQ) when d = 2, whose algorithm will be used later in this thesis.
By the same method of replacing $x_ix_j$ with $X_{ij}$ and dropping the rank-one constraint, we obtain the standard SDP relaxation of (QP):
$$(SQP)\quad\begin{array}{ll}\max & F\bullet X\\ \mathrm{s.t.} & Q_i\bullet X\le 1,\ i=1,2,\ldots,m,\\ & X\succeq 0.\end{array}$$
A polynomial-time randomized approximation algorithm runs as follows:
1. Solve (SQP) to get an optimal solution $X^*$;
2. Randomly generate a vector $\xi\sim\mathcal{N}(0_n,X^*)$;
3. Compute $t=\max_{1\le i\le m}\sqrt{\xi^{\mathrm T}Q_i\xi}$ and output the solution $\hat{x}=\xi/t$.
A probabilistic analysis shows that
$$\hat{x}^{\mathrm T}F\hat{x}\ge\Omega\left(1/\log m\right)v(SQP)\ge\Omega\left(1/\log m\right)v(QP)$$
holds with probability at least some universal constant. Thus, by running this algorithm $O(\log(1/\epsilon))$ times and picking the best solution, the approximation bound $\Omega(1/\log m)$ is attained with probability at least $1-\epsilon$. For details, one is referred to Nemirovski et al. [87] and He et al. [50].
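The three steps translate directly into code; the sketch below assumes the optimal $X^*$ of (SQP) is supplied by an external SDP solver, and all names are illustrative.

```python
import numpy as np

def qcqp_round(Xstar, F, Qs, trials=50):
    """Randomized rounding for (QP), given an optimal Xstar of (SQP) (sketch).

    F and each Q in Qs are symmetric n x n arrays; the returned x satisfies
    x' Q_i x <= 1 for all i by construction of the scaling factor t.
    """
    n = Xstar.shape[0]
    L = np.linalg.cholesky(Xstar + 1e-9 * np.eye(n))
    best_x, best_val = None, -np.inf
    for _ in range(trials):                          # O(log(1/eps)) repetitions
        xi = L @ np.random.randn(n)                  # xi ~ N(0, Xstar)
        t = max(np.sqrt(xi @ Q @ xi) for Q in Qs)    # feasibility scaling
        x = xi / t
        val = x @ F @ x
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val
```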
Chapter 3
Multilinear Form Optimization
with Quadratic Constraints
3.1 Introduction
The first subclass of polynomial optimization problems studied in this thesis consists of the following multilinear form optimization problems with quadratic constraints. Specifically, the models include maximizing a multilinear form under spherical constraints
$$(TS)\quad\begin{array}{ll}\max & F(x^1,x^2,\cdots,x^d)\\ \mathrm{s.t.} & x^k\in S^{n_k},\ k=1,2,\ldots,d,\end{array}$$
and maximizing a multilinear form over co-centered ellipsoidal constraints
$$(TQ)\quad\begin{array}{ll}\max & F(x^1,x^2,\cdots,x^d)\\ \mathrm{s.t.} & (x^k)^{\mathrm T}Q^k_{i_k}x^k\le 1,\ k=1,2,\ldots,d,\ i_k=1,2,\ldots,m_k,\\ & x^k\in\mathbb{R}^{n_k},\ k=1,2,\ldots,d,\end{array}$$
where $Q^k_{i_k}\succeq 0$ and $\sum_{i_k=1}^{m_k}Q^k_{i_k}\succ 0$ for $k=1,2,\ldots,d$, $i_k=1,2,\ldots,m_k$.
It is easy to see that the optimal value of (TS), denoted by v(TS), is positive by the assumption that F is not a zero tensor. Moreover, (TS) is equivalent to its ball-constrained version
$$\begin{array}{ll}\max & F(x^1,x^2,\cdots,x^d)\\ \mathrm{s.t.} & \|x^k\|\le 1,\ x^k\in\mathbb{R}^{n_k},\ k=1,2,\ldots,d.\end{array}$$
This is because we can always scale the decision variables so that $\|x^k\|=1$ for all $1\le k\le d$ without decreasing the objective. Thus (TS) is a special case of (TQ).
Homogeneous polynomial functions play an important role in approximation theory. In a certain well-defined sense, homogeneous polynomials are fairly dense among all continuous functions (see e.g., [117, 69]). Multilinear forms are a special class of homogeneous polynomials. In fact, one of the main reasons for us to study multilinear form optimization is its strong connection to homogeneous polynomial optimization in deriving approximation bounds, whose details will be discussed in Chapter 4. This connection creates a new approach to handling polynomial optimization problems, in which the fundamental issue is the optimization of a multilinear tensor form. Chen et al. [25] establish the tightness of the multilinear form relaxation for maximizing a homogeneous polynomial over the spherical constraint, which makes the study of multilinear form optimization all the more important.
Low degree cases of (TS) are often encountered: when d = 1, its optimal solution is $F/\|F\|$ due to the Cauchy-Schwartz inequality; when d = 2, (TS) amounts to computing the spectrum norm of the matrix F, for which efficient algorithms are readily available. As we shall prove later, (TS) is already NP-hard when d = 3; the focus of this chapter is therefore to design polynomial-time approximation algorithms with worst-case performance ratios for any fixed degree d. The novel idea in handling a multilinear form of high degree is to reduce its degree, which leads to a relaxed multilinear form optimization problem of lower degree. Just as any matrix can be treated as a long vector, any higher order tensor can be rewritten as a tensor whose order is reduced by one (see the tensor operations in Section 2.2), and thus its corresponding multilinear form can be rewritten with its degree reduced by one. After solving the lower degree problem, we need to decompose its solution to make it feasible for the higher degree problem. Specific decomposition methods are therefore required, and they constitute the main contribution of this chapter.
For the model (TQ): when d = 1, it can be formulated as a second order cone program (SOCP), which can be solved in polynomial-time (see e.g., [20, 86]); when d = 2, it can be formulated as the quadratically constrained quadratic programming problem discussed in Section 2.5, which is NP-hard in general. Nemirovski et al. [87] proposed a polynomial-time randomized approximation algorithm with approximation ratio $\Omega(1/\log m)$ based on SDP relaxation and randomization, and this algorithm serves as a basis in analyzing our algorithms and approximation ratios.
We discuss approximation algorithms for (TS) in Section 3.2, followed by those for (TQ) in Section 3.3. Some application examples of the models concerned are discussed in Section 3.4. Finally, the numerical performance of the proposed algorithms is reported in Section 3.5.
3.2 Multilinear Form with Spherical Constraints
Let us first consider the following optimization model
$$(TS)\quad\begin{array}{ll}\max & F(x^1,x^2,\cdots,x^d)\\ \mathrm{s.t.} & x^k\in S^{n_k},\ k=1,2,\ldots,d,\end{array}$$
where $n_1\le n_2\le\cdots\le n_d$. A special case of (TS) is worth noting, as it plays an important role in our algorithms.

Proposition 3.2.1 If d = 2, then (TS) can be solved in polynomial-time, with $v(TS)\ge\|F\|/\sqrt{n_1}$.
Proof. The problem is essentially $\max_{x\in S^{n_1},\,y\in S^{n_2}}x^{\mathrm T}Fy$. For any fixed y, the corresponding optimal x must be $Fy/\|Fy\|$ due to the Cauchy-Schwartz inequality, and accordingly,
$$x^{\mathrm T}Fy=\left(\frac{Fy}{\|Fy\|}\right)^{\mathrm T}Fy=\|Fy\|=\sqrt{y^{\mathrm T}F^{\mathrm T}Fy}.$$
Thus the problem is equivalent to $\max_{y\in S^{n_2}}y^{\mathrm T}F^{\mathrm T}Fy$, which is solved by the largest eigenvalue and a corresponding eigenvector of the positive semidefinite matrix $F^{\mathrm T}F$. We then have
$$\lambda_{\max}(F^{\mathrm T}F)\ge\mathrm{tr}(F^{\mathrm T}F)/\mathrm{rank}(F^{\mathrm T}F)\ge\|F\|^2/n_1,$$
which implies $v(TS)=\sqrt{\lambda_{\max}(F^{\mathrm T}F)}\ge\|F\|/\sqrt{n_1}$. $\square$
However, for any degree d ≥ 3, (TS) becomes NP-hard.
Proposition 3.2.2 If d = 3, then (TS) is NP-hard.
Proof. We first quote a result of Nesterov [90], which states that
$$\begin{array}{ll}\max & \sum_{k=1}^m(x^{\mathrm T}A_kx)^2\\ \mathrm{s.t.} & x\in S^n\end{array}$$
is NP-hard. Now consider a special case of (TS): d = 3, $n_1=n_2=n_3=n$, and $F\in\mathbb{R}^{n^3}$ satisfies $F_{ijk}=F_{jik}$ for all $1\le i,j,k\le n$. The objective function of (TS) can be written as
$$F(x,y,z)=\sum_{i,j,k=1}^nF_{ijk}x_iy_jz_k=\sum_{k=1}^nz_k\left(\sum_{i,j=1}^nF_{ijk}x_iy_j\right)=\sum_{k=1}^nz_k\,(x^{\mathrm T}A_ky),$$
where the symmetric matrix $A_k\in\mathbb{R}^{n\times n}$ has its (i, j)-th entry equal to $F_{ijk}$ for all $1\le i,j,k\le n$. By the Cauchy-Schwartz inequality, (TS) is equivalent to
$$\begin{array}{ll}\max & \sum_{k=1}^n(x^{\mathrm T}A_ky)^2\\ \mathrm{s.t.} & x,y\in S^n.\end{array}$$
We need only to show that the optimal value of the above problem is always attainable at x = y. To see why, denote by $(\hat{x},\hat{y})$ any optimal solution pair, with optimal value $v^*$. If $\hat{x}=\pm\hat{y}$, then the claim is true; otherwise $\hat{x}\ne\pm\hat{y}$, and in particular $\hat{x}+\hat{y}\ne 0$. Let us denote $w:=(\hat{x}+\hat{y})/\|\hat{x}+\hat{y}\|$. Since $(\hat{x},\hat{y})$ must be a KKT point, there exist $(\lambda,\mu)$ such that
$$\sum_{k=1}^n\hat{x}^{\mathrm T}A_k\hat{y}\;A_k\hat{y}=\lambda\hat{x}\quad\text{and}\quad\sum_{k=1}^n\hat{x}^{\mathrm T}A_k\hat{y}\;A_k\hat{x}=\mu\hat{y}.$$
Pre-multiplying $\hat{x}^{\mathrm T}$ to the first equation and $\hat{y}^{\mathrm T}$ to the second yields $\lambda=\mu=v^*$. Summing up the two equations, pre-multiplying $w^{\mathrm T}$, and then scaling lead us to
$$\sum_{k=1}^n\hat{x}^{\mathrm T}A_k\hat{y}\;w^{\mathrm T}A_kw=v^*.$$
By applying the Cauchy-Schwartz inequality to the above equality, we have
$$v^*\le\left(\sum_{k=1}^n(\hat{x}^{\mathrm T}A_k\hat{y})^2\right)^{\frac12}\left(\sum_{k=1}^n(w^{\mathrm T}A_kw)^2\right)^{\frac12}=\sqrt{v^*}\left(\sum_{k=1}^n(w^{\mathrm T}A_kw)^2\right)^{\frac12},$$
which implies that (w,w) is also an optimal solution. The problem is then reduced to Nesterov's quartic model, and its NP-hardness thus follows. $\square$
In the remainder of this section, we focus on approximation algorithms for (TS) with general degree d. To illustrate the main idea of the algorithms, let us first work with the case d = 3, i.e.,
$$(TS)\quad\begin{array}{ll}\max & F(x,y,z)=\sum_{1\le i\le n_1,\,1\le j\le n_2,\,1\le k\le n_3}F_{ijk}x_iy_jz_k\\ \mathrm{s.t.} & x\in S^{n_1},\ y\in S^{n_2},\ z\in S^{n_3}.\end{array}$$
Denote $W=xy^{\mathrm T}$; then
$$\|W\|^2=\mathrm{tr}(WW^{\mathrm T})=\mathrm{tr}(xy^{\mathrm T}yx^{\mathrm T})=\mathrm{tr}(x^{\mathrm T}xy^{\mathrm T}y)=\|x\|^2\|y\|^2=1.$$
Model (TS) can now be relaxed to
$$\begin{array}{ll}\max & F(W,z)=\sum_{1\le i\le n_1,\,1\le j\le n_2,\,1\le k\le n_3}F_{ijk}W_{ij}z_k\\ \mathrm{s.t.} & W\in S^{n_1\times n_2},\ z\in S^{n_3}.\end{array}$$
Notice that the above problem is exactly (TS) with d = 2, which can be solved in polynomial-time by Proposition 3.2.1. Denote its optimal solution by $(\hat{W},\hat{z})$. Clearly $F(\hat{W},\hat{z})\ge v(TS)$. The key step is to recover a solution $(\hat{x},\hat{y})$ from the matrix $\hat{W}$. Below we introduce two basic decomposition routines: one based on randomization and the other on eigen-decomposition. They play a fundamental role in our proposed algorithms; all solution methods to be developed later rely on these two routines as a basis.
Decomposition Routine 3.2.1
• INPUT: matrices $M\in\mathbb{R}^{n_1\times n_2}$, $W\in S^{n_1\times n_2}$.
1. Construct
$$\tilde{W}=\begin{pmatrix}I_{n_1\times n_1} & W\\ W^{\mathrm T} & W^{\mathrm T}W\end{pmatrix}\succeq 0.$$
2. Randomly generate
$$\begin{pmatrix}\xi\\ \eta\end{pmatrix}\sim\mathcal{N}(0_{n_1+n_2},\tilde{W})$$
and repeat if necessary, until $\xi^{\mathrm T}M\eta\ge M\bullet W$ and $\|\xi\|\|\eta\|\le O(\sqrt{n_1})$.
3. Compute $x=\xi/\|\xi\|$ and $y=\eta/\|\eta\|$.
• OUTPUT: vectors $x\in S^{n_1}$, $y\in S^{n_2}$.
Now, let $M=F(\cdot,\cdot,\hat{z})$ and $W=\hat{W}$ in applying the above decomposition routine. For the randomly generated $(\xi,\eta)$, we have
$$\mathrm{E}[F(\xi,\eta,\hat{z})]=\mathrm{E}[\xi^{\mathrm T}M\eta]=M\bullet\hat{W}=F(\hat{W},\hat{z}).$$
He et al. [50] establish that if f(x) is a homogeneous quadratic form and x is drawn from a zero-mean multivariate normal distribution, then there is a universal constant $\theta\ge 0.03$ such that
$$\mathrm{Prob}\{f(x)\ge\mathrm{E}[f(x)]\}\ge\theta.$$
Since $\xi^{\mathrm T}M\eta$ is a homogeneous quadratic form of the normal random vector $(\xi^{\mathrm T},\eta^{\mathrm T})^{\mathrm T}$, we know
$$\mathrm{Prob}\{\xi^{\mathrm T}M\eta\ge M\bullet\hat{W}\}=\mathrm{Prob}\{F(\xi,\eta,\hat{z})\ge\mathrm{E}[F(\xi,\eta,\hat{z})]\}\ge\theta.$$
Moreover, by using a property of normal random vectors (see Lemma 3.1 of [77]), we have
$$\mathrm{E}\left[\|\xi\|^2\|\eta\|^2\right]=\mathrm{E}\left[\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\xi_i^2\eta_j^2\right]=\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\left(\mathrm{E}[\xi_i^2]\,\mathrm{E}[\eta_j^2]+2(\mathrm{E}[\xi_i\eta_j])^2\right)=\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\left((W^{\mathrm T}W)_{jj}+2W_{ij}^2\right)=(n_1+2)\,\mathrm{tr}(W^{\mathrm T}W)=n_1+2.$$
By applying the Markov inequality, for any t > 0,
$$\mathrm{Prob}\{\|\xi\|^2\|\eta\|^2\ge t\}\le\mathrm{E}\left[\|\xi\|^2\|\eta\|^2\right]/t=(n_1+2)/t.$$
Therefore, by the union inequality for the probability of joint events, we have
$$\begin{aligned}\mathrm{Prob}\left\{F(\xi,\eta,\hat{z})\ge F(\hat{W},\hat{z}),\ \|\xi\|^2\|\eta\|^2\le t\right\}&\ge 1-\mathrm{Prob}\left\{F(\xi,\eta,\hat{z})<F(\hat{W},\hat{z})\right\}-\mathrm{Prob}\left\{\|\xi\|^2\|\eta\|^2>t\right\}\\ &\ge 1-(1-\theta)-(n_1+2)/t=\theta/2,\end{aligned}$$
where we let $t=2(n_1+2)/\theta$. Thus we have
$$F(x,y,\hat{z})\ge\frac{F(\hat{W},\hat{z})}{\sqrt{t}}\ge v(TS)\sqrt{\frac{\theta}{2(n_1+2)}},$$
obtaining an $\Omega(1/\sqrt{n_1})$-approximation ratio.
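A compact numpy sketch of DR 3.2.1 follows; the jitter added before the Cholesky factorization and the retry cap are implementation details not part of the routine itself.

```python
import numpy as np

def dr321(M, W, max_tries=10000):
    """Decomposition Routine 3.2.1 (randomized), a sketch.

    Given M and W with ||W||_F = 1, draws (xi, eta) ~ N(0, Wtilde) until
    xi' M eta >= M . W and ||xi||^2 ||eta||^2 <= t; returns unit vectors.
    """
    n1, n2 = W.shape
    theta = 0.03                                   # universal constant of [50]
    t = 2.0 * (n1 + 2) / theta                     # threshold from the analysis
    Wtilde = np.block([[np.eye(n1), W], [W.T, W.T @ W]])
    L = np.linalg.cholesky(Wtilde + 1e-9 * np.eye(n1 + n2))
    target = np.sum(M * W)                         # the inner product M . W
    for _ in range(max_tries):
        z = L @ np.random.randn(n1 + n2)
        xi, eta = z[:n1], z[n1:]
        if xi @ M @ eta >= target and (xi @ xi) * (eta @ eta) <= t:
            return xi / np.linalg.norm(xi), eta / np.linalg.norm(eta)
    raise RuntimeError("sampling failed; increase max_tries")
```

By the analysis above, each trial succeeds with probability at least θ/2, so only a constant number of draws is needed in expectation.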
Below we present an alternative (and deterministic) decomposition routine.
Decomposition Routine 3.2.2
• INPUT: a matrix $M\in\mathbb{R}^{n_1\times n_2}$.
1. Find an eigenvector $\hat{y}$ corresponding to the largest eigenvalue of $M^{\mathrm T}M$.
2. Compute $x=M\hat{y}/\|M\hat{y}\|$ and $y=\hat{y}/\|\hat{y}\|$.
• OUTPUT: vectors $x\in S^{n_1}$, $y\in S^{n_2}$.
This decomposition routine literally follows the proof of Proposition 3.2.1, which tells us that $x^{\mathrm T}My\ge\|M\|/\sqrt{n_1}$. Thus we have
$$F(x,y,\hat{z})=x^{\mathrm T}My\ge\frac{\|M\|}{\sqrt{n_1}}=\max_{Z\in S^{n_1\times n_2}}\frac{M\bullet Z}{\sqrt{n_1}}\ge\frac{M\bullet\hat{W}}{\sqrt{n_1}}=\frac{F(\hat{W},\hat{z})}{\sqrt{n_1}}\ge\frac{v(TS)}{\sqrt{n_1}}.$$
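DR 3.2.2 amounts to extracting the top singular pair of M; the following sketch does so via numpy's SVD, which is equivalent to the stated eigen-decomposition of $M^{\mathrm T}M$ (an implementation choice, not prescribed by the thesis).

```python
import numpy as np

def dr322(M):
    """Decomposition Routine 3.2.2 (deterministic), a sketch.

    The top right singular vector of M is an eigenvector for the largest
    eigenvalue of M'M; then x = My/||My||, and x'My equals the spectral
    norm of M, which is at least ||M||_F / sqrt(n1) when n1 <= n2.
    """
    _, _, Vt = np.linalg.svd(M)
    y = Vt[0, :]
    x = M @ y
    return x / np.linalg.norm(x), y
```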
The complexity of DR 3.2.1 is $O(n_1n_2\log(1/\epsilon))$ to obtain the quality-assured solution with probability $1-\epsilon$, and that of DR 3.2.2 is $O(\max\{n_1^3,n_1n_2\})$. However, DR 3.2.2 is very easy to implement, and is deterministic. Both DR 3.2.1 and DR 3.2.2 lead to the following approximation result in terms of the order of the approximation ratio.
Theorem 3.2.3 If d = 3, then (TS) admits a polynomial-time approximation algorithm with approximation ratio $1/\sqrt{n_1}$.
Now we proceed to the case of general d. Let $X=x^1(x^d)^{\mathrm T}$; then (TS) can be relaxed to
$$(\overline{TS})\quad\begin{array}{ll}\max & F(X,x^2,x^3,\cdots,x^{d-1})\\ \mathrm{s.t.} & X\in S^{n_1\times n_d},\ x^k\in S^{n_k},\ k=2,3,\ldots,d-1.\end{array}$$
Clearly it is a model of the type (TS) with degree d − 1. Suppose $(\overline{TS})$ can be solved approximately in polynomial-time with approximation ratio τ, i.e., we find $(\hat{X},\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1})$ with
$$F(\hat{X},\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1})\ge\tau\,v(\overline{TS})\ge\tau\,v(TS).$$
Observing that $F(\cdot,\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1},\cdot)$ is an $n_1\times n_d$ matrix, using DR 3.2.2 we can find $(\hat{x}^1,\hat{x}^d)$ such that
$$F(\hat{x}^1,\hat{x}^2,\cdots,\hat{x}^d)\ge F(\hat{X},\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1})/\sqrt{n_1}\ge n_1^{-\frac12}\,\tau\,v(TS).$$
By induction, this leads to the following:

Theorem 3.2.4 (TS) admits a polynomial-time approximation algorithm with approximation ratio τ(TS), where
$$\tau(TS):=\left(\prod_{k=1}^{d-2}n_k\right)^{-\frac12}.$$
Below we summarize the above recursive procedure for solving (TS), as in Theorem 3.2.4. Remark that the approximation performance ratio of this algorithm is tight: for the special example $F(x^1,x^2,\cdots,x^d)=\sum_{i=1}^nx^1_ix^2_i\cdots x^d_i$, the algorithm can be made to return a solution with approximation ratio exactly τ(TS).
Algorithm 3.2.3
• INPUT: a d-th order tensor $F\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_d}$ with $n_1\le n_2\le\cdots\le n_d$.
1. Rewrite F as a (d − 1)-th order tensor $F'\in\mathbb{R}^{n_2\times n_3\times\cdots\times n_{d-1}\times n_dn_1}$ by combining its first and last modes into one, and placing it in the last mode of F′, i.e.,
$$F_{i_1,i_2,\cdots,i_d}=F'_{i_2,i_3,\cdots,i_{d-1},(i_1-1)n_d+i_d}\quad\forall\,1\le i_1\le n_1,\ 1\le i_2\le n_2,\ \cdots,\ 1\le i_d\le n_d.$$
2. For (TS) with the (d − 1)-th order tensor F′: if d − 1 = 2, then apply DR 3.2.2 with input $F'=M$ and output $(\hat{x}^2,\hat{x}^{1,d})=(x,y)$; otherwise obtain a solution $(\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1},\hat{x}^{1,d})$ by recursion.
3. Compute the matrix $M'=F(\cdot,\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1},\cdot)$ and rewrite the vector $\hat{x}^{1,d}$ as a matrix $\hat{X}\in S^{n_1\times n_d}$.
4. Apply either DR 3.2.1 or DR 3.2.2, with input $(M',\hat{X})=(M,W)$ and output $(\hat{x}^1,\hat{x}^d)=(x,y)$.
• OUTPUT: a feasible solution $(\hat{x}^1,\hat{x}^2,\cdots,\hat{x}^d)$.
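To make the recursion concrete, here is a minimal numpy sketch of Algorithm 3.2.3 that uses the deterministic DR 3.2.2 in every decomposition step (so the merged-mode vector returned by the recursion is not needed in Step 4); function names and the driver example are illustrative only.

```python
import numpy as np

def dr322(M):
    # DR 3.2.2 as sketched above: top singular pair of M
    U, _, Vt = np.linalg.svd(M)
    return U[:, 0], Vt[0, :]

def algorithm_323(F):
    """Sketch of Algorithm 3.2.3: approximate max of F(x1,...,xd) on spheres.

    Input: numpy array F of shape (n1, ..., nd); output: unit vectors.
    """
    if F.ndim == 2:
        return list(dr322(F))
    n1, nd = F.shape[0], F.shape[-1]
    # Step 1: merge the first and last modes into one trailing mode
    Fp = np.moveaxis(F, 0, -2).reshape(F.shape[1:-1] + (n1 * nd,))
    # Step 2: solve the degree-(d-1) problem recursively
    sol = algorithm_323(Fp)
    middle = sol[:-1]            # x2, ..., x_{d-1}; merged vector is unused
    # Steps 3-4: contract F with the middle vectors, then split (x1, xd)
    Mp = F
    for v in middle:
        Mp = np.tensordot(Mp, v, axes=([1], [0]))   # contract current 2nd mode
    x1, xd = dr322(Mp)
    return [x1] + list(middle) + [xd]

# Tiny usage example on a random third order tensor
F = np.random.randn(3, 4, 5)
x, y, z = algorithm_323(F)
print(np.einsum('ijk,i,j,k->', F, x, y, z))          # feasible objective value
```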
3.3 Multilinear Form with Ellipsoidal Constraints
In this section, we consider a generalization of the optimization model discussed in Section 3.2 to general ellipsoidal constraints. Specifically, the model is
$$(TQ)\quad\begin{array}{ll}\max & F(x^1,x^2,\cdots,x^d)\\ \mathrm{s.t.} & (x^k)^{\mathrm T}Q^k_{i_k}x^k\le 1,\ k=1,2,\ldots,d,\ i_k=1,2,\ldots,m_k,\\ & x^k\in\mathbb{R}^{n_k},\ k=1,2,\ldots,d,\end{array}$$
where $Q^k_{i_k}\succeq 0$ and $\sum_{i_k=1}^{m_k}Q^k_{i_k}\succ 0$ for $k=1,2,\ldots,d$, $i_k=1,2,\ldots,m_k$.
Let us start with the case d = 2, and suppose $F(x^1,x^2)=(x^1)^{\mathrm T}Fx^2$. Denote
$$y=\begin{pmatrix}x^1\\ x^2\end{pmatrix},\quad F'=\begin{pmatrix}0_{n_1\times n_1} & \frac{F}{2}\\ \frac{F^{\mathrm T}}{2} & 0_{n_2\times n_2}\end{pmatrix},\quad Q_i=\begin{pmatrix}Q^1_i & 0_{n_1\times n_2}\\ 0_{n_2\times n_1} & 0_{n_2\times n_2}\end{pmatrix}\ \text{for all }1\le i\le m_1,$$
and
$$Q_i=\begin{pmatrix}0_{n_1\times n_1} & 0_{n_1\times n_2}\\ 0_{n_2\times n_1} & Q^2_{i-m_1}\end{pmatrix}\ \text{for all }m_1+1\le i\le m_1+m_2.$$
Then (TQ) is equivalent to
$$\begin{array}{ll}\max & y^{\mathrm T}F'y\\ \mathrm{s.t.} & y^{\mathrm T}Q_iy\le 1,\ i=1,2,\ldots,m_1+m_2,\\ & y\in\mathbb{R}^{n_1+n_2}.\end{array}$$
This is the QCQP problem discussed in Section 2.5, which can be solved approximately by a polynomial-time randomized algorithm with approximation ratio $\Omega\left(\frac{1}{\log(m_1+m_2)}\right)$ (see e.g., Nemirovski et al. [87] and He et al. [50]).
We now proceed to the higher order cases. To illustrate the essential ideas, we shall focus on the case d = 3; the extension to any higher order can be done by induction. In the case d = 3 we may explicitly write (TQ) as
$$(TQ)\quad\begin{array}{ll}\max & F(x,y,z)\\ \mathrm{s.t.} & x^{\mathrm T}Q_ix\le 1,\ i=1,2,\ldots,m_1,\\ & y^{\mathrm T}P_jy\le 1,\ j=1,2,\ldots,m_2,\\ & z^{\mathrm T}R_kz\le 1,\ k=1,2,\ldots,m_3,\\ & x\in\mathbb{R}^{n_1},\ y\in\mathbb{R}^{n_2},\ z\in\mathbb{R}^{n_3},\end{array}$$
where $Q_i\succeq 0$ for all $1\le i\le m_1$, $P_j\succeq 0$ for all $1\le j\le m_2$, $R_k\succeq 0$ for all $1\le k\le m_3$, and $\sum_{i=1}^{m_1}Q_i\succ 0$, $\sum_{j=1}^{m_2}P_j\succ 0$, $\sum_{k=1}^{m_3}R_k\succ 0$.
Combining the constraints on x and y, we have
$$\mathrm{tr}(Q_ixy^{\mathrm T}P_jyx^{\mathrm T})=\mathrm{tr}(x^{\mathrm T}Q_ix\,y^{\mathrm T}P_jy)=x^{\mathrm T}Q_ix\cdot y^{\mathrm T}P_jy\le 1.$$
Denoting $W=xy^{\mathrm T}$, (TQ) can be relaxed to
$$(\overline{TQ})\quad\begin{array}{ll}\max & F(W,z)\\ \mathrm{s.t.} & \mathrm{tr}(Q_iWP_jW^{\mathrm T})\le 1,\ i=1,2,\ldots,m_1,\ j=1,2,\ldots,m_2,\\ & z^{\mathrm T}R_kz\le 1,\ k=1,2,\ldots,m_3,\\ & W\in\mathbb{R}^{n_1\times n_2},\ z\in\mathbb{R}^{n_3}.\end{array}$$
Observe that for any $W\in\mathbb{R}^{n_1\times n_2}$,
$$\mathrm{tr}(Q_iWP_jW^{\mathrm T})=\mathrm{tr}(Q_i^{\frac12}WP_j^{\frac12}P_j^{\frac12}W^{\mathrm T}Q_i^{\frac12})=\left\|Q_i^{\frac12}WP_j^{\frac12}\right\|^2\ge 0,$$
and that for any $W\ne 0$,
$$\sum_{1\le i\le m_1,\,1\le j\le m_2}\mathrm{tr}(Q_iWP_jW^{\mathrm T})=\mathrm{tr}\left(\left(\sum_{i=1}^{m_1}Q_i\right)W\left(\sum_{j=1}^{m_2}P_j\right)W^{\mathrm T}\right)=\left\|\left(\sum_{i=1}^{m_1}Q_i\right)^{\frac12}W\left(\sum_{j=1}^{m_2}P_j\right)^{\frac12}\right\|^2>0.$$
Indeed, it is easy to verify that $\mathrm{tr}(Q_iWP_jW^{\mathrm T})=(\mathrm{vec}(W))^{\mathrm T}(Q_i\otimes P_j)\mathrm{vec}(W)$, which implies that $\mathrm{tr}(Q_iWP_jW^{\mathrm T})\le 1$ is actually a convex quadratic constraint for W.
Thus, $(\overline{TQ})$ is exactly in the form of (TQ) with d = 2. Therefore we are able to find a feasible solution $(\hat{W},\hat{z})$ of $(\overline{TQ})$ in polynomial-time, such that
$$F(\hat{W},\hat{z})\ge\Omega\left(\frac{1}{\log(m_1m_2+m_3)}\right)v(\overline{TQ})\ge\Omega\left(\frac{1}{\log m}\right)v(TQ),\tag{3.1}$$
where $m=\max\{m_1,m_2,m_3\}$. Let us fix $\hat{z}$; then $F(\cdot,\cdot,\hat{z})$ is a matrix. Our next step is to generate $(\hat{x},\hat{y})$ from $\hat{W}$. For this purpose, we first introduce the following lemma.
Lemma 3.3.1 Suppose $Q_i\in\mathbb{R}^{n\times n}$, $Q_i\succeq 0$ for all $1\le i\le m$, and $\sum_{i=1}^mQ_i\succ 0$. The following SDP problem
$$(PS)\quad\begin{array}{ll}\min & \sum_{i=1}^mt_i\\ \mathrm{s.t.} & \mathrm{tr}(UQ_i)\le 1,\ i=1,2,\ldots,m,\\ & t_i\ge 0,\ i=1,2,\ldots,m,\\ & \begin{pmatrix}U & I_{n\times n}\\ I_{n\times n} & \sum_{i=1}^mt_iQ_i\end{pmatrix}\succeq 0\end{array}$$
has an attainable optimal solution with optimal value equal to n.
Proof. Straightforward computation shows that the dual of (PS) is
$$(DS)\quad\begin{array}{ll}\max & -\sum_{i=1}^ms_i-2\,\mathrm{tr}(Z)\\ \mathrm{s.t.} & \mathrm{tr}(XQ_i)\le 1,\ i=1,2,\ldots,m,\\ & s_i\ge 0,\ i=1,2,\ldots,m,\\ & \begin{pmatrix}X & Z\\ Z^{\mathrm T} & \sum_{i=1}^ms_iQ_i\end{pmatrix}\succeq 0.\end{array}$$
Observe that (DS) indeed resembles (PS). Since $\sum_{i=1}^mQ_i\succ 0$, both (PS) and (DS) satisfy the Slater condition, and thus both of them have attainable optimal solutions satisfying the strong duality relationship, i.e., v(PS) = v(DS). Let $(U^*,t^*)$ be an optimal solution of (PS). Clearly $U^*\succ 0$, and by the Schur complement relationship we have $\sum_{i=1}^mt^*_iQ_i\succeq(U^*)^{-1}$. Therefore,
$$v(PS)=\sum_{i=1}^mt^*_i\ge\sum_{i=1}^mt^*_i\,\mathrm{tr}(U^*Q_i)=\mathrm{tr}\left(U^*\sum_{i=1}^mt^*_iQ_i\right)\ge\mathrm{tr}\left(U^*(U^*)^{-1}\right)=n.\tag{3.2}$$
Observe that for any dual feasible solution (X,Z,s) we always have $-\sum_{i=1}^ms_i\le-\mathrm{tr}\left(X\sum_{i=1}^ms_iQ_i\right)$. Hence the following problem is a relaxation of (DS):
$$(RS)\quad\begin{array}{ll}\max & -\mathrm{tr}(XY)-2\,\mathrm{tr}(Z)\\ \mathrm{s.t.} & \begin{pmatrix}X & Z\\ Z^{\mathrm T} & Y\end{pmatrix}\succeq 0.\end{array}$$
Consider any feasible solution (X,Y,Z) of (RS). Let $X=P^{\mathrm T}DP$ be an orthonormal decomposition with $D=\mathrm{Diag}(d_1,d_2,\cdots,d_n)$ and $P^{-1}=P^{\mathrm T}$. Notice that $(D,Y',Z'):=(PXP^{\mathrm T},PYP^{\mathrm T},PZP^{\mathrm T})$ is also a feasible solution for (RS) with the same objective value. By the feasibility, it follows that $d_iY'_{ii}-(Z'_{ii})^2\ge 0$ for $i=1,2,\ldots,n$. Therefore,
$$-\mathrm{tr}(XY)-2\,\mathrm{tr}(Z)=-\mathrm{tr}(DY')-2\,\mathrm{tr}(Z')=-\sum_{i=1}^nd_iY'_{ii}-2\sum_{i=1}^nZ'_{ii}\le-\sum_{i=1}^n(Z'_{ii})^2-2\sum_{i=1}^nZ'_{ii}=-\sum_{i=1}^n(Z'_{ii}+1)^2+n\le n.$$
This implies that $v(DS)\le v(RS)\le n$. Combining this with (3.2) and the strong duality relationship, it follows that $v(PS)=v(DS)=n$. $\square$
We then have the following decomposition method, to be called DR 3.3.1, as a
further extension of DR 3.2.1. It plays a similar role in Algorithm 3.3.2 as DR 3.2.1 or
DR 3.2.2 does in Algorithm 3.2.3.
Decomposition Routine 3.3.1
• INPUT: matrices $Q_i\in\mathbb{R}^{n_1\times n_1}$, $Q_i\succeq 0$ for all $1\le i\le m_1$ with $\sum_{i=1}^{m_1}Q_i\succ 0$; $P_j\in\mathbb{R}^{n_2\times n_2}$, $P_j\succeq 0$ for all $1\le j\le m_2$ with $\sum_{j=1}^{m_2}P_j\succ 0$; $W\in\mathbb{R}^{n_1\times n_2}$ with $\mathrm{tr}(Q_iWP_jW^{\mathrm T})\le 1$ for all $1\le i\le m_1$ and $1\le j\le m_2$; and $M\in\mathbb{R}^{n_1\times n_2}$.
1. Solve the SDP problem
$$\begin{array}{ll}\min & \sum_{i=1}^{m_1}t_i\\ \mathrm{s.t.} & \mathrm{tr}(UQ_i)\le 1,\ i=1,2,\ldots,m_1,\\ & t_i\ge 0,\ i=1,2,\ldots,m_1,\\ & \begin{pmatrix}U & I_{n_1\times n_1}\\ I_{n_1\times n_1} & \sum_{i=1}^{m_1}t_iQ_i\end{pmatrix}\succeq 0\end{array}$$
to get an optimal solution: a matrix U and scalars $t_1,t_2,\cdots,t_{m_1}$.
2. Construct
$$\tilde{W}=\begin{pmatrix}U & W\\ W^{\mathrm T} & W^{\mathrm T}\left(\sum_{i=1}^{m_1}t_iQ_i\right)W\end{pmatrix}\succeq 0.$$
3. Randomly generate
$$\begin{pmatrix}\xi\\ \eta\end{pmatrix}\sim\mathcal{N}(0_{n_1+n_2},\tilde{W})$$
and repeat if necessary, until $\xi^{\mathrm T}M\eta\ge M\bullet W$, $\xi^{\mathrm T}Q_i\xi\le O(\log m_1)$ for all $1\le i\le m_1$, and $\eta^{\mathrm T}P_j\eta\le O(n_1\log m_2)$ for all $1\le j\le m_2$.
4. Compute $x=\xi/\sqrt{\max_{1\le i\le m_1}\xi^{\mathrm T}Q_i\xi}$ and $y=\eta/\sqrt{\max_{1\le j\le m_2}\eta^{\mathrm T}P_j\eta}$.
• OUTPUT: vectors $x\in\mathbb{R}^{n_1}$, $y\in\mathbb{R}^{n_2}$.
The computational complexity of DR 3.3.1 depends on the algorithm for solving the SDP problem (PS), which has $O(n_1^2)$ variables and $O(m_1)$ constraints. In addition, it requires $O(n_2(n_1m_1+n_2m_2)\log(1/\epsilon))$ other operations to get the quality-assured solution with probability $1-\epsilon$.
Lemma 3.3.2 Under the input of DR 3.3.1, we can find $x\in\mathbb{R}^{n_1}$ and $y\in\mathbb{R}^{n_2}$ by a polynomial-time randomized algorithm, satisfying $x^{\mathrm T}Q_ix\le 1$ for all $1\le i\le m_1$ and $y^{\mathrm T}P_jy\le 1$ for all $1\le j\le m_2$, such that
$$x^{\mathrm T}My\ge\frac{1}{\sqrt{n_1}}\,\Omega\left(\frac{1}{\sqrt{\log m_1\log m_2}}\right)M\bullet W.$$
Proof. Following the randomization procedure in Step 3 of DR 3.3.1, by Lemma 3.3.1 we have, for any $1\le i\le m_1$ and $1\le j\le m_2$,
$$\mathrm{E}[\xi^{\mathrm T}Q_i\xi]=\mathrm{tr}(Q_iU)\le 1,$$
$$\mathrm{E}[\eta^{\mathrm T}P_j\eta]=\mathrm{tr}\left(P_jW^{\mathrm T}\left(\sum_{i=1}^{m_1}t_iQ_i\right)W\right)=\sum_{i=1}^{m_1}t_i\,\mathrm{tr}(P_jW^{\mathrm T}Q_iW)\le\sum_{i=1}^{m_1}t_i=n_1.$$
So et al. [109] have established that if ξ is a normal random vector and $Q\succeq 0$, then for any $\alpha>0$,
$$\mathrm{Prob}\{\xi^{\mathrm T}Q\xi\ge\alpha\,\mathrm{E}[\xi^{\mathrm T}Q\xi]\}\le 2e^{-\frac{\alpha}{2}}.$$
Applying this result we have
$$\mathrm{Prob}\{\xi^{\mathrm T}Q_i\xi\ge\alpha_1\}\le\mathrm{Prob}\{\xi^{\mathrm T}Q_i\xi\ge\alpha_1\,\mathrm{E}[\xi^{\mathrm T}Q_i\xi]\}\le 2e^{-\frac{\alpha_1}{2}},$$
$$\mathrm{Prob}\{\eta^{\mathrm T}P_j\eta\ge\alpha_2n_1\}\le\mathrm{Prob}\{\eta^{\mathrm T}P_j\eta\ge\alpha_2\,\mathrm{E}[\eta^{\mathrm T}P_j\eta]\}\le 2e^{-\frac{\alpha_2}{2}}.$$
Moreover, $\mathrm{E}[\xi^{\mathrm T}M\eta]=M\bullet W$. Now let $x=\xi/\sqrt{\alpha_1}$ and $y=\eta/\sqrt{\alpha_2n_1}$, and we have
$$\begin{aligned}&\mathrm{Prob}\left\{x^{\mathrm T}My\ge\frac{M\bullet W}{\sqrt{\alpha_1\alpha_2n_1}},\ x^{\mathrm T}Q_ix\le 1\ \forall\,1\le i\le m_1,\ y^{\mathrm T}P_jy\le 1\ \forall\,1\le j\le m_2\right\}\\ &\ge 1-\mathrm{Prob}\{\xi^{\mathrm T}M\eta<M\bullet W\}-\sum_{i=1}^{m_1}\mathrm{Prob}\{\xi^{\mathrm T}Q_i\xi>\alpha_1\}-\sum_{j=1}^{m_2}\mathrm{Prob}\{\eta^{\mathrm T}P_j\eta>\alpha_2n_1\}\\ &\ge 1-(1-\theta)-m_1\cdot 2e^{-\frac{\alpha_1}{2}}-m_2\cdot 2e^{-\frac{\alpha_2}{2}}=\theta/2,\end{aligned}$$
where $\alpha_1:=2\ln(8m_1/\theta)$ and $\alpha_2:=2\ln(8m_2/\theta)$. Since $\alpha_1\alpha_2=O(\log m_1\log m_2)$, the desired (x,y) can be found with high probability in multiple trials. $\square$
Let us now turn back to (TQ). If we let $W=\hat{W}$ and $M=F(\cdot,\cdot,\hat{z})$ in applying Lemma 3.3.2, then in polynomial-time we can find $(\hat{x},\hat{y})$ satisfying the constraints of (TQ), such that
$$F(\hat{x},\hat{y},\hat{z})=\hat{x}^{\mathrm T}M\hat{y}\ge\frac{1}{\sqrt{n_1}}\,\Omega\left(\frac{1}{\sqrt{\log m_1\log m_2}}\right)M\bullet\hat{W}\ge\frac{1}{\sqrt{n_1}}\,\Omega\left(\frac{1}{\log m}\right)F(\hat{W},\hat{z}).$$
Combined with (3.1), we thus prove the following result.
Theorem 3.3.3 If d = 3, then (TQ) admits a polynomial-time randomized approximation algorithm with approximation ratio $\frac{1}{\sqrt{n_1}}\,\Omega\left(\frac{1}{\log^2m}\right)$, where $m=\max\{m_1,m_2,m_3\}$.
This result can be generalized to the model (TQ) of any fixed degree d.

Theorem 3.3.4 (TQ) admits a polynomial-time randomized approximation algorithm with approximation ratio τ(TQ), where
$$\tau(TQ):=\left(\prod_{k=1}^{d-2}n_k\right)^{-\frac12}\Omega\left(\log^{-(d-1)}m\right),$$
and $m=\max_{1\le k\le d}m_k$.
Proof. We again take recursive steps. Denoting $W=x^1(x^d)^{\mathrm T}$, (TQ) is relaxed to
$$(\overline{TQ})\quad\begin{array}{ll}\max & F(W,x^2,x^3,\cdots,x^{d-1})\\ \mathrm{s.t.} & \mathrm{tr}(Q^1_{i_1}WQ^d_{i_d}W^{\mathrm T})\le 1,\ i_1=1,2,\ldots,m_1,\ i_d=1,2,\ldots,m_d,\\ & (x^k)^{\mathrm T}Q^k_{i_k}x^k\le 1,\ k=2,3,\ldots,d-1,\ i_k=1,2,\ldots,m_k,\\ & W\in\mathbb{R}^{n_1\times n_d},\ x^k\in\mathbb{R}^{n_k},\ k=2,3,\ldots,d-1.\end{array}$$
Notice that $(\overline{TQ})$ is exactly in the form of (TQ) of degree d − 1, by treating W as a vector of dimension $n_1n_d$. By recursion, with high probability we can find a feasible solution $(\hat{W},\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1})$ of $(\overline{TQ})$ in polynomial-time, such that
$$F(\hat{W},\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1})\ge\left(\prod_{k=2}^{d-2}n_k\right)^{-\frac12}\Omega\left(\log^{-(d-2)}\max\{m,m_1m_d\}\right)v(\overline{TQ})\ge\left(\prod_{k=2}^{d-2}n_k\right)^{-\frac12}\Omega\left(\log^{-(d-2)}m\right)v(TQ).$$
Now we fix $(\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1})$, and let $M=F(\cdot,\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1},\cdot)$ and $W=\hat{W}$ in applying Lemma 3.3.2. Then we are able to find $(\hat{x}^1,\hat{x}^d)$ satisfying the constraints of (TQ), such that
$$F(\hat{x}^1,\hat{x}^2,\cdots,\hat{x}^d)\ge\frac{1}{\sqrt{n_1}}\,\Omega\left(\frac{1}{\log m}\right)F(\hat{W},\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1})\ge\tau(TQ)\,v(TQ).\qquad\square$$
Summarizing, the recursive procedure for solving the general (TQ) (Theorem 3.3.4) is highlighted as follows:

Algorithm 3.3.2
• INPUT: a d-th order tensor $F\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_d}$ with $n_1\le n_2\le\cdots\le n_d$; matrices $Q^k_{i_k}\in\mathbb{R}^{n_k\times n_k}$, $Q^k_{i_k}\succeq 0$ and $\sum_{i_k=1}^{m_k}Q^k_{i_k}\succ 0$ for all $1\le k\le d$ and $1\le i_k\le m_k$.
1. Rewrite F as a (d − 1)-th order tensor $F'\in\mathbb{R}^{n_2\times n_3\times\cdots\times n_{d-1}\times n_dn_1}$ by combining its first and last modes into one, and placing it in the last mode of F′, i.e.,
$$F_{i_1,i_2,\cdots,i_d}=F'_{i_2,i_3,\cdots,i_{d-1},(i_1-1)n_d+i_d}\quad\forall\,1\le i_1\le n_1,\ 1\le i_2\le n_2,\ \cdots,\ 1\le i_d\le n_d.$$
2. Compute matrices $P_{i_1,i_d}=Q^1_{i_1}\otimes Q^d_{i_d}$ for all $1\le i_1\le m_1$ and $1\le i_d\le m_d$.
3. For (TQ) with the (d − 1)-th order tensor F′ and matrices $Q^k_{i_k}$ ($2\le k\le d-1$, $1\le i_k\le m_k$) and $P_{i_1,i_d}$ ($1\le i_1\le m_1$, $1\le i_d\le m_d$): if d − 1 = 2, then apply the SDP relaxation and randomization procedure (Nemirovski et al. [87]) to obtain an approximate solution $(\hat{x}^2,\hat{x}^{1,d})$; otherwise obtain a solution $(\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1},\hat{x}^{1,d})$ by recursion.
4. Compute the matrix $M'=F(\cdot,\hat{x}^2,\hat{x}^3,\cdots,\hat{x}^{d-1},\cdot)$ and rewrite the vector $\hat{x}^{1,d}$ as a matrix $\hat{X}\in\mathbb{R}^{n_1\times n_d}$.
5. Apply DR 3.3.1 with input $(Q^1_i,Q^d_j,\hat{X},M')=(Q_i,P_j,W,M)$ for all $1\le i\le m_1$ and $1\le j\le m_d$, and output $(\hat{x}^1,\hat{x}^d)=(x,y)$.
• OUTPUT: a feasible solution $(\hat{x}^1,\hat{x}^2,\cdots,\hat{x}^d)$.
3.4 Applications
As we mentioned at the beginning of this chapter, one of the main reasons to study multilinear form optimization is its strong connection to homogeneous polynomial optimization in deriving approximation bounds, which will be discussed in the next chapter. Apart from that, these models also have versatile applications of their own. In this section we present two problems and show that they are readily formulated by the polynomial optimization models in this chapter.
3.4.1 Singular Values of Trilinear Forms
Trilinear forms play an increasingly important role in many parts of analysis, e.g., in Fourier analysis, where they appear in the guise of paracommutators and compensated quantities (see a survey by Peng and Wong [95]). The problem of singular values of trilinear forms is the following (see also [18]). Denote by $H_1$, $H_2$ and $H_3$ three separable Hilbert spaces over the field $\mathbb{K}$, where $\mathbb{K}$ stands either for the real or the complex numbers, and denote a trilinear form $F:H_1\times H_2\times H_3\mapsto\mathbb{K}$. The spectrum norm of the trilinear form F is then given by the following maximization problem:
$$\begin{array}{lll}\|F\|_S:= & \sup & |F(x,y,z)|\\ & \mathrm{s.t.} & \|x\|\le 1,\ \|y\|\le 1,\ \|z\|\le 1,\\ & & x\in H_1,\ y\in H_2,\ z\in H_3.\end{array}$$
More generally, one can state the problem of the stationary values of the functional $|F(x,y,z)|$ under the same conditions; these stationary values are called the singular values of the trilinear form F. Bernhardsson and Peetre [18] showed, in the binary case, that $\|F\|_S^2$ is among the roots of a certain algebraic equation, called the millennial equation, thought of as a generalization of the time-honored secular equation in the case of matrices. Another approach to singular values is given by De Lathauwer et al. [28].
When the Hilbert spaces are specialized to finite dimensional Euclidean spaces, i.e., $H_i=\mathbb{R}^{n_i}$ for $i=1,2,3$, and the field $\mathbb{K}$ is restricted to the reals, the problem of computing the largest singular value $\|F\|_S$ is equivalent to (TS) with d = 3. This is because one can always replace $(x,y,z)$ by $(-x,y,z)$ if the objective value is negative, so the absolute value sign in $|F(x,y,z)|$ can be omitted; moreover, we can scale the decision variables so that $\|x\|=\|y\|=\|z\|=1$ without decreasing the objective. According to Proposition 3.2.2, the problem of computing $\|F\|_S$ is NP-hard already in this real case. Together with Theorem 3.2.3, the spectrum norm of a trilinear form can be approximated in polynomial-time within a factor of $1/\sqrt{\min\{n_1,n_2,n_3\}}$.
3.4.2 Rank-One Approximation of Tensors
Decompositions of higher order tensors (i.e., tensors of order at least 3) have versatile applications in psychometrics, chemometrics, signal processing, computer vision, numerical analysis, data mining, neuroscience, graph analysis, and elsewhere (see e.g., the excellent survey by Kolda and Bader [68]). The earliest story of tensor decomposition dates back to 1927, when Hitchcock [55, 56] proposed the idea of the polyadic form of a tensor. Today, tensor decomposition is most widely used in the form of the canonical decomposition (CANDECOMP) of Carroll and Chang [23] and the parallel factors (PARAFAC) of Harshman [45], in short the CP decomposition. A CP decomposition decomposes a tensor into a summation of rank-one tensors, i.e., tensors which can be written as outer products of vectors (see e.g., [67]). Specifically, for a d-th order tensor $F=(F_{i_1i_2\cdots i_d})\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_d}$ and a given positive integer r, its CP decomposition is
$$F\approx\sum_{i=1}^rx^1_i\otimes x^2_i\otimes\cdots\otimes x^d_i,$$
where $x^k_i\in\mathbb{R}^{n_k}$ for $i=1,2,\ldots,r$, $k=1,2,\ldots,d$. Exact recovery of a rank-one decomposition is in general impossible, due to various reasons, e.g., data errors. Thus the following optimization problem for the CP decomposition arises naturally, i.e., to minimize the norm of the difference:
$$\begin{array}{ll}\min & \left\|F-\sum_{i=1}^rx^1_i\otimes x^2_i\otimes\cdots\otimes x^d_i\right\|\\ \mathrm{s.t.} & x^k_i\in\mathbb{R}^{n_k},\ i=1,2,\ldots,r,\ k=1,2,\ldots,d.\end{array}$$
In particular, the case r = 1 corresponds to the best rank-one approximation of a tensor, i.e.,
$$(TA)\quad\begin{array}{ll}\min & \|F-x^1\otimes x^2\otimes\cdots\otimes x^d\|\\ \mathrm{s.t.} & x^k\in\mathbb{R}^{n_k},\ k=1,2,\ldots,d.\end{array}$$
By scaling, we may require each $x^k$ to have unit norm; then (TA) is equivalent to
$$\begin{array}{ll}\min & \|F-\lambda\,x^1\otimes x^2\otimes\cdots\otimes x^d\|\\ \mathrm{s.t.} & \lambda\in\mathbb{R},\ x^k\in S^{n_k},\ k=1,2,\ldots,d.\end{array}$$
For any fixed $x^k\in S^{n_k}$ ($k=1,2,\ldots,d$), optimizing the objective function of (TA) with respect to λ gives
$$\begin{aligned}\min_{\lambda\in\mathbb{R}}\|F-\lambda\,x^1\otimes x^2\otimes\cdots\otimes x^d\|&=\min_{\lambda\in\mathbb{R}}\sqrt{\|F\|^2-2\lambda\,F\bullet(x^1\otimes x^2\otimes\cdots\otimes x^d)+\lambda^2\|x^1\otimes x^2\otimes\cdots\otimes x^d\|^2}\\ &=\min_{\lambda\in\mathbb{R}}\sqrt{\|F\|^2-2\lambda\,F(x^1,x^2,\cdots,x^d)+\lambda^2}\\ &=\sqrt{\|F\|^2-\left(F(x^1,x^2,\cdots,x^d)\right)^2},\end{aligned}$$
where the minimum is attained at $\lambda=F(x^1,x^2,\cdots,x^d)$. Thus (TA) is equivalent to
$$\begin{array}{ll}\max & |F(x^1,x^2,\cdots,x^d)|\\ \mathrm{s.t.} & x^k\in S^{n_k},\ k=1,2,\ldots,d,\end{array}$$
which is the same as (TS) discussed in Section 3.2. Similar derivations can also be found in [29, 122, 67].
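A quick numerical check of this reduction (a sketch in numpy, assuming nothing beyond the identity just derived): for unit vectors the best scalar is $\lambda^*=F(x^1,\cdots,x^d)$, and the residual equals $\sqrt{\|F\|^2-F(x^1,\cdots,x^d)^2}$.

```python
import numpy as np

F = np.random.randn(3, 4, 5)
xs = [np.random.randn(n) for n in F.shape]
xs = [v / np.linalg.norm(v) for v in xs]              # unit-norm factors
val = np.einsum('ijk,i,j,k->', F, *xs)                # F(x1, x2, x3)
rank1 = val * np.einsum('i,j,k->ijk', *xs)            # lambda* x1 (x) x2 (x) x3
residual = np.linalg.norm(F - rank1)
print(residual, np.sqrt(np.linalg.norm(F) ** 2 - val ** 2))   # the two agree
```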
3.5 Numerical Experiments
In this section we test the numerical performance of the approximation algorithms proposed in this chapter. As mentioned in Section 2.1.3, all the numerical computations reported in this thesis are performed on an Intel Pentium 4 CPU 2.80GHz computer with 2GB of RAM, and the supporting software is Matlab 7.7.0 (R2008b). The experiments in this section focus on the model (TS) with d = 4, or equivalently, on finding the best rank-one approximation of a fourth order tensor as discussed in Section 3.4.2. Specifically, we test the following problem:
$$(ETS)\quad\begin{array}{ll}\max & F(x,y,z,w)=\sum_{1\le i,j,k,\ell\le n}F_{ijk\ell}\,x_iy_jz_kw_\ell\\ \mathrm{s.t.} & x,y,z,w\in S^n.\end{array}$$
46 3 Multilinear Form Optimization with Quadratic Constraints
3.5.1 Randomly Simulated Data
A fourth order tensor F is generated randomly, with its $n^4$ entries i.i.d. standard normals. Basically, we have a choice to make in the recursion of Algorithm 3.2.3, yielding two different methods, both of which call the deterministic routine DR 3.2.2. Method 1 follows the standard recursive procedure of Algorithm 3.2.3, and its first relaxation problem is
$$\bar{v}_1=\begin{array}[t]{ll}\max & F(Z,w)=\sum_{1\le i,j,k,\ell\le n}F_{ijk\ell}\,Z_{ijk}w_\ell\\ \mathrm{s.t.} & Z\in S^{n\times n\times n},\ w\in S^n.\end{array}$$
After obtaining its optimal solution $(Z^*,w^*)$, we fix $w^*$ and the problem reduces to a trilinear case of (ETS), on which the recursion goes on. The objective value of the approximate solution obtained is denoted by $v_1$, and the ratio $\tau_1:=v_1/\bar{v}_1$ is also computed. Method 2 chooses the other relaxation as its first step, namely
$$\bar{v}_2=\begin{array}[t]{ll}\max & F(X,Z)=\sum_{1\le i,j,k,\ell\le n}F_{ijk\ell}\,X_{ij}Z_{k\ell}\\ \mathrm{s.t.} & X,Z\in S^{n\times n}.\end{array}$$
After obtaining its optimal solution $(X^*,Z^*)$, we first fix $Z^*$ and apply DR 3.2.2 to decompose $X^*$ into $\hat{x},\hat{y}\in S^n$, and then fix $(\hat{x},\hat{y})$ and apply DR 3.2.2 again to decompose $Z^*$ into $\hat{z},\hat{w}\in S^n$, resulting in a feasible solution. We also compute its objective value $v_2$ and the ratio $\tau_2:=v_2/\bar{v}_2$.
According to Theorem 3.2.4, Method 1 enjoys a theoretical worst-case performance ratio of 1/n. Method 2 follows Algorithm 3.2.3 in a similar fashion, only choosing a different recursion; it also enjoys a worst-case ratio of 1/n, which can be proven along the lines of Theorem 3.2.4. From the simulation results in Table 3.1, the objective values of the feasible solutions produced by the two methods are indeed very close. However, Method 2 computes a much better upper bound on v(ETS), and thus ends up with a better approximation ratio. The numerical results in Table 3.1 suggest that the observed performance ratio of Method 1 is about 2/n, while that of Method 2 is about $1/\sqrt{n}$. The main difference between the upper bounds on v(ETS) ($\bar{v}_1$ vs. $\bar{v}_2$) lies in the first relaxation: by Proposition 3.2.1 we may guess that $\bar{v}_1=\Omega(\|F\|/\sqrt{n})$ while $\bar{v}_2=\Omega(\|F\|/n)$, which may account for the large gap between $\bar{v}_1$ and $\bar{v}_2$. Consequently, it is quite possible that the true value of v(ETS) is closer to the solution values ($v_1$ and $v_2$) than to the optimal value of the relaxed problem ($\bar{v}_2$); the real quality of the solutions produced is possibly much better than what is shown by the upper bounds.
Table 3.1: Numerical results (average of 10 instances) of (ETS)

n        2      5      10     20     30      40      50      60      70
v1       2.61   5.64   8.29   9.58   12.55   13.58   15.57   17.65   18.93
v2       2.69   6.57   7.56   10.87  11.74   13.89   14.56   17.10   17.76
v̄1       3.84   12.70  34.81  93.38  169.08  258.94  360.89  472.15  594.13
v̄2       2.91   9.46   20.46  39.40  59.55   79.53   99.61   119.77  140.03
τ1 (%)   67.97  44.41  23.81  10.26  7.42    5.24    4.31    3.74    3.19
τ2 (%)   92.44  69.45  36.95  27.59  19.71   17.47   14.62   14.28   12.68
n·τ1     1.36   2.22   2.38   2.05   2.23    2.10    2.16    2.24    2.23
n·τ2     1.85   3.47   3.69   5.52   5.91    6.99    7.31    8.57    8.88
√n·τ2    1.31   1.55   1.17   1.23   1.08    1.10    1.03    1.11    1.06
Table 3.2: CPU seconds (average of 10 instances) for solving (ETS)

n         5     10    20    30    40    50    60    70    80   90   100   150
Method 1  0.01  0.01  0.02  0.06  0.20  0.45  0.95  1.94  3.04 5.08 8.04  58.4
Method 2  0.01  0.02  1.13  12.6  253   517   2433  9860  ∞    ∞    ∞     ∞
Although Method 2 clearly outperforms Method 1 in terms of the upper bound on v(ETS), it requires much more computational time. The most expensive part of Method 2 is its first relaxation, which computes the largest eigenvalue and a corresponding eigenvector of an $n^2\times n^2$ matrix; in comparison, the corresponding part of Method 1 involves only an $n\times n$ matrix. Evidence in Table 3.2 shows that Method 1 finds a good quality solution very quickly even for large problems. We remark that for n = 100, the size of the input data is already of the magnitude of $10^8$.
3.5.2 Data with Known Optimal Solutions
The upper bounds appear to be quite loose in general, as one may observe from the previous numerical results. To test how good the solutions are without referring to the computed upper bounds, in this subsection we report test results where the problem instances are constructed in such a way that the optimal solutions are known. By this we hope to get some impression, from a different angle, of the quality of the approximate solutions produced by our algorithms.

We first randomly generate a vector $a\in S^n$, and generate m symmetric matrices $A_i\in\mathbb{R}^{n\times n}$ ($1\le i\le m$) with their eigenvalues lying in the interval [−1, 1] and $A_ia=a$ for $i=1,2,\ldots,m$. Then, we randomly generate a vector $b\in S^n$, and m symmetric matrices $B_i\in\mathbb{R}^{n\times n}$ ($1\le i\le m$) with their eigenvalues in the interval [−1, 1] and $B_ib=b$ for $i=1,2,\ldots,m$. Define
$$F(x,y,z,w)=\sum_{i=1}^m\left(x^{\mathrm T}A_iy\cdot z^{\mathrm T}B_iw\right).$$
For this particular multilinear form F(x,y,z,w), it is easy to see that (ETS) has an optimal solution (a,a,b,b) with optimal value m.
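One way to generate such matrices (a sketch; the thesis does not spell out the construction) is to fix the eigenvalue 1 along a and draw the remaining spectrum uniformly from [−1, 1] on the orthogonal complement of a.

```python
import numpy as np

def random_constrained_matrix(a):
    """Random symmetric A with A a = a and remaining eigenvalues in [-1, 1]."""
    n = len(a)
    # Orthonormal basis whose first column spans a: QR of [a | random columns];
    # the trailing columns are an orthonormal basis of a's complement.
    Q = np.linalg.qr(np.column_stack([a, np.random.randn(n, n - 1)]))[0][:, 1:]
    u = np.random.uniform(-1.0, 1.0, n - 1)        # spectrum on the complement
    return np.outer(a, a) + Q @ (u[:, None] * Q.T)

n, m = 50, 5
a = np.random.randn(n); a /= np.linalg.norm(a)
b = np.random.randn(n); b /= np.linalg.norm(b)
A = [random_constrained_matrix(a) for _ in range(m)]
B = [random_constrained_matrix(b) for _ in range(m)]
print(np.allclose(A[0] @ a, a), np.allclose(B[0] @ b, b))         # True True
# With F(x,y,z,w) = sum_i (x'A_i y)(z'B_i w), the value at (a,a,b,b) is m:
print(sum((a @ Ai @ a) * (b @ Bi @ b) for Ai, Bi in zip(A, B)))   # = m
```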
We generate such random instances with n = 50 for various m, and apply Method 1 to solve them. Since the optimal values are known, it is possible to compute the exact performance ratios, i.e., $v_1/m$. For each m, 200 random instances are generated and tested. The results are shown in Table 3.3, and suggest that our algorithm works very well: the observed performance ratios are much better than the theoretical worst-case bounds. Indeed, whenever m ≥ 50 our algorithm always finds an optimal solution.

Table 3.3: Numerical ratios of (ETS) with known optima for n = 50

m                         5    10   20   30   40   50   100  150  200
Minimal ratio (%)         50   66   43   37   37   100  100  100  100
Maximal ratio (%)         100  100  100  100  100  100  100  100  100
Average ratio (%)         97   86   76   87   97   100  100  100  100
Optimality instances (%)  7    10   35   71   94   100  100  100  100
Chapter 4
Homogeneous Form Optimization
with Quadratic Constraints
4.1 Introduction
This chapter studies the optimization of an important class of polynomial functions, namely homogeneous polynomials, or forms. The constraint set is defined by homogeneous quadratic polynomial equalities or inequalities. Specifically, the models include maximizing a homogeneous form over the Euclidean sphere
$$(HS)\quad\begin{array}{ll}\max & f(x)\\ \mathrm{s.t.} & x\in S^n,\end{array}$$
and maximizing a homogeneous form over the intersection of co-centered ellipsoids
$$(HQ)\quad\begin{array}{ll}\max & f(x)\\ \mathrm{s.t.} & x^{\mathrm T}Q_ix\le 1,\ i=1,2,\ldots,m,\\ & x\in\mathbb{R}^n,\end{array}$$
where $Q_i\succeq 0$ for $i=1,2,\ldots,m$, and $\sum_{i=1}^mQ_i\succ 0$.

As a general extension, we also consider the optimization of mixed forms, i.e.,
$$\text{Function }M\quad f(x^1,x^2,\cdots,x^s)=F(\underbrace{x^1,x^1,\cdots,x^1}_{d_1},\underbrace{x^2,x^2,\cdots,x^2}_{d_2},\cdots,\underbrace{x^s,x^s,\cdots,x^s}_{d_s}),$$
where $d=d_1+d_2+\cdots+d_s$ is deemed a fixed constant, and the d-th order tensor $F\in\mathbb{R}^{n_1^{d_1}\times n_2^{d_2}\times\cdots\times n_s^{d_s}}$ has the partial symmetric property.
The mixed form optimization models include
$$(MS)\quad\begin{array}{ll}\max & f(x^1,x^2,\cdots,x^s)\\ \mathrm{s.t.} & x^k\in S^{n_k},\ k=1,2,\ldots,s;\end{array}$$
$$(MQ)\quad\begin{array}{ll}\max & f(x^1,x^2,\cdots,x^s)\\ \mathrm{s.t.} & (x^k)^{\mathrm T}Q^k_{i_k}x^k\le 1,\ k=1,2,\ldots,s,\ i_k=1,2,\ldots,m_k,\\ & x^k\in\mathbb{R}^{n_k},\ k=1,2,\ldots,s,\end{array}$$
where $Q^k_{i_k}\succeq 0$ and $\sum_{i_k=1}^{m_k}Q^k_{i_k}\succ 0$ for $k=1,2,\ldots,s$, $i_k=1,2,\ldots,m_k$.
The model (MS) generalizes (HS) and (TS) of Section 3.2, and (MQ) generalizes (HQ) and (TQ) of Section 3.3. When the degree of the polynomial objective is odd, (HS) is equivalent to its ball-constrained version
$$\begin{array}{ll}\max & f(x)\\ \mathrm{s.t.} & \|x\|\le 1,\ x\in\mathbb{R}^n.\end{array}$$
This is because we can always replace x by −x if the objective value is negative, and we can scale the vector x along its direction to bring it onto $S^n$ without decreasing the objective. Thus, for odd d, (HS) is a special case of (HQ). However, if d is even, the optimal value of (HS) may be negative, while that of (HQ) is always nonnegative since 0 is a feasible solution of (HQ). In the former case, the tensor F is called negative definite, i.e., f(x) < 0 for all $x\ne 0$.
The model (HS) is in general NP-hard. When d = 1, (HS) has a closed-form solution, due to the Cauchy-Schwartz inequality; when d = 2, (HS) amounts to computing the largest eigenvalue of the symmetric matrix F; however, (HS) becomes NP-hard when d = 3, as first proven by Nesterov [90]. Interestingly, when d ≥ 3, the model (HS) can also be regarded as computing the largest eigenvalue of the super-symmetric tensor F, just like the case d = 2 (see e.g., Qi [98]). Luo and Zhang [77] proposed the first polynomial-time randomized approximation algorithm with relative approximation ratio $\Omega(1/n^2)$ for the case d = 4, based on its quadratic SDP relaxation and randomization techniques.
For the model (HQ): when d = 1, it can be formulated as a standard SOCP problem, which is solvable in polynomial-time; when d = 2, it is the well known QCQP problem discussed in Section 2.5, which is NP-hard in general. Nemirovski et al. [87] proposed a polynomial-time randomized approximation algorithm with approximation ratio $\Omega(1/\log m)$ based on SDP relaxation and randomization, and this ratio is actually tight; when d = 4, Luo and Zhang [77] established the relationship between (HQ) and its quadratic SDP relaxation, and proposed a polynomial-time approximation algorithm for the case where the number of constraints is one. Meanwhile, Ling et al. [73] proposed the bi-quadratic optimization model, which is exactly the model (MS) with d = 4 and $d_1=d_2=2$. In particular, they established the equivalence between (MS) and its quadratic SDP relaxation, based on which they proposed a polynomial-time randomized approximation algorithm with relative approximation ratio $\Omega(1/n_2^2)$.
For the model (MS), the computational complexity is similar to that of its special cases (TS) and (HS): it is solvable in polynomial-time when d ≤ 2, and is NP-hard when d ≥ 3, as claimed in Section 4.4.1. Moreover, when d ≥ 4 and all $d_k$ ($1\le k\le s$) are even, there is no polynomial-time approximation algorithm with a positive approximation ratio unless P = NP; this is verified for the simple case d = 4 and $d_1=d_2=2$ by Ling et al. [73]. The complexity of (MQ) is the same as that of (HQ), i.e., solvable in polynomial-time when d = 1 and NP-hard when d ≥ 2. Meanwhile, the special case of (MQ) with d = 4 and $d_1=d_2=2$ is the biquadratic form optimization over quadratic constraints, studied by Zhang et al. [123] and Ling et al. [74]. In their work, the relationship between biquadratic optimization and its bilinear SDP relaxation is established, and some data-dependent approximation bounds are derived.
In this chapter, we present polynomial-time approximation algorithms with guaranteed worst-case performance ratios for the models concerned. Our algorithms work for any fixed degree d, and the approximation ratios improve all the previous works specialized to their particular degrees. The major breakthrough in our work is the multilinear tensor form relaxation, instead of the quadratic SDP relaxation methods of [77, 73]. The relaxed multilinear optimization problems admit the polynomial-time approximation algorithms discussed in Chapter 3. After solving the relaxed problem, we merge the relaxed variables into one feasible solution by means of a linking identity, and argue that the quality ratio deteriorates only by a constant factor; this is the main contribution of this chapter.
The approximation algorithms for (HS) are presented in Section 4.2, followed by those for (HQ) in Section 4.3. The models (MS) and (MQ) are studied in Section 4.4. In Section 4.5, we discuss some applications of the models presented in this chapter. Finally, the numerical performance of the proposed algorithms is reported in Section 4.6.
4.2 Homogeneous Form with Spherical Constraint
The first model in this chapter is to maximize a homogeneous polynomial function of fixed degree d over the sphere, i.e.,
$$(HS)\quad\begin{array}{ll}\max & f(x)\\ \mathrm{s.t.} & x\in S^n.\end{array}$$
Let F be the super-symmetric tensor satisfying $F(\underbrace{x,x,\cdots,x}_{d})=f(x)$. Then (HS) can be relaxed to the multilinear form optimization model (TS) discussed in Chapter 3, as follows:
$$(\overline{HS})\quad\begin{array}{ll}\max & F(x^1,x^2,\cdots,x^d)\\ \mathrm{s.t.} & x^k\in S^n,\ k=1,2,\ldots,d.\end{array}$$
Theorem 3.2.4 asserts that $(\overline{HS})$ can be solved approximately in polynomial-time, with approximation ratio $n^{-\frac{d-2}{2}}$. The key step is to draw a feasible solution of (HS) from the approximate solution of $(\overline{HS})$. For this purpose, we establish the following link between (HS) and $(\overline{HS})$.
Lemma 4.2.1 Suppose $x^1,x^2,\ldots,x^d\in\mathbb{R}^n$, and $\xi_1,\xi_2,\cdots,\xi_d$ are i.i.d. random variables, each taking values 1 and −1 with equal probability 1/2. For any super-symmetric d-th order tensor F and function $f(x)=F(x,x,\cdots,x)$, it holds that
$$\mathrm{E}\left[\prod_{i=1}^d\xi_i\,f\left(\sum_{k=1}^d\xi_kx^k\right)\right]=d!\,F(x^1,x^2,\cdots,x^d).$$
Proof. First we observe that
$$\mathrm{E}\left[\prod_{i=1}^d\xi_i\,f\left(\sum_{k=1}^d\xi_kx^k\right)\right]=\mathrm{E}\left[\prod_{i=1}^d\xi_i\sum_{1\le k_1,k_2,\cdots,k_d\le d}F\left(\xi_{k_1}x^{k_1},\xi_{k_2}x^{k_2},\cdots,\xi_{k_d}x^{k_d}\right)\right]=\sum_{1\le k_1,k_2,\cdots,k_d\le d}\mathrm{E}\left[\prod_{i=1}^d\xi_i\prod_{j=1}^d\xi_{k_j}\right]F\left(x^{k_1},x^{k_2},\cdots,x^{k_d}\right).$$
If $(k_1,k_2,\cdots,k_d)$ is a permutation of $(1,2,\ldots,d)$, then
$$\mathrm{E}\left[\prod_{i=1}^d\xi_i\prod_{j=1}^d\xi_{k_j}\right]=\mathrm{E}\left[\prod_{i=1}^d\xi_i^2\right]=1;$$
otherwise, there must be an index $k_0$ with $1\le k_0\le d$ and $k_0\ne k_j$ for all $1\le j\le d$, in which case
$$\mathrm{E}\left[\prod_{i=1}^d\xi_i\prod_{j=1}^d\xi_{k_j}\right]=\mathrm{E}[\xi_{k_0}]\,\mathrm{E}\left[\prod_{1\le i\le d,\,i\ne k_0}\xi_i\prod_{j=1}^d\xi_{k_j}\right]=0.$$
Since the number of different permutations of $(1,2,\ldots,d)$ is d!, by taking into account the super-symmetric property of the tensor F, the claimed relation follows. $\square$
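Since each $\xi_i$ takes only the values ±1, the expectation in Lemma 4.2.1 is a finite average over the $2^d$ sign patterns, which makes the identity easy to verify numerically. The following sketch (illustrative, with a randomly symmetrized tensor) does so for d = 3.

```python
import math
from itertools import product
import numpy as np

n, d = 4, 3
G = np.random.randn(n, n, n)
# super-symmetrize G by averaging over all 3! index permutations
F = sum(np.transpose(G, p) for p in
        [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]) / 6.0

def f(x):                                     # f(x) = F(x, x, x)
    return np.einsum('ijk,i,j,k->', F, x, x, x)

xs = [np.random.randn(n) for _ in range(d)]
lhs = np.mean([np.prod(s) * f(sum(si * xi for si, xi in zip(s, xs)))
               for s in product([-1, 1], repeat=d)])
rhs = math.factorial(d) * np.einsum('ijk,i,j,k->', F, *xs)
print(lhs, rhs)                               # the two values coincide
```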
When d is odd, the identity in Lemma 4.2.1 can be rewritten as
$$d!\,F(x^1,x^2,\cdots,x^d)=\mathrm{E}\left[\prod_{i=1}^d\xi_i\,f\left(\sum_{k=1}^d\xi_kx^k\right)\right]=\mathrm{E}\left[f\left(\sum_{k=1}^d\left(\prod_{i\ne k}\xi_i\right)x^k\right)\right].$$
Since $\xi_1,\xi_2,\cdots,\xi_d$ are i.i.d. random variables taking values 1 or −1, by randomization we may find a particular binary vector $\beta\in\mathbb{B}^d$, such that
$$f\left(\sum_{k=1}^d\left(\prod_{i\ne k}\beta_i\right)x^k\right)\ge d!\,F(x^1,x^2,\cdots,x^d).\tag{4.1}$$
We remark that d is considered a constant parameter in this thesis; therefore, searching over all the combinations can be done, in principle, in constant time.
Let $\hat{x}=\sum_{k=1}^d\left(\prod_{i\ne k}\beta_i\right)x^k$, and $x=\hat{x}/\|\hat{x}\|$. By the triangle inequality, we have $\|\hat{x}\|\le d$, and thus
$$f(x)\ge d!\,d^{-d}\,F(x^1,x^2,\cdots,x^d).$$
Combining this with Theorem 3.2.4, we have:

Theorem 4.2.2 When $d\ge 3$ is odd, (HS) admits a polynomial-time approximation algorithm with approximation ratio τ(HS), where
$$\tau(HS):=d!\,d^{-d}\,n^{-\frac{d-2}{2}}=\Omega\left(n^{-\frac{d-2}{2}}\right).$$
The algorithm for approximately solving (HS) with odd d is highlighted below.

Algorithm 4.2.1
• INPUT: a d-th order super-symmetric tensor $F\in\mathbb{R}^{n^d}$.
1. Apply Algorithm 3.2.3 to solve the problem
$$\begin{array}{ll}\max & F(x^1,x^2,\cdots,x^d)\\ \mathrm{s.t.} & x^k\in S^n,\ k=1,2,\ldots,d\end{array}$$
approximately, with input F and output $(\hat{x}^1,\hat{x}^2,\cdots,\hat{x}^d)$.
2. Compute $\beta=\arg\max_{\xi\in\mathbb{B}^d}f\left(\sum_{k=1}^d\xi_k\hat{x}^k\right)$, or randomly generate β uniformly on $\mathbb{B}^d$ and repeat if necessary, until $f\left(\sum_{k=1}^d\beta_k\hat{x}^k\right)\ge d!\,F(\hat{x}^1,\hat{x}^2,\cdots,\hat{x}^d)$.
3. Compute $x=\sum_{k=1}^d\beta_k\hat{x}^k\Big/\left\|\sum_{k=1}^d\beta_k\hat{x}^k\right\|$.
• OUTPUT: a feasible solution $x\in S^n$.
We remark that it is unnecessary to enumerate all $2^d$ possible combinations in Step 2 of Algorithm 4.2.1, as (4.1) suggests that a simple randomization process will serve the same purpose, especially when d is large. In the latter case, we end up with a polynomial-time randomized approximation algorithm; otherwise Algorithm 4.2.1 is deterministic, and runs in polynomial-time for fixed d.
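Steps 2–3 admit a direct implementation by enumeration, shown in the sketch below (with f supplied as a callable; names are illustrative):

```python
import numpy as np
from itertools import product

def merge_solution(f, xs):
    """Steps 2-3 of Algorithm 4.2.1 for odd d (sketch): enumerate all sign
    vectors beta in B^d, keep the combination maximizing f, and normalize."""
    combos = (sum(b * xk for b, xk in zip(beta, xs))
              for beta in product([-1, 1], repeat=len(xs)))
    x = max(combos, key=f)
    return x / np.linalg.norm(x)
```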
When d is even, the only easy case of (HS) appears when d = 2, and even worse,
we have the following:
Proposition 4.2.3 If d = 4, then there is no polynomial-time approximation algorithm
with a positive approximation ratio for (HS) unless P = NP .
Proof. Let f(x) = F(x, x, x, x) with F being super-symmetric. We say the quartic form F(x, x, x, x) is positive semidefinite if F(x, x, x, x) ≥ 0 for all x ∈ R^n. It is well known that checking the positive semidefiniteness of F(x, x, x, x) is co-NP-complete. If we were able to find a polynomial-time approximation algorithm with a positive approximation ratio τ ∈ (0, 1] for v* = max_{x∈S^n} −F(x, x, x, x), then this algorithm could be used to check the positive semidefiniteness of F(x, x, x, x). To see why, suppose this algorithm returns a feasible solution x̂ with −F(x̂, x̂, x̂, x̂) > 0; then F(x, x, x, x) is not positive semidefinite. Otherwise the algorithm must return a feasible solution x̂ with 0 ≥ −F(x̂, x̂, x̂, x̂) ≥ τ v*, which implies v* ≤ 0; hence, F(x, x, x, x) is positive semidefinite in this case. Therefore, such an algorithm cannot exist unless P = NP. □
This negative result rules out any polynomial-time approximation algorithm with a positive absolute approximation ratio for (HS) when d ≥ 4 is even. Thus we can only speak of a relative approximation ratio. The following algorithm applies to (HS) when d is even.
Algorithm 4.2.2

• INPUT: a d-th order super-symmetric tensor F ∈ R^{n^d}.

1 Choose any vector x^0 ∈ S^n and define a d-th order super-symmetric tensor H ∈ R^{n^d} with respect to the homogeneous polynomial h(x) = (x^T x)^{d/2}.

2 Apply Algorithm 3.2.3 to solve the problem
    max F(x^1, x^2, ..., x^d) − f(x^0) H(x^1, x^2, ..., x^d)
    s.t. x^k ∈ S^n, k = 1, 2, ..., d
  approximately, with input F − f(x^0) H and output (x̂^1, x̂^2, ..., x̂^d).

3 Compute β = arg max_{ξ∈B^d, ∏_{k=1}^d ξ_k = 1} f(∑_{k=1}^d ξ_k x̂^k / ‖∑_{k=1}^d ξ_k x̂^k‖).

4 Compute x = arg max{ f(x^0), f(∑_{k=1}^d β_k x̂^k / ‖∑_{k=1}^d β_k x̂^k‖) }.

• OUTPUT: a feasible solution x ∈ S^n.
Theorem 4.2.4 When d ≥ 4 is even, (HS) admits a polynomial-time approximation
algorithm with relative approximation ratio τ(HS).
Proof. Denote by H the super-symmetric tensor associated with the homogeneous polynomial h(x) = ‖x‖^d = (x^T x)^{d/2}. Explicitly, if Π denotes the set of all permutations of (1, 2, ..., d), then

H(x^1, x^2, ..., x^d) = (1/|Π|) ∑_{(i_1,i_2,...,i_d)∈Π} ((x^{i_1})^T x^{i_2}) ((x^{i_3})^T x^{i_4}) ... ((x^{i_{d−1}})^T x^{i_d}).

For any x^k ∈ S^n (k = 1, 2, ..., d), we have |H(x^1, x^2, ..., x^d)| ≤ 1 by applying the Cauchy-Schwarz inequality termwise.
Pick any fixed x^0 ∈ S^n, and consider the following problem:

(H̃S) max F(x^1, x^2, ..., x^d) − f(x^0) H(x^1, x^2, ..., x^d)
     s.t. x^k ∈ S^n, k = 1, 2, ..., d.

Applying Theorem 3.2.4 we obtain a solution (x̂^1, x̂^2, ..., x̂^d) in polynomial-time, with

F(x̂^1, x̂^2, ..., x̂^d) − f(x^0) H(x̂^1, x̂^2, ..., x̂^d) ≥ τ(TS) v(H̃S),

where τ(TS) := n^{−(d−2)/2}. Let us first work on the case that

f(x^0) − v̲(HS) ≤ (τ(TS)/4) (v(HS) − v̲(HS)),   (4.2)

where v̲(HS) denotes the minimum value of (HS).
Since |H(x̂^1, x̂^2, ..., x̂^d)| ≤ 1, we have

F(x̂^1, ..., x̂^d) − v̲(HS) H(x̂^1, ..., x̂^d)
= F(x̂^1, ..., x̂^d) − f(x^0) H(x̂^1, ..., x̂^d) + (f(x^0) − v̲(HS)) H(x̂^1, ..., x̂^d)
≥ τ(TS) v(H̃S) − (f(x^0) − v̲(HS))
≥ τ(TS) (v(HS) − f(x^0)) − (τ(TS)/4) (v(HS) − v̲(HS))
≥ (τ(TS)(1 − τ(TS)/4) − τ(TS)/4) (v(HS) − v̲(HS))
≥ (τ(TS)/2) (v(HS) − v̲(HS)),

where the second inequality is due to the fact that the optimal solution of (HS) is feasible for (H̃S), and hence v(H̃S) ≥ v(HS) − f(x^0).
On the other hand, let ξ_1, ξ_2, ..., ξ_d be i.i.d. random variables, each taking values 1 and −1 with equal probability 1/2. By symmetry, we have Prob{∏_{i=1}^d ξ_i = 1} = Prob{∏_{i=1}^d ξ_i = −1} = 1/2. Applying Lemma 4.2.1 we know

d! (F(x̂^1, ..., x̂^d) − v̲(HS) H(x̂^1, ..., x̂^d))
= E[∏_{i=1}^d ξ_i (f(∑_{k=1}^d ξ_k x̂^k) − v̲(HS) h(∑_{k=1}^d ξ_k x̂^k))]
= E[f(∑_{k=1}^d ξ_k x̂^k) − v̲(HS) ‖∑_{k=1}^d ξ_k x̂^k‖^d | ∏_{i=1}^d ξ_i = 1] Prob{∏_{i=1}^d ξ_i = 1}
  − E[f(∑_{k=1}^d ξ_k x̂^k) − v̲(HS) ‖∑_{k=1}^d ξ_k x̂^k‖^d | ∏_{i=1}^d ξ_i = −1] Prob{∏_{i=1}^d ξ_i = −1}
≤ (1/2) E[f(∑_{k=1}^d ξ_k x̂^k) − v̲(HS) ‖∑_{k=1}^d ξ_k x̂^k‖^d | ∏_{i=1}^d ξ_i = 1],

where the last inequality is due to the fact that

f(∑_{k=1}^d ξ_k x̂^k) − v̲(HS) ‖∑_{k=1}^d ξ_k x̂^k‖^d ≥ 0,

since ∑_{k=1}^d ξ_k x̂^k / ‖∑_{k=1}^d ξ_k x̂^k‖ ∈ S^n. Thus by randomization, we can find β ∈ B^d with ∏_{i=1}^d β_i = 1, such that

(1/2) (f(∑_{k=1}^d β_k x̂^k) − v̲(HS) ‖∑_{k=1}^d β_k x̂^k‖^d) ≥ d! (τ(TS)/2) (v(HS) − v̲(HS)).

By letting x = ∑_{k=1}^d β_k x̂^k / ‖∑_{k=1}^d β_k x̂^k‖, and noticing ‖∑_{k=1}^d β_k x̂^k‖ ≤ d, we have

f(x) − v̲(HS) ≥ d! τ(TS) (v(HS) − v̲(HS)) / ‖∑_{k=1}^d β_k x̂^k‖^d ≥ τ(HS) (v(HS) − v̲(HS)).
Recall that the above inequality is derived under the condition that (4.2) holds. In case (4.2) does not hold, then

f(x^0) − v̲(HS) > (τ(TS)/4) (v(HS) − v̲(HS)) ≥ τ(HS) (v(HS) − v̲(HS)).   (4.3)

By picking x = arg max{f(x), f(x^0)} as in Step 4 of Algorithm 4.2.2, regardless of whether (4.2) or (4.3) holds, we uniformly have f(x) − v̲(HS) ≥ τ(HS) (v(HS) − v̲(HS)). □
4.3 Homogeneous Form with Ellipsoidal Constraints
We proceed to a further generalization of the optimization models, to include general ellipsoidal constraints:

(HQ) max f(x)
     s.t. x^T Q_i x ≤ 1, i = 1, 2, ..., m,
          x ∈ R^n,

where f(x) is a homogeneous polynomial of degree d, Q_i ⪰ 0 for i = 1, 2, ..., m, and ∑_{i=1}^m Q_i ≻ 0.
If we relax (HQ) to a multilinear form optimization problem like (TQ), we have

(H̄Q) max F(x^1, x^2, ..., x^d)
     s.t. (x^k)^T Q_i x^k ≤ 1, k = 1, 2, ..., d, i = 1, 2, ..., m,
          x^k ∈ R^n, k = 1, 2, ..., d.

Theorem 3.3.4 asserts an approximate solution for (H̄Q); together with Lemma 4.2.1, we propose the following algorithm for approximately solving (HQ), no matter whether d is odd or even.
Algorithm 4.3.1

• INPUT: a d-th order super-symmetric tensor F ∈ R^{n^d}, matrices Q_i ∈ R^{n×n} with Q_i ⪰ 0 for all 1 ≤ i ≤ m and ∑_{i=1}^m Q_i ≻ 0.

1 Apply Algorithm 3.3.2 to solve the problem
    max F(x^1, x^2, ..., x^d)
    s.t. (x^k)^T Q_i x^k ≤ 1, k = 1, 2, ..., d, i = 1, 2, ..., m,
         x^k ∈ R^n, k = 1, 2, ..., d
  approximately, and get a feasible solution (x̂^1, x̂^2, ..., x̂^d).

2 Compute x = arg max{ f((1/d) ∑_{k=1}^d ξ_k x̂^k) : ξ ∈ B^d }.

• OUTPUT: a feasible solution x ∈ R^n.
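A sketch of Step 2, including the 1/d scaling that the feasibility bound (4.5) below relies on, might look as follows; the evaluator f_eval and the block solution xhats are assumed to come from elsewhere, and the function name is illustrative.

```python
import itertools
import numpy as np

def round_with_scaling(f_eval, xhats, Qs):
    # Step 2 of Algorithm 4.3.1: pick the sign vector xi in B^d that
    # maximizes f((1/d) * sum_k xi_k xhat^k).  The scaling by 1/d is what
    # the bound (4.5) uses to keep the combination inside every ellipsoid.
    d = len(xhats)
    best_val, best_x = -np.inf, None
    for xi in itertools.product([-1.0, 1.0], repeat=d):
        x = sum(s * v for s, v in zip(xi, xhats)) / d
        val = f_eval(x)
        if val > best_val:
            best_val, best_x = val, x
    for Q in Qs:  # sanity check of feasibility, cf. (4.5)
        assert best_x @ Q @ best_x <= 1 + 1e-9
    return best_x
```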
Although Algorithm 4.3.1 applies to the model (HQ) for both odd and even d, the approximation results differ, as the following theorems claim.
Theorem 4.3.1 When d ≥ 3 is odd, (HQ) admits a polynomial-time randomized ap-
proximation algorithm with approximation ratio τ(HQ), where
τ(HQ) := d! d^{−d} n^{−(d−2)/2} Ω(log^{−(d−1)} m) = Ω(n^{−(d−2)/2} log^{−(d−1)} m).
Proof. According to Theorem 3.3.4 we can find a feasible solution (x̂^1, x̂^2, ..., x̂^d) of (H̄Q) in polynomial-time, such that

F(x̂^1, x̂^2, ..., x̂^d) ≥ τ(H̄Q) v(H̄Q) ≥ τ(H̄Q) v(HQ),   (4.4)

where τ(H̄Q) := n^{−(d−2)/2} Ω(log^{−(d−1)} m). By (4.1), we can find a binary vector β ∈ B^d in polynomial-time, such that

f(∑_{i=1}^d β_i x̂^i) ≥ d! F(x̂^1, x̂^2, ..., x̂^d).
Notice that for any 1 ≤ k ≤ m,

(∑_{i=1}^d β_i x̂^i)^T Q_k (∑_{j=1}^d β_j x̂^j) = ∑_{i,j=1}^d β_i (x̂^i)^T Q_k β_j x̂^j
= ∑_{i,j=1}^d (β_i Q_k^{1/2} x̂^i)^T (β_j Q_k^{1/2} x̂^j)
≤ ∑_{i,j=1}^d ‖β_i Q_k^{1/2} x̂^i‖ ‖β_j Q_k^{1/2} x̂^j‖
= ∑_{i,j=1}^d √((x̂^i)^T Q_k x̂^i) √((x̂^j)^T Q_k x̂^j) ≤ ∑_{i,j=1}^d 1 · 1 = d².   (4.5)
If we denote x = (1/d) ∑_{i=1}^d β_i x̂^i, then x is a feasible solution for (HQ), satisfying

f(x) ≥ d^{−d} d! F(x̂^1, x̂^2, ..., x̂^d) ≥ d^{−d} d! τ(H̄Q) v(HQ) = τ(HQ) v(HQ). □
Theorem 4.3.2 When d ≥ 4 is even, (HQ) admits a polynomial-time randomized
approximation algorithm with relative approximation ratio τ(HQ).
Proof. First, we observe that v(H̄Q) ≥ v(HQ), and that v(H̄Q) = −v̲(H̄Q) ≥ −v̲(HQ), since negating one variable x^k in (H̄Q) flips the sign of the objective. Therefore, 2 v(H̄Q) ≥ v(HQ) − v̲(HQ). Let (x̂^1, x̂^2, ..., x̂^d) be the feasible solution for (H̄Q) as in the proof of Theorem 4.3.1, satisfying (4.4). According to (4.5), it follows that x = arg max{ f((1/d) ∑_{k=1}^d ξ_k x̂^k) : ξ ∈ B^d } is feasible for (HQ), where ξ_1, ξ_2, ..., ξ_d are i.i.d. random variables, each taking values 1 and −1 with equal probability 1/2. Therefore, by Lemma 4.2.1 we have

2 d! F(x̂^1, x̂^2, ..., x̂^d) = 2 E[∏_{i=1}^d ξ_i f(∑_{k=1}^d ξ_k x̂^k)]
= E[f(∑_{k=1}^d ξ_k x̂^k) − d^d v̲(HQ) | ∏_{i=1}^d ξ_i = 1] − E[f(∑_{k=1}^d ξ_k x̂^k) − d^d v̲(HQ) | ∏_{i=1}^d ξ_i = −1]
≤ E[f(∑_{k=1}^d ξ_k x̂^k) − d^d v̲(HQ) | ∏_{i=1}^d ξ_i = 1]
≤ d^d f(x) − d^d v̲(HQ),

where the first inequality holds because f(∑_{k=1}^d ξ_k x̂^k) − d^d v̲(HQ) = d^d (f((1/d) ∑_{k=1}^d ξ_k x̂^k) − v̲(HQ)) ≥ 0 by the feasibility of (1/d) ∑_{k=1}^d ξ_k x̂^k. According to (4.4), this implies that

f(x) − v̲(HQ) ≥ 2 d^{−d} d! F(x̂^1, x̂^2, ..., x̂^d) ≥ 2 τ(HQ) v(H̄Q) ≥ τ(HQ) (v(HQ) − v̲(HQ)). □
4.4 Mixed Form with Quadratic Constraints
In this section, we further extend polynomial optimization models to the mixed forms.
Specifically, we study the following two models:
(MS) max f(x^1, x^2, ..., x^s)
     s.t. x^k ∈ S^{n_k}, k = 1, 2, ..., s;

(MQ) max f(x^1, x^2, ..., x^s)
     s.t. (x^k)^T Q^k_{i_k} x^k ≤ 1, k = 1, 2, ..., s, i_k = 1, 2, ..., m_k,
          x^k ∈ R^{n_k}, k = 1, 2, ..., s,

where Q^k_{i_k} ⪰ 0 and ∑_{i_k=1}^{m_k} Q^k_{i_k} ≻ 0 for k = 1, 2, ..., s, i_k = 1, 2, ..., m_k. Here we assume that n_1 ≤ n_2 ≤ ... ≤ n_s.
Here f(x^1, x^2, ..., x^s) is a mixed form whose degree in the block variable x^k ∈ R^{n_k} is d_k, with total degree d = d_1 + d_2 + ... + d_s. Both (TS) in Section 3.2 and (HS) in Section 4.2 are special cases of (MS), and both (TQ) in Section 3.3 and (HQ) in Section 4.3 are special cases of (MQ). In particular, the bi-quadratic optimization model discussed in Ling et al. [73] is the special case of (MS) with d = 4 and d_1 = d_2 = 2.
4.4.1 Mixed Form with Spherical Constraints
Let us study the optimization model (MS). First, we have the following hardness result.

Proposition 4.4.1 If d = 3, then (MS) is NP-hard.

Proof. We need to verify the NP-hardness in the three possible cases of d = 3: (i) d_1 = 3; (ii) d_1 = 2 and d_2 = 1; and (iii) d_1 = d_2 = d_3 = 1. The case d_1 = 3 is exactly (HS) with d = 3, whose NP-hardness was proven by Nesterov [90], and the case d_1 = d_2 = d_3 = 1 is exactly (TS) with d = 3, whose NP-hardness is proven in Proposition 3.2.2.
When d_1 = 2 and d_2 = 1, consider the special case n_1 = n_2 = n with F ∈ R^{n^3} satisfying F_{ijk} = F_{jik} for all 1 ≤ i, j, k ≤ n; the proof of Proposition 3.2.2 shows that the following form of (TS) is NP-hard:

(TS) max F(x, y, z)
     s.t. x, y, z ∈ S^n.

We are going to show that the optimal value of (TS) is equal to the optimal value of its special case

(MS) max F(x, x, z)
     s.t. x, z ∈ S^n.
It is obvious that v(TS) ≥ v(MS). Now choose any optimal solution (x*, y*, z*) of (TS) and compute the matrix M = F(·, ·, z*). Since M is symmetric, we can compute in polynomial-time an eigenvector x̂ corresponding to the eigenvalue λ of the largest absolute value (which is also the largest singular value). Observe that

|F(x̂, x̂, z*)| = |x̂^T M x̂| = |λ| = max_{x,y∈S^n} x^T M y = max_{x,y∈S^n} F(x, y, z*) = F(x*, y*, z*) = v(TS),

which implies that either (x̂, x̂, z*) or (x̂, x̂, −z*) is an optimal solution of (TS). Therefore v(TS) ≤ v(MS), and this claims v(TS) = v(MS). If (MS) could be solved in polynomial-time, then its optimal solution would also be an optimal solution of (TS); i.e., (TS) would be solvable in polynomial-time, leading to a contradiction. □
Now, we focus on polynomial-time approximation algorithms as before. Similar to the relaxation in Section 4.2 for handling homogeneous polynomial optimization, if we relax (MS) to its multilinear form relaxation (M̄S), which is of the type (TS), then by Theorem 3.2.4 we are able to find (x̂^1, x̂^2, ..., x̂^d) with ‖x̂^k‖ = 1 for all 1 ≤ k ≤ d in polynomial-time, such that

F(x̂^1, x̂^2, ..., x̂^d) ≥ τ(M̄S) v(MS),   (4.6)

where

τ(M̄S) := (∏_{k=1}^{s−1} n_k^{d_k} / n_{s−1})^{−1/2}   if d_s = 1;
τ(M̄S) := (∏_{k=1}^{s} n_k^{d_k} / n_s²)^{−1/2}   if d_s ≥ 2.
In order to draw a feasible solution of (MS) from (x̂^1, x̂^2, ..., x̂^d), we need to apply the link identity in Lemma 4.2.1 more carefully. Approximation results comparable to those for (HS) can be derived similarly.
Theorem 4.4.2 If d ≥ 3 and one of the d_k (k = 1, 2, ..., s) is odd, then (MS) admits a polynomial-time approximation algorithm with approximation ratio τ(MS), where

τ(MS) := τ(M̄S) ∏_{1≤k≤s, d_k≥3} d_k! d_k^{−d_k} = Ω(τ(M̄S))
= ∏_{1≤k≤s, d_k≥3} d_k! d_k^{−d_k} (∏_{k=1}^{s−1} n_k^{d_k} / n_{s−1})^{−1/2}   if d_s = 1;
= ∏_{1≤k≤s, d_k≥3} d_k! d_k^{−d_k} (∏_{k=1}^{s} n_k^{d_k} / n_s²)^{−1/2}   if d_s ≥ 2.
To avoid messy notation and to better convey the main ideas of the algorithm and the proof, here we only consider a special case of (MS), whose treatment is easily extended to the general (MS):

(EMS) max F(x, x, x, x, y, y, z, z, z)
      s.t. x ∈ S^{n_1}, y ∈ S^{n_2}, z ∈ S^{n_3}.

By (4.6), we are able to find x̂^1, x̂^2, x̂^3, x̂^4 ∈ S^{n_1}, ŷ^1, ŷ^2 ∈ S^{n_2}, and ẑ^1, ẑ^2, ẑ^3 ∈ S^{n_3} in polynomial-time, such that

F(x̂^1, x̂^2, x̂^3, x̂^4, ŷ^1, ŷ^2, ẑ^1, ẑ^2, ẑ^3) ≥ τ(M̄S) v(EMS).
Let us first fix (ŷ^1, ŷ^2, ẑ^1, ẑ^2, ẑ^3) and work on the problem

max F(x, x, x, x, ŷ^1, ŷ^2, ẑ^1, ẑ^2, ẑ^3)
s.t. x ∈ S^{n_1}.

Using the same argument as in the proof of Theorem 4.2.2, we are able to find x̄ ∈ S^{n_1}, such that either F(x̄, x̄, x̄, x̄, ŷ^1, ŷ^2, ẑ^1, ẑ^2, ẑ^3) or F(x̄, x̄, x̄, x̄, ŷ^1, ŷ^2, −ẑ^1, ẑ^2, ẑ^3) is no less than 4! 4^{−4} F(x̂^1, x̂^2, x̂^3, x̂^4, ŷ^1, ŷ^2, ẑ^1, ẑ^2, ẑ^3); in the latter case, we use −ẑ^1 to update ẑ^1. Here the even degree (d_1 = 4) of x causes no trouble, as we can always move the negative sign into ẑ^1. We call this process an adjustment of the variable x. Up to this point the approximation bound is 4! 4^{−4} τ(M̄S).
Next we work on the adjustment of the variable y, and consider the problem

max |F(x̄, x̄, x̄, x̄, y, y, ẑ^1, ẑ^2, ẑ^3)|
s.t. y ∈ S^{n_2}.

This is the problem of the largest absolute eigenvalue of a symmetric matrix, and can be solved in polynomial-time; a sketch is given below. Denote its optimal solution by ȳ, update ẑ^1 with −ẑ^1 if necessary, and we keep the approximation bound 4! 4^{−4} τ(M̄S) for the solution (x̄, x̄, x̄, x̄, ȳ, ȳ, ẑ^1, ẑ^2, ẑ^3).
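The adjustment of y amounts to a symmetric eigenvalue computation. A minimal sketch, assuming the matrix M = F(x̄, x̄, x̄, x̄, ·, ·, ẑ^1, ẑ^2, ẑ^3) has already been formed by contracting the fixed blocks:

```python
import numpy as np

def adjust_quadratic_block(M):
    # Given the symmetric matrix M obtained by fixing all other blocks,
    # max_{y in S^n} |y^T M y| is attained at a unit eigenvector of the
    # eigenvalue with the largest absolute value (which also equals the
    # largest singular value of M).
    M = (M + M.T) / 2            # enforce exact symmetry numerically
    w, V = np.linalg.eigh(M)
    k = int(np.argmax(np.abs(w)))
    return V[:, k], float(w[k])  # ybar and the signed value ybar^T M ybar
```

If the returned value is negative, the sign is moved into ẑ^1 exactly as in the adjustment of x.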
The last adjustment, of the variable z, is straightforward. Similar to the adjustment of the variable x, by focusing on

max F(x̄, x̄, x̄, x̄, ȳ, ȳ, z, z, z)
s.t. z ∈ S^{n_3},

we can find z̄ ∈ S^{n_3} in polynomial-time, such that the solution (x̄, x̄, x̄, x̄, ȳ, ȳ, z̄, z̄, z̄) admits an approximation bound 3! 3^{−3} 4! 4^{−4} τ(M̄S).

We remark here that the variable z is adjusted last, since we cannot move the negative sign to other, already adjusted variables if the degree of z is even. That is why Theorem 4.4.2 requires one of the d_k to be odd: we can always let a variable with an odd degree be adjusted last.
However, if all the d_k (k = 1, 2, ..., s) are even, we can only hope for a relative approximation ratio. Indeed, in the simplest such case, d = 4 with d_1 = d_2 = 2, the bi-quadratic optimization model max_{x∈S^{n_1}, y∈S^{n_2}} F(x, x, y, y) does not admit any polynomial-time approximation algorithm with a positive approximation ratio, by Ling et al. [73]. Before working on this even case, let us first introduce the following extension of Lemma 4.2.1.
Lemma 4.4.3 Suppose x^k ∈ R^{n_1} (1 ≤ k ≤ d_1), x^k ∈ R^{n_2} (d_1 + 1 ≤ k ≤ d_1 + d_2), ..., x^k ∈ R^{n_s} (d_1 + d_2 + ... + d_{s−1} + 1 ≤ k ≤ d_1 + d_2 + ... + d_s = d), and ξ_1, ξ_2, ..., ξ_d are i.i.d. random variables, each taking values 1 and −1 with equal probability 1/2. Denote

x^1_ξ = ∑_{k=1}^{d_1} ξ_k x^k,  x^2_ξ = ∑_{k=d_1+1}^{d_1+d_2} ξ_k x^k,  ...,  x^s_ξ = ∑_{k=d_1+d_2+...+d_{s−1}+1}^{d} ξ_k x^k.   (4.7)

For any partial-symmetric d-th order tensor F ∈ R^{n_1^{d_1} × n_2^{d_2} × ... × n_s^{d_s}} and function

f(x^1, x^2, ..., x^s) = F(x^1, ..., x^1, x^2, ..., x^2, ..., x^s, ..., x^s)   (with d_k copies of x^k for k = 1, 2, ..., s),

it holds that

E[∏_{i=1}^d ξ_i f(x^1_ξ, x^2_ξ, ..., x^s_ξ)] = ∏_{k=1}^s d_k! · F(x^1, x^2, ..., x^d).

This lemma is easy to prove by applying Lemma 4.2.1 s times, once for each block of variables.
Theorem 4.4.4 If d ≥ 4 and all the d_k (k = 1, 2, ..., s) are even, then (MS) admits a polynomial-time approximation algorithm with relative approximation ratio τ(MS), where

τ(MS) := τ(M̄S) ∏_{k=1}^s d_k! d_k^{−d_k} = Ω(τ(M̄S))
= (∏_{k=1}^s d_k! d_k^{−d_k}) (∏_{k=1}^{s−1} n_k^{d_k} / n_{s−1})^{−1/2}   if d_s = 1;
= (∏_{k=1}^s d_k! d_k^{−d_k}) (∏_{k=1}^{s} n_k^{d_k} / n_s²)^{−1/2}   if d_s ≥ 2.
Proof. Denote by H ∈ R^{n_1^{d_1} × n_2^{d_2} × ... × n_s^{d_s}} the partial-symmetric d-th order tensor associated with the mixed form

h(x^1, x^2, ..., x^s) = ∏_{k=1}^s ‖x^k‖^{d_k} = ∏_{k=1}^s ((x^k)^T x^k)^{d_k/2}.

Choose any fixed x̄^k ∈ S^{n_k} for k = 1, 2, ..., s, and consider the following problem:

(M̃S) max F(x^1, x^2, ..., x^d) − f(x̄^1, x̄^2, ..., x̄^s) H(x^1, x^2, ..., x^d)
     s.t. ‖x^k‖ = 1, k = 1, 2, ..., d.

Applying Theorem 3.2.4 we obtain a solution (x̂^1, x̂^2, ..., x̂^d) with ‖x̂^k‖ = 1 for k = 1, 2, ..., d in polynomial-time, such that

F(x̂^1, ..., x̂^d) − f(x̄^1, x̄^2, ..., x̄^s) H(x̂^1, ..., x̂^d) ≥ τ(M̄S) v(M̃S).
Let us start with the case that

f(x̄^1, x̄^2, ..., x̄^s) − v̲(MS) ≤ (τ(M̄S)/4) (v(MS) − v̲(MS)).   (4.8)

Noticing that |H(x̂^1, ..., x̂^d)| ≤ 1, we have

F(x̂^1, ..., x̂^d) − v̲(MS) H(x̂^1, ..., x̂^d)
= F(x̂^1, ..., x̂^d) − f(x̄^1, ..., x̄^s) H(x̂^1, ..., x̂^d) + (f(x̄^1, ..., x̄^s) − v̲(MS)) H(x̂^1, ..., x̂^d)
≥ τ(M̄S) v(M̃S) − (f(x̄^1, ..., x̄^s) − v̲(MS))
≥ τ(M̄S) (v(MS) − f(x̄^1, ..., x̄^s)) − (τ(M̄S)/4) (v(MS) − v̲(MS))
≥ (τ(M̄S)(1 − τ(M̄S)/4) − τ(M̄S)/4) (v(MS) − v̲(MS))
≥ (τ(M̄S)/2) (v(MS) − v̲(MS)),

where the second inequality is due to the fact that the optimal solution of (MS) is feasible for (M̃S).
On the other hand, using the notation of (4.7) and applying Lemma 4.4.3, we have

∏_{k=1}^s d_k! (F(x̂^1, ..., x̂^d) − v̲(MS) H(x̂^1, ..., x̂^d))
= E[∏_{i=1}^d ξ_i (f(x̂^1_ξ, x̂^2_ξ, ..., x̂^s_ξ) − v̲(MS) h(x̂^1_ξ, x̂^2_ξ, ..., x̂^s_ξ))]
= E[f(x̂^1_ξ, ..., x̂^s_ξ) − v̲(MS) ∏_{k=1}^s ‖x̂^k_ξ‖^{d_k} | ∏_{i=1}^d ξ_i = 1] Prob{∏_{i=1}^d ξ_i = 1}
  − E[f(x̂^1_ξ, ..., x̂^s_ξ) − v̲(MS) ∏_{k=1}^s ‖x̂^k_ξ‖^{d_k} | ∏_{i=1}^d ξ_i = −1] Prob{∏_{i=1}^d ξ_i = −1}
≤ (1/2) E[f(x̂^1_ξ, ..., x̂^s_ξ) − v̲(MS) ∏_{k=1}^s ‖x̂^k_ξ‖^{d_k} | ∏_{i=1}^d ξ_i = 1],

where the last inequality is due to f(x̂^1_ξ, ..., x̂^s_ξ) − v̲(MS) ∏_{k=1}^s ‖x̂^k_ξ‖^{d_k} ≥ 0, since (x̂^1_ξ/‖x̂^1_ξ‖, x̂^2_ξ/‖x̂^2_ξ‖, ..., x̂^s_ξ/‖x̂^s_ξ‖) is feasible for (MS).
Thus, there is a binary vector β ∈ B^d with ∏_{i=1}^d β_i = 1, such that

(1/2) (f(x̂^1_β, x̂^2_β, ..., x̂^s_β) − v̲(MS) ∏_{k=1}^s ‖x̂^k_β‖^{d_k}) ≥ ∏_{k=1}^s d_k! (τ(M̄S)/2) (v(MS) − v̲(MS)).

By letting x̃^k = x̂^k_β / ‖x̂^k_β‖ for k = 1, 2, ..., s, and noticing ‖x̂^k_β‖ ≤ d_k, we have

f(x̃^1, x̃^2, ..., x̃^s) − v̲(MS) ≥ τ(M̄S) ∏_{k=1}^s d_k! ‖x̂^k_β‖^{−d_k} (v(MS) − v̲(MS)) ≥ τ(MS) (v(MS) − v̲(MS)).

Recall that the above inequality is derived under the condition that (4.8) holds. In case (4.8) does not hold, then we have

f(x̄^1, x̄^2, ..., x̄^s) − v̲(MS) > (τ(M̄S)/4) (v(MS) − v̲(MS)) ≥ τ(MS) (v(MS) − v̲(MS)).

By picking (x^1, x^2, ..., x^s) = arg max{ f(x̃^1, x̃^2, ..., x̃^s), f(x̄^1, x̄^2, ..., x̄^s) }, we uniformly have f(x^1, x^2, ..., x^s) − v̲(MS) ≥ τ(MS) (v(MS) − v̲(MS)). □
4.4.2 Mixed Form with Ellipsoidal Constraints
Finally, let us discuss the most general model (MQ) of homogeneous polynomial optimization with quadratic constraints. We have results similar to those for (MS) in Section 4.4.1.
Theorem 4.4.5 If d ≥ 3 and one of dk (k = 1, 2, . . . , s) is odd, then (MQ) admits a
polynomial-time randomized approximation algorithm with approximation ratio τ(MQ),
where
τ(MQ) := τ(M̄S) Ω(log^{−(d−1)} m) ∏_{k=1}^s d_k! d_k^{−d_k} = Ω(τ(M̄S) log^{−(d−1)} m)
= (∏_{k=1}^s d_k! d_k^{−d_k}) (∏_{k=1}^{s−1} n_k^{d_k} / n_{s−1})^{−1/2} Ω(log^{−(d−1)} m)   if d_s = 1;
= (∏_{k=1}^s d_k! d_k^{−d_k}) (∏_{k=1}^{s} n_k^{d_k} / n_s²)^{−1/2} Ω(log^{−(d−1)} m)   if d_s ≥ 2,

and m = max_{1≤k≤s} m_k.
The proof of Theorem 4.4.5 is very similar to that of Theorem 4.4.2, where a typical example is illustrated; here we only highlight the main ideas. First we relax (MQ) to a multilinear form optimization of the type (TQ), and find a feasible solution for it with approximation ratio τ(M̄S) Ω(log^{−(d−1)} m). Then, Lemma 4.4.3 serves as a bridge from that solution to a feasible solution for (MQ). Specifically, we may adjust the blocks of the solution one by one; during each adjustment, we apply Lemma 4.2.1 once, with the approximation ratio deteriorating by no more than a factor of d_k! d_k^{−d_k}. After s adjustments, we are able to get a feasible solution for (MQ) with performance ratio τ(MQ). Besides, the feasibility of the solution so obtained is guaranteed by (4.5).
Theorem 4.4.6 If d ≥ 4 and all dk (k = 1, 2, . . . , s) are even, then (MQ) admits a
polynomial-time randomized approximation algorithm with relative approximation ratio
τ(MQ).
Proof. The proof is analogous to that of Theorem 4.3.2. The main differences are: (i) we use Lemma 4.4.3 instead of invoking Lemma 4.2.1 directly; and (ii) we use f((1/d_1) x^1_ξ, (1/d_2) x^2_ξ, ..., (1/d_s) x^s_ξ) instead of f((1/d) ∑_{k=1}^d ξ_k x̂^k) during the randomization process. □
4.5 Applications
To better appreciate the homogeneous polynomial optimization models presented in this chapter, in this section we present a few examples arising from various applications. In particular, we shall discuss applications of the models (HS) and (MS).
4.5.1 Eigenvalues and Approximation of Tensors
Similar to the eigenvalues of matrices, the concept of eigenvalues has been extended to higher order tensors (see e.g., [98, 99, 100, 91]); in fact, the concept becomes richer for tensors than for matrices. Qi [98] proposed several definitions of tensor eigenvalues, among which the most popular and straightforward one is the Z-eigenvalue. For a given d-th order super-symmetric tensor F ∈ R^{n^d}, a Z-eigenvalue λ ∈ R with its corresponding eigenvector x ∈ R^n is defined as a solution of the following system:

F(x, x, ..., x, ·) = λx   (with d − 1 copies of x),
x^T x = 1.
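In computational terms, a Z-eigenpair can be verified directly from the defining system. A minimal sketch, assuming F is stored as a super-symmetric numpy array:

```python
import numpy as np

def tensor_apply(F, x):
    # The vector F(x, ..., x, .), contracting d-1 of the d modes with x;
    # for a super-symmetric F it does not matter which modes are contracted.
    T = F
    for _ in range(F.ndim - 1):
        T = T @ x
    return T

def is_z_eigenpair(F, lam, x, tol=1e-8):
    # Check the defining system F(x, ..., x, .) = lam * x and x^T x = 1.
    return (np.linalg.norm(tensor_apply(F, x) - lam * x) <= tol
            and abs(x @ x - 1.0) <= tol)
```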
Notice that the Z-eigenvalues are the usual eigenvalues of a symmetric matrix when the order of the tensor is 2. It was proven in Qi [98] that Z-eigenvalues exist for an even order real super-symmetric tensor F, and that F is positive definite if and only if all of its Z-eigenvalues are positive, in analogy with the matrix case. Thus, the smallest Z-eigenvalue of an even order super-symmetric tensor F is an important indicator of the positive definiteness of F. Conversely, the largest Z-eigenvalue can serve as an indicator of negative definiteness, and computing it is exactly the model (HS). In general, the optimal value and any optimal solution of (HS) are the largest Z-eigenvalue and a corresponding eigenvector of the tensor F, no matter whether d is even or odd. By Theorem 4.2.2, the largest Z-eigenvalue of an odd order super-symmetric tensor F can be approximated within a factor of d! d^{−d} n^{−(d−2)/2}. For an even order tensor, this approximation ratio is in the relative sense; however, if we know in advance that the given even order tensor is positive semidefinite, we also have an approximation factor of d! d^{−d} n^{−(d−2)/2} for its largest Z-eigenvalue.
Regarding tensor approximation, in Section 3.4.2 we discussed the best rank-one decomposition of a tensor. In case the given tensor F ∈ R^{n^d} is super-symmetric, the corresponding best rank-one approximation problem is

min ‖F − x ⊗ x ⊗ ... ⊗ x‖   (with d copies of x)
s.t. x ∈ R^n.

Applying the same technique discussed in Section 3.4.2, we can equivalently reformulate the above problem as

max F(x, x, ..., x)
s.t. x ∈ S^n,

which is identical to the largest Z-eigenvalue problem and to (HS). In fact, when d is odd, if we denote its optimal solution (the largest Z-eigenvector) by x̂ and its optimal value (the largest Z-eigenvalue) by λ = F(x̂, x̂, ..., x̂), then the best rank-one approximation of the super-symmetric tensor F is λ x̂ ⊗ x̂ ⊗ ... ⊗ x̂ (with d copies of x̂).
4.5.2 Density Approximation in Quantum Physics
An interesting problem in physics is to give a precise characterization of entanglement in a quantum system. Entanglement describes the types of correlations between subsystems of the full quantum system that go beyond the statistical correlations found in a classical composite system. Specifically, it gives rise to a matrix approximation problem; the following formulation is proposed in Dahl et al. [27].
Denote by ∆^n_+ the set of all n×n positive semidefinite matrices with trace 1, i.e., ∆^n_+ := {A ∈ R^{n×n} | A ⪰ 0, tr(A) = 1} (sometimes also called the matrix simplex). Using the matrix decomposition method (see e.g., Sturm and Zhang [113]), it is not hard to verify that the extreme points of ∆^n_+ are exactly the rank-one matrices; specifically, ∆^n_+ = conv{xx^T | x ∈ S^n}. If n = n_1 n_2, where n_1 and n_2 are two given positive integers, then we call a matrix A ∈ ∆^n_+ separable if A can be written as a convex combination

A = ∑_{i=1}^m λ_i B_i ⊗ C_i

for some positive integer m, matrices B_i ∈ ∆^{n_1}_+ and C_i ∈ ∆^{n_2}_+ for i = 1, 2, ..., m, and nonnegative scalars λ_i (i = 1, 2, ..., m) with ∑_{i=1}^m λ_i = 1. For given n_1 and n_2, denote by ∆^{n,⊗}_+ the set of all separable matrices of order n = n_1 n_2. The density approximation problem is the following: given a density matrix A ∈ ∆^n_+, find a separable density matrix X ∈ ∆^{n,⊗}_+ which is closest to A, i.e., solve the minimization model

(DA) min ‖X − A‖
     s.t. X ∈ ∆^{n,⊗}_+.

This projection problem is in general NP-hard, the main difficulty lying in the structure of ∆^{n,⊗}_+. An important property of ∆^{n,⊗}_+ is that all its extreme points are symmetric rank-one matrices (x⊗y)(x⊗y)^T with x ∈ S^{n_1} and y ∈ S^{n_2} (see the proof of Theorem 2.2 in [27]), i.e.,

∆^{n,⊗}_+ = conv{ (x⊗y)(x⊗y)^T | x ∈ S^{n_1}, y ∈ S^{n_2} }.
Instead, we may then turn to the projection subproblem of (DA), namely to find the projection of A onto the extreme points of ∆^{n,⊗}_+:

min ‖(x⊗y)(x⊗y)^T − A‖
s.t. x ∈ S^{n_1}, y ∈ S^{n_2}.

Straightforward computation shows that

‖(x⊗y)(x⊗y)^T − A‖² = 1 − 2 A • (x⊗y)(x⊗y)^T + ‖A‖².

Therefore the projection subproblem is equivalent to

max A • (x⊗y)(x⊗y)^T
s.t. x ∈ S^{n_1}, y ∈ S^{n_2},

which is exactly the model (MS) with d = 4 and d_1 = d_2 = 2.
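The identity above follows from ‖(x⊗y)(x⊗y)^T‖ = 1 and tr(A) = 1, and is elementary to verify numerically. A small sketch (illustrative data only):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 3, 4

x = rng.standard_normal(n1); x /= np.linalg.norm(x)   # x in S^{n1}
y = rng.standard_normal(n2); y /= np.linalg.norm(y)   # y in S^{n2}

B = rng.standard_normal((n1 * n2, n1 * n2))
A = B @ B.T
A /= np.trace(A)            # a density matrix: A >= 0, tr(A) = 1

v = np.kron(x, y)           # the unit vector x (tensor) y
X = np.outer(v, v)          # extreme point (x(x)y)(x(x)y)^T of the separable set

lhs = np.linalg.norm(X - A, 'fro') ** 2
rhs = 1 - 2 * np.sum(A * X) + np.linalg.norm(A, 'fro') ** 2
print(lhs, rhs)             # the two values agree
```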
4.6 Numerical Experiments
In this section we present the numerical performance of the approximation algorithms proposed in this chapter. In particular, the model (HQ) with d = 4 is tested, i.e.,

(EHQ) max f(x) = ∑_{1≤i,j,k,ℓ≤n} F_{ijkℓ} x_i x_j x_k x_ℓ
      s.t. x^T Q_i x ≤ 1, i = 1, 2, ..., m,
           x ∈ R^n,
Table 4.1: Numerical results of (EHQ) for n = 10 and m = 30

Instance        1      2      3      4      5      6      7      8      9      10
100·v           0.65   0.77   0.32   0.27   0.73   0.42   0.52   0.64   0.98   1.04
100·v̄          4.96   4.53   4.75   5.05   5.86   5.32   5.00   5.19   5.07   5.92
τ (%)          13.10  17.00   6.74   5.35  12.46   7.89  10.40  12.33  19.33  17.57
n ln³m · τ     51.56  66.88  26.51  21.04  49.01  31.06  40.92  48.52  76.05  69.12
where the fourth order tensor F is super-symmetric and the matrices Q_i are positive semidefinite for i = 1, 2, ..., m. In the tests, cvx v1.2 (Grant and Boyd [41]) is called for solving the SDP problems whenever applicable.
4.6.1 Randomly Simulated Data
For the data of (EHQ), a fourth order tensor F′ is randomly generated, whose n⁴ entries follow i.i.d. standard normal distributions. We then symmetrize F′ to form a super-symmetric tensor F by averaging the related entries. As to the constraints, we generate m matrices Q′_i ∈ R^{n×n} (i = 1, 2, ..., m) independently, whose entries also follow i.i.d. standard normal distributions, and then let Q_i = (Q′_i)^T Q′_i for i = 1, 2, ..., m.
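The instance generation just described can be sketched as follows (an illustrative reconstruction, not the original test code):

```python
import itertools, math
import numpy as np

rng = np.random.default_rng(2)
n, m, d = 10, 30, 4

# Random fourth order tensor, symmetrized by averaging over all index
# permutations to make it super-symmetric.
Fp = rng.standard_normal((n,) * d)
F = sum(np.transpose(Fp, p) for p in itertools.permutations(range(d)))
F /= math.factorial(d)

# Constraint matrices Q_i = (Q'_i)^T Q'_i, positive semidefinite by construction.
Qs = []
for _ in range(m):
    Qp = rng.standard_normal((n, n))
    Qs.append(Qp.T @ Qp)
```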
Given the particular structure of (EHQ), rather than directly applying Algorithm 4.3.1 to solve it, we use a simplified method. First (EHQ) is relaxed to

max F(X, X) = ∑_{1≤i,j,k,ℓ≤n} F_{ijkℓ} X_{ij} X_{kℓ}
s.t. tr(Q_i X Q_j X^T) ≤ 1, i = 1, 2, ..., m, j = 1, 2, ..., m,
     X ∈ R^{n×n},

which is a standard quadratic program, and can be solved approximately by SDP relaxation and randomization (see e.g., [75] or Section 2.5). The optimal value of the SDP relaxation problem is denoted by v̄, which we use as an upper bound of v(EHQ). We then apply DR 3.3.1 to decompose this approximate solution into x̂, ŷ ∈ R^n. Finally we pick the vector with the best objective value of f from {0, x̂, ŷ, (x̂ + ŷ)/2, (x̂ − ŷ)/2} as the output. This objective value is denoted by v, and the ratio τ := v/v̄ is also computed.
By following essentially the same proof, this simplified method also enjoys a worst-case relative performance ratio of Ω(1/(n log³ m)), similar to what Theorem 4.3.2 asserts. For n = 10 and m = 30, we randomly generate 10 instances of (EHQ); the solution results are shown in Table 4.1. In Table 4.2, the absolute approximation ratios for various n and m are shown. Remark that the dimensions of the problems that can be efficiently solved using our algorithms are not large, due to the limitation of solving large SDP relaxation problems.

Table 4.2: Numerical ratios (average of 10 instances) of (EHQ)

n                2      5      8     10     12
τ (%), m = 1    90.2   57.9   73.3   66.2   60.0
τ (%), m = 5    65.6   28.3   22.5   29.1   17.1
τ (%), m = 10   60.4   22.3   14.6   16.0    8.9
τ (%), m = 30   59.4   17.8   10.2   12.2    9.2
4.6.2 Comparison with Sum of Squares Method
In this subsection, we compare our solution method with the so-called sum of squares (SOS) method [70, 71] for solving (EHQ). Due to the limitations of current SDP solvers, our method works only for small size problems. Since the SOS approach works quite efficiently for small size polynomial optimization problems, it is interesting to know how the SOS method would perform on these randomly generated instances of (EHQ). In particular, we use GloptiPoly 3 of Henrion et al. [53].
We randomly generated 10 instances of (EHQ). By using the first SDP relaxation (Lasserre's procedure [70]), GloptiPoly 3 found global optimal solutions for 4 instances, and obtained upper bounds on the optimal values for the other 6 instances. In the latter case, however, no feasible solutions are generated, while our algorithm always finds feasible solutions with a guaranteed approximation ratio; in this sense the two approaches are complementary to each other. Moreover, GloptiPoly 3 always yields a better upper bound than v̄ for our test instances, which helps to yield better approximation ratios. The average ratio is 0.112 using the upper bound v̄, and 0.262 using the upper bound produced by GloptiPoly 3 (see Table 4.3).
Table 4.3: Numerical results of (EHQ) compared with SOS for n = 12 and m = 30

Instance               1      2      3      4      5      6      7      8      9      10
100·v                  0.30   0.76   0.43   0.76   0.70   0.49   0.81   0.34   0.29   0.62
100·v̄                 4.75   4.47   5.21   5.20   4.59   4.81   5.23   5.12   5.89   4.78
100·v_SOS              2.05   2.02   2.43   2.41   1.86   2.02   1.99   2.24   2.83   1.88
Optimality of v_SOS    No     No     Yes    Yes    No     Yes    No     Yes    No     No
v/v̄ (%)               6.32  17.00   8.25  14.62  15.25  10.19  15.49   6.64   4.92  12.97
v/v_SOS (%)           14.63  37.62  17.70  31.54  37.63  24.26  40.70  15.18  10.25  32.98

To conclude this section as well as this chapter, we remark that the proposed algorithms are actually practical, and they produce high quality solutions. The worst-case performance analysis offers a theoretical 'safety net', which is usually far from the typical performance. Moreover, it is of course possible to further improve the solutions by some local search procedure, e.g., the projection gradient method [22] or the maximum block improvement method [25].
Chapter 5
Polynomial Optimization with
Convex Constraints
5.1 Introduction
This chapter tackles an important and useful extension of the models studied in previous chapters: allowing the objective function to be a generic inhomogeneous polynomial function. As is evident, many important applications of polynomial optimization involve an objective that is intrinsically inhomogeneous. Specifically, we consider the following problems:
(PS) max p(x)
     s.t. x ∈ S̄^n;

(PQ) max p(x)
     s.t. x^T Q_i x ≤ 1, i = 1, 2, ..., m,
          x ∈ R^n,

where S̄^n := {x ∈ R^n | ‖x‖ ≤ 1} denotes the unit Euclidean ball, Q_i ⪰ 0 for i = 1, 2, ..., m, and ∑_{i=1}^m Q_i ≻ 0. It is obvious that (PQ) is an extension of (PS). We also consider in this chapter a much more general framework of polynomial optimization over a general convex compact set; i.e., for a given convex compact set G ⊂ R^n, the problem

(PG) max p(x)
     s.t. x ∈ G.
The model (PS) can be solved in polynomial-time when d ≤ 2, and becomes NP-hard when d ≥ 3. Even worse, for d ≥ 3 there is no polynomial-time approximation algorithm with a positive approximation ratio unless P = NP, as we shall argue later. Therefore, this whole chapter focuses on relative approximation algorithms. The inapproximability of (PS) differs greatly from that of the homogeneous model (HS) discussed in Section 4.2, since when d is odd, (HS) admits a polynomial-time approximation algorithm with a positive approximation ratio by Theorem 4.2.2. Consequently, the optimization of an inhomogeneous polynomial is much harder than that of a homogeneous one. The complexity of (PQ) and (PG) is similar: they are solvable in polynomial-time only when d = 1 and are NP-hard when d ≥ 2. This is because (PG) generalizes (PQ), and (PQ) generalizes (HQ), an NP-hard problem when d ≥ 2 (see the discussion in Section 4.1).
Extending the solution methods and the corresponding analysis from homogeneous polynomial optimization to general inhomogeneous polynomials is not straightforward. As a matter of fact, so far all the successful approximation algorithms with provable approximation ratios in the literature, e.g., the quadratic models considered in [88, 87, 120, 75, 50] and the quartic models considered in [73, 77], depend on homogeneity in a crucial way. Technically, a homogeneous polynomial allows one to scale the overall function value along a given direction, which is an essential operation in proving the quality bound of approximation algorithms. The current chapter breaks its path from the preceding practices by directly dealing with a homogenizing variable. Although homogenization is a natural way to deal with inhomogeneous polynomial functions, it is quite a different matter when it comes to worst-case performance ratio analysis. In fact, the usual homogenization does not lead to any assured performance ratio. In this chapter we point out a specific route around this difficulty, which in fact provides a general scheme to approximately solve such problems via homogenization.
In Section 5.2, we start by analyzing the model where the constraint set is the Euclidean ball, i.e., the model (PS). We propose polynomial-time approximation algorithms with guaranteed relative approximation ratios, which serve as a basis for the subsequent analysis. In Section 5.3, the discussion is extended to the problem where the constraint set is the intersection of a finite number of co-centered ellipsoids, i.e., the model (PQ), and relative approximation algorithms are proposed as well. In Section 5.4, approximation bounds are derived even for some very general optimization models (PG), e.g., the optimization of a polynomial over a polytope. It turns out that for such general problems it is still possible to derive relative approximation ratios, which depend on the problem dimensions only; the tool we use is the Löwner-John ellipsoid. In Section 5.5, we discuss some applications of the models presented in this chapter. Finally, we report our numerical experiment results in Section 5.6. As this chapter is concerned with relative approximation ratios, we may without loss of generality assume the polynomial function p(x) to have no constant term, i.e., p(0) = 0.
5.2 Polynomial with Ball Constraint
Our first model in this chapter is to maximize a generic multivariate polynomial function subject to the Euclidean ball constraint, i.e.,

(PS) max p(x)
     s.t. x ∈ S̄^n.

Since we assume p(x) to have no constant term, and 0 is a feasible solution, the optimal value of this problem is obviously nonnegative, i.e., v(PS) ≥ 0.
The complexity of solving (PS) can be summarized by the following proposition.

Proposition 5.2.1 If d ≤ 2, then (PS) can be solved in polynomial-time; otherwise, if d ≥ 3, then (PS) is NP-hard, and there is no polynomial-time approximation algorithm with a positive approximation ratio unless P = NP.

Proof. For d ≤ 2, (PS) is a standard trust region subproblem, which is well known to be solvable in polynomial-time (see e.g., [113, 114] and the references therein). For d ≥ 3, in the special case where p(x) is a homogeneous cubic form, it is easy to see that (PS) is equivalent to max_{x∈S^n} p(x), which is shown to be NP-hard by Nesterov [90].
Let us now consider a special class of (PS) when d = 3:

v(α) = max f(x) − α‖x‖²
       s.t. x ∈ S̄^n,

where α ≥ 0, and f(x) is a homogeneous cubic form associated with a nonzero super-symmetric tensor F ∈ R^{n×n×n}. If v(α) > 0, then its optimal solution x* satisfies

f(x*) − α‖x*‖² = ‖x*‖³ f(x*/‖x*‖) − α‖x*‖² = ‖x*‖² (‖x*‖ f(x*/‖x*‖) − α) > 0.

Thus by the optimality of x*, we have ‖x*‖ = 1. If we choose α = ‖F‖ ≥ max_{x∈S^n} f(x), then v(α) = 0; since otherwise we would have v(α) > 0 and ‖x*‖ = 1, with

v(α) = f(x*) − α‖x*‖² ≤ max_{x∈S^n} f(x) − α ≤ 0,

which is a contradiction. Moreover, v(0) > 0 simply because F is a nonzero tensor, and it is also easy to see that v(α) is non-increasing as α ≥ 0 increases. Hence, there is a threshold α₀ ∈ [0, ‖F‖], such that v(α) > 0 if 0 ≤ α < α₀, and v(α) = 0 if α ≥ α₀.
Suppose there exists a polynomial-time approximation algorithm with a positive approximation ratio τ for (PS) when d ≥ 3. Then for every α ≥ 0, we can find z ∈ S̄^n in polynomial-time, such that g(α) := f(z) − α‖z‖² ≥ τ v(α). It is obvious that g(α) ≥ 0 since v(α) ≥ 0. Together with the fact that g(α) ≤ v(α), we have that g(α) > 0 if and only if v(α) > 0, and g(α) = 0 if and only if v(α) = 0. Therefore, the threshold α₀ also satisfies g(α) > 0 if 0 ≤ α < α₀, and g(α) = 0 if α ≥ α₀. By applying bisection search over the interval [0, ‖F‖] with this polynomial-time approximation algorithm, we can find α₀ and z ∈ S^n in polynomial-time, such that f(z) − α₀‖z‖² = 0. This implies that z is an optimal solution of the problem max_{x∈S^n} f(x) with optimal value α₀, an NP-hard problem as mentioned at the beginning of the proof. Therefore, such an approximation algorithm cannot exist unless P = NP. □
The negative result in Proposition 5.2.1 rules out any polynomial-time approximation algorithm with a positive approximation ratio for (PS). However, a positive relative approximation ratio is still possible, and this is the main subject of this section. Below we shall first present a polynomial-time algorithm for approximately solving (PS), which admits a (relative) worst-case performance ratio. In fact, we present here a general scheme aimed at solving the polynomial optimization (PS). This scheme breaks down into the following four major steps:

1. Introduce an equivalent model with the objective being a homogeneous form;
2. Solve a relaxed model with the objective being a multilinear form;
3. Adjust to get a solution based on the solution of the relaxed model;
4. Assemble a solution for the original inhomogeneous model.

Some of these steps can be designed separately. The algorithm below is one realization of the general scheme for solving (PS), with each step carried out by a specific procedure. We first present the specialized algorithm, and then, in the remainder of the section, we elaborate on these four general steps, and prove that in combination they lead to a polynomial-time approximation algorithm with a quality-assured solution.
Algorithm 5.2.1

• INPUT: an n-dimensional d-th degree polynomial function p(x).

1 Rewrite p(x) − p(0) = F(x̄, x̄, ..., x̄) (d copies of x̄ = (x; x_h)) when x_h = 1 as in (5.2), with F being an (n+1)-dimensional d-th order super-symmetric tensor.

2 Apply Algorithm 3.2.3 to solve the problem
    max F(x̄^1, x̄^2, ..., x̄^d)
    s.t. x̄^k ∈ S^{n+1}, k = 1, 2, ..., d
  approximately, with input F and output (ȳ^1, ȳ^2, ..., ȳ^d), where ȳ^k = (y^k; y^k_h).

3 Compute (z̄^1, z̄^2, ..., z̄^d) = arg max{ F((ξ_1 y^1/d; 1), (ξ_2 y^2/d; 1), ..., (ξ_d y^d/d; 1)) : ξ ∈ B^d }, where (v; 1) denotes the (n+1)-dimensional vector whose first n components form v and whose last component is 1.

4 Compute z = arg max{ p(0); p(z(β)/z_h(β)), β ∈ B^d and β_1 = ∏_{k=2}^d β_k = 1 }, with z̄(β) := β_1(d+1)z̄^1 + ∑_{k=2}^d β_k z̄^k = (z(β); z_h(β)).

• OUTPUT: a feasible solution z ∈ S̄^n.
In Step 2 of Algorithm 5.2.1, Algorithm 3.2.3, which is a deterministic polynomial-time algorithm, is called to approximately solve the spherically constrained multilinear form optimization problem. Notice that the degree of the polynomial p(x) is deemed a fixed parameter in this thesis; thus Algorithm 5.2.1 runs in polynomial-time, and is deterministic too. Our main result in this section is the following:
Theorem 5.2.2 (PS) admits a polynomial-time approximation algorithm with relative
approximation ratio τ(PS), where
τ(PS) := 2^{−5d/2} (d+1)! d^{−2d} (n+1)^{−(d−2)/2} = Ω(n^{−(d−2)/2}).
Although homogenization is a natural way to deal with inhomogeneous polynomials, the worst-case performance ratio does not follow straightforwardly. What is lacking is that an inhomogeneous polynomial does not allow one to scale the overall function value along a given direction, which is, however, an essential operation in proving the quality bound of approximation algorithms (see e.g., [87, 75, 50, 77]). Below we study in detail how a particular implementation of the four steps of the scheme (which becomes Algorithm 5.2.1) leads to the promised worst-case relative performance ratio in Theorem 5.2.2. As we shall see later, our solution scheme can be applied to solve the very general polynomial optimization model (PG).
5.2.1 Homogenization
The method of homogenization depends on the form of the polynomial p(x). In this discussion we assume p(x) to have no constant term, although Algorithm 5.2.1 applies to any polynomial. If p(x) is given as a summation of homogeneous polynomial functions of different degrees, i.e., p(x) = ∑_{k=1}^d f_k(x) where f_k(x) (1 ≤ k ≤ d) is a homogeneous polynomial function of degree k, then we may first write

f_k(x) = F_k(x, x, ..., x)   (with k copies of x)   (5.1)

with F_k being a k-th order super-symmetric tensor. Then by introducing a homogenizing variable x_h, which is always set to 1, we may rewrite p(x) as

p(x) = ∑_{k=1}^d f_k(x) = ∑_{k=1}^d f_k(x) x_h^{d−k} = ∑_{k=1}^d F_k(x, x, ..., x) x_h^{d−k}
     = F((x; x_h), (x; x_h), ..., (x; x_h))   (with d copies)
     = F(x̄, x̄, ..., x̄) = f(x̄),   (5.2)

where F is an (n+1)-dimensional d-th order super-symmetric tensor, whose last component (the entry with all indices equal to n+1) is 0, since p(x) has no constant term.
If the polynomial p(x) is given as a summation of monomials, we should first group the monomials according to their degrees, and then rewrite the summation in each group as a homogeneous polynomial function. After that, we proceed according to (5.1) and (5.2) to obtain the tensor form F, as required.
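Numerically, the homogenization (5.2) can be realized without forming F explicitly, since f(x; x_h) = x_h^d p(x/x_h) whenever x_h ≠ 0. A small sketch with a hypothetical example polynomial:

```python
import numpy as np

d = 3  # the degree of p

def p(x):
    # A hypothetical example polynomial with no constant term:
    # p(x) = 2*x1^3 + x1*x2 - 3*x2.
    return 2 * x[0] ** 3 + x[0] * x[1] - 3 * x[1]

def f(xbar):
    # Homogenization of p: f(x; x_h) = x_h^d * p(x / x_h), valid for x_h != 0.
    # Setting x_h = 1 recovers p, matching (5.2).
    x, xh = xbar[:-1], xbar[-1]
    return xh ** d * p(x / xh)

x = np.array([0.3, -0.5])
xbar = np.append(x, 1.0)
print(p(x), f(xbar))                  # equal: f(x; 1) = p(x)
print(f(2 * xbar), 2 ** d * f(xbar))  # equal: f is homogeneous of degree d
```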
Finally in this step, we may equivalently reformulate (PS) as

(P̄S) max f(x̄)
     s.t. x̄ = (x; x_h),
          x ∈ S̄^n, x_h = 1.

Obviously, we have v(P̄S) = v(PS) and v̲(P̄S) = v̲(PS).
5.2.2 Multilinear Form Relaxation
Multilinear form relaxation has proven to be effective, as discussed in Chapter 4; specifically, Lemma 4.2.1 and Lemma 4.4.3 are the key link formulae. Now we relax (P̄S) to an inhomogeneous multilinear form optimization problem:

(TPS) max F(x̄^1, x̄^2, ..., x̄^d)
      s.t. x̄^k = (x^k; x^k_h), k = 1, 2, ..., d,
           x^k ∈ S̄^n, x^k_h = 1, k = 1, 2, ..., d.

Obviously, we have v(TPS) ≥ v(P̄S) = v(PS). Before proceeding, let us first settle the computational complexity issue of solving (TPS).
Proposition 5.2.3 (TPS) is NP-hard whenever d ≥ 3.

Proof. Notice that in Proposition 3.2.2, we proved that the following problem is NP-hard:

max F(x, y, z)
s.t. x, y, z ∈ S^n.

If d = 3, then in the special case where F has the form F_{n+1,j,k} = F_{i,n+1,k} = F_{i,j,n+1} = 0 for all 1 ≤ i, j, k ≤ n+1, (TPS) is equivalent to the above model, and thus is NP-hard. □
(TPS) is still difficult to solve, and moreover it remains inhomogeneous, since x^k_h is required to be 1. To the best of our knowledge, no polynomial-time approximation algorithm is available in the literature to solve this problem. We shall further relax the constraints x^k_h = 1, and introduce the following parameterized and homogenized problem:

(T̄PS(t)) max F(x̄^1, x̄^2, ..., x̄^d)
         s.t. ‖x̄^k‖ ≤ t, x̄^k ∈ R^{n+1}, k = 1, 2, ..., d.

Obviously, (TPS) can be relaxed to (T̄PS(√2)), since if x̄ is feasible for (TPS) then ‖x̄‖² = ‖x‖² + x_h² ≤ 1 + 1 = 2. Consequently, v(T̄PS(√2)) ≥ v(TPS).
Both the objective and the constraints are now homogeneous, and it is easy to see that for all t > 0, the problems (T̄PS(t)) are equivalent to one another by a simple scaling. Moreover, (T̄PS(1)) is equivalent to

max F(x̄^1, x̄^2, ..., x̄^d)
s.t. x̄^k ∈ S^{n+1}, k = 1, 2, ..., d,

which is in the form of (TS) discussed in Section 3.2. By using Algorithm 3.2.3 and applying Theorem 3.2.4, (T̄PS(1)) admits a polynomial-time approximation algorithm with approximation ratio (n+1)^{−(d−2)/2}. Therefore, for all t > 0, (T̄PS(t)) also admits a polynomial-time approximation algorithm with approximation ratio (n+1)^{−(d−2)/2}, and v(T̄PS(t)) = t^d v(T̄PS(1)). After this relaxation step (Step 2 in Algorithm 5.2.1), we are able to find a feasible solution (ȳ^1, ȳ^2, ..., ȳ^d) of (T̄PS(1)) in polynomial-time, such that

F(ȳ^1, ȳ^2, ..., ȳ^d) ≥ (n+1)^{−(d−2)/2} v(T̄PS(1))
                     = 2^{−d/2} (n+1)^{−(d−2)/2} v(T̄PS(√2))
                     ≥ 2^{−d/2} (n+1)^{−(d−2)/2} v(TPS).   (5.3)
Algorithm 3.2.3 is the engine that enables the second step of our scheme. In fact, any polynomial-time approximation algorithm for (T̄PS(1)) can be used as an engine to yield a realization (algorithm) of our scheme. As will become evident later, any improvement of the approximation ratio of (T̄PS(1)) leads to an improvement of the relative approximation ratio in Theorem 5.2.2. For example, recently So [108] improved the approximation bound of (T̄PS(1)) to Ω((log n / n)^{(d−2)/2}) (though the algorithm is mainly of theoretical interest), and consequently the relative approximation ratio under our scheme improves to Ω((log n / n)^{(d−2)/2}) too. Of course, one may apply any other favorite algorithm to solve the relaxation (T̄PS(1)). For instance, the alternating least squares (ALS) algorithm (see e.g., [68] and the references therein) and the maximum block improvement (MBI) method of Chen et al. [25] can be other alternatives for the second step.
5.2.3 Homogenizing Components Adjustment
The approximate solution (ȳ^1, ȳ^2, ..., ȳ^d) of (T̄PS(1)) satisfies ‖ȳ^k‖ ≤ 1 for all 1 ≤ k ≤ d, which implies ‖y^k‖ ≤ 1; but in general we have no control on the size of y^k_h, and thus (ȳ^1, ȳ^2, ..., ȳ^d) may not be a feasible solution for (TPS). The following lemma plays a linking role in our analysis, ensuring that the construction of a feasible solution for the inhomogeneous model (TPS) is possible.
Lemma 5.2.4 Suppose x̄^k = (x^k; x^k_h) ∈ R^{n+1} with |x^k_h| ≤ 1 for all 1 ≤ k ≤ d. Let η_1, η_2, ..., η_d be independent random variables, each taking values 1 and −1 with E[η_k] = x^k_h for all 1 ≤ k ≤ d, and let ξ_1, ξ_2, ..., ξ_d be i.i.d. random variables, each taking values 1 and −1 with equal probability 1/2. If the last component of the tensor F is 0, then

E[∏_{k=1}^d η_k F((η_1 x^1; 1), (η_2 x^2; 1), ..., (η_d x^d; 1))] = F(x̄^1, x̄^2, ..., x̄^d),   (5.4)

and

E[F((ξ_1 x^1; 1), (ξ_2 x^2; 1), ..., (ξ_d x^d; 1))] = 0.   (5.5)
Proof. The claimed equations readily result from the following observations:

E[∏_{k=1}^d η_k F((η_1 x^1; 1), (η_2 x^2; 1), ..., (η_d x^d; 1))]
= E[F((η_1² x^1; η_1), (η_2² x^2; η_2), ..., (η_d² x^d; η_d))]   (multilinearity of F)
= F(E[(x^1; η_1)], E[(x^2; η_2)], ..., E[(x^d; η_d)])   (independence of the η_k's)
= F(x̄^1, x̄^2, ..., x̄^d),

and

E[F((ξ_1 x^1; 1), (ξ_2 x^2; 1), ..., (ξ_d x^d; 1))]
= F(E[(ξ_1 x^1; 1)], E[(ξ_2 x^2; 1)], ..., E[(ξ_d x^d; 1)])   (independence of the ξ_k's)
= F((0; 1), (0; 1), ..., (0; 1))   (zero mean of the ξ_k's)
= 0,

where the last equality is due to the fact that the last component of F is 0. □
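Both identities involve expectations over at most 2^d outcomes, so they can be checked by exact enumeration. A minimal sketch of (5.4), assuming numpy (the tensor construction is illustrative):

```python
import itertools, math
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 2

# A super-symmetric tensor on R^{n+1} whose last component
# F_{n+1,...,n+1} is zero (illustrative construction).
G = rng.standard_normal((n + 1,) * d)
F = sum(np.transpose(G, p) for p in itertools.permutations(range(d)))
F /= math.factorial(d)
F[(n,) * d] = 0.0

def multilinear(F, vecs):
    T = F
    for v in reversed(vecs):
        T = T @ v
    return float(T)

# Points xbar^k = (x^k; x_h^k) with |x_h^k| <= 1.
xbars = [np.append(rng.standard_normal(n), rng.uniform(-1, 1)) for _ in range(d)]

# Exact expectation in (5.4): eta_k = +/-1 independently with E[eta_k] = x_h^k,
# i.e., Prob{eta_k = 1} = (1 + x_h^k)/2.
lhs = 0.0
for eta in itertools.product([-1.0, 1.0], repeat=d):
    prob = math.prod((1 + e * xb[-1]) / 2 for e, xb in zip(eta, xbars))
    vecs = [np.append(e * xb[:-1], 1.0) for e, xb in zip(eta, xbars)]
    lhs += prob * np.prod(eta) * multilinear(F, vecs)

print(lhs, multilinear(F, xbars))  # the two values agree up to rounding
```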
Lemma 5.2.4 suggests that one may enumerate the 2^d possible combinations ((ξ_1 y^1; 1), (ξ_2 y^2; 1), ..., (ξ_d y^d; 1)) and pick the one with the largest value of the function F (or use a simple randomization procedure), to generate a feasible solution for the inhomogeneous multilinear form optimization (TPS) from a feasible solution for the homogeneous multilinear form optimization (T̄PS(1)), with a controlled quality deterioration. This plays a key role in proving the approximation ratio for (TPS), which is a byproduct of this section.
Theorem 5.2.5 (TPS) admits a polynomial-time approximation algorithm with approximation ratio 2^{−3d/2} (n+1)^{−(d−2)/2}.
Proof. Let (ȳ^1, ȳ^2, ..., ȳ^d) be the feasible solution found in Step 2 of Algorithm 5.2.1 satisfying (5.3), and let η = (η_1, η_2, ..., η_d)^T with all η_k's being independent and taking values 1 and −1 such that E[η_k] = y^k_h. By applying Lemma 5.2.4, (5.4) explicitly implies

F(ȳ^1, ȳ^2, ..., ȳ^d)
= −∑_{β∈B^d, ∏_{k=1}^d β_k=−1} Prob{η = β} F((β_1 y^1; 1), (β_2 y^2; 1), ..., (β_d y^d; 1))
  + ∑_{β∈B^d, ∏_{k=1}^d β_k=1} Prob{η = β} F((β_1 y^1; 1), (β_2 y^2; 1), ..., (β_d y^d; 1)),

and (5.5) explicitly implies

∑_{β∈B^d} F((β_1 y^1; 1), (β_2 y^2; 1), ..., (β_d y^d; 1)) = 0.

Combining the above two equalities, for any constant c, we have

F(ȳ^1, ȳ^2, ..., ȳ^d)
= ∑_{β∈B^d, ∏_{k=1}^d β_k=−1} (c − Prob{η = β}) F((β_1 y^1; 1), (β_2 y^2; 1), ..., (β_d y^d; 1))
  + ∑_{β∈B^d, ∏_{k=1}^d β_k=1} (c + Prob{η = β}) F((β_1 y^1; 1), (β_2 y^2; 1), ..., (β_d y^d; 1)).   (5.6)
If we let

c = max_{β∈B^d, ∏_{k=1}^d β_k = −1} Prob{η = β},

then the coefficient of each term in (5.6) is nonnegative. Therefore we are able to find β′ ∈ B^d, such that

F((β′_1 y^1; 1), (β′_2 y^2; 1), ..., (β′_d y^d; 1)) ≥ τ_0 F(ȳ^1, ȳ^2, ..., ȳ^d),   (5.7)

where

τ_0 = (∑_{β∈B^d, ∏β_k=1} (c + Prob{η = β}) + ∑_{β∈B^d, ∏β_k=−1} (c − Prob{η = β}))^{−1}
    ≥ (2^{d−1} c + ∑_{β∈B^d, ∏β_k=1} Prob{η = β} + (2^{d−1} − 1) c)^{−1}
    ≥ (2^{d−1} + 1 + 2^{d−1} − 1)^{−1} = 2^{−d}.

Let us denote z̄^k := (β′_k y^k; 1) for k = 1, 2, ..., d. Since ‖β′_k y^k‖ = ‖y^k‖ ≤ 1, we know that (z̄^1, z̄^2, ..., z̄^d) is a feasible solution for (TPS). Combining with (5.3), we have

F(z̄^1, z̄^2, ..., z̄^d) ≥ τ_0 F(ȳ^1, ȳ^2, ..., ȳ^d) ≥ 2^{−d} 2^{−d/2} (n+1)^{−(d−2)/2} v(TPS) = 2^{−3d/2} (n+1)^{−(d−2)/2} v(TPS). □
One may notice that our proposed algorithm for solving (TPS) is very similar to Steps 2 and 3 of Algorithm 5.2.1, with only a minor modification at Step 3: namely, we choose a solution in arg max{ F((β_1 y^1; 1), (β_2 y^2; 1), ..., (β_d y^d; 1)) : β ∈ B^d }, instead of choosing a solution in arg max{ F((β_1 y^1/d; 1), (β_2 y^2/d; 1), ..., (β_d y^d/d; 1)) : β ∈ B^d }. The reason for dividing by d at Step 3 of Algorithm 5.2.1 (to solve (PS)) will become clear later. Finally, we remark again that it is unnecessary to enumerate all possible 2^d combinations in this step, as (5.6) suggests that a simple randomization process will serve the same purpose, especially when d is large. In the latter case, we end up with a polynomial-time randomized approximation algorithm; otherwise, the procedure is deterministic and runs in polynomial-time.
5.2.4 Feasible Solution Assembling
Finally we come to the last step of the scheme. In Step 4 of Algorithm 5.2.1, a polarization formula z̄(β) = β_1(d+1)z̄^1 + ∑_{k=2}^d β_k z̄^k with β ∈ B^d and β_1 = ∏_{k=2}^d β_k = 1 is proposed. In fact, searching over all β ∈ B^d may possibly improve the solution, although the worst-case performance ratio remains the same. Moreover, one may choose z̄^1 or any other z̄^k to play the same role here; alternatively one may enumerate β_ℓ(d+1)z̄^ℓ + ∑_{1≤k≤d, k≠ℓ} β_k z̄^k over all β ∈ B^d and 1 ≤ ℓ ≤ d, and take the best possible solution; again, this does not change the theoretical performance ratio. The polarization formula at Step 4 of Algorithm 5.2.1 works for any fixed degree d, and we shall now complete the final stage of the proof of Theorem 5.2.2.
Specifically, we shall prove that by letting

z = arg max{ p(0); p(z(β)/z_h(β)), β ∈ B^d and β_1 = ∏_{k=2}^d β_k = 1 }

with z̄(β) = β_1(d+1)z̄^1 + ∑_{k=2}^d β_k z̄^k, we have

p(z) − v̲(PS) ≥ τ(PS) (v(PS) − v̲(PS)).   (5.8)

First, the solution (z̄^1, z̄^2, ..., z̄^d) as established at Step 3 of Algorithm 5.2.1 satisfies ‖z^k‖ ≤ 1/d (notice we divided by d in each term at Step 3) and z^k_h = 1 for k = 1, 2, ..., d. The same argument as in the proof of Theorem 5.2.5 shows that

F(z̄^1, z̄^2, ..., z̄^d) ≥ d^{−d} 2^{−3d/2} (n+1)^{−(d−2)/2} v(TPS) ≥ 2^{−3d/2} d^{−d} (n+1)^{−(d−2)/2} v(PS).   (5.9)

It is easy to see that

2 ≤ |z_h(β)| ≤ 2d and ‖z(β)‖ ≤ (d+1)/d + (d−1)/d = 2.   (5.10)
Thus z(β)/z_h(β) is a feasible solution for (PS), and so f(z̄(β)/z_h(β)) ≥ v̲(P̄S) = v̲(PS). Moreover, we shall argue below that

β_1 = 1 implies f(z̄(β)) ≥ (2d)^d v̲(PS).   (5.11)

If this were not the case, then f(z̄(β)/(2d)) < v̲(PS) ≤ 0. Notice that β_1 = 1 implies z_h(β) > 0, and thus we have

f(z̄(β)/z_h(β)) = (2d/z_h(β))^d f(z̄(β)/(2d)) ≤ f(z̄(β)/(2d)) < v̲(PS),

which contradicts the feasibility of z(β)/z_h(β).
Suppose ξ_1, ξ_2, ..., ξ_d are i.i.d. random variables, each taking values 1 and −1 with equal probability 1/2. By the link Lemma 4.2.1, and noticing that f(z̄(−ξ)) = f(−z̄(ξ)) = (−1)^d f(z̄(ξ)), we have

d! F((d+1)z̄^1, z̄^2, ..., z̄^d) = E[∏_{k=1}^d ξ_k f(z̄(ξ))]
= (1/4) E[f(z̄(ξ)) | ξ_1 = 1, ∏_{k=2}^d ξ_k = 1] − (1/4) E[f(z̄(ξ)) | ξ_1 = 1, ∏_{k=2}^d ξ_k = −1]
  − (1/4) E[f(z̄(ξ)) | ξ_1 = −1, ∏_{k=2}^d ξ_k = 1] + (1/4) E[f(z̄(ξ)) | ξ_1 = −1, ∏_{k=2}^d ξ_k = −1]
= (1/4) E[f(z̄(ξ)) | ξ_1 = 1, ∏_{k=2}^d ξ_k = 1] − (1/4) E[f(z̄(ξ)) | ξ_1 = 1, ∏_{k=2}^d ξ_k = −1]
  − (1/4) E[f(z̄(−ξ)) | ξ_1 = 1, ∏_{k=2}^d ξ_k = (−1)^{d−1}] + (1/4) E[f(z̄(−ξ)) | ξ_1 = 1, ∏_{k=2}^d ξ_k = (−1)^d].

By inserting and canceling a constant term, the above expression further leads to

d! F((d+1)z̄^1, z̄^2, ..., z̄^d)
= (1/4) E[f(z̄(ξ)) − (2d)^d v̲(PS) | ξ_1 = 1, ∏_{k=2}^d ξ_k = 1]
  − (1/4) E[f(z̄(ξ)) − (2d)^d v̲(PS) | ξ_1 = 1, ∏_{k=2}^d ξ_k = −1]
  + ((−1)^{d−1}/4) E[f(z̄(ξ)) − (2d)^d v̲(PS) | ξ_1 = 1, ∏_{k=2}^d ξ_k = (−1)^{d−1}]
  + ((−1)^d/4) E[f(z̄(ξ)) − (2d)^d v̲(PS) | ξ_1 = 1, ∏_{k=2}^d ξ_k = (−1)^d]
≤ (1/2) E[f(z̄(ξ)) − (2d)^d v̲(PS) | ξ_1 = 1, ∏_{k=2}^d ξ_k = 1],   (5.12)

where the last inequality is due to (5.11). Therefore, there is a binary vector β′ ∈ B^d with β′_1 = ∏_{k=2}^d β′_k = 1, such that

f(z̄(β′)) − (2d)^d v̲(PS) ≥ 2 d! F((d+1)z̄^1, z̄^2, ..., z̄^d) ≥ 2^{−3d/2+1} (d+1)! d^{−d} (n+1)^{−(d−2)/2} v(PS),

where the last step is due to (5.9).
Below we argue that z = arg max{ p(0); p(z(β)/z_h(β)), β ∈ B^d and β_1 = ∏_{k=2}^d β_k = 1 } satisfies (5.8). In fact, if −v̲(PS) ≥ τ(PS) (v(PS) − v̲(PS)), then 0 trivially satisfies (5.8), and so does z in this case. Otherwise, if −v̲(PS) < τ(PS) (v(PS) − v̲(PS)), then we have

v(PS) > (1 − τ(PS)) (v(PS) − v̲(PS)) ≥ (v(PS) − v̲(PS))/2,

which implies

f(z̄(β′)/(2d)) − v̲(PS) ≥ (2d)^{−d} 2^{−3d/2+1} (d+1)! d^{−d} (n+1)^{−(d−2)/2} v(PS) ≥ τ(PS) (v(PS) − v̲(PS)).

The above inequality also implies that f(z̄(β′)/(2d)) > 0. Recall that β′_1 = 1 implies z_h(β′) > 0, and thus 2d/z_h(β′) ≥ 1 by (5.10). Therefore, we have

p(z) ≥ p(z(β′)/z_h(β′)) = f(z̄(β′)/z_h(β′)) = (2d/z_h(β′))^d f(z̄(β′)/(2d)) ≥ f(z̄(β′)/(2d)).

This shows that z satisfies (5.8) in both cases, which concludes the whole proof. □
5.3 Polynomial with Ellipsoidal Constraints
In this section, we consider an extension of (PS), namely

(PQ) max p(x)
     s.t. x^T Q_i x ≤ 1, i = 1, 2, ..., m,
          x ∈ R^n,

where Q_i ⪰ 0 for i = 1, 2, ..., m, and ∑_{i=1}^m Q_i ≻ 0. Since p(x) is assumed to have no constant term, we know that v̲(PQ) ≤ 0 ≤ v(PQ).
Here, as in Section 5.2, we propose a polynomial-time randomized algorithm for approximately solving (PQ), with a worst-case relative performance ratio. The main algorithm and the approximation result of this section are the following.
Algorithm 5.3.1

• INPUT: an n-dimensional d-th degree polynomial function p(x), matrices Q_i ∈ R^{n×n} with Q_i ⪰ 0 for all 1 ≤ i ≤ m and ∑_{i=1}^m Q_i ≻ 0.

1 Rewrite p(x) − p(0) = F(x̄, x̄, ..., x̄) (d copies of x̄ = (x; x_h)) when x_h = 1 as in (5.2), with F being an (n+1)-dimensional d-th order super-symmetric tensor.

2 Apply Algorithm 3.3.2 to solve the problem
    max F(x̄^1, x̄^2, ..., x̄^d)
    s.t. (x̄^k)^T Q̄_i x̄^k ≤ 1, k = 1, 2, ..., d, i = 1, 2, ..., m,
  with Q̄_i = [Q_i, 0; 0^T, 1], approximately, and get a feasible solution (ȳ^1, ȳ^2, ..., ȳ^d).

3 Compute (z̄^1, z̄^2, ..., z̄^d) = arg max{ F((ξ_1 y^1/d; 1), (ξ_2 y^2/d; 1), ..., (ξ_d y^d/d; 1)) : ξ ∈ B^d }.

4 Compute z = arg max{ p(0); p(z(β)/z_h(β)), β ∈ B^d and β_1 = ∏_{k=2}^d β_k = 1 }, with z̄(β) := β_1(d+1)z̄^1 + ∑_{k=2}^d β_k z̄^k = (z(β); z_h(β)).

• OUTPUT: a feasible solution z ∈ R^n.
Theorem 5.3.1 (PQ) admits a polynomial-time randomized approximation algorithm
with relative approximation ratio τ(PQ), where
τ(PQ) := 2^{−5d/2} (d+1)! d^{−2d} (n+1)^{−(d−2)/2} Ω(log^{−(d−1)} m) = Ω(n^{−(d−2)/2} log^{−(d−1)} m).
Our scheme for solving the general polynomial optimization model (PQ) is similar to that for solving (PS) in Section 5.2. The main difference lies in Step 2, where a different relaxation model requires a different solution method to cope with; the method in question is Algorithm 3.3.2. The proof of Theorem 5.3.1 is similar to that of Theorem 5.2.2; here we only illustrate the main ideas and skip the details.
By homogenizing p(x), which has no constant term, we may rewrite (PQ) as

(P̄Q) max f(x̄)
     s.t. x̄ = (x; x_h),
          x^T Q_i x ≤ 1, x ∈ R^n, i = 1, 2, ..., m,
          x_h = 1,

which can be relaxed to the inhomogeneous multilinear form problem

(TPQ) max F(x̄^1, x̄^2, ..., x̄^d)
      s.t. x̄^k = (x^k; x^k_h), k = 1, 2, ..., d,
           (x^k)^T Q_i x^k ≤ 1, x^k ∈ R^n, k = 1, 2, ..., d, i = 1, 2, ..., m,
           x^k_h = 1, k = 1, 2, ..., d,

where F(x̄, x̄, ..., x̄) = f(x̄) with F being super-symmetric. We then further relax (TPQ) to the multilinear form optimization model (T̄PQ(√2)), where

(T̄PQ(t)) max F(x̄^1, x̄^2, ..., x̄^d)
         s.t. (x̄^k)^T Q̄_i x̄^k ≤ t², k = 1, 2, ..., d, i = 1, 2, ..., m,
              x̄^k ∈ R^{n+1}, k = 1, 2, ..., d,

and Q̄_i = [Q_i, 0; 0^T, 1] for i = 1, 2, ..., m.
By Theorem 3.3.4, for any t > 0, (T̄PQ(t)) admits a polynomial-time randomized approximation algorithm with approximation ratio (n+1)^{−(d−2)/2} Ω(log^{−(d−1)} m), and v(T̄PQ(t)) = t^d v(T̄PQ(1)). Thus the approximate solution (ȳ^1, ȳ^2, ..., ȳ^d) found by Step 2 of Algorithm 5.3.1 satisfies

F(ȳ^1, ȳ^2, ..., ȳ^d) ≥ (n+1)^{−(d−2)/2} Ω(log^{−(d−1)} m) v(T̄PQ(1))
                     = (√2)^{−d} (n+1)^{−(d−2)/2} Ω(log^{−(d−1)} m) v(T̄PQ(√2))
                     ≥ 2^{−d/2} (n+1)^{−(d−2)/2} Ω(log^{−(d−1)} m) v(TPQ).

Noticing that (y^k_h)² ≤ (ȳ^k)^T Q̄_1 ȳ^k ≤ 1 for k = 1, 2, ..., d, we again apply Lemma 5.2.4 to (ȳ^1, ȳ^2, ..., ȳ^d), and use the same argument as in the proof of Theorem 5.2.5.
Let c = maxβ∈Bd,∏dk=1 βk=−1 Prob η = β, where η = (η1, η2, · · · , ηd)T and its com-
ponents are independent random variables, each taking values 1 and −1 with E[ηk] = ykh
for k = 1, 2, . . . , d. Then we are able to find a binary vector β′ ∈ Bd, such that
F
((β′1y
1
1
),
(β′2y
2
1
), · · · ,
(β′dy
d
1
))≥ τ0F (y1, y2, · · · , yd)
≥ 2−dF (y1, y2, · · · , yd)
≥ 2−3d2 (n+ 1)−
d−22 Ω
(log−(d−1)m
)v(TPQ).
This proves the following theorem as a byproduct.
Theorem 5.3.2 (T̄PQ) admits a polynomial-time randomized approximation algorithm
with approximation ratio 2^{-3d/2} (n + 1)^{-(d-2)/2} Ω(log^{-(d-1)} m).
To prove the main theorem in this section (Theorem 5.3.1), we only need to check
the feasibility of z generated by Algorithm 5.3.1, while the worst-case performance
ratio can be proven by an argument similar to that in Section 5.2.4. Indeed, the vectors
z̄^k = (z^k; z_h^k) computed at Step 3 of Algorithm 5.3.1 satisfy
    (z^k)^T Q_i z^k ≤ 1/d^2  ∀ 1 ≤ i ≤ m, 1 ≤ k ≤ d.
For any binary vector β ∈ B^d, as z̄(β) = β_1(d + 1)z̄^1 + ∑_{k=2}^d β_k z̄^k, we have
2 ≤ |z_h(β)| ≤ 2d. Noticing by the Cauchy-Schwarz inequality,
    |(z^j)^T Q_i z^k| ≤ ‖Q_i^{1/2} z^j‖ · ‖Q_i^{1/2} z^k‖ ≤ 1/d^2  ∀ 1 ≤ i ≤ m, 1 ≤ j, k ≤ d,
it follows that
    (z(β))^T Q_i z(β) ≤ 2d · 2d · 1/d^2 = 4  ∀ 1 ≤ i ≤ m.
Thus z(β)/zh(β) is a feasible solution for (PQ), which implies z is also feasible.
To conclude this section, we remark here that (PQ) includes as a special case the
optimization of a general polynomial function over a central-symmetric polytope:
    max p(x)
    s.t. −1 ≤ (a^i)^T x ≤ 1, i = 1, 2, . . . , m,
         x ∈ R^n,
with rank(a^1, a^2, · · · , a^m) = n. Indeed, the constraint −1 ≤ (a^i)^T x ≤ 1 is equivalent to
x^T (a^i(a^i)^T) x ≤ 1, and the rank condition guarantees ∑_{i=1}^m a^i(a^i)^T ≻ 0; see the sketch below.
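To make the reduction concrete, the following Python sketch (a hypothetical random instance; names are illustrative) builds the matrices Q_i = a^i(a^i)^T and verifies numerically that the two descriptions of the feasible region coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 6                      # hypothetical sizes
A = rng.standard_normal((m, n))  # rows are the vectors a^i (rank n with high probability)

# Ellipsoidal description: Q_i = a^i (a^i)^T, so x^T Q_i x = ((a^i)^T x)^2
Qs = [np.outer(a, a) for a in A]
assert np.all(np.linalg.eigvalsh(sum(Qs)) > 0)  # sum of the Q_i is positive definite

for _ in range(1000):
    x = rng.uniform(-2, 2, size=n)
    in_polytope = np.all(np.abs(A @ x) <= 1)
    in_ellipsoids = all(x @ Q @ x <= 1 for Q in Qs)
    assert in_polytope == in_ellipsoids
```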
5.4 Polynomial with General Convex Constraints
In this section we study the polynomial optimization model in a generic constraint format:
(PG) max p(x)
     s.t. x ∈ G,
where G ⊂ R^n is a given convex compact set. As before, we derive polynomial-time
approximation algorithms for solving (PG). Our approach makes use of the well-known
Löwner-John ellipsoids (see e.g., [20, 86]), described in the following theorem.
Theorem 5.4.1 Let G ⊂ R^n be a convex compact set with non-empty interior.
1. There exists a unique largest-volume ellipsoid {Ax + a | x ∈ S^n} ⊂ G, whose n-times
   linear-size larger ellipsoid satisfies {nAx + a | x ∈ S^n} ⊃ G; if in addition G is
   central-symmetric, then {√n Ax + a | x ∈ S^n} ⊃ G;
2. There exists a unique smallest-volume ellipsoid {Bx + b | x ∈ S^n} ⊃ G, whose n-times
   linear-size smaller ellipsoid satisfies {Bx/n + b | x ∈ S^n} ⊂ G; if in addition G
   is central-symmetric, then {Bx/√n + b | x ∈ S^n} ⊂ G.
Armed with the above theorem, if we are able to find the Lowner-John ellipsoid
(either the inner or the outer) of the feasible region G in polynomial-time, then the
following algorithm approximately solves (PG) with a worst-case performance ratio.
Algorithm 5.4.1
• INPUT: an n-dimensional d-th degree polynomial function p(x) and a set G ⊂ Rn.
1 Find a scalar t ∈ R, a vector b ∈ R^n, and a matrix A ∈ R^{n×m} with rank(A) = m ≤ n,
  such that the two co-centered ellipsoids E1 = {Au + b | u ∈ S^m} and E2 = {tAu + b | u ∈ S^m}
  satisfy E1 ⊂ G ⊂ E2.
2 Compute the polynomial function p_0(u) = p(Au + b) of variable u ∈ R^m.
3 Apply Algorithm 5.2.1 with input p_0(u) and output y ∈ S^m.
4 Compute z = Ay + b.
• OUTPUT: a feasible solution z ∈ G.
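As an illustration of Step 1, consider the simple case where G is an axis-aligned box. A minimal sketch, assuming G = {x : |x_i| ≤ u_i} (so b = 0, A = diag(u), and t = √n by the central-symmetric case of Theorem 5.4.1):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
u = rng.uniform(0.5, 2.0, size=n)   # hypothetical box half-widths
A, b, t = np.diag(u), np.zeros(n), np.sqrt(n)

# E1 = {A v : ||v|| <= 1} is inscribed in the box: |(A v)_i| = u_i |v_i| <= u_i.
# E2 = {t A v : ||v|| <= 1} covers the box: for |x_i| <= u_i, the point
# v = A^{-1} x has ||v||^2 = sum (x_i/u_i)^2 <= n, i.e. x lies in t*E1.
for _ in range(1000):
    x = rng.uniform(-u, u)                     # a point of G
    assert np.linalg.norm(np.linalg.solve(A, x)) <= t + 1e-12
```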
The key result in this section is the following theorem.
Theorem 5.4.2 If S^n ⊂ G ⊂ tS^n := {x ∈ R^n | ‖x‖ ≤ t} for some t ≥ 1, then (PG)
admits a polynomial-time approximation algorithm with relative approximation ratio
τ(PG)(t), where
    τ(PG)(t) := 2^{-2d} (d + 1)! d^{-2d} (n + 1)^{-(d-2)/2} (t^2 + 1)^{-d/2} = Ω(n^{-(d-2)/2} t^{-d}).
Proof. By homogenizing the objective function of (PG), we get the equivalent problem
(P̄G) max f(x̄)
     s.t. x̄ = (x; x_h),
          x ∈ G, x_h = 1,
where f(x̄) = p(x) if x_h = 1, and f(x̄) is an (n + 1)-dimensional homogeneous poly-
nomial function of degree d. If we write f(x̄) = F(x̄, x̄, · · · , x̄) (d copies of x̄) with F being
super-symmetric, then (P̄G) can be relaxed to the inhomogeneous multilinear form problem
(T̄PG) max F(x̄^1, x̄^2, · · · , x̄^d)
      s.t. x̄^k = (x^k; x_h^k), k = 1, 2, . . . , d,
           x^k ∈ G, x_h^k = 1, k = 1, 2, . . . , d.
Recall that in Section 5.2.2, we have defined
(T̄PS(t)) max F(x̄^1, x̄^2, · · · , x̄^d)
         s.t. ‖x̄^k‖ ≤ t, x̄^k ∈ R^{n+1}, k = 1, 2, . . . , d.
As x^k ∈ G ⊂ tS^n, it follows that ‖x̄^k‖ ≤ √(t^2 + 1) in (T̄PG). Therefore, (T̄PS(√(t^2 + 1)))
is a relaxation of (T̄PG), and v(T̄PS(√(t^2 + 1))) ≥ v(T̄PG) ≥ v(P̄G) = v(PG). The rest of
the proof follows similarly as that in Section 5.2.4. Specifically, we are able to construct
a feasible solution x ∈ S^n ⊂ G in polynomial-time with a relative performance ratio
τ(PG)(t).
Observe that any ellipsoid can be linearly transformed to the Euclidean ball. By a
variable transformation if necessary, we are led to the main result in this section.
Corollary 5.4.3 Given a bounded set G ⊂ R^n, if two co-centered ellipsoids E1 =
{Au + b | u ∈ S^n} and E2 = {tAu + b | u ∈ S^n} can be found in polynomial-time,
satisfying E1 ⊂ G ⊂ E2, then (PG) admits a polynomial-time approximation algorithm
with relative approximation ratio τ(PG)(t).
We remark that in fact the set G in Theorem 5.4.2 and Corollary 5.4.3 does not
need to be convex, as long as the two required ellipsoids are in place. However, the
famous Lowner-John theorem guarantees the existence of such inner and outer ellipsoids
required in Corollary 5.4.3 for any convex compact set, with t = n for G being non-
central-symmetric, and t =√n for G being central-symmetric. Thus, if we are able to
find a pair of ellipsoids (E1, E2) in polynomial-time for G, then (PG) can be solved by a
polynomial-time approximation algorithm with relative approximation ratio τ(PG)(t).
Indeed, it is possible to compute in polynomial-time the Lowner-John ellipsoids in
several interesting cases. Below is a list of such cases (assuming G is bounded); for the
details one is referred to [20, 86]:
• G = {x ∈ R^n | (a^i)^T x ≤ b_i, i = 1, 2, . . . , m};
• G = conv{x^1, x^2, · · · , x^m}, where x^i ∈ R^n for i = 1, 2, . . . , m;
• G = ⋂_{i=1}^m E_i, where E_i is an ellipsoid in R^n for i = 1, 2, . . . , m;
• G = conv{⋃_{i=1}^m E_i}, where E_i is an ellipsoid in R^n for i = 1, 2, . . . , m;
• G = ∑_{i=1}^m E_i := {∑_{i=1}^m x^i | x^i ∈ E_i, i = 1, 2, . . . , m}, where E_i is an ellipsoid in
  R^n for i = 1, 2, . . . , m.
By Corollary 5.4.3 and the computability of the Löwner-John ellipsoids discussed
above, we conclude that for (PG) with the constraint set G being any of the above
cases, there is a polynomial-time approximation algorithm with a relative approx-
imation quality assurance. In particular, the ratio is τ(PG)(√m) = Ω(n^{-(d-2)/2} m^{-d/2}) for
the last case, and is τ(PG)(n) = Ω(n^{-(3d-2)/2}) for the other cases.
We also remark that (PQ) : max_{x^T Q_i x ≤ 1, i=1,2,...,m} p(x), discussed in Section 5.3,
may in principle be solved by directly applying Corollary 5.4.3 as well. If we adopt
that approach (Algorithm 5.4.1), then the relative approximation ratio is τ(PG)(√n) =
Ω(n^{-(2d-2)/2}), which prevails if m is exceedingly large. Taking the better of the two, the
quality ratio in Theorem 5.3.1 can be improved to Ω(max{n^{-(d-2)/2} log^{-(d-1)} m, n^{-(2d-2)/2}}).
Our investigation quite naturally leads to a question which is of a general geometric
interest itself. Consider the intersection of m co-centered ellipsoids in Rn as a geometric
structure. Denote E_{m,n} to be the collection of all such structures, or more specifically
    E_{m,n} := { ⋂_{i=1}^m {x ∈ R^n | x^T Q_i x ≤ 1} | Q_i ⪰ 0 for i = 1, 2, . . . , m and ∑_{i=1}^m Q_i ≻ 0 }.
For any central-symmetric and convex compact set G ⊂ R^n centered at b, there exist
E_{m,n} ∈ E_{m,n} and t ≥ 1, such that b + E_{m,n} ⊂ G ⊂ b + tE_{m,n}. Obviously, one can
naturally define
    t(G; m, n) := inf{ t | ∃ E_{m,n} ∈ E_{m,n} such that b + E_{m,n} ⊂ G ⊂ b + tE_{m,n} },
    θ(m, n) := sup{ t(G; m, n) | G ⊂ R^n is convex compact and central-symmetric }.
The famous Löwner-John theorem states that θ(1, n) = √n. Naturally, θ(∞, n) = 1,
because any central-symmetric convex set can be expressed as the intersection of an
infinite number of co-centered ellipsoids. It is interesting to compute θ(m, n) for general
m and n. Of course, it is trivial to observe that θ(m, n) is monotonically decreasing in m
for any fixed n. In any case, if we are able to compute θ(m, n) and find the corresponding
E_{m,n} in polynomial-time, then Theorem 5.3.1 suggests a polynomial-time randomized
approximation algorithm for (PG) with relative approximation ratio (θ(m, n))^{-d} τ(PQ) =
Ω((θ(m, n))^{-d} n^{-(d-2)/2} log^{-(d-1)} m).
5.5 Applications
The polynomial optimization models studied in this chapter are quite general and have
versatile applications. In order to better appreciate these models as well as the approx-
imation algorithms presented, in this section we discuss a few detailed examples
arising from real applications, and show that they are readily formulated by the inho-
mogeneous polynomial optimization models in this chapter.
5.5.1 Portfolio Selection with Higher Moments
The portfolio selection problem dates back to the early 1950s, when the seminal
mean-variance model was proposed by Markowitz [81]. Essentially, in Markowitz's
model, the mean of the portfolio return is treated as the 'gain' factor, while the variance
of the portfolio return is treated as the 'risk' factor. By minimizing the risk subject to
a certain target of reward, the mean-variance model is as follows:
(MV) min x^T Σ x
     s.t. μ^T x = μ_0,
          e^T x = 1, x ≥ 0, x ∈ R^n,
where μ and Σ are the mean vector and covariance matrix of the n given assets, respec-
tively, and e is the all-one vector. This model and its variations have been studied
extensively throughout the history of portfolio management. Despite its popularity and orig-
inality, the mean-variance model certainly has drawbacks. An important one is that it
neglects the higher-moments information of the portfolio. Mandelbrot and Hudson [78]
made a strong case against a 'normal view' of the investment returns. The use of higher
moments in portfolio selection thus becomes quite necessary, i.e., involving more than the
first two moments (e.g., the skewness and the kurtosis of the investment returns) if they
are also available. This problem has been receiving much attention in the literature
(see e.g., de Athayde and Flore [10], Prakash et al. [96], Jondeau and Rockinger [60],
Kleniati et al. [64], and the references therein). In particular, a very general model
in [64] is
(PM) max α μ^T x − β x^T Σ x + γ ∑_{i,j,k=1}^n ς_{ijk} x_i x_j x_k − δ ∑_{i,j,k,ℓ=1}^n κ_{ijkℓ} x_i x_j x_k x_ℓ
     s.t. e^T x = 1, x ≥ 0, x ∈ R^n,
where μ, Σ, (ς_{ijk}), (κ_{ijkℓ}) are the first four central moments of the n given assets. The
nonnegative parameters α, β, γ, δ measure the investor's preference for the four moments,
and they sum up to one, i.e., α + β + γ + δ = 1.
In fact, the mean-variance model (MV) can be taken as a special case of (PM) with
γ = δ = 0. The model (PM) falls essentially within the framework of our model (PG), as
the constraint set is convex and compact. By directly applying Corollary 5.4.3 and the
discussion on its applicability to a polytope, it admits a polynomial-time approximation
algorithm with relative approximation ratio Ω(n^{-5}).
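For concreteness, the following Python sketch estimates the four moment tensors from a hypothetical matrix R of historical returns (T periods × n assets) and evaluates the (PM) objective at a given weight vector; all names and data are illustrative.

```python
import numpy as np

def pm_objective(R, x, alpha, beta, gamma, delta):
    """Evaluate the (PM) objective from sample returns R (T x n)."""
    T, n = R.shape
    mu = R.mean(axis=0)
    D = R - mu                                          # demeaned returns
    Sigma = D.T @ D / T                                 # covariance
    S = np.einsum('ti,tj,tk->ijk', D, D, D) / T         # co-skewness tensor
    K = np.einsum('ti,tj,tk,tl->ijkl', D, D, D, D) / T  # co-kurtosis tensor
    return (alpha * mu @ x
            - beta * x @ Sigma @ x
            + gamma * np.einsum('ijk,i,j,k->', S, x, x, x)
            - delta * np.einsum('ijkl,i,j,k,l->', K, x, x, x, x))

rng = np.random.default_rng(0)
R = rng.standard_normal((500, 6))    # hypothetical return history
x = np.ones(6) / 6                   # equally weighted, feasible: e^T x = 1, x >= 0
print(pm_objective(R, x, 0.4, 0.3, 0.2, 0.1))   # preference weights sum to one
```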
5.5.2 Sensor Network Localization
Suppose that in a certain specified region G ⊂ R^3 there are a set of anchor nodes, denoted by
A, and a set of sensor nodes, denoted by S. What we know are the positions of the
anchor nodes a^j ∈ G (j ∈ A), and the (possibly noisy) distance measurements between
anchor nodes and sensor nodes, and between two different sensor nodes, denoted by
d_{ij} (i ∈ S, j ∈ S ∪ A). The task is to estimate the positions of the unknown sensor
nodes x^i ∈ G (i ∈ S). Luo and Zhang [77] proposed a least squares formulation of this
sensor network localization problem. Specifically, the problem takes the form of
(SNL) min ∑_{i,j∈S} (‖x^i − x^j‖^2 − d_{ij}^2)^2 + ∑_{i∈S, j∈A} (‖x^i − a^j‖^2 − d_{ij}^2)^2
      s.t. x^i ∈ G, i ∈ S.
Notice that the objective function of (SNL) is an inhomogeneous quartic polynomial
function. If the specified region G is well formed, say the Euclidean ball, an ellipsoid, a
polytope, or any other convex compact set that can be sandwiched by two co-centered
ellipsoids, then (SNL) can be fit into the model (PG) in the following way. Suppose
E1 ⊂ G ⊂ E2 with E1 and E2 being two co-centered ellipsoids; we know by the
Löwner-John theorem that E2 is at most three times larger than E1 in linear size
(for the Euclidean ball or an ellipsoid the factor is 1, for a central-symmetric G it is less than
√3, and for a general convex compact G it is less than 3). Denote the number of
sensor nodes by n = |S|, and denote x = ((x^1)^T, (x^2)^T, · · · , (x^n)^T)^T ∈ R^{3n}. Then
x ∈ G × G × · · · × G (n copies), and this feasible region can be sandwiched by the two co-centered
sets E1 × E1 × · · · × E1 and E2 × E2 × · · · × E2, which are both intersections of n co-
centered ellipsoids, i.e., belonging to E_{n,3n}. According to the discussion at the end
of Section 5.4, and noticing that in this case t(G; n, 3n) ≤ 3 is a constant, (SNL) admits
a polynomial-time randomized approximation algorithm with relative approximation
ratio Ω(1/(n log^3 n)).
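The (SNL) objective itself is straightforward to evaluate; a minimal sketch in Python, with hypothetical anchor positions, sensor estimates, and a dictionary of distance measurements:

```python
import numpy as np

def snl_objective(X, anchors, dist):
    """Least-squares (SNL) objective. X: dict sensor -> position estimate (R^3);
    anchors: dict anchor -> known position; dist: dict (i, j) -> measured d_ij,
    where i is a sensor and j is a sensor or an anchor."""
    total = 0.0
    for (i, j), d in dist.items():
        pj = X[j] if j in X else anchors[j]
        total += (np.sum((X[i] - pj) ** 2) - d ** 2) ** 2
    return total

anchors = {'a1': np.array([0., 0., 0.]), 'a2': np.array([1., 0., 0.])}
X = {'s1': np.array([0.4, 0.3, 0.1]), 's2': np.array([0.7, 0.2, 0.5])}
dist = {('s1', 's2'): 0.55, ('s1', 'a1'): 0.52, ('s2', 'a2'): 0.62}
print(snl_objective(X, anchors, dist))   # to be minimized over the sensor positions
```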
5.6 Numerical Experiments
In this section, we present some preliminary test results for the approximation al-
gorithms proposed in this chapter, to give the readers an impression about how our
algorithms work in practice. We shall focus on (PS) with d = 4, specifically, the model
being tested is
(EPS) max p(x) = F_4(x, x, x, x) + F_3(x, x, x) + F_2(x, x) + F_1(x)
      s.t. x ∈ S^n,
where F_4 ∈ R^{n^4}, F_3 ∈ R^{n^3}, F_2 ∈ R^{n^2} and F_1 ∈ R^n are super-symmetric tensors of
orders 4, 3, 2 and 1, respectively.
5.6.1 Randomly Simulated Data
A fourth order tensor F'_4 is generated randomly, with its n^4 entries following i.i.d. standard
normals. We then symmetrize F'_4 to form the super-symmetric tensor F_4 by averaging
the related entries. The other lower order tensors F_3, F_2 and F_1 are generated in
the same manner. We then apply Algorithm 5.2.1 to get a feasible solution with its
objective value denoted by v, which has a guaranteed worst-case performance ratio.
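The symmetrization step amounts to averaging each entry over all permutations of its indices; a minimal numpy sketch:

```python
import math
import numpy as np
from itertools import permutations

def symmetrize(F):
    """Average a tensor over all index permutations to make it super-symmetric."""
    d = F.ndim
    return sum(np.transpose(F, p) for p in permutations(range(d))) / math.factorial(d)

rng = np.random.default_rng(0)
F4 = symmetrize(rng.standard_normal((8, 8, 8, 8)))   # super-symmetric by construction
assert np.allclose(F4, np.transpose(F4, (1, 0, 3, 2)))
```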
For the purpose of making a comparison, we also compute an upper bound of the
optimal value of (EPS). Like in (5.2), we may let F(x̄, x̄, x̄, x̄) = f(x̄) = p(x) when
x_h = 1, where F ∈ R^{(n+1)^4} is super-symmetric. (EPS) can then be relaxed to
    max F(x̄, x̄, x̄, x̄)
    s.t. ‖x̄‖ ≤ √2, x̄ ∈ R^{n+1}.
Let y = vec(x̄x̄^T) ∈ R^{(n+1)^2}, and rewrite F as an (n + 1)^2 × (n + 1)^2 matrix F'. (EPS)
is further relaxed to
    max F'(y, y) = y^T F' y
    s.t. ‖y‖ ≤ 2, y ∈ R^{(n+1)^2}.
The optimal value of the above problem is v̄ = 4λ_max(F'), which is taken as an upper
bound of v(EPS).
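A minimal sketch of this upper bound, assuming the homogenized super-symmetric tensor Fbar has already been formed as in (5.2):

```python
import numpy as np

def upper_bound(Fbar):
    """v_bar = 4 * lambda_max(F'), where F' is the (n+1)^2 x (n+1)^2 matrix
    unfolding of the homogenized 4th-order tensor Fbar (assumed given)."""
    m = Fbar.shape[0] ** 2            # (n+1)^2
    Fp = Fbar.reshape(m, m)           # symmetric, since Fbar is super-symmetric
    return 4 * np.linalg.eigvalsh(Fp).max()
```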
By Theorem 5.2.2, Algorithm 5.2.1 possesses a theoretical worst-case relative perfor-
mance ratio of 2^{-10} · 5! · 4^{-8} (n + 1)^{-1} = Ω(1/n). The numerical simulation results for
(EPS) are listed in Table 5.1. Based on the observations, comparing with the upper
bound v̄ (which might be very loose), the absolute performance ratio τ := v/v̄ behaves like
Ω(1/√n), rather than the theoretical relative ratio Ω(1/n).
Table 5.1: Numerical results (average of 10 instances) of (EPS)
n      3      5      10     20     30     40     50     60     70
v      0.342  0.434  0.409  0.915  0.671  0.499  0.529  0.663  0.734
v̄      10.5   16.1   26.7   51.7   74.4   97.8   121.1  143.6  167.1
τ (%)  3.257  2.696  1.532  1.770  0.902  0.510  0.437  0.462  0.439
n·τ    0.098  0.135  0.153  0.354  0.271  0.204  0.218  0.277  0.307
√n·τ   0.056  0.060  0.048  0.079  0.049  0.032  0.031  0.036  0.037
With regard to the computational effort, we report that Algorithm 5.2.1 ran fairly
fast. For instance, for n = 70 we were able to get a feasible solution within seconds,
while computing the upper bound v̄ cost much more computational time. For n ≥ 80,
however, our computer ran out of memory in the experiments, a problem
purely due to the sheer size of the input data.
5.6.2 Local Improvements
The theoretical worst-case performance ratios that we have developed so far are cer-
tainly very conservative, as observed in the previous subsection. It is desirable to
design a more realistic procedure to assess how good the solutions actually are. One
point to note is that we can always improve the quality of the solution by applying a
local improvement procedure to our heuristic solutions. In the Matlab environment,
such a local search procedure is readily available, e.g., the fmincon function in Matlab
7.7.0 (R2008b), which finds a local KKT point starting from the feasible solution that
we provide. In our experiments, we find that the fmincon function works well at least
for low dimensional problems. In particular, for our test cases, it works quite stably
up to n = 10.
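Outside Matlab, the same kind of local refinement can be sketched with SciPy; a minimal example, assuming the feasible region is the Euclidean unit ball and p is a callable polynomial objective (both hypothetical here):

```python
import numpy as np
from scipy.optimize import minimize

def local_improve(p, z):
    """Refine a feasible starting point z by local search over {x : ||x|| <= 1}."""
    res = minimize(lambda x: -p(x),             # maximize p by minimizing -p
                   x0=z,
                   method='SLSQP',
                   constraints=[{'type': 'ineq',
                                 'fun': lambda x: 1.0 - x @ x}])
    return res.x, -res.fun                      # local KKT point and its value
```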
In order to evaluate the true quality of our approximate solutions, it is desirable to
probe the optimal values, instead of using the loose upper bounds. For this purpose
we set up the following experiments, restricting ourselves to
the low dimensional cases, say n ≤ 10. First we take the feasible approximate solution
(which has objective value v) as a starting point, followed by the local improvement
procedure of the fmincon function, to obtain a KKT solution, and denote its objective
value by v_f. Then we use a brute-force approach to randomly sample 1,000 feasible
solutions, followed by the same local improvement via the fmincon function in Matlab. We then
pick the best one as a proxy of the true optimal solution, and denote its objective
value by v*_f. This is doable for the case n ≤ 10 in our computational environment.
For the cases n = 5 and n = 10, we generate 10 random instances
of (EPS) each. The solutions obtained, as described above, are shown in Table 5.2 and
Table 5.3, respectively.

Table 5.2: Numerical objectives of (EPS) with local improvements for n = 5
Instance  1      2      3      4      5      6      7      8      9      10
v         0.35   0.47   0.08   0.40   0.17   0.13   0.07   1.78   0.32   0.53
v_f       4.16   4.85   4.24   3.99   4.28   6.49   6.46   6.42   5.14   6.84
v*_f      4.16   4.85   4.24   3.99   4.28   6.49   6.46   6.42   5.14   6.84
v̄         14.33  14.92  14.88  15.62  17.59  14.34  15.60  19.12  13.63  15.01

Table 5.3: Numerical objectives of (EPS) with local improvements for n = 10
Instance  1      2      3      4      5      6      7      8      9      10
v         1.51   1.24   0.80   0.28   0.09   0.12   0.30   0.36   0.35   0.61
v_f       8.98   7.74   9.74   8.71   8.14   11.24  9.82   7.75   9.18   11.08
v*_f      9.70   7.74   9.74   8.90   8.14   11.24  9.82   7.85   9.18   11.08
v̄         26.67  24.88  28.06  28.75  27.82  26.99  26.92  27.75  27.83  27.10

The results are quite telling: Algorithm 5.2.1 together with
fmincon yields near-optimal solutions, at least for low dimensional problems. However,
for problems in high dimensions, a stable local improvement procedure is a nontrivial
task; interested readers are referred to a recent paper by Chen et al. [25].
Chapter 6
Polynomial Optimization with
Binary Constraints
6.1 Introduction
We shift the focus from the continuous optimization models of the previous chapters to discrete
optimization. In fact, a very large class of discrete optimization problems have their
objectives and constraints being polynomials, e.g., the graph partition problems and the
network flow problems. In particular, this chapter is concerned with models of opti-
mizing a polynomial function subject to binary constraints, with the objective focusing
on the four types of polynomial functions mentioned in Section 2.1.1. Specifically, the
models are
(TB) max F (x1,x2, · · · ,xd)
s.t. xk ∈ Bnk , k = 1, 2, . . . , d;
(HB) max f(x)
s.t. x ∈ Bn;
(MB) max f(x1,x2, · · · ,xs)
s.t. xk ∈ Bnk , k = 1, 2, . . . , s;
(PB) max p(x)
s.t. x ∈ Bn.
These four models are discussed sequentially, each in one section of this chapter. Each
model generalizes the previous one, and each generalization requires its own approach
and techniques to cope with. The last model, (PB), is indeed a very general discrete
optimization model, since in principle it can be used to model the following general
polynomial optimization problem in discrete values:
    max p(x)
    s.t. x_i ∈ {a_{i1}, a_{i2}, · · · , a_{im_i}}, i = 1, 2, . . . , n.
We also discuss polynomial optimizations over hypercubes as byproducts of this
chapter. They are the models (TB̄), (HB̄), (MB̄) and (PB̄), i.e., the respective models (TB),
(HB), (MB) and (PB) with the binary set B^n replaced by the solid hypercube B̄^n.
All the models are unfortunately NP-hard when the degree of the objective poly-
nomial d ≥ 2, albeit they are trivial when d = 1. This is because each one includes
computing the matrix ∞ → 1-norm (see e.g., [5]) as a subclass, i.e.,
    ‖F‖_{∞→1} = max (x^1)^T F x^2
               s.t. x^1 ∈ B^{n_1}, x^2 ∈ B^{n_2},
which is also the exact model of (TB) when d = 2. The matrix ∞ → 1-norm is related
to the so-called matrix cut-norm; the current best polynomial-time approximation ratio
for the matrix ∞ → 1-norm as well as the matrix cut-norm is 2 ln(1+√2)/π ≈ 0.56, due to
Alon and Naor [5]. Huang and Zhang [59] considered similar problems for complex
discrete variables and derived constant approximation ratios. When d = 3, (TB) is
a slight generalization of the model considered by Khot and Naor [63], where F is
assumed to be super-symmetric (implying n_1 = n_2 = n_3) and square-free (F_{ijk} = 0
whenever two of the three indices are equal). The approximation bound of the optimal
value given in [63] is Ω(√(log n_1 / n_1)).
For the model (HB), its NP-hardness for d = 2 can also be derived by a reduction from the
max-cut problem, where the matrix F is the Laplacian of a given graph. In a seminal
work by Goemans and Williamson [40], a polynomial-time randomized approximation
algorithm is given with approximation ratio 0.878, by the well-known SDP relaxation
and randomization technique. The method was then generalized by Nesterov, who in [88]
proved a 0.63-approximation ratio for (HB) when the matrix F is positive semidefinite.
A more general result is due to Charikar and Wirth [24], where an Ω(1/log n)-
approximation ratio for (HB) is proposed when the matrix F is diagonal-free. If the
degree of the objective polynomial goes higher, the only approximation result in the
literature is due to Khot and Naor [63] for homogeneous cubic polynomials,
where an Ω(√(log n / n))-approximation bound is provided when the tensor F is square-
free. In fact, square-freeness (or, in the matrix case, being diagonal-free) is in a sense a necessary
condition for deriving polynomial-time approximation algorithms (see e.g., [4]). Even in
the quadratic case, there is no polynomial-time approximation algorithm with a positive
approximation ratio for the general model (HB) unless P = NP.
In this chapter we propose polynomial-time randomized approximation algorithms
with provable worst-case performance ratios for all the models mentioned at the begin-
ning, provided that the degree of the objective polynomial is fixed. Section 6.2 discusses
the model (TB). Essentially, we apply a similar approach as in Chapter 3, by relaxing
the multilinear form objective to a lower order multilinear form. However, the discrete
nature makes the problems quite different from the continuous ones in Chapter 3, and a
novel decomposition routine is proposed in order to derive the approximation bound.
Section 6.3 and Section 6.4 discuss the models (HB) and (MB), respectively. Both of them
use multilinear form relaxations, armed with two different versions of link identities, in
order to preserve the approximation bounds under the square-free property. The general
model (PB) is discussed in Section 6.5, where the homogenization technique of Chap-
ter 5 is modified and applied. All these approximation algorithms can be applied to
polynomial optimizations over hypercubes, and we also summarize those results in Section 6.5
as byproducts. Some specific applications of the discrete models and approxi-
mation algorithms proposed in this chapter are discussed in Section 6.6. Finally,
we report our numerical experiment results in Section 6.7.
6.2 Multilinear Form with Binary Constraints
Our first discrete model in question is to maximize a multilinear function in binary
variables, specifically
(TB) max F (x1,x2, · · · ,xd)
s.t. xk ∈ Bnk , k = 1, 2, . . . , d,
where n1 ≤ n2 ≤ · · · ≤ nd.
This model is NP-hard when d ≥ 2, and we shall propose polynomial-time random-
ized approximation algorithms with worst-case performance ratios. The case of d = 2
is to compute ‖F‖_{∞→1}, whose best known approximation bound is 2 ln(1+√2)/π ≈ 0.56, due to
Alon and Naor [5]. It also serves as the basis of our subsequent analysis. When d = 3,
Khot and Naor [63] proposed a randomized procedure to approximate the optimal value of
(TB) in polynomial-time, with approximation bound Ω(√(log n_1 / n_1)).
Our approximation algorithm works for general degree d based on recursion, and is
fairly simple. We may take any approximation algorithm for the d = 2 case, say the
algorithm by Alon and Naor [5], as a basis. When d = 3, noticing that any n_1 × n_2 × n_3
third order tensor can be rewritten as an n_1n_2 × n_3 matrix by combining its first and
second modes, (TB) can be relaxed to
    max F(X, x^3)
    s.t. X ∈ B^{n_1n_2}, x^3 ∈ B^{n_3}.
This problem is the exact form of (TB) when d = 2, which can be solved approximately
with approximation ratio 2 ln(1+√2)/π. Denote its approximate solution by (X̂, x̂^3).
The next key step is to recover (x̂^1, x̂^2) from X̂. For this purpose, we introduce the
following decomposition routine, which plays a fundamental role in our algorithms for
binary variables, similar to DR 3.2.1, DR 3.2.2 and DR 3.3.1 in Chapter 3.
Decomposition Routine 6.2.1
• INPUT: matrices M ∈ R^{n_1×n_2}, X ∈ B^{n_1×n_2}.
1 Construct
      X̃ = [ I_{n_1×n_1}    X/√n_1
            X^T/√n_1     X^TX/n_1 ] ⪰ 0.
2 Randomly generate
      (ξ; η) ∼ N(0_{n_1+n_2}, X̃)
  and compute x^1 = sign(ξ) and x^2 = sign(η); repeat if necessary, until
      (x^1)^T M x^2 ≥ 2/(π√n_1) M • X.
• OUTPUT: vectors x^1 ∈ B^{n_1}, x^2 ∈ B^{n_2}.
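A minimal Python sketch of DR 6.2.1; the matrix X̃ is positive semidefinite by construction (it is a Gram matrix), so the Gaussian sampling is well defined. Names are illustrative:

```python
import numpy as np

def dr_621(M, X, rng=np.random.default_rng(0)):
    """Decomposition routine: recover binary x1, x2 from X in B^{n1 x n2}."""
    n1, n2 = X.shape
    cov = np.block([[np.eye(n1),        X / np.sqrt(n1)],
                    [X.T / np.sqrt(n1), X.T @ X / n1   ]])   # the PSD matrix X~
    target = 2.0 / (np.pi * np.sqrt(n1)) * np.sum(M * X)     # 2/(pi sqrt(n1)) M . X
    while True:
        g = rng.multivariate_normal(np.zeros(n1 + n2), cov, check_valid='ignore')
        x1 = np.where(g[:n1] >= 0, 1.0, -1.0)    # sign rounding
        x2 = np.where(g[n1:] >= 0, 1.0, -1.0)
        if x1 @ M @ x2 >= target:
            return x1, x2
```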
The complexity of DR 6.2.1 is O(n_1n_2) for each trial, in expectation. Now, if we
let (M, X) = (F(·, ·, x̂^3), X̂) and apply DR 6.2.1, then we can prove that the output
(x̂^1, x̂^2) satisfies
    E[F(x̂^1, x̂^2, x̂^3)] = E[(x̂^1)^T M x̂^2] ≥ 2 M • X̂/(π√n_1) = 2 F(X̂, x̂^3)/(π√n_1)
                        ≥ (4 ln(1+√2)/(π^2 √n_1)) v(TB),
which yields an approximation ratio for d = 3. By a recursive procedure, this approxi-
mation algorithm is readily extended to solve (TB) with any fixed degree d.
Theorem 6.2.1 (TB) admits a polynomial-time randomized approximation algorithm
with approximation ratio τ(TB), where
    τ(TB) := (2/π)^{d-1} ln(1+√2) (∏_{k=1}^{d-2} n_k)^{-1/2} = Ω((∏_{k=1}^{d-2} n_k)^{-1/2}).
Proof. The proof is based on mathematical induction on the degree d. For the case
of d = 2, it is exactly the algorithm by Alon and Naor [5]. For general d ≥ 3, let
X = x^1(x^d)^T, and (TB) is then relaxed to
(T̃B) max F(X, x^2, x^3, · · · , x^{d-1})
     s.t. X ∈ B^{n_1n_d},
          x^k ∈ B^{n_k}, k = 2, 3, . . . , d − 1,
where we treat X as an n_1n_d-dimensional vector, and F ∈ R^{n_1n_d×n_2×n_3×···×n_{d-1}} as a
(d − 1)-th order tensor. Observe that (T̃B) is the exact form of (TB) in degree d − 1,
and so by induction we can find X̂ ∈ B^{n_1n_d} and x̂^k ∈ B^{n_k} (k = 2, 3, . . . , d − 1) in
polynomial-time, such that
    F(X̂, x̂^2, x̂^3, . . . , x̂^{d-1}) ≥ (2/π)^{d-2} ln(1+√2) (∏_{k=2}^{d-2} n_k)^{-1/2} v(T̃B)
                                  ≥ (2/π)^{d-2} ln(1+√2) (∏_{k=2}^{d-2} n_k)^{-1/2} v(TB).
Rewrite X̂ as an n_1 × n_d matrix, construct
    X̃ = [ I_{n_1×n_1}     X̂/√n_1
          X̂^T/√n_1      X̂^TX̂/n_1 ] ⪰ 0,
as in DR 6.2.1, and randomly generate
    (ξ; η) ∼ N(0_{n_1+n_d}, X̃).
Let x^1 = sign(ξ) and x^d = sign(η). Noticing that the diagonal components of X̃ are
all ones, by an expectation identity in Goemans and Williamson [40], it follows that
    E[x^1_i x^d_j] = (2/π) arcsin(X̂_{ij}/√n_1) = (2/π) X̂_{ij} arcsin(1/√n_1)  ∀ 1 ≤ i ≤ n_1, 1 ≤ j ≤ n_d,
where the last equality is due to |X̂_{ij}| = 1. Let Q = F(·, x̂^2, x̂^3, · · · , x̂^{d-1}, ·) be a matrix,
and we have
    E[F(x^1, x̂^2, · · · , x^d)] = E[∑_{1≤i≤n_1, 1≤j≤n_d} x^1_i Q_{ij} x^d_j]
                              = ∑_{1≤i≤n_1, 1≤j≤n_d} Q_{ij} E[x^1_i x^d_j]
                              = ∑_{1≤i≤n_1, 1≤j≤n_d} Q_{ij} (2/π) X̂_{ij} arcsin(1/√n_1)
                              = (2/π) arcsin(1/√n_1) ∑_{1≤i≤n_1, 1≤j≤n_d} Q_{ij} X̂_{ij}
                              = (2/π) arcsin(1/√n_1) F(X̂, x̂^2, x̂^3, · · · , x̂^{d-1})      (6.1)
                              ≥ (2/(π√n_1)) (2/π)^{d-2} ln(1+√2) (∏_{k=2}^{d-2} n_k)^{-1/2} v(TB)
                              = (2/π)^{d-1} ln(1+√2) (∏_{k=1}^{d-2} n_k)^{-1/2} v(TB),
where the inequality uses arcsin x ≥ x for x ∈ [0, 1].
Thus x1 and xd can be found by a randomization process, which concludes the induction
step.
To summarize this section, the algorithm for solving the general model (TB) is presented
below. This algorithm is similar to Algorithm 3.2.3, with the major differences lying in the
different decomposition routines and the computability of the case d = 2.
Algorithm 6.2.2
• INPUT: a d-th order tensor F ∈ R^{n_1×n_2×···×n_d} with n_1 ≤ n_2 ≤ · · · ≤ n_d.
1 Rewrite F as a (d − 1)-th order tensor F' ∈ R^{n_2×n_3×···×n_{d-1}×n_dn_1} by combining its
  first and last modes into one, and placing it in the last mode of F', i.e.,
      F_{i_1,i_2,··· ,i_d} = F'_{i_2,i_3,··· ,i_{d-1},(i_1-1)n_d+i_d}  ∀ 1 ≤ i_k ≤ n_k, k = 1, 2, . . . , d.
2 For (TB) with the (d − 1)-th order tensor F': if d − 1 = 2, then apply the SDP
  relaxation and randomization procedure of Alon and Naor [5] to obtain an approx-
  imate solution (x̂^2, x̂^{1,d}); otherwise obtain a solution (x̂^2, x̂^3, · · · , x̂^{d-1}, x̂^{1,d}) by
  recursion.
3 Compute the matrix M' = F(·, x̂^2, x̂^3, · · · , x̂^{d-1}, ·) and rewrite the vector x̂^{1,d} as
  a matrix X̂ ∈ B^{n_1×n_d}.
4 Apply DR 6.2.1, with input (M', X̂) = (M, X) and output (x̂^1, x̂^d) = (x^1, x^2).
• OUTPUT: a feasible solution (x̂^1, x̂^2, · · · , x̂^d).
6.3 Homogeneous Form with Binary Constraints
We now consider the model of maximizing a general homogeneous polynomial function
in binary variables, i.e.,
(HB) max f(x)
     s.t. x ∈ B^n,
where f(x) is a d-th degree homogeneous polynomial with associated super-symmetric
tensor F ∈ R^{n^d}.
When d = 2, an Ω(1/log n)-approximation ratio for (HB) was proposed by Charikar and
Wirth [24] when the matrix F is diagonal-free; when d = 3, an Ω(√(log n / n))-
approximation bound for the optimal value of (HB) was provided by Khot and Naor [63]
when the tensor F is square-free. We remark that the square-free property is a
necessary condition for deriving the approximation ratios. Even in the quadratic and
cubic cases of (HB), there is no polynomial-time approximation algorithm with a
positive approximation ratio unless P = NP (see [4]).
As before, we propose polynomial-time randomized approximation algorithms for
(HB) for any fixed degree d. Like for the model (TS), the key link from the multilinear form
F(x^1, x^2, · · · , x^d) to the homogeneous polynomial f(x) is Lemma 4.2.1. The approx-
imation ratios for (HB) hold under the square-free condition. This is because under
such a condition, the decision variables effectively appear in multilinear form. Hence, one
can replace any point in the hypercube (B̄^n) by one of its vertices (B^n) without de-
creasing its objective value, due to the linearity. Before presenting our main results in
this section, we first study a property of square-free polynomials in binary variables,
which will be used frequently in this chapter and the next chapter (Chapter 7).
Lemma 6.3.1 If the polynomial function p(x) is square-free and z ∈ B̄^n, then x ∈ B^n and
x̄ ∈ B^n can be found in polynomial-time, such that p(x) ≤ p(z) ≤ p(x̄).
Proof. Since p(x) is square-free, by fixing x_2, x_3, · · · , x_n as constants and taking x_1 as
an independent variable, we may write
    p(x) = g_1(x_2, x_3, · · · , x_n) + x_1 g_2(x_2, x_3, · · · , x_n).
Let
    x_1 = −1 if g_2(z_2, z_3, · · · , z_n) ≥ 0, and x_1 = 1 if g_2(z_2, z_3, · · · , z_n) < 0.
Then
    p((x_1, z_2, z_3, · · · , z_n)^T) ≤ p(z).
Repeat the same procedure for z_2, z_3, · · · , z_n, letting them be replaced by binary scalars
x_2, x_3, · · · , x_n, respectively. Then x = (x_1, x_2, · · · , x_n)^T ∈ B^n satisfies p(x) ≤ p(z).
Using a similar procedure, we may find x̄ ∈ B^n with p(x̄) ≥ p(z).
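A minimal sketch of this rounding procedure, for a square-free polynomial given as a black-box callable (since p is linear in each coordinate, checking the two endpoints per coordinate suffices):

```python
import numpy as np

def round_to_vertex(p, z, maximize=True):
    """Lemma 6.3.1 sketch: move z in [-1,1]^n to a vertex of {-1,1}^n without
    decreasing (if maximize) or increasing (otherwise) the square-free p."""
    x = np.asarray(z, dtype=float).copy()
    for i in range(len(x)):
        x[i] = 1.0
        v_plus = p(x)
        x[i] = -1.0
        v_minus = p(x)
        better = v_plus >= v_minus if maximize else v_plus <= v_minus
        x[i] = 1.0 if better else -1.0
    return x
```

Each coordinate is fixed once, so the procedure needs only 2n evaluations of p, matching the polynomial-time claim of the lemma.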
Lemma 6.3.1 actually gives a polynomial-time procedure for finding a point in
B^n to replace a point in B̄^n, without decreasing (or increasing) its function value. Now,
armed with Lemma 6.3.1 and the link Lemma 4.2.1, we present the main results of this
section.
Theorem 6.3.2 If f(x) is square-free and d ≥ 3 is odd, then (HB) admits a polynomial-
time randomized approximation algorithm with approximation ratio τ(HB), where
    τ(HB) := (2/π)^{d-1} ln(1+√2) d! d^{-d} n^{-(d-2)/2} = Ω(n^{-(d-2)/2}).
Proof. Let f(x) = F(x, x, · · · , x) (d copies of x) with F being super-symmetric, and (HB) can be
relaxed to
(H̃B) max F(x^1, x^2, · · · , x^d)
     s.t. x^k ∈ B^n, k = 1, 2, . . . , d.
By Theorem 6.2.1 we are able to find a set of binary vectors (x̂^1, x̂^2, · · · , x̂^d) in
polynomial-time, such that
    F(x̂^1, x̂^2, · · · , x̂^d) ≥ (2/π)^{d-1} ln(1+√2) n^{-(d-2)/2} v(H̃B) ≥ (2/π)^{d-1} ln(1+√2) n^{-(d-2)/2} v(HB).
When d is odd, let ξ_1, ξ_2, · · · , ξ_d be i.i.d. random variables, each taking values 1 and
−1 with equal probability 1/2. Then by Lemma 4.2.1 it follows that
    d! F(x̂^1, x̂^2, · · · , x̂^d) = E[∏_{i=1}^d ξ_i f(∑_{k=1}^d ξ_k x̂^k)] = E[f(∑_{k=1}^d (∏_{i≠k} ξ_i) x̂^k)].
Thus we may find a binary vector β ∈ B^d, such that
    f(∑_{k=1}^d (∏_{i≠k} β_i) x̂^k) ≥ d! F(x̂^1, x̂^2, . . . , x̂^d) ≥ (2/π)^{d-1} ln(1+√2) d! n^{-(d-2)/2} v(HB).
Now we notice that (1/d) ∑_{k=1}^d (∏_{i≠k} β_i) x̂^k ∈ B̄^n, because for all 1 ≤ j ≤ n,
    |(1/d) ∑_{k=1}^d (∏_{i≠k} β_i) x̂^k_j| ≤ (1/d) ∑_{k=1}^d |(∏_{i≠k} β_i) x̂^k_j| = 1.      (6.2)
Since f(x) is square-free, by Lemma 6.3.1 we are able to find x ∈ B^n in polynomial-
time, such that
    f(x) ≥ f((1/d) ∑_{k=1}^d (∏_{i≠k} β_i) x̂^k) = d^{-d} f(∑_{k=1}^d (∏_{i≠k} β_i) x̂^k) ≥ τ(HB) v(HB).
Theorem 6.3.3 If f(x) is square-free and d ≥ 4 is even, then (HB) admits a polynomial-
time randomized approximation algorithm with relative approximation ratio τ(HB).
Proof. Like in the proof of Theorem 6.3.2, by relaxing (HB) to (H̃B), we are able to
find a set of binary vectors (x̂^1, x̂^2, · · · , x̂^d) with
    F(x̂^1, x̂^2, · · · , x̂^d) ≥ (2/π)^{d-1} ln(1+√2) n^{-(d-2)/2} v(H̃B).
Besides, we observe that v(HB) ≤ v(H̃B) and −v̲(HB) ≤ −v̲(H̃B) = v(H̃B). Therefore
    2 v(H̃B) ≥ v(HB) − v̲(HB).
Let ξ_1, ξ_2, · · · , ξ_d be i.i.d. random variables, each taking values 1 and −1 with equal
probability 1/2. Using a similar argument to (6.2), we have (1/d) ∑_{k=1}^d ξ_k x̂^k ∈ B̄^n. Then by
Lemma 6.3.1, there exists x ∈ B^n such that
    f((1/d) ∑_{k=1}^d ξ_k x̂^k) ≥ f(x) ≥ v̲(HB).
Applying Lemma 4.2.1, we have
    (1/2) E[ f((1/d) ∑_{k=1}^d ξ_k x̂^k) − v̲(HB) | ∏_{i=1}^d ξ_i = 1 ]
    ≥ (1/2) E[ f((1/d) ∑_{k=1}^d ξ_k x̂^k) − v̲(HB) | ∏_{i=1}^d ξ_i = 1 ]
      − (1/2) E[ f((1/d) ∑_{k=1}^d ξ_k x̂^k) − v̲(HB) | ∏_{i=1}^d ξ_i = −1 ]
    = E[ ∏_{i=1}^d ξ_i ( f((1/d) ∑_{k=1}^d ξ_k x̂^k) − v̲(HB) ) ]
    = d^{-d} E[ ∏_{i=1}^d ξ_i f(∑_{k=1}^d ξ_k x̂^k) ] − v̲(HB) E[ ∏_{i=1}^d ξ_i ]
    = d^{-d} d! F(x̂^1, x̂^2, . . . , x̂^d) ≥ τ(HB) v(H̃B) ≥ (τ(HB)/2) (v(HB) − v̲(HB)),
where the first inequality holds since f((1/d) ∑_{k=1}^d ξ_k x̂^k) − v̲(HB) ≥ 0.
Thus we may find a binary vector β ∈ B^d with ∏_{i=1}^d β_i = 1, such that
    f((1/d) ∑_{k=1}^d β_k x̂^k) − v̲(HB) ≥ τ(HB) (v(HB) − v̲(HB)).
Noticing that (1/d) ∑_{k=1}^d β_k x̂^k ∈ B̄^n and applying Lemma 6.3.1, by the square-free property
of f(x), we are able to find x̄ ∈ B^n with
    f(x̄) − v̲(HB) ≥ f((1/d) ∑_{k=1}^d β_k x̂^k) − v̲(HB) ≥ τ(HB) (v(HB) − v̲(HB)).
To conclude this section, we summarize below the algorithm for approximately solving
(HB), regardless of whether d is odd or even.
Algorithm 6.3.1
• INPUT: a d-th order super-symmetric square-free tensor F ∈ R^{n^d}.
1 Apply Algorithm 6.2.2 to solve the problem
      max F(x^1, x^2, · · · , x^d)
      s.t. x^k ∈ B^n, k = 1, 2, . . . , d
  approximately, with input F and output (x̂^1, x̂^2, · · · , x̂^d).
2 Compute x̃ = arg max { f((1/d) ∑_{k=1}^d ξ_k x̂^k), ξ ∈ B^d }.
3 Apply the procedure in Lemma 6.3.1, with input x̃ ∈ B̄^n and polynomial function
  f(x), and output x ∈ B^n satisfying f(x) ≥ f(x̃).
• OUTPUT: a feasible solution x ∈ B^n.
6.4 Mixed Form with Binary Constraints
We further move on to consider the mixed-form discrete polynomial optimization
model
(MB) max f(x^1, x^2, · · · , x^s)
     s.t. x^k ∈ B^{n_k}, k = 1, 2, . . . , s,
where the function f is associated with a tensor F ∈ R^{n_1^{d_1}×n_2^{d_2}×···×n_s^{d_s}} with the partial
symmetric property, n_1 ≤ n_2 ≤ · · · ≤ n_s, and d = d_1 + d_2 + · · · + d_s is deemed a fixed
constant. This model is a generalization of (TB) in Section 6.2 and (HB) in Section 6.3,
making the model applicable to a wider range of practical problems.
Here again we focus on polynomial-time approximation algorithms. Similar to the
approach in dealing with (HB), we relax the objective function f(x^1, x^2, · · · , x^s) of
(MB) to a multilinear function, which leads to (TB). After solving (TB) approximately
by Theorem 6.2.1, we are able to adjust the solutions one by one, using Lemma 4.4.3.
The following approximation results are presented, which are comparable to those in
Section 6.3.
Theorem 6.4.1 If f(x1,x2, · · · ,xs) is square-free in each xk (k = 1, 2, . . . , s), d ≥ 3
and one of dk (k = 1, 2, . . . , s) is odd, then (MB) admits a polynomial-time randomized
approximation algorithm with approximation ratio τ(MB), where
    τ(MB) := τ(MS) (2/π)^{d-1} ln(1+√2) ∏_{k=1}^s (d_k!/d_k^{d_k}) = Ω(τ(MS))
           = { (2/π)^{d-1} ln(1+√2) (∏_{k=1}^s d_k!/d_k^{d_k}) ((∏_{k=1}^{s-1} n_k^{d_k})/n_{s-1})^{-1/2},  d_s = 1,
             { (2/π)^{d-1} ln(1+√2) (∏_{k=1}^s d_k!/d_k^{d_k}) ((∏_{k=1}^s n_k^{d_k})/n_s^2)^{-1/2},       d_s ≥ 2.
Proof. Like in the proof of Theorem 6.3.2, by relaxing (MB) to (TB), we are able to
find a set of binary vectors (x̂^1, x̂^2, · · · , x̂^d) with
    F(x̂^1, x̂^2, · · · , x̂^d) ≥ τ(MB) (∏_{k=1}^s d_k^{d_k}/d_k!) v(MB).
Let ξ = (ξ_1, ξ_2, · · · , ξ_d)^T, whose components are i.i.d. random variables, taking values
1 and −1 with equal probability 1/2. Similar to (4.7), we denote
    x^1_ξ = ∑_{k=1}^{d_1} ξ_k x̂^k,  x^2_ξ = ∑_{k=d_1+1}^{d_1+d_2} ξ_k x̂^k,  · · · ,  x^s_ξ = ∑_{k=d_1+d_2+···+d_{s-1}+1}^{d} ξ_k x̂^k.
Without loss of generality, we assume d_1 to be odd. By applying Lemma 4.4.3 we have
    ∏_{k=1}^s d_k! F(x̂^1, x̂^2, · · · , x̂^d) = E[∏_{i=1}^d ξ_i f(x^1_ξ, x^2_ξ, · · · , x^s_ξ)] = E[f(∏_{i=1}^d ξ_i x^1_ξ, x^2_ξ, · · · , x^s_ξ)].
Therefore we are able to find a binary vector β ∈ B^d, such that
    f(∏_{i=1}^d β_i x^1_β/d_1, x^2_β/d_2, · · · , x^s_β/d_s) ≥ ∏_{k=1}^s (d_k! d_k^{-d_k}) F(x̂^1, x̂^2, · · · , x̂^d) ≥ τ(MB) v(MB).
Similar to (6.2), it is not hard to verify that ∏_{i=1}^d β_i x^1_β/d_1 ∈ B̄^{n_1}, and x^k_β/d_k ∈ B̄^{n_k} for
k = 2, 3, . . . , s. By the square-free property of the function f and applying Lemma 6.3.1,
we are able to find a set of binary vectors (x^1, x^2, · · · , x^s) in polynomial-time, such that
    f(x^1, x^2, · · · , x^s) ≥ f(∏_{i=1}^d β_i x^1_β/d_1, x^2_β/d_2, · · · , x^s_β/d_s) ≥ τ(MB) v(MB).
Theorem 6.4.2 If f(x1,x2, · · · ,xs) is square-free in each xk (k = 1, 2, . . . , s), d ≥ 4
and all dk (k = 1, 2, . . . , s) are even, then (MB) admits a polynomial-time randomized
approximation algorithm with relative approximation ratio τ(MB).
Proof. The proof is analogous to that of Theorem 6.3.3. The main differences are:
(i) we use Lemma 4.4.3 instead of invoking Lemma 4.2.1 directly; and (ii) we use
f(x^1_ξ/d_1, x^2_ξ/d_2, · · · , x^s_ξ/d_s) instead of f((1/d) ∑_{k=1}^d ξ_k x̂^k) during the randomization
process.
6.5 Polynomial with Binary Constraints
Finally, we consider the binary integer programming model of optimizing a generic
(inhomogeneous) polynomial function, i.e.,
(PB) max p(x)
     s.t. x ∈ B^n.
Extending the approximation algorithms and the corresponding analysis from homo-
geneous polynomial optimization to general inhomogeneous polynomials is not straight-
forward. Technically it is also a way to get around the square-free property, which is
a requirement for all the homogeneous polynomial optimizations discussed in the previous
sections. The analysis here is similar to that in Chapter 5, dealing directly with ho-
mogenization. An important observation here is that p(x) can always be rewritten as
a square-free polynomial, since we have x_i^2 = 1 for i = 1, 2, . . . , n, which allows us
to reduce the power of x_i to 0 or 1 in each monomial of p(x). We now propose the
following algorithm for approximately solving (PB).
Algorithm 6.5.1
• INPUT: an n-dimensional d-th degree polynomial function p(x).
1 Rewrite p(x) as a square-free polynomial function p_0(x), and then rewrite p_0(x) −
  p_0(0) = F(x̄, x̄, · · · , x̄) (d copies of x̄) when x_h = 1 as in (5.2), with F being an (n + 1)-
  dimensional d-th order super-symmetric tensor.
2 Apply Algorithm 6.2.2 to solve the problem
      max F(x̄^1, x̄^2, · · · , x̄^d)
      s.t. x̄^k ∈ B^{n+1}, k = 1, 2, . . . , d
  approximately, with input F and output (û^1, û^2, · · · , û^d).
3 Compute (z̄^1, z̄^2, · · · , z̄^d) = arg max { F((ξ_1u^1/d; 1), (ξ_2u^2/d; 1), · · · , (ξ_du^d/d; 1)), ξ ∈ B^d },
  where u^k denotes the first n components of û^k.
4 Compute z = arg max { p_0(0); p_0(z(β)/z_h(β)), β ∈ B^d and β_1 = ∏_{k=2}^d β_k = 1 },
  with z̄(β) = β_1(d + 1)z̄^1 + ∑_{k=2}^d β_k z̄^k.
5 Apply the procedure in Lemma 6.3.1, with input z ∈ B̄^n and polynomial function
  p_0(x), and output y ∈ B^n satisfying p_0(y) ≥ p_0(z).
• OUTPUT: a feasible solution y ∈ B^n.
Before presenting the main result and analyzing Algorithm 6.5.1, we first study
another property of square-free polynomials: the overall average of the
function values on the support set B^n is zero. This plays an important role in
analyzing the algorithm for (PB).
Lemma 6.5.1 If the polynomial function p(x) in (PB) : max_{x∈B^n} p(x) is square-free
and has no constant term, then v̲(PB) ≤ 0 ≤ v(PB), and a binary vector x ∈ B^n can
be found in polynomial-time with p(x) ≥ 0.
Proof. Let ξ_1, ξ_2, · · · , ξ_n be i.i.d. random variables, each taking values 1 and −1 with
equal probability 1/2. For any monomial F_{i_1i_2...i_k} x_{i_1} x_{i_2} · · · x_{i_k} of degree k (1 ≤ k ≤ d)
of p(x), by the square-free property, it follows that
    E[F_{i_1i_2...i_k} ξ_{i_1} ξ_{i_2} · · · ξ_{i_k}] = F_{i_1i_2···i_k} E[ξ_{i_1}] E[ξ_{i_2}] · · · E[ξ_{i_k}] = 0.
This implies E[p(ξ)] = 0, and consequently v̲(PB) ≤ 0 ≤ v(PB). By a randomization
process, a binary vector x ∈ B^n can be found in polynomial-time with p(x) ≥ 0.
main result in this section.
Theorem 6.5.2 (PB) admits a polynomial-time randomized approximation algorithm
with relative approximation ratio τ(PB), where
    τ(PB) := [ln(1+√2) / (2(1 + e)π^{d-1})] (d + 1)! d^{-2d} (n + 1)^{-(d-2)/2} = Ω(n^{-(d-2)/2}).
Proof. The main idea of the proof is quite similar to that of Theorem 5.2.2. However,
the discrete nature of the problem as well as the non-convex feasible region requires us
to be more careful in dealing with the specific details. As we are working with a relative
approximation ratio, by Step 1 of Algorithm 6.5.1, we may assume that p(x) is square-
free and has no constant term. Then by homogenization as in (5.2),
    p(x) = F((x; x_h), (x; x_h), · · · , (x; x_h)) = F(x̄, x̄, · · · , x̄) = f(x̄)    (d copies),
where f(x̄) = p(x) if x_h = 1, and f(x̄) is an (n+1)-dimensional homogeneous polynomial
function with associated super-symmetric tensor F ∈ R^{(n+1)^d} whose last component
is 0. (PB) is then equivalent to
    max f(x̄)
    s.t. x̄ = (x; x_h), x ∈ B^n, x_h = 1,
which can be relaxed to an instance of (TB) as follows:
(P̃B) max F(x̄^1, x̄^2, · · · , x̄^d)
     s.t. x̄^k ∈ B^{n+1}, k = 1, 2, . . . , d.
Let (û^1, û^2, · · · , û^d) be the feasible solution for (P̃B) found by Theorem 6.2.1 with
    F(û^1, û^2, · · · , û^d) ≥ (2/π)^{d-1} ln(1+√2) (n + 1)^{-(d-2)/2} v(P̃B)
                          ≥ (2/π)^{d-1} ln(1+√2) (n + 1)^{-(d-2)/2} v(PB).
Denote v̄^k = û^k/d for k = 1, 2, . . . , d, and consequently
    F(v̄^1, v̄^2, · · · , v̄^d) = d^{-d} F(û^1, û^2, · · · , û^d) ≥ (2/π)^{d-1} ln(1+√2) d^{-d} (n + 1)^{-(d-2)/2} v(PB).
Notice that for all 1 ≤ k ≤ d, |v̄^k_h| = |û^k_h/d| = 1/d ≤ 1, and the last component of the tensor
F is 0. By applying Lemma 5.2.4, it follows that
    E[∏_{k=1}^d η_k F((η_1v^1; 1), (η_2v^2; 1), · · · , (η_dv^d; 1))] = F(v̄^1, v̄^2, · · · , v̄^d)
and
    E[F((ξ_1v^1; 1), (ξ_2v^2; 1), · · · , (ξ_dv^d; 1))] = 0,
where v^k denotes the first n components of v̄^k, η_1, η_2, . . . , η_d are independent random
variables, each taking values 1 and −1 with E[η_k] = v̄^k_h for k = 1, 2, . . . , d, and
ξ_1, ξ_2, · · · , ξ_d are i.i.d. random variables, each taking values 1 and −1 with equal
probability 1/2. Combining the two identities, we
have, for any constant c, the following identity:
    F(v̄^1, v̄^2, · · · , v̄^d)
    = ∑_{β∈B^d, ∏_{k=1}^d β_k=-1} (c − Prob{η = β}) F((β_1v^1; 1), (β_2v^2; 1), · · · , (β_dv^d; 1))
      + ∑_{β∈B^d, ∏_{k=1}^d β_k=1} (c + Prob{η = β}) F((β_1v^1; 1), (β_2v^2; 1), · · · , (β_dv^d; 1)).
If we let c = max_{β∈B^d, ∏_{k=1}^d β_k=-1} Prob{η = β}, then the coefficient of each term F in
the above is nonnegative. Therefore, a binary vector β′ ∈ B^d can be found, such that
    F((β′_1v^1; 1), (β′_2v^2; 1), · · · , (β′_dv^d; 1)) ≥ τ_0 F(v̄^1, v̄^2, · · · , v̄^d),
with
    τ_0 = [∑_{β∈B^d, ∏_{k=1}^d β_k=1} (c + Prob{η = β}) + ∑_{β∈B^d, ∏_{k=1}^d β_k=-1} (c − Prob{η = β})]^{-1}
        ≥ (2^d c + 1)^{-1} ≥ (2^d (1/2 + 1/(2d))^d + 1)^{-1} ≥ 1/(1 + e),
where c ≤ (1/2 + 1/(2d))^d is applied, since E[η_k] = v̄^k_h = ±1/d for k = 1, 2, . . . , d.
Denote z̄^k = (z^k; z̄^k_h) = (β′_kv^k; 1) for k = 1, 2, . . . , d, and we have
    F(z̄^1, z̄^2, · · · , z̄^d) ≥ τ_0 F(v̄^1, v̄^2, · · · , v̄^d) ≥ (2/π)^{d-1} [ln(1+√2)/(1 + e)] d^{-d} (n + 1)^{-(d-2)/2} v(PB).
For any β ∈ B^d, denote z̄(β) = β_1(d + 1)z̄^1 + ∑_{k=2}^d β_k z̄^k. By noticing z̄^k_h = 1 and
|z^k_i| = |v^k_i| = |û^k_i|/d = 1/d for all 1 ≤ k ≤ d and 1 ≤ i ≤ n, it follows that
    2 ≤ |z_h(β)| ≤ 2d  and  |z_i(β)| ≤ (d + 1)/d + (d − 1)/d = 2  ∀ 1 ≤ i ≤ n.
Thus z(β)/z_h(β) ∈ B̄^n. By Lemma 6.3.1, there exists x′ ∈ B^n, such that
    v̲(PB) ≤ p(x′) ≤ p(z(β)/z_h(β)) = f(z̄(β)/z_h(β)).
Moreover, we shall argue below that
    β_1 = 1 ⟹ f(z̄(β)) ≥ (2d)^d v̲(PB).      (6.3)
If this were not the case, then by Lemma 6.5.1, f(z̄(β)/(2d)) < v̲(PB) ≤ 0. Notice that
β_1 = 1 implies z_h(β) > 0, and thus we have
    f(z̄(β)/z_h(β)) = (2d/z_h(β))^d f(z̄(β)/(2d)) ≤ f(z̄(β)/(2d)) < v̲(PB),
which is a contradiction.
Suppose ξ = (ξ_1, ξ_2, · · · , ξ_d)^T, whose components are i.i.d. random variables, each
taking values 1 and −1 with equal probability 1/2. Noticing that (6.3) holds and using
the same argument as for (5.12), we get
    (1/2) E[ f(z̄(ξ)) − (2d)^d v̲(PB) | ξ_1 = 1, ∏_{k=2}^d ξ_k = 1 ] ≥ d! F((d + 1)z̄^1, z̄^2, · · · , z̄^d).
Therefore, a binary vector β″ ∈ B^d with β″_1 = ∏_{k=2}^d β″_k = 1 can be found, such that
    f(z̄(β″)) − (2d)^d v̲(PB) ≥ 2d! F((d + 1)z̄^1, z̄^2, · · · , z̄^d)
                             ≥ (2/π)^{d-1} [2 ln(1+√2)/(1 + e)] (d + 1)! d^{-d} (n + 1)^{-(d-2)/2} v(PB).
By Lemma 6.5.1, a binary vector x′ ∈ B^n can be found in polynomial-time with
p(x′) ≥ 0. Moreover, as z(β″)/z_h(β″) ∈ B̄^n, by Lemma 6.3.1, another binary vector
x″ ∈ B^n can be found in polynomial-time with p(x″) ≥ p(z(β″)/z_h(β″)). Below we
shall prove that at least one of x′ and x″ satisfies
    p(x) − v̲(PB) ≥ τ(PB) (v(PB) − v̲(PB)).      (6.4)
Indeed, if −v̲(PB) ≥ τ(PB) (v(PB) − v̲(PB)), then x′ satisfies (6.4) in this case. Other-
wise we shall have −v̲(PB) < τ(PB) (v(PB) − v̲(PB)), and then
    v(PB) > (1 − τ(PB)) (v(PB) − v̲(PB)) ≥ (v(PB) − v̲(PB))/2,
which implies
    f(z̄(β″)/(2d)) − v̲(PB) ≥ (2d)^{-d} (2/π)^{d-1} [2 ln(1+√2)/(1 + e)] (d + 1)! d^{-d} (n + 1)^{-(d-2)/2} v(PB)
                           ≥ τ(PB) (v(PB) − v̲(PB)).
The above inequality also implies that f(z̄(β″)/(2d)) > 0. Recall that β″_1 = 1 implies
z_h(β″) > 0. Therefore,
    p(x″) ≥ p(z(β″)/z_h(β″)) = f(z̄(β″)/z_h(β″)) = (2d/z_h(β″))^d f(z̄(β″)/(2d)) ≥ f(z̄(β″)/(2d)),
which implies x″ satisfies (6.4). Finally, arg max{p(x′), p(x″)} satisfies (6.4) in both
cases.
We remark that (PB) is indeed a very general discrete optimization model. For
example, it can be used to model the following general polynomial optimization problem
in discrete values:
(PD) max p(x)
     s.t. x_i ∈ {a_{i1}, a_{i2}, · · · , a_{im_i}}, i = 1, 2, . . . , n.
To see this, we observe that by adopting the Lagrange interpolation technique and
letting
    x_i = ∑_{j=1}^{m_i} a_{ij} ∏_{1≤k≤m_i, k≠j} (u_i − k)/(j − k)  ∀ 1 ≤ i ≤ n,
the original decision variables can be equivalently transformed so that
    u_i = j ⟹ x_i = a_{ij}  ∀ 1 ≤ i ≤ n, 1 ≤ j ≤ m_i,
where u_i ∈ {1, 2, . . . , m_i}, which can be further represented by ⌈log_2 m_i⌉ independent
binary variables. Combining these two steps of substitution, (PD) is then reformu-
lated as (PB), with the degree of its objective polynomial function no larger than
max_{1≤i≤n} d(m_i − 1), and the dimension of its decision variables being ∑_{i=1}^n ⌈log_2 m_i⌉.
In many real world applications, the data {a_{i1}, a_{i2}, · · · , a_{im_i}} (i = 1, 2, . . . , n) in (PD)
are arithmetic sequences. Then it is much easier to transform (PD) to (PB), without
going through the Lagrange interpolation: the transformation keeps the degree of the objective
polynomial function unchanged, and the dimension of the decision variables is ∑_{i=1}^n ⌈log_2 m_i⌉.
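A minimal sketch of the binary-encoding step, mapping bits in {−1, 1} to an index u_i ∈ {1, ..., m_i} (values above m_i must be handled somehow; clipping is one modeling choice, assumed here for brevity):

```python
import math

def num_bits(m):
    """Binary variables needed to encode u in {1, ..., m}: ceil(log2 m)."""
    return math.ceil(math.log2(m))

def decode(bits, m):
    """Map bits in {-1, 1}^L to an index in {1, ..., m} (clipping the overflow)."""
    u = 1 + sum(((b + 1) // 2) << j for j, b in enumerate(bits))
    return min(u, m)

assert num_bits(5) == 3 and decode([1, -1, 1], 5) == 5   # 1 + (1 + 4) = 6 -> clipped
```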
Finally, we remark that all the approximation algorithms proposed in this chapter
are also applicable to polynomial optimizations over hypercubes (B̄^n), i.e., the
models (TB̄), (HB̄), (MB̄) and (PB̄), which are the respective models (TB),
(HB), (MB) and (PB) with B replaced by B̄. In particular, the square-free conditions are no
longer required for homogeneous form objectives and mixed form objectives. Therefore
Algorithm 6.3.1 and Algorithm 6.5.1 can be made simpler, without going through the
process in Lemma 6.3.1. We now conclude this section, as well as the theoretical part
of this chapter, with the following theorem, stated without proof.
Theorem 6.5.3 The following approximation results hold for polynomial optimizations
over hypercubes:
1. (TB̄) admits a polynomial-time randomized approximation algorithm with approx-
   imation ratio τ(TB);
2. If d ≥ 3 is odd, then (HB̄) admits a polynomial-time randomized approximation
   algorithm with approximation ratio τ(HB); otherwise, d ≥ 4 is even, and (HB̄)
   admits a polynomial-time randomized approximation algorithm with relative ap-
   proximation ratio τ(HB);
3. If one of d_k (k = 1, 2, . . . , s) is odd, then (MB̄) admits a polynomial-time ran-
   domized approximation algorithm with approximation ratio τ(MB); otherwise all
   d_k (k = 1, 2, . . . , s) are even, and (MB̄) admits a polynomial-time randomized
   approximation algorithm with relative approximation ratio τ(MB);
4. (PB̄) admits a polynomial-time randomized approximation algorithm with relative
   approximation ratio τ(PB).
6.6 Applications
The models studied in this chapter have versatile applications. Given the generic
nature of the discrete polynomial optimization models, this point is perhaps self-evident.
However, we believe it is helpful to present a few examples at this point with more
details, to illustrate the potential modeling opportunities with the new optimization
models. We shall present three problems in this section and show that they are readily
formulated by the discrete polynomial optimization models in this chapter.
6.6.1 Cut-Norm of Tensors
The concept of cut-norm is initially defined on a real matrix A = (Aij) ∈ Rn1×n2 ,
denoted by ‖A‖C , the maximum over all I ⊂ 1, 2, . . . , n1 and J ⊂ 1, 2, . . . , n2, of
the quantity |∑
i∈I,j∈J Aij |. This concept plays a major role in the design of efficient
approximation algorithms for dense graph and matrix problems (see e.g., [36, 3]). Alon
and Naor in [5] proposed a polynomial-time randomized approximation algorithm that
approximates the cut-norm with a factor at least 0.56, which is currently the best
available approximation ratio. Since a matrix is a second order tensor, it is natural to
extend the cut-norm to general higher order tensors, e.g., a recent paper by Kannan [62].
Specifically, given a d-th order tensor F = (Fi1i2···id) ∈ Rn1×n2×···×nd , its cut-norm is
defined as
‖F ‖C := maxIk⊂1,2,...,nk, k=1,2,...,d
∣∣∣∣∣∣∑
ik∈Ik, k=1,2,...,d
Fi1i2···id
∣∣∣∣∣∣ .In fact, the cut-norm ‖F ‖C is closely related to ‖F ‖∞7→1, which is exactly in the
form of (TB). By Theorem 6.2.1, there is a polynomial-time randomized approximation
algorithm which computes ‖F‖_{∞→1} within a factor of at least Ω((∏_{k=1}^{d-2} n_k)^{-1/2}), where
we assume n_1 ≤ n_2 ≤ · · · ≤ n_d. The following proposition asserts that the cut-norm of
a general d-th order tensor can also be approximated within a factor of Ω((∏_{k=1}^{d-2} n_k)^{-1/2}).
Proposition 6.6.1 For any d-th order tensor F ∈ R^{n_1×n_2×···×n_d}, ‖F‖_C ≤ ‖F‖_{∞→1} ≤
2^d ‖F‖_C.
Proof. Recall that ‖F‖_{∞→1} = max_{x^k∈B^{n_k}, k=1,2,...,d} F(x^1, x^2, · · · , x^d). For any x^k ∈
B^{n_k} (k = 1, 2, . . . , d), it follows that
    F(x^1, x^2, · · · , x^d) = ∑_{1≤i_k≤n_k, k=1,2,...,d} F_{i_1i_2···i_d} x^1_{i_1} x^2_{i_2} · · · x^d_{i_d}
    = ∑_{β∈B^d} ∑_{i_k∈{j | x^k_j=β_k, 1≤j≤n_k}, k=1,2,...,d} F_{i_1i_2···i_d} x^1_{i_1} x^2_{i_2} · · · x^d_{i_d}
    = ∑_{β∈B^d} ∏_{1≤k≤d} β_k ∑_{i_k∈{j | x^k_j=β_k, 1≤j≤n_k}, k=1,2,...,d} F_{i_1i_2···i_d}
    ≤ ∑_{β∈B^d} |∑_{i_k∈{j | x^k_j=β_k, 1≤j≤n_k}, k=1,2,...,d} F_{i_1i_2···i_d}|
    ≤ ∑_{β∈B^d} ‖F‖_C = 2^d ‖F‖_C,
which implies ‖F‖_{∞→1} ≤ 2^d ‖F‖_C.
Observe that ‖F‖_C = max_{z^k∈{0,1}^{n_k}, k=1,2,...,d} |F(z^1, z^2, · · · , z^d)|. For any z^k ∈
{0, 1}^{n_k} (k = 1, 2, . . . , d), let z^k = (e + x^k)/2, where e is the all-one vector. Clearly
x^k ∈ B^{n_k} for k = 1, 2, . . . , d, and thus
    F(z^1, z^2, · · · , z^d) = F((e + x^1)/2, (e + x^2)/2, · · · , (e + x^d)/2)
    = [F(e, e, · · · , e) + F(x^1, e, · · · , e) + · · · + F(x^1, x^2, · · · , x^d)] / 2^d
    ≤ (1/2^d) · ‖F‖_{∞→1} · 2^d = ‖F‖_{∞→1},
which implies ‖F‖_C ≤ ‖F‖_{∞→1}.
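For small instances, both quantities can be checked against Proposition 6.6.1 by brute force; a sketch for the matrix case (d = 2), with a hypothetical random instance:

```python
import numpy as np
from itertools import chain, combinations, product

def powerset(s):
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def norm_inf_to_1(A):
    n1, n2 = A.shape
    return max(np.array(x) @ A @ np.array(y)
               for x in product((-1, 1), repeat=n1)
               for y in product((-1, 1), repeat=n2))

def cut_norm(A):
    n1, n2 = A.shape
    return max(abs(A[np.ix_(list(I), list(J))].sum())
               for I in powerset(range(n1)) for J in powerset(range(n2)))

A = np.random.default_rng(7).standard_normal((4, 4))
assert cut_norm(A) <= norm_inf_to_1(A) <= 2 ** 2 * cut_norm(A)   # 2^d with d = 2
```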
6.6.2 Maximum Complete Satisfiability
The usual maximum satisfiability problem (see e.g., [38]) is to find the boolean values
of the literals, so as to maximize the total weighted sum of the satisfied clauses. The
key point of the problem is that each clause is in disjunctive form, namely if one of
the literals is assigned the TRUE value, then the clause is satisfied. If the clauses
are instead conjunctive, then this form of satisfiability problem is easy to solve. However,
if not all the clauses can be satisfied, and we alternatively look for an assignment
that maximizes the weighted sum of the satisfied clauses, then the problem is quite
different. To make a distinction from the usual Max-SAT problem, let us call the new
problem maximum complete satisfiability, abbreviated as Max-C-SAT.
It is immediately clear that Max-C-SAT is NP-hard, since we can easily reduce the
max-cut problem to it. The reduction can be done as follows. For each edge (v_i, v_j) we
consider two clauses {x_i, x̄_j} and {x̄_i, x_j}, both having weight w_{ij}. Then the Max-C-
SAT solution leads to a solution for the max-cut problem.
Now consider an instance of the Max-C-SAT problem with m clauses, each clause
containing no more than d literals. Suppose that clause k (1 ≤ k ≤ m) has the form
    {x_{k_1}, x_{k_2}, · · · , x_{k_{s_k}}, x̄_{k′_1}, x̄_{k′_2}, . . . , x̄_{k′_{t_k}}},
where s_k + t_k ≤ d, associated with a weight w_k ≥ 0 for k = 1, 2, . . . , m. Then, the
Max-C-SAT problem can be formulated in the form of (PB) as
    max ∑_{k=1}^m w_k ∏_{j=1}^{s_k} (1 + x_{k_j})/2 · ∏_{i=1}^{t_k} (1 − x_{k′_i})/2
    s.t. x ∈ B^n.
According to Theorem 6.5.2 and the nonnegativity of the objective function, the above
problem admits a polynomial-time randomized approximation algorithm with approx-
imation ratio Ω(n^{-(d-2)/2}), which is independent of the number of clauses m.
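The (PB) objective above is easy to assemble; a minimal sketch with a hypothetical clause list (pos/neg are index lists of plain and negated literals):

```python
def max_c_sat_value(x, clauses):
    """Weighted count of completely satisfied clauses at x in {-1,1}^n.
    Each product term equals 1 iff every literal of the clause is satisfied."""
    total = 0.0
    for w, pos, neg in clauses:
        term = w
        for j in pos:
            term *= (1 + x[j]) / 2
        for i in neg:
            term *= (1 - x[i]) / 2
        total += term
    return total

clauses = [(2.0, [0, 2], [1]), (1.5, [1], [0, 3])]   # hypothetical instance
print(max_c_sat_value([1, -1, 1, -1], clauses))      # first clause satisfied: 2.0
```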
6.6.3 Box-Constrained Diophantine Equation
Solving a system of linear equations where the variables are integers and constrained
to a hypercube is an important problem in discrete optimization and linear algebra.
Examples of applications include the classical Frobenius problem (see e.g., [2, 16]) and
the market split problem [26], as well as engineering applications in integrated circuit
design and video signal processing. For more details, one is referred to Aardal et al. [1].
Essentially, the problem is to find an integer-valued x ∈ Z^n with 0 ≤ x ≤ u, such that
Ax = b. The problem can be formulated by the least squares method as
(DE) max −(Ax − b)^T(Ax − b)
     s.t. x ∈ Z^n, 0 ≤ x ≤ u.
According to the discussion at the end of Section 6.5, the above problem can be re-
formulated in the form of (PB), whose objective function is a quadratic polynomial and whose
number of decision variables is ∑_{i=1}^n ⌈log_2(u_i + 1)⌉. By applying Theorem 6.5.2, (DE)
admits a polynomial-time randomized approximation algorithm with a constant relative
approximation ratio.
Generally speaking, Diophantine equations are polynomial equations. Box-
constrained polynomial equations can also be formulated by the least squares method,
as in (DE). Suppose the highest degree of the polynomial equations is d. Then this
least squares problem can be reformulated in the form of (PB), with the degree of the
objective polynomial being 2d and the number of decision variables being ∑_{i=1}^n ⌈log_2(u_i + 1)⌉.
By applying Theorem 6.5.2, this problem admits a polynomial-time randomized
approximation algorithm with relative approximation ratio Ω((∑_{i=1}^n log u_i)^{-(d-1)}).
6.7 Numerical Experiments
In this section we are going to test the numerical performance of the algorithms pro-
posed in this chapter. Our experiments focus on the model (TB) with d = 4 as a typical
case. Specifically the problem to be tested is
(ETB) max F (x,y, z,w) =∑
1≤i,j,k,`≤n Fijk` xiyjzkw`
s.t. x,y, z,w ∈ Bn.
6.7.1 Randomly Simulated Data
The input data of (ETB) is generated in the same way as that of (ETS), with entries
of F following i.i.d. standard normals. The first relaxation model for Algorithm 6.2.2
to approximately solve (ETB) is
(ETB) max F (X,w) =∑
1≤i,j,k,`≤n Fijk`Xijkw`
s.t. X ∈ Bn×n×n, w ∈ Bn,
which can be solved approximately using SDP relaxation and randomization method
proposed by Alon and Naor [5]. However, the size of the SDP relaxation problem is
6.7 Numerical Experiments 119
(n3 + n) × (n3 + n), which is intractable for current SDP solvers even when n = 8.
Therefore in our testings, we further relax the above problem to
max F (X) =∑
1≤i,j,k,`≤n Fijk`Xijk`
s.t. X ∈ Bn×n×n×n,
whose optimal solution is trivially sign (F ) with optimal value vB := ‖F ‖1. This
optimal solution can be rewritten as an n3 × n matrix, followed by applying DR 6.2.1
to get a feasible solution of (ETB). Then we can apply the recursion procedures of
Algorithm 6.2.2 to get a feasible solution of the original model (ETB), with its objective
value being denoted by v.
According to Theorem 6.2.1, the theoretical worst-case performance ratio of Algorithm
6.2.2 applied to (ETB) is $\Omega(1/n)$. The theoretical ratio for the above method is in
fact $\Omega(1/n^{1.5})$ because of the deeper relaxation, which can be proven by the same
argument as in Theorem 6.2.1. However, this deeper relaxation allows us to skip the SDP
relaxation of (ETB) and makes the method applicable in large dimensions. In general,
the trivial upper bound on v(ETB) generated by this method, $v_B$, may not be good, and
we may seek a tighter one. For this purpose we turn to the model (TS) discussed in
Section 3.2. Noticing that an n-dimensional binary vector has norm $\sqrt{n}$, we may also
relax (ETB) to
\[
\max\; F(x, y, z, w) = \sum_{1 \le i, j, k, \ell \le n} F_{ijk\ell}\, x_i y_j z_k w_\ell
\quad \text{s.t. } \|x\| = \|y\| = \|z\| = \|w\| = \sqrt{n},\; x, y, z, w \in \mathbb{R}^n,
\]
which can be further relaxed to
\[
\max\; F(X, Z) = \sum_{1 \le i, j, k, \ell \le n} F_{ijk\ell}\, X_{ij} Z_{k\ell}
\quad \text{s.t. } \|X\| = \|Z\| = n,\; X, Z \in \mathbb{R}^{n \times n}.
\]
The above problem is a largest singular value problem, whose optimal value can be
computed efficiently. Denote its optimal value by $v_S$, which is taken as another
upper bound on v(ETB).
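Both upper bounds are cheap to compute. A minimal sketch, assuming numpy and our own mode-pairing convention for the unfolding (the (x, y) modes form the rows and the (z, w) modes form the columns):

```python
import numpy as np

n = 13
F = np.random.randn(n, n, n, n)

# Trivial bound: v_B = ||F||_1, attained by X = sign(F) over B^{n x n x n x n}.
v_B = np.abs(F).sum()

# Spectral bound: with ||X|| = ||Z|| = n, the bilinear relaxation is maximized
# at n^2 times the largest singular value of the (n^2 x n^2) unfolding of F.
M = F.reshape(n * n, n * n)
v_S = n * n * np.linalg.svd(M, compute_uv=False)[0]

print(v_B, v_S)   # for n = 13, roughly 22700 and 4300, matching Table 6.1
```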
The numerical results of 10 randomly generated instances for the upper bounds $v_B$
and $v_S$, as well as the objective values v of the approximate solutions generated, are
listed in Table 6.1, which clearly shows that $v_S$ outperforms $v_B$ significantly.
Table 6.1: Numerical upper bounds of v(ETB) for n = 13

Instance   1      2      3      4      5      6      7      8      9      10
v          619    637    603    664    682    572    613    662    591    752
v_B        22742  22588  22775  22711  22827  22905  22593  22966  22789  22678
v_S        4251   4314   4346   4368   4294   4338   4295   4330   4330   4303
Table 6.2: Numerical ratios (average of 10 instances) of (ETB)

n          5      10     20     30     40     50     60     70     80     90
τ (%)      35.42  18.51  9.94   7.06   5.45   4.09   3.93   3.06   2.99   2.58
τ · n      1.77   1.85   1.99   2.12   2.18   2.04   2.36   2.14   2.39   2.32
τ · n^0.9  1.51   1.47   1.47   1.51   1.51   1.38   1.56   1.40   1.54   1.48
τ · n^0.5  0.79   0.59   0.44   0.39   0.35   0.29   0.30   0.26   0.27   0.25
Therefore, in the following general tests, we shall choose $v_S$ as our candidate upper
bound to test the quality of the approximate solution, i.e., $\tau := v / v_S$. The simulation
results are listed in Table 6.2. By observation, the performance ratio is better than
$\Omega(1/n)$ and is quite close to $\Omega(1/n^{0.9})$; it is clearly better than the theoretical ratio
$\Omega(1/n^{1.5})$.
The computational cost of our method is quite low. In fact, for n = 80, we are able
to get a feasible solution within 2 minutes, while computing the upper bound $v_S$ costs
much more time. For n ≥ 95, however, our computer runs out of memory in the
experiments, a problem purely due to the sheer size of the input data.
6.7.2 Data of Low-Rank Tensors
The numerical tests conducted so far are based on data generated from i.i.d. standard
normals. It would be interesting to investigate the practicability of our algorithms
under other data settings. In particular, we shall test some low-rank tensors.
As mentioned in Section 3.4.2 (see also [68]), a fourth order tensor has rank r if it
can be written as a sum of r rank-one tensors, but cannot be written as a sum of r − 1
rank-one tensors. Specifically, the data we generate here is
\[
F := \sum_{i=1}^{r} a^1_i \otimes a^2_i \otimes a^3_i \otimes a^4_i,
\]
where the vectors $a^k_i$ (k = 1, 2, 3, 4, i = 1, 2, …, r) are independent of each other,
with entries following i.i.d. standard normals.
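A minimal sketch of this data generation, assuming numpy:

```python
import numpy as np

def random_rank_r_tensor(n: int, r: int) -> np.ndarray:
    """F = sum_{i=1}^r a^1_i (x) a^2_i (x) a^3_i (x) a^4_i with i.i.d. standard
    normal factor vectors: a fourth order tensor of rank at most r."""
    F = np.zeros((n, n, n, n))
    for _ in range(r):
        a1, a2, a3, a4 = np.random.randn(4, n)        # four factor vectors
        F += np.einsum('i,j,k,l->ijkl', a1, a2, a3, a4)
    return F
```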
Table 6.3: Numerical ratios (average of 10 instances) of (ETB) with low-rank tensors

r (rank)          1     2     3     4     5     6     7     8     9     10    15    20
τ (%) for n = 10  34.6  30.9  28.0  32.4  26.7  25.8  27.0  28.3  27.2  26.5  25.9  26.2
τ (%) for n = 20  14.8  15.0  14.0  15.7  11.3  11.1  11.2  11.8  12.1  11.7  11.2  12.0
τ (%) for n = 30  9.1   7.3   7.5   6.9   7.2   7.2   6.6   6.6   7.2   7.7   6.2   5.7
We again use the method discussed in the previous subsection to approximately
solve the model (ETB), and compare its objective value v with the upper bound $v_S$, i.e.,
$\tau = v / v_S$. The performance ratios for these data settings are shown in Table 6.3 for
n = 10, 20 and 30. By observation, we find that low-rank tensors F improve the
approximation ratios significantly: the lower the rank of the tensor F, the better the
performance ratio.
Chapter 7
Homogeneous Form Optimization with Mixed Constraints
7.1 Introduction
This chapter brings most of the results in previous chapters together to discuss mixed
integer programming problems. The objective functions are all homogeneous polynomial
functions, while the constraints are a combination of the two most widely used ones, the
spherical constraint and the binary constraint. In particular, the models considered
include:
\[
\begin{aligned}
(TBS) \quad \max\;& F(x^1, x^2, \ldots, x^d, y^1, y^2, \ldots, y^{d'}) \\
\text{s.t. } & x^k \in B^{n_k},\; k = 1, 2, \ldots, d, \\
& y^\ell \in S^{m_\ell},\; \ell = 1, 2, \ldots, d';
\end{aligned}
\]
\[
\begin{aligned}
(HBS) \quad \max\;& f(x, y) \\
\text{s.t. } & x \in B^n,\; y \in S^m;
\end{aligned}
\]
\[
\begin{aligned}
(MBS) \quad \max\;& f(x^1, x^2, \ldots, x^s, y^1, y^2, \ldots, y^t) \\
\text{s.t. } & x^k \in B^{n_k},\; k = 1, 2, \ldots, s, \\
& y^\ell \in S^{m_\ell},\; \ell = 1, 2, \ldots, t.
\end{aligned}
\]
The model (MBS) is a generalization of the models (TBS) and (HBS). In fact, it can
also be taken as a generalization of most of the homogeneous polynomial optimization
models discussed in previous chapters, namely (TS) of Chapter 3, (HS) and (MS) of
Chapter 4, and (TB), (HB) and (MB) of Chapter 6 as well.
These mixed models have versatile applications, e.g., the matrix combinatorial
problem and the vector-valued max-cut problem, whose details will be discussed in
Section 7.5. Essentially, in many discrete optimization problems, if the objective to be
optimized is extended from a scalar to a vector or a matrix, then we may turn to
optimizing the Euclidean norm of the vector, or the spectral norm of the matrix, which
leads to the mixed integer programming models proposed above.
All these models are NP-hard in general, even in the simplest case of one spherical
constraint and one binary constraint, i.e., the model (TBS) with d = d' = 1. As we will
see later, this case is actually equivalent to the maximization of a positive semidefinite form
in binary variables, which includes max-cut as a subproblem and is thus NP-hard. In
fact, this simplest form of (TBS) serves as a basis for all these mixed integer programming
models. By using this basis and mathematical induction, we are able to derive
polynomial-time randomized approximation algorithms with worst-case performance
ratios for (TBS) with any fixed degree. The techniques are similar to those of Chapter 3,
and two types of decomposition routines are called, one for decomposing the
spherical constraints, and one for decomposing the binary constraints. Moreover,
in order to extend the results from (TBS) to (HBS) and (MBS), the multilinear tensor
form relaxation method is again applied. Armed with the link lemmas (Lemma 4.2.1
and Lemma 4.4.3), we are able to derive approximation algorithms under some mild
square-free conditions.
This chapter is organized as follows. We shall discuss the models (TBS), (HBS) and
(MBS) in Sections 7.2, 7.3 and 7.4, respectively, and propose polynomial-time randomized
approximation algorithms with provable approximation ratios or relative approximation
ratios for the respective models. In Section 7.5, we shall discuss a few specific
problems where these mixed models can be directly applied. For ease of reading,
in this chapter we shall exclusively use the vector x (∈ $B^n$) to denote discrete variables,
and the vector y (∈ $S^m$) to denote continuous variables. Throughout our discussion, we
shall fix the degree of the objective polynomial function in these mixed models, d + d',
to be a constant.
7.2 Multilinear Form with Binary and Spherical Constraints
Our first mixed model is to maximize a multilinear function, with some variables being
binary and the others on unit spheres, namely,
\[
\begin{aligned}
(TBS) \quad \max\;& F(x^1, x^2, \ldots, x^d, y^1, y^2, \ldots, y^{d'}) \\
\text{s.t. } & x^k \in B^{n_k},\; k = 1, 2, \ldots, d, \\
& y^\ell \in S^{m_\ell},\; \ell = 1, 2, \ldots, d',
\end{aligned}
\]
where $n_1 \le n_2 \le \cdots \le n_d$ and $m_1 \le m_2 \le \cdots \le m_{d'}$. This model is a generalization of
(TS) in Section 3.2 and (TB) in Section 6.2.
The simplest case of (TBS), d = d' = 1, is worth mentioning, as it plays an essential role
in the whole chapter. Based on this case, we shall derive polynomial-time approximation
algorithms with worst-case performance ratios for (TBS) with any fixed degree d + d'.
Proposition 7.2.1 If d = d' = 1, then (TBS) is NP-hard, and admits a polynomial-time
randomized approximation algorithm with approximation ratio $\sqrt{2/\pi}$.
Proof. When d = d' = 1, (TBS) can be written as
\[
(TBS) \quad \max\; x^{\mathrm{T}} F y \quad \text{s.t. } x \in B^{n_1},\; y \in S^{m_1}.
\]
For any fixed x in (TBS), the corresponding optimal y must be $F^{\mathrm{T}}x / \|F^{\mathrm{T}}x\|$ due to
the Cauchy-Schwarz inequality, and accordingly,
\[
x^{\mathrm{T}} F y = \frac{x^{\mathrm{T}} F F^{\mathrm{T}} x}{\|F^{\mathrm{T}}x\|} = \|F^{\mathrm{T}}x\| = \sqrt{x^{\mathrm{T}} F F^{\mathrm{T}} x}.
\]
Thus (TBS) is equivalent to
\[
\max\; x^{\mathrm{T}} F F^{\mathrm{T}} x \quad \text{s.t. } x \in B^{n_1}.
\]
Noticing that the matrix $F F^{\mathrm{T}}$ is positive semidefinite, the above problem includes the
max-cut problem (see e.g., [40]) as a subclass; therefore it is NP-hard. Moreover, according
to the result of Nesterov [88], it admits a polynomial-time randomized approximation
algorithm (SDP relaxation and randomization) with approximation ratio $2/\pi$. This
implies that (TBS) admits a polynomial-time randomized approximation algorithm with
approximation ratio $\sqrt{2/\pi}$.
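The procedure in this proof can be sketched as follows, assuming cvxpy with an SDP-capable solver is available; the function name and the number of rounding trials are our own choices.

```python
import cvxpy as cp
import numpy as np

def bilinear_binary_sphere(F: np.ndarray, trials: int = 100):
    """Approximately solve max x^T F y over x in B^n, y in S^m via Nesterov's
    SDP relaxation of max x^T (F F^T) x and random hyperplane rounding."""
    n = F.shape[0]
    A = F @ F.T                                  # positive semidefinite matrix
    X = cp.Variable((n, n), PSD=True)
    cp.Problem(cp.Maximize(cp.trace(A @ X)), [cp.diag(X) == 1]).solve()
    w, U = np.linalg.eigh(X.value)
    V = U * np.sqrt(np.maximum(w, 0))            # V V^T = X
    best_x, best_val = None, -np.inf
    for _ in range(trials):
        x = np.sign(V @ np.random.randn(n))      # x = sign(v) with v ~ N(0, X)
        x[x == 0] = 1.0
        val = x @ A @ x
        if val > best_val:
            best_x, best_val = x, val
    y = F.T @ best_x                             # optimal y for the chosen x
    return best_x, y / np.linalg.norm(y)
```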
Proposition 7.2.1 is the foundation of the basic relaxation used in solving (TBS)
recursively for general degrees d and d'. In proceeding to the higher degree cases, for the
recursion on d, with discrete variables $x^k$ (k = 1, 2, …, d), DR 6.2.1 is applied in
each recursive step; while for the recursion on d' with continuous variables $y^\ell$ (ℓ =
1, 2, …, d'), two decomposition routines in Section 3.2 are readily available, namely
the eigenvalue decomposition approach DR 3.2.2 and the randomized decomposition
approach DR 3.2.1; either one of them serves the purpose here. The main result in
this section is the following:
Theorem 7.2.2 (TBS) admits a polynomial-time randomized approximation algorithm
with approximation ratio τ(TBS), where
\[
\tau(TBS) := \left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} \left(\prod_{k=1}^{d-1} n_k \prod_{\ell=1}^{d'-1} m_\ell\right)^{-\frac{1}{2}}
= \Omega\left(\left(\prod_{k=1}^{d-1} n_k \prod_{\ell=1}^{d'-1} m_\ell\right)^{-\frac{1}{2}}\right).
\]
Proof. The proof is based on mathematical induction on the degree d + d', with
Proposition 7.2.1 serving as the base of the induction when d + d' = 2 (i.e., d = d' = 1).
For general d + d' ≥ 3, if d' ≥ 2, let $Y = y^1 (y^{d'})^{\mathrm{T}}$. Noticing that $\|Y\|^2 =
\|y^1\|^2 \|y^{d'}\|^2 = 1$, similar to the relaxation in the proof of Theorem 3.2.4, (TBS) can be
relaxed to a problem of degree d + d' − 1, i.e.,
\[
\begin{aligned}
\max\;& F(x^1, x^2, \ldots, x^d, Y, y^2, y^3, \ldots, y^{d'-1}) \\
\text{s.t. } & x^k \in B^{n_k},\; k = 1, 2, \ldots, d, \\
& Y \in S^{m_1 m_{d'}},\; y^\ell \in S^{m_\ell},\; \ell = 2, 3, \ldots, d' - 1.
\end{aligned}
\]
By induction, a feasible solution $(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{Y}, \hat{y}^2, \hat{y}^3, \ldots, \hat{y}^{d'-1})$ can be found in
polynomial time, such that
\[
F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{Y}, \hat{y}^2, \hat{y}^3, \ldots, \hat{y}^{d'-1})
\ge \left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} \left(\prod_{k=1}^{d-1} n_k \prod_{\ell=2}^{d'-1} m_\ell\right)^{-\frac{1}{2}} v(TBS).
\]
Let us denote the matrix $Q = F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \cdot, \hat{y}^2, \hat{y}^3, \ldots, \hat{y}^{d'-1}, \cdot) \in \mathbb{R}^{m_1 \times m_{d'}}$. Then
by Proposition 3.2.1 (used in DR 3.2.2), $\max_{y^1 \in S^{m_1},\, y^{d'} \in S^{m_{d'}}} (y^1)^{\mathrm{T}} Q\, y^{d'}$ can be solved
in polynomial time, with its optimal solution $(\hat{y}^1, \hat{y}^{d'})$ satisfying
\[
F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'}) = (\hat{y}^1)^{\mathrm{T}} Q\, \hat{y}^{d'} \ge \|Q\| / \sqrt{m_1}.
\]
By the Cauchy-Schwarz inequality, it follows that
\[
\|Q\| = \max_{Y \in S^{m_1 m_{d'}}} Q \bullet Y \ge Q \bullet \hat{Y}
= F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{Y}, \hat{y}^2, \hat{y}^3, \ldots, \hat{y}^{d'-1}).
\]
Thus we conclude that
\[
F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'})
\ge \|Q\| / \sqrt{m_1}
\ge F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{Y}, \hat{y}^2, \hat{y}^3, \ldots, \hat{y}^{d'-1}) / \sqrt{m_1}
\ge \tau(TBS)\, v(TBS).
\]
For d + d' ≥ 3 and d ≥ 2, let $X = x^1 (x^d)^{\mathrm{T}}$, and (TBS) can be relaxed to the other
case of degree d − 1 + d', i.e.,
\[
\begin{aligned}
\max\;& F(X, x^2, x^3, \ldots, x^{d-1}, y^1, y^2, \ldots, y^{d'}) \\
\text{s.t. } & X \in B^{n_1 n_d},\; x^k \in B^{n_k},\; k = 2, 3, \ldots, d - 1, \\
& y^\ell \in S^{m_\ell},\; \ell = 1, 2, \ldots, d'.
\end{aligned}
\]
By induction, it admits a polynomial-time randomized approximation algorithm with
approximation ratio $\left(\frac{2}{\pi}\right)^{\frac{2d-3}{2}} \left(\prod_{k=2}^{d-1} n_k \prod_{\ell=1}^{d'-1} m_\ell\right)^{-\frac{1}{2}}$. In order to decompose X into
$x^1$ and $x^d$, we conduct the randomization procedure in Step 2 of DR 6.2.1,
which deteriorates the bound by an additional factor of $\frac{2}{\pi \sqrt{n_1}}$ in expectation, as shown
in (6.1). Combining these two factors, we are led to the ratio τ(TBS).
We end this section by summarizing the algorithm for solving (TBS) below.

Algorithm 7.2.1

• INPUT: a (d + d')-th order tensor $F \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d \times m_1 \times m_2 \times \cdots \times m_{d'}}$ with
$n_1 \le n_2 \le \cdots \le n_d$ and $m_1 \le m_2 \le \cdots \le m_{d'}$.

1. Rewrite F as a matrix $M \in \mathbb{R}^{n_1 n_2 \cdots n_d \times m_1 m_2 \cdots m_{d'}}$ by combining its first d modes
into the matrix rows, and its last d' modes into the matrix columns.
2. Apply the procedure in Proposition 7.2.1, with input M and output $x \in B^{n_1 n_2 \cdots n_d}$.
3. Rewrite the vector x as a d-th order tensor $X \in B^{n_1 \times n_2 \times \cdots \times n_d}$ and compute the
d'-th order tensor $F' = F(X, \cdot, \cdot, \ldots, \cdot) \in \mathbb{R}^{m_1 \times m_2 \times \cdots \times m_{d'}}$.
4. Apply Algorithm 3.2.3, with input F' and output $(y^1, y^2, \ldots, y^{d'})$.
5. Compute the d-th order tensor $F'' = F(\cdot, \cdot, \ldots, \cdot, y^1, y^2, \ldots, y^{d'}) \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$.
6. Apply Algorithm 6.2.2, with input F'' and output $(x^1, x^2, \ldots, x^d)$.

• OUTPUT: a feasible solution $(x^1, x^2, \ldots, x^d, y^1, y^2, \ldots, y^{d'})$.
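In tensor terms, the reshaping in Steps 1, 3 and 5 can be sketched as below; solve_bilinear, algorithm_3_2_3 and algorithm_6_2_2 are hypothetical stand-ins for the procedure of Proposition 7.2.1, Algorithm 3.2.3 and Algorithm 6.2.2, passed in as callables so the sketch stays self-contained.

```python
import numpy as np

def algorithm_7_2_1(F, ns, ms, solve_bilinear, algorithm_3_2_3, algorithm_6_2_2):
    """A sketch of the driver: F has shape ns + ms, i.e. the first d modes are
    the binary ones and the last d' modes are the spherical ones."""
    d = len(ns)
    M = F.reshape(int(np.prod(ns)), int(np.prod(ms)))   # step 1: unfold to a matrix
    x = solve_bilinear(M)                               # step 2: x in B^{n_1...n_d}
    X = x.reshape(ns)                                   # step 3: fold back to a tensor
    Fp = np.tensordot(X, F, axes=(list(range(d)), list(range(d))))  # F' = F(X, ., ..., .)
    ys = algorithm_3_2_3(Fp)                            # step 4: spherical solutions
    Fpp = F
    for y in reversed(ys):                              # step 5: contract last d' modes
        Fpp = np.tensordot(Fpp, y, axes=([Fpp.ndim - 1], [0]))
    xs = algorithm_6_2_2(Fpp)                           # step 6: binary solutions
    return xs, ys
```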
7.3 Homogeneous Form with Binary and Spherical Constraints

We further extend the mixed model of the previous section to the homogeneous polynomial
case, namely,
\[
(HBS) \quad \max\; f(x, y) \quad \text{s.t. } x \in B^n,\; y \in S^m,
\]
where $f(x, y) = F(\underbrace{x, x, \ldots, x}_{d}, \underbrace{y, y, \ldots, y}_{d'})$, and $F \in \mathbb{R}^{n^d \times m^{d'}}$ is a (d + d')-th order
tensor with the partial symmetric property. This model is a generalization of the model
(HS) in Section 4.2 and the model (HB) in Section 6.3. We shall derive polynomial-time
approximation algorithms with worst-case performance ratios. The method here is
again multilinear form relaxation to (TBS), which admits a polynomial-time randomized
approximation algorithm by Theorem 7.2.2. Then, by applying Lemma 4.2.1 as a link,
together with the square-free property of the discrete variables x, we are led to the
following results regarding (HBS).
Theorem 7.3.1 If f(x, y) is square-free in x, and either d or d' is odd, then (HBS)
admits a polynomial-time randomized approximation algorithm with approximation
ratio τ(HBS), where
\[
\tau(HBS) := \left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} d!\, d^{-d}\, d'!\, d'^{-d'}\, n^{-\frac{d-1}{2}} m^{-\frac{d'-1}{2}}
= \Omega\left(n^{-\frac{d-1}{2}} m^{-\frac{d'-1}{2}}\right).
\]
Proof. As in the proof of Theorem 6.3.2, by relaxing (HBS) to (TBS), we are able to
find $(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'})$ with $\hat{x}^k \in B^n$ for all 1 ≤ k ≤ d and $\hat{y}^\ell \in S^m$ for all
1 ≤ ℓ ≤ d' in polynomial time, such that
\[
F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'})
\ge (2/\pi)^{\frac{2d-1}{2}}\, n^{-\frac{d-1}{2}} m^{-\frac{d'-1}{2}}\, v(HBS).
\]
Let $\xi_1, \xi_2, \ldots, \xi_d, \eta_1, \eta_2, \ldots, \eta_{d'}$ be i.i.d. random variables, each taking values 1 and −1
with equal probability 1/2. By applying Lemma 4.4.3 (or Lemma 4.2.1 twice), we have
\[
\mathrm{E}\left[\prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j\, f\left(\sum_{k=1}^{d} \xi_k \hat{x}^k, \sum_{\ell=1}^{d'} \eta_\ell \hat{y}^\ell\right)\right]
= d!\, d'!\, F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'}). \tag{7.1}
\]
Thus we are able to find binary vectors $\beta \in B^d$ and $\beta' \in B^{d'}$, such that
\[
\prod_{i=1}^{d} \beta_i \prod_{j=1}^{d'} \beta'_j\, f\left(\sum_{k=1}^{d} \beta_k \hat{x}^k, \sum_{\ell=1}^{d'} \beta'_\ell \hat{y}^\ell\right)
\ge d!\, d'!\, F(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'}).
\]
Denote
\[
(\tilde{x}, \tilde{y}) :=
\begin{cases}
\left(\prod_{i=1}^{d} \beta_i \prod_{j=1}^{d'} \beta'_j \sum_{k=1}^{d} \beta_k \hat{x}^k,\;
\sum_{\ell=1}^{d'} \beta'_\ell \hat{y}^\ell\right) & \text{if } d \text{ is odd}, \\[2ex]
\left(\sum_{k=1}^{d} \beta_k \hat{x}^k,\;
\prod_{i=1}^{d} \beta_i \prod_{j=1}^{d'} \beta'_j \sum_{\ell=1}^{d'} \beta'_\ell \hat{y}^\ell\right) & \text{if } d' \text{ is odd}.
\end{cases}
\]
Noticing $\|\tilde{y}\| \le d'$ and combining the previous two inequalities, it follows that
\[
f\left(\frac{\tilde{x}}{d}, \frac{\tilde{y}}{\|\tilde{y}\|}\right)
\ge d^{-d} d'^{-d'} \prod_{i=1}^{d} \beta_i \prod_{j=1}^{d'} \beta'_j\,
f\left(\sum_{k=1}^{d} \beta_k \hat{x}^k, \sum_{\ell=1}^{d'} \beta'_\ell \hat{y}^\ell\right)
\ge \tau(HBS)\, v(HBS).
\]
Denote $\hat{y} = \tilde{y} / \|\tilde{y}\| \in S^m$. Since $\tilde{x}/d \in \bar{B}^n$ by a similar argument as (6.2), and $f(x, \hat{y})$
is square-free in x, by applying Lemma 6.3.1, $\hat{x} \in B^n$ can be found in polynomial time,
such that
\[
f(\hat{x}, \hat{y}) \ge f(\tilde{x}/d, \hat{y}) \ge \tau(HBS)\, v(HBS).
\]
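The identity (7.1) that drives this proof can be checked numerically by enumerating the sign vectors. A minimal sketch for d = 2, d' = 1, with a randomly generated tensor symmetrized in its x-modes:

```python
import itertools
from math import factorial
import numpy as np

d, dp, n, m = 2, 1, 4, 3
F = np.random.randn(n, n, m)
F = (F + F.transpose(1, 0, 2)) / 2          # partial symmetry in the x-modes

def F_multi(x1, x2, y1):                    # multilinear form F(x^1, x^2, y^1)
    return np.einsum('ijk,i,j,k->', F, x1, x2, y1)

xs = [np.random.randn(n) for _ in range(d)]
ys = [np.random.randn(m) for _ in range(dp)]

lhs = 0.0
for signs in itertools.product([1, -1], repeat=d + dp):
    xi, eta = signs[:d], signs[d:]
    x = sum(s * v for s, v in zip(xi, xs))
    y = sum(s * v for s, v in zip(eta, ys))
    lhs += np.prod(signs) * F_multi(x, x, y)   # f(x, y) = F(x, x, y)
lhs /= 2 ** (d + dp)                           # expectation over i.i.d. signs
rhs = factorial(d) * factorial(dp) * F_multi(xs[0], xs[1], ys[0])
print(np.isclose(lhs, rhs))                    # True
```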
We remark that in Theorem 7.3.1, if d' = 2 and d is odd, then the factor $d'!\, d'^{-d'}$ in
τ(HBS) can be removed by the same argument as in the proof of Theorem 4.4.2 (basically,
the corresponding adjustment is an eigenvalue problem), and this improves the ratio
τ(HBS) to $\left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} d!\, d^{-d}\, n^{-\frac{d-1}{2}} m^{-\frac{1}{2}}$. Now we present the approximation result for the
even degree case.
Theorem 7.3.2 If f(x,y) is square-free in x, and both d and d′ are even, then (HBS)
admits a polynomial-time randomized approximation algorithm with relative approxi-
mation ratio τ(HBS).
Proof. Following the same argument as in the proof of Theorem 7.3.1, we get (7.1),
which implies
\[
\mathrm{E}\left[\prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j\, f\left(\sum_{k=1}^{d} \xi_k \hat{x}^k, \sum_{\ell=1}^{d'} \eta_\ell \hat{y}^\ell\right)\right]
\ge \left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} d!\, d'!\, n^{-\frac{d-1}{2}} m^{-\frac{d'-1}{2}}\, v(HBS).
\]
Denote $x_\xi := \frac{1}{d} \sum_{k=1}^{d} \xi_k \hat{x}^k$ and $y_\eta := \frac{1}{d'} \sum_{\ell=1}^{d'} \eta_\ell \hat{y}^\ell$. Clearly we have
\[
\mathrm{E}\left[\prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j\, f(x_\xi, y_\eta)\right] \ge \tau(HBS)\, v(HBS).
\]
Pick any fixed $\hat{y} \in S^m$ and consider the following problem:
\[
(\widetilde{HBS}) \quad \max\; f(x, \hat{y}) \quad \text{s.t. } x \in B^n.
\]
Since $f(x, \hat{y})$ is square-free in x and has no constant term, by Lemma 6.5.1, a binary
vector $\hat{x} \in B^n$ can be found in polynomial time with
\[
f(\hat{x}, \hat{y}) \ge 0 \ge \underline{v}(\widetilde{HBS}) \ge \underline{v}(HBS),
\]
where $\underline{v}(\cdot)$ denotes the optimal value of the corresponding minimization problem.
Next we shall argue that $f(x_\xi, y_\eta) \ge \underline{v}(HBS)$. If this were not the case, then
$f(x_\xi, y_\eta) < \underline{v}(HBS) \le 0$. Noticing $\|y_\eta\| \le 1$, this leads to
\[
f\left(x_\xi, y_\eta / \|y_\eta\|\right) = \|y_\eta\|^{-d'} f(x_\xi, y_\eta) \le f(x_\xi, y_\eta) < \underline{v}(HBS).
\]
Also noticing $x_\xi \in \bar{B}^n$, by applying Lemma 6.3.1, a binary vector $x \in B^n$ can be found
with
\[
\underline{v}(HBS) \le f\left(x, y_\eta / \|y_\eta\|\right) \le f\left(x_\xi, y_\eta / \|y_\eta\|\right) < \underline{v}(HBS),
\]
resulting in a contradiction.
Since $f(x_\xi, y_\eta) - \underline{v}(HBS) \ge 0$, it follows that
\[
\begin{aligned}
\frac{1}{2}\, \mathrm{E}\left[f(x_\xi, y_\eta) - \underline{v}(HBS) \,\middle|\, \prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j = 1\right]
&\ge \frac{1}{2}\, \mathrm{E}\left[f(x_\xi, y_\eta) - \underline{v}(HBS) \,\middle|\, \prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j = 1\right] \\
&\quad - \frac{1}{2}\, \mathrm{E}\left[f(x_\xi, y_\eta) - \underline{v}(HBS) \,\middle|\, \prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j = -1\right] \\
&= \mathrm{E}\left[\prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j \left(f(x_\xi, y_\eta) - \underline{v}(HBS)\right)\right] \\
&= \mathrm{E}\left[\prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j\, f(x_\xi, y_\eta)\right]
\ge \tau(HBS)\, v(HBS).
\end{aligned}
\]
Thus we are able to find $\beta \in B^d$ and $\beta' \in B^{d'}$ with $\prod_{i=1}^{d} \beta_i \prod_{j=1}^{d'} \beta'_j = 1$, such that
\[
f(x_\beta, y_{\beta'}) - \underline{v}(HBS) \ge 2\, \tau(HBS)\, v(HBS).
\]
Denote $\bar{y} = y_{\beta'} / \|y_{\beta'}\| \in S^m$. Since $x_\beta \in \bar{B}^n$, by Lemma 6.3.1, a binary vector $\bar{x} \in B^n$
can be found in polynomial time with $f(\bar{x}, \bar{y}) \ge f(x_\beta, \bar{y})$.
Below we shall prove that either $(\hat{x}, \hat{y})$ or $(\bar{x}, \bar{y})$ satisfies
\[
f(x, y) - \underline{v}(HBS) \ge \tau(HBS)\left(v(HBS) - \underline{v}(HBS)\right). \tag{7.2}
\]
Indeed, if $-\underline{v}(HBS) \ge \tau(HBS)\left(v(HBS) - \underline{v}(HBS)\right)$, then $(\hat{x}, \hat{y})$ satisfies (7.2) in this
case, since $f(\hat{x}, \hat{y}) \ge 0$. Otherwise, if $-\underline{v}(HBS) < \tau(HBS)\left(v(HBS) - \underline{v}(HBS)\right)$, then
\[
v(HBS) > \left(1 - \tau(HBS)\right)\left(v(HBS) - \underline{v}(HBS)\right) \ge \left(v(HBS) - \underline{v}(HBS)\right)/2,
\]
which implies
\[
f(x_\beta, y_{\beta'}) - \underline{v}(HBS) \ge 2\, \tau(HBS)\, v(HBS) \ge \tau(HBS)\left(v(HBS) - \underline{v}(HBS)\right).
\]
The above inequality also implies that $f(x_\beta, y_{\beta'}) > 0$. Therefore, we have
\[
f(\bar{x}, \bar{y}) \ge f(x_\beta, \bar{y}) = \|y_{\beta'}\|^{-d'} f(x_\beta, y_{\beta'}) \ge f(x_\beta, y_{\beta'}),
\]
which implies that $(\bar{x}, \bar{y})$ satisfies (7.2). Finally, $\arg\max\{f(\hat{x}, \hat{y}), f(\bar{x}, \bar{y})\}$ satisfies (7.2)
in both cases.
7.4 Mixed Form with Binary and Spherical Constraints
The final story of polynomial optimization problems in this thesis brings together a
host of models discussed in previous sections and chapters, as the generalization of a
large family, which includes (TS), (HS), (MS), (TB), (HB), (MB), (TBS) and (HBS)
all as its subclasses. The model is to maximize a mixed form over variables with binary
constraints, mixed with variables with spherical constraints, i.e.,
\[
\begin{aligned}
(MBS) \quad \max\;& f(x^1, x^2, \ldots, x^s, y^1, y^2, \ldots, y^t) \\
\text{s.t. } & x^k \in B^{n_k},\; k = 1, 2, \ldots, s, \\
& y^\ell \in S^{m_\ell},\; \ell = 1, 2, \ldots, t,
\end{aligned}
\]
where the function f is associated with a tensor
$F \in \mathbb{R}^{n_1^{d_1} \times n_2^{d_2} \times \cdots \times n_s^{d_s} \times m_1^{d'_1} \times m_2^{d'_2} \times \cdots \times m_t^{d'_t}}$
with the partial symmetric property, $n_1 \le n_2 \le \cdots \le n_s$ and $m_1 \le m_2 \le \cdots \le m_t$, and
$d = d_1 + d_2 + \cdots + d_s$ and $d' = d'_1 + d'_2 + \cdots + d'_t$ are deemed fixed constants.

We shall derive polynomial-time approximation algorithms for this general model.
By relaxing (MBS) to the multilinear form optimization model (TBS) and solving
it approximately using Theorem 7.2.2, we may further adjust the solution variables one
by one using the link Lemma 4.2.1 or Lemma 4.4.3, leading to the following general
results in two settings.
Theorem 7.4.1 If $f(x^1, x^2, \ldots, x^s, y^1, y^2, \ldots, y^t)$ is square-free in each $x^k$ (k =
1, 2, …, s), and one of $d_k$ (k = 1, 2, …, s) or one of $d'_\ell$ (ℓ = 1, 2, …, t) is odd, then
(MBS) admits a polynomial-time randomized approximation algorithm with approximation
ratio τ(MBS), where
\[
\tau(MBS) := \left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}}
\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}}
\prod_{1 \le \ell \le t,\; d'_\ell \ge 3} \frac{d'_\ell!}{{d'_\ell}^{d'_\ell}}
\left(\frac{\prod_{k=1}^{s} n_k^{d_k} \prod_{\ell=1}^{t} m_\ell^{d'_\ell}}{n_s m_t}\right)^{-\frac{1}{2}}
= \Omega\left(\left(\frac{\prod_{k=1}^{s} n_k^{d_k} \prod_{\ell=1}^{t} m_\ell^{d'_\ell}}{n_s m_t}\right)^{-\frac{1}{2}}\right).
\]
Proof. The proof is analogous to that of Theorem 7.3.1. We first relax (MBS) to (TBS)
and get its approximate solution $(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'})$ using Theorem 7.2.2.
Let $\xi_1, \xi_2, \ldots, \xi_d, \eta_1, \eta_2, \ldots, \eta_{d'}$ be i.i.d. random variables, each taking values 1 and −1
with equal probability 1/2. By applying Lemma 4.4.3, we have
\[
\mathrm{E}\left[\prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j\, f\left(x^1_\xi, x^2_\xi, \ldots, x^s_\xi, y^1_\eta, y^2_\eta, \ldots, y^t_\eta\right)\right]
= \prod_{k=1}^{s} d_k! \prod_{\ell=1}^{t} d'_\ell!\;
F\left(\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^d, \hat{y}^1, \hat{y}^2, \ldots, \hat{y}^{d'}\right), \tag{7.3}
\]
where
\[
x^1_\xi := \sum_{k=1}^{d_1} \xi_k \hat{x}^k,\quad
x^2_\xi := \sum_{k=d_1+1}^{d_1+d_2} \xi_k \hat{x}^k,\quad \ldots,\quad
x^s_\xi := \sum_{k=d_1+d_2+\cdots+d_{s-1}+1}^{d} \xi_k \hat{x}^k, \tag{7.4}
\]
and
\[
y^1_\eta := \sum_{\ell=1}^{d'_1} \eta_\ell \hat{y}^\ell,\quad
y^2_\eta := \sum_{\ell=d'_1+1}^{d'_1+d'_2} \eta_\ell \hat{y}^\ell,\quad \ldots,\quad
y^t_\eta := \sum_{\ell=d'_1+d'_2+\cdots+d'_{t-1}+1}^{d'} \eta_\ell \hat{y}^\ell. \tag{7.5}
\]
In (7.3), as one of $d_k$ (k = 1, 2, …, s) or one of $d'_\ell$ (ℓ = 1, 2, …, t) is odd, we are
able to move $\prod_{i=1}^{d} \xi_i \prod_{j=1}^{d'} \eta_j$ into the coefficient of the corresponding vector ($x^k_\xi$ or $y^\ell_\eta$,
whichever is appropriate) in the function f. The other derivations are essentially the same
as in the proof of Theorem 7.3.1. Besides, we only lose a factor of $d'_\ell! / {d'_\ell}^{d'_\ell}$ in τ(MBS)
when $d'_\ell \ge 3$; this is because when $d'_\ell \le 2$, the corresponding adjustments can be done
without deteriorating the ratio, as in the proof of Theorem 4.4.2.
Theorem 7.4.2 If $f(x^1, x^2, \ldots, x^s, y^1, y^2, \ldots, y^t)$ is square-free in each $x^k$ (k =
1, 2, …, s), and all $d_k$ (k = 1, 2, …, s) and all $d'_\ell$ (ℓ = 1, 2, …, t) are even, then (MBS)
admits a polynomial-time randomized approximation algorithm with relative approximation
ratio τ(MBS), where
\[
\tau(MBS) := \left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}}
\left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}} \prod_{\ell=1}^{t} \frac{d'_\ell!}{{d'_\ell}^{d'_\ell}}\right)
\left(\frac{\prod_{k=1}^{s} n_k^{d_k} \prod_{\ell=1}^{t} m_\ell^{d'_\ell}}{n_s m_t}\right)^{-\frac{1}{2}}
= \Omega\left(\left(\frac{\prod_{k=1}^{s} n_k^{d_k} \prod_{\ell=1}^{t} m_\ell^{d'_\ell}}{n_s m_t}\right)^{-\frac{1}{2}}\right).
\]
Proof. The proof is analogous to that of Theorem 7.3.2. The main differences are: (i)
we use (7.3) instead of (7.1); and (ii) we use
$f\left(\frac{x^1_\xi}{d_1}, \frac{x^2_\xi}{d_2}, \ldots, \frac{x^s_\xi}{d_s}, \frac{y^1_\eta}{d'_1}, \frac{y^2_\eta}{d'_2}, \ldots, \frac{y^t_\eta}{d'_t}\right)$
instead of $f(x_\xi, y_\eta)$, where $\left(x^1_\xi, x^2_\xi, \ldots, x^s_\xi, y^1_\eta, y^2_\eta, \ldots, y^t_\eta\right)$ are defined in (7.4) and (7.5).
7.5 Applications
The generality of the mixed integer polynomial optimization models studied in this
chapter gives rise to some succinct and interesting problems, apart from their versatile
applications. Nevertheless, it should be useful and helpful to present a few examples at
this point in more detail, to illustrate the potential modeling opportunities offered by
the new optimization models. In this section, we shall discuss the matrix combinatorial
problem and an extended version of the max-cut problem, and show that they are
readily formulated as the mixed integer programming models of this chapter.
7.5.1 Matrix Combinatorial Problem
We discuss a succinct and interesting matrix combinatorial problem: given n matrices
$A_i \in \mathbb{R}^{m_1 \times m_2}$ for i = 1, 2, …, n, find a binary combination of them that maximizes
the combined matrix in terms of spectral norm. Specifically, we consider the following
optimization model:
\[
(MCP) \quad \max\; \sigma_{\max}\left(\sum_{i=1}^{n} x_i A_i\right)
\quad \text{s.t. } x_i \in \{1, -1\},\; i = 1, 2, \ldots, n,
\]
where $\sigma_{\max}$ denotes the largest singular value of a matrix. Problem (MCP) is NP-hard,
even in the special case of $m_2 = 1$. In this case, the matrix $A_i$ is replaced by an
$m_1$-dimensional vector $a_i$, and the spectral norm becomes the Euclidean norm
of a vector. The vector version of the combinatorial problem is then
\[
\max\; \left\|\sum_{i=1}^{n} x_i a_i\right\| \quad \text{s.t. } x_i \in \{1, -1\},\; i = 1, 2, \ldots, n.
\]
This is equivalent to the model (TBS) with d = d' = 1, whose NP-hardness is asserted
by Proposition 7.2.1.

Turning back to the general matrix version (MCP), the problem has the equivalent
formulation
\[
\max\; (y^1)^{\mathrm{T}} \left(\sum_{i=1}^{n} x_i A_i\right) y^2
\quad \text{s.t. } x \in B^n,\; y^1 \in S^{m_1},\; y^2 \in S^{m_2},
\]
which is essentially the model (TBS) with d = 1 and d' = 2:
\[
\max\; F(x, y^1, y^2) \quad \text{s.t. } x \in B^n,\; y^1 \in S^{m_1},\; y^2 \in S^{m_2},
\]
where the trilinear function F is associated with the third order tensor $F \in \mathbb{R}^{n \times m_1 \times m_2}$
whose (i, j, k)-th entry is the (j, k)-th entry of the matrix $A_i$. According to Theorem 7.2.2,
the largest matrix (in terms of the spectral norm in the (MCP) formulation) can be
approximated within a factor of $\sqrt{\frac{2}{\pi \min\{m_1, m_2\}}}$.
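A small sketch of this construction, assuming numpy; it also recovers, for a fixed binary combination, the optimal spherical variables as the leading singular vectors:

```python
import numpy as np

n, m1, m2 = 4, 3, 5
A = [np.random.randn(m1, m2) for _ in range(n)]

# Stack the matrices into the third order tensor F with F[i, j, k] = (A_i)_{jk}.
F = np.stack(A, axis=0)

x = np.sign(np.random.randn(n))                 # a binary combination in B^n
combined = np.einsum('ijk,i->jk', F, x)         # sum_i x_i A_i
sigma_max = np.linalg.svd(combined, compute_uv=False)[0]
# For this fixed x, the optimal y^1, y^2 in the (TBS) formulation are the
# leading left and right singular vectors of the combined matrix.
U, s, Vt = np.linalg.svd(combined)
y1, y2 = U[:, 0], Vt[0, :]
```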
If the given n matrices $A_i$ (i = 1, 2, …, n) are symmetric, then the maximization
criterion can be set to the largest eigenvalue instead of the largest singular value, i.e.,
\[
\max\; \lambda_{\max}\left(\sum_{i=1}^{n} x_i A_i\right)
\quad \text{s.t. } x_i \in \{1, -1\},\; i = 1, 2, \ldots, n.
\]
It is also easy to formulate this problem as the model (HBS) with d = 1 and d' = 2:
\[
\max\; F(x, y, y) \quad \text{s.t. } x \in B^n,\; y \in S^m,
\]
whose optimal value can also be approximated within a factor of $\sqrt{\frac{2}{\pi m}}$ by Theorem 7.3.1
and the remark that follows it.
7.5.2 Vector-Valued Maximum Cut
Consider an undirected graph G = (V, E), where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of
vertices and $E \subseteq V \times V$ is the set of edges. On each edge $e \in E$ there is an
associated weight, which in this case is a nonnegative vector, i.e., $w_e \in \mathbb{R}^m$, $w_e \ge 0$
for all $e \in E$. The problem now is to find a cut in such a way that the total sum of
the weights, which is a vector in this case, has maximum norm. More formally, this
problem can be formulated as
\[
\max_{C \text{ is a cut of } G}\; \left\|\sum_{e \in C} w_e\right\|.
\]
Note that the usual max-cut problem is the special case of the above model where each
weight $w_e \ge 0$ is a scalar. Similar to the scalar case (see [40]), we may reformulate the
above problem in binary variables as
\[
\max\; \left\|\sum_{1 \le i, j \le n} x_i x_j w'_{ij}\right\| \quad \text{s.t. } x \in B^n,
\]
where
\[
w'_{ij} =
\begin{cases}
-w_{ij} & i \ne j, \\
-w_{ij} + \sum_{k=1}^{n} w_{ik} & i = j.
\end{cases} \tag{7.6}
\]
Observing the Cauchy-Schwarz inequality, we further formulate the above problem as
\[
\max\; \left(\sum_{1 \le i, j \le n} x_i x_j w'_{ij}\right)^{\mathrm{T}} y = F(x, x, y)
\quad \text{s.t. } x \in B^n,\; y \in S^m.
\]
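A minimal sketch of building the modified weights (7.6) and evaluating the cut-vector norm, assuming the edge weights are stored as a symmetric n × n × m numpy array with zero diagonal:

```python
import numpy as np

def modified_weights(w: np.ndarray) -> np.ndarray:
    """w'_ij = -w_ij for i != j and -w_ii + sum_k w_ik for i = j, as in (7.6)."""
    n = w.shape[0]
    wp = -w.copy()
    for i in range(n):
        wp[i, i] = -w[i, i] + w[i].sum(axis=0)   # row sum over k of w_ik
    return wp

def cut_vector_norm(x: np.ndarray, wp: np.ndarray) -> float:
    """|| sum_{i,j} x_i x_j w'_ij || for a cut encoded by x in {-1, 1}^n."""
    return float(np.linalg.norm(np.einsum('ijm,i,j->m', wp, x, x)))
```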
This is exactly the form of (HBS) with d = 2 and d' = 1. Although the square-free
property in x does not hold for this model (which is a condition of Theorem 7.3.1), one
can still replace any point in the hypercube ($\bar{B}^n$) by one of its vertices (in $B^n$) without
decreasing the objective function value, since the matrix $F(\cdot, \cdot, e_k) = \left((w'_{ij})_k\right)_{n \times n}$ is
diagonally dominant for k = 1, 2, …, m. Therefore, the vector-valued max-cut problem
admits an approximation ratio of $\frac{1}{2}\left(\frac{2}{\pi}\right)^{\frac{3}{2}} n^{-\frac{1}{2}}$ by Theorem 7.3.1.
If the weights on the edges are positive semidefinite matrices (i.e., $W_{ij} \in \mathbb{R}^{m \times m}$,
$W_{ij} \succeq 0$ for all $(i, j) \in E$), then the matrix-valued max-cut problem can also be
formulated as
\[
\max\; \lambda_{\max}\left(\sum_{1 \le i, j \le n} x_i x_j W'_{ij}\right) \quad \text{s.t. } x \in B^n,
\]
where $W'_{ij}$ is defined similarly to (7.6); or equivalently,
\[
\max\; y^{\mathrm{T}} \left(\sum_{1 \le i, j \le n} x_i x_j W'_{ij}\right) y
\quad \text{s.t. } x \in B^n,\; y \in S^m,
\]
which is the model (HBS) with d = d' = 2. Similar to the vector-valued case, by the
diagonal dominance property and Theorem 7.3.2, the above problem admits an approximation
ratio of $\frac{1}{4}\left(\frac{2}{\pi}\right)^{\frac{3}{2}} (mn)^{-\frac{1}{2}}$. Notice that Theorem 7.3.2 only asserts a relative
approximation ratio; however, for this problem the optimal value of its minimization
counterpart is obviously nonnegative, and thus a relative approximation ratio implies a
usual approximation ratio.
Chapter 8
Conclusion and Recent Developments
This thesis discusses various subclasses of polynomial optimization problems, with a
focus on deriving polynomial-time approximation algorithms with worst-case performance
guarantees. These subclasses cover many frequently encountered constraints
in the literature, such as the Euclidean spherical constraints, the Euclidean ball
constraints, the ellipsoidal constraints, the binary constraints, and mixtures of them. The
objective functions range from multilinear tensor functions and homogeneous polynomials
to general inhomogeneous polynomials. Multilinear tensor function optimization plays
the key role in these algorithms, whose ideas are based on lower order multilinear form
relaxations and decomposition routines. Connections between multilinear functions,
homogeneous polynomials, and inhomogeneous polynomials are established so as to
preserve the approximation ratios. All the approximation results are listed in Table 8.1. The
applications of these polynomial optimization models are discussed, which opens the
door to many potential modeling opportunities. Reports on numerical tests show
that the proposed algorithms are actually very effective and typically produce
high quality solutions; the worst-case performance analysis offers a theoretical 'safety
net', which is usually far from the typical performance. Table 8.1 summarizes the whole
structure of the thesis and the approximation ratios.
Table 8.1: Thesis organization and theoretical approximation ratios

Section  Model  Theorem(s)    Approximation performance ratio
3.2      (TS)   3.2.4         $\left(\prod_{k=1}^{d-2} n_k\right)^{-1/2}$
3.3      (TQ)   3.3.4         $\left(\prod_{k=1}^{d-2} n_k\right)^{-1/2} \Omega\left(\log^{-(d-1)} \max_{1 \le k \le d} m_k\right)$
4.2      (HS)   4.2.2, 4.2.4  $d!\, d^{-d}\, n^{-\frac{d-2}{2}}$
4.3      (HQ)   4.3.1, 4.3.2  $d!\, d^{-d}\, n^{-\frac{d-2}{2}}\, \Omega\left(\log^{-(d-1)} m\right)$
4.4.1    (MS)   4.4.2, 4.4.4  $\left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}}\right)\left(\frac{\prod_{k=1}^{s-1} n_k^{d_k}}{n_{s-1}}\right)^{-1/2}$;  $\left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}}\right)\left(\frac{\prod_{k=1}^{s} n_k^{d_k}}{n_s^{2}}\right)^{-1/2}$
4.4.2    (MQ)   4.4.5, 4.4.6  $\left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}}\right)\left(\frac{\prod_{k=1}^{s-1} n_k^{d_k}}{n_{s-1}}\right)^{-1/2} \Omega\left(\log^{-(d-1)} \max_{1 \le k \le s} m_k\right)$;  $\left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}}\right)\left(\frac{\prod_{k=1}^{s} n_k^{d_k}}{n_s^{2}}\right)^{-1/2} \Omega\left(\log^{-(d-1)} \max_{1 \le k \le s} m_k\right)$
5.2      (PS)   5.2.2         $2^{-\frac{5d}{2}} (d+1)!\, d^{-2d} (n+1)^{-\frac{d-2}{2}}$
5.3      (PQ)   5.3.1         $2^{-\frac{5d}{2}} (d+1)!\, d^{-2d} (n+1)^{-\frac{d-2}{2}}\, \Omega\left(\log^{-(d-1)} m\right)$
5.4      (PG)   5.4.2, 5.4.3  $2^{-2d} (d+1)!\, d^{-2d} (n+1)^{-\frac{d-2}{2}} (t^2+1)^{-\frac{d}{2}}$
6.2      (TB)   6.2.1         $\left(\frac{2}{\pi}\right)^{d-1} \ln\left(1+\sqrt{2}\right) \left(\prod_{k=1}^{d-2} n_k\right)^{-1/2}$
6.3      (HB)   6.3.2, 6.3.3  $\left(\frac{2}{\pi}\right)^{d-1} \ln\left(1+\sqrt{2}\right) d!\, d^{-d}\, n^{-\frac{d-2}{2}}$
6.4      (MB)   6.4.1, 6.4.2  $\left(\frac{2}{\pi}\right)^{d-1} \ln\left(1+\sqrt{2}\right) \left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}}\right)\left(\frac{\prod_{k=1}^{s-1} n_k^{d_k}}{n_{s-1}}\right)^{-1/2}$;  $\left(\frac{2}{\pi}\right)^{d-1} \ln\left(1+\sqrt{2}\right) \left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}}\right)\left(\frac{\prod_{k=1}^{s} n_k^{d_k}}{n_s^{2}}\right)^{-1/2}$
6.5      (PB)   6.5.2         $\frac{\ln\left(1+\sqrt{2}\right)}{2(1+e)\pi^{d-1}} (d+1)!\, d^{-2d} (n+1)^{-\frac{d-2}{2}}$
7.2      (TBS)  7.2.2         $\left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} \left(\prod_{k=1}^{d-1} n_k \prod_{\ell=1}^{d'-1} m_\ell\right)^{-1/2}$
7.3      (HBS)  7.3.1, 7.3.2  $\left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} d!\, d^{-d}\, d'!\, d'^{-d'}\, n^{-\frac{d-1}{2}} m^{-\frac{d'-1}{2}}$
7.4      (MBS)  7.4.1, 7.4.2  $\left(\frac{2}{\pi}\right)^{\frac{2d-1}{2}} \left(\prod_{k=1}^{s} \frac{d_k!}{d_k^{d_k}} \prod_{\ell=1}^{t} \frac{d'_\ell!}{{d'_\ell}^{d'_\ell}}\right)\left(\frac{\prod_{k=1}^{s} n_k^{d_k} \prod_{\ell=1}^{t} m_\ell^{d'_\ell}}{n_s m_t}\right)^{-1/2}$
Most of the results presented in this thesis have been documented and submitted for
publication in the research papers [47, 48, 49], which are all joint works with He and
Zhang. Chapter 3 and Chapter 4 are mainly based on [47], Chapter 5 is mainly based
on [48], and Chapter 6 and Chapter 7 are mainly based on [49]. The results not only
enhance approximation algorithms for high degree polynomial optimization, but
also open up a wide range of new research topics for modeling and novel solution
methods. This research has attracted some follow-up studies on the topic. For
instance, So [108] improved the approximation ratios of the models (TS) and (HS) to
$\Omega\left(\prod_{k=1}^{d-2} \sqrt{\frac{\log n_k}{n_k}}\right)$ and $\Omega\left(\left(\frac{\log n}{n}\right)^{\frac{d-2}{2}}\right)$, respectively. Very recently, He et al. [46]
proposed some fairly simple randomized approaches, which improved the approximation
ratios of homogeneous polynomial optimization with spherical constraints and/or
binary constraints; the orders of these ratios are comparable to those in [108]. Apart
from the improvements of the approximation ratios, Chen et al. [25] established a
tightness result for the multilinear form relaxation of the model (HS), and also derived
some local improvement algorithms for solving general polynomial optimization
problems, especially the model (PQ). Meanwhile, many other research topics are
currently under investigation, including the extension of polynomial optimization to
complex variables; the minimization counterparts of the models discussed in this thesis;
inapproximability results for these models; and, of course, issues arising from practical
applications of these models.
Bibliography
[1] K. Aardal, C. A. J. Hurkens, and A. K. Lenstra, Solving a System of Linear Diophantine Equations with Lower and Upper Bounds on the Variables, Mathematics of Operations Research, 25, 427–442, 2000.
[2] J. L. R. Alfonsín, The Diophantine Frobenius Problem, Oxford University Press, Oxford, UK, 2005.
[3] N. Alon, W. F. de la Vega, R. Kannan, and M. Karpinski, Random Sampling and Approximation of MAX-CSP Problems, Proceedings of the 34th Annual ACM Symposium on Theory of Computing, 232–239, 2002.
[4] N. Alon, K. Makarychev, Y. Makarychev, and A. Naor, Quadratic Forms on Graphs, Inventiones Mathematicae, 163, 499–522, 2006.
[5] N. Alon and A. Naor, Approximating the Cut-Norm via Grothendieck's Inequality, SIAM Journal on Computing, 35, 787–803, 2006.
[6] N. Ansari and E. Hou, Computational Intelligence for Optimization, Kluwer Academic Publishers, Norwell, MA, 1997.
[7] S. Arora, E. Berger, E. Hazan, G. Kindler, and M. Safra, On Non-Approximability for Quadratic Programs, Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, 206–215, 2005.
[8] E. Artin, Über die Zerlegung Definiter Funktionen in Quadrate, Abhandlungen aus dem Mathematischen Seminar der Universität Hamburg, 5, 100–115, 1927.
[9] A. Atamtürk, G. L. Nemhauser, and M. W. P. Savelsbergh, Conflict Graphs in Solving Integer Programming Problems, European Journal of Operational Research, 121, 40–55, 2000.
[10] G. M. de Athayde and R. G. Flores, Jr., Incorporating Skewness and Kurtosis in Portfolio Optimization: A Multidimensional Efficient Set, in S. Satchell and A. Scowcroft (Eds.), Advances in Portfolio Construction and Implementation, Butterworth-Heinemann, Oxford, UK, Chapter 10, 243–257, 2003.
[11] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and M. Protasi, Complexity and Approximation: Combinatorial Optimization Problems and their Approximability Properties, Springer-Verlag, Berlin, Germany, 1999.
[12] M. L. Balinski, On a Selection Problem, Management Science, 17, 230–231, 1970.
[13] I. Bárány and Z. Füredi, Computing the Volume Is Difficult, Discrete & Computational Geometry, 2, 319–326, 1987.
[14] A. Barmpoutis, B. Jian, B. C. Vemuri, and T. M. Shepherd, Symmetric Positive 4th Order Tensors & Their Estimation from Diffusion Weighted MRI, Proceedings of the 20th International Conference on Information Processing in Medical Imaging, 308–319, 2007.
[15] A. Barvinok, Integration and Optimization of Multivariate Polynomials by Restriction onto a Random Subspace, Foundations of Computational Mathematics, 7, 229–244, 2006.
[16] D. Beihoffer, J. Hendry, A. Nijenhuis, and S. Wagon, Faster Algorithms for Frobenius Numbers, The Electronic Journal of Combinatorics, 12, R27, 2005.
[17] S. J. Benson and Y. Ye, Algorithm 875: DSDP5—Software for Semidefinite Programming, ACM Transactions on Mathematical Software, 34, Article 16, 2008.
[18] B. Bernhardsson and J. Peetre, Singular Values of Trilinear Forms, Experimental Mathematics, 10, 509–517, 2001.
[19] B. Borchers, CSDP, A C Library for Semidefinite Programming, Optimization Methods and Software, 11, 613–623, 1999.
[20] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004.
[21] J. Bruck and M. Blaum, Neural Networks, Error-Correcting Codes, and Polynomials over the Binary n-Cube, IEEE Transactions on Information Theory, 35, 976–987, 1989.
[22] P. H. Calamai and J. J. Moré, Projected Gradient Methods for Linearly Constrained Problems, Mathematical Programming, 39, 93–116, 1987.
[23] J. D. Carroll and J.-J. Chang, Analysis of Individual Differences in Multidimensional Scaling via an N-Way Generalization of "Eckart-Young" Decomposition, Psychometrika, 35, 283–319, 1970.
[24] M. Charikar and A. Wirth, Maximizing Quadratic Programs: Extending Grothendieck's Inequality, Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, 54–60, 2004.
[25] B. Chen, S. He, Z. Li, and S. Zhang, Maximum Block Improvement and Polynomial Optimization, Technical Report, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, 2011.
[26] G. Cornuéjols and M. Dawande, A Class of Hard Small 0-1 Programs, INFORMS Journal on Computing, 11, 205–210, 1999.
[27] G. Dahl, J. M. Leinaas, J. Myrheim, and E. Ovrum, A Tensor Product Matrix Approximation Problem in Quantum Physics, Linear Algebra and Its Applications, 420, 711–725, 2007.
[28] L. De Lathauwer, B. De Moor, and J. Vandewalle, A Multilinear Singular Value Decomposition, SIAM Journal on Matrix Analysis and Applications, 21, 1253–1278, 2000.
[29] L. De Lathauwer, B. De Moor, and J. Vandewalle, On the Best Rank-1 and Rank-(R1, R2, …, RN) Approximation of Higher-Order Tensors, SIAM Journal on Matrix Analysis and Applications, 21, 1324–1342, 2000.
[30] C. N. Delzell, A Continuous, Constructive Solution to Hilbert's 17th Problem, Inventiones Mathematicae, 76, 365–384, 1984.
[31] M. Dyer, A. M. Frieze, and R. Kannan, A Random Polynomial-Time Algorithm for Approximating the Volume of Convex Bodies, Journal of the ACM, 38, 1–17, 1991.
[32] U. Feige, Relations between Average Case Complexity and Approximation Complexity, Proceedings of the 34th Annual ACM Symposium on Theory of Computing, 534–543, 2002.
[33] U. Feige, J. H. Kim, and E. Ofek, Witnesses for Non-Satisfiability of Dense Random 3CNF Formulas, Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, 497–508, 2006.
[34] U. Feige and E. Ofek, Easily Refutable Subformulas of Large Random 3CNF Formulas, Theory of Computing, 3, 25–43, 2007.
[35] J. Friedman, A. Goerdt, and M. Krivelevich, Recognizing More Unsatisfiable Random k-SAT Instances Efficiently, SIAM Journal on Computing, 35, 408–430, 2005.
[36] A. M. Frieze and R. Kannan, Quick Approximation to Matrices and Applications, Combinatorica, 19, 175–200, 1999.
[37] K. Fujisawa, M. Kojima, K. Nakata, and M. Yamashita, SDPA (SemiDefinite Programming Algorithm) User's Manual—Version 6.2.0, Research Report B-308, Department of Mathematical and Computing Sciences, Tokyo Institute of Technology, Tokyo, Japan, 1995.
[38] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Company, New York, NY, 1979.
[39] A. Ghosh, E. Tsigaridas, M. Descoteaux, P. Comon, B. Mourrain, and R. Deriche, A Polynomial Based Approach to Extract the Maxima of an Antipodally Symmetric Spherical Function and Its Application to Extract Fiber Directions from the Orientation Distribution Function in Diffusion MRI, Proceedings of the 11th International Conference on Medical Image Computing and Computer Assisted Intervention, 237–248, 2008.
[40] M. X. Goemans and D. P. Williamson, Improved Approximation Algorithms for Maximum Cut and Satisfiability Problems Using Semidefinite Programming, Journal of the ACM, 42, 1115–1145, 1995.
[41] M. Grant and S. Boyd, CVX: Matlab Software for Disciplined Convex Programming, Version 1.2, http://cvxr.com/cvx, 2010.
[42] L. Gurvits, Classical Deterministic Complexity of Edmonds' Problem and Quantum Entanglement, Proceedings of the 35th Annual ACM Symposium on Theory of Computing, 10–19, 2003.
[43] P. L. Hammer and S. Rudeanu, Boolean Methods in Operations Research, Springer-Verlag, New York, NY, 1968.
[44] P. Hansen, Methods of Nonlinear 0-1 Programming, Annals of Discrete Mathematics, 5, 53–70, 1979.
[45] R. A. Harshman, Foundations of the PARAFAC Procedure: Models and Conditions for an "Explanatory" Multi-Modal Factor Analysis, UCLA Working Papers in Phonetics, 16, 1–84, 1970.
[46] S. He, B. Jiang, Z. Li, and S. Zhang, Probability Bounds for Polynomial Functions in Random Variables, Technical Report, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, 2011.
[47] S. He, Z. Li, and S. Zhang, Approximation Algorithms for Homogeneous Polynomial Optimization with Quadratic Constraints, Mathematical Programming, Series B, 125, 353–383, 2010.
[48] S. He, Z. Li, and S. Zhang, General Constrained Polynomial Optimization: An Approximation Approach, Technical Report, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, 2010.
[49] S. He, Z. Li, and S. Zhang, Approximation Algorithms for Discrete Polynomial Optimization, Technical Report, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, 2010.
[50] S. He, Z.-Q. Luo, J. Nie, and S. Zhang, Semidefinite Relaxation Bounds for Indefinite Homogeneous Quadratic Optimization, SIAM Journal on Optimization, 19, 503–523, 2008.
[51] C. Helmberg, Semidefinite Programming for Combinatorial Optimization, ZIB-Report 00-34, Konrad-Zuse-Zentrum für Informationstechnik Berlin, Berlin, Germany, 2000.
[52] D. Henrion and J. B. Lasserre, GloptiPoly: Global Optimization over Polynomials with Matlab and SeDuMi, ACM Transactions on Mathematical Software, 29, 165–194, 2003.
[53] D. Henrion, J. B. Lasserre, and J. Löfberg, GloptiPoly 3: Moments, Optimization and Semidefinite Programming, Optimization Methods and Software, 24, 761–779, 2009.
[55] F. L. Hitchcock, The Expression of a Tensor or a Polyadic as a Sum of Products,Journal of Mathematical Physics, 6, 164–189, 1927. 44
[56] F. L. Hitchcock, Multilple Invariants and Generalized Rank of a p-Way Matrixor Tensor, Journal of Mathematical Physics, 6, 39–79, 2007. 44
[57] P. M. J. van den Hof, C. Scherer, and P. S. C. Heuberger, Model-Based Con-trol: Bridging Rigorous Theory and Advanced Technology, 49–68. Springer-Verlag,Berlin, Germany, 2009. 7
[58] J. J. Hopfield and D. W. Tank, “Neural” Computation of Decisions in Optimiza-tion Problem, Biological Cybernetics, 52, 141–152, 1985. 5
[59] Y. Huang and S. Zhang, Approximation Algorithms for Indefinite ComplexQuadratic Maximization Problems, Science China Mathematics, 53, 2697–2708,2010. 98
[60] E. Jondeau and M. Rockinger, Optimal Portfolio Allocation under Higher Mo-ments, European Financial Management, 12, 29–55, 2006. 3, 92
[61] V. Kann, On the Approximability of NP-Complete Optimization Problems, Ph.D.Dissertation, Royal Institute of Technology, Stockholm, Sweden, 1992. 22
[62] R. Kannan, Spectral Methods for Matrices and Tensors, Proceedings of the 42ndAnnual ACM Symposium on Theory of Computing, 1–12, 2010. 115
[63] S. Khot and A. Naor, Linear Equations Modulo 2 and the L1 Diameter of ConvexBodies, Proceedings of the 48th Annual IEEE Symposium on Foundations ofComputer Science, 318–328, 2007. 5, 9, 98, 100, 103
[64] P. M. Kleniati, P. Parpas, and B. Rustem Partitioning Procedure for PolynomialOptimization: Application to Portfolio Decisions with Higher Order Moments,COMISEF Working Papers Series, WPS-023, 2009. 3, 4, 92
[65] E. de Klerk, The Complexity of Optimizing over a Simplex, Hypercube or Sphere:A Short Survey, Central European Journal of Operations Research, 16, 111–125,2008. 6
[66] E. de Klerk, M. Laurent, and P. A. Parrilo, A PTAS for the Minimization ofPolynomials of Fixed Degree over the Simplex, Theoretical Computer Science,261, 210–225, 2006. 6, 9
[67] E. Kofidis and Ph. Regalia, On the Best Rank-1 Approximation of Higher OrderSupersymmetric Tensors, SIAM Journal on Matrix Analysis and Applications,23, 863–884, 2002. 3, 12, 44, 45
[68] T. G. Kolda and B. W. Bader, Tensor Decompositions and Applications, SIAMReview, 51, 455–500, 2009. 2, 3, 18, 44, 79, 120
[69] A. Kroo and J. Szabados, Joackson-Type Theorems in Homogeneous Approxima-tion, Journal of Approximation Theory, 152, 1–19, 2008. 3, 30
[70] J. B. Lasserre, Global Optimization with Polynomials and the Problem of Mo-ments, SIAM Journal on Optimization, 11, 796–817, 2001. 2, 7, 70
[71] J. B. Lasserre, Polynomials Nonnegative on a Grid and Discrete Representations, Transactions of the American Mathematical Society, 354, 631–649, 2002.
[72] M. Laurent, Sums of Squares, Moment Matrices and Optimization over Polynomials, in M. Putinar and S. Sullivant (Eds.), Emerging Applications of Algebraic Geometry, The IMA Volumes in Mathematics and Its Applications, 149, 1–114, 2009.
[73] C. Ling, J. Nie, L. Qi, and Y. Ye, Biquadratic Optimization over Unit Spheres and Semidefinite Programming Relaxations, SIAM Journal on Optimization, 20, 1286–1310, 2009.
[74] C. Ling, X. Zhang, and L. Qi, Semidefinite Relaxation Approximation for Multivariate Bi-Quadratic Optimization with Quadratic Constraints, Numerical Linear Algebra with Applications, DOI: 10.1002/nla.781, 2011.
[75] Z.-Q. Luo, N. D. Sidiropoulos, P. Tseng, and S. Zhang, Approximation Bounds for Quadratic Optimization with Homogeneous Quadratic Constraints, SIAM Journal on Optimization, 18, 1–28, 2007.
[76] Z.-Q. Luo, J. F. Sturm, and S. Zhang, Multivariate Nonnegative Quadratic Mappings, SIAM Journal on Optimization, 14, 1140–1162, 2004.
[77] Z.-Q. Luo and S. Zhang, A Semidefinite Relaxation Scheme for Multivariate Quartic Polynomial Optimization with Quadratic Constraints, SIAM Journal on Optimization, 20, 1716–1736, 2010.
[78] B. B. Mandelbrot and R. L. Hudson, The (Mis)Behavior of Markets: A Fractal View of Risk, Ruin, and Reward, Basic Books, New York, NY, 2004.
[79] B. Maricic, Z.-Q. Luo, and T. N. Davidson, Blind Constant Modulus Equalization via Convex Optimization, IEEE Transactions on Signal Processing, 51, 805–818, 2003.
[80] D. Maringer and P. Parpas, Global Optimization of Higher Order Moments in Portfolio Selection, Journal of Global Optimization, 43, 219–230, 2009.
[81] H. M. Markowitz, Portfolio Selection, Journal of Finance, 7, 79–91, 1952.
[82] C. A. Micchelli and P. Olsen, Penalized Maximum-Likelihood Estimation, the Baum-Welch Algorithm, Diagonal Balancing of Symmetric Matrices and Applications to Training Acoustic Data, Journal of Computational and Applied Mathematics, 119, 301–331, 2000.
[83] G. L. Miller, Riemann's Hypothesis and Tests for Primality, Journal of Computer and System Sciences, 13, 300–317, 1976.
[84] B. Mourrain and J. P. Pavone, Subdivision Methods for Solving Polynomial Equations, Journal of Symbolic Computation, 44, 292–306, 2009.
[85] B. Mourrain and P. Trébuchet, Generalized Normal Forms and Polynomial System Solving, Proceedings of the 2005 International Symposium on Symbolic and Algebraic Computation, 253–260, 2005.
[86] A. Nemirovski, Lectures on Modern Convex Optimization, The H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 2005.
[87] A. Nemirovski, C. Roos, and T. Terlaky, On Maximization of Quadratic Form over Intersection of Ellipsoids with Common Center, Mathematical Programming, Series A, 86, 463–473, 1999.
[88] Yu. Nesterov, Semidefinite Relaxation and Nonconvex Quadratic Optimization, Optimization Methods and Software, 9, 141–160, 1998.
[89] Yu. Nesterov, Squared Functional Systems and Optimization Problems, in H. Frenk, K. Roos, T. Terlaky, and S. Zhang (Eds.), High Performance Optimization, Kluwer Academic Press, Dordrecht, The Netherlands, 405–440, 2000.
[90] Yu. Nesterov, Random Walk in a Simplex and Quadratic Optimization over Convex Polytopes, CORE Discussion Paper 2003/71, Université catholique de Louvain, Louvain-la-Neuve, Belgium, 2003.
[91] Q. Ni, L. Qi, and F. Wang, An Eigenvalue Method for Testing Positive Definiteness of a Multivariate Form, IEEE Transactions on Automatic Control, 53, 1096–1107, 2008.
[92] P. Parpas and B. Rustem, Global Optimization of the Scenario Generation and Portfolio Selection Problems, Proceedings of the International Conference on Computational Science and Its Applications, 908–917, 2006.
[93] P. A. Parrilo, Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization, Ph.D. Dissertation, California Institute of Technology, Pasadena, CA, 2000.
[94] P. A. Parrilo, Semidefinite Programming Relaxations for Semialgebraic Problems, Mathematical Programming, Series B, 96, 293–320, 2003.
[95] L. Peng and M. W. Wong, Compensated Compactness and Paracommutators, Journal of the London Mathematical Society, 62, 505–520, 2000.
[96] A. J. Prakash, C.-H. Chang, and T. E. Pactwa, Selecting a Portfolio with Skewness: Recent Evidence from US, European, and Latin American Equity Markets, Journal of Banking & Finance, 27, 1375–1390, 2003.
[97] M. Purser, Introduction to Error-Correcting Codes, Artech House, Norwood, MA, 1995.
[98] L. Qi, Extrema of a Real Polynomial, Journal of Global Optimization, 30, 405–433, 2004.
[99] L. Qi, Eigenvalues of a Real Supersymmetric Tensor, Journal of Symbolic Computation, 40, 1302–1324, 2005.
[100] L. Qi, Eigenvalues and Invariants of Tensors, Journal of Mathematical Analysis and Applications, 325, 1363–1377, 2007.
[101] L. Qi and K. L. Teo, Multivariate Polynomial Minimization and Its Applications in Signal Processing, Journal of Global Optimization, 26, 419–433, 2003.
[102] L. Qi, Z. Wan, and Y.-F. Yang, Global Minimization of Normal Quadratic Polynomials Based on Global Descent Directions, SIAM Journal on Optimization, 15, 275–302, 2004.
[103] L. Qi, F. Wang, and Y. Wang, Z-eigenvalue Methods for a Global Polynomial Optimization Problem, Mathematical Programming, Series A, 118, 301–316, 2009.
[104] M. O. Rabin, Probabilistic Algorithms, in J. F. Traub (Ed.), Algorithms and Complexity: New Directions and Recent Results, Academic Press, New York, NY, 21–39, 1976.
[105] M. O. Rabin, Probabilistic Algorithm for Testing Primality, Journal of Number Theory, 12, 128–138, 1980.
[106] J. M. W. Rhys, A Selection Problem of Shared Fixed Costs and Network Flows, Management Science, 17, 200–207, 1970.
[107] A. P. Roberts and M. M. Newmann, Polynomial Optimization of Stochastic Feedback Control for Stable Plants, IMA Journal of Mathematical Control & Information, 5, 243–257, 1988.
[108] A. M.-C. So, Deterministic Approximation Algorithms for Sphere Constrained Homogeneous Polynomial Optimization Problems, Mathematical Programming, Series B, DOI: 10.1007/s10107-011-0464-0, 2011.
[109] A. M.-C. So, Y. Ye, and J. Zhang, A Unified Theorem on SDP Rank Reduction, Mathematics of Operations Research, 33, 910–920, 2008.
[110] S. Soare, J. W. Yoon, and O. Cazacu, On the Use of Homogeneous Polynomials to Develop Anisotropic Yield Functions with Applications to Sheet Forming, International Journal of Plasticity, 24, 915–944, 2008.
[111] R. M. Solovay and V. Strassen, A Fast Monte-Carlo Test for Primality, SIAM Journal on Computing, 6, 84–85, 1977.
[112] J. F. Sturm, SeDuMi 1.02, A Matlab Toolbox for Optimization over Symmetric Cones, Optimization Methods and Software, 11 & 12, 625–653, 1999.
[113] J. F. Sturm and S. Zhang, On Cones of Nonnegative Quadratic Functions, Mathematics of Operations Research, 28, 246–267, 2003.
[114] W. Sun and Y.-X. Yuan, Optimization Theory and Methods: Nonlinear Programming, Springer-Verlag, New York, NY, 2006.
[115] K. C. Toh, M. J. Todd, and R. H. Tütüncü, SDPT3—A Matlab Software Package for Semidefinite Programming, Version 1.3, Optimization Methods and Software, 11, 545–581, 1999.
[116] L. Vandenberghe and S. Boyd, Semidefinite Programming, SIAM Review, 38, 49–95, 1996.
[117] P. P. Varjú, Approximation by Homogeneous Polynomials, Constructive Approximation, 26, 317–337, 2007.
[118] Y. Ye, Approximating Quadratic Programming with Bound and Quadratic Constraints, Mathematical Programming, 84, 219–226, 1999.
[119] Y. Ye, Approximating Global Quadratic Optimization with Convex Quadratic Constraints, Journal of Global Optimization, 15, 1–17, 1999.
[120] S. Zhang, Quadratic Maximization and Semidefinite Relaxation, Mathematical Programming, Series A, 87, 453–465, 2000.
[121] S. Zhang and Y. Huang, Complex Quadratic Optimization and Semidefinite Programming, SIAM Journal on Optimization, 16, 871–890, 2006.
[122] T. Zhang and G. H. Golub, Rank-One Approximation to High Order Tensors, SIAM Journal on Matrix Analysis and Applications, 23, 534–550, 2001.
[123] X. Zhang, C. Ling, and L. Qi, Semidefinite Relaxation Bounds for Bi-Quadratic Optimization Problems with Quadratic Constraints, Journal of Global Optimization, 49, 293–311, 2011.