Constrained optimization
A general constrained optimization problem has the form
$$\min_{x} f(x) \quad \text{subject to} \quad g_i(x) \le 0, \; i = 1, \dots, m, \qquad h_j(x) = 0, \; j = 1, \dots, p$$
where $f$ is the objective function, the $g_i$ are the inequality constraints, and the $h_j$ are the equality constraints
The Lagrangian function is given by
$$L(x, \lambda, \nu) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) + \sum_{j=1}^{p} \nu_j h_j(x), \qquad \lambda_i \ge 0$$
Primal and dual optimization problems
Primal: $p^* = \min_x \max_{\lambda \ge 0, \, \nu} L(x, \lambda, \nu)$
Dual: $d^* = \max_{\lambda \ge 0, \, \nu} q(\lambda, \nu)$, where $q(\lambda, \nu) = \min_x L(x, \lambda, \nu)$ is the Lagrangian dual function
Weak duality: $d^* \le p^*$ (always holds)
Strong duality: $d^* = p^*$, which holds for convex problems with affine constraints
Saddle point property
If $(x^*, \lambda^*, \nu^*)$ are primal/dual optimal with zero duality gap, they are a saddle point of $L$, i.e.,
$$L(x^*, \lambda, \nu) \le L(x^*, \lambda^*, \nu^*) \le L(x, \lambda^*, \nu^*)$$
for all $x$, $\lambda \ge 0$, $\nu$
KKT conditions: The bottom line
If a constrained optimization problem is
• differentiable
• convex
then the KKT conditions are necessary and sufficient for primal/dual optimality (with zero duality gap)
In this case, we can use the KKT conditions to find a solution to our optimization problem
i.e., if we find $(x^*, \lambda^*, \nu^*)$ satisfying the conditions, we have found solutions to both the primal and dual problems
The KKT conditions
1. $\nabla_x L(x^*, \lambda^*, \nu^*) = \nabla f(x^*) + \sum_{i} \lambda_i^* \nabla g_i(x^*) + \sum_{j} \nu_j^* \nabla h_j(x^*) = 0$ (stationarity)
2. $g_i(x^*) \le 0$ for all $i$ (primal feasibility)
3. $h_j(x^*) = 0$ for all $j$ (primal feasibility)
4. $\lambda_i^* \ge 0$ for all $i$ (dual feasibility)
5. $\lambda_i^* g_i(x^*) = 0$ for all $i$
(complementary slackness)
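As a small illustration (not part of the original notes), consider minimizing $f(x) = x^2$ subject to the single constraint $g(x) = 1 - x \le 0$. The Lagrangian is $L(x, \lambda) = x^2 + \lambda(1 - x)$, and the KKT conditions read
$$2x^* - \lambda^* = 0, \qquad 1 - x^* \le 0, \qquad \lambda^* \ge 0, \qquad \lambda^*(1 - x^*) = 0$$
If $\lambda^* = 0$, stationarity would force $x^* = 0$, which is infeasible; so $\lambda^* > 0$, complementary slackness gives $x^* = 1$, and stationarity gives $\lambda^* = 2$. Both the primal and dual solutions fall out of the conditions at once.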
Soft-margin classifier
The soft-margin classifier is the solution to
$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, n$$
This optimization problem is differentiable and convex, so
• the KKT conditions are necessary and sufficient conditions for primal/dual optimality (with zero duality gap)
• we can use these conditions to find a relationship between the solutions of the primal and dual problems
• the dual optimization problem will be easy to “kernelize”
Forming the Lagrangian
Begin by converting our problem to the standard form
$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad 1 - \xi_i - y_i(\langle w, x_i \rangle + b) \le 0, \;\; -\xi_i \le 0, \;\; i = 1, \dots, n$$
The Lagrangian function is then given by
$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i + \sum_{i=1}^{n} \alpha_i \bigl(1 - \xi_i - y_i(\langle w, x_i \rangle + b)\bigr) - \sum_{i=1}^{n} \mu_i \xi_i$$
where $\alpha_i \ge 0$ and $\mu_i \ge 0$ are the Lagrange multipliers/dual variables
Soft-margin dual
The Lagrangian dual is thus
$$q(\alpha, \mu) = \min_{w, b, \xi} L(w, b, \xi, \alpha, \mu)$$
and the dual optimization problem is
$$\max_{\alpha \ge 0, \, \mu \ge 0} q(\alpha, \mu)$$
Let's compute a simplified expression for $q(\alpha, \mu)$
How? Using the KKT conditions!
Taking the gradient of $L$ with respect to the primal variables and setting it to zero (KKT condition 1) gives
$$\nabla_w L = w - \sum_{i=1}^{n} \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad \frac{\partial L}{\partial b} = -\sum_{i=1}^{n} \alpha_i y_i = 0, \qquad \frac{\partial L}{\partial \xi_i} = C - \alpha_i - \mu_i = 0$$
Plugging this in, the dual function is thus
$$q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$$
And the dual optimization problem can be written as
$$\max_{\alpha, \mu} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \quad \text{s.t.} \quad \alpha_i \ge 0, \;\; \mu_i \ge 0, \;\; \alpha_i + \mu_i = C, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0$$
Soft-margin dual quadratic program
We can eliminate $\mu$ to obtain
$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0$$
Note: Input patterns are only involved via inner products $\langle x_i, x_j \rangle$
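Since this is a standard quadratic program, any general-purpose QP solver can handle it. The sketch below sets up the dual with cvxpy; the names X (an n-by-d array of input patterns), y (labels in {-1, +1}), and C are assumptions, and the small ridge added to the quadratic term is only there to keep the solver's PSD check happy.

```python
import numpy as np
import cvxpy as cp

def solve_soft_margin_dual(X, y, C):
    """Sketch: solve the soft-margin dual QP for alpha (assumes cvxpy is installed)."""
    n = X.shape[0]
    K = X @ X.T                                   # Gram matrix of inner products <x_i, x_j>
    P = np.outer(y, y) * K + 1e-8 * np.eye(n)     # P_ij = y_i y_j <x_i, x_j> (+ tiny ridge)
    alpha = cp.Variable(n)
    objective = cp.Maximize(cp.sum(alpha) - 0.5 * cp.quad_form(alpha, P))
    constraints = [alpha >= 0, alpha <= C, y @ alpha == 0]
    cp.Problem(objective, constraints).solve()
    return alpha.value
```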
Recovering w and b
Given $\widehat{\alpha}$ (the solution to the soft-margin dual), can we recover the optimal $w$ and $b$?
Yes! Use the KKT conditions
From KKT condition 1, we know that
$$w = \sum_{i=1}^{n} \widehat{\alpha}_i y_i x_i$$
And thus the optimal normal vector is just a linear combination of our input patterns
$b$ is a little less obvious – we'll return to this in a minute
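In code, recovering $w$ from the dual solution is a one-liner; alpha, y, and X are assumed to be the arrays from the dual solver and the training data.

```python
# w = sum_i alpha_i y_i x_i, written as a matrix product (alpha, y, X assumed)
w = (alpha * y) @ X
```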
Support vectors
From KKT condition 5 (complementary slackness) we also have that for all $i$,
$$\widehat{\alpha}_i \bigl(1 - \xi_i - y_i(\langle w, x_i \rangle + b)\bigr) = 0$$
The $x_i$ for which $\widehat{\alpha}_i > 0$ are called support vectors
These are the points on or inside the margin of separation
Useful fact: By the KKT conditions, $y_i(\langle w, x_i \rangle + b) \le 1$ if and only if $x_i$ is a support vector!
Empirical fact
It has been widely demonstrated (empirically) that in typical learning problems, only a small fraction of the training input patterns are support vectors
Thus, support vector machines produce a hyperplane with a sparse representation
This is advantageous for efficient storage and evaluation
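A quick numerical check of this sparsity (alpha is assumed to be the solution returned by the dual solver; the tolerance guards against solver noise):

```python
import numpy as np

# Support vectors are the training points with alpha_i > 0 (up to numerical tolerance)
support = np.where(alpha > 1e-6)[0]
print(f"{len(support)} of {len(alpha)} training points are support vectors")
```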
What about $b$?
Another consequence of the KKT conditions (condition 5) is that for all $i$,
$$\mu_i \xi_i = 0$$
Since $\mu_i = C - \widehat{\alpha}_i$, this implies that if $\widehat{\alpha}_i < C$, then $\xi_i = 0$
Recall that if $\widehat{\alpha}_i > 0$ we also have that $x_i$ is a support vector, and hence $y_i(\langle w, x_i \rangle + b) = 1 - \xi_i$
How can we combine these two facts to determine $b$?
Recovering b
For any $i$ such that $0 < \widehat{\alpha}_i < C$, we have $\xi_i = 0$ and $y_i(\langle w, x_i \rangle + b) = 1$, so
$$b = y_i - \langle w, x_i \rangle$$
In practice, it is common to average over several such $i$ to counter numerical imprecision
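Continuing the earlier sketch (alpha, X, y, C are assumed, and w is the vector computed above):

```python
import numpy as np

# Support vectors with 0 < alpha_i < C sit exactly on the margin, so y_i - <w, x_i> = b for each
tol = 1e-6
on_margin = (alpha > tol) & (alpha < C - tol)
b = np.mean(y[on_margin] - X[on_margin] @ w)
```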
Support vector machines
Given an inner product kernel $k(x, x')$, we can write the SVM classifier as
$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{n} \widehat{\alpha}_i y_i k(x_i, x) + b \right)$$
where $\widehat{\alpha}$ is the solution of
$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j k(x_i, x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0$$
and $b = y_j - \sum_{i=1}^{n} \widehat{\alpha}_i y_i k(x_i, x_j)$ for some $j$ s.t. $0 < \widehat{\alpha}_j < C$
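Put together, evaluating the classifier only requires kernel evaluations against the support vectors. A sketch (the Gaussian kernel and the names X, y, alpha, b, gamma are assumptions):

```python
import numpy as np

def gaussian_kernel(x1, x2, gamma=1.0):
    """An example kernel choice; any valid inner product kernel works here."""
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

def svm_predict(x_new, X, y, alpha, b, kernel=gaussian_kernel):
    """Evaluate sign(sum_i alpha_i y_i k(x_i, x_new) + b), using only the support vectors."""
    score = sum(alpha[i] * y[i] * kernel(X[i], x_new) for i in range(len(y)) if alpha[i] > 1e-6)
    return np.sign(score + b)
```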
Remarks
• The final classifier depends only on the $x_i$ with $\widehat{\alpha}_i > 0$, i.e., the support vectors
• The size (number of variables) of the dual QP is $n$, independent of the kernel $k$, the feature map $\Phi$, or the feature space $\mathcal{H}$ – remarkable, since the dimension of $\mathcal{H}$ can be infinite
• The soft-margin hyperplane was the first machine learning algorithm to be "kernelized", but since then the idea has been applied to many, many other algorithms
– kernel ridge regression
– kernel PCA
– …
Solving the quadratic program
How can we actually compute the solution to
$$\max_{\alpha} \; \mathbf{1}^{T}\alpha - \frac{1}{2} \alpha^{T} Q \alpha \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0$$
where $Q_{ij} = y_i y_j k(x_i, x_j)$?
There are several general approaches to solving quadratic programs, and many can be applied to solve the SVM dual
We will focus on a particular example that is very efficient and capitalizes on some of the unique structure in the SVM dual, called sequential minimal optimization (SMO)
Sequential minimal optimization
SMO is an example of a decomposition algorithm
Initialize: $\alpha = 0$
Repeat until a stopping criterion is satisfied:
(1) Select a pair $(i, j)$
(2) Update $\alpha_i$ and $\alpha_j$ by optimizing the dual QP, holding all other $\alpha_k$ fixed
The reason for decomposing this to a two-variable subproblem is that this subproblem can be solved exactly via a simple analytic update
The update step
Choose $\alpha_i$ and $\alpha_j$ to solve the two-variable restriction of the dual QP,
$$\max_{\alpha_i, \alpha_j} \; \alpha_i + \alpha_j - \frac{1}{2}\bigl( Q_{ii}\alpha_i^2 + 2 Q_{ij}\alpha_i\alpha_j + Q_{jj}\alpha_j^2 \bigr) - c_i \alpha_i - c_j \alpha_j \quad \text{s.t.} \quad 0 \le \alpha_i, \alpha_j \le C, \;\; y_i \alpha_i + y_j \alpha_j = -\sum_{k \ne i, j} y_k \alpha_k$$
where $c_i = \sum_{k \ne i, j} Q_{ik} \alpha_k$ and similarly for $c_j$
The equality constraint pins down $y_i \alpha_i + y_j \alpha_j$, so this is really a one-dimensional quadratic over an interval, which has a closed-form solution
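A minimal sketch of this analytic pair update, in the spirit of Platt's SMO (the names alpha, y, K for the kernel matrix, C, and b are assumptions; a full implementation would also update b and cache the errors):

```python
import numpy as np

def smo_pair_update(i, j, alpha, y, K, C, b):
    """One SMO update of (alpha_i, alpha_j): the analytic two-variable solution."""
    # Prediction errors under the current alpha and b
    f = (alpha * y) @ K + b
    E_i, E_j = f[i] - y[i], f[j] - y[j]

    # Box [L, H] implied by 0 <= alpha <= C and the equality constraint
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    if L >= H:
        return alpha  # nothing to do for this pair

    # Curvature of the one-dimensional objective along the constraint line
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
    if eta <= 0:
        return alpha  # skip degenerate pairs in this simplified sketch

    # Unconstrained optimum for alpha_j, clipped to the box; alpha_i follows from the constraint
    a_j = np.clip(alpha[j] + y[j] * (E_i - E_j) / eta, L, H)
    a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j)

    alpha = alpha.copy()
    alpha[i], alpha[j] = a_i, a_j
    return alpha
```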
SMO in practice
• Several strategies have been proposed for selecting the pair $(i, j)$ at each iteration
• Typically based on heuristics (often using the KKT conditions) that predict which pair of variables will lead to the largest change in the objective function
• For many of these heuristics, the SMO algorithm is proven to converge to the global optimum after finitely many iterations
• The worst-case running time scales like $O(n^3)$, but in practice SMO tends to behave more like $O(n^2)$ (or better)
Alternative algorithms
SMO is one of the predominant strategies for training an SVM, but there are important alternatives to consider on very large datasets
– modern variants for solving the dual based on stochastic gradient descent (closely related to SMO)
– directly optimizing the primal (makes most sense when the dimension of the feature space is small compared to the size of the dataset; some algorithms are very similar to PLA and the stochastic gradient descent version of logistic regression – see the sketch below)
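As an illustration of the primal route, here is a rough Pegasos-style sketch of stochastic subgradient descent on the soft-margin objective; the names X, y, lam, and epochs are assumptions, and the bias update is a common heuristic rather than part of the basic algorithm.

```python
import numpy as np

def primal_svm_sgd(X, y, lam=0.01, epochs=10):
    """Sketch: stochastic subgradient descent on (lam/2)||w||^2 + mean hinge loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    t = 0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                 # decreasing step size
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                        # hinge loss is active: move toward the example
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b = b + eta * y[i]
            else:                                 # only the regularizer contributes
                w = (1 - eta * lam) * w
    return w, b
```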