Transcript

Page 1

Startup ADMM

Shin Matsushima

Department of Statistics, Purdue University

Lab Seminar, April 2, 2012


Page 2

The Paper

"Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers"
Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein
Foundations and Trends in Machine Learning, Vol. 3, No. 1 (2010), 1–122
URL: http://www.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf


Page 3

Outline

1 Dual Ascent

2 Method of Multipliers

3 Alternating Direction Method of Multipliers (ADMM)

4 The Following Contents

Page 4

Consider the equality-constrained convex optimization problem

minimize f(x)

subject to Ax = b

where x ∈ R^n, A ∈ R^{m×n}, and f is convex.

Lagrangian:

L(x, y) = f(x) + yᵀ(Ax − b)


Page 5

Assume strong duality:

inf_x sup_y L(x, y) = sup_y inf_x L(x, y) = sup_y g(y),

where g(y) := inf_x L(x, y) is concave.

We aim to solve the dual problem:

maximize g(y) = inf_x { f(x) + yᵀ(Ax − b) }

and recover a primal optimal point from y* (an optimal solution of the above):

x* = argmin_x L(x, y*)
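As a worked illustration, assume f(x) = (1/2)‖x‖² (an illustrative choice, not from the slides). The inner minimization then has the closed form x = −Aᵀy, which gives an explicit concave dual:

```latex
% Worked example under the assumption f(x) = (1/2)||x||^2.
% Setting \nabla_x L(x, y) = x + A^\top y = 0 gives x = -A^\top y, hence
\[
  g(y) = \inf_x \left\{ \tfrac{1}{2}\|x\|^2 + y^\top (Ax - b) \right\}
       = -\tfrac{1}{2}\|A^\top y\|^2 - b^\top y,
  \qquad
  x^{\star} = \operatorname*{argmin}_x L(x, y^{\star}) = -A^\top y^{\star}.
\]
```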


Page 6

Dual Ascent

Procedure (Dual Ascent):

x^{k+1} = argmin_x L(x, y^k) = argmin_x { f(x) + y^kᵀAx }

y^{k+1} = y^k + α^k(Ax^{k+1} − b)

Ax^{k+1} − b ∈ ∂g(y^k) because

g(y^k) = min_x { f(x) + y^kᵀ(Ax − b) } = f(x^{k+1}) + y^kᵀ(Ax^{k+1} − b)

g(y) = min_x { f(x) + yᵀ(Ax − b) } ≤ f(x^{k+1}) + yᵀ(Ax^{k+1} − b)

⇒ g(y) − g(y^k) ≤ (y − y^k)ᵀ(Ax^{k+1} − b)

It is necessary to choose an appropriate stepsize α^k.
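A minimal NumPy sketch of dual ascent for the worked quadratic example above; the fixed stepsize alpha and the toy data are illustrative assumptions, not from the slides.

```python
import numpy as np

def dual_ascent(A, b, alpha=0.1, iters=200):
    """Dual ascent for: minimize (1/2)||x||^2  subject to  Ax = b.
    For this f, the x-minimization has the closed form x = -A^T y."""
    y = np.zeros(A.shape[0])
    for _ in range(iters):
        x = -A.T @ y                 # x^{k+1} = argmin_x L(x, y^k)
        y = y + alpha * (A @ x - b)  # y^{k+1} = y^k + alpha (A x^{k+1} - b)
    return x, y

# Toy problem: minimize (1/2)(x1^2 + x2^2) subject to x1 + x2 = 2.
A = np.array([[1.0, 1.0]])
b = np.array([2.0])
x, y = dual_ascent(A, b)
print(x)  # approaches [1, 1], the minimum-norm solution
```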


Page 7

Dual Decomposition

In the problem

minimize f(x)

subject to Ax = b,

assume f(x) = Σ_{i=1}^N f_i(x_i), where x_i ∈ R^{n_i} and x = [x_1ᵀ ··· x_Nᵀ]ᵀ.

Let A_i ∈ R^{m×n_i} and A = [A_1 ··· A_N]. Then the algorithm becomes decentralized:

x_i^{k+1} = argmin_{x_i} { f_i(x_i) + y^kᵀA_i x_i }  for i = 1, . . . , N

y^{k+1} = y^k + α^k(Ax^{k+1} − b)

(the x_i^{k+1} are gathered to form Ax^{k+1}; the updated y^{k+1} is broadcast back to all blocks)
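A hedged sketch of the decomposed update under the same assumed block objectives f_i(x_i) = (1/2)‖x_i‖²; the list comprehension stands in for the parallel x_i-updates, with the gather/broadcast pattern from the slide noted in comments.

```python
import numpy as np

def dual_decomposition(A_blocks, b, alpha=0.1, iters=200):
    """Dual decomposition for: minimize sum_i (1/2)||x_i||^2
    subject to sum_i A_i x_i = b."""
    y = np.zeros(b.shape[0])
    for _ in range(iters):
        # Independent x_i-updates (run on separate workers in practice):
        # argmin_{x_i} (1/2)||x_i||^2 + y^T A_i x_i  =  -A_i^T y
        xs = [-Ai.T @ y for Ai in A_blocks]
        # Gather A x^{k+1} from the workers, then broadcast the new y.
        Ax = sum(Ai @ xi for Ai, xi in zip(A_blocks, xs))
        y = y + alpha * (Ax - b)
    return xs, y

# Example: two scalar blocks with constraint x1 + x2 = 2.
xs, y = dual_decomposition([np.array([[1.0]]), np.array([[1.0]])], np.array([2.0]))
```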


Page 8

Outline

1 Dual Ascent

2 Method of Multipliers

3 Alternating Direction Method of Multipliers (ADMM)

4 The Following Contents

Page 9

Original Problem:

minimize f(x)

subject to Ax = b

Original Lagrangian:

L(x, y) = f(x) + yᵀ(Ax − b)

Augmented Lagrangian:

L_ρ(x, y) = f(x) + yᵀ(Ax − b) + (ρ/2)‖Ax − b‖²

can be viewed as the ordinary Lagrangian of the following equality-constrained convex optimization problem, which is equivalent to the original problem:

minimize f(x) + (ρ/2)‖Ax − b‖²

subject to Ax = b


Page 10

Method of Multipliers

Procedure (Method of Multipliers):

x^{k+1} = argmin_x L_ρ(x, y^k) = argmin_x { L(x, y^k) + (ρ/2)‖Ax − b‖² }

y^{k+1} = y^k + ρ(Ax^{k+1} − b)

The stepsize is now a fixed constant ρ.
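A minimal sketch of the method of multipliers, again assuming f(x) = (1/2)‖x‖² so that the x-update reduces to a linear solve; note that only the augmented objective and the fixed stepsize ρ differ structurally from dual ascent.

```python
import numpy as np

def method_of_multipliers(A, b, rho=1.0, iters=100):
    """Method of multipliers for: minimize (1/2)||x||^2  subject to  Ax = b.
    Setting the gradient of L_rho to zero gives (I + rho A^T A) x = A^T (rho b - y)."""
    m, n = A.shape
    y = np.zeros(m)
    M = np.eye(n) + rho * (A.T @ A)
    for _ in range(iters):
        x = np.linalg.solve(M, A.T @ (rho * b - y))  # x^{k+1} = argmin_x L_rho(x, y^k)
        y = y + rho * (A @ x - b)                    # dual update with fixed stepsize rho
    return x, y
```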


Page 11

An intuitive justification for using stepsize ρ is the following. We can see that

x^{k+1} = argmin_x L_ρ(x, y^k)

⇒ 0 ∈ ∂_x { f(x^{k+1}) + y^kᵀAx^{k+1} + (ρ/2)‖Ax^{k+1} − b‖² }
    = ∂f(x^{k+1}) + Aᵀy^k + ρAᵀ(Ax^{k+1} − b)
    = ∂f(x^{k+1}) + Aᵀ(y^k + ρ(Ax^{k+1} − b))
    = ∂f(x^{k+1}) + Aᵀy^{k+1}

This implies that the method of multipliers maintains

0 ∈ ∂f(x^{k+1}) + Aᵀy^{k+1}

after every iteration. Note that this is exactly the condition satisfied at an optimum:

0 ∈ ∂f(x*) + Aᵀy* (dual feasibility)


Page 12

The method of multipliers has improved convergence properties, but the quadratic penalty term destroys the separability of the x-update:

x^{k+1} = argmin_x L_ρ(x, y^k) = argmin_x { L(x, y^k) + (ρ/2)‖Ax − b‖² }

        ≠ argmin_{x_i} { f_i(x_i) + y^kᵀA_i x_i + · · · }

y^{k+1} = y^k + ρ(Ax^{k+1} − b)


Page 13

Outline

1 Dual Ascent

2 Method of Multipliers

3 Alternating Direction Method of Multipliers (ADMM)

4 The Following Contents

Page 14

ADMM solves problems of the following form:

minimize f(x) + g(z)

subject to Ax + Bz = c

Augmented Lagrangian:

L_ρ(x, z, y) = f(x) + g(z) + yᵀ(Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖²


Page 15

Alternating Direction Method of Multipliers (ADMM)

Procedure (ADMM):

x^{k+1} = argmin_x L_ρ(x, z^k, y^k)
        = argmin_x { f(x) + y^kᵀ(Ax + Bz^k − c) + (ρ/2)‖Ax + Bz^k − c‖² }

z^{k+1} = argmin_z L_ρ(x^{k+1}, z, y^k)

y^{k+1} = y^k + ρ(Ax^{k+1} + Bz^{k+1} − c)

This consists of separate x-minimization and z-minimization steps. Cf. the method of multipliers, which would minimize over x and z jointly:

(x^{k+1}, z^{k+1}) = argmin_{x,z} L_ρ(x, z, y^k)

y^{k+1} = y^k + ρ(Ax^{k+1} + Bz^{k+1} − c)


Page 16

Scaled Form of the Alternating Direction Method of Multipliers (ADMM)

Let r^k = Ax^k + Bz^k − c (the residual) and u^k = (1/ρ)y^k (the scaled dual variable). The ADMM procedure can be rewritten as follows.

Procedure (scaled form of ADMM):

x^{k+1} = argmin_x L_ρ(x, z^k, y^k)
        = argmin_x { f(x) + y^kᵀ(Ax + Bz^k − c) + (ρ/2)‖Ax + Bz^k − c‖² }
        = argmin_x { f(x) + (ρ/2)‖Ax + Bz^k − c + u^k‖² }

z^{k+1} = argmin_z L_ρ(x^{k+1}, z, y^k)

u^{k+1} = u^k + (Ax^{k+1} + Bz^{k+1} − c) = u^k + r^{k+1}
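As a concrete instance of the scaled form, here is a sketch for the lasso (an example treated in later chapters of the paper; this particular instantiation is an illustrative assumption): f(x) = (1/2)‖Dx − d‖², g(z) = λ‖z‖₁, with A = I, B = −I, c = 0, so the z-update is soft-thresholding.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Proximal operator of kappa * ||.||_1 (the lasso z-update)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(D, d, lam, rho=1.0, iters=200):
    """Scaled-form ADMM for: minimize (1/2)||Dx - d||^2 + lam * ||z||_1
    subject to x - z = 0."""
    n = D.shape[1]
    x = z = u = np.zeros(n)
    M = D.T @ D + rho * np.eye(n)  # x-update matrix; factor it once in practice
    for _ in range(iters):
        x = np.linalg.solve(M, D.T @ d + rho * (z - u))  # x-minimization
        z = soft_threshold(x + u, lam / rho)             # z-minimization
        u = u + x - z                                    # u^{k+1} = u^k + r^{k+1}
    return z
```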


Page 17

Convergence of ADMM

Assumptions:

1 f and g are closed, proper, and convex

2 The unaugmented Lagrangian L_0(x, z, y) has a saddle point

Results: As k → ∞,

1 Residual convergence: r^k → 0

2 Objective convergence: f(x^k) + g(z^k) → p*

3 Dual variable convergence: y^k → y*, where y* is a dual optimal point

In practice, a few tens of iterations will often produce a solution of modest accuracy, but reaching high accuracy can be very slow.


Page 18

Outline

1 Dual Ascent

2 Method of Multipliers

3 Alternating Direction Method of Multipliers (ADMM)

4 The Following Contents

Page 19

Other Characteristics of ADMM

Optimality conditions and stopping criteria

Some variants:

Varying penalty parameter ρ^k
More general augmenting terms
Inexact x-/z-minimization steps
...


Page 20

Notable Applications Discussed in the Following Chapters

Chap 5: Problems with a more general constraint

minimize f(x)

subject to x ∈ C

is transformed, by introducing the indicator function I_C of the set C, into

minimize f(x) + I_C(z)

subject to x − z = 0

Procedure:

x^{k+1} = argmin_x { f(x) + (ρ/2)‖x − z^k + u^k‖² }

z^{k+1} = argmin_{z∈C} (ρ/2)‖x^{k+1} − z + u^k‖² = Π_C(x^{k+1} + u^k), the Euclidean projection onto C

u^{k+1} = u^k + (x^{k+1} − z^{k+1})
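A sketch of this scheme under assumed specifics: a quadratic objective f(x) = (1/2)xᵀPx + qᵀx and a box constraint C = [lo, hi]ⁿ, for which the projection Π_C is coordinate-wise clipping.

```python
import numpy as np

def admm_box_qp(P, q, lo, hi, rho=1.0, iters=200):
    """ADMM for: minimize (1/2) x^T P x + q^T x  subject to  lo <= x <= hi,
    via the splitting f(x) + I_C(z) with x - z = 0 (scaled form)."""
    n = q.shape[0]
    x = z = u = np.zeros(n)
    M = P + rho * np.eye(n)
    for _ in range(iters):
        x = np.linalg.solve(M, rho * (z - u) - q)  # argmin_x f(x) + (rho/2)||x - z + u||^2
        z = np.clip(x + u, lo, hi)                 # z-update: projection onto C
        u = u + x - z                              # scaled dual update
    return z
```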


Page 21

Chap 7: Distributed Version (Consensus) of

minimize f(x) = Σ_{i=1}^N f_i(x)

is transformed into

minimize Σ_{i=1}^N f_i(x_i)

subject to x_i − z = 0, i = 1, . . . , N

Procedure:

x_i^{k+1} = argmin_{x_i} L_ρ(x, z^k, y^k)
          = argmin_{x_i} { f_i(x_i) + y_i^kᵀ(x_i − z^k) + (ρ/2)‖x_i − z^k‖² }  for i = 1, . . . , N

z^{k+1} = (1/N) Σ_{i=1}^N ( x_i^{k+1} + (1/ρ)y_i^k )

y_i^{k+1} = y_i^k + ρ(x_i^{k+1} − z^{k+1})
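A minimal consensus sketch assuming quadratic local terms f_i(x) = (1/2)‖x − a_i‖², for which the x_i-update has a closed form; in the general case that line becomes each worker's own local minimization.

```python
import numpy as np

def consensus_admm(a_list, rho=1.0, iters=100):
    """Consensus ADMM for: minimize sum_i (1/2)||x_i - a_i||^2  subject to  x_i = z.
    The x_i-update solves (x_i - a_i) + y_i + rho (x_i - z) = 0 in closed form."""
    N, n = len(a_list), a_list[0].shape[0]
    z = np.zeros(n)
    ys = [np.zeros(n) for _ in range(N)]
    for _ in range(iters):
        xs = [(a - y + rho * z) / (1 + rho) for a, y in zip(a_list, ys)]  # local x_i-updates
        z = np.mean([x + y / rho for x, y in zip(xs, ys)], axis=0)        # averaging (z-update)
        ys = [y + rho * (x - z) for x, y in zip(xs, ys)]                  # dual y_i-updates
    return z

# Example: z converges to the average of the a_i for these quadratic terms.
print(consensus_admm([np.array([0.0]), np.array([4.0])]))  # ~ [2.]
```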


Page 22

Contents

4 General Patterns

Tips and tools used after this chapter

5 Constrained Convex Optimization

How to incorporate general constraints

6 ℓ1-Norm Problems

Discussion of problems involving the ℓ1-norm

7 Consensus and Sharing

Framework for distributed optimization

8 Distributed Model Fitting

Examples of distributed optimization

9 Nonconvex Problems

10 Implementation

11 Numerical Examples

12 Conclusion
