Transcript

Page 1

Startup ADMM

Shin Matsushima

Department of Statistics, Purdue University

Lab Seminar, April 2, 2012


Page 2

The Paper

"Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers"
Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein
Foundations and Trends in Machine Learning, Vol. 3, No. 1 (2010), 1–122
URL: http://www.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf


Page 3

Outline

1 Dual Ascent

2 Method of Multipliers

3 Alternating Direction Method of Multipliers (ADMM)

4 The Following Contents

Page 4

Consider the equality-constrained convex optimization problem

minimize f(x)

subject to Ax = b

where x ∈ R^n, A ∈ R^{m×n}, and f is convex.

Lagrangian:

L(x, y) = f(x) + yᵀ(Ax − b)


Page 5

Assume strong duality:

inf_x sup_y L(x, y) = sup_y inf_x L(x, y) = sup_y g(y),

where g(y) := inf_x L(x, y) is concave.

We aim to solve the dual problem:

maximize g(y) = inf_x { f(x) + yᵀ(Ax − b) }

and recover a primal optimal point from y* (an optimal solution of the above):

x* = argmin_x L(x, y*)
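As a worked illustration, assume f(x) = (1/2)‖x‖² (an illustrative choice, not from the slides). The inner minimization then has the closed form x = −Aᵀy, which gives an explicit concave dual:

```latex
% Worked example under the assumption f(x) = (1/2)||x||^2.
% Setting \nabla_x L(x, y) = x + A^\top y = 0 gives x = -A^\top y, hence
\[
  g(y) = \inf_x \left\{ \tfrac{1}{2}\|x\|^2 + y^\top (Ax - b) \right\}
       = -\tfrac{1}{2}\|A^\top y\|^2 - b^\top y,
  \qquad
  x^{\star} = \operatorname*{argmin}_x L(x, y^{\star}) = -A^\top y^{\star}.
\]
```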


Page 6

Dual Ascent

Procedure (Dual Ascent):

x^{k+1} = argmin_x L(x, y^k) = argmin_x { f(x) + y^kᵀAx }

y^{k+1} = y^k + α^k(Ax^{k+1} − b)

Ax^{k+1} − b ∈ ∂g(y^k) because

g(y^k) = min_x { f(x) + y^kᵀ(Ax − b) } = f(x^{k+1}) + y^kᵀ(Ax^{k+1} − b)

g(y) = min_x { f(x) + yᵀ(Ax − b) } ≤ f(x^{k+1}) + yᵀ(Ax^{k+1} − b)

⇒ g(y) − g(y^k) ≤ (y − y^k)ᵀ(Ax^{k+1} − b)

It is necessary to choose an appropriate stepsize α^k.
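A minimal NumPy sketch of dual ascent for the worked quadratic example above; the fixed stepsize alpha and the toy data are illustrative assumptions, not from the slides.

```python
import numpy as np

def dual_ascent(A, b, alpha=0.1, iters=200):
    """Dual ascent for: minimize (1/2)||x||^2  subject to  Ax = b.
    For this f, the x-minimization has the closed form x = -A^T y."""
    y = np.zeros(A.shape[0])
    for _ in range(iters):
        x = -A.T @ y                 # x^{k+1} = argmin_x L(x, y^k)
        y = y + alpha * (A @ x - b)  # y^{k+1} = y^k + alpha (A x^{k+1} - b)
    return x, y

# Toy problem: minimize (1/2)(x1^2 + x2^2) subject to x1 + x2 = 2.
A = np.array([[1.0, 1.0]])
b = np.array([2.0])
x, y = dual_ascent(A, b)
print(x)  # approaches [1, 1], the minimum-norm solution
```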


Page 7

Dual Decomposition

In the problem

minimize f(x)

subject to Ax = b,

assume f(x) = Σ_{i=1}^N f_i(x_i), where x_i ∈ R^{n_i} and x = [x_1ᵀ ··· x_Nᵀ]ᵀ.

Let A_i ∈ R^{m×n_i} and A = [A_1 ··· A_N]. Then the algorithm becomes decentralized:

x_i^{k+1} = argmin_{x_i} { f_i(x_i) + y^kᵀA_i x_i }  for i = 1, . . . , N

y^{k+1} = y^k + α^k(Ax^{k+1} − b)

(the x_i^{k+1} are gathered to form Ax^{k+1}; the updated y^{k+1} is broadcast back to all blocks)
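A hedged sketch of the decomposed update under the same assumed block objectives f_i(x_i) = (1/2)‖x_i‖²; the list comprehension stands in for the parallel x_i-updates, with the gather/broadcast pattern from the slide noted in comments.

```python
import numpy as np

def dual_decomposition(A_blocks, b, alpha=0.1, iters=200):
    """Dual decomposition for: minimize sum_i (1/2)||x_i||^2
    subject to sum_i A_i x_i = b."""
    y = np.zeros(b.shape[0])
    for _ in range(iters):
        # Independent x_i-updates (run on separate workers in practice):
        # argmin_{x_i} (1/2)||x_i||^2 + y^T A_i x_i  =  -A_i^T y
        xs = [-Ai.T @ y for Ai in A_blocks]
        # Gather A x^{k+1} from the workers, then broadcast the new y.
        Ax = sum(Ai @ xi for Ai, xi in zip(A_blocks, xs))
        y = y + alpha * (Ax - b)
    return xs, y

# Example: two scalar blocks with constraint x1 + x2 = 2.
xs, y = dual_decomposition([np.array([[1.0]]), np.array([[1.0]])], np.array([2.0]))
```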


Page 8

Outline

1 Dual Ascent

2 Method of Multipliers

3 Alternating Direction Method of Multipliers (ADMM)

4 The Following Contents

Page 9

Original Problem:

minimize f(x)

subject to Ax = b

Original Lagrangian:

L(x, y) = f(x) + yᵀ(Ax − b)

Augmented Lagrangian:

L_ρ(x, y) = f(x) + yᵀ(Ax − b) + (ρ/2)‖Ax − b‖²

can be viewed as the ordinary Lagrangian of the following equality-constrained convex optimization problem, which is equivalent to the original problem:

minimize f(x) + (ρ/2)‖Ax − b‖²

subject to Ax = b


Page 10

Method of Multipliers

Procedure (Method of Multipliers):

x^{k+1} = argmin_x L_ρ(x, y^k) = argmin_x { L(x, y^k) + (ρ/2)‖Ax − b‖² }

y^{k+1} = y^k + ρ(Ax^{k+1} − b)

The stepsize is now a fixed constant ρ.
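A minimal sketch of the method of multipliers, again assuming f(x) = (1/2)‖x‖² so that the x-update reduces to a linear solve; note that only the augmented objective and the fixed stepsize ρ differ structurally from dual ascent.

```python
import numpy as np

def method_of_multipliers(A, b, rho=1.0, iters=100):
    """Method of multipliers for: minimize (1/2)||x||^2  subject to  Ax = b.
    Setting the gradient of L_rho to zero gives (I + rho A^T A) x = A^T (rho b - y)."""
    m, n = A.shape
    y = np.zeros(m)
    M = np.eye(n) + rho * (A.T @ A)
    for _ in range(iters):
        x = np.linalg.solve(M, A.T @ (rho * b - y))  # x^{k+1} = argmin_x L_rho(x, y^k)
        y = y + rho * (A @ x - b)                    # dual update with fixed stepsize rho
    return x, y
```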


Page 11

An intuitive justification for using stepsize ρ is the following. We can see that

x^{k+1} = argmin_x L_ρ(x, y^k)

⇒ 0 ∈ ∂_x { f(x^{k+1}) + y^kᵀAx^{k+1} + (ρ/2)‖Ax^{k+1} − b‖² }
    = ∂f(x^{k+1}) + Aᵀy^k + ρAᵀ(Ax^{k+1} − b)
    = ∂f(x^{k+1}) + Aᵀ(y^k + ρ(Ax^{k+1} − b))
    = ∂f(x^{k+1}) + Aᵀy^{k+1}

This implies that the method of multipliers maintains

0 ∈ ∂f(x^{k+1}) + Aᵀy^{k+1}

after every iteration. Note that this is exactly the condition satisfied at an optimum:

0 ∈ ∂f(x*) + Aᵀy* (dual feasibility)


Page 12

The method of multipliers has improved convergence properties, but the quadratic penalty term destroys the separability of the x-update:

x^{k+1} = argmin_x L_ρ(x, y^k) = argmin_x { L(x, y^k) + (ρ/2)‖Ax − b‖² }

        ≠ argmin_{x_i} { f_i(x_i) + y^kᵀA_i x_i + · · · }

y^{k+1} = y^k + ρ(Ax^{k+1} − b)


Page 13

Outline

1 Dual Ascent

2 Method of Multipliers

3 Alternating Direction Method of Multipliers (ADMM)

4 The Following Contents

Page 14

ADMM solves problems of the following form:

minimize f(x) + g(z)

subject to Ax + Bz = c

Augmented Lagrangian:

L_ρ(x, z, y) = f(x) + g(z) + yᵀ(Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖²


Page 15

Alternating Direction Method of Multipliers (ADMM)

Procedure (ADMM):

x^{k+1} = argmin_x L_ρ(x, z^k, y^k)
        = argmin_x { f(x) + y^kᵀ(Ax + Bz^k − c) + (ρ/2)‖Ax + Bz^k − c‖² }

z^{k+1} = argmin_z L_ρ(x^{k+1}, z, y^k)

y^{k+1} = y^k + ρ(Ax^{k+1} + Bz^{k+1} − c)

This consists of separate x-minimization and z-minimization steps. Cf. the method of multipliers, which would minimize over x and z jointly:

(x^{k+1}, z^{k+1}) = argmin_{x,z} L_ρ(x, z, y^k)

y^{k+1} = y^k + ρ(Ax^{k+1} + Bz^{k+1} − c)


Page 16

Scaled Form of the Alternating Direction Method of Multipliers (ADMM)

Let r^k = Ax^k + Bz^k − c (the residual) and u^k = (1/ρ)y^k (the scaled dual variable). The ADMM procedure can be rewritten as follows.

Procedure (scaled form of ADMM):

x^{k+1} = argmin_x L_ρ(x, z^k, y^k)
        = argmin_x { f(x) + y^kᵀ(Ax + Bz^k − c) + (ρ/2)‖Ax + Bz^k − c‖² }
        = argmin_x { f(x) + (ρ/2)‖Ax + Bz^k − c + u^k‖² }

z^{k+1} = argmin_z L_ρ(x^{k+1}, z, y^k)

u^{k+1} = u^k + (Ax^{k+1} + Bz^{k+1} − c) = u^k + r^{k+1}
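As a concrete instance of the scaled form, here is a sketch for the lasso (an example treated in later chapters of the paper; this particular instantiation is an illustrative assumption): f(x) = (1/2)‖Dx − d‖², g(z) = λ‖z‖₁, with A = I, B = −I, c = 0, so the z-update is soft-thresholding.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Proximal operator of kappa * ||.||_1 (the lasso z-update)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(D, d, lam, rho=1.0, iters=200):
    """Scaled-form ADMM for: minimize (1/2)||Dx - d||^2 + lam * ||z||_1
    subject to x - z = 0."""
    n = D.shape[1]
    x = z = u = np.zeros(n)
    M = D.T @ D + rho * np.eye(n)  # x-update matrix; factor it once in practice
    for _ in range(iters):
        x = np.linalg.solve(M, D.T @ d + rho * (z - u))  # x-minimization
        z = soft_threshold(x + u, lam / rho)             # z-minimization
        u = u + x - z                                    # u^{k+1} = u^k + r^{k+1}
    return z
```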


Page 17

Convergence of ADMM

Assumptions:

1 f and g are closed, proper, and convex

2 The unaugmented Lagrangian L_0(x, z, y) has a saddle point

Results: As k → ∞,

1 Residual convergence: r^k → 0

2 Objective convergence: f(x^k) + g(z^k) → p*

3 Dual variable convergence: y^k → y*, where y* is a dual optimal point

In practice, a few tens of iterations will often produce a solution of modest accuracy, but reaching high accuracy can be very slow.


Page 18

Outline

1 Dual Ascent

2 Method of Multipliers

3 Alternating Direction Method of Multipliers (ADMM)

4 The Following Contents

Page 19

Other Characteristics of ADMM

Optimality conditions and stopping criteria

Some variants:

Varying penalty parameter ρ^k
More general augmenting terms
Inexact x-/z-minimization steps
...


Page 20

Notable Applications Discussed in the Following Chapters

Chap 5: Problems with a more general constraint

minimize f(x)

subject to x ∈ C

is transformed, by introducing the indicator function I_C of the set C, into

minimize f(x) + I_C(z)

subject to x − z = 0

Procedure:

x^{k+1} = argmin_x { f(x) + (ρ/2)‖x − z^k + u^k‖² }

z^{k+1} = argmin_{z∈C} (ρ/2)‖x^{k+1} − z + u^k‖² = Π_C(x^{k+1} + u^k), the Euclidean projection onto C

u^{k+1} = u^k + (x^{k+1} − z^{k+1})
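A sketch of this scheme under assumed specifics: a quadratic objective f(x) = (1/2)xᵀPx + qᵀx and a box constraint C = [lo, hi]ⁿ, for which the projection Π_C is coordinate-wise clipping.

```python
import numpy as np

def admm_box_qp(P, q, lo, hi, rho=1.0, iters=200):
    """ADMM for: minimize (1/2) x^T P x + q^T x  subject to  lo <= x <= hi,
    via the splitting f(x) + I_C(z) with x - z = 0 (scaled form)."""
    n = q.shape[0]
    x = z = u = np.zeros(n)
    M = P + rho * np.eye(n)
    for _ in range(iters):
        x = np.linalg.solve(M, rho * (z - u) - q)  # argmin_x f(x) + (rho/2)||x - z + u||^2
        z = np.clip(x + u, lo, hi)                 # z-update: projection onto C
        u = u + x - z                              # scaled dual update
    return z
```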


Page 21

Chap 7: Distributed Version (Consensus) of

minimize f(x) = Σ_{i=1}^N f_i(x)

is transformed into

minimize Σ_{i=1}^N f_i(x_i)

subject to x_i − z = 0, i = 1, . . . , N

Procedure:

x_i^{k+1} = argmin_{x_i} L_ρ(x, z^k, y^k)
          = argmin_{x_i} { f_i(x_i) + y_i^kᵀ(x_i − z^k) + (ρ/2)‖x_i − z^k‖² }  for i = 1, . . . , N

z^{k+1} = (1/N) Σ_{i=1}^N ( x_i^{k+1} + (1/ρ)y_i^k )

y_i^{k+1} = y_i^k + ρ(x_i^{k+1} − z^{k+1})
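A minimal consensus sketch assuming quadratic local terms f_i(x) = (1/2)‖x − a_i‖², for which the x_i-update has a closed form; in the general case that line becomes each worker's own local minimization.

```python
import numpy as np

def consensus_admm(a_list, rho=1.0, iters=100):
    """Consensus ADMM for: minimize sum_i (1/2)||x_i - a_i||^2  subject to  x_i = z.
    The x_i-update solves (x_i - a_i) + y_i + rho (x_i - z) = 0 in closed form."""
    N, n = len(a_list), a_list[0].shape[0]
    z = np.zeros(n)
    ys = [np.zeros(n) for _ in range(N)]
    for _ in range(iters):
        xs = [(a - y + rho * z) / (1 + rho) for a, y in zip(a_list, ys)]  # local x_i-updates
        z = np.mean([x + y / rho for x, y in zip(xs, ys)], axis=0)        # averaging (z-update)
        ys = [y + rho * (x - z) for x, y in zip(xs, ys)]                  # dual y_i-updates
    return z

# Example: z converges to the average of the a_i for these quadratic terms.
print(consensus_admm([np.array([0.0]), np.array([4.0])]))  # ~ [2.]
```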


Page 22

Contents

4 General Patterns

Tips and tools used after this chapter

5 Constrained Convex Optimization

How to incorporate general constraints

6 ℓ1-Norm Problems

Discussion of problems involving the ℓ1-norm

7 Consensus and Sharing

Framework for distributed optimization

8 Distributed Model Fitting

Examples of distributed optimization

9 Nonconvex Problems

10 Implementation

11 Numerical Examples

12 Conclusion
