Startup ADMM
Shin Matsushima
Department of Statistics, Purdue University
Lab Seminar, April 2, 2012
The Paper
"Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers"
Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein
Foundations and Trends in Machine Learning, Vol. 3, No. 1 (2010), 1-122
URL: http://www.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf
Outline
1 Dual Ascent
2 Method of Multipliers
3 Alternating Direction Method of Multipliers (ADMM)
4 The following chapters
Consider the equality-constrained convex optimization problem

$$\begin{aligned}
\text{minimize}\quad & f(x) \\
\text{subject to}\quad & Ax = b,
\end{aligned}$$

where $x \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, and $f$ is convex.

Lagrangian:

$$L(x, y) = f(x) + y^\top (Ax - b)$$
Assume strong duality:

$$\inf_x \sup_y L(x, y) = \sup_y \underbrace{\inf_x L(x, y)}_{g(y)\,:\ \text{concave}}$$

We aim to solve the dual problem

$$\text{maximize}\quad g(y) = \inf_x \left\{ f(x) + y^\top (Ax - b) \right\}$$

and to recover a primal optimal point from $y^\star$ (an optimal solution of the above):

$$x^\star = \operatorname*{argmin}_x L(x, y^\star)$$
Dual Ascent
Procedure (Dual Ascent):

$$\begin{aligned}
x^{k+1} &= \operatorname*{argmin}_x L(x, y^k) = \operatorname*{argmin}_x \left\{ f(x) + y^{k\top} Ax \right\} \\
y^{k+1} &= y^k + \alpha^k (Ax^{k+1} - b)
\end{aligned}$$

Here $Ax^{k+1} - b \in \partial g(y^k)$, because

$$\begin{aligned}
g(y^k) &= \min_x \left\{ f(x) + y^{k\top}(Ax - b) \right\} = f(x^{k+1}) + y^{k\top}(Ax^{k+1} - b) \\
g(y) &= \min_x \left\{ f(x) + y^\top (Ax - b) \right\} \le f(x^{k+1}) + y^\top (Ax^{k+1} - b) \\
&\Rightarrow\quad g(y) - g(y^k) \le (y - y^k)^\top (Ax^{k+1} - b)
\end{aligned}$$

It is necessary to choose an appropriate step size $\alpha^k$.
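As an illustration, here is a minimal sketch of dual ascent in NumPy, assuming a strongly convex quadratic objective $f(x) = \frac{1}{2}x^\top P x + q^\top x$ so that the $x$-minimization has a closed form; all problem data below are hypothetical toy values, and the step size is simply held at a small constant.

```python
# Minimal dual ascent sketch for f(x) = (1/2) x'Px + q'x, subject to Ax = b.
# All data here are hypothetical toy values.
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 3
M = rng.standard_normal((n, n))
P = M @ M.T + n * np.eye(n)        # positive definite, so argmin_x L is unique
q = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

y = np.zeros(m)
alpha = 0.05                        # constant step size (a crude choice)
for k in range(500):
    # x-update: argmin_x L(x, y) solves Px + q + A'y = 0
    x = np.linalg.solve(P, -(q + A.T @ y))
    # y-update: subgradient ascent on g using Ax - b
    y = y + alpha * (A @ x - b)

print("||Ax - b|| =", np.linalg.norm(A @ x - b))
```

For general convex $f$ the inner $\operatorname{argmin}$ can itself be a nontrivial subproblem, which is one motivation for the methods that follow.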
Dual Decomposition
In the problem

$$\begin{aligned}
\text{minimize}\quad & f(x) \\
\text{subject to}\quad & Ax = b,
\end{aligned}$$

assume $f(x) = \sum_{i=1}^{N} f_i(x_i)$, where $x_i \in \mathbb{R}^{n_i}$ and $x = [x_1^\top \cdots x_N^\top]^\top$.
Let $A_i \in \mathbb{R}^{m \times n_i}$ and $A = [A_1 \cdots A_N]$. Then the algorithm becomes decentralized:

$$x^{k+1} = \operatorname*{argmin}_x L(x, y^k), \quad\text{i.e.}\quad x_i^{k+1} = \operatorname*{argmin}_{x_i} \left\{ f_i(x_i) + y^{k\top} A_i x_i \right\} \ \text{for } i = 1, \dots, N$$

$$\underbrace{y^{k+1}}_{\text{broadcast}} = y^k + \alpha^k \big( A \underbrace{x^{k+1}}_{\text{gather}} - b \big)$$
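A sketch of the decomposed updates, again assuming hypothetical quadratic blocks $f_i(x_i) = \frac{1}{2}x_i^\top P_i x_i + q_i^\top x_i$: each $x_i$-update needs only the broadcast $y$ and its own local data, and only the products $A_i x_i$ are gathered.

```python
# Minimal dual decomposition sketch: N separable quadratic blocks coupled
# only through Ax = b. All data here are hypothetical toy values.
import numpy as np

rng = np.random.default_rng(1)
N, ni, m = 4, 5, 3
Ps, qs, As = [], [], []
for _ in range(N):
    M = rng.standard_normal((ni, ni))
    Ps.append(M @ M.T + ni * np.eye(ni))   # positive definite blocks
    qs.append(rng.standard_normal(ni))
    As.append(rng.standard_normal((m, ni)))
b = rng.standard_normal(m)

y = np.zeros(m)
alpha = 0.05
for k in range(500):
    # x_i-updates: each block solves its own small problem given y (broadcast)
    xs = [np.linalg.solve(Ps[i], -(qs[i] + As[i].T @ y)) for i in range(N)]
    # gather the blocks' contributions to form the residual Ax - b
    r = sum(As[i] @ xs[i] for i in range(N)) - b
    y = y + alpha * r

print("||Ax - b|| =", np.linalg.norm(r))
```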
Original Problem:

$$\begin{aligned}
\text{minimize}\quad & f(x) \\
\text{subject to}\quad & Ax = b
\end{aligned}$$

Original Lagrangian:

$$L(x, y) = f(x) + y^\top (Ax - b)$$

Augmented Lagrangian:

$$L_\rho(x, y) = f(x) + y^\top (Ax - b) + (\rho/2)\|Ax - b\|^2$$

This can be seen as the Lagrangian of the following equality-constrained convex optimization problem, which is equivalent to the original one:

$$\begin{aligned}
\text{minimize}\quad & f(x) + (\rho/2)\|Ax - b\|^2 \\
\text{subject to}\quad & Ax = b
\end{aligned}$$
Method of Multipliers
Procedure (Method of Multipliers):

$$\begin{aligned}
x^{k+1} &= \operatorname*{argmin}_x L_\rho(x, y^k) = \operatorname*{argmin}_x \left\{ L(x, y^k) + (\rho/2)\|Ax - b\|^2 \right\} \\
y^{k+1} &= y^k + \rho (Ax^{k+1} - b)
\end{aligned}$$

The step size is now the fixed constant $\rho$.
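A minimal sketch on the same hypothetical quadratic toy problem as before; the only changes from dual ascent are the penalty term inside the $x$-update and the fixed dual step size $\rho$.

```python
# Minimal method-of-multipliers sketch for f(x) = (1/2) x'Px + q'x,
# subject to Ax = b. All data here are hypothetical toy values.
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 3
M = rng.standard_normal((n, n))
P = M @ M.T + n * np.eye(n)
q = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

rho = 1.0
y = np.zeros(m)
for k in range(100):
    # x-update: argmin_x L_rho(x, y) solves (P + rho A'A) x = -q - A'y + rho A'b
    x = np.linalg.solve(P + rho * A.T @ A, -(q + A.T @ y) + rho * A.T @ b)
    # y-update with the fixed step size rho
    y = y + rho * (A @ x - b)

print("||Ax - b|| =", np.linalg.norm(A @ x - b))
```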
An easy way to understand why the step size is set to $\rho$ is the following. We can see that

$$x^{k+1} = \operatorname*{argmin}_x L_\rho(x, y^k)$$

implies

$$\begin{aligned}
0 &\in \partial_x \left\{ f(x^{k+1}) + y^{k\top} Ax^{k+1} + (\rho/2)\|Ax^{k+1} - b\|^2 \right\} \\
&= \partial f(x^{k+1}) + A^\top y^k + \rho A^\top (Ax^{k+1} - b) \\
&= \partial f(x^{k+1}) + A^\top y^{k+1}
\end{aligned}$$

This means MM maintains

$$0 \in \partial f(x^{k+1}) + A^\top y^{k+1}$$

after every iteration. Note that this is exactly the condition satisfied at an optimum:

$$0 \in \partial f(x^\star) + A^\top y^\star \quad \text{(dual feasibility)}$$
The method of multipliers has improved convergence properties, but the augmentation term destroys separability: even when $f(x) = \sum_i f_i(x_i)$, the quadratic penalty $\|Ax - b\|^2 = \|\sum_i A_i x_i - b\|^2$ couples the blocks.

$$\begin{aligned}
x^{k+1} &= \operatorname*{argmin}_x L_\rho(x, y^k) = \operatorname*{argmin}_x \left\{ L(x, y^k) + (\rho/2)\|Ax - b\|^2 \right\} \\
&\ne \operatorname*{argmin}_{x_i} \left\{ f_i(x_i) + y^{k\top} A_i x_i + \cdots \right\} \\
y^{k+1} &= y^k + \rho (Ax^{k+1} - b)
\end{aligned}$$
ADMM solves problems of the form

$$\begin{aligned}
\text{minimize}\quad & f(x) + g(z) \\
\text{subject to}\quad & Ax + Bz = c
\end{aligned}$$

Augmented Lagrangian:

$$L_\rho(x, z, y) = f(x) + g(z) + y^\top (Ax + Bz - c) + (\rho/2)\|Ax + Bz - c\|^2$$
Alternating Direction Method of Multipliers (ADMM)
Procedure (ADMM):

$$\begin{aligned}
x^{k+1} &= \operatorname*{argmin}_x L_\rho(x, z^k, y^k) = \operatorname*{argmin}_x \left\{ f(x) + y^{k\top}(Ax + Bz^k - c) + (\rho/2)\|Ax + Bz^k - c\|^2 \right\} \\
z^{k+1} &= \operatorname*{argmin}_z L_\rho(x^{k+1}, z, y^k) \\
y^{k+1} &= y^k + \rho (Ax^{k+1} + Bz^{k+1} - c)
\end{aligned}$$

This consists of a separate $x$-minimization step and $z$-minimization step. By contrast, the method of multipliers would minimize over $x$ and $z$ jointly:

$$\begin{aligned}
(x^{k+1}, z^{k+1}) &= \operatorname*{argmin}_{x, z} L_\rho(x, z, y^k) \\
y^{k+1} &= y^k + \rho (Ax^{k+1} + Bz^{k+1} - c)
\end{aligned}$$
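As a concrete instance, here is a minimal ADMM sketch for the lasso, a standard example of this form with $f(x) = \frac{1}{2}\|Dx - d\|^2$, $g(z) = \lambda\|z\|_1$, $A = I$, $B = -I$, $c = 0$; the data and parameters below are hypothetical. The $z$-step becomes soft-thresholding, the proximal operator of the $\ell_1$ norm.

```python
# Minimal ADMM sketch for the lasso: minimize (1/2)||Dx - d||^2 + lam*||z||_1
# subject to x - z = 0. All data here are hypothetical toy values.
import numpy as np

rng = np.random.default_rng(0)
p, n = 50, 20
D = rng.standard_normal((p, n))
d = rng.standard_normal(p)
lam, rho = 0.5, 1.0

x, z, y = np.zeros(n), np.zeros(n), np.zeros(n)
for k in range(200):
    # x-minimization step: quadratic, solved in closed form
    x = np.linalg.solve(D.T @ D + rho * np.eye(n), D.T @ d - y + rho * z)
    # z-minimization step: soft-thresholding at lam/rho
    v = x + y / rho
    z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
    # dual update with step size rho
    y = y + rho * (x - z)

print("||x - z|| =", np.linalg.norm(x - z))
```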
Scaled Form of the Alternating Direction Method of Multipliers (ADMM)
Let $r^k = Ax^k + Bz^k - c$ (the residual) and $u^k = (1/\rho) y^k$ (the scaled dual variable). Since completing the square gives $y^{k\top} r + (\rho/2)\|r\|^2 = (\rho/2)\|r + u^k\|^2 - (\rho/2)\|u^k\|^2$, the ADMM procedure can be rewritten as follows.

Procedure (scaled form of ADMM):

$$\begin{aligned}
x^{k+1} &= \operatorname*{argmin}_x L_\rho(x, z^k, y^k) \\
&= \operatorname*{argmin}_x \left\{ f(x) + y^{k\top}(Ax + Bz^k - c) + (\rho/2)\|Ax + Bz^k - c\|^2 \right\} \\
&= \operatorname*{argmin}_x \left\{ f(x) + (\rho/2)\|Ax + Bz^k - c + u^k\|^2 \right\} \\
z^{k+1} &= \operatorname*{argmin}_z L_\rho(x^{k+1}, z, y^k) \\
u^{k+1} &= u^k + (Ax^{k+1} + Bz^{k+1} - c) = u^k + r^{k+1}
\end{aligned}$$
Convergence of ADMM
Assumptions:

1 $f$ and $g$ are closed, proper, and convex
2 the unaugmented Lagrangian $L_0(x, z, y)$ has a saddle point

Results: as $k \to \infty$,

1 residual convergence: $r^k \to 0$
2 objective convergence: $f(x^k) + g(z^k) \to p^\star$
3 dual variable convergence: $y^k \to y^\star$, where $y^\star$ is a dual optimal point

In practice, a few tens of iterations will often produce a solution of modest accuracy, but it can be very slow to reach a high-accuracy solution.
Other Characteristics of ADMM
- optimality conditions and stopping criteria
- some variants:
  - varying penalty parameter $\rho^k$
  - more general augmenting terms
  - inexact $x$/$z$-minimization steps
  - ...
Remarkable Applications Discussed in the Following Chapters
Chap. 5: A problem with a more general constraint,

$$\begin{aligned}
\text{minimize}\quad & f(x) \\
\text{subject to}\quad & x \in \mathcal{C},
\end{aligned}$$

is transformed, using the indicator function $I_{\mathcal{C}}$ of $\mathcal{C}$, into

$$\begin{aligned}
\text{minimize}\quad & f(x) + I_{\mathcal{C}}(z) \\
\text{subject to}\quad & x - z = 0
\end{aligned}$$

Procedure:

$$\begin{aligned}
x^{k+1} &= \operatorname*{argmin}_x \left\{ f(x) + (\rho/2)\|x - z^k + u^k\|^2 \right\} \\
z^{k+1} &= \operatorname*{argmin}_{z \in \mathcal{C}} (\rho/2)\|x^{k+1} - z + u^k\|^2 \\
u^{k+1} &= u^k + (x^{k+1} - z^{k+1})
\end{aligned}$$

Note that the $z$-update is simply the Euclidean projection of $x^{k+1} + u^k$ onto $\mathcal{C}$.
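A minimal sketch of this pattern, taking nonnegative least squares as a hypothetical instance, with $f(x) = \frac{1}{2}\|Dx - d\|^2$ and $\mathcal{C} = \{x : x \ge 0\}$, so that the projection is just clipping at zero.

```python
# Minimal sketch of ADMM with a set constraint: nonnegative least squares,
# minimize (1/2)||Dx - d||^2 subject to x >= 0. Hypothetical toy data.
import numpy as np

rng = np.random.default_rng(0)
p, n = 40, 15
D = rng.standard_normal((p, n))
d = rng.standard_normal(p)
rho = 1.0

x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
for k in range(200):
    # x-update: unconstrained quadratic minimization in closed form
    x = np.linalg.solve(D.T @ D + rho * np.eye(n), D.T @ d + rho * (z - u))
    # z-update: Euclidean projection of x + u onto C = {z : z >= 0}
    z = np.maximum(x + u, 0.0)
    # scaled dual update
    u = u + (x - z)

print("min(z) =", z.min(), ", ||x - z|| =", np.linalg.norm(x - z))
```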
Chap. 7: The distributed (consensus) version. The problem

$$\text{minimize}\quad f(x) = \sum_{i=1}^{N} f_i(x)$$

is transformed into

$$\begin{aligned}
\text{minimize}\quad & \sum_{i=1}^{N} f_i(x_i) \\
\text{subject to}\quad & x_i - z = 0, \quad i = 1, \dots, N
\end{aligned}$$

Procedure:

$$\begin{aligned}
x_i^{k+1} &= \operatorname*{argmin}_{x_i} L_\rho(x, z^k, y^k) = \operatorname*{argmin}_{x_i} \left\{ f_i(x_i) + y_i^{k\top}(x_i - z^k) + (\rho/2)\|x_i - z^k\|^2 \right\} \\
z^{k+1} &= \frac{1}{N} \sum_{i=1}^{N} \left( x_i^{k+1} + (1/\rho) y_i^k \right) \\
y_i^{k+1} &= y_i^k + \rho (x_i^{k+1} - z^{k+1})
\end{aligned}$$
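A minimal consensus-ADMM sketch with hypothetical local least-squares terms $f_i(x_i) = \frac{1}{2}\|D_i x_i - d_i\|^2$; each $x_i$-update touches only local data (and could run on a separate worker), while the $z$-update is a simple average.

```python
# Minimal consensus ADMM sketch: N local least-squares objectives that must
# agree on a common z. All data here are hypothetical toy values.
import numpy as np

rng = np.random.default_rng(0)
N, p, n = 5, 30, 10
Ds = [rng.standard_normal((p, n)) for _ in range(N)]
ds = [rng.standard_normal(p) for _ in range(N)]
rho = 1.0

xs = [np.zeros(n) for _ in range(N)]
ys = [np.zeros(n) for _ in range(N)]
z = np.zeros(n)
for k in range(100):
    # local x_i-updates (independent; could run in parallel on N workers)
    xs = [np.linalg.solve(Ds[i].T @ Ds[i] + rho * np.eye(n),
                          Ds[i].T @ ds[i] - ys[i] + rho * z) for i in range(N)]
    # z-update: average of x_i + (1/rho) y_i
    z = sum(xs[i] + ys[i] / rho for i in range(N)) / N
    # dual updates
    ys = [ys[i] + rho * (xs[i] - z) for i in range(N)]

print("max_i ||x_i - z|| =", max(np.linalg.norm(xi - z) for xi in xs))
```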
Contents
4 General patterns
  Tips and tools used after this chapter
5 Constrained Convex Optimization
  How to incorporate general constraints
6 $\ell_1$-Norm Problems
  Discussion of problems involving the $\ell_1$ norm
7 Consensus and Sharing
  A framework for distributed optimization
8 Distributed Model Fitting
  Examples of distributed optimization
9 Nonconvex problems
10 Implementation
11 Numerical Examples
12 Conclusion