Bregman Iterative Methods, Lagrangian Connections, Dual Interpretations, and Applications

Ernie Esser

UCLA

6-30-09


Outline

• Bregman Iteration Overview
  • Method for Constrained Optimization
  • Compare to Denoising Application

• Linearized Bregman for l1-Minimization
  • Derivation and Equivalent Forms

• Lagrangian Connections
  • Bregman Iteration / Method of Multipliers
  • Linearized Bregman / Uzawa Method

• Dual Interpretations
  • Proximal Point Algorithm
  • Gradient Ascent


Outline Continued

• Split Bregman Idea
  • TV-l2 Example

• More General Separable Convex Programs
  • Split Bregman Connection to ADMM

• Convergence
  • Dual Interpretation
  • TV-l1 Minimization Example

• Decoupling Variables for More Explicit Algorithms
  • TV Deblurring Example
  • Compressive Sensing Example

• Connection to PDHG
  • Main Idea and Derivation

• Further Applications...


A Model Constrained Minimization Problem

min_u J(u)  s.t.  Ku = f

where J : R^m → (−∞, ∞] is closed proper convex, u ∈ R^m, K ∈ R^{s×m}, f ∈ R^s.

Examples:
• J(u) = ‖u‖_1 (Basis Pursuit)
• J(u) = ‖u‖_TV


Bregman Distance

D_J^{p^k}(u, u^k) = J(u) − J(u^k) − 〈p^k, u − u^k〉,  where p^k ∈ ∂J(u^k).

By definition of the subdifferential, p^k ∈ ∂J(u^k) means

J(v) − J(u^k) − 〈p^k, v − u^k〉 ≥ 0  ∀v


Bregman Iteration

u^{k+1} = arg min_u D_J^{p^k}(u, u^k) + (δ/2)‖Ku − f‖²

p^{k+1} = p^k − δK^T(Ku^{k+1} − f) ∈ ∂J(u^{k+1})

Equivalent form of the u^{k+1} update:

u^{k+1} = arg min_u J(u) − 〈p^k, u〉 + (δ/2)‖Ku − f‖²

Initialization: p^0 = 0, u^0 arbitrary

Ref: Yin, W., Osher, S., Goldfarb, D., Darbon, J., Bregman Iterative Algorithms for l1-Minimization with Applications to Compressed Sensing, UCLA CAM Report [07-37], 2007.

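As a concrete illustration, here is a minimal NumPy sketch of the Bregman iteration above (not from the slides): it takes J(u) = (1/2)‖u‖² so that the u-subproblem has a closed-form linear solve, and all function and variable names are our own. For l1 or TV choices of J, the linear solve would be replaced by an appropriate subproblem solver.

```python
import numpy as np

# Minimal sketch (not from the slides) of the Bregman iteration for
#   min_u J(u)  s.t.  Ku = f,
# illustrated with J(u) = 0.5*||u||^2 so that the subproblem
#   u^{k+1} = argmin_u J(u) - <p^k, u> + (delta/2)*||Ku - f||^2
# is the linear solve (I + delta*K^T K) u = p^k + delta*K^T f.

def bregman_iteration(K, f, delta=1.0, iters=200):
    s, m = K.shape
    u, p = np.zeros(m), np.zeros(m)           # p^0 = 0; p^k is a subgradient of J at u^k
    A = np.eye(m) + delta * (K.T @ K)         # system matrix for this quadratic J
    for _ in range(iters):
        u = np.linalg.solve(A, p + delta * (K.T @ f))   # u^{k+1}
        p = p - delta * K.T @ (K @ u - f)                # p^{k+1}
    return u

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = rng.standard_normal((5, 12))
    f = rng.standard_normal(5)
    u = bregman_iteration(K, f)
    print("constraint residual ||Ku - f|| =", np.linalg.norm(K @ u - f))
```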

Denoising Example

min_u ‖u‖_TV  s.t.  ‖u − f‖² ≤ σ²

Apply Bregman iteration to:

min_u ‖u‖_TV  s.t.  u = f

⇒ u^{k+1} = arg min_u ‖u‖_TV + (δ/2)‖u − f − p^k/δ‖²

p^{k+1} = p^k − δ(u^{k+1} − f)

‖u^k − f‖ → 0 monotonically

‖u^k − u*‖ is non-increasing while ‖u^k − f‖ ≥ ‖u* − f‖

⇒ Stop iterating when the constraint is satisfied


Linearized Bregman for l1-Minimization

Apply Bregman iteration to

min_u ‖u‖_1  s.t.  Ku = f

but replace (δ/2)‖Ku − f‖² with 〈δK^T(Ku^k − f), u〉 + (1/(2α))‖u − u^k‖²

⇒ u^{k+1} = arg min_u ‖u‖_1 + (1/(2α))‖u − u^k − αp^k + δαK^T(Ku^k − f)‖²

p^{k+1} = −u^{k+1}/α + u^k/α + p^k − δK^T(Ku^k − f)

Initialization: p^0 = 0, u^0 arbitrary

Ref: Osher, S., Mao, Y., Dong, B., and Yin, W., Fast Linearized Bregman Iteration for Compressive Sensing and Sparse Denoising, UCLA CAM Report [08-37], 2008.


Equivalent Form

Let v^k = p^{k+1} + u^{k+1}/α, with v^0 = δK^T f.

Can rewrite the linearized Bregman steps as

u^{k+1} = arg min_u ‖u‖_1 + (1/(2α))‖u − αv^k‖²

v^{k+1} = v^k − δK^T(Ku^{k+1} − f)

Remark 1: The algorithm actually solves

min_u ‖u‖_1 + (1/(2α))‖u‖²  s.t.  Ku = f

Remark 2: In practice, use µ‖u‖_1 instead of ‖u‖_1 for numerical reasons.


Soft Thresholding

Explicit formula for

S_α(z) = arg min_u ‖u‖_1 + (1/(2α))‖u − z‖₂²
       = z − α sign(z) if |z| > α, 0 otherwise.

Can use the Moreau decomposition to reinterpret S_α(z) in terms of a projection:

S_α(z) = z − α Π_{{z : ‖z‖_∞ ≤ 1}}(z/α),

where Π(z) = z / max(|z|, 1) is the orthogonal projection onto {z : ‖z‖_∞ ≤ 1}.

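The shrinkage formula above makes the linearized Bregman iteration of the previous two slides fully explicit. The following sketch (ours, not from the slides) runs it on a small random instance of min ‖u‖_1 s.t. Ku = f; the step size is a heuristic choice keeping δα‖K‖² ≤ 1, consistent with the gradient-ascent interpretation discussed later.

```python
import numpy as np

# Soft thresholding S_alpha(z) and the equivalent-form linearized Bregman
# iteration, sketched on a made-up sparse recovery instance (our example).

def soft_threshold(z, alpha):
    # S_alpha(z) = z - alpha*sign(z) if |z| > alpha, else 0 (componentwise)
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

def linearized_bregman(K, f, alpha=10.0, iters=3000):
    delta = 1.0 / (alpha * np.linalg.norm(K, 2) ** 2)  # keep delta*alpha*||K||^2 <= 1
    v = delta * K.T @ f                                # v^0 = delta*K^T f
    u = np.zeros(K.shape[1])
    for _ in range(iters):
        u = soft_threshold(alpha * v, alpha)           # u^{k+1} = argmin ||u||_1 + ||u - alpha*v^k||^2/(2*alpha)
        v = v - delta * K.T @ (K @ u - f)              # v^{k+1}
    return u

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    K = rng.standard_normal((20, 80))
    u_true = np.zeros(80)
    u_true[rng.choice(80, 5, replace=False)] = rng.standard_normal(5)
    f = K @ u_true
    u = linearized_bregman(K, f)
    print("residual:", np.linalg.norm(K @ u - f), " nonzeros:", np.count_nonzero(u))
```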

Some Convex Optimization References

• Bertsekas, D., Constrained Optimization and Lagrange Multiplier Methods, Athena Scientific, 1996.

• Bertsekas, D., Nonlinear Programming, Athena Scientific, Second Edition, 1999.

• Bertsekas, D., and Tsitsiklis, J., Parallel and Distributed Computation, Prentice Hall, 1989.

• Boyd, S., and Vandenberghe, L., Convex Optimization, Cambridge University Press, 2004.

• Ekeland, I., and Temam, R., Convex Analysis and Variational Problems, SIAM, Classics in Applied Mathematics, 28, 1999.

• Rockafellar, R. T., Convex Analysis, Princeton University Press, Princeton, NJ, 1970.


Legendre-Fenchel Transform

J*(p) = sup_w 〈p, w〉 − J(w)

Special case when J is a norm, J(w) = ‖w‖:

J*(p) = sup_w 〈p, w〉 − ‖w‖
      = 0 if 〈p, w〉 ≤ ‖w‖ ∀w, ∞ otherwise
      = 0 if sup_{‖w‖≤1} 〈p, w〉 ≤ 1, ∞ otherwise
      = 0 if ‖p‖_* ≤ 1 (by the dual norm definition), ∞ otherwise


Moreau Decomposition

Let f ∈ R^m and let J be a closed proper convex function on R^m. Then:

f = arg min_u { J(u) + (1/(2α))‖u − f‖₂² } + α [ arg min_p { J*(p) + (α/2)‖p − f/α‖₂² } ]

Sometimes written:

f = prox_{αJ}(f) + α prox_{J*/α}(f/α)

Ref: Combettes, P. L., and Wajs, V. R., Signal Recovery by Proximal Forward-Backward Splitting, Multiscale Modeling and Simulation, 2005.

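A quick numerical check of the decomposition (our illustration, not from the slides): for J = ‖·‖_1, prox_{αJ} is soft thresholding, and since J* is the indicator of the l∞ unit ball, the prox of J*/α is the projection onto that ball.

```python
import numpy as np

# Check f = prox_{aJ}(f) + a*prox_{J*/a}(f/a) for J = ||.||_1 (our illustration):
# prox_{aJ} is soft thresholding and the prox of J* is projection onto {p: ||p||_inf <= 1}.

def soft_threshold(z, a):
    return np.sign(z) * np.maximum(np.abs(z) - a, 0.0)

def project_linf_ball(p):
    return p / np.maximum(np.abs(p), 1.0)   # componentwise projection onto the l-infinity unit ball

rng = np.random.default_rng(2)
f = rng.standard_normal(10)
a = 0.7
recombined = soft_threshold(f, a) + a * project_linf_ball(f / a)
print("max decomposition error:", np.max(np.abs(f - recombined)))   # ~ machine precision
```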

Bregman / Method of Multipliers

Bregman iteration for min_u J(u) s.t. Ku = f:

u^{k+1} = arg min_u J(u) − 〈p^k, u〉 + (δ/2)‖Ku − f‖²

p^{k+1} = p^k − δK^T(Ku^{k+1} − f),  p^0 = 0

Equivalent to the method of multipliers:

u^{k+1} = arg min_u J(u) + 〈λ^k, Ku − f〉 + (δ/2)‖Ku − f‖²

λ^{k+1} = λ^k + δ(Ku^{k+1} − f),  λ^0 = 0

with p^k = −K^T λ^k ∀k.

Ref: Yin, W., Osher, S., Goldfarb, D., Darbon, J., Bregman Iterative Algorithms for l1-Minimization with Applications to Compressed Sensing, UCLA CAM Report [07-37], 2007.

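The equivalence can be verified numerically. The sketch below (ours) uses J(u) = (1/2)‖u‖² so that both u-subproblems are linear solves, and checks p^k = −K^T λ^k at every iteration.

```python
import numpy as np

# Numerical check (ours) of the Bregman / method-of-multipliers equivalence
# with J(u) = 0.5*||u||^2, where both u-subproblems are closed-form linear solves.
# The quantity verified each iteration is p^k = -K^T lambda^k.

rng = np.random.default_rng(3)
K = rng.standard_normal((4, 9))
f = rng.standard_normal(4)
delta = 0.5
A = np.eye(9) + delta * (K.T @ K)

p, lam = np.zeros(9), np.zeros(4)
for k in range(30):
    # Bregman step: u = argmin J(u) - <p, u> + (delta/2)||Ku - f||^2
    u_breg = np.linalg.solve(A, p + delta * (K.T @ f))
    p = p - delta * K.T @ (K @ u_breg - f)
    # Method of multipliers step: u = argmin J(u) + <lam, Ku - f> + (delta/2)||Ku - f||^2
    u_mom = np.linalg.solve(A, delta * (K.T @ f) - K.T @ lam)
    lam = lam + delta * (K @ u_mom - f)
    assert np.allclose(u_breg, u_mom) and np.allclose(p, -K.T @ lam)
print("p^k = -K^T lambda^k held for all iterations")
```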

Linearized Bregman / Uzawa

Linearized Bregman iteration for min_u J(u) + (1/(2α))‖u‖² s.t. Ku = f:

u^{k+1} = arg min_u J(u) + (1/(2α))‖u − αv^k‖²

v^{k+1} = v^k − δK^T(Ku^{k+1} − f),  v^0 = δK^T f

Equivalent to Uzawa's method:

u^{k+1} = arg min_u J(u) + (1/(2α))‖u‖² + 〈λ^k, Ku − f〉

λ^{k+1} = λ^k + δ(Ku^{k+1} − f),  λ^0 = −δf

with v^k = −K^T λ^k ∀k.

Ref: Cai, J.-F., Candès, E., and Shen, Z., A Singular Value Thresholding Algorithm for Matrix Completion, UCLA CAM Report [08-77], 2008.


Relevant Dual Functionals

The Lagrangian for min_u J(u) s.t. Ku = f is

L(u, λ) = J(u) + 〈λ, Ku − f〉

The dual functional is

q(λ) = inf_u L(u, λ) = −J*(−K^T λ) − 〈λ, f〉

Dual problem: max_λ q(λ)

Augmented Lagrangian:

L_δ(u, λ) = L(u, λ) + (δ/2)‖Ku − f‖²

q_δ(λ) = inf_u L_δ(u, λ)


Proximal Point Interpretation

L_δ(u, λ^k) = max_y L(u, y) − (1/(2δ))‖y − λ^k‖²  ⇒  y* = λ^k + δ(Ku − f)

min_u max_y L(u, y) − (1/(2δ))‖y − λ^k‖²  is attained at (u^{k+1}, λ^{k+1})

⇒ max_y q(y) − (1/(2δ))‖y − λ^k‖²  is attained at λ^{k+1}

⇒ λ^{k+1} = arg max_y q(y) − (1/(2δ))‖y − λ^k‖²

(Proximal point algorithm for maximizing q(λ))


Gradient Ascent Interpretation

q_δ(λ) = max_y q(y) − (1/(2δ))‖y − λ‖² can be shown to be differentiable.

∇q_δ(λ^k) = −[ λ^k − arg max_y ( q(y) − (1/(2δ))‖y − λ^k‖² ) ] / δ = (λ^{k+1} − λ^k)/δ

⇒ λ^{k+1} = λ^k + δ∇q_δ(λ^k)


Dual Functional for Linearized Bregman

Let J_LB(u) = J(u) + (1/(2α))‖u‖².

The Lagrangian for min_u J_LB(u) s.t. Ku = f is

L_LB(u, λ) = J_LB(u) + 〈λ, Ku − f〉

The dual functional is

q_LB(λ) = −J*_LB(−K^T λ) − 〈λ, f〉

Remark: Since J_LB is strongly convex with modulus 1/α, J*_LB is differentiable and ∇J*_LB is Lipschitz with constant α; hence ∇q_LB is Lipschitz with constant α‖K‖².


Gradient Ascent Interpretation

From the optimality condition for the Lagrangian form of the u^{k+1} update,

u^{k+1} = arg min_u J(u) + (1/(2α))‖u‖² + 〈λ^k, Ku − f〉,

we have 0 ∈ ∂J_LB(u^{k+1}) + K^T λ^k.

Using the definitions of the Legendre transform and the subdifferential,

u^{k+1} = ∇J*_LB(−K^T λ^k),  so  ∇q_LB(λ^k) = Ku^{k+1} − f.

Can therefore interpret

λ^{k+1} = λ^k + δ(Ku^{k+1} − f)  as  λ^{k+1} = λ^k + δ∇q_LB(λ^k).

Ref: Yin, W., Analysis and Generalizations of the Linearized Bregman Method, UCLA CAM Report [09-42], May 2009.


Split Bregman Idea

Example: Total Variation Denoising

min_u "‖∇u‖_1" + (λ/2)‖u − f‖²

Reformulate as

min_{w,u} "‖w‖_1" + (λ/2)‖u − f‖²  s.t.  w = ∇u

Apply Bregman iteration to the constrained problem, but use alternating minimization with respect to w and u.

Ref: Goldstein, T., and Osher, S., The Split Bregman Algorithm for L1 Regularized Problems, UCLA CAM Report [08-29], April 2008.

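A compact sketch of the resulting alternating minimization for TV-l2 denoising is given below. To keep it short it uses anisotropic TV (so the w-subproblem is a componentwise shrinkage) and periodic-boundary forward differences (so the u-subproblem is a single FFT solve); these simplifications, the parameter values and the toy image are our choices, not the slides'.

```python
import numpy as np

# Split Bregman sketch (ours) for anisotropic TV-l2 denoising
#   min_u ||D_x u||_1 + ||D_y u||_1 + (lam/2)||u - f||^2
# with periodic-boundary forward differences, so the u-subproblem is an FFT solve.
# The isotropic TV of the slides would use a vector shrinkage for (w_x, w_y) instead.

def grad(u):
    return np.roll(u, -1, axis=0) - u, np.roll(u, -1, axis=1) - u

def div(px, py):
    # div = -grad^T for the periodic forward differences above
    return (px - np.roll(px, 1, axis=0)) + (py - np.roll(py, 1, axis=1))

def shrink(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def split_bregman_tv(f, lam=4.0, alpha=2.0, iters=100):
    m, n = f.shape
    kx = np.abs(1 - np.exp(-2j * np.pi * np.fft.fftfreq(m)))[:, None] ** 2
    ky = np.abs(1 - np.exp(-2j * np.pi * np.fft.fftfreq(n)))[None, :] ** 2
    denom = lam + alpha * (kx + ky)              # Fourier symbol of lam*I + alpha*D^T D
    u = f.copy()
    wx, wy, bx, by = (np.zeros_like(f) for _ in range(4))
    for _ in range(iters):
        rhs = lam * f - alpha * div(wx - bx, wy - by)
        u = np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))                  # u-subproblem
        ux, uy = grad(u)
        wx, wy = shrink(ux + bx, 1.0 / alpha), shrink(uy + by, 1.0 / alpha)  # w-subproblem
        bx, by = bx + ux - wx, by + uy - wy                                  # Bregman updates
    return u

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    clean = np.zeros((64, 64)); clean[16:48, 16:48] = 1.0
    noisy = clean + 0.3 * rng.standard_normal(clean.shape)
    u = split_bregman_tv(noisy)
    print("noisy error:", np.linalg.norm(noisy - clean), " denoised error:", np.linalg.norm(u - clean))
```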

Discrete TV Seminorm Notation

‖u‖_TV = Σ_{p=1}^{M_r} Σ_{q=1}^{M_c} √( (D_1^+ u_{p,q})² + (D_2^+ u_{p,q})² )

Vectorize the M_r × M_c matrix by stacking columns: the (p, q) element of the matrix ↔ element (q − 1)M_r + p of the vector.

Define a grid-shaped graph with m nodes corresponding to the elements (p, q). Index the nodes by (q − 1)M_r + p and the edges arbitrarily. For each edge η with endpoint indices (i, j), i < j, define:

D_{η,k} = −1 for k = i,  1 for k = j,  0 for k ≠ i, j.

Also define E ∈ R^{e×m} such that

E_{η,k} = 1 if D_{η,k} = −1,  0 otherwise.


TV Notation (continued)

Define the norm ‖w‖_E = Σ_{k=1}^m √( (E^T(w²))_k ), where w² is taken componentwise.

Then the TV seminorm can be rewritten as

‖u‖_TV = ‖Du‖_E

The dual norm is defined by

‖p‖_{E*} = ‖√(E^T(p²))‖_∞

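The following sketch (ours) builds the matrices D and E for an M_r × M_c grid with forward-difference edges using scipy.sparse and checks that ‖Du‖_E reproduces the direct forward-difference formula for ‖u‖_TV. The column-stacking index follows the slide; the edge ordering and helper names are arbitrary choices.

```python
import numpy as np
import scipy.sparse as sp

# Build the edge-node difference matrix D and the matrix E for an Mr x Mc grid
# with forward-difference edges, then evaluate ||u||_TV = ||Du||_E (our sketch).

def grid_incidence(Mr, Mc):
    idx = np.arange(Mr * Mc).reshape(Mc, Mr).T          # idx[p, q] = q*Mr + p (0-based column stacking)
    edges = []
    for q in range(Mc):
        for p in range(Mr):
            if p + 1 < Mr: edges.append((idx[p, q], idx[p + 1, q]))   # vertical forward difference
            if q + 1 < Mc: edges.append((idx[p, q], idx[p, q + 1]))   # horizontal forward difference
    e = len(edges)
    rows = np.repeat(np.arange(e), 2)
    cols = np.array(edges).ravel()
    vals = np.tile([-1.0, 1.0], e)                      # D_{eta,i} = -1, D_{eta,j} = +1 with i < j
    D = sp.csr_matrix((vals, (rows, cols)), shape=(e, Mr * Mc))
    E = sp.csr_matrix((np.ones(e), (np.arange(e), [i for i, _ in edges])),
                      shape=(e, Mr * Mc))               # E_{eta,k} = 1 where D_{eta,k} = -1
    return D, E

def tv_norm(u_vec, D, E):
    w = D @ u_vec
    return np.sum(np.sqrt(E.T @ (w ** 2)))              # ||w||_E = sum_k sqrt((E^T w^2)_k)

if __name__ == "__main__":
    Mr, Mc = 5, 4
    rng = np.random.default_rng(5)
    U = rng.standard_normal((Mr, Mc))
    u_vec = U.flatten(order="F")                        # stack columns
    D, E = grid_incidence(Mr, Mc)
    direct = np.sum(np.sqrt(np.diff(U, axis=0, append=U[-1:]) ** 2
                            + np.diff(U, axis=1, append=U[:, -1:]) ** 2))
    print(tv_norm(u_vec, D, E), direct)                 # the two values should agree
```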

Convex Programs with Separable Structure

min_u J(u)  s.t.  Ku = f,   J(u) = H(u) + Σ_{i=1}^N G_i(A_i u + b_i)

Rewrite as

min_{z,u} F(z) + H(u)  s.t.  Bz + Au = b

where F(z) = Σ_{i=1}^N G_i(z_i), z = (z_1, ..., z_N) stacked vertically, B = [−I; 0],

A = [A_1; ...; A_N; K], and b = (−b_1, ..., −b_N, f) stacked vertically.


Application of Bregman Iteration

Apply Bregman iteration to:

min_{z,u} F(z) + H(u)  s.t.  Bz + Au = b

(z^{k+1}, u^{k+1}) = arg min_{z∈R^n, u∈R^m} F(z) − F(z^k) − 〈p_z^k, z − z^k〉 + H(u) − H(u^k) − 〈p_u^k, u − u^k〉 + (α/2)‖b − Au − Bz‖²

p_z^{k+1} = p_z^k + αB^T(b − Au^{k+1} − Bz^{k+1})

p_u^{k+1} = p_u^k + αA^T(b − Au^{k+1} − Bz^{k+1})

Initialization: p_z^0 = 0, p_u^0 = 0


Augmented Lagrangian Form

The augmented Lagrangian is given by

L_α(z, u, λ) = F(z) + H(u) + 〈λ, Au + Bz − b〉 + (α/2)‖Au + Bz − b‖²

Then (z^{k+1}, u^{k+1}) can be equivalently updated by

(z^{k+1}, u^{k+1}) = arg min_{z,u} L_α(z, u, λ^k)

λ^{k+1} = λ^k + α(Au^{k+1} + Bz^{k+1} − b),  λ^0 = 0,

which is the method of multipliers.

Equivalence to Bregman iteration again follows from p_z^k = −B^T λ^k and p_u^k = −A^T λ^k.

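A generic sketch of the resulting loop (ours, not from the slides): the z- and u-subproblems are passed in as callables since they depend on F, H, A and B; the demo uses quadratic F and H so both subproblems are plain linear solves, while l1-type F would use a shrinkage as in the TV-l1 example later.

```python
import numpy as np

# Generic ADMM / split Bregman loop for  min_{z,u} F(z) + H(u)  s.t.  Bz + Au = b,
# with the subproblem solvers supplied by the caller (our sketch).

def admm(F_solve, H_solve, A, B, b, alpha=1.0, iters=200):
    lam = np.zeros(b.size)
    z, u = np.zeros(B.shape[1]), np.zeros(A.shape[1])
    for _ in range(iters):
        z = F_solve(u, lam)     # argmin_z F(z) + <lam, Bz> + (alpha/2)||A u^k + Bz - b||^2
        u = H_solve(z, lam)     # argmin_u H(u) + <lam, Au> + (alpha/2)||Au + B z^{k+1} - b||^2
        lam = lam + alpha * (A @ u + B @ z - b)
    return z, u, lam

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    n, m, s = 6, 8, 5
    A = rng.standard_normal((s, m)); B = rng.standard_normal((s, n)); b = rng.standard_normal(s)
    alpha = 1.0
    # F(z) = 0.5||z||^2 and H(u) = 0.5||u||^2 give closed-form subproblems:
    F_solve = lambda u, lam: np.linalg.solve(np.eye(n) + alpha * B.T @ B,
                                             B.T @ (alpha * (b - A @ u) - lam))
    H_solve = lambda z, lam: np.linalg.solve(np.eye(m) + alpha * A.T @ A,
                                             A.T @ (alpha * (b - B @ z) - lam))
    z, u, lam = admm(F_solve, H_solve, A, B, b, alpha)
    print("constraint residual:", np.linalg.norm(A @ u + B @ z - b))
```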

ADMM / Split Bregman

Alternate the minimization with respect to u and z:

Theorem 1 (Eckstein, Bertsekas). Suppose B has full column rank and H(u) + ‖Au‖² is strictly convex. Let λ^0 and u^0 be arbitrary and let α > 0. Suppose we are also given sequences {µ_k} and {ν_k} such that µ_k ≥ 0, ν_k ≥ 0, Σ_{k=0}^∞ µ_k < ∞ and Σ_{k=0}^∞ ν_k < ∞. Suppose that

‖z^{k+1} − arg min_{z∈R^n} F(z) + 〈λ^k, Bz〉 + (α/2)‖Au^k + Bz − b‖²‖ ≤ µ_k    (1)

‖u^{k+1} − arg min_{u∈R^m} H(u) + 〈λ^k, Au〉 + (α/2)‖Au + Bz^{k+1} − b‖²‖ ≤ ν_k    (2)

λ^{k+1} = λ^k + α(Au^{k+1} + Bz^{k+1} − b).    (3)

If there exists a saddle point of L(z, u, λ), then z^k → z*, u^k → u* and λ^k → λ*, where (z*, u*, λ*) is such a saddle point. If no such saddle point exists, then at least one of the sequences {u^k} or {λ^k} must be unbounded.

Ref: Eckstein, J., and Bertsekas, D., On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming 55, North-Holland, 1992.


Dual Functional

q(λ) = inf_{z,u} F(z) + H(u) + 〈λ, Au + Bz − b〉 = −F*(−B^T λ) − 〈λ, b〉 − H*(−A^T λ)

λ* is optimal if

0 ∈ −B∂F*(−B^T λ*) + b − A∂H*(−A^T λ*)

Let Ψ(λ) = −B∂F*(−B^T λ) + b and φ(λ) = −A∂H*(−A^T λ).


Douglas-Rachford Splitting

Formally apply Douglas-Rachford splitting with α as the time step:

0 ∈ (r^{k+1} − λ^k)/α + Ψ(r^{k+1}) + φ(λ^k),

0 ∈ (λ^{k+1} − λ^k)/α + Ψ(r^{k+1}) + φ(λ^{k+1})

Remark: There are possibly many ways to satisfy the above iterations, but ADMM satisfies them in a particular way:

r^{k+1} = (I + αΨ)^{−1}(λ^k + αAu^k)

λ^{k+1} = (I + αφ)^{−1}(r^{k+1} − αAu^k)


Reformulation of DR Splitting

r^{k+1} = arg min_r F*(−B^T r) + 〈r, b〉 + (1/(2α))‖r − λ^k + αq^k‖²

λ^{k+1} = arg min_λ H*(−A^T λ) + (1/(2α))‖λ − r^{k+1} − αq^k‖²

q^{k+1} = q^k + (1/α)(r^{k+1} − λ^{k+1})

Remark: The 'full column rank' and 'strictly convex' assumptions are not needed to guarantee that λ^k converges to a solution of the dual problem.

Ref: Eckstein, J., Splitting Methods for Monotone Operators with Applications to Parallel Optimization, Ph.D. Thesis, Massachusetts Institute of Technology, Dept. of Civil Engineering, http://hdl.handle.net/1721.1/14356, 1989.


TV-l1 Example

min_u ‖u‖_TV + β‖Ku − f‖_1

Rewrite as

min_u ‖Du‖_E + β‖Ku − f‖_1

Let z = [w; v] = [Du; Ku − f],  B = −I,  A = [D; K],  b = [0; f]

to put it in the form min_{z,u} F(z) + H(u) s.t. Bz + Au = b.

Introduce the dual variable λ = [p; q].

A solution exists assuming ker(D) ∩ ker(K) = {0}.


Augmented Lagrangian and ADMM Iterations

L(z, u, λ) = ‖w‖_E + β‖v‖_1 + 〈p, Du − w〉 + 〈q, Ku − f − v〉 + (α/2)‖w − Du‖² + (α/2)‖v − Ku + f‖²

The ADMM iterations are given by

w^{k+1} = arg min_w ‖w‖_E + (α/2)‖w − Du^k − p^k/α‖²

v^{k+1} = arg min_v β‖v‖_1 + (α/2)‖v − Ku^k + f − q^k/α‖²

u^{k+1} = arg min_u (α/2)‖Du − w^{k+1} + p^k/α‖² + (α/2)‖Ku − v^{k+1} − f + q^k/α‖²

p^{k+1} = p^k + α(Du^{k+1} − w^{k+1})

q^{k+1} = q^k + α(Ku^{k+1} − f − v^{k+1}),

where p^0 = q^0 = 0, u^0 is arbitrary and α > 0.


Explicit Iterations

The explicit formulas for w^{k+1}, v^{k+1} and u^{k+1} are given by

w^{k+1} = S̃_{1/α}(Du^k + p^k/α)

v^{k+1} = S_{β/α}(Ku^k − f + q^k/α)

u^{k+1} = (−Δ + K^T K)^{−1}( D^T w^{k+1} − D^T p^k/α + K^T(v^{k+1} + f) − K^T q^k/α )
        = (−Δ + K^T K)^{−1}( D^T w^{k+1} + K^T(v^{k+1} + f) ),

where −Δ = D^T D, and the second equality uses D^T p^k + K^T q^k = 0, which holds since p^0 = q^0 = 0 and, for k ≥ 1, follows from the optimality condition for the previous u update. Here

S̃_c(f) = f − c Π_{{p : ‖p‖_{E*} ≤ 1}}(f/c),

Π_{{p : ‖p‖_{E*} ≤ 1}}(p) = p / ( E max(√(E^T(p²)), 1) ),

with the division taken componentwise.


TV-l1 Results

[Figure: input f and result u, TV-l1 minimization of a 512 × 512 synthetic image]

Image Size   Iterations   Time
64 × 64      40           1s
128 × 128    51           5s
256 × 256    136          78s
512 × 512    359          836s

Iterations until ‖u^k − u^{k−1}‖_∞ ≤ .5, ‖Du^k − w^k‖_∞ ≤ .5 and ‖v^k − u^k + f‖_∞ ≤ .5

β = .6, .3, .15 and .075; α = .02, .01, .005 and .0025


Decoupling Variables

One can add additional proximal-like penalties to the ADMM iterations and obtain a more explicit algorithm that still converges.

Given a step of the ADMM algorithm of the form

u^{k+1} = arg min_u J(u) + 〈λ^k, Ku − f〉 + (α/2)‖Ku − f‖²,

modify the objective functional by adding

(1/2)〈u − u^k, (1/δ − αK^T K)(u − u^k)〉,

where δ is chosen such that 0 < δ < 1/(α‖K^T K‖).

The modified update is given by

u^{k+1} = arg min_u J(u) + 〈λ^k, Ku − f〉 + (1/(2δ))‖u − u^k + αδK^T(Ku^k − f)‖².

Ref: Zhang, X., Burger, M., Bresson, X., Osher, S., Bregmanized Nonlocal Regularization for Deconvolution and Sparse Reconstruction, UCLA CAM Report [09-03], 2009.

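For example (our sketch, not from the slides), with J = ‖·‖_1 the modified update reduces to a gradient step on the quadratic terms followed by a soft threshold, so no linear system involving K^T K has to be solved:

```python
import numpy as np

# Decoupled / linearized u-step for J = ||.||_1 (our illustration): the modified update
#   u^{k+1} = argmin_u ||u||_1 + <lam, Ku - f> + (1/(2*delta))||u - u^k + alpha*delta*K^T(K u^k - f)||^2
# equals a single soft threshold of an explicit point.

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def decoupled_u_step(u, lam, K, f, alpha, delta):
    c = u - alpha * delta * K.T @ (K @ u - f) - delta * K.T @ lam   # explicit gradient-type move
    return soft_threshold(c, delta)                                  # then shrink

rng = np.random.default_rng(7)
K = rng.standard_normal((4, 10)); f = rng.standard_normal(4)
alpha = 1.0
delta = 0.9 / (alpha * np.linalg.norm(K, 2) ** 2)   # 0 < delta < 1/(alpha*||K^T K||)
u, lam = np.zeros(10), np.zeros(4)
print(decoupled_u_step(u, lam, K, f, alpha, delta))
```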

Convex Constraints as Indicator Functions

Given a constraint of the form u ∈ S, where S is convex, we can enforce the constraint by adding to the objective functional the indicator function for S,

H(u) = 0 if u ∈ S, ∞ otherwise.

One can then develop algorithms that project onto the constraint set at each iteration, or, in combination with the decoupling trick, handle the constraint in a more explicit manner.


Example Constraint

Suppose the constraint is ‖Ku − f‖ ≤ ε, so S = {u : ‖Ku − f‖ ≤ ε}.

Π_S(z) = (I − K†K)z + K† · ( Kz if ‖Kz − f‖ ≤ ε,  f + r (Kz − KK†f)/‖Kz − KK†f‖ otherwise ),

where r = √( ε² − ‖(I − KK†)f‖₂² ).

By decoupling variables, the projection step can be simplified to

Π_{{z : ‖z − f‖₂ ≤ ε}}(z) = f + (z − f) / max( ‖z − f‖₂/ε, 1 ).

Useful when K† is not easy to compute.

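A small helper implementing the simplified projection (our code; the test data are arbitrary):

```python
import numpy as np

# The simplified projection above: project z onto {z : ||z - f||_2 <= eps}.
def project_ball(z, f, eps):
    return f + (z - f) / max(np.linalg.norm(z - f) / eps, 1.0)

# quick check: the result lies in the ball, and points already inside are unchanged
rng = np.random.default_rng(8)
f = rng.standard_normal(6); z = f + 3.0 * rng.standard_normal(6); eps = 1.0
p = project_ball(z, f, eps)
print(np.linalg.norm(p - f) <= eps + 1e-12, np.allclose(project_ball(f, f, eps), f))
```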

TV Deblurring Example

min_u ‖u‖_TV  s.t.  ‖Ku − f‖ ≤ ε

Rewrite as

min_u ‖Du‖_E + H(Ku),  where H(z) = 0 if ‖z − f‖ ≤ ε, ∞ otherwise.

Let T = {z : ‖z − f‖ ≤ ε} and X = {p : ‖p‖_{E*} ≤ 1}.

Saddle point problem from the Lagrangian:

max_{p,q} inf_{u,w,z} ‖w‖_E + 〈p, Du − w〉 + H(z) + 〈q, Ku − z〉


Split Inexact Uzawa Method for TV Deblurring

u^{k+1} = arg min_u (δ/2)‖Du − w^k‖² + (δ/2)‖Ku − z^k‖² + 〈p^k, Du〉 + 〈q^k, Ku〉 + (1/2)〈u − u^k, [ (1/α − δD^T D) + (1/α − δK^T K) ](u − u^k)〉

w^{k+1} = arg min_w ‖w‖_E + (δ/2)‖w − Du^{k+1} − p^k/δ‖²

z^{k+1} = arg min_z H(z) + (δ/2)‖z − Ku^{k+1} − q^k/δ‖²

p^{k+1} = p^k + δ(Du^{k+1} − w^{k+1})

q^{k+1} = q^k + δ(Ku^{k+1} − z^{k+1})

If D ∼ ∇ and K is a normalized blurring operator, we just need 0 < α < 1/(4δ).

Ref: Zhang, X., A Unified Primal-Dual Algorithm Based on l1 and Bregman Iteration, (Private Communication), April 2009.


TV Deblurring Algorithm (continued)

Use two applications of the Moreau decomposition to rewrite the previous algorithm in terms of projections onto T and X:

u^{k+1} = u^k − (α/2)[ D^T(2p^k − p^{k−1}) + K^T(2q^k − q^{k−1}) ]

p^{k+1} = Π_X(p^k + δDu^{k+1})

q^{k+1} = (q^k + δKu^{k+1}) − δΠ_T(q^k/δ + Ku^{k+1})

Remark: This can require more iterations than a more implicit algorithm, but it has the advantage of only requiring matrix multiplications and simple projections.


Compressive Sensing Example

min_z ‖Ψz‖_1  s.t.  ‖R_Γ z − f‖_2 ≤ ε,

where R_Γ is the measurement matrix and we expect Ψz to be sparse.

Let J = ‖·‖_1, A = Ψ, K = R_Γ and

H(x) = 0 if ‖x − f‖_2 ≤ ε, ∞ otherwise.

⇒ Just like the deblurring example.

If Ψ^T Ψ = I (tight frame), can choose to handle it implicitly.

If Γ is a discrete Fourier transform, can handle K implicitly too.


Connections to PDHG

Since J** = J, J(Au) = J**(Au) = sup_p 〈p, Au〉 − J*(p).

Can therefore obtain the following saddle point problem from min_u J(Au) + H(u):

min_u sup_p −J*(p) + 〈p, Au〉 + H(u).

The Primal-Dual Hybrid Gradient algorithm then alternates primal and dual proximal steps of the form:

p^{k+1} = arg max_p −J*(p) + 〈p, Au^k〉 − (1/(2δ_k))‖p − p^k‖₂²

u^{k+1} = arg min_u H(u) + 〈A^T p^{k+1}, u〉 + (1/(2α_k))‖u − u^k‖₂²

Ref: Zhu, M., and Chan, T., An Efficient Primal-Dual Hybrid Gradient Algorithm for Total Variation Image Restoration, UCLA CAM Report [08-34], May 2008.

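A generic sketch of this alternation (ours, not the Zhu-Chan implementation) on a small 1-D problem min_u ‖Au‖_1 + (λ/2)‖u − f‖² with A a forward-difference matrix: since J = ‖·‖_1, the dual prox is a projection onto the l∞ ball, and fixed step sizes with δα‖A‖² ≤ 1 are used instead of the varying steps proposed in the reference.

```python
import numpy as np

# PDHG sketch (ours): alternate the dual and primal proximal steps above for
#   min_u ||A u||_1 + (lam/2)||u - f||^2,  A = 1-D forward differences.

def pdhg(A, f, lam=2.0, iters=500):
    alpha = 1.0 / np.linalg.norm(A, 2)
    delta = 1.0 / np.linalg.norm(A, 2)            # delta*alpha*||A||^2 = 1
    u, p = f.copy(), np.zeros(A.shape[0])
    for _ in range(iters):
        p = np.clip(p + delta * (A @ u), -1.0, 1.0)                 # dual prox: project onto ||p||_inf <= 1
        u = (lam * f + u / alpha - A.T @ p) / (lam + 1.0 / alpha)   # primal prox for H = (lam/2)||u - f||^2
    return u

if __name__ == "__main__":
    rng = np.random.default_rng(9)
    n = 50
    f = np.concatenate([np.zeros(25), np.ones(25)]) + 0.2 * rng.standard_normal(n)
    A = np.diff(np.eye(n), axis=0)                # (A u)_i = u_{i+1} - u_i
    lam = 2.0
    u = pdhg(A, f, lam=lam)
    print("objective:", np.sum(np.abs(A @ u)) + lam / 2 * np.sum((u - f) ** 2))
```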

Figure 1: PDHG-Related Algorithm Framework. The original slide is a diagram; its recoverable content is summarized below (arrows and layout not reproduced).

Problem formulations:

(P)   min_u F_P(u),  F_P(u) = J(Au) + H(u)
(D)   max_p F_D(p),  F_D(p) = −J*(p) − H*(−A^T p)
(PD)  min_u sup_p L_PD(u, p),  L_PD(u, p) = 〈p, Au〉 − J*(p) + H(u)
(SPP) max_p inf_{u,w} L_P(u, w, p),  L_P(u, w, p) = J(w) + H(u) + 〈p, Au − w〉
(SPD) max_u inf_{p,y} L_D(p, y, u),  L_D(p, y, u) = J*(p) + H*(y) + 〈u, −A^T p − y〉

Algorithm boxes and equivalences appearing in the diagram:
• AMA on (SPP) ⇔ PFBS on (D);  AMA on (SPD) ⇔ PFBS on (P)
• Relaxed AMA on (SPP);  Relaxed AMA on (SPD)
• ADMM on (SPP) ⇔ Douglas-Rachford on (D);  ADMM on (SPD) ⇔ Douglas-Rachford on (P)
• Primal-Dual Proximal Point on (PD) ⇔ PDHG
• Split Inexact Uzawa on (SPP) ⇔ PDHGMp;  Split Inexact Uzawa on (SPD) ⇔ PDHGMu

Modification terms labeling the arrows between boxes:
+ (1/(2α))‖u − u^k‖₂²,  + (1/(2δ))‖p − p^k‖₂²,
+ (δ/2)‖Au − w‖₂²,  + (α/2)‖A^T p + y‖₂²,
+ (1/2)〈u − u^k, (1/α − δA^T A)(u − u^k)〉,  + (1/2)〈p − p^k, (1/δ − αAA^T)(p − p^k)〉,
and the extrapolations p^{k+1} → 2p^{k+1} − p^k, u^k → 2u^k − u^{k−1}.

Legend: (P): Primal, (D): Dual, (PD): Primal-Dual, (SPP): Split Primal, (SPD): Split Dual.
AMA: Alternating Minimization Algorithm (4.2.1), PFBS: Proximal Forward Backward Splitting (4.2.1), ADMM: Alternating Direction Method of Multipliers (4.2.2), PDHG: Primal Dual Hybrid Gradient (4.2), PDHGM: Modified PDHG (4.2.3). ⇒: Well Understood Convergence Properties.


Types of Applications

• Convex programs that decompose into problems of the form

  min_{z,u} F(z) + H(u)  s.t.  Au + Bz = b

• Especially useful for problems involving convex constraints and l2 and l1-like terms that can be separated

• l1-like terms include the TV seminorm, Besov norms and even the nuclear norm, which is the l1 norm of the singular values of a matrix

• Can also apply these algorithms to convex relaxations of non-convex problems


Sparse Approximation

These algorithms are useful for functionals involving multiple l1-like terms, which can arise when modelling signals as sums of sparse signals in different representations:

• TV-l1

• Cartoon / Texture Decomposition: sparse ∇ + sparse Fourier coefficients

• Background Video Detection: min ‖A‖_nuclear + λ‖E‖_1 s.t. A + E ∼ original video; low rank (background) + sparse error (foreground)

Ref: Osher, S., Sole, A., Vese, L., Image Decomposition and Restoration Using Total Variation Minimization and the H−1 Norm, UCLA CAM Report [02-57].
Ref: Talk by John Wright.


Nonlocal Total Variation

The graph definition of the discrete TV seminorm makes it straightforward to extend these algorithms to nonlocal TV minimization problems.

‖u‖_TV = ‖Du‖_E

Simply redefine the edge-node adjacency matrix D.

Let A be the adjacency matrix for the new set of edges, redefine E accordingly, and let W be a diagonal matrix of precomputed nonnegative weights on the edges. Then

‖u‖_NLTV = ‖√W A u‖_E


Convexification of Image Segmentation

min_u ‖u‖_TV + λ‖u(c_1 − f)‖² + λ‖(1 − u)(c_2 − f)‖²  s.t.  u binary

min_u ‖u‖_TV + λ〈(c_1 − f)² − (c_2 − f)², u〉  s.t.  0 ≤ u ≤ 1

The convexification idea also extends to active contours and multiphase segmentation.

Ref: Burger, M., and Hintermüller, M., Projected Gradient Flows for BV / Level Set Relaxation, UCLA CAM Report [05-40], 2005.
Ref: Goldstein, T., Bresson, X., Osher, S., Geometric Applications of the Split Bregman Method: Segmentation and Surface Reconstruction, UCLA CAM Report [09-06], 2009.


Convexification of Image Registration

Given images u and φ, minimize

‖φ(x − v) − u(x)‖² + (γ/2)‖∇v_1‖² + (γ/2)‖∇v_2‖²

with respect to the displacement field v.

Obtain a convex relaxation by adding edges with unknown weights c_{i,j} such that

(v_1^i, v_2^i) = ( x_1^i − Σ_{j∼i} c_{i,j} y_1^j,  x_2^i − Σ_{j∼i} c_{i,j} y_2^j )

[Figure: the images φ and u with labeled grid coordinates (x_1, x_2) and (y_1, y_2)]

F(c) = ‖A_φ c − u‖² + (γ/2)‖D(A_{y_1} c − x_1)‖² + (γ/2)‖D(A_{y_2} c − x_2)‖²

such that c_{i,j} ≥ 0 and Σ_{j∼i} c_{i,j} = 1.
