Aug Lagrangian and ADMM



    ST810 Lecture 24

    Augmented Lagrangian method

    Outline

    Augmented Lagrangian method

    ADMM

    Final words


    Augmented Lagrangian method

Consider minimizing $f(x)$ subject to equality constraints $g_i(x) = 0$ for $i = 1, \dots, q$.

    Inequality constraints are ignored for simplicity

Assume $f$ and the $g_i$ are smooth for simplicity

    At a constrained minimum, the Lagrange multiplier condition

$$0 = \nabla f(x) + \sum_{i=1}^{q} \lambda_i \nabla g_i(x)$$

holds provided the gradients $\nabla g_i(x)$ are linearly independent
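As a small worked instance (not from the slides): minimize $f(x) = x_1^2 + x_2^2$ subject to the single constraint $g_1(x) = x_1 + x_2 - 1 = 0$.

```latex
% Toy illustration of the multiplier condition (not from the slides).
\begin{align*}
0 &= \nabla f(x) + \lambda_1 \nabla g_1(x)
   = \begin{pmatrix} 2x_1 \\ 2x_2 \end{pmatrix}
     + \lambda_1 \begin{pmatrix} 1 \\ 1 \end{pmatrix},
\qquad g_1(x) = x_1 + x_2 - 1 = 0 \\
  &\Longrightarrow\quad x_1 = x_2 = \tfrac{1}{2}, \quad \lambda_1 = -1 .
\end{align*}
```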


    Augmented Lagrangian:

$$\mathcal{L}_\rho(x, \lambda) = f(x) + \sum_{i=1}^{q} \lambda_i g_i(x) + \frac{\rho}{2} \sum_{i=1}^{q} g_i(x)^2$$

The penalty term $(\rho/2)\sum_{i=1}^{q} g_i(x)^2$ punishes violations of the equality constraints $g_i(x) = 0$

Idea: optimize the augmented Lagrangian and adjust $\lambda$ in the hope of matching the true Lagrange multipliers

For large enough (finite) $\rho$, the unconstrained minimizer of the augmented Lagrangian coincides with the constrained solution of the original problem

At convergence, the penalty gradient $\rho\, g_i(x) \nabla g_i(x)$ vanishes and we recover the standard multiplier rule
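For a toy instance (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$, as above), the augmented Lagrangian simply adds the quadratic penalty to the ordinary Lagrangian:

```latex
% Augmented Lagrangian for the toy problem above (illustration only).
\[
\mathcal{L}_\rho(x, \lambda)
  = x_1^2 + x_2^2
  + \lambda_1 (x_1 + x_2 - 1)
  + \frac{\rho}{2}\,(x_1 + x_2 - 1)^2 .
\]
```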


Algorithm: take $\rho$ initially large or gradually increase it; iterate:

find the unconstrained minimum

$$x^{(t+1)} \leftarrow \arg\min_x\; \mathcal{L}_\rho(x, \lambda^{(t)})$$

update the multiplier vector

$$\lambda_i^{(t+1)} \leftarrow \lambda_i^{(t)} + \rho\, g_i(x^{(t+1)}), \quad i = 1, \dots, q$$

Intuition for updating $\lambda$: if $x^{(t+1)}$ is the unconstrained minimizer of $\mathcal{L}_\rho(x, \lambda^{(t)})$, then the stationarity condition says

$$\begin{aligned}
0 &= \nabla f(x^{(t+1)}) + \sum_{i=1}^{q} \lambda_i^{(t)} \nabla g_i(x^{(t+1)}) + \sum_{i=1}^{q} \rho\, g_i(x^{(t+1)})\, \nabla g_i(x^{(t+1)}) \\
  &= \nabla f(x^{(t+1)}) + \sum_{i=1}^{q} \bigl[\lambda_i^{(t)} + \rho\, g_i(x^{(t+1)})\bigr] \nabla g_i(x^{(t+1)}),
\end{aligned}$$

so $x^{(t+1)}$ already satisfies the multiplier rule with the updated multipliers $\lambda_i^{(t+1)} = \lambda_i^{(t)} + \rho\, g_i(x^{(t+1)})$

For non-smooth $f$, replace the gradient $\nabla f$ by the subdifferential $\partial f$
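As a concrete illustration (not from the slides), here is a minimal Python sketch of this iteration, delegating the inner $x$-update to scipy.optimize.minimize; the objective $f$ and constraint $g$ below are made up for the example.

```python
# Minimal sketch of the augmented Lagrangian iteration (illustration only).
# Toy problem: minimize (x1 - 1)^2 + (x2 + 2)^2 subject to x1 + x2 = 1.
import numpy as np
from scipy.optimize import minimize

def f(x):                                   # smooth objective
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

def g(x):                                   # equality constraints g(x) = 0 (q = 1)
    return np.array([x[0] + x[1] - 1.0])

def aug_lagrangian(x, lam, rho):            # L_rho(x, lambda)
    gx = g(x)
    return f(x) + lam @ gx + 0.5 * rho * gx @ gx

x, lam, rho = np.zeros(2), np.zeros(1), 10.0
for t in range(100):
    # x-update: unconstrained minimization of L_rho(x, lambda^(t))
    x = minimize(aug_lagrangian, x, args=(lam, rho)).x
    # multiplier update: lambda_i <- lambda_i + rho * g_i(x^(t+1))
    lam = lam + rho * g(x)
    if np.linalg.norm(g(x)) < 1e-8:         # stop once the constraint holds
        break

print(x, lam)   # approaches the constrained minimizer (2, -1) with lambda = -2
```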


    Example: basis pursuit

The basis pursuit problem seeks the sparsest solution subject to linear constraints

$$\text{minimize}\ \|x\|_1 \quad \text{subject to}\ Ax = b$$

Take $\rho$ initially large or gradually increase it; iterate according to

$$x^{(t+1)} \leftarrow \arg\min_x\; \|x\|_1 + \langle \lambda^{(t)}, Ax - b \rangle + \frac{\rho}{2}\|Ax - b\|_2^2 \qquad \text{(lasso)}$$

$$\lambda^{(t+1)} \leftarrow \lambda^{(t)} + \rho\,(Ax^{(t+1)} - b)$$

    Converges in a finite (small) number of steps (Yin et al., 2008)
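A minimal numpy sketch of this iteration (illustrative only, not the code of Yin et al.); here the inner lasso-type subproblem is solved approximately with proximal gradient (ISTA) steps rather than exactly.

```python
# Augmented Lagrangian / Bregman-style iteration for basis pursuit (sketch).
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def basis_pursuit(A, b, rho=10.0, outer=100, inner=200):
    m, n = A.shape
    x, lam = np.zeros(n), np.zeros(m)
    step = 1.0 / (rho * np.linalg.norm(A, 2) ** 2)      # 1 / Lipschitz constant
    for _ in range(outer):
        # x-update: min_x ||x||_1 + <lam, Ax - b> + (rho/2)||Ax - b||_2^2,
        # solved approximately by ISTA (proximal gradient) steps
        for _ in range(inner):
            grad = A.T @ (lam + rho * (A @ x - b))       # gradient of smooth part
            x = soft_threshold(x - step * grad, step)    # prox of step * ||.||_1
        # multiplier update
        lam = lam + rho * (A @ x - b)
        if np.linalg.norm(A @ x - b) < 1e-10:
            break
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[:5] = rng.standard_normal(5)
x_hat = basis_pursuit(A, A @ x_true)
print(np.linalg.norm(x_hat - x_true))    # small if the sparse solution is recovered
```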


    Remarks

The augmented Lagrangian method dates back to the late 1960s (Hestenes, 1969; Powell, 1969); the monograph by Bertsekas (1982) provides a general treatment

Same as the Bregman iteration (Yin et al., 2008) proposed for basis pursuit (compressive sensing)

Equivalent to the proximal point algorithm applied to the dual; can be accelerated (Nesterov)


    ADMM

Alternating direction method of multipliers

Consider minimizing $f(x) + g(y)$ subject to the affine constraint $Ax + By = c$

The augmented Lagrangian is

$$\mathcal{L}_\rho(x, y, \lambda) = f(x) + g(y) + \langle \lambda, Ax + By - c \rangle + \frac{\rho}{2}\|Ax + By - c\|_2^2$$

Idea: perform block descent on $x$ and $y$ and then update the multiplier vector

$$x^{(t+1)} \leftarrow \arg\min_x\; f(x) + \langle \lambda^{(t)}, Ax + By^{(t)} - c \rangle + \frac{\rho}{2}\|Ax + By^{(t)} - c\|_2^2$$

$$y^{(t+1)} \leftarrow \arg\min_y\; g(y) + \langle \lambda^{(t)}, Ax^{(t+1)} + By - c \rangle + \frac{\rho}{2}\|Ax^{(t+1)} + By - c\|_2^2$$

$$\lambda^{(t+1)} \leftarrow \lambda^{(t)} + \rho\,(Ax^{(t+1)} + By^{(t+1)} - c)$$
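As one concrete instance of these updates (an illustration, not from the slides), take the lasso in consensus form: $f(x) = \frac{1}{2}\|Zx - b\|_2^2$ with design matrix $Z$, $g(y) = \mu\|y\|_1$, and the constraint $x - y = 0$ (so the $A$, $B$, $c$ above are $I$, $-I$, $0$). The $x$-update is a linear solve and the $y$-update an elementwise soft threshold.

```python
# ADMM for the lasso in consensus form (illustration only):
#   minimize (1/2)||Z x - b||_2^2 + mu*||y||_1   subject to   x - y = 0.
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_admm(Z, b, mu, rho=1.0, iters=500):
    n = Z.shape[1]
    x, y, lam = np.zeros(n), np.zeros(n), np.zeros(n)
    M = Z.T @ Z + rho * np.eye(n)      # fixed matrix for the x-update
    Ztb = Z.T @ b
    for _ in range(iters):
        # x-update: smooth quadratic, (Z'Z + rho I) x = Z'b + rho*y - lam
        x = np.linalg.solve(M, Ztb + rho * y - lam)
        # y-update: prox of mu*||.||_1, elementwise soft thresholding
        y = soft_threshold(x + lam / rho, mu / rho)
        # multiplier update
        lam = lam + rho * (x - y)
    return y

rng = np.random.default_rng(0)
Z = rng.standard_normal((50, 20))
b = Z @ np.r_[np.ones(3), np.zeros(17)] + 0.1 * rng.standard_normal(50)
print(lasso_admm(Z, b, mu=1.0))        # roughly sparse, first three coefficients active
```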


    Example: fused lasso

The fused lasso problem minimizes

$$\frac{1}{2}\|y - X\beta\|_2^2 + \mu \sum_{j=1}^{p-1} |\beta_{j+1} - \beta_j|$$

Define $\alpha = D\beta$, where

$$D = \begin{pmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \end{pmatrix} \in \mathbb{R}^{(p-1)\times p}.$$

Then we minimize $\frac{1}{2}\|y - X\beta\|_2^2 + \mu\|\alpha\|_1$ subject to $D\beta = \alpha$



The augmented Lagrangian is

$$\mathcal{L}_\rho(\beta, \alpha, \lambda) = \frac{1}{2}\|y - X\beta\|_2^2 + \mu\|\alpha\|_1 + \lambda^T(D\beta - \alpha) + \frac{\rho}{2}\|D\beta - \alpha\|_2^2$$

ADMM:

The $\beta$-update is a smooth quadratic problem

The $\alpha$-update is a separable lasso problem (elementwise soft thresholding)

Update the multipliers

$$\lambda^{(t+1)} \leftarrow \lambda^{(t)} + \rho\,(D\beta^{(t+1)} - \alpha^{(t+1)})$$

The same algorithm applies to a general regularization matrix $D$ (generalized lasso)
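A minimal numpy sketch of these updates (illustrative, not the lecture's code): the $\beta$-update solves the quadratic system, the $\alpha$-update soft-thresholds elementwise.

```python
# ADMM for the fused lasso reformulation above (sketch).
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fused_lasso_admm(X, y, mu, rho=1.0, iters=500):
    n, p = X.shape
    D = np.diff(np.eye(p), axis=0)             # first-difference matrix, (p-1) x p
    M = X.T @ X + rho * D.T @ D                 # fixed matrix for the beta-update
    Xty = X.T @ y
    beta, alpha, lam = np.zeros(p), np.zeros(p - 1), np.zeros(p - 1)
    for _ in range(iters):
        # beta-update: smooth quadratic problem
        beta = np.linalg.solve(M, Xty + D.T @ (rho * alpha - lam))
        # alpha-update: separable lasso, elementwise soft thresholding
        alpha = soft_threshold(D @ beta + lam / rho, mu / rho)
        # multiplier update
        lam = lam + rho * (D @ beta - alpha)
    return beta

# toy usage: denoise a noisy piecewise-constant signal (X = identity)
rng = np.random.default_rng(0)
signal = np.concatenate([np.full(20, 1.0), np.full(20, 3.0), np.full(20, 0.0)])
y_obs = signal + 0.3 * rng.standard_normal(60)
beta_hat = fused_lasso_admm(np.eye(60), y_obs, mu=2.0)
```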



    Remarks on ADMM

Related algorithms: the split Bregman iteration (Goldstein and Osher, 2009), Dykstra's (1983) alternating projection algorithm, ...

    Proximal point algorithm applied to the dual

Numerous applications in statistics and machine learning: lasso, generalized lasso, graphical lasso, (overlapping) group lasso, ...

    Embraces distributed computing for big data (Boyd et al., 2011)



    Final words



    Take-home messages from this course

Statistics, the science of data analysis, is the applied mathematics in the 21st century

Read the first few pages and the last few pages of Tukey (1962)'s Future of Data Analysis (posted on the course website). They are a must for every statistician

Big data era: wiki, WSJ, White House, McKinsey report, ... Challenges:

methodology: big p

efficiency: big n and/or big p

memory: big n, distributed computing via MapReduce (Hadoop), online algorithms


Links:
http://en.wikipedia.org/wiki/Big_data
http://online.wsj.com/article/SB10001424127887323751104578147311334491922.html
http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf
http://www.slideshare.net/fred.zimny/mckinsey-quarterlys-2011-report-the-challenge-and-opportunityof-big-data

Coding

Prototyping: R and Matlab

A real programming language: C/C++, Fortran, Python

Scripting languages: Python, Perl, JavaScript

Numerical linear algebra: use standard libraries (BLAS, LAPACK)! Sparse linear algebra is critical for exploiting sparsity structure in big data



Optimization

Disciplined convex programming (LS, LP, QP, GP, SOCP, SDP)

Convex programming is becoming a technology, just like least squares (LS). Many statisticians don't realize this

Specialized tools in statistics: EM/MM, Fisher scoring, Gauss-Newton, simulated annealing, ...

Combinatorial optimization techniques: divide-and-conquer, dynamic programming, greedy algorithms, ...



    About final project

    In your presentation

    describe your research question

describe what variables/features in the data are used

describe the preprocessing procedure

describe implementation details: language, software, algorithm, timing, ...

describe the difficulties you met. Which parts are or are not working?

send us your slides before your presentation, so we can give better feedback



    (Partial) answers to your questions

Wei's group lasso (with equality constraint) problem: SOCP, accelerated proximal gradient (Nesterov) method, ADMM

Tian's fused-lasso problem: QP, ADMM, accelerated proximal gradient method coupled with DP, re-parameterization to lasso, path algorithm

Kehui's composite quantile regression problem: LP (although the original problem is non-convex)

    Shikai: SDP

    Feel free to ask more



    References

Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Computer Science and Applied Mathematics. Academic Press Inc. [Harcourt Brace Jovanovich Publishers], New York.

Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1):1–122.

Dykstra, R. L. (1983). An algorithm for restricted least squares regression. J. Amer. Statist. Assoc., 78(384):837–842.

Goldstein, T. and Osher, S. (2009). The split Bregman method for l1-regularized problems. SIAM J. Imaging Sci., 2:323–343.

Hestenes, M. R. (1969). Multiplier and gradient methods. J. Optimization Theory Appl., 4:303–320.

Powell, M. J. D. (1969). A method for nonlinear constraints in minimization problems. In Optimization (Sympos., Univ. Keele, Keele, 1968), pages 283–298. Academic Press, London.

Tukey, J. W. (1962). The future of data analysis. Ann. Math. Statist., 33:1–67.

Yin, W., Osher, S., Goldfarb, D., and Darbon, J. (2008). Bregman iterative algorithms for l1-minimization with applications to compressed sensing. SIAM J. Imaging Sci., 1(1):143–168.