
Zheng Qu, University of Edinburgh

Optimization & Big Data Workshop, Edinburgh, 6th to 8th May, 2015

Randomized dual coordinate ascent with arbitrary sampling

Joint work with Peter Richtárik (Edinburgh) & Tong Zhang (Rutgers & Baidu)


Supervised Statistical Learning

Data → Algorithm → Predictor


Supervised Statistical Learning

Data (input, label) → Algorithm → Predictor

Predicted label vs. true label


Empirical Risk Minimization

Data (input, label) → Algorithm → Predictor

Objective: empirical risk + regularization

n = # samples (big!)


Empirical Risk Minimization

ERM problem: empirical loss + regularization

n = # samples (big!)
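For reference, the ERM problem can be written as below. This is a sketch in the notation we believe arXiv:1411.5873 uses, so treat the symbols as assumptions rather than a transcription of the slide: A_i ∈ R^{d×m} holds the i-th sample, φ_i is a convex loss that is (1/γ)-smooth, g is a 1-strongly convex regularizer and λ > 0.

```latex
\min_{w \in \mathbb{R}^d} \; P(w) := \underbrace{\frac{1}{n}\sum_{i=1}^{n} \phi_i\big(A_i^\top w\big)}_{\text{empirical risk}} + \underbrace{\lambda\, g(w)}_{\text{regularization}}
```

For example, ridge regression corresponds to m = 1, φ_i(a) = ½(a − y_i)² (so γ = 1) and g(w) = ½‖w‖².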


Algorithm: QUARTZ

Z. Q., P. Richtárik (UoE) and T. Zhang (Rutgers & Baidu Big Data Lab, Beijing). Randomized dual coordinate ascent with arbitrary sampling. arXiv:1411.5873, 2014.


Primal-Dual Formulation

Fenchel conjugates:

ERM problem

Dual problem
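A sketch of the primal-dual pair under the same assumed notation, with one dual variable α_i ∈ R^m per sample. The last line rewrites the duality gap as a sum of terms that are each nonnegative by Fenchel's inequality, which is where weak duality comes from.

```latex
% Fenchel conjugate of a function f
f^*(u) := \sup_{x} \big\{ \langle u, x \rangle - f(x) \big\}

% dual problem
\max_{\alpha_1,\dots,\alpha_n \in \mathbb{R}^m} \; D(\alpha) := -\lambda\, g^*\Big(\frac{1}{\lambda n}\sum_{i=1}^{n} A_i \alpha_i\Big) - \frac{1}{n}\sum_{i=1}^{n} \phi_i^*(-\alpha_i)

% duality gap, with \bar\alpha := \frac{1}{\lambda n}\sum_i A_i \alpha_i; each bracket is >= 0
P(w) - D(\alpha) = \frac{1}{n}\sum_{i=1}^{n} \Big[\phi_i(A_i^\top w) + \phi_i^*(-\alpha_i) + \langle \alpha_i, A_i^\top w \rangle\Big] + \lambda \Big[g(w) + g^*(\bar\alpha) - \langle w, \bar\alpha \rangle\Big]
```

Both brackets vanish exactly when α_i = −∇φ_i(A_iᵀw) for every i and w = ∇g*(ᾱ), i.e. at a primal-dual optimal pair.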


Intuition behind QUARTZ

Fenchel’s inequality

weak duality

Optimality conditions
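The intuition can be checked numerically on ridge regression, assuming φ_i(a) = ½(a − y_i)² (γ = 1) and g(w) = ½‖w‖², for which φ_i*(u) = ½u² + u·y_i and g* = g. In this sketch (helper names are hypothetical) the gap P(w) − D(α) is nonnegative for arbitrary w and α, and it vanishes when the optimality conditions w = ∇g*(ᾱ) = ᾱ and α_i = −φ_i'(a_iᵀw) = y_i − a_iᵀw hold.

```python
import numpy as np

def primal(w, A, y, lam):
    """P(w) = (1/n) sum_i 0.5*(a_i^T w - y_i)^2 + lam * 0.5*||w||^2; columns of A are samples."""
    n = A.shape[1]
    r = A.T @ w - y
    return 0.5 * (r @ r) / n + 0.5 * lam * (w @ w)

def dual(alpha, A, y, lam):
    """D(alpha) = -lam * g*(alpha_bar) - (1/n) sum_i phi_i*(-alpha_i), alpha_bar = A alpha / (lam n)."""
    n = A.shape[1]
    alpha_bar = A @ alpha / (lam * n)
    return -0.5 * lam * (alpha_bar @ alpha_bar) - np.sum(0.5 * alpha**2 - alpha * y) / n

rng = np.random.default_rng(0)
d, n, lam = 5, 20, 0.1
A = rng.standard_normal((d, n))
y = rng.standard_normal(n)

# Weak duality: the gap is nonnegative for arbitrary (w, alpha)
w, alpha = rng.standard_normal(d), rng.standard_normal(n)
gap_random = primal(w, A, y, lam) - dual(alpha, A, y, lam)

# Optimality conditions: w = alpha_bar and alpha_i = y_i - a_i^T w, i.e. (lam n I + A A^T) w = A y
w_star = np.linalg.solve(lam * n * np.eye(d) + A @ A.T, A @ y)
alpha_star = y - A.T @ w_star
gap_opt = primal(w_star, A, y, lam) - dual(alpha_star, A, y, lam)
```

Here gap_random is strictly positive, while gap_opt is zero up to rounding.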


The Primal-Dual Update

STEP 1: PRIMAL UPDATE

STEP 2: DUAL UPDATE

Optimality conditions


STEP 1: Primal update

STEP 2: Dual update

Just maintaining
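A minimal sketch of the two steps for ridge regression with serial uniform sampling (p_i = 1/n and v_i = ‖a_i‖², for which the ESO holds with equality), under the same assumptions (γ = 1, g = ½‖·‖², so ∇g* is the identity). The update rules follow our reading of arXiv:1411.5873, and the function names are hypothetical. The vector ᾱ is updated incrementally, so each iteration touches only one column of the data.

```python
import numpy as np

def quartz_ridge_serial(A, y, lam, iters, seed=0):
    """Quartz-style iteration for ridge regression with serial uniform sampling (a sketch).
    Assumes phi_i(a) = 0.5*(a - y_i)^2 (gamma = 1) and g(w) = 0.5*||w||^2 (grad g* = identity)."""
    rng = np.random.default_rng(seed)
    d, n = A.shape
    p = 1.0 / n                                     # P(i in S) for serial uniform sampling
    v = np.sum(A**2, axis=0)                        # ESO parameters v_i = ||a_i||^2
    theta = np.min(p * lam * n / (v + lam * n))     # convex combination constant with gamma = 1
    w = np.zeros(d)
    alpha = np.zeros(n)
    alpha_bar = np.zeros(d)                         # (1/(lam n)) sum_i a_i alpha_i, kept up to date
    for _ in range(iters):
        # STEP 1 (primal update): w <- (1 - theta) w + theta grad g*(alpha_bar)
        w = (1 - theta) * w + theta * alpha_bar
        # STEP 2 (dual update) on one coordinate drawn uniformly at random
        i = rng.integers(n)
        target = y[i] - A[:, i] @ w                 # -phi_i'(a_i^T w)
        new = (1 - theta / p) * alpha[i] + (theta / p) * target
        alpha_bar += A[:, i] * (new - alpha[i]) / (lam * n)
        alpha[i] = new
    return w, alpha

def duality_gap(w, alpha, A, y, lam):
    """P(w) - D(alpha) for the ridge-regression instance above."""
    n = A.shape[1]
    r = A.T @ w - y
    alpha_bar = A @ alpha / (lam * n)
    primal = 0.5 * (r @ r) / n + 0.5 * lam * (w @ w)
    dual = -0.5 * lam * (alpha_bar @ alpha_bar) - np.sum(0.5 * alpha**2 - alpha * y) / n
    return primal - dual

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 20))
y = rng.standard_normal(20)
w, alpha = quartz_ridge_serial(A, y, lam=0.1, iters=20000)
gap = duality_gap(w, alpha, A, y, lam=0.1)
```

Since θ/p_i ≤ 1, both updates are convex combinations, which is why θ is called the convex combination constant.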


Randomized Primal-Dual Methods

SDCA: S. Shalev-Shwartz & T. Zhang, 09/2012
mSDCA: M. Takáč, A. Bijral, P. Richtárik & N. Srebro, 03/2013
ASDCA: S. Shalev-Shwartz & T. Zhang, 05/2013
AccProx-SDCA: S. Shalev-Shwartz & T. Zhang, 10/2013
DisDCA: TB. Yang, 2013
Iprox-SDCA: PL. Zhao & T. Zhang, 01/2014
APCG: QH. Lin, Z. Lu & L. Xiao, 07/2014
SPDC: Y. Zhang & L. Xiao, 09/2014
QUARTZ: Z. Q., P. Richtárik & T. Zhang, 11/2014


Convergence Theorem

ESO Assumption (Expected Separable Overapproximation)

Convex combination constant
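As we understand the ESO assumption from arXiv:1411.5873, a random sampling Ŝ ⊆ {1, …, n} with p_i := P(i ∈ Ŝ) > 0 and positive parameters v_1, …, v_n must satisfy the inequality below for all h_i ∈ R^m; θ is then the constant used in the convex combinations of the primal and dual updates. A sketch:

```latex
% ESO assumption: for all h_1, ..., h_n in R^m
\mathbb{E}\Big[\Big\| \sum_{i \in \hat S} A_i h_i \Big\|^2\Big] \;\le\; \sum_{i=1}^{n} p_i\, v_i\, \|h_i\|^2,
\qquad p_i := \mathbb{P}(i \in \hat S)

% convex combination constant used in the primal and dual updates
\theta := \min_{1 \le i \le n} \frac{p_i\, \lambda \gamma n}{v_i + \lambda \gamma n}
```

For serial uniform sampling (p_i = 1/n) the ESO holds with v_i = λ_max(A_iᵀA_i), and then 1/θ = n + max_i v_i/(λγ).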


Iteration Complexity Result

(*)
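The iteration complexity implied by θ can be evaluated directly. This sketch assumes the theorem has the form E[P(w^t) − D(α^t)] ≤ (1 − θ)^t (P(w^0) − D(α^0)), so that t ≥ (1/θ) log((P(w^0) − D(α^0))/ε) iterations suffice, with 1/θ = max_i (1/p_i + v_i/(p_i λγn)). The function name and arguments are hypothetical.

```python
import math

def quartz_iteration_bound(p, v, lam, gamma, eps, gap0):
    """Iterations sufficient for E[P(w^t) - D(alpha^t)] <= eps, assuming
    E[gap_t] <= (1 - theta)^t * gap0 and theta = min_i p_i*lam*gamma*n / (v_i + lam*gamma*n)."""
    n = len(p)
    # 1/theta = max_i (1/p_i + v_i / (p_i * lam * gamma * n))
    inv_theta = max(1.0 / pi + vi / (pi * lam * gamma * n) for pi, vi in zip(p, v))
    # (1 - theta)^t <= exp(-theta t), so t >= (1/theta) * log(gap0 / eps) is enough
    return math.ceil(inv_theta * math.log(gap0 / eps))

# Serial uniform sampling, n = 1000 normalized samples (v_i = 1), lam = 1e-3, gamma = 1:
# 1/theta = n + 1/(lam*gamma) = 2000
n = 1000
bound = quartz_iteration_bound([1.0 / n] * n, [1.0] * n, lam=1e-3, gamma=1.0, eps=1e-6, gap0=1.0)
```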


Complexity Results for Serial Sampling
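For serial sampling (one dual variable per iteration) the ESO holds with v_i = L_i := λ_max(A_iᵀA_i), which equals ‖a_i‖² when m = 1. The sketch below, with hypothetical helper names, compares uniform probabilities p_i = 1/n, which give 1/θ = n + max_i L_i/(λγ), with importance probabilities p_i ∝ L_i + λγn. The latter equalize the terms in the maximum and give 1/θ = n + L̄/(λγ), where L̄ is the average of the L_i. That this is the "optimal" sampling of the experiments is our assumption.

```python
import numpy as np

def inv_theta(p, v, lam, gamma):
    """1/theta = max_i (1/p_i + v_i / (p_i * lam * gamma * n))."""
    n = len(p)
    return np.max(1.0 / p + v / (p * lam * gamma * n))

def serial_probabilities(L, lam, gamma, kind="uniform"):
    """Uniform p_i = 1/n, or importance p_i proportional to L_i + lam*gamma*n."""
    n = len(L)
    if kind == "uniform":
        return np.full(n, 1.0 / n)
    # importance sampling: equalizes the terms inside the max of 1/theta
    weights = L + lam * gamma * n
    return weights / weights.sum()

L = np.array([1.0, 1.0, 1.0, 100.0])   # L_i = ||a_i||^2: one sample with a much larger norm
lam, gamma = 0.01, 1.0
uniform = inv_theta(serial_probabilities(L, lam, gamma, "uniform"), L, lam, gamma)
importance = inv_theta(serial_probabilities(L, lam, gamma, "importance"), L, lam, gamma)
```

In this toy case the single large L_i dominates the uniform bound (1/θ = 10,004), while importance sampling only pays for the average (1/θ = 2,579).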


Experiment: Quartz vs SDCA, uniform vs optimal sampling


QUARTZ with Standard Mini-Batching
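Standard mini-batching corresponds to the τ-nice sampling: Ŝ is a subset of size τ chosen uniformly from all subsets of {1, …, n} of that size. The sketch below (hypothetical helper names) draws such a set with NumPy and, for a small n, enumerates all subsets to confirm the two probabilities that enter the ESO analysis: P(i ∈ Ŝ) = τ/n and P(i, j ∈ Ŝ) = τ(τ − 1)/(n(n − 1)) for i ≠ j.

```python
from fractions import Fraction
from itertools import combinations

import numpy as np

def tau_nice_sample(n, tau, rng):
    """Draw a tau-nice sampling: tau distinct indices, every subset equally likely."""
    return rng.choice(n, size=tau, replace=False)

def tau_nice_probabilities(n, tau):
    """Exact P(i in S) for each i, and P(0 in S and 1 in S), by enumerating all subsets."""
    subsets = list(combinations(range(n), tau))
    weight = Fraction(1, len(subsets))
    p = [Fraction(0)] * n
    p_pair = Fraction(0)   # by symmetry, the same for every pair i != j
    for S in subsets:
        for i in S:
            p[i] += weight
        if 0 in S and 1 in S:
            p_pair += weight
    return p, p_pair

rng = np.random.default_rng(0)
S = tau_nice_sample(10, 3, rng)
p, p_pair = tau_nice_probabilities(6, 2)   # expect tau/n = 1/3 and tau(tau-1)/(n(n-1)) = 1/15
```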


Data Sparsity

A normalized measure of the average sparsity of the data, ranging from “fully sparse data” to “fully dense data”
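Data sparsity enters through the ESO parameters. For τ-nice sampling, a valid choice known from the parallel coordinate descent literature (it follows from Cauchy-Schwarz applied row by row) is v_i = Σ_j (1 + (ω_j − 1)(τ − 1)/max(1, n − 1)) A_ji², where A is the d × n data matrix with samples as columns (m = 1) and ω_j is the number of nonzeros in row j. Here ω_j = 1 for every j is the fully sparse extreme and ω_j = n the fully dense one; the normalized average measure on the slide is not reproduced. The sketch below (hypothetical helper names) computes these v_i and checks the ESO inequality by exact enumeration on a small random sparse matrix.

```python
from itertools import combinations

import numpy as np

def tau_nice_eso(A, tau):
    """ESO parameters v_i for tau-nice sampling; A is d x n with one sample per column."""
    d, n = A.shape
    omega = np.maximum(np.count_nonzero(A, axis=1), 1)  # nonzeros per row (all-zero rows add nothing)
    beta = 1 + (omega - 1) * (tau - 1) / max(1, n - 1)   # one factor per row j
    return (beta[:, None] * A**2).sum(axis=0)

def expected_sq_norm(A, h, tau):
    """Exact E||sum_{i in S} h_i a_i||^2 over all tau-subsets S (feasible for small n only)."""
    n = A.shape[1]
    subsets = list(combinations(range(n), tau))
    total = 0.0
    for S in subsets:
        idx = list(S)
        u = A[:, idx] @ h[idx]
        total += u @ u
    return total / len(subsets)

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 8)) * (rng.random((6, 8)) < 0.4)  # random sparse 6 x 8 data matrix
tau = 3
v = tau_nice_eso(A, tau)
p = tau / A.shape[1]                      # P(i in S) = tau / n
for _ in range(100):
    h = rng.standard_normal(A.shape[1])
    # ESO: E||sum_{i in S} h_i a_i||^2 <= sum_i p_i v_i h_i^2
    assert expected_sq_norm(A, h, tau) <= p * np.sum(v * h**2) + 1e-9
```

For fully dense data every factor equals τ, so v_i = τ‖a_i‖²; for fully sparse data every factor equals 1 and v_i = ‖a_i‖² regardless of τ.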


Iteration Complexity Results


Iteration Complexity Results


Theoretical Speedup Factor

Linear speedup up to a certain data-independent mini-batch size:

Further data-dependent speedup:
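If the theoretical speedup factor is taken to be the ratio between the iteration bound 1/θ for serial uniform sampling (τ = 1) and the bound for τ-nice sampling (our assumption; the common log term cancels), it can be computed from the ESO parameters. With p_i = τ/n, 1/θ(τ) = n/τ + max_i v_i(τ)/(τλγ). The helper names below are hypothetical, and v_i(τ) uses the τ-nice formula from the data-sparsity discussion.

```python
import numpy as np

def tau_nice_eso(A, tau):
    """ESO parameters v_i for tau-nice sampling; A is d x n with one sample per column."""
    d, n = A.shape
    omega = np.maximum(np.count_nonzero(A, axis=1), 1)   # nonzeros per row
    beta = 1 + (omega - 1) * (tau - 1) / max(1, n - 1)
    return (beta[:, None] * A**2).sum(axis=0)

def inv_theta_tau_nice(A, tau, lam, gamma):
    """1/theta = max_i (1/p_i + v_i/(p_i lam gamma n)) with p_i = tau/n."""
    n = A.shape[1]
    v = tau_nice_eso(A, tau)
    return n / tau + np.max(v) / (tau * lam * gamma)

def speedup_factor(A, tau, lam, gamma):
    """Ratio of the serial (tau = 1) iteration bound to the tau-nice bound."""
    return inv_theta_tau_nice(A, 1, lam, gamma) / inv_theta_tau_nice(A, tau, lam, gamma)

sparse_speedup = speedup_factor(np.eye(10), 4, lam=0.1, gamma=1.0)          # one nonzero per row
dense_speedup = speedup_factor(np.ones((3, 10)), 4, lam=0.1, gamma=1.0)     # every entry nonzero
```

For fully sparse data the v_i do not depend on τ, so the factor is exactly τ (here 4). For fully dense data v_i(τ) = τ‖a_i‖², so 1/θ(τ) = n/τ + max_i ‖a_i‖²/(λγ) and the gain saturates once n/τ is small compared with the second term (here 40/32.5 ≈ 1.23).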


Plots of Theoretical Speedup Factor

Linear speedup up to a certain data-independent mini-batch size:

Further data-dependent speedup:


Theoretical vs Practical Speedup

astro_ph: sparsity 0.08%, n = 29,882
cov1: sparsity 22.22%, n = 522,911


Comparison with Accelerated Mini-Batch P-D Methods


Distribution of Data

n = # dual variables

Data matrix


Distributed Sampling

Random set of dual variables
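A sketch of distributed sampling, assuming the n dual variables are split into c disjoint partitions of equal size n/c (one per node) and each node independently picks τ of its own variables uniformly at random, so that P(i ∈ Ŝ) = τc/n. The function names are hypothetical; the enumeration only confirms the inclusion probability on a tiny instance.

```python
from fractions import Fraction
from itertools import combinations, product

import numpy as np

def distributed_sample(partitions, tau, rng):
    """Each node independently draws tau of its own dual variables, without replacement."""
    return np.concatenate([rng.choice(part, size=tau, replace=False) for part in partitions])

def inclusion_probabilities(partitions, tau, n):
    """Exact P(i in S) by enumerating every combination of per-node tau-subsets."""
    per_node = [list(combinations(part, tau)) for part in partitions]
    outcomes = list(product(*per_node))      # all equally likely joint draws
    weight = Fraction(1, len(outcomes))
    p = [Fraction(0)] * n
    for draw in outcomes:
        for subset in draw:
            for i in subset:
                p[i] += weight
    return p

partitions = [[0, 1, 2], [3, 4, 5]]          # c = 2 nodes, n = 6 dual variables
rng = np.random.default_rng(0)
S = distributed_sample(partitions, 2, rng)
p = inclusion_probabilities(partitions, 2, 6)   # expect tau*c/n = 2/3 for every i
```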


Distributed Sampling & Distributed Coordinate Descent

Previously studied (not in the primal-dual setup):

Peter Richtárik and Martin Takáč. Distributed coordinate descent for learning with big data. arXiv:1310.2059, 2013.

Olivier Fercoq, Z. Q., Peter Richtárik and Martin Takáč. Fast distributed coordinate descent for minimizing non-strongly convex losses. 2014 IEEE International Workshop on Machine Learning for Signal Processing, 2014.

Jakub Marecek, Peter Richtárik and Martin Takáč. Fast distributed coordinate descent for minimizing partially separable functions. arXiv:1406.0238, 2014.

strongly convex & smooth

convex & smooth


Complexity of Distributed QUARTZ


Reallocating Load: Theoretical Speedup


Theoretical vs Practical Speedup


More on ESO

ESO:

second order / curvature information: lost

local second order / curvature information: get


Computation of ESO Parameters

Lemma (QR’14b)

Sampling Data
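The lemma on this slide is not reproduced here. As a simpler stand-in (a swapped-in technique, not the lemma of QR'14b, and usually looser), the sketch below computes valid ESO parameters for any sampling whose joint inclusion probabilities P(i, k ∈ Ŝ) are known, using diagonal dominance with m = 1 and samples as columns of A. Since E‖Σ_{i∈Ŝ} h_i a_i‖² = hᵀ(P ∘ AᵀA)h, choosing v_i = Σ_k |P_ik (AᵀA)_ik| / p_i makes Diag(p ∘ v) − P ∘ AᵀA diagonally dominant, hence positive semidefinite. Helper names are hypothetical; the example uses a two-node distributed sampling.

```python
from itertools import combinations

import numpy as np

def pairwise_matrix(subsets, n):
    """P[i, k] = P(i in S and k in S) for a sampling that is uniform over the listed subsets."""
    P = np.zeros((n, n))
    for S in subsets:
        idx = np.array(S)
        P[np.ix_(idx, idx)] += 1.0
    return P / len(subsets)

def diagonal_dominance_eso(A, P):
    """Valid (often loose) ESO parameters: with M = P * (A^T A), h^T M h <= sum_i (sum_k |M_ik|) h_i^2."""
    M = P * (A.T @ A)
    return np.abs(M).sum(axis=1) / np.diag(P)

# Distributed sampling: node 1 owns {0, 1, 2}, node 2 owns {3, 4, 5}, each picks tau = 2
subsets = [a + b for a in combinations(range(3), 2) for b in combinations(range(3, 6), 2)]
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
P = pairwise_matrix(subsets, 6)
v = diagonal_dominance_eso(A, P)
# ESO holds iff Diag(p * v) - P * (A^T A) is positive semidefinite
min_eig = np.linalg.eigvalsh(np.diag(np.diag(P) * v) - P * (A.T @ A)).min()
```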


Conclusion

QUARTZ (Randomized coordinate ascent method with arbitrary sampling)

o Direct primal-dual analysis (for arbitrary sampling): optimal serial sampling, tau-nice sampling (mini-batch), distributed sampling

o Theoretical speedup factor, which is a very good predictor of the practical speedup factor, depends on both the sparsity and the condition number, and shows a weak dependence on how data is distributed

Accelerated QUARTZ? Randomized fixed point algorithm with relaxation? …?
