Zheng Qu, University of Edinburgh
Optimization & Big Data Workshop Edinburgh, 6th to 8th May, 2015
Randomized dual coordinate ascent with arbitrary sampling
Joint work with Peter Richtárik (Edinburgh) & Tong Zhang (Rutgers & Baidu)
Supervised Statistical Learning
Data → Algorithm → Predictor
Data: input, label
Predictor: predicted label vs. true label
Empirical Risk Minimization
Data (input, label) → Algorithm → Predictor
n = # samples (big!)
ERM problem: empirical risk (loss) + regularization
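The ERM formula itself did not survive in this transcript. As a minimal sketch of the "empirical loss + regularization" structure named on the slide, the following evaluates such an objective for one concrete choice. The squared loss, the regularizer g(w) = 0.5*||w||^2, and the names `erm_objective`, `X`, `y`, `lam` are assumptions made for illustration, not content recovered from the slide.

```python
import numpy as np

def erm_objective(w, X, y, lam):
    """Evaluate P(w) = (1/n) * sum_i loss_i(x_i . w) + lam * g(w).

    Assumed for illustration: squared loss loss_i(z) = 0.5 * (z - y_i)^2
    and regularizer g(w) = 0.5 * ||w||^2. Rows of X are the n samples.
    """
    n = X.shape[0]
    residuals = X @ w - y
    empirical_loss = 0.5 * np.sum(residuals ** 2) / n   # average loss over n samples
    regularization = lam * 0.5 * np.dot(w, w)           # lam * g(w)
    return empirical_loss + regularization
```

The cost of one evaluation grows linearly with n, which is why methods that touch only a few samples per iteration are attractive when n is big.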
Algorithm: QUARTZ
Z. Q., P. Richtárik (UoE) and T. Zhang (Rutgers & Baidu Big Data Lab, Beijing). Randomized dual coordinate ascent with arbitrary sampling. arXiv:1411.5873, 2014
Primal-Dual Formulation
Fenchel conjugates:
ERM problem
Dual problem
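The formulas for this slide are missing from the transcript. A sketch of the primal-dual pair in the notation of the cited paper (arXiv:1411.5873), as recalled here rather than recovered from the slide; the symbols $w$, $\alpha_i$, $A_i$, $\phi_i$, $g$ and $\lambda$ are therefore assumptions:

```latex
\begin{aligned}
\text{Fenchel conjugates:} \quad
\phi_i^*(u) &:= \sup_{z} \{ \langle u, z \rangle - \phi_i(z) \}, \qquad
g^*(s) := \sup_{w} \{ \langle s, w \rangle - g(w) \}, \\
\text{ERM problem:} \quad
\min_{w \in \mathbb{R}^d} P(w) &:= \frac{1}{n} \sum_{i=1}^{n} \phi_i(A_i^\top w) + \lambda\, g(w), \\
\text{Dual problem:} \quad
\max_{\alpha} D(\alpha) &:= -\lambda\, g^*\!\Big( \frac{1}{\lambda n} \sum_{i=1}^{n} A_i \alpha_i \Big) - \frac{1}{n} \sum_{i=1}^{n} \phi_i^*(-\alpha_i).
\end{aligned}
```

There is one dual variable $\alpha_i$ per sample, so the dual problem has n blocks of variables.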
Intuition behind QUARTZ
Fenchel’s inequality
weak duality
Optimality conditions
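The three labels on this slide fit together in one short derivation. The following is a sketch under assumed notation (not recovered slide text): $P(w) = \frac{1}{n}\sum_i \phi_i(A_i^\top w) + \lambda g(w)$ is the ERM objective, $D(\alpha) = -\lambda g^*(\bar\alpha) - \frac{1}{n}\sum_i \phi_i^*(-\alpha_i)$ is its dual, and $\bar\alpha := \frac{1}{\lambda n}\sum_i A_i \alpha_i$, with $\phi_i$ and $g^*$ assumed differentiable.

```latex
\begin{aligned}
P(w) - D(\alpha)
&= \lambda \big[ g(w) + g^*(\bar\alpha) - \langle w, \bar\alpha \rangle \big]
 + \frac{1}{n} \sum_{i=1}^{n} \big[ \phi_i(A_i^\top w) + \phi_i^*(-\alpha_i) + \langle A_i^\top w, \alpha_i \rangle \big]
 \;\ge\; 0, \\
\text{with equality iff} \quad
w &= \nabla g^*(\bar\alpha), \qquad
\alpha_i = -\nabla \phi_i(A_i^\top w) \quad \text{for all } i.
\end{aligned}
```

Each bracket is nonnegative by Fenchel's inequality, which gives weak duality; the two equality cases are the optimality conditions that a primal step and a dual step can each try to enforce.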
The Primal-Dual Update
STEP 1: PRIMAL UPDATE
STEP 2: DUAL UPDATE
Optimality conditions
STEP 1: Primal update
STEP 2: Dual update
Just maintaining
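The update formulas are missing from the transcript. Below is a minimal sketch of the two steps as recalled from the cited paper (arXiv:1411.5873), specialized to ridge regression with serial uniform sampling. The specialization (squared loss, g(w) = 0.5*||w||^2, so the gradient of g* is the identity) and all names (`quartz_ridge`, `duality_gap`, `alpha_bar`) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def quartz_ridge(X, y, lam, iters, seed=0):
    """Quartz-style primal-dual iteration for ridge regression (sketch).

    Assumed setup: loss_i(z) = 0.5 * (z - y_i)^2 (1-smooth, gamma = 1),
    g(w) = 0.5 * ||w||^2, one dual coordinate sampled uniformly per step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    gamma = 1.0
    p = np.full(n, 1.0 / n)              # p_i = probability that i is sampled
    v = np.einsum("ij,ij->i", X, X)      # v_i = ||x_i||^2 for serial sampling
    theta = np.min(p * lam * gamma * n / (v + lam * gamma * n))

    w = np.zeros(d)
    alpha = np.zeros(n)
    alpha_bar = np.zeros(d)              # maintained: (1/(lam*n)) * sum_i x_i * alpha_i
    for _ in range(iters):
        # STEP 1: primal update, a convex combination of w and grad g*(alpha_bar)
        w = (1.0 - theta) * w + theta * alpha_bar
        # STEP 2: dual update of the sampled coordinate, pulled towards -grad loss_i
        i = rng.integers(n)
        step = theta / p[i]
        new_alpha_i = (1.0 - step) * alpha[i] - step * (X[i] @ w - y[i])
        # keep alpha_bar consistent without recomputing the full sum
        alpha_bar += X[i] * (new_alpha_i - alpha[i]) / (lam * n)
        alpha[i] = new_alpha_i
    return w, alpha

def duality_gap(X, y, lam, w, alpha):
    """P(w) - D(alpha) for the ridge setup assumed above."""
    n = X.shape[0]
    primal = 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * (w @ w)
    alpha_bar = X.T @ alpha / (lam * n)
    dual = -0.5 * lam * (alpha_bar @ alpha_bar) - np.mean(0.5 * alpha ** 2 - alpha * y)
    return primal - dual
```

Each iteration costs O(d) because only the running average `alpha_bar` is updated, never the full sum over samples.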
Randomized Primal-Dual Methods
SDCA: S. Shalev-Shwartz & T. Zhang, 09/2012
mSDCA: M. Takáč, A. Bijral, P. Richtárik & N. Srebro, 03/2013
ASDCA: S. Shalev-Shwartz & T. Zhang, 05/2013
AccProx-SDCA: S. Shalev-Shwartz & T. Zhang, 10/2013
DisDCA: TB. Yang, 2013
Iprox-SDCA: PL. Zhao & T. Zhang, 01/2014
APCG: QH. Lin, Z. Lu & L. Xiao, 07/2014
SPDC: Y. Zhang & L. Xiao, 09/2014
QUARTZ: Z. Q., P. Richtárik & T. Zhang, 11/2014
Convergence Theorem
ESO Assumption (Expected Separable Overapproximation)
Convex combination constant
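The statement on this slide is not preserved in the transcript. A sketch of the assumption, the constant and the resulting bound, as recalled from the cited paper (arXiv:1411.5873); the symbols ($\hat S$ for the random sampling, $p_i$, $v_i$, $\theta$, and $1/\gamma$ for the smoothness constant of the losses) are assumptions of this note:

```latex
\begin{aligned}
\text{ESO assumption:} \quad
& \mathbb{E}\Big[ \Big\| \sum_{i \in \hat S} A_i h_i \Big\|^2 \Big]
 \;\le\; \sum_{i=1}^{n} p_i v_i \|h_i\|^2
 \quad \text{for all } h_1, \dots, h_n, \qquad p_i := \mathbb{P}(i \in \hat S), \\
\text{Convex combination constant:} \quad
& \theta := \min_i \frac{p_i \lambda \gamma n}{v_i + \lambda \gamma n}, \\
\text{Convergence:} \quad
& \mathbb{E}\big[ P(w^t) - D(\alpha^t) \big] \;\le\; (1 - \theta)^t \big( P(w^0) - D(\alpha^0) \big).
\end{aligned}
```

Since $(1-\theta)^t \le e^{-\theta t}$, an expected duality gap of at most $\epsilon$ is reached after $\max_i \big( \frac{1}{p_i} + \frac{v_i}{p_i \lambda \gamma n} \big) \log \frac{P(w^0) - D(\alpha^0)}{\epsilon}$ iterations.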
Iteration Complexity Result
(*)
Complexity Results for Serial Sampling
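The table for this slide is missing. As a hedged illustration of how serial (one coordinate per iteration) sampling probabilities affect the bound, the sketch below evaluates the leading factor max_i (1/p_i + v_i/(p_i*lam*gamma*n)), as recalled from the cited paper, for uniform probabilities and for probabilities proportional to v_i + lam*gamma*n. The function names and the values in `v` are hypothetical.

```python
import numpy as np

def complexity_factor(p, v, lam, gamma):
    """Leading factor max_i (1/p_i + v_i / (p_i * lam * gamma * n)) of the bound."""
    p = np.asarray(p, dtype=float)
    v = np.asarray(v, dtype=float)
    n = len(p)
    return float(np.max(1.0 / p + v / (p * lam * gamma * n)))

def uniform_probabilities(v):
    """Serial uniform sampling: every coordinate is picked with probability 1/n."""
    n = len(v)
    return np.full(n, 1.0 / n)

def importance_probabilities(v, lam, gamma):
    """p_i proportional to v_i + lam*gamma*n, which equalizes all terms in the max."""
    v = np.asarray(v, dtype=float)
    weights = v + lam * gamma * len(v)
    return weights / weights.sum()

v = [1.0, 2.0, 3.0, 10.0]   # hypothetical values of v_i = ||A_i||^2
lam, gamma = 0.1, 1.0
uniform = complexity_factor(uniform_probabilities(v), v, lam, gamma)
optimal = complexity_factor(importance_probabilities(v, lam, gamma), v, lam, gamma)
# uniform equals n + max(v)/(lam*gamma); optimal equals n + mean(v)/(lam*gamma)
print(uniform, optimal)
```

The gain from non-uniform sampling is therefore the gap between the largest and the average v_i, and it vanishes when all v_i are equal.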
Experiment: Quartz vs SDCA, uniform vs optimal sampling
QUARTZ with Standard Mini-Batching
Data Sparsity
A normalized measure of average sparsity of the data
“Fully sparse data” “Fully dense data”
Iteration Complexity Results
Theoretical Speedup Factor
Linear speedup up to a certain data-independent mini-batch size:
Further data-dependent speedup:
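The speedup formulas are not preserved in the transcript. The sketch below assumes the mini-batch bound recalled from the cited paper, T(tau) proportional to n/tau + (1 + (omega_tilde - 1)(tau - 1)/(n - 1)) / (lam*gamma*tau) for data normalized so that ||A_i|| <= 1, and takes the ratio T(1)/T(tau). The name `theoretical_speedup` and the example values are hypothetical; `omega_tilde` is the normalized sparsity measure, between 1 (fully sparse) and n (fully dense).

```python
def theoretical_speedup(tau, n, lam, gamma, omega_tilde):
    """Ratio T(1)/T(tau) of iteration bounds for mini-batch size tau (sketch).

    Assumes T(tau) ~ n/tau + (1 + (omega_tilde-1)(tau-1)/(n-1)) / (lam*gamma*tau),
    which simplifies to tau / (1 + penalty) with the penalty term below.
    """
    penalty = (omega_tilde - 1.0) * (tau - 1.0) / ((n - 1.0) * (1.0 + lam * gamma * n))
    return tau / (1.0 + penalty)

n, lam, gamma = 10**6, 1e-3, 1.0
tau = 1002   # about 2 + lam*gamma*n, the data-independent threshold
print(theoretical_speedup(tau, n, lam, gamma, omega_tilde=1))   # fully sparse data
print(theoretical_speedup(tau, n, lam, gamma, omega_tilde=n))   # fully dense data
```

Under this assumed bound, any tau up to 2 + lam*gamma*n gives a speedup of at least tau/2 regardless of the data, and sparser data (smaller omega_tilde) keeps the speedup close to tau for larger mini-batches.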
Plots of Theoretical Speedup Factor
Theoretical vs Practical Speedup
astro_ph; sparsity: 0.08%; n=29,882
cov1; sparsity: 22.22%; n=522,911
Comparison with Accelerated Mini-Batch P-D Methods
Distribution of Data
n = # dual variables
Data matrix
Distributed Sampling
Random set of dual variables
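A minimal sketch of how such a random set of dual variables can be drawn, assuming the scheme described in the cited paper: the n dual variables are split into c equal blocks, one per node, and each node independently picks tau of its own variables uniformly at random. The function names and the contiguous-block partition are assumptions for illustration.

```python
import numpy as np

def tau_nice_sampling(indices, tau, rng):
    """Pick a subset of size tau uniformly at random from the given indices."""
    return rng.choice(indices, size=tau, replace=False)

def distributed_sampling(n, c, tau, rng):
    """Each of c nodes draws a tau-nice sample from its own block of n/c variables."""
    if n % c != 0:
        raise ValueError("this sketch assumes that c divides n")
    block = n // c
    if tau > block:
        raise ValueError("tau cannot exceed the number of variables per node")
    picks = [
        tau_nice_sampling(np.arange(k * block, (k + 1) * block), tau, rng)
        for k in range(c)
    ]
    return np.concatenate(picks)   # c * tau indices updated in this iteration
```

With c = 1 this reduces to ordinary mini-batch (tau-nice) sampling; for example, `distributed_sampling(12, 3, 2, np.random.default_rng(0))` returns six indices, two from each node's block.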
Distributed Sampling & Distributed Coordinate Descent
Previously studied (not in the primal-dual setup):
Peter Richtárik and Martin Takáč. Distributed coordinate descent for learning with big data. arXiv:1310.2059, 2013
Olivier Fercoq, Z. Q., Peter Richtárik and Martin Takáč. Fast distributed coordinate descent for minimizing non strongly convex losses. 2014 IEEE Int Workshop on Machine Learning for Signal Processing, 2014
Jakub Marecek, Peter Richtárik and Martin Takáč. Fast distributed coordinate descent for minimizing partially separable functions. arXiv:1406.0238, 2014
strongly convex & smooth
convex & smooth
Complexity of Distributed QUARTZ
Reallocating Load: Theoretical Speedup
Theoretical vs Practical Speedup
More on ESO
ESO:
second order / curvature information
local second order / curvature information
lost
get
Computation of ESO Parameters
Lemma (QR’14b)
Sampling Data
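The lemma's formula is missing from the transcript. As one concrete instance of computing ESO parameters from the sampling and the data, the sketch below uses the known closed form for tau-nice sampling with scalar dual variables, v_i = sum_j (1 + (omega_j - 1)(tau - 1)/(n - 1)) * A[j, i]^2, where omega_j is the number of nonzeros in row j of the data matrix. It then checks the ESO inequality exactly by enumerating all subsets, which is feasible only for tiny n. The function names are hypothetical and this is not claimed to be the lemma as stated on the slide.

```python
import itertools
import numpy as np

def eso_parameters_tau_nice(A, tau):
    """v_i for tau-nice sampling; A is d x n with one column per dual variable."""
    d, n = A.shape
    omega = np.count_nonzero(A, axis=1)               # omega_j: nonzeros in row j
    scale = 1.0 + (omega - 1.0) * (tau - 1.0) / max(n - 1, 1)
    return scale @ (A ** 2)                           # vector of length n

def expected_sq_norm_tau_nice(A, h, tau):
    """Exact E|| sum_{i in S} A_i h_i ||^2 over all subsets S of size tau."""
    n = A.shape[1]
    subsets = list(itertools.combinations(range(n), tau))
    total = 0.0
    for subset in subsets:
        idx = list(subset)
        total += np.sum((A[:, idx] @ h[idx]) ** 2)
    return total / len(subsets)

def eso_bound_tau_nice(A, h, tau):
    """Right-hand side sum_i p_i * v_i * h_i^2 with p_i = tau / n."""
    n = A.shape[1]
    v = eso_parameters_tau_nice(A, tau)
    return float(np.sum((tau / n) * v * h ** 2))
```

For tau = 1 the two sides coincide, and for sparse data (small omega_j) the parameters v_i stay close to ||A_i||^2 even for larger tau, which is where the data-dependent speedup comes from.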
Conclusion
QUARTZ (Randomized coordinate ascent method with arbitrary sampling)
o Direct primal-dual analysis (for arbitrary sampling)
  - optimal serial sampling
  - tau-nice sampling (mini-batch)
  - distributed sampling
o Theoretical speedup factor which
  - is a very good predictor of the practical speedup factor
  - depends on both the sparsity and the condition number
  - shows a weak dependence on how data is distributed
Accelerated QUARTZ? Randomized fixed point algorithm with relaxation? …?