Optimization via (too much?) Randomization
![Page 1: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/1.jpg)
Optimization via (too much?) Randomization
Peter Richtarik
Why parallelizing like crazy and being lazy can be good
![Page 2: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/2.jpg)
Optimization as Mountain Climbing
![Page 3: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/3.jpg)
Optimization with Big Data = Extreme* Mountain Climbing
* in a billion-dimensional space on a foggy day
![Page 4: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/4.jpg)
Big Data
• digital images & videos
• transaction records
• government records
• health records
• defence
• internet activity (social media, wikipedia, ...)
• scientific measurements (physics, climate models, ...)
BIG Volume BIG Velocity BIG Variety
![Page 5: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/5.jpg)
God’s Algorithm = Teleportation
![Page 6: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/6.jpg)
If You Are Not a God...
x0 x1 x2 x3
![Page 7: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/7.jpg)
Randomized Parallel Coordinate Descent
start
settle for this
holy grail
![Page 8: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/8.jpg)
Western General Hospital (Creutzfeldt-Jakob Disease)
Arup (Truss Topology Design)
Ministry of Defence dstl lab (Algorithms for Data Simplicity)
Royal Observatory (Optimal Planet Growth)
![Page 9: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/9.jpg)
Optimization as Lock Breaking
![Page 10: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/10.jpg)
A Lock with 4 Dials
Setup: Combination maximizing F opens the lock
x = (x1, x2, x3, x4)
F(x) = F(x1, x2, x3, x4): a function representing the “quality” of a combination
Optimization Problem: Find combination maximizing F
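As a minimal sketch of the 4-dial problem, the snippet below tries every combination and keeps the one with the highest quality. The quality function `quality`, the secret combination `(3, 1, 4, 1)`, and the ten positions per dial are assumptions made purely for illustration; the slides do not specify F.

```python
from itertools import product

def quality(x):
    # Hypothetical F: highest (zero) at the assumed secret combination.
    secret = (3, 1, 4, 1)
    return -sum((xi - si) ** 2 for xi, si in zip(x, secret))

def best_combination(F, positions=range(10), dials=4):
    # Exhaustive search: evaluate F on every combination and keep the best.
    return max(product(positions, repeat=dials), key=F)

best = best_combination(quality)
print(best, quality(best))
```

Exhaustive search needs 10^4 evaluations here and grows exponentially with the number of dials, which is why it is hopeless once the number of dials reaches the billions.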
![Page 11: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/11.jpg)
Optimization Algorithm
![Page 12: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/12.jpg)
A System of Billion Locks with Shared Dials
# dials = n = # locks
Dials: x1, x2, x3, x4, ..., xn
1) Nodes in the graph correspond to dials
2) Nodes in the graph also correspond to locks: each lock (=node) owns dials connected to it in the graph by an edge
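One way to picture the shared-dial structure is as an adjacency map. The sketch below uses a hypothetical 5-node cycle graph (the edge list is an assumption, not from the slides) in which node j is both dial j and lock j, and lock j owns the dials of its neighbours.

```python
# Hypothetical graph: node j plays the role of both dial j and lock j.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

def dials_owned(n, edges):
    # Lock j owns every dial joined to node j by an edge.
    owned = {j: set() for j in range(n)}
    for a, b in edges:
        owned[a].add(b)
        owned[b].add(a)
    return owned

owned = dials_owned(5, edges)
shared = owned[0] & owned[2]  # dials that locks 0 and 2 have in common
avg_dials = sum(len(d) for d in owned.values()) / len(owned)
print(owned[0], shared, avg_dials)
```

Turning a shared dial changes the quality of every lock that owns it, which is what couples the locks together.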
![Page 13: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/13.jpg)
How do we Measure the Quality of a Combination?
$F : \mathbb{R}^n \to \mathbb{R}$
• Each lock j has its own quality function $F_j$ depending on the dials it owns
• However, it does NOT open when $F_j$ is maximized
• The system of locks opens when $F = F_1 + F_2 + \dots + F_n$ is maximized
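The following sketch illustrates why maximizing a single lock's quality is not enough. It assumes, hypothetically, two locks sharing dial 1, each with a quadratic quality function that prefers a different value on that dial; none of these numbers come from the slides.

```python
# Hypothetical locks: each maps an owned dial index to its preferred value.
locks = [{0: 0.0, 1: 0.0}, {1: 2.0, 2: 5.0}]

def F_j(j, x):
    # Quality of lock j depends only on the dials it owns.
    return -sum((x[i] - p) ** 2 for i, p in locks[j].items())

def F(x):
    # Quality of the whole system is the sum over all locks.
    return sum(F_j(j, x) for j in range(len(locks)))

x_greedy = [0.0, 0.0, 5.0]   # maximizes lock 0 alone
x_shared = [0.0, 1.0, 5.0]   # compromise on the shared dial 1
print(F(x_greedy), F(x_shared))
```

The compromise setting scores lower for lock 0 taken alone but higher for the sum, which is the quantity that opens the system.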
![Page 14: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/14.jpg)
An Algorithm with (too much?) Randomization
1) Randomly select a lock
2) Randomly select a dial belonging to the lock
3) Adjust the value on the selected dial based only on the info corresponding to the selected lock
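The three steps can be sketched serially as follows. This is an illustration under stated assumptions, not the authors' implementation: the two quadratic locks, the gradient-ascent adjustment rule, the step size, and the names `locks` and `randomized_search` are all hypothetical.

```python
import random

# Hypothetical system: lock j prefers value p on each dial i it owns.
locks = [{0: 0.0, 1: 0.0}, {1: 2.0, 2: 5.0}]

def F(x):
    # Total quality: sum of every lock's (negative squared error) quality.
    return -sum((x[i] - p) ** 2 for lock in locks for i, p in lock.items())

def randomized_search(x, steps=2000, step_size=0.1, seed=0):
    rng = random.Random(seed)
    x = list(x)
    for _ in range(steps):
        lock = rng.choice(locks)          # 1) randomly select a lock
        i = rng.choice(sorted(lock))      # 2) randomly select one of its dials
        grad = -2.0 * (x[i] - lock[i])    # partial derivative of that lock's quality
        x[i] += step_size * grad          # 3) adjust using only this lock's info
    return x

start = [3.0, 3.0, 3.0]
final = randomized_search(start)
print(F(start), F(final))
```

Each step touches one dial and reads one lock, so its cost does not depend on the total number of locks. With a constant step size the shared dial keeps fluctuating between the two locks' preferences rather than settling exactly.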
![Page 15: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/15.jpg)
Synchronous Parallelization
[Figure: timeline of jobs J1 to J9 on Processor 1, Processor 2 and Processor 3; processors sit IDLE while waiting for one another. WASTEFUL]
![Page 16: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/16.jpg)
Crazy (Lock-Free) Parallelization
[Figure: timeline of jobs J1 to J9 on Processor 1, Processor 2 and Processor 3, with no IDLE periods. NO WASTE]
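A rough way to see where the waste comes from is to compare total running time with and without a barrier after every round of jobs. The job durations below are invented for illustration, and the model ignores any cost of processors working with out-of-date dial values.

```python
# Hypothetical durations of the three jobs run by each processor.
durations = {
    "Processor 1": [3, 1, 2],
    "Processor 2": [1, 2, 1],
    "Processor 3": [2, 3, 1],
}

def synchronous_time(durations):
    # With a barrier after each round, a round lasts as long as its slowest job.
    return sum(max(round_jobs) for round_jobs in zip(*durations.values()))

def lock_free_time(durations):
    # Without barriers, each processor runs its jobs back to back.
    return max(sum(jobs) for jobs in durations.values())

def synchronous_idle(durations):
    # Processor-time spent waiting at barriers.
    work = sum(sum(jobs) for jobs in durations.values())
    return synchronous_time(durations) * len(durations) - work

print(synchronous_time(durations), lock_free_time(durations), synchronous_idle(durations))
```

In this toy instance the synchronous schedule takes 8 time units with 8 units of processor idle time, while the lock-free schedule finishes in 6.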
![Page 17: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/17.jpg)
Crazy Parallelization
![Page 18: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/18.jpg)
Crazy Parallelization
![Page 19: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/19.jpg)
Crazy Parallelization
![Page 20: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/20.jpg)
Crazy Parallelization
![Page 21: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/21.jpg)
Theoretical Result
Average # dials in a lock
Average # of dials common between 2 locks
# Locks
# Processors
![Page 22: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/22.jpg)
Computational Insights
![Page 23: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/23.jpg)
![Page 24: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/24.jpg)
![Page 25: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/25.jpg)
![Page 26: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/26.jpg)
![Page 27: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/27.jpg)
Theory vs Reality
![Page 28: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/28.jpg)
Why parallelizing like crazy and being lazy can be good?
Randomization
• Effectivity
• Tractability
• Efficiency
• Scalability (big data)
• Parallelism
• Distribution
• Asynchronicity
Parallelization
![Page 29: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/29.jpg)
Optimization Methods for Big Data
• Randomized Coordinate Descent
– P. R. and M. Takac: Parallel coordinate descent methods for big data optimization, ArXiv:1212.0873 [can solve a problem with 1 billion variables in 2 hours using 24 processors]
• Stochastic (Sub) Gradient Descent
– P. R. and M. Takac: Randomized lock-free methods for minimizing partially separable convex functions [can be applied to optimize an unknown function]
• Both of the above
– M. Takac, A. Bijral, P. R. and N. Srebro: Mini-batch primal and dual methods for support vector machines, ArXiv:1303.xxxx
![Page 30: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/30.jpg)
Final 2 Slides
![Page 31: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/31.jpg)
Tools
Probability
Machine Learning
Matrix Theory
HPC
![Page 32: Optimization via (too much?) Randomization](https://reader035.fdocuments.us/reader035/viewer/2022062723/56813f18550346895da9aead/html5/thumbnails/32.jpg)