Large-scale computation without sacrificing expressiveness


description

Presented at the 14th Workshop on Hot Topics in Operating Systems (HotOS XIV). The talk presents Celias, a new concurrent programming model for data-intensive scalable computing. It aims to provide a large-scale computation framework for complex algorithms, with elastic scalability and automatic fault tolerance. The paper can be found here: http://www.eecs.berkeley.edu/~sangjin/static/pub/hotos2013_celias.pdf

Transcript of Large-scale computation without sacrificing expressiveness

Page 1: Large-scale computation without sacrificing expressiveness

Large-scale computation without sacrificing expressiveness

Sangjin Han, Sylvia Ratnasamy (UC Berkeley)

Page 2: Large-scale computation without sacrificing expressiveness

Review: MapReduce and Friends

Input → Computation → Output

Computation: map, filter, group by, reduce, join, …

Page 3: Large-scale computation without sacrificing expressiveness

Review: MapReduce and Friends

Input → Computation → Output

Computation: map, filter, group by, reduce, join, …

Observation 1: Bulk transformation of immutable data (no fine-grained updates)

Page 4: Large-scale computation without sacrificing expressiveness

Example 1: Sparse Operations

• k-hop reachability with iterative MapReduce

Page 5: Large-scale computation without sacrificing expressiveness

Example 1: Sparse Operations

• k-hop reachability with iterative MapReduce

(Source node, Graph) → MR → 1-hop nodes

Page 6: Large-scale computation without sacrificing expressiveness

Example 1: Sparse Operations

• k-hop reachability with iterative MapReduce

(Source node, Graph) → MR → 1-hop nodes

(1-hop nodes, Graph) → MR → 2-hop nodes

Page 7: Large-scale computation without sacrificing expressiveness

Example 1: Sparse Operations

• k-hop reachability with iterative MapReduce

(Source node, Graph) → MR → 1-hop nodes

(1-hop nodes, Graph) → MR → 2-hop nodes

(2-hop nodes, Graph) → MR → …

Page 8: Large-scale computation without sacrificing expressiveness

Example 1: Sparse Operations

• k-hop reachability with iterative MapReduce

(Source node, Graph) → MR → 1-hop nodes

(1-hop nodes, Graph) → MR → 2-hop nodes

(2-hop nodes, Graph) → MR → …

Page 9: Large-scale computation without sacrificing expressiveness

Example 1: Sparse operations

• k-hop reachability with iterative MapReduce

Internet router topology graph (1.7M nodes, 22.2M edges)

[Figure: number of processed edges (millions, axis 0 to 20) per iteration (1 to 22); series: Iterative MapReduce, Optimal]
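The gap between the two series in the figure can be illustrated with a small sketch. It assumes that "Iterative MapReduce" rescans the whole edge list in every iteration, and that "Optimal" means touching only the out-edges of newly reached nodes. Neither definition is spelled out on the slide, and the function names and the toy graph are made up for illustration.

```python
from collections import defaultdict

def khop_full_scan(edges, source, k):
    """Each iteration scans every edge, as a bulk job over an immutable edge list would."""
    reached = {source}
    processed = []  # edges examined per iteration
    for _ in range(k):
        new_nodes = {dst for src, dst in edges if src in reached}
        processed.append(len(edges))
        reached |= new_nodes
    return reached, processed

def khop_frontier(edges, source, k):
    """Each iteration touches only the out-edges of nodes reached in the previous one."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    reached, frontier = {source}, {source}
    processed = []
    for _ in range(k):
        touched = 0
        next_frontier = set()
        for node in frontier:
            for dst in adj[node]:
                touched += 1
                if dst not in reached:
                    reached.add(dst)
                    next_frontier.add(dst)
        processed.append(touched)
        frontier = next_frontier
    return reached, processed

# Toy graph (hypothetical), 6 edges.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
full_reached, full_cost = khop_full_scan(edges, 0, 3)
sparse_reached, sparse_cost = khop_frontier(edges, 0, 3)
```

Both functions reach the same node set, but the full-scan version pays for all edges in every round while the frontier version pays only for the edges it actually needs. The frontier version relies on fine-grained updates to a mutable `reached` set, which is the kind of operation Observation 1 rules out.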

Page 10: Large-scale computation without sacrificing expressiveness

Review: MapReduce and Friends (cont’d)

Converged?

Page 11: Large-scale computation without sacrificing expressiveness

Review: MapReduce and Friends (cont’d)

Converged?

Operators: Filter, Map, Filter, Union, Join

Page 12: Large-scale computation without sacrificing expressiveness

Review: MapReduce and Friends (cont’d)

Observation 2: Static dataflow (no data-dependent control flow)

Converged?

Operators: Filter, Map, Filter, Union, Join

Page 13: Large-scale computation without sacrificing expressiveness

Example 2: Irregular parallelism

• Parallel SAT solver

E = (p ∨ !q) ∧ (!p ∨ r ∨ s) ∧ (q ∨ !s ∨ !t) ∧ (!p ∨ s) ∧ …

Page 14: Large-scale computation without sacrificing expressiveness

Example 2: Irregular parallelism

• Parallel SAT solver

E = (p ∨ !q) ∧ (!p ∨ r ∨ s) ∧ (q ∨ !s ∨ !t) ∧ (!p ∨ s) ∧ …

Search tree (one level per variable):

p = T | F

q = T | F    T | F

Page 15: Large-scale computation without sacrificing expressiveness

Example 2: Irregular parallelism

• Parallel SAT solver

E = (p ∨ !q) ∧ (!p ∨ r ∨ s) ∧ (q ∨ !s ∨ !t) ∧ (!p ∨ s) ∧ …

Search tree (one level per variable):

p = T | F

q = T | F    T | F

Page 16: Large-scale computation without sacrificing expressiveness

Example 2: Irregular parallelism

• Parallel SAT solver

E = (p ∨ !q) ∧ (!p ∨ r ∨ s) ∧ (q ∨ !s ∨ !t) ∧ (!p ∨ s) ∧ …

Search tree (one level per variable):

p = T | F

q = T | F    T | F

r = T | F
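A minimal sketch of why this search is irregular: branching on one variable at a time and pruning as soon as a clause is violated produces subtrees of different sizes, so the amount of work per branch is not known in advance. The sketch uses only the four clauses visible on the slide and does not reconstruct the elided remainder ("…"). The variable order and all function names are assumptions made for illustration.

```python
# Literals are (variable, polarity) pairs. Only the four visible clauses are used.
CLAUSES = [
    [("p", True), ("q", False)],                 # (p ∨ !q)
    [("p", False), ("r", True), ("s", True)],    # (!p ∨ r ∨ s)
    [("q", True), ("s", False), ("t", False)],   # (q ∨ !s ∨ !t)
    [("p", False), ("s", True)],                 # (!p ∨ s)
]
VARIABLES = ["p", "q", "r", "s", "t"]  # assumed branching order

def violated(clause, assignment):
    """A clause is violated once every literal in it is assigned and false."""
    return all(var in assignment and assignment[var] != polarity
               for var, polarity in clause)

def subtree_size(assignment, depth):
    """Count the nodes visited below this partial assignment, with pruning."""
    if any(violated(c, assignment) for c in CLAUSES):
        return 1  # pruned: no need to expand further
    if depth == len(VARIABLES):
        return 1  # complete assignment
    var = VARIABLES[depth]
    total = 1
    for value in (True, False):
        child = dict(assignment)
        child[var] = value
        total += subtree_size(child, depth + 1)
    return total

def solve(assignment, depth):
    """Depth-first search; returns the first satisfying assignment or None."""
    if any(violated(c, assignment) for c in CLAUSES):
        return None
    if depth == len(VARIABLES):
        return dict(assignment)
    var = VARIABLES[depth]
    for value in (True, False):
        result = solve({**assignment, var: value}, depth + 1)
        if result is not None:
            return result
    return None

# Work below the two top-level branches p = T and p = F.
sizes = {value: subtree_size({"p": value}, 1) for value in (True, False)}
solution = solve({}, 0)
```

The two entries of `sizes` differ, so a scheduler that hands each top-level branch to one worker ends up with uneven load. Balancing it requires spawning work dynamically as the tree unfolds, which a static dataflow graph cannot express.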

Page 17: Large-scale computation without sacrificing expressiveness

MapReduce-like frameworks assume:

1. Bulk transformation of immutable data

2. Static dataflow

Page 18: Large-scale computation without sacrificing expressiveness

Existing frameworks assume → Our work:

1. Bulk transformation of immutable data → Fine-grained operations on mutable data

2. Static dataflow → Dynamic, data-dependent control flow

Yet we still want elastic scalability and fault tolerance

Page 19: Large-scale computation without sacrificing expressiveness

CELIAS PROGRAMMING MODEL

A small twist on Linda

Page 20: Large-scale computation without sacrificing expressiveness

Programming model = data model + computation model

Page 21: Large-scale computation without sacrificing expressiveness

Data Models for Mutable Shared Memory

Page 22: Large-scale computation without sacrificing expressiveness

Data Models for Mutable Shared Memory

Global address space: UPC, X10, Fortress…
– Too low level

Page 23: Large-scale computation without sacrificing expressiveness

Data Models for Mutable Shared Memory

Global address space: UPC, X10, Fortress…
– Too low level

Key-value tables: RAMCloud, Dynamo, Piccolo…
[Diagram: table with Key and Value columns]
– Limited lookup ability
– Consistency concerns

Page 24: Large-scale computation without sacrificing expressiveness

Data Models for Mutable Shared Memory

Global address space: UPC, X10, Fortress…
– Too low level

Key-value tables: RAMCloud, Dynamo, Piccolo…
[Diagram: table with Key and Value columns]
– Limited lookup ability
– Consistency concerns

Tuplespace: Linda
– Flexible lookup with any attributes
– Individual tuples are immutable

Example tuples:

('employee', 'John', 29)

('todo', 'shopping')

('todo', 'walk')
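To make "flexible lookup with any attributes" concrete, here is a minimal single-process sketch of a tuplespace. The operation names `out`, `rdp`, and `inp` follow Linda's conventional vocabulary (add a tuple, non-blocking read, non-blocking remove). The `ANY` wildcard and the class itself are hypothetical and say nothing about how Celias stores tuples.

```python
ANY = object()  # wildcard marker for a template field (hypothetical name)

class TupleSpace:
    """Toy in-memory tuplespace; not distributed, not thread-safe."""

    def __init__(self):
        self._tuples = []

    def out(self, tup):
        """Add a tuple to the space."""
        self._tuples.append(tuple(tup))

    def _matches(self, template, tup):
        return len(template) == len(tup) and all(
            t is ANY or t == v for t, v in zip(template, tup))

    def rdp(self, template):
        """Return the first matching tuple without removing it, or None."""
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def inp(self, template):
        """Remove and return the first matching tuple, or None."""
        tup = self.rdp(template)
        if tup is not None:
            self._tuples.remove(tup)
        return tup

space = TupleSpace()
space.out(("employee", "John", 29))
space.out(("todo", "shopping"))
space.out(("todo", "walk"))

first_todo = space.inp(("todo", ANY))   # lookup by the first field
by_age = space.rdp((ANY, ANY, 29))      # lookup by the last field
```

Any field can serve as the lookup key, unlike a key-value table. Because individual tuples are immutable, an update is expressed as removing a tuple with `inp` and adding a new one with `out`.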

Page 25: Large-scale computation without sacrificing expressiveness

Programming model = data model + computation model

Linda = Tuplespace + Linda processes

Page 26: Large-scale computation without sacrificing expressiveness

Linda Processes

Process A: in(…) … out(…) …

Process B: … out(…) … out(…) …

Process C: … in(…) … in(…) …

Page 27: Large-scale computation without sacrificing expressiveness

Linda Processes

Process A: in(…) … out(…) …

Process B: … out(…) … out(…) …

Process C: … in(…) … in(…) …

☹ No automatic scaling

☹ No fault tolerance

Page 28: Large-scale computation without sacrificing expressiveness

Programming model = data model + computation model

Linda = Tuplespace + Linda processes

Celias = Tuplespace + microtasks

Page 29: Large-scale computation without sacrificing expressiveness

Microtasks

Tuplespace:

('hello', 5)

('hello', 7)

('world', 2)

…

Function wordcount()
Signature: (?word, ?cnt1), (?word, ?cnt2)
Code:
    sum := cnt1 + cnt2
    emit (word, sum)

Page 30: Large-scale computation without sacrificing expressiveness

Microtasks

Tuplespace:

('hello', 5)

('hello', 7)

('world', 2)

…

Function wordcount()
Signature: (?word, ?cnt1), (?word, ?cnt2)
Code:
    sum := cnt1 + cnt2
    emit (word, sum)

Bindings: word = 'hello', cnt1 = 5, cnt2 = 7

When a signature matches:

1. microtask launch

Page 31: Large-scale computation without sacrificing expressiveness

Microtasks

Tuplespace:

('hello', 5)

('hello', 7)

('world', 2)

…

Function wordcount()
Signature: (?word, ?cnt1), (?word, ?cnt2)
Code:
    sum := cnt1 + cnt2
    emit (word, sum)

5 + 7 = ??

When a signature matches:

1. microtask launch

2. code execution

Page 32: Large-scale computation without sacrificing expressiveness

Microtasks

Tuplespace:

('world', 2)

…

('hello', 12)

Function wordcount()
Signature: (?word, ?cnt1), (?word, ?cnt2)
Code:
    sum := cnt1 + cnt2
    emit (word, sum)

When a signature matches:

1. microtask launch

2. code execution

3. atomic replacement
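The three steps can be sketched as a small single-threaded loop. This is only an illustration of the match, execute, and replace cycle for the `wordcount()` example. The helper names `find_match` and `run` are made up, and a real runtime would presumably match signatures and commit replacements concurrently and transactionally rather than in a sequential loop.

```python
from itertools import combinations

def wordcount(word, cnt1, cnt2):
    """Microtask body from the slide: emit one tuple holding the sum."""
    return [(word, cnt1 + cnt2)]

def find_match(tuples):
    """Signature (?word, ?cnt1), (?word, ?cnt2): two distinct tuples sharing ?word."""
    for i, j in combinations(range(len(tuples)), 2):
        if tuples[i][0] == tuples[j][0]:
            return i, j
    return None

def run(tuples):
    """Fire microtasks until no pair of tuples matches the signature."""
    tuples = list(tuples)
    launches = 0
    while (match := find_match(tuples)) is not None:
        i, j = match                                    # 1. microtask launch
        (word, cnt1), (_, cnt2) = tuples[i], tuples[j]
        emitted = wordcount(word, cnt1, cnt2)           # 2. code execution
        # 3. replacement: drop both inputs and add the outputs in one step
        tuples = [t for k, t in enumerate(tuples) if k not in (i, j)] + emitted
        launches += 1
    return tuples, launches

final, launches = run([("hello", 5), ("hello", 7), ("world", 2)])
```

With the three tuples from the slide, one microtask fires and the space ends up holding `('hello', 12)` and `('world', 2)`. With more tuples per word, pairs keep being combined until each word has a single tuple left.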

Page 33: Large-scale computation without sacrificing expressiveness

(A + B) × (C + D)

Two functions: add() and multiply()

Page 34: Large-scale computation without sacrificing expressiveness

(A + B) × (C + D)

Two functions: add() and multiply()

Page 35: Large-scale computation without sacrificing expressiveness

E × F

☺ Automatic scaling

Two functions: add() and multiply()

Page 36: Large-scale computation without sacrificing expressiveness

E × F

☺ Automatic scaling

Two functions: add() and multiply()

Page 37: Large-scale computation without sacrificing expressiveness

E × F

☺ Automatic scaling

Two functions: add() and multiply()

Page 38: Large-scale computation without sacrificing expressiveness

E × F

☺ Automatic scaling

☺ Fault tolerance

Two functions: add() and multiply()
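One way to read this example, offered as an assumption rather than as the slides' own explanation: the two additions become ready at the same time, so they could run on different workers, and a microtask that dies before its replacement is committed leaves its input tuples in place, so it can simply be run again. The tuple encoding below (`"val"` and `"op"` tuples, result names `E`, `F`, `G`) and the helper functions are hypothetical.

```python
import operator

OPS = {"add": operator.add, "multiply": operator.mul}

def ready_ops(space):
    """Pending operations whose two operands are already present as values."""
    values = {t[1]: t[2] for t in space if t[0] == "val"}
    return [t for t in space
            if t[0] == "op" and t[2] in values and t[3] in values]

def step(space, op, crash=False):
    """Run one microtask; return the new space, or the old one if it crashed."""
    values = {t[1]: t[2] for t in space if t[0] == "val"}
    _, kind, left, right, out = op
    try:
        if crash:
            raise RuntimeError("simulated worker failure")
        result = OPS[kind](values[left], values[right])
    except RuntimeError:
        return space  # nothing was committed, so the inputs are still there
    consumed = {op, ("val", left, values[left]), ("val", right, values[right])}
    return (space - consumed) | {("val", out, result)}

# (A + B) × (C + D) with A..D = 1..4, encoded as value and operation tuples.
space = frozenset({
    ("val", "A", 1), ("val", "B", 2), ("val", "C", 3), ("val", "D", 4),
    ("op", "add", "A", "B", "E"),
    ("op", "add", "C", "D", "F"),
    ("op", "multiply", "E", "F", "G"),
})

independent = ready_ops(space)  # both additions are ready at once

crash_next = True  # make the very first microtask fail, then retry it
while ready_ops(space):
    space = step(space, ready_ops(space)[0], crash=crash_next)
    crash_next = False
```

The loop ends with a single tuple holding the product, even though the first attempt failed. `multiply()` only becomes ready after both sums exist, so the ordering falls out of tuple availability instead of an explicit barrier.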

Page 39: Large-scale computation without sacrificing expressiveness

More Examples in the Paper…

• MapReduce
– Celias is Turing-complete MapReduce-complete!
– without any artificial sync. barriers

• Single-source shortest path
– Pregel-style graph processing

• Quicksort
– Recursive control flow

Page 40: Large-scale computation without sacrificing expressiveness

Summary

• MapReduce-like frameworks are not suitable for algorithms with:
– Sparse/incremental/fine-grained computation
– Dynamic dataflow

• Celias comes to our rescue, yet it is also
– automatically scalable
– fault tolerant

Page 41: Large-scale computation without sacrificing expressiveness

Open Questions

• Microtask abstraction: good enough? went too far?

• Feasibility of an efficient implementation
– Reliable tuplespace
– Signature matching
– Microtask transactions

• … what is a killer app of Celias?

• <Your questions here>