Post on 25-Jun-2015
Large-scale computation without sacrificing expressiveness
Sangjin Han Sylvia Ratnasamy UC Berkeley
Review: MapReduce and Friends
Computation
Input Output
map filter
group by reduce join …
Observation 1: Bulk transformation of immutable data (no fine-grained updates)
Example 1: Sparse Operations
• k-hop reachability with iterative MapReduce
[Diagram: Source node + Graph → MR → 1-hop nodes; 1-hop nodes + Graph → MR → 2-hop nodes; 2-hop nodes + Graph → MR → …]
[Figure: number of processed edges (millions) per iteration, iterations 1 to 22, comparing Iterative MapReduce with Optimal. Dataset: Internet router topology graph (1.7M nodes, 22.2M edges)]
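The difference between the two curves can be illustrated with a minimal sketch. The graph, the function names (`khop_bulk`, `khop_frontier`), and the edge-counting scheme are assumptions made for illustration and are not taken from the talk. The bulk version rescans the whole edge list in every iteration, as an iterative MapReduce job would, while the frontier version only touches edges leaving newly reached nodes.

```python
from collections import defaultdict


def khop_bulk(edges, source, k):
    """Iterative MapReduce style: every iteration rescans the full edge list."""
    reached = {source}
    processed = []  # edges processed per iteration
    for _ in range(k):
        new_nodes = {dst for src, dst in edges if src in reached}  # full scan
        processed.append(len(edges))
        reached |= new_nodes
    return reached, processed


def khop_frontier(edges, source, k):
    """Fine-grained style: only edges leaving newly reached nodes are touched."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)

    reached = {source}
    frontier = {source}
    processed = []
    for _ in range(k):
        touched = 0
        next_frontier = set()
        for node in frontier:
            for dst in adjacency.get(node, []):
                touched += 1
                if dst not in reached:
                    reached.add(dst)
                    next_frontier.add(dst)
        processed.append(touched)
        frontier = next_frontier
    return reached, processed


# Toy graph (hypothetical); node 5 and 6 are unreachable from node 0.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (5, 6)]
bulk_reached, bulk_cost = khop_bulk(edges, 0, 3)
frontier_reached, frontier_cost = khop_frontier(edges, 0, 3)
print(bulk_cost, frontier_cost)
```

Both functions reach the same node set; only the number of edges processed per iteration differs, which is the quantity plotted on the slide.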
Review: MapReduce and Friends (cont’d)
[Diagram: dataflow graph of Filter, Map, Filter, Union, and Join operators with a "Converged?" check]
Observation 2: Static dataflow (no data-dependent control flow)
Example 2: Irregular parallelism
• Parallel SAT solver
E = (p ∨ !q)∧(!p ∨ r ∨ s)∧(q ∨ !s ∨ !t)∧(!p ∨ s)∧…
p = T F
q = T F T F
r = T F
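A minimal sketch of why this search is irregular, under stated assumptions: it uses only the four clauses visible in E (the elided remainder is not modeled), a fixed variable order, and a plain work list in place of real parallel workers. The names `CLAUSES`, `conflicts`, and `solve` are hypothetical.

```python
# Clauses visible on the slide; the elided "…" part of E is not modeled.
# Each literal is (variable, polarity): ("q", False) stands for !q.
CLAUSES = [
    [("p", True), ("q", False)],
    [("p", False), ("r", True), ("s", True)],
    [("q", True), ("s", False), ("t", False)],
    [("p", False), ("s", True)],
]
VARIABLES = ["p", "q", "r", "s", "t"]


def conflicts(assignment):
    """True if some clause has every literal assigned and falsified."""
    for clause in CLAUSES:
        if all(name in assignment and assignment[name] != polarity
               for name, polarity in clause):
            return True
    return False


def solve(order=VARIABLES):
    """Explore the search tree using a work list of partial assignments."""
    work = [{}]
    solutions, expanded = [], 0
    while work:
        assignment = work.pop()
        expanded += 1
        if conflicts(assignment):
            continue  # this subtree is pruned; its size was not known up front
        if len(assignment) == len(order):
            solutions.append(assignment)
            continue
        var = order[len(assignment)]
        # Each branch is an independent piece of work that could run elsewhere.
        work.append({**assignment, var: False})
        work.append({**assignment, var: True})
    return solutions, expanded


solutions, expanded = solve()
print(len(solutions), expanded)
```

A full binary tree over five variables has 63 nodes, but the sketch expands fewer because some branches (for example p = F, q = T) die early. How much work each branch generates depends on the data, so it cannot be laid out as a static dataflow in advance.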
MapReduce-like frameworks assume:
1. Bulk transformation of immutable data
2. Static dataflow
Existing frameworks assume:                 | Our work:
1. Bulk transformation of immutable data    | Fine-grained operations on mutable data
2. Static dataflow                          | Dynamic, data-dependent control flow
Yet we still want elastic scalability and fault tolerance
CELIAS PROGRAMMING MODEL
Adding a small twist to Linda
Programming model = data model + computation model
Data Models for Mutable Shared Memory
• Global address space: UPC, X10, Fortress…
  – Too low level
• Key-value tables: RAMCloud, Dynamo, Piccolo…
  – Limited lookup ability
  – Consistency concerns
• Tuplespace: Linda
  – Flexible lookup with any attributes
  – Individual tuples are immutable
  (‘employee’, ‘John’, 29)
  (‘todo’, ‘shopping’)
  (‘todo’, ‘walk’)
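A minimal in-memory sketch of the tuplespace data model, assuming a single process and no concurrency. The class `TupleSpace`, the wildcard marker `ANY`, and the method names are illustrative choices; `in` is a reserved word in Python, so the removing lookup is called `inp` here and does not block.

```python
ANY = object()  # wildcard marker for a field that may hold any value


class TupleSpace:
    def __init__(self):
        self._tuples = []

    def out(self, tup):
        """Add a tuple to the space."""
        self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(tup, pattern):
        """A pattern matches when lengths agree and every non-wildcard field is equal."""
        return len(tup) == len(pattern) and all(
            p is ANY or p == field for p, field in zip(pattern, tup))

    def rd(self, pattern):
        """Return the first matching tuple without removing it, or None."""
        for tup in self._tuples:
            if self._matches(tup, pattern):
                return tup
        return None

    def inp(self, pattern):
        """Remove and return the first matching tuple, or None (non-blocking)."""
        for i, tup in enumerate(self._tuples):
            if self._matches(tup, pattern):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
space.out(("employee", "John", 29))
space.out(("todo", "shopping"))
space.out(("todo", "walk"))

first_todo = space.rd(("todo", ANY))   # lookup by the first field
by_age = space.rd((ANY, ANY, 29))      # lookup by the last field

# Tuples are immutable: an "update" removes the old tuple and adds a new one.
old = space.inp(("employee", "John", ANY))
space.out((old[0], old[1], old[2] + 1))
updated = space.rd(("employee", "John", ANY))
```

Unlike a key-value table, no field is privileged as the key: any combination of fields can serve as the lookup criterion.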
Programming model = data model + computation model
Linda = Tuplespace + Linda processes
Linda Processes
Process A: in(…) … out(…) …
Process B: … out(…) … out(…) …
Process C: … in(…) … in(…) …
☹ No automatic scaling
☹ No fault tolerance
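The process-based computation model can be sketched with threads standing in for Linda processes. This is an illustration under assumptions, not Linda's actual API: `BlockingTupleSpace`, `in_` (matching on the first field only), and the three roles are hypothetical names.

```python
import threading


class BlockingTupleSpace:
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        """Add a tuple and wake up any process blocked in in_()."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def in_(self, tag):
        """Block until a tuple whose first field equals `tag` exists, then remove it."""
        with self._cond:
            while True:
                for i, tup in enumerate(self._tuples):
                    if tup[0] == tag:
                        return self._tuples.pop(i)
                self._cond.wait()


def producer(space, n):
    for i in range(n):
        space.out(("task", i))


def worker(space, n):
    for _ in range(n):
        _, value = space.in_("task")
        # If this process died here, the withdrawn tuple would be lost.
        space.out(("result", value * value))


def consumer(space, n, sink):
    for _ in range(n):
        _, value = space.in_("result")
        sink.append(value)


space = BlockingTupleSpace()
results = []
threads = [
    threading.Thread(target=worker, args=(space, 4)),
    threading.Thread(target=consumer, args=(space, 4, results)),
    threading.Thread(target=producer, args=(space, 4)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))
```

The sketch makes both drawbacks visible: the number of workers is fixed by the programmer rather than by the amount of pending work, and a tuple withdrawn by a process that then fails is not restored.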
Programming model = data model + computation model
Linda = Tuplespace + Linda processes
Celias = Tuplespace + microtasks
Microtasks
Tuplespace:
(‘hello’, 5)
(‘hello’, 7)
(‘world’, 2)
…
Function wordcount()
Signature: (?word, ?cnt1), (?word, ?cnt2)
Code: sum := cnt1 + cnt2
      emit (word, sum)
When a signature matches:
1. microtask launch: word = ‘hello’, cnt1 = 5, cnt2 = 7
2. code execution: 5 + 7 = 12
3. atomic replacement: (‘hello’, 5) and (‘hello’, 7) are replaced by (‘hello’, 12)
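The three steps can be sketched as a small single-threaded loop. This is a toy model under assumptions, not the Celias implementation: the tuplespace is a Python list, `find_match` and `run` are hypothetical helpers, and atomicity is trivial because nothing runs concurrently.

```python
from itertools import combinations


def wordcount(word, cnt1, cnt2):
    """Microtask body: sum := cnt1 + cnt2; emit (word, sum)."""
    return [(word, cnt1 + cnt2)]


def find_match(space):
    """Signature (?word, ?cnt1), (?word, ?cnt2): two distinct tuples sharing a word."""
    for i, j in combinations(range(len(space)), 2):
        if space[i][0] == space[j][0]:
            return i, j
    return None


def run(space):
    """Fire wordcount() until no pair of tuples matches the signature."""
    space = list(space)
    while True:
        match = find_match(space)
        if match is None:
            return space
        i, j = match
        (word, cnt1), (_, cnt2) = space[i], space[j]  # 1. launch with bindings
        emitted = wordcount(word, cnt1, cnt2)          # 2. code execution
        # 3. replacement: matched tuples leave and emitted tuples enter together
        space = [t for k, t in enumerate(space) if k not in (i, j)] + emitted


final = run([("hello", 5), ("hello", 7), ("world", 2)])
print(sorted(final))
```

No driver program decides how many times `wordcount()` runs; the loop stops when the signature no longer matches, that is, when every word has exactly one tuple left.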
Two functions: add() and multiply()
(A + B) × (C + D)
E × F
☺ Automatic scaling
☺ Fault tolerance
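A hedged sketch of how the expression could be evaluated with two microtask functions. The tuple encoding is an assumption made for illustration and does not come from the talk: `("val", name, x)` holds a value, and `("add", out, a, b)` or `("multiply", out, a, b)` describes a pending operation. The helpers `ready_tasks` and `step` are hypothetical.

```python
import operator

OPS = {"add": operator.add, "multiply": operator.mul}


def ready_tasks(space):
    """Operation tuples whose two operands already exist as ("val", name, x)."""
    values = {t[1]: t[2] for t in space if t[0] == "val"}
    return [t for t in space
            if t[0] in OPS and t[2] in values and t[3] in values]


def step(space):
    """Run every ready microtask of one round; returns (new_space, tasks_run)."""
    values = {t[1]: t[2] for t in space if t[0] == "val"}
    tasks = ready_tasks(space)
    consumed, emitted = set(), []
    for op, out, a, b in tasks:
        emitted.append(("val", out, OPS[op](values[a], values[b])))
        consumed.update({(op, out, a, b),
                         ("val", a, values[a]),
                         ("val", b, values[b])})
    return [t for t in space if t not in consumed] + emitted, len(tasks)


# (A + B) × (C + D) with hypothetical values A=1, B=2, C=3, D=4
space = [
    ("val", "A", 1), ("val", "B", 2), ("val", "C", 3), ("val", "D", 4),
    ("add", "E", "A", "B"),
    ("add", "F", "C", "D"),
    ("multiply", "G", "E", "F"),
]
space, first_round = step(space)    # both add() microtasks are ready
space, second_round = step(space)   # multiply() becomes ready once E and F exist
print(first_round, second_round, space)
```

In this example the two `add()` microtasks of the first round use disjoint tuples, so they could run on different machines; the amount of parallelism follows from how many signatures match at a given moment rather than from a fixed number of processes.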
More Examples in the Paper…
• MapReduce
  – Celias is Turing-complete MapReduce-complete!
  – without any artificial sync. barriers
• Single-source shortest path
  – Pregel-style graph processing
• Quicksort
  – Recursive control flow
Summary
• MapReduce-like frameworks are not suitable for algorithms with:
  – Sparse/incremental/fine-grained computation
  – Dynamic dataflow
• Celias comes to our rescue, yet it is also
  – automatically scalable
  – fault tolerant
Open Questions
• Microtask abstraction: good enough, or did it go too far?
• Feasibility of an efficient implementation
  – Reliable tuplespace
  – Signature matching
  – Microtask transactions
• … what is a killer app for Celias?
• <Your questions here>