Swarat Chaudhuri (Penn State), Roberto Lublinerman (Penn State), Pavol Cerny (IST Austria)

Parallel Programming with Object Assemblies

Taming parallelism

Data parallelism:

- Highly coarse-grained (MapReduce)
- Highly fine-grained (numeric computations on dense arrays)
- Problem-specific methods

Task parallelism

Message passing

Taming parallelism

Our target:

Data-parallel computations over large, unstructured, shared-memory graphs

Unknown granularity

High-level correctness as well as efficiency.

Delaunay mesh refinement

• Triangulate a given set of points.
• Delaunay property: No point is contained within the circumcircle of a triangle.
• Quality property: No bad triangles, i.e., triangles with an angle > 120°.
• Mesh refinement: Fix bad triangles through an iterative algorithm.
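The quality property above can be checked locally from a triangle's three vertices. The following is a minimal sketch, assuming 2D coordinates and the 120° threshold stated on the slide; the class and method names are hypothetical and do not come from Chorus or the Lonestar benchmarks.

```java
// Sketch only: classifies a triangle as "bad" using the slide's criterion
// (some interior angle greater than 120 degrees). Names are hypothetical.
final class TriangleQuality {
    // Threshold taken from the quality property stated above.
    static final double MAX_ANGLE_DEGREES = 120.0;

    // Interior angle, in degrees, at vertex a of triangle a-b-c.
    static double angleAt(double ax, double ay,
                          double bx, double by,
                          double cx, double cy) {
        double ux = bx - ax, uy = by - ay;   // edge a -> b
        double vx = cx - ax, vy = cy - ay;   // edge a -> c
        double dot = ux * vx + uy * vy;
        double cross = ux * vy - uy * vx;
        // atan2 of |cross| and dot gives the unsigned angle in [0, 180].
        return Math.toDegrees(Math.atan2(Math.abs(cross), dot));
    }

    // True if the largest interior angle exceeds the threshold.
    static boolean isBad(double ax, double ay,
                         double bx, double by,
                         double cx, double cy) {
        double largest = Math.max(angleAt(ax, ay, bx, by, cx, cy),
                         Math.max(angleAt(bx, by, cx, cy, ax, ay),
                                  angleAt(cx, cy, ax, ay, bx, by)));
        return largest > MAX_ANGLE_DEGREES;
    }
}
```

Because the check reads only the triangle's own vertices, it fits the "local check" that each triangle performs later in the deck.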

Retriangulation

Cavity: all triangles whose circumcircles contain the new point.

Quality constraint may not hold for all new triangles.

Sequential mesh refinement

Mesh m = /* read input mesh */;
Worklist wl = new Worklist(m.getBad());
foreach triangle t in wl {
    Cavity c = new Cavity(t);
    c.expand();
    c.retriangulate();
    m.updateMesh(c);
    wl.add(c.getBad());
}

• Cavities are contiguous “regions” in the mesh.
• Worst-case cavities can encompass the whole mesh.

Parallelization

• Computation over complex, unstructured graphs
  Mesh = heap-allocated graph. Nodes = triangles. Edges = adjacency.
• Atomicity: Cavities must be retriangulated atomically.
• Non-overlapping cavities can be processed in parallel.
• Seems impossible to handle with static analysis:
  – Shape of data structure changes greatly over time.
  – Shape of data structure is highly input-dependent.
  – Without deep algorithmic knowledge, impossible to say statically if cavities will overlap.
• Lots of recent work, notably by Pingali et al.
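To make the non-overlap condition concrete, here is a small sketch that greedily selects pairwise-disjoint cavities from a list, treating each cavity as a set of triangle ids. This representation and the `CavityScheduler` name are assumptions for illustration; the slides do not prescribe how overlap is detected, and Chorus resolves overlap dynamically through merges rather than through a scheduler like this one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: cavities are modeled as sets of triangle ids (an assumption).
final class CavityScheduler {
    // Greedily picks cavities that share no triangle with any earlier pick.
    // The picked cavities could be retriangulated in parallel.
    static List<Set<Integer>> pickDisjoint(List<Set<Integer>> cavities) {
        List<Set<Integer>> chosen = new ArrayList<>();
        Set<Integer> claimed = new HashSet<>();   // triangles already taken
        for (Set<Integer> cavity : cavities) {
            if (Collections.disjoint(claimed, cavity)) {
                chosen.add(cavity);
                claimed.addAll(cavity);
            }
        }
        return chosen;
    }
}
```

The cavity sets are only known at run time, which is why the slide argues that a static analysis cannot decide overlap in advance.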

List of similar applications

• Delaunay mesh refinement, Delaunay triangulation
• Agglomerative clustering, ray tracing
• Social network maintenance
• Minimum spanning tree, maximum flow
• N-body simulation, epidemiological simulation
• Sparse matrix-vector multiplication, sparse Cholesky factorization
• Belief propagation, survey propagation in Bayesian inference
• Iterative dataflow analysis, Petri net simulation
• Finite-difference PDE solution

Locality of updates in Chorus

Cavity

• On a mesh of ~100,000 triangles from the Lonestar benchmarks:
  Average cavity size = 3.75 triangles.
  Maximum cavity size = 12 triangles.
• Average-case locality is the essence of parallelism.
• Chorus: parallel computation driven by “neighborhoods” in heaps.

Heaps, regions, assemblies

• Heap = directed graph
  Nodes = objects
  Labeled edges = pointers
• Region = induced subgraph
• Assembly = region + thread of control
  Typically speculative and short-lived.
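The heap and region definitions above can be modeled directly. The sketch below is an illustration only: it assumes objects are identified by strings and that each field holds at most one pointer, and the `HeapGraph` class is hypothetical rather than part of Chorus. It shows what "induced subgraph" means here: a region keeps exactly those pointers whose source and target both lie inside it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: a heap as a directed graph with labeled edges (pointers).
final class HeapGraph {
    // object id -> (field name -> target object id)
    private final Map<String, Map<String, String>> fields = new HashMap<>();

    // Sets source.field = target, creating both objects if needed.
    void setField(String source, String field, String target) {
        fields.computeIfAbsent(source, k -> new HashMap<>()).put(field, target);
        fields.computeIfAbsent(target, k -> new HashMap<>());
    }

    // Pointers of the subgraph induced by `region`:
    // both endpoints of a kept pointer must belong to the region.
    Set<String> inducedPointers(Set<String> region) {
        Set<String> kept = new TreeSet<>();
        for (String source : region) {
            Map<String, String> out =
                fields.getOrDefault(source, Collections.emptyMap());
            for (Map.Entry<String, String> e : out.entrySet()) {
                if (region.contains(e.getValue())) {
                    kept.add(source + "." + e.getKey() + " -> " + e.getValue());
                }
            }
        }
        return kept;
    }
}
```

Pointers that leave the region are dropped from the induced subgraph; in Chorus terms, those are the edges along which an assembly might later merge.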

Programs, assembly classes

• Assembly class = set of local variables + set of guarded updates + constructor + public variables.
• Program = set of classes
• Synchronization happens in guard evaluation.

:: Guard: Update

[State diagram: busy (executing update); ready (to be preempted or execute next update); terminated]

Guards can merge assemblies

:: merge (u.f): S

:: merge (u.f) when g: S

• g is a condition on the local variables and owned objects of the merging assembly.
• The merging assembly gets a bigger region and keeps its local state.
• The absorbed assembly dies.
• The absorbed assembly must be in the ready state while the merge happens.
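The merge rules above can be captured in a small single-threaded model. This is a sketch under stated assumptions, not the Chorus runtime: the `AssemblySketch` class, its `State` enum, and the boolean `guardHolds` parameter (standing in for the guard g) are all hypothetical, and the real system would also need synchronization between concurrently running assemblies.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: single-threaded model of merge semantics; names are hypothetical.
final class AssemblySketch {
    enum State { READY, BUSY, TERMINATED }

    final String name;                           // stands in for local state
    final Set<String> region = new HashSet<>();  // owned objects
    State state = State.READY;

    AssemblySketch(String name, String... ownedObjects) {
        this.name = name;
        for (String object : ownedObjects) {
            region.add(object);
        }
    }

    // Models ":: merge (u.f) when g: S" up to the point where S would run.
    // Returns true if this assembly absorbed `target`.
    boolean tryMerge(AssemblySketch target, boolean guardHolds) {
        if (!guardHolds || target == this) {
            return false;
        }
        if (state == State.TERMINATED || target.state != State.READY) {
            return false;                     // target must be in its ready state
        }
        region.addAll(target.region);         // merging assembly gets a bigger region
        target.region.clear();
        target.state = State.TERMINATED;      // absorbed assembly dies
        return true;
    }
}
```

Note that the merging assembly's `name` field is untouched by a merge, mirroring the rule that it keeps its local state while only the region grows.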

Updates can split an assembly

split(T)

• Split into assemblies of class T.
• Other assemblies not affected.
• Not a synchronization construct.

Local updates

x = u.f;
x.f = y;

• Attempts to access objects outside region lead to exceptions.
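A region-checked heap makes the local-update rule concrete. The sketch below is an assumption-laden illustration: it models a single field f per object, uses string object ids, and invents the `RegionGuardedHeap` and `OutOfRegionException` names; the slides say only that an exception is raised, not what it is called.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: every field access is checked against the owning region.
final class RegionGuardedHeap {
    // Hypothetical exception type for accesses outside the region.
    static final class OutOfRegionException extends RuntimeException {
        OutOfRegionException(String object) {
            super("object outside region: " + object);
        }
    }

    private final Map<String, String> fField = new HashMap<>(); // object -> target of f
    private final Set<String> region = new HashSet<>();         // owned objects

    RegionGuardedHeap(Set<String> ownedObjects) {
        region.addAll(ownedObjects);
    }

    // Unchecked setup helper, used only to build the initial heap.
    void init(String object, String target) {
        fField.put(object, target);
    }

    private void check(String object) {
        if (!region.contains(object)) {
            throw new OutOfRegionException(object);
        }
    }

    // x = u.f : reading a field of u requires owning u.
    String readF(String u) {
        check(u);
        return fField.get(u);
    }

    // x.f = y : writing a field of x requires owning x.
    void writeF(String x, String y) {
        check(x);
        fField.put(x, y);
    }
}
```

In this model a reference to an outside object may be held (it is just an id), but dereferencing it fails, which is the point at which an assembly would need a merge to proceed.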

Delaunay mesh refinement

• Use two assembly classes: Triangle and Cavity.
  – Cavity = local region in mesh.
• Each triangle:
  – Determines if it is bad (local check).
  – If so, merges with neighbors to become cavity.
• Each cavity:
  – Determines if it is complete (local check).
  – If no, merges with a neighbor.
  – If yes, retriangulates (locally) and splits.

Delaunay mesh refinement: sketch

assembly Triangle:: ...
    action:: merge (v.f, Cavity) when isBad:
                 skip

assembly Cavity:: ...
    action:: merge (v.f) when (not isComplete):
                 ...
             isComplete: retriangulate(); split(Triangle)

Delaunay mesh refinement: sketch

assem Triangle:: ...
    action:: merge (v.f, Cavity, u) when bad?:
                 skip

assem Cavity:: ...
    action:: merge (v.f) when (not complete?):
                 skip
             complete?: retriangulate(); split(Triangle)

What happens on a conflict?

• Cavity i “absorbed” by cavity j.
• Cavity j now has some “unnecessary” triangles.
• j will later split.

Boruvka’s algorithm for minimum spanning tree

• Assembly = spanning tree
• Initially, each assembly has one node.
• As algorithm progresses, trees merge.
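For reference, a sequential version of Boruvka's algorithm shows the tree-merging pattern that the assemblies follow. This is a plain Java sketch, not Chorus code: it tracks trees with a union-find array instead of assemblies, assumes edges are given as {u, v, weight} triples, and breaks weight ties by edge index. The `BoruvkaSketch` name is hypothetical.

```java
import java.util.Arrays;

// Sketch only: sequential Boruvka; each union-find set plays the role of a tree.
final class BoruvkaSketch {
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];   // path halving
            x = parent[x];
        }
        return x;
    }

    // True if edge i is strictly preferred over edge j (ties broken by index).
    private static boolean better(int[][] edges, int i, int j) {
        return edges[i][2] < edges[j][2]
            || (edges[i][2] == edges[j][2] && i < j);
    }

    // edges[i] = {u, v, weight}. Returns the total weight of a minimum
    // spanning forest (a spanning tree if the graph is connected).
    static long mstWeight(int nodeCount, int[][] edges) {
        int[] parent = new int[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            parent[i] = i;                   // initially one tree per node
        }
        long total = 0;
        boolean merged = true;
        while (merged) {
            merged = false;
            int[] cheapest = new int[nodeCount];
            Arrays.fill(cheapest, -1);
            // Find the cheapest edge leaving each tree.
            for (int i = 0; i < edges.length; i++) {
                int a = find(parent, edges[i][0]);
                int b = find(parent, edges[i][1]);
                if (a == b) continue;
                if (cheapest[a] == -1 || better(edges, i, cheapest[a])) cheapest[a] = i;
                if (cheapest[b] == -1 || better(edges, i, cheapest[b])) cheapest[b] = i;
            }
            // Merge trees along the selected edges.
            for (int root = 0; root < nodeCount; root++) {
                int i = cheapest[root];
                if (i == -1) continue;
                int a = find(parent, edges[i][0]);
                int b = find(parent, edges[i][1]);
                if (a == b) continue;        // already merged in this round
                parent[a] = b;
                total += edges[i][2];
                merged = true;
            }
        }
        return total;
    }
}
```

Each round's merges touch only the two trees joined by the chosen edge, which is the locality that lets the assembly version run many merges concurrently.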

Race-freedom

• No aliasing, only ownership transfer.

• One assembly can merge with another only when the latter is not in the middle of an update.

Deadlock-freedom

• Classic definition: Process P waits for a resource from Q and vice versa.
• Deadlock in Chorus:
  – One assembly has a locally enabled merge with a second.
  – The second has a locally enabled merge with the first.
  – No other progress is possible.
• But one of the merges can always be carried out. (An assembly can always be killed at its ready state.)

JChorus

• Chorus + sequential Java.

• Assembly classes in addition to object classes.

assembly Cavity {
    action { // expand cavity
        merge(outgoingedges, TriangleObject t):
            { outgoingedges.remove(t);
              frontier.add(t);
              build(); }
    }
    Set members; Set border;
    Queue frontier;        // current frontier
    List outgoingedges;    // outgoing edges on which to merge
    TriangleObject initial;
    ...

Division-based implementation

• Division = set of assemblies mapped to a core.
• Local access:
  Merge-actions within a division
  Split-actions
  Local updates
• Remote access:
  Merge-actions issued across divisions
• Uses assembly-level locks.

Implementation strategies

• Adaptive divisions. Heuristic for reducing the number of remote merges.

• During a merge, not only the target assembly, but also assemblies reachable by k pointer indirections, are migrated.

• Adaptation heuristic does elementary load balancing.

• Union-find data structure to relate objects to the assemblies that they belong to.
  – Needed for splits and merges.
• Token-passing for deadlock prevention and termination detection.
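The migration heuristic above (moving the target assembly together with assemblies reachable by k pointer indirections) amounts to a depth-bounded graph search. The sketch below is one possible reading, assuming an adjacency map from each assembly id to the assemblies one indirection away; that input, and the `MigrationSketch` name, are hypothetical and not taken from the JChorus implementation.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: breadth-first search that stops expanding after k hops.
final class MigrationSketch {
    // neighbors: assembly id -> assemblies one pointer indirection away
    // (an assumed input format). Returns the target plus everything
    // reachable from it in at most k hops, in discovery order.
    static Set<Integer> withinKHops(Map<Integer, List<Integer>> neighbors,
                                    int target, int k) {
        Set<Integer> visited = new LinkedHashSet<>();
        Deque<int[]> queue = new ArrayDeque<>();   // entries are {assembly, distance}
        visited.add(target);
        queue.add(new int[] {target, 0});
        while (!queue.isEmpty()) {
            int[] current = queue.poll();
            if (current[1] == k) {
                continue;                          // do not expand past k hops
            }
            List<Integer> next =
                neighbors.getOrDefault(current[0], Collections.emptyList());
            for (int assembly : next) {
                if (visited.add(assembly)) {
                    queue.add(new int[] {assembly, current[1] + 1});
                }
            }
        }
        return visited;
    }
}
```

A larger k migrates more neighbors per merge, which may reduce later remote merges at the cost of moving more data; the slides do not state which value JChorus uses.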

Experiments: Delaunay refinement from Lonestar benchmarks

• Large dataset from Lonestar benchmarks.
  – 100,364 triangles.
  – 47,768 initially bad.
• 1 to 8 threads.
• Competing approaches:
  – Object-level locking
  – DSTM (Software transactions)

Locality: mesh snapshots

[Figures: the initial mesh and divisions; the mesh after several thousand retriangulations]

Delaunay: Speedup over sequential

Delaunay: Self-relative speedup

Delaunay: Conflicts

Related models

• Threads + explicit locking: Global heap abstraction, arbitrary aliasing.

• Software transactions: Burden of reasoning passed to transaction manager. In most implementations, heap is viewed as global.

• Static data partitioning: Unpredictable nature of the computation makes static analysis hard.

• Actors: Based on low-level messaging. If sending references, potential of races. If copying triangles, inefficient.

• Pingali et al.’s Galois: Addresses the same problem; Chorus is an alternative approach.

More information

Parallel programming with object assemblies. Roberto Lublinerman, Swarat Chaudhuri, Pavol Cerny. OOPSLA 2009.

http://www.cse.psu.edu/~swarat/chorus