Why STMs Need Compilers (a war story)
Maurice Herlihy, Brown University
STM
Interested in managed languages (Java, C#), not C or C++
Synchronization based on objects, not addresses
Strong isolation between transactional & non-transactional access
STM Libraries: DSTM 2003
Transactional “wrapper” for objects
Cumbersome & error-prone
// wrapper for node object
TMObject<RBNode> xnode = …
// explicit open
RBNode node = xnode.openRead();
// next field is wrapper
TMObject<RBNode> next = node.next;
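The explicit-open style above can be made concrete with a small Java sketch in the spirit of DSTM's obstruction-free design: each wrapper points to a "locator" naming the last writing transaction plus the versions before and after its write, and openWrite installs a fresh locator with compare-and-set instead of taking a lock. Everything here is an assumption for illustration, not DSTM's actual code: the class names, passing the transaction explicitly rather than using the current thread's transaction, the caller-supplied `copy` function, and the policy of always aborting an active conflicting writer. Read-set validation is omitted, so openRead simply returns the last committed version.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical, simplified DSTM-style wrapper (write path only; no read validation).
enum Status { ACTIVE, COMMITTED, ABORTED }

final class Transaction {
    final AtomicReference<Status> status = new AtomicReference<>(Status.ACTIVE);

    // Succeeds only if no conflicting transaction aborted us first.
    boolean commit() { return status.compareAndSet(Status.ACTIVE, Status.COMMITTED); }

    // Called by a conflicting writer; fails if we already committed.
    boolean abort() { return status.compareAndSet(Status.ACTIVE, Status.ABORTED); }
}

final class TMObject<T> {
    // Names the last writer and the versions before and after its write.
    private static final class Locator<V> {
        final Transaction owner;
        final V oldVersion;
        final V newVersion;
        Locator(Transaction owner, V oldVersion, V newVersion) {
            this.owner = owner;
            this.oldVersion = oldVersion;
            this.newVersion = newVersion;
        }
    }

    // Stand-in owner for the initial version: a transaction that has already committed.
    private static final Transaction INITIAL = new Transaction();
    static { INITIAL.commit(); }

    private final AtomicReference<Locator<T>> start;
    private final UnaryOperator<T> copy;  // makes a writer's private copy

    TMObject(T initial, UnaryOperator<T> copy) {
        this.start = new AtomicReference<>(new Locator<>(INITIAL, null, initial));
        this.copy = copy;
    }

    // Last committed version: the new one only if its writer committed.
    private T committedVersion(Locator<T> loc) {
        return loc.owner.status.get() == Status.COMMITTED ? loc.newVersion : loc.oldVersion;
    }

    T openRead(Transaction tx) {
        Locator<T> loc = start.get();
        if (loc.owner == tx) return loc.newVersion;  // our own tentative copy
        return committedVersion(loc);
    }

    T openWrite(Transaction tx) {
        while (true) {
            Locator<T> loc = start.get();
            if (loc.owner == tx) return loc.newVersion;  // already open for writing
            if (loc.owner.status.get() == Status.ACTIVE) {
                loc.owner.abort();  // assumed policy: abort the active writer, then re-read
                continue;
            }
            // Owner is committed or aborted, and those states never change again.
            T current = committedVersion(loc);
            Locator<T> mine = new Locator<>(tx, current, copy.apply(current));
            if (start.compareAndSet(loc, mine)) return mine.newVersion;  // lost a race: retry
        }
    }
}
```

Two writers on the same object show the policy at work: if transaction A opens an object for writing and transaction B then opens the same object for writing before A commits, B's openWrite aborts A, so A's later commit returns false while B's succeeds.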
STM Libraries: SXM 2005
Mark data types as atomic
Use reflection & run-time code generation
Not efficient (no global optimizations)
[Atomic] // Atomic Red-Black Tree node
public class RBNode {
  public int value;
  public Color color;
  public bool marked;
  public RBNode parent, left, right;

  // New nodes start red and unlinked
  public RBNode() {
    value = 0;
    color = Color.RED;
    parent = null;
    left = null;
    right = null;
  }
}
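SXM intercepts accesses to [Atomic] types at run time. Java has no direct counterpart to SXM's C# attribute handling, so this sketch swaps in `java.lang.reflect.Proxy`, which generates a proxy class at run time and routes every accessor call through a reflective `InvocationHandler`. The `Node` interface, the access log, and the map-backed field storage are hypothetical; the handler marks the spot where an STM library would open the object for reading or writing. Because every access goes through a reflective dispatch that the compiler cannot see across, no optimization can span several accesses, which is the inefficiency the slide points to.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical "atomic" type: only accessors; the state lives behind the proxy.
interface Node {
    int getValue();
    void setValue(int v);
}

// Every getX/setX call arrives here via reflection; an STM would open the object here.
final class AtomicHandler implements InvocationHandler {
    final Map<String, Object> fields = new HashMap<>();
    final List<String> log = new ArrayList<>();  // one entry per intercepted access

    @Override
    public Object invoke(Object proxy, Method m, Object[] args) {
        String name = m.getName();
        if (name.startsWith("get")) {
            String field = name.substring(3);
            log.add("read " + field);
            return fields.getOrDefault(field, 0);  // int fields default to 0
        }
        if (name.startsWith("set")) {
            String field = name.substring(3);
            log.add("write " + field);
            fields.put(field, args[0]);
            return null;
        }
        throw new UnsupportedOperationException(name);
    }

    // Generates the proxy class at run time and binds it to this handler.
    static Node newNode(AtomicHandler h) {
        return (Node) Proxy.newProxyInstance(
            Node.class.getClassLoader(), new Class<?>[] { Node.class }, h);
    }
}
```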
Current work
Mark data types as atomic
Compiler does the rest…
[Atomic] // Atomic Red-Black Tree node
public class RBNode {
  public int value;
  public Color color;
  public bool marked;
  public RBNode parent, left, right;

  // New nodes start red and unlinked
  public RBNode() {
    value = 0;
    color = Color.RED;
    parent = null;
    left = null;
    right = null;
  }

  // other methods
}
Optimizations
Whole-object dataflow analysis over the call tree of non-virtual methods, passing context from method to method
Every method is analyzed for local objects (not shared across transactions) and for read-only vs. read/write objects
Promotes openReads to openWrites for RW objects
Early opens for objects used across basic blocks
Partial Redundancy Elimination (PRE)
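The effect of these optimizations on a single straight-line block can be illustrated with a toy Java analysis. This is a hypothetical sketch, not the compiler's actual pass: given a sequence of field accesses (object name, read or write), it skips objects already known to be transaction-local, decides per object whether a read-only or read/write open is needed (promoting openRead to openWrite when any write occurs), and emits exactly one open per object, hoisted to the start of the block in first-use order, so no later access needs another open. The `Access` and `OpenPlanner` names and the string form of the output are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical IR node: one field access on a named object inside a transaction.
final class Access {
    final String object;
    final boolean isWrite;
    Access(String object, boolean isWrite) { this.object = object; this.isWrite = isWrite; }
}

final class OpenPlanner {
    // Returns the opens to emit at block entry, one per shared object, in first-use order.
    static List<String> planOpens(List<Access> block, Set<String> localObjects) {
        // Pass 1: for each shared object, is it written anywhere in the block?
        Map<String, Boolean> written = new LinkedHashMap<>();
        for (Access a : block) {
            if (localObjects.contains(a.object)) continue;  // local objects need no open
            written.merge(a.object, a.isWrite, Boolean::logicalOr);
        }
        // Pass 2: a read/write object gets openWrite up front (no openRead followed
        // by a later openWrite); all subsequent accesses reuse this single open.
        List<String> opens = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : written.entrySet()) {
            opens.add(e.getKey() + (e.getValue() ? ".openWrite()" : ".openRead()"));
        }
        return opens;
    }
}
```

For a block that reads `node`, writes a local `tmp`, writes `node`, reads `parent`, and reads `node` again, the planner yields `node.openWrite()` followed by `parent.openRead()`: the early read of `node` no longer needs its own openRead, and `tmp` is never opened.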
Optimizations (cont.)
First vs. subsequent reads/writes
Subsequent reads/writes are inlined as a fast-path code sequence
Subsequent reads/writes are lock-free, even in the STM mode that uses short-lived locks
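The first-versus-subsequent split can be sketched as an access barrier that takes the slow path only on a transaction's first touch of an object, then caches the opened copy in a transaction-local slot so every later access is a single null check with no lock. The Java classes below, including the `slowPathCalls` counter that makes the effect visible, are hypothetical; a compiler would inline the fast-path check at each access site rather than call a method, and commit-time validation is omitted.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical shared object; its lock is taken only on slow paths and at commit.
final class Shared {
    final ReentrantLock lock = new ReentrantLock();
    int value;
}

// Hypothetical per-transaction handle that caches what the first access produced.
final class OpenedRef {
    private final Shared target;
    private int[] copy;   // private working copy; null until the first access
    int slowPathCalls;    // counts slow-path entries, for illustration only

    OpenedRef(Shared target) { this.target = target; }

    private int[] opened() {
        int[] c = copy;
        if (c != null) return c;  // fast path: subsequent access, no lock taken
        slowPathCalls++;
        target.lock.lock();       // slow path: first access, short-lived lock
        try {
            copy = new int[] { target.value };
        } finally {
            target.lock.unlock();
        }
        return copy;
    }

    int read() { return opened()[0]; }
    void write(int v) { opened()[0] = v; }

    // Publishes the working copy (no conflict validation in this sketch).
    void commit() {
        if (copy == null) return;
        target.lock.lock();
        try {
            target.value = copy[0];
        } finally {
            target.lock.unlock();
        }
    }
}
```

Incrementing the value ten times through one handle enters the slow path once; the other nineteen accesses are lock-free field checks, and the shared value changes only when `commit` runs.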
Progress Conditions
Short-lived locks: blocking; efficient library implementation
Obstruction-free: no locks, non-blocking; inefficient library implementation
After compiler optimizations, obstruction-free performs as well as lock-based
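For contrast with the obstruction-free mode, blocking with short-lived locks can be sketched as commit-time locking. A transaction buffers its writes and records the version of each object it reads; only at commit does it briefly lock the objects it touched, in a fixed global order to avoid deadlock, check that no recorded version changed, install its writes, and unlock. This is a generic technique chosen for illustration, not SXM's implementation; the class names are hypothetical, locking read-set objects as well as written ones is a simplifying assumption, and reads are validated only at commit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical shared cell with a version number and a lock held only during commit.
final class TCell {
    private static final AtomicLong NEXT_ID = new AtomicLong();
    final long id = NEXT_ID.getAndIncrement();  // defines the global lock order
    final ReentrantLock lock = new ReentrantLock();
    volatile long version;
    volatile int value;
}

final class LockingTx {
    private final Map<TCell, Long> readVersions = new HashMap<>();
    private final Map<TCell, Integer> writes = new HashMap<>();

    int read(TCell c) {
        Integer buffered = writes.get(c);
        if (buffered != null) return buffered;   // read our own buffered write
        readVersions.putIfAbsent(c, c.version);  // version first, then value
        return c.value;
    }

    void write(TCell c, int v) { writes.put(c, v); }

    // Locks touched cells in id order, validates reads, installs writes, unlocks.
    boolean commit() {
        TreeMap<Long, TCell> touched = new TreeMap<>();
        for (TCell c : readVersions.keySet()) touched.put(c.id, c);
        for (TCell c : writes.keySet()) touched.put(c.id, c);
        for (TCell c : touched.values()) c.lock.lock();
        try {
            for (Map.Entry<TCell, Long> e : readVersions.entrySet()) {
                if (e.getKey().version != e.getValue()) return false;  // another commit intervened
            }
            for (Map.Entry<TCell, Integer> e : writes.entrySet()) {
                TCell c = e.getKey();
                c.value = e.getValue();  // value before version, so readers detect the change
                c.version++;
            }
            return true;
        } finally {
            for (TCell c : touched.values()) c.lock.unlock();
        }
    }
}
```

The locks are held only for the few instructions of commit, which is what makes them short-lived; a transaction delayed while holding them still blocks others, which is the blocking behavior the obstruction-free mode avoids.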
Performance Comparison
Blocking Mode
[Figure: throughput (Ops/sec) per benchmark (List, RBTree, SkipList, HashTable, Buffer) for SXM, Library Opts, Compiler, Const + Local, RWPromo, Subsequent, and PreOpen]
Performance Comparison
Obstruction-Free Mode
[Figure: throughput (Ops/sec) per benchmark (List, RBTree, SkipList, HashTable, Buffer) for SXM, Library Opts, Compiler, Const + Local, RWPromo, Subsequent, and PreOpen]
Moral of the story
Libraries are either low-overhead but hard to use, or inefficient but easy to use
Compiler support combines the best of both worlds and enables non-local optimization