Post on 14-Jan-2016
Why STMs Need Compilers (a war story)
Maurice Herlihy, Brown University
STM
Interested in managed languages
  Java, C#
  Not C, C++
Synchronization based on objects, not addresses
Strong isolation between transactional & non-transactional access
STM Libraries: DSTM 2003
Transactional “wrapper” for objects
Cumbersome & error-prone
// wrapper for node object
TMObject<RBNode> xnode = …
// explicit open
RBNode node = xnode.openRead();
// next field is wrapper
TMObject<RBNode> next = node.next;
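To make the "cumbersome & error-prone" point concrete, here is a minimal sketch of what client code looks like under the wrapper convention. The `TMObject` class below is a hypothetical, single-threaded stand-in written for this illustration; it is not the DSTM API and models only the explicit opens, not conflict detection, commit, or abort. `Node`, `wrap`, and `WrapperDemo` are likewise invented names.

```java
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a DSTM-style wrapper (assumption, not the real API).
final class TMObject<T> {
    private T version;
    private final UnaryOperator<T> copier;

    TMObject(T initial, UnaryOperator<T> copier) {
        this.version = initial;
        this.copier = copier;
    }

    // Read access: hands back the current version, which must not be modified.
    T openRead() { return version; }

    // Write access: installs and returns a fresh copy that may be modified.
    T openWrite() {
        version = copier.apply(version);
        return version;
    }
}

final class Node {
    int value;
    TMObject<Node> next; // fields hold wrappers, not nodes

    Node(int value, TMObject<Node> next) {
        this.value = value;
        this.next = next;
    }

    static TMObject<Node> wrap(int value, TMObject<Node> next) {
        return new TMObject<>(new Node(value, next), n -> new Node(n.value, n.next));
    }
}

final class WrapperDemo {
    // Every hop needs an explicit open before a field can be touched.
    static int sum(TMObject<Node> head) {
        int total = 0;
        for (TMObject<Node> x = head; x != null; ) {
            Node node = x.openRead();
            total += node.value;
            x = node.next;
        }
        return total;
    }

    // Updating requires openWrite; using openRead here would compile
    // but silently modify a version that was opened only for reading.
    static void increment(TMObject<Node> x) {
        Node node = x.openWrite();
        node.value++;
    }
}
```

Nothing in the type system stops a caller from writing through the result of `openRead()`, which is the kind of mistake the wrapper style leaves to programmer discipline.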
STM Libraries: SXM 2005
Mark data types as atomic
Use reflection & run-time code generation
Not efficient (no global optimizations)
[Atomic] // Atomic Red-Black Tree node
public class RBNode {
    public int value;
    public Color color;
    public bool marked;
    public RBNode parent, left, right;

    public RBNode() {
        value = 0;
        color = Color.RED;
        parent = null;
        left = null;
        right = null;
    }
}
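The general flavor of "reflection & run-time code generation" can be sketched in Java with the standard `java.lang.reflect.Proxy` facility. This is only an analogy: SXM itself is a C# library and its mechanism is not shown on the slide. `Cell`, `PlainCell`, and `RuntimeWrapper` are hypothetical names invented for this example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical interface and class used only for this illustration.
interface Cell {
    int get();
    void set(int v);
}

final class PlainCell implements Cell {
    private int value;
    public int get() { return value; }
    public void set(int v) { value = v; }
}

final class RuntimeWrapper {
    static int intercepted; // number of calls routed through a generated proxy

    // Builds a proxy class at run time. Every call goes through the handler,
    // which is where a library could open the object before touching it.
    static <T> T wrap(T target, Class<T> iface) {
        InvocationHandler handler = (p, method, args) -> {
            intercepted++;
            return method.invoke(target, args);
        };
        Object generated = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
        return iface.cast(generated);
    }
}
```

Each access is handled in isolation inside the handler, so the library has no view of the surrounding method. That is consistent with the slide's remark that this approach allows no global optimizations.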
Current work
Mark data types as atomic
Compiler does the rest…
[Atomic] // Atomic Red-Black Tree node
public class RBNode {
    public int value;
    public Color color;
    public bool marked;
    public RBNode parent, left, right;

    public RBNode() {
        value = 0;
        color = Color.RED;
        parent = null;
        left = null;
        right = null;
    }

    // other methods
}
Optimizations
Whole object dataflow analysis
  Call tree for non-virtual methods
  Pass context from method to method
Every method is analyzed for:
  Local objects (not shared across transactions)
  Read only or read/write objects
Promotes openReads to openWrites for RW objects
Early opens for objects used across basic blocks
Partial Redundancy Elimination (PRE)
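As a rough illustration of read-to-write promotion, the sketch below contrasts code that opens an object for reading and then again for writing with code where the first open is already a write open. `Handle` is a hypothetical class that only counts opens; it is not the API of any real STM, and the "naive" form is an assumption about what unoptimized instrumentation might look like.

```java
// Hypothetical handle that counts opens (assumption for illustration only).
final class Handle {
    int value;
    int opens; // number of open calls performed on this handle

    Handle openRead()  { opens++; return this; }
    Handle openWrite() { opens++; return this; }
}

final class PromotionSketch {
    // Unoptimized form: open for read, then open again for write.
    static void incrementNaive(Handle h) {
        int v = h.openRead().value;
        h.openWrite().value = v + 1;
    }

    // After promotion: the analysis has classified the object as read/write
    // in this method, so the first open is a write open and the second is gone.
    static void incrementPromoted(Handle h) {
        Handle w = h.openWrite();
        w.value = w.value + 1;
    }
}
```

Both methods leave the same value behind; the promoted one performs a single open instead of two.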
Optimizations (cont.)
First vs. subsequent read/writes
  Inline subsequent reads/writes with fast-path code sequence
  Subsequent reads/writes are lock-free, even in the STM mode that uses short locks
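One way to picture the first/subsequent split is a cheap inline check that skips the full open protocol once the transaction already has the object open. The sketch below is an assumption about the general shape, not the code sequence the compiler actually emits; `Txn`, `TObject`, and the `owner` field are hypothetical.

```java
// Hypothetical transaction descriptor; counts trips through the slow path.
final class Txn {
    int slowOpens;
}

final class TObject {
    int value;
    Txn owner; // transaction that currently has this object open for writing, if any

    // Slow path: stands in for the full open protocol (locking or CAS, logging).
    TObject openWriteSlow(Txn tx) {
        tx.slowOpens++;
        owner = tx;
        return this;
    }

    // Fast path that could be inlined at subsequent accesses:
    // one comparison, then direct access; otherwise fall back to the slow path.
    TObject openWrite(Txn tx) {
        return owner == tx ? this : openWriteSlow(tx);
    }
}
```

In this sketch only the first access by a transaction reaches `openWriteSlow`; later accesses cost a single comparison and take no lock.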
Progress Conditions
Short-lived locks
  Blocking
  Efficient library implementation
Obstruction-free
  No locks, non-blocking
  Inefficient library implementation
After compiler optimizations, obstruction-free performs as well as lock-based
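The blocking versus non-blocking distinction can be sketched on a single counter. This is a simplified analogy with hypothetical class names, not either STM mode from the talk. Note also that the compare-and-set retry loop shown here is lock-free, which is a stronger guarantee than obstruction-freedom; it is used only to show an update that never holds a lock.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

final class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int value;

    // Blocking: a thread that stalls while holding the lock stalls everyone else.
    int increment() {
        lock.lock();
        try {
            return ++value;
        } finally {
            lock.unlock();
        }
    }
}

final class NonBlockingCounter {
    private final AtomicReference<Integer> version = new AtomicReference<>(0);

    // Non-blocking: build the new version privately, then try to install it
    // with one compare-and-set; retry if another thread got there first.
    int increment() {
        while (true) {
            Integer old = version.get();
            Integer next = old + 1;
            if (version.compareAndSet(old, next)) {
                return next;
            }
        }
    }
}
```

The non-blocking version allocates a new version object on every update, which hints at why a library-only obstruction-free implementation tends to carry more overhead than short locks.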
Performance Comparison: Blocking Mode
[Figure: bar chart of throughput (Ops/sec, axis 0 to 900,000) per benchmark (List, RBTree, SkipList, HashTable, Buffer). Series: SXM, Library Opts, Compiler, Const + Local, RWPromo, Subsequent, PreOpen.]
Performance Comparison: Obstruction-Free Mode
[Figure: bar chart of throughput (Ops/sec, axis 0 to 900,000) per benchmark (List, RBTree, SkipList, HashTable, Buffer). Series: SXM, Library Opts, Compiler, Const + Local, RWPromo, Subsequent, PreOpen.]
Moral of the story
Libraries are either
  Low-overhead but hard to use
  Inefficient but easy to use
Compiler support
  Combines best of both worlds
  Non-local optimization