Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming...
-
Upload
julian-munoz -
Category
Documents
-
view
212 -
download
0
Transcript of Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming...
![Page 1: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/1.jpg)
Compiler and Runtime Supportfor Efficient
Software Transactional Memory
Vijay Menon
Programming Systems Lab
Ali-Reza Adl-Tabatabai, Brian T. Lewis,Brian R. Murphy, Bratin Saha, Tatiana Shpeisman
![Page 2: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/2.jpg)
2
Motivation
Locks are hard to get right• Programmability vs scalability
Transactional memory is appealing alternative• Simpler programming model• Stronger guarantees
•Atomicity, Consistency, Isolation•Deadlock avoidance
• Closer to programmer intent• Scalable implementations
Questions• How to lower TM overheads – particularly in software?• How to balance granularity / scalability?
![Page 3: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/3.jpg)
3
Our System
Java Software Transactional Memory (STM) System– Pure software implementation (McRT-STM – PPoPP ’06)– Language extensions in Java (Polyglot)– Integrated with JVM & JIT (ORP & StarJIT)
Novel Features– Rich transactional language constructs in Java– Efficient, first class nested transactions– Complete GC support – Risc-like STM API / IR– Compiler optimizations– Per-type word and object level conflict detection
![Page 4: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/4.jpg)
4
Transactional Java → Java
Transactional Java
atomic {
S;
}
Other Language Constructs• Built on prior research
– retry (STM Haskell, …)– orelse (STM Haskell) – tryatomic (Fortress)– when (X10, …)
Standard Java + STM API
while(true) {
TxnHandle th = txnStart();
try {
S’;
break;
} finally {
if(!txnCommit(th))
continue;
}
}
![Page 5: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/5.jpg)
5
Tight integration with JVM & JIT
StarJIT & ORP
• On-demand cloning of methods (Harris ’03)
• Identifies transactional regions in Java+STM code
• Inserts read/write barriers in transactional code
• Maps STM API to first class opcodes in StarJIT IR (STIR)
Good compiler representation →
greater optimization opportunities
![Page 6: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/6.jpg)
6
Representing Read/Write Barriers
atomic {
a.x = t1
a.y = t2
if(a.z == 0) {
a.x = 0
a.z = t3
}
}
…
stmWr(&a.x, t1)
stmWr(&a.y, t2)
if(stmRd(&a.z) != 0) {
stmWr(&a.x, 0);
stmWr(&a.z, t3)
}
Traditional barriers hide redundant locking/logging
![Page 7: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/7.jpg)
7
An STM IR for Optimization
Redundancies exposed:
atomic {
a.x = t1
a.y = t2
if(a.z == 0) {
a.x = 0
a.z = t3
}
}
txnOpenForWrite(a)
txnLogObjectInt(&a.x, a)
a.x = t1
txnOpenForWrite(a)
txnLogObjectInt(&a.y, a)
a.y = t2
txnOpenForRead(a)
if(a.z != 0) {
txnOpenForWrite(a)
txnLogObjectInt(&a.x, a)
a.x = 0
txnOpenForWrite(a)
txnLogObjectInt(&a.z, a)
a.z = t3
}
![Page 8: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/8.jpg)
8
Optimized Code
atomic {
a.x = t1
a.y = t2
if(a.z == 0) {
a.x = 0
a.z = t3
}
}
txnOpenForWrite(a)
txnLogObjectInt(&a.x, a)
a.x = t1
txnLogObjectInt(&a.y, a)
a.y = t2
if(a.z != 0) {
a.x = 0
txnLogObjectInt(&a.z, a)
a.y = t3
}
Fewer & cheaper STM operations
![Page 9: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/9.jpg)
9
Compiler Optimizations for Transactions
Standard optimizations• CSE, Dead-code-elimination, …
• Careful IR representation exposes opportunities and enables optimizations with almost no modifications
• Subtle in presence of nesting
STM-specific optimizations• Immutable field / class detection & barrier removal (vtable/String)
• Transaction-local object detection & barrier removal
• Partial inlining of STM fast paths to eliminate call overhead
![Page 10: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/10.jpg)
10
McRT-STM
PPoPP 2006 (Saha, et. al.)• C / C++ STM• Pessimistic Writes:
– strict two-phase locking– update in place– undo on abort
• Optimistic Reads: – versioning– validation before commit
• Benefits– Fast memory accesses (no buffering / object wrapping)– Minimal copying (no cloning for large objects)– Compatible with existing types & libraries
Similar STMs: Ennals (FastSTM), Harris, et.al (PLDI ’06)
![Page 11: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/11.jpg)
11
STM Data Structures
Per-thread:
• Transaction Descriptor– Per-thread info for version validation, acquired locks, rollback– Maintained in Read / Write / Undo logs
• Transaction Memento– Checkpoint of logs for nesting / partial rollback
Per-data:
• Transaction Record– Pointer-sized field guarding a set of shared data– Transactional state of data
• Shared: Version number (odd)• Exclusive: Owner’s transaction descriptor (even / aligned)
![Page 12: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/12.jpg)
12
Mapping Data to Transaction Record
Every data item has an associated transaction record
TxR1
TxR2
TxR3
…TxRn
Object words hashinto table of TxRs
Hash is f(obj.hash, offset)
class Foo { int x; int y;}
TxRxy
vtbl Transactionrecord embedded
In objectObject
granularity
Wordgranularity
class Foo { int x; int y;}
hashxy
vtbl
![Page 13: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/13.jpg)
13
Granularity of Conflict Detection
Object-level• Cheaper operation• Exposes CSE opportunities• Lower overhead on 1P
Word-level • Reduces false sharing• Better scalability
Mix & Match• Per type basis• E.g., word-level for arrays,
object-level for non-arrays
// Thread 1
a.x = …
a.y = …
// Thread 2
… = … a.z …
![Page 14: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/14.jpg)
14
Experiments
16-way 2.2 GHz Xeon with 16 GB shared memory• L1: 8KB, L2: 512 KB, L3: 2MB, L4: 64MB (per four)
Workloads• Hashtable, Binary tree, OO7 (OODBMS)
– Mix of gets, in-place updates, insertions, and removals
• Object-level conflict detection by default– Word / mixed where beneficial
![Page 15: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/15.jpg)
15
Effective of Compiler Optimizations
1P overheads over thread-unsafe baseline
Prior STMs typically incur ~2x on 1PWith compiler optimizations:
- < 40% over no concurrency control- < 30% over synchronization
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
HashMap TreeMap
% O
verh
ead
on
1P
Synchronized
No STM Opt
+Base STM Opt
+Immutability
+Txn Local
+Fast Path Inlining
![Page 16: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/16.jpg)
16
Scalability: Java HashMap Shootout
Unsafe (java.util.HashMap)• Thread-unsafe w/o Concurrency Control
Synchronized• Coarse-grain synchronization via SynchronizedMap wrapper
Concurrent (java.util.concurrent.ConcurrentHashMap)• Multi-year effort: JSR 166 -> Java 5• Optimized for concurrent gets (no locking)• For updates, divides bucket array into 16 segments (size / locking)
Atomic• Transactional version via “AtomicMap” wrapper
Atomic Prime• Transactional version with minor hand optimization
• Tracks size per segment ala ConcurrentHashMap
Execution• 10,000,000 operations / 200,000 elements• Defaults: load factor, threshold, concurrency level
![Page 17: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/17.jpg)
17
Scalability: 100% Gets
Atomic wrapper is competitive with ConcurrentHashMapEffect of compiler optimizations scale
02468
10121416
0 4 8 12 16
# of Processors
Sp
eed
up
over
1P
Un
safe
Unsafe Synchronized Concurrent
Atomic (No Opt) Atomic
![Page 18: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/18.jpg)
18
Scalability: 20% Gets / 80% Updates
ConcurrentHashMap thrashes on 16 segmentsAtomic still scales
0
24
6
8
1012
14
16
0 4 8 12 16
# of Processors
Sp
eed
up
ove
r 1P
Un
safe
Synchronized Concurrent Atomic (No Opt) Atomic
![Page 19: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/19.jpg)
19
20% Inserts and Removes
Atomic conflicts on entire bucket array- The array is an object
0
0.5
1
1.5
2
2.5
3
0 4 8 12 16
# of Processors
Sp
eed
up
ove
r 1P
Un
safe
Synchronized Concurrent Atomic
![Page 20: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/20.jpg)
20
20% Inserts and Removes: Word-Level
We still conflict on the single size field in java.util.HashMap
0
0.5
1
1.5
2
2.5
3
0 4 8 12 16
# of Processors
Sp
eed
up
ove
r 1P
Un
safe
Synchronized Concurrent
Object Atomic Word Atomic
![Page 21: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/21.jpg)
21
20% Inserts and Removes: Atomic Prime
Atomic Prime tracks size / segment – lowering bottleneckNo degradation, modest performance gain
0
0.5
1
1.5
2
2.5
3
0 4 8 12 16
# of Processors
Sp
eed
up
ove
r 1P
Un
safe
Synchronized ConcurrentObject Atomic Word AtomicWord Atomic Prime
![Page 22: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/22.jpg)
22
20% Inserts and Removes: Mixed-Level
Mixed-level preserves wins & reduces overheads-word-level for arrays-object-level for non-arrays
0
0.5
1
1.5
2
2.5
3
0 4 8 12 16
# of Processors
Sp
eed
up
ove
r 1P
Un
safe
Synchronized ConcurrentObject Atomic Word AtomicWord Atomic Prime Mixed Atomic Prime
![Page 23: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/23.jpg)
23
Key Takeaways
Optimistic reads + pessimistic writes is nice sweet spot
Compiler optimizations significantly reduce STM overhead- 20-40% over thread-unsafe
- 10-30% over synchronized
Simple atomic wrappers sometimes good enough
Minor modifications give competitive performance to complex fine-grain synchronization
Word-level contention is crucial for large arrays
Mixed contention provides best of both
![Page 24: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/24.jpg)
24
Novel Contributions
Rich transactional language constructs in Java
Efficient, first class nested transactions
Complete GC support
Risc-like STM API
Compiler optimizations
Per-type word and object level conflict detection
![Page 25: Compiler and Runtime Support for Efficient Software Transactional Memory Vijay Menon Programming Systems Lab Ali-Reza Adl-Tabatabai, Brian T. Lewis, Brian.](https://reader035.fdocuments.us/reader035/viewer/2022070305/55141eea550346d8488b5705/html5/thumbnails/25.jpg)
25