Post on 05-Jan-2016
Hybrid Transactional Memory
Sanjeev Kumar (Intel Labs), Michael Chu (University of Michigan),
Christopher Hughes (Intel Labs), Partha Kundu (Intel Labs),
Anthony Nguyen (Intel Labs)
Hybrid Transactional Memory 2Intel Labs
Promise of Transactional Memory (TM)
1 Easier to program; composes naturally
2 Easier to get parallel performance
3 No deadlocks
4 Maintain consistency in the presence of errors
5 Avoid priority inversion and convoying
6 Supports fault tolerance
Simplify Parallel Programming
With transactions:
transaction {
    A = A - 10;
    B = B + 10;
    ... if ( error ) abort_transaction;
}
With locks:
lock(l1); lock(l2);
A = A - 10;
B = B + 10;
... if ( error ) recovery_code();
unlock(l1); unlock(l2);
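The transaction-with-abort style on this slide can be tried out with a small sketch. Python has no transactional memory, so the `transaction` helper below is a hypothetical stand-in: it emulates all-or-nothing behaviour by taking one global lock and restoring a snapshot on abort. It illustrates the programming model only, not the optimistic execution a real TM would provide. All names (`transaction`, `AbortTransaction`, `transfer`) are assumptions made for this example.

```python
import threading
from contextlib import contextmanager


class AbortTransaction(Exception):
    """Raised inside a transaction body to request an explicit abort."""


# Emulation only: a real TM would run transactions optimistically.
_global_lock = threading.Lock()


@contextmanager
def transaction(accounts):
    """Hypothetical stand-in for `transaction { ... }` over a dict of balances."""
    with _global_lock:
        snapshot = dict(accounts)        # keep the old version
        try:
            yield accounts
        except AbortTransaction:
            accounts.clear()
            accounts.update(snapshot)    # abort: none of the updates survive


def transfer(accounts, src, dst, amount):
    """Move `amount` from src to dst, aborting if src would go negative."""
    with transaction(accounts) as acc:
        acc[src] -= amount
        acc[dst] += amount
        if acc[src] < 0:                 # the "error" case from the slide
            raise AbortTransaction
```

Note that the caller needs no recovery code: an aborted transfer simply leaves both balances as they were. Exceptions other than `AbortTransaction` are not rolled back in this sketch.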
Flavors of Transactional Memory
1 Easier to program; composes naturally
2 Easier to get parallel performance
3 No deadlocks
4 Maintain consistency in the presence of errors
5 Avoid priority inversion and convoying
6 Supports fault tolerance
Our Work: Efficient support for a TM that supports all these features
Basic
Support programmer abort
Support nonblocking
TM Implementations
Requires versioning support and conflict detection
Hardware approach [ Herlihy’93 ]
  Bounded number of locations
  Maintain versions in cache → Low overhead
Pure-software approach [ Herlihy’03, Harris’03 ]
  Unbounded number of locations can be accessed within a transaction
  Slow due to overhead of maintaining multiple copies
  ─ Potentially orders of magnitude
Unbounded hardware approach [ Hammond’04, Ananian’05, Rajwar’05, Moore’06 ]
  Require significant hardware support
  Discussed in more detail in the paper
Hardware vs. Software TM
Hardware Approach
  Low overhead
    Buffers transactional state in cache
  More concurrency
    Cache-line granularity
  Bounded resource
  Assembly
  Within a module
  Useful BUT limited to library writers
Software Approach
  High overhead
    Uses object copying to keep transactional state
  Less concurrency
    Object granularity
  No resource limits
  High-level languages
  Across modules
  Useful BUT limited to special data structures
Neither is satisfactory for broader use
This Work
A Hybrid Transactional Memory Scheme
  Requires modest hardware support
    Changes are localized
  Supports unbounded number of locations
    Performance of hardware when within hardware resource limits ( Low overhead of pure hardware TM )
    Gracefully fall back to software if the hardware resource limits are exceeded ( Unbounded resources of pure software TM )
Experimentally demonstrate effectiveness of our approach
Outline
  Motivation
  Proposed Architectural Support
  Hybrid Transactional Memory
  Performance Evaluation
  Conclusions
ISA Extensions
Start of a Transaction
  Begin Transaction: All ( XBA ) or Select ( XBS )
  Save Register State ( SSTATE )
  Specify handler on abort due to conflict ( XHAND )
During a Transaction
  Perform memory loads and stores
  Override defaults ( LDX, STX, LDR, STR )
On Transaction Abort
  Explicit Abort Transaction ( XA )
  Restore Register State ( RSTATE )
On Transaction Commit
  Commit Transaction ( XC )
Baseline CMP Architecture
[Figure: cores, each with its own L1 $, connected through an interconnect to the L2 $]
Our proposed changes: Modest and Localized
  Modifications to
    Core
    L1 $
  No changes to
    Interconnect
    Coherence Protocol
    L2 $
    Memory
Hardware Support for TM
Three requirements:
  Maintain two versions
  Detect conflict
    Same core: Tag
    Another core: Cache coherence
  Atomic commit and abort
Bounded
  Capacity of TM $
  Associativity of TM $ and L2
[Figure: the core sends regular accesses to the L1 $ (Tag, Data) and transactional accesses to the transactional $ (Tag, Addl. Tag, Old Data, New Data); both connect to the interconnect]
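The three requirements can be illustrated with a toy software model of the transactional cache. This is a sketch under stated assumptions, not the proposed hardware: it tracks only stores, ignores associativity and cross-core conflict detection, and the names `TransactionalCache` and `CapacityExceeded` are invented for the example. Each entry keeps the old and new data for one address, commit and abort act on all entries together, and the number of entries is bounded.

```python
class CapacityExceeded(Exception):
    """The transaction touched more addresses than the cache can hold."""


class TransactionalCache:
    """Toy model: one (old data, new data) pair per written address."""

    def __init__(self, memory, capacity):
        self.memory = memory        # dict: address -> value (stands in for L1/L2/memory)
        self.capacity = capacity    # assumed number of entries
        self.entries = {}           # address (tag) -> [old data, new data]

    def load(self, addr):
        if addr in self.entries:
            return self.entries[addr][1]    # see our own speculative store
        return self.memory[addr]

    def store(self, addr, value):
        if addr in self.entries:
            self.entries[addr][1] = value
            return
        if len(self.entries) >= self.capacity:
            raise CapacityExceeded(addr)    # bounded resource
        self.entries[addr] = [self.memory[addr], value]

    def commit(self):
        # All buffered stores become visible together.
        for addr, (_, new) in self.entries.items():
            self.memory[addr] = new
        self.entries.clear()

    def abort(self):
        # Memory never saw the new data, so dropping the entries is enough.
        self.entries.clear()
```

The `CapacityExceeded` case is the situation in which a bounded hardware scheme cannot complete a transaction on its own.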
Outline
  Motivation
  Proposed Architectural Support
  Hybrid Transactional Memory
    Existing pure software scheme
    Our hybrid scheme
  Performance Evaluation
  Conclusions
Pure Software TM [ Herlihy’03 ]
We use this Pure Software TM as a starting point
Implemented without any special architectural support using two techniques
  Use copies of objects to keep transactional state
  ─ Make modifications on the copy during a transaction
  Add a level of indirection
  ─ Switch the versions when a transaction is committed
[Figure: the Object Pointer refers to a record with three fields (State Pointer, Old, New); the State Pointer refers to the State, and Old and New each refer to a copy of the Object Contents]
State       Valid Copy
Active      Old
Aborted     Old
Committed   New
Pure Software TM Scheme Cont’d
Before accessing an object within a transaction
[Figure: the Object Pointer, two records (State Pointer, Old, New), each with its own State, and copies of the Object Contents; labels on the figure: X, Valid Copy, Modify]
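The indirection scheme on these two slides can be sketched in a few lines. This is a single-threaded illustration under assumptions: the class names are invented, the pointer switch is a plain assignment where a real implementation would use an atomic compare-and-swap, and the contention policy (always abort the other transaction) is a placeholder. The record with State Pointer, Old and New fields is called a locator in the cited software TM work.

```python
import copy

ACTIVE, ABORTED, COMMITTED = "active", "aborted", "committed"


class Transaction:
    def __init__(self):
        self.state = ACTIVE


class Locator:
    """State pointer plus old and new copies of the object contents."""

    def __init__(self, owner, old, new):
        self.owner = owner    # state pointer
        self.old = old
        self.new = new

    def valid_copy(self):
        # Active -> Old, Aborted -> Old, Committed -> New
        return self.new if self.owner.state == COMMITTED else self.old


class TMObject:
    """The level of indirection: an object pointer that refers to a locator."""

    def __init__(self, contents):
        done = Transaction()
        done.state = COMMITTED
        self.locator = Locator(done, None, contents)

    def open_for_write(self, tx):
        """Called before a transaction modifies the object; returns its private copy."""
        current = self.locator
        if current.owner is tx:
            return current.new
        if current.owner.state == ACTIVE:
            current.owner.state = ABORTED    # placeholder contention policy
        valid = current.valid_copy()
        # A real implementation would install the new locator with a CAS.
        self.locator = Locator(tx, valid, copy.deepcopy(valid))
        return self.locator.new

    def read_committed(self):
        return self.locator.valid_copy()
```

Committing is then a single write of `COMMITTED` to the transaction state, which switches every object the transaction opened from its old copy to its new copy at once.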
Our Hybrid Transactional Memory
Two modes: Hardware and Software mode
  The two modes need to coexist
  ─ Non-solution: Make all threads transition modes in lockstep
Avoid versioning overheads (allocation and copying) in the hardware mode
  Still incur the indirection overheads
Tricky because it needs to bridge the hardware and software schemes
  Hardware mode needs to modify data in-place
  ─ Pure Software TM assumes data is never modified in-place
  Different sharing granularity
  ─ Cache-line (Hardware) vs. Object (Software)
  Different conflict detection scheme
  ─ Data (Hardware) vs. State (Software)
Hybrid Scheme Example
[Figure: the Object Pointer, two records (State Pointer, Old, New), each with its own State, and copies of the Object Contents; one link is marked with an X]
In the Software Mode: Copy and Modify
In the Hardware Mode: Modify in place
Thread 1: HW mode
Thread 2: HW mode
Thread 3: SW mode
Conflict detected by the threads in the hardware mode
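The difference between the two modes, and the fallback from one to the other, can be sketched as follows. This is a simplified, single-threaded illustration: it shows only "modify in place" versus "copy and modify" and the switch to software mode when an assumed hardware limit is hit. It does not model conflict detection between threads. The bounded undo log stands in for the transactional cache, and all function names are hypothetical.

```python
import copy


class HardwareLimitExceeded(Exception):
    """Assumed signal that the transaction no longer fits in hardware."""


def run_hardware(obj, updates, capacity):
    """Hardware mode: modify the object in place, no allocation or copying."""
    undo = {}                             # stands in for the bounded TM cache
    try:
        for key, fn in updates:
            if key not in undo:
                if len(undo) >= capacity:
                    raise HardwareLimitExceeded
                undo[key] = obj[key]
            obj[key] = fn(obj[key])
    except HardwareLimitExceeded:
        obj.update(undo)                  # abort: restore the old values
        raise


def run_software(holder, updates):
    """Software mode: copy the object, modify the copy, then switch versions."""
    new = copy.deepcopy(holder["current"])
    for key, fn in updates:
        new[key] = fn(new[key])
    holder["current"] = new               # commit by switching the pointer


def run_hybrid(holder, updates, capacity):
    """Try hardware mode first; fall back to software mode if it does not fit."""
    try:
        run_hardware(holder["current"], updates, capacity)
        return "hardware"
    except HardwareLimitExceeded:
        run_software(holder, updates)
        return "software"
```

Both modes still go through `holder`, the level of indirection, which mirrors the point that hardware mode avoids the versioning overhead but not the indirection overhead.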
Hybrid Scheme Summary
[Figure: the Object Pointer refers to a record (State Pointer, Old, New) with its State and the Object Contents]
Conflict Detection
                                  Active Thread Mode
  Conflicting Thread Mode   Hardware          Software
  Hardware                  Contents          State
  Software                  Object Pointer    State
Sharing Granularity
                                  Active Thread Mode
  Conflicting Thread Mode   Hardware          Software
  Hardware                  Cache line        Object
  Software                  Object            Object
Experimental Framework
Infrastructure
  Cycle-accurate execution-driven multi-core simulator
  Modified GCC
Three microbenchmarks
Two scenarios: Low and High Contention
Compare four synchronization implementations
  Lock
  Pure Hardware Transactional Memory
  Pure Software Transactional Memory
  Hybrid Transactional Memory
Performance
[Figure: Normalized Execution Time (y-axis, 0 to 6) vs. Number of Cores (x-axis: 1, 2, 4, 8, 16, 32, 64) for Lock, TM Pure Hardware, TM Pure Software, and TM Hybrid. Benchmark: Vector-Reduce. Contention: Low.]
Conclusions
Transactional Memory is a promising approach
  Makes parallel programming an easier task
  Easier to achieve parallel speedup
Hybrid Transactional Memory approach works
  Requires only modest hardware support
  Common case: Good performance for most transactions
  Uncommon case: Graceful fallback to software mode when a transaction cannot complete within the hardware bounds
Questions ?
Transactions
A Synchronization Mechanism to coordinate accesses to shared data by concurrent threads (An alternative to locks)
Transaction: A group of operations on shared data
Transaction { A = A - 10; B = B + 10; ... if (error) abort_transaction; }
An API Enhancement:
1. Abort in the middle of a transaction
   o On encountering an error
Transactional Memory (TM)
A transaction satisfies the following properties
1) Atomicity: All-or-nothing
  On Commit: all operations become visible
  On Abort: none of the operations are performed
2) Isolation (Serializable)
  The transactions committed appear to have been performed in some serial order
Additional Properties
3) Optimistic concurrency control
  Necessary for achieving good parallel speedup
4) Non-blocking (Optional)
  Avoid Priority Inversion
  Avoid Convoying
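Optimistic concurrency control can be shown on a single shared value. The sketch below is an assumption-laden illustration rather than any TM implementation: each update reads a snapshot, computes without holding a lock, and commits only if the version number is unchanged, retrying otherwise. The names `VersionedCell` and `atomic_update` are invented, and because the snapshot and commit steps use a short lock, the sketch is not non-blocking.

```python
import threading


class VersionedCell:
    """A shared value with a version number that changes on every commit."""

    def __init__(self, value):
        self._lock = threading.Lock()    # guards only the short snapshot/commit steps
        self.value = value
        self.version = 0

    def snapshot(self):
        with self._lock:
            return self.version, self.value

    def try_commit(self, seen_version, new_value):
        """Commit only if nobody else committed since `seen_version`."""
        with self._lock:
            if self.version != seen_version:
                return False             # data conflict: caller aborts and retries
            self.value = new_value
            self.version += 1
            return True


def atomic_update(cell, fn):
    """Apply `fn` optimistically; abort and restart on conflict. Returns the abort count."""
    aborts = 0
    while True:
        seen_version, seen_value = cell.snapshot()
        new_value = fn(seen_value)       # computed without holding any lock
        if cell.try_commit(seen_version, new_value):
            return aborts
        aborts += 1
```

Every committed update sees the result of the one before it, so the committed updates appear to have run in some serial order even though they were attempted concurrently.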
Advantage 1: Performance
Locks
  Serialized on locks
  Finer granularity locks help
  Burden on programmer
Transactions
  Optimistically execute concurrently
  Abort and restart on data conflict
  Automatically done by runtime
[Figure: operations on A, B, C and D. With locks, the operations are bracketed by lock L1; with transactions, the operations run side by side and a data conflict on A is marked.]
Advantage 2: Reduces Bugs
With locks, programmers need to
  Remember mapping between shared data and locks that guard them
  ─ Make sure the appropriate locks are held while accessing shared data
  Make lock granularity as small as possible
  Avoid deadlocks due to locks
All of these can cause subtle bugs
With TM, programmer does not have to deal with these problems
Other Advantages
Allows new programming paradigms
  Simplifies error handling
  A new style of programming: Speculate and Verify
  Programmer can abort offending transactions
Avoids other problems that locks suffer from
  Priority Inversion: A low-priority thread can grab a lock and block a higher-priority thread
  Convoying: If a thread holding a lock blocks on a high-latency event (like context-switch or I/O), it can cause other threads to wait for long periods
  Fault Tolerant: If a process holding a lock dies, other processes will hang forever
  Runtime system can abort offending transactions
[Figure: Normalized Execution Time (y-axis, 0 to 6) vs. Number of Cores (x-axis: 1, 2, 4, 8, 16, 32, 64) for Lock, TM Pure Hardware, TM Pure Software, and TM Hybrid. Benchmark: Vector-Reduce. Contention: Low.]