Transactional Memory
Companion slides for The Art of Multiprocessor Programming
by Maurice Herlihy & Nir Shavit
Our Vision for the Future
In this course, we covered …
Best practices …
New and clever ideas …
And common-sense observations.
Nevertheless …
Concurrent programming is still too hard …
Here we explore why this is …
And what we can do about it.
Art of Multiprocessor Programming
Locking
Coarse-Grained Locking
Easily made correct …
But not scalable.
Fine-Grained Locking
Can be tricky …
Locks are not Robust
If a thread holding a lock is delayed …
No one else can make progress
Locking Relies on Conventions
• Relation between
  – Lock bit and object bits
  – Exists only in programmer’s mind
/*
 * When a locked buffer is visible to the I/O layer
 * BH_Launder is set. This means before unlocking
 * we must clear BH_Launder,mb() on alpha and then
 * clear BH_Lock, so no reader can see BH_Launder set
 * on an unlocked buffer and then risk to deadlock.
 */
Actual comment from Linux Kernel
(hat tip: Bradley Kuszmaul)
Simple Problems are hard
[Figure: double-ended queue, with enq(x) at one end and enq(y) at the other]
No interference if ends “far apart”
Interference OK if queue is small
Clean solution is publishable result: [Michael & Scott PODC 97]
Locks Not Composable
Transfer item from one queue to another
Must be atomic: no duplicate or missing items
Locks Not Composable
Lock source
Lock target
Unlock source & target
Methods cannot provide internal synchronization
Objects must expose locking protocols to clients
Clients must devise and follow protocols
Abstraction broken!
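The lock, lock, unlock protocol above can be sketched in Java roughly as follows. This is only an illustration: `LockedQueue`, its exposed `lock` field, and `LockTransferDemo.transfer` are hypothetical names, not code from the book. The point it tries to show is that the queue has to hand its lock to the client before two operations can be made atomic together.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical queue that must expose its lock so clients can compose operations.
class LockedQueue<T> {
    final ReentrantLock lock = new ReentrantLock(); // exposed: part of the client protocol
    private final Deque<T> items = new ArrayDeque<>();

    void enq(T x) {
        lock.lock();
        try { items.addLast(x); } finally { lock.unlock(); }
    }

    T deq() { // returns null when the queue is empty
        lock.lock();
        try { return items.pollFirst(); } finally { lock.unlock(); }
    }

    int size() {
        lock.lock();
        try { return items.size(); } finally { lock.unlock(); }
    }
}

class LockTransferDemo {
    // Client-side protocol: lock source, lock target, move one item, unlock both.
    // Nothing orders the two acquisitions, so transfer(a, b) racing with
    // transfer(b, a) can deadlock; every client has to know and follow a convention.
    static <T> void transfer(LockedQueue<T> source, LockedQueue<T> target) {
        source.lock.lock();
        target.lock.lock();
        try {
            T x = source.deq();            // the lock is reentrant, so the inner lock() succeeds
            if (x != null) target.enq(x);
        } finally {
            target.lock.unlock();
            source.lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockedQueue<Integer> q1 = new LockedQueue<>();
        LockedQueue<Integer> q2 = new LockedQueue<>();
        q1.enq(3);
        transfer(q1, q2);
        System.out.println(q1.size() + " " + q2.size());
    }
}
```

No other thread can observe the item in both queues or in neither, but only because `transfer` reaches into the queues' locking internals.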
Monitor Wait and Signal
[Figure: a thread sleeps (zzz) on an empty buffer and wakes when an item arrives]
If buffer is empty, wait for item to show up
Wait and Signal do not Compose
[Figure: two empty buffers and a sleeping thread (zzz…)]
Wait for either?
The Transactional Manifesto
• Current practice inadequate
  – to meet the multicore challenge
• Research Agenda
  – Replace locking with a transactional API
  – Design languages or libraries
  – Implement efficient run-times
Transactions
Block of code …
Atomic: appears to happen instantaneously
Serializable: all appear to happen in one-at-a-time order
Commit: takes effect (atomically)
Abort: has no effect (typically restarted)
Atomic Blocks
atomic {
  x.remove(3);
  y.add(3);
}
atomic {
  y = null;
}
No data race
A Double-Ended Queue
Write sequential code
public void LeftEnq(item x) {
  Qnode q = new Qnode(x);
  q.left = this.left;
  this.left.right = q;
  this.left = q;
}
A Double-Ended Queue
Enclose in atomic block
public void LeftEnq(item x) {
  atomic {
    Qnode q = new Qnode(x);
    q.left = this.left;
    this.left.right = q;
    this.left = q;
  }
}
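Java has no `atomic` block, so the following sketch stands in for it with a single global lock. That gives the same all-or-nothing appearance but none of the concurrency a real transactional memory might offer. `GlobalAtomic`, `AtomicDeque`, the sentinel node, and the `stress` helper are assumptions made for this illustration; the three pointer updates are the ones from the slide.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Stand-in for the slides' `atomic { ... }`: one global reentrant lock.
// This is an assumption for illustration, not how a transactional memory works.
final class GlobalAtomic {
    private static final ReentrantLock LOCK = new ReentrantLock();

    static <R> R run(Supplier<R> body) {
        LOCK.lock();
        try {
            return body.get();
        } finally {
            LOCK.unlock();
        }
    }
}

final class Qnode<T> {
    final T item;
    Qnode<T> left, right;
    Qnode(T item) { this.item = item; }
}

final class AtomicDeque<T> {
    // Sentinel so that this.left is never null; the slide code assumes a non-empty list.
    private Qnode<T> left = new Qnode<>(null);
    private int size = 0;

    public void leftEnq(T x) {
        GlobalAtomic.run(() -> {
            Qnode<T> q = new Qnode<>(x);
            q.left = this.left;      // same three pointer updates as on the slide
            this.left.right = q;
            this.left = q;
            size++;
            return null;
        });
    }

    public T peekLeft() { return GlobalAtomic.run(() -> left.item); }

    public int size() { return GlobalAtomic.run(() -> size); }

    // Hypothetical stress helper: `threads` threads each enqueue `perThread` items.
    static int stress(int threads, int perThread) {
        AtomicDeque<Integer> d = new AtomicDeque<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) d.leftEnq(i);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return d.size();
    }
}
```

Calling `AtomicDeque.stress(4, 1000)` should report every enqueue, since each body runs as one indivisible step.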
Warning
• Not always this simple
  – Conditional waits
  – Enhanced concurrency
  – Complex patterns
• But often it is…
Composition?
public void Transfer(Queue<T> q1, Queue<T> q2) {
  atomic {
    T x = q1.deq();
    q2.enq(x);
  }
}
Trivial or what?
Conditional Waiting
public T LeftDeq() {
  atomic {
    if (this.left == null)
      retry;
    …
  }
}
Roll back transaction and restart when something changes
Composable Conditional Waiting
atomic {
  x = q1.deq();
} orElse {
  x = q2.deq();
}
Run 1st method. If it retries …
Run 2nd method. If it retries …
Entire statement retries
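A rough Java approximation of `retry` and `orElse`, under strong assumptions: one global monitor stands in for the transactional memory, `retry` is modelled as an exception, and a body is assumed to signal retry before it modifies anything, because this sketch has no rollback. `RetrySignal`, `OrElseSketch`, and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Thrown by a body to say "I cannot proceed yet" (the slides' `retry`).
final class RetrySignal extends RuntimeException {
    RetrySignal() { super("retry", null, false, false); } // no stack trace needed
}

final class OrElseSketch {
    private static final Object GLOBAL = new Object(); // one global monitor stands in for the TM

    // atomic { body }: run under the global monitor, then wake waiting transactions.
    static void atomic(Runnable body) {
        synchronized (GLOBAL) {
            body.run();
            GLOBAL.notifyAll();
        }
    }

    // atomic { first } orElse { second }: try first; if it retries, try second;
    // if that retries too, the whole statement waits for a change and starts over.
    static <R> R atomicOrElse(Supplier<R> first, Supplier<R> second) {
        synchronized (GLOBAL) {
            while (true) {
                try {
                    return first.get();
                } catch (RetrySignal r1) {
                    try {
                        return second.get();
                    } catch (RetrySignal r2) {
                        try {
                            GLOBAL.wait(); // released while waiting; woken by atomic()
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            throw new IllegalStateException("interrupted while waiting", e);
                        }
                    }
                }
            }
        }
    }

    // Dequeue that retries on empty, in the spirit of LeftDeq above.
    static <T> T deq(Deque<T> q) {
        if (q.isEmpty()) throw new RetrySignal();
        return q.pollFirst();
    }

    public static void main(String[] args) {
        Deque<Integer> q1 = new ArrayDeque<>();
        Deque<Integer> q2 = new ArrayDeque<>();
        atomic(() -> q2.addLast(7));
        Integer x = atomicOrElse(() -> deq(q1), () -> deq(q2));
        System.out.println(x);
    }
}
```

With both queues empty the call blocks until some other thread runs `atomic(...)`, which is the "wait for either" behaviour that monitors could not express.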
Hardware Transactional Memory
• Exploit cache coherence
• Already almost does it
  – Invalidation
  – Consistency checking
• Speculative execution
  – Branch prediction = optimistic synch!
HW Transactional Memory
[Figure: caches and memory joined by an interconnect; labels: read, active, T]
Transactional Memory
[Figure: caches and memory; labels: read, active, active, T, T]
Transactional Memory
[Figure: caches and memory; labels: active, committed, T, T]
Transactional Memory
[Figure: caches and memory; labels: write, active, committed, T, D]
Rewind
[Figure: caches and memory; labels: write, active, aborted, T, T, D]
Transaction Commit
• At commit point
  – If no cache conflicts, we win.
• Mark transactional entries
  – Read-only: valid
  – Modified: dirty (eventually written back)
• That’s all, folks!
  – Except for a few details …
Not all Skittles and Beer
• Limits to
  – Transactional cache size
  – Scheduling quantum
• Transaction cannot commit if it is
  – Too big
  – Too slow
  – Actual limits platform-dependent
HTM Strengths & Weaknesses
• Ideal for lock-free data structures
• Practical proposals have limits on
  – Transaction size and length
  – Bounded HW resources
  – Guarantees vs best-effort
• On fail
  – Diagnostics essential
  – Retry in software?
Composition
Locks don’t compose, transactions do.
Composition necessary for Software Engineering.
But practical HTM doesn’t really support composition!
Why we need STM
Simple Lock-Based STM
• STMs come in different forms
  – Lock-based
  – Lock-free
• Here: a simple lock-based STM
Synchronization
• Transaction keeps
  – Read set: locations & values read
  – Write set: locations & values to be written
• Deferred update
  – Changes installed at commit
• Lazy conflict detection
  – Conflicts detected at commit
STM: Transactional Locking
[Figure: application memory mapped to an array of version #s (V#) & locks]
Reading an Object
[Figure: memory (Mem) beside its array of locks and version numbers (V#)]
Add version numbers & values to read set
To Write an Object
[Figure: memory (Mem) beside its array of locks and version numbers (V#)]
Add version numbers & new values to write set
To Commit
[Figure: memory (Mem) beside its array of locks and version numbers (V#); written locations X and Y move to V#+1]
Acquire write locks
Check version numbers unchanged
Install new values
Increment version numbers
Unlock.
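The read, write, and commit steps above might look like this in Java. It is a minimal sketch under assumptions, not the book's implementation: `VersionedCell` and `SimpleTx` are hypothetical names, each location carries its own lock and version number instead of sharing a mapped lock array, the write set records only new values, and commit gives up on a busy lock rather than waiting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// One memory location with its own lock and version number.
final class VersionedCell {
    final ReentrantLock lock = new ReentrantLock();
    volatile int value;
    volatile long version;
    VersionedCell(int value) { this.value = value; }
}

final class SimpleTx {
    private final Map<VersionedCell, Long> readSet = new LinkedHashMap<>();     // location -> version seen
    private final Map<VersionedCell, Integer> writeSet = new LinkedHashMap<>(); // location -> new value

    int read(VersionedCell c) {
        Integer pending = writeSet.get(c);
        if (pending != null) return pending;   // read our own deferred write
        long seen = c.version;                 // version first, then value
        int v = c.value;
        readSet.putIfAbsent(c, seen);
        return v;
    }

    void write(VersionedCell c, int v) { writeSet.put(c, v); } // deferred update

    boolean commit() {
        List<VersionedCell> held = new ArrayList<>();
        try {
            // 1. Acquire write locks (abort instead of waiting, to avoid deadlock).
            for (VersionedCell c : writeSet.keySet()) {
                if (!c.lock.tryLock()) return false;
                held.add(c);
            }
            // 2. Check that version numbers in the read set are unchanged.
            for (Map.Entry<VersionedCell, Long> e : readSet.entrySet()) {
                VersionedCell c = e.getKey();
                boolean lockedByOther = c.lock.isLocked() && !c.lock.isHeldByCurrentThread();
                if (lockedByOther || c.version != e.getValue()) return false;
            }
            // 3. Install new values and increment version numbers.
            for (Map.Entry<VersionedCell, Integer> e : writeSet.entrySet()) {
                VersionedCell c = e.getKey();
                c.value = e.getValue();
                c.version++;                   // safe: we hold c.lock
            }
            return true;
        } finally {
            // 4. Unlock.
            for (VersionedCell c : held) c.lock.unlock();
        }
    }
}
```

Conflicts are only noticed at commit, so a doomed transaction keeps running on whatever it has read so far.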
Problem: Internal Inconsistency
• A Zombie is an active transaction destined to abort.
• If Zombies see inconsistent states bad things can happen
Internal Consistency
[Figure: x changes from 4 to 8, y changes from 2 to 4]
Invariant: x = 2y
Transaction A: reads x = 4
Transaction B: writes 8 to x, 4 to y, aborts A
Transaction A (zombie): reads y = 4, computes 1/(x-y)
Divide by zero FAIL!
Solution: The Global Clock
• Have one shared global clock
• Incremented by (small subset of) writing transactions
• Read by all transactions
• Used to validate that state worked on is always consistent
Read-Only Transactions
[Figure: memory (Mem) beside locks with version numbers 12, 32, 56, 19, 17; shared version clock = 100; private read version (RV) = 100]
Copy version clock to local read version clock
Read lock, version #, and memory
On commit: check unlocked & version # unchanged
Check that version #s less than local read clock
We have taken a snapshot without keeping an explicit read set!
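A small Java sketch of the read-only path, under stated assumptions: `SnapCell`, `ReadOnlyTx`, and `SnapshotAbort` are hypothetical names, the lock is reduced to a boolean bit, and validation is done at each read, which is one way to get a snapshot without a read set. The slides write the version check both as "<" and as "≤"; the sketch accepts a version equal to RV.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SnapshotAbort extends RuntimeException {
    SnapshotAbort(String why) { super(why); }
}

// One memory word with its lock bit and version number, as in the figure.
final class SnapCell {
    volatile int value;
    volatile long version;
    volatile boolean locked;
    SnapCell(int value, long version) { this.value = value; this.version = version; }
}

final class ReadOnlyTx {
    static final AtomicLong VERSION_CLOCK = new AtomicLong(100); // 100 as in the figure

    // Copy the shared version clock into the private read version (RV).
    private final long rv = VERSION_CLOCK.get();

    int read(SnapCell c) {
        long before = c.version;          // read lock bit and version # ...
        boolean lockedBefore = c.locked;
        int v = c.value;                  // ... then the memory word
        // Re-check: still unlocked and version # unchanged.
        if (lockedBefore || c.locked || c.version != before) {
            throw new SnapshotAbort("location changed during read");
        }
        // Version must not be newer than RV (assumption: equality is accepted).
        if (before > rv) {
            throw new SnapshotAbort("location written after this transaction started");
        }
        return v;
    }
}
```

Every value returned by `read` belongs to the state as of RV, so no list of past reads is needed to validate later.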
Regular Transactions
[Figure: memory (Mem) beside locks with version numbers 12, 32, 56, 19, 17; shared version clock = 100; private read version (RV) = 100]
Copy version clock to local read version clock
On read/write, check: unlocked & version # < RV
Add to R/W set
On Commit
[Figure: memory (Mem) beside locks with version numbers 12, 32, 56, 19, 17; shared version clock goes from 100 to 101; private read version (RV) = 100; locations x and y are written]
Acquire write locks
Increment version clock
Check version numbers ≤ RV
Update memory
Update write version #s
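The five commit steps above, sketched in Java under assumptions: `ClockedCell`, `ClockedTx`, and `TxAbort` are hypothetical names, the lock is an `AtomicBoolean`, and the read check follows the "≤ RV" form from this slide. A busy lock leads to an abort rather than a wait.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

final class TxAbort extends RuntimeException {
    TxAbort(String why) { super(why); }
}

final class ClockedCell {
    final AtomicBoolean lock = new AtomicBoolean(false);
    volatile int value;
    volatile long version;
    ClockedCell(int value, long version) { this.value = value; this.version = version; }
}

final class ClockedTx {
    static final AtomicLong CLOCK = new AtomicLong(100);  // shared version clock
    private final long rv = CLOCK.get();                  // private read version (RV)
    private final Set<ClockedCell> readSet = new LinkedHashSet<>();
    private final Map<ClockedCell, Integer> writeSet = new LinkedHashMap<>();

    int read(ClockedCell c) {
        Integer pending = writeSet.get(c);
        if (pending != null) return pending;
        int v = c.value;
        // On read: location must be unlocked and no newer than RV.
        if (c.lock.get() || c.version > rv) throw new TxAbort("inconsistent read");
        readSet.add(c);
        return v;
    }

    void write(ClockedCell c, int v) { writeSet.put(c, v); }

    boolean commit() {
        List<ClockedCell> held = new ArrayList<>();
        try {
            for (ClockedCell c : writeSet.keySet()) {          // 1. acquire write locks
                if (!c.lock.compareAndSet(false, true)) return false;
                held.add(c);
            }
            long wv = CLOCK.incrementAndGet();                 // 2. increment version clock
            for (ClockedCell c : readSet) {                    // 3. check version numbers <= RV
                boolean lockedByOther = c.lock.get() && !writeSet.containsKey(c);
                if (lockedByOther || c.version > rv) return false;
            }
            for (Map.Entry<ClockedCell, Integer> e : writeSet.entrySet()) {
                e.getKey().value = e.getValue();               // 4. update memory
                e.getKey().version = wv;                       // 5. update write version #s
            }
            return true;
        } finally {
            for (ClockedCell c : held) c.lock.set(false);      // release locks
        }
    }
}
```

In the x = 2y example, transaction A (RV = 100) would abort on reading y once B has committed with a newer write version, so the zombie never gets to compute 1/(x-y).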
TM Design Issues
• Implementation choices
• Language design issues
• Semantic issues
Granularity
• Object
  – Managed languages, Java, C#, …
  – Easy to control interactions between transactional & non-trans threads
• Word
  – C, C++, …
  – Hard to control interactions between transactional & non-trans threads
Direct/Deferred Update
• Deferred
  – Modify private copies & install on commit
  – Commit requires work
  – Consistency easier
• Direct
  – Modify in place, roll back on abort
  – Makes commit efficient
  – Consistency harder
Conflict Detection
• Eager
  – Detect before conflict arises
  – “Contention manager” module resolves
• Lazy
  – Detect on commit/abort
• Mixed
  – Eager write/write, lazy read/write …
Conflict Detection
• Eager detection may abort transactions that could have committed.
• Lazy detection discards more computation.
Contention Management & Scheduling
• How to resolve conflicts?
• Who moves forward and who rolls back?
• Lots of empirical work but formal work in infancy
Contention Manager Strategies
• Exponential backoff
• Priority to
  – Oldest?
  – Most work?
  – Non-waiting?
• None dominates
• But needed anyway
Judgment of Solomon
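Of the strategies listed, exponential backoff is the simplest to sketch. The class below is an illustration only: `BackoffManager` and its method names are hypothetical, and the randomised delay with a doubling cap is one common way to implement backoff, not necessarily the one the slides have in mind.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.LockSupport;

// Hypothetical contention manager: after each conflict, wait a random time
// below a limit that doubles (up to a cap); reset the limit after a commit.
final class BackoffManager {
    private final long minNanos;
    private final long maxNanos;
    private long limit;

    BackoffManager(long minNanos, long maxNanos) {
        if (minNanos <= 0 || maxNanos < minNanos) {
            throw new IllegalArgumentException("need 0 < min <= max");
        }
        this.minNanos = minNanos;
        this.maxNanos = maxNanos;
        this.limit = minNanos;
    }

    // Pick a delay in [1, limit], then double the limit for the next conflict.
    long nextDelayNanos() {
        long delay = ThreadLocalRandom.current().nextLong(limit) + 1;
        limit = Math.min(maxNanos, limit * 2);
        return delay;
    }

    // Called by the losing transaction before it restarts.
    void onConflict() { LockSupport.parkNanos(nextDelayNanos()); }

    // Called after a successful commit.
    void onCommit() { limit = minNanos; }

    long currentLimit() { return limit; }
}
```

A transaction loop would call `onConflict()` each time it aborts and `onCommit()` once it succeeds; the randomness keeps two conflicting transactions from colliding again in lockstep.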
I/O & System Calls?
• Some I/O revocable
  – Provide transaction-safe libraries
  – Undoable file system/DB calls
• Some not
  – Opening cash drawer
  – Firing missile
I/O & System Calls
• One solution: make transaction irrevocable
  – If transaction tries I/O, switch to irrevocable mode.
• There can be only one …
  – Requires serial execution
• No explicit aborts
  – In irrevocable transactions
Exceptions
int i = 0;
try {
  atomic {
    i++;
    node = new Node();
  }
} catch (Exception e) {
  print(i);
}
Throws OutOfMemoryException!
What is printed?
Unhandled Exceptions
• Aborts transaction
  – Preserves invariants
  – Safer
• Commits transaction
  – Like locking semantics
  – What if exception object refers to values modified in transaction?
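The two choices give different answers to "What is printed?". The Java sketch below imitates them without a real transactional memory: abort semantics is modelled by working on a private copy that is dropped on failure, commit semantics by updating in place. `ExceptionSemantics` and its methods are hypothetical, and an `IllegalStateException` stands in for the allocation failure.

```java
final class ExceptionSemantics {
    // Body of the atomic block: increment i, then (optionally) fail like `new Node()`.
    private static void body(int[] i, boolean fail) {
        i[0]++;
        if (fail) throw new IllegalStateException("stand-in for the allocation failure");
    }

    // Exception aborts the transaction: the private copy is discarded.
    static int runAborting(int i, boolean fail) {
        int[] shadow = { i };
        try {
            body(shadow, fail);
            return shadow[0];      // commit: publish the private copy
        } catch (RuntimeException e) {
            return i;              // abort: i is unchanged
        }
    }

    // Exception commits the transaction: effects so far stay, as with locks.
    static int runCommitting(int i, boolean fail) {
        int[] state = { i };
        try {
            body(state, fail);
        } catch (RuntimeException e) {
            // nothing is rolled back
        }
        return state[0];
    }

    public static void main(String[] args) {
        System.out.println(runAborting(0, true));
        System.out.println(runCommitting(0, true));
    }
}
```

Under abort semantics the handler sees `i == 0`; under commit semantics it sees `i == 1`.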
Nested Transactions
atomic void foo() {
  bar();
}
atomic void bar() {
  …
}
Nested Transactions
• Needed for modularity
  – Who knew that cosine() contained a transaction?
• Flat nesting
  – If child aborts, so does parent
• First-class nesting
  – If child aborts, partial rollback of child only
Remember 1993?
Citation Count
TM Today
93,300
2,210,000
Second Opinion
Hatin’ on TM
STM is too inefficient
Hatin’ on TM
Requires radical change in programming style
Hatin’ on TM
Erlang-style shared nothing only true path to salvation
Hatin’ on TM
There is nothing wrong with what we do today.
Gartner Hype Cycle
Hat tip: Jeremy Kemp
I, for one, Welcome our new Multicore Overlords …
• Multicore forces us to rethink almost everything
• Standard approaches too complex
• Standard approaches won’t scale
• Transactions might make life simpler…
• Multicore programming
Plenty more to do…
Maybe it will be you…
Thanks ! תודה