How Good can a Transactional Memory be? R. Guerraoui, EPFL.

How Good can a Transactional Memory be?

R. Guerraoui , EPFL

How Good can a Transactional Memory

be?

From the New York TimesSan Francisco, May 7, 2004Intel announces a drastic change in its business strategy:« Multicore is THE way to boost performance »

Multicores Multicores areare almost almost everywhereeverywhere

Dual-core commonplace in laptopsQuad-core in desktopsDual quad-core in serversAll major chip manufacturers produce multicore CPUs

SUN Niagara (8 cores, 64 concurrent threads)Intel Xeon (6 cores)AMD Opteron (4 cores)…

The free ride is over

The free ride is over

Every one will need to fork threads

Forking threads is easy

Handling their conflicts is hard

Traditional scalingTraditional scaling

1x2x

4x

Time: Moore’s Law

Speedup

User code

Traditional CPU

Speedup

1x2x

4x

User code

Multicore CPU

Time: Moore’s Law

Ideal multicore scalingIdeal multicore scaling

Real multicore scalingReal multicore scaling

Speedup

1x1.4x

2.2x

User code

Multicore CPU

Time: Moore’s Law

Parallelization & synchronization require

great care!


great care!

Coarse grained locks => slow

Fine grained locks => errors

Double-ended queue

Enqueue Dequeue

A synchronization abstraction that is:

Simple to use Efficient to implement

Wanted

Transactions

accessing object 1; accessing object 2;

Back to the undergraduate level

atomic {

}

Historical perspective

Eswaran et al (CACM’76) Database Papadimitriou (JACM’79) Theory Liskov/Sheifler (TOPLAS’82) Language Knight (ICFP’86) Architecture Herlihy/Moss (ISCA’93) Hardware Shavit/Touitou (PODC’95) Software

Simple example

(consistency invariant)

0 < x < y

T: x := x+1 ; y:= y+1

Simple example

(transaction)

Two-phase locking

To write O, T requires a write-lock on O;

To read O, T requires a read-lock on O;

Before committing, T releases all its locks

T waits if some T’ acquired a lock on O

T waits if some T’ acquired a write-lock on O

Why two phases?

To write O, T requires a write-lock on O; T waits if some T’ acquired a lock on O

To read O, T requires a read-lock on O; T waits if some T’ acquired a write-lock on O

T releases the lock on O when done with O

Why two phases?

T1

T2

read(0) write(1)

O1 O2

read(0) write(1)

O2 O1

Two-phase locking - better dead than wait

-



Before committing, T releases all its locks

T aborts if some T’ acquired a lock on O

A transaction that aborts restarts again

T aborts if some T’ acquired a lock on O

Two-phase locking - better kill than wait -



Before committing, T releases all its lock

T aborts T’ if T’ acquired a lock on O

T aborts T’ if T’ acquired a write-lock on O

A transaction that aborts restarts again

Two-phase locking- invisible reads -

To write O, T requires a write-lock on O; T aborts T’ if some T’ acquired a write-lock

on OTo read O, T checks if all objects read remain valid - else T aborts

Before committing, T checks if all objects read remain valid and releases all its locks

“It is better for Intel to get involved in this [Transactional Memory] now so when we get to the point of having …tons… of cores we will have the answers”

Justin Rattner, Intel Chief Technology Officer

“…we need to explore new techniques like transactional memory that will allow us to get the full benefit of all those transistors and map that into higher and higher performance.”

Bill Gates, President of a transparently operated private foundation

“…manual synchronization is intractable…transactions are the only plausible solution….”

Tim Sweeney, Epic Games

Speedup

1x2x

4x

User code

Multicore CPU

Time: Moore’s Law

Multicore scalingMulticore scaling

Micro-benchmarks …

Linked-lists; red-black trees, etc.Consider specific loads: typically read-only transactions

Challenging TMsSTMBench7 (Eurosys’07)

Large data structure: challenge memory overhead

Long operations: kills non-linear algorithms

Complex access patterns: stretches flexibility

STMBench7:(Transact’08) Transaction Terminator

All TMs collapsed because of internal bugs or memory usage (except X)

Certain urban legends about performance revealed inaccurate; e.g., OF is inherently slower than lock-based TMs

Real-world scalingReal-world scaling

Speedup

1x1.4x

2.2x

User code

Multicore CPU

Time: Moore’s Law


great care!


great care!

Software Transactional Memory:Why is it only a Research Toy?

C. Cascaval, C. Blundell, M. Michael,H. Cain, P. Wu, S. Chiras, S. Chatterjee

Why STM can be more thana Research Toy?

A. Dragojević, P. Felber, V. Gramoli, R. Guerraoui

How Good can a Transactional Memory

be?

How much commit can a TM perform?

A closer look at TM

What really prevents commitment? (Safety)

How much commitment is practically possible? (Progress)

To write an object O, a transaction acquires O and aborts “the” transaction that owns O

To read an object, a transaction T takes a snapshot to see if the system hasn’t changed since T’s last reads; else T is aborted


Killer write (ownership)

Careful read (validation)


More efficient algorithm

Apologizing versus asking permission

Killer writeLazy read: validity check at commit time

A history is atomic if itsrestriction to committed transactions is serializable

The old safety (Pap79)

A history H of committed transactions is serializable if there is a history S(H) that is (1) equivalent to H(2) sequential (3) legal

Back to the example

Invariant: 0 < x < yInitially: x := 1; y := 2

Division by zero

T1: x := x+1 ; y:= y+1

T2: z := 1 / (y - x)

T1: x := 3; y:= 6

Infinite loop

T2: a := y; b:= x; repeat b:= b + 1 until a = b

We need to restrict ALL transactions (as with critical sections)

The old safety property restricts only committed transactions

A history H is opaque if for every transaction T in H, at least one history in committed(T,H) is serializable

Candidate property: Opacity(PPoPP’08)

Careful read (validation)

Killer write (ownership)


Visible vs Invisible Read (SXM; RSTM)

Write is mega killer: to write an object, a transaction aborts any live one which has read or written the object

Visible but not so careful read: when a transaction reads an object, it says so

TheoremReads are eithervisible or careful

NB. Assuming minimal progress and single versions

Intuition of the proof

T1

T2

read()

write()commit

I1,I2,..,Im

O1,O2,..,Onread()

Ik

The theorem does not hold for classical atomicity

i.e., the theorem does not hold for database transactions

A closer look at TM

What really prevents commitment? (Opacity)

How to prove opacity?

Model checking

Check that the conflict graph is acyclic

Number of nodes is unbounded (STM) NP-Complete problem

Model checking opacity?

Hint: reduce the verification space

Uniform system All transactions are treated equallyTransaction and variable names do not

matter

Atomic system Read, write and commit operations are

atomic

TM verification theorem (PLDI’08)

A TM either violates opacity with 2 transactions and 3 variables or satisfies it with any number of variables and transactions

Examples

It takes 15mn to check the correctness of TL2 and DSTM

Reverse two lines in TL2: bug found in 10mn

A closer look at TM

What really prevents commitment? Opacity

How much commitment is possible (given opacity)?

Ideal progress

No correct transaction aborts

T1

T2

read()

write()

commit

O1

O1write()

O2

Aborting is a fatality

read()

O2

abort

Eventual progress

Every correct transaction eventually commits

NB. We allow the possibility for a transaction to abort a finite number of times as long as it eventually commits

Eventual progress

T1

T2

read()

write()

commit

O1

O1write()

O2

read()

O2

abort

Eventual progress is impossible

NB. This impossibility is fundamentally different from FLP: It holds no matter what underlying object is used

A closer look at TM


How much commitment is possible (given opacity)?

Conditional progress - obstruction-freedom -

A correct transaction that does not encounter contention eventually commits

DSTM

To write O, T requires a write-lock on O (use C&S);

T aborts T’ if some T’ acquired a write-lock on O (use C&S)

To read O, T checks if all objects read remain valid - else abort (use C&S)

Before committing, T releases all its locks (use C&S)

DSTM uses C&S

C&S is the strongest synchronization primitive

Is OF-TM possible with less than C&S?

e.g., R/W objects

The consensus number of OF-TM is 2 (SPAA’08)

OF-TM cannot be implemented with R/W objects only

OF-TM does not need C&S

A closer look at TM


How much commitment is practically possible? conditional

More?

Permissiveness (DISC’08)

A TM is permissive if it never aborts when it should not

More precisely

Let P be any safety property and H any P-safe history prefix of a deterministic TM

We say that a TM is permissive w.r.t P if – Whenever <H;commit> satisfies P– <H;commit> can be generated by the TM

Permissive TM

• But….

• Very expensive (maintain global conflict graphs)

• Let P be any safety property and H any history H generated by a TM

• The TM is probabilistic permissive with respect to P if – Whenever <H;commit> satisfies P:– <H;commit> can be generated by

TM with a positive probability

Probabilistic permissiveness

• AVSTM: a probabilistically permissive TM w.r.t opacity

• Idea: at commit, a transaction guesses a serialization time in the future

AVSTM

• Non-blocking

• Good performance under high-contention

• Can be combined with an OF-TM

AVSTM

A closer look at TM

• What really prevents commitment? Opacity

• How much commitment is practically possible? conditional; permissiveness

• How fast?

Off-line scheduler (PODC’95)

• Compare the TM protocol with an off-line scheduler that knows:

– The starting time of transactions– Which objects are accessed (i.e., conflicts)

Greedy contention manager

• State• Priority (based on start time)• Waiting flag (set while waiting)

• Wait if other has• Higher priority AND not waiting

• Abort other if• Lower priority OR waiting

Example of competitive ratio

• Let s be the number of objects accessed by all transactions

• Compare time to commit all transactions

• Greedy is O(s)-competitive with the off-line scheduler– GHP’05 O(s2)– AEST’06 O(s)

Joint work with

• A. Dragojevic• T. Henzinger• M. Herlihy • M. Kapalka• V. Sing

• See Transactions@epfl: lpdwww.epfl.ch

Transactions are conquering the parallel programming world

They look simple and familiar and might make the programmer

happy

They are in fact very tricky and should make US happy

The one slide to remember

How Good can a Transactional Memory be? R. Guerraoui, EPFL.

Documents

Transcript of How Good can a Transactional Memory be? R. Guerraoui, EPFL.