Read-Write Lock Allocation in Software Transactional Memory Amir Ghanbari Bavarsad and Ehsan...

Post on 31-Mar-2015


Read-Write Lock Allocation in Software Transactional Memory

Amir Ghanbari Bavarsad and Ehsan Atoofian

Lakehead University

[Figure: processors P1 … Pn, each with a cache ($), and a shared Global Clock]

Transactional Memory

Software transactional memory (STM) exploits a global clock to validate transactional data

    Pros: reduces validation overhead
    Cons: contention

Alternative: Read Write Lock Allocation (RWLA)

    Pros: no central clock
    Cons: overhead if a TX aborts

Speculative RWLA: changes validation policy dynamically → Speedup: up to 66%


Outline

Background

RWLA

Speculative RWLA

Conclusion


Counter in STM

T1:

    TM_BEGIN();
    local_counter = TM_READ(counter);
    local_counter++;
    TM_WRITE(counter, local_counter);
    TM_END();

Transactional data are validated using:

Global clock: a shared variable that provides timestamps for transactions

Lock: memory is mapped to a lock table, and each entry of the table holds a version #
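The mapping from memory to lock-table entries can be pictured with a small sketch. The table size, the hash, and the names `lock_table` and `lock_for` are assumptions made for illustration; the slides do not say how TL2 or the authors map addresses to entries.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define LOCK_TABLE_SIZE 1024u   /* assumed size, chosen as a power of two */

static_assert((LOCK_TABLE_SIZE & (LOCK_TABLE_SIZE - 1u)) == 0,
              "the mask below only works for power-of-two sizes");

/* One entry guards a stripe of memory; here it holds only a version #. */
typedef struct {
    atomic_uint_fast64_t version;
} lock_entry_t;

static lock_entry_t lock_table[LOCK_TABLE_SIZE];

/* Hypothetical hash: drop the low 3 address bits so that every 8-byte
   word maps to one entry, then wrap around the table with a mask. */
static lock_entry_t *lock_for(const void *addr)
{
    uintptr_t word = (uintptr_t)addr >> 3;
    return &lock_table[word & (LOCK_TABLE_SIZE - 1u)];
}
```

Because the table is finite, unrelated addresses can share an entry; that trades some false conflicts for a fixed memory footprint.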


Validation in STM

[Figure: Global Clock, and Memory mapped to a Lock Table whose entries each hold a Version #]


Updating Global Clock & Lock

Increment global clock

Version # = global_clock

[Figure: Global Clock; the Memory location counter maps to a Lock Table entry holding its Version #]
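The two update steps can be sketched as follows. This is a simplified illustration that advances the clock with a plain atomic increment; the function name `publish_write` and the types are hypothetical, and the sketch does not try to reproduce the exact clock-advance rule of any particular TL2 clock variant.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

static atomic_uint_fast64_t global_clock;   /* shared by all threads */

typedef struct {
    atomic_uint_fast64_t version;   /* version # of the guarded memory */
} lock_entry_t;

/* Commit-time update for one written location: advance the clock,
   then stamp the location's lock entry with the new clock value. */
static uint_fast64_t publish_write(lock_entry_t *entry)
{
    assert(entry != NULL);
    /* atomic_fetch_add returns the old value, so add 1 for the new one. */
    uint_fast64_t wv = atomic_fetch_add(&global_clock, 1) + 1;
    atomic_store(&entry->version, wv);   /* version # = global_clock */
    return wv;
}
```

Every committing writer touches `global_clock`, which is the shared cache line that the later slides identify as a source of coherence misses.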


Validation in STM

rv (read version) is set to global_clock

[Figure: T1 runs the counter transaction; the Global Clock value is copied into rv in the metadata for TX1]


Successful Read Validation

rv >= version #

The most recent write to counter occurred before TM_BEGIN()

[Figure: T1's counter transaction alongside the metadata for TX1 (rv) and the Global Clock]


Failed Read Validation

rv < version #

The most recent write to counter occurred after TM_BEGIN()

[Figure: T1's counter transaction alongside the metadata for TX1 (rv) and the Global Clock]
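The two validation outcomes reduce to one comparison between rv and the version #. A minimal sketch, assuming hypothetical names `tx_t`, `tx_begin`, and `read_is_valid` that do not come from the slides or from TL2:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static atomic_uint_fast64_t global_clock;

typedef struct { atomic_uint_fast64_t version; } lock_entry_t;
typedef struct { uint_fast64_t rv; } tx_t;   /* per-transaction metadata */

/* TM_BEGIN: snapshot the global clock into rv (read version). */
static void tx_begin(tx_t *tx)
{
    assert(tx != NULL);
    tx->rv = atomic_load(&global_clock);
}

/* Read validation: succeeds when rv >= version #, meaning the most
   recent write to the location happened before TM_BEGIN. */
static bool read_is_valid(const tx_t *tx, lock_entry_t *entry)
{
    return tx->rv >= atomic_load(&entry->version);
}
```

A transaction whose read fails this check would abort and retry with a fresh rv.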

Overhead of Validation

This method, called GV4, results in many cache coherence misses if transactions commit frequently

[Figure: processors P1 … Pn, each with a cache ($), and a shared Global Clock]

Outline

Background

RWLA

Speculative RWLA

Conclusion


Read Write Lock Allocation (RWLA)

Lock: memory is mapped to a lock table, and each entry of the table holds:

    Lock bit
    Read bits

[Figure: Memory mapped to a Lock Table; each entry has a lock bit followed by read bits labeled P0, P1, …, Pn-1]
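One way to picture such an entry is a single machine word with the lock bit at the top and one read bit per processor below it. The 64-bit width, the 63-processor limit, and all names here are assumptions for illustration; the slides only show a lock bit followed by read bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PROCS 63u                    /* assumed: 1 lock bit + 63 read bits */
#define LOCK_BIT  (UINT64_C(1) << 63)    /* assumed position of the lock bit */

typedef uint64_t rwla_entry_t;

/* Mask selecting the read bit that belongs to processor `proc`. */
static uint64_t read_bit(unsigned proc)
{
    assert(proc < MAX_PROCS);
    return UINT64_C(1) << proc;
}

/* True when a writer holds the entry. */
static bool is_locked(rwla_entry_t e)   { return (e & LOCK_BIT) != 0; }

/* True when at least one processor has its read bit set. */
static bool has_readers(rwla_entry_t e) { return (e & ~LOCK_BIT) != 0; }
```

Packing everything into one word lets the lock bit and the read bits be checked and changed together with a single atomic operation.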

TM_READ

TM_READ(): is the lock bit free?

    Yes → set the read bit in the corresponding lock entry
    No → abort

[Figure: lock entry states across the steps: 000000 … (initial), 000000 … 1 (a read bit set after a successful read), 100000 … (lock bit set, so the read aborts)]
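The TM_READ decision can be sketched with a compare-and-swap loop so that checking the lock bit and setting the read bit happen as one atomic step. The entry layout (lock bit in bit 63, one read bit per processor) and the name `rwla_read` are assumptions, not the authors' implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_BIT (UINT64_C(1) << 63)     /* assumed position of the lock bit */

typedef _Atomic uint64_t rwla_entry_t;

/* Returns true if the read may proceed, false if the TX must abort. */
static bool rwla_read(rwla_entry_t *entry, unsigned proc)
{
    assert(proc < 63);
    uint64_t old = atomic_load(entry);
    for (;;) {
        if (old & LOCK_BIT)
            return false;                /* lock bit is taken: abort */
        /* Lock bit is free: try to publish our read bit atomically. */
        if (atomic_compare_exchange_weak(entry, &old,
                                         old | (UINT64_C(1) << proc)))
            return true;
        /* CAS failed: `old` now holds the fresh value, so re-check it. */
    }
}
```

Note that no global clock is consulted on this path; the only shared state touched is the lock entry for the location being read.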

TM_WRITE

TM_WRITE: are all read bits clear?

    No → abort
    Yes → acquire lock; if acquiring fails → abort

[Figure: lock entry states across the steps: 000100 … (a read bit is set, so the write aborts), 100000 … (lock bit already set, so acquiring the lock fails), and 10 00000 … on the final step]
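The TM_WRITE checks can be sketched in the same style. Two assumptions are made loudly here because the slides do not settle them: the entry layout (lock bit in bit 63, one read bit per processor), and the choice to ignore the writer's own read bit, since the counter example reads a location before writing it. The name `rwla_write_acquire` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_BIT (UINT64_C(1) << 63)     /* assumed position of the lock bit */

typedef _Atomic uint64_t rwla_entry_t;

/* Returns true if the lock was acquired, false if the TX must abort. */
static bool rwla_write_acquire(rwla_entry_t *entry, unsigned proc)
{
    assert(proc < 63);
    uint64_t own = UINT64_C(1) << proc;
    uint64_t old = atomic_load(entry);

    /* A read bit of another processor is set: abort.
       ASSUMPTION: the writer's own read bit is not a conflict. */
    if (old & ~LOCK_BIT & ~own)
        return false;

    /* Lock bit already set: acquiring the lock fails, so abort. */
    if (old & LOCK_BIT)
        return false;

    /* One CAS attempt; a concurrent change counts as a failed acquire. */
    return atomic_compare_exchange_strong(entry, &old, old | LOCK_BIT);
}
```

A failed acquire is treated as an abort rather than a retry, matching the "Acquire lock failed → Abort" edge in the flowchart.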

Experimental Framework

Benchmarks: STAMP v0.9.7

    Run to completion
    Measured statistics over 10 runs

TL2 as the STM framework

Two Intel Xeon E5660, 6-way CMP


Performance of RWLA

[Figure: bar chart for Bayes, Kmeans, Labyrinth, Ssca2, Vacation, and Genome with series 2, 4, 8, 16, and AVG.; y-axis from 0 to 1.6; arrow labeled "better"]

Speculative RWLA

Conflict occurs frequently → select GV4

Conflict occurs rarely → select RWLA

How to predict conflict?

Contention Predictor

Prediction: y ≥ 0 → predict commit; y < 0 → predict abort

Update: if the outcomes of the current TX and TXi agree/disagree → increment/decrement wi

[Figure: perceptron with inputs 1, x1, …, xn and weights w0, w1, …, wn producing output y]

$$y = w_0 + \sum_{i=1}^{n} x_i w_i$$

xi: global transaction history, bipolar value

wi: weight vector
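The prediction and update rules can be sketched as a small perceptron. The history length, the choice to train after every transaction, the unbounded integer weights, and the mapping from a predicted commit to RWLA (read here as "conflict is rare") are all assumptions for illustration; every name in the block is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HISTORY_LEN 8   /* n: assumed length of the global TX history */

typedef struct {
    int w[HISTORY_LEN + 1];   /* w[0] is the bias weight w0 */
    int x[HISTORY_LEN];       /* bipolar history: +1 commit, -1 abort;
                                 x[0] is the most recent outcome */
} predictor_t;

static void predictor_init(predictor_t *p)
{
    assert(p != NULL);
    for (int i = 0; i <= HISTORY_LEN; i++) p->w[i] = 0;
    for (int i = 0; i < HISTORY_LEN; i++)  p->x[i] = 1;  /* assume commits */
}

/* y = w0 + sum over i of x_i * w_i */
static int predictor_output(const predictor_t *p)
{
    int y = p->w[0];
    for (int i = 0; i < HISTORY_LEN; i++)
        y += p->x[i] * p->w[i + 1];
    return y;
}

/* y >= 0 predicts commit, y < 0 predicts abort. */
static bool predict_commit(const predictor_t *p)
{
    return predictor_output(p) >= 0;
}

/* Train on the actual outcome, then shift it into the history. */
static void predictor_update(predictor_t *p, bool committed)
{
    int t = committed ? 1 : -1;
    p->w[0] += t;                        /* the bias input is the constant 1 */
    for (int i = 0; i < HISTORY_LEN; i++)
        p->w[i + 1] += t * p->x[i];      /* agree: +1, disagree: -1 */
    for (int i = HISTORY_LEN - 1; i > 0; i--)
        p->x[i] = p->x[i - 1];
    p->x[0] = t;
}

typedef enum { POLICY_GV4, POLICY_RWLA } policy_t;

/* ASSUMPTION: a predicted commit means conflicts are rare, so pick RWLA. */
static policy_t choose_policy(const predictor_t *p)
{
    return predict_commit(p) ? POLICY_RWLA : POLICY_GV4;
}
```

A runtime could call `choose_policy` at the start of each transaction and `predictor_update` once its outcome is known.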

Performance of Speculative RWLA

# of threads changes between 2 and 16

On average, performance changes from 21% in Bayes to 47% in Labyrinth

[Figure: bar chart for Bayes, Kmeans, Labyrinth, Ssca2, Vacation, and Genome with series 2, 4, 8, 16, and AVG.; y-axis from 0 to 1.2; arrow labeled "better"]

Conclusion

RWLA overcomes contention over the global clock

Applications react differently to GV4 and RWLA

Speculative RWLA changes validation policy dynamically

Speculative RWLA improves performance of STMs by up to 66%


Thank You!

Questions?