Towards a Software Transactional Memory for Graphics Processors

37
TOWARDS A SOFTWARE TRANSACTIONAL MEMORY FOR GRAPHICS PROCESSORS Daniel Cederman, Philippas Tsigas and Muhammad Tayyab Chaudhry

description

Towards a Software Transactional Memory for Graphics Processors. Daniel Cederman, Philippas Tsigas and Muhammad Tayyab Chaudhry. Overview. Software Transactional Memory ( STM ) has long been suggested as a possible model for parallel programming However, its practicality is debated - PowerPoint PPT Presentation

Transcript of Towards a Software Transactional Memory for Graphics Processors

Page 1: Towards a Software Transactional Memory for Graphics Processors

TOWARDS A SOFTWARE TRANSACTIONAL MEMORY FOR GRAPHICS PROCESSORS

Daniel Cederman, Philippas Tsigas and Muhammad Tayyab Chaudhry

Page 2: Towards a Software Transactional Memory for Graphics Processors

Overview Software Transactional Memory (STM)

has long been suggested as a possible model for parallel programming

However, its practicality is debated By exploring how to design an STM for

GPUs, we want to bring this debate over to the graphics processor domain

Page 3: Towards a Software Transactional Memory for Graphics Processors

Introduction

Page 4: Towards a Software Transactional Memory for Graphics Processors

Problem Scenario

10

3

1 7

20

Binary Tree 12

12

12

Page 5: Towards a Software Transactional Memory for Graphics Processors

1712

Problem Scenario

10

3

1 7

20

Binary Tree 1217 1

217

17 ?

Page 6: Towards a Software Transactional Memory for Graphics Processors

Pseudocode

find position in tree;insert element;

Step 1

Step 2

Tree might change here!

Page 7: Towards a Software Transactional Memory for Graphics Processors

Software Transactional Memory

STMs provides a construction that

divides code into transactions that are

guaranteed to be executed atomically

Page 8: Towards a Software Transactional Memory for Graphics Processors

Modified Pseudocode

find position in tree;insert element;

Step 1

atomic {

}

Just one step!

Page 9: Towards a Software Transactional Memory for Graphics Processors

Scenario with Transactions

10

20

10

20

Concurrent

Transaction A Transaction B

12

17

12

12

17

17

Page 10: Towards a Software Transactional Memory for Graphics Processors

Scenario with Transactions

10

20

10

20

Concurrent

Committed Transaction B

12

17

Page 11: Towards a Software Transactional Memory for Graphics Processors

Scenario with Transactions

10

20

10

20

Concurrent

Committed Aborted

12

17

Page 12: Towards a Software Transactional Memory for Graphics Processors

Progress GuaranteesLock SchemeLock GranularityLog or Undo-LogConflict Detection Time…

STM Design

Page 13: Towards a Software Transactional Memory for Graphics Processors

One Lock No concurrency Busy waiting Convoying

Doneyet?

Doneyet?Done

yet?

Page 14: Towards a Software Transactional Memory for Graphics Processors

Multiple Locks Better concurrency Difficult Deadlocks

CBA

DN

W

X

Q

U

S

L

Z

T

R

Y

Process A

Process B

Page 15: Towards a Software Transactional Memory for Graphics Processors

20

Dynamic Locks

10

30

Transaction Log

25

251

525

15

25

10Read20Read30Read30Write20Write

15

Page 16: Towards a Software Transactional Memory for Graphics Processors

Two STM Designs

Transactions that fail to acquire a lock are aborted

Busy waits, but avoids deadlocks

Transactions that fail to acquire a lock can steal it or abort the other transaction

No busy wait Based on STM by Tim

Harris and Keir Fraser "Language support for lightweight transactions", OOPSLA 2003

A Blocking STM A Non-Blocking STM

Page 17: Towards a Software Transactional Memory for Graphics Processors

Lock Granularity

Object Aint var1;int var2;int var3;int var4;int var5;

int var1;int var2;int var3;int var4;int var5;

Object B

Object Word Stripe

Tradeoff between false conflicts and the number of locks

Wide memory bus makes it quick to copy object

Page 18: Towards a Software Transactional Memory for Graphics Processors

Log or Undo-Log

Better with high contention

No visibility problem

Optimistic Faster when no

conflict Problem with

visibility

Log Changes Log Original Content

The visibility problem is hard to handle in a non-managed environment

Page 19: Towards a Software Transactional Memory for Graphics Processors

Conflict Detection Time

Conflicts detected early

More false conflicts

Late conflict detection

Locks are held shorter time

Acquire locks immediately

Acquire locks at commit time

Holding locks longer will lead to more transactionsaborting long transactions in the non-blocking STM

Page 20: Towards a Software Transactional Memory for Graphics Processors

CUDA Specifics Minimal use of processor local memory

Better left to the main application SIMD instruction used where possible

Mostly used to coalesce reads and writes No Recursion

Rewritten to use stack

Page 21: Towards a Software Transactional Memory for Graphics Processors

More CUDA Specifics No Functions

Rewritten to use goto statements Possible Scheduler Problems

Bad scheduling can lead to infinite waiting time

No Dynamic Memory during Runtime Allocates much memory

Page 22: Towards a Software Transactional Memory for Graphics Processors

Experiments

Page 23: Towards a Software Transactional Memory for Graphics Processors

Experiments Queue (enqueue/dequeue)

Many conflicts expected

Hash-map (inserts) Fewer conflicts, but longer

transactions as more itemsare inserted

Binary Tree (inserts) Fewer conflicts as the tree

grows

Page 24: Towards a Software Transactional Memory for Graphics Processors

Experiments Skip-List

Insert/Lookup/Remove Scales similar to tree

Comparison with lock-free skip-list H. Sundell and P. Tsigas, Scaleable and lock-

free concurrent dictionaries, SAC04.

Page 25: Towards a Software Transactional Memory for Graphics Processors

Contention LevelsWe performed the experiments using

differentlevels of contention

High Contention (No pause between transactions)

Low Contention (500ms of work distributed randomly)

Page 26: Towards a Software Transactional Memory for Graphics Processors

Backoff Strategies Lowers contention by waiting before

aborted transactions are tried again Increases the probability that at least

one transaction is successful

Exponential LinearNone/Static

Attempts

Tim

e

Page 27: Towards a Software Transactional Memory for Graphics Processors

Hardware Nvidia GTX 280 with 30

multiprocessors 1-60 thread blocks 32 threads in each block

Page 28: Towards a Software Transactional Memory for Graphics Processors

Backoff Strategy (Queue, Full Con)

None Linear Exponential0

2000

4000

6000

8000

10000

12000

14000

16000 Non-BlockingBlocking

Aver

age

num

ber

ofab

orts

per

tr

ansa

ctio

ns

Page 29: Towards a Software Transactional Memory for Graphics Processors

Blocking Queue (Full Contention)

1 4 8 12 16 20 24 28 32 36 40 44 48 52 56 600

5

10

15

20

25

30

No BackoffLinear BackoffExponential Backoff

Thread Blocks

Tota

l num

ber

ofop

erat

ions

per

m

illis

econ

d

Page 30: Towards a Software Transactional Memory for Graphics Processors

Queue (Full Contention)

1 4 8 12 16 20 24 28 32 36 40 44 48 52 56 6005

1015202530354045

Non-Blocking Blocking

Thread Blocks

Tota

l num

ber

ofop

erat

ions

per

m

illis

econ

d

Page 31: Towards a Software Transactional Memory for Graphics Processors

Binary Tree (Full Contention)

1 4 8 12 16 20 24 28 32 36 40 44 48 52 56 600

50

100

150

200

250

300Non-Block-ingBlocking

Thread Blocks

Tota

l num

ber

ofop

erat

ions

per

m

illis

econ

d

Page 32: Towards a Software Transactional Memory for Graphics Processors

Hash-map (Full Contention)

1 4 8 12 16 20 24 28 32 36 40 44 48 52 56 600

20406080

100120140

Non-BlockingBlocking

Thread Blocks

Tota

l num

ber

ofop

erat

ions

per

m

illis

econ

d

Page 33: Towards a Software Transactional Memory for Graphics Processors

Hash-Map Aborts (Full Contention)

0

200

400

600

800

1000

1200

57

1121Non-Blocking

Blocking

Aver

age

num

ber

ofab

orts

per

tr

ansa

ctio

ns

Long and expensivetransactions!

Page 34: Towards a Software Transactional Memory for Graphics Processors

Skip-List (Full Contention)

1 4 8 12 16 20 24 28 32 36 40 44 48 52 56 600

102030405060708090

Non-BlockingBlocking

Thread Blocks

Tota

l num

ber

ofop

erat

ions

per

m

illis

econ

d

Page 35: Towards a Software Transactional Memory for Graphics Processors

Lock-Free Skip-List (Full Contention)

1 4 8 121620242832364044485256600

200400600800

100012001400160018002000

Non-Block-ingBlockingLock-FreeLinear

Thread Blocks

Tota

l num

ber

ofop

erat

ions

per

m

illis

econ

d

Page 36: Towards a Software Transactional Memory for Graphics Processors

Conclusion Software Transactional Memory has

attracted the interest of many researchers over the recent years

We have evaluated a blocking and a non-blocking STM design on a graphics processor

The non-blocking design scaled better than the blocking

For simplicity you pay in performance The debate is still open!

Page 37: Towards a Software Transactional Memory for Graphics Processors

For more information:http://www.cs.chalmers.se/~dcs

Thank you!