Extending Open64 with Transactional Memory features

Jiaqi Zhang, Tsinghua University


Page 1: Extending Open64 with Transactional Memory features

Extending Open64 with Transactional Memory features

Jiaqi Zhang, Tsinghua University

Page 2: Extending Open64 with Transactional Memory features

Contents

• Background
• Design
• Implementation
• Optimization
• Experiment
• Conclusion

Page 3: Extending Open64 with Transactional Memory features

Transactional Memory Background

• Trend to concurrent programming
• Current solution:
  – Lock
  – Flaws:
    • Association between locks and data
    • Deadlock
    • Not composable

Page 4: Extending Open64 with Transactional Memory features

Transactional Memory Background

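class Account {
    int balance;
    lock mylock;
    bool credit(int amount);
    bool debit(int amount);
};

bool credit(int amount) {
    acquire(mylock);
    balance += amount;
    release(mylock);
    return true;
}

bool debit(int amount) {
    acquire(mylock);
    balance -= amount;
    release(mylock);
    return true;
}

Composing the two calls leaves an inconsistent state between them:

transfer(Account a, Account b, int amount) {
    a.credit(amount);
    /* inconsistent state */
    b.debit(amount);
}

Locking both accounts inside transfer:

transfer(Account a, Account b, int amount) {
    acquire(a.mylock); acquire(b.mylock);
    a.credit(amount);
    b.debit(amount);
    release(a.mylock); release(b.mylock);
}

• Poor abstraction of class Account
• Deadlock
• Exposed implementation details

With a transaction:

transfer(Account a, Account b, int amount) {
    atomic {
        a.credit(amount);
        b.debit(amount);
    }
}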
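The composability point can be made concrete with a small C sketch. It emulates the atomic block with a single global mutex, which gives other threads the same all-or-nothing view but none of the concurrency a real TM system aims for. The names `atomic_begin`, `atomic_end`, and `account` are illustrative assumptions, not part of any TM library.

```c
#include <pthread.h>

/* Hypothetical stand-in for atomic { ... }: one global lock. */
static pthread_mutex_t atomic_lock = PTHREAD_MUTEX_INITIALIZER;

void atomic_begin(void) { pthread_mutex_lock(&atomic_lock); }
void atomic_end(void)   { pthread_mutex_unlock(&atomic_lock); }

typedef struct { int balance; } account;

/* No per-account lock: these helpers rely on the caller's atomic section. */
void credit(account *a, int amount) { a->balance += amount; }
void debit(account *a, int amount)  { a->balance -= amount; }

/* Mirrors the slide: credit a, debit b, as one indivisible step. */
void transfer(account *a, account *b, int amount) {
    atomic_begin();
    credit(a, amount);
    debit(b, amount);
    atomic_end();
}
```

Because `credit` and `debit` no longer own a lock, `transfer` can combine them without knowing how `account` synchronizes itself, which is the property the lock-based version lacks.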

Page 5: Extending Open64 with Transactional Memory features

Transactional Memory Background

• Current Implementations
  – TM libraries
    • DSTM
    • DracoSTM
    • TL2
    • TinySTM
    • …

Function calls:
TM_INIT() / TM_SHUTDOWN()
TM_ATOMIC_BEGIN() / TM_ATOMIC_END()
TM_SHARED_READ() / TM_SHARED_WRITE()

Explicit Transaction

Page 6: Extending Open64 with Transactional Memory features

Transactional Memory Background

• Current Implementations
  – Compilers
    • Intel C++ STM Compiler
    • Tanger
    • OpenTM
    • GCC

Page 7: Extending Open64 with Transactional Memory features

Design

• Programming Interfaces

#pragma tm atomic [clause]
    structured block

Clauses:
    readonly
    private(var list)
    shared(var list)

#pragma tm abort

#pragma tm function
    function declaration

#pragma tm waiver
    function declaration
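A short sketch of how two of these directives might look in source code. The identifiers `hits`, `record_hit`, and `record_hits` are made up for illustration. A C compiler without this extension ignores the unrecognized pragmas (possibly with a warning) and simply runs the code sequentially.

```c
int hits;   /* shared counter (illustrative) */

/* Ask the compiler to also generate a transactional version of this function. */
#pragma tm function
void record_hit(void) { hits++; }

void record_hits(int n) {
    int k;
    /* Everything in the structured block below runs as one transaction. */
    #pragma tm atomic
    {
        for (k = 0; k < n; k++) {
            record_hit();
        }
    }
}
```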

Page 8: Extending Open64 with Transactional Memory features

Design

• TM runtime interfaces (TL2)

Interface                                             Description
Thread* TxNewThread()                                 Allocate a new Thread structure to keep logs
TxStart(Thread* Self, jmp_buf* buf, int flags)        Start a new transaction for the current thread
TxCommit(Thread* Self)                                Commit the current transaction
TxLoad(Thread* Self, void* addr)                      Perform a synchronized load from the given memory address
TxStore(Thread* Self, void* addr, intptr_t val)       Perform a synchronized store to the given memory address
TxStoreLocal(Thread* Self, void* addr, intptr_t val)  Perform a locally logged store to the given memory address
TxAbort(Thread* Self)                                 Abort the current transaction and re-execute
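To make the start, load, store, commit, and abort roles concrete, here is a toy single-threaded model that buffers stores in a redo log and applies them only at commit. It is a sketch of one common STM strategy, not TL2's implementation: it does no conflict detection or locking, and every `toy_` name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_LOG_MAX 64

/* Hypothetical per-thread descriptor; a real runtime keeps far more state. */
typedef struct {
    intptr_t *addr[TOY_LOG_MAX];
    intptr_t  val[TOY_LOG_MAX];
    size_t    n;
} toy_tx;

void toy_start(toy_tx *tx) { tx->n = 0; }

/* Buffer the store; memory is untouched until commit. Returns -1 if the log is full. */
int toy_store(toy_tx *tx, intptr_t *addr, intptr_t val) {
    for (size_t k = 0; k < tx->n; k++) {
        if (tx->addr[k] == addr) { tx->val[k] = val; return 0; }
    }
    if (tx->n == TOY_LOG_MAX) return -1;
    tx->addr[tx->n] = addr;
    tx->val[tx->n] = val;
    tx->n++;
    return 0;
}

/* A load must see the transaction's own pending store, if there is one. */
intptr_t toy_load(const toy_tx *tx, const intptr_t *addr) {
    for (size_t k = 0; k < tx->n; k++) {
        if (tx->addr[k] == addr) return tx->val[k];
    }
    return *addr;
}

/* Commit: write every buffered value back to memory. */
void toy_commit(toy_tx *tx) {
    for (size_t k = 0; k < tx->n; k++) *tx->addr[k] = tx->val[k];
    tx->n = 0;
}

/* Abort: drop the log, leaving memory as it was before the transaction. */
void toy_abort(toy_tx *tx) { tx->n = 0; }
```

With this design an aborted transaction needs no undo step, because shared memory was never modified before commit.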

Page 9: Extending Open64 with Transactional Memory features

Design

• Wrapper functions
  – To ease the process of integrating new TM libraries
  – Called by programmers:
    tm_init() / tm_finalize()
    tm_thread_start() / tm_thread_end()
  – Inserted by the compiler:
    __tm_atomic_begin() / __tm_atomic_end()
    __tm_shared_read() / __tm_shared_read_float()
    __tm_shared_write() / __tm_shared_write_float()
    __tm_local_write() / __tm_local_write_float()
  – More wrapper functions are needed for other data types and for additional TM semantics
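One way such a wrapper layer could keep the TM library replaceable is a table of function pointers that the wrappers call through. The sketch below is an assumption about the general idea, not the presentation's actual implementation. The `wrap_` names and `tm_backend` are hypothetical, and the only backend shown does no synchronization at all.

```c
#include <stdint.h>

/* Hypothetical backend table: one entry per operation a TM library must supply. */
typedef struct {
    void     (*begin)(void);
    void     (*end)(void);
    intptr_t (*read)(intptr_t *addr);
    void     (*write)(intptr_t *addr, intptr_t val);
} tm_backend;

/* Trivial backend with no synchronization, usable only single-threaded. */
static void     direct_begin(void) {}
static void     direct_end(void) {}
static intptr_t direct_read(intptr_t *addr) { return *addr; }
static void     direct_write(intptr_t *addr, intptr_t val) { *addr = val; }

const tm_backend direct_backend = {
    direct_begin, direct_end, direct_read, direct_write
};

static const tm_backend *active = &direct_backend;

/* Swapping the TM library means swapping this pointer, not the compiled code. */
void wrap_select_backend(const tm_backend *b) { active = b; }

void     wrap_atomic_begin(void) { active->begin(); }
void     wrap_atomic_end(void)   { active->end(); }
intptr_t wrap_shared_read(intptr_t *addr) { return active->read(addr); }
void     wrap_shared_write(intptr_t *addr, intptr_t val) { active->write(addr, val); }
```

Code emitted by the compiler would refer only to the wrappers, so a different library is integrated by supplying another `tm_backend` table.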

Page 10: Extending Open64 with Transactional Memory features

Design

• Optimization
  – Eliminate redundant calls to runtime libraries

Page 11: Extending Open64 with Transactional Memory features

Implementation

• General Transformation

Page 12: Extending Open64 with Transactional Memory features

Implementation

• General Transformation
  – #pragma tm atomic
  – simple statements
  – control flow statements
    • IF
    • WHILE_DO

#pragma tm atomic becomes:

setjmp();
__tm_atomic_begin();

a = b+c; becomes:

  PARM #address of c
 CALL <__tm_shared_read>
  LDID <return_offset>
 STID #tm_preg_num_0
  PARM #address of b
 CALL <__tm_shared_read>
  LDID <return_offset>
 STID #tm_preg_num_1
    LDID #tm_preg_num_0
    LDID #tm_preg_num_1
   ADD
  PARM
  PARM #address of a
 CALL <__tm_shared_write>

for(;i<10;i++){} becomes:

  PARM #address of i
 CALL <__tm_shared_read>
  LDID <return_offset>
 STID #tm_preg_num_0
 WHILE_DO
   LDID #tm_preg_num_0
   INTCONST 9
  LE
 BODY
  BLOCK
   …
    PARM #address of i
   CALL <__tm_shared_read>
    LDID <return_offset>
   STID #tm_preg_num_0
  END_BLOCK
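Read back as C, the two IR listings correspond roughly to the sketch below. `shared_read` and `shared_write` are stand-ins for the read and write wrappers, the `preg` variables play the role of the pseudo-registers, and the `i++` inside the loop is an assumption about the elided part of the loop body.

```c
#include <stdint.h>

/* Stand-ins for the read/write wrappers; here they only touch memory. */
intptr_t shared_read(intptr_t *addr) { return *addr; }
void shared_write(intptr_t *addr, intptr_t val) { *addr = val; }

/* a = b + c; */
void assign_sum(intptr_t *a, intptr_t *b, intptr_t *c) {
    intptr_t preg0 = shared_read(c);
    intptr_t preg1 = shared_read(b);
    shared_write(a, preg0 + preg1);
}

/* for (; i < 10; i++) {}  Returns the number of iterations executed. */
int count_to_ten(intptr_t *i) {
    int iterations = 0;
    intptr_t preg0 = shared_read(i);          /* value for the first test */
    while (preg0 <= 9) {                      /* i < 10 expressed as i <= 9 */
        iterations++;                         /* the (empty) loop body */
        shared_write(i, shared_read(i) + 1);  /* i++ */
        preg0 = shared_read(i);               /* reload for the next test */
    }
    return iterations;
}
```

The reload at the end of the body is what keeps the loop test from using a stale copy of `i`.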

Page 13: Extending Open64 with Transactional Memory features

Implementation

• General Transformation

Original:

int i = 0;
#pragma tm atomic
{
    int j = 0;
    for (i = 0; i < 20; i++)
    {
        for (j = 0; j < 10; j++)
        {
            result++;
        }
    }
}

Transformed:

int i = 0;
jmp_buf jbuf;
_setjmp(jbuf);
TxStart(Self, jbuf);
TxStore(Self, &j, 0);
for (TxStore(Self, &i, 0); TxLoad(Self, &i) < 20;
     TxStore(Self, &i, TxLoad(Self, &i) + 1)) {
    for (TxStore(Self, &j, 0); TxLoad(Self, &j) < 10;
         TxStore(Self, &j, TxLoad(Self, &j) + 1)) {
        TxStore(Self, &result, TxLoad(Self, &result) + 1);
    }
}
TxCommit(Self);
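The saved jump buffer is what lets an aborted transaction re-execute from its start. The following sketch shows only that control-flow mechanism with standard `setjmp`/`longjmp`. `retry_abort` and `run_until_success` are hypothetical names, and the forced failures stand in for conflicts a real runtime would detect.

```c
#include <setjmp.h>

static jmp_buf retry_point;   /* plays the role of jbuf */
int attempts;                 /* how many times the body was started */

/* Hypothetical abort: jump back to the saved point to re-execute the body. */
void retry_abort(void) { longjmp(retry_point, 1); }

/* Runs a body that "conflicts" the given number of times before succeeding. */
int run_until_success(int failures) {
    /* volatile: modified between setjmp and longjmp, read after the jump */
    volatile int remaining = failures;
    attempts = 0;
    setjmp(retry_point);      /* re-entry point after every abort */
    attempts++;
    if (remaining > 0) {
        remaining--;
        retry_abort();        /* control resumes just after setjmp */
    }
    return attempts;
}
```

Note that `setjmp` restores control flow only. Undoing memory effects is the job of the runtime's logs.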

Page 14: Extending Open64 with Transactional Memory features

Implementation

• Functions
  – clone and instrument

#pragma tm function
void calculate(){}

produces two versions:

void calculate()
__tm_cloned__calculate() //instrumented

Calls inside a transaction are redirected to the clone:

#pragma tm atomic
{
    calculate();
}

becomes

#pragma tm atomic
{
    __tm_cloned__calculate();
}
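A hand-written sketch of what the two versions of a function might look like after cloning. The body of `calculate`, the shared variable `total`, and the counting barriers are invented for illustration, and `tm_cloned_calculate` stands in for the compiler-generated `__tm_cloned__calculate`.

```c
#include <stdint.h>

intptr_t total;        /* shared data (illustrative) */
long barrier_calls;    /* counts instrumented accesses */

static intptr_t shared_read(intptr_t *addr) { barrier_calls++; return *addr; }
static void shared_write(intptr_t *addr, intptr_t val) { barrier_calls++; *addr = val; }

/* Original version: still used by callers outside any transaction. */
void calculate(void) { total = total + 1; }

/* Instrumented clone: same body, every shared access goes through a barrier. */
void tm_cloned_calculate(void) {
    shared_write(&total, shared_read(&total) + 1);
}
```

Keeping the original alongside the clone means non-transactional callers pay no instrumentation cost.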

Page 15: Extending Open64 with Transactional Memory features

Implementation

• Optimization

Original:

int i = 0;
#pragma tm atomic
{
    int j = 0;
    for (i = 0; i < 20; i++)
    {
        for (j = 0; j < 10; j++)
        {
            result++;
        }
    }
}

Transformed (before optimization):

int i = 0;
jmp_buf jbuf;
_setjmp(jbuf);
TxStart(Self, jbuf);
TxStore(Self, &j, 0);
for (TxStore(Self, &i, 0); TxLoad(Self, &i) < 20;
     TxStore(Self, &i, TxLoad(Self, &i) + 1)) {
    for (TxStore(Self, &j, 0); TxLoad(Self, &j) < 10;
         TxStore(Self, &j, TxLoad(Self, &j) + 1)) {
        TxStore(Self, &result, TxLoad(Self, &result) + 1);
    }
}
TxCommit(Self);

Transaction-local variables (here j): detected by the frontend

Page 16: Extending Open64 with Transactional Memory features

Implementation

• Optimization

Original:

int i = 0;
#pragma tm atomic
{
    int j = 0;
    for (i = 0; i < 20; i++)
    {
        for (j = 0; j < 10; j++)
        {
            result++;
        }
    }
}

Transformed (transaction-local j accessed directly):

int i = 0;
jmp_buf jbuf;
_setjmp(jbuf);
TxStart(Self, jbuf);
j = 0;
for (TxStore(Self, &i, 0); TxLoad(Self, &i) < 20;
     TxStore(Self, &i, TxLoad(Self, &i) + 1)) {
    for (j = 0; j < 10; j++) {
        TxStore(Self, &result, TxLoad(Self, &result) + 1);
    }
}
TxCommit(Self);

Barrier-free variables (here i): detected according to their storage class

Page 17: Extending Open64 with Transactional Memory features

Implementation

• Optimization

Original:

int i = 0;
#pragma tm atomic
{
    int j = 0;
    for (i = 0; i < 20; i++)
    {
        for (j = 0; j < 10; j++)
        {
            result++;
        }
    }
}

Transformed (barrier-free i read directly, written with TxStoreLocal):

int i = 0;
jmp_buf jbuf;
_setjmp(jbuf);
TxStart(Self, jbuf);
j = 0;
for (; i < 20; TxStoreLocal(Self, &i, i + 1)) {
    for (j = 0; j < 10; j++) {
        TxStore(Self, &result, TxLoad(Self, &result) + 1);
    }
}
TxCommit(Self);
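To see how many runtime calls the optimizations remove for this loop nest, the sketch below replays the general and the fully optimized forms with counting stand-ins for the barriers. `tx_load`, `tx_store`, and `tx_store_local` are hypothetical single-threaded substitutes that only count and touch memory. They do not model TL2.

```c
#include <stdint.h>

long n_load, n_store, n_store_local;
intptr_t result;   /* the shared variable from the slides */

static intptr_t tx_load(intptr_t *p) { n_load++; return *p; }
static void tx_store(intptr_t *p, intptr_t v) { n_store++; *p = v; }
static void tx_store_local(intptr_t *p, intptr_t v) { n_store_local++; *p = v; }

void reset_counts(void) { n_load = n_store = n_store_local = 0; result = 0; }

/* General transformation: every access to i, j and result is a barrier. */
void run_general(void) {
    intptr_t i = 0, j;
    reset_counts();
    tx_store(&j, 0);
    for (tx_store(&i, 0); tx_load(&i) < 20; tx_store(&i, tx_load(&i) + 1)) {
        for (tx_store(&j, 0); tx_load(&j) < 10; tx_store(&j, tx_load(&j) + 1)) {
            tx_store(&result, tx_load(&result) + 1);
        }
    }
}

/* Optimized: j is plain, i is read directly and logged locally on write. */
void run_optimized(void) {
    intptr_t i = 0, j = 0;
    reset_counts();
    for (; i < 20; tx_store_local(&i, i + 1)) {
        for (j = 0; j < 10; j++) {
            tx_store(&result, tx_load(&result) + 1);
        }
    }
}
```

Under these assumptions the general form performs 661 loads and 442 stores, while the optimized form performs 200 loads, 200 stores, and 20 local stores. Both leave `result` at 200.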

Page 18: Extending Open64 with Transactional Memory features

Implementation

• Optimization
  – Optimization opportunities detection strategy
    • Pthread parallel task
      – transaction local: declared in tm atomic scope
      – barrier free: auto variables
    • Cloned transactional function
      – transaction local: declared in the function
    • OpenMP parallel task
      – transaction local: declared in tm atomic scope
      – barrier free: declared in micro task, marked in openmp private clause
    • Checking readonly transactions
  – Limitation
    • Reserved design for pointers
    • Needs programmers to participate in optimization
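The Pthread rules in the list above can be written as a small decision function. This is only a sketch: `var_info`, `write_kind`, and `classify_write` are hypothetical names, and the mapping from each category to a kind of write is inferred from the loop example on the earlier pages rather than stated by the presentation.

```c
#include <stdbool.h>

typedef enum {
    ACCESS_PLAIN,         /* ordinary store, no runtime call */
    ACCESS_LOCAL_LOG,     /* locally logged store (TxStoreLocal-style) */
    ACCESS_FULL_BARRIER   /* synchronized store (TxStore-style) */
} write_kind;

/* Hypothetical summary of what the compiler knows about a variable. */
typedef struct {
    bool declared_in_atomic_scope;   /* transaction local */
    bool is_auto;                    /* automatic storage class */
} var_info;

/* Pthread case only: transaction local first, then barrier free, else shared. */
write_kind classify_write(var_info v) {
    if (v.declared_in_atomic_scope) return ACCESS_PLAIN;
    if (v.is_auto) return ACCESS_LOCAL_LOG;
    return ACCESS_FULL_BARRIER;
}
```

Applied to the loop example, `j` would be classified as plain, `i` as locally logged, and the shared `result` as needing a full barrier.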

Page 19: Extending Open64 with Transactional Memory features

Preliminary Experiments

• Compare with fine-grained lock based application

Page 20: Extending Open64 with Transactional Memory features

Preliminary Experiments

• Compare with manually instrumented application

Page 21: Extending Open64 with Transactional Memory features

Preliminary Experiments

#pragma tm atomic
{
    int j;
    (*new_centers_len[index])++;
    for (j = 0; j < nfeatures; j++) {
        new_centers[index][j] += feature[i][j];
    }
}

private(feature)

Page 22: Extending Open64 with Transactional Memory features

Conclusion & Future work

• An infrastructure for TM on Open64
  – Replaceable TM implementation
  – Optimization
• More experiments on non-trivial applications are desired
• Nested transaction
• Signal processing
• Event handler
• Indirect calls
• Dealing with legacy code
• …

FastDB: 8 out of 75 critical regions contain nested transactions
FastDB: 28 out of 75 critical regions contain signal processing
PARSEC: 20 out of 55 critical regions contain signal processing

Page 23: Extending Open64 with Transactional Memory features

Thanks