Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1...

46
Persistency for Synchronization-Free Regions Vaibhav Gogte, Stephan Diestelhorst $ , William Wang $ , Satish Narayanasamy, Peter M. Chen, Thomas F. Wenisch $ PLDI 2018 06/20/2018

Transcript of Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1...

Page 1: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Persistency for Synchronization-Free Regions

Vaibhav Gogte, Stephan Diestelhorst$, William Wang$, Satish Narayanasamy, Peter M. Chen, Thomas F. Wenisch

$

PLDI 201806/20/2018

Page 2: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Promise of persistent memory (PM)

2

Non-volatility

Performance

Density

Byte-addressable, load-store interface to storage

“Optane DC Persistent Memory will be offered in packages of up to 512GB per stick.”

“… expanding memory per CPU socket to as much as 3TB.”

*

* Source: www.extremetech.com

Page 3: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Core-1

Core-2

Core-3

Core-4

L1 $ L1 $ L1 $ L1 $

LLC

DRAM

Persistent memory system

Persistent Memory (PM)

Page 4: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Core-1

Core-2

Core-3

Core-4

L1 $ L1 $ L1 $ L1 $

LLC

DRAM

Recovery

Recovery can inspect the data-structures in PM to restore system to a consistent state

Persistent memory system

Persistent Memory (PM)

Page 5: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Memory persistency models

5

• Provide guarantees required for recoverable software– Academia [Condit ‘09][Pelley ‘14][Joshi ‘15][Kolli ‘17] …

– Industry [Intel ‘14][ARM ‘16]

• Define the program state that recovery observes post failure• Key primitives:

– Ensure failure atomicity for a group of persists – Govern ordering constraints on memory accesses to PM

Page 6: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Contributions

6

• Persistency model using synchronization-free regions

– Define precise state of the program post failure

– Employ existing synchronization primitives in C++

• Provide semantics as a part of language implementation

– Build compiler pass that emits logging code for persistent accesses

• Propose two implementations: Coupled-SFR and Decoupled-SFR

• Achieve 65% better performance over state-of-the-art tech.

Page 7: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Outline

7

• Design space for language-level persistency models• Our proposal: Persistency for SFRs

– Coupled-SFR implementation– Decoupled-SFR implementation

• Evaluation

Page 8: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Language-level persistency models [Chakrabarti ’14][Kolli ’17]

• Enables writing portable, recoverable software

• Extend language memory-model with persistency semantics

• Persistency model guarantees:

8

Atomic_begin()

A = 100;

B = 200;

Atomic_end()

Failure-atomicity: Which group of stores persist atomically?

Ordering:How can programmers order accesses?

A = 100;

L1.rel();

L1.acq();

A = 200;St(A) <p St(B)

Page 9: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Why failure-atomicity?

9

Task: Fill node and add to linked list, safelyIn-memory data

Initial linked-list

Page 10: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Why failure-atomicity?

10

Task: Fill node and add to linked list, safely

fillNewNode()

In-memory data

Initial linked-list

Page 11: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Why failure-atomicity?

11

Task: Fill node and add to linked list, safely

fillNewNode()

updateTailPtr()

In-memory data

Initial linked-list

Page 12: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Why failure-atomicity?

12

Task: Fill node and add to linked list, safely

fillNewNode()

updateTailPtr()

In-memory data

Fence

Initial linked-list

Page 13: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Why failure-atomicity?

13

Task: Fill node and add to linked list, safelyIn-memory data

Failure-atomicity à Persistent memory programming easier

fillNewNode()updateTailPtr()

Atomic

Page 14: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Semantics for failure-atomicity

14

• Assures that either all or none of the updates visible post failure• Guaranteed by hardware, library or language implementations• Design space for semantics

– Individual persists– Outer critical-sections

Existing mechanisms provide unsatisfying semantics or suffer high performance overhead

Page 15: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Granularity of failure-atomicity - I

15

L1.lock();x -= 100;y += 100;L2.lock();

a -= 100;b += 100;

L2.unlock();L1.unlock();

Page 16: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Granularity of failure-atomicity - I

16

L1.lock();

x -= 100;

y += 100;

L2.lock();

a -= 100;

b += 100;

L2.unlock();

L1.unlock();

Individual persists [Condit ‘09][Pelley ‘14][Joshi ‘16][Kolli ‘17]

• Mechanisms ensure atomicity of individual persists

• Non-sequentially consistent state visible to recovery à Need additional custom logging

Page 17: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Granularity of failure-atomicity - I

17

L1.lock();

x -= 100;

y += 100;

L2.lock();

a -= 100;

b += 100;

L2.unlock();

L1.unlock();

Individual persists [Condit ‘09][Pelley ‘14][Joshi ‘16][Kolli ‘17]

• Mechanisms ensure atomicity of individual persists

• Non-sequentially consistent state visible to recovery à Need additional custom logging

Page 18: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Granularity of failure-atomicity - II

18

L1.lock();x -= 100;y += 100;L2.lock();

a -= 100;b += 100;

L2.unlock();L1.unlock();

Outer critical sections [Chakrabarti ‘14][Boehm ‘16]

• Guarantees recovery to observe SC state à Easier to build recovery code

Page 19: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Granularity of failure-atomicity - II

19

L1.lock();

x -= 100;

y += 100;

L2.lock();

a -= 100;

b += 100;

L2.unlock();

L1.unlock();

Outer critical sections [Chakrabarti ‘14][Boehm ‘16]

• Guarantees recovery to observe SC state à Easier to build recovery code

• Require complex dependency tracking between logsà > 2x performance cost

• Do not generalize to certain sync. constructs à eg. condition variables

Page 20: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Our proposal: Failure-atomic SFRs

20

l1.acq();x -= 100;y += 100;l2.acq();

a -= 100;b += 100;

l2.rel();l1.rel();

SFR1

SFR2

Thread regions delimited by synchronization operations or system calls

Synchronization free regions (SFR)

Enable precise post-failure state with low performance overhead

Page 21: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Guarantees by failure-atomic SFRs

21

Thread 1

l1.acq();x -= 100;y += 100;

l1.rel();

SFR 1

• Intra-thread guarantees– Ensure failure-atomicity of updates within SFR

Page 22: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Guarantees by failure-atomic SFRs

22

Thread 1

l1.acq();x -= 100;y += 100;

l1.rel();

SFR 1

• Intra-thread guarantees– Ensure failure-atomicity of updates within SFR– Define precise points for post-failure program state

Page 23: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Guarantees by failure-atomic SFRs

23

Thread 1

l1.acq();x -= 100;y += 100;

l1.rel();

SFR 1

• Intra-thread guarantees– Ensure failure-atomicity of updates within SFR– Define precise points for post-failure program state

• Inter-thread guarantees

Thread 2

l1.acq();x += 100;y -= 100;

l1.rel();

Page 24: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Guarantees by failure-atomic SFRs

24Two logging impl. à Coupled-SFR and Decoupled-SFR

Thread 1

l1.acq();x -= 100;y += 100;

l1.rel();

SFR 1

• Intra-thread guarantees– Ensure failure-atomicity of updates within SFR– Define precise points for post-failure program state

• Inter-thread guarantees– Order SFRs using synchronizing acq and rel ops– Serialize ordered SFRs in PM

Thread 2

l1.acq();x += 100;y -= 100;

l1.rel();

SFR 2

Page 25: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Undo-logging for SFRs

25

L1.acq();x = 100;

L1.rel();SFR1

persistUndoLog (L)

mutateData (M)

commitLog (C)

persistData (P)

SFR1

Page 26: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Undo-logging for SFRs

26

L1.acq();x = 100;

L1.rel();SFR1 Failure-

atomic

persistUndoLog (L)

mutateData (M)

commitLog (C)

persistData (P)

SFR1Need to ensure the ordering of steps in undo-

logging for SFRs to be failure-atomic

Page 27: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Impl. 1: Coupled-SFR

27

L1.acq();x = 100;

L1.rel();

SFR1

L1.acq();x = 200;

L1.rel();SFR2

Thread 1 Thread 2

Page 28: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Impl. 1: Coupled-SFR

28

L1.acq();x = 100;

L1.rel();

SFR1

L1.acq();x = 200;

L1.rel();SFR2

Thread 1 Thread 2

L1

M1

P1

C1

REL1

SFR1

Thread 1

Page 29: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Impl. 1: Coupled-SFR

29

L1.acq();x = 100;

L1.rel();

SFR1

L1.acq();x = 200;

L1.rel();SFR2

Thread 1 Thread 2

L1

M1

P1

C1

REL1

SFR1

Thread 1

L2

M2

P2

C2

ACQ2

SFR2

Thread 2

Page 30: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Impl. 1: Coupled-SFR

30

L1.acq();

x = 100;

L1.rel();

SFR1

L1.acq();

x = 200;

L1.rel();

SFR2

Thread 1 Thread 2

+ Persistent state lags execution by at most one SFR à Simpler implementation, latest state at failure

- Need to flush updates at the end of each SFR à High performance cost

L1

M1

P1

C1

REL1

SFR1

Thread 1

L2

M2

P2

C2

ACQ2

SFR2

Thread 2

Page 31: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Impl. 2: Decoupled-SFR

31

L1.acq();x = 100;

L1.rel();

SFR1

L1.acq();x = 200;

L1.rel();SFR2

Thread 1 Thread 2

L1

M1

P1

C1

REL1

SFR1

Thread 1

L2

M2

P2

C2

ACQ2

SFR2

Thread 2

Page 32: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Impl. 2: Decoupled-SFR

32

L1.acq();x = 100;

L1.rel();

SFR1

L1.acq();x = 200;

L1.rel();SFR2

Thread 1 Thread 2

L1

M1

P1

C1

REL1

SFR1

Thread 1

L2

M2

P2

C2

ACQ2

SFR2

Thread 2

Key idea: Persist updates and commit logs in background

Page 33: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Impl. 2: Decoupled-SFR

33

L1.acq();x = 100;

L1.rel();

SFR1

L1.acq();x = 200;

L1.rel();SFR2

Thread 1 Thread 2

L1

M1

P1

C1

REL1

SFR1

Thread 1

L2

M2

P2

C2

ACQ2

SFR2

Thread 2

Key idea: Persist updates and commit logs in background

Require recording order of log creation

Page 34: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Log ordering in Decoupled-SFR

34

x = 100;L1.rel();

L1.acq();x = 200;

Thread 1 Thread 2Thread 1Header

Thread 2Header

L1 0Sequence Table

Init x = 0

Page 35: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Log ordering in Decoupled-SFR

35

x = 100;L1.rel();

L1.acq();x = 200;

Thread 1 Thread 2Thread 1Header

Thread 2Header

Store Xxold = 0

L1 0Sequence Table

Init x = 0

Page 36: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Log ordering in Decoupled-SFR

36

x = 100;L1.rel();

L1.acq();x = 200;

Thread 1 Thread 2Thread 1Header

Thread 2Header

Store Xxold = 0

Rel L1Seq = 0

L1 0Sequence Table

Init x = 0

Page 37: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Log ordering in Decoupled-SFR

37

x = 100;L1.rel();

L1.acq();x = 200;

Thread 1 Thread 2Thread 1Header

Thread 2Header

Store Xxold = 0

Rel L1Seq = 0

L1 0Sequence Table

à 1

Init x = 0

Page 38: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Log ordering in Decoupled-SFR

38

x = 100;L1.rel();

L1.acq();x = 200;

Thread 1 Thread 2Thread 1Header

Thread 2Header

Store Xxold = 0

Rel L1Seq = 0

Acq L1Seq = 1

L1 0Sequence Table

à 1

Init x = 0

Page 39: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Log ordering in Decoupled-SFR

39

x = 100;L1.rel();

L1.acq();x = 200;

Thread 1 Thread 2Thread 1Header

Thread 2Header

Store Xxold = 0

Rel L1Seq = 0

Acq L1Seq = 1

Store Xxold = 100

L1 0Sequence Table

à 1

Init x = 0

Page 40: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Log ordering in Decoupled-SFR

40

x = 100;L1.rel();

L1.acq();x = 200;

Thread 1 Thread 2Thread 1Header

Thread 2Header

Store Xxold = 0

Rel L1Seq = 0

Acq L1Seq = 1

Store Xxold = 100

L1 0Sequence Table

à 1

Sequence numbers record inter-thread order of log creation

Init x = 0

In paper: Racing sync. operations, commit optimizations etc.

Background threads commit logs using recorded sequence nos.

Page 41: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Evaluation setup• Designed our logging approaches in LLVM v3.6.0

– Instruments stores and sync. ops. to emit undo logs– Creates log space for managing per-thread undo-logs– Launches background threads to flush/commit logs in Decoupled-SFR

• Workloads: write-intensive micro-benchmarks– 12 threads, 10M operations

• Performed experiments on Intel E5-2683 v3 – 2GHz, 12 physical cores, 2-way hyper-threading

41

Page 42: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

0

0.2

0.4

0.6

0.8

1

1.2

CQ SPS PC RB-tree TATP LL TPCC Mean

Norm

alize

d ex

ec. t

ime

Atlas Coupled-SFR Decoupled-SFR No-persistencyPerformance evaluation

42

Better

[Chakrabarti ’14]

Page 43: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

0

0.2

0.4

0.6

0.8

1

1.2

CQ SPS PC RB-tree TATP LL TPCC Mean

Norm

alize

d ex

ec. t

ime

Atlas Coupled-SFR Decoupled-SFR No-persistencyPerformance evaluation

43

65%

Better

Decoupled-SFR performs 65% better than state-of-the-art ATLAS design

[Chakrabarti ’14]

Page 44: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

0

0.2

0.4

0.6

0.8

1

1.2

CQ SPS PC RB-tree TATP LL TPCC Mean

Norm

alize

d ex

ec. t

ime

Atlas Coupled-SFR Decoupled-SFR No-persistencyPerformance evaluation

44Coupled-SFR performs better that Decoupled-SFR when fewer stores/SFR

Better

[Chakrabarti ’14]

Page 45: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Conclusion• Failure-atomic synchronization-free regions

– Persistent state moves from one sync. operation to the next– Extends clean SC semantics to post-failure recovery

• Coupled-SFR– Easy to reason about PM state after failure; high performance cost

• Decoupled-SFR– Persistent state lags execution; performs 65% better than ATLAS

45

Page 46: Persistency for Synchronization-Free Regionsweb.eecs.umich.edu/~vgogte/pldi18slides.pdf · Thread 1 Thread 2 Thread1 Header Thread2 Header Store X x old= 0 RelL1 Seq= 0 AcqL1 Seq=

Persistency for Synchronization-Free Regions

Vaibhav Gogte, Stephan Diestelhorst$,

William Wang$, Satish Narayanasamy,

Peter M. Chen, Thomas F. Wenisch

$ If the surgery proves unnecessary, we’ll revert your architectural state at no charge.*

*Source: Paul Kocher, ISCA 2018