UW-Madison Computer Sciences Multifacet Group© 2011 Karma: Scalable Deterministic Record-Replay...

Post on 14-Dec-2015

UW-Madison Computer Sciences Multifacet Group © 2011

Karma: Scalable Deterministic Record-Replay

Arkaprava Basu, Jayaram Bobba, Mark D. Hill

Work done at University of Wisconsin-Madison

Executive summary

• Applications of deterministic record-replay
– Debugging
– Fault tolerance
– Security

• Existing hardware record-replayer
– Fast record but
– Slow replay or
– Requires major hardware changes

• Karma: Faster Replay with nearly-conventional h/w
– Extends Rerun
– Records more parallelism

Outline

• Background & Motivation
• Rerun Overview
• Karma Insights
• Karma Implementation
• Evaluation
• Conclusion

Deterministic Record-Replay

• Multi-threaded execution non-deterministic
• Deterministic record-replay to reincarnate past execution
• Record:
– Record selective events in a log
• Replay:
– Use the log to reincarnate past execution
• Key Challenge: Memory races

Record-Replay Motivation

• Debugging
– Ensures bugs faithfully reappear (no heisenbugs)

• Fault-Tolerance
– Enable hot backup for primary server to shadow primary & take over on failure

• Security
– Real time intrusion detection & attack analysis

Replay speed matters

Previous work

• Record Dependence
– Wisconsin Flight Data Recorder [ISCA’03, etc.]: Too much state
– UCSD Strata [ASPLOS’06]: Log size grows rapidly with #cores

• Record Independence
– UIUC DeLorean [ISCA’08]: Non-conventional BulkSC H/W
– Wisconsin Rerun [ISCA’08]: Sequential replay
– Intel MRR [MICRO’09]: Only for snoop-based systems
– Timetraveler [ISCA’10]: Extends Rerun to lower log size

• Our Goal
– Retain Rerun’s near-conventional hardware
– Enable Faster Replay

Rerun’s Recording

• Most code executes without races
– Use race-free regions for ordering

• Episodes: independent execution regions
– Defined per thread

[Figure: load (LD) and store (ST) sequences of three threads, T0, T1, and T2, partitioned into per-thread episodes.]

Partially adapted from ISCA’08 talk
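As a rough illustration of how race-free regions become episodes, the sketch below ends a thread's current episode whenever another thread's access conflicts with an address that episode has already read or written. It is a simplified software model, not Rerun's hardware: exact Python sets stand in for the per-core address filters, accesses are assumed to arrive as one global interleaving, and the names `Episode`, `form_episodes`, and the sample trace are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    thread: int
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)
    refs: int = 0  # number of memory references in the episode


def conflicts(ep, op, addr):
    """A remote store conflicts with earlier reads or writes; a remote load only with earlier writes."""
    if op == "ST":
        return addr in ep.reads or addr in ep.writes
    return addr in ep.writes


def form_episodes(trace, num_threads):
    """trace: list of (thread, op, addr) in global order.
    Returns, per thread, the sizes (reference counts) of its episodes."""
    current = [Episode(t) for t in range(num_threads)]
    log = [[] for _ in range(num_threads)]
    for thread, op, addr in trace:
        for other in range(num_threads):
            ep = current[other]
            if other != thread and ep.refs > 0 and conflicts(ep, op, addr):
                log[other].append(ep.refs)       # end the conflicting episode
                current[other] = Episode(other)  # and start a fresh one
        ep = current[thread]
        (ep.writes if op == "ST" else ep.reads).add(addr)
        ep.refs += 1
    for t in range(num_threads):                 # flush episodes still open at the end
        if current[t].refs:
            log[t].append(current[t].refs)
    return log


# Hypothetical two-thread trace: T1's load of B conflicts with T0's earlier store to B.
trace = [(0, "LD", "A"), (0, "ST", "B"), (1, "ST", "E"),
         (1, "LD", "B"), (0, "ST", "C"), (1, "ST", "X")]
episode_sizes = form_episodes(trace, 2)
print(episode_sizes)
```

In this trace only T0's first episode is cut short, so T0 logs two episodes and T1 logs one.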

Rerun’s Recording (Contd.)

• Capturing causality:
– Timestamp via Lamport scalar clock [Lamport ‘78]

• Replay in timestamp order
– Episodes with same timestamp can be replayed in parallel

[Figure: the episodes of T0, T1, and T2 labeled with scalar timestamps (22, 23, 43, 44, 45, 60, 61, 62).]
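The two bullets above can be sketched in a few lines. This is a minimal model under stated assumptions: dependences are given as an explicit list rather than observed through coherence messages, timestamps are dense (1, 2, 3, ...) rather than the values shown on the slide, and the episode names and the helpers `lamport_timestamps` and `replay_batches` are invented for illustration.

```python
from collections import defaultdict


def lamport_timestamps(episodes, deps):
    """episodes: ids listed so that predecessors come first.
    deps: dict mapping an id to the ids it must follow.
    Returns a dict id -> scalar timestamp (one more than the largest predecessor timestamp)."""
    ts = {}
    for ep in episodes:
        ts[ep] = 1 + max((ts[p] for p in deps.get(ep, [])), default=0)
    return ts


def replay_batches(ts):
    """Group episodes by timestamp. Batches replay one after another;
    episodes inside one batch may replay in parallel."""
    groups = defaultdict(list)
    for ep, t in ts.items():
        groups[t].append(ep)
    return [sorted(groups[t]) for t in sorted(groups)]


# Hypothetical episodes: "a1" is the first episode of thread a, and so on.
episodes = ["a1", "b1", "a2", "b2", "c1"]
deps = {"a2": ["a1"], "b2": ["b1", "a1"], "c1": ["b2"]}
ts = lamport_timestamps(episodes, deps)
batches = replay_batches(ts)
print(batches)
```

A scalar clock orders every pair of episodes with different timestamps, including pairs that never conflicted, which is why replay in timestamp order can be slower than the original execution.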

Rerun’s Replay

[Figure: Rerun's replay of the episodes of T0, T1, and T2 in timestamp order (TS = 22, 43, 44, 45, 60, 61).]

Karma’s Insight 1:

• Capture order with DAG (not scalar clock)
– Recording: DAG captured with episode predecessor & successor sets

[Figure: the episodes of T0, T1, and T2 (timestamps 22 to 62) connected by DAG arcs.]
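To see why a DAG can expose more replay parallelism than a scalar clock, the sketch below computes replay finish times for the same episodes under both orderings. The episode names, durations, and timestamps are hypothetical, and both functions are idealized models (one core per thread, no communication cost), not measurements of Rerun or Karma.

```python
def dag_finish_times(duration, preds, order):
    """DAG order: an episode starts as soon as all of its predecessors have finished.
    order must list predecessors before successors."""
    finish = {}
    for ep in order:
        start = max((finish[p] for p in preds.get(ep, [])), default=0)
        finish[ep] = start + duration[ep]
    return finish


def timestamp_finish_times(duration, ts):
    """Scalar-clock order: an episode waits for every episode with a smaller timestamp."""
    finish, clock = {}, 0
    for t in sorted(set(ts.values())):
        batch = [ep for ep in ts if ts[ep] == t]
        for ep in batch:
            finish[ep] = clock + duration[ep]
        clock = max(finish[ep] for ep in batch)
    return finish


# Two threads, x and y, whose episodes never conflict with each other.
duration = {"x1": 4, "y1": 1, "x2": 1, "y2": 4}
preds = {"x2": ["x1"], "y2": ["y1"]}            # program order only
ts = {"x1": 1, "y1": 1, "x2": 2, "y2": 2}       # one valid scalar assignment

dag = dag_finish_times(duration, preds, ["x1", "y1", "x2", "y2"])
scalar = timestamp_finish_times(duration, ts)
print(max(dag.values()), max(scalar.values()))
```

Under the scalar clock, y2 waits for x1 even though the two are independent; under the DAG it starts right after y1.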

Karma’s Insight 1:

[Figure: side-by-side comparison of Rerun's replay and Karma's replay of the episodes of T0, T1, and T2.]

Karma’s Insight 1: (Contd.)

• Naïve approach: DAG arcs point to episodes
– Episode represented by integers
– Too much log size overhead!

• Our approach: DAG arcs point to cores
– Recording: Only one “active” episode per core
– Replay: Send wakeup message(s) to core(s) of successor episode(s)
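The replay side of this scheme can be sketched as a small simulation. It assumes that each log entry names only predecessor and successor cores, that an episode may start once it holds one wakeup from each predecessor core, and that wakeups between a given pair of cores are consumed in the order they were sent. The function `replay_with_wakeups`, the entry layout, and the three-core log are hypothetical, not Karma's actual protocol.

```python
from collections import deque


def replay_with_wakeups(logs):
    """logs[core] lists that core's entries (name, pred_cores, succ_cores) in program order.
    Returns one order in which the episodes can replay."""
    n = len(logs)
    pending = [deque(log) for log in logs]
    inbox = [[0] * n for _ in range(n)]  # inbox[dst][src] = wakeups waiting at dst
    order = []
    progress = True
    while progress:
        progress = False
        for core in range(n):
            if not pending[core]:
                continue
            name, preds, succs = pending[core][0]
            if all(inbox[core][p] > 0 for p in preds):
                for p in preds:
                    inbox[core][p] -= 1   # consume one wakeup per predecessor core
                pending[core].popleft()
                order.append(name)
                for s in succs:
                    inbox[s][core] += 1   # wake the successor's core, not a named episode
                progress = True
    if any(pending):
        raise ValueError("some episode never received its wakeups")
    return order


# Hypothetical log: T0.e1 must precede T1.e1, which must precede T2.e2.
logs = [
    [("T0.e1", [], [1]), ("T0.e2", [], [])],
    [("T1.e1", [0], [2]), ("T1.e2", [], [])],
    [("T2.e1", [], []), ("T2.e2", [1], [])],
]
order = replay_with_wakeups(logs)
print(order)
```

Because only one episode is active per core, a core identifier is enough to find the target episode, so no per-episode integer has to be stored in the log.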

Karma’s Insight 1:

[Figure: the episodes of T0, T1, and T2 with their timestamps (22 to 62).]

Anatomy of a log entry: 84 0|0|1 0|0|1
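The slide does not label the fields of this entry. The sketch below reads them, as an assumption, as a reference count followed by a predecessor bit-vector and a successor bit-vector with one bit per core (leftmost bit = T0). The `LogEntry` type, the helper names, and the 16-bit reference-count width are hypothetical.

```python
from typing import List, NamedTuple


class LogEntry(NamedTuple):
    refs: int        # assumed: number of memory references in the episode
    pred: List[int]  # assumed: 1 = wait for a wakeup from that core
    succ: List[int]  # assumed: 1 = send a wakeup to that core


def parse_entry(text):
    """Parse an entry written as on the slide, e.g. '84 0|0|1 0|0|1'."""
    refs, pred, succ = text.split()
    return LogEntry(int(refs),
                    [int(b) for b in pred.split("|")],
                    [int(b) for b in succ.split("|")])


def cores(bits):
    """Indices of the cores whose bit is set."""
    return [i for i, b in enumerate(bits) if b]


def entry_bits(num_cores, refs_bits):
    """Entry size under this layout: a reference count plus two core bit-vectors."""
    return refs_bits + 2 * num_cores


entry = parse_entry("84 0|0|1 0|0|1")
print(entry.refs, cores(entry.pred), cores(entry.succ))
print(entry_bits(num_cores=16, refs_bits=16))
```

Under this reading, the entry size depends on the number of cores rather than on the number of episodes recorded.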

Karma Insight 2:

• Not necessary to end the episode on every conflict:
– As long as the episodes can be ordered during replay

[Figure: load (LD) and store (ST) sequences of T0, T1, and T2, with episodes extended past conflicts.]

Karma Hardware

[Figure: base system with Core 0 … Core 15, per-core L1 caches and coherence controllers, L2 banks 0 … 15, an interconnect, and DRAM.]

Total State: 148 bytes/core
– Address Filter (FLT)
– Reference (REFS)
– Predecessor (PRED)
– Successor (SUCC)
– Timestamp (TS)

Evaluation:

• Were we able to speed up the replay?

[Figure: four bar charts, one each for Apache, Jbb, Oltp, and Zeus. Y-axis: speedup normalized to "Base" of the corresponding configuration (0 to 1.2). X-axis: number of cores-L2 cache size (4core-4MB, 8core-8MB, 16core-16MB). Bars: Base, Rerun Replay, Karma Replay.]

On average, ~4X improvement in replay speed over Rerun

Evaluation

• Did we blow up the log size?

[Figure: Karma log size normalized to Rerun's log size (y-axis, 0 to 1.4) versus maximum allowable episode size (x-axis: 128, 256, 512, 1024, 2048, 4096, 8192, Unbounded) for Apache, Zeus, Oltp, and Jbb.]

On average, Karma does not increase the log size; it reduces the log size by as much as 40% as larger episodes are allowed

Conclusion

• Applications of deterministic replay
– Debugging
– Fault tolerance
– Security

• Existing hardware record-replayer
– Slow replay or
– Requires major hardware changes

• Karma: Faster Replay with nearly-conventional h/w
– Extends Rerun
– Uses DAG instead of scalar clock
– Extends episodes past conflicts

• Widen Application + Lower Cost → More Attractive

Questions?