Whose Cache Line Is It Anyway?

Operating System Support for Live Detection and Repair of False Sharing

Mihir Nanavati, Mark Spear, Nathan Taylor, Shriram Rajagopalan, Dutch T. Meyer, William Aiello, and Andrew Warfield

University of British Columbia

2

3

4

Mondriaan Memory Protection [ASPLOS ’02, SOSP ’05]

5

Byte-granularity, software-only remapping

6

False Sharing

7

8

Target System

Xen

Control VM (Dom0)

Hardware

Memory

9

Dynamic Detection and Mitigation of False Sharing

10

11

[Diagram labels: threads T1 and T2; Cache; Main Memory; addresses 0x300 and 0x340; operations Read 0x300, Write 0x300, Write 0x308]
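
The addresses on this slide can be checked with a little arithmetic. The sketch below assumes a 64-byte cache line, which is common on x86 but is not stated on the slide; the helper names are illustrative only. Under that assumption, 0x300 and 0x308 fall in the same line, so writes to either address contend even though the bytes are distinct, while 0x340 starts the next line.

```python
CACHE_LINE_SIZE = 64  # assumption: 64-byte lines, not stated on the slide


def cache_line_base(addr, line_size=CACHE_LINE_SIZE):
    """Return the address of the first byte of the cache line holding addr."""
    # line_size is a power of two, so masking the low bits rounds down.
    return addr & ~(line_size - 1)


def same_cache_line(a, b, line_size=CACHE_LINE_SIZE):
    """True when both addresses map to the same cache line."""
    return cache_line_base(a, line_size) == cache_line_base(b, line_size)


if __name__ == "__main__":
    for a, b in [(0x300, 0x308), (0x300, 0x340)]:
        print(hex(a), hex(b), "same line" if same_cache_line(a, b) else "different lines")
```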

12

Cache Line

C Structure

With Padding

With Allocator Metadata
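
A minimal sketch of the padding idea, using Python's ctypes to mirror a C structure layout. The structure and field names are hypothetical, the cache line is assumed to be 64 bytes, and the structure is assumed to start on a line boundary.

```python
import ctypes

CACHE_LINE = 64  # assumption: 64-byte cache lines


class Counters(ctypes.Structure):
    """Two per-thread counters packed next to each other (hypothetical example)."""
    _fields_ = [("t1_count", ctypes.c_uint64),
                ("t2_count", ctypes.c_uint64)]


class PaddedCounters(ctypes.Structure):
    """Same counters, each padded out to occupy a full cache line."""
    _fields_ = [("t1_count", ctypes.c_uint64),
                ("_pad1", ctypes.c_char * (CACHE_LINE - 8)),
                ("t2_count", ctypes.c_uint64),
                ("_pad2", ctypes.c_char * (CACHE_LINE - 8))]


def shares_line(struct, field_a, field_b):
    """True when two fields fall in the same cache line.

    Assumes the structure itself starts on a cache-line boundary.
    """
    off_a = getattr(struct, field_a).offset
    off_b = getattr(struct, field_b).offset
    return off_a // CACHE_LINE == off_b // CACHE_LINE


if __name__ == "__main__":
    print("unpadded:", shares_line(Counters, "t1_count", "t2_count"))
    print("padded:  ", shares_line(PaddedCounters, "t1_count", "t2_count"))
```

Padding trades memory for isolation: the padded structure here is 128 bytes instead of 16.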

13

[Figure: time in seconds (y-axis, 0 to 40) versus number of cores (x-axis, 1 to 8). Legend: Serial, Parallel, Regular (FS), Source Fixed.]

14

15

16

17

[Figure: time in seconds (y-axis, 0 to 40) versus number of cores (x-axis, 1 to 8). Legend: Serial, Parallel, Regular (FS), Source Fixed. Annotation: 7.5x.]

Linux Kernel [OSDI ’10], JVM [Dice, 2012], Software Transactional Memory [HPCA ’06]

18

Dynamic Detection and Mitigation of False Sharing

19

Modify access locations

Modify access frequency

Sheriff [OOPSLA ’11]

20

21

Isolated Page

Underlay Page

T1 T2
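
The isolated-page and underlay-page picture can be read as a rule table that redirects selected byte ranges and leaves everything else alone. The sketch below is only an illustration of that idea; the class, its methods, and the addresses are hypothetical and do not describe how the presented system implements remapping.

```python
from bisect import bisect_right


class ByteRemapper:
    """Hypothetical rule table: byte ranges redirected to isolated pages."""

    def __init__(self):
        self._starts = []  # sorted start addresses of remapped ranges
        self._rules = []   # parallel list of (start, end, isolated_base)

    def add_rule(self, start, length, isolated_base):
        """Redirect bytes [start, start + length) to isolated_base onward."""
        end = start + length
        i = bisect_right(self._starts, start)
        # Reject ranges that overlap an existing rule on either side.
        if i > 0 and self._rules[i - 1][1] > start:
            raise ValueError("range overlaps an existing rule")
        if i < len(self._rules) and self._rules[i][0] < end:
            raise ValueError("range overlaps an existing rule")
        self._starts.insert(i, start)
        self._rules.insert(i, (start, end, isolated_base))

    def translate(self, addr):
        """Return the isolated address for remapped bytes, else addr itself."""
        i = bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, end, base = self._rules[i]
            if start <= addr < end:
                return base + (addr - start)
        return addr  # not remapped: the access stays on the underlay page


if __name__ == "__main__":
    remap = ByteRemapper()
    remap.add_rule(0x308, 8, 0x5000)  # move T2's bytes to an isolated page
    for addr in (0x300, 0x308, 0x30C, 0x310):
        print(hex(addr), "->", hex(remap.translate(addr)))
```

With this rule in place, T1's accesses to 0x300 and T2's accesses to 0x308 no longer land in the same cache line, assuming 64-byte lines.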

22

Dynamic Detection and Mitigation of False Sharing

23

Persistent, high-frequency false sharing

24

Very Fast and Imprecise

Fast and Somewhat Precise

Slow and Precise

25

Does this signify false sharing?

Performance Counters

Log Page Reads

Instruction Emulation

Log Analysis

Does contention exist?

What pages are involved in the contention?

What are the byte ranges being accessed?

Rules for remapper
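
The log-analysis question ("does this signify false sharing?") can be illustrated with a small classifier over access records. This is a sketch under stated assumptions, not the analysis used by the presented system: the record format `(thread_id, address, size, is_write)` is hypothetical, and cache lines are assumed to be 64 bytes. A line counts as falsely shared when several threads touch it, at least one of them writes, and no byte written by one thread is touched by another.

```python
from collections import defaultdict

LINE = 64  # assumption: 64-byte cache lines


def classify_lines(log, line_size=LINE):
    """Map each contended cache line to 'false sharing' or 'true sharing'.

    log: iterable of (thread_id, address, size, is_write) records.
    Lines touched by one thread only, or never written, are omitted.
    """
    touched = defaultdict(lambda: defaultdict(set))  # line -> thread -> bytes
    written = defaultdict(lambda: defaultdict(set))  # line -> thread -> bytes
    for tid, addr, size, is_write in log:
        for byte in range(addr, addr + size):
            line = byte & ~(line_size - 1)
            touched[line][tid].add(byte)
            if is_write:
                written[line][tid].add(byte)

    verdicts = {}
    for line, by_thread in touched.items():
        writers = written[line]
        if len(by_thread) < 2 or not writers:
            continue  # private or read-only line: no coherence contention
        # True sharing: a byte written by one thread is touched by another.
        conflict = any(
            writers[w] & by_thread[t]
            for w in list(writers)
            for t in by_thread
            if t != w
        )
        verdicts[line] = "true sharing" if conflict else "false sharing"
    return verdicts


if __name__ == "__main__":
    log = [
        (1, 0x300, 8, True),   # T1 writes bytes 0x300..0x307
        (2, 0x308, 8, True),   # T2 writes bytes 0x308..0x30f, same line
        (1, 0x340, 8, True),   # T1 writes 0x340..0x347
        (2, 0x340, 8, False),  # T2 reads the same bytes
    ]
    for line, verdict in sorted(classify_lines(log).items()):
        print(hex(line), verdict)
```

The per-thread byte sets for a falsely shared line are the kind of information a remapper rule would need: which byte ranges to separate from each other.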

26

Dynamic Detection and Mitigation of False Sharing

27

Isolated Page

Underlay Page

T1 T2

28

Don’t be Evil / Harmful

29

Fault Driven Redirection

30

Original Code

Code Cache

?! It’s a Fault?!

31

Original Code

Code Cache

32

Avoid code trampolines

Catch all accesses via data path

Amortize page fault cost

33

“Know When You are Beaten”

34

Isolated Page

Underlay Page

T1 T2

35

Evaluation

36

[Figure: progress in million records (y-axis, 0 to 600) versus time in ms (x-axis, 0 to 6000). Series: Version with false sharing under Plastic; Source-fixed Version; Coherence Invalidations. Annotations: Remappings Established; 110 M/sec; 160 M/sec.]

37

[Figure: normalized performance (y-axis, 0 to 1) for CCBench, Phoenix, and Parsec. Series: Regular, w/ Plastic. Annotations: 1.4x, 3.6x, 5.4x.]

38

Low overhead runtime detection

Byte-granularity remapping

Speedup of up to 5.4x

39

Performance Optimizations

Security Enhancements

40