REMIX: Online Detection and Repair of Cache Contention for the JVM
Ariel Eizenberg, Joseph Devietti, Shiliang Hu, Gilles Pokam
What’s wrong with this code?
import java.util.concurrent.atomic.AtomicLong;

public class Test extends AtomicLong implements Runnable {
    public static void main(final String[] args) throws InterruptedException {
        Test test1 = new Test();
        Test test2 = new Test();
        Thread t1 = new Thread(test1);
        Thread t2 = new Thread(test2);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }

    // Each thread repeatedly writes its own AtomicLong.
    public void run() {
        for (int i = 0; i < 100000000; ++i) set(i);
    }
}
313ms
1 Summary Background REMIX Detection Repair Performance
What’s wrong with this code?
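import java.util.concurrent.atomic.AtomicLong;

public class Test extends AtomicLong implements Runnable {
    public static void main(final String[] args) throws InterruptedException {
        Test test1 = new Test();
        Thread t1 = new Thread(test1); // now allocated between test1 and test2
        Test test2 = new Test();
        Thread t2 = new Thread(test2);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }

    // Each thread repeatedly writes its own AtomicLong.
    public void run() {
        for (int i = 0; i < 100000000; ++i) set(i);
    }
}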
151ms
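What happened?
• Version A: 313ms
[Diagram: heap layout of test1, test2, t1, t2]
• Version B: 151ms
[Diagram: heap layout of test1, test2, t1, t2, with test1 and test2 further apart]
• The two versions differ only in allocation order: in Version A, test1 and test2 are allocated back to back and so likely share a cache line; in Version B, the Thread object t1 is allocated between them
• More surprises!
[Diagram: heap layout of test1, test2, t1, t2, which the GC, class loader, … can change]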
Managed Languages
• Runtime has increased control over execution → opportunities for dynamic optimization
• Less programmer control over memory → more vulnerable to performance bugs
REMIX
• First system to automatically detect and repair cache contention in managed languages
• Uses hardware performance counters to detect cache contention
• Automatically repairs cache contention bugs, significantly simplifying the programmer's job
• Implemented as a GC-like pass within the HotSpot JVM
Outline
Cache Coherency
The REMIX system
Evaluation
Cache Coherency
• Multicore processors implement a cache coherence protocol to keep private caches in sync
• Operates on whole cache lines (usually 64 bytes)
• Cache lines have three key states:
• Read Shared
• Write Exclusive
• Invalid
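The three states above can be illustrated with a minimal Java sketch. It assumes a deliberately simplified protocol: a write invalidates every other core's copy, and a read downgrades a remote Write Exclusive copy to Read Shared. The class and method names (CoherenceSketch, remoteModifiedReads) are hypothetical and do not correspond to any hardware or JVM interface.

```java
import java.util.Arrays;

/** Toy model of one cache line's state in each core's private cache (simplified, assumed protocol). */
final class CoherenceSketch {
    enum State { INVALID, READ_SHARED, WRITE_EXCLUSIVE }

    private final State[] perCore;
    private int remoteModifiedReads; // reads that found the line Write Exclusive on another core

    CoherenceSketch(int cores) {
        perCore = new State[cores];
        Arrays.fill(perCore, State.INVALID);
    }

    /** A load by the given core. */
    void read(int core) {
        if (perCore[core] != State.INVALID) {
            return; // local hit: the core already holds a valid copy
        }
        for (int c = 0; c < perCore.length; c++) {
            if (perCore[c] == State.WRITE_EXCLUSIVE) {
                perCore[c] = State.READ_SHARED; // the owner downgrades and supplies the data
                remoteModifiedReads++;
            }
        }
        perCore[core] = State.READ_SHARED;
    }

    /** A store by the given core: every other copy is invalidated. */
    void write(int core) {
        for (int c = 0; c < perCore.length; c++) {
            if (c != core) {
                perCore[c] = State.INVALID;
            }
        }
        perCore[core] = State.WRITE_EXCLUSIVE;
    }

    State state(int core) { return perCore[core]; }

    int remoteModifiedReads() { return remoteModifiedReads; }

    public static void main(String[] args) {
        CoherenceSketch line = new CoherenceSketch(2);
        line.write(0); // core 0 owns the line
        line.read(1);  // core 1 misses and pulls the line from core 0
        System.out.println(line.state(0) + " " + line.state(1) + " " + line.remoteModifiedReads());
    }
}
```

When two cores alternate writes and reads on the same line, every read in this model takes the slow remote path, which is the cost that cache contention imposes.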
True vs False sharing
• True Sharing: Core 0 and Core 1 access the same bytes (X) of a cache line
• False Sharing: Core 0 and Core 1 access different bytes (X and Y) of the same cache line
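The distinction between true and false sharing comes down to a byte-range check, sketched below in Java. The sketch assumes 64-byte lines, non-negative addresses, accesses that do not straddle a line boundary, and two accesses from different cores of which at least one is a write. SharingSketch and classify are illustrative names only.

```java
/** Classifies a pair of memory accesses by the bytes they touch (illustrative sketch). */
final class SharingSketch {
    static final int LINE_SIZE = 64; // assumed cache line size in bytes

    enum Kind { NONE, TRUE_SHARING, FALSE_SHARING }

    /** Each access is given as a start address and a length in bytes. */
    static Kind classify(long addrA, int lenA, long addrB, int lenB) {
        // Same bytes: the two byte ranges overlap.
        boolean overlap = addrA < addrB + lenB && addrB < addrA + lenA;
        if (overlap) {
            return Kind.TRUE_SHARING;
        }
        // Different bytes, but both fall in the same cache line.
        boolean sameLine = addrA / LINE_SIZE == addrB / LINE_SIZE;
        return sameLine ? Kind.FALSE_SHARING : Kind.NONE;
    }

    public static void main(String[] args) {
        System.out.println(classify(128, 8, 128, 8)); // same 8 bytes
        System.out.println(classify(128, 8, 152, 8)); // different bytes, same line
        System.out.println(classify(128, 8, 192, 8)); // different lines
    }
}
```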
Intel PEBS Events
• PEBS – Precise Event-Based Sampling
• Available in recent Intel multiprocessors
• Log detailed information about architectural events
PEBS HitM Events
• “Hit-Modified” - A cache miss due to a cache line in Modified state on a different core
[Diagram: Core 1 issues Load X. X is Invalid (X:I) in Core 1's L1$, so the load misses there and hits the Modified copy (X:M) in Core 0's L1$]
Related Work
• Detect & repair, unmanaged languages:
  • Sheriff [Liu and Berger, OOPSLA 2011]
  • Plastic [Nanavati et al., EuroSys 2013]
  • Laser [Luo et al., HPCA 2016]
• Detection only:
  • Cheetah [Liu and Liu, CGO 2016]
  • Predator [Liu et al., PPoPP 2014]
  • Intel vTune Amplifier XE
• Manual repair:
  • Oracle Java8 @Contended
REMIX
• A modified version of the Oracle HotSpot JVM
• Detects & repairs cache contention bugs at runtime
• Works with all JVM languages, no source code modification required
• Provides performance matching hand-optimization
REMIX System Overview
[Layer diagram, top to bottom]
• Application: Java, Scala, etc.
• Runtime system: JVM, containing REMIX
• OS: Linux Kernel, exposing the perf API
• Hardware: CPU w/ HITM PEBS, producing HITM events
Detection
[Flow diagram]
• HITM events → classify
  • native heap & stack → complete
  • managed heap → track
• track → >threshold?
  • no → complete
  • yes → model $ lines → map to objects → classify
• classify
  • false sharing → repair
  • true sharing → report
Cache Line Modelling
• Cache lines are modelled with 64-bit bitmaps
• Each HITM event sets the bit for its address and increments the hit count
• Multiple bits set → potential false sharing
• Repair is cheaper than more complete modelling!
• Repair when counter exceeds threshold
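The bitmap model above can be sketched in a few lines of Java. This is a sketch under stated assumptions, not REMIX's actual code: the class name CacheLineModel, its methods, and the threshold values passed to shouldRepair are hypothetical. One bit stands for one byte of a 64-byte line.

```java
/** Models one cache line as a 64-bit bitmap plus a HITM counter (illustrative sketch). */
final class CacheLineModel {
    static final int LINE_SIZE = 64; // one bit per byte of the line

    private long touched; // bit i set: a HITM event was sampled at byte offset i
    private int hits;     // number of HITM events seen on this line

    /** Records a HITM event at the given data address. */
    void recordHitm(long address) {
        int offset = (int) (address & (LINE_SIZE - 1)); // byte offset within the line
        touched |= 1L << offset;
        hits++;
    }

    int hits() { return hits; }

    /** More than one distinct offset was hit, so different bytes may be contended. */
    boolean potentialFalseSharing() { return Long.bitCount(touched) > 1; }

    /** Assumed policy: repair once the counter exceeds the threshold and several bits are set. */
    boolean shouldRepair(int threshold) { return hits > threshold && potentialFalseSharing(); }

    public static void main(String[] args) {
        CacheLineModel line = new CacheLineModel();
        line.recordHitm(0x1000); // offset 0
        line.recordHitm(0x1008); // offset 8
        line.recordHitm(0x1000); // offset 0 again
        System.out.println(line.potentialFalseSharing() + " " + line.shouldRepair(2));
    }
}
```

The model is intentionally coarse: it keeps no per-thread or read/write information, in keeping with the point that repair is cheaper than more complete modelling.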
Top Level Flow
[Diagram, built up in four steps: Original Heap and Original Classes alongside New Heap and Modified Classes, with classes Foo and Bar]
Padding
• Instance data
• Instance size
• Field list
• OOPMap (reference block map)
• Constant-pool cache
Padding - Inheritance
class A: header (16b), byte a1, byte a2
class B extends A: header (16b), byte a1, byte a2, byte b
class C extends A: header (16b), byte a1, byte a2, byte c
[Diagram, built up in three steps: pad 48b and pad 62b are added to each of the three layouts in turn]
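One reading of the 48b and 62b figures, offered as an assumption rather than as REMIX's layout code: if "16b" means a 16-byte header, the object starts on a 64-byte line boundary, and the padded fields a1 and a2 are to sit alone on their own line, then the pads are simply the distances to the next line boundary. The Java sketch below works through that arithmetic; PaddingSketch and padToLine are hypothetical names.

```java
/** Works through the padding arithmetic for a 64-byte cache line (illustrative sketch). */
final class PaddingSketch {
    static final int LINE_SIZE = 64;

    /** Bytes needed to advance an offset to the next line boundary (0 if already aligned). */
    static int padToLine(int offset) {
        return (LINE_SIZE - offset % LINE_SIZE) % LINE_SIZE;
    }

    public static void main(String[] args) {
        int headerBytes = 16; // "header (16b)", read here as 16 bytes (assumption)
        int fieldBytes = 2;   // byte a1 + byte a2

        int padBefore = padToLine(headerBytes);                 // pad between header and fields
        int fieldsStart = headerBytes + padBefore;              // fields begin on a fresh line
        int padAfter = padToLine(fieldsStart + fieldBytes);     // pad after the fields
        int subclassStart = fieldsStart + fieldBytes + padAfter; // first offset left for a subclass field

        System.out.println(padBefore + " " + padAfter + " " + subclassStart);
    }
}
```

Under these assumptions a subclass field such as b or c can only start after the superclass padding, which is why padding A changes the layout of B and C as well.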
Repair
• Trace all strong+weak roots in the system
• Traverse heap and find targeted instances
  • Live: relocate & pad, store forwarding pointer
  • Dead: fix size mismatch
• Adjust all pointers to forwarded objects
• Deoptimize all relevant stack frames
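The relocate-and-forward part of these steps can be sketched on a toy object graph in Java. This is a simplified illustration, not the HotSpot implementation: Obj, repair, and the use of a class name to select targets are hypothetical, and the sketch leaves out weak roots, the size fix for dead objects, and deoptimization.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Relocates targeted objects to padded copies and fixes up pointers (illustrative sketch). */
final class RelocationSketch {
    /** Toy heap object: a class name, a size in bytes, and outgoing references. */
    static final class Obj {
        final String klass;
        final int size;
        final List<Obj> refs = new ArrayList<>();
        Obj forwardedTo; // forwarding pointer, set once the object has been relocated

        Obj(String klass, int size) {
            this.klass = klass;
            this.size = size;
        }
    }

    /** Follows the forwarding pointer if there is one. */
    private static Obj resolve(Obj o) {
        return o.forwardedTo != null ? o.forwardedTo : o;
    }

    /** Returns the adjusted roots after relocating every live instance of targetKlass. */
    static List<Obj> repair(List<Obj> roots, String targetKlass, int paddedSize) {
        // Pass 1: trace from the roots, relocating and padding targeted instances.
        Set<Obj> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<Obj> live = new ArrayList<>();
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (!seen.add(o)) {
                continue; // already visited
            }
            live.add(o);
            if (o.klass.equals(targetKlass)) {
                Obj copy = new Obj(o.klass, paddedSize);
                copy.refs.addAll(o.refs);
                o.forwardedTo = copy; // store the forwarding pointer in the old object
            }
            work.addAll(o.refs);
        }
        // Pass 2: adjust all pointers so they refer to the forwarded objects.
        for (Obj o : live) {
            resolve(o).refs.replaceAll(RelocationSketch::resolve);
        }
        List<Obj> newRoots = new ArrayList<>(roots);
        newRoots.replaceAll(RelocationSketch::resolve);
        return newRoots;
    }

    public static void main(String[] args) {
        Obj root = new Obj("Root", 16);
        Obj seq = new Obj("Sequence", 24);
        root.refs.add(seq);
        repair(Arrays.asList(root), "Sequence", 128);
        System.out.println(root.refs.get(0).size); // size of the padded copy
    }
}
```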
Disruptor & Spring Reactor
• Both are libraries for high-speed inter-thread message passing
class Sequence {
    protected volatile long value;
}
[Diagram: object layout containing value]
Disruptor & Spring Reactor
• Both are libraries for high-speed inter-thread message passing
class Sequence {
    protected long a, b, c, d, e, f, g; // pad before (56 bytes)
    protected volatile long value;
    protected long i, j, k, l, m, n, o; // pad after (56 bytes)
}
• Is this enough?
• No! The class loader optimizes aggressively and lays out value before a.
[Diagram: resulting layout: value, a-g, i-o]
Disruptor & Spring Reactor
• A complex hierarchy forces field order
class LhsPadding {
    protected long a, b, c, d, e, f, g;
}
class Value extends LhsPadding {
    protected volatile long value;
}
class RhsPadding extends Value {
    protected long i, j, k, l, m, n, o;
}
class Sequence extends RhsPadding {
    // actual work
}
Disruptor + Spring Reactor
[Bar chart: speedup (higher is better), y-axis 0 to 4; series: REMIX, Manual Pad]
• REMIX performs as well as manual fixes in most cases
• REMIX repair threshold is adjustable
• REMIX outperforms when manual fix adds unnecessary padding
NAS Parallel Benchmarks
• REMIX reveals true sharing when running the NAS suite
  • HITM events come from true sharing on JIT counters
• JIT-ing fixes the contention
• NAS workloads exceed the JIT size limit, so they are not JIT-ed
• REMIX detects this and forces compilation
[Bar chart: speedup (higher is better), y-axis 0 to 30; benchmarks: BT, CG, FT, IS, LU, MG, SP; labelled values: 7.7, 9.0, 25.3]
No Contention == No Impact
[Two bar charts: speedup (higher is better), y-axis 0.9 to 1.1]
• (DaCapo & ScalaBench): apparat, bloat, chart, antlr, avrora, factorie, h2, hsqldb, jython, lusearch, pmd, scalaxb, actors, scalac, scaladoc, scalap, scalariform, scalatest, specs, sunflow, tmt, tomcat, tradebeans, tradesoap, xalan
• (SpecJVM): compress, crypto.aes, crypto.rsa, crypto.sign…, derby, fft.large, fft.small, lu.large, lu.small, monte-carlo, mpegaudio, serial, sor.large, sor.small, sparse.large, sparse.small, sunflow, xml.transfo…, xml.validati…
Performance
[Bar chart: time (ms), y-axis 0 to 800; series: Collector, Detector, Repair]
Conclusions
• Cache contention afflicts managed languages
• Dynamic information opens up new avenues for optimization
• Performance counters are a gold mine
• REMIX can simplify programmers’ lives by automatically fixing false sharing bugs
• https://github.com/upenn-acg/REMIX (GPLv2)
PEBS HITM Event Accuracy
[Bar chart: data address accuracy; categories: W-W Contention, R-W Contention; values shown: 7.40%, 98.60%]
sun.misc.Unsafe
• Tracks unsafe access, prevents padding
• REMIX Extended Unsafe:
  • Just as fast, enables padding
Time-to-fix
[Bar chart: latency (% runtime), y-axis 0% to 20%]
Performance
Benchmark(s)              Classes Loaded   Processing (ms)   Heap Size (MB)
DaCapo Sunflow            1879             1.47              854
Disruptor                 700-900          0.5-1.3           40-85
SpringReactor WorkQueue   1617             0.748             41
SpecJVM serial            1498             1.21              1603
REMIX Speedup vs HITM rate
[Scatter plot: x-axis speedup (75% to 200%), y-axis HITM records/sec (0 to 3000); series: Dacapo 9.12, Dacapo2006, Disruptor, ScalaBench, SPECjvm2008]
Size-Preserving-Klass
• Instances are stored contiguously in memory
• Object sizes not stored directly in the heap
  • Calculated through the klass pointer in the object header
• Padding breaks this assumption
  • Object left behind at old location has wrong size!
• Solution: create a ghost klass with the original size
  • Modify dead instances to point to this ghost
• Take care not to traverse padded objects in heap until klass size is updated
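The role of the ghost klass can be shown with a toy heap walk in Java. This is a sketch under simplifying assumptions: the heap is an array of words in which only the first word of each object holds a klass pointer, and sizes are counted in words. Klass, countObjects, and the "Foo(ghost)" name are hypothetical.

```java
/** Shows why dead instances need a klass that keeps the original size (illustrative sketch). */
final class GhostKlassSketch {
    /** Toy klass: the instance size lives here, not in the object itself. */
    static final class Klass {
        final String name;
        int instanceWords;

        Klass(String name, int instanceWords) {
            this.name = name;
            this.instanceWords = instanceWords;
        }
    }

    /** Walks a contiguous heap; heap[p] is the klass pointer of the object starting at word p. */
    static int countObjects(Klass[] heap, int top) {
        int count = 0;
        int p = 0;
        while (p < top) {
            p += heap[p].instanceWords; // the size comes from the klass, so it must match the memory
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Klass foo = new Klass("Foo", 2);
        Klass[] heap = new Klass[16];
        heap[0] = foo;
        heap[2] = foo;
        heap[4] = foo; // three 2-word instances

        // Point the old (now dead) copies at a ghost klass that keeps the original size...
        Klass ghost = new Klass("Foo(ghost)", foo.instanceWords);
        heap[0] = ghost;
        heap[2] = ghost;
        heap[4] = ghost;
        // ...and only then grow the real klass and place a padded, relocated instance.
        foo.instanceWords = 8;
        heap[6] = foo;

        System.out.println(countObjects(heap, 14)); // three dead 2-word objects + one padded object
    }
}
```

Without the ghost, the walk would read the new 8-word size for the old 2-word objects and step into the middle of another object.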
Padding - Inheritance
class A: header (16b), byte a
class B extends A: header (16b), byte a, byte b
class C extends A: header (16b), byte a, byte c
class D extends C: header (16b), byte a, byte c, byte d
class E extends C: header (16b), byte a, byte c, byte e
[Diagram, built up in several steps: pad blocks are added across the class layouts]
Savina Actors Benchmark
• Evaluates actor libraries across 30 benchmarks
• REMIX detects false and true sharing bugs in libraries & benchmarks:
[Heat map of HITM rate per library and benchmark; legend: 0 HITM/s, 100 HITM/s, >10K HITM/s]
• Libraries: AkkaActor, FuncJava, GparsActor, HabaneroActor, HabaneroSelector, LiftActor, ScalaActor, Scalaz
• Benchmarks: apsp, astar, banking, barber, big, bitonicsort, bndbuffer, chameneos, cigsmok, concdict, concsll, count, facloc, fib, filterbank, fjcreate, fjthrput, logmap, nqueenk, philosopher, pingpong, piprecision, quicksort, radixsort, recmatmul, sieve, sor, threadring, trapezoid, uct