Octet: Capturing and Controlling Cross-Thread Dependences Efficiently
Michael Bond, Milind Kulkarni, Man Cao, Minjia Zhang, Meisam Fathi Salmi, Swarnendu Biswas, Aritra Sengupta, Jipeng Huang
Ohio State
Purdue
• Help express parallelism better
• Eliminate concurrency errors
• Diagnose production bugs
• Deal with nondeterminism
Need practical runtime support
• Atomicity checking
• Data race detection
• Record & replay
• Transactional memory
• DRF/SC enforcement
• Deterministic execution
Track dependences Control dependences
o.f = …
… = o.f
Commodity (software-only) approaches slow programs by several times
o.f = …
check
… = o.f
check
Any access could race, so synchronization is added at every access
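To make that cost concrete, here is a minimal Java sketch of pessimistic tracking, in which every instrumented access takes a per-object lock so that the check and the access happen atomically. The names `TrackedObject`, `lastWriter`, `syncOps`, `crossThreadDeps`, and `PessimisticDemo` are hypothetical and do not come from the talk; this illustrates the general idea and is not the implementation that was measured.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical tracked object: one field `f` plus analysis metadata.
class TrackedObject {
    private final ReentrantLock metaLock = new ReentrantLock();
    private int f;
    private Thread lastWriter;   // thread that last wrote f, or null
    long syncOps;                // lock acquisitions performed so far
    long crossThreadDeps;        // reads that saw another thread's write

    // Instrumented "o.f = v": the check and the store share one critical section.
    void write(int v) {
        metaLock.lock();
        try {
            syncOps++;
            lastWriter = Thread.currentThread();
            f = v;
        } finally {
            metaLock.unlock();
        }
    }

    // Instrumented "... = o.f": the check and the load share one critical section.
    int read() {
        metaLock.lock();
        try {
            syncOps++;
            if (lastWriter != null && lastWriter != Thread.currentThread()) {
                crossThreadDeps++;
            }
            return f;
        } finally {
            metaLock.unlock();
        }
    }
}

class PessimisticDemo {
    public static void main(String[] args) {
        TrackedObject o = new TrackedObject();
        for (int i = 0; i < 1000; i++) {
            o.write(i);
            o.read();
        }
        // A single thread produced no cross-thread dependences,
        // yet every access still paid for a lock acquire and release.
        System.out.println(o.syncOps + " synchronized accesses, "
                + o.crossThreadDeps + " cross-thread dependences");
    }
}
```

In the demo, a single thread performs 2,000 accesses and all 2,000 acquire the lock, even though no cross-thread dependence ever occurs.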
Octet
Framework for runtime support
• Happens-before (HB) edges for all dependences
• Atomicity of analysis & access
Concurrency control mechanism
• Synchronization only at cross-thread dependences
• Qualitative performance improvement
Proofs!
[Figure: timeline of threads T1, T2, T3, and T4 accessing o.f. Accesses shown: one wr o.f and three rd o.f, paired with one write check and three read checks; one thread reaches a safe point. Label: o’s state = RdSh_c]
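The timeline above can be approximated by a small, single-threaded simulation of per-object states. This is a sketch under stated assumptions: only the read-shared state appears on the slide, so the exclusive states (`WR_EX`, `RD_EX`), the transition rules, the integer thread ids, and the class name `LocalityState` are hypothetical. The cross-thread coordination that a real slow path would perform (for example, waiting for the owner to reach a safe point) and the counter in RdSh_c are deliberately left out; `slowPaths` only counts how often a check cannot take the fast path.

```java
// Simplified, single-threaded model of per-object ownership states.
class LocalityState {
    enum Kind { WR_EX, RD_EX, RD_SH }

    Kind kind;
    int owner;          // owning thread id for WR_EX / RD_EX; ignored for RD_SH
    int slowPaths;      // number of checks that had to change the state

    LocalityState(int allocatingThread) {
        kind = Kind.WR_EX;
        owner = allocatingThread;
    }

    // Check run by thread t before "... = o.f".
    void readCheck(int t) {
        if (kind == Kind.RD_SH) return;   // fast path: object is read-shared
        if (owner == t) return;           // fast path: t already owns the object
        slowPaths++;                      // slow path: another thread owns it
        if (kind == Kind.RD_EX) {
            kind = Kind.RD_SH;            // second reader: become read-shared
        } else {
            kind = Kind.RD_EX;            // reader takes over from the writer
            owner = t;
        }
    }

    // Check run by thread t before "o.f = ...".
    void writeCheck(int t) {
        if (kind == Kind.WR_EX && owner == t) return;   // fast path
        slowPaths++;                                    // slow path
        kind = Kind.WR_EX;
        owner = t;
    }
}

class LocalityDemo {
    public static void main(String[] args) {
        LocalityState o = new LocalityState(1);  // assume T1 allocated o
        o.writeCheck(1);   // T1: wr o.f, fast path
        o.readCheck(2);    // T2: rd o.f, slow path (writer-owned to reader-owned)
        o.readCheck(3);    // T3: rd o.f, slow path (reader-owned to read-shared)
        o.readCheck(4);    // T4: rd o.f, fast path (already read-shared)
        System.out.println(o.kind + " after " + o.slowPaths + " slow paths");
    }
}
```

Under this reading of the timeline, only two of the four checks leave the fast path, and the object ends in the read-shared state, which matches the state label on the slide.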
Sharing detection: [von Praun & Gross ’01] (comparison in our paper)
Distributed shared memory: Shasta [Scales et al. ’96]
Biased locking: [Kawachiya et al. ’02], [Russell & Detlefs ’06], [Hindman & Grossman ’06]
Practical runtime support
Track dependences Control dependences
Framework for runtime support
Concurrency control mechanism
Octet
Dependence recorder records happens-before edges
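As a rough illustration of what such a recorder might look like, the sketch below logs one happens-before edge each time an object changes hands between threads and logs nothing for same-thread accesses. It is a reduced model with assumptions throughout: it tracks only a single exclusive owner per object, uses integer thread ids, and the names `DependenceRecorder`, `RecordedObject`, `accessCheck`, and `onOwnershipChange` are hypothetical and not taken from the talk or from Jikes RVM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recorder: one logged edge per change of owning thread.
class DependenceRecorder {
    final List<String> edges = new ArrayList<>();

    // Called only when an access finds the object owned by another thread.
    void onOwnershipChange(int oldOwner, int newOwner) {
        edges.add("T" + oldOwner + " -> T" + newOwner);
    }
}

// Hypothetical object with a single owning thread (exclusive ownership only).
class RecordedObject {
    private int owner;
    private final DependenceRecorder recorder;

    RecordedObject(int allocatingThread, DependenceRecorder recorder) {
        this.owner = allocatingThread;
        this.recorder = recorder;
    }

    // Check run by thread t before any access to the object.
    void accessCheck(int t) {
        if (owner == t) {
            return;                            // same thread: nothing to record
        }
        recorder.onOwnershipChange(owner, t);  // cross-thread: record the edge
        owner = t;
    }
}

class RecorderDemo {
    public static void main(String[] args) {
        DependenceRecorder rec = new DependenceRecorder();
        RecordedObject o = new RecordedObject(1, rec);
        o.accessCheck(1);   // T1 again: no edge
        o.accessCheck(2);   // edge T1 -> T2
        o.accessCheck(2);   // T2 again: no edge
        o.accessCheck(1);   // edge T2 -> T1
        System.out.println(rec.edges);
    }
}
```

The design point the sketch tries to capture is that recording work is attached to ownership changes, so accesses that stay within one thread add nothing to the log.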
Implementation in Jikes RVM
Publicly available: http://jikesrvm.org/Research+Archive
Parallel programs: DaCapo Benchmarks 2006 & 2009; SPEC JBB 2000 & 2005
Parallel platform: 32 cores (AMD Opteron 6272)
[Figure: bar chart of overhead (%) per benchmark. Benchmarks: eclipse6, hsqldb6, lusea..., xalan6, avrora9, jython9, luindex9, lusea..., pmd9, sunflow9, xalan9, jbb2000, jbb2005, geo. Series: Pessimi... Y-axis: 0 to 1000. Two off-scale values are labeled 34,600% and 3,000%.]
[Figure: bar chart of overhead (%) per benchmark. Benchmarks: eclipse6, hsqldb6, lusea..., xalan6, avrora9, jython9, luindex9, lusea..., pmd9, sunflow9, xalan9, jbb2000, jbb2005, geo. Series: Octet w/o coord. Y-axis: 0 to 120.]
[Figure: bar chart of overhead (%) per benchmark. Benchmarks: eclipse6, hsqldb6, lusea..., xalan6, avrora9, jython9, luindex9, lusea..., pmd9, sunflow9, xalan9, jbb2000, jbb2005, geo. Series: Octet, Octet w/o coord. Y-axis: 0 to 120.]
[Figure: bar chart of overhead (%) per benchmark. Benchmarks: eclipse6, hsqldb6, lusea..., xalan6, avrora9, jython9, luindex9, lusea..., pmd9, sunflow9, xalan9, jbb2000, jbb2005, geo. Series: Recorder, Octet, Octet w/o coord. Y-axis: 0 to 120.]