Janus: exploiting parallelism via hindsight

John Field, Omer Tripp, Roman Manevich, Mooly Sagiv

the PMD code analyzer

popular open-source code analyzer (~2,000 weekly downloads)

scans source files modularly => parallelization opportunity

the main loop of PMD (simplified)

analyze files in parallel

but.. (shared as local)

hmm.. can’t we “localize” ctx?

not so fast.. (happens sometimes, deep down the stack)

no cloning => reuse => @atomic ≈ @sequential

to summarize.. available parallelism missed by write-set approach

Janus: sequence-based detection /* offline */

T1: ctx.sourceCodeFilename = …; … String fname = ctx.sourceCodeFilename; …

T2: ctx.sourceCodeFilename = …; … String fname = ctx.sourceCodeFilename; …

Janus: sequence-based detection /* offline */

T1: ctx.sourceCodeFilename = …; … ctx.setAttribute(s, …); …

T2: ctx.sourceCodeFilename = …; … … = ctx.getAttribute(s); …

in paper:
• more real-world examples
• more coding patterns encouraging refined conflict detection

running example

Reverse-Delete MST

    G’ = G
    (e1 … en) = SortEdgesByDecreasingWeight()
    for i = 1 … n
        G’.RemoveEdge(ei)
        if (!G’.HasPath(ei.u, ei.v))
            G’.AddEdge(ei)
    return G’

slide annotations:
• ordered
• heavy
• identity if edge restored (available parallelism)
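For reference, the pseudocode above can be written out as a small sequential Java program. This is only a sketch under stated assumptions: the class name `ReverseDeleteMst`, the `Edge` type, integer node ids, and the breadth-first `hasPath` are all illustrative choices that do not come from the talk, and the graph is assumed to be simple (no parallel edges).

```java
import java.util.*;

// Sequential Reverse-Delete MST on a simple undirected graph.
// All names are illustrative; node ids are plain ints.
class ReverseDeleteMst {
    static final class Edge {
        final int u, v, weight;
        Edge(int u, int v, int weight) { this.u = u; this.v = v; this.weight = weight; }
    }

    // Adjacency sets: node -> neighbors.
    private final Map<Integer, Set<Integer>> adj = new HashMap<>();

    void addEdge(int u, int v) {
        adj.computeIfAbsent(u, k -> new HashSet<>()).add(v);
        adj.computeIfAbsent(v, k -> new HashSet<>()).add(u);
    }

    void removeEdge(int u, int v) {
        if (adj.containsKey(u)) adj.get(u).remove(v);
        if (adj.containsKey(v)) adj.get(v).remove(u);
    }

    // Breadth-first search; this is the expensive step of each iteration.
    boolean hasPath(int from, int to) {
        Deque<Integer> work = new ArrayDeque<>();
        Set<Integer> seen = new HashSet<>();
        work.add(from);
        seen.add(from);
        while (!work.isEmpty()) {
            int n = work.poll();
            if (n == to) return true;
            for (int m : adj.getOrDefault(n, Collections.emptySet())) {
                if (seen.add(m)) work.add(m);
            }
        }
        return false;
    }

    // Returns the edges that survive, i.e. the spanning tree (or forest).
    static List<Edge> run(List<Edge> edges) {
        ReverseDeleteMst g = new ReverseDeleteMst();
        for (Edge e : edges) g.addEdge(e.u, e.v);
        List<Edge> sorted = new ArrayList<>(edges);
        sorted.sort(Comparator.comparingInt((Edge e) -> e.weight).reversed());
        List<Edge> kept = new ArrayList<>();
        for (Edge e : sorted) {
            g.removeEdge(e.u, e.v);
            if (!g.hasPath(e.u, e.v)) {
                g.addEdge(e.u, e.v); // removal disconnected the endpoints: restore
                kept.add(e);
            }
        }
        return kept;
    }
}
```

Note that an iteration which restores its edge leaves the graph exactly as it found it, which is the "identity if edge restored" observation on the slide.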

take I: blind parallelism

Reverse-Delete MST

parallel for // broken but fast

take II: write-set speculation /* with boosting */

Reverse-Delete MST

    G’ = G
    (e1 … en) = SortEdgesByDecreasingWeight()
    for i = 1 … n
        @atomic // exploit graph semantics: node ≈ location

can we exploit the available parallelism with this approach?

the write-set approach in action

G: graph with nodes u, v1, v2 (figure)

T1: RemoveEdge(u,v1); b = HasPath(u,v1); test b; AddEdge(u,v1)

T2: RemoveEdge(u,v2)
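To make the "node ≈ location" reading concrete, here is a minimal Java sketch of write-set conflict detection over abstract locations. It is an illustration under assumptions, not the detector used in the talk: the names `WriteSetDetector` and `Tx` are invented, and recording a path query as a read of only its two endpoints is a deliberate simplification.

```java
import java.util.*;

// Write-set conflict detection where each graph node is treated as one
// abstract location ("node ≈ location"). All names here are illustrative.
class WriteSetDetector {
    static final class Tx {
        final Set<String> reads = new HashSet<>();
        final Set<String> writes = new HashSet<>();

        // An edge update is recorded as a write to both endpoint nodes.
        void removeEdge(String a, String b) { writes.add(a); writes.add(b); }
        void addEdge(String a, String b)    { writes.add(a); writes.add(b); }

        // Simplification: a path query is recorded as reading its endpoints only.
        void hasPath(String a, String b)    { reads.add(a); reads.add(b); }
    }

    // Two transactions conflict if one writes a location the other reads or writes.
    static boolean conflict(Tx t1, Tx t2) {
        return intersects(t1.writes, t2.writes)
            || intersects(t1.writes, t2.reads)
            || intersects(t1.reads, t2.writes);
    }

    private static boolean intersects(Set<String> a, Set<String> b) {
        return !Collections.disjoint(a, b);
    }
}
```

With T1 and T2 as on the slide, both transactions write location u, so this check reports a conflict even when T1 ends up restoring its edge.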

take III: online commutativity checking /* with boosting */

Reverse-Delete MST

@atomic // full commutativity checking

precise checking in action

G: graph with nodes u, v1, v2 (figure)

T1: RemoveEdge(u,v1); HasPath(u,v1); AddEdge(u,v1)

T1 ∙ T2 ≡ T2 ∙ T1

but.. can we really afford this?

expensive instrumentation + expensive conflict detection

poor performance

take III: Janus /* with boosting */

Reverse-Delete MST

@atomic // accurate checking utilizing offline data!

the Janus approach

G: graph with nodes u, v1, v2; transactions T1, T2 (figure)

sequence-based detection
=> accurate

candidate sequences gleaned ahead of production runs, in profiling
commutativity testing done offline
=> cheap

the Janus flow

offline
• find candidate sequences via profiling
  * ST runs
  * single locations
  (low instrumentation overhead)
• test sequences for commutativity
• cache (generalized version) of commutative sequences
  (cheap conflict detection)

online
• consult sequence cache when attempting to commit a transaction
• miss: perform write-set detection (soundness)


user input

the Janus spec: data

graph (figure): n1:v1, n2:v2, n3:v2, n4:v1

node  data
n1    v1
…     …

neighbor  node
n2        n1
n1        n2
n3        n1
…         …

the Janus spec: operations

before (figure): n1:v1, n2:v2

node  data
n1    v1
n2    v2

public void addnode(n) { nodes.add(n); }

insert <n.d,n> into “nodes”

after (figure): n1:v1, n2:v2, n.d:n

node  data
n1    v1
n2    v2
n     n.d





sequence mining

G: graph with nodes u, v1, v2 (figure)

    G’ = G
    (e1 … en) = SortEdgesByDecreasingWeight()

T1: RemoveEdge(u,v1) …
mined so far: RemoveEdge/T1; HasPath/T1
then: RemoveEdge/T1; HasPath/T1; AddEdge/T1

T2: RemoveEdge(u,v2) …
mined so far: RemoveEdge/T2

    return G’
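The mining step can be pictured as a per-location operation log filled in during a profiling run. The Java sketch below is a guess at the bookkeeping involved, not the Janus implementation: `SequenceMiner`, `record`, and `sequencesAt` are hypothetical names, and locations and transaction ids are plain strings.

```java
import java.util.*;

// Groups profiled operations by abstract location and by transaction,
// preserving program order. All names are illustrative.
class SequenceMiner {
    // location -> transaction id -> operations in the order they ran
    private final Map<String, Map<String, List<String>>> byLocation = new LinkedHashMap<>();

    // Called by the (assumed) instrumentation for each operation on a location.
    void record(String location, String tx, String op) {
        byLocation.computeIfAbsent(location, k -> new LinkedHashMap<>())
                  .computeIfAbsent(tx, k -> new ArrayList<>())
                  .add(op + "/" + tx);
    }

    // Candidate sequences induced by one location, keyed by transaction.
    Map<String, List<String>> sequencesAt(String location) {
        return byLocation.getOrDefault(location, Collections.emptyMap());
    }
}
```

Recording the slide's operations at location u would yield one sequence per transaction, in the same `Op/Tx` notation the slides use; pairs of such sequences from different transactions are the candidates handed to the commutativity test.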





commutativity checking

concrete sequences:
(RemoveEdge(u,v1); HasPath(u,v1); AddEdge(u,v1)) ∙ (RemoveEdge(u,v2); …)
(RemoveEdge(u,v2); …) ∙ (RemoveEdge(u,v1); …)

symbolic sequences:
(RemoveEdge(x,y); HasPath(x,y); AddEdge(x,y)) ∙ (RemoveEdge(x,z); HasPath(x,z); AddEdge(x,z))

φ0 = encode(G)
φ1 = φ0 ∧ {x,y} ∈ “edges”
…
φk = φk-1 ∧ {x,y} ∈ “edges”
…
φk+1 = φk ∧ {x,y} ∈ “edges”
…
φm = φm-1 ∧ {x,y} ∈ “edges”

ψ0 = encode(G)
ψ1 = ψ0 ∧ {x,z} ∈ “edges”
…
ψk = ψk-1 ∧ {x,z} ∈ “edges”
…
ψk+1 = ψk ∧ {x,y} ∈ “edges”
…
ψm = ψm-1 ∧ {x,y} ∈ “edges”

test: ¬(ψ ≡ φ)

in paper:
• encoding rules
• “projection”: sound commutativity checking for sequences induced by individual memory locations
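The slides test commutativity symbolically, through a logical encoding. As a rough intuition for what is being tested, the Java sketch below swaps in a much simpler technique: it replays two concrete sequences in both orders on copies of one concrete graph and compares final states and return values. It is not the encoding from the talk, and `CommutativityCheck`, `Graph`, and `probe` are invented names.

```java
import java.util.*;
import java.util.function.Function;

// Concrete replay check: do two operation sequences commute in a given state?
class CommutativityCheck {
    static final class Graph {
        final Set<List<String>> edges = new HashSet<>(); // normalized node pairs

        Graph copy() { Graph g = new Graph(); g.edges.addAll(edges); return g; }

        private static List<String> key(String a, String b) {
            return a.compareTo(b) <= 0 ? Arrays.asList(a, b) : Arrays.asList(b, a);
        }
        boolean addEdge(String a, String b)    { return edges.add(key(a, b)); }
        boolean removeEdge(String a, String b) { return edges.remove(key(a, b)); }

        boolean hasPath(String from, String to) {
            Set<String> seen = new HashSet<>(Collections.singleton(from));
            Deque<String> work = new ArrayDeque<>(seen);
            while (!work.isEmpty()) {
                String n = work.poll();
                if (n.equals(to)) return true;
                for (List<String> e : edges) {
                    String m = e.get(0).equals(n) ? e.get(1)
                             : e.get(1).equals(n) ? e.get(0) : null;
                    if (m != null && seen.add(m)) work.add(m);
                }
            }
            return false;
        }
    }

    // Runs a sequence on g and collects each operation's return value.
    static List<Boolean> run(List<Function<Graph, Boolean>> seq, Graph g) {
        List<Boolean> out = new ArrayList<>();
        for (Function<Graph, Boolean> op : seq) out.add(op.apply(g));
        return out;
    }

    // s1 and s2 commute in state g if both orders give the same final graph
    // and every operation returns the same value in both orders.
    static boolean commute(List<Function<Graph, Boolean>> s1,
                           List<Function<Graph, Boolean>> s2, Graph g) {
        Graph g12 = g.copy(), g21 = g.copy();
        List<Boolean> r1First = run(s1, g12), r2Second = run(s2, g12);
        List<Boolean> r2First = run(s2, g21), r1Second = run(s1, g21);
        return g12.edges.equals(g21.edges)
            && r1First.equals(r1Second) && r2Second.equals(r2First);
    }

    // RemoveEdge; HasPath; AddEdge on one edge, as in the running example.
    static List<Function<Graph, Boolean>> probe(String a, String b) {
        List<Function<Graph, Boolean>> seq = new ArrayList<>();
        seq.add(g -> g.removeEdge(a, b));
        seq.add(g -> g.hasPath(a, b));
        seq.add(g -> g.addEdge(a, b));
        return seq;
    }
}
```

A replay like this only answers the question for one concrete state; the symbolic test on the slides is what lets the result be cached and reused.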



generalization

S1 = RemoveEdge(x,y); HasPath(x,y); AddEdge(x,y)
S2 = RemoveEdge(x,z); HasPath(x,z); AddEdge(x,z)

[S1 S2]G // “concrete” seq.s commute

S1(G) = S1∙S1(G) // idempotent
S2(G) = S2∙S2(G) // idempotent

[S1+ S2+]G // Kleene-cross generalization
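One practical consequence of the Kleene-cross form is that a cached entry has to be matched against logs that repeat the unit sequence any number of times. The following Java sketch shows one plausible way to do that match on operation names alone; the class `KleeneCross` and the method `matchesPlus` are hypothetical, and matching arguments such as x, y, and z is left out.

```java
import java.util.*;

// Matches an operation log against the pattern "unit+":
// the unit sequence repeated one or more times. Names are illustrative.
class KleeneCross {
    static boolean matchesPlus(List<String> unit, List<String> log) {
        if (unit.isEmpty() || log.isEmpty() || log.size() % unit.size() != 0) {
            return false; // not a whole number of repetitions
        }
        for (int i = 0; i < log.size(); i++) {
            if (!log.get(i).equals(unit.get(i % unit.size()))) return false;
        }
        return true;
    }
}
```

Under this reading, a transaction that ran the RemoveEdge; HasPath; AddEdge unit twice would still hit the cache entry learned from a single run of it.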


specialized commutativity spec

state | operation | operation
any | AddEdge(x,z)/ff | AddEdge(x,y)/tt
any | HasPath(z,w)/tt | AddEdge(x,y)/ff
… | … | …
any | [ RemoveEdge(x,z)/tt; HasPath(x,z)/ff; AddEdge(x,z)/tt ]+ | [ RemoveEdge(x,y)/tt; HasPath(x,y)/ff; AddEdge(x,y)/tt ]+
… | … | …

tailored for given application



parallelization protocol

shared state [k] → shared state [k+1] → shared state [k+2]

privatized state [k] → privatized state [k’]

sequences: a1, a2, …, an and b1, b2, …, bm

check: (a1 … an) ∙ (b1 … bm) Sk ≡ (b1 … bm) ∙ (a1 … an) Sk

check passes: replay bi on shared state => shared state [k+3]

check fails: retry from privatized state [k+2]
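The protocol above can be sketched in Java as a privatize, run, check, then replay-or-retry loop. Everything here is an assumption made for illustration: `CommitProtocol`, `Tx`, and `Op` are invented names, the shared state is just a set of strings, the version is the number of committed operations rather than a per-transaction counter, and the `commutes` predicate stands in for the sequence cache with its write-set fallback.

```java
import java.util.*;
import java.util.function.BiPredicate;
import java.util.function.Consumer;

// Privatize, run, then commit by replay if the transaction's log commutes
// with everything committed since its snapshot. All names are illustrative.
class CommitProtocol {
    static final class Op {
        final String name;
        final Consumer<Set<String>> effect;
        Op(String name, Consumer<Set<String>> effect) { this.name = name; this.effect = effect; }
    }

    private final Set<String> shared = new HashSet<>();
    private final List<Op> committed = new ArrayList<>(); // its size acts as the version
    private final BiPredicate<List<Op>, List<Op>> commutes; // stand-in for the sequence cache

    CommitProtocol(BiPredicate<List<Op>, List<Op>> commutes) { this.commutes = commutes; }

    final class Tx {
        private final int startVersion;
        private final Set<String> priv;             // privatized copy of the state
        private final List<Op> log = new ArrayList<>();

        private Tx(int startVersion, Set<String> priv) {
            this.startVersion = startVersion;
            this.priv = priv;
        }

        void apply(Op op) { op.effect.accept(priv); log.add(op); }

        // Returns false when the caller should retry on a fresh snapshot.
        boolean tryCommit() {
            synchronized (CommitProtocol.this) {
                List<Op> intervening =
                    new ArrayList<>(committed.subList(startVersion, committed.size()));
                if (!intervening.isEmpty() && !commutes.test(log, intervening)) {
                    return false;
                }
                for (Op op : log) op.effect.accept(shared); // replay on shared state
                committed.addAll(log);
                return true;
            }
        }
    }

    synchronized Tx begin() { return new Tx(committed.size(), new HashSet<>(shared)); }

    synchronized Set<String> snapshot() { return new HashSet<>(shared); }
}
```

The design choice worth noting is that a transaction never writes to shared state while running; only a successful check leads to replay, which is what keeps a failed speculation harmless.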

parallel reverse-delete MST

RemoveEdge(u,v1); HasPath(u,v1); AddEdge(u,v1)
RemoveEdge(u,v2); …
RemoveEdge(u,v3); HasPath(u,v3); AddEdge(u,v3)

?

x.f = y; || z = x.f;

dope!

spec entry (state | operation | operation): … [ … AddEdge(x,y)/tt ]+

that’s better..

benchmarks

description | name
utility for synchronizing pairs of directories | JFileSync
greedy graph-coloring algorithm | JGraphT-1 (coloring)
saturation-degree node-ordering algorithm for heuristic graph coloring | JGraphT-2 (ordering)
Java source code analyzer | PMD
machine-learning library for data-mining tasks | Weka

speedup

(figure: one panel each for Weka, PMD, JGraphT-2, JFileSync, JGraphT-1; x-axis: Seq, 1 … 8; y-axis: 0 … 2.5; series: fine-grained conflict detection, coarse-grained conflict detection, sequential)

retries

(figure: one panel each for Weka, JFileSync, PMD, JGraphT-2, JGraphT-1; x-axis: 1 … 8; y-axis: 0 … 4.5; series: fine-grained conflict detection, coarse-grained conflict detection)

misses

(figure: bars for Weka, PMD, JGraphT-2, JFileSync, JGraphT-1; y-axis: 0% … 90%; series: w/ seq. abstraction, w/o seq. abstraction)

conclusions

profiling is useful for speculation
• prevalent behaviors (dependencies) => effective specialization
• speculation = safety net

Janus enables practical accuracy boosting
• lifts commutativity checking to sequences of operations
• runtime overhead comparable to write-set approach
