Parallelizing Data Race Detection Benjamin Wester Facebook David Devecsery, Peter Chen, Jason Flinn, Satish Narayanasamy University of Michigan

Page 1:

Parallelizing Data Race Detection

Benjamin Wester, Facebook

David Devecsery, Peter Chen, Jason Flinn, Satish Narayanasamy, University of Michigan

Page 2:

Data Races

Benjamin Wester - ASPLOS '13 2

[Figure: two threads, t1 and t2, shown along a time axis]

t1: lock(l); x = 1; unlock(l)

t2: x = 3; lock(l); x = 2; unlock(l)

Page 3:

Race Detection

[Figure: run time along a time axis. Bars: Normal run time; With race detector (Instrumentation + Analysis)]

Detection is slow! 8x–200x
• Managed vs. binary
• Granularity
• Completeness
• Debug info gathered

Matches app concurrency

Page 4:

Race Detection

[Figure: run time along a time axis. Bars: Normal run time; With race detector; Parallel race detector]

Page 5:

How to use parallel hardware?

• Scale program?
– Performance tied to app

• Parallel algorithm?
– Lots of fine-grained dependencies

• Uniparallelism
– Converts execution into parallel pipeline
– Works for instrumentation

[Wallace CGO’07, Nightingale ASPLOS’08]

Page 6:

Uniparallelism

[Figure: execution timeline (axis: time)]

Page 7:

Uniparallelism

[Figure: execution timeline (axis: time) showing Epoch-parallel Execution and Epoch-sequential Execution]

Page 8:

Uniparallelism

• Scales well: Epochs are independent execution units

• More efficient: Synchronization

[Figure: execution timeline (axis: time)]

Page 9:

Add Detector…

• Scales well: Epochs are independent execution units

• More efficient: Synchronization & Lock elision

Race detection depends on state!

[Figure: execution timeline (axis: time) with two points marked "Here!"]

Page 10:

Architecture

[Figure: three phases: Epoch-sequential, Epoch-parallel, Commit]

Page 11:

Moving Work

Identify meaningful fast subset of analysis state
– Low frequency, lightweight, self-contained
– Run in Epoch-Sequential phase
– Predicted and usable in parallel phase

Symbolic execution
– Replace missing state with symbols
– Defer some computation to commit phase
– Commit: replace symbols with concrete values
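
The deferral idea on this slide can be sketched with a toy analysis. The sketch below is an illustration under stated assumptions, not the detectors from the talk: the analysis (flag a write whose previous writer was a different thread) and the names `analyze_epoch`, `commit`, and `run_parallel` are hypothetical. Each epoch is analyzed without the state left by earlier epochs; a check that needs that missing state is recorded against a symbol and resolved at commit, once concrete values are known. The Epoch-Sequential prediction of a fast subset is not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy analysis (an assumption for illustration, not a real race detector):
# flag a write to a variable whose previous writer was a different thread.


def analyze_epoch(events):
    """Parallel phase: analyze one epoch without the state of earlier epochs.

    `events` is a list of (thread, variable) writes. The incoming last
    writer of each variable is unknown here, so it is treated as a symbol
    and the check against it is deferred.
    """
    last_writer = {}  # concrete state established inside this epoch
    decided = []      # checks resolved inside the epoch
    deferred = []     # checks whose previous writer is still symbolic
    for thread, var in events:
        if var in last_writer:
            if last_writer[var] != thread:
                decided.append((var, last_writer[var], thread))
        else:
            deferred.append((var, thread))
        last_writer[var] = thread
    return decided, deferred, last_writer


def commit(results):
    """Commit phase: walk epochs in order, replacing symbols with values."""
    state = {}   # concrete last writer per variable, carried across epochs
    report = []
    for decided, deferred, final in results:
        for var, thread in deferred:
            prev = state.get(var)  # concrete value for the symbol
            if prev is not None and prev != thread:
                report.append((var, prev, thread))
        report.extend(decided)
        state.update(final)
    return report


def run_parallel(epochs):
    """Analyze epochs independently, then commit them in program order."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(analyze_epoch, epochs))
    return commit(results)
```

The thread pool only illustrates the structure: the per-epoch work has no dependencies on other epochs, and all cross-epoch dependencies are confined to the short sequential `commit` loop.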

Page 12:

Architecture

[Figure: three phases and the work done in each]
• Epoch-sequential: Instrument and predict fast subset of analysis state
• Epoch-parallel: Full instrumentation, Symbolic execution
• Commit: Perform deferred work on concrete values

Page 13:

Parallel FastTrack

State: Vector clocks (e.g. ⟨0, 1, 0⟩) for each:
– Thread
– Lock
– Variable (×2): last read(s), last write

[Diagram labels: Fast, Slow]

Example computation: Check(read_x ≤ thread_i); write_x = thread_i

Transitive reduction: Use knowledge of happens-before relationship
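
The example computation above can be made concrete with a minimal happens-before checker. This is a sketch under assumptions: it stores a full vector clock per variable rather than FastTrack's more compact representation, it is sequential (none of the parallelization from the talk is shown), and the class and method names are hypothetical.

```python
def leq(a, b):
    """True if vector clock a is componentwise <= b (a happens before b)."""
    return all(x <= y for x, y in zip(a, b))


def join(a, b):
    """Componentwise maximum of two vector clocks."""
    return tuple(max(x, y) for x, y in zip(a, b))


class VectorClockDetector:
    """Sequential happens-before race checker over full vector clocks."""

    def __init__(self, n_threads):
        self.zero = (0,) * n_threads
        # Thread t starts with 1 in its own component, 0 elsewhere.
        self.thread = [
            tuple(int(i == t) for i in range(n_threads))
            for t in range(n_threads)
        ]
        self.lock = {}    # lock -> clock of the last release
        self.read = {}    # variable -> join of clocks of reads
        self.write = {}   # variable -> clock of the last write
        self.races = []

    def acquire(self, t, l):
        # The acquirer learns everything the last releaser knew.
        self.thread[t] = join(self.thread[t], self.lock.get(l, self.zero))

    def release(self, t, l):
        self.lock[l] = self.thread[t]
        c = list(self.thread[t])
        c[t] += 1  # start a new interval for thread t
        self.thread[t] = tuple(c)

    def write_var(self, t, x):
        c = self.thread[t]
        # Check(read_x <= thread_i) and the same for the last write.
        ordered = (leq(self.read.get(x, self.zero), c)
                   and leq(self.write.get(x, self.zero), c))
        if not ordered:
            self.races.append(("write", t, x))
        self.write[x] = c  # write_x = thread_i

    def read_var(self, t, x):
        c = self.thread[t]
        if not leq(self.write.get(x, self.zero), c):
            self.races.append(("read", t, x))
        self.read[x] = join(self.read.get(x, self.zero), c)
```

Replaying the two-thread example from the Data Races slide, assuming t1's critical section runs first, the unprotected `x = 3` is the only access this sketch reports, since no lock release orders it after t1's write.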

Page 14:

Parallel Eraser

State:
– Set of locks held by thread (lockset)
– Lockset for each variable
– Position in state machine for each variable

[Diagram labels: Fast, Slow]

Example computation: If state_x == SHARED then ls_x = ls_x ∩ {L, M}

Lockset factorization: Factor out common behavior
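
The lockset refinement in the example computation can be sketched as follows. This is a simplified, sequential take on an Eraser-style state machine, written as an assumption-laden illustration: the state names, the `LocksetDetector` class, and its methods are hypothetical, and the lockset factorization from the talk is not shown.

```python
from enum import Enum


class State(Enum):
    VIRGIN = 0
    EXCLUSIVE = 1
    SHARED = 2
    SHARED_MODIFIED = 3


class LocksetDetector:
    """Simplified lockset checker: one candidate lockset per variable."""

    def __init__(self):
        self.held = {}      # thread -> set of locks currently held
        self.state = {}     # variable -> State
        self.owner = {}     # variable -> first accessing thread
        self.lockset = {}   # variable -> candidate locks (absent = all locks)
        self.warnings = []

    def acquire(self, t, l):
        self.held.setdefault(t, set()).add(l)

    def release(self, t, l):
        self.held.get(t, set()).discard(l)

    def access(self, t, x, is_write):
        st = self.state.get(x, State.VIRGIN)
        if st is State.VIRGIN:
            # First access: the variable is private to thread t for now.
            self.state[x] = State.EXCLUSIVE
            self.owner[x] = t
            return
        if st is State.EXCLUSIVE:
            if self.owner[x] == t:
                return  # still private, nothing to refine
            st = State.SHARED_MODIFIED if is_write else State.SHARED
        elif st is State.SHARED and is_write:
            st = State.SHARED_MODIFIED
        self.state[x] = st
        # ls_x = ls_x ∩ (locks held by t); the first refinement starts
        # from "all locks", so it is just the held set.
        held = self.held.get(t, set())
        ls = self.lockset.get(x)
        self.lockset[x] = set(held) if ls is None else ls & held
        if st is State.SHARED_MODIFIED and not self.lockset[x]:
            self.warnings.append((x, t))
```

A thread holding {L, M} that accesses a shared variable whose candidate set is {L} leaves it at {L}; a later write under only M empties the set and produces a warning.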

Page 15:

Parallel Performance

• Efficiency
• Scalability

2 worker threads as baseline
8-CPU platform

Benchmarks:
– SPLASH-2 (water, lu, ocean, fft, radix)
– Parallel app: pbzip2

Page 16:

Efficiency

[Figure: efficiency results. Annotation: Exactly 2 cores]

Page 17:

Efficiency

[Figure: efficiency results. Annotations: Median 13% faster; Median 8% faster; Exactly 2 cores]

Page 18:

Scaling Parallel FastTrack

[Figure: scaling results. Annotation: 2 Worker Threads]

Page 19:

Scaling Parallel FastTrack

[Figure: scaling results. Annotations: Median 4.4x with 8 cores; 2 Worker Threads]

Page 20:

Scaling Parallel Eraser

[Figure: scaling results. Annotations: Median 3.3x with 8 cores; 2 Worker Threads]

Page 21:

Conclusion

3-Phase Uniparallel architecture
– Parallel FastTrack
– Parallel Eraser

Parallel algorithms:
– Are more efficient
– Scale better

Page 22:

Questions?