Iterative Context Bounding for Systematic Testing of Multithreaded Programs

Madan Musuvathi, Shaz Qadeer

Microsoft Research

Testing multithreaded programs is HARD

• Specific thread interleavings expose subtle errors
  • Testing often misses these errors
• Even when found, errors are hard to debug
  • No repeatable trace
  • Source of the bug is far away from where it manifests

Current practice

• Concurrency testing == Stress testing
• Example: testing a concurrent queue
  • Create 100 threads performing queue operations
  • Run for days/weeks
  • Pepper the code with sleep( random() )
• Stress increases the likelihood of rare interleavings
  • Makes any error found hard to debug

CHESS: Unit testing for concurrency

• Example: testing a concurrent queue
  • Create 1 reader thread and 1 writer thread
  • Exhaustively try all thread interleavings
• Run the test repeatedly on a specialized scheduler
  • Explore a different thread interleaving each time
  • Use model checking techniques to avoid redundancy
• Check for assertions and deadlocks in every run
  • The error-trace is repeatable

State space explosion

Thread 1: x = 1; y = 1;

Thread 2: x = 2; y = 2;

Init state: x = 0, y = 0

[Figure: tree of (x, y) states reachable from (0,0) under all interleavings of the two threads, e.g. (1,0), (2,0), (1,1), (1,2), (2,1), (2,2)]
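The two-thread example on this slide is small enough to enumerate by hand, and the following minimal sketch does the same thing mechanically. It assumes each assignment is one atomic step; the helper name `interleavings` and the tuple encoding of the assignments are illustrative choices, not part of the talk.

```python
# Each thread is a list of atomic assignments (variable, value).
T1 = [("x", 1), ("y", 1)]   # Thread 1: x = 1; y = 1;
T2 = [("x", 2), ("y", 2)]   # Thread 2: x = 2; y = 2;


def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


schedules = list(interleavings(T1, T2))

final_states = set()
for schedule in schedules:
    state = {"x": 0, "y": 0}          # Init state: x = 0, y = 0
    for var, val in schedule:
        state[var] = val
    final_states.add((state["x"], state["y"]))

print(len(schedules), sorted(final_states))
```

Even with two steps per thread there are already several distinct schedules; the count grows quickly as threads get longer.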

State space explosion

[Figure: n threads, each executing k steps, e.g. x = 1; … … … … … y = 1; and x = 2; … … … … … y = 2;]

• n threads, k steps each
• Number of executions = $O(n^{nk})$
• Exponential in both n and k
  • Typically: n < 10, k > 100
• Limits scalability to large programs (large k)
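To get a feel for these numbers, the exact count of interleavings can be computed directly. This is a small worked sketch, assuming n threads of exactly k atomic steps each with no blocking; under that assumption the count is the multinomial coefficient (nk)! / (k!)^n, which is bounded above by the n^{nk} figure on the slide. The function name is an illustrative choice.

```python
from math import factorial


def count_interleavings(n: int, k: int) -> int:
    """Exact number of interleavings of n threads with k atomic steps each."""
    return factorial(n * k) // factorial(k) ** n


for n, k in [(2, 2), (2, 10), (3, 10), (2, 100)]:
    exact = count_interleavings(n, k)
    bound = n ** (n * k)
    # Print the number of decimal digits to keep the output readable.
    print(f"n={n} k={k}: exact has {len(str(exact))} digits, "
          f"n^(nk) has {len(str(bound))} digits")
```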

Techniques

• Iterative context bounding
  • Strategy for searching large state spaces
• State space optimization
  • Reduces the size of the state space

Iterative context bounding

Thread 1: x = 1; if (p != 0) { x = p->f; }

Thread 2: p = 0;

[Figure: Thread 1 is split between "x = 1; if (p != 0) {" and "x = p->f; }", with Thread 2's "p = 0;" scheduled in between; context switches are labeled as preemption or non-preemption]

Iterative context-bounding algorithm

• The scheduler has a budget of c preemptions
  • Nondeterministically choose the preemption points
• Resort to non-preemptive scheduling after c preemptions
  • Run each thread to the next yield point
• Once all executions explored with c preemptions
  • Try with c+1 preemptions
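The algorithm above can be sketched on the two-thread pointer example from the earlier slide. This is a minimal illustration under stated assumptions, not the CHESS implementation: threads are modeled as lists of atomic Python functions, threads never block, the search copies state instead of replaying executions, and each bound c re-explores executions that need fewer than c preemptions. All names (`THREADS`, `explore`, `t1_taken`, and so on) are hypothetical.

```python
import copy


class Bug(Exception):
    pass


# Thread 1 from the slide: x = 1; if (p != 0) { x = p->f; }
def t1_assign(s):
    s["x"] = 1


def t1_check(s):
    s["t1_taken"] = s["p"] is not None   # evaluate the branch condition


def t1_deref(s):
    if s["t1_taken"]:
        if s["p"] is None:
            raise Bug("null dereference of p")
        s["x"] = s["p"]["f"]


# Thread 2 from the slide: p = 0;
def t2_clear(s):
    s["p"] = None


THREADS = [[t1_assign, t1_check, t1_deref], [t2_clear]]
INIT = {"x": 0, "p": {"f": 42}, "t1_taken": False}


def explore(state, pcs, current, budget, trace):
    """Return a failing schedule using at most `budget` preemptions, or None."""
    enabled = [t for t in range(len(THREADS)) if pcs[t] < len(THREADS[t])]
    for t in enabled:
        # Switching away from a thread that could still run is a preemption;
        # switching after the current thread has finished is free.
        cost = 1 if (current in enabled and t != current) else 0
        if cost > budget:
            continue
        nxt = copy.deepcopy(state)
        try:
            THREADS[t][pcs[t]](nxt)
        except Bug:
            return trace + [t]
        new_pcs = pcs[:t] + [pcs[t] + 1] + pcs[t + 1:]
        found = explore(nxt, new_pcs, t, budget - cost, trace + [t])
        if found is not None:
            return found
    return None


def iterative_context_bounding(max_bound):
    """Try bounds 0, 1, 2, ... and return (bound, schedule) for the first bug."""
    for c in range(max_bound + 1):
        trace = explore(INIT, [0] * len(THREADS), None, c, [])
        if trace is not None:
            return c, trace
    return None


result = iterative_context_bounding(3)
print(result)
```

In this sketch the returned schedule is a list of thread indices, so a failing run can be replayed step by step. Because bounds are tried in increasing order, the first bug reported uses the fewest preemptions that the search needs to reach it.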

Iterative context-bounding has desirable properties

Property 0: Easy to implement

Property 1: Polynomial state space

• n threads, k steps each, c preemptions
• Number of executions $\le \binom{nk}{c} \cdot (n+c)! = O\left((n^2 k)^c \cdot n!\right)$
• Exponential in n and c, but not in k

[Figure: the threads x = 1; … … … … … y = 1; and x = 2; … … … … … y = 2; cut at the chosen preemption points into atomic blocks]

• Choose c preemption points

• Permute n+c atomic blocks
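The two steps above give the bound for Property 1 directly: pick c of the nk step positions as preemption points, then order the resulting n+c atomic blocks. The following short sketch evaluates that bound for a few values of k to show that it grows polynomially in k for fixed n and c. The function name is an illustrative choice.

```python
from math import comb, factorial


def bounded_executions(n: int, k: int, c: int) -> int:
    """Upper bound on executions with c preemptions: C(nk, c) * (n + c)!"""
    return comb(n * k, c) * factorial(n + c)


# Fix n = 2 threads and c = 2 preemptions, and let the thread length k grow.
for k in (10, 100, 1000):
    print(f"k={k}: at most {bounded_executions(2, k, 2)} executions")
```

Multiplying k by 10 multiplies this bound by roughly 10^c, in contrast to the exponential growth of the unbounded count.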

Property 2: Deep exploration possible with small bounds

• A context-bounded execution has unbounded depth
  • A thread may execute an unbounded number of steps within each context
• Can reach a terminating state from an arbitrary state with zero preemptions
  • Perform non-preemptive scheduling
  • Leave the number of non-preemptions unbounded

Property 3: Coverage metric

• If search terminates with c preemptions, any remaining error must require at least c+1 preemptions
• Intuitive estimate for
  • the complexity of the bugs remaining in the program
  • the chance of their occurrence in practice

Property 4: Finds the ‘simplest’ error trace

• Finds the smallest number of preemptions to the error
• Number of preemptions is a better metric of error complexity than execution length

Property 5: Lots of bugs with small number of preemptions

Program               KLOC   Max Num Threads   Bugs Reachable with Preemption Count
                                               0    1    2    3    Total
Bluetooth              0.4   3                 0    1    0    0    1
Work-Stealing Queue    1.3   3                 0    1    2    0    3
Transaction Manager    7.0   2                 0    0    2    1    3
APE                   18.9   4                 2    1    1    -    4
Dryad Channels        16.0   5                 1    5    1    -    7

Most states are covered with a small number of preemptions

[Figure: Coverage vs. Time (Dryad)]

Techniques

• Iterative context-bounding
  • Strategy for searching large state spaces
• State space optimization

Optimization for race-free programs

• Insert context-switches only at synchronization points
• Massive state-space reduction
  • Num steps (k) = num synch. operations (not memory accesses)
• Run data-race detection to check race-free assumption
  • Goldilocks algorithm [PLDI ’07] implemented for x86
• Theorem: When search terminates for context-bound c
  • Either find an erroneous execution
  • Or find a data-race
  • Or the program has no errors reachable with c preemptions
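The reduction in k can be illustrated by grouping a thread's operations into blocks that the scheduler treats as atomic. The sketch below is only an illustration of that counting argument, assuming a thread is given as a list of operations already tagged as "sync" or "mem"; the operation names are made up for the example, and no race detection is performed here.

```python
def scheduling_blocks(ops):
    """Group operations so that a new block starts only at a sync operation."""
    blocks = []
    for kind, name in ops:
        if kind == "sync" or not blocks:
            blocks.append([])
        blocks[-1].append(name)
    return blocks


# Hypothetical trace of one thread operating on a lock-protected queue.
ops = [
    ("mem", "read q.head"),
    ("sync", "lock(m)"),
    ("mem", "write q.tail"),
    ("mem", "write q.count"),
    ("sync", "unlock(m)"),
    ("mem", "read local"),
]

blocks = scheduling_blocks(ops)
print(f"{len(ops)} operations, {len(blocks)} scheduling points")
for block in blocks:
    print(block)
```

With context switches allowed only at block boundaries, the k in the earlier bounds counts blocks rather than individual memory accesses. This is sound only if the program is race-free, which is why the slide pairs it with data-race detection.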

Conclusion

• Iterative context-bounding algorithm
  • Effective search strategy for multi-threaded bugs
  • Exposes many concurrency bugs
• Implemented in the CHESS model checking tool
  • Applying CHESS to Windows drivers, SQL, Cosmos, Singularity
• Visit http://research.microsoft.com/projects/CHESS/

Extra Slides

Partial-order reduction

• Many thread interleavings are equivalent
  • Accesses to separate memory locations by different threads can be reordered
• Avoid exploring equivalent thread interleavings

Optimistic dynamic partial-order reduction

• Algorithm [Bruening ‘99]:
  • Assume the program is data-race free
  • Context switch only at synchronization points
  • Check for data-races in each execution
• Theorem [Stoller ‘00]:
  • If the algorithm terminates without reporting races
  • Then the program has no assertion failures
• Massive reduction:
  • k = number of synchronization accesses (not memory accesses)

Combining with context-bounding

• Algorithm:
  • Assume the program is data-race free
  • Context switch only at synchronization points
  • Explore executions with c preemptions
  • Check for data-races in each execution
• Theorem:
  • If the algorithm terminates without reporting races,
  • Then the program has no assertion failures reachable with c preemptions
  • Requires that a thread can block only at synchronization points