A Randomized Algorithm for Concurrency Testing

Post on 24-Feb-2016


Madan Musuvathi

Research in Software Engineering

Microsoft Research

The Concurrency Testing Problem

A closed program = program + test harness
The test harness encodes both the concurrency scenario and the inputs
The only nondeterminism is the thread interleavings


Verification vs Testing

Verification:
  Prove that the program is correct (free of bugs)
  With the minimum amount of resources

Testing:
  Given a certain amount of resources
  How close to a proof can you get?
  Maximize the number of bugs that you can find

In the limit: Verification == Testing

Testing is more important than Verification

Undecidability argument
  There are always going to be programs large enough and properties complex enough for which verification cannot be done

Economic argument
  If the cost of a bug is less than the cost of finding the bug (or proving its absence), you are better off shipping buggy software

Engineering argument
  Make software only as reliable as the weakest link in the entire system

Providing Probabilistic Guarantees

Problem we would like to solve:
  Given a program, prove that it does not do something wrong with probability > 95%

Problem we can hope to solve:
  Given a program that contains a bug, design a testing algorithm that finds the bug with probability > 95%
  Prove optimality: no testing algorithm can do better

Cuzz: Concurrency Fuzzing

Disciplined randomization of schedules

Probabilistic guarantees
  Every run finds a bug with some (reasonably large) probability
  Repeat runs to increase the chance of finding a bug

Scalable
  In the no. of threads and program size

Effective
  Bugs in IE, Firefox, Office Communicator, Outlook, …
  Bugs found in the first few runs

Cuzz Demo

Problem Formulation

P is a class of programs
B is a class of bugs
Given P and B, you design a testing algorithm T
Given T, the adversary picks a program p in P containing a bug b in B
Given p, T generates an input in constant time
Prove that T finds b in p with a probability X(p,B)

In our case

P is a class of closed terminating concurrent programs
B is a class of bugs
Given P and B, you design a testing algorithm T
Given T, the adversary picks a program p in P containing a bug b in B
Given p, T generates an interleaving in constant time
Prove that T finds b in p with a probability X(p,B)

Useful parameters

For a closed terminating concurrent program
(Fancy way of saying, a program combined with a concurrency test)

n : maximum number of threads
k : maximum number of instructions executed

What is a “Bug” – first attempt

Bug is defined as a particular buggy interleaving

No algorithm can find the bug with a probability greater than 1/n^k

[Figure: n threads (~ tens), k instructions (~ millions), n^k schedules]
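To get a feel for how small 1/n^k is, the size of the schedule space can be worked out in logarithms. The sketch below is only illustrative: the function name is hypothetical, and n = 10 and k = 1,000,000 are stand-ins for "tens" of threads and "millions" of instructions.

```python
import math

def log10_schedule_count(n: int, k: int) -> float:
    """log10 of n**k: at each of k steps, any one of n threads may run."""
    return k * math.log10(n)

# Illustrative values only (assumed, not taken from a real program).
digits = log10_schedule_count(n=10, k=1_000_000)
print(f"n^k has roughly {digits:.0f} decimal digits")
```

Working in logarithms avoids building the huge integer n**k just to see its magnitude.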

A Deterministic Algorithm

Provides no guarantees


Randomized Algorithm

Samples the schedule space with some probability distribution
Adversary picks the schedule that is the least probable
Probability of finding the bug <= 1/n^k

Randomized Algorithm

1/n^k is a mighty small number
Hard to design algorithms that find the bug with probability == 1/n^k

A Good Research Trick

When you can't solve a problem, change the problem definition

Bugs are not adversarial

Usually, if there is one interleaving that finds the bug, there are many interleavings that find the same bug
  This is not true for program inputs

The set of interleavings that find the bug share the same root cause

The root causes of real bugs are not complicated
  Smart people make stupid mistakes

Classifying Bugs

Classify concurrency bugs based on a suitable “depth” metric

Adversary can choose any bug, but only within a given depth

Testing algorithm provides better guarantees for bugs with a smaller depth
  Even if the worst-case probability is less than 1/n^k

We want real bugs to have small depth
We want to be able to design effective sampling algorithms for finding bugs of a particular depth

Our Bug Depth Definition

Bug Depth = number of ordering constraints sufficient to find the bug

Best explained through examples

A Bug of Depth 1

Bug Depth = no. of ordering constraints sufficient to find the bug

Parent:
  A: …
  B: fork (child);
  C: p = malloc();
  D: …
  E: …

Child:
  F: …
  G: do_init();
  H: p->f ++;
  I: …
  J: …

Possible schedules:
  A B C D E F G H I J
  A B F G H C D E I J
  A B F G C D E H I J
  A B F G C H D E I J
  A B F G H I J C D E
  …
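The depth-1 example can be checked by brute force. The sketch below enumerates every interleaving of the parent's C D E with the child's F G H I J (after the common prefix A B) and counts those in which H runs before C, i.e. the child dereferences p before the parent has allocated it. The helper names `interleavings` and `satisfies` are illustrative, not part of Cuzz.

```python
from itertools import combinations

PARENT = ["C", "D", "E"]            # parent instructions after B: fork(child)
CHILD = ["F", "G", "H", "I", "J"]   # child instructions

def interleavings(xs, ys):
    """Yield every merge of xs and ys that keeps each list's own order."""
    total = len(xs) + len(ys)
    for positions in combinations(range(total), len(xs)):
        chosen, xi, yi = set(positions), iter(xs), iter(ys)
        merged = [next(xi) if i in chosen else next(yi) for i in range(total)]
        yield ["A", "B"] + merged

def satisfies(schedule, a, b):
    """True if instruction a occurs before instruction b in the schedule."""
    return schedule.index(a) < schedule.index(b)

schedules = list(interleavings(PARENT, CHILD))
buggy = [s for s in schedules if satisfies(s, "H", "C")]
print(f"{len(buggy)} of {len(schedules)} schedules satisfy (H, C)")
```

A single ordering constraint, (H, C), is enough to pick out the buggy schedules, which is what makes this a bug of depth 1.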

A Bug of Depth 2

Bug Depth = no. of ordering constraints sufficient to find the bug

Parent:
  A: …
  B: p = malloc();
  C: fork (child);
  D: …
  E: if (p != NULL)
  F:   p->f ++;
  G:

Child:
  H: …
  I: p = NULL;
  J: …

Possible schedules:
  A B C D E F G H I J
  A B C D E H I J F G
  A B C H I D E G J
  A B C D H E F I J G
  A B C H D E I J F G
  …

Another Bug of Depth 2

Bug Depth = no. of ordering constraints sufficient to find the bug

Parent:
  A: …
  B: Lock (A);
  C: …
  D: Lock (B);
  E: …

Child:
  F: …
  G: Lock (B);
  H: …
  I: Lock (A);
  J: …

Hypothesis

Most concurrency bugs in practice have a very small depth

What has been empirically validated:
  There are lots of bugs of small depth in real programs

Defining a Bug

A schedule is a sequence of (dynamic) instructions
S = set of schedules of a closed program
A concurrency bug B is a strict subset of S

Ordering Constraints

A schedule satisfies an ordering constraint (a,b) if instruction a occurs before instruction b in the schedule

Parent:
  A: …
  B: fork (child);
  C: p = malloc();
  D: …
  E: …

Child:
  F: …
  G: do_init();
  H: p->f ++;
  I: …
  J: …

Example: the schedule A B F G H C D E I J satisfies (H, C)

Depth of a Bug

S(c1,c2,…,cn) = set of schedules that satisfy the ordering constraints c1,c2,…,cn

A bug B is of depth ≤ d if there exist constraints c1,c2,…,cd such that S(c1,c2,…,cd) ⊆ B


What is the Depth of this Bug

Parent:
  A: …
  B: p = malloc();
  C: fork (child);
  D: allocated = 1;
  E: p = null;

Child:
  F: …
  G: if (allocated)
  H:   p->f++;
  I: …
  J: …

Any buggy interleaving satisfies (D, G) && (E, H), so bug depth <= 2

Any interleaving that satisfies (E, G) is buggy, so bug depth == 1, even though there are buggy interleavings that don't satisfy (E, G)
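The claim that (E, G) alone is sufficient can be verified on a small model of this program. The sketch below is a simplification under stated assumptions: it keeps only the four instructions that matter (D, E in the parent; G, H in the child), treats H as a no-op when the branch at G is not taken, and uses hypothetical helper names `run` and `before`.

```python
from itertools import permutations

def run(schedule):
    """Replay one interleaving; True if p->f++ executes while p is null."""
    allocated, p, taken = 0, "heap", False   # state after B: p = malloc()
    for instr in schedule:
        if instr == "D":
            allocated = 1
        elif instr == "E":
            p = None
        elif instr == "G":
            taken = bool(allocated)          # if (allocated)
        elif instr == "H" and taken and p is None:
            return True                      # null dereference
    return False

def before(schedule, a, b):
    """True if a occurs before b in the schedule."""
    return schedule.index(a) < schedule.index(b)

# Keep program order inside each thread: D before E, G before H.
schedules = [s for s in permutations("DEGH")
             if before(s, "D", "E") and before(s, "G", "H")]
buggy = {"".join(s) for s in schedules if run(s)}
with_EG = {"".join(s) for s in schedules if before(s, "E", "G")}

print("buggy:", sorted(buggy))
print("satisfy (E, G):", sorted(with_EG))
```

Every schedule that satisfies (E, G) is in the buggy set, which is what depth 1 requires; the buggy set is allowed to be larger than the set the constraint selects.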

Let's look at the complicated bug

  void AddToCache() {
    // ...
    A: x &= ~(FLAG_NOT_DELETED);
    B: x |= FLAG_CACHED;
    MemoryBarrier();
    // ...
  }

  AddToCache();
  assert( x & FLAG_CACHED );

The bit operations are not atomic

  void AddToCache() {
    A1: t = x & ~(FLAG_NOT_DELETED);
    A2: x = t;
    B1: u = x | FLAG_CACHED;
    B2: x = u;
  }

  AddToCache();
  assert( x & FLAG_CACHED );

The bug

The slide shows the expanded code above twice, side by side.

Cuzz Guarantee

Given a program that creates at most n threads and executes at most k instructions, Cuzz finds every bug of depth d with probability at least 1/(n·k^(d-1)) in every run of the program
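A per-run guarantee turns into a run count once a target confidence is fixed. The sketch below assumes the worst-case bound 1/(n·k^(d-1)) quoted elsewhere in these slides and assumes runs are independent (each run draws fresh random priorities); the function names and the tiny n = 2, k = 5 values are illustrative only.

```python
import math

def pct_bound(n: int, k: int, d: int) -> float:
    """Worst-case per-run probability 1 / (n * k**(d - 1))."""
    return 1.0 / (n * k ** (d - 1))

def runs_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest r with 1 - (1 - p)**r >= confidence, for independent runs."""
    if p >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

for d in (1, 2):
    p = pct_bound(n=2, k=5, d=d)
    print(f"d={d}: per-run bound {p}, runs for 95%: {runs_needed(p)}")
```

Because the bound shrinks by a factor of k for each extra unit of depth, the number of runs needed grows quickly with d, which is why small bug depth matters.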

A Bug of Depth 1

Bug Depth = no. of ordering constraints sufficient to find the bug

Probability of bug >= 1/n
n: no. of threads (~ tens)

Example: the fork/malloc program shown earlier, where the child's p->f ++ (H) can run before the parent's p = malloc() (C)

A Bug of Depth 2

Bug Depth = no. of ordering constraints sufficient to find the bug

Probability of bug >= 1/(n·k)
n: no. of threads (~ tens)
k: no. of instructions (~ millions)

Example: the malloc/fork program shown earlier, where the child's p = NULL (I) runs between the parent's check (E) and its p->f ++ (F)

Another Bug of Depth 2

Bug Depth = no. of ordering constraints sufficient to find the bug

Probability of bug >= 1/(n·k)
n: no. of threads (~ tens)
k: no. of instructions (~ millions)

Example: the lock-order program shown earlier, where the parent takes Lock (A) then Lock (B) while the child takes Lock (B) then Lock (A)

Cuzz Algorithm

Inputs:
  n: estimated bound on the number of threads
  k: estimated bound on the number of steps
  d: target bug depth

  // 1. assign random priorities >= d to threads
  for t in [1…n] do
    priority[t] = rand() + d;

  // 2. choose d-1 lowering points at random
  for i in [1...d) do
    lowering[i] = rand() % k;

  steps = 0;
  while (some thread enabled) {
    // 3. Honor thread priorities
    Let t be the highest-priority enabled thread;
    schedule t for one step;
    steps ++;

    // 4. At the ith lowering point, set the priority to i
    if steps == lowering[i] for some i
      priority[t] = i;
  }
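The four numbered steps can be sketched as a small simulation. This is a simplified model, not the Cuzz implementation: threads are plain lists of instruction labels that are always enabled, priorities are drawn as a random permutation of d, d+1, … (so no two threads tie) instead of `rand() + d`, and lowering points are drawn from 1..k. The name `pct_schedule` is hypothetical.

```python
import random

def pct_schedule(threads, k, d, rng=random):
    """Produce one schedule using random priorities and d-1 lowering points.

    threads: dict mapping a thread name to its list of instruction labels.
    k: bound on the total number of steps; d: target bug depth.
    """
    names = list(threads)
    # 1. Distinct random priorities, all >= d.
    ranks = list(range(d, d + len(names)))
    rng.shuffle(ranks)
    priority = dict(zip(names, ranks))
    # 2. d-1 lowering points, numbered 1..d-1, each at a step in 1..k.
    lowering = {i: rng.randint(1, k) for i in range(1, d)}

    pc = {t: 0 for t in names}
    schedule, steps = [], 0
    while any(pc[t] < len(threads[t]) for t in names):
        # 3. Run the highest-priority thread that still has work.
        runnable = [t for t in names if pc[t] < len(threads[t])]
        t = max(runnable, key=priority.get)
        schedule.append(threads[t][pc[t]])
        pc[t] += 1
        steps += 1
        # 4. At the i-th lowering point, drop the running thread to priority i.
        for i, point in lowering.items():
            if steps == point:
                priority[t] = i
    return schedule

program = {"parent": list("CDE"), "child": list("FGHIJ")}
print(pct_schedule(program, k=8, d=1, rng=random.Random(0)))
```

With d = 1 there are no lowering points, so the result is simply one thread run to completion followed by the other, in an order decided by the random priorities.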

A Bug of Depth 1

Found when the child has a higher priority than the parent (prob = ½)

Parent (Pri = 1):
  fork (child);
  p = malloc();

Child (Pri = 2):
  do_init();
  p->f ++;

A Bug of Depth 2

Found when the parent starts with a higher priority and a lowering point is inserted after the branch condition (prob = 1/(2*5) = 1/10)

Parent (Pri = 3):
  p = malloc();
  fork (child);
  if (p != NULL)
    <lowering point: parent drops to Pri = 1>
    p->f ++;

Child (Pri = 2):
  p = NULL;
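The 1/10 figure can be reproduced by enumerating every choice the scheduler can make on this example. The sketch below is a hand-built model under stated assumptions: the program has k = 5 steps (malloc, fork, check, deref in the parent and p = NULL in the child), the child becomes enabled only after the fork, the dereference counts as a step even when the branch is not taken, and the two initial priorities are 2 and 3 in either order.

```python
from itertools import permutations

def run(parent_pri, child_pri, lowering_step):
    """Replay one choice of priorities and lowering point.

    Returns True if the parent dereferences p after the child nulled it.
    """
    parent = ["malloc", "fork", "check", "deref"]
    priority = {"parent": parent_pri, "child": child_pri}
    pc, forked, child_done = 0, False, False
    p_valid, check_passed, steps = False, False, 0
    while pc < len(parent) or (forked and not child_done):
        enabled = []
        if pc < len(parent):
            enabled.append("parent")
        if forked and not child_done:
            enabled.append("child")
        t = max(enabled, key=priority.get)
        if t == "child":
            p_valid, child_done = False, True      # p = NULL
        else:
            op = parent[pc]
            pc += 1
            if op == "malloc":
                p_valid = True
            elif op == "fork":
                forked = True
            elif op == "check":
                check_passed = p_valid             # if (p != NULL)
            elif check_passed and not p_valid:     # op == "deref"
                return True                        # p->f ++ on NULL
        steps += 1
        if steps == lowering_step:
            priority[t] = 1                        # the single lowering point
    return False

K = 5
cases = [(pp, cp, s) for pp, cp in permutations((2, 3))
         for s in range(1, K + 1)]
hits = [c for c in cases if run(*c)]
print(f"{len(hits)} of {len(cases)} equally likely choices hit the bug")
```

Only the combination "parent starts higher, lowering point right after the check" triggers the bug in this model, matching the 1/(2*5) on the slide.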

In Practice, Cuzz Beats its Bound

Cuzz performs far better than the theoretical bound

1. The worst-case bound is based on a conservative analysis
2. We employ various optimizations
3. Programs have LOTS of bugs
   Probability of finding any of the bugs is (roughly) the sum of the probabilities of finding each
4. The buggy code is executed LOTS of times

For Some of our Benchmarks

Probability increases with n, stays the same with k
In contrast, worst-case bound = 1/(n·k^(d-1))

[Figure: probability of finding the bug (y-axis, 0 to 0.025) against number of threads (x-axis: 2, 3, 5, 9, 17, 33, 65), with one series each for 4 items, 16 items and 64 items]

Dimension Theory

Any partial-order G can be expressed as an intersection of a set of total orders
This set is called a realizer of G

[Figure: a partial order on a, b, c, d, e = (a b c d e) ∩ (a d b e c)]
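The intersection of the two total orders from the figure can be computed directly. In the sketch below a pair (x, y) belongs to the partial order only if x comes before y in every total order of the realizer; the remaining pairs are the unordered ones. The function names are illustrative.

```python
from itertools import combinations

REALIZER = ["abcde", "adbec"]   # the two total orders shown on the slide

def intersect(total_orders):
    """Pairs (x, y) with x before y in every total order."""
    elements = sorted(total_orders[0])
    return {(x, y)
            for x in elements for y in elements if x != y
            if all(o.index(x) < o.index(y) for o in total_orders)}

def unordered_pairs(total_orders):
    """Pairs that the intersection leaves unordered."""
    rel = intersect(total_orders)
    elements = sorted(total_orders[0])
    return [(x, y) for x, y in combinations(elements, 2)
            if (x, y) not in rel and (y, x) not in rel]

print(sorted(intersect(REALIZER)))
print(unordered_pairs(REALIZER))
```

Each unordered pair appears in one order in the first total order and in the opposite order in the second, which is the realizer property the scheduler relies on.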

Property of Realizers

For any unordered pair a and b, a realizer contains two total orders that satisfy (a,b) and (b,a)

[Figure: the partial order on a, b, c, d, e = (a b c d e) ∩ (a d b e c)]

Dimension of a Partial Order

Dimension of G is the size of the smallest realizer of G
Dimension is 2 for this example

[Figure: the partial order on a, b, c, d, e = (a b c d e) ∩ (a d b e c)]

Why is it called “dimension”

You can encode a partial-order of dimension d as points in a d-dimensional space

[Figure: for (a b c d e) ∩ (a d b e c), the elements a to e plotted as points on a 2-D grid with both axes running from 0 to 4]

Why is it relevant for us

P = Set of all partial orders, B = Set of all bugs of depth 1
If you can uniformly sample the smallest realizer of a partial order p, the probability of finding any bug of depth 1 is >= 1/dimension(p)

All this is good, but

Finding the dimension of a partial order is NP-complete

Real programs are not static partial-orders

Width of a Partial-Order

Width of a partial-order G is the minimum number of total orders needed to cover G
Width corresponds to the number of “threads” in G
For all G, Dimension(G) <= Width(G)

[Figure: the partial order on a, b, c, d, e is covered by a set of total orders]
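For a small partial order the width can be found by brute force. The sketch below relies on Dilworth's theorem (the minimum number of chains needed to cover a finite partial order equals the size of its largest antichain) and searches for the largest set of mutually unordered elements. The relation `LESS` is an assumption: it is the order obtained by intersecting the total orders "a b c d e" and "a d b e c" from the earlier figure.

```python
from itertools import combinations

# Assumed order relation of the running example (already transitively closed).
LESS = {("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"),
        ("b", "c"), ("b", "e"), ("d", "e")}
ELEMENTS = "abcde"

def comparable(x, y):
    """True if x and y are ordered one way or the other."""
    return (x, y) in LESS or (y, x) in LESS

def width(elements):
    """Size of the largest antichain, found by exhaustive search."""
    for size in range(len(elements), 0, -1):
        for subset in combinations(elements, size):
            if not any(comparable(x, y) for x, y in combinations(subset, 2)):
                return size
    return 0

def is_chain(seq):
    """True if consecutive elements of seq are ordered by LESS."""
    return all((x, y) in LESS for x, y in zip(seq, seq[1:]))

print("width:", width(ELEMENTS))
print("chains 'abc' and 'de' cover G:", is_chain("abc") and is_chain("de"))
```

The two chains a b c and d e play the role of the two "threads", and the width of 2 matches the dimension of 2 for this example, consistent with Dimension(G) <= Width(G).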

Cuzz Algorithm

Cuzz is an online randomized algorithm for uniformly sampling a realizer of size Width(G)

Assign random priorities to “threads” and topologically sort based on the priorities

[Figure: the partial order on a, b, c, d, e = (a b c d e) ∩ (a d b e c)]
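The priority-based topological sort can be tried on the running example. The sketch below makes two assumptions that are read off the figure rather than stated on the slide: the partial order is covered by two "threads" (a b c and d e), and the only cross-thread edges are a before d and b before e. Each assignment of priorities to the two threads then yields one total order.

```python
from itertools import permutations

CHAINS = {"t1": "abc", "t2": "de"}     # two "threads" covering G (assumed)
CROSS = {("a", "d"), ("b", "e")}       # assumed cross-thread edges of G

def schedule(priority):
    """Topologically sort G, always running the highest-priority ready thread."""
    pc = {t: 0 for t in CHAINS}
    done, order = set(), []
    total = sum(len(c) for c in CHAINS.values())
    while len(order) < total:
        ready = [t for t in CHAINS
                 if pc[t] < len(CHAINS[t])
                 and all(x in done for x, y in CROSS if y == CHAINS[t][pc[t]])]
        t = max(ready, key=priority.get)
        node = CHAINS[t][pc[t]]
        order.append(node)
        done.add(node)
        pc[t] += 1
    return "".join(order)

orders = {schedule(dict(zip(CHAINS, ranks))) for ranks in permutations((1, 2))}
print(sorted(orders))
```

Under these assumptions the two possible priority assignments produce exactly the two total orders of the realizer in the figure, so picking priorities at random samples that realizer uniformly.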

Extension to Larger Depths

Note: a realizer of G covers all possible orderings of an unordered pair
We define a d-realizer of G as a set of total orders that covers all possible orderings of d unordered pairs
d-dimension of G is the size of the smallest d-realizer of G

Theorem: d-Dimension(G) <= Dimension(G) · k^(d-1), where k is the number of nodes in G

Cuzz is an online algorithm for uniformly sampling over a d-realizer of G

Optimizations

Need to insert lowering points only at sync. operations
  Sync operations include locks, semaphores, hardware interlocked instructions, racy shared memory accesses
  Based on partial-order reduction in model checking
  Reduces k from ~millions to ~ten thousands

Reset algorithm after every join point
  Join point = a state in which only one thread is enabled
  Reduces n and k

Join-Point Optimization

If partial order G is the serial composition of A and B, then
d-Dimension(G) = Max (d-Dimension(A), d-Dimension(B))

d-Dimension(G) <= Dimension(G) · 4^(d-1)

[Figure: partial order G drawn as sub-order A followed by sub-order B]

Other Practical Considerations

Lower priority threads can be starved
  Temporarily boost priorities with a very small probability

  High Pri: while(!x) { ; }
  Low Pri:  x = 1;

Perturbation of real-time can result in “false-errors”
  Low priority threads run very slowly
  Some programs use timing based synchronization

  High Pri: sleep(10 sec); p->f++;
  Low Pri:  p = malloc();

Comparison with Worst-Case Bound

Program            Empirical   Bound
Splash – Barnes    0.5         0.5
Splash – LU        0.5         0.5
Splash – Barnes    0.49        0.5
Pbzip              0.701       0.0001
Work Steal Queue   0.002       0.0003
Dryad              0.164       2x10^-5

Scalability

Program           LOC     d   n    k sync   k optimized
Splash – FFT      1200    1   2    791      139
Splash – LU       1130    1   2    1517     996
Splash – Barnes   3465    1   2    7917     318
Pbzip2            1978    2   4    1981     1207
TPL WSQ           495     2   4    1488     75
Dryad             16036   2   5    9631     1990
IE                -       1   25   1.4M     0.13M
Mozilla           245K    1   12   38.4M    3M

Conclusions

Probabilistic concurrency testing
  Provides reasonable probabilistic bounds of finding bugs

Notion of bug depth
  A classification of concurrency bugs
  Essential for probabilistic bounds
  Many bugs have a very small depth

The initial prototype of Cuzz is very effective
  Finds lots of bugs within the first few hundred runs
  Scales to large programs