
Automatically Generating Search Heuristics for Concolic Testing

Sooyoung Cha

Korea University

ICSE'18 @Gothenburg, Sweden

(co-work with Seongjoon Hong, Junhee Lee, Hakjoo Oh)

Concolic Testing

● Concolic testing (concrete and symbolic execution)
– An effective software testing method.
– SAGE: found 30% of all Windows 7 WEX security bugs.

● Key challenge: path explosion
– # of execution paths: 2^(# of branches)
● e.g., grep-2.2, with 3,836 branches, has 2^3,836 paths in the worst case.
● Exploring all paths is impossible.
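The 2^(# of branches) figure grows fast enough to rule out exhaustive exploration. A quick sketch of the arithmetic (the helper below is illustrative, not part of the talk's tooling):

```python
# With b branches, a program has up to 2**b execution paths in the
# worst case, since each branch can independently go true or false.
def worst_case_paths(num_branches: int) -> int:
    return 2 ** num_branches

# For grep-2.2's 3,836 branches, the path count has over a
# thousand decimal digits.
paths = worst_case_paths(3836)
print(len(str(paths)))  # 1155 decimal digits
```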

Search Heuristic

● Numerous search heuristics have been proposed.
– DFS, BFS, Random, Generational, CFDS, CGS, etc.
– They select branches that are likely to maximize code coverage.

● Example: for a path path1 through branches b1, b2, b3:
– DFS(path1) → b3 (negate the deepest branch, yielding a new path path2).

– BFS(path1) → b1 (negate the shallowest branch, yielding a new path path2).

– SearchHeuristic(path1) → b2 (negate whichever branch the heuristic scores as most promising).

[Plot: branches covered vs. iterations on vim-5.7 for CFDS, CGS, DFS, Gen, and Random]

Motivation

● No existing heuristic consistently achieves high coverage.
● Designing a new heuristic is highly nontrivial.
– Each search heuristic has been a publication-level effort (ICSE, FSE, ASE, NDSS, ...).

[Plot: branches covered vs. iterations on expat-2.1.0 for CFDS, CGS, DFS, Gen, and Random]

Goal

● Automatically generate search heuristics.

● Key ideas
– A parameterized search heuristic.
– An effective parameter search algorithm.

● Our tool takes a C program as input (e.g., vim.c, expat) and outputs a search heuristic tailored to it (e.g., Svim, Sexpat).

Effectiveness

● Considerable increase in branch coverage.

● Found real-world performance bugs.
– gawk-3.0.3: ./gawk ‘V0\\n^000000000000070000000' file
– grep-2.2: ./grep ‘\(\)\1\+**’ file
● The grep input also triggers the error in grep-3.1 (the latest version).

[Plots: branches covered vs. iterations on vim-5.7 and expat-2.1.0, comparing OURS against CFDS, CGS, DFS, Gen, and Random]


Parameterized Search Heuristic

● Search Heuristicθ : Path → Branch
– Generating a “good search heuristic” reduces to finding a “good parameter θ”.

● Example scores for candidate branches B1, B2, B3:
– scoreθ(B1) = 0.1
– scoreθ(B2) = 0.7
– scoreθ(B3) = -0.5

(1) Represent branches as feature vectors.
– A feature: a boolean predicate on branches.
● ex1) Is the branch in the main function?
● ex2) Is it the true branch of a case statement?

– B1 = ⟨1, 0, 1, 1, 0⟩
– B2 = ⟨0, 1, 1, 1, 0⟩
– B3 = ⟨1, 0, 0, 0, 1⟩

● We designed 40 features.
– 12 static features: extracted without executing the program (e.g., true branch of a loop).
– 28 dynamic features: extracted during program execution (e.g., branch newly covered in the previous execution).
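A feature vector like B1 = ⟨1, 0, 1, 1, 0⟩ can be obtained by evaluating a list of boolean predicates on a branch. The sketch below is a hypothetical illustration: the Branch fields and the five predicates are assumptions for this example, not the talk's actual 40 features.

```python
from dataclasses import dataclass

@dataclass
class Branch:
    in_main: bool          # static: does the branch sit in main()?
    true_of_case: bool     # static: true branch of a case statement?
    true_of_loop: bool     # static: true branch of a loop?
    newly_covered: bool    # dynamic: newly covered in the previous run?
    negated_before: bool   # dynamic: was this branch negated before?

# Each feature is a boolean predicate on a branch.
FEATURES = [
    lambda b: b.in_main,
    lambda b: b.true_of_case,
    lambda b: b.true_of_loop,
    lambda b: b.newly_covered,
    lambda b: b.negated_before,
]

def feature_vector(branch: Branch) -> list:
    # Evaluate every predicate, yielding a 0/1 vector such as <1,0,1,1,0>.
    return [int(f(branch)) for f in FEATURES]

b1 = Branch(True, False, True, True, False)
print(feature_vector(b1))  # [1, 0, 1, 1, 0]
```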

(2) Scoring.
– The parameter θ is a k-dimensional vector, e.g., θ = ⟨-0.5, 0.1, 0.4, 0.2, 0⟩.
– The score is the linear combination (dot product) of feature vector and parameter:
● scoreθ(B1) = ⟨1, 0, 1, 1, 0⟩ · ⟨-0.5, 0.1, 0.4, 0.2, 0⟩ = 0.1
● scoreθ(B2) = ⟨0, 1, 1, 1, 0⟩ · ⟨-0.5, 0.1, 0.4, 0.2, 0⟩ = 0.7
● scoreθ(B3) = ⟨1, 0, 0, 0, 1⟩ · ⟨-0.5, 0.1, 0.4, 0.2, 0⟩ = -0.5

(3) Choose the branch with the highest score: B2.

Parameter Search Algorithm

● Finding good parameters is crucial.
● A naive algorithm based on random sampling:
– θ1 = ⟨-0.5, 0.1, 0.4, 0.2, 0⟩ → coverage 519
– θ2 = ⟨-0.9, 0.5, 0.9, -0.2, 1.0⟩ → coverage 423
– …
– θn = ⟨0.7, -0.2, -0.9, -0.9, 0.3⟩ → coverage 782
● It failed to find good parameters:
– The search space is intractably large.
– Concolic testing has high run-to-run performance variation (even the best sample, θ22, timed out).
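The naive baseline can be sketched as uniform sampling from [-1, 1]^k. In the sketch below, run_concolic is a stand-in for a real concolic-testing run, which is where this approach breaks down (cost and run-to-run variance), not in the sampling itself:

```python
import random

def sample_theta(k, rng):
    # Draw each of the k parameter components uniformly from [-1, 1].
    return [rng.uniform(-1.0, 1.0) for _ in range(k)]

def random_search(run_concolic, k=40, n_samples=1000, seed=0):
    # Evaluate n_samples random parameters and keep the best one.
    rng = random.Random(seed)
    best_theta, best_cov = None, float("-inf")
    for _ in range(n_samples):
        theta = sample_theta(k, rng)
        cov = run_concolic(theta)  # branch coverage with this heuristic
        if cov > best_cov:
            best_theta, best_cov = theta, cov
    return best_theta, best_cov
```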

Parameter Search Algorithm

● Our algorithm
– Iteratively refines the sample space via feedback.
– Repeats three steps: Find, Check, Refine.

0. Initialize: each of the 40 per-dimension sample spaces starts as [-1, 1].
● e.g., θ = ⟨-0.5, 0.1, 0.4, 0.2, ..., 0⟩

1. Find: quickly identify good candidate parameters.
– Sample many parameters, run each on grep-2.2, and keep the top 10:
– θ1(1,230), θ2(1,100), θ3(1,321), …, θ1,000(872) (branch coverage in parentheses).

2. Check: rule out unreliable parameters.
– Re-run each top-10 candidate on grep-2.2 several times and average the coverage:
– Avg θ′1(1,310), Avg θ′2(1,457), Avg θ′3(1,436), …, Avg θ′10(1,500).
– Keep the top 2: θt1 and θt2.

3. Refine: use the top-2 parameters,
● θt1 = ⟨+0.3, −0.6, +0.6, ..., +0.8⟩
● θt2 = ⟨+0.7, −0.2, −0.7, ..., +0.8⟩

– Refine each dimension's sample space using the corresponding components of θt1 and θt2:
● 1st dimension (+0.3, +0.7): [-1, 1] → [min(0.3, 0.7), 1] = [0.3, 1]
● 2nd dimension (−0.6, −0.2): [-1, 1] → [-1, max(-0.6, -0.2)] = [-1, -0.2]
● 3rd dimension (+0.6, −0.7): signs disagree, so [-1, 1] remains [-1, 1]
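The per-dimension refinement rule illustrated above (shrink the interval when the top-2 components agree in sign, keep [-1, 1] otherwise) can be written directly:

```python
def refine(space, t1, t2):
    # space: list of (lo, hi) sampling intervals, one per dimension.
    # t1, t2: the two best parameter vectors from the Check step.
    refined = []
    for (lo, hi), a, b in zip(space, t1, t2):
        if a > 0 and b > 0:
            refined.append((min(a, b), hi))   # both positive: raise lower bound
        elif a < 0 and b < 0:
            refined.append((lo, max(a, b)))   # both negative: lower upper bound
        else:
            refined.append((lo, hi))          # signs disagree: keep as is
    return refined

space = [(-1, 1)] * 3
t1 = [+0.3, -0.6, +0.6]
t2 = [+0.7, -0.2, -0.7]
print(refine(space, t1, t2))  # [(0.3, 1), (-1, -0.2), (-1, 1)]
```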

● After one round, all 40 sample spaces are refined:
– 1st: [0.3, 1], 2nd: [-1, -0.2], 3rd: [-1, 1], …
● The Find stage then runs again, randomly sampling parameters from the refined spaces (e.g., θ = ⟨0.5, -0.4, 0.4, 0.2, ..., 0⟩).
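Putting Find, Check, and Refine together, one plausible reading of the overall loop is sketched below; the sample counts are placeholders, and run_concolic again stands in for a real concolic-testing run:

```python
import random

def search_heuristic_params(run_concolic, k=40, iterations=3,
                            n_find=100, top_k=10, n_check=5, seed=0):
    rng = random.Random(seed)
    # 0. Every dimension starts with sampling interval [-1, 1].
    spaces = [(-1.0, 1.0)] * k

    for _ in range(iterations):
        # 1. Find: sample parameters from the current spaces, keep the top_k.
        samples = [[rng.uniform(lo, hi) for lo, hi in spaces]
                   for _ in range(n_find)]
        samples.sort(key=run_concolic, reverse=True)
        top = samples[:top_k]

        # 2. Check: re-run top candidates several times, rank by average
        # coverage to rule out unreliable (lucky) parameters.
        def avg_cov(theta):
            return sum(run_concolic(theta) for _ in range(n_check)) / n_check
        top.sort(key=avg_cov, reverse=True)
        t1, t2 = top[0], top[1]

        # 3. Refine: shrink each dimension where the top-2 agree in sign.
        spaces = [
            (min(a, b), hi) if a > 0 and b > 0 else
            (lo, max(a, b)) if a < 0 and b < 0 else (lo, hi)
            for (lo, hi), a, b in zip(spaces, t1, t2)
        ]
    return t1  # best parameter found in the last round
```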

Experiments

● Implemented in CREST.
● Compared with five existing heuristics: CGS, CFDS, Random, Generational, DFS.
● Used 10 open-source C programs:

Program       Total branches   LOC
vim-5.7       35,464           165K
gawk-3.0.3    8,038            30K
expat-2.1.0   8,500            49K
grep-2.2      3,836            15K
sed-1.17      2,656            9K
tree-1.6.0    1,438            4K
cdaudio       358              3K
floppy        268              2K
kbfiltr       204              1K
replace       196              0.5K

Evaluation Setting

● The same initial inputs.
● The same testing budget (4,000 executions).
● Average branch coverage over 100 trials (50 for vim).
– 1 trial = 4,000 executions.

Effectiveness

● Average branch coverage: ours achieves the best result on all six programs shown.
[Plots: branches covered vs. iterations on grep-2.2, tree-1.6.0, vim-5.7, gawk-3.0.3, expat-2.1.0, and sed-1.17, comparing OURS against CFDS, CGS, DFS, Gen, and Random]

Effectiveness

● The generated heuristics are reusable across subsequent versions of a program.
[Plots: average coverage by version for sed (1.17, 1.18, 2.05: about a year of releases) and gawk (3.0.3 through 3.1.0: about 4 years of releases), comparing OURS, CGS, CFDS, Random, DFS, and Gen]
● Time to obtain the heuristics (with 20 cores):
– vim-5.7 (24h), expat-2.1.0 (10h), grep-2.2 (5h), tree-1.6.0 (3h)

Important Features

● No single feature always belongs to the top 10.
● Depending on the program, the role of a feature changes (e.g., feature 10).
[Tables: top 10 positive features and top 10 negative features]
● Search heuristics should therefore be adaptively tuned for each program.

Tool: ParaDySE

● We make our tool publicly available.
– ParaDySE: Parametric Dynamic Symbolic Execution.
– https://github.com/kupl/ParaDySE

Thank You