Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax...

63
Verifying a compiler: Why? How? How far? Xavier Leroy INRIA Paris-Rocquencourt CGO 2011 X. Leroy (INRIA) Verifying a compiler CGO 2011 1 / 57

Transcript of Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax...

Page 1: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Verifying a compiler:Why? How? How far?

Xavier Leroy

INRIA Paris-Rocquencourt

CGO 2011

X. Leroy (INRIA) Verifying a compiler CGO 2011 1 / 57

Page 2: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Compiler verification:

Why?

X. Leroy (INRIA) Verifying a compiler CGO 2011 2 / 57

Page 3: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Can you trust your compiler?

Executablemachine code

Sourceprogram

Compiler6= ?

The miscompilation issue: Bugs in the compiler can lead to incorrectmachine code being generated from a correct source program.

X. Leroy (INRIA) Verifying a compiler CGO 2011 3 / 57

Page 4: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Miscompilation happens

NULLSTONE isolated defects [in integer division] in twelve of twentycommercially available compilers that were evaluated.

http://www.nullstone.com/htmls/category/divide.htm

We tested thirteen production-quality C compilers and, for each, foundsituations in which the compiler generated incorrect code for accessingvolatile variables.

E. Eide & J. Regehr, EMSOFT 2008

To improve the quality of C compilers, we created Csmith, arandomized test-case generation tool, and spent three years using it tofind compiler bugs. During this period we reported more than 325previously unknown bugs to compiler developers. Every compiler wetested was found to crash and also to silently generate wrong codewhen presented with valid input.

X. Yang, Y. Chen, E. Eide & J. Regehr, PLDI 2011

X. Leroy (INRIA) Verifying a compiler CGO 2011 4 / 57

Page 5: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Exhibit A: GCC bug #323

Title: optimized code gives strange floating point results.

#include <stdio.h>

void test(double x, double y)

{double y2 = x + 1.0; // computed in 80 bits, not rounded to 64 bits

if (y != y2) printf("error_n");

}

void main()

{double x = .012;

double y = x + 1.0; // computed in 80 bits, rounded to 64 bits

test(x, y);

}

Why it is a bug: C99 allows intermediate results to be computed withexcess precision, but requires them to be rounded at assignments.

X. Leroy (INRIA) Verifying a compiler CGO 2011 5 / 57

Page 6: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Exhibit A: GCC bug #323

Reported in 2000.

Dozens of duplicates.

More than 150 comments.

Still not acknowledged as a bug.

“Addressed” in 2009 (in GCC 4.5) via flag-fexcess-precision=standard.

Responsible for PHP’s strtod() function not terminating on someinputs. . .

. . . causing denial of service on many Web sites.

X. Leroy (INRIA) Verifying a compiler CGO 2011 6 / 57

Page 7: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Are miscompilation bugs a problem?

For ordinary software:

Compiler-introduced bugs are negligible compared with the bugs inthe program itself.

Programmers rarely run into them.

When they do, debugging is very hard.

X. Leroy (INRIA) Verifying a compiler CGO 2011 7 / 57

Page 8: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Are miscompilation bugs a problem?

For critical software validated by testing only:

Good testing should find all bugs, even those compiler-introduced.

Optimizations can complicate test plans.

X. Leroy (INRIA) Verifying a compiler CGO 2011 8 / 57

Page 9: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Are miscompilation bugs a problem?

!"#$%&

'%()*+,-%.*(/01%-2(2-2%3456%7849:6%8;6$8<;62%=4>5?$@:%>4@A97$@:9$B2

#$8"872B"79$8C"98D562>4?%%%EE%F%GH%IE%GG%EJ

1K$?LB$%M%04??"@7$6%=$%N4B%1B$>:89O5$6

!"#$%&'()*+),-,&./$)*$)0123)4567)8)%9:&;<=$;)&9+&$,)=$,),+;(>%$,

!"#$%&'()*+ ,"#-%&'.)*+

/0"1234%&'.)*+%-30"12%'5)*+

678812%&')*+

9"1:#$32%&'*)*+

;20<<#="1 >320?34$#"!$#=0"0?12

@233-A3%1&'*)*+

!"#$%&

'%()*+,-%.*(/01%-2(2-2%3456%7849:6%8;6$8<;62%=4>5?$@:%>4@A97$@:9$B2

#$8"872B"79$8C"98D562>4?%%%EE%F%&G%HE%&&%EI

1J59K$?$@:%L%-M6:N?$%L

!"#$%&'($#

)(*+,*+-'./+0$1&"#/.

!'.2.34#1$5/

!67+7'($#

789:;+:.</.0$=#.$(+>".432/&$>'#'$=

For critical software validated by review, analysis & testing:(e.g. DO-178 in avionics)

Manual reviews of (representative fragments of) generated assembly.

Turning all optimizations off to get traceability.

Reduced usefulness of formal verification.

X. Leroy (INRIA) Verifying a compiler CGO 2011 9 / 57

Page 10: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Miscompilation and formal verification

Executable model(Simulink, SCADE, etc)

code generator

Source code (C)

compiler

Executablemachine code

Modelchecking

Programproof

Staticanalysis

The guarantees obtained (so painfully!) by source-level formal verificationmay not carry over to the executable code . . .

X. Leroy (INRIA) Verifying a compiler CGO 2011 10 / 57

Page 11: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

A solution? Verified compilers

With a regular compiler:

Sourceprogram

Executable

compiler

Formal verification

?

X. Leroy (INRIA) Verifying a compiler CGO 2011 11 / 57

Page 12: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

A solution? Verified compilers

With a formally verified compiler:

Sourceprogram

Executable

compiler

Formal verification

observationalequivalence

The properties formally established on the source program carry over to theexecutable.

X. Leroy (INRIA) Verifying a compiler CGO 2011 11 / 57

Page 13: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Formal verification of compilers

A radical solution to the miscompilation problem:

Apply program proof to the compiler itself to prove that it preserves thesemantics of the source code.

After all, compilers are complicated programs with a simple specification:

If compilation succeeds, the generated code should behave asprescribed by the semantics of the source program.

X. Leroy (INRIA) Verifying a compiler CGO 2011 12 / 57

Page 14: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

An old idea. . .

Mathematical Aspects of Computer Science, 1967

X. Leroy (INRIA) Verifying a compiler CGO 2011 13 / 57

Page 15: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

An old idea. . .

Machine Intelligence (7), 1972.

X. Leroy (INRIA) Verifying a compiler CGO 2011 14 / 57

Page 16: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Compiler verification:

How far are we today?

(X. Leroy, Formal verification of a realistic compiler, CACM 07/2009)

X. Leroy (INRIA) Verifying a compiler CGO 2011 15 / 57

Page 17: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

The CompCert project(X.Leroy, S.Blazy, et al)

Develop and prove correct a realistic compiler, usable for critical embeddedsoftware.

Source language: a very large subset of C.

Target language: PowerPC/ARM/x86 assembly.

Generates reasonably compact and fast code⇒ careful code generation; some optimizations.

Note: compiler written from scratch, along with its proof; not trying toprove an existing compiler.

X. Leroy (INRIA) Verifying a compiler CGO 2011 16 / 57

Page 18: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

The subset of C supported

Supported natively:

Types: integers, floats, arrays, pointers, struct, union.

Expressions: all of C, including pointer arithmetic.

Control: if/then/else, loops, goto, regular switch.

Functions, including recursive functions and function pointers.

Dynamic allocation (malloc and free).

Volatile accesses.

X. Leroy (INRIA) Verifying a compiler CGO 2011 17 / 57

Page 19: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

The subset of C supported

Not supported at all:

The long long and long double types.

Unstructured switch (Duff’s device), longjmp/setjmp.

Variable-arity functions.

Supported through (unproved!) expansion after parsing:

Block-scoped variables.

typedef.

Bit-fields.

Assignment between struct or union.

Passing struct or union by value.

X. Leroy (INRIA) Verifying a compiler CGO 2011 18 / 57

Page 20: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

The formally verified part of the compiler

CompCert C Clight C#minor

CminorCminorSelRTL

LTL LTLin Linear

MachAsm

side-effects out

of expressions

type elimination

loop simplifications

stack allocation

of “&” variables

instruction

selection

CFG construction

expr. decomp.

register allocation (IRC)

linearization

of the CFG

spilling, reloading

calling conventions

layout of stack frames

asm code

generation

Optimizations: constant prop., CSE, tail calls,

(LCM), (Software pipelining)

(Instruction scheduling)

X. Leroy (INRIA) Verifying a compiler CGO 2011 19 / 57

Page 21: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Formally verified in Coq

After 50 000 lines of Coq and 4 person.years of effort:

Theorem transf_c_program_is_refinement:

forall p tp,

transf_c_program p = OK tp ->

(forall beh, exec_C_program p beh -> not_wrong beh) ->

(forall beh, exec_asm_program tp beh -> exec_C_program p beh).

Behaviors beh = termination / divergence / going wrong+ trace of I/O operations (syscalls, volatile accesses).

X. Leroy (INRIA) Verifying a compiler CGO 2011 20 / 57

Page 22: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

The whole CompCert compiler

AST C

AST Asm

C source

AssemblyExecutable

parsing, construction of an AST

type-checking, de-sugaring

Verifi

edco

mp

iler

printing of

asm syntax

assembling

linking

Type reconstruction

Graph coloring

Code linearization heuristics

Proved in Coq(extracted to Caml)

Not proved(hand-written in Caml)

Part of the TCB

Not part of the TCB

X. Leroy (INRIA) Verifying a compiler CGO 2011 21 / 57

Page 23: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Performance of generated code(On a PowerPC G5 processor)

AE

S

Alm

aben

ch

Bin

aryt

rees

Fan

nku

ch

FF

T

Kn

ucl

eoti

de

Nb

od

y

Qso

rt

Ray

trac

er

Sp

ectr

al

VM

ach

Execution time

gcc -O0

CompCertgcc -O1gcc -O3

X. Leroy (INRIA) Verifying a compiler CGO 2011 22 / 57

Page 24: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Status

Complete source & proofs available for evaluation and research purposes:

http://compcert.inria.fr/

(+ research papers)

Compiler runs on / produces code for{Linux,MacOSX,Cygwin} / {PPC, ARM, x86}.

Tested on small benchmarks (up to 3000 LOC), real-world avionics codes,and by random testing.

X. Leroy (INRIA) Verifying a compiler CGO 2011 23 / 57

Page 25: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

As of early 2011, the under-development version of CompCert isthe only compiler we have tested for which Csmith cannot findwrong-code errors. This is not for lack of trying: we havedevoted about six CPU-years to the task. The apparentunbreakability of CompCert supports a strong argument thatdeveloping compiler optimizations within a proof framework,where safety checks are explicit and machine-checked, hastangible benefits for compiler users.

X. Yang, Y. Chen, E. Eide & J. Regehr, PLDI 2011

X. Leroy (INRIA) Verifying a compiler CGO 2011 24 / 57

Page 26: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Compiler verification:

How?

X. Leroy (INRIA) Verifying a compiler CGO 2011 25 / 57

Page 27: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Verification patterns(for each compilation pass)

transformation transformation

validator

×

Verified transformation Verified translation validation

= formally verified

= not verified

Verified translation validation:

Less to prove (if validator simpler than transformation).

Strong soundness guarantees, but no completeness in general.

Validator reusable for several variants of an optimization.

X. Leroy (INRIA) Verifying a compiler CGO 2011 26 / 57

Page 28: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

An example of a verified transformation:

register allocation by graph coloring

(X. Leroy, A formally verified compiler back-end, §8, J. Autom. Reasoning 43(4))

(S. Blazy, B. Robillard, A. W. Appel, Formal verification of coalescing graph-coloring

register allocation, ESOP 2010)

X. Leroy (INRIA) Verifying a compiler CGO 2011 27 / 57

Page 29: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Starting point: a CFG for a Register Transfer Language

s = 0.0

i = 0

if (i >= size)

a = i << 2

b = load(tbl, a)

c = float(b)

s = s +f c

i = i + 1

d = float(size)

e = s /f d

return(e)

X. Leroy (INRIA) Verifying a compiler CGO 2011 28 / 57

Page 30: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Algorithm 1: liveness analysis (backward dataflow analysis)

Lout(p) =⋃{transf(Lout(s), instr-at(s)) | s successof of p}

s = 0.0

i = 0

if (i >= size)

a = i << 2

b = load(tbl, a)

c = float(b)

s = s +f c

i = i + 1

d = float(size)

e = s /f d

return(e)

[tbl size s]

[tbl size i s]

[tbl size i s]

[tbl size i s a]

[tbl size i s b]

[tbl size i s c]

[tbl size i s]

[tbl size i s]

[s d]

[e]

X. Leroy (INRIA) Verifying a compiler CGO 2011 29 / 57

Page 31: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Algorithm 2: construct interference graph

For each instruction p : r := . . ., add edges between r and Lout(p) \ {r}.

(+ Chaitin’s special case for moves.) (+ Recording of preferences.)

tbl

size

is

a

b

cd

e

X. Leroy (INRIA) Verifying a compiler CGO 2011 30 / 57

Page 32: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Algorithm 3: Coloring of the interference graph

Construct a function φ : Variable → Register + Stackslot such thatφ(x) 6= φ(y) if x and y interfere.

We use the Iterated Register Coalescing heuristic by George & Appel.

tbl

size

is

a

b

cd

e

tbl

size

is

a

b

cd

e

X. Leroy (INRIA) Verifying a compiler CGO 2011 31 / 57

Page 33: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Algorithm 4: Rewriting the code

Replace all variables x by their color φ(x).(Spilling & reloading are done in a later pass.)

f1 = 0.0

r3 = 0

if (r3 >= r2)

r4 = r3 << 2

r4 = load(r1, r4)

f2 = float(r4)

f1 = f1 +f f2

r3 = r3 + 1

f2 = float(r2)

f1 = f1 /f f2

return(f1)

X. Leroy (INRIA) Verifying a compiler CGO 2011 32 / 57

Page 34: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

What needs to be proved? Part 1: proofs of algorithms

Liveness analysis: show that the mapping Lout computed by Kildall’sfixpoint algorithm satisfies the inequations

Lout(p) ⊇ transf(Lout(s), instr-at(p)) if s successor of p

Construction of the interference graph: show that the final graph Gcontains all expected edges, e.g.

p : x := . . . ∧ y 6= x ∧ y ∈ Lout(p) =⇒ (x , y) ∈ G

Coloring of the interference graph: show that (x , y) ∈ G =⇒ φ(x) 6= φ(y)

Either by direct proof (Blazy, Robillard, Appel) or by verified validation:

Validator: enumerate all edges (x , y) of G and abort if φ(x) = φ(y)

Correctness proof for the validator: trivial.

X. Leroy (INRIA) Verifying a compiler CGO 2011 33 / 57

Page 35: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

What needs to be proved? Part 2: semantic preservation

What does “x is live at p” means, semantically?

Hmmm . . .

What does “x is dead at p” means, semantically?

That the program behaves the same regardless of the value of x at point p.

Invariant

Let E : variable → value be the values of variables at point p in theoriginal program. Let R : location→ value be the values of locations atpoint p in the transformed program.E and R agree at p, written p ` E ≈ R, iff

E (x) = R(φ(x)) for all x live before point p

X. Leroy (INRIA) Verifying a compiler CGO 2011 34 / 57

Page 36: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

What needs to be proved? Part 2: semantic preservation

What does “x is live at p” means, semantically?

Hmmm . . .

What does “x is dead at p” means, semantically?

That the program behaves the same regardless of the value of x at point p.

Invariant

Let E : variable → value be the values of variables at point p in theoriginal program. Let R : location→ value be the values of locations atpoint p in the transformed program.E and R agree at p, written p ` E ≈ R, iff

E (x) = R(φ(x)) for all x live before point p

X. Leroy (INRIA) Verifying a compiler CGO 2011 34 / 57

Page 37: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

What needs to be proved? Part 2: semantic preservation

What does “x is live at p” means, semantically?

Hmmm . . .

What does “x is dead at p” means, semantically?

That the program behaves the same regardless of the value of x at point p.

Invariant

Let E : variable → value be the values of variables at point p in theoriginal program. Let R : location→ value be the values of locations atpoint p in the transformed program.E and R agree at p, written p ` E ≈ R, iff

E (x) = R(φ(x)) for all x live before point p

X. Leroy (INRIA) Verifying a compiler CGO 2011 34 / 57

Page 38: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Proving semantic preservation

Show a simulation diagram of the form

p,E ,Mp ` E ≈ R

p,R,M

p′,E ′,M ′

t

?........................................

p′ ` E ′ ≈ R ′

p′,R ′,M ′

t

?

................

Hypotheses: left, a transition in the original code; top, the invariant(register agreement) before the transition.

Conclusions: one transition in the transformed code; bottom, the invariantafter the transition.

X. Leroy (INRIA) Verifying a compiler CGO 2011 35 / 57

Page 39: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Semantic preservation for whole executions

(initial state) S1invariant

T1 (initial state)

S2

ε ?

invariantT2

ε?

S3

ν1 ?

invariantT3

ν1?

S4

ν2 ?

invariantT4

ν2?

(final state) S5

ε ?

invariantT5 (final state)

ε?

Proves that the original program and the transformed program have thesame behavior (the trace t = ν1.ν2).

X. Leroy (INRIA) Verifying a compiler CGO 2011 36 / 57

Page 40: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

An example of verified translation validation:

software pipelining

(J.-B. Tristan and X. Leroy, A simple, verified validator for software pipelining,

POPL 2010)

X. Leroy (INRIA) Verifying a compiler CGO 2011 37 / 57

Page 41: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Software pipelining

LDLDMULADDST

LDLDMULADDST

LDLDMULADDST

LDLDMULADDST

Original loop (4 iterations) Pipelined loop

X. Leroy (INRIA) Verifying a compiler CGO 2011 38 / 57

Page 42: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Software pipelining

LDLDMULADDST

LDLDMULADDST

LDLDMULADDST

LDLDMULADDST

MUL LDST ADD LD

Steady

state

Original loop (4 iterations) Pipelined loop

X. Leroy (INRIA) Verifying a compiler CGO 2011 38 / 57

Page 43: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Software pipelining

LDLDMULADDST

LDLDMULADDST

LDLDMULADDST

LDLDMULADDST

MUL LDST ADD LD

Steady

state

LDLDMUL LDLDMUL LDADD LD

Prologue

MULST ADDST ADDST

Epilogue

Original loop (4 iterations) Pipelined loop

X. Leroy (INRIA) Verifying a compiler CGO 2011 38 / 57

Page 44: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

The pipeliner as a black box

pipeline(B, i ,N) = (P ,S, E , µ, δ)

original loop body

loop index variable

loop bound variable

prologue

steady state

epilogue

initiation intervalunrolling factor

Effect on the loop variable i :

B : +1 P : +µ S : +δ E : +0

X. Leroy (INRIA) Verifying a compiler CGO 2011 39 / 57

Page 45: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

The corresponding loop transformation

i := 0

if(i < N)

B

in:

out

i := 0

if(N ≥ µ)

M := N − µM := M / δ

M := M × δM := M + µ

P

if(i < M)

SE

if(i < N)

B

in:

out

Before: After:

X. Leroy (INRIA) Verifying a compiler CGO 2011 40 / 57

Page 46: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

How to validate it?

For a fixed number of iterations N = µ+ n × δ + m, consider theunrollings of the two loops:

Original loop: BN (= N copies of B)

Pipelined loop: P ;Sn; E ;Bm

If n,m are known at compile-time, we can use symbolic evaluation to showsemantic equivalence between these two basic blocks.

X. Leroy (INRIA) Verifying a compiler CGO 2011 41 / 57

Page 47: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Basics of symbolic evaluation

Interpret instructions as substitutions:

α(x := y + z) = x 7→ y + z

Interpret basic blocks by composing substitutions:

x u Mem

+1

store

load

*

y x Memz

α

y := x + 1;store(u, y);z := load(u);x := y ∗ z ;

=

X. Leroy (INRIA) Verifying a compiler CGO 2011 42 / 57

Page 48: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Validation by comparison of symbolic evaluations

Fundamental property: two basic blocks B1 and B2 that have the samesymbolic evaluation (α(B1) = α(B2)) are semantically equivalent.

Special care must be taken for:

Comparing modulo algebraic laws and load/store properties:

(e + 1) + 2 = e + 3

load(store(m, e1, e2), e3) = load(m, e3) if e1 6= e3

Tracking operations that can fail at run-time (integer division).

Excluding fresh temporaries (introduced by Modulo VariableExpansion) from the comparison.

X. Leroy (INRIA) Verifying a compiler CGO 2011 43 / 57

Page 49: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Checking semantic equivalence of the two loops

Original loop: Bµ+n×δ+m

Pipelined loop: P ;Sn; E ;Bm

How to check semantic equivalence for all counts n,m ?

Bµ Bδ Bδ Bm

P S S

E

Original:

Pipelined:

X. Leroy (INRIA) Verifying a compiler CGO 2011 44 / 57

Page 50: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Checking semantic equivalence

Consider the case n = m = 0 and N = µ:

P

E

Original:

Pipelined:

First (necessary) condition: P ; E ≈ Bµ

(with ≈ denoting semantic equivalence).

X. Leroy (INRIA) Verifying a compiler CGO 2011 45 / 57

Page 51: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Checking semantic equivalence

What if we exit the pipelined loop one step earlier?

(I.e. n 7→ n − 1 and m 7→ m + δ and N unchanged.)

Bµ Bδ Bδ Bm

P S S

E

Original:

Pipelined:

X. Leroy (INRIA) Verifying a compiler CGO 2011 46 / 57

Page 52: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Checking semantic equivalence

What if we exit the pipelined loop one step earlier?

(I.e. n 7→ n − 1 and m 7→ m + δ and N unchanged.)

Bµ Bδ Bδ Bm

P S S

E

Original:

Pipelined:

E

Second condition: S; E ≈ E ;Bδ

X. Leroy (INRIA) Verifying a compiler CGO 2011 46 / 57

Page 53: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

The validation algorithm

validate (i ,N,B) (P,S, E , µ, δ) θ =α(Bµ) ≈θ α(P; E) (A)∧ α(E ;Bδ) ≈θ α(S; E) (B)∧ α(B) v θ (C)∧ α(B)(i) = i + 1 (D)∧ α(P)(i) = i + µ (E)∧ α(S)(i) = i + δ (F)∧ α(E)(i) = i (G)∧ α(B)(N) = α(P)(N) = α(S)(N) = α(E)(N) = N (H)

(Where θ is the set of observed variables, α is symbolic evaluation,≈θ a syntactic equivalence, and v a condition on free variables.)

X. Leroy (INRIA) Verifying a compiler CGO 2011 47 / 57

Page 54: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Soundness of the validator

validate (i ,N,B) (P,S, E , µ, δ) θ = true

∀n,m, α(Bµ+δ×n+m) ≈θ α(P;Sn; E ;Bm) v θ

(R1,M)

(R2,M)

(R ′1,M′)

(R ′2,M′)

execution oforiginal loop, unrolled

execution ofpipelined loop, unrolled

agree on θ agree on θ

X. Leroy (INRIA) Verifying a compiler CGO 2011 48 / 57

Page 55: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Summary

α(Bµ) ≈θ α(P; E) ∧ α(E ;Bδ) ≈θ α(S; E)

A surprisingly simple validator for a difficult optimization.

So simple it could go into regular, non-verified compilers.

The validator uses completely different concepts from the optimization.(No RAW/WAR/WAW dependencies; no modulo variable expansion; etc)

The validator is incomplete in theory, but appears complete againstpublished modulo-scheduling algorithms.

X. Leroy (INRIA) Verifying a compiler CGO 2011 49 / 57

Page 56: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Compiler verification:

How far can we go?

X. Leroy (INRIA) Verifying a compiler CGO 2011 50 / 57

Page 57: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Current status

At this stage of the CompCert experiment, the initial goal – provingcorrect a nontrivial compiler – appears feasible.

(Within the limitations of today’s proof assistants such as Coq.)

This opens up many directions for future work.

X. Leroy (INRIA) Verifying a compiler CGO 2011 51 / 57

Page 58: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Directions for future work

Higher assurance:

Prove more of the unproved parts (e.g. parsing, bit-field emulation).

Formal connections with verification tools.

Formal connections with hardware (microarchitecture) verification.

Other source languages:

Reactive languages (M. Pouzet, M. Pantel, M. Strecker).

Functional languages (Z. Dargaye, A. Tolmach).

Elements of C++ (T. Ramananandro, G. Dos Reis).

Connections with run-time system verification (A. Tolmach et al).

X. Leroy (INRIA) Verifying a compiler CGO 2011 52 / 57

Page 59: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Directions for future work

Shared-memory concurrency:

Verified Software Toolchain (A. Appel et al).

CompCertTSO (P. Sewell et al).

Shorter, easier proofs:

More proof automation.

Generic execution engine for optimizations (S. Lerner & Z. Tatlock)

X. Leroy (INRIA) Verifying a compiler CGO 2011 53 / 57

Page 60: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

More optimizations!

Done in CompCert and related experiments:

Verified transformations Verified translation validation

Constant propagation Lazy Code Motion

CSE over ext. basic blocks Trace scheduling

Software pipelining

Register allocation Register allocation

w/ trivial spilling w/ advanced spilling & splitting

Much remains to be done, in particular:

Loop optimizations.

SSA-based optimizations.

Verified translation validation as the path of least resistance?

X. Leroy (INRIA) Verifying a compiler CGO 2011 54 / 57

Page 61: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

SSA and what to prove about it

Plan A: formalize and reason about SSA semantics.

RTL IntoSSA

SSA SSAoptims

SSA Out ofSSA

RTL

Plan B: use SSA in untrusted implementations; validate over RTL.

RTL IntoSSA

SSA SSAoptim

SSA Out ofSSA

RTL × RTL

Validator

Is SSA just an algorithmic device? (faster, simpler, more powerful optims)Or does SSA have deep semantic properties as well?

X. Leroy (INRIA) Verifying a compiler CGO 2011 55 / 57

Page 62: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

Loop optimizations

Optimizations such as loop interchange or blocking appear difficult toprove by elementary simulation arguments.

A promising approach: validation a posteriori with the polyhedral model.(Work in progress by A. Pilkiewicz; see also Ph. Clauss et al)

originalloop nest

optimizedloop nest

untrustedoptimizer

polyhedral

model 1

polyhedral

model 2

polyhedral

extraction

polyhedral

extraction

validator

X. Leroy (INRIA) Verifying a compiler CGO 2011 56 / 57

Page 63: Verifying a compiler: Why? How? How far?type-checking, de-sugaring compiler printing of asm syntax assembling linking Type reconstruction Graph coloring Code linearization heuristics

In closing. . .

The formal verification of compilers and other development andverification tools

. . . is a fascinating challenge,

. . . appears within reach,

. . . and has practical importance for critical software.

Much remains to be done. Feel free to join!

X. Leroy (INRIA) Verifying a compiler CGO 2011 57 / 57