An Introduction to Proof-Carrying Code
Peter Lee
Carnegie Mellon University
Lecture 1
October 29, 2001
ConCert Meeting
Plan
Today: Show and tell.
  Cartoons
  Some history
  Special J compiler
  Demo
Next time: Technical details.
  LFi and Oracle-based checking
  Safety policies
  Compiler strategy and annotations
  Engineering considerations
  Ideas for ConCert-related projects
Ariane 5
On June 4, 1996, the Ariane 5 took off on its maiden flight.
40 seconds into its flight it veered off course and exploded.
The failure was later traced to an error in the reuse of a software component.
For the next two years, virtually every research presentation used this picture.
“Better, Faster, Cheaper”
In 1999, NASA lost both the Mars Polar Lander and the Climate Orbiter.
Later investigations determined software errors were to blame.
Orbiter: Component reuse error.
Lander: Precondition violation.
USS Yorktown
“After a crew member mistakenly entered a zero into the data field of an application, the computer system proceeded to divide another quantity by that zero. The operation caused a buffer overflow, in which data leaked from a temporary storage space in memory, and the error eventually brought down the ship's propulsion system. The result: the Yorktown was dead in the water for more than two hours.”
Programmable mobile devices
By 2003, one in five people will own a mobile communications device.
Nokia expects to sell 500M Java-enabled phones in 2003.
Most of these devices will be power and memory limited.
Security Attacks
According to CERT, the majority of security attacks exploit
input validation failure
buffer overflow
VBS
http://www.cert.org/summaries/CS-2000-04.html
BSOD embarrassments
Observations
Failures often due to simple problems “in the details.”
Reuse is critical but perilous.
Performance still matters a lot.
Safety Engineering
Small theorems about large programs would be useful.
Need clearly specified interfaces and checking of interface compliance.
Must not sacrifice performance.
The Code Safety Problem
Please install and execute this.
Code Safety
CPU
Code
Trusted Host
Is this safe to execute?
Theorem Prover
Approach 4: Formal Verification
CPU
Code
Flexible and powerful.
Trusted Host
But really, really, really hard, and must be correct.
A Key Idea: Explicit Proofs
Certifying Prover
CPU
Proof Checker
Code
Proof
Trusted Host
A Key Idea: Explicit Proofs
Certifying Prover
CPU
Code
Proof
No longer need to trust this component.
Proof Checker
Proof-Carrying Code [Necula & Lee, OSDI ’96]
A
B
Formal proof or “explanation” of safety
Typically native or VM code
rlrrllrrllrlrlrllrlrrllrrll…
Proof-Carrying Code
Certifying Prover
CPU
Code
Proof
Simple, small (<52KB), and fast.
No longer need to trust this component.
Proof Checker
Reasonable in size (0-10%).
Automation via Certifying Compilation
Certifying Compiler
CPU
Looks and smells like a compiler.
% spjc foo.java bar.class baz.c -ljdk1.2.2
Source code
Proof
Object code
Certifying Prover
Proof Checker
The Role of Programming Languages
Civilized programming languages can provide “safety for free”.
Well-formed/well-typed ⇒ safe.
Idea: Arrange for the compiler to “explain” why the target code it generates preserves the safety properties of the source program.
The Role of Java in this Short Course
In recent years, Java has been the main focus of my work.
Java is just barely a civilized programming language.
We routinely do better than this.
Java
Java is probably a worthwhile subject of research.
However, it contains many outrageous and mostly inexcusable design errors.
As researchers, we should not forget that we have already done much better, and must continue to do better in the future.
Note
Our current approach seems to work for many problems.
But it is the only one we have tried — there are many others.
PCC is a general concept and we have just barely scratched the surface.
Overview of Our Approach
Please install and execute this.
OK, but let me quickly look over the instructions first.
Code producer Host
Overview of Our Approach
Code producer Host
Overview of Our Approach
This store instruction is dangerous!
Code producer Host
Overview of Our Approach
Can you prove that it is always safe?
Code producer Host
Overview of Our Approach
Can you prove that it is always safe?
Yes! Here’s the proof I got from my certifying Java compiler!
Code producer Host
Overview of Our Approach
Your proof checks out. I believe you because I believe in logic.
Code producer Host
Some History
History: early 90’s
Fox project starts building the FoxNet
Need to control memory layout of data
  Words, bytes, etc. (endianness? alignment?)
  Boxed vs unboxed data (efficiency? control?)
  Packet headers (how to write packet filters?)
ML not expressive enough, and compiler technology is inadequate
Harper invents intensional polymorphism, typed intermediate languages, and type-directed compiling
Biagioni, et al., extend SML design
History: mid 90’s
Question: Can these ideas be used in a “production-quality” compiler for a big language like ML?
Morrisett and Tarditi build TIL
  General hints on IL design
  Encouraging signs that optimizations are OK
Stone and Harper design the MIL
Lots of work, world-wide, on type-directed compiling
Work begins on TILT
History: mid 90’s
An easy observation in 1995:
  Types in TIL are not carried all the way down to the final target code
  The idea of enclosing LF encodings of proofs with code is “floating around”
Lee and Necula work on this, but get nowhere
  Many problems, such as optimizations
Necula goes to DEC SRC to intern with Detlefs and Nelson
Works on extending ESC to catch memory leaks in Modula-3 programs
The next Fall, takes Frank’s Constructive Logic course
History: 1996
Necula and Lee write several standard BPF packet filters in hand-optimized Alpha assembly code.
Simple operational semantics for a core “safe Alpha”
  – Checks safety conditions for each instruction execution
Proof system for “real Alpha”
  – Encoded in LF
  – Proofs generated and checked using Elf
Results in “self-certified code”, later “proof-carrying code”
Plus proof representations, certifying compilation, safety policies (incl. resource bounds)
Inspires significant follow-on and new work at Cornell, Princeton, INRIA, and many other places
History: 1999
CMU releases PCC to Cedilla Systems Incorporated.
Patent 6,128,774. Oct. 2000, Safe to execute verification of software (Necula and Lee)
Patent 6,253,370. June 2001, Method and apparatus for annotating a computer program to facilitate subsequent processing of the program (Abadi, Ghemawat, and Stata)
In less than 26 months, a complete optimizing “ahead-of-time” PCC compiler for Java.
“Applets, Not Craplets”
History: Today
Strong similarities in TILT, PCC, TAL, …
Compiler design is changing
Some day, all compilers will be certifying
History: Today
Are proofs really necessary?
Probably not
And they are messy, compared to types
But as a verification mechanism, proofchecking seems to have some possibly significant engineering advantages over typechecking
The primary contribution
“Proof engineering”.
PCC more clearly defined the proof-engineering problem
How to do checking with minimal overhead and restriction on programs, with minimal time and space overhead in checking, with minimal size and complexity of the checker, and with minimal need for changes when the proof system changes
K Virtual Machine
Designed to support the CLDC.
Must fit into <128KB.
Must have fast bytecode verification.
kJava class files must be Java-compatible.
Divides bytecode verification into two stages.
kJava and KVM
kJava Compiler
CPU
Sourcecode
Annot
Bytecodes
kJava Preverifier
Verifier
KVM Verification
“Preverification” is performed by the code producer.
Uses global (iterative) analysis to compute the types of stack slots and local vars at every join point.
Second stage is performed by class loader.
Simple linear scan verifies correctness of join-point annotations.
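The linear scan can be sketched in a few lines of Python. This is only an illustration of the idea, not the KVM's implementation: the tiny instruction set, the `verify` function, the "top" marker for an unset local, and the one-edge class hierarchy are all hypothetical, and instruction indices stand in for bytecode offsets.

```python
SUBTYPES = {("Long", "Number")}  # hypothetical class hierarchy: Long <: Number

def assignable(src, dst):
    # "top" marks an unset slot that accepts anything (an assumption of this sketch).
    return src == dst or dst == "top" or (src, dst) in SUBTYPES

def state_ok(state, expected):
    (loc, stk), (eloc, estk) = state, expected
    return (len(loc) == len(eloc) and len(stk) == len(estk)
            and all(assignable(a, b) for a, b in zip(loc + stk, eloc + estk)))

def verify(code, annotations, entry):
    """Single forward pass; `annotations` maps join points to (locals, stack) types."""
    cur = entry
    for pc, (op, *args) in enumerate(code):
        if pc in annotations:
            # Falling into a join point: the current state must fit the annotation.
            if cur is not None and not state_ok(cur, annotations[pc]):
                return False
            cur = annotations[pc]
        if cur is None:
            return False  # reachable only by a jump, yet carries no annotation
        loc, stk = cur
        if op == "aload":
            stk = stk + (loc[args[0]],)
        elif op == "astore":
            if not stk:
                return False
            loc = loc[:args[0]] + (stk[-1],) + loc[args[0] + 1:]
            stk = stk[:-1]
        elif op == "invoke":  # pops one argument (or receiver), pushes the result
            param, result = args
            if not stk or not assignable(stk[-1], param):
                return False
            stk = stk[:-1] + (result,)
        elif op == "ifne":
            if not stk or stk[-1] != "int":
                return False
            stk = stk[:-1]
        elif op not in ("goto", "return"):
            raise ValueError(op)
        cur = (loc, stk)
        if op in ("goto", "ifne"):
            # Jumps are checked against the target's annotation, never followed.
            target = args[0]
            if target not in annotations or not state_ok(cur, annotations[target]):
                return False
        if op in ("goto", "return"):
            cur = None  # nothing falls through
    return True
```

No fixpoint iteration is needed: the annotations supply what a global analysis would otherwise have to compute, so each instruction is visited exactly once.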
KVM Example[from Frank Yellin]
static void test(Long x) {
    Number y = x;
    while (y.intValue() != 0) {
        y = nextValue(y);
    }
    return;
}
 0. aload_0
 1. astore_1
 2. goto 10
    Long Number | <>
 5. aload_1
 6. invokestatic nextValue(Number)
 9. astore_1
    Long Number | <>
10. aload_1
11. invokevirtual intValue()
14. ifne 5
17. return
Join-point typing annotations
KVM Verification
The second stage verifier is a 10KB program that requires
a single scan of the code, and
<100 bytes of run-time storage.
Impressive!
This is Java verification done right.
Join-Point Annotations
All of these approaches to certified code make use of join-point typing annotations to reduce code verification to a simple problem.
They are essentially the classical loop invariants of the Dijkstra/Hoare program verification approach.
Overheads
In TAL and PCC we observe relatively large annotation sizes (~10-20%), sometimes much more.
Unknown for kJava.
Research question:
Can we reduce this size?
Checking speed and storage space are also a problem.
The Special J Compiler
High-Level Architecture
Explanation
Code
Verification condition generator
Checker
Safety policy
Agent
Host
The VCGen
The verification condition generator (VCGen) examines each instruction.
It is a symbolic evaluator that essentially implements the operational semantics of a “safe” version of the machine language.
It checks some simple properties directly, e.g., that direct jumps go to legal addresses.
Informally, it invokes the Checker when “dangerous” instructions are encountered.
The VCGen, cont’d
Examples of dangerous instructions:
memory operations
procedure calls
procedure returns
For each such instruction, VCGen creates a verification condition (VC).
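A minimal Python sketch of this behaviour, under loud assumptions: the three-instruction language, the `vcgen` function, and the tuple encoding of expressions are hypothetical and far simpler than the real VCGen. It handles only straight-line code and emits one VC per memory operation.

```python
def vcgen(code, regs, mem="rm_1"):
    """Symbolically evaluate straight-line code; return the list of VCs.

    `regs` maps register names to symbolic values; `mem` names the
    initial memory state. Expressions are nested tuples.
    """
    regs = dict(regs)
    vcs = []

    def val(x):
        # A register evaluates to its symbolic value, anything else is a literal.
        return regs.get(x, x)

    for op, *args in code:
        if op == "add":                      # add dst, a, b
            dst, a, b = args
            regs[dst] = ("add", val(a), val(b))
        elif op == "load":                   # dst = mem[addr]
            dst, addr = args
            vcs.append(("saferd", val(addr)))          # dangerous: emit a VC
            regs[dst] = ("sel", mem, val(addr))
        elif op == "store":                  # mem[addr] = src
            addr, src = args
            vcs.append(("safewr", val(addr), val(src)))  # dangerous: emit a VC
            mem = ("upd", mem, val(addr), val(src))
        else:
            raise ValueError(op)
    return vcs
```

For example, `vcgen([("add", "t", "p", 8), ("load", "v", "t")], {"p": "p_1"})` yields the single VC `("saferd", ("add", "p_1", 8))`. Nothing is executed; the VCs talk about symbolic values and so cover every run.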
The Checker
When given a VC, the Checker attempts to determine its validity.
Sometimes, it consults the “explanation” for help with this.
If successful, it allows VCGen to proceed.
The set of allowable VCs and their valid proofs is defined by the safety policy.
The Safety Policy
The safety policy is defined by an inference system that defines
  the language of predicates (for VCs),
  the axioms and inference rules for writing valid proofs of VCs, and
  specifications (pre/post-conditions) for each required entry point in the code.
Operational Semantics
The VCGen is derived (by hand) directly from the operational semantics of a “safe machine”.
The calls to the checker establish that the code always makes progress (or halts normally) in the operational semantics.
This leads to a standard notion of soundness.
What Can’t Be Enforced?
Liveness properties currently cannot be enforced by this architecture.
In practice, however, safety properties are often “good enough”.
Architecture
Code producer Host
Ginseng
Native code
Proof
Special J
Java binary
~52KB, written in C
Written in OCaml
Annotations
Architecture
Code producer Host
Proof checker
VCGen
Axioms
Native code
Proof
VC
Special J
Java binary
Annotations
Architecture
Code producer Host
Java binary
Proof generator
Proof checker
VCGen
Axioms
Axioms
Certifying compiler
VCGen
VC
Native code
Proof
VC
Java Virtual Machine
JVM
Java Verifier
JNI
Class file Class file
Native code
Proof-carrying code
Checker
Show either the Mandelbrot or NBody3D demo.
Crypto Test Suite Results [Cedilla Systems]
[Bar chart: running time in seconds (axis 0 to 0.9) for Cedilla, Java, and JIT.]
On average, 72.8% faster than Java, 37.5% faster than Java with a JIT.
Java Grande Suite v2.0 [Cedilla Systems]
[Bar chart: running time in seconds (axis 0 to 700) for Cedilla, Java, and JIT.]
Java Grande Bench Suite [Cedilla Systems]
[Bar chart: ops (axis 0 to 5,000,000) on the arith, assign, and method benchmarks for Cedilla, Java, and JIT.]
Ginseng
VCGen
Checker
Safety Policy
Dynamic loading
Cross-platform support
~15KB, roughly similar to a KVM verifier (but with floating-point).
~4KB, generic.
~19KB, declarative and machine-generated.
~22KB, some optional.
Example: Source Code
public class Bcopy {
    public static void bcopy(int[] src, int[] dst) {
        int l = src.length;
        int i = 0;
        for (i = 0; i < l; i++) {
            dst[i] = src[i];
        }
    }
}
Example: Target Code
ANN_LOCALS(_bcopy__6arrays5BcopyAIAI, 3)
.text
.align 4
.globl _bcopy__6arrays5BcopyAIAI
_bcopy__6arrays5BcopyAIAI:
    cmpl $0, 4(%esp)
    je L6
    movl 4(%esp), %ebx
    movl 4(%ebx), %ecx
    testl %ecx, %ecx
    jg L22
    ret
L22:
    xorl %edx, %edx
    cmpl $0, 8(%esp)
    je L6
    movl 8(%esp), %eax
    movl 4(%eax), %esi
L7:
    ANN_LOOP(INV = {
        (csubneq ebx 0),
        (csubneq eax 0),
        (csubb edx ecx),
        (of rm mem)},
      MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM))
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx, %edx, 4), %edi
    movl %edi, 8(%eax, %edx, 4)
    incl %edx
    cmpl %ecx, %edx
    jl L7
    ret
L13:
    call __Jv_ThrowBadArrayIndex
    ANN_UNREACHABLE
    nop
L6:
    call __Jv_ThrowNullPointer
    ANN_UNREACHABLE
    nop
Cut Points
Each loop entry must be annotated as a cut point.
VCGen requires this so that checking can be performed in a single scan of the code.
As a convenience, the modified registers are also declared in the cut annotations.
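One way a checker might enforce the cut-point requirement is to reject any backward jump whose target carries no annotation. The Python sketch below illustrates that check only; the dictionary encoding of instructions and the `missing_cut_points` helper are hypothetical, not part of Special J or its VCGen.

```python
def missing_cut_points(code):
    """Return labels that are targets of backward jumps but not marked as cuts.

    Each instruction is a dict with optional keys "label", "target", "cut".
    """
    position = {ins["label"]: i for i, ins in enumerate(code) if ins.get("label")}
    missing = set()
    for i, ins in enumerate(code):
        target = ins.get("target")
        # A jump to an earlier (or the same) instruction closes a loop.
        if target in position and position[target] <= i:
            if not code[position[target]].get("cut"):
                missing.add(target)
    return sorted(missing)
```

With every loop entry annotated, a forward pass never needs to revisit an instruction: on reaching a backward jump it only checks the invariant at the target.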
A Note about Memory
We define a type for valid heap memory states:
mem : exp
and operators for reading and writing heap memory:
(sel M A)
(upd M A E)
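A concrete reading of these two operators, sketched in Python. This is an assumption for illustration: the logic treats `sel` and `upd` as uninterpreted symbols constrained by axioms, whereas here a memory state is simply modelled as a dictionary from addresses to values.

```python
def sel(m, a):
    """Read the value stored at address a in memory state m."""
    return m[a]

def upd(m, a, e):
    """Return a NEW memory state equal to m except that address a holds e."""
    new = dict(m)
    new[a] = e
    return new

# The usual read-over-write laws hold in this model:
#   sel(upd(M, A, E), A) == E
#   sel(upd(M, A, E), B) == sel(M, B)   when B != A
```

Because `upd` never mutates its argument, earlier memory states stay available, which is what lets the symbolic state refer to both `rm_1` and `rm_2`.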
The VCGen Process (1)
                                  A0 = (type src_1 (jarray jint))
                                  A1 = (type dst_1 (jarray jint))
                                  A2 = (type rm_1 mem)
_bcopy__6arrays5BcopyAIAI:
    cmpl $0, src
    je L6                         A3 = (csubneq src_1 0)
    movl src, %ebx                ebx := src_1
    movl 4(%ebx), %ecx            ecx := (sel4 rm_1 (add src_1 4))
    testl %ecx, %ecx
    jg L22                        A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0)
    ret
L22:
    xorl %edx, %edx               edx := 0
    cmpl $0, dst
    je L6                         A5 = (csubneq dst_1 0)
    movl dst, %eax                eax := dst_1
    movl 4(%eax), %esi            esi := (sel4 rm_1 (add dst_1 4))
L7: ANN_LOOP(INV = …
The VCGen Process (2)
L7: ANN_LOOP(INV = {
        (csubneq ebx 0),          A3
        (csubneq eax 0),          A5
        (csubb edx ecx),          A6 = (csubb 0 (sel4 rm_1 (add src_1 4)))
        (of rm mem)},
      MODREG = (EDI, EDX, EFLAGS, FFLAGS, RM))
                                  edi := edi_1
                                  edx := edx_1
                                  rm := rm_2
    cmpl %esi, %edx
    jae L13                       A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4)))
    movl 8(%ebx,%edx,4), %edi     !!Verify!! (saferd4 (add src_1 (add (imul edx_1 4) 8)))
    movl %edi, 8(%eax,%edx,4)
    …
The Checker (1)
The checker is asked to verify that
    (saferd4 (add src_1 (add (imul edx_1 4) 8)))
under assumptions
    A0 = (type src_1 (jarray jint))
    A1 = (type dst_1 (jarray jint))
    A2 = (type rm_1 mem)
    A3 = (csubneq src_1 0)
    A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0)
    A5 = (csubneq dst_1 0)
    A6 = (csubb 0 (sel4 rm_1 (add src_1 4)))
    A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4)))
The checker looks in the PCC for a proof of this VC.
The Checker (2)
In addition to the assumptions, the proof may use axioms and proof rules defined by the host, such as
szint : pf (size jint 4)
rdArray4: {M:exp} {A:exp} {T:exp} {OFF:exp}
          pf (type A (jarray T)) ->
          pf (type M mem) ->
          pf (nonnull A) ->
          pf (size T 4) ->
          pf (arridx OFF 4 (sel4 M (add A 4))) ->
          pf (saferd4 (add A OFF)).
Checker (3)
A proof for
(saferd4 (add src_1 (add (imul edx_1 4) 8)))
in the Java specification looks like this (excerpt):
(rdArray4 A0 A2 (sub0chk A3) szint (aidxi 4 (below1 A7)))
This proof can be easily validated via LF type checking.
VCGen Summary
VCGen is a symbolic evaluator for the object language.
It essentially implements a reference interpreter, except:
it uses symbolic values in order to model all possible executions, and
instead of performing run-time checks, it asks a Checker to verify the safety of “dangerous” instructions.
Safety Policies
More formally, we begin by defining the small-step operational semantics of a machine (called the s86).
$(\Pi, \rho, pc) \xrightarrow{instr} (\rho', pc')$
where $\Pi$ is the program, $\rho$ the register state, and $pc$ the program counter.
We define the machine so that only safe executions are defined.
Safety Policies, cont’d
For convenience we choose the s86 to be a restriction of the x86.
Hence all s86 programs will execute faithfully on a real x86.
Except that on programs for which the s86 has no defined execution, the x86 might do something weird.
The goal then is to prove that any given program always makes progress (or returns) in the s86.
With such a proof, the x86 is then just as good as an s86.
Verification Conditions
The point of the verification conditions, then, is to provide such progress theorems for each instruction in the program.
In other words, a VC’s validity says that the corresponding instruction has a defined execution in the s86 operational semantics.
Symbolic Evaluator
We can define the verification condition generator (VCGen) via a symbolic evaluator
SE,,0,Post(i, , L)
The result of symbolic evaluation is a conjunction of VCs, so the overall progress theorem is then
Pre SE,,0,Post(i, , L)
LF signaturepostcondition
entry point
annotations
Soundness
For particular operational semantics (a safe x86 and a safe Alpha), we have presented theorems that say, essentially:
Thm: If Pre SE,,0,Post(i, , L), then execution of , given Pre and 0, and starting from entry point i, will always make progress (or return).
Getting from Concept to Implementation
In an actual implementation, it is also handy to have a bit more than just a VC generator.
Precise syntax for VCs.
Pre/post-conditions for each entry point expected by the host in any downloaded code.
Precisely specified logical system for proving the VCs.
Verifier for “meta-data.”
Safety Policy Implementations
Safety policies are thus given in four parts:
  A verification-condition generator (VCGen).
  A specification of the pre & post conditions for all required procedures.
  A specification of the inference rules for constructing valid proofs.
  Plug-ins for performing meta-data verification.
LF (Elf syntax) is used for the rule and pre/post specifications, C for the VCGen and plug-ins.
C?!@$#@!
The use of C to define and implement the VCGen is, at best, expedient and at worst dubious.
However, since any code-inspection system must parse object files (not trivial!) and understand the instruction set, this seems to have practical benefits.
Clearly, a more formal approach would be desirable.
How Do We Know That It’s Right?
Although the papers and dissertation follow a rigorous development leading to a soundness result, in practice it is tempting to hack in new things in the LF signature…
Example: Java Type-Safety Specification
Our largest example of a safety-policy specification is for the “SpecialJ” Java native-code compiler.
It contains about 140 inference rules.
Roughly speaking, these rules can be separated into 5 classes.
Safety Policy Rule Excerpts
1. Standard syntax and rules for first-order logic.
Syntax of predicates.
    /\  : pred -> pred -> pred.
    \/  : pred -> pred -> pred.
    =>  : pred -> pred -> pred.
    all : (exp -> pred) -> pred.
Type of valid proofs, indexed by predicate.
    pf : pred -> type.
Inference rules.
    truei : pf true.
    andi  : {P:pred} {Q:pred} pf P -> pf Q -> pf (/\ P Q).
    andel : {P:pred} {Q:pred} pf (/\ P Q) -> pf P.
    ander : {P:pred} {Q:pred} pf (/\ P Q) -> pf Q.
    …
Safety Policy Rule Excerpts
2. Syntax and rules for arithmetic and equality.
    =  : exp -> exp -> pred.
    <> : exp -> exp -> pred.
    eq_le : {E:exp} {E':exp} pf (csubeq E E') -> pf (csuble E E').
    moddist+ : {E:exp} {E':exp} {D:exp} pf (= (mod (+ E E') D) (mod (+ (mod E D) E') D)).
    =sym : {E:exp} {E':exp} pf (= E E') -> pf (= E' E).
    <>sym : {E:exp} {E':exp} pf (<> E E') -> pf (<> E' E).
    =tr : {E:exp} {E':exp} {E'':exp} pf (= E E') -> pf (= E' E'') -> pf (= E E'').
“csuble” means ≤ in the x86 machine.
Safety Policy Rule Excerpts
3. Syntax and rules for the Java type system.
    jint : exp.
    jfloat : exp.
    jarray : exp -> exp.
    jinstof : exp -> exp.
    of : exp -> exp -> pred.
    faddf : {E:exp} {E':exp} pf (of E jfloat) -> pf (of E' jfloat) -> pf (of (fadd E E') jfloat).
    ext : {E:exp} {C:exp} {D:exp} pf (jextends C D) -> pf (of E (jinstof C)) -> pf (of E (jinstof D)).
Safety Policy Sample Rules
4. Rules describing the layout of data structures.
    aidxi : {I:exp} {LEN:exp} {SIZE:exp}
            pf (below I LEN) ->
            pf (arridx (add (imul I SIZE) 8) SIZE LEN).
    wrArray4: {M:exp} {A:exp} {T:exp} {OFF:exp} {E:exp}
            pf (of A (jarray T)) ->
            pf (of M mem) ->
            pf (nonnull A) ->
            pf (size T 4) ->
            pf (arridx OFF 4 (sel4 M (add A 4))) ->
            pf (of E T) ->
            pf (safewr4 (add A OFF) E).
This “sel4” means the result of reading 4 bytes from heap M at address A+4.
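The array layout these rules describe (length word at A+4, 4-byte elements starting at A+8) can be made concrete with a small Python model. The sketch rests on assumptions: little-endian words, an opaque header word at offset 0, and an `arridx` predicate written to mirror the shape of the `aidxi` rule rather than taken from the actual safety policy.

```python
import struct

def sel4(mem, addr):
    """Read a 4-byte little-endian signed integer from `mem` at `addr`."""
    return struct.unpack_from("<i", mem, addr)[0]

def make_int_array(values, header=0):
    """Lay out [header | length | elements...], 4 bytes each (hypothetical header)."""
    return struct.pack(f"<ii{len(values)}i", header, len(values), *values)

def arridx(off, size, length):
    """True when `off` is the offset of an in-bounds element of `size` bytes."""
    return off >= 8 and (off - 8) % size == 0 and (off - 8) // size < length

# Place a 3-element array at address 16 of a small heap.
A = 16
heap = bytes(A) + make_int_array([10, 20, 30])
length = sel4(heap, A + 4)            # the word at A+4 holds the length
off = 8 + 4 * 2                       # offset of element 2
if arridx(off, 4, length):
    element = sel4(heap, A + off)     # safe read of element 2
```

This matches the addressing in the bcopy target code, where `movl 4(%ebx), %ecx` loads the length and `movl 8(%ebx, %edx, 4), %edi` loads element `edx`.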
Safety Policy Sample Rules
5. Quick hacks.
    nlt0_0 : pf (csubnlt 0 0).
    nlt1_0 : pf (csubnlt 1 0).
    nlt2_0 : pf (csubnlt 2 0).
    nlt3_0 : pf (csubnlt 3 0).
    nlt4_0 : pf (csubnlt 4 0).
Sometimes “unclean” things are put into the specification...
The Basic Trick
Recall the bcopy program:
public class Bcopy {
    public static void bcopy(int[] src, int[] dst) {
        int l = src.length;
        int i = 0;
        for (i = 0; i < l; i++) {
            dst[i] = src[i];
        }
    }
}
Unoptimized Loop Body
L11:
    movl 4(%ebx), %eax              Bounds check on src.
    cmpl %eax, %edx
    jae L24
L17:
    cmpl $0, 12(%ebp)
    movl 8(%ebx, %edx, 4), %esi
    je L21
L20:
    movl 12(%ebp), %edi             Bounds check on dst.
    movl 4(%edi), %eax
    cmpl %eax, %edx
    jae L24
L23:
    movl %esi, 8(%edi, %edx, 4)
    movl %edi, 12(%ebp)
    incl %edx
L9:
    ANN_INV(ANN_DOM_LOOP,
        %LF_(/\ (of rm mem) (of loc1 (jarray jint)))%_LF,
        RB(EBP,EBX,ECX,ESP,FTOP,LOC4,LOC3))
    cmpl %ecx, %edx
    jl L11
Note: L24 raises the ArrayIndex exception.
Unoptimized Code is Easy
In the absence of optimizations, proving the safety of array accesses is relatively easy.
Indeed, in this case it is reasonable for VCGen to verify the safety of the array accesses.
As the optimizer becomes more successful, verification gets harder.
Role of Loop Invariants
It is for this reason that the optimizer’s knowledge must be conveyed to the theorem prover.
Essentially, any facts about program values that were used to perform code-motion or other optimizations must be declared in an invariant.
Optimized Loop Body
L7:
    ANN_LOOP(INV = {
        (csubneq ebx 0),
        (csubneq eax 0),
        (csubb edx ecx),
        (of rm mem)},
      MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM))
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx, %edx, 4), %edi
    movl %edi, 8(%eax, %edx, 4)
    incl %edx
    cmpl %ecx, %edx
Essential facts about live variables, used by the compiler to eliminate bounds-checks in the loop body.
Certifying Compiling and Proving
Intuitively, we will arrange for the Prover to be at least as powerful as the Compiler’s optimizer.
Hence, we will expect the Prover to be able to “reverse engineer” the reasoning process that led to the given machine code.
An informal concept, needing a formal understanding! (Type theory is essential here…)
What is Safety, Anyway?
If the compiler fails to optimize away a bounds-check, it will insert code to perform the check.
This means that programs may still abort at run-time, albeit with a well-defined exception.
Is this safe behavior?
Compiler Development
The PCC infrastructure catches many (probably most) compiler bugs early.
Our standard regression test does not execute the object code!
Principle: Most compiler bugs show up as safety violations.
Example Bug
    …
L42:
    movl 4(%eax), %edx
    testl %edx, %edx
    jle L47
L46:
    … set up for loop …
L44:
    … enter main loop code …
    …
    jl L44
    jmp L32
L47:
    fldz
    fldz
L32:
    … return sequence …
    ret
Example Bug
    …
L42:
    movl 4(%eax), %edx
    testl %edx, %edx
    jle L47
L46:
    … set up for loop …
L44:
    … enter main loop code …
    …
    jl L44
    jmp L32
L47:
    fldz
L32:
    … return sequence …
    ret
Error in rarely executed compensation code is caught by the Proof Generator.
Another Example Bug
Suppose bcopy’s inner loop is changed:
L7: ANN_LOOP( … )
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx, %edx, 4), %edi
    movl %edi, 8(%eax, %edx, 4)
    incl %edx
    cmpl %ecx, %edx
    jl L7
    ret
Another Example Bug
Suppose bcopy’s inner loop is changed:
L7: ANN_LOOP( … )
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx, %edx, 4), %edi
    movl %edi, 8(%eax, %edx, 4)
    addl $2, %edx
    cmpl %ecx, %edx
    jl L7
    ret
Again, PCC spots the danger.
Yet Another
class Floatexc extends Exception {
    public static int f(int x) throws Floatexc { return x; }
    public static int g(int x) { return x; }
    public static float handleit(int x, int y) {
        float fl = 0;
        try {
            x = f(x);
            fl = 1;
            y = f(y);
        } catch (Floatexc b) {
            fl += fl;
        }
        return fl;
    }
}
Yet Another
… Install handler …
    pushl $_6except8Floatexc_C
    call __Jv_InitClass
    addl $4, %esp
… Enter try block …
L17:
    movl $0, -4(%ebp)
    pushl 8(%ebp)
    call _6except8Floatexc_MfI
    addl $4, %esp
    movl %eax, %ecx
    …
… A handler …
L22:
    flds -4(%ebp)
    fadds -4(%ebp)
    jmp L18
    …
Another Example[by George Necula]
void fir(int *data, int dlen, int *filter, int flen) {
    int i, j;
    for (i = 0; i <= dlen - flen; i++) {
        int s = 0;
        for (j = 0; j < flen; j++)
            s += filter[j] * data[i + j];
        data[i] = s;
    }
}
Compiled Example
/* rd=data, rdl=dlen, rf=filter, rfl=flen */
    ri = 0
    sub t1 = rdl, rfl
L0: CUT(ri,rj,rs,t2,t3,t4,rm)
    le t2 = ri, t1
    jeq t2, L3
    rs = 0
    rj = 0
L1: CUT(rj,rs,t2,t3,t4)
    lt t2 = rj, rfl
    jeq t2, L2
    ult t2 = rj, rfl
    jeq t2, Labort
    ld t3 = [rf + 4*rj]
    add t2 = ri, rj
    ult t4 = t2, rdl
    jeq t4, Labort
    ld t2 = [rd + 4*t2]
    mul t2 = t3, t2
    add rs = rs, t2
    add rj = rj, 1
    jmp L1
L2: ult t2 = ri, rdl
    jeq t2, Labort
    st [rd + 4*ri] = rs
    add ri = ri, 1
    jmp L0
L3: ret
Labort: call abort
The Safety Policy
The safety policy defines verification conditions of the form:
  true, E = E
  saferd(M, E), safewr(M, E, E)
  array(EA, ES, EL), vector(EA, ES, EL)
Pre_fir = array(rd,4,rdl), vector(rf,4,rfl)
Post_fir = true
VCGen Example
                                  Set rd=cd; rdl=cdl; rf=cf; rfl=cfl; rm=cm
                                  Assume precondition: array(cd,4,cdl), vector(cf,4,cfl)
    ri = 0                        Set ri = 0
    sub t1 = rdl, rfl             Set t1 = sub(cdl,cfl)
L0: CUT(ri,rj,rs,t2,t3,t4,rm)     Set ri=ci; rj=cj; rs=cs; t2=c2; t3=c3; t4=c4; rm=cm’
    le t2 = ri, t1                Set t2 = le(ci, sub(cdl,cfl))
    jeq t2, L3                    Assume not(le(ci, sub(cdl,cfl)))
    …
L3: ret                           Check postcondition;
                                  Check rd,rdl,rf,rfl have initial values
VCGen Example
    ri = 0                        Set ri = 0
    sub t1 = rdl, rfl             Set t1 = sub(cdl,cfl)
L0: CUT(ri,rj,rs,t2,t3,t4,rm)     Set ri=ci; rj=cj; rs=cs; t2=c2; t3=c3; t4=c4; rm=cm’
    le t2 = ri, t1                Set t2 = le(ci, sub(cdl,cfl))
    jeq t2, L3                    Assume le(ci, sub(cdl,cfl))
    rs = 0                        Set rs = 0
    rj = 0                        Set rj = 0
L1: CUT(rj,rs,t2,t3,t4)           Set rj=cj’; rs=cs’; t2=c2’; t3=c3’; t4=c4’
    lt t2 = rj, rfl               Set t2 = lt(cj’, cfl)
    jeq t2, L2                    Assume not(lt(cj’, cfl))
    …
L2: ult t2 = ri, rdl              Set t2 = ult(ci, cdl)
    jeq t2, Labort                Assume ult(ci, cdl)
    st [rd + 4*ri] = rs           Check safewr(cm’, add(cd,mul(4,ci)),cs’)
More on the Safety Policy
Some of the inference rules in the LF signature:
rdarray : saferd(M,add(A,mul(S,I))) <- array(A,S,L), ult(I,L).
rdvector : saferd(M,add(A,mul(S,I))) <- vector(A,S,L), ult(I,L).
wrarray : safewr(M,add(A,mul(S,I)),V) <- array(A,S,L), ult(I,L).
The Checker
When the Checker is invoked on
    safewr(cm’, add(cd,mul(4,ci)), cs’)
there are assumptions:
    assume0 : ult(ci,cdl).
    assume1 : not(lt(cj’,cfl)).
    assume2 : le(ci, sub(cdl,cfl)).
    assume3 : vector(cf,4,cfl).
    assume4 : array(cd,4,cdl).
The Checker, cont’d
The VC
    safewr(cm’, add(cd,mul(4,ci)), cs’)
can be verified by using the rule
    wrarray : safewr(M,add(A,mul(S,I)),V) <- array(A,S,L), ult(I,L).
and assumptions
    assume0 : ult(ci,cdl).
    assume4 : array(cd,4,cdl).
Proof Representation
A simple (but somewhat naïve) representation of the proof is simply the sequence of proof rules:
wrarray, assume4, assume0
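To show how little machinery is needed to validate such a sequence, here is a Python sketch of a checker for it. It is a toy, not the LF type checker the system actually uses: terms are tuples, rule variables are upper-case strings, and `match`, `prove`, and `check_proof` are hypothetical names. Only the `wrarray` rule from the slides is encoded.

```python
RULES = {  # name -> (conclusion, premises); upper-case strings are rule variables
    "wrarray": (("safewr", "M", ("add", "A", ("mul", "S", "I")), "V"),
                [("array", "A", "S", "L"), ("ult", "I", "L")]),
}

def is_var(t):
    return isinstance(t, str) and t.isupper()

def match(pat, term, env):
    """Extend env so that pat instantiated by env equals term, else None."""
    if is_var(pat):
        if pat in env:
            return env if env[pat] == term else None
        return {**env, pat: term}
    if isinstance(pat, tuple) and isinstance(term, tuple) and len(pat) == len(term):
        for p, t in zip(pat, term):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return env if pat == term else None

def subst(t, env):
    if isinstance(t, tuple):
        return tuple(subst(x, env) for x in t)
    return env.get(t, t) if isinstance(t, str) else t

def ground(t):
    return all(ground(x) for x in t) if isinstance(t, tuple) else not is_var(t)

def prove(goal, env, steps, assumptions):
    """Consume rule names from `steps` (pre-order) to prove goal under env."""
    name = next(steps, None)
    if name in assumptions:
        return match(goal, assumptions[name], env)
    if name not in RULES:
        return None
    goal = subst(goal, env)
    if not ground(goal):
        return None  # this sketch applies rules to fully known goals only
    concl, premises = RULES[name]
    local = match(concl, goal, {})
    for p in premises:
        if local is None:
            return None
        local = prove(p, local, steps, assumptions)
    return env if local is not None else None

def check_proof(vc, names, assumptions):
    steps = iter(names)
    ok = prove(vc, {}, steps, assumptions) is not None
    return ok and next(steps, None) is None  # no leftover steps allowed
```

Checking the sequence above means matching the VC against the conclusion of `wrarray`, then discharging its two premises with `assume4` and `assume0` in order; the variable `L` is bound to `cdl` by the first premise and must agree in the second.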
Optimized Code
The previous example was somewhat simplified.
More realistic code is optimized, usually based on inferences about integer values.
Such optimizations require that arithmetic invariants be placed in the cut points.
Optimized Example
/* rd=data, rdl=dlen, rf=filter, rfl=flen */
    ri = 0
    sub t1 = rdl, rfl
L0: CUT(ri>0,{ri,rj,…})
    le t2 = ri, t1
    jeq t2, L3
    rs = 0
    rj = 0
L1: CUT(rj>0,{rj,rs,…})
    lt t2 = rj, rfl
    jeq t2, L2
    ld t3 = [rf + 4*rj]
    add t2 = ri, rj
    ld t2 = [rd + 4*t2]
    mul t2 = t3, t2
    add rs = rs, t2
    add rj = rj, 1
    jmp L1
L2: st [rd + 4*ri] = rs
    add ri = ri, 1
    jmp L0
L3: ret
VCGen Example
    ri = 0                                  Set ri = 0
    sub t1 = rdl, rfl                       Set t1 = sub(cdl,cfl)
L0: CUT(ri>0, {ri,rj,rs,t2,t3,t4,rm})       Set ri=ci; rj=cj; rs=cs; t2=c2; t3=c3; t4=c4; rm=cm’
                                            Assume >(ci,0)
    le t2 = ri, t1                          Set t2 = le(ci, sub(cdl,cfl))
    jeq t2, L3                              Assume le(ci, sub(cdl,cfl))
    rs = 0
    rj = 0
    …
Practical Considerations
Trusted Computing Base
The trusted computing base is the software infrastructure that is responsible for ensuring that only safe execution is possible.
Obviously, any bugs in the TCB can lead to unsafe execution.
Thus, we want the TCB to be simple, as well as fast and small.
VCGen’s Complexity
Fortunately, proofs can be quite small, and proofchecking can be quite simple, small, and fast.
VCGen, at core, is also simple and fast.
But in practice it gets to be quite complicated.
VCGen’s Complexity
Some complications:
  If dealing with machine code, then VCGen must parse machine code.
  Maintaining the assumptions and current context in a memory-efficient manner is not easy.
Note that Sun’s KVM does verification in a single pass and only 8KB RAM!
VC Explosion
Program (two consecutive branches, then a call):
  if a == b then a := x else c := x
  if a == c then a := y else c := y
  f(a,c)
Precondition: safef(i,j)
VC:
  (a=b => (x=c => safef(y,c) ∧ x<>c => safef(x,y)))
  ∧ (a<>b => (a=x => safef(y,x) ∧ a<>x => safef(a,y)))
Exponential growth in size of the VC is possible. And it actually happens in practice!
VC Explosion
Same program, with INV: P(a,b,c,x) placed at the join point between the two branches:
  if a == b then a := x else c := x
  INV: P(a,b,c,x)
  if a == c then a := y else c := y
  f(a,c)
VC:
  (a=b => P(x,b,c,x) ∧ a<>b => P(a,b,x,x))
  ∧ (∀a’,c’. P(a’,b,c’,x) => (a’=c’ => safef(y,c’) ∧ a’<>c’ => safef(a’,y)))
Growth can usually be controlled by careful placement of just the right “join-point” invariants.
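The effect on VC size can be seen in a small Python model. This is a size model only, under stated assumptions: formulas are nested tuples, `wp_diamond` duplicates the postcondition across the two arms of a branch without performing the substitutions a real VCGen would, and all function names are hypothetical.

```python
def size(f):
    """Number of nodes in a formula represented as nested tuples."""
    if not isinstance(f, tuple):
        return 1
    return 1 + sum(size(x) for x in f[1:])

def wp_diamond(k, post):
    """Precondition of the k-th if/else: the postcondition appears in both arms."""
    cond = ("eq", f"a{k}", f"b{k}")
    return ("and", ("imp", cond, post), ("imp", ("not", cond), post))

def vc_without_invariants(n):
    """One VC for n consecutive branches followed by the call f(a,c)."""
    post = ("safef", "a", "c")
    for k in reversed(range(n)):
        post = wp_diamond(k, post)   # each branch doubles what follows it
    return post

def vcs_with_invariants(n):
    """One small VC per branch when an invariant P sits at every join point."""
    inv = ("P", "a", "b", "c", "x")
    vcs = [("imp", inv, wp_diamond(k, inv)) for k in range(n)]
    vcs.append(("imp", inv, ("safef", "a", "c")))
    return vcs

for n in (1, 5, 10):
    print(n, size(vc_without_invariants(n)),
          sum(size(v) for v in vcs_with_invariants(n)))
```

Without invariants the size roughly doubles with every branch; with an invariant at each join point the total grows linearly in the number of branches, at the cost of having to find and ship the invariants.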
Stack Slots
Each procedure will want to use the stack for local storage.
This raises a serious problem because a lot of information is lost by VCGen (such as the value) when data is stored into memory.
Stack Slots
We avoid this problem by assuming that procedures use up to 256 words of stack as registers.
Main restriction:
No indirect addressing of stack slots.
Callee-save Registers
Standard calling conventions dictate that the contents of some registers be preserved.
These callee-save registers are specified along with the pre/post-conditions for each procedure.
The preservation of their values must be verified at every return instruction.
Function specifications
ANN_FUNCTION(__Jv_instanceof,
    %LF_(/\ (of loc3 (jinstof _4java4lang6Object_C))
         (/\ (of loc2 jint)
         (/\ (jelemtype loc1)
             (of rm mem))))%_LF,          Precondition
    %LF_(/\ (of eax jbool)
            (of rm mem))%_LF,             Postcondition
    RB(ESP,EBP,FTOP),                     Callee-save registers
    3,4)                                  Stack spec
Annotations used by Special J
ANN_CLASS
ANN_FUNCTION
ANN_LOCALS
ANN_INV
ANN_DOM_LOOP
ANN_DOMINATOR
ANN_SYMBOLADDR
ANN_CALLJAVAVIRTUAL
ANN_CALLJAVAINTERFACE
ANN_JUMPTHROUGHTABLE
ANN_INSTALLEDJAVAHANDLER
ANN_UNINSTALLEDJAVAHANDLER
ANN_UNREACHABLE
ANN_CLASS and ANN_FUNCTION
Normally, ANN_FUNCTION is not used. Instead, ANN_CLASS declares that an object file implements a Java class.
public final class Factor1 { … }
ANN_CLASS(_7Factor1_vt)…
ANN_LOCALS
As a convenience for VCGen, the number of stack slots is declared for each method.
public static void combineTags(Node n, int i) {
…
}
ANN_LOCALS(__7Factor1_McombineTagsL4NodeXI, 8)
.text
.align 4
.globl __7Factor1_McombineTagsL4NodeXI
__7Factor1_McombineTagsL4NodeXI:
    …
ANN_INV / ANN_DOM_LOOP
Loop invariants.
ANN_INV(ANN_DOM_LOOP,                                     Signifies loop invariant
    %LF_(/\ (nonnull loc2)
         (/\ (of rm mem)
             (of eax (jinstof
                 _4java4util12ListIterator_vt))))%_LF,    Invariants
    RB(EBP,ESP,FTOP,LOC4,LOC3,LOC2))                      Modified registers
ANN_DOMINATOR
Dominating join points are marked.
ANN_DOMINATOR
.L536_dom:
    jle .L237
    …
.L237:
    ANN_INV(.L536_dom,
        %LF_(/\ (nonnull loc3)
             (/\ (of rm mem)
                 (of loc3 (jinstof _4Node_vt))))%_LF,
        RB(EBP,ESP,FTOP,LOC5,LOC4))
Invariants
Special J currently emits the following kinds of invariants:
  true, false
  x = y, x <> y (x, y registers or constants)
  x < y (signed and unsigned)
  x : t
    jint, jbool, …
    Jclassdesc
    jinstof(C)
    implSpecIntf(x,y,z)
    …
Virtual method invocation
public static void combineTags(Node n, int i) {
  if(i>0) {
    if(!n.isString()) {
      Iterator iter = n.getSubtrees();
      while(iter.hasNext()) {
        combineTags((Node)(iter.next()), i-1);
      }
…
Virtual method invocation, cont’d
For the loop body:
pushl $1 # vmethod
ANN_SYMBOLADDR(0)
pushl $_4java4util8Iterator_vt # class
pushl -4(%ebp) # object
call __Jv_LookupInterfaceMethod
addl $12, %esp
pushl -4(%ebp)
ANN_CALLJAVAVIRTUAL(_4java4util8Iterator_vt, 1) # next method
call *%eax
addl $4, %esp
ANN_SYMBOLADDR(0)
pushl $_4Node_vt
pushl $0
pushl %eax
call __Jv_checkCast
Jump tables
public static final void closeToString (int t) throws IOException {
  if(!isEmpty(t)) {
    switch (getColor(t)) {
      case -1 : break ; // no color
      case 0 : singleTagString('r', noSecond, false); break;
      case 1 : singleTagString('g', noSecond, false); break;
      case 2 : singleTagString('b', noSecond, false); break;
      case 3 : singleTagString('c', noSecond, false); break;
      case 4 : singleTagString('m', noSecond, false); break;
      case 5 : singleTagString('y', noSecond, false); break;
      case 6 : singleTagString('k', noSecond, false); break;
      case 7 : singleTagString('w', noSecond, false); break;
    }
…
Jump tables, cont’d
ANN_DOMINATOR
.L181_dom:
jae .L23
.L33 :
ANN_JUMPTHROUGHTABLE(.L32, 9)
ANN_SYMBOLADDR(0)
jmp *.L32(, %ebx, 4)
.L24 :
pushl $0
pushl $0
pushl $119
call __3Tag_MsingleTagStringCCZ
addl $12, %esp
jmp .L23
.L25
…
…
.L32:
.long .L23
.long .L31
.long .L30
.long .L29
.long .L28
.long .L27
.long .L26
.long .L25
.long .L24
Exception handlers
public Object clone() {
  try {
    return super.clone();
  } catch (CloneNotSupportedException e) {
    return null;
  }
}
Exception handlers, cont’d
__7Context_Mclone :
pushl %ebp
movl %esp, %ebp
call __Jv_GetExcHandler
ANN_SYMBOLADDR(0)
pushl $.L11
ANN_SYMBOLADDR(0)
pushl $_4java4lang26CloneNotSupportedException_vt
pushl %ebp
pushl $1
pushl (%eax)
ANN_INSTALLJAVAHANDLER(.L11)
movl %esp, (%eax)
pushl 8(%ebp)
ANN_DOMINATOR
.L14_dom:
call __4java4lang6Object_Mclone
addl $4, %esp
.L9 :
movl %eax, 8(%ebp)
call __Jv_GetExcHandler
movl (%esp), %ebx
ANN_UNINSTALLJAVAHANDLER(1)
…
.L11 :
ANN_INV(.L14_dom, %LF_(of rm mem )%_LF, RB(EBP,ESP,FTOP,LOC3,LOC2))
nop
.L12 :
xorl %eax, %eax
movl %ebp, %esp
popl %ebp
ret
Efficient Representation and Validation of Proofs
Goals
We would like a representation for proofs that is
compact,
fast to check,
requires very little memory to check, and
is “canonical,” in the sense of accommodating many different logics without requiring a reimplementation of the checker.
Three Approaches
1. Direct representation of a logic.
2. Use of a Logical Framework.
3. Oracle strings.
We will reject (1).
We consider only (2) and (3).
Logical Framework
For representation of proofs we use the Edinburgh Logical Framework (LF).
LFi
LF Example in Elf Syntax
exp : type
pred : type
pf : pred -> type
true : pred
/\ : pred -> pred -> pred
=> : pred -> pred -> pred
all : (exp -> pred) -> pred
truei : pf true
andi : {P:pred} {R:pred} pf P -> pf R -> pf (/\ P R)
andel : {P:pred} {R:pred} pf (/\ P R) -> pf P
impi : {P:pred} {R:pred} (pf P -> pf R) -> pf (=> P R)
alli : {P:exp -> pred} ({X:exp} pf (P X)) -> pf (all P)
alle : {P:exp -> pred} {E:exp} pf (all P) -> pf (P E)
LF as a Proof Representation
LF is canonical, in that a single typechecker for LF can serve as a proofchecker for many different logics specified in LF. [See Avron, et al. ‘92]
But the efficiency of the representation is poor.
Size of LF Representation
Proofs in LF are extremely large, due to large amounts of repetition.
Consider the representation of P ⇒ (P ∧ P) for some predicate P:
(=> P (/\ P P))
The proof of this predicate has the following LF representation:
(impi P (/\ P P) ([X:pf P] andi P P X X))
Checking LF
The nice thing is that typechecking is enough for proofchecking. [The theorem is in the LF paper.]
But the proofs are extremely large.
(impi P (/\ P P) ([X:pf P] andi P P X X)) : pf (=> P (/\ P P))
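To make "typechecking is proofchecking" concrete, here is a minimal sketch of a checker for the propositional rules in the Elf signature shown earlier (truei, andi, andel, impi). The tuple encoding of formulas and proof terms is an assumption made for illustration; it is not the LF typechecker itself.

```python
# Formulas: an atom is a string; compound formulas are ("and", A, B) or ("imp", A, B).
# Proof terms carry their formula arguments explicitly, as in full LF.

def infer(proof, hyps):
    """Return the formula proved by `proof`, or raise TypeError if it is ill-typed.
    `hyps` maps hypothesis names (bound proof variables) to formulas."""
    tag = proof[0]
    if tag == "hyp":                       # use of a bound variable such as X
        if proof[1] not in hyps:
            raise TypeError(f"unbound hypothesis {proof[1]}")
        return hyps[proof[1]]
    if tag == "truei":                     # truei : pf true
        return "true"
    if tag == "andi":                      # andi P R p r : pf (/\ P R)
        _, p, r, proof_p, proof_r = proof
        if infer(proof_p, hyps) != p or infer(proof_r, hyps) != r:
            raise TypeError("andi: subproof does not match its declared formula")
        return ("and", p, r)
    if tag == "andel":                     # andel P R pr : pf P
        _, p, r, proof_pr = proof
        if infer(proof_pr, hyps) != ("and", p, r):
            raise TypeError("andel: subproof does not prove the conjunction")
        return p
    if tag == "impi":                      # impi P R ([X:pf P] body) : pf (=> P R)
        _, p, r, name, body = proof
        if infer(body, {**hyps, name: p}) != r:
            raise TypeError("impi: body does not prove the consequent")
        return ("imp", p, r)
    raise TypeError(f"unknown rule {tag}")

# (impi P (/\ P P) ([X:pf P] andi P P X X))
pp = ("and", "P", "P")
proof = ("impi", "P", pp, "X", ("andi", "P", "P", ("hyp", "X"), ("hyp", "X")))
```

Calling `infer(proof, {})` yields the encoding of `=> P (/\ P P)`. Note how often `P` is repeated inside the proof term, which is the redundancy the slide points out.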
Implicit LF
A dramatic improvement can be achieved by using a variant of LF, called Implicit LF, or LFi.
In LFi, parts of the proof can be replaced by placeholders.
(impi * * ([X:*] andi * * X X)) : pf (=> P (/\ P P))
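The placeholder idea can be sketched as checking a proof against a known, placeholder-free goal, so that the elided formula arguments are recovered from the goal instead of being stored. This is a simplified illustration with an assumed tuple encoding (elided arguments are simply omitted rather than written as `*`); it is not the LFi algorithm of Necula & Lee.

```python
# Formulas: an atom is a string; compound formulas are ("and", A, B) or ("imp", A, B).
# Proof terms omit their formula arguments: ("andi", p, r), ("impi", name, body).

def check(proof, goal, hyps):
    """Return True if `proof` proves `goal`; missing formula arguments are
    reconstructed from the structure of `goal`."""
    tag = proof[0]
    if tag == "hyp":
        return hyps.get(proof[1]) == goal
    if tag == "truei":
        return goal == "true"
    if tag == "andi":
        return (isinstance(goal, tuple) and goal[0] == "and"
                and check(proof[1], goal[1], hyps)
                and check(proof[2], goal[2], hyps))
    if tag == "impi":
        if not (isinstance(goal, tuple) and goal[0] == "imp"):
            return False
        _, name, body = proof
        # The antecedent of the goal gives the type of the bound hypothesis.
        return check(body, goal[2], {**hyps, name: goal[1]})
    # andel is left out: its second formula argument does not occur in the
    # conclusion, so in this sketch it could not be elided.
    return False

# (impi * * ([X:*] andi * * X X)) : pf (=> P (/\ P P))
compact = ("impi", "X", ("andi", ("hyp", "X"), ("hyp", "X")))
goal = ("imp", "P", ("and", "P", "P"))
```

`check(compact, goal, {})` accepts, and the proof term no longer mentions `P` at all.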
Soundness of LFi
The soundness of the LFi type system is given by a theorem that states:
If, in context Γ, a term M has type A in LFi (and Γ and A are placeholder-free), then there is a term M’ such that M’ has type A in LF.
Typechecking LFi
The typechecking algorithm for LFi is given in [Necula & Lee, LICS98].
A key aspect of the algorithm is that it avoids repeated typechecking of reconstructed terms.
Hence, the placeholders save not only space, but also time.
Effectiveness of LFi
In experiments with PCC, LFi leads to substantial reductions in proof size and checking time.
Improvements increase nonlinearly with proof size.
Experiment   Proof size (bytes)        Checking time (ms)
             LF            LFi         LF        LFi
unpack       >10 x 10^6    23728       8256      42
simplex      >2 x 10^6     23888       1656      42
sharpen      183444        4816        136       7
qsort        92412         3098        74        6
kmp          77246         2092        60        3
bcopy        12466         796         11        1
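The reduction factors implied by the table can be computed directly. The sketch below uses only the four rows with exact LF sizes (unpack and simplex are reported as lower bounds, so they are left out); the dictionary layout and function name are just for illustration.

```python
# (LF bytes, LFi bytes, LF ms, LFi ms), copied from the table above.
RESULTS = {
    "sharpen": (183444, 4816, 136, 7),
    "qsort":   (92412, 3098, 74, 6),
    "kmp":     (77246, 2092, 60, 3),
    "bcopy":   (12466, 796, 11, 1),
}

def reduction_factors(results):
    """Map each experiment to (size reduction factor, time reduction factor)."""
    return {
        name: (lf_size / lfi_size, lf_ms / lfi_ms)
        for name, (lf_size, lfi_size, lf_ms, lfi_ms) in results.items()
    }

factors = reduction_factors(RESULTS)
for name, (size_factor, time_factor) in factors.items():
    print(f"{name}: size x{size_factor:.1f}, time x{time_factor:.1f}")
```

On these rows the smallest proof (bcopy) shrinks by roughly a factor of 16 and the largest (sharpen) by roughly a factor of 38.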
The Need for Improvement
Despite the great improvement from LFi, our experiments show that, in practice, LFi proofs are 10% to 200% of the size of the code.
How Big is a Proof?
A basic question: how much essential information is in a proof?
In this proof,
(impi * * ([X:*] andi * * X X)) : pf (=> P (/\ P P))
there are only 2 uses of rules, and in each case it was the only rule that could have been used.
Improving the Representation
We will now improve on the compactness of proof representation by making use of the observation that large parts of proofs are deterministically generated from the inference rules.
Additional References
For LF:
Harper, Honsell, & Plotkin. A framework for defining logics. Journal of the ACM, 40(1), 143-184, Jan. 1993.
Avron, Honsell, Mason, & Pollack. Using typed lambda calculus to implement formal systems on a machine. Journal of Automated Reasoning, 9(3), 309-354, 1992.
Additional References
For Elf:
Pfenning. Logic programming in the LF logical framework. Logical Frameworks, Huet & Plotkin (Eds.), 149-181, Cambridge Univ. Press, 1991.
Pfenning. Elf: A meta-language for deductive systems (system description). 12th International Conference on Automated Deduction, LNAI 814, 811-815, 1994.
Oracle-Based Checking
Necula’s Example: Syntax of Girard’s System F
ty : type
int : ty
arr : ty -> ty -> ty
all : (ty -> ty) -> ty
exp : type
z : exp
s : exp -> exp
lam : (exp -> exp) -> exp
app : exp -> exp -> exp
of : exp -> ty -> type
of : exp -> ty -> type
Necula’s Example: Typing Rules for System F
tz : of z int
ts : {E:exp} of E int -> of (s E) int
tlam : {E:exp->exp} {T1:ty} {T2:ty} ({X:exp} of X T1 -> of (E X) T2) -> of (lam E) (arr T1 T2)
tapp : {E1:exp} {E2:exp} {T:ty} {T2:ty} of E1 (arr T2 T) -> of E2 T2 -> of (app E1 E2) T
tgen : {E:exp} {T:ty->ty} ({T1:ty} of E (T T1)) -> of E (all T)
tins : {E:exp} {T:ty->ty} {T1:ty} of E (all T) -> of E (T T1)
LF Representation
Consider the lambda expression
(λf.(f λx.x) (f 0)) λy.y
It is represented in LF as follows:
app (lam [F:exp] app (app F (lam [X:exp] X)) (app F 0)) (lam [Y:exp] Y)
Necula’s Example
Now suppose that this term is an applet, with the safety policy that all applets must be well-typed in System F.
One way to make a PCC is to attach a typing derivation to the term.
Typing Derivation in LF
(tapp (lam [F:exp] (app (app F (lam [X:exp] X)) (app F 0)))
      (lam ([X:exp] X))
      (all ([T:ty] arr T T))
      int
  (tlam (all ([T:ty] arr T T)) int
        ([F:exp] (app (app F (lam [X:exp] X)) (app F 0)))
    ([F:exp][FT:of F (all ([T:ty] arr T T))]
      (tapp (app F (lam [X:exp] X)) (app F 0) int int
        (tapp F (lam [X:exp] X) (arr int int) (arr int int)
          (tins F ([T:ty] arr T T) (arr int int) FT)
          (tlam int int ([X:exp] X) ([X:exp][XT:of X int] XT)))
        (tapp F 0 int int
          (tins F ([T:ty] arr T T) int FT)
          t0))))
  (tgen (lam [Y:exp] Y) ([T:ty] arr T T)
    ([T:ty] (tlam T T ([Y:exp] Y) ([Y:exp] [YT:of Y T] YT)))))
Typing Derivation in LFi
(tapp * * (all ([T:*] arr T T)) int
  (tlam * * *
    ([F:*][FT:of F (all ([T:ty] arr T T))]
      (tapp * * int
        (tapp * * (arr int int) (arr int int)
          (tins * * * FT)
          (tlam * * * ([X:*][XT:*] XT)))
        (tapp * * int int
          (tins * * * FT)
          t0))))
  (tgen * * ([T:*] (tlam * * * ([Y:*] [YT:*] YT)))))
I think. I did this by hand!
LF Representation
Using 16 bits per token, the LF representation of the typing derivation requires over 2,200 bits.
The LFi representation requires about 700 bits.
(The term itself requires only about 360 bits.)
A Bit More about LFi
To convert an LF term into an LFi term, a representation algorithm is used. [Necula & Lee, LICS98]
Intuition: When typechecking a term
c M1 M2 … Mn : A (in a context Γ)
we know, if A has no placeholders, that some of the M1 … Mn may appear in A.
A Bit More about LFi, cont’d
For example, when the rule
tapp : {E1:exp} {E2:exp} {T:ty} {T2:ty} of E1 (arr T2 T) -> of E2 T2 -> of (app E1 E2) T
is applied at top level, the first two arguments (E1 and E2) are present in the term
app (lam [F:exp] app (app F (lam [X:exp] X)) (app F 0)) (lam [Y:exp] Y)
and thus can be elided.
A Bit More about LFi, cont’d
A similar trick works at lower levels by relying on the fact that typing constraints are solved in a certain order (e.g., right-to-left).
See the paper for complete details.
Can We Do Better?
tz : of z int
ts : {E:exp} of E int -> of (s E) int
tlam : {E:exp->exp} {T1:ty} {T2:ty} ({X:exp} of X T1 -> of (E X) T2) -> of (lam E) (arr T1 T2)
tapp : {E1:exp} {E2:exp} {T:ty} {T2:ty} of E1 (arr T2 T) -> of E2 T2 -> of (app E1 E2) T
tgen : {E:exp} {T:ty->ty} ({T1:ty} of E (T T1)) -> of E (all T)
tins : {E:exp} {T:ty->ty} {T1:ty} of E (all T) -> of E (T T1)
Determinism
Looking carefully at the typing rules, we observe:
For any typing goal where the term is known but the type is not:
3 possibilities: tgen, tins, other.
If type structure is known, only 2 choices, tapp or other.
How Much Essential Information?
There are 15 applications of rules in the typing derivation above.
So, conservatively: ⌈log2 3⌉ × 15 = 2 × 15 = 30 bits
In other words, 30 bits should be enough to encode the choices made by a type inference engine for this term.
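The arithmetic behind the 30-bit estimate is small enough to spell out. The helper names below are hypothetical; the inputs (at most 3 candidate rules per step, 15 rule applications) come from the slides.

```python
def bits_per_choice(candidates):
    """Whole bits needed to name one of `candidates` alternatives."""
    return (candidates - 1).bit_length()

def oracle_bits(candidates, steps):
    """Conservative oracle length: the same number of bits at every step."""
    return bits_per_choice(candidates) * steps

estimate = oracle_bits(candidates=3, steps=15)
print(estimate)
```

The estimate is conservative because it spends 2 bits at every step, even at steps where only one rule applies and no bits are needed.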
Oracle-based Checking
Idea: Implement the proofchecker as a nondeterministic logic interpreter whose
program consists of the derivation rules, and
initial goal is the judgment to be verified.
We will avoid backtracking by relying on the oracle string.
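A minimal sketch of the idea, restricted to propositional Horn clauses so that no unification is needed: whenever more than one rule matches the current goal, the interpreter reads just enough bits from the oracle string to pick one, and it never backtracks. The rule set, the atom names, and the depth bound are assumptions for illustration; the real checker works on LF signatures.

```python
class Oracle:
    """Reads rule choices from a string of '0'/'1' characters."""
    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def choose(self, n):
        """Pick one of n candidates; consumes no bits when n == 1."""
        width = (n - 1).bit_length()
        chunk = self.bits[self.pos:self.pos + width]
        if len(chunk) < width:
            raise ValueError("oracle string exhausted")
        self.pos += width
        index = int(chunk, 2) if width else 0
        if index >= n:
            raise ValueError("oracle selects a nonexistent rule")
        return index

def solve(goal, rules, oracle, depth=64):
    """Prove `goal` without backtracking; `rules` is a list of (head, body)."""
    if depth == 0:
        return False                      # guard against looping rule sets
    candidates = [body for head, body in rules if head == goal]
    if not candidates:
        return False
    body = candidates[oracle.choose(len(candidates))]
    return all(solve(sub, rules, oracle, depth - 1) for sub in body)

# Hypothetical rule set: three ways to establish "typed", one for the rest.
RULES = [
    ("safe", ["typed", "bounded"]),
    ("typed", ["int_ok"]),
    ("typed", ["ptr_ok"]),
    ("typed", ["bool_ok"]),
    ("ptr_ok", []),
    ("bounded", []),
]
```

With these rules, `solve("safe", RULES, Oracle("01"))` succeeds using only 2 oracle bits, since every other step has a single applicable rule. A wrong oracle such as `"00"` simply makes the check fail; it cannot make an unprovable goal succeed.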
Why Higher-Order?
The syntax of VCs for the Java type-safety policy is as follows:
E ::= x | c E1 … En
F ::= true | F1 ∧ F2 | ∀x.F | E | E ⇒ F
The LF encodings are simple Horn clauses, requiring only first-order unification. Higher-order features are needed only for implication and universal quantification.
Why Higher-Order?
Perhaps first-order Horn logic (or perhaps first-order hereditary Harrop formulas) is enough.
Indeed, first-order expressions and formulas seem to be enough for the VCs in type-safety policies.
However, higher-order and modal logics would require higher-order features.
A Simplification: A Fragment of LF
Level-0 types. A ::= a | A1 → A2
Level-1 types (β-normal form). B ::= a M1 … Mn | B1 → B2 | Πx:A.B
Level-0 kinds. K ::= Type | A → K
Level-0 terms (β-normal form). M ::= λx:A.M | c M1 … Mn | x M1 … Mn
LF Fragment
This fragment simplifies matters considerably, without restricting the application to PCC.
Level-0 types to encode syntax.
Level-1 types to encode derivations.
No level-1 terms since we never reconstruct a derivation, only verify that one exists.
LF Fragment, cont’d
ty : type
exp : type
(Level-0 types.)
of : exp -> ty -> type
(Level-1 type family.)
Disallowing level-2 and higher type families seems not to have any practical impact.
Logic Interpreter: Goals
G ::= B | M = M’ | ∀x:B.G | ∃x:A.G | ⊤ | G1 ∧ G2
For Necula’s example, the interpreter will be started with the goal
∃t:ty. of E t
Naïve Interpreter
solve(B1 → B2) = ∀x:B1. solve(B2)
solve(Πx:A.B) = ∀x:A. solve(B)
solve(a M1 … Mn) = subgoals(B, a M1 … Mn), where B is the type of a level-1 constant or a level-1 quantified variable (in scope), as selected by the oracle.
subgoals(B1 → B2, B) = subgoals(B2, B) ∧ solve(B1)
subgoals(Πx:A.B’, B) = ∃x:A. subgoals(B’, B)
subgoals(a M1’ … Mn’, a M1 … Mn) = M1 = M1’ ∧ … ∧ Mn = Mn’
Back to the example
Consider
solve(of E t)
This consults the oracle.
Since there are 3 level-1 constants that could be used at this point, 2 bits are fetched from the oracle string (to select tapp).
Higher-Order Unification
The unification goals that remain after solve are higher-order and thus only semi-decidable.
A nondeterministic unification procedure (also driven by the oracle string) is used.
Some standard LP optimizations are also used.
Certifying Theorem Proving
Time does not allow a description here.
See: Necula and Lee. Proof generation in the Touchstone theorem prover. CADE’00.
Of particular interest: Proof-generating congruence-closure and simplex algorithms.
Resource Constraints
Bounds on certain resources can be enforced via counting.
In a Reference Interpreter:
Maintain a global counter.
Increment the count for each instruction executed.
Verify for each instruction that the limit is not exceeded.
Use the compiler to optimize away the counting operations.
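The counting scheme can be sketched with a toy stack machine. The instruction set (`push`, `add`, `jmp`) and the `LimitExceeded` exception are hypothetical; the point is only that the interpreter counts every instruction and checks the limit before executing it.

```python
class LimitExceeded(Exception):
    """Raised when the program tries to run more instructions than allowed."""

def run(program, limit):
    """Execute (op, arg) instructions, counting each one against `limit`.
    Returns the final stack and the number of instructions executed."""
    stack, pc, executed = [], 0, 0
    while pc < len(program):
        if executed >= limit:             # verify the bound before every step
            raise LimitExceeded(f"more than {limit} instructions")
        executed += 1                     # increment the counter
        op, arg = program[pc]
        if op == "push":
            stack.append(arg)
            pc += 1
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
            pc += 1
        elif op == "jmp":
            pc = arg
        else:
            raise ValueError(f"unknown instruction {op}")
    return stack, executed

result = run([("push", 1), ("push", 2), ("add", None)], limit=10)
```

A program that loops forever, such as `[("jmp", 0)]`, is stopped as soon as the counter reaches the limit. In the PCC setting the aim is for the proof to establish the bound statically so that compiled code need not perform the counting at run time.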
Ten Good Things About PCC
1. Someone else does all the really hard work.
2. The host system changes very little.
...
Logic as a lingua franca
[Diagram: Code, Certifying Prover, Proof, Proof Engine, CPU]
Logic as a lingua franca
[Diagram: Code, Policy, VC, Certifying Prover, Proof, Proof Checker, CPU]
Language/compiler/machine dependences isolated from the proof checker.
Expressed as predicates and derivations in a formal logic.
Logic as a lingua franca
[Diagram: Policy, VC, Certifying Prover, Proof, Proof Checker, CPU; code shown:]
…
iadd
iaload
...
Code can be in any language once a Safety Policy is supplied.
Logic as a lingua franca
[Diagram: Policy, VC, Certifying Prover, Proof, Proof Checker, CPU; code shown:]
…
addl %eax,%ebx
testl %ecx,%ecx
jz NULLPTR
movl 4(%ecx),%edx
cmpl %edx,%ebx
jae ARRAYBNDS
movl 8(%ecx,%ebx,4),%edx
...
Adequacy of dynamic checks and “wrappers” can be verified.
Logic as a lingua franca
[Diagram: Policy, VC, Certifying Prover, Proof, Proof Checker, CPU; code shown:]
…
add %eax,%ebx
movl 8(%ecx,%ebx,4)...
Safety of optimized code can be verified.
Ten Good Things About PCC
3. You choose the language.
4. Optimized (“unsafe”) code is OK.
5. Verifies that your optimizer and dynamic checks are OK.
…
The Role of Programming Languages
Civilized programming languages can provide “safety for free”.
Well-formed/well-typed ⇒ safe.
Idea: Arrange for the compiler to “explain” why the target code it generates preserves the safety properties of the source program.
Certifying Compilers [Necula & Lee, PLDI’98]
Intuition:
Compiler “knows” why each translation step is semantics-preserving.
So, have it generate a proof that safety is preserved.
“Small theorems about big programs.”
Don’t try to verify the whole compiler, but only each output it generates.
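As a loose analogy for checking each output rather than the compiler, here is a toy sketch: an untrusted compiler from arithmetic expressions to stack code emits a certificate (a claimed maximum stack depth), and a small independent checker validates the emitted code against it. The expression encoding, instruction names, and the choice of stack depth as the "safety policy" are all assumptions for illustration; real PCC certificates are proofs of verification conditions.

```python
def compile_expr(expr):
    """Untrusted compiler: returns (stack code, claimed maximum stack depth).
    An expression is an int or a tuple (op, left, right) with op in add/mul."""
    if isinstance(expr, int):
        return [("push", expr)], 1
    op, left, right = expr
    left_code, left_depth = compile_expr(left)
    right_code, right_depth = compile_expr(right)
    # The left result stays on the stack while the right operand is computed.
    return left_code + right_code + [(op,)], max(left_depth, 1 + right_depth)

def check(code, claimed_depth):
    """Trusted checker: no stack underflow, depth never above the claim,
    and exactly one result left at the end."""
    depth = 0
    for instr in code:
        if instr[0] == "push":
            depth += 1
        elif instr[0] in ("add", "mul"):
            if depth < 2:
                return False
            depth -= 1
        else:
            return False
        if depth > claimed_depth:
            return False
    return depth == 1

code, certificate = compile_expr(("add", 1, ("mul", 2, 3)))
```

The checker is a few lines long and knows nothing about how `compile_expr` works, so a bug in the compiler shows up as a rejected output rather than as unsafe code being run.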
Automation via Certifying Compilation
[Diagram: Source code, Certifying Compiler, Object code, Proof, Policy, VC, Proof Checker, CPU]
Looks and smells like a compiler.
% spjc foo.java bar.class baz.c -ljdk1.2.2
Ten Good Things About PCC
6. Can sometimes be easy-to-use.
7. You can still be a “hero theorem hacker” if you want.
...
Ten Good Things About PCC
8. Proofs are a “semantic checksum”.
9. Possibility for richer safety policies.
10. Co-exists peacefully with crypto.
Acknowledgments
George Necula.
Robert Harper and Frank Pfenning.
Mark Plesko, Michael Donohue, and Guy Bialostocki.