Using Coq to generate and reason about x86 systems code
Andrew Kennedy & Nick Benton (MSR Cambridge)
Jonas Jensen (ITU Copenhagen)

The big picture
Compositional specification and verification of high-level behavioural properties of low-level systems code
Previous work of Benton et al. employed idealized machine code:
- simple design
- infinite memory; pointers are natural numbers
It’s time to get real(ish): hence, x86

Overview of talk
Modelling x86: bits, bytes, instructions, execution
Generating x86: assembling & compiling
Reasoning about x86: logic & proofs
Discussion

Our approach
Clean slate: the trusted base is just the hardware and its model in Coq†
- No dependencies on legacy code, languages, compilers, or software architectures
Verify everything, including (at some point) the loader-verifier
Do everything in Coq, making effective use of computation, notation, type classes, tactics, etc.
- No dependencies on external tools
- Coq as the “world’s best macro assembler”
† And a small boot loader

Modelling x86

Bits, bytes and words
We want to compute correctly and efficiently inside Coq:
- proper modelling of n-bit words: arithmetic with carry, sign, overflow, rotates, shifts, padding, the lot, all O(n)
- generic over word length, so the type is indexed by n : nat
We also want to reason soundly inside Coq: associativity, commutativity, order properties, etc.
$\mathbb{B}^n \cong \mathbb{Z}_{2^n}$
- Compute here ($\mathbb{B}^n$): n-tuples of bools
- Reason here ($\mathbb{Z}_{2^n}$): 'Z_(2^n) from the ssreflect library, reusing its lemmas
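To make the $\mathbb{B}^n \cong \mathbb{Z}_{2^n}$ correspondence concrete, here is a minimal Python sketch (not the Coq development) of the two directions of the isomorphism. The names `to_zp` and `from_z` are hypothetical, loosely echoing the `toZp` function mentioned later in the talk, and the most-significant-bit-first order is an assumption.

```python
from typing import Tuple

Bits = Tuple[bool, ...]  # an n-bit word, most significant bit first (assumed order)


def to_zp(p: Bits) -> int:
    """Map an n-tuple of bools to its value in Z_(2^n), i.e. an int in [0, 2^n)."""
    value = 0
    for bit in p:
        value = 2 * value + int(bit)
    return value


def from_z(n: int, z: int) -> Bits:
    """Inverse direction: reduce z modulo 2^n, then split it into n bits."""
    z %= 2 ** n
    return tuple(bool((z >> i) & 1) for i in reversed(range(n)))


print(to_zp((True, False, True)))  # 5
print(from_z(3, -1))               # (True, True, True)
```

Round-tripping in both directions, for every n-bit value, is exactly what the isomorphism promises.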

Example: definition of addition
Effective use of dependent types
The definition is very algorithmic, so we can compute!
Performance inside Coq? On this machine, about 2000 additions a second
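The Coq definition itself is not reproduced in this transcript. As a hedged illustration of what “very algorithmic” means, this Python sketch adds two bit tuples with a ripple of full adders in one pass over the n bits, so it is O(n). The names `adc_b` and `add_b` are hypothetical (only `addB` appears in the talk), and bits are again assumed most significant first.

```python
from typing import Tuple

Bits = Tuple[bool, ...]  # n-bit word, most significant bit first (assumed order)


def full_add(a: bool, b: bool, carry: bool) -> Tuple[bool, bool]:
    """One-bit full adder: returns (carry_out, sum_bit)."""
    return (a and b) or (carry and (a != b)), a != (b != carry)


def adc_b(carry: bool, p1: Bits, p2: Bits) -> Tuple[bool, Bits]:
    """Add two n-bit words with a carry-in; returns (carry_out, sum). O(n)."""
    if len(p1) != len(p2):
        raise ValueError("both words must have the same width n")
    out = []
    for a, b in zip(reversed(p1), reversed(p2)):  # least significant bit first
        carry, s = full_add(a, b, carry)
        out.append(s)
    return carry, tuple(reversed(out))


def add_b(p1: Bits, p2: Bits) -> Bits:
    """Wrap-around addition modulo 2^n: the final carry is discarded."""
    return adc_b(False, p1, p2)[1]
```

For example, `add_b((False, True, True, True), (False, False, False, True))` is `(True, False, False, False)`, i.e. 7 + 1 = 8 in 4 bits, and `adc_b` exposes the carry-out that the x86 CF flag needs.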

Example: proofs about addition
1. Deal with the n = 0 case
2. Apply injectivity of toZp to work in 'Z_(2^n): forall x y, toZp x = toZp y -> x = y
3. Rewrite using homomorphism lemmas, e.g. toZp (addB p1 p2) = (toZp p1 + toZp p2)%R
4. Apply the ssreflect “ring” lemma for 'Z_(2^n)
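The proof strategy above reduces properties of `addB` to ring facts in 'Z_(2^n) via an injective homomorphism. The Python sketch below proves nothing; it merely tests, exhaustively for small n (including n = 0), the two facts the strategy relies on: injectivity of the map into integers modulo 2^n, and the homomorphism law for addition. All names are hypothetical stand-ins for `toZp` and `addB`.

```python
from itertools import product
from typing import Tuple

Bits = Tuple[bool, ...]  # most significant bit first (assumed order)


def to_zp(p: Bits) -> int:
    """Bit tuple -> integer in [0, 2^n)."""
    v = 0
    for b in p:
        v = 2 * v + int(b)
    return v


def add_b(p1: Bits, p2: Bits) -> Bits:
    """Ripple-carry addition with the final carry discarded."""
    carry, out = False, []
    for a, b in zip(reversed(p1), reversed(p2)):
        out.append(a != (b != carry))
        carry = (a and b) or (carry and (a != b))
    return tuple(reversed(out))


def check_injective(n: int) -> bool:
    """toZp is injective: distinct n-bit tuples map to distinct integers."""
    words = list(product((False, True), repeat=n))
    return len({to_zp(p) for p in words}) == len(words)


def check_add_homomorphism(n: int) -> bool:
    """toZp (addB p1 p2) = (toZp p1 + toZp p2) mod 2^n, for all n-bit p1, p2."""
    words = list(product((False, True), repeat=n))
    return all(to_zp(add_b(p1, p2)) == (to_zp(p1) + to_zp(p2)) % 2 ** n
               for p1 in words for p2 in words)


print(all(check_injective(n) and check_add_homomorphism(n) for n in range(6)))  # True
```

Given these two facts, an equation such as associativity of `add_b` follows from associativity of addition modulo 2^n, which is the role the ring lemma plays in step 4.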

Machine state
Register state is just a total function
Flags can take on an undefined value (see later)
Abstractly, memory is a partial map DWORD ⇀ BYTE
- Partiality represents whether memory is mapped and accessible
- Concretely, for efficiency, it is a trie-like structure
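A minimal sketch of such a machine state, assuming a four-level trie keyed on the bytes of a 32-bit address (the actual trie shape in the development is not described here). Registers form a total map, flags use `None` for the undefined value, and a missing trie entry means unmapped memory. All class and field names are hypothetical.

```python
from typing import Dict, List, Optional

REGISTERS = ("EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "EBP", "ESP", "EIP")
FLAGS = ("CF", "PF", "AF", "ZF", "SF", "OF")


class Memory:
    """Partial map DWORD -> BYTE stored as a 4-level trie keyed on the
    address bytes. An absent entry means 'unmapped or inaccessible'."""

    def __init__(self) -> None:
        self.root: Dict[int, dict] = {}

    @staticmethod
    def _path(addr: int) -> List[int]:
        # the four bytes of a 32-bit address, most significant first
        return [(addr >> shift) & 0xFF for shift in (24, 16, 8, 0)]

    def read(self, addr: int) -> Optional[int]:
        node = self.root
        *prefix, last = self._path(addr)
        for key in prefix:
            node = node.get(key)
            if node is None:
                return None  # the whole sub-trie is unmapped
        return node.get(last)

    def write(self, addr: int, value: int) -> None:
        """Map (or overwrite) one byte, creating interior trie nodes as needed."""
        node = self.root
        *prefix, last = self._path(addr)
        for key in prefix:
            node = node.setdefault(key, {})
        node[last] = value & 0xFF


class State:
    def __init__(self) -> None:
        self.regs = {r: 0 for r in REGISTERS}  # total: every register has a value
        self.flags: Dict[str, Optional[bool]] = {f: None for f in FLAGS}  # None = undefined
        self.mem = Memory()
```

A faithful model would distinguish mapping a page from writing to it (a write to unmapped memory should fault); `write` here conflates the two for brevity.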

x86 instructions
x86 is notoriously large and baroque (the instruction set manual alone is 1640 pages long)
Subset only: no legacy 16-bit mode, flat memory model (no segment nonsense), no floating point, no SIMD instructions, no protected-mode instructions, no 64-bit mode (yet)
Actually not too bad: it is possible to factor the instruction set so that the Coq datatype is “total” (no junk)

Addressing modes
e.g. ADD EBX, [EDI + EDX*4 + 12]
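For the example operand (base EDI, index EDX scaled by 4, displacement 12), the address is computed as base + index × scale + displacement, wrapping at 32 bits. This hypothetical Python helper shows that computation; the register-dictionary representation is an assumption, while the restrictions on scale factors and on ESP as an index are standard x86 rules.

```python
from typing import Dict, Optional, Tuple


def effective_address(regs: Dict[str, int], base: Optional[str],
                      index: Optional[Tuple[str, int]], disp: int) -> int:
    """base + index*scale + disp, computed modulo 2^32 as on 32-bit x86."""
    addr = disp
    if base is not None:
        addr += regs[base]
    if index is not None:
        reg, scale = index
        if reg == "ESP":
            raise ValueError("ESP cannot be used as an index register")
        if scale not in (1, 2, 4, 8):
            raise ValueError("scale must be 1, 2, 4 or 8")
        addr += regs[reg] * scale
    return addr % 2 ** 32


# [EDI + EDX*4 + 12] with EDI = 0x1000 and EDX = 3
print(hex(effective_address({"EDI": 0x1000, "EDX": 3}, "EDI", ("EDX", 4), 12)))  # 0x1018
```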

Instruction format
The manuals don’t reveal much “structure” (such as it is) in the instruction format
But it can be discerned, and utilised for concise decoding functions

Instruction decoding
Uses monadic syntax: the reader reads from memory and advances the pointer
Note: there may be many instruction formats for the same instruction
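A hedged Python sketch of the monadic decoding style: a reader is a function from (memory, position) to (value, next position), with `ret` and `bind` sequencing reads. It covers only a handful of real encodings and, to illustrate the note above, decodes INC ECX from both the short form 0x41 and the long form FF C1. Memory as a `bytes` object whose index is the address is an assumption, as are all function names.

```python
from typing import Any, Callable, List, Tuple

# A reader takes (memory, position) and returns (value, next position).
Reader = Callable[[bytes, int], Tuple[Any, int]]
REG_NAMES = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]  # x86 encoding order


def ret(value: Any) -> Reader:
    return lambda mem, pos: (value, pos)


def bind(r: Reader, k: Callable[[Any], Reader]) -> Reader:
    def run(mem: bytes, pos: int) -> Tuple[Any, int]:
        value, next_pos = r(mem, pos)
        return k(value)(mem, next_pos)
    return run


def read_byte(mem: bytes, pos: int) -> Tuple[int, int]:
    if pos >= len(mem):
        raise IndexError("read past end of memory")
    return mem[pos], pos + 1


def read_dword(mem: bytes, pos: int) -> Tuple[int, int]:
    if pos + 4 > len(mem):
        raise IndexError("read past end of memory")
    return int.from_bytes(mem[pos:pos + 4], "little"), pos + 4


def get_pos(mem: bytes, pos: int) -> Tuple[int, int]:
    return pos, pos  # expose the current position without consuming input


def decode_opcode(op: int) -> Reader:
    if op == 0x90:
        return ret(("NOP",))
    if op == 0xC3:
        return ret(("RET",))
    if 0x40 <= op <= 0x47:  # short form: INC r32
        return ret(("INC", REG_NAMES[op - 0x40]))
    if op == 0xFF:  # long form: FF /0 with a register ModR/M byte is also INC r32
        def modrm(b: int) -> Reader:
            if (b >> 6) == 3 and ((b >> 3) & 7) == 0:
                return ret(("INC", REG_NAMES[b & 7]))
            raise ValueError(f"unsupported ModR/M byte {b:#x}")
        return bind(read_byte, modrm)
    if op == 0xE9:  # JMP rel32, turned into an absolute target
        return bind(read_dword, lambda rel: bind(
            get_pos, lambda nxt: ret(("JMP", (nxt + rel) % 2 ** 32))))
    raise ValueError(f"unsupported opcode {op:#x}")


decode_instr: Reader = bind(read_byte, decode_opcode)


def decode_all(mem: bytes) -> List[tuple]:
    pos, out = 0, []
    while pos < len(mem):
        instr, pos = decode_instr(mem, pos)
        out.append(instr)
    return out
```

For instance, `decode_all(bytes([0x90, 0x41, 0xE9, 0xFA, 0xFF, 0xFF, 0xFF, 0xC3]))` gives `[("NOP",), ("INC", "ECX"), ("JMP", 1), ("RET",)]`: the jump's displacement of -6 is relative to the next instruction at address 7.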

Instruction execution
Currently, a partial function from State to State
Implemented in monadic style, using “primitive” operations to read and write registers, flags, memory, etc.
Factored to re-use common patterns, e.g. evalMemSpec, evalSrc
Example fragment: call and return
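The call and return fragment can be read as a partial function from states to states. This hypothetical Python sketch returns a new state, or `None` to model a fault when the stack touches unmapped memory, which plays the role of an option monad. The state layout (a register dictionary plus a byte-addressed memory dictionary) and the explicit `next_eip` argument are assumptions made for brevity.

```python
import copy
from typing import Dict, Optional

MASK = 2 ** 32 - 1


def read_dword(mem: Dict[int, int], addr: int) -> Optional[int]:
    addrs = [(addr + i) & MASK for i in range(4)]
    if any(a not in mem for a in addrs):
        return None  # fault: some byte is unmapped
    return sum(mem[a] << (8 * i) for i, a in enumerate(addrs))  # little-endian


def write_dword(mem: Dict[int, int], addr: int, value: int) -> bool:
    addrs = [(addr + i) & MASK for i in range(4)]
    if any(a not in mem for a in addrs):
        return False  # fault: nothing is written
    for i, a in enumerate(addrs):
        mem[a] = (value >> (8 * i)) & 0xFF
    return True


def exec_call(state: dict, target: int, next_eip: int) -> Optional[dict]:
    """CALL: push the return address (next_eip), then jump to target."""
    s = copy.deepcopy(state)
    sp = (s["regs"]["ESP"] - 4) & MASK
    if not write_dword(s["mem"], sp, next_eip):
        return None
    s["regs"]["ESP"] = sp
    s["regs"]["EIP"] = target
    return s


def exec_ret(state: dict) -> Optional[dict]:
    """RET: pop the return address into EIP."""
    s = copy.deepcopy(state)
    sp = s["regs"]["ESP"]
    ret_addr = read_dword(s["mem"], sp)
    if ret_addr is None:
        return None
    s["regs"]["ESP"] = (sp + 4) & MASK
    s["regs"]["EIP"] = ret_addr
    return s
```

Starting from ESP = 0x2004 with 0x2000 to 0x2003 mapped, a call to 0x5000 from an instruction ending at 0x1005 followed by a return restores EIP = 0x1005 and ESP = 0x2004.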

Non-determinism & under-specification

Representing non-determinism and under-specification
For sequential x86, the subset we care about is almost completely deterministic
Flags are the main issue, so we introduce an “undefined” state for flags: an instruction that depends on a flag whose value is undefined (e.g. branch-on-carry) then has unspecified behaviour
An alternative would be to set flags non-deterministically (cf. RockSalt)
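To illustrate the undefined-flag approach, this sketch uses unsigned 32-bit MUL, after which the Intel manual defines CF and OF but leaves SF, ZF, AF and PF undefined. `None` stands for the undefined state, and a branch that reads such a flag returns `None` to model unspecified behaviour. The function names are hypothetical; this is not the development's semantics.

```python
from typing import Dict, Optional, Tuple

Flags = Dict[str, Optional[bool]]  # None models the "undefined" flag state


def mul32(eax: int, src: int) -> Tuple[int, int, Flags]:
    """Unsigned 32-bit MUL: EDX:EAX := EAX * src. Returns (edx, eax, flags)."""
    product = eax * src
    edx, low = product >> 32, product & 0xFFFFFFFF
    upper_nonzero = edx != 0
    flags: Flags = {"CF": upper_nonzero, "OF": upper_nonzero,
                    # the manual leaves these undefined after MUL
                    "SF": None, "ZF": None, "AF": None, "PF": None}
    return edx, low, flags


def branch_target(flag: str, flags: Flags, target: int, fallthrough: int) -> Optional[int]:
    """Next EIP for a branch on `flag`; None models unspecified behaviour."""
    value = flags[flag]
    if value is None:
        return None  # branching on an undefined flag: behaviour unspecified
    return target if value else fallthrough
```

After `mul32(0x10000, 0x10000)`, a branch on CF is well defined (the upper half is 1, so CF is set), but a branch on ZF has unspecified behaviour.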

Generating x86: Assembling and Compiling

Instruction encoding
Directly represent the encoding as a list of bytes
Note: the encoding is position-dependent
In future we might mirror decoding, using a monadic style
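The position dependence is easiest to see with a relative jump. In this hypothetical Python sketch an instruction carries an absolute jump target, and the encoder must know the instruction's own address to compute the rel32 displacement. Only a few real opcodes are covered (NOP 0x90, RET 0xC3, INC r32 0x40+r, JMP rel32 0xE9).

```python
from typing import List

REG_NAMES = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]  # x86 encoding order


def encode(instr: tuple, pos: int) -> List[int]:
    """Encode one instruction, assumed to be placed at address pos."""
    kind = instr[0]
    if kind == "NOP":
        return [0x90]
    if kind == "RET":
        return [0xC3]
    if kind == "INC":
        return [0x40 + REG_NAMES.index(instr[1])]
    if kind == "JMP":
        # The instruction holds an absolute target, but JMP rel32 (E9) stores
        # target - (address of the next instruction): the position dependence.
        rel = (instr[1] - (pos + 5)) % 2 ** 32
        return [0xE9] + list(rel.to_bytes(4, "little"))
    raise ValueError(f"cannot encode {instr!r}")


def encode_program(instrs: List[tuple], pos: int) -> List[int]:
    out: List[int] = []
    for instr in instrs:
        encoded = encode(instr, pos)
        out += encoded
        pos += len(encoded)
    return out
```

The same `("JMP", 0x1000)` encodes as `E9 FB FF FF FF` at address 0x1000 but as `E9 FB EF FF FF` at address 0x2000.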

Jumps and labels
Targets of jumps and branches are just absolute addresses in the Instr type. To write assembler code we want labels; for this we use a kind of HOAS (higher-order abstract syntax) type:

Syntax matters
Cute use of notation in Coq: we can write assembler code more or less in the syntax of real assemblers!
But we can also make use of Coq definitions, and “macros”
[Code example shown on slide, annotated: while macro; label; label binding]

Assembling
Given an assembler program and an address at which to locate it, we can produce a sequence of bytes in the usual “two-pass” way:
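Here is a hedged sketch of a two-pass assembler over a HOAS-style program type, in Python rather than Coq: `DecLabel` takes a Python function that binds the label, in the spirit of the HOAS type described under “Jumps and labels”. The first pass instantiates each bound label with a placeholder and records where it is defined; the second pass re-instantiates it with that address and emits bytes. It assumes that jumps always use the 5-byte rel32 form (so sizes never depend on label values) and that a program's shape does not depend on the values of its labels. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Nop:
    pass


@dataclass
class Jmp:
    target: object  # a label: a placeholder in pass 1, an address in pass 2


@dataclass
class Label:
    name: object  # "this label is defined here"; occupies no bytes


@dataclass
class Seq:
    first: "Prog"
    second: "Prog"


@dataclass
class DecLabel:
    body: Callable[[object], "Prog"]  # HOAS: a Python function binds the label


Prog = Union[Nop, Jmp, Label, Seq, DecLabel]


class Placeholder:
    """First-pass stand-in for the index-th declared label."""
    def __init__(self, index: int) -> None:
        self.index = index


def first_pass(p: Prog, pos: int, counter: List[int], found: Dict[int, int]) -> int:
    """Compute sizes and record the address at which each declared label lands."""
    if isinstance(p, Nop):
        return pos + 1
    if isinstance(p, Jmp):
        return pos + 5  # always rel32, so the size never depends on the target
    if isinstance(p, Label):
        found[p.name.index] = pos
        return pos
    if isinstance(p, Seq):
        return first_pass(p.second, first_pass(p.first, pos, counter, found), counter, found)
    if isinstance(p, DecLabel):
        placeholder = Placeholder(counter[0])
        counter[0] += 1
        return first_pass(p.body(placeholder), pos, counter, found)
    raise TypeError(p)


def second_pass(p: Prog, pos: int, counter: List[int],
                addrs: Dict[int, int], out: List[int]) -> int:
    """Emit bytes, instantiating each declared label with its real address."""
    if isinstance(p, Nop):
        out.append(0x90)
        return pos + 1
    if isinstance(p, Jmp):
        rel = (p.target - (pos + 5)) % 2 ** 32
        out.extend([0xE9] + list(rel.to_bytes(4, "little")))
        return pos + 5
    if isinstance(p, Label):
        assert p.name == pos, "label address must match its position"
        return pos
    if isinstance(p, Seq):
        mid = second_pass(p.first, pos, counter, addrs, out)
        return second_pass(p.second, mid, counter, addrs, out)
    if isinstance(p, DecLabel):
        address = addrs[counter[0]]
        counter[0] += 1
        return second_pass(p.body(address), pos, counter, addrs, out)
    raise TypeError(p)


def assemble(p: Prog, origin: int) -> List[int]:
    found: Dict[int, int] = {}
    first_pass(p, origin, [0], found)
    out: List[int] = []
    second_pass(p, origin, [0], found, out)
    return out
```

For example, the backward-jumping loop `DecLabel(lambda L: Seq(Label(L), Seq(Nop(), Jmp(L))))` located at 0x1000 assembles to `90 E9 FA FF FF FF`.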

Round-trip theorem
The statement of correctness uses the overloaded “points-to” predicate, described later
[Theorem shown on slide, annotated: “memory between offset and endpos contains bytes”; “memory between offset and endpos decodes to prog”]
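Reading the two annotations as hypothesis and conclusion, the theorem has roughly the following shape. This is a hedged paraphrase, not the statement from the development: `assemble` is a hypothetical name for the two-pass assembler, and $i..j \mapsto v$ stands for the overloaded points-to predicate (memory from $i$ to $j$ decodes to $v$).

```latex
\mathit{assemble}(\mathit{offset}, \mathit{prog}) = \mathit{bytes}
\;\wedge\; \mathit{endpos} = \mathit{offset} + |\mathit{bytes}|
\;\Longrightarrow\;
\big(\mathit{offset}..\mathit{endpos} \mapsto \mathit{bytes}
\;\vdash\;
\mathit{offset}..\mathit{endpos} \mapsto \mathit{prog}\big)
```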

Little languages
Instead of trusting (or modelling) existing languages such as C, we plan to develop little languages inside Coq
We have experimented with a tiny imperative language and its “compiler”, proved correct in Coq
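As a flavour of the little-language idea, here is a minimal Python sketch of a compiler for straight-line assignments over 32-bit expressions, targeting a toy machine with x86-like instruction names, together with a source interpreter to compare against. It is a testing-style sanity check, not a proof, and the language, instruction set and names are all hypothetical; the talk's language also has while loops, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

MASK = 0xFFFFFFFF  # 32-bit wrap-around arithmetic


@dataclass
class Const:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class Assign:
    var: str
    expr: "Expr"


Expr = Union[Const, Var, Add]


def eval_expr(e: Expr, env: Dict[str, int]) -> int:
    if isinstance(e, Const):
        return e.value & MASK
    if isinstance(e, Var):
        return env[e.name]
    return (eval_expr(e.left, env) + eval_expr(e.right, env)) & MASK


def exec_prog(prog: List[Assign], env: Dict[str, int]) -> Dict[str, int]:
    env = dict(env)
    for s in prog:
        env[s.var] = eval_expr(s.expr, env)
    return env


def compile_expr(e: Expr) -> List[tuple]:
    """Leave the value of e in EAX; EBX and the stack are scratch."""
    if isinstance(e, Const):
        return [("MOV EAX, imm", e.value & MASK)]
    if isinstance(e, Var):
        return [("MOV EAX, [var]", e.name)]
    return (compile_expr(e.left) + [("PUSH EAX",)] + compile_expr(e.right)
            + [("POP EBX",), ("ADD EAX, EBX",)])


def compile_prog(prog: List[Assign]) -> List[tuple]:
    return [i for s in prog for i in compile_expr(s.expr) + [("MOV [var], EAX", s.var)]]


def run(code: List[tuple], mem: Dict[str, int]) -> Dict[str, int]:
    """Toy target machine: variables live in named memory cells."""
    mem, regs, stack = dict(mem), {"EAX": 0, "EBX": 0}, []
    for op, *args in code:
        if op == "MOV EAX, imm":
            regs["EAX"] = args[0]
        elif op == "MOV EAX, [var]":
            regs["EAX"] = mem[args[0]]
        elif op == "PUSH EAX":
            stack.append(regs["EAX"])
        elif op == "POP EBX":
            regs["EBX"] = stack.pop()
        elif op == "ADD EAX, EBX":
            regs["EAX"] = (regs["EAX"] + regs["EBX"]) & MASK
        elif op == "MOV [var], EAX":
            mem[args[0]] = regs["EAX"]
    return mem
```

The correctness statement being approximated is `run(compile_prog(p), env) == exec_prog(p, env)` for every program `p` and initial store `env`.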

Code demo!

Reasoning about x86: Logic and Proof

Big picture
Assertion logic: predicates on partial states, with the usual connectives plus separating conjunction
Specification logic over this, incorporating step-indexing and framing, with corresponding later and frame connectives
The safety specification is used to give rules for instructions in CPS style, packaged as Hoare-style triples for non-jumpy instructions
Our treatment of labels makes for elegant definitions and rules for macros (e.g. while, if)

Partial states
Partiality denotes partial description, as usual for separation logic
Not to be confused with the use of partiality for flags (undefined state) and memory (unmapped or inaccessible)

Assertion logic
Assertions (= SPred) are predicates on partial states
We define a separation logic of assertions, with the usual connectives. Example rules:
The points-to predicate for memory is overloaded for different “decoders” of memory
Core definition: memory from p to q “decodes” to value x
x could be a BYTE, a DWORD, a seq BYTE, or even an Instr
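A small executable model of these ideas, assuming partial states are Python dictionaries from locations (registers or memory addresses) to values and assertions are Boolean predicates on them. `star` searches for a split of the state into two disjoint parts, and `mem_decodes` is a hypothetical rendering of the core points-to definition, parameterised by a decoder so it can be specialised to a BYTE or a DWORD.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable

PartialState = Dict[Hashable, int]
SPred = Callable[[PartialState], bool]


def emp() -> SPred:
    return lambda s: len(s) == 0


def points_to(loc: Hashable, value: int) -> SPred:
    """loc |-> value: the partial state is exactly {loc: value}."""
    return lambda s: s == {loc: value}


def star(p: SPred, q: SPred) -> SPred:
    """Separating conjunction: s splits into disjoint parts satisfying p and q."""
    def holds(s: PartialState) -> bool:
        keys = list(s)
        for r in range(len(keys) + 1):
            for left in combinations(keys, r):
                s1 = {k: s[k] for k in left}
                s2 = {k: s[k] for k in keys if k not in left}
                if p(s1) and q(s2):
                    return True
        return False
    return holds


def mem_decodes(p: int, q: int, decode: Callable[[bytes], object], x: object) -> SPred:
    """Memory from p to q (exclusive) is exactly the state, and decodes to x."""
    def holds(s: PartialState) -> bool:
        addrs = [("mem", a) for a in range(p, q)]
        if set(s) != set(addrs):
            return False
        return decode(bytes(s[a] for a in addrs)) == x
    return holds


def byte_at(p: int, x: int) -> SPred:
    return mem_decodes(p, p + 1, lambda bs: bs[0], x)


def dword_at(p: int, x: int) -> SPred:
    return mem_decodes(p, p + 4, lambda bs: int.from_bytes(bs, "little"), x)
```

With EAX holding 7 and bytes 78 56 34 12 at 0x10, the state satisfies `star(points_to(("reg", "EAX"), 7), dword_at(0x10, 0x12345678))`, while a single cell can never satisfy a points-to fact for that cell starred with itself.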

Safety
Machine code does not “finish”, so the standard Hoare triple does not suit; also, code is mixed up with the store. So we define safe k P to mean “runs without faulting for k steps from any state satisfying P”
Example: tight loop
Example: jmp
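A toy rendering of safe k P, assuming a machine whose memory maps addresses directly to decoded instructions (so code lives in the store) and a precondition P given extensionally as a finite list of states. Fetching from an address with no instruction counts as a fault. The names, and the simplification that every instruction occupies a single address, are assumptions.

```python
from typing import Dict, Iterable, Optional, Tuple

Code = Dict[int, tuple]   # address -> decoded instruction; code lives in the store
State = Tuple[int, Code]  # (EIP, memory)


def step(state: State) -> Optional[State]:
    """One machine step; None means the machine faulted."""
    eip, mem = state
    instr = mem.get(eip)
    if instr is None:
        return None  # no instruction here: fetching faults
    if instr[0] == "NOP":
        return eip + 1, mem  # toy simplification: each instruction is 1 address long
    if instr[0] == "JMP":
        return instr[1], mem
    return None  # anything else is treated as a fault in this toy


def runs_without_fault(state: State, k: int) -> bool:
    for _ in range(k):
        next_state = step(state)
        if next_state is None:
            return False
        state = next_state
    return True


def safe(k: int, states_satisfying_p: Iterable[State]) -> bool:
    """safe k P: every state satisfying P runs k steps without faulting."""
    return all(runs_without_fault(s, k) for s in states_satisfying_p)


tight_loop = (0x100, {0x100: ("JMP", 0x100)})
print(safe(1000, [tight_loop]))  # True
```

A jump into a tight loop is likewise safe for every k, whereas a lone NOP followed by unmapped memory is safe for 1 step but not for 2.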

Specification logic
It’s painful working directly with safe: we must work explicitly with the “step index” k and the “frame” R
Instead, we define a specification logic in which a spec is a set S of pairs such that:
In other words, it builds in steps and frames

Connectives for spec logic
To hide explicit step indices, we use a later connective and the Löb rule:
We define a frame connective
It gives us a “frame rule” for specs, and distributes over the other connectives
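In a standard step-indexed formulation, writing $S\,k\,R$ for “spec $S$ holds at step index $k$ with frame $R$”, the later and frame connectives and their key rules look as follows. This is a sketch assuming that specs are downward-closed in $k$ and closed under enlarging the frame (our guess at the elided “such that” condition); the development's exact definitions may differ.

```latex
(\triangleright S)\;k\;R \;\iff\; \forall j < k.\; S\;j\;R
\qquad
\frac{\triangleright S \vdash S}{\vdash S}\;(\text{Löb})

(S \otimes R')\;k\;R \;\iff\; S\;k\;(R' * R)
\qquad
\frac{}{S \vdash S \otimes R'}\;(\text{frame})
\qquad
(S_1 \wedge S_2) \otimes R' \;\dashv\vdash\; (S_1 \otimes R') \wedge (S_2 \otimes R')
```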

Basic blocks
Given our definitions of safety and points-to for instructions, we can mimic Hoare-style triples for basic blocks:
We can then derive familiar rules such as framing:
This is useful when proving properties of straight-line machine code
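One way to write such a basic-block triple, as a hedged sketch rather than the development's definition: $c$ occupies addresses $i$ to $j$, and the triple says that whenever it is safe to continue at $j$ in a state satisfying $Q$, it is safe to start at $i$ in a state satisfying $P$. The frame rule then takes its familiar form. Here $i..j \mapsto c$ is the overloaded points-to predicate on instructions.

```latex
\{P\}\;c\;\{Q\} \;\triangleq\; \forall i\,j.\;
\Big(\mathsf{safe} \otimes (\mathit{EIP} \mapsto j * Q)
\;\Rightarrow\;
\mathsf{safe} \otimes (\mathit{EIP} \mapsto i * P)\Big)
\otimes (i..j \mapsto c)

\frac{\{P\}\;c\;\{Q\}}{\{P * R\}\;c\;\{Q * R\}}\;(\text{frame})
```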

Rules for instructions (I): no control flow
Use Hoare-like triples
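The intent of a Hoare-like triple for a non-jumping instruction can be tested on a toy model. This hypothetical Python sketch runs an n-bit ADD EAX, EBX from every state matching the precondition and compares the result with the postcondition, also checking that an unrelated frame cell (ECX) is untouched. It models only CF and ZF, whereas real ADD also updates OF, SF, AF and PF.

```python
from itertools import product


def exec_add(state: dict, dst: str, src: str, n: int) -> dict:
    """Toy n-bit ADD dst, src: only CF and ZF are modelled; other cells are untouched."""
    total = state[dst] + state[src]
    out = dict(state)
    out[dst] = total % 2 ** n
    out["CF"] = total >= 2 ** n
    out["ZF"] = out[dst] == 0
    return out


def check_add_triple(n: int) -> bool:
    """For all a, b and frame value c, check
    {EAX |-> a * EBX |-> b * CF |-> ? * ZF |-> ?}  ADD EAX, EBX
    {EAX |-> (a+b) mod 2^n * EBX |-> b * CF |-> carry * ZF |-> zero}
    with the frame ECX |-> c preserved."""
    for a, b, c in product(range(2 ** n), repeat=3):
        pre = {"EAX": a, "EBX": b, "CF": None, "ZF": None, "ECX": c}
        post = exec_add(pre, "EAX", "EBX", n)
        s = (a + b) % 2 ** n
        expected = {"EAX": s, "EBX": b, "CF": a + b >= 2 ** n, "ZF": s == 0, "ECX": c}
        if post != expected:
            return False
    return True
```

Note that the precondition allows CF and ZF to start undefined (`None`), since ADD overwrites them regardless.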

Rules for instructions (II): control flow
Explicit CPS-like use of safe
Two possible continuations
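For a conditional jump, a rule in this CPS style might look as follows: one continuation at the target $t$ when ZF is set and one at the fall-through address $j$ when it is clear, each one step later (hence $\triangleright$). This is a hedged sketch, not the slide's rule; every spec is to be read as framed by the code $i..j \mapsto \mathtt{JZ}\;t$, and $\mathit{ZF} \mapsto b$ requires the flag to hold a defined boolean $b$.

```latex
\Big(b = \mathsf{true} \Rightarrow \triangleright\,\mathsf{safe} \otimes (\mathit{EIP} \mapsto t * \mathit{ZF} \mapsto b * P)\Big)
\;\wedge\;
\Big(b = \mathsf{false} \Rightarrow \triangleright\,\mathsf{safe} \otimes (\mathit{EIP} \mapsto j * \mathit{ZF} \mapsto b * P)\Big)
\;\vdash\;
\mathsf{safe} \otimes (\mathit{EIP} \mapsto i * \mathit{ZF} \mapsto b * P)
```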

Reasoning with labels
We overload “points-to” on assembler programs, so (roughly):

Macros
Our representation of scoped labels makes it easy to define macros that make use of labels internally, and to derive rules for them.
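To illustrate why scoped labels make macros easy, this hypothetical Python sketch defines a `while_` macro that binds two internal labels with HOAS-style `DecLabel` nodes, plus a printer that invents fresh label names. Because each expansion binds its own labels, nested loops cannot clash. The layout (jump to the test first, loop back while the condition holds) is one common choice and an assumption here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Instr:
    text: str  # an already-formatted instruction, e.g. "DEC ECX"


@dataclass
class Jmp:
    target: str


@dataclass
class Jcc:
    cond: str
    target: str


@dataclass
class Label:
    name: str


@dataclass
class DecLabel:
    body: Callable[[str], object]  # HOAS: a Python function binds the label


@dataclass
class Seq:
    items: list


def while_(test, cond: str, body):
    """Loop while `cond` holds after `test`; both labels are local to this expansion."""
    return DecLabel(lambda top: DecLabel(lambda check: Seq([
        Jmp(check), Label(top), body, Label(check), test, Jcc(cond, top)])))


def pretty(p, counter: Optional[List[int]] = None) -> List[str]:
    """Render a program, giving each bound label a fresh name L0, L1, ..."""
    counter = [0] if counter is None else counter
    if isinstance(p, Instr):
        return ["  " + p.text]
    if isinstance(p, Jmp):
        return [f"  JMP {p.target}"]
    if isinstance(p, Jcc):
        return [f"  J{p.cond} {p.target}"]
    if isinstance(p, Label):
        return [f"{p.name}:"]
    if isinstance(p, Seq):
        return [line for q in p.items for line in pretty(q, counter)]
    if isinstance(p, DecLabel):
        name = f"L{counter[0]}"
        counter[0] += 1
        return pretty(p.body(name), counter)
    raise TypeError(p)


print("\n".join(pretty(while_(Instr("CMP ECX, 0"), "NZ", Instr("DEC ECX")))))
```

The single loop prints `JMP L1`, `L0:`, `DEC ECX`, `L1:`, `CMP ECX, 0`, `JNZ L0`; nesting one `while_` inside another yields four distinct labels.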

Putting it together: a spec for a memory allocator

Trivial implementation of the allocator
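The slide's implementation is not reproduced in this transcript. As a hedged stand-in, here is the sort of trivial allocator one might verify first: a bump allocator that never frees, sketched in Python over an abstract word-addressed memory (so the limit sits at info + 1 rather than info + 4). The two-cell info block and all names are assumptions.

```python
from typing import Dict, Optional


def alloc(mem: Dict[int, int], info: int, n: int) -> Optional[int]:
    """Hypothetical bump allocator. mem[info] holds the next free address and
    mem[info + 1] the end of the arena; returns the block's base, or None on failure."""
    base, limit = mem[info], mem[info + 1]
    if n > limit - base:
        return None        # not enough room: fail, leaving the state unchanged
    mem[info] = base + n   # bump the free pointer; blocks are never freed
    return base


arena = {0: 100, 1: 116}   # free region is [100, 116)
print(alloc(arena, 0, 8), alloc(arena, 0, 8), alloc(arena, 0, 1))  # 100 108 None
```

A spec for this would say that, on success, the caller receives exclusive ownership of the n cells starting at the returned address, while the allocator keeps ownership of its info block and the remaining free region.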

Proof support
It is very painful to work with assertions and specs using only the primitive rules
We have built Coq tactic support for:
- basic simplification of formulae (associativity and commutativity of *, etc.)
- pulling out existential quantifiers automatically
This greatly simplifies proving!
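To give a flavour of what such tactic support automates, here is a hypothetical Python normaliser for a tiny formula language: it drops emp, flattens and sorts *-conjuncts (associativity and commutativity), and pulls existential quantifiers to the front. It assumes bound variable names are distinct and not free elsewhere, so pulling ∃ out of * is sound; it makes no attempt at alpha-equivalence.

```python
from typing import List, Tuple

# Formulas: ("emp",), ("atom", text), ("star", f1, f2), ("ex", var, body)


def flatten(f: tuple) -> Tuple[List[str], List[str]]:
    """Return (existential variables, atoms), pulling every ex to the front."""
    tag = f[0]
    if tag == "emp":
        return [], []  # emp is the unit of *
    if tag == "atom":
        return [], [f[1]]
    if tag == "star":
        v1, a1 = flatten(f[1])
        v2, a2 = flatten(f[2])
        return v1 + v2, a1 + a2  # (ex x. P) * Q == ex x. (P * Q), x not free in Q
    if tag == "ex":
        vs, atoms = flatten(f[2])
        return [f[1]] + vs, atoms
    raise ValueError(f"unknown formula {f!r}")


def normalize(f: tuple) -> Tuple[List[str], List[str]]:
    """Canonical form: sorted quantifier prefix and sorted multiset of atoms."""
    vs, atoms = flatten(f)
    return sorted(vs), sorted(atoms)
```

Two formulas with the same normal form are equivalent; the converse fails (for example, under renaming of bound variables), so this is a simplification step rather than a decision procedure.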

Proof demo!

Status
We can generate and prove correct tiny programs written in “Coq” assembler and in a small while-language
The binary generated by Coq can be run on “raw metal” (booted off a CD!)
Next steps:
- a model of I/O, e.g. screen/keyboard; currently our “observable” is just “faulting”
- a high-level model of processes
- build and verify OS components such as the scheduler, allocator and loader
- eventual aim: a process isolation theorem