Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O)...

51
Modular Semantics for LLVM IR Steve Zdancewic DeepSpec @ PLDI 2018

Transcript of Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O)...

Page 1: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Modular Semantics for LLVM IR

Steve Zdancewic DeepSpec @ PLDI 2018

Page 2: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Collaborators •  Lennart Berringer •  Joachim Breitner •  Dmitri Garbuzov •  Olek Gierczak •  Gil Hur •  Jeehon Kang •  Nicolas Koh •  Yao Li •  Yishuai Li •  William Mansky •  Milo M.K. Martin •  Santosh Nagarakatte •  Benjamin Pierce •  Christine Rizkallah •  Viktor Vafeiadis •  Li-yao Xia •  Yannick Zakowski •  Jianzhou Zhao

Page 3: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

3

deepspec.org

Page 4: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Vellvm: Verified LLVM IR

4

•  Compiler intermediate representation semantics •  Parameterized by the memory model •  github.com/vellvm/vellvm

Page 5: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

LLVM Compiler Infrastructure

optimizations/ transformations

typed SSA IR

analysis

[Lattner et al.]

5

LLVM

front ends

code gen/jit

Page 6: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

The Vellvm Project

optimizations/ transformations

typed SSA IR

analysis

•  Formal semantics •  Tools for creating

simulation proofs •  Coq implementation •  Extraction of passes for

use with LLVM compiler •  Applications:

–  verified memory safety instrumentation

–  validated optimizations

[Zhao et al. POPL 2012, CPP 2012, PLDI 2013, Mansky et al. CAV2015]

6

Page 7: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Vellvm Framework

transform C Source

Code other

optimizations LLVM

IR target

LLVM

Coq syntax

operational semantics

memory models

type system and SSA

proof techniques & metatheory

7

LLVM IR

OCaml bindings printer parser

extract

Page 8: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Vellvm Framework

C Source Code

other optimizations

LLVM IR target

LLVM OCaml bindings

printer parser

Coq syntax

operational semantics

memory models

type system and SSA

proof techniques & metatheory

8

LLVM IR

Verified Transform

extract

Page 9: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

LLVM IR Semantics

SSA ≈ functional program

+ •  Undefined values / poison •  Effects

–  structured heap load/store –  system calls (I/O)

•  Types & Memory Layout –  structured, recursive types –  type-directed projection –  type casts

[Kelsey 1995 / Appel 1998] We know how to model this and prove properties about the models.

We are starting to figure out how to decompose them modularly.

Page 10: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Modular Semantics •  Factor out memory model [CAV 15]

–  similar to linking/separate compilation •  For:

–  concurrency –  more extensibility/robustness to changes –  better support for casts [PLDI 15]

LLVM SSA core IR

Memory Model / IO Concurrency

Page 11: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

DeepSpec Integration Experiments

11

DeepWeb (work in progress)

•  (Eventual) Goal: web server implemented in C •  Running on top of CertiKOS •  Verified using VST •  Intermediate steps checked by QuickChick

Page 12: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

12

C-level APIs hardware-level APIs

network effects memory models

file systems

•  Coq descriptions of many different systems •  Need a common way of describing their behaviors

–  various levels of abstraction –  different interfaces –  modularity / extensibility

Page 13: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

1.  Explain Interaction Trees

2.  Describe "toy" Vellvm

3.  Come back to DeepSpec

13

Page 14: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Interaction Trees

Coq adaptation of Freer Monads, More Extensible Effects [Kiselyov & Ishii, 2015]

(see also: algebraic effects)

14

Page 15: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

CoInductive M (Event : Type -> Type) X := | Ret (x:X) | Vis {Y: Type} (e : Event Y) (k : Y -> M Event X) | Tau (k: M Event X) .

15

Page 16: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

CoInductive M (Event : Type -> Type) X := | Ret (x:X) | Vis {Y: Type} (e : Event Y) (k : Y -> M Event X) | Tau (k: M Event X) .

16

(potentially) inifinite structure

Page 17: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

CoInductive M (Event : Type -> Type) X := | Ret (x:X) | Vis {Y: Type} (e : Event Y) (k : Y -> M Event X) | Tau (k: M Event X) .

17

named "M" (for "monad")

Page 18: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

CoInductive M (Event : Type -> Type) X := | Ret (x:X) | Vis {Y: Type} (e : Event Y) (k : Y -> M Event X) | Tau (k: M Event X) .

18

parameterized by the type of observable events

Page 19: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

CoInductive M (Event : Type -> Type) X := | Ret (x:X) | Vis {Y: Type} (e : Event Y) (k : Y -> M Event X) | Tau (k: M Event X) .

19

yielding a value of type X

Page 20: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

CoInductive M (Event : Type -> Type) X := | Ret (x:X) | Vis {Y: Type} (e : Event Y) (k : Y -> M Event X) | Tau (k: M Event X) .

20

yield a result (return of the monad)

Page 21: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

CoInductive M (Event : Type -> Type) X := | Ret (x:X) | Vis {Y: Type} (e : Event Y) (k : Y -> M Event X) | Tau (k: M Event X) .

21

"visible" effect e interacts with environment to get a value of type Y k – the continuation that accepts the response

Page 22: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

CoInductive M (Event : Type -> Type) X := | Ret (x:X) | Vis {Y: Type} (e : Event Y) (k : Y -> M Event X) | Tau (k: M Event X) .

22

internal, hidden step of computation

Page 23: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

CoInductive M (Event : Type -> Type) X := | Ret (x:X) | Vis {Y: Type} (e : Event Y) (k : Y -> M Event X) | Tau (k: M Event X) | Err (s:string) .

23

error / aborted computation (needed only for convenience)

Page 24: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

24

τ τ τ

τ τ τ τ τ τ

τ τ τ τ e2

k2 0

k2 1

k2 3 e1

k1 a

k1 b

k1 c

k1 d

k1 e

τ "error"

τ τ τ …

τ e3 k3 () τ e3 k3 () τ e3 k3 () …

τ τ 42 τ τ 17

τ 11

τ τ τ 0

Page 25: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Good Qualities of Interaction Trees •  (M E) is a monad

–  bind is defined coinductively

•  Behavioral Equivalences –  strong bisimulation –  up to Tau (insert a finite no. of Tau's anywhere) –  not too hard to define new simulation relations

•  Extractable from Coq –  yields a way of (externally) running computations

described by interaction trees –  interpretation of events can be

defined in the metalanguage (e.g. OCaml)

25

Page 26: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Applications •  Vellvm Semantics – control-flow graphs, LLVM memory model

•  DeepWeb – web server events (HTTP get/put)

•  Verifiable Software Toolchain – socket API

26

Page 27: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

1.  Explain Interaction Trees

2.  Describe "toy" Vellvm

3.  Come back to DeepSpec

27

Page 28: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

LLMV IR Memory Model

28

(* IO interactions for the LLVM IR *) Inductive IO : Type -> Type := | Alloca : ∀ (t:dtyp), (IO dvalue) | Load : ∀ (t:dtyp) (a:dvalue), (IO dvalue) | Store : ∀ (a:dvalue) (v:dvalue), (IO unit) | GEP : ∀ (t:dtyp) (v:dvalue) (vs:list dvalue), (IO dvalue) | ItoP : ∀ (i:dvalue), (IO dvalue) | PtoI : ∀ (a:dvalue), (IO dvalue) | Call : ∀ (f:string) (args:list dvalue), (IO dvalue) .

output values of the Call event

type of the result provided by the

environment

Page 29: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Memory Interaction Interface

29

Module Mem. Definition addr := nat. Inductive val := | N (n:nat) | A (a:addr) . Inductive IO : Type -> Type := | Alloca : IO addr | Load : ∀ (a:addr), (IO val) | Store : ∀ (a:addr) (v:val), (IO unit) | Call : ∀ (f:funid) (v:val), (IO val) . Definition Interactions (X:Type) := M IO X. Global Instance functor_IO : Functor Interactions := (@mapM IO). Global Instance monad_IO : (@Monad Interactions) (@mapM IO) := …. Global Instance equiv_ixns {X} : (@Equiv (Interactions X)) := EquivUpToTau eq. End Mem.

Operations (simplified)

Page 30: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Memory Interaction Interface

30

Module Mem. Definition addr := nat. Inductive val := | N (n:nat) | A (a:addr) . Inductive IO : Type -> Type := | Alloca : IO addr | Load : ∀ (a:addr), (IO val) | Store : ∀ (a:addr) (v:val), (IO unit) | Call : ∀ (f:funid) (v:val), (IO val) . Definition Interactions (X:Type) := M IO X. Global Instance functor_IO : Functor Interactions := (@mapM IO). Global Instance monad_IO : (@Monad Interactions) (@mapM IO) := …. Global Instance equiv_ixns {X} : (@Equiv (Interactions X)) := EquivUpToTau eq. End Mem.

instantiate the monad

Page 31: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

External Interface

31

Module Ext. Inductive IO : Type -> Type := | Call : ∀ (f:funid) (v:Mem.val), (IO Mem.val) . Definition Interactions (X:Type) := M IO X. Global Instance functor_IO : Functor Interactions := (@mapM IO). Global Instance monad_IO : (@Monad Interactions) (@mapM IO) := …. Global Instance equiv_ixns {X} : (@Equiv (Interactions X)) := EquivUpToTau eq. End Ext.

Interface contains only the external call

Page 32: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Memory Model

32

Definition MemoryModel := ∀ X, Mem.Interactions X -> Ext.Interactions X.

Transduces memory interactions to external traces.

Page 33: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

MM Implementation

33

Definition memory := list Mem.val. Definition load (m:memory) (a:nat) := nth_default (Mem.N 0) m a. Definition store (m:memory) := replace m. (* Effects handlers *) Definition mem_step X (io : Mem.IO X) m : (memory * X) + (Ext.IO X) := match io with | Mem.Alloca => inl (m ++ [Mem.N 0], (length m)) | Mem.Load a => inl (m, load m a) | Mem.Store a v => inl (store m a v, ()) | Mem.Call f v => inr (Ext.Call f v) end.

Page 34: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

34

(* Map the handler over the trace *) Definition mem (m:memory) : MemoryModel := fun X t => (cofix go (m:memory) t := match t with | Tau d' => Tau (go m d') | Vis io k => match mem_step io m with | inl (m', v) => Tau (go m' (k v)) | inr eo => Vis eo (fun v => go m (k v)) end | Ret x => Ret x | Err x => Err x end) m t.

interpret the effects

Page 35: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Metatheory

35

Lemma mem_eutt : ∀ X (s t : Mem.Interactions X), s ≡ t -> ∀ m, (mem m s) ≡ (mem m t). Lemma mem_swap : ∀ {X} m a b v1 v2 (k1 k2 : () -> Mem.Interactions X), a <> b -> (k1 ()) ≡ (k2 ()) -> (mem m (Vis (Mem.Store a v1) (fun _ => Vis (Mem.Store b v2) k1))) ≡ (mem m (Vis (Mem.Store b v2) (fun _ => Vis (Mem.Store a v1) k2))). Lemma mem_load2 : ∀ {X} m a (k : Mem.val -> Mem.val -> Mem.Interactions X), (mem m (Vis (Mem.Load a) (fun v => Vis (Mem.Load a) (fun w => k v w))))

≡ (mem m (Vis (Mem.Load a) (fun w => k w w))).

Page 36: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Programming Language / IR

36

Inductive cmd := | LET (v:var) (e:exp) | ALLOCA (x:var) | LOAD (x:var) (e:exp) | STORE (e1:exp) (e2:exp) | CBR (e:exp) (b1 b2:list cmd) | … .

Define IR language using standard techniques… (mostly omitted here)

Page 37: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Operational Semantics

37

Definition interpret (e:env) (code:list cmd) : Mem.Interactions unit := (cofix go (e:env) (code:list cmd) : Mem.Interactions unit := match code with | [] => Ret () | LET x exp :: k => do v <- eval e exp; Tau (go (add e x v) k) | LOAD x exp :: k => do a <- eval e exp; match a with | Mem.A a => Vis (Mem.Load a) (fun v => go (add e x v) k) | _ => raise "loaded non-address" end …

Interpret the program in the interaction tree monad

Page 38: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Operational Semantics

38

Definition interpret (e:env) (code:list cmd) : Mem.Interactions unit := (cofix go (e:env) (code:list cmd) : Mem.Interactions unit := match code with | [] => Ret () | LET x exp :: k => do v <- eval e exp; Tau (go (add e x v) k) | LOAD x exp :: k => do a <- eval e exp; match a with | Mem.A a => Vis (Mem.Load a) (fun v => go (add e x v) k) | _ => raise "loaded non-address" end …

Use Tau to "hide" internal steps

Page 39: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Operational Semantics

39

Definition interpret (e:env) (code:list cmd) : Mem.Interactions unit := (cofix go (e:env) (code:list cmd) : Mem.Interactions unit := match code with | [] => Ret () | LET x exp :: k => do v <- eval e exp; Tau (go (add e x v) k) | LOAD x exp :: k => do a <- eval e exp; match a with | Mem.A a => Vis (Mem.Load a) (fun v => go (add e x v) k) | _ => raise "loaded non-address" end …

Expose other effects like "load" using Vis.

Page 40: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

40

Lemma load_elim : ∀ x y ex m e code, not_used_in x ex -> (mem m (interpret e ([LOAD x ex; LOAD y ex]++code))) ≡ (mem m (interpret e ([LOAD x ex; LET y (Var x)]++code))).

In this (sequential) memory model, two loads of the same address can be replaced by one load and a "let".

Page 41: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

41

(demo)

Page 42: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

1.  Explain Interaction Trees

2.  Describe "toy" Vellvm

3.  Come back to DeepSpec

42

Page 43: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Network IO

43

(* IO interactions for sockets *) Inductive networkE : Type -> Type := | Listen : endpoint_id -> networkE unit | Accept : endpoint_id -> networkE connection_id | ConnectTo : endpoint_id -> networkE connection_id | CloseConn : connection_id -> networkE unit | Recv : connection_id -> positive -> networkE (option string) | Send : connection_id -> string -> networkE unit .

Page 44: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

OS-level API

44

(* OS-level refinement of Network-level Spec *) Inductive SocketAPI1 : Type -> Type := | Socket_Socket (domain : Z) (type : Z) (protocol : Z) : SocketAPI1 (SocketError + sockfd) | Socket_Close (fd : sockfd): SocketAPI1 (SocketError + unit) | Socket_BindAndListen (fd : sockfd) : SocketAPI1 (SocketError + unit) | Socket_Accept (fd : sockfd) : SocketAPI1 (SocketError + sockfd) | Socket_Recv (fd : sockfd) (num_bytes : Z): SocketAPI1 (SocketError + string) | Socket_Send (fd : sockfd) (msg : string):

SocketAPI1 (SocketError + unit) .

Page 45: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Fancier IO Specs Combinators at the Event level

–  to "mix and match" behaviors –  made palatable via typeclasses

45

(* Example: combine nondeterminism, failure, Sockets *) Definition SocketM (T : Type) := (nondetE +' failureE +' SocketAPI.SocketAPI1) T.

Page 46: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Uses •  Writing effectful programs in Coq •  Giving specifications by "zipping"

–  relating a "client" to a "server" –  for testing / proving

•  Transducers: change levels of abstraction –  e.g. from "high-level" LLVM memory model to

"low-level" machine model

46

Page 47: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Technical Challenges •  Coinduction in Coq

–  syntactic productivity constraints are a pain –  Gil Hur's paco library helps (somewhat)

•  Proofs of some basic facts surprisingly tricky to prove –  e.g. congruence of bind up to Tau –  several possible ways to define EquivUpToTau

47

Page 48: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Work in Progress •  Library & automation support for Interaction Trees

•  Verfied Software Toolchain –  Interaction trees as "ghost state" in separation logic –  connection to CompCert

•  Vellvm proofs about more complex memory models –  int2ptr / ptr2int

•  Using the new Vellvm semantics for interesting optimizations

48

Page 49: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Interaction Trees promising abstraction for

Coq formalization of interactive behaviors

49

deepspec.org github.com/vellvm/vellvm

50

τ τ τ

τ τ τ τ τ τ

τ τ τ τ e2

k2 0

k2 1

k2 3 e1

k1 a

k1 b

k1 c

k1 d

k1 e

τ "error"

τ τ τ …

τ e3 k3 () τ e3 k3 () τ e3 k3 () …

τ τ 42 τ τ 17

τ 11

τ τ τ 0

Page 50: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

50

Page 51: Modular Semantics for LLVM IR · • Effects – structured heap load/store – system calls (I/O) • Types & Memory Layout – structured, recursive types – type-directed projection

Definition bind_body {E X Y} (s : M E X) (go : M E X -> M E Y) (t : X -> M E Y) : M E Y := match s with | Ret x => t x | Vis e k => Vis e (fun y => go (k y)) | Tau k => Tau (go k) | Err s => Err s end. Definition bindM {E X Y} (s: M E X) (t: X -> M E Y) : M E Y := (cofix go (s : M E X) := bind_body s go t) s. 51