A Dependently Typed Assembly Language Robert Harper (Joint work with Robert Harper)

38
A Dependently Typed Assembly Language (Joint work with Robert Robert Harper Harper)
  • date post

    21-Dec-2015
  • Category

    Documents

  • view

    221
  • download

    1

Transcript of A Dependently Typed Assembly Language Robert Harper (Joint work with Robert Harper)

A Dependently Typed Assembly Language

(Joint work with Robert HarperRobert Harper)

The general motivation

Q: Why do we want to type low level

languages?

A: We want to reap the benefits of type systems at low levels.

Advantages of type systems

•Capturing program errors at compile-time (well-known)

•Enabling aggressive compiler optimizations (recent)

•Supporting sophisticated module systems (SML)

•Facilitating program verification (NuPRL, Coq, PVS)

•Serving as program documentation

The goal of this work

The goal is to capture memory safety of assembly code through a type system

Memory Safety = Type Safety + Safe Array Access

Array bounds checking problem

Array bounds checking refers to determining whether the value of an expression is within the bounds of an array when the expression is used to index the array.

Byte copy: A version in C

voidbcopy(int src[], int dst[]) { int i;

if (length(src) != length(dst) { printf “bcopy: unequal lengths\n”; exit(1); } for(i=1; i < length(src), i++) dst[ii] = src[i];}

Dynamic array bounds checking

• is required for safe languages such as Java, Modula-3, ML, Pascal

• can be expensive in practice (e.g. numerical computation)

• bounds violation is a rich source of program errors in unsafe languages such as C, C++ (e.g. off-by-one error)

Some experimental results

0102030405060

bco

py

bin

ary

sear

ch

bub

ble

sor

t

mat

rix m

ult

quee

n

quic

k so

rt

han

oi t

ower

w/ checks

w/o checks

Static array bounds checking

• Flow Analysis– no annotations required (fully

automatic)– limited to simple cases and sensitive

to program structures– little or no feedback for detecting

program errors

Static array bounds checking

• Type-based approaches– ML type system is too coarse– full dependent types is too fine– dependent ML provides an intermediate

type system with practical type-checking• it is adequate for array bounds checking

elimination• the programmer must write some type

annotations

What are dependent types?

Dependent types depend on the values of language expressions.

For instance, type : dependent type array : array(x) int : int(x) stack : stack(x)

Byte copy: A version in de Caml

let bcopy src dst = begin for i = 0 to vect_length(src) - 1 do dst..(i) <- src..(i) doneendwithtype {n:nat} int vect(n) -> int vect(n) -> unit

Array bounds checking in mobile code

• It needs to be enforced for safety concerns

• It is difficult to eliminate since the machine which executes the code may not trust the source of the code

• It is time-consuming to be compiled away

Some key applications of DTAL

• Compiler verification• Mobile code security• Mobile code efficiency

Increment: A flow chart

i:int

l:label

sp: pop r1

l:labelsp:

r1 = i add r1, r1, 1

r1 = i + 1

sp:

l:labelpop r2 r1 = i + 1

r2 = l

i+1:intsp:

r2 = l

push r1

jmp r2continue ...

Increment: An assembly version

inc: pop r1 add r1, r1, 1 pop r2 push r1 jmp r2

State types

A state type corresponds to code continuation. It records the type information about register file and stack.

For instance,• [r1: int(i), r2: int array(i)]• (‘a)[r1: ‘a, r2: [r1: ‘a]]• (‘a,‘b)[r1: ‘a, r2: ‘b, r3: [r1: ‘a, r2: ‘b]]• {n:nat}[sp: [sp: stack(n)] :: stack(n)]

Register file

We use an array representation for register file.

r1: tau1

r2: tau2

rn: taun

Stack

We use a list representation for stack.

i1 tau1 i2 tau2

in taun

i tau

Increment: A version in DTAL

inc:{i:int}{n:nat} [sp: int(i):: [sp: int(i+1)::stack(n)]::stack(n)]

pop r1 // r1: int(i) add r1, r1, 1 // r1: int(i+1) pop r2 // r2: [sp:int(i+1)::stack(n)] push r1 // sp: int(i+1)::stack(n) jmp r2

Type index objects

• index i,j ::= a | c | i+j | i-j | i*j | i/j

• index prop P ::= false | true | i<j | i<=j | i=j| i>=j| i>j | not P | P1 and P2 | P1 or P2

• index sort gamma ::= int | {a: gamma | P} For instance, nat is a shorthand for {a: int | a >= 0}

Types

• types tau ::= alpha | sigma | int(x) | tau array(x) | stack(x) | prod(tau1,…,taun) | {a:gamma}.tau

• state types sigma ::=[(alpha1,…,alphan){a1:gamma1,…,an:gamman}.state]

• state state ::= (register file,stack)

Note: int is for {a:int}.int(a) nat is for {a:nat}.int(a)

Instructions in DTAL

• instructions ins ::= aop rd, rs, v | bop r, v | jmp v | load rd, rs(v) | store r |

newtuple[tau] r | newarray[tau] r | mov r, v | pop r | push r

• values v ::= () | i | l | r

• instruction sequences I ::= halt | ins; I | l; I

Programs in DTAL

• label maps Lambda ::= (l1: sigma1, …, ln: sigman)• programs ::= (Lambda, I)

Memory allocation

• Tuple of type prod(tau1,…,taun)

• Array of type tau array(n)

tau1

tau2

taun-1

taun

tautau

tautau

Memory allocation: an example

A tuple of type prod(int, prod(int, int)):

intintint

Array types are non-variant

tau1 <= tau2 does not implies tau1 array(n) <= tau2 array(n)

Here is a counterexample: r1: nat array(2) r2: int array(2)

00

r1 = = r2

State types are contra-variant

state state’ implies

[state] <= [state’]

Typing unconditional jumps

state V : [state’]

state state’

I

state jmp v; I

Typing conditional jumps

r = 0?

assumption;state

assumption;state

beq r,v; I

r:int(x)

assumption;state

v:[state’]

assumption,x!=0;state I

assumption,x=0;state state’

Byte copy: A flow chart

r4 <- 0

loop

bcopy

r5 <- r1 - r4

r5 > 0? finish

r5 <- r2(r4)r3(r4) <- r5

r4 <- r4 + 1

r1: array size r2: src r3: dst

Byte copy: A version in DTAL

bcopy:{i:nat} [r1: int(i), r2: int array(i), r3: int array(i)] mov r4, 0 // r4 <- 0 jmp loop // start loop

Byte copy: a version in DTAL

loop: {i:nat, k:nat} [r1: int(i), r2: int array(i) r3: int array(i), r4: int(k)] sub r5, r1, r4 // r5 = r1 - r4 blte r5, finish // r5 <= 0 ? load r5, r2(r4) // safe load store r3(r4), r5 // safe store add r4, r4, 1 // r4 <- r4 + 1 jmp loop // loop againfinish:[] halt

Operational semantics of DTAL

We use a standard abstract machine for assigning operational semantics to DTAL programs. The machine consists of three finite maps:

(Heap, Register File, Stack)

Soundness

• The execution of a DTAL program can either– terminate normally, or– run forever, or– stall.

• A well-typed DTAL program can never stall.

Related work

Here is a (partial) list of some closely related work.– Dependent types in practical

programming (Xi & Pfenning)– TALC Compiler (Morrisett et al at Cornell)– Safe C compiler (Necula & Lee)– TIL compiler (the Fox project at CMU)

Current status & Future work

• We have finished the following.– Theoretical development of DTAL– A prototype implementation of a type-

checker for DTAL

• We are working on the following.– Designing a dependent type system

of JVML (de JVML)– Compiling (a subset of Java) into de

JVML

Conclusion

We have demonstrated some uses of dependent types at assembly level.– It can help compiler debugging and

verification– It can certify the memory safety

property of mobile code– It can lead to safer programs without

compromising efficiency