Advanced Topics: Reasoning About Code Pointers & Self-Modifying Code
Zhong Shao, Yale University
July 25, 2012

Page 1: Advanced Topics: Reasoning About Code Pointers & Self ...

Advanced Topics: Reasoning About Code Pointers & Self-Modifying Code

Zhong Shao, Yale University

July 25, 2012

Page 2: Advanced Topics: Reasoning About Code Pointers & Self ...

Building fully certified systems

[Figure: logics L1, L2, L3, L4]

• One logic for all code
  – Consider all possible interactions.
  – Very difficult!

• Reality
  – Only limited combinations of features are used.
  – It’s simpler to use a specialized logic for each combination.
  – Interoperability between logics

Page 3: Advanced Topics: Reasoning About Code Pointers & Self ...

Our solution

[Figure: OCAP. Labels: logics L1 … Ln, code modules C1 … Cn, OS, modeling of the machine, mechanized meta-logic (CiC), TCB]

Page 4: Advanced Topics: Reasoning About Code Pointers & Self ...

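A toy machine

(program)        P ::= (C, S, pc)
(code heap)      C ::= {f ⇝ I}*
(state)          S ::= (H, R)
(data heap)      H : cells at addresses 0, 1, 2, …
(register file)  R : registers r1, r2, r3, …, rn
(instr. seq.)    I : addu …; lw …; sw …; …; j f

[Figure: code heap C with instruction sequences I1, I2, I3 at labels f1, f2, f3; data heap H; register file R; pc]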
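The definitions on this slide can be read as plain data structures. The following is a minimal sketch in C, assuming a flattened code heap (an array indexed by pc instead of a map from labels to instruction sequences) and a four-instruction subset. The sizes, the `Instr` encoding, and the names `step` and `reg_ok` are illustrative assumptions, not part of the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes and encodings are illustrative assumptions, not from the slides. */
enum { NUM_REGS = 8, HEAP_WORDS = 16 };

typedef enum { ADDU, LW, SW, J } Opcode;

typedef struct {
    Opcode op;
    int rd, rs, rt;  /* register indices */
    int imm;         /* heap offset for lw/sw, code index for j */
} Instr;

typedef struct {
    uint32_t H[HEAP_WORDS];  /* data heap H */
    uint32_t R[NUM_REGS];    /* register file R */
} State;                     /* state S ::= (H, R) */

typedef struct {
    const Instr *C;  /* code heap C: never written by step() */
    int code_len;
    State S;
    int pc;          /* index of the next instruction in C */
} Program;           /* program P ::= (C, S, pc) */

static int reg_ok(int r) { return r >= 0 && r < NUM_REGS; }

/* Take one step; return 1 on success, 0 if the machine is stuck. */
int step(Program *p) {
    assert(p != 0 && p->C != 0);
    if (p->pc < 0 || p->pc >= p->code_len) return 0;
    const Instr *i = &p->C[p->pc];
    State *s = &p->S;
    if (!reg_ok(i->rd) || !reg_ok(i->rs) || !reg_ok(i->rt)) return 0;
    switch (i->op) {
    case ADDU:  /* rd := rs + rt */
        s->R[i->rd] = s->R[i->rs] + s->R[i->rt];
        p->pc++;
        return 1;
    case LW: {  /* rd := H[rs + imm] */
        uint32_t addr = s->R[i->rs] + (uint32_t)i->imm;
        if (addr >= HEAP_WORDS) return 0;
        s->R[i->rd] = s->H[addr];
        p->pc++;
        return 1;
    }
    case SW: {  /* H[rs + imm] := rt */
        uint32_t addr = s->R[i->rs] + (uint32_t)i->imm;
        if (addr >= HEAP_WORDS) return 0;
        s->H[addr] = s->R[i->rt];
        p->pc++;
        return 1;
    }
    case J:     /* direct jump to a code index */
        p->pc = i->imm;
        return 1;
    }
    return 0;
}
```

Because `C` is `const` and `step` only touches `S` and `pc`, this model bakes in the fixed-code-heap assumption that the later self-modifying-code slides set out to remove.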

Page 5: Advanced Topics: Reasoning About Code Pointers & Self ...

Program specifications

[Figure: the toy machine of the previous slide (code heap C, data heap H, register file R, pc), with the instruction sequences at f1, f2, f3 annotated by specifications 1, 2, 3]

(spec) Ψ ::= {f ⇝ a}*

Page 6: Advanced Topics: Reasoning About Code Pointers & Self ...

Invariant-based verification

Initial condition: Inv(P0)

P0 --c1--> P1 --c2--> P2 --c3--> … --cn--> Pn

Progress: if Inv(P), then ∃P’. P --c--> P’.

Preservation: if Inv(P) and P --c--> P’, then Inv(P’).
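The three conditions combine into a safety statement by induction on the number of steps. The block below is a sketch of that argument; the notation (an unlabeled arrow for one step and a superscript n for n steps) is assumed here and does not appear on the slide.

```latex
% Assumed notation: P \to P' is one machine step, P \to^{n} P' is n steps.
\begin{align*}
\text{Initial condition:}\quad & \mathit{Inv}(P_0) \\
\text{Progress:}\quad & \forall P.\ \mathit{Inv}(P) \Rightarrow \exists P'.\ P \to P' \\
\text{Preservation:}\quad & \forall P, P'.\ \mathit{Inv}(P) \wedge P \to P' \Rightarrow \mathit{Inv}(P') \\[4pt]
\text{Safety:}\quad & \forall n.\ \exists P_n.\ P_0 \to^{n} P_n \wedge \mathit{Inv}(P_n)
\end{align*}
% Proof sketch, by induction on n:
%   n = 0:   take P_n = P_0 and use the initial condition.
%   n + 1:   the induction hypothesis gives P_n with Inv(P_n);
%            progress gives some P_{n+1} with P_n \to P_{n+1};
%            preservation gives Inv(P_{n+1}).
```

Since every reachable program satisfies Inv and progress applies to it, the machine never gets stuck.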

Presenter
Presentation Notes
Invariants are hard to find
Page 7: Advanced Topics: Reasoning About Code Pointers & Self ...

[Figure: code modules C1 … Cn certified in “domain-specific” logics L1 … Ln, over the OCAP rules, the modeling of the machine, and the mechanized meta-logic (CiC); callouts: “may use different” and “How to link modules?”]

Page 8: Advanced Topics: Reasoning About Code Pointers & Self ...

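The OCAP framework [TLDI'07]

[Figure: code modules C1 … Cn certified in logics L1 … Ln (XCAP, SCAP, TAL, …, AIM); interpretations ( )L1 … ( )Ln, each marked Sound, connect the logics to the OCAP rules; OCAP soundness rests on the modeling of the machine in the mechanized meta-logic (CiC)]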

Page 9: Advanced Topics: Reasoning About Code Pointers & Self ...

Verification of low-level code

• Motivation
  – everything eventually runs as machine-level binaries
  – some code must be written at the low level (e.g., context switch, interaction w. devices)
  – most compilers cannot be trusted
  – some high-level language features are not well-understood (very complex semantic models)

• Challenges
  – arbitrary control-flow, aliasing, stored programs
  – no type system (any code is fine)

• Objectives
  – certifying both user- and system-level code
  – modular specification & certification is crucial
  – embedding of high-level systems (to reuse high-level proofs)

Page 10: Advanced Topics: Reasoning About Code Pointers & Self ...

A toy target machine TM

Presenter
Presentation Notes
Since we are working at the assembly language level, we first define a simple target machine.
Page 11: Advanced Topics: Reasoning About Code Pointers & Self ...

Syntax of TM

Page 12: Advanced Topics: Reasoning About Code Pointers & Self ...

Operational semantics of TM

Page 13: Advanced Topics: Reasoning About Code Pointers & Self ...

Certified Assembly Programming (CAP) [Yu03, Hamid04, Yu04, Feng05]

• Hoare logic in CPS

• Mechanized in a proof assistant (Coq) with a very rich meta logic plus inductive definitions

• The meta logic also serves as the assertion language!!

example:

• Entailment between two assertions (P |= Q) is “semantic”, i.e., just implication in the meta logic

• Program language syntax, semantics, and correctness theorem are all represented and reasoned about using the same meta logic

• Like Lamport’s TLA except our meta logic is mechanized

• Hoare-style assertions & inference rules enforce both the correctness & type safety properties

• No need for a separate type system; not a “refinement”

Presenter
Presentation Notes
Certified Assembly Programming, or CAP, is a Hoare-logic-based PCC framework. In CAP, programs are written and specified in CPS, so the postcondition of the usual Hoare triple is removed; it is instead specified through the continuation in the precondition.
Page 14: Advanced Topics: Reasoning About Code Pointers & Self ...

CAP inference rules for instructions

Page 15: Advanced Topics: Reasoning About Code Pointers & Self ...

CAP inference rules (cont’d)

Page 16: Advanced Topics: Reasoning About Code Pointers & Self ...

How CAP works

Can be used to prove simple safety and partial correctness properties.

Page 17: Advanced Topics: Reasoning About Code Pointers & Self ...

ECP problem w. Hoare logic

• Embedded code pointers (ECP)
  Examples: computed GOTOs, higher-order functions, indirect jumps, continuations, return addresses

• Previous approaches
  – Ignore ECP [Necula98, Yu04]
  – Limit ECP specifications to types [Hamid04]
  – Sacrifice modularity [Yu03]
  – Use complex indexed semantic models [Appel01]

Presenter
Presentation Notes
By embedded code pointers (ECP), we mean things like ……. Reynolds, in his LICS02 talk, described ECP as an open problem for general Hoare logic. The previous approaches to the ECP problem for Hoare logic are not entirely satisfactory.
Page 18: Advanced Topics: Reasoning About Code Pointers & Self ...

User-level code: list append
Adapted from [Reynolds02]

[Figure: list cells labeled 1, 2, …, n-2, n-1]

Page 20: Advanced Topics: Reasoning About Code Pointers & Self ...

User-level code: list append
Adapted from [Reynolds02]

Feature                                                                  Type-based   Logic-based
Inductive definitions (correctness of list append)                           -            +
Strong update (separation logic) (allocation, de-allocation, mutation)       -            +
Embedded code pointers (continuation)                                        +            -
Impredicative polymorphisms (closure)                                        +            -

Presenter
Presentation Notes
We want to certify the append function in PCC. Proving the correctness of the list append operation needs inductive definitions, which require a very complex type system. The allocation, de-allocation, and memory mutation needed to handle closures and lists require a strong-update model like that of separation logic, which is difficult to support with types. On the other hand, embedded code pointers and impredicative polymorphism, which are required to describe continuations and closures, are difficult to support in logic-based PCC.
Page 21: Advanced Topics: Reasoning About Code Pointers & Self ...

The ECP problem

cptr(f, a) = ?

Page 22: Advanced Topics: Reasoning About Code Pointers & Self ...

Previous approach

• Internalize Hoare-derivation for ECP
  Circularity!

• Stratification [OHearn97, Naumann01]
  – Works for simple case
  – Hard for assembly
  – Hard for polymorphism

• Step-Indexing [Appel01, Appel02, Schneck03]
  – Works for polymorphism
  – Heavyweight
  – Not standard Hoare logic

Page 23: Advanced Topics: Reasoning About Code Pointers & Self ...

CAP’s approach

• Specify ECP by checking against code spec

• Verify all code specs are indeed valid

• Modularity problem

Page 24: Advanced Topics: Reasoning About Code Pointers & Self ...

The XCAP approach

• Specify ECP independent of code spec

• Check ECP against global code spec

• Verify global code spec is indeed valid
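One way to read the contrast between the two approaches is sketched below. The notation is assumed for illustration and is not taken from these slides: Ψ stands for the code specification, and the double brackets for an interpretation of extended propositions indexed by Ψ.

```latex
% CAP-style (sketch): the assertion mentions the code spec \Psi directly,
% so \Psi must already be fixed when the assertion is written.
% This is the source of the modularity problem.
\[
  \mathsf{cptr}(f, a) \;\triangleq\; \Psi(f) = a
\]

% XCAP-style (sketch): cptr(f, a) is kept as syntax in the assertion,
% independent of any code spec. Its meaning is supplied later by an
% interpretation against the global code spec \Psi.
\[
  [\![\, \mathsf{cptr}(f, a) \,]\!]_{\Psi}
  \;\triangleq\; f \in \mathrm{dom}(\Psi) \,\wedge\, \Psi(f) = a
\]
```

Under this reading, a module can state cptr(f, a) without knowing Ψ, and the check against Ψ is deferred until the global code spec is available.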

Page 25: Advanced Topics: Reasoning About Code Pointers & Self ...

Extended propositions

Page 26: Advanced Topics: Reasoning About Code Pointers & Self ...

XCAP rules

Page 27: Advanced Topics: Reasoning About Code Pointers & Self ...

How XCAP supports ECP

[Figure: derivation using rules (SEQ), (ECP), (JMP), (JD)]

Page 28: Advanced Topics: Reasoning About Code Pointers & Self ...

Verification of append()

Page 29: Advanced Topics: Reasoning About Code Pointers & Self ...

Impredicative polymorphisms

• Important for ECP

• Naïve interpretation function fails

Page 30: Advanced Topics: Reasoning About Code Pointers & Self ...

New interpretation [POPL’06]

Soundness of interpretation

Interpretation

Consistency

Page 31: Advanced Topics: Reasoning About Code Pointers & Self ...

Soundness of XCAP

Page 32: Advanced Topics: Reasoning About Code Pointers & Self ...

Case study: context switch

swapcontext:

• Runs thousands of times per second
• Used by assembly, C, MSIL, JVML, etc.
• Basis of multi-tasking, OS, and software
• Safety and correctness taken for granted

Page 33: Advanced Topics: Reasoning About Code Pointers & Self ...

Context switch on x86 (cont’d)

swapcontext:
    ; store old context
    mov eax, [esp+4]
    mov [eax+0], OK
    mov [eax+4], ebx
    mov [eax+8], ecx
    mov [eax+12], edx
    mov [eax+16], esi
    mov [eax+20], edi
    mov [eax+24], ebp
    mov [eax+28], esp

    ; load new context
    mov eax, [esp+8]
    mov esp, [eax+28]
    mov ebp, [eax+24]
    mov edi, [eax+20]
    mov esi, [eax+16]
    mov edx, [eax+12]
    mov ecx, [eax+8]
    mov ebx, [eax+4]
    mov eax, [eax+0]
    ret

Page 34: Advanced Topics: Reasoning About Code Pointers & Self ...

Context switch (cont’d)

[Figure: call swapcontext. Labels: registers eax, ebx, ecx, edx, esi, edi, ebp, esp; old context with values a1 … a8 and OK; new context with values b1 … b8; return addresses retp and retp’]

Page 35: Advanced Topics: Reasoning About Code Pointers & Self ...

Context switch (cont’d)

swapcontext:

• Simple code, complex reasoning!
  – stack / heap / memory mutation
  – procedure call / first-class code pointer
  – protection / polymorphism

• Lack specification and verification that are
  – formal (machine checkable in sound logic)
  – general (allows all possible usage of context)
  – realistic (usable from assembly and C level)

Page 36: Advanced Topics: Reasoning About Code Pointers & Self ...

Buggy context code today

Page 37: Advanced Topics: Reasoning About Code Pointers & Self ...

Certifying context-switch [Ni et al TPHOLs 2007]

• The first to verify machine-context code
  – realistic, no rewriting, no performance penalty

• Based on realistic hardware
  – variable length instruction decoding, finite word, etc.

• Uses language-based techniques
  – modular specification and proof

• Fully mechanized
  – code, machine, meta theory, specification, proof

Page 38: Advanced Topics: Reasoning About Code Pointers & Self ...

The code

typedef struct mctx_st *mctx_t;
struct mctx_st { int eax, ebx, ecx, edx, esi, edi, ebp, esp; };

void swapcontext (mctx_t old, mctx_t new);

void loadcontext (mctx_t mctx);

    mov eax, [esp+8]      // load address of the new context
    mov esp, [eax+_esp]   // load the new stack pointer
    mov ebp, [eax+_ebp]   // load the new registers
    mov edi, [eax+_edi]
    mov esi, [eax+_esi]
    mov edx, [eax+_edx]
    mov ecx, [eax+_ecx]
    mov ebx, [eax+_ebx]
    mov eax, [eax+_eax]
    ret                   // invoke the new context

Page 39: Advanced Topics: Reasoning About Code Pointers & Self ...

The code (continued)

void makecontext (mctx_t mctx, char *sp, void *lnk, void *func, void *arg);

    mov eax, [esp+4]      // load address of the context
    mov ecx, [esp+8]      // load stack top pointer for the new stack frame
    mov edx, [esp+20]     // load the function's argument
    mov [ecx-4], edx      // push it onto new stack
    mov edx, [esp+12]     // load the function's return link
    mov [ecx-8], edx      // push it onto new stack
    mov edx, [esp+16]     // load the function address
    mov [ecx-12], edx     // push it as return IP onto new stack
    sub ecx, 12
    mov [eax+_esp], ecx   // all useful info for fresh context is on new stack
    ret
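The effect of makecontext on the new stack can be checked with a small model. The following is a sketch in C, assuming a word-aligned simulated memory and 32-bit values standing in for pointers. The names `mem`, `store32`, `load32`, `makecontext_model`, and `ret_model` are hypothetical helpers introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_BYTES = 256 };

/* Same field order as mctx_st on the slides. */
typedef struct {
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp, esp;
} mctx;

/* Simulated memory holding the new stack; byte address a is word a / 4. */
static uint32_t mem[STACK_BYTES / 4];

static void store32(uint32_t addr, uint32_t v) {
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    mem[addr / 4] = v;
}

static uint32_t load32(uint32_t addr) {
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return mem[addr / 4];
}

/* Mirrors the stores in the assembly: arg, return link, then function
 * address (as return IP) below the stack top; only esp is recorded. */
void makecontext_model(mctx *ctx, uint32_t sp, uint32_t lnk,
                       uint32_t func, uint32_t arg) {
    assert(sp >= 12);
    store32(sp - 4, arg);    /* mov [ecx-4], edx  */
    store32(sp - 8, lnk);    /* mov [ecx-8], edx  */
    store32(sp - 12, func);  /* mov [ecx-12], edx */
    ctx->esp = sp - 12;      /* mov [eax+_esp], ecx */
}

/* What `ret` does once the context's esp is loaded: pop the return IP. */
uint32_t ret_model(mctx *regs) {
    uint32_t ip = load32(regs->esp);
    regs->esp += 4;
    return ip;
}
```

After `ret_model`, the stack top holds the return link with the argument one word above it, which is the layout a freshly called function expects under the calling convention used in these slides.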

Page 40: Advanced Topics: Reasoning About Code Pointers & Self ...

Challenges

1. Polymorphism over arbitrary shape of data
2. Multiple explicit stacks and flexible handling
3. Strong-update (separation logic)
4. General embedded code pointers
   Higher-order functions and continuations
5. Partial correctness

Approach    TAL           Hoare Logic    Index-based FPCC
Problem     1, 2, 3, 5    4              5

Page 41: Advanced Topics: Reasoning About Code Pointers & Self ...

Applying XCAP to x86

• Finite machine word
• Stack push / pop
• Function call / return
• Variable-length instruction
• Word-aligned memory

Page 42: Advanced Topics: Reasoning About Code Pointers & Self ...

Reasoning about memory

• Strong updates (separation logic)
• Shallow embedding in PropX, e.g.,

Page 43: Advanced Topics: Reasoning About Code Pointers & Self ...

Reasoning about control flow

• Direct jump

• Indirect jump

• Function call

• Return

Page 44: Advanced Topics: Reasoning About Code Pointers & Self ...

Stack and calling convention

[Figure: stack layout. Labels: excess space, esp, local storage, return address, argument 1, argument 2, …, argument n, caller frames]

Page 45: Advanced Topics: Reasoning About Code Pointers & Self ...

What is a machine context?

[Figure: a machine context mctx with slots retv, bx, cx, dx, si, di, bp, sp; labels: cs, public, private, ret]

typedef struct mctx_st *mctx_t;
struct mctx_st { int eax, ebx, ecx, edx, esi, edi, ebp, esp; };

Page 46: Advanced Topics: Reasoning About Code Pointers & Self ...

swapcontext()

void swapcontext (mctx_t old, mctx_t new);

    mov eax, [esp+4]
    mov [eax+ 0], OK
    mov [eax+ 4], ebx
    mov [eax+ 8], ecx
    mov [eax+12], edx
    mov [eax+16], esi
    mov [eax+20], edi
    mov [eax+24], ebp
    mov [eax+28], esp
    mov eax, [esp+8]
    mov esp, [eax+28]
    mov ebp, [eax+24]
    mov edi, [eax+20]
    mov esi, [eax+16]
    mov edx, [eax+12]
    mov ecx, [eax+ 8]
    mov ebx, [eax+ 4]
    mov eax, [eax+ 0]
    ret
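The register-level effect of the two halves of swapcontext can be modeled directly. This is a simplified sketch in C: it treats the CPU registers as a struct, omits the stack reads of the arguments and the final `ret`, and assumes a value for `OK`, which the slides name but do not define. `swapcontext_model` and `new_ctx` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Field order matches offsets 0, 4, ..., 28 in the assembly. */
typedef struct {
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp, esp;
} mctx;

enum { OK = 1 };  /* assumed value; the slides only name the constant */

/* `regs` plays the role of the CPU register file. */
void swapcontext_model(mctx *regs, mctx *old, const mctx *new_ctx) {
    assert(regs != 0 && old != 0 && new_ctx != 0);

    /* First half: store the old context. The eax slot receives OK,
     * which becomes the value in eax when this context is resumed. */
    old->eax = OK;
    old->ebx = regs->ebx;
    old->ecx = regs->ecx;
    old->edx = regs->edx;
    old->esi = regs->esi;
    old->edi = regs->edi;
    old->ebp = regs->ebp;
    old->esp = regs->esp;

    /* Second half: load every register from the new context. */
    *regs = *new_ctx;
}
```

Switching away and then back restores every saved register except eax, which comes back as `OK`; this is the behavior a specification of the routine has to capture.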

Page 47: Advanced Topics: Reasoning About Code Pointers & Self ...

First half of the proof

Page 48: Advanced Topics: Reasoning About Code Pointers & Self ...

Second half of the proof

Page 49: Advanced Topics: Reasoning About Code Pointers & Self ...

Other routines

void loadcontext (mctx_t mctx);
void makecontext (mctx_t mctx, char *sp, void *lnk, void *func, void *arg);

Page 50: Advanced Topics: Reasoning About Code Pointers & Self ...

Coq implementation

Page 51: Advanced Topics: Reasoning About Code Pointers & Self ...

Self‐Modifying Code (SMC)

Definition: Any program that loads, generates, or mutates code at runtime.

SMC is important to verify

Intrinsically natural under von Neumann architecture

Many applications, including

Runtime code generation
  Commonly used for improving performance
  Java Library: gnu.bytecode, org.apache.bcel
  C#(.NET) Library: System.Reflection.Emit

Runtime code modification
  Code obfuscation
  Malicious software
  Shellcode

Page 52: Advanced Topics: Reasoning About Code Pointers & Self ...

Example ‐ A Typical OS Bootloader

[Figure: memory and disk. The bootloader in disk sector 1 is copied by the BIOS into memory at 0x7c00 before start-up; the kernel in disk sector 2 is copied by the bootloader into memory at 0x1000. Bootloader steps: load kernel; jmp 0x1000. Memory addresses shown: 0x0000, 0x1000, 0x7c00]
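The boot sequence in the figure can be sketched as two copy-then-jump steps over a single byte-addressed memory. The following C model is illustrative only: the flat `disk` array, the sector size of 512 bytes, and the function names `bios_boot` and `bootloader_run` are assumptions, and no instruction is actually decoded.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    SECTOR_SIZE = 512,      /* assumed sector size */
    MEM_SIZE    = 0x8000,
    BOOT_ADDR   = 0x7c00,   /* where the BIOS places the bootloader */
    KERNEL_ADDR = 0x1000    /* where the bootloader places the kernel */
};

typedef struct {
    uint8_t  mem[MEM_SIZE];  /* one memory for both code and data */
    uint32_t pc;
} Machine;

/* BIOS step: copy disk sector 1 into memory and start executing there. */
void bios_boot(Machine *m, const uint8_t *disk) {
    memcpy(&m->mem[BOOT_ADDR], disk, SECTOR_SIZE);
    m->pc = BOOT_ADDR;
}

/* Bootloader step: copy disk sector 2 (the kernel) as ordinary data,
 * then transfer control to it (jmp 0x1000). */
void bootloader_run(Machine *m, const uint8_t *disk) {
    assert(m->pc == BOOT_ADDR);
    memcpy(&m->mem[KERNEL_ADDR], disk + SECTOR_SIZE, SECTOR_SIZE);
    m->pc = KERNEL_ADDR;
}
```

The bytes at `KERNEL_ADDR` are written by a plain memory copy and then become the target of `pc`, so the same cells are data in one step and code in the next.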

Page 53: Advanced Topics: Reasoning About Code Pointers & Self ...

Verifying SMC ‐ the Challenge

For OS bootloader (runtime code generation)
  The kernel is accessed as regular data
  But also executes as program code

For general SMC scenario
  Code and data are stored in the same memory
  Program code alters at runtime
  Code may be modified an unbounded number of times
  Control flow is difficult to represent

All the existing verification techniques:
  have to assume a fixed code heap
  stop working in the presence of SMC!

[Figure: memory with the bootloader at 0x7c00 (load kernel; jmp 0x1000) and the kernel (in mem) at 0x1000; address 0x0000 marked]

Page 54: Advanced Topics: Reasoning About Code Pointers & Self ...

Our New Idea ‐ Machine model

Machine model used before

Generalized model

Extend the assertion language: expressing code body
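A generalized model in this spirit drops the separate code heap and decodes each instruction from memory at fetch time. The sketch below is an assumption-laden toy in C, not the model from the slides: the one-byte opcodes, the `World` struct, and `step` are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { MEM_SIZE = 32 };

/* Hypothetical byte encodings, invented for this sketch. */
enum {
    OP_HALT  = 0,
    OP_INC   = 1,  /* acc++ */
    OP_DEC   = 2,  /* acc-- */
    OP_STORE = 3,  /* operands a, v: mem[a] = v */
    OP_JMP   = 4   /* operand a: pc = a */
};

typedef struct {
    uint8_t  mem[MEM_SIZE];  /* code and data share one memory */
    uint32_t pc;
    int      acc;
} World;

/* Fetch, decode, and execute one instruction; 0 means halted or stuck. */
int step(World *w) {
    assert(w != 0);
    if (w->pc >= MEM_SIZE) return 0;
    uint8_t op = w->mem[w->pc];  /* decoded from memory at fetch time */
    switch (op) {
    case OP_INC:
        w->acc++;
        w->pc += 1;
        return 1;
    case OP_DEC:
        w->acc--;
        w->pc += 1;
        return 1;
    case OP_STORE: {
        if (w->pc + 2 >= MEM_SIZE) return 0;
        uint8_t a = w->mem[w->pc + 1];
        uint8_t v = w->mem[w->pc + 2];
        if (a >= MEM_SIZE) return 0;
        w->mem[a] = v;           /* may overwrite an instruction */
        w->pc += 3;
        return 1;
    }
    case OP_JMP:
        if (w->pc + 1 >= MEM_SIZE) return 0;
        w->pc = w->mem[w->pc + 1];
        return 1;
    default:                     /* OP_HALT or an undecodable byte */
        return 0;
    }
}
```

For instance, the memory image `{OP_STORE, 3, OP_INC, OP_DEC, OP_HALT}` overwrites the byte at address 3 before reaching it, so the machine executes an increment where the initial image held a decrement. An assertion about this program has to describe the code bytes themselves, which is what extending the assertion language to express code bodies makes possible.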

Page 55: Advanced Topics: Reasoning About Code Pointers & Self ...

Intuition of CAP

The program is stored in the code heap

Code Heap

Page 56: Advanced Topics: Reasoning About Code Pointers & Self ...

Intuition of CAP

The program is stored in the code heap

Code blocks and control flow

For every code block,

Assign a precondition

Obtain intermediate conditions

Reason through the code body

Each code block’s export condition must derive target preconditions

[Figure: code heap with code blocks f1, f2, f3 carrying preconditions {a1}, {a2}, {a3}]

Page 57: Advanced Topics: Reasoning About Code Pointers & Self ...

Intuition of CAP

The program is stored in the code heap

Code blocks and control flow

For every code block,

Assign a precondition

Obtain intermediate conditions

Reason through the code body

Each code block’s export condition must derive target preconditions

Code specification

Partial correctness

Whenever fi is reached, ai is satisfied

[Figure: code specification assigning f1 : {a1}, f2 : {a2}, f3 : {a3}]

Page 58: Advanced Topics: Reasoning About Code Pointers & Self ...

Intuition of GCAP

Unified Heap

Page 59: Advanced Topics: Reasoning About Code Pointers & Self ...

Intuition of GCAP (cont’d)

Generalized code blocks

Executing sequences of code

Can have overlap in memory

For every code block,

Precondition and intermediate conditions carry program code

Export condition must derive the disjunction of target conditions

Parametric code

Solves the verification of unbounded code modification

Local reasoning

Eliminate irrelevant code

[Figure: unified heap with generalized code blocks annotated with conditions a1, a2, a3, a4, a5]

Page 60: Advanced Topics: Reasoning About Code Pointers & Self ...

Example ‐ Certifying the bootloader

• Two code modules

• Control Flow

• Assign the pre‐conditions

[Figure: memory with addresses 0x0000, 0x1000, 0x7c00; code modules B1 and B2; bootloader steps: load kernel; jmp 0x1000; kernel. Conditions shown: “B2 in disk & B1 in memory” and “B2 in memory”]

Page 61: Advanced Topics: Reasoning About Code Pointers & Self ...

Formalization

Assertion Logic

A higher‐order logic

We use the Calculus of Inductive Constructions (CiC)

Parametric code is expressed with existential quantifiers

Axiomatic inference rules for judgments

Well‐formed world

Well‐formed code heaps

Well‐formed code blocks

Three‐level systems

GCAP0: Verifying non‐self‐modifying code

GCAP1: Verifying runtime code generation

GCAP2: Verifying general self‐modifying code

Page 62: Advanced Topics: Reasoning About Code Pointers & Self ...

Soundness & Expressiveness

• Soundness and partial correctness (GCAP0,1,2)
  – Theorem: any well‐formed world is safe to execute for arbitrary steps without violating its specification

• Expressiveness (GCAP2)
  – Theorem: Any invariant‐based proof can be translated into GCAP2.
  – Expressiveness: GCAP2 > GCAP1 > GCAP0

Page 63: Advanced Topics: Reasoning About Code Pointers & Self ...

What we have verified [PLDI’07]

Basic SMC Constructs          Important Applications
opcode modification           self‐growing code
control flow modification     polymorphic code
unbounded code rewriting      code optimization
runtime code checking         code compression
runtime code generation       code obfuscation
multilevel RCG                code encryption
self‐mutating code block      OS bootloaders
mutual modification           shellcode

Page 64: Advanced Topics: Reasoning About Code Pointers & Self ...

Implementation

• Implementation (Under Coq)
  – General machine model GTM
  – Encoding of x86 and MIPS architectures
  – Assertion language and separation logic
  – GCAPs with the complete proof of soundness
  – Certified examples
    • A real OS bootloader (under Bochs)
    • MIPS code examples (under SPIM)