Intermission


Page 1: Intermission

Intermission

Page 2: Intermission

Binary parsing

The Deconstruction of Dyninst

_lock_foo

main

foo

dynamic instrumentation, debugger, static binary analysis tools, malware analysis, binary editor/rewriter, …

Page 3: Intermission


Familiar territory

Benjamin Schwarz, Saumya Debray, and Gregory R. Andrews. Disassembly of executable code revisited. 2002.

Cristina Cifuentes and K. John Gough. Decompilation of binary programs. 1995.

Richard L. Sites, Anton Chernoff, Matthew B. Kirk, Maurice P. Marks, and Scott G. Robinson. Binary translation. 1993.

Henrik Theiling. Extracting safe and precise control flow from binaries. 2000.

Ramkumar Chinchani and Eric van den Berg. A fast static analysis approach to detect exploit code inside network flows. 2005.

J. Troger and C. Cifuentes. Analysis of virtual method invocation for binary translation. 2002.

Laune C. Harris and Barton P. Miller. Practical analysis of stripped binary code. 2005.

Christopher Kruegel, William Robertson, Fredrik Valeur, and Giovanni Vigna. Static disassembly of obfuscated binaries. 2004.

Nathan Rosenblum, Xiaojin Zhu, Barton P. Miller, and Karen Hunt. Learning to analyze binary computer code. 2008.

Amitabh Srivastava and Alan Eustace. ATOM: a system for building customized program analysis tools. 1994.

Barton Miller, Jeffrey Hollingsworth, and Mark Callaghan. Dynamic Program Instrumentation for Scalable Performance Tools. 1994.

Page 4: Intermission

We’ve been down this road…


recursive traversal parsing: non-contiguous functions, code sharing, non-returning functions

“gap” parsing heuristics: preamble scanning, handles stripped binaries

probabilistic code models: learn to recognize function entry points, very accurate gap parsing

the DYNINST binary parser
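As a rough illustration of preamble scanning, the sketch below searches the unparsed gaps of a code buffer for one common x86-64 function preamble (push %rbp; mov %rsp,%rbp, bytes 55 48 89 e5). The function name scanGapsForPreamble and the gap representation are assumptions made for this example; it is not the Dyninst heuristic, which the slide does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: return the offsets inside each gap where the byte
// pattern 55 48 89 e5 (push %rbp; mov %rsp,%rbp) begins.
// Each gap is a half-open range [first, second) of offsets into buf.
std::vector<std::size_t> scanGapsForPreamble(
    const std::vector<std::uint8_t>& buf,
    const std::vector<std::pair<std::size_t, std::size_t>>& gaps) {
    static const std::uint8_t preamble[] = {0x55, 0x48, 0x89, 0xe5};
    const std::size_t n = sizeof(preamble);
    std::vector<std::size_t> candidates;
    for (const auto& gap : gaps) {
        assert(gap.first <= gap.second && gap.second <= buf.size());
        // Only consider matches that fit entirely inside the gap.
        for (std::size_t off = gap.first; off + n <= gap.second; ++off) {
            bool match = true;
            for (std::size_t i = 0; i < n; ++i) {
                if (buf[off + i] != preamble[i]) { match = false; break; }
            }
            if (match) candidates.push_back(off);
        }
    }
    return candidates;
}
```

Each returned offset is only a candidate entry point; a parser would still have to confirm it, for example by running recursive traversal from that address.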

Page 5: Intermission

What makes a parsing component?


Parsing API

1. Binary code source abstraction

2. Simple, intuitive representation: functions, blocks, edges

3. Platform independence supported by previous Dyninst components: InstructionAPI, SymtabAPI

Page 6: Intermission

Flexible code sources


a binary code object

Parser code source requirements:

code location

code data

access to code bytes: unsigned char * buf (41 56 49 89 fe 41 55 …)

function hints & names: main, foo, bar, baz

a few (optional) facts: pointer width, external linkage, PLT

Page 7: Intermission

Code source contract


bool isValidAddress

bool isExecutableAddress

void * getPtrToInstruction

void * getPtrToData

unsigned getAddressWidth

bool isCode

bool isData

Address codeOffset

Address codeLength

Nine mandatory methods

SymtabAPI implementation in 232 lines (including optional hints, function names)

Any binary code object that can be memory mapped can be parsed
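To make the contract concrete, here is a minimal sketch of a code source backed by a raw in-memory buffer. The slide gives only method names and return types, so the parameter lists, the Address typedef, and the class names CodeSourceSketch and BufferCodeSource are assumptions for illustration, not the released interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uintptr_t Address;  // assumption: stand-in for the parser's address type

// Hypothetical interface with the nine methods named on the slide.
class CodeSourceSketch {
public:
    virtual ~CodeSourceSketch() {}
    virtual bool isValidAddress(Address a) const = 0;
    virtual bool isExecutableAddress(Address a) const = 0;
    virtual void* getPtrToInstruction(Address a) const = 0;
    virtual void* getPtrToData(Address a) const = 0;
    virtual unsigned getAddressWidth() const = 0;
    virtual bool isCode(Address a) const = 0;
    virtual bool isData(Address a) const = 0;
    virtual Address codeOffset() const = 0;
    virtual Address codeLength() const = 0;
};

// A raw byte buffer treated as a single executable region loaded at `base`.
class BufferCodeSource : public CodeSourceSketch {
    std::vector<unsigned char> buf_;
    Address base_;
public:
    BufferCodeSource(const std::vector<unsigned char>& bytes, Address base)
        : buf_(bytes), base_(base) {}
    bool isValidAddress(Address a) const override {
        return a >= base_ && a - base_ < buf_.size();
    }
    bool isExecutableAddress(Address a) const override { return isValidAddress(a); }
    void* getPtrToInstruction(Address a) const override {
        if (!isValidAddress(a)) return nullptr;
        return const_cast<unsigned char*>(buf_.data() + (a - base_));
    }
    // This sketch models no separate data region.
    void* getPtrToData(Address) const override { return nullptr; }
    unsigned getAddressWidth() const override { return sizeof(void*); }
    bool isCode(Address a) const override { return isValidAddress(a); }
    bool isData(Address) const override { return false; }
    Address codeOffset() const override { return base_; }
    Address codeLength() const override { return buf_.size(); }
};
```

Because the parser would only see the abstract interface, the same approach could in principle wrap a memory-mapped file or a process snapshot instead of a vector.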

Page 8: Intermission

Simple control flow interface


Functions: start addr., extents; contain Blocks

Blocks: start addr., end addr., in edges, out edges; joined by Edges

Edges: src, targ, type
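The three object kinds on this slide can be sketched as plain structs. Everything below (the field names, the EdgeType values, the Cfg container, and the extents helper) is a hypothetical data model chosen for illustration; the slide names the attributes but not their types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

typedef std::uintptr_t Address;                              // assumed address type
enum EdgeType { CALL, RET, DIRECT, INDIRECT, FALLTHROUGH };  // assumed edge kinds

struct Block;
struct Edge { Block* src; Block* targ; EdgeType type; };
struct Block {
    Address start, end;            // half-open range [start, end)
    std::vector<Edge*> in, out;
};
struct Function {
    Address start;
    std::vector<Block*> blocks;
};

struct Cfg {
    std::deque<Edge> edges;  // deque keeps element addresses stable on push_back
    // Create an edge and register it with both endpoint blocks.
    Edge* link(Block& from, Block& to, EdgeType t) {
        edges.push_back(Edge{&from, &to, t});
        Edge* e = &edges.back();
        from.out.push_back(e);
        to.in.push_back(e);
        return e;
    }
};

// Merge a function's blocks into contiguous address ranges ("extents").
// A non-contiguous function yields more than one range.
std::vector<std::pair<Address, Address>> extents(const Function& f) {
    std::vector<std::pair<Address, Address>> ranges;
    for (const Block* b : f.blocks) ranges.push_back(std::make_pair(b->start, b->end));
    std::sort(ranges.begin(), ranges.end());
    std::vector<std::pair<Address, Address>> merged;
    for (const auto& r : ranges) {
        if (!merged.empty() && r.first <= merged.back().second)
            merged.back().second = std::max(merged.back().second, r.second);
        else
            merged.push_back(r);
    }
    return merged;
}
```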

Page 9: Intermission

Views of control flow


    while(!work.empty()) {
        Block *b = work.pop();

        /* do something with b */

        edgeiter eit = b->out().begin();
        while(eit != b->out().end()) {
            work.push(*eit++);
        }
    }

Walking a control flow graph

starting here

What if we only want intraprocedural edges?

Page 10: Intermission

Edge predicates


    while(!work.empty()) {
        Block *b = work.pop();

        /* do something with b */

        IntraProc pred;
        edgeiter eit = b->out().begin(&pred);
        while(eit != b->out().end()) {
            work.push(*eit++);
        }
    }

Walking a control flow graph

Edge Predicates

Tell iterator whether Edge argument should be returned

Composable (and, or)

Examples:

Intraprocedural

Single function context

Direct branches only
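One way composable edge predicates could work is sketched below with std::function. The names EdgePred, intraProc, directOnly, both, either, and reachable are all hypothetical, and the predicate is applied inside the loop rather than passed to an iterator as on the slide. The walk also tracks visited blocks so that it terminates on cyclic graphs.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

enum EdgeType { CALL, RET, DIRECT, INDIRECT };  // assumed edge kinds
struct Block;
struct Edge { Block* src; Block* targ; EdgeType type; };
struct Block { int id; std::vector<Edge*> out; };

typedef std::function<bool(const Edge&)> EdgePred;

// Intraprocedural: skip call and return edges (an assumed definition).
EdgePred intraProc() {
    return [](const Edge& e) { return e.type != CALL && e.type != RET; };
}
// Direct branches only.
EdgePred directOnly() {
    return [](const Edge& e) { return e.type == DIRECT; };
}
// Composition: logical and / or of two predicates.
EdgePred both(EdgePred a, EdgePred b) {
    return [a, b](const Edge& e) { return a(e) && b(e); };
}
EdgePred either(EdgePred a, EdgePred b) {
    return [a, b](const Edge& e) { return a(e) || b(e); };
}

// Worklist walk that follows only the edges accepted by pred.
std::set<int> reachable(Block* start, const EdgePred& pred) {
    std::set<int> seen;
    std::vector<Block*> work;
    work.push_back(start);
    while (!work.empty()) {
        Block* b = work.back();
        work.pop_back();
        if (!seen.insert(b->id).second) continue;  // already visited
        for (Edge* e : b->out) {
            if (pred(*e)) work.push_back(e->targ);
        }
    }
    return seen;
}
```

For example, reachable(entry, both(intraProc(), directOnly())) would visit only blocks connected to entry by direct, non-call edges.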

Page 11: Intermission

Extensible CFG objects


image_func

Function

Dyninst image_func: Complex, handles instrumentation, liveness, relocation, etc.

ParseAPI Function: Simple, only need to represent control flow graph

Special callback points during parsing:

parse parse parse → unresBranchNotify(insn) → [derived class does stuff] → parse parse parse

Factory interface for CFG objects:

parser → custom factory → mkfunc() → (Function*)image_func
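The factory and callback arrangement can be sketched as follows. Only the names Function, image_func, mkfunc, and unresBranchNotify come from the slide; their signatures, the CFGFactory and ImageFuncFactory classes, and the toy parseSketch driver are assumptions. In particular, the callback takes an address here as a stand-in for the instruction argument shown on the slide.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

typedef std::uintptr_t Address;  // assumed address type

// Parser-side object: only what a CFG needs.
class Function {
public:
    explicit Function(Address a) : addr_(a) {}
    virtual ~Function() {}
    Address addr() const { return addr_; }
    // Callback point: the default does nothing; derived classes may react.
    virtual void unresBranchNotify(Address insn) { (void)insn; }
private:
    Address addr_;
};

// Default factory builds plain Function objects.
class CFGFactory {
public:
    virtual ~CFGFactory() {}
    virtual Function* mkfunc(Address a) { return new Function(a); }
};

// Tool-side object: richer, and records unresolved branches as it is told.
class image_func : public Function {
public:
    explicit image_func(Address a) : Function(a) {}
    void unresBranchNotify(Address insn) override { unresolved.push_back(insn); }
    std::vector<Address> unresolved;
};

// Custom factory hands the parser image_func objects as Function*.
class ImageFuncFactory : public CFGFactory {
public:
    Function* mkfunc(Address a) override { return new image_func(a); }
};

// Toy stand-in for the parser: creates the function through the factory and
// fires the callback for each address in `unresolvedBranches`. A real parser
// would find these addresses by decoding instructions.
std::unique_ptr<Function> parseSketch(CFGFactory& factory, Address entry,
                                      const std::vector<Address>& unresolvedBranches) {
    std::unique_ptr<Function> fn(factory.mkfunc(entry));
    for (Address a : unresolvedBranches) fn->unresBranchNotify(a);
    return fn;
}
```

The parser code never mentions image_func, so a tool can substitute its own objects without changing the parser.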

Page 12: Intermission

What’s in the box?


Binary Parser: recursive descent parsing; speculative gap parsing; cross platform: x86, x86-64, PPC, IA64, SPARC

Control Flow Graph Representation: graph interface; extensible objects for easy tool integration; exports Dyninst InstructionAPI interface

SymtabAPI-based Code Source: cross-platform; supports ELF, PE, XCOFF formats

* box to be released soon

Page 13: Intermission

Status


conception

code refactoring

interface design

Dyninst re-integration (major test case)

other major test case: compiler provenance (come tomorrow!)