150412 38 beamer methods of binary analysis

38
Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks Referen Methods of binary analysis and Valgrind tool MA-INF 3318 - Seminar Verification of Complex Systems Raghunandan Palakodety Supervised by: Dr. Michael Gerz Universit¨ at Bonn 12-01-2015

Transcript of 150412 38 beamer methods of binary analysis

Page 1: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Methods of binary analysis and Valgrind toolMA-INF 3318 - Seminar Verification of Complex Systems

Raghunandan PalakodetySupervised by: Dr. Michael Gerz

Universitat Bonn

12-01-2015

Page 2: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Outline

1 Program analysis

2 Sample (Bad) Code

3 Drawbacks

4 Advanced Valgrind usageOther Tools in ValgrindFine Tuning and Client Requests

5 How does Valgrind do that?The Core - All Valgrind Tools have in commonMemcheck - Behind the Scenes

6 Q&A

7 Thanks

8 References

Page 3: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Introduction

What is Program analysis?

Program analysis is the process of automatically deriving propertiesabout the behavior of computer programs.

Dynamic program analysis Analysis is performed by executingthe program on chosen inputs. Traces of the actual executions arecollected and processed. Properties about the program behaviourare deduced based on the analysis of these concrete executions.[2]Static program analysis Analysis is performed without actuallyexecuting the program. An abstract model of the program is issuedand symbolically executed. Properties about program behavior arededuced from the analysis of these symbolic executions. [2]

Page 4: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Sample (Bad) Code

1 We will use the following code for demonstration purposes:

1 #def ine SIZE 1002 i n t main ( ) {3 i n t i , sum = 0 ;4 i n t ∗a = ma l l o c ( SIZE ) ;5 f o r ( i =0; i < SIZE ; ++i ) sum += a [ i ] ;6 a [ 1 0 1 ] = 1 ;7 f r e e ( a ) ;8 a = NULL ;9 i f ( sum > 0) p r i n t f ( ”Hi !\ n” ) ;

10 re tu rn 0 ;11 }

2 Contains many bugs. Compiles without warnings or errors.

Page 5: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Bugs

1 Memory was allocated but not initialized as in line 6 of thecode.

2 Use of uninitialized values as shown in line 7 of the code.

3 Trying to read past the end of allocated array as shown in line8 of the code.

4 Memory not de-allocated.

Page 6: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Usage of Valgrind tool

1 Run the above program with Valgrind

valgrind −−leak-check=full ./sample

2 Valgrind uses the default tool Memcheck. Error summarylooks as shown below

Example

==31152== Invalid read of size 4==31152== at 0x804844A: main (bad code.c:7)==31152== Address 0x41ff08c is 0 bytes after a block of size 100alloc’d==31152== at 0x402C5A9: malloc (vg replace malloc.c:296)==31152== by 0x8048430: main (bad code.c:6)

Page 7: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Invalid Reads

Example

==31152== Invalid read of size 4==31152== at 0x804844A: main (bad code.c:7)==31152== Address 0x41ff08c is 0 bytes after a block of size 100alloc’d==31152== at 0x402C5A9: malloc (vg replace malloc.c:296)==31152== by 0x8048430: main (bad code.c:6)

We read past the end of the allocated array.

Trying to read from area which we are not allowed to access.

Could result in a SEGFAULT.

Valgrind provides enough details to find the problem.

Page 8: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Invalid Writes

Example

==31152== Invalid write of size 4==31152== at 0x8048463: main (bad code.c:8)==31152== Address 0x41ff090 is 4 bytes after a block of size100 alloc’d==31152== at 0x402C5A9: malloc (vg replace malloc.c:296)==31152== by 0x8048430: main (bad code.c:6)

Similar to invalid read

Details provided by Valgrind:

Location of fault (addresses, line number if debug-informationpresent).Stack-trace to fault (you can get more using−−num−callers=30).Relevant blocks details and allocation/de-allocationstack-trace.

Page 9: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Memory Leaks

At the end of the run, Valgrind does “Garbage Collection”.

Unreferenced memory in C/C++ ⇒ memory leak

.

Example

==31152== 100 bytes in 1 blocks are definitely lost in loss record1 of 1==31152== at 0x402C5A9: malloc (vg replace malloc.c:296)==31152== by 0x8048430: main (bad code.c:6)

Valgrind provides stack-trace for the allocation point3 kinds:

Definitely lost (no pointers to allocation).Probably lost (pointers only to the middle of the allocatedblock).Still reachable (block hasn’t been free’d before exit, butpointers to it still exist).

Page 10: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Use of Uninitialized Value

Example

==31152== Conditional jump or move depends on uninitialisedvalue(s)==31152== at 0x8048476: main (bad code.c:10)

Valgrind checks and make sure the program flow isdeterministic.

Usage of values which haven’t been initialized in conditions isreported.

Also if they are passed as parameters for syscalls.

Valgrind will detail location in which the uninitialized datawas used.

To get trace to the source of it, add “−−track−origins=yes”to the command-line.

Page 11: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Reliability on Valgrind

1 Memcheck was used on an application that used the OpenSSLlibrary.

2 The tool complained about the library using uninitializedmemory in two locations in the filecrypto/rand/md rand.c. In detail .

3 Most of the time Valgrind’s errors describe latent flaws in theprogram.

4 However, sometimes Valgrind is wrong.5 Cryptography related code requires special attention.6 A debian developer commented out code that Valgrind didn’t

like.7 Resulting in a latent bug.8 With massive security implications.9 All because Valgrind claimed a value used is uninitialized.10 Was intentionally used so, to collect more entropy.

Page 12: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Problems Valgrind cannot address I

1 Buffer overflows adversely accessing valid memory.

2 Accesses to stack and global variables are not checked.

3 Business logic/algorithmic problems are not detected.

Page 13: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Problems Valgrind cannot address II

Example

char ∗p = ma l l oc ( 1024 ) ; /∗ b l o ck 1 ∗/char ∗q = ma l l oc ( 1024 ) ; /∗ b l o ck 2 ∗/p += 1200 ; /∗ ”p” now po i n t s i n t o b l o ck 2 ∗/∗p = ’ a ’ ; /∗ i n v a l i d w r i t e − but goes unde t ec t ed ∗/

#de f i n e BUFSIZE 5i n t main ( i n t argc , char ∗∗ a rgv ) {char buf [ BUFSIZE ] ;s t r c p y ( buf , a rgv [ 1 ] ) ; // no bounds check ing ,// a l though a s e c u r e v e r s i o n s t r c p y s ( ) e x i s t s .re tu rn 0 ;}

4 Checks only code that is executed.

5 Does not work well with statically-linked executable.

Page 14: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Valgrind on statically linked executable

Example

==22660== Conditional jump or move depends on uninitialisedvalue(s)==22660== at 0x804B413: IO cleanup (in/home/raghu/Desktop/WS14/Seminar/Presentation/Code/driver)==22660== by 0x80497D8: run exit handlers (in/home/raghu/Desktop/WS14/Seminar/Presentation/Code/driver)==22660== by 0x804981E: exit (in/home/raghu/Desktop/WS14/Seminar/Presentation/Code/driver)==22660== by 0x804911D: (below main) (in/home/raghu/Desktop/WS14/Seminar/Presentation/Code/driver)

1 Valgrind run on statically linked executable.

2 Too many error reports as shown above.

Page 15: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Why? Does Valgrind just not work with -static?

1 It does.

2 The problem is not in Valgrind, it is in glibc, which is notValgrind clean.

3 When you link statically, these errors come from executable(which includes code from libc.a).

4 But when linked dynamically, these errors come from theshared object libc.so.6.

5 By default Valgrind suppresses the errors.

6 In the case of statically linked executable, as a workaround, wecan write suppression files.

Page 16: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Other Tools in Valgrind

We talked about Memcheck – the default tool in Valgrind.

But Valgrind can do much more than that.

Contains many tools, some stable, some experimental.

You can even write your own tools in few days of work.

Valgrind ships with the following tools:

Memory Error Checkers

– SGCheck – Memcheck

Profilers

– Cachegrind – Callgrind– Massif

Thread Error Detectors

– Helgrind – DRD

Sample Tools and others

– Lackey – None– BBV

Page 17: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Experimental tool SGCheck

Activate by adding −−tool=exp-sgcheck.

Similar in goals to Memcheck.

Still experimental.

Uses very different approach than Memcheck.

Can detect failures Memcheck doesn’t detect (like the codewe saw ).

Has got false positives (different than the ones Memcheckgets).

Slower than Memcheck.

Doesn’t check for memory leakage and validity of accesses.

Use alongside Memcheck for better coverage.

Page 18: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

SGCheck illustration

i n t main ( ){i n t i , a [ 1 0 ] ;f o r ( i = 0 ; i <= 10 ; i++)

a [ i ] = 42 ;}

1 Principle: The key observation is that if a memory referencinginstruction accesses inside a stack or global array once, then itis highly likely to always access that same array.

2 At run time we will know the precise address of a[] on thestack, and so we can observe that the first store resultingfrom a[i] = 42 writes a[], and we will (correctly) assume thatthat instruction is intended always to access a[].

3 Then, on the 11th iteration, it accesses somewhere else,possibly a different local, possibly an un-accounted for area ofthe stack (eg, spill slot), so SGCheck reports an error.

Page 19: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Profiling Tools

Cachegrind

Traces the code memory accesses and jump patterns.Simulates a 2-level cache and branch predictor.Provide details about cache misses and their source.Can help optimizing performance critical code.

Callgrind

Extends Cachegrind.Propagates the costs along the call-tree.Has a KDE front-end – KCacheGrind.

Massif

Memory allocations profiler.Keeps stack-trace of every memory allocation/deallocation.Print memory usage status in peak times and upon specificintervals.

Page 20: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Thread Error Detectors

Functionality similar to Intel’s ThreadChecker.

Detects a variety of threading related problems:

Threading API misuse.Lock order problems (potential dead-locks).Data-Races.

Two similar implementations in Valgrind

Helgrind

Discussions on most forums attribute this tool produces manyfalse positives.Supposedly catches more errors too.

DRD

Page 21: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Client requests

Valgrind provides communication channel for tested programs.

Good for unit-test harnesses.

Also good if you are doing weird stuff in your code

A special memory allocator, such as object-pool

Use VALGRIND CREATE MEMPOOL to mark a memory area as anobject-poolUse VALGRIND MEMPOOL ALLOC to mark an object as allocatedUse VALGRIND MEMPOOL FREE to mark an object as free

Self-Modifying code, i.e. JIT compiler

Use VALGRIND DISCARD TRANSLATIONS to report about areasin which code has been changed

Useful also for hunting bugs

For example, check for memory leaks, usingVALGRIND DO LEAK CHECK

Page 22: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Suppression Files

Valgrind’s error summary contains too many details.

Most of the times it is indicating bugs that should be fixed.

But not always the one we want to fix right now.

Sometimes it is correct code, which Valgrind failed tounderstand (false positives).

Mostly in sophisticated/extremely optimized library code.Also possible when having unusual interactions with the kernelcode.

Valgrind includes a mechanism to silent a specific error.

Works with all tools that report errors.Simple file format, see documentation for details.Valgrind includes suppression for many common libs.

Page 23: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Outline

1 Program analysis

2 Sample (Bad) Code

3 Drawbacks

4 Advanced Valgrind usageOther Tools in ValgrindFine Tuning and Client Requests

5 How does Valgrind do that?The Core - All Valgrind Tools have in commonMemcheck - Behind the Scenes

6 Q&A

7 Thanks

8 References

Page 24: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

VEX - Binary Translation

We want to inspect all memory accesses in the code

Straight forward solution - CPU emulation

But this is really slow

Treat the program binary as “source-code”

Allow the tools to modify the code being compiled

VEX’s front-end translates X86 opcodes into IntermediateRepresentation code

VEX’s back-end translates IR code back to X86 code

Tools can manipulate the IR code in the middle

Page 25: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

IR code

VEX translates each guest opcode into a block of IR code.

IR code is similar to assembly for a RISC-style machine.

IR code assumes machine with infinite number of variables.

The “guest state” is stored in a special memory area.

The following example is taken from libvex ir.h.

IR translation of “addl %eax, %edx”

—— IMark(0x24F275, 7) ——t3 = GET:I32(0) # get %eax, a 32-bit integert2 = GET:I32(12) # get %ebx, a 32-bit integert1 = Add32(t3,t2) # addlPUT(0) = t1 # put %eax

Page 26: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

IR code - Properties

IR code is fully typed:

RISC-like assembly language with arbitrary number oftemporary registers. [1]VEX performs sanity checks on these types all of the time. [1]

IR code is in a Single Static Assignment form.

Each variable is assigned only once.Simplifies the instrumentation of the code.Also simplifies the optimization of the code when running it.

IR code is presented to the tools in semi-parsed form.

Easy to manipulate lists of instructions.Instructions are presented in a convenient data-structure.Useful functions for manipulating IR code (add/removeinstructions, etc.)

Page 27: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Instrumentation technique: Disassemble and Resynthesize

Example

Figure 1: Object code → Code cache

1 IR blocks.2 Each of the IR blocks is generated by the VEX system.3 Finally object code is re-generated and stored in code-cache.

Page 28: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

JIT, VEX and IR

The X86→ IR → X86 translation is done in a Just In Timemanner

Every basic-block is translated upon its first execution.

Definition

Basic-Block - a linear sequence of code, with one entry point, oneexit point.

The translation is done so that Valgrind’s dispatcher regainscontrol after each basic-block.(There are exceptions)

Caching of translated code blocks improves execution speed.

Page 29: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Bootstrap Code

Small, shared launcher - launches the relevant tool

Each tool’s binary contains a complete copy of the core code

The interesting stuff in bootstrap:

Reading the debug information for the target program or client.Initialize VEX - Valgrind’s binary-translation mechanism.Call tool-specific initialization code.Load the target program.Setup the environment for the target program run.Initialize Valgrind’s thread-scheduler.

The scheduler makes sure that only one thread runs at a time.Scheduler also handles signals.

Kick-start the “client” application main thread, using VEX andValgrind’s dispatcher.

Page 30: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

High-level view of Valgrind’s activity

Example

Figure 2: Valgrind’s scheduler and dispatcher

1 Auxiliary mapped cache updated for every access oftranslation block.

2 Figure shows Valgrind’s scheduler and dispatcher.

Page 31: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Function Call Redirection

Valgrind implements a redirection mechanism.

This mechanism can “hijack” or capture various function calls.

Done on the binary translation level.

Some examples where this ability is useful:

Memory allocation functions.Various sys-calls (write/read files, etc.)Loading dynamic-link library.Replace some optimized functions with debug versions

The guest code can request that as well.

Redirect requests are indicated by a specially mangled functionname.Special “magic-sequence” to call the original function.Nice C macros make it developer friendly.Please see documentation for details.

Page 32: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Outline

1 Program analysis

2 Sample (Bad) Code

3 Drawbacks

4 Advanced Valgrind usageOther Tools in ValgrindFine Tuning and Client Requests

5 How does Valgrind do that?The Core - All Valgrind Tools have in commonMemcheck - Behind the Scenes

6 Q&A

7 Thanks

8 References

Page 33: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Memcheck Basics, A-bits

Memcheck tracks or shadows what memory the clientapplication may access.

Two levels of allowed access:

Write to / read from the memory.Use the value in memory for anything serious.

The first one is tracked with “A” bits (Access/Addressability).

One bit per byte of memory.Set to 0 if client is not allowed to access (i.e. free’d block).Set to 1 upon memory allocation.Report an error if client code touched memory with A == 0.

Page 34: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

V-bits

The second access level is tracked with “V” bits (Validity)

One bit per bit of memory, so a byte of validity.Set to 1 if the original memory bit haven’t been defined yet.Set to zero once the memory bit is set.State is transitive.

Example

If c is not defined, evaluating a = b + c will get a to be undefined too.

Reports an error when conditional jumps and syscalls are givenundefined values.

Page 35: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Instrumentation technique

Valgrind uses Disassemble-and-resysnthesize approach.

Eight phases. Little complex to demonstrate in this session.

Page 36: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Discussion and Clarification

Q & A

Page 37: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

Thank you

Thank you all for listening.

Page 38: 150412 38 beamer methods of  binary analysis

Program analysis Sample (Bad) Code Drawbacks Advanced Valgrind usage How does Valgrind do that? Q&A Thanks References

References

A. H. ASHOURI, The VEX System.

A. V. Emmanuel Fleury, Gerald Point, Binaryprogram analysis: Theory and practice (what you code is notwhat you execute).2013.