Post-Attack Analysis of Unknown Vulnerabilities
Peng Ning
With Emre C. Sezer, Chongkyung Kil, and Jun Xu
2007 GMU-CSA Workshop, Nov 14, 2007
Motivation
• Vulnerability analysis
  – Essential for
    • Patching
    • Vulnerability-based signature generation
  – Painstakingly slow
    • Depends on human effort
• Existing approaches
  – Static analysis (e.g., [Chen et al. 04], [Feng et al. 04], [Larochelle & Evans 01])
    • False positives
  – Dynamic analysis (e.g., Minos [Crandall et al. 04], TaintCheck [Newsome & Song 05], DIRA [Smirnov & Chiueh 05])
    • Used for detection; inadequate vulnerability information
  – Symbolic execution (e.g., EXE [Cadar et al. 06], DACODA [Crandall et al. 05])
    • Scalability issues
  – Recovery (e.g., STEM [Sidiroglou et al. 05], SEAD [Locasto et al. 07])
    • Change of application semantics
MemSherlock
• MemSherlock is an automated debugger
  – Automated analysis of unknown memory corruption vulnerabilities
  – Appeared in ACM CCS ’07
• MemSherlock provides
  – Statement that causes the memory corruption
  – Dynamic program slice leading to the corruption
  – Program variables involved in the vulnerability
  – All presented at programming language level
• Implications
  – Generating vulnerability conditions
  – Improves signature or patch generation speed
General Framework: Web Application Example
[Figure: framework diagram. Labeled components: Traffic, Light-weight IDS, Program, Logger, Trigger, Replayer, and MemSherlock running the Instrumented Program.]
MemSherlock Overview
• Goal is to provide vulnerability information
  – Intuitive, easy to understand for the programmer
• Not only the corruption point
  – Slice of program involved in the vulnerability
  – Effects of user inputs
  – Program variables involved
  – Variable relationships (e.g., pointer aliasing)
  – Type of vulnerability (e.g., stack buffer overflow)
• MemSherlock performs two important tasks
  – Finding the corruption point
  – Tracking program state
MemSherlock: Finding Corruption Point
• Observation: A memory object is modified by a small set of statements (inspired by AccMon)
• For a memory object m, the write set WS(m) is the set of statements that legitimately modify m
• Security Condition: Memory object m should only be updated by statements in WS(m)
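The security condition can be sketched in a few lines of C. This is a minimal illustration and not MemSherlock's implementation: the WriteSet type, the WS_MAX bound, and the use of plain line numbers to identify statements are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WS_MAX 8  /* hypothetical bound, chosen only for this sketch */

/* Hypothetical representation of WS(m): statements are identified by line. */
typedef struct {
    const char *name;    /* label for memory object m (illustrative only) */
    int stmts[WS_MAX];   /* statements allowed to modify m */
    size_t count;
} WriteSet;

/* Record that statement `stmt` may legitimately modify m. */
static void ws_add(WriteSet *ws, int stmt) {
    assert(ws->count < WS_MAX);
    ws->stmts[ws->count++] = stmt;
}

/* Security condition: a write to m is legitimate only if the writing
 * statement is a member of WS(m). */
static bool ws_allows(const WriteSet *ws, int stmt) {
    for (size_t i = 0; i < ws->count; i++) {
        if (ws->stmts[i] == stmt) return true;
    }
    return false;
}
```

For a variable whose write set is {10, 12}, ws_allows accepts a write from line 10 and rejects one from line 42; the rejected write is the kind of event that would be reported as a corruption point.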
MemSherlock: Assembly Line
• Pre-debugging phase
  – Instruments the program for the debugging phase
  – Extracts program information via static analysis
  – Needs to be performed only once
• Debugging phase
  – Tracks program state
  – Monitors memory writes and checks for violations of the security condition
  – Tracks tainted data and its propagation
MemSherlock Architecture
[Figure: MemSherlock architecture diagram with the pre-debugging phase highlighted. Labeled components: Static Analyzer, Source Code Rewriting, Compiler, Debugging Agent. Labeled data: Original source files, Library specification, Program executable, Debugging information, Malicious input, Vulnerability information.]
Pre-debugging: Generating Write Sets
• MemSherlock analyses source code to determine write sets
• For a program variable v, WS(v) includes
  – Assignment statements (i.e., v=expr)
  – Library function calls where v is passed as an argument that can be modified (e.g., memcpy(&v, src, n))
• MemSherlock treats DLLs as black boxes
  – Assumption: A DLL is internally secure, but externally insecure
    • e.g., no stack overflows in the library functions
    • Sound for common, well-tested libraries (e.g., the C library)
  – Requires library specifications
    • For each DLL, a list of functions and the arguments they might modify
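One way to picture such a library specification is a table that maps each function to the arguments it may write through. The sketch below is an assumption about the shape of that data, not the format MemSherlock actually reads: LibSpecEntry, LIB_SPEC, and lib_may_modify are hypothetical names, and the bitmask encoding is chosen only for brevity. The listed functions are standard C library calls whose first argument is the destination buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical specification entry: which arguments a function may modify. */
typedef struct {
    const char *function;    /* library function name */
    unsigned modified_args;  /* bit i set => argument i (0-based) may be written */
} LibSpecEntry;

/* Illustrative entries for a few standard C library functions. */
static const LibSpecEntry LIB_SPEC[] = {
    { "memcpy",  1u << 0 },  /* writes through dest */
    { "strcpy",  1u << 0 },  /* writes through dest */
    { "memset",  1u << 0 },  /* writes through s */
    { "sprintf", 1u << 0 },  /* writes through str */
    { "strlen",  0u },       /* modifies none of its arguments */
};

/* Returns 1 if the call may modify argument `arg_index`, 0 if it may not,
 * and -1 if the function is missing from the specification. */
static int lib_may_modify(const char *function, unsigned arg_index) {
    assert(arg_index < 32);
    for (size_t i = 0; i < sizeof LIB_SPEC / sizeof LIB_SPEC[0]; i++) {
        if (strcmp(LIB_SPEC[i].function, function) == 0) {
            return (int)((LIB_SPEC[i].modified_args >> arg_index) & 1u);
        }
    }
    return -1;
}
```

With a table like this, a call such as memcpy(&v, src, n) at some line would add that line to WS(v) because argument 0 is marked as modifiable, while src would be left alone. The -1 result makes a missing entry visible, which matters because an incomplete specification leads to spurious reports.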
Dealing with Pointers
• For a pointer variable p two write sets are kept
  – WS(p): statements that modify p
  – WS(ref(p)): statements that modify the referent (e.g., *p=5)
• ref(p) is resolved during runtime (debugging)
• Perform the same analysis for pointer-type function arguments at function calls
  – Removes the requirement for inter-procedural static analysis

1 int i = 0;
2 int *p = &i;
3 *p = 1;
4 p = NULL;
(a) Code example

WS(i) = {1}
WS(p) = {2,4}
WS(ref(p)) = {3}
(b) Write sets after static analysis

Line   ref(p)   WS(i)
1      N/A      {1}
2      i        {1,3}
3      i        {1,3}
4      NULL     {1}
(c) ref(p) and WS(i) during monitoring
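The run-time behaviour in table (c) can be reproduced with a small sketch. It is an illustration under simplifying assumptions, not the tool's code: StmtSet, Var, Ptr, and ptr_retarget are hypothetical names, statements are identified by line number, and at most one pointer refers to a variable at a time (a fuller version would need reference counts for aliases).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_STMTS 8  /* hypothetical bound for this sketch */

/* A small set of statement (line) numbers. */
typedef struct {
    int stmts[MAX_STMTS];
    size_t count;
} StmtSet;

static bool set_has(const StmtSet *s, int stmt) {
    for (size_t i = 0; i < s->count; i++) {
        if (s->stmts[i] == stmt) return true;
    }
    return false;
}

static void set_add(StmtSet *s, int stmt) {
    if (set_has(s, stmt)) return;
    assert(s->count < MAX_STMTS);
    s->stmts[s->count++] = stmt;
}

static void set_remove(StmtSet *s, int stmt) {
    for (size_t i = 0; i < s->count; i++) {
        if (s->stmts[i] == stmt) {
            s->count--;
            s->stmts[i] = s->stmts[s->count];  /* fill the gap with the last entry */
            return;
        }
    }
}

typedef struct {
    StmtSet own;      /* WS(v) from static analysis */
    StmtSet via_ptr;  /* statements inherited while a pointer refers to v */
} Var;

typedef struct {
    StmtSet ws_ref;   /* WS(ref(p)) from static analysis */
    Var *referent;    /* current ref(p); NULL if p refers to nothing tracked */
} Ptr;

/* Called when p is assigned: move WS(ref(p)) from the old referent to the new. */
static void ptr_retarget(Ptr *p, Var *new_referent) {
    if (p->referent != NULL) {
        for (size_t i = 0; i < p->ws_ref.count; i++)
            set_remove(&p->referent->via_ptr, p->ws_ref.stmts[i]);
    }
    p->referent = new_referent;
    if (new_referent != NULL) {
        for (size_t i = 0; i < p->ws_ref.count; i++)
            set_add(&new_referent->via_ptr, p->ws_ref.stmts[i]);
    }
}

/* Effective write set of v: its own statements plus those inherited via pointers. */
static bool var_allows(const Var *v, int stmt) {
    return set_has(&v->own, stmt) || set_has(&v->via_ptr, stmt);
}
```

Starting from WS(i) = {1} and WS(ref(p)) = {3}, calling ptr_retarget(&p, &i) for line 2 makes line 3 a legitimate writer of i, and calling ptr_retarget(&p, NULL) for line 4 removes it again. Keeping the inherited statements separate from the static ones means retargeting never deletes a statement that belongs to WS(i) in its own right.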
Chained Dereferences
• The technique above handles only simple dereferences
• Source code rewriting is used to convert all chained dereferences to simple dereferences
• Any other dereference that is not simple is converted in the same manner

Before rewriting:
int z;
int *y = &z;
int **x = &y;
**x = 10;

After rewriting:
int z;
int *y = &z;
int **x = &y;
int *temp = *x;
*temp = 10;
Output of Pre-debugging Phase
• Simplified program
  – Simplified pointer dereferences
  – Compiled with debugging options
• Input file for the debugger
  – Program variables and their write sets
  – Addresses of global symbols
  – Frame pointer offsets of local variables
  – Other flags that help the debugger
MemSherlock Architecture: Debugging
[Figure: MemSherlock architecture diagram with the debugging phase highlighted. Labeled components: Static Analyzer, Source Code Rewriting, Compiler, Debugging Agent. Labeled data: Original source files, Library specification, Program executable, Debugging information, Malicious input, Vulnerability information.]
Debugging: Dynamic Monitoring
• Runtime monitoring
  – State maintenance
  – Incorporates taint analysis from TaintCheck
    • Produces a dynamic slice of the program leading to the vulnerability
• Write checking
  – Monitors and validates memory writes
  – Write sets are file name and line number pairs <f,l>
    • Instruction pointer IP is translated into <f,l>
  – Write sets are associated with program variables
    • A destination address is translated into a program variable
Keeping Program State
• A given memory region may correspond to different program variables depending on program state
• Dynamic monitor keeps track of memory mapping
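[Figure: virtual address space in two program states, each drawn from the stack base. Program State 1 shows frames for main, fnc A, and fnc B; Program State 2 shows frames for main, fnc A, and fnc C. A memory write to the same address, 0xABABABAB, is marked in both states.]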
Debugging: Key Data Structures
• Keeps two lists of memory regions
  – ActiveMemoryRegions
    • Memory corresponding to program variables or their referent memory regions
  – NonWritableRegions
    • Saved registers, return addresses, metadata encapsulating dynamically allocated memory regions
Debugging: State Maintenance
• Function calls/returns (memory)
  – Local variable addresses are calculated and added to ActiveMemoryRegions
  – Locations of the return address and saved registers are added to the NonWritableRegions list
• Heap memory (memory)
  – malloc/free calls are intercepted
  – Allocated memory is added to ActiveMemoryRegions
  – The metadata encapsulating the buffer is added to NonWritableRegions
• Pointer value updates (write sets)
  – Searches ActiveMemoryRegions to find the referent and updates its WS
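The heap part of state maintenance can be sketched as two hooks that run when an allocation or a free is observed. Everything here is an assumption made for illustration: the Region and RegionList types, the hook names on_malloc and on_free, the fixed capacity, and in particular META_BYTES, which stands in for allocator metadata placed directly before the buffer (real allocators differ in size and layout).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 32  /* hypothetical capacity for this sketch */
#define META_BYTES  8   /* ASSUMED size of allocator metadata before a buffer */

typedef struct { uintptr_t start; size_t size; } Region;
typedef struct { Region items[MAX_REGIONS]; size_t count; } RegionList;

/* The two lists named on the slides, modeled as simple arrays. */
static RegionList active_regions;
static RegionList non_writable_regions;

static void region_add(RegionList *l, uintptr_t start, size_t size) {
    assert(l->count < MAX_REGIONS);
    l->items[l->count].start = start;
    l->items[l->count].size = size;
    l->count++;
}

/* Remove the region that begins at `start`, if present. */
static void region_remove(RegionList *l, uintptr_t start) {
    for (size_t i = 0; i < l->count; i++) {
        if (l->items[i].start == start) {
            l->count--;
            l->items[i] = l->items[l->count];
            return;
        }
    }
}

/* Returns 1 if `addr` lies inside any region of the list, else 0. */
static int region_contains(const RegionList *l, uintptr_t addr) {
    for (size_t i = 0; i < l->count; i++) {
        if (addr >= l->items[i].start && addr - l->items[i].start < l->items[i].size)
            return 1;
    }
    return 0;
}

/* Hook for an intercepted malloc that returned `buf` of `size` bytes. */
static void on_malloc(uintptr_t buf, size_t size) {
    region_add(&active_regions, buf, size);                           /* user data */
    region_add(&non_writable_regions, buf - META_BYTES, META_BYTES);  /* metadata */
}

/* Hook for an intercepted free of `buf`. */
static void on_free(uintptr_t buf) {
    region_remove(&active_regions, buf);
    region_remove(&non_writable_regions, buf - META_BYTES);
}
```

After on_malloc(0x1000, 64), addresses 0x1000 through 0x103F are active and the 8 bytes before the buffer are non-writable, so an overflow from a neighbouring buffer into that metadata would be caught regardless of which statement performed the write. Function calls and returns would follow the same pattern, with local variables going to the active list and the return address and saved registers going to the non-writable list.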
Debugging: Write Checking
• When instruction IP modifies memory m
  – If m is in ActiveMemoryRegions
    • determines the variable v it belongs to
    • converts IP into <f,l>
    • checks if <f,l> is in WS(v)
• If the memory write check fails or m is in NonWritableRegions
  – Marks the operation as a memory corruption
  – Displays the vulnerability information
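The write check described above can be sketched as a single function. This is a simplified illustration, not the Valgrind-based implementation: the types and the name check_write are hypothetical, the translation from IP to <f,l> is assumed to have happened already, and the WRITE_UNTRACKED result for addresses in neither list is an assumption, since the slides do not say how such writes are treated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A statement identified by file name and line number, i.e. <f,l>. */
typedef struct { const char *file; int line; } Stmt;

/* Hypothetical entry of ActiveMemoryRegions: a variable and its write set. */
typedef struct {
    const char *name;
    uintptr_t start;
    size_t size;
    const Stmt *ws;      /* WS(v) */
    size_t ws_count;
} ActiveRegion;

/* Hypothetical entry of NonWritableRegions. */
typedef struct { uintptr_t start; size_t size; } NonWritableRegion;

typedef enum { WRITE_OK, WRITE_CORRUPTION, WRITE_UNTRACKED } Verdict;

static int in_range(uintptr_t addr, uintptr_t start, size_t size) {
    return addr >= start && addr - start < size;
}

/* Classify a write to address `dest` performed by statement `at`. */
static Verdict check_write(uintptr_t dest, Stmt at,
                           const ActiveRegion *active, size_t n_active,
                           const NonWritableRegion *nonwritable, size_t n_nonwritable) {
    assert(active != NULL || n_active == 0);
    assert(nonwritable != NULL || n_nonwritable == 0);

    /* Any write into a non-writable region is a corruption. */
    for (size_t i = 0; i < n_nonwritable; i++) {
        if (in_range(dest, nonwritable[i].start, nonwritable[i].size))
            return WRITE_CORRUPTION;
    }
    /* Otherwise find the variable v and test whether <f,l> is in WS(v). */
    for (size_t i = 0; i < n_active; i++) {
        if (!in_range(dest, active[i].start, active[i].size)) continue;
        for (size_t j = 0; j < active[i].ws_count; j++) {
            if (active[i].ws[j].line == at.line &&
                strcmp(active[i].ws[j].file, at.file) == 0)
                return WRITE_OK;
        }
        return WRITE_CORRUPTION;
    }
    return WRITE_UNTRACKED;  /* ASSUMPTION: behaviour for unknown memory */
}
```

A caller would report the vulnerability information whenever the verdict is WRITE_CORRUPTION. Checking the non-writable list first reflects that those regions have no legitimate writers at the source level, so no write set lookup is needed for them.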
Generating Vulnerability Information
• The slice of the program contributing to the vulnerability
  – Statements that have propagated tainted values
  – Statements that have modified related memory regions
• Dependency between memory objects involved in the vulnerability
  – Points-to analysis shows memory regions and how they were accessed
• Program state
  – Call stack information
  – Write set information
Example Test Case: Null HTTP
http.c
 91: void ReadPOSTData(int sid) {
     …
100: conn[sid].PostData=calloc(conn[sid].dat->in_ContentLength+1024, sizeof(char));
101: if (conn[sid].PostData==NULL) { ...
107: do {
108: rc=recv(conn[sid].socket, pPostData, 1024, 0);
109: …

Error Report:
--20361-- Error type: Heap Buffer Overflow
--20361-- Dest Addr: 3AB3E360
--20361-- IP: 0x804E5C7: ReadPOSTData (http.c:108)
--20361-- Dest address resolved to:
--20361-- Global variable "heap var"
            @ 3AB3E280 (size: 224)
--20361--
--20361-- Memory allocated by 0x804E531:
            ReadPOSTData (http.c:100)
--20361-- TAINTED destination 3AB3E360
--20361-- Fully tainted from:
--20361-- 0x804E5C7: ReadPOSTData (http.c:108)
--20361--
--20361-- TAINTED size used during allocation
--20361-- Tainted from:
--20361-- 0x804E456: ReadPOSTData (http.c:100)
--20361-- 0x804FBB5: read_header (http.c:153)
--20361-- 0x805121B: sgets (server.c:211)
Vulnerability Analysis Example
http.c
 91: void ReadPOSTData(int sid) {
 92: char *pPostData;
     ...
100: conn[sid].PostData=calloc(conn[sid].dat->in_ContentLength+1024, sizeof(char));
     ...
107: do {
108: rc=recv(conn[sid].socket, pPostData, 1024, 0);
     ...
[Figure annotations: Heap Object, Create]
Vulnerability Analysis Example
http.c
119: int read_header(int sid) {
121: char line[2048];
     ...
127: do {
128: memset(line, 0, sizeof(line));
129: sgets(line, sizeof(line)-1, conn[sid].socket);
     ...
153: conn[sid].dat->in_ContentLength=atoi((char *)&line+16);
     ...
169: if (conn[sid].dat->in_ContentLength<MAX_POSTSIZE) {
170: ReadPOSTData(sid);

http.c: ReadPOSTData, lines 91 to 108 (same listing as on the previous slide)
[Figure annotations: Object, Use, Taint]
Vulnerability Analysis Example
http.c: read_header, lines 119 to 170 (same listing as on the previous slide)

server.c
202: int sgets(char *buffer, int max, int fd)
203: {
     ...
209: conn[sid].atime=time((time_t*)0);
210: while (n<max) {
211: if ((rc=recv(conn[sid].socket, buffer, 1, 0))<0) {
     ...
[Figure annotations: Object, Taint, Create]
Implementation
• Source code is rewritten using CIL (C Intermediate Language)
• CodeSurfer was used to extract program variables and their write sets
  – A commercial static analysis tool
• objdump and dwarfdump were used to extract global symbol information
• Dynamic monitoring is implemented in Valgrind
  – An open source emulator
Evaluation
• Tested 11 real-world applications with known memory corruption vulnerabilities
• Test cases included
  – Stack/heap buffer overflow, format string
  – Both control flow and non-control data attacks
• Testing methodology
  – Programs were run under MemSherlock
  – Exploit programs were used to attack the applications
  – Log and replay was not used
Evaluation Results
Application  Vuln. type  Description                                 Captured?  #FP
GHTTP        S           A small HTTP server                         Yes        7
Icecast      S           An mp3 broadcast server                     Yes        0
Sumus        S           A game server for ‘mus’                     Yes        0
Monit        S           Multi-purpose anomaly detector              Yes        0
Newspost     S           Automatic news posting                      Yes        2
Prozilla     S           A download accelerator for Linux            No         0
NullHTTP     H           An HTTP server                              Yes        0
Xtelnet      H           A telnet server                             Yes        4
Wsmp3        H           Web server with mp3 broadcasting            Yes        0
OpenVMPS     F           Open source VLAN management policy server   Yes        2
Power        F           UPS monitoring utility                      Yes        10
Type abbreviations: (S)tack overflow, (H)eap overflow, and (F)ormat string; #FP = number of false positives
False Negatives
• Prozilla:
  – memcpy uses a kernel function to manipulate page tables when copying entire pages
  – Valgrind cannot trace into the kernel
  – Can be prevented by function wrappers
• Other false negatives are theoretically possible
  – structs within unions or arrays
    • Current implementation does not support unions
    • Current implementation does not differentiate between elements of an array
  – Memory corruption errors inside DLLs
False Positives
• Embedded assembly
• Incomplete library specification
  – Library functions keeping internal state (e.g., strtok(NULL, delim))
  – Library functions that modify global variables as side effects (e.g., optarg, errno)
  – Pointers that point to hidden global structures (e.g., getdatetime() in time.h)
• struct pointers
  – void pointers that are type-cast to modify struct variables
  – Since the pointer is not of type struct, MemSherlock fails to update accordingly
Conclusion
• Fully automated vulnerability analysis
• The analysis output is intuitive and human readable
• Future challenges
  – Automated, long-term fix of vulnerabilities
    • Semantic consistency is a great challenge
  – Automated, temporary fix of vulnerabilities
    • Generating vulnerability condition
    • Improving signature generation
Thank You