BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) •...
Transcript of BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) •...
![Page 1: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/1.jpg)
BinSim: Trace-based Semantic Binary Diffing via System Call Sliced
Segment Equivalence Checking
Jiang Ming, Dongpeng Xu, Yufei Jiang, Dinghao Wu
![Page 2: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/2.jpg)
This Talk is About• A hybrid method to identify fine-grained relations between
obfuscated binary code.• Address “block-centric” limitation• Detect “conditionally equivalent” behaviors
2
BinSim
![Page 3: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/3.jpg)
Security-relevant Software Analysis for Binary Code
3
Source
datasize = getc(infile);....for (code = 0; code < clear; code++) {
prefix[code] = 0;suffix[code] = code;
}
Binary Code
01010010101010100101010010101010100101010101010101000100001000101001001001001000000010100010010101010010101001001010101001010101001010000110010101010111011001010101010101010100101010111110100101010101010101001010101010101010101010
Commercial-Off-The-Shelf
![Page 4: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/4.jpg)
Binary Difference Analysis (Binary Diffing)
4
• MS12-005: Object Packager Patch Diffing
![Page 5: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/5.jpg)
Security Applications of Binary Diffing• “1-day exploit” generation
5
• Software plagiarism detection
• Malware analysis
Source: http://www.bankinfosecurity.com/blogs/now-ransomware-goes-to-200-p-2287
Ransomware Family Count Surpasses 200
![Page 6: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/6.jpg)
Challenge: Code Obfuscation
6
Original binary
01010010101010100101010010101010100101010101010101000100001000101001001001001000000010100010010101010010101001001010101001010101001010000110010101010111011001010101010101010100101010111110100101010101010101001010101010101010101010
Suspicious binary
11010010101010100101010010101010100101010101010101000111111000101001001001001000000010011100010101010010101001001010101001010110001010000110010101010111011001010101010101011100101010111110100101010101010101001010101010101010100101
• Packer, polymorphism, metamorphism, and code virtualization
![Page 7: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/7.jpg)
Compare Runtime Execution Behavior
• Measure the similarities of program behavior features− E.g., system call sequences and system call dependency graphs− Limitation: many binary code differences do not reflect on the
behavior change; neglect conditionally equivalent behaviors
7
Conditional equivalent behaviors betweenTrojan-Spy.Win32.Zbot variants
![Page 8: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/8.jpg)
Basic Block Semantics Modeling• Represent the input-output relations as symbolic formulas
− Constraint solver, random sampling, and hashing• Resilient to moderate obfuscation within a basic block
− Register swapping, instruction reordering/substitution, and junk code• “Block-centric” limitation
8Constraint Solver HashingRandom sampling
![Page 9: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/9.jpg)
9
mov edx, ecxmov ebx, 0x000Aadd edx, ebxmov ebx, ecxsub ecx, eaxmov eax, ecxdec eaxand ecx, 0jmp 0x401922
lea eax, [ebx]mov edx, 0x000Anopnopadd eax, edxxchg eax, eaxnot ecxadd ecx, ebxlea edx, [ebx]xor ebx, ebxjmp 0x401A22
Binary block 1: Binary block 2:
Are they semantically equivalent?
![Page 10: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/10.jpg)
10
mov edx, ecxmov ebx, 0x000Aadd edx, ebxmov ebx, ecxsub ecx, eaxmov eax, ecxdec eaxand ecx, 0jmp 0x401922
lea eax, [ebx]mov edx, 0x000Anopnopadd eax, edxxchg eax, eaxnot ecxadd ecx, ebxlea edx, [ebx]xor ebx, ebxjmp 0x401A22
Symbolic inputs to basic block 1:ecx_0 = symbol1; eax_0 = symbol2;
Symbolic inputs to basic block 2:ebx_0 = symbol3; ecx_0 = symbol4;
Outputs:eax_2 = symbol1-symbol2-1; ebx_1 = symbol 1;ecx_2 = 0x0;edx_1 = symbol1 + 0xA
Outputs:ecx_2 = symbol3-symbol4-1; edx_1 = symbol 3;ebx_1 = 0x0;eax_3 = symbol4 + 0xA
Constraint solver
![Page 11: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/11.jpg)
Block-Centric Limitation (1)
11
![Page 12: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/12.jpg)
12
Block-centric binary diffing methods fail to match these three cases
![Page 13: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/13.jpg)
Block-Centric Limitation (2)• Compiler optimizations
− loop unrolling and function inline• Return-oriented programming obfuscation
− The chain of ROP gadgets will result in a set of small basic blocks• Different implementations of the same algorithm
− More examples in Hacker’s Delight• Control flow obfuscation
− opaque predicates and control flow flattening• Virtualization obfuscation
− The decode-dispatch loop generates a sequenceof basic blocks to interpret one x86 instruction.
13
![Page 14: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/14.jpg)
BinSim Overview
14
syscall (arg1) syscall (arg1)
• System Call Sliced Segments + Equivalence Checking
Equivalence Checking
![Page 15: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/15.jpg)
BinSim Advantages• Whether two matched system calls are conditional equivalent• System call sliced segments bypass the boundary of a basic
block− more likely to address “Block-centric” limitation
15
BinSim
![Page 16: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/16.jpg)
BinSim Core Method
16
1. System call alignment
2. Dynamic slicing and WP calculation
3. Cryptographic function detection
4. Equivalence checking
5. Cryptographic function approximate matching
![Page 17: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/17.jpg)
BinSim Architecture
17
![Page 18: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/18.jpg)
System Call Alignment• We rely on MalGene [CCS’15], an advanced bioinformatics-
inspired approach to perform system call sequence alignment• Remove system call noises
− multi-tag taint analysis− consider the parameter semantics− fake dependency: xor eax, eax; NtClose(eax)
18
![Page 19: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/19.jpg)
Dynamic Slicing Binary Code (1)• Dynamic slicing on the obfuscated binaries is never a
textbook problem− Indirect memory access: mov ebx, [4*eax+4]
− Implicit control transfers: COMVcc, SETcc, REP, LOOP, CMPXCHG
− Decode-dispatch loop of virtualization obfuscation
19Source: SHARIF, M., LANZI, A., GIFFIN, J., AND LEE, W. Automatic reverse engineering of malware emulators. (S&P’09)
![Page 20: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/20.jpg)
Dynamic Slicing Binary Code (2)• Our solution: split data dependencies and control
dependencies tracking− index and value based slicing to only trace data flow− tracking explicit and implicit control dependencies: eflags bit− exception: jecxz jumps if register ecx is zero− remove the fake control dependencies caused by virtualization
obfuscation code dispatcher: 5 3,163 109
20
![Page 21: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/21.jpg)
Handling Cryptographic Functions
21
• Cryptographic functions have been a known challenge to symbolic execution• Almost no interaction with system calls• Inspired by Caballero et al.’s work (CCS'10), we do a “stitched symbolic execution”
![Page 22: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/22.jpg)
Ground Truth Dataset
22
• Similarity scores change from right pairs to wrong pairs.
![Page 23: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/23.jpg)
Comparative Evaluation Results
23
![Page 24: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/24.jpg)
Analyzing Wild Malware Variants
24
• Collect 1,050 active malware samples from VirusShare on February 2017, 112 families −Perform intra-family pairwise comparison−BinSim are able to find small distances among intra-family samples.−11% of malware variants are “conditionally equivalent”−Example: A variant of CryptoWall terminate execution if infected machine’s UI languages are former Soviet Union country languages
![Page 25: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/25.jpg)
BinSim Limitations• BinSim only detects the similarities/differences exhibiting
during execution.− incomplete path coverage + environment-sensitive malware
• Attack BinSim’s enhanced slicing algorithm− slice size explosion
• Custom cryptographic algorithm − evade cryptographic function approximate matching
25
![Page 26: BinSim: Trace-based Semantic Binary Diffing via System ... · Dynamic Slicing Binary Code (2) • Our solution: split data dependencies and control dependencies tracking − index](https://reader035.fdocuments.us/reader035/viewer/2022071216/6047f87d8591c776910c0346/html5/thumbnails/26.jpg)
Q & A
26