Neural Reverse Engineering of Stripped Binaries · 2020. 6. 14. · Neural Reverse Engineering of...
Neural Reverse Engineering of Stripped Binaries
Yaniv David, Uri Alon, Eran Yahav
Technion, Israel
Reverse Engineering (RE) Binaries: What, Why & How?
RE – What & Why?
Malware?
Bug? find & fix it
RE – How? Disassemblers
call getaddrinfo
mov rax, [rbp-30h]
mov rdx, [rbp-50h]
mov rdx, cs:688588d
mov [rax], rdx
mov rax, [rbp-30h]
mov rdx, [rbp-580h]
mov [rax+8], rdx
mov rax, [rbp-30h]
call strerror
sub rdx, rax
idiv [rbp-28h]
call setsockopt
mov rdx, [rax]
mov eax, [rbx+40h]
cdqe
No Names
No Types
RE – How? Modern Disassemblers
Where to start?
Progress in Other Domains
Progress in the Source Code Domain
https://code2vec.org - code2vec: Learning Distributed Representations of Code
Un-Stripping Procedure Names
Start at the right place
Translate: Assembly Procedure → English
Sequence-To-Sequence (seq2seq) Models
• A basic approach:
  • LSTM encoder
  • LSTM decoder
• Example: "how are you" → "cómo estás"
• LSTM with attention & Transformers are state of the art for seq2seq tasks (machine translation, speech recognition, etc.)
Binary Syntax Is Very Local
call getaddrinfo
mov rax, [rbp-30h]
mov rdx, [rbp-50h]
mov rdx, cs:688588d
mov [rax], rdx
mov rax, [rbp-30h]
mov rdx, [rbp-580h]
mov [rax+8], rdx
mov rax, [rbp-30h]
call strerror
sub rdx, rax
idiv [rbp-28h]
call setsockopt
mov rdx, [rax]
mov eax, [rbx+40h]
cdqe
Global offsets are local to the executable
Register allocation is local to the instruction/basic block (BB)
Stack offsets are local to the procedure
Finding Prediction Anchors
call getaddrinfo
mov rdx, cs:qword_68858
mov rax, [rbp-30h]
mov rdx, [rbp-50h]
mov [rax], rdx
mov rax, [rbp-30h]
mov rdx, [rbp-580h]
mov [rax+8], rdx
mov rax, [rbp-30h]
call strerror
sub rdx, rax
idiv [rbp-28h]
call setsockopt
mov rdx, [rax]
mov eax, [rbp+var3C]
cdqe
Focus On Calls:
call getaddrinfo…
call strerror…
call setsockopt…
Not enough data and context
Combine binary program analysis with machine learning to find a sweet spot
Augmented Call Sites as Learning Features
Using API Calls
API calls:
…call getaddrinfo
…call strerror
…call setsockopt
Reconstructed API call sites (using calling conventions + library information):
…setsockopt(rdi,rsi,rdx,rcx,r8)
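The reconstruction step can be sketched roughly as follows. This is a minimal Python sketch, assuming the System V AMD64 calling convention (integer and pointer arguments passed in rdi, rsi, rdx, rcx, r8, r9) and a small hand-written arity table standing in for library information. The names ARG_REGISTERS, API_ARITY, and reconstruct_call_site are hypothetical and are not taken from the talk or from Nero.

```python
# Integer/pointer argument registers in the System V AMD64 calling convention.
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

# Hypothetical stand-in for library information: argument count per API.
API_ARITY = {"getaddrinfo": 4, "strerror": 1, "setsockopt": 5}


def reconstruct_call_site(instruction: str) -> str:
    """Turn 'call <api>' into '<api>(<arg registers>)' when the arity is known."""
    mnemonic, _, target = instruction.strip().partition(" ")
    target = target.strip()
    if mnemonic != "call" or target not in API_ARITY:
        return instruction.strip()  # leave every other instruction untouched
    args = ARG_REGISTERS[: API_ARITY[target]]
    return f"{target}({','.join(args)})"


if __name__ == "__main__":
    for ins in ["call getaddrinfo", "call strerror", "call setsockopt"]:
        print(reconstruct_call_site(ins))
```

For `call setsockopt` this yields the same shape as the slide, setsockopt(rdi,rsi,rdx,rcx,r8); the register names are still placeholders that a later step has to resolve to values.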
Augmenting Call Sites
setsockopt(rdi,rsi,rdx,rcx,r8)
call socket(...)
mov [rbp-58h], rax
mov rax, [rbp-58h]
mov rdi, rax
mov rsi, 1
mov r8, 4
In C: setsockopt(sock_var,…,1,4)
Augmenting Call Sites
Using concrete or abstracted values:
1. Concrete value (Integer, Enum, String)
2. ARG – procedure argument
3. GLOBAL – pointer to a global variable
4. RET – a return value from a call
5. STACK – pointer to stack memory
(Listed from most to least informative.)
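One way to read the list: when an argument may take several candidate values, keep the most informative one. The following is a minimal Python sketch under that reading, assuming the order of the list is the informativeness ranking and treating any token that is not one of the four abstract kinds as a concrete value. The function name, the ranking table, and the "∅" fallback for an empty candidate set are illustrative assumptions, not Nero's actual implementation.

```python
# Abstract kinds from the list, most informative first (assumed ranking).
ABSTRACT_RANK = {"ARG": 1, "GLOBAL": 2, "RET": 3, "STACK": 4}


def most_informative(candidates):
    """Pick one value from a candidate set; concrete values rank first (0)."""
    values = [c for c in candidates if c != "∅"]  # drop "unknown" entries
    if not values:
        return "∅"
    # Anything that is not an abstract kind is treated as a concrete value.
    # The value itself is the tie-breaker, so the result is deterministic.
    return min(values, key=lambda v: (ABSTRACT_RANK.get(v, 0), v))


if __name__ == "__main__":
    print(most_informative({"STACK", "ARG"}))
    print(most_informative({"RET", "4"}))
```

Under these assumptions a candidate set such as STACK | ARG collapses to ARG, and a set that contains a concrete value always collapses to that value.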
Pointer-Aware Slicing of Call Site Args
getaddrinfo(rdi,rsi,rdx,rcx)
[Figure: backward slice for the rdi argument: mov rdi, rax ← mov rax, [rbp-68h] ← mov [rbp-68h], rdi. Slice nodes carry V(·), P(·), and ∅ labels, e.g. V(rax), P([rax]), P([rbp-68h]), V(rbp), V(rdi), P([rdi]).]
Augmenting Call Site Arguments
getaddrinfo(rdi,rsi,rdx,rcx)
[Figure: the slice mov rdi, rax ← mov rax, [rbp-68h] ← mov [rbp-68h], rdi, with nodes labeled by abstract values: STACK, ARG, ARG | ∅, STACK | ARG, ARG | ∅.]
getaddrinfo(ARG,rsi,rdx,rcx)
[Figure: the slice ending at mov rdi, rax, with node labels STACK, ARG, ARG | ∅, STACK | ARG, ARG | ∅; the rdi argument of the call site is replaced by ARG.]
Augmented Control Flow Graph
[Figure: control flow graph of a procedure; its basic blocks contain, among other instructions, calls to socket, printf, setsockopt, close, and printf.]
setsockopt(RET,0,10,STK,4)
socket(2,1,0)
printf(GLOBAL,…)
close(…)
...
printf(GLOBAL,…)
Useful for training seq2seq or GNN models
...
Extracting Paths From the ACFG
Extract simple paths (no loops)
setsockopt(RET,0,10,STK,4)
socket(2,1,0)
printf(GLOBAL,…)
close(…) ...
printf(GLOBAL,…)
setsockopt(RET,1,2,STK,4)
getaddrinfo(ARG,ARG,STK,STK)
socket(…)
bind(…)
listen(…)
memset(STK,0,48)
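Path extraction can be illustrated with a short depth-first enumeration. This is a minimal Python sketch, assuming the ACFG is given as an adjacency dictionary and that a path ends once the current node has no successor that is not already on the path. The edge structure in the example is made up for illustration (only the node labels come from the slide), and simple_paths is a hypothetical name.

```python
def simple_paths(graph, entry):
    """Yield every loop-free path that starts at `entry`.

    A path is emitted when the current node has no successor that is not
    already on the path, so back edges are never followed twice.
    """
    stack = [(entry, [entry])]
    while stack:
        node, path = stack.pop()
        successors = [s for s in graph.get(node, []) if s not in path]
        if not successors:
            yield path
        for succ in successors:
            stack.append((succ, path + [succ]))


if __name__ == "__main__":
    # Hypothetical edges between the augmented call sites shown on the slide.
    acfg = {
        "memset(STK,0,48)": ["getaddrinfo(ARG,ARG,STK,STK)"],
        "getaddrinfo(ARG,ARG,STK,STK)": ["socket(…)"],
        "socket(…)": ["setsockopt(RET,1,2,STK,4)", "close(…)"],
        "setsockopt(RET,1,2,STK,4)": ["bind(…)"],
        "bind(…)": ["listen(…)"],
        "listen(…)": [],
        "close(…)": ["socket(…)"],  # back edge, skipped by the simple-path rule
    }
    for path in simple_paths(acfg, "memset(STK,0,48)"):
        print(" -> ".join(path))
```

The number of simple paths can grow exponentially with the number of branches, so a practical extractor would likely need a cap on path count or length; how that is handled is not stated on the slide.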
Our Approach: [Set-Of-Seq]-To-Seq
setsockopt(RET,1,2,STK,4)
getaddrinfo(ARG,ARG,STK,STK)
socket(…)
bind(…)
listen(…)
memset(STK,0,48)
Predicted name: create server socket
Evaluation
Implementation: Nero
Evaluation Corpus
GNU software repository
Remove duplications
67,246 labeled procedures
Strip / Strip & obfuscate APIs
8:1:1 package-based split
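A package-based split keeps all procedures of one package in the same set, so the model is not tested on code from a package it trained on. The following is a minimal Python sketch, assuming the 8:1:1 ratio is applied to the list of packages and that the three sets are training, validation, and test (the slide gives only the ratio). The function name, the set names, and the fixed shuffle seed are illustrative assumptions.

```python
import random
from collections import defaultdict


def package_based_split(procedures, ratios=(8, 1, 1), seed=0):
    """Split (package, procedure) pairs so that no package spans two sets."""
    by_package = defaultdict(list)
    for package, proc in procedures:
        by_package[package].append(proc)

    # Shuffle package names deterministically, then cut the list by ratio.
    packages = sorted(by_package)
    random.Random(seed).shuffle(packages)

    total = sum(ratios)
    n = len(packages)
    train_end = n * ratios[0] // total
    val_end = train_end + n * ratios[1] // total
    groups = {
        "train": packages[:train_end],
        "validation": packages[train_end:val_end],
        "test": packages[val_end:],
    }
    return {
        name: [(pkg, proc) for pkg in pkgs for proc in by_package[pkg]]
        for name, pkgs in groups.items()
    }
```

Because the cut is made over packages rather than procedures, the resulting procedure counts only approximate 8:1:1 when packages differ in size.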
Executable Obfuscation Types
• String encoding/encryption
• Code obfuscations (opaque predicates, etc.)
• Commercial (known) / Home-made packers
• Header manipulation => API calls not visible
Simulating Header Manipulation
• Zeroing ’.dynstr’ removes imported libraries & procedure names
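Evaluation Results

| Model | Stripped: Prec | Stripped: Rec | Stripped: F1 | Stripped & Obfuscated API Calls: Prec | Stripped & Obfuscated API Calls: Rec | Stripped & Obfuscated API Calls: F1 |
|---|---|---|---|---|---|---|
| LSTM-text | 22.32 | 21.16 | 21.72 | 15.46 | 14.00 | 14.70 |
| Transformer-text | 25.45 | 15.97 | 19.64 | 18.41 | 12.24 | 14.70 |
| Debin [He et al. 2018] | 34.86 | 32.54 | 33.66 | 32.10 | 28.76 | 30.09 |
| Nero-LSTM | 39.94 | 38.89 | 39.40 | 39.12 | 31.40 | 34.83 |
| Nero-Transformer | 41.54 | 38.64 | 40.04 | 36.50 | 32.25 | 34.24 |

[He et al. 2018]: "Debin: Predicting Debug Information in Stripped Binaries", CCS'18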
Ablation Study
| Components | Prec | Rec | F1 |
|---|---|---|---|
| Only Calls → LSTM | 23.45 | 24.56 | 24.04 |
| Augmented Call Sites → LSTM | 36.05 | 31.77 | 33.77 |
| Paths → Only Calls → LSTM | 29.84 | 24.08 | 26.65 |
| Paths → Augmented Call Sites → LSTM | 39.94 | 38.89 | 39.40 |
Prediction Examples
Model Prediction
Ground Truth read file check new watcher
get user groups
install signal handlers
Debin [He et al. 2018] bt open read index display signal setup
LSTM-text <unk> check opt close stdin <unk>
Transformer-text Ipmi disable coredump <unk> config file
ipmi regfree
Nero-LSTM vfs read file check file get ip groups install handlers
Nero-Transformer read file system list check state get user
groups install signal
Qualitative Evaluation

| Error Type | Package | Ground Truth | Predicted Name |
|---|---|---|---|
| Programmers vs. English Language | wget | i18n_initialize | i18n_init |
| | direvent | split_cfg_path | split_config_path |
| | gzip | add_env_opt | add_option |
| Data Structure Name Missing | gtypist | get_best_speed | get_list_item |
| | wget | ftp_parse_winnt_ls | parse_tree |
| | gzip | abort_gzip_signal | fatal_signal_handler |
| Verb Replaced | units | read_units | parse |
| | findutils | share_file_fopen | add_file |
| | mcsim | display_help | show_help |
Measured F1 is actually a lower bound
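A small worked example shows why near-synonymous predictions pull the score down. This is a minimal Python sketch, assuming precision, recall, and F1 are computed over the underscore-separated, lowercased subtokens of each name and that order and duplicates are ignored. The slides do not spell out the exact metric, so this scoring rule and the names subtokens and name_scores are assumptions.

```python
def subtokens(name):
    """Split a procedure name into lowercased, underscore-separated subtokens."""
    return [t for t in name.lower().split("_") if t]


def name_scores(ground_truth, predicted):
    """Return (precision, recall, F1) over the sets of name subtokens."""
    truth, pred = set(subtokens(ground_truth)), set(subtokens(predicted))
    true_positives = len(truth & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # "show" and "display" mean the same thing here, but only "help" matches.
    print(name_scores("display_help", "show_help"))  # (0.5, 0.5, 0.5)
```

Under this scoring, show_help against display_help earns an F1 of only 0.5 even though a human reader would accept the prediction, which is the sense in which the measured F1 understates quality.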
Takeaway Messages
Use Augmented Call Sites as Learning Features
setsockopt(rdi,rsi,rdx,rcx,r8)
call socket(...)
mov [rbp-58h], rax
mov rax, [rbp-58h]
mov rdi, rax
mov rsi, 1
mov r8, 4
In C: setsockopt(sock_var,…,1,4)
Translate: Assembly Procedure → English