Post on 31-Mar-2018
Control Flow Hijacking
Contains bug in PDF
parser
Control of viewer can be
hijacked
malicious.pdf
Foundations of Cybersecurity 2016
Control Flow Hijacking – Principles
Foundations of Cybersecurity 2016
Normal Control Flow(depicted as Control Flow Graph – CFG)
Start
User
Action
Open
File
Delete
File
Display
Content
Control Flow Hijacking – Principles
Foundations of Cybersecurity 2016
Attacked Control Flow(depicted as Control Flow Graph – CFG)
Start
User
Action
Open
File
Delete
File
Display
Content
J^@!!##’_±
Control Flow Hijacking Attacks
Attacker’s goal
• Take over target machine (e.g. web server)
• Execute arbitrary code on target by
hijacking application control flow
Foundations of Cybersecurity 2016
Control Flow Hijacking Attacks
Attack Pattern is Always Similar
Foundations of Cybersecurity 2016
Find bug in
program
Create code to exploit bug and control
program
Feed vulnerable program
with exploit
Hijacked
Common Types of Control Flow Attacks
Foundations of Cybersecurity 2016
Discover which
assumptions are made
Craft exploit outside
assumptions
Assumptions are vulnerabilities
Example: Coffee shop, handing out ordered drinks
- Assumptions:
• Only one Bill in shop right now?
• Legitimate Bill picks up? (ID check?)
• Name correctly understood/pronounced by staff?
- Exploit: Pretend to be Bill, be faster than original Bill
Computer program: What we commonly assume
We write our code in languages that offer several layers of abstraction over machine code; even C
- High-level statements: “=” (assign), “;” (seq), if, while, for, etc.
- Procedures / functions
Naturally, our execution model assumes:
- Basic statements (e.g. assignment) are atomic
- Only one of the branches of an if statement can be taken
- Functions start at the beginning
- They (typically) execute from beginning to end
- And, when done, they return to their call site
- Only the code in the program can be executed
- The set of executable instructions is limited to those output during compilation of the program
Foundations of Cybersecurity 2016
Computer program: The Ugly Truth
We write our code in languages that offer several layers of abstraction over machine code; even C
- High-level statements: “=” (assign), “;” (seq), if, while, for, etc.
- Procedures / functions
But, actually, at the level of machine code
- Basic statements (e.g. assignment) are atomic
- Only one of the branches of an if statement can be taken
- Functions start at the beginning
- They (typically) execute from beginning to end
- And, when done, they return to their call site
- Only the code in the program can be executed
- The set of executable instructions is limited to those output during compilation of the program
Foundations of Cybersecurity 2016
- Each basic statement compiled down to many instructions
- There is no restriction on the target of a jump
- Can start executing in the middle of functions
- A fragment of a function may be executed
- Returns can go to any program instruction
- Dead code (e.g. unused library functions) can be executed
- On the x86, can start executing not only in the middle of functions, but in the middle of instructions!
Computer program: The Ugly Truth
We write our code in languages that offer several layers of abstraction over machine code; even C
- High-level statements: “=” (assign), “;” (seq), if, while, for, etc.
- Procedures / functions
But, actually, at the level of machine code
- Each basic statement compiled down to many instructions
- There is no restriction on the target of a jump
- Can start executing in the middle of functions
- A fragment of a function may be executed
- Returns can go to any program instruction
- Dead code (e.g. unused library functions) can be executed
- On the x86, can start executing not only in the middle of functions, but in the middle of instructions!
Foundations of Cybersecurity 2016
Common Types of Control Flow Attacks
Foundations of Cybersecurity 2016
Discover which
assumptions are made
Craft exploit outside
assumptions
Assumptions are vulnerabilities
Two assumptions often exploited:
- Target buffer is large enough for source data
• Buffer overflows deliberately break this assumption
- Computer integers behave like math integers
• Integer overflows violate this assumption
Hijacked
Control Flow Hijacking Attacks
There are many attacks that execute code on an attacker’s behalf
Buffer Overflows
Integer Overflows
Return-Oriented Programming
Format String Vulnerabilities
Use after Free
Foundations of Cybersecurity 2016
This lecture
Buffer Overflow – History
1972 : First described – no documented use
1988 : “Morris Worm” hits the wildexploited a buffer overflow in the Unix service finger
... Many, many since then...
Foundations of Cybersecurity 2016
6,203
Buffer Overflow
Why is an assumption about data length dangerous?
Computer programs organise internal data values in variablesstored in memory
Foundations of Cybersecurity 2016
PC Memory
Program “Car configurator” Another program...
Colour PriceNav EmailVariables: Red Yes backes@mpi-sws.org 19750
Reserved 50characters for email
looooooooooooooooooongemail
Memory for “email” variable
(size: 50 characters)
Memory for “price” variable
Buffer Overflow and Changing Variable Values
The car configurator example allows us to overwrite the associated price using a very long email address
Foundations of Cybersecurity 2016
[ ][ ][ ][ ][ ][ ][ ][ ][ ][ ]
51 characters
19750
looooooooooooo...oooooongemail7
l o … o n g m a i l 7
Car for 7 €
1 2 50494847464544
Buffer Overflow and Control Flow Hijacking
This technique can also be used to overwrite program control informationthat programs internally store:
- Where they came from when calling a sub-routine
- Local variables that are only used within a sub-routine
This data is organized in a memory region called the stack
Foundations of Cybersecurity 2016
C
y
S
e
c
C
y
S
e
c
C
y
S
e
c
C
y
S
e
push(c) pop() pop()
Stack operations
C
y
S
e
push(e)
ABCD
Control Flow and the Stack
Calling a simple program on shell:$ prog ABCD
Foundations of Cybersecurity 2016
void dangerous()
{
fprintf(stdout, “Hidden functionality!\n");
}
int bar(char *arg, char *out)
{
strcpy(out, arg);
return 0;
}
int foo(char *argv[])
{
char buf[128];
bar(argv[1], buf);
}
int main(int argc, char *argv[])
{
foo(argv);
return 0;
}
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
Entry
point
Return address: 20
Address argv
Memory for buf
(128 bytes)
Address argv[1]
Address of buf
Return address: 15
Source code of prog Stack during execution (simplified)Control flow
Direction the
buffer is filled
ABCD
Control Flow and the Stack
Calling a simple program on shell:$ prog ABCD
Foundations of Cybersecurity 2016
void dangerous()
{
fprintf(stdout, “Hidden functionality!\n");
}
int bar(char *arg, char *out)
{
strcpy(out, arg);
return 0;
}
int foo(char *argv[])
{
char buf[128];
bar(argv[1], buf);
}
int main(int argc, char *argv[])
{
foo(argv);
return 0;
}
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
Return address: 20
Address argv
Memory for buf
(128 bytes)
Address argv[1]
Address of buf
Return address: 15
Source code of prog Stack during execution (simplified)
Exit
Current
instruction
Control flow
AAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAA
Buffer Overflow and Control Flow Hijacking
Calling a simple program on shell:$ prog AAAAAAAAAAAAAAAAA...AAAAAAAAAA01
Foundations of Cybersecurity 2016
void dangerous()
{
fprintf(stdout, “Hidden functionality!\n");
}
int bar(char *arg, char *out)
{
strcpy(out, arg);
return 0;
}
int foo(char *argv[])
{
char buf[128];
bar(argv[1], buf);
}
int main(int argc, char *argv[])
{
foo(argv);
return 0;
}
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
Entry
point
Return address: 20
Address argv
Memory for buf
(128 bytes)
Address argv[1]
Address of buf
Return address: 15
Source code of prog Stack during execution (simplified)Control flow
Return address: 01
128 bytes
Direction the
buffer is filled
AAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAA
Buffer Overflow and Control Flow Hijacking
Calling a simple program on shell:$ prog AAAAAAAAAAAAAAAAA...AAAAAAAAAA01
Foundations of Cybersecurity 2016
void dangerous()
{
fprintf(stdout, “Hidden functionality!\n");
}
int bar(char *arg, char *out)
{
strcpy(out, arg);
return 0;
}
int foo(char *argv[])
{
char buf[128];
bar(argv[1], buf);
}
int main(int argc, char *argv[])
{
foo(argv);
return 0;
}
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
Return address: 20
Address argv
Memory for buf
(128 bytes)
Address argv[1]
Address of buf
Return address: 15
Source code of prog Stack during execution (simplified)Control flow
Return address: 01
128 bytes
Control flowhijacked!
Current
instruction
Buffer Overflow
We have just created our first Control Flow Hijacking attack!
Foundations of Cybersecurity 2016
Control Flow Hijacking Attack
What can we do with an overwritten return address?
Foundations of Cybersecurity 2016
Everything!
• Call an arbitrary address in memory
Call arbitrary functions or only parts of functions
• Call our own code by pointing the return address to the stack that
we have overwritten
\x90\x90\x90\x90
\x90\x90\x90\x90
…
\xeb\x1f\x5e\x89
\x76\x08\x31\xc0
Executing shell code
$ prog \x90\x90…\x90\xeb\x1f\x5e…0x01234
Foundations of Cybersecurity 2016
void dangerous()
{
fprintf(stdout, “Hidden functionality!\n");
}
int bar(char *arg, char *out)
{
strcpy(out, arg);
return 0;
}
int foo(char *argv[])
{
char buf[128];
bar(argv[1], buf);
}
int main(int argc, char *argv[])
{
foo(argv);
return 0;
}
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
Return address: 20
Address argv
Address argv[1]
Address of buf
Return address: 15
Source code of prog Stack during execution (simplified)Control flow
Return address: 0x1234
128 bytes
Executeshellcode!
“NOP Slide”
(Account for stack changes) Shellcode Address of buf
Current
instruction Address:0x1234
After pop():Shellcode still present in memory at old address of buf!
Reasons for Buffer Overflows
Foundations of Cybersecurity 2016
The level of abstraction of a programming language
correlates to how vulnerable it is to buffer overflows.
Assembler
High-LevelLanguages
2GL
3GL
Advanced
3GL
2GL=2nd Generation Language, 3GL=3rd Generation Language
mov r8, r9
add r8, $42
test r8
jne 0x48aac1
Directly interpreted by the CPU.
Must be
written for a
particular
CPU
int main(int argc, char*
argv[])
Translated to
assembler
Intel
ARM
MIPS
C
C++ObjC
Object o = new Map<int, String>;
o.getIterator();
compiled at
run-timeC#
JavaPython
PROGRAMMER MUST ORGANISE
BUFFER LENGTHS AND VARIABLE
LAYOUT
LANGUAGE ORGANISES AND IS SAFE
Finding Buffer Overflows
Finding Buffer Overflows is useful
- for programmers to test their software for vulnerabilities
- for attackers to ...
Methods (white box vs. black box)
- Source Code Analysis (white box)
• Manual code analysis
• Automatic Tools
- Fuzzing (black box)
• Input massive amounts of random data to test whether program crashes
Foundations of Cybersecurity 2016
Source Code Analysis
Manually or automatically scan source for fixed-length buffers that can be overwritten by user input
Only necessary for lower 3GL languages (C, C++, ObjC, ...)
Fixes:
- Either: allocate dynamic buffer as neededchar* buf = malloc(length);
- Or: check length when touching buffersstrncpy() instead of strcpy()
Foundations of Cybersecurity 2016
Fuzzing
Foundations of Cybersecurity 2016
Fuzzing feeds unexpected input to a program and makes it explore corner cases in the hopes that it eventually crashes
Crash dump (=memory content at time of crash) is analyzed to guess the cause of the crash
Fuzzing is no exact science
Very helpful to start
Fuzzing
Simple Fuzzing
- Run target app on local machine
- Feed input with long string “$$$$$$$$$$$$$$$$$$$$....”
- If app crashes, search crash dump for “$$$$$” to find overflow location
Foundations of Cybersecurity 2016
Fuzzing
Advanced Fuzzing
- Run target app on local machine
- Feed input with long string in the form of a certain pattern where each character uniquely describes its positionAa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0A...
- If app crashes, check return addresses on the stack and calculate which position of the generated string that isAa0 = 0, Aa1 = 3, Aa2 = 6, Aa3 = 9, ...
- Now we know the exact buffer length and the position of the return address to overwrite
Foundations of Cybersecurity 2016