Fabian Yamaguchi, University of Göttingen Markus Lottmann, Technische Universität Berlin Konrad...

Fabian Yamaguchi, University of Göttingen

Markus Lottmann, Technische Universität Berlin

Konrad Rieck, University of Göttingen

28th ACSAC (December, 2012)

Outstanding paper award

Generalized Vulnerability Extrapolation using Abstract Syntax Trees

A SEMINAR AT ADVANCED DEFENSE LAB 2

Outline

• Introduction

• Vulnerability Extrapolation

• Evaluation

• Limitations

2013/1/29

Introduction

• The discovery of vulnerabilities in source code is a central issue of computer security.

• Many existing approaches, however, are limited to specific conditions and types of vulnerabilities.

• The discovery of vulnerabilities in practice still mainly rests on tedious manual auditing that requires considerable time and expertise.

• Instead of striving for an automated solution, we aim at rendering manual auditing more effective by guiding the search for vulnerabilities.

Contributions

• Generalized vulnerability extrapolation

• Structural comparison of code

• Evaluation and case studies

Vulnerability Extrapolation

• The concept of vulnerability extrapolation builds on the observation that source code often contains several vulnerabilities linked to the same flawed programming patterns.

• Given a known vulnerability, it is thus often possible to discover previously unknown vulnerabilities by finding functions sharing similar code structure.

• Two advantages of this approach:

• It is a general approach that is not limited to any specific vulnerability type.

• The extrapolation does not hinge on any involved analysis machinery.

Schematic Overview

Robust AST Extraction

• Our parser is based on a single grammar definition for the ANTLR parser generator [23] and is publicly available. [link]

Figure legend: API node, syntax node

Embedding of ASTs in a Vector Space

• We describe the AST of each function in our code base using a set of subtrees S.

• We experiment with the following three definitions of the set:

• API nodes

• The set S simply consists of all individual API nodes.

• API subtrees

• The set S is defined as all subtrees of depth D in the code base that contain at least one API node.

• API/S subtrees

• The set S consists of all subtrees of depth D containing at least one API or syntax node.

• In the following we fix the depth of subtrees to D = 3.
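The three definitions of S can be illustrated with a small sketch. It is not the authors' implementation: the `Node` class, the `kind` tags, and the string serialization are hypothetical, and a "subtree of depth D" is assumed here to mean the tree rooted at a node, cut off after D levels.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set


@dataclass
class Node:
    """Toy AST node; `kind` is a hypothetical tag: "api", "syntax" or "other"."""
    label: str
    kind: str = "other"
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield every node of the tree in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def serialize(node: Node, depth: int) -> str:
    """Render the subtree rooted at `node`, truncated to `depth` levels."""
    if depth == 1 or not node.children:
        return node.label
    inner = ",".join(serialize(c, depth - 1) for c in node.children)
    return f"{node.label}({inner})"


def kinds_within(node: Node, depth: int) -> Set[str]:
    """Collect the node kinds found in the first `depth` levels below `node`."""
    kinds = {node.kind}
    if depth > 1:
        for child in node.children:
            kinds |= kinds_within(child, depth - 1)
    return kinds


def extract_features(root: Node, mode: str, depth: int = 3) -> List[str]:
    """Return the features of one function (with repeats) for a given mode.

    mode is "api_nodes", "api_subtrees" or "api_s_subtrees".
    """
    feats = []
    for node in walk(root):
        if mode == "api_nodes":
            if node.kind == "api":
                feats.append(node.label)
            continue
        wanted = {"api"} if mode == "api_subtrees" else {"api", "syntax"}
        if kinds_within(node, depth) & wanted:
            feats.append(serialize(node, depth))
    return feats


# Toy function body: a declaration, a call and a return statement.
tree = Node("func", "other", [
    Node("decl", "syntax", [Node("int", "api")]),
    Node("call", "syntax", [Node("malloc", "api"),
                            Node("arg", "other", [Node("len", "other")])]),
    Node("return", "syntax", [Node("x", "other")]),
])

api_nodes = extract_features(tree, "api_nodes")
api_subtrees = extract_features(tree, "api_subtrees")
api_s_subtrees = extract_features(tree, "api_s_subtrees")
```

In this toy tree the `return(x)` subtree contains a syntax node but no API node, so it appears only under the API/S definition.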

Converting ASTs to Vectors

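• M is a sparse |S| × |X| matrix with one row per subtree in S and one column per function (Function 1, Function 2, ..., Function |X|).

• $M_{s,x} = \#(s, x) \cdot w_s$, where #(s, x) is the number of occurrences of subtree s in function x.

• w_s: TF-IDF weighting [link]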
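As a rough illustration of how such a matrix could be filled, the sketch below counts subtree occurrences per function and multiplies them by a TF-IDF weight. The slide does not give the exact weighting formula, so w_s = log(|X| / df_s) is an assumption (one common variant); the function name `build_matrix` and the toy feature lists are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def build_matrix(
    features_by_function: Dict[str, List[str]],
) -> Tuple[List[str], List[str], Dict[str, float], List[List[float]]]:
    """Build M with one row per subtree and one column per function.

    M[s][x] = (occurrences of subtree s in function x) * w_s,
    where w_s = log(|X| / df_s) is an assumed TF-IDF variant.
    """
    functions = sorted(features_by_function)
    counts = {f: Counter(features_by_function[f]) for f in functions}
    subtrees = sorted({s for c in counts.values() for s in c})

    n = len(functions)
    # df_s: number of functions that contain subtree s at least once
    df = {s: sum(1 for f in functions if counts[f][s] > 0) for s in subtrees}
    weights = {s: math.log(n / df[s]) for s in subtrees}

    # Dense nested lists for readability; a real code base would need a sparse format.
    matrix = [[counts[f][s] * weights[s] for f in functions] for s in subtrees]
    return subtrees, functions, weights, matrix


# Hypothetical features for three functions.
features = {
    "parse_a": ["malloc", "memcpy", "memcpy"],
    "parse_b": ["malloc", "strlen"],
    "sched": ["malloc", "lock"],
}
subtrees, functions, weights, M = build_matrix(features)
```

Under this weighting, a subtree that occurs in every function (here `malloc`) gets weight 0 and therefore does not influence the comparison.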

Identification of Structural Patterns

• The vector representation lets us compare functions by their subtrees, but not yet with respect to more involved patterns.

• For example, the code base of a server application may contain functions related to network communication, message parsing and thread scheduling.

• It would be better to compare the functions with respect to these functionalities rather than looking at the plain subtrees of the ASTs.

• Latent semantic analysis is a classic technique of natural language processing (NLP) that is used for identifying topics in text documents. [link]

• It determines dominant directions in the vector space.

• We refer to these directions of related subtrees as structural patterns.

Obtaining Directions

• We obtain these d directions by performing a singular value decomposition (SVD) of the matrix M. [link]
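For reference, the truncated decomposition can be written as below. The shapes are an assumption inferred from the deck (M has one row per subtree and one column per function; U holds patterns in its columns; rows of V describe functions), not a formula quoted from the slides.

```latex
M \approx U \Sigma V^{T}, \qquad
U \in \mathbb{R}^{|S| \times d}, \quad
\Sigma \in \mathbb{R}^{d \times d}, \quad
V \in \mathbb{R}^{|X| \times d}
```

Here Σ is the diagonal matrix of the d largest singular values, each column of U is one structural pattern expressed over the subtrees, and each row of V describes one function in terms of the d patterns.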

Extrapolation of Vulnerabilities

• Three activities can be performed to assist code auditing.

• Vulnerability extrapolation

• Finding structurally similar functions is thus as simple as comparing the rows of V using a suitable measure, such as the cosine distance [link].

• Code base decomposition

• The matrix U, which stores the most prevalent structural patterns in its columns, gives important insight into the structure of the code base.

• Detection of unusual functions
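The extrapolation step can be sketched as a nearest-neighbour ranking over the rows of V. The helper names and the two-dimensional toy vectors are hypothetical; in practice each row would have d entries taken from the decomposition.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cos(u, v); zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def rank_similar(rows: Dict[str, Sequence[float]], query: str) -> List[str]:
    """Rank all other functions by cosine distance to the query function."""
    others = [name for name in rows if name != query]
    return sorted(others, key=lambda name: cosine_distance(rows[query], rows[name]))


# Hypothetical rows of V: one low-dimensional vector per function.
v_rows = {
    "known_vuln": [0.9, 0.1],
    "candidate": [0.8, 0.2],
    "unrelated": [0.0, 1.0],
}
ranking = rank_similar(v_rows, "known_vuln")
```

An auditor would then read the ranked list from the top, starting with the functions closest to the known vulnerable one.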

Evaluation

• For the evaluation, we consider four popular open-source projects.

• LibTIFF [link] is a library for processing images in the TIFF format.

• 1,292 functions and 52,650 lines of code

• Version 3.8.1 of the library contains a stack-based buffer overflow in the parsing of TLV elements (CVE-2006-3459 [link]).

• Candidate functions are all parsers for TLV elements.

Evaluation (cont.)

• Pidgin [link] is a client for instant messaging implementing several communication protocols.

• 11,505 functions and 272,866 lines of code.

• Version 2.10.0 of the client contains a vulnerability in the implementation of the AIM protocol (CVE-2011-4601 [link]).

• Candidate functions are all AIM protocol handlers converting incoming binary messages to strings.

Evaluation (cont.)

• FFmpeg [link] is a library for conversion of audio and video streams.

• 6,941 functions with a total of 298,723 lines of code

• During the decoding of video frames in version 0.6, indices are incorrectly computed (CVE-2010-3429 [link]).

• Candidate functions are all video decoding routines, which write decoded video frames to a pixel buffer.

Evaluation (cont.)

• Asterisk [link] is a framework for Voice-over-IP communication.

• 8,155 functions and 283,883 lines of code

• Version 1.6.1.0 of the framework contains a vulnerability (CVE-2011-2529 [link]), which allows a remote attacker to corrupt memory of the server via a crafted packet.

• Candidate functions are all functions reading incoming packets from UDP/TCP sockets.

• We thoroughly inspect each code base and manually label all candidate functions, that is, all functions that potentially contain the same vulnerability.

• This manual analysis process required several weeks of work.

Quantitative Evaluation

• The number of extracted structural patterns is not a critical parameter for vulnerability extrapolation.

• In the following case studies, we fix this parameter to 70.

Quantitative Evaluation (cont.)

Qualitative Evaluation (Case Study)

• In a case study with FFmpeg and Pidgin, we now demonstrate the practical merit of vulnerability extrapolation and show how our method plays the key role in identifying 8 zero-day vulnerabilities.

• We have conducted two further studies with Pidgin and Asterisk uncovering 2 more zero-day vulnerabilities.

• For the sake of brevity, however, we omit these case studies here.

Case Study: FFmpeg

• CVE-2010-3429

• 3 further vulnerabilities

• 2 of which were zero-day

Case Study: FFmpeg

Case Study: Pidgin

• CVE-2011-4601

• 9 further vulnerabilities

• 6 of which were zero-day

Case Study: Pidgin

Limitations

• The method only identifies potentially vulnerable code.

• Due to Rice’s theorem [link], however, a generic discovery of vulnerabilities is impossible anyway.

• The method requires a known starting vulnerability.

• Complex flaws that span several functions across a code base can be difficult to detect for our method.

Q & A