Eliminating Bugs In MVC-Style Web Applications

Tevfik Bultan, Verification Lab (Vlab), CS@UCSB

http://www.cs.ucsb.edu/~vlab

Joint work with my students & postdocs

• Fang Yu
• Muath Alkhalaf
• Jaideep Nijjar
• Sylvain Halle

Web software everywhere

• Commerce, entertainment, social interaction

• We will rely on web apps more in the future

• Web apps + cloud will make desktop apps obsolete

One major road block

• Web applications are not trustworthy

– They are notorious for security vulnerabilities and unreliable behavior

• Web applications are hard to program

– Distributed behavior, interaction among many components and many languages

• Web applications are hard to test

– Highly dynamic behavior and concurrency

• Testing accounts for 50% of development costs, and software failures cost the USA $60 billion annually

Why are web applications error prone?

• Script-oriented programming
  – A collection of scripts with no explicit control flow
• Extensive string manipulation
  – User input comes in string form, output (HTML, SQL) is generated via string manipulation
• Complex interactions
  – User interaction via browsers, client-side/server-side interaction, interaction with the backend database
• Distributed system
  – Many components have to coordinate

VLab Research Agenda

We have a hammer

Automated Verification

Actually, we have several hammers

Automated Verification Techniques

• Bounded verification via SAT solvers
• Symbolic execution via SMT solvers
• String analysis
• Model checking

Unfortunately, life is complicated

Automated verification techniques

Web application problems

Automated verification is hard

• If it wasn’t, everybody would be using it

• It is hard because software systems are too complex

• In order to make automated verification feasible
  – we need to focus our attention

Making automated verification feasible

• We focus our attention by
  – Abstraction
    • Hide details that do not involve the things we are checking
  – Modularity
    • We focus on one part of the system at a time
  – Separation of concerns
    • We focus on one property at a time
• It turns out these are also the main principles of software design
  – We should be able to exploit the design structure in automated verification!

Separation of concerns

• First, we need to identify our concerns
  – What should we be concerned with if we want to eliminate the bugs in web applications?
• Here are some concerns:
  – Input validation
    • Errors in input validation are a major cause of security vulnerabilities in web applications
  – Navigation
    • Many web applications do not handle unexpected user requests appropriately
  – Data model
    • Integrity of the data model is essential for correctness of web applications

Why care about input validation?

String related web application vulnerabilities as a percentage of all vulnerabilities (reported by CVE)

OWASP Top 10 in 2007:
1. Cross Site Scripting
2. Injection Flaws

OWASP Top 10 in 2010:
1. Injection Flaws
2. Cross Site Scripting

Why care about navigation?

Why care about data model?

• The data in the back-end database is the most important resource for most web applications

• Integrity and consistency of this data is crucial for dependability of the web application

Separation of concerns

• By focusing on one problem at a time:
  – Input validation errors
  – Navigation errors
  – Data modeling errors
  we are more likely to develop feasible verification techniques

Still, we need more help!

Modularity: Model-View-Controller

• Model-view-controller (MVC) architecture has become the standard way to structure web applications
  – Ruby on Rails, Zend Framework for PHP, CakePHP, Spring Framework for Java, Struts Framework for Java, Django for Python, …
• MVC architecture separates
  – where and how the data is stored (the model)
  – from how it is presented (the views)

Model-View-Controller (MVC) architecture

• MVC consists of three modules
  – Model represents the data
  – View is its presentation to the user
  – Controller defines the way the application reacts to user input

[Figure: one model (a = 50%, b = 30%, c = 20%) presented through several views]

MVC framework for web applications

• Model:
  – An abstract representation of the data stored in the backend database
  – Uses an object-relational mapping to map the application class structure to the back-end database tables
• Views:
  – Responsible for rendering of the web pages, i.e., how the data is presented in the user’s browser
• Controllers:
  – Event handlers that process incoming user requests
  – They can update the data model, and create a new view to be presented to the user

Exploiting MVC for verification

• Modularity provided by MVC can be exploited in verification

• For checking navigation properties
  – Focus on controllers
• For checking input validation
  – Focus on controllers
• For checking data model properties
  – Focus on the model

Still, need more help

• Even with
  – the separation of concerns
    • check one type of property at a time (navigation, input validation, data model)
  – and modularity
    • focus on only one component at a time (controller or the model)
  we still need more help!
• To completely automate the verification process
  – We need to find a way to map the code written in a programming language to something that an automated verification tool can check

We have a verification model problem!

How to get verification models?

• The verification model is an abstraction of the behavior of the application

• We obtain this abstraction using two basic approaches:

1. When possible we automatically extract a verification model from the code using automated abstraction + static analysis

• Automated model extraction

2. If we cannot automatically extract the model, we ask the developer to write a model. In this case we have to enforce the specified model at runtime

• Automated model enforcement

Three verification techniques

1. Concern: Input validation checking

– Module: Controller

– Verification model extraction via abstraction: Dependency analysis to extract dependency graph

– Verification technique: String analysis

2. Concern: Navigation correctness

– Module: Controller

– Verification model specification by developer: Automated enforcement of navigation model at runtime

– Verification technique: Model checking

3. Concern: Data model integrity

– Module: Data model

– Verification model extraction via abstraction:

– Verification technique: Bounded SAT based verification

Making it work

[Diagram: separation of concerns + modularity + abstraction link each verification technique to a concern]
• String analysis: input validation
• Model checking: navigation correctness
• Bounded SAT based verification: data integrity

Input validation verification

[Diagram: input validation verification pipeline]
• Inputs: Application/Scripts, Attack Patterns
• Parser/Taint Analysis produces (Tainted) Dependency Graphs
• Vulnerability Analysis produces Reachable Attack Strings
• Signature Generation produces a Vulnerability Signature
• Patch Synthesis produces Sanitization Statements
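As a rough illustration of the taint analysis stage, the sketch below propagates taint through a small dependency graph by breadth-first search. It is not the authors' implementation: the graph, the node names and the functions `tainted_nodes` and `tainted_sinks` are all hypothetical, chosen to resemble a script that reads a request parameter, filters it, and echoes it next to a constant label.

```python
from collections import deque

# Hypothetical dependency graph: an edge u -> v means that the value of v
# is computed from the value of u.
EDGES = {
    "request_param": ["raw_value"],
    "raw_value": ["filtered_value"],
    "label_constant": ["label"],
    "filtered_value": ["echo"],
    "label": ["echo"],
}
SOURCES = {"request_param"}   # values controlled by the user
SINKS = {"echo"}              # statements that write to the page

def tainted_nodes(edges, sources):
    """Return every node reachable from a user-controlled source."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for successor in edges.get(node, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen

def tainted_sinks(edges, sources, sinks):
    """Return the sinks that depend on user input and need string analysis."""
    return tainted_nodes(edges, sources) & set(sinks)

if __name__ == "__main__":
    print(tainted_sinks(EDGES, SOURCES, SINKS))
```

Only the sinks returned by `tainted_sinks` would need the more expensive vulnerability analysis; the constant label is never marked as tainted.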

An example XSS vulnerability

A PHP Example:

1: <?php
2: $www = $_GET["www"];
3: $l_otherinfo = "URL";
4: $www = preg_replace("[^A-Za-z0-9 .-@://]","",$www);
5: echo "<td>" . $l_otherinfo . ": " . $www . "</td>";
6: ?>

• Line 4 removes all characters that are not in {A-Za-z0-9 .-@:/}
• However, “.-@” denotes all characters between “.” and “@” (including “<” and “>”)
• Our string analysis automatically detects that there is an XSS vulnerability in line 5

Example: the input <!sc+ri++pt! ... > leaves line 4 as <script ... >
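The range problem can be reproduced outside PHP. The following Python sketch is only an illustration and not part of the authors' tool: it applies the same character class with Python's `re` module (the name `sanitize` is made up for this example) and shows that "<" and ">" survive the filter while "!" and "+" are stripped.

```python
import re

# Same character class as line 4 of the PHP example. Inside the class,
# ".-@" is a range from "." (code 0x2E) to "@" (code 0x40), and both
# "<" (0x3C) and ">" (0x3E) fall inside that range.
DISALLOWED = re.compile(r"[^A-Za-z0-9 .-@:/]")

def sanitize(value: str) -> str:
    """Remove every character that the character class does not allow."""
    return DISALLOWED.sub("", value)

if __name__ == "__main__":
    print(sanitize("<!sc+ri++pt!>"))
    print(ord("."), ord("<"), ord(">"), ord("@"))
```

Escaping the hyphen or moving it to the end of the class would turn the range back into three literal characters, which is presumably what the original filter intended.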

Automatically generated patch

• Our string analysis generates a vulnerability signature that characterizes all possible input strings that can exploit the vulnerability

• Our patch generation algorithm automatically generates a patch that removes the vulnerability

1: <?php
P: if (preg_match('/[^ <]*<.*/', $_GET["www"]))
P:     $_GET["www"] = preg_replace('/</', "", $_GET["www"]);
2: $www = $_GET["www"];
3: $l_otherinfo = "URL";
4: $www = preg_replace("[^A-Za-z0-9 .-@://]","",$www);
5: echo "<td>" . $l_otherinfo . ": " . $www . "</td>";
6: ?>
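The effect of the patch can be checked with a small Python sketch. It is a simplified model under stated assumptions, not the generated patch itself: the signature is approximated as "any input containing <", the attack pattern as the substring "<script", and the function names `render` and `render_patched` are hypothetical.

```python
import re

SANITIZER = re.compile(r"[^A-Za-z0-9 .-@:/]")    # the original, faulty filter
SIGNATURE = re.compile(r"[^<]*<.*", re.DOTALL)   # assumed signature: input contains "<"
ATTACK = re.compile(r"<script")                  # attack pattern: any string containing "<script"

def render(www: str) -> str:
    """Model of lines 2 to 5: filter the input and embed it in the page."""
    www = SANITIZER.sub("", www)
    return "<td>URL: " + www + "</td>"

def render_patched(www: str) -> str:
    """Model of the patched script: strip "<" when the signature matches."""
    if SIGNATURE.fullmatch(www):
        www = www.replace("<", "")
    return render(www)

if __name__ == "__main__":
    attack = "<!sc+ri++pt!>"
    print(bool(ATTACK.search(render(attack))))          # vulnerable version
    print(bool(ATTACK.search(render_patched(attack))))  # patched version
```

The patch runs before the original filter, so inputs that do not match the signature are handled exactly as before.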

Experiments

We applied our analysis to three open source PHP applications
• Webchess 0.9.0 (a server for playing chess over the internet)
• EVE 1.0 (a tracker for player activity for an online game)
• Faqforge 1.3.2 (a document management tool)

• Attack patterns:
  – Σ∗<scriptΣ∗ (for XSS)
  – Σ∗ or 1=1 Σ∗ (for SQLI)

    Application       # Files   LOC    # XSS Sinks   # SQLI Sinks
1   Webchess 0.9.0    23        3375   421           140
2   EVE 1.0           8         906    114           17
3   Faqforge 1.3.2    10        534    375           133

Verification Results

                                       Time (s)                              Memory (KB)
     Type   # Vulnerabilities          total    fwd    bwd     relational    average
            (single, 2, 3, 4)
1.   XSS    (24, 3, 0, 0)              46.08    1.73   0.92    6.30          16850
     SQL    (43, 3, 1, 2)              110.7    4.87   12.04   38.03         136790
2.   XSS    (0, 0, 8, 0)               288.50   6.80   -       127.80        125382
     SQL    (8, 3, 0, 0)               23.9     1.5    8.47    5.2           17280
3.   XSS    (20, 0, 0, 0)              7.87     0.22   0.22    -             9948
     SQL    (0, 0, 0, 0)               6.7      -      -       -             <1

(single, 2, 3, 4) indicates the number of detected vulnerabilities that have one, two, three, and four inputs, respectively

Navigation verification

• We ask developer to specify the high level navigation behavior as a navigation state machine

• MVC frameworks typically use a hierarchical structure where actions are combined in the controllers and controllers are grouped into modules
  – We exploit this hierarchy to specify the navigation state machines as hierarchical state machines

Navigation verification

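[Diagram: navigation verification workflow]
• Static verification: the Navigation State Machine (NSM) and the navigation properties go through a Translator to SMV, which reports either "Verified" or a counter-example behavior
• Runtime enforcement: the NSM Plugin (NSM Interpreter) enforces the Navigation State Machine in the running application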

Navigation verification experiments

• We studied three real-world, freely available web applications:

  – BambooInvoice: invoice management application
    • 159,000 lines of code, 60 actions
  – Capstone: student project management system
    • 41,000 lines of code, 33 actions
  – Digitalus: content management system
    • 401,000 lines of code, 26 actions

Specifying the NSM

• We specified the NSMs manually, by exploring potential error sequences in applications

– Amount of effort: Half a day per application (including taking screenshots, drawing the graph, etc.)

Application      States   Transitions   Variables
Digitalus        32       48            7
BambooInvoice    63       80            8
Capstone         8        16            1

Model checking NSMs

• We used an existing model checker (SMV) to statically check navigation properties of NSMs

• Some example navigation properties:
  – Once you login, the only way to go back to the login page is by traversing the logout page
  – Each controller has the 'index' action as its only entry point from other controllers
• We can statically verify the NSMs with the SMV model checker
  – Verification uses less than 4.4 MB of memory and takes less than 1 second
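To make the first property concrete, the sketch below checks it on a toy navigation state machine with a plain graph search. This is a stand-in for illustration only: the authors use SMV, whereas this example uses an ordinary reachability check, and the pages in `NSM` as well as the function names are hypothetical.

```python
# Hypothetical navigation state machine: page -> pages reachable in one step.
NSM = {
    "login": ["home"],
    "home": ["invoices", "logout"],
    "invoices": ["home", "logout"],
    "logout": ["login"],
}

def reachable(graph, start, blocked=frozenset()):
    """Return the pages reachable from `start` without entering a blocked page."""
    seen, stack = set(), [start]
    while stack:
        page = stack.pop()
        if page in seen or page in blocked:
            continue
        seen.add(page)
        stack.extend(graph.get(page, []))
    return seen

def login_only_via_logout(graph, first_page_after_login="home"):
    """Property: after logging in, 'login' is unreachable unless 'logout' is visited."""
    return "login" not in reachable(graph, first_page_after_login, blocked={"logout"})

if __name__ == "__main__":
    print(login_only_via_logout(NSM))
```

Adding a direct transition from any logged-in page to "login" makes the check fail, which corresponds to the counter-example behavior a model checker would report.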

Runtime enforcement of NSMs

• PHP plugin for enforcement of NSMs at runtime
  – Intercepts page requests in an MVC application and validates them against the NSM
  – One line of code to insert in the application
  – Simple: 1,100 lines of PHP code (3% of the smallest application)
• Average processing time when an action conforms to the NSM:

Application      Time without plugin (ms)   Time with plugin (ms)
Digitalus        11                         12
BambooInvoice    183                        199
Capstone         90                         122
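The interception step can be pictured with the following Python sketch. It is not the PHP plugin described above; the class `NsmEnforcer`, the exception `NavigationError` and the transition table are hypothetical names used only to show how a request might be validated against the current NSM state before the controller action runs.

```python
class NavigationError(Exception):
    """Raised when a request does not follow a transition of the NSM."""

# Hypothetical transition table: current action -> actions allowed next.
TRANSITIONS = {
    "login": {"home"},
    "home": {"invoices", "logout"},
    "invoices": {"home", "logout"},
    "logout": {"login"},
}

class NsmEnforcer:
    """Tracks the current NSM state and rejects requests that leave the NSM."""

    def __init__(self, transitions, initial):
        self.transitions = transitions
        self.current = initial

    def handle(self, requested_action):
        allowed = self.transitions.get(self.current, set())
        if requested_action not in allowed:
            # The state is left unchanged, so the user stays on the current page.
            raise NavigationError(
                f"{self.current} -> {requested_action} is not allowed"
            )
        self.current = requested_action
        return requested_action

if __name__ == "__main__":
    enforcer = NsmEnforcer(TRANSITIONS, "login")
    print(enforcer.handle("home"))
```

In a real MVC application the state would have to be kept per user session rather than in a single object; the sketch leaves that out.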

Runtime enforcement of NSMs

• Processing time when an action provokes a PHP warning (which the NSM-enabled application blocks):

Application      Action          Time without plugin (ms)   Time with plugin (ms)
Digitalus        Create folder   28                         26
Digitalus        Upload media    18                         32
BambooInvoice    New invoice     938                        574

• ...and when an action provokes a PHP error:

Application      Action          Time without plugin (ms)   Time with plugin (ms)
Digitalus        Edit page       416                        32
Digitalus        Delete folder   424                        36
BambooInvoice    View invoice    564                        594

Data model verification

• We worked on verification of static data model properties in Ruby-on-Rails applications

• Rails uses active records for object-relational mapping
• There are many options in active records declarations that can be used to specify constraints about the data model
• Our goal:
  – Automatically analyze the active records files in Rails applications to check if the data model satisfies some properties that we expect it to satisfy
• Approach:
  – Bounded SAT based verification
  – We automatically search for data model errors within a given scope

Data Model Verification

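[Diagram: data model verification workflow]
• ActiveRecords and Data Model Properties go through a Translator to an Alloy Specification
• The Alloy Analyzer checks the specification and reports either "Verified" or a counter-example data model instance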

Data Model Verification Experiments

• We used two open source Rails applications in our experiments:
  – TRACKS: An application to manage things-to-do lists
    • 6062 lines of code, 44 classes, 13 data model classes
    • Alloy translation: 301 lines
  – Fat Free CRM: Customer Relations Management software
    • 12069 lines of code, 54 classes, 20 data model classes
    • Alloy translation: 1082 lines

Types of properties we checked

• Relationship cardinality
  – Is it possible to have two accounts for one user?
• Transitive relations
  – A book’s author is the same as the book’s edition’s author
• Deletion properties
  – Deleting a user does not create orphan orders
• We wrote 10 properties for TRACKS and 20 properties for Fat Free CRM

Data Model Verification Experiments

• Of the 30 properties we checked, 7 failed

• For example, in TRACKS a Note’s User can be different from the Note’s Project’s User
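The idea of searching for a violating instance within a given scope can be sketched in Python by brute-force enumeration. This is a simplified stand-in for the SAT based search performed by the Alloy Analyzer, and the data model is a hypothetical reduction of TRACKS to three classes: every project belongs to a user, and every note belongs to a user and to a project. All function names are made up for the example.

```python
from itertools import product

def instances(n_users, n_projects, n_notes):
    """Enumerate every way to assign owners and projects in a small instance."""
    users, projects = range(n_users), range(n_projects)
    for project_user in product(users, repeat=n_projects):
        for note_user in product(users, repeat=n_notes):
            for note_project in product(projects, repeat=n_notes):
                yield project_user, note_user, note_project

def note_user_matches_project_user(instance):
    """Property: each note's user equals the user of the note's project."""
    project_user, note_user, note_project = instance
    return all(
        note_user[n] == project_user[note_project[n]]
        for n in range(len(note_user))
    )

def find_counterexample(scope):
    """Search all instances with at most `scope` objects per class."""
    for n_users, n_projects, n_notes in product(range(1, scope + 1), repeat=3):
        for instance in instances(n_users, n_projects, n_notes):
            if not note_user_matches_project_user(instance):
                return instance
    return None

if __name__ == "__main__":
    print(find_counterexample(1))
    print(find_counterexample(2))
```

With a scope of one user the property cannot fail, while a scope of two is enough to find an instance in which a note and its project have different owners. Like any bounded check, finding no counter-example only shows that the property holds within the chosen scope.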

Conclusions

• Web application dependability is a very important problem

• Using automated verification we can find and remove errors in web applications before they are deployed

• In order to develop feasible automated verification techniques we exploit the MVC architecture and the principles of modularity, abstraction and separation of concerns

• At minimum, the developer has to specify the correctness properties (e.g. attack patterns)

– In some cases, the developer may also need to specify desired behavior (e.g. navigation)

Future work

• Handle more concerns
• Maximize automation
• Mobile applications