
Protecting Database Centric Web Services against

SQL/XPath Injection Attacks

Nuno Laranjeiro, Marco Vieira, and Henrique Madeira

CISUC, Department of Informatics Engineering

University of Coimbra, Portugal

cnl, mvieira, [email protected]

Internal Report, 2009


Abstract. Web services represent a powerful interface for backend database systems and are increasingly being used in business-critical applications. However, field studies show that a large number of web services are deployed with security flaws (e.g., SQL Injection vulnerabilities). Although several techniques for the identification of security vulnerabilities have been proposed, developing non-vulnerable web services is still a difficult task. In fact, security-related concerns are hard to apply, as they involve adding complexity to already complex code. This paper proposes an approach to secure web services against SQL and XPath Injection attacks by transparently detecting and aborting service invocations that try to take advantage of potential vulnerabilities. Our mechanism was applied to secure several web services specified by the TPC-App benchmark and proved to be 100% effective in stopping attacks, non-intrusive, and very easy to use.

Keywords: Web services, vulnerabilities, security attacks, SQL Injection, XPath Injection, code instrumentation.

1 Introduction

Web services are now widely used to support many businesses, linking suppliers and clients in sectors such as banking and financial services, transportation, or automotive manufacturing, among others. Web services are self-describing components that can be used by other software across the web in a platform-independent manner, and are supported by standard protocols such as SOAP (Simple Object Access Protocol), WSDL (Web Services Description Language) and UDDI (Universal Description, Discovery, and Integration) [6]. In a service-based environment, providers offer a set of services that frequently access a backend database and can be discovered and used by service consumers. The web service technology provides a clear service interface for consumers, and this is frequently used to enable the aggregation of services in compositions. These compositions, frequently designated as business processes, are essentially a collection of services working together towards an objective [8]. The composition workflow (i.e., the sequencing and coordination of calls to component services) obviously introduces a degree of dependency between services, where a security failure in a component may compromise the whole composition.


A recent McKinsey report identifies web services and SOA as one of the most important trends in modern software development [20]. However, the wide use and exposure of web services means that any existing security vulnerability is likely to be uncovered and exploited by hackers. In fact, command injection attacks (e.g., SQL or XPath Injection) are frequent types of attacks in the web environment [24]. These attacks take advantage of improperly coded applications to change the queries sent to a database, enabling, for instance, access to critical data. Vulnerabilities allowing SQL Injection and XPath Injection attacks are particularly relevant in web services [30], as their exposure is high and they frequently use a data persistence solution [29] based either on a traditional relational database or on an XML database. Currently, major database vendors and several open-source efforts provide XML databases (e.g., Oracle XML DB, SQL Server 2008, Apache Xindice), and access to this type of database typically uses XPath expressions. While the goal of XPath Injection is to maliciously exploit any existing vulnerabilities in the XPath expressions used by an application (for instance, to access an XML database), SQL Injection tries to change SQL statements in a similar manner [24].

Different techniques for the identification of security vulnerabilities have been proposed in the past [24], namely:

– Static vulnerability scanning: consists of analyzing the source code of the application looking for potential vulnerabilities. It is a “white-box” approach that can be done manually or by using automated code analysis.

– Penetration testing: a widely used technique that tries to disclose security vulnerabilities in web applications (including web services). The testing tool stresses the application from the point of view of the attacker (a “black-box” approach) and tries to penetrate it by issuing a large number of interactions.

Although web services are increasingly being used in complex business-critical systems, current development support tools do not provide practical ways to protect applications against security attacks. In this paper, we present a phased approach that is able to: 1) learn the profile of regular client requests by transforming requests into invariant statements; 2) protect web service applications from SQL/XPath Injection attacks by matching incoming requests against the set of valid statements previously learned. Note that this work focuses on source code vulnerabilities and not on specific security mechanisms such as authentication and data encryption. In summary, our approach consists of the following major phases:

– Service assessment: An optional phase that characterizes the web service code in terms of security vulnerabilities. Penetration testing and static code analysis are used to identify any existing security vulnerabilities (e.g., web services code prone to SQL/XPath Injection);

– Statement learning: Consists of learning the profile of valid, non-malicious data access statements. We provide an automatable workload generation approach to create a set of invocations able to exercise the web service code, reaching as many data access statements as possible and enriching the set of invariant statements learned by our framework;

– Service protection: The generation of a protective service wrapper that uses the outcome of the learning phase (a set of valid statements) to prevent the success of SQL/XPath Injection attacks. The data access statements issued while serving incoming requests are hashed and compared to the valid statements, and malicious requests that fall outside the learned set are kept from proceeding.

It is important to emphasize that the proposed approach is quite effective, has an extremely low overhead, and does not require any access to the source code of the application. Instead, we propose a bytecode instrumentation approach that is able to transparently perform the necessary modifications to protect the target service. In our opinion, this integrated tool is extremely important in the following scenarios:

– To help web services developers improve their code. During the build cycle, developers can use the tool to automatically inject bytecode that eliminates security vulnerabilities. Besides improving the quality of the service, it simplifies the developers’ task and, at the same time, reduces the coding and testing effort. This is particularly important for junior programmers, who frequently focus on functionality and disregard code robustness and security.

– To help system administrators improve the security of services already deployed, as the technique can easily be applied to existing services.

To show the effectiveness of the proposed approach, we have used two implementations (independently coded by different developers) of the web services specified by the standard TPC-App performance benchmark. A large number of security problems have been disclosed and corrected, showing that our approach is effective and a powerful tool for developers and system administrators.

The structure of the paper is as follows. The next section presents some background and related work. Section 3 presents the technique for fixing security problems and Section 4 presents the experimental evaluation. Section 5 concludes the paper.

2 Background and Related Work

Several efforts have been undertaken for the identification of security vulnerabilities in computer software. Typical “white-box” and “black-box” approaches used to test web applications for vulnerabilities also apply to web services. Concerning white-box approaches, various static code analysis tools have been developed [24]. This type of analysis consists of inspecting the program's source code in a static manner (i.e., without executing the program) to detect code patterns that are prone to vulnerabilities. A static analysis technique capable of detecting many application vulnerabilities, including SQL Injection, is presented in [19]. The targeted vulnerabilities stem from unchecked inputs, which are widely recognized as the most common source of security vulnerabilities in web applications. FindBugs [13] is a concrete example of a well-known tool that uses static analysis to inspect Java code for occurrences of bug patterns, including SQL Injection vulnerabilities. FORTIFY [10], Ounce [21], or Pixy [22] are examples of commercial security tools used by web application/service developers to detect security vulnerabilities.

Unlike white-box approaches, black-box testing does not require the application’s source code. Instead, it is based on the execution of a set of runtime tests, where a usually large number of requests is created and delivered to the service under test. The responses are later analyzed to disclose any existing vulnerabilities. This type of testing is also known as penetration testing, and there is currently a wide range of tools (vulnerability scanners and fuzzers) that can be used to detect vulnerabilities in web services. Some examples include commercial vulnerability scanners such as Acunetix Web Vulnerability Scanner [1], HP WebInspect [14], and IBM Rational AppScan [15], and open source scanners such as Gamja [11] and Burp Suite [4].

Although black-box testing tools do not require source code access, in certain scenarios the code may be useful to improve the results. Indeed, a key aspect in security testing of web services is the generation of representative workloads, which are capable of exercising the web service code in a comprehensive way (i.e., achieving high code coverage). Tools like Cobertura [5] and Clover [2] measure test coverage and can be used to validate and improve workloads by automatically identifying the areas of a program that are not exercised by a set of test cases, helping in the definition of additional test cases.

An important aspect is that automatic approaches for vulnerability detection are frequently unable to produce accurate results [30]. Thus, human code inspections are frequently used to disclose vulnerabilities with higher accuracy [9].

A common way to remove SQL/XPath Injection vulnerabilities is to change the vulnerable code and separate the query structure from the input data by using parameterized queries. Such queries are available for typical databases in the form of prepared statements (an SQL statement structure with placeholders for variables), but also for XML databases (or simply applications that use XPath) in the form of XPath parameterized expressions [24].

The approach presented in [27] describes a replacement algorithm and its corresponding automation for removing SQL Injection vulnerabilities from SQL statements. The approach consists of replacing the SQL statements with secure prepared statements. Four case studies were conducted on open-source projects. Code inspection and static analysis were used to disclose code prone to SQL Injection, which was then replaced by automatically generated secure code. The whole process was able to correct 94% of the vulnerabilities found in 20 files. However, several aspects related to non-explicit setting, non-string, or iterator-based SQL structures remained unsolved. Our approach is able to overcome these limitations, as it does not make assumptions about the structures used to build SQL statements.

An automated approach that tries to convert plain-text SQL statements into prepared statements is presented in [28]. The strategy is to remove SQL vulnerabilities by replacing vulnerable code with generated secure code. The presented prototype was able to remove SQL Injection vulnerabilities in five different statement configurations contained in five custom-built toy projects. The generated prepared statements were verified to be functionally equivalent to the original statements. However, the conversion algorithms are quite limited and need to be considerably improved to reduce the large number of SQL statements that cannot be handled by the proposed approach.

Based on the fact that software written in one language often needs to construct sentences in other languages (such as SQL, XQuery, or XPath queries), [3] presents an approach that prevents injection vulnerabilities by construction (a programming style alternative to methods that use string manipulation or high-level APIs). The proposed methodology consists of embedding the syntax of the guest languages into the syntax of the host language (e.g., SQL in Java) and automatically generating code that maps the embedded language to constructs in the host language that reconstruct the embedded sentences, adding escaping functions where appropriate. Although the approach is generic enough to be adapted to various languages, it obviously adds complexity to the development phase.

AMNESIA (Analysis and Monitoring for NEutralizing SQL-Injection Attacks) [12] is a tool that uses a model-based approach specifically designed to detect SQL Injection attacks, and combines static analysis and runtime monitoring. Static analysis is used to analyze the source code of a given web application, automatically building a model of the legitimate queries that such an application can generate. At runtime, AMNESIA monitors all dynamically generated queries and checks them for compliance with the statically generated model. When a query that violates the model is detected, it is classified as an attack and is prevented from accessing the database. Unlike AMNESIA, our approach does not build a model based on static analysis. Instead, we propose learning the profile of legitimate queries at runtime, which may yield a richer, more realistic profile and overcomes the intrinsic limitations of static analysis (e.g., requiring access to source code).
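The separation between query structure and input data that parameterized queries provide can be illustrated with the standard JAXP XPath API, where an expression may reference variables such as `$name` that are bound through an `XPathVariableResolver` (JDBC's `PreparedStatement` with `setString` plays the same role for SQL). The sketch below is not taken from the paper: the class name, the in-memory user document, and the inputs are hypothetical, chosen only to contrast a string-built expression with a parameterized one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathParameterDemo {
    // Hypothetical user store, used only for this illustration.
    static final String USERS =
            "<users>"
          + "<user><name>alice</name><password>s3cret</password></user>"
          + "<user><name>bob</name><password>hunter2</password></user>"
          + "</users>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Vulnerable: user input is concatenated into the expression text,
    // so it can change the structure of the predicate.
    static int countConcatenated(Document doc, String name, String pwd) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        String expr = "//user[name='" + name + "' and password='" + pwd + "']";
        return ((NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET)).getLength();
    }

    // Parameterized: $name and $pwd are bound as values by the resolver,
    // so the input is always compared as data, never parsed as XPath.
    static int countParameterized(Document doc, String name, String pwd) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setXPathVariableResolver(v -> {
            switch (v.getLocalPart()) {
                case "name": return name;
                case "pwd":  return pwd;
                default:     return null;
            }
        });
        String expr = "//user[name=$name and password=$pwd]";
        return ((NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET)).getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(USERS);
        String attack = "' or '1'='1";
        System.out.println(countConcatenated(doc, attack, attack));  // every user matches
        System.out.println(countParameterized(doc, attack, attack)); // no user matches
    }
}
```

With the injected input, the concatenated predicate becomes `name='' or '1'='1' and password='' or '1'='1'`, which is true for every `user` element, while the parameterized version simply looks for a user literally named `' or '1'='1`.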

3 Security Improvement Approach

To perform SQL Injection, the attacker exploits an unchecked input in order to modify the structure of an SQL command [24]. Usually, the attacker starts by adding an extra condition to the ‘where’ clause of an SQL command to gain privileged access. The attacker then executes SQL commands that return valuable information (typically using a union clause with a malicious select) or that disrupt the database by performing inserts, deletes, or updates. For XPath, the attack approach is basically the same; only the expression syntax differs. Our proposal to identify potential SQL and XPath Injection attacks is therefore based on anomaly detection, which consists of searching for deviations from a historical (learned) profile of good commands, and includes three major phases:

1. Service assessment – Consists of using penetration testing, automated static code analysis, or human code inspection to disclose SQL/XPath Injection vulnerabilities and thus characterize the service in terms of these vulnerabilities.

2. Statement learning – The goal is to identify the set of valid SQL statements or XPath expressions. It is composed of two steps:
   2.1. Workload generation and execution:
      2.1.1. Inspection of the service description document (the WSDL file).
      2.1.2. Generation of a service workload that complies with the service description.
      2.1.3. Measurement of the generated workload coverage; return to 2.1.2 until a given (developer-defined) coverage is obtained.
   2.2. Instrumentation of the service to learn the valid SQL statements and XPath expressions used by the application.

3. Service protection – Consists of instrumenting the service to provide protection against SQL/XPath Injection attacks. Afterwards, the developer may revisit phase 1 to verify whether the previously detected vulnerabilities are effectively protected.
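The attack pattern described at the start of this section can be made concrete with a query built by string concatenation. In this sketch the table, columns, and payloads are hypothetical; it only shows that a tautology or a UNION payload changes the structure of the command rather than just a data value, which is the kind of deviation the anomaly detection approach looks for.

```java
final class InjectionDemo {
    // Vulnerable pattern (hypothetical CLIENT table): the client-supplied
    // name is pasted directly into the SQL text.
    static String findClientQuery(String name) {
        return "SELECT ID, NAME FROM CLIENT WHERE NAME = '" + name + "'";
    }

    public static void main(String[] args) {
        // Regular input: only the literal value changes.
        System.out.println(findClientQuery("Jack"));
        // Extra 'where' condition (tautology): every row now matches.
        System.out.println(findClientQuery("x' OR '1'='1"));
        // Union payload: appends a second, attacker-chosen SELECT and
        // comments out the trailing quote.
        System.out.println(findClientQuery("x' UNION SELECT USERNAME, PASSWORD FROM USERS --"));
    }
}
```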

3.1 Service Assessment

The goal of this first phase is to assess the security of the web service application in terms of SQL/XPath Injection vulnerabilities. This initial characterization phase is optional, as the developer may simply wish to apply the security mechanism as a regular attack barrier, without searching the service for potential vulnerabilities.

Any of the following alternatives can be used for vulnerability detection: penetration testing (using scanners or fuzzers) [24]; static code analysis [19] (a developer can easily use tools such as FindBugs – http://findbugs.sourceforge.net/); or, in more difficult cases (or in cases where a high degree of confidence is needed), human code inspections by security assurance teams [9].

The outcome of this phase is essentially a set of SQL/XPath Injection vulnerabilities in the service code. This information can be used later to verify the effectiveness of the proposed protection scheme by rerunning this phase over the protected service.

3.2 Statement Learning

This phase includes two steps: a workload generation and execution step, and a command (SQL/XPath) learning step. A detailed description of each follows.

3.2.1 Workload Generation and Execution

The first step of workload generation and execution is an inspection of the service description document, the WSDL file. This XML file is automatically processed to obtain the list of operations, parameters, and associated data types. The information describing the structure and type of all inputs and outputs of each operation is usually found in an XML Schema file (an XSD file that describes the structure of an XML object), which is referenced by the original WSDL [31], [6]. Additionally, the workload generation tool needs to gather information on the valid domains for all input and output objects. For this purpose, the XSD file that describes all parameters is searched. This file may also include information on valid values for each parameter, provided that XSD schema restrictions are defined. It is rare, however, to find the valid values for each parameter expressed in a WSDL/XSD pair. This is due to:

– Lack of integrated tools (and programming language support) that could easily be used to add the domain values to the service’s WSDL descriptor.

– Currently, WSDL and XSD have no support for expressing dependencies between multiple parameters of a given service operation. This missing feature impairs the full definition of a domain.

To tackle this last issue, we have recently proposed a language that enables a full domain expression in the XSD file associated with the WSDL [18]. The proposed language, named ‘Extended Domain Expression Language – EDEL’, enables web services to fully express their operations’ domains (including complex interdependencies between parameter domains). This can be used to easily create workloads that respect the operations’ domains, hence greatly increasing their coverage.

After having collected the necessary service information, the workload generation is conducted so that we can exercise as many source code points as possible (ideally, the complete set of data access SQL/XPath statements present in the code). Three options are available for the workload generation:

– User-defined workload: the developer implements a workload emulation tool based on the knowledge he/she has about the service (e.g., using a client emulation tool like soapUI – http://www.soapui.org/).

– Synthetic workload: this workload is generated automatically using the web service definitions mentioned above. For every parameter of each operation a set of valid input values is randomly generated. Those values are adequately combined to guarantee a large number of valid execution calls.

– Real workload: In some cases it may be possible to use the runtime environment as the workload for the learning process (e.g., in the case of an already deployed service). However, in this case it is not possible to guarantee the absence of malicious requests, which may result in learning unauthorized commands as valid ones. This will lead to false negatives (i.e., malicious commands not detected as such) at runtime.
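The synthetic option can be sketched as follows. The operation `searchProducts(minPrice, maxPrice, category)`, its domains, and the rule `minPrice <= maxPrice` are hypothetical: each parameter gets an independent domain, as an XSD restriction could express, while the rule stands for the kind of inter-parameter dependency that XSD cannot express and EDEL is meant to capture. Here the rule is a plain Java predicate, not EDEL syntax. Random values are drawn per parameter, combined as a Cartesian product, and calls that violate the rule are discarded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

final class SyntheticWorkload {
    // One invocation of the hypothetical searchProducts operation.
    static final class Call {
        final int minPrice;
        final int maxPrice;
        final String category;

        Call(int minPrice, int maxPrice, String category) {
            this.minPrice = minPrice;
            this.maxPrice = maxPrice;
            this.category = category;
        }

        @Override
        public String toString() {
            return "searchProducts(" + minPrice + ", " + maxPrice + ", " + category + ")";
        }
    }

    // Hypothetical enumerated domain, as an xs:enumeration might define it.
    static final String[] CATEGORIES = {"BOOKS", "MUSIC", "TOYS"};

    static List<Call> generate(long seed, int valuesPerParam, Predicate<Call> interParamRule) {
        Random rnd = new Random(seed);
        int[] mins = new int[valuesPerParam];
        int[] maxs = new int[valuesPerParam];
        for (int i = 0; i < valuesPerParam; i++) {
            mins[i] = rnd.nextInt(1000); // per-parameter domain [0, 999]
            maxs[i] = rnd.nextInt(1000);
        }
        List<Call> calls = new ArrayList<>();
        for (int min : mins) {
            for (int max : maxs) {
                for (String category : CATEGORIES) {
                    Call call = new Call(min, max, category);
                    if (interParamRule.test(call)) { // discard invalid combinations
                        calls.add(call);
                    }
                }
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Call> workload = generate(42L, 5, c -> c.minPrice <= c.maxPrice);
        System.out.println(workload.size() + " valid calls");
        workload.stream().limit(3).forEach(System.out::println);
    }
}
```

A fixed seed keeps the workload reproducible between the learning run and any later re-execution.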

A set of well-known tools was combined and integrated into the automated synthetic workload generation process depicted in Figure 1. Some of these tools are specific to Java, but similar ones exist for all major languages. The following paragraphs outline the workload generation process (see details in [18]).

[Figure 1 diagram. Components: XSD, benerator, XML objects, JAXB/XJC, Java source code, XStream deserializer, reflection loader, unit tests, Maven-managed Cobertura; stages numbered 1 to 6.]

Fig. 1. Workload generation and execution.

Using the XSD file as the starting point, we generate a synthetic workload with appropriate tools, such as benerator [7] (stage 1). This tool is able to read XSD Schema files and, using the domain information present in each schema, generate a set of XML files containing values that are later used to exercise the target service. To use the generated values, we need to create programming-language-level objects that accurately represent the structures found in the XSD file (stage 2). JAXB’s binding compiler (xjc – https://jaxb.dev.java.net/) can be used for this purpose. At this point, we are able to use XStream (http://xstream.codehaus.org/) to automatically deserialize the produced XML into the corresponding generated Java objects, thus creating the list of objects that forms our final workload (stage 3). A process that uses reflection to load classes by name then builds a list of objects that are integrated into one unit test case per service operation (stage 4).

Most tools like benerator are, to date, unable to consider relations between the domains of multiple input parameters. In fact, to generate the input values, this tool only allows the definition of a single domain restriction. Although this restriction can be a union of restrictions, inter-parameter restrictions are not taken into account. As a result, the workload may include invalid service calls that have to be identified and discarded (using the definitions provided via EDEL).

A key difficulty related to workload generation is that the coverage of the web service calls is not easy to guarantee (e.g., it is extremely difficult, or even impossible, to generate a workload that exercises all the web service code). Our proposal includes executing the workload and using a test coverage analysis tool, such as Cobertura [5], to obtain a code coverage metric (stage 5). If the developer is not satisfied with the coverage, more web service calls are required: calls must be added to the workload until the code coverage reaches the level the developer desires (stage 6).
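The coverage-driven loop of stages 5 and 6 can be sketched as below, under simplifying assumptions: code points are plain strings, and `runBatch` is a hypothetical stand-in for running one batch of generated unit tests and reading the covered points from a coverage report (such as one produced by Cobertura). The sketch does not call any coverage tool's API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntFunction;

final class CoverageLoop {
    /**
     * Executes workload batches (stage 5) and keeps adding new ones (stage 6)
     * until the fraction of covered code points reaches the target.
     * Returns the number of batches executed, or -1 if the target was not
     * reached within maxBatches.
     */
    static int runUntilCovered(Set<String> allPoints,
                               IntFunction<Set<String>> runBatch,
                               double target,
                               int maxBatches) {
        Set<String> covered = new HashSet<>();
        for (int batch = 0; batch < maxBatches; batch++) {
            for (String point : runBatch.apply(batch)) {
                if (allPoints.contains(point)) { // ignore points outside the measured set
                    covered.add(point);
                }
            }
            double coverage = (double) covered.size() / allPoints.size();
            if (coverage >= target) {
                return batch + 1;
            }
        }
        return -1; // the generator needs more varied calls
    }

    public static void main(String[] args) {
        Set<String> points = new HashSet<>();
        for (int n = 0; n < 10; n++) points.add("p" + n);
        // Hypothetical batch runner: batch k reaches two new code points.
        IntFunction<Set<String>> runBatch =
                k -> new HashSet<>(Arrays.asList("p" + (2 * k), "p" + (2 * k + 1)));
        System.out.println(runUntilCovered(points, runBatch, 0.75, 10)); // prints 4
    }
}
```

Returning -1 rather than the last batch count makes it explicit that the developer-defined coverage was not reached.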

3.2.2 SQL/XPath Learning

To learn the command profile, we exercise the web service by executing the generated workload. We start by automatically identifying all the locations in the web service code where SQL and XPath commands are executed. This is achieved by using AOP (Aspect-Oriented Programming) to intercept all calls to a set of method signatures that correspond to well-known APIs for executing SQL commands (e.g., Java’s JDBC API, the Spring Framework JDBC API) and evaluating XPath expressions (e.g., Java’s JAXP API). Besides this set of well-known APIs, virtually any API can easily be added to the learning mechanism, as the only requirement is to know the full signature of the method to be intercepted. Afterwards, the workload is run and, at runtime, all submitted SQL and XPath commands are intercepted and logged. Figure 2 represents the basic architecture of our interception mechanism. The learning module is described in the following paragraphs, whereas the protection module (also depicted in Figure 2) is described in Section 3.3.

At runtime, each data access call is intercepted and delivered to a dispatcher. The dispatcher simply checks whether the application is in learning mode or in protection mode and delivers the request to the appropriate module (the learner or the protector module, respectively). During learning, SQL and XPath commands are parsed in order to remove the data-variant part (if any), and a hash code is generated to uniquely identify each command. In other words, the information used does not represent the exact command text, since commands may differ slightly between executions while keeping the same structure. For example, in the SQL command “SELECT * from EMP where job like 'CLERK' and SAL >1000”, the job and the salary in the selection criteria (job like ? and sal > ?) depend on the user’s choices. Thus, instead of considering the full command text, we represent only its invariant part. After removing the variant part of each command, the command signature can be calculated using a hash algorithm.
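The invariant-part extraction and hashing described above can be approximated as follows. This is a minimal sketch, not the paper's parser: it assumes that replacing quoted string literals and standalone numbers with `?` is enough to strip the variant part (comments, quoted identifiers, and other SQL dialect details are not handled), and it uses SHA-256 because the text does not name the hash algorithm. It shows that two executions of the EMP query share one signature, while an injected condition produces a different one.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Pattern;

final class SqlSignature {
    // Quoted string literals, allowing '' as an escaped quote inside them.
    private static final Pattern STRING_LITERAL = Pattern.compile("'(?:[^']|'')*'");
    // Standalone integer or decimal numbers (digits inside identifiers are kept).
    private static final Pattern NUMBER_LITERAL = Pattern.compile("\\b\\d+(?:\\.\\d+)?\\b");

    // Replaces the data-variant parts of a command with '?' placeholders
    // (simplified, regex-based stand-in for a real SQL parser).
    static String invariantPart(String sql) {
        String s = STRING_LITERAL.matcher(sql).replaceAll("?");
        s = NUMBER_LITERAL.matcher(s).replaceAll("?");
        return s.trim().replaceAll("\\s+", " ");
    }

    // SHA-256 of the invariant part, as a lowercase hex string (assumed choice).
    static String signature(String sql) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(invariantPart(sql).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        String clerk = "SELECT * from EMP where job like 'CLERK' and SAL >1000";
        String manager = "SELECT * from EMP where job like 'MANAGER' and SAL >2500";
        String injected = "SELECT * from EMP where job like 'x' OR '1'='1' and SAL >1000";
        System.out.println(invariantPart(clerk));    // SELECT * from EMP where job like ? and SAL >?
        System.out.println(invariantPart(injected)); // SELECT * from EMP where job like ? OR ?=? and SAL >?
        System.out.println(signature(clerk).equals(signature(manager)));  // true
        System.out.println(signature(clerk).equals(signature(injected))); // false
    }
}
```

Because the normalization keeps whitespace as written (only collapsing runs of it), it relies on each code point building its commands from the same template, which is the usual case for application code.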

[Figure 2 diagram. Components: client application, SOAP, application server, AOP layer, web service, business logic, dispatcher, learner, protector.]

Fig. 2. The AOP-based configuration for data access statements learning and service protection.

Each hash signature is associated with a source code entry point (which is provided by the AOP framework) in a Map structure. This does not mean that we need the original application’s source code; rather, it means that we need bytecode compiled with source code line information. This is generally the case, even in production applications, as it provides extra information on failure events. In this Map structure, each key corresponds to a given source code point and has a set of associated valid (expected) hashed commands. Note that at a given point there might be several valid commands. For example, as shown in Figure 3, the SQL command submitted to the database might be an insert or an update. This is why we need a list of valid SQL or XPath commands for each source code point.

An important aspect is that the workload is generated in such a way that it guarantees a minimum level of code coverage (as discussed in Section 3.2.1). Although this does not assure a complete learning of SQL commands and XPath expressions, it allows us to have a high degree of confidence. Obviously, increasing the size of the workload is a way of improving coverage and further guaranteeing a more complete learning.

    // ...
    String sql;
    if (isInsert())
        sql = "INSERT INTO CLIENT VALUES (seq.nextval, 'Jack')";
    else
        sql = "UPDATE CLIENT SET NAME='John' WHERE ID=1";
    statement.execute(sql); // a single code point that issues two valid commands
    // ...

Fig. 3. Example of SQL commands execution.
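The per-code-point Map can be sketched as follows. In the paper the code point is supplied by the AOP framework; in this hypothetical stand-in, `execute` plays the intercepted data access method and derives the code point from its caller's stack frame (class, method, and line, which likewise requires line-number information in the bytecode). Signatures are passed in as already-normalized text in place of real hash values, and `figure3` mimics Fig. 3: one call site that issues either an INSERT or an UPDATE.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class StatementLearner {
    // Code point ("class.method:line") -> signatures learned at that point.
    private final Map<String, Set<String>> learned = new HashMap<>();

    // Hypothetical stand-in for an intercepted call such as Statement.execute.
    // The caller's stack frame plays the role of the AOP-provided code point;
    // javac emits line-number tables by default, so getLineNumber() is usable.
    void execute(String signature) {
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        String codePoint = caller.getClassName() + "." + caller.getMethodName()
                + ":" + caller.getLineNumber();
        learned.computeIfAbsent(codePoint, k -> new HashSet<>()).add(signature);
    }

    Map<String, Set<String>> learned() {
        return Collections.unmodifiableMap(learned);
    }

    // Mirrors Fig. 3: one call site, two valid commands.
    // Normalized text stands in for the real hash values.
    static void figure3(StatementLearner statement, boolean isInsert) {
        String sql = isInsert
                ? "INSERT INTO CLIENT VALUES (seq.nextval, ?)"
                : "UPDATE CLIENT SET NAME=? WHERE ID=?";
        statement.execute(sql);
    }

    public static void main(String[] args) {
        StatementLearner learner = new StatementLearner();
        figure3(learner, true);
        figure3(learner, false);
        // A single code point mapped to two valid signatures.
        System.out.println(learner.learned());
    }
}
```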

3.3 Service Protection

Service protection at runtime (i.e., after deployment) consists of performing one security check per data access. All SQL and XPath commands are intercepted and hashed. The request flow is very similar to the learning phase; the difference is that each request is now delivered to the protector module instead of the learner module (see Figure 2 for details). Obviously, the calculated hash codes are not added to the learned command set. Instead, they are compared to the hash values of the learned valid commands for the code point at which the command was submitted.

In practice, the matching process consists of looking up the current source code origin in the previously described Map structure and getting the list of hash codes of the valid (learned) commands for that point. This list (generally quite small) is then searched for an element that exactly matches the hash of the command being executed. Execution is allowed to proceed if a match is found. Otherwise, a security exception (whose unqualified name is SecurityRuntimeException) is thrown, which keeps code execution from proceeding and thus prevents the potential attack. If the source code origin is not found in the Map lookup, code execution is also stopped in a similar manner (in this case, a different exception is thrown: CodePointNotTrainedRuntimeException). This case strongly indicates that the learning phase is incomplete (test coverage was not good enough) and that an extended workload is probably required. All exceptions are logged by default, and we also provide checked versions of these exceptions for cases where the developer wishes to explicitly state that a web service may throw a particular exception.

To verify whether the security mechanism is working properly, the web service should be reassessed using a security analysis approach (similar to phase 1). The goal is to check whether any of the initially identified vulnerabilities still exist; the expectation is that our mechanism stops any injection attempt by raising the appropriate security exception. If a security vulnerability is detected, it means that the workload coverage was not good enough and that the learning phase is incomplete. In this case, the workload should be extended and the learning process repeated.

Finally, a developer may want to re-execute the original workload to verify that the service behavior remains correct. Problem indicators include responses outside the expected domains. For certain services, responses that differ from those obtained during the first workload execution are also problem indicators. These might indicate potential problems introduced by the security mechanism (e.g., due to an incomplete learning of SQL and XPath commands). The process should be canceled if these problems are identified, and the developer should extend the workload in order to improve the completeness of the learning phase.
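The dispatcher, learner, and protector flow of Sections 3.2.2 and 3.3 can be condensed into a small class. This is an illustrative sketch under stated assumptions rather than the prototype's code: AOP interception, logging, and the checked exception variants are omitted; code points and signatures are plain strings supplied by the caller (the values used in `main` are hypothetical); and only the two exception names come from the text, while their bodies are a hypothetical minimum.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class Dispatcher {
    enum Mode { LEARNING, PROTECTION }

    // Exception names as given in the text; bodies are a hypothetical minimum.
    static final class SecurityRuntimeException extends RuntimeException {
        SecurityRuntimeException(String message) { super(message); }
    }

    static final class CodePointNotTrainedRuntimeException extends RuntimeException {
        CodePointNotTrainedRuntimeException(String message) { super(message); }
    }

    // Code point -> signatures (hashes) of the commands learned there.
    private final Map<String, Set<String>> valid = new HashMap<>();
    private Mode mode = Mode.LEARNING;

    void setMode(Mode mode) { this.mode = mode; }

    // Invoked for every intercepted data access, before the command runs.
    void onDataAccess(String codePoint, String signature) {
        if (mode == Mode.LEARNING) { // learner module
            valid.computeIfAbsent(codePoint, k -> new HashSet<>()).add(signature);
            return;
        }
        Set<String> allowed = valid.get(codePoint); // protector module
        if (allowed == null) {
            throw new CodePointNotTrainedRuntimeException("no commands learned at " + codePoint);
        }
        if (!allowed.contains(signature)) {
            throw new SecurityRuntimeException("unexpected command at " + codePoint);
        }
        // Exact match found: the original data access call may proceed.
    }

    public static void main(String[] args) {
        Dispatcher d = new Dispatcher();
        d.onDataAccess("OrderService.save:42", "sig-insert"); // learned
        d.setMode(Mode.PROTECTION);
        d.onDataAccess("OrderService.save:42", "sig-insert"); // allowed
        try {
            d.onDataAccess("OrderService.save:42", "sig-injected");
        } catch (SecurityRuntimeException e) {
            System.out.println("blocked: " + e.getMessage());
        }
    }
}
```

A `Set` is used here instead of the list mentioned in the text, so each check is a hash lookup; since the per-point collection is generally small, either choice is cheap.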

4 Experimental Evaluation

In this section we present and discuss the experimental evaluation performed over an initial prototype tool (available at [17]) created to demonstrate the feasibility of the proposed approach. All implementation efforts used Java as the programming language; however, other languages (e.g., C#, C++) could have been used as well.

4.1 Experimental Setup

To demonstrate our approach, we have used the following subset of the web services specified by the standard TPC-App [29] performance benchmark: Change Payment Method, New Customer, New Product, and Product Detail. TPC-App is a performance benchmark for web services and application servers that is widely accepted as representative of real environments. Two versions of each service (versions A and B) were created by independent programmers, who were given the flexibility to choose any Java technology they found appropriate for the implementation. This restriction to Java was due to the fact that our prototype tool is Java-based. We chose JBoss 4.2.2.GA as the service container and selected the reference implementation of the Java API for XML Web Services (JAX-WS) due to their relevance in industry [25], [23]. The setup consisted of two nodes (client and server) deployed on two machines connected over an isolated Fast Ethernet network.

4.2 Services Assessment

The first phase of the experimental evaluation consisted of trying to identify potential vulnerabilities. Initially, we opted to use automated tools (vulnerability scanners and static code analyzers); however, due to the poor results obtained, we decided to perform a code inspection by a team of security experts with different experience backgrounds. Table 1 summarizes the results. Note that all detected vulnerabilities correspond to SQL Injection issues, as the TPC-App specification does not include any XPath usage. However, the assessment approach for XPath is essentially the same, as the main difference lies in the syntax of each language. As discussed below, FindBugs, the static analyzer used, was unable to provide individual results per service.

Table 1. Vulnerabilities detected by the different methods.

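Service                Scanner            FindBugs*       Code Inspection
                       A       B          A      B        A           B
ChangePaymentMethod    0       0 (3 FP)   2      0        2 (2 FP)    0
NewCustomer            1 + 1   0 (3 FP)                   19 (1 FP)   0
NewProducts            0       0                          1 (1 FP)    0
ProductDetail          0       0                          0           0

* FindBugs results refer to the whole source code of each version and could not be attributed to individual services.
FP: false positives.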
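The remark above that SQL and XPath Injection differ mainly in syntax can be illustrated with a small sketch. The Java example below uses only the JDK's javax.xml.xpath API to contrast an XPath query built by string concatenation with one that binds the input through an XPathVariableResolver. The XML document, its element and attribute names, and the class XPathInjectionContrast are hypothetical; they are not part of the TPC-App services or of the paper's tool.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathInjectionContrast {

    // Hypothetical user store; element and attribute names are illustrative only.
    static final String USERS =
        "<users><user name='alice' role='admin'/><user name='bob' role='guest'/></users>";

    static Document load() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(USERS)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    // Evaluates an expression and returns the number of matching nodes.
    private static int count(XPath xpath, String expression, Document doc) {
        try {
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + expression, e);
        }
    }

    // Vulnerable pattern: the input becomes part of the XPath expression text.
    static int countConcatenated(Document doc, String name) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return count(xpath, "//user[@name='" + name + "']", doc);
    }

    // Safer pattern: the expression is constant and the input is bound to $name.
    static int countBound(Document doc, String name) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setXPathVariableResolver(qname -> "name".equals(qname.getLocalPart()) ? name : null);
        return count(xpath, "//user[@name=$name]", doc);
    }

    public static void main(String[] args) {
        Document doc = load();
        String attack = "x' or '1'='1";
        System.out.println(countConcatenated(doc, "alice")); // 1
        System.out.println(countConcatenated(doc, attack));  // 2: every user matches
        System.out.println(countBound(doc, attack));         // 0: input treated as a plain value
    }
}
```

With the input x' or '1'='1, the concatenated expression becomes //user[@name='x' or '1'='1'], which matches every user, whereas the bound version treats the whole input as a literal value and matches nothing. This mirrors the difference between concatenated SQL and prepared statements.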

We used a well-known commercial vulnerability scanner (which we do not name, for legal reasons that prevent us from disclosing its results), which identified 2 critical vulnerabilities in version A. Both were manually checked and did correspond to SQL Injection vulnerabilities (although the scanner originally classified one of them as a database error). The scanner also reported 6 vulnerabilities in version B. An important aspect is that implementation B used SQL prepared statements (with the exception of one static SQL command, which raises no security concern). As prepared statements are the most effective way of preventing SQL Injection, we expected no issues in this version. We therefore examined the scanner responses and the source code of version B, and found that in all these cases the reported error was a ‘value too large for column’ error message. The scanner had applied an attack expression (“' OR”) that was 2 characters longer than the maximum allowed for that particular database column; it received an SQL error code and incorrectly interpreted it as an SQL Injection vulnerability. Even if a shorter attack expression had been used, it would pose no threat, as a prepared statement treats bound input strictly as data, so characters such as (‘) cannot alter the structure of the SQL command.

As vulnerability scanners are known to perform poorly in this kind of environment [30], we also used a static code analysis tool to disclose SQL Injection vulnerabilities. We chose FindBugs, as it performs a thorough code analysis, is very easy to use, and has a specific module for detecting SQL Injection vulnerabilities. As shown in Table 1, FindBugs marked 2 vulnerabilities in the whole source code of version A and, as expected, none in version B. In version A, the developer had created a set of database access methods, and FindBugs essentially marked the last point in the source code where a non-constant string was passed to an SQL execute method. We then analyzed the call hierarchy of the database access methods to try to attribute the vulnerabilities to individual services, but this was not possible, because some services used these methods in a vulnerable way while others did not.

To obtain more accurate results, we asked a team of security experts to disclose SQL Injection vulnerabilities in the source code through a thorough code inspection and penetration tests. The team had five members. Three of them are developers with more than two years of experience in developing database-centric, business-critical web applications in Java. The remaining two are security researchers, one junior (one year of experience) and one senior (four years working on security-related topics). Table 1 presents a summary of the vulnerabilities detected by the team (the results are the union of the vulnerabilities detected by each member). One vulnerability was counted for each web service input parameter used in a vulnerable way in a given SQL statement. We double-checked every vulnerability reported by each participant (each in the form of an example service request), which allowed us to discard a few false positives. As we can see, 3 of the services were vulnerable in version A, and one of them had 19 security flaws. This large number results from the service's many user input parameters, several of which are used in more than one SQL statement throughout the code. As expected, version B presented no security vulnerabilities. With this characterization of the services available, we proceeded to the next phases of our security improvement approach: statement learning and service protection.
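To make the contrast between the two implementations concrete, the following sketch shows the kind of concatenated query that static analyzers such as FindBugs flag (a non-constant string passed to an execute method), alongside the prepared statement pattern used in version B. The table and column names and the class InjectionContrast are hypothetical, not the actual TPC-App schema or the developers' code. The main method only prints SQL text, so no database is needed to run it; customerExists shows how the standard JDBC API binds the value.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class InjectionContrast {

    // Vulnerable pattern: the input is concatenated into the SQL text, so quotes in
    // the input change the structure of the command (hypothetical table/columns).
    static String buildConcatenatedQuery(String userName) {
        return "SELECT c_id FROM customer WHERE c_uname = '" + userName + "'";
    }

    // Safer pattern: the SQL text is a constant; the value is supplied separately.
    static final String PARAMETERIZED_QUERY = "SELECT c_id FROM customer WHERE c_uname = ?";

    static boolean customerExists(Connection connection, String userName) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(PARAMETERIZED_QUERY)) {
            statement.setString(1, userName); // treated as a value, not as SQL syntax
            try (ResultSet rows = statement.executeQuery()) {
                return rows.next();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(buildConcatenatedQuery("alice"));
        // Scanner-style probe: leaves an unterminated string literal in the command.
        System.out.println(buildConcatenatedQuery("' OR"));
        // Tautology: turns the filter into a condition that is true for every row.
        System.out.println(buildConcatenatedQuery("x' OR '1'='1"));
        // The parameterized text is identical whatever the input is.
        System.out.println(PARAMETERIZED_QUERY);
    }
}
```

Feeding the scanner-style expression ' OR into the concatenated version yields a command ending in = '' OR', which typically makes the database report a syntax error, the kind of response a scanner looks for. With the prepared statement the SQL text never changes, although an over-long value can still trigger a length error such as the one observed for version B.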

4.3 Statement Learning

In this phase, the WSDL and XML Schema (XSD) of each web service were analyzed and, for each input and output parameter, we manually extended the XSD file to include domain restrictions, using the standard XSD restriction element and fully respecting the TPC-App specification. EDEL was applied to express the final domains. The workload was defined as a set of web service requests (a total of 5 requests for the 4 services). Before continuing, we analyzed the code coverage of this workload using Cobertura [5].
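As an illustration of the kind of domain restriction described above, the sketch below declares a hypothetical integer parameter, customerId, restricted with the xs:minInclusive and xs:maxInclusive facets, and checks sample values against it using the JDK's javax.xml.validation API. The parameter name, its bounds, and the class DomainRestrictionCheck are assumptions made for illustration; they are not taken from the TPC-App specification, and this section does not state that the domains were enforced through schema validation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class DomainRestrictionCheck {

    // Hypothetical parameter and bounds, for illustration only; not the TPC-App domains.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='customerId'>"
        + "<xs:simpleType>"
        + "<xs:restriction base='xs:int'>"
        + "<xs:minInclusive value='1'/>"
        + "<xs:maxInclusive value='1000000'/>"
        + "</xs:restriction>"
        + "</xs:simpleType>"
        + "</xs:element>"
        + "</xs:schema>";

    static Schema loadSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("invalid schema", e);
        }
    }

    // Returns true if the XML fragment satisfies the declared domain restriction.
    static boolean isInDomain(Schema schema, String xml) {
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXExceptions.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // value outside the declared domain (or malformed XML)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = loadSchema();
        System.out.println(isInDomain(schema, "<customerId>42</customerId>"));       // true
        System.out.println(isInDomain(schema, "<customerId>0</customerId>"));        // false
        System.out.println(isInDomain(schema, "<customerId>1 OR 1=1</customerId>")); // false
    }
}
```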


As we can see in Table 2, the coverage is at or above 80% in all cases except one (NewProducts, version A, with 74%), a value typically accepted as representative by developers. Nevertheless, we analyzed the source code of all versions to understand which code was not being covered. In all cases, it corresponded to unused exception catch blocks. Our simple workload exercised all the useful source code, including every data access statement, but was not expected to trigger any error-handling blocks. We therefore considered the workload adequate for all services. The workload was then applied to exercise each TPC-App version in order to learn the expected SQL commands. After the learning process, we manually checked whether all possible SQL commands executed by the service application had been correctly learned by our mechanism, and that was indeed the case. Note that the learning process is central to our approach and is directly influenced by the coverage of the workload used: had any commands not been learned, we would have had to increase the size (and coverage) of the workload. The learning process was fast, taking only a few seconds.
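The learning phase, and the protective state that follows it, can be sketched as below under assumptions this section does not spell out. The sketch assumes that each SQL command is normalized (string and numeric literals replaced by ?) before a SHA-256 checksum is computed, so that legitimate requests with different values map to the same profile entry. During learning every checksum is recorded; in the protective state a command with an unknown checksum is rejected by throwing an exception. The class StatementProfile, its normalization rules, and the local SecurityRuntimeException class (named after the exception mentioned in the paper) are hypothetical and are not the authors' implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashSet;
import java.util.Set;

class StatementProfile {

    // Hypothetical stand-in for the exception named in the paper.
    static class SecurityRuntimeException extends RuntimeException {
        SecurityRuntimeException(String message) { super(message); }
    }

    enum Mode { LEARNING, PROTECTING }

    private final Set<String> learnedChecksums = new HashSet<>();
    private Mode mode = Mode.LEARNING;

    void enterProtectiveState() { mode = Mode.PROTECTING; }

    // Assumption: string and numeric literals are replaced by '?' so that only
    // the structure of the command contributes to its checksum.
    static String normalize(String sql) {
        return sql.replaceAll("'(?:[^']|'')*'", "?")
                  .replaceAll("\\b\\d+(?:\\.\\d+)?\\b", "?")
                  .replaceAll("\\s+", " ")
                  .trim();
    }

    static String checksum(String sql) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(normalize(sql).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Meant to be called immediately before a command is sent to the database.
    void check(String sql) {
        String sum = checksum(sql);
        if (mode == Mode.LEARNING) {
            learnedChecksums.add(sum);
        } else if (!learnedChecksums.contains(sum)) {
            throw new SecurityRuntimeException("Unknown command structure: " + sql);
        }
    }

    public static void main(String[] args) {
        StatementProfile profile = new StatementProfile();
        profile.check("SELECT c_id FROM customer WHERE c_uname = 'alice'"); // learned
        profile.enterProtectiveState();
        profile.check("SELECT c_id FROM customer WHERE c_uname = 'bob'");   // same structure: allowed
        try {
            profile.check("SELECT c_id FROM customer WHERE c_uname = 'x' OR '1'='1'");
        } catch (SecurityRuntimeException e) {
            System.out.println("blocked: " + e.getMessage());
        }
    }
}
```

In a real deployment such a check would have to run right before each command reaches the database, for example through code instrumentation as in the paper's prototype; how the prototype actually intercepts and normalizes commands is not shown here.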

4.4 Improving Security

After the learning phase, we configured our mechanism to enter the protective state and detect maliciously modified commands. We then used the vulnerability scanner to retest all services, and it disclosed no SQL/XPath Injection vulnerabilities in any of them. For instance, the original, unprotected version A presented a vulnerability in the Change Payment Method service: when the scanner replaced a particular parameter with (‘), the result was a ‘quoted string not properly terminated’ database error message. With our protection mechanism in place, such a request produces a command whose checksum was not observed during the learning phase. This and all other malicious requests were stopped, preventing any further service execution and its possible security consequences. Security tests over version B produced the same erroneous results discussed earlier, so for our purposes the total number of security issues is zero.

We did not rerun FindBugs: because our mechanism relies on instrumentation, static analysis cannot detect that it blocks the execution of particular data access statements. Instead, we replayed all malicious requests crafted by our code inspection participants. Our mechanism aborted every attempt to inject SQL code by throwing a SecurityRuntimeException.

To verify whether the security improvement mechanism changed the functionality of the web services, we reran the workload for both versions and analyzed the web service responses to identify potential deviations from the valid output domains. As expected, no problem was identified, which strongly indicates that our framework did not change the application's normal behavior.

Table 2. Workload coverage.

Web Service            Coverage (A)   Coverage (B)
ChangePaymentMethod    92%            91%
NewCustomer            80%            93%
NewProducts            74%            87%
ProductDetail          94%            87%

Additionally, we executed a final test to assess the performance impact of executing the security mechanism. As we expected small values, we tested the worst-case scenario found in the TPC-App services and executed 100,000 invocations of it. The security mechanism took on average 0.052 ms (± 0.029 ms) to execute, less than 0.3% of the total execution time of the fastest service. To obtain such fine-grained measurements, we used a Java method that provides nanosecond precision (although it does not guarantee nanosecond accuracy).

In summary, our learning mechanism stopped all security attacks with negligible overhead. This result is significant because, besides effectively securing the target application, it required no extra effort from the developers who implemented the original services.
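The overhead measurement described above (mean and standard deviation over many invocations, expressed as a share of a service's execution time) can be sketched as follows. The sketch uses System.nanoTime, which offers nanosecond precision but not necessarily nanosecond accuracy; that the authors used this particular method is an assumption. The measured task is a hypothetical stand-in for the protection check, and the warm-up count is arbitrary. As a sanity check on the reported figures, an average of 0.052 ms being less than 0.3% of the fastest service's time implies that this service took more than about 17 ms (0.052 / 0.003 ≈ 17.3 ms).

```java
import java.util.Locale;

class OverheadBenchmark {

    // Mean and sample standard deviation, both in milliseconds.
    static final class Stats {
        final double meanMs;
        final double stdDevMs;
        Stats(double meanMs, double stdDevMs) { this.meanMs = meanMs; this.stdDevMs = stdDevMs; }
    }

    static volatile int sink; // keeps the JIT from discarding the measured work

    // Times each invocation individually, in nanoseconds.
    static long[] measure(Runnable task, int invocations) {
        long[] samples = new long[invocations];
        for (int i = 0; i < invocations; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    static Stats summarize(long[] samplesNanos) {
        double sum = 0;
        for (long s : samplesNanos) sum += s;
        double mean = sum / samplesNanos.length;
        double squares = 0;
        for (long s : samplesNanos) squares += (s - mean) * (s - mean);
        double variance = samplesNanos.length > 1 ? squares / (samplesNanos.length - 1) : 0.0;
        return new Stats(mean / 1e6, Math.sqrt(variance) / 1e6);
    }

    // Share of a service's total execution time taken by the mechanism, in percent.
    static double overheadPercent(double mechanismMs, double serviceMs) {
        return 100.0 * mechanismMs / serviceMs;
    }

    public static void main(String[] args) {
        String sql = "SELECT c_id FROM customer WHERE c_uname = 'alice'";
        // Hypothetical stand-in for the per-command protection check.
        Runnable check = () -> sink = sql.replaceAll("'(?:[^']|'')*'", "?").hashCode();
        measure(check, 10_000); // warm-up, results discarded
        Stats stats = summarize(measure(check, 100_000));
        System.out.println(String.format(Locale.ROOT, "%.3f ms (+/- %.3f ms)", stats.meanMs, stats.stdDevMs));
    }
}
```

Locale.ROOT keeps the decimal separator a point regardless of the machine's locale.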

5 Conclusion

Previous work on web application security has shown that SQL/XPath Injection attacks are extremely relevant in web service applications. This paper presents an approach for improving web services security. The proposed approach consists of learning the profile of valid data access statements (SQL and XPath) and using this profile to later prevent the execution of malicious client requests. The approach was illustrated using two different TPC-App implementations, in which various security issues were disclosed and corrected without additional development effort. While introducing an extremely low performance overhead, our approach proved to be 100% effective, as it aborted all attacks attempted in our experiments. During the whole experimental process, no extra complexity was added to the source code. Moreover, as the source code is not needed, the mechanism can also be used to easily protect legacy services, which would otherwise require procedures that are difficult to implement and hard to maintain. These characteristics make it a very useful tool for developers and service administrators.

References

1. Acunetix Web Vulnerability Scanner, http://www.acunetix.com/vulnerabilityscanner/
2. Atlassian Clover Code Coverage Analysis, http://www.atlassian.com/software/clover/
3. Bravenboer, M., Dolstra, E., Visser, E.: Preventing injection attacks with syntax embeddings. Proceedings of the 6th International Conference on Generative Programming and Component Engineering, Salzburg, Austria, ACM, pp. 3-12 (2007)
4. Burp Suite, http://portswigger.net/suite/
5. Cobertura, http://cobertura.sourceforge.net/
6. Curbera, F. et al.: Unraveling the Web services web: an introduction to SOAP, WSDL, and UDDI. IEEE Internet Computing, vol. 6, pp. 86-93 (2002)
7. Databene benerator, http://databene.org/databenebenerator
8. Erl, T.: Service-Oriented Architecture: Concepts, Technology, and Design. Prentice Hall Professional Technical Reference (2005)
9. Fagan, M.: Design and code inspections to reduce errors in program development. Software Pioneers: Contributions to Software Engineering, Springer-Verlag, pp. 575-607 (2002)
10. FORTIFY, http://www.fortifysoftware.com/
11. Gamja, http://lastlog.com/p4ssion/
12. Halfond, W., Orso, A.: Preventing SQL injection attacks using AMNESIA. 28th International Conference on Software Engineering, Shanghai, China, ACM, pp. 795-798 (2006)
13. Hovemeyer, D., Pugh, W.: Finding bugs is easy. ACM SIGPLAN Notices, vol. 39, pp. 92-106 (2004)
14. HP WebInspect, http://www.hp.com
15. IBM Rational AppScan, http://www01.ibm.com/software/awdtools/appscan/
16. Kiczales, G. et al.: Aspect-Oriented Programming. 11th European Conference on Object-Oriented Programming (1997)
17. Laranjeiro, N., Vieira, M., Madeira, H.: EDEL and Security Improvement for Web Services. http://eden.dei.uc.pt/~cnl/papers/edelsecuritytool.zip (2009)
18. Laranjeiro, N., Vieira, M.: Improving Web Services Robustness. Technical Report, http://eden.dei.uc.pt/~cnl/papers/2009icwsrobustnesssubmitted.pdf (2009)
19. Livshits, V., Lam, M.: Finding security vulnerabilities in Java applications with static analysis. Proceedings of the 14th USENIX Security Symposium, Volume 14, Baltimore, MD, USENIX Association, pp. 18-18 (2005)
20. McKinsey & Company: Enterprise Software Customer Survey (2008)
21. Ounce, http://www.ouncelabs.com/
22. Pixy, http://pixybox.seclab.tuwien.ac.at/pixy/
23. Red Hat Middleware: JBoss Application Server, http://www.jboss.org/jbossas/
24. Stuttard, D., Pinto, M.: The Web Application Hacker's Handbook: Discovering and Exploiting Security Flaws. Wiley, ISBN-10: 0470170778 (2007)
25. Sun Microsystems Inc.: JAX-WS Reference Implementation, https://jaxws.dev.java.net/
26. The Eclipse Foundation: The AspectJ Project, http://www.eclipse.org/aspectj/ (2008)
27. Thomas, S., Williams, L., Xie, T.: On automated prepared statement generation to remove SQL injection vulnerabilities. Information and Software Technology, vol. 51, pp. 589-598 (2009)
28. Thomas, S., Williams, L.: Using Automated Fix Generation to Secure SQL Statements. Third International Workshop on Software Engineering for Secure Systems (2007)
29. Transaction Processing Performance Council: TPC Benchmark™ App (Application Server) Standard Specification, Version 1.1, http://www.tpc.org/tpc_app/ (2005)
30. Vieira, M., Antunes, N., Madeira, H.: Using Web Security Scanners to Detect Vulnerabilities in Web Services. International Conference on Dependable Systems and Networks, Estoril, Lisbon (2009)
31. W3C: W3C XML Schema, http://www.w3.org/XML/Schema (2008)
32. W3C: XQuery 1.0 and XPath 2.0 Functions and Operators, http://www.w3.org/TR/xqueryoperators/ (2008)
