
Efficient Testing of Service-Oriented Applications Using Semantic Service Stubs

Senthil Mani ([email protected])
Vibha Singhal Sinha ([email protected])
Saurabh Sinha ([email protected])
Pankaj Dhoolia ([email protected])
Debdoot Mukherjee ([email protected])
Soham Chakraborty ([email protected])

IBM India Research Lab

Abstract

Service-oriented applications can be expensive to test because services are hosted remotely, are potentially shared among many users, and may have costs associated with their invocation. In this paper, we present an approach for reducing the costs of testing such applications. The key observation underlying our approach is that certain aspects of an application can be tested using locally deployed semantic service stubs, instead of actual remote services. A semantic service stub incorporates some of the service functionality, such as verifying preconditions and generating output messages based on postconditions. We illustrate how semantic stubs can enable the client test suite to be partitioned into subsets, some of which need not be executed using remote services. We also present a case study that demonstrates the feasibility of the approach, and potential cost savings for testing. The main benefits of our approach are that it can (1) reduce the number of test cases that need to be run to invoke remote services, and (2) ensure that certain aspects of application functionality are well-tested before service integration occurs.

1 Introduction

Service-oriented computing promises greater flexibility and efficiency in application development by enabling applications to be composed using third-party services; however, such applications can be difficult to test. Several factors, such as multiple runtime configurations, remote hosting of services, lack of access to service source code, and unanticipated changes in service semantics, present challenges in testing service-oriented applications [6]. For example, lack of access to source code prevents the use of code-based testing techniques; lack of control over service evolution can complicate regression testing.

Another significant factor, which we address in this paper, is the cost of testing service-oriented applications. Testing a service-oriented application involves invoking services, which is inherently expensive because services are remotely hosted (therefore, susceptible to network traffic and downtime) and shared by many users (therefore, dependent on server load). Moreover, many services have restrictions on, or costs associated with, their invocation (e.g., a pay-per-use model of service usage). Therefore, running a large test suite incurs costs—both time and monetary—that could be lowered through a better selection of the test cases that invoke remote services. Service providers may have an interest in limiting testing activities too. Execution of large test suites, containing many unnecessary and invalid test cases, by a client can lead to wasteful resource consumption and degraded performance, which affects all clients.^1

Much of the existing research on services testing has focused on developing automation frameworks and techniques for test-data generation (e.g., [1, 13, 18, 22, 26]). Other research has addressed problems such as regression testing (e.g., [3]), contract-based testing (e.g., [9, 23, 25]), and conformance and interoperability (e.g., [2, 11]). However, there has been only limited investigation of how the costs of testing service-oriented applications can be reduced.

In this paper, we present a novel approach for lowering the costs of testing service-oriented applications. We address the problem from the service-consumer perspective.^2 The main observation underlying our approach is that certain aspects of testing a service application can be performed using locally deployed semantic service stubs. A semantic service stub incorporates some of the functionality of the service that it represents. Examples of such functionality include verifying preconditions and returning appropriate response and exception messages.

Existing research has proposed different types of behavioral contracts, such as preconditions and postconditions, that can be associated with service interfaces [9, 10, 22]. Such contracts are typically used for generating test cases and test oracles. In our approach, we leverage behavioral contracts to generate semantic service stubs that contain code to simulate the behavior specified in the contracts.

^1 Execution of unreasonably large test suites may in fact be seen as a denial-of-service attack [5].

^2 Reference [6] provides a good overview of the issues in testing of service-centric systems from the perspectives of different stakeholders.


Figure 1. Overview of our approach, with two scenarios: (a) the service provider generates semantic stubs and sample test data, which are downloaded by the consumers and used locally for testing; (b) the service provider distributes only annotated WSDL, which the consumers download and use to generate semantic stubs and test data locally, using their own code generator and test-data generator.


To generate semantic stubs, service providers create semantic annotations for the exposed services, which are added to the service source code or XML descriptions. Next, automated translators are used to analyze the annotations and generate the stub source code. In this paper, we illustrate these steps using the Java Modeling Language (JML) [16] to specify service semantics at the source-code level, and the JML compiler (JMLC) to generate stub code automatically from the JML specifications.

Service consumers can use such stubs to test their applications before integrating them with the actual remote services. This approach has several benefits. First, test cases that are invalid because they violate service contracts can be detected, and fixed or filtered, before the invocation of remote services. Second, local testing using stubs can reveal faults in the application, which can be fixed to make the application more robust before it is integrated with remote services. Third, because stubs are deployed locally, they can be used early and repeatedly in the application development cycle, without concerns regarding the costs of remote service invocations.

We present a classification of test cases for the client application. Using the classification, a client test suite can be partitioned into different categories, such that test cases from some of the categories can be run using local semantic stubs only, and need not be rerun after the application has been integrated with remote services.

To demonstrate the feasibility of our approach and evalu-ate its benefits, we implemented a prototype and performedtwo case studies. The results of the studies indicate that ourapproach can (1) reduce the size of the test suite that needsto be run on remote services, (2) lead to early detection offaults, and (3) result in faster execution of test cases.

The main contributions of the paper are

• A novel approach that uses semantic service stubs to improve the efficiency and effectiveness of testing service-oriented applications

• The results of a case study that demonstrate the feasibility and benefits of the technique

In the next section, we provide an overview of the approach and illustrate semantic stub generation. Section 3 discusses how semantic stubs can improve the testing of service-oriented applications. Section 4 presents the empirical results. Section 5 discusses related research; finally, Section 6 summarizes our work and lists directions for future research.

2 Semantic Service Stubs

The central feature of our approach is the use of semantic service stubs that incorporate some of the functionality of the actual services, and that can be deployed locally by service consumers for testing their applications. Typical stubs incorporate little or no semantics of the software components that they represent and, therefore, do little to enable meaningful testing of the applications from which they are invoked.

A semantic service stub extends the notion of behavioral mock objects [24] to the domain of web services; it is based on design-by-contract principles, in which a formal contract is associated with each software component [19]. The elements in a contract include preconditions that a client of the component must satisfy to obtain the services of the component, and postconditions that the component guarantees to the client. A semantic service stub contains code that verifies whether the inputs to a service operation are valid, and generates appropriate output and exception messages based on the postconditions of the operation. Additionally, it can contain code to enforce invocation sequences and state dependences. In fact, any functionality that is orthogonal to the business logic of a service could be incorporated into a service stub. In practice, however, the decision to include some functionality ought to be based on its usefulness for testing, and the extent to which the functionality is amenable to automated extraction and incorporation into stubs. Pragmatic concerns would also limit the functionality to what the service providers are willing to reveal in a service stub.
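To make the idea concrete, the following hand-written sketch (ours, not from the paper) shows a semantic stub for a shopping-cart service, in the spirit of the running example of Section 2.2. The class and method names are hypothetical; the stub checks a precondition, enforces an invocation-sequence constraint (a cart must be created before items are added to it), and returns a canned response.

import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a semantic stub that enforces an
// invocation-sequence constraint: items can be added only to a
// cart that was previously created through the stub.
public class CartServiceStub {
    private final Set<String> createdCarts = new HashSet<>();

    public String createCart() {
        String cartId = "cart-" + createdCarts.size(); // deterministic id for testing
        createdCarts.add(cartId);
        return cartId;
    }

    public String cartAdd(String cartId, int qty) {
        if (qty <= 0)                                  // precondition check
            throw new IllegalArgumentException("qty must be > 0");
        if (!createdCarts.contains(cartId))            // state dependence
            return null;                               // unknown cart: error response
        return "<Item>added</Item>";                   // canned success response
    }
}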

In this section, first, we provide an overview of our approach (Section 2.1). Then, we illustrate the use of preconditions and postconditions for annotating services (Section 2.2). Finally, we discuss how stubs can be generated from such annotations (Section 2.3).

2.1 Overview of our approach

Figure 1 presents an overview of the approach; it illustrates two scenarios. In the first scenario, service providers generate semantic stubs and sample test data, and make them available for download. To generate semantic stubs, the provider infers semantic annotations for the service. This can be done in a semi-automated manner using program-analysis techniques. A code analyzer can infer potential pre- and postconditions from the service source code, which can be manually refined. Next, the provider adds the inferred annotations to the service source code or, alternatively, to the WSDL specifications.

The stub generator consists of a code generator and a test-data generator. There are two alternative sets of inputs to the stub generator. In the first alternative, the stub generator takes as inputs the annotated service source code and the standard WSDL specifications. The code generator automatically generates the stub code from the source-code annotations. The test-data generator parses the WSDL and the source-code annotations to generate test data that are returned if the operations of the service are invoked correctly. In the second alternative, the stub generator takes as input the WSDL specifications with embedded semantic annotations. In this case, the code generator uses the WSDL annotations to generate the stub code.

Service consumers can download the service stubs, along with the test data, and deploy them locally, so that the client applications invoke the service stubs instead of the remote services. Thus, the test cases for a client application can be executed in this environment, without incurring the costs of invoking remote services during the early phases of application development and testing.

In the second scenario, shown at the bottom in Figure 1, the service provider provides only the WSDL with embedded semantic annotations. Service consumers download the WSDL, and use their own custom, or another third-party, stub generator to process the annotated WSDL. Thus, in this scenario, the providers do not host the stubs for download (unlike in the first scenario). This scenario offers flexibility to the consumers in letting them select the stub generator, instead of being restricted to using pre-generated stubs.

public interface AmazonAPISpecification {

    /*@ requires service != null && service != "";
      @ requires AWSAccessKeyId != null && AWSAccessKeyId != "";
      @ requires subscriptionId != null && subscriptionId != "";
      @ requires operation != null && operation != "";
      @ requires ASIN != null && ASIN != "";
      @ requires cartId != null && cartId != "";
      @ requires HMAC != null && HMAC != "";
      @ requires item.qty > 0;
      @ ensures isValid(cartId) ==> \result.indexOf("PurchaseURL") >= 1;
      @ ensures isValid(cartId) ==> \result.indexOf("Item") >= 1;
      @ ensures !isValid(cartId) ==> \result == null;
      @*/
    public String cartAdd(String service, String AWSAccessKeyId,
        String subscriptionId, String operation, String ASIN,
        String cartId, String HMAC, Item item);

    public /*@ pure @*/ boolean isValid(String id);
}

Figure 2. Sample Amazon Associate service annotated with pre-, post-, and exception conditions using JML.


As the service code evolves, the stubs need to be updated so that they remain consistent with the service code. Therefore, when a new service version is made available, providers have to ensure that the stubs are regenerated from the new code; moreover, clients that migrate to the new version have to update the stubs as well.

2.2 Semantic annotations

We illustrate semantic annotations using the Java Modeling Language (JML) [16], a behavioral interface specification language for Java. (Other approaches for behavioral specification, such as iContract [15], could be used too.) JML includes an extensive set of constructs for specifying various aspects of a component's behavior, such as method preconditions, postconditions, and exception conditions, class invariants, loop invariants, etc.^3
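As a generic illustration of these constructs (this fragment is ours, not drawn from the paper's subject services), a JML-annotated method pairs each clause kind with the behavior it constrains:

// Generic JML illustration: @requires states the precondition,
// @ensures the normal postcondition, and @signals the postcondition
// that must hold when the listed exception is thrown.
public class Account {
    private int balance = 0;

    /*@ requires amount > 0;
      @ ensures balance == \old(balance) - amount;
      @ signals (IllegalStateException e) balance == \old(balance);
      @*/
    public void withdraw(int amount) {
        if (amount > balance) throw new IllegalStateException("insufficient funds");
        balance -= amount;
    }
}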

Source code annotations

In JML, preconditions, postconditions, and exception conditions are specified using the @requires clause, the @ensures clause, and the @signals clause, respectively. Figure 2 presents the interface specifications for a sample service, cartAdd, from the Amazon Associate Web Services.^4 Service cartAdd takes as inputs (among others) a cart id and an item id, and adds the item to the cart. The figure lists the preconditions for the input parameters. For example, the preconditions for the first seven parameters state that these parameters can be neither null nor the empty string.

^3 http://www.eecs.ucf.edu/~leavens/JML/jmlrefman/jmlrefman_toc.html
^4 http://aws.amazon.com/associates


Similarly, the last precondition states that the quantity of the item to be added must be greater than zero. The @ensures clauses for cartAdd() establish constraints on the return value. The clauses state that if the cart id is valid, the returned string contains both the substring "PurchaseURL" and the substring "Item"; however, if the cart id is invalid, null is returned.

Discussion

Adding service annotations to either the code or the WSDL requires the service provider to provide this information. It may appear that this places an unreasonable burden on service providers. However, many of the widely used services, such as the Google and Amazon services, have extensive text-based documentation that provides service details with respect to required parameters, valid requests, sample requests, exception handling, etc. Intra-enterprise services are typically developed by following the conventional software-development life cycle, in which the design phase formally captures service specifications in models, such as UML. Therefore, the available service documentation and models can be leveraged for creating semantic annotations.

Moreover, automated program-analysis techniques, such as symbolic execution [14], can be used to analyze the service source code to recover potential contracts, such as exception conditions (e.g., [4]). Automatically inferred contracts can assist service providers in creating semantic annotations and in ensuring that the annotations are consistent (i.e., do not contradict the contracts present in the code) and complete (i.e., do not miss contracts implicit in the code).

2.3 Stub and test-data generation

After JML annotations have been added to the source code, stubs can be generated automatically using the standard JML compiler, JMLC. JMLC parses the annotations and generates Java code that contains assertions to validate the pre- and postconditions. Thus, in a JML-based implementation of our approach (see Figure 1), JMLC can be used as the code-generator component.
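The generated checking code is conceptually simple. The fragment below is a hand-written approximation (not actual JMLC output, whose generated code and runtime classes differ) of the precondition checks that runtime assertion checking would introduce for some of the @requires clauses of cartAdd in Figure 2:

// Hand-written approximation of JMLC-style precondition checking
// for cartAdd (Figure 2); real JMLC output uses its own runtime
// classes and also checks the postconditions.
public class CartAddStub {
    static class Item { int qty; }  // minimal stand-in for the Item type

    public String cartAdd(String service, String awsAccessKeyId, String subscriptionId,
            String operation, String asin, String cartId, String hmac, Item item) {
        check(service != null && !service.isEmpty(), "service != null && service != \"\"");
        check(awsAccessKeyId != null && !awsAccessKeyId.isEmpty(), "AWSAccessKeyId");
        check(cartId != null && !cartId.isEmpty(), "cartId");
        check(item != null && item.qty > 0, "item.qty > 0");
        // ... the remaining @requires clauses are checked analogously ...
        return isValid(cartId)
                ? "<PurchaseURL>...</PurchaseURL><Item>" + asin + "</Item>" // @ensures case
                : null;                                                     // !isValid case
    }

    boolean isValid(String cartId) { return cartId.startsWith("cart-"); }

    private static void check(boolean holds, String clause) {
        if (!holds) throw new IllegalStateException("precondition violated: " + clause);
    }
}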

The test-data generator can analyze the postconditions of an operation and, using a constraint solver such as POOC [21], generate meaningful test data that can be returned when the operation is invoked with no precondition violations. Moreover, the test-data generator can identify equivalence classes in the output domain of an operation, and generate data to ensure coverage of each of the classes. To illustrate, consider the postconditions for cartAdd(). By analyzing the postconditions, the test-data generator can determine that the output is either null, or a non-empty string that contains two substrings, "PurchaseURL" and "Item", with no specific ordering between them. Therefore, the generator could create three return values: (1) null; (2) a string in which "PurchaseURL" precedes "Item" (<random string>PurchaseURL<random string>Item<random string>); and (3) a string in which "Item" precedes "PurchaseURL" (<random string>Item<random string>PurchaseURL<random string>). Automated specification-based test-generation techniques (e.g., [7]) can be used to ensure adequate coverage of the properties specified as postconditions.
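A minimal sketch of such equivalence-class-based data generation for cartAdd (the class and helper names are ours; a real generator would derive the classes from the parsed annotations with a constraint solver):

import java.util.Arrays;
import java.util.List;

// Sketch: one representative return value per output equivalence
// class implied by cartAdd's postconditions.
public class CartAddResponseData {
    private static String rand() { return "x"; } // stand-in for a random filler string

    public static List<String> responses() {
        return Arrays.asList(
            (String) null,                                     // class 1: invalid cart id
            rand() + "PurchaseURL" + rand() + "Item" + rand(), // class 2: PurchaseURL before Item
            rand() + "Item" + rand() + "PurchaseURL" + rand()  // class 3: Item before PurchaseURL
        );
    }
}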

Table 1. Classification of client test cases into five categories, based on (1) the outcome of a test case on ALS and (2) whether the test case violates service contracts.

Test suite       Description
1. Tfail(inu)    Failing test cases with invalid service inputs that are provided by the user (reveal an application fault in filtering invalid inputs or in handling service exceptions)
2. Tfail(ina)    Failing test cases with invalid service inputs that are generated in the application (reveal an application fault in creating input messages)
3. Tfail(v)      Failing test cases with valid service inputs (reveal an application fault in code that is unrelated to the processing of service inputs and exceptions)
4. Tpass(in)     Passing test cases with invalid service inputs (designed to test exception-handling code)
5. Tpass(v)      Passing test cases with valid service inputs
Tinitial         Initial test suite, to be run on ALS
Tfinal           Final test suite, to be run on ARS (Tfail(ina) + Tfail(v) + Tpass(v))


The semantic service stubs, along with the test data, can be locally deployed by clients and used, instead of the remote services, during initial testing of their applications.

3 Stub-Based Testing

Testing a service-oriented application requires that all interactions between the application and the invoked services are tested adequately.^5 The goals of such testing include: validating the code that creates input messages sent to the invoked services; validating the code that parses output messages from the services; ensuring that all service exception conditions are processed correctly; and, finally, verifying that the service provides the business logic expected by the application. Some of these goals can be accomplished by using locally deployed semantic service stubs. (In the subsequent discussion, we use ARS to refer to the client application integrated with remote services, and ALS to refer to the application integrated with local semantic stubs.)

The use of semantic service stubs can reduce the costs of testing client applications and potentially lead to early detection of faults. To illustrate these benefits, we present a classification of client test cases, shown in Table 1. The classification is based on the outcome of a test case (when executed on ALS) and on whether the test case violates service contracts. A typical test suite for an application can include randomly generated test cases, code-coverage-based test cases, and functional test cases; we call this test suite Tinitial. We partition this test suite into five categories.

^5 In this paper, we focus on functional testing, and do not consider other types of testing, such as performance testing.

The first three test-suite categories in Table 1 contain test cases that fail when executed on ALS. First, executions of some of the test cases may fail because, in those executions, the application invokes the service with invalid inputs that are provided by the user (Tfail(inu)). Such test cases reveal faults in the application: either the application should filter out such invalid user inputs, or it should handle the exceptions that are returned by the service in response to the invalid inputs. Second, some test cases may fail because they invoke the service with invalid inputs that are generated in the application (Tfail(ina)). Such test cases reveal faults in the application code that creates the input messages; for example, the application may be formatting valid user inputs incorrectly before invoking the service. Third, some test cases may fail because of faults in the application code that is unrelated to the interface with the service—e.g., code unrelated to creating input messages and handling exception conditions (Tfail(v)). Test cases in this category invoke the service with valid inputs.

Semantic service stubs can enable such faults to be detected without invoking remote services. Thus, the faults can be fixed so that the application is more robust when it is integrated with remote services, which avoids discovery of the faults on ARS and the consequent re-execution of the failing test cases on ARS after the fault repair.

The fourth category of test cases (Tpass(in)) includes test cases that invoke the services with invalid inputs (either user-provided or application-generated) but that do not fail. Such test cases are designed to exercise the application's behavior when it receives exception conditions from the service, and are essential for testing the robustness of the application. Finally, the remaining test cases (Tpass(v)) neither violate the service input constraints nor fail.
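As an illustration, a test case in Tpass(in) might deliberately feed an invalid item quantity to the application and assert that the service's exception response is handled gracefully. The sketch below assumes a hypothetical client class CartClient, a hypothetical error message, and a locally deployed stub behind the client:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Sketch of a Tpass(in) test case: it drives the application with an
// invalid service input (quantity of zero) and passes only if the
// application surfaces the stub's exception response gracefully.
// CartClient and its methods are hypothetical.
public class CartClientExceptionTest {
    @Test
    public void addWithZeroQuantityIsReportedToUser() {
        CartClient client = new CartClient(); // configured to invoke the local stub
        String message = client.addItemToCart("cart-0", "B000123", 0);
        assertEquals("Quantity must be greater than zero", message);
    }
}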

Test cases that appear in Tfail(inu) or Tpass(in) need not be run on ARS. Test cases in Tfail(inu) invoke the application with incorrect inputs and result in failing executions; after faults have been identified and fixed, these test cases can be rerun on ALS to verify the fault repair. Tests in Tpass(in) are designed to test whether the application invokes the service incorrectly, or does not handle, or incorrectly handles, an exception response from the service; such behavior can be adequately tested using stubs. In both these cases, after the faults identified by the tests have been fixed, the tests need not be run on ARS.

Thus, one of the key benefits of our approach is that it can enable early detection of faults in the client application. Moreover, certain aspects of the application functionality can be tested adequately without invoking the remote services (and incurring the associated costs). Finally, the approach can reduce the size of the test suite that needs to be executed on ARS.

4 Case Study

To demonstrate the feasibility of our approach and evaluate its benefits, we implemented a prototype and conducted two case studies.

4.1 Experimental setup

Services and semantic service stubs. For our study, we used a set of data services from an ongoing IBM project. There are 12 key artifacts in the system, which include business processes, requirements, and key performance indicators (KPIs). For each artifact, there are REST-based services for retrieving, creating, updating, and deleting the artifact, and for linking/unlinking it to another artifact. On average, each artifact has 10 operations associated with it. The total services code consists of about 20,000 lines of Java code.

One of the authors, who was involved in the development of the services, annotated the code with JML specifications. The specifications consist of preconditions on input data and postconditions on return values. We used JMLC to generate the stubs containing assertions for validating the preconditions. Then, we manually augmented the stub code to generate return values based on the postconditions of an operation (e.g., to return values of different classes, such as null and valid values, as discussed in Section 2.3).

Client application. We wrote a client Java application that invokes the services to implement five use cases: (1) create a business process; (2) update a business process; (3) retrieve a business process; (4) link KPIs to a business process; and (5) unlink the KPIs associated with a business process. The client application consists of 5 classes and about 2,000 lines of code; two of the classes interact with the data services. The client was written by an author who was not involved in the development of the services and, therefore, was not aware of all the correct semantics for invoking the services.

We refer to the application as Abp-RS when integrated with remote services, as Abp-LS when integrated with semantic service stubs, and as Abp when the context of integration is irrelevant.

Test suite for the client application. Next, we used a random test-case generator, JCRASHER [8], to generate JUNIT test cases for the client application.^6 JCRASHER generated 47 test cases for the application. We augmented the test suite by generating 50 functional test cases.

Deployment. Finally, we deployed the services over Tomcat on a Linux server (3 GHz, 7 GB RAM); we deployed the semantic stubs as a service over Tomcat on a Windows machine (2.16 GHz, 2 GB RAM). The client application, along with the test cases, resided on the Windows machine too.

^6 We selected JCRASHER because it has been used in several empirical evaluations and is widely cited (e.g., [18, 20]).


Table 2. Number of test cases in different categories.

Tinitial  Tfail(inu)  Tfail(ina)  Tfail(v)  Tpass(in)  Tpass(v)  Tfinal
97        30          20          0         0          47        67

4.2 Study 1: Test suite reduction

Goals and method. In the first study, we investigated the reduction in the size of the test suite that must be executed on Abp-RS. Table 1 shows that test suite Tfinal contains the test cases to be executed on Abp-RS. Thus, the difference in the sizes of Tinitial and Tfinal represents the test-suite reduction that can be achieved using our approach. We executed all 97 test cases on Abp-LS. For the failing test cases, we identified the ones that were caused by faults in the application. We fixed those faults and re-executed all the test cases. By following this process, we classified the 97 test cases into the categories shown in Table 1.

Results and analysis. Table 2 shows the classification of the 97 test cases into the different categories. During the first execution of the 97 test cases, 47 passed and 50 failed. Among the passing test cases, none invoked the service operations with invalid inputs. Therefore, the 47 passing test cases belong in category Tpass(v).

On examining some of the failing executions, we found that Abp was passing invalid user inputs to the service operations. We fixed the fault by adding checks in Abp that filtered out invalid inputs. Then, we reran all 97 test cases. In this run, 77 test cases passed, whereas 20 failed. Thus, the 30 additional test cases that passed after the fault repair had previously failed because invalid user inputs were being passed to the service; we classified these 30 test cases into category Tfail(inu).

Next, we examined some of the failing executions in the second iteration, and found another fault in Abp: the application was incorrectly formatting the string containing KPIs that was input to the service operations that link and unlink KPIs. The application used a semicolon as the delimiter in formatting the input string, whereas the operation expected a comma as the delimiter. Upon fixing this bug, we reran the 97 test cases; in this iteration, all test cases passed. Thus, the 20 test cases that had failed in the previous iteration revealed a fault in input-message creation in Abp. Therefore, we added these test cases to the Tfail(ina) category.
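To illustrate the kind of defect involved (a hypothetical reconstruction; the paper does not show the actual client code), the fix amounts to changing the delimiter used when the KPI list is serialized:

import java.util.List;

// Hypothetical reconstruction of the delimiter fault and its fix.
public class KpiFormatter {
    // Faulty version: joins KPI ids with ';', but the link/unlink
    // operations expect a comma-separated list.
    static String formatKpisBuggy(List<String> kpiIds) {
        return String.join(";", kpiIds);
    }

    // Fixed version: uses the comma delimiter the service expects.
    static String formatKpis(List<String> kpiIds) {
        return String.join(",", kpiIds);
    }
}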

The final test suite, Tfinal, that needs to be executed on Abp-RS contained 67 test cases: 47 from Tpass(v) and 20 from Tfail(ina). As discussed in Section 3, test cases in Tfail(inu) need not be executed on ARS. Thus, the size of the test suite decreased from 97 to 67—a reduction of 31%.

Although limited in nature (we discuss threats to the validity of our observations in Section 4.4), the study indicates that our approach can be effective in reducing the size of the test suite that needs to be executed on ARS. Moreover, the execution of test cases on ALS can lead to early detection of faults, thus ensuring that certain aspects of the application functionality are well-tested before service integration occurs.

Figure 3. Execution time for different server load values. (Plot: execution time (s) versus server delay (s), for local semantic stubs and for remote services.)


4.3 Study 2: Execution efficiency

Goals and method. In the second study, we evaluated the improvement in the execution time of the test suite that may be achieved by using local semantic stubs. We considered two scenarios. In scenario 1, we executed the test suite using our approach: we executed Tinitial twice on Abp-LS and Tfinal once on Abp-RS. The two executions of Tinitial simulate the situation in which faults are identified during the first run and, after fault repair, the test suite is rerun to verify the fixes. We computed the total execution time as:

Tls = 2 * time(Tinitial, Abp-LS) + time(Tfinal, Abp-RS),

where time(T, A) represents the time required to run test suite T on application A.

In scenario 2, we considered the conventional test-execution environment, in which semantic stubs are not used. To compute the test-execution time for this environment, we ran Tinitial twice on Abp-RS:

Trs = 2 * time(Tinitial, Abp-RS)

Moreover, for the execution of test cases on Abp-RS, we considered varying server delays, to simulate the potential network latency and server loads that may affect test-suite execution in a real environment. To experiment with different server delays, we deployed a router on the Linux server that routes requests to the service after introducing a specified delay. We computed Tls and Trs for five values of server delay (in seconds): 0, 0.25, 0.5, 0.75, and 1.0.

Results and analysis. Figure 3 plots the execution times for the two scenarios. The horizontal axis shows the server-delay values; the vertical axis shows the execution time (in seconds) for Tls and Trs. The plot illustrates that the execution time for Trs increases significantly faster than the execution time for Tls. At zero server delay, Tls is 33 seconds, whereas Trs is 42 seconds. At a one-second delay, Trs (861 seconds) is nearly three times Tls (278 seconds). On average, the execution time for Trs is more than twice the execution time for Tls.

Thus, this study indicates that test execution using local semantic stubs can be much faster than test execution using remote services only. With larger test suites and more complex client applications, the difference between the two scenarios might be much more significant.

4.4 Discussion

The empirical results indicate that our approach can be effective in reducing the size of the test suite and in improving execution efficiency. However, there are some threats to the validity of the evaluation.

The most significant threats in our validation are threats to external validity, which arise when the results cannot be generalized. One external threat is the representativeness of our subject program and test suite. Our evaluation is based on only one application, which was written in-house by the authors; therefore, we cannot make any claims about how the results might apply to more varied services and service-oriented applications. We tried to limit this threat by using services that have been developed, and are used, in an ongoing research project. We also wrote the application by creating realistic use cases. Moreover, the application was written by an author who was not involved with the development of the services.

Our test suite may not be representative of how test cases are generated in real environments. We tried to mitigate bias in test-case generation by using a combination of random and functional test cases. However, the composition of a test suite is a significant determinant of the effectiveness of our approach, which may have affected our observations.

Another threat concerns the nature of the faults detected in the application. This threat is reduced by the fact that the faults were not seeded and, in fact, were detected unexpectedly only during test execution. A final threat to external validity is our choice of testing process, in which the entire test suite is rerun after a fault repair. A selective regression-testing technique would rerun only the test cases that are affected by the fault repair. However, we think that in environments where automated test execution is used, rerunning all test cases is not unrealistic.

To conclude, although our study is limited in nature, it illustrates the benefits of our approach. Future studies that are more elaborately designed can investigate whether the results generalize.

5 Related Work

There is much research in the area of web-services testing; however, most of the existing work focuses on test-data generation and on developing test-automation frameworks (e.g., [1, 12, 13, 18, 22, 26]).

Many researchers have presented the idea of associating contracts with web-service interfaces to support testing (e.g., [9, 10, 22, 23, 25]). Some researchers have proposed extensions to conventional contracts to enable richer behavioral specifications for web services, such as input-output dependences and invocation sequences [25], and control constructs [9]. The contracts are used to create test cases (using, for example, preconditions, control constructs, and invocation sequences) and test oracles (using postconditions) [9, 10, 22, 23]. Contracts have also been used to generate conformance tests, which validate, before registering a service, that the service delivers the expected functionality [2, 11]. Our approach differs from these techniques in that it leverages contracts to generate semantic stubs that are used by service consumers for testing their applications. Unlike existing research, the goal of our work is to improve the testing of client applications—by enabling test-suite reduction, efficient execution of test cases, and potential early detection of defects.

Pacheco and Ernst [20] present a classification of test cases as illegal, normal operation, or fault-revealing. Their approach uses an operational model of a system (consisting of properties that hold at method entry and exit) to classify a candidate input based on the properties that are violated. The goal of their approach is to create a reduced set of fault-revealing test cases from a large test suite. Our classification of test cases is more fine-grained than their classification: for example, the test inputs classified as normal operation correspond to Tpass(in) and Tpass(v) in our classification. Our classification is intended to capture how test-case properties determine whether a test case should be executed on remote services, and thus to illustrate the test-suite reduction that can be attained using our approach.

Using mock objects is a well-known technique for substituting parts of a program that are irrelevant for testing a unit [17]. Tillmann and Schulte [24] present a technique for generating mock objects that incorporate behavior, by analyzing the client test cases and applications that use the mocked objects. Our approach applies the notion of behavioral mock objects—in the form of semantic stubs—to the domain of service-oriented applications. In this domain, semantic stubs offer the benefit of reducing the costs of testing by enabling fewer invocations of remote web services.

6 Summary and Future Work

In this paper, we presented an approach that uses semantic service stubs for improving the efficiency and effectiveness of testing service-oriented applications. Our approach can be used to partition the client test suite into subsets such that not all subsets need to be executed using remote services. The results of our study indicate that the approach can not only reduce the number of test cases that must be executed on remote services, but also lead to early detection of faults, especially those related to the application's interaction with the services.

In future work, semantic service annotations could be enhanced beyond preconditions, postconditions, and exception conditions to include other aspects of service behavior, such as invocation-sequence constraints and state-based constraints. To facilitate the use of our approach in practice, automated support for generating service annotations and semantic stubs is important. Thus, future work could investigate different ways of automating the process of extracting service semantics. Program-analysis techniques can be used to infer implicit contracts from source code (e.g., [4, 27]), which can help ensure that the service contracts are complete and consistent. Moreover, the documentation available in HTML pages for services, such as those provided by Google and Amazon, can be parsed using screen-scraping technologies to recover service contracts. Finally, future experimental studies, using a wider variety of services and applications, could evaluate how our results generalize.

References

[1] X. Bai, W. Dong, W.-T. Tsai, and Y. Chen. WSDL-based automatic test case generation for web services testing. In Proc. of the Intl. Workshop on Service-Oriented Syst. Eng., pages 207–212, Oct. 2005.

[2] A. Bertolino and A. Polini. The audition framework for testing web services interoperability. In Proc. of the 31st EUROMICRO Conf. on Softw. Eng. and Advanced Applications, pages 134–142, Aug. 2005.

[3] M. Bruno, G. Canfora, M. Di Penta, G. Esposito, and V. Mazza. Using test cases as contract to ensure service compliance across releases. In Proc. of the Intl. Conf. on Service-Oriented Computing, pages 87–100, Dec. 2005.

[4] R. P. L. Buse and W. R. Weimer. Automatic documentation inference for exceptions. In Proc. of the Intl. Symp. on Softw. Testing and Analysis, pages 273–281, July 2008.

[5] G. Canfora and M. Di Penta. SOA: Testing and self-checking. In Proc. of the Intl. Workshop on Web Services — Modeling and Testing, pages 3–12, July 2006.

[6] G. Canfora and M. Di Penta. Testing services and service-centric systems: Challenges and opportunities. IT Pro, 8(2):10–17, Mar.–Apr. 2006.

[7] J. Chang and D. J. Richardson. Structural specification-based testing: Automated support and experimental evaluation. In Proc. of the 7th European Softw. Eng. Conf. and 7th ACM SIGSOFT Symp. on the Foundations of Softw. Eng., pages 285–302, Sept. 1999.

[8] C. Csallner and Y. Smaragdakis. JCrasher: An automatic robustness tester for Java. Software—Practice and Experience, 34(11):1025–1050, Sept. 2004.

[9] G. Dai, X. Bai, Y. Wang, and F. Dai. Contract-based testing for web services. In Proc. of the 31st Intl. Comp. Softw. and Applications Conf., pages 517–526, July 2007.

[10] R. Heckel and M. Lohmann. Towards contract-based testing of web services. In Proc. of the Intl. Workshop on Test and Analysis of Component-Based Syst., volume 116 of Electr. Notes in Theor. Comput. Sci., pages 145–156, Jan. 2005.

[11] R. Heckel and L. Mariani. Automatic conformance testing of web services. In Proc. of Fundamental Approaches to Softw. Eng., pages 34–48, Apr. 2005.

[12] H. Huang, W.-T. Tsai, R. Paul, and Y. Chen. Automated model checking and testing for composite web services. In Proc. of the 8th Intl. Symp. on Object-Oriented Real-Time Distributed Computing, pages 300–307, May 2005.

[13] C. Keum, S. Kang, I.-Y. Ko, J. Baik, and Y.-I. Choi. Generating test cases for web services using extended finite state machine. In Proc. of the 16th IFIP Intl. Conf. on Testing Communicating Systems, pages 103–117, May 2006.

[14] J. C. King. Symbolic execution and program testing. Commun. ACM, 19(7):385–394, July 1976.

[15] R. Kramer. iContract — the Java design by contract tool. In Proc. of TOOLS 26: Technology of Object-Oriented Languages, pages 295–307, Aug. 1998.

[16] G. T. Leavens and Y. Cheon. Design by contract with JML. http://www.eecs.ucf.edu/~leavens/JML/jmldbc.pdf.

[17] T. Mackinnon, S. Freeman, and P. Craig. Endo-testing: Unit testing with mock objects. Addison-Wesley Longman Publishing Co., Inc., 2001.

[18] E. Martin, S. Basu, and T. Xie. Automated testing and response analysis of web services. In Proc. of the Intl. Conf. on Web Services, pages 647–654, July 2007.

[19] B. Meyer. Applying "design by contract". IEEE Computer, 25(10):40–51, Oct. 1992.

[20] C. Pacheco and M. D. Ernst. Eclat: Automatic generation and classification of test inputs. In Proc. of the 19th European Conf. on Object-Oriented Programming, pages 504–527, July 2005.

[21] H. Schlenker and G. Ringwelski. POOC: A platform for object-oriented constraint programming. In Proc. of the Joint ERCIM/CologNet Intl. Workshop on Constraint Solving and Constraint Logic Programming, pages 159–170, June 2002.

[22] H. M. Sneed and S. Huang. WSDLTest — a tool for testing web services. In Proc. of the Intl. Symp. on Web Site Evolution, pages 14–21, Sept. 2006.

[23] N. Tillmann and J. de Halleux. White-box testing of behavioral web service contracts with Pex. In Proc. of the 2008 Workshop on Testing, Analysis, and Verification of Web Services and Applications, pages 47–48, July 2008.

[24] N. Tillmann and W. Schulte. Mock-object generation with behavior. In Proc. of the 21st Intl. Conf. on Automated Softw. Eng., pages 365–368, Sept. 2006.

[25] W.-T. Tsai, R. Paul, Y. Wang, C. Fan, and D. Wang. Extending WSDL to facilitate web services testing. In Proc. of the Intl. Symp. on High Assurance Syst. Eng., pages 171–172, Oct. 2002.

[26] W.-T. Tsai, X. Wei, Y. Chen, and R. Paul. A robust testing framework for verifying web services by completeness and consistency analysis. In Proc. of the Intl. Workshop on Service-Oriented Syst. Eng., pages 151–158, Oct. 2005.

[27] J. Whaley, M. C. Martin, and M. S. Lam. Automatic extraction of object-oriented component interfaces. In Proc. of the Intl. Symp. on Softw. Testing and Analysis, pages 218–228, July 2002.
