Approaches to Failure and Recovery in Service Composition
by
Petrus Johannes Steyn
Department of Computer Science, University of Pretoria, Pretoria, South Africa
November 2006
SPE780 Computer Science Honours Project
Table of Contents
1 INTRODUCTION
2 OVERVIEW OF WEB SERVICES
  2.1 WHAT IS A WEB SERVICE?
  2.2 SOME OF THE PROBLEMS
  2.3 SOME STANDARDS RELATED TO WEB SERVICES
3 FAILURE: AN INTRODUCTION
  3.1 AVAILABILITY FAILURES
  3.2 CONCURRENCY FAILURES
  3.3 DEPENDENCY FAILURES
  3.4 INCONSISTENCY FAILURES
  3.5 COMPOSITION FAILURES
  3.6 PARTIAL FAILURES
  3.7 FAILURES DUE TO AMBIGUOUS OUTPUT
  3.8 OTHER FAILURES
4 POSSIBLE RECOVERY METHODS
  4.1 TRANSACTIONAL APPROACH
  4.2 DYNAMIC WEB SERVICES
  4.3 LANGUAGE CONSTRUCTS
  4.4 SELF-HEALING NETWORKS
  4.5 TRIVIAL RECOVERY METHODS
5 FAILURE DETECTION
  5.1 DEFENSIVE PROCESS DESIGN
  5.2 SERVICE RUN-TIME MONITORING
  5.3 WOFBPEL
6 THREE SCENARIOS
  6.1 FOREIGN TRAVELLER INFORMATION
  6.2 GENERAL ENTERTAINMENT PLANNER
  6.3 MALL INFORMATION SYSTEM
7 EXAMPLE SCENARIO: SHOPPING DOMAIN
  7.1 PROGRAM DEMO
8 RELATED WORK
9 CONCLUSION
10 ACKNOWLEDGEMENTS
Table of Contents for Figures
Figure 1 – Web Services Stack
Figure 2 – Flow Diagram of an availability failure
Figure 3 – Flow Diagram of a Partial Failure
Figure 4 – Flow Diagram of a process showing Ambiguous Output
Figure 5 – Flow Diagram of Transactional-Based Approach
Figure 6 – Catch Branch in OBPMS
Figure 7 – Flow diagram of Pizza Company
Figure 8 – Flow Diagram of a Trivial Recovery Method
Figure 9 – Foreign Traveller Information
Figure 10 – General Entertainment Planner
Figure 11 – Mall Information System
Figure 12 – Flow Diagram showing where Sub-Goals will be Checked
Figure 13 – Screenshot of Program requesting data
Figure 14 – Busy Searching for Shops
Figure 15 – Results Found
Figure 16 – Displaying Results
Figure 17 – Failure with the possibility of Recovery
Figure 18 – Notification of Failure without the possibility of Recovery
Table of Contents for Examples
Code Example 1 – Service not Found Exception from BPEL Console
Code Example 2 – Error Message produced by server when incorrect types are used as input
Code Example 3 – BPEL Code from Example
Code Example 4 – The Corresponding WSDL description
Code Example 5 – Time out Exception from the BPEL Server
Code Example 6 – Time out Exception shown in the BPEL Console
Code Example 7 – Catch Branch In BPEL
Abstract. Web services have become a vital part of our lives. People do not always know that they are there, but they do notice when something goes wrong. Various problems can occur when using Web Services. These problems range from trivial ones, like a broken connection, to more complicated ones, like composition problems. These problems, or failures, can be fixed by making use of different recovery methods. Common recovery methods that are being researched today include Self-healing networks and Transaction-based strategies, and most of the current research is going into Self-healing networks and the dynamic composition of services. Many different detection methods also exist; the two that are used most frequently in Self-healing networks are Defensive Process Design (DPD) and Service run-time Monitoring (SrtM). These two are examples of run-time detection strategies. There are also static, or off-line, detection strategies, of which WofBPEL is a good example.
There are different applications for Web Services. Three example applications are discussed briefly in this document. Each of them can be implemented in the real world and could be of value if implemented successfully.
This document proposes a classification for some of the most common failures that can occur when using Web Services. It also proposes some recovery methods that can be used to recover from these common failures. One of these recovery methods is illustrated by means of a real-world example.
Keywords: Web services, composition failures, recovery methods.
1 Introduction
Web services are becoming a big part of everyday life. We use them all the time without even knowing it. But, as with everything in life, we only notice something when it is not working.
Web services are very dynamic. Popular web sites use Web Services to find and display information from various domains. The travel domain is one of the domains that rely heavily on Web Services: travel agencies use Web Services to connect to other companies (like airline companies or bus companies) to get their schedules and prices.
Web Services can work for months at a time and then suddenly go down for various reasons. When they are down, the system somehow has to recover the data the user requested. There are various ways in which this can be done. In this document, I try to classify some of the most common failures that can occur when using Web Services. I also take a look at a few recovery methods and briefly discuss three failure detection strategies that are used today. These failure detection strategies can be classified into two categories: run-time detection strategies and off-line detection strategies. These will be discussed in Section 5. Finally, I use a real-world example to illustrate one of the recovery methods that I discuss.
The remainder of this document is organised as follows. In Section 2, I give a brief overview and introduction of Web Services. In Section 3, I give an overview of the different types of failures that can occur. In Section 4, I give some methods for recovering from different failures. Section 5 briefly covers some detection methods found in the literature. In Section 6, I introduce a few real-world examples and give some examples of errors that can occur when using them. Section 7 goes more in depth into one of the examples introduced in Section 6. In Section 8, I discuss some related work, and Section 9 concludes this document.
2 Overview of Web Services
As stated in the introduction, web services are all around us. We interact with them all the time without even knowing that they exist. But what is a web service? And why are they so important today?
2.1 What is a Web Service?
A Web Service is an entity on the web that can provide various kinds of information to clients. Some types of services offered are: Weather Services, Exchange Rate Services, Language Translation Services, Geographical Information Services, etc. These services are accessible from anywhere in the world, and they are always available (at least in theory). The use of these services is not restricted, although some providers may charge clients for the use of the services that they provide. Web Services form part of a greater architecture known as Service Oriented Architecture (SOA). According to Wikipedia [2], SOA is a “software architectural concept that defines the use of services to support the requirements of software users”. Web Services are often identified as the default implementation of SOA, but SOA can be implemented using various other service-based technologies.
As an example, a Web Service can be compared to a company that provides some service to the community. People from the community can use this service to their advantage. Let’s say the company is a supermarket. The supermarket supplies the community with the goods that they want at an affordable rate. Unfortunately, as is always the case, competition is not far away. Another supermarket opens up, offering the same services at a better price. This forces the old supermarket to either lower its prices or offer new services to its customers.
This example describes what happens continuously with web services. One web service, web service A, offers exchange rate information. Later a new web service, web service B, does the same, but offers better (more up-to-date) information. In response, web service A starts to offer more services, like additional stock exchange information. This race can continue until one of the service providers stops its service completely.
As said, web services are always available (at least in theory). The only thing a client has to do is to find them. Finding a service that meets a client’s requirements can be complicated, but things become easier if the service uses certain methods to advertise itself.
Services use a description language to describe what they do and how a client can access them. This is done in the Web Services Description Language (WSDL), which serves as an interface to the service. A WSDL description supplies the client with the operations necessary to invoke the service, and might include a description of the functionality of the service as well.
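To make the idea of WSDL as an interface concrete, the following Python sketch reads a WSDL 1.1 document and lists, for each operation in each portType, the input and output messages a client has to send and can expect back. It only uses the standard library's xml.etree.ElementTree. The inline WSDL is a trimmed, hypothetical description modelled on the MapService example used later in this document; a real client would first retrieve the description from the service itself.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Hypothetical, trimmed WSDL 1.1 description of a MapService.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="http://services.otn.com" targetNamespace="http://services.otn.com">
  <message name="MapServiceRequestMessage"/>
  <message name="MapServiceResponseMessage"/>
  <portType name="MapService">
    <operation name="process">
      <input message="tns:MapServiceRequestMessage"/>
      <output message="tns:MapServiceResponseMessage"/>
    </operation>
  </portType>
</definitions>"""


def list_operations(wsdl_text):
    """Return {portType name: {operation name: (input message, output message)}}."""
    root = ET.fromstring(wsdl_text)
    ns = {"wsdl": WSDL_NS}
    interface = {}
    for port_type in root.findall("wsdl:portType", ns):
        ops = {}
        for op in port_type.findall("wsdl:operation", ns):
            inp = op.find("wsdl:input", ns)
            out = op.find("wsdl:output", ns)
            # Attribute values keep their "tns:" prefix; only tags are namespace-expanded.
            ops[op.get("name")] = (
                inp.get("message") if inp is not None else None,
                out.get("message") if out is not None else None,
            )
        interface[port_type.get("name")] = ops
    return interface


if __name__ == "__main__":
    for port_type, ops in list_operations(SAMPLE_WSDL).items():
        for name, (inp, out) in ops.items():
            print(f"{port_type}.{name}: {inp} -> {out}")
```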
Another method of advertising is the use of ontology-annotated signatures. These signatures, according to Brogi & Popescu [2005], describe the semantics of a service. The semantic description of a Web Service describes not only what the Web Service is, but also in what context to use it (Foggon et al. [2004]). These signatures will eventually be used in the WSDL descriptors to fully describe the service and to expose its interface.
There are various other methods and languages that have been
developed around web services, and they will be discussed later.
2.2 Some of the Problems
What are the problems facing us when we want to use multiple services to
gain useful information? Why can we not just use one service for all our
needs?
According to Yu & Lin [2005], services can be upgraded or changed dynamically according to changes or needs in the environment. This can result in problems if the interfaces to these services also change. When it comes to service discovery, Sahin et al. [2005] state that, although many advances have been made, most service discovery techniques have two major problems: (1) there is usually a centralized server involved that handles all requests, which provides a single point of failure for the whole system, and (2) many servers offer limited search capabilities, which means that you will not always be able to find the best service.
Once you have access to the services and have retrieved the necessary data from them, the system then has to compose the data in a meaningful way. This is known as service composition. During this process, the system must be able to distinguish between useful data and unwanted data. This is not always easy, and it cannot be guaranteed that the data we receive is correct. In their paper, Yu & Lin [2005] use Quality of Service (QoS) measures to ensure that the data retrieved is correct. The problem with this approach is that you have to compare various services with each other in order to establish which service offers the best quality data.
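The comparison described above can be sketched as follows. The scoring is not taken from Yu & Lin [2005]; it is an assumed, simple weighted sum over normalised QoS attributes, and the attribute names (response time, availability, freshness), the weights and the service names are all hypothetical. The point is only to show that every candidate must be measured and scored against the others before one can be chosen.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical QoS attributes measured for one candidate service.
    name: str
    response_time_ms: float  # lower is better
    availability: float      # fraction of successful calls, 0..1, higher is better
    freshness: float         # fraction of up-to-date records, 0..1, higher is better


def qos_score(c, worst_time_ms, weights=(0.3, 0.4, 0.3)):
    """Weighted sum of QoS attributes, each scaled to 0..1 (higher is better)."""
    w_time, w_avail, w_fresh = weights
    # The slowest candidate gets a time score of 0, an instant one would get 1.
    time_score = 1.0 - min(c.response_time_ms / worst_time_ms, 1.0)
    return w_time * time_score + w_avail * c.availability + w_fresh * c.freshness


def select_best(candidates):
    """Compare all candidates against each other and return the best-scoring one."""
    worst = max(c.response_time_ms for c in candidates) or 1.0  # avoid division by zero
    return max(candidates, key=lambda c: qos_score(c, worst))


services = [
    Candidate("RateServiceA", response_time_ms=200, availability=0.99, freshness=0.70),
    Candidate("RateServiceB", response_time_ms=400, availability=0.95, freshness=0.98),
]
print(select_best(services).name)
```

Changing the weights changes the winner, which is one reason why QoS-based selection depends heavily on what the client considers "quality".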
Another method of making sure that you get the correct data is to always use a trusted and reliable service provider. This makes it likely that the data is correct and that you receive a quality service. However, things do go wrong. Service providers might change the services they offer, their servers might crash, or they might shut down their servers. In such cases, any use of the services provided by the service provider will result in a failure being reported by the system.
There are various ways in which we can recover from failure. The system can keep a backup of previous searches (in the form of a cache) and use this data. However, this data will not be up to date and might be invalid. The system can also launch a search for a new service provider, or for a web service that claims to offer the same services. This should give the user the most up-to-date and correct data, but the search might take a while. Different recovery methods will be discussed in Section 4.
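The two recovery options just described can be combined. The following Python sketch, written against assumed interfaces, first calls the current provider, then tries alternative providers returned by a discovery step, and only if all of them fail falls back to a cached result that is explicitly marked as coming from the cache. The provider callables, `find_alternatives` and `ServiceFailure` are hypothetical placeholders for real service invocations, a registry lookup and a service fault.

```python
import time


class ServiceFailure(Exception):
    """Hypothetical error raised by a provider callable when invocation fails."""


class RecoveringClient:
    def __init__(self, primary, find_alternatives):
        self.primary = primary                      # callable(query) -> result
        self.find_alternatives = find_alternatives  # callable(query) -> list of callables
        self.cache = {}                             # query -> (result, time stored)

    def request(self, query):
        """Return (result, source), where source is 'primary', 'alternative' or 'cache'."""
        try:
            result = self.primary(query)
            self.cache[query] = (result, time.time())
            return result, "primary"
        except ServiceFailure:
            pass
        # Recovery 1: try providers that claim to offer the same service (fresh data, but slower).
        for provider in self.find_alternatives(query):
            try:
                result = provider(query)
                self.cache[query] = (result, time.time())
                return result, "alternative"
            except ServiceFailure:
                continue
        # Recovery 2: fall back to a previous answer, which may be out of date or invalid.
        if query in self.cache:
            result, _stored_at = self.cache[query]
            return result, "cache"
        raise ServiceFailure(f"no provider or cached result for {query!r}")
```

The search is tried before the cache here because it yields current data; where response time matters more than freshness, the two steps could be swapped.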
2.3 Some Standards related to Web Services
According to Tartanoglu et al. [2006] the overall definition for Web
Services architecture is still incomplete. The base standards for Web Services
have already emerged from the W3C. They define a core middleware that is
partly built upon results obtained from object-based and component-based
middleware.
The main standards for Web Services architecture as defined by the W3C Web Service Activity and the OASIS Consortium are:
• SOAP (Simple Object Access Protocol): A lightweight protocol for
information exchange. It sets the rules on how to encode data in XML.
It also describes invocation semantics and mappings to other Internet
transfer protocols.
• WSDL (Web Services Description Language): An XML-based language that specifies a service’s interface (the type of messages that the service can understand) and the binding information (the protocol-dependent details).
• UDDI (Universal Description, Discovery and Integration): A registry for
dynamically locating Web Services. It can also be used to advertise
Web Services.
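As a small illustration of the first two bullets, the sketch below builds a SOAP 1.1 request envelope with Python's standard library. The envelope namespace is the standard SOAP 1.1 one; the payload element (`shopdef` in the `http://services.otn.com` namespace, borrowed from the example in Section 3.1) is an assumption, and in practice its name and type would be taken from the service's WSDL description. The message is only constructed here, not sent.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://services.otn.com"                     # assumed payload namespace

# Prefixes used when serialising; they do not change the meaning of the XML.
ET.register_namespace("soapenv", SOAP_ENV_NS)
ET.register_namespace("svc", SERVICE_NS)


def build_request(shop_query):
    """Wrap a payload element in a SOAP Envelope/Body and return it as a string."""
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    payload = ET.SubElement(body, f"{{{SERVICE_NS}}}shopdef")
    payload.text = shop_query
    return ET.tostring(envelope, encoding="unicode")


print(build_request("CNA,PRETORIA"))
```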
Figure 1 shows how these standards fit together in the technology stack.
This figure is adapted from the figures that can be found in Mikalsen et al.
[2002], Tartanoglu et al. [2006] and van der Aalst [2003].
Along with these standards, there is also a number of languages that are part of Web Services. The de facto standards for Web Services are BPEL (Business Process Execution Language) and WSDL. Whereas WSDL describes the service’s interface, BPEL describes the service’s workflow: the interactions that can be performed on the services (interactions like invoke, reply and receive). These two languages have been used very successfully up until now. Both have their roots in XML, and both make use of several W3C-approved standards.
Figure 1 – Web Services Stack
Van der Aalst [2003] took a pessimistic look at some of the standards that have been developed in and around Web Services, and at workflow languages for Web Services, in his contribution “Don’t go with the flow: Web services composition standards exposed”. According to him, the support claimed by some of these languages is unfounded. He is also of the opinion that there are too many so-called ‘standards’. Some of the languages that he inspected were BPEL, Microsoft’s XLANG, IBM’s WSFL and the Workflow Management Coalition’s XPDL. In his research, BPEL was one of the most comprehensive languages, albeit the most complex one. His research, though, was done back in 2003, and since then BPEL has become the de facto standard for describing the workflow of Web Services.
There are also a few other languages (discussed later in this document), but all the examples and code in this document are in BPEL and WSDL.
3 Failure: An Introduction
Different types of failures can occur during the use of web services. These failures can be caused by something as simple as a broken connection or a busy server, but they might also manifest themselves during the composition of services. Most of these failures can be solved with little effort, but sometimes the problem lies much deeper.
Some trivial errors that can happen are broken connections and server downtime. These are caused by external factors most of the time, since the fault can lie on the server side (and in the case of server downtime, the fault definitely lies with the server). There are many other types of failures, ranging from concurrency problems to dependency problems and availability problems. Most types of failures can be classified as follows:
• Failures caused by availability.
• Failures caused by concurrency.
• Failures caused by dependency.
• Failures caused by inconsistency.
• Failures caused by incorrect composition.
• Partial failures caused by incorrect parallel execution.
• Failures due to ambiguous output.
These classifications are not the only ones that exist, but they cover the most common failures. Even though most tools will not deploy services that contain some of these failures, the failures can still find their way in if you deploy services manually. In the following sections I describe each classification and provide some examples of how these failures can occur.
Where possible, I used Oracle JDeveloper 10g [1] and Oracle BPEL Process Manager Server [1] to simulate the errors in the examples. All the examples were coded in WSDL and BPEL, mainly because of the development environment, but also because the two work well together and are popular. Other languages do exist; their pros and cons will be discussed later in this paper. All examples make use of “dummy services” that only take simple inputs and give back simple replies.
3.1 Availability Failures
Failures in this classification can almost always be traced back to the
server or the connection to the server. They can present themselves in the
form of a ‘time out’, or a ‘service not found’ error.
Figure 2 – Flow Diagram of an availability failure
In OBPMS, this indicates that an error occurred during the execution of the process in question.
During a Time Out, the client will usually stop requesting the service after a certain amount of time because the server is not responding to its requests. This can be caused by a busy server, a broken connection or a lost message. Either way, the service cannot be accessed at that time.
A Service Not Found error can be attributed to a faulty server, a deleted service or a broken connection. In these cases, the client assumes that the service has been deleted because it cannot find the service or the server that is hosting the service. A Service Not Found error can also be caused by the same conditions that cause a Time Out error.
In this example, I make use of a service that no longer exists. The system responds with a Remote Fault (basically a Service Not Found error) and returns this error to the client. In Oracle’s BPEL Process Manager Server [1] (OBPMS), the following output was observed.
In OBPMS, the user has the option of viewing either the flow diagram of the service or the code of the service. Figure 2 is the resulting flow diagram produced by OBPMS.
If we look at the code of the example, the following error was reported by OBPMS.
<process>
  <sequence>
    receiveInput
    [2006/10/16 10:03:42] Received "clientInput" call from partner "client"
    <scope name="shopScope">
      <sequence>
        shopInputAssign
        [2006/10/16 10:03:42] Updated variable "shopInput"
        <shopInput>
          <part xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="payload">
            <shopdef xmlns="http://services.otn.com">CNA,PRETORIA</shopdef>
          </part>
        </shopInput>
        searchShop (faulted)
        [2006/10/16 10:03:42] "remoteFault" has been thrown.
        <remoteFault xmlns="http://schemas.oracle.com/bpel/extension">
          <part name="code">
            <code>WSDLReadingError</code>
          </part>
          <part name="summary">
            <summary>Failed to read wsdl. Failed to read wsdl at "http://localhost:9700/orabpel/default/ShopServiceV2/ShopServiceV2?wsdl", because "WSDLException: faultCode=INVALID_WSDL: The document: http://localhost:9700/orabpel/default/ShopServiceV2/ShopServiceV2?wsdl is not a wsdl file or does not have a root element of "definitions" in the "http://schemas.xmlsoap.org/wsdl/" namespace or the "http://www.w3.org/2004/08/wsdl" namespace.". Make sure wsdl is valid. You may need to start the OraBPEL server, or make sure the related bpel process is deployed correctly.</summary>
          </part>
        </remoteFault>
      </sequence>
    </scope>
  </sequence>
  [2006/10/16 10:03:42] "BPELFault" has not been caught by a catch block.
  [2006/10/16 10:03:42] BPEL process instance "1105" cancelled
</process>
Code Example 1 – Service not Found Exception from BPEL Console
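The distinction between the two availability failures in this section can also be drawn on the client side. The following Python sketch invokes an HTTP endpoint with a timeout and maps the resulting exception onto the categories used here, using only the standard library (urllib and socket). The mapping itself is an assumption of this sketch: as noted above, a broken connection can produce either symptom, so the category is a best guess rather than a diagnosis, and treating HTTP 404 as "service not found" is a convention chosen for illustration.

```python
import socket
import urllib.error
import urllib.request


def classify_failure(exc):
    """Map an exception raised during invocation onto an availability category."""
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "TIME_OUT"
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered, but the service (or its WSDL) is not there.
        return "SERVICE_NOT_FOUND" if exc.code == 404 else "SERVER_ERROR"
    if isinstance(exc, urllib.error.URLError):
        # urllib often wraps a timeout as URLError(reason=socket.timeout(...)).
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "TIME_OUT"
        # Unknown host or refused connection: the server or service cannot be found.
        return "SERVICE_NOT_FOUND"
    if isinstance(exc, ConnectionError):
        return "SERVICE_NOT_FOUND"
    return "UNKNOWN"


def invoke(url, timeout_s=5.0):
    """Fetch a service endpoint; return (body, None) on success or (None, category)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.read(), None
    except Exception as exc:  # classify the fault instead of letting it go uncaught
        return None, classify_failure(exc)
```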
3.2 Concurrency Failures
Concurrency failures bring with them all the usual concurrency problems that exist in ordinary computer systems and networks. A service can be used by more than one client at any time, and this can cause concurrency problems if the service is being updated by a client or by the server. Other clients need to be informed about the update; otherwise, clients using the service will receive inconsistent or corrupt data from it. These types of failures are difficult to detect unless the client is actually aware of the updates: a client will not notice that it is using an outdated service as long as the data it receives looks correct to it. These types of failures are not common, but they can cause big problems if not caught in time.
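One way to make clients aware of updates is to let the service attach a version identifier to every response and have the client compare it with the version it was built against. The sketch below is hypothetical: the `interface_version` field, the version strings and the class names are assumptions of this example, not part of any standard discussed in this document.

```python
from dataclasses import dataclass


class StaleInterfaceError(Exception):
    """Raised when the service was updated after the client was built."""


@dataclass
class Response:
    interface_version: str  # hypothetical version tag sent by the service
    payload: dict


class VersionAwareClient:
    def __init__(self, expected_version):
        self.expected_version = expected_version

    def accept(self, response):
        """Return the payload only if it was produced by the expected interface version."""
        if response.interface_version != self.expected_version:
            # Data that merely "looks correct" is rejected instead of being used silently.
            raise StaleInterfaceError(
                f"expected {self.expected_version}, service is at {response.interface_version}"
            )
        return response.payload
```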
In another scenario, a web service can be used as a resource that first needs to be acquired. This will not happen often, though, since it would not make much sense to create such a service. In addition, Tanenbaum et al. [2002] state that trying to lock distributed resources is difficult and can lead to deadlock if not approached correctly.
3.3 Dependency Failures
Services are not limited to supplying us with information. They can make use of other external services to gather the required information before passing it on to the client. Many problems can occur when using such a technique. Messages can get lost between services, which can cause a service to conclude that the called service is no longer available. A service can also pass incompatible types to the services it calls (for example, sending string values where integer values are expected). This can cause the receiving service to misinterpret the incoming message and produce incorrect results.
This type of error can be avoided when using a development environment,
but as stated in the previous sections, service providers can and will update
their services periodically. These updates might include changing the types of
the expected input data. Unless these changes are communicated to the
clients, or to the services using the updated service, failures will occur.
In the following example, I tried to invoke a service with an incorrect type.
The server was the only component to respond to this incorrect input. The
service itself did not respond any further because the server refused to invoke
it.
Message handle error.
An exception occurred while attempting to process the message "com.collaxa.cube.engine.dispatch.message.invoke.InvokeInstanceMessage"; the exception is: XPath expression failed to execute.
Error while processing xpath expression, the expression is "((bpws:getVariableData("inputVariable", "payload", "/client:DummyService_3ProcessRequest/client:input") mod 2.0) = 1.0)", the reason is NaN is not an integer.
Please verify the xpath query.
Code Example 2 – Error Message produced by server when incorrect types are used as input
If the service had been set up correctly, in other words if we had included catch branches and exception handlers, the service would have been invoked. However, in a synchronous environment without the necessary exception handlers, the service would eventually throw a time out exception to inform the client that something went wrong. In an asynchronous environment, we would have to include catch branches to catch the exception.
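The failure in Code Example 2 comes from evaluating "input mod 2 = 1" on a value that is not a number. The following Python sketch mimics that dummy service and shows the two defensive measures discussed above: checking the type of the input before it is used, and handling the fault explicitly (the role a catch branch plays in BPEL) so that the caller receives a fault reply rather than an uncaught error. The function names and the shape of the fault reply are hypothetical.

```python
def is_odd_service(raw_input):
    """Hypothetical dummy service: True if the input is an odd integer."""
    try:
        value = int(str(raw_input).strip())
    except ValueError:
        # Reject the wrong type up front instead of computing with NaN.
        raise TypeError(f"expected an integer, got {raw_input!r}") from None
    return value % 2 == 1


def invoke_with_catch(service, raw_input):
    """Call a service and turn a type fault into a fault reply, like a catch branch."""
    try:
        return {"status": "ok", "result": service(raw_input)}
    except TypeError as exc:
        return {"status": "fault", "fault": "invalidInput", "detail": str(exc)}


print(invoke_with_catch(is_odd_service, "7"))      # ok reply, result True
print(invoke_with_catch(is_odd_service, "seven"))  # fault reply instead of a crash
```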
3.4 Inconsistency Failures
Every now and again, a service provider might decide to change the
descriptors of some of their services. These changes can affect the access to
them in either a positive or a negative way. On the positive side, the new
descriptors might enhance the use of the service. On the negative side, the
new descriptors might cause a service to become unavailable.
Changes to a service’s descriptor file can cause one of two major problems. If the descriptor is changed during run time, a client already using the service might get unexpected results due to the new descriptor file; the change can also cause a service to behave differently from what it is supposed to do.
Changes to the descriptor can also break a service completely. This can happen if the descriptor file and the corresponding BPEL file are inconsistent, e.g. the BPEL file declares a variable whose message type is no longer defined in the descriptor file.
The latter error should not happen too often, since many software development environments provide checks and tools to prevent this kind of error. However, if a service provider chooses to do things manually, these errors can (and most probably will) occur.
In the following code example, the descriptor has been changed, but the workflow file was kept the same. This will result in a failure. Most development environments will not allow the creation of such erroneous services.
The inconsistency lies in the declarations of the input and fault variables. The BPEL process still declares these variables with the message types MapServiceRequestMessage and MapServiceFaultMessage respectively, whilst these messages have been renamed in the description file to MapServiceInvokedMessage and MapServiceErrorMessage. The outcome of such an error cannot be tested in the environment setup that I chose to work in, so the resulting behaviour is unknown.
<partnerLinks>
  <partnerLink name="client" partnerLinkType="tns:MapService" myRole="MapServiceProvider"/>
</partnerLinks>
<variables>
  <variable name="input" messageType="tns:MapServiceRequestMessage"/>
  <variable name="output" messageType="tns:MapServiceResponseMessage"/>
  <variable name="fault" messageType="tns:MapServiceFaultMessage"/>
</variables>
Code Example 3 – BPEL Code from Example
<types>
  <schema attributeFormDefault="qualified" elementFormDefault="qualified" targetNamespace="http://services.otn.com" xmlns="http://www.w3.org/2001/XMLSchema">
    <element name="request" type="string"/>
    <element name="response" type="string"/>
    <element name="error" type="string"/>
  </schema>
</types>
<message name="MapServiceInvokedMessage">
  <part name="payload" element="tns:request"/>
</message>
<message name="MapServiceResponseMessage">
  <part name="payload" element="tns:response"/>
</message>
<message name="MapServiceErrorMessage">
  <part name="payload" element="tns:error"/>
</message>
<portType name="MapService">
  <operation name="process">
    <input message="tns:MapServiceInvokedMessage"/>
    <output message="tns:MapServiceResponseMessage"/>
    <fault name="MapNotFound" message="tns:MapServiceErrorMessage"/>
  </operation>
</portType>
Code Example 4 – The Corresponding WSDL description
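The kind of check that development environments perform here can be approximated in a few lines. The following Python sketch collects the message names defined in a WSDL document and reports every BPEL variable whose messageType refers to a message that is no longer defined. It is a simplification under stated assumptions: elements are matched by local name regardless of namespace, and only the part of a messageType after the colon is compared. The two inline fragments are trimmed copies of Code Examples 3 and 4, and the check flags exactly the input and fault variables.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' part from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def wsdl_message_names(wsdl_text):
    """Names of all <message> elements defined in a WSDL document."""
    root = ET.fromstring(wsdl_text)
    return {el.get("name") for el in root.iter() if local(el.tag) == "message"}


def undefined_message_types(bpel_text, wsdl_text):
    """Return (variable, message) pairs whose message is missing from the WSDL."""
    defined = wsdl_message_names(wsdl_text)
    problems = []
    for el in ET.fromstring(bpel_text).iter():
        if local(el.tag) == "variable" and el.get("messageType"):
            msg = el.get("messageType").split(":")[-1]  # drop the "tns:" prefix
            if msg not in defined:
                problems.append((el.get("name"), msg))
    return problems


BPEL = """<variables xmlns:tns="http://services.otn.com">
  <variable name="input" messageType="tns:MapServiceRequestMessage"/>
  <variable name="output" messageType="tns:MapServiceResponseMessage"/>
  <variable name="fault" messageType="tns:MapServiceFaultMessage"/>
</variables>"""

WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/">
  <message name="MapServiceInvokedMessage"/>
  <message name="MapServiceResponseMessage"/>
  <message name="MapServiceErrorMessage"/>
</definitions>"""

print(undefined_message_types(BPEL, WSDL))
```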
3.5 Composition Failures
Failures can also happen during the composition phase, in which different
services offering different information are made to work together. During
composition, you need to be able to roll back from an error (i.e. recover to a
point before the request started), and sometimes these rollbacks are either
incorrect or incomplete. In Section 4.1, I discuss a transaction-based approach
to recovering from these types of errors.
Services can also be composed incorrectly (they are made to work
together, but they are not compatible), and this can cause serious problems
from a client's perspective. These types of errors should not happen often, but
an incorrect service can end up being used because of an incorrect service
description (I return to this problem in Section 4.2).
3.6 Partial Failures
Partial failures are closely linked to composition failures, since
composition failures can cause partial failures. A partial failure occurs when,
during a parallel execution of services, one of the branches cannot find the
needed or requested services. This is not always a major problem, since in
many cases only the output from one branch is needed, but you are then
working with incomplete data. From a client's perspective it does not matter,
since he would not know the difference (unless all the branches fail), but the
goal of a service is to give the most accurate data to the client invoking it.
As stated at the beginning of this section, partial failures are closely
linked to composition failures. Sometimes composition failures also go
unnoticed by the client. Although these failures will not be noticed, that does
not mean they will not have an effect. As said above, the goal of a service is
to give the most accurate data to the client invoking it; if a service cannot
supply that, it will not be good enough to use.
Another form of partial failure occurs when we need the results from all
the branches of the parallel execution in order to continue. If one of the
branches fails, the system will still run to completion, but with incomplete
data, which will cause the returned results to be incorrect or even corrupt. We
can force the execution to stop if we do not have all the necessary
information to continue, but this will be unacceptable to a client using the
service.
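The two cases can be illustrated with a minimal Python sketch. It assumes that each branch is a plain function and that a missing service shows up as an exception; the names `run_parallel`, `ServiceNotFound`, `weather` and `traffic` are hypothetical and are not part of OBPMS or BPEL. With `require_all=False` the composite returns whatever the surviving branches produced and flags the result as incomplete; with `require_all=True` a single failed branch stops the composition.

```python
from concurrent.futures import ThreadPoolExecutor


class ServiceNotFound(Exception):
    """Hypothetical fault raised when a branch cannot find its service."""


def run_parallel(branches, require_all):
    """Run every branch concurrently and collect the results by branch name.

    Returns (results, complete). If require_all is True, any failed branch
    raises an error instead of returning incomplete data.
    """
    results, failed = {}, []
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in branches.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()  # re-raises the branch's exception
            except ServiceNotFound:
                failed.append(name)
    if failed and require_all:
        raise RuntimeError(f"required branches failed: {failed}")
    return results, not failed


def weather():
    return "sunny"


def traffic():
    raise ServiceNotFound("traffic service unavailable")


results, complete = run_parallel({"weather": weather, "traffic": traffic}, require_all=False)
print(results, complete)  # {'weather': 'sunny'} False
```

The `complete` flag is what a client never sees in the scenario above: the composite still answers, but only the flag reveals that one branch was missing.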
The following example will clarify this problem. As a client, we only have
access to one service, the access point to the composite service. The service
we are using calls other services (in parallel) to gather the needed
information. The following was observed when one of the required services
was not found. Once again I show the resulting flow diagram (Figure 3) and
the code (Code Example 5) from the OBPMS.
Figure 3 – Flow Diagram of a Partial Failure
The response from the server and the corresponding code fragments
obtained from OBPMS are shown below.
In OBPMS, this indicates a time-out error, which only happens when using synchronous services.
com.oracle.bpel.client.delivery.ReceiveTimeOutException: Waiting for response has timed out. The conversation id is 455aa7269f0030c5:149d886:10e5fc38efa:-7ffc. Please check the process instance for detail.
Code Example 5 – Time-out Exception from the BPEL Server
<sequence>
Assign_2
[2006/10/19 10:52:47] Updated variable "invokeDummy_initiate_InputVariable"
<invokeDummy_initiate_InputVariable>
  <part xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="payload">
    <DummyService_2ProcessRequest xmlns="http://xmlns.oracle.com/DummyService_2">
      <input>HELO</input>
    </DummyService_2ProcessRequest>
  </part>
</invokeDummy_initiate_InputVariable>
invokeDymmy
[2006/10/19 10:52:48] Invoked 1-way operation "initiate" on partner "Dummy2".
<invokeDummy_initiate_InputVariable>
  <part xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" name="payload">
    <DummyService_2ProcessRequest xmlns="http://xmlns.oracle.com/DummyService_2">
      <input>HELO</input>
    </DummyService_2ProcessRequest>
  </part>
</invokeDummy_initiate_InputVariable>
receiveDummy - pending
[2006/10/19 10:52:49] Waiting for "onResult" from "Dummy2". Asynchronous callback.
Code Example 6 – Time-out Exception shown in the BPEL Console
This example required the output from both branches in order to complete
the execution of the process. In this example I used a synchronous service
instead of an asynchronous one. A time-out error will only occur when using
synchronous services: an asynchronous service will sit idle and wait
indefinitely for a result without raising a time-out exception. If we include
catch blocks in the service, we can handle these errors instead of letting them
terminate the process. These methods are discussed in Section 4.
3.7 Failures due to Ambiguous Output
In rare cases, services can be composed in such a way that they provide
a user with more than one response to a single request. This is undesirable,
since we want one unique response from a service for a given input. Even
though some tools will not allow this type of service to be deployed, such
services can still exist if they are created without the help of a tool.
Figure 4 – Flow Diagram of a process showing Ambiguous Output
As said above, some tools will not allow these types of services to be
deployed, and that is also the case with JDeveloper 10g [1]. These services
can be created, but they are usually riddled with errors. Figure 4 shows how
such a service might look.
This example takes a string as input and delivers two outputs: the string
in upper case and the string in lower case. This service was deployed onto
the server, but failed to run to completion. In a real-world scenario, this
service would be able to run, but the output would be determined by the
speed at which each branch executes: the slowest branch's output would
overwrite the other and be the one displayed, unless two output variables are
defined (which is very difficult to do).
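The timing problem can be reproduced with a small Python sketch, assuming that two threads stand in for the two parallel branches and that `time.sleep` stands in for their different execution speeds (`run_ambiguous` and the delays are hypothetical). Both branches write to the same output variable, so whichever branch finishes last determines what the client sees.

```python
import threading
import time


def run_ambiguous(text, upper_delay, lower_delay):
    """Two parallel branches write to the same output variable.

    The branch that finishes last overwrites the other, so the visible
    output depends only on branch timing, not on the request.
    """
    output = {}

    def to_upper():
        time.sleep(upper_delay)
        output["result"] = text.upper()

    def to_lower():
        time.sleep(lower_delay)
        output["result"] = text.lower()

    branches = [threading.Thread(target=to_upper), threading.Thread(target=to_lower)]
    for branch in branches:
        branch.start()
    for branch in branches:
        branch.join()
    return output["result"]


print(run_ambiguous("Hello", upper_delay=0.01, lower_delay=0.2))  # hello
print(run_ambiguous("Hello", upper_delay=0.2, lower_delay=0.01))  # HELLO
```

The same request yields two different answers depending only on timing, which is exactly why such a composition is considered ambiguous.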
In addition, according to Ouyang et al. [2005], a BPEL process must not
have two or more receive actions enabled at the same time on the same
partner link, port type, operation and correlation set(s). In practice, this
means that we cannot have two or more input or output ports that are using
the same variable. This rule is also defined in the BPEL specification.
However, this type of error does still sometimes occur in real-world services.
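The rule can be expressed as a simple static check. The sketch below is not WofBPEL or any BPEL engine's implementation; it assumes that the receive activities which may be enabled at the same time have already been identified, and it reduces each one to the four attributes the rule compares (using a single correlation set name for simplicity). The activity names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Receive(NamedTuple):
    """A receive activity, reduced to the attributes the rule compares."""
    name: str
    partner_link: str
    port_type: str
    operation: str
    correlation_set: str


def conflicting_receives(receives):
    """Group concurrently enabled receives that share the same
    (partner link, port type, operation, correlation set) key."""
    groups = defaultdict(list)
    for r in receives:
        groups[(r.partner_link, r.port_type, r.operation, r.correlation_set)].append(r.name)
    return [names for names in groups.values() if len(names) > 1]


receives = [
    Receive("receiveA", "client", "tns:MapService", "process", "order"),
    Receive("receiveB", "client", "tns:MapService", "process", "order"),
    Receive("receiveC", "client", "tns:MapService", "cancel", "order"),
]
print(conflicting_receives(receives))  # [['receiveA', 'receiveB']]
```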
3.8 Other Failures
There are other failures that can occur when using Web Services that do
not fall into any of the categories mentioned above. Quality of Service (QoS)
problems and Service Level Agreement (SLA) problems are among the most
common of these. However, both types of failure can ultimately be traced back
to one of the failures described above.
A problem with the Quality of Service would result in a service simply being
slow to react, or returning results that are correct but not up to standard. The
only way to fix this problem is to rebind to a new service. Yu & Lin [2005]
describe how to rebind to a new service that will deliver a better Quality of
Service.
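Choosing a replacement on the basis of QoS can be sketched as a filter-then-rank step. This is not the algorithm of Yu & Lin [2005]; it is a minimal illustration that assumes each candidate advertises a response time, an availability figure and a cost (the `Candidate` fields and the example.org endpoints are hypothetical).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical QoS figures advertised by a service."""
    endpoint: str
    response_ms: float
    availability: float  # fraction of successful calls, 0..1
    cost: float


def rebind(candidates, max_response_ms, min_availability):
    """Pick a replacement service that meets the QoS constraints.

    Among the services that satisfy both constraints, the cheapest wins and
    ties are broken by response time. Returns None if nothing qualifies.
    """
    eligible = [
        c for c in candidates
        if c.response_ms <= max_response_ms and c.availability >= min_availability
    ]
    return min(eligible, key=lambda c: (c.cost, c.response_ms), default=None)


candidates = [
    Candidate("http://example.org/mapA", 900, 0.99, 1.0),
    Candidate("http://example.org/mapB", 300, 0.95, 2.0),
    Candidate("http://example.org/mapC", 250, 0.80, 0.5),
]
best = rebind(candidates, max_response_ms=500, min_availability=0.9)
print(best.endpoint)  # http://example.org/mapB
```

Returning `None` when no candidate qualifies makes the "no better service exists" case explicit, so the caller can fall back to another recovery method.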
A Service Level Agreement error will result in the use of an incorrect
service. As described briefly in Section 2.1, services need to advertise
themselves. If these descriptions of their services are incorrect, we might end
up making use of a service that is delivering faulty and incorrect data to us.
With a Service Level Agreement, we enter into a contract that promises us
the correct data, according to the description of the service. If this description
was incorrect to begin with, the agreement is void, and we end up with a
binding to an incorrect service. Once again, the only way to fix this would be
to rebind to another service, but we might end up rebinding to another faulty
service.
There are ways to ensure that the services that we rebind to are correct.
These are discussed in the next section.
4 Possible Recovery Methods
Because failures can, and will, occur when using web services, various
methods have been researched for recovering from these failures.
Some of these methods include transactional methods, self-healing networks,
and the use of QoS constraints as a heuristic in the dynamic composition of services.
Tanenbaum & van Steen [2002] and Tartanoglu et al. [2006] classify
recovery methods into two categories: backward error recovery and forward
error recovery. Backward error recovery involves rolling back to a safe state
and retrying the operation; this approach is followed in transactions. Forward
error recovery tries to recover from an erroneous state by transforming it into
a new, safe state; this approach is followed by self-healing networks.
In this section I give an overview of some of the proposed methods of
error recovery when using web services, without classifying them any further.
I also take a look at some trivial methods (like caching).
4.1 Transactional Approach
According to Tanenbaum & van Steen [2002], a transaction is an
operation that has an all-or-nothing property. Transactions are characterised
by the so-called ACID properties; that is, they are Atomic, Consistent,
Isolated and Durable:
• Atomic: The transaction appears to be indivisible to the outside world.
• Consistent: The transaction does not violate any invariants of the
system.
• Isolated: Concurrent transactions appear to happen sequentially (in
other words, they do not interfere with each other).
• Durable: Once a transaction commits, its changes are permanent, even
if the system crashes after the transaction has been committed.
They also distinguish three types of transactions: flat transactions, nested
transactions and distributed transactions. A flat transaction is a normal
transaction that only commits after the main goal has been reached; this is
what is usually meant when speaking of transactions in general. Nested and
distributed transactions are discussed later in this section and usually apply
to systems spread across a network.
During a transaction, an operation can only be started once all the
resources required for the operation have been acquired. Once these
resources have been acquired, which usually implies that they have been
locked by the acquiring process, the transaction runs to completion before
releasing them. Changes to the acquired resources are only made
permanent once the transaction has completed successfully.
According to Mikalsen et al. [2002], a transactional approach can be used
successfully to recover from failures. Many architectures already support this
model, since it is quite easy to understand and use. The basic idea behind a
transactional approach is this: only commit when every sub goal has
completed successfully. Using a common example from the travel domain, a
transactional approach would work as follows:
A client sends a request to a service, requesting different booking
details from a travel agent. The system then finds the relevant details
for each sub goal (booking a flight, booking a hotel, etc.). As soon as
one sub goal cannot be completed, for example if a flight cannot be
booked, the system stops and rolls back to the state before the request
was made. This roll-back undoes all actions performed and thus resets
the state to just before the request. The client can then restart the
request with different parameters.
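The all-or-nothing behaviour can be sketched in Python with an undo log. This is a simplification that assumes each sub goal can register an undo action once it succeeds; a real transaction manager would instead hold locks and only make changes permanent at commit time. The `book` helper and the booking names are hypothetical.

```python
class BookingFailed(Exception):
    """Raised by a hypothetical sub goal that cannot be completed."""


def run_all_or_nothing(state, steps):
    """Run every sub goal; if one fails, undo the completed ones in reverse
    order so that the state is the same as before the request."""
    undo_log = []
    try:
        for step in steps:
            undo_log.append(step(state))  # each step returns its undo action
    except BookingFailed:
        for undo in reversed(undo_log):
            undo(state)
        return False
    return True


def book(item, available=True):
    """Hypothetical sub goal: reserve an item and return its undo action."""
    def step(state):
        if not available:
            raise BookingFailed(item)
        state.append(item)
        return lambda s: s.remove(item)
    return step


state = []
ok = run_all_or_nothing(state, [book("hotel"), book("car"), book("flight", available=False)])
print(ok, state)  # False []
```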
Sometimes this complete roll-back is undesirable. If the flight cannot be
booked due to an unreachable server, we will not be able to complete the
transaction without changing the service we are using. A less strict approach
would be to commit after certain sub goals have been reached, which gives
the client more control over when to commit. In the example above, we can
set the system up so that it commits after each sub goal is reached. If one of
the sub goals fails, we can still do a partial commit and complete the
transaction in some other way (to fill in the missing information).
Using the same example, if the system is unable to book a flight, the
client can still commit to the hotel and car rental bookings. The client
can then choose to book the flight manually, or let the system search
for other flights that will still reach his final destination on time. In
this way, the system only rolls back to the start of the failed sub goal,
instead of rolling back completely.
Such an approach is called a nested transaction (Tanenbaum & van Steen
[2002]). Another approach would be to make use of a distributed transaction,
but this would not be satisfactory.
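The partial-commit scheme described above can be sketched by giving each sub goal its own safe point. The sketch assumes the state is a simple list that can be copied, and the step functions are hypothetical. Note that in a textbook nested transaction the results of a sub-transaction only become permanent when the parent transaction commits; here, following the description above, each sub goal commits on its own.

```python
def run_with_partial_commit(state, steps):
    """Treat each sub goal as its own small transaction.

    A failing sub goal only undoes its own work; completed sub goals stay
    committed, and the names of the failed ones are returned so the client
    can complete them some other way.
    """
    failed = []
    for name, step in steps:
        snapshot = list(state)  # safe point for this sub goal only
        try:
            step(state)
        except Exception:  # any failure aborts just this sub goal
            state[:] = snapshot  # roll back to the start of this sub goal
            failed.append(name)
    return failed


def reserve(item):
    def step(state):
        state.append(item)
    return step


def flight_unavailable(state):
    state.append("flight (pending)")
    raise RuntimeError("flight service unreachable")


state = []
failed = run_with_partial_commit(
    state, [("hotel", reserve("hotel")), ("flight", flight_unavailable), ("car", reserve("car"))]
)
print(state, failed)  # ['hotel', 'car'] ['flight']
```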
In a distributed transaction, the transaction is treated as a normal
transaction, with the difference that the resources are spread across a
network. We still lock resources and perform the transaction as if it were a
normal flat transaction on a non-distributed platform, but this can cause
problems and can be difficult to manage, for the two reasons given
below.
According to Tartanoglu et al. [2006], a transaction-based approach is not
well suited to the composition of Web Services, mainly for two reasons:
• Transaction management becomes more difficult over a distributed
system. The main problem is that it requires cooperation among the
transactional support of the individual Web Services, which may not be
compatible with each other, or whose providers may not be willing to cooperate.
• A transaction-based approach usually involves locking resources until
you are done with them. In a Web Services environment, where services
are autonomous and a composite request may take a long time to
complete, this is not really feasible.
Overall, though, this type of error recovery is a good method to use. It has
already been proven to work in many different domains, and a transaction-based
framework already exists for Web Services.
The basic idea of the transaction-based approach is illustrated by the
simple flow diagram in Figure 5.
Figure 5 – Flow Diagram of Transactional-Based Approach
4.2 Dynamic Web Services
There are two ways in which services can be composed: statically and
dynamically. Static composition is the easiest and also the most stable method
of composing web services. Services are bound to each other when the
composite service is compiled, and the bindings stay the same until the need
arises for them to change.
However, we live in a dynamic world. Services change periodically to
reflect new information or data. As a result, statically bound services can
become useless, unless the updated service's interface remains the same.
To overcome this problem, we can make use of dynamic composition,
which occurs at run time. Services are bound to other services on the fly
(based on their WSDL descriptions and ontological annotations). Dependency
and composition failures can often be solved by this method, since an
unavailable or changed service can simply be replaced by another one at run
time.
There is a slight problem with this method, though. If a service advertises
itself as something that it is not, dynamic composition might result in the use
of an incorrect service. For example, if we are looking for a translation
service and end up using a service that advertised itself as a translation
service but is actually an exchange rate service, our results will be
completely incorrect. When using dynamic composition on its own, we cannot
pick up on such problems until it is too late. We can use other recovery
methods in conjunction with dynamic composition to solve these problems
more effectively.
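Run-time binding and the mis-advertisement problem can be illustrated with a toy registry. The sketch assumes a registry that maps an advertised capability to candidate services (a stand-in for lookup based on WSDL descriptions and ontological annotations), and it pairs binding with a simple plausibility check on the result, as one example of using another recovery method alongside dynamic composition. All service names are hypothetical.

```python
def translate_good(text):
    """Hypothetical toy translation service."""
    return {"hello": "hallo"}.get(text, text)


def exchange_rate(text):
    """Hypothetical exchange-rate service that wrongly advertises itself as a translator."""
    return 7.25


# Registry of advertised capabilities, consulted at run time rather than
# fixed when the composite service is built.
REGISTRY = {"translation": [exchange_rate, translate_good]}


def invoke_dynamic(capability, request, check):
    """Bind to the first advertised service whose output passes `check`."""
    for service in REGISTRY.get(capability, []):
        result = service(request)
        if check(result):
            return service.__name__, result
    raise LookupError(f"no working service for {capability!r}")


print(invoke_dynamic("translation", "hello", check=lambda r: isinstance(r, str)))
# ('translate_good', 'hallo')
```

Without the `check` argument, the first registered service would be bound and its exchange rate returned as a "translation", which is the failure described above.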
4.3 Language Constructs
BPEL and WSDL provide us with some support for handling errors. We can
include catch branches in a process to catch exceptions (faults) as they
occur. A named catch branch can only catch faults that have been defined
(BPEL also offers a catchAll branch for faults without a specific handler), but
these constructs are still useful. We can force a process to complete (even
with incomplete data) by using these catch constructs. For example, if we
are expecting an integer value and the service receives a string value, we
can use the catch block to substitute the incoming value with a default value.
We can only do so if the value is not important or needed for the completion
of the process, and in most cases such a solution will not do.
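A Python analogue of such a catch block is sketched below, assuming the incoming value arrives as text and that `1` is an acceptable default (both assumptions for illustration; `read_quantity` is a hypothetical name). The flag in the return value records that a default was substituted, so the process at least knows it is working with incomplete data.

```python
def read_quantity(raw, default=1):
    """Parse an integer field; on a type fault, substitute a default.

    Mirrors a catch branch that assigns a default value. This is only
    acceptable when the value is not essential to completing the process.
    """
    try:
        return int(raw), False
    except (TypeError, ValueError):
        return default, True  # True marks that a default was substituted


print(read_quantity("3"))      # (3, False)
print(read_quantity("three"))  # (1, True)
```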
Figure 6 – Catch Branch in OBPMS
We can, however, use a catch branch to recover safely from an erroneous
state. Instead of just throwing an exception, we can use the catch branch to
catch the exception and return a user-friendly message informing the client
that something went wrong. The example in Figure 6 makes use of such a
catch branch.
In the code below, you can see where the catch branch is inserted (the
faultHandlers section). If the ShopNotFound fault is raised (for example
because the input is incorrect), the catch branch is invoked and a default
assignment is made: a default error message is copied to the output variable.
<scope name="shopScope">
  <faultHandlers>
    <catch faultName="ns1:ShopNotFound">
      <sequence name="Sequence_3">
        <assign name="assignErrorMsg">
          <copy>
            <from expression="'Shop Not Found'"/>
            <to variable="clientOutput" part="payload"
                query="/client:ShopFinderProcessResponse/client:result"/>
          </copy>
        </assign>
      </sequence>
    </catch>
  </faultHandlers>
  <sequence name="Sequence_1">
    <assign name="shopInputAssign">
      <copy>
        <from variable="clientInput" part="payload"
              query="/client:ShopFinderProcessRequest/client:input"/>
        <to variable="shopInput" part="payload" query="/ns1:shopdef"/>
      </copy>
    </assign>
    <invoke name="searchShop" partnerLink="ShopSearch" portType="ns1:ShopServiceV2"
            operation="process" inputVariable="shopInput" outputVariable="shopOutput"/>
  </sequence>
</scope>
Code Example 7 – Catch Branch In BPEL
4.4 Self-healing Networks
This is where most of the research has gone so far. Many researchers are
trying to come up with new ways in which a network can heal itself without user
intervention. Yu & Lin [2005] use a form of self-healing network in their
paper, combined with QoS constraints as a heuristic. Baresi et al. [2006] also
propose making use of self-healing networks. But what is a self-healing
network?
Self-healing networks are networks that are capable of recovering from
errors by themselves. In a Web Services context, they are networks that can
recover from composition faults by themselves. This is done by making use of
some external heuristic that monitors the network’s behaviour.
Different types of self-healing strategies have already been proposed.
Some strategies use QoS constraints as a way of ensuring stability when
composing Web Services (Yu & Lin [2005]). Baresi et al. [2006] propose a
strategy based on design by contract (a concept borrowed from the Eiffel
language). In their strategy, you can set pre- and post-conditions that have to
be met (similar to QoS constraints), but they also weave monitoring code into
the process that monitors the workflow and checks the pre- and
post-conditions of the services invoked.
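The idea of weaving contract checks around each invocation can be sketched with a Python decorator. This is only an analogy: Baresi et al. [2006] express their assertions in their own notation and check them with a separate monitor, whereas here the pre- and post-conditions are plain Python predicates, and `gps_lookup` is a hypothetical service that returns an impossible latitude.

```python
import functools


class ContractViolation(Exception):
    """Raised when a monitored service breaks its contract."""


def monitored(pre, post):
    """Wrap a service call with a pre-condition on the request and a
    post-condition on the reply, raising ContractViolation on failure."""
    def wrap(service):
        @functools.wraps(service)
        def call(request):
            if not pre(request):
                raise ContractViolation(f"pre-condition failed for {service.__name__}")
            reply = service(request)
            if not post(request, reply):
                raise ContractViolation(f"post-condition failed for {service.__name__}")
            return reply
        return call
    return wrap


@monitored(pre=lambda addr: bool(addr.strip()),
           post=lambda addr, coords: -90 <= coords[0] <= 90 and -180 <= coords[1] <= 180)
def gps_lookup(address):
    """Hypothetical GPS service that returns an out-of-range latitude."""
    return (123.0, 28.2)


try:
    gps_lookup("1 Main Road, Pretoria")
except ContractViolation as err:
    print(err)  # post-condition failed for gps_lookup
```

In a self-healing setting, the `ContractViolation` would be the signal that triggers one of the recovery strategies described below, rather than being printed.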
Since Web Services live in a very dynamic environment, self-healing
networks might be the way to go in the future. Baresi et al. [2006] use the
example of a pizza company to explain their concepts; the flow diagram in
Figure 7 is taken from their paper.
In the example, a client uses a web site or WAP-enabled phone to contact
the pizza company. The client is then authenticated, after which his profile
is loaded. This profile holds information regarding the client's favourite pizzas.
The Pizza Catalogue Service then offers the client a choice of four
different pizzas. When the client has made his choice, his credit card details
are validated by the Credit Card Validation Web Service. If everything
goes according to plan, the client's account is debited and the pizza
company's account is credited. At the same time, the order appears in the
browser of the pizza chef, informing him of the new order. Alongside
this, the address of the client is obtained from the Phone Company
Service, and the GPS Web Service is then called to obtain the precise
coordinates of the address. Once the coordinates are obtained, a map is
retrieved from the Map Web Service. After this has completed successfully,
the map is sent to the delivery boy's PDA, and an SMS is sent to the client
informing him that his pizza will be delivered in 20 minutes. Various failures
can occur in this example, and because it makes use of dynamic
composition, failures are bound to happen.
Figure 7 – Flow diagram of Pizza Company
In the paper, the authors propose two failure detection methods and
three recovery methods. The two detection methods are briefly
discussed in Section 5. The three recovery methods proposed by
Baresi et al. [2006] are:
• Retry: if a binding to a service fails, we retry in the hope that it was a
one-off failure.
• Dynamically bind to another service: we rebind to another service
that offers the same functional or non-functional properties as the one
that is unavailable.
• Process reorganization: a dynamic reorganization of the process at
run-time, in order to overcome the problems due to a faulty or
unavailable external service, for which no alternative matching service
can be found.
These methods can be structured hierarchically. This means that if a
service cannot be reached, we first retry the service a few times. If that
strategy does not work, we rebind to another service. If that also fails, we
switch to the most complex recovery method, namely process
reorganization.
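The hierarchy can be sketched as a function that escalates from the cheapest strategy to the most expensive one. It assumes that services are plain callables which raise `ServiceUnavailable` when they cannot be reached, and it leaves process reorganization as a caller-supplied function, since that step depends on the graph transformation rules discussed next. All names are hypothetical.

```python
class ServiceUnavailable(Exception):
    """Raised by a hypothetical service that cannot be reached."""


def recover(primary, alternatives, reorganize, request, retries=3):
    """Escalate through the recovery strategies, cheapest first:
    retry the same service, then rebind to an equivalent one, then fall
    back to reorganising the process."""
    for _ in range(retries):                    # 1. call, retrying on failure
        try:
            return "primary", primary(request)
        except ServiceUnavailable:
            pass  # assume a one-off failure and try again
    for service in alternatives:                # 2. rebind to another service
        try:
            return "rebind", service(request)
        except ServiceUnavailable:
            continue
    return "reorganize", reorganize(request)    # 3. process reorganization


def down(request):
    raise ServiceUnavailable(request)


def backup_map(request):
    return f"map for {request}"


print(recover(down, [down, backup_map], reorganize=lambda r: None, request="Hatfield"))
# ('rebind', 'map for Hatfield')
```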
In process reorganization, we can locally reorganize services if we cannot
rebind to another service that offers the same properties as the
unavailable service. This is done using graph transformation rules. Using
this strategy, we can split single nodes into parallel and disjoint nodes, and
we can also combine parallel nodes into single nodes. This is done by
ensuring that the pre- and post-conditions remain the same for the resulting
nodes after the transformation has been applied. For example, if a single node
n is split into two nodes n1 and n2, the pre-condition of n1 will be the same as
that of n, and the post-condition of n2 will be the same as that of n. In
addition, the post-condition of n1 must imply the pre-condition of
n2.
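Writing pre(n) and post(n) for the pre- and post-condition of a node n, the conditions for splitting n into n1 followed by n2 can be summarised as:

```latex
\begin{aligned}
\mathit{pre}(n_1)  &= \mathit{pre}(n), \\
\mathit{post}(n_2) &= \mathit{post}(n), \\
\mathit{post}(n_1) &\Rightarrow \mathit{pre}(n_2).
\end{aligned}
```

Taken together, and assuming n1 and n2 each honour their own contracts, running n1 and then n2 from any state that satisfies pre(n) ends in a state that satisfies post(n), which is exactly what the original node promised.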
As a more concrete example, if the Get Map and Route Service
cannot return a map of the correct resolution for the PDAs, we can split that
service into two services: Get Good Map and Route, and Filter Map.
Get Good Map and Route returns a high-resolution map, and Filter
Map scales the map down to the proper resolution for the PDAs.
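The Get Map example can be sketched in the same terms. The sketch assumes that a map is represented only by its width in pixels and that a PDA screen is 320 pixels wide (both hypothetical simplifications, and the route part of the service is left out). The assertions check at run time that pre(n1) holds where pre(n) held, that post(n1) implies pre(n2), and that post(n2) delivers post(n).

```python
PDA_MAX_WIDTH = 320  # assumed PDA screen width in pixels


def get_good_map_and_route(address):
    """Hypothetical replacement node n1: returns a high-resolution map."""
    return {"address": address, "width": 1280}


def filter_map(map_):
    """Hypothetical node n2: scales the map down to the PDA resolution."""
    return {**map_, "width": min(map_["width"], PDA_MAX_WIDTH)}


def pre_get_map(address):   # pre(n), which is also pre(n1)
    return bool(address)


def pre_filter(map_):       # pre(n2), which post(n1) must imply
    return "width" in map_


def post_get_map(map_):     # post(n), which is also post(n2)
    return map_["width"] <= PDA_MAX_WIDTH


def get_map_split(address):
    """The reorganised node n, executed as n1 followed by n2."""
    assert pre_get_map(address)
    intermediate = get_good_map_and_route(address)
    assert pre_filter(intermediate)
    result = filter_map(intermediate)
    assert post_get_map(result)
    return result


print(get_map_split("1 Main Road"))  # {'address': '1 Main Road', 'width': 320}
```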
4.5 Trivial Recovery Methods
There are some trivial recovery methods that can be used. A good one is
caching. Clients can cache previously retrieved information and recall it when
the service cannot be found (Figure 8), or if some failure occurred during the
request. This is only useful if the service offers information that does not
change often (e.g. a service giving information about bus times). In cases
where the information changes very often (e.g. a service that offers the latest
stock exchange information), this approach is useless, since old information
does not help the client.
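A client-side cache of this kind can be sketched as follows. The sketch assumes that the service signals failure with `ConnectionError` and that the client chooses a maximum age per kind of information (a day for bus timetables, for instance); `CachingClient` and `bus_times` are hypothetical names. For fast-changing data such as stock prices, the maximum age would have to be so small that the cache is of little use, which is the limitation described above.

```python
import time


class CachingClient:
    """Return the last good reply when the service fails, provided the
    cached copy is not older than max_age_s seconds."""

    def __init__(self, fetch, max_age_s):
        self.fetch = fetch
        self.max_age_s = max_age_s
        self.cache = {}  # request -> (timestamp, reply)

    def get(self, request):
        try:
            reply = self.fetch(request)
            self.cache[request] = (time.monotonic(), reply)
            return reply, "live"
        except ConnectionError:
            if request in self.cache:
                stored_at, reply = self.cache[request]
                if time.monotonic() - stored_at <= self.max_age_s:
                    return reply, "cached"
            raise  # no usable cached copy: report the failure


service_up = True


def bus_times(route):
    """Hypothetical bus timetable service."""
    if not service_up:
        raise ConnectionError("bus service not found")
    return ["08:00", "08:30"]


client = CachingClient(bus_times, max_age_s=24 * 3600)  # timetables change rarely
print(client.get("route 5"))  # (['08:00', '08:30'], 'live')
service_up = False
print(client.get("route 5"))  # (['08:00', '08:30'], 'cached')
```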
Another trivial method is simply to keep requesting the information
until it is received, or until a specified time-out is reached. This type of error
recovery is the easiest, but it is also the most undesirable of all the recovery
methods, since clients do not want to wait for a service to respond to a
request. Clients would prefer to use the quickest and most accurate service,
one that provides results in a fast and reliable manner.
Figure 8 – Flow Diagram of a Trivial Recovery Method
5 Failure Detection
Having classified some of the most common failures and discussed some
of the most common recovery methods, we need a way to detect whether a
failure has occurred in order to bring the two together.
Failure detection algorithms are used in Self-healing networks to detect
whether or not something went wrong during the composition phase. There
are various ways in which this can be done, but these various techniques can
be split into two main categories: dynamic detection of errors and static
detection of errors.
Dynamic detection implies that the error or failure is detected during
execution or during run-time. Static detection implies that errors or failures are
detected in an offline fashion (in other words, not during run-time).
Baresi et al. [2006] propose two methods, called Defensive Process
Design (DPD) and Service run-time Monitoring (SrtM), which are both forms of
dynamic detection. Ouyang et al. [2005] propose an automated analysis using
Petri net techniques, which is a form of static detection.
Since this is not the main focus of this document, these methods will be
discussed briefly in the following subsections.
5.1 Defensive Process Design
According to Baresi et al. [2006], Defensive Process Design (DPD)
consists of designing services so that they can cope with failures. This is
done by using some of the language constructs included in the BPEL
standard. By designing services in this way, we can detect and gracefully
recover from most exceptions and failures.
For example, a time-out failure can be detected by encapsulating the
invoke action in a scope that has a timer. Once the timer has run out, the
service can recover from the time-out exception by calling another service,
rebinding, or even retrying the same service.
This type of detection ties in with Section 4.3 since we can use exception
handlers and catch blocks to detect when an error has occurred. BPEL also
provides us with other constructs that will also help with the detection of
failures.
5.2 Service run-time Monitoring
Service run-time Monitoring (SrtM) consists of using external monitoring
tools to check whether functional and non-functional contracts are violated.
There are various methods that can be used to monitor services; Baresi et al.
[2006] propose an assertion-based approach.
In their approach, pre- and post-conditions are specified for remote
services. These are checked by a separate tool, which notifies the process
engine if anything goes wrong. If a pre- or post-condition has been violated,
the process engine then takes the appropriate actions to recover from the
error.
The ASTRO tool set (Trainotti et al. [2005]) uses a similar method in its
WS-mon component. The main difference is that the monitoring code is
generated automatically by ASTRO, and the monitors are implemented in
Java.
5.3 WofBPEL
Ouyang et al. [2005] propose a technique based on Petri net analysis.
They propose the use of an external tool, WofBPEL, which can analyse
composite services once they have been translated into the Petri Net Markup
Language (PNML). Unlike the previous two methods, which analyse service
composition dynamically, this technique analyses service composition
statically, in an off-line fashion.
A composite service needs to be translated into a secondary language
before it can be analysed for errors. At the time of the article, the tool only
supported three types of error detection: detection of unreachable actions,
detection of conflicting message-consuming activities and metadata
generation for garbage collection of unconsumable messages.
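Detection of unreachable actions can be illustrated with a much simpler stand-in for Petri net analysis: a breadth-first search over the control-flow graph of a process. This sketch ignores conditions, concurrency and message flow, all of which WofBPEL takes into account, and the activity names are hypothetical.

```python
from collections import deque


def unreachable_activities(flow, start):
    """Return the activities that no path from `start` can reach.

    `flow` maps each activity to the activities that may follow it.
    """
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in flow.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    every = set(flow) | {n for succ in flow.values() for n in succ}
    return sorted(every - seen)


flow = {
    "receiveInput": ["invokeShopSearch"],
    "invokeShopSearch": ["replyOutput"],
    "assignDefault": ["replyOutput"],  # nothing ever leads here
    "replyOutput": [],
}
print(unreachable_activities(flow, "receiveInput"))  # ['assignDefault']
```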
6 Three Scenarios
In this section I introduce three scenarios in which service-oriented
computing (SOC) can be used for a real-world implementation. There are
many different applications for SOC, some very big and some relatively
small. With these scenarios I try to cover a wide spectrum, from a small
implementation (Foreign Traveller Information) to a large-scale
implementation (the General Entertainment Planner).
6.1 Foreign Traveller Information
The idea here is that you are a tourist who has just landed in a foreign
country. You want to be able to get information regarding transport options to
and from your hotel.
In this example, information regarding bus times, stations and prices can
be accessed from a mobile device or a laptop. This is done by making use of
different services (one for bus times, another for geographical information,
etc.). The main program finds suitable services to use and composes the
received data in a meaningful way for the client using the program. Many
services are involved, but only a small amount of data is needed from them in
the end. See Figure 9.
This can be related to a real-world scenario. A university professor, on his
way back from a conference, misses his connecting flight due to a delayed
flight from his previous destination. He enquires about other flights and finds
out that all flights to his final destination are fully booked, and that the next
available flight only leaves the following night. Now the professor has a
problem: it is late at night, and he needs to book a flight and also a hotel for
the night. Thankfully, there are various web sites that the professor can visit to
make these bookings, and many of these web sites make use of Web
Services to gather information. So the professor goes to a web site that
allows him to make a hotel reservation.
Figure 9 – Foreign Traveller Information
The site gathers information from all the local hotels and displays it to the professor so that he can make an informed choice. He also visits a web site to book his flight for the following evening. Thanks to Web Services, the day was saved: the professor got a good night's rest and got home safely on the later flight that he booked using the web sites.
6.2 General Entertainment Planner
In this example a user can plan his night out by finding information about
nearby entertainment complexes. A user will be able to find out, for example,
which movies are showing at cinema complexes and at what times they are
showing.
He can also get directions to these cinemas from his current
location. Other information that users will be able to access includes
information about restaurants, pubs, clubs, bars and other entertainment
hubs. This obviously means that all of this information must be obtained from
various sources so that the user can plan his night. You will need information
on each place's location (geographical information, so that the user can get
maps to these places), information about the specific places (prices,
atmosphere, type of place, etc.) and probably some sort of translation
service, so that the information can be displayed in various languages. This
once again involves many different services from different sources, and in the
end the information obtained from these services must be composed in a
meaningful way. See Figure 10.
6.3 Mall Information System
In this example the idea is very simple. A user wants to locate the nearest
branch of a specific shop (e.g. a stationery shop such as CNA) in his area. He
also wants to know whether the shop will have what he is looking for, and how
to get there. The user must be able to access this information from his home
computer as well as his mobile phone (or other mobile device). This requires
that the system can find information about malls and the shops that they
have. It also needs to find geographical information so that it can give the
user directions to the mall. Instead of giving the user a map, the system must
be able to give the user directions in a descriptive manner.
This example once again needs information from different services, but
this time it is on a smaller scale. The system only needs to provide the user
with a list of shopping malls where the shop can be found, and directions to
the nearest one (or one chosen by the user). See Figure 11.
Figure 10 – General Entertainment Planner
Figure 11 – Mall Information System
7 Example Scenario: Shopping Domain
Shopping centres are being built everywhere nowadays and they are
getting bigger and bigger. Many centres, though, do not have all the shops that
you would want to visit. Although almost every major shopping centre has a
web site with a store directory on it, not many of us take the time to go onto
the internet and find out which shopping centre contains a particular store. It
would be much simpler to just use your cell phone to get the information
about a shopping centre. Furthermore, not many of us know where some of
the major shopping centres are.
In a perfect world we would all know the directions to each of these centres,
as well as which stores each one has. But, as we all know by now, that is
impossible: firstly because there are too many shopping centres, and
secondly because many shopping centres evolve and change. Older stores close to
make way for newer ones, and thus the store directory keeps
changing. The system I propose will help frequent
shoppers know exactly where to go and what they can expect.
In concept, the system is very simple. The customer will use his cell
phone, another mobile device or his computer to gather the required
information. A program on each device will connect to the necessary services
and return the results in a meaningful way. The program is also responsible
for error handling and recovery.
Many different languages can be used to describe a web
service, and almost all of them are based on XML. Depending on the type of
description we want, we can describe a service using any one of the following
standards:
• Web Services Description Language (WSDL)
• OWL and OWL-S
• DAML and DAML-S
Each of these languages has its own way of describing a service. WSDL
mainly describes the interface and can also contain a short description of the
service. It describes the interface as a set of end-points operating on
messages; these messages are described abstractly and are then bound to
concrete network protocols. OWL describes the semantics of the service: it is
used (through OWL-S) to describe the ontology of the service, in other words,
what the service does and how it behaves. For the example, we will use WSDL as
the description language.
Several workflow languages have also been proposed, including:
• BPEL (Business Process Execution Language)
• WSFL (Web Services Flow Language)
• XLANG (Web Services for Business Process Design)
• WSCI (Web Service Choreography Interface)
• BPML (Business Process Markup Language)
• BPSS (Business Process Specification Schema)
All of these languages have their own characteristics. According to van
der Aalst [2003], XLANG is block-structured, with basic control flow
structures. WSFL, on the other hand, is not limited to block structures and
allows for directed graphs. It mainly describes Web Service composition and
considers two types of composition: usage patterns and interaction patterns.
Usage patterns are concerned with how to achieve a particular goal, and
interaction patterns are concerned with how a collection of Web Services
interact. BPEL builds on both of these languages (XLANG and WSFL) and therefore
supports most of the constructs of both. It provides a programming
abstraction that allows developers to compose multiple discrete Web Services
into an end-to-end process flow. The other languages (WSCI, BPML and
BPSS) are quite new and have not yet caught on as a standard for Web
Services.
We will use BPEL as the flow language. WSDL and BPEL were chosen for their
ease of use, and also because my development platform (Oracle JDeveloper
10g [1]) supports only these two languages.
To simulate the use of this system and its ability to recover from a
failure, the services that are used will be fake services, which I created in
JDeveloper 10g [1]. These services will only return the information the
system needs. This setup allows me to break a service deliberately,
so that the system can then start the recovery process.
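The idea of a fake service that can be broken on purpose can be pictured with a minimal Java sketch. It assumes an in-process stand-in rather than a service deployed from JDeveloper, and the class name DummyShopService, its findMalls method and the breakService/repairService switch are all hypothetical. The only point it illustrates is that a test can break a service on demand so that the caller's recovery code runs.

```java
import java.util.List;
import java.util.Map;

// Hypothetical in-process stand-in for one of the fake services.
// It returns canned data, and a switch lets a test "break" it on demand.
class DummyShopService {
    private final Map<String, List<String>> mallsByShop;
    private boolean broken = false;

    DummyShopService(Map<String, List<String>> mallsByShop) {
        this.mallsByShop = Map.copyOf(mallsByShop);
    }

    // Simulate an outage: every call fails until the service is repaired.
    void breakService() { broken = true; }

    void repairService() { broken = false; }

    // Returns the malls that contain the requested shop (empty if unknown),
    // or throws when the service has been deliberately broken.
    List<String> findMalls(String shop) {
        if (broken) {
            throw new IllegalStateException("DummyShopService is unavailable");
        }
        return mallsByShop.getOrDefault(shop, List.of());
    }
}
```

Throwing an unchecked exception keeps the fake close to what a client proxy does when the remote endpoint is down, so the same catch block handles both.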
Figure 12 – Flow Diagram showing where Sub-Goals will be Checked
[Diagram: Get Shopping Center Listing -> Shopping Center Listing Retrieved? (Yes/No) -> Get City Map -> City Map Retrieved? (Yes/No) -> Display Retrieved Data, with "Check Sub-Goal Here" marked after each of the two service calls.]
Although there are many different recovery methods, the most practical
one to use when dealing with Web Services is a transaction-based approach
to recovery. With this approach, we can control where and when failures
will be detected. We do this by checking for certain sub-goals that need to
be completed before we can continue processing the information. Logical
places to insert sub-goals are after each call to a service. Once a service
is invoked, we check that it has responded to our request; if it has, that
particular sub-goal is complete. If it has not, we can reissue our request
or rebind to another service. Figure 12 shows where the sub-goals will be
checked.
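The checks in Figure 12 can be sketched as follows, as a minimal Java illustration rather than the actual program. It assumes each service call is wrapped in a Supplier that returns an empty Optional (or throws) when the service does not respond; the names SubGoalFlow, callWithSubGoal and run, and the policy of reissuing the request once before rebinding to an alternative, are hypothetical choices made for the example.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of the Figure 12 flow: a sub-goal is checked after
// each service call before processing continues.
class SubGoalFlow {

    // One step: invoke the service; if the sub-goal is not met, reissue the
    // request once, then rebind to the alternative service.
    static <T> Optional<T> callWithSubGoal(Supplier<Optional<T>> primary,
                                           Supplier<Optional<T>> alternative) {
        for (Supplier<Optional<T>> attempt : List.of(primary, primary, alternative)) {
            Optional<T> result = invokeSafely(attempt);
            if (result.isPresent()) {
                return result; // sub-goal complete
            }
        }
        return Optional.empty(); // sub-goal could not be met
    }

    // A thrown exception is treated the same as "no response".
    private static <T> Optional<T> invokeSafely(Supplier<Optional<T>> call) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            return Optional.empty();
        }
    }

    // Get Shopping Center Listing -> check -> Get City Map -> check -> display.
    static String run(Supplier<Optional<String>> listing, Supplier<Optional<String>> listingAlt,
                      Supplier<Optional<String>> cityMap, Supplier<Optional<String>> cityMapAlt) {
        Optional<String> centres = callWithSubGoal(listing, listingAlt);
        if (centres.isEmpty()) {
            return "Failed: no shopping centre listing";
        }
        Optional<String> map = callWithSubGoal(cityMap, cityMapAlt);
        if (map.isEmpty()) {
            return "Failed: no city map";
        }
        return "Display: " + centres.get() + " / " + map.get();
    }
}
```

Treating an exception and an empty result the same way keeps each sub-goal check to a single test, at the cost of losing the reason for the failure.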
For the program, I chose .NET as my development environment, mainly
because of its ease of use, but also because Web Services can easily be
integrated into the code.
A common way to simulate a transaction-based approach in any
programming language is to use try-catch blocks or if-statements.
With try-catch blocks it is easy to detect whether an error occurred,
and if one did, we can recover from it in the catch block. The following
piece of C#-like pseudo code shows how this would look.
using System;
using System.Windows.Forms;

// Both methods belong to the search form class.
public void SearchServices(string shop, string city, string prov)
{
    const string url =
        "http://aikon:9700/orabpel/default/DummyService_1/DummyService_1?wsdl";
    bool done = false;
    while (!done)
    {
        try
        {
            string service = InvokeMapService(url);
            done = true; // sub-goal met: the service responded
        }
        catch (Exception)
        {
            DialogResult button = MessageBox.Show(
                "Error occurred during invocation of service. Retry invocation?",
                "Invocation Error", MessageBoxButtons.RetryCancel, MessageBoxIcon.Error);
            done = (button != DialogResult.Retry); // stop when the user cancels
        }
    }
}

public string InvokeMapService(string url)
{
    // CallMapService and ReturnMapServiceResults stand for the generated proxy
    // calls. Exceptions are not caught here, so the caller can recover.
    CallMapService(url, parameter1, parameter2);
    return ReturnMapServiceResults();
}
Code Example 8 – Pseudo code for a Transaction-based approach
Transactions can also be simulated in a similar way using if-statements. The
code will look almost exactly the same as the try-catch example above, but
determining whether a failure occurred is more difficult, because the program
must inspect every return value instead of relying on an exception being raised.
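To make the difference concrete, here is a hedged Java sketch of the if-statement style. It assumes a hypothetical MapService that reports failure through a status code in its return value instead of throwing; the Response record, the status value 200 and fetchMap are illustrative assumptions. Every call site has to repeat the test that the try-catch version gets from the catch block.

```java
// Hypothetical sketch of failure detection with if-statements: the service
// signals failure in its return value, so the caller must check each result.
class IfStatementRecovery {

    // Assumed response shape: a status code plus a payload.
    record Response(int status, String body) {
        boolean ok() {
            return status == 200 && body != null && !body.isEmpty();
        }
    }

    interface MapService {
        Response getMap(String city);
    }

    // Reissues the request up to maxAttempts times; returns null if every
    // attempt failed, which the caller must in turn test for.
    static String fetchMap(MapService service, String city, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Response r = service.getMap(city);
            if (r != null && r.ok()) {
                return r.body(); // sub-goal met
            }
            // Not met: fall through and reissue the request.
        }
        return null;
    }
}
```

The null returned on total failure shows the weakness the text points out: nothing forces the caller to check it.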
7.1 Program Demo
In this section I demonstrate how the program works and how
it copes with failures. When the program is started, the user must enter the
requested data into the fields. The requested data are the shop name,
province and city. This is shown in Figure 13.
The program then retrieves the relevant information and displays
it on the screen. Depending on the results found, either only one result is
returned, in which case the system automatically displays the results page
for it, or the user is given a list of results and must choose which one to
display. Once the user has made a choice, the program displays the shop name,
the mall name, additional information and directions on how to get there.
This is shown in Figure 16.
If something goes wrong during the invocation of the
service, the program informs the user and asks how he wants
to handle the situation. The user can either retry the invocation or ask
the program to handle the error. The program first retries the invocation,
after which it tries to find a new service (if one is available). If
something goes wrong during the operations on the services, the
program uses standard transaction-based rules to recover from
the failure. This is shown in Figure 17. Sometimes there is no
possibility of recovery; this situation is shown in Figure 18.
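The escalation described above (retry, then look for a new service, then give up) might be sketched as below. This is an assumption-laden Java illustration, not the demo program: endpoints are plain strings, bind is a hypothetical function that turns an endpoint into a callable service, and a thrown RuntimeException stands for a failed invocation. The Outcome record separates a direct answer, a recovered answer (the Figure 17 case) and no recovery at all (the Figure 18 case).

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: retry the current service, then rebind to an
// alternative endpoint, and report failure when no endpoint is left.
class EscalatingInvoker {

    // servedBy == null means no endpoint answered (no recovery possible).
    record Outcome(String response, String servedBy, boolean recovered) {
        boolean failed() { return servedBy == null; }
    }

    // bind turns an endpoint address into a callable service; a thrown
    // RuntimeException stands for a failed invocation.
    static Outcome invoke(String request, List<String> endpoints,
                          Function<String, Function<String, String>> bind,
                          int retriesPerEndpoint) {
        for (int i = 0; i < endpoints.size(); i++) {
            String endpoint = endpoints.get(i);
            Function<String, String> service = bind.apply(endpoint);
            for (int attempt = 0; attempt <= retriesPerEndpoint; attempt++) {
                try {
                    String response = service.apply(request);
                    // Any answer after the very first attempt counts as a recovery.
                    return new Outcome(response, endpoint, i > 0 || attempt > 0);
                } catch (RuntimeException e) {
                    // Invocation failed: retry this endpoint, then move on.
                }
            }
        }
        return new Outcome(null, null, false);
    }
}
```

Keeping the retry count per endpoint explicit makes the trade-off visible: more retries tolerate transient faults but delay the switch to a healthy alternative.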
8 Related Work
During my research, I did not come across any research papers that deal
with the classification of faults in Web Services. Many papers do, however,
name some common faults that can occur. Baresi et al. [2006] name some of
the faulty behaviour that can occur at deployment time and at run time,
but they do not try to classify it into categories.
Tanenbaum & van Steen [2002] classify faults in distributed
systems. Some of these faults are closely related to faults that can occur in
Web Services and have been included in the classification model in
Section 3, but their work is focussed on distributed systems rather than Web
Services.
A great deal of research has also gone into the detection of faults,
something I did not cover in detail in this document. Ouyang et al. [2005] use
an automated tool to detect a limited set of faults by means of Petri net
analysis techniques. Their tool, WofBPEL, can detect unreachable services,
services that use ambiguous input or output, and invalid input
messages to a service (in other words, messages of the wrong type for the
service). Their analysis, however, is done statically, and the BPEL processes
have to be converted into another language before they can be analysed. Baresi
et al. use two run-time methods, DPD and SrtM, to detect failures when using
Self-healing networks. Another detection strategy is included in ASTRO
(Trainotti et al. [2005]). In ASTRO, monitors are generated automatically in
Java. These monitors check predefined properties of the associated processes
and produce feedback in the event of a failure. These properties can be
related back to the pre- and post-conditions of a service.
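A run-time monitor that checks pre- and post-conditions can be sketched in a few lines of Java. This is not the code ASTRO generates; MonitoredService and its violation log are hypothetical, and the conditions are plain predicates supplied by the developer. It only shows how a violated condition becomes recorded feedback instead of a silent wrong answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical run-time monitor: wraps a service call, checks a
// pre-condition on the input and a post-condition on the output,
// and records each violation as feedback.
class MonitoredService<I, O> implements Function<I, O> {
    private final String name;
    private final Function<I, O> service;
    private final Predicate<I> pre;
    private final Predicate<O> post;
    private final List<String> violations = new ArrayList<>();

    MonitoredService(String name, Function<I, O> service,
                     Predicate<I> pre, Predicate<O> post) {
        this.name = name;
        this.service = service;
        this.pre = pre;
        this.post = post;
    }

    @Override
    public O apply(I input) {
        if (!pre.test(input)) {
            // Do not call the service with input it cannot handle.
            violations.add(name + ": pre-condition violated for input " + input);
            return null;
        }
        O output = service.apply(input);
        if (!post.test(output)) {
            violations.add(name + ": post-condition violated, output " + output);
        }
        return output;
    }

    List<String> violations() {
        return List.copyOf(violations);
    }
}
```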
Recovery methods have also been studied extensively. Both Tanenbaum & van
Steen [2002] and Tartanoglu et al. [2006] classify recovery methods into two
categories, namely forward and backward error recovery. Both also mention
transactions as a successful way to recover from failure. However, most of the
research focuses on Self-healing networks and the dynamic composition of
services. Other methods are also discussed, but not as much as the
Self-healing Approach. The Transaction-Based approach, however, has been
mentioned in different papers and textbooks under many different names and
guises. It seems to be the most logical choice when you do not want to use a
Self-healing network (even though the two methods can be combined
successfully to produce an even better recovery method).
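Backward error recovery in a transaction-based approach is often implemented with compensating actions. The Java sketch below assumes that every step of a composition can supply an undo action; the names CompensatingTransaction, Step and step are hypothetical. When a step fails, the completed steps are compensated in reverse order, so that, assuming each compensation fully undoes its step, the composition returns to its starting state.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of backward error recovery with compensating actions.
class CompensatingTransaction {

    interface Step {
        void execute();     // throws a RuntimeException to signal failure
        void compensate();  // undoes the effect of a successful execute()
    }

    static Step step(Runnable action, Runnable undo) {
        return new Step() {
            public void execute() { action.run(); }
            public void compensate() { undo.run(); }
        };
    }

    // Returns true if every step completed, false if the work was rolled back.
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step s : steps) {
            try {
                s.execute();
                completed.push(s);
            } catch (RuntimeException e) {
                // Backward recovery: undo the most recent step first.
                while (!completed.isEmpty()) {
                    completed.pop().compensate();
                }
                return false;
            }
        }
        return true;
    }
}
```

The failed step itself is not compensated, because it never completed; only steps that were pushed onto the stack are undone.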
Various tools and languages have also been created to help with the
composition of services. Brogi & Popescu [2005] use Yet Another Workflow
Language (YAWL), a workflow language that can express not only the basic
workflow but also the behaviour of the composition, for the semi-automated
aggregation of Web Services. YAWL is based on Petri nets, which makes failure
detection somewhat easier. In this approach, a service that uses BPEL as the
workflow language and OWL as the descriptor first needs to be translated into
YAWL. After that, services are expanded to include control-flow constructs.
These constructs are then used in the next phase to make sure that the
aggregated services do not contain processes with unsatisfied inputs. The
constructs can be seen as pre- and post-conditions of a service; if they are
not met, the composition fails. Finally, the service is deployed as a normal
Web Service. The proposed strategy is promising in theory, but even though it
is “semi-automated”, it is still an off-line strategy.
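The check for unsatisfied inputs can be illustrated with a much simplified Java sketch. It assumes a purely sequential composition in which each service is reduced to the names of its inputs and outputs; it is not Brogi & Popescu's algorithm, and InputChecker and its Service record are hypothetical. An input counts as satisfied if it is supplied up front or produced by an earlier service.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified check that no service in a sequential
// composition is left with an unsatisfied input.
class InputChecker {

    record Service(String name, Set<String> inputs, Set<String> outputs) {}

    // Returns the problems found; an empty list means every input is satisfied.
    static List<String> unsatisfiedInputs(Set<String> initial, List<Service> sequence) {
        Set<String> available = new HashSet<>(initial);
        List<String> problems = new ArrayList<>();
        for (Service s : sequence) {
            for (String input : s.inputs()) {
                if (!available.contains(input)) {
                    problems.add(s.name() + " needs " + input);
                }
            }
            // Outputs become available to every later service.
            available.addAll(s.outputs());
        }
        return problems;
    }
}
```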
Ponnekanti & Fox [2002] proposed a developer toolkit for the composition
of Web Services called SWORD. Developer toolkits are not new in themselves,
but SWORD composes services once it is supplied with the necessary pre- and
post-conditions, and it generates rule-based plans using these conditions as a
starting point.
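Planning from pre- and post-conditions can be illustrated with simple forward chaining. The Java sketch below is not SWORD's rule engine; it assumes each hypothetical Service is described only by the facts it needs (pre) and the facts it produces (post), and it repeatedly applies any unused service whose pre-conditions hold until the goal is reached or no service can fire.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical forward-chaining composer over set-based pre/post-conditions.
class ForwardChainingComposer {

    record Service(String name, Set<String> pre, Set<String> post) {}

    // Returns the service names in application order, or empty if the goal
    // cannot be reached with the given services.
    static Optional<List<String>> plan(Set<String> known, Set<String> goal,
                                       List<Service> services) {
        Set<String> facts = new HashSet<>(known);
        List<String> plan = new ArrayList<>();
        Set<Service> used = new HashSet<>();
        boolean progress = true;
        while (!facts.containsAll(goal) && progress) {
            progress = false;
            for (Service s : services) {
                if (!used.contains(s) && facts.containsAll(s.pre())) {
                    used.add(s);
                    facts.addAll(s.post()); // post-conditions become new facts
                    plan.add(s.name());
                    progress = true;
                    if (facts.containsAll(goal)) {
                        break;
                    }
                }
            }
        }
        return facts.containsAll(goal) ? Optional.of(plan) : Optional.empty();
    }
}
```

Because facts only ever accumulate, this finds a plan whenever one exists under these simple conditions, but not necessarily the shortest one.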
Pautasso & Alonso [2003] created a visual language in which a service's
workflow can be described graphically. Their language, called BioOpera Flow
Language (BFL), works very much like BPEL's graphical notation in the Oracle
BPEL Process Manager Suite (OBPMS) [1]. BFL has many of the same constructs,
as well as a development environment designed specifically for it.
All in all, much research has gone into recovery and detection methods,
but little into failures as such. Many researchers mention some of the
failures they came across in their publications, but they do not classify
them.
9 Conclusion
Web Services live in a very dynamic environment, and as a result many
things can go wrong during the lifetime of a single Web Service. This paper
tries to classify some of the common failure points when using Web Services.
This classification is by no means complete; it only serves as a model with
which certain failures can be associated. Very little research has gone into
the classification of failures. Some papers simply name them (Baresi et al.
[2006]), while others classify them in their own way (Tartanoglu et al.
[2006]). More research has gone into recovery from failures than into the
failures themselves.
Different recovery methods have been proposed, but some of the more
popular ones have stayed in the research arena longer. Nowadays more
research is going into the self-healing composition of services than into any
other recovery method. This is partly due to its success, but also because
there are still many areas in self-healing networks that can be improved upon.
Transaction-based approaches have been around for a long time and they
have proven to be successful in the real world already. Some problems do
persist though when using a transaction-based approach in a distributed
fashion, but models have been proposed to solve this (Mikalsen et al. [2002]).
Other methods also exist. Tartanoglu et al. [2006] use the term Forward
Error Recovery for all those recovery methods that come from the
workflow language itself (exception handling etc.). There are also
trivial methods, such as caching, that are not suited to Web Services at all
and only show why all these different recovery methods are needed.
The research field of recovery from failure is far from exhausted, and much
research can still be done in various related areas. Even though it was
not covered in this document, a lot of research is also continuing in service
discovery. Discovery and recovery can go hand-in-hand, especially
in Self-healing networks, which recover by searching for (discovering) other
services that can take over from a service that failed. Various other research
fields are opening up in Web Services, and all of them have to deal with
failure and recovery at some point. This document tries to show how important
a formal classification of failures can be.
10 Acknowledgements
I would like to thank May Chan for her help and all the discussions
regarding this topic. I would also like to thank my supervisor, Prof. J. Bishop,
for her support in guiding me in the right direction every time.
References
[1] "Oracle BPEL Process Manager Suite 10g," Oracle.
[2] "Service-oriented architecture," Wikipedia. Available:
http://en.wikipedia.org/wiki/Service_Oriented_Architecture. [Accessed:
2006/11/09].
[3] "Microsoft Visual Studio 2005," Professional Edition, Microsoft,
2005.
[4] Wil M.P. van der Aalst, "Don't go with the flow: Web services
composition standards exposed," IEEE Intelligent Systems, vol. 18, no.
1, pp. 72-76, 2003.
[5] Wolf-Tilo Balke and Matthias Wagner, "Towards Personalized
Selection of Web Services." in Proceedings of the WWW (Alternate
Paper Tracks), 2003.
[6] Luciano Baresi, Carlo Ghezzi, and Sam Guinea, "Towards Self-healing
Service Compositions." in Contributions to Ubiquitous Computing, vol.
42, Springer, 2006.
[7] Antonio Brogi and Razvan Popescu, "Towards Semi-automated
Workflow-Based Aggregation of Web Services." in Proceedings of the
ICSOC, 2005, pp. 214-227.
[8] Robert J. Brunner, Frank Cohen, Francisco Curbera, Darren Govoni,
Steven Haines, Matthias Kloppmann, Benoit Marchal, K. Scott
Morison, Arthur Ryman, Joseph Weber, and Mark Wutka, Java Web
Services Unleashed, Sams Publishing, 2002.
[9] Paul A. Buhler, Christopher Starr, William H. Schroder, and José M.
Vidal, "Preparing for Service-Oriented Computing: A Composite
Design Pattern for Stubless Web Service Invocation." in Proceedings
of the ICWE, 2004, pp. 603-604.
[10] Damian Foggon, Daniel Maharry, Chris Ullman, and Karli Watson,
Programming Microsoft .NET XML Web Services, Microsoft Press,
2004.
[11] Rania Khalaf, Nirmal Mukhi, and Sanjiva Weerawarana, "Service-Oriented
Composition in BPEL4WS." in Proceedings of the WWW
(Alternate Paper Tracks), 2003.
[12] Heiko Ludwig, Henner Gimpel, Asit Dan, and Robert Kearney,
"Template-Based Automated Service Provisioning - Supporting the
Agreement-Driven Service Life-Cycle." in Proceedings of the ICSOC,
2005, pp. 283-295.
[13] Thomas Mikalsen, Stefan Tai, and Isabelle Rouvellou, "Transactional
Attitudes: Reliable Composition of Autonomous Web Services,"
presented at International Conference on Dependable Systems and
Networks, Washington D.C., USA, 2002.
[14] Chun Ouyang, Wil M.P. van der Aalst, Stephan Breutel, Marlon
Dumas, Arthur H.M. ter Hofstede, and Eric Verbeek, "WofBPEL: A
Tool for Automated Analysis of BPEL Processes." in Proceedings of
the ICSOC, 2005, pp. 484-489.
[15] Abhijit A. Patil, Swapna A. Oundhakar, Amit P. Sheth, and Kunal
Verma, "Meteor-s web service annotation framework." in Proceedings
of the WWW, 2004, pp. 553-562.
[16] Cesare Pautasso and Gustavo Alonso, "Visual composition of web
services." in Proceedings of the HCC, 2003, pp. 92-99.
[17] David S. Platt, Introducing Microsoft .NET, Microsoft Press, 2001.
[18] Shankar R. Ponnekanti and Armando Fox, "SWORD: A Developer
Toolkit for Web Service Composition," 2002.
[19] Mike Rosen, "BPM and SOA: Where Does One End and the Other
Begin?" Available: http://www.bptrends.com. [Accessed: 2006].
[20] Ozgur D. Sahin, Cagdas Evren Gerede, Divyakant Agrawal, Amr El
Abbadi, Oscar H. Ibarra, and Jianwen Su, "SPiDeR: P2P-Based Web
Service Discovery." in Proceedings of the ICSOC, 2005, pp. 157-169.
[21] Ichiro Satoh, "Location-Based Services in Ubiquitous Computing
Environments." in Proceedings of the ICSOC, 2003, pp. 527-542.
[22] Andrew S. Tanenbaum and Maarten van Steen, Distributed Systems:
Principles and Paradigms, International Edition. Prentice Hall, 2002,
pp. 272-277.
[23] Ferda Tartanoglu, Valerie Issarny, Alexander Romanovsky, and Nicole
Levy, "Dependability in the Web Services Architecture." Available:
http://www-rocq.inria.fr/~tartanog/publi/wads/. [Accessed: 2006/10/05].
[24] Michele Trainotti, Marco Pistore, Gaetano Calabrese, Gabriele Zacco,
Gigi Lucchese, Fabio Barbon, Piergiorgio Bertoli, and Paolo Traverso,
"ASTRO: Supporting Composition and Execution of Web Services." in
Proceedings of the ICSOC, 2005, pp. 495-501.
[25] Tao Yu and Kwei-Jay Lin, "Service Selection Algorithms for
Composing Complex Services with Multiple QoS Constraints." in
Proceedings of the ICSOC, 2005, pp. 130-143.
Figure 13 – Screenshot of Program requesting data
Figure 14 – Busy Searching for Shops
Figure 15 – Results Found
Figure 16 – Displaying Results
Figure 17 – Failure with the possibility of Recovery
Figure 18 – Notification of Failure without the possibility of Recovery