
Universidade Federal do Rio Grande do Norte

Centro de Ciências Exatas e da Terra

Departamento de Informática e Matemática Aplicada

Programa de Pós-Graduação em Sistemas e Computação

Academic Master's Program in Systems and Computing

A Recovery Mechanism Based on a Rewriting Process for Web Service Compositions

Rafael Ferreira Toledo

Natal-RN

July 2018


Master's dissertation presented to the Programa de Pós-Graduação em Sistemas e Computação of the Departamento de Informática e Matemática Aplicada of the Universidade Federal do Rio Grande do Norte in partial fulfillment of the requirements for the degree of Master in Systems and Computing.

Research line: Programming Languages and Formal Methods

Supervisor: Prof. Dr. Umberto Souza da Costa

PPgSC - Programa de Pós-Graduação em Sistemas e Computação

DIMAp - Departamento de Informática e Matemática Aplicada

CCET - Centro de Ciências Exatas e da Terra

UFRN - Universidade Federal do Rio Grande do Norte


Toledo, Rafael Ferreira. A recovery mechanism based on a rewriting process for web service compositions / Rafael Ferreira Toledo. - 2018. 82 leaves: ill.

Dissertation (master's) - Universidade Federal do Rio Grande do Norte, Centro de Ciências Exatas e da Terra, Programa de Pós-Graduação em Sistemas e Computação. Natal, 2018. Supervisor: Umberto Souza da Costa.

1. Computing - Dissertation. 2. Web services - Dissertation. 3. Fault recovery - Dissertation. 4. User preferences - Dissertation. 5. Service composition rewriting - Dissertation. I. Costa, Umberto Souza da. II. Title.

RN/UF/CCET CDU 004

Universidade Federal do Rio Grande do Norte - UFRN. Sistema de Bibliotecas - SISBI

Cataloging-in-publication. UFRN - Biblioteca Setorial Prof. Ronaldo Xavier de Arruda - CCET

Prepared by Joseneide Ferreira Dantas - CRB-15/324

Acknowledgments

This dissertation would not have been possible without the support of people who were present at many moments of my life. I would like to express my eternal gratitude to them for all the affection and dedication they have devoted to me and to my success.

First of all, I thank my parents, Rogério and Cristiane, for encouraging me to pursue my dreams. Thank you for teaching me to question and to understand how things work. Without this encouragement, I would never have discovered my enjoyment of developing scientific knowledge.

I thank my uncle Vitor for being a role model of dedication and curiosity in the pursuit of knowledge.

I thank my siblings, Gabriel and Rachel, for motivating me to be a better person and for inspiring me to work towards building the future.

I thank Letícia for sharing with me the motivations, the plans, the love and the life. Your participation and company are essential to the pursuit of my goals, and without them I would not have been able to reach this latest achievement. Thank you for being so special in my life.

I thank Giovanna, Adalberto and Lucas for all their welcome and support over the last years.

I thank Rodolfo, Alice, Claudio, João, Gabriel, Marcela, Igor, Vinícius and Fernanda for accompanying me on this journey, sharing moments of joy and friendship that were important in making this path more pleasant and special.

I thank Umberto, my supervisor, and the entire organization of the Programa de Pós-Graduação em Sistemas e Computação of the Universidade Federal do Rio Grande do Norte for offering the best possible conditions for my academic development.

And so I tell you

That there is no need

To seek a solution for life

It is not an equation

It does not have to be solved

Paulinho da Viola and Ferreira Gullar

A Recovery Mechanism Based on a Rewriting Process for Web Service Compositions

Author: Rafael Ferreira Toledo

Supervisor: Dr. Umberto Souza da Costa

Abstract

This dissertation presents an approach to improve the robustness of Web service compositions by recovering from failures that occur at different moments of their execution. We first present a taxonomy of failures as an overview of previous research on fault recovery of service compositions. The resulting classification is used to propose our self-healing method for Web service orchestrations. The proposed method, based on the refinement process of compositions, takes user preferences into account to generate the best possible recovering compositions. To validate our approach, we produced a prototype implementation capable of simulating and analyzing different fault scenarios. To that end, our work uses algorithms for generating synthetic compositions and Web services. In this setting, both the recovery time and the user preference degradation are investigated under different strategies, namely local, partial or total recovery. These strategies represent different levels of intervention on the composition.

Keywords: Web Services, Self-Healing, User Preferences, Service Composition Rewriting.

A Fault Recovery Mechanism Based on a Rewriting Process for Web Service Compositions

Author: Rafael Ferreira Toledo

Supervisor: Dr. Umberto Souza da Costa

Resumo

This dissertation presents an approach to improve the robustness of Web service compositions by recovering from failures that occur at different moments of their execution. First, we present a taxonomy of failures as an overview of previous work on fault recovery in service compositions. The resulting classification is used to propose our self-healing method for Web service orchestrations. The proposed method, based on the refinement process of compositions, considers user preferences to generate the best possible solutions for recovery. To validate our approach, we produced a prototype implementation capable of simulating and analyzing different failure scenarios. To that end, our work presents algorithms for generating synthetic compositions and Web services. In this setting, both the recovery time and the user preference degradation are investigated after the execution of local, partial and total recoveries. These strategies represent different levels of intervention on the composition.

Keywords: Web Services, Fault Recovery, User Preferences, Service Composition Rewriting.

List of Figures

Figure 3.1: Concrete services and PCDs stored at the Organiser according to the user's preferences [1].
Figure 5.1: Number of synthetic services built for the experiment.
Figure 5.2: Number of executed recoveries distributed by level.
Figure 5.3: Time spent to execute the experiment considering compositions of different sizes.
Figure 5.4: Average recovery time of all levels of recovery.
Figure 5.5: Experiment executed with decreasing numbers of functionalities.
Figure 5.6: Local recovery - Average recovery time.
Figure 5.7: Partial recovery - Average recovery time.
Figure 5.8: Partial recovery - Percentage distribution of the average recovery time.
Figure 5.9: Total recovery - Average recovery time.
Figure 5.10: Total recovery - Percentage distribution of the average recovery time.
Figure 5.11: Average preference degradation of all levels of recovery.
Figure 5.12: Local recovery - Average recovery time (ms) considering the locality of faults.
Figure 5.13: Local recovery - Average preference degradation considering the locality of faults.
Figure 5.14: Partial recovery - Average recovery time (ms) considering the locality of faults.
Figure 5.15: Partial recovery - Average preference degradation considering the locality of faults.
Figure 5.16: Total recovery - Average recovery time (ms) considering the locality of faults.
Figure 5.17: Total recovery - Average recovery time (ms) considering the locality of faults - unsuccessful attempt of Local recovery.
Figure 5.18: Total recovery - Average recovery time (ms) considering the locality of faults - unsuccessful attempt of Partial recovery.
Figure 5.19: Total recovery - Average recovery time (ms) considering the locality of faults - successful attempt of Total recovery.
Figure 5.20: Total recovery - Average preference degradation considering the locality of faults.

List of Tables

Table 3.1: Fault Taxonomy
Table 3.2: Recovery Actions
Table 4.1: PCDs based on available services Example 4.1.3
Table 4.2: PCDs based on available services Example 4.1.3
Table 4.3: PCDs based on concrete services
Table 4.4: PCDs used for producing the initial rewriting
Table 4.5: PCDs covering Flight
Table 4.6: PCDBooking replacing PCDGol
Table 4.7: PCDLatam replacing PCDGol
Table 4.8: PCDs covering Flight, Hotel and/or Car
Table 4.9: PCDs sorted in decreasing order of user preference and distributed by coverage domain
Table 4.10: Component PCDs of the first candidate for recovery
Table 4.11: Coverage domain Hotel
Table 4.12: Component PCDs of the first candidate for local recovery
Table 4.13: Component PCDs of the first candidate for local recovery
Table 4.14: PCDs sorted in decreasing order of user preference and distributed by coverage domain
Table 4.15: Component PCDs of the first candidate for partial recovery
Table 4.16: Component PCDs of the second candidate for partial recovery
Table 4.17: Component PCDs of the fourth candidate for partial recovery
Table 4.18: Component PCDs of the third candidate for partial recovery
Table 4.19: Component PCDs of the fifth candidate for partial recovery
Table 4.20: Component PCDs of the first candidate for partial recovery
Table 4.21: Available PCDs for the coverage domain Car
Table 4.22: Component PCDs of the first candidate for local recovery
Table 4.23: Component PCDs of the second candidate for local recovery
Table 4.24: Coverage Domains of Car and Payment
Table 4.25: PCDs sorted in decreasing order of user preference and distributed by coverage domain
Table 4.26: Component PCDs of the first candidate for total recovery
Table 4.27: Component PCDs of the second candidate for total recovery
Table 4.28: Component PCDs of the third candidate for total recovery
Table 4.29: Component PCDs of the fourth candidate for total recovery

List of Algorithms

Algorithm 1: Self-Healing
Algorithm 2: Recover
Algorithm 3: General Recovery Process
Algorithm 4: Build Synthetic Web Services
Algorithm 5: Simulation of Failures

Contents

1 Introduction
  1.1 Motivation
  1.2 General Goal
  1.3 Specific Goals
  1.4 The Problem Scope
  1.5 Contributions

2 Related Work

3 Preliminaries
  3.1 The Service-Oriented Architecture
    3.1.1 The Life Cycle of Service Compositions
    3.1.2 Execution Control and Transactions
  3.2 Service Composition Rewriting
    3.2.1 Abstract Composition
    3.2.2 Concrete Services
    3.2.3 User Preferences
    3.2.4 Coverage Domain
    3.2.5 Pareto Preference
    3.2.6 Formation of PCDs
    3.2.7 Combination of PCDs
  3.3 Fault Taxonomy
    3.3.1 Service Level
    3.3.2 Composition Level
    3.3.3 Infrastructure Level
  3.4 Recovery Actions

4 Proposal
  4.1 Specification of Services and Compositions
    4.1.1 Composition Specification
    4.1.2 Service Specification
    4.1.3 Rewriting Technique
  4.2 Recovery Method
    4.2.1 Use Case
    4.2.2 Local Recovery
    4.2.3 Partial Recovery
    4.2.4 Total Recovery
  4.3 Recovery Algorithms

5 Experimental Results
  5.1 Experimental Settings
    5.1.1 Composition Specification
    5.1.2 Synthetic Web Services
    5.1.3 Rewritings of Specification
    5.1.4 Simulation of Failures
  5.2 Parameter Settings
  5.3 Recovery Time
  5.4 Preference Degradation
  5.5 Locality of Faults

6 Conclusions

Chapter 1

Introduction

In the past decades, much effort has been devoted to evolving the technology of Web service systems and to managing the quality of their execution. The paradigm of Service-Oriented Architecture (SOA) and its implementation through Web service compositions are widely studied in the context of fault handling and quality of service (QoS) management [2, 3, 4, 5, 6, 7]. The distributed nature of such systems makes it hard to guarantee the desired behavior of a Web service composition [8]. This kind of system is executed in a highly dynamic environment, which demands the identification and correction of any unwanted event that may happen at runtime. The ability to manage its execution, aiming to avoid or correct faults, provides the reliability and fault tolerance of the deployed system. Despite the success achieved with the development of tools for the design and deployment of Web service compositions, studies regarding recovery from failures still present significant opportunities for improvement [8, 9, 10].

As a heterogeneous information environment, the SOA paradigm may offer a wide variety of services that deliver distinct or similar functionalities. This condition motivates efforts to address the challenge of rewriting the abstract specification of the system into a composition of available services [11, 12, 1, 13, 14]. The idea is to correctly select those services that satisfy the specification of the composite service regarding the expected functional and non-functional properties.

In case of failure, the available services that offer the same functionality as a failed one represent potential alternatives for recovery. The correctness achieved by the rewriting process is similarly expected during the selection of a substitute. Thus, adapting the refinement techniques to the context of fault tolerance can be seen as a suitable solution to the recovery problem. Knowing the potential faults contributes to fault tolerance, but it does not cover all the expected goals of a fault-tolerant system. A fault-tolerant mechanism is typically implemented by error detection and subsequent system recovery [15]. Understanding the possible faults is one of the critical factors in suggesting and applying a proper recovery mechanism [16]. A fault taxonomy for SOA helps to refine possible reactions to runtime faults, guiding fault injection tests executed on services during development, and consequently improving the robustness, reliability, and availability of SOA components [17].

The motivation for this work is the opportunity to improve the robustness and reliability of Web service compositions through the development of recovery mechanisms. The development is based on the use of rewriting techniques to provide alternative services or sub-compositions that replace failed services in the composition. The study of failure recovery in Web services and the proposal of a recovery mechanism are the targets of this work. The next sections present the details of the motivation, the goals, the scope of the problem and the contributions of this research.

1.1 Motivation

This research is motivated by the need to improve the robustness of Web service compositions. A Web service composition operates within a dynamic environment that is prone to unpredictable disruptions and changes that can affect the execution of the system. The availability of remotely located services, the quality attributes delivered by the services and the operating condition of the network used for communication between the services are some of the characteristics of the SOA environment that can cause failures in the execution of business processes. According to [10], providing services capable of detecting and correcting faults while preserving runtime performance and with minimal dependency on human interaction is one of the research challenges that remain in Service-Oriented Computing. Failure recovery is a crucial issue for the proper and adequate delivery of Web service functionalities [9].

A Web service composition should fulfill some requirements to achieve success, such as reliability and adaptability. Reliability is related to the capacity of a service to be available and work correctly despite any exception or fault that may occur during execution [18]. Adaptability is defined by the capacity of the system to implement changes and adjust its behavior to meet the specification of the system [8]. Both requirements are directly related to the ability to recover a composition from a failure while preserving the properties specified by the user. These properties include the non-functional aspects of the specified composition, which can be expressed by the user preference over a set of services. For example, a user might prefer a service with a lower cost over another one with a higher cost.

The temporary unavailability of a vital service during the execution of the composition and a change in the quality attributes offered by a service are some of the situations that demand the adaptation of the Web service composition to maintain the reliability of the system. In these cases, substituting the service is a reasonable solution: the service that causes the problem is eliminated and a new, equivalent service is adopted. Also, among the candidates for substitution that offer the desired functionality, the user might prefer some over others based on non-functional attributes of the service [19]. Therefore, it becomes crucial to consider the preference expressed by the user while selecting the new services. In that way, the recovered composition not only follows the user specification correctly but is also appropriate according to the user preferences on the available services.
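The substitution idea described above can be illustrated with a minimal sketch. It assumes a single hypothetical non-functional attribute (`cost`, lower being preferred) and exact matching on a functionality label; the `Service` class, the `select_substitute` function and the sample services are illustrative names only and are not taken from the mechanism specified later in this work.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Service:
    name: str
    functionality: str  # label of the functionality the service offers
    cost: float         # hypothetical preference attribute: lower is better


def select_substitute(failed: Service, available: Iterable[Service]) -> Optional[Service]:
    """Return the preferred available service that offers the same
    functionality as the failed one, or None if there is no candidate."""
    candidates = [
        s for s in available
        if s.functionality == failed.functionality and s.name != failed.name
    ]
    # Assumption: the user prefers the cheapest equivalent service.
    return min(candidates, key=lambda s: s.cost, default=None)


failed = Service("HotelA", "Hotel", 120.0)
pool = [
    Service("HotelB", "Hotel", 150.0),
    Service("HotelC", "Hotel", 130.0),
    Service("CarX", "Car", 40.0),
]
substitute = select_substitute(failed, pool)
```

In this toy setting the candidate `HotelC` is chosen because it is the cheapest service offering the `Hotel` functionality; a real preference model would usually combine several attributes.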

As pointed out by [18], selectability and adaptability are means of achieving and managing reliability in a Web service composition. Consequently, a mechanism capable of providing those resources to Web service compositions directly improves the degree of robustness of the service composition. This work moves towards the development of mechanisms to recover Web service compositions from failure by adapting the system during execution while considering the user preferences.

1.2 General Goal

The general goal of this work is to provide fault recovery mechanisms for Web service compositions while respecting the preferences of the user. Such a mechanism plays the active role of avoiding the impacts caused by failures on the availability and correctness of a business process. The application of the recovery mechanism consequently increases the degree of robustness associated with the Web service composition. The principle of the recovery that we propose is the replacement of failed services by a preferable available service. This reasoning fundamentally includes the selection of backup services and the substitution of services that led the business process to the failure state. The selection of alternative services must be faithful to the specification and preferences defined by the user for the service composition during the development phase. Because of that, the rewriting techniques that guide the refinement process are considered as the basis for the process of backup selection. Also, the replacement of services may demand different levels of recovery for distinct conditions of rewriting provided by the available candidates.
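The notion of different recovery levels can be sketched as follows. The sketch assumes that the levels named in the abstract (local, partial, total) are tried in order of increasing intervention, which this section does not state explicitly; the `recover` function and the three stub strategies are hypothetical placeholders for the rewriting-based searches, not the algorithms of this work.

```python
from typing import Callable, Optional, Sequence, Tuple

# A strategy is a (level name, attempt function) pair; the attempt returns a
# description of the replacement, or None when that level cannot repair the fault.
Strategy = Tuple[str, Callable[[str], Optional[str]]]


def recover(failed_service: str, strategies: Sequence[Strategy]) -> Optional[Tuple[str, str]]:
    """Try each recovery level in order and return the first one that succeeds."""
    for level, attempt in strategies:
        replacement = attempt(failed_service)
        if replacement is not None:
            return level, replacement
    return None  # no level could repair the composition


# Hypothetical stubs standing in for the real rewriting-based searches.
def local(failed: str) -> Optional[str]:
    return None  # pretend no single service can replace the failed one


def partial(failed: str) -> Optional[str]:
    return f"sub-composition replacing {failed} and its neighbours"


def total(failed: str) -> Optional[str]:
    return "fresh rewriting of the whole composition"


result = recover("HotelA", [("local", local), ("partial", partial), ("total", total)])
```

Here the local attempt fails, so the partial level provides the replacement and the total level is never invoked.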

1.3 Specific Goals

The combination of the following secondary goals guides the development of the present work in order to achieve its primary goal. These specific goals cover defined contexts of the development of fault recovery mechanisms.

1. Analysis of previous works that address the problem of fault recovery in the context of Web service compositions, to understand the challenges involved in the development of the project.

2. Identification of the differences between the rewriting process used for refinement and the proposed recovery mechanism.

3. Specification of a recovery mechanism that adopts suitable strategies for different failure scenarios.

4. Evaluation of the cost of recovery and of the compliance with user preferences of the recovered composition, considering the possible recovery strategies.

1.4 The Problem Scope

The problem scope is limited to the provision of preferable alternative services or sub-compositions for a fault recovery mechanism. The selection of the alternative services is based on the rewriting system that initially refines the specification of a distributed system into the actual implementation of the Web service composition. The rewriting process is adapted to the context of recovery of Web service compositions to guide the selection of available services that replace the failed services of a monitored Web service composition. In other words, the resulting elements of the adapted rewriting process are supposed to substitute the services that caused the failures. Moreover, the replacement of failed services contributes to repairing the execution of the business process. Consequently, this kind of recovery action improves the reliability of the distributed system while reducing the degradation of global user preference.

1.5 Contributions

This work delivers a set of contributions that are the direct results of the specific goals previously defined:

1. An overview of previous works on fault recovery mechanisms. This contribution situates the work within the state of the art and shows the theoretical basis of the research area. Consequently, the theoretical background paves the way for the development of the next subgoals.

2. The definition of the necessary adaptation of the rewriting techniques for their application to fault recovery of Web service compositions. Since the recovery mechanism specified in this work is based on the rewriting techniques usually considered for the refinement process, it is important to understand the differences between these contexts.

3. A fault recovery mechanism capable of automatically reacting to failures by searching for alternative rewritings of the service composition to recover the business process.

4. An analysis of the effectiveness and impact of the different levels of recovery. This contribution helps to understand how different levels of intervention affect the total recovery time and the degradation of user preferences of the recovered composition.

Chapter 2 presents the related works in the area of fault recovery for service compositions. Chapter 3 explains the background needed for the research, including basic concepts of SOA, fault recovery systems and techniques of Web service composition rewriting. Chapter 4 describes the functioning of the proposed recovery mechanism. Chapter 5 shows the experiments executed with the recovery mechanism and the analysis of the results achieved. Finally, Chapter 6 ends the document stating the conclusions of our research.

Chapter 2

Related Work

This chapter presents an overview of previous works that address the challenges of developing fault-recovery methods for Web service compositions. This study provides the theoretical background for the present thesis and surveys what has already been done in the research area. Moreover, the next paragraphs help us to identify the contributions of the proposed work to the research area.

Fault tolerance is the preservation of correctness of the delivered service in the presence of faults [15]. This aspect is a fundamental element of broader research topics, like self-healing and dependable systems. Self-healing is the ability of a system to discover, diagnose and react to faults without disrupting the runtime environment. The development of autonomic capabilities in service-level management, such as self-configuring and self-healing, is one of the most notable research challenges for the future of SOC [10]. A specific trend points to the development of mechanisms to detect errors and recover Web service compositions [16].

According to [20], two of the elements that compose the development of a self-healing system are the fault model and the system response. The fault model plays a fundamental role in defining which faults are expected by the self-healing system. This model includes characteristics of faults such as duration, manifestation, source, granularity and profile expectations. This model is essential for a given system to know whether it can heal itself. Additionally, the system response is composed of the fault detection, the degradation of the system, the response to the fault, the recovery operation, time constants and assurance of behavior.

Similar to our work, [21] proposes a self-healing approach for composite services that deals with services at a conceptual level to provide a formal model for composite service executions. The work does not identify the type of faults; instead, it handles general service failures, which may be caused by any fault. Nonetheless, the author recognizes the importance of identifying the type of faults instead of general service failures, since different faults may require different reactions.

Some works focus on extending the standard Business Process Execution Language (BPEL) due to its lack of appropriate mechanisms to satisfy self-healing requirements. For example, [22] suggests SelfHeal-BPEL, an extension to the ActiveBPEL engine for supporting the execution of BPEL processes. This architecture extension is composed of planning, monitoring, diagnosis and recovery modules. The workflow of these modules follows a self-healing policy which holds some information about the executed business process, such as (i) pre- and post-conditions of BPEL activities, (ii) the list of BPEL activities to monitor during the execution of the BPEL process, (iii) diagnostic information related to unexpected failures of BPEL activities, and (iv) suggested recovery strategies for these failures. The authors explain that the reasoning of their tool is to choose the recovery strategy that would cause the least disruption to the executed Web service composition based on the severity of the failure. The recovery strategies considered in that work are: Retrial and Data Mediation, representing minor impact; Substitution, causing an average impact; and Process Reorganization, resulting in a major impact.
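The least-disruption reasoning attributed to [22] can be sketched as a simple ranking. The strategy names and their relative impact (minor, average, major) come from the description above; the numeric ranks, the `IMPACT` table and the `least_disruptive` function are assumptions made for illustration and do not reproduce the SelfHeal-BPEL implementation.

```python
from typing import Iterable, Optional

# Assumed numeric ranks for the impact levels described in the text:
# 1 = minor impact, 2 = average impact, 3 = major impact.
IMPACT = {
    "Retrial": 1,
    "Data Mediation": 1,
    "Substitution": 2,
    "Process Reorganization": 3,
}


def least_disruptive(applicable: Iterable[str]) -> Optional[str]:
    """Among the strategies applicable to a failure, return the one with the
    lowest impact rank, or None if no known strategy applies."""
    known = [s for s in applicable if s in IMPACT]
    return min(known, key=IMPACT.__getitem__, default=None)


choice = least_disruptive(["Process Reorganization", "Substitution"])
```

With these two applicable strategies, `Substitution` is selected because its assumed rank is lower than that of `Process Reorganization`.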

Similarly, [23] suggests augmenting BPEL with self-healing capabilities, like run-time monitoring and reaction strategies, by proposing a solution called Dynamo (Dynamic Monitoring). Their proposal adds the external definition of supervision rules, which are formed by: (i) the location that indicates which part of the process is evaluated; (ii) supervision parameters, which are meta-level information, such as the priority level of the supervision rule; (iii) a monitoring expression specified in WSCoL (Web Service Constraint Language); and (iv) a set of reaction strategies.

Another example is AO4BPEL, an aspect-oriented mechanism to extend BPEL [24]. Besides the gain in modularity of the composition specification, the mechanism enables the adaptation of the application's behavior at runtime through dynamic weaving. In summary, AO4BPEL provides the specification of: (i) the location of the process to be monitored (join point); (ii) the predicates on the attributes handled during the monitoring (pointcut); and (iii) the piece of code that must be executed when the pointcut is reached (advice). The combination of these features grounds the theory of aspect-oriented programming, and they are easily adapted to satisfy the requirements of a self-healing mechanism.
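The three aspect-oriented notions described above (join point, pointcut, advice) can be illustrated with a small analogy using a Python decorator. This is not AO4BPEL syntax and does not use any BPEL engine; the `weave` function, the `invoke_hotel_service` activity and the result dictionaries are hypothetical names chosen for this sketch.

```python
from typing import Any, Callable


def weave(pointcut: Callable[[Any], bool], advice: Callable[[Any], Any]):
    """Return a decorator that attaches an advice to an activity.

    The decorated activity plays the role of the join point, the pointcut is
    a predicate on the activity result, and the advice is the code executed
    when the pointcut holds."""
    def decorator(activity: Callable[..., Any]) -> Callable[..., Any]:
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            result = activity(*args, **kwargs)
            return advice(result) if pointcut(result) else result
        return wrapped
    return decorator


@weave(
    pointcut=lambda r: r.get("status") == "fault",
    advice=lambda r: {"status": "ok", "source": "backup"},
)
def invoke_hotel_service() -> dict:
    # Hypothetical activity that always fails, to trigger the advice.
    return {"status": "fault"}


outcome = invoke_hotel_service()
```

Because the monitored activity reports a fault, the advice replaces its result with the one from a backup, which mirrors how an advice could trigger a recovery action in a self-healing setting.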

Other works follow a similar approach [7, 25, 26, 27]. They define entities and methodologies that enable the definition of monitoring statements and the triggering of the autonomic execution of recovery actions in the face of disruptive situations. In this chapter we recognize the contribution of those works that are dedicated to developing self-healing compositions in a comprehensive context. However, this chapter also focuses on the contribution of works dedicated specifically to developing supporting technology for the substitution of failed components of a Web service composition.

The capacity of a component to be replaced is quantified and analyzed in [28]. The authors define and evaluate a replaceability property to compose Web services in a way that both performs reselection and avoids the violation of QoS constraints. Replaceability is defined as the degree to which a composition or a service is exchangeable with one that accomplishes the same goal or processing. This property also depends on the desired QoS constraints for the composition and the available candidates for substitution. That work proposes an algorithm that finds a service composition that is equal to or more tolerant than an initial composition by considering the replaceability of the components of the initial composition and the other available services.

In [29], the authors propose an architecture for self-healing which recovers dynamic Web service compositions from QoS faults. Their approach is based on providing an alternative service for the faulty service by performance prediction. When the process execution starts, the BPEL Process Scanner detects and calculates the response time offered by each service and the average value of previous response times. The Performance Evaluator compares those values and, if the current value is higher than the average response time, the Service Extractor is activated to replace the evaluated component. This replacement avoids the violation of the response time before the execution of the failed service. A dedicated entity, called Alternate Service Finder, looks for a service which suits the requirements and the interrelationships with other services. The results of the search are sent to the Service Integrator, which adapts the initial composition.

Considering that it is often necessary to select the most preferred substitution candidate based on non-functional attributes of the service (e.g., security, reliability), [19] proposes an approach that uses preference networks for representing and reasoning about preferences over non-functional properties. This work presents algorithms for solving variants of this problem: (i) context-insensitive substitution, when the choice of the preferred substitution is independent of the other constituents of the composite service; (ii) context-sensitive substitution, when the choice of the preferred substitution depends on the other constituents of the composite service; and (iii) substitution of multiple constituents of a composite service that need to be replaced simultaneously. The context-insensitive approach assumes that the preferred substitution can be obtained independently of the context as a local optimization. In the context-sensitive approach, the context of the substitution is taken into account as in a global optimization approach.

In the matter of organizing the search space of available services, the Web service composition model presented in [30] is based on clustering similar Web services available in the repository into tasks and jobs. By proposing this service organization technique, the authors aim at quickly substituting the faulty service from the corresponding task and at isolating the faulty Web service within its boundary so that it does not affect other services.

The exchange of components in a composite service is also referred to in some works as rebinding. In this context, the set of concretizations that not only fulfill the functional requirements but also satisfy the QoS constraints for the composition is represented by the bindings between abstract and concrete services [31]. Thus, rebinding is also considered the replacement of a concrete service that was initially bound to a given abstract service. WS Binder is a framework proposed in [31] for enabling pre-execution and runtime dynamic binding of service compositions according to functional and non-functional preferences and to global and local QoS optimization criteria. The framework supports runtime recovery actions by performing service rebinding. If no binding has been specified during the Service Selection phase or if a failure arises, WS Binder comes into play by selecting, based on configuration preferences, the slice of the workflow to be rebound. This slice can consist either of a single service or of a partial workflow fragment that leads to the termination of the workflow execution. In both cases, the steps of discovery and selection are performed on the selected slice.

The authors of [2] also propose a QoS-aware binding approach for early runtime rebinding whenever the actual QoS deviates from initial estimates, or when a service is not available. The proposed approach aims at "early predicting" the need for rebinding, before it is too late and any recovery action becomes ineffective. When a service is not available, their approach considers two strategies: (i) continuing the process execution, accepting the violation of response time and possibly a price increase; or (ii) interrupting the process execution. The latter has to be preferred in some application domains, such as time-dependent systems, or in some business contexts where the violation of SLA constraints may lead to applicable penalties. In any case, although a rebinding can permit the completion of the workflow execution, the final response time needs to account for both the unavailability timeout and the rebinding overhead. When this is not acceptable, the process interruption is the only viable alternative.

The self-healing middleware framework proposed by [32] defines different types of substitutions regarding the newly selected components. In this framework, the reconfiguration is performed by dynamically rebinding the requests to a newly available Web service component in a way that is seamless for the remote requesters. The framework primarily considers two types of substitutions: (i) single substitution, where the deficient service is entirely replaced by a new equivalent one which offers the same operations; and (ii) composite substitution, where the failed service is replaced by a set of services whose union covers the operations offered by the deficient one, with no overlap among the offered operations.
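The two substitution types described for [32] can be illustrated with a minimal sketch over sets of operation names. The function names and the choice to model "offers the same operations" as set equality are assumptions made for illustration; they are not taken from [32].

```python
from itertools import combinations


def is_single_substitute(failed_ops, candidate_ops):
    """Single substitution: one candidate offers the same operations.

    Assumption: "the same operations" is modeled as set equality.
    """
    return set(candidate_ops) == set(failed_ops)


def is_composite_substitute(failed_ops, candidates_ops):
    """Composite substitution: the union of the candidates covers the
    failed service's operations and no operation is offered twice."""
    sets = [set(ops) for ops in candidates_ops]
    covered = set().union(*sets)
    no_overlap = all(a.isdisjoint(b) for a, b in combinations(sets, 2))
    return set(failed_ops) <= covered and no_overlap


# Hypothetical operation names, used only to exercise the checks.
failed = {"pay", "refund"}
print(is_single_substitute(failed, {"refund", "pay"}))
print(is_composite_substitute(failed, [{"pay"}, {"refund"}]))
```

A set of candidates that offers `pay` twice, or that leaves `refund` uncovered, is rejected by the second check.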

The replacement of services is also explored by some works through the transactional aspect of the execution of services. In [33], for example, the authors propose a framework for efficient, fault-tolerant, and correct distributed execution of Transactional Composite Web Services (TCWSs). The substitution of services is done such that the new component locally optimizes the QoS. The framework relies on service replacement and on a compensation protocol to support forward and backward recovery of the composite services. The execution of a TCWS in the framework is managed by an Execution Engine, which is in charge of initiating, controlling, and monitoring the execution. If a service S fails during its execution, the execution is modified as follows: if S is retriable, the execution control tries forward recovery by re-invoking S until it successfully finishes. Otherwise, another transactional equivalent S' is selected to replace S, still trying a forward recovery. If no substitute S' exists, a backward recovery is needed, i.e., all executed services must be compensated.
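The decision order described for [33] (retry, then substitute, then compensate) can be sketched as follows. This is a simplified illustration, not the framework's actual protocol: the function name, the `invoke` callback, the strategy labels and the `max_retries` bound are assumptions. In particular, the source describes re-invocation until success, whereas this sketch bounds the retries and then falls through to substitution.

```python
from typing import Callable, List, Optional, Tuple


def recover(service: str,
            retriable: bool,
            substitutes: List[str],
            invoke: Callable[[str], bool],
            max_retries: int = 3) -> Tuple[str, Optional[str]]:
    """Return (strategy, service that finally succeeded or None)."""
    # Forward recovery by re-invoking the failed service (bounded here).
    if retriable:
        for _ in range(max_retries):
            if invoke(service):
                return ("forward:retry", service)
    # Forward recovery by replacing S with an equivalent service S'.
    for candidate in substitutes:
        if invoke(candidate):
            return ("forward:substitute", candidate)
    # No substitute worked: backward recovery, i.e. the services
    # already executed must be compensated.
    return ("backward:compensate", None)


# Hypothetical usage: S is not retriable and S2 is an available equivalent.
print(recover("S", False, ["S2"], lambda name: name == "S2"))
```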

In the context of backward recovery, the work in [34] proposes a QoS-driven service replacement model with compensation support for Web service compositions. The proposed model integrates flexible compensation service and substitute selection during the execution of the business process. Since component failures may cause cascading failures on some of the participants of the composition, the work considers determining the range of services affected by cascading failures as the first step for correct replacement. Based on the idea of executing the needed repairs with minimal interruption delay, the authors propose a self-healing algorithm that includes three main steps: (i) identifying the unavailable component and checking the need to compensate; (ii) if needed, identifying the minimal compensation scope affected by the faulty component based on multiple relations, such as control, data and business relations; and (iii) reselecting the optimal replacement candidate for the minimal scope of replacement by matching the behavior interface and by benefit-cost analysis.

This chapter presented a literature review of works that propose solutions providing self-healing capabilities and fault tolerance to Web service compositions. Works proposing a comprehensive architecture for recovery mechanisms were presented first [22, 23, 24, 7, 25, 26, 27]. This overview showed the conventional structure of those works and the similarities among their solutions. Later in the chapter, the focus was redirected to works that are specifically dedicated to solving challenges related to the substitution of failed Web services, or rebinding [28, 29, 19, 30, 31, 2, 32, 33, 34]. The consideration of transaction properties [33, 34], configuration preferences [31, 19] and the organization of the registry of available candidates [30] are some of the contributions of those works that are closely related to our proposal, although they represent different approaches.

Differently from some of the referred works [22, 23, 24, 7, 25, 26, 27], this thesis does not cover the specification of a fault monitoring system. The scope of the work is focused on the specification of a fault recovery mechanism that provides alternative options for the replacement of a failed service. In comparison with works on the same topic, the proposed approach differs in being based on the outcomes of a rewriting process of compositions. Consequently, the mechanism specified in the present work includes some of the properties that are individually covered by [30, 31, 19, 33, 34]. The rewriting process considered by the present work specifies fundamental aspects of the refinement method that can be properly adapted to a fault recovery mechanism, such as (i) a formal search space of available services for refinement; (ii) reasoning over transactional properties of the composition; and (iii) consideration of the user preferences on the resulting composition. In the next chapter, we present more details on the methodology of the proposed fault recovery mechanism for Web service compositions.

Chapter 3

Preliminaries

This chapter presents the fundamental concepts involved in the development of the work. Initially, notions of the Service-Oriented Architecture (SOA) are introduced to establish the background concepts of the research area. After that, the chapter dedicates a specific section to the concepts and techniques of the service composition rewriting problem. The chapter concludes by explaining concepts related to fault-recovery mechanisms for service compositions. The theoretical background stated in this chapter offers the necessary explanation of the problem which the work proposes to solve as well as the tools that support the development of the solution.

3.1 The Service-Oriented Architecture

SOA is a logical way of designing a software system to provide services either to end-user applications or to other services distributed on a network, via published and discoverable interfaces. SOA can be viewed as a key to fulfilling the visionary promise of Service-Oriented Computing (SOC), where applications are built over loosely coupled services, to create dynamic business processes and promote the agile development of software [10].

Web service standards provide and implement the general SOA concept. These standards have become a set of practical tools used by enterprise engineers for SOA projects [35]. IBM describes Web services as self-contained, self-describing, modular applications that can be published, located and invoked across the Web [8, 36].

SOAP-based Web services [37] and RESTful Web services [38] represent the two main architectural styles regarding the development of Web services [8, 39, 21]. The research trends and published services show that developers prefer RESTful services over SOAP-based services. Moreover, the essential public directories of SOAP services (UDDI) were shut down in 2006 [21]. The present dissertation considers a higher-level concept of services, which is independent of their implementation. Therefore, the work developed does not explore the specificities of those architectural styles.

A service can be classified as atomic or composite [40]. An application that does not depend on another service to respond to the service requesters is an atomic Web service. Composite services, or service compositions, in contrast, result from the combination of services, atomic or composite, to implement a set of functionalities of a more complex business process. Composite and atomic services are specified by an identifier (e.g., URL), a set of attributes and a set of operations [8].


3.1.1 The Life Cycle of Service Compositions

The life cycle of a service composition encompasses four phases: Definition, Service Selection, Deployment, and Execution. The phases are described as follows [8]:

• Definition: The Web service composition is specified by an abstract model, including service requirements and user preferences;

• Service Selection: An available concrete service is selected to fulfill the requirements of each service specified by the abstract model;

• Deployment: The selected services are integrated so that the composition is deployed as an executable service;

• Execution: The composition is executed possibly together with monitoring and fault-handling mechanisms.

The principle behind distributing the development of a service composition over phases is to start with an abstract definition and gradually make it concrete. In this way, an executable service process can be generated from the abstract specification [41].

3.1.2 Execution Control and Transactions

The execution of a composition of services is the invocation of each service according to the specification of the implemented system. Orchestration and choreography are the two approaches for execution control of Web service compositions [8, 10, 39]. Orchestration represents a centralized perspective where a business party involved in the composition is responsible for coordinating the sequence of interactions among the component services. That includes the management of message exchange between the parties, the capacity to handle errors and the description of the overall process [8].

The choreography approach takes a distributed perspective, in which a global description specifies the behavior of a given Web service composition. This interaction protocol is based on the exchange of messages, rules of interaction and agreements between the business parties. Those elements serve as guidelines for the operations implemented by each of the component services involved in the business process. Unlike in orchestration, none of the entities involved is responsible for coordinating the execution of the business process [39].

Two execution scenarios are fundamental to the composition of services: sequential and parallel [21]. The former presents a dependency relation between the services, in a way that some services cannot be invoked until the previous ones have finished. This behavior happens because the dependent service needs the attributes produced by the previous ones, or because the specification imposes sequential control restrictions. In a parallel scenario, services can be invoked simultaneously since there are no data dependencies among them.
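As an illustration of the two scenarios, the following sketch derives execution stages from data dependencies alone: services in the same stage have no data dependencies among them and could run in parallel, while stages run in sequence. The service signatures are borrowed from the travel agency composition of Example 3.2.1 later in this chapter; the `Hotel` service is hypothetical and is added only to obtain a parallel stage. The sketch assumes Python 3.9 or later for `graphlib`, and it ignores control restrictions imposed by a specification.

```python
from graphlib import TopologicalSorter

# name -> (input parameters, output parameters)
services = {
    "Authentication": ({"Uid", "Pwd"}, {"UsrTkn"}),
    "Flight": ({"UsrTkn", "TravelParam"}, {"FlightTkn", "FlightInvoice"}),
    "Hotel": ({"UsrTkn", "TravelParam"}, {"HotelInvoice"}),  # hypothetical
    "Payment": ({"UsrTkn", "FlightInvoice"},
                {"TotalCost", "Ack", "Prot", "Form"}),
}


def execution_stages(services):
    """Group services into stages ordered by data dependencies."""
    # Which service produces each output parameter.
    producer = {out: name
                for name, (_, outs) in services.items() for out in outs}
    # A service depends on the producers of the inputs it consumes;
    # inputs with no producer are supplied by the client.
    deps = {name: {producer[i] for i in ins if i in producer}
            for name, (ins, _) in services.items()}
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # invocable simultaneously
        stages.append(ready)
        sorter.done(*ready)
    return stages


for stage in execution_stages(services):
    print(stage)
```

Under these assumptions, `Flight` and `Hotel` fall into the same stage because both only need the token produced by `Authentication`, whereas `Payment` must wait for the invoice produced by `Flight`.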

The fault tolerance and reliability of a composite service can be approached at different levels during its development. Usually, exception handling constructs at the language level, such as in the Business Process Execution Language (BPEL) [42], support the implementation of fault-handling strategies. On the other hand, at a more abstract level, approaches at the workflow level of the composition provide a language- and technology-independent solution for the issue [43]. In this context, the transactional properties of Web services are important aspects of the development of those solutions. They are defined by [44] as follows:

WS Transactions are defined as sequences of Web services operations or processes that are executed under certain criteria to achieve mutually agreed outcome regardless of system failures or concurrent access to data sources, i.e., either all the Web services operations succeed completely or fail without leaving any incorrect or inconsistent outcomes.

Web services may be classified as effect-providing or data-providing services [12, 11]. The first category defines services whose execution produces changes in the state of the world due to the business functions that they implement, being also called world-altering services. The second category defines information-providing services that allow query-like access to organizations' data sources and whose execution does not have any effect on the state of the world. This classification is relevant to the discussion of the proper recovery actions that must be executed in case of service failures. For example, the substitution of world-altering services demands more complex solutions considering the inconvenience of repeated executions and the needed compensation for the possible effects caused before the failure. In the case of data-providing services, those requirements do not exist, and their substitution can be implemented more efficiently. The present work considers that the services of the service compositions and their respective alternatives are data-providing services.

The next section explains the rewriting techniques used during the Service Selection phase, responsible for translating the abstract specification of the system in terms of available Web services.

3.2 Service Composition Rewriting

The SOA paradigm can be naturally considered a heterogeneous information environment. The wide variety of services offered to deliver the same functionality demands effort to address the challenge of selecting proper services while developing a Web service composition. Because of that, works such as [11, 12, 1, 13, 14], which adapt query rewriting techniques for the automatic composition of services, demonstrate the applicability of these techniques in the context of SOA.

The automatic composition of Web services is achieved through the selection of candidate services to integrate the final composition, and the configuration of the selected services to form the specified system. The selection of candidates aims to identify the concrete services that cover the expected functionalities of the abstract composition. Once the services are chosen, their combination generates a concrete composition that translates the specification of the system.

The works in [1, 14] adapt a query rewriting algorithm to serve as an approach to service composition refinement. The refinement process suggested by the authors uses preference information to guide the exploration of the search space of concrete services. The refinement algorithm is capable of proposing concrete compositions that comply with the composition specification and that are presented in decreasing order determined by a recommendation indicator (such as user preference, QoS, etc.). The score associated with each concrete service available to the composition developer represents the user preferences on services. Services having a higher score are preferred to the ones having lower scores [1]. The results of those works serve as the theoretical basis for the methodology of the present dissertation.

The process for refining abstract compositions is structured in two main phases. In the first one, each concrete service definition is scanned to identify what parts of the specification it covers [14]. This identification is achieved through the computation of tuples, called Partial Coverage Descriptors (PCDs), which contain the semantic mapping information to make the parameters of the concrete service compatible with those of the specification. For each possible matching, a PCD containing the mapping information is produced.

The next sections present and illustrate the conceptual definitions in [1, 14] that are fundamental to the development of the work in the present dissertation.


3.2.1 Abstract Composition

In the context of [1, 14], abstract compositions are specified by equations, such as:

$$C(\bar{t}) \equiv_{def} A_1(\bar{t}_1), \ldots, A_n(\bar{t}_n), Q_1(\bar{t}'_1), \ldots, Q_m(\bar{t}'_m)$$

The left-hand side of the specification defines the interface of the composition. The elements of the tuple $\bar{t}$ are formal parameters and represent input (?) and output (!) data. The right-hand side of the definition consists of abstract services ($A_1, \ldots, A_n$), also called subgoals, and quality constraints ($Q_1, \ldots, Q_m$), expressing requirements of the composition.

Abstract services correspond to semantic descriptions of service functionalities and relationships between their required inputs and expected outputs. In this scenario, abstract services are the building blocks used to specify the abstract composition to be refined. Constraints are relational expressions that can express static conditions (to be verified during the refinement process) and dynamic conditions (to be verified during runtime). Quality constraints $Q_i(\bar{t})$ are of the form $(X \; op \; Y)$, $(X \; op \; a)$ or $(X \in C)$, where $X$ and $Y$ are variables, $a$ is a constant, $op \in \{<, >, \leq, \geq, =\}$ and $C$ is a set of constants.

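Example 3.2.1. Consider the specification of a composite service for booking flights over the Internet. This composition is specified as follows:

$$
\begin{aligned}
&\mathit{TravelAgency}(\mathit{Uid}?, \mathit{Pwd}?, \mathit{TravelParam}?, \mathit{FlightTkn}!, \mathit{TotalCost}!, \mathit{Ack}!) \equiv \\
&\qquad \mathit{Authentication}(\mathit{Uid}?, \mathit{Pwd}?, \mathit{UsrTkn}!), \\
&\qquad \mathit{Flight}(\mathit{UsrTkn}?, \mathit{TravelParam}?, \mathit{FlightTkn}!, \mathit{FlightInvoice}!), \\
&\qquad \mathit{Payment}(\mathit{UsrTkn}?, \mathit{FlightInvoice}?, \mathit{TotalCost}!, \mathit{Ack}!, \mathit{Prot}!, \mathit{Form}!), \\
&\qquad \mathit{Form} = \text{"JSON"}, \; \mathit{Ack} = \text{"OK"}
\end{aligned}
$$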

The client supplies an identification, a password, and the travel parameters (origin, destination, departure and return dates). In exchange, she expects to receive the ticket price (already debited from her bank account), a travel token and the transaction acknowledgment. The composition begins with the authentication of the client (who is supposed to be registered in an authentication service used by the travel agency). This step returns a token to identify the client on the airline company's Website. The Flight service also uses the locations and dates provided by the user to return the price and the invoice, used to process the payment. After looking up the flight's price, the bill is paid by using the credit card information already associated with the client's identification (service Payment). There are two constraints in the composition. The first one is a static constraint that specifies the request format to be used for payment. The second one establishes that the whole process finished successfully (to be verified dynamically).

3.2.2 Concrete Services

As in the case of abstract compositions, the left-hand side of the concrete service specification defines the interface of the service. The elements of a tuple $\bar{t}$ are formal parameters and represent input (?) and output (!) data. The right-hand side of the definition consists of abstract services ($A_1, \ldots, A_k$) and quality constraints ($Q_1, \ldots, Q_r$), expressing requirements of the service.

$$S(\bar{t}) \equiv_{def} A_1(\bar{t}_1), \ldots, A_k(\bar{t}_k), Q_1(\bar{t}'_1), \ldots, Q_r(\bar{t}'_r)$$

In this case, the left-hand side of the definition gives the name and interface of the concrete service. The right-hand side uses abstract services and constraints to express the capabilities of the service. The semantic information embedded in the abstract services can help to broaden the number of available services for the refinement process. The service supplier/publisher is supposed to give the specification of each concrete service.


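Example 3.2.2. In the context of the previous example, let us define two of the concrete services used for payments:

$$
\begin{aligned}
&\mathit{VisaCheckout}(\mathit{Tkn}?, \mathit{Invoice}?, \mathit{Ack}!, \mathit{Prot}!, \mathit{Form}!) \equiv \mathit{Payment}(\mathit{Tkn}?, \mathit{Invoice}?, \mathit{Ack}!, \mathit{Prot}!, \mathit{Form}!), \\
&\qquad \mathit{Prot} = \text{"REST"}, \; \mathit{Form} = \{\text{"URIQueryString/CRUD"}\}
\end{aligned}
$$

$$
\begin{aligned}
&\mathit{PayPal}(\mathit{Tkn}?, \mathit{Invoice}?, \mathit{Ack}!, \mathit{Prot}!, \mathit{Form}!) \equiv \mathit{Payment}(\mathit{Tkn}?, \mathit{Invoice}?, \mathit{Ack}!, \mathit{Prot}!, \mathit{Form}!), \\
&\qquad \mathit{Prot} = \text{"REST"}, \; \mathit{Form} = \{\text{"JSON"}, \text{"URIQueryString/CRUD"}\}
\end{aligned}
$$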

VisaCheckout¹ is a payment service that uses the REST protocol, and its supported request format is the URI Query String/CRUD. PayPal² is a payment service that uses the REST protocol, and it supports the request formats JSON and URI Query String/CRUD.

The refinement method matches the specification of concrete services with that of the abstract composition while considering the definition of each available concrete service to cover parts of the abstract composition. These coverings will be later used to build a concrete solution (refinement) of the specification.
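To make the role of static constraints during matching more tangible, the sketch below filters the two payment services of Example 3.2.2 against the static constraint Form = "JSON" of Example 3.2.1. It rests on an assumption that is not stated in the text: a set-valued declaration such as Form = {"JSON", "URIQueryString/CRUD"} is taken to satisfy an equality constraint when the required value belongs to the declared set. The dictionary layout and the function name are likewise illustrative.

```python
# Declared values of each concrete service, following Example 3.2.2.
declared = {
    "VisaCheckout": {"Prot": {"REST"}, "Form": {"URIQueryString/CRUD"}},
    "PayPal": {"Prot": {"REST"}, "Form": {"JSON", "URIQueryString/CRUD"}},
}

# Static constraint of the abstract composition in Example 3.2.1.
static_constraints = {"Form": "JSON"}


def satisfies(service_decl, constraints):
    """Assumed semantics: every required value must be among the values
    the service declares for that parameter."""
    return all(value in service_decl.get(param, set())
               for param, value in constraints.items())


compatible = [name for name, decl in declared.items()
              if satisfies(decl, static_constraints)]
print(compatible)
```

Under this reading only PayPal remains a candidate for the Payment subgoal, since VisaCheckout does not declare the JSON request format.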

3.2.3 User Preferences

Based on utility theory models, [1] defines user preferences on services as follows:

Given a set $\mathcal{S}$ of concrete services and a real-valued scoring function $u : \mathcal{S} \mapsto [0, 1]$, a preference $P = (\mathcal{S}, <_P)$ is derived from the scoring function, where for two concrete services $x, y \in \mathcal{S}$, $x <_P y$ iff $u(x) < u(y)$. The above definition is interpreted as "I like $y$ better than $x$ if it has a better score". Notice that scoring functions can represent very different types of preferences, ranging from simple scores to complex, multi-criteria expressions that combine various quality perspectives. These scores may be manually defined by users or communities, may be deduced or estimated from user activity, or may be automatically assessed.

Example 3.2.3. One user can rank services according to a quality factor (e.g., preferring those having lower response time). She can use statistics on service response time as a scoring function. Another user may prefer services having a better reputation. He can enumerate his score on some services (e.g., VisaCheckout (0.8), PayPal (0.7)), calculate average appreciation from community forums for other services, and give default values to the remaining ones. A third user may compute different scores according to several quality criteria and combine them as a weighted sum.
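The second user of the example can be mimicked with a small sketch in which a scoring function induces the preference order. The scores of VisaCheckout and PayPal come from the example; the service `OtherPay` and the default score of 0.5 are hypothetical values chosen for illustration.

```python
# Scores enumerated by the user (values from Example 3.2.3).
scores = {"VisaCheckout": 0.8, "PayPal": 0.7}
DEFAULT_SCORE = 0.5  # hypothetical default for services without a score


def u(service):
    """Scoring function u : S -> [0, 1]."""
    return scores.get(service, DEFAULT_SCORE)


def less_preferred(x, y):
    """x <_P y iff u(x) < u(y): y is liked better than x."""
    return u(x) < u(y)


def ranked(services):
    """Services in decreasing order of preference."""
    return sorted(services, key=u, reverse=True)


print(less_preferred("PayPal", "VisaCheckout"))
print(ranked(["OtherPay", "PayPal", "VisaCheckout"]))
```

Replacing `u` with a function based on response-time statistics or a weighted sum of several criteria changes the ranking but not the way the order is derived.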

According to [1], their approach is independent of how the scoring function is defined, but the obtained results rely on it. The notion of preference classifies the available concrete services following the user's point of view. However, this notion is orthogonal to the semantics of each concrete service, since the definition of preferences does not need to take into account the functionality of concrete services. The authors aim to use preference information to produce compositions that maximize the combined weight of their component services. The notion of Coverage Domain is introduced to adapt this information to the context of the refinement procedure.

¹ Service described at https://www.programmableWeb.com/api/visa-checkout
² Service described at https://www.programmableWeb.com/api/paypal


3.2.4 Coverage Domain

For each abstract service $A_i$ of an abstract composition, the Coverage Domain $\mathcal{A}_i$ is defined such that:

• Every concrete service $s_j \in \mathcal{S}$ belonging to the coverage domain $\mathcal{A}_i$ contains the abstract service $A_i$ in its definition.

• For each coverage domain $\mathcal{A}_i$, there is a threshold $\xi_i$, defined at the same time as the abstract composition. Only a service $s_j$ such that $u(s_j) \geq \xi_i$ will be included in the coverage domain.

Notice that, since each abstract service $A_i$ is a semantic annotation that denotes that a concrete service $s_j$ provides a defined functionality, the coverage domain $\mathcal{A}_i$ represents the set of concrete services that can perform such an abstract service. Also, an empty coverage domain $\mathcal{A}_i$ indicates that no concrete service of $\mathcal{S}$ provides the abstract service described by $A_i$. In this case, no refinement can be produced over $\mathcal{S}$.

Example 3.2.4. Let us suppose that the coverage domain $\mathcal{A}_{Payment}$, related to the Payment abstract service, is defined such that $\xi_{\mathcal{A}_{Payment}} = 0.6$. Suppose that the VisaCheckout and PayPal services presented in Example 3.2.2 have scores of 0.8 and 0.7, respectively. Both concrete services belong to $\mathcal{A}_{Payment}$ since their specification contains the Payment abstract service and their scores are greater than the threshold.
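The construction of a coverage domain can be sketched directly from the two conditions above. The scores and threshold are those of Example 3.2.4; the service `CheapPay`, with a score below the threshold, is hypothetical and is included only to show the filtering. The function and variable names are illustrative.

```python
# Abstract services appearing in the definition of each concrete service.
definitions = {
    "VisaCheckout": {"Payment"},
    "PayPal": {"Payment"},
    "CheapPay": {"Payment"},  # hypothetical service
}
scores = {"VisaCheckout": 0.8, "PayPal": 0.7, "CheapPay": 0.4}


def coverage_domain(abstract_service, definitions, scores, threshold):
    """Concrete services that contain the abstract service in their
    definition and whose score reaches the threshold."""
    return {s for s, abstract in definitions.items()
            if abstract_service in abstract and scores[s] >= threshold}


payment_domain = coverage_domain("Payment", definitions, scores, 0.6)
print(sorted(payment_domain))

# An empty domain means no refinement can be produced for that subgoal.
print(coverage_domain("Flight", definitions, scores, 0.6) == set())
```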

As previously explained, the refinement process should propose concrete compositions that comply with the composition specification and that are presented in decreasing order of preference. Implicitly, the former abstract services in a given composition have higher priority than the later ones, in decreasing order. A concrete composition may consist of a tuple of concrete services. Each of these services may belong to a different coverage domain. Each coverage domain is associated with a user preference, i.e., a preference order of its concrete services. So we need to adopt a mechanism to combine preferences in order to model complex preferences. In [1, 14], the authors are interested in two types of complex preferences, namely Lexicographical and Pareto orders [45, 46]. The Lexicographical order semantics strictly orders the abstract services concerning their importance, whereas the Pareto semantics considers none of the abstract services as more important than another. These semantics are frequently used in the literature and may lead to different rankings of the refined (concrete) compositions. More details on the Pareto ordering mechanism are given in the following sections.

3.2.5 Pareto Preference

In a Pareto preference, all preferences are equally important, which provides an intuitive way of combining user preferences. A tuple of concrete services is better than another one if it is not worse in any preference and better in at least one of them. In other words, for $m$ user preferences $P_1 = (\mathcal{A}_1, <_{P_1}), \ldots, P_m = (\mathcal{A}_m, <_{P_m})$, where $\mathcal{A}_i$ is a coverage domain and $u_i : \mathcal{S} \mapsto [0, 1]$ is a real-valued scoring function associated with $P_i$, a Pareto preference $P = (\mathcal{A}_1 \times \cdots \times \mathcal{A}_m, <_P)$ on two tuples of concrete services $x = (x_1, \ldots, x_m)$, $y = (y_1, \ldots, y_m)$ is defined as:

$$(x_1, \ldots, x_m) <_P (y_1, \ldots, y_m) \Leftrightarrow \forall i \in \{1, \ldots, m\} : u_i(x_i) \geq u_i(y_i) \wedge \exists j \in \{1, \ldots, m\} : u_j(x_j) > u_j(y_j)$$

Intuitively, in the Pareto approach, the successors of a given combination of concrete services are obtained by replacing one service at a time. For instance, the set of (Pareto) successors of a tuple of integer numbers $\langle n_1, \ldots, n_k \rangle$ is $\{\langle n_1 + 1, \ldots, n_k \rangle, \ldots, \langle n_1, \ldots, n_k + 1 \rangle\}$.
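The two ideas of this section can be sketched in a few lines: the comparison "not worse in any preference and better in at least one", and the generation of successors by changing one position at a time. The function names, the per-domain score dictionaries and the service names `a1`, `a2`, `b1`, `b2` are hypothetical. Reading the integers of a tuple as positions in coverage domains sorted by decreasing score is an assumption of this sketch, not something stated above.

```python
def is_better(x, y, scoring):
    """True if tuple x is not worse than y in any preference and strictly
    better in at least one. scoring[i] maps the services of the i-th
    coverage domain to their scores."""
    pairs = [(u[xi], u[yi]) for u, xi, yi in zip(scoring, x, y)]
    return (all(a >= b for a, b in pairs)
            and any(a > b for a, b in pairs))


def successors(ranks):
    """Successors of <n1, ..., nk>: increment exactly one position."""
    return [ranks[:i] + (n + 1,) + ranks[i + 1:]
            for i, n in enumerate(ranks)]


# Hypothetical scores for two coverage domains.
scoring = [{"a1": 0.9, "a2": 0.5}, {"b1": 0.8, "b2": 0.6}]

print(is_better(("a1", "b1"), ("a2", "b1"), scoring))  # better in domain 1
print(is_better(("a1", "b2"), ("a2", "b1"), scoring))  # incomparable tuples
print(successors((0, 0, 0)))
```

Two tuples may be incomparable under this order, which is why the Pareto semantics can rank refinements differently from the Lexicographical one.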


3.2.6 Formation of PCDs

As introduced in Section 3.2, Partial Coverage Descriptors (PCDs) are used to describe how a service can be used to cover parts of the abstract services of a composition. These tuples contain the semantic mapping information that is considered during the translation of the composition specification in terms of concrete services. The structure of PCDs is presented below:

$$\langle S, h, \varphi, G, \mathit{Def}, \mathit{has\_opt} \rangle$$

• $S$ identifies the concrete service on which the PCD is based.

• $h$, the head homomorphism on $S$, is a mapping from $Terms(S)$ to $Terms(S)$ such that:

– For every term $x$ that is not a parameter of $S$ (i.e., terms not appearing on the left-hand side of the specification of $S$), $h(x) = x$.

– Additionally, for terms $x$ and $y$ that are parameters of $S$, $h$ may be such that $h(x) = h(y)$, where for every parameter $x$ we have that $h(x) = h(h(x))$. This mapping is the head homomorphism in [47]. For example, given the abstract composition $C(\ldots) \equiv_{def} A_1(\ldots), \ldots, A_i(x, x), \ldots, A_n(\ldots)$ and the concrete service $S(u, v) \equiv_{def} \ldots, A_i(u, v), \ldots$, we need to equate $u$ and $v$ so that $\varphi$ may be defined as a function.

• $\varphi$ is a partial mapping from $Terms(C)$ to $h(Terms(S))$ that defines the correspondence between the terms appearing in the abstract composition and the terms that appear in the concrete service definition. For example, given the abstract composition $C(\ldots) \equiv_{def} A_1(\ldots), \ldots, A_i(x, y), \ldots, A_n(\ldots)$ and the concrete service $S(u, v) \equiv_{def} \ldots, A_i(u, v), \ldots$, we have that $\varphi(x) = u$ and $\varphi(y) = v$. Notice that we have ignored the mapping $h$ in this example.

• $G$ is the set of abstract service names and quality constraints covered by $S$.

• $\mathit{Def}$ is a set of quality constraints of the abstract composition. This set contains those conditions that cannot be guaranteed by $S$ alone.

• $\mathit{has\_opt}$ is a boolean flag used to indicate that some abstract service in the definition of $S$ has been used in $G$ and has an optional parameter.

Example 3.2.5. Let C(X, Y ) ≡def A1(X, X), A2(X, Y ) be an abstract composition and S(xa, xb) ≡def A1(xa, xb), A3(xa) the specification of a concrete service. Let us consider the subgoal A1(X, X) in C. Clearly, we can use the definition of S to cover part of the composition. Indeed, it is possible to obtain the PCD D = 〈S, h, ϕ, {A1}, ∅, false〉 where h(xa) = xa, h(xb) = xa and ϕ(X) = h(xa).
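As an illustration only, the PCD tuple can be represented as a small Python record. The class PCD and its two checking methods are hypothetical names chosen for this sketch; the mappings h and ϕ are assumed to be given as dictionaries over term names, and the instance below encodes the PCD D of Example 3.2.5.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass
class PCD:
    service: str                 # S: concrete service the PCD is based on
    h: Dict[str, str]            # head homomorphism on the parameters of S
    phi: Dict[str, str]          # partial mapping Terms(C) -> h(Terms(S))
    covered: FrozenSet[str]      # G: abstract services covered by S
    deferred: FrozenSet[str] = frozenset()  # Def: deferred quality constraints
    has_opt: bool = False

    def head_is_idempotent(self) -> bool:
        """Check h(x) = h(h(x)) for every parameter x; terms that are
        not keys of h are treated as fixed points (h(x) = x)."""
        return all(self.h.get(hx, hx) == hx for hx in self.h.values())

    def phi_targets_image_of_h(self) -> bool:
        """Check that every value of phi lies in h(Terms(S))."""
        image = set(self.h.values())
        return all(target in image for target in self.phi.values())


# PCD D of Example 3.2.5: xa and xb are equated so that phi(X) is a function.
d = PCD(
    service="S",
    h={"xa": "xa", "xb": "xa"},
    phi={"X": "xa"},
    covered=frozenset({"A1"}),
)
```

Both checks succeed for d, since h maps xb to xa and xa to itself, and ϕ(X) = xa belongs to the image of h.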

In the formation of PCDs, some particular cases deserve attention. For instance, it is worth noticing that parameters appearing on the left-hand side of a composition should only be mapped to parameters appearing on the left-hand side of concrete service specifications or to optional ones.

Example 3.2.6. Let C(y?, z!) ≡def A1(y?, z!, w?), A2(w!), A3(y?, z?) be an abstract composition and S(xa?, xb!) ≡def A1(xa?, xb!, xc?), A2(xc!) the specification of a concrete service S. The PCD D = 〈S, h, ϕ, {A1, A2}, ∅, false〉, with h(xa) = xa, h(xb) = xb and ϕ(y) = xa, ϕ(z) = xb, ϕ(w) = xc, is built since: (i) terms y and z (the parameters appearing on the left-hand side of the composition) are mapped by ϕ to, respectively, xa and xb (also on the left-hand side of the definition of S); (ii) w is mapped by ϕ to xc, a term that appears only on the right-hand side of the definition of S. The inclusion of A2 in the coverage of D is motivated by the sharing of the term w on the right-hand side of the abstract composition: this term does not appear on the left-hand side of S and acts as input parameter of A1 and output parameter of A2.


Figure 3.1: Concrete services and PCDs stored at the Organiser according to the user's preferences [1].

The refinement method in [1, 14] supports the notion of optional parameters in the specification of concrete services, i.e., parameters that can be ignored. The information about optional parameters is supposed to be provided by the vendor of the service as part of its specification.

3.2.7 Combination of PCDs

The mappings produced in the first phase are used in the second phase of the algorithm to build a concrete solution (refinement) of the specification. The second phase combines concrete services to cover the whole specification. Several possible solutions may be obtained and presented to the user. In [1], the authors present three different versions of the refinement method. They differ in the preference strategy (Lexicographical order or Pareto) and in whether the PCDs are computed all at once or on the fly.

The refinement method proposes an organizer, an index-like structure that is built for each abstract composition according to the user's preferences and the coverage domain order. The goal of this organizer is to propose the concrete compositions according to an established preference semantics. Figure 3.1 illustrates the three-level structure used to implement the organizer. The first level sets the order of coverage domains in the case of Lexicographical order semantics. For instance, in Figure 3.1, concrete services in A0 having the highest weight are found in the slot labeled rank0.0, while those having the lowest weight are found in the slot labeled rank0.m0. The authors of [1] suppose that for domain A0 there are m0 ranks, while for A1 there are m1, and so on. In the third level, they use a hash table to store concrete services. More precisely, a hash function associates a slot to a concrete service, and its PCDs are stored in the corresponding linked list. Notice that the preference assigned to a PCD is the same preference score as that of its concrete service.

An iterator I traverses this organizer by respecting the following rules:


• A concrete service can take part in a composition iff it does not cover a domain already covered by previous services (i.e., services already chosen to be in this composition, according to the traversal order).

• A coverage domain is not visited if a service in the composition has already covered it.

The iterator generates each set P of PCDs involved in a generated composition. The refinement process further checks whether a given set P can be correctly combined to produce a composition refinement. The number of PCDs in P is not necessarily equal to the number of abstract services in the specification C, since a concrete service may cover more than one abstract service.

The iterator guarantees that P respects the following properties: (i) the PCDs in P cover all the abstract services A1, . . . , An of C and (ii) there is no overlap between the abstract services covered by different PCDs. Deferred quality constraints are the only allowable intersection between them. The refinement method tries to cover the definition of the abstract composition C by verifying whether, for the partial coverages in P, the following conditions hold:

1. The deferred quality constraints appearing in the PCDs must hold when their variables are instantiated using the mappings of the PCDs.

2. Each term in C mapped to an optional output parameter (inside the definition of Si) can only be mapped to optional input parameters (inside the definition of any concrete service).
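Properties (i) and (ii) guaranteed by the iterator amount to an exact cover of the abstract services. The sketch below is a simplified illustration: the function name is_exact_cover is hypothetical, and each covered set is assumed to contain only abstract service names, so deferred quality constraints (the only allowed intersection) are left out.

```python
from typing import FrozenSet, Iterable, Set


def is_exact_cover(abstract_services: Set[str],
                   covered_sets: Iterable[FrozenSet[str]]) -> bool:
    """Return True when the covered sets are pairwise disjoint (property ii)
    and together cover every abstract service of C (property i)."""
    seen: Set[str] = set()
    for covered in covered_sets:
        if seen & covered:      # an abstract service is covered twice
            return False
        seen |= covered
    return seen == abstract_services
```

For a composition with abstract services A1, A2 and A3, the sets {A1, A2} and {A3} form a valid combination, whereas {A1, A2} and {A2, A3} overlap on A2 and are rejected.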

Example 3.2.7. Given the specification of the composition of Example 3.2.1, suppose that we have defined the following coverage domains obtained from the available concrete services and user preferences (in decreasing order of precedence):

Authentication: OrangeAuth (0.8), YahooAuth (0.7), Twitter (0.65), Facebook (0.6).

BookFlight: Expedia (0.9) and Almundo (0.8).

Payment: VisaCheckout (0.8) and PayPal (0.7).

The algorithm for producing PCDs will return a set of PCDs on the concrete services OrangeAuth, Expedia, and PayPal. Notice that, although VisaCheckout is preferred over PayPal, the service cannot be used in the composition since it only uses the URI Query String request format, instead of JSON, which is required by the specification. Therefore, no PCD is generated based on this service. Thus, the procedure generates the following refinement:

TravelAgency(Uid?, Pwd?, TravelParam?, FlightTkn!, TotalCost!, Ack!)

≡def OrangeAuth(Uid?, Pwd?, UsrTkn!),

Expedia(UsrTkn?, TravelParam?, FlightTkn!, FlightInvoice!),

PayPal(UsrTkn?, FlightInvoice?, TotalCost!, Ack!, Prot!, Form!),

Ack = "OK"

If the combination of PCDs in P satisfies the conditions above, one concrete composition is produced. The process completes the composition by imposing quality constraints and optional parameters when they apply. The refined composition is returned with its pre- and post-conditions. These conditions cannot be statically verified and need to be checked at runtime by the generated concrete composition. They represent quality constraints of the abstract composition that are not ensured by the concrete services (since they have a dynamic nature). An example is the condition Ack = "OK".
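The order in which candidate compositions are proposed in Example 3.2.7 can be approximated by the following sketch. It is a strong simplification of the organizer: each service is assumed to cover exactly one coverage domain, the names DOMAINS, NO_PCD and lexicographic_compositions are hypothetical, and every service other than VisaCheckout is assumed, for illustration, to yield a PCD.

```python
from itertools import product
from typing import Dict, Iterator, Set, Tuple

# Coverage domains of Example 3.2.7, listed in decreasing order of precedence.
DOMAINS: Dict[str, Dict[str, float]] = {
    "Authentication": {"OrangeAuth": 0.8, "YahooAuth": 0.7,
                       "Twitter": 0.65, "Facebook": 0.6},
    "BookFlight": {"Expedia": 0.9, "Almundo": 0.8},
    "Payment": {"VisaCheckout": 0.8, "PayPal": 0.7},
}

# Services for which no PCD is generated (request format mismatch).
NO_PCD: Set[str] = {"VisaCheckout"}


def lexicographic_compositions(
    domains: Dict[str, Dict[str, float]], no_pcd: Set[str]
) -> Iterator[Tuple[str, ...]]:
    """Yield one service per domain, best-ranked combinations first,
    with earlier domains taking precedence over later ones."""
    ranked = [
        [name
         for name, _ in sorted(services.items(), key=lambda kv: -kv[1])
         if name not in no_pcd]
        for services in domains.values()
    ]
    # itertools.product varies the last domain fastest, which matches a
    # lexicographical order on the domain precedence.
    yield from product(*ranked)
```

Under these assumptions, the first combination produced is OrangeAuth, Expedia and PayPal, which matches the refinement shown in the example.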


Example 3.2.8. Let us consider the abstract composition C(y!) ≡def A1(x?, y!), A2(x!), x ≥ 10, y ∈ {5, 4, 3}. Let S3(xc?, xa!) ≡def A1(xc?, xa!), A2(xc!) be a concrete service specification. Notice that the definition of S3 does not impose any quality constraints. In this case, the quality requirements of C are not ensured by the coverage provided by S3. However, as x and y are mapped by ϕ to parameters of S3, the quality requirements x ≥ 10 and y ∈ {5, 4, 3} will be included in the refinement as pre- and post-conditions, respectively.

Each concrete composition C′(EC(t̄)) ≡def S1(t̄1), . . . , Sk(t̄k) has a parameter tuple obtained by applying the function EC(t̄) to the parameters of the abstract composition. This function expresses an equivalence class of parameters. The function EC(t̄) makes it possible to equate parameters that are different in the abstract composition but that are mapped to the same term in a concrete service [14].

Example 3.2.9. Let C(x?, y?, z!) ≡def A1(x?, y?, w!), A2(w?, z!) be an abstract composition, and S1(a?, r!) ≡def A1(a?, a!, r!) and S2(c?, d!) ≡def A2(c?, d!) be the specifications of concrete services. The PCDs D1 and D2 are produced as follows: D1 = 〈S1, h1, ϕ1, {A1}, ∅, false〉, where h1 is the identity function and ϕ1 is defined such that ϕ1(x) = a, ϕ1(y) = a and ϕ1(w) = r; and D2 = 〈S2, h2, ϕ2, {A2}, ∅, false〉, where h2 is the identity and ϕ2(w) = c and ϕ2(z) = d. Notice that in D1, both x and y are mapped by ϕ1 to a and thus define the equivalence class {x, y}. Indeed, from the point of view of D1, x and y correspond to the same parameter. In order to build a refinement in terms of D1 and D2, each occurrence of a must be replaced with the representative term of the equivalence class {x, y}. So, we can generate the concrete composition C′ by using the terms in EC(〈x, y, z〉), as follows: C′(x?, x?, z!) ≡def S1(x?, w!), S2(w?, z!).
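The computation of the equivalence classes in Example 3.2.9 can be sketched as follows. This is an illustrative reading of EC, not the implementation of [14]: the function name representatives is hypothetical, each ϕ is given as a dictionary, and the representative of a class is assumed to be the term that appears first in the composition.

```python
from typing import Dict, List, Sequence


def representatives(terms: Sequence[str],
                    phis: List[Dict[str, str]]) -> Dict[str, str]:
    """Map every term of the abstract composition to the representative of
    its equivalence class. Two terms are equated when some phi maps them
    to the same term of a concrete service."""
    parent = {t: t for t in terms}
    order = {t: i for i, t in enumerate(terms)}

    def find(t: str) -> str:
        while parent[t] != t:
            t = parent[t]
        return t

    for phi in phis:
        by_image: Dict[str, List[str]] = {}
        for term in terms:
            if term in phi:
                by_image.setdefault(phi[term], []).append(term)
        for group in by_image.values():
            roots = sorted({find(t) for t in group}, key=order.__getitem__)
            for root in roots[1:]:
                parent[root] = roots[0]   # earliest term is the representative
    return {t: find(t) for t in terms}


# Mappings of Example 3.2.9.
phi1 = {"x": "a", "y": "a", "w": "r"}
phi2 = {"w": "c", "z": "d"}
rep = representatives(["x", "y", "w", "z"], [phi1, phi2])
head = tuple(rep[t] for t in ("x", "y", "z"))
```

Here head evaluates to the tuple (x, x, z), which corresponds to the interface C′(x?, x?, z!) obtained in the example.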

The parameters of S1(t̄1), . . . , Sk(t̄k) in the concrete composition are represented by the tuples t̄i. The terms in these tuples are obtained as t̄i = fi⁻¹ ◦ EC ◦ ψi ◦ hi(t̄′i), such that: (i) t̄′i are the parameters of Si; (ii) the mappings ψi rename the variables of the service Si into the corresponding variables of the abstract composition; and (iii) the conversion functions fi are provided by a set of ontologies. For each t′j ∈ hi(t̄′i),

ψi(t′j) = tj, if ϕi(tj) = fi ◦ hi(t′j)

ψi(t′j) = t′j, otherwise

As usual, conversion functions are bijective. When the data have the same representation, the conversion functions are the identity [14].

Three algorithms that implement the refinement method are presented in [1]. They share a nearly common structure but differ in the way the produced solutions are ordered or in the moment at which new PCDs are calculated. The algorithms are:

• WALO: This algorithm distinguishes two main phases: (i) building all the PCDs required by the abstract composition, by considering all the available concrete services; and (ii) producing concrete compositions from a search space formed by concrete services annotated with (numerical) user preferences. The search space is swept by using a lexicographical order of the abstract services that form the composition.

• LOIR: This algorithm is a variation of WALO in which the two phases are merged into one. In this implementation, (i) PCDs are produced on demand and (ii) this production is done efficiently, i.e., it avoids useless tasks that lead to redundant PCDs. The motivation of the proposal is to avoid the cost of producing PCDs that are not used to produce the required number of solutions. The lexicographical order is also used to produce solutions.

• POTI: This algorithm is yet another variation of WALO, in which the lexicographical ordering is replaced with the Pareto ordering. The two phases of WALO are present in POTI.


The three refinement versions are implemented independently. All of them use a nearly common index-like data structure that organizes concrete services according to the preference semantics used. The results in [1] show that LOIR performs best, in particular when a significant number of concrete services is considered.

The fault-recovery mechanism proposed in the present dissertation adopts the rewriting process described by POTI because of the Pareto ordering and its effects on the preference degradation during the search for alternative services.

3.3 Fault Taxonomy

Faults, errors, and failures are some of the terms used to define elements that affect the correct functioning of a given system [15]. A failure is the deviation of the service from its correct behavior. This unexpected behavior is the consequence of errors that alter the service, and it is noticed by the user when it reaches the service interface. Faults are the possible causes of an error. Identifying the possible causes of these problems plays a fundamental role in the development of the fault recovery mechanism. Therefore, this section compiles a fault taxonomy based on previous works that aimed to point out the possible errors in a Service-Oriented Architecture environment.

The faults listed in this work are well established in the literature [16, 17, 6, 48, 49, 7, 50], and they are organized in a manner that contributes to the proposed discussion. The errors are grouped by the similarity of their possible causes and origins, resulting in different fault contexts. In this way, violations are defined more precisely, and it becomes easier to determine the particular recovery strategy for each category of faults.

Some of the works considered in this study propose a specific taxonomy of faults in a Web service composition context in order to contribute to future works on the recovery of service compositions [16, 17, 49]. In the specific case of [17], for example, the faults are classified according to their possible occurrence in each phase of the life cycle of a service composition. In contrast, other works aim to suggest a solution that provides fault-tolerant Web service composition and, because of that, define categories and models of faults to explain how their proposals can be applied to deal with faults [6, 48, 49, 7, 50]. This last goal is closer to the purpose of the present section.

Some works that address the recovery of SOA present a system able to monitor and react to failures [6, 7, 50]. The fault modeling they present is intrinsically associated with the monitoring mechanism used along with their proposed recovery system. Their fault models enable the user to specify the erroneous behavior that may occur during execution on their proposed platform. These works do not explicitly define a fault taxonomy; instead, they explain the methodology of fault specification using their models. The scope of our project does not include the definition of a monitoring mechanism. Instead, the goal of this section is to describe and identify the disruptive behavior typical of any service that may cause faults and compromise the execution of a service composition.

The following sections compile and organize faults and possible causes found in the bibliography related to Web service compositions [16, 17, 7, 48, 49, 51]. Through the study of the taxonomies found in the literature, it is possible to relate the faults and their causes in three principal subdivisions: Service context, Composition context, and Infrastructure context. These three contexts correspond to the three levels of faults that guide our taxonomy, summarized in Table 3.1. In the next sections we discuss these contexts and relate them to previous works.


3.3.1 Service Level

This level is specifically related to functional aspects of each service of the Web service composition. This context considers the quality and correctness of the data delivered by the service. The Service level is divided into Content and Timing faults [48, 16, 49, 6].

Content violations are related to the definition and expectation of the outputs of a given service. Since a specific behavior is expected from the service based on its description and the composition specification, any mismatch between these expectations and the data delivered by the service during the Execution phase constitutes a fault. Examples in this category are the delivery of incorrect results by the service and the provision of a service different from the one expected. The first example concerns results that are incorrect with respect to the description provided by the vendor. This case results in incoherent behavior of the service and may compromise the final result of the composition. In the case of the wrong service being provided, the service output is coherent with its description, but it deviates from the specification of the activity to which the service is bound. This kind of problem may happen due to an adverse selection of the service before the Deployment phase [16, 17, 48, 49].

In the case of Timing faults, the category summarizes the errors regarding the time of arrival and delivery of the service that may impact the functional specifications of the Web service composition [16].

3.3.2 Composition Level

The Composition level mostly concerns aspects related to the result of composing the Web services selected during the Scheduling phase. This context includes the evaluation of the capacity of conversation between services and the delivery of requirements by the resulting composition [49, 17, 48]. Those requirements are divided into goals specified for the composition and the quality attributes preferred by the developer. Both are established during the Definition phase of the system, and the concrete composition is expected to meet these requirements during the entire execution. The Composition level of faults is thus composed of Compatibility faults, Coverage faults, and QoS faults.

Compatibility failures are related to mismatched behavior of the services with respect to the specification of the composition, including incompatibility of the data exchanged between services and an incorrect order of invocation of the services. The mismatch of information may occur due to differences in the arguments or protocols considered by the services during the exchange of data [48], including missing parameters or incorrect data types. The wrong order of invocation is typically due to service operations or messages being invoked in an order that differs from the expected execution of the composition [49].

Since the coherence of data is fundamental for the composed execution of services, the tools used during the development and construction of the concrete composition are supposed to identify this kind of fault before the generation of the executable process [16]. However, the cause of this type of failure may also appear after the Deployment phase, for example as a consequence of an unexpected change of a service interface after an update. In this case, the error occurs during execution, and monitoring and recovery systems must be used to identify and correct it.

Coverage faults are defined as violations of the specification of the Web service composition regarding the binding between abstract services, or goals, and the concrete services. The development of the service composition follows the specification defined by the user, which should be fully covered by the final implemented system. A failure caused by a missing part of the composition may therefore impact the business process [17]. If the resulting composition covers only a part of the goals, the business process will not represent the behavior expected by the user. The unavailability of a service may cause this situation during execution.


Quality of Service (QoS) faults are related to violations of non-functional attributes of the composition. They concern the quality offered by the services, such as availability, response time, throughput, security, and price. All the mentioned attributes and their expected values can be formalized in a Service Level Agreement (SLA) [52], in which the client and the provider state the terms of the agreement and define the QoS attributes expected from the execution of the service. Both parties should monitor the quality attributes during interaction and compare them against what is claimed in the SLA [53]. The aggregation of QoS attributes for a service composition helps to analyze the QoS of the composition more accurately [54, 55].

Deviations during the development of the composition can cause errors at the Composition level. The Service Selection phase plays a fundamental role in the correctness and reliability of the resulting Web service composition [1]. The refinement process may address most of the challenges related to avoiding Composition faults, and the lack of such a process during development may expose the system to this kind of fault. Naturally, as with Service violations, Composition faults caused by unexpected changes in the services provided during the Execution phase cannot be addressed by auxiliary mechanisms used during the development stages of the composition. Instead, recovery and monitoring mechanisms should be responsible for dealing with the problems during the Execution phase.

3.3.3 Infrastructure Level

At this level, the faults lie in the technical environment that supports the execution of services [48, 7]. Faults at this level can impact the execution of the Web services due to deficiencies in the underlying provisioning infrastructure, making it impossible to execute the service or to provide it with proper QoS attributes [49].

The Infrastructure level is divided into Platform and Network violations. A Platform fault happens when the service becomes unavailable because of a problem with the client or with the device providing the service, such as the malfunctioning of the application server [48]. Network faults are physical failures in the communication with the service or client, such as connectivity loss or low bandwidth. Both Platform and Network violations may cause unavailability of Web services and a consequent failure of the composition [16].

The literature review and the resulting fault categorization are presented in Table 3.1. Each kind of violation is related to a set of references that includes works considering the same concept of violation in their taxonomies. The referred works do not present the frequency of occurrence of faults in each category in real-world applications. Those works only specify the disruptive behavior that a given Web service composition may face during its execution.
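For illustration, the three levels and the violation categories of Table 3.1 can be encoded as a simple lookup structure. The names Level, VIOLATION_LEVEL and violations_at are hypothetical and serve only to make the classification explicit.

```python
from enum import Enum
from typing import Dict, List


class Level(Enum):
    SERVICE = "Service"
    COMPOSITION = "Composition"
    INFRASTRUCTURE = "Infrastructure"


# Violation category -> level, following the rows of Table 3.1.
VIOLATION_LEVEL: Dict[str, Level] = {
    "Content": Level.SERVICE,
    "Timing": Level.SERVICE,
    "Quality of Service": Level.COMPOSITION,
    "Compatibility": Level.COMPOSITION,
    "Coverage": Level.COMPOSITION,
    "Platform": Level.INFRASTRUCTURE,
    "Network": Level.INFRASTRUCTURE,
}


def violations_at(level: Level) -> List[str]:
    """Return, in alphabetical order, the violation categories of a level."""
    return sorted(v for v, lvl in VIOLATION_LEVEL.items() if lvl is level)
```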

The recovery mechanism developed in the present dissertation is capable of providing solutions for the unavailability or malfunctioning of a Web service within a given composition. This kind of behavior can be caused by faults from all levels. The recovery mechanism considers that the substitution of the service can remedy the failure, whether it originates from a crashed server or from incorrectly delivered results. The next section explores the recovery actions established in the literature and their applicability to different fault scenarios.

3.4 Recovery Actions

Table 3.1: Fault Taxonomy.

| Level | Violation | Example | References |
|---|---|---|---|
| Service | Content | Incorrect results, service provided different from expected | [16, 17, 7, 48, 49] |
| Service | Timing | Time-out | [16, 17, 51, 49] |
| Composition | Quality of Service | Low availability, high rate of error, financial faults, SLA violation | [16, 17, 48, 51, 49] |
| Composition | Compatibility | Missing parameter, mismatched data types, incorrect order | [16, 17, 48, 51, 49] |
| Composition | Coverage | Missing parts of the composition | [17, 7, 49] |
| Infrastructure | Platform | Server crashed | [16, 17, 7, 48, 51, 49] |
| Infrastructure | Network | Missing connection, low bandwidth | [16, 17, 7, 48, 51, 49] |

The application of recovery actions to correct the errors detected during execution contributes to achieving fault tolerance in a Web service composition. As with the taxonomy of faults, the categorization of those recovery actions helps to study the different fault scenarios and the resulting solutions [48, 7, 49]. This section presents well-established reactions and explains how they are applied to the faults stated in the previous section. Some of the fundamental failure reactions for Web service composition are:

• Notify: Considered an instrumented action [48], the Notify strategy includes logging details about errors in files and alerting the stakeholders of the process about the occurrence of faults [7, 48]. This reaction is applicable to all kinds of faults.

• Ignore: This strategy consists of deciding not to actively interfere with the execution of the Web service composition during the failure scenario. This reaction applies to any fault caused by a service that does not affect the primary goal of the composition.

• Retry: The idea is to repeat the execution of the activity, motivated by a possible transient fault due to an instability of hardware or software [7]. It is only considered for services that can be executed multiple times without affecting the consistency of the state of the process [51] (Section 3.1.2). This strategy is suitable for any fault that implies a violation of functional behavior or a mismatch of the output content of the Web service. However, errors regarding violations of pre-condition constraints may not be solved by the application of this recovery action [48].

Besides the listed reactions, two main strategies define the level of intervention achieved by the recovery mechanism proposed in the present work: Replace and Recompose. The Replace recovery action entails substituting the failed service with a service that is equivalent regarding functional interfaces and provided QoS. This reaction is typically triggered after detecting any fault that results in the complete unavailability of the service or in its inability to achieve the bound goal. Replace strategies demand a mechanism to identify available compatible services and to include them in the originally deployed Web service composition [49, 7].

The replacement of a failed service leads to the dynamic binding of a single equivalent service or, in some cases, of a composition of Web services that, combined, provide functions equivalent to those of the failed one [51]. In the exceptional case of failure of all services involved in the composition, the recovery action to be considered is Recompose. This reaction implies establishing an alternative business process with the same primary goals as the failed composition [48]. The execution of this kind of reaction works as a Replace action applied to all Web services.

Table 3.2 summarizes the recovery actions considered in the bibliographic references and the applicability of those strategies to different kinds of faults.

Table 3.2: Recovery Actions

| Reaction | Faults | References |
|---|---|---|
| Notify | Applicable to all faults | [48, 7] |
| Ignore | Applicable to all faults that do not affect the primary goal of the composition | [48, 7] |
| Retry | Applicable to all kinds of post-condition faults | [51, 49, 48, 7] |
| Replace | Applicable to faults of complete unavailability of the service or inability to succeed in the goal | [51, 49, 48, 7] |
| Recompose | Applicable to faults of all services involved in the composition | [51, 49, 48] |
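The applicability conditions of Table 3.2 can be read as a simple decision procedure. The sketch below is only one possible reading: the record FaultReport, its boolean fields and the function applicable_reactions are hypothetical, and a real platform would derive these flags from its own monitoring mechanism.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FaultReport:
    affects_primary_goal: bool   # does the fault threaten the main goal?
    post_condition_fault: bool   # violation of expected output or behavior
    retry_safe: bool             # service can be re-executed consistently
    service_unusable: bool       # unavailable or unable to achieve its goal
    all_services_failed: bool    # every service of the composition failed


def applicable_reactions(fault: FaultReport) -> List[str]:
    """List the reactions of Table 3.2 whose conditions hold for the fault."""
    reactions = ["Notify"]                      # applicable to all faults
    if not fault.affects_primary_goal:
        reactions.append("Ignore")
    if fault.post_condition_fault and fault.retry_safe:
        reactions.append("Retry")
    if fault.service_unusable:
        reactions.append("Replace")
    if fault.all_services_failed:
        reactions.append("Recompose")
    return reactions
```

For instance, a crashed server hosting a service that is essential to the composition would yield Notify and Replace under this reading.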

This chapter presented the context of Web service composition and the application of rewriting techniques during the development of systems in the SOC paradigm. Part of the chapter was also dedicated to exploring the faults and recovery actions well established in the literature, in order to provide background for the development of the recovery mechanism. These studies make it possible to understand the failures that may threaten the execution of a business process, the viable recovery actions to reestablish the execution, and the techniques used to provide correctness and completeness during the development of those systems. The information compiled in this chapter provides the theoretical basis for the methodology of the work presented in the next chapter.

Chapter 4

Proposal

In this chapter, we present an autonomic mechanism that provides alternative services for the replacement of failed portions of a Web service composition. The mechanism is intended to be part of a platform able to (i) identify runtime failures; (ii) propose reactions to the failures; and (iii) recover the system. Our proposal aims to support the platform by suggesting alternative services for the failed portions of the service composition. Section 4.1 introduces the elements involved in the recovery scenario, such as the composition specification, the initial composition, and the replacing candidates. In Section 4.2, we present the levels of recovery considered by the recovery mechanism and how each of them impacts the failed composition. Section 4.3 concludes the chapter by showing the algorithms that specify the autonomic behavior of the proposed recovery mechanism.

4.1 Specification of Services and Compositions

A Web service composition is considered to implement a business process when it correctly covers the corresponding abstract specification. Due to the variety of available services, each abstract functionality that composes the specification can be implemented by a set of distinct candidate services. Our work adopts the format of concrete services and abstract compositions presented in [1, 14] to specify, respectively, the Web services and the compositions. In those works, the concrete services and abstract compositions are described in terms of abstract services. In the present work, we refer to those abstract building blocks as functionalities. The following sections present the details of these specifications.

4.1.1 Composition Specification

The composition specifications considered in the present work are represented by equations similar to those presented in [1, 14] for abstract compositions. For example, a composition specification of n functionalities is represented as:

C(t̄) ≡ F1(t̄1), . . . , Fn(t̄n)

The left-hand side of the specification defines the interface of the composition. The elements of the tuple t̄ are formal parameters and represent input (?) and output (!) data. Unlike [1, 14], the right-hand side of the definition consists only of semantic descriptions of service functionalities (F1, . . . , Fn). The mentioned works also include quality constraints expressing requirements of the composition, but the present research does not explore these parameters. The way the functionalities of the specification are composed represents the relationships between their required inputs and expected outputs. Like the abstract services of [1, 14], the functionalities are the building blocks used to define the composition specification to be implemented by the services.

4.1.2 Service Specification

The service specification is inspired by the description of concrete services of [1, 14], except that we do not express quality constraints on the capacities of services. As in the composition specification, the left-hand side of the service specification defines the interface of the service. The elements of a tuple t̄ are formal parameters and represent input (?) and output (!) data. The right-hand side of the specification consists of functionalities (F1, . . . , Fk).

S(t̄) ≡ F1(t̄1), . . . , Fk(t̄k)

In this case, the left-hand side of the definition gives the name and interface of the service. Additionally, the right-hand side uses the functionalities to express the capabilities of the service. The service supplier/publisher is supposed to give the specification of each service.

A refinement method selects services whose specifications match the composition specification. The matching evaluation considers the definition of each service to cover part of the specified functionalities. This covering relation is used to build a service composition that refines the specification.

Example 4.1.1. Consider a composition with three functionalities, as described by the following specification:

Cspec(x1?, x4!) ≡ F1(x1?, x2!), F2(x2?, x3!), F3(x3?, x4!)

where Cspec describes a composition with functionalities F1, . . . , F3. For this specification, consider the following available services:

Sa(x1?, x2!) ≡ F1(x1?, x2!)
Sb(x1?, x2!) ≡ F2(x1?, x2!)
Sc(x1?, x2!) ≡ F3(x1?, x2!)

We can conclude that a service composition C formed by Sa, Sb and Sc is capable of implementing Cspec, since each of the services covers a single part of the composition. The result of the refinement process would be:

C(x1?, x4!) ≡ Sa(x1?, x2!), Sb(x2?, x3!), Sc(x3?, x4!)

The semantic information expressed by the functionalities helps to broaden the number of available services to execute a common task. Services that include the same functionality in their respective specifications can be equally considered to refine that functionality in a given composition specification. Therefore, services with a common functionality may represent a set of alternative options for refining that specific part of the composition. We consider that these sets of alternative services may support a fault recovery mechanism that replaces the faulty service.

Example 4.1.2. Consider the same composition specification (Cspec) presented in Example 4.1.1 and the following available candidates for refinement:

Sa(x1?, x2!) ≡ F1(x1?, x2!)
Sb(x1?, x2!) ≡ F2(x1?, x2!)
Sc(x1?, x2!) ≡ F3(x1?, x2!)
Sd(x1?, x2!) ≡ F2(x1?, x2!)

Suppose that a user chooses to execute the following service composition and an error occurs during the execution of Sb.


C(x1?, x4!) ≡ Sa(x1?, x2!), Sb(x2?, x3!), Sc(x3?, x4!)

The fault recovery mechanism must look for other available services that cover the functionality F2 to fix the initial composition. The set SF2 = {Sb, Sd} contains all the available services that cover this functionality. Then, in order to recover the process, the mechanism substitutes Sb with the available service Sd.

C(x1?, x4!) ≡ Sa(x1?, x2!), Sd(x2?, x3!), Sc(x3?, x4!)
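The lookup performed in Example 4.1.2 can be illustrated with a small Python sketch. The dictionary `REGISTRY` and both function names are assumptions made only for illustration; they are not part of POTI or of the specification language, and parameter bindings are ignored.

```python
# Hypothetical registry for Example 4.1.2: service name -> functionalities it covers.
REGISTRY = {
    "Sa": {"F1"},
    "Sb": {"F2"},
    "Sc": {"F3"},
    "Sd": {"F2"},
}


def services_covering(functionality, registry):
    """Return the set of all services whose specification includes `functionality`."""
    return {name for name, covered in registry.items() if functionality in covered}


def substitute(composition, failed, registry):
    """Replace `failed` by another service covering exactly the same functionalities.

    Returns the repaired composition, or None when no such service exists.
    """
    for name in sorted(registry):
        if name != failed and registry[name] == registry[failed]:
            return [name if service == failed else service for service in composition]
    return None


print(sorted(services_covering("F2", REGISTRY)))       # ['Sb', 'Sd']
print(substitute(["Sa", "Sb", "Sc"], "Sb", REGISTRY))   # ['Sa', 'Sd', 'Sc']
```

This sketch only handles the one-for-one case; substitutions that change the shape of the composition need the compatibility reasoning discussed later in the chapter.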

Services that have a common functionality in their respective specifications do not necessarily cover the same portion of the composition specification. Each service executes the functionalities as specified by its vendor. Moreover, such similar services may differ in their adopted topology and in the set of functionalities they cover. The variability of candidates to cover the same functionality directly impacts the possible substitutions that a recovery system is capable of implementing.


Example 4.1.3. Consider the same composition specification (Cspec) and the following available services:

Sa(x1?, x2!) ≡ F1(x1?, x2!)
Sb(x1?, x2!) ≡ F2(x1?, x2!)
Sc(x1?, x2!) ≡ F3(x1?, x2!)
Sd(x1?, x2!) ≡ F2(x1?, x2!)
Se(x1?, x3!) ≡ F2(x1?, x2!), F3(x2?, x3!)
Sf(x1?, x4!) ≡ F1(x1?, x2!), F2(x2?, x3!), F3(x3?, x4!)

Suppose that a user chooses to execute the following service composition and an error occurs during the execution of Sb, as described in Example 4.1.2.

C(x1?, x4!) ≡ Sa(x1?, x2!), Sb(x2?, x3!), Sc(x3?, x4!)

In this new scenario, the set of available candidates to cover the missing functionality F2 is SF2 = {Sb, Sd, Se, Sf}. The recovery mechanism could execute different substitutions, each with a distinct impact on the initial composition. Since the recovery mechanism is supposed to reestablish the correct functioning of the initial composition, any substitution that results in an incorrect refinement of the composition specification cannot be considered. Three possible solutions can be considered in the current situation.

The simplest substitution, already presented in the previous example, is replacing Sb with Sd. These services cover the same set of functionalities.

C(x1?, x4!) ≡ Sa(x1?, x2!), Sd(x2?, x3!), Sc(x3?, x4!)

Another possibility is replacing the failed service Sb with the available service Se. In this case, the modification of the initial composition is not limited to the failed service: since Se also covers F3, the service Sc is replaced as well.

C(x1?, x4!) ≡ Sa(x1?, x2!), Se(x2?, x4!)

The last possibility of recovery involves a more invasive replacement, which consists of replacing the whole initial composition with the service Sf:

C(x1?, x4!) ≡ Sf(x1?, x4!)


Notice that the possible replacements are presented in a way that none of them is prioritized over the others. All of them are supposed to successfully recover the composition, although they have different impacts on the initial composition.

Defining the possible replacements presented in Example 4.1.3 demands coherent reasoning about the compatibility of the services involved in the replacement with respect to the composition specification. Because of that, a method that addresses the service composition refinement problem can be suitably adapted for designing the recovery mechanism. The next section describes the refinement method POTI [1], which was adapted to support the development of our recovery mechanism.

4.1.3 Rewriting Technique

The fault-recovery mechanism proposed by the present thesis adopts the rewriting process described by POTI [1]. The adoption of this algorithm is justified by the user preference ordering implemented during the iterative generation of rewritings. Unlike the algorithms WALO and LOIR, which adopt a lexicographical order, POTI traverses the search space according to the Pareto ordering [45, 46]. Since the Pareto ordering does not prioritize coverage domains, the recovery mechanism investigates the neighborhood of the current solution in a more uniform way with regard to each coverage domain. In contrast, we expect that the lexicographical order would cause greater degradation of preferences, because each coverage domain would be entirely explored before other coverage domains are investigated.
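The difference between the two orderings can be illustrated on rank vectors, where position i holds the rank (0 = most preferred) of the PCD chosen for coverage domain i. The sketch below uses the textbook definition of Pareto dominance as an assumption; the exact traversal implemented by POTI is the one described in [1], and the function names are hypothetical.

```python
def pareto_dominates(a, b):
    """True when rank vector `a` is at least as good as `b` in every coverage
    domain and strictly better in at least one (lower rank = more preferred)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def lexicographic_precedes(a, b):
    """True when `a` comes before `b` in lexicographical order."""
    return tuple(a) < tuple(b)


# Degrading the last domain twice still precedes a single degradation of the
# first domain under the lexicographical order ...
print(lexicographic_precedes((0, 2), (1, 0)))  # True
# ... whereas under the Pareto order neither vector dominates the other.
print(pareto_dominates((0, 2), (1, 0)))        # False
print(pareto_dominates((1, 0), (0, 2)))        # False
# Both are dominated by the initial, most preferred choice.
print(pareto_dominates((0, 0), (0, 2)))        # True
```

Because (0, 2) and (1, 0) are incomparable under Pareto dominance, a Pareto-guided search has no reason to exhaust one coverage domain before touching another, which is the behavior the paragraph above relies on.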

The POTI [1] algorithm includes two steps: (i) the production of Partial Coverage Descriptors (PCDs) based on the available services, and (ii) the combination of the produced PCDs to generate service compositions. In the first step, POTI splits a given composition specification into blocks and looks for Web services providing the corresponding functionality of each block. The result of this identification phase is a set of PCDs. Each PCD contains information on how to use a Web service as a part of the service composition. This information includes variable mappings (to bind variables in the specification to the arguments of the services) as well as the functionalities covered by the service. Therefore, given a composition specification Cspec = {F1, . . . , Fk}, where each Fi stands for a functionality, and a set S of Web services available in the registry, this step generates the set of all PCDs that can be used to cover functionalities of Cspec.

Table 4.1 summarizes the PCDs based on the available services of Example 4.1.3. Notice that one PCD may cover more than one functionality.

Table 4.1: PCDs based on the available services of Example 4.1.3

PCD     Service  Functionalities
PCDSa   Sa       {F1}
PCDSb   Sb       {F2}
PCDSc   Sc       {F3}
PCDSd   Sd       {F2}
PCDSe   Se       {F2, F3}
PCDSf   Sf       {F1, F2, F3}

During the combination phase, the rewriting process checks some constraints before integrating PCDs. These constraints include: (i) all functionalities of the specification need to be covered; (ii) each functionality must be covered by just one PCD; (iii) the use of parameters in the predicates must be consistent. For the sake of simplicity, the verification of these constraints is abstractly represented as a call to a function Compatible, which takes a set of PCDs and returns a Boolean value. The computational details of this function can be found in [14]. Once a set of compatible PCDs is found, it is integrated into a service composition. This integration is represented by the Combine function, which combines PCDs.
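To make constraints (i) and (ii) concrete, the following Python sketch checks them over sets of functionalities. It is a deliberate simplification and not the Compatible function of [14]: constraint (iii), the consistency of parameters, is not modelled, and the function name and data layout are assumptions.

```python
from collections import Counter


def compatible(pcds, spec):
    """Simplified compatibility check over constraints (i) and (ii).

    `pcds` is an iterable of sets of functionalities (one set per PCD) and
    `spec` is the set of functionalities of the composition specification.
    Constraint (iii), parameter consistency, is NOT modelled here.
    """
    coverage = Counter(f for covered in pcds for f in covered)
    all_covered = set(coverage) == set(spec)               # constraint (i)
    no_overlap = all(n == 1 for n in coverage.values())    # constraint (ii)
    return all_covered and no_overlap


SPEC = {"F1", "F2", "F3"}
print(compatible([{"F1"}, {"F2", "F3"}], SPEC))           # True  (PCDSa with PCDSe)
print(compatible([{"F1"}, {"F2"}, {"F2", "F3"}], SPEC))   # False (F2 covered twice)
print(compatible([{"F1"}, {"F2"}], SPEC))                 # False (F3 not covered)
```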

Since several compositions may be produced by the method, user preferences are used to choose the most suitable rewritings. The POTI algorithm in [1] adopts user preference values to classify the resulting compositions, using the Pareto order [46]. These values are arbitrarily assigned by the user to the services and are used to prioritize the use of those services in the rewriting. We will suppose that preference values range from zero (least preferred) to one (most preferred). The preference of a composition is calculated as the weighted average of the preference scores of its component services, where the weights correspond to the number of functionalities that each service covers.

Example 4.1.4. Suppose that a user assigns her preference to each available service of Example 4.1.3: Sa(1.0), Sb(0.5), Sc(0.8), Sd(0.1), Se(0.9) and Sf(0.3). The PCDs can be summarized as in the following table:

Table 4.2: PCDs based on the available services of Example 4.1.3

PCD     Service  Functionalities  Preference
PCDSa   Sa       {F1}             1.0
PCDSb   Sb       {F2}             0.5
PCDSc   Sc       {F3}             0.8
PCDSd   Sd       {F2}             0.1
PCDSe   Se       {F2, F3}         0.9
PCDSf   Sf       {F1, F2, F3}     0.3

Considering the user preferences, the most preferred composition will be C(x1?, x4!) ≡ Sa(x1?, x2!), Se(x2?, x4!), with a composition preference of (1 × 1.0 + 2 × 0.9)/3 ≈ 0.93.
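The value 0.93 can be reproduced with a short Python sketch of the weighted average described above. The dictionary layout, the function name and the explicit list of candidate compositions are assumptions made for illustration; selecting the maximum is a simplification of the Pareto-based classification performed by POTI.

```python
# Preferences and covered functionalities from Table 4.2.
PCDS = {
    "Sa": (1.0, {"F1"}),
    "Sb": (0.5, {"F2"}),
    "Sc": (0.8, {"F3"}),
    "Sd": (0.1, {"F2"}),
    "Se": (0.9, {"F2", "F3"}),
    "Sf": (0.3, {"F1", "F2", "F3"}),
}


def composition_preference(names, pcds=PCDS):
    """Weighted average of service preferences; each weight is the number of
    functionalities covered by the service."""
    weights = [len(pcds[name][1]) for name in names]
    scores = [pcds[name][0] for name in names]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# The four compositions that refine Cspec with the services of Example 4.1.3.
CANDIDATES = [["Sa", "Sb", "Sc"], ["Sa", "Sd", "Sc"], ["Sa", "Se"], ["Sf"]]

best = max(CANDIDATES, key=composition_preference)
print(best, round(composition_preference(best), 2))  # ['Sa', 'Se'] 0.93
```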

We adapt the POTI algorithm to provide alternative services in case of failure of parts of a composition. Specifically, our method explores the PCDs generated at the selection phase of POTI to replace parts of the failed composition, while preserving the overall functionality.

4.2 Recovery Method

For a composition formed by services S1, . . . , Sn, our algorithm considers three incremental recovery levels: local, partial and total. Given that a service Si fails, each recovery level defines which part of the composition must be replaced. At the local level, the algorithm tries to replace just the failed service Si. If it is not possible to recover locally, that is, when there is no possible substitution for the individual service, the algorithm steps to the partial level of recovery and tries to replace the sub-composition defined by Si, . . . , Sn. If there is no possible replacement at the partial level, the algorithm tries to obtain a rewriting for the whole composition. Consequently, this last level of recovery may consider the replacement of services that were already executed before the occurrence of the failure.
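The portion of the composition that each level may replace can be summarized by the following Python sketch. The function name and the zero-based `failed_index` are assumptions introduced here; the sketch only computes the scope of each level and does not search for replacements.

```python
def replacement_scope(composition, failed_index, level):
    """Return the services eligible for replacement at the given recovery level."""
    if level == "local":      # only the failed service Si
        return composition[failed_index:failed_index + 1]
    if level == "partial":    # Si and the services not yet executed: Si, ..., Sn
        return composition[failed_index:]
    if level == "total":      # the whole composition, executed services included
        return list(composition)
    raise ValueError(f"unknown recovery level: {level}")


composition = ["S1", "S2", "S3", "S4"]
for level in ("local", "partial", "total"):
    print(level, replacement_scope(composition, 1, level))
# local ['S2']
# partial ['S2', 'S3', 'S4']
# total ['S1', 'S2', 'S3', 'S4']
```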

While searching for replacement services, the proposed recovery mechanism does not consider the execution of compensation services, or rollbacks, required to implement the replacement. These transactional requirements, such as the compensation of executed services, are supposed to be handled by the entity responsible for managing the substitution. This substitution manager is guided by the results of the proposed recovery method, since different levels of recovery may demand distinct compensation actions. Moreover, the developed mechanism is limited to suggesting alternative services to recover the failed system, independently of those compensation cases. Because of that, the cost of recovery considered in this study only includes the resources spent during the search for alternative services across the levels of substitution.

We consider that the initial composition that presents the failure is the most preferred one produced during the refinement process. Thus, the substitution of services involved in the composition may decrease the user preference for the recovered composition. Because of that, the recovery method aims to replace the defined portion of the initial composition with the next most preferred available services. In other words, the mechanism tries to minimize the degradation of the user preference while replacing services.

The next sections present the Travel Agency use case and different failure scenarios that may happen during the execution of the process. These sections illustrate the described levels of recovery and the reasoning applied by the recovery mechanism.

4.2.1 Use Case

The Travel Agency use case is presented in this section to illustrate the functioning of the recovery algorithms and the levels of substitution. The use case is explored in different failure scenarios throughout the present section, aiming at a comprehensive explanation of the tool. The use case also represents a practical application of the proposed recovery method.

The abstract composition expressing the specification of the system is shown below. The composite service enables a user to book and pay for services that will be used during her trip, such as flight tickets, a hotel room, and a rental car. To that end, the system is formed by the Authentication, Flight, Hotel, Car and Payment services.

Basically, the client supplies an identification, a password and the parameters of the travel (origin, destination, departure date and return date). In exchange, she expects to receive the total price (already debited from her bank account), a flight token, a hotel token, a car token and the transaction acknowledgment. The returned tokens are key codes that give access to the information related to the booked services, such as departure and arrival airports, check-out time at the hotel and the model of the rented car.

TravelAgency(Uid?, Pwd?, TravelParam?, FlightTkn!, HotelTkn!, CarTkn!, TotalCost!, Ack!)
  ≡ Authentication(Uid?, Pwd?, UsrTkn!),
    Flight(UsrTkn?, TravelParam?, FlightTkn!, FlightInvoice!),
    Hotel(UsrTkn?, FlightTkn?, HotelTkn!, HotelInvoice!),
    Car(UsrTkn?, HotelTkn?, CarTkn!, CarInvoice!),
    Payment(UsrTkn?, FlightInvoice?, HotelInvoice?, CarInvoice?, TotalCost!, Ack!)

The composition starts with the authentication of the client, who is supposed to be registered in an authentication service used by the Travel Agency. This step returns a token to identify the client in the services involved in the composition. The Flight service uses the expected locations and dates provided by the user to return the flight token, the price and the invoice used to process the payment.

The Hotel service uses the flight token to retrieve the expected locations and dates of the travel in order to find a hotel at the destination. After booking a room, the Hotel service returns the hotel token, the price and the invoice.

The Car service uses the hotel token to retrieve the location of the hotel and the check-in and check-out times to choose a nearby car rental pick-up unit and make a reservation. Once the vehicle is reserved, the Car service returns the car token, the price and the invoice.


The Payment service looks up the flight's price, the hotel's price, and the car's price in the respective invoices. The bill is paid by using the credit card information already associated with the client's identification. Once the whole process is complete, the Payment service returns the acknowledgement flag indicating whether the system succeeded or not.

Table 4.3 shows the PCDs based on the concrete services considered for the described use case. They compose the initial rewritings and the search space of replacement candidates for the examples presented in the rest of the chapter. Each table presenting PCDs in the current chapter shows some of the elements of a PCD tuple. For the sake of legibility, the mappings considered for each PCD are not explicitly presented in the tables.

Table 4.3: PCDs based on concrete services

PCD               Service        Goals               Preference
PCDGoogleAuth     GoogleAuth     Authentication      0.9
PCDGol            Gol            Flight              0.9
PCDBooking        Booking        Flight, Hotel, Car  0.9
PCDLatam          Latam          Flight              0.8
PCDIbis           Ibis           Hotel               0.9
PCDLocalizaHertz  LocalizaHertz  Car                 0.9
PCDExpedia        Expedia        Hotel, Car          0.7
PCDVisa           Visa           Payment             0.9

4.2.2 Local Recovery

The Local Recovery occurs when a single service of the composition is to be replaced. This service can cover one or more functionalities. Also, these functionalities may be covered by another service or by a sub-composition of available services. Example 4.2.1 considers the case where the failed service covers only one functionality of the composition.

Example 4.2.1. Suppose that the user chooses to run the first concrete composition produced by the refinement process and a failure occurs during the execution of Gol.


  ≡ GoogleAuth(Uid?, Pwd?, UsrTkn!),
    Gol(UsrTkn?, TravelParam?, FlightTkn!, FlightInvoice!),
    Ibis(UsrTkn?, FlightTkn?, HotelTkn!, HotelInvoice!),
    LocalizaHertz(UsrTkn?, HotelTkn?, CarTkn!, CarInvoice!),
    Visa(UsrTkn?, FlightInvoice?, HotelInvoice?, CarInvoice?, TotalCost!, Ack!)

Table 4.4 shows the PCDs considered during the production of the initial rewriting. Notice that PCDGol is responsible for the coverage domain Flight. The recovery system will initially apply the local level of recovery and look for a new PCD, in decreasing order of preference, that is based on a concrete service capable of recovering Flight.

As shown in Table 4.5, the PCD that follows PCDGol in decreasing order of preference in the coverage domain Flight is PCDBooking, which has the same preference value.


Table 4.4: PCDs used for producing the initial rewriting

PCD               Service        Goals           Preference
PCDGoogleAuth     GoogleAuth     Authentication  0.9
PCDGol            Gol            Flight          0.9
PCDIbis           Ibis           Hotel           0.9
PCDLocalizaHertz  LocalizaHertz  Car             0.9
PCDVisa           Visa           Payment         0.9

Table 4.5: PCDs covering Flight

PCD         Service  Goals               Preference
PCDGol      Gol      Flight              0.9
PCDBooking  Booking  Flight, Hotel, Car  0.9
PCDLatam    Latam    Flight              0.8

The recovery mechanism evaluates the combination of the candidate for replacement, PCDBooking, with all the PCDs used for producing the initial rewriting except the one based on the failed service Gol. This evaluation determines whether it is feasible to use Booking for recovery.

Table 4.6 shows the set of PCDs tested at this point in the process. Notice that PCDBooking covers not only the coverage domain Flight, but also the coverage domains Hotel and Car. These coverage domains are implemented in the original composition by Ibis and LocalizaHertz.

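Table 4.6: PCDBooking replacing PCDGol

PCD               Service        Goals               Preference
PCDGoogleAuth     GoogleAuth     Authentication      0.9
PCDBooking        Booking        Flight, Hotel, Car  0.9
PCDIbis           Ibis           Hotel               0.9
PCDLocalizaHertz  LocalizaHertz  Car                 0.9
PCDVisa           Visa           Payment             0.9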



The overlapping conflict is detected and the recovery mechanism deems this combination of PCDs unfeasible. The search for a compatible candidate continues through the available options. The next candidate in the coverage domain Flight, in decreasing order of preference, is PCDLatam (Table 4.5). The combination of Latam with the well-functioning services from the initial rewriting is shown in Table 4.7.
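The overlap that rules out PCDBooking can be made explicit by counting how many PCDs cover each coverage domain. The following Python sketch uses the coverage domains of Table 4.3; the names `COVERAGE`, `overlapping_domains` and `with_replacement` are assumptions introduced for illustration and variable mappings are ignored.

```python
from collections import Counter

# Coverage domains per PCD, taken from Table 4.3.
COVERAGE = {
    "GoogleAuth": {"Authentication"},
    "Gol": {"Flight"},
    "Booking": {"Flight", "Hotel", "Car"},
    "Latam": {"Flight"},
    "Ibis": {"Hotel"},
    "LocalizaHertz": {"Car"},
    "Expedia": {"Hotel", "Car"},
    "Visa": {"Payment"},
}


def overlapping_domains(pcd_names):
    """Return the coverage domains covered by more than one PCD."""
    counts = Counter(d for name in pcd_names for d in COVERAGE[name])
    return sorted(d for d, n in counts.items() if n > 1)


def with_replacement(composition, failed, candidate):
    """Swap the failed service for a candidate, keeping every other service."""
    return [candidate if name == failed else name for name in composition]


initial = ["GoogleAuth", "Gol", "Ibis", "LocalizaHertz", "Visa"]
print(overlapping_domains(with_replacement(initial, "Gol", "Booking")))  # ['Car', 'Hotel']
print(overlapping_domains(with_replacement(initial, "Gol", "Latam")))    # []
```

An empty result means that no overlapping conflict remains, which is the situation reached with PCDLatam.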

Since the resulting combination of PCDs does not present any conflict, the recovery mechanism finishes its search and selects Latam as the substitute for the failed service Gol. The resulting rewriting after the substitution is:



Latam(UsrTkn?,TravelParam?,FlightTkn!,FlightInvoice!),





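Table 4.7: PCDLatam replacing PCDGol

PCD               Service        Goals           Preference
PCDGoogleAuth     GoogleAuth     Authentication  0.9
PCDLatam          Latam          Flight          0.8
PCDIbis           Ibis           Hotel           0.9
PCDLocalizaHertz  LocalizaHertz  Car             0.9
PCDVisa           Visa           Payment         0.9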



Still in the context of local recovery, when the failed service covers more than one coverage domain, the recovery mechanism searches for a sub-composition of concrete services responsible for implementing all the uncovered domains. Example 4.2.2 illustrates this specific situation.

Example 4.2.2. Consider that the user chooses to run the following concrete composition:



Booking(UsrTkn?, TravelParam?,FlightTkn!, FlightInvoice!,

HotelTkn!, HotelInvoice!, CarTkn!, CarInvoice!),


This rewriting is one of the products of the refinement process. The composition is formed by three concrete services: GoogleAuth covering the first functionality (Authentication), Booking covering three sequential coverage domains (Flight, Hotel, Car) and Visa implementing the remaining one (Payment). In case of failure of Booking, the recovery system will initially apply the local level of recovery and try to cover each of Booking's coverage domains with a new concrete service. Table 4.8 presents the PCDs produced during the refinement process that include Flight, Hotel and/or Car in their set of covered goals. Table 4.9 shows the coverage domains involved in the recovery and their PCDs sorted by preference.

Table 4.8: PCDs covering Flight, Hotel and/or Car

PCD               Service        Goals               Preference
PCDGol            Gol            Flight              0.9
PCDBooking        Booking        Flight, Hotel, Car  0.9
PCDLatam          Latam          Flight              0.8
PCDIbis           Ibis           Hotel               0.9
PCDExpedia        Expedia        Hotel, Car          0.7
PCDLocalizaHertz  LocalizaHertz  Car                 0.9

Considering that Booking is the failed service, the PCD based on this service must not be included in the candidate recovery composition. Therefore, in this case, the recovery mechanism searches for the first feasible concrete sub-composition formed by the PCDs PCDGol, PCDLatam, PCDLocalizaHertz, PCDIbis and PCDExpedia.

To locally replace Booking, services that together provide the same set of functionalities must be selected. In this example, the first suggested combination is composed of the concrete services that immediately follow the failed service in each coverage domain. Thus, as seen in Table 4.9, the concrete service after PCDBooking in the coverage domain Flight is PCDGol. In the coverage domain Hotel, the candidate is PCDIbis, and for the coverage domain Car the next option is PCDLocalizaHertz. Table 4.10 shows the PCDs suggested for recovery.

Table 4.9: PCDs sorted in decreasing order of user preference and distributed by coverage domain

Flight      Hotel       Car
PCDBooking  PCDBooking  PCDBooking
PCDGol      PCDIbis     PCDLocalizaHertz
PCDLatam    PCDExpedia  PCDExpedia

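Table 4.10: Component PCDs of the first candidate for recovery

PCD               Service        Goals           Preference
PCDGoogleAuth     GoogleAuth     Authentication  0.9
PCDGol            Gol            Flight          0.9
PCDIbis           Ibis           Hotel           0.9
PCDLocalizaHertz  LocalizaHertz  Car             0.9
PCDVisa           Visa           Payment         0.9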


The suggested PCDs serve as a possible rewriting. They do not present any conflict or violation, and their composition results in the following rewriting:



Gol(UsrTkn?,TravelParam?,FlightTkn!,FlightInvoice!),

Ibis(UsrTkn?,FlightTkn?,HotelTkn!,HotelInvoice!),

LocalizaHertz(UsrTkn?,HotelTkn?,CarTkn!,CarInvoice!),


Since the failures explored in the previous examples were recovered in the context of a local recovery, the services GoogleAuth and Visa were not replaced in any of the cases.

4.2.3 Partial Recovery

The Partial level of recovery is tried by the algorithm when the local recovery does not succeed. In this case, not only the faulty service Si is replaced, but also all the subsequent services that were not executed before the occurrence of the failure.

Example 4.2.3. For this example, consider the same abstract specification of the Travel Agency composition of the previous examples. Suppose that the user chooses to run the following concrete composition suggested by the refinement process:



Gol(UsrTkn?, TravelParam?, FlightTkn!, FlightInvoice!),

Ibis(UsrTkn?,FlightTkn?,HotelTkn!,HotelInvoice!),




Consider that an error has occurred during the execution of Ibis and that it must be replaced by a similar service that also covers the functionality Hotel. Table 4.11 shows the PCDs originally considered for the coverage domain Hotel.

Table 4.11: Coverage domain Hotel

PCD         Service  Goals               User Preference
PCDIbis     Ibis     Hotel               0.9
PCDBooking  Booking  Flight, Hotel, Car  0.9
PCDExpedia  Expedia  Hotel, Car          0.7

The first strategy adopted by the recovery mechanism is to find a replacement at the local level. As explained in the previous section, the local recovery looks for the first candidate to recover the coverage domains of the failed service. In this case, the recovery mechanism will check whether PCDBooking or PCDExpedia is capable of covering Hotel and operating with the other PCDs of the initial composition without any conflict.

The evaluation of candidates follows a decreasing order of preference, meaning that PCDBooking (0.9) is checked before PCDExpedia (0.7). Table 4.12 shows the resulting combination of PCDBooking and the PCDs from the initial rewriting, except PCDIbis.

Table 4.12: Component PCDs of the first candidate for local recovery

PCD               Service        Goals               Preference
PCDGoogleAuth     GoogleAuth     Authentication      0.9
PCDGol            Gol            Flight              0.9
PCDBooking        Booking        Flight, Hotel, Car  0.9
PCDLocalizaHertz  LocalizaHertz  Car                 0.9
PCDVisa           Visa           Payment             0.9

PCDBooking covers the domains Flight, Hotel and Car. Since the initial composition includes PCDGol and PCDLocalizaHertz, respectively covering Flight and Car, two overlapping conflicts occur. Therefore, PCDBooking cannot be considered a feasible candidate for local recovery. The next step is considering PCDExpedia to recover the business process:




PCD               Service        Goals           Preference
PCDGoogleAuth     GoogleAuth     Authentication  0.9
PCDGol            Gol            Flight          0.9
PCDExpedia        Expedia        Hotel, Car      0.7
PCDLocalizaHertz  LocalizaHertz  Car             0.9
PCDVisa           Visa           Payment         0.9

In this case, conflicts between the coverage domains of PCDExpedia and the PCDs of the initial composition also exist: PCDLocalizaHertz is supposed to cover Car, which is also covered by PCDExpedia. Because of these conflicts, the local recovery concludes unsuccessfully, which leads the recovery mechanism to consider the partial level.

The partial recovery consists in replacing not only the failed service but also the concrete services that were not executed before the occurrence of the failure. In this example, the recovery mechanism, at the partial level, will look for a sub-composition of available PCDs to recover the goals Hotel, Car and Payment. Table 4.14 shows the coverage domains and the available PCDs considered for the partial recovery.



Hotel       Car               Payment
PCDIbis     PCDLocalizaHertz  PCDVisa
PCDBooking  PCDBooking        -
PCDExpedia  PCDExpedia        -

Since PCDIbis is based on the failed service, the recovery mechanism ignores this PCD while searching for recovery sub-compositions. The mechanism will look for the first sub-composition that successfully implements the uncovered domains of the composition. As seen in the previous example, suggested sub-compositions are evaluated in combination with the PCDs that formed the initial composition. The first combination evaluated is shown in Table 4.15.

Table 4.15: Component PCDs of the �rst candidate for partial recovery





PCDVisa Visa Payment 0.9

Note that PCDBooking covers three coverage domains of the abstract specification (Flight, Hotel, Car). Additionally, PCDGol, which is based on the executed service Gol, initially covers Flight. Since this service had already been executed at the moment of the failure, it is not considered for substitution at the partial level. Because of that, it is possible to conclude that any recovery sub-composition that includes PCDBooking is not feasible for this level of recovery: adding PCDBooking to the composition would inevitably result in conflicts with PCDGol. Besides that, this combination includes the conflict of combining PCDBooking and PCDLocalizaHertz, since both of them are expected to cover Car. Thus, this combination is not accepted to recover the composition and the mechanism evaluates the next sub-composition, shown in Table 4.16.

Table 4.16: Component PCDs of the second candidate for partial recovery





In this case, the sub-composition also includes PCDBooking, but this time it is a candidate to cover Hotel and Car. This combination also results in the conflict between PCDBooking and PCDGol observed in the previous sub-composition: both cover Flight, so their joint execution should not be considered. Because of that, another candidate is evaluated. The next candidate is shown in Table 4.17.

Table 4.17: Component PCDs of the third candidate for partial recovery

The new candidate also presents overlapping conflicts. PCDExpedia and PCDLocalizaHertz both include the coverage domain Car in their sets of goals. Because of that, the sub-composition is not accepted for recovery and the mechanism keeps searching for a feasible solution. Table 4.18 shows the next candidate for partial recovery.

Table 4.18: Component PCDs of the fourth candidate for partial recovery






The next sub-composition evaluated also includes PCDBooking, which results in the conflicts found for the previous candidates. Besides that, PCDBooking and PCDExpedia should not be executed in the same composition because both are supposed to cover Hotel and Car. Therefore, this candidate is not accepted and another candidate is considered. Table 4.19 includes the PCDs that form the next candidate for partial recovery.

Table 4.19: Component PCDs of the fifth candidate for partial recovery






The new candidate has the same conflicts as the previous one (Table 4.18). PCDBooking conflicts with PCDGol over the coverage domain Flight, and the coverage domains of PCDExpedia are also covered by PCDBooking. The difference between this candidate and the previous one, which includes the same PCDs, is the order of execution of the components PCDExpedia and PCDBooking. The composition is not accepted and a new candidate is considered. The last available candidate for the partial recovery is shown in Table 4.20.

Table 4.20: Component PCDs of the sixth candidate for partial recovery

The new sub-composition does not present any conflict between its component services, and they can work composed with PCDGol and PCDGoogleAuth. Because of that, the recovery mechanism concludes the search and returns the suggestion of a new partial rewriting. The new rewriting results from the composition of PCDGoogleAuth, PCDGol, PCDExpedia and PCDVisa, maintaining the executed services and recovering the domains Hotel, Car, and Payment. The service Visa is part of the solution because it has not failed and the rewriting process chooses it to cover the Payment functionality. The resulting rewriting is shown below.




Expedia(UsrTkn?,FlightTkn?,HotelTkn!,HotelInvoice!,CarTkn!,CarInvoice!),

Visa(UsrTkn?,FlightInvoice?,HotelInvoice?,CarInvoice?,TotalCost!,Ack!)

4.2.4 Total Recovery

When a partial recovery of the composition is not possible, our recovery method tries a Total recovery. At this level, the mechanism searches for a replacement for the whole composition. The services that were already executed before the occurrence of the failure are also eligible for replacement. Example 4.2.4 illustrates a total recovery in the Travel Agency use case.

Example 4.2.4. Consider that the user chooses to run the same initial composition of the previous examples, but an error occurs during the execution of LocalizaHertz.





LocalizaHertz(UsrTkn?,HotelTkn?,CarTkn!,CarInvoice!),


This concrete service covers the coverage domain Car, and the recovery mechanism will first try to recover this functionality by performing a substitution at the local level. Table 4.21 shows the PCDs considered for the local recovery.

Table 4.21: Available PCDs for the coverage domain Car

PCD               Service        Covered Goals       User Preference
PCDLocalizaHertz  LocalizaHertz  Car                 0.9
PCDBooking        Booking        Flight, Hotel, Car  0.9
PCDExpedia        Expedia        Hotel, Car          0.7

During the tests, the recovery mechanism detects conflicts between the candidates for the local recovery and the PCDs originally included in the concrete composition. As seen in Table 4.22, besides Car, PCDBooking covers Flight and Hotel, which are respectively covered by PCDGol and PCDIbis. Additionally, Table 4.23 shows that PCDExpedia covers Car and Hotel, while the latter is already covered by PCDIbis in the initial composition. The recovery mechanism finishes the process of local recovery unsuccessfully due to the lack of compatible candidates.



PCD            Service     Goals               Preference
PCDGoogleAuth  GoogleAuth  Authentication      0.9
PCDGol         Gol         Flight              0.9
PCDIbis        Ibis        Hotel               0.9
PCDBooking     Booking     Flight, Hotel, Car  0.9
PCDVisa        Visa        Payment             0.9



Table 4.23: Component PCDs of the second candidate for local recovery

PCD            Service     Goals           Preference
PCDGoogleAuth  GoogleAuth  Authentication  0.9
PCDGol         Gol         Flight          0.9
PCDIbis        Ibis        Hotel           0.9
PCDExpedia     Expedia     Hotel, Car      0.7
PCDVisa        Visa        Payment         0.9



A possible alternative to implement the recovery is finding a sub-composition capable of substituting the failed service and the non-executed services. In this case, the failed service is LocalizaHertz, responsible for Car, and the non-executed service is Visa, which covers Payment. Table 4.24 shows the PCDs and coverage domains considered for the partial recovery.

Table 4.24: Coverage Domains of Car and Payment

Car               Payment
PCDLocalizaHertz  PCDVisa
PCDBooking        -
PCDExpedia        -

The sub-compositions that are formed by the PCDs of Table 4.24 result in the same combinations checked during the local recovery. This similarity occurs because PCDVisa is the only candidate to cover Payment and the candidates to cover Car are the same ones previously considered.

All the combinations tested during local and partial recovery presented conflicts that involve PCDs based on services that were executed before the failure, PCDGol and PCDIbis. Therefore, the partial recovery also concludes without success and the recovery mechanism starts considering the substitution of the whole composition, that is, the total recovery. Table 4.25 shows all the coverage domains and PCDs considered for total recovery.

The main goal of this level of recovery is to find the first compatible composition that does not include the failed service LocalizaHertz. The first candidate to be tested at this level is shown in Table 4.26. This combination of PCDs has already been tested during the searches of the previous levels of recovery. Since there are conflicts regarding the coverage domains of PCDGol, PCDIbis and PCDBooking, the combination is not accepted.

The next candidate for recovery is presented in Table 4.27. In this case, PCDBooking and PCDIbis cause a conflict over the coverage domain of Hotel, so this composition cannot be considered a feasible solution. The recovery mechanism then requests a new candidate from the Iterator.



Authentication   Flight       Hotel        Car                Payment
PCDGoogleAuth    PCDGol       PCDIbis      PCDLocalizaHertz   PCDVisa
-                PCDBooking   PCDBooking   PCDBooking         -
-                PCDLatam     PCDExpedia   PCDExpedia         -

Table 4.26: Component PCDs of the first candidate for total recovery







The next candidate is shown in Table 4.28. This combination of PCDs presents conflicts between PCDBooking and PCDGol over the coverage domain Flight. Because of that, it is not accepted by the recovery mechanism, which keeps searching for a suitable candidate.

The next candidate for total recovery is presented in Table 4.29. This combination includes PCDGoogleAuth, PCDBooking and PCDVisa. This candidate does not have any conflict and it fully covers the specification of the system.

The recovery mechanism evaluates this candidate as a feasible solution and it suggests the substitution of the initial composition by the resulting rewriting:



Booking(UsrTkn?, TravelParam?, FlightTkn!, FlightInvoice!,
        HotelTkn!, HotelInvoice!, CarTkn!, CarInvoice!),


The previous sections described and illustrated the recovery method and the levels of recovery proposed by the present dissertation. In the next section, we will present the algorithm that describes the autonomic behavior of the recovery mechanism.

4.3 Recovery Algorithms

As previously mentioned, we suppose that the POTI algorithm is used to generate compositions for a given abstract specification. In this process, the set PCD of Partial Coverage Descriptors is produced. In our self-healing scenario, we consider that the preferable composition C is the one initially deployed, being formed by the set of services {S1, . . . , Sn}.

Algorithm 1 describes the function Heal, which defines our recovery approach. The algorithm receives the set PCD containing the PCDs produced by POTI, the running composition C formed by the set of services {S1, . . . , Sn}, as well as the identification of the failed service Si. The subset of PCD that produces the composition C is denoted by PCDC (line 2).

The FL, FP and FT sets (lines 3 to 5) contain the functionalities to be recovered at local, partial and total recovery levels, respectively. In Algorithm 1, these functionalities are retrieved


Table 4.27: Component PCDs of the second candidate for total recovery






Table 4.28: Component PCDs of the third candidate for total recovery






by calling the method Funct with the corresponding PCDs. At the local level, the functionalities to be recovered are those initially covered by the failed service Si. Thus, the argument passed to Funct is the resulting set PCD[Si] ∩ PCDC (line 3). For the partial recovery, the functionalities to be recovered are those originally covered by Si, . . . , Sn. These functionalities are returned by Funct when considering an argument equal to PCD[Si, . . . , Sn] ∩ PCDC (line 4). Finally, the functionalities to be recovered during a total recovery are all of those contained in the subset PCDC. Therefore, this subset is considered by Funct when returning the FT set of functionalities (line 5).

The algorithm first tries a local recovery (line 6). If the local recovery does not succeed, the algorithm tries a partial recovery (line 8). Finally, if the partial recovery is not possible, a total one is tried (line 11). The algorithm returns a non-empty recovering composition R whenever a solution is found.
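The escalation from local to partial to total recovery can be sketched in a few lines of Python. This is only an illustration of the control flow of Algorithm 1: the names `heal`, `toy_recover` and `S_alt` are hypothetical, and `toy_recover` is a stand-in that succeeds only when all three functionalities may be replaced.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple


def heal(
    levels: Sequence[Tuple[str, FrozenSet[str]]],
    recover: Callable[[FrozenSet[str]], List[str]],
) -> Optional[Tuple[str, List[str]]]:
    """Try each recovery level in order and stop at the first success."""
    for level_name, functionalities in levels:
        replacement = recover(functionalities)
        if replacement:  # an empty list plays the role of the empty composition
            return level_name, replacement
    return None


def toy_recover(functionalities: FrozenSet[str]) -> List[str]:
    # Hypothetical stand-in for Recover: a substitute exists only when
    # the whole composition may be replaced.
    if functionalities == frozenset({"F1", "F2", "F3"}):
        return ["S_alt"]
    return []


levels = [
    ("local", frozenset({"F2"})),
    ("partial", frozenset({"F2", "F3"})),
    ("total", frozenset({"F1", "F2", "F3"})),
]
outcome = heal(levels, toy_recover)
print(outcome)
```

In this toy run the local and partial attempts return the empty composition, so the solution is only found at the total level.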

The description of the recovery levels and the possible cases of failures within a given composition enable us to analyze some particular scenarios. In these cases, the substitution strategies of different levels of recovery may follow the same reasoning. Consequently, the mechanism can present a repetitive behavior during the sequence of recovery attempts.

In the first case, consider that the execution of a given service composition C, formed by the set of services {S1, . . . , Sn}, fails during the execution of S1, the first service of the composition. Naturally, each level of recovery will consider a set of functionalities to recover. In this particular case, the sets of functionalities considered for recovery at the partial and total levels are equal. Since the partial recovery considers the replacement of the failed service and all the subsequent ones, in the case of the failure of the first service, all the services in the composition will be eligible for replacement. This reasoning is equal to that of the recovery at the total level.


Sa(x1?, x2!) ≡ F1(x1?, x2!)
Sb(x1?, x2!) ≡ F2(x1?, x2!)
Sc(x1?, x2!) ≡ F3(x1?, x2!)

Suppose that a user chooses to execute the following service composition and an error occurs during the execution of Sa.

C(x1?, x4!) ≡ Sa(x1?, x2!), Sb(x2?, x3!), Sc(x3?, x4!)

The sets of functionalities considered for each level of recovery are:

FL = {F1}


Table 4.29: Component PCDs of the fourth candidate for total recovery





Algorithm 1 Self-Healing
Input:
- The set PCD of all available PCDs.
- The running composition C ≡ {S1, . . . , Sn}.
- The failed service Si.
Output:
- The recovering composition R.

1: function Heal(PCD, C, Si)
2:   PCDC ← PCDsOf(C)
3:   FL ← Funct(PCD[Si] ∩ PCDC)
4:   FP ← Funct(PCD[Si, . . . , Sn] ∩ PCDC)
5:   FT ← Funct(PCDC)
6:   R ← Recover(PCD, PCDC, Si, FL)
7:   if R is the empty composition then
8:     R ← Recover(PCD, PCDC, Si, FP)
9:   end if
10:  if R is the empty composition then
11:    R ← Recover(PCD, PCDC, Si, FT)
12:  end if
13:  return R
14: end function

FP = {F1, F2, F3}
FT = {F1, F2, F3}

Notice that FP and FT are equal, indicating that the recovery method tries to substitute the whole composition in partial and total recovery.

Another particular case occurs when the last service in a given composition fails. In this case, the partial recovery has the same effect as the local recovery. Since there is no subsequent service to include in the set of functionalities to recover at the partial level, only the failed service will be considered for replacement. This reasoning is exactly the strategy adopted during the local recovery.


Sa(x1?, x2!) ≡ F1(x1?, x2!)
Sb(x1?, x2!) ≡ F2(x1?, x2!)
Sc(x1?, x2!) ≡ F3(x1?, x2!)

Suppose that a user chooses to execute the following service composition and an error occurs during the execution of Sc.

C(x1?, x4!) ≡ Sa(x1?, x2!), Sb(x2?, x3!), Sc(x3?, x4!)

The sets of functionalities considered for each level of recovery are:

FL = {F3}
FP = {F3}
FT = {F1, F2, F3}

Notice that FL and FP are equal. In this case, the recovery method tries to substitute only the failed service at both the local and partial levels.
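The two particular cases can be checked with a small Python sketch that derives FL, FP and FT from the position of the failed service. The function name `recovery_sets` and the representation of a composition as a list of (service, functionalities) pairs are assumptions made for illustration only.

```python
def recovery_sets(composition, failed_index):
    """Return (FL, FP, FT) for a failure at position failed_index.

    composition is a list of (service_name, set_of_functionalities)
    pairs in execution order.
    """
    # Local level: only the functionalities of the failed service.
    local = set(composition[failed_index][1])
    # Partial level: the failed service and every subsequent one.
    partial = set().union(*(funcs for _, funcs in composition[failed_index:]))
    # Total level: every functionality of the composition.
    total = set().union(*(funcs for _, funcs in composition))
    return local, partial, total


composition = [("Sa", {"F1"}), ("Sb", {"F2"}), ("Sc", {"F3"})]
first_fails = recovery_sets(composition, 0)
last_fails = recovery_sets(composition, 2)
print(first_fails)
print(last_fails)
```

With a failure in Sa the partial and total sets coincide, and with a failure in Sc the local and partial sets coincide, as described in the two examples.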

Algorithm 2 Recover
Input:
- The set PCD of all available PCDs.
- The set PCDC of PCDs used in C.
- The failed service Si.
- The set F of functionalities to be covered.
Output:
- The recovering composition.

1: function Recover(PCD, PCDC, Si, F)
2:   PCDF ← PCDC \ {p ∈ PCDC | p covers f ∈ F}
3:   PCDH ← PCD \ PCD[Si]
4:   PCDU ← {p ∈ PCDH | p covers f ∈ F}
5:   for each P = {P1, . . . , Pm} ⊆ PCDU
6:     such that
7:       (i) F ⊆ Funct(P) and
8:       (ii) ∀ r ≠ s. Funct(Pr) ∩ Funct(Ps) = ∅
9:   do
10:    if Compatible(P ∪ PCDF) then
11:      return Combine(P ∪ PCDF)
12:    end if
13:  end for
14:  return empty
15: end function

Algorithm 2 produces new service compositions. It takes (i) the set (registry) PCD of all available PCDs; (ii) the set PCDC of PCDs used in the original composition C; (iii) the identification of the failed service Si; and (iv) the set F of functionalities covered by services to be replaced in the recovering composition. This algorithm defines three sets of PCDs:

1. The set PCDF of fixed PCDs (line 2). This set is formed by all the PCDs of the original composition that will be maintained in the resulting composition. Notice that the contents of this set depend on the recovery level. This set is empty in total recoveries.

2. The set PCDH of healthy PCDs (line 3), containing all the available PCDs, except those formed for the failed service Si.

3. The set PCDU of available PCDs (line 4), containing the healthy PCDs that can be used to cover the functionalities in F.

The loop at line 5 iterates over those sets P of PCDs that may be used to recover the composition. As in POTI, the Pareto ordering of user preferences defines the order in which the algorithm iterates over the search space of produced PCDs. As previously seen in this chapter, each PCD is assigned the same user preference score as the service on which it is based. Conditions at lines 7 and 8 state that each functionality in F must be covered just once by P. The body of the loop checks whether P and PCDF form a suitable composition.


As in the combination phase of POTI, the Compatible method, called in line 10, checks the rewriting constraints before combining PCDs, such as: (i) all functionalities of the specification need to be covered; (ii) each functionality must be covered by just one PCD; (iii) the use of parameters on the predicates must be consistent. The function Compatible takes a set of PCDs and returns a Boolean value indicating the success of the verification. Once the first set of compatible PCDs is found, it is passed to the method Combine (line 11), which is responsible for producing the recovered composition with the defined PCDs. The algorithm finishes once a solution is found.
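The search performed by Algorithm 2 can be illustrated with a simplified Python sketch applied to the travel example of this chapter. Several simplifications are assumed here and are not part of the original algorithm: a PCD is reduced to the set of functionalities it covers, the parameter-consistency check of Compatible is omitted, candidate sets are enumerated by increasing size rather than by the Pareto ordering of preferences, and the names `REGISTRY`, `compatible` and `recover` are hypothetical.

```python
from itertools import combinations

# Coverage domains taken from the travel example (simplified PCDs).
REGISTRY = {
    "PCDGoogleAuth": {"Authentication"},
    "PCDGol": {"Flight"},
    "PCDLatam": {"Flight"},
    "PCDIbis": {"Hotel"},
    "PCDLocalizaHertz": {"Car"},
    "PCDVisa": {"Payment"},
    "PCDBooking": {"Flight", "Hotel", "Car"},
    "PCDExpedia": {"Hotel", "Car"},
}
SPEC = {"Authentication", "Flight", "Hotel", "Car", "Payment"}
CURRENT = {"PCDGoogleAuth", "PCDGol", "PCDIbis", "PCDLocalizaHertz", "PCDVisa"}


def compatible(names):
    """Every functionality of SPEC is covered, and covered exactly once."""
    covered = [f for name in names for f in REGISTRY[name]]
    return set(covered) == SPEC and len(covered) == len(set(covered))


def recover(failed, to_cover):
    # Fixed PCDs: those of the current composition that cover nothing in F.
    fixed = {n for n in CURRENT if not (REGISTRY[n] & to_cover)}
    # Usable PCDs: healthy ones that cover at least one functionality in F.
    usable = sorted(
        n for n in REGISTRY if n != failed and (REGISTRY[n] & to_cover)
    )
    for size in range(1, len(usable) + 1):
        for chosen in combinations(usable, size):
            covered = [f for n in chosen for f in REGISTRY[n]]
            if not (to_cover <= set(covered)):
                continue  # condition (i): F must be covered
            if len(covered) != len(set(covered)):
                continue  # condition (ii): no overlap inside the candidate
            candidate = fixed | set(chosen)
            if compatible(candidate):
                return candidate
    return set()  # the empty composition


local = recover("PCDLocalizaHertz", {"Car"})
partial = recover("PCDLocalizaHertz", {"Car", "Payment"})
total = recover("PCDLocalizaHertz", set(SPEC))
print(local, partial, total)
```

Under these assumptions, the local and partial attempts fail because PCDBooking and PCDExpedia overlap with the fixed PCDGol and PCDIbis, while the total attempt yields PCDGoogleAuth, PCDBooking and PCDVisa, in line with the example discussed earlier.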

Intuitively, the time complexity of Algorithm 1 is dominated by the time complexity of Algorithm 2. Algorithm 2 corresponds to the combination phase of POTI [1], but the search space to be explored depends on the recovery level being executed. In the worst case, we have a total recovery, where all the functionalities may be replaced and all the concrete services are analyzed, except for the failed service. Therefore, the worst-case time complexity of Algorithm 2 is the same as that of the combination phase of POTI: (M×m

n)

nm , where n is the number of functionalities

in the composition, M is the number of services, and m is the maximal number of functionalities in the specification of a service.

This chapter presented the methodology proposed for the recovery mechanism, whose primary purpose is to provide alternative services to failed components of a given Web service composition. The algorithms and their reasoning were presented in detail throughout the chapter. Additionally, to illustrate the different failure scenarios and the capabilities of the mechanism, comprehensive examples were explored to support a good understanding of the developed tool.

The differences between the proposed levels of recovery create some points of interest for investigation. For example, during the sequence of recovery attempts, the tendency is to increase the flexibility of substitution by considering more services to be replaced at each level. This effect naturally increases the effort applied to process the substitution, but a lower degradation of preference may be achieved through the increased flexibility. Besides the differences between levels, the workflow of the recovery mechanism can be studied to identify the effort demanded by each level in different failure scenarios. This study may help to specify the suitable level of recovery for different failure situations. In the next chapter, we explore the proposed mechanism to study the results of the recovery method and the resources applied to the process.

Chapter 5

Experimental Results

This chapter presents the validation approach designed to evaluate the proposed fault recovery mechanism. To validate our proposal, we have implemented a prototype and conducted experiments with three primary objectives: (i) measure the recovery time of compositions in the presence of faults; (ii) evaluate the compliance of the recovered composition with regard to the user preferences; and (iii) understand how the locality of the fault affects these two parameters. The first section of this chapter explains the workflow of the experiment and the elements that compose the experimental environment. The remaining sections present separately the analysis and visualization of the experimental results.

5.1 Experimental Settings

The experiment consists in simulating different failure scenarios to trigger recoveries. Algorithm 3 describes the workflow of the experiment, which performs the following tasks: (i) generation of a composition specification Cspec (lines 3 and 4); (ii) building of synthetic Web services that may be used to rewrite Cspec (line 5); (iii) execution of the rewriting mechanism to generate the service compositions C and their associated set of PCDs PCD (line 6); and finally, (iv) execution of the recovery method in simulated failure scenarios (line 7). In the following, we explain the steps of Algorithm 3 in detail and illustrate some of them with examples.

Algorithm 3 General Recovery Process
Input:
- The minimum number of functionalities Fmin.
- The maximum number of functionalities Fmax.
- The number of times N to repeat each recovery.

1: procedure Monitor(Fmin, Fmax, N)
2:   for each i ∈ {Fmin, . . . , Fmax} do
3:     Cspec ← C(x1, xi+1) ≡
4:       F1(x1, x2), . . . , Fi(xi, xi+1)
5:     W ← BuildSyntheticWebServices(i)
6:     (C, PCD) ← POTI(Cspec, W)
7:     SimulateFailures(W, C, PCD, N)
8:   end for
9: end procedure


5.1.1 Composition Specification

The procedure Monitor (Algorithm 3) considers compositions with a number of functionalities ranging from Fmin to Fmax, and assumes N as the number of executions of each failure recovery. In Example 5.1.1 we illustrate the generation of compositions of sizes 4, 5 and 6.

Example 5.1.1. Consider that Algorithm 3 is configured to experiment with composition specifications of sizes ranging from Fmin = 4 to Fmax = 6. The results produced in lines 3 and 4 considering the specified parameters are:

C4(x1, x5) ≡ F1(x1, x2), F2(x2, x3), F3(x3, x4), F4(x4, x5)
C5(x1, x6) ≡ F1(x1, x2), F2(x2, x3), F3(x3, x4), F4(x4, x5), F5(x5, x6)
C6(x1, x7) ≡ F1(x1, x2), F2(x2, x3), F3(x3, x4), F4(x4, x5), F5(x5, x6), F6(x6, x7)

In order to run the experiment, we need to generate synthetic services with different numbers of functionalities. The size of the produced composition specification is the fundamental parameter for building the synthetic services involved in the experiment, as described in the next section.

5.1.2 Synthetic Web Services

Algorithm 3 calls the function BuildSyntheticWebServices (Algorithm 4) to generate the registry of available Web services W. This registry is initially empty, and it is iteratively populated with the specification of services S[r,s], where r and s respectively correspond to the first and last functionalities covered by the service. For instance, S[2,4] is the service covering the interval of functionalities F2 to F4, being defined by the following specification:

S[2,4](x2, x5) ≡ F2(x2, x3), F3(x3, x4), F4(x4, x5).

Algorithm 4 Build Synthetic Web Services
Input:
- The number of functionalities n.
Output:
- All the possible Web services for n functionalities.

1: function BuildSyntheticWebServices(n)
2:   W ← ∅
3:   for each i ∈ {1, . . . , n} do
4:     for each j ∈ {1, . . . , n − i + 1} do
5:       F ← ∅
6:       for each k ∈ {j, . . . , j + i − 1} do
7:         F ← F ∪ {Fk(xk, xk+1)}
8:       end for
9:       W ← W ∪ {S[j,j+i−1](xj, xj+i) ≡ F}
10:    end for
11:  end for
12:  return W
13: end function

The synthetic services cover all the possible sequential portions of a given abstract composition. Those sequential portions cover from 1 to n functionalities, where n is the total size of the abstract composition.


In Algorithm 4, each iteration of the loop at line 3 controls the generation of Web services that cover a given number of functionalities. In this manner, the first iteration builds services with one functionality, the second iteration builds services with two functionalities, and so on. Example 5.1.2 illustrates the construction of the synthetic services considering a composition specification of four functionalities.

Example 5.1.2. Let us consider a composition with four functionalities, as described by the following specification:

Cspec(x1, x5) ≡ F1(x1, x2), F2(x2, x3), F3(x3, x4), F4(x4, x5)

where Cspec describes a composition with functionalities F1, . . . , F4. For this specification, Algorithm 4 builds the synthetic services below:

• Services with 1 functionality:

S[1,1](x1, x2) ≡ F1(x1, x2)

S[2,2](x2, x3) ≡ F2(x2, x3)

S[3,3](x3, x4) ≡ F3(x3, x4)

S[4,4](x4, x5) ≡ F4(x4, x5)

• Services with 2 functionalities:

S[1,2](x1, x3) ≡ F1(x1, x2), F2(x2, x3)

S[2,3](x2, x4) ≡ F2(x2, x3), F3(x3, x4)

S[3,4](x3, x5) ≡ F3(x3, x4), F4(x4, x5)

• Services with 3 functionalities:


S[1,3](x1, x4) ≡ F1(x1, x2), F2(x2, x3), F3(x3, x4)

S[2,4](x2, x5) ≡ F2(x2, x3), F3(x3, x4), F4(x4, x5)

• Services with 4 functionalities:


S[1,4](x1, x5) ≡ F1(x1, x2), F2(x2, x3), F3(x3, x4), F4(x4, x5)

Notice that these services cover all possible sequences of consecutive functionalities between F1 and F4.
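The loops of Algorithm 4 can be sketched in Python as follows. As an illustrative simplification, each service S[r,s] is represented only by the pair (r, s) of its first and last functionalities; the function name `build_synthetic_services` is hypothetical.

```python
def build_synthetic_services(n):
    """Return every interval service S[r,s] with 1 <= r <= s <= n."""
    services = []
    for size in range(1, n + 1):              # number of functionalities covered
        for start in range(1, n - size + 2):  # first functionality covered
            services.append((start, start + size - 1))
    return services


registry = build_synthetic_services(4)
print(registry)
print(len(registry))
```

For n = 4 the sketch produces the ten services of Example 5.1.2, in the same order. In general it produces n(n + 1)/2 services, since there are n − i + 1 services of each size i.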

After generating the composition specification (Cspec) and populating the service registry (W), Algorithm 3 is able to rewrite Cspec using services available in W.

5.1.3 Rewritings of Specification

In Algorithm 3 (line 6), the rewriting mechanism POTI is called to generate the service compositions in which the failures are simulated. POTI is one of the rewriting methods proposed by [1] that were presented in Chapter 3. The algorithm of POTI builds Partial Coverage Descriptors (PCDs) considering all the available concrete services and produces concrete compositions from a search space formed by those PCDs. In the case of POTI, the search space is swept by using a Pareto ordering of the functionalities that form the composition. POTI produces a set of service compositions C and a set PCD, containing all the PCDs used to generate the compositions in C. Those sets are used to simulate failures.


Example 5.1.3. Let us consider the composition with four functionalities used in Example 5.1.2 and the set W produced for that example.

Cspec(x1, x5) ≡ F1(x1, x2), F2(x2, x3), F3(x3, x4), F4(x4, x5)

W = {S[1,1], S[2,2], S[3,3], S[4,4], S[1,2], S[2,3], S[3,4], S[1,3], S[2,4], S[1,4]}

For this speci�cation and available candidates, POTI produces the sets PCD and C below:

PCD = {PCDS[1,1], PCDS[2,2], PCDS[3,3], PCDS[4,4], PCDS[1,2], PCDS[2,3], PCDS[3,4], PCDS[1,3], PCDS[2,4], PCDS[1,4]}

C = {
C1(x1, x5) ≡ S[1,1](x1, x2), S[2,2](x2, x3), S[3,3](x3, x4), S[4,4](x4, x5),
C2(x1, x5) ≡ S[1,1](x1, x2), S[2,2](x2, x3), S[3,4](x3, x5),
C3(x1, x5) ≡ S[1,1](x1, x2), S[2,3](x2, x4), S[4,4](x4, x5),
C4(x1, x5) ≡ S[1,1](x1, x2), S[2,4](x2, x5),
C5(x1, x5) ≡ S[1,2](x1, x3), S[3,3](x3, x4), S[4,4](x4, x5),
C6(x1, x5) ≡ S[1,2](x1, x3), S[3,4](x3, x5),
C7(x1, x5) ≡ S[1,3](x1, x4), S[4,4](x4, x5),
C8(x1, x5) ≡ S[1,4](x1, x5)
}
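For the synthetic registry, the compositions listed above correspond to all the ways of splitting the chain F1, . . . , F4 into consecutive interval services. The following Python sketch enumerates these splittings directly. It is not an implementation of POTI, only a way to reproduce the set C of this example; the name `segmentations` and the (first, last) pair representation are assumptions.

```python
def segmentations(start, n):
    """List every way to cover F_start..F_n with consecutive interval services."""
    if start > n:
        return [[]]  # nothing left to cover: one empty continuation
    result = []
    for end in range(start, n + 1):
        # Use S[start,end] first, then cover the remaining functionalities.
        for rest in segmentations(end + 1, n):
            result.append([(start, end)] + rest)
    return result


compositions = segmentations(1, 4)
for index, services in enumerate(compositions, start=1):
    print(f"C{index}:", services)
```

The enumeration yields eight compositions for four functionalities, in the order C1 to C8 shown above. In general there are 2^(n−1) such compositions, one for each choice of cut points between consecutive functionalities.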

Once all the possible rewritings are produced, the function SimulateFailures (Algorithm 5) is called by Algorithm 3 (line 7). This function is responsible for emulating a failure scenario in each service of all produced compositions.

5.1.4 Simulation of Failures

Algorithm 5 describes the core of our experiment. This algorithm defines the procedure SimulateFailures, called by Algorithm 3 (line 7). This method is responsible for simulating the possible failure scenarios in all produced service compositions. The procedure SimulateFailures takes (i) the set (or registry) W of services; (ii) the set of compositions C, generated by POTI; (iii) the set PCD, also generated by POTI; and (iv) the number N of times to execute each recovery.


Algorithm 5 Simulation of Failures
Input:
- The set W of available Web services.
- The set C of service compositions.
- The set PCD of available PCDs.
- The number of times N to repeat each recovery.

1: procedure SimulateFailures(W, C, PCD, N)
2:   for each C ∈ C such that
3:     C(x1, xi+1) ≡ S[1,a](x1, xa+1), ..., S[b,i](xb, xi+1)
4:   do
5:     PCDC ← PCDsOf(C)
6:     for each j ∈ {1, . . . , N} do
7:       SetPreferences(PCD, PCDC)
8:       for each S[k,l] ∈ {S[1,a], ..., S[b,i]} do
9:         L ← {S[a,b] ∈ W | k ≤ a, b ≤ l}
10:        Heal(PCDsOf(L), C, S[k,l])
11:        P ← {S[a,b] ∈ W | k ≤ a ≤ l, b > l}
12:        Heal(PCDsOf(P), C, S[k,l])
13:        T ← {S[a,b] ∈ W | a ≠ k}
14:        Heal(PCDsOf(T), C, S[k,l])
15:      end for
16:    end for
17:  end for
18: end procedure

The procedure SimulateFailures tries to recover from failures in each composition C in C (line 2). Each composition C is assumed to be the most preferred composition during the simulation of its failures. To this end, the algorithm assigns the maximum possible preference to C. The preference of a composition is calculated as the mean of the preference scores of its component services. Thus, we assign the value 1.0 to the services that participate in C, so its preference score will be 1.0. Then, Algorithm 5 randomly assigns preference scores between 0.01 and 0.99 to the remaining services (line 7). Thus, the remaining compositions will have a mean preference score smaller than 1.0.

Example 5.1.4. Suppose that the procedure SimulateFailures tries to recover from failures in the composition C3 of Example 5.1.3.

C3(x1, x5) ≡ S[1,1](x1, x2), S[2,3](x2, x4), S[4,4](x4, x5)

The procedure SetPreferences, called by Algorithm 5 (line 7), defines the preference scores as follows:


PCD         Preference
PCDS[1,1]   1.0
PCDS[2,2]   0.25
PCDS[3,3]   0.72
PCDS[4,4]   1.0
PCDS[1,2]   0.9
PCDS[2,3]   1.0
PCDS[3,4]   0.35
PCDS[1,3]   0.65
PCDS[2,4]   0.47
PCDS[1,4]   0.12
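The effect of these scores on the compositions of Example 5.1.3 can be checked with a short Python sketch that computes the mean preference of each composition. The dictionary names and the (first, last) pair representation of services are assumptions made for illustration.

```python
# Preference scores from the table above, keyed by the interval of each service.
PREFERENCE = {
    (1, 1): 1.0, (2, 2): 0.25, (3, 3): 0.72, (4, 4): 1.0, (1, 2): 0.9,
    (2, 3): 1.0, (3, 4): 0.35, (1, 3): 0.65, (2, 4): 0.47, (1, 4): 0.12,
}

# Compositions C1..C8 of Example 5.1.3.
COMPOSITIONS = {
    "C1": [(1, 1), (2, 2), (3, 3), (4, 4)],
    "C2": [(1, 1), (2, 2), (3, 4)],
    "C3": [(1, 1), (2, 3), (4, 4)],
    "C4": [(1, 1), (2, 4)],
    "C5": [(1, 2), (3, 3), (4, 4)],
    "C6": [(1, 2), (3, 4)],
    "C7": [(1, 3), (4, 4)],
    "C8": [(1, 4)],
}


def composition_preference(services):
    """Mean of the preference scores of the component services."""
    return sum(PREFERENCE[s] for s in services) / len(services)


scores = {name: composition_preference(s) for name, s in COMPOSITIONS.items()}
for name, score in scores.items():
    print(name, round(score, 4))
```

Only C3 reaches the score 1.0, since every other composition includes at least one service whose score is below 1.0.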

The inner loop of Algorithm 5 simulates the failure of each service S[k,l] in the definition of C (line 8). The algorithm uses the available services for addressing the three levels of recovery, one after the other. This sequential workflow is achieved by filtering the set W to ensure that the right subset of PCDs is later offered to the Heal procedure:

• The set L is a registry which contains all the services in W that may be used to substitute S[k,l] in C (i.e., to perform a local recovery of S[k,l]). Thus, only services that cover functionalities present in the specification of S[k,l] are maintained in L (line 9).

• The set P is a registry containing all the services in W that may be used to substitute S[k,l] and all those services appearing in C after S[k,l] (i.e., to perform a partial recovery after the failure of S[k,l]). Thus, services that cover the functionalities of S[k,l] and those of the subsequent services are kept in P (line 11).

• The set T is a registry that contains all the services in W that may be used to build a new composition from scratch, not using S[k,l] (i.e., to perform a total recovery after the failure of S[k,l]). In that case, all services of W are considered by T, except those that cover the first functionality of the specification of S[k,l] (line 13). Thereby, the recovery mechanism must consider candidates that also replace services that precede S[k,l] in the composition. This way, the local and partial recoveries will not represent viable options.

Example 5.1.5. Suppose that, during the experiment, a failure is simulated in S[2,3] while considering the service composition C3, produced in Example 5.1.3.

C3(x1, x5) ≡ S[1,1](x1, x2),S[2,3](x2,x4), S[4,4](x4, x5)

Consider the set of available services W, which was used in Example 5.1.3 to produce the rewritings:

W = {S[1,1], S[2,2], S[3,3], S[4,4], S[1,2], S[2,3], S[3,4], S[1,3], S[2,4], S[1,4]}

The registry sets produced during the simulation of failures are:

L = {S[2,2], S[3,3], S[2,3]}
P = {S[2,2], S[3,3], S[4,4], S[2,3], S[3,4], S[2,4]}
T = {S[1,1], S[3,3], S[4,4], S[1,2], S[3,4], S[1,3], S[1,4]}

In our validation, the Heal procedure is fed with the PCDs that are based on the elements of the registry sets L, P and T. Each registry set induces the Heal procedure to recover the composition C by applying a different level of recovery. The results and the time spent for each recovery are recorded, which generates the database of experimental results.


5.2 Parameter Settings

In order to validate our proposal, we have built a prototype and explored compositions of sizes ranging from Fmin = 4 to Fmax = 12 functionalities (this range of functionalities was chosen due to the time required to run our validation, as explained later in this section). The prototype was executed on Ubuntu 18.04 LTS Bionic Beaver, Linux kernel 4.15, 8GB RAM, AMD Phenom II X4 820 2.8GHz Quad-Core, Java 8.

In Figure 5.1, we show the number of synthetic services built when considering each composition size. As expected, the number of functionalities within a composition specification directly impacts the size of the set of generated services. This behavior is justified by the fact that the synthetic services are supposed to cover portions of the composition of different sizes, going from a single functionality up to the totality of the specification. The number of services defines the number of compositions explored and the size of the search space of substitutes.

Figure 5.1: Number of synthetic services built for the experiment.

In Figure 5.2, we show the number of recoveries executed by the healing algorithm, distributed by level. Increasing the number of functionalities increases the number of services and service compositions. Consequently, the number of recoveries executed for each size of composition follows the growth of the set of available services. Figure 5.2 also exposes the differences and similarities between the occurrences of recoveries at different levels. For each size of composition, the number of local recoveries is always smaller than the number of partial and total recoveries. This difference happens because simulated failures in services that cover a single functionality cannot be recovered at the local level. This can be explained by the fact that the search space of available services is defined in a way that there are no repeated candidate services. Therefore, if a service that covers just one functionality fails, there will be no direct substitute to enable the local recovery.

Partial and total recoveries present the same number of executions. For the partial recovery, the failures simulated at the last service of each composition are not considered. As explained in Section 4.3, in this particular case, the partial recovery tries to recover the composition precisely as the local recovery does. During the experiment, the services are filtered in a way that the services subsequent to the failed one must be considered. Since the last service in the composition does not precede any other, the partial recovery is not viable, and this failure is supposed to be recovered at another level.

A similar case occurs for the total recovery. This level of recovery is not considered for healing failures in the first service of each composition. In this particular case, the failed service


Figure 5.2: Number of executed recoveries distributed by level.

does not have preceding services. Additionally, the search space of available services is filtered in a way that the failure can only be fixed by recovering services that precede the failed one. Therefore, considering the filtered search space, the total recovery is not possible for the scenario of failure in the first service. These conclusions on the particular cases of recovery for each level help to understand the data presented in Figure 5.2.
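The relation between the three counts can be checked with a Python sketch that applies the eligibility rules described above to every composition of a given size. The sketch counts one simulated failure per service for a single pass, ignoring the N repetitions, so it is not meant to reproduce the absolute values of Figure 5.2; the names `segmentations` and `count_recoveries` are hypothetical.

```python
def segmentations(start, n):
    """List every way to cover F_start..F_n with consecutive interval services."""
    if start > n:
        return [[]]
    return [
        [(start, end)] + rest
        for end in range(start, n + 1)
        for rest in segmentations(end + 1, n)
    ]


def count_recoveries(n):
    """Count the failures eligible for each recovery level, for size n."""
    local = partial = total = 0
    for composition in segmentations(1, n):
        for position, (first, last) in enumerate(composition):
            if last > first:
                local += 1    # service covers more than one functionality
            if position < len(composition) - 1:
                partial += 1  # failed service is not the last one
            if position > 0:
                total += 1    # failed service is not the first one
    return local, partial, total


for size in range(4, 9):
    print(size, count_recoveries(size))
```

Under these rules, each composition with k services contributes k − 1 failures to both the partial and the total counts, which is why the two levels present the same number of executions.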

During the experiment, the time cost and the preference degradation are measured for each executed recovery. In Figure 5.3, we show the time spent for a single execution of the experiment, considering the portions of time dedicated to the rewriting and to the recovery procedures. Notice that for smaller compositions, the execution of the experiment takes less than 78,125 ms, which corresponds to less than 2 minutes. In the case of compositions with the maximum number of 12 functionalities, the experiment takes approximately 10 hours to execute its whole workflow once. For this reason, we have experimented with recoveries in compositions containing from 4 up to 12 functionalities.

Figure 5.3: Time spent to execute the experiment considering compositions of different sizes.

The next sections discuss the experimental results divided by each analyzed parameter: (i) the recovery time cost; (ii) the compliance of the recovered composition with regard to the user preferences; and (iii) the locality of faults. The results presented in the next sections were obtained by running the prototype in the same environment as the previous experiments. However, in order to generate consistent measurements of the time required for recoveries, each failure is simulated 10 times.

5.3 Recovery Time

In this section we analyze the time required for recovering compositions. Figure 5.4 shows the average time spent by the healing algorithm to perform local, partial and total recoveries, for each size of the composition. As expected, local recoveries are cheaper than partial and total recoveries. Indeed, the local recovery tries to substitute a single service while the other levels replace different numbers of services. Figure 5.4 also shows that the total recovery is the most expensive level of recovery.

Figure 5.4: Average recovery time of all levels of recovery.

In Figure 5.4, the recoveries executed in compositions of four functionalities appear as more expensive than recoveries in some larger compositions. This behavior is explained by the time spent by the Java Virtual Machine (JVM) to start up. At the beginning of the execution, the JVM spends some time initializing the classes needed to run the code [56]. Since the experiment processes compositions in increasing order of size, the startup time of the JVM is also taken into account during the first tests. In order to confirm that the startup time is the reason for the unexpected results, we executed recoveries for 7, 6, 5 and 4 functionalities, in this reverse order. As expected, the recoveries in the compositions having four functionalities are cheaper than the other recoveries, as shown in Figure 5.5.

The next figures present a detailed analysis of each of the recovery levels represented in Figure 5.4. For example, Figure 5.6 shows the average time cost to execute local recoveries for each size of the composition. The difference between the highest and lowest measured values is approximately 0.2 ms. Because of this small difference, we can say that the time cost of a local recovery does not vary considerably across compositions of different sizes. These similarities are justified by the fact that in every case of local recovery only one service is replaced, even if it covers more than one functionality.

In Figure 5.7, we show the average time spent by the recovery mechanism to heal a composition at the partial level. As previously explained, the mechanism only executes the partial recovery after trying the local recovery. Then the overall time cost results from the summation


Figure 5.5: Experiment executed with decreasing numbers of functionalities.

Figure 5.6: Local recovery - Average recovery time.


of the time elapsed during the unsuccessful attempts of local recovery and the time spent during the search for a solution at the partial level. As expected, the average time cost increases as we increment the number of functionalities in the composition specification. The average time cost exceeds 1 ms for partial recoveries executed in compositions of 10 or more functionalities.

Figure 5.7: Partial recovery - Average recovery time.

Figure 5.8 shows the percentage distribution of the average time cost presented in Figure 5.7. It indicates that the combinatorial search executed during the partial recovery is the most expensive phase of the process. For example, this search represents an average of 98% of the overall time spent during the partial recovery of compositions of 12 functionalities, while the remaining 2% is spent unsuccessfully trying the local recovery. This behavior indicates that the larger the composition, the greater the relative portion of time spent on the partial recovery. This is expected given the effort applied during the partial recovery to find a sub-composition that replaces the failed service and the subsequent ones.
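A percentage distribution of this kind is obtained by dividing the time spent at each level by the overall time. The sketch below shows the arithmetic only; the function name `time_shares` is an assumption, and the two input values are made-up numbers chosen solely to reproduce a 2% / 98% split, not measurements from the experiment.

```python
from typing import Dict


def time_shares(spent_ms: Dict[str, float]) -> Dict[str, float]:
    """Return each level's share of the overall recovery time, in percent."""
    total = sum(spent_ms.values())
    if total <= 0:
        raise ValueError("overall time must be positive")
    return {level: 100.0 * ms / total for level, ms in spent_ms.items()}


# Hypothetical per-level times in ms (not data from the experiment).
example = time_shares({"local": 0.5, "partial": 24.5})
```

Here `example` assigns 2% of the time to the failed local attempt and 98% to the partial search.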

Figure 5.9 shows the average time spent on the execution of total recoveries for each composition size. As previously shown in Figure 5.4, the total recovery demands more time than the other levels of recovery. This difference is mainly noticeable in compositions with 10 or more functionalities. As in Figure 5.7, the data of Figure 5.9 also include the accumulated time cost of the unsuccessful attempts at the previous levels of recovery, local and partial.

Figure 5.8: Partial recovery - Percentage distribution of the average recovery time.

Figure 5.9: Total recovery - Average recovery time.

Figure 5.9 shows that, for more complex compositions, the time spent searching for a solution at the total level is considerably smaller than the time taken by the algorithm to conclude the failed attempts of local and partial recovery. Indeed, the problem of recovering a service composition becomes more flexible when all services are eligible for substitution. This flexibility eases the search for solutions at the total level in comparison with the other levels, which impose restrictions on their combinatorial problems.

Figure 5.10 gives more details on the relative portions of time cost dedicated to each level of recovery, from the local to the total level. The rise in the percentage of time spent on the attempt of partial recovery becomes clear as we increase the size of the composition and, consequently, the search space of available services. For the recovery mechanism, the local recovery represents the most restrictive level, and the total recovery is the most flexible level. The partial recovery sits between those levels regarding flexibility. The execution of recoveries in this order increases time costs. For service compositions of nine functionalities, for instance, the unsuccessful attempt of the partial recovery can represent more than half of the time cost. This effect may motivate adjustments to the healing algorithm to achieve solutions for total recoveries faster. For example, the workflow could be adapted to skip the partial recovery and return a solution in less time.

5.4 Preference Degradation

This section explores the results in terms of the user preference of the recovered compositions. Figure 5.11 shows the mean preference degradation achieved by the healing algorithm when performing local, partial or total recoveries, for each composition size. The degradation values were obtained as the difference between 1.0 (the preference of each most preferred composition, as generated by POTI) and the mean preference value of the recovering compositions produced for each level of recovery.
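The degradation measure defined above is a simple difference, sketched here for concreteness. The function name `preference_degradation` and the constant `BEST_PREFERENCE` are assumptions introduced for illustration, and the input list stands for the preference values of the recovering compositions at one level and one composition size.

```python
from statistics import mean
from typing import Sequence

BEST_PREFERENCE = 1.0  # preference of the most preferred composition


def preference_degradation(recovered_preferences: Sequence[float]) -> float:
    """Return 1.0 minus the mean preference of the recovering compositions."""
    if not recovered_preferences:
        raise ValueError("at least one recovered composition is required")
    return BEST_PREFERENCE - mean(recovered_preferences)
```

For instance, hypothetical recovered preferences of 0.9, 0.8 and 0.7 have a mean of about 0.8 and therefore a degradation of about 0.2.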

Notice that, for all recovery levels, the degradation of preferences decreases as the number of functionalities increases. This behavior is explained by the fact that the higher the number of functionalities, the smaller the relative contribution of the failed service to the overall preference value of a composition. We can also notice that local recovery consistently causes the greatest preference degradation. The partial recovery delivers better results, but they are close to the ones reached by the local level. The best results are obtained by using the total recovery, which is more flexible since the algorithm may substitute all the services in the original composition.

Figure 5.10: Total recovery - Percentage distribution of the average recovery time.

Figure 5.11: Average preference degradation of all levels of recovery.

Considering Figures 5.4 and 5.11, we observe that, for smaller compositions, the time cost does not differ significantly between the levels, but the total recovery delivers the lowest preference degradation. In the case of more complex compositions, the total recovery still delivers the smallest preference degradation. In all cases, the total recovery represents the most expensive level. As seen in Figure 5.10, the unsuccessful attempt of the partial recovery is the primary cost of the total recovery. These observations suggest the adoption of total recoveries for small compositions.

In the case of larger compositions, the recovery mechanism could skip the attempt of partial recovery in order to reduce time costs. In that way, the local recovery is tried first and, if it is not successful, the total recovery is initiated. These results also show that different levels of recovery may be suitable for different priorities regarding time cost and preference degradation. The total recovery may be desirable in situations that prioritize lower preference degradation over time cost, whereas local recovery is suitable for cases that demand the fastest solutions for failures regardless of the resulting preference degradation.
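One possible reading of these suggestions is a small policy that selects which levels to attempt for a given failure. The sketch below is illustrative only and is not part of the evaluated mechanism: the function `choose_levels`, the priority labels and, in particular, the threshold `LARGE_COMPOSITION` are assumptions made for the example.

```python
from typing import List

LARGE_COMPOSITION = 10  # hypothetical threshold, in functionalities


def choose_levels(size: int, priority: str) -> List[str]:
    """Pick the ordered recovery levels to attempt for one failure.

    priority is "time" (fastest answer), "preference" (least degradation)
    or "balanced" (decide from the composition size).
    """
    if priority == "time":
        return ["local"]
    if priority == "preference":
        return ["total"]
    if priority == "balanced":
        if size < LARGE_COMPOSITION:
            # Small compositions: levels cost about the same, so go total.
            return ["total"]
        # Larger compositions: skip the costly partial attempt.
        return ["local", "total"]
    raise ValueError(f"unknown priority: {priority}")
```

A suitable threshold would have to be calibrated against measurements such as those in Figures 5.4 and 5.11.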

5.5 Locality of Faults

This section analyzes how the locality of the faults within the composition impacts the recovery time and the preference degradation. Considering the local recovery, Figure 5.12 presents the average recovery time for compositions with different numbers of functionalities. Notice that the values achieved are very similar, which shows that local recovery reaches the same results independently of the position of the failed service in the composition.

Figure 5.12: Local recovery - Average recovery time (ms) considering the locality of faults

In Figure 5.13, we show the average preference degradation of locally recovered compositions of different sizes. The values shown are also equivalent for all possible localities of faults. The behavior illustrated by these data is expected, considering that the local recovery only substitutes a single service of the composition.

Figure 5.13: Local recovery - Average preference degradation considering the locality of faults.

In Figure 5.14, we show how the locality of faults affects the average recovery time for the partial recoveries of compositions of different sizes. Notice that the recovery time is longer when the failure occurs in the initial portions of the composition. This behavior is expected because the first services have more subsequent services to be considered during the partial recovery.

Figure 5.14: Partial recovery - Average recovery time (ms) considering the locality of faults

In Figure 5.15, we show how the preference degrades in partial recoveries for different fault localities. The values reached are sufficiently close to state that the partial recovery achieves the same preference degradation for failures in different services of the composition.


Figure 5.15: Partial recovery - Average preference degradation considering the locality of faults.

In Figure 5.16, we show the average time cost for the total recovery depending on the locality of the fault for different compositions. Notice that the time cost for total recovery decreases if the failed service is close to the end of the composition, whereas faults in the initial portion of the composition demand more time for recovery.

Figure 5.16: Total recovery - Average recovery time (ms) considering the locality of faults

The analyses of the locality of faults for the partial recovery (Figure 5.14) and the total recovery (Figure 5.16) present similar behavior. The influence of the unsuccessful attempt of partial recovery on the final cost of total recovery was previously noted during the recovery time analysis (Figures 5.9 and 5.10). Because of that, we decided to investigate the contribution of the attempts at each level of recovery to the data shown in Figure 5.16. Figures 5.17, 5.18 and 5.19 show the time spent in each level of recovery during the total recovery of failures.

In Figure 5.17, we note that the same stable behavior with regard to the locality of faults is maintained when the local level of recovery returns unsuccessful responses. These data also show that the local recovery has a minor influence on the average time cost of total recovery.

Figure 5.17: Total recovery - Average recovery time (ms) considering the locality of faults - unsuccessful attempt of Local recovery.

Figure 5.18 shows that failures occurring at the beginning of the composition demand much more time for partial recovery. The values also reinforce that the partial recovery represents the greatest contribution to the overall time spent in total recovery. Therefore, the partial level of recovery directly influences the time cost of total recovery with regard to the locality of faults. For example, for compositions with 12 functionalities, a failure at the second service costs over 360 ms more than a failure at the last service of the composition.

Figure 5.19 shows the time spent to find a solution that replaces the entire service composition at the total level. Note that for small compositions the time cost does not present a relevant difference when considering failures in different portions of the composition. However, in the case of more complex compositions, when the first functionalities are involved in the failure, the time cost is slightly greater than for failures at the last functionalities of the composition. For example, in the case of a composition of 12 functionalities, a failure at the second functionality would require 17 ms more than a failure at the last functionality.

In Figure 5.20, we show how the preference degrades in total recoveries for different fault localities. These values do not present a significant difference, which leads us to conclude that the resulting preference degradation will be the same independently of the locality of faults.

Figure 5.18: Total recovery - Average recovery time (ms) considering the locality of faults - unsuccessful attempt of Partial recovery.

Figure 5.19: Total recovery - Average recovery time (ms) considering the locality of faults - successful attempt of Total recovery.

Figure 5.20: Total recovery - Average preference degradation considering the locality of faults.

This chapter explained the experiment developed to test the recovery algorithm proposed in the present thesis. The experimental setting is formed by three phases: (i) building synthetic services, (ii) rewriting compositions, and (iii) simulating failures. The recovery method reacts to simulated failures in services of all generated compositions. The results of the experiment guide the conclusions on the functioning of the recovery mechanism in different cases of failures. The levels of recovery considered by the mechanism appear to deliver significantly different results in cases of complex compositions. The local recovery is identified as the fastest level of recovery. On the other hand, the total recovery returns the best solution concerning preference degradation. Additionally, the partial recovery represents the most expensive portion of the recovery method and delivers an intermediate preference degradation. The results suggest that the execution of the partial level of recovery is not justified by the time cost it demands. This finding may motivate a potential optimization of the recovery method.

Chapter 6

Conclusions

In this dissertation, we have proposed a fault-recovery mechanism for Web service compositions. The approach was based on the substitution of the service that originates the failure. The recovering service is expected to meet compatibility restrictions imposed by the interaction with the other services included in the composition. To address such a challenge, we adapted rewriting techniques used during the refinement phase of the composition development. The service modeling and the compatibility verification executed during the rewriting process are relevant contributions to the search for alternative services for healing the composition.

Chapter 2 presented the related work in the area of fault recovery for service compositions. In that chapter, we showed that most of the works define a complete platform capable of detecting and reacting to faults automatically. Other works focus on specific issues regarding the substitution of failed services. Some of those works deal with single aspects of the problem, such as the organization of a search space of available services and the processing of user preferences over the recovery candidates. Our work does not include the specification of a mechanism for monitoring faults. The proposed mechanism is based on the outcomes of the rewriting process. The recovery mechanism is supposed to help a self-healing platform capable of detecting faults and triggering reactions to the problem. The proposed recovery method presented solutions for most of the problems that some of the related works address separately.

Chapter 3 presented the theoretical background of the present thesis. This chapter included basic notions of the Service-Oriented Architecture (SOA), Web service composition rewriting, and the classification of faults and recovery actions. In that chapter, we presented the rewriting process on which the proposed recovery mechanism is based. Moreover, a taxonomy of faults was presented as a summary of previous taxonomies proposed for the context of Web service compositions. During this study, we learned that recovery actions that entail the replacement of failed services are suitable for solving all classified faults.

Chapter 4 specified the proposed recovery mechanism. In this chapter, we adapted some of the concepts considered for the rewriting algorithm POTI [1] to the provision of alternative services. We also formalized the workflow of the recovery mechanism and its levels. Local, Partial and Total recovery represent different levels of intervention on the failed service composition. In a local recovery, only the failed service is replaced by one or more candidates that recover the corresponding functionalities. A recovery at the partial level considers the substitution of the failed service and all the services of the composition that are yet to be executed. Finally, the total recovery requires the complete recomposition of the original business process.
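The three levels differ in which services are eligible for substitution, which can be illustrated as follows. This is a simplified sketch, not the formalization given in Chapter 4: it assumes a composition that can be represented as a list of services in execution order, and the function name `substitution_scope` and the service names in the usage note are hypothetical.

```python
from typing import List


def substitution_scope(composition: List[str], failed_index: int, level: str) -> List[str]:
    """Return the services that a recovery level may replace.

    composition lists the services in execution order; failed_index is the
    position of the failed service.
    """
    if not 0 <= failed_index < len(composition):
        raise IndexError("failed_index is outside the composition")
    if level == "local":
        return [composition[failed_index]]
    if level == "partial":
        # The failed service plus every service still to be executed.
        return composition[failed_index:]
    if level == "total":
        return list(composition)
    raise ValueError(f"unknown recovery level: {level}")
```

For a composition `["s1", "s2", "s3", "s4"]` with a failure at `s2`, the local scope is `["s2"]`, the partial scope is `["s2", "s3", "s4"]` and the total scope is the whole list. The partial scope shrinks as the failure moves toward the end of the composition, which is consistent with the locality results in Section 5.5.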

Chapter 5 described the experiment developed to explore the recovery mechanism. This experiment included (i) the design of synthetic services and composition specifications; (ii) the rewriting of those composition specifications; and (iii) the simulation of faults in the service compositions produced. The results of the experiment enabled the analysis of the time cost demanded by the levels of recovery, the preference degradation resulting from the replacement of a service and the effects of the locality of faults. From those studies, we reached some conclusions on the functioning of the proposed mechanism with compositions of different sizes.

For smaller compositions, the results did not show relevant differences in the time cost of the recovery levels considered. For more complex compositions, the time spent on total recovery considerably exceeds the average time needed to execute local and partial recoveries. However, the results have shown that the total recovery delivers less preference degradation than the other levels for all considered composition sizes. Additionally, the unsuccessful attempt of the partial recovery represented the main cost of the process of achieving the total recovery of the composition. By eliminating the need to try the partial recovery before executing the total recovery, we believe that the recovery method may improve the results by reaching the best solutions faster. Those results also helped us to conclude that the recovery level to be applied can be chosen according to the priorities of the user, either on the quality of the recovering composition or on the recovery cost.

Although the present work accomplished the initially proposed goals, the mechanism developed presents limitations that must be considered. The recovery mechanism is supposed to deal with failed services that do not harm the state of the business process. The mechanism assumes that the failed service does not alter the world and that the execution of compensation actions is not necessary to achieve the recovery. Nevertheless, the proposed mechanism can be adapted to consider the execution of compensation services in case of failure in world-altering services. The compensation services would be executed before the implementation of a replacement, which changes the proposed workflow of the recovery mechanism.

Finally, the use of compensation actions and the adoption of different combinations of recovery levels can be managed by a fault recovery platform that is responsible for identifying the failure and smartly reacting to it. This external entity may include an artificial intelligence capable of choosing the best recovery workflow for a given failure based on the history of attempts and the current characteristics of the recovery scenario. For example, the recovery platform could be specified to ignore failures that occur in services that are not essential to the primary goal of the composition. The specification of such a smart recovery platform is considered potential future work for the present dissertation.

Bibliography

[1] BA, C. et al. Experiments on service composition refinement on the basis of preference-driven recommendation. International Journal of Web and Grid Services, Inderscience Publishers (IEL), v. 12, n. 2, p. 182-214, 2016.

[2] CANFORA, G. et al. A framework for qos-aware binding and re-binding of composite web services. Journal of Systems and Software, Elsevier, v. 81, n. 10, p. 1754-1769, 2008.

[3] CHEN, Z. et al. Ux-an architecture providing qos-aware and federated support for uddi. In: IEEE. International Conference on Web Services. Las Vegas, Nevada, USA, 2003.

[4] BERBNER, R. et al. Heuristics for qos-aware web service composition. In: IEEE. International Conference on Web Services. Chicago, IL, USA, 2006. p. 72-82.

[5] JATOTH, C.; GANGADHARAN, G.; BUYYA, R. Computational intelligence based qos-aware web service composition: A systematic literature review. IEEE Transactions on Services Computing, IEEE, v. 10, n. 3, p. 475-492, 2017.

[6] LI, G. et al. A fault-tolerant framework for qos-aware web service composition via case-based reasoning. International Journal of Web and Grid Services, Inderscience Publishers Ltd, v. 10, n. 1, p. 80-99, 2014.

[7] LIU, A. et al. Facts: A framework for fault-tolerant composition of transactional web services. IEEE Transactions on Services Computing, IEEE, v. 3, n. 1, p. 46-59, 2010.

[8] SHENG, Q. Z. et al. Web services composition: A decade's overview. Information Sciences, Elsevier, v. 280, p. 218-238, 2014.

[9] YU, Q. et al. Deploying and managing web services: issues, solutions, and directions. The VLDB Journal - The International Journal on Very Large Data Bases, Springer-Verlag New York, Inc., v. 17, n. 3, p. 537-572, 2008.

[10] PAPAZOGLOU, M. P. et al. Service-oriented computing: State of the art and research challenges. Computer, IEEE, v. 40, n. 11, 2007.

[11] BARHAMGI, M.; BENSLIMANE, D.; MEDJAHED, B. A query rewriting approach for web service composition. IEEE Transactions on Services Computing, IEEE, v. 3, n. 3, p. 206-222, 2010.

[12] ZHAO, W.; LIU, C.; CHEN, J. Automatic composition of information-providing web services based on query rewriting. Science China Information Sciences, Springer, v. 55, n. 11, p. 2428-2444, 2012.

[13] MESMOUDI, A.; MRISSA, M.; HACID, M.-S. Combining configuration and query rewriting for web service composition. In: IEEE. 2011 IEEE International Conference on Web Services. Washington, DC, USA, 2011. p. 113-120.


[14] COSTA, U. S. et al. Automatic refinement of service compositions. In: SPRINGER. International Conference on Web Engineering. Aalborg, Denmark, 2013. p. 400-407.

[15] AVIZIENIS, A.; LAPRIE, J.-C.; RANDELL, B. Fundamental concepts of computer system dependability. In: Workshop on Robot Dependability: Technological Challenge of Dependable Robots in Human Environments. [S.l.]: Citeseer, 2001. p. 1-16.

[16] CHAN, K. M. et al. A fault taxonomy for web service composition. In: SPRINGER. International Conference on Service-Oriented Computing. Vienna, 2007. p. 363-375.

[17] BRUNING, S.; WEISSLEDER, S.; MALEK, M. A fault taxonomy for service-oriented architecture. In: IEEE. 10th IEEE High Assurance Systems Engineering Symposium. [S.l.], 2007. p. 367-368.

[18] IMMONEN, A.; PAKKALA, D. A survey of methods and approaches for reliable dynamic service compositions. Service Oriented Computing and Applications, Springer, v. 8, n. 2, p. 129-158, 2014.

[19] SANTHANAM, G. R.; BASU, S.; HONAVAR, V. Web service substitution based on preferences over non-functional attributes. In: IEEE. Services Computing, 2009. SCC'09. IEEE International Conference on. [S.l.], 2009. p. 210-217.

[20] KOOPMAN, P. Elements of the self-healing system problem space. p. 31, 2003.

[21] AROCHA, R. E. A. An approach for self-healing transactional composite services. Thesis (PhD) - Université Paris Dauphine-Paris IX, 2015.

[22] SUBRAMANIAN, S. et al. On the enhancement of bpel engines for self-healing composite web services. In: IEEE. International Symposium on Applications and the Internet. Turku, Finland, 2008. p. 33-39.

[23] BARESI, L.; GUINEA, S. Dynamo and self-healing bpel compositions. In: IEEE COMPUTER SOCIETY. Companion to the proceedings of the 29th International Conference on Software Engineering. Washington, DC, USA, 2007. p. 69-70.

[24] CHARFI, A.; MEZINI, M. Ao4bpel: An aspect-oriented extension to bpel. World wide web, Springer, v. 10, n. 3, p. 309-344, 2007.

[25] BARESI, L.; GHEZZI, C.; GUINEA, S. Towards self-healing composition of services. In: Contributions to ubiquitous computing. [S.l.]: Springer, 2007. p. 27-46.

[26] VIZCARRONDO, J. et al. The component of knowledge representation of armiscom for the self-healing in web services composition. Latin American Journal of Computing Faculty of Systems Engineering Escuela Politécnica Nacional Quito-Ecuador, v. 3, n. 2, p. 14, 2016.

[27] NABUCO, O. et al. Model-based qos-enabled self-healing web services. In: IEEE. 19th International Workshop on Database and Expert Systems Application. Turin, Italy, 2008. p. 711-715.

[28] AL-HELAL, H.; GAMBLE, R. Introducing replaceability into web service composition. IEEE Transactions on Services Computing, IEEE, v. 7, n. 2, p. 198-209, 2014.

[29] POONGUZHALI, S.; SUNITHA, R.; AGHILA, G. Self-healing in dynamic web service composition. International Journal on Computer Science and Engineering, Citeseer, v. 3, n. 5, p. 2054-2060, 2011.


[30] ALSEDRANI, A.; TOUIR, A. Clustering based service selection for dynamic service composition. International journal of Web & Semantic Technology, Academy and Industry Research Collaboration Center (AIRCC), v. 8, n. 2, p. 1-14, apr 2017.

[31] PENTA, M. D. et al. Ws binder: a framework to enable dynamic binding of composite web services. In: ACM. Proceedings of the International workshop on Service-oriented software engineering. Shanghai, China, 2006. p. 74-80.

[32] HALIMA, R. B.; DRIRA, K.; JMAIEL, M. A qos-oriented reconfigurable middleware for self-healing web services. In: IEEE. IEEE International Conference on Web Services. Beijing, China, 2008. p. 104-111.

[33] CARDINALE, Y.; RUKOZ, M. A framework for reliable execution of transactional composite web services. In: ACM. Proceedings of the International Conference on Management of Emergent Digital Ecosystems. San Francisco, CA, USA, 2011. p. 129-136.

[34] YIN, Y. et al. A self-healing composite web service model. In: IEEE. IEEE Asia-Pacific Services Computing Conference. Biopolis, Singapore, 2009. p. 307-312.

[35] RAINES, G. Cloud computing and soa. MITRE, white paper, Oct, 2009.

[36] PATEL, S.; SHAH, T. R. A survey on issues and challenges of web service development, composition, discovery. Journal of Science and Technology (ISSN: 0975-5446), VNSGU, v. 5, n. 1, 2016.

[37] BOX, D. Simple object access protocol (soap) 1.1, ver. 1.1. http://www.w3.org/TR/2000/NOTE-SOAP-20000508/, 2000.

[38] FIELDING, R. T. Architectural Styles and the Design of Network-based Software Architectures. Thesis (PhD) - UNIVERSITY OF CALIFORNIA, IRVINE, California, 2000.

[39] LEMOS, A. L.; DANIEL, F.; BENATALLAH, B. Web service composition: a survey of techniques and tools. ACM Computing Surveys (CSUR), ACM, v. 48, n. 3, p. 33, 2016.

[40] ELHAG, A. A. M.; MOHAMAD, R. Metrics for evaluating the quality of service-oriented design. In: IEEE. 8th Malaysian Software Engineering Conference (MySEC). Langkawi, Malaysia, 2014. p. 154-159.

[41] ORRIËNS, B.; YANG, J.; PAPAZOGLOU, M. Model driven service composition. Service-Oriented Computing-ICSOC 2003, Springer, p. 75-90, 2003.

[42] ANDREWS, T. Business process execution language for web services (bpel4ws) version 1.1. http://www-106.ibm.com/developerworks/library/ws-bpel/, 2003.

[43] BUSHEHRIAN, O.; ZARE, S.; RAD, N. K. A workflow-based failure recovery in web services composition. Journal of Software Engineering and Applications, Scientific Research Publishing, v. 5, n. 02, p. 89, 2012.

[44] CASADO, R.; YOUNAS, M.; TUYA, J. A generic framework for testing the web services transactions. In: Advanced Web Services. [S.l.]: Springer, 2014. p. 29-49.

[45] KIESSLING, W. Preference queries with sv-semantics. 11th International Conference on Management of Data (COMAD), Bremen, UNK, Germany, v. 5, p. 15-26, 2005.


[46] KIESSLING, W. Foundations of preferences in database systems. In: Proceedings of 28th International Conference on Very Large Data Bases. Hong Kong, China: VLDB, 2002. p. 311-322.

[47] POTTINGER, R.; HALEVY, A. Minicon: A scalable algorithm for answering queries using views. The VLDB Journal - The International Journal on Very Large Data Bases, Springer-Verlag New York, Inc., v. 10, n. 2-3, p. 182-198, 2001.

[48] WANG, M.; BANDARA, K. Y.; PAHL, C. Integrated constraint violation handling for dynamic service composition. In: IEEE. International Conference on Services Computing. Bangalore, India, 2009. p. 168-175.

[49] ARDAGNA, D. et al. Faults and recovery actions for self-healing web services. In: World Wide Web Conf. Edinburgh, Scotland, UK: [s.n.], 2006.

[50] SIMMONDS, J.; BEN-DAVID, S.; CHECHIK, M. Monitoring and recovery for web service applications. Computing, Springer, v. 95, n. 3, p. 223-267, 2013.

[51] ERRADI, A.; MAHESHWARI, P.; TOSIC, V. Recovery policies for enhancing web services reliability. In: IEEE. International Conference on Web Services. [S.l.], 2006. p. 189-196.

[52] BIANCO, P.; LEWIS, G. A.; MERSON, P. Service level agreements in service-oriented architecture environments. Pittsburgh, 2008.

[53] MENASCÉ, D. A. Qos issues in web services. IEEE internet computing, IEEE, v. 6, n. 6, p. 72-75, 2002.

[54] CANFORA, G. et al. An approach for qos-aware service composition based on genetic algorithms. In: ACM. Proceedings of the 7th annual conference on Genetic and evolutionary computation. [S.l.], 2005. p. 1069-1075.

[55] JAEGER, M. C.; ROJEC-GOLDMANN, G.; MUHL, G. Qos aggregation for web service composition using workflow patterns. In: IEEE. International Enterprise distributed object computing conference. Monterey, CA, USA, 2004. p. 149-159.

[56] LINDHOLM, T. et al. The Java virtual machine specification. [S.l.]: Pearson Education, 2014.
