[IEEE 2011 IEEE International Workshop on the Maintenance and Evolution of Service-Oriented and...

8
On Supporting Dynamic Web Service Selection with Histogramming Atousa Pahlevan, Sean Chester, Alex Thomo and Hausi A. M¨ uller Department of Computer Science University of Victoria Email: [email protected], [email protected], [email protected], [email protected] Abstract—To ensure that consumer requests for web services are served successfully and effectively amidst overwhelming options, one must narrow the web service search to only the most qualified, highest-ranked services. However, today, the ranking of services is done only with regards to static attributes or with a snapshot of current values, resulting in low quality search results. To improve user experience, one must consider dynamic quality of service measures and address the practical challenges they incur. In this paper, we propose using histograms and an area- to-right-of-threshold function to handle the fluctuation and absence of attributes values effectively. This permits utilizing well-established techniques for selecting web services, such as skyline and top-k. We also discuss algorithmic considerations to efficiently produce dynamic web service discovery results. I. I NTRODUCTION With the increasing proliferation of web services, selecting the best one for your needs can be a very overwhelming task; so, it is natural to expect some automation of the process. Current mechanisms for this, borrowing primarily from In- formation Retrieval (IR) techniques and based primarily on string similarity, are only of limited use because the service descriptions on which the similarity is calculated are short and often ambiguous or syntactically erroneous. A promising alternative is to evaluate the suitability of each service with quality of service (QoS) parameters. This is quite challenging in practice. However, because QoS attributes can be difficult to measure, fluctuate, are context-sensitive, and depend on environmental factors such as network availability. As an example, consider trying to select among four possible telescope web services such as those given in Table I. An important QoS attribute specific to the domain of telescope services is the visibility of the sky, which varies by day (sometimes wildly as in the case of T3) depending on the weather and season, and would likely be measured by a sep- arate data source, such as AccuWeather. 1 Another important QoS attribute for telescopes, one of varying interest to different users (i.e., sensitive to user context), is composition: the number of objects captured in a photograph. On certain days, particularly those in which the availability of the telescope is minimal due to, say, maintenance, it may be impossible to determine a value for this attribute because no photographs may have been taken, or because the quality of the photographs may be too poor to reliably calculate their composition. 1 http://www.accuweather.com The examples highlight why intelligent evaluation based on dynamic attributes is necessary. Consider the consequences if one ignores that the parameter values fluctuate and merely assumes that all functionally-equivalent services are equally TABLE I AN EXAMPLE SET OF TELESCOPE WEB SERVICES WITH VALUES FOR SEVERAL DYNAMIC QOS PARAMETERS, MEASURED ON FOUR CONSECUTIVE DAYS.THE CURRENT VALUE ON DAY FIVE IS UNKNOWN. Days Telescope Parameters 1 2 3 4 5 T1 availability(%) 20 40 50 60 ? contrast 8 9 8 5 ? dayLight 8 9 5 ? composition 8 5 ? latency(ms) 6 2 3 ? functional y y y y ? weather/visibility(%) 70 90 90 80 ? geographical situation 8 8 4 5 ? T2 availability 85 93 65 89 ? contrast 8 9 4 5 ? dayLight 8 9 5 5 ? composition 8 9 9 5 ? latency 9 8 9 ? functional y y y y ? weather/visibility 90 80 80 90 ? geographical situation 8 6 8 ? T3 availability 80 70 70 40 ? contrast 8 9 4 5 ? dayLight 8 9 4 5 ? composition 8 9 5 ? latency 6 7 6 8 ? functional y y y y ? weather/visibility 10 10 70 90 ? geographical situation 1 1 3 7 ? T4 availability 30 60 30 60 ? contrast 8 9 4 1 ? dayLight 8 9 1 ? composition 8 3 ? latency 5 7 4 1 ? functional y y y y ? weather/visibility 70 70 50 ? geographical situation 8 9 9 9 ? ,(((

Transcript of [IEEE 2011 IEEE International Workshop on the Maintenance and Evolution of Service-Oriented and...

On Supporting Dynamic Web Service Selectionwith Histogramming

Atousa Pahlevan, Sean Chester, Alex Thomo and Hausi A. MullerDepartment of Computer Science

University of Victoria

Email: [email protected], [email protected], [email protected], [email protected]

Abstract—To ensure that consumer requests for web servicesare served successfully and effectively amidst overwhelmingoptions, one must narrow the web service search to only the mostqualified, highest-ranked services. However, today, the rankingof services is done only with regards to static attributes or with asnapshot of current values, resulting in low quality search results.To improve user experience, one must consider dynamic qualityof service measures and address the practical challenges theyincur.

In this paper, we propose using histograms and an area-to-right-of-threshold function to handle the fluctuation andabsence of attributes values effectively. This permits utilizingwell-established techniques for selecting web services, such asskyline and top-k. We also discuss algorithmic considerations toefficiently produce dynamic web service discovery results.

I. INTRODUCTION

With the increasing proliferation of web services, selecting

the best one for your needs can be a very overwhelming task;

so, it is natural to expect some automation of the process.

Current mechanisms for this, borrowing primarily from In-

formation Retrieval (IR) techniques and based primarily on

string similarity, are only of limited use because the service

descriptions on which the similarity is calculated are short and

often ambiguous or syntactically erroneous.

A promising alternative is to evaluate the suitability of each

service with quality of service (QoS) parameters. This is quite

challenging in practice. However, because QoS attributes can

be difficult to measure, fluctuate, are context-sensitive, and

depend on environmental factors such as network availability.

As an example, consider trying to select among four possible

telescope web services such as those given in Table I. An

important QoS attribute specific to the domain of telescope

services is the visibility of the sky, which varies by day

(sometimes wildly as in the case of T3) depending on the

weather and season, and would likely be measured by a sep-

arate data source, such as AccuWeather.1 Another important

QoS attribute for telescopes, one of varying interest to different

users (i.e., sensitive to user context), is composition: the

number of objects captured in a photograph. On certain days,

particularly those in which the availability of the telescope

is minimal due to, say, maintenance, it may be impossible

to determine a value for this attribute because no photographs

may have been taken, or because the quality of the photographs

may be too poor to reliably calculate their composition.

1http://www.accuweather.com

The examples highlight why intelligent evaluation based on

dynamic attributes is necessary. Consider the consequences if

one ignores that the parameter values fluctuate and merely

assumes that all functionally-equivalent services are equally

TABLE IAN EXAMPLE SET OF TELESCOPE WEB SERVICES WITH VALUES FOR

SEVERAL DYNAMIC QOS PARAMETERS, MEASURED ON FOUR

CONSECUTIVE DAYS. THE CURRENT VALUE ON DAY FIVE IS UNKNOWN.

DaysTelescope Parameters 1 2 3 4 5

T1

availability(%) 20 40 50 60 ?

contrast 8 9 8 5 ?

dayLight 8 9 5 ?

composition 8 5 ?

latency(ms) 6 2 3 ?

functional y y y y ?

weather/visibility(%) 70 90 90 80 ?

geographical situation 8 8 4 5 ?

T2

availability 85 93 65 89 ?

contrast 8 9 4 5 ?

dayLight 8 9 5 5 ?

composition 8 9 9 5 ?

latency 9 8 9 ?

functional y y y y ?

weather/visibility 90 80 80 90 ?

geographical situation 8 6 8 ?

T3

availability 80 70 70 40 ?

contrast 8 9 4 5 ?

dayLight 8 9 4 5 ?

composition 8 9 5 ?

latency 6 7 6 8 ?

functional y y y y ?

weather/visibility 10 10 70 90 ?

geographical situation 1 1 3 7 ?

T4

availability 30 60 30 60 ?

contrast 8 9 4 1 ?

dayLight 8 9 1 ?

composition 8 3 ?

latency 5 7 4 1 ?

functional y y y y ?

weather/visibility 70 70 50 ?

geographical situation 8 9 9 9 ?

valuable. The user must manually filter all cases, even extreme

ones such as T2 and T3 in Table I, where one service

outperforms or matches another on every measurement for

every metric. This leads to frustration on the part of the user

and limits the adoption rate of the system.

Despite this need, the literature (cf. Section III) does not

adequately address the problem. Techniques come in two

flavors. They do not consider dynamic attributes at all or they

take a snapshot of the values. In the former case, the real

potential in QoS attributes is squandered. In the latter, services

are misrepresented by temporary fluctuations in QoS values or

harshly penalized because the mechanism for obtaining those

values is unsuccessful. Both cases will inevitably arise in a

practical implementation.

Furthermore, there are two well-established approaches to

producing a subset of well-chosen services. Each with its own

advantages: a user-context-sensitive ranking by aggregation

(cf. Section II-A); and a reduced subset of user-context-

agnostic best choices, a skyline (cf. Section II-B) of web

services. These approaches require reliable, non-null values

for each QoS attribute and, thus, break down in the presence

of dynamic attributes (cf. Section II-C).

Hence, to satisfy the user need for timely, well-chosen web

services, new ideas are needed. That is precisely what this

paper offers. We present a novel histogram-based technique

for establishing reliable values for QoS parameters that can

withstand temporary fluctuations (regardless how much) and

null values. We show how the aggregation and skyline tech-

niques can be restored using these histograms so that real,

practical implementations can properly meet user needs.

This paper makes several contributions to the area of

dynamic web service discovery. (1) We introduce a novel

histogram-based approach to the challenging dynamic web

service selection problem that can handle the salient chal-

lenges of dynamic attributes that cripple the search quality of

existing methods (cf. IV); (2) We explain how state-of-the-art

algorithms for aggregation querying falter in this histogram-

setting and indicate the superiority of the more conceptually

simple sequential scan (cf. V); and (3) We indicate how the

results of our research here can be seamlessly integrated within

existing systems, both technically and from the perspective of

user behavior (cf. VI).

In the next section, we provide further background on

the problem, explaining precisely why dynamic web service

selection is difficult. Section III reviews related work on web

service selection, aggregation querying, and skyline. In Sec-

tion IV, we propose our histogram-based technique for reliably

handling dynamic web service selection. Section V details the

algorithmic considerations that go into efficiently supporting

dynamic web service selection when our histogram-based

technique is applied. The penultimate section discusses how

our techniques can be integrated into existing systems. Finally,

we describe future work and conclude in Section VII.

II. BACKGROUND ON AGGREGATION AND SKYLINE

There are two primary approaches to service selection:

aggregation, which seeks to identify the services most ap-

propriate to the given context; and skyline, which seeks to

identify in a context-agnostic manner the best services. Both,

unfortunately, encounter practical difficulties in the presence

of dynamic attributes. The remainder of this section describes

both approaches and the cause of those difficulties.

A. Aggregation

An aggregation (a.k.a. top-k or ranked) query ranks each

tuple of a database relation by combining the attributes with

a single, aggregate scoring function and then reports the

highest scored tuples (either the k highest ranked tuples or

those whose score exceed a predetermined threshold). In the

case of web services with numeric attributes s1, . . . , sm, the

rank is determined as follows. First, the user context q is

modeled by weights for each QoS parameter, (q1, . . . , qm),where the magnitude of each qi reflects the importance of

the i’th parameter within the context. Then, the context-aware

score for a service is determined as∑m

i=1 si ∗ qi. Then, the

rank of a service sj is the number of other services with a

higher score than that of sj . This offers the advantage of being

personalized for user context, because the context is built into

the ranking mechanism.

Section III discusses existing algorithms for computing the

top-k web services, given a user context modeled as outlined

above.

B. Skyline

A disadvantage of the aforementioned aggregation query is

that the reduction of m attributes to a single score can often

reduce the accuracy of the retrieval process in multi-criteria

decision-making [22]. Hence, the skyline approach.

The skyline of a relation, in contrast to the top-k, is context-

agnostic, which proffers the advantage of circumventing the

non-trivial task of modeling user context numerically. Con-

ceptually, the skyline is the subset of services which are top-

ranked for some user context.

We denote the numeric attributes of service si as

(si1, . . . , sim). Then, service si is said to dominate another

service sj if it has an equal or better value for every single

attribute, and a strictly better value on at least one of those

attributes (that is, if (∀k, sik ≥ sjk) ∧ (∃l, sil > sjl )). The

skyline, then, is the set of all services that are not dominated

by any other service. Unlike aggregation, this produces a set

of incomparable services, rather than a total ordering and,

unlike aggregation, the cardinality of the result set is entirely

data-dependent as a result of being context-agnostic.

C. Practical Challenges of Dynamic Service Discovery

Some practical challenges arise when implementing dy-

namic service discovery because the attributes are, well, dy-

namic. Ignoring these challenges comes at a definite cost,

where a service best suited to a user context can be penalized

because the evaluation mechanism is ill-defined. Using our

telescope example we now show how these challenges arise

and specifically why they prohibit the application of aggrega-

tion and skyline.

The Telescope Example Expanded: There are hundreds of

telescope web services available around the world. But to a

user interested in photographing a particular star, only a subset

of these services are of particular use depending on several

functional parameters. Telescopes may not be oriented towards

the star at all. With others, the user may not have a service-

level agreement (SLA) that permits her to use that telescope.

By considering all such parameters, one can reduce the set of

telescopes to a small subset which are functionally equivalent.This is referred to as static, functional selection—capturing a

precise values at an exact moment in time. After functional

selection, the elements of the subset need to be ranked and/or

further pruned with respect to their dynamic, quantitative QoS

attributes. The values of all these attributes fluctuate over

time, perhaps even wildly, which is what complicates the

matter of selecting the web service most likely to deliver the

highest quality photographs. The SDDS knowledge base (cf.

Section III) maintains an expert-derived threshold value for

each attribute in each domain as shown in the third column

of Table II, to ensure a minimum quality of service. But this

approach still suffers from several practical problems.

Because the values are captured and/or computed at run

time, they may not always be available, perhaps because

they did not arrive at the system on time, or were lost due

to network problems. Consider if AccuWeather is off-line,

contrast is not announced by the service provider, or the sensor

that detects values is affected by a power outage. In none of

the scenarios should the telescope necessarily be discounted

from consideration, but neither a skyline nor an aggregation

function is well-defined over these null values.

Null values are not the only concern. Consider visibility.

On even a superb day, there can still be patchy cloud cover.

A particular measurement of visibility, one that captures the

precise value at an exact moment in time, might conclude that

the visibility is poor, when in fact it is only a momentary lapse

before the cloud passes over. Another issue is latency. Perhaps

the latency for a particular telescope is generally poor, but

the service has just been restarted and so there are few users

connected as yet. A measurement at this time will overestimate

the quality of the service.

D. The Research Problem

Ideally, one could apply both well-established selection

approaches—aggregation and skyline—to dynamic web ser-

vice selection; however, the challenges detailed above pre-

vent this by obfuscating the relative QoS among services.

The definition of skyline (or, more precisely, of dominance)

requires a well-defined ternary, ordinal relationship (in other

words, (<,=, >) must all be well-defined for the attribute

values of any two services). Null values especially disrupt this,

because they are incomparable to non-null values. Snapshot

values distort this, because they cause the ordinal relationship

among services to fluctuate wildly. Thus, we need to restore

the ternary relationship somehow when null and fluctuating

values arise.

For aggregation, the problem is a bit trickier, because the

scoring function relies on attributes that are real-valued, not

just ordinal. Thus, it is important that the difference between

3 and 1 is larger than the difference between 3 and 2, for

example, because this affects the eventual aggregate score.

Naturally, any technique that supports the case of aggre-

gation can be used to re-establish the concept of dominance,

because the ordinal operators (<,=, >) can be applied to any

real-valued attribute. Thus, the goal is to derive a reasonable

value for null attributes to fairly assess the magnitude by

which the values, if known, would differ from the other known

values. By so doing, both skyline and aggregation can be

applied and the literature on both topics can be selectively

applied.

There is another particularly salient characteristic of dy-

namic web service selection as it relates to skyline and

aggregation. As a first step the choices of web services must

be narrowed to those functionally equivalent and functionally

appropriate. This step must be done first; otherwise, it might

happen that none of the top-k (or, equivalently, none of

the skyline) services match the functional requirements. The

consequence of this first selection is that neither an index

nor a pre-sort-based algorithm—the principle techniques in

the literature discussed in Section III—can effectively resolve

the aggregation or skyline component of the query because

the data structures are designed to query the entire dataset

efficiently, not the particular, ad-hoc subset of it. This charac-

teristic establishes a streaming scenario, wherein the results

of the functional selection are piped into the aggregation

(equivalent to skyline) selection.

As such, an important two-part research problem naturally

arises:

How can one establish a reasonable replacement for nulland instantaneous values in order to apply aggregation andskyline to a set of services—and, given that such replacementscan be produced, which streaming algorithms for aggregationand skyline can best handle being piped from an earlierfunctional selection?

III. RELATED WORK

In this section, we review related work spanning a few

domains. In particular, we describe efforts to support web

service discovery in Section III-A, especially relevant literature

on aggregation querying in Section III-B, and skyline research,

paying particular attention to papers that assume a streaming

setting in Section III-C.

A. Web Service Discovery

The ranking of services by QoS properties can be done in

three ways [12], [24]: (1) with regards to only static attributes;

(2) by taking a snapshot of the current values [22], [6]; or (3)

by deriving a reasonable estimate for the values. With regards

to the first option, considering only static attributes squanders

latent potential to improve the user experience. Returning to

TABLE IIEXAMPLE: SDDS THRESHOLD KNOWLEDGE BASE

Parameters Threshold

availability 80contrast 8dayLight 8

composition 6Photography of star latency 8by telescope domain functional y

visibility 90geographical situation 9

our example in Table I, consider the consequences of capturing

a measure of the availability of telescope service T4 on Day

1 and using that information to evaluate the service on Day

4, when the value has changed substantially. Nonetheless, the

work done here is still of importance, because it introduced

the idea of using aggregation querying in the context of web

service discovery.

Regarding the second option, capturing instantaneous mea-

surements of QoS properties rather ignores the practical issues

that arise naturally when using dynamic attributes. For exam-

ple, on Day 3, the latency measurement for T2 is null, but one

should be lenient because historically the value has been quite

high.

An especially important paper is that of Skoutas et al. [22],

who introduce the concept of dominance and skyline into

web service discovery. In their approach, they examine every

subset of the QoS parameters and determine in how many

of those subsets a given service is dominated. The services

can then be ranked in ascending order of score. In this

way, they combine the notions of skyline and aggregation

without the need to model user context. They also present

algorithms—TKDD, TKDg, TKM—to determine the top-k web

services based on the three dominant scores. They improve the

performance by having boundaries for each object, consisting

of the lowest and highest values of the dominating scores.

Therefore, instances (attributes of service) are maintained in

three lists: maximum, minimum, and actual values of instances

for a service. This approach provides interesting research in

static service discovery where values of attributes are available,

but it cannot support missing values.

SDDS (Static Discovery and Dynamic Selection) by Pahle-

van and Muller is a framework to take static and dynamic

attributes into account for web service discovery [20], [19],

[18]. Table I shows an example with SDDS-determined at-

tributes enumerated in Table II, identifying seven example QoS

attributes. These tables show how to calculate the values for

these attributes in a variety of ways; for example, the visibility

is obtained from weather services such as AccuWeather, and

latency is calculated using the SDDS framework. Geolocation

information is provided by experts and stored in the knowledge

base. Other attributes that refer to context, such as environmen-

tal conditions or user preferences, can be obtained via sensors.

Hadad et al. [6] proposed an algorithm—TQoS-driven

selection—that is embedded in the transactional service selec-

tion for web service composition. The algorithm is basically

designed for the local QoS optimization—that is, the selection

of the qualifying services for each activity that fulfills the

transactional requirements. The algorithm’s input is a work-

flow that includes a set of activities to fulfill a request. The

output is a set of the services for each activity in the workflow

with similar transactional properties but dissimilar nonfunc-

tional properties. The TQoS selection uses user preferences

as a weight assigned to each QoS criterion. Then the web

service is scored through a conventional scoring function to

rank services and find the one with the maximum score. If

there are several services with the same score, one of them is

chosen randomly. Similar techniques are applied to select the

maximum score path for web service composition.

Dustdar et al. [13], [15] proposed the QoS-aware VRESCO

run-time environment which was designed for dynamic bind-

ing, invocation and mediation. The functional attributes are

described in the service meta data model. These QoS attributes

can be either specified manually using a management service,

or measured automatically, and integrated into VRESCO. For

each service, a schedule can be defined that specifies when the

monitor should trigger the measurement. Then, the individual

QoS measurement is published as QoS event into the run

time. The average QoS values can be aggregated based on

the information stored in the QoS events. Unfortunately, these

approach cannot support fluctuating or unavailable values. The

challenge for this algorithm, as for many others, is capturing

and computing dynamic attributes at query time.

B. Aggregation Querying

To determine, given a user context, the k top services, there

is already a significant body of work on this topic [11]. Here

we focus on the two most relevant approaches. One is to use a

traditional scan-based algorithm to process all candidate tuples

in the database, calculate their scores, and return the k tuples

with the highest grades. This involves evaluating every service

for every query.

The better alternative is the Threshold Algorithm (TA)introduced by Fagin et al. [7], which improves on the scan-

based alternative by detecting opportunities to terminate earlier

(i.e., before scanning every tuple). The trick is to exploit the

fact that each attribute’s data source can often be provided in

sorted order and thus one can determine when no further tuples

can score higher than the ones already seen. The idea is refined

further by Nepal and Ramakrishna [17] and Guntzer et al. [8]

to handle data that is homogeneous with inherently fuzzy

attributes. Several extensions to TA have been proposed to

decrease access costs and handle data access in situations when

certain sources cannot provide sorted access [2].

However, a bigger hurdle for TA in dynamic web service

selection settings is that it is ill-defined for the inevitable occa-

sion where null values occur, because sorting is obfuscated by

nulls. One potential alleviation to the null value problem is the

probabilistic top-k query, introduced by Cheng [4], in which

each attribute is modeled as a range of possible values from

a probability density function (pdf). The motivating example

for this research is the modeling of moving objects, where at

any point in time, the actual location is within a certain range,

d, of its last reported location value. If the actual location

changes, then the sensor reports its new location value to the

database and changes d accordingly. The pdf can be an equal

chance of locating the object anywhere in the range of d, and

it can be estimated with a time-series analysis [3]. However,

this approach cannot be applied to our dynamic web service

selection setting because it depends on the construction of an

R-Tree index. As mentioned in Section II-D, this result cannot

be applied for the dynamic web service selection, because the

set of web services will already have been subjected to an

earlier, functional selection, which renders any index useless.

For several applications, it is only necessary to know

whether the probability exceeds a given threshold. Accord-

ingly, a Probabilistic Threshold Query (PTQ) is a variant of a

probabilistic query where only answers with probability values

over a certain threshold are returned [5]. These methods are

more expensive to evaluate than their traditional counterparts

due to the calculated probabilities that accompany the answers.

Several methods to improve the efficiency and reduce the cost

of such queries have been explored. For example, to speed up

the otherwise inefficient use of an interval index to answer the

PTQ, they improve the R-tree by expanding the probabilistic

query to its internal nodes to avoid investigating the contents

of a node. They avoid computing the probability values of

those intervals which cannot satisfy the query [5]. Several

drawbacks characterize the above-mentioned approaches, TA

and its extensions, in service search and discovery.

C. Skyline

Borzonyi et al. [1] introduced Skyline with the canonical

example of a hotel reservation system, where the skyline are

all hotels not worse than any other hotel in terms of both the

distance to the beach and the price per night. However, the

skyline has also drawbacks. First, the cardinality of the skyline

operator can not be controlled—it may include every tuple in

the worst case and, more often than not, close to the worst

case if attributes are uncorrelated. This is especially likely

in domains with more QoS parameters, because the skyline

cardinality is subject to the curse of dimensionality [9]. Addi-

tionally, the traditional skyline operator is designed for static

dimensions such as distance and price [10], wherein dynamic

service discovery is dealing with attributes that fluctuate over

time.

To take advantage of the promise proffered by the dom-

inance concept in multi-criteria service discovery, several

refinements need be performed to align it with dynamic service

selection. Xia et al. [23] presented the concept of ε-skyline

to control the size of the skyline, by ranking the skyline

result, and finally reflecting the user preferences at different

dimensions. They assign weights to dimensions to rank the

skyline results. Here, user preferences are involved to rank and

control the size of the skyline. However, one of our objectives

in studying skyline in service discovery is to offer a context-

agnostic option to overcome the problem of modeling user

context.

Another technique to address the curse of dimensionality

is k-dominance, which relaxes the idea of dominance to k-

dominance. Huang et al. [9] proposed algorithms, such as δ-

skyline employing the concept of k-dominance to rank results

without requiring user context. The definition itself suggests

its efficacy in controlling the cardinality of the skyline—that

is precisely what the user-defined δ indicates.

Papadias et al. introduced another extension of the skyline

query, the top-k dominating query [21]. It combines the

advantages of top-k and skyline queries to control the size of

the output independent of the dimension scales, and without

having any ranking functions. Yiu et al. [14] conducted a study

with algorithms on top-k dominating queries and proposed to

sum the number of points it dominates from all subspaces.

The other aforementioned concern regarding the use of

skyline in dynamic service selection, that of traditionally sup-

porting only static datasets, has also been rectified. Continuousskyline query processing handles dynamic data sets that arise

in domains such as location-based services. Instead of snapshot

queries, continuous queries require continuous evaluation as

the query results vary with the datasets. In terms of location-

based services, the movement of a service object changes the

skyline according to the service’s current location. In real-time

applications, these situations become more complex due to the

movement of all of the query points.

In dynamic service discovery, the services and also the

dynamic attributes are constantly added, removed, or changed.

Our dynamic repository in SDDS is updated periodically and

again at query time, verifying whether the registered services

are still qualified or not by checking their dynamic attributes.

Thus, we regularly need to compute a skyline for the data

points that are valid at query time. Huang et al. [10] proposed

the first work on continuous skyline queries in the mobile

environment. Here, continuous query processing is conducted

for users by updating the skyline instead of computing a new

query from scratch. The possible change from one time to

another is predicted and processed accordingly. This method

uses a type of data structure that is based on the analysis

and exploits the spatial-temporal coherence of the problem.

Morse et al. present the continuous time interval skyline

operator, which continuously computes the current skyline

over a data stream [16]. They present an algorithm called

LookOut for evaluating such queries. Each data point in the

dataset is associated with an interval of time for which it is

valid. The interval consists of the arrival time of the point and

an expiration time for the point. If the service is dominated,

no changes are made to the skyline. If any of the services in

skyline are dominated by a new service, the new service is

added to skyline.

IV. ON THE SELECTION OF SERVICES

Techniques for “best” tuple selection have been extensively

researched in the database community. In this section, we

detail some of them, indicate how the aforementioned dynamic

attribute challenges manifest themselves in those techniques,

and offer a sound solution based on histogramming.

A. Manifestation of Aforementioned Challenges

Both aggregation and skyline encounter practical issues

in the presence of dynamic attributes, which render them

unusable.

Null Values: Null values are perhaps the most debilitating

because they introduce increased incomparability. The sum-

mation on which aggregation depends is ill-defined with null

values. The biggest drawback of skyline is that it often is

not sufficiently selective, especially as the dimensionality in-

creases, and introducing more incomparability via null values

exacerbates this problem. A conceivable approach is to replace

null values with zeros or estimates, but zeroing values can

be unnecessarily harsh, especially when the null arises from

network issues with an external data source.

Temporary Fluctuations: Temporary fluctuations and un-

reliable measurements can significantly affect the skyline. A

service that is consistently part of the skyline because of a

typically maximal value on one attribute can be dropped at

run-time. Consider again the scenario of a momentary cloud

cover at an otherwise consistently high-visibility area.

Dynamic Context Sensitivity: The context always affects

the relevancy of dynamic attributes. For instance, availabil-

ity is usually extremely important for quality-driven service

selection. However, the importance of this attribute can by

enervated if the user decides to wait for a higher quality service

that is temporarily unavailable rather than settle for a lower-

quality service that is immediately available.

B. Histogramming and ART ()

The above issues can be addressed quite tidily using his-

togramming techniques. Rather than taking an instantaneous

measurement of each attribute, we collect the ν most recent

measurements in a histogram. In this case, null values only

become a concern when all of the last ν measurements are

null and rapid fluctuation is smoothened.

Neither aggregation nor skyline are defined for histograms

(i.e., collections of measurements), but a natural approach

is to define a function on a histogram to convert it into a

sensible value. We define the ART () function, or area-right-of-threshold function, for precisely that. Given a threshold τand an attribute histogram a, ART (a, τ) is the number of

measurements greater than τ .2

We can formalize both the aggregation and the skyline

problems on histograms. The top-k aggregation problem is

to retrieve from the set of services T , given a weight vec-

tor w and a threshold vector �τ both in Rd, the subset

{s = 〈s1, . . . , sd〉 ∈ T : ||{t = 〈t1, . . . , td〉 ∈ T :∑wi ∗ART (ti, τi) >

∑wi ∗ART (si, τi)}|| ≤ k}. This is

the same as the traditional definition of top-k aggregation,

except that we have replaced each ti with ART (ti, τi), the

value we compute from each histogram.

Similarly, we define the skyline by replacing each ti with

ART (ti, τi). A service s ∈ T dominates another service t ∈2The term ‘area’ in area-right-of-threshold comes from the idea of inte-

grating part of the histogram, when each measurement is of unit area.

T if ∀τART (si, τi) ≥ ART (ti, τi) ∧ ∃i : ART (si, τi) >ART (ti, τi). As a result, the skyline of best web services are

those not dominated by any other functionally equivalent web

service.

About Composition: An important point to make is with

regards to web service composition. In the context of our

example, this could include composing a telescope service

with an image conversion service. Most telescopes produce

images in Flexible Image Transport System (FITS) format,

which then need be resized, converted, and appropriately

compressed for the final destination medium, be it a mobile

phone, a computer, or a database of transforms.

This fits into the computation of the functionally equivalent

services. In the end, our selection procedure, detailed shortly,

treats each service in the composition in isolation and will

either produce the highest ranked service or a randomly chosen

service from a set of best ranked ones.

V. ALGORITHMIC CONSIDERATIONS

It is one thing to propose a solution to a problem; it is quite

another to demonstrate that it is efficient. In this section, we

give case studies to illustrate the most efficient approaches to

solving the variants of top-k aggregation and skyline when

handling dynamic attributes with skyline and ART().

A. Top-k Aggregation

The most renowned—and efficient—method for handling

top-k aggregation queries, especially when attributes arrive

from external data sources (i.e., in a middleware setting) is the

Threshold Algorithm by Fagin et al. However, its efficiency

evades the histogram setting because it relies on sorted access

to each individual attribute. However, in the case of context-

sensitive thresholds, the ART() values change depending on the

choices of thresholds, thus altering the sorted order. Consider

the following example from our telescope table (Table I).

Focusing on the latency attribute, note that if the user

provides a threshold τl = 4, then the ranking by ART()of the services is (T3,T2,T4,T1) with values (4, 4, 3, 1).If the user instead provides a threshold τl = 8, then the

ranking changes to (T2,T4,T3,T1) with values (3, 2, 1, 0).This demonstrates that to use an algorithm dependent on

sorted access to individual attributes, the ART() and histogram

approach may require resorting for every query, which is

impractical.

On the other hand, sequential scan is a very nice choice

since dynamic QoS selection is applied as a second stage of a

pipeline which includes static functional equivalence selection.

Consequently, it can be considered in a streaming setting,

and preprocessing or indexing is rendered ineffective by the

first selection query. Thus, maintaining a buffer of the k best

services seen so far, and streaming the results of the functional

selection will perform well.

B. Streaming Skyline

Originally, the skyline algorithm was intended for static

query points over a static data set, where there are no temporal

data associated with these elements. To handle dynamic data

sets, continuous skyline processing is incorporated to update

the skyline at the query time, though this is impractical when

all the query points are changing. Another challenge arises

because the ART() values change depending on the choice of

thresholds in our histogram setting. However, using the stream

skyline algorithm, the change in ART() values does not depend

on the choice of thresholds. For instance, in our telescope table

(Table I) example, the default threshold value for availability is

defined as 80 in this domain. Based on this threshold, our first

skyline result is (T2,T3,T4,T1). This result is independent

from the day threshold. However, our histogram approaches

require the day threshold with value (1, 2, 3, 4) in order to

compute the skyline. The ART() will be different depending

on what day we choose to take it from. In this case, the

skyline result, if the threshold value is taken from day 3, is

(T2,T3,T1,T4) .

VI. ON SYNTHESIS WITH LITERATURE AND USERS

As mentioned earlier in our related work section, our

proposed approach to handling dynamic attributes is com-

plementary rather than adversarial to existing web service

selection techniques. This is an important point to warrant

further elaboration, because it is requisite to consider how our

research can be seamlessly integrated into existing systems.

That is the focus of this section, wherein we first describe the

compatibility of our techniques with existing literature and

then with existing user behavior.

A. Static Selection

Our definition of top-k aggregation reduces to the traditional

definition if the thresholds to the ART() function are all

set to zero. But it is important for less trivial instances of

compatibility. First, the system is compatible with our SDDS

framework because the thresholds can be assumed to be set by

an expert administrator rather than a user. Although the static

nature of such thresholds presents a missed opportunity for

efficiency, the proposed system here can handle that scenario

well. Second, any static selection mechanism for evaluating

which are functionally equivalent services is compatible with

this approach because it is a prerequisite step of our problem.

Third, existing methods for dynamic attributes that are based

on snapshot measurements are, in fact, special cases of our

problem in which the histogram consists only of the most

recent measurement and the threshold are, again, all zero.

B. User Behavior

Especially when proposing new ideas that, ultimately, are

meant to improve user experience, it is important to consider

the extent to which they require users to change their behavior,

since this influences the quality of experience and both the

likelihood and ease of adoption. Here we refer the reader to,

as an example, the finviz trading site.3 This is a popular web

service selection website that requires users to set thresholds

for various service parameters. This illustrates the readiness

3http://finviz.com/

with which users can adopt a system that requires them to set

thresholds (that can be more or less exact) for a web service

selection engine.

VII. CONCLUSIONS

A large number of web services perform the same function

but in different ways or with different quality, that is, they

have different nonfunctional properties, such as quality of

service properties. This complicates the web service search

and discovery process and gives rise to the need for an effec-

tive, automated, quality-driven service retrieval and dynamic

selection algorithm that considers only a qualified subset of

the services. With our approach, we strive to serve consumers

better and achieve effective service selection from the vast

numbers of services available by narrowing the search to

include only qualified services—the top-k services.

Today, service selection by QoS attributes only considers

static attributes or with a snapshot of current values, resulting

in low-quality, low-accuracy results. To address this challenge,

we focused on capturing snapshot measurements of QoS

attributes at the time of the query, which necessitates con-

sidering dynamic attributes, effectively selecting services, and

confronting the challenges posed by fluctuating and missing

attributes and the task of monitoring and storing these values.

After considering these aspects, we were able to derive a more

effective algorithm without relying on stale dynamic attributes.

Our histogram-based approach deals with these challenges that

hinder other methods, but more complicated algorithms come

up short in this setting compared to the simpler sequential

scan. Our methods and approach both simplify and enhance

the web service search and discovery process. This enhance-

ment is accentuated by the ease with which our approach can

be integrated with existing technology and systems, both for

users and administrators.

ACKNOWLEDGMENTS

This work was funded in part by the National Sciences and

Engineering Research Council (NSERC) of Canada (CRDPJ

320529-04 and CRDPJ 356154-07), IBM Corporation via

the CSER Consortium, and University of Victoria, British

Columbia, Canada.

REFERENCES

[1] S. Borzonyi, D. Kossmann, and K. Stocker, “The Skyline Operator,”in Proc. 17th International Conference on Data Engineering, vol. 3.IEEE, 2001, pp. 421–430.

[2] N. Bruno, L. Gravano, and A. Marian, “Evaluating Top-k Queries overWeb-accessible Databases,” in Proc. 18th International Conference onData Engineering. IEEE Computer Society, 2002, pp. 216–226.

[3] C. Chatfield, The Analysis of Time Series: An Introduction, 6th ed.Chapman and Hall, 1989, vol. 1.

[4] R. Cheng, D. V. Kalashnikov, and S. Prabhakar, “Evaluating ProbabilisticQueries over Imprecise Data,” in Proc. ACM SIGMOD InternationalConference on Management of Data. ACM, 2003, pp. 551–562.

[5] R. Cheng, Y. Xia, S. Prabhakar, R. Shah, and J. S. Vitter, “EfficientIndexing Methods for Probabilistic Threshold Queries over UncertainData,” in Proc. 13th International Conference on Very Large Databases,vol. 30. VLDB Foundation, 2004, pp. 876–887.

[6] J. El Hadad, M. Manouvrier, and M. Rukoz, “TQoS: Transactional andQoS-Aware Selection Algorithm for Automatic Web Service Composi-tion,” Transactions on Services Computing, vol. 3, pp. 73–85, 2010.

[7] R. Fagin, A. Lotem, and M. Naor, “Optimal Aggregation Algorithms forMiddleware,” in Proc. 20th ACM Symposium on Principles of DatabaseSystems. ACM, 2001, pp. 102–113.

[8] U. Guntzer, W.-T. Balke, and W. Kiessling, “Optimizing Multi-featureQueries for Image Databases,” in Proc. 26th International Conferenceon Very Large Databases. VLDB Foundation, 2000, pp. 419–428.

[9] J. Huang, D. Ding, G. Wang, and J. Xin, “Tuning the Cardinality ofSkyline,” in Advanced Web and Network Technologies, and Applications,Y. Ishikawa, J. He, G. Xu, Y. Shi, G. Huang, C. Pang, Q. Zhang, andG. Wang, Eds. Springer-Verlag, 2008, vol. 4977, pp. 220–231.

[10] Z. Huang, H. Lu, B. Ooi, and A. Tung, “Continuous Skyline Queriesfor Moving Objects,” IEEE transactions on knowledge and data engi-neering, vol. 18, pp. 1645–1658, 2006.

[11] I. F. Ilyas, G. Beskales, and M. A. Soliman, “A Survey of Top-kQuery Processing Techniques in Relational Database Systems,” ACMComputing Surveys, vol. 40, October 2008.

[12] K. Kyriakos and D. Dimitris, “Requirements for QoS-Based Web ServiceDescription and Discovery,” IEEE Transactions on Services Computing,vol. 2, pp. 320–337, 2009.

[13] P. Leitner, F. Rosenberg, A. Michlmayr, and S. Dustdar, “Towards aFlexible Mediation Framework for Dynamic Service Invocations,” inProc. of the 6th IEEE European Conference on Web Services (ECOWS),pp. 45–59, 2008.

[14] M. Lung and Y. Nikos, “Multi-dimensional Top-k Dominating Queries,”VLDB Journal, vol. 18, pp. 695–718, 2009.

[15] A. Michlmayr, F. Rosenberg, P. Leitner, and S. Dustdar, “End-to-EndSupport for QoS-Aware Service Selection, Binding, and Mediation inVRESCo,” IEEE Transactions on Services Computing, vol. 3, no. 3, pp.193–205, 2010.

[16] M. Morse, J. Patel, and W. Grosky, “Efficient Continuous SkylineComputation,” Information Sciences, vol. 177, pp. 3411–3437, 2007.

[17] S. Nepal and M. V. Ramakrishna, “Query Processing Issues in Image(multimedia) Databases,” in Proc. 15th International Conference onData Engineering. IEEE, 1999, pp. 22–29.

[18] A. Pahlevan and H. A. Muller, “Static-Discovery Dynamic-Selection(SDDS) Approach to Web Service Discovery,” in Proc. 3th InternationalWorkshop on Service Intelligence and Computing, 2009, pp. 769–772.

[19] ——, “A Dynamic Framework for Quality Web Service Discovery,” inProc. 4th International Workshop on a Research Agenda for Mainte-nance and Evolution of Service-Oriented Systems. SEI Press, 2010, inprint.

[20] ——, “Self-Adaptive Management of Web Service Discovery,” in Proc.PhD Symposium at the 8th IEEE European Conference on Web Services.IEEE Computer Society, 2010, pp. 21–24.

[21] D. Papadias, Y. Tao, G. Fu, and B. Seeger, “Progressive SkylineComputation in Database Systems,” ACM Transactions on DatabaseSystems, vol. 30, pp. 41–82, 2005.

[22] D. Skoutas, D. Sacharidis, A. Simitsis, V. Kantere, and T. Sellis, “Top-kDominant Web Services under Multi-Criteria Matching,” in Proc. 12thInternational Conference on Extending Database Technology, 2009, pp.898–909.

[23] X. Tian, D. Zhang, and Y. Tao, “On Skyline with Flexible DominanceRelation,” in Proc. of 24th International Conference on Data Engineer-ing. IEEE, 2008, pp. 1397–1399.

[24] H.-L. Truong and S. Dustdar, “A Survey on Context-aware Web ServiceSystems,” International Journal of Ad Hoc and Ubiquitous Computing,vol. 5, no. 1, pp. 5–31, 2009.