[IEEE 2011 IEEE International Workshop on the Maintenance and Evolution of Service-Oriented and...
Transcript of [IEEE 2011 IEEE International Workshop on the Maintenance and Evolution of Service-Oriented and...
On Supporting Dynamic Web Service Selectionwith Histogramming
Atousa Pahlevan, Sean Chester, Alex Thomo and Hausi A. MullerDepartment of Computer Science
University of Victoria
Email: [email protected], [email protected], [email protected], [email protected]
Abstract—To ensure that consumer requests for web servicesare served successfully and effectively amidst overwhelmingoptions, one must narrow the web service search to only the mostqualified, highest-ranked services. However, today, the rankingof services is done only with regards to static attributes or with asnapshot of current values, resulting in low quality search results.To improve user experience, one must consider dynamic qualityof service measures and address the practical challenges theyincur.
In this paper, we propose using histograms and an area-to-right-of-threshold function to handle the fluctuation andabsence of attributes values effectively. This permits utilizingwell-established techniques for selecting web services, such asskyline and top-k. We also discuss algorithmic considerations toefficiently produce dynamic web service discovery results.
I. INTRODUCTION
With the increasing proliferation of web services, selecting
the best one for your needs can be a very overwhelming task;
so, it is natural to expect some automation of the process.
Current mechanisms for this, borrowing primarily from In-
formation Retrieval (IR) techniques and based primarily on
string similarity, are only of limited use because the service
descriptions on which the similarity is calculated are short and
often ambiguous or syntactically erroneous.
A promising alternative is to evaluate the suitability of each
service with quality of service (QoS) parameters. This is quite
challenging in practice. However, because QoS attributes can
be difficult to measure, fluctuate, are context-sensitive, and
depend on environmental factors such as network availability.
As an example, consider trying to select among four possible
telescope web services such as those given in Table I. An
important QoS attribute specific to the domain of telescope
services is the visibility of the sky, which varies by day
(sometimes wildly as in the case of T3) depending on the
weather and season, and would likely be measured by a sep-
arate data source, such as AccuWeather.1 Another important
QoS attribute for telescopes, one of varying interest to different
users (i.e., sensitive to user context), is composition: the
number of objects captured in a photograph. On certain days,
particularly those in which the availability of the telescope
is minimal due to, say, maintenance, it may be impossible
to determine a value for this attribute because no photographs
may have been taken, or because the quality of the photographs
may be too poor to reliably calculate their composition.
1http://www.accuweather.com
The examples highlight why intelligent evaluation based on
dynamic attributes is necessary. Consider the consequences if
one ignores that the parameter values fluctuate and merely
assumes that all functionally-equivalent services are equally
TABLE IAN EXAMPLE SET OF TELESCOPE WEB SERVICES WITH VALUES FOR
SEVERAL DYNAMIC QOS PARAMETERS, MEASURED ON FOUR
CONSECUTIVE DAYS. THE CURRENT VALUE ON DAY FIVE IS UNKNOWN.
DaysTelescope Parameters 1 2 3 4 5
T1
availability(%) 20 40 50 60 ?
contrast 8 9 8 5 ?
dayLight 8 9 5 ?
composition 8 5 ?
latency(ms) 6 2 3 ?
functional y y y y ?
weather/visibility(%) 70 90 90 80 ?
geographical situation 8 8 4 5 ?
T2
availability 85 93 65 89 ?
contrast 8 9 4 5 ?
dayLight 8 9 5 5 ?
composition 8 9 9 5 ?
latency 9 8 9 ?
functional y y y y ?
weather/visibility 90 80 80 90 ?
geographical situation 8 6 8 ?
T3
availability 80 70 70 40 ?
contrast 8 9 4 5 ?
dayLight 8 9 4 5 ?
composition 8 9 5 ?
latency 6 7 6 8 ?
functional y y y y ?
weather/visibility 10 10 70 90 ?
geographical situation 1 1 3 7 ?
T4
availability 30 60 30 60 ?
contrast 8 9 4 1 ?
dayLight 8 9 1 ?
composition 8 3 ?
latency 5 7 4 1 ?
functional y y y y ?
weather/visibility 70 70 50 ?
geographical situation 8 9 9 9 ?
valuable. The user must manually filter all cases, even extreme
ones such as T2 and T3 in Table I, where one service
outperforms or matches another on every measurement for
every metric. This leads to frustration on the part of the user
and limits the adoption rate of the system.
Despite this need, the literature (cf. Section III) does not
adequately address the problem. Techniques come in two
flavors. They do not consider dynamic attributes at all or they
take a snapshot of the values. In the former case, the real
potential in QoS attributes is squandered. In the latter, services
are misrepresented by temporary fluctuations in QoS values or
harshly penalized because the mechanism for obtaining those
values is unsuccessful. Both cases will inevitably arise in a
practical implementation.
Furthermore, there are two well-established approaches to
producing a subset of well-chosen services. Each with its own
advantages: a user-context-sensitive ranking by aggregation
(cf. Section II-A); and a reduced subset of user-context-
agnostic best choices, a skyline (cf. Section II-B) of web
services. These approaches require reliable, non-null values
for each QoS attribute and, thus, break down in the presence
of dynamic attributes (cf. Section II-C).
Hence, to satisfy the user need for timely, well-chosen web
services, new ideas are needed. That is precisely what this
paper offers. We present a novel histogram-based technique
for establishing reliable values for QoS parameters that can
withstand temporary fluctuations (regardless how much) and
null values. We show how the aggregation and skyline tech-
niques can be restored using these histograms so that real,
practical implementations can properly meet user needs.
This paper makes several contributions to the area of
dynamic web service discovery. (1) We introduce a novel
histogram-based approach to the challenging dynamic web
service selection problem that can handle the salient chal-
lenges of dynamic attributes that cripple the search quality of
existing methods (cf. IV); (2) We explain how state-of-the-art
algorithms for aggregation querying falter in this histogram-
setting and indicate the superiority of the more conceptually
simple sequential scan (cf. V); and (3) We indicate how the
results of our research here can be seamlessly integrated within
existing systems, both technically and from the perspective of
user behavior (cf. VI).
In the next section, we provide further background on
the problem, explaining precisely why dynamic web service
selection is difficult. Section III reviews related work on web
service selection, aggregation querying, and skyline. In Sec-
tion IV, we propose our histogram-based technique for reliably
handling dynamic web service selection. Section V details the
algorithmic considerations that go into efficiently supporting
dynamic web service selection when our histogram-based
technique is applied. The penultimate section discusses how
our techniques can be integrated into existing systems. Finally,
we describe future work and conclude in Section VII.
II. BACKGROUND ON AGGREGATION AND SKYLINE
There are two primary approaches to service selection:
aggregation, which seeks to identify the services most ap-
propriate to the given context; and skyline, which seeks to
identify in a context-agnostic manner the best services. Both,
unfortunately, encounter practical difficulties in the presence
of dynamic attributes. The remainder of this section describes
both approaches and the cause of those difficulties.
A. Aggregation
An aggregation (a.k.a. top-k or ranked) query ranks each
tuple of a database relation by combining the attributes with
a single, aggregate scoring function and then reports the
highest scored tuples (either the k highest ranked tuples or
those whose score exceed a predetermined threshold). In the
case of web services with numeric attributes s1, . . . , sm, the
rank is determined as follows. First, the user context q is
modeled by weights for each QoS parameter, (q1, . . . , qm),where the magnitude of each qi reflects the importance of
the i’th parameter within the context. Then, the context-aware
score for a service is determined as∑m
i=1 si ∗ qi. Then, the
rank of a service sj is the number of other services with a
higher score than that of sj . This offers the advantage of being
personalized for user context, because the context is built into
the ranking mechanism.
Section III discusses existing algorithms for computing the
top-k web services, given a user context modeled as outlined
above.
B. Skyline
A disadvantage of the aforementioned aggregation query is
that the reduction of m attributes to a single score can often
reduce the accuracy of the retrieval process in multi-criteria
decision-making [22]. Hence, the skyline approach.
The skyline of a relation, in contrast to the top-k, is context-
agnostic, which proffers the advantage of circumventing the
non-trivial task of modeling user context numerically. Con-
ceptually, the skyline is the subset of services which are top-
ranked for some user context.
We denote the numeric attributes of service si as
(si1, . . . , sim). Then, service si is said to dominate another
service sj if it has an equal or better value for every single
attribute, and a strictly better value on at least one of those
attributes (that is, if (∀k, sik ≥ sjk) ∧ (∃l, sil > sjl )). The
skyline, then, is the set of all services that are not dominated
by any other service. Unlike aggregation, this produces a set
of incomparable services, rather than a total ordering and,
unlike aggregation, the cardinality of the result set is entirely
data-dependent as a result of being context-agnostic.
C. Practical Challenges of Dynamic Service Discovery
Some practical challenges arise when implementing dy-
namic service discovery because the attributes are, well, dy-
namic. Ignoring these challenges comes at a definite cost,
where a service best suited to a user context can be penalized
because the evaluation mechanism is ill-defined. Using our
telescope example we now show how these challenges arise
and specifically why they prohibit the application of aggrega-
tion and skyline.
The Telescope Example Expanded: There are hundreds of
telescope web services available around the world. But to a
user interested in photographing a particular star, only a subset
of these services are of particular use depending on several
functional parameters. Telescopes may not be oriented towards
the star at all. With others, the user may not have a service-
level agreement (SLA) that permits her to use that telescope.
By considering all such parameters, one can reduce the set of
telescopes to a small subset which are functionally equivalent.This is referred to as static, functional selection—capturing a
precise values at an exact moment in time. After functional
selection, the elements of the subset need to be ranked and/or
further pruned with respect to their dynamic, quantitative QoS
attributes. The values of all these attributes fluctuate over
time, perhaps even wildly, which is what complicates the
matter of selecting the web service most likely to deliver the
highest quality photographs. The SDDS knowledge base (cf.
Section III) maintains an expert-derived threshold value for
each attribute in each domain as shown in the third column
of Table II, to ensure a minimum quality of service. But this
approach still suffers from several practical problems.
Because the values are captured and/or computed at run
time, they may not always be available, perhaps because
they did not arrive at the system on time, or were lost due
to network problems. Consider if AccuWeather is off-line,
contrast is not announced by the service provider, or the sensor
that detects values is affected by a power outage. In none of
the scenarios should the telescope necessarily be discounted
from consideration, but neither a skyline nor an aggregation
function is well-defined over these null values.
Null values are not the only concern. Consider visibility.
On even a superb day, there can still be patchy cloud cover.
A particular measurement of visibility, one that captures the
precise value at an exact moment in time, might conclude that
the visibility is poor, when in fact it is only a momentary lapse
before the cloud passes over. Another issue is latency. Perhaps
the latency for a particular telescope is generally poor, but
the service has just been restarted and so there are few users
connected as yet. A measurement at this time will overestimate
the quality of the service.
D. The Research Problem
Ideally, one could apply both well-established selection
approaches—aggregation and skyline—to dynamic web ser-
vice selection; however, the challenges detailed above pre-
vent this by obfuscating the relative QoS among services.
The definition of skyline (or, more precisely, of dominance)
requires a well-defined ternary, ordinal relationship (in other
words, (<,=, >) must all be well-defined for the attribute
values of any two services). Null values especially disrupt this,
because they are incomparable to non-null values. Snapshot
values distort this, because they cause the ordinal relationship
among services to fluctuate wildly. Thus, we need to restore
the ternary relationship somehow when null and fluctuating
values arise.
For aggregation, the problem is a bit trickier, because the
scoring function relies on attributes that are real-valued, not
just ordinal. Thus, it is important that the difference between
3 and 1 is larger than the difference between 3 and 2, for
example, because this affects the eventual aggregate score.
Naturally, any technique that supports the case of aggre-
gation can be used to re-establish the concept of dominance,
because the ordinal operators (<,=, >) can be applied to any
real-valued attribute. Thus, the goal is to derive a reasonable
value for null attributes to fairly assess the magnitude by
which the values, if known, would differ from the other known
values. By so doing, both skyline and aggregation can be
applied and the literature on both topics can be selectively
applied.
There is another particularly salient characteristic of dy-
namic web service selection as it relates to skyline and
aggregation. As a first step the choices of web services must
be narrowed to those functionally equivalent and functionally
appropriate. This step must be done first; otherwise, it might
happen that none of the top-k (or, equivalently, none of
the skyline) services match the functional requirements. The
consequence of this first selection is that neither an index
nor a pre-sort-based algorithm—the principle techniques in
the literature discussed in Section III—can effectively resolve
the aggregation or skyline component of the query because
the data structures are designed to query the entire dataset
efficiently, not the particular, ad-hoc subset of it. This charac-
teristic establishes a streaming scenario, wherein the results
of the functional selection are piped into the aggregation
(equivalent to skyline) selection.
As such, an important two-part research problem naturally
arises:
How can one establish a reasonable replacement for nulland instantaneous values in order to apply aggregation andskyline to a set of services—and, given that such replacementscan be produced, which streaming algorithms for aggregationand skyline can best handle being piped from an earlierfunctional selection?
III. RELATED WORK
In this section, we review related work spanning a few
domains. In particular, we describe efforts to support web
service discovery in Section III-A, especially relevant literature
on aggregation querying in Section III-B, and skyline research,
paying particular attention to papers that assume a streaming
setting in Section III-C.
A. Web Service Discovery
The ranking of services by QoS properties can be done in
three ways [12], [24]: (1) with regards to only static attributes;
(2) by taking a snapshot of the current values [22], [6]; or (3)
by deriving a reasonable estimate for the values. With regards
to the first option, considering only static attributes squanders
latent potential to improve the user experience. Returning to
TABLE IIEXAMPLE: SDDS THRESHOLD KNOWLEDGE BASE
Parameters Threshold
availability 80contrast 8dayLight 8
composition 6Photography of star latency 8by telescope domain functional y
visibility 90geographical situation 9
our example in Table I, consider the consequences of capturing
a measure of the availability of telescope service T4 on Day
1 and using that information to evaluate the service on Day
4, when the value has changed substantially. Nonetheless, the
work done here is still of importance, because it introduced
the idea of using aggregation querying in the context of web
service discovery.
Regarding the second option, capturing instantaneous mea-
surements of QoS properties rather ignores the practical issues
that arise naturally when using dynamic attributes. For exam-
ple, on Day 3, the latency measurement for T2 is null, but one
should be lenient because historically the value has been quite
high.
An especially important paper is that of Skoutas et al. [22],
who introduce the concept of dominance and skyline into
web service discovery. In their approach, they examine every
subset of the QoS parameters and determine in how many
of those subsets a given service is dominated. The services
can then be ranked in ascending order of score. In this
way, they combine the notions of skyline and aggregation
without the need to model user context. They also present
algorithms—TKDD, TKDg, TKM—to determine the top-k web
services based on the three dominant scores. They improve the
performance by having boundaries for each object, consisting
of the lowest and highest values of the dominating scores.
Therefore, instances (attributes of service) are maintained in
three lists: maximum, minimum, and actual values of instances
for a service. This approach provides interesting research in
static service discovery where values of attributes are available,
but it cannot support missing values.
SDDS (Static Discovery and Dynamic Selection) by Pahle-
van and Muller is a framework to take static and dynamic
attributes into account for web service discovery [20], [19],
[18]. Table I shows an example with SDDS-determined at-
tributes enumerated in Table II, identifying seven example QoS
attributes. These tables show how to calculate the values for
these attributes in a variety of ways; for example, the visibility
is obtained from weather services such as AccuWeather, and
latency is calculated using the SDDS framework. Geolocation
information is provided by experts and stored in the knowledge
base. Other attributes that refer to context, such as environmen-
tal conditions or user preferences, can be obtained via sensors.
Hadad et al. [6] proposed an algorithm—TQoS-driven
selection—that is embedded in the transactional service selec-
tion for web service composition. The algorithm is basically
designed for the local QoS optimization—that is, the selection
of the qualifying services for each activity that fulfills the
transactional requirements. The algorithm’s input is a work-
flow that includes a set of activities to fulfill a request. The
output is a set of the services for each activity in the workflow
with similar transactional properties but dissimilar nonfunc-
tional properties. The TQoS selection uses user preferences
as a weight assigned to each QoS criterion. Then the web
service is scored through a conventional scoring function to
rank services and find the one with the maximum score. If
there are several services with the same score, one of them is
chosen randomly. Similar techniques are applied to select the
maximum score path for web service composition.
Dustdar et al. [13], [15] proposed the QoS-aware VRESCO
run-time environment which was designed for dynamic bind-
ing, invocation and mediation. The functional attributes are
described in the service meta data model. These QoS attributes
can be either specified manually using a management service,
or measured automatically, and integrated into VRESCO. For
each service, a schedule can be defined that specifies when the
monitor should trigger the measurement. Then, the individual
QoS measurement is published as QoS event into the run
time. The average QoS values can be aggregated based on
the information stored in the QoS events. Unfortunately, these
approach cannot support fluctuating or unavailable values. The
challenge for this algorithm, as for many others, is capturing
and computing dynamic attributes at query time.
B. Aggregation Querying
To determine, given a user context, the k top services, there
is already a significant body of work on this topic [11]. Here
we focus on the two most relevant approaches. One is to use a
traditional scan-based algorithm to process all candidate tuples
in the database, calculate their scores, and return the k tuples
with the highest grades. This involves evaluating every service
for every query.
The better alternative is the Threshold Algorithm (TA)introduced by Fagin et al. [7], which improves on the scan-
based alternative by detecting opportunities to terminate earlier
(i.e., before scanning every tuple). The trick is to exploit the
fact that each attribute’s data source can often be provided in
sorted order and thus one can determine when no further tuples
can score higher than the ones already seen. The idea is refined
further by Nepal and Ramakrishna [17] and Guntzer et al. [8]
to handle data that is homogeneous with inherently fuzzy
attributes. Several extensions to TA have been proposed to
decrease access costs and handle data access in situations when
certain sources cannot provide sorted access [2].
However, a bigger hurdle for TA in dynamic web service
selection settings is that it is ill-defined for the inevitable occa-
sion where null values occur, because sorting is obfuscated by
nulls. One potential alleviation to the null value problem is the
probabilistic top-k query, introduced by Cheng [4], in which
each attribute is modeled as a range of possible values from
a probability density function (pdf). The motivating example
for this research is the modeling of moving objects, where at
any point in time, the actual location is within a certain range,
d, of its last reported location value. If the actual location
changes, then the sensor reports its new location value to the
database and changes d accordingly. The pdf can be an equal
chance of locating the object anywhere in the range of d, and
it can be estimated with a time-series analysis [3]. However,
this approach cannot be applied to our dynamic web service
selection setting because it depends on the construction of an
R-Tree index. As mentioned in Section II-D, this result cannot
be applied for the dynamic web service selection, because the
set of web services will already have been subjected to an
earlier, functional selection, which renders any index useless.
For several applications, it is only necessary to know
whether the probability exceeds a given threshold. Accord-
ingly, a Probabilistic Threshold Query (PTQ) is a variant of a
probabilistic query where only answers with probability values
over a certain threshold are returned [5]. These methods are
more expensive to evaluate than their traditional counterparts
due to the calculated probabilities that accompany the answers.
Several methods to improve the efficiency and reduce the cost
of such queries have been explored. For example, to speed up
the otherwise inefficient use of an interval index to answer the
PTQ, they improve the R-tree by expanding the probabilistic
query to its internal nodes to avoid investigating the contents
of a node. They avoid computing the probability values of
those intervals which cannot satisfy the query [5]. Several
drawbacks characterize the above-mentioned approaches, TA
and its extensions, in service search and discovery.
C. Skyline
Borzonyi et al. [1] introduced Skyline with the canonical
example of a hotel reservation system, where the skyline are
all hotels not worse than any other hotel in terms of both the
distance to the beach and the price per night. However, the
skyline has also drawbacks. First, the cardinality of the skyline
operator can not be controlled—it may include every tuple in
the worst case and, more often than not, close to the worst
case if attributes are uncorrelated. This is especially likely
in domains with more QoS parameters, because the skyline
cardinality is subject to the curse of dimensionality [9]. Addi-
tionally, the traditional skyline operator is designed for static
dimensions such as distance and price [10], wherein dynamic
service discovery is dealing with attributes that fluctuate over
time.
To take advantage of the promise proffered by the dom-
inance concept in multi-criteria service discovery, several
refinements need be performed to align it with dynamic service
selection. Xia et al. [23] presented the concept of ε-skyline
to control the size of the skyline, by ranking the skyline
result, and finally reflecting the user preferences at different
dimensions. They assign weights to dimensions to rank the
skyline results. Here, user preferences are involved to rank and
control the size of the skyline. However, one of our objectives
in studying skyline in service discovery is to offer a context-
agnostic option to overcome the problem of modeling user
context.
Another technique to address the curse of dimensionality
is k-dominance, which relaxes the idea of dominance to k-
dominance. Huang et al. [9] proposed algorithms, such as δ-
skyline employing the concept of k-dominance to rank results
without requiring user context. The definition itself suggests
its efficacy in controlling the cardinality of the skyline—that
is precisely what the user-defined δ indicates.
Papadias et al. introduced another extension of the skyline
query, the top-k dominating query [21]. It combines the
advantages of top-k and skyline queries to control the size of
the output independent of the dimension scales, and without
having any ranking functions. Yiu et al. [14] conducted a study
with algorithms on top-k dominating queries and proposed to
sum the number of points it dominates from all subspaces.
The other aforementioned concern regarding the use of
skyline in dynamic service selection, that of traditionally sup-
porting only static datasets, has also been rectified. Continuousskyline query processing handles dynamic data sets that arise
in domains such as location-based services. Instead of snapshot
queries, continuous queries require continuous evaluation as
the query results vary with the datasets. In terms of location-
based services, the movement of a service object changes the
skyline according to the service’s current location. In real-time
applications, these situations become more complex due to the
movement of all of the query points.
In dynamic service discovery, the services and also the
dynamic attributes are constantly added, removed, or changed.
Our dynamic repository in SDDS is updated periodically and
again at query time, verifying whether the registered services
are still qualified or not by checking their dynamic attributes.
Thus, we regularly need to compute a skyline for the data
points that are valid at query time. Huang et al. [10] proposed
the first work on continuous skyline queries in the mobile
environment. Here, continuous query processing is conducted
for users by updating the skyline instead of computing a new
query from scratch. The possible change from one time to
another is predicted and processed accordingly. This method
uses a type of data structure that is based on the analysis
and exploits the spatial-temporal coherence of the problem.
Morse et al. present the continuous time interval skyline
operator, which continuously computes the current skyline
over a data stream [16]. They present an algorithm called
LookOut for evaluating such queries. Each data point in the
dataset is associated with an interval of time for which it is
valid. The interval consists of the arrival time of the point and
an expiration time for the point. If the service is dominated,
no changes are made to the skyline. If any of the services in
skyline are dominated by a new service, the new service is
added to skyline.
IV. ON THE SELECTION OF SERVICES
Techniques for “best” tuple selection have been extensively
researched in the database community. In this section, we
detail some of them, indicate how the aforementioned dynamic
attribute challenges manifest themselves in those techniques,
and offer a sound solution based on histogramming.
A. Manifestation of Aforementioned Challenges
Both aggregation and skyline encounter practical issues
in the presence of dynamic attributes, which render them
unusable.
Null Values: Null values are perhaps the most debilitating
because they introduce increased incomparability. The sum-
mation on which aggregation depends is ill-defined with null
values. The biggest drawback of skyline is that it often is
not sufficiently selective, especially as the dimensionality in-
creases, and introducing more incomparability via null values
exacerbates this problem. A conceivable approach is to replace
null values with zeros or estimates, but zeroing values can
be unnecessarily harsh, especially when the null arises from
network issues with an external data source.
Temporary Fluctuations: Temporary fluctuations and un-
reliable measurements can significantly affect the skyline. A
service that is consistently part of the skyline because of a
typically maximal value on one attribute can be dropped at
run-time. Consider again the scenario of a momentary cloud
cover at an otherwise consistently high-visibility area.
Dynamic Context Sensitivity: The context always affects
the relevancy of dynamic attributes. For instance, availabil-
ity is usually extremely important for quality-driven service
selection. However, the importance of this attribute can by
enervated if the user decides to wait for a higher quality service
that is temporarily unavailable rather than settle for a lower-
quality service that is immediately available.
B. Histogramming and ART ()
The above issues can be addressed quite tidily using his-
togramming techniques. Rather than taking an instantaneous
measurement of each attribute, we collect the ν most recent
measurements in a histogram. In this case, null values only
become a concern when all of the last ν measurements are
null and rapid fluctuation is smoothened.
Neither aggregation nor skyline are defined for histograms
(i.e., collections of measurements), but a natural approach
is to define a function on a histogram to convert it into a
sensible value. We define the ART () function, or area-right-of-threshold function, for precisely that. Given a threshold τand an attribute histogram a, ART (a, τ) is the number of
measurements greater than τ .2
We can formalize both the aggregation and the skyline
problems on histograms. The top-k aggregation problem is
to retrieve from the set of services T , given a weight vec-
tor w and a threshold vector �τ both in Rd, the subset
{s = 〈s1, . . . , sd〉 ∈ T : ||{t = 〈t1, . . . , td〉 ∈ T :∑wi ∗ART (ti, τi) >
∑wi ∗ART (si, τi)}|| ≤ k}. This is
the same as the traditional definition of top-k aggregation,
except that we have replaced each ti with ART (ti, τi), the
value we compute from each histogram.
Similarly, we define the skyline by replacing each ti with
ART (ti, τi). A service s ∈ T dominates another service t ∈2The term ‘area’ in area-right-of-threshold comes from the idea of inte-
grating part of the histogram, when each measurement is of unit area.
T if ∀τART (si, τi) ≥ ART (ti, τi) ∧ ∃i : ART (si, τi) >ART (ti, τi). As a result, the skyline of best web services are
those not dominated by any other functionally equivalent web
service.
About Composition: An important point to make is with
regards to web service composition. In the context of our
example, this could include composing a telescope service
with an image conversion service. Most telescopes produce
images in Flexible Image Transport System (FITS) format,
which then need be resized, converted, and appropriately
compressed for the final destination medium, be it a mobile
phone, a computer, or a database of transforms.
This fits into the computation of the functionally equivalent
services. In the end, our selection procedure, detailed shortly,
treats each service in the composition in isolation and will
either produce the highest ranked service or a randomly chosen
service from a set of best ranked ones.
V. ALGORITHMIC CONSIDERATIONS
It is one thing to propose a solution to a problem; it is quite
another to demonstrate that it is efficient. In this section, we
give case studies to illustrate the most efficient approaches to
solving the variants of top-k aggregation and skyline when
handling dynamic attributes with skyline and ART().
A. Top-k Aggregation
The most renowned—and efficient—method for handling
top-k aggregation queries, especially when attributes arrive
from external data sources (i.e., in a middleware setting) is the
Threshold Algorithm by Fagin et al. However, its efficiency
evades the histogram setting because it relies on sorted access
to each individual attribute. However, in the case of context-
sensitive thresholds, the ART() values change depending on the
choices of thresholds, thus altering the sorted order. Consider
the following example from our telescope table (Table I).
Focusing on the latency attribute, note that if the user
provides a threshold τl = 4, then the ranking by ART()of the services is (T3,T2,T4,T1) with values (4, 4, 3, 1).If the user instead provides a threshold τl = 8, then the
ranking changes to (T2,T4,T3,T1) with values (3, 2, 1, 0).This demonstrates that to use an algorithm dependent on
sorted access to individual attributes, the ART() and histogram
approach may require resorting for every query, which is
impractical.
On the other hand, sequential scan is a very nice choice
since dynamic QoS selection is applied as a second stage of a
pipeline which includes static functional equivalence selection.
Consequently, it can be considered in a streaming setting,
and preprocessing or indexing is rendered ineffective by the
first selection query. Thus, maintaining a buffer of the k best
services seen so far, and streaming the results of the functional
selection will perform well.
B. Streaming Skyline
Originally, the skyline algorithm was intended for static
query points over a static data set, where there are no temporal
data associated with these elements. To handle dynamic data
sets, continuous skyline processing is incorporated to update
the skyline at the query time, though this is impractical when
all the query points are changing. Another challenge arises
because the ART() values change depending on the choice of
thresholds in our histogram setting. However, using the stream
skyline algorithm, the change in ART() values does not depend
on the choice of thresholds. For instance, in our telescope table
(Table I) example, the default threshold value for availability is
defined as 80 in this domain. Based on this threshold, our first
skyline result is (T2,T3,T4,T1). This result is independent
from the day threshold. However, our histogram approaches
require the day threshold with value (1, 2, 3, 4) in order to
compute the skyline. The ART() will be different depending
on what day we choose to take it from. In this case, the
skyline result, if the threshold value is taken from day 3, is
(T2,T3,T1,T4) .
VI. ON SYNTHESIS WITH LITERATURE AND USERS
As mentioned earlier in our related work section, our
proposed approach to handling dynamic attributes is com-
plementary rather than adversarial to existing web service
selection techniques. This is an important point to warrant
further elaboration, because it is requisite to consider how our
research can be seamlessly integrated into existing systems.
That is the focus of this section, wherein we first describe the
compatibility of our techniques with existing literature and
then with existing user behavior.
A. Static Selection
Our definition of top-k aggregation reduces to the traditional
definition if the thresholds to the ART() function are all
set to zero. But it is important for less trivial instances of
compatibility. First, the system is compatible with our SDDS
framework because the thresholds can be assumed to be set by
an expert administrator rather than a user. Although the static
nature of such thresholds presents a missed opportunity for
efficiency, the proposed system here can handle that scenario
well. Second, any static selection mechanism for evaluating
which are functionally equivalent services is compatible with
this approach because it is a prerequisite step of our problem.
Third, existing methods for dynamic attributes that are based
on snapshot measurements are, in fact, special cases of our
problem in which the histogram consists only of the most
recent measurement and the threshold are, again, all zero.
B. User Behavior
Especially when proposing new ideas that, ultimately, are
meant to improve user experience, it is important to consider
the extent to which they require users to change their behavior,
since this influences the quality of experience and both the
likelihood and ease of adoption. Here we refer the reader to,
as an example, the finviz trading site.3 This is a popular web
service selection website that requires users to set thresholds
for various service parameters. This illustrates the readiness
3http://finviz.com/
with which users can adopt a system that requires them to set
thresholds (that can be more or less exact) for a web service
selection engine.
VII. CONCLUSIONS
A large number of web services perform the same function
but in different ways or with different quality, that is, they
have different nonfunctional properties, such as quality of
service properties. This complicates the web service search
and discovery process and gives rise to the need for an effec-
tive, automated, quality-driven service retrieval and dynamic
selection algorithm that considers only a qualified subset of
the services. With our approach, we strive to serve consumers
better and achieve effective service selection from the vast
numbers of services available by narrowing the search to
include only qualified services—the top-k services.
Today, service selection by QoS attributes only considers
static attributes or with a snapshot of current values, resulting
in low-quality, low-accuracy results. To address this challenge,
we focused on capturing snapshot measurements of QoS
attributes at the time of the query, which necessitates con-
sidering dynamic attributes, effectively selecting services, and
confronting the challenges posed by fluctuating and missing
attributes and the task of monitoring and storing these values.
After considering these aspects, we were able to derive a more
effective algorithm without relying on stale dynamic attributes.
Our histogram-based approach deals with these challenges that
hinder other methods, but more complicated algorithms come
up short in this setting compared to the simpler sequential
scan. Our methods and approach both simplify and enhance
the web service search and discovery process. This enhance-
ment is accentuated by the ease with which our approach can
be integrated with existing technology and systems, both for
users and administrators.
ACKNOWLEDGMENTS
This work was funded in part by the National Sciences and
Engineering Research Council (NSERC) of Canada (CRDPJ
320529-04 and CRDPJ 356154-07), IBM Corporation via
the CSER Consortium, and University of Victoria, British
Columbia, Canada.
REFERENCES
[1] S. Borzonyi, D. Kossmann, and K. Stocker, “The Skyline Operator,”in Proc. 17th International Conference on Data Engineering, vol. 3.IEEE, 2001, pp. 421–430.
[2] N. Bruno, L. Gravano, and A. Marian, “Evaluating Top-k Queries overWeb-accessible Databases,” in Proc. 18th International Conference onData Engineering. IEEE Computer Society, 2002, pp. 216–226.
[3] C. Chatfield, The Analysis of Time Series: An Introduction, 6th ed.Chapman and Hall, 1989, vol. 1.
[4] R. Cheng, D. V. Kalashnikov, and S. Prabhakar, “Evaluating ProbabilisticQueries over Imprecise Data,” in Proc. ACM SIGMOD InternationalConference on Management of Data. ACM, 2003, pp. 551–562.
[5] R. Cheng, Y. Xia, S. Prabhakar, R. Shah, and J. S. Vitter, “EfficientIndexing Methods for Probabilistic Threshold Queries over UncertainData,” in Proc. 13th International Conference on Very Large Databases,vol. 30. VLDB Foundation, 2004, pp. 876–887.
[6] J. El Hadad, M. Manouvrier, and M. Rukoz, “TQoS: Transactional andQoS-Aware Selection Algorithm for Automatic Web Service Composi-tion,” Transactions on Services Computing, vol. 3, pp. 73–85, 2010.
[7] R. Fagin, A. Lotem, and M. Naor, “Optimal Aggregation Algorithms forMiddleware,” in Proc. 20th ACM Symposium on Principles of DatabaseSystems. ACM, 2001, pp. 102–113.
[8] U. Guntzer, W.-T. Balke, and W. Kiessling, “Optimizing Multi-featureQueries for Image Databases,” in Proc. 26th International Conferenceon Very Large Databases. VLDB Foundation, 2000, pp. 419–428.
[9] J. Huang, D. Ding, G. Wang, and J. Xin, “Tuning the Cardinality ofSkyline,” in Advanced Web and Network Technologies, and Applications,Y. Ishikawa, J. He, G. Xu, Y. Shi, G. Huang, C. Pang, Q. Zhang, andG. Wang, Eds. Springer-Verlag, 2008, vol. 4977, pp. 220–231.
[10] Z. Huang, H. Lu, B. Ooi, and A. Tung, “Continuous Skyline Queriesfor Moving Objects,” IEEE transactions on knowledge and data engi-neering, vol. 18, pp. 1645–1658, 2006.
[11] I. F. Ilyas, G. Beskales, and M. A. Soliman, “A Survey of Top-kQuery Processing Techniques in Relational Database Systems,” ACMComputing Surveys, vol. 40, October 2008.
[12] K. Kyriakos and D. Dimitris, “Requirements for QoS-Based Web ServiceDescription and Discovery,” IEEE Transactions on Services Computing,vol. 2, pp. 320–337, 2009.
[13] P. Leitner, F. Rosenberg, A. Michlmayr, and S. Dustdar, “Towards aFlexible Mediation Framework for Dynamic Service Invocations,” inProc. of the 6th IEEE European Conference on Web Services (ECOWS),pp. 45–59, 2008.
[14] M. Lung and Y. Nikos, “Multi-dimensional Top-k Dominating Queries,”VLDB Journal, vol. 18, pp. 695–718, 2009.
[15] A. Michlmayr, F. Rosenberg, P. Leitner, and S. Dustdar, “End-to-EndSupport for QoS-Aware Service Selection, Binding, and Mediation inVRESCo,” IEEE Transactions on Services Computing, vol. 3, no. 3, pp.193–205, 2010.
[16] M. Morse, J. Patel, and W. Grosky, “Efficient Continuous SkylineComputation,” Information Sciences, vol. 177, pp. 3411–3437, 2007.
[17] S. Nepal and M. V. Ramakrishna, “Query Processing Issues in Image(multimedia) Databases,” in Proc. 15th International Conference onData Engineering. IEEE, 1999, pp. 22–29.
[18] A. Pahlevan and H. A. Muller, “Static-Discovery Dynamic-Selection(SDDS) Approach to Web Service Discovery,” in Proc. 3th InternationalWorkshop on Service Intelligence and Computing, 2009, pp. 769–772.
[19] ——, “A Dynamic Framework for Quality Web Service Discovery,” inProc. 4th International Workshop on a Research Agenda for Mainte-nance and Evolution of Service-Oriented Systems. SEI Press, 2010, inprint.
[20] ——, “Self-Adaptive Management of Web Service Discovery,” in Proc.PhD Symposium at the 8th IEEE European Conference on Web Services.IEEE Computer Society, 2010, pp. 21–24.
[21] D. Papadias, Y. Tao, G. Fu, and B. Seeger, “Progressive SkylineComputation in Database Systems,” ACM Transactions on DatabaseSystems, vol. 30, pp. 41–82, 2005.
[22] D. Skoutas, D. Sacharidis, A. Simitsis, V. Kantere, and T. Sellis, “Top-kDominant Web Services under Multi-Criteria Matching,” in Proc. 12thInternational Conference on Extending Database Technology, 2009, pp.898–909.
[23] X. Tian, D. Zhang, and Y. Tao, “On Skyline with Flexible DominanceRelation,” in Proc. of 24th International Conference on Data Engineer-ing. IEEE, 2008, pp. 1397–1399.
[24] H.-L. Truong and S. Dustdar, “A Survey on Context-aware Web ServiceSystems,” International Journal of Ad Hoc and Ubiquitous Computing,vol. 5, no. 1, pp. 5–31, 2009.