Adaptively Approximate Techniques in Distributed Architectures · 2015-02-22 · Importance to get...


Page 1

Barbara Catania, Giovanna Guerrini

DIBRIS - University of Genoa, Italy

Adaptively Approximate Techniques in Distributed Architectures

Page 2

What are we talking about?

SOFSEM 2015 2

Query

Answer

User

Database Management System

Rainer Manthey’s talk

This talk

Page 3

What are we talking about?

How to process queries effectively and efficiently in traditional and advanced data management architectures

Why and how to combine approximation and adaptivity in advanced architectures

Page 4

Summary

Background and problem statement

ASAP: Approximate Search with Adaptive Processing

ASAP in the Small

ASAP in the Large

Conclusions

Page 5

PART I

Background and problem statement

Page 6

What are we talking about?

How to process queries effectively and efficiently in traditional and advanced data management architectures

The past The present

Page 7

The past

Page 8

Reference architecture

Static data integration and reconciliation

SELECT, INSERT, DELETE, UPDATE

COMMIT / ROLLBACK

Page 9

Data

Structured data

Data source with a well-known schema

ID       Number  PartNum  Quantity  Price
ID00033  1       XY-47    14        16.80
ID00034  2       B-987    6         2.34
…        …       …        …         …

Page 10

Queries

Operational data retrieval operations

Precise queries: the user expects as the result all the objects that precisely meet the request

Declarative languages: SQL standard

SELECT Part_Num
FROM Catalog
WHERE Price > 10
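A minimal sketch of precise query semantics, using Python's built-in sqlite3 module. It assumes the two sample rows from the structured-data slide are stored in a table named Catalog, and it uses the column name Part_Num from the query (the table header spells it PartNum).

```python
import sqlite3

# In-memory database holding the sample rows; the table name Catalog and the
# column names are assumptions taken from the slide's query and table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Catalog ("
    "ID TEXT, Number INTEGER, Part_Num TEXT, Quantity INTEGER, Price REAL)"
)
conn.executemany(
    "INSERT INTO Catalog VALUES (?, ?, ?, ?, ?)",
    [("ID00033", 1, "XY-47", 14, 16.80),
     ("ID00034", 2, "B-987", 6, 2.34)],
)

# Precise query: exactly the tuples satisfying the predicate are returned.
rows = conn.execute("SELECT Part_Num FROM Catalog WHERE Price > 10").fetchall()
print(rows)  # [('XY-47',)]
```

If no part cost more than 10, the result would simply be empty: this is the empty-answer problem that relaxed queries address later in the talk.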

Page 11

Query processing

User → (Declarative) query → Database Management System → (Precise) answer → User

Page 12

Query processing

User → (Declarative) query → Database Management System → (Precise) answer → User

Inside the Database Management System: Query optimizer → Compiled query plan → Query executor
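The optimize-then-execute split can be illustrated with a toy Python sketch: the "optimizer" fixes an order for independent filter predicates once, from a priori selectivity and cost estimates, and the "executor" runs that compiled plan unchanged. The ranking rule (selectivity − 1) / cost is a classic heuristic for ordering independent filters, used here as an assumption; the filter names and estimates are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
# (name, predicate, estimated selectivity, estimated cost per tuple)
Filter = Tuple[str, Callable[[Row], bool], float, float]

def optimize(filters: List[Filter]) -> List[Filter]:
    """Compile a plan before execution: sort independent filters by
    rank = (selectivity - 1) / cost, lowest rank first."""
    return sorted(filters, key=lambda f: (f[2] - 1.0) / f[3])

def execute(plan: List[Filter], rows: List[Row]) -> List[Row]:
    """Evaluate the fixed plan on every row; all() stops at the first failing filter."""
    return [r for r in rows if all(pred(r) for _, pred, _, _ in plan)]

# Hypothetical estimates: "expensive" lets fewer rows through but costs twice as much.
filters: List[Filter] = [
    ("expensive", lambda r: r["qty"] > 100, 0.05, 2.0),
    ("cheap", lambda r: r["price"] > 10, 0.5, 1.0),
]
plan = optimize(filters)
result = execute(plan, [{"price": 16.8, "qty": 14},
                        {"price": 2.34, "qty": 6},
                        {"price": 20.0, "qty": 150}])
print([f[0] for f in plan])  # ['cheap', 'expensive']
print(result)                # [{'price': 20.0, 'qty': 150}]
```

With these estimates the expected cost per tuple is 1 + 0.5 · 2 = 2 for the chosen order versus 2 + 0.05 · 1 = 2.05 for the reverse one. If the estimates turn out wrong at run time, the compiled plan is not revisited.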

Page 13

Crowdsourced data

Data streams

Semantic (linked) data

Large-scale data distribution

The present

Page 14

Emerging features: data

Huge (terabytes to exabytes) amounts of (shared) information

Different types of data availability: stored data, stream data

Uncertainty due to data inconsistency, incompleteness, ambiguities, deception, low freshness

Heterogeneous w.r.t. structure, semantics, quality

Geo-referenced, time-variant

Page 15

Emerging features: queries

Fully exploiting the potential of the huge amount of available data

Pressing need to use these data for goals beyond “routine” processing

Limited knowledge of the user about the data to be queried

Limited resources with respect to data volumes

High system dynamicity

Page 16

Emerging processing modalities: approximation

Precise results are not always possible: approximation is a need, in the presence of bounded resources and high load

Precise results are not always desired: approximation (relaxation) is an opportunity for increasing user satisfaction, in the presence of highly heterogeneous data and of limited knowledge about the data

Even when she knows the data, the user usually wants only the ‘best results’, to avoid being flooded: best results first (preference-based queries)

Page 17

Emerging processing modalities: adaptivity

Data properties may not be known or estimable a priori

Processing conditions (network load…) vary significantly over time

Importance to get early the first (good quality) results

It may not be possible to determine execution plans before the processing starts

Need to adapt the processing to dynamic conditions, giving up the a priori selection of a single execution strategy fixed before processing

Interleave the optimization and execution stages

Measure

Analyze

Plan

Actuate
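A minimal Python sketch of interleaving optimization and execution with a Measure, Analyze, Plan, Actuate loop. It assumes a query made of independent filter predicates of equal cost and re-plans their order every `batch` tuples from observed pass rates; the class and parameter names are hypothetical, and pass rates of filters placed later in the chain are measured only on the tuples that reached them.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]

class AdaptiveFilterChain:
    """Toy Measure-Analyze-Plan-Actuate loop over a conjunction of filters.

    Instead of fixing the filter order before processing, the order is
    re-planned every `batch` tuples from the pass rates observed so far
    (equal per-filter costs are assumed)."""

    def __init__(self, filters: Dict[str, Callable[[Row], bool]], batch: int = 100):
        self.filters = filters
        self.order: List[str] = list(filters)        # initial, uninformed plan
        self.seen = {name: 0 for name in filters}    # tuples each filter evaluated
        self.passed = {name: 0 for name in filters}  # tuples each filter let through
        self.batch = batch
        self.count = 0

    def process(self, row: Row) -> bool:
        # Measure: execute the current plan, counting per-filter outcomes.
        ok = True
        for name in self.order:
            self.seen[name] += 1
            if self.filters[name](row):
                self.passed[name] += 1
            else:
                ok = False
                break
        self.count += 1
        if self.count % self.batch == 0:
            self._replan()
        return ok

    def _selectivity(self, name: str) -> float:
        # Analyze: observed pass rate, or 0.5 if the filter was never reached.
        seen = self.seen[name]
        return self.passed[name] / seen if seen else 0.5

    def _replan(self) -> None:
        # Plan + Actuate: put the most selective filter first for the next tuples.
        self.order = sorted(self.order, key=self._selectivity)

chain = AdaptiveFilterChain({"a": lambda r: r["x"] > 0,
                             "b": lambda r: r["x"] > 90}, batch=10)
out = [x for x in range(100) if chain.process({"x": float(x)})]
print(out)          # [91, 92, 93, 94, 95, 96, 97, 98, 99]
print(chain.order)  # ['b', 'a']
```

After the first ten tuples the loop observes that filter b rejects almost everything and moves it to the front; the answer is unchanged, only the work per tuple drops.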

Page 18

Beyond traditional query processing

Traditional: approximation possible; adaptivity possible

Data streams (Velocity): approximation required, due to data unboundedness; adaptivity required, due to dynamicity

Large-scale data distribution (Volume, Variety, Velocity, Veracity): approximation required, due to the high heterogeneity; adaptivity required, due to elasticity

Page 19

Beyond traditional query processing

Approximation:
Subject: the query processing task or the data to which approximation is applied
Target: the information used for the approximation

Adaptivity:
Subject: the processing task affected by the adaptation
Target: what the technique attempts to adapt

• Aim: the parameter(s) to be maximized/minimized

Page 20

Approximation: traditional environments

Subjects: query specification (by rewriting); query specification (preference-based: top-k, skyline); processing algorithms; data reduction

Targets: data distribution, structure information; ranking function, relevant attributes; similarity functions; pruning conditions, heuristics; synopses, summaries

Aims: result relevance; throughput
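A preference-based top-k query can be sketched in a few lines of Python with the standard heapq module. The ranking function and the rows are hypothetical, and practical top-k algorithms usually avoid scoring every tuple, which this sketch does not attempt.

```python
import heapq
from typing import Callable, Dict, Iterable, List

Row = Dict[str, float]

def top_k(rows: Iterable[Row], score: Callable[[Row], float], k: int) -> List[Row]:
    """Preference-based query: return only the k best rows by the ranking
    function, highest score first, instead of every matching row."""
    return heapq.nlargest(k, rows, key=score)

parts = [{"id": 1, "price": 16.80, "qty": 14},
         {"id": 2, "price": 2.34, "qty": 6},
         {"id": 3, "price": 9.50, "qty": 2}]

# Hypothetical ranking function: prefer high quantity and low price.
best = top_k(parts, lambda r: r["qty"] - r["price"], k=2)
print([r["id"] for r in best])  # [2, 1]
```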

Page 21

Adaptivity: traditional environments

Queries over many tables: traditional cost estimation is unreliable, mainly due to unavailable and/or out-of-date statistics about attribute correlations and skewed attribute distributions

Subject: query plans / tuple routing

Target: data characteristics / query parameters

Aim: throughput

Page 22

Which recurrent aims?

Quality of Service (QoS) oriented techniques: aimed at coping with limited or constrained resource availability during query processing (with QoD guarantees)

Quality of Data (QoD) oriented techniques: aimed at improving the quality of result data (with QoS guarantees)

Page 23

Which recurrent aims?

QoS parameters: throughput, CPU usage, memory consumption, latency, communication overhead

QoD parameters: accuracy, coverage, freshness

Page 24

How to combine approximation and adaptivity?

The aim of adaptivity and the aim of approximation can each be Quality of Data or Quality of Service

ASAP: Approximate Search with Adaptive Processing, combining QoD-oriented approximation with QoD-oriented adaptivity

Page 25

What is ASAP

A framework for defining QoD-oriented approximation techniques that can adaptively change, at run time, the degree of approximation applied

In ASAP techniques, decisions about when, how, and how much to approximate are taken dynamically during processing, with the goal of improving result quality with efficiency guarantees

Page 26

ASAP in our work

ASAP in the Small: definition of ASAP techniques for advanced architectures with a limited degree of distribution

ASAP in the Large: investigation of ASAP techniques in highly distributed architectures and emerging contexts

Moving towards a vision ...

Data streams: ASAP in the Small

Large-scale data management: ASAP in the Large

Page 27

PART II

ASAP in the Small

Page 28

ASAP in the Small

Definition of ASAP techniques for advanced architectures with a limited degree of distribution

Data Stream Management Systems

Adaptive techniques for combining exact (fast) and approximate (accurate) relaxed queries over dynamic (stream) data

Page 29

Data streams

Data Stream Management System

Page 30

Continuous queries

Data Stream Management System

Window

Result

Blocking vs non-blocking operators

Operator semantics relies on approximation

SELECT * FROM R [RANGE 5 MINUTES]

SELECT * FROM R [ROWS 4]

Continuous query
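The two window clauses can be sketched in Python as sliding windows whose content, after each arrival, is the current answer of `SELECT *`. This is a simplified reading: it assumes tuples arrive in timestamp order and that a tuple leaves the RANGE window once it is at least the range older than the newest arrival; actual DSMS semantics (for example, CQL's) differ in details such as boundary handling and when results are emitted.

```python
from collections import deque
from typing import Deque, List, Tuple

Item = Tuple[float, str]  # (timestamp in seconds, payload)

class RangeWindow:
    """Time-based sliding window, in the spirit of [RANGE 5 MINUTES]."""

    def __init__(self, range_s: float):
        self.range_s = range_s
        self.buf: Deque[Item] = deque()

    def insert(self, item: Item) -> List[Item]:
        # Assumes in-order arrivals; evict items at least range_s older than the newest.
        self.buf.append(item)
        while self.buf and self.buf[0][0] <= item[0] - self.range_s:
            self.buf.popleft()
        return list(self.buf)  # current window content = answer of SELECT *

class RowsWindow:
    """Count-based sliding window, in the spirit of [ROWS 4]."""

    def __init__(self, n: int):
        self.buf: Deque[Item] = deque(maxlen=n)  # deque drops the oldest item itself

    def insert(self, item: Item) -> List[Item]:
        self.buf.append(item)
        return list(self.buf)

by_time = RangeWindow(300)  # 5 minutes
for ts in [0, 100, 250, 320, 400]:
    current = by_time.insert((ts, f"e{ts}"))
print([ts for ts, _ in current])  # [250, 320, 400]

by_rows = RowsWindow(4)
for ts in range(6):
    current = by_rows.insert((ts, f"e{ts}"))
print([ts for ts, _ in current])  # [2, 3, 4, 5]
```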

Page 31

Key features

Data unboundedness

Dynamic environment

Unknown and dynamic characteristics for data at runtime

Limited resources with respect to incoming data

Increasingly aggressive sharing of resources and computation

Page 32

Approximation: adding velocity

Subject: load shedding; target: drops of tuples/probes; aim: memory consumption, CPU usage

Subject: query specification (preference-based: top-k, skyline); target: ranking function, relevant attributes; aim: result relevance

Subject: data reduction; target: sketches; aim: computability
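Load shedding trades accuracy for memory and CPU by dropping tuples. A minimal Python sketch of its simplest variant, random shedding, follows; the capacity and arrival-rate figures are hypothetical, and practical shedders also decide where in the plan to drop tuples, which this sketch does not model.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def random_shed(stream: Iterable[T], capacity: float, arrival_rate: float,
                rng: random.Random) -> Iterator[T]:
    """Random load shedding: keep each tuple with probability
    min(1, capacity / arrival_rate) and silently drop the others."""
    keep_p = min(1.0, capacity / arrival_rate)
    for item in stream:
        if rng.random() < keep_p:
            yield item

rng = random.Random(42)
# Hypothetical load: the system handles 1000 tuples/s but 4000 tuples/s arrive.
kept = list(random_shed(range(10_000), capacity=1000, arrival_rate=4000, rng=rng))
keep_p = 1000 / 4000
estimated_count = len(kept) / keep_p  # scale aggregates computed on the sample
print(len(kept), round(estimated_count))
```

Scaling the sampled aggregate by 1 / keep_p gives an approximate answer; its error grows as more tuples are shed.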

Page 33

Adaptivity: adding velocity

Subjects: limited resources under fixed plans (load shedding, operator scheduling); subquery sharing; query plans/tuple routing

Targets: arrival rate; workload; data characteristics, system conditions

Aims: throughput (output rate); memory consumption; accuracy; CPU usage

Page 34

Relaxed Queries

Relaxation skyline queries [2006, 2012]: for each window, only the best tuples according to (a subset of) the conditions contained in the query are returned to the user

Best tuples [2001]: defined in terms of a domination relationship between the tuples inside the window, based on their distance with respect to the query conditions

Never-empty result set

Given a precise query, there are several relaxation skyline queries: one for each set of query conditions to be relaxed
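The domination relationship can be sketched in Python for a single window. Assumptions: each relaxed condition is turned into a distance (0 when the condition holds), non-relaxed conditions would be applied beforehand as ordinary filters, and the conditions Price <= 10 and Quantity >= 10 are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]
Distance = Callable[[Row], float]  # 0 when the relaxed condition is satisfied

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse on every condition, strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def relaxation_skyline(window: List[Row], conditions: List[Distance]) -> List[Row]:
    """Tuples of the window whose distance vector no other tuple dominates."""
    vectors = [[d(r) for d in conditions] for r in window]
    return [r for r, v in zip(window, vectors)
            if not any(dominates(w, v) for w in vectors)]

# Hypothetical query conditions: Price <= 10 AND Quantity >= 10.
conditions: List[Distance] = [
    lambda r: max(0.0, r["price"] - 10),
    lambda r: max(0.0, 10 - r["qty"]),
]
window = [{"id": 1, "price": 16.8, "qty": 14},
          {"id": 2, "price": 2.34, "qty": 6},
          {"id": 3, "price": 12.0, "qty": 5}]
print([r["id"] for r in relaxation_skyline(window, conditions)])  # [1, 2]
```

Here the precise query would return nothing. Tuples satisfying every condition have the all-zero vector and dominate all others, so whenever the precise answer is non-empty the skyline coincides with it; otherwise the closest tuples are returned, which is why the result is never empty for a non-empty window.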

Page 35

Targeted Problem

Precise queries: very efficient for non-blocking operators, since a window-based execution is needed only for blocking operators, like joins and aggregates

May decrease user satisfaction: they may lead to the empty-answer or few-answer problem

Maximal accuracy

Relaxation skyline queries: execution overhead, since a specific window-based execution is needed even for selection

May increase user satisfaction: they avoid the empty-answer or few-answer problem

May decrease result accuracy

Page 36

Why ASAP

QoD-oriented approximation: continuous relaxation skyline queries, minimizing the distance of the result tuples from the user request

QoD-oriented adaptation: adapting query plans, providing a good compromise between user satisfaction and efficiency

Maximize accuracy

Page 37

ASAP technique

Goal: moving from one (possibly relaxed) query to another, maximizing accuracy during the processing

Adaptivity: adaptively selecting queries (execution plans), either precise or relaxed; decisions are based on statistics, relying on already processed data and computed results, and on heuristics

Page 38

ASAP technique

Precise continuous query Q

Relaxed continuous query Q1

Relaxed continuous query Q2

1. A QoD-oriented user request

2. An accuracy measure

3. A QoD-oriented adaptive framework

Page 39

QoD-oriented user request

Constraints provided by the user together with the initial request:

$\sigma_{AVG}$: average cardinality (selectivity constraint)

$\pi_{MAX}$: maximal distance from the specified query conditions (precision constraint)

$\mu$: weight for selectivity and precision (trade-off constraint)

Precise continuous query annotated with specific QoD constraints

Page 40

Accuracy

Q: annotated precise query

Q’: precise or relaxed query

Accuracy of Q’ with respect to Q: how far the result of Q’ is from the result of Q

A higher accuracy of Q’ implies a higher user satisfaction in obtaining the result of Q’

Three main components:

Precision operator $\pi$, depending on $\pi_{MAX}$

Selectivity operator $\sigma$, depending on $\sigma_{AVG}$

Trade-off constraint $\mu$

$$\alpha = \frac{\sigma \cdot \mu}{\log_c(\pi + 1)\cdot(1 - \mu) + 1}$$

Page 41

QoD adaptive framework

Monitor: collects aggregate values, based on the query in execution:
• selectivity
• precision

Assessor: determines whether some QoD conditions are satisfied:
• $sel^{+}$: too many results
• $sel^{-}$: too few results
• $relax^{+}$: imprecise results returned
• $relax^{-}$: good tuples discarded

Responder: based on the assessor predicates, determines whether the query plan should be modified

Page 42

QoD adaptive framework

Precise continuous query

Relaxed continuous query

$\psi_0 = \neg sel^{-} \lor \neg relax^{-}$

$\psi_1 = \neg sel^{+} \land \neg relax^{+}$

$\psi_2 = sel^{-} \land relax^{-}$

$\psi_3 = sel^{+} \lor relax^{+}$
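The predicates suggest one reading of the transitions: ψ0 = ¬ψ2 and ψ1 = ¬ψ3, so a precise query is kept while ψ0 holds and replaced by a relaxed one when ψ2 holds, while a relaxed query is kept while ψ1 holds and replaced by the precise one when ψ3 holds. A minimal Python sketch under that reading, with a single relaxed query, hypothetical statistics fields, and a hypothetical tolerance band around σ_AVG:

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class QoDRequest:
    sigma_avg: float   # σ_AVG: expected average result cardinality per window
    pi_max: float      # π_MAX: maximal accepted distance from the query conditions
    tol: float = 0.1   # hypothetical tolerance band around σ_AVG

@dataclass
class WindowStats:
    # Monitor output for the last window (hypothetical fields).
    cardinality: int     # number of result tuples returned
    max_distance: float  # largest distance of a returned tuple (0 for precise results)
    good_discarded: int  # discarded tuples whose distance is within π_MAX

def assess(req: QoDRequest, s: WindowStats) -> Dict[str, bool]:
    """Assessor: evaluate the four QoD predicates."""
    return {
        "sel+": s.cardinality > req.sigma_avg * (1 + req.tol),  # too many results
        "sel-": s.cardinality < req.sigma_avg * (1 - req.tol),  # too few results
        "relax+": s.max_distance > req.pi_max,                  # imprecise results
        "relax-": s.good_discarded > 0,                         # good tuples discarded
    }

def respond(mode: str, p: Dict[str, bool]) -> str:
    """Responder: switch query only when psi2 (from precise) or psi3 (from relaxed) holds."""
    if mode == "precise":
        psi2 = p["sel-"] and p["relax-"]
        return "relaxed" if psi2 else "precise"  # psi0 = not psi2: keep precise
    psi3 = p["sel+"] or p["relax+"]
    return "precise" if psi3 else "relaxed"      # psi1 = not psi3: keep relaxed

req = QoDRequest(sigma_avg=10, pi_max=2.0)
print(respond("precise", assess(req, WindowStats(3, 0.0, 5))))   # relaxed
print(respond("relaxed", assess(req, WindowStats(25, 1.0, 0))))  # precise
print(respond("relaxed", assess(req, WindowStats(10, 1.0, 0))))  # relaxed
```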

Page 43

Experimental results

[Figure: amortized accuracy and amortized processing time; medium selectivity (50%), equal relevance for selectivity and precision]

Page 44

PART III

ASAP in the Large

Page 45

ASAP in the Large

User interactions with the network and its many applications generate a large amount of valuable information, facts, and opinions with great socio-economic potential

This huge wealth of information is currently exploited well below its potential, because of the difficulty of accessing the data to retrieve relevant information

ASAP in the Large as a step towards the realization of an entity-relationship search paradigm for uncontrolled and wide information domains, with an impact on the qualitative and quantitative performance of systems for processing strongly interrelated and heterogeneous data in distributed dynamic environments

Page 46

Reference architecture

Open

DBaaS

Which sources?

How?

Page 47

Data sources

Data from different sources are highly heterogeneous in terms of structure, semantic richness, and quality

Geo-referenced, time-variant, and dynamic

Information sources may contain:
strongly related and semantically complex but relatively static data (e.g., Linked Open Data)
unstructured data, or data with a simple and defined structure
data dynamically generated by a multitude of diverse people (e.g., social networks, microblogs)
highly dynamic data generated by public or private institutions linked to the territory (data streams)

Graph-shaped data model

Page 48

Requests

Complex requests expressing relationships among the entities of user interest

Users are able to specify such requests only vaguely, since they cannot reasonably know the format and structure of the data encoding the relevant information

Requests may rely on the user profile and the request context

Examples: the nearest shops selling the book which my friend Luca likes; the biography of the author of the painting I am watching

Graph-based query languages

Page 49

Approximation: adding variety and veracity

Subjects: partial results; data reduction; source selection; ...

Targets: data source content, quality indicators; ...

Aims: throughput; communication overhead; Quality of Data (accuracy, coverage, freshness); latency; monetary budget; ...
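Source selection under a monetary budget can be illustrated with a simple greedy heuristic in Python. This is a generic budgeted-selection sketch, not the technique of any specific system: the single aggregated quality score, the per-source costs, and the source names are all hypothetical, and a real selector would weigh accuracy, coverage, freshness, and latency separately.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Source:
    name: str
    quality: float  # aggregated quality indicator in [0, 1] (hypothetical)
    cost: float     # monetary cost of querying the source, assumed > 0 (hypothetical)

def select_sources(sources: List[Source], budget: float) -> List[str]:
    """Greedy heuristic: take sources by decreasing quality per unit cost
    while the monetary budget allows."""
    chosen: List[str] = []
    spent = 0.0
    for s in sorted(sources, key=lambda s: s.quality / s.cost, reverse=True):
        if spent + s.cost <= budget:
            chosen.append(s.name)
            spent += s.cost
    return chosen

sources = [Source("linked_open_data", 0.9, 3.0),
           Source("microblog_stream", 0.4, 1.0),
           Source("sensor_feed", 0.7, 2.0)]
print(select_sources(sources, budget=4.0))  # ['microblog_stream', 'sensor_feed']
```

Greedy selection is fast but not optimal in general; it is shown only to make the subject/target/aim vocabulary concrete.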

Page 50

Adaptivity: adding variety and veracity

Subjects: load balancing; computational paradigm; source selection; query specification; ...

Targets: machine capabilities, workload distribution; data and system conditions (#nodes, failure rate, …); user feedback; ...

Aims: Quality of Data (accuracy, coverage, freshness); CPU utilization; throughput; communication overhead; latency; ...

Page 51

Targeted problem

Processing complex requests on heterogeneous and dynamic information sources can be costly:
request interpretation
processing on the available sources deemed relevant
aggregation of results into a consistent answer to be returned to the user

The answer may not guarantee user satisfaction:
the request could have been incorrectly interpreted
it could have been processed on inaccurate, incomplete, or unreliable data
it could have required a processing time inadequate to the urgency of the request

User intervention helps, but it is not always possible

Page 52

«Vision» [DBRank 2013]

User intervention can be limited by exploiting information on:

a) user context (geo-location, needs, …) and user profile (interests, habits, …)

b) data and processing quality

c) similar requests repeated over time

Page 53

«Vision»

a) Context + profile: overcome the one-size-fits-all logic without overloading the user with useless results

b) Quality: distinguish trustworthy sources from lower-quality ones

a) + b) allow us [Weikum, 2011]:
to choose the level of detail of an answer according to the user background
to prefer concise and timely answers, sacrificing result quality, in the case of a user on the move or in an emergency situation

Page 54

«Vision»

c) Information needs may be widespread among different users:
during or after an exceptional event (environmental emergencies or flash mob initiatives)
users belonging to the same community
users that are in the same place, possibly at different times

Common information needs: response times and interpretation errors can be limited by taking advantage of the experience gained by prior processing of similar requests

Page 55

Wearable Query (WQ)

Context information:
spatio-temporal coordinates of the request
its motivation
its environment (e.g., in terms of potential interaction and urgency)

User profile (provided by the user + induced by the system):
user background and fields of interest

Explicit request annotated with context and profile information

Page 56

Enabling «Materials»

Wearable Query Processing

Data quality and dynamicity indicators

User profile

Request context

Knowledge gained during execution, annotated with context, profile, quality, and dynamicity measures

Profiled Wearable Query Patterns (PWQPs): synthetic representations of a set of WQs processed in the past; correspondences among WQs and source portions

Source meta-information:
Yellow pages: source indexing on the basis of associated meta-information and represented concepts
Mappings: correspondences among different data source portions

Page 57

Why ASAP

QoD-oriented approximation:
Wearable queries: explicit requests annotated with context and profile information
Minimize the distance of the result from context and user information
Maximize accuracy, taking into account metadata generated by previous WQ executions and by data sources

QoD-oriented adaptation:
The space of sources is incrementally adapted to the peculiarities of the submitted requests
Simultaneous requests are processed by incrementally adapting them to the peculiarities of the space of sources and its evolution over time

Reduced user interaction

Page 58

Several issues

Which data summaries?

Which specific data quality measures?

How to take care of geo-spatial information?

Which kind of indexing techniques?

How to manage reuse?

...

Page 59

Ongoing work

Source data summaries and metadata information for Linked Data and their usage in Yellow Pages

Automatic acquisition of approximate geo-spatial contexts for crowdsourced (social) data

Page 60

PART IV

Conclusions

Page 61

Key concepts

It is no longer possible to rely only on precise queries

Two enabling concepts: approximation and adaptivity

A classification based on quality is useful: QoD, QoS

Need for combined solutions

Our group: emphasis on QoD-QoD approaches (ASAP)

Page 62

ASAP in summary

ASAP is not a new concept but a specific revisitation of existing approaches, focusing on QoD parameters

Useful in specific (and more controlled) contexts, and even more relevant as the complexity of the environment and of the data sources at hand increases

ASAP in the Large is still an ongoing activity, with several open issues

Thank you!