On-the-fly Data Integration


Page 1: On-the-fly Data Integration

09.05.2008

Mapping Data to Queries

Martin Hentschel

Systems Group, ETH Zurich

Page 2: On-the-fly Data Integration

09.05.2008 Martin Hentschel/Systems Group, ETH Zurich/[email protected]

“…, but the real advantage of XML is precisely that it allows you to go from Point A to destinations unknown.”

-- Larry O’Brien, Microsoft

Page 3: On-the-fly Data Integration

Goals

Integrate data from various data feeds

- Light-weight
- Easy to use
- Fast

Page 4: On-the-fly Data Integration

Goals

Integrate data from various data feeds

- Light-weight: mapping rules
- Easy to use: based on a common language (XQuery)
- Fast: implements research ideas (YFilter)

Page 5: On-the-fly Data Integration

Targets

- Health care: electronic health records (Health Level 7)
- Finance: exchange of financial data (XBRL)
- Web services: news feeds, weather
- Every domain that uses several data sources

Page 6: On-the-fly Data Integration

Example

Find the most powerful car

<db>
  <car> <name>Ford</name> <hp>130</hp> </car>
</db>

<daten>
  <auto> <name>VW Golf</name> <ps>150</ps> </auto>
</daten>

Page 7: On-the-fly Data Integration

Example

Find the most powerful car

<db>
  <car> <name>Ford</name> <hp>130</hp> </car>
</db>

<daten>
  <auto> <name>VW Golf</name> <ps>150</ps> </auto>
</daten>

daten is-a db;
auto is-a car;
ps is-a hp;

Page 8: On-the-fly Data Integration

Example

Find the most powerful car

Apply standard XQuery

<db>
  <car> <name>Ford</name> <hp>130</hp> </car>
</db>

<daten>
  <auto> <name>VW Golf</name> <ps>150</ps> </auto>
</daten>

daten is-a db;
auto is-a car;
ps is-a hp;

let $max := max(//hp)
for $car in //car
where $car/hp = $max
return $car

Page 9: On-the-fly Data Integration

Example

Find the most powerful car

Apply standard XQuery

<db>
  <car> <name>Ford</name> <hp>130</hp> </car>
</db>

<daten>
  <auto> <name>VW Golf</name> <ps>150</ps> </auto>
</daten>

daten is-a db;
auto is-a car;
ps is-a hp;

let $max := max(//hp)
for $car in //car
where $car/hp = $max
return $car

Result

<auto> <name>VW Golf</name> <ps>150</ps> </auto>
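
The effect of the three rules on the query can be imitated in a few lines of Python. This is only an illustrative sketch, not the MDQ implementation: the `IS_A` table and the helpers `matches`, `find_all`, and `most_powerful` are hypothetical names, and the rules are modeled as plain name aliases without chaining.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of "daten is-a db; auto is-a car; ps is-a hp;"
IS_A = {"daten": "db", "auto": "car", "ps": "hp"}

FEEDS = [
    "<db><car><name>Ford</name><hp>130</hp></car></db>",
    "<daten><auto><name>VW Golf</name><ps>150</ps></auto></daten>",
]

def matches(elem, name):
    """True if the element is called `name` or is mapped to it by a rule."""
    return elem.tag == name or IS_A.get(elem.tag) == name

def find_all(roots, name):
    """Rough analogue of //name over all feeds, honoring the is-a rules."""
    return [e for root in roots for e in root.iter() if matches(e, name)]

def most_powerful(roots):
    """Mirror of the XQuery: take max(//hp), then keep cars whose hp equals it."""
    max_hp = max(float(e.text) for e in find_all(roots, "hp"))
    return [
        car for car in find_all(roots, "car")
        if any(matches(c, "hp") and float(c.text) == max_hp for c in car)
    ]

roots = [ET.fromstring(feed) for feed in FEEDS]
result = most_powerful(roots)
print([ET.tostring(car, encoding="unicode") for car in result])
```

The query text itself never mentions `auto` or `ps`; the matching step alone decides that the German feed takes part, and the winning element is returned under its original name.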

Page 10: On-the-fly Data Integration

Usage Scenarios

Continuous query processing

[Figure: a DSMS holding queries and rules, with streaming input events going in and streaming output events coming out.]

Page 11: On-the-fly Data Integration

Usage Scenarios

Publish/subscribe systems

[Figure: publishers send data to an enhanced broker that holds the rules; subscribers register subscriptions with the broker and receive data.]

Page 12: On-the-fly Data Integration

Usage Scenarios

Data integration

[Figure: Source 1, Source 2, ..., Source x send data to a data handler that holds the rules; homogeneous data flows on to the company's data store.]

Page 13: On-the-fly Data Integration

The Is-A Rule

Map XML elements

car is-a vehicle;

Expresses a substitutability relationship
- Like in object-oriented design
- Use the car wherever vehicles are expected

It follows that //vehicle also returns car elements
- Returned as car
- Not transformed into vehicle
- Consistent with the OO approach
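
The one-directional nature of the rule can be shown with a small Python sketch. It is an assumption-laden illustration rather than the system's code: `IS_A`, `descendants_named`, and the `<garage>` sample document are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of the rule "car is-a vehicle;"
IS_A = {"car": {"vehicle"}}

def descendants_named(root, name):
    """Rough analogue of //name: elements named `name` or substitutable for it."""
    return [
        e for e in root.iter()
        if e.tag == name or name in IS_A.get(e.tag, set())
    ]

# Made-up sample document with one vehicle and one car.
doc = ET.fromstring(
    "<garage>"
    "<vehicle><name>Tram</name></vehicle>"
    "<car><name>Ford</name></car>"
    "</garage>"
)

vehicles = descendants_named(doc, "vehicle")
cars = descendants_named(doc, "car")
print([e.tag for e in vehicles])  # the car is included and keeps its own name
print([e.tag for e in cars])      # a vehicle is not returned where a car is expected
```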

Page 14: On-the-fly Data Integration

The Is-A Rule

Map path expressions

- XPath path expressions
- Left-hand side may include predicates

german/car is-a auto;
auto is-a german/car;

car[@ps < 100] is-a slow/vehicle;

Page 15: On-the-fly Data Integration

The Is-A Rule

Specify contexts

- Element names could be used differently in different contexts
- Scope applicability of rules
- Further refinement

car in cars[@country='Germany'] is-a auto;
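
A context-scoped rule can be sketched in Python as a check on the enclosing element. The reading of `in` as "direct child of" is an assumption made here for simplicity, and the `<world>` sample document, `PARENT`, and `is_auto` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Made-up sample data: the same element name "car" in two different contexts.
doc = ET.fromstring(
    "<world>"
    "<cars country='Germany'><car><name>VW Golf</name></car></cars>"
    "<cars country='USA'><car><name>Ford</name></car></cars>"
    "</world>"
)

# ElementTree has no parent pointers, so build a child -> parent map once.
PARENT = {child: parent for parent in doc.iter() for child in parent}

def is_auto(elem):
    """Rule "car in cars[@country='Germany'] is-a auto;" ('in' = direct child, assumed)."""
    if elem.tag == "auto":
        return True
    parent = PARENT.get(elem)
    return (
        elem.tag == "car"
        and parent is not None
        and parent.tag == "cars"
        and parent.get("country") == "Germany"
    )

autos = [e.findtext("name") for e in doc.iter() if is_auto(e)]
print(autos)
```

Only the car inside the German `cars` element counts as an `auto`; the other car is unaffected by the rule.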

Page 16: On-the-fly Data Integration

The Is-A Rule

Element construction

- Map elements
- Transform data, e.g. for integration of very diverse data

auto as $a is-a
<car>
  <kw>{$a/ps * 0.74}</kw>
</car>;

<car> <name>Ford</name> <kw>100</kw> </car>

<auto> <name>VW Golf</name> <ps>150</ps> </auto>
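
What the constructor in this rule computes can be sketched in Python. The function name `auto_as_car` is hypothetical, and rounding the result to a whole number is an assumption of this sketch; the rule itself only states the multiplication by 0.74.

```python
import xml.etree.ElementTree as ET

def auto_as_car(auto):
    """Build the <car> view named by the rule; the source <auto> is left untouched."""
    car = ET.Element("car")
    kw = ET.SubElement(car, "kw")
    # Conversion factor taken from the rule; rounding is an assumption of this sketch.
    kw.text = str(round(float(auto.findtext("ps")) * 0.74))
    return car

auto = ET.fromstring("<auto><name>VW Golf</name><ps>150</ps></auto>")
car = auto_as_car(auto)
print(ET.tostring(car, encoding="unicode"))
```

A query that compares `kw` values can then treat the constructed element like any native `car`.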

Page 17: On-the-fly Data Integration

Implementation

Several possibilities

MDQ approach
- Native approach, novel MDQ data model
- Allows lazy execution

Query rewrite
- E.g. //(car | auto | vehicle | ...)
- Does not scale

Data translation
- Translate input data
- Big overhead
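
The query-rewrite alternative can be sketched in Python to show why the unions grow with the rule set. The rule list and the names `substitutes` and `rewrite_step` are hypothetical, and following is-a chains transitively is an assumption of this sketch.

```python
# Hypothetical rule set: each pair (left, right) stands for "left is-a right;".
RULES = [("auto", "car"), ("car", "vehicle"), ("pkw", "auto")]

def substitutes(name, rules):
    """All element names usable where `name` is expected, following is-a chains."""
    found, frontier = {name}, [name]
    while frontier:
        target = frontier.pop()
        for left, right in rules:
            if right == target and left not in found:
                found.add(left)
                frontier.append(left)
    return sorted(found)

def rewrite_step(name, rules):
    """Rewrite the path step //name into an explicit union over all substitutes."""
    return "//(" + " | ".join(substitutes(name, rules)) + ")"

print(rewrite_step("vehicle", RULES))
```

Every step of every query has to be expanded this way, so each new rule can lengthen many unions at once.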

Page 18: On-the-fly Data Integration

MDQ Data Model

Classical XML tree model

<daten>
  <auto> <name>Golf</name> <ps>150</ps> </auto>
</daten>

[Figure: tree with root node daten, its child node auto, and auto's child nodes name ("Golf") and ps ("150").]

Page 19: On-the-fly Data Integration

MDQ Data Model

MDQ data model

Move names from nodes to edges

<daten>
  <auto> <name>Golf</name> <ps>150</ps> </auto>
</daten>

[Figure: the same tree with the names daten, auto, name, and ps labelling edges instead of nodes; the leaves hold "Golf" and "150".]

Page 20: On-the-fly Data Integration

MDQ Data Model

Application of mapping rules

<daten>
  <auto> <name>Golf</name> <ps>150</ps> </auto>
</daten>

daten is-a db;
auto is-a car;
ps is-a hp;

[Figure: the edge-labelled tree after applying the rules; the edges daten, auto, and ps additionally carry the names db, car, and hp.]
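
The idea of keeping names on edges, so that one edge can carry several names, can be sketched in Python. The `Node` class and `apply_rules` are hypothetical, and the sketch applies the rules eagerly to the whole tree for brevity, whereas the slides describe lazy application.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node without a name; names live on the edges leading to its children."""
    text: str = ""
    edges: list = field(default_factory=list)  # list of (set of names, child Node)

    def add(self, name, child):
        self.edges.append(({name}, child))
        return child

    def children(self, name):
        """Follow every outgoing edge that carries `name`."""
        return [child for names, child in self.edges if name in names]

def apply_rules(node, rules):
    """Add the right-hand name to every edge labelled with a rule's left-hand name."""
    for names, child in node.edges:
        for left, right in rules:
            if left in names:
                names.add(right)
        apply_rules(child, rules)

# <daten><auto><name>Golf</name><ps>150</ps></auto></daten>
root = Node()
daten = root.add("daten", Node())
auto = daten.add("auto", Node())
auto.add("name", Node("Golf"))
auto.add("ps", Node("150"))

apply_rules(root, [("daten", "db"), ("auto", "car"), ("ps", "hp")])

car = root.children("db")[0].children("car")[0]
print(car.children("hp")[0].text, car.children("ps")[0].text)
```

Both lookups reach the same node, so no data is copied or translated; the rule only adds a second way to address the edge.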

Page 21: On-the-fly Data Integration

Lazy Evaluation, YFilter

- Built from left-hand side of rules
- Non-deterministic finite state machine

Main idea
- Evaluate XQuery program
- Iterate through data model
- Report to YFilter
- Apply rules only when reaching an accepting state

R1: daten is-a db;
R2: auto is-a car;
R3: ps is-a hp;

[Figure: state machine with a * transition at the start state and transitions labelled daten, auto, and ps leading to accepting states R1, R2, and R3.]
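
A much simplified automaton in the spirit of this slide can be sketched in Python. It is not YFilter itself: the names `build_nfa`, `advance`, and `walk` are hypothetical, left-hand sides are limited to plain `a/b` paths without predicates, and the sketch walks the whole document up front, whereas the slide describes the traversal as driven by query evaluation.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: rule id -> left-hand side path (right-hand sides omitted).
RULES = {"R1": "daten", "R2": "auto", "R3": "ps"}

def build_nfa(rules):
    """Build a prefix-sharing automaton over the left-hand sides; state 0 is the start."""
    transitions, accepting, next_state = {}, {}, 1
    for rule_id, path in rules.items():
        state = 0
        for step in path.split("/"):
            if (state, step) not in transitions:
                transitions[(state, step)] = next_state
                next_state += 1
            state = transitions[(state, step)]
        accepting.setdefault(state, []).append(rule_id)
    return transitions, accepting

def advance(active, name, transitions):
    """Consume one element name; the start state stays active (the * loop)."""
    reached = {transitions[(s, name)] for s in active if (s, name) in transitions}
    return reached | {0}

def walk(elem, active, nfa, fired):
    """Report each visited element to the automaton; record rules only on accepting states."""
    transitions, accepting = nfa
    active = advance(active, elem.tag, transitions)
    for state in sorted(active):
        for rule_id in accepting.get(state, []):
            fired.append((elem.tag, rule_id))
    for child in elem:
        walk(child, active, nfa, fired)

doc = ET.fromstring("<daten><auto><name>Golf</name><ps>150</ps></auto></daten>")
fired = []
walk(doc, {0}, build_nfa(RULES), fired)
print(fired)
```

Elements such as `name` never reach an accepting state, so no rule work is done for them.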

Page 22: On-the-fly Data Integration

Experiment: Throughput

Complex query (multiple scans, joins)

QR (query rewrite): too many unions; DT (data translation): overhead of translation

Page 23: On-the-fly Data Integration

Experiment: Throughput

Simple query

QR: fewer unions; DT: still overhead of translation

Page 24: On-the-fly Data Integration

Experiment: Throughput

One input message, bundle of queries evaluated at once

QR: even more unions; DT: less overhead, transforms the input message only once

Page 25: On-the-fly Data Integration

Again: Advantages

- Performance: novel data model, lazy execution
- Light-weight: mapping rules are small units
- Extensibility: add more rules as new sources are adopted
- Flexibility: complex mappings through element constructors

Page 26: On-the-fly Data Integration

The End

Visit our website, LIVE DEMO! http://fifthelement.inf.ethz.ch:8080/rules

Write us, please! [email protected]