Peer-to-Peer Data Integration with Active XML - IBM Research · Tova Milo – Tel Aviv University...


Tova Milo – Tel Aviv University

Peer-to-Peer Data Integration with Active XML

Tova Milo

Tel-Aviv University


Active XML - Outline

Introduction

Active XML
• Active XML documents
• Active XML services

Novel issues
• Exchanging Active XML data
• Querying Active XML data
• Distribution and replication
• Security and Access control

Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations

Applications

Conclusion


Introduction


Distributed data management in P2P

Information is everywhere
• Data warehouses
• Databases
• Web sites
• PC, PDA, cell phones, home appliances, cars…

[Figure: a network of nodes holding XML data and offering services, connected through Web services]


The golden triangle of distributed data management

XML
• a standard for data representation & exchange

Query languages
• XPath, XQuery

Web services
• standards for distributed computing
• Activation of methods on remote servers

[Figure: a triangle whose corners are XML, XPath/XQuery and Web services]


What is Active XML (AXML)?

AXML is a declarative language for distributed information management and an infrastructure to support this language, in a peer-to-peer framework.


Active XML


Active XML documents

XML documents with embedded calls to Web services

Intensional
• Some of the data is given explicitly
• Some is given intensionally (i.e. the means to acquire data when needed are given)

Dynamic
• If the external sources change, the same document will provide different information
• Reaction to world changes


Not a new idea in databases, nor on the Web

Mixing calls and data is an old idea
• Procedural attributes in relational systems
• Basis of Object-oriented Databases

In the HTML world
• Sun’s JSP, PHP+MySQL

Calls to Web services inside XML documents
• Macromedia FLEX, Apache Jelly, Microsoft XAML

What is new is the exploitation of the idea…


A sample AXML document

<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp"><city>Paris</city></call>
  <call svc="TimeOut.GetEvents">exhibits</call>
</newspaper>

[Figure: the same document drawn as a tree, with newspaper at the root and title, date, GetTemp (parameter city "Paris") and GetEvents (parameter "Exhibits") as its children]

AXML documents may contain calls:
• to any existing Web services (e-bay.net, google.com…)
• to any AXML Web services (to be defined)
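As a rough illustration of what "XML with embedded calls" means to a program, the sketch below parses the sample newspaper document and lists its call elements. It uses only Python's standard library; the helper name list_calls and the "text" key used for plain-text parameters are assumptions made for this example, not part of AXML.

```python
import xml.etree.ElementTree as ET

# The sample newspaper document from the slide, written with straight quotes.
AXML_DOC = """<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp"><city>Paris</city></call>
  <call svc="TimeOut.GetEvents">exhibits</call>
</newspaper>"""


def list_calls(axml_text):
    """Return (service name, parameters) for every embedded call element."""
    root = ET.fromstring(axml_text)
    calls = []
    for call in root.iter("call"):
        # Parameters are given either as child elements or as plain text.
        params = {child.tag: child.text for child in call}
        if not params and call.text and call.text.strip():
            params = {"text": call.text.strip()}
        calls.append((call.get("svc"), params))
    return calls


calls = list_calls(AXML_DOC)
```

Everything outside the call elements is ordinary, explicit XML data; only the two calls are intensional.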


Materialization

We will see later that:
• Replacing the call by its result is not the only option
• Calls are not necessarily RPC-style synchronous invocations

<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp"><city>Paris</city></call>
  <call svc="TimeOut.GetEvents">exhibits</call>
</newspaper>

[Figure: the document tree; the GetTemp call is invoked through a SOAP call and its result, a temp node with value "16°C", is attached to the tree]

Result of the SOAP call:
<temp>16°C</temp>
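A minimal sketch of this step, assuming the simplest policy (the result replaces the call) and a synchronous invocation. The STUB_SERVICES table is a hypothetical stand-in for remote Web services; a real peer would issue a SOAP request instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for remote services: each takes the call element
# and returns an XML fragment as text.
STUB_SERVICES = {
    "Yahoo.GetTemp": lambda call: "<temp>16°C</temp>",
}


def materialize_once(root, services):
    """Replace each call whose service is known by that service's result."""
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "call" and child.get("svc") in services:
                result = ET.fromstring(services[child.get("svc")](child))
                result.tail = child.tail
                parent.remove(child)
                parent.insert(index, result)  # same position as the call
    return root


doc = ET.fromstring(
    "<newspaper><title>Le Monde</title><date>06/10/2003</date>"
    '<call svc="Yahoo.GetTemp"><city>Paris</city></call>'
    '<call svc="TimeOut.GetEvents">exhibits</call></newspaper>'
)
materialize_once(doc, STUB_SERVICES)
tags = [child.tag for child in doc]
```

The GetEvents call has no stub here, so it stays in the document untouched, which is the kind of partial materialization the later slides discuss.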


AXML Web services

Parameters: AXML data

Result: AXML data

Distribute computations: by sending as parameters data containing service calls, one can delegate some work to other peers

Partial computations: by returning data containing service calls, one can give the receiver control over these calls

Great flexibility


Calling an AXML service

<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <temp>16°C</temp>
  <call svc="TimeOut.GetEvents">exhibits</call>
</newspaper>

[Figure: the document tree; the GetEvents call is invoked through a SOAP call (still…) and returns an exhibits node that itself contains a call to GetExhibits with parameter City "Paris"]

Result of the call:
<exhibits>
  <call svc="Yahoo.GetExhibits"><city>Paris</city></call>
</exhibits>

Materialization is a recursive process

Termination is an issue
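One way to picture the recursion and the termination concern is a materializer with an explicit nesting budget. This is only a sketch under assumptions: the stub services, the answer returned for GetExhibits, and the self-calling "Loop" service are all invented for the example, and a depth bound is just one simple way to force termination.

```python
import xml.etree.ElementTree as ET

# Hypothetical stubs. GetEvents answers with data that contains another call;
# Loop answers with a call to itself and would never terminate unbounded.
STUBS = {
    "TimeOut.GetEvents": lambda call: (
        '<exhibits><call svc="Yahoo.GetExhibits">'
        "<city>Paris</city></call></exhibits>"
    ),
    "Yahoo.GetExhibits": lambda call: (
        "<exhibit><location>Le Louvre</location></exhibit>"
    ),
    "Loop": lambda call: '<call svc="Loop"/>',
}


def materialize(element, services, budget):
    """Expand calls below element; results are expanded in turn.

    budget bounds how many nested invocations may follow one another.
    """
    for index in range(len(element)):
        child, depth = element[index], budget
        while child.tag == "call" and child.get("svc") in services and depth > 0:
            result = ET.fromstring(services[child.get("svc")](child))
            element.remove(child)
            element.insert(index, result)
            child, depth = result, depth - 1
        if child.tag != "call":
            # The result (or plain data) may contain further calls.
            materialize(child, services, depth)
    return element


def remaining_calls(root):
    return [call.get("svc") for call in root.iter("call")]


SOURCE = '<newspaper><call svc="TimeOut.GetEvents">exhibits</call></newspaper>'
full = materialize(ET.fromstring(SOURCE), STUBS, 2)
partial = materialize(ET.fromstring(SOURCE), STUBS, 1)
looping = materialize(ET.fromstring('<d><call svc="Loop"/></d>'), STUBS, 5)
```

With a budget of 2 the document becomes fully extensional, with a budget of 1 the inner GetExhibits call is left in place, and the looping service stops after five invocations instead of running forever.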


Novel issues


Active XML - Outline

Introduction

Active XML
• Active XML documents
• Active XML services

Novel issues
• Exchanging Active XML data (SIGMOD’03, PODS’05)
• Querying Active XML data
• Distribution and replication
• Security and Access control

Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations

Applications

Conclusion


To call or not to call?

Materialization can be performed
• by the sender, before sending a document
• or by the receiver, after receiving it

[Figure: two copies of the newspaper document tree, each with the GetTemp and GetEvents calls and the temp value "16°C"]


Why control the materialization of calls?

For added functionality, e.g.
• Intensional data makes it possible to get up-to-date information

For security reasons or capabilities, e.g.
• I don’t trust this Web service/domain
• I don’t have the right credentials to invoke it
• It costs money
• Maybe the receiver doesn’t know Active XML!

For performance reasons, e.g.
• A proxy can invoke all the services on behalf of a PDA

… and many more reasons you can think of!


How to control it? Using types

We extend XML Schema with intensional types: XMLSchema_int

Casting algorithms use signatures of services: WSDL_int

[Figure: a sender and a receiver, each with its own capabilities, ACL, cost..., linked by a data exchange schema; document trees containing calls to f, g and q appear on both sides]


Rewritings

The Goal:
Given
• an AXML document d
• a schema s
Can we rewrite d so that it matches s?

Safe rewriting: one that for sure leads to s (we know without making any call)

Possible rewriting: one that possibly leads to s (depending on the answers of the services)
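The distinction can be made concrete on a deliberately tiny model; this is not the algorithm from the cited papers. Assumptions for this sketch: a document is a flat list of items, a schema is a set of allowed element labels plus a set of calls allowed to remain, each signature lists the labels a service may return, and only one level of invocation is considered. The "performance" label is hypothetical.

```python
def classify(document, allowed_labels, allowed_calls, signatures):
    """Return "safe", "possible" or "none" for a one-level rewriting."""
    verdict = "safe"
    for kind, name in document:
        if kind == "data":
            if name not in allowed_labels:
                return "none"
        elif name in allowed_calls:
            continue  # the schema lets this call stay in the document
        else:
            outputs = signatures.get(name, set())
            if not outputs & allowed_labels:
                return "none"  # no answer of this service can fit
            if not outputs <= allowed_labels:
                verdict = "possible"  # fits only for some answers
    return verdict


DOCUMENT = [
    ("data", "title"),
    ("data", "date"),
    ("call", "GetTemp"),
    ("call", "GetEvents"),
]
# Hypothetical signatures: what each service may return.
SIGNATURES = {"GetTemp": {"temp"}, "GetEvents": {"exhibits", "performance"}}

keep_events = classify(DOCUMENT, {"title", "date", "temp"}, {"GetEvents"}, SIGNATURES)
call_all = classify(DOCUMENT, {"title", "date", "temp", "exhibits"}, set(), SIGNATURES)
no_temp = classify(DOCUMENT, {"title", "date"}, set(), SIGNATURES)
```

In the first case the verdict is known from the signatures alone, without making any call; in the second it depends on what GetEvents actually returns.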


Results

The general problem is undecidable [MSS04]

Restrictions on the considered rewritings
• Left-to-right: No “going back and forth”
• K-depth: bound on the nesting of function calls
(Search space still infinite but finitely representable)

Under these restrictions
• We have algorithms to find safe/possible rewritings
• They are PTIME (for deterministic schemas)
• We can also do it between schemas

Implementation
• first demo at VLDB 2003 (customizable news syndication)


Active XML - Outline

Introduction

Active XML
• Active XML documents
• Active XML services

Novel issues
• Exchanging Active XML data
• Querying Active XML data (SIGMOD’04, PODS’05)
• Distribution and replication
• Security and Access control

Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations

Applications

Conclusion


Querying AXML Data

Given a (tree pattern) query:
/newspaper[temp > 18°C]/exhibits//exhibit[location="Le Louvre"]

Materialize the document?

Call only the services that may contribute data to the query answer.

The problem: Lazy evaluation of service calls
To call or not to call, this time when evaluating a query

[Figure: the newspaper document tree, with title "Le Monde", the calls getDate, GetTemp (city "Paris") and GetEvents ("Exhibits"), an exhibits node containing the call GetExhibits (City "Paris"), and a temp node "19°C"]


Lazy evaluation

Difficulties:
• Calls can be found everywhere in the document
• May appear dynamically (as a result of previous calls)
• May become (ir)relevant due to previous invocations
• Need to take signatures of calls into consideration

Possible approach: modify the query processor
• Trigger the calls found on the way
• Not so great:
  – Computation is blocked
  – Optimization opportunities are lost

Our solution:
• Derive queries that find the relevant calls (recursively)
• Use service signatures to prune irrelevant calls
• Parallel call invocations
• Pushing queries to capable external sources
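The pruning idea can be sketched on a much simpler query class than the tree patterns of the talk. Assumptions for this sketch: the query is a plain chain of child steps (no //, no predicates), each signature is a hypothetical set of labels the service may return, and calls that would appear dynamically in results are ignored.

```python
import xml.etree.ElementTree as ET


def relevant_calls(element, steps, signatures):
    """Names of calls under element that may yield nodes matching steps.

    steps lists the child labels still to be matched below element.
    """
    if not steps:
        return []
    found = []
    for child in element:
        if child.tag == "call":
            # Keep the call only if its signature can produce the next label.
            if steps[0] in signatures.get(child.get("svc"), set()):
                found.append(child.get("svc"))
        elif child.tag == steps[0]:
            found.extend(relevant_calls(child, steps[1:], signatures))
    return found


DOC = ET.fromstring(
    "<newspaper><title>Le Monde</title>"
    '<call svc="Yahoo.GetTemp"><city>Paris</city></call>'
    '<call svc="TimeOut.GetEvents">exhibits</call>'
    '<exhibits><call svc="Yahoo.GetExhibits"><city>Paris</city></call></exhibits>'
    "</newspaper>"
)
# Hypothetical signatures: labels each service may return.
SIGNATURES = {
    "Yahoo.GetTemp": {"temp"},
    "TimeOut.GetEvents": {"exhibits"},
    "Yahoo.GetExhibits": {"exhibit"},
}

for_exhibits = relevant_calls(DOC, ["exhibits", "exhibit"], SIGNATURES)
for_temp = relevant_calls(DOC, ["temp"], SIGNATURES)
```

For the exhibits path, GetTemp is pruned by its signature alone and never needs to be invoked; the calls that remain are independent and could be issued in parallel.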


Active XML - Outline

Introduction

Active XML
• Active XML documents
• Active XML services

Novel issues
• Exchanging Active XML data
• Querying Active XML data
• Distribution and replication
• Security and Access control

Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations

Applications

Conclusion


Active XML peers


Distributed data management in P2P

[Figure: the network of XML data, services and Web services shown in the introduction, now annotated with AXML]


What is an AXML peer?

Repository: manages persistent AXML data

Client: uses (AXML) Web services to dynamically enrich data

Server: easy (declarative) definition of AXML services


Global architecture

[Figure: global architecture, with components labeled AXML store, service descriptions, AXML engine and Query engine, and AXML and XML data exchanged between them]


Active XML - Outline

Introduction

Active XML
• Active XML documents
• Active XML services

New issues
• Exchanging Active XML data
• Querying Active XML data
• Distribution and replication
• Security and Access control

Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations

Applications
• P2P auctions
• News syndication
• Other applications

Conclusion


Managing persistent AXML data

“Our newspaper should have its temperature information refreshed daily. New exhibits should be fetched every week and archived for 6 months”

Service call results enrich the document (calls can be kept for possible future reuse)

Main issues:
• When to activate a service call? (pull/push, implicit/explicit)
• What to do with its result? (add/replace/merge)


Example: AXML document with control attributes

<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp" mode="lazy" valid="1 day" merge="replace">
    <city>Paris</city>
  </call>
  <call svc="TimeOut.GetEvents" mode="every Monday morning" valid="6 months" merge="append">exhibits</call>
</newspaper>
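How a peer might interpret the valid and merge attributes can be sketched as follows. The parsing rules are assumptions made for this example: the slide does not define the attribute grammar, months are approximated as 30 days, and all function names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed reading of the valid attribute; a month is approximated as 30 days.
UNIT_DAYS = {"day": 1, "days": 1, "month": 30, "months": 30}


def parse_validity(valid):
    """Turn a string such as "1 day" or "6 months" into a timedelta."""
    amount, unit = valid.split()
    return timedelta(days=int(amount) * UNIT_DAYS[unit])


def is_stale(fetched_at, valid, now):
    """True once a result fetched at fetched_at has outlived valid."""
    return now - fetched_at > parse_validity(valid)


def merge_result(stored, new, merge):
    """Combine stored results with a new one, following the merge attribute."""
    if merge == "replace":
        return [new]
    if merge == "append":
        return stored + [new]
    raise ValueError(f"unknown merge policy: {merge}")


fetched = datetime(2003, 10, 6, 8, 0)
same_day = is_stale(fetched, "1 day", datetime(2003, 10, 6, 20, 0))
two_days_later = is_stale(fetched, "1 day", datetime(2003, 10, 8, 8, 0))
temps = merge_result(["<temp>16°C</temp>"], "<temp>19°C</temp>", "replace")
events = merge_result(["<exhibits>A</exhibits>"], "<exhibits>B</exhibits>", "append")
```

Under this reading, the temperature is refreshed in place once it is more than a day old, while weekly exhibit results accumulate and can be dropped once their six months have elapsed.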


Providing declarative AXML services

Services can be defined by queries or updates over the AXML documents of the repository (XQuery, XPath, XUpdate)

Users can subscribe to services

Services can be composed (BPEL4WS)

Which (lazy) service calls may contribute to the answer?

let service GetExhibitsByLocation($loc) be
  for $a in document("newspaper.xml")/newspaper/exhibits,
      $b in $a//exhibit
  where $b/@name = $loc
  return <exhibits> {$b} </exhibits>
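For readers less familiar with XQuery, the same service can be approximated in Python over an in-memory document. This is only an analogue: the sample repository content (including the second exhibit) is made up for the example, and lazy calls inside the document are not handled.

```python
import copy
import xml.etree.ElementTree as ET

# Made-up repository content standing in for newspaper.xml.
NEWSPAPER = ET.fromstring(
    "<newspaper><exhibits>"
    '<exhibit name="Le Louvre"/>'
    '<exhibit name="Orsay"/>'
    "</exhibits></newspaper>"
)


def get_exhibits_by_location(root, loc):
    """One <exhibits> wrapper per exhibit whose name attribute equals loc."""
    answers = []
    for exhibits in root.findall("exhibits"):      # /newspaper/exhibits
        for exhibit in exhibits.iter("exhibit"):   # $a//exhibit
            if exhibit.get("name") == loc:         # where $b/@name = $loc
                wrapper = ET.Element("exhibits")
                wrapper.append(copy.deepcopy(exhibit))
                answers.append(wrapper)
    return answers


louvre = get_exhibits_by_location(NEWSPAPER, "Le Louvre")
nowhere = get_exhibits_by_location(NEWSPAPER, "Nowhere")
```

As in the query, each matching exhibit is returned in its own exhibits element; an AXML peer would additionally have to decide which lazy calls under exhibits to invoke before answering.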


Active XML - Outline

Introduction

Active XML
• Active XML documents
• Active XML services

New issues
• Exchanging Active XML data
• Querying Active XML data
• Distribution and replication
• Security and Access control

Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations (PODS’04, PODS’05)

Applications

Conclusion


Applications


Demos

Peer-to-peer auctions (VLDB 2002 demo)
• Discovery of new peers/auctions through intensional answers

RSS News syndication (VLDB 2003 demo)
• Customization of services through schemas + news subscriptions

Decentralized management of patient data (VLDB 2004 demo)
• Use AXML to coordinate the integration of data and privacy enforcement services in a uniform way

Querying Business Processes (VLDB 2005 demo)
• Use AXML to model and query BPEL specifications

Others…

A powerful framework for the fast development of distributed, data-centric applications.


Other applications

Dynamic warehouse on food risk management (E.dot)
• Use AXML as the platform for the warehouse definition, construction and maintenance

Network configuration (SWAN)
• Consider using AXML exchange of information to configure hardware/software components

Software distribution (EDOS)
• Consider using AXML to customize distributions and keep your view of the software fresh


Conclusion


AXML documents and services

A simple paradigm…

…that allows for new, powerful features

• Intensional parameters and results: AXML documents can be exchanged
• Support for continuous services (streams of answers)
• Control over the exchange of AXML data
• Lazy query evaluation

AXML implementation goes Open Source (ObjectWeb consortium)


Thanks:

Serge Abiteboul, Omar Benjelloun, Ioana Manolescu, Bernd Amann, Jerome Baumgarten, Bogdan Cautis, and many others…