Omar Benjelloun – Active XML
Active XML: A data-centric perspective on Web services
Omar Benjelloun, INRIA Futurs
With: Serge Abiteboul, Tova Milo, and many others.
April 30th, 2004
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
Novel issues
• Exchanging Active XML data
• Querying Active XML data
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations
Applications
Conclusion
Introduction
Distributed data management in P2P
Information is everywhere: data warehouses, databases, Web sites, PCs, PDAs, cell phones, home appliances, cars…
[Figure: peers holding XML data and offering services, connected through the Internet and Web services]
The golden triangle of distributed data management
XML: a standard for data representation & exchange
• Extensible Markup Language
• Labeled ordered trees
• Types: XML Schema / tree automata
Query languages
• XPath, XQuery
Web services: standards for distributed computing
• SOAP, WSDL, UDDI
• Activation of methods on remote servers
• Many burgeoning standard proposals (choreography, QoS, user interface, etc.)
[Figure: a triangle linking XML, XQuery/XPath, and SOAP/WSDL]
What is Active XML (AXML)?
AXML is a declarative language for distributed information management and an infrastructure to support this language, in a peer-to-peer framework.
Active XML
Active XML documents
XML documents with embedded calls to Web services
Intensional
• Some of the data is given explicitly
• Some is given intensionally (i.e. the means to acquire data when needed are given)
Dynamic
• If the external sources change, the same document will provide different information
• Reaction to world changes
Not a new idea in databases, nor on the Web
Mixing calls and data is an old idea
• Procedural attributes in relational systems
• Basis of object-oriented databases
In Web programming
• Sun’s JSP, PHP+MySQL
Calls to Web services inside documents
• Macromedia FLEX, Apache Jelly, Microsoft XAML
What is new is the exploitation of the idea…
Web services in brief
A number of standards
• XML
• SOAP: Exchange of messages between applications
• WSDL: Description of service interfaces (e.g. input/output types)
• UDDI: Advertisement and discovery of services
• … other proposed standards (choreography, security, etc.)
For us: means to provide, invoke and describe remote functions with XML input/output.
They make AXML documents universally understandable.
A sample AXML document
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp">
    <city>Paris</city>
  </call>
  <call svc="TimeOut.GetEvents">
    exhibits
  </call>
</newspaper>
[Figure: the same document drawn as a tree, with the newspaper root, the title and date nodes, and the GetTemp and GetEvents call nodes with their parameters]
AXML documents may contain calls:
• to any existing Web services (e-bay.net, google.com…)
• to any AXML Web services (to be defined)
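To make the structure of the sample concrete, here is a minimal Python sketch that parses the document with the standard library and lists the embedded calls. The helper name `list_calls` is hypothetical and is not part of the AXML system; the sketch only assumes that calls are `<call>` elements carrying a `svc` attribute, as on the slide.

```python
import xml.etree.ElementTree as ET

# The sample AXML document from the slide (straight quotes).
AXML_DOC = """<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp">
    <city>Paris</city>
  </call>
  <call svc="TimeOut.GetEvents">exhibits</call>
</newspaper>"""


def list_calls(xml_text):
    """Return (service name, parameter text) for every embedded call.

    Hypothetical helper: it only reads the document, no service is invoked.
    """
    root = ET.fromstring(xml_text)
    return [
        (call.get("svc"), "".join(call.itertext()).strip())
        for call in root.iter("call")
    ]


if __name__ == "__main__":
    for service, parameter in list_calls(AXML_DOC):
        print(service, "<-", parameter)
```

The explicit data (title, date) and the intensional data (the two calls) live side by side in the same tree, which is the point of the slide.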
Materialization
[Figure: the sample newspaper document and its tree; the Yahoo.GetTemp call with parameter city = “Paris” is invoked through a SOAP call and replaced by its result, <temp>16°C</temp>]
We will see later that:
• Replacing the call by its result is not the only option
• Calls are not necessarily RPC-style synchronous invocations
AXML Web services
Parameters: AXML data
Result: AXML data
Distribute computations: by sending as parameters data containing service calls, one can delegate some work to other peers.
Partial computations: by returning data containing service calls, one can give to the receiver the control of these calls.
Great flexibility
Calling an AXML service
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <temp>16°C</temp>
  <call svc="TimeOut.GetEvents">
    exhibits
  </call>
</newspaper>
[Figure: the document tree; the GetEvents call is invoked through a SOAP call (still…) and its result contains a call to GetExhibits with parameter city = “Paris”]
Result returned by the call:
<exhibits>
  <call svc="Yahoo.GetExhibits">
    <city>Paris</city>
  </call>
</exhibits>
Materialization is a recursive process
Termination is an issue
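The recursion and the termination concern can be illustrated with a small Python sketch. It is not the AXML engine: the services are local stub functions (an assumption standing in for SOAP invocations), the `<exhibit>` result is invented for the example, and the `max_calls` budget is just one simple way to guard against non-termination.

```python
import xml.etree.ElementTree as ET


def get_events(call):
    # Stub for TimeOut.GetEvents: its result itself contains a call.
    return [ET.fromstring(
        '<exhibits><call svc="Yahoo.GetExhibits"><city>Paris</city></call></exhibits>'
    )]


def get_exhibits(call):
    # Stub for Yahoo.GetExhibits; the returned element is made up.
    return [ET.fromstring("<exhibit>Le Louvre</exhibit>")]


# Hypothetical registry standing in for remote Web services.
SERVICES = {"TimeOut.GetEvents": get_events, "Yahoo.GetExhibits": get_exhibits}


def find_call(root):
    """Return (parent, call) for the first call to a known service, or None."""
    for parent in root.iter():
        for child in parent:
            if child.tag == "call" and child.get("svc") in SERVICES:
                return parent, child
    return None


def materialize(root, max_calls=20):
    """Replace calls by their results until none is left or the budget is spent."""
    invoked = 0
    found = find_call(root)
    while found is not None:
        if invoked == max_calls:
            raise RuntimeError("call budget exhausted; materialization may not terminate")
        parent, call = found
        index = list(parent).index(call)
        parent.remove(call)
        for offset, result in enumerate(SERVICES[call.get("svc")](call)):
            parent.insert(index + offset, result)
        invoked += 1
        found = find_call(root)  # results may have brought new calls
    return root
```

Calls to services that are not in the registry are simply left in place, so the document stays partly intensional.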
Organization
Novel issues raised by the AXML language
• Exchange of AXML data
• Querying AXML data
Supporting infrastructure
• AXML peers:
– Management of persistent AXML data
– Declarative AXML services
Applications
Novel issues
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
Novel issues
• Exchanging Active XML data (SIGMOD 2003)
• Querying Active XML data
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations
Applications
Conclusion
To call or not to call ?
Materialization can be performed by the sender, before sending a document… or by the receiver, after receiving it.
[Figure: the newspaper document tree shown on both sides of the exchange, once with the GetTemp call left intensional and once with the call replaced by its result, temp = “16°C”]
Why control the materialization of calls?
For added functionality, e.g.
• Intensional data lets the receiver get up-to-date information.
For security reasons or capabilities, e.g.
• I don’t trust this Web service/domain,
• I don’t have the right credentials to invoke it,
• It costs money,
• Maybe the receiver doesn’t know Active XML!
For performance reasons, e.g.
• A proxy can invoke all the services on behalf of a PDA.
… and many more reasons you can think of!
How to control it? Using types
We extend XML Schema with intensional types: XML Schema_int
Static analysis algorithms use signatures of services: WSDL_int
[Figure: a sender and a receiver, each with its own capabilities, ACL, cost...; they agree on a data exchange schema, and the exchanged document contains calls to services f, g, q and r]
The extended schema language
To simplify, we use here a DTD-like syntax
Data:
  newspaper = title.date.(GetTemp|temp).(GetEvents|exhibit*)
  title = data
  date = data
  temp = data
  city = data
  exhibit = title.(GetDate|date)
Functions:
  GetTemp(city) -> temp
  GetEvents(data) -> (exhibit|performance)*
  GetDate(title) -> date
Rewriting: replace call(s) by an arbitrary output of the service.
[Figure: the newspaper document tree with title, date and the GetTemp and GetEvents calls]
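As a rough illustration of how such a content model constrains a document, the sketch below translates the DTD-like expressions into Python regular expressions and checks the word formed by the labels of a node's children. The translation in `to_regex` and the function `conforms` are assumptions made for this example (only `.`, `|`, `*` and parentheses are handled); the actual system relies on automata, not on the `re` module.

```python
import re

# Content models copied from the slide (DTD-like syntax).
SCHEMA = {
    "newspaper": "title.date.(GetTemp|temp).(GetEvents|exhibit*)",
    "exhibit": "title.(GetDate|date)",
}


def to_regex(content_model):
    """Turn a content model into a regex over space-terminated labels."""
    # Each label becomes the token "label " so that labels cannot run together.
    tokens = re.sub(r"[A-Za-z]+", lambda m: "(?:" + m.group(0) + " )", content_model)
    # '.' is concatenation, which is implicit in a regex.
    return tokens.replace(".", "")


def conforms(label, children):
    """Check the sequence of child labels against the content model of `label`."""
    word = "".join(child + " " for child in children)
    return re.fullmatch(to_regex(SCHEMA[label]), word) is not None


if __name__ == "__main__":
    # Intensional and materialized versions both match the newspaper type.
    print(conforms("newspaper", ["title", "date", "GetTemp", "GetEvents"]))
    print(conforms("newspaper", ["title", "date", "temp", "exhibit", "exhibit"]))
    # A performance is not allowed by this schema.
    print(conforms("newspaper", ["title", "date", "temp", "performance"]))
```

Both the call (GetTemp) and its result type (temp) are acceptable in the same position, which is what makes the type intensional.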
Rewritings
The goal: given
• an intensional document d
• a schema s,
can we rewrite d so that it matches s?
Safe rewriting: one that for sure leads to s (we know without making any call).
Possible rewriting: one that may lead to s (depending on the answers of services).
Difficulties
Infinite search space
• Vertical
• Horizontal
Main problem
• The result of a Web service call is unknown,
• We just know a signature (input/output types)
We want a very efficient solution.
Foundations of the problem
• String & tree automata,
• with existential and universal transitions.
Results
The general problem is undecidable [MSS03]
Restrictions on the considered rewritings
• Left-to-right: no “going back and forth”
• K-depth: bound on the nesting of function calls (search space still infinite but finitely representable)
Under these restrictions
• We have algorithms to find safe/possible rewritings.
• They are PTIME (for deterministic schemas).
• We can also do it between schemas.
Implementation
• Demo at VLDB 2003 (customizable news syndication)
Safe rewriting algorithm
Sketch
• Deal with function parameters first,
• Top-down traversal of the tree,
• For each data node:
– rewrite its children (viewed as a word),
– to match the target type (a regular expression),
– using regular automata techniques, and smart marking.
Safe rewriting algorithm (2)
Build an FSA that accepts all k-depth rewritings of the initial word.
Build an FSA that recognizes the complement of the target type.
[Figure: the automaton of rewritings of the word title.date.GetTemp.GetEvents (states q0 to q7, transitions labeled title, date, GetTemp, temp, GetEvents, exhibit, performance) and the automaton for the complement of the target type (states p0 to p6)]
Safe rewriting algorithm (3)
Compute the intersection of these languages:
[Figure: the product automaton, whose states pair a state of the rewriting automaton with a state of the complement automaton, starting at (q0,p0)]
A smart marking determines whether a safe rewriting exists.
Then run the word on the marked automaton to find an actual rewriting.
Optimization: lazy construction of the automata
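The difference between safe and possible rewritings can be shown with a brute-force Python sketch. It is deliberately not the automata-based algorithm of the slides: it explores left-to-right, 1-depth rewritings of a word, and it assumes that each service has a small finite list of sample outputs (`OUTPUTS`, invented here), whereas real output types such as (exhibit|performance)* are infinite. With `every=all` the rewriter must succeed whatever the service answers (safe); with `every=any` one favorable answer is enough (possible).

```python
import re

# Assumption: a finite sample of possible outputs per service.
OUTPUTS = {
    "GetTemp": [["temp"]],
    "GetEvents": [[], ["exhibit"], ["performance"], ["exhibit", "performance"]],
}

# Target types written as regexes over space-terminated labels.
TARGET = r"title date temp (GetEvents |(exhibit )*)"
STRICT = r"title date temp (exhibit )*"


def matches(word, target):
    return re.fullmatch(target, "".join(label + " " for label in word)) is not None


def rewritable(word, target, every, i=0, prefix=()):
    """Left-to-right search; `every` is all (safe) or any (possible)."""
    if i == len(word):
        return matches(prefix, target)
    symbol = word[i]
    # Option 1: keep the symbol (data, or a call left intensional).
    if rewritable(word, target, every, i + 1, prefix + (symbol,)):
        return True
    # Option 2: invoke the call; the answer is chosen by the service, not by us.
    if symbol in OUTPUTS:
        return every(
            rewritable(word, target, every, i + 1, prefix + tuple(output))
            for output in OUTPUTS[symbol]
        )
    return False


WORD = ["title", "date", "GetTemp", "GetEvents"]

if __name__ == "__main__":
    print(rewritable(WORD, TARGET, all))  # safe: call GetTemp, keep GetEvents
    print(rewritable(WORD, STRICT, all))  # not safe: GetEvents may return a performance
    print(rewritable(WORD, STRICT, any))  # but possible
```

The exhaustive search is exponential; the automata construction and marking described on the slides are what make the problem tractable under the stated restrictions.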
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
Novel issues
• Exchanging Active XML data
• Querying Active XML data (SIGMOD 2004)
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations
Applications
Conclusion
Querying AXML Data
The problem: lazy evaluation of service calls
To call or not to call, this time when evaluating a query
Given a (tree pattern) query:
/newspaper[temp > 18°C]/exhibits//exhibit[location="Le Louvre"]
• Materialize the document?
• Call only the services that may contribute data to the query answer.
[Figure: the newspaper document tree, containing the calls GetTemp, getDate, GetEvents and GetExhibits, and the result temp = “19°C”]
Lazy evaluation
Difficulties:
• Calls can be found everywhere in the document
• May appear dynamically (as a result of previous calls)
• May become (ir)relevant due to previous invocations
• Need to take signatures of calls into consideration
A possible approach: modify the query processor
• Top-down evaluation
• Trigger the calls found on the way
• Not so great:
– Computation is blocked
– Optimization opportunities are lost
Our solution
Given a query to evaluate:
Derive a set of “node-focused” queries (NFQ), that find the relevant calls when evaluated on the document.
Need to be reevaluated, as the document evolves!
[Figure: the tree pattern query (newspaper, temp > 18°C, exhibits, exhibit, location = “Le Louvre”) next to node-focused queries derived from it, in which * nodes mark where a relevant call may be found. Etc.]
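The idea of finding only the relevant calls can be sketched in Python under strong simplifications: the query is a plain child-axis path without predicates or `//`, and the function `relevant_calls`, the `OUTPUT_TAGS` signature table and the `Example.GetScores` service are all hypothetical names introduced for this example. A call is considered relevant when it sits at a position where its result could extend a match of the path; the optional signature table prunes calls whose output cannot match the next step.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<newspaper>"
    "<title>Le Monde</title>"
    '<call svc="Yahoo.GetTemp"><city>Paris</city></call>'
    '<exhibits><call svc="Yahoo.GetExhibits"><city>Paris</city></call></exhibits>'
    '<sports><call svc="Example.GetScores"/></sports>'
    "</newspaper>"
)

# Assumed signatures: the element names each service may return.
OUTPUT_TAGS = {"Yahoo.GetTemp": {"temp"}, "Yahoo.GetExhibits": {"exhibit"}}


def relevant_calls(root, steps, output_tags=None):
    """Names of the calls that may contribute to the path steps[0]/steps[1]/..."""
    signatures = output_tags or {}
    found = []

    def visit(node, depth):
        # `node` matched steps[depth - 1]; its children must match steps[depth].
        if depth >= len(steps):
            return
        for child in node:
            if child.tag == "call":
                produced = signatures.get(child.get("svc"))
                # Unknown signature: keep the call to stay on the safe side.
                if produced is None or steps[depth] in produced:
                    found.append(child.get("svc"))
            elif child.tag == steps[depth]:
                visit(child, depth + 1)

    if root.tag == steps[0]:
        visit(root, 1)
    return found


QUERY = ["newspaper", "exhibits", "exhibit"]
```

Since invoking a call may bring new calls into the document, such a function has to be evaluated again after each round of invocations, as the slide points out.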
Optimizations
Service calls sequencing
• Analysis of the relationship between calls (through the NFQs)
• Layering, and parallelization inside each layer.
Refinement via type analysis
• Matching output types of services with data expected of queries
“Pushing” queries to capable services
Acceleration:
• Via relaxation:
– NFQ approximation
– Superset of the relevant calls
• Via a special access structure, similar to a DataGuide:
– Restricted to paths that lead to service calls
– Indexes the calls
Experimental assessment
• 10x speed-up when combining optimizations
Active XML peers
Distributed data management in P2P
[Figure: the same network of peers holding XML data and services, connected through the Web and Web services, now with AXML at each peer]
What do we need from an AXML system ?
Persistent, manageable, dynamic AXML data.
Easy ways to define services
Control of the exchanged data (parameters & results of service calls)
Peer-to-peer architecture, where each AXML peer acts as a:
• Repository: manages persistent AXML data
• Client: uses (AXML) Web services
• Server: provides AXML services
[Figure: AXML peers communicating through SOAP]
Global architecture
[Figure: AXML peer S1, made of an AXML store, service descriptions, an AXML engine and a query engine (query, read, update), with a SOAP wrapper and a SOAP client; it exchanges AXML over SOAP with AXML peers S2 and S3, and XML over SOAP with a plain SOAP service]
Implementation
SUN’s Java SDK 1.4 (includes XML parser, XPath processor, XSLT engine)
Apache Tomcat 4.1 servlet engine
Apache Axis SOAP toolkit 1.1
X-OQL query processor, persistent DOM repository
JSP-based Web user interface, using JSTL 1.0 standard tag library
Also, a lightweight implementation for PDA/phone (J2ME, CLDC profile), used for [ABB03demo].
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
New issues
• Exchanging Active XML data
• Querying Active XML data
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations
Applications
• P2P auctions
• News syndication
• Other applications
Conclusion
Managing persistent AXML data
“Our newspaper should have its temperature information refreshed daily. New exhibits should be fetched every week and archived for 6 months”
Service call results enrich the document (calls can be kept for possible future reuse)
Main issues:
• When to activate a service call?
• What to do with its result?
When to activate a service call?
Explicit pull mode
• Daily, weekly, or after some event: e.g., when another call occurs
• This aspect of the problem is related to active databases
Implicit pull mode
• Detect which intensional information (the service calls) may contribute to the answer of a query (lazy evaluation)
• This aspect of the problem is related to deductive databases
Push mode
• Based on a query subscription; the service provider pushes information to the client (e.g., for synchronization purposes)
• This is related to stream and subscription queries
Managing service call results
How long does the returned data remain valid?
• Just long enough to answer a query: Mediation
• 1 day, 1 week, … or unbounded: Caching / Warehousing
• Various portions of the document may follow different policies: Hybrid
For repeated service call invocations: merge policy
• append,
• replace,
• Fusion (using XML Schema-like keys),
• Specific merge policies can be provided as Web services
Example: AXML document with control attributes
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp" mode="lazy" valid="1 day" merge="replace">
    <city>Paris</city>
  </call>
  <call svc="TimeOut.GetEvents" mode="every Monday morning" valid="6 months" merge="append">
    exhibits
  </call>
</newspaper>
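The effect of the `merge` attribute on repeated invocations can be sketched as follows. This is an illustration only: the function `merge_result` and the way previous results are tracked (a list handed back to the caller) are assumptions of this example, only the `replace` and `append` policies are covered, and the `mode` and `valid` attributes are ignored.

```python
import xml.etree.ElementTree as ET


def merge_result(parent, call, new_results, previous):
    """Insert the results of one invocation next to `call`.

    `previous` lists the elements inserted by earlier invocations of the same
    call (assumed to sit right after it). The updated list is returned.
    Any policy other than "replace" is treated as "append" in this sketch.
    """
    if call.get("merge", "append") == "replace":
        for old in previous:
            parent.remove(old)  # drop the results of earlier invocations
        previous = []
    position = list(parent).index(call) + 1 + len(previous)
    for offset, element in enumerate(new_results):
        parent.insert(position + offset, element)
    return previous + list(new_results)


if __name__ == "__main__":
    doc = ET.fromstring(
        '<newspaper><call svc="Yahoo.GetTemp" merge="replace"/></newspaper>'
    )
    call = doc.find("call")
    kept = merge_result(doc, call, [ET.fromstring("<temp>16</temp>")], [])
    kept = merge_result(doc, call, [ET.fromstring("<temp>18</temp>")], kept)
    print(ET.tostring(doc, encoding="unicode"))
```

The call element stays in the document, so it can be activated again later, in line with the remark that calls can be kept for possible future reuse.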
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
Novel issues
• Exchanging Active XML data
• Querying Active XML data
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations
Applications
Conclusion
Declarative AXML services
Services can be defined by queries or updates over the AXML documents of the repository (XQuery, XPath, XUpdate)
Which (lazy) service calls may contribute to the answer?
let service GetExhibitsByLocation($loc) be
  for $a in document("newspaper.xml")/newspaper/exhibits,
      $b in $a//exhibit
  where $b/@name = $loc
  return <exhibits> {$b} </exhibits>
Other means to define services
Other programming languages:
• XSLT transformations (through Apache Xalan)
• Java classes (through Axis)
Composition of existing services:
• BPEL4WS (through IBM’s BPEL4J implementation)
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
New issues
• Exchanging Active XML data
• Querying Active XML data
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations (PODS 2004)
Applications
Conclusion
Theoretical foundations: Positive AXML
Restricted framework
• Data model
– Set-based (unordered) AXML trees
– Call results are accumulated in documents
• Services
– Monotone
– Positive: defined by conjunctive fragment of XQuery
Results
• Well-defined (possibly infinite) fix-point semantics
• Termination, lazy evaluation…
Connections to:
• Regular (infinite) trees, Query-Sub-Query [AM04],…
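The flavor of a fix-point semantics over accumulated, monotone results can be conveyed by a naive iteration in Python. This is only an analogy, not the formal model of Positive AXML: the document is flattened into a set of tuples, services are plain functions from that set to new tuples (the edge/path services are invented for the example), and `max_rounds` is an arbitrary cut-off since the fix-point may be infinite.

```python
def fixpoint(document, services, max_rounds=100):
    """Accumulate service results until nothing new appears.

    Returns (document, converged). `converged` is False when the round
    budget ran out, which is what happens for an infinite fix-point.
    """
    doc = set(document)
    for _ in range(max_rounds):
        new = set().union(*(service(doc) for service in services))
        if new <= doc:  # no new fact: the fix-point is reached
            return doc, True
        doc |= new  # results are accumulated, never removed
    return doc, False


# Two monotone example services over ("edge", a, b) and ("path", a, b) facts.
def base_paths(doc):
    return {("path", a, b) for (kind, a, b) in doc if kind == "edge"}


def longer_paths(doc):
    return {
        ("path", a, c)
        for (k1, a, b) in doc if k1 == "path"
        for (k2, b2, c) in doc if k2 == "edge" and b2 == b
    }


if __name__ == "__main__":
    edges = {("edge", "a", "b"), ("edge", "b", "c")}
    result, converged = fixpoint(edges, [base_paths, longer_paths])
    print(sorted(result), converged)
    # A service that always produces something new never converges.
    print(fixpoint({0}, [lambda doc: {n + 1 for n in doc}], max_rounds=5))
```

Because the services are monotone and results only accumulate, the order in which calls are invoked does not change the final set, which is one reason the restricted framework has a well-defined semantics.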
Applications
Demos
Peer-to-peer auctions (VLDB 2002 demo)
• Discovery of new peers/auctions through intensional answers
RSS News syndication (VLDB 2003 demo 1)
• Customization of services through schemas + news subscriptions
Distributed workspaces (VLDB 2003 demo 2)
Web warehousing (ECDL 2003 demo)
A powerful framework for the fast development of distributed, data-centric applications.
Other applications
E.dot, a dynamic warehouse on food risk management
• Use AXML as the platform for the warehouse definition, construction and maintenance
Network configuration
• Use AXML exchange of information to configure hardware/software components
Software distribution
• Use AXML to customize distributions and keep your view of the software fresh
Decentralized user profile/patient data management
• Use AXML to coordinate the integration of data, and privacy enforcement services in a uniform way
Conclusion
AXML documents and services
A simple paradigm…
…that allows for new, powerful features.
• Intensional parameters and results: AXML documents can be exchanged
• Support for continuous services (streams of answers)
• Control over the exchange of AXML data
Issues: control of call activation via typing, lazy evaluation, replication and distribution, security, mobility, termination, implementation, foundations, …
Current/Future work
Security and privacy (with Bell Labs)
Editor/browser plug-in for AXML
Mass storage XML DB (with Xyleme Corp.)
P2P infrastructure
…
To know more…
http://purl.org/net/axml
• Implementation becomes open-source
• Already available for research
• Will be released publicly very soon.
Selected publications
• S. Abiteboul, O. Benjelloun, T. Milo: Positive Active XML, PODS, 2004.
• S. Abiteboul, O. Benjelloun, B. Cautis, I. Manolescu, T. Milo, N. Preda: Lazy Query Evaluation for Active XML, SIGMOD, 2004.
• T. Milo, S. Abiteboul, B. Amann, O. Benjelloun, F. Dang Ngoc: Exchanging Intensional XML Data, SIGMOD, 2003 (full version to appear in TODS).
• S. Abiteboul, O. Benjelloun, I. Manolescu, T. Milo, R. Weber: Active XML: A Data-Centric Perspective on Web Services (book chapter), in Web Dynamics, Springer, 2004.
• S. Abiteboul, A. Bonifati, G. Cobena, I. Manolescu, T. Milo: Dynamic XML Documents with Distribution and Replication, SIGMOD, 2003.
Thank you
Extra slides
Asynchronous/Continuous services
The client subscribes and then is notified
The server decides when to send data
• E.g., promotional offers
Change control:
• Management of replication [ABCMM03]
• What to do when a change is detected
– Send the new state of data
– Send the delta between old and new state
– Dual of merge policies
Peer-to-peer auctions (VLDB 2002 demo)
Each peer proposes auctions:
• Document myauctions.xml with the peer’s items and their current bids
• Services offered:
– getLocalAuctions()
– status(auctionId)
Each peer bids on auctions:
• Document mybids.xml with the peer’s bids
• Services offered:
– bid(peer, auctionId, amount)
– bidUpTo(peer, auctionId, increment, limit)
Each peer knows about other peers’ auctions:
• Document allauctions.xml contains calls to other peers that transitively retrieve their known auctions.
• Service offered: getAllAuctions()
When an auction closes, the winner is notified.
News syndication (VLDB 2003 demo)
News sources:
• GetStory(id)
• GetNewsAbout(kwd)
Aggregators:
• GetNewsAbout(kwd)
• …but several versions, more or less intensional
Clients:
• PC, laptops, PDAs
Service customization using schemas
Customizing the output of services
• News sources/aggregators provide different versions of GetNewsAbout with different output schemas
• The output is automatically transformed into the desired schema
• Clients can also specify a desired output schema as a parameter
Customizing the input of services
• Location-aware continuous services for mobile users
• The context of the user is given by intensional parameters
Distributed logging mechanism
• Also customizable through the use of schemas
Call parameters
XML:
<temp>
  <call svc="GetTemp@weather.com"><city>"Denver"</city></call>
</temp>
AXML:
<temp>
  <call svc="GetTemp@weather.com">
    <city>
      <call svc="GetCapital@us.gov">"colorado"</call>
    </city>
  </call>
</temp>
XPath:
<temp>
  <call svc="GetTemp@weather.com">../../city</call>
</temp>
To call or not to call (before invoking)?