Serge Abiteboul – Singapore 2002 1
Web services and data integration
S. Abiteboul (INRIA and Xyleme), Omar Benjelloun (INRIA), Tova Milo (INRIA and Tel Aviv)
Serge.Abiteboul@inria.fr
Singapore, December 2002
Organization
• The context
• Accessing information on the Web
• Web services
  – SOAP
  – WSDL
  – UDDI
• Active XML
  – AXML documents
  – AXML services
• Architecture and implementation
• Applications
• Conclusion
The context
The Web and XML are dramatically changing the management of distributed information
Distributed data management
• Warehousing
• Mediation
• Management of data in cooperative work
• Management of data in distributed scientific applications
• Mobile data management
• Document management
• Web sites
• Portals, etc.
• Information used to live in islands and this is changing
The Web of yesterday
• Protocol: HTTP
• Documents: HTML
• Millions of independent Web sites and billions of documents
• Browsing and full-text indexing
• Publication of databases using forms
• Data management with the Web
  – HTML is primarily to be read by humans
  – Data management applications over Web data
    • Based on hand-made wrappers
    • Expensive, incomplete, short-lived, not adapted to the Web's constant change
No real support for distributed data management!
Information used to live in islands but it is changing
• Different formats: relational, metadata, documents, text, DXF
  – A Web standard for data exchange, XML, is fixing it
  – XML captures all kinds of information over a wide spectrum
  – XML comes with a family of emerging standards: XML Schema, XSL/T, XQuery, domain-specific schemas…
• Different computers, platforms, languages, applications
  – A standard for Web services, SOAP, is fixing it
  – SOAP allows ubiquitous computing on the Internet
  – SOAP comes with a family of emerging standards: WSDL, UDDI
• This provides uniform access to information… …the dream for distributed data management
The information spectrum
[Figure: a spectrum running from "Structured data" through "Hierarchy + metadata" to "Minimal structure", captioned "Semi-structured data and XML". Examples placed along it: books, contracts, catalogs, bank accounts, emails, financial reports, insurance policies, economic analysis, derivatives, inventory, political analysis, insurance claims, financial news, sports news, resumes.]
What can be captured with XML?
• Very structured information such as databases and knowledge bases
  – Most DBMS now export in XML
• Semi-structured data such as data exchange formats (ASN.1, SGML), e.g., technical documentation
• Less structured data: documents
  – Metadata: author, date, status
  – Existing structure in them: chapter, section, table of contents and index
  – Possibly tagging of elements in them (citations, lists)
  – Links to other documents
• Plain text
• Metadata for unstructured data such as images and sound
A standard for information: XML
Labeled ordered trees where leaves are text
• Marriage of document and database worlds
• Marriage of full-text indexing and structure indexing
• Is it the ultimate data model? No
• Purely syntax – more semantics needed
• Is it OK for now? Definitely yes (because it is a standard)
The main asset of XML: typing
• Applications need typing and XML data can be typed if needed (DTD and XML schema)
• Trees
• Logical granularity – neither page nor document level – but the piece of information that is needed
• Semantics and structure are in tags and paths
  – product-table/product/reference
  – product-table/product/price
[Figure: tree with root product-table, child product, and leaves reference, designation, description, price]
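The idea that structure lives in tags and paths can be sketched in a few lines of Python using the standard `xml.etree.ElementTree` module. This is only an illustration: the sample document and the helper `values_at` are assumptions that mirror the product-table tree, not data from the talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the product-table tree on the slide.
CATALOG = """
<product-table>
  <product>
    <reference>X23</reference>
    <designation>camera</designation>
    <description>compact digital camera</description>
    <price>120</price>
  </product>
  <product>
    <reference>R2D2</reference>
    <designation>robot</designation>
    <description>toy robot</description>
    <price>75</price>
  </product>
</product-table>
"""

def values_at(root, path):
    """Return the text of every element reached by a path relative to the root."""
    return [node.text for node in root.findall(path)]

root = ET.fromstring(CATALOG)  # root is the <product-table> element
references = values_at(root, "product/reference")
prices = [float(p) for p in values_at(root, "product/price")]
print(references, prices)
```

The path selects exactly the piece of information needed (a reference or a price), without fetching at page or document granularity.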
A standard for distributed computing: Web services
• Possibility to activate a method on some remote Web server
• Exchange information in XML: input and result are in XML
• Ubiquitous XML distributed computing infrastructure
• 2 main applications
  – E-commerce
  – Access to remote data
• With XML and Web services, it is possible
  – To get information from virtually anywhere
  – To provide information to virtually anywhere
The basic picture
[Figure: a Web client sends a query as an XML SOAP message over the Internet to a SOAP service, a black box exposing a method m( ), and receives the answer as an XML SOAP message.]
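The picture above can be made concrete with a small Python sketch that wraps a method call in a SOAP 1.1 envelope and reads it back, using only the standard library. The service namespace `urn:example:service`, the parameter name `query`, and both helper functions are illustrative assumptions; no network call is made.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

def build_request(method, service_ns, **params):
    """Wrap a call to `method` and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

def read_body(message):
    """Return the first element inside the SOAP Body of a message."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    return body[0]

# Hypothetical call to the method m( ) of the black-box service.
request = build_request("m", "urn:example:service", query="candidate genes")
call = read_body(request)
print(call.tag, call[0].text)
```

In a real deployment the request string would be sent over HTTP and the answer would come back in the same envelope format; a SOAP toolkit normally hides these details.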
Accessing remote information
[Figure: an application using gene banks queries some data services that provide candidate genes (the gene banks) and uses some processing services. Multiple formats + multiple protocols.]
Same with Web services
[Figure: the application using gene banks queries some data services that provide candidate genes (the gene banks) and uses some processing services, all reached through the Web.]
The big picture: peer2peer
[Figure: peers exchange queries over the Web, each exposing a Web service, some in front of a database. Peers include data warehouses, databases, Web pages, PCs, PDAs, cell phones…]
The main roles
[Figure: three roles. The service provider publishes to the service registry; the client looks up the registry, then binds to the service provider.]
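The three roles and the publish / look up / bind interactions can be sketched as a toy in-memory model in Python. The class and method names below are hypothetical and stand in for what UDDI and SOAP would do across the network; the sample topic and answer are made up.

```python
class ServiceRegistry:
    """Toy registry: maps a topic to the providers that published under it."""

    def __init__(self):
        self._entries = {}

    def publish(self, topic, provider):
        # Called by a service provider to advertise itself.
        self._entries.setdefault(topic, []).append(provider)

    def look_up(self, topic):
        # Called by a client to discover providers for a topic.
        return list(self._entries.get(topic, []))


class ServiceProvider:
    """Toy provider answering questions from a fixed table."""

    def __init__(self, name, answers):
        self.name = name
        self._answers = answers

    def invoke(self, question):
        return self._answers.get(question, "unknown")


registry = ServiceRegistry()
provider = ServiceProvider("gene-bank-1", {"candidate genes": "g1, g2"})
registry.publish("genes", provider)       # provider -> registry: publish

found = registry.look_up("genes")         # client -> registry: look up
bound = found[0]                          # client binds to the provider it found
answer = bound.invoke("candidate genes")  # client -> provider: call the service
print(bound.name, answer)
```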
Simple view: Looking for information about Gismos
1. Query some yellow-pages: Who knows about Gismos?
2. Negotiate with Gismo specialists
  • Nature of the service
  • Quality, cost
3. Get the information
  • Order, payment, delivery
  • Integration in my information system
4. Eventually publish information
5. … and all this automatically…
Data integration – Logical view
[Figure: a mediator or warehouse sits on top of source1, source2 and source3, each behind its own wrapper (wrapper1, wrapper2, wrapper3). Service directories and service descriptions are consulted to get service descriptions; ontologies are found to build wrappers.]
The Web service solution
[Figure: standards layered over the Web]
• XML + SOAP
• WSDL: data and service description
• WSFL: workflow
• UDDI: data and service repository
• RDF: data and service semantics
Mediation with Web services
[Figure: a mediator reaches source1, source2 and source3 through wrapper1, wrapper2 and wrapper3 over the Web, together with service directories and service descriptions.]
Web services:
• Service directories
• Service descriptions
• Wrappers
• Sources
• Mediators/warehouses
Advantages for data integration
• A universal model for data integration = XML
  – Solves the heterogeneity issue
• A universal protocol for distribution = SOAP
  – Simple Object Access Protocol (something like CORBA)
• A language for describing the interface of data sources = WSDL
  – Web Services Description Language (something like IDL)
  – Solves the interoperability issue
• A standard for publication and discovery of information = UDDI
  – Universal Description, Discovery and Integration
• A standard for describing the semantics of sources = RDF
  – Resource Description Framework
Advantages – continued – the goal
• The system can find a new source of information using UDDI
• Understand its syntax using WSDL
• Understand its semantics using RDF
• Get it using SOAP
• The information is in XML, can be restructured and integrated automatically
• Not yet… But soon?
Jargon
XML, XHTML, RDF, .NET, RosettaNet, WSFL, DTD, Xschema, XSL, XSLT, XSL-FO, ebXML, namespace, HTTPS, OASIS, HTTP, SOAP, OAGIS, WSDL, ICE, RSS, UDDI, MIME
Help!
Active XML
Joint work with: Bernd Amann, Jérôme Baumgarten, Angela Bonifati, Ioana Manolescu, Frederic Ngoc and others
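AXML = XML + embedded SOAP calls
[Figure: two panels. First, a Web client sends a query as SOAP messages over the Internet to a Web server exposing m( ) and receives an answer. Second, an AXML peer is both client and server: peers exchange AXML queries and answers over the Internet, with services defined as queries q1($1,$2), Q2, Q3… (XPath, XQuery).]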
Active XML
• Peer-to-peer architecture
• Each Active XML peer
  – Repository: manages active XML data with embedded Web service calls
  – Web client: activates calls in the documents
  – Web server: provides Web services defined as (parameterized) queries over the repository
[Figure: AXML peers communicating through SOAP]
Build on existing standards
• Tree data: XML
  – internal data representation
  – and data exchange
• Web services: SOAP, WSDL
• Query languages: XQuery/XPath
[Figure: AXML built on top of XML]
AXML peer: repository of AXML documents
<directory>
  <dep name="Toy">
    <sc>toy.xyz.com/GetToyPersonel()</sc>
  </dep>
  <dep name="DVD">
    <sc>dvd2000.com/GetDVDPersonnel()</sc>
  </dep>
</directory>
The <sc> elements are service calls. A document may contain calls
• to any SOAP Web service (e-bay.net, google.com, etc.)
• to any AXML Web service
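As a rough illustration of how a peer might locate the calls embedded in such a document, the Python sketch below walks the tree and splits each `<sc>` text into peer, service name and arguments. The `peer/Service(args)` pattern is inferred from the examples on the slide, and the regular expression and function name are assumptions, not part of the AXML system.

```python
import re
import xml.etree.ElementTree as ET

AXML_DOC = """
<directory>
  <dep name="Toy">
    <sc>toy.xyz.com/GetToyPersonel()</sc>
  </dep>
  <dep name="DVD">
    <sc>dvd2000.com/GetDVDPersonnel()</sc>
  </dep>
</directory>
"""

# Assumed call syntax: <peer address>/<service name>(<arguments>)
CALL = re.compile(r"^(?P<peer>[^/]+)/(?P<service>[^(]+)\((?P<args>.*)\)$")

def service_calls(document):
    """List (peer, service, args) for every <sc> element in the document."""
    calls = []
    for sc in ET.fromstring(document).iter("sc"):
        match = CALL.match((sc.text or "").strip())
        if match:
            calls.append((match["peer"], match["service"], match["args"]))
    return calls

calls = service_calls(AXML_DOC)
for peer, service, args in calls:
    print(peer, service, repr(args))
```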
AXML peer: Web client
<directory>
  <dep name="Toy">
    <person pname="Smith">
      <phone>01…</phone>
      <pda>
        <sc>toy.xyz.com/GetPDA(../../@pname)</sc>
      </pda>
    </person>
    <sc>toy.xyz.com/GetToyPersonel()</sc>
  </dep>
  <dep name="DVD">
    <sc>dvd2000.com/GetDVDPersonnel()</sc>
  </dep>
</directory>
Result: the <person> element
Controlling the evaluation
• Activation of calls and data lifespan are controlled
  – frequency: when is the service called? ("call each day")
  – validity: how long is the retrieved data valid?
  – mode: immediate or lazy?
Example: control attributes
<directory>
  <dep name="Toy">
    <sc valid="rt+1 week" mode="immediate">toy.xyz.com/GetToyPersonel()</sc>
  </dep>
  <dep name="DVD">
    <sc valid="0" mode="lazy">dvd2000.com/GetDVDPersonnel()</sc>
  </dep>
</directory>
AXML peer: Web server
• AXML Web services: defined using XQuery over AXML documents
let service Get-Toy-Personnel( ) be
  for $a in document("toy.xyz.com/members.axml")/member,
      $b in $a//name,
      $c in $a//phone,
      $d in $a//pda
  return <person pname={ $b/text() }> { $c } { $d } </person>
The crux: the exchange of AXML data
• Arguments & results of calls are AXML
• Data is thus intensional & dynamic
• Distributed computing: by sending data containing service calls, one can delegate some work to other peers
• Partial computations: by returning data containing service calls, one can give the receiver control of these calls
• All this can be controlled
Example: Tourist guide
… <sc>yahoo.com/Temp("Paris")</sc> …

I need to evaluate the temperature of Paris
1. I call Yahoo: <sc>meteoF.com/t("Paris")</sc>
2. I call meteoF: <t type="celsius">0</t>

I am asked what the temperature of Paris is. Possible answers:
• … <t type="celsius">0</t> …
• … <sc>meteoF.com/t("Paris")</sc> …
• … <sc>yahoo.com/Temp("Paris")</sc> …
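The tourist guide example can be sketched in Python as a loop that follows service calls until plain data is reached, or stops early and hands the remaining call to the receiver. The `SERVICES` table is a hypothetical stand-in for the two remote services, and `evaluate` and its `max_calls` parameter are illustrative names, not the AXML evaluator.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the remote services; each returns an XML fragment.
SERVICES = {
    'yahoo.com/Temp("Paris")': '<sc>meteoF.com/t("Paris")</sc>',
    'meteoF.com/t("Paris")': '<t type="celsius">0</t>',
}

def call_service(call):
    """Pretend to invoke a remote service and return its XML answer."""
    return SERVICES[call]

def evaluate(fragment, max_calls):
    """Follow at most `max_calls` service calls; return the resulting element."""
    node = ET.fromstring(fragment)
    calls_made = 0
    while node.tag == "sc" and calls_made < max_calls:
        node = ET.fromstring(call_service(node.text))
        calls_made += 1
    return node

start = '<sc>yahoo.com/Temp("Paris")</sc>'
unevaluated = evaluate(start, 0)  # hand the Yahoo call to the receiver
partial = evaluate(start, 1)      # call Yahoo, hand over the meteoF call
full = evaluate(start, 2)         # call both, return the temperature
print(unevaluated.text, partial.text, full.attrib["type"], full.text)
```

Choosing `max_calls` corresponds to choosing one of the three possible answers; the bound also guards against chains of calls that never reach plain data.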
Continuous services
• Inside the tourist guide: new events
• Pull mode: standard SOAP query
  – Ask once a week
• Push mode: subscription to a continuous service
  – When new events are announced, they are pushed to the AXML document
• Possibility to define AXML continuous services
Global architecture
[Figure: AXML peer S1 contains an XQuery processor, an evaluator, an AXML document store, service descriptions, a SOAP wrapper and a SOAP client. Arrows are labeled query, read, update, consults, service call and service result. S1 communicates over SOAP with AXML peers S2 and S3, exchanging AXML, and with a plain SOAP service, exchanging XML.]
Implementation
• Sun's Java SDK 1.4 (includes XML parser, XPath processor, XSLT engine)
• Apache Tomcat 4.0 servlet engine
• Apache Axis SOAP toolkit 1.0 beta 3
• X-OQL query processor, persistent DOM repository
• JSP-based user interface, using JSTL 1.0 standard tag library
• First prototype
  – No lazy evaluation
  – No continuous services
• Ongoing work on typing, security, replication…
• Demo for VLDB'02
  – P2P auctioning system
Application 1: Warehousing
• Construction of warehouses with Web data
• Monitoring of changes on the Web
• Kinds of services that are used
  – Google search engine
  – wget
  – Classification
  – XML Diff and site changes
  – Page monitoring system
  – etc.
Application 2: Mobile data
• AXML peers as mobile entities
• Active data store with query capabilities
  – Metadata and object profiles
• Issues
  – Storage services for mobile objects
  – Processing services for mobile objects
  – Use proxies for that
• European Project DBGlobe
Application 2: Mobile data
• Light-weight AXML peers
  – PDA, cellular phone, laptop…
  – Limited storage, network bandwidth
  – Sometimes disconnected
• Limited functionalities
  – E.g., support for continuous services based on a mail server and SMTP
Application 2: context awareness
• Where am I? (geographical position)
• Where is the "nearest" AXML proxy? (network position)
• Active use of this information
  – For providing context-dependent data (e.g., time, temperature, nearest restaurants, etc.)
  – For selecting services (e.g., choose a nearby proxy for caching)
Application 3: P2P Auction
• Each peer proposes some auctions
  – The document records the peer's items and the bids
• Each peer knows about some auctions of other peers
• Each peer can bid on any auction
  – The peer recalls the bids she has placed
• When an auction closes, the winner is notified
• No centralization
AXML services
• A simple, declarative way to create Web services compatible with current standards for Web services invocation
• AXML services are powerful tools for data integration
• They allow for new, powerful features
  – Intensional parameters and results: AXML documents (containing service calls) that are exchanged
  – Continuous services send back a stream of answers (SOAP messages) to the caller
Many issues
• Security
• Typing of parameters
• Lazy evaluation and optimization
• Replication
• Mobility: DBGlobe project
• Termination
• Implementation
• Foundations
• And more
Security
• Peers exchange AXML documents containing service calls
• A server (resp. client) might ask the client (resp. server) to do something "bad":
<sc>qod.com/QuoteOfDay</sc>
<quote date="july 8th 2002">
  My heart was bumping
  <context>Tskitishvili, picked 5th in the NBA draft by the Denver Nuggets</context>
  <sc>buy.com/BuyCar("BMW Z3")</sc>
</quote>
Using type to control the use of services
[Figure: Peer1 holds data containing the calls f and g; Peer2 accepts data containing f only, so Peer1 evaluates g before sending the data.]
Peer1 tells which kind of data it exports and Peer2 which kind it accepts
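One way to read this slide is that the sender replaces every call the receiver does not accept by its evaluated result before shipping the document. The Python sketch below follows that reading only; the function names, the set of accepted calls and the fake evaluator are all assumptions made for illustration, not the AXML type system.

```python
import xml.etree.ElementTree as ET

def prepare_for_export(document, accepted_calls, evaluate):
    """Replace each <sc> the receiver does not accept by its evaluated result."""
    root = ET.fromstring(document)
    # Snapshot the elements first, since children are replaced during the walk.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "sc" and child.text not in accepted_calls:
                result = ET.fromstring(evaluate(child.text))
                result.tail = child.tail
                parent[index] = result  # same position, so indexes stay aligned
    return root

def fake_evaluate(call):
    """Hypothetical evaluator: pretends to run the call and returns XML."""
    return f"<value from='{call}'>42</value>"

# Peer1 exports data with calls f and g; Peer2 declares that it accepts only f.
exported = prepare_for_export("<data><sc>f</sc><sc>g</sc></data>", {"f"}, fake_evaluate)
children = [(child.tag, child.text) for child in exported]
print(children)
```

Results returned by the evaluator are not inspected again here; a fuller version would also check them against what the receiver accepts.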
Distribution and replication
• Motivated by mobile devices with limited resources
• Allows one XML document to be distributed over several peers
• Allows an XML subtree to be replicated on several peers
• Query optimization