XML + Databases = ? (DIMACS Workshop, 3/2000)
Mike Carey
Exploratory Database Systems Department
IBM Almaden Research Center
[email protected]

Plan for Today’s Talk
Thoughts on DB and web technologies
  – The web and web “querying”
  – Semistructured databases
  – Object-relational databases
  – XML and databases
XML/DB research at IBM Almaden
  – The XPERANTO project
    • Motivation and approach
    • Whirlwind tour of the system

The Web is Great at Supporting URL-Based Sharing
Ex: Online conference proceedings
Web browsers have given us
  – Universal file access (ftp++)
  – Universal document access (html)
  – Universal service access (forms)
What more could we navigational couch potatoes possibly want?
  – Universal platform for e-shopping!

The Web is Lousy at Supporting Parametric Searches
Ex: Find all the used Musicman Sterling bass guitars currently available for under $750 within a 50-mile radius of my San Jose home
This is hard for a number of reasons
  – Data buried in web pages, news groups, classified ads, store sites, auction sites, …
  – No schema (no metal fish, please!)
  – No data types (miles, US$, instruments)
  – No regularity within/across (good!) sites

Aren’t We Supposed to be the Experts on Data Management?
The DB community brought the world
  – Data models, schemas, and views
  – Query languages, optimizers, fast joins
  – Scalable parallel servers
  – Federated database systems
What do we have in our bag of tricks?
  – Semistructured databases
  – Object-relational database systems

Is Semistructured Database Technology the Answer?
Database characteristics
  – Collections of [name, value] pairs or maybe [name, type, value] triples
  – Collections typically set<any> or list<any>
System characteristics
  – “Typeloose” query languages
  – Indexes for nested, typeloose structures
  – Appropriate query processing techniques
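
To make the [name, value] model and a "typeloose" query concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the talk: the sample listings, the field names, and the helper functions `values` and `cheaper_than` are all hypothetical.

```python
from typing import Any, Iterator

# One object is a list of [name, value] pairs; a value may itself be an object.
Obj = list[tuple[str, Any]]

# Hypothetical sample data: no two objects share the same shape or types.
listings: list[Obj] = [
    [("item", "bass guitar"), ("price", 700)],
    [("item", "bass guitar"), ("price", "US$ 650"),
     ("seller", [("city", "San Jose")])],
    [("title", "amp for sale"), ("cost", 120.0)],
]

def values(obj: Obj, name: str) -> Iterator[Any]:
    """Yield every value stored under `name`, at any nesting depth."""
    for key, value in obj:
        if key == name:
            yield value
        if isinstance(value, list):  # nested object: search it too
            yield from values(value, name)

def cheaper_than(objs: list[Obj], limit: float) -> list[Obj]:
    """Typeloose selection: keep objects with some numeric price below limit."""
    return [
        o for o in objs
        if any(isinstance(p, (int, float)) and p < limit
               for p in values(o, "price"))
    ]

cheap = cheaper_than(listings, 750)
```

With this data, `cheap` holds only the first listing. The second is skipped silently because its price is a string, and the third because it calls the field `cost`; nothing in the data tells the query writer that either variation exists.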

Are Semistructured Databases the Answer? (2)
No, because schemas are critical for
  – Data readers
    • What info is in a given collection?
    • Thus, what queries might make sense?
  – Data writers
    • What should I call this piece of info?
    • Is it okay to put this kind of data here?
  – Efficient/effective query processors
    • Indexing, statistics, ... (e.g., range queries)
    • Integration mappings (e.g., unit conversions)

Are Semistructured Databases the Answer? (3)
It has some nice features, though
  – Flexible, dynamic schemas
    • Forgiving w.r.t. variations and exceptions
    • Schema evolution is not a big deal
  – Richer data modeling (vs. relational)
    • Nested structures, ordered collections
  – More powerful query languages
    • Blurring of schema and data querying
    • Ordering, nesting, restructuring handled

Is Object-Relational Database Technology the Answer?
Database characteristics
  – Base types, user-defined structured types, inheritance, reference types, collections
  – Collections are well-typed
System characteristics
  – Extended SQL-based query languages
  – Support for methods (fenced/unfenced)
  – Also triggers, LOBs, extensible indexes

Are Object-Relational Databases the Answer? (2)
No, because most O-R DBMSs have
  – Overly rigid schemas
    • Every instance is of one (known) type
    • Evolving a type can be a major burden
    • Distributed type management is hard
  – Crufty old storage managers
    • Ragged or sparse records poorly supported
  – Insufficient power in extended SQL
    • Prehistoric assumptions get in the way
    • Weak on restructuring, schema-querying

Is XML the Answer? (Yes!! ...What Was the Question Again?)
Structured documents (for the web)
  <book>
    <booktitle> Tables Are The Answer </booktitle>
    <author id="cdate">
      <name>
        <firstname> Chris </firstname>
        <lastname> Date </lastname>
      </name>
      <address>
        <city> Saratoga </city>
        <state> CA </state>
      </address>
    </author>
  </book>

Is XML the Answer? (2)
W3C’s XML Schema working group
  – Typed elements, attributes, documents
  – Simple types and complex types
  – Derived types (extension, restriction)
  – Facets, anonymous types, groups, …
  – Uniqueness, keys and key references
W3C’s XML Query working group
  – XML-QL, XPath, XQL, XSL/T, XSQL, …
  – Recommendation due in late 2000 (?)

Is XML the Answer? (3)
XML Schema might help because
  – XML has achieved a huge mindshare for data interchange on the web
  – DTD standardization is happening for documents within vertical industries, and XML Schemas should take over
  – When finished, XML Schema should be a widely used schema description tool
    • Similar to O-R schemas, but with more flexibility (and web-based sex appeal)

Some Useful XML+DB Topics
Publish documents with XML Schemas from O-R databases
  – B2B e-commerce messages
  – B2C comparison shopping (if permitted!)
  – Robust O-R DB-resident web sites with XML for page content generation
Use XML Schema as the central data model for data integration middleware
  – I.e., web information integration

Useful XML+DB Topics (2)
Build a “native” XML Repository on top of an O-R DBMS
  – Map from XML Schema model to O-R DBMS modeling constructs
  – Map from XML queries to O-R queries (including tag variables and loose typing)
  – Thereby provide XML document storage management with industrial-strength robustness, scalability, and performance

Useful XML+DB Topics (3)
Evolve XML-QL into a complete web data manipulation language
  – Typing a la XML Schema
  – Ordered/unordered collections
  – XPath-inspired expressions
  – Easier grouping and aggregation
  – Updates (insert/delete, modify)
  – Etc.

The XPERANTO Project
Middleware for publishing O-R (or plain relational) DB content on the web
  – Provides a virtual XML document view
  – Based on a “pure XML” approach
  – Using XML-QL (as W3C placeholder)
Born at Almaden in summer of 1999
  – Mike Carey, Dana Florescu, Zack Ives, Ying Lu, Jai Shanmugasundaram, Beau Shekita, Subbu Subramanian

The XPERANTO Belief System
Databases contain, and will continue to contain, the world’s “data jewels”
  – Transactional data (RDBMS)
  – Important multimedia assets (ORDBMS)
XML application developers of the future may not love SQL like we do
  – View databases as default XML documents
  – Let them define appropriate (query-able) views of these XML documents

XPERANTO Architecture
[Architecture diagram. Labels: Views; XML Schema; Query Translation (XML-QL Parser, XQGM, Query Rewrite, XQGM, SQL Translation); XML Schema Generator; XML Tagger; Metadata Services (View Services, Type & Table Services); Catalog Info; Table & Type Info; SQL Queries; Data Tuples; O-R Database (SQL Query Processor, Stored Tables, System Catalog)]

XPERANTO Components
XML-QL Parser
  – Neutral query representation (XQGM)
Query Rewrite
  – View composition and other rewrites
SQL Translation
  – Produce SQL query(s) to get the required data from the underlying DBMS
XML Tagger
  – Tag and structure the tabular results

XPERANTO Components
View Services
  – Repository for XML view definitions
Type & Table Services
  – Interface (and cache) for DB catalog info
XML Schema Generator
  – Give DB catalog info in XML Schema form for default views
  – Infer XML Schema info for queries and non-default view definitions

Consider a Simple O-R Schema
  Create Table book AS
    (bookID CHAR(30), name VARCHAR(255), publisher VARCHAR(30))
  Create Table publisher AS
    (name VARCHAR(30), address VARCHAR(255))
  Create Type author_type AS
    (bookID CHAR(30), first VARCHAR(30), last VARCHAR(30))
  Create Table author OF author_type
    (REF IS ssn USER GENERATED)

Part of the Default XML View
  <simpleType name="string255" source="string">
    <maxLength value="255" />
  </simpleType>
  <simpleType name="string30" source="string">
    <maxLength value="30" />
  </simpleType>
  <complexType name="bookTupleType">
    <element name="bookID" type="string30" />
    <element name="name" type="string255" />
    <element name="publisher" type="string30" />
  </complexType>
  <complexType name="bookSetType">
    <element name="bookTuple" type="bookTupleType" maxOccurs="*" />
  </complexType>
  <element name="book" type="bookSetType" />
  ...
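
As a rough illustration of how a default view like the one above could be derived mechanically from catalog information, here is a minimal Python sketch. It is not XPERANTO's XML Schema Generator: the function name `default_view` and its input format are assumptions, it handles only CHAR(n) and VARCHAR(n) columns, and it copies the slide's early-2000 draft notation (`source=`, `maxOccurs="*"`) rather than any later XML Schema syntax.

```python
import re

def default_view(table, columns):
    """Build a default-view schema fragment for one table.

    `columns` is a list of (column_name, sql_type) pairs such as
    ("bookID", "CHAR(30)"). Hypothetical helper, for illustration only.
    """
    simple_types = {}  # generated type name -> max length, in first-seen order
    elements = []
    for col, sql_type in columns:
        m = re.fullmatch(r"(?:VAR)?CHAR\((\d+)\)", sql_type.strip(), re.IGNORECASE)
        if m is None:
            raise ValueError(f"unsupported SQL type in this sketch: {sql_type}")
        length = int(m.group(1))
        type_name = f"string{length}"
        simple_types.setdefault(type_name, length)
        elements.append(f'  <element name="{col}" type="{type_name}" />')

    lines = []
    # One bounded string type per distinct column length.
    for type_name, length in simple_types.items():
        lines += [
            f'<simpleType name="{type_name}" source="string">',
            f'  <maxLength value="{length}" />',
            "</simpleType>",
        ]
    # One tuple type per table, one set type wrapping it, one top-level element.
    lines += [f'<complexType name="{table}TupleType">', *elements, "</complexType>"]
    lines += [
        f'<complexType name="{table}SetType">',
        f'  <element name="{table}Tuple" type="{table}TupleType" maxOccurs="*" />',
        "</complexType>",
        f'<element name="{table}" type="{table}SetType" />',
    ]
    return "\n".join(lines)

book_view = default_view(
    "book",
    [("bookID", "CHAR(30)"), ("name", "VARCHAR(255)"), ("publisher", "VARCHAR(30)")],
)
```

For the book table this yields the same declarations as the slide, except that the simple types appear in column order (string30 before string255).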

XPERANTO’s Default Views
XPERANTO generates default O-R to XML Schema mappings
  – Each DB shown as an XML file
  – Subtyping handled via XML Schema’s refinement facilities
  – OIDs and references become ids/idrefs
“Don’t use this at home!”
  – Application developers are expected to define the real view(s) using XML-QL

Creating a Better XML View
  WHERE <library.book.bookTuple>
          <bookID> $bid </>
          <name> $bname </>
          <publisher> $bpub </>
        </> IN "db2:xml:books/library",
        $bpub = "Kluwer"
  CONSTRUCT
    <book id=$bid>
      <name> $bname </>
      { WHERE <library.publisher.publisherTuple>
                <name> $bpub </>
                <address> $addr </>
              </> IN "db2:xml:books/library"
        CONSTRUCT <publisher> <address> $addr </> </> }
      { WHERE <library.author.authorTuple>
                <bookID> $bid </>
                <first> $fname </>
                <last> $lname </>
              </> IN "db2:xml:books/library"
        CONSTRUCT <author first=$fname last=$lname/> }
    </>
  ...

XPERANTO Query Rewrite
XML-QL queries first translated into XQGM representation
  – Neutral, well-poised for more features
  – Easier to go from XML-QL to SQL
  – Borrow rewrites from DB2 UDB engine
XQGM is an extension of DB2’s QGM
  – XML data type for “columns”
  – Set of XML-specific functions

SQL Generation and XML Document Tagging/Structuring
Sorted Outer Union queries are used to obtain the data
  – Fetch the data in one query that brings it back in the appropriate order
  – Tag and nest it to create XML document
Advantages of this approach
  – Shown to be stable as well as fast
  – Simple (linear-space) tagging possible
    • Just watch for nesting-related changes

Outer Union Query Example
  WITH OuterUnion (type, bookID, bookName, pubName, pubAddr, authFirst, authLast) AS (
    SELECT '0', b.bookID, b.name, NULL, NULL, NULL, NULL
      FROM book b
     WHERE b.publisher = 'Kluwer'
    UNION ALL
    SELECT '1', b.bookID, NULL, p.name, p.address, NULL, NULL
      FROM book b, publisher p
     WHERE b.publisher = 'Kluwer' AND b.publisher = p.name
    UNION ALL
    SELECT '2', b.bookID, NULL, NULL, NULL, a.first, a.last
      FROM book b, author a
     WHERE b.publisher = 'Kluwer' AND b.bookID = a.bookID
  )
  SELECT * FROM OuterUnion ORDER BY bookID, type
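
The sorted outer union and the tagging step can be tried end to end with a small Python sketch. This is an illustration under stated assumptions, not XPERANTO's tagger: SQLite stands in for DB2, the typed `author` table is replaced by a plain table, the sample rows are made up, the `<books>` wrapper element and the names `tag_rows` and `demo` are hypothetical, and the query sorts on `bookID, type` so that each book's name row arrives before its publisher and author rows.

```python
import sqlite3
import xml.etree.ElementTree as ET

OUTER_UNION_SQL = """
WITH OuterUnion (type, bookID, bookName, pubName, pubAddr, authFirst, authLast) AS (
  SELECT '0', b.bookID, b.name, NULL, NULL, NULL, NULL
    FROM book b WHERE b.publisher = 'Kluwer'
  UNION ALL
  SELECT '1', b.bookID, NULL, p.name, p.address, NULL, NULL
    FROM book b, publisher p
   WHERE b.publisher = 'Kluwer' AND b.publisher = p.name
  UNION ALL
  SELECT '2', b.bookID, NULL, NULL, NULL, a.first, a.last
    FROM book b, author a
   WHERE b.publisher = 'Kluwer' AND b.bookID = a.bookID
)
SELECT * FROM OuterUnion ORDER BY bookID, type
"""

def tag_rows(rows):
    """Nest sorted outer-union rows into XML in a single pass.

    Only the current <book> element is tracked; a change in bookID is the
    nesting-related change that starts a new one.
    """
    root = ET.Element("books")  # hypothetical wrapper element
    current_id, book = None, None
    for rtype, book_id, book_name, _pub_name, pub_addr, first, last in rows:
        if book_id != current_id:
            book = ET.SubElement(root, "book", id=book_id)
            current_id = book_id
        if rtype == "0":
            ET.SubElement(book, "name").text = book_name
        elif rtype == "1":
            pub = ET.SubElement(book, "publisher")
            ET.SubElement(pub, "address").text = pub_addr
        elif rtype == "2":
            ET.SubElement(book, "author", first=first, last=last)
    return root

def demo():
    """Load made-up sample rows, run the query, and return the tagged XML."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE book (bookID TEXT, name TEXT, publisher TEXT);
        CREATE TABLE publisher (name TEXT, address TEXT);
        CREATE TABLE author (bookID TEXT, first TEXT, last TEXT);
        INSERT INTO book VALUES ('b1', 'Sample Title One', 'Kluwer');
        INSERT INTO book VALUES ('b2', 'Sample Title Two', 'Other Press');
        INSERT INTO publisher VALUES ('Kluwer', 'Sample Address');
        INSERT INTO author VALUES ('b1', 'Ann', 'Example');
        INSERT INTO author VALUES ('b1', 'Bob', 'Example');
    """)
    rows = conn.execute(OUTER_UNION_SQL).fetchall()
    conn.close()
    return ET.tostring(tag_rows(rows), encoding="unicode")
```

Calling `demo()` returns one `<book>` element for the Kluwer title, containing its name, its publisher address, and both authors. This sketch builds the whole tree in memory for convenience; a streaming tagger could emit each `<book>` as soon as the bookID changes and keep only the current nesting state.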

XPERANTO Project Summary
Goal is to publish O-R data in XML form
  – Default XML views
  – XML-QL for defining useful views
  – “Look Ma, no SQL!”
Currently (re)building our prototype
  – View composition is our first stop
  – Updates in addition to queries
  – Queries over both data and metadata
  – Other needs for XML web sites...?

A Few Closing Remarks
DB community must ensure that the web will support real queries…!
  – XML Schema and XML Query standards need ongoing input from DB researchers
  – Large-scale technologies needed for XML indexing, caching, querying, etc.
DB community should also work on important underlying technologies
  – Publishing XML both from and to RDBMSs and ORDBMSs, for example!