XML and RDBMSs (Storage and Publishing)
Transcript of XML and RDBMSs (Storage and Publishing)
1
1
1
ICS-FORTH & Univ. of Crete Spring 2010
XML and RDBMSs(Storage and Publishing)
2
ICS-FORTH & Univ. of Crete Spring 2010
Where’s the XML Data?
ExportLegacy databases
ImportWarehouse XML data
?
WrapSOAP/CORBA/Java RMI
Business
Application
Logic
?
ViewMinimal result
2
2
3
ICS-FORTH & Univ. of Crete Spring 2010
Two XML - DBMS Problems
� XML documents need to be stored and retrieved in databases
�Exploit capabilities of RDBMS for efficient XML data processing
�Scalability, Availability, Performance, ...
�Need to support mapping schemes for shredding XML into relations
�Need to propagate constraints from XML to relations
� Data stored in databases need to be published in XML
�Most (structured) data stored in RDBMS
�Seamless integration of XML and Relational data
�Specification schemes for publishing needed
�Efficient publishing algorithms needed
� When not to use RDBMS
�Streaming data (RSS, SOAP), Information Retrieval (Google)
4
ICS-FORTH & Univ. of Crete Spring 2010
Storing XML Data
� Storage foundation of efficient XML query processing
�Flat streams: store XML data as is in text files
�fast for storing and retrieving whole documents
−query support: limited; concurrency control: no
�Native XML Databases: designed specifically for XML
�XML document stored as is
�Efficient support for XML queries
�Many techniques need to be re-developed
�Colonial Strategies: Re-use existing storage systems
�Leverage mature systems (RDBMS)
�Simple integration with legacy data
�Map XML document into underlying structures
E.g., shred document into flat tables
3
3
5
ICS-FORTH & Univ. of Crete Spring 2010
� A new XML type (like varchar, date, numeric)
� SQL / XML 2003, the first edition of the SQL/XML standard published by the ISO as part 14 of the SQL standard: ISO/IEC 9075-14:2003
�XML type restricted to
�XML document or
�XML element or
�Sequence of XML elements
� SQL / XML, 2nd edition (ISO/IEC 9075-14:2006)
�Full support of the XQuery Data Model
�XML(SEQUENCE), XML(ANY CONTENT), ...
� N.B. XML in different rows is incomparable
Commercial Systems -- XML Type in SQL
6
ICS-FORTH & Univ. of Crete Spring 2010
Example (SQL:2003)
create table books(
title varchar(20),
authors XML);
„D. Kossmann“Dual Buffering
<author>D. Chamberlin </author>
<author>D. Florescu</author>
<author>et al.</author>
XQuery 1.0
AuthorsAuthorsAuthorsAuthorsTitleTitleTitleTitle
No schema validation, no typing!
4
4
7
ICS-FORTH & Univ. of Crete Spring 2010
From XML to Relations
� Sophisticated query processing, storage and concurrency control
techniques have been developed for relational DBMS
�Thus, why not take advantage of available DBMS techniques?
� Store and query XML data using relational DBMS
�Derive a relational schema from an XML DTD (schema)
�Shred XML data into relational tuples – store XML data
�Translate XML queries to SQL queries
�Convert query results back to XML
RDB
query answer
store query translation
DBMS
XML
8
ICS-FORTH & Univ. of Crete Spring 2010
Architecture: XML Shredding
RDB
query answer
store
DBMS
XML
XML query parsing query rewriting
XML tagger Tagging results
query translation
Relational schemagenerator
XML documentshredder
5
5
9
ICS-FORTH & Univ. of Crete Spring 2010
Nontrivial Issues
� XML storage peculiarities
�Characteristics of XML data: optional elements & values, repetition, choice, inherent order, large text fragments, mixed content
�Mapping DTDs to relational schemas: data model mismatch
• DTD: recursive, regular expressions
• relational schema: tables
�Characteristics of XML queries: document order & structure, full-text search, transformation
�Query translation: query language mismatch
• XQuery, XSLT: Turing-complete
• SQL: first-order
� Information preservation:
�lossless: there should be an effective method to reconstruct the original XML document from its relational storage – the inverse of the storage mapping should exist
�propagation/preservation of integrity constraints – will be seen shortly
10
ICS-FORTH & Univ. of Crete Spring 2010
Derivation of a Relational Schema
� Lossless: the original document can be effectively reconstructed from its relational representation
� Query support: XML queries should be able to be rewritten to efficientrelational queries
� Normal form: BCNF, 3NF have proved extremely useful for integrity maintenance of updates -- will be addressed later
Relational schema generator
XML document shredder
6
6
11
ICS-FORTH & Univ. of Crete Spring 2010
Running Example: Book Document
DTD:
<!ELEMENT db (book*)>
<!ELEMENT book (title,authors*,chapter*,ref*)>
<!ELEMENT chapter (text | section)*>
<!ELEMENT ref book>
<!ELEMENT title #PCDATA>
<!ELEMENT author #PCDATA>
<!ELEMENT section #PCDATA>
<!ELEMENT text #PCDATA>
� Recursive
� Complex regular expressions
12
ICS-FORTH & Univ. of Crete Spring 2010
Graph Representation of the DTD
� Each element type/attribute is represented by a unique node
� Edges represent the subelement (and attribute) relations
� *: 0 or more occurrences of subelements
� Cycles indicate recursione.g., book
� Simplification: e.g., (text|section)*� text* | section* -- ignore order
� XML document conforming to the DTD are those trees that unfold the graph (special treatment: * � empty at leaf)
db
title
book
author chapter ref
****
**** ********
**** ****
sectiontext
7
7
13
ICS-FORTH & Univ. of Crete Spring 2010
Canonical Representation
� Store an XML document as a graph (tree)
�Node relation: node(nodeId,tag,type)
e.g., node(02,book,element), node(03,author,element)
�Edge relation: edge(sid,did)
sid, did: source and destination nodes; e.g., edge(02,03)
� Pros and cons
�Lossless: the original document can be reconstructed; order preserving:
one can add “order” information to the edge relation
�Flexible schema: easy schema evolution but ignores regularity in structure
�Supports heterogeneous elements with optional values
�Querying: requires multi-table joins or self joins for element reconstruction
�A simple query /db/book[author=“Bush”]/title requires 3
joins of the edge relation!
14
ICS-FORTH & Univ. of Crete Spring 2010
Schema-driven: the Shred Inlining Algorithm
� Require the availability of the document DTD
� Represent the DTD as a graph (simplifying
regular expressions)
� Traverse the DTD graph depth-first
�and create relations for the nodes
�the root
�each * node
�each recursive node
�each node of in-degree > 1
� Inlining: nodes with in-degree of 1
are inlined – no relation is created
db
title
book
author chapter ref
****
**** ********
**** ****
sectiontext
8
8
15
ICS-FORTH & Univ. of Crete Spring 2010
Relational Schema Example
� To preserve the semantics
�ID: each relation has an artificial ID (key)
�parentID: foreign key coding edge relation
�Column naming: path in the DTD graph
db(dbID)
book(bookID,parentID,code,title:string)
author(authorID,bookID,author:string)
chapter(chapterID,bookID)
ref(refID,bookID)
text(textID,chapterID,text:string)
section(sectionID,chapterID,section:string)
Note: title is inlined
db
title
book
author chapter ref
****
**** ********
**** ****
sectiontext
16
ICS-FORTH & Univ. of Crete Spring 2010
Keys and Foreign Keys
db(dbID)
book(bookID,parentID,code,title:string)
author(authorID,bookID,author:string)
chapter(chapterID,bookID)
ref(refID,bookID)
text(textID,chapterID,text:string)
section(sectionID,chapterID,section:string)
� Keys: book.bookID, author.authorID, …
book.parentID ⊆ db.dbID if code = 1
book.parentID ⊆ ref.refID if code = 0
� Foreign keys: referring to parent node
author.bookID ⊆ book.bookID, similarly for chapter, ref
text.chapterID ⊆ chapter.chapterID, similarly for section
db
title
book
author chapter ref
****
**** ********
**** ****
sectiontext
9
9
17
ICS-FORTH & Univ. of Crete Spring 2010
� Use associated DTD/XML Schema to decompose document
� Shred inlining:
� Rule of thumb: Inline as much as possible to minimize number ofjoins
� Shared: do not inline if shared, set-valued, recursive
� Hybrid: also inline if shared but not set-valued or recursive
� Querying: It supports a large class of XML queries commonly encountered
+ Fast lookup & reconstruction of inlined elements
- Reconstruction may require multi-table joins and unions
Schema-Driven: Summary
18
ICS-FORTH & Univ. of Crete Spring 2010
Schema-Driven: Summary
� Instance mapping can be easily derived from schema mapping
� Is it lossless?
�The order information is lost (simplification of regular expressions
defining element types)
� Is there anything missing?
�“core dumping” the entire document to a new database
In practice one often wants to
�select relevant data from the document
�store the selected data in an existing database of a predefined
schema
�XML Schema: type + constraints
What happens to XML constraints? Can we achieve normal forms
(BNCF, 3NF) for the relational storage?
10
10
19
ICS-FORTH & Univ. of Crete Spring 2010
Selective Mapping from XML to Relations: Example
� Existing relational database R :
book(id,title)
ref(id1,id2)
� Select from XML and store in R
�books with title containing “WMD”, and
�books cited, directly or indirectly
� Difference:
�select only part of the data from an input document
�store the data in an existing database with a fixed schema
RXML
existing DBbook documentSQL inserts
db
title
book
author chapter ref
****
**** ********
**** ****
sectiontext
20
ICS-FORTH & Univ. of Crete Spring 2010
Mapping Specification: XML2DB Mappings
�XML2DB Mapping:
�Input: an XML document T of a DTD D, and an existing database schema R
�Output: a list of SQL inserts ∆R, incrementing the database of R
�Rely on an extension of Attribute Grammars:
�treat the DTD D as an ECFG (extended context-free)
�associate semantic attributes and actions with each production rule
�attributes: passing data top-down
�actions: generate SQL inserts ∆R
�evaluation: generate SQL inserts in parallel with XML parsing
11
11
21
ICS-FORTH & Univ. of Crete Spring 2010
XML2DB Mappings
� DTD: normalized; element type definitions e → α
α ::= PCDATA | ε | e1, …, en | e1 + … + en | e*
� Relation variables: for each relation schema Ri, define a variable ∆Ri, which holds tuples to be inserted into Ri
� Attributes: $e associated with each element type e
$e: tuple-valued, to pass data value top-down
� Rules: associated with each e → α; conditional statements�for each e’ in α, define $e’ using the parent attribute $e�Increment relation variables: ∆Ri := ∆Ri U {tuple}
22
ICS-FORTH & Univ. of Crete Spring 2010
XML2DB Mapping: Attributes Example
� db → book*
$book := top:= top:= top:= top /* indicating the children of the root
db
book ...book book book
$book $book$book$book
db
title
book
author chapter ref
****
**** ********
**** ****
sectiontext
12
12
23
ICS-FORTH & Univ. of Crete Spring 2010
XML2DB Mapping: Semantic Actions Example
� book → title, author*, chapter*, ref*if text(title) contains “WMD” or ($book<>toptoptoptop and $book<>bot)then id:=gen_id( ); /* generate a new id */
∆book:=∆book U {(id,text(title))};/*increment book*/if $book <> top /* cited by another book */
then ∆ref:=∆ref U {($book,id)}; /*increment ref*/$ref := id; /* passing information downward */
else $ref := botrecall relation schema: book(id,title), ref(id1,id2)
� gen_id(): a function generating a fresh unique id
� conditional: either has “WMD” or is referenced by a
book of WMDdb
reftitle
“WMD”
book
chapterchapter ref
$book
$refrefrefref $refrefrefref
ref ...
24
ICS-FORTH & Univ. of Crete Spring 2010
Implementing XML2DB Mappings
XML
XML2DB
RDBparsing
SAX parsing: SQL insertsgeneration
SAX actions
SQL insertsexecution
SAX parsing extended with corresponding semantic actions
� startDocument( ), endDocument( );
� startElement(A, eventNo), endElement(A);
� text(s)
SQL updates:
insert into book
select *
from ∆∆∆∆bookbookbookbook
13
13
25
ICS-FORTH & Univ. of Crete Spring 2010
XML Constraints
� Recall: an XML schema consists of both types and constraints
�(//book,{isbn}) -- isbn is an (absolute) key of book
�(//book,(chapter,{number}) -- number is a key of chapter
relative to book
�(//book,(title,{ })) -- each book has a unique titledb
book bookbook book
title chapter
“XMLXMLXMLXML”
chapter
title section
“6”
number section
number text DTD
number
“1” number XPath
title
“XMLXMLXMLXML” number
“1”
number
“10”
isbn isbn
title
chapter chapter
26
ICS-FORTH & Univ. of Crete Spring 2010
Mapping from XML to a Predefined Relation
� One wants to store certain information from the XML document in:
RDB: chapter(bookTitle, chapterNum, chapterTitle)
�Mapping: for each book, extract its title, and the numbers and titles of all its chapters
�Predefined relational key: (bookTitle, chapterNum)
� Can the XML document be mapped to the relation without violating the key?
chapter chapter
db
book bookbook book
title chapter
“XMLXMLXMLXML”
chapter
titletitletitletitle section
“6”
numbernumbernumbernumber section
number text DTDDTDDTDDTD
number
“1111” number XPathXPathXPathXPath
title
“XMLXMLXMLXML” numbernumbernumbernumber
“1111”
number
“10”
isbn isbn
titletitletitletitle
14
14
27
ICS-FORTH & Univ. of Crete Spring 2010
A Safe Mapping
� Now change the relational schema to
RDB: chapter(isbn, chapterNum, chapterTitle)
�The relation can be populated without any violation. Why?
�The relational key (isbn, chapterNum) for chapter is implied (entailed) by the keys on the original XML data:
(//book,{isbn})
(//book,(chapter,{number})
(//book,(title,{ }))
chapter chapter
db
book bookbook book
title chapter
“XMLXMLXMLXML”
chapter
titletitletitletitle section
“6”
numbernumbernumbernumber section
number text DTDDTDDTDDTD
number
“1111” number XPathXPathXPathXPath
title
“XMLXMLXMLXML” numbernumbernumbernumber
“1111”
number
“10”
isbn isbn
titletitletitletitle
28
ICS-FORTH & Univ. of Crete Spring 2010
Why Do We Care About Constraints?
� Constraints are a fundamental part of the semantics of the data –mapping from XML to relations should not lose the information
� Relational constraints are important for query optimization, data cleaning, and consistency/integrity maintenance, . . .
� Constraints help us determine whether a relational schema for storing XML data makes sense or not
� Problem statement: Constraint Propagation
�Input: a set K of XML keys, a predefined relational schema S, a mapping f from XML to S, and a functional dependency FD over S
�Output: is FD implied by K via f? I.e., does FD hold over f(T) for any XML document T that satisfies K?
Note: XML schema/DTD is not required – K is the only semantics
15
15
29
ICS-FORTH & Univ. of Crete Spring 2010
Constraints Can Do Even Better
� One wants to find a “good” relational schema to store:
chapter(isbn,bookTitle,author,chapterNum,chapterTitle)
� What is a good schema? In normal form: BCNF, 3NF, …
�Prevent update anomaly (the relational theory)
�Efficient storage, …
� But how to find a normalized design?
chapter chapter
db
book bookbook book
title chapter
“XMLXMLXMLXML”
chapter
titletitletitletitle section
“6”
numbernumbernumbernumber section
number text DTDDTDDTDDTD
number
“1111” number XPathXPathXPathXPath
title
“XMLXMLXMLXML” numbernumbernumbernumber
“1111”
number
“10”
isbn isbn
titletitletitletitle
30
ICS-FORTH & Univ. of Crete Spring 2010
Constraint Propagation and Normalization
From given XML keys:
(//book,{isbn}), (//book,(chapter,{number}),
(//book,(title,{ }))
one can derive functional dependencies:
isbn → bookTitle,
isbn, chapterNum → chapterTitle
Normalize the relation by using these functional dependencies:
chapter(isbn,bookTitle,author,chapterNum,chapterTitle)
book(isbn, bookTitle),
chapter(isbn, chapterNum, chapterTitle),
author(isbn, author)
The new schema is in BCNF!
16
16
31
ICS-FORTH & Univ. of Crete Spring 2010
Computing Minimum Cover of Propagated FDs
� Problem statement: Minimum Cover of Propagated FDs
�Input: a set K of XML keys, and a mapping f from XML to S
�Output: a minimum cover F of all the functional dependencies (FDs) implied by the XML keys K via f
�F is a cover (a set of FDs): any FD implied by K via f is also implied by F
�F is minimum: F contains no redundant FDs, i.e., any FD in F is not entailed by other FDs in F
� Why not compute the complete set of FDs implied by K via f?
�Exponentially large!
� Why care about minimum cover?
�Normalize relational schema for storing/querying XML data!
Remark: again XML schema/DTD is not required – K is the only semantic specification
32
ICS-FORTH & Univ. of Crete Spring 2010
Research Issues
� What we know: for XML keys and a restricted transformation language
(from XML to relations), there exist efficient algorithms
�Constraint propagation: polynomial
�Minimum cover of propagated FDs: polynomial
� For general constraints/transformation languages: beyond reach
�Undecidable if the transformation language is relationally complete
(selection, projection, join, union, difference), even for XML keys
�Undecidable if both XML keys and foreign keys are considered,
even for the identity “transformation”
� Open: identify
�practical transformation languages
�practical XML constraints
that allow efficient constraint propagation
17
17
33
ICS-FORTH & Univ. of Crete Spring 2010
� Store XML as a BLOB in relational database�index by materializing indexed expressions in separate columns�Plus: store XML in parsed and validated form�Minus: proprietary solution (blob is a black box)�Minus: replicate data for indexing
� Model-driven Shredding (schema-agnostic or generic) (Florescu, Kossmann 99)�Edge, binary approach + alternatives �Corresponds to generic APIs for Java �Plus: very general; integrates well with relational data�Minus: poor performance
� Schema-based Shredding (or schema-specific) (Shanmugasundaram et al. 99)�Map XML Schema / DTD to SQL DDL�Plus: integrates well with relational data�Minus: missing tools, complicated
� SQL/XML (latest product developments after 2005)�Extend SQL with XML data type�Plus: integrates well with relational data�Minus: not clear how it integrates with application, odd „marriage“
XML-RDB Summary of Storage Approaches
34
ICS-FORTH & Univ. of Crete Spring 2010
XML Publishing
� An XML view definition language: specifying desired mapping
� Efficient implementation of the view language
DB DB
XML
Web
XML
Q: XML view
18
18
35
ICS-FORTH & Univ. of Crete Spring 2010
From Relations to XML Views
ActorActorActorActor(aidaidaidaid, lname, fname, lname, fname, lname, fname, lname, fname) :
{ <001, 001, 001, 001, ““““ViterelliViterelliViterelliViterelli””””, , , , ““““JoeJoeJoeJoe””””>,
…}
MovieMovieMovieMovie(midmidmidmid, title, year, title, year, title, year, title, year) :
{ <011, 011, 011, 011, ““““Analyze ThisAnalyze ThisAnalyze ThisAnalyze This””””, 1999, 1999, 1999, 1999>,
<032, 032, 032, 032, ““““See Spot RunSee Spot RunSee Spot RunSee Spot Run””””, 2001, 2001, 2001, 2001>
…}
AppearanceAppearanceAppearanceAppearance(midmidmidmid, , , , aidaidaidaid) :
{ <001, 011001, 011001, 011001, 011>, <001, 032001, 032001, 032001, 032>
…}
<Actor aid=001Actor aid=001Actor aid=001Actor aid=001><LnameLnameLnameLname>ViterelliViterelliViterelliViterelli</LnameLnameLnameLname><FnameFnameFnameFname>JoeJoeJoeJoe</FnameFnameFnameFname><Movie mid=011 year=Movie mid=011 year=Movie mid=011 year=Movie mid=011 year=““““1999199919991999””””> > > >
Analyze This Analyze This Analyze This Analyze This </MovieMovieMovieMovie><Movie mid=032 year=Movie mid=032 year=Movie mid=032 year=Movie mid=032 year=““““2001200120012001””””> > > >
See Spot Run See Spot Run See Spot Run See Spot Run </MovieMovieMovieMovie>
</ActorActorActorActor><Actor aid=002Actor aid=002Actor aid=002Actor aid=002>
<LnameLnameLnameLname>WinterWinterWinterWinter</LnameLnameLnameLname><FnameFnameFnameFname>AlexAlexAlexAlex</FnameFnameFnameFname><Movie year=Movie year=Movie year=Movie year=““““1988198819881988””””> Bill and Ted> Bill and Ted> Bill and Ted> Bill and Ted’’’’ssssExcellent Adventure Excellent Adventure Excellent Adventure Excellent Adventure </MovieMovieMovieMovie>
36
ICS-FORTH & Univ. of Crete Spring 2010
Commercial Systems -- Canonical Publishing
� Canonical publishing: the universal-relation approach
�Embedding single SQL query in XSL stylesheet
�Result: canonical XML representation (1:1) of published relations
� Systems:
�Oracle 10g XML SQL facilities
�DB2
�SQL Server 2005
� Incapable of expressing practical XML publishing: default fixed XML
document template
�Map SQL Types to XML Schema Types
�Encode special characters: e.g., „<“ -> „<“
�Provide functions to construct XML Data
19
19
37
ICS-FORTH & Univ. of Crete Spring 2010
<Actor>
<Lname>Viterelli</Lname>
<Fname>Joe</Fname>
<Movie year=“1999”> Analyze This
</Movie>
<Movie year=“2001”> See Spot Run
</Movie>
</Actor>
<Actor>
<Lname>Winter</Lname>
<Fname>Alex</Fname>…
<SQL_stmt>
SELECT A.lname,A.fname, M.year, M.title
FROM Movie M, Actor A, Appearance Ap
WHERE M.mid=Ap.mid AND M.aid=A.Aid
ORDER BY aid
</SQL_stmt>
A.lname A.fname M.year M.title
Viterelli Joe 1999 See Spot Run
Viterelli Joe 2001 Analyze This
<element_node Actor>
<element_node Lname>
<text_node>
<Column name=“A.lname”/>
</text_node>
…
IBM DB2 SQL Statement giving
big relation
+ Formatting Template annotated withcolumns of the universal relation
Canonical Publishing
38
ICS-FORTH & Univ. of Crete Spring 2010
Map SQL Types to XML Schema Types
Xsd:patternXsd:durationINTERVAL
Xsd:patternXsd:dateTimeTIMESTAMP
Xsd:patternXsd:timeTIME
Xsd:patternXsd:dateDATE
Xsd:doubleDOUBLE
Xsd:floatFLOAT
precision, scaleXsd:decimalNUMERIC, DEC.
min-maxInclusiveXsd:integerSMALLINT, ...
Xsd:maxLengthXsd:base64binaryBLOB
Xsd:maxLengthXsd:stringVARCHAR
Xsd:lengthXsd:stringCHAR
Xsd:booleanBOOLEAN
RestrictionsRestrictionsRestrictionsRestrictionsXML SchemaXML SchemaXML SchemaXML SchemaSQLSQLSQLSQL
20
20
39
ICS-FORTH & Univ. of Crete Spring 2010
Architecture: XML Publishing Middleware
Query
GeneratorRequest
SQL
Queries
???
View
Definition
Query-cost Estimates
Source Capabilities
Tagger
40
ICS-FORTH & Univ. of Crete Spring 2010
XPERANTO
� Commercial system: IBM DB2 XML extender, SQL/XML
� Middleware (vendor-independent): XPERANTO
� Extending SQL with XML constructors:
select XML-aggregation
from R1, . . ., Rn
where conditions
� XML constructors (XML-aggregation): functions
�Input: tables and XML trees (forest)
�Output: XML tree
21
21
41
ICS-FORTH & Univ. of Crete Spring 2010
XML Publishing with XPERANTO
Relational schema:
Extended SQL:
select XMLAGG(ACTOR(lname,fname,
select XMLAGG(MOVIE(title,year)
from Appearance Ap,Movie M
where Ap.aid = A.aid and Ap.mid = M.mid
group order by A.lname, A. fname ))
from Actor A
Actor (aid, lname, fname)
Appearance (mid, aid)
Movie (mid, title, year)
<<<<ActorActorActorActor>
<LnameLnameLnameLname>ViterelliViterelliViterelliViterelli</LnameLnameLnameLname>
<FnameFnameFnameFname>JoeJoeJoeJoe</FnameFnameFnameFname>
<Movie year=Movie year=Movie year=Movie year=““““1999199919991999””””> . . . > . . . > . . . > . . . </MovieMovieMovieMovie>
<Movie year=Movie year=Movie year=Movie year=““““2001200120012001””””> . . . > . . . > . . . > . . . </MovieMovieMovieMovie>
</ActorActorActorActor>
<ActorActorActorActor>
<LnameLnameLnameLname>WinterWinterWinterWinter</LnameLnameLnameLname>
<FnameFnameFnameFname>AlexAlexAlexAlex</FnameFnameFnameFname>
…………
42
ICS-FORTH & Univ. of Crete Spring 2010
XML Constructors
� Actor constructor:
create function ACTOR(lname:str, fname:str, mlist:XML)<Actor>
<Lname> {lname} </Lname> <Fname> {fname} </Fname>mlist
</Actor>
� Movie constructor (mlist):
create function MOVIE(title:str, year:int)
<Movie year=“{year}”> {title} </Movie>
� Verbose and cumbersome
�small document: tedious
�large documents: unthinkable
22
22
43
ICS-FORTH & Univ. of Crete Spring 2010
SilkRoute
� Annotated template: embedding SQL in a fixed XML tree
�Middleware: SilkRoute
�Commercial: SQL Server 2005, IBM DB2 DAD, XML/SQL
� Advantages:
�More `modular’ comparing to the universal relation approach
�Limited schema-driven: conforming to a fixed doc template
root
Q1 Q2Q3
Q4 Q5
tag1 tag2 tag3
tag5 tag6
44
ICS-FORTH & Univ. of Crete Spring 2010
� Data transformations based on inter-schema constraints
�Given two schemas S1 and S2, want to translate instances of S1 to instances of S2. Constraint-driven approach:
�Start with a set of referential integrity constraints (foreign keys) from S1 to S2
�Derive schema (logical) mapping from S1 to S2, by reasoning about the given integrity constraints (chasing technique)
�Derive mapping queries for translating instances of S1 to instances of S2
� Pros and cons:
�Generic system, and can be used for XML publishing
�Semi-automated
�Chasing may not terminates for XML DTD/Schema
�Recursive target schema (S2)? unclear
Clio
23
23
45
ICS-FORTH & Univ. of Crete Spring 2010
Getting Real: Data Exchange on the Web
� All members of a community (industry) agree on a DTD and
then exchange data w.r.t. it: e-commerce, health-care, ...
� XML Publishing:
�mapping relational data to XML
�conforming to the predefined DTD
DTD
DB DB
XML
Web
XML
Q: XML view
46
ICS-FORTH & Univ. of Crete Spring 2010
Data Exchange: Insurance Company and Hospital
Daily report:
Relational database R at the hospital:
Patient (SSN, name, tname, policy#, date)
inTreatment (tname, cost)
outTreatment (tname, referral#)
Procedure (tname1, tname2)
treatment
� in hospital: composition hierarchy in Procedure
� outside of the hospital: referral#
R XML
hospital insurance companyXML view
24
24
47
ICS-FORTH & Univ. of Crete Spring 2010
Data Exchange: Insurance Company and Hospital
� DTD D predefined by the insurance company:
report → patient*
patient → SSN, pname, treatment, policy#
treatment → tname, (inTreatment + outTreatment)
inTreatment → treatment*
outTreatment → referral#
� How to define a mapping σ such that for any instance DB of R,
�σ (DB) is an XML document containing all the patients and their
treatments (hierarchy, referral#) from DB,
�σ (DB) conforms to D?
48
ICS-FORTH & Univ. of Crete Spring 2010
Challenge: Recursive Type
� XML data: unbounded depth -- cannot be decided statically
treatment → tname, (inTreatment + outTreatment)
inTreatment → treatment* --- recursive
SSN
“123”
treatment
inTreatment
...treatment
...
...report
patient patient patientpatient
treatmentname
tname
“Joe”
...
Policy#Policy#Policy#Policy#
<-- unbounded
25
25
49
ICS-FORTH & Univ. of Crete Spring 2010
Challenge: Non-Determinism
� The choice of a production (element type definition)
treatment → tname, (inTreatment ++++ outTreatment)-- depends on the underlining relational data
report
...patient patient patientpatient
...treatment treatment referral#
inTreatment
tnameoutTreatment
treatment <-- data-dependent
50
ICS-FORTH & Univ. of Crete Spring 2010
Existing Systems
� Fixed XML tree template or ignoring DTD-conformance
�middleware: SilkRoute (AT&T), XPERANTO (IBM),…
�systems: SQL Server 2005, IBM DB2 XML extender, …
incapable of coping with a predefined DTD (e.g. recursion)
� Type checking: define a view and then check its conformance
�undecidable in general, co-NEXPTIME for extremely restricted view
definitions
�no guidance on how to define XML views that typecheck
one gets an XML view that typechecks only after repeated failures and
with luck
26
26
51
ICS-FORTH & Univ. of Crete Spring 2010
� DTD: normalized; element type definitions e → α
α ::= PCDATA | ε | e1, …, en | e1 + … + en | e*
� Attributes: $e associated with each element type e
$e: tuple-valued, to pass data value as well as control
� Rules: associated with each e → α; for e’ in α, $e’ = Q($e)
�SQL query Q extracts data from DB
�parent attribute $e as a constant parameter in Q
ATG DTD
Semantic rulesattributes
=+
Schema-directed XML Publishing: Attribute Translation Grammar (ATG)
52
ICS-FORTH & Univ. of Crete Spring 2010
Semantics: Conceptual Evaluation
� Top-down
report → patient*
$patient ←←←← select SSN, name, tname, policy
from Patient --- SQL query
recall Patient (SSN, name, tname, policy#)
� Data-driven: a patient element for each tuple in Patient relation
report
patient ...patient patient patient
$patient $patient$patient$patient
27
27
53
ICS-FORTH & Univ. of Crete Spring 2010
Inherited attributes
Inherited: $child is computed using $parent
� patient → SSN, name, treatment, policy#
$SSN=$patient.SSN, $name=$patient.name
$treatment=$patient.tname $policy#=$patient.policy
recall $patient = (SSN, name, tname, policy)
� SSN → PCDATA
$PCDATA = $SSN report
SSN
“123”
patient ...patient patient patient
treatmentname
“Joe”
policy#
“LU23”
$patient
$SSNSSNSSNSSN
$PCDATAPCDATAPCDATAPCDATA
54
ICS-FORTH & Univ. of Crete Spring 2010
Coping with Non-Determinism
treatment → tname, (inTreatment + outTreatment)
$tname = $treatment
($inTreatment, $outTreatment) =
case Qc($treatment).tag --- conditional query
1: ($treatment, null)
else: (null, $treatment)
Qc: selectselectselectselect 1 as tag fromfromfromfrom inTreatment wherewherewherewhere tname= $treatment
� conditional query: the choice of production
� $parent as constant parameter in SQL query
inTreatmenttname
treatment
outTreatment
treatment
tname
...Qc($treatment)=nullQc($treatment)=1
28
28
55
ICS-FORTH & Univ. of Crete Spring 2010
Coping with Recursion
inTreatment → treatment*
$treatment ← select tname2
from Procedure
where tname1 = $inTreatment
recall Procedure (tname1, tname2)
� $parent as constant parameter in SQL query Q
� inTreatment is further expanded as long as Q(DB) is nonempty
inTreatment
treatment treatment
treatment
tname
...$treatment $treatment
56
ICS-FORTH & Univ. of Crete Spring 2010
DTD-Directed Publishing with ATGs
� DTD-directed: the XML tree is constructed strictly following the
productions of a DTD --- DTD conformance
� Data-driven: the choice of productions and expansion of the XML tree
(recursion) depends on relational data
treatment
inTreatment
...treatment
...
...report
patient patient patientpatient
treatmentname
tname“Joe”
...
policy#policy#policy#policy#SSN
“123”
<-- unbounded
data-dependent
29
29
57
ICS-FORTH & Univ. of Crete Spring 2010
ATGs vs. existing systems
� DTD-conformance:
�ATGs: provide guidance for how to define DTD-directed publishing
�Other systems: based on a fixed tree template
� Expressive power: strictly more expressive than others
�ATGs: capable of expressing XML views supported by other
systems
�Other systems: cannot handle recursion/nondeterminism
ATG: the first systematic method to ensure DTD-conformance
58
ICS-FORTH & Univ. of Crete Spring 2010
ATGs vs. Attribute Grammars (AGs)
�AGs:
�Definition: w.r.t. a CFG
�Evaluation: parse a string with the CFG, then evaluate attributes
given the parse tree
�ATGs: combining DTD and database operations
�Definition: w.r.t. an ECFG (DTD) and SQL queries
�Evaluation: given DB, extract relevant data from DB with SQL
queries to build an XML tree of the DTD
It does not make sense to “parse” a database w.r.t. a DTD
ATGs are not a mild variation of AGs
30
30
59
ICS-FORTH & Univ. of Crete Spring 2010
PRATA: Middleware based on ATGs
XMLATG
R
parsing
query plan generation, evaluation
relationstagging
ATG graph
cost estimate
query
statistics
query results
� ATG graph: representing the ATG
� relations: representing root-leaf paths --->
� tagging: one pass
DCBA
60
ICS-FORTH & Univ. of Crete Spring 2010
Evaluation of ATGs
� Conceptual evaluation: defining semantics, but not efficient
� Techniques proved useful
�partitioning: reduce db visits, make use of DBMS optimizer
(handling annotated templates – complete tree)
�Relations representing XML trees: root-leaf paths
�Separate tagging
root
Q1 Q2Q3
tag1 tag3
Q4 Q5
tag2
tag5 tag6
partitioning
31
31
61
ICS-FORTH & Univ. of Crete Spring 2010
ATG graph
inTreatment outTreatment
referral#
report
patient
nameSSN treatment
tname
policy#
Q1
Q3
Q2 Q5
Q4
*
Q7Qc
procedure
Q6 Q9
*
Q8
� edge labels: SQL queries
� cyclic: introduced by recursion -- not in other systems
� iterative unfolding: partial ATG tree to a depth d
62
ICS-FORTH & Univ. of Crete Spring 2010
Partitioning partial ATG trees
� query composition (outer join)
� challenge: how to partition into clusters?
�finer ⇒ fewer nulls, smaller queries sent to DBMS
�coarser ⇒ leverage DBMS optimizer, less DB visits
Finding an optimal strategy: NP-hard
...
report
name
treatment
patient
SSN policy#
Q1
Q3Q2 Q5
Q4
*
*
inTreatment outTreatment
referral#
tnameQ7
Qc
procedureQ6 Q9
Q8
treatment
composed Q’
composed Q
32
32
63
ICS-FORTH & Univ. of Crete Spring 2010
Materialization of Intermediate Results
Partitioning: root-leaf relation R⇒ intermediate R1, R2, R3, R4
� repeated computation: each Ri carrying a key from the root
�materialization: storing keys in temporary tables
�cons: overhead with temporary tables
�pros: reducing recomputation in descendant queries
What keys to materialize? Not supported by other systems
R1
R2 R3
P1
P2P3
P4
output relation R
R4
sort merge
sort merge key
key
materializing keys in R1
materializing keys in R3
64
ICS-FORTH & Univ. of Crete Spring 2010
Query Plan Generation
� New challenge: find “best” partitioning and materialization
�mutually dependent: partitioning and materialization must be taken
together
�expensive: exponentially many choices – NP-hard
� Efficient cost-based heuristic:
�DBMS statistics: arity/cardinality of a table, cost to evaluate a
query/create temporary table
�cost estimate: simple formulae for computing the benefit of
materialization/partitioning -- effective in practice
�“optimal strategy” -- dynamic programming: decides partitioning
w.r.t. the benefit of materialization
33
33
65
ICS-FORTH & Univ. of Crete Spring 2010
ATG Summary
� ATGs:
�ensure DTD-conformance automatically
�support recursion/non-determinism naturally
�simple: no need for a new language – knowing DTD + SQL is all
that one needs for writing an ATG
� PRATA: middleware based on ATGs
�novel technique of combining partitioning and materialization
�a practical solution for DTD-directed publishing
ATG: the first systematic way for DTD-directed publishing
66
ICS-FORTH & Univ. of Crete Spring 2010
Extension: Capturing Integrity Constraints
� XML schema:
�Type (DTD)
�Integrity constraints: keys, foreign keys
� Schema-directed XML publishing: automatically guarantee that the
target document both conforms to the type and satisfies the constraints
-- in a single framework
Challenge: consistency analyses undecidable
DB
XML: ATG+DTD constraints
34
34
67
ICS-FORTH & Univ. of Crete Spring 2010
Schema-directed XML to XML transformations
� Mapping between XML documents
� XTG: XML Transformation Grammar
�DTD augmented with XQuery
�Constraint-compilation
XML XML
DTD
constraints
transformation
XTG
68
ICS-FORTH & Univ. of Crete Spring 2010
� Why publish XML data? What is the major difficulty?
� One can publish relational data via an XQuery view. How to do it?
What are the pros and cons of this approach?
� What is schema-directed publishing? Why do we need it?
� Why does ATG automatically ensure DTD-conformance?
Publishing Summary
35
35
69
ICS-FORTH & Univ. of Crete Spring 2010
Commercial Systems: MS SQL Server 2005
� CLOB (character large objects)
�Annotated schema (XSD): fixed tree templates�Non-recursive schema�associate elements and attributes with table and column names�core-dumping entire elements or documents�Given a relational database, XSD populates an XML elements/attributes with corresponding tuples/columns
� Data loading: �Combine INSERT and node( ), value( ) (OPENXML), XPath�OPENXML: access to XML data as a relational rowset�selective shredding, limited recursion, can’t store the entire document in a single pass
� Query support:
�SQL and SQL extensions for publishing results as XML (For-Xml clause)
�An extension of SQL with an FOR-XML construct
�Nested FOR-XML to construct XML documents
70
ICS-FORTH & Univ. of Crete Spring 2010
Commercial Systems: MS SQL Server 2005
� Summary:
�Support both entire-document and selective shredding
�Able to increment existing tables
�Nonrecursive schema
�Does not support context-dependent tuple construction: to
construct (A, B), one cannot extract attribute B based on the
extracted value of A (inf. passing)
�Does not support entire document shredding and selective
shredding in a uniform framework
�incapable of supporting schema-directed publishing
�can’t define recursive XML views (bounded recursion depth)
36
36
71
ICS-FORTH & Univ. of Crete Spring 2010
Microsoft SQL Server 2005
XML ParserXML ParserXMLXML
ValidationValidation
XML XML datatypedatatype
(binary XML)(binary XML)
SchemaSchema
CollectionCollection
XMLXML RelationalRelational
XML SchemataXML Schemata
OpenXMLOpenXML/nodes()/nodes()
FOR XML with FOR XML with
TYPE directiveTYPE directive
RowsetsRowsets
query()query()
modify()modify()Node Node
TableTable
PATH PATH
Index Index
PROP PROP
Index Index
VALUE VALUE
Index Index
PRIMARYPRIMARY
XML INDEXXML INDEX
query()query()
72
ICS-FORTH & Univ. of Crete Spring 2010
Microsoft SQL Server 2005: Indexing
� Create XML index on XML column
CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc)
� Creates secondary indexes on tags, values, paths
� Speeds up queries
�Results can be served directly from index
�Entire query is optimized
�Same award winning cost based optimizer
�Indexes are used as available
� Create XML index on XML column
CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc)
� Creates secondary indexes on tags, values, paths
� Speeds up queries
�Results can be served directly from index
�Entire query is optimized
�Same award winning cost based optimizer
�Indexes are used as available
37
37
73
ICS-FORTH & Univ. of Crete Spring 2010
Commercial System: IBM DB2 XML Extender
� XML Columns: CLOBs + side tables for indexing individual elements
� User-defined mapping through DAD (Document Access Definition): a fixed XML tree template (non-recursive)
�SQL mapping: template-based language to publish relational data as XML (to be discussed in the next lecture)
�RDB node mapping: for both publishing and storing XML (associating element and attributes with tables and columns)
� XML Collections: Declarative decomposition of XML into multiple tables
�Data loading: follows DAD mapping
� SQL/XML: an extension of SQL with XML constructors (XMLAGG, XMLELEMENT, etc) as discussed earlier
74
ICS-FORTH & Univ. of Crete Spring 2010
Commercial System: IBM DB2 XML Extender
� Summary:
�Support both entire-document and selective shredding
�Support recursive schema
�cannot increment existing tables (DB2)
�does not support context-dependent tuple construction
�incapable of supporting schema-directed publishing
�can’t define recursive XML views
38
38
75
ICS-FORTH & Univ. of Crete Spring 2010
IBM DB2 XQuery Support
Hybrid SQL/XQueryCompiler
XML Navigation
Query Evaluation Run-time
SQL/XML Parser XQuery Parser
Table Storage XML Indexes XML Storage
Native XML storage
High-performanceXPath processing
Unified internal
representation
XML node-level,text indexes
SQL extendedfor XML
Dynamic function dispatch
XML data typein SQL
XQuery top-levelparser
LanguagesCompose
76
ICS-FORTH & Univ. of Crete Spring 2010
Commercial System: Oracle 10g XML DB
� Store XML data in CLOB (character large objects) or tables
�Canonical mapping into object-relational tables
�tag names are mapped to column names
�elements with text-only map to scalar columns
�elements with sub-elements map to object types
�list of elements maps to collections
�Indexing: standard relational
�Core-dumping entire elements
�Annotated schema: recursive
� Publish relational data via DBMS_XMLGEN, a PL/SQL package
�Supports recursive XML view definition (via linear recursion of SQL’99)
� does not support schema-directed XML publishing
39
39
77
ICS-FORTH & Univ. of Crete Spring 2010
Commercial System: XQuery Support
� One of the first relational database to ship an XQuery implementation
�Supports XMLQuery and XMLTable construct
�Native compilation into SQL /XML structures
�Returns XMLType(Content)
�Can query over relational, O-R, XMLType data
�fn:doc - Maps to XDB Repository on server
�SQLPlus provides xquery command to execute XQuery
�XSL-T will also get compiled to XQuery
� Other query support:
�CLOBs: SQL + Oracle Text; XPath
�Canonical: SQL
78
ICS-FORTH & Univ. of Crete Spring 2010
Oracle Database 10gr2 XQuery Support
XQuery XSL-T Parser
Compiler Rewrite to SQLX
Normalization
XQuery Type checkSQLX rewrite
Execution engine
XQueryX
Compiled XQuery Tree
SQL MetadataXMLSchemaRepository
Statically Type checked Tree
Normalized Tree (casts, treat )
SQL/XML Operand Tree
Relational Optimizer
SQL Operand Tree
XML Indexes, Text Indexes
XQuery F&O
Execution Structures
SQL
XQUERY
40
40
79
ICS-FORTH & Univ. of Crete Spring 2010
Commercial Systems: Summary
� Data design:
�CLOBs
�Fixed canonical mappings
�Mappings in terms of annotated schema
� Querying:
�SQL as the main access method to XML documents “XML-aware”extensions to SQL
�E.g., Limited XPath navigation syntax ongoing Xquery support
� No support for
�context-dependent tuple construction
�recursive schema (Microsoft, IBM DB2)
�incrementing existing tables (Oracle)
�constraint propagation
80
ICS-FORTH & Univ. of Crete Spring 2010
Native Techniques
� Built from scratch
� NatiX (University of Mannheim, Germany)
� Xyleme (France)
� Xindice (Apache – open source)
� Re-tool existing systems to handle XML
� Tamino: hierarchical database (ADABAS)
� Excelon: OODB
� Design efficient data structures for compact storage and fast access; data partitioning; indexing on both values and structure
41
41
81
ICS-FORTH & Univ. of Crete Spring 2010
NatiX
� Unit of storage = element
� Elements clustered to minimize page hits
� Inter-element pointers capture document structure
� Low-level algorithmic support for read/write/insert/delete operations
� No use of DTDs or XML Schema
82
ICS-FORTH & Univ. of Crete Spring 2010
Xyleme
� Data layout: based on NatiX
� Indexing: sophisticated indexing of text and elements
� Query support: XPATH, XQuery, updates
� A data warehouse for XML content: store, classify, index, integrate, query and monitor massive volumes of XML content
� Semantic services: extensible thesauri and schema mappers that enable the system to go beyond simple indexing
42
42
83
ICS-FORTH & Univ. of Crete Spring 2010
Xyleme/Natix
84
ICS-FORTH & Univ. of Crete Spring 2010
eXcelon XIS
� Extends Object Store – an object-oriented database
� Data Layout: stores parsed nodes (accessible through DOM interface)
� Indexing: value indexes for strings and numbers; text indexes; structural indexes
� Query Support: DOM, XSLT, XPath, XQuery, updates
� Other features:
� Node-level management
� Data is stored in a pre-parsed format: only data objects needed for an operation are loaded into memory
� Create, add, delete, update elements directly
� Handles arbitrary XML documents - without Schema or DTD
� But can enforce schemas if necessary
� Triggers; transactions; distributed caching mechanism
43
43
85
ICS-FORTH & Univ. of Crete Spring 2010
Software A/G Tamino
� Extends Adabas – nested relations
� Indexing: value and structure
� Query support:
� Full-text search operators
� Queries return entire document or some projection of document
� No construction of new XML values (Ex: XQL)
� Access control at the node level, transactions; multi-media; triggers; backup/restore; compression; support for multi-media documents, e.g., video
86
ICS-FORTH & Univ. of Crete Spring 2010
XQuery Java implementation
� XQuery or XQueryX input
� Extensible function implementation
� Compiles into rowsource like structures
� Optimization – push XQuery to XMLDB
� XQJ API driver – for accessing mid tier/backend
� Shared data model with XSL/XPath
� Shared F&O – pre-defined & external
�Standard Function implementation interfaces
�Write Java func once – use it in XQuery/XSLT
44
44
87
ICS-FORTH & Univ. of Crete Spring 2010
DB Adapter
Java Execution engine
Java Layer Architecture
XQJ
API
Dr i ver
XQueryXQueryXXPath
XQJ API (Java)
conn = datasrc.getConnection();expr = conn.prepareExpression(
‘for $i in doc(“xxx”) return $i’);res = expr.executeQuery();while (res.next()) { … }
Push down query
select *from XMLTable(‘for $i in doc(“xx”) return $i);
User
XMLDB
SQL + XQuery orXQueryX
XQuery aware SQL engine
Rewrite to SQLX
Normalization
XQuery Type check
SQL
Compiler
XMLQuery, XMLTable
SQL/XML Rel optimizer, Execution engine
Compiler
XQuery Parser
SQLX rewrite
Query Pushdown
Normalization
Type check
Compiler
Parser
XDS data sources, files
Push down
XQuery Java Engine
88
ICS-FORTH & Univ. of Crete Spring 2010
References
� Serge Abiteboul,Sophie Cluet,Tova Milo: Querying and Updating the File. VLDB 1993
� D. Barbosa,A. Barta,A. Mendelzon,G. Mihaila, F. Rizzolo, P. Rodriguez-Gianolli: ToX – The Toronto XML Engine, International Workshop on Information Integration on the Web, Rio de Janeiro, 2001.
� Phil Bohannon, Juliana Freire, Prasan Roy, Jérôme Siméon: From XML Schema to Relations: A cost-based Approach to XML Storage. ICDE 2002
� Michael J. Carey,Jerry Kiernan, Jayavel Shanmugasundaram, Eugene J. Shekita, Subbu N. Subramanian: XPERANTO: Middleware for Publishing Object-Relational Data as XML Documents.VLDB 2000
� Qiming Chen, Yahiko Kambayashi: Nested Relation Based Database Knowledge Representation. SIGMOD Conference 1991
� Vassilis Christophides, Sophie Cluet, Jérôme Siméon: On Wrapping Query Languages and Efficient XML Integration. SIGMOD Conference 2000: 141-152
� Alin Deutsch, Mary F. Fernandez, Dan Suciu: Storing Semistructured Data with STORED. SIGMOD Conference 1999
� A. Eisenberg, J. Melton: Advancements in SQL /XML. ACM SIGMOD Record, 2004.
45
45
89
ICS-FORTH & Univ. of Crete Spring 2010
References
� Daniela Florescu, Donald Kossman: A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database. IEEE Data Eng. Bulletin 1999
� Minos N. Garofalakis, Aristides Gionis, Rajeev Rastogi, S. Seshadri, Kyuseok Shim: XTRACT: A System for Extracting Document Type Descriptors from XML Documents. SIGMOD Conference 2000
� Roy Goldman, Jennifer Widom: DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. VLDB 1997
� P.J. Marron, G. Lausen: On Processing XML in LDAP, VLDB 2001� Carl-Christian Kanne, Guido Moerkotte: Efficient Storage of XML Data.Technical Report 8/99, University of Mannheim, 1999
� Feng Tian, David J. DeWitt, Jianjun Chen, and Chun Zhang: The Design and Performance Evaluation of Various XML Storage Strategies, Technical report, University of Wisconsin
� Masatoshi Yoshikawa, Takeyuki Shimura, Shunsuke Uemura: XRel: A Path-Based Approach to Storage and Retrieval of XML Documents Using Relational Databases
� Chun Zhang, Jeffrey F. Naughton, David J. DeWitt, Qiong Luo, Guy M. Lohman: On Supporting Containment Queries in Relational Database Management Systems. SIGMOD 2001
� Justin Zobel, James A. Thom,Ron Sacks-Davis: Efficiency of Nested Relational Document Database Systems. VLDB 1991