XTree for Declarative XML Querying

44
1 XTree for Declarative XML Querying Zhuo Chen, Tok Wang Ling, Mengchi Liu, and Gillian Dobbie January 2004

description

XTree for Declarative XML Querying. Zhuo Chen, Tok Wang Ling, Mengchi Liu, and Gillian Dobbie January 2004. Outlines. Introduction Preliminaries XTree Algorithm to transform XTree query to XQuery Conclusion and future works. Outlines. Introduction Preliminaries XTree - PowerPoint PPT Presentation

Transcript of XTree for Declarative XML Querying

Page 1: XTree for  Declarative XML Querying

1

XTree for Declarative XML

QueryingZhuo Chen, Tok Wang Ling,

Mengchi Liu, and Gillian Dobbie

January 2004

Page 2: XTree for  Declarative XML Querying

2

Outlines

Introduction Preliminaries XTree Algorithm to transform XTree query to XQuery Conclusion and future works

Page 3: XTree for  Declarative XML Querying

3

Outlines

Introduction Preliminaries XTree Algorithm to transform XTree query to XQuery Conclusion and future works

Page 4: XTree for  Declarative XML Querying

4

Introduction

How to query XML documents is an important issue in XML research

Various query languages proposed: XPath, XQuery, Lorel, XML-GL, XQL, XML-QL,

XSLT, YATL, XDuce, a rule-based semantic querying, a declarative XML querying, etc

XQuery based on XPath is selected as the basis for an official W3C query language for XML

Page 5: XTree for  Declarative XML Querying

5

Introduction

In this paper, we will Analyze the limitations of XPath Propose a new set of syntax rules called XTree,

which is a generalization of XPath Show how XTree can efficiently replace the

notations of XPath Give algorithms to convert queries based on

XTree expressions to standard XQuery queries

Page 6: XTree for  Declarative XML Querying

6

Outlines

Introduction Preliminaries

Background on XPath Limitations of XPath

XTree Algorithm to transform XTree query to XQuery Conclusion and future works

Page 7: XTree for  Declarative XML Querying

7

Preliminaries

XPath A W3C standard A set of syntax rules for defining parts of an XML

document It uses paths to identify nodes (elements and

attributes) in XML documents These path expressions look very much like

computer file system

Page 8: XTree for  Declarative XML Querying

8

Background on XPath Sample XML document of a bibliography

<bib name=“IT”> <book id=“b001” year=“1994”> <title>TCP/IP Illustrated</title> <author><last>Stevens</last><first>W.</first></author> <publisher>Addison-Wesley</publisher> </book> <book id =“b002” year=“1992”> <title>Advanced Programming in the Unix Environment</title> <author><last>Stevens</last><first>W.</first></author> <publisher>Addison-Wesley</publisher> </book> <book id=“b003” year=“2000”> <title>Data on the Web</title>

<edition>3</edition> <author><last>Abiteboul</last><first>Serge</first></author> <author><last>Buneman</last><first>Peter</first></author> <author><last>Suciu</last><first>Dan</first></author> <publisher>Morgan Kaufmann</publisher> </book> <journal id=“j001” year=“1998”> <title>XML</title> <editor><last>Date</last><first>C.</first></editor> <editor><last>Gerbarg</last><first>M.</first></editor> <publisher>Morgan Kaufmann</publisher> </journal></bib>

Page 9: XTree for  Declarative XML Querying

9

Background on XPath XPath examples

/bib/book/@year Get attribute “year” of each book

/bib/book/author Get element “author” of each book

//author Get all elements named “author”, regardless of their absolute paths

/bib/book/* Get all sub-elements of each book

/bib/book/@* Get all attributes of each book

/bib/book[2] Get the second book element

/bib/book[last()] Get the last book element

Page 10: XTree for  Declarative XML Querying

10

Background on XQuery

XQuery An XML querying language to search XML

documents Based on XPath FLWOR statements

For – Let – Where – Order by – Return For clause iterate the variable over the result of its expression Let clause bind the variable to the result of its expression

Complex queries (nested clauses) Complex result constructions User-defined functions

Page 11: XTree for  Declarative XML Querying

11

Background on XQuery

XQuery example List year an title of all books published after 1995

for $book in /bib/bookwhere $book/@year > 1995return <book> { $book/@year } { $book/title } </book>

<book year=“2000”> <title>Data on the Web</title></book>

XQuery:

Result:

Page 12: XTree for  Declarative XML Querying

12

Limitations of XPath XPath has some limitations:

1. We can only assign one variable for each XPath expression It is just a linear path, which is not like the XML’s tree

structure Inefficient If a query needs to get values from several places, it has to

use several paths

2. It is difficult to reveal the relationship among correlated XPaths This may cause mistakes if a user does not pay

attention when writing a query Eg, if we want to output title and author of each book

XPath 1: /bib/book/title, XPath 2: /bib/book/authorWrong! The above two paths are not correlated

Page 13: XTree for  Declarative XML Querying

13

Limitations of XPath XPath has some limitations:

3. XPath is inefficient to express query that returns elements at path A while the condition is in a distant path B Difficult to distinguish condition branch from target branch Especially for multiple conditions and nested conditions Eg, find the value of publisher id of a book which has an author

with last name as “Stevens” and first name as “W.” /bib/book/author[last=“Stevens” and first=“W.”]/../publisher/@pubid

4. XPath expressions are only used in the querying part of XQuery, not in the result construction part In XQuery, the result construction part mixes literal text,

variable evaluation and even nested sub-queries The whole query is difficult to read and comprehend

Page 14: XTree for  Declarative XML Querying

14

Limitations of XPath XPath has some limitations:

5. XPath can only bind variable on the whole node (element or attribute) structure, which is a name-value pair If we want to get the substructure of the node, we have to

invoke built-in functions local-name() to get node name string() to get string value

Difficult to query XML documents with unknown structure, or to rename the nodes in the result

Eg, Suppose we do not know the sub-structure of book element, we want to re-structure books in this way: keep text nodes and sub-elements unchanged, but convert attributes to sub-elements:

for $book in /bib/booklet $attrib := $book/@*return <book> { $book/text(), $book/* } <attribute name={ local-name($attrib) } value={ string($attrib) }/> </book>

Page 15: XTree for  Declarative XML Querying

15

Outlines

Introduction Preliminaries XTree

Basic syntax XTree for querying XTree for result construction

Algorithm to transform XTree query to XQuery Conclusion and future works

Page 16: XTree for  Declarative XML Querying

16

XTree

XTree is a generalization of XPath XTree has a tree structure like XML XTree is more efficient than XPath

In the querying part, one XTree expression can bind multiple variables In XQuery, one XPath expression can only bind one variable

In the result construction part, one XTree expression can be used to define the result format Avoid nested structure in the query Make the whole query easier to read and understand

Supports list-valued variables explicitly, and determines their values uniquely

Page 17: XTree for  Declarative XML Querying

17

XTree syntax Similar to that of XPath

/ means parent-child hierarchy // means no matter how many levels down (ancestor-descent)

( ) in front to indicate the URL of the document Sibling tree nodes are enclosed by { }, and separated by

commas { } can be nested In XTree, conditions are written directly without { }

Use logic variables as place holders to bind/match the values at their places → to assign variables in the querying part ← to get values from variables in the result construction part

Only interested sub-trees are written in XTree, not the whole XML tree structure

Page 18: XTree for  Declarative XML Querying

18

XTree for querying

Example. For the sample bibliography document, suppose we want to get the year and title of each book, and its authors’ last names and first names We can use the variables $y, $t, $first, $last to bind

them respectively as in the following XTree expression:

for $book in /bib/book, $y in $book/@year, $t in $book/title, $author in $book/author, $last in $author/last, $first in $author/first

/bib/book/{@year→$y, title→$t, author/{last→$last, first→$first}} We can instantiate many variables in one XTree expression The above XTree expression corresponds to the following 6 XPath expressions in XQuery:

Symbol → will assign values of nodes on the left side to the variable on the right side

Page 19: XTree for  Declarative XML Querying

19

XTree for querying

Example. Suppose we want to get the last name and first name elements at whatever depth in the document, we can write the following XTree expression:

/bib//{last→$last, first→$first} The square braces enclosing two elements last

and first specifies that these two elements are sibling.

According to the XML document, the parent of sibling elements last and first is /bib/book/author or /bib/journal/editor

XTree allows a user to use path abbreviation as in XPath

Page 20: XTree for  Declarative XML Querying

20

XTree for querying

Example. Suppose we want to obtain some attribute with value “2000” in some book element, and bind variable $b to that book:

/bib/book→$b/@$attr=“2000” According to the sample document, $b will bind to the

third book, and $attr will bind to the attribute name “year”.

XTree allows a user to bind variables on the structure of XML document A user can assign variable $var on the left side of → symbol Here $var will bind to the name of the corresponding node

Page 21: XTree for  Declarative XML Querying

21

XTree

Two types of variables Single-valued variables

$X An element instance of the specified path

List-valued variables {$X} A list of all $X instances Explicitly indicated by a pair of curly braces

Note that both sibling nodes and list-valued variables are enclosed by curly braces Sibling nodes will have commas as separators in the braces List-valued variables does not have commas in the braces

Page 22: XTree for  Declarative XML Querying

22

List-valued variables

Aggregate functions

Suppose list-valued variable {$nums} binds to a list of numbers {$nums}.count() returns the number of items in the list {$nums}.avg() returns the average value of items in the list {$nums}.min() returns the minimum value in the list {$nums}.max() returns the maximum value in the list {$nums}.sum() returns the sum of values in the list

Object-oriented functions of list-valued variables:

Page 23: XTree for  Declarative XML Querying

23

List-valued variables

List operations Suppose list-valued variable {$names} binds to a list of name elements

Object-oriented functions of list-valued variables:

{$names}.[1-3, 6] returns a sublist of 1st to 3rd items, and 6th item {$names}.last() returns the last item in the list {$names}.sort() sorts the items in the list in ascending order {$names}.sort_desc() sorts the items in the list in descending order {$names}.distinct() eliminates duplicate items in the list {$names}.random(3) picks out 3 items randomly $name {$names} check whether an item is in the list {$names’} {$names} check whether the first list is a sub-list of the

second list

Page 24: XTree for  Declarative XML Querying

24

Semantics of list-valued variables Definition 1. The associated path of variable $a (or {$a}) is

the absolute path expression from root to the nodes represented by $a (or {$a}). /bib/book→$b/title→$t the associated path of $t is /bib/book/title.

Definition 2. Variable $a is an ancestor variable of $b if $a and $b are defined in the same XTree expression, and the associated path of $a is a prefix of the associated path of $b. /bib/book→$b/{title→$t, author→$a} $b is an ancestor variable of $t and $a, but $t is not an ancestor

variable of $a.

Page 25: XTree for  Declarative XML Querying

25

Semantics of list-valued variables Definition 3. In an XTree expression, when a variable is

bound to a value in the query evaluation, the variable is instantiated. /bib/book/{author→$a/first→$first, title→$t} In the evaluation, when we have reach /bib/book/author, $a is

instantiated; when reach /bib/book/author/first, $first is instantiated.

Definition 4. The value of list-valued variable {$a} is a list of all instances of $a with all its ancestor variables instantiated. /bib/book/author→{$a} {$a} means all the author elements

of all the books /bib/book→$b/author→{$a} {$a} means all the authors of a

certain book $b

value of {$a}

value of {$a}

Page 26: XTree for  Declarative XML Querying

26

XTree for result construction

XTree expression can also be used to define the result format

Symbol ← will get values of variables from right side and assign them to the expression on the left side

The result construction part is just one XTree expression No nested structure as the return clause of XQuery

Since XTree already has a tree structure Easy to read and understand Must be concrete

No condition checking or uncertainty in the structure Unlike XTree expressions in the querying part

Page 27: XTree for  Declarative XML Querying

27

XTree for result construction Example. We want to list the titles and publishers of books which

are published after 1993, suppose we have bound the variables by the following XTree expression:

/bib/book/{@year>1993, title→$t, publisher→$p}We can write the following XTree expression to define the result format:

/result/recentbook/{title←$t, publisher←$p} The result format is defined as: under the root result, each recentbook

element will store the title and publisher of that book<result>

<recentbook> <title>TCP/IP Illustrated</title>

<publisher>Addison-Wesley</publisher> </recentbook> <recentbook>

<title>Data on the web</title> <publisher>Morgan Kaufmann</publisher> </recentbook>

<result>

Page 28: XTree for  Declarative XML Querying

28

XTree for result construction Example. For each book, show the title, the number of authors and

the first author, suppose the variable bindings are defined in the following XTree expression:

/bib/book/{title→$t, author→{$a}}

We can write the following XTree expression to return the result:

/result/book/{title←$t, authNum←{$a}.count(), author←{$a}[1]} {$a}.count() counts the

number of items in the {$a} list

{$a}[1] returns the first item in the {$a} list

Output:

<result> <book> <title>TCP/IP Illustrated</title> <authNum>1</authNum> <author><last>Stevens</last><first>W.</first></author> </book> <book> <title>Advanced Programming in the Unix Environment</title> <authNum>1</authNum> <author><last>Stevens</last><first>W.</first></author> </book> <book> <title>Data on the Web</title>> <authNum>3</authNum> <author><last>Abiteboul</last><first>Serge</first></author> </book></result>

Page 29: XTree for  Declarative XML Querying

29

XTree for result construction

Example. Suppose we want to return a book whose title is “Computer Architecture”, and which does not have a specified author, we can write the following XTree expression:

/bib/book/{title←“Computer Architecture”, no-author}

It will output the following XML segment:

The right side of ← symbol can be: A pre-defined variable or invocation of functions on variables Literal text, indicating static content Omitted, indicating an empty value

<bib> <book> <title>Computer Architecture</title> <no-author/> </book></bib>

Page 30: XTree for  Declarative XML Querying

30

XTree for result construction

Query based on XTree expressions has QWOC (Query-Where-Order by-Construct) statements Query clause contains one or more XTree expressions for

selection and variables binding Where clause is optional, it defines constraints Order by clause is optional, it defines the ordering Construct clause contains one XTree expression to define

the output format

Page 31: XTree for  Declarative XML Querying

31

Outlines

Introduction Preliminaries XTree Algorithm to transform XTree query to XQuery

An algorithm to transform an XTree expression in the query part to a set of XPath expressions

An algorithm to transform an XTree expression in the result construction part to some nested XQuery expressions

Conclusion and future works

Page 32: XTree for  Declarative XML Querying

32

Transformation algorithm for querying part Transform an XTree expression in the querying part

to a set of XPath expressions Not as trivial as just extracting each path associated with a

variable to be an XTree expression Variables may correlate to each other by some common

ancestors We have to use such common ancestors to constrain the

descendent variables The common ancestors we want are just those branching

nodes (the nodes just before every pair of square braces for branching) Use stack to store such common ancestors for later use

Page 33: XTree for  Declarative XML Querying

33

Transformation algorithm for querying part Process the XTree expression from left to right, for each

common ancestor of variables (except the root), assign a single-valued variable on it if it is not originally bound to a variable

Translate each single-valued variable to be an XPath expression in a for clause; translate each list-valued variable to be an XPath expression in a let clause Try to write the path expression of a variable to be the relative

path of its nearest ancestor variable (make use of the stack) If it has such ancestor variable, then write its path expression to be

the relative path from that ancestor variable If it does not have any ancestor variable, then write its path

expression to be the absolute path from the root The output paths will be in depth-first order of the XTree

Page 34: XTree for  Declarative XML Querying

34

Transformation algorithm for querying part Example:

/bib/{book/{title→$t, author→{$a}}, journal/{title→$jt, editor/{last→$last, first→$first}}}

XPaths generated:for $b in /bib/book for $t in $b/title let $a := $b/authorfor $j in /bib/journalfor $jt in $j/title for $e in $j/editor for $last in $e/last for $first in $e/first

/bib/{book→$b/{title→$t, author→{$a}}, journal/{title→$jt, editor/{last→$last, first→$first}}}/bib/{book→$b/{title→$t, author→{$a}}, journal/{title→$jt, editor/{last→$last, first→$first}}}/bib/{book→$b/{title→$t, author→{$a}}, journal/{title→$jt, editor/{last→$last, first→$first}}}/bib/{book→$b/{title→$t, author→{$a}}, journal→$j/{title→$jt, editor/{last→$last, first→$first}}}/bib/{book→$b/{title→$t, author→{$a}}, journal→$j/{title→$jt, editor/{last→$last, first→$first}}}/bib/{book→$b/{title→$t, author→{$a}}, journal→$j/{title→$jt, editor→$e/{last→$last, first→$first}}}/bib/{book→$b/{title→$t, author→{$a}}, journal→$j/{title→$jt, editor→$e/{last→$last, first→$first}}}/bib/{book→$b/{title→$t, author→{$a}}, journal→$j/{title→$jt, editor→$e/{last→$last, first→$first}}}

Page 35: XTree for  Declarative XML Querying

35

Transformation algorithm for result construction part Transform an XTree expression in the result construction

part to some XQuery expressions More complicated

We will often encounter nested sub-queries in XQuery Consider the case that the node name to get the variable

value is different from the node name where the variable was bound in the querying part

Process the XTree expression step by step Find the corresponding XPath expression of each variable in

the XPaths generated from last algorithm Translate each variable value substitution to some XQuery

statement Use curly braces { } to form sub-query blocks according to the

structure of the XTree expression in construct clause

Page 36: XTree for  Declarative XML Querying

36

Transformation algorithm for result construction part Example:

query /bib/{book/{title→$t, author→{$a}}, journal/{title→$jt, editor/{last→$last, first→$first}}}construct /result/{book/{name←$t, authors/{@count←{$a}.count( ), au←{$a}}}, journal/{title←$jt, editor/{first←$first, last←$last}}}

Generated XPath expressions of the querying part:

for $b in /bib/bookfor $t in $b/titlelet $a := $b/authorfor $j in /bib/journalfor $jt in $j/titlefor $e in $j/editorfor $last in $e/lastfor $first in $e/first

Page 37: XTree for  Declarative XML Querying

37

Transformation algorithm for result construction part Output:<result> { for $b in /bib/book return <book> { for $t in $b/title return <name> {$t/*} {$t/@*} {$t/text()} </name> } { let $a := $b/author return <authors count={count($a)}> { for $x in $a return <au> {$x/*} {$x/@*} {$x/text()} </au> } </authors> } </book> }

{ for $j in /bib/journal return <journal> { for $jt in $j/title return {$jt} } { for $e in $j/editor return <editor> { for $first in $e/first return {$first} } { for $last in $e/last return {$last} } </editor> } </journal>} </result>

Page 38: XTree for  Declarative XML Querying

38

Outlines

Introduction Preliminaries XTree Algorithm to transform XTree query to XQuery Conclusion and future works

Conclusion Future works

Page 39: XTree for  Declarative XML Querying

39

Conclusion

Discussed the limitations of XPath Proposed a new set of syntax rules called XTree

XTree has a tree structure In the querying part, one XTree expression can bind

multiple variables In the result construction part, one XTree expression can

define the result format List-valued variables are explicitly indicated, and their

values are uniquely determined XTree is more compact and convenient to use than XPath

Designed algorithms to transform a query based on XTree expressions to a standard XQuery query

Page 40: XTree for  Declarative XML Querying

40

Future works

Implement an XTree query parser Queries based on XTree expressions can be executed

directly The query evaluation will be more efficient on this approach,

since we will have a global view of the whole query tree Extend the transformation algorithms to support queries

with join, negation, grouping and recursion Optimize the output XQuery queries of our transformation

algorithms according to the schema of the XML document Observe the progressive development of XPath to

continuously enhance our XTree

Page 41: XTree for  Declarative XML Querying

41

References1. S.Abiteboul, D.Quass, J.McHugh, J.Widom, and J.L. Wiener. The Lorel Query Language for

Semistructured Data. International Journal of Digital Library 1(1):68-99, 1997.2. S.Ceri, S.Comai, E.Damiani, P.Fraternali, S.Paraboschi, and L.Tanca. XML-GL: a Graphical

Language for Querying and Restructuring WWW data. In Proceedings of the 8th International World Wide Web Conference, Toronto, Canada, 1999.

3. S.Cluet and J.Simeon. YATL: a Functional and Declarative Language for XML. Draft manuscript, March 2000.

4. H.Hosoya and B.Pierce. XDuce: A Typed XML Processing Language (Preliminary Report). In Proceedings of WebDB Workshop, 2000.

5. M.Liu and T.W.Ling. Towards Declarative XML Querying. In Proceedings of WISE 2002, 127-138, Singapore, 2002.

6. P.Chippimolchai, V.Wuwongse and C.Anutariya. Semantic Query Formulation and Evaluation for XML Databases. In Proceedings of WISE 2002, 205-214, Singapore, 2002.

7. D.Chamberlin, P. Fankhauser, M.Marchiori, and J.Robie. XML Query Requirements. W3C Working Draft, In http://www.w3.org/TR/xquery-requirements/, June 2003.

8. J. Clark and S.DeRose. XML Path Language (XPath) Version 1.0. W3C Recommendation, In http://www.w3.org/TR/xpath, November 2001.

9. D.Chamberlin, D.Florescu, J.Robie, J.Simon, and M.Stefanescu. XQuery 1.0: A Query Language for XML. W3C Working Draft, In http://www.w3.org/TR/xquery/, May 2003.

10. J.Robie, J.Lapp, and D.Schach. XML Query Language (XQL). In 11. http://www.w3.org/TandS/QL/QL98/pp/xql.html, 1998.12. A. Deutsch, M.Fernandez, D.Florescu, A.Levy, and D.Suciu. XML-QL: A Query Language for

XML. In http://www.w3.org/TR/NOTE-xml-ql/, August 1998.13. J.Clark. XSL Transformations (XSLT) Version 1.0. W3C Recommendation, In

http://www.w3.org/TR/xslt, November 1999.

Page 42: XTree for  Declarative XML Querying

42

Thank you

Page 43: XTree for  Declarative XML Querying

43

<bib name=“IT”> <book id=“b001” year=“1994”> <title>TCP/IP Illustrated</title> <author><last>Stevens</last><first>W.</first></author> <publisher>Addison-Wesley</publisher> </book> <book id =“b002” year=“1992”> <title>Advanced Programming in the Unix Environment</title> <author><last>Stevens</last><first>W.</first></author> <publisher>Addison-Wesley</publisher> </book> <book id=“b003” year=“2000”> <title>Data on the Web</title>

<edition>3</edition> <author><last>Abiteboul</last><first>Serge</first></author> <author><last>Buneman</last><first>Peter</first></author> <author><last>Suciu</last><first>Dan</first></author> <publisher>Morgan Kaufmann</publisher> </book> <journal id=“j001” year=“1998”> <title>XML</title> <editor><last>Date</last><first>C.</first></editor> <editor><last>Gerbarg</last><first>M.</first></editor> <publisher>Morgan Kaufmann</publisher> </journal></bib>

{$a}

back

Page 44: XTree for  Declarative XML Querying

44

<bib name=“IT”> <book id=“b001” year=“1994”> <title>TCP/IP Illustrated</title> <author><last>Stevens</last><first>W.</first></author> <publisher>Addison-Wesley</publisher> </book> <book id =“b002” year=“1992”> <title>Advanced Programming in the Unix Environment</title> <author><last>Stevens</last><first>W.</first></author> <publisher>Addison-Wesley</publisher> </book> <book id=“b003” year=“2000”> <title>Data on the Web</title>

<edition>3</edition> <author><last>Abiteboul</last><first>Serge</first></author> <author><last>Buneman</last><first>Peter</first></author> <author><last>Suciu</last><first>Dan</first></author> <publisher>Morgan Kaufmann</publisher> </book> <journal id=“j001” year=“1998”> <title>XML</title> <editor><last>Date</last><first>C.</first></editor> <editor><last>Gerbarg</last><first>M.</first></editor> <publisher>Morgan Kaufmann</publisher> </journal></bib>

{$a} $b

$b{$a}

$b{$a}

back