Management of XML and Semistructured Data Lecture 5: Query Languages Wednesday, 4/1/2001.
1
Part 3: Query Languages
Managing XML and Semistructured Data
2
In this section…
Lorel (A Lightweight Object REpository Language, developed at Stanford)
XPath specification
• data model
• examples [xpath, axis]
• syntax
XQuery
• FLWR expressions
• FOR and LET expressions
• Collections and sorting
(XML-QL, the earlier version, in AT&T Labs)
Resources:
• The Lorel Query Language for Semistructured Data, by Abiteboul, Quass, McHugh, Widom, Wiener, in International Journal on Digital Libraries, 1997.
• A formal semantics of patterns in XSLT, by Phil Wadler.
• XML Path Language (XPath): www.w3.org/TR/xpath
• XQuery: A Query Language for XML, by Chamberlin, Florescu, et al.
• W3C recommendation: www.w3.org/TR/xquery/
3
Querying XML Data
A core query language (extracting + restructuring)
XPath (core expressions) allows simple navigation through the tree
XQuery is used as the SQL of XML
XSLT (Extensible Stylesheet Language Transformations) = recursive traversal based on pattern matching; will not discuss here
4
Sample Data for Queries
<biblio>
  <paper>…</paper>
  <book>
    <author> Smith </author>
    <date> 1999 </date>
    <title> Database Systems </title>
  </book>
  <book>
    <author> Roux </author>
    <author> Combalusier </author>
    <date> 1976 </date>
    <title> Database Systems </title>
  </book>
</biblio>
5
A Core Query Language
An SQL-like language for querying semistructured data
Will illustrate with: XML DB =
[Figure: the sample data drawn as an edge-labeled graph. The root &o1 has a biblio edge; below it are paper, book, and book edges; each book node has author, date, and title edges leading to the leaf values Smith, 1999, Database Systems and Roux, Combalusier, 1976, Database Systems. Nodes carry object identifiers such as &o12, &o24, &o29.]
6
Query 1:
SELECT author: X
FROM biblio.book.author X
[Figure: the sample graph plus a new answer node whose three author edges point to the existing author nodes.]
Answer = {author: "Smith", author: "Roux", author: "Combalusier"}
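As a rough illustration of how the path expression in Query 1 walks the graph, here is a minimal Python sketch. The representation (a node is either an atomic value or a list of labeled edges) and the helper names `follow` and `eval_path` are assumptions made for this example, not part of the core language itself.

```python
# A node is either an atomic value or a list of (label, child) edges.
# The contents of <paper> are elided in the slides, so it is left empty here.
biblio_db = [
    ("biblio", [
        ("paper", []),
        ("book", [("author", "Smith"), ("date", 1999),
                  ("title", "Database Systems")]),
        ("book", [("author", "Roux"), ("author", "Combalusier"),
                  ("date", 1976), ("title", "Database Systems")]),
    ])
]


def follow(nodes, label):
    """Return the children reached from `nodes` over edges named `label`."""
    reached = []
    for node in nodes:
        if isinstance(node, list):  # atomic values have no outgoing edges
            reached.extend(child for lab, child in node if lab == label)
    return reached


def eval_path(root, path):
    """Evaluate a dotted path such as 'biblio.book.author' from the root."""
    nodes = [root]
    for label in path.split("."):
        nodes = follow(nodes, label)
    return nodes


# SELECT author: X FROM biblio.book.author X
answer = [("author", x) for x in eval_path(biblio_db, "biblio.book.author")]
print(answer)
```

The answer is kept as a list of labeled edges rather than a dictionary because the same label (author) may occur several times under one node.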
7
Query 2:
SELECT row: X
FROM biblio._ X
WHERE "Smith" in X.author
[Figure: the sample graph plus a new answer node whose row edges point to the children of biblio that have an author "Smith".]
Answer = {row: {author: "Smith", date: 1999, title: "Database…"}, row: …}
8
Query 3:
SELECT row: ( SELECT author: Y
              FROM X.author Y )
FROM biblio.book X
[Figure: the sample graph plus a new answer node with two row edges to new nodes &a1 and &a2; &a1 has one author edge, &a2 has two.]
Answer = {row: {author: "Smith"}, row: {author: "Roux", author: "Combalusier"}}
9
Query 4:
SELECT ( SELECT row: {author: Y, title: T}
         FROM X.author Y, X.title T )
FROM biblio.book X
WHERE "Roux" in X.author
[Figure: the sample graph plus a new answer node with two row edges to new nodes &a1 and &a2; each has one author edge and one title edge.]
Answer = {row: {author: "Roux", title: "Database…"}, row: {author: "Combalusier", title: "Database…"}}
10
Lorel
Minor syntactic differences in regular path expressions (% instead of _, # instead of _*)
Common path convention:
SELECT biblio.book.author
FROM biblio.book
WHERE biblio.book.year = 1999
becomes:
SELECT X.author
FROM biblio.book X
WHERE X.year = 1999
11
Lorel
Existential variables:
• What happens with books having multiple authors?
SELECT biblio.book.year
FROM biblio.book
WHERE biblio.book.author = "Roux"
Author is existentially quantified:
SELECT X.year
FROM biblio.book X, X.author Y
WHERE Y = "Roux"
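The existential reading can be sketched in Python: a book qualifies as soon as at least one of its authors equals "Roux". The list-of-dicts layout below is an assumption made for illustration; the `year` field follows the Lorel queries on this slide (the sample XML calls it `date`).

```python
# Hypothetical in-memory version of the two sample books.
books = [
    {"author": ["Smith"], "year": 1999},
    {"author": ["Roux", "Combalusier"], "year": 1976},
]

# SELECT X.year FROM biblio.book X, X.author Y WHERE Y = "Roux"
# Y ranges over every author of X, as in the rewritten query.
years_join = [x["year"] for x in books for y in x["author"] if y == "Roux"]

# The same condition stated as an explicit existential quantifier.
years_exists = [x["year"] for x in books
                if any(y == "Roux" for y in x["author"])]

print(years_join, years_exists)
```

Both forms select the 1976 book even though "Roux" is only one of its two authors.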
12
Lorel
Path variables. @P in:
SELECT @P
FROM biblio.# @P X
• What happens on graphs with cycles?
Constructing new results
• Several default rules
Casting between datatypes
• Very useful in practice
13
XPath
http://www.w3.org/TR/xpath (11/99)
Building block for other W3C standards:
• XSL Transformations (XSLT)
• XML Link (XLink)
• XML Pointer (XPointer)
• XML Query
Was originally part of XSL
14
XPath: Summary
bib                  matches a bib element
*                    matches any element
/                    matches the root (the node above the root element)
/bib                 matches a bib element under root
bib/paper            matches a paper in bib
bib//paper           matches a paper in bib, at any depth
//paper              matches a paper at any depth
paper|book           matches a paper or a book
@price               matches a price attribute
bib/book/@price      matches price attribute in book, in bib
bib/book[@price<"55"]/author/lastname   matches…
15
Example for XPath Queries
<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>
16
Data Model for XPath
[Figure: the document as a tree. The root has a single child, the root element bib; bib has two book children; the first book has children publisher (Addison-Wesley), author (Serge Abiteboul), and so on.]
17
XPath: Simple Expressions
/bib/book/year
Result: <year> 1995 </year>
        <year> 1998 </year>
/bib/paper/year
Result: empty (there were no papers)
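These two paths can be tried with Python's standard `xml.etree.ElementTree`, which supports a subset of XPath. In this sketch the sample document is inlined with whitespace trimmed, and because `findall` is called on the root element `bib`, the leading `/bib` step becomes `.`; both are choices made for the example.

```python
import xml.etree.ElementTree as ET

# The bib.xml sample from the slides, with whitespace trimmed.
BIB_XML = """
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""

bib = ET.fromstring(BIB_XML.strip())  # the root element <bib>

# /bib/book/year
years = [y.text for y in bib.findall("./book/year")]

# /bib/paper/year : no paper elements, so the result is empty
paper_years = bib.findall("./paper/year")

print(years)
print(paper_years)
```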
18
XPath: Restricted Kleene Closure
//author
Result: <author> Serge Abiteboul </author>
        <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
        <author> Victor Vianu </author>
        <author> Jeffrey D. Ullman </author>
/bib//first-name
Result: <first-name> Rick </first-name>
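The descendant step `//` can be checked the same way. This sketch again uses `xml.etree.ElementTree` with a trimmed inline copy of the sample, and writes `//author` as `.//author` because the search starts from the `bib` element rather than from the document root.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bib.xml sample from the slides.
BIB_XML = """
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>
"""

bib = ET.fromstring(BIB_XML.strip())

# //author : author elements at any depth
authors = bib.findall(".//author")

# /bib//first-name : first-name elements at any depth below bib
first_names = [e.text for e in bib.findall(".//first-name")]

print(len(authors), first_names)
```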
19
XPath: Text Nodes
/bib/book/author/text()
Result: Serge Abiteboul
        Victor Vianu
        Jeffrey D. Ullman
Rick Hull doesn’t appear because he has first-name, last-name
Functions in XPath:
• text() = matches the text value
• node() = matches any node (= * or @* or text())
• name() = returns the name of the current tag
20
XPath: Wildcard
//author/*
Result: <first-name> Rick </first-name>
        <last-name> Hull </last-name>
* matches any element
21
XPath: Attribute Nodes
/bib/book/@price
Result: "55"
@price means that price has to be an attribute
22
XPath: Predicates
/bib/book/author[first-name]
Result: <author> <first-name> Rick </first-name>
                 <last-name> Hull </last-name>
        </author>
23
XPath: More Predicates
/bib/book/author[firstname][address[//zip][city]]/lastname
Result: <lastname> … </lastname>
        <lastname> … </lastname>
24
XPath: More Predicates
/bib/book[@price < "60"]
/bib/book[author/@age < "25"]
/bib/book[author/text()]
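A hedged sketch of predicates in Python: `xml.etree.ElementTree` understands existence predicates such as `author[first-name]` and `book[@price]`, but not the `<` comparison, so the price test is done in ordinary Python here. The element name `first-name` follows the sample document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bib.xml sample from the slides.
BIB_XML = """
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>
"""

bib = ET.fromstring(BIB_XML.strip())

# /bib/book/author[first-name] : authors that have a first-name child
structured = bib.findall("./book/author[first-name]")
last_names = [a.findtext("last-name") for a in structured]

# /bib/book[@price < "60"] : select books carrying a price attribute,
# then apply the numeric comparison outside the path expression.
cheap_titles = [b.findtext("title")
                for b in bib.findall("./book[@price]")
                if float(b.get("price")) < 60]

print(last_names, cheap_titles)
```

The first book has no price attribute, so it is filtered out by `[@price]` before the comparison is ever attempted.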
25
XQuery
Based on Quilt (which is based on XML-QL)
http://www.w3.org/TR/xquery/ (2/2001)
XML Query data model
• Ordered!
FLWOR (flower) Expressions
FOR ...
LET ...
WHERE ...
ORDER BY ...
RETURN ...
26
XQuery
Query: Find all book titles published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
* bib.xml is shown on slide 15
Result: <title> Principles of Database… </title>
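The FOR / WHERE / RETURN shape of this query maps closely onto a Python list comprehension. The sketch below is an analogy using `xml.etree.ElementTree` on a trimmed inline copy of bib.xml, not an XQuery implementation.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bib.xml sample from the slides.
BIB_XML = """
<bib>
  <book>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""

bib = ET.fromstring(BIB_XML.strip())

titles = [
    x.findtext("title")                  # RETURN $x/title
    for x in bib.findall("./book")       # FOR $x IN document("bib.xml")/bib/book
    if int(x.findtext("year")) > 1995    # WHERE $x/year > 1995
]

print(titles)
```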
27
XQuery
Query: Find book titles by the coauthors of “Foundations of Databases”:
FOR $x IN bib/book[title/text() = "Foundations …"]/author,
    $y IN bib/book[author/text() = $x/text()]/title
RETURN <answer> $y/text() </answer>
Result: <answer> Foundations … </answer>
        <answer> Foundations … </answer>
The answer will contain duplicates!
28
XQuery
Same as before, but eliminate duplicates:
FOR $x IN bib/book[title/text() = "Foundations …"]/author,
    $y IN distinct(bib/book[author/text() = $x/text()]/title)
RETURN <answer> $y/text() </answer>
Result: <answer> Foundations … </answer>
distinct = a function that eliminates duplicates
29
SQL and XQuery Side-by-side
Product(pid, name, maker)
Company(cid, name, city)
Query: Find all products made in Seattle
SQL:
SELECT x.name
FROM Product x, Company y
WHERE x.maker = y.cid and y.city = "Seattle"
XQuery:
FOR $x IN /db/Product/row,
    $y IN /db/Company/row
WHERE $x/maker/text() = $y/cid/text()
  and $y/city/text() = "Seattle"
RETURN $x/name
Cool XQuery:
FOR $y IN /db/Company/row[city/text() = "Seattle"],
    $x IN /db/Product/row[maker/text() = $y/cid/text()]
RETURN $x/name
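Both XQuery formulations of the join can be mimicked in Python over an XML encoding of the two relations. The rows below (Gizmo, Widget, GizmoWorks, WidgetCo) are made-up illustration data; only the element layout `/db/Product/row` and `/db/Company/row` comes from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents for the Product and Company relations.
DB_XML = """
<db>
  <Product>
    <row><pid>p1</pid><name>Gizmo</name><maker>c1</maker></row>
    <row><pid>p2</pid><name>Widget</name><maker>c2</maker></row>
  </Product>
  <Company>
    <row><cid>c1</cid><name>GizmoWorks</name><city>Seattle</city></row>
    <row><cid>c2</cid><name>WidgetCo</name><city>Portland</city></row>
  </Company>
</db>
"""

db = ET.fromstring(DB_XML.strip())

# First form: iterate over all pairs, filter in the WHERE clause.
names_where = [
    x.findtext("name")
    for x in db.findall("./Product/row")
    for y in db.findall("./Company/row")
    if x.findtext("maker") == y.findtext("cid")
    and y.findtext("city") == "Seattle"
]

# Second form: push the city test into the path as a predicate.
names_pred = [
    x.findtext("name")
    for y in db.findall("./Company/row[city='Seattle']")
    for x in db.findall("./Product/row")
    if x.findtext("maker") == y.findtext("cid")
]

print(names_where, names_pred)
```

The second form filters companies before the inner loop runs, which is the same idea as moving the condition into the path predicate in the XQuery version.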
30
XQuery: Nesting
Query: For each author of a book by Morgan Kaufmann, list all books s/he published:
FOR $a IN distinct(document("bib.xml")/bib/book[publisher = "Morgan Kaufmann"]/author)
RETURN <result>
         { $a,
           FOR $t IN /bib/book[author = $a]/title
           RETURN $t }
       </result>
Result:
<result> <author> Jones </author> <title> abc </title> <title> def </title> </result>
<result> <author> Smith </author> <title> ghi </title> </result>
31
XQuery
FOR $x IN expr -- binds $x to each value in the list expr
LET $x := expr -- binds $x to the entire list expr
• Useful for common subexpressions and for aggregations
<big_publishers>
  FOR $p IN distinct(document("bib.xml")//publisher)
  LET $b := document("bib.xml")//book[publisher = $p]
  WHERE count($b) > 100
  RETURN $p
</big_publishers>
count = a (aggregate) function that returns the number of elements
32
XQuery
Query: Find books whose price is larger than average:
FOR $a IN /bib/book
LET $b := avg(/bib/book/price/text())
WHERE $a/price/text() > $b
RETURN $a
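The role of LET here, computing one aggregate value that every FOR iteration compares against, can be sketched in Python. The three books and their prices are invented for the example, and price is modeled as a child element because the query uses `/bib/book/price`.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: price is a child element, as the query assumes.
BIB_XML = """
<bib>
  <book><title>A</title><price>30</price></book>
  <book><title>B</title><price>50</price></book>
  <book><title>C</title><price>70</price></book>
</bib>
"""

bib = ET.fromstring(BIB_XML.strip())
books = bib.findall("./book")

# LET $b := avg(/bib/book/price/text())
prices = [float(b.findtext("price")) for b in books]
average = sum(prices) / len(prices)

# FOR $a IN /bib/book WHERE $a/price/text() > $b RETURN $a
above_average = [a.findtext("title") for a in books
                 if float(a.findtext("price")) > average]

print(average, above_average)
```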
33
XQuery
Query: Find all publishers that published more than 100 books:
<big_publishers>
  { FOR $p IN distinct(//publisher/text())
    LET $b := document("bib.xml")//book[publisher/text() = $p]
    WHERE count($b) > 100
    RETURN <publisher> $p </publisher> }
</big_publishers>
$b is a collection of elements, not a single element
count = a (aggregate) function that returns the number of elements
34
FOR vs. LET
FOR: binds node variables (iteration)
LET: binds collection variables (one value)
Examples
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book> </result>
<result> <book>...</book> </result>
<result> <book>...</book> </result>
...
LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book> <book>...</book> <book>...</book> ... </result>
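The difference between the two bindings can be mirrored in Python with `xml.etree.ElementTree`: iterating produces one `result` element per book, while binding the whole list produces a single `result` holding all of them. The three-book document is a placeholder invented for the example.

```python
import xml.etree.ElementTree as ET

# Placeholder document with three books.
BIB_XML = """
<bib>
  <book><title>A</title></book>
  <book><title>B</title></book>
  <book><title>C</title></book>
</bib>
"""

bib = ET.fromstring(BIB_XML.strip())
books = bib.findall("./book")

# FOR $x IN .../bib/book RETURN <result> $x </result>
# One <result> is built per iteration.
for_results = []
for x in books:
    result = ET.Element("result")
    result.append(x)
    for_results.append(result)

# LET $x := .../bib/book RETURN <result> $x </result>
# $x is the whole collection, so a single <result> wraps every book.
let_result = ET.Element("result")
let_result.extend(books)

print(len(for_results), len(let_result))
```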
35
Sorting in XQuery
<publisher_list>
  FOR $p IN distinct(document("bib.xml")//publisher)
  RETURN <publisher>
           { <name> $p/text() </name>,
             FOR $b IN document("bib.xml")//book[publisher = $p]
             RETURN <book> { $b/title, $b/@price } </book> SORTBY(price DESCENDING) }
         </publisher> SORTBY(name)
</publisher_list>
36
Sorting in XQuery
Sorting arguments: refer to the name space of the RETURN clause, not the FOR clause
To sort on an element you don’t want to display, first return it, then remove it with an additional query.
<publisher_list>
  FOR $p IN distinct(document("bib.xml")//publisher)
  RETURN <publisher>
           { <name> $p/text() </name>,
             FOR $b IN document("bib.xml")//book[publisher = $p]
             RETURN <book> { $b/title, $b/price } </book> ORDER BY price DESCENDING }
         </publisher> ORDER BY name
</publisher_list>
37
Collections in XQuery
Ordered and unordered collections
• /bib/book/author = an ordered collection
• distinct(/bib/book/author) = an unordered collection
LET $b := /bib/book -- $b is a collection
$b/author -- a collection (several authors...)
RETURN <result> $b/author </result>
Returns:
<result> <author>...</author> <author>...</author> <author>...</author> ... </result>
38
If-Then-Else
FOR $h IN //holding
RETURN <holding>
         { $h/title,
           IF $h/@type = "Journal"
           THEN $h/editor
           ELSE $h/author }
       </holding> ORDER BY title
39
Quantifiers
Existential quantifiers:
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
      contains($p, "sailing")
      AND contains($p, "windsurfing")
RETURN $b/title
Universal quantifiers:
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
      contains($p, "sailing")
RETURN $b/title
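SOME and EVERY correspond to Python's `any` and `all`. In this sketch the library document and its titles are invented, and `contains` is approximated by a substring test on an element's concatenated text; note that EVERY holds vacuously for a book with no `para` elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the third book has no paragraphs at all.
LIB_XML = """
<lib>
  <book><title>Water Sports</title>
    <para>sailing and windsurfing in summer</para>
    <para>sailing in winter</para>
  </book>
  <book><title>Hiking</title>
    <para>mountain trails</para>
  </book>
  <book><title>Blank Notebook</title></book>
</lib>
"""

lib = ET.fromstring(LIB_XML.strip())


def contains(element, word):
    """Substring test on the element's full text, like contains($p, word)."""
    return word in "".join(element.itertext())


# WHERE SOME $p IN $b//para SATISFIES contains(...) AND contains(...)
some_titles = [
    b.findtext("title") for b in lib.iter("book")
    if any(contains(p, "sailing") and contains(p, "windsurfing")
           for p in b.iter("para"))
]

# WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing")
every_titles = [
    b.findtext("title") for b in lib.iter("book")
    if all(contains(p, "sailing") for p in b.iter("para"))
]

print(some_titles)
print(every_titles)
```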
40
Other Stuff in XQuery
BEFORE and AFTER
• for dealing with order in the input
FILTER
• deletes some edges in the result tree
Recursive functions
• Currently: arbitrary recursion
• Perhaps more restrictions in the future?
41
Group-By in XQuery ??
No GROUPBY currently in XQuery
A recent proposal (next)
• What do YOU think?
42
Group-By in XQuery ??
with GROUPBY:
FOR $b IN document("http://www.bn.com")/bib/book,
    $y IN $b/@year
WHERE $b/publisher = "Morgan Kaufmann"
RETURN GROUPBY $y
       WHERE count($b) > 10
       IN <year> $y </year>
Equivalent SQL:
SELECT year
FROM Bib
WHERE Bib.publisher = "Morgan Kaufmann"
GROUP BY year
HAVING count(*) > 10
43
Group-By in XQuery ??
with GROUPBY:
FOR $b IN document("http://www.bn.com")/bib/book,
    $a IN $b/author,
    $y IN $b/@year
RETURN GROUPBY $a, $y
       IN <result> $a, <year> $y </year>, <total> count($b) </total> </result>
Without GROUPBY:
FOR $Tup IN distinct(FOR $b IN document("http://www.bn.com")/bib/book,
                         $a IN $b/author,
                         $y IN $b/@year
                     RETURN <Tup> <a> $a </a> <y> $y </y> </Tup>),
    $a IN $Tup/a/node(),
    $y IN $Tup/y/node()
LET $b := document("http://www.bn.com")/bib/book[author = $a and @year = $y]
RETURN <result> $a, <year> $y </year>, <total> count($b) </total> </result>
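The grouping that both formulations express (count the books for each author and year pair) can be sketched in Python with a dictionary keyed by the grouping values. The three books below are invented sample data, and `year` is an attribute because the queries use `$b/@year`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical data with year as an attribute, as in the queries.
BIB_XML = """
<bib>
  <book year="1995"><author>Abiteboul</author><author>Vianu</author><title>T1</title></book>
  <book year="1995"><author>Abiteboul</author><title>T2</title></book>
  <book year="1998"><author>Ullman</author><title>T3</title></book>
</bib>
"""

bib = ET.fromstring(BIB_XML.strip())

# FOR $b IN .../bib/book, $a IN $b/author, $y IN $b/@year
# GROUPBY $a, $y : collect the books that share an (author, year) pair.
groups = defaultdict(list)
for b in bib.findall("./book"):
    for a in b.findall("author"):
        groups[(a.text, b.get("year"))].append(b)

# <total> count($b) </total> for each group
totals = {key: len(members) for key, members in groups.items()}

print(totals)
```

The version without GROUPBY does the same work in two passes: first the distinct (author, year) pairs, then a separate lookup of the matching books for each pair.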
44
Group-By in XQuery ??
Nested GROUPBY’s
FOR $b IN document("http://www.bn.com")/bib/book,
    $a IN $b/author,
    $y IN $b/@year,
    $t IN $b/title,
    $p IN $b/publisher
RETURN GROUPBY $p, $y
       IN <result>
            $p,
            <year> $y </year>,
            GROUPBY $a
            IN <authorEntry>
                 $a,
                 GROUPBY $t
                 IN $t
               </authorEntry>
          </result>