XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion,...
![Page 1: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/1.jpg)
XML
May 2nd, 2002
![Page 2: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/2.jpg)
Agenda
• XML as a data model
• Querying XML
• Manipulating XML
• A lot of discussion, politics and stories.
![Page 3: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/3.jpg)
More Facts About XML
• Every database vendor has an XML page:
  – www.oracle.com/xml
  – www.microsoft.com/xml
  – www.ibm.com/xml
• Many applications are just fancier Websites
• But, most importantly, XML enables data sharing on the Web – hence our interest
![Page 4: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/4.jpg)
What is XML? From HTML to XML
HTML describes the presentation: easy for humans
![Page 5: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/5.jpg)
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
HTML is hard for applications
![Page 6: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/6.jpg)
XML
<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>
XML describes the content: easy for applications
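To make "easy for applications" concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The function name `list_books` and the trimmed one-book document are assumptions made for illustration, with the elided title filled in from the HTML slide.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the bibliography from the slide (one book only).
BIB = """
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""

def list_books(xml_text):
    """Return (title, authors, year) for every <book> in the document."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        title = book.findtext("title").strip()
        authors = [a.text.strip() for a in book.findall("author")]
        year = int(book.findtext("year"))
        books.append((title, authors, year))
    return books

print(list_books(BIB))
```

The program asks for `title`, `author` and `year` by name. With the HTML version it would have to guess which italic run is a title and which line holds the year.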
![Page 7: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/7.jpg)
XML
• eXtensible Markup Language
• Roots: comes from SGML
  – A very nasty language
• After the roots: a format for sharing data
• Emerging format for data exchange on the Web and between applications
![Page 8: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/8.jpg)
XML Applications
• Sharing data between different components of an application.
• Archive data in text files.
• EDI: electronic data interchange:
– Transactions between banks
– Producers and suppliers sharing product data (auctions)
– Extranets: building relationships between companies
• Scientists sharing data about experiments.
![Page 9: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/9.jpg)
Web Services
• A new paradigm for creating distributed applications?
• Systems communicate via messages, contracts.
• Example: order processing system.
• MS .NET, J2EE – some of the platforms
• XML – a part of the story; the data format.
![Page 10: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/10.jpg)
XML Syntax
• Very simple:
<db>
  <book>
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book>
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
  <publisher>
    <name>Morgan Kaufmann</name>
    <state>CA</state>
  </publisher>
</db>
![Page 11: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/11.jpg)
XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• start tags must correspond to end tags, and conversely
![Page 12: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/12.jpg)
XML Terminology
• an element: a start tag, its matching end tag, and everything between them
  – example element: <title>Complete Guide to DB2</title>
  – example element:
    <book>
      <title> Complete Guide to DB2 </title>
      <author>Chamberlin</author>
    </book>
• elements may be nested
• empty element: <red></red> abbreviated <red/>
• an XML document has a unique root element
• well-formed XML document: every start tag has a matching end tag, and elements are properly nested
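A quick way to check well-formedness is to hand the text to any XML parser and see whether it accepts it. The sketch below uses Python's standard `xml.etree.ElementTree`. The helper name `is_well_formed` and the two broken inputs are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if the text parses as XML: tags match, nest properly, one root."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = "<book><title>Complete Guide to DB2</title><author>Chamberlin</author></book>"
bad_nesting = "<book><title>Complete Guide to DB2</book></title>"  # tags cross
two_roots = "<book/><book/>"                                       # no unique root

for text in (good, bad_nesting, two_roots):
    print(is_well_formed(text), text)
```

Well-formedness says nothing about which tags are allowed where. That is the job of a DTD or schema.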
![Page 13: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/13.jpg)
The XML Tree
db
├── book
│   ├── title: "Complete Guide to DB2"
│   └── author: "Chamberlin"
├── book
│   ├── title: "Transaction Processing"
│   ├── author: "Bernstein"
│   └── author: "Newcomer"
└── publisher
    ├── name: "Morgan Kaufmann"
    └── state: "CA"

Tags on nodes
Data values on leaves
![Page 14: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/14.jpg)
More XML Syntax: Attributes
<book price="55" currency="USD">
  <title> Complete Guide to DB2 </title>
  <author> Chamberlin </author>
  <year> 1998 </year>
</book>
price, currency are called attributes
![Page 15: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/15.jpg)
Replacing Attributes with Elements
<book>
  <title> Complete Guide to DB2 </title>
  <author> Chamberlin </author>
  <year> 1998 </year>
  <price> 55 </price>
  <currency> USD </currency>
</book>
attributes are alternative ways to represent data
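The two representations can be converted mechanically. The sketch below (Python, standard `xml.etree.ElementTree`) moves each attribute into a child element. The helper name `attributes_to_elements` is an assumption, and the result appends the new elements after the existing children, as on the slide.

```python
import xml.etree.ElementTree as ET

def attributes_to_elements(elem):
    """Move every attribute of elem into a child element of the same name."""
    for name, value in list(elem.attrib.items()):
        child = ET.SubElement(elem, name)
        child.text = value
    elem.attrib.clear()
    return elem

book = ET.fromstring(
    '<book price="55" currency="USD">'
    "<title>Complete Guide to DB2</title>"
    "<author>Chamberlin</author>"
    "<year>1998</year>"
    "</book>"
)
attributes_to_elements(book)
print(ET.tostring(book, encoding="unicode"))
```

The reverse direction is not always possible: an attribute holds a single string, while an element may repeat or contain nested elements.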
![Page 16: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/16.jpg)
“Types” (or “Schemas”) for XML
• Document Type Definition – DTD
• Defines a grammar for the XML document, but we use it as a substitute for types/schemas
• Will be replaced by XML-Schema (will extend DTDs)
![Page 17: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/17.jpg)
An Example DTD
• PCDATA means Parsed Character Data (a mouthful for string)
<!DOCTYPE db [
  <!ELEMENT db ((book|publisher)*)>
  <!ELEMENT book (title,author*,year?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT publisher (#PCDATA)>
]>
![Page 18: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/18.jpg)
More on DTDs: Attributes
<!DOCTYPE db [
  <!ELEMENT db ((book|publisher)*)>
  <!ELEMENT book (title,author*,year?)>
  . . .
  <!ATTLIST book price    CDATA #REQUIRED
                 language CDATA #IMPLIED>
  <!ATTLIST author phone CDATA #IMPLIED>
]>

<db>
  <book price="55" language="English">
    <title> Complete Guide to DB2 </title>
    <author> Chamberlin </author>
  </book>
  …
</db>

The type:
  CDATA = string
  ID = a key
  IDREF = a foreign key
  others = rarely used

Default declaration:
  #REQUIRED = required
  #IMPLIED = optional
  #FIXED = fixed (rarely used)
![Page 19: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/19.jpg)
DTDs as Grammars
The example DTD is the same thing as:

db ::= (book|publisher)*
book ::= (title,author*,year?)
title ::= string
author ::= string
year ::= string
publisher ::= string

• A DTD is an EBNF (Extended BNF) grammar
• An XML tree is precisely a derivation tree

XML documents that have a DTD and conform to it are called valid
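Because each content model is a regular expression over child tag names, validity can be illustrated with ordinary regexes. The sketch below is not a DTD validator (Python's `xml.etree.ElementTree` does not validate). The table `CONTENT_MODELS` is a hand translation of the two non-trivial rules, and treating every other tag as string content is an assumption.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the example DTD, hand-translated to regexes over
# child tag names; each tag name is followed by one space as a separator.
CONTENT_MODELS = {
    "db": r"(book |publisher )*",
    "book": r"title (author )*(year )?",
}

def conforms(elem):
    """Check elem and all its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        # Treated like #PCDATA: text only, no child elements.
        return len(elem) == 0
    children = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, children) is None:
        return False
    return all(conforms(child) for child in elem)

valid = ET.fromstring(
    "<db><book><title>T</title><author>A</author><author>B</author></book>"
    "<publisher>P</publisher></db>"
)
invalid = ET.fromstring(
    "<db><book><author>A</author><title>T</title></book></db>"  # author before title
)
print(conforms(valid), conforms(invalid))
```

The second document is well-formed but not valid: its `book` children do not match `title, author*, year?`.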
![Page 20: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/20.jpg)
More on DTDs as Grammars
<!DOCTYPE paper [
  <!ELEMENT paper (section*)>
  <!ELEMENT section ((title,section*) | text)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>

<paper>
  <section> <text> </text> </section>
  <section>
    <title> </title>
    <section> … </section>
    <section> … </section>
  </section>
</paper>
XML documents can be nested arbitrarily deep
![Page 21: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/21.jpg)
XML for Representing Data
Relational table persons:

name    phone
John    3634
Sue     6343
Dick    6363

XML: persons

<persons>
  <row> <name>John</name> <phone>3634</phone> </row>
  <row> <name>Sue</name> <phone>6343</phone> </row>
  <row> <name>Dick</name> <phone>6363</phone> </row>
</persons>

As a tree:

persons
├── row
│   ├── name: "John"
│   └── phone: 3634
├── row
│   ├── name: "Sue"
│   └── phone: 6343
└── row
    ├── name: "Dick"
    └── phone: 6363
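The table-to-XML encoding on this slide is mechanical: one element per table, one `row` element per tuple, one element per column. A minimal sketch in Python follows. The function name `table_to_xml` and its signature are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def table_to_xml(table_name, columns, rows):
    """Encode a relation as <table><row><col>value</col>...</row>...</table>."""
    root = ET.Element(table_name)
    for row in rows:
        row_elem = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            ET.SubElement(row_elem, column).text = str(value)
    return root

persons = table_to_xml(
    "persons",
    ["name", "phone"],
    [("John", 3634), ("Sue", 6343), ("Dick", 6363)],
)
print(ET.tostring(persons, encoding="unicode"))
```

Note that the column names are repeated in every row of the output, which is what makes the XML form self-describing.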
![Page 22: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/22.jpg)
XML vs Data Models
• XML is self-describing
• Schema elements become part of the data
  – Relational schema: persons(name,phone)
  – In XML <persons>, <name>, <phone> are part of the data, and are repeated many times
• Consequence: XML is much more flexible
• XML = semistructured data
![Page 23: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/23.jpg)
Semi-structured Data Explained
• Missing attributes:
<person> <name> John</name> <phone>1234</phone> </person>
<person> <name>Joe</name> </person>
no phone!

• Repeated attributes:
<person>
  <name> Mary</name>
  <phone>2345</phone>
  <phone>3456</phone>
</person>
two phones!
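An application reading such data has to expect zero, one, or many values per field. The Python sketch below collects phones into a list per person, so a missing phone becomes an empty list. The wrapping `<people>` element and the name `phones_by_name` are assumptions, since the slide shows only the individual `<person>` elements.

```python
import xml.etree.ElementTree as ET

PEOPLE = """
<people>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>
</people>
"""

def phones_by_name(xml_text):
    """Map each person's name to the list of their phones (possibly empty)."""
    root = ET.fromstring(xml_text)
    return {
        person.findtext("name").strip(): [p.text.strip() for p in person.findall("phone")]
        for person in root.findall("person")
    }

print(phones_by_name(PEOPLE))
```

In a relational table the same data would need NULLs for Joe and either a second table or extra columns for Mary.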
![Page 24: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/24.jpg)
Semistructured Data Explained
• Attributes with different types in different objects:
<person>
  <name> <first> John </first> <last> Smith </last> </name>
  <phone>1234</phone>
</person>
structured name!
• Nested collections (no 1NF)
• Heterogeneous collections:
  – <db> contains both <book>s and <publisher>s
![Page 25: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/25.jpg)
XML Data vs. E/R, ODL, Relational
• Q: is XML better or worse?
• A: serves different purposes
  – E/R, ODL, Relational models:
    • For centralized processing, when we control the data
  – XML:
    • Data sharing between different systems
    • We do not have control over the entire data
    • E.g. on the Web
• Do NOT use XML to model your data! Use E/R, ODL, or relational instead.
![Page 26: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/26.jpg)
Exporting Relational Data to XML
• Product(pid, name, weight)
• Company(cid, name, address)
• Makes(pid, cid, price)
[E/R diagram: entities product and company, connected by the relationship makes]
![Page 27: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/27.jpg)
Export data grouped by companies
<db>
  <company>
    <name> GizmoWorks </name>
    <address> Tacoma </address>
    <product> <name> gizmo </name> <price> 19.99 </price> </product>
    <product> … </product>
    …
  </company>
  <company>
    <name> Bang </name>
    <address> Kirkland </address>
    <product> <name> gizmo </name> <price> 22.99 </price> </product>
    …
  </company>
  …
</db>
Redundant representation of products
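The export grouped by companies amounts to a join of the three relations, nested under each company. Here is a small Python sketch of that idea. The sample ids and the weight value are made up for illustration, and `export_by_company` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample rows; ids and the weight are invented for the example.
PRODUCT = [(1, "gizmo", 10)]                  # Product(pid, name, weight)
COMPANY = [(100, "GizmoWorks", "Tacoma"),     # Company(cid, name, address)
           (200, "Bang", "Kirkland")]
MAKES = [(1, 100, 19.99), (1, 200, 22.99)]    # Makes(pid, cid, price)

def export_by_company(product, company, makes):
    """Nest each company's products (with the price from Makes) under it."""
    product_name = {pid: name for pid, name, _weight in product}
    db = ET.Element("db")
    for cid, cname, address in company:
        c = ET.SubElement(db, "company")
        ET.SubElement(c, "name").text = cname
        ET.SubElement(c, "address").text = address
        for pid, made_by, price in makes:
            if made_by == cid:
                p = ET.SubElement(c, "product")
                ET.SubElement(p, "name").text = product_name[pid]
                ET.SubElement(p, "price").text = str(price)
    return db

db = export_by_company(PRODUCT, COMPANY, MAKES)
print(ET.tostring(db, encoding="unicode"))
```

The product name `gizmo` is written once per company that makes it, which is the redundancy the slide points out.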
![Page 28: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/28.jpg)
The DTD
<!ELEMENT db (company*)>
<!ELEMENT company (name, address, product*)>
<!ELEMENT product (name,price)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT price (#PCDATA)>
![Page 29: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/29.jpg)
Export Data by Products
<db>
  <product>
    <name> Gizmo </name>
    <manufacturer>
      <name> GizmoWorks </name>
      <price> 19.99 </price>
      <address> Tacoma </address>
    </manufacturer>
    <manufacturer>
      <name> Bang </name>
      <price> 22.99 </price>
      <address> Kirkland </address>
    </manufacturer>
    …
  </product>
  <product>
    <name> OneClick </name>
    …
</db>
Redundant representation of companies
![Page 30: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/30.jpg)
Which One Do We Choose ?
• The structure of the XML data is determined by agreement with our partners, or dictated by committees
  – Many XML dialects (called applications)
• XML Data is often nested, irregular, etc
• No normal forms for XML
![Page 31: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/31.jpg)
XML Query Languages
• XPath
• XML-QL
• XQuery
![Page 32: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/32.jpg)
An Example of XML Data
<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>
![Page 33: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/33.jpg)
XPath
• Syntax for XML document navigation and node selection
• A recommendation of the W3C (i.e. a standard)
• Building block for other W3C standards:
  – XSL Transformations (XSLT)
  – XQuery
  – XML Link (XLink)
  – XML Pointer (XPointer)
• Was originally part of XSL – “XSL pattern language”
![Page 34: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/34.jpg)
XPath: Simple Expressions
/bib/book/year
Result: <year> 1995 </year>
<year> 1998 </year>
/bib/paper/year
Result: empty (there were no papers)
![Page 35: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/35.jpg)
XPath: Restricted Kleene Closure
//author
Result:
<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
<author> Victor Vianu </author>
<author> Jeffrey D. Ullman </author>

/bib//first-name
Result: <first-name> Rick </first-name>
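These path expressions can be tried out with Python's standard `xml.etree.ElementTree`, which implements a limited XPath subset. One difference is an assumption to keep in mind: ElementTree paths are relative to the element you call them on (here the `<bib>` element), so `/bib/book/year` is written `./book/year`.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)  # bib is the <bib> element itself

years = [y.text for y in bib.findall("./book/year")]          # /bib/book/year
paper_years = bib.findall("./paper/year")                     # /bib/paper/year
authors = bib.findall(".//author")                            # //author
first_names = [f.text for f in bib.findall(".//first-name")]  # /bib//first-name

print(years, paper_years, len(authors), first_names)
```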
![Page 36: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/36.jpg)
XPath: Text Nodes
/bib/book/author/text()
Result: Serge Abiteboul
Victor Vianu
Jeffrey D. Ullman
Rick Hull doesn’t appear because his author element contains first-name and last-name elements rather than text
![Page 37: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/37.jpg)
XPath: Wildcard
//author/*
Result: <first-name> Rick </first-name>
<last-name> Hull </last-name>
* Matches any element
![Page 38: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/38.jpg)
XPath: Attribute Nodes
/bib/book/@price
Result: "55"
@price means that price has to be an attribute
![Page 39: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/39.jpg)
XPath: Qualifiers
/bib/book/author[first-name]
Result: <author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
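The wildcard, attribute and qualifier steps can be approximated with Python's `xml.etree.ElementTree`. This is a sketch under two assumptions: paths are relative to the `<bib>` element, and since ElementTree cannot select attribute nodes directly, `/bib/book/@price` is emulated by selecting books that have the attribute and reading it with `get`.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)

# //author/*  : any element child of any author
name_parts = [e.tag for e in bib.findall(".//author/*")]

# /bib/book/@price : emulated via the [@price] qualifier plus .get()
prices = [b.get("price") for b in bib.findall("./book[@price]")]

# /bib/book/author[first-name] : authors that have a first-name child
structured = bib.findall("./book/author[first-name]")

print(name_parts, prices, [a.findtext("last-name") for a in structured])
```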
![Page 40: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/40.jpg)
XPath: More Qualifiers
/bib/book/author[firstname][address[.//zip][city]]/lastname
Result: <lastname> … </lastname>
<lastname> … </lastname>
![Page 41: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/41.jpg)
XPath: More Qualifiers
/bib/book[@price < "60"]
/bib/book[author/@age < "25"]
/bib/book[author/text()]
![Page 42: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/42.jpg)
XPath: Summary
bib                 matches a bib element
*                   matches any element
/                   matches the root
/bib                matches a bib element under root
bib/paper           matches a paper in bib
bib//paper          matches a paper in bib, at any depth
//paper             matches a paper at any depth
paper|book          matches a paper or a book
@price              matches a price attribute
bib/book/@price     matches price attribute in book, in bib
bib/book[@price<"55"]/author/lastname    matches…
![Page 43: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/43.jpg)
XPath: More Details
• An XPath expression, p, establishes a relation between:
  – A context node, and
  – A node in the answer set
• In other words, p denotes a function:
  – S[p] : Nodes -> {Nodes}
• Examples:
  – author/firstname
  – . = self
  – .. = parent
  – part/*/*/subpart/../name = what does it mean?
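One way to answer the last question is to run it. In the sketch below (Python, `xml.etree.ElementTree`), the context node is a made-up `<catalog>` element, and the element names `assembly` and `unit` are hypothetical. The path goes down to `subpart`, steps back up with `..`, and then takes `name`, so it returns the names of those grandchildren of `part` that have a `subpart` child.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the first <unit> has a <subpart>.
doc = ET.fromstring("""
<catalog>
  <part>
    <assembly>
      <unit><name>gearbox</name><subpart>gear</subpart></unit>
      <unit><name>cover</name></unit>
    </assembly>
  </part>
</catalog>
""")

# The expression from the slide, evaluated with <catalog> as context node.
via_parent = [n.text for n in doc.findall("part/*/*/subpart/../name")]

# The same set of nodes, written with a qualifier instead of "..".
via_qualifier = [n.text for n in doc.findall("part/*/*[subpart]/name")]

print(via_parent, via_qualifier)
```

Changing the context node changes the answer, which is the sense in which p denotes a function from nodes to sets of nodes.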
![Page 44: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/44.jpg)
The Root and the Root
• <bib> <paper> 1 </paper> <paper> 2 </paper> </bib>
• bib is the “document element”
• The “root” is above bib
• /bib = returns the document element
• / = returns the root
• Why ? Because we may have comments before and after <bib>; they become siblings of <bib>
• This is advanced xmlogy
![Page 45: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/45.jpg)
XPath: More Details
• We can navigate along 13 axes:
  – ancestor
  – ancestor-or-self
  – attribute
  – child
  – descendant
  – descendant-or-self
  – following
  – following-sibling
  – namespace
  – parent
  – preceding
  – preceding-sibling
  – self
![Page 46: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/46.jpg)
XPath: More Details
• Examples:
  – child::author/child::lastname = author/lastname
  – child::author/descendant::zip = author//zip
  – child::author/parent::* = author/..
  – child::author/attribute::age = author/@age
![Page 47: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/47.jpg)
XQuery
• Based on Quilt (which is based on XML-QL)
• Check out the W3C web site for the latest.
• XML Query data model – Ordered !
![Page 48: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/48.jpg)
FLWR (“Flower”) Expressions
FOR ... LET... FOR... LET...
WHERE...
RETURN...
![Page 49: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/49.jpg)
XQuery
Find all book titles published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title

Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
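For readers more used to general-purpose languages, the FOR / WHERE / RETURN shape maps closely onto a list comprehension. The Python sketch below is an analogy, not an XQuery implementation. The inline document stands in for `bib.xml`, and `titles_after` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book><title>Foundations of Databases</title><year>1995</year></book>
  <book><title>Principles of Database and Knowledge Base Systems</title><year>1998</year></book>
</bib>
"""

def titles_after(bib, year):
    """FOR $x IN /bib/book WHERE $x/year > year RETURN $x/title"""
    return [
        book.findtext("title")                     # RETURN
        for book in bib.findall("./book")          # FOR
        if int(book.findtext("year")) > year       # WHERE
    ]

bib = ET.fromstring(BIB)
print(titles_after(bib, 1995))
```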
![Page 50: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/50.jpg)
XQuery
For each author of a book by Morgan Kaufmann, list all books she published:
FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
         $a,
         FOR $t IN /bib/book[author=$a]/title
         RETURN $t
       </result>
distinct = a function that eliminates duplicates
![Page 51: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/51.jpg)
XQuery
Result:
<result>
  <author> Jones </author>
  <title> abc </title>
  <title> def </title>
</result>
<result>
  <author> Smith </author>
  <title> ghi </title>
</result>
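The nested query can be mirrored in Python as an outer loop over distinct authors and an inner loop over their titles. This is a sketch of the logic only. The sample data is invented to reproduce the result shown above, and `books_by_mk_authors` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book><publisher>Morgan Kaufmann</publisher><author>Jones</author><title>abc</title></book>
  <book><publisher>Morgan Kaufmann</publisher><author>Jones</author><title>def</title></book>
  <book><publisher>Morgan Kaufmann</publisher><author>Smith</author><title>ghi</title></book>
  <book><publisher>Freeman</publisher><author>Brown</author><title>xyz</title></book>
</bib>
"""

def books_by_mk_authors(bib):
    """For each distinct Morgan Kaufmann author, list the titles they wrote."""
    authors = []
    for a in bib.findall("./book[publisher='Morgan Kaufmann']/author"):
        if a.text not in authors:          # distinct(), keeping first-seen order
            authors.append(a.text)
    results = []
    for name in authors:                   # outer FOR $a
        titles = [
            b.findtext("title")            # inner FOR $t ... RETURN $t
            for b in bib.findall("./book")
            if name in [a.text for a in b.findall("author")]
        ]
        results.append((name, titles))
    return results

bib = ET.fromstring(BIB)
print(books_by_mk_authors(bib))
```

As in the query, the inner loop is not restricted to Morgan Kaufmann books, so it lists everything the author wrote.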
![Page 52: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/52.jpg)
XQuery
• FOR $x IN expr -- binds $x to each value in the list expr
• LET $x := expr -- binds $x to the entire list expr
  – Useful for common subexpressions and for aggregations
![Page 53: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/53.jpg)
XQuery
count = an aggregate function that returns the number of elements
<big_publishers>
  FOR $p IN distinct(document("bib.xml")//publisher)
  LET $b := document("bib.xml")//book[publisher = $p]
  WHERE count($b) > 100
  RETURN $p
</big_publishers>
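The same aggregation can be sketched in Python: LET collects all books of a publisher, and count measures that collection. Here `collections.Counter` does both at once. The sample data and the `threshold` parameter are assumptions so the example can run on three books instead of hundreds.

```python
import xml.etree.ElementTree as ET
from collections import Counter

BIB = """
<bib>
  <book><publisher>Morgan Kaufmann</publisher><title>abc</title></book>
  <book><publisher>Morgan Kaufmann</publisher><title>def</title></book>
  <book><publisher>Freeman</publisher><title>ghi</title></book>
</bib>
"""

def big_publishers(bib, threshold=100):
    """Publishers with more than `threshold` books (count($b) > threshold)."""
    counts = Counter(b.findtext("publisher") for b in bib.iter("book"))
    return [p for p, n in counts.items() if n > threshold]

bib = ET.fromstring(BIB)
print(big_publishers(bib, threshold=1))
```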
![Page 54: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/54.jpg)
XQuery
Find books whose price is larger than average:
LET $a := avg(document("bib.xml")/bib/book/price)
FOR $b IN document("bib.xml")/bib/book
WHERE $b/price > $a
RETURN $b
![Page 55: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/55.jpg)
XQuery
Summary:
• FOR-LET-WHERE-RETURN = FLWR
• Data flow: FOR/LET clauses → list of tuples → WHERE clause → list of tuples → RETURN clause → instance of the XQuery data model
![Page 56: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/56.jpg)
FOR vs. LET
FOR
• Binds node variables → iteration
LET
• Binds collection variables → one value
![Page 57: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/57.jpg)
FOR vs. LET
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>

Returns:
<result> <book>...</book> </result>
<result> <book>...</book> </result>
<result> <book>...</book> </result>
...

LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>

Returns:
<result>
  <book>...</book>
  <book>...</book>
  <book>...</book>
  ...
</result>
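The difference between the two bindings can be imitated in Python: FOR evaluates the RETURN once per item, while LET evaluates it once with the whole collection. The function names `for_style` and `let_style` and the three placeholder books are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("<bib><book>a</book><book>b</book><book>c</book></bib>")
books = bib.findall("./book")

def for_style(items):
    """FOR: the RETURN clause runs once per item, giving one <result> each."""
    results = []
    for item in items:
        result = ET.Element("result")
        result.append(item)
        results.append(result)
    return results

def let_style(items):
    """LET: the variable holds the whole collection, giving a single <result>."""
    result = ET.Element("result")
    result.extend(items)
    return [result]

print([len(r) for r in for_style(books)])  # children per <result>
print([len(r) for r in let_style(books)])
```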
![Page 58: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/58.jpg)
Collections in XQuery
• Ordered and unordered collections
  – /bib/book/author = an ordered collection
  – distinct(/bib/book/author) = an unordered collection
• LET $a := /bib/book → $a is a collection
• $b/author → a collection (several authors...)

RETURN <result> $b/author </result>
Returns:
<result>
  <author>...</author>
  <author>...</author>
  <author>...</author>
  ...
</result>
![Page 59: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/59.jpg)
Collections in XQuery
What about collections in expressions ?
• $b/price → list of n prices
• $b/price * 0.7 → list of n numbers
• $b/price * $b/quantity → list of n x m numbers ??
• $b/price * ($b/quant1 + $b/quant2) ≠ $b/price * $b/quant1 + $b/price * $b/quant2 !!
![Page 60: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/60.jpg)
Sorting in XQuery
<publisher_list>
  FOR $p IN distinct(document("bib.xml")//publisher)
  RETURN <publisher>
           <name> $p/text() </name> ,
           FOR $b IN document("bib.xml")//book[publisher = $p]
           RETURN <book>
                    $b/title ,
                    $b/price
                  </book> SORTBY(price DESCENDING)
         </publisher> SORTBY(name)
</publisher_list>
![Page 61: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/61.jpg)
Sorting in XQuery
• Sorting arguments: refer to the name space of the RETURN clause, not the FOR clause
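The point about sort arguments can be sketched in Python, with the caveat that the `books` records and their values are hypothetical. Each sort key below is read from the record that was just constructed (the analogue of the RETURN clause), not from the input record the loop variable is bound to.

```python
# Hypothetical stand-in for the books in bib.xml.
books = [
    {"title": "T1", "publisher": "Morgan Kaufmann", "price": 40},
    {"title": "T2", "publisher": "Addison-Wesley", "price": 65},
    {"title": "T3", "publisher": "Morgan Kaufmann", "price": 55},
]

publisher_list = []
# dict.fromkeys keeps the distinct publishers in first-seen order.
for p in dict.fromkeys(b["publisher"] for b in books):
    # Construct the <book> results first, keeping only title and price...
    result_books = [
        {"title": b["title"], "price": b["price"]}
        for b in books
        if b["publisher"] == p
    ]
    # ...then sort on a field of the constructed result: SORTBY(price DESCENDING).
    result_books.sort(key=lambda r: r["price"], reverse=True)
    publisher_list.append({"name": p, "books": result_books})

# SORTBY(name): "name" is again a field of the constructed <publisher> record.
publisher_list.sort(key=lambda r: r["name"])
```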
![Page 62: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/62.jpg)
If-Then-Else
FOR $h IN //holding
RETURN <holding>
$h/title,
IF $h/@type = "Journal"
THEN $h/editor
ELSE $h/author
</holding> SORTBY (title)
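A small Python sketch of the same conditional construction, assuming hypothetical `holdings` records with a `type` field in place of the @type attribute: journals contribute their editor, everything else contributes its author, and the results are sorted by title.

```python
# Hypothetical holdings; "type" plays the role of the @type attribute.
holdings = [
    {"type": "Journal", "title": "B Journal", "editor": "Ed", "author": None},
    {"type": "Book", "title": "A Book", "editor": None, "author": "Au"},
]

def to_result(h):
    # IF $h/@type = "Journal" THEN $h/editor ELSE $h/author
    field = "editor" if h["type"] == "Journal" else "author"
    return {"title": h["title"], field: h[field]}

# SORTBY (title) is applied to the constructed results.
results = sorted((to_result(h) for h in holdings), key=lambda r: r["title"])
```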
![Page 63: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/63.jpg)
Existential Quantifiers
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
contains($p, "sailing")
AND contains($p, "windsurfing")
RETURN $b/title
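In Python terms, SOME ... SATISFIES behaves like `any()`. The sketch below uses hypothetical `books` records; note that both words must occur in the same paragraph, so a book that mentions them in different paragraphs does not qualify.

```python
# Hypothetical books, each with a list of paragraph texts.
books = [
    {"title": "Water Sports", "paras": ["sailing and windsurfing", "kayaking"]},
    {"title": "Split Topics", "paras": ["sailing only", "windsurfing only"]},
]

# SOME $p IN $b//para SATISFIES contains(...) AND contains(...)
titles = [
    b["title"]
    for b in books
    if any("sailing" in p and "windsurfing" in p for p in b["paras"])
]
```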
![Page 64: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/64.jpg)
Universal Quantifiers
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
contains($p, "sailing")
RETURN $b/title
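EVERY ... SATISFIES corresponds to Python's `all()`. The sketch below, again over hypothetical `books` records, also shows a subtlety: a book with no paragraphs at all satisfies the condition vacuously, which matches how a universal quantifier over an empty collection is usually defined.

```python
# Hypothetical books; the last one has no paragraphs.
books = [
    {"title": "All Sailing", "paras": ["sailing basics", "advanced sailing"]},
    {"title": "Mixed", "paras": ["sailing", "cooking"]},
    {"title": "No Paragraphs", "paras": []},
]

# EVERY $p IN $b//para SATISFIES contains($p, "sailing")
titles = [b["title"] for b in books if all("sailing" in p for p in b["paras"])]
```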
![Page 65: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/65.jpg)
Other Stuff in XQuery
• BEFORE and AFTER
– for dealing with order in the input
• FILTER
– deletes some edges in the result tree
• Recursive functions
– Currently: arbitrary recursion
– Perhaps more restrictions in the future ?
![Page 66: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/66.jpg)
Processing XML Data
• Do we really need to process XML data? What are we processing XML for?
• How are we going to do it? Use existing technology?
• Are there other processing paradigms that we need to consider?
![Page 67: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/67.jpg)
Query Processing For XML
• Approach 1: store XML in a relational database. Translate an XML-QL/Quilt query into a set of SQL queries.
– Leverage 20 years of research & development.
• Approach 2: store XML in an object-oriented database system.
– OO model is closest to XML, but systems do not perform well and are not well accepted.
• Approach 3: build a native XML query processing engine.
– Still in the research phase; see Zack next week.
![Page 68: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/68.jpg)
Relational Approach
• Step 1: given a DTD, create a relational schema.
• Step 2: map the XML document into tuples in the relational database.
• Step 3: given a query Q in XQuery, translate it to a set of queries P over the relational database.
• Step 4: translate the tuples returned from the relational database into XML elements.
![Page 69: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/69.jpg)
Which Relational Schema?
• The key question! Affects performance.
• No magic solution.
• Some options:
– The EDGE table: put everything in one table.
– The Attribute tables: create a table for every tag name.
– The inlining method: inline as much data as possible into the tables.
![Page 70: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/70.jpg)
An Example DTD
<!DOCTYPE db [
  <!ELEMENT db ((book|publisher)*)>
  <!ELEMENT book (title,author*,year?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT publisher (name, state)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ATTLIST book pub IDREF #IMPLIED>
]>
![Page 71: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/71.jpg)
Recall: The XML Tree
db
  book
    title: “Complete Guide to DB2”
    author: “Chamberlin”
  book
    title: “Transaction Processing”
    author: “Bernstein”
    author: “Newcomer”
  publisher
    name: “Morgan Kaufman”
    state: “CA”
Tags on nodes
Data values on leaves
![Page 72: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/72.jpg)
The Edge Approach
Edge(sourceID, tag, destID, destValue)
- Don’t need a DTD.
- Very simple to implement.
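A minimal sketch of how a document could be shredded into such an Edge table, using Python's standard `xml.etree.ElementTree` and `sqlite3` modules. The table name `E`, the ID numbering scheme (root gets 0, other nodes are numbered in document order), and the inline sample document are assumptions made for illustration; the slides do not prescribe them.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

XML = """<db>
  <book><title>Complete Guide to DB2</title><author>Chamberlin</author></book>
  <book><title>Transaction Processing</title>
        <author>Bernstein</author><author>Newcomer</author></book>
  <publisher><name>Morgan Kaufman</name><state>CA</state></publisher>
</db>"""

def shred(root):
    """Yield one (sourceID, tag, destID, destValue) row per parent-child edge."""
    ids = count(1)

    def walk(node, node_id):
        for child in node:
            child_id = next(ids)
            # Only leaves carry a data value; inner nodes get NULL.
            value = child.text.strip() if len(child) == 0 and child.text else None
            yield (node_id, child.tag, child_id, value)
            yield from walk(child, child_id)

    yield from walk(root, 0)  # assumption: the root <db> element gets ID 0

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE E (sourceID INTEGER, tag TEXT, destID INTEGER, destValue TEXT)"
)
conn.executemany("INSERT INTO E VALUES (?, ?, ?, ?)", shred(ET.fromstring(XML)))
rows = conn.execute("SELECT * FROM E ORDER BY destID").fetchall()
```

No DTD is consulted anywhere in this sketch, which is the main appeal of the approach; the cost shows up later as self-joins at query time.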
![Page 73: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/73.jpg)
The Attribute Approach
Book(rootID, bookID)
Title(bookID, title)
Author(bookID, author)
Publisher(rootID, pubID)
PubName(pubID, pubName)
PubState(pubID, state)
![Page 74: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/74.jpg)
The In-lining Approach
Book(bookID, title, pubName, pubState)
BookAuthor(bookID, author)
Publisher(sourceID, tag, destID, destValue)
![Page 75: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/75.jpg)
Let the Querying Begin!
• Matching data using element patterns.
FOR $t IN document("bib.xml")/book[author="Bernstein"]/title
RETURN
<bernsteinBook> $t </bernsteinBook>
![Page 76: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/76.jpg)
The Edge Approach
SELECT e2.destValue
FROM E as e1, E as e2, E as e3
WHERE e1.tag = 'book'
  and e1.destID = e2.sourceID and e2.tag = 'title'
  and e1.destID = e3.sourceID and e3.tag = 'author'
  and e3.destValue = 'Bernstein'
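A title lookup over the Edge table needs one self-join per path step: one alias for the book edge, one for its title edge, and one for its author edge. The sketch below runs such a query in SQLite over a few hand-filled, hypothetical rows; the table name `E` and the ID values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE E (sourceID INTEGER, tag TEXT, destID INTEGER, destValue TEXT)"
)
# Hypothetical edges for two books hanging off root node 0.
conn.executemany("INSERT INTO E VALUES (?, ?, ?, ?)", [
    (0, "book", 1, None),
    (1, "title", 2, "Complete Guide to DB2"),
    (1, "author", 3, "Chamberlin"),
    (0, "book", 4, None),
    (4, "title", 5, "Transaction Processing"),
    (4, "author", 6, "Bernstein"),
    (4, "author", 7, "Newcomer"),
])

# e1 = edge into the book, e2 = its title edge, e3 = its author edge.
titles = [row[0] for row in conn.execute("""
    SELECT e2.destValue
    FROM E AS e1, E AS e2, E AS e3
    WHERE e1.tag = 'book'
      AND e1.destID = e2.sourceID AND e2.tag = 'title'
      AND e1.destID = e3.sourceID AND e3.tag = 'author'
      AND e3.destValue = 'Bernstein'
""")]
```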
![Page 77: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/77.jpg)
The Attribute Approach
SELECT Title.title
FROM Book, Title, Author
WHERE Book.bookID = Author.bookID
  and Book.bookID = Title.bookID
  and Author.author = 'Bernstein'
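The same lookup can be tried against per-tag tables in SQLite. This is only a sketch: the column types and the sample rows are assumptions, and only the three tables the query touches are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book   (rootID INTEGER, bookID INTEGER);
    CREATE TABLE Title  (bookID INTEGER, title TEXT);
    CREATE TABLE Author (bookID INTEGER, author TEXT);
    -- Hypothetical rows for the two books of the example tree.
    INSERT INTO Book   VALUES (0, 1), (0, 2);
    INSERT INTO Title  VALUES (1, 'Complete Guide to DB2'),
                              (2, 'Transaction Processing');
    INSERT INTO Author VALUES (1, 'Chamberlin'), (2, 'Bernstein'), (2, 'Newcomer');
""")

titles = [row[0] for row in conn.execute("""
    SELECT Title.title
    FROM Book, Title, Author
    WHERE Book.bookID = Author.bookID
      AND Book.bookID = Title.bookID
      AND Author.author = 'Bernstein'
""")]
```

Compared with the Edge table, the tag tests disappear from the WHERE clause because each tag already has its own table.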
![Page 78: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/78.jpg)
The In-lining Approach
SELECT Book.title
FROM Book, BookAuthor
WHERE Book.bookID = BookAuthor.bookID
  and BookAuthor.author = 'Bernstein'
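With inlining, single-valued children such as title live in the Book row itself, and only the repeating author element needs its own table, so the query drops to a single join. A sketch in SQLite follows; the column types and rows are assumptions, and the publisher columns are left NULL because the example tree does not say which book belongs to which publisher.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book (bookID INTEGER, title TEXT, pubName TEXT, pubState TEXT);
    CREATE TABLE BookAuthor (bookID INTEGER, author TEXT);
    -- Hypothetical rows; publisher columns are unknown in the example, so NULL.
    INSERT INTO Book VALUES (1, 'Complete Guide to DB2', NULL, NULL),
                            (2, 'Transaction Processing', NULL, NULL);
    INSERT INTO BookAuthor VALUES (1, 'Chamberlin'), (2, 'Bernstein'), (2, 'Newcomer');
""")

titles = [row[0] for row in conn.execute("""
    SELECT Book.title
    FROM Book, BookAuthor
    WHERE Book.bookID = BookAuthor.bookID
      AND BookAuthor.author = 'Bernstein'
""")]
```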
![Page 79: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/79.jpg)
A Challenge: Reconstructing Elements
• Matching data using element patterns.
FOR $b IN document("bib.xml")/book[author="Bernstein"]
RETURN
<bernsteinBook> $b </bernsteinBook>
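Returning a whole element is harder than returning one value because every descendant of the matching book has to be fetched and reassembled. The sketch below shows the reassembly step in Python for hypothetical Edge-style rows that are already in memory; the function `rebuild` and the row values are illustrative assumptions, and a real system would also have to decide how many SQL queries to issue to fetch those rows.

```python
import xml.etree.ElementTree as ET

# Hypothetical Edge rows: (sourceID, tag, destID, destValue).
edges = [
    (0, "book", 4, None),
    (4, "title", 5, "Transaction Processing"),
    (4, "author", 6, "Bernstein"),
    (4, "author", 7, "Newcomer"),
]

def rebuild(tag, node_id, edges):
    """Recursively rebuild the element with ID node_id from its outgoing edges."""
    elem = ET.Element(tag)
    for source, child_tag, dest, value in edges:
        if source == node_id:
            child = rebuild(child_tag, dest, edges)
            child.text = value  # None for inner nodes, a string for leaves
            elem.append(child)
    return elem

# Wrap the rebuilt book the way the RETURN clause does.
wrapper = ET.Element("bernsteinBook")
wrapper.append(rebuild("book", 4, edges))
xml_text = ET.tostring(wrapper, encoding="unicode")
```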
![Page 80: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/80.jpg)
Reconstructing XML Elements
• Matching data using element patterns.
WHERE <book>
<author> Bernstein </author>
<title> $t </title>
</book> ELEMENT-AS $e
IN "www.a.b.c/bib.xml"
CONSTRUCT
$e
![Page 81: XML May 2 nd, 2002. Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649d585503460f94a36e56/html5/thumbnails/81.jpg)
Some Open Questions
• Native query processing for XML
• To order or not to order?
• Combining IR-style keyword queries with DB-style structured queries
• Updates
• Automatic selection of a relational schema
• How should we extend relational engines to better support XML storage and querying?