© 2017 A. Alawini, S. Davidson
XML and XQuery
Susan B. Davidson, CIS 700: Advanced Topics in Databases
MW 1:30-3
Towne 309
http://www.cis.upenn.edu/~susan/cis700/homepage.html
XML Anatomy
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <dblp>
      <mastersthesis mdate="2002-01-03" key="ms/Brown92">
        <author>Kurt P. Brown</author>
        <title>PRPL: A Database Workload Specification Language</title>
        <year>1992</year>
        <school>Univ. of Wisconsin-Madison</school>
      </mastersthesis>
      <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
        <editor>Paul R. McJones</editor>
        <title>The 1995 SQL Reunion</title>
        <journal>Digital System Research Center Report</journal>
        <volume>SRC1997-018</volume>
        <year>1997</year>
        <ee>db/labs/dec/SRC1997-018.html</ee>
        <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
      </article>
Labels in the figure: processing instruction, element, attribute, open-tag, close-tag
XML Data Model Visualized (and simplified!)
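[Figure: the dblp document drawn as a tree. The root has two children: the ?xml processing instruction and the dblp element. dblp has a mastersthesis child and an article child. mastersthesis carries the attributes mdate and key and the child elements author, title, year, school; article carries the attributes mdate and key and the child elements editor, title, journal, volume, year, ee, ee. The leaves are text nodes (2002…, ms/Brown92, Kurt P…, PRPL…, 1992, Univ…, and so on). Legend: root, processing instruction (p-i), element, attribute, text.]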
Structural Constraints: Document Type Definitions (DTDs)
The DTD is an EBNF grammar defining XML structure
• XML document specifies an associated DTD, plus the root element
• DTD specifies children of the root (and so on)
DTD defines special significance for attributes:
• IDs: special attributes that are analogous to keys for elements
• IDREFs: references to IDs
• IDREFS: a nasty hack that represents a list of IDREFs
An Example DTD
Example DTD:
    <!ELEMENT dblp ((mastersthesis | article)*)>
    <!ELEMENT mastersthesis (author, title, year, school, committeemember*)>
    <!ATTLIST mastersthesis mdate   CDATA #REQUIRED
                            key     ID    #REQUIRED
                            advisor CDATA #IMPLIED>
    <!ELEMENT author (#PCDATA)>
    …
Example use of DTD in XML file:
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <!DOCTYPE dblp SYSTEM "my.dtd">
    <dblp>…
Representing Graphs and Links in XML: Basically Using Foreign Keys
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <!DOCTYPE graph SYSTEM "special.dtd">
    <graph>
      <author id="author1">
        <name>John Smith</name>
      </author>
      <article>
        <author ref="author1" />
        <title>Paper1</title>
      </article>
      <article>
        <author ref="author1" />
        <title>Paper2</title>
      </article>
…
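
A minimal sketch of how the id/ref convention behaves like a foreign key. Python's standard library is used only as a stand-in here; the slides do not prescribe an implementation. The closing </graph> tag and the names by_id and papers_by_author are assumptions added so that the fragment parses and runs.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, closed with </graph> so that it parses (assumption).
GRAPH_XML = """
<graph>
  <author id="author1"> <name>John Smith</name> </author>
  <article> <author ref="author1" /> <title>Paper1</title> </article>
  <article> <author ref="author1" /> <title>Paper2</title> </article>
</graph>
"""

graph = ET.fromstring(GRAPH_XML)

# Index every element that carries an id attribute: the "key" side.
by_id = {elem.get("id"): elem for elem in graph.iter() if elem.get("id") is not None}

# Follow each ref attribute to its target: the "foreign key" side.
papers_by_author = {}
for article in graph.findall("article"):
    for author_ref in article.findall("author"):
        target = by_id[author_ref.get("ref")]
        name = target.findtext("name")
        papers_by_author.setdefault(name, []).append(article.findtext("title"))

print(papers_by_author)
```

Resolving the references is what turns the document tree into a graph: both articles end up pointing at the same author element.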
Graph Data Model
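[Figure: the graph document drawn as a tree. The root has the ?xml processing instruction, the !DOCTYPE declaration, and the graph element as children. graph has one author element (attribute id = author1, child name = John Smith) and two article elements, each with an author child carrying the attribute ref = author1 and a title child (Paper1, Paper2).]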
Graph Data Model
[Figure: the same tree again, but the ref attributes no longer show author1 as a text value; presumably each ref is drawn as a link to the author element whose id is author1, so the tree becomes a graph.]
Querying XML
How do you query a directed graph? A tree?
The standard approach used by many XML, semistructured-data, and object query languages:
• Define some sort of a template describing traversals from the root of the directed graph
• In XML, the basis of this template is called an XPath
XPaths
In its simplest form, an XPath is like a path in a file system:
    /mypath/subpath/*/morepath
• The XPath returns a node set representing the XML nodes (and their subtrees) at the end of the path
• XPaths can have node tests at the end, returning only particular node types, e.g., text(), processing-instruction(), comment(), element(), attribute()
• XPath is fundamentally an ordered language: it can query in order-aware fashion, and it returns nodes in order
Some Example XPath Queries
• /dblp/mastersthesis/title
• /dblp/*/editor
• //title
• //title/text()
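
These paths can be tried against a shortened copy of the dblp example. The sketch below uses Python's xml.etree.ElementTree, which implements only a subset of XPath: paths are written relative to the dblp element, and there is no text() node test, so the .text attribute is read instead. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

DBLP_XML = """
<dblp>
  <mastersthesis mdate="2002-01-03" key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
  </mastersthesis>
  <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <year>1997</year>
  </article>
</dblp>
"""

dblp = ET.fromstring(DBLP_XML)  # the <dblp> element acts as the context node

# /dblp/mastersthesis/title
thesis_titles = dblp.findall("mastersthesis/title")
# /dblp/*/editor
editors = dblp.findall("*/editor")
# //title
all_titles = dblp.findall(".//title")
# //title/text(): no text() test in ElementTree, so read .text on each node
title_texts = [t.text for t in all_titles]

print(title_texts)
```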
Context Nodes and Relative Paths
XPath has a notion of a context node: it's analogous to a current directory
• "." represents this context node
• ".." represents the parent node
• We can express relative paths: subpath/sub-subpath/../.. gets us back to the context node
• By default, the document root is the context node
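
A small sketch of the context-node idea, again with Python's ElementTree as a stand-in. In this library the context node is whichever element find() is called on, which is an ElementTree convention rather than something stated on the slide; the document and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

dblp = ET.fromstring(
    "<dblp><article><title>Paper1</title><year>1997</year></article></dblp>"
)

# "." is the context node itself.
context = dblp.find(".")

# A relative path is evaluated starting from the context node.
title = dblp.find("article/title")

# ".." steps to the parent: article/title/.. lands on <article> ...
article = dblp.find("article/title/..")
# ... and article/title/../.. gets back to the context node.
back_home = dblp.find("article/title/../..")
```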
Predicates – Selection Operations
A predicate allows us to filter the node set based on selection-like conditions over sub-XPaths:
    /dblp/article[title="Paper1"]
which is equivalent to:
    /dblp/article[./title/text()="Paper1"]
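
The predicate form above is one of the few that Python's ElementTree supports directly, so it can be sketched as follows (sample data and names are illustrative):

```python
import xml.etree.ElementTree as ET

dblp = ET.fromstring("""
<dblp>
  <article><title>Paper1</title><year>1997</year></article>
  <article><title>Paper2</title><year>1999</year></article>
</dblp>
""")

# /dblp/article[title="Paper1"]: keep only articles that have a title child
# whose text equals "Paper1".
matches = dblp.findall("article[title='Paper1']")
years = [a.findtext("year") for a in matches]

print(years)
```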
Axes: More Complex Traversals
Thus far, we've seen XPath expressions that go down the tree (and up one step)
• But we might want to go up, left, right, etc.
• These are expressed with so-called axes:
  • self::path-step
  • child::path-step                parent::path-step
  • descendant::path-step           ancestor::path-step
  • descendant-or-self::path-step   ancestor-or-self::path-step
  • preceding-sibling::path-step    following-sibling::path-step
  • preceding::path-step            following::path-step
• The previous XPaths we saw were in "abbreviated form"
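
Python's ElementTree does not implement XPath axes, so the sketch below only emulates three of them by hand to show what they select. The helper names ancestors and following_siblings and the parent_of map are hypothetical; a full XPath engine would evaluate the axis syntax directly.

```python
import xml.etree.ElementTree as ET

dblp = ET.fromstring("""
<dblp>
  <mastersthesis><title>T1</title></mastersthesis>
  <article><title>T2</title></article>
  <article><title>T3</title></article>
</dblp>
""")

# ElementTree nodes have no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in dblp.iter() for child in parent}

def ancestors(node):
    """Emulates ancestor::* by walking parent links up to the root."""
    while node in parent_of:
        node = parent_of[node]
        yield node

def following_siblings(node):
    """Emulates following-sibling::* (later children of the same parent)."""
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]

first_article = dblp.find("article")
# Emulates descendant::* ; iter() yields the node itself first, so skip it.
descendants = list(first_article.iter())[1:]

ancestor_tags = [a.tag for a in ancestors(first_article.find("title"))]
later_titles = [s.findtext("title") for s in following_siblings(first_article)]
```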
Querying Order
• We saw in the previous slide that we could query for preceding or following siblings or nodes
• We can also query a node for its position according to some index:
  • fn:last() returns the index of the last element matching the last step; positions count from 1, so the first element is at position 1
  • fn:position() gives the relative count of the current node
    child::article[fn:position() = fn:last()]
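
Positional predicates can be sketched with Python's ElementTree, which accepts an integer position, last(), and last()-N inside brackets but not the full fn:position() = fn:last() comparison. The sample data is illustrative.

```python
import xml.etree.ElementTree as ET

dblp = ET.fromstring("""
<dblp>
  <article><title>Paper1</title></article>
  <article><title>Paper2</title></article>
  <article><title>Paper3</title></article>
</dblp>
""")

# Same selection as child::article[fn:position() = fn:last()].
last_article = dblp.find("article[last()]")
# Positions are 1-based: article[1] is the first article.
first_article = dblp.find("article[1]")
# A position relative to the end: the second-to-last article.
second_to_last = dblp.find("article[last()-1]")
```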
Beyond XPath: XQuery
A strongly-typed, Turing-complete XML manipulation language
• Attempts to do static type checking against XML Schema
• Based on an object model derived from Schema
Unlike SQL, fully compositional, highly orthogonal:
• Inputs & outputs collections (sequences or bags) of XML nodes
• Anywhere a particular type of object may be used, may use the results of a query of the same type
• Designed mostly by DB and functional language people
XQuery’s Basic Form
• Has an analogous form to SQL's SELECT..FROM..WHERE..GROUP BY..ORDER BY
• The model: bind nodes (or node sets) to variables; operate over each legal combination of bindings; produce a set of nodes
• "FLWOR" statement [note case sensitivity!]:
    for {iterators that bind variables}
    let {collections}
    where {conditions}
    order by {order-paths}
    return {output constructor}
• Mixes XML + XQuery syntax; use {} as "escapes"
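
To make the FLWOR model concrete, here is a rough Python analogue of the five clauses. It is not XQuery and only mirrors the evaluation order: bind, filter, sort, construct. The element name titles-1999 and the sample data are invented for the example.

```python
import xml.etree.ElementTree as ET

dblp = ET.fromstring("""
<dblp>
  <article><title>B paper</title><year>1999</year></article>
  <article><title>A paper</title><year>1999</year></article>
  <article><title>C paper</title><year>1997</year></article>
</dblp>
""")

# for $a in /dblp/article  let $t := $a/title  where $a/year = "1999"
bindings = [
    (article, article.findtext("title"))
    for article in dblp.findall("article")
    if article.findtext("year") == "1999"
]

# order by $t  return <title>{$t}</title>, wrapped in an enclosing element
result = ET.Element("titles-1999")
for article, title in sorted(bindings, key=lambda binding: binding[1]):
    ET.SubElement(result, "title").text = title

output = ET.tostring(result, encoding="unicode")
print(output)
```

Each tuple in bindings plays the role of one "binding tuple"; the where clause discards tuples before anything is constructed.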
XML Data Model Visualized
[Figure: the dblp document tree, repeated from the earlier "XML Data Model Visualized" slide.]
"Iterations" in XQuery
A series of (possibly nested) FOR statements assigning the results of XPaths to variables
    for $root in doc("http://my.org/my.xml")
    for $sub in $root/rootElement,
        $sub2 in $sub/subElement, …
• Something like a template that pattern-matches, produces a "binding tuple"
• For each of these, we evaluate the WHERE and possibly output the RETURN template
• document() or doc() function specifies an input file as a URI
Two XQuery Examples
    <root-tag>{
      for $p in doc("dblp.xml")/dblp/proceedings,
          $yr in $p/yr
      where $yr = "1999"
      return <proc>{$p}</proc>
    }</root-tag>

    for $i in doc("dblp.xml")/dblp/inproceedings[author/text() = "John Smith"]
    return <smith-paper>
             <title>{$i/title/text()}</title>
             <key>{$i/@key}</key>
             {$i/crossref}
           </smith-paper>
Nesting in XQuery
Nesting XML trees is perhaps the most common operation
In XQuery, it's easy: put a subquery in the return clause where you want things to repeat!
    for $u in doc("dblp.xml")/dblp/university
    where $u/country = "USA"
    return <ms-theses-99>
             {$u/name}
             { for $mt in $u/../mastersthesis
               where $mt/year/text() = "1999" and ____________
               return $mt/title }
           </ms-theses-99>
Collections & Aggregation in XQuery
In XQuery, many operations return collections
• XPaths, sub-XQueries, functions over these, …
• The let clause assigns the results to a variable
Aggregation applies a function over a collection (elegant!)
    let $allpapers := doc("dblp.xml")/dblp/article
    return <article-authors>
      <count>{fn:count(fn:distinct-values($allpapers/author))}</count>
      { for $paper in doc("dblp.xml")/dblp/article
        let $pauth := $paper/author
        return <paper>{$paper/title}
                 <count>{fn:count($pauth)}</count>
               </paper>
      }</article-authors>
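
The two aggregates in this query (distinct authors overall, authors per paper) can be sketched in Python to show that aggregation is just a function applied to a collection. This is an analogue rather than XQuery; the sample authors and variable names are made up.

```python
import xml.etree.ElementTree as ET

dblp = ET.fromstring("""
<dblp>
  <article><author>Ann</author><author>Bob</author><title>Paper1</title></article>
  <article><author>Ann</author><title>Paper2</title></article>
</dblp>
""")

# let $allpapers := /dblp/article
all_papers = dblp.findall("article")

# fn:count(fn:distinct-values($allpapers/author)): distinct by string value
distinct_authors = {a.text for paper in all_papers for a in paper.findall("author")}
total_distinct = len(distinct_authors)

# for $paper ... let $pauth := $paper/author return (title, fn:count($pauth))
per_paper = [(p.findtext("title"), len(p.findall("author"))) for p in all_papers]

print(total_distinct, per_paper)
```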
Collections, Ctd.
Unlike SQL, we can compose aggregations and create new collections from old:
    <result>{
      let $avgItemsSold := fn:avg(
        for $order in doc("my.xml")/orders/order
        let $totalSold := fn:sum($order/item/quantity)
        return $totalSold)
      return $avgItemsSold
    }</result>
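
The composition here is an average taken over a collection of per-order sums. A Python sketch of the same shape, with invented order data, is:

```python
import xml.etree.ElementTree as ET
from statistics import mean

orders = ET.fromstring("""
<orders>
  <order><item><quantity>2</quantity></item><item><quantity>3</quantity></item></order>
  <order><item><quantity>7</quantity></item></order>
</orders>
""")

# Inner query: one fn:sum($order/item/quantity) per order, giving a new collection.
totals = [
    sum(int(q.text) for q in order.findall("item/quantity"))
    for order in orders.findall("order")
]

# Outer aggregate: fn:avg over the collection the inner query produced.
avg_items_sold = mean(totals)

print(totals, avg_items_sold)
```

The inner list plays the role of the nested for/let/return; the outer aggregate consumes it like any other collection.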
Distinct-ness
In XQuery, DISTINCT-ness happens as a function over a collection
• But since we have nodes, we can do duplicate removal according to value or node
• Can do fn:distinct-values(collection) to remove duplicate values, or fn:distinct-nodes(collection) to remove duplicate nodes (fn:distinct-nodes appeared in XQuery working drafts; it is not in the final XQuery 1.0 function library)
    for $years in fn:distinct-values(doc("dblp.xml")//year/text())
    return $years
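
The difference between removing duplicates by value and by node can be sketched in Python, where string equality stands in for value comparison and object identity stands in for node identity. The data is illustrative.

```python
import xml.etree.ElementTree as ET

dblp = ET.fromstring("""
<dblp>
  <article><year>1997</year></article>
  <article><year>1999</year></article>
  <article><year>1997</year></article>
</dblp>
""")

years = dblp.findall(".//year")        # three separate nodes, two with equal text

# By value: equal strings collapse into one entry.
distinct_values = sorted({y.text for y in years})

# By node: the same node listed twice collapses, equal-valued nodes do not.
listed_twice = years + years
distinct_nodes = list({id(y): y for y in listed_twice}.values())

print(distinct_values, len(distinct_nodes))
```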
Sorting in XQuery
• SQL actually allows you to sort its output, with a special ORDER BY clause
• In XQuery, what we order is the sequence of "result tuples" output by the return clause:
    for $x in doc("dblp.xml")/dblp/proceedings
    order by $x/title/text()
    return $x
What If Order Doesn’t Matter?
By default:
• SQL is unordered
• XQuery is ordered everywhere!
• But unordered queries are much faster to answer
XQuery has a way of telling the query engine to avoid preserving order:
• unordered { for $x in (mypath) … }
Querying & Defining Metadata – Can’t Do This in SQL
Can get a node's name by querying name():
    for $x in doc("dblp.xml")/dblp/*
    return name($x)
Can construct elements and attributes using computed names:
    for $x in doc("dblp.xml")/dblp/*,
        $year in $x/year,
        $title in $x/title/text()
    return
      element {name($x)} {
        attribute {fn:concat("year-", $year)} {$title}
      }
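
A Python sketch of the same idea: node names are ordinary data that can be read and then reused to build new elements and attributes. ElementTree exposes the name as .tag; the sample document and the list names are illustrative.

```python
import xml.etree.ElementTree as ET

dblp = ET.fromstring("""
<dblp>
  <mastersthesis><title>PRPL</title><year>1992</year></mastersthesis>
  <article><title>The 1995 SQL Reunion</title><year>1997</year></article>
</dblp>
""")

# name($x) for every child of dblp
names = [child.tag for child in dblp]

# Computed constructors: the element name comes from name($x),
# the attribute name is built by concatenating "year-" with the year.
rebuilt = []
for child in dblp:
    for year in child.findall("year"):
        for title in child.findall("title"):
            rebuilt.append(ET.Element(child.tag, {"year-" + year.text: title.text}))

for element in rebuilt:
    print(element.tag, element.attrib)
```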
XQuery Summary
Very flexible and powerful language for XML
• Clean and orthogonal: can always replace a collection with an expression that creates collections
• DB and document-oriented (with keyword search extensions)
• The core is relatively clean and easy to understand
Turing complete: there are several XQuery functions that enable this (not discussed).