XPath

Chapter 6

XPath

Peter Wood (BBK) XML Data Management 116 / 245

Introduction

- XPath is a language that lets you identify particular parts of XML documents
- XPath interprets XML documents as nodes (with content) organised in a tree structure
- XPath indicates nodes by (relative) position, type, content, and several other criteria
- Basic syntax is somewhat similar to that used for navigating file hierarchies
- XPath 1.0 (1999) and XPath 2.0 (2007; second edition 2010) are W3C recommendations

Some Tools for XPath

- Saxon (specifically Saxon-HE, which implements XPath 2.0, XQuery 1.0 and XSLT 2.0)
- eXist-db (a native XML database supporting XPath 2.0, XQuery 1.0 and XSLT 1.0)
- XPath Checker (add-on for Firefox)
- XPath Expression Testbed (available online)

Data Model

XPath’s data model has some non-obvious features:

- The tree’s root node is not the same as the document’s root (document) element
- The tree’s root node contains the entire document including the root element (and comments and processing instructions that appear before it)
- XPath’s data model does not include everything in the document:
  - XML declaration and DTD are not addressable
  - xmlns attributes are reported as namespace nodes
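The distinction between the root node and the root element can be seen by analogy in Python's `xml.dom.minidom`. This is only a sketch: the DOM is not the XPath data model, and the DOM Document node is used here as a stand-in for XPath's root node. The comment, the processing instruction and its target `render` are hypothetical and do not come from the CD library example.

```python
from xml.dom import Node, minidom

# Hypothetical input: a comment and a processing instruction precede the root element.
SOURCE = (
    '<?xml version="1.0"?>'
    "<!-- catalogue -->"
    '<?render mode="fast"?>'
    '<CD-library><CD number="724356690424"/></CD-library>'
)

document = minidom.parseString(SOURCE)

# The DOM Document node stands in for XPath's root node here.
top_level_types = [node.nodeType for node in document.childNodes]
root_element = document.documentElement

# The XML declaration does not appear among the children of the Document node.
print(top_level_types == [Node.COMMENT_NODE,
                          Node.PROCESSING_INSTRUCTION_NODE,
                          Node.ELEMENT_NODE])
print(root_element.tagName)                  # the root element sits one level below
print(root_element.parentNode is document)
```

The comment and the processing instruction are siblings of the root element, which is why a path that starts at the root node can reach them while a path that starts at the root element cannot.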

Data Model (2)

- There are 7 types of node:
  - root
  - element
  - attribute
  - text
  - comment
  - processing instruction
  - namespace
- Element nodes have an associated set of attribute nodes
- Attribute nodes are not children of element nodes
- The order of child element nodes is significant
- We will only consider the first 4 types of node
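A minimal sketch of the point about attributes, assuming Python's standard `xml.etree.ElementTree` as the parser (the slides do not prescribe a tool). The fragment is an abbreviated copy of the second CD from the example, with an empty `performance`.

```python
import xml.etree.ElementTree as ET

cd = ET.fromstring(
    '<CD number="419160-2">'
    "<composer>Johannes Brahms</composer>"
    "<soloist>Emil Gilels</soloist>"
    "<performance/>"
    "</CD>"
)

child_names = [child.tag for child in cd]   # child elements, in document order
attributes = dict(cd.attrib)                # attributes are held separately

print(child_names)
print(attributes)
```

The `number` attribute belongs to the element but never shows up when iterating over its children, and the children come back in the order in which they appear in the document.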

Example

Recall our CD library example

<CD-library>
  <CD number="724356690424">
    <performance>
      <composer>Frederic Chopin</composer>
      <composition>Waltzes</composition>
      <soloist>Dinu Lipatti</soloist>
      <date>1950</date>
    </performance>
  </CD>
  <CD number="419160-2">
    <composer>Johannes Brahms</composer>
    <soloist>Emil Gilels</soloist>
    <performance>
      <composition>Piano Concerto No. 2</composition>
      <orchestra>Berlin Philharmonic</orchestra>
      <conductor>Eugen Jochum</conductor>
      <date>1972</date>
    </performance>
    <performance>
      <composition>Fantasias Op. 116</composition>
      <date>1976</date>
    </performance>
  </CD>
  <CD number="449719-2">
    <soloist>Martha Argerich</soloist>
    <orchestra>London Symphony Orchestra</orchestra>
    <conductor>Claudio Abbado</conductor>
    <date>1968</date>
    <performance>
      <composer>Frederic Chopin</composer>
      <composition>Piano Concerto No. 1</composition>
    </performance>
    <performance>
      <composer>Franz Liszt</composer>
      <composition>Piano Concerto No. 1</composition>
    </performance>
  </CD>
  <CD number="430702-2">
    <composer>Antonin Dvorak</composer>
    <performance>
      <composition>Symphony No. 9</composition>
      <orchestra>Vienna Philharmonic</orchestra>
      <conductor>Kirill Kondrashin</conductor>
      <date>1980</date>
    </performance>
    <performance>
      <composition>American Suite</composition>
      <orchestra>Royal Philharmonic</orchestra>
      <conductor>Antal Dorati</conductor>
      <date>1984</date>
    </performance>
  </CD>
</CD-library>

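Example: Tree Structure

[Figure: tree of the CD library. The root element L has four C children; from left to right, their child elements are p | c s p p | s o t d p p | c p p.]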

Location Path

- The most useful XPath expression is a location path: e.g., /CD-library/CD/performance
- A location path consists of at least one location step: e.g., CD-library, CD and performance are location steps
- A location step takes as input a set of nodes, also called the context (to be defined more precisely later)
- The location step expression is applied to this node set and results in an output node set
- This output node set is used as input for the next location step
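The way each location step turns an input node set into an output node set can be sketched in a few lines of Python. This is an illustration under assumptions, not how an XPath engine is implemented: `child_step` is a hypothetical helper that handles only the child axis with a name test, `SKELETON` is an abbreviated copy of the CD library (structure only, no text or attributes), and the wrapper element `#root` stands in for XPath's root node, which `xml.etree.ElementTree` does not model.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the CD library: element structure only.
SKELETON = (
    "<CD-library>"
    "<CD><performance/></CD>"
    "<CD><composer/><soloist/><performance/><performance/></CD>"
    "<CD><soloist/><orchestra/><conductor/><date/><performance/><performance/></CD>"
    "<CD><composer/><performance/><performance/></CD>"
    "</CD-library>"
)


def child_step(context, name):
    """Apply the location step `name` (child axis) to every node in `context`."""
    output = []
    for node in context:
        output.extend(child for child in node if child.tag == name)
    return output


# Hypothetical stand-in for the root node, placed above the root element.
root_node = ET.Element("#root")
root_node.append(ET.fromstring(SKELETON))

context = [root_node]          # an absolute path starts at the root node
sizes = []
for step in "CD-library/CD/performance".split("/"):
    context = child_step(context, step)   # output of one step feeds the next
    sizes.append(len(context))

print(sizes)
```

The node set grows from one `CD-library` element to four `CD` elements to seven `performance` elements, one step at a time.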

Location Path (2)

- There are two different kinds of location paths:
  - Absolute location paths
  - Relative location paths
- An absolute location path
  - starts with /
  - is followed by a relative location path
  - is evaluated at the root (context) node of a document
  - e.g., /CD-library/CD/performance
- A relative location path
  - is a sequence of location steps
  - each separated by /
  - is evaluated with respect to some other context nodes
  - e.g., CD/performance
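A small sketch of how the result of a relative path depends on the context node, assuming Python's `xml.etree.ElementTree`, whose `findall` always evaluates a path relative to the element it is called on. `SKELETON` is an abbreviated copy of the CD library (structure only).

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the CD library: element structure only.
SKELETON = (
    "<CD-library>"
    "<CD><performance/></CD>"
    "<CD><composer/><soloist/><performance/><performance/></CD>"
    "<CD><soloist/><orchestra/><conductor/><date/><performance/><performance/></CD>"
    "<CD><composer/><performance/><performance/></CD>"
    "</CD-library>"
)

library = ET.fromstring(SKELETON)
first_cd, second_cd = library[0], library[1]

# The relative path CD/performance, evaluated with two different context nodes.
from_library = library.findall("CD/performance")
from_first_cd = first_cd.findall("CD/performance")    # a CD has no CD children

# The relative path performance, evaluated with each CD as context node.
per_cd = [len(cd.findall("performance")) for cd in library]

print(len(from_library), len(from_first_cd), per_cd)
```

The same path text selects seven nodes from the `CD-library` element and none from a `CD` element, which is why a relative path only has a meaning once a context is fixed.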

Evaluation of absolute location path

/CD-library/CD/performance

[Figure sequence: the CD library tree, with the nodes selected after each location step highlighted.]

Path evaluated so far           Nodes selected
/                               the root node
/CD-library                     L
/CD-library/CD                  C C C C
/CD-library/CD/performance      p p p p p p p

Location Step

- In general, a location step in turn consists of a
  - (navigation) axis
  - node test
  - predicate(s)

Syntax is axis :: node test [ predicate ] . . . [ predicate ]

- e.g., child::CD[composer='Johannes Brahms']
  - child is the axis
  - CD is the node test
  - composer='Johannes Brahms' is the predicate
- A location step is applied to each node in the context (i.e., each node becomes the context node)
- All resulting nodes are added to the output set of this location step
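The three parts of a location step can be made visible by writing the step out by hand. The sketch below assumes Python's `xml.etree.ElementTree`, which accepts only the abbreviated form `CD[composer='Johannes Brahms']` and not the explicit `child::` axis; `LIBRARY` is an abbreviated copy of the CD library that keeps just the elements this step looks at.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the CD library: only the elements relevant to this step.
LIBRARY = """
<CD-library>
  <CD number="724356690424"><performance><composer>Frederic Chopin</composer></performance></CD>
  <CD number="419160-2"><composer>Johannes Brahms</composer></CD>
  <CD number="449719-2"><soloist>Martha Argerich</soloist></CD>
  <CD number="430702-2"><composer>Antonin Dvorak</composer></CD>
</CD-library>
"""

library = ET.fromstring(LIBRARY.strip())

# Abbreviated form of child::CD[composer='Johannes Brahms'].
matches = library.findall("CD[composer='Johannes Brahms']")

# The same step spelled out by hand.
manual = [
    node
    for node in library                     # axis: children of the context node
    if node.tag == "CD"                     # node test: the name must be CD
    and any(                                # predicate: a composer child with this text
        child.tag == "composer" and child.text == "Johannes Brahms"
        for child in node
    )
]

print([cd.get("number") for cd in matches])
```

Both versions select the same single `CD`; the first CD is not selected because its `composer` is a grandchild, not a child.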

Evaluation of predicate

/child::CD-library/child::CD[composer='Johannes Brahms']

[Figure sequence: the CD library tree, with the selected nodes highlighted before and after the predicate is applied.]

Path evaluated so far                                        Nodes selected
/child::CD-library/child::CD                                 C C C C
/child::CD-library/child::CD[composer='Johannes Brahms']     C

Axes

- An axis specifies what nodes, relative to the current context node, to consider
- There are 13 different axes (some can be abbreviated):
  - self, abbreviated by .
  - child, abbreviated by empty axis
  - parent, abbreviated by ..
  - descendant-or-self, abbreviated by empty location step (// is short for /descendant-or-self::node()/)
  - descendant, ancestor, ancestor-or-self
  - following, following-sibling, preceding, preceding-sibling
  - attribute, abbreviated by @
  - namespace
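The abbreviations can be tried out with Python's `xml.etree.ElementTree`, which supports a subset of the abbreviated syntax only (no explicit axis names, and `@` only inside predicates). The sketch assumes that library; `SKELETON` is an abbreviated copy of the CD library in which only the first CD keeps its `number` attribute.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the CD library: structure only, one attribute kept.
SKELETON = (
    "<CD-library>"
    '<CD number="724356690424"><performance/></CD>'
    "<CD><composer/><soloist/><performance/><performance/></CD>"
    "<CD><soloist/><orchestra/><conductor/><date/><performance/><performance/></CD>"
    "<CD><composer/><performance/><performance/></CD>"
    "</CD-library>"
)

library = ET.fromstring(SKELETON)

self_nodes = library.findall(".")                  # . is self::node()
cds = library.findall("CD")                        # empty axis means child::CD
performances = library.findall(".//performance")   # // expands to descendant-or-self::node()
parents = library.findall(".//performance/..")     # .. is parent::node()
numbered = library.findall("CD[@number]")          # @number is attribute::number

print(len(self_nodes), len(cds), len(performances), len(parents), len(numbered))
```

Seven `performance` elements have only four distinct parents, so the parent step returns each `CD` once.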

The following slides show (graphical) examples of the axes, assuming the node in bold typeface is the context node

Self-Axis

The self-axis just contains the context node itself

Child-Axis

The child-axis contains the children (direct descendants) of the context node

Parent-Axis

The parent-axis contains the parent (direct ancestor) of the context node

Descendant-Axis

The descendant-axis contains all direct and indirect descendants of the context node

Descendant-Or-Self-Axis

The descendant-or-self-axis contains all direct and indirect descendants of the context node + the context node itself

Ancestor-Axis

The ancestor-axis contains all direct and indirect ancestors of the context node

Ancestor-Or-Self-Axis

The ancestor-or-self-axis contains all direct and indirect ancestors of the context node + the context node itself

Following-Axis

The following-axis contains all nodes that begin after the context node ends

Preceding-Axis

The preceding-axis contains all nodes that end before the context node begins

Following-Sibling-Axis

The following-sibling-axis contains all following nodes that have the same parent as the context node

Preceding-Sibling-Axis

The preceding-sibling-axis contains all preceding nodes that have the same parent as the context node

Partitioning

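The axes self, ancestor, descendant, following and preceding partition a document into five disjoint sets of nodes (attribute and namespace nodes aside): the sets do not overlap, and together they contain every node in the document.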
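The partition property can be checked mechanically. The sketch below is an illustration under assumptions: it computes the five axes by hand over `xml.etree.ElementTree` elements, so it only covers element nodes (ElementTree does not represent text nodes as separate nodes), and `partition`, `ancestors` and `SKELETON` are hypothetical names introduced for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the CD library: element structure only.
SKELETON = (
    "<CD-library>"
    "<CD><performance/></CD>"
    "<CD><composer/><soloist/><performance/><performance/></CD>"
    "<CD><soloist/><orchestra/><conductor/><date/><performance/><performance/></CD>"
    "<CD><composer/><performance/><performance/></CD>"
    "</CD-library>"
)

library = ET.fromstring(SKELETON)
order = list(library.iter())    # all element nodes, in document order
parent_of = {child: parent for parent in order for child in parent}


def ancestors(node):
    """Walk up the parent map, collecting every ancestor of `node`."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found


def partition(context):
    """Return the self, ancestor, descendant, preceding and following sets."""
    anc = ancestors(context)
    desc = list(context.iter())[1:]          # iter() yields the node itself first
    position = order.index(context)
    preceding = [n for n in order[:position] if n not in anc]
    following = [n for n in order[position + 1:] if n not in desc]
    return [context], anc, desc, preceding, following


parts = partition(library[1])                # context node: the second CD
sizes = [len(part) for part in parts]
covered = [node for part in parts for node in part]

print(sizes, len(covered) == len(order))
```

With the second CD as context node, the five sets have sizes 1, 1, 4, 2 and 11, which add up to the 19 elements of the document with no element counted twice.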

Attribute-Axis

- Attributes are handled in a special way in XPath
- The attribute-axis contains all the attribute nodes of the context node
- This axis is empty if the context node is not an element node
- Does not contain xmlns attributes used to declare namespaces

Namespace-Axis

- The namespace-axis contains all namespaces in scope of the context node
- This axis is empty if the context node is not an element node

Node Tests

- Once the correct relative position of a node has been identified the type of a node can be tested
- A node test further refines the nodes selected by the location step
- A double colon :: separates the axis from the node test
- There are seven different kinds of node tests:
  - name
  - prefix:*
  - node()
  - text()
  - comment()
  - processing-instruction()
  - *

Name

- The name node test selects all elements with a matching name
  - e.g., if our context is a set of 4 CD elements and the location step uses the child axis, then we get element nodes with different names
  - we can use the name node test to return, e.g., only soloist elements
- Along the attribute-axis it matches all attributes with the same name
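The example with the four CD elements can be reproduced as follows, assuming Python's `xml.etree.ElementTree`; `SKELETON` is an abbreviated copy of the CD library (structure only).

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the CD library: element structure only.
SKELETON = (
    "<CD-library>"
    "<CD><performance/></CD>"
    "<CD><composer/><soloist/><performance/><performance/></CD>"
    "<CD><soloist/><orchestra/><conductor/><date/><performance/><performance/></CD>"
    "<CD><composer/><performance/><performance/></CD>"
    "</CD-library>"
)

library = ET.fromstring(SKELETON)

all_children = library.findall("CD/*")      # child axis, no restriction on the name
soloists = library.findall("CD/soloist")    # child axis, name test soloist
names = sorted({child.tag for child in all_children})

print(len(all_children), names, len(soloists))
```

Without a name test the step returns all 14 children of the four CDs, spread over six different element names; with the name test only the two `soloist` elements remain.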

Prefix:*

- Along an element axis, all nodes whose namespace URI is the one bound to the prefix are matched, whatever their local name
- This node test is also available for attribute nodes
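That the match is made on the namespace URI and not on the prefix text can be sketched with `xml.etree.ElementTree`, which stores every qualified name as `{namespace-uri}local-name`. Everything in the input is hypothetical: the element names, the prefixes `m`, `x` and `b`, and the `example.org` URIs are invented for this illustration and are not part of the CD library example. The filter is written by hand to mimic a `prefix:*` node test on the child axis.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two different prefixes are bound to the same URI.
SOURCE = (
    '<lib xmlns:m="http://example.org/music"'
    ' xmlns:x="http://example.org/music"'
    ' xmlns:b="http://example.org/books">'
    "<m:CD/><x:CD/><b:book/><plain/>"
    "</lib>"
)
MUSIC = "http://example.org/music"

root = ET.fromstring(SOURCE)

# Children whose namespace URI is MUSIC, regardless of prefix or local name.
music_children = [
    child.tag for child in root if child.tag.startswith("{" + MUSIC + "}")
]

print(music_children)
```

Both `m:CD` and `x:CD` are selected because their prefixes resolve to the same URI, while `b:book` and `plain` are not.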

Comment, Text, Processing-Instruction

- comment() matches all comment nodes
- text() matches all text nodes
- processing-instruction() matches all processing instructions

Node and *

- node() selects all nodes, regardless of type: attribute, namespace, element, text, comment, processing instruction, and root
- * selects all element nodes, regardless of name
  - If the axis is the attribute axis, then it selects all attribute nodes
  - If the axis is the namespace axis, then it selects all namespace nodes
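The difference between `node()` and `*` on the child axis can be sketched with Python's `xml.dom.minidom`, which, unlike ElementTree, represents text and comment nodes explicitly. This is an analogy using the DOM, not an XPath evaluation; the fragment is based on the first performance of the example, and the comment inside it is hypothetical.

```python
from xml.dom import Node, minidom

performance = minidom.parseString(
    "<performance>"
    "<composer>Frederic Chopin</composer>"
    "<!-- hypothetical comment -->"
    "<date>1950</date>"
    "</performance>"
).documentElement

# Roughly child::node(): every child, whatever its type.
all_children = list(performance.childNodes)

# Roughly child::*: element children only.
element_children = [n for n in all_children if n.nodeType == Node.ELEMENT_NODE]

# Roughly composer/node(): the children of the composer element.
composer_children = list(performance.firstChild.childNodes)

print(len(all_children), len(element_children), composer_children[0].data)
```

The comment is reached by the `node()`-style selection but not by the `*`-style one, and the only child of `composer` is a text node, which `*` would never select.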

Key for full CD library example

Element name    Abbreviation    Colour
root                            black
library         L               white
cd              C               grey
performance     p               pink
composer        c               blue
composition                     green
soloist         s               yellow
conductor       t               red
orchestra       o               brown
date            d               orange

Full CD library example

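[Figure: the full CD library tree, drawn with the abbreviations and colours from the key. Child elements of the four C nodes, from left to right: p | c s p p | s o t d p p | c p p.]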

Example using * and node()

/CD-library/CD/*/node()

[Figure: the CD library tree with the nodes selected by this path highlighted.]

Example showing difference between * and node()

/CD-library/CD/*/*

[Figure: the CD library tree with the nodes selected by this path highlighted.]

Example using descendant

//composer or /descendant-or-self::node()/composer

[Figure: the CD library tree with the selected composer nodes highlighted.]

Another example using descendant

//performance/composer or /descendant-or-self::node()/child::performance/child::composer

[Figure: the CD library tree with the selected composer nodes highlighted.]
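The two descendant examples can be compared with Python's `xml.etree.ElementTree`. This is a sketch under assumptions: ElementTree evaluates paths relative to an element, so the paths are written with a leading `.` and evaluated at the root element, which gives the same result here because the root element is not itself a `composer`. `SKELETON` is an abbreviated copy of the CD library that keeps the `composer`, `soloist` and `performance` elements only.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the CD library: composers kept at their original depth.
SKELETON = (
    "<CD-library>"
    "<CD><performance><composer/></performance></CD>"
    "<CD><composer/><soloist/><performance/><performance/></CD>"
    "<CD><soloist/>"
    "<performance><composer/></performance>"
    "<performance><composer/></performance>"
    "</CD>"
    "<CD><composer/><performance/><performance/></CD>"
    "</CD-library>"
)

library = ET.fromstring(SKELETON)

all_composers = library.findall(".//composer")                   # any depth
performance_composers = library.findall(".//performance/composer")
cd_composers = library.findall("CD/composer")                    # children of a CD only

print(len(all_composers), len(performance_composers), len(cd_composers))
```

The first path finds every `composer` wherever it sits, while the second keeps only those whose parent is a `performance`; the remaining two are direct children of a `CD`.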

Predicates

- A node set can be reduced further with predicates
- While each location step must have an axis (which may be left empty, meaning child) and a node test, a predicate is optional
- A predicate contains a Boolean expression which is tested for each node in the resulting node set
- A predicate is enclosed in square brackets [ ]

Predicates (2)

- XPath supports a full complement of relational operators, including =, >, <, >=, <=, !=
- XPath also provides Boolean and and or operators to combine expressions logically
- In some cases a predicate may not be a Boolean; then XPath will convert it to one implicitly (if that is possible):
  - an empty node set is interpreted as false
  - a non-empty node set is interpreted as true
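The implicit conversion from node set to Boolean, and a relational comparison, can be sketched over the full CD library with Python's `xml.etree.ElementTree`. Some assumptions apply: ElementTree accepts the predicate `[composer]` directly, but as far as I know its predicate syntax does not accept a path such as `performance/orchestra` or an operator such as `>=`, so those two predicates are written out in Python. The query `//performance[date >= 1975]` is a hypothetical example and does not appear in the slides.

```python
import xml.etree.ElementTree as ET

CD_LIBRARY = """
<CD-library>
  <CD number="724356690424">
    <performance><composer>Frederic Chopin</composer><composition>Waltzes</composition>
      <soloist>Dinu Lipatti</soloist><date>1950</date></performance>
  </CD>
  <CD number="419160-2">
    <composer>Johannes Brahms</composer><soloist>Emil Gilels</soloist>
    <performance><composition>Piano Concerto No. 2</composition>
      <orchestra>Berlin Philharmonic</orchestra>
      <conductor>Eugen Jochum</conductor><date>1972</date></performance>
    <performance><composition>Fantasias Op. 116</composition><date>1976</date></performance>
  </CD>
  <CD number="449719-2">
    <soloist>Martha Argerich</soloist><orchestra>London Symphony Orchestra</orchestra>
    <conductor>Claudio Abbado</conductor><date>1968</date>
    <performance><composer>Frederic Chopin</composer>
      <composition>Piano Concerto No. 1</composition></performance>
    <performance><composer>Franz Liszt</composer>
      <composition>Piano Concerto No. 1</composition></performance>
  </CD>
  <CD number="430702-2">
    <composer>Antonin Dvorak</composer>
    <performance><composition>Symphony No. 9</composition>
      <orchestra>Vienna Philharmonic</orchestra>
      <conductor>Kirill Kondrashin</conductor><date>1980</date></performance>
    <performance><composition>American Suite</composition>
      <orchestra>Royal Philharmonic</orchestra>
      <conductor>Antal Dorati</conductor><date>1984</date></performance>
  </CD>
</CD-library>
"""

library = ET.fromstring(CD_LIBRARY.strip())

# //performance[composer]: true when the node set "composer" is non-empty.
with_composer = library.findall(".//performance[composer]")

# //CD[performance/orchestra], written out: an empty list counts as false.
with_orchestra = [
    cd.get("number")
    for cd in library.findall("CD")
    if cd.findall("performance/orchestra")
]

# //performance[date >= 1975] (hypothetical query), written out in Python.
recent = [
    p.findtext("composition")
    for p in library.iter("performance")
    if p.findtext("date") is not None and int(p.findtext("date")) >= 1975
]

print(len(with_composer), with_orchestra, recent)
```

The third CD is not selected by the orchestra predicate because its `orchestra` is a child of the `CD` itself, not of a `performance`.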

Example using a predicate

//performance[composer]

[Figure: the CD library tree with the selected nodes highlighted: p p p.]

Another example using a predicate

//CD[performance/orchestra]

[Figure: the CD library tree with the selected nodes highlighted: C C.]
