Introduction to XSLT
Evan Lenz
XML/XSLT Consultant
http://[email protected]
August 2, 2005
O’Reilly Open Source Convention
August 1 - 5, 2005
Who is this guy?
• Evan Lenz
– Majored in music
– Over 5 years ago, read Michael Kay's XSLT Programmer's Reference cover-to-cover while sitting by his newborn son's hospital bed
– Participated on the XSL Working Group for a couple of years
– Wrote XSLT 1.0 Pocket Reference (due out this month)
– Preparing for entrance to a Ph.D. program in Digital Arts and Experimental Media
Why does he like XSLT?
• XSLT is:
– Powerful
– Small
– Beautiful
– In high demand
– Fun to learn
– Fun to teach
What should I expect this afternoon?
• Fasten your seatbelts
• A variety of interactive exercises and traditional presentation
• Feel free to feel overwhelmed
– You're learning more than you think!
• Try your best while you're here and it will be time well spent
• Have fun!
What's with the handouts?
• The big handout is a late-stage draft of XSLT 1.0 Pocket Reference, due out this month
– If you would like a complimentary copy of the final book, put your name and mailing address on the sign-up sheet
• The smaller handout contains exercises that we will be using today
XSLT from 30,000 feet
High-level overview
What is XSLT?
• “XSL Transformations”
– “A language for transforming XML documents into other XML documents”
– W3C Recommendation
• http://www.w3.org/TR/xslt
• Version 1.0: 1999-11-16
OK, then what is XSL?
• “Extensible Stylesheet Language”
– “A language for expressing stylesheets”
– W3C Recommendation
• http://www.w3.org/TR/xsl
• Version 1.0: 2001-10-15
– Has 2 parts:
• XSLT
– Refactored out of XSL so that it could proceed independently
• XSL-FO
– “Formatting Objects”
What is XPath?
• “XML Path Language”
– “A language for addressing parts of an XML document”
– W3C Recommendation
• http://www.w3.org/TR/xpath
• Version 1.0: 1999-11-16
• Released on the same day as XSLT 1.0
– The expression language used in XSLT
A relationship of subsets
• XPath is part of XSLT
• XSLT is part of XSL
• Today we are concerned only with the inner two circles:
– XSLT and XPath
– XSL, a.k.a. XSL-FO, is out of scope for today
What is XSLT used for?
• Common applications
– Stylesheets for converting XML to HTML
• Generating Web pages or whole websites
• DocBook -> HTML
– Transformations from one document type to another
• *ML to *ML – as many potential applications as there are XML document types
• RSS, SVG, UBL, LegalXML, HrXML, XBRL
• Office applications
– SpreadsheetML, WordML, Keynote XML, OOo XML, PowerPoint (in next version), Access XML, etc.
– Extracting data from documents
– Modifying or fixing up documents
Where is XSLT used?
• Every platform
– Windows, Linux, Mac, UNIX, Java
• Many browsers support XSLT natively
– Firefox/Mozilla, Internet Explorer, Safari
• Many frameworks use or support XSLT
– .NET, Java, LAMP
• PHP5 now uses libxslt
– Cocoon, 4Suite, Amazon web services, Google appliance, Cisco routers, etc., etc.
• XSLT IS EVERYWHERE!!
Interoperable implementations?
• In terms of interoperability, XSLT is unmatched among languages having multiple implementations
– Java
• Saxon – http://saxon.sf.net (open-source)
• Xalan-J – http://xml.apache.org/xalan-j/ (open-source)
– Windows
• MSXML – fast, fully conformant
– Python
• 4xslt – http://www.4suite.org (open-source)
– C
• libxslt – http://xmlsoft.org (open-source; used in Firefox, Safari, PHP5, etc.)
• Xalan-C++ – http://xml.apache.org/xalan-c/ (open-source)
Enough already, let's see some code!
Example XML file
INPUT: names.xml
<people>
  <person>
    <givenName>Joe</givenName>
    <familyName>Johnson</familyName>
  </person>
  <person>
    <givenName>Jane</givenName>
    <familyName>Johnson</familyName>
  </person>
  <person>
    <givenName>Jim</givenName>
    <familyName>Johannson</familyName>
  </person>
  <person>
    <givenName>Jody</givenName>
    <familyName>Johannson</familyName>
  </person>
</people>
A very simple stylesheet, names.xsl
OUTPUT: the result of the transformation
$ saxon names.xml names.xsl >names.html
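As a rough stand-in for what a transformation like this does, here is a minimal Python sketch using the standard library's ElementTree. The HTML layout (a bulleted list) and the function name `names_to_html` are assumptions made for illustration; this is not the names.xsl from the talk. The loop plays the role of an xsl:for-each over /people/person, and `findtext` plays the role of xsl:value-of.

```python
import xml.etree.ElementTree as ET
from html import escape

# A shortened copy of names.xml (two of the four people).
NAMES_XML = """<people>
  <person><givenName>Joe</givenName><familyName>Johnson</familyName></person>
  <person><givenName>Jane</givenName><familyName>Johnson</familyName></person>
</people>"""

def names_to_html(xml_text):
    """Hypothetical transformation: one <li> per person element."""
    people = ET.fromstring(xml_text)              # the people document element
    items = []
    for person in people.findall("person"):       # like select="/people/person"
        given = person.findtext("givenName")      # like select="givenName"
        family = person.findtext("familyName")
        items.append(f"<li>{escape(given)} {escape(family)}</li>")
    return "<html><body><ul>" + "".join(items) + "</ul></body></html>"

print(names_to_html(NAMES_XML))
```

An XSLT processor does the same kind of work declaratively: the stylesheet says what to build for each selected node, and the processor does the walking.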
Or we could open the XML directly in the browser
• Oops, we must first add a processing instruction (PI) to the top, like this:
<?xml-stylesheet type="text/xsl" href="names.xsl"?>
<people>
  <!-- ... -->
</people>
That's better. Displays as HTML but viewing source shows it's just XML.
One more example for now
• INPUT: article.xml
<?xml-stylesheet type="text/xsl" href="article.xsl"?>
<article>
  <heading>This is a short article</heading>
  <para>This is the <emphasis>first</emphasis> paragraph.</para>
  <para>This is the <strong>second</strong> paragraph.</para>
</article>
A rule-oriented stylesheet
• article.xsl:
A rule-oriented stylesheet, cont.
• article.xsl, cont.:
OUTPUT: article.xml transformed to HTML
See a pattern here?
XPath in a nutshell
How XPath fits in XSLT
• XPath expressions appear in attribute values, e.g.:
– <xsl:for-each select="/people/person"/>
– <xsl:value-of select="givenName"/>
– <xsl:apply-templates select="/article/para"/>
• What these mean:
– /people/person
• Select all person child elements of all people child elements of the root node
– givenName
• Select all givenName child elements of the context node
– /article/para
• Select all para child elements of all article child elements of the root node
The skinny on XPath
• XPath is an expression language
– The only thing you can do with XPath is write expressions
– When we say “expression”, we mean “XPath expression”
• Every expression returns a value
– XPath 1.0 has just four data types:
• Node-set (the most important)
• String
• Number
• Boolean
• All expressions are evaluated in a context
– Understanding context is crucial to understanding XPath
Path expressions
• Expressions that return node-sets are sometimes called path expressions
• A node-set is:
– An unordered collection of zero or more nodes
• Every expression is evaluated relative to exactly one context node
– The context node is analogous to the current directory in a filesystem
• On a CLI, dir/* expands to all the files in the dir directory inside the current directory
• As an XPath expression, dir/* would select all the element children of all the dir element children of the context node
A filesystem analogy
• Addressing files:
– Relative
• dir/*
• ../file
– Absolute
• /home/elenz/file.txt
• Addressing XML nodes:
– Relative
• body/p
• ../table
– Absolute
• /html/body/p
QUIZ 1: You have 5 minutes
• Ready?
• Set...
Go! Use this cheat sheet
1) para selects the para element children of the context node
2) * selects all element children of the context node
3) node() selects all children of the context node
4) @name selects the name attribute of the context node
5) @* selects all the attributes of the context node
6) para[1] selects the first para child of the context node
7) para[last()] selects the last para child of the context node
8) */para selects all para grandchildren of the context node
9) /doc/chapter[5]/section[2] selects the second section of the fifth chapter of the doc
10) chapter//para selects the para element descendants of the chapter element children of the context node
11) //para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node
12) . selects the context node
13) .//para selects the para element descendants of the context node
14) .. selects the parent of the context node
15) title selects the title element children of the context node
16) ../@lang selects the lang attribute of the parent of the context node
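A few of the cheat-sheet expressions can be tried from Python. This is only a sketch: the standard library's ElementTree implements a small subset of XPath 1.0 (for example, it cannot return attribute nodes or evaluate absolute paths on an element), and the sample document below is made up. The element that `findall` is called on plays the role of the context node.

```python
import xml.etree.ElementTree as ET

# Made-up sample document for trying out location paths.
doc = ET.fromstring(
    "<doc>"
    "<chapter><title>One</title><para>a</para><para>b</para></chapter>"
    "<chapter><title>Two</title><section><para>c</para></section></chapter>"
    "</doc>"
)
chapter = doc.find("chapter")   # the first chapter element: our context node

def texts(nodes):
    """Show each selected element by its text content."""
    return [node.text for node in nodes]

print(texts(chapter.findall("para")))            # 1) para
print([n.tag for n in chapter.findall("*")])     # 2) *
print(texts(chapter.findall("para[1]")))         # 6) para[1]
print(texts(chapter.findall("para[last()]")))    # 7) para[last()]
print(texts(doc.findall("*/para")))              # 8) */para, context node is doc
print(texts(doc.findall("chapter//para")))       # 10) chapter//para
print(texts(chapter.findall(".//para")))         # 13) .//para
```

Note how */para from doc finds only the grandchildren (a and b), while chapter//para also reaches the para nested inside the section.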
XPath is all about trees
A venture into the abstract world of the XPath data model
Start filling out the NOTES page
The XPath data model
• An abstraction of an XML document, after parsing
– In XSLT, models the source tree, stylesheet tree, & result tree
• An XML document is a tree of nodes
• There are 7 kinds of nodes (memorize these!)
– Root node
– Element node
– Attribute node
– Text node
– Comment node
– Processing Instruction (PI) node
– Namespace node
Root nodes
• Every XML document has exactly one root node
– An “invisible” container for the whole document
– The XPath expression / selects the root node of the same document as the context node
• The root node is not an element
– Instead, the “document element” or “root element” is a child of the root node
• It can also contain:
– Processing instruction (PI) nodes
– Comment nodes
• XSLT extension to XPath data model:
– Root node may contain text nodes
– Root node may contain more than one element node
Element nodes
• There is one element node for each element that appears in a document. (Duh.)
• Example: <foo><bar/></foo>
– There are two element nodes above: foo and bar.
– The foo element contains the bar element node.
• Element nodes can contain:
– Text nodes
– Other element nodes
– Comment nodes
– Processing instruction (PI) nodes
Node property: children
• Applies only to:
– Element nodes
– Root nodes
• Consists of:
– Ordered list of zero or more other nodes
• 4 kinds of nodes can be children (memorize this subset!)
– Element nodes
– Text nodes
– Comment nodes
– Processing instruction (PI) nodes
• Instead of “Lions, Tigers, and Bears, Oh My”, chant:
– “Elements, comments, text, PIs! Elements, comments, text, PIs!”
• Example: <foo><bar/> <!-- hi --> </foo>
– The foo element's children consist of four nodes in order:
• 1) element, 2) text, 3) comment, 4) text
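The four-children example can be checked with Python's standard-library DOM. This is a sketch only: the DOM is a different API from the XPath data model (its Document node stands in for the root node), but for this small document it reports the same children in the same order. The `KIND` lookup table is just a helper for readable output.

```python
from xml.dom import minidom, Node

# Map DOM node-type codes to the names used in the chant.
KIND = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "PI",
}

foo = minidom.parseString("<foo><bar/> <!-- hi --> </foo>").documentElement
children = [KIND[child.nodeType] for child in foo.childNodes]
print(children)
```

The two text nodes are the single spaces on either side of the comment, which is easy to overlook when reading the XML.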
Why should I memorize that subset of four?
• Knowing what types of nodes can be children is crucial to understanding what this little, unassuming instruction does (as we shall see):
– <xsl:apply-templates/>
• So remember:
– “Elements, comments, text, PIs!”
– “Elements, comments, text, PIs!”
How to access the children
• Use the child axis, e.g. (in non-abbreviated form):
– child::node()
• Selects all children of the context node
– child::*
• Selects all child elements of the context node
– child::paragraph
• Selects all child elements named paragraph
– child::xyz:foo
• Selects all child elements named foo in the namespace designated by the xyz prefix
– child::xyz:*
• Selects all child elements that are in the namespace designated by the xyz prefix
Attribute nodes
• There is one attribute node for each attribute that appears in a document. (Duh again.)
• Example: <foo bar="bat" bang="baz"/>
– There are two attribute nodes in the above example:
• bar and bang
Node property: attributes
• Applies only to:
– Element nodes
• Consists of:
– Unordered list of zero or more attribute nodes
• For example:
– <doc lang="en"/>
• The doc element's attributes property consists of one lang attribute
How to select attributes
• Use the attribute axis, e.g. (in abbreviated form):
– @lang
• Selects the attribute named lang
– @* or @node()
• Selects all attributes of the context node
– @abc:foo
• Selects the attribute named foo in the namespace designated by the abc prefix
– @abc:*
• Selects all attributes that are in the namespace designated by the abc prefix
Text nodes
• There is one text node for each contiguous sequence of character data in a document
• Text nodes are never adjacent siblings to each other
– Adjacent text nodes are always automatically merged into one text node (e.g., when creating the result tree in XSLT)
• Lexical details are thrown away
– The XPath data model knows nothing about:
• CDATA sections, entity references, or character references
• Example: <foo>&lt;</foo>
– There is one text node in the above document (a < character)
• Example: <foo><![CDATA[<]]></foo>
– Identical to the first example, as far as XPath is concerned
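A quick way to see lexical details being thrown away is to parse the variants and compare what comes out. This sketch uses Python's ElementTree, which (like the XPath data model) keeps only the resulting character data; the third variant, a character reference, is added here for illustration.

```python
import xml.etree.ElementTree as ET

# Three lexically different ways to write the same one-character text node.
variants = [
    "<foo>&lt;</foo>",            # entity reference
    "<foo><![CDATA[<]]></foo>",   # CDATA section
    "<foo>&#60;</foo>",           # character reference
]
values = [ET.fromstring(xml).text for xml in variants]
print(values)
```

After parsing, nothing records which of the three spellings was used.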
Text node quiz
• Example:
<foo>
  <bar>Hello world.</bar>
</foo>
• How many text nodes are in the above document?
Text node quiz: ANSWER
• ANSWER: 3
– 1: Linefeed, space, space
– 2: Hello world.
– 3: Linefeed
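The quiz answer can be reproduced with Python's standard-library DOM, under the assumption that the quiz document is laid out with a linefeed and two spaces of indentation, as the answer implies. The helper name `text_nodes` is made up for this sketch.

```python
from xml.dom import minidom, Node

# The quiz document, with the whitespace the answer describes.
QUIZ = "<foo>\n  <bar>Hello world.</bar>\n</foo>"

def text_nodes(node):
    """Collect the data of every text node below node, in document order."""
    found = []
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            found.append(child.data)
        found.extend(text_nodes(child))
    return found

nodes = text_nodes(minidom.parseString(QUIZ))
print(len(nodes), nodes)
```

Whitespace-only text nodes like the first and third are the ones that xsl:strip-space exists to remove.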
How to select text nodes
• Use the text() node test:
– text()
• Short for child::text()
– descendant::text()
• Selects all text nodes that are descendants of the context node
Comment nodes
• There is one comment node for each comment
• Example:
– <!--This is a comment node-->
How to select comments
• Use the comment() node test on the child axis:
– comment()
• Short for child::comment()
Processing instruction (PI) nodes
• There is one PI node for each PI
• The XML declaration is not a PI
– <?xml version="1.0"?> is not a PI
– (It's not a node at all but just a lexical detail that XPath knows nothing about.)
• Example:
– (This is a PI.)
• <?xml-stylesheet type="text/xsl" href="a.xsl"?>
How to select processing instructions
• Use the processing-instruction() node test
– Any PI:
• processing-instruction()
– Selects all PI children of the context node
– Short for child::processing-instruction()
– PI with a specific target:
• processing-instruction('xml-stylesheet')
– Selects all xml-stylesheet processing instruction children of the context node
Namespace nodes
• There is one namespace node for each in-scope namespace URI/prefix binding for each element in a document. (No duh... er... what?)
– Always includes this (implicit) binding (used by reserved attributes xml:lang and xml:space, etc.):
• Prefix: “xml”
• URI: “http://www.w3.org/XML/1998/namespace”
– Example: <foo/>
• There is one namespace node in the above document
– Example: <foo xmlns="http://example.com"/>
• There are two namespace nodes in the above document
– The implicit xml one (see above)
– And this one:
» Prefix: “”
» URI: “http://example.com”
Node property: namespace nodes
• As with the attributes property, applies only to:
– Element nodes
• Consists of:
– Unordered list of zero or more namespace nodes
• For example:
– <foo xmlns:xyz="http://example.com"/>
• The foo element's namespace nodes property consists of two namespace nodes (one for xyz and one for xml)
How to select namespace nodes
• Use the namespace axis:
– namespace::*
• Selects all of the context node's namespace nodes
– namespace::node()
• Same as above
– namespace::xyz
• Selects the context node's namespace node that declares the xyz prefix.
Node property: parent
• Applies to:
– All node types except root node:
• Element nodes
• Text nodes
• Comment nodes
• Processing instruction (PI) nodes
• Attribute nodes
• Namespace nodes
• Consists of:
– Exactly one other node
• Root node, or
• Element node
• Example: <foo bar="bat"/>
– The bar attribute's parent is the foo element
How to access the parent node
• Use the parent axis:
– ..
• Selects the parent node of the context node
• Short for parent::node()
– parent::doc
• Selects the parent node of the context node provided that it is an element named doc
– (otherwise returns an empty node-set)
– parent::*
• Selects the parent node of the context node provided that it is an element
A riddle
• “You have a parent but you are not a child.”
• “What are you?”
A riddle, cont.
– Hint:
• Only 4 node types are children, but 6 node types have parents
• 6 - 4 = 2 ...
The answer
• ANSWER:
– A namespace node or an attribute node, of course!
• Embracing the asymmetry and moving on...
Derived node relationships
• A node's descendants consist of the transitive closure of the children property
– A fancy way of saying:
• My children and my grandchildren and my great grandchildren and their kids and so on
• A node's ancestors consist of the transitive closure of the parent property
– A fancy way of saying:
• My parent and my grandparent and my great grandparent and its parent and so on
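“Transitive closure” is easier to see in code. The sketch below models both derived relationships on Python's standard-library DOM; the function names and the tiny sample document are made up, and the DOM's Document node stands in for the root node.

```python
from xml.dom import minidom

def descendants(node):
    """Children, then each child's descendants, in document order."""
    found = []
    for child in node.childNodes:
        found.append(child)
        found.extend(descendants(child))
    return found

def ancestors(node):
    """Parent, then the parent's parent, and so on up to the root node."""
    found = []
    while node.parentNode is not None:
        node = node.parentNode
        found.append(node)
    return found

doc = minidom.parseString("<a><b><c/></b><d/></a>")
c = doc.getElementsByTagName("c")[0]
print([n.nodeName for n in descendants(doc)])
print([n.nodeName for n in ancestors(c)])
```

Because attributes and namespace nodes are not children, following the children property never reaches them, so they are never descendants either.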
Shooting blanks is okay
• QUIZ: How many nodes will each of the following expressions return?
– parent::comment()
– attribute::text()
– ancestor::processing-instruction()
– namespace::xyz:*
• ANSWER: 0, by definition
– These expressions are perfectly legal; they're just guaranteed to return an empty node-set
Node property: string-value
• Applicable to:
– All node types
• Root: concatenation of all descendant text node string-values
• Element: concatenation of all descendant text node string-values
• Attribute: normalized attribute value
• Text: character data (always at least one character)
• Comment: the content of the comment
• PI: text following the PI target and whitespace
– e.g., type="text/xsl" href="style.xsl" is the string-value of an example stylesheet PI
• Namespace node: the namespace URI
• Use <xsl:value-of/> to insert the string-value of a node into the result tree
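For an element, the string-value is what you get by gluing together every text node beneath it. In Python's ElementTree, `itertext()` walks exactly those pieces in document order, so joining them gives a close analogue (a sketch, using one para from the earlier article.xml example):

```python
import xml.etree.ElementTree as ET

para = ET.fromstring(
    "<para>This is the <emphasis>first</emphasis> paragraph.</para>"
)
# Concatenate all descendant text, ignoring the markup around it.
string_value = "".join(para.itertext())
print(string_value)
```

This is roughly what <xsl:value-of select="."/> would insert if the para element were the current node.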
Node property: expanded-name
• Applicable to:
– Elements and attributes:
• Local part: local name of node, returned by local-name()
• URI part: namespace name (URI) of node, returned by namespace-uri()
– PIs:
• Local part: the PI target, e.g., xml-stylesheet
• URI part: (always null)
– Namespace nodes:
• Local part: the namespace prefix, e.g., xml or xyz or empty string (“”) in the case of a default namespace
• URI part: (always null)
• Root, text, and comment nodes do not have names
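The point of an expanded-name is that the prefix drops out: only the URI part and the local part identify the node. Python's ElementTree happens to expose element names in this form, written as {URI}local, which makes the idea easy to check. A sketch with made-up prefixes:

```python
import xml.etree.ElementTree as ET

# Same namespace URI, different prefixes: the expanded-names are equal.
a = ET.fromstring('<xyz:foo xmlns:xyz="http://example.com"/>')
b = ET.fromstring('<abc:foo xmlns:abc="http://example.com"/>')
# No namespace at all: the URI part is null.
c = ET.fromstring("<foo/>")

print(a.tag, b.tag, c.tag)
```

This is also why a name test such as xyz:foo depends on the prefix declarations in the expression's context rather than on the prefixes used in the source document.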
Document order
• There is an ordering for all nodes in a document called document order
• The root node is always the first node in a document
• The rest are ordered according to where their XML representation begins
– Except that the relative order of attributes and namespace nodes on the same element is implementation-dependent
• Why should I care about document order?
– Because it's the default order in which nodes are processed by both <xsl:for-each> and <xsl:apply-templates>
Quiz: counting nodes
• How many nodes are in the following XML document?
Answer
• 15!
The first 14 nodes in the QUIZ 1 example
Quiz review
• para
– Short for child::para
• *
– Short for child::*
• node()
– Short for child::node()
• @name
– Short for attribute::name
• @*
– Short for attribute::*
Quiz review
• para[1]
– Short for child::para[1]
– Equivalent to child::para[position() = 1]
• para[last()]
– Short for child::para[last()]
– Equivalent to child::para[position() = last()]
• */para
– Short for child::*/child::para
• /doc/chapter[5]/section[2]
– Short for /child::doc/child::chapter[5]/child::section[2]
Quiz review
• chapter//para
– Short for:
– child::chapter/descendant-or-self::node()/child::para
• //para
– Short for:
– /descendant-or-self::node()/child::para
• .
– Short for:
– self::node()
• .//para
– Short for:
– self::node()/descendant-or-self::node()/child::para
Quiz review
• ..
– Short for parent::node()
• title
– Short for child::title
• ../@lang
– Short for parent::node()/attribute::lang
Summary of abbreviations
• XPath has five abbreviations. They are:
– . is short for self::node()
– .. is short for parent::node()
– @ is short for attribute::
– // is short for /descendant-or-self::node()/
– foo is short for child::foo
XPath, the language
Descending from the clouds
Keep filling out that NOTES page
XPath basics: review
• XPath is an expression language
• Every expression returns a value
– XPath 1.0 has just four data types (write these down!):
• Node-set (the most important)
• String
• Number
• Boolean
• All expressions are evaluated in a context
– Understanding context is crucial to understanding XPath
XPath context
• All XPath expressions (whether in XSLT or not) are evaluated in a context
– The context consists of 6 parts:
• The context node
• The context size (an integer 1 or higher)
– Returned by the last() function
• The context position (an integer 1 or higher)
– Returned by the position() function
• A set of namespace/prefix declarations in scope for the expression
– Used to evaluate QNames in the expression, e.g., xyz:foo/xyz:bar
• A set of variable bindings
• A function library
XPath context, cont.
• The context comprises the entire world for an XPath expression, so to speak.
– Other than its context, there is no input to an XPath expression. The context consists of everything outside the expression itself that may affect the resulting value of the expression.
• The context indicates:
– Where you are
• Where in the tree:
– Context node
• Where in processing:
– Context size (the size of an arbitrary list of nodes being processed)
– Context position (the position of the current node in that list)
– What is available to you
• What variables you can reference
• What namespace prefixes you can use
• What functions you can call
XPath syntax overview
• XPath supports these kinds of expressions:
• Variable references:
– $foo, $bar, etc.
• Function calls:
– starts-with($str,'a')
– true()
– round($num)
• Parenthesized expressions
– (//para)
– (foo | bar)
• String literals
– "foo", 'bar', etc.
• Numbers
– 13, 24.7, .007, etc.
• cont...
XPath syntax overview, cont.
• Node-set expressions
– /html/body/p[2]/text()
– //@person | //person
– (.//note | para/fnote)[1]
– $ns[@id='xyz']
• Arithmetic expressions
– (($x - 5) * 2) div -3
– $pos mod 2
• Boolean expressions
– $is-good and $is-valid
– $x >= 4
– position() != last()
Node-set expressions
• A node-set is:
• An unordered collection of zero or more nodes
• Node-set expressions include:
• Location paths (the most important kind of expression!)
– foo/bar[3]
• Union expressions (union of two node-set expressions using the | operator)
– $set | foo/bar[3]
• Filtered expressions (a predicate applied to any expression using the predicate operator)
– $set[.='good']
• Path expressions (any expression composed with a location path using the / or // operators)
– $set//bar
Location paths: a formal definition
• A location path:
– Is the most important kind of XPath expression
– Returns a node-set
– Can be absolute or relative
– Relative:
• One or more steps separated by /
– foo
– foo/bar
– Absolute:
• /
– Selects the root node of the document that contains the context node
– The only location path that doesn't have any steps in it
• / followed by a relative location path
– /foo
– /foo/bar
Location path steps
• A location path step has 3 parts:
– An axis specifier
– A node test
– Zero or more predicates
• For example:
– child::paragraph[string-length(.) > 100]
• The above is equivalent to this abbreviated form:
– paragraph[string-length(.) > 100]
– (because child is the default axis)
• It selects each paragraph child whose string-value is greater than 100 characters in length
How a step is evaluated
• Moving from left to right:
1. The axis identifies a set of nodes relative to the context node.
2. The node test acts as a filter on that set.
3. Each of any number of optional predicates in turn acts as a filter on the set identified by the preceding predicates and node test to its left.
• For example:
– child::paragraph[string-length(.)>100]
1. The child axis identifies all the children of the context node.
2. Among those, the paragraph node test selects only the elements named “paragraph”.
3. Among those, the string-length(.)>100 predicate filters out all but the nodes whose string-value is greater than 100 characters long.
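The three stages above can be modeled as three successive filters. This is a hand-written Python sketch of that one example step, not a general XPath evaluator; the helper names and the sample document are made up, and Python's standard-library DOM stands in for the XPath data model.

```python
from xml.dom import minidom, Node

def string_value(node):
    """String-value of an element or text node (other kinds are skipped here)."""
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return node.data
    if node.nodeType == Node.ELEMENT_NODE:
        return "".join(string_value(child) for child in node.childNodes)
    return ""  # comments and PIs add nothing to an element's string-value

def long_paragraphs(context):
    """Model of child::paragraph[string-length(.) > 100]."""
    # 1. The axis: all children of the context node.
    on_axis = list(context.childNodes)
    # 2. The node test: keep only elements named paragraph.
    named = [n for n in on_axis
             if n.nodeType == Node.ELEMENT_NODE and n.tagName == "paragraph"]
    # 3. The predicate: keep only those with a long enough string-value.
    return [n for n in named if len(string_value(n)) > 100]

LONG = "x" * 101
doc = minidom.parseString(
    f"<doc><paragraph>short</paragraph><note>{LONG}</note>"
    f"<paragraph>{LONG}</paragraph></doc>"
)
selected = long_paragraphs(doc.documentElement)
print(len(selected))
```

The note element is long enough but fails the node test, and the first paragraph passes the node test but fails the predicate, so only the last paragraph survives.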
The axis
• That's this part: the child:: in child::paragraph[string-length(.) > 100]
• Can be any one of 13 axes:
– child::
– self::
– parent::
– descendant::
– descendant-or-self::
– ancestor::
– ancestor-or-self::
– following::
– following-sibling::
– preceding::
– preceding-sibling::
– attribute::
– namespace::
The 13 XPath axes
• What each axis contains:
– child
• The children of the context node.
– descendant
• The descendants of the context node (children, children's children, etc.).
– parent
• The parent of the context node (empty if context node is root node).
– ancestor
• The ancestors of the context node (parent, parent's parent, etc.).
The 13 XPath axes, cont.
• What each axis contains, cont.:
– attribute
• The attributes of the context node (empty if context node is not an element).
– namespace
• The namespace nodes of the context node (empty if context node is not an element).
– self
• Just the context node itself.
– descendant-or-self
• The context node and descendants of the context node.
– ancestor-or-self
• The context node and ancestors of the context node.
The 13 XPath axes, cont.
• What each axis contains, cont.:
– following-sibling
• All nodes with the same parent as the context node that come after the context node in document order (empty if context node is an attribute or namespace node).
– preceding-sibling
• All nodes with the same parent as the context node that come before the context node in document order (empty if context node is an attribute or namespace node).
– following
• All nodes after the context node in document order, excluding descendants, attributes, and namespace nodes.
– preceding
• All nodes before the context node in document order, excluding ancestors, attributes, and namespace nodes.
A little observation about siblings
• The types of nodes that can be siblings are the same as the types of nodes that can be children
– “Elements, comments, text, PIs!”
• That's because attributes and namespace nodes are by definition not siblings to anyone, not even to each other. They're just “attached” to their parent.
The node test
• That's this part: the paragraph in child::paragraph[string-length(.) > 100]
• A node test is a filter on an axis
– There are two kinds of node test:
• Node type tests
– Any node: node()
– Specific node type: text(), comment(), processing-instruction()
– Specific PI target: processing-instruction('foo')
• Name tests
– Wildcard (any name): *
– Namespace-qualified wildcard (any local name within a particular namespace): xyz:*, abc:*, etc.
– QName (a specific expanded-name): foo, xyz:foo, the paragraph in the example above, etc.
– Node type tests are not functions
• They're special forms that only happen to look like functions
Node type tests
• The most inclusive node test: node()
– It includes every node regardless of its name or node type
• Thus, it effectively selects all nodes on the given axis
– child::node() selects all children
– ancestor::node() selects all ancestors
– etc.
• Specific node type
– text(), comment(), processing-instruction()
• Selects only the nodes of the given type from the given axis
– descendant::text() selects all descendant text nodes
– following::comment() selects all following comment nodes
– preceding::processing-instruction() selects all preceding PI nodes
• Specific PI target
– child::processing-instruction('xml-stylesheet')
Name tests
• Name tests only select nodes of one type at a time, depending on the axis
– This is called the principal node type for an axis
• Attributes while on the attribute axis
• Namespace nodes while on the namespace axis
• Element nodes while on every other axis
• The wildcard: *
– Selects all nodes of the principal node type
• child::* selects all element nodes on the child axis
• attribute::* selects all attribute nodes on the attribute axis
– effectively no different than attribute::node() because the attribute axis can only ever contain attribute nodes
Name tests, cont.
• Namespace-qualified wildcards: xyz:*
– Selects all nodes of the principal node type whose expanded-name has a particular URI part
• child::xyz:* selects all element nodes on the child axis that are in the namespace designated by the xyz prefix
• QNames: foo, xyz:foo, etc.
– Selects all nodes of the principal node type whose expanded-name has a particular local part and a particular URI part
• child::foo selects all element nodes on the child axis that have local name foo and that are not in a namespace
• ancestor::xyz:foo selects all element nodes on the ancestor axis that have local name foo and that are in the namespace designated by the xyz prefix
How multiple steps are evaluated
• The rightmost step indicates what nodes are returned:
– table/tr/td
• The above location path returns a node-set of zero or more td elements
– /doc/section[5]/para[text() and @*]
• What does the above location path return?
• Each step is evaluated once for each node returned by the step to its left, using that node as the context node for the evaluation
• The result is the union of the node-sets returned by all the evaluations of the rightmost step
Predicates
• A predicate filters a node-set to produce a new node-set
– price[. > 5]
• Of all the price child elements, return only those whose string-value, when converted to a number, is greater than 5
– The predicate expression is evaluated once for each node in the node-set to be filtered, using that node as the context node for the evaluation
• The result is converted to a boolean (if necessary)
• If true, the node is retained in the result
• If false, the node is excluded from the result
Numeric predicates
• When the predicate expression evaluates to a number
– It is interpreted in a special way, such that:
• foo[5] is short for foo[position()=5]
• foo[last()] is short for foo[position()=last()]
Context size in predicates
• The last() function returns the context size
– The number of nodes returned by the step (or arbitrary node-set expression) to its left
– foo[last()] evaluates to foo[5] if there are a total of 5 foo elements
Context position in predicates
• The position() function returns the context position
– The proximity position of the context node for the current predicate evaluation
• The relative position of the node among all the nodes being filtered in document order
– foo[5] returns the 5th foo element in document order
– Unless the step uses one of the four reverse axes:
• preceding
• preceding-sibling
• ancestor
• ancestor-or-self
– ancestor::node()[1] is equivalent to ..
– preceding-sibling::foo[1] returns the first foo element in reverse document order, i.e., the nearest preceding foo sibling
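Proximity position on a reverse axis is easy to get backwards, so here is a small model of preceding-sibling::foo. It is a Python sketch over the standard-library DOM; the function name, the id attributes, and the sample document are made up. The list it returns is in proximity order, so index 0 corresponds to position 1.

```python
from xml.dom import minidom, Node

def preceding_siblings(node, name):
    """Model of preceding-sibling::name, nearest sibling first."""
    found = []
    sibling = node.previousSibling
    while sibling is not None:
        if sibling.nodeType == Node.ELEMENT_NODE and sibling.tagName == name:
            found.append(sibling)
        sibling = sibling.previousSibling
    return found

doc = minidom.parseString(
    '<list><foo id="a"/><foo id="b"/><foo id="c"/><bar/></list>'
)
bar = doc.getElementsByTagName("bar")[0]        # the context node

nearest = preceding_siblings(bar, "foo")[0]     # preceding-sibling::foo[1]
farthest = preceding_siblings(bar, "foo")[-1]   # preceding-sibling::foo[last()]
print(nearest.getAttribute("id"), farthest.getAttribute("id"))
```

With bar as the context node, position 1 is the foo right next to it, not the first foo in the document.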
“Step filters” vs. “Expression filters”
• A predicate can be used to filter two different things:
– A location path step
– An arbitrary node-set expression
• $node-set[@foo='bar']
• Gotchas
– //para[1] vs. (//para)[1]
– ancestor::*[1] vs. (ancestor::*)[1]
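The first gotcha can be seen with Python's ElementTree, which supports positional predicates on steps. It has no parenthesized expressions, so the sketch below approximates (//para)[1] by taking the first item of the full result list; the sample document is made up.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc>"
    "<sec><para>a</para><para>b</para></sec>"
    "<sec><para>c</para><para>d</para></sec>"
    "</doc>"
)

# Step filter, like //para[1]: every para that is the first para of its parent.
step_filter = [p.text for p in doc.findall(".//para[1]")]

# Expression filter, like (//para)[1]: the first para in the whole document.
expr_filter = [p.text for p in doc.findall(".//para")[:1]]

print(step_filter, expr_filter)
```

The predicate in //para[1] is applied once per parent, so it can return many nodes; the parentheses make the predicate apply to the combined node-set instead.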
Comparisons with node-sets
• price > 20
– True if there are any price element children whose string-value when converted to a number is greater than 20
• foo[bar = bat]
– Select all foo elements that have any bar element child and any bat element child that have the same string-value
• Comparisons with empty node-sets always return false
– Gotcha:
• foo != 2
• foo != bar
– Use not() for the true complement
• not(foo = 2)
• not(foo = bar)
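The gotcha follows from comparisons with node-sets being existential: they ask whether any node satisfies the comparison. A tiny Python model makes the difference between foo != 2 and not(foo = 2) concrete. This is a sketch only; a node-set is modeled as a list of string-values, and the function names are made up.

```python
def equals(node_set, number):
    """Model of `foo = number`: true if ANY node's value equals number."""
    return any(float(value) == number for value in node_set)

def not_equals(node_set, number):
    """Model of `foo != number`: true if ANY node's value differs from number."""
    return any(float(value) != number for value in node_set)

for node_set in ([], ["2"], ["2", "3"]):
    print(node_set,
          "foo != 2:", not_equals(node_set, 2),
          "not(foo = 2):", not equals(node_set, 2))
```

With no foo elements, foo != 2 is false while not(foo = 2) is true; with values 2 and 3, foo != 2 is true even though one foo does equal 2.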
Functions overview
• String functions
– string(), concat(), starts-with(), contains(), substring-before(), substring-after(), substring(), string-length(), normalize-space(), translate()
• Node-set functions
– last(), position(), count(), id(), local-name(), namespace-uri(), name()
• Boolean functions
– boolean(), not(), true(), false(), lang()
• Number functions
– number(), sum(), floor(), ceiling(), round()
• XSLT adds:
– document(), key(), generate-id(), system-property(), format-number(), current(), element-available(), function-available(), unparsed-entity-uri()
XSLT element overview
XSLT elements, by use case
Creating nodes: xsl:element, xsl:attribute, xsl:text, xsl:comment, xsl:processing-instruction
Copying nodes: xsl:copy-of, xsl:copy
Repetition (looping): xsl:for-each
Sorting: xsl:sort
Conditional processing: xsl:choose, xsl:if
Computing or extracting a value: xsl:value-of
Defining variables and parameters: xsl:variable, xsl:param
Defining and calling subprocedures (named templates): xsl:template, xsl:call-template
Defining and applying template rules: xsl:template, xsl:apply-templates, xsl:apply-imports
Numbering and number formatting: xsl:number, xsl:decimal-format
Debugging: xsl:message
XSLT elements, cont.
Combining stylesheets (modularization): xsl:import, xsl:include
Compatibility: xsl:fallback
Building lookup indexes: xsl:key
XSLT code generation: xsl:namespace-alias
Output formatting: xsl:output
Whitespace stripping: xsl:strip-space, xsl:preserve-space
XSLT's processing model
The end: construct a result tree
The means: process lists
• If XPath is about trees, then XSLT is about lists
– Populate arbitrary nodes from the source tree into lists
– Iterate over those lists
– For each node in the list, create part of the result tree
• Source tree -> List processing -> Result tree
• Thus, there is always:
– a current node list, and
– a current node
Two mechanisms for iterating over lists
• xsl:apply-templates and xsl:for-each
• They both iterate over the nodes of a given node-set
– Supplied by the XPath expression in the select attribute
• For example:
– <xsl:apply-templates select="para"/>
• “Populate the current node list with para elements, sorted in document order. For each para element, invoke the best-matching template rule.”
All XSLT processing begins with...
• A virtual call to:
  – <xsl:apply-templates select="/"/>
• The current node list initially consists of just one node
  – The root node of the source tree
  – In other words, the XSLT processor invokes the template rule that matches the root node
• This call constructs the entire result tree
  – Nothing happens before it
  – Nothing happens after it
Your job as an XSLT stylesheet author...
• ...is to define—using template rules—what happens when the XSLT processor executes this instruction:– <xsl:apply-templates select="/"/>
Template rules
• An XSLT stylesheet contains a set of template rules
• Two kinds of template rule:
  – Those you define
  – Those that XSLT defines for you
    • These are called the built-in template rules.
• There is a built-in template rule for each of the 7 types of node
  – Ensures that a call to xsl:apply-templates never fails to find a matching template rule
    • Even if your stylesheet contains no explicit template rules at all
The empty stylesheet
• Consider this stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
</xsl:stylesheet>
• If you apply the above stylesheet to the example XML from QUIZ 1...– What will the result be?
The result
$ xsltproc empty.xsl quiz1.xml
<?xml version="1.0"?>
This is a simple XML document
You can do it! There's nothing to it! Go fast!
This will be interesting Here we go...
sub-chapter Who ever heard of nested chapters?!
another sub-chapter End of sub-chapter
No more nested chapters for now...
Template rules that you define
• When you define template rules, you override the default behavior
• An explicit template rule is:
  – An xsl:template element that has a match attribute
• For example:
<xsl:template match="foo">
  <!-- construct part of the result tree -->
  <xsl:apply-templates/>
  <!--...-->
</xsl:template>
Applying template rules
• <xsl:apply-templates/>– Short for:
• <xsl:apply-templates select="node()"/>
• Process all child nodes of the context node
Applying template rules: an OOP analogy
• <xsl:apply-templates/>• For each item in the list
– Invoke the same polymorphic function
• Each template rule is an implementation of that polymorphic function
Patterns
• The value of the match attribute is a pattern• Looks like an XPath expression
– Uses a subset of XPath syntax
• But has a more passive role– Does the current node match this pattern? Yes or no.
• When xsl:apply-templates is invoked, for each node in the list, the XSLT processor searches all the patterns of the stylesheet for the best-matching one
Example patterns
• Example patterns
  – /
  – /doc[@format='simple']
  – bar
  – foo/bar
  – section//para
  – @foo
  – @*
  – node()
  – text()
  – *
  – xyz:*
Does the pattern match?
• Informal:
  – If this pattern were an expression, would the node in question ever be selected by it?
• Formal:
  – A node matches a pattern if the node is a member of the result of evaluating the pattern as an expression with respect to some possible context node.
Template rules with multiple patterns
• Separate the alternative patterns with |
• <xsl:template match="foo | bar">...
  – Is short for:
<xsl:template match="foo">
  <!--...-->
</xsl:template>
<xsl:template match="bar">
  <!--...-->
</xsl:template>
What about conflicts?
• A foo element would match both of these template rules
<xsl:template match="foo">
  <!--...-->
</xsl:template>
<xsl:template match="*">
  <!--...-->
</xsl:template>
• Which one gets invoked by <xsl:apply-templates select="foo"/>?
Two steps to resolving conflicts
• When more than one template rule matches:
  1. Eliminate rules with lower import precedence.
  2. Eliminate rules with lower priority.
• Only one rule should be left; otherwise it is an error (an XSLT 1.0 processor may recover by choosing the remaining rule that occurs last in the stylesheet)
• Import precedence depends on which file the rule occurs in
  – Where it occurs in the import tree (via xsl:import)
• Priority depends on:
  – The priority attribute of the xsl:template element, or
  – The default priority (when the priority attribute is absent)
Default priority
• Priority is a positive or negative decimal number
• The higher the number, the higher the priority
• There are four default priorities:
  – -.5
  – -.25
  – 0
  – .5
• On a number line, from lowest to highest: -.5 < -.25 < 0 < .5
Default priority depends on...
• ...the syntax of the match pattern
• The most common pattern format has a priority of 0
• 0
  – Match a particular name
    • foo, xyz:foo, @foo, @xyz:foo, processing-instruction('foo')
• .5
  – The highest default priority
  – Any pattern with a predicate or multiple steps
    • foo/bar, foo[2], foo[@good='yes']
The lower default priorities are...
• -.25
  – One-step wildcards within a namespace
    • xyz:*, @xyz:*
• -.5
  – The lowest default priority
  – One-step wildcards regardless of name
    • *, @*, text(), comment(), processing-instruction(), node()
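To see the default priorities at work, consider a small sketch with four rules that can all match a foo element. The urgent attribute and the output element names are invented for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Default priority -0.5: one-step wildcard -->
  <xsl:template match="*">
    <other><xsl:apply-templates/></other>
  </xsl:template>

  <!-- Default priority 0: a particular name, so it beats match="*" for foo -->
  <xsl:template match="foo">
    <plain-foo/>
  </xsl:template>

  <!-- Default priority 0.5: a pattern with a predicate -->
  <xsl:template match="foo[@good='yes']">
    <good-foo/>
  </xsl:template>

  <!-- This pattern would also default to 0.5 and tie with the rule above
       for a foo carrying both attributes; the explicit priority attribute
       (hypothetical value 1) settles the conflict -->
  <xsl:template match="foo[@urgent='yes']" priority="1">
    <urgent-foo/>
  </xsl:template>

</xsl:stylesheet>
```

All four rules live in one file, so import precedence plays no part here; only the second step of conflict resolution applies.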
Modes
• Modes allow you to process the same node again but do something different this time
  – <xsl:apply-templates select="heading" mode="toc"/>
  – <xsl:template match="heading" mode="toc">...
• When the mode attribute is absent, that means the default (unnamed) mode
• You can segment your template rules into sets organized by concern
  – What they generate in the result tree
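A table of contents is the classic use of modes. The following is a minimal sketch; the doc/section/heading/para input structure and the HTML output are assumptions for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed (hypothetical) input:
       <doc><section><heading>...</heading><para>...</para></section>...</doc> -->
  <xsl:template match="/doc">
    <html>
      <body>
        <!-- First pass over the headings, in the "toc" mode -->
        <ul>
          <xsl:apply-templates select="section/heading" mode="toc"/>
        </ul>
        <!-- Second pass, in the default (unnamed) mode -->
        <xsl:apply-templates select="section"/>
      </body>
    </html>
  </xsl:template>

  <!-- The same heading node, rendered as a list item in the toc mode -->
  <xsl:template match="heading" mode="toc">
    <li><xsl:value-of select="."/></li>
  </xsl:template>

  <!-- ...and as a heading in the default mode -->
  <xsl:template match="heading">
    <h2><xsl:value-of select="."/></h2>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

There is no rule for section, so the built-in rule for elements handles it and passes processing on to its children in the same mode.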
The built-in template rules
• For elements and root nodes
  – Apply templates to children:
<xsl:template match="/ | *">
  <xsl:apply-templates/>
</xsl:template>
• For text nodes and attribute nodes
  – Output the string-value of the node:
<xsl:template match="text() | @*">
  <xsl:value-of select="."/>
</xsl:template>
The built-in template rules
• For processing instructions and comments:
  – Do nothing
<xsl:template match="comment() | processing-instruction()"/>
• For namespace nodes
  – Do nothing
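The built-in rules explain why the empty stylesheet outputs all of the document's text. A common way to switch that off is to override the built-in rule for text nodes, as in this sketch. The title element name is an assumption for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Override the built-in rule for text nodes: output nothing -->
  <xsl:template match="text()"/>

  <!-- Assumed (hypothetical) element name: only titles reach the output -->
  <xsl:template match="title">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The built-in rule for elements and the root node still walks the whole tree; the only change is that text nodes reached along the way no longer produce output.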
Template rule content
• Three kinds of elements:
  – XSLT instructions
    • Any element in the XSLT namespace, e.g., <xsl:value-of/>
  – Literal result elements
    • Any element in any other namespace, or no namespace
    • Creates a shallow copy of itself in the result tree
  – Extension elements
    • Any element in a namespace that's declared as an extension namespace (using the extension-element-prefixes attribute on the xsl:stylesheet element)
Attribute value templates
• Attributes on literal result elements can contain dynamic values, delimited by curly braces
  – <para class="{@format}">...
• To include a literal curly brace, double it:
  – <foo bar="{{not interpreted as XPath}}"/>
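Here is a minimal sketch of attribute value templates in a complete template rule. The link element, its url and label attributes, and the data-note attribute are made up for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed (hypothetical) input: <link url="http://example.com" label="Home"/> -->
  <xsl:template match="link">
    <!-- href and title are computed from XPath expressions inside { };
         the doubled braces in data-note come out as single literal braces -->
    <a href="{@url}"
       title="{concat('Go to ', @label)}"
       data-note="{{not XPath}}">
      <xsl:value-of select="@label"/>
    </a>
  </xsl:template>

</xsl:stylesheet>
```

Any XPath expression may appear between the braces, and fixed text can be mixed with expressions in the same attribute value.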
Miscellaneous topics...
The template rule engine
• A lot goes on behind the scenes
• xsl:apply-templates is the most important instruction in XSLT
• <xsl:apply-templates select="para"/>
  – This means: “Apply templates to the para element children of the context node.”
But what does “apply templates” mean?
• <xsl:apply-templates select="para"/>
  – Let the para node-set populate the current node list (in document order)
  – For each node in the list, invoke the best-matching template rule
• A template rule
Namespace node quiz
• Example:
  – <foo xmlns:e="http://example.com"><bar/></foo>
• Quiz: How many namespace nodes are in the above document?
Answer
• Example:
  – <foo xmlns="http://example.com"><bar/></foo>
• Quiz: How many namespace nodes are in the above document?
  – ANSWER: 4
• Two namespace nodes for each element: one for the declared namespace and one for the implicit xml namespace
  – As we'll see, namespace nodes are a property of the element for which they're in scope.
• Doesn't that make for a huge proliferation of namespace nodes?
  – Yes.
• Should I care?
  – Hardly ever.
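One way to check the answer yourself is to count nodes on the namespace axis. This is a small sketch intended to be run against the two-element example document above.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Total across every node in the document -->
    <xsl:text>namespace nodes: </xsl:text>
    <xsl:value-of select="count(//namespace::*)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Per-element breakdown -->
    <xsl:for-each select="//*">
      <xsl:value-of select="name()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="count(namespace::*)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

A conforming XSLT 1.0 processor should report a total of 4, with 2 for each element, because the implicit xml namespace is in scope everywhere.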
QNames in XPath/XSLT
• QNames are expanded
  – Into local and URI parts
  – Using a set of namespace/prefix declarations
    • Supplied in the XPath expression context
• This does not include a default namespace declaration (declared by xmlns)
  – Thus, if you want to select nodes in a particular namespace, then you must use a prefix
• In other words, a QName without a prefix always designates a node that is not in a namespace
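The consequence for everyday stylesheets can be sketched as follows. The ex prefix is an arbitrary choice made for this example; only the namespace URI has to match the one used in the source document.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ex="http://example.com">
  <xsl:output method="text"/>

  <!-- Assumed input: <foo xmlns="http://example.com"><bar/></foo> -->
  <xsl:template match="/">
    <!-- Unprefixed name: looks for a foo element in no namespace -->
    <xsl:text>unprefixed: </xsl:text>
    <xsl:value-of select="count(foo)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Prefixed name: looks for foo in the http://example.com namespace -->
    <xsl:text>prefixed: </xsl:text>
    <xsl:value-of select="count(ex:foo)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Against that input, the unprefixed count is 0 and the prefixed count is 1, even though the source document itself uses no prefix at all.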
More node properties
• String-value
• Expanded-name
  – Two parts:
    • Local part
    • Namespace URI part
• Also:
  – unique ID (used by id() function)
  – XSLT-specific:
    • base URI (used by document() and xsl:import/include)
    • unparsed entity URIs (used by unparsed-entity-uri() function)
• That's it!
Node relationship properties
• The value of each of these node properties is a set of other nodes:
  – Children
  – Attributes
  – Namespace nodes
  – Parent
• These properties are what connect one node to all the other nodes in the tree
  – By traversing these properties you can move from one node to anywhere else in the tree
• These properties are not applicable to all 7 node types
  – We'll see which ones apply to which node types