Introduction to XSLT

Evan Lenz
XML/XSLT Consultant
http://xmlportfolio.com/

August 2, 2005

O’Reilly Open Source Convention
August 1 - 5, 2005

Who is this guy?

• Evan Lenz
  – Majored in music
  – Over 5 years ago, read Michael Kay's XSLT Programmer's Reference cover-to-cover while sitting by his newborn son's hospital bed
  – Participated on the XSL Working Group for a couple years
  – Wrote XSLT 1.0 Pocket Reference (due out this month)
  – Preparing for entrance to a Ph.D. program in Digital Arts and Experimental Media

Why does he like XSLT?

• XSLT is:
  – Powerful
  – Small
  – Beautiful
  – In high demand
  – Fun to learn
  – Fun to teach

What should I expect this afternoon?

• Fasten your seatbelts
• A variety of interactive exercises and traditional presentation
• Feel free to feel overwhelmed
  – You're learning more than you think!
• Try your best while you're here and it will be time well spent
• Have fun!

What's with the handouts?

• The big handout is a late-stage draft of XSLT 1.0 Pocket Reference, due out this month
  – If you would like a complimentary copy of the final book, put your name and mailing address on the sign-up sheet
• The smaller handout contains exercises that we will be using today

XSLT from 30,000 feet

High-level overview

What is XSLT?

• “XSL Transformations”
  – “A language for transforming XML documents into other XML documents”
  – W3C Recommendation
    • http://www.w3.org/TR/xslt
    • Version 1.0: 1999-11-16

OK, then what is XSL?

• “Extensible Stylesheet Language”
  – “A language for expressing stylesheets”
  – W3C Recommendation
    • http://www.w3.org/TR/xsl
    • Version 1.0: 2001-10-15
  – Has 2 parts:
    • XSLT
      – Refactored out of XSL so that it could proceed independently
    • XSL-FO
      – “Formatting Objects”

What is XPath?

• “XML Path Language”
  – “A language for addressing parts of an XML document”
  – W3C Recommendation
    • http://www.w3.org/TR/xpath
    • Version 1.0: 1999-11-16
    • Released on the same day as XSLT 1.0
  – The expression language used in XSLT

A relationship of subsets

• XPath is part of XSLT
• XSLT is part of XSL
• Today we are concerned only with the inner two circles:
  – XSLT and XPath
  – XSL, a.k.a. XSL-FO, is out of scope for today

What is XSLT used for?

• Common applications
  – Stylesheets for converting XML to HTML
    • Generating Web pages or whole websites
    • DocBook -> HTML
  – Transformations from one document type to another
    • *ML to *ML – as many potential applications as there are XML document types
    • RSS, SVG, UBL, LegalXML, HrXML, XBRL
    • Office applications
      – SpreadsheetML, WordML, Keynote XML, OOo XML, PowerPoint (in next version), Access XML, etc.
  – Extracting data from documents
  – Modifying or fixing up documents

Where is XSLT used?

• Every platform
  – Windows, Linux, Mac, UNIX, Java
• Many browsers support XSLT natively
  – Firefox/Mozilla, Internet Explorer, Safari
• Many frameworks use or support XSLT
  – .NET, Java, LAMP
    • PHP5 now uses libxslt
  – Cocoon, 4Suite, Amazon web services, Google appliance, Cisco routers, etc.
• XSLT IS EVERYWHERE!!

Interoperable implementations?

• In terms of interoperability, XSLT is unmatched among languages having multiple implementations
  – Java
    • Saxon – http://saxon.sf.net (open-source)
    • Xalan-J – http://xml.apache.org/xalan-j/ (open-source)
  – Windows
    • MSXML – fast, fully conformant
  – Python
    • 4xslt – http://www.4suite.org (open-source)
  – C
    • libxslt – http://xmlsoft.org (open-source; used in Firefox, Safari, PHP5, etc.)
    • Xalan-C++ – http://xml.apache.org/xalan-c/ (open-source)

Enough already, let's see some code!

Example XML file

INPUT: names.xml

<people>
  <person>
    <givenName>Joe</givenName>
    <familyName>Johnson</familyName>
  </person>
  <person>
    <givenName>Jane</givenName>
    <familyName>Johnson</familyName>
  </person>
  <person>
    <givenName>Jim</givenName>
    <familyName>Johannson</familyName>
  </person>
  <person>
    <givenName>Jody</givenName>
    <familyName>Johannson</familyName>
  </person>
</people>

A very simple stylesheet, names.xsl

OUTPUT: the result of the transformation

$ saxon names.xml names.xsl >names.html

Or we could open the XML directly in the browser

• Oops, we must first add a processing instruction (PI) to the top, like this:

<?xml-stylesheet type="text/xsl" href="names.xsl"?>
<people> </people>

That's better. Displays as HTML but viewing source shows it's just XML.

One more example for now

• INPUT: article.xml

<?xml-stylesheet type="text/xsl" href="article.xsl"?>
<article>
  <heading>This is a short article</heading>
  <para>This is the <emphasis>first</emphasis> paragraph.</para>
  <para>This is the <strong>second</strong> paragraph.</para>
</article>

A rule-oriented stylesheet

• article.xsl:

A rule-oriented stylesheet, cont.

• article.xsl, cont.:

OUTPUT: article.xml transformed to HTML

See a pattern here?

XPath in a nutshell

How XPath fits in XSLT

• XPath expressions appear in attribute values, e.g.:
  – <xsl:for-each select="/people/person"/>
  – <xsl:value-of select="givenName"/>
  – <xsl:apply-templates select="/article/para"/>
• What these mean:
  – /people/person
    • Select all person child elements of all people child elements of the root node
  – givenName
    • Select all givenName child elements of the context node
  – /article/para
    • Select all para child elements of all article child elements of the root node
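To try these paths outside an XSLT processor, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It is an illustration, not part of the original slides, and rests on one assumption worth stating loudly: ElementTree implements only a small subset of XPath 1.0 and has no separate root node, so the absolute path /people/person has to be written relative to the people element.

```python
import xml.etree.ElementTree as ET

NAMES_XML = """<people>
  <person><givenName>Joe</givenName><familyName>Johnson</familyName></person>
  <person><givenName>Jane</givenName><familyName>Johnson</familyName></person>
  <person><givenName>Jim</givenName><familyName>Johannson</familyName></person>
  <person><givenName>Jody</givenName><familyName>Johannson</familyName></person>
</people>"""

# fromstring() returns the document element (<people>), not an XPath root
# node, so /people/person becomes the relative path "person" here.
people = ET.fromstring(NAMES_XML)
persons = people.findall("person")

# "givenName" is a relative path: it is evaluated once per context node,
# and here each person element takes its turn as the context node.
given_names = [person.findtext("givenName") for person in persons]
print(given_names)
```

The list comprehension plays the role of xsl:for-each over /people/person with xsl:value-of select="givenName" inside it.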

The skinny on XPath

• XPath is an expression language
  – The only thing you can do with XPath is write expressions
  – When we say “expression”, we mean “XPath expression”
• Every expression returns a value
  – XPath 1.0 has just four data types:
    • Node-set (the most important)
    • String
    • Number
    • Boolean
• All expressions are evaluated in a context
  – Understanding context is crucial to understanding XPath

Path expressions

• Expressions that return node-sets are sometimes called path expressions
• A node-set is:
  – An unordered collection of zero or more nodes
• Every expression is evaluated relative to exactly one context node
  – The context node is analogous to the current directory in a filesystem
    • On a CLI, dir/* expands to all the files in the dir directory inside the current directory
    • As an XPath expression, dir/* would select all the element children of all the dir element children of the context node

A filesystem analogy

• Addressing files:
  – Relative
    • dir/*
    • ../file
  – Absolute
    • /home/elenz/file.txt
• Addressing XML nodes:
  – Relative
    • body/p
    • ../table
  – Absolute
    • /html/body/p

QUIZ 1: You have 5 minutes

• Ready?
• Set...

Go! Use this cheat sheet

1) para selects the para element children of the context node

2) * selects all element children of the context node

3) node() selects all children of the context node

4) @name selects the name attribute of the context node

5) @* selects all the attributes of the context node

6) para[1] selects the first para child of the context node

7) para[last()] selects the last para child of the context node

8) */para selects all para grandchildren of the context node

9) /doc/chapter[5]/section[2] selects the second section of the fifth chapter of the doc

10) chapter//para selects the para element descendants of the chapter element children of the context node

11) //para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node

12) . selects the context node

13) .//para selects the para element descendants of the context node

14) .. selects the parent of the context node

15) title

16) ../@lang selects the lang attribute of the parent of the context node
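Several of the cheat-sheet paths can be tried with Python's xml.etree.ElementTree. This is a hedged sketch rather than a full XPath 1.0 engine: ElementTree supports only a subset of the syntax (no attribute nodes, no node(), no named axes), and the sample document and its id attributes are hypothetical, added only so the hits are easy to identify.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the id attributes only label the elements.
doc = ET.fromstring(
    '<doc>'
    '<title id="t"/>'
    '<para id="p1"/>'
    '<chapter id="c"><para id="p2"/><para id="p3"/></chapter>'
    '<para id="p4"/>'
    '</doc>'
)

def ids(path):
    """Evaluate path with <doc> as the context node; return the ids of the hits."""
    return [element.get("id") for element in doc.findall(path)]

print(ids("para"))          # para children of the context node
print(ids("*"))             # all element children
print(ids("para[1]"))       # first para child
print(ids("para[last()]"))  # last para child
print(ids("*/para"))        # para grandchildren
print(ids(".//para"))       # para descendants, in document order
```

Attributes are read with element.get() because ElementTree does not model them as nodes, so a path such as @name has no direct equivalent in this sketch.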

XPath is all about trees

A venture into the abstract world of the XPath data model

Start filling out the NOTES page

The XPath data model

• An abstraction of an XML document, after parsing
  – In XSLT, models the source tree, stylesheet tree, & result tree
• An XML document is a tree of nodes
• There are 7 kinds of nodes (memorize these!)
  – Root node
  – Element node
  – Attribute node
  – Text node
  – Comment node
  – Processing Instruction (PI) node
  – Namespace node

Root nodes

• Every XML document has exactly one root node
  – An “invisible” container for the whole document
  – The XPath expression / selects the root node of the same document as the context node
• The root node is not an element
  – Instead, the “document element” or “root element” is a child of the root node
• It can also contain:
  – Processing instruction (PI) nodes
  – Comment nodes
• XSLT extension to XPath data model:
  – Root node may contain text nodes
  – Root node may contain more than one element node

Element nodes

• There is one element node for each element that appears in a document. (Duh.)
• Example: <foo><bar/></foo>
  – There are two element nodes above: foo and bar.
  – The foo element contains the bar element node.
• Element nodes can contain:
  – Text nodes
  – Other element nodes
  – Comment nodes
  – Processing instruction (PI) nodes

Node property: children

• Applies only to:
  – Element nodes
  – Root nodes
• Consists of:
  – Ordered list of zero or more other nodes
• 4 kinds of nodes can be children (memorize this subset!)
  – Element nodes
  – Text nodes
  – Comment nodes
  – Processing instruction (PI) nodes
• Instead of “Lions, Tigers, and Bears, Oh My”, chant:
  – “Elements, comments, text, PIs! Elements, comments, text, PIs!”
• Example: <foo><bar/> <!--...--> </foo>
  – The foo element's children consist of four nodes in order:
    • 1) element, 2) text, 3) comment, 4) text
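The children property can be inspected with Python's standard xml.dom.minidom, whose Document and childNodes are close to (though not identical with) the XPath data model. The sample below is an assumption shaped like the slide's example: one element, a space, a comment, and another space. The comment text and the a attribute are placeholders.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Placeholder sample: element, text, comment, text inside <foo>.
doc = minidom.parseString('<foo a="1"><bar/> <!--note--> </foo>')
foo = doc.documentElement

KIND = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "PI",
}

# Only the four "child" node kinds ever show up in childNodes.
child_kinds = [KIND[node.nodeType] for node in foo.childNodes]
print(child_kinds)

# The attribute is attached to foo but is not one of its children.
print(foo.attributes.length)
```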

Why should I memorize that subset of four?

• Knowing what types of nodes can be children is crucial to understanding what this little, unassuming instruction does (as we shall see):
  – <xsl:apply-templates/>
• So remember:
  – “Elements, comments, text, PIs!”
  – “Elements, comments, text, PIs!”

How to access the children

• Use the child axis, e.g. (in non-abbreviated form):
  – child::node()
    • Selects all children of the context node
  – child::*
    • Selects all child elements of the context node
  – child::paragraph
    • Selects all child elements named paragraph
  – child::xyz:foo
    • Selects all child elements named foo in the namespace designated by the xyz prefix
  – child::xyz:*
    • Selects all child elements that are in the namespace designated by the xyz prefix

Attribute nodes

• There is one attribute node for each attribute that appears in a document. (Duh again.)
• Example: <foo bar="bat" bang="baz"/>
  – There are two attribute nodes in the above example:
    • bar and bang

Node property: attributes

• Applies only to:
  – Element nodes
• Consists of:
  – Unordered list of zero or more attribute nodes
• For example:
  – <doc lang="en"/>
    • The doc element's attributes property consists of one lang attribute

How to select attributes

• Use the attribute axis, e.g. (in abbreviated form):
  – @lang
    • Selects the attribute named lang
  – @* or @node()
    • Selects all attributes of the context node
  – @abc:foo
    • Selects the attribute named foo in the namespace designated by the abc prefix
  – @abc:*
    • Selects all attributes that are in the namespace designated by the abc prefix

Text nodes

• There is one text node for each contiguous sequence of character data in a document
• Text nodes are never adjacent siblings to each other
  – Adjacent text nodes are always automatically merged into one text node (e.g., when creating the result tree in XSLT)
• Lexical details are thrown away
  – The XPath data model knows nothing about:
    • CDATA sections, entity references, or character references
• Example: <foo>&lt;</foo>
  – There is one text node in the above document (a < character)
• Example: <foo><![CDATA[<]]></foo>
  – Identical to the first example, as far as XPath is concerned
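That lexical details disappear after parsing is easy to check. The sketch below uses Python's xml.etree.ElementTree as a stand-in for an XPath data model (an assumption; ElementTree is not an XPath implementation of record). It parses the same character written three ways and compares the resulting text.

```python
import xml.etree.ElementTree as ET

# One "<" character, spelled as an entity reference, a CDATA section,
# and a numeric character reference.
escaped = ET.fromstring("<foo>&lt;</foo>")
cdata = ET.fromstring("<foo><![CDATA[<]]></foo>")
char_ref = ET.fromstring("<foo>&#60;</foo>")

# After parsing, nothing records which spelling the author used.
print(escaped.text, cdata.text, char_ref.text)
```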

Text node quiz

• Example:

<foo>
  <bar>Hello world.</bar>
</foo>

• How many text nodes are in the above document?

Text node quiz: ANSWER

• Example:

<foo>
  <bar>Hello world.</bar>
</foo>

• How many text nodes are in the above document?
• ANSWER: 3
  – 1: Linefeed, space, space
  – 2: Hello world.
  – 3: Linefeed
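The count can be confirmed with Python's standard xml.dom.minidom, used here as an approximation of the XPath data model. The document string assumes the whitespace given in the answer: a linefeed and two spaces before bar, and a linefeed after it.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

doc = minidom.parseString("<foo>\n  <bar>Hello world.</bar>\n</foo>")

def text_nodes(node):
    """Yield every text node below node, in document order."""
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            yield child
        else:
            yield from text_nodes(child)

found = [text.data for text in text_nodes(doc)]
print(len(found), found)
```

The whitespace-only nodes are real text nodes; they are the ones that xsl:strip-space exists to remove.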

How to select text nodes

• Use the text() node test:
  – text()
    • Short for child::text()
  – descendant::text()
    • Selects all text nodes that are descendants of the context node

Comment nodes

• There is one comment node for each comment
• Example:
  – <!--...-->

How to select comments

• Use the comment() node test on the child axis:
  – comment()
    • Short for child::comment()

Processing instruction (PI) nodes

• There is one PI node for each PI
• The XML declaration is not a PI
  – <?xml version="1.0"?> is not a PI
  – (It's not a node at all but just a lexical detail that XPath knows nothing about.)
• Example:
  – (This is a PI.)
    • <?xml-stylesheet type="text/xsl" href="a.xsl"?>
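The difference between the XML declaration and a real PI shows up in a parsed tree. This sketch uses Python's xml.dom.minidom as an approximation of the XPath data model; the doc element is a hypothetical stand-in for a real document.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="a.xsl"?>'
    '<doc/>'
)

# The declaration produced no node: only the PI and the document element
# are children of the document (root) node.
top_level = [node.nodeType for node in doc.childNodes]

pi = doc.childNodes[0]
print(pi.target)  # the PI target
print(pi.data)    # the text after the target and whitespace
```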

How to select processing instructions

• Use the processing-instruction() node test
  – Any PI:
    • processing-instruction()
      – Selects all PI children of the context node
      – Short for child::processing-instruction()
  – PI with a specific target:
    • processing-instruction('xml-stylesheet')
      – Selects all xml-stylesheet processing instruction children of the context node

Namespace nodes

• There is one namespace node for each in-scope namespace URI/prefix binding for each element in a document. (No duh... er... what?)
  – Always includes this (implicit) binding (used by reserved attributes xml:lang and xml:space, etc.):
    • Prefix: “xml”
    • URI: “http://www.w3.org/XML/1998/namespace”
  – Example: <foo/>
    • There is one namespace node in the above document
  – Example: <foo xmlns="http://example.com"/>
    • There are two namespace nodes in the above document
      – The implicit xml one (see above)
      – And this one:
        » Prefix: “”
        » URI: “http://example.com”

Node property: namespace nodes

• As with the attributes property, applies only to:
  – Element nodes
• Consists of:
  – Unordered list of zero or more namespace nodes
• For example:
  – <foo xmlns:xyz="http://example.com"/>
    • The foo element's namespace nodes property consists of two namespace nodes (one for xyz and one for xml)

How to select namespace nodes

• Use the namespace axis:
  – namespace::*
    • Selects all of the context node's namespace nodes
  – namespace::node()
    • Same as above
  – namespace::xyz
    • Selects the context node's namespace node that declares the xyz prefix.

Node property: parent

• Applies to:
  – All node types except root node:
    • Element nodes
    • Text nodes
    • Comment nodes
    • Processing instruction (PI) nodes
    • Attribute nodes
    • Namespace nodes
• Consists of:
  – Exactly one other node
    • Root node, or
    • Element node
• Example: <foo bar="bat"/>
  – The bar attribute's parent is the foo element

How to access the parent node

• Use the parent axis:
  – ..
    • Selects the parent node of the context node
    • Short for parent::node()
  – parent::doc
    • Selects the parent node of the context node provided that it is an element named doc
      – (otherwise returns an empty node-set)
  – parent::*
    • Selects the parent node of the context node provided that it is an element

A riddle

• “You have a parent but you are not a child.”
• “What are you?”

A riddle, cont.

  – Hint:
    • Only 4 node types are children, but 6 node types have parents
    • 6 - 4 = 2 ...

The answer

  – Hint:
    • 4 node types are children, but 6 node types have parents
    • 6 - 4 = 2 ...
• ANSWER:
  – A namespace node or an attribute node of course!
• Embracing the asymmetry and moving on...

Derived node relationships

• A node's descendants consist of the transitive closure of the children property
  – A fancy way of saying:
    • My children and my grandchildren and my great grandchildren and their kids and so on
• A node's ancestors consist of the transitive closure of the parent property
  – A fancy way of saying:
    • My parent and my grandparent and my great grandparent and its parent and so on

Shooting blanks is okay

• QUIZ: How many nodes will each of the following expressions return?
  – parent::comment()
  – attribute::text()
  – ancestor::processing-instruction()
  – namespace::xyz:*
• ANSWER: 0, by definition
  – These expressions are perfectly legal; they're just guaranteed to return empty

Node property: string-value

• Applicable to:
  – All node types
    • Root: concatenation of all descendant text node string-values
    • Element: concatenation of all descendant text node string-values
    • Attribute: normalized attribute value
    • Text: character data (always at least one character)
    • Comment: the content of the comment
    • PI: text following the PI target and whitespace
      – e.g., type="text/xsl" href="style.xsl" is the string-value of an example stylesheet PI
    • Namespace node: the namespace URI
  – Use <xsl:value-of/> to insert the string-value of a node into the result tree
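For elements, the string-value rule is simple to model. The sketch below uses Python's xml.etree.ElementTree, whose itertext() walks an element's text in document order. The string_value helper is a hypothetical name, and the sample is a shortened version of article.xml.

```python
import xml.etree.ElementTree as ET

article = ET.fromstring(
    "<article>"
    "<heading>This is a short article</heading>"
    "<para>This is the <emphasis>first</emphasis> paragraph.</para>"
    "</article>"
)

def string_value(element):
    """Concatenate all descendant text of element, in document order."""
    return "".join(element.itertext())

para = article.find("para")
print(string_value(para))     # markup inside para is ignored, text is kept
print(string_value(article))  # no separator is inserted between elements
```

The second result runs the heading and paragraph together, which is what xsl:value-of select="." would emit for the article element as well.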

Node property: expanded-name

• Applicable to:
  – Elements and attributes:
    • Local part: local name of node, returned by local-name()
    • URI part: namespace name (URI) of node, returned by namespace-uri()
  – PIs:
    • Local part: the PI target, e.g., xml-stylesheet
    • URI part: (always null)
  – Namespace nodes:
    • Local part: the namespace prefix, e.g., xml or xyz or empty string (“”) in the case of a default namespace
    • URI part: (always null)
• Root, text, and comment nodes do not have names

Document order

• There is an ordering for all nodes in a document called document order
• The root node is always the first node in a document
• The rest are ordered according to where their XML representation begins
  – Except that the relative order of attributes and namespace nodes on the same element is implementation-defined
• Why should I care about document order?
  – Because it's the default order in which nodes are processed by both <xsl:for-each> and <xsl:apply-templates>

Quiz: counting nodes

• How many nodes are in the following XML document?

Answer

• How many nodes are in the following XML document?
• 15!

The first 14 nodes in the QUIZ 1 example

Quiz review

• para
  – Short for child::para
• *
  – Short for child::*
• node()
  – Short for child::node()
• @name
  – Short for attribute::name
• @*
  – Short for attribute::*

Quiz review

• para[1]
  – Short for child::para[1]
  – Equivalent to child::para[position() = 1]
• para[last()]
  – Short for child::para[last()]
  – Equivalent to child::para[position() = last()]
• */para
  – Short for child::*/child::para
• /doc/chapter[5]/section[2]
  – Short for /child::doc/child::chapter[5]/child::section[2]

Quiz review

• chapter//para
  – Short for:
    – child::chapter/descendant-or-self::node()/child::para
• //para
  – Short for:
    – /descendant-or-self::node()/child::para
• .
  – Short for:
    – self::node()
• .//para
  – Short for:
    – self::node()/descendant-or-self::node()/child::para

Quiz review

• ..
  – Short for parent::node()
• title
  – Short for child::title
• ../@lang
  – Short for parent::node()/attribute::lang

Summary of abbreviations

• XPath has five abbreviations. They are:
  – . is short for self::node()
  – .. is short for parent::node()
  – @ is short for attribute::
  – // is short for /descendant-or-self::node()/
  – foo is short for child::foo

XPath, the language

Descending from the clouds

Keep filling out that NOTES page

XPath basics: review

• XPath is an expression language
• Every expression returns a value
  – XPath 1.0 has just four data types (write these down!):
    • Node-set (the most important)
    • String
    • Number
    • Boolean
• All expressions are evaluated in a context
  – Understanding context is crucial to understanding XPath

XPath context

• All XPath expressions (whether in XSLT or not) are evaluated in a context
  – The context consists of 6 parts:
    • The context node
    • The context size (an integer 1 or higher)
      – Returned by the last() function
    • The context position (an integer 1 or higher)
      – Returned by the position() function
    • A set of namespace/prefix declarations in scope for the expression
      – Used to evaluate QNames in the expression, e.g., xyz:foo/xyz:bar
    • A set of variable bindings
    • A function library

XPath context, cont.

• The context comprises the entire world for an XPath expression, so to speak.
  – Other than its context, there is no input to an XPath expression. The context consists of everything outside the expression itself that may affect the resulting value of the expression.
• The context indicates:
  – Where you are
    • Where in the tree:
      – Context node
    • Where in processing:
      – Context size (the size of an arbitrary list of nodes being processed)
      – Context position (the position of the current node in that list)
  – What is available to you
    • What variables you can reference
    • What namespace prefixes you can use
    • What functions you can call

XPath syntax overview

• XPath supports these kinds of expressions:
  • Variable references:
    – $foo, $bar, etc.
  • Function calls:
    – starts-with($str,'a')
    – true()
    – round($num)
  • Parenthesized expressions
    – (//para)
    – (foo | bar)
  • String literals
    – "foo", 'bar', etc.
  • Numbers
    – 13, 24.7, .007, etc.
  • cont...

XPath syntax overview, cont.

  • Node-set expressions
    – /html/body/p[2]/text()
    – //@person | //person
    – (.//note | para/fnote)[1]
    – $ns[@id='xyz']
  • Arithmetic expressions
    – (($x - 5) * 2) div -3
    – $pos mod 2
  • Boolean expressions
    – $is-good and $is-valid
    – $x >= 4
    – position() != last()

Node-set expressions

• A node-set is:
  • An unordered collection of zero or more nodes
• Node-set expressions include:
  • Location paths (the most important kind of expression!)
    – foo/bar[3]
  • Union expressions (union of two node-set expressions using the | operator)
    – $set | foo/bar[3]
  • Filtered expressions (a predicate applied to any expression using the predicate operator)
    – $set[.='good']
  • Path expressions (any expression composed with a location path using the / or // operators)
    – $set//bar

Location paths: a formal definition

• A location path:
  – Is the most important kind of XPath expression
  – Returns a node-set
  – Can be absolute or relative
  – Relative:
    • One or more steps separated by /
      – foo
      – foo/bar
  – Absolute:
    • /
      – Selects the root node of the document that contains the context node
      – The only location path that doesn't have any steps in it
    • / followed by a relative location path
      – /foo
      – /foo/bar

Location path steps

• A location path step has 3 parts:
  – An axis specifier
  – A node test
  – Zero or more predicates
• For example:
  – child::paragraph[string-length(.) > 100]
• The above is equivalent to this abbreviated form:
  – paragraph[string-length(.) > 100]
  – (because child is the default axis)
• It selects each paragraph child whose string-value is greater than 100 characters in length

How a step is evaluated

• Moving from left to right:
  1. The axis identifies a set of nodes relative to the context node.
  2. The node test acts as a filter on that set.
  3. Each of any number of optional predicates in turn acts as a filter on the set identified by the preceding predicates and node test to its left.
• For example:
  – child::paragraph[string-length(.)>100]
    1. The child axis identifies all the children of the context node.
    2. Among those, the paragraph node test selects only the elements named “paragraph”.
    3. Among those, the string-length(.)>100 predicate filters out all but the nodes whose string-value is greater than 100 characters long.
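The three stages can be spelled out by hand. This is a rough model in Python's xml.dom.minidom, not how any real XPath engine is implemented. The section document, the note element, and the string_value helper are all hypothetical and exist only for the illustration.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

def string_value(node):
    """String-value of an element or text node: descendant text, concatenated."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return "".join(
        string_value(child)
        for child in node.childNodes
        if child.nodeType in (Node.ELEMENT_NODE, Node.TEXT_NODE)
    )

long_text = "x" * 120
doc = minidom.parseString(
    "<section><!--intro--><paragraph>short</paragraph>"
    f"<note>{long_text}</note><paragraph>{long_text}</paragraph></section>"
)
context = doc.documentElement

# 1. Axis: child:: identifies every child of the context node.
on_axis = list(context.childNodes)

# 2. Node test: keep only the elements named "paragraph".
named = [
    node for node in on_axis
    if node.nodeType == Node.ELEMENT_NODE and node.tagName == "paragraph"
]

# 3. Predicate: keep those whose string-value is longer than 100 characters.
selected = [node for node in named if len(string_value(node)) > 100]

print(len(on_axis), len(named), len(selected))
```

Each stage only ever narrows the set produced by the stage before it, which is why the order of axis, node test, and predicates matters.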

The axis

• That's this part:
• Can be any one of 13 axes:
  – child::
  – self::
  – parent::
  – descendant::
  – descendant-or-self::
  – ancestor::
  – ancestor-or-self::
  – following::
  – following-sibling::
  – preceding::
  – preceding-sibling::
  – attribute::
  – namespace::

The 13 XPath axes

• What each axis contains:
  – child
    • The children of the context node.
  – descendant
    • The descendants of the context node (children, children's children, etc.).
  – parent
    • The parent of the context node (empty if context node is root node).
  – ancestor
    • The ancestors of the context node (parent, parent's parent, etc.).

The 13 XPath axes, cont.

• What each axis contains, cont.:
  – attribute
    • The attributes of the context node (empty if context node is not an element).
  – namespace
    • The namespace nodes of the context node (empty if context node is not an element).
  – self
    • Just the context node itself.
  – descendant-or-self
    • The context node and descendants of the context node.
  – ancestor-or-self
    • The context node and ancestors of the context node.

The 13 XPath axes, cont.

• What each axis contains, cont.:
  – following-sibling
    • All nodes with the same parent as the context node that come after the context node in document order (empty if context node is an attribute or namespace node).
  – preceding-sibling
    • All nodes with the same parent as the context node that come before the context node in document order (excluding attributes and namespace nodes).
  – following
    • All nodes after the context node in document order, excluding descendants, attributes, and namespace nodes.
  – preceding
    • All nodes before the context node in document order, excluding ancestors, attributes, and namespace nodes.

A little observation about siblings

• The types of nodes that can be siblings are the same as the types of nodes that can be children
  – “Elements, comments, text, PIs!”
• That's because attributes and namespace nodes are by definition not siblings to anyone, not even to each other. They're just “attached” to their parent.

The node test

• That's this part:
• A node test is a filter on an axis
  – There are two kinds of node test:
    • Node type tests
      – Any node: node()
      – Specific node type: text(), comment(), processing-instruction()
      – Specific PI target: processing-instruction('foo')
    • Name tests
      – Wildcard (any name): *
      – Namespace-qualified wildcard (any local name within a particular namespace): xyz:*, abc:*, etc.
      – QName (a specific expanded-name): foo, xyz:foo, the highlighted example, etc.
  – Node type tests are not functions
    • They're special forms that only happen to look like functions

Node type tests

• The most inclusive node test: node()
  – It includes every node regardless of its name or node type
    • Thus, it effectively selects all nodes on the given axis
      – child::node() selects all children
      – ancestor::node() selects all ancestors
      – etc.
• Specific node type
  – text(), comment(), processing-instruction()
    • Selects only the nodes of the given type from the given axis
      – descendant::text() selects all descendant text nodes
      – following::comment() selects all following comment nodes
      – preceding::processing-instruction() selects all preceding PI nodes
• Specific PI target
  – child::processing-instruction('xml-stylesheet')

Name tests

• Name tests only select nodes of one type at a time, depending on the axis
  – This is called the principal node type for an axis
    • Attributes while on the attribute axis
    • Namespace nodes while on the namespace axis
    • Element nodes while on every other axis
• The wildcard: *
  – Selects all nodes of the principal node type
    • child::* selects all element nodes on the child axis
    • attribute::* selects all attribute nodes on the attribute axis
      – effectively no different than attribute::node() because the attribute axis can only ever contain attribute nodes

Name tests, cont.

• Namespace-qualified wildcards: xyz:*
  – Selects all nodes of the principal node type whose expanded-name has a particular URI part
    • child::xyz:* selects all element nodes on the child axis that are in the namespace designated by the xyz prefix
• QNames: foo, xyz:foo, etc.
  – Selects all nodes of the principal node type whose expanded-name has a particular local part and a particular URI part
    • child::foo selects all element nodes on the child axis that have local name foo and that are not in a namespace
    • ancestor::xyz:foo selects all element nodes on the ancestor axis that have local name foo and that are in the namespace designated by the xyz prefix

How multiple steps are evaluated

• The rightmost step indicates what nodes are returned:
  – table/tr/td
    • The above location path returns a node-set of zero or more td elements
  – /doc/section[5]/para[text() and @*]
    • What does the above location path return?
• Each step is evaluated once for each node returned by the step to its left, using that node as the context node for the evaluation
• The result is the union of the node-sets returned by all the evaluations of the rightmost step
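The step-by-step rule can be modeled with a short loop. This is a simplified sketch in Python's xml.etree.ElementTree that handles only child-axis name steps. The evaluate function and the sample document are hypothetical, and ElementTree's own findall() is used as a cross-check.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc><table>"
    "<tr><td>a</td><td>b</td></tr>"
    "<tr><td>c</td></tr>"
    "</table><table><tr><td>d</td></tr></table></doc>"
)

def evaluate(context_nodes, steps):
    """Evaluate child-axis name steps left to right, one context node at a time."""
    current = list(context_nodes)
    for name in steps:
        result = []
        for node in current:            # each node takes its turn as context node
            for hit in node.findall(name):
                if hit not in result:   # a union holds each node only once
                    result.append(hit)
        current = result
    return current

cells = evaluate([doc], ["table", "tr", "td"])
print([td.text for td in cells])
```

Only the nodes found by the last step are returned; the table and tr elements serve purely as intermediate context nodes.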

Predicates

• A predicate filters a node-set to produce a new node-set
  – price[. > 5]
    • Of all the price child elements, return only those whose string-value, when converted to a number, is greater than 5
  – The predicate expression is evaluated once for each node in the node-set to be filtered, using that node as the context node for the evaluation
    • The result is converted to a boolean (if necessary)
    • If true, the node is retained in the result
    • If false, the node is excluded from the result

Numeric predicates

• When the predicate expression evaluates to a number
  – It is interpreted in a special way, such that:
    • foo[5] is short for foo[position()=5]
    • foo[last()] is short for foo[position()=last()]

Context size in predicates

• The last() function returns the context size
  – The number of nodes returned by the step (or arbitrary node-set expression) to its left
    – foo[last()] evaluates to foo[5] if there are a total of 5 foo elements

Context position in predicates

• The position() function returns the context position
  – The proximity position of the context node for the current predicate evaluation
    • The relative position of the node among all the nodes being filtered in document order
      – foo[5] returns the 5th foo element in document order
    • Unless the step uses one of the four reverse axes:
      – preceding
      – preceding-sibling
      – ancestor
      – ancestor-or-self
    • ancestor::node()[1] is equivalent to ..
    • preceding-sibling::foo[1] returns the first foo element in reverse document order

“Step filters” vs. “Expression filters”

• A predicate can be used to filter two different things:
  – A location path step
  – An arbitrary node-set expression
    • $node-set[@foo='bar']
• Gotchas
  – //para[1] vs. (//para)[1]
  – ancestor::*[1] vs. (ancestor::*)[1]
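The first gotcha can be seen with a small experiment in Python's xml.etree.ElementTree. One caveat, stated as an assumption: ElementTree cannot parse parenthesized expressions, so ordinary list slicing stands in for (//para)[1] below. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc>"
    "<sec><para>a</para><para>b</para></sec>"
    "<sec><para>c</para></sec>"
    "</doc>"
)

# //para[1]: the predicate filters the last step, so "first" is judged
# separately among the para children of each parent.
step_filtered = [para.text for para in doc.findall(".//para[1]")]

# (//para)[1]: the predicate filters the whole node-set, so there is at
# most one result. Slicing the full list models that filter here.
expr_filtered = [para.text for para in doc.findall(".//para")[:1]]

print(step_filtered, expr_filtered)
```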

Comparisons with node-sets

• price > 20
  – True if there are any price element children whose string-value when converted to a number is greater than 20
• foo[bar = bat]
  – Select all foo elements that have any bar element child and any bat element child that have the same string-value
• Comparisons with empty node-sets always return false
  – Gotcha:
    • foo != 2
    • foo != bar
  – Use not() for the true complement
    • not(foo = 2)
    • not(foo = bar)
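The "any node will do" rule behind these comparisons can be modeled in a few lines of Python. This is a sketch of the semantics for node-set versus number comparisons only, not an XPath implementation. The helper names and the item documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def numbers(context, name):
    """The named child elements of context, each converted to a number."""
    return [float(element.text) for element in context.findall(name)]

def ns_eq(values, n):
    """Models foo = n: true if ANY node in the node-set equals n."""
    return any(value == n for value in values)

def ns_ne(values, n):
    """Models foo != n: true if ANY node in the node-set differs from n."""
    return any(value != n for value in values)

mixed = ET.fromstring("<item><foo>2</foo><foo>3</foo></item>")
empty = ET.fromstring("<item/>")

for context in (mixed, empty):
    foo = numbers(context, "foo")
    # foo != 2 and not(foo = 2) disagree in both cases.
    print(ns_ne(foo, 2), not ns_eq(foo, 2))
```

With two foo children holding 2 and 3, foo != 2 is true while not(foo = 2) is false. With no foo children at all, the answers flip.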

Functions overview

• String functions
  • string(), concat(), starts-with(), contains(), substring-before(), substring-after(), substring(), string-length(), normalize-space(), translate()
• Node-set functions
  • last(), position(), count(), id(), local-name(), namespace-uri(), name()
• Boolean functions
  • boolean(), not(), true(), false(), lang()
• Number functions
  • number(), sum(), floor(), ceiling(), round()
• XSLT adds:
  – document(), key(), generate-id(), system-property(), format-number(), current(), element-available(), function-available(), unparsed-entity-uri()

XSLT element overview

XSLT elements, by use case

Creating nodes: xsl:element, xsl:attribute, xsl:text, xsl:comment, xsl:processing-instruction
Copying nodes: xsl:copy-of, xsl:copy
Repetition (looping): xsl:for-each
Sorting: xsl:sort
Conditional processing: xsl:choose, xsl:if
Computing or extracting a value: xsl:value-of
Defining variables and parameters: xsl:variable, xsl:param
Defining and calling subprocedures (named templates): xsl:template, xsl:call-template
Defining and applying template rules: xsl:template, xsl:apply-templates, xsl:apply-imports
Numbering and number formatting: xsl:number, xsl:decimal-format
Debugging: xsl:message

XSLT elements, cont.

Combining stylesheets (modularization): xsl:import, xsl:include
Compatibility: xsl:fallback
Building lookup indexes: xsl:key
XSLT code generation: xsl:namespace-alias
Output formatting: xsl:output
Whitespace stripping: xsl:strip-space, xsl:preserve-space

XSLT's processing model

The end: construct a result tree

The means: process lists

• If XPath is about trees, then XSLT is about lists
  – Populate arbitrary nodes from the source tree into lists
  – Iterate over those lists
  – For each node in the list, create part of the result tree
• Source tree -> List processing -> Result tree
• Thus, there is always:
  – a current node list, and
  – a current node

Two mechanisms for iterating over lists

• xsl:apply-templates and xsl:for-each
• They both iterate over the nodes of a given node-set
  – Supplied by the XPath expression in the select attribute
• For example:
  – <xsl:apply-templates select="para"/>
    • “Populate the current node list with para elements, sorted in document order. For each para element, invoke the best-matching template rule.”
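To compare the two mechanisms side by side, here is a rough sketch. It assumes a hypothetical source document in which section elements contain para children; the section name and the p output element are illustrative choices, not taken from the slides.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "section" is a hypothetical parent element for this sketch -->
  <xsl:template match="section">
    <!-- Mechanism 1: hand each para to the best-matching template rule -->
    <xsl:apply-templates select="para"/>

    <!-- Mechanism 2: process each para inline, right here -->
    <xsl:for-each select="para">
      <p><xsl:value-of select="."/></p>
    </xsl:for-each>
  </xsl:template>

  <!-- The rule that xsl:apply-templates finds for each para -->
  <xsl:template match="para">
    <p><xsl:value-of select="."/></p>
  </xsl:template>

</xsl:stylesheet>
```

Both instructions iterate over the same node list, so this sketch emits every para twice. In a real stylesheet you would pick one mechanism.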

All XSLT processing begins with...

• A virtual call to:
  – <xsl:apply-templates select="/"/>
• The current node list initially consists of just one node
  – The root node of the source tree
  – In other words, the XSLT processor invokes the template rule that matches the root node
• This call constructs the entire result tree
  – Nothing happens before it
  – Nothing happens after it

Your job as an XSLT stylesheet author...

• ...is to define, using template rules, what happens when the XSLT processor executes this instruction:
  – <xsl:apply-templates select="/"/>
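In practice, that usually starts with a template rule for the root node. A minimal sketch, assuming HTML-like output is wanted (the html and body elements are illustrative only):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Invoked by the processor's initial, virtual apply-templates call -->
  <xsl:template match="/">
    <html>
      <body>
        <!-- Continue processing with the children of the root node -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```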

Template rules

• An XSLT stylesheet contains a set of template rules
• Two kinds of template rule:
  – Those you define
  – Those that XSLT defines for you
    • These are called the built-in template rules.
• There is a built-in template rule for each of the 7 types of node
  – Ensures that a call to xsl:apply-templates never fails to find a matching template rule
    • Even if your stylesheet contains no explicit template rules at all

The empty stylesheet

• Consider this stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

</xsl:stylesheet>

• If you apply the above stylesheet to the example XML from QUIZ 1...
  – What will the result be?

The result

$ xsltproc empty.xsl quiz1.xml
<?xml version="1.0"?>

This is a simple XML document

You can do it! There's nothing to it! Go fast!

This will be interesting Here we go...

sub-chapter Who ever heard of nested chapters?!

another sub-chapter End of sub-chapter

No more nested chapters for now...

Template rules that you define

• When you define template rules, you override the default behavior

• An explicit template rule is:
  – An xsl:template element that has a match attribute
• For example:

<xsl:template match="foo">
  <xsl:apply-templates/>
</xsl:template>

Applying template rules

• <xsl:apply-templates/>
  – Short for:
    • <xsl:apply-templates select="node()"/>
• Process all child nodes of the context node

Applying template rules: an OOP analogy

• <xsl:apply-templates/>
• For each item in the list
  – Invoke the same polymorphic function
• Each template rule is an implementation of that polymorphic function
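One way to picture the analogy is the sketch below. The chapter, title, and para element names and the HTML-like output are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The "call site": one instruction, dispatched separately for each child node -->
  <xsl:template match="chapter">
    <div><xsl:apply-templates/></div>
  </xsl:template>

  <!-- The "implementation" chosen when the node is a title element -->
  <xsl:template match="title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <!-- The "implementation" chosen when the node is a para element -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

The chapter rule never names title or para. The processor picks the rule per node, much as a method call is dispatched on the runtime type of an object.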

Patterns

• The value of the match attribute is a pattern
• Looks like an XPath expression
  – Uses a subset of XPath syntax
• But has a more passive role
  – Does the current node match this pattern? Yes or no.
• When xsl:apply-templates is invoked, for each node in the list, the XSLT processor searches all the patterns of the stylesheet for the best-matching one

Example patterns

• Example patterns
  – /
  – /doc[@format='simple']
  – bar
  – foo/bar
  – section//para
  – @foo
  – @*
  – node()
  – text()
  – *
  – xyz:*

Does the pattern match?

• Informal:
  – If this pattern were an expression, would the node in question ever be selected by it?
• Formal:
  – A node matches a pattern if the node is a member of the result of evaluating the pattern as an expression with respect to some possible context node.

Template rules with multiple patterns

• Separate the alternative patterns with |
• <xsl:template match="foo | bar">...

– Is short for:

<xsl:template match="foo"> </xsl:template>

<xsl:template match="bar"> </xsl:template>

What about conflicts?

• A foo element would match both of these template rules

<xsl:template match="foo"> </xsl:template>

<xsl:template match="*"> </xsl:template>

• Which one gets invoked by <xsl:apply-templates select="foo"/>?

Two steps to resolving conflicts

• When more than one template rule matches:
  1. Eliminate rules with lower import precedence.
  2. Eliminate rules with lower priority.
• Only one rule should be left; otherwise it is an error
  – A processor may signal the error, or recover by choosing the rule that occurs last in the stylesheet
• Import precedence depends on what file the rule occurs in
  – Where it occurs in the import tree (via xsl:import)
• Priority depends on:
  – The priority attribute of the xsl:template element, or
  – The default priority (when priority attribute is absent)
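As a small illustration of the priority attribute, consider two rules in the same stylesheet, and therefore with the same import precedence. The output strings are placeholders invented for this sketch.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- No priority attribute, so this rule gets its default priority -->
  <xsl:template match="foo">matched by name&#10;</xsl:template>

  <!-- An explicit priority of 1 outranks every default priority,
       so this rule is chosen for foo elements as well -->
  <xsl:template match="*" priority="1">matched by wildcard&#10;</xsl:template>

</xsl:stylesheet>
```

Without the priority attribute on the second rule, the first rule would win for foo elements.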

Default priority

• Priority is a positive or negative decimal number
• The higher the number, the higher the priority
• There are four default priorities:
  – -.5
  – -.25
  – 0
  – .5

-.5      -.25        0              .5
 |_________|_________|_______________|

Default priority depends on...

• ...the syntax of the match pattern
• The most common pattern format has a priority of 0
• 0
  – Match a particular name
    • foo, xyz:foo, @foo, @xyz:foo, processing-instruction('foo')
• .5
  – The highest default priority
  – Any pattern with a predicate or multiple steps
    • foo/bar, foo[2], foo[@good='yes']

The lower default priorities are...

• -.25
  – One-step wildcards within a namespace
    • xyz:*, @xyz:*
• -.5
  – The lowest default priority
  – One-step wildcards regardless of name
    • *, @*, text(), comment(), processing-instruction(), node()
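A sketch of default priorities at work. The three rules below can all match the same bar element, and none of them has a priority attribute. The output strings are placeholders invented for illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Default priority .5: the pattern has more than one step -->
  <xsl:template match="foo/bar">bar inside foo&#10;</xsl:template>

  <!-- Default priority 0: the pattern names a particular element -->
  <xsl:template match="bar">some other bar&#10;</xsl:template>

  <!-- Default priority -.5: one-step wildcard; just keeps walking down the tree -->
  <xsl:template match="*">
    <xsl:apply-templates select="*"/>
  </xsl:template>

</xsl:stylesheet>
```

A bar element whose parent is foo matches all three patterns, and the foo/bar rule is chosen because .5 is the highest. Any other bar falls to the second rule, and every other element falls to the wildcard.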

Modes

• Modes allow you to process the same node again but do something different this time
  – <xsl:apply-templates select="heading" mode="toc"/>
  – <xsl:template match="heading" mode="toc">...
• When the mode attribute is absent, that means the default (unnamed) mode
• You can segment your template rules into sets organized by concern
  – What they generate in the result tree
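For instance, a table of contents is a classic use of modes. The sketch below assumes a source document with heading elements at arbitrary depth (hence //heading rather than the heading child path used above) and HTML-like output. Both are illustrative assumptions.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
      <body>
        <!-- First pass: visit every heading in "toc" mode -->
        <ul>
          <xsl:apply-templates select="//heading" mode="toc"/>
        </ul>
        <!-- Second pass: process the whole document in the default (unnamed) mode -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- What a heading generates in the table of contents -->
  <xsl:template match="heading" mode="toc">
    <li><xsl:value-of select="."/></li>
  </xsl:template>

  <!-- What the same heading generates in the document body -->
  <xsl:template match="heading">
    <h2><xsl:value-of select="."/></h2>
  </xsl:template>

</xsl:stylesheet>
```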

The built-in template rules

• For elements and root nodes
  – Apply templates to children:

<xsl:template match="/ | *">
  <xsl:apply-templates/>
</xsl:template>

• For text nodes and attribute nodes
  – Output the string-value of the node:

<xsl:template match="text() | @*">
  <xsl:value-of select="."/>
</xsl:template>

The built-in template rules

• For processing instructions and comments:
  – Do nothing

<xsl:template match="comment() | processing-instruction()"/>

• For namespace nodes
  – Do nothing
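Because the built-in rules copy all text through, a common idiom is to override the one for text nodes. A minimal sketch, assuming a hypothetical source document in which only title elements are of interest:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Override the built-in rule for text nodes: output nothing -->
  <xsl:template match="text()"/>

  <!-- "title" is a hypothetical element name; output its string-value on its own line -->
  <xsl:template match="title">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The built-in rule for elements still walks the whole tree, so every title is reached, but stray text elsewhere in the document no longer leaks into the result.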

Template rule content

• Three kinds of elements:
  – XSLT instructions
    • Any element in the XSLT namespace, e.g., <xsl:value-of/>
  – Literal result elements
    • Any element in any other namespace, or no namespace
    • Creates a shallow copy of itself in the result tree
  – Extension elements
    • Any element in a namespace that's declared as an extension namespace (using the extension-element-prefixes attribute on the xsl:stylesheet element)

Attribute value templates

• Attributes on literal result elements can contain dynamic values, delimited by curly braces
  – <para class="{@format}">...
• To include a literal curly brace, double it:
  – <foo bar="{{not interpreted as XPath}}"/>
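To show what the curly braces save you from writing, here is a sketch with the same attribute built two ways. The note element and the p output element are hypothetical names used only for this comparison.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Attribute value template: {@format} is evaluated as an XPath expression -->
  <xsl:template match="para">
    <p class="{@format}"><xsl:apply-templates/></p>
  </xsl:template>

  <!-- The long form, using xsl:attribute and xsl:value-of -->
  <xsl:template match="note">
    <p>
      <xsl:attribute name="class"><xsl:value-of select="@format"/></xsl:attribute>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```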

Miscellaneous topics...

The template rule engine

• A lot goes on behind the scenes
• xsl:apply-templates is the most important instruction in XSLT
• <xsl:apply-templates select="para"/>
  – This means: “Apply templates to the para element children of the context node.”

But what does “apply templates” mean?

• <xsl:apply-templates select="para"/>
  – Let the para node-set populate the current node list (in document order)
  – For each node in the list, invoke the best-matching template rule
• A template rule

Namespace node quiz

• Example:
  – <foo xmlns:e="http://example.com"><bar/></foo>
• Quiz: How many namespace nodes are in the above document?

Answer

• Example:
  – <foo xmlns:e="http://example.com"><bar/></foo>
• Quiz: How many namespace nodes are in the above document?
  – ANSWER: 4
• Two namespace nodes for each element
  – One for the declared namespace, one for the implicit xml namespace
  – As we'll see, namespace nodes are a property of the element for which they're in scope.
• Doesn't that make for a huge proliferation of namespace nodes?
  – Yes.
• Should I care?
  – Hardly ever.
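If you want to check the count yourself, a sketch like the following should do it, assuming your processor implements the namespace axis (not every XSLT implementation does):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Collect the namespace nodes of every node in the source tree and count them -->
    <xsl:value-of select="count(//namespace::*)"/>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the quiz document, a conforming processor should report the same number as the answer above.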

QNames in XPath/XSLT

• QNames are expanded
  – Into local and URI parts
  – Using a set of namespace/prefix declarations
    • Supplied in the XPath expression context
• This does not include a default namespace declaration (declared by xmlns)
  – Thus, if you want to select nodes in a particular namespace, then you must use a prefix
• In other words, a QName without a prefix always designates a node that is not in a namespace
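A sketch of the consequence for match patterns. The ex prefix is an arbitrary choice made here; only the namespace URI has to agree with the source document. The output strings are placeholders.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ex="http://example.com">
  <xsl:output method="text"/>

  <!-- Matches foo elements in the http://example.com namespace, whether the
       source document uses a prefix or a default namespace declaration -->
  <xsl:template match="ex:foo">namespaced foo&#10;</xsl:template>

  <!-- Matches only foo elements that are in no namespace -->
  <xsl:template match="foo">foo in no namespace&#10;</xsl:template>

</xsl:stylesheet>
```

A source document such as <foo xmlns="http://example.com"/> is handled by the first rule, not the second, even though its foo carries no prefix.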

More node properties

• String-value
• Expanded-name
  – Two parts:
    • Local part
    • Namespace URI part
• Also:
  – unique ID (used by id() function)
  – XSLT-specific:
    • base URI (used by document() and xsl:import/include)
    • unparsed entity URIs (used by unparsed-entity-uri() function)
• That's it!

Node relationship properties

• The value of each of these node properties is a set of other nodes:
  – Children
  – Attributes
  – Namespace nodes
  – Parent
• These properties are what connect one node to all the other nodes in the tree
  – By traversing these properties you can move from one node to anywhere else in the tree
• These properties are not applicable to all 7 node types
  – We'll see which ones apply to which node types
