Introduction to XSLT

Evan Lenz XML/XSLT Consultant http://xmlportfolio.com [email protected] August 2, 2005 O’Reilly Open Source Convention August 1 - 5, 2005 Introduction to XSLT


Page 1: Introduction to XSLT

Evan Lenz
XML/XSLT Consultant
http://xmlportfolio.com
[email protected]

August 2, 2005

O’Reilly Open Source Convention
August 1 - 5, 2005

Introduction to XSLT

Page 2: Introduction to XSLT

Who is this guy?

• Evan Lenz
  – Majored in music
  – Over 5 years ago, read Michael Kay's XSLT Programmer's Reference cover-to-cover while sitting by his newborn son's hospital bed
  – Participated on the XSL Working Group for a couple years
  – Wrote XSLT 1.0 Pocket Reference (due out this month)
  – Preparing for entrance to a Ph.D. program in Digital Arts and Experimental Media

Page 3: Introduction to XSLT

Why does he like XSLT?

• XSLT is:
  – Powerful
  – Small
  – Beautiful
  – In high demand
  – Fun to learn
  – Fun to teach

Page 4: Introduction to XSLT

What should I expect this afternoon?

• Fasten your seatbelts
• A variety of interactive exercises and traditional presentation
• Feel free to feel overwhelmed
  – You're learning more than you think!

• Try your best while you're here and it will be time well spent

• Have fun!

Page 5: Introduction to XSLT

What's with the handouts?

• The big handout is a late-stage draft of XSLT 1.0 Pocket Reference, due out this month
  – If you would like a complimentary copy of the final book, put your name and mailing address on the sign-up sheet

• The smaller handout contains exercises that we will be using today

Page 6: Introduction to XSLT

XSLT from 30,000 feet

High-level overview

Page 7: Introduction to XSLT

What is XSLT?

• “XSL Transformations”
  – “A language for transforming XML documents into other XML documents”
  – W3C Recommendation
    • http://www.w3.org/TR/xslt
    • Version 1.0: 1999-11-16

Page 8: Introduction to XSLT

OK, then what is XSL?

• “Extensible Stylesheet Language”
  – “A language for expressing stylesheets”
  – W3C Recommendation
    • http://www.w3.org/TR/xsl
    • Version 1.0: 2001-10-15
  – Has 2 parts:
    • XSLT
      – Refactored out of XSL so that it could proceed independently
    • XSL-FO
      – “Formatting Objects”

Page 9: Introduction to XSLT

What is XPath?

• “XML Path Language”
  – “A language for addressing parts of an XML document”
  – W3C Recommendation
    • http://www.w3.org/TR/xpath
    • Version 1.0: 1999-11-16
    • Released on the same day as XSLT 1.0
  – The expression language used in XSLT

Page 10: Introduction to XSLT

A relationship of subsets

• XPath is part of XSLT
• XSLT is part of XSL
• Today we are concerned only with the inner two circles:
  – XSLT and XPath
  – XSL, a.k.a. XSL-FO, is out of scope for today

Page 11: Introduction to XSLT

What is XSLT used for?

• Common applications
  – Stylesheets for converting XML to HTML
    • Generating Web pages or whole websites
    • DocBook -> HTML
  – Transformations from one document type to another
    • *ML to *ML – as many potential applications as there are XML document types
    • RSS, SVG, UBL, LegalXML, HrXML, XBRL
    • Office applications
      – SpreadsheetML, WordML, Keynote XML, OOo XML, PowerPoint (in next version), Access XML, etc.
  – Extracting data from documents
  – Modifying or fixing up documents

Page 12: Introduction to XSLT

Where is XSLT used?

• Every platform
  – Windows, Linux, Mac, UNIX, Java
• Many browsers support XSLT natively
  – Firefox/Mozilla, Internet Explorer, Safari
• Many frameworks use or support XSLT
  – .NET, Java, LAMP
    • PHP5 now uses libxslt
  – Cocoon, 4Suite, Amazon web services, Google appliance, Cisco routers, etc., etc.

• XSLT IS EVERYWHERE!!

Page 13: Introduction to XSLT

Interoperable implementations?

• In terms of interoperability, XSLT is unmatched among languages having multiple implementations
  – Java
    • Saxon – http://saxon.sf.net (open-source)
    • Xalan-J – http://xml.apache.org/xalan-j/ (open-source)
  – Windows
    • MSXML – fast, fully conformant
  – Python
    • 4xslt – http://www.4suite.org (open-source)
  – C
    • libxslt – http://xmlsoft.org (open-source; used in Firefox, Safari, PHP5, etc.)
    • Xalan-C++ – http://xml.apache.org/xalan-c/ (open-source)

Page 14: Introduction to XSLT

Enough already, let's see some code!

Page 15: Introduction to XSLT

Example XML file

INPUT: names.xml

<people>
  <person>
    <givenName>Joe</givenName>
    <familyName>Johnson</familyName>
  </person>
  <person>
    <givenName>Jane</givenName>
    <familyName>Johnson</familyName>
  </person>
  <person>
    <givenName>Jim</givenName>
    <familyName>Johannson</familyName>
  </person>
  <person>
    <givenName>Jody</givenName>
    <familyName>Johannson</familyName>
  </person>
</people>

Page 16: Introduction to XSLT

A very simple stylesheet, names.xsl

Page 17: Introduction to XSLT

OUTPUT: the result of the transformation

$ saxon names.xml names.xsl >names.html

Page 18: Introduction to XSLT

Or we could open the XML directly in the browser

• Oops, we must first add a processing instruction (PI) to the top, like this:

<?xml-stylesheet type="text/xsl" href="names.xsl"?>
<people>
  <!-- ... -->
</people>

Page 19: Introduction to XSLT

That's better. Displays as HTML but viewing source shows it's just XML.

Page 20: Introduction to XSLT

One more example for now

• INPUT: article.xml

<?xml-stylesheet type="text/xsl" href="article.xsl"?>
<article>
  <heading>This is a short article</heading>
  <para>This is the <emphasis>first</emphasis> paragraph.</para>
  <para>This is the <strong>second</strong> paragraph.</para>
</article>

Page 21: Introduction to XSLT

A rule-oriented stylesheet

• article.xsl:

Page 22: Introduction to XSLT

A rule-oriented stylesheet, cont.

• article.xsl, cont.:

Page 23: Introduction to XSLT

OUTPUT: article.xml transformed to HTML

Page 24: Introduction to XSLT

See a pattern here?

Page 25: Introduction to XSLT

XPath in a nutshell

Page 26: Introduction to XSLT

How XPath fits in XSLT

• XPath expressions appear in attribute values, e.g.:
  – <xsl:for-each select="/people/person"/>
  – <xsl:value-of select="givenName"/>
  – <xsl:apply-templates select="/article/para"/>
• What these mean:
  – /people/person
    • Select all person child elements of all people child elements of the root node
  – givenName
    • Select all givenName child elements of the context node
  – /article/para
    • Select all para child elements of all article child elements of the root node
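The first two expressions can be tried outside XSLT. What follows is a minimal sketch in Python, assuming an abridged copy of names.xml. The standard-library xml.etree.ElementTree module implements only a subset of XPath and evaluates paths relative to an element, so the path person evaluated at the people element stands in here for /people/person.

```python
import xml.etree.ElementTree as ET

# Abridged copy of names.xml (two of the four people).
NAMES_XML = """
<people>
  <person><givenName>Joe</givenName><familyName>Johnson</familyName></person>
  <person><givenName>Jane</givenName><familyName>Johnson</familyName></person>
</people>
"""

people = ET.fromstring(NAMES_XML)  # the <people> document element

# Stand-in for /people/person: the person children of the people element.
persons = people.findall("person")

# givenName, evaluated once with each person as the context node.
given_names = [p.findall("givenName")[0].text for p in persons]
print(given_names)
```

The same relative path (givenName) yields a different result for each person, because the context node changes.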

Page 27: Introduction to XSLT

The skinny on XPath

• XPath is an expression language
  – The only thing you can do with XPath is write expressions
  – When we say “expression”, we mean “XPath expression”
• Every expression returns a value
  – XPath 1.0 has just four data types:
    • Node-set (the most important)
    • String
    • Number
    • Boolean
• All expressions are evaluated in a context
  – Understanding context is crucial to understanding XPath

Page 28: Introduction to XSLT

Path expressions

• Expressions that return node-sets are sometimes called path expressions

• A node-set is:
  – An unordered collection of zero or more nodes
• Every expression is evaluated relative to exactly one context node
  – The context node is analogous to the current directory in a filesystem
    • On a CLI, dir/* expands to all the files in the dir directory inside the current directory
    • As an XPath expression, dir/* would select all the element children of all the dir element children of the context node

Page 29: Introduction to XSLT

A filesystem analogy

• Addressing files:
  – Relative
    • dir/*
    • ../file
  – Absolute
    • /home/elenz/file.txt
• Addressing XML nodes:
  – Relative
    • body/p
    • ../table
  – Absolute
    • /html/body/p

Page 30: Introduction to XSLT

QUIZ 1: You have 5 minutes

• Ready?
• Set...

Page 31: Introduction to XSLT

Go! Use this cheat sheet

1) para selects the para element children of the context node

2) * selects all element children of the context node

3) node() selects all children of the context node

4) @name selects the name attribute of the context node

5) @* selects all the attributes of the context node

6) para[1] selects the first para child of the context node

7) para[last()] selects the last para child of the context node

8) */para selects all para grandchildren of the context node

9) /doc/chapter[5]/section[2] selects the second section of the fifth chapter of the doc

10) chapter//para selects the para element descendants of the chapter element children of the context node

11) //para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node

12) . selects the context node

13) .//para selects the para element descendants of the context node

14) .. selects the parent of the context node

15) title selects the title element children of the context node

16) ../@lang selects the lang attribute of the parent of the context node
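Several cheat-sheet entries can be checked with Python's standard-library xml.etree.ElementTree, which supports a subset of the abbreviated syntax. This is a sketch only: the document below is hypothetical and invented for illustration, and entries that need node(), @name as a result, or absolute paths are outside what ElementTree's findall can express.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this sketch.
doc = ET.fromstring(
    "<doc>"
    "<chapter><para>a</para><para>b</para></chapter>"
    "<chapter><para>c</para></chapter>"
    "<para>d</para>"
    "</doc>"
)

def texts(path):
    """Evaluate path with <doc> as the context node; return each hit's text."""
    return [e.text for e in doc.findall(path)]

print(texts("para"))                     # 1) para children of the context node
print(len(doc.findall("*")))             # 2) all element children
print(texts("*/para"))                   # 8) para grandchildren
print(texts("chapter//para"))            # 10) para descendants of chapter children
print(texts(".//para"))                  # 13) para descendants of the context node
print(texts("chapter[1]/para[last()]"))  # 6) and 7) positional predicates combined
```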

Page 32: Introduction to XSLT

XPath is all about trees

A venture into the abstract world of the XPath data model

Start filling out the NOTES page

Page 33: Introduction to XSLT

The XPath data model

• An abstraction of an XML document, after parsing
  – In XSLT, models the source tree, stylesheet tree, & result tree
• An XML document is a tree of nodes
• There are 7 kinds of nodes (memorize these!)
  – Root node
  – Element node
  – Attribute node
  – Text node
  – Comment node
  – Processing Instruction (PI) node
  – Namespace node

Page 34: Introduction to XSLT

Root nodes

• Every XML document has exactly one root node
  – An “invisible” container for the whole document
  – The XPath expression / selects the root node of the same document as the context node
• The root node is not an element
  – Instead, the “document element” or “root element” is a child of the root node
• It can also contain:
  – Processing instruction (PI) nodes
  – Comment nodes
• XSLT extension to XPath data model:
  – Root node may contain text nodes
  – Root node may contain more than one element node

Page 35: Introduction to XSLT

Element nodes

• There is one element node for each element that appears in a document. (Duh.)
• Example: <foo><bar/></foo>
  – There are two element nodes above: foo and bar.
  – The foo element contains the bar element node.
• Element nodes can contain:
  – Text nodes
  – Other element nodes
  – Comment nodes
  – Processing instruction (PI) nodes

Page 36: Introduction to XSLT

Node property: children

• Applies only to:
  – Element nodes
  – Root nodes
• Consists of:
  – Ordered list of zero or more other nodes
• 4 kinds of nodes can be children (memorize this subset!)
  – Element nodes
  – Text nodes
  – Comment nodes
  – Processing instruction (PI) nodes
• Instead of “Lions, Tigers, and Bears, Oh My”, chant:
  – “Elements, comments, text, PIs! Elements, comments, text, PIs!”
• Example: <foo><bar/> <!-- hi --> </foo>
  – The foo element's children consist of four nodes in order:
    • 1) element, 2) text, 3) comment, 4) text
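The four-node answer can be reproduced with Python's standard-library xml.dom.minidom. This is only a sketch: the DOM is a different data model from XPath's, but for this small example it reports the same ordered list of children.

```python
from xml.dom import minidom, Node

doc = minidom.parseString("<foo><bar/> <!-- hi --> </foo>")
foo = doc.documentElement

# Map DOM node-type constants to the names used on the slide.
KIND = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "PI",
}

# The children property is an ordered list, so order is preserved here.
child_kinds = [KIND[child.nodeType] for child in foo.childNodes]
print(child_kinds)
```

The two text children are the single spaces on either side of the comment.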

Page 37: Introduction to XSLT

Why should I memorize that subset of four?

• Knowing what types of nodes can be children is crucial to understanding what this little, unassuming instruction does (as we shall see):
  – <xsl:apply-templates/>
• So remember:
  – “Elements, comments, text, PIs!”
  – “Elements, comments, text, PIs!”

Page 38: Introduction to XSLT

How to access the children

• Use the child axis, e.g. (in non-abbreviated form):
  – child::node()
    • Selects all children of the context node
  – child::*
    • Selects all child elements of the context node
  – child::paragraph
    • Selects all child elements named paragraph
  – child::xyz:foo
    • Selects all child elements named foo in the namespace designated by the xyz prefix
  – child::xyz:*
    • Selects all child elements that are in the namespace designated by the xyz prefix
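A sketch of the namespace-qualified cases in Python's xml.etree.ElementTree, assuming a hypothetical document and a placeholder namespace URI. ElementTree accepts only the abbreviated form (xyz:foo rather than child::xyz:foo) and takes the prefix bindings as a dictionary.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://example.com/ns"  # placeholder URI for illustration

# Hypothetical document: the prefix used in the document is "p".
doc = ET.fromstring(
    f'<doc xmlns:p="{NS_URI}"><p:foo/><foo/><p:bar/></doc>'
)

# The prefix in the expression (xyz) need not match the document's prefix (p);
# only the namespace URI that the prefix is bound to matters.
bindings = {"xyz": NS_URI}

qualified = doc.findall("xyz:foo", bindings)   # like child::xyz:foo
unqualified = doc.findall("foo")               # like child::foo (no namespace)

# Like child::xyz:* : element children whose expanded-name has this URI part.
in_namespace = [e for e in doc if e.tag.startswith("{" + NS_URI + "}")]
```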

Page 39: Introduction to XSLT

Attribute nodes

• There is one attribute node for each attribute that appears in a document. (Duh again.)
• Example: <foo bar="bat" bang="baz"/>
  – There are two attribute nodes in the above example:
    • bar and bang

Page 40: Introduction to XSLT

Node property: attributes

• Applies only to:
  – Element nodes
• Consists of:
  – Unordered list of zero or more attribute nodes
• For example:
  – <doc lang="en"/>
    • The doc element's attributes property consists of one lang attribute

Page 41: Introduction to XSLT

How to select attributes

• Use the attribute axis, e.g. (in abbreviated form):
  – @lang
    • Selects the attribute named lang
  – @* or @node()
    • Selects all attributes of the context node
  – @abc:foo
    • Selects the attribute named foo in the namespace designated by the abc prefix
  – @abc:*
    • Selects all attributes that are in the namespace designated by the abc prefix
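For comparison, a sketch of attribute access in Python's xml.etree.ElementTree, using a hypothetical document. ElementTree exposes attributes as a dictionary on the element rather than as nodes, so @lang can only be used inside a predicate there; the dictionary lookups below are rough analogues, not XPath evaluation.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this sketch.
doc = ET.fromstring('<doc lang="en" version="2"><para lang="fr"/><para/></doc>')

lang = doc.get("lang")        # analogue of @lang on the context node
all_attrs = dict(doc.attrib)  # analogue of @*: every attribute of the context node

# Attribute tests inside predicates are supported by findall:
with_lang = doc.findall("para[@lang]")    # para children that have a lang attribute
french = doc.findall("para[@lang='fr']")  # para children whose lang is "fr"
```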

Page 42: Introduction to XSLT

Text nodes

• There is one text node for each contiguous sequence of character data in a document
• Text nodes are never adjacent siblings to each other
  – Adjacent text nodes are always automatically merged into one text node (e.g., when creating the result tree in XSLT)
• Lexical details are thrown away
  – The XPath data model knows nothing about:
    • CDATA sections, entity references, or character references
• Example: <foo>&lt;</foo>
  – There is one text node in the above document (a < character)
• Example: <foo><![CDATA[<]]></foo>
  – Identical to the first example, as far as XPath is concerned
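The same loss of lexical detail is visible in most parsed representations. As a small sketch, Python's xml.etree.ElementTree reports identical text content for the entity reference, the CDATA section, and (added here for illustration) a numeric character reference.

```python
import xml.etree.ElementTree as ET

escaped = ET.fromstring("<foo>&lt;</foo>")          # entity reference
cdata = ET.fromstring("<foo><![CDATA[<]]></foo>")   # CDATA section
char_ref = ET.fromstring("<foo>&#60;</foo>")        # character reference

# After parsing, all three elements contain the same single character.
print(escaped.text == cdata.text == char_ref.text)
```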

Page 43: Introduction to XSLT

Text node quiz

• Example:

<foo>
  <bar>Hello world.</bar>
</foo>

• How many text nodes are in the above document?

Page 44: Introduction to XSLT

Text node quiz: ANSWER

• Example:

<foo>
  <bar>Hello world.</bar>
</foo>

• How many text nodes in the above document?
• ANSWER: 3
  – 1: Linefeed, space, space
  – 2: Hello world.
  – 3: Linefeed
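The count can be checked with a short Python sketch using the standard-library xml.dom.minidom, assuming the example is laid out on three lines with a two-space indent. The helper name text_nodes is made up for this sketch.

```python
from xml.dom import minidom, Node

# The quiz document: a linefeed and two spaces before <bar>, a linefeed after it.
XML = "<foo>\n  <bar>Hello world.</bar>\n</foo>"

def text_nodes(node):
    """Return the data of every text node below node, in document order."""
    found = []
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            found.append(child.data)
        else:
            found.extend(text_nodes(child))
    return found

texts = text_nodes(minidom.parseString(XML))
print(len(texts), texts)
```

Whitespace-only text nodes are easy to overlook, but they are nodes like any other.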

Page 45: Introduction to XSLT

How to select text nodes

• Use the text() node test:
  – text()
    • Short for child::text()
  – descendant::text()
    • Selects all text nodes that are descendants of the context node

Page 46: Introduction to XSLT

Comment nodes

• There is one comment node for each comment
• Example:
  – <!--This is a comment node-->

Page 47: Introduction to XSLT

How to select comments

• Use the comment() node test on the child axis:
  – comment()
    • Short for child::comment()

Page 48: Introduction to XSLT

Processing instruction (PI) nodes

• There is one PI node for each PI
• The XML declaration is not a PI
  – <?xml version="1.0"?> is not a PI
  – (It's not a node at all but just a lexical detail that XPath knows nothing about.)
• Example:
  – (This is a PI.)
    • <?xml-stylesheet type="text/xsl" href="a.xsl"?>

Page 49: Introduction to XSLT

How to select processing instructions

• Use the processing-instruction() node test
  – Any PI:
    • processing-instruction()
      – Selects all PI children of the context node
      – Short for child::processing-instruction()
  – PI with a specific target:
    • processing-instruction('xml-stylesheet')
      – Selects all xml-stylesheet processing instruction children of the context node

Page 50: Introduction to XSLT

Namespace nodes

• There is one namespace node for each in-scope namespace URI/prefix binding for each element in a document. (No duh... er... what?)
  – Always includes this (implicit) binding (used by reserved attributes xml:lang and xml:space, etc.):
    • Prefix: “xml”
    • URI: “http://www.w3.org/XML/1998/namespace”
  – Example: <foo/>
    • There is one namespace node in the above document
  – Example: <foo xmlns="http://example.com"/>
    • There are two namespace nodes in the above document
      – The implicit xml one (see above)
      – And this one:
        » Prefix: “”
        » URI: “http://example.com”

Page 51: Introduction to XSLT

Node property: namespace nodes

• As with the attributes property, applies only to:
  – Element nodes
• Consists of:
  – Unordered list of zero or more namespace nodes
• For example:
  – <foo xmlns:xyz="http://example.com"/>
    • The foo element's namespace nodes property consists of two namespace nodes (one for xyz and one for xml)

Page 52: Introduction to XSLT

How to select namespace nodes

• Use the namespace axis:
  – namespace::*
    • Selects all of the context node's namespace nodes
  – namespace::node()
    • Same as above
  – namespace::xyz
    • Selects the context node's namespace node that declares the xyz prefix.

Page 53: Introduction to XSLT

Node property: parent

• Applies to:
  – All node types except root node:
    • Element nodes
    • Text nodes
    • Comment nodes
    • Processing instruction (PI) nodes
    • Attribute nodes
    • Namespace nodes
• Consists of:
  – Exactly one other node
    • Root node, or
    • Element node
• Example: <foo bar="bat"/>
  – The bar attribute's parent is the foo element

Page 54: Introduction to XSLT

How to access the parent node

• Use the parent axis:
  – ..
    • Selects the parent node of the context node
    • Short for parent::node()
  – parent::doc
    • Selects the parent node of the context node provided that it is an element named doc
      – (otherwise returns an empty node-set)
  – parent::*
    • Selects the parent node of the context node provided that it is an element

Page 55: Introduction to XSLT

A riddle

• “You have a parent but you are not a child.”
• “What are you?”

Page 56: Introduction to XSLT

A riddle, cont.

• “You have a parent but you are not a child.”
• “What are you?”
  – Hint:
    • Only 4 node types are children, but 6 node types have parents
    • 6 - 4 = 2 ...

Page 57: Introduction to XSLT

The answer

• “You have a parent but you are not a child.”
• “What are you?”
  – Hint:
    • 4 node types are children, but 6 node types have parents
    • 6 - 4 = 2 ...
• ANSWER:
  – A namespace node or an attribute node of course!
• Embracing the asymmetry and moving on...

Page 58: Introduction to XSLT

Derived node relationships

• A node's descendants consist of the transitive closure of the children property
  – A fancy way of saying:
    • My children and my grandchildren and my great grandchildren and their kids and so on
• A node's ancestors consist of the transitive closure of the parent property
  – A fancy way of saying:
    • My parent and my grandparent and my great grandparent and its parent and so on

Page 59: Introduction to XSLT

Shooting blanks is okay

• QUIZ: How many nodes will each of the following expressions return?
  – parent::comment()
  – attribute::text()
  – ancestor::processing-instruction()
  – namespace::xyz:*

Page 60: Introduction to XSLT

Shooting blanks is okay

• QUIZ: How many nodes will each of the following expressions return?
  – parent::comment()
  – attribute::text()
  – ancestor::processing-instruction()
  – namespace::xyz:*
• ANSWER: 0, by definition
  – These expressions are perfectly legal; they're just guaranteed to return empty

Page 61: Introduction to XSLT

Node property: string-value

• Applicable to:
  – All node types
    • Root: concatenation of all descendant text node string-values
    • Element: concatenation of all descendant text node string-values
    • Attribute: normalized attribute value
    • Text: character data (always at least one character)
    • Comment: the content of the comment
    • PI: text following the PI target and whitespace
      – e.g., type="text/xsl" href="style.xsl" is the string-value of an example stylesheet PI
    • Namespace node: the namespace URI
  – Use <xsl:value-of/> to insert the string-value of a node into the result tree
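The element rule is the one that most often surprises people, so here is a small sketch of it in Python's xml.etree.ElementTree, using a paragraph from the earlier article.xml example. Joining itertext() concatenates the text of the element and all its descendants in document order, which matches the definition of an element's string-value.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring(
    "<para>This is the <emphasis>first</emphasis> paragraph.</para>"
)

# String-value of an element: every descendant text node, concatenated
# in document order. The markup itself contributes nothing.
string_value = "".join(para.itertext())
print(string_value)
```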

Page 62: Introduction to XSLT

Node property: expanded-name

• Applicable to:
  – Elements and attributes:
    • Local part: local name of node, returned by local-name()
    • URI part: namespace name (URI) of node, returned by namespace-uri()
  – PIs:
    • Local part: the PI target, e.g., xml-stylesheet
    • URI part: (always null)
  – Namespace nodes:
    • Local part: the namespace prefix, e.g., xml or xyz or empty string (“”) in the case of a default namespace
    • URI part: (always null)
• Root, text, and comment nodes do not have names

Page 63: Introduction to XSLT

Document order

• There is an ordering for all nodes in a document called document order
• The root node is always the first node in a document
• The rest are ordered according to where their XML representation begins
  – Except that the relative order of attributes and namespace nodes on the same element is implementation-defined
• Why should I care about document order?
  – Because it's the default order in which nodes are processed by both <xsl:for-each> and <xsl:apply-templates>

Page 64: Introduction to XSLT

Quiz: counting nodes

• How many nodes are in the following XML document?

Page 65: Introduction to XSLT

Answer

• How many nodes are in the following XML document?
• 15!

Page 66: Introduction to XSLT

The first 14 nodes in the QUIZ 1 example

Page 67: Introduction to XSLT

Quiz review

• para
  – Short for child::para
• *
  – Short for child::*
• node()
  – Short for child::node()
• @name
  – Short for attribute::name
• @*
  – Short for attribute::*

Page 68: Introduction to XSLT

Quiz review

• para[1]
  – Short for child::para[1]
  – Equivalent to child::para[position() = 1]
• para[last()]
  – Short for child::para[last()]
  – Equivalent to child::para[position() = last()]
• */para
  – Short for child::*/child::para
• /doc/chapter[5]/section[2]
  – Short for /child::doc/child::chapter[5]/child::section[2]

Page 69: Introduction to XSLT

Quiz review

• chapter//para
  – Short for:
    – child::chapter/descendant-or-self::node()/child::para
• //para
  – Short for:
    – /descendant-or-self::node()/child::para
• .
  – Short for:
    – self::node()
• .//para
  – Short for:
    – self::node()/descendant-or-self::node()/child::para

Page 70: Introduction to XSLT

Quiz review

• ..
  – Short for parent::node()
• title
  – Short for child::title
• ../@lang
  – Short for parent::node()/attribute::lang

Page 71: Introduction to XSLT

Summary of abbreviations

• XPath has five abbreviations. They are:
  – . is short for self::node()
  – .. is short for parent::node()
  – @ is short for attribute::
  – // is short for /descendant-or-self::node()/
  – foo is short for child::foo

Page 72: Introduction to XSLT

XPath, the language

Descending from the clouds

Keep filling out that NOTES page

Page 73: Introduction to XSLT

XPath basics: review

• XPath is an expression language
• Every expression returns a value
  – XPath 1.0 has just four data types (write these down!):
    • Node-set (the most important)
    • String
    • Number
    • Boolean
• All expressions are evaluated in a context
  – Understanding context is crucial to understanding XPath

Page 74: Introduction to XSLT

XPath context

• All XPath expressions (whether in XSLT or not) are evaluated in a context
  – The context consists of 6 parts:
    • The context node
    • The context size (an integer 1 or higher)
      – Returned by the last() function
    • The context position (an integer 1 or higher)
      – Returned by the position() function
    • A set of namespace/prefix declarations in scope for the expression
      – Used to evaluate QNames in the expression, e.g., xyz:foo/xyz:bar
    • A set of variable bindings
    • A function library

Page 75: Introduction to XSLT

XPath context, cont.

• The context comprises the entire world for an XPath expression, so to speak.
  – Other than its context, there is no input to an XPath expression. The context consists of everything outside the expression itself that may affect the resulting value of the expression.
• The context indicates:
  – Where you are
    • Where in the tree:
      – Context node
    • Where in processing:
      – Context size (the size of an arbitrary list of nodes being processed)
      – Context position (the position of the current node in that list)
  – What is available to you
    • What variables you can reference
    • What namespace prefixes you can use
    • What functions you can call

Page 76: Introduction to XSLT

XPath syntax overview

• XPath supports these kinds of expressions:
  • Variable references:
    – $foo, $bar, etc.
  • Function calls:
    – starts-with($str,'a')
    – true()
    – round($num)
  • Parenthesized expressions
    – (//para)
    – (foo | bar)
  • String literals
    – "foo", 'bar', etc.
  • Numbers
    – 13, 24.7, .007, etc.
  • cont...

Page 77: Introduction to XSLT

XPath syntax overview, cont.

• Node-set expressions
  – /html/body/p[2]/text()
  – //@person | //person
  – (.//note | para/fnote)[1]
  – $ns[@id='xyz']
• Arithmetic expressions
  – (($x - 5) * 2) div -3
  – $pos mod 2
• Boolean expressions
  – $is-good and $is-valid
  – $x >= 4
  – position() != last()

Page 78: Introduction to XSLT

Node-set expressions

• A node-set is:
  • An unordered collection of zero or more nodes
• Node-set expressions include:
  • Location paths (the most important kind of expression!)
    – foo/bar[3]
  • Union expressions (union of two node-set expressions using the | operator)
    – $set | foo/bar[3]
  • Filtered expressions (a predicate applied to any expression using the predicate operator)
    – $set[.='good']
  • Path expressions (any expression composed with a location path using the / or // operators)
    – $set//bar

Page 79: Introduction to XSLT

Location paths: a formal definition

• A location path:
  – Is the most important kind of XPath expression
  – Returns a node-set
  – Can be absolute or relative
  – Relative:
    • One or more steps separated by /
      – foo
      – foo/bar
  – Absolute:
    • /
      – Selects the root node of the document that contains the context node
      – The only location path that doesn't have any steps in it
    • / followed by a relative location path
      – /foo
      – /foo/bar

Page 80: Introduction to XSLT

Location path steps

• A location path step has 3 parts:
  – An axis specifier
  – A node test
  – Zero or more predicates
• For example:
  – child::paragraph[string-length(.) > 100]
• The above is equivalent to this abbreviated form:
  – paragraph[string-length(.) > 100]
  – (because child is the default axis)
• It selects each paragraph child whose string-value is more than 100 characters long

Page 81: Introduction to XSLT

How a step is evaluated

• Moving from left to right:
  1. The axis identifies a set of nodes relative to the context node.
  2. The node test acts as a filter on that set.
  3. Each of any number of optional predicates in turn acts as a filter on the set identified by the preceding predicates and node test to its left.
• For example:
  – child::paragraph[string-length(.)>100]
    1. The child axis identifies all the children of the context node.
    2. Among those, the paragraph node test selects only the elements named “paragraph”.
    3. Among those, the string-length(.)>100 predicate filters out all but the nodes whose string-value is more than 100 characters long.
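The three stages can be imitated by hand. The sketch below is written in Python with xml.etree.ElementTree and a hypothetical section document; it is not how an XPath processor is implemented, and it only sees element children because ElementTree does not model text, comment, or PI children as nodes.

```python
import xml.etree.ElementTree as ET

long_text = "x" * 150  # filler longer than 100 characters

# Hypothetical context node with three element children.
section = ET.fromstring(
    f"<section><title>{long_text}</title>"
    f"<paragraph>short</paragraph>"
    f"<paragraph>{long_text}</paragraph></section>"
)

def string_value(elem):
    """String-value of an element: its descendant text, concatenated."""
    return "".join(elem.itertext())

# 1. Axis: child:: identifies the children of the context node.
on_axis = list(section)
# 2. Node test: keep only the elements named "paragraph".
named = [n for n in on_axis if n.tag == "paragraph"]
# 3. Predicate: keep only those whose string-value exceeds 100 characters.
selected = [n for n in named if len(string_value(n)) > 100]
```

Note that the long title survives step 1 but is dropped by the node test in step 2, before the predicate is ever applied to it.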

Page 82: Introduction to XSLT

The axis

• That's this part: child:: (in child::paragraph[string-length(.) > 100])
• Can be any one of 13 axes:
  – child::
  – self::
  – parent::
  – descendant::
  – descendant-or-self::
  – ancestor::
  – ancestor-or-self::
  – following::
  – following-sibling::
  – preceding::
  – preceding-sibling::
  – attribute::
  – namespace::

Page 83: Introduction to XSLT

The 13 XPath axes

• What each axis contains:
  – child
    • The children of the context node.
  – descendant
    • The descendants of the context node (children, children's children, etc.).
  – parent
    • The parent of the context node (empty if context node is root node).
  – ancestor
    • The ancestors of the context node (parent, parent's parent, etc.).

Page 84: Introduction to XSLT

The 13 XPath axes, cont.

• What each axis contains, cont.:
  – attribute
    • The attributes of the context node (empty if context node is not an element).
  – namespace
    • The namespace nodes of the context node (empty if context node is not an element).
  – self
    • Just the context node itself.
  – descendant-or-self
    • The context node and descendants of the context node.
  – ancestor-or-self
    • The context node and ancestors of the context node.

Page 85: Introduction to XSLT

The 13 XPath axes, cont.

• What each axis contains, cont.:
  – following-sibling
    • All nodes with the same parent as the context node that come after the context node in document order (empty if context node is an attribute or namespace node).
  – preceding-sibling
    • All nodes with the same parent as the context node that come before the context node in document order (excluding attributes and namespace nodes).
  – following
    • All nodes after the context node in document order, excluding descendants, attributes, and namespace nodes.
  – preceding
    • All nodes before the context node in document order, excluding ancestors, attributes, and namespace nodes.

Page 86: Introduction to XSLT

A little observation about siblings

• The types of nodes that can be siblings are the same as the types of nodes that can be children
  – “Elements, comments, text, PIs!”
• That's because attributes and namespace nodes are by definition not siblings to anyone, not even to each other. They're just “attached” to their parent.

Page 87: Introduction to XSLT

The node test

• That's this part: paragraph (in child::paragraph[string-length(.) > 100])
• A node test is a filter on an axis
  – There are two kinds of node test:
    • Node type tests
      – Any node: node()
      – Specific node type: text(), comment(), processing-instruction()
      – Specific PI target: processing-instruction('foo')
    • Name tests
      – Wildcard (any name): *
      – Namespace-qualified wildcard (any local name within a particular namespace): xyz:*, abc:*, etc.
      – QName (a specific expanded-name): foo, xyz:foo, the highlighted example, etc.
  – Node type tests are not functions
    • They're special forms that only happen to look like functions

Page 88: Introduction to XSLT

Node type tests

• The most inclusive node test: node()
  – It includes every node regardless of its name or node type
    • Thus, it effectively selects all nodes on the given axis
      – child::node() selects all children
      – ancestor::node() selects all ancestors
      – etc.
• Specific node type
  – text(), comment(), processing-instruction()
    • Selects only the nodes of the given type from the given axis
      – descendant::text() selects all descendant text nodes
      – following::comment() selects all following comment nodes
      – preceding::processing-instruction() selects all preceding PI nodes
• Specific PI target
  – child::processing-instruction('xml-stylesheet')

Page 89: Introduction to XSLT

Name tests

• Name tests only select nodes of one type at a time, depending on the axis
  – This is called the principal node type for an axis
    • Attributes while on the attribute axis
    • Namespace nodes while on the namespace axis
    • Element nodes while on every other axis
• The wildcard: *
  – Selects all nodes of the principal node type
    • child::* selects all element nodes on the child axis
    • attribute::* selects all attribute nodes on the attribute axis
      – effectively no different than attribute::node() because the attribute axis can only ever contain attribute nodes

Page 90: Introduction to XSLT

Name tests, cont.

• Namespace-qualified wildcards: xyz:*
  – Selects all nodes of the principal node type whose expanded-name has a particular URI part
    • child::xyz:* selects all element nodes on the child axis that are in the namespace designated by the xyz prefix
• QNames: foo, xyz:foo, etc.
  – Selects all nodes of the principal node type whose expanded-name has a particular local part and a particular URI part
    • child::foo selects all element nodes on the child axis that have local name foo and that are not in a namespace
    • ancestor::xyz:foo selects all element nodes on the ancestor axis that have local name foo and that are in the namespace designated by the xyz prefix

Page 91: Introduction to XSLT

How multiple steps are evaluated

• The rightmost step indicates what nodes are returned:
  – table/tr/td
    • The above location path returns a node-set of zero or more td elements
  – /doc/section[5]/para[text() and @*]
    • What does the above location path return?

• Each step is evaluated once for each node returned by the step to its left, using that node as the context node for the evaluation

• The result is the union of the node-sets returned by all the evaluations of the rightmost step
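A sketch of this evaluation rule in Python, limited to child-axis name steps. The evaluate helper and the table document are made up for illustration; the path tr/td is evaluated with the table element as the context node, and the result is compared against ElementTree's own findall.

```python
import xml.etree.ElementTree as ET

# Hypothetical table with two rows.
table = ET.fromstring(
    "<table>"
    "<tr><td>1</td><td>2</td></tr>"
    "<tr><td>3</td></tr>"
    "</table>"
)

def evaluate(context_nodes, step_names):
    """Evaluate a chain of child-axis name steps, left to right."""
    current = list(context_nodes)
    for name in step_names:
        next_nodes = []
        for node in current:          # each node in turn becomes the context node
            for child in node:        # child axis
                # Node test, plus a membership check so the union has no duplicates.
                if child.tag == name and child not in next_nodes:
                    next_nodes.append(child)
        current = next_nodes          # input to the next step
    return current

cells = evaluate([table], ["tr", "td"])
print([c.text for c in cells])
```

The td step runs twice here, once per tr, and the two partial results are merged into one node-set.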

Page 92: Introduction to XSLT

Predicates

• A predicate filters a node-set to produce a new node-set
  – price[. > 5]
    • Of all the price child elements, return only those whose string-value, when converted to a number, is greater than 5
  – The predicate expression is evaluated once for each node in the node-set to be filtered, using that node as the context node for the evaluation
    • The result is converted to a boolean (if necessary)
    • If true, the node is retained in the result
    • If false, the node is excluded from the result
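The filtering rule for price[. > 5] can be written out as a loop. This is a sketch in Python using a hypothetical item element; ElementTree's findall cannot evaluate numeric comparisons, so the predicate is applied by hand with float() standing in for XPath's string-to-number conversion.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node with three price children.
item = ET.fromstring(
    "<item><price>3</price><price>7.50</price><price>12</price></item>"
)

# price[. > 5]: evaluate the predicate once per price child, with that
# child as the context node, and keep the nodes for which it is true.
kept = [p for p in item.findall("price") if float(p.text) > 5]
print([p.text for p in kept])
```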

Page 93: Introduction to XSLT

Numeric predicates

• When the predicate expression evaluates to a number
  – It is interpreted in a special way, such that:
    • foo[5] is short for foo[position()=5]
    • foo[last()] is short for foo[position()=last()]

Page 94: Introduction to XSLT

Context size in predicates

• The last() function returns the context size
  – The number of nodes returned by the step (or arbitrary node-set expression) to its left
  – foo[last()] evaluates to foo[5] if there are a total of 5 foo elements

Page 95: Introduction to XSLT

Context position in predicates

• The position() function returns the context position
  – The proximity position of the context node for the current predicate evaluation
    • The relative position of the node among all the nodes being filtered in document order
  – foo[5] returns the 5th foo element in document order
  – Unless the step uses one of the four reverse axes:
    – preceding
    – preceding-sibling
    – ancestor
    – ancestor-or-self
    • ancestor::node()[1] is equivalent to ..
    • preceding-sibling::foo[1] returns the first foo element in reverse document order (that is, the nearest preceding foo sibling)

Page 96: Introduction to XSLT

“Step filters” vs. “Expression filters”

• A predicate can be used to filter two different things:
  – A location path step
  – An arbitrary node-set expression
    • $node-set[@foo='bar']
• Gotchas
  – //para[1] vs. (//para)[1]
  – ancestor::*[1] vs. (ancestor::*)[1]
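
The two gotchas can be sketched with a small stylesheet. The doc/section/para input shown in the comment is an assumption made for illustration; the expressions themselves are the ones listed on the slide.

```xml
<!-- Hypothetical input:
     <doc>
       <section><para>A</para><para>B</para></section>
       <section><para>C</para></section>
     </doc> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Step filter: the first para child of each parent (A and C), so 2 -->
    <xsl:value-of select="count(//para[1])"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Expression filter: the first para in the whole document (A only), so 1 -->
    <xsl:value-of select="count((//para)[1])"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Make para B the context node, then compare the two ancestor forms -->
    <xsl:for-each select="//para[2]">
      <!-- Reverse axis inside a step: position 1 is the nearest ancestor (section) -->
      <xsl:value-of select="name(ancestor::*[1])"/>
      <xsl:text>&#10;</xsl:text>
      <!-- Parenthesized node-set: position 1 is the first in document order (doc) -->
      <xsl:value-of select="name((ancestor::*)[1])"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The difference comes from what the predicate filters: inside a step it counts positions along the step's axis, while after parentheses it counts positions in the finished node-set, which is always in document order.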

Page 97: Introduction to XSLT

Comparisons with node-sets

• price > 20
  – True if there are any price element children whose string-value when converted to a number is greater than 20
• foo[bar = bat]
  – Select all foo elements that have any bar element child and any bat element child that have the same string-value
• Comparisons with empty node-sets always return false
  – Gotcha:
    • foo != 2
    • foo != bar
  – Use not() for the true complement
    • not(foo = 2)
    • not(foo = bar)
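
A minimal sketch of the != gotcha follows. The item input and the absent bar children are assumptions chosen so that both the multi-node case and the empty node-set case show up.

```xml
<!-- Hypothetical input: <item><foo>2</foo><foo>3</foo></item> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/item">
    <!-- true: at least one foo (the 3) is not equal to 2 -->
    <xsl:value-of select="foo != 2"/>
    <xsl:text>&#10;</xsl:text>
    <!-- false: at least one foo equals 2, so foo = 2 is true -->
    <xsl:value-of select="not(foo = 2)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- false: item has no bar children, and any comparison
         with an empty node-set is false -->
    <xsl:value-of select="bar != 2"/>
    <xsl:text>&#10;</xsl:text>
    <!-- true: bar = 2 is false for the empty node-set -->
    <xsl:value-of select="not(bar = 2)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

In both cases foo != 2 and not(foo = 2) disagree, which is why not() is the safer way to express "no node equals this value".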

Page 98: Introduction to XSLT

Functions overview

• String functions
  • string(), concat(), starts-with(), contains(), substring-before(), substring-after(), substring(), string-length(), normalize-space(), translate()
• Node-set functions
  • last(), position(), count(), id(), local-name(), namespace-uri(), name()
• Boolean functions
  • boolean(), not(), true(), false(), lang()
• Number functions
  • number(), sum(), floor(), ceiling(), round()
• XSLT adds:
  – document(), key(), generate-id(), system-property(), format-number(), current(), element-available(), function-available(), unparsed-entity-uri()

Page 99: Introduction to XSLT

XSLT element overview

Page 100: Introduction to XSLT

XSLT elements, by use case

Creating nodes: xsl:element, xsl:attribute, xsl:text, xsl:comment, xsl:processing-instruction
Copying nodes: xsl:copy-of, xsl:copy
Repetition (looping): xsl:for-each
Sorting: xsl:sort
Conditional processing: xsl:choose, xsl:if
Computing or extracting a value: xsl:value-of
Defining variables and parameters: xsl:variable, xsl:param
Defining and calling subprocedures (named templates): xsl:template, xsl:call-template
Defining and applying template rules: xsl:template, xsl:apply-templates, xsl:apply-imports
Numbering and number formatting: xsl:number, xsl:decimal-format
Debugging: xsl:message

Page 101: Introduction to XSLT

XSLT elements, cont.

Combining stylesheets (modularization): xsl:import, xsl:include
Compatibility: xsl:fallback
Building lookup indexes: xsl:key
XSLT code generation: xsl:namespace-alias
Output formatting: xsl:output
Whitespace stripping: xsl:strip-space, xsl:preserve-space

Page 102: Introduction to XSLT

XSLT's processing model

Page 103: Introduction to XSLT

The end: construct a result tree

Page 104: Introduction to XSLT

The means: process lists

• If XPath is about trees, then XSLT is about lists
  – Populate arbitrary nodes from the source tree into lists
  – Iterate over those lists
  – For each node in the list, create part of the result tree
• Source tree -> List processing -> Result tree
• Thus, there is always:
  – a current node list, and
  – a current node

Page 105: Introduction to XSLT

Two mechanisms for iterating over lists

• xsl:apply-templates and xsl:for-each
• They both iterate over the nodes of a given node-set
  – Supplied by the XPath expression in the select attribute
• For example:
  – <xsl:apply-templates select="para"/>
    • “Populate the current node list with para elements, sorted in document order. For each para element, invoke the best-matching template rule.”
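
To compare the two iteration mechanisms side by side, here is a small sketch that processes the same para node-set both ways. The doc/para input and the p/li output elements are assumptions made for illustration.

```xml
<!-- Hypothetical input: <doc><para>One</para><para>Two</para></doc> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/doc">
    <body>
      <!-- xsl:apply-templates: for each para, the processor picks
           the best-matching template rule (the one below) -->
      <xsl:apply-templates select="para"/>

      <!-- xsl:for-each: for each para, the content written
           right here is instantiated -->
      <ul>
        <xsl:for-each select="para">
          <li><xsl:value-of select="."/></li>
        </xsl:for-each>
      </ul>
    </body>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:stylesheet>
```

Both instructions populate the current node list from the select expression; they differ only in where the per-node behavior is defined.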

Page 106: Introduction to XSLT

All XSLT processing begins with...

• A virtual call to:
  – <xsl:apply-templates select="/"/>
• The current node list initially consists of just one node
  – The root node of the source tree
  – In other words, the XSLT processor invokes the template rule that matches the root node
• This call constructs the entire result tree
  – Nothing happens before it
  – Nothing happens after it

Page 107: Introduction to XSLT

Your job as an XSLT stylesheet author...

• ...is to define, using template rules, what happens when the XSLT processor executes this instruction:
  – <xsl:apply-templates select="/"/>

Page 108: Introduction to XSLT

Template rules

• An XSLT stylesheet contains a set of template rules
• Two kinds of template rule:
  – Those you define
  – Those that XSLT defines for you
    • These are called the built-in template rules.
• There is a built-in template rule for each of the 7 types of node
  – Ensures that a call to xsl:apply-templates never fails to find a matching template rule
    • Even if your stylesheet contains no explicit template rules at all

Page 109: Introduction to XSLT

The empty stylesheet

• Consider this stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

</xsl:stylesheet>

• If you apply the above stylesheet to the example XML from QUIZ 1...
  – What will the result be?

Page 110: Introduction to XSLT

The result

$ xsltproc empty.xsl quiz1.xml
<?xml version="1.0"?>

This is a simple XML document

You can do it! There's nothing to it! Go fast!

This will be interesting Here we go...

sub-chapter Who ever heard of nested chapters?!

another sub-chapter End of sub-chapter

No more nested chapters for now...

Page 111: Introduction to XSLT

Template rules that you define

• When you define template rules, you override the default behavior

• An explicit template rule is:
  – An xsl:template element that has a match attribute
• For example:

<xsl:template match="foo">
  <!-- construct part of the result tree -->
  <xsl:apply-templates/>
  <!--...-->
</xsl:template>

Page 112: Introduction to XSLT

Applying template rules

• <xsl:apply-templates/>
  – Short for:
    • <xsl:apply-templates select="node()"/>
• Process all child nodes of the context node

Page 113: Introduction to XSLT

Applying template rules: an OOP analogy

• <xsl:apply-templates/>
• For each item in the list
  – Invoke the same polymorphic function
• Each template rule is an implementation of that polymorphic function

Page 114: Introduction to XSLT

Patterns

• The value of the match attribute is a pattern
• Looks like an XPath expression
  – Uses a subset of XPath syntax
• But has a more passive role
  – Does the current node match this pattern? Yes or no.
• When xsl:apply-templates is invoked, for each node in the list, the XSLT processor searches all the patterns of the stylesheet for the best-matching one

Page 115: Introduction to XSLT

Example patterns

• Example patterns
  – /
  – /doc[@format='simple']
  – bar
  – foo/bar
  – section//para
  – @foo
  – @*
  – node()
  – text()
  – *
  – xyz:*

Page 116: Introduction to XSLT

Does the pattern match?

• Informal:
  – If this pattern were an expression, would the node in question ever be selected by it?
• Formal:
  – A node matches a pattern if the node is a member of the result of evaluating the pattern as an expression with respect to some possible context node.

Page 117: Introduction to XSLT

Template rules with multiple patterns

• Separate the alternative patterns with |
• <xsl:template match="foo | bar">...
  – Is short for:

<xsl:template match="foo">
  <!--...-->
</xsl:template>

<xsl:template match="bar">
  <!--...-->
</xsl:template>

Page 118: Introduction to XSLT

What about conflicts?

• A foo element would match both of these template rules

<xsl:template match="foo">
  <!--...-->
</xsl:template>

<xsl:template match="*">
  <!--...-->
</xsl:template>

• Which one gets invoked by <xsl:apply-templates select="foo"/>?

Page 119: Introduction to XSLT

Two steps to resolving conflicts

• When more than one template rule matches:
  1. Eliminate rules with lower import precedence.
  2. Eliminate rules with lower priority.
• Only one rule should be left; otherwise it is an error
  – A processor may signal the error or recover by choosing the rule that occurs last in the stylesheet
• Import precedence depends on what file the rule occurs in
  – Where it occurs in the import tree (via xsl:import)
• Priority depends on:
  – The priority attribute of the xsl:template element, or
  – The default priority (when the priority attribute is absent)

Page 120: Introduction to XSLT

Default priority

• Priority is a positive or negative decimal number
• The higher the number, the higher the priority
• There are four default priorities:
  – -.5
  – -.25
  – 0
  – .5

 -.5      -.25        0              .5
  |_________|_________|_______________|

Page 121: Introduction to XSLT

Default priority depends on...

• ...the syntax of the match pattern
• The most common pattern format has a priority of 0
• 0
  – Match a particular name
    • foo, xyz:foo, @foo, @xyz:foo, processing-instruction('foo')
• .5
  – The highest default priority
  – Any pattern with a predicate or multiple steps
    • foo/bar, foo[2], foo[@good='yes']

Page 122: Introduction to XSLT

The lower default priorities are...

• -.25
  – One-step wildcards within a namespace
    • xyz:*, @xyz:*
• -.5
  – The lowest default priority
  – One-step wildcards regardless of name
    • *, @*, text(), comment(), processing-instruction(), node()
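
The default priorities can be seen at work in a sketch like the one below, where three rules compete for the same elements. The doc/foo/bar input and the output strings are assumptions made for illustration.

```xml
<!-- Hypothetical input: <doc><foo/><foo good="yes"/><bar/></doc> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="doc/*"/>
  </xsl:template>

  <!-- Default priority .5: the pattern has a predicate -->
  <xsl:template match="foo[@good='yes']">
    <xsl:text>foo with predicate&#10;</xsl:text>
  </xsl:template>

  <!-- Default priority 0: a plain name test -->
  <xsl:template match="foo">
    <xsl:text>foo by name&#10;</xsl:text>
  </xsl:template>

  <!-- Default priority -.5: a one-step wildcard -->
  <xsl:template match="*">
    <xsl:text>wildcard&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The first foo matches two rules and gets the name-test rule, the second foo matches all three and gets the predicate rule, and bar only matches the wildcard. No priority attribute is needed because the default priorities already separate the candidates.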

Page 123: Introduction to XSLT

Modes

• Modes allow you to process the same node again but do something different this time
  – <xsl:apply-templates select="heading" mode="toc"/>
  – <xsl:template match="heading" mode="toc">...
• When the mode attribute is absent, that means the default (unnamed) mode
• You can segment your template rules into sets organized by concern
  – What they generate in the result tree
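
A possible shape for the table-of-contents use of modes is sketched here. The doc/heading/para input and the HTML-like output elements are assumptions for illustration; only the mode="toc" usage comes from the slide.

```xml
<!-- Hypothetical input:
     <doc><heading>Intro</heading><para>Text</para><heading>Details</heading></doc> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/doc">
    <html>
      <body>
        <!-- First pass over the headings: build the table of contents -->
        <ul>
          <xsl:apply-templates select="heading" mode="toc"/>
        </ul>
        <!-- Second pass, default mode: render the document body -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Used only when templates are applied in the toc mode -->
  <xsl:template match="heading" mode="toc">
    <li><xsl:value-of select="."/></li>
  </xsl:template>

  <!-- Used in the default (unnamed) mode -->
  <xsl:template match="heading">
    <h2><xsl:value-of select="."/></h2>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:stylesheet>
```

Each heading is processed twice, once per mode, and the two heading rules never conflict because a rule is only considered when its mode matches the mode of the xsl:apply-templates call.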

Page 124: Introduction to XSLT

The built-in template rules

• For elements and root nodes
  – Apply templates to children:

<xsl:template match="/ | *">
  <xsl:apply-templates/>
</xsl:template>

• For text nodes and attribute nodes
  – Output the string-value of the node:

<xsl:template match="text() | @*">
  <xsl:value-of select="."/>
</xsl:template>

Page 125: Introduction to XSLT

The built-in template rules

• For processing instructions and comments:
  – Do nothing

<xsl:template match="comment() | processing-instruction()"/>

• For namespace nodes
  – Do nothing

Page 126: Introduction to XSLT

Template rule content

• Three kinds of elements:
  – XSLT instructions
    • Any element in the XSLT namespace, e.g., <xsl:value-of/>
  – Literal result elements
    • Any element in any other namespace, or no namespace
    • Creates a shallow copy of itself in the result tree
  – Extension elements
    • Any element in a namespace that's declared as an extension namespace (using the extension-element-prefixes attribute on the xsl:stylesheet element)

Page 127: Introduction to XSLT

Attribute value templates

• Attributes on literal result elements can contain dynamic values, delimited by curly braces
  – <para class="{@format}">...
• To include a literal curly brace, double it:
  – <foo bar="{{not interpreted as XPath}}"/>
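
Both rules for attribute value templates can be tried in one literal result element, as in this sketch. The doc/para input, the p output element, and the title attribute are assumptions made for illustration.

```xml
<!-- Hypothetical input: <doc><para format="note">Hello</para></doc> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="para">
    <!-- {@format} and {concat(...)} are evaluated as XPath;
         the doubled braces are emitted as single literal braces -->
    <p class="{@format}" title="{{literal}} {concat('format: ', @format)}">
      <xsl:apply-templates/>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

For the assumed input, the p element should come out with class="note" and title="{literal} format: note". The doc element itself has no explicit rule here, so the built-in rule handles it and applies templates to its children.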

Page 128: Introduction to XSLT

Miscellaneous topics...

Page 129: Introduction to XSLT
Page 130: Introduction to XSLT

The template rule engine

• A lot goes on behind-the-scenes
• xsl:apply-templates is the most important instruction in XSLT
• <xsl:apply-templates select="para"/>
  – This means: “Apply templates to the para element children of the context node.”

Page 131: Introduction to XSLT

But what does “apply templates” mean?

• <xsl:apply-templates select="para"/>
  – Let the para node-set populate the current node list (in document order)
  – For each node in the list, invoke the best-matching template rule
    • A template rule

Page 132: Introduction to XSLT
Page 133: Introduction to XSLT
Page 134: Introduction to XSLT

Namespace node quiz

• Example:
  – <foo xmlns:e="http://example.com"><bar/></foo>
• Quiz: How many namespace nodes are in the above document?

Page 135: Introduction to XSLT

Answer

• Example:
  – <foo xmlns:e="http://example.com"><bar/></foo>
• Quiz: How many namespace nodes are in the above document?
  – ANSWER: 4
• Two namespace nodes for each element: one for the declared namespace and one for the implicit xml prefix
  – As we'll see, namespace nodes are a property of the element for which they're in scope.
• Doesn't that make for a huge proliferation of namespace nodes?
  – Yes.
• Should I care?
  – Hardly ever.
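
One way to check the answer is to count nodes on the namespace axis. The sketch below assumes the quiz document as input; the output layout is invented for this example, and the counts reflect the XPath 1.0 data model, so an individual processor's handling of the namespace axis is worth verifying.

```xml
<!-- Hypothetical input, taken from the quiz:
     <foo xmlns:e="http://example.com"><bar/></foo> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Total across the document: every element has one namespace node
         per in-scope prefix, including the implicit xml prefix -->
    <xsl:text>total: </xsl:text>
    <xsl:value-of select="count(//namespace::*)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Per-element breakdown -->
    <xsl:for-each select="//*">
      <xsl:value-of select="name()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="count(namespace::*)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Under the data model described on the slide, foo and bar each carry two namespace nodes, which gives the total of 4.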

Page 136: Introduction to XSLT

QNames in XPath/XSLT

• QNames are expanded
  – Into local and URI parts
  – Using a set of namespace/prefix declarations
    • Supplied in the XPath expression context
• This does not include a default namespace declaration (declared by xmlns)
  – Thus, if you want to select nodes in a particular namespace, then you must use a prefix
• In other words, a QName without a prefix always designates a node that is not in a namespace
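
The practical consequence is easiest to see when the input uses a default namespace. In this sketch the input document and the e prefix are assumptions; any prefix works in the stylesheet as long as it is bound to the same URI.

```xml
<!-- Hypothetical input: <foo xmlns="http://example.com"><bar/></foo> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:e="http://example.com">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Unprefixed name: looks for bar in no namespace,
         so nothing in this input is selected (count is 0) -->
    <xsl:value-of select="count(//bar)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Prefixed name: e is bound to the same URI as the input's
         default namespace, so the bar element is selected (count is 1) -->
    <xsl:value-of select="count(//e:bar)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The prefix used in the stylesheet does not have to match anything in the source document; only the namespace URI is compared.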

Page 137: Introduction to XSLT

More node properties

• String-value
• Expanded-name
  – Two parts:
    • Local part
    • Namespace URI part
• Also:
  – unique ID (used by id() function)
  – XSLT-specific:
    • base URI (used by document() and xsl:import/include)
    • unparsed entity URIs (used by unparsed-entity-uri() function)
• That's it!

Page 138: Introduction to XSLT

Node relationship properties

• The value of each of these node properties is a set of other nodes:
  – Children
  – Attributes
  – Namespace nodes
  – Parent
• These properties are what connect one node to all the other nodes in the tree
  – By traversing these properties you can move from one node to anywhere else in the tree
• These properties are not applicable to all 7 node types
  – We'll see which ones apply to which node types

• These properties are not applicable to all 7 node types– We'll see which ones apply to which node types