Putting XML to Work
Henry S. Thompson
HCRC Language Technology Group
University of Edinburgh
Division of Informatics
Putting XML to Work, Edinburgh 2003-12-16
Slide 2
Overview of the tutorial
What is an XML application?
  – Content, Form, Function
Namespaces
  – Ownership of names
XSL(T)
  – Style for XML
DOM
  – Standard abstract API for XML
XML Schema
  – Specifying the structure of document families
RDF
  – Defining and using Data Models
[Handbook p. 2] When you see this, it means there’s accompanying information in the Additional Materials handbook
Slide 3
What is an XML application?
Putting XML to work means designing an XML application
SGML defines an application as having
  – A syntax: what do all the documents involved in this application share in terms of structure == markup?
  – A semantics: what do the components of that markup mean?
You already know the basic story about defining a syntax
  – You can use English (or French or . . .)
  – You should use a DTD
  – Or even better a Schema
Slide 4
An aside about structure
Formal definitions of document structure are a kind of contract
  – The user undertakes to structure his/her documents per the DTD
  – The application undertakes to process documents which conform to the DTD
As is the case with real contracts, both sides benefit by using them
  – Users know what they have to do to get the results they want
  – Developers can depend on parsers for a lot
Slide 5
The W3C
A word from our sponsors
The W3C is responsible for all the XML family
The W3C is the World Wide Web Consortium, a voluntary association of companies and non-profit organisations
  – Membership costs serious money, confers voting rights
  – Complex procedures, with the Director (Tim Berners-Lee) holding all the high cards, but the big vendors (e.g. Microsoft, Adobe, Netscape) have a lot of power
Slide 6
. . . and its WGs
The XML recommendation was written by the W3C’s XML Working Group
Which split itself into pieces, each of which handles a part of the ongoing work
  – Core WG (XML itself, Namespaces, Infoset)
  – Schema WG (XML Schema)
  – Linking WG (XLink, XPointer)
  – Query WG (XML Query)
  – Protocols WG (XML Protocols)
  – XSL WG (XSLT, XSL-FO)
XML Namespaces
Slide 8
Namespaces for XML
Where did those colons come from?
  – xsl:this, fo:that, xml:the_other
Two communities pushed for namespaces
  – Vendors, to manage the composition of document fragments
      – E.g. the inclusion of mathematical formulae in a document
  – Working groups, to reserve names without compromising users' freedom to name things
      – E.g. it wouldn't do for XML-link to reserve LINK for simple links, or XSL to reserve TEXT
Slide 9
Namespaces, cont'd
A W3C Recommendation was endorsed in January 1999
  – There was a lot of vendor pressure to get something in place, which caused political tension and at least one resignation from the WG
The example illustrates how namespaces are declared, scoped and used
[Handbook p. 4]
Slide 10
Namespaces defined
You can use prefixed names, consisting of two simple names separated by a colon (:)
The namespace prefix is an abbreviation for a URI which uniquely identifies the owner/meaning/identity of the source of the name
Using a namespace essentially cedes responsibility for the meaning of the qualified names to the owner of the URI
Slide 11
Declaring a namespace
The association between namespace prefixes and URIs is declared using reserved attributes
  <doc xmlns:mml='http://www.w3.org/TR/REC-MathML/'>...</doc>
Anywhere inside the above doc element mml is a legal namespace prefix, standing for the URI given
There is also a mechanism for defining the default (unprefixed) namespace
Declarations are scoped
Qualified names can be used for
  – Element type names
  – Attribute names
Slide 12
Namespace limitations
An add-on for, not a rewrite of, the XML spec
Validation is unchanged
  – Declarations must match instances character by character
  – Indeed there's no place for associating prefixes with URIs in DTDs
  – There is no provision for merging DTDs
The rules are confusing
  – Unprefixed attributes are never qualified
  – Unprefixed elements are qualified if and only if there is a default namespace declaration in scope
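The qualification rules can be checked with any namespace-aware parser. What follows is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The sample document, its element names and the default-namespace URI http://example.org/doc are invented for illustration; only the MathML URI comes from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names and the default-namespace URI are
# invented; only the MathML URI is taken from the slides.
SAMPLE = """\
<doc xmlns:mml='http://www.w3.org/TR/REC-MathML/'>
  <section xmlns='http://example.org/doc' id='s1'>
    <para>Some text <mml:math mml:mode='inline'/></para>
  </section>
  <para/>
</doc>"""

root = ET.fromstring(SAMPLE)

# ElementTree expands each qualified name to '{namespace-URI}local-name'.
names = [elem.tag for elem in root.iter()]

section = root[0]
math = section[0][0]

for name in names:
    print(name)
print(section.attrib)  # unprefixed attribute: stays unqualified
print(math.attrib)     # prefixed attribute: qualified
```

The default declaration on section is scoped: it qualifies section and the para inside it, but not doc or the para outside it. The unprefixed id attribute stays unqualified even though a default namespace is in scope.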
From Structure to Appearance:
Style for XML
Slide 14
Overview of the material
Why a style language?
Two approaches to style for XML:
  – CSS for simple cases
  – XSL for complex cases
Hands-on Exercises
[Handbook p. 2] When you see this, it means there’s accompanying information in the Additional Materials handbook
Slide 15
Why a style language?
Separating form from content
Separating structure from appearance
Single source, multiple delivery media
Slide 16
Three stages on the way
Document Compilers: ASCII text with formatting instructions and body text intermixed
  – nroff, Scribe, TeX
WYSIWYG Word Processors: Out-of-band formatting instructions change appearance on-screen; proprietary file formats
  – Word, WordPerfect
(Semi-)Structured Markup: Markup has either intrinsic or extrinsic rendering consequences
  – SGML, HTML, XML
Slide 17
Is this progress?
The old document compilers had complex procedural semantics, which
  – made debugging and maintenance very tricky for documents of any sophistication
  – made authoring and reading tedious, with obtrusive annotations everywhere
The use of scoped annotations in Scribe and TeX was a big improvement over _roff, but the annotations were still resolutely about appearance, not structure.
LaTeX tried to fix this, but paid an unacceptable price in terms of complexity and fragility.
Slide 18
Is this progress?, cont'd
The WYSIWYG systems are lovely to look at, and there's no problem with obtrusive annotations
  – but even with the addition of paragraph and character styles, generalisation and consistency are hard to come by
  – and there's the built-in obsolescence of proprietary formats to worry about
Slide 19
SGML . . .
SGML solved the proprietary format problem
  – It's an ISO standard (8879)
  – It's human-readable (and understandable!)
But for a long time there was no standard way of formatting SGML documents for printing or viewing
Slide 20
. . . and HTML
So HTML (nearly/post-hoc an SGML application), by mandating a rendering semantics for all its semi-structural markup, filled a real need.
But it was
  – not extensible (fixed tag set)
  – not customisable (fixed appearance per tag)
Three Problems; Three Solutions: Electronic Style!
Style standard for SGML?
  – DSSSL
Customise HTML page appearance?
  – CSS
Extend HTML tag-set and control style?
  – XML and
      – CSS
      – XSL
Technology Appraisals, Henry S. Thompson, Style for XML, London 1998-11-25
Slide 22
Cascading Style Sheets
Level 1 Accepted Recommendation per W3C, December 1996
Level 2 Accepted Recommendation, May 1998
Addresses the problems of:
  – customising the appearance of HTML documents
  – minimal styling for XML
Initially driven by the need for site designers to differentiate the appearance of their pages from one another
Focus accordingly is on controlling the colour, size and shape of regions and fonts
Slide 23
A pretty CSS example
Slide 24
CSS rules
CSS style rules associate properties with elements in your documents which match selectors
The basic structure of a rule looks like this:
  selector[, selector ...] {pname: pvalue[; pname: pvalue ...]}
Simple examples:
  verbatim {white-space: pre}
  H1 {text-align: center; font-variant: small-caps}
The first would provide style for an XML document
The second would change HTML's H1
[Handbook p. 6]
Slide 25
CSS: Cascading Style Sheets
Customising HTML
  – formatting <P> elements by means of simple instruction:
    P {font-weight: bold; font-size: 14pt; font-family: sans-serif}
Formatting XML
  – formatting <foobar> elements by means of similar instruction:
    foobar {display: block; border-style: solid; background-color: green}
Slide 26
Associating rules with documents
Contents of STYLE element in the HTML header
Destination of an appropriate LINK element
In STYLE attributes on any HTML element
[Handbook p. 6]
[Handbook p. 7]
Slide 27
CSS selectors
Rules can have one or more selectors, separated with commas
Simple names select elements by name
In addition to element type names, other selector syntax includes
  – Space-separated lists, indicating (non-immediate) ancestry
  – Qualification with period or hash, indicating class or id attribute matching
  – Qualification with colon (pseudo-classes), for link state and typographic sensitivity
Slide 28
CSS: Cascading Style Sheets
Store CSS instructions in separate style file
<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet type="text/css" href="mystyle.css"?>
<article>
  <title>An example</title>
  <text><quote>It was the best of times, it was the worst of times</quote>, wrote <author>Charles Dickens</author> in <book>Tale of Two Cities</book>.</text>
</article>
Slide 29
Using classes to get ready for XML
You can cheat with your HTML to make it look more like XML
  <DIV CLASS='MESSAGE'>Some text which is <SPAN CLASS='EMPH'>really</SPAN> important.</DIV>
And use class selectors in your stylesheet
  .MESSAGE {display:block; margin-top: 6pt}
  .EMPH {display:inline; font-style:italic}
[Handbook p. 8]
Slide 30
CSS selectors: Vertical context
Sometimes you need context-sensitive selectors
For depth-sensitive rendering
  OL {list-style-type: lower-alpha}
  OL OL {list-style-type: lower-roman}
For context-appropriate rendering
  H1 {font-weight: bold; font-size: large}
  H2 {font-weight: bold; font-style: italic}
  H3 {font-style: italic}
  H2 EM, H3 EM {font-style: normal}
Note that in the last rule we have two selectors, separated by a comma, sharing the same result
Slide 31
CSS boxes
CSS and HTML 4.0 et seq use a nested-boxes rendering model, and every block element is rendered into a box
Boxes all have margins, borders and padding (outside in)
All four margins and paddings (left-,right-, top-, bottom-) have width properties, and a shorthand property for setting them all together
Slide 32
CSS borders
Borders, in addition to widths, have colours and styles, plus shorthand properties for various combinations
There are also float and clear properties to allow a modest amount of displacement and flow-around.
CSS2 goes a lot further with this
Slide 33
CSS box example
  P { margin: 3ex; border-width: thin; border-style: solid; border-left: double; text-align: justify; border-color: blue; padding: 2ex 4ex }
gives the following for a sample paragraph
[Figure: the rendered sample paragraph is not reproduced here]
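The margin and padding shorthands in the rule above take one to four values. As a rough illustration of how a single shorthand maps onto the four sides, here is a small Python sketch. The function name expand_box_shorthand is hypothetical, and it handles only plain space-separated lengths, not the full CSS value grammar.

```python
def expand_box_shorthand(value):
    """Expand a CSS margin/padding shorthand into its four sides.

    Hypothetical helper for illustration: it splits on whitespace only
    and does not validate the individual lengths.
    """
    parts = value.split()
    if len(parts) == 1:
        # One value: all four sides.
        top = right = bottom = left = parts[0]
    elif len(parts) == 2:
        # Two values: top/bottom, then left/right.
        top = bottom = parts[0]
        right = left = parts[1]
    elif len(parts) == 3:
        # Three values: top, left/right, bottom.
        top, right, bottom = parts
        left = right
    elif len(parts) == 4:
        # Four values: top, right, bottom, left (clockwise).
        top, right, bottom, left = parts
    else:
        raise ValueError("expected 1 to 4 values, got %d" % len(parts))
    return {"top": top, "right": right, "bottom": bottom, "left": left}


print(expand_box_shorthand("3ex"))      # the margin from the slide
print(expand_box_shorthand("2ex 4ex"))  # the padding from the slide
```

So in the slide's rule the paragraph gets 2ex of padding above and below, and 4ex on the left and right.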
Slide 34
CSS property values
Some are symbolic, e.g. font-style: italic
URLs appear in a few places, e.g. background-image: url(http://www...)
Most are
  – lengths, e.g. 3em, 2px
  – percentages, e.g. 110%
  – numbers
  – colours, e.g. red, #fd0
Slide 35
CSS
In HTML you can “invent” your own tags using classes
And you can define how they should be rendered
<div class="newsarticle">Here is some text. It’s a news article. It mentions <span class="company">Reuters</span>.</div>
<STYLE>
div.newsarticle {text-align: left; font-style: italic}
span.company {color: red}
</STYLE>
Slide 36
Exercise: >> cat exa12.xml
<?xml version="1.0" standalone="yes"?>
<newsarticle>
<headline>A newsarticle.</headline>
<author>Marc Moens</author>
<newsbody>Here is some text. It's a news article. It mentions <company>Reuters</company>. And since <company>Reuters</company> is a company, we would like it to come out slightly differently.</newsbody>
</newsarticle>
Slide 37
Exercise: >> cat exa12.xml
Change this into an HTML file, keeping our own “invented” tags (like “headline”, “newsbody”, “company”) as classes
  – For example <headline> becomes <div class="headline">
  – call it exa12.html
  – See slide 29 for ideas
Put the style instructions in a separate file
  – call it exa12.css
  – See page 7 for how to connect the two files
[Handbook p. 7]
Slide 38
Solution: >> exa12.html
<HTML>
<HEAD>
<TITLE>A simple example</TITLE>
<LINK rel=stylesheet type="text/css" href="exa12.css">
</HEAD>
<BODY>
<DIV class="headline">A newsarticle</DIV>
<DIV class="author">Marc Moens</DIV>
<DIV class="newsbody">Here is some text. It’s a news article. It mentions <SPAN class="company">Technology Appraisals</SPAN>. And since <SPAN class="company">Technology Appraisals</SPAN> is a company, we would like it to come out slightly differently.</DIV>
</BODY>
</HTML>
Slide 39
Solution: >> exa12.css
div.headline {text-align: center; color: blue; border-style: dashed; font-size: xx-large}
div.author {text-align: center; color: blue; font-size: large}
div.newsbody {text-align: left; font-style: italic}
span.company {color: red}
Slide 40
Solution in the Browser
Slide 41
Style
This was CSS as applied to HTML
Later:
  – CSS to render XML
  – other ways of rendering XML
Slide 42
The 'Cascade' in CSS
What happens when there is more than one rule which provides a value for a property on a given element?
The highest priority value assignment wins
When no assignment is found, the value is either inherited or defaulted
This explains why our original H1 example was bold
Slide 43
CSS priority
A number of things contribute to determining priority
Origin, in increasing order of importance
  – browser
  – user
  – author
Specificity, in increasing order of importance
  – Number of element types
  – Number of CLASS selectors
  – Number of ID selectors
Importance, marked with !important
Slide 44
CSS cascade example
The following are in increasing order of priority
  LI
  UL LI
  UL OL LI
  LI.special
  OL LI.special
  #hotone
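The ordering in this example follows from counting ID selectors first, then class selectors, then element types. The Python sketch below reproduces it for the six selectors on the slide. The specificity function is an illustrative helper written for this purpose, not part of any CSS library, and it deliberately ignores pseudo-classes, origin and !important.

```python
import re


def specificity(selector):
    """Return (ids, classes, element types) for a simple CSS1-style selector.

    Illustrative only: handles element names, .class and #id in
    space-separated selectors, nothing else.
    """
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    # An element type name starts a space-separated step of the selector.
    elements = len(re.findall(r"(?:^|\s)[A-Za-z][\w-]*", selector))
    return (ids, classes, elements)


SELECTORS = ["LI", "UL LI", "UL OL LI", "LI.special", "OL LI.special", "#hotone"]

# Start from the reverse order to show that the sort does the work.
ranked = sorted(reversed(SELECTORS), key=specificity)

for sel in ranked:
    print(specificity(sel), sel)
```

Tuples compare left to right, so a single ID selector outranks any number of classes, and a single class outranks any number of element types.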
Slide 45
CSS for XML for real
In principle, it's easy
  – Just use your own element type names instead of HTML's
In practice
  – IE and Mozilla support it
  – Functionality often insufficient for complex document types
Style sheet linkage is via a PI
  <?xml-stylesheet type="text/css" href="…"?>
[Handbook p. 9]
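To see how a program might pick up that linkage, here is a hedged Python sketch using the standard library's xml.dom.minidom, which keeps processing instructions as nodes. The helper name stylesheet_links is hypothetical; the sample document is a shortened version of the article example from slide 28.

```python
import re
from xml.dom import Node, minidom

SAMPLE = """<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet type="text/css" href="mystyle.css"?>
<article><title>An example</title></article>"""


def stylesheet_links(xml_text):
    """Return the pseudo-attributes of each top-level xml-stylesheet PI."""
    doc = minidom.parseString(xml_text)
    links = []
    for node in doc.childNodes:
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # The PI data is plain text that merely looks like attributes,
            # so it has to be picked apart by hand.
            pairs = re.findall(
                r"""([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""", node.data)
            links.append({name: dq or sq for name, dq, sq in pairs})
    return links


print(stylesheet_links(SAMPLE))
```

The parser hands over the PI's content as an uninterpreted string, which is why the type and href values need a small regular expression rather than an attribute lookup.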
Slide 46
Exercise: >> exa13.xml (= exa12.xml)
<?xml version="1.0" standalone="yes"?>
<newsarticle>
<headline>A newsarticle.</headline>
<author>Marc Moens</author>
<newsbody>Here is some text. It's a news article. It mentions <company>Technology Appraisals</company>. And since <company>Technology Appraisals</company> is a company, we would like it to come out slightly differently.</newsbody>
</newsarticle>
Slide 47
Exercise: >> exa13.xml
Create CSS style sheet for exa13.xml
  – call it exa13.css
  – you can probably reuse material from exa12.css (if you didn’t do that exercise, use exa12style.css)
Remember to use display:block or display:inline in every style rule
Link the document to the stylesheet with
  <?xml-stylesheet type='text/css' href='exa13.css'?>
View it in Mozilla
Slide 48
CSS: Summary
Easy to learn
Also useful for HTML
Works in most browsers
Slide 49
What is DSSSL?
An ISO standard (ISO 10179:1996)
A style language
  – How do I format my SGML documents?
A transformation language
  – How do I transform my SGML documents?
A hopeless acronym
  – Document Style Semantics and Specification Language
A lost opportunity!
  – Sunk by webhead round-paren allergies
Slide 50
XSL: Extensible Stylesheet Language
A style language specifically for XML
  – W3C recommendation, Nov 1999 (XSLT 1.0)
Synthesis of the best of CSS and DSSSL
  – DSSSL processing and formatting models
  – CSS properties
XSL is XML
  – A declarative specification of both the "pattern" and the "action" of template rules
More generic than CSS
  – style and rendering are just a special case of more general tree transformation processes
  – can be used for other transformations (XSL-T)
Slide 51
XSL: Extensible Stylesheet Language
Main properties
  – localised (template rules keyed to elements)
  – scoped (inheritance of general characteristics)
  – unbiased (with respect to writing direction, language,…)
      – “indent paragraph” will mean “indent on the left” for English, “indent on the right” for Hebrew
      – “inline” will mean left to right for English, top to bottom for some Japanese text
Slide 52
What is XSL for?
Portable standard style specification
Single source documents, multiple delivery media
  – Print
  – Presentation
  – Screen
Multiple document types, single house style
Just as much complexity as you need
  – Single continuous scroll for screen delivery
  – Multi-column pages with side bars etc. for books
Slide 53
What is XSL not for?
Controlling filling and line breaking: left to a low-level formatting engine
Page or line fidelity: use a photocopier!
Carefully crafted page layout: emphasis is on automatable processes
User interaction: ditto
Slide 54
XSL process architecture
CSS takes a document tree and decorates it with formatting properties
XSL takes a document tree and builds a new document tree which it then decorates
XSL is really two languages
  – a transformation language (XSLT)
  – a formatting language (XSL-FO)
Slide 55
XSL Transformations
XSL style sheet: template rules
  – pattern which specifies which tree it applies to
  – result which specifies which tree it should output
XSL processor
  – reads XML document and XSL stylesheet
  – carries out the instructions in the stylesheet (outputs a new XML document)
Slide 56
XSL Transformations
From XML to XML
  – not from/to PDF, TeX, Word, Postscript,…
  – you can go from XML to an intermediate language, and then use a different processor to get to TeX, Word,…
  – you can go from/to HTML or SGML if it’s well-formed XML
Slide 57
XSL Transformations
Three places it can happen
  – Web browser (e.g. Mozilla) is handed XML document and stylesheet, transforms the document and presents it to the user
  – The server applies style sheet to document to create different format (e.g. HTML) and sends that document to the client
      – http://xml.apache.org/xalan-j/
  – A program (e.g. saxon) transforms XML document before it is placed on the server
      – http://sourceforge.net/projects/saxon
Slide 58
Key style concepts
Modular
  – Structuring of specs at the XML level
Localised
  – template rules keyed to elements
Scoped
  – Generic characteristics are inherited
Unbiased
  – No expectations regarding national language, writing direction or character set
Slide 59
Process architecture
CSS takes a document tree and decorates it with formatting properties
XSL takes a (source) document tree and builds a new (result) document tree
XSL provides a set of formatting objects with sophisticated rendering properties
If your stylesheet uses formatting objects for its result tree, you get a result which constitutes a set of claims on appearance defined by the XSL recommendation
Slide 60
Formatting objects
From the simple: block, inline, graphic-rule, character, sequence
To the complex: page-master, page-set-sequence, column-master, column-set-sequence, table
Each has properties appropriate to its semantics
You can think of them as replacing and extending the CSS display property
Slide 61
Formatting objects and areas
The semantics of formatting objects is expressed in terms of areas
Areas are rectangular regions produced by the rendering of formatting objects
A formatting object when rendered produces one or more fixed-dimension areas, whose position is determined by a parent in the formatting object tree
There are two types of areas:
  – inline areas
  – display areas
[Handbook p. 12]
Slide 62
Filling areas
The process by which a formatting object positions the areas of its daughters in its own area is called filling
Filling is different for the two different types of areas
  – Display areas are filled into area containers
  – Inline areas are filled into lines
Most formatting object classes are unequivocally inlined or displayed, that is, destined to be rendered into inline or display areas respectively
[Handbook p. 12]
Slide 63
Formatting objects and filling
A flow object must specify area containers and/or lines for its contents to be filled into
For example a page-master formatting object specifies one or more area containers, into which the display areas resulting from its children are filled
And a block flow object specifies a line, into which the inline areas resulting from its children are filled
Finally a character formatting object yields an inline area containing a glyph
[Handbook p. 13]
Slide 64
XSL is XML
No parentheses!
  – XSL is notated with XML element types
  – DSSSL semantics without DSSSL syntax
But you can think of it more like specifying a transformation from one DTD (the source document) to another (for a formatting object specification document)
Slide 65
Fundamental aspects of XSL
An iconic, declarative approach, using XML to specify both the "pattern" and the "action" of template rules.
An XML DTD for a set of formatting object types and attributes adequate to provide on-screen display as well as printing support.
Slide 66
Template rules
The main component of an XSL stylesheet is the template rule
Each template rule contains
  – A pattern, identifying the source document elements which the rule should apply to
      – Like a CSS selector
  – A template, specifying what gets added to the result tree when this rule fires
      – Crucially, and unlike CSS, this typically involves elements
Slide 67
Simple rule example
<xsl:template match='div/title'>
  <fo:block font-weight='bold'>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
[Diagram: a source title element inside a div, with text "The s . . .", becomes a result Block [font-weight: bold] with the same text. Callouts on the rule:]
  – Pattern (match='div/title'): div/ is the restriction on match context, title is the element type to match
  – fo:block: the formatting object to be created
  – xsl:apply-templates: the content of the formatting object, i.e. use the subordinate results
Slide 68
XSL and CSS
We could try to translate our example into CSS as follows:
  div title { font-weight: bold }
But that would actually be wrong:
  – The interpretation of nesting is different: the XSL pattern matches title elements with div as parent, where the CSS pattern matches title elements with div as ancestor at any remove
  – XSL does not require a one-to-one relation between source and destination
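The parent-versus-ancestor difference is easy to see on a small tree. The Python sketch below uses xml.etree.ElementTree on an invented sample document. It imitates the two matching rules with ordinary tree queries and is not a CSS or XSLT engine.

```python
import xml.etree.ElementTree as ET

# Invented sample: one title directly inside div, one nested deeper.
SAMPLE = """\
<doc>
  <div>
    <title>Direct child</title>
    <section>
      <title>Grandchild</title>
    </section>
  </div>
</doc>"""

root = ET.fromstring(SAMPLE)

# XSL pattern div/title: title elements whose parent is a div.
as_child = [t.text for t in root.findall(".//div/title")]

# CSS selector "div title": title elements with a div ancestor at any depth.
as_descendant = [t.text for div in root.iter("div") for t in div.iter("title")]

print(as_child)
print(as_descendant)
```

Only the first title satisfies the XSL pattern, while the CSS selector would style both. A closer CSS counterpart to div/title is the CSS2 child selector, div > title.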
Slide 69
Richer patterns
XSL can restrict matches based on
  – ancestry
  – descendants
  – position wrt siblings
  – attribute presence/absence/value
These are expressed in the form of path expressions, which are shared with the draft XPointer proposal
The common part is called XPath
Slide 70
XPath patterns
/ for (root's) children
// for (root's) descendants
.. for parent
name for matching elements
@name for matching attributes
[. . .] for conditions
  – =, != for (in)equality
  – Numerical and boolean expressions
  – String and number literals
  – Special-purpose functions
      – not(…), position(), last()
[Handbook p. 15]
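Several of these path steps can be tried out directly in Python, since xml.etree.ElementTree supports a limited subset of XPath (it has last() but not not(…) or position()). The catalog below is invented for illustration and is not the exa15.xml file from the exercises.

```python
import xml.etree.ElementTree as ET

# Invented catalog; not the exa15.xml used in the exercises.
CATALOG = """\
<catalog>
  <title>Sample catalog</title>
  <entry id="a"><name>Widget</name><price>10</price></entry>
  <entry id="b"><name>Gadget</name></entry>
  <entry><name>Gizmo</name><price>30</price></entry>
</catalog>"""

root = ET.fromstring(CATALOG)


def names(path):
    """Names of the entries selected by an ElementTree path."""
    return [entry.findtext("name") for entry in root.findall(path)]


all_names = [n.text for n in root.findall(".//name")]     # // for descendants
with_id = names("entry[@id]")                             # attribute presence
id_b = names("entry[@id='b']")                            # attribute value
with_price = names("entry[price]")                        # condition on a child
last_entry = names("entry[last()]")                       # position wrt siblings
parents = [p.tag for p in root.findall(".//price/..")]    # .. for parent

print(all_names, with_id, id_b, with_price, last_entry, parents)
```

A full XPath implementation, as found in an XSLT processor, accepts a good deal more than this subset, including the boolean and numeric expressions listed on the slide.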
Slide 71
Specificity
With all these pattern variants, what happens if two rules match?
Drawing on both DSSSL and CSS, there is a set of precedence rules
Basically, the richer the pattern, the higher precedence
If all else fails, there is a numeric priority attribute
Slide 72
Iconic actions
The 'action' part of a rule isn't much like an action at all
It's more like a picture of what you want in the way of formatting objects
  – Nesting is specified directly
  – So you can build up quite detailed formatting object structures
The special xsl:apply-templates element type determines where the formatting objects resulting from processing the children of the matched node should be plugged in
Slide 73
Action example
This 'action' builds a rich result structure
<fo:block>
  <fo:inline width='3cm'>
    <fo:sequence font-weight='bold'>
      <xsl:value-of select='@name'/>
      <xsl:text>. . . . .</xsl:text>
    </fo:sequence>
  </fo:inline>
  <fo:sequence font-posture='italic'>
    <xsl:apply-templates/>
  </fo:sequence>
</fo:block>
Slide 74
Plugging in results
[Diagram: a source tree with a demo element containing two para elements ("The f . . ." and "The s . . .") is transformed into a result tree HTML > BODY > P, P carrying the same text, by the two abbreviated template rules below]
<x:templ match='demo'>
  <HTML>
    <BODY>
      <x:apply-templates/>
    </BODY>
  </HTML>
<x:templ match='para'>
  <P>
    <x:apply-templates/>
  </P>
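The plugging-in mechanism can be mimicked in a few lines of Python. This is a toy sketch, not an XSLT processor: each template rule is a Python function keyed by element type, the helper names apply_templates and process are made up, and text is copied without any escaping.

```python
import xml.etree.ElementTree as ET


def apply_templates(node, rules):
    """Process the children of node in document order and join the results."""
    out = [node.text or ""]
    for child in node:
        out.append(process(child, rules))
        out.append(child.tail or "")
    return "".join(out)


def process(elem, rules):
    """Fire the rule keyed to this element type, else just process children."""
    rule = rules.get(elem.tag)
    return rule(elem, rules) if rule else apply_templates(elem, rules)


# One function per template rule; the call to apply_templates marks the
# spot where the results for the children are plugged in.
RULES = {
    "demo": lambda e, r: "<HTML><BODY>" + apply_templates(e, r) + "</BODY></HTML>",
    "para": lambda e, r: "<P>" + apply_templates(e, r) + "</P>",
}

source = ET.fromstring("<demo><para>The first.</para><para>The second.</para></demo>")
result = process(source, RULES)
print(result)
```

The demo rule never mentions P: the two P elements end up inside BODY only because the para rule fires for each child and its output is spliced in at the apply_templates call.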
Slide 75
Where did the HTML come from?
You don't have to use formatting objects for your result tree
HTML elements are an obvious alternative
But HTML isn't XML (although there is XHTML as well …)
  – XSLT allows you to specify output type HTML
  – Browser converts automatically
[Handbook p. 17]
Slide 76
XSLT Exercise
Copy exa15.xm to exa15.xml
Look at exa15.xml with Mozilla
Copy catalog.xs to catalog.xsl
Edit exa15.xml to contain a stylesheet reference to catalog.xsl:
  <?xml-stylesheet type='text/xsl' href='catalog.xsl'?>
Look at exa15.xml again
Edit catalog.xsl to provide template rules for the components of exa15.xml so that it looks interesting
Slide 77
Step by step
First we need to separate the entries:
  – Add a template rule matching <entry>
  – Look at slide 74 for a clue
<xsl:template match="entry">
  <P>
    <xsl:apply-templates/>
  </P>
</xsl:template>
A different HTML tag is wanted for <name>
  – Use <B> instead of <P>
<xsl:template match="name">
  <B>
    <xsl:apply-templates/>
  </B>
</xsl:template>
Slide 78
Step by step, cont'd
Let's give the price some colour
<xsl:template match="price">
  <SPAN STYLE="color: red">
    <xsl:apply-templates/>
  </SPAN>
</xsl:template>
Separating form from appearance would be better:
<xsl:template match="price">
  <SPAN CLASS="price">
    <xsl:apply-templates/>
  </SPAN>
</xsl:template>
Need to add <style> to the <head> as well
Slide 79
Combining XSL and CSS
Add a <style> element to the template for the root
<xsl:template match="/">
  . . .
  <head>
    <style>
      SPAN.price {color: red}
    </style>
  . . .
</xsl:template>
Slide 80
Exercise, cont'd
Next separate paragraphs for the options and descriptions, and italicise the <emph>ed material
What about the options themselves?
  – Attributes have to be accessed explicitly
<xsl:template match="colors">
  <SPAN CLASS="color">Colours:
    <xsl:value-of select="@value"/>
  </SPAN>
</xsl:template>
  – Note that the literal text appears in the result
  – Handle the other options the same way
Slide 81
Selection
You may not always want to just invoke processing on a node's children in the ordinary way
You can supply a select attribute on xsl:apply-templates to specify what you want processed
If all you want is the text content of an element or attribute as such
  – Use the xsl:value-of element instead
Slide 82
Example of select
<xsl:template match='/'>
  <HTML>
    <HEAD>
      <TITLE>
        <xsl:value-of select='catalog/title'/>
      </TITLE>
    </HEAD>
    <BODY>
      <xsl:apply-templates/>
    </BODY>
  </HTML>
</xsl:template>
[Handbook p. 17]
Slide 83
Reordering using select
You may not even want material to appear in the output in the same order it appears in the source, e.g. if the source was derived from a database
select can be used to reorder by pulling out first one child type, then another, etc.
  <xsl:apply-templates select='a'/>
  <xsl:apply-templates select='b'/>
All a's will end up before all b's, regardless of where they started
xsl:sort provides more detailed control
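The effect of two successive select steps can be imitated with plain tree queries. The following Python sketch assumes a made-up source with interleaved a and b children. The select helper is hypothetical and only collects text, where a real stylesheet would apply templates to each selected node.

```python
import xml.etree.ElementTree as ET

# Invented source with the two child types interleaved.
source = ET.fromstring("<r><b>b1</b><a>a1</a><b>b2</b><a>a2</a></r>")


def select(node, tag):
    """Text of the children of node with the given element type, in document order."""
    return [child.text for child in node.findall(tag)]


# In the spirit of apply-templates select='a' followed by select='b'.
reordered = select(source, "a") + select(source, "b")

print(reordered)
```

Within each group the original document order is kept. Changing that order as well is the job the slide assigns to xsl:sort.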
Division of Informatics Henry S. ThompsonPutting XML to Work, Edinburgh 2003-12-16
84
Defaults
XSL has two default rules, similar to DSSSL's
  For character nodes, copy the character
  For all others, a sequence of the results of processing all children
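The two default rules can be imitated outside XSLT. The following is only a Python analogue using the standard library's minidom, not an XSLT processor; the function name apply_default_rules and the sample document are made up for illustration.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def apply_default_rules(node):
    """Imitate the two default rules (illustrative helper, not part of any library)."""
    if node.nodeType == Node.TEXT_NODE:
        # Rule 1: for character nodes, copy the characters.
        return node.data
    # Rule 2: for all others, the sequence of results of processing all children.
    return "".join(apply_default_rules(child) for child in node.childNodes)


# Hypothetical sample input; the element names echo the catalog exercise.
doc = parseString("<entry><name>Widget</name><price>9.99</price></entry>")
result = apply_default_rules(doc)
print(result)
```

With no other rules in play, all markup falls away and only the text content survives, which is why a stylesheet with no templates of its own still produces output.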
Exercise
Use the reordering technique introduced above to improve your catalog stylesheet
  Use the <title> material twice
    – Once for the <title> inside <head>
    – Once for the reader to see
  Display the Name, Price and Part Number in the same order for every item
Watch out for this pitfall
You might have tried this to get started
    <xsl:template match="entry">
      <P>
        <xsl:apply-templates select="name"/>
        <xsl:apply-templates select="number"/>
        <xsl:apply-templates select="price"/>
      </P>
      <xsl:apply-templates/>
    </xsl:template>
Why didn't this work as expected?
Macros
If you have a constellation of formatting objects and attributes you will use in more than one template rule, defining a named template is good practice
Named templates can be invoked by name (with xsl:call-template) or, when they also have a match attribute, by pattern
    <xsl:template name='ruled-para'>
      <xsl:sequence>
        <fo:graphic-rule width='2mm'/>
        <fo:block> <xsl:apply-templates/></fo:block>
        <fo:graphic-rule width='2mm'/>
      </xsl:sequence>
    </xsl:template>
More sophisticated XSL
The XSL formatting objects contain many properties
  They enable you to express detailed constraints on appearance
Patterns for matching and selection also support a wide range of detail
The tree construction and formatting were originally one recommendation, separated to get XSLT out the door
XSLT for Transformation as such
XML to XML can be very useful
  DTDs change
  Documents can be merged
Saxon, Michael Kay’s batch implementation, is ideal here
  Fast
  Simple command line interface
  Supports document() for multiple inputs
The identity transformation
The core of every serious transformation
    <xsl:template match="@*|*|comment()|processing-instruction()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>
Transform exercise
Test the identity transform
    > saxon exa2.xml copy.xsl
Copy copy.xsl to fix.xsl
  Note we're using a DTD to help XED help you
Edit it to add a template for <price> which adds a 'currency' attribute
Try it on exa15.xml as follows
    > saxon exa15.xml fix.xsl
Heavy hint
    <xsl:template match="price">
      <price currency="UKL">
        <xsl:apply-templates select="@*|node()"/>
      </price>
    </xsl:template>
You can compute the value of attributes by using curly braces
    . . .<price currency="{/catalog/@currency}">. . .
  This would copy the value of the currency attribute from the catalog root element to price
Variables
XSLT is a pure functional language
  No state
  No side-effects
You can bind variables
    <xsl:variable name="currencySymbol">£</xsl:variable>
    <xsl:variable name="title" select="/catalog/title"/>
And access them with a dollar-sign ($)
    <xsl:value-of select="$currencySymbol"/>
Combining several documents
The document function allows access within a stylesheet to other, named documents
If the result is bound to a variable, it can then be used as the starting point for a search
    <xsl:variable name="catalog" select="document('exa15.xml')/*"/>
    . . .
    <xsl:value-of select="$catalog/entry[number='E102']/price"/>
Multiple document exercise
Look at order.xml and fold.xsl
  Can you see what fold.xsl is for?
  Try it:
    > saxon order.xml fold.xsl
Copy fold.xsl to nfold.xsl
Edit it to use the catalog database to add a name attribute to each order
  Use the addition of the price attribute as a model
To run this, do
    > saxon order.xml nfold.xsl
Implementations of XSL
James Clark has implemented most of XSLT
  Lotus, IBM and others have done so as well
The best Java implementation in my view is Michael Kay's Saxon (http://sourceforge.net/projects/saxon)
IE supports the whole language
  And it's much faster than the Java implementations
Mozilla’s implementation is also pretty good
  Although not blindingly fast
Others are implementing subsets of the formatting semantics
Usage is widespread
Processing XML: SAX and DOM
What is SAX?
Simple API for XML
A 'fat' stream of bits from an XML document
Uses the 'factory' design pattern
  Programs register handlers for (a subset of) events
    – Start tag (incl. attributes)
    – End tag
    – Text content
    – …
  Document is parsed (possibly validated)
    – Appropriate handlers are called
    – Program does what it likes
No tree unless you build it
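As a rough illustration of the handler-registration style, here is a minimal sketch using Python's standard xml.sax module. The PriceHandler class and the sample catalog are invented for the example; only the ContentHandler callbacks and xml.sax.parseString come from the library.

```python
import xml.sax


class PriceHandler(xml.sax.ContentHandler):
    """Collect the text of every <price> element; no tree is ever built."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.buffer = []
        self.prices = []

    def startElement(self, name, attrs):   # start tag (attributes arrive in attrs)
        if name == "price":
            self.in_price = True
            self.buffer = []

    def characters(self, content):         # text content, possibly delivered in pieces
        if self.in_price:
            self.buffer.append(content)

    def endElement(self, name):            # end tag
        if name == "price":
            self.prices.append("".join(self.buffer))
            self.in_price = False


# Hypothetical input document for the sketch.
handler = PriceHandler()
xml.sax.parseString(
    b"<catalog><entry><name>Widget</name><price>9.99</price></entry>"
    b"<entry><name>Gadget</name><price>4.50</price></entry></catalog>",
    handler,
)
print(handler.prices)
```

The handler keeps only the state it needs (one flag and one buffer), which is what lets this style cope with documents too large to hold as a tree.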
What is the DOM?
A programming-language independent interface for XML/HTML documents
  Object types, properties and methods for
    – navigation
    – construction
Tree-oriented
Not a data model
  Different expressions of the same XML document present different pictures through the DOM
What is the DOM, cont'd?
A core plus extension for HTML
Primary expression is in OMG IDL
Bindings provided for
  Ecmascript (= standardised Javascript)
  Java
  C++
The DOM approach
Not objects, but interfaces
  Each implementation may have its own objects
  But you can build and navigate through the standard interface
DOM interface types
  Node            see below
  NodeList        e.g. for children of elements
  NamedNodeMap    e.g. for attribute list
  String          all values
DOM Nodes
Almost everything is a sub-type of Node
Different sub-types allow different types of children
They also support different attributes and methods
A partial list of types and their children
  Document         Element, ProcessingInstruction, Comment, DocumentType
  Element          Element, Text, EntityReference, . . .
  Attr             Text, EntityReference
  DocumentType     None (!)
  EntityReference  Element, Text, . . .
DOM Issues
There are some things which are implementation-specific
  Input/output
  Entities vs. references
There are some things which are language specific
  property vs. method syntax
There is some redundancy in the API
  nodeName is defined for most Node subtypes
    – for Elements, same as tagName
    – for Attrs, same as name
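The redundancy is easy to see in practice. This small sketch uses Python's minidom as one example DOM implementation (an assumption; other bindings should behave the same way for these properties), with a made-up one-element document.

```python
from xml.dom.minidom import parseString

# Hypothetical one-element document, echoing the <price> exercise.
doc = parseString('<price currency="UKL">9.99</price>')
element = doc.documentElement
attr = element.getAttributeNode("currency")

print(element.nodeName, element.tagName)   # two names for the same element property
print(attr.nodeName, attr.name)            # two names for the same attribute property
```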
DOM examples
The same example in two languages
  Javascript and Java
  Bold-face indicates the parallelism
    – All but the first are DOM components
Node creation and decoration is possible too
  Loading would then not be needed
  Both SAXDOM and IE5 define a way to output a document
    – SAXDOM: writeDocument produces a text version
    – IE5: text output not clear as of this writing, but can display synthesised documents by handing off to the browser
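Node creation and output can be sketched in a third language as well. The example below assumes Python's minidom rather than SAXDOM or IE5; the create* and appendChild calls are standard DOM, while toxml() is minidom's own output method, in line with the point that output is implementation-specific. The element names and the number attribute are illustrative only.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()
doc = impl.createDocument(None, "catalog", None)   # new document with a root element

entry = doc.createElement("entry")                 # node creation
name = doc.createElement("name")
name.appendChild(doc.createTextNode("Widget"))
entry.appendChild(name)
entry.setAttribute("number", "E102")               # decoration with an attribute
doc.documentElement.appendChild(entry)

# Output is not part of the DOM core: toxml() is specific to minidom.
serialised = doc.documentElement.toxml()
print(serialised)
```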
DOM navigation
Tree-wise
  parentNode, firstChild, lastChild, nextSibling and previousSibling properties
By name
  getAttribute(name)
  getElementsByTagName(name)
  no access via ID
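Both styles of navigation can be tried out with a few lines of Python and minidom (an assumed implementation; the property and method names are the DOM ones listed above). The sample catalog is invented and contains no whitespace between elements, so firstChild really is the first entry.

```python
from xml.dom.minidom import parseString

# Hypothetical catalog for the sketch.
doc = parseString(
    '<catalog currency="UKL">'
    "<entry><name>Widget</name><price>9.99</price></entry>"
    "<entry><name>Gadget</name><price>4.50</price></entry>"
    "</catalog>"
)
root = doc.documentElement

# Tree-wise navigation
first_entry = root.firstChild
second_entry = first_entry.nextSibling
same_parent = second_entry.parentNode is root
is_last = root.lastChild is second_entry

# Navigation by name
currency = root.getAttribute("currency")
names = [n.firstChild.data for n in doc.getElementsByTagName("name")]
print(currency, names)
```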
SAX and DOM verdict
DOM is clunky, with various infelicities
  But it's more portable
Whole-tree approach limits size of document which can be processed
  SAX is a clear win here
Both are better than ad-hoc interfaces
Quick way to get XML-based e-commerce applications moving
  Java server-side (or client-side with SAX)
  Javascript client-side (not SAX)
XML is ASCII for the 21st century
ASCII (ISO 646) solved a fundamental interchange problem for flat text documents
  What bits encode what characters
    – (For a pretty parochial definition of 'character')
UNICODE/ISO 10646 extends that solution to the whole world
XML thought it was doing the same for simple tree-structured documents
  The emphasis in the XML design was on simplifying SGML to move it to the Web
  XML didn't touch SGML's architectural vision
    – flexible linearisation/transfer syntax
    – for tree-structured documents with internal links
The essence of XML
It's a markup language used for annotating text
It is concerned with logical structure
  to identify sections, titles, section headers, chapters, paragraphs,…
It is not concerned with appearance
  you say 'this is a subtitle'
  not 'this is in bold, 14pt, centered'
  you say 'this is an example'
  not 'this is in verbatim, indented by 5pts, ragged right'
The essence of XML, try again
It's a markup language used for transferring data
It is concerned with data models
  to convert between application-appropriate and transfer-appropriate forms
It is not concerned with human beings
  It's produced and consumed by programs
Application data
Structured markup
    <POORDERHDR>
      <DATETIME qualifier="DOCUMENT">
        <YEAR>1996</YEAR>
        <MONTH>06</MONTH>
        <DAY>30</DAY>
        <HOUR>23</HOUR>
        <MINUTE>59</MINUTE>
        <SECOND>59</SECOND>
        <SUBSECOND>0000</SUBSECOND>
        <TIMEZONE>+0100</TIMEZONE>
      </DATETIME>
      <OPERAMT qualifier="EXTENDED" type="T">
        <VALUE>670000</VALUE>
        <NUMOFDEC>2</NUMOFDEC>
        <SIGN>+</SIGN>
        <CURRENCY>USD</CURRENCY>
        . . .
What just happened!?
The whole transfer syntax story just went meta, that's what happened!
XML has been a runaway success, on a much greater scale than its designers anticipated
  Not for the reason they had hoped
    – Because separation of form from content is right
  But for a reason they barely thought about
    – Data must travel the web
Tree structured documents are a useable transfer syntax for just about anything
So data-oriented web users think of XML as a transfer mechanism for their data
Metadata for XML: RDF and XML-Link
XML and e-Business
Ed Feigenbaum once described Terry Winograd’s work as “a breakthrough in enthusiasm”
  I worry sometimes that XML and e-business are vulnerable to the same criticism
Negotiation between producers and consumers is the key
  If you can’t describe what you want, you can’t have it
  If you can’t describe what you’ve got, no-one will use it
  If you can’t dicker, you’ll always lose
So as far as I can see, for e-Business to be successful the Web badly needs a solution to the metadata problem
The problem
We're all drowning in information: our desktop machines are bursting, our intranets are vast, the World Wide Web is effectively infinite, and significantly different from one week to the next.
Traditional approaches to storage management (hierarchical file systems) and information retrieval (indexing on all words in every document) are clearly already barely coping at best, and are unlikely to meet the challenge.
The problem, cont'd
Exponential increase in average amount of local disk space per individual.
Hyper-exponential growth in connectivity: Intranets; The Internet
Hierarchical file systems, with perhaps formerly meaningful names, are just not adequate to the organisational task which arises.
The idea, if not yet the reality, of metadata has been widely touted as the solution to these problems
Where did I put that?
How can I find the information I need?
  In a document I wrote once;
  In a message I received once;
  In a message I sent once;
  In a document a colleague wrote;
  In a document somewhere on the Internet?
Traditional information-retrieval techniques are not working:
  Too many of the repositories (e.g. compressed mail archives) are resistant to indexing;
  Plain text indexing is too blunt an instrument.
The solution(s)?
There's been a lot of talk about metadata.
What is metadata?
  It's just data.
  But it's data about other data
  That machines can read.
What could metadata do for us?
  Give search engines something to work with that is designed for their needs.
  Give us all a place to record what a document is for or about.
Requirements for metadata
What would we need to make this work?
  A standard syntax, so metadata can be recognised as such;
  One or more standard vocabularies, so search engines, authors and users all speak the same language;
  Lots of documents with metadata attached;
  Attributions we can believe; sources we trust.
What is RDF?
RDF is actually two standardisation efforts, under the aegis of the W3C.
It stands for Resource Description Framework (in other words, data about data).
The two efforts are:
  Standardising the syntax and abstract semantics;
  Providing a standard way of defining standard vocabularies (but not actually defining any).
An aside: what is the W3C?
The World Wide Web Consortium.
  A voluntary association of companies and non-profit organisations. Membership costs serious money, confers voting rights. Complex procedures, with the Director (Tim Berners-Lee) holding all the high cards, but the big vendors (e.g. Microsoft, Adobe, Netscape) have a lot of power.
How do standards get drafted and approved?
  W3C Draft Recommendations come from Working Groups with little (XML) or a lot of input from W3C staff (CSS1,2). They are approved by the Director.
RDF Model and Syntax
The model is a labelled directed graph, with nodes being loci in hyperspace, and labels being atoms.
There is a notion of reification, which allows property types (= edge labels) and individual edges to themselves be described (i.e. effectively be the sources of edges)
The syntax is XML, almost but not quite expressible in a DTD
A simple RDF example
This is expressed using one of several shorthands
  The about attribute of a Description identifies the thing described
  Each sub-element name is a property
  Each sub-element either has
    – content (the atomic value of its property)
    – a resource attribute which points to its value
[Figure: RDF graph. Nodes: [ora's home page], [ora himself], "Ora Lassila", "Lassila@w3.org". Edge labels: s:Creator, v:Name, v:Email]
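One way to make the labelled-directed-graph model concrete is to hold each edge as a (source, label, target) triple. The Python sketch below assumes the figure shows an s:Creator edge from the home page to Ora, who in turn has v:Name and v:Email values; that reading of the figure is an assumption. The two URIs are placeholders rather than the real resource identifiers, and the values helper is made up for illustration.

```python
# Each edge of the labelled directed graph is a (source, label, target) triple.
HOME_PAGE = "http://example.org/ora/home"   # placeholder URI, not from the slides
ORA = "http://example.org/ora#me"           # placeholder URI, not from the slides

triples = {
    (HOME_PAGE, "s:Creator", ORA),          # assumed reading of the figure
    (ORA, "v:Name", "Ora Lassila"),
    (ORA, "v:Email", "Lassila@w3.org"),
}


def values(source, label):
    """Return every target reached from `source` along an edge labelled `label`."""
    return sorted(t for (s, p, t) in triples if s == source and p == label)


# Follow two edges: page -> creator -> name.
creators = values(HOME_PAGE, "s:Creator")
creator_names = [n for c in creators for n in values(c, "v:Name")]
print(creator_names)
```

Nothing here interprets the labels: the program only stores and follows edges, and any further meaning has to come from whoever consumes the triples.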
What RDF is not
RDF obviously cannot provide the third requirement: lots of meta-described documents
More subtly, RDF is not a knowledge representation system.
  There is no notion of what a conformant RDF application might do other than identify and normalise metadata.
  We can easily read much more into RDF annotations than is computationally there.
An alternative answer
If RDF is just about connections between points in hyperspace, why isn't XML Link all that's needed?
Before showing how this can be done, a brief diversion into XML Link
What is XLink
Just as XML itself simplified SGML while extending HTML
XLink simplifies HyTime while extending HTML
XLink provides mechanisms for
  Describing links with link elements
  Identifying links and link ends by type and role
  Locating link ends with a powerful locator syntax
  Incorporating link elements in-line or out-of-line
  Specifying default behaviours
Simple XLink example
This is a simple reconstruction of HTML's A element, specifying a two-ended link in-line with one implicit and one explicit locator
    <refr xlink:type="simple" xlink:href="http://www.w3.org/">The W3C</refr>
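A program can pick out simple links like this one without knowing the element name, because the xlink attributes identify them. The sketch below uses Python's minidom and assumes the xlink prefix is bound to the XLink 1.0 namespace, http://www.w3.org/1999/xlink; the wrapping doc element is added only to carry that declaration.

```python
from xml.dom.minidom import parseString

XLINK_NS = "http://www.w3.org/1999/xlink"   # XLink 1.0 namespace (assumed binding)

# The <doc> wrapper is hypothetical; <refr> is the link from the slide.
doc = parseString(
    f'<doc xmlns:xlink="{XLINK_NS}">'
    '<refr xlink:type="simple" xlink:href="http://www.w3.org/">The W3C</refr>'
    "</doc>"
)

links = []
for element in doc.getElementsByTagName("*"):
    # Any element counts as a simple link if it carries xlink:type="simple".
    if element.getAttributeNS(XLINK_NS, "type") == "simple":
        href = element.getAttributeNS(XLINK_NS, "href")
        links.append((element.firstChild.data, href))
print(links)
```

The implicit locator is the refr element itself and the explicit one is the href value.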
On the next slide is a richer example, specifying a two-ended link out-of-line with two explicit locators
More complex link example
    <connect xlink:type='extended'>
      <dutch xlink:type='locator' xlink:href='http://www.klm.nl/About/Nederlands/default.htm'/>
      <english xlink:type='locator' xlink:href='http://www.klm.nl/About/default.htm'/>
      This is a good example of hand-crafted home-page translation pairing.
    </connect>
Putting XML to Work: Conclusions
XML is moving fast
The language itself won't change much
Some of the add-ons are also stable
  Namespaces
  XSLT
Other things are pretty close to stabilising
  XML Link
  XML Schema
Some parts are much less stable
  XML Query, Protocols
New project today: what to use?
XML with DTD, one namespace, unless you really need SGML
  Ready for Schema
Author with one of the free tools
Validate with RXP
Render by:
  Using XSL transformation to HTML+CSS
    – Show with IE or Mozilla online
    – Use XT, Xalan, SAXON offline and show with IE4, Netscape
    – Use XML+CSS in simple cases, show with IE, Mozilla(?)
    – For data, use XML+XSL and IE or Mozilla
Script with Java+SAX; Javascript/Python + DOM
What will I use tomorrow?
XML+Schema and Namespaces
Validate with schema-based parser
Render with XSL to screen or print
Build tools with SAX or DOM