Are we There Yet? The State of the XML Revolution Henry S. Thompson Language Technology Group HCRC,...
-
Upload
lee-daniel -
Category
Documents
-
view
215 -
download
0
Transcript of Are we There Yet? The State of the XML Revolution Henry S. Thompson Language Technology Group HCRC,...
Are we There Yet? The State of the XML Revolution
Henry S. ThompsonLanguage Technology Group
HCRC, University of Edinburgh
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
2
Introduction XML is going to solve all our
problems: The information glut Web/print publication coordination Site management Electronic commerce :-) . . . The common cold
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
3
Overview Although XML the language is
stable, everything else is still in flux CSS/XSL for Style RDF/Xlink for meta-data DTD/Schema for document structure
definition I’ll give a selective introduction to
the latter two, then summarise business strategy as I see it
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
4An aside: what is the W3C? The World Wide Web Consortium. A voluntary association of companies and
non-profit organisations. Membership costs serious money, confers voting rights. Complex procedures, with the Chairman (Tim Berners-Lee) holding all the high cards, but the big vendors (e.g. Microsoft, Adobe, Netscape) have a lot of power.
How do standards get drafted and approved? W3C Draft Recommendations come from
Working Groups with little (XML) or a lot of input from W3C staff (CSS1,2). They are approved by the Chairman.
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
5Part 1, The Information Glut We're all drowning in information: our
desktop machines are bursting, our intranets are vast, the World Wide Web is effectively infinite, and significantly different from one week to the next.
Traditional approaches to storage management (hierarchical file systems) and information retrieval (indexing on all words in every document) are clearly already barely coping at best, and are unlikely to meet the challenge.
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
6
The problem Exponential increase in average amount
of local disk space per individual. Hyper-exponential growth in connectivity:
Intranets; The Internet
Hierarchical file systems, with perhaps formerly meaningful names, are just not adequate to the organisational task which arises.
The idea, if not yet the reality, of metadata has been widely touted as the solution to these problems
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
7
Where did I put that? How can I find the information I need?
In a document I wrote once; In a message I received once; In a message I sent once; In a document a colleague wrote; In a document somewhere on the Internet?
Traditional information-retrieval techniques are not working: Too many of the repositories (e.g.
compressed mail archives) are resistant to indexing;
Plain text indexing is too blunt an instrument.
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
8
The solution(s)? There's been a lot of talk about
metadata. What is metadata?
It's just data. But it's data about other data.
What could metadata do for us? Give search engines something to work
with that is designed for their needs. Give us all a place to record what a
document is for or about.
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
9Requirements for metadata What would we need to make this
work? A standard syntax, so metadata can be
recognised as such; One or more standard vocabularies, so
search engines, authors and users all speak the same language;
Lots of documents with metadata attached.
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
10
What is RDF? RDF is actually two standardisation
efforts, under the aegis of the W3C. It stands for Resource Description
Framework (in other words, data about data).
The two efforts are: Standardising a syntax and abstract
semantics for metadata; Providing a standard way of defining
standard metadata vocabularies (but not actually defining any).
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
11
RDF Model and Syntax The model is a labelled directed
graph, with nodes being loci in hyperspace, and labels being atoms.
There is a notion of reification, which allows property types (= edge labels) and individual edges to themselves be described (i.e. effectively be the sources of edges)
The syntax is XML, almost but not quite expressible in a DTD
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
12
An RDF example This is expressed using one of
several shorthands The about attribute of a Description
identifies the thing described Each sub-element name is a property Each sub-element either has
– content (the atomic value of its property)– a resource attribute which points to its
value[ora's home
page]
[ora himself]
s:Creator Ora Lassilav:Name
[email protected]:Email
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
13
What RDF is not RDF obviously cannot provide the
third requirement: lots of meta-described documents
More subtly, RDF is not a knowledge representation system. There is no notion of what a conformant
RDF application might do other than identify and normalise metadata.
We can easily read much more into RDF annotations than is computationally there.
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
14
An alternative answer If RDF is just about connections
between points in hyperspace, why isn't XML Link all that's needed?
Before showing how this can be done, a brief diversion into XML Link
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
15
What is XML-Link Just as XML itself simplified SGML while
extending HTML XML-link simplifies HyTime while
extending HTML XML-link provides mechanisms for
Describing links with link elements Identifying links and link ends by type and
role Locating link ends with a powerful locator
syntax Incorporating link elements in-line or out-of-
line Specifying default behaviours
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
16
Simple XML-link example This a simple reconstruction of
HTML's A element, specifying a two-ended link in-line with one implicit and one explicit locator<refr xml:link="simple"
href="http://www.w3.org/">The W3C</refr>
On the next slide is a richer example, specifying a two-ended link out-of-line with two explicit locators
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
17More complex link example
<connect xml:link='extended'> <dutch xml:link='locator' href='http://www.klm.nl/About/Nederlands/default.htm'/>
<english xml:link='locator' href='http://www.klm.nl/About/default.htm'/> This is a good example of hand-crafted home-page translation pairing.</connect>
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
18Using XML Link for metadata XML Link is all about pointing from
place to place in hyperspace So we can use it to reconstruct the
same kind of metadata that RDF is about
The attached example is not of course complete
I've restructured the syntax to make things simpler
Note that by declaring things in the DTD, the instance is much simpler
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
19
New aspects of XML Link We use the role attribute to indicate
what part is played by each endpoint of a link
We use the xml:attributes attribute to map attribute names from what the application wants to what XML Link expects
We cheat a bit to get inline string values
So on this account, RDF is just an application of XML Link
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
20
Namespaces for XML Where did those colons come from?
xsl:this, fo:that, xml:the_other Two communities pushed for
namespaces Vendors, to manage the composition of
document fragments– E.g. the inclusion of mathematical formulae
in a document Working groups, to reserve names
without compromising users' freedom to name things– E.g. it wouldn't do for XML-Link to reserve
LINK for simple links, or XSL to reserve TEXT
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
21
Namespaces, cont'd A second and near-final working
draft W3C recommendation was published in August 1998 There was a lot of vendor pressure to
get something in place, which caused political tension and at least one resignation from the WG
The example illustrates how namespaces are declared, scoped and used
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
22
Namespaces defined You can use qualified names,
consisting of two simple names separated by a colon (:)
The namespace prefix is an abbreviation for a URI which uniquely identifies the owner/meaning/identity of the source of the name
Using a namespace essentially cedes responsibility for the meaning of the qualified names to the owner of the URI
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
23
Declaring a namespace The association between namespace
prefixes and URIs is declared using reserved attributes<doc xmlns:mml=
'http://www.w3.org/TR/REC-MathML/'>...</doc>
Anywhere inside the above doc element mml is a legal namespace prefix, standing for the URI given
There is also a mechanism for defining the default (unprefixed) namespace
Declarations are scoped Qualified names can be used for
Element type names Attribute names
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
24
Namespace limitations An add-on for, not a rewrite of, the
XML spec Validation is unchanged
Declarations must match instances character by character
Indeed there's no place for associating prefixes with URIs in DTDs
There is no provision for merging DTDs XML Schema have responsibility for
addressing these issues in the near future
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
25
Conclusion We don't need/aren't ready for high-
level standards yet RDF Database serialisation
The basic mechanisms we have are very powerful XML Namespaces XML-Link
Let's focus on exploiting them creatively Light-weight components Designed for composition and pipelining
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
26Part 2: Document Structure Moving from the old (DTD) to the
new (Schemas)
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
27
Overview What are schemata, anyway? Where did this all start? The nature of document structure Taking control of structure definition The role of inheritance in structure
definition Comparison with existing techniques Datatypes
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
28
Terminology Documents have structure
Document types Document instances
Structure can be defined Informally (D. S. D.) SGML DTD XML DTD Schema using XML
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
29
Background SGML DTDs for D. S. D
Sperberg-McQueen Others
Considered for XML itself MCF, then RDF, now DCD, by Bray
et al. XML-Data, two versions, by Layman
et al.
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
30
Document Structure Two relations are constitutive
Part-of Kind-of
Existing DSD mechanisms use Content Models to specify part-of relations
But they only specify kind-of relations implicitly or informally
Making kind-of relations explicit would make both understanding and maintenance easier
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
31
Taking Control of D. S. D. Eric Naggum used to talk about SGML
allowing users to take control of their data
XML allows the same move one level up, for developers The starting point is much simpler The architecture is congenial The demand is there
We need to do this, to make the transition to validation easier
We need to do it now, to stimulate experimentation
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
32
Why validate? A D. S. D. is a contract between
producers and consumers It provides a guaranteed interface Producers validate to ensure they are
providing what they promised Consumers validate to check up on
producers and to protect their applications
Application authors validate to simplify their task Leave error detection and analysis to the
validating parser
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
33
Reconstructing DTDs The Schema DTD is expressed in vanilla
XML Top level element types for declaring
Element types :-) Entities Notations . . .
Subordinate element types for declaring Attributes Content models . . .
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
34
Schema example<!ELEMENT text (PCDATA|emph|name)*><!ATTLIST text
timestamp NMTOKEN #REQUIRED>
<ElementType name="text" content="mixed">
<attribute type="timestamp"/> <element type="emph"/> <element type="name"/></ElementType><AttributeType name="timestamp"
datatype="ISOTime" default="required"/>
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
35The Schema Architecture: Static A document consists of
A schema A document proper
Each is well-formed XML The schema is valid w.r.t the
Schema DTD The document proper is meta-valid
w.r.t the schema
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
36The Schema Architecture: Dynamic An XML application (XSP) which
meta-validates In the first instance, semantics in
terms of translation into vanilla XML
‘Takes control’ because changing how schemata work means changing the Schema DTD upgrading XSP accordingly not changing XML itself
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
37What to Do with our New Power? Get rid of parameter entities! Two main uses of P. E. s
element type disjunctions for content models
attribute declaration list fragments Both of these are indicative of
implicit kind-of hierarchies
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
38Conclusion: XML is moving fast The language itself won't change
much Things around it will change a lot
XML Link Namespaces XML Schema XSL
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
39What do I use for a new project today? XML with DTD, no namespaces, unless
you really need SGML Author with one of the free tools Validate with SP (best error messages) Render by:
Using DSSSL and JADE (robust, powerful, flexible)– Upgrade path via full XSL in due course
Using XSL transformation via XT to HTML+CSS– Show with IE4 or Netscape4– Upgrade via XML+CSS in due course
Build tools with SAX/DOM using Javascript, Java or Python
XML: Are we There Yet? Henry S. Thompson
ECN, Curtin, 1999.1.25
40
What will I use in a year? XML+Schema and Namespaces Validate with schema-based parser Render with XSL to screen or print Build tools with the DOM