XML for Information Management – Day 4
University of Erlangen-Nuremberg, Computational Linguistics
Instructor: Professor Airi Salminen, http://users.jyu.fi/~airi/
26.4.-30.4.2010
Outline
1. Entity types
2. Entity declarations and references
3. XML processor treatment of entity references
4. Motivations for the use of entities
5. XML family of languages
1. Entity types
The physical structure of an XML document consists of entities.
An entity is a unit recognized by the XML processor; its content is text or some other kind of data.
Three-dimensional categorization:
parsed entities vs. unparsed entities
internal entities vs. external entities
general entities vs. parameter entities
parsed entity
intended to be parsed by the XML processor; its content consists of marked-up text
unparsed entity
not intended to be parsed by the XML processor; its content can be any kind of data
internal entity
name and value given in the entity declaration; always a parsed entity
external entity
not internal; parsed or unparsed
general entity
used in elements and attribute values; parsed or unparsed; internal or external
parameter entity
used in the document type definition; always parsed; internal or external
Alternatives (the five possible combinations):
parsed internal parameter
parsed internal general
parsed external parameter
parsed external general
unparsed external general
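The five alternatives follow mechanically from the constraints on the previous slides: an internal entity is always parsed, and a parameter entity is always parsed. A minimal Python sketch that enumerates all eight combinations of the three dimensions and keeps the permitted ones; the function and variable names are illustrative only, not part of any XML API.

```python
from itertools import product

def allowed(parsing: str, location: str, usage: str) -> bool:
    """Apply the constraints from the slides to one combination."""
    if location == "internal" and parsing != "parsed":
        return False  # an internal entity is always parsed
    if usage == "parameter" and parsing != "parsed":
        return False  # a parameter entity is always parsed
    return True

# All 2 x 2 x 2 = 8 combinations of the three dimensions.
ALL = list(product(("parsed", "unparsed"),
                   ("internal", "external"),
                   ("parameter", "general")))

# Keep only the combinations the constraints permit.
ALTERNATIVES = [combo for combo in ALL if allowed(*combo)]

for combo in ALTERNATIVES:
    print(" ".join(combo))
```

Only one unparsed combination survives, because both constraints exclude the other three.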
INPUT FILES for XML processing:
• root entity, external subset of the DTD
• other files intended for XML processing
INTERNAL ENTITIES:
• name and textual content given in the DTD
UNPARSED ENTITIES:
• files not intended for XML processing but referred to by entity references in the INPUT FILES
The XML processor passes to the application information about:
• elements and attributes
• comments
• processing instructions
• character data
• namespaces
• notations and locations of unparsed entities
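Python's standard expat binding (`xml.parsers.expat`) is a non-validating XML processor that passes much of this information to the application through callback functions. A minimal sketch under stated assumptions: the sample document, the notation's system identifier "image/gif" and the processing instruction `render` are hypothetical, and namespace information is not collected here (that would need a parser created with a `namespace_separator`).

```python
import xml.parsers.expat

# Hypothetical sample document built from the slides' examples.
DOC = """<?xml version="1.0"?>
<!DOCTYPE poem [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY image1 SYSTEM "../images/birdnest.gif" NDATA gif>
  <!ATTLIST poem image ENTITY #IMPLIED>
]>
<poem image="image1"><!-- From a poem of Aale Tynni --><?render mode="print"?><line>Ihana pesä.</line></poem>
"""

def report_events(text):
    """Collect, in document order, what expat hands to the application."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each run of character data in one call

    def start_element(name, attrs):
        events.append(("element", name, attrs))

    def comment(data):
        events.append(("comment", data))

    def processing_instruction(target, data):
        events.append(("pi", target, data))

    def character_data(data):
        events.append(("chars", data))

    def notation_decl(name, base, system_id, public_id):
        events.append(("notation", name, system_id, public_id))

    def entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        if notation is not None:  # an NDATA part makes the entity unparsed
            events.append(("unparsed entity", name, system_id, notation))

    parser.StartElementHandler = start_element
    parser.CommentHandler = comment
    parser.ProcessingInstructionHandler = processing_instruction
    parser.CharacterDataHandler = character_data
    parser.NotationDeclHandler = notation_decl
    parser.EntityDeclHandler = entity_decl
    parser.Parse(text, True)
    return events

for event in report_events(DOC):
    print(event)
```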
2. Entity declarations and references
EntityDecl ::= GEDecl | PEDecl
GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
EntityDef ::= EntityValue | ( ExternalID NDataDecl? )
PEDef ::= EntityValue | ExternalID
In EntityDef and PEDef, EntityValue is the entity definition for an internal entity and ExternalID is the entity definition for an external entity; an NDataDecl after the ExternalID makes a general entity unparsed.
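The productions can be turned into a rough classifier for the three-dimensional categorization. This Python sketch uses a regular expression that only approximates the grammar (assumptions: a name is any run of characters other than white space, `%` and `>`, and no literal contains its own quote character); the helper `classify` is hypothetical.

```python
import re

Q = r"""(?:"[^"]*"|'[^']*')"""  # a quoted literal
ENTITY_DECL = re.compile(
    rf"<!ENTITY\s+(?P<pe>%\s+)?(?P<name>[^\s%>]+)\s+"
    rf"(?:(?P<value>{Q})"                           # EntityValue -> internal
    rf"|(?:SYSTEM\s+(?P<sys1>{Q})"                   # ExternalID, SYSTEM form
    rf"|PUBLIC\s+(?P<pub>{Q})\s+(?P<sys2>{Q}))"      # ExternalID, PUBLIC form
    rf"(?:\s+NDATA\s+(?P<ndata>[^\s>]+))?)"          # NDataDecl -> unparsed
    r"\s*>"
)

def classify(decl: str) -> dict:
    """Place one entity declaration in the three-dimensional categorization."""
    m = ENTITY_DECL.fullmatch(decl.strip())
    if m is None:
        raise ValueError(f"not a recognized entity declaration: {decl!r}")
    if m["pe"] and m["ndata"]:
        raise ValueError("PEDef has no NDataDecl: parameter entities are always parsed")
    return {
        "name": m["name"],
        "usage": "parameter" if m["pe"] else "general",
        "location": "internal" if m["value"] is not None else "external",
        "parsing": "unparsed" if m["ndata"] else "parsed",
        "notation": m["ndata"],
    }

SAMPLES = [
    '<!ENTITY % Shape "(rect | circle | poly | default )">',
    '<!ENTITY JY "Jyväskylän yliopisto">',
    '<!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN"\n  "xhtml-symbol.ent">',
    '<!ENTITY image1 SYSTEM "../images/birdnest.gif" NDATA gif>',
]
for decl in SAMPLES:
    info = classify(decl)
    print(info["name"], info["parsing"], info["location"], info["usage"])
```

A real XML processor tokenizes the DTD instead of pattern matching; the regular expression only keeps the example short.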
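internal entity
name and value (= literal value) given:
<!ENTITY % Shape "(rect | circle | poly | default )">
<!ENTITY JY "Jyväskylän yliopisto">
In each declaration the entity name (Shape, JY) is followed by its literal value in quotes; "Jyväskylän yliopisto" is Finnish for "University of Jyväskylä".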
external entity
name and system identifier (possibly together with a public identifier) given; for an unparsed entity, also a notation
Declarations from the XHTML specification (http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html):
<!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN"
  "xhtml-symbol.ent">
<!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special for XHTML//EN"
  "xhtml-special.ent">
An external general entity declared with a system identifier only:
<!ENTITY virtuaaliyliopistouutiset SYSTEM "http://virtuaaliyliopisto.jyu.fi/kotisivut/sisalto/etusivu/newsfeed.xml">
Unparsed entity
the declaration also gives a notation name (here gif):
<!ENTITY image1 SYSTEM "../images/birdnest.gif" NDATA gif>
The notation must be declared, for example:
<!NOTATION gif PUBLIC "-//ISBN 0-7923-9432-1::Graphic Notation//NOTATION CompuServe Graphic Interchange Format//EN" >
References to parameter entities:
%Shape;
%HTMLsymbol;
References to parsed general entities:
&JY;
&virtuaaliyliopistouutiset;
Reference to an unparsed general entity (the entity name used as an attribute value):
<poem image="image1">
The type of the attribute has to be ENTITY or ENTITIES.
In addition to entity references, XML documents may contain character references.
A character reference refers to a specific Unicode character.
It gives a decimal (&#nnn;) or hexadecimal (&#xhhh;) representation of the character's code point in Unicode.
Example: &#34; refers to the quotation mark character (").
One-character entity defined with a character reference: <!ENTITY quot "&#34;">
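In Python's `xml.etree.ElementTree`, character references are already replaced by the time the application sees the text. A short sketch; the helper `char_refs` is hypothetical and simply formats a character's code point in the two notations.

```python
import xml.etree.ElementTree as ET

def char_refs(ch: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal character references for one character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"

# &#228; (decimal) and &#xE4; (hexadecimal) both denote code point 228 = 0xE4 ("ä");
# &#34; denotes the quotation mark, which is also what the predefined entity quot stands for.
root = ET.fromstring("<p>&#228;&#xE4;&#34;&quot;</p>")
print(repr(root.text))   # 'ää""'
print(char_refs("ä"))    # ('&#228;', '&#xE4;')
print(char_refs('"'))    # ('&#34;', '&#x22;')
```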
Where can an entity or character reference occur?
‣parameter entity: document type definition
‣parsed general entity: element content; attribute value (either in the start-tag or in the attribute definition; internal entities only); entity value
‣unparsed general entity: attribute value (either in the start-tag or in the attribute definition), as a name in an attribute of type ENTITY or ENTITIES
‣character: element content; attribute value (either in the start-tag or in the attribute definition); entity value
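A sketch with `xml.etree.ElementTree` of two rows of the table: an internal parsed general entity referenced in an attribute value and in element content, and a general entity reference inside another entity's value, which is expanded only when that entity is used. The entity `place` and the element `note` are hypothetical; `JY` is declared as on the earlier slide.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE note [
  <!ENTITY JY "Jyväskylän yliopisto">
  <!ENTITY place "&JY;, Finland">
]>
<note from="&JY;">Greetings from &place;</note>"""

root = ET.fromstring(DOC)
print(root.get("from"))  # reference in an attribute value: Jyväskylän yliopisto
print(root.text)         # reference in element content: Greetings from Jyväskylän yliopisto, Finland
```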
3. XML processor treatment of entity references
References to unparsed entities
A validating processor makes the identifiers of the entities and of their associated notations available to the application. Example:
<poem image="image1"><!-- From a poem of Aale Tynni --><line>Seisoin ikkunassa ja nauroin. Ihana puu.</line><line>Ihana pesä.</line></poem>
(The Finnish lines read: "I stood at the window and laughed. A lovely tree. A lovely nest.")
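Python's expat does not validate, so an application that wants the behavior described above has to do the lookup itself. The sketch below (the function `unparsed_references` and the notation's system identifier "image/gif" are hypothetical) collects unparsed entity and notation declarations and resolves every value of an attribute declared as ENTITY or ENTITIES.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE poem [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY image1 SYSTEM "../images/birdnest.gif" NDATA gif>
  <!ATTLIST poem image ENTITY #IMPLIED>
]>
<poem image="image1"><line>Ihana pesä.</line></poem>"""

def unparsed_references(text):
    """Return (entity, entity system id, notation, notation system id) per ENTITY value."""
    entities, notations, entity_attrs, found = {}, {}, set(), []
    parser = xml.parsers.expat.ParserCreate()

    def entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        if notation is not None:  # only unparsed (NDATA) entities are of interest
            entities[name] = (system_id, notation)

    def notation_decl(name, base, system_id, public_id):
        notations[name] = system_id

    def attlist_decl(element, attribute, attr_type, default, required):
        if attr_type in ("ENTITY", "ENTITIES"):
            entity_attrs.add((element, attribute))

    def start_element(name, attrs):
        for attribute, value in attrs.items():
            if (name, attribute) in entity_attrs:
                for token in value.split():  # ENTITIES may hold several names
                    system_id, notation = entities[token]  # KeyError if undeclared
                    found.append((token, system_id, notation, notations.get(notation)))

    parser.EntityDeclHandler = entity_decl
    parser.NotationDeclHandler = notation_decl
    parser.AttlistDeclHandler = attlist_decl
    parser.StartElementHandler = start_element
    parser.Parse(text, True)
    return found

print(unparsed_references(DOC))
# [('image1', '../images/birdnest.gif', 'gif', 'image/gif')]
```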
References to parsed entities
Two kinds of entity values are involved:
literal value: the character string written between the quotes in the entity declaration
replacement text: derived from the literal value by replacing its character references and parameter entity references with their character values and replacement texts, respectively (references to general entities are left as they are)
The XML processor replaces the entity reference with its replacement text.
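A minimal Python sketch of the derivation rule, assuming parameter entities are processed in declaration order so that every referenced one already has its replacement text (XML requires a parameter entity to be declared before it is referenced). The names are hypothetical, and real processors handle details omitted here, such as the space characters added around a parameter-entity replacement text when the reference occurs in the DTD outside an entity value.

```python
import re

# The two kinds of references replaced when the replacement text is derived:
# character references (hexadecimal or decimal) and parameter-entity references.
REF = re.compile(r"&#x(?P<hex>[0-9a-fA-F]+);|&#(?P<dec>[0-9]+);|%(?P<pe>[^\s%;]+);")

def replacement_text(literal: str, pe_texts: dict[str, str]) -> str:
    """Derive a replacement text from a literal value.

    pe_texts maps already-declared parameter entities to their replacement texts.
    General entity references such as &JY; do not match REF and stay untouched.
    """
    def substitute(m: re.Match) -> str:
        if m["hex"]:
            return chr(int(m["hex"], 16))
        if m["dec"]:
            return chr(int(m["dec"]))
        return pe_texts[m["pe"]]  # KeyError: parameter entity not declared yet
    return REF.sub(substitute, literal)

# The coreattrs declarations from the XHTML specification, in declaration order.
pe = {}
pe["StyleSheet"] = replacement_text("CDATA", pe)
pe["Text"] = replacement_text("CDATA", pe)
pe["coreattrs"] = replacement_text(
    "id ID #IMPLIED class CDATA #IMPLIED "
    "style %StyleSheet; #IMPLIED title %Text; #IMPLIED", pe)
print(pe["coreattrs"])
# id ID #IMPLIED class CDATA #IMPLIED style CDATA #IMPLIED title CDATA #IMPLIED
print(replacement_text("<rhyme xml:lang=&#34;fi&#34;>", pe))  # <rhyme xml:lang="fi">
print(replacement_text("&JY; &#xE4;", pe))                     # &JY; ä
```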
Entity declaration:
<!ENTITY rhyme1 "<rhyme xml:lang="fi"><line>Ole aina iloinen</line><line>niin kuin pikku varpunen</line></rhyme>">
The XML processor is not able to parse this: the quote just before fi ends the literal value, so the quotes inside the quoted literal break the declaration.
(The Finnish rhyme means "Always be cheerful, just like a little sparrow.")
Entity declaration (here the replacement text = the literal value):
<!ENTITY rhyme1 "<rhyme><line>Ole aina iloinen</line><line>niin kuin pikku varpunen</line></rhyme>">
Entity reference:
<rhymecollection>&rhyme1; </rhymecollection>
Replacement text, which the XML processor puts in place of the reference:
<rhyme><line>Ole aina iloinen</line><line>niin kuin pikku varpunen</line></rhyme>
Entity declaration with character references:
<!ENTITY rhyme1 "<rhyme xml:lang=&#34;fi&#34;><line>Ole aina iloinen</line><line>niin kuin pikku varpunen</line></rhyme>">
Entity reference:
<rhymecollection>&rhyme1; </rhymecollection>
Literal value:
<rhyme xml:lang=&#34;fi&#34;><line>Ole aina iloinen</line><line>niin kuin pikku varpunen</line></rhyme>
Replacement text:
<rhyme xml:lang="fi"><line>Ole aina iloinen</line><line>niin kuin pikku varpunen</line></rhyme>
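The same example run through Python's `xml.etree.ElementTree`: the reference `&rhyme1;` becomes a `rhyme` element with its `xml:lang` attribute, while the first, unescaped declaration is rejected as not well-formed. Wrapping the slide's declaration in a minimal internal DTD subset is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"  # how ElementTree names the xml: prefix

DOC = """<!DOCTYPE rhymecollection [
  <!ENTITY rhyme1 "<rhyme xml:lang=&#34;fi&#34;><line>Ole aina iloinen</line><line>niin kuin pikku varpunen</line></rhyme>">
]>
<rhymecollection>&rhyme1; </rhymecollection>"""

root = ET.fromstring(DOC)
rhyme = root[0]  # element built from the replacement text
print(rhyme.tag, rhyme.get(XML_NS + "lang"))  # rhyme fi
print([line.text for line in rhyme])          # ['Ole aina iloinen', 'niin kuin pikku varpunen']

# The first version of the declaration, with plain quotes inside the literal.
try:
    ET.fromstring(DOC.replace("&#34;", '"'))
except ET.ParseError as err:
    print("not well-formed:", err)
```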
Declarations from the XHTML specification (http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html):
<!ENTITY % StyleSheet "CDATA"> <!-- style sheet data -->
<!ENTITY % Text "CDATA"> <!-- used for titles etc. -->
<!ENTITY % coreattrs "id ID #IMPLIED class CDATA #IMPLIED
  style %StyleSheet; #IMPLIED title %Text; #IMPLIED">
Literal value of coreattrs: id ID #IMPLIED class CDATA #IMPLIED style %StyleSheet; #IMPLIED title %Text; #IMPLIED
Replacement text of coreattrs: id ID #IMPLIED class CDATA #IMPLIED style CDATA #IMPLIED title CDATA #IMPLIED
Exercise
Entity declaration from the XHTML Strict DTD:
<!ENTITY % Block "(%block; | form | %misc; )*">
What are (a) the literal value and (b) the replacement text of the entity Block?
(a) Literal value: (%block; | form | %misc; )*
Other entity declarations needed from the DTD (XHTML specification, http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html):
<!ENTITY % heading "h1| h2| h3| h4| h5| h6">
<!ENTITY % lists "ul | ol | dl">
<!ENTITY % blocktext "pre | hr | blockquote | address">
<!ENTITY % block "p | %heading; | div | %lists; | %blocktext; | fieldset | table">
<!ENTITY % misc.inline "ins | del | script">
<!ENTITY % misc "noscript | %misc.inline;">
(b) Deriving the replacement text of Block: the references to parameter entities in the literal value (%block; | form | %misc; )* are replaced by their replacement texts.
Literal value of block: p | %heading; | div | %lists; | %blocktext; | fieldset | table
Replacement text of block: p | h1| h2| h3| h4| h5| h6 | div | ul | ol | dl | pre | hr | blockquote | address | fieldset | table
Literal value of misc: noscript | %misc.inline;
Replacement text of misc: noscript | ins | del | script
Replacement text of Block: (p | h1| h2| h3| h4| h5| h6 | div | ul | ol | dl | pre | hr | blockquote | address | fieldset | table | form | noscript | ins | del | script )*
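The derivation can be checked mechanically. This Python sketch processes the parameter-entity declarations in order, replacing each `%name;` with the replacement text derived earlier; it assumes that Block is declared after block and misc, as the derivation requires, and it ignores character references, which do not occur in these values.

```python
import re

# Parameter-entity declarations from the slides, in declaration order
# (a parameter entity must be declared before it is referenced).
DECLS = [
    ("heading", "h1| h2| h3| h4| h5| h6"),
    ("lists", "ul | ol | dl"),
    ("blocktext", "pre | hr | blockquote | address"),
    ("block", "p | %heading; | div | %lists; | %blocktext; | fieldset | table"),
    ("misc.inline", "ins | del | script"),
    ("misc", "noscript | %misc.inline;"),
    ("Block", "(%block; | form | %misc; )*"),
]

PE_REF = re.compile(r"%([^\s%;]+);")

replacement = {}
for name, literal in DECLS:
    # Replace each %name; by the replacement text derived for it earlier.
    replacement[name] = PE_REF.sub(lambda m: replacement[m.group(1)], literal)

print(replacement["Block"])
# (p | h1| h2| h3| h4| h5| h6 | div | ul | ol | dl | pre | hr | blockquote | address | fieldset | table | form | noscript | ins | del | script )*
```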
4. Motivations for the use of entities
The use of entities supports:
• use of non-textual data (audio, graphics, etc.) in XML documents (such data can also be added through stylesheets)
• modularization of documents
• consistency
• reuse of definitions
• adding semantic information through informative entity names and comments attached to entity declarations
5. XML family of languages
The XML 1.0 specification was only the first step in the development of languages for managing data on the Web.
‣W3C (World Wide Web Consortium) develops specifications to support the use of the Web; the specifications are publicly available at http://www.w3.org/TR/
‣Development is systematic
‣The development process is specified and published
Phases of the W3C development process:
‣Working Draft: represents work in progress.
‣Candidate Recommendation: has received significant review from its immediate technical community; an explicit call for implementation and technical feedback.
‣Proposed Recommendation: represents consensus within the development group; proposed to the Advisory Committee for review.
‣Recommendation: represents consensus within W3C; widespread implementation is encouraged.
XML family = XML + XML-related languages
A. Salminen, XML Family of Languages. Overview and Classification. http://users.jyu.fi/~airi/xmlfamily.html
XML-related languages fall into the following categories:
• XML accessory: intended for wide use to extend the capabilities of XML
• XML transducer: intended for transducing some input XML data into some output form
• XML application: intended for some special application domain; defines constraints for XML data in that domain
XML Accessory
additional rules extending the capabilities specified in XML
intended for wide use; developed primarily at W3C to realize the W3C modularization principle: keep XML itself small and as stable as possible
most important: XML Names, XML Schema, XPath, XLink
W3C Recommendations for XML Accessories (language: purpose, years of Recommendation):
‣XML Names: qualifying element and attribute names (1999, 2004, 2006, 2009)
‣XML Stylesheet: associating style sheets with an XML document (1999)
‣XPath: addressing parts of XML documents (1999, 2007)
‣XML Schema: constraining a class of XML documents (2001, 2004)
‣XLink: creating and describing links (2001)
‣XML Base: a base URI service (2001)
‣XPointer: fragment identifiers, especially for URI references (2003)
‣xml:id: the attribute xml:id in XML documents (2005)
‣ITS (Internationalization Tag Set): a mechanism to support internationalization and localization of content (2007)
XML Transducer
To convert XML input data (a document, part of a document, a set of documents) into output
Associated with a processing model
Active development at W3C
most important: CSS, XSL, XSLT, XQuery
W3C Recommendations for XML Transducers (language: purpose, years of Recommendation):
‣CSS: rendering (CSS1 1996, CSS2 1998)
‣XSLT: transformation (1999, 2007)
‣Canonical XML: canonicalization (2001, 2002)
‣XSL: rendering (2001, 2006)
‣XInclude: merging (2004, 2006)
‣XQuery: querying (2007)
XML Application
Defines constraints for a class of XML data in a particular application domain
Usually defined by a DTD or some other schema language
Development work both at W3C and outside it
Examples from W3C: SMIL, RDF, XHTML
XML Applications developed at W3C for:
• Non-textual data
• Web publishing
• Metadata and the Semantic Web
• Web communication and services
W3C Recommendations for non-textual data (language: purpose, years of Recommendation):
‣SMIL (Synchronized Multimedia Integration Language): integrating a set of independent multimedia objects into a synchronized multimedia presentation (1998, 2001, 2005)
‣MathML (Mathematical Markup Language): mathematical notation, especially for enabling the encoding of mathematical material for the Web (1999, 2001, 2003)
‣Ruby Annotation: markup for ruby, short annotations alongside the base text, typically used in East Asian documents (2001)
‣SMIL Animation: animation functionality in XML documents (2001)
‣SVG: describing two-dimensional vector and mixed vector/raster graphics (2001, 2003)
‣VoiceXML (Voice Extensible Markup Language): describing audio dialogs, thus supporting interactive voice response applications on the Web (2004, 2007)
‣SSML (Speech Synthesis Markup Language): assisting the generation of synthetic speech in Web and other applications (2004)
‣EMMA (Extensible MultiModal Annotation markup language): enabling Web access through multimodal interfaces (2009)
W3C Recommendations for Web publishing (language: purpose, years of Recommendation):
‣XHTML: a reformulation of HTML 4.0 in XML, specified by three document types: Strict, Transitional, Frameset (1999, 2000, 2002)
‣XHTML Modularization: defining XHTML elements and attributes in a set of modules (2001)
‣XHTML Basic: the minimal core of XHTML (2000)
‣XML Events: representing asynchronous occurrences, such as mouse clicks, in XHTML or in other XML markup (2003)
‣XForms: Web forms allowing online interaction between human users and software, for use in XHTML or in other XML markup (2003, 2006)
‣XHTML-Print: simple XHTML suitable for printing from mobile devices as well as for display (2006)
W3C Recommendations for the Semantic Web (language: purpose, years of Recommendation):
‣RDF (Resource Description Framework): a model and XML-based language for metadata describing Web resources (1999, 2004)
‣RDF Schema: defining RDF vocabularies (2004)
‣OWL (Web Ontology Language): publishing and sharing ontologies (2004)
‣WebCGM XCF: metadata for WebCGM pictures (2007)
‣GRDDL (Gleaning Resource Descriptions from Dialects of Languages): markup for declaring that an XML document includes RDF-compatible data (2007)
‣SPARQL: query language for RDF (2008)
‣POWDER: metadata describing a group of resources (2009)
W3C Recommendations for Web communication and services (language: purpose, years of Recommendation):
‣P3P (Platform for Privacy Preferences): enabling Web sites to express their practices for collecting and using data from their users (2002)
‣XML-Signature: associating digital signatures, represented in XML, with digital objects (2002)
‣XML Encryption: encrypting data and representing the result in XML (2002)
‣SOAP (Simple Object Access Protocol): rules for exchanging structured and typed information between peers in a decentralized, distributed environment (2003, 2007)
‣CC/PP (Composite Capabilities/Preference Profiles): a format in which a client device tells an origin server about its user agent profile (2004)
‣XKMS (XML Key Management Specification): a protocol for distributing and registering public keys (2005)
‣WSDL (Web Services Description Language): describing Web services (2007)
‣SML: service modeling (2009)
For more information:
A. Salminen, XML Family of Languages. Overview and Classification. http://users.jyu.fi/~airi/xmlfamily.html