Introduction to XML
John Arnett, MSc
Standards Modeller
Information and Statistics Division
NHSScotland
Tel: 0131 551 8073 (x2073)
mailto:[email protected]
isdscotland.org/xml
Contents
• What is XML?
• Anatomy of an XML Document
• Conformance and Validation
• Summary
• Find Out More
What is XML?
• XML is not…
– a programming language
– a software panacea
– an object-oriented technology
– HTML with funny tags
– a replacement for HTML
… but it is re-shaping publishing on the web
What is XML?
• Stands for Extensible Markup Language
– Meta-markup language derived from SGML (Standard Generalised Markup Language)
– Open Standard, currently XML 1.0 2nd edition (W3C Recommendation 6 October 2000)
What is XML?
• W3C says
– XML is the universal format for structured documents and data on the Web
– A data object is an XML document if it is well-formed, as defined in [the W3C] specification (more on this later)
What is XML?
• Data Content and Presentation
Sample dataset

ID      SURNAME   FORENAME  SEX  DOB
147678  Jackson   Sarah     1    15061976
111672  Martin    Lesley    0    12111979
198457  McKenzie  Alison    1    23081983
134376  Jones     Ian       0    06011971

Flat file, database, spreadsheet, etc.
• Record – data oriented structure
111672 Martin Lesley 0 12111979
What is XML?
Structured Searchable Easy to understand Portable
What is XML?
• HTML – document oriented structure
<h1>Record Id: <font color="red">11672</font></h1>
<table>
  <colgroup><col align="left"></colgroup>
  <tr><th>Surname:</th><td>Martin</td></tr>
  <tr><th>Given Name:</th><td>Lesley</td></tr>
  <tr><th>Sex:</th><td>Male</td></tr>
  <tr><th>Date of Birth:</th><td>12 November 1979</td></tr>
</table>

Record Id: 11672
Surname: Martin
Given Name: Lesley
Sex: Male
Date of Birth: 12 November 1979
Easy to understand Portable Structured Searchable
What is XML?
• XML to the rescue!
<Record recordId="11672">
  <Surname>Martin</Surname>
  <GivenName>Lesley</GivenName>
  <Sex>M</Sex>
  <DateOfBirth>
    <Day>12</Day>
    <Month>11</Month>
    <Year>1979</Year>
  </DateOfBirth>
</Record>
Easy to understand Portable Structured Searchable
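To make "structured" and "searchable" concrete, here is a minimal sketch that reads the record with Python's standard xml.etree.ElementTree module. The choice of Python and the variable names are assumptions made for illustration; they are not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The record from the slide, with straight quotes around the attribute value.
RECORD_XML = """
<Record recordId="11672">
  <Surname>Martin</Surname>
  <GivenName>Lesley</GivenName>
  <Sex>M</Sex>
  <DateOfBirth>
    <Day>12</Day><Month>11</Month><Year>1979</Year>
  </DateOfBirth>
</Record>
"""

record = ET.fromstring(RECORD_XML)

# Each value is addressed by name rather than by column position.
record_id = record.get("recordId")                        # attribute lookup
surname = record.findtext("Surname")                      # direct child element
year_of_birth = int(record.findtext("DateOfBirth/Year"))  # nested element

print(record_id, surname, year_of_birth)
```

Because the tags name the data, the lookup keeps working if further elements are added to the record later.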
What is XML?
• But XML also…
– Structured
– Separates data from presentation
– Self-describing
– Searchable
– Extensible
  • i.e. any number of tags allowed
Anatomy of an XML Document
• XML documents consist of text
– character data
  • tab, carriage return and line feed
  • Unicode characters
– markup
Anatomy of an XML Document
• Markup
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
– start-, end- and empty element tags
  • tag names are case sensitive!
– entity and character references
– comments
Anatomy of an XML Document
• Character data
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
– Reserved characters: &, <, >, ' and "
Anatomy of an XML Document
• Declaration
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
– Optional first line of markup (but W3C recommended)
– Tells the parser which XML version and character encoding the document uses
Anatomy of an XML Document
• Root Element
<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
– The single outermost element of the document
– Contains all the data and links to other documents
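The pieces named on the last few slides can be seen by parsing the sample message. This is a sketch using Python's standard xml.etree.ElementTree, which is an assumption of this example and not a tool named in the slides. Note that this parser discards comments by default, so only the element child is listed.

```python
import xml.etree.ElementTree as ET

# Bytes input, so the parser honours the encoding named in the declaration.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<Message>
  <!-- this is an xml comment -->
  <MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
"""

root = ET.fromstring(DOC)

root_tag = root.tag                          # name of the root element
child_tags = [child.tag for child in root]   # comments are dropped by default
body = root.findtext("MessageBody")          # character data of the child

print(root_tag, child_tags, body)
```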
Anatomy of an XML Document
• Elements
<Book>XML Bible
  <Price>24.99</Price>
  <img src="book.gif"/>
  <Author>E.R. Harold</Author>
  <Publisher>J. Forbes</Publisher>
</Book>
– Define the content of the XML document
– May contain other elements, character data or can be empty
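The three kinds of content listed above (other elements, character data, nothing at all) all appear in the Book example. The sketch below, again assuming Python's xml.etree.ElementTree, picks each one out; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

BOOK = """<Book>XML Bible
  <Price>24.99</Price>
  <img src="book.gif"/>
  <Author>E.R. Harold</Author>
  <Publisher>J. Forbes</Publisher>
</Book>"""

book = ET.fromstring(BOOK)

# Character data that sits directly inside <Book>, before its first child.
title = book.text.strip()

# Child elements nested inside <Book>.
child_names = [child.tag for child in book]

# An empty element has no children and no character data.
img = book.find("img")
img_is_empty = len(img) == 0 and img.text is None

print(title, child_names, img_is_empty)
```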
Anatomy of an XML Document
• Attributes
<BookCatalog Subject="XML">
  <Book Title="XML Bible" Price="24.99"/>
  <Book Title="XML How To Program" Price="19.99"/>
  <Book Title="Definitive XML Schema" Price="44.99"/>
</BookCatalog>
– Add data about the elements
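A short sketch of reading attribute values from the catalogue, assuming Python's xml.etree.ElementTree. Attribute values always arrive as strings, so the price is converted explicitly; the dictionary and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

CATALOG = """<BookCatalog Subject="XML">
  <Book Title="XML Bible" Price="24.99"/>
  <Book Title="XML How To Program" Price="19.99"/>
  <Book Title="Definitive XML Schema" Price="44.99"/>
</BookCatalog>"""

catalog = ET.fromstring(CATALOG)

subject = catalog.get("Subject")

# Map each title to its price; attribute values are plain strings until converted.
prices = {book.get("Title"): float(book.get("Price"))
          for book in catalog.findall("Book")}

cheapest = min(prices, key=prices.get)

print(subject, cheapest, prices[cheapest])
```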
Anatomy of an XML Document
• Handling reserved characters
– Built-in entities
  &amp; = &   &quot; = "   &lt; = <   &gt; = >   &apos; = '
– CDATA Sections
<CodeSnippet>
  <![CDATA[if(this->getX() < 5 && values[0] >= 10) cerr << "out of range";]]>
</CodeSnippet>
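Both approaches carry the same text to the reader of the document. The sketch below assumes Python's standard library: xml.sax.saxutils.escape replaces &, < and > with entity references, and the parsed result is compared with a CDATA version of the same string. The sample string is a simplified stand-in for the snippet on the slide.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

code = 'if (x < 5 && y >= 10) cerr << "out of range";'

# Option 1: replace the reserved characters with built-in entity references.
escaped = escape(code)
from_entities = ET.fromstring(
    "<CodeSnippet>" + escaped + "</CodeSnippet>").text

# Option 2: wrap the raw text in a CDATA section, leaving it untouched.
from_cdata = ET.fromstring(
    "<CodeSnippet><![CDATA[" + code + "]]></CodeSnippet>").text

print(escaped)
print(from_entities == code, from_cdata == code)
```

A CDATA section cannot itself contain the sequence ]]>, so entity references remain the general-purpose option.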
Anatomy of an XML Document
• Namespaces
– Preventing naming collisions
<order xmlns:cust="http://www.example.com/custDetails"
       xmlns:book="http://www.example.com/bookDetails"
       xmlns="http://www.example.com/order">
  <cust:title>Dr</cust:title>
  <cust:name>Peter Parker</cust:name>
  <book:title>White Teeth</book:title>
  <book:price>5.99</book:price>
  <orderNumber>AYT2379</orderNumber>
</order>
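The two title elements above stay distinct because each is qualified by its namespace URI. A sketch of looking them up with Python's xml.etree.ElementTree follows; the prefix "o" for the default namespace is an arbitrary label chosen for this example, since prefixes are local shorthand and only the URIs matter.

```python
import xml.etree.ElementTree as ET

ORDER = """<order xmlns:cust="http://www.example.com/custDetails"
       xmlns:book="http://www.example.com/bookDetails"
       xmlns="http://www.example.com/order">
  <cust:title>Dr</cust:title>
  <cust:name>Peter Parker</cust:name>
  <book:title>White Teeth</book:title>
  <book:price>5.99</book:price>
  <orderNumber>AYT2379</orderNumber>
</order>"""

# Prefix-to-URI map used only for the lookups below.
NS = {
    "cust": "http://www.example.com/custDetails",
    "book": "http://www.example.com/bookDetails",
    "o": "http://www.example.com/order",
}

order = ET.fromstring(ORDER)

customer_title = order.findtext("cust:title", namespaces=NS)
book_title = order.findtext("book:title", namespaces=NS)
order_number = order.findtext("o:orderNumber", namespaces=NS)

print(customer_title, book_title, order_number)
```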
Conformance and Validation
• All XML processors must check well-formedness constraints
– One root element
– Start and end tags match: <Tag>content</Tag>
– Empty elements are terminated as <Tag/>
– Tags are correctly nested: <Parent><Child></Child></Parent>
– All attribute values are enclosed in quotes
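Each rule above can be tried against a parser. In this sketch, assuming Python's xml.etree.ElementTree, the hypothetical helper is_well_formed returns False whenever the parser rejects the input with a ParseError.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

samples = {
    "<Tag>content</Tag>": "matching start and end tags",
    "<Tag/>": "empty element",
    "<Parent><Child></Child></Parent>": "correct nesting",
    "<Parent><Child></Parent></Child>": "overlapping tags",
    "<Tag></tag>": "case mismatch in tag names",
    "<A/><B/>": "two root elements",
    "<Tag attr=value/>": "unquoted attribute value",
}

for text, description in samples.items():
    print(is_well_formed(text), description)
```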
Conformance and Validation
• Validating XML processors check against validity constraints
– specified in Document Type Definitions (DTDs) or Schemas
– a valid XML document must be well-formed
– a well-formed document need not necessarily be valid
Document Type Definitions
• DTD syntax able to specify
– Structure and order of child elements
<!ELEMENT Product (Name, Size?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Size (#PCDATA)>
– Element attributes
  • limited number of data types
  • default and fixed attribute values
<!ATTLIST Product EffDate CDATA #IMPLIED>
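The content model (Name, Size?) says that a Product holds exactly one Name, optionally followed by one Size. The sketch below is a hand-rolled illustration of that single rule, not a DTD validator: xml.etree.ElementTree, used here, does not validate against a DTD, and the function name and regular expression are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# (Name, Size?) rewritten as a pattern over the sequence of child tag names.
PRODUCT_MODEL = re.compile(r"Name( Size)?")

def matches_product_model(text):
    """Check one well-formed <Product> document against (Name, Size?)."""
    root = ET.fromstring(text)
    if root.tag != "Product":
        return False
    child_sequence = " ".join(child.tag for child in root)
    return PRODUCT_MODEL.fullmatch(child_sequence) is not None

# All four inputs are well-formed; only the first two satisfy the model.
print(matches_product_model("<Product><Name>Widget</Name></Product>"))
print(matches_product_model(
    "<Product><Name>Widget</Name><Size>3</Size></Product>"))
print(matches_product_model(
    "<Product><Size>3</Size><Name>Widget</Name></Product>"))
print(matches_product_model("<Product/>"))
```

The last two inputs show the point made earlier: a document can be well-formed without being valid.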
Document Type Definitions
• DTDs
– Easy to understand and implement
– Lightweight alternative to schemas
– But…
  • use non-XML syntax
  • only limited support for data typing and namespaces
  • difficult to extend
Schemas
• W3C Schema
– Uses XML syntax
– Provides built-in data types and supports user-defined ones
– Supports namespaces
– Provides several extensibility mechanisms
Schemas
• Schemas therefore more flexible…
<xs:element name="Product">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Size" type="xs:positiveInteger" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="EffDate" type="xs:date"/>
  </xs:complexType>
</xs:element>

<!ELEMENT Product (Name, Size?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Size (#PCDATA)>
<!ATTLIST Product EffDate CDATA #IMPLIED>
• but harder to understand than DTDs
In Summary…
• A language for describing markup languages
• Extensible, i.e. define own tags
• Readable, structured and self-describing
• Documents must be well-formed
• Documents may be validated using DTDs and/or Schemas
Find Out More
• World Wide Web Consortium
– www.w3.org
• W3C XML v1.0 Specification
– http://www.w3.org/TR/REC-xml
Find Out More
• The XML Industry Portal
– www.xml.org
• O’Reilly XML site
– www.xml.com
• XML Cover Pages
– www.oasis-open.org/cover/
• Café Con Leche
– www.ibiblio.org/xml/
Find Out More
• Scottish Health and Community Care XML Steering Group
– www.isdscotland.org/xml
XML Tools
• XSV - Open Source XML Schema Validator
– www.ltg.ed.ac.uk/~ht/xsv-status.html
• MSXML 4.0
– www.microsoft.com/downloads/details.aspx?FamilyID=3144b72b-b4f2-46da-b4b6-c5d7485f2b42
XML Tools
• XML Spy 2004 IDE
– www.altova.com/products_ide.html
• Free XML Tools and Software
– www.garshol.priv.no/download/xmltools/
Printed Sources
• Numerous printed sources – for more information visit
– Charles F. Goldfarb's www.xmlbooks.com
– www.amazon.com