XML's validation - DTD
Validation - DTDs
Nguyễn Đăng Khoa
Content
• Document Type Definitions (DTDs)
• XML Schemas
DTD – What’s a DTD?
• A DTD is a set of rules that defines the elements and their attributes for an XML document
• DTD defines the “grammar” for an XML document
• DTDs were created as part of SGML
DTD’s goals
• Check whether an XML document is valid
When to use a DTD
• To create and manage large sets of documents for your company
• To define clearly what markup may be used in certain documents and how markup should be sequenced
• To provide a common frame of reference for documents that many users can share
When NOT to use a DTD
• You’re working with only one or a few small documents
• You’re using a nonvalidating processor to handle your XML documents
DTD - Example
DTD – Internal subset declarations
<!DOCTYPE name_of_root [ …. declarations …]>
• Declarations appear between the [ and ]
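A minimal sketch of a complete document with an internal subset may help here. It reuses the first element from the mixed-content slides; the single-element document itself is only an illustration.

```xml
<?xml version="1.0"?>
<!-- The DOCTYPE name must match the root element name -->
<!DOCTYPE first [
  <!-- Internal subset: declarations live between [ and ] -->
  <!ELEMENT first (#PCDATA)>
]>
<first>John</first>
```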
DTD – External subset declarations
• System Identifiers
<!DOCTYPE name_of_root SYSTEM "URI to DTD file" [ …. declarations …]>
– Example:
• <!DOCTYPE name SYSTEM "file:///c:/name.dtd" [ ]>
• <!DOCTYPE name SYSTEM "http://wiley.com/hr/name.dtd" [ ]>
• <!DOCTYPE name SYSTEM "name.dtd">
DTD – External subset declarations
• Public Identifiers
<!DOCTYPE name_of_root PUBLIC "entry in a catalog" system_identifier>
– Example:
• <!DOCTYPE name PUBLIC "-//Beginning XML//DTD Name Example//EN" "name.dtd">
– XML requires a system identifier after the public identifier; the SGML-style form <!DOCTYPE name PUBLIC "-//Beginning XML//DTD Name Example//EN"> is not well-formed XML
– Common format is Formal Public Identifiers (FPIs): -//Owner//Class Description//Language//Version
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Anatomy of a DTD
• Element declarations
• Attribute declarations
• Entity declarations
DTD – Element declarations
• declare each element that appears within the document
• can include declarations for optional elements
<!ELEMENT name (first, middle, last)>
• ELEMENT declaration: <!ELEMENT
• element name: name
• element content model: (first, middle, last)
DTD – Element content models
• Element
• Mixed
• Empty
• Any
DTD – Element Content
• Include the allowable elements within parentheses
<!ELEMENT contact (name)>
<!ELEMENT contact (name, location, phone, knows, description)>
• Element content is built from sequences and choices
DTD – Element Content – Sequences
• Elements within these documents must appear in a distinct order
<!ELEMENT name (first, middle, last)>
<!ELEMENT contact (name, location, phone, knows, description)>
Error when the parent element:
• is missing one of the elements
• contains more elements than declared
• contains the elements in another order
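The sequence rule and its three error cases can be sketched with one valid instance and three invalid ones. The text values (Fitzgerald, Doe, Roe) are made up for illustration, and the invalid variants are kept inside a comment so the file itself stays valid.

```xml
<?xml version="1.0"?>
<!DOCTYPE name [
  <!ELEMENT name (first, middle, last)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT middle (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
]>
<!-- Valid: all three children, in the declared order -->
<name>
  <first>John</first>
  <middle>Fitzgerald</middle>
  <last>Doe</last>
</name>
<!-- Invalid variants:
     missing child:  <name><first>John</first><last>Doe</last></name>
     extra child:    <name><first>John</first><middle>F</middle><last>Doe</last><last>Roe</last></name>
     wrong order:    <name><last>Doe</last><first>John</first><middle>F</middle></name>
-->
```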
DTD – Element Content – Choices
• Allow one element or another, but not both
<!ELEMENT location (address | GPS)>
Error when the parent element:
• is empty
• contains more than one of these elements
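As a rough illustration of the choice rule, the sketch below declares address and GPS as text-only elements (an assumption; the slides do not show their declarations) and uses made-up values.

```xml
<?xml version="1.0"?>
<!DOCTYPE location [
  <!ELEMENT location (address | GPS)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT GPS (#PCDATA)>
]>
<!-- Valid: exactly one of the two alternatives -->
<location>
  <address>123 Main Street</address>
</location>
<!-- Invalid variants:
     empty parent:       <location/>
     both alternatives:  <location><address>123 Main Street</address><GPS>0, 0</GPS></location>
-->
```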
DTD – Element Content – Combining Sequences and Choices
• Many XML documents need to leverage much more complex rules
<!ELEMENT location (address | (latitude, longitude))>
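A sketch of the combined model in use: location holds either a single address or the pair latitude then longitude. The declarations for the child elements and the coordinate values are assumptions added for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE location [
  <!ELEMENT location (address | (latitude, longitude))>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT latitude (#PCDATA)>
  <!ELEMENT longitude (#PCDATA)>
]>
<!-- Valid: the second alternative, a sequence of latitude then longitude.
     <location><address>123 Main Street</address></location> would be valid too.
     <location><latitude>10.7769</latitude></location> would not: the sequence is incomplete. -->
<location>
  <latitude>10.7769</latitude>
  <longitude>106.7009</longitude>
</location>
```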
DTD – Mixed Content
• Any element with text in its content
– text can appear by itself or it can be interspersed between elements
• Case 1: simplest mixed content model, text only:
<!ELEMENT first (#PCDATA)>
<first>John</first>
DTD – Mixed Content
• Case 2: Mixed content models can also contain elements interspersed within the text
<description>Joe is a developer and author for <title>Beginning XML</title>, now in its <detail>5th Edition</detail></description>
<!ELEMENT description (#PCDATA | title | detail)*>
4 rules:
• They must use the choice mechanism (|) to separate elements
• The #PCDATA keyword must appear first
• There must be no inner content models
• If there are child elements, the * cardinality indicator must appear at the end of the model
DTD – Empty Content
• Some elements never need to contain content
<!ELEMENT br EMPTY>
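A small sketch of an EMPTY element inside a document. The wrapping p element and its mixed content model are hypothetical, added only so that br has somewhere to appear.

```xml
<?xml version="1.0"?>
<!DOCTYPE p [
  <!ELEMENT p (#PCDATA | br)*>
  <!ELEMENT br EMPTY>
]>
<!-- <br/> and <br></br> are both valid for an EMPTY element;
     <br> </br> is not, because even whitespace counts as content here -->
<p>first line<br/>second line</p>
```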
DTD – Any Content
• The ANY keyword indicates that the element may contain:
– text (PCDATA)
– any elements declared within the DTD
– in any order, any number of times
<!ELEMENT description ANY>
DTD – Example
DTD – Cardinality
• An element’s cardinality defines how many times it will appear within a content model
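XML defines three cardinality indicators: ? (zero or one), + (one or more) and * (zero or more); with no indicator an element must appear exactly once. The sketch below applies them to a simplified, hypothetical version of the contact model used elsewhere in these slides.

```xml
<?xml version="1.0"?>
<!DOCTYPE contacts [
  <!-- contact*      : zero or more contact elements -->
  <!ELEMENT contacts (contact*)>
  <!-- name          : exactly once (no indicator)
       phone+        : one or more
       description?  : optional, at most once -->
  <!ELEMENT contact (name, phone+, description?)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
]>
<contacts>
  <contact>
    <name>Jane Doe</name>
    <phone>555-0100</phone>
    <phone>555-0101</phone>
  </contact>
</contacts>
```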
DTD - Example
DTD – Attribute Declarations
• declare a list of allowable attributes for each element
<!ATTLIST contacts source CDATA #IMPLIED>
• ATTLIST declaration: <!ATTLIST
• associated element’s name: contacts
• list of declared attributes: source CDATA #IMPLIED
Each declared attribute has three parts:
• attribute name: source
• attribute type: CDATA
• attribute value declaration: #IMPLIED
DTD – Attribute Types
• When declaring attributes, you can specify how the processor should handle the data that appears in the value
DTD – Attribute Types – CDATA
<!ATTLIST website description CDATA #IMPLIED>
DTD – Attribute Types – ID
<!ATTLIST website url ID #IMPLIED>
DTD – Attribute Types – IDREF
<!ATTLIST website link IDREF #IMPLIED>
DTD – Attribute Types – IDREFS
<!ATTLIST website links IDREFS #IMPLIED>
DTD – Attribute Types – NMTOKEN
<!ATTLIST website category NMTOKEN #IMPLIED>
DTD – Attribute Types – NMTOKENS
<!ATTLIST website category NMTOKENS #IMPLIED>
DTD – Attribute Types – Enumerated list
<!ATTLIST website like (YES|NO) #IMPLIED>
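To show how several of these types constrain actual values, here is a sketch with a hypothetical contacts document. The attribute names person and tags and all values are assumptions; only knows/contacts IDREFS comes from the slides. ID values must be unique XML names, and every IDREF or IDREFS value must match an ID somewhere in the same document.

```xml
<?xml version="1.0"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (contact+)>
  <!ELEMENT contact (knows?)>
  <!ELEMENT knows EMPTY>
  <!-- person: unique identifier; tags: whitespace-separated name tokens -->
  <!ATTLIST contact person ID       #REQUIRED
                    tags   NMTOKENS #IMPLIED>
  <!-- contacts: one or more references to existing ID values -->
  <!ATTLIST knows contacts IDREFS #IMPLIED>
]>
<contacts>
  <contact person="jane" tags="author developer">
    <knows contacts="sam lee"/>
  </contact>
  <contact person="sam"/>
  <contact person="lee"/>
</contacts>
```

A validating parser would reject this document if two contact elements shared the same person value, or if knows referred to a name that no ID attribute carries.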
DTD – Attribute Value Declarations
• Within each attribute declaration you must specify how the value will appear in the document
– Has a default value
– Has a fixed value
– Is required
– Is implied (or is optional)
DTD – Attribute Value Declarations – Default values
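• With a default value, you can be sure that the attribute is included in the final output
<!ATTLIST phone kind (Home | Work | Cell | Fax) "Home">
• A phone element written with kind="Work" keeps that value; one that omits the attribute gets kind="Home"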
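The default can be seen in a short document. The contact wrapper and the phone numbers are placeholders chosen for this sketch.

```xml
<?xml version="1.0"?>
<!DOCTYPE contact [
  <!ELEMENT contact (phone+)>
  <!ELEMENT phone (#PCDATA)>
  <!ATTLIST phone kind (Home | Work | Cell | Fax) "Home">
]>
<contact>
  <!-- Explicit value: reported as kind="Work" -->
  <phone kind="Work">555-0100</phone>
  <!-- Attribute omitted: a parser that reads the DTD reports kind="Home" -->
  <phone>555-0101</phone>
</contact>
```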
DTD – Attribute Value Declarations – Fixed Values
• When an attribute’s value can never change, you use the #FIXED keyword followed by the fixed value
• Fixed values operate much like default values
<!ATTLIST contacts version CDATA #FIXED "1.0">
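A brief sketch of how a fixed value behaves in a document. Declaring contacts as EMPTY is an assumption made only to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts EMPTY>
  <!ATTLIST contacts version CDATA #FIXED "1.0">
]>
<!-- Valid: the attribute is written with exactly the fixed value.
     Also valid: <contacts/> (the parser supplies version="1.0").
     Invalid:    <contacts version="2.0"/> -->
<contacts version="1.0"/>
```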
DTD – Attribute Value Declarations – Required Values
• A required attribute must be included within the XML document
– you are not permitted to specify a default value
<!ATTLIST phone kind (Home | Work | Cell | Fax) #REQUIRED>
DTD – Attribute Value Declarations – Implied Values
• Attribute has no default value, has no fixed value, and is not required
<!ATTLIST knows contacts IDREFS #IMPLIED>
DTD – Specifying Multiple Attributes
• Several attributes in one declaration:
<!ATTLIST contacts version CDATA #FIXED "1.0" source CDATA #IMPLIED>
• Or one declaration per attribute:
<!ATTLIST contacts version CDATA #FIXED "1.0">
<!ATTLIST contacts source CDATA #IMPLIED>
DTD – Example
DTD – Entity Declarations
• escape characters
• include special characters
• refer to sections of replacement text, other XML markup, and even external files
DTD – Entity Declarations
• 4 primary types
– Built-in entities
– Character entities
– General entities
– Parameter entities
DTD – Entity Declarations – Built-in entities
• Start with an ampersand (&) and finish with a semicolon (;)
• There are five built-in entity references in XML
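The five built-in references are &amp;amp;, &amp;lt;, &amp;gt;, &amp;apos; and &amp;quot;. The fragment below is a made-up element that uses each of them once or more, in an attribute value and in text.

```xml
<!-- The five built-in entity references and the characters they stand for:
     &amp; is &     &lt; is <     &gt; is >     &apos; is '     &quot; is "  -->
<description note="Tom &amp; Jerry&apos;s &quot;rules&quot;">1 &lt; 2 and 3 &gt; 2</description>
```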
DTD – Entity Declarations – Character entities
• Begin with &# and end with a semicolon (;)
• Example: the Greek letter omega (Ω) as a reference would be &#x3A9; in hexadecimal or &#937; in decimal
DTD – Entity Declarations – General entities
• create reusable sections of replacement text
• must be declared within the DTD before they can be used
• There are 2 ways to declare:
– Internal entity declaration
– External entity declaration
DTD – Entity Declarations – General entities
• Internal Entity Declaration
– References in the document:
&source-text;
&address-unknow;
&empty-gps;
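The declarations behind these three references are not shown in the transcript. A plausible sketch follows, in which the entity names come from the slide while every replacement text and the surrounding element structure are assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE contact [
  <!ELEMENT contact (source, address, location)>
  <!ELEMENT source (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT location (latitude, longitude)>
  <!ELEMENT latitude (#PCDATA)>
  <!ELEMENT longitude (#PCDATA)>
  <!-- Internal general entities: name followed by a quoted replacement text -->
  <!ENTITY source-text "Contact list sample">
  <!ENTITY address-unknow "Address unknown">
  <!-- Replacement text may itself contain markup -->
  <!ENTITY empty-gps "<latitude>0</latitude><longitude>0</longitude>">
]>
<contact>
  <source>&source-text;</source>
  <address>&address-unknow;</address>
  <location>&empty-gps;</location>
</contact>
```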
DTD – Entity Declarations – General entities
• External Entity Declaration
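An external general entity takes its replacement text from another resource, named with SYSTEM (or PUBLIC) instead of a quoted value. In this sketch the entity name description-text and the file description.txt are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE contact [
  <!ELEMENT contact (#PCDATA)>
  <!-- The replacement text is read from the referenced file -->
  <!ENTITY description-text SYSTEM "description.txt">
]>
<contact>&description-text;</contact>
```

Validating parsers must read the external file; non-validating parsers are allowed to skip it, so the expanded text may not always be available.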
DTD – Entity Declarations – Parameter entities
• much like general entities, enable you to create reusable sections of replacement text
• cannot be used in general content
• can be referenced only within the DTD
%NameDeclarations;
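One common use, sketched here under assumptions, is pulling a shared set of declarations into the internal subset. The entity name NameDeclarations is from the slide; pointing it at name.dtd, and the contents assumed for that file, are illustrative only.

```xml
<?xml version="1.0"?>
<!DOCTYPE name [
  <!-- Declared with "%" and a space before the name -->
  <!ENTITY % NameDeclarations SYSTEM "name.dtd">
  <!-- Referenced with "%" and ";" between other declarations -->
  %NameDeclarations;
]>
<!-- name.dtd is assumed to contain:
     <!ELEMENT name (first, middle, last)>
     <!ELEMENT first (#PCDATA)>
     <!ELEMENT middle (#PCDATA)>
     <!ELEMENT last (#PCDATA)>
-->
<name>
  <first>John</first>
  <middle>Fitzgerald</middle>
  <last>Doe</last>
</name>
```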
DTD Limitations
• Poor support for XML namespaces
• Poor data typing
• Limited content model descriptions