QUALITY CONTROL WITH SCHEMAS
CSC1310 Fall 2009

Post on 18-Jan-2016

BASIC CONCEPTS

• A schema is a pass-or-fail test for a document.
• A schema is a minimum set of requirements that a document must meet, either to prevent anomalous processing or to formalize an application.
• Validation is the testing of a document against a schema. It can cover:
  – Structure: the use and placement of markup elements and attributes.
  – Data typing: patterns of character data.
  – Integrity: the status of links between nodes and resources.
  – Business rules: spelling checks, checksum results, and so on.

DOCUMENT TYPE DEFINITIONS (DTDs)

• The DTD is the oldest and most widely supported schema language.
• A DTD declares a set of allowed elements (the vocabulary).
• A DTD defines a content model for each element (the grammar).
• A DTD declares a set of allowed attributes for each element: name, data type, default value, and behavior (for example, required or optional).

DOCUMENT PROLOG FOR DTD

• All external parsed entities (including the DTD) should begin with a text declaration.
• A text declaration looks like an XML declaration, except that it explicitly excludes the standalone property:

  <?xml version="1.0" encoding="character set"?>

• The encoding declared in a DTD does not automatically carry over to the XML documents that use the DTD.
• External parsed entities (including the DTD) must not contain a document type declaration.
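As a minimal sketch of the prolog rules above, the external DTD below begins with a text declaration, while the document that uses it carries its own XML declaration and document type declaration. The file names memo.dtd and memo.xml, the memo element, and both encodings are assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- memo.dtd (hypothetical): a text declaration has no standalone property -->
<!ELEMENT memo (#PCDATA)>
```

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- memo.xml (hypothetical): the document declares its own encoding -->
<!DOCTYPE memo SYSTEM "memo.dtd">
<memo>Validate before processing.</memo>
```

The document type declaration appears only in memo.xml; the DTD file itself contains none.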

DECLARATIONS

• A DTD is a set of rules (declarations).
• Each declaration adds a new element, set of attributes, entity, or notation.
• If there are redundant entity declarations, the first one that appears takes precedence; the others are ignored.

Content types in element declarations:

• EMPTY: no content (special tags like <br>).
• ANY: any content.
• #PCDATA: character data (CDATA is the keyword used for attribute values).
• With children: a parent-child relationship (including the order of the children).
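The content types and the first-declaration-wins rule can be sketched in one small document with an internal DTD. All element and entity names here (note, title, br, misc, owner) are made up for illustration.

```xml
<!DOCTYPE note [
  <!-- EMPTY: the element never has content -->
  <!ELEMENT br EMPTY>
  <!-- ANY: any mix of text and declared elements -->
  <!ELEMENT misc ANY>
  <!-- character data only -->
  <!ELEMENT title (#PCDATA)>
  <!-- children: title, then br, then misc, in that order -->
  <!ELEMENT note (title, br, misc)>
  <!-- redundant entity declarations: the first one takes precedence -->
  <!ENTITY owner "first value">
  <!ENTITY owner "second value, ignored">
]>
<note>
  <title>Owned by &owner;</title>
  <br/>
  <misc>free text</misc>
</note>
```

Here the reference &owner; expands to "first value", because the second declaration of the same entity is ignored.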

USE OF CHILDREN

Child elements can be defined in a DTD file in the following ways:

• One occurrence only
• Minimum of one occurrence (+)
• Zero or more occurrences (*)
• Zero or one occurrence (?)
• Either/or occurrences ( | )
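A minimal sketch that uses every occurrence indicator from the list above in one content model. The recipe vocabulary is hypothetical and serves only to show the syntax.

```xml
<!DOCTYPE recipe [
  <!-- title: exactly once; ingredient: one or more; step: zero or more;
       note: zero or one; then either photo or sketch -->
  <!ELEMENT recipe (title, ingredient+, step*, note?, (photo | sketch))>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT ingredient (#PCDATA)>
  <!ELEMENT step (#PCDATA)>
  <!ELEMENT note (#PCDATA)>
  <!ELEMENT photo EMPTY>
  <!ELEMENT sketch EMPTY>
]>
<recipe>
  <title>Tea</title>
  <ingredient>water</ingredient>
  <ingredient>tea leaves</ingredient>
  <step>Boil the water.</step>
  <photo/>
</recipe>
```

The instance omits the optional note and picks photo from the either/or group; leaving out every ingredient, or including both photo and sketch, would make it invalid.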

ATTRIBUTES

There are four options for the default value:

• value: the default value of the attribute, surrounded by quotes (" ").
• #IMPLIED: the attribute is optional.
• #FIXED: the attribute has a fixed value.
• #REQUIRED: the attribute is required whenever the element is used.

TYPES OF ATTRIBUTE

• CDATA: The value is character data.
• (en1|en2|...): The value is one from an enumerated list.
• ID: The value is a unique id.
• IDREF: The value is the id of another element.
• IDREFS: The value is a list of other ids.
• NMTOKEN: The value is a valid XML name token.
• NMTOKENS: The value is a list of valid XML name tokens.
• ENTITY: The value is an entity.
• ENTITIES: The value is a list of entities.
• NOTATION: The value is the name of a notation.
• xml: The value is a predefined XML value.
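The default-value options and several of the attribute types can be combined in one attribute-list declaration. This is a sketch only: the team and member elements, the attribute names, and the value "Acme" are assumptions for illustration.

```xml
<!DOCTYPE team [
  <!ELEMENT team (member+)>
  <!ELEMENT member (#PCDATA)>
  <!-- id: required and unique; boss: optional reference to another id;
       role: enumerated with a default; unit: optional name token;
       company: fixed value -->
  <!ATTLIST member
    id      ID                 #REQUIRED
    boss    IDREF              #IMPLIED
    role    (lead | developer) "developer"
    unit    NMTOKEN            #IMPLIED
    company CDATA              #FIXED "Acme">
]>
<team>
  <member id="m1" role="lead">Ann</member>
  <member id="m2" boss="m1" unit="r-and-d">Bob</member>
</team>
```

The second member has no role attribute, so a validating parser supplies the default "developer". Its boss value must match an id that exists somewhere in the document.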

EXAMPLE

<!ELEMENT date (year, month, day)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT day (#PCDATA)>

EXAMPLE

<!ELEMENT address (street, city, country, zip)>
<!ELEMENT street (#PCDATA | unit)*>
<!ELEMENT city (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT unit (#PCDATA)>

EXAMPLE

<!ELEMENT person (name, age, gender)>
<!ELEMENT name (first, last, (junior | senior)?)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT junior EMPTY>
<!ELEMENT senior EMPTY>
<!-- employed is assumed to be optional -->
<!ATTLIST person
  pid      ID                   #REQUIRED
  employed (fulltime|parttime)  #IMPLIED>

TIPS FOR DESIGNING DTDs

• Organize declarations into groups by their purpose.
  – Blocks, hierarchical elements, parts of tables, lists, etc.
• Use whitespace.
  – It makes the DTD more understandable and easier to navigate.
• Use comments.
  – At the top of each DTD file: purpose, version number, contact information.
  – Customization: the original, its authors, your changes.
  – Label each section and subsection of the DTD.
• Track versions.
• Use parameter entities.
  – They hold recurring parts of declarations and let you edit them in one place.

PARAMETER ENTITIES

• In the external DTD, parameter entities can be used in:
  – Element-type declarations, to hold element groups.
  – Attribute-list declarations, to hold attribute definitions.
• In the internal DTD, they can hold only complete declarations.

<!ENTITY % common.atts "id ID #IMPLIED class CDATA #IMPLIED">

<!ATTLIST foo %common.atts;>
<!ATTLIST bar %common.atts; extra CDATA #FIXED "blah">
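The slide shows a parameter entity holding attribute definitions; the other use it names, holding an element group, might look like the sketch below. The entity name inline and the elements emph, code, para, and title are hypothetical, and the fragment is assumed to live in an external DTD, since references inside declarations are not allowed in the internal subset.

```xml
<!-- hypothetical fragment of an external DTD -->
<!ENTITY % inline "emph | code">

<!ELEMENT emph (#PCDATA)>
<!ELEMENT code (#PCDATA)>

<!-- both content models reuse the same group of inline elements -->
<!ELEMENT para  (#PCDATA | %inline;)*>
<!ELEMENT title (#PCDATA | %inline;)*>
```

Adding a new inline element then means editing only the entity's replacement text, and both para and title pick it up.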

IMPORTING MODULES

• The .mod extension means that a file contains declarations but should not be used as a DTD on its own.
• An external parameter entity imports all the text in a file.

<!ELEMENT catalog (title, metadata, front, entries+)>

<!ENTITY % basic.stuff SYSTEM "basics.mod">
%basic.stuff;
<!ENTITY % front.matter SYSTEM "front.mod">
%front.matter;
<!ENTITY % metadata SYSTEM "metadata.dtd">
%metadata;

CONDITIONAL SECTIONS

• A conditional section is a special form of markup in a DTD that marks a region for inclusion or exclusion.
• Conditional sections can be used only in external subsets.

<![INCLUDE[ DTD text ]]>
<![IGNORE[ DTD text ]]>

<![INCLUDE[
<!ELEMENT blah (#PCDATA)>
]]>

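OVERRIDING ELEMENT

In the DTD:

<!ENTITY % default.polyhedron "INCLUDE">
<![%default.polyhedron;[
<!ELEMENT polyhedron (side+, angle+)>
]]>

In the XML document:

<!DOCTYPE picture SYSTEM "shapes.dtd" [
  <!ENTITY % default.polyhedron "IGNORE">
  <!ELEMENT polyhedron (side, side, side+, angle, angle, angle+)>
]>

The internal subset is read before the external DTD, so its declaration of default.polyhedron takes precedence and the conditional section in shapes.dtd is ignored.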

LIMITATIONS OF DTD

• A DTD describes how elements are arranged in a document, but says little about the content of the document.
• A DTD is not flexible about the order of children.
• Locked-down namespace: every element in a document must have a corresponding declaration in the DTD.

A schema is a newer validation system that:

• contains rules that must all be satisfied for a document to be considered valid;
• is not built into the XML specification.

Examples: W3C XML Schema, RELAX NG, Schematron.
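To illustrate the first limitation, the sketch below reuses the date declarations from the earlier example. The instance values are made up; the point is that a DTD can check only that the three children are present and in order, not that they form a real date.

```xml
<!DOCTYPE date [
  <!ELEMENT date (year, month, day)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT month (#PCDATA)>
  <!ELEMENT day (#PCDATA)>
]>
<!-- valid against the DTD, although no such date exists -->
<date>
  <year>soon</year>
  <month>13</month>
  <day>42</day>
</date>
```

Swapping month and day, on the other hand, would be rejected, because the content model fixes the order of the children.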

NAMESPACES

Namespaces are used to group elements and attributes.

xmlns:namespace_prefix="namespace_identifier"

An xmlns attribute without a prefix declares the implicit namespace.

<part-catalog xmlns:nw="http://www.nutware.com"
              xmlns="http://www.bobco.com">
  <nw:entry nw:number="1327">
    <nw:description>hexnut</nw:description>
  </nw:entry>
  <part id="555">
    <name>type 4</name>
  </part>
</part-catalog>

W3C SCHEMA (2001)

W3C schemas are XML documents themselves.

In DTD:

<!ELEMENT country (#PCDATA)>

In W3C Schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="country" type="xs:string"/>
</xs:schema>

WIDELY USED TYPES

• xs:string: any text
• xs:token: textual tokens separated by whitespace
• xs:decimal: any decimal number
• xs:integer: any integer number
• xs:float: floating-point number
• xs:ID, xs:IDREF: the same as ID and IDREF in a DTD
• xs:boolean: "true"/"false" ("1"/"0")
• xs:time: time as HH:MM:SS-Timezone
• xs:date: date in the format CCYY-MM-DD
• xs:dateTime: date/time combination in the format CCYY-MM-DDTHH:MM:SS-Timezone
• xs:QName: namespace-qualified name

COMPLEX ELEMENT IN SCHEMA

<xs:element name="date">
  <xs:complexType>
    <xs:all>
      <xs:element ref="year"/>
      <xs:element ref="month"/>
      <xs:element ref="day"/>
    </xs:all>
  </xs:complexType>
</xs:element>
<xs:element name="year" type="xs:integer"/>
<xs:element name="month" type="xs:integer"/>
<xs:element name="day" type="xs:integer"/>
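One consequence of the xs:all group in the declaration above is that the children may appear in any order, each exactly once by default. A small instance, with made-up values, that would be accepted:

```xml
<!-- children in a different order than they are declared -->
<date>
  <day>15</day>
  <year>2009</year>
  <month>10</month>
</date>
```

The DTD content model (year, month, day) would reject this ordering, which is one way the schema version is more flexible.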

FACETS

A facet is a way to control the range of a data type.

<xs:simpleType name="monthNum">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="1"/>
    <xs:maxInclusive value="12"/>
  </xs:restriction>
</xs:simpleType>
<xs:element name="month" type="monthNum"/>

Facets can create fixed values, constrain the length of strings, match patterns, and set allowed values.

FACETS EXAMPLE

List of allowed values:

<xs:simpleType name="genderType">
  <xs:restriction base="xs:token">
    <xs:enumeration value="female"/>
    <xs:enumeration value="male"/>
  </xs:restriction>
</xs:simpleType>

Pattern:

<xs:simpleType name="pcode">
  <xs:restriction base="xs:token">
    <xs:pattern value="[0-9]{3}[A-Z]{3}"/>
  </xs:restriction>
</xs:simpleType>
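The examples above cover allowed values and patterns. Constraining the length of strings, which the slides also mention, could be sketched as follows. The type names countryCode and shortName and the chosen limits are assumptions for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- exactly two characters, for example a country code -->
  <xs:simpleType name="countryCode">
    <xs:restriction base="xs:token">
      <xs:length value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- between 1 and 40 characters -->
  <xs:simpleType name="shortName">
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="40"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="country" type="countryCode"/>
  <xs:element name="city" type="shortName"/>

</xs:schema>
```

With these types, <country>US</country> would pass and <country>USA</country> would fail the length facet.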

SCHEMA EXAMPLE

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="census-record">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="date"/>
        <xs:element ref="address"/>
        <xs:element ref="person" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute ref="taker"/>
    </xs:complexType>
  </xs:element>

SCHEMA EXAMPLE

  <xs:attribute name="taker">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="1"/>
        <xs:maxInclusive value="9999"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:attribute>

SCHEMA EXAMPLE

  <xs:element name="date" type="xs:date"/>

  <xs:element name="address">
    <xs:complexType>
      <xs:all>
        <xs:element ref="street"/>
        <xs:element ref="city"/>
        <xs:element ref="country"/>
        <xs:element ref="zip"/>
      </xs:all>
    </xs:complexType>
  </xs:element>

  <xs:element name="street" type="xs:string"/>
  <xs:element name="city" type="xs:string"/>
  <xs:element name="country" type="xs:string"/>

SCHEMA EXAMPLE

  <xs:element name="zip">
    <xs:simpleType>
      <xs:restriction base="xs:token">
        <xs:pattern value="[0-9]{3}[A-Z]{3}"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

SCHEMA EXAMPLE

  <xs:element name="person">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="age"/>
        <xs:element ref="gender"/>
      </xs:all>
      <xs:attribute ref="employed"/>
      <xs:attribute ref="pid"/>
    </xs:complexType>
  </xs:element>

SCHEMA EXAMPLE

  <xs:attribute name="employed">
    <xs:simpleType>
      <xs:restriction base="xs:token">
        <xs:enumeration value="fulltime"/>
        <xs:enumeration value="parttime"/>
        <xs:enumeration value="none"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:attribute>

  <xs:attribute name="pid">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="1"/>
        <xs:maxInclusive value="999999"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:attribute>

SCHEMA EXAMPLE

  <xs:element name="age">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="150"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

  <xs:element name="gender">
    <xs:simpleType>
      <xs:restriction base="xs:token">
        <xs:enumeration value="female"/>
        <xs:enumeration value="male"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

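SCHEMA EXAMPLE

  <xs:element name="name">
    <xs:complexType>
      <!-- a complex type takes a single model group, so the optional
           choice is nested in a sequence, as in the DTD content model
           (first, last, (junior | senior)?) -->
      <xs:sequence>
        <xs:element ref="first"/>
        <xs:element ref="last"/>
        <xs:choice minOccurs="0">
          <xs:element ref="junior"/>
          <xs:element ref="senior"/>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>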

SCHEMA EXAMPLE

  <xs:element name="junior" type="emptyElem"/>
  <xs:element name="senior" type="emptyElem"/>

  <xs:complexType name="emptyElem"/>

</xs:schema>
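For orientation, an instance document that the census schema above is meant to accept might look like the sketch below. All values are invented, and it assumes that first and last are declared as simple string elements, which the slides do not show.

```xml
<census-record taker="3162">
  <date>2009-10-15</date>
  <address>
    <street>12 Elm Street</street>
    <city>Springfield</city>
    <country>USA</country>
    <!-- three digits followed by three capital letters -->
    <zip>123ABC</zip>
  </address>
  <person employed="fulltime" pid="4077">
    <name>
      <first>Ann</first>
      <last>Lee</last>
      <junior/>
    </name>
    <age>34</age>
    <gender>female</gender>
  </person>
</census-record>
```

Each facet in the schema constrains one of these values: taker must lie between 1 and 9999, age between 0 and 150, and employed must be one of the three enumerated tokens.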