Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.
Outline
• XML Schema
• Types in XDuce
• Regular tree languages
Attributes in XML Schema
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    . . . . . .
  </xsd:sequence>
  <xsd:attribute name="language" type="xsd:NMTOKEN" fixed="English"/>
</xsd:complexType>
Attributes are associated with the type, not with the element. They can be declared only on complex types; adding attributes to a simple type takes more work.
“Mixed” Content, “Any” Type
• Mixed content: better than in DTDs: can still enforce the type, but now may have text between any elements
<xsd:complexType mixed="true"> . . . .
• Any type: means anything is permitted there
<xsd:element name="anything" type="xsd:anyType"/> . . . .
“All” Group
• A restricted form of & in SGML
• Restrictions:
– Only at top level
– Has only elements
– Each element occurs at most once
• E.g. “comment” occurs 0 or 1 times
<xsd:complexType name="PurchaseOrderType">
  <xsd:all>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items" type="Items"/>
  </xsd:all>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
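The constraints of an "all" group (any order, each element at most once, optional elements may be missing) can be sketched as a small check over the names of an element's children. This is only an illustration in Python, not a schema validator: the dictionary encoding and the function name `matches_all_group` are assumptions made for this sketch.

```python
from collections import Counter

# Hypothetical encoding of the xsd:all group of PurchaseOrderType:
# element name -> minOccurs (0 or 1). maxOccurs is always 1 inside xsd:all.
PURCHASE_ORDER_ALL = {"shipTo": 1, "billTo": 1, "comment": 0, "items": 1}

def matches_all_group(child_names, group):
    """Return True if the child element names satisfy the all-group, in any order."""
    counts = Counter(child_names)
    # No element outside the group may appear.
    if any(name not in group for name in counts):
        return False
    # Each declared element occurs at least minOccurs times and at most once.
    return all(min_occurs <= counts[name] <= 1
               for name, min_occurs in group.items())
```

For instance, the children `items, shipTo, billTo` are accepted in that order, while a content with two `shipTo` elements or without `items` is rejected.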
Derived Types by Extensions
<complexType name="Address">
  <sequence>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
</complexType>
<complexType name="USAddress">
  <complexContent>
    <extension base="ipo:Address">
      <sequence>
        <element name="state" type="ipo:USState"/>
        <element name="zip" type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
Corresponds to inheritance
Derived Types by Restrictions
<complexContent>
  <restriction base="ipo:Items">
    … [rewrite the entire content, with restrictions]...
  </restriction>
</complexContent>
• The rewritten content may restrict cardinalities, e.g. (0,infty) to (1,1); may restrict choices; other restrictions…
Corresponds to set inclusion
Simple Types
• string
• token
• byte
• unsignedByte
• integer
• positiveInteger
• int (32-bit; a restriction of integer, which is unbounded)
• unsignedInt
• long
• short
• ...
• time
• dateTime
• duration
• date
• ID
• IDREF
• IDREFS
Facets of Simple Types
Examples
• length
• minLength
• maxLength
• pattern
• enumeration
• whiteSpace
• maxInclusive
• maxExclusive
• minInclusive
• minExclusive
• totalDigits
• fractionDigits
• Facets = additional properties restricting a simple type
• 15 facets defined by XML Schema
Facets of Simple Types
• Can further restrict a simple type by changing some facets
• Restriction = subset
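A minimal sketch of "restriction = subset", assuming a simple type is modelled as a membership test on lexical strings. The helper `restrict`, the base test `is_integer`, and the bounds and pattern chosen below are all hypothetical; only three facets (pattern, minInclusive, maxInclusive) are modelled, and the numeric facets assume an integer base type.

```python
import re

def is_integer(text):
    """Membership test standing in for the base type integer."""
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None

def restrict(base, *, pattern=None, min_inclusive=None, max_inclusive=None):
    """Derive a type from `base` by restriction.

    The derived test accepts a string only if the base accepts it AND every
    given facet holds, so the derived value set is a subset of the base's.
    """
    def accepts(text):
        if not base(text):
            return False
        if pattern is not None and re.fullmatch(pattern, text) is None:
            return False
        # Numeric facets: safe to call int() because the base already accepted.
        if min_inclusive is not None and int(text) < min_inclusive:
            return False
        if max_inclusive is not None and int(text) > max_inclusive:
            return False
        return True
    return accepts

# Hypothetical derived types, for illustration only.
five_digit_integer = restrict(is_integer, min_inclusive=10000, max_inclusive=99999)
zip_like = restrict(is_integer, pattern=r"[0-9]{5}")
```

Every string accepted by `five_digit_integer` or `zip_like` is also accepted by `is_integer`, but not the other way round.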
Not so Simple Types
• List types:
<xsd:simpleType name="listOfMyIntType">
  <xsd:list itemType="myInteger"/>
</xsd:simpleType>
<listOfMyInt>20003 15037 95977 95945</listOfMyInt>
• Union types
• Restriction types
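A list type's value is a whitespace-separated sequence of items, each valid for the item type. The sketch below illustrates this in Python; the definition of myInteger is not shown on the slide, so the range used in `is_my_integer` is an assumption, and `parse_list` is an illustrative name.

```python
def is_my_integer(text):
    # Assumed item type: a decimal integer between 10000 and 99999.
    return text.isascii() and text.isdigit() and 10000 <= int(text) <= 99999

def parse_list(text, item_accepts):
    """Split a list-typed value on whitespace and check every item."""
    items = text.split()
    bad = [item for item in items if not item_accepts(item)]
    if bad:
        raise ValueError(f"invalid list items: {bad}")
    return items

zips = parse_list("20003 15037 95977 95945", is_my_integer)
```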
Types in XDuce
• XDuce = a functional programming language (like ML)
• Emphasis: type checking for its functions
• Data model = ordered trees
– Captures XML elements and attributes
• Types = regular expressions
– Same expressive power as XML Schema
– Simpler concept
– Closer connection to regular tree languages
Values in XDuce
<bib>
  <book>
    <title> ML for the Working Programmer </title>
    <author> Paulson </author>
    <year> 1991 </year>
  </book>
  <paper> ... </paper>
  ...
</bib>
val x = bib[book[title["ML for the Working Programmer"], author["Paulson"], year["1991"] ], paper[....], ... ]
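The ordered-tree data model can be mirrored in a general-purpose language. The Python sketch below assumes an element is a (label, children) pair and text is a plain string; `elem` and `show` are illustrative names, not part of XDuce, and only the book entry of the value is built since the paper is elided on the slide.

```python
def elem(label, *children):
    """Build an element node: a label plus an ordered list of children."""
    return (label, list(children))

def show(value):
    """Render a value in XDuce-like notation, e.g. year["1991"]."""
    if isinstance(value, str):
        return '"' + value + '"'
    label, children = value
    return label + "[" + ", ".join(show(child) for child in children) + "]"

book = elem("book",
            elem("title", "ML for the Working Programmer"),
            elem("author", "Paulson"),
            elem("year", "1991"))
x = elem("bib", book)
```

Because children are kept in a list, their order is part of the value, matching the ordered-tree model.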
Types in XDuce
<!ELEMENT bib ((book|paper)*)>
<!ELEMENT book (title, author*, year, publisher?)>
<!ELEMENT title (#PCDATA)>
...
type Bib = bib[(Book|Paper)*]
type Book = book[Title, Author*, Year, Publisher?]
type Title = title[String]
...
Types in XDuce
• Important idea:
– Types are first class citizens
– Element names are second class
• This is consistent with regular expressions and automata:
– Type = state (we will see later)
Example of Types in XDuce
type T1 = b[] | a[T1, T0] | a[T0, T1]
type T0 = a[] | a[T0, T0]
Formal Definition of Types in XDuce
T ::= variable
::= base type
::= () /* empty sequence */
::= T,T /* concatenation */
::= T | T /* alternation */
Where are “*” and “?” ?
Types in XDuce
Derived types:
• Given T, the type T* is an abbreviation for:
– type X = T, X | ()
• Similarly, T+ and T? are abbreviations for:
– T+: type X = T, T*
– T?: type Y = T | ()
Types in XDuce
• Danger with recursion:
– type X = a[], X, b[] | ()
– What is it? (n copies of a[] followed by n copies of b[]: a context-free language that is not regular)
• Need to restrict to tail recursive types
Subsumption in XDuce Types
• Definition. T1 <: T2 if the set defined by T1 is a subset of that defined by T2
• Examples
– Name, Addr <: Name, Addr, Tel?
– Name, Addr, Tel <: Name, Addr, Tel?
– T, T, T <: T*
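The examples above can be checked mechanically if the element types (Name, Addr, Tel, T) are treated as atomic symbols, so that T1 <: T2 becomes inclusion between two word-level regular expressions. The Python sketch below uses Brzozowski derivatives for this; it is an illustration under that simplifying assumption, not the algorithm XDuce uses, and all function names are made up for the sketch.

```python
EMPTY = ("empty",)   # the type with no values
EPS = ("eps",)       # the empty sequence ()

def sym(a):
    return ("sym", a)

def cat(r, s):
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)

def alt(*rs):
    # Alternatives are kept as a flat set, so | is associative, commutative, idempotent.
    parts = set()
    for r in rs:
        if r[0] == "alt":
            parts |= r[1]
        elif r != EMPTY:
            parts.add(r)
    if not parts:
        return EMPTY
    return next(iter(parts)) if len(parts) == 1 else ("alt", frozenset(parts))

def star(r):
    return ("star", r)

def opt(r):
    return alt(r, EPS)

def nullable(r):
    """Does r contain the empty sequence?"""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return any(nullable(p) for p in r[1])

def deriv(r, a):
    """The sequences w such that (a, w) is in r."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == a else EMPTY
    if tag == "cat":
        first = cat(deriv(r[1], a), r[2])
        return alt(first, deriv(r[2], a)) if nullable(r[1]) else first
    if tag == "star":
        return cat(deriv(r[1], a), r)
    return alt(*(deriv(p, a) for p in r[1]))

def symbols(r):
    tag = r[0]
    if tag == "sym":
        return {r[1]}
    if tag == "cat":
        return symbols(r[1]) | symbols(r[2])
    if tag == "star":
        return symbols(r[1])
    if tag == "alt":
        return set().union(*(symbols(p) for p in r[1]))
    return set()

def subtype(t1, t2):
    """True if every sequence in t1 is also in t2."""
    alphabet = symbols(t1) | symbols(t2)
    seen, todo = set(), [(t1, t2)]
    while todo:
        r, s = todo.pop()
        if (r, s) in seen or r == EMPTY:
            continue
        seen.add((r, s))
        if nullable(r) and not nullable(s):
            return False          # some sequence is in t1 but not in t2
        todo.extend((deriv(r, a), deriv(s, a)) for a in alphabet)
    return True

name, addr, tel, t = sym("Name"), sym("Addr"), sym("Tel"), sym("T")
with_optional_tel = cat(cat(name, addr), opt(tel))
```

With these definitions, `subtype(cat(name, addr), with_optional_tel)` and `subtype(cat(t, cat(t, t)), star(t))` hold, while the reverse directions do not.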
XDuce
• Main goal: given a function, check that it is type correct
– Come to Benjamin Pierce’s talk on Monday
• One note:
– The type checking algorithm in XDuce is incomplete (will see why, in a couple of lectures)
• Important piece of typechecking:
– Checking if T1 <: T2
• Obviously can’t do this for context free languages (inclusion is undecidable there)
• But can do for regular languages (next)
Regular Tree Languages
• Given a ranked alphabet, L = L0 ∪ L1 ∪ . . . ∪ Lk
• Ranked trees are T ::= a[T1,...,Ti], a ∈ Li
Definition. A bottom-up tree automaton is A = (L, Q, δ, QF) where:
– L = ranked alphabet
– Q = set of states
– δ = transition relation, δ : (⋃i=0..k Li × Q^i) → 2^Q
– QF = terminal (accepting) states
Bottom-up Tree Automata
Computation on a tree t
• For each node t = a[t1,...,ti], if the roots of t1,..., ti are labeled with states q1, ..., qi and q ∈ δ(a, q1, ..., qi), then label t with q
• If the root is labeled with a state in QF, then accept
The language accepted by A consists of all trees t accepted by A
A regular tree language is a set of trees accepted by some automaton A
Example of Tree Automaton
• L0 = {b}, L2 = {a}
• Q = {q1, q2}
• δ(b) = q1, δ(a,q1,q1) = q2, δ(a,q2,q2) = q1
• What does this accept ?
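To experiment with this automaton, its bottom-up run can be sketched in Python. Trees are assumed to be (label, children) pairs; the slide does not give QF, so the set of final states is left as a parameter, and the names `run` and `accepts` are illustrative.

```python
# Transitions from the slide: (label, child states...) -> state.
DELTA = {
    ("b",): "q1",
    ("a", "q1", "q1"): "q2",
    ("a", "q2", "q2"): "q1",
}

def run(tree):
    """Return the state labelling the root, or None if the run gets stuck."""
    label, children = tree
    child_states = [run(child) for child in children]
    if None in child_states:
        return None
    return DELTA.get((label, *child_states))

def accepts(tree, final_states):
    # final_states is an assumption: QF is not specified on the slide.
    return run(tree) in final_states

b = ("b", [])

def a(left, right):
    return ("a", [left, right])
```

Trying `run` on b, a[b,b], a[a[b,b],a[b,b]] and on an unbalanced tree such as a[b,a[b,b]] is a quick way to explore the question above for a chosen QF.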
Properties of Regular Tree Languages
• If T1, T2 are regular, then so are:
– T1 ∪ T2
– T1 – T2
– T1 ∩ T2
• If A is a nondeterministic bottom up tree automaton, then there exists an equivalent deterministic one
– Not true for “top-down” automata
• If T1, T2 are regular, then it is decidable whether T1 ⊆ T2
Top-down Automata
• Defined similarly, just the computation differs:
– Start from the root at an initial state, move downwards
– If all leaves end in an accepting state, then accept
• Here deterministic automata are strictly weaker
– e.g. cannot recognize the set {a[a,b], a[b,a]}
• Nondeterministic bottom up = deterministic bottom up = nondeterministic top down
Example of a Bottom-up Automaton
• A = (L, Q, δ, QF) where
– L = L0 ∪ L2, L0 = {a, b}, L2 = {a}
– Q = {T0, T1}
– δ(a) = T0, δ(b) = T1,
– δ(a, T0, T0) = T0, δ(a, T1, T0) = T1, δ(a, T0, T1) = T1
type T1 = b[] | a[T1, T0] | a[T0, T1]
type T0 = a[] | a[T0, T0]
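The correspondence "type = state" can be tried out with a set-valued run of the automaton, sketched here in Python. Trees are assumed to be (label, children) pairs, and the transition table is read off the two type definitions, one entry per alternative (so the entry for a[T0, T0] comes from the definition of T0). The name `states` is illustrative.

```python
from itertools import product

# One transition per alternative of the type definitions:
#   type T1 = b[] | a[T1, T0] | a[T0, T1]
#   type T0 = a[] | a[T0, T0]
DELTA = {
    ("a",): {"T0"},
    ("b",): {"T1"},
    ("a", "T0", "T0"): {"T0"},
    ("a", "T1", "T0"): {"T1"},
    ("a", "T0", "T1"): {"T1"},
}

def states(tree):
    """Return the set of states (types) that can label the root of the tree."""
    label, children = tree
    result = set()
    # Try every combination of states the children can be labelled with.
    for combo in product(*(states(child) for child in children)):
        result |= DELTA.get((label, *combo), set())
    return result

leaf_a = ("a", [])
leaf_b = ("b", [])
```

A tree has type T1 exactly when "T1" is in `states(tree)`; for example a[b[], a[]] is labelled T1, while a[b[], b[]] gets no state at all.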
Regular Tree Languages and XDuce types
• For ranked alphabets, tail-recursive XDuce types correspond precisely to regular tree languages
• Same is true for unranked alphabets, but there the definition of regular tree languages is more complex
Conclusion for Schemas
A Theoretical View
• XML Schemas = XDuce types = regular tree languages
• DTDs = strictly weaker
A Practical View
• XML Schemas still too complex