Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

29
Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001

Transcript of Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Page 1: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Management of XML and Semistructured Data

Lecture 11: Schemas

Wednesday, May 2nd, 2001

Page 2: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Outline

• XML Schema

• Types in Xduce

• Regular tree languages

Page 3: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Attributes in XML Schema

<xsd:element name=“paper” type=“papertype”/>

<xsd:complexType name=“papertype”>

<xsd:sequence>

<xsd:element name=“title” type=“xsd:string”/>

. . . . . .

</xsd:sequence>

<xsd:attribute name=“language" type="xsd:NMTOKEN" fixed=“English"/>

</xsd:complexType>

</xsd:element>

<xsd:element name=“paper” type=“papertype”/>

<xsd:complexType name=“papertype”>

<xsd:sequence>

<xsd:element name=“title” type=“xsd:string”/>

. . . . . .

</xsd:sequence>

<xsd:attribute name=“language" type="xsd:NMTOKEN" fixed=“English"/>

</xsd:complexType>

</xsd:element>

Attributes are associated to the type, not to the elementOnly to complex types; more trouble if we want to add attributesto simple types.

Page 4: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

“Mixed” Content, “Any” Type

• Better than in DTDs: can still enforce the type, but now may have text between any elements

• Means anything is permitted there

<xsd:complexType mixed="true"> . . . .

<xsd:complexType mixed="true"> . . . .

<xsd:element name="anything" type="xsd:anyType"/> . . . .

<xsd:element name="anything" type="xsd:anyType"/> . . . .

Page 5: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

“All” Group

• A restricted form of & in SGML• Restrictions:

– Only at top level– Has only elements– Each element occurs at most once

• E.g. “comment” occurs 0 or 1 times

<xsd:complexType name="PurchaseOrderType">

<xsd:all> <xsd:element name="shipTo" type="USAddress"/>

<xsd:element name="billTo" type="USAddress"/>

<xsd:element ref="comment" minOccurs="0"/>

<xsd:element name="items" type="Items"/>

</xsd:all>

<xsd:attribute name="orderDate" type="xsd:date"/>

</xsd:complexType>

<xsd:complexType name="PurchaseOrderType">

<xsd:all> <xsd:element name="shipTo" type="USAddress"/>

<xsd:element name="billTo" type="USAddress"/>

<xsd:element ref="comment" minOccurs="0"/>

<xsd:element name="items" type="Items"/>

</xsd:all>

<xsd:attribute name="orderDate" type="xsd:date"/>

</xsd:complexType>

Page 6: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Derived Types by Extensions <complexType name="Address">

<sequence> <element name="street" type="string"/>

<element name="city" type="string"/>

</sequence>

</complexType>

<complexType name="USAddress">

<complexContent>

<extension base="ipo:Address">

<sequence> <element name="state" type="ipo:USState"/>

<element name="zip" type="positiveInteger"/>

</sequence>

</extension>

</complexContent>

</complexType>

<complexType name="Address">

<sequence> <element name="street" type="string"/>

<element name="city" type="string"/>

</sequence>

</complexType>

<complexType name="USAddress">

<complexContent>

<extension base="ipo:Address">

<sequence> <element name="state" type="ipo:USState"/>

<element name="zip" type="positiveInteger"/>

</sequence>

</extension>

</complexContent>

</complexType>

Corresponds to inheritance

Page 7: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Derived Types by Restrictions

• (*): may restrict cardinalities, e.g. (0,infty) to (1,1); may restrict choices; other restrictions…

<complexContent> <restriction base="ipo:Items“> … [rewrite the entire content, with restrictions]... </restriction> </complexContent>

<complexContent> <restriction base="ipo:Items“> … [rewrite the entire content, with restrictions]... </restriction> </complexContent>

Corresponds to set inclusion

Page 8: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Simple Types

• String

• Token

• Byte

• unsignedByte

• Integer

• positiveInteger

• Int (larger than integer)

• unsignedInt

• Long

• Short

• ...

• Time

• dateTime

• Duration

• Date

• ID

• IDREF

• IDREFS

Page 9: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Facets of Simple Types

Examples

• length

• minLength

• maxLength

• pattern

• enumeration

• whiteSpace

• maxInclusive

• maxExclusive

• minInclusive

• minExclusive

• totalDigits

• fractionDigits

•Facets = additional properties restricting a simple type

•15 facets defined by XML Schema

Page 10: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Facets of Simple Types

• Can further restrict a simple type by changing some facets

• Restriction = subset

Page 11: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Not so Simple Types

• List types:

• Union types

• Restriction types

<xsd:simpleType name="listOfMyIntType">

<xsd:list itemType="myInteger"/>

</xsd:simpleType>

<xsd:simpleType name="listOfMyIntType">

<xsd:list itemType="myInteger"/>

</xsd:simpleType>

<listOfMyInt>20003 15037 95977 95945</listOfMyInt>

Page 12: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Types in XDuce

• Xduce = a functional programming language (like ML)

• Emphasis: type checking for its functions• Data model = ordered trees

– Captures XML elements and attributes

• Types = regular expressions– Same expressive power as XML Schema– Simpler concept– Closer connection to regular tree languages

Page 13: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Values in XDuce<bib> <book> <title> ML for the Working Programmer </title> <author> Paulson </author> <year> 1991 </year> </book> <paper> ... </paper> ...</bib>

<bib> <book> <title> ML for the Working Programmer </title> <author> Paulson </author> <year> 1991 </year> </book> <paper> ... </paper> ...</bib>

val x = bib[book[title[“ML for the Working Programmer”], author[“Paulson”], year[“1991”] ], paper[....], ... ]

val x = bib[book[title[“ML for the Working Programmer”], author[“Paulson”], year[“1991”] ], paper[....], ... ]

Page 14: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Types in XDuce

<!ELEMENT bib ((book|paper)*)><!ELEMENT book (title, author*, year, publisher?)><!ELEMENT title #PCDATA>...

<!ELEMENT bib ((book|paper)*)><!ELEMENT book (title, author*, year, publisher?)><!ELEMENT title #PCDATA>...

type Bib = bib[(Book|Paper)*]type Book = book[Title, Author*, Year, Publisher?]type Title = title[String]...

type Bib = bib[(Book|Paper)*]type Book = book[Title, Author*, Year, Publisher?]type Title = title[String]...

Page 15: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Types in XDuce

• Important idea:– Types are first class citizens– Element names are second class

• This is consistent with regular expressions and automata:– Type = state (we will see later)

Page 16: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Example of Types in XDuce

type T1 = b[] | a[T1, T0] | a[T0, T1]type T0 = a[] | a[T0, T0]

type T1 = b[] | a[T1, T0] | a[T0, T1]type T0 = a[] | a[T0, T0]

Page 17: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Formal Definition of Types in XDuce

T ::= variable

::= base type

::= () /* empty sequence */

::= T,T /* concatenation */

::= T | T /* alternation */

Where are “*” and “?” ?

Page 18: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Types in XDuce

Derived types:

• Given T, the type T* is an abbreviation for:– type X = T, X | ()

• Similarly, T+ and T? are abbreviations for:– type X = T, T*– type Y = T | ()

Page 19: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Types in XDuce

• Danger with recursion:– Type X = a[], X, b[] | ()– What is is ?

• Need to restrict to tail recursive types

Page 20: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Subsumption in Xduce Types

• Definition. T1 <: T2 if the set defined by T1 is a subset of that defined by T2

• Examples– Name, Addr <: Name, Addr, Tel?– Name, Addr, Tel <: Name, Addr, Tel?– T, T, T <: T*

Page 21: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

XDuce

• Main goal: given a function, check that it is type correct– Come to Benjamin Pierce’s talk on Monday

• One note:– The type checking algorithm in Xduce incomplete (will

see why, in a couple of lectures)

• Important piece of typechecking:– Checking if T1 <: T2

• Obviously can’t do this for context free languages• But can do for regular languages (next)

Page 22: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Regular Tree Languages

• Given a ranked alphabet, L = L0 L1 . . . Lk • Ranked trees are T ::= a[T1,...,Ti] a Li

Definition Bottom-up tree automata isA = (L, Q, , QF) where:– L = ranked alphabet– Q = set of states– = transition relation, : (i=0,k L x Qi) Q– QF = terminal states

Page 23: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Bottom Up Tree Authomata

Computation on a tree t• For each node t = a[t1,...,ti], if the roots of t1,..., ti are

labeled with states q1, ..., qi and q in (a, q1, ..., qi), then label t with q

• If the root is labeled with a state in QF, then accept

The language accepted by A consists of all trees t accepted by A

A regular tree language is a set of trees accepted by some automaton A

Page 24: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Example of Tree Automaton

• L0 = {b}, L2 = {a}

• Q = {q1, q2}

• (b) = q1, (a,q1,q1) = q2, (a,q2,q2) = q1

• What does this accept ?

Page 25: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Properties of Regular Tree Languages

• If T1, T2 are regular, then so are:– T1 T2– T1 – T2– T1 T2

• If A is a nondeterministic bottom up tree automaton, then there exists an equivalent deterministic one– Not true for “top-down” automata

• If T1, T2 are regular, then it is decidable whether T1 T2

Page 26: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Top-down Automata

• Defined similarly, just the computation differs:– Start from the root at an initial state, move downwards

– If all leaves end in an accepting state, then accept

• Here deterministic automata are strictly weaker– e.g. cannot recognize the set {a[a,b], a[b,a]}

• Nondeterministic bottom up = = deterministic bottom up = nondeterministic top down

Page 27: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Example of a Bottom-up Automaton

• A = (L, Q, , , q0, QF) where

– L = L0 L2, L0 = {a, b}, L2 = {a}

– Q = {T0, T1}– (a) = T0, (b) = T1,– (a, T1, T0) = T1, (a, T0, T1) = T1

type T1 = b[] | a[T1, T0] | a[T0, T1]type T0 = a[] | a[T0, T0]

type T1 = b[] | a[T1, T0] | a[T0, T1]type T0 = a[] | a[T0, T0]

Page 28: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Regular Tree Languages and XDuce types

• For ranked alphabets, tail-recursive Xduce types correspond precisely to regular tree languages

• Same is true for unranked alphabets, but there the definition of regular tree lnaugages is more complex

Page 29: Management of XML and Semistructured Data Lecture 11: Schemas Wednesday, May 2nd, 2001.

Conclusion for Schemas

A Theoretical View

• XML Schemas = Xduce types = regular tree languages

• DTDs = strictly weaker

A Practical View

• XML Schemas still too complex