Chapter 23: XML

Introduction and Motivation

HTML
Standard Generalized Markup Language (SGML)
eXtensible Markup Language (XML)
▪ Useful as a data format for exchange between apps
▪ Markup is anything in the document that is not part of its content
▪ Tags are enclosed in angle brackets
▪ <title>Database Systems Concepts</title>

<university>
  <department>
    <dept_name> Comp. Sci. </dept_name>
    <building> Taylor </building>
    <budget> 100000 </budget>
  </department>
  <course>
    <course_id> CS-101 </course_id>
    <title> Intro. to Computer Science </title>
    <dept_name> Comp. Sci. </dept_name>
    <credits> 4 </credits>
  </course>
  <instructor>
    <IID> 10101 </IID>
    <name> Srinivasan </name>
    <dept_name> Comp. Sci. </dept_name>
    <salary> 65000 </salary>
  </instructor>
  <teaches>
    <IID> 10101 </IID>
    <course_id> CS-101 </course_id>
  </teaches>
</university>
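A minimal sketch of how an application might read such a document, assuming Python's standard xml.etree.ElementTree module. The wrapping university, department and instructor elements and the underscores in tag names are assumptions based on the DTD shown later in the chapter; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Sample data in the style of the slide; the wrapper elements are assumed.
DOC = """
<university>
  <department>
    <dept_name>Comp. Sci.</dept_name>
    <building>Taylor</building>
    <budget>100000</budget>
  </department>
  <instructor>
    <IID>10101</IID>
    <name>Srinivasan</name>
    <dept_name>Comp. Sci.</dept_name>
    <salary>65000</salary>
  </instructor>
</university>
"""

root = ET.fromstring(DOC)

# Tags are self-documenting: each value is looked up by name, not by position.
dept = root.find("department")
dept_name = dept.findtext("dept_name")
budget = int(dept.findtext("budget"))

# Collect the name of every instructor element under the root.
instructor_names = [i.findtext("name") for i in root.findall("instructor")]
```

Because values are addressed by tag name, a sender could add new subelements later without breaking this reader, which is one way to read the "can evolve over time" point.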


▪ Tags are self-documenting
▪ No rigid format
▪ Can evolve over time
▪ Nested structures
▪ Widely accepted
▪ Lots of tools

XML has become THE dominant format for data exchange

Elements
▪ Single root
▪ Proper nesting
  ▪ Proper: <course> . . . <title> . . . </title> . . . </course>
  ▪ Improper: <course> . . . <title> . . . </course> . . . </title>
▪ Text in the context of an element
  ▪ May be mixed with subelements
▪ Nesting to avoid joins (Fig. 23.5, 23.6)
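The single-root and proper-nesting rules are what a parser enforces as well-formedness. The following is a small sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Properly nested: <title> is closed before <course>.
proper = "<course><title>Intro. to Computer Science</title></course>"
# Improperly nested: <course> is closed while <title> is still open.
improper = "<course><title>Intro. to Computer Science</course></title>"
# Two top-level elements violate the single-root rule.
two_roots = "<course/><course/>"

results = {
    "proper": is_well_formed(proper),
    "improper": is_well_formed(improper),
    "two_roots": is_well_formed(two_roots),
}
```

Only the first string is accepted; the other two are rejected before any schema is consulted.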

Structure (Cont’d)

Attributes
▪ name = value
▪ Values are strings
▪ Useful as identifiers
Namespace
▪ <university xmlns:yale="">
Literal values
▪ <![CDATA[<course> · · ·</course>]]>
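A short sketch of how attributes, namespaces and CDATA sections look to a program, assuming Python's standard xml.etree.ElementTree. The namespace URI http://example.org/yale and the note element are placeholders invented for this example; the slide does not give them.

```python
import xml.etree.ElementTree as ET

# The namespace URI and the <note> element are hypothetical.
DOC = """
<university xmlns:yale="http://example.org/yale">
  <yale:course course_id="CS-101">
    <yale:title>Intro. to Computer Science</yale:title>
    <yale:note><![CDATA[<course> is plain text here, not markup]]></yale:note>
  </yale:course>
</university>
"""

root = ET.fromstring(DOC)
ns = {"yale": "http://example.org/yale"}

course = root.find("yale:course", ns)

# Attribute values are always strings.
course_id = course.get("course_id")

# The parser expands the prefix, so the tag is stored as {uri}local-name.
qualified_tag = course.tag

# A CDATA section is delivered as ordinary text, angle brackets included.
note = course.findtext("yale:note", namespaces=ns)
```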

XML Document Schema

Databases have schemas
XML has several schema languages
▪ Document Type Definition (DTD)
▪ XML Schema
▪ Relax NG

<!DOCTYPE university [
  <!ELEMENT university ( (department|course|instructor|teaches)+)>
  <!ELEMENT department ( dept_name, building, budget)>
  <!ELEMENT course ( course_id, title, dept_name, credits)>
  <!ELEMENT instructor (IID, name, dept_name, salary)>
  <!ELEMENT teaches (IID, course_id)>
  <!ELEMENT dept_name ( #PCDATA )>
  <!ELEMENT building ( #PCDATA )>
  <!ELEMENT budget ( #PCDATA )>
  <!ELEMENT course_id ( #PCDATA )>
  <!ELEMENT title ( #PCDATA )>
  <!ELEMENT credits ( #PCDATA )>
  <!ELEMENT IID ( #PCDATA )>
  <!ELEMENT name ( #PCDATA )>
  <!ELEMENT salary ( #PCDATA )>
] >

DTD (Cont’d)

<!DOCTYPE university-3 [
  <!ELEMENT university ( (department|course|instructor)+)>
  <!ELEMENT department ( building, budget )>
  <!ATTLIST department
      dept_name ID #REQUIRED >
  <!ELEMENT course (title, credits )>
  <!ATTLIST course
      course_id ID #REQUIRED
      dept_name IDREF #REQUIRED
      instructors IDREFS #IMPLIED >
  <!ELEMENT instructor ( name, salary )>
  <!ATTLIST instructor
      IID ID #REQUIRED
      dept_name IDREF #REQUIRED >
  · · · declarations for title, credits, building, budget, name and salary · · ·
] >

DTD Limitations

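No type constraints on text elements and attributes
▪ Data verification is still needed
No way to bound the number of occurrences (only ?, * and +)
Lack of typing for ID and IDREF
▪ An IDREF may refer to the ID of any element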
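The ID/IDREF limitation can be made concrete with a hand-written check. Python's standard library does not validate against DTDs, so the sketch below only imitates the ID/IDREF rule, with the attribute roles copied by hand from the university-3 DTD. The function name dangling_idrefs and the sample data are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Attribute roles copied by hand from the university-3 DTD; a validating
# parser would read them from the ATTLIST declarations instead.
ID_ATTRS = {"department": "dept_name", "course": "course_id", "instructor": "IID"}
IDREF_ATTRS = {"course": ["dept_name"], "instructor": ["dept_name"]}

def dangling_idrefs(root):
    """Return (tag, attribute, value) for every IDREF that matches no ID."""
    ids = {e.get(ID_ATTRS[e.tag]) for e in root if e.tag in ID_ATTRS}
    missing = []
    for e in root:
        for attr in IDREF_ATTRS.get(e.tag, []):
            if e.get(attr) not in ids:
                missing.append((e.tag, attr, e.get(attr)))
    return missing

DOC = """
<university>
  <department dept_name="Comp.Sci."><building>Taylor</building><budget>100000</budget></department>
  <instructor IID="10101" dept_name="Comp.Sci."><name>Srinivasan</name><salary>65000</salary></instructor>
  <course course_id="CS-101" dept_name="10101"><title>Intro. to Computer Science</title><credits>4</credits></course>
</university>
"""

# The course's dept_name points at an instructor's ID, yet nothing is reported:
# all IDs share one pool, so the rule cannot say "must refer to a department".
problems = dangling_idrefs(ET.fromstring(DOC))
```

The empty result for a course whose department is actually an instructor is the weakness that typed keys in XML Schema are meant to address.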

XML Schema

Developed in response to the deficiencies of DTDs
Built-in types: string, integer, decimal, …
User-defined types

XML Schema (Cont’d)

<xs:schema xmlns:xs="">
<xs:element name="university" type="UniversityType"/>
<xs:element name="department">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="dept_name" type="xs:string"/>
      <xs:element name="building" type="xs:string"/>
      <xs:element name="budget" type="xs:decimal"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:element name="course">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="course_id" type="xs:string"/>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="dept_name" type="xs:string"/>
      <xs:element name="credits" type="xs:decimal"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<!-- declarations for instructor and teaches are not shown -->
<xs:complexType name="UniversityType">
  <xs:sequence>
    <xs:element ref="department" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="course" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="instructor" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="teaches" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
</xs:schema>

Attributes are declared with xs:attribute:
<xs:attribute name="dept_name"/>

XML Schema (Cont’d)

Primary key (PK)
<xs:key name="deptKey">
  <xs:selector xpath="/university/department"/>
  <xs:field xpath="dept_name"/>
</xs:key>

Foreign key (FK)
<xs:keyref name="courseDeptFKey" refer="deptKey">
  <xs:selector xpath="/university/course"/>
  <xs:field xpath="dept_name"/>
</xs:keyref>
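To show what the key, the key reference and a typed element add over a DTD, here is a hand-rolled sketch of three checks a schema validator would perform. Python's standard library has no XML Schema validator, so this is only an imitation under that assumption; the function name check and the sample documents are invented for illustration, and Decimal accepts a few forms (such as exponents) that xs:decimal does not.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

def check(root):
    """Hand-rolled stand-in for three checks an XML Schema validator would do."""
    errors = []
    # Key deptKey: dept_name must be present and unique among departments.
    dept_names = [d.findtext("dept_name") for d in root.findall("department")]
    if None in dept_names or len(set(dept_names)) != len(dept_names):
        errors.append("deptKey violated")
    # Key reference courseDeptFKey: a course's dept_name must match a key value.
    for c in root.findall("course"):
        if c.findtext("dept_name") not in dept_names:
            errors.append("courseDeptFKey violated: " + str(c.findtext("course_id")))
    # Typed element: budget must parse as a decimal number (approximation).
    for d in root.findall("department"):
        try:
            Decimal(d.findtext("budget", default="").strip())
        except InvalidOperation:
            errors.append("budget is not a decimal")
    return errors

GOOD = ET.fromstring(
    "<university>"
    "<department><dept_name>Comp. Sci.</dept_name><building>Taylor</building>"
    "<budget>100000</budget></department>"
    "<course><course_id>CS-101</course_id><title>Intro. to Computer Science</title>"
    "<dept_name>Comp. Sci.</dept_name><credits>4</credits></course>"
    "</university>"
)
BAD = ET.fromstring(
    "<university>"
    "<department><dept_name>Comp. Sci.</dept_name><building>Taylor</building>"
    "<budget>lots</budget></department>"
    "<course><course_id>CS-101</course_id><title>Intro. to Computer Science</title>"
    "<dept_name>Physics</dept_name><credits>4</credits></course>"
    "</university>"
)

good_errors = check(GOOD)
bad_errors = check(BAD)
```

Unlike the shared ID pool of a DTD, the key reference here is tied to one specific key, so a course cannot point at anything other than a department name.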


XML Schema Benefits

Constraints
User-defined types
PK and FK
Integrated namespaces
Min and max values
Type extension by inheritance

Query and Transformation

XPath
▪ Language for path expressions
XQuery
▪ Standard language for querying XML
  ▪ Modeled after SQL but different
  ▪ Deals with nested XML data

Tree Model of XML and XPath

Trees and nodes
▪ Elements and attributes
XPath 2.0
▪ /university-3/instructor/name
  ▪ <name>Srinivasan</name>
  ▪ <name>Brandt</name>

XPath features

Selection
▪ /university-3/course[credits >= 4]/@course_id
Functions
▪ count()
  ▪ /university-2/instructor[count(./teaches/course) > 2]
▪ id("foo")
Union "|"
…
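A sketch of the first two path expressions, assuming Python's standard xml.etree.ElementTree, which implements only a small subset of XPath. Its predicates test equality only, so the >= comparison is done in Python. The sample document is illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative data using the attribute-based layout of the university-3 DTD.
DOC = """
<university-3>
  <instructor IID="10101"><name>Srinivasan</name><salary>65000</salary></instructor>
  <instructor IID="83821"><name>Brandt</name><salary>92000</salary></instructor>
  <course course_id="CS-101"><title>Intro. to Computer Science</title><credits>4</credits></course>
  <course course_id="CS-315"><title>Robotics</title><credits>3</credits></course>
</university-3>
"""
root = ET.fromstring(DOC)

# /university-3/instructor/name
# The search starts at the root element, so the path is written relative to it.
names = [n.text for n in root.findall("./instructor/name")]

# /university-3/course[credits >= 4]/@course_id
# ElementTree cannot evaluate ">=", so the selection predicate runs in Python.
big_courses = [
    c.get("course_id")
    for c in root.findall("./course")
    if int(c.findtext("credits")) >= 4
]
```

A full XPath engine would evaluate both expressions directly; the split between path navigation and Python filtering here is a limitation of the library, not of XPath.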

W3C XQuery 1.0
▪ for
▪ let
▪ where
▪ order by
▪ return

XQuery (Cont’d)

for $x in /university-3/course
let $courseId := $x/@course_id
where $x/credits > 3
return <course_id> { $courseId } </course_id>

is equivalent to

for $x in /university-3/course[credits > 3]
return <course_id> { $x/@course_id } </course_id>
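The clause-by-clause reading of such a query can be sketched as an ordinary loop. This is a rough Python analogue using the standard xml.etree.ElementTree module, not an XQuery implementation; it assumes the intent is to emit each identifier as the text of a new course_id element, and the sample data is illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """
<university-3>
  <course course_id="CS-101"><title>Intro. to Computer Science</title><credits>4</credits></course>
  <course course_id="CS-315"><title>Robotics</title><credits>3</credits></course>
</university-3>
"""
root = ET.fromstring(DOC)

result = []
for x in root.findall("./course"):        # for $x in /university-3/course
    course_id = x.get("course_id")        # let $courseId := $x/@course_id
    if int(x.findtext("credits")) > 3:    # where $x/credits > 3
        out = ET.Element("course_id")     # return <course_id> ... </course_id>
        out.text = course_id
        result.append(out)

# Serialize each constructed element to a string for inspection.
serialized = [ET.tostring(e, encoding="unicode") for e in result]
```

Moving the where condition into the loop's source, as the second form of the query does, would correspond to filtering the list before iterating over it.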

XQuery Joins

for $c in /university/course,
    $i in /university/instructor,
    $t in /university/teaches
where $c/course_id = $t/course_id and $t/IID = $i/IID
return <course_instructor> { $c, $i } </course_instructor>

which is equivalent to

for $c in /university/course,
    $i in /university/instructor,
    $t in /university/teaches[course_id = $c/course_id and IID = $i/IID]
return <course_instructor> { $c, $i } </course_instructor>
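The join has the same shape as a three-way nested loop with a filter. A minimal Python sketch under that reading, using the standard xml.etree.ElementTree module; it returns (title, name) pairs instead of constructing course_instructor elements, and the sample data is illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """
<university>
  <course><course_id>CS-101</course_id><title>Intro. to Computer Science</title></course>
  <course><course_id>BIO-101</course_id><title>Intro. to Biology</title></course>
  <instructor><IID>10101</IID><name>Srinivasan</name></instructor>
  <instructor><IID>76766</IID><name>Crick</name></instructor>
  <teaches><IID>10101</IID><course_id>CS-101</course_id></teaches>
  <teaches><IID>76766</IID><course_id>BIO-101</course_id></teaches>
</university>
"""
root = ET.fromstring(DOC)

# Each "for" clause becomes one loop; the "where" clause becomes the filter.
pairs = [
    (c.findtext("title"), i.findtext("name"))
    for c in root.findall("course")
    for i in root.findall("instructor")
    for t in root.findall("teaches")
    if c.findtext("course_id") == t.findtext("course_id")
    and t.findtext("IID") == i.findtext("IID")
]
```

A query processor is free to evaluate the join more cleverly, for example by indexing teaches on IID; the nested loop is only the simplest reading of the semantics.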

Functions and Types

declare function local:dept_courses($iid as xs:string) as element(course)* {
  for $i in /university/instructor[IID = $iid],
      $c in /university/course[dept_name = $i/dept_name]
  return $c
};


Document Object Model (DOM)
▪ Java DOM API
Simple API for XML (SAX)
▪ Event model
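The difference between the two APIs is easiest to see side by side. The slide refers to the Java DOM API; the sketch below uses Python's standard-library counterparts (xml.dom.minidom and xml.sax) instead, and the handler class NameHandler is a name made up for this example.

```python
import xml.sax
from xml.dom import minidom

DOC = (
    "<university>"
    "<instructor><name>Srinivasan</name></instructor>"
    "<instructor><name>Brandt</name></instructor>"
    "</university>"
)

# DOM: the whole document becomes a tree of nodes that can be navigated freely.
dom = minidom.parseString(DOC)
dom_names = [n.firstChild.data for n in dom.getElementsByTagName("name")]

# SAX: the parser fires events in document order and the handler keeps only
# the state it needs, so the full tree is never built in memory.
class NameHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, tag, attrs):
        if tag == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        if self._in_name:
            self._buffer.append(content)  # text may arrive in several chunks

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._buffer))
            self._in_name = False

handler = NameHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_names = handler.names
```

Both approaches extract the same names; DOM trades memory for convenient random access, while the event model suits large documents read once.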

Storage of XML Data

Non-relational data stores
▪ Flat files (no ACID guarantees)
▪ XML database
  ▪ DOM, C++-based

Storage of XML Data (Cont’d)

Relational databases
▪ Store as string
  ▪ clob
▪ Tree representation
▪ Map to relations
▪ Publishing and shredding XML data
▪ Native storage within relational databases


select xmlelement (name "course",
         xmlattributes (course_id as course_id, dept_name as dept_name),
         xmlelement (name "title", title),
         xmlelement (name "credits", credits))
from course
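Shredding and publishing can be sketched end to end with an in-memory database. This example assumes Python's standard sqlite3 and xml.etree.ElementTree modules. SQLite has no xmlelement or xmlattributes functions, so the publishing step is done in Python and only mirrors the shape of the SQL/XML query; the table layout and the function name publish are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """
<university>
  <course><course_id>CS-101</course_id><title>Intro. to Computer Science</title><dept_name>Comp. Sci.</dept_name><credits>4</credits></course>
  <course><course_id>BIO-101</course_id><title>Intro. to Biology</title><dept_name>Biology</dept_name><credits>4</credits></course>
</university>
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table course "
    "(course_id text primary key, title text, dept_name text, credits integer)"
)

# Shredding: each <course> element becomes one row, each subelement one column.
for c in ET.fromstring(DOC).findall("course"):
    conn.execute(
        "insert into course values (?, ?, ?, ?)",
        (c.findtext("course_id"), c.findtext("title"),
         c.findtext("dept_name"), int(c.findtext("credits"))),
    )

# Publishing: rebuild XML from a row, with course_id and dept_name as
# attributes and title and credits as subelements.
def publish(row):
    course_id, title, dept_name, credits = row
    elem = ET.Element("course", {"course_id": course_id, "dept_name": dept_name})
    ET.SubElement(elem, "title").text = title
    ET.SubElement(elem, "credits").text = str(credits)
    return ET.tostring(elem, encoding="unicode")

published = [
    publish(r)
    for r in conn.execute(
        "select course_id, title, dept_name, credits from course order by course_id"
    )
]
```

Once shredded, the data can be queried with ordinary SQL, which is the main appeal of mapping XML to relations rather than storing it as a string.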

XML Applications

Storing data with complex structure
▪ ODF
▪ OOXML
Standardized data exchange formats
▪ B2B
Web services
▪ HTTP
▪ SOAP
▪ WSDL
Data mediation
▪ Wrappers