
Chapter 10: XML

What is XML
Basic Components of XML
XPath
XQuery

What is XML?

Extensible Markup Language
Structured markup
Simplified SGML
Next-generation HTML
W3C Recommendation (spec)
  World Wide Web Consortium

Family Tree

GML (1969)
SGML (1985)
HTML (1993)
XML (1998)

HTML Example

<HTML>
<HEAD><TITLE>HTML example</TITLE></HEAD>
<BODY>
<H1>HTML example</H1>
<P>This is an example of HTML markup codes.</P>
</BODY>
</HTML>

HTML and XML

HTML:
  Content and presentation are mixed; structure?
  Tags, e.g. <H1>, <li>, are fixed and specify presentation
XML:
  Content, presentation, and structure are separated
  Users can define new tags with meaningful annotation

Basic Syntax

Starts with XML declaration
  <?xml version="1.0" standalone="yes"?>
Rest of document inside the "root element"
  <TEI.2>…</TEI.2>

<state>
  <sname> Texas </sname>
  <scode> TX </scode>
</state>

Two Kinds of XML

Standalone
  <?xml version="1.0" standalone="yes"?>
Using Document Type Definition (DTD)
  <?xml version="1.0" standalone="no"?>
  <!DOCTYPE state SYSTEM "state.dtd">
  DTD is the meta-data that describes the available tags
  <!DOCTYPE state [
    <!ELEMENT state (sname, scode)>
    <!ELEMENT sname (#PCDATA)>
    <!ELEMENT scode (#PCDATA)>
  ]>

HTML is an application of SGML (XHTML is the XML version)

Available tags, e.g. <P>, are used to describe presentation
Where is the DTD of HTML?

Well-formed vs. Valid

XML must be well-formed
  Correct syntax
  Tags match, tags nest, all characters legal
  Parser must reject if not well-formed
XML may be valid with respect to a DTD (Document Type Definition)
  Tags are used correctly
  Tags are all declared
  Attributes are declared

Validity Checking

Checks everything specified in a DTD
Can't check text (currency, spelling)
Checks against DTD: this is a valid memo, book, bibliography, ...

XML Syntax

The XML declaration
Elements
Entities
Text
Declarations and Notations
Processing Instructions
Comments

The XML Declaration

At very beginning of file
Officially optional, but always use it
Can declare version, encoding, standalone
  Must be in that order
  version is required; encoding and standalone are optional
Encodings other than UTF-8 and UTF-16 must be declared
  <?xml version="1.0" encoding="Big5"?>
  <?xml version="1.0" encoding="ISO-8859-1"?>
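A minimal sketch of why the encoding declaration matters, using Python's standard-library parser. The <name> element and its content are made up for illustration; when the parser is handed raw bytes, it reads the declaration to decide how to decode them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the <name> element and its content are invented.
text = '<?xml version="1.0" encoding="ISO-8859-1"?><name>José</name>'

# Encode the bytes with the same encoding the declaration names.
data = text.encode("iso-8859-1")

# Given bytes, the parser uses the declared encoding to decode them.
root = ET.fromstring(data)
name = root.text
print(name)
```

Without the declaration, the single byte used for "é" in ISO-8859-1 is not valid UTF-8, so a conforming parser would be expected to reject the document.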

Elements

Basic building block of XML
Start and end tag
  <person>Nico</person>
Attributes: <date format="iso8601"></date>
  May be abbreviated by: <date format="iso8601"/>
Elements can be arbitrarily nested to describe very rich information structure

Elements and Attributes

Attributes can parameterize an element
  <state region="Southern">
    <sname> Texas </sname>
    <scode> TX </scode>
  </state>
Can be represented by a sub-element
  <state>
    <region> Southern </region>
    <sname> Texas </sname>
    <scode> TX </scode>
  </state>
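As a rough illustration, the two encodings of the region can be read back with Python's standard-library parser. The variable names are illustrative only; the point is that an attribute is fetched with get() while a sub-element is found by path.

```python
import xml.etree.ElementTree as ET

# The same state, once with region as an attribute, once as a sub-element.
as_attribute = ET.fromstring(
    '<state region="Southern"><sname>Texas</sname><scode>TX</scode></state>'
)
as_subelement = ET.fromstring(
    "<state><region>Southern</region><sname>Texas</sname><scode>TX</scode></state>"
)

# Attributes are looked up by name on the element itself.
region_from_attribute = as_attribute.get("region")

# Sub-elements are located by path, then their text is read.
region_from_subelement = as_subelement.findtext("region")

print(region_from_attribute, region_from_subelement)
```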

Attribute Syntax

Name may contain Unicode letters, digits, '.', '-', '_' (it must start with a letter or '_')
Cannot repeat: the same attribute name cannot appear more than once in an element
Order doesn't matter
Values must be quoted (single or double)
Values may not contain "<"
Values may have defaults in DTD

Attributes and Sub-elements

A matter of preference
Main differences:
  Attribute name cannot repeat in the same element
    Sub-element can repeat
  Attribute values are always string data
    Sub-elements can have further sub-elements

Special Attributes

id holds a unique identifier for an element
idref references an id

<state id="texas">
  <sname> Texas </sname>
  <scode> TX </scode>
  <cityin idref="dallas"/>
</state>

<city id="dallas">
  <ccode> DAL </ccode>
  <cname> Dallas </cname>
  <stateof idref="texas"/>
</city>
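A sketch of how an application might follow idref links by hand. Without a DTD, the standard-library parser treats id and idref as ordinary attributes, so the lookup table below is built explicitly. The <geo> wrapper element and the deref helper are assumptions added for illustration, since a document needs a single root.

```python
import xml.etree.ElementTree as ET

# The <geo> wrapper is hypothetical: it only provides a single root element.
doc = ET.fromstring("""
<geo>
  <state id="texas">
    <sname>Texas</sname>
    <scode>TX</scode>
    <cityin idref="dallas"/>
  </state>
  <city id="dallas">
    <ccode>DAL</ccode>
    <cname>Dallas</cname>
    <stateof idref="texas"/>
  </city>
</geo>
""")

# Index every element that carries an id attribute.
by_id = {elem.get("id"): elem for elem in doc.iter() if elem.get("id") is not None}


def deref(elem):
    """Return the element that elem's idref attribute points to."""
    return by_id[elem.get("idref")]


city = deref(doc.find("state/cityin"))
state = deref(doc.find("city/stateof"))
city_name = city.findtext("cname")
state_name = state.findtext("sname")
print(city_name, state_name)
```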

Entities

A unit of text
Five predefined entities
  &amp; (&)  &apos; (')  &lt; (<)  &gt; (>)  &quot; (")
Define your own in DTD
  <!ENTITY euro "&#x20AC;">
Use numeric character references
  &#x20AC;  &#8364;
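A small sketch of the predefined entities and numeric character references in action, using Python's standard library. The <t> element is a made-up container; escape() from xml.sax.saxutils goes the other way and produces entity references for "&", "<" and ">".

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing replaces predefined entities and numeric references with characters.
parsed = ET.fromstring(
    "<t>&lt;a&gt; &amp; &quot;b&quot; &apos;c&apos; &#x20AC; &#8364;</t>"
).text

# Escaping turns markup-significant characters back into entity references.
escaped = escape("a < b & c")

print(parsed)
print(escaped)
```

Both numeric references name the same code point (hexadecimal 20AC equals decimal 8364), so the euro sign appears twice in the parsed text.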

Text

Character strings
Use predefined entities (&lt; &amp; …)
  XML Example: &gt; (>)  &amp; (&)  &lt; (<)
CDATA ("character data") section for raw text without using entities
  <![CDATA[ if a< b then print a is less than b ]]>
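To see that a CDATA section and entity escaping are two spellings of the same text, here is a minimal sketch with Python's standard-library parser. The <code> wrapper element is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Raw text inside a CDATA section: "<" needs no escaping.
with_cdata = ET.fromstring(
    "<code><![CDATA[if a < b then print a is less than b]]></code>"
)

# The same text written with a predefined entity instead.
with_entity = ET.fromstring(
    "<code>if a &lt; b then print a is less than b</code>"
)

print(with_cdata.text == with_entity.text)
```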

Declarations

Allow validity checking
Optional
May be internal (in document), external, or both
DTD (Document Type Definition) is all active declarations
Use existing DTDs when possible

External DTD

Most common
Use DOCTYPE declaration before root element
  <!DOCTYPE greeting SYSTEM "hello.dtd">
  <greeting>Hello, world!</greeting>

Internal (standalone) DTD

For custom documents
Also uses DOCTYPE declaration
  <!DOCTYPE greeting [
    <!ELEMENT greeting (#PCDATA)>
  ]>
  <greeting>Hello, world!</greeting>
Specify in XML declaration
  <?xml version="1.0" standalone="yes"?>

External plus Internal DTD

Usually to declare entities
Use DOCTYPE declaration before root element
  <!DOCTYPE greeting SYSTEM "hello.dtd" [
    <!ENTITY excl "&#x21;">
  ]>
  <greeting>Hello, world&excl;</greeting>
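A sketch of an internally declared entity being expanded by Python's standard-library parser. As a simplifying assumption, the SYSTEM "hello.dtd" reference is left out here, since that file is not available and only the internal subset is needed to resolve &excl;.

```python
import xml.etree.ElementTree as ET

# Internal subset only: the external hello.dtd is omitted in this sketch.
doc = """<!DOCTYPE greeting [
  <!ENTITY excl "&#x21;">
]>
<greeting>Hello, world&excl;</greeting>"""

# The parser replaces &excl; with its declared replacement text.
greeting = ET.fromstring(doc).text
print(greeting)
```

Entity expansion is also a known attack surface, so parsing DTD-bearing documents from untrusted sources calls for a hardened parser configuration.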

Element Type Declarations

Declare name
Declare allowed content
  <!ELEMENT a EMPTY>
  <!ELEMENT either (one | theother)>
  <!ELEMENT ordered (first, second)>
  <!ELEMENT list (item+)>
  <!ELEMENT dl ((dt?, dd?)*)>
  <!ELEMENT text (#PCDATA)>
  <!ELEMENT mixed (#PCDATA | b | i | em)*>
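Content models read much like regular expressions over child element names. The sketch below makes that analogy concrete by hand-translating some of the declarations above into Python regexes. The MODELS table and the children_match helper are illustrative assumptions, not how a real validating parser works, and only child order is checked (text content is ignored).

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models: each child name becomes "name " in the regex.
MODELS = {
    "a": r"",                        # EMPTY
    "either": r"(one |theother )",   # (one | theother)
    "ordered": r"first second ",     # (first, second)
    "list": r"(item )+",             # (item+)
    "dl": r"((dt )?(dd )?)*",        # ((dt?, dd?)*)
}


def children_match(xml_text):
    """Check the child-element sequence of the root against its model."""
    elem = ET.fromstring(xml_text)
    sequence = "".join(child.tag + " " for child in elem)
    return re.fullmatch(MODELS[elem.tag], sequence) is not None


print(children_match("<ordered><first/><second/></ordered>"))
print(children_match("<ordered><second/><first/></ordered>"))
```

The operators carry over directly: "," is concatenation, "|" is alternation, and "?", "*", "+" keep their usual meaning.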

Attribute List Declarations

Declare attributes for an element
Declare value types
Declare defaults
  <!ATTLIST termdef
      id    ID     #REQUIRED
      name  CDATA  #IMPLIED>
  <!ATTLIST list
      type  (bullets|ordered|glossary)  "ordered">
  <!ATTLIST form
      method  CDATA  #FIXED "POST">

Entity Declarations

<!ENTITY copy "&#x00A9;">
<!ENTITY copyright "&copy; Infoseek Corp. 1999, All rights reserved">

Processing Instructions

Instructions to applications
  fonts?
  security?
  correctness checks?
Linking to a style sheet
  <?xml-stylesheet href="mystyle.css" type="text/css"?>
Instructions to indexing robots
  <?robots index="no" follow="yes"?>

Comments

Like HTML and SGML
  <!-- a comment -->
Anything is OK inside a comment, except the string "--"
  <!-- <head> & <tail> are elements -->
  <!-- <?xml?> declaration goes here -->

What is a DTD?

"Document Type Definition"
Bunch of XML declarations
Usually external to document
Designed for some purpose (use one that matches your needs)
Best left to experts

A Bug Report Document

<?xml version="1.0"?>
<bugreport>
  <product>xmltron</product>
  <version>1.1</version>
  <os>RTE</os>
  <osversion>4.0</osversion>
  <date scheme="ISO8601">1999-11-03</date>
  <report>
    <summary>doesn't work</summary>
    <detail>at all</detail>
  </report>
  <solution>none yet</solution>
</bugreport>

Make a Document Type

<!DOCTYPE bugreport [
  <!-- declarations go here -->
]>
<bugreport> ...

Doctype and root element must match

Declarations for Elements

<!DOCTYPE bugreport [
  <!-- bugreport itself: wait 'til next slide -->
  <!ELEMENT product (#PCDATA)>
  <!ELEMENT version (#PCDATA)>
  <!ELEMENT os (#PCDATA)>
  <!ELEMENT osversion (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT report (summary, detail)>
  <!ELEMENT summary (#PCDATA)>
  <!ELEMENT detail (#PCDATA)>
  <!ELEMENT solution (#PCDATA)>
]>

Declaration for Root Element

<!DOCTYPE bugreport [
  <!ELEMENT bugreport (product, version, os, osversion, date, report, solution?)>

<solution> is optional; the others are required and must be in this order.


Declarations for Attributes

<!ATTLIST date scheme CDATA #IMPLIED>

"CDATA" instead of "PCDATA" means the value is plain character data: it cannot contain markup (entity references are still expanded)
#IMPLIED means optional (value implied by document)
Separate ATTLIST declarations for the same element are OK
Internal ATTLIST declarations override external

documents = contents + style

Extensible Stylesheet Language (XSL)
Specifications still in draft
But implementations keeping pace

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="xmlpartstyle.css"?>
<PARTS>
  <TITLE>Computer Parts</TITLE>
  <PART>
    <ITEM>Motherboard</ITEM>
    <MANUFACTURER>ASUS</MANUFACTURER>
    <MODEL>P3B-F </MODEL>
    <COST> 123.00</COST>
  </PART>
  <PART>
    <ITEM>Video Card</ITEM>
    <MANUFACTURER>ATI</MANUFACTURER>
    <MODEL>All-in-Wonder Pro</MODEL>
    <COST> 160.00</COST>
  </PART>
  <PART>
    <ITEM>Sound Card</ITEM>
    <MANUFACTURER>Creative Labs</MANUFACTURER>
    <MODEL>Sound Blaster Live</MODEL>
    <COST> 80.00</COST>
  </PART>
  <PART>
    <ITEM> inch Monitor</ITEM>
    <MANUFACTURER>LG Electronics</MANUFACTURER>
    <MODEL> 995E</MODEL>
    <COST> 290.00</COST>
  </PART>
</PARTS>

Using a cascading style sheet, we will see

XPath

Used to access part of an XML document
Compact, non-XML syntax
Uses a pattern expression to identify nodes in an XML document
Has a library of standard functions
W3C Standard

XPath Example

Sample XML
The root element
  /STATES
The SCODE of all STATE elements of the STATES element
  /STATES/STATE/SCODE
All the CAPITAL elements, with a CNAME sub-element equal to 'Atlanta', of the STATE elements of the STATES element
  /STATES/STATE/CAPITAL[CNAME='Atlanta']
All CITIES elements in the XML document
  //CITIES
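These paths can be tried with the limited XPath subset in Python's standard library. The sample document below is hypothetical, since the slide's own sample XML is not shown; element names follow the expressions above. ElementTree evaluates paths relative to the element they are called on, so the leading /STATES step is dropped when starting from the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, shaped to fit the path expressions on the slide.
states = ET.fromstring("""
<STATES>
  <STATE>
    <SNAME>Georgia</SNAME><SCODE>GA</SCODE>
    <CAPITAL><CNAME>Atlanta</CNAME></CAPITAL>
    <CITIES><CITY>Savannah</CITY></CITIES>
  </STATE>
  <STATE>
    <SNAME>Texas</SNAME><SCODE>TX</SCODE>
    <CAPITAL><CNAME>Austin</CNAME></CAPITAL>
    <CITIES><CITY>Dallas</CITY></CITIES>
  </STATE>
</STATES>
""")

# /STATES/STATE/SCODE, written relative to the root element.
scodes = [e.text for e in states.findall("STATE/SCODE")]

# /STATES/STATE/CAPITAL[CNAME='Atlanta']
atlanta = states.findall("STATE/CAPITAL[CNAME='Atlanta']")

# //CITIES
cities = states.findall(".//CITIES")

print(scodes, len(atlanta), len(cities))
```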

More XPath Example

Element AA with two ancestors
  /*/*/AA
First BB element of AA element
  /AA/BB[1]
All the CC elements of the BB elements which have a sub-element A with value '3'
  /BB[A='3']/CC
Any elements AA or elements CC of elements BB
  //AA | /BB/CC

Even More XPath Example

Select all sub-elements of elements AA of elements BB
  /BB/AA/*
  When you do not know the sub-elements
  Different from /BB/AA
Select all attributes named 'aa'
  //@aa
Select all CITIES elements with an attribute named aa
  //CITIES[@aa]
Select all CITIES elements with an attribute named aa with value '123'
  //CITIES[@aa = '123']
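A sketch of the attribute tests using Python's standard library, on a made-up document (the ROOT and OTHER elements are assumptions). ElementTree supports the [@aa] and [@aa='123'] predicates but cannot return attribute nodes, so //@aa is emulated by walking every element.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: ROOT and OTHER are invented for illustration.
doc = ET.fromstring("""
<ROOT>
  <CITIES aa="123"/>
  <CITIES aa="456"/>
  <CITIES/>
  <OTHER aa="123"/>
</ROOT>
""")

# //CITIES[@aa]
with_aa = doc.findall(".//CITIES[@aa]")

# //CITIES[@aa = '123']
with_aa_123 = doc.findall(".//CITIES[@aa='123']")

# //@aa, emulated: collect the attribute value from every element that has it.
all_aa_values = [e.get("aa") for e in doc.iter() if "aa" in e.attrib]

print(len(with_aa), len(with_aa_123), all_aa_values)
```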

Axis

Context node
  Evaluation of XPath is from left to right
  The context node is the current node (set) being evaluated
Axis
  Specifies the relationship of the resulting nodes relative to the context node
Example:
  /child::AA : AA elements that are children of the root node, abbreviated /AA
  //AA/ancestor::BB : BB elements that are ancestors of any AA element

Axes

ancestor: //BBB/ancestor::*

<AAA>
  <BBB/>
  <CCC/>
  <BBB/>
  <BBB/>
  <DDD>
    <BBB/>
  </DDD>
  <CCC/>
</AAA>
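Python's standard-library ElementTree has no ancestor axis, so the sketch below emulates //BBB/ancestor::* with a child-to-parent map. The parent_of table and the ancestors helper are illustrative names, not part of any library.

```python
import xml.etree.ElementTree as ET

tree = ET.fromstring(
    "<AAA><BBB/><CCC/><BBB/><BBB/><DDD><BBB/></DDD><CCC/></AAA>"
)

# Elements do not store a parent pointer, so build the mapping once.
parent_of = {child: parent for parent in tree.iter() for child in parent}


def ancestors(elem):
    """Walk upward from elem, collecting every ancestor element."""
    found = []
    while elem in parent_of:
        elem = parent_of[elem]
        found.append(elem)
    return found


# //BBB/ancestor::* : union of the ancestors of every BBB element.
ancestor_tags = {a.tag for b in tree.iter("BBB") for a in ancestors(b)}

# //BBB/ancestor::DDD : the same set, filtered by name.
ddd_only = {t for t in ancestor_tags if t == "DDD"}

print(sorted(ancestor_tags), sorted(ddd_only))
```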

Axes

ancestor: //BBB/ancestor::DDD

<AAA>
  <BBB/>
  <CCC/>
  <BBB/>
  <BBB/>
  <DDD>
    <BBB/>
  </DDD>
  <CCC/>
</AAA>

Axes

attribute: Contains all attributes of the current node
  //BBB/attribute::* (abbreviated //BBB/@*)

<AAA>
  <BBB aa='1'/>
  <CCC/>
  <BBB aa='2'/>
  <BBB aa='3'/>
  <DDD>
    <BBB bb='31'/>
  </DDD>
  <CCC/>
</AAA>

//BBB/attribute::bb

Axes

child
  /AAA/DDD/child::BBB (child can be omitted for abbreviation)

<AAA>
  <BBB/>
  <CCC/>
  <BBB/>
  <BBB/>
  <DDD>
    <BBB/>
  </DDD>
  <CCC/>
</AAA>

Axes

descendant
  /AAA/descendant::*

<AAA>
  <BBB/>
  <CCC/>
  <BBB/>
  <BBB/>
  <DDD>
    <BBB/>
  </DDD>
  <CCC/>
</AAA>

/AAA/descendant::CCC ?

Axes

parent
  //BBB/parent::*

<AAA>
  <BBB/>
  <CCC/>
  <BBB/>
  <BBB/>
  <DDD>
    <BBB/>
  </DDD>
  <CCC/>
</AAA>

//BBB/parent::DDD ?

Axes

descendant-or-self
following
following-sibling
preceding
preceding-sibling
self

Predicates

Filters an element set
A predicate is placed inside square brackets ( [ ] )
Example: //BBB[position() mod 2 = 0]

<AAA>
  <BBB/>
  <BBB/>
  <BBB/>
  <BBB/>
  <BBB/>
  <BBB/>
  <BBB/>
  <BBB/>
  <CCC/>
  <CCC/>
  <CCC/>
</AAA>
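The standard-library ElementTree understands simple index predicates such as BBB[2], but not position() arithmetic, so this sketch applies the "position() mod 2 = 0" filter in plain Python. Positions are 1-based, as in XPath, and all BBB elements here share one parent, which keeps the emulation straightforward.

```python
import xml.etree.ElementTree as ET

# Eight BBB children followed by three CCC children, as on the slide.
doc = ET.fromstring("<AAA>" + "<BBB/>" * 8 + "<CCC/>" * 3 + "</AAA>")

bbbs = doc.findall("BBB")

# //BBB[position() mod 2 = 0], emulated with 1-based positions.
even_bbbs = [b for pos, b in enumerate(bbbs, start=1) if pos % 2 == 0]

# A plain index predicate is supported directly.
second = doc.find("BBB[2]")

print(len(bbbs), len(even_bbbs))
```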

Predicates

//BBB[@bb='31']

<AAA>
  <BBB aa='1'/>
  <CCC/>
  <BBB aa='2'/>
  <BBB aa='3'/>
  <DDD>
    <BBB bb='31'/>
  </DDD>
  <CCC/>
</AAA>

Is it different from //BBB/attribute::bb?

XQuery

XQuery is a general purpose query language for XML data
XQuery uses a for … let … where … return … syntax
  for corresponds to SQL from
  where corresponds to SQL where
  return corresponds to SQL select
  let allows temporary variables, and has no equivalent in SQL

FLWR Syntax in XQuery

Simple FLWR expression in XQuery
  Find all accounts with balance > 400, with each result enclosed in an <account-number> .. </account-number> tag

  for $x in /bank-2/account
  let $acctno := $x/@account-number
  where $x/balance > 400
  return <account-number> $acctno </account-number>
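To make the for/let/where/return roles concrete, here is a rough Python equivalent of the query above, one clause per line. The bank-2 data (account numbers and balances) is made up for illustration; only the element and attribute names come from the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank-2 document; the values are invented.
bank = ET.fromstring("""
<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>300</balance></account>
  <account account-number="A-403"><balance>900</balance></account>
</bank-2>
""")

results = []
for x in bank.findall("account"):            # for $x in /bank-2/account
    acctno = x.get("account-number")         # let $acctno := $x/@account-number
    if float(x.findtext("balance")) > 400:   # where $x/balance > 400
        node = ET.Element("account-number")  # return <account-number> ...
        node.text = acctno
        results.append(node)

output = [ET.tostring(r, encoding="unicode") for r in results]
print(output)
```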

Path Expressions and Functions

The function distinct( ) can be used to remove duplicates in path expression results
The function document(name) returns the root of the named document
  E.g. document("bank-2.xml")/bank-2/account
Aggregate functions such as sum( ) and count( ) can be applied to path expression results

Joins

Joins are specified in a manner very similar to SQL

  for $a in /bank/account,
      $c in /bank/customer,
      $d in /bank/depositor
  where $a/account-number = $d/account-number
    and $c/customer-name = $d/customer-name
  return <cust-acct> $c $a </cust-acct>
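The join above amounts to a nested loop over three node sets with a filter. A minimal Python sketch under assumed data follows: the bank document and its values are hypothetical, and the result is reduced to (customer-name, account-number) pairs rather than full <cust-acct> elements to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank document; names and numbers are invented.
bank = ET.fromstring("""
<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
  <customer><customer-name>Hayes</customer-name></customer>
  <customer><customer-name>Jones</customer-name></customer>
  <depositor><account-number>A-101</account-number><customer-name>Jones</customer-name></depositor>
  <depositor><account-number>A-102</account-number><customer-name>Hayes</customer-name></depositor>
</bank>
""")

pairs = []
for a in bank.findall("account"):            # for $a in /bank/account,
    for c in bank.findall("customer"):       #     $c in /bank/customer,
        for d in bank.findall("depositor"):  #     $d in /bank/depositor
            # where: both join conditions must hold
            same_account = a.findtext("account-number") == d.findtext("account-number")
            same_customer = c.findtext("customer-name") == d.findtext("customer-name")
            if same_account and same_customer:
                pairs.append((c.findtext("customer-name"), a.findtext("account-number")))

print(pairs)
```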

The same query can be expressed with the selections specified as XPath selections:

  for $a in /bank/account,
      $c in /bank/customer,
      $d in /bank/depositor[account-number = $a/account-number
                            and customer-name = $c/customer-name]
  return <cust-acct> $c $a </cust-acct>

Page 57: 1 Chapter 10: XML What is XML What is XML Basic Components of XML Basic Components of XML XPath XPath XQuery XQuery.

57

Changing Nesting Structure

  <bank-1>
    for $c in /bank/customer
    return
      <customer>
        $c/*
        for $d in /bank/depositor[customer-name = $c/customer-name],
            $a in /bank/account[account-number = $d/account-number]
        return $a
      </customer>
  </bank-1>
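
To make the restructuring concrete, here is a Python sketch that builds the nested bank-1 form from the flat bank form, following the query step by step. It relies on the standard xml.etree.ElementTree module; the sample data and the name nest_accounts are assumptions for illustration only.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical flat bank document.
BANK = """
<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
  <customer><customer-name>Johnson</customer-name></customer>
  <customer><customer-name>Hayes</customer-name></customer>
  <depositor><account-number>A-101</account-number><customer-name>Johnson</customer-name></depositor>
  <depositor><account-number>A-102</account-number><customer-name>Hayes</customer-name></depositor>
</bank>
"""

def nest_accounts(bank):
    """Nest each customer's accounts inside the customer element."""
    bank_1 = ET.Element("bank-1")
    for c in bank.findall("customer"):
        customer = ET.SubElement(bank_1, "customer")
        # $c/* : copy all subelements of the original customer
        customer.extend([copy.deepcopy(child) for child in c])
        name = c.findtext("customer-name")
        for d in bank.findall("depositor"):
            if d.findtext("customer-name") != name:
                continue
            for a in bank.findall("account"):
                if a.findtext("account-number") == d.findtext("account-number"):
                    customer.append(copy.deepcopy(a))  # return $a
    return bank_1

bank_1 = nest_accounts(ET.fromstring(BANK))
print(ET.tostring(bank_1, encoding="unicode"))
```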

XQuery Path Expressions

$c/text() gives the text content of an element without any subelements/tags

XQuery path expressions support the "->" operator for dereferencing IDREFs
  Equivalent to the id( ) function of XPath, but simpler to use
  Can be applied to a set of IDREFs to get a set of results
  June 2001 version of standard has changed "->" to "=>"
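
The idea of dereferencing a set of IDREFs can be sketched in Python as a lookup in an index from ID values to elements. This is only an analogy: xml.etree.ElementTree does not read DTDs, so the sketch has to assume which attributes are IDs (here account-number and customer-id). The sample data and both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which owners is an IDREFS attribute.
BANK_2 = """
<bank-2>
  <account account-number="A-401" owners="C100 C102"><balance>500</balance></account>
  <customer customer-id="C100"><customer-name>Joe</customer-name></customer>
  <customer customer-id="C102"><customer-name>Mary</customer-name></customer>
</bank-2>
"""

def build_id_index(root, id_attrs=("account-number", "customer-id")):
    """Map each ID value to its element (the ID attribute names are assumed)."""
    index = {}
    for elem in root.iter():
        for attr in id_attrs:
            if attr in elem.attrib:
                index[elem.get(attr)] = elem
    return index

def dereference(index, idrefs):
    """Resolve a whitespace-separated IDREFS value to a list of elements."""
    return [index[ref] for ref in idrefs.split()]

root = ET.fromstring(BANK_2)
index = build_id_index(root)

# Text content of an element, comparable to $c/text().
balance_text = root.find("account/balance").text

# Dereference the set of IDREFs held in the owners attribute.
owners = dereference(index, root.find("account").get("owners"))
owner_names = [o.findtext("customer-name") for o in owners]
print(balance_text, owner_names)
```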

Sorting in XQuery

A sortby clause can be used at the end of any expression. E.g. to return customers sorted by name

  for $c in /bank/customer
  return <customer> $c/* </customer> sortby(name)

Can sort at multiple levels of nesting (sort by customer-name, and by account-number within each customer)

  <bank-1>
    for $c in /bank/customer
    return
      <customer>
        $c/*
        for $d in /bank/depositor[customer-name = $c/customer-name],
            $a in /bank/account[account-number = $d/account-number]
        return <account> $a/* </account> sortby(account-number)
      </customer> sortby(customer-name)
  </bank-1>
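
Sorting at two levels of nesting can be sketched in Python as follows. The example starts from an already nested bank-1 document (made-up data) and reorders it in place with the standard xml.etree.ElementTree module; sort_nested is a hypothetical helper name, and the keys are compared as plain strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested document, deliberately out of order.
BANK_1 = """
<bank-1>
  <customer><customer-name>Johnson</customer-name>
    <account><account-number>A-201</account-number></account>
    <account><account-number>A-101</account-number></account>
  </customer>
  <customer><customer-name>Hayes</customer-name>
    <account><account-number>A-102</account-number></account>
  </customer>
</bank-1>
"""

def reorder(parent, tag, key):
    """Move the children named `tag` to the end of `parent`, sorted by `key`."""
    children = sorted(parent.findall(tag), key=key)
    for child in children:
        parent.remove(child)
    parent.extend(children)

def sort_nested(bank_1):
    # Inner level: sortby(account-number) within each customer.
    for customer in bank_1.findall("customer"):
        reorder(customer, "account", lambda a: a.findtext("account-number"))
    # Outer level: sortby(customer-name).
    reorder(bank_1, "customer", lambda c: c.findtext("customer-name"))
    return bank_1

sorted_bank = sort_nested(ET.fromstring(BANK_1))
customer_names = [c.findtext("customer-name") for c in sorted_bank]
johnson_accounts = [a.findtext("account-number")
                    for a in sorted_bank.findall("customer")[1].findall("account")]
print(customer_names, johnson_accounts)
```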

Application Program Interface

There are two standard application program interfaces to XML data:
  SAX (Simple API for XML)
    Based on parser model, user provides event handlers for parsing events
      E.g. start of element, end of element
    Not suitable for database applications
  DOM (Document Object Model)
    XML data is parsed into a tree representation
    Variety of functions provided for traversing the DOM tree
    E.g.: Java DOM API provides Node class with methods
      getParentNode( ), getFirstChild( ), getNextSibling( )
      getAttribute( ), getData( ) (for text node)
      getElementsByTagName( ), …
    Also provides functions for updating DOM tree
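
The contrast between the two interfaces can be seen in a short Python sketch that uses the standard library's xml.sax and xml.dom.minidom modules. The slide names the Java DOM methods; Python's minidom exposes the same navigation as properties (parentNode, firstChild, nextSibling) instead of getter methods. The sample document and the EventRecorder class are made up for this example.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document (no whitespace between elements).
DOC = ("<bank>"
       "<account><account-number>A-101</account-number></account>"
       "<account><account-number>A-102</account-number></account>"
       "</bank>")

class EventRecorder(xml.sax.ContentHandler):
    """SAX style: the parser calls these handlers as it meets each event."""
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

handler = EventRecorder()
xml.sax.parseString(DOC.encode("utf-8"), handler)

# DOM style: the whole document becomes a tree that can be traversed.
dom = minidom.parseString(DOC)
first_number = dom.getElementsByTagName("account-number")[0]
text_node = first_number.firstChild                 # text node holding "A-101"
next_account = first_number.parentNode.nextSibling  # the second <account>

print(handler.events[:3], text_node.data, next_account.tagName)
```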

Storage of XML Data

XML data can be stored in
  Non-relational data stores
    Flat files
      Natural for storing XML
      But has all problems discussed in Chapter 1 (no concurrency, no recovery, …)
    XML database
      Database built specifically for storing XML data, supporting DOM model and declarative querying
      Currently no commercial-grade systems
  Relational databases
    Data must be translated into relational form
    Advantage: mature database systems
    Disadvantages: overhead of translating data and queries

Storage of XML in Relational Databases

Alternatives:
  String Representation
  Tree Representation
  Map to relations

String Representation

Store each top-level element as a string field of a tuple in a relational database
  Use a single relation to store all elements, or
  Use a separate relation for each top-level element type
    E.g. account, customer, depositor relations
      Each with a string-valued attribute to store the element
Indexing:
  Store values of subelements/attributes to be indexed as extra fields of the relation, and build indices on these fields
    E.g. customer-name or account-number
  Oracle 9 supports function indices, which use the result of a function as the key value.
    The function should return the value of the required subelement/attribute
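
A minimal sketch of the string representation, using Python's standard sqlite3 and xml.etree.ElementTree modules: each account element is stored as a string, and account-number is copied into an extra indexed field. It demonstrates the extra-field approach only, not Oracle function indices. The table layout, column names, sample data, and lookup_balance helper are all assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data.
BANK = """
<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
</bank>
"""

conn = sqlite3.connect(":memory:")
# One relation per top-level element type: the element itself is a string,
# plus an extra field holding the subelement value to be indexed.
conn.execute("CREATE TABLE account (account_number TEXT, element TEXT)")
conn.execute("CREATE INDEX account_number_idx ON account (account_number)")

for a in ET.fromstring(BANK).findall("account"):
    conn.execute(
        "INSERT INTO account VALUES (?, ?)",
        (a.findtext("account-number"), ET.tostring(a, encoding="unicode")),
    )

def lookup_balance(conn, account_number):
    """Find the element via the indexed field, then parse the stored string."""
    row = conn.execute(
        "SELECT element FROM account WHERE account_number = ?",
        (account_number,),
    ).fetchone()
    # Values that are not stored as extra fields require parsing the string.
    return ET.fromstring(row[0]).findtext("balance")

print(lookup_balance(conn, "A-102"))
```

The lookup itself uses the index, but reading balance still requires parsing the stored string.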

String Representation (Cont.)

Benefits:
  Can store any XML data even without DTD
  As long as there are many top-level elements in a document, strings are small compared to full document
    Allows fast access to individual elements.
Drawback: Need to parse strings to access values inside the elements
  Parsing is slow.

Tree Representation

Tree representation: model XML data as tree and store using relations
    nodes(id, type, label, value)
    child(child-id, parent-id)
  Each element/attribute is given a unique identifier
  Type indicates element/attribute
  Label specifies the tag name of the element/name of attribute
  Value is the text value of the element/attribute
  The relation child notes the parent-child relationships in the tree
    Can add an extra attribute to child to record ordering of children

[Figure: example tree with nodes bank (id: 1), customer (id: 2), account (id: 5), customer-name (id: 3), account-number (id: 7)]
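
The nodes and child relations can be tried out with a small Python sketch built on the standard sqlite3 and xml.etree.ElementTree modules. Column names use underscores instead of hyphens, the position column is the optional ordering attribute mentioned above, and the sample document and the shred function are made up for illustration. The identifiers it assigns will not match those in the figure.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = ("<bank>"
       "<customer><customer-name>Joe</customer-name></customer>"
       "<account><account-number>A-101</account-number></account>"
       "</bank>")

def shred(conn, xml_text):
    """Store every element and attribute as a row of nodes, with links in child."""
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, label TEXT, value TEXT)")
    conn.execute("CREATE TABLE child (child_id INTEGER, parent_id INTEGER, position INTEGER)")
    counter = [0]

    def new_node(kind, label, value, parent_id, position):
        counter[0] += 1
        node_id = counter[0]
        conn.execute("INSERT INTO nodes VALUES (?, ?, ?, ?)", (node_id, kind, label, value))
        if parent_id is not None:
            conn.execute("INSERT INTO child VALUES (?, ?, ?)", (node_id, parent_id, position))
        return node_id

    def store(elem, parent_id, position):
        text = (elem.text or "").strip() or None
        node_id = new_node("element", elem.tag, text, parent_id, position)
        position = 0
        for name, value in elem.attrib.items():
            new_node("attribute", name, value, node_id, position)
            position += 1
        for sub in elem:
            store(sub, node_id, position)
            position += 1

    store(ET.fromstring(xml_text), None, 0)

conn = sqlite3.connect(":memory:")
shred(conn, DOC)

# Even "account numbers of all accounts" needs joins across nodes and child.
ACCOUNT_NUMBERS = """
SELECT num.value
FROM nodes AS acct
JOIN child AS c ON c.parent_id = acct.id
JOIN nodes AS num ON num.id = c.child_id
WHERE acct.label = 'account' AND num.label = 'account-number'
"""
account_numbers = [row[0] for row in conn.execute(ACCOUNT_NUMBERS)]
print(account_numbers)
```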

Tree Representation (Cont.)

Benefit: Can store any XML data, even without DTD
Drawbacks:
  Data is broken up into too many pieces, increasing space overheads
  Even simple queries require a large number of joins, which can be slow

Mapping XML Data to Relations

Map to relations
  If DTD of document is known, can map data to relations
  A relation is created for each element type
    Elements (of type #PCDATA), and attributes are mapped to attributes of relations
    More details on next slide …
Benefits:
  Efficient storage
  Can translate XML queries into SQL, execute efficiently, and then translate SQL results back to XML
Drawbacks: need to know DTD, translation overheads still present

Mapping XML Data to Relations (Cont.)

Relation created for each element type contains
  An id attribute to store a unique id for each element
  A relation attribute corresponding to each element attribute
  A parent-id attribute to keep track of parent element
    As in the tree representation
    Position information (i-th child) can be stored too
All subelements that occur only once can become relation attributes
  For text-valued subelements, store the text as attribute value
  For complex subelements, can store the id of the subelement
Subelements that can occur multiple times are represented in a separate table
  Similar to handling of multivalued attributes when converting ER diagrams to tables

Mapping XML Data to Relations (Cont.)

E.g. For bank-1 DTD with account elements nested within customer elements, create relations
  customer(id, parent-id, customer-name, customer-street, customer-city)
    parent-id can be dropped here since parent is the sole root element
    All other attributes were subelements of type #PCDATA, and occur only once
  account(id, parent-id, account-number, branch-name, balance)
    parent-id keeps track of which customer an account occurs under
    Same account may be represented many times with different parents
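
The mapping can be sketched with Python's standard sqlite3 and xml.etree.ElementTree modules. The code below creates the customer and account relations (with parent-id dropped from customer, as noted above), loads a small bank-1 document, and rebuilds the nesting with a join. Column names use underscores instead of hyphens, and the sample data is made up. Account A-101 is deliberately listed under two customers, so it appears twice with different parents.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical bank-1 document: accounts nested within customers.
BANK_1 = """
<bank-1>
  <customer>
    <customer-name>Johnson</customer-name>
    <customer-street>Alma</customer-street>
    <customer-city>Palo Alto</customer-city>
    <account><account-number>A-101</account-number><branch-name>Downtown</branch-name><balance>500</balance></account>
    <account><account-number>A-201</account-number><branch-name>Brighton</branch-name><balance>900</balance></account>
  </customer>
  <customer>
    <customer-name>Hayes</customer-name>
    <customer-street>Main</customer-street>
    <customer-city>Harrison</customer-city>
    <account><account-number>A-101</account-number><branch-name>Downtown</branch-name><balance>500</balance></account>
  </customer>
</bank-1>
"""

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer (
    id INTEGER PRIMARY KEY, customer_name TEXT,
    customer_street TEXT, customer_city TEXT)""")
conn.execute("""CREATE TABLE account (
    id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES customer(id),
    account_number TEXT, branch_name TEXT, balance INTEGER)""")

next_id = 0
for c in ET.fromstring(BANK_1).findall("customer"):
    next_id += 1
    customer_id = next_id
    conn.execute("INSERT INTO customer VALUES (?, ?, ?, ?)",
                 (customer_id, c.findtext("customer-name"),
                  c.findtext("customer-street"), c.findtext("customer-city")))
    for a in c.findall("account"):
        next_id += 1
        # parent_id records which customer this account occurs under.
        conn.execute("INSERT INTO account VALUES (?, ?, ?, ?, ?)",
                     (next_id, customer_id, a.findtext("account-number"),
                      a.findtext("branch-name"), int(a.findtext("balance"))))

rows = conn.execute("""
    SELECT c.customer_name, a.account_number
    FROM customer AS c JOIN account AS a ON a.parent_id = c.id
    ORDER BY c.customer_name, a.account_number""").fetchall()
print(rows)
```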