Posted on 26-Mar-2015
Worzyk, FH Anhalt
Telemedizin WS 09/10: XML
XML
Extensible Markup Language
XML
• Metalanguage
– A language that describes languages
– These languages describe formats for data exchange
Example
Hans Meyer
Lohmannstrasse 23
06366 Köthen
Dr. Else Müller
Bernburger Strasse 56
06366 Köthen
Example
<Patient>
  <Name>Hans Meyer</Name>
  <Strasse>Lohmannstrasse 23</Strasse>
  <Ort>06366 Köthen</Ort>
</Patient>
<Arzt>
  <Name>Dr. Else Müller</Name>
  <Strasse>Bernburger Strasse 56</Strasse>
  <Ort>06366 Köthen</Ort>
</Arzt>
(Arzt = physician, Strasse = street, Ort = town)
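The benefit of the tags shows up as soon as a program reads the data: it can ask for "the name of the patient" instead of guessing where a line break falls. A minimal sketch using Python's standard `xml.etree.ElementTree`. The two records are wrapped in a `<Personen>` root element (the name used in the DTD examples that follow), because an XML document needs exactly one root.

```python
import xml.etree.ElementTree as ET

# The tagged example, wrapped in a single <Personen> root element.
DOC = """<Personen>
  <Patient>
    <Name>Hans Meyer</Name>
    <Strasse>Lohmannstrasse 23</Strasse>
    <Ort>06366 Köthen</Ort>
  </Patient>
  <Arzt>
    <Name>Dr. Else Müller</Name>
    <Strasse>Bernburger Strasse 56</Strasse>
    <Ort>06366 Köthen</Ort>
  </Arzt>
</Personen>"""

root = ET.fromstring(DOC)

# The tags say which string is a name and whose name it is.
patient_name = root.find("Patient/Name").text
doctor_name = root.find("Arzt/Name").text
print(patient_name)  # Hans Meyer
print(doctor_name)   # Dr. Else Müller
```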
Structure of XML documents
• Prolog
– Declaration of the document type
– DTD (Document Type Definition)
• Elements
http://www.w3schools.com/xml/default.asp
http://de.selfhtml.org/
Document Type Definition (DTD)
• It describes the grammar of an XML document
• It describes the permitted elements and attributes
– their data types and ranges of values
– their nesting
• An XML document that conforms to its DTD is called valid
Example DTD
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Personen [
  <!ELEMENT Personen (Patient)>
  <!ELEMENT Patient (#PCDATA)>
]>
<Personen>
  <Patient>
    Hans Meyer Lohmannstrasse 23 06366 Köthen
  </Patient>
</Personen>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten1.xml
Structure of XML documents
• The DTD describes the characteristics of the elements
• An element is opened by a start tag <Elementname> and closed by an end tag </Elementname>
• XML tags are case sensitive
• Elements can contain other elements
• #PCDATA (parsed character data): the element contains character strings whose characters belong to the declared character set
Names of Elements
• Names can contain letters, numbers, and other characters
• Names must not start with a number or punctuation character
• Names must not start with the letters xml (or XML, Xml, ...)
• Names cannot contain spaces
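The naming rules can be turned into a small checker. This is a simplified sketch: the allowed characters (letters, digits, `_`, `.` and `-`) are an assumption standing in for "other characters", and the full XML 1.0 name rules are more permissive (they allow, for example, a leading underscore). Strictly speaking, names beginning with `xml` are reserved rather than forbidden. The function name is hypothetical.

```python
import re

# Assumed reading of the rules: the first character is a letter (Unicode
# letters included), followed by letters, digits, "_", "." or "-".
# No spaces anywhere.
_NAME = re.compile(r"[^\W\d_][\w.\-]*")

def is_valid_element_name(name: str) -> bool:
    """Check a proposed element name against the four rules above."""
    if not _NAME.fullmatch(name):        # bad first character or a space
        return False
    if name.lower().startswith("xml"):   # xml, XML, Xml, ... are reserved
        return False
    return True

for candidate in ["Patient", "Straße", "2ndName", "first name", "XmlData"]:
    print(candidate, is_valid_element_name(candidate))
```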
Sequence of Elements
In the declaration, child elements are separated by commas and enclosed in parentheses; they must appear in the document in this order.
Example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Personen [
  <!ELEMENT Personen (Patient,Arzt)>
  <!ELEMENT Patient (Name,Adresse)>
  <!ELEMENT Arzt (Name,Adresse)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Adresse (#PCDATA)>
]>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten2.xml
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten3.xml
Selection list
• Selection of exactly one element: the alternatives are separated by |
• Example:
<!DOCTYPE Personen [
  <!ELEMENT Personen (Patient|Arzt)>
  <!ELEMENT Patient (Name,Adresse,Diagnose)>
  <!ELEMENT Arzt (Name,Adresse,Fachgebiet)>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten4.xml
Multiple occurrence
* The element can appear zero or more times
+ The element must appear at least once and can be repeated
? The element is optional: it can appear zero times or once
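Sequence (`,`), choice (`|`) and the occurrence indicators `*`, `+` and `?` behave like the operators of a regular expression over the names of the child elements. The sketch below exploits that by translating a content model into a Python regular expression. It is an illustration of the idea, not a validating parser: `#PCDATA`, mixed content, `EMPTY` and `ANY` are not handled, and the function names are hypothetical.

```python
import re

_NAME = re.compile(r"[A-Za-z_][\w.\-]*")

def model_to_regex(model: str):
    """Translate a content model such as "(Patient|Arzt)" into a regular
    expression over child names, each name followed by one space."""
    model = re.sub(r"\s+", "", model)   # "(Name, Adresse)" -> "(Name,Adresse)"
    # Every element name becomes a group matching "name ".
    pattern = _NAME.sub(lambda m: "(?:" + re.escape(m.group()) + " )", model)
    pattern = pattern.replace(",", "")  # a sequence is plain concatenation
    return re.compile(pattern)          # "|", "*", "+", "?" keep their meaning

def children_match(model: str, children: list) -> bool:
    """True if the list of child element names fits the content model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None

print(children_match("(Patient,Arzt)", ["Arzt", "Patient"]))   # False: wrong order
print(children_match("(Patient|Arzt)", ["Arzt"]))              # True: one alternative
print(children_match("(Physician, Patient+)", ["Physician"]))  # False: needs a Patient
print(children_match("(Address, Diagnosis*)", ["Address"]))    # True: zero diagnoses
```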
Attributes
<!ATTLIST element-name attribute-name attribute-type default-value>
Types of attributes: CDATA, (en1|en2|..), ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, NOTATION, xml:
Default value: value, #REQUIRED, #IMPLIED, #FIXED value
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten5.xml
http://www.w3schools.com/xml/xml_attributes.asp
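What the default-value keywords mean for a parser can be sketched in a few lines. The dictionary representation of an `<!ATTLIST>` declaration is a hypothetical format, and so are the attributes `id`, `version` and `note`; only `Salutation` with the default `"Ms"` mirrors the declaration for `Name` in the billing example. Attribute types (CDATA, ID, enumerations, ...) are not checked here.

```python
def apply_attlist(attrs: dict, decl: dict) -> dict:
    """Return the attributes a validating parser would report for one element.

    decl maps attribute name -> (kind, value), where kind is "#REQUIRED",
    "#IMPLIED", "#FIXED" or "default" (a plain default value).
    """
    result = dict(attrs)
    for name, (kind, value) in decl.items():
        present = name in attrs
        if kind == "#REQUIRED" and not present:
            raise ValueError(f"missing required attribute {name!r}")
        if kind == "#FIXED":
            if present and attrs[name] != value:
                raise ValueError(f"{name!r} must have the fixed value {value!r}")
            result[name] = value          # a fixed value is always supplied
        elif kind == "default" and not present:
            result[name] = value          # the default fills the gap
        # "#IMPLIED": optional; nothing is supplied if it is absent
    for name in attrs:
        if name not in decl:
            raise ValueError(f"undeclared attribute {name!r}")
    return result

# Hypothetical declaration; only Salutation comes from the billing DTD.
decl = {
    "Salutation": ("default", "Ms"),
    "id": ("#REQUIRED", None),
    "version": ("#FIXED", "1.0"),
    "note": ("#IMPLIED", None),
}
print(apply_attlist({"id": "p1"}, decl))
# {'id': 'p1', 'Salutation': 'Ms', 'version': '1.0'}
```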
Comments
Comments are enclosed in <!-- and -->
<!-- This is a comment -->
Well-formed XML file
• The file should start with the XML declaration, which identifies it as XML (recommended, though optional in XML 1.0)
• There is exactly one root element, which contains all other elements
• Every start tag has a matching end tag (tags are case sensitive)
• The elements are properly nested
• Whether all required attributes are present and all elements have the right content is a question of validity, not of well-formedness
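Two of these rules, proper nesting and a single root element, can be checked with nothing more than a stack of open tags. The following is a deliberately rough sketch: it assumes the text contains no DOCTYPE, comments, CDATA sections or `>` inside attribute values, and it ignores all other well-formedness rules. A real parser does much more; the function name is hypothetical.

```python
import re

# Start tags, end tags and empty-element tags (see the assumptions above).
TAG = re.compile(r"<(/?)([^\s/>]+)[^>]*?(/?)>")

def check_nesting(text: str) -> bool:
    """Rough check: tags are properly nested (case-sensitively) and there
    is exactly one root element."""
    body = re.sub(r"<\?.*?\?>", "", text, flags=re.S)   # drop <?xml ...?>
    stack, roots = [], 0
    for m in TAG.finditer(body):
        closing, name, empty = m.groups()
        if closing:
            # An end tag must close the most recently opened element.
            if not stack or stack.pop() != name:
                return False
        else:
            if not stack:
                roots += 1           # an element at the top level
            if not empty:
                stack.append(name)   # wait for its end tag
    return not stack and roots == 1

print(check_nesting("<Personen><Patient></Patient></Personen>"))  # True
print(check_nesting("<Personen><Patient></Personen></Patient>"))  # False: overlapping
print(check_nesting("<Patient></Patient><Arzt></Arzt>"))          # False: two roots
print(check_nesting("<Name></name>"))                             # False: case differs
```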
Valid XML file
• The file is well-formed
• A DTD is assigned to the file
• The content of the file conforms to the assigned DTD
Parser
A validating parser checks whether an XML document is valid:
<html>
<body>
<script type="text/javascript">
// ActiveXObject is available only in Internet Explorer
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;           // load synchronously
xmlDoc.validateOnParse = true;  // check the document against its DTD
xmlDoc.load("Patienten5.xml");
document.write("<br />Error Code: ");
document.write(xmlDoc.parseError.errorCode);
document.write("<br />Error Reason: ");
document.write(xmlDoc.parseError.reason);
document.write("<br />Error Line: ");
document.write(xmlDoc.parseError.line);
</script>
</body>
</html>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Parser.htm
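The ActiveX object above runs only in Internet Explorer. For comparison, a sketch of a similar error report in Python. Note the difference: the expat-based parsers of Python's standard library check only well-formedness and have no equivalent of `validateOnParse`; validating against a DTD needs a third-party library such as lxml (`lxml.etree.DTD`). The function name `parse_report` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

def parse_report(xml_text: str) -> tuple:
    """Return (error code, reason, line) in the spirit of parseError above;
    (0, "", 0) means the text was parsed without error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, _column = err.position          # position is (line, column)
        return err.code, expat.ErrorString(err.code), line
    return 0, "", 0

# The end tag on line 2 differs in case from its start tag.
print(parse_report("<Personen>\n<Patient></patient>\n</Personen>"))
```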
DTD - Disadvantages
• Few data types
• The specification is not written in XML syntax
– so a DTD cannot be checked with an ordinary XML parser
XML Schema
• An XML Schema defines
– the elements that can appear in a document
– the attributes that can appear in a document
– which elements are child elements
– the order of child elements
– the number of child elements
– whether an element is empty or can include text
– data types for elements and attributes
– default and fixed values for elements and attributes
http://www.w3schools.com/schema/schema_intro.asp
XML Schema: advantages over DTDs
• XML Schemas are extensible to future additions
• XML Schemas are richer and more useful than DTDs
• XML Schemas are written in XML
• XML Schemas support data types
– xs:date, xs:dateTime, xs:string
• XML Schemas support namespaces
– xmlns:xs="http://www.w3.org/2001/XMLSchema"
Dublin Core Standard
Dublin Core Metadata Initiative
A conference held in 1995 in Dublin, Ohio defined a set of descriptive attributes for categorizing documents on the Internet.
15 core elements are recommended in "Dublin Core Metadata Element Set, Version 1.1" (ISO 15836).
http://dublincore.org/documents/dces/
How to create an XML structure
• Create a tree structure of the data
• Convert that structure to a DTD
• Add data elements
• Test
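The step "convert that structure to a DTD" is mechanical once the tree records, for every node, whether its children form a sequence or a choice and how often each child may occur. A sketch under that assumption, with a hypothetical dictionary format for the tree. It uses the quarterly billing tree from the following example and emits element declarations only; the `Salutation` attribute would still have to be added by hand.

```python
# Hypothetical tree format: element -> (connector, [(child, indicator), ...])
# connector: "," = sequence, "|" = choice; indicator: "", "?", "*" or "+".
# None marks a leaf that holds text (#PCDATA).
TREE = {
    "Billing": (",", [("Physician", ""), ("Patient", "+")]),
    "Physician": ("|", [("General_Practitioner", ""), ("Dentist", "")]),
    "General_Practitioner": (",", [("Address", ""), ("Profession", "?")]),
    "Dentist": (",", [("Address", "")]),
    "Patient": (",", [("Address", ""), ("Diagnosis", "*")]),
    "Address": (",", [("Name", ""), ("City", ""), ("Street", "")]),
    "Profession": None,
    "Diagnosis": None,
    "Name": None,
    "City": None,
    "Street": None,
}

def to_dtd(root: str, tree: dict) -> str:
    """Turn the tree description into an internal DTD subset (elements only)."""
    lines = [f"<!DOCTYPE {root} ["]
    for name, model in tree.items():
        if model is None:
            content = "(#PCDATA)"
        else:
            connector, children = model
            separator = ", " if connector == "," else " | "
            content = "(" + separator.join(c + ind for c, ind in children) + ")"
        lines.append(f"  <!ELEMENT {name} {content}>")
    lines.append("]>")
    return "\n".join(lines)

print(to_dtd("Billing", TREE))
```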
Example: Quarterly billing
• One file contains exactly one physician and at least one patient
• A physician is either a general practitioner or a dentist
• A general practitioner has an address and optionally a profession
• A dentist has an address
• A patient has an address and zero or more diagnoses
• An address consists of name, city, street
• A name has a salutation Mr. or Ms.
Example: Quarterly billing
Billing
  Physician
    General Practitioner | Dentist
      General Practitioner: Address, Profession ?
      Dentist: Address
  Patient +
    Address, Diagnosis *
  Address: Name, City, Street
  Name: Mr | Ms
Example - DTD
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Billing [
  <!ELEMENT Billing (Physician, Patient+)>
  <!ELEMENT Physician (General_Practitioner | Dentist)>
  <!ELEMENT General_Practitioner (Address, Profession?)>
  <!ELEMENT Dentist (Address)>
  <!ELEMENT Patient (Address, Diagnosis*)>
  <!ELEMENT Address (Name, City, Street)>
  <!ELEMENT Profession (#PCDATA)>
  <!ELEMENT Diagnosis (#PCDATA)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT City (#PCDATA)>
  <!ELEMENT Street (#PCDATA)>
  <!ATTLIST Name Salutation (Mr|Ms) "Ms">
]>
Example - Data
<Billing>
  <Physician>
    <General_Practitioner>
      <Address>
        <Name>Dr. Erpel</Name>
        <City>Entenhausen</City>
        <Street>Am Krankenhaus 1</Street>
      </Address>
      <Profession>Geriatrics</Profession>
    </General_Practitioner>
  </Physician>
  <Patient>
    <Address>
      <Name Salutation="Mr">Daniel</Name>
      <City>Entenhausen</City>
      <Street>Bahnhofstrasse 3a</Street>
    </Address>
    <Diagnosis>Insomnia</Diagnosis>
  </Patient>
  <Patient>
    <Address>
      <Name>Daisy</Name>
      <City>Entenhausen</City>
      <Street>Am Stadtpark</Street>
    </Address>
    <Diagnosis>Sunburn</Diagnosis>
    <Diagnosis>Migraine</Diagnosis>
  </Patient>
</Billing>
Queries on XML files
• XPath
• XQuery
XPath
The XPath language is used to address parts of an XML document.
It was designed for use in both XSLT and XPointer.
XPath models an XML document as a tree of nodes.
http://www.informatik.hu-berlin.de/~obecker/obqo/w3c-trans/xpath-de-20010702/
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book category="COOKING"> <title lang="en">Everyday Italian</title> <author>Giada De Laurentiis</author> <year>2005</year> <price>30.00</price></book>
<book category="CHILDREN"> <title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price></book>
<book category="WEB"> <title lang="en">XQuery Kick Start</title> <author>James McGovern</author> <author>Per Bothner</author> <author>Kurt Cagle</author> <author>James Linn</author> <author>Vaidyanathan Nagarajan</author> <year>2003</year> <price>49.99</price></book>
<book category="WEB"> <title lang="en">Learning XML</title> <author>Erik T. Ray</author> <year>2003</year> <price>39.95</price></book>
</bookstore>
Queries with XPath
Select all titles: /bookstore/book/title
Select the title of the first book: /bookstore/book[1]/title
Select the text of all prices: /bookstore/book/price/text()
Select the titles of books with a price above 35: /bookstore/book[price>35]/title
http://www.w3schools.com/xpath/xpath_examples.asp
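Python's `xml.etree.ElementTree` supports a subset of XPath, enough for most of these queries. A sketch with the bookstore data trimmed to category, title and price. Paths are written relative to the `<bookstore>` root element, and the numeric predicate `price>35` lies outside ElementTree's subset, so that comparison is done in Python.

```python
import xml.etree.ElementTree as ET

# The bookstore example, trimmed to category, title and price.
BOOKSTORE = """<bookstore>
<book category="COOKING"><title lang="en">Everyday Italian</title><price>30.00</price></book>
<book category="CHILDREN"><title lang="en">Harry Potter</title><price>29.99</price></book>
<book category="WEB"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
<book category="WEB"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)   # root is the <bookstore> element itself

# /bookstore/book/title
titles = [t.text for t in root.findall("book/title")]

# /bookstore/book[1]/title  (positions count from 1, as in XPath)
first_title = root.find("book[1]/title").text

# /bookstore/book/price/text()
prices = [p.text for p in root.findall("book/price")]

# /bookstore/book[price>35]/title: the comparison is evaluated in Python
expensive = [b.findtext("title") for b in root.findall("book")
             if float(b.findtext("price")) > 35]

print(expensive)  # ['XQuery Kick Start', 'Learning XML']
```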
XQuery
• Query language for XML data
• Uses XPath expressions
• Plays a role similar to SQL
XQuery Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<bib>
<book year="1994">
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price>65.95</price>
</book>
<book year="1992">
<title>Advanced Programming in the Unix environment</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price>65.95</price>
</book>
<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann Publishers</publisher>
<price>39.95</price>
</book>
<book year="1999">
<title>The Technology and Content for Digital TV</title>
<editor>
<last>Gerbarg</last><first>Darcy</first>
<affiliation>CITI</affiliation>
</editor>
<publisher>Kluwer Academic Publishers</publisher>
<price>129.95</price>
</book>
</bib>
XQuery Example
Query:
doc("books.xml")/bib/book[price<50]
results:
<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann Publishers</publisher>
<price>39.95</price>
</book>
FLWOR
For, Let, Where, Order by, Return
for $x in doc("books.xml")/bib/book
where $x/price>50
order by $x/title
return $x/title
Results:
<title>Advanced Programming in the Unix environment</title>
<title>TCP/IP Illustrated</title>
<title>The Technology and Content for Digital TV</title>
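To make the semantics of the path query and of the FLWOR clauses concrete, here is a sketch of both queries in plain Python over `xml.etree.ElementTree`. It only illustrates what each clause does and is not an XQuery engine. The book data is trimmed to year, title and price; titles are sorted by code point, which yields the order shown above.

```python
import xml.etree.ElementTree as ET

# The bib example, trimmed to year, title and price.
BIB = """<bib>
<book year="1994"><title>TCP/IP Illustrated</title><price>65.95</price></book>
<book year="1992"><title>Advanced Programming in the Unix environment</title><price>65.95</price></book>
<book year="2000"><title>Data on the Web</title><price>39.95</price></book>
<book year="1999"><title>The Technology and Content for Digital TV</title><price>129.95</price></book>
</bib>"""

books = ET.fromstring(BIB).findall("book")

def price(book):
    # XQuery compares untyped element content with a number as a double
    return float(book.findtext("price"))

# doc("books.xml")/bib/book[price<50]
cheap = [b for b in books if price(b) < 50]

# for $x in doc("books.xml")/bib/book   -> iterate over the books
# where $x/price>50                     -> keep the expensive ones
# order by $x/title                     -> sort by the title string
# return $x/title                       -> project the title
selected = [b for b in books if price(b) > 50]
ordered = sorted(selected, key=lambda b: b.findtext("title"))
result = [b.findtext("title") for b in ordered]
print(result)
```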
XML documents in databases
XML documents can be
• focused on data
• focused on text
• semi-structured
Alternatives for storing XML documents
• Storage as a whole
• Storage within the XML structure
• Transformation to the structures of the database
Storage of XML documents as a whole
The original document is stored in a file system or as a CLOB in a database
Full-text index, structure index
Example
<hotel url="http://www.hotel-huebner.de" id="h0001" erstellt-am="03/02/2003" Autor="Hans Müller">
  <hotelname>Hotel Hübner</hotelname>
  <kategorie>4</kategorie>
  <adresse>
    <plz>18119</plz>
    <ort>Warnemünde</ort>
    <strasse>Seestraße</strasse>
  </adresse>
  <telefon>0381 / 5434-0</telefon>
  <fax>0381 / 5434-444</fax>
  <anreisebeschreibung>Coming from Rostock ...</anreisebeschreibung>
</hotel>
(erstellt-am = created on, kategorie = category, plz = postal code, ort = town, strasse = street, anreisebeschreibung = directions)
Full-text index
Term         References
hotel        * * *
Warnemünde   *
Rostock      *
ort          * *
Full-text and structure index
Term         Reference  Element
Warnemünde   *          *
Seestrasse   *          *
Rostock      *          *
Element              Reference  Order  Predecessor
hotel                *          1
adresse              *          2      *
ort                  *          3      *
strasse              *          3      *
anreisebeschreibung  *          2      *
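A sketch of how the two index tables could be computed from the hotel document with `xml.etree.ElementTree`. Assumptions: element ids are numbered in document order starting at 1, and the "Order" column is read as the nesting level, which is what the values in the table (1, 2, 3, 3, 2) correspond to. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

HOTEL = """<hotel url="http://www.hotel-huebner.de" id="h0001">
  <hotelname>Hotel Hübner</hotelname>
  <kategorie>4</kategorie>
  <adresse>
    <plz>18119</plz>
    <ort>Warnemünde</ort>
    <strasse>Seestraße</strasse>
  </adresse>
  <telefon>0381 / 5434-0</telefon>
  <fax>0381 / 5434-444</fax>
  <anreisebeschreibung>Coming from Rostock ...</anreisebeschreibung>
</hotel>"""

def structure_index(root):
    """Rows (element id, element name, level, parent id), ids in document order."""
    rows = []
    def visit(elem, level, parent_id):
        elem_id = len(rows) + 1          # next id in document order
        rows.append((elem_id, elem.tag, level, parent_id))
        for child in elem:
            visit(child, level + 1, elem_id)
    visit(root, 1, None)
    return rows

def term_index(root):
    """Full-text plus structure index: lower-cased word -> ids of the
    elements whose text contains it (same ids as structure_index)."""
    index = {}
    for elem_id, elem in enumerate(root.iter(), start=1):
        for word in (elem.text or "").split():
            index.setdefault(word.lower(), []).append(elem_id)
    return index

root = ET.fromstring(HOTEL)
rows = structure_index(root)
terms = term_index(root)
print(terms["warnemünde"])  # [6], the id of the <ort> element
```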
Queries
Full-text index:
hotel AND warnemünde
(hotel OR pension) AND (rostock OR warnemünde)
Full-text and structure index:
hotel.adresse.ort CONTAINS ("warnemünde") AND
hotel.freizeitmoeglichkeit CONTAINS ("swimming pool")
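With an inverted index, AND and OR reduce to set intersection and set union over document ids. A sketch using a shortened copy of the hotel document and hypothetical helper names. The structured query restricts the search to the elements reached by a path. Because the example document has no `freizeitmoeglichkeit` (leisure facility) element, the second query evaluates to false.

```python
import xml.etree.ElementTree as ET

# Shortened hotel example (only the parts the queries touch).
HOTEL = """<hotel id="h0001">
  <hotelname>Hotel Hübner</hotelname>
  <adresse><plz>18119</plz><ort>Warnemünde</ort></adresse>
  <anreisebeschreibung>Coming from Rostock ...</anreisebeschreibung>
</hotel>"""

def build_index(docs):
    """Inverted index: lower-cased term (tag names and words) -> set of doc ids."""
    index = {}
    for doc_id, root in docs.items():
        for elem in root.iter():
            for term in [elem.tag] + (elem.text or "").split():
                index.setdefault(term.lower(), set()).add(doc_id)
    return index

def contains(root, path, phrase):
    """Structure-aware CONTAINS: search only the elements reached by path."""
    return any(phrase.lower() in (e.text or "").lower() for e in root.findall(path))

docs = {"h0001": ET.fromstring(HOTEL)}
index = build_index(docs)

def docs_with(term):
    return index.get(term, set())

# hotel AND warnemünde: AND is set intersection
q1 = docs_with("hotel") & docs_with("warnemünde")
# (hotel OR pension) AND (rostock OR warnemünde): OR is set union
q2 = (docs_with("hotel") | docs_with("pension")) & (docs_with("rostock") | docs_with("warnemünde"))
# hotel.adresse.ort CONTAINS ("warnemünde") AND
# hotel.freizeitmoeglichkeit CONTAINS ("swimming pool")
h = docs["h0001"]
q3 = contains(h, "adresse/ort", "warnemünde") and \
     contains(h, "freizeitmoeglichkeit", "swimming pool")
print(q1, q2, q3)
```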
Characteristics: full-text index
Description of schema: not required
Reconstruction of document: the document remains in its original form
Queries: information retrieval, SQL
Further characteristics: evaluation of the structure is possible
Use: document-centered applications
Generic storage: storage within the XML structure
All information of the XML document is stored
– simple generic storage
– Document Object Model
Example
DocID  Element name  ID   Predecessor  Order  Value
h0001  hotel         101               1
h0001  hotelname     102  101          1      Hotel Hübner
h0001  kategorie     103  101          2      4
h0001  adresse       104  101          3
h0001  plz           105  104          1      18119
h0001  ort           106  104          2      Warnemünde
...

DocID  Attribute name  ID   Element  Value
h0001  url             101  101      http://www.hotel-huebner.de
h0001  id              102  101      h0001
...
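The generic tables can be filled by a recursive walk over the document. A sketch with the standard `sqlite3` and `xml.etree.ElementTree` modules, using a shortened hotel document. The table and column names are assumptions modeled on the slide, and ids start at 101 as in the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Shortened hotel example: two attributes and the first few elements.
HOTEL = """<hotel url="http://www.hotel-huebner.de" id="h0001">
  <hotelname>Hotel Hübner</hotelname>
  <kategorie>4</kategorie>
  <adresse><plz>18119</plz><ort>Warnemünde</ort></adresse>
</hotel>"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE element   (doc_id TEXT, name TEXT, id INTEGER,
                        parent INTEGER, ord INTEGER, value TEXT);
CREATE TABLE attribute (doc_id TEXT, name TEXT, id INTEGER,
                        element INTEGER, value TEXT);
""")

def shred(doc_id, root, first_id=101):
    """Store every element and attribute of one document as a table row."""
    next_id = {"element": first_id, "attribute": first_id}

    def take(kind):
        value = next_id[kind]
        next_id[kind] += 1
        return value

    def store(elem, parent_id, position):
        elem_id = take("element")
        text = (elem.text or "").strip() or None   # whitespace-only -> NULL
        conn.execute("INSERT INTO element VALUES (?, ?, ?, ?, ?, ?)",
                     (doc_id, elem.tag, elem_id, parent_id, position, text))
        for name, value in elem.attrib.items():
            conn.execute("INSERT INTO attribute VALUES (?, ?, ?, ?, ?)",
                         (doc_id, name, take("attribute"), elem_id, value))
        for pos, child in enumerate(elem, start=1):
            store(child, elem_id, pos)

    store(root, None, 1)

shred("h0001", ET.fromstring(HOTEL))

# The path adresse/ort becomes a self-join on the parent column.
town = conn.execute("""
    SELECT e.value FROM element e JOIN element p
      ON e.parent = p.id AND e.doc_id = p.doc_id
    WHERE p.name = 'adresse' AND e.name = 'ort'""").fetchone()[0]
print(town)  # Warnemünde
```

Every further step in a path adds another self-join, which is one reason why reconstructing whole documents from this layout is possible but expensive.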
Document Object Model
The tree structure is mapped to a class hierarchy
Storage in object-relational or object-oriented databases
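Python's `xml.dom.minidom` shows the kind of class hierarchy the DOM introduces: the document, every element and every piece of text become objects of classes such as `Document`, `Element` and `Text`. A sketch that only walks these objects in memory; persisting them in an object-oriented or object-relational database is not shown, and the helper name `walk` is hypothetical.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<hotel id="h0001"><adresse><ort>Warnemünde</ort></adresse></hotel>')

def walk(node, depth=0, out=None):
    """Collect (depth, DOM class name, nodeName) for a node and its descendants."""
    if out is None:
        out = []
    out.append((depth, type(node).__name__, node.nodeName))
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out

for depth, cls, name in walk(doc):
    print("  " * depth + f"{cls} {name}")

hotel = doc.documentElement                                # an Element object
town = doc.getElementsByTagName("ort")[0].firstChild.data  # data of a Text object
```

Attributes do not appear in the walk because the DOM keeps them outside `childNodes`; they are read with `getAttribute`.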
Queries
• XPath
• XQuery
• XQL
– query language of Software AG
• SQL
Characteristics: generic storage
Description of schema: not required
Reconstruction of document: possible, but expensive
Queries: XQuery, XQL; the query language has to take the storage structures into account
Further characteristics: queries and updates possible with DOM
Use: for documents that are focused on data, focused on text, or semi-structured
Transformation to structures of databases
• A DTD or schema must be available
• Automatic or user-driven procedures
• Transformation to relational, object-relational or object-oriented databases
Transformation
XML information                  Database information
Element
  Root element                   Relation
  XML element                    Attribute of a relation
  Sequence of elements           Attributes of a relation
  Alternative of elements        Attributes of a relation
  Element with qualifier ?       Attribute, null value allowed
  Element with qualifier + or *  SET or LIST
  Complex structured element     ROW
Attribute
  XML attribute                  Attribute of a relation
  #IMPLIED                       Null value allowed
  #REQUIRED                      Null value not allowed
  Default value                  Default value
Example
Hotelname     url      id     erstellt-am  autor        kategorie  fax   anreisebeschreibung
Hotel Hübner  http://  h0001  03/02/2003   Hans Müller  4          0381  Coming from Rostock

id     plz    ort         strasse     nummer
h0001  18119  Warnemünde  Seestrasse  12

id     telefon          Ordnung
h0001  0381 / 5434 - 0  1
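A sketch of this schema-based mapping with `sqlite3`. Assumptions: `telefon` may occur several times (which the separate table with an order column suggests), so each occurrence gets its own row and a position; `adresse` is mapped to its own table as on the slide; the house number column `nummer` is left out because the XML document has no such element, and `anreisebeschreibung` is omitted for brevity. Table and column names follow the slide but are otherwise hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

HOTEL = """<hotel url="http://www.hotel-huebner.de" id="h0001">
  <hotelname>Hotel Hübner</hotelname>
  <kategorie>4</kategorie>
  <adresse><plz>18119</plz><ort>Warnemünde</ort><strasse>Seestraße</strasse></adresse>
  <telefon>0381 / 5434-0</telefon>
  <fax>0381 / 5434-444</fax>
</hotel>"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hotel   (id TEXT PRIMARY KEY, hotelname TEXT NOT NULL,
                      url TEXT, kategorie INTEGER, fax TEXT);
CREATE TABLE adresse (id TEXT REFERENCES hotel(id), plz TEXT, ort TEXT, strasse TEXT);
CREATE TABLE telefon (id TEXT REFERENCES hotel(id), telefon TEXT, ordnung INTEGER);
""")

def load(root):
    """Map one hotel document onto the three tables."""
    hid = root.get("id")
    conn.execute("INSERT INTO hotel VALUES (?, ?, ?, ?, ?)",
                 (hid, root.findtext("hotelname"), root.get("url"),
                  int(root.findtext("kategorie")), root.findtext("fax")))
    a = root.find("adresse")
    conn.execute("INSERT INTO adresse VALUES (?, ?, ?, ?)",
                 (hid, a.findtext("plz"), a.findtext("ort"), a.findtext("strasse")))
    # A repeatable element: one row per occurrence, order kept in a column.
    for pos, tel in enumerate(root.findall("telefon"), start=1):
        conn.execute("INSERT INTO telefon VALUES (?, ?, ?)", (hid, tel.text, pos))

load(ET.fromstring(HOTEL))

rows = conn.execute("""
    SELECT h.hotelname, a.ort, t.telefon
    FROM hotel h
    JOIN adresse a ON a.id = h.id
    JOIN telefon t ON t.id = h.id
    ORDER BY t.ordnung""").fetchall()
print(rows)
```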
Queries
• SQL with
– joins
– aggregate functions
– query optimization
– updates
Characteristics: structures of databases
Description of schema: required
Reconstruction of document: only partly possible
Queries: SQL and XML
Further characteristics: the order of elements is kept with additional attributes
Use: data-centered applications