XML Databases
XML: What and Why?
Application: Web Services
XQuery: The XML Query Language
Good News: XQuery, as a declarative language, is ideal for automatic parallel execution
Bad News: We still need Java, and we limit the "automatic parallel execution"
XQueryP scripting extension and the tradeoff
CS315B
<language type="computer language">
  <name>XML</name>
  <description>A universal format for structured documents and data.</description>
</language>
XML is designed to describe data and to focus on what data is.
HTML is designed to display data and to focus on how data looks.
Thus, HTML is about displaying information, while XML is about describing information.
CS315B – Domain-specific Languages for Parallelism
Can represent a wide variety of both structured and unstructured data
Can be used in integrating heterogeneous data sources (traditional/relational databases, data files, email messages, web pages, etc.)
Can be used on a variety of devices including PCs, PDAs, smart mobile phones, etc.
But mainly: "Helps companies to cut costs in information exchange"
Differences

| XML | Relational Data Model |
|---|---|
| Tree | Table |
| Data and schemas should not be correlated. Data can exist with or without schema, or with multiple schemas. | Schema first, then data |

Commonalities

Logical and Physical Data Independence
Declarative Semantics
A Web Service (WS) is a class on the Web. Like an RPC, it is identified by a URI (e.g. http://my.service:234), accepts an XML envelope as its argument, and returns an XML response.
[Figure: three clients, each exchanging XML with a Web Service that sits in front of the server application logic]
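The request/response exchange described above can be sketched without any network code. The following is a minimal illustration only: the element names (`envelope`, `body`, `response`), the `lookupTitle` operation, and the toy data are all assumptions made for the example, not part of any Web Service standard or of the original slides.

```python
import xml.etree.ElementTree as ET

# URI from the example in the text; it is not contacted here.
SERVICE_URI = "http://my.service:234"


def build_envelope(operation, **params):
    """Serialize a request as an XML envelope (element names are hypothetical)."""
    envelope = ET.Element("envelope")
    body = ET.SubElement(envelope, "body")
    call = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(call, name).text = value
    return ET.tostring(envelope, encoding="unicode")


def handle_request(envelope_xml):
    """Stand-in for the server side: parse the envelope, return an XML response."""
    call = ET.fromstring(envelope_xml).find("body")[0]
    response = ET.Element("response", {"operation": call.tag})
    if call.tag == "lookupTitle":
        titles = {"333": "RDBMS"}  # toy data for the sketch
        ET.SubElement(response, "title").text = titles.get(call.findtext("isbn"), "")
    return ET.tostring(response, encoding="unicode")


request = build_envelope("lookupTitle", isbn="333")
reply = handle_request(request)
print(reply)
```

In a real deployment the envelope would be sent to the service URI over a transport such as HTTP; here the client and the service are plain functions so that only the XML-in, XML-out contract is visible.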
Typical Architecture
[Figure: architecture diagram with Client, XML, Web Service (XML domain), Server Application Logic (Java / .NET), XQuery (XML domain), and XML DB]
XQuery is a declarative programming language, designed to manipulate and query XML data.
With XQuery you describe "what" you want to achieve and leave the "how" to the runtime system.
It is essentially designed for optimizability, including automatic parallelization of query execution.
<book>
  <ISBN>333</ISBN>
  <title>RDBMS</title>
  <author>Paul</author>
  <chapter>
    <num>I</num>
    <title>Information Retrieval using RDBMS</title>
    <section>
      <title>Beyond Simple Information Retrieval</title>
      <section>
        <title>Extension of RDBMS features</title>
      </section>
    </section>
  </chapter>
</book>
Syntactic sugar that combines FOR, LET, IF

for var in expr
let var := expr
where expr
return expr

Example: Return the number of title elements of the chapter "I" of the book

| XQuery | SQL analogy |
|---|---|
| `for $chapters in /book//chapter` | similar to FROM |
| `let $titles := $chapters//title` | no analogy in SQL |
| `where $chapters/num = "I"` | similar to WHERE |
| `return <NumOfTitles>{ count($titles) }</NumOfTitles>` | similar to SELECT |
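To make the semantics of each clause concrete, here is a rough Python equivalent of the example query, written with the standard library's `xml.etree.ElementTree`. It is a sketch of what the query computes, not of how an XQuery engine evaluates it; the function name `count_titles` and the embedded copy of the book document are assumptions made so the example is self-contained.

```python
import xml.etree.ElementTree as ET

# A copy of the sample book document, embedded so the sketch runs on its own.
BOOK = """
<book>
  <ISBN>333</ISBN>
  <title>RDBMS</title>
  <author>Paul</author>
  <chapter>
    <num>I</num>
    <title>Information Retrieval using RDBMS</title>
    <section>
      <title>Beyond Simple Information Retrieval</title>
      <section>
        <title>Extension of RDBMS features</title>
      </section>
    </section>
  </chapter>
</book>
"""


def count_titles(book_xml, chapter_num):
    """Return one title count per chapter whose <num> equals chapter_num."""
    root = ET.fromstring(book_xml)
    counts = []
    for chapter in root.iter("chapter"):          # for $chapters in /book//chapter
        titles = list(chapter.iter("title"))      # let $titles := $chapters//title
        num = (chapter.findtext("num") or "").strip()
        if num == chapter_num:                    # where $chapters/num = "I"
            counts.append(len(titles))            # return count($titles)
    return counts


print(count_titles(BOOK, "I"))
```

The chapter contains three `title` elements at different nesting depths (its own title and one per nested section), which is what the descendant step `//title` selects.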
(docId, sPos, ePos, level)
docId: identifier of the document
sPos: starting position of the element or string within the XML doc
ePos: end position of the element (for string => same as sPos)
level: nesting depth within the document
To facilitate the evaluation of the XQuery expressions, an index is created for all the nodes within the XML database.
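
| Term | docId | sPos | ePos | level |
|---|---|---|---|---|
| book | 1 | 1 | 36 | 0 |
| ISBN | 1 | 2 | 4 | 1 |
| title | 1 | 5 | 7 | 1 |
| chapter | 1 | 11 | 35 | 1 |
| title | 1 | 15 | 20 | 2 |
| title | 1 | 22 | 26 | 3 |
| title | 1 | 28 | 32 | 4 |
| RDBMS | 1 | 6 | 6 | 2 |
| RDBMS | 1 | 19 | 19 | 3 |
| RDBMS | 1 | 30 | 30 | 3 |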
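One way such an index can answer a containment question ("which title elements contain the word RDBMS?") is by comparing positions: an entry lies inside an element when both are in the same document and its position range falls strictly within the element's range. The sketch below uses the entries from the table above; the `Entry` type and the function names are hypothetical, and the `level` field, which could further restrict matches to direct containment, is not used here.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    term: str
    doc_id: int
    s_pos: int
    e_pos: int
    level: int


# Index entries copied from the table above.
INDEX = [
    Entry("book", 1, 1, 36, 0),
    Entry("ISBN", 1, 2, 4, 1),
    Entry("title", 1, 5, 7, 1),
    Entry("chapter", 1, 11, 35, 1),
    Entry("title", 1, 15, 20, 2),
    Entry("title", 1, 22, 26, 3),
    Entry("title", 1, 28, 32, 4),
    Entry("RDBMS", 1, 6, 6, 2),
    Entry("RDBMS", 1, 19, 19, 3),
    Entry("RDBMS", 1, 30, 30, 3),
]


def contains(outer, inner):
    """True if inner lies strictly inside outer within the same document."""
    return (outer.doc_id == inner.doc_id
            and outer.s_pos < inner.s_pos
            and inner.e_pos < outer.e_pos)


def containment_query(index, outer_term, inner_term):
    """Entries for outer_term that contain at least one entry for inner_term."""
    outers = [e for e in index if e.term == outer_term]
    inners = [e for e in index if e.term == inner_term]
    return [o for o in outers if any(contains(o, i) for i in inners)]


for hit in containment_query(INDEX, "title", "RDBMS"):
    print(hit)
```

With these entries, the title spanning positions 22 to 26 is excluded because no RDBMS entry falls inside it.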
"Beowulf cluster": An example of a high-performance parallel computing system used for parallel processing of XML queries
Several processing nodes interconnected via a switch
Each node has its own CPU with a sizable cache, a large main memory (typically > 1 GB) and a hard disk
Master: runs the file. Serves as the point through which the clustering software routes duties to, and monitors, all individual nodes (i.e., slaves)
Beowulf: open-source software such as Linux
MPI library for broadcasting and point-to-point messages among the cluster's nodes.
Phase 1: Distribute the entries of the fully-inverted index among the cluster nodes for processing (e.g., round-robin distribution, hash-based distribution).
Phase 2: Each cluster node processes the containment query to generate the corresponding lists of index entries.
Phase 3: The elements of the generated lists are checked against one another to produce the result set.
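The three phases can be imitated on a single machine to show the data flow. In the sketch below, worker threads stand in for cluster nodes, round-robin is chosen as the distribution scheme, and the final check runs in one place; these are simplifying assumptions for illustration, and a real Beowulf setup would exchange the lists via MPI messages instead. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Index entries as (term, docId, sPos, ePos, level), copied from the index table.
INDEX = [
    ("book", 1, 1, 36, 0), ("ISBN", 1, 2, 4, 1), ("title", 1, 5, 7, 1),
    ("chapter", 1, 11, 35, 1), ("title", 1, 15, 20, 2), ("title", 1, 22, 26, 3),
    ("title", 1, 28, 32, 4), ("RDBMS", 1, 6, 6, 2), ("RDBMS", 1, 19, 19, 3),
    ("RDBMS", 1, 30, 30, 3),
]


def distribute(entries, n_nodes):
    """Phase 1: round-robin distribution of index entries over the nodes."""
    partitions = [[] for _ in range(n_nodes)]
    for i, entry in enumerate(entries):
        partitions[i % n_nodes].append(entry)
    return partitions


def local_lists(partition, outer_term, inner_term):
    """Phase 2: each node extracts the entries relevant to the query."""
    outers = [e for e in partition if e[0] == outer_term]
    inners = [e for e in partition if e[0] == inner_term]
    return outers, inners


def parallel_containment(entries, outer_term, inner_term, n_nodes=3):
    partitions = distribute(entries, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        results = list(pool.map(
            lambda part: local_lists(part, outer_term, inner_term), partitions))
    # Phase 3: check the generated lists against one another.
    outers = [e for outs, _ in results for e in outs]
    inners = [e for _, ins in results for e in ins]
    return sorted(
        o for o in outers
        if any(o[1] == i[1] and o[2] < i[2] and i[3] < o[3] for i in inners)
    )


print(parallel_containment(INDEX, "title", "RDBMS"))
```

Because round-robin distribution scatters related entries across nodes, an element and the word it contains may end up in different partitions, which is why the lists have to be brought together in Phase 3 before containment can be decided.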
Despite XQuery, we still need Java/.NET to:
implement user interfaces
call Web services; interact with other programs
expose functions as Web services
write complex applications
Trade-off between optimizability (on one side) and flexibility, determinism and expressive power (on the other side)
Query languages are more optimizable but pay a price on the other side
Imperative languages lack optimizability, but their semantics are simpler, deterministic and richer
The ultimate goal: get rid of Java => all XQuery
XQueryP: Extension of XQuery for scripting
[Figure: architecture diagram with Client, XML, Web Service (XML domain), Server Application Logic (Java / .NET), XQuery (XML domain), and XML DB]
Prototype in Big OracleDB
  Presented at Plan-X 2005
Prototype in BerkeleyDB-XML
  Might be open sourced (if interest)
MXQuery
  http://www.mxquery.org (Java)
  Runs on mobile phones: Java CLDC 1.1; some cuts even run CLDC 1.0
  Eclipse Plugin available since March 2007
Zorba C++ engine (FLWOR Foundation)
  Small footprint, performance, extensibility, potentially embeddable in many contexts
Introduces parts of code that will:
Run in Sequential Mode
Define the order in which expressions will be evaluated
Be strictly deterministic
Manually handle exceptions
Health Level Seven (HL7) http://www.hl7.org/
Geography Markup Language (GML)
Systems Biology Markup Language (SBML) http://sbml.org/
XBRL, the XML-based Business Reporting standard http://www.xbrl.org/
Global Justice XML Data Model (GJXDM) http://it.ojp.gov/jxdm
ebXML http://www.ebxml.org/
e.g. Encoded Archival Description Application http://lcweb.loc.gov/ead/
Digital photography metadata XMP
An XML grammar for sensor data (SensorML)
Really Simple Syndication (RSS 2.0)
[Figure: timeline from 1999 to 2007. XSLT 1.0 uses XPath 1.0; XPath 2.0 extends XPath 1.0 and is almost backwards compatible; XSLT 2.0 uses XPath 2.0; XQuery 1.0 extends XPath 2.0 with FLWOR expressions, node constructors, and validation.]
1. Allow executing sub-computations in a different order
   Parallelization, rescheduling
2. Possible to use various data access paths
3. Allow lazy evaluation
4. Allow streaming/pipelining between operations (no materialization of intermediate results)
5. Allow various evaluation algorithms for the same logical operation
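Points 3 and 4 can be illustrated with Python generators, which behave like a small pipelined query plan: each operator pulls one item at a time from the operator below it, so no intermediate list is built and inputs that are never needed are never read. This is an analogy in Python rather than XQuery, and the operator names (`scan`, `select`, `project`) are chosen for the example.

```python
from itertools import islice

evaluated = []  # records which input items were actually read


def scan(items):
    """Leaf operator: yields input items one at a time."""
    for item in items:
        evaluated.append(item)
        yield item


def select(pred, rows):
    """Filter operator: passes through rows that satisfy pred."""
    return (r for r in rows if pred(r))


def project(fn, rows):
    """Map operator: transforms each row."""
    return (fn(r) for r in rows)


# Squares of the even numbers in 1..1000, built as a lazy pipeline.
pipeline = project(lambda n: n * n,
                   select(lambda n: n % 2 == 0, scan(range(1, 1001))))

first_three = list(islice(pipeline, 3))
print(first_three)      # [4, 16, 36]
print(len(evaluated))   # 6
```

Only six of the thousand inputs are read to produce the first three results, because the consumer stops pulling once it has what it asked for.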