XML and Databases
198:541

XML Motivation
- Huge amounts of unstructured data on the web: HTML documents
  - No structure information
  - Only format instructions (presentation)
- Integration of data from different sources
  - Structural differences
- Closely related to semistructured data

Semistructured Data
- Integration of heterogeneous sources
- Data sources with non-rigid structures
  - Biological data
  - Web data
- Need for more structural information than plain text, but fewer constraints on structure than in relational data

Characteristics of Semistructured Data
- Missing or additional tuples
- Multiple attributes
- Different types in different objects
- Heterogeneous collections
- Self-describing, irregular data with no a priori structure

HTML Document Example
    <h1> Bibliography </h1>
    <p> <i> Foundations of Databases </i>
        Abiteboul, Hull, Vianu <br>
        Addison Wesley, 1995
    <p> <i> Data on the Web </i>
        Abiteboul, Buneman, Suciu <br>
        Morgan Kaufmann, 1999
Slide annotations: type of information (book), title, authors, year

The Idea Behind XML
- Easily support information exchange between applications / computers
- Reuse what worked in HTML
  - Human readable
  - Standard
  - Easy to generate and read
- But allow arbitrary markup
- Uniform language for semistructured data
- Data management

XML
- eXtensible Markup Language
- Universal standard for documents and data
  - Defined by W3C
- Set of emerging technologies
  - XLink, XPointer, XSchema, DOM, SAX, XPath, XQuery, …

XML
- XML gives a syntax, not a semantics
- XML defines the structure of a document, not how it is processed
- Separate structural information from format instructions

XML Example
    <bibliography>
      <book>
        <title> Foundations… </title>
        <author> Abiteboul </author>
        <author> Hull </author>
        <author> Vianu </author>
        <publisher> Addison Wesley </publisher>
        <year> 1995 </year>
      </book>
      …
    </bibliography>

XML Terminology
- Tags: book, title, author, …
  - Start tag: <book>
  - End tag: </book>
- Elements are nested
- Empty element: <reviews></reviews> => <reviews/>
- XML document: single root element
- XML document is well formed: matching tags

XML Attributes
- Attributes are <name, value> pairs that characterize an element.
    <book price="55" currency="USD">
      <title> Foundations of Databases </title>
      <author> Abiteboul </author>
      …
      <year> 1995 </year>
    </book>
- Attributes can define object identifiers (oids), but they are just syntax

More XML
- Text can be CDATA or PCDATA
- Entity references: &amp; for &, &gt; for >, …
- Processing instructions: <?blink?>
- Comments: <!-- comment text -->
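
As a small illustration of entity references, the sketch below escapes reserved characters before placing text inside an element, then lets a parser decode them again. It uses Python's standard library; the sample string and the `note` element name are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Hypothetical text containing characters that are reserved in XML.
raw = "Abiteboul & Vianu: year > 1990"

# escape() replaces &, < and > with the entity references &amp;, &lt;, &gt;.
escaped = escape(raw)

# The escaped text can be embedded safely as element content (PCDATA).
element_text = f"<note>{escaped}</note>"

# A parser resolves the entity references back to the original characters.
parsed_text = ET.fromstring(element_text).text

print(escaped)
print(parsed_text)
```

Without the escaping step, the bare `&` would make the document ill-formed.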

Well Formed XML Documents
- Elements must be properly nested
    <book><title> Foundations of Databases </title></book>
  But not:
    <book><title> Foundations of Databases </book></title>
- There must be a unique root element
- Elements can be of ‘element content’ or ‘mixed content’:
    <title>This is <b>Mixed</b>Content</title>
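
The well-formedness rules above can be checked mechanically by any XML parser. A minimal sketch using Python's standard `xml.etree.ElementTree` follows; the helper name `is_well_formed` is an assumption made for this example, not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts `text` as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Properly nested elements.
nested_ok = is_well_formed(
    "<book><title> Foundations of Databases </title></book>")
# Overlapping tags: the end tags appear in the wrong order.
nested_bad = is_well_formed(
    "<book><title> Foundations of Databases </book></title>")
# Two top-level elements: no unique root.
two_roots = is_well_formed("<book/><book/>")
# Mixed content (text interleaved with child elements) is allowed.
mixed = is_well_formed("<title>This is <b>Mixed</b>Content</title>")

print(nested_ok, nested_bad, two_roots, mixed)
```

This only tests well-formedness; it says nothing about validity against a schema.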

XML: Potential
- Flexible enough to represent anything
  - Stock market, DNA, music, chemicals
  - Weather information
  - Wireless network configuration
- Enables easy information exchange
  - Between companies
  - Within companies
- Standard: everybody uses the same technology

XML: Limitations
- XML is only a syntax for documents
- We need tools!
  - Editors and parsers
  - Programming APIs (for Java, C++, etc.)
  - Languages to manipulate XML (how many books?)
  - Schemas (What is a book like?)
  - Storage (What if you have a lot of XML?)
  - Transfer protocols (How do you exchange it?)
  - What about XML in Chinese…?
  - How can XML fit into my phone…?
  - Query processing?
  - …

XML Schema Language

DTDs: Document Type Definitions
- Similar to a schema
- Grammar describing constraints on document structure and content
- XML documents can be validated against a DTD
    <!ELEMENT Book (title, author*)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT author (name, address, age?)>
    <!ATTLIST Book id ID #REQUIRED>
    <!ATTLIST Book pub IDREF #IMPLIED>

Shortcomings of DTDs
Useful for documents, but not so good for data:
- No support for structural re-use
  - Object-oriented-like structures aren’t supported
- No support for data types
  - Can’t do data validation
- Can have a single key item (ID), but:
  - No support for multi-attribute keys
  - No support for foreign keys (references to other keys)
  - No constraints on IDREFs (e.g., cannot require that an IDREF reference only a Section)

XSchema
- In XML format
- Includes primitive data types (integers, strings, dates, …)
- Supports value-based constraints (integers > 100)
- Inheritance
- Foreign keys
- …

Example of XSchema
    <schema version="1.0"
            xmlns="http://www.w3.org/1999/XMLSchema">
      <element name="author" type="string" />
      <element name="date" type="date" />
      <element name="abstract">
        <type> … </type>
      </element>
      <element name="paper">
        <type>
          <attribute name="keywords" type="string" />
          <element ref="author" minOccurs="0" maxOccurs="*" />
          <element ref="date" />
          <element ref="abstract" minOccurs="0" maxOccurs="1" />
          <element ref="body" />
        </type>
      </element>
    </schema>

XML Storage

Storing XML Data
- Different approaches:
  - Storing as text
  - Using an RDBMS
  - Using a native system tailored for XML (NATIX, Tamino, Ipedo, etc.)
- Performance of the various approaches depends on your application

Storing XML as Text
- Simple
- Easy to compress
- No updates
- Need to parse the document every time it is needed

Storing XML in RDBMS
- Uses existing RDBMS techniques
- Costly in space, takes time to reconstruct original document
- Example techniques:
  - Schema with 2 relations: tag and value
  - Schema with n relations: 1 per element name
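
One possible reading of the two-relation technique can be sketched with SQLite: one table records each element's tag and its position in the tree, and a second table holds text values. The table names (`xml_tag`, `xml_value`), their columns, and the `shred` helper are assumptions made for this illustration; the slides do not specify a layout.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <year>1995</year>
  </book>
</bibliography>"""


def shred(conn, root):
    """Store every element as a row in xml_tag, and its text (if any) in xml_value."""
    conn.execute("CREATE TABLE xml_tag (node_id INTEGER PRIMARY KEY, "
                 "parent_id INTEGER, position INTEGER, name TEXT)")
    conn.execute("CREATE TABLE xml_value (node_id INTEGER PRIMARY KEY, text TEXT)")
    next_id = 0
    stack = [(root, None, 0)]  # (element, parent node id, position among siblings)
    while stack:
        elem, parent_id, position = stack.pop()
        next_id += 1
        node_id = next_id
        conn.execute("INSERT INTO xml_tag VALUES (?, ?, ?, ?)",
                     (node_id, parent_id, position, elem.tag))
        text = (elem.text or "").strip()
        if text:
            conn.execute("INSERT INTO xml_value VALUES (?, ?)", (node_id, text))
        for i, child in enumerate(elem):
            stack.append((child, node_id, i))


conn = sqlite3.connect(":memory:")
shred(conn, ET.fromstring(DOC))

# Retrieving all authors now requires a join between the two relations.
authors = [row[0] for row in conn.execute(
    "SELECT v.text FROM xml_tag t JOIN xml_value v ON v.node_id = t.node_id "
    "WHERE t.name = 'author' ORDER BY t.position")]
node_count = conn.execute("SELECT COUNT(*) FROM xml_tag").fetchone()[0]
print(authors, node_count)
```

Even this tiny document produces seven rows in the tag relation, and rebuilding the original text would need one join per level, which hints at the space and reconstruction costs mentioned above.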

Accessing and Querying XML Data

XML as a Tree: DOM
- DOM = Document Object Model
- Class hierarchy serving as an API to XML trees
- Methods of those classes can be used to manipulate XML (e.g., Node::child, Node::name)
- Can be used from Java, C++ to develop XML applications.
- Each node has an identity (i.e., a unique identifier) in the whole document
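
The slides mention Java and C++; the same tree-navigation idea can be sketched with Python's standard `xml.dom.minidom`, which follows the W3C DOM interfaces. The sample document is the bibliography from the earlier slides, written without whitespace so that every child node is an element.

```python
from xml.dom import minidom

DOC = ("<bibliography><book>"
       "<title>Foundations of Databases</title>"
       "<author>Abiteboul</author><author>Hull</author><author>Vianu</author>"
       "<publisher>Addison Wesley</publisher><year>1995</year>"
       "</book></bibliography>")

dom = minidom.parseString(DOC)
root = dom.documentElement          # the <bibliography> element
book = root.firstChild              # its first child, the <book> element

# Walk the children of <book> and collect their tag names.
child_names = [node.nodeName for node in book.childNodes
               if node.nodeType == node.ELEMENT_NODE]

# Text is itself a node: the first child of each <author> element.
authors = [a.firstChild.data for a in dom.getElementsByTagName("author")]

print(root.nodeName, child_names, authors)
```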

XML as a DOM Tree
Class hierarchy (node, element, attribute)
    bibliography
      book
        title: Foundations of Databases
        author: Abiteboul
        author: Hull
        author: Vianu
        publisher: Addison Wesley
        year: 1995
      book

XML as a Stream: SAX
- XML document = event stream. E.g.,
  - Opening tag ‘book’
  - Opening tag ‘title’
  - Text “Foundations of databases”
  - Closing tag ‘title’
  - Opening tag ‘author’
  - Etc.
- SAX allows you to associate actions with those events to build applications
- Very efficient since it corresponds to events during parsing, but not always sufficient.
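
The event-stream view can be tried out with Python's standard `xml.sax` module. In this sketch the handler simply records each event; the class name `EventLogger` and the shortened sample document are made up for the example.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record the parse events as (kind, value) pairs."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._buffer = []  # text may arrive in several chunks

    def _flush_text(self):
        text = "".join(self._buffer).strip()
        self._buffer = []
        if text:
            self.events.append(("text", text))

    def startElement(self, name, attrs):
        self._flush_text()
        self.events.append(("open", name))

    def endElement(self, name):
        self._flush_text()
        self.events.append(("close", name))

    def characters(self, content):
        self._buffer.append(content)


DOC = (b"<book><title>Foundations of Databases</title>"
       b"<author>Abiteboul</author></book>")

handler = EventLogger()
xml.sax.parseString(DOC, handler)
for event in handler.events:
    print(event)
```

The handler never sees the whole tree at once, which is why this style is cheap in memory but awkward for queries that need to look backwards or sideways in the document.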

XPath
- Language for navigating in an XML document (seen as a tree)
  - One root node
  - Types of nodes: root, element, text, attribute, comment, …
- XPath expression defines navigation in the tree
  - Following axes: child, descendant, parent, ancestor, …

XPath: Examples
- Find all the titles of all the books:
    //book/title
- Find the title of all books written by Charles Dickens:
    //book[author="Charles Dickens"]/title
- Find the title of the first section in the second chapter in “Great Expectations”:
    //book[title="Great Expectations"]/chapter[2]/section[1]/title
- Find the title of all sections that come after the second chapter in “Great Expectations”:
    //book[title="Great Expectations"]/chapter[2]/following::section/title
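
The first three expressions can be tried with Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath: paths are written relative to an element (hence the leading `.`), and axes such as `following::` are not available there. The sample library document and its section titles are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; chapter and section titles are placeholders.
DOC = """<library>
  <book>
    <title>Great Expectations</title>
    <author>Charles Dickens</author>
    <chapter><title>Chapter One</title>
      <section><title>S1.1</title></section>
    </chapter>
    <chapter><title>Chapter Two</title>
      <section><title>S2.1</title></section>
      <section><title>S2.2</title></section>
    </chapter>
  </book>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
  </book>
</library>"""

root = ET.fromstring(DOC)

# //book/title
all_titles = [t.text for t in root.findall(".//book/title")]

# //book[author="Charles Dickens"]/title
dickens_titles = [t.text for t in
                  root.findall(".//book[author='Charles Dickens']/title")]

# //book[title="Great Expectations"]/chapter[2]/section[1]/title
first_section = [t.text for t in root.findall(
    ".//book[title='Great Expectations']/chapter[2]/section[1]/title")]

print(all_titles, dickens_titles, first_section)
```

A full XPath 1.0 engine would be needed to evaluate the fourth expression with the `following` axis.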

Querying XML Data
- Need for a language to query XML data
- Should yield XML output
- Should support standard query operations
- No schema required
- Several proposals for an XML query language: XML-QL, XQuery, …

XQuery
- XPath included in XQuery
- FLWR expressions: for, let, where, return
    FOR $x IN document("bib.xml")/bib/book
    WHERE $x/year > 1995
    RETURN $x/title
- Result:
    <title> abc </title>
    <title> def </title>
    <title> ghi </title>
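
To make the roles of the FOR, WHERE and RETURN clauses concrete, the sketch below mimics the query with a Python list comprehension over an in-memory document. This is an emulation for illustration, not an XQuery engine, and the contents of the sample `bib` document are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = """<bib>
  <book><title>abc</title><year>1999</year></book>
  <book><title>old</title><year>1995</year></book>
  <book><title>def</title><year>2001</year></book>
</bib>"""

bib = ET.fromstring(BIB)

titles = [
    book.find("title")                      # RETURN $x/title
    for book in bib.findall("book")         # FOR $x IN .../bib/book
    if int(book.findtext("year")) > 1995    # WHERE $x/year > 1995
]

# Like the query, the result is XML: a sequence of <title> elements.
result = [ET.tostring(t, encoding="unicode").strip() for t in titles]
print(result)
```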

How to process XML Queries?
- Use indexes
  - Need to identify nodes
  - Need to know relations between nodes
- Labeling schemes
  - Dewey encoding
  - Prefix-Postfix encoding
- TwigStack
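
A rough sketch of the two labeling schemes follows, assuming the usual definitions: a Dewey label is the path of child positions from the root, and the prefix-postfix label is the pair of preorder and postorder ranks. With either label, the ancestor relation can be decided from the labels alone, without touching the tree. The function names and the tiny sample document are made up for the example.

```python
import xml.etree.ElementTree as ET


def label(root):
    """Assign each element a Dewey label and (pre, post) traversal ranks."""
    labels = {}
    counters = {"pre": 0, "post": 0}

    def visit(elem, dewey):
        pre = counters["pre"]
        counters["pre"] += 1
        for i, child in enumerate(elem, start=1):
            visit(child, dewey + (i,))
        post = counters["post"]
        counters["post"] += 1
        labels[elem] = {"dewey": dewey, "pre": pre, "post": post}

    visit(root, (1,))
    return labels


def is_ancestor_dewey(a, d):
    """a is an ancestor of d iff a's Dewey label is a proper prefix of d's."""
    return len(a["dewey"]) < len(d["dewey"]) and d["dewey"][:len(a["dewey"])] == a["dewey"]


def is_ancestor_prepost(a, d):
    """a is an ancestor of d iff a starts before d and finishes after d."""
    return a["pre"] < d["pre"] and a["post"] > d["post"]


root = ET.fromstring(
    "<bibliography><book><title/><author/></book><book><title/></book></bibliography>")
labels = label(root)
book1, book2 = list(root)
author = book1.find("author")

print(labels[author]["dewey"])  # position path from the root
print(is_ancestor_dewey(labels[book1], labels[author]),
      is_ancestor_prepost(labels[book2], labels[author]))
```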

Web Services

What are Web Services
- Programming interfaces for application-to-application communication on the Web
  - Platform-independent, language-independent, object model-independent
- Possibility to activate methods on remote web servers (RPC)
- 2 main applications
  - E-commerce
  - Access to remote data

XML and Web Services
- Exchange of information between applications is in XML
  - Input and result
  - Use of SOAP to generate messages
- Descriptions of the web service functionality given in XML, according to the WSDL schema
- Web Services standards use XML heavily
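
To show what "input in XML" looks like, here is a minimal sketch that builds a SOAP 1.1 request envelope with Python's standard library. The envelope namespace is the SOAP 1.1 one; the application namespace, the `GetBookTitle` operation, and the `isbn` parameter are hypothetical and do not refer to any real service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/bib"                      # hypothetical service namespace


def build_request(isbn):
    """Build a SOAP envelope that calls a hypothetical GetBookTitle operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetBookTitle")
    ET.SubElement(call, f"{{{APP_NS}}}isbn").text = isbn
    return ET.tostring(envelope, encoding="unicode")


message = build_request("example-isbn")
print(message)

# The receiver parses the message like any other XML document.
parsed = ET.fromstring(message)
isbn_value = parsed.findtext(
    f"{{{SOAP_NS}}}Body/{{{APP_NS}}}GetBookTitle/{{{APP_NS}}}isbn")
```

In practice the operation and parameter names would come from the service's WSDL description rather than being chosen by hand.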

Conclusions
- XML: a very active area
  - Many research directions
  - Many applications
- Standards not finalized yet:
  - XQuery
  - XML Schema
  - Web Services
  - …

Some Important XML Standards
- XSL/XSLT: presentation and transformation standards
- RDF: Resource Description Framework (meta-info such as ratings, categorizations, etc.)
- XPath/XPointer/XLink: standards for linking to documents and elements within them
- Namespaces: for resolving name clashes
- DOM: Document Object Model for manipulating XML documents
- SAX: Simple API for XML parsing
- …

References
- XML
  - http://www.w3.org/XML/
  - Sudarshan S. Chawathe: Describing and Manipulating XML Data. IEEE Data Engineering Bulletin 22(3) (1999)
- XML Standards
  - http://www.w3.org/ (XSL, XPath, XSchema, DOM, …)
- Storing XML Data
  - Daniela Florescu, Donald Kossmann: Storing and Querying XML Data using an RDMBS. IEEE Data Engineering Bulletin 22(3) (1999)
  - Hartmut Liefke, Dan Suciu: XMILL: An Efficient Compressor for XML Data. SIGMOD Conference 2000
- XQuery
  - http://www.w3.org/TR/xquery/
  - Peter Fankhauser: XQuery Formal Semantics: State and Challenges. SIGMOD Record 30(3) (2001)
- Web Services
  - http://www.w3.org/2002/ws/