Introduction To XML Algebra


Page 1: Introduction To XML Algebra

1

Introduction To XML Algebra

Wan Liu, Bintou Kane

Advanced Database, Instructor: Elka

2/11/2002

Page 2: Introduction To XML Algebra

2

Outline

Reasons for XML algebra

Niagara algebra

AT&T algebra

Page 3: Introduction To XML Algebra

3

Data Model and Design

We need a clear framework to design a database.

A data model is like choosing different data structures for appropriate programming usage. It is a type system, and it is abstract.

A relational database is implemented with tables; the XML format is a newer method for information integration.

Page 4: Introduction To XML Algebra

4

Why XML Algebra?

It is common to translate a query language into an algebra.

First, the algebra is used to give a semantics for the query language.

Second, the algebra is used to support query optimization.

Page 5: Introduction To XML Algebra

5

XML Algebra History

Lore Algebra (August 1999) -- Stanford University

IBM Algebra (September 1999) -- Oracle; IBM; Microsoft Corp

YAT Algebra (May 2000)

AT&T Algebra (June 2000) -- AT&T; Bell Labs

Niagara Algebra (2001) -- University of Wisconsin-Madison

Page 6: Introduction To XML Algebra

6

NIAGARA

Title: Following the Paths of XML Data: An Algebraic Framework for XML Query Evaluation

By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier

Page 7: Introduction To XML Algebra

7

Outline

Concepts of Niagara Algebra

Operations

Optimization

Page 8: Introduction To XML Algebra

8

Goals of Niagara Algebra

Be independent of schema information

Query on both structure and content

Generate simple, flexible, yet powerful algebraic expressions

Allow re-use of traditional optimization techniques

Page 9: Introduction To XML Algebra

9

Example: XML Source Documents

Invoice.xml

<Invoice_Document>
  <invoice No="1">
    <account_number>2</account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.75</total>
  </invoice>
</Invoice_Document>

Customer.xml

<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>

Page 10: Introduction To XML Algebra

10

XML Data Model and Tree Graph

Example: the document below is modeled as an ordered tree graph (semi-structured data).

<Invoice_Document>
  <invoice>
    <number>2</number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <number>1</number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
</Invoice_Document>

[Figure: tree rooted at Invoice_Document with invoice children; each invoice has number, carrier, and total children, with leaf values (2, AT&T, $0.25) and (1, Sprint, $1.20).]

Page 11: Introduction To XML Algebra

11

XML Data Model [GVDNM01]

Collection of bags of vertices.

Vertices in a bag have no order.

Example: a bag holding the document root, an invoice vertex, and that invoice's account_number vertex:

[Root "invoice.xml", invoice, invoice.account_number]

[Figure: the invoice vertex holds <invoice> invoice-element-content </invoice>; the invoice.account_number vertex holds <account_number> element-content </account_number>.]

Page 12: Introduction To XML Algebra

12

Data Model

Bag elements are reachable by path expressions.

The path expression consists of two parts:

An entry point

A relative forward part

Example: account_number:invoice

Page 13: Introduction To XML Algebra

13

Operators

Source (S), Follow (Φ), Select, Join, Rename, Expose, Vertex, Group, Union, Intersection, Difference (-), Cartesian Product.

Page 14: Introduction To XML Algebra

14

Source Operator S

Input: a list of documents

Output: a collection of singleton bags

Examples:

S (*): all known XML documents

S (invoice*.xml): all XML documents whose filename matches "invoice*.xml"

S (*, schema.dtd): all known XML documents that conform to schema.dtd

Page 15: Introduction To XML Algebra

15

Follow operator

Input: a path expression in entry point notation

Functionality: extracts the vertices reachable by the path expression

Output: a new bag that consists of the extracted vertex plus all the contents of the original bag (in the case of unnesting follow)

Page 16: Introduction To XML Algebra

16

Follow operator (Example*)

Unnesting follow Φμ(carrier:invoice), applied to the input collection

{[Root invoice.xml, invoice]}

produces the output collection

{[Root invoice.xml, invoice, invoice.carrier]}

[Figure: the input bag holds the vertex <invoice> invoice-element-content </invoice>; the output bag additionally holds the vertex <carrier> carrier-element-content </carrier>.]
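The unnesting follow above can be sketched in Haskell, the notation these slides use later for the AT&T algebra. This is only an illustration under stated assumptions: the `Vertex` and `Bag` types, the association-list encoding of a bag, and the name `followU` are hypothetical and are not taken from the Niagara paper, and the Root vertex is left out for brevity.

```haskell
-- A vertex is an XML element: a tag, its text content, and child elements.
-- (Hypothetical encoding, chosen only for this sketch.)
data Vertex = Vertex { vTag :: String, vText :: String, vKids :: [Vertex] }
  deriving (Eq, Show)

-- A bag pairs entry-point annotations (e.g. "invoice") with vertices.
-- An association list is an assumption; the model itself imposes no order.
type Bag = [(String, Vertex)]

-- Unnesting follow for the path expression child:entry.
-- Each matching child yields one new bag that keeps the original contents.
followU :: String -> String -> [Bag] -> [Bag]
followU child entry bags =
  [ bag ++ [(entry ++ "." ++ child, v)]
  | bag <- bags
  , Just parent <- [lookup entry bag]   -- skip bags without the entry point
  , v <- vKids parent
  , vTag v == child
  ]

leaf :: String -> String -> Vertex
leaf t s = Vertex t s []

sprintInvoice :: Vertex
sprintInvoice = Vertex "invoice" ""
  [ leaf "account_number" "1", leaf "carrier" "Sprint", leaf "total" "$1.20" ]

noCarrierInvoice :: Vertex
noCarrierInvoice = Vertex "invoice" ""
  [ leaf "account_number" "1", leaf "total" "$0.75" ]

input :: [Bag]
input = [ [("invoice", sprintInvoice)], [("invoice", noCarrierInvoice)] ]

-- Follow carrier:invoice over both bags.
output :: [Bag]
output = followU "carrier" "invoice" input
```

In this sketch a bag whose invoice has no carrier child produces no output bag, so `output` contains a single bag annotated with `invoice` and `invoice.carrier`.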

Page 17: Introduction To XML Algebra

17

Select operator

Input: a set of bags

Functionality: filters the bags of a collection using a predicate

Output: a set of bags that conform to the predicate

Predicate: logical operators (such as ∧, ∨, ¬) or simple qualifications (such as <, >, =, ≠, ≤, ≥)

Page 18: Introduction To XML Algebra

18

Select operator (Example)

Select with the predicate invoice.carrier = Sprint, applied to the input collection

{[Root invoice.xml, invoice], [Root invoice.xml, invoice], ...}

keeps only the bags whose invoice satisfies the predicate:

{[Root invoice.xml, invoice], ...}

[Figure: each bag holds a vertex <invoice> invoice-element-content </invoice>.]

Page 19: Introduction To XML Algebra

19

Join operator

Input: two collections of bags

Functionality: joins the two collections based on a predicate

Output: the concatenation of pairs of bags that satisfy the predicate

Page 20: Introduction To XML Algebra

20

Join operator (Example)

Join with the predicate account_number:invoice = account:customer, applied to the two input collections

{[Root invoice.xml, invoice]} and {[Root customer.xml, customer]}

concatenates each pair of bags that satisfies the predicate:

{[Root invoice.xml, invoice, Root customer.xml, customer]}

[Figure: the input bags hold the vertices <invoice> invoice-element-content </invoice> and <customer> customer-element-content </customer>; the output bag holds both.]
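The join described above amounts to pairing bags and concatenating the pairs that pass the predicate. A minimal Haskell sketch follows; the flattened `Bag` type (path to text content) and the names `joinBags` and `sameAccount` are assumptions made for illustration, not part of the Niagara algebra itself.

```haskell
-- Simplified bag: entry-point path -> text content (an assumption for brevity).
type Bag = [(String, String)]

-- Join: concatenate every pair of bags that satisfies the predicate.
joinBags :: (Bag -> Bag -> Bool) -> [Bag] -> [Bag] -> [Bag]
joinBags p ls rs = [ l ++ r | l <- ls, r <- rs, p l r ]

-- Predicate: the invoice's account number equals the customer's account.
sameAccount :: Bag -> Bag -> Bool
sameAccount l r =
  case (lookup "invoice.account_number" l, lookup "customer.account" r) of
    (Just a, Just b) -> a == b
    _                -> False

invoices :: [Bag]
invoices =
  [ [("invoice.account_number", "2"), ("invoice.carrier", "AT&T")]
  , [("invoice.account_number", "1"), ("invoice.carrier", "Sprint")]
  ]

customers :: [Bag]
customers =
  [ [("customer.account", "1"), ("customer.name", "Tom")]
  , [("customer.account", "2"), ("customer.name", "George")]
  ]

-- Each invoice bag is paired with the customer bag holding the same account.
joined :: [Bag]
joined = joinBags sameAccount invoices customers
```

Passing the predicate as a function keeps the sketch close to the slide's description, where the join is parameterized by an arbitrary predicate over the two bags.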

Page 21: Introduction To XML Algebra

21

Expose operator

Input: a list of path expressions of the vertices to be exposed

Output: a set of bags that contain the vertices in the parameter list, in the same order

Page 22: Introduction To XML Algebra

22

Expose operator (Example)

Expose (bill_period, carrier), applied to the input collection

{[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}

produces the output collection

{[Root invoice.xml, invoice.bill_period, invoice.carrier]}

[Figure: the input bag holds the invoice, carrier, and bill_period vertices; the output bag holds only the bill_period and carrier vertices, in the order given in the parameter list.]

Page 23: Introduction To XML Algebra

23

Vertex operator

Creates the actual XML vertex that will encompass everything created by an expose operator

Example :

(Customer_invoice)[((account)[invoice.account_number], (inv_total)[invoice.total])]

Page 24: Introduction To XML Algebra

24

Other operators

Group: used for arbitrary grouping of elements based on their values. Aggregate functions (e.g. average) can be used with the group operator.

Rename: changes the entry point annotation of the elements of a bag.

Example: (invoice.bill_period, date)

Page 25: Introduction To XML Algebra

25

Example: XML Source Documents

Invoice.xml

<Invoice_Document>
  <invoice>
    <account_number>2</account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <total>$0.75</total>
  </invoice>
  <auditor>maria</auditor>
</Invoice_Document>

Customer.xml

<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>

Page 26: Introduction To XML Algebra

26

XQuery Example

List the account number, customer name, and invoice total for all invoices that have carrier = "Sprint".

FOR $i in (invoices.xml)//invoice,
    $c in (customers.xml)//customer
WHERE $i/carrier = "Sprint" and
      $i/account_number = $c/account
RETURN
  <Sprint_Invoice>
    $i/account_number,
    $c/name,
    $i/total
  </Sprint_Invoice>

Page 27: Introduction To XML Algebra

27

Example: Xquery output

<Sprint_Invoice>

<account_number>1 </account_number>

<name>Tom </name>

<total>$1.20</total>

</Sprint_Invoice >

Page 28: Introduction To XML Algebra

28

Algebra Tree Execution

The plan is evaluated bottom-up. Each step lists the operator and the vertices that flow out of it.

Source (invoices.xml) and Source (customers.xml)

Follow (*.invoice) -> invoice (1), invoice (2), invoice (3)

Follow (*.customer) -> customer (1), customer (2)

Select (carrier = "Sprint") -> invoice (2)

Join (*.invoice.account_number = *.customer.account) -> invoice (2), customer (1)

Expose (*.account_number, *.name, *.total) -> account_number, name, total
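The whole plan can be traced with a small Haskell sketch over the sample data of slide 25. The flattened `Bag` representation and the function names `select`, `joinOn`, and `expose` are assumptions introduced for this illustration; the Source and Follow steps are folded into the hand-written input bags.

```haskell
-- Simplified bag: entry-point path -> text content (an assumption for brevity).
type Bag = [(String, String)]

-- Bags as they would look after Source and Follow on the sample documents.
invoices, customers :: [Bag]
invoices =
  [ [("invoice.account_number", "2"), ("invoice.carrier", "AT&T"), ("invoice.total", "$0.25")]
  , [("invoice.account_number", "1"), ("invoice.carrier", "Sprint"), ("invoice.total", "$1.20")]
  , [("invoice.account_number", "1"), ("invoice.total", "$0.75")]
  ]
customers =
  [ [("customer.account", "1"), ("customer.name", "Tom")]
  , [("customer.account", "2"), ("customer.name", "George")]
  ]

-- Select: keep the bags that satisfy the predicate.
select :: (Bag -> Bool) -> [Bag] -> [Bag]
select = filter

-- Join: concatenate pairs of bags whose values at the two paths are equal.
joinOn :: String -> String -> [Bag] -> [Bag] -> [Bag]
joinOn lp rp ls rs =
  [ l ++ r
  | l <- ls, r <- rs
  , Just a <- [lookup lp l], Just b <- [lookup rp r]
  , a == b
  ]

-- Expose: keep only the listed paths, in the order of the parameter list.
expose :: [String] -> [Bag] -> [Bag]
expose paths bags =
  [ [ (p, v) | p <- paths, Just v <- [lookup p bag] ] | bag <- bags ]

-- Select before join, then expose, as in the algebra tree.
plan :: [Bag]
plan =
  expose ["invoice.account_number", "customer.name", "invoice.total"]
    (joinOn "invoice.account_number" "customer.account"
      (select (\b -> lookup "invoice.carrier" b == Just "Sprint") invoices)
      customers)
```

Under these assumptions `plan` yields one bag with account number 1, name Tom, and total $1.20, which matches the XQuery output shown on slide 27.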

Page 29: Introduction To XML Algebra

29

Optimization with Niagara

Optimizer based on the Niagara algebra

Use operations more efficiently

Produce simpler expressions by combining operations

Page 30: Introduction To XML Algebra

30

Language Convention

A and B are path expressions

A < B --- path expression A is a prefix of B

AnB --- common prefix of paths A and B

AńB --- greatest common prefix of paths A and B

┴ --- null path expression

Page 31: Introduction To XML Algebra

31

Use of Rule 8.5

Taking advantage of rule 8.5

Allows optimization based on path selectivity

Applies to the un-nesting follow operation Φμ

Page 32: Introduction To XML Algebra

32

Interchangeability of the Follow operation

Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)]

True when:

there exists C such that C < A && C < B, with C = AńB

or AnB = ┴
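One way to read this condition is as a check on path prefixes: two follows can be swapped when neither path extends the other, so both branch off a shared prefix (possibly the null path). The Haskell sketch below encodes that reading. The root-first list encoding of paths and the names `commonPrefix`, `properPrefix`, and `canSwap` are assumptions for illustration, and the exact statement of rule 8.5 should be taken from the Niagara paper rather than from this code.

```haskell
import Data.List (isPrefixOf)

-- A path written root-first, e.g. carrier:invoice becomes ["invoice", "carrier"].
-- The empty list stands for the null path.
type Path = [String]

-- Greatest common prefix of two paths.
commonPrefix :: Path -> Path -> Path
commonPrefix (x:xs) (y:ys)
  | x == y = x : commonPrefix xs ys
commonPrefix _ _ = []

-- c is a prefix of p and strictly shorter than p.
properPrefix :: Path -> Path -> Bool
properPrefix c p = c `isPrefixOf` p && c /= p

-- Assumed reading of the rule: the follows commute when the greatest common
-- prefix is a proper prefix of both paths (this includes the null-path case).
canSwap :: Path -> Path -> Bool
canSwap a b = properPrefix c a && properPrefix c b
  where c = commonPrefix a b
```

With this reading, `canSwap ["invoice", "acc_Num"] ["invoice", "carrier"]` holds because both paths extend the prefix `invoice`, while `canSwap ["invoice"] ["invoice", "carrier"]` does not, since the second follow depends on the vertex produced by the first.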

Page 33: Introduction To XML Algebra

33

Application of 8.5 With Invoice

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] *

?=Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] **

Both Share the common prefix invoice

Case AńB = invoice

Page 34: Introduction To XML Algebra

34

Benefit of Rule Application

Note: if acc_Num is required for each invoice element, and carrier is not required for an invoice element, then using *

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)]

makes more sense than **. Why?

Page 35: Introduction To XML Algebra

35

Reduction of input size on the first sub-operation

Φμ(carrier:invoice)

Should we, or can we, apply rule 8.5 to the expression below? Why?

Φμ(acc_Num:invoice)[Φμ(acc_Num:Customer)]

Page 36: Introduction To XML Algebra

36

acc_Num:invoice and acc_Num:Customer are totally different paths.

The case is AnB = ┴, so yes, the rule can be applied.

Page 37: Introduction To XML Algebra

37

Rules 8.7, 8.9, 8.11

Interesting: they help identify when and where to use selection to decrease the size of the input to subsequent operations.

Example: in the algebra tree on slide 28, the selection is applied before the join.

Page 38: Introduction To XML Algebra

38

A useful addition would be

A procedure that automatically detects when a rule can be applied in a given case, and then applies it.

Page 39: Introduction To XML Algebra

39

AT&T Algebra

Page 40: Introduction To XML Algebra

40

Page 41: Introduction To XML Algebra

41

AT&T Algebra Introduction

The algebra is derived from the nested relational algebra.

The AT&T algebra makes heavy use of list comprehensions, a standard notation in the functional programming community.

The AT&T algebra uses the functional programming language Haskell as a notation for presenting the algebra.

Page 42: Introduction To XML Algebra

42

AT&T data model

The data model merges attribute and element nodes, and eliminates comments.

Declare basic type: Node.

text :: String -> Node
elem :: Tag -> [Node] -> Node
ref :: Node -> Node

<bib>
  <book year="1999">
    <title>Data on the Web</title>
    <year>1999</year>
  </book>
</bib>

elem "bib" [
  elem "book" [
    elem "@year" [ text "1999" ],
    elem "title" [ text "Data on the Web" ] ] ]

Page 43: Introduction To XML Algebra

43

Basic Type Declarations

To find the type of a node:

isText :: Node -> Bool
isElem :: Node -> Bool
isRef :: Node -> Bool

For a text node:

string :: Node -> String

For an element node:

tag :: Node -> Tag
children :: Node -> [Node]

For a reference node:

dereference :: Node -> Node
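The signatures above can be backed by an ordinary Haskell data type. The sketch below is one possible definition, not the one from the AT&T paper: the constructor names `TextNode`, `ElemNode`, and `RefNode` are assumptions, and the accessors are partial functions that fail on the wrong kind of node. `elem` is hidden from the Prelude so the slide's name can be reused.

```haskell
import Prelude hiding (elem)

type Tag = String

-- One possible concrete representation of Node (constructor names assumed).
data Node = TextNode String | ElemNode Tag [Node] | RefNode Node
  deriving (Eq, Show)

-- Constructors under the names used on the slides.
text :: String -> Node
text = TextNode

elem :: Tag -> [Node] -> Node
elem = ElemNode

ref :: Node -> Node
ref = RefNode

-- Tests for the kind of a node.
isText, isElem, isRef :: Node -> Bool
isText (TextNode _) = True
isText _            = False
isElem (ElemNode _ _) = True
isElem _              = False
isRef (RefNode _) = True
isRef _           = False

-- Accessors; each is defined only for the matching kind of node.
string :: Node -> String
string (TextNode s) = s
string _            = error "string: not a text node"

tag :: Node -> Tag
tag (ElemNode t _) = t
tag _              = error "tag: not an element node"

children :: Node -> [Node]
children (ElemNode _ ns) = ns
children _               = error "children: not an element node"

dereference :: Node -> Node
dereference (RefNode n) = n
dereference _           = error "dereference: not a reference node"

-- The bibliography example from the data model slide.
bib0 :: Node
bib0 = elem "bib"
  [ elem "book" [ elem "@year" [ text "1999" ], elem "title" [ text "Data on the Web" ] ] ]
```

For example, `tag bib0` evaluates to `"bib"`, and the attribute appears as an ordinary child element tagged `"@year"`, which is how the model merges attribute and element nodes.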

Page 44: Introduction To XML Algebra

44

Nested relational algebra…

In the nested relational approach, data is composed of tuples and lists.

Tuple values and tuple types are written in round brackets.

(1999, "Data on the Web", ["Abiteboul"]) :: (Int, String, [String])

Decompose values:

year :: (Int, String, [String]) -> Int
year (x, y, l) = x

Page 45: Introduction To XML Algebra

45

Nested relational algebra…

Comprehensions: list comprehensions can be used to express fundamental query operations: navigation, Cartesian product, nesting, and joins.

Example:

[ value x | x <- children book0, is "author" x ]
==> [ "Abiteboul" ]

General form: [ exp | qual1, ..., qualn ], where each qualifier is either a filter (bool-exp) or a generator (pat <- list-exp).

Page 46: Introduction To XML Algebra

46

Nested relational algebra…

Using comprehensions to write queries.

Navigate:

follow :: Tag -> Node -> [Node]
follow t x = [ y | y <- children x, is t y ]

Cartesian product:

[ (value y, value z) | x <- follow "book" bib0, y <- follow "title" x, z <- follow "author" x ]
==> [ ("Data on the Web", "Abiteboul") ]
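The slide uses `is`, `value`, `children`, and `bib0` without defining them. The following self-contained sketch supplies plausible definitions so the navigation and Cartesian product queries can be run. The two-constructor `Node` type, the meaning chosen for `value` (the text directly under an element), and the single `author` child added to `bib0` are assumptions made for this example.

```haskell
-- Minimal node type for this sketch (reference nodes omitted).
data Node = Text String | Elem String [Node]
  deriving (Eq, Show)

children :: Node -> [Node]
children (Elem _ ns) = ns
children (Text _)    = []

-- is t x: x is an element tagged t.
is :: String -> Node -> Bool
is t (Elem t' _) = t == t'
is _ _           = False

-- value x: the text directly under x (assumed meaning).
value :: Node -> String
value (Text s)    = s
value (Elem _ ns) = concat [ s | Text s <- ns ]

-- Navigation, exactly as written on the slide.
follow :: String -> Node -> [Node]
follow t x = [ y | y <- children x, is t y ]

-- Sample data; the author child is an assumption so the query has a result.
bib0 :: Node
bib0 = Elem "bib"
  [ Elem "book"
      [ Elem "@year"  [Text "1999"]
      , Elem "title"  [Text "Data on the Web"]
      , Elem "author" [Text "Abiteboul"]
      ]
  ]

-- Cartesian product of titles and authors within each book.
titleAuthorPairs :: [(String, String)]
titleAuthorPairs =
  [ (value y, value z)
  | x <- follow "book" bib0
  , y <- follow "title" x
  , z <- follow "author" x
  ]
```

A book with several `author` children would contribute one pair per author, which is what makes the comprehension a Cartesian product rather than a simple projection.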

Page 47: Introduction To XML Algebra

47

Nested relational algebra…

Joins.

reviews0:

elem "reviews" [
  elem "book" [
    elem "title" [ text "Data on the Web" ],
    elem "review" [ text "This is great!" ] ] ]

bib0:

elem "bib" [
  elem "book" [
    elem "@year" [ text "1999" ],
    elem "title" [ text "Data on the Web" ] ] ]

[ (value y, int (value z), value w) | x <- follow "book" bib0,
                                      y <- follow "title" x,
                                      z <- follow "@year" x,
                                      u <- follow "book" reviews0,
                                      v <- follow "title" u,
                                      w <- follow "review" u,
                                      y == v ]

==> [("Data on the Web", 1999, "This is great!")]
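A runnable version of this join is sketched below. The `Node` type and helper definitions are assumptions, `read` stands in for the slide's `int`, and the equality test `y == v` relies on the derived structural equality of nodes.

```haskell
-- Minimal node type for this sketch (reference nodes omitted).
data Node = Text String | Elem String [Node]
  deriving (Eq, Show)

children :: Node -> [Node]
children (Elem _ ns) = ns
children (Text _)    = []

-- Child elements of x that carry the given tag.
follow :: String -> Node -> [Node]
follow t x = [ y | y@(Elem t' _) <- children x, t' == t ]

-- All text below a node, concatenated (assumed meaning of value).
value :: Node -> String
value (Text s)    = s
value (Elem _ ns) = concatMap value ns

bib0, reviews0 :: Node
bib0 = Elem "bib"
  [ Elem "book" [ Elem "@year" [Text "1999"], Elem "title" [Text "Data on the Web"] ] ]
reviews0 = Elem "reviews"
  [ Elem "book" [ Elem "title" [Text "Data on the Web"], Elem "review" [Text "This is great!"] ] ]

-- Join books with reviews on equal title elements.
joined :: [(String, Int, String)]
joined =
  [ (value y, read (value z), value w)
  | x <- follow "book" bib0
  , y <- follow "title" x
  , z <- follow "@year" x
  , u <- follow "book" reviews0
  , v <- follow "title" u
  , w <- follow "review" u
  , y == v
  ]
```

The join condition is just another qualifier in the comprehension, which is why the nested relational style needs no separate join operator.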

Page 48: Introduction To XML Algebra

48

Nested relational algebra…

Regular expression matching

reg0 :: Reg (Node, Node, [Node])
reg0 = [ (x, y, u) | x <- item "@year", y <- item "title", u <- rep (item "author") ]

match :: Reg a -> Node -> [a]

match reg0 book0

Result:

==> [ ( elem "@year" [text "1999"],
        elem "title" [text "Data on the Web"],
        [ elem "author" [text "Abiteboul"],
          elem "author" [text "Buneman"],
          elem "author" [text "Suciu"] ] ) ]

Page 49: Introduction To XML Algebra

49

Nested relational algebra…

Sorting:

sortBy :: (a -> a -> Bool) -> [a] -> [a]

sortBy (<=) [3,1,2,1] ==> [1,1,2,3]

Grouping:

groupBy :: (a -> a -> Bool) -> [a] -> [[a]]

groupBy (==) [3,1,2,1] ==> [[2],[1,1],[3]]
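These signatures differ from the standard Haskell library: `Data.List.sortBy` takes a comparison returning `Ordering`, and `Data.List.groupBy` only groups adjacent elements. The sketch below shows one way to obtain the behavior described on the slide. The names `sortByLe` and `groupAll` are hypothetical, and the order in which `groupAll` returns its groups (first occurrence) is a choice of this sketch that differs from the order printed on the slide.

```haskell
import qualified Data.List as L

-- Sort with a "less than or equal" predicate, as in the slide's signature.
sortByLe :: (a -> a -> Bool) -> [a] -> [a]
sortByLe le = L.sortBy cmp
  where
    cmp x y
      | le x y && le y x = EQ
      | le x y           = LT
      | otherwise        = GT

-- Group all elements related by eq, not only adjacent ones.
-- Groups are returned in order of first occurrence.
groupAll :: (a -> a -> Bool) -> [a] -> [[a]]
groupAll _  []     = []
groupAll eq (x:xs) = (x : same) : groupAll eq rest
  where
    (same, rest) = L.partition (eq x) xs

sortedExample :: [Int]
sortedExample = sortByLe (<=) [3, 1, 2, 1]

groupedExample :: [[Int]]
groupedExample = groupAll (==) [3, 1, 2, 1]

-- For contrast: the library function groups adjacent elements only.
adjacentExample :: [[Int]]
adjacentExample = L.groupBy (==) [3, 1, 2, 1]
```

Here `groupedExample` collects both occurrences of 1 into one group, whereas `adjacentExample` keeps them apart because they are not neighbors in the input list.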

Page 50: Introduction To XML Algebra

50

Cross Comparisons of Algebra

Niagara and AT&T are standalone XML algebras

Niagara was proposed after W3C had selected its proposed standard, and has operators which operate on sets of bags

The AT&T algebra was chosen as the proposed standard by W3C

-- expressions resemble a high-level query language

-- the latest version of the document is referred to as "Semantics of XML Query Language XQuery"

Page 51: Introduction To XML Algebra

51

Future Work

More evaluation strategies are needed, which would allow for flexible query plans

Develop physical operators that take advantage of physical storage structures, and generate a mapping from the query tree to a physical query plan