
Post on 14-Jan-2016


Structural Joins: A Primitive for Efficient XML Query Pattern Matching

Shurug Al-Khalifa, H. V. Jagadish, Nick Koudas, Jignesh M. Patel, Divesh Srivastava, Yuqing Wu

Presented by

Parag Abhyankar 08305017

Introduction

XQuery path expressions specify patterns of selection predicates on elements that have a tree-structured relationship, e.g. book[title = ‘XML’] // author[. = ‘jane’]

The primitive tree-structured relationships in this example:

Parent-child: (book, title), (title, XML), (author, jane)

Ancestor-descendant: (book, author)

Finding all occurrences of these relationships is a core operation for XML query processing.

Representing XML Elements : (Background)

Element: (DocId, StartPos : EndPos, LevelNum)

String: (DocId, StartPos, LevelNum)

Inspired by the 'Multi-Predicate Merge Join' of Zhang et al.
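
One way to picture this representation is the rough Python sketch below, which numbers a small document in a single traversal. The function name `encode`, the sample document, and the counting scheme (one position per start tag, end tag, and text value) are assumptions made for illustration; the source does not say how positions are assigned. Strings are given EndPos = StartPos so that all rows have the same shape, and tail text after a closing tag is ignored.

```python
import xml.etree.ElementTree as ET


def encode(xml_text, doc_id=1):
    """Return (label, DocId, StartPos, EndPos, LevelNum) rows in document order."""
    root = ET.fromstring(xml_text)
    rows = []
    pos = 0

    def visit(elem, level):
        nonlocal pos
        pos += 1                      # position of the start tag
        start = pos
        slot = len(rows)
        rows.append(None)             # placeholder until EndPos is known
        text = (elem.text or "").strip()
        if text:
            pos += 1                  # a string occupies one position
            rows.append((text, doc_id, pos, pos, level + 1))
        for child in elem:
            visit(child, level + 1)
        pos += 1                      # position of the end tag
        rows[slot] = (elem.tag, doc_id, start, pos, level)

    visit(root, 1)
    return rows


doc = "<book><title>XML</title><author>jane</author></book>"
for row in encode(doc):
    print(row)
```

With this numbering, every descendant's interval lies strictly inside its ancestor's interval, which is the property the join algorithms rely on.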

4

Background continued..

Element e1 = (D1, S1 : E1, L1), element e2 = (D2, S2 : E2, L2)

If D1 = D2, S1 < S2 and E2 < E1, then e1 is an ancestor of e2 (ancestor-descendant).

If, in addition, L1 + 1 = L2, then e1 is the parent of e2 (parent-child).
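
The two conditions translate directly into predicates. The following is a minimal sketch; the `Node` tuple, the function names, and the sample position values are illustrative assumptions, not taken from the slides.

```python
from collections import namedtuple

# Hypothetical node record: (DocId, StartPos, EndPos, LevelNum).
Node = namedtuple("Node", "doc start end level")


def is_ancestor(a, d):
    """True if a strictly contains d within the same document."""
    return a.doc == d.doc and a.start < d.start and d.end < a.end


def is_parent(a, d):
    """True if a contains d and d is exactly one level below a."""
    return is_ancestor(a, d) and a.level + 1 == d.level


# Assumed positions for a document shaped like book(title(XML), author(jane)).
book = Node(1, 1, 8, 1)
title = Node(1, 2, 4, 2)
xml = Node(1, 3, 3, 3)

print(is_ancestor(book, xml), is_parent(book, xml), is_parent(title, xml))
```

Both checks are constant-time comparisons, so a structural relationship can be tested without walking the document tree.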

Structural Joins

Join algorithms for matching structural relationships: tree-merge and stack-tree

Input: lists of tree nodes sorted by (DocId, StartPos)

Output: a list of (ancestor, descendant) pairs that satisfy the desired structural relationship, sorted by either the ancestor or the descendant.

Use in XML query pattern matching:

Decompose the query tree pattern into binary structural relationships.

Match each relationship against the XML database.

‘Stitch’ together the basic matches.

Tree-Merge Join (O/p sorted in ancestor/parent order)

AList and DList are the lists of potential ancestors and descendants, each sorted by (DocId, StartPos).

For every node a in AList: skip all unmatchable d's (those that start before a starts), then output the pair (a, d) for each following d that starts before a ends.
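
The loop described above can be sketched in Python as follows. This is a simplified illustration rather than the algorithm as given in the paper: the `Node` tuple, the function name `tree_merge_anc`, the `parent_child` flag, and the sample positions are all assumptions.

```python
from collections import namedtuple

Node = namedtuple("Node", "label doc start end level")


def tree_merge_anc(alist, dlist, parent_child=False):
    """Join two lists sorted by (doc, start); output is sorted by ancestor."""
    out = []
    begin = 0
    for a in alist:
        # Skip d's that start before a: they cannot match a or any later a.
        while begin < len(dlist) and (dlist[begin].doc, dlist[begin].start) < (a.doc, a.start):
            begin += 1
        j = begin
        # Scan every d that starts before a ends.
        while j < len(dlist) and (dlist[j].doc, dlist[j].start) < (a.doc, a.end):
            d = dlist[j]
            contained = d.doc == a.doc and a.start < d.start and d.end < a.end
            if contained and (not parent_child or a.level + 1 == d.level):
                out.append((a, d))
            j += 1
    return out


# Assumed positions for book(title(XML), author(jane)).
book = Node("Book_1", 1, 1, 8, 1)
title = Node("Title_1", 1, 2, 4, 2)
xml = Node("XML_1", 1, 3, 3, 3)
jane = Node("Jane_1", 1, 6, 6, 3)

pairs = tree_merge_anc([title], [book, xml, jane])
print([(a.label, d.label) for a, d in pairs])
```

Note that the inner scan restarts from `begin` for every ancestor, so nested ancestors rescan the same descendants.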

Example

AList = {Title_1}

DList = {Book_1, XML_1, Jane_1}

For Title_1:

Skip Book_1, as it starts before Title_1.

Pair with XML_1.

Do not consider Jane_1, as it starts after Title_1 ends.

[Figure: document tree with Book as root and children Title (containing XML) and Author (containing Jane). AList: Title_1. DList: Book_1, XML_1, Jane_1.]

Tree-Merge Join: Detailed Algorithm (O/p sorted in ancestor/parent order)

Example

ai pairs with each dj where i <= j <= 2i-1

Worst Case scenario.

Complexity: O(|AList| + |DList| + |OutputList|)

Tree-Merge Join (O/p sorted in descendant order)

AList and DList are the lists of potential ancestors and descendants, each sorted by (DocId, StartPos).

For every node d in DList: skip all unmatchable a's (those that end before d starts), then scan each following a that starts before d starts and output the pair (a, d) if a contains d.
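
A matching Python sketch of the descendant-ordered variant, under the same assumptions as before (the `Node` tuple, the function name `tree_merge_desc`, and the sample positions are illustrative, not from the slides):

```python
from collections import namedtuple

Node = namedtuple("Node", "label doc start end level")


def tree_merge_desc(alist, dlist):
    """Join two lists sorted by (doc, start); output is sorted by descendant."""
    out = []
    begin = 0
    for d in dlist:
        # Skip leading a's that end before d starts: they cannot contain d
        # or any later d.
        while begin < len(alist) and (alist[begin].doc, alist[begin].end) < (d.doc, d.start):
            begin += 1
        i = begin
        # Scan every a that starts before d starts.
        while i < len(alist) and (alist[i].doc, alist[i].start) < (d.doc, d.start):
            a = alist[i]
            if a.doc == d.doc and d.end < a.end:
                out.append((a, d))
            i += 1
    return out


# Assumed positions for book(title(XML), author(jane)).
book = Node("Book_1", 1, 1, 8, 1)
title = Node("Title_1", 1, 2, 4, 2)
xml = Node("XML_1", 1, 3, 3, 3)
jane = Node("Jane_1", 1, 6, 6, 3)

pairs = tree_merge_desc([book, title], [book, xml, jane])
print([(a.label, d.label) for a, d in pairs])
```

The skip step only advances past a's at the head of the list, so an a that ended early but sits behind a long-lived ancestor is still scanned for every later d.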

Example

AList = {Book_1, Title_1}

DList = {Book_1, XML_1, Jane_1}

Book_1: does not have any matching a.

XML_1: pairs with Book_1 and Title_1.

Jane_1: pairs with Book_1. Title_1 is scanned but not paired (Title_1 ends before Jane_1 starts).

[Figure: document tree with Book as root and children Title (containing XML) and Author (containing Jane). AList: Book_1, Title_1. DList: Book_1, XML_1, Jane_1.]

Tree-Merge Join Algorithm (O/p sorted in descendant/child order)

Example

di pairs with ai and a0

Worst Case scenario.

Stack-Tree Desc. (O/p sorted by Descendants)

The stack contains the elements that can still be ancestors of the remaining d's.

Consider elements from AList and DList one by one, in StartPos order:

If the top of the stack cannot be an ancestor of the current a or d, pop it.

Else, if the new a starts before the current d, it is a potential ancestor: push it onto the stack.

Else, the new d pairs with all elements on the stack (bottom to top).
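
The three cases can be sketched in Python as below. This is an illustrative simplification that assumes all nodes belong to a single document; the `Node` tuple, the function name `stack_tree_desc`, and the sample positions are assumptions.

```python
from collections import namedtuple

Node = namedtuple("Node", "label doc start end level")


def stack_tree_desc(alist, dlist):
    """Single-document sketch; inputs sorted by start, output sorted by descendant."""
    out, stack = [], []
    i = j = 0
    while j < len(dlist):             # no output is possible once DList is exhausted
        d = dlist[j]
        a_start = alist[i].start if i < len(alist) else float("inf")
        if stack and stack[-1].end < a_start and stack[-1].end < d.start:
            stack.pop()               # top ended before both current nodes start
        elif a_start < d.start:
            stack.append(alist[i])    # a is nested inside the current top, if any
            i += 1
        else:
            for anc in stack:         # bottom to top: every stacked a contains d
                out.append((anc, d))
            j += 1
    return out


# Assumed positions for book(title(XML), author(jane)).
book = Node("Book_1", 1, 1, 8, 1)
title = Node("Title_1", 1, 2, 4, 2)
xml = Node("XML_1", 1, 3, 3, 3)
jane = Node("Jane_1", 1, 6, 6, 3)

pairs = stack_tree_desc([book, title], [book, xml, jane])
print([(a.label, d.label) for a, d in pairs])
```

Each input node is read once and each a is pushed and popped at most once, so no descendant is rescanned for nested ancestors.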

Stack-Tree Desc. (O/p sorted by Descendants)

Example

AList = {a1, a2, a3, …, an}

DList = {d1, d2, d3, …, d2n}

Current a = a1, current d = d1

Stack: empty (only ai's can go on the stack)

Example continued..

AList = {a2, a3, …, an}

DList = {d1, d2, d3, …, d2n}

As a starts before d, a1 goes onto the stack.

Current a = a2, current d = d1

Stack (top to bottom): a1

Example continued..

AList = {a2, a3, …, an}

DList = {d2, d3, …, d2n}

As d starts before a, d1 pairs with all elements on the stack.

Current a = a2, current d = d2

Stack (top to bottom): a1

Example continued..

AList = {a3, a4, …, an}

DList = {d2, d3, …, d2n}

As a starts before d, a2 goes onto the stack.

Current a = a3, current d = d2

Stack (top to bottom): a2, a1

Example continued..

AList = {a3, a4, …, an}

DList = {d3, d4, …, d2n}

As d starts before a, d2 pairs with all elements on the stack.

Current a = a3, current d = d3

Stack (top to bottom): a2, a1

Example continued..

AList = {}

DList = {dn+2, …, d2n}

Current d = dn+2

dn+2 pops an, as an ends before dn+2 starts.

Stack (top to bottom): an-1, …, a2, a1

Stack-Tree Anc. (O/p sorted by Ancestor)

Tricky: a join result for the top of the stack cannot be added to the output until the join results for its ancestors on the stack have been added.

Two lists are associated with each node on the stack:

self-list: the result elements from the join of this node with the appropriate DList elements.

inherit-list: the join results involving AList elements that were descendants of the current node on the stack.
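
The role of the two lists can be sketched in Python as follows. This is a simplified, single-document illustration and not the algorithm exactly as the paper states it: here every stack entry, including the bottom one, buffers its pairs until it is popped. The `Node` tuple, the function name `stack_tree_anc`, and the sample positions are assumptions.

```python
from collections import namedtuple

Node = namedtuple("Node", "label doc start end level")


def stack_tree_anc(alist, dlist):
    """Single-document sketch; inputs sorted by start, output sorted by ancestor."""
    out, stack = [], []               # stack entries: [node, self_list, inherit_list]
    i = j = 0

    def pop():
        _, self_list, inherit_list = stack.pop()
        merged = self_list + inherit_list      # own pairs precede descendants' pairs
        if stack:
            stack[-1][2].extend(merged)        # hand over to the new top's inherit-list
        else:
            out.extend(merged)                 # bottom of stack: safe to emit

    while j < len(dlist):
        d = dlist[j]
        a_start = alist[i].start if i < len(alist) else float("inf")
        if stack and stack[-1][0].end < a_start and stack[-1][0].end < d.start:
            pop()
        elif a_start < d.start:
            stack.append([alist[i], [], []])
            i += 1
        else:
            for entry in stack:
                entry[1].append((entry[0], d))  # record in each ancestor's self-list
            j += 1
    while stack:                       # flush whatever is still buffered
        pop()
    return out


# Assumed positions for book(title(XML), author(jane)).
book = Node("Book_1", 1, 1, 8, 1)
title = Node("Title_1", 1, 2, 4, 2)
xml = Node("XML_1", 1, 3, 3, 3)
jane = Node("Jane_1", 1, 6, 6, 3)

pairs = stack_tree_anc([book, title], [book, xml, jane])
print([(a.label, d.label) for a, d in pairs])
```

Buffering is what keeps the output in ancestor order: the pair (Title_1, XML_1) is found before (Book_1, Jane_1) but has to wait until all of Book_1's own pairs are known.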

Stack-Tree Anc. (O/p sorted by Ancestor)

Example

AList = {a1, a2, a3, …, an}

DList = {d1, d2, d3, …, d2n}

Current a = a1, current d = d1

Stack: empty (only ai's can go on the stack)

Example continued..

AList = {a2, a3, …, an}

DList = {d1, d2, d3, …, d2n}

As a starts before d, a1 goes onto the stack.

Current a = a2, current d = d1

Stack (top to bottom): a1

Example continued..

AList = {a2, a3, …, an}

DList = {d2, d3, …, d2n}

As d starts before a, d1 pairs with all elements on the stack, and each pair is added to that element's self-list.

Current a = a2, current d = d2

Stack (top to bottom): a1 (SL = d1, IL = empty)

Example continued..

AList = {a3, a4, …, an}

DList = {d2, d3, …, d2n}

As a starts before d, a2 goes onto the stack.

Current a = a3, current d = d2

Stack (top to bottom): a2; a1 (SL = d1, IL = empty)

Example continued..

AList = {a3, a4, …, an}

DList = {d3, d4, …, d2n}

As d starts before a, d2 pairs with all elements on the stack, and each pair is added to that element's self-list.

Current a = a3, current d = d3

Stack (top to bottom): a2 (SL = d2, IL = empty); a1 (SL = d1, d2, IL = empty)

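Example continued..

AList = {}

DList = {dn+2, …, d2n}

Current d = dn+2

dn+2 pops an: an's inherit-list is appended to its self-list, and the result is appended to an-1's inherit-list.

Stack (top to bottom):

an-1 (SL = dn-1; IL = (an, dn), ..)

..

a2 (SL = d2, d3, …; IL = empty)

a1 (SL = d1, d2, ..; IL = empty)

When the last node comes out of the stack, its self-list followed by its inherit-list is appended to OutputList.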

Experimental Evaluation

Results

– STJ-D outperforms other algorithms

• Single pass over i/p nodes, No intermediate file writes

– STJ-A showed better performance than TMJ-A, TMJ-D

– Performance of STJ-A is comparable with TMJs when result size is large.

• STJ-A has to write to intermediate files