Post on 14-Jan-2016
Structural Joins: A Primitive for Efficient XML Query Pattern Matching
Shurug Al-Khalifa, H. V. Jagadish, Nick Koudas, Jignesh M. Patel, Divesh Srivastava, Yuqing Wu
Presented by
Parag Abhyankar (08305017)
2
Introduction
XQuery expressions specify patterns of selection predicates on elements related by tree-structured relationships, e.g. book[title = 'XML']//author[. = 'jane']
The primitive tree-structured relationships are:
Parent-child: (book, title), (title, XML), (author, jane)
Ancestor-descendant: (book, author)
Finding all occurrences of these relationships is a core operation for XML query processing.
3
Representing XML Elements : (Background)
Element: (DocId, StartPos : EndPos, LevelNum)
String: (DocId, StartPos, LevelNum)
Inspired by the 'Multi-Predicate Merge Join' of Zhang et al.
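The numbering scheme can be illustrated with a minimal sketch. It assumes that every start tag, end tag and text word advances one shared position counter; the helper name `number_nodes` and the tuple layout are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def number_nodes(xml_text, doc_id=1):
    """Return (name, DocId, StartPos, EndPos, LevelNum) tuples in StartPos order.

    Assumption: start tags, end tags and text words each consume one position.
    Text words get EndPos = None because strings carry only a StartPos.
    """
    root = ET.fromstring(xml_text)
    rows = []
    pos = 0

    def visit(elem, level):
        nonlocal pos
        pos += 1                      # position of the start tag
        start = pos
        slot = len(rows)
        rows.append(None)             # reserve the slot to keep StartPos order
        for word in (elem.text or "").split():
            pos += 1
            rows.append((word, doc_id, pos, None, level + 1))
        for child in elem:
            visit(child, level + 1)
            # Text that follows a child still belongs to this element.
            for word in (child.tail or "").split():
                pos += 1
                rows.append((word, doc_id, pos, None, level + 1))
        pos += 1                      # position of the end tag
        rows[slot] = (elem.tag, doc_id, start, pos, level)

    visit(root, 1)
    return rows


rows = number_nodes("<book><title>XML</title><author>jane</author></book>")
for row in rows:
    print(row)
```

With this counter, book spans positions 1 to 8, title spans 2 to 4, and the string XML sits at position 3.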
4
Background continued..
Element e1 = (D1, S1 : E1, L1), element e2 = (D2, S2 : E2, L2)
If D1 = D2, S1 < S2 and E2 < E1, then e1 is an ancestor of e2.
If D1 = D2, S1 < S2, E2 < E1 and L1 + 1 = L2, then e1 is the parent of e2.
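The two containment conditions translate directly into predicates. The following is a small sketch; the `Node` type and the sample positions are assumptions made for illustration, and a string node is modelled with its end position equal to its start position.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    doc: int
    start: int
    end: int
    level: int


def is_ancestor(a: Node, d: Node) -> bool:
    # Same document, and the interval of d lies strictly inside that of a.
    return a.doc == d.doc and a.start < d.start and d.end < a.end


def is_parent(a: Node, d: Node) -> bool:
    # Ancestor that is exactly one level above d.
    return is_ancestor(a, d) and a.level + 1 == d.level


# Hypothetical positions for <book><title>XML</title>...</book>
book = Node("book", 1, 1, 8, 1)
title = Node("title", 1, 2, 4, 2)
xml = Node("XML", 1, 3, 3, 3)

print(is_ancestor(book, xml), is_parent(book, xml), is_parent(title, xml))
```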
5
Structural Joins
Join algorithms for matching structural relationships: tree-merge and stack-tree
Input: lists of tree nodes sorted by (DocId, StartPos)
Output: a sorted list of node pairs joined according to the desired structural relationship
Use in XML query pattern matching:
Decompose the query tree pattern into binary structural relationships.
Match each relationship against the XML database.
'Stitch' together the basic matches.
6
Tree-Merge Join (output sorted in ancestor/parent order)
AList and DList are the lists of potential ancestors and descendants, each in sorted order.
For every node a in AList:
Skip all unmatchable d's (those that start before a starts).
Output the pair (a, d) for each following d, as long as d starts before a ends.
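The loop can be sketched as follows for the ancestor-descendant case. This is an illustrative reading of the slide, not the presented pseudocode; the `Node` type, the function name `tree_merge_anc` and the sample positions are assumptions.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    doc: int
    start: int
    end: int
    level: int


def tree_merge_anc(alist, dlist):
    """Ancestor-descendant pairs, sorted in ancestor order.

    Both inputs must be sorted by (doc, start).
    """
    out = []
    begin = 0  # first d that may still match the current or a later a
    for a in alist:
        # Skip d's that start before a: they cannot match a or any later a.
        while begin < len(dlist) and \
                (dlist[begin].doc, dlist[begin].start) < (a.doc, a.start):
            begin += 1
        # Scan forward while d starts before a ends.
        j = begin
        while j < len(dlist) and dlist[j].doc == a.doc \
                and dlist[j].start < a.end:
            d = dlist[j]
            if a.start < d.start and d.end < a.end:
                out.append((a.name, d.name))
            j += 1
    return out


# Hypothetical positions for the Book/Title/Author example.
book = Node("Book_1", 1, 1, 8, 1)
title = Node("Title_1", 1, 2, 4, 2)
xml = Node("XML_1", 1, 3, 3, 3)
jane = Node("Jane_1", 1, 6, 6, 3)

pairs = tree_merge_anc([title], [book, xml, jane])
print(pairs)
```

Note that the inner scan restarts from `begin` for every a, so nested ancestors rescan the same descendants.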
7
Example
AList = {Title_1}
DList = {Book_1, XML_1, Jane_1}
For Title_1:
Book_1 is skipped, as it starts before Title_1.
XML_1 pairs with Title_1.
Jane_1 is not considered, as it starts after Title_1 ends.
[Figure: document tree with nodes Book, Title, XML, Author and Jane, shown next to AList = {Title_1} and DList = {Book_1, XML_1, Jane_1}]
8
Tree-Merge Join: Detailed Algorithm (output sorted in ancestor/parent order)
9
Example
ai pairs with each dj where i <= j <= 2i-1
Worst Case scenario.
Complexity: O(|AList| + |DList| + |OutputList|)
10
Tree-Merge Join (output sorted in descendant/child order)
AList and DList are the lists of potential ancestors and descendants, each in sorted order.
For every node d in DList:
Skip all unmatchable a's (those that end before d starts).
Output the pair (a, d) for each following a that contains d, as long as a starts before d starts.
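A matching sketch for the descendant-ordered variant, again for the ancestor-descendant case only. The `Node` type, the function name `tree_merge_desc` and the sample positions are assumptions for illustration.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    doc: int
    start: int
    end: int
    level: int


def tree_merge_desc(alist, dlist):
    """Ancestor-descendant pairs, sorted in descendant order.

    Both inputs must be sorted by (doc, start).
    """
    out = []
    begin = 0  # first a that may still match the current or a later d
    for d in dlist:
        # Skip the leading a's that end before d starts.
        while begin < len(alist) and \
                (alist[begin].doc, alist[begin].end) < (d.doc, d.start):
            begin += 1
        # Scan forward while a starts before d starts.
        i = begin
        while i < len(alist) and alist[i].doc == d.doc \
                and alist[i].start < d.start:
            a = alist[i]
            if d.end < a.end:  # a must also end after d
                out.append((a.name, d.name))
            i += 1
    return out


# Hypothetical positions for the Book/Title/Author example.
book = Node("Book_1", 1, 1, 8, 1)
title = Node("Title_1", 1, 2, 4, 2)
xml = Node("XML_1", 1, 3, 3, 3)
jane = Node("Jane_1", 1, 6, 6, 3)

pairs = tree_merge_desc([book, title], [book, xml, jane])
print(pairs)
```

In this run Title_1 is still examined for Jane_1, because only a leading run of ended a's can be skipped and Book_1 precedes it. Such repeated, fruitless checks are what make the tree-merge scans expensive on unfavourable inputs.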
11
Example
AList = {Book_1, Title_1}
DList = {Book_1, XML_1, Jane_1}
Book_1 does not have any matching a.
XML_1 pairs with Book_1 and Title_1.
Jane_1 pairs with Book_1 but not with Title_1 (as Title_1 ends before Jane_1 starts).
[Figure: document tree with nodes Book, Title, XML, Author and Jane, shown next to AList = {Book_1, Title_1} and DList = {Book_1, XML_1, Jane_1}]
12
Tree-Merge Join Algorithm (output sorted in descendant/child order)
13
Example
di pairs with ai and a0
Worst Case scenario.
14
Stack-Tree-Desc (output sorted by descendant)
The stack contains the elements that can still be ancestors of the remaining d's.
Consider the elements of AList and DList one by one, in order of StartPos:
If the top of the stack cannot be an ancestor of the new element, pop it.
If the new element is an a, it is a potential ancestor, so push it onto the stack.
Otherwise the new element is a d, which pairs with all elements on the stack (bottom to top).
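The stack discipline can be sketched as below for ancestor-descendant joins. It is a simplified illustration under stated assumptions (the `Node` type, the function name `stack_tree_desc` and the sample positions are hypothetical), not a transcription of the presented pseudocode.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    doc: int
    start: int
    end: int
    level: int


def stack_tree_desc(alist, dlist):
    """Ancestor-descendant pairs, sorted in descendant order.

    Both inputs must be sorted by (doc, start). Each input node is read once.
    """
    out, stack = [], []
    i = j = 0
    while j < len(dlist):  # once DList is exhausted, no more output is possible
        d = dlist[j]
        # Pick whichever input node starts first.
        take_a = i < len(alist) and \
            (alist[i].doc, alist[i].start) < (d.doc, d.start)
        nxt = alist[i] if take_a else d
        # Pop stack entries that ended before the new node starts.
        while stack and \
                (stack[-1].doc, stack[-1].end) < (nxt.doc, nxt.start):
            stack.pop()
        if take_a:
            stack.append(nxt)  # nested inside everything still on the stack
            i += 1
        else:
            # Every remaining stack entry contains d; emit bottom to top.
            out.extend((a.name, d.name) for a in stack)
            j += 1
    return out


# Hypothetical positions for the Book/Title/Author example.
book = Node("Book_1", 1, 1, 8, 1)
title = Node("Title_1", 1, 2, 4, 2)
xml = Node("XML_1", 1, 3, 3, 3)
jane = Node("Jane_1", 1, 6, 6, 3)

pairs = stack_tree_desc([book, title], [book, xml, jane])
print(pairs)
```

For parent-child joins, the same loop could additionally compare level numbers before emitting a pair.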
15
Stack-Tree-Desc (output sorted by descendant)
16
Example
AList = {a1, a2, a3, …, an}, DList = {d1, d2, d3, …, d2n}
Cursors: a = a1, d = d1
Stack: empty (only the ai's can go on the stack)
17
Example continued..
AList = {a2, a3, …, an}, DList = {d1, d2, d3, …, d2n}
As a1 starts before d1, a1 goes onto the stack.
Cursors: a = a2, d = d1
Stack: a1
18
Example continued..
AList = {a2, a3, …, an}, DList = {d2, d3, …, d2n}
As d1 starts before a2, d1 pairs with all elements on the stack.
Cursors: a = a2, d = d2
Stack: a1
19
Example continued..
AList = {a3, …, an}, DList = {d2, d3, …, d2n}
As a2 starts before d2, a2 goes onto the stack.
Cursors: a = a3, d = d2
Stack (top to bottom): a2, a1
20
Example continued..
AList = {a3, …, an}, DList = {d3, …, d2n}
As d2 starts before a3, d2 pairs with all elements on the stack.
Cursors: a = a3, d = d3
Stack (top to bottom): a2, a1
21
Example continued..
AList = {}, DList = {dn+2, …, d2n}
Cursor: d = dn+2
dn+2 pops an, as an ends before dn+2 starts.
Stack (top to bottom): an-1, …, a2, a1 (new top: an-1)
22
Stack-Tree-Anc (output sorted by ancestor)
Tricky: a join result for the top of the stack cannot be added to the output until the join results of its ancestors on the stack have been added.
Two lists are associated with each node on the stack:
self-list: the result elements from the join of this node with the appropriate DList elements.
inherit-list: the join results involving AList elements that were descendants of the current node on the stack.
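The role of the two lists can be sketched as follows. This is an illustrative version for ancestor-descendant joins that keeps the lists in memory; the `Node` type, the function name `stack_tree_anc` and the sample positions are assumptions.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    doc: int
    start: int
    end: int
    level: int


def stack_tree_anc(alist, dlist):
    """Ancestor-descendant pairs, sorted in ancestor order.

    Both inputs must be sorted by (doc, start).
    Each stack entry is [node, self_list, inherit_list].
    """
    out, stack = [], []
    i = j = 0

    def pop():
        _node, self_list, inherit_list = stack.pop()
        results = self_list + inherit_list  # own pairs first, then descendants'
        if stack:
            stack[-1][2].extend(results)    # hand over to the new top's inherit-list
        else:
            out.extend(results)             # bottom of the stack: safe to output

    while j < len(dlist):
        d = dlist[j]
        take_a = i < len(alist) and \
            (alist[i].doc, alist[i].start) < (d.doc, d.start)
        nxt = alist[i] if take_a else d
        # Pop entries that ended before the new node starts.
        while stack and \
                (stack[-1][0].doc, stack[-1][0].end) < (nxt.doc, nxt.start):
            pop()
        if take_a:
            stack.append([nxt, [], []])
            i += 1
        else:
            for entry in stack:             # d joins with every stack entry
                entry[1].append((entry[0].name, d.name))
            j += 1
    while stack:                            # flush what is left
        pop()
    return out


# Hypothetical positions for the Book/Title/Author example.
book = Node("Book_1", 1, 1, 8, 1)
title = Node("Title_1", 1, 2, 4, 2)
xml = Node("XML_1", 1, 3, 3, 3)
jane = Node("Jane_1", 1, 6, 6, 3)

pairs = stack_tree_anc([book, title], [book, xml, jane])
print(pairs)
```

Here the pair for Title_1 is produced before (Book_1, Jane_1) but is held in Book_1's inherit-list, so all of Book_1's own pairs still come first in the output.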
23
Stack-Tree-Anc (output sorted by ancestor)
24
Example
AList = {a1, a2, a3, …, an}, DList = {d1, d2, d3, …, d2n}
Cursors: a = a1, d = d1
Stack: empty (only the ai's can go on the stack)
25
Example continued..
AList = {a2, a3, …, an}, DList = {d1, d2, d3, …, d2n}
As a1 starts before d1, a1 goes onto the stack.
Cursors: a = a2, d = d1
Stack: a1
26
Example continued..
AList = {a2, a3, …, an}, DList = {d2, d3, …, d2n}
As d1 starts before a2, d1 pairs with all elements on the stack and is added to their self-lists.
Cursors: a = a2, d = d2
Stack: a1 (SL = {d1}, IL = {})
27
Example continued..
AList = {a3, …, an}, DList = {d2, d3, …, d2n}
As a2 starts before d2, a2 goes onto the stack.
Cursors: a = a3, d = d2
Stack (top to bottom): a2, a1
a1: SL = {d1}, IL = {}
28
Example continued..
AList = {a3, …, an}, DList = {d3, …, d2n}
As d2 starts before a3, d2 pairs with all elements on the stack and is added to their self-lists.
Cursors: a = a3, d = d3
Stack (top to bottom): a2, a1
a1: SL = {d1, d2}, IL = {}
a2: SL = {d2}, IL = {}
29
Example continued..
AList = {}, DList = {dn+2, …, d2n}
Cursor: d = dn+2
dn+2 pops an: an's IL is appended to its SL, and the result is appended to an-1's IL.
Stack (top to bottom): an-1, …, a2, a1 (new top: an-1)
a1: SL = {d1, d2, …}, IL = {}
a2: SL = {d2, d3, …}, IL = {}
an-1: SL = {dn-1, …}, IL = {(an, dn), …}
The last node to come off the stack appends its SL and then its IL to the output list.
30
Experimental Evaluation
Results
– STJ-D outperforms the other algorithms
• Single pass over the input nodes, no intermediate file writes
– STJ-A performed better than TMJ-A and TMJ-D
– Performance of STJ-A is comparable with the TMJs when the result size is large
• Due to writing to intermediate files