Haris Georgiadis, Minas Charalambides, Vasilis Vassalos
Athens University of Economics and Business
Efficient Physical Operators for a cost-based XPath Execution Engine
Motivation (1)
XPath query: /s/r/*/it[mb/m/to=‘x’]//k
Three navigation alternatives (among others):
• Straightforward navigation: retrieve all it elements under /s/r/*/it; keep those having at least one to descendant under /mb/m/to with text value ‘x’. For the it elements left, return their k descendants.
• Starting from k: return all k elements with at least one it ancestor, which in turn:
  • has a to descendant under /mb/m/to with text value ‘x’, and
  • has the s document element as an ancestor via the relative path parent::*/parent::r/parent::s.
• Starting from to: return all to elements under /s/r/*/it/mb/m/to, keep only those with text value ‘x’, then go backward via parent::m/parent::mb/parent::it and, for the it elements left, return their k descendants.
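The three alternatives can be checked against each other on a toy document. The sketch below is only an illustration, assuming Python's standard xml.etree.ElementTree; the sample XML, the parent map and the helper names (seq, has_x, under_s_r_any, forward, from_k, from_to) are hypothetical and not part of GeCOEX.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; "g" stands in for the wildcard step.
XML = """<s><r>
<g><it><mb><m><to>x</to></m></mb><d><k>k1</k></d></it></g>
<g><it><mb><m><to>y</to></m></mb><k>k2</k></it></g>
</r></s>"""

root = ET.fromstring(XML)                                  # the s document element
parent = {child: p for p in root.iter() for child in p}    # child -> parent
order = {e: i for i, e in enumerate(root.iter())}          # document order

def seq(elems):
    """Duplicate-free list of elements in document order."""
    return sorted(set(elems), key=order.get)

def has_x(it):
    """Predicate [mb/m/to='x'] evaluated on an it element."""
    return any(to.text == "x" for to in it.findall("mb/m/to"))

def under_s_r_any(it):
    """Backward check parent::*/parent::r/parent::s, s being the document element."""
    g = parent.get(it)
    r = parent.get(g)
    s = parent.get(r)
    return r is not None and r.tag == "r" and s is root and s.tag == "s"

def ancestors(e):
    while e in parent:
        e = parent[e]
        yield e

def forward():
    """Straightforward navigation: it elements first, then filter, then k."""
    return seq(k for it in root.findall("r/*/it") if has_x(it) for k in it.iter("k"))

def from_k():
    """Start from every k and look upward for a qualifying it ancestor."""
    return seq(k for k in root.iter("k")
               if any(a.tag == "it" and has_x(a) and under_s_r_any(a)
                      for a in ancestors(k)))

def from_to():
    """Start from the to elements, filter on text, then navigate backward."""
    tos = [to for to in root.findall("r/*/it/mb/m/to") if to.text == "x"]
    its = [parent[parent[parent[to]]] for to in tos]   # parent::m/parent::mb/parent::it
    return seq(k for it in its for k in it.iter("k"))

print([k.text for k in forward()], [k.text for k in from_k()], [k.text for k in from_to()])
```

All three functions return the same k elements; which one is cheapest depends on how many it, k and to elements the document holds, which is the point of choosing a plan by cost.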
Motivation (2)
• Many XPath processing algorithms
  • PPFS+, Staircase Join, sort-merge-based structural joins, PathStack, Twig2Stack, etc.
• Many physical data models and storage techniques:
  • Shredding on relations: schema-based mapping vs. edge-based mapping
  • Storage into disk pages preserving the XML hierarchy
  • Structural encodings: region encoding vs. prefix-based encoding
  • Data structures: XB-trees, F&B Index, path indexes
Contribution I
• GeCOEX: the first generic XPath cost-based execution and optimization framework
  • Agnostic to the underlying XML storage system and the access methods it supports
  • Independent of the techniques and algorithms available for XPath processing, which are encapsulated in operator implementations and rewriting rules
  • Cost-based optimization
Contribution II
• XPalgebra: a novel XPath logical algebra
  • Good fit with many XPath processing techniques
• Lookup and SM: two novel and efficient families of physical operators for XPath
• Multiple storage engines
• Experimental evaluation: direct comparison of operator implementations
GeCOEX System Architecture
[Figure: GeCOEX architecture diagram. Phases: Query Optimization, Query Execution. Components: Parser, Physical Plan Selector, Physical Plan Executor, Rewriting Rules, Physical Operators, Physical Operator Descriptors, Cost Models, Primitive Access Methods, Primitive Access Method Cost Models, Database Statistics, XPA API, XPA Driver, Data Model. Input: XPath query. Output: result.]
XPalgebra
• Generic sequence-based logical algebra for a subset of XPath
  • Forward and backward axes
  • Non-positional predicates involving conjunctive boolean expressions
• Maintains the navigational nature of XPath
• Data Model
  • Element
  • Sequence: duplicate-free list of elements in document order
• Sequence Operators: (mainly) navigation
  • Input and output: Sequence
• Boolean Operators: used for filtering
  • Input: Element
  • Output: True or False
XPalgebra – Sequence Operators
• Both the input and the output of a Sequence operator are sequences of nodes
• The input sequence is called the context sequence
• BoolExpr: const | Ъ1 ^ Ъ2 ^ … ^ Ъn, where Ъi is a Boolean Operator
XPalgebra – Boolean Operators
• Applied on single nodes only
• The input element is called the context element
• Return boolean values
• BoolExpr: const | Ъ1 ^ Ъ2 ^ … ^ Ъn, where Ъi is a Boolean Operator
• Example: f(S, Ъfp/d//c) for the predicate …[d//c]
XPalgebra - examples
/s/r/*/it[mb/m/to=‘x’]//k
d_k( f( fp_{/s/r/*/it}(root), Ъfp_{/mb/m/to}( Ъvf_{text()=x} ) ) )
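The nesting of sequence and boolean operators in this example can be mimicked with plain functions. The following is a rough sketch, assuming Python's xml.etree.ElementTree as a stand-in for real storage; the function names (fp, d, f, b_fp, b_vf), the virtual "#document" node and the sample XML are illustrative assumptions, not the XPalgebra implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document under a virtual document node (the "root" context).
doc = ET.Element("#document")
doc.append(ET.fromstring(
    "<s><r>"
    "<g><it><mb><m><to>x</to></m></mb><d><k>k1</k></d></it></g>"
    "<g><it><mb><m><to>y</to></m></mb><k>k2</k></it></g>"
    "</r></s>"))
order = {e: i for i, e in enumerate(doc.iter())}

def seq(elems):
    """A Sequence: duplicate-free list of elements in document order."""
    return sorted(set(elems), key=order.get)

# Sequence operators: Sequence in, Sequence out.
def fp(path, S):
    """Forward path from every context element."""
    return seq(x for e in S for x in e.findall(path))

def d(tag, S):
    """Descendants with the given tag."""
    return seq(x for e in S for x in e.iter(tag) if x is not e)

def f(S, bool_expr):
    """Filter: keep the context elements for which bool_expr is true."""
    return [e for e in S if bool_expr(e)]

# Boolean operators: context element in, True/False out.
def b_fp(path, inner=lambda e: True):
    """True if some element reached via path satisfies inner."""
    return lambda e: any(inner(x) for x in e.findall(path))

def b_vf(value):
    """Value filter on the text of the context element."""
    return lambda e: e.text == value

# /s/r/*/it[mb/m/to='x']//k written as nested operators
result = d("k", f(fp("s/r/*/it", [doc]), b_fp("mb/m/to", b_vf("x"))))
print([k.text for k in result])
```

Keeping boolean operators as element-to-bool functions mirrors the split stated on the slides: sequence operators navigate, boolean operators only filter.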
Physical Operators
• Implement the Sequence interface of the XPA API
• Access the XML data using the AccessMethods interface of the XPA API
• Example: a physical operator implementation
• That is how physical operators are agnostic to the physical data model
Physical Operators
Large number of physical operators, divided roughly into four ‘families’:
• Lookup operators (LU): inspired by indexed nested loops join
  • d^LU_a: for each element n from the input sequence S, make a lookup using XPAAPI.Descs(n, a)
• SortMerge-based operators (SM): inspired by sort-merge join
  • d^SM_a: scan all elements from the input sequence S and all a elements (using XPAAPI.Descs(root, a)) and find ‘ancestor-descendant’ matches
• Staircase Join operators [Grust 2003]
• PathStack operators [Bruno 2002]
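To make the Lookup idea concrete, here is a minimal sketch of a descendant lookup operator written only against an access-methods interface. Everything except the name Descs is an assumption made for illustration: the Node tuple with pre/post numbers, the AccessMethods protocol, the InMemoryDriver class and the d_lu function are hypothetical, and the real XPA API is not shown in the slides.

```python
from typing import Iterable, Iterator, NamedTuple, Protocol

class Node(NamedTuple):
    pre: int    # preorder rank
    post: int   # postorder rank
    tag: str

class AccessMethods(Protocol):
    """Hypothetical slice of an access-methods interface."""
    def descs(self, n: Node, tag: str) -> Iterable[Node]: ...

class InMemoryDriver:
    """Toy driver: a real one would answer descs() from an index."""
    def __init__(self, nodes: Iterable[Node]) -> None:
        self.nodes = sorted(nodes)          # sorted by pre, i.e. document order

    def descs(self, n: Node, tag: str) -> Iterable[Node]:
        return [x for x in self.nodes
                if x.tag == tag and n.pre < x.pre and x.post < n.post]

def d_lu(S: Iterable[Node], tag: str, am: AccessMethods) -> Iterator[Node]:
    """Descendant lookup: one Descs() call per outermost context element.

    A context element nested inside the previous outermost one is skipped,
    because its descendants are already covered; the output therefore stays
    duplicate-free and in document order.
    """
    last = None
    for n in S:                              # S is in document order
        if last is not None and last.pre < n.pre and n.post < last.post:
            continue
        last = n
        yield from am.descs(n, tag)

# Toy tree: r( b1( a1, c1( a2 ) ), b2, a3 )
r, b1, a1, c1, a2, b2, a3 = (
    Node(0, 6, "r"), Node(1, 3, "b"), Node(2, 0, "a"), Node(3, 2, "c"),
    Node(4, 1, "a"), Node(5, 4, "b"), Node(6, 5, "a"))
driver = InMemoryDriver([r, b1, a1, c1, a2, b2, a3])
print(list(d_lu([b1, c1, b2], "a", driver)))
```

The operator never touches the encoding-specific storage itself; swapping InMemoryDriver for another driver that implements descs() would leave d_lu unchanged.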
Physical Operators
Operators per family (columns: LU*, SM*, Staircase [Grust 2003], PathStack [Bruno 2002]):
c (child) **
d (descendant)
fp (forward path) **
p (parent) X **
a (ancestor) **
bp (backward path) ** X
cs (cousin) X X
**: inspired by original
5 XML Storage Systems and their XPA drivers
[Figure: the GeCOEX architecture diagram again, now including the XML storage system]
• The PE-basic native XML storage system: Dewey encoding, 1 B-Tree per tag name
• The RE-basic native XML storage system: Pre/Post/Level encoding, 1 B-Tree per tag name
• The PE-Path native XML storage system: Dewey encoding, 1 B-Tree per tag name, Paths B-Tree
• The RE-Path native XML storage system: Pre/Post/Level encoding, 1 B-Tree per tag name, Paths B-Tree
• The Edge-RE native XML storage system: Pre/Post/Level encoding, 1 B-Tree for all elements
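The two encodings answer structural questions differently. As a small illustration, assuming the standard definitions of Dewey labels and pre/post/level regions, the ancestor and parent tests could look as follows; the Region type and the function names are hypothetical and do not come from any of the five systems.

```python
from typing import NamedTuple, Tuple

class Region(NamedTuple):
    pre: int     # preorder rank
    post: int    # postorder rank
    level: int   # depth, with the document element at level 0

def is_ancestor_region(a: Region, d: Region) -> bool:
    """a starts before d and ends after d."""
    return a.pre < d.pre and d.post < a.post

def is_parent_region(a: Region, d: Region) -> bool:
    return is_ancestor_region(a, d) and d.level == a.level + 1

Dewey = Tuple[int, ...]

def is_ancestor_dewey(a: Dewey, d: Dewey) -> bool:
    """a is a proper prefix of d."""
    return len(a) < len(d) and d[:len(a)] == a

def is_parent_dewey(a: Dewey, d: Dewey) -> bool:
    return len(d) == len(a) + 1 and d[:-1] == a

# Toy chain r / b / c in both encodings
r_re, b_re, c_re = Region(0, 2, 0), Region(1, 1, 1), Region(2, 0, 2)
r_pe, b_pe, c_pe = (1,), (1, 1), (1, 1, 1)
print(is_ancestor_region(r_re, c_re), is_parent_region(r_re, c_re))
print(is_ancestor_dewey(r_pe, c_pe), is_parent_dewey(r_pe, c_pe))
```

With a prefix encoding the labels of all ancestors are contained in the label of the node itself, which is what makes the cheap backward navigation discussed later in the deck possible; a region encoding needs another element's numbers to compare against.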
Lookup Operators
• Novel efficient algorithms for holistically evaluating forward and backward multi-step paths
• Based on root-to-node filtering
• Buffered-leaping: a new technique for pipelined duplicate elimination and document order preservation
  • Search a minimum window of elements for each element in the context sequence
  • window: the result of calling the method of the AccessMethods interface of the XPA API (e.g. Descs(), Ancs()) that corresponds to the XPath axis (e.g. descendant, ancestor) for a given context element
  • The size of the chain at any time is very small and upper-bounded by the depth of the XML document
Example: fpLU/c/f
[Figure: example XML tree rooted at r, with b elements b1 to b9, f elements f1 to f17, and c, d and e elements in between]
Trace (state: rootAnc, contextEl, chain). Each of the seven next() calls returns one output element: f1, f3, f5, f7, f12, f13, f17.
rootAnc = b1, contextEl = b2: b2 is not a descendant of b1; window = XPAPI.Descs(b1, ‘f’)
• regExprFilter(f1.getRTNPath(), /c//f, 1) = true: output f1
• regExprFilter(f2.getRTNPath(), /c//f, 1) = false
• regExprFilter(f3.getRTNPath(), /c//f, 1) = true: output f3
rootAnc = b2, contextEl = b3: b3 is not a descendant of b2; window = XPAPI.Descs(b2, ‘f’)
• regExprFilter(f4.getRTNPath(), /c//f, 1) = false
• regExprFilter(f5.getRTNPath(), /c//f, 1) = true: output f5
rootAnc = b3: b5 is a descendant of b3, b7 is a descendant of b3 (chain: b5, b7), b9 is not a descendant of b3; window = XPAPI.Descs(b3, ‘f’)
• f6: descendant of b3 and regExprFilter(f6.getRTNPath(), /c//f, 1) = false; descendant of b5 and regExprFilter(f6.getRTNPath(), /c//f, 3) = false; not a descendant of b7
• f7: descendant of b3 and regExprFilter(f7.getRTNPath(), /c//f, 1) = false; descendant of b5 and regExprFilter(f7.getRTNPath(), /c//f, 3) = true: output f7
• f8: descendant of b3 and regExprFilter(f8.getRTNPath(), /c//f, 1) = false; not a descendant of b5; not a descendant of b7
• f9, f10, f11: again not reachable from any of b3, b5, b7 via /c//f
• f12 is reachable from b7 via /c//f: output f12
• f13 is reachable from b7 via /c//f: output f13
rootAnc = b9, contextEl = null: the context sequence is exhausted; window = XPAPI.Descs(b9, ‘f’)
• f16 is not reachable from b9 via /c//f
• f17 is reachable from b9 via /c//f: output f17
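The walk-through above can be approximated in a few lines. The sketch below is a simplified reading of it under stated assumptions, not the authors' algorithm: elements are modelled as Dewey label plus root-to-node tag path, descs() scans an in-memory list, only child (/) and descendant (//) steps are supported, and the names Element, el, path_regex, reg_expr_filter and fp_lookup are hypothetical. It performs one window lookup per outermost context element and keeps a chain of nested context elements that are ancestors of the current window element.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    name: str      # label used only for printing
    dewey: tuple   # Dewey label; tuple order equals document order
    rtn: tuple     # root-to-node tag path

def el(name, dewey, path):
    return Element(name, tuple(int(x) for x in dewey.split(".")),
                   tuple(path.strip("/").split("/")))

def is_desc(d, a):
    return len(d.dewey) > len(a.dewey) and d.dewey[:len(a.dewey)] == a.dewey

def descs(store, n, tag):
    """Stand-in for Descs(n, tag): descendants of n with that tag, in document order."""
    return sorted((e for e in store if e.rtn[-1] == tag and is_desc(e, n)),
                  key=lambda e: e.dewey)

def path_regex(path):
    """Turn a path such as /c//f into a regex over /tag/tag/... strings."""
    out = ""
    for sep, tag in re.findall(r"(//|/)([^/]+)", path):
        out += ("(?:/[^/]+)*" if sep == "//" else "") + "/" + re.escape(tag)
    return re.compile(out)

def reg_expr_filter(e, pattern, level):
    """Match the part of e's root-to-node path below the given level."""
    return pattern.fullmatch("".join("/" + t for t in e.rtn[level + 1:])) is not None

def fp_lookup(store, context, path):
    pattern, tag = path_regex(path), path.rsplit("/", 1)[-1]
    i, n = 0, len(context)
    while i < n:
        root_anc = context[i]; i += 1
        chain = [root_anc]
        for e in descs(store, root_anc, tag):          # the window
            # pull nested context elements that precede e in document order
            while i < n and is_desc(context[i], root_anc) and context[i].dewey < e.dewey:
                c = context[i]; i += 1
                while not is_desc(c, chain[-1]):
                    chain.pop()
                chain.append(c)
            while not is_desc(e, chain[-1]):           # keep only ancestors of e
                chain.pop()
            if any(reg_expr_filter(e, pattern, len(c.dewey) - 1) for c in chain):
                yield e
        while i < n and is_desc(context[i], root_anc):  # nested, nothing left to match
            i += 1

# Toy data (only b and f elements are listed; c and d exist only in the paths).
b1, b2, b3 = el("b1", "1.1", "/r/b"), el("b2", "1.2", "/r/b"), el("b3", "1.2.1.1", "/r/b/c/b")
store = [b1, b2, b3,
         el("f1", "1.1.1.1", "/r/b/c/f"), el("f2", "1.1.2", "/r/b/f"),
         el("f3", "1.2.1.1.1.1", "/r/b/c/b/c/f"), el("f4", "1.2.1.2.1", "/r/b/c/d/f")]
print([e.name for e in fp_lookup(store, [b1, b2, b3], "/c//f")])
```

In this toy data f3 is reachable from both b2 and b3, yet it is emitted once: b3 is nested in b2, so it joins the chain of b2's window instead of triggering a second lookup. Since the chain only ever holds ancestors of the current window element, its length cannot exceed the document depth.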
Example: bpLU parent::c/ancestor::b
[Figure: the same example XML tree as in the fpLU example, rooted at r, with b elements b1 to b9 and f elements f1 to f17]
Trace (state: contextEl, sortedElements)
• reverseOf(parent::c/ancestor::b) = /c//f
• One window per context element: window = XPAPI.Ancs(f2, ‘b’), then likewise for f3, f5, f6, f8 and f11
• V: regExprFilter(f3.getRTNPath(), /c//f, 1) = true
• Checks shown along the trace: f3 is a descendant of b1; f5 is not a descendant of b1; f6 is not a descendant of b2; f8 is a descendant of b3; f11 is a descendant of b3
• Virtual elements shown in sortedElements: b1#, b2#, b3#, b4#, b5#, b7#
• Elements shown next to next() calls: b1, b2, b4; contextEl ends as null
Cheap implementation of Ancs() in the PE-Path driver
• Dewey(f2) = 1.1.2.1.1, RTN(f2) = /r/b/c/f ⇒ there is a ‘b’ ancestor b’ at level 2
• ⇒ Dewey(b’) = substr(Dewey(f2), …) = 1.1 and RTN(b’) = substr(RTN(f2), …) = /r/b
• Ancs() outputs n without actually retrieving b1 from the database; n is the virtual representation of b1, denoted as #b1
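The prefix trick behind the cheap Ancs() can be sketched as follows. This is an illustration under assumptions, not the PE-Path driver: it assumes that the Dewey label and the root-to-node path of an element have one component per level, and the function name virtual_ancs and the sample labels are hypothetical.

```python
def virtual_ancs(dewey, rtn, tag):
    """Derive the ancestors with the given tag from the element's own labels.

    dewey: tuple of ints, one component per level (assumption)
    rtn:   tuple of tag names from the document element down to the element
    Returns (dewey, rtn) pairs in document order (outermost ancestor first),
    without reading any ancestor from storage.
    """
    if len(dewey) != len(rtn):
        raise ValueError("expected one Dewey component per path step")
    return [(dewey[:i + 1], rtn[:i + 1])
            for i in range(len(rtn) - 1)        # the last step is the element itself
            if rtn[i] == tag]

def show(label):
    dewey, rtn = label
    return ".".join(map(str, dewey)) + " " + "/" + "/".join(rtn)

# Hypothetical element with path /r/b/c/b/c/f
f = ((1, 2, 1, 1, 1, 1), ("r", "b", "c", "b", "c", "f"))
for anc in virtual_ancs(*f, "b"):
    print(show(anc))
```

Each returned pair plays the role of a virtual element: it carries enough information (label and path) for later filtering and ordering, so the stored ancestor only needs to be fetched if the plan actually requires its content.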
SM Operators
• Inspired by sort-merge join algorithms
• Traverse two sequences of elements, left and right
  • left: the context sequence (the input sequence)
  • right: always consists of all the elements of the requested tag name
• Keeping track of the current elements on left and right, try to find matching pairs according to the appropriate navigation axis and condition
• Novel techniques for holistic SM-based forward path and backward path operators with guaranteed low memory requirements
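For the simplest case, the descendant axis, the merge over left and right can be sketched in one pass. This is a minimal illustration assuming a pre/post region encoding and both inputs sorted in document order; it is not the holistic forward or backward path operator mentioned above, and the names Node and d_sm are hypothetical.

```python
from typing import Iterable, List, NamedTuple

class Node(NamedTuple):
    pre: int    # preorder rank
    post: int   # postorder rank
    name: str

def d_sm(left: Iterable[Node], right: Iterable[Node]) -> List[Node]:
    """Return the right elements that have an ancestor in left.

    left:  the context sequence, in document order
    right: all elements of the requested tag name, in document order
    Only the largest post value among the left elements already passed is
    kept, so the memory used beyond the output is constant.
    """
    left = list(left)
    out, i, max_post = [], 0, -1
    for r in right:
        # consume every left element that starts before r
        while i < len(left) and left[i].pre < r.pre:
            max_post = max(max_post, left[i].post)
            i += 1
        # some consumed left element ends after r, so it encloses r
        if max_post > r.post:
            out.append(r)
    return out

# Toy tree: r( b1( a1, c1( a2 ) ), b2, a3 )
b1, a1, c1 = Node(1, 3, "b1"), Node(2, 0, "a1"), Node(3, 2, "c1")
a2, b2, a3 = Node(4, 1, "a2"), Node(5, 4, "b2"), Node(6, 5, "a3")
print([n.name for n in d_sm([b1, c1, b2], [a1, a2, a3])])
```

a2 lies under both b1 and c1 but is reported once, because the merge asks only whether some enclosing context element exists; the output inherits the document order of right.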
Performance Comparison
[Charts: sensitivity to context selectivity for the descendant, ancestor and forward path operators]
Conclusions I
• Novel techniques for evaluating forward and backward multi-step paths
  • Pipelined duplicate elimination and document order preservation
  • Lookup fp, Lookup bp, Lookup cs, SM fp, SM bp, SM cs
• Fast backwards navigation that fully exploits the capabilities of the underlying storage system
• Algorithms perform well across a variety of different physical storage models
• First steps towards building cost models for XPath
Conclusions II
• Operator-based XPath processing provides significant optimization opportunities
• Different implementations of logical operators can provide benefits in different circumstances
  • E.g. context selectivity
• Query plans can be much more efficient than (existing) monolithic (twig) techniques in most circumstances
Thank you!