Xpath Query Evaluation
Goal
• Evaluating an Xpath query against a given document
– To find all matches
• We will also consider the use of types
• Complexity is important
– Documents can be huge
Data complexity vs. Combined Complexity
• Two inputs to the query evaluation problem
– Data (XML document) of size |D|
– Query (Xpath expression) of size |Q|
– Usually |Q| << |D|
• Polynomial data complexity
– Complexity that is polynomial in |D| for a fixed query, possibly exponential in |Q|
• Polynomial combined complexity
– Complexity that is polynomial in both |D| and |Q|
• Fixed-parameter tractable (FPT) complexity
– Complexity f(|Q|)*Poly(|D|), where f depends only on the query
Xpath Query Evaluation
• Input: XML Document D, Xpath query Q
• Output: A subset of the nodes of D, as defined by Q
• We follow "Efficient Algorithms for Processing XPath Queries" by Gottlob, Koch, and Pichler (ACM TODS, 2005)
Simple algorithm
process-location-step(n, Q) {
  /* n: context node; Q: list of location steps; Q.next: Q without its first step */
  S := apply Q.first to n;
  if |Q| > 1 then
    for each node n' in S do process-location-step(n', Q.next)
}
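The simple algorithm can be made concrete with a small Python sketch. It assumes a toy tree model (the names `Node`, `build`, `apply_step` and `evaluate` are hypothetical, not from the paper), supports only the child, descendant and following axes with name tests, collects the nodes reached by the last step, and counts how many times a single step is applied so that the redundant work becomes visible.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can go in sets
class Node:
    tag: str
    children: list["Node"] = field(default_factory=list)

def build(tag, *kids):
    return Node(tag, list(kids))

def document_order(node):
    """Pre-order list of node and all of its descendants."""
    out = [node]
    for c in node.children:
        out.extend(document_order(c))
    return out

def apply_step(node, step, order):
    """Apply one location step (axis, name test) to a single context node."""
    axis, test = step
    if axis == "child":
        cands = node.children
    elif axis == "descendant":
        cands = document_order(node)[1:]
    elif axis == "following":
        # nodes after `node` in document order, excluding its own descendants
        below = set(document_order(node))
        cands = [m for m in order[order.index(node) + 1:] if m not in below]
    else:
        raise ValueError(f"unsupported axis: {axis}")
    return [m for m in cands if test == "*" or m.tag == test]

def process_location_step(n, steps, order, results, stats):
    stats["applications"] += 1
    S = apply_step(n, steps[0], order)
    if len(steps) > 1:
        for m in S:  # the rest of the query is re-evaluated from every node in S
            process_location_step(m, steps[1:], order, results, stats)
    else:
        results.update(S)

def evaluate(root, steps):
    order = document_order(root)
    results, stats = set(), {"applications": 0}
    process_location_step(root, steps, order, results, stats)
    return [m for m in order if m in results], stats["applications"]

doc = build("r", build("a", build("b"), build("b")), build("a", build("b")))
hits, apps = evaluate(doc, [("descendant", "a"), ("child", "b")])
print([m.tag for m in hits], apps)
```

On a flat document with six children, a query `descendant::*` followed by k `following::*` steps needs 7, 22 and 42 step applications for k = 1, 2, 3, although the result never has more than six nodes.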
Complexity
• Worst case: in each step of Q the axis is “following”
• So each step may return O(|D|) nodes, and the rest of the query is evaluated again from each of them
• This gives the recurrence Time(|Q|) = |D|*Time(|Q|-1)
• I.e., the complexity is O(|D|^|Q|)
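A slightly more careful version of this recurrence also charges the cost of applying the step itself. Writing T(k) for the time to evaluate a k-step query from one context node, and assuming a constant c such that applying one step to one node costs at most c|D|, unrolling still gives the same bound:

```latex
\begin{aligned}
T(1) &\le c\,|D|,\\
T(k) &\le c\,|D| + |D|\cdot T(k-1) \qquad (k > 1),\\
T(k) &\le c\,\bigl(|D| + |D|^2 + \dots + |D|^k\bigr) = O\bigl(|D|^k\bigr) \qquad (|D| \ge 2).
\end{aligned}
```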
Early Systems Performance
Figure taken from Gottlob, Koch, Pichler ‘05
Internet Explorer 6
Figure taken from Gottlob, Koch, Pichler ‘05
IE6 – performance as a function of document size
Figure taken from Gottlob, Koch, Pichler ‘05
Polynomial data complexity
• Polynomial data complexity is sometimes considered acceptable even if it is exponential in the query size, since queries are usually small
• But can we have polynomial combined complexity for Xpath query evaluation?
• Yes!
Two main principles
• Query parse trees: the query is divided into parts according to its syntactic structure (not to be confused with the tree structure of the XML document)
• Context-value tables: for every expression e occurring in the parse tree, compute a table of all valid combinations of context c and value v such that e evaluates to v in c.
Xpath query parse tree
descendant::b/following-sibling::* [position() != last()]
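The parse tree for this query appeared only as a figure. The following Python sketch encodes one plausible shape for it (the shape and the names `PNode`, `QUERY`, `bottom_up_order` and `top_down_order` are assumptions): the root `/` composes the two location steps, and the predicate of the second step is a comparison subtree. A post-order traversal visits the nodes in bottom-up order (leaves first), a pre-order traversal in top-down order (root first).

```python
from dataclasses import dataclass, field

@dataclass
class PNode:
    label: str
    children: list["PNode"] = field(default_factory=list)

# Assumed parse tree of: descendant::b/following-sibling::*[position() != last()]
QUERY = PNode("/", [
    PNode("descendant::b"),
    PNode("following-sibling::*", [           # step with a predicate
        PNode("!=", [PNode("position()"), PNode("last()")]),
    ]),
])

def bottom_up_order(node):
    """Post-order: children before their parent, the root last."""
    order = []
    for c in node.children:
        order.extend(bottom_up_order(c))
    order.append(node.label)
    return order

def top_down_order(node):
    """Pre-order: a parent before its children, the root first."""
    order = [node.label]
    for c in node.children:
        order.extend(top_down_order(c))
    return order

print(bottom_up_order(QUERY))
print(top_down_order(QUERY))
```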
Bottom-up vs. Top-down evaluation
• We will discuss two kinds of query evaluation algorithms:
– Bottom-up: the query parse tree is processed from the leaves up to the root
– Top-down: the parse tree is processed from the root down to the leaves
• During processing, we fill in the context-value tables
Bottom-up evaluation
• Main idea: compute the value for each leaf for every possible context
• Propagate upwards until the root
• Dynamic programming: each subexpression is evaluated at most once per context, avoiding re-evaluation
Operational semantics
• Needed as a first step toward evaluation algorithms
• Similar ideas are used in compiler design
• Here the semantics is based on the notion of contexts
Contexts
• The domain of contexts is C = dom × {<k,n> | 1 ≤ k ≤ n ≤ |dom|}
• A context is c = <x,k,n>, where x is the context node, k is the context position, and n is the context size
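To see why context-value tables are large, the context domain can be enumerated directly. This Python sketch (the function name `contexts` is hypothetical) lists all triples <x,k,n> with 1 ≤ k ≤ n ≤ |dom|; there are |dom|·|dom|(|dom|+1)/2 of them, i.e. O(|D|^3).

```python
def contexts(dom):
    """Enumerate C = dom x {<k, n> | 1 <= k <= n <= |dom|} as (x, k, n) triples."""
    size = len(dom)
    return [(x, k, n)
            for x in dom
            for n in range(1, size + 1)      # context size
            for k in range(1, n + 1)]        # context position within that size

C = contexts(["a", "b", "c"])
print(len(C))                     # 3 * (3 * 4 // 2) contexts
print(len(contexts(range(10))))   # grows cubically with the number of nodes
```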
Semantics for Xpath expressions
• The semantics of an expression is a set of 4-tuples <x,k,n,v>: the first three components form the context, and the fourth is the value obtained by evaluating the expression in that context
Some notations
• T(t): all nodes satisfying a node test t
• E(e): all nodes satisfying a regular expression e (applied with respect to a given axis)
• idx_χ(x,S): the index of node x in the set S with respect to axis χ, i.e. in document order for forward axes and in reverse document order for reverse axes
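The direction dependence of idx_χ is why `preceding-sibling::*[1]` selects the nearest preceding sibling rather than the first one in the document. A minimal Python sketch, assuming nodes are plain strings with a hypothetical `doc_pos` map giving their document-order positions; the set of reverse axes is the one listed in the XPath 1.0 specification.

```python
# Reverse axes per the XPath 1.0 specification; all other axes are forward axes.
REVERSE_AXES = {"ancestor", "ancestor-or-self", "preceding", "preceding-sibling"}

def idx(x, S, axis, doc_pos):
    """1-based index of node x in S, ordered along the direction of `axis`.

    doc_pos maps each node to its position in document order.
    """
    ordered = sorted(S, key=lambda node: doc_pos[node], reverse=axis in REVERSE_AXES)
    return ordered.index(x) + 1

# Hypothetical siblings s0..s3 in document order.
doc_pos = {"s0": 0, "s1": 1, "s2": 2, "s3": 3}
preceding = {"s0", "s1", "s2"}   # preceding siblings of s3
following = {"s1", "s2", "s3"}   # following siblings of s0
print(idx("s2", preceding, "preceding-sibling", doc_pos))  # nearest one comes first
print(idx("s1", following, "following-sibling", doc_pos))
```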
Context-value Table
• Given a query sub-expression e, the context-value table of e specifies all combinations of context c and value v, such that computing e on the context c results in v
• The bottom-up algorithm follows: compute the context-value tables over the query parse tree, from the leaves up to the root
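As a small illustration, here is a Python sketch of bottom-up context-value tables for the predicate `position() != last()` from the parse-tree example. Tables are dicts from a context (x, k, n) to a value, which suffices here because each of these expressions has exactly one scalar value per context (node-set-valued expressions would map to sets). The function names are hypothetical.

```python
def contexts(dom):
    size = len(dom)
    return [(x, k, n) for x in dom for n in range(1, size + 1) for k in range(1, n + 1)]

# Leaves of the parse tree: their tables read a component of the context directly.
def table_position(C):
    return {c: c[1] for c in C}

def table_last(C):
    return {c: c[2] for c in C}

# Inner node: combine the children's tables row by row, matching on the context.
def table_neq(left, right):
    return {c: left[c] != right[c] for c in left}

def bottom_up(dom):
    C = contexts(dom)
    pos = table_position(C)        # leaf
    last = table_last(C)           # leaf
    return table_neq(pos, last)    # parent, computed only after both children

table = bottom_up(["a", "b", "c"])
print(len(table), sum(table.values()))
```

Every context gets a row, whether or not any location step will ever produce it; this is the cost the top-down variant tries to avoid.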
Bottom-up algorithm
Example
Complexity
• O(|D|^3*|Q|) space, ignoring strings and numbers
– O(|Q|) tables with 3 context columns, each taking values in 1…|D|, thus O(|D|^3*|Q|)
– An extra O(|D|*|Q|) multiplicative factor for strings and numbers
• O(|D|^5*|Q|) time, ignoring strings and numbers
– Combining two node sets can take O(|D|^2)
– An extra O(|Q|) factor for strings and numbers
Optimization
• Represent contexts as pairs of the current and previous node
• This brings the time complexity down to O(|D|^4*|Q|^2)
• Further optimizations bring the space complexity down to O(|D|^2*|Q|^2)
Top-down evaluation
• Similar idea
• But computes values only for the contexts that are actually needed
• Same worst-case bounds
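A minimal Python sketch of the demand-driven idea, assuming a hypothetical flat document whose root has children c0..c3. It evaluates `following-sibling::*[position() != last()]` from one context node and fills the predicate's table only for the contexts the step actually produces. A bottom-up table over all contexts of this 5-node document would have 5 · 15 = 75 rows.

```python
# Hypothetical flat document: children c0..c3 of a single root, in document order.
SIBLINGS = ["c0", "c1", "c2", "c3"]

def following_siblings(x):
    i = SIBLINGS.index(x)
    return SIBLINGS[i + 1:]

def eval_predicate(ctx, table):
    """Value of position() != last() in context ctx = (x, k, n), memoized in table."""
    if ctx not in table:
        _, k, n = ctx
        table[ctx] = k != n
    return table[ctx]

def top_down(x):
    """Evaluate following-sibling::*[position() != last()] from context node x.

    Only contexts produced by the location step are ever entered in the table.
    """
    table = {}
    S = following_siblings(x)
    n = len(S)
    result = [y for k, y in enumerate(S, start=1) if eval_predicate((y, k, n), table)]
    return result, table

result, table = top_down("c0")
print(result, len(table))
```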
Top-down or bottom-up?
• General question in processing XML trees
• The tradeoff:
– Usually easier to combine results computed in children to obtain the result at the parent
• So bottom-up traversal is usually easier to design
– On the other hand, some of the computation may be redundant, since bottom-up evaluation does not know in advance which results will become relevant
• So top-down traversal may be more efficient
Linear-time fragment
• Core Xpath includes only navigation
– / and //
• Core Xpath can be evaluated in O(|D|*|Q|)
• Observation: there is no need to consider the entire context triple, only the context node
• Top-down or bottom-up evaluation with essentially the same algorithm
• But smaller tables are maintained: for every query node, all document nodes and their evaluation values
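Because only the context node matters, each step can map a whole node set to a node set in one pass over the document, which gives the O(|D|*|Q|) bound. A minimal Python sketch, assuming a hypothetical document stored in pre-order as parallel `TAGS`/`PARENT` arrays and modelling only the child and descendant axes (from the root, `//a` selects the same nodes as `descendant::a`); predicates are omitted.

```python
# Hypothetical document stored in pre-order (document order):
# TAGS[i] is the tag of node i, PARENT[i] its parent (PARENT[0] == -1 for the root).
TAGS = ["r", "a", "b", "a", "b", "c", "b"]
PARENT = [-1, 0, 1, 1, 3, 0, 5]

def child(S):
    """All children of nodes in S: one pass over the document."""
    return {y for y in range(len(TAGS)) if PARENT[y] in S}

def descendant(S):
    """All proper descendants of nodes in S: one pass in pre-order."""
    below = [False] * len(TAGS)
    for y in range(1, len(TAGS)):  # pre-order visits a parent before its children
        p = PARENT[y]
        below[y] = p in S or below[p]
    return {y for y in range(len(TAGS)) if below[y]}

AXES = {"child": child, "descendant": descendant}

def evaluate(path, context=(0,)):
    """Evaluate a list of (axis, name test) steps set-at-a-time, O(|D|) per step."""
    S = set(context)
    for axis, test in path:
        S = {y for y in AXES[axis](S) if test == "*" or TAGS[y] == test}
    return sorted(S)

print(evaluate([("descendant", "a"), ("child", "b")]))
```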
Types are helpful
• Can direct the search
– In some parts of the tree there is no hope of matching a given subexpression of the query
– As a result, the tables may have fewer entries
• Whiteboard discussion
Type Checking and Inference
• Type checking a single document: straightforward
– Polynomial combined complexity if the automaton representing the type is deterministic; otherwise exponential in the automaton size but still polynomial in the document size
• Type checking the results of an (Xpath) query
• Inferring the type of the results of a query
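For a simple special case of tree types, DTD-style schemas, checking a single document reduces to matching each element's sequence of child tags against a regular expression. A minimal Python sketch, assuming a hypothetical schema `SCHEMA` and using Python's `re` module as a stand-in for the content-model automata a real validator would compile; for a fixed schema the work is roughly linear in the document size.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Elem:
    tag: str
    children: list["Elem"] = field(default_factory=list)

# Hypothetical DTD-like schema: each tag maps to a regular expression over the
# sequence of its children's tags, where every tag is followed by one space.
SCHEMA = {
    "book": r"title (author )+(chapter )*",
    "chapter": r"(title )?",
    "title": r"",
    "author": r"",
}

def conforms(e, schema):
    """True if e and all of its descendants match their content models."""
    model = schema.get(e.tag)
    if model is None:              # undeclared element
        return False
    word = "".join(c.tag + " " for c in e.children)
    if re.fullmatch(model, word) is None:
        return False
    return all(conforms(c, schema) for c in e.children)

good = Elem("book", [Elem("title"), Elem("author"), Elem("chapter", [Elem("title")])])
print(conforms(good, SCHEMA))
```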
Type Inference
• An (incomplete) algorithm for type inference can work its way up the query parse tree, inferring a type in a bottom-up fashion
– Start by inferring types for the leaves (simple queries), then use them for their parents
• Type inference is inherently incomplete
• It can be performed for some languages that are “regular” in a sense
Restricted language allowing for type inference
• Axes: child, descendant, parent, ancestor, following-sibling, etc.
• Variables can be bound to nodes in the input tree and then passed as parameters
• An equality test can be performed between node IDs, but not between node values
Type Checking
• In addition to inferring a type we need to verify containment in another type.
• Type Inference can be used as a tool for Type Checking.
• Type Checking was shown to be decidable for the same language fragment, but with high complexity.
Intuitive connection to text
• Queries => regular expressions
• Types (tree automata) => context-free languages
• Type inference => intersection of context-free and regular languages, resulting in a context-free one
• Type checking => type inference + inclusion of context-free languages (with some restrictions to guarantee decidability)