XML Prefiltering as a String Matching Problem
Christoph Koch¹, Stefanie Scherzinger², Michael Schmidt³
¹Cornell University, ²IBM Boeblingen, ³Freiburg University
24th International Conference on Data Engineering (ICDE)
Cancún, Mexico, April 9, 2008
Motivation
XML data is often processed ad hoc, e.g., in streaming scenarios and by main-memory-based query processors.
In such settings, low main-memory consumption becomes the key prerequisite for performance.
XML prefiltering is an established technique for decreasing main-memory consumption.
We present a novel approach to XML prefiltering based on efficient string-matching techniques.
XML Prefiltering
Goal: buffer only the data that is relevant to query evaluation.
Prefiltering (projection):
Static analysis of the XQuery/XPath expression
Identify the parts of the input document that are relevant to query evaluation
Discard the parts of the input document that are not relevant to query evaluation
A. Marian and J. Siméon: “Projecting XML Documents” in Proc. VLDB, pages 213–224, 2003.
S. Bressan, B. Catania, Z. Lacroix, Y. G. Li, and A. Maddalena: “Accelerating Queries by Pruning XML Documents” in Data & Knowledge Engineering, 54(2):211–240, 2005.
V. Benzaken, G. Castagna, D. Colazzo, and K. Nguyen: “Type-Based XML Projection” in Proc. VLDB, 2006.
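XML Prefiltering
XQuery: <q>{ /site//australia//description }</q>
Relevant paths: { /site//australia//description# }
The suffix # marks that each matched description node is needed together with its entire subtree.
(Figure: XML document tree site → regions → africa, asia, australia; below australia: item, description, and the text “PDA”.)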
XML Prefiltering
Existing approaches:
1. Analyze the input query and extract the relevant paths.
2. Tokenize the input document.
3. Compile an automaton that projects the document token by token.
Our approach:
1. Analyze the input query and extract the relevant paths.
2. Use efficient string-matching techniques to locate the relevant parts of the input document, without parsing or tokenizing it.
Challenge: take string-matching algorithms to the second dimension, i.e., use them to navigate in tree-structured data.
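String Matching Techniques
Example: Boyer-Moore algorithm, searching for the keyword “begin” (length 5) in the text “String matching for beginners”.
Boyer-Moore compares the keyword right to left and, on a mismatch, shifts it forward by up to the keyword length, so many characters of the text are never inspected.
(Figure: successive alignments of “begin” against the text, character positions 1 to 25, ending in a match.)
Similar algorithms exist for multi-keyword search (e.g., the Commentz-Walter algorithm).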
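To make the skipping idea concrete, here is a minimal Python sketch of Boyer-Moore-Horspool, a simplified Boyer-Moore variant that uses only the bad-character shift. It is an illustration, not the SMP implementation; the function name `horspool_search` and the comparison counter are our own assumptions.

```python
def horspool_search(text: str, keyword: str, start: int = 0) -> tuple[int, int]:
    """Boyer-Moore-Horspool search (bad-character rule only).

    Returns (index of the first occurrence of keyword at or after start,
    or -1; number of character comparisons performed).
    """
    m, n = len(keyword), len(text)
    if m == 0:
        return start, 0
    # Shift for each character occurring in keyword[:-1]: distance from its
    # rightmost occurrence there to the last keyword position.
    shift = {ch: m - 1 - i for i, ch in enumerate(keyword[:-1])}
    pos, comparisons = start, 0
    while pos + m <= n:
        # Compare the keyword right to left against the current window.
        j = m - 1
        while j >= 0:
            comparisons += 1
            if text[pos + j] != keyword[j]:
                break
            j -= 1
        if j < 0:
            return pos, comparisons
        # Shift by the entry for the window's last character; characters not
        # in keyword[:-1] allow a jump of the full keyword length m.
        pos += shift.get(text[pos + m - 1], m)
    return -1, comparisons


print(horspool_search("String matching for beginners", "begin"))  # (20, 12)
```

On the slide's example the keyword is found at index 20 after 12 character comparisons, fewer than the 29 characters of the text, because most windows are rejected after a single comparison.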
String Matching and XML Prefiltering
String-matching techniques were originally designed for search in flat, unstructured text.
But XML is structured, and prefiltering requires us to keep track of the axis relations in the input paths (such as child and descendant relations).
XML schema knowledge (e.g., in the form of DTDs) provides structural information that can be exploited for a target-oriented search.
The Runtime Automaton
Fragment of the XMark DTD:
<!DOCTYPE site [
  <!ELEMENT site (regions)>
  <!ELEMENT regions (africa, asia, australia)>
  <!ELEMENT africa (item*)>
  <!ELEMENT asia (item*)>
  <!ELEMENT australia (item*)>
  <!ELEMENT item (location, name, payment, description, shipping, incategory+)>
  <!ELEMENT incategory EMPTY>
  <!ATTLIST incategory category ID #REQUIRED>
  ...
]>
We restrict ourselves to non-recursive DTDs, which can be transformed into a finite automaton. The ideas are also applicable in the context of recursive DTDs.
The Runtime Automaton
(Figure: runtime automaton derived from the DTD fragment, with transitions on <site>, <regions>, <africa>, </africa>, <asia>, </asia>, <australia>, </australia>, and </site>; each region has an <item> loop, and the <item> child tags (<location>, <name>, <payment>, <description>, <shipping>, <incategory>) are spelled out only below <australia>.)
The Runtime Automaton
(Figure: the runtime automaton from the previous slide.)
Search for the string “<site”.
The Runtime Automaton
(Figure: the runtime automaton from the previous slide.)
Search for the strings “<item” and “</australia” in parallel.
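The parallel search for several keywords can be illustrated with a deliberately naive Python stand-in: it returns the leftmost occurrence of any keyword by calling `str.find` once per keyword, so it may scan the text several times, unlike a true multi-keyword algorithm such as Commentz-Walter. The name `leftmost_match` is hypothetical. Note that searching for the prefix “<item” rather than “<item>” also finds start tags followed by whitespace, such as the “<item >” in the sample document used later.

```python
def leftmost_match(text, keywords, start=0):
    """Return (position, keyword) for the leftmost occurrence at or after
    start of any keyword in keywords, or None if none occurs."""
    best = None
    for kw in keywords:
        pos = text.find(kw, start)  # one full scan per keyword (naive)
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, kw)
    return best


fragment = "<australia><item ><location>Egypt</location></item></australia>"
print(leftmost_match(fragment, ["<item", "</australia"]))      # (11, '<item')
print(leftmost_match(fragment, ["<item", "</australia"], 12))  # (51, '</australia')
```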
The Runtime Automaton
(Figure: the runtime automaton from the previous slide, annotated with the relevant paths { /site//australia//description# }.)
The Runtime Automaton
(Figure: the automaton for the relevant paths { /site//australia//description# }; the <item> branches below <africa> and <asia> no longer appear.)
The Runtime Automaton
(Figure: the automaton for { /site//australia//description# }; the <regions>, <africa>, and <asia> transitions no longer appear, so <site> leads directly to <australia>.)
The Runtime Automaton
(Figure: the automaton for { /site//australia//description# }; below <australia>, only the transitions on <description>, <shipping>, <incategory>, their closing tags, and </item> remain.)
The Runtime Automaton
(Figure: the final automaton for { /site//australia//description# }, with transitions only on <site>, <australia>, <description>, </description>, </australia>, and </site>.)
Static Compilation into Lookup Tables
Automaton A (state, tag, next state):
s    <site>           p0
p0   <australia>      p1
p1   <description>    p2
p1   </australia>     q1
p2   </description>   q2
q2   <description>    p2
q2   </australia>     q1
q1   </site>          q0
Frontier vocabulary V:
s    {<site>}
p0   {<australia>}
p1   {<description>, </australia>}
p2   {</description>}
q0   {}
q1   {</site>}
q2   {<description>, </australia>}
Action table T:
s    no operation
p0   copy tag
p1   copy tag
p2   copy on
q0   copy tag
q1   copy tag
q2   copy off
V[q] lists the tags that label the outgoing transitions of state q; T[q] is the action performed when q is entered: copy tag copies the matched tag to the output, copy on starts copying the input, and copy off stops copying.
(Figure: state diagram of the automaton A.)
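Because V[q] is just the set of tags on the outgoing transitions of q, it can be derived mechanically from A. A minimal Python sketch, assuming A is encoded as a dictionary from (state, tag) pairs to target states (our encoding, not necessarily the one used in SMP); `frontier_vocabulary` is a hypothetical helper name:

```python
# Transition table A from the slide, encoded as (state, tag) -> next state.
A = {
    ("s", "<site>"): "p0",
    ("p0", "<australia>"): "p1",
    ("p1", "<description>"): "p2",
    ("p1", "</australia>"): "q1",
    ("p2", "</description>"): "q2",
    ("q2", "<description>"): "p2",
    ("q2", "</australia>"): "q1",
    ("q1", "</site>"): "q0",
}


def frontier_vocabulary(transitions):
    """Map each state to the set of tags on its outgoing transitions."""
    vocab = {}
    for (state, tag), target in transitions.items():
        vocab.setdefault(state, set()).add(tag)
        vocab.setdefault(target, set())  # states without outgoing edges get {}
    return vocab


V = frontier_vocabulary(A)
for state in sorted(V):
    print(state, sorted(V[state]))
```

The result agrees with the frontier vocabulary table above, including the empty set for the final state q0.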
Static Compilation into Lookup Tables
Extract from the original runtime automaton: after <site>, the tags <regions>, <africa>, </africa>, <asia>, </asia>, and <australia> are traversed one by one.
Extract from the optimized runtime automaton: s <site> p0, then p0 <australia> p1 (the tag <australia> leads directly from p0 to p1).
The shortest possible XML string between <site> and <australia> is “<regions><africa/><asia/>”, which has 25 characters, so in state p0 we can initially skip 25 characters.
Initial jump table J:
p0             25
q2             43
other states    0
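The jump values are plain string lengths. The check below reproduces 25 from the string on the slide. For q2 the slide gives only the number 43; the string used here is our guess at one that reproduces it (the rest of an <item> after </description>: an empty <shipping>, one <incategory>, and </item>, before </australia>). It assumes an empty category attribute value, so treat it as an assumption rather than the authors' derivation.

```python
# Shortest XML string between <site> and <australia> (taken from the slide).
after_site = "<regions><africa/><asia/>"
assert len(after_site) == 25

# Hypothetical: one string consistent with J[q2] = 43, i.e. a shortest rest
# of an <item> after </description> (assumes an empty category value).
after_description = '<shipping/><incategory category=""/></item>'
print(len(after_description))  # 43

# The jump is safe on the sample run's input: skipping 43 characters after
# </description> does not pass the next frontier tag </australia>.
tail = '</description><shipping/><incategory category="3"/></item></australia>'
cursor = len("</description>") + 43
assert tail.find("</australia>", cursor) != -1
```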
The Runtime Algorithm
Runtime core algorithm:
q := s;   // current state
c := 0;   // cursor position
while q is not final do
begin
  (1) perform the initial jump J[q], i.e., advance c by J[q]
  (2) starting from the current cursor position c, perform a keyword search for the tags in V[q] until a tag t is matched
  (3) assign q := A[q, t]
  (4) perform action T[q]
end
Lean runtime algorithm:
operates on top of the precompiled tables;
uses efficient string-matching techniques to locate keywords (step (2)).
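The core loop is small enough to sketch directly. The Python version below is a sketch under stated assumptions, not the SMP implementation: it encodes the tables A, T, and J from the “Static Compilation” slides as dictionaries, derives V[q] from A, uses a naive leftmost-match search (hypothetical helper `find_first`) in place of Commentz-Walter, and interprets the actions as follows: copy tag emits the matched tag, copy on starts copying at the matched tag, and copy off emits everything from there through the matched closing tag. It searches for exact tags such as “<site>”, so tags carrying attributes are not handled.

```python
# Lookup tables from the "Static Compilation" slides (our dict encoding).
A = {
    ("s", "<site>"): "p0",
    ("p0", "<australia>"): "p1",
    ("p1", "<description>"): "p2",
    ("p1", "</australia>"): "q1",
    ("p2", "</description>"): "q2",
    ("q2", "<description>"): "p2",
    ("q2", "</australia>"): "q1",
    ("q1", "</site>"): "q0",
}
T = {"s": None, "p0": "copy tag", "p1": "copy tag", "p2": "copy on",
     "q0": "copy tag", "q1": "copy tag", "q2": "copy off"}
J = {"p0": 25, "q2": 43}  # initial jumps; all other states jump 0
FINAL = {"q0"}


def find_first(text, keywords, start):
    """Leftmost (position, keyword) at or after start, or None (naive)."""
    hits = [(text.find(k, start), k) for k in keywords]
    hits = [h for h in hits if h[0] != -1]
    return min(hits) if hits else None


def prefilter(doc):
    q, c = "s", 0                  # current state, cursor position
    out, copy_from = [], None
    while q not in FINAL:
        c += J.get(q, 0)                          # (1) initial jump
        vocab = {t for (p, t) in A if p == q}     # frontier vocabulary V[q]
        hit = find_first(doc, vocab, c)           # (2) keyword search
        if hit is None:
            raise ValueError(f"none of {sorted(vocab)} found after {c}")
        pos, tag = hit
        c = pos + len(tag)                        # move the cursor past the tag
        q = A[(q, tag)]                           # (3) transition
        if T[q] == "copy tag":                    # (4) action
            out.append(tag)
        elif T[q] == "copy on":
            copy_from = pos
        elif T[q] == "copy off":
            out.append(doc[copy_from:c])
    return "".join(out)


# Input document of the sample run.
DOC = (
    '<site><regions><africa><item><location>United States</location>'
    '<name>T V</name><payment>Creditcard</payment>'
    '<description>15’’LCD-FlatPanel</description>'
    '<shipping>Within country</shipping><incategory category="3"/></item>'
    '</africa><asia/><australia><item ><location>Egypt</location>'
    '<name>PDA</name><payment>Check</payment>'
    '<description>Palm Zire 71</description><shipping/>'
    '<incategory category="3"/></item></australia></regions></site>'
)
print(prefilter(DOC))
# <site><australia><description>Palm Zire 71</description></australia></site>
```

The description below <africa> is never matched: in state p0 only <australia> is searched for, so everything before it is skipped without being examined as markup.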
A Sample Run
Relevant paths: { /site//australia//description# }
Input (first part):
<site><regions><africa><item><location>United States</location><name>T V</name><payment>Creditcard</payment><description>15’’LCD-FlatPanel</description><shipping>Within country</shipping><incategory category="3"/></item></africa><asia/><australia><item ><location>
Current state q = s; initial jump J[s] = 0; frontier vocabulary V[s] = {<site>}
Matched tag “<site>”: A[s, <site>] = p0; action T[p0] = copy tag (<site>)
Current state q = p0; initial jump J[p0] = 25; frontier vocabulary V[p0] = {<australia>}
Matched tag “<australia>”: A[p0, <australia>] = p1; action T[p1] = copy tag (<australia>)
Current state q = p1; initial jump J[p1] = 0; frontier vocabulary V[p1] = {</australia>, <description>}
A Sample Run
Relevant paths: { /site//australia//description# }
Input (continued):
Egypt</location><name>PDA</name><payment>Check</payment><description>Palm Zire 71</description><shipping/><incategory category="3"/></item></australia></regions></site>
Current state q = p1; initial jump J[p1] = 0; frontier vocabulary V[p1] = {</australia>, <description>}
Matched tag “<description>”: A[p1, <description>] = p2; action T[p2] = copy on
Current state q = p2; initial jump J[p2] = 0; frontier vocabulary V[p2] = {</description>}
Matched tag “</description>”: A[p2, </description>] = q2; action T[q2] = copy off
Experiments
Prototype implementation in C++: SMP
Settings: IBM ThinkPad Z61p, Core2 Duo T2500 2.00 GHz CPU, 1 GB RAM, Ubuntu Linux 6.06 LTS
Data sets: XMark, Medline, Protein Sequence
Document sizes: 1 MB up to 5,000 MB
Queries: XMark queries, user-defined XPath queries
Query engines: XQuery: Qizx/open, Saxon; XPath: SPEX
Experimental Results
Projection characteristics for selected XMark benchmark queries (projection of a 5,000 MB XMark document):
                    XM1        XM5        XM10       XM14        XM20
Projection size     67.64 MB   22.10 MB   307.63 MB  1357.28 MB  38.52 MB
Memory              1.64 MB    1.68 MB    1.96 MB    1.64 MB     1.67 MB
Elapsed time        4min 12s   4min 12s   4min 55s   5min 21s    4min 10s
Usr+Sys time        31.00s     19.91s     54.94s     53.71s      31.67s
CPU                 12.52%     8.05%      13.85%     17.07%      12.92%
Char. comp.         18.86%     9.87%      22.38%     21.24%      18.67%
Experimental Results
Throughput comparison: SMP projection for XMark (average over all queries on the 5,000 MB document) vs. bare tokenization of the XML document by the Xerces C++ parser.
SMP achieves higher throughput than tokenization alone and is therefore faster than all projection systems that rely on a prior tokenization of the input XML document.
Experimental Results
Success rates for 18 XMark queries with and without projection, using the Qizx XQuery engine (TimeFail: > 1 hour; MemFail: > 1 GB):
Setting                      Success  TimeFail  MemFail
1000 MB without projection   0        0         18
1000 MB with projection      18       0         0
5000 MB without projection   0        0         18
5000 MB with projection      15       2         1
When coupled with projection, in-memory XQuery engines scale up to documents in the gigabyte range.
Experimental Results
Throughput improvement for 5 user-defined XPath queries on a 656 MB Medline document, evaluated with the SPEX XPath engine.
Summary
Efficient string-matching techniques, originally designed for keyword search in flat text, can be used for search and navigation in unparsed XML documents.
A novel approach to XML prefiltering built on top of these ideas reduces XML prefiltering to a sequence of simple string-matching tasks.
An extensive experimental evaluation demonstrates
consistently high throughput and scalability of our XML prefiltering system, and
significant improvements for both XQuery and XPath engines when coupled with document prefiltering.
Thank You for Your Attention!
References
Y. Diao et al.: “Path Sharing and Predicate Evaluation for High-Performance XML Filtering” in TODS, 2003.
T. J. Green et al.: “Processing XML Streams with Deterministic Automata and Stream Indexes” in TODS, 2004.
D. Olteanu: “SPEX: Streamed and Progressive Evaluation of XPath” in TKDE, 2007.
X. Li and G. Agrawal: “Efficient Evaluation of XQuery over Streaming Data” in VLDB, 2005.
A. Marian and J. Siméon: “Projecting XML Documents” in VLDB, 2003.
V. Benzaken, G. Castagna, D. Colazzo, and K. Nguyen: “Type-Based XML Projection” in VLDB, 2006.
M. Schmidt, S. Scherzinger, and C. Koch: “Combined Static and Dynamic Analysis for Effective Buffer Minimization in Streaming XQuery Evaluation” in ICDE, 2007.
A. V. Aho: “Algorithms for Finding Patterns in Strings” in Handbook of Theoretical Computer Science, Volume A, 1990.
B. W. Watson and G. Zwaan: “A Taxonomy of Sublinear Multiple Keyword Pattern Matching Algorithms” in Sci. Comput. Program., 1996.
D. E. Knuth, J. H. Morris, Jr., and V. R. Pratt: “Fast Pattern Matching in Strings” in SIAM J. Computing, 1977.
R. S. Boyer and J. S. Moore: “A Fast String Searching Algorithm” in Commun. ACM, 1977.
A. V. Aho and M. J. Corasick: “Efficient String Matching: An Aid to Bibliographic Search” in Commun. ACM, 1975.
B. Commentz-Walter: “A String Matching Algorithm Fast on the Average” in Proc. ICALP, 1979.
A. Berlea and H. Seidl: “Binary Queries for Document Trees” in Nordic J. of Computing, 2004.
J. Jaakkola and P. Kilpeläinen: “Nested Text-Region Algebra”, TR C-1999-2, Univ. of Helsinki, 1999.
M. Takeda et al.: “Processing Text Files as Is: Pattern Matching over Compressed Texts, Multi-byte Character Texts, and Semi-structured Texts” in Proc. SPIRE, 2002.
M. Altinel and M. J. Franklin: “Efficient Filtering of XML Documents for Selective Dissemination of Information” in VLDB, 2000.
A. Brüggemann-Klein and D. Wood: “One-Unambiguous Regular Languages” in Inform. and Comp., 1998.
J.-M. Champarnaud: “Subset Construction Complexity for Homogeneous Automata, Position Automata and ZPC-Structures” in Theor. Comput. Sci., 2001.
Additional Resources
The Runtime Automaton
In some cases, intermediate states must be kept to keep track of axis relations.
Input document: <a><b><c><b></b></c></b></a>
Relevant paths: { /a/b }
Projection: <a><b></b></a>  NOT CORRECT!
(Figure: tags <a>, <c>, <b></b>, </c>, </a>, nested as <a><c><b></b></c></a>.)
The Runtime Automaton
In some cases, intermediate states must be kept to keep track of axis relations.
Input document: <a><b><c><b></b></c></b></a>
Relevant paths: { /a/b }
Projection: <a><b><c></c></b></a>  CORRECT
(Figure: tags <a>, <c>, <b></b>, </c>, </a>, nested as <a><c><b></b></c></a>.)
Medline XPath Queries
M1  /MedlineCitationSet//CollectionTitle
M2  /MedlineCitationSet//DataBank[DataBankName/text()="PDB"]/AccessionNumberList
M3  /MedlineCitationSet//PersonalNameSubjectList/PersonalNameSubject[LastName/text()="Hippocrates" or DatesAssociatedWithName="Oct2006"]/TitleAssociatedWithName
M4  /MedlineCitationSet//CopyrightInformation[contains(text(),"NASA")]
M5  /MedlineCitationSet/MedlineCitation[contains(MedlineJournalInfo//text(),"Sterilization")]/DateCompleted
XMark Queries
XM1
let $auction := doc("auction.xml") return
for $b in $auction/site/people/person[@id = "person0"]
return $b/name/text()
XM5
let $auction := doc("auction.xml") return
count(
  for $i in $auction/site/closed_auctions/closed_auction
  where $i/price/text() >= 40
  return $i/price
)
XMark Queries
XM10
let $auction := doc("auction.xml") return
for $i in distinct-values($auction/site/people/person/profile/interest/@category)
let $p :=
  for $t in $auction/site/people/person
  where $t/profile/interest/@category = $i
  return
    <personne>
      <statistiques>
        <sexe>{$t/profile/gender/text()}</sexe>
        <age>{$t/profile/age/text()}</age>
        <education>{$t/profile/education/text()}</education>
        <revenu>{fn:data($t/profile/@income)}</revenu>
      </statistiques>
      <coordonnees>
        <nom>{$t/name/text()}</nom>
        <rue>{$t/address/street/text()}</rue>
        <ville>{$t/address/city/text()}</ville>
        <pays>{$t/address/country/text()}</pays>
        <reseau>
          <courrier>{$t/emailaddress/text()}</courrier>
          <pagePerso>{$t/homepage/text()}</pagePerso>
        </reseau>
      </coordonnees>
      <cartePaiement>{$t/creditcard/text()}</cartePaiement>
    </personne>
return <categorie>{<id>{$i}</id>, $p}</categorie>
XMark Queries
XM14
let $auction := doc("auction.xml")
return for $i in $auction/site//item
where contains(string(exactly-one($i/description)), "gold")
return $i/name/text()
XMark Queries
XM20
let $auction := doc("auction.xml")
return <result>
  <preferred>
    {count($auction/site/people/person/profile[@income >= 100000])}
  </preferred>
  <standard>
    {count($auction/site/people/person/profile[@income < 100000 and @income >= 30000])}
  </standard>
  <challenge>
    {count($auction/site/people/person/profile[@income < 30000])}
  </challenge>
  <na>
    {count(for $p in $auction/site/people/person
           where empty($p/profile/@income)
           return $p)}
  </na>
</result>