GPX-Matcher - A Generic Boolean Predicate-based XPath Expression Matcher Mohammad Sadoghi, Ioana...
GPX-Matcher - A Generic Boolean Predicate-based
XPath Expression Matcher
Mohammad Sadoghi, Ioana Burcea, and Hans-Arno Jacobsen
Middleware Systems Research Group
University of Toronto
MIDDLEWARE SYSTEMS RESEARCH GROUP
MSRG.ORG
EDBT’2011
An X-ToPSS Project
http://msrg.org/tags/x-topss

The Problem in a Nutshell
[Figure: two parallel pipelines. XML filtering: an XML document is filtered against XPath expressions (XPEs, millions of them), yielding the matched XPEs. Pub/sub engine: an event/publication is matched against subscriptions (Boolean expressions), yielding the matched subscriptions.]

Publish/Subscribe Systems
[Figure: publishers (stock markets: NYSE, NASDAQ, TSX) send publications such as IBM=84, MSFT=27, INTC=19, JNJ=58, ORCL=12, HON=24, AMGN=58 to a broker. Subscribers register subscriptions such as IBM > 85, ORCL < 10, JNJ > 60 with the broker and receive notifications.]

Pub/Sub Matching Algorithms
• Rete algorithm [Forgy, late 70s]
– A graph structure to correlate events and process rules (solves a more general problem)
• SIFT [Yan et al. TODS’94]
– Predicate counting, among other techniques
• Gough algorithm [Gough et al. ACSC’95]
– Based on a finite state representation of subscriptions
• Gryphon algorithm [Aguilera et al. PODC’99]
– Decision tree over predicates
• Clustering algorithm [Fabret et al. SIGMOD’01]
– Clusters subscriptions based on common predicates
• k-Index [Whang et al. VLDB’09]
• Hardware-based matching acceleration [Sadoghi et al. VLDB’10]
• BE-Tree [Sadoghi & Jacobsen, SIGMOD’2011]

The Key Question
Can XML filtering benefit from the efficient publish/subscribe matching algorithms that have been developed over more than three decades?

XML Filtering Challenges
• Filter XML according to XPEs
• Efficiently, at Internet scale, for millions of XPEs, and for many XML documents per unit of time
[Figure: an XML document is filtered against millions of XPath expressions (XPEs), yielding the matched XPEs.]

XML Filtering Systems
• Growing need for XML filtering
– Application-level firewalls
– Malware detection and prevention
– Document routing
– RSS aggregators
– XML-based messaging and application integration
• Selected industry players (XML appliances)
– Solace Systems
– IBM DataPower
– Talerian
– Sarvega (Intel)
• XML filtering systems are publish/subscribe systems
• XPath expressions and XML documents are the subscriptions and publications, respectively

The Core Problem
• XML Document Filtering Problem
– Given a set of XPath expressions Q and an XML document d, find all expressions in Q that are matched by d
• An expression q is matched by an XML document d if and only if q selects a non-empty set of nodes in d
– XPath expressions are used to select entire documents or fragments of documents

Agenda
• Supported XPath Language
• Mapping XML Filtering to Pub/Sub Matching
– XPath encoding
– XML encoding
• Experimental results
• Outlook

XML and XPath
XML fragment:
<section>
  <subsection> <figure> … </figure> </subsection>
  <figure> … </figure>
</section>
XML tree: section has the children subsection and figure; subsection has the child figure
XML paths:
section-subsection-figure
section-figure
XPath queries:
/section/subsection/figure
section/figure
/section//subsection/figure
section//figure
/section/*/figure
*/figure
Terminology: each tag or wildcard in a query is a location step; "/" is the child operator; "//" is the descendant operator; "*" is a wildcard; a query with a leading "/" is an absolute query; a query without one is a relative query
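
The step from XML tree to XML paths can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes Python's standard `xml.etree.ElementTree` parser, and the function name `root_to_leaf_paths` is hypothetical. It lists every root-to-leaf tag path of the fragment above.

```python
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Return every root-to-leaf tag path of the document as a list of tags."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, prefix):
        current = prefix + [node.tag]
        children = list(node)
        if not children:
            # A leaf element closes one complete document path.
            paths.append(current)
        for child in children:
            walk(child, current)

    walk(root, [])
    return paths


# The fragment from the slide, with the elided figure content left empty.
doc = "<section><subsection><figure/></subsection><figure/></section>"
for path in root_to_leaf_paths(doc):
    print("-".join(path))
```

Run on the slide's fragment, this prints the two document paths section-subsection-figure and section-figure.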

XPath 2.0 Subset Considered
• Absolute path expressions
– /a/b
• Relative path expressions
– a/b/c
• Descendant operators in path expressions
– a/b//a/d
• Wildcards in path expressions
– a/*/*/b
• Not discussed, but shown how to address
– Filter predicates in path expressions
• <path>[@x>1]/<path>
– Nested path filters (the XPE becomes a tree)
• <path>[a/b]/<path>

Our Question(s)
• How can we map XPath expressions onto subscriptions?
– Conjunctive Boolean formula over predicates
– $S = (a_1\ op\ v_1) \wedge (a_2\ op\ v_2) \wedge \dots \wedge (a_n\ op\ v_n)$
• How can we map XML documents onto publications?
– Set of attribute-value pairs
– $P = \{(a_1, v_1), (a_2, v_2), \dots, (a_m, v_m)\}$

Predicate Calculus
• Single-tag predicate: $(p_t\ op\ v)$
• Double-tags predicate: $(d(p_{t_1}, p_{t_2})\ op\ v)$
• End-tag predicate: $(p_t^{\dashv} \geq v)$
• Length-constraint predicate: $(length \geq v)$

Single-tag Predicate Example
• XPath expression: /b/…
• Predicate: $(p_b = 1)$
Tag b at position 1
Example document path: b-a-c, encoded as (b, 1), (a, 2), (c, 3)
[Figure: document tree with the nodes b, a, c and d; the path b-a-c is highlighted.]

Double-tags Predicate Example I
• XPath expression: … a/b …
• Predicate: $(d(p_a, p_b) = 1)$
Distance between Tag a and Tag b is one location step
Example document path: x-a-b, encoded as (x, 1), (a, 2), (b, 3)
[Figure: document tree with the nodes x, a, d and b; the path x-a-b is highlighted.]

Double-tags Predicate Example II
• XPath expression: a//b
• Predicate: $(d(p_a, p_b) \geq 1)$
Distance between Tag a and Tag b is at least one location step
Example document path: a-x-b, encoded as (a, 1), (x, 2), (b, 3)
[Figure: document tree with the nodes a, x, b and d; the path a-x-b is highlighted.]

End-tag Predicate Example
• XPath expression: /a/*/*
• Predicate: $(p_a^{\dashv} \geq 2)$
Tag a at least two location steps away from path end
Example document path: a-x-y, encoded as (a, 1), (x, 2), (y, 3), (length, 3)
[Figure: document tree with the nodes a, x, y and d; the path a-x-y is highlighted.]

Length-constraint Predicate Example
• XPath expression: */*/*
• Predicate: $(length \geq 3)$
Length of the path is at least 3
Example document path: x-y-z, encoded as (x, 1), (y, 2), (z, 3), (length, 3)
[Figure: document tree with the nodes x, y, z and d; the path x-y-z is highlighted.]

Putting it Together: XPath Query Encoding Example
Queries:
Q1: a/b//a
Q2: a//b/d
Q3: a/*/*/*//b/d
With occurrence numbers:
Q1: a1/b1//a2
Q2: a1//b1/d1
Q3: a1/*/*/*//b1/d1
Encoded as conjunctions of predicates:
Q1: $(d(p_{a_1}, p_{b_1}) = 1) \wedge (d(p_{b_1}, p_{a_2}) \geq 1)$, i.e., P1 and P2
Q2: $(d(p_{a_1}, p_{b_1}) \geq 1) \wedge (d(p_{b_1}, p_{d_1}) = 1)$, i.e., P3 and P4
Q3: $(d(p_{a_1}, p_{b_1}) \geq 4) \wedge (d(p_{b_1}, p_{d_1}) = 1)$, i.e., P5 and P4
Our XPath encoding grows linearly in the size of the XPath expression
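
The encoding of Q1 to Q3 can be reproduced with a short sketch. It is only an illustration inferred from the examples on these slides, not the authors' encoder: the function name `encode_xpath` and the tuple layout (`"pos"`, `"dist"`, `"end"`, `"length"`) are hypothetical, and only tags, `*`, `/` and `//` are handled. One predicate is emitted per named tag (plus at most one for trailing wildcards), which is where the linear growth comes from.

```python
import re
from collections import Counter


def encode_xpath(xpath):
    """Encode a simple XPath (tags, *, / and //) as a list of predicate tuples."""
    steps = re.findall(r"(//|/)?([^/]+)", xpath)
    predicates = []
    seen = Counter()   # occurrence counter per tag name
    prev = None        # previous named tag with occurrence number, e.g. "a1"
    gap = 0            # location steps since prev (or since the path start)
    exact = True       # False once a relative start or a "//" has been crossed

    for sep, name in steps:
        gap += 1
        if sep != "/":             # "//", or the first step of a relative query
            exact = False
        if name == "*":
            continue               # wildcards only widen the gap
        seen[name] += 1
        tag = f"{name}{seen[name]}"
        op = "=" if exact else ">="
        if prev is None:
            if (op, gap) != (">=", 1):   # "position >= 1" is always true
                predicates.append(("pos", tag, op, gap))
        else:
            predicates.append(("dist", prev, tag, op, gap))
        prev, gap, exact = tag, 0, True

    if prev is None:               # the query consists of wildcards only
        predicates.append(("length", ">=", len(steps)))
    elif gap > 0:                  # trailing wildcards after the last tag
        predicates.append(("end", prev, ">=", gap))
    return predicates


for query in ("a/b//a", "a//b/d", "a/*/*/*//b/d", "/a/*/*", "*/*/*"):
    print(query, encode_xpath(query))
```

For the three queries above, the sketch yields the same distance predicates as the slide, for example `("dist", "a1", "b1", ">=", 4)` and `("dist", "b1", "d1", "=", 1)` for Q3.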

XML Document Path Encoding
Without duplicate tags (i.e., all occurrence numbers are 1)
Document path: a-b-c-d
With occurrence numbers: a1-b1-c1-d1
Publication (attribute-value pairs):
(length, 4),
(a1, 1), (b1, 2), (c1, 3), (d1, 4)
(a1, b1, 1), (a1, c1, 2), (a1, d1, 3),
(b1, c1, 1), (b1, d1, 2),
(c1, d1, 1)
For a path of n tags, the resulting set of attribute-value “pairs” has $O(n^2)$ elements.
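
A minimal sketch of this path encoding follows. The function name `encode_path` and the tuple layout are assumptions made for illustration. It emits one length entry, n position entries and n(n-1)/2 distance entries, which is the quadratic growth noted above. Paths with duplicate tags need additional handling that this sketch omits; it simply numbers repeated tags in order.

```python
from collections import Counter


def encode_path(tags):
    """Encode one document path (a list of tags) as a publication."""
    seen = Counter()
    numbered = []
    for tag in tags:
        seen[tag] += 1
        numbered.append(f"{tag}{seen[tag]}")   # e.g. "a" -> "a1"

    publication = [("length", len(numbered))]
    # Position of every tag in the path, counted from 1.
    publication += [(tag, pos) for pos, tag in enumerate(numbered, start=1)]
    # Distance in location steps between every ordered pair of tags.
    for i in range(len(numbered)):
        for j in range(i + 1, len(numbered)):
            publication.append((numbered[i], numbered[j], j - i))
    return publication


pub = encode_path(["a", "b", "c", "d"])
for entry in pub:
    print(entry)
```

For a-b-c-d this produces the 11 entries listed on the slide: 1 length entry, 4 positions and 6 distances.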

Mapping XML Filtering to Pub/Sub Matching
[Figure: the XML document is mapped to an event/publication, the XPath expressions (XPEs, millions of them) are mapped to subscriptions (Boolean expressions), and the subscriptions matched by the pub/sub engine are mapped back to the matched XPEs.]

Matching Algorithms
• Pick any pub/sub matching algorithm
• We used
– Counting algorithm [exact origin is unknown]
– Clustering algorithm [Fabret, Jacobsen et al., 2001]
• Both are two-phased matching algorithms
1. Predicate matching: Match all predicates.
2. Subscriptions matching: Match subscriptions using the result from step 1.

Predicate Matching: Single Tag Predicate
Publication:
(length, 4),
(a1, 1), (b1, 2), (c1, 3), (d1, 4)
(a1, b1, 1), (a1, c1, 2), (a1, d1, 3),
(b1, c1, 1), (b1, d1, 2),
(c1, d1, 1)
[Figure: index for predicates of the forms $(p_t\ op\ v)$, $(p_t^{\dashv} \geq v)$ and $(length \geq v)$. A hash on the tag leads to a table organized by predicate operator and predicate value (1, 2, 3, 4); for example, tag a leads to the predicate $(p_a = 1)$ with id i, and tag c leads to the predicate $(p_c = 3)$ with id j. Matched predicates set bits i and j in the predicate bit vector; all other bits stay 0.]

Subscription Matching: Clustering Algorithm
• Cluster queries based on the access predicates
• Access predicates are shared by all queries in a cluster
• Only check clusters whose access predicates are matched
• Open question: how to choose an effective access predicate
[Figure: clusters of queries, each guarded by an access predicate p_i; clusters whose access predicate evaluates to false are skipped.]
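
The clustering idea can be sketched as follows. This is an illustrative simplification, not the algorithm of Fabret et al.: the names `build_clusters` and `cluster_match` are hypothetical, and the choice of the first predicate as access predicate is only one possible policy, passed in as the `pick` parameter.

```python
from collections import defaultdict


def build_clusters(queries, pick=lambda preds: preds[0]):
    """Group queries by one access predicate each (chosen by `pick`)."""
    clusters = defaultdict(list)
    for qid, preds in queries.items():
        access = pick(preds)
        rest = [p for p in preds if p != access]
        clusters[access].append((qid, rest))
    return clusters


def cluster_match(matched_preds, clusters):
    """Check only the clusters whose access predicate is matched."""
    result = []
    for access in matched_preds:
        for qid, rest in clusters.get(access, ()):
            if all(p in matched_preds for p in rest):
                result.append(qid)
    return sorted(result)


# Queries as conjunctions of predicate ids (assumed example data).
queries = {"Q1": ["P1", "P2"], "Q2": ["P3", "P4"], "Q3": ["P5", "P4"]}

by_first = build_clusters(queries)                          # access predicate = first
by_last = build_clusters(queries, pick=lambda p: p[-1])     # access predicate = last

print(cluster_match({"P3", "P4"}, by_first))
print(cluster_match({"P3", "P4"}, by_last))
```

Both policies return the same matches; they differ only in how many clusters and residual predicates have to be inspected, which is why the choice of access predicate matters for performance.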

Experimental Evaluation
• All algorithms implemented in C
– GPX: the base encoding with counting
– GPX-ap: the base encoding with clustering (access predicates)
– YFilter & BPA
• DTDs used for generating workloads
– NITF DTD (News Industry Text Format)
– PSD DTD (Protein Sequence Database)
• Total filtering time averaged over 500 XML documents
– XML parsing time is negligible in the overall filtering time
• Intel Quad-Core 2.66 GHz, 4GB
[Figure: XML documents and encoded XPath expressions as inputs to the matcher.]

Scalability in Number of XPEs
All XPEs are distinct
[Figure: scalability plot. Annotations: 1 ms vs. 18 ms; ap on first; ap on last.]

Scalability in Number of XPEs
XPEs workload contains duplicates

Effect of Path Length

Effect of Wildcards

Conclusions
• Novel XML/XPath encoding
• Leverages existing matching techniques
• Differs significantly from predominantly automata-based related work
• Outperforms related approach by an order of magnitude under many experimental conditions

Thank You!
• To learn more about X-ToPSS, please see
– http://msrg.org/tags/x-topss

Agenda
• XML-based Filtering Systems
• Mapping XML Filtering to Pub/Sub Matching
– XPath encoding
– XML encoding
• Experimental results
• Outlook

Content-based Publish/Subscribe
• Subscription: a Boolean expression over predicates (each predicate is an attribute-operator-value triple)
$(subject = news) \wedge (topic = travel) \wedge (date > 21.2.2011)$
• Publication (a.k.a. event): a set of attribute-value pairs
(subject, news), (topic, travel), (date, 21.2.2011), …
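
A minimal sketch of matching one conjunctive subscription against one publication follows. The representation (a list of triples and a dict) and the function name `matches` are assumptions for illustration, and the date 21.2.2011 is read as day.month.year.

```python
import operator
from datetime import date

# Supported predicate operators (an assumed, minimal set).
OPS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def matches(subscription, publication):
    """True if every (attribute, operator, value) predicate is satisfied."""
    return all(
        attr in publication and OPS[op](publication[attr], value)
        for attr, op, value in subscription
    )


subscription = [
    ("subject", "=", "news"),
    ("topic", "=", "travel"),
    ("date", ">", date(2011, 2, 21)),
]
same_day = {"subject": "news", "topic": "travel", "date": date(2011, 2, 21)}
next_day = {"subject": "news", "topic": "travel", "date": date(2011, 2, 22)}

print(matches(subscription, same_day))  # False: the date is not strictly later
print(matches(subscription, next_day))  # True
```

Note that the publication shown on the slide carries the date 21.2.2011, so it does not satisfy the strict predicate date > 21.2.2011; a publication dated one day later does.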

The Pub/Sub Matching Problem
• Given an event, e, and a set of subscriptions, S, determine all subscriptions, $s \in S$, that match e.
[Figure: an event/publication is compared against the set of subscriptions, yielding the matches.]

Wide Applicability
• Selective information dissemination
• Location-based services
• Personalization, alerting services
• Application integration
• Service & resource discovery
• Network and distributed system management
• Monitoring, surveillance, and control
• Workforce management
• Workload management & job scheduling
• Business activity monitoring
• Business process management, monitoring, and execution

Matching Algorithm Techniques
• Amortized storage & processing
• Access predicates
• Cost model-driven subscription partitioning
• Cache-conscious data structure layout
• Asynchronous cache-level pre-fetching
• Event queue re-ordering and batch processing
• Parallelization of algorithms for SMP & multi-core
• FPGA-based acceleration (hardware-level)

eXtensible Markup Language
• XML: de facto standard for data exchange
– Web Services, data and application integration, information dissemination
• XPath: XML query language
– Also used as basis for other query languages (e.g., XQuery, XPointer, XSLT et al.)

Our Research Goal
• Solve the XML filtering problem using content-based pub/sub matching algorithms.
• Why?
– Build on and exploit several decades’ worth of insights, rather than construct special-purpose solutions.

In a Nutshell
[Figure: the XML tree (section with the children subsection and figure; subsection with the child figure) is decomposed into the paths section-subsection-figure and section-figure, which are matched against the encoded XPath expressions.]

Special-purpose XML/XPath Filtering Algorithms
• XFilter [Altinel et al. VLDB’00]
• WebFilter [Pereira et al. VLDB’01]
• YFilter [Diao et al. TODS’03]
• XTrie [Chan et al. ICDE’03]
• AFilter [Candan et al. VLDB’06]
• BPA [Huo & Jacobsen, ICDE’06]
• BoXFilter [Moro et al. VLDB’07]
• pFiST [Kwon et al. DKE’08]

From XML Filtering to Publish/Subscribe Matching
• XPath expressions are encoded in a predicate calculus
• XML documents are expressed as a set of paths from the root to a leaf in the document tree
– Each path is translated into sets of attribute-value pairs (tags and their location in the path)
• Matching algorithm
– The attribute-value pairs are matched against the predicates with traditional pub/sub matching algorithms

Possible Extensions
• Extend predicate calculus to encompass other XPath 2.0 features
• Alternative encodings
• Exploit DTD or schema information
• Exploit information about XPath expressions processed

X-ToPSS: XML-based Toronto Publish/Subscribe System
• Distributed, content-based publish/subscribe (cf. ICDCS’08)
– Exploit DTDs (Document Type Definitions) to optimize subscription routing in distributed pub/sub systems
– Explain covering and merging optimizations for XML/XPath
• Alternative predicate-based XML/XPath matching algorithm that cannot exploit traditional pub/sub schemes (cf. ICDE’06)
• Encoding presented herein, cf. EDBT’2011 (forthcoming)
http://msrg.org/tags/x-topss

Example: XPath Query Encoding
P1: $(d(p_{a_1}, p_{b_1}) = 1)$
P2: $(d(p_{b_1}, p_{a_2}) \geq 1)$
P3: $(d(p_{a_1}, p_{b_1}) \geq 1)$
P4: $(d(p_{b_1}, p_{d_1}) = 1)$
P5: $(d(p_{a_1}, p_{b_1}) \geq 4)$
[Figure: the five predicates stored in the double-tags index. A hash on the first tag (a, b) is followed by entries of the form (occurrence number of the first tag, second tag, occurrence number of the second tag), i.e., a leads to (1, b, 1) and b leads to (1, a, 2) and (1, d, 1). Each predicate is filed under its operator and value (1, 2, 3, 4) and labeled with its predicate identifier (pid), 1 to 5.]

That’s Like Database Querying!
[Figure: database querying evaluates a query over data tuples and returns sets of tuples (about the past); pub/sub evaluates a publication over subscriptions and returns sets of tuples (about the future).]
Query and subscription are very similar.
Data tuples and publication are very similar.
However, the two problem statements are inverse.

XML Document Path Encoding Example: With duplicate tags
Document path: a-b-c-b-d
With occurrence numbers: a1-b1-c1-b2-d1 (the numbering a1-b1-c1-b1-d1 is crossed out on the slide)
Publication for a1-b1-c1-b2-d1:
(length, 5),
(a1, 1), (b1, 2), (c1, 3), (b2, 4), (d1, 5)
(a1, b1, 1), (a1, c1, 2), (a1, b2, 3), (a1, d1, 4),
(b1, c1, 1), (b1, b2, 2), (b1, d1, 3),
(c1, b2, 1), (c1, d1, 2),
(b2, d1, 1)
Publication for a1- -c1-b1-d1 (the first b is left out, so the second b is numbered b1):
(length, 5),
(a1, 1), (c1, 3), (b1, 4), (d1, 5)
(a1, c1, 2), (a1, b1, 3), (a1, d1, 4),
(c1, b1, 1), (c1, d1, 2),
(b1, d1, 1)

Predicate Matching: Double Tags Predicate
Publication:
(length, 4),
(a1, 1), (b1, 2), (c1, 3), (d1, 4)
(a1, b1, 1), (a1, c1, 2), (a1, d1, 3),
(b1, c1, 1), (b1, d1, 2),
(c1, d1, 1)
[Figure: index for predicates of the form $(d(p_{t_1}, p_{t_2})\ op\ v)$. A hash on the first tag (a, b) is followed by a hash on (occurrence number of the first tag, second tag, occurrence number of the second tag), e.g., a leads to (1, b, 1) and b leads to (1, a, 2) and (1, d, 1). Each entry leads to a table organized by predicate operator and predicate value (1, 2, 3, 4).]

Matching Algorithm
1. Match all predicates (predicate matching) and record results in predicate bit vector
2. Match subscriptions based on predicate bit vector (subscriptions matching)
From here on forward, nothing new really (we re-use pub/sub matching algorithms, as promised).
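
Step 1 can be illustrated with a small sketch that evaluates a list of predicates against one publication and records the results as a bit vector. The tuple layouts and the name `predicate_bit_vector` are assumptions for illustration, and the plain linear scan stands in for the hash-based predicate index of the slides. The sketch also assumes that no document tag is literally named "length".

```python
import operator

OPS = {"=": operator.eq, ">=": operator.ge}


def predicate_bit_vector(publication, predicates):
    """Return one bit per predicate: 1 if the publication satisfies it."""
    # 2-tuples are (tag, position) or ("length", n); 3-tuples are distances.
    pos = {e[0]: e[1] for e in publication if len(e) == 2}
    dist = {(e[0], e[1]): e[2] for e in publication if len(e) == 3}

    bits = []
    for pred in predicates:
        kind = pred[0]
        if kind == "pos":        # tag at a given position
            _, tag, op, v = pred
            ok = tag in pos and OPS[op](pos[tag], v)
        elif kind == "dist":     # distance between two tags
            _, t1, t2, op, v = pred
            ok = (t1, t2) in dist and OPS[op](dist[(t1, t2)], v)
        elif kind == "end":      # distance from a tag to the path end
            _, tag, op, v = pred
            ok = tag in pos and OPS[op](pos["length"] - pos[tag], v)
        else:                    # "length": constraint on the path length
            _, op, v = pred
            ok = OPS[op](pos["length"], v)
        bits.append(int(ok))
    return bits


# Publication for the path a-b-c-d (all occurrence numbers are 1).
publication = [
    ("length", 4),
    ("a1", 1), ("b1", 2), ("c1", 3), ("d1", 4),
    ("a1", "b1", 1), ("a1", "c1", 2), ("a1", "d1", 3),
    ("b1", "c1", 1), ("b1", "d1", 2),
    ("c1", "d1", 1),
]
# Predicates P1..P5 of the queries a/b//a, a//b/d and a/*/*/*//b/d.
predicates = [
    ("dist", "a1", "b1", "=", 1),
    ("dist", "b1", "a2", ">=", 1),
    ("dist", "a1", "b1", ">=", 1),
    ("dist", "b1", "d1", "=", 1),
    ("dist", "a1", "b1", ">=", 4),
]
print(predicate_bit_vector(publication, predicates))
```

For the path a-b-c-d only P1 and P3 hold, so none of the three queries has all of its predicates satisfied.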

Subscription Matching: Counting Algorithm
Queries: $Q1 = P1 \wedge P2$, $Q2 = P3 \wedge P4$, $Q3 = P5 \wedge P4$
• For each predicate, associate the queries that contain it
– P1: Q1
– P2: Q1
– P3: Q2
– P4: Q2, Q3
– P5: Q3
• For each query, record the number of predicates
– Q1: 2, Q2: 2, Q3: 2
• For each query, count the number of satisfied predicates
– Example: P3 and P4 match, so the counts are Q1: 0, Q2: 2, Q3: 1
– Q2 is matched, because its count equals its number of predicates
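
The counting step can be sketched as follows. This is a generic illustration of the counting idea with hypothetical names (`build_index`, `count_match`); it assumes the predicates within a query are distinct and that the set of matched predicates has already been computed in the predicate-matching phase.

```python
from collections import defaultdict


def build_index(queries):
    """Build predicate -> queries associations and per-query predicate counts."""
    pred_to_queries = defaultdict(list)
    pred_count = {}
    for qid, preds in queries.items():
        pred_count[qid] = len(preds)
        for pred in preds:
            pred_to_queries[pred].append(qid)
    return pred_to_queries, pred_count


def count_match(matched_preds, pred_to_queries, pred_count):
    """Return the queries whose predicates are all among the matched ones."""
    hits = defaultdict(int)
    for pred in matched_preds:
        for qid in pred_to_queries.get(pred, ()):
            hits[qid] += 1
    return sorted(q for q, n in hits.items() if n == pred_count[q])


queries = {"Q1": ["P1", "P2"], "Q2": ["P3", "P4"], "Q3": ["P5", "P4"]}
index, sizes = build_index(queries)
print(count_match({"P3", "P4"}, index, sizes))  # ['Q2']
```

Only queries that share at least one predicate with the matched set are ever touched, so the work per publication depends on the matched predicates rather than on the total number of queries.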

Related Work: YFilter [Diao et al. TODS’03]
Q1: a/b//a
Q2: a//b/d
Q3: a/*/*/*//b/d
[Figure: the automaton that YFilter builds for Q1, Q2 and Q3, with transitions labeled a, b, d, * and ε, and accepting states labeled Q1 and Q2, Q3.]

Longer-term Vision
• Map matching problems for different languages onto an efficient pub/sub matching kernel
• For example, for:
– Graph-structured query / data (RSS, RQL)
– Tree-structured query / data (XML / XPath)
– Regular expressions / sentences
– Etc.