Cooperative Query Answering for Semistructured Data
Speakers:
Chuan Lin & Xi Zhang
By Michael Barg and Raymond K. Wong
Outline
- Motivations
- Overview
- Basic Concepts
- Cooperative Query Processing
- Experiment
Motivations
XML data with the same semantic content can have very different structures.
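Example: same semantics, different structures
User query: “insurance claims” related to “smoking” for a “woman”
- Court transcript: insurance claim, plaintiff, woman, smoking
- Insurance record: insurance claim, insurer, woman, smoking
Both sources hold the same kind of information, but arrange these nodes in differently shaped trees.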
Motivations
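No exact query result
User query: the “phone number” of “Bob”, who is the new “sales manager”
Data (personnel):
- sales manager: Joe, phone number
- assistant sales manager: Bob, phone number
- salesman
- salesman
Bob is stored as the assistant sales manager, so the query has no exact match.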
Overview
Goal:
- Return approximate answers for XML queries
- “Approximate” means semantically and structurally similar
Solution:
- Return a set of results, ranked by an overall score
- The score indicates how well the subgraph containing a result satisfies the query criteria
Basic Concepts: Query Tree
Query: /restaurant[.//Soho]/phone_number
Result term: phone_number
For each query-tree edge:
- “head”: the end closer to the nearest result term
- “tail”: the other end
- In case of a tie, the head is the end closer to the root
Query tree (r = result term, h = head, t = tail):
restaurant (t, h)
- soho (t)
- phone_number (r, h)
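The head/tail rule can be made concrete with a small sketch. It assumes (our assumption, not the slides') that query-tree nodes are identified by unique labels and that edges are given as (parent, child) pairs; distances to the nearest result term come from a breadth-first search seeded with all result terms at once.

```python
from collections import deque


def distances_to_results(edges, result_terms):
    """Breadth-first search over the undirected query tree, seeded with every result term."""
    neighbours = {}
    for parent, child in edges:
        neighbours.setdefault(parent, []).append(child)
        neighbours.setdefault(child, []).append(parent)
    dist = {term: 0 for term in result_terms}
    queue = deque(result_terms)
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def orient_edges(edges, result_terms):
    """Turn (parent, child) query edges into (head, tail) pairs."""
    dist = distances_to_results(edges, result_terms)
    oriented = []
    for parent, child in edges:
        if dist[child] < dist[parent]:
            oriented.append((child, parent))  # child is closer to a result term
        else:
            # parent is closer, or it is a tie and the parent is closer to the root
            oriented.append((parent, child))
    return oriented


# /restaurant[.//Soho]/phone_number with result term phone_number
print(orient_edges([("restaurant", "soho"), ("restaurant", "phone_number")], ["phone_number"]))
# [('restaurant', 'soho'), ('phone_number', 'restaurant')]
```

The output reproduces the labels above: restaurant is the head of its edge to soho and the tail of its edge to phone_number.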
Basic Concepts: Converging Order
The order in which query-tree edges are considered during query processing; the edges converge on a result term.
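One way to produce such an order is sketched below, under our own assumption that edges are already oriented as (head, tail) pairs with unique labels: an edge is processed only after every edge whose head is its tail, so scores flow from the leaves of the query tree toward the result term. This is plain topological sorting (Kahn's algorithm), not necessarily the authors' procedure.

```python
from collections import defaultdict, deque


def converging_order(oriented_edges):
    """Topologically sort (head, tail) edges so each edge follows all edges feeding its tail."""
    by_tail = defaultdict(list)  # node -> edges in which that node is the tail
    for head, tail in oriented_edges:
        by_tail[tail].append((head, tail))
    # An edge must wait for every edge whose head is this edge's tail.
    pending = {(head, tail): sum(1 for h, _ in oriented_edges if h == tail)
               for head, tail in oriented_edges}
    ready = deque(edge for edge in oriented_edges if pending[edge] == 0)
    order = []
    while ready:
        head, tail = ready.popleft()
        order.append((head, tail))
        # Finishing this edge releases the edges whose tail is this edge's head.
        for edge in by_tail[head]:
            pending[edge] -= 1
            if pending[edge] == 0:
                ready.append(edge)
    return order


print(converging_order([("phone_number", "restaurant"), ("restaurant", "soho")]))
# [('restaurant', 'soho'), ('phone_number', 'restaurant')]
```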
Basic Concepts: Similarity
Semantically similar topologies
Figure: five data fragments, (a) to (e), in which restaurant and soho appear in different topological arrangements, some involving the additional elements address, eating_places and shopping_center.
Basic Concepts: Similarity (cont.)
Deviation Proximity (DP)
- Measures how far one structure deviates from a desired structure
- Given:
  - ra: a data node with value a
  - rb: a data node with value b
  - Q(a, b): a query-tree edge
- DP: the distance from the actual position of rb to the nearest position, r’b, that satisfies the topological relationship specified by Q(a, b)
- Topological relationships: parent-child, ancestor-descendant
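A minimal sketch of DP on a tree, under one interpretation of the definition: candidate positions r’b include existing nodes and a new child slot under ra. Under that reading, if rb lies outside ra's subtree, every satisfying position lies beyond ra, so DP is the tree distance plus one; under parent-child, a deeper descendant only needs to move up to ra's child on the path. The parent-map representation and the function names are hypothetical. For layouts where soho is a child of restaurant, a grandchild via address, the parent of restaurant, and a sibling of restaurant, it yields 0, 1, 2 and 3 under parent-child; these values also appear in the slide examples.

```python
def path_to_root(parent, node):
    """Nodes from node up to the root, inclusive."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(parent, u, v):
    """Number of edges between u and v, via their lowest common ancestor."""
    up = {n: i for i, n in enumerate(path_to_root(parent, u))}
    for j, n in enumerate(path_to_root(parent, v)):
        if n in up:
            return up[n] + j
    raise ValueError("nodes are not in the same tree")


def deviation_proximity(parent, ra, rb, relationship):
    """Distance from rb to the nearest position r'b that satisfies Q(a, b) (hypothetical reading)."""
    if relationship not in ("parent-child", "anc-desc"):
        raise ValueError(relationship)
    above_rb = path_to_root(parent, rb)
    if ra in above_rb[1:]:                # rb is already below ra
        if relationship == "anc-desc":
            return 0
        return above_rb.index(ra) - 1     # move up to ra's child on the path
    # rb is outside ra's subtree: the nearest valid slot is a child of ra
    return tree_distance(parent, ra, rb) + 1


layouts = {
    "child": {"restaurant": None, "soho": "restaurant"},
    "address": {"restaurant": None, "address": "restaurant", "soho": "address"},
    "inverted": {"soho": None, "restaurant": "soho"},
    "sibling": {"eating_places": None, "restaurant": "eating_places", "soho": "eating_places"},
}
for name, tree in layouts.items():
    print(name, [deviation_proximity(tree, "restaurant", "soho", rel)
                 for rel in ("parent-child", "anc-desc")])
```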
Deviation Proximity
Q(restaurant, soho) requires a parent-child relationship.
Figure: the five fragments from the previous slide, with the nearest satisfying position of soho marked (soho’) in each.
DP(restaurant, soho), in the order the fragments appear in the figure: 0, 2, 3, 1, 3
Deviation Proximity
Q(restaurant, soho) requires an ancestor-descendant relationship.
Figure: the same five fragments, with the nearest satisfying position of soho marked (soho’) in each.
DP(restaurant, soho), in the same order: 0, 2, 3, 0, 3
Only the fourth value changes (from 1 to 0): in that fragment soho is already a descendant of restaurant, just not a child.
Cooperative Query Processing
Input: a query tree QT and an XML document tree DT
Output: an ordered list of <r_result_term, score> pairs
Cooperative query processing consists of:
- Structural proximity calculation
- Progressive score
Cooperative Query Processing (cont.)
Progressively match the edges of QT against DT:
- Consider the edges in converging order.
- For each edge QT(a, b), where a is the head and b is the tail, compute a list of <ra, score> pairs:
  - ra is a node in DT with value a
  - score is the progressive score of ra with respect to the nearest rb
  - the structural proximity of ra and rb is calculated using the graph encoding
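The matching loop can be sketched as follows. Everything here is a hypothetical simplification: the document is a tree given as parent and label maps, DP is computed with a tree-distance interpretation (not the graph encodings), each query edge carries a flag saying whether its head is the query-tree parent, and branch scores are simply summed (conjunctive branches only). The usage runs //restaurant/soho on a toy tree.

```python
import math


def path_to_root(parent, node):
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(parent, u, v):
    up = {n: i for i, n in enumerate(path_to_root(parent, u))}
    for j, n in enumerate(path_to_root(parent, v)):
        if n in up:
            return up[n] + j
    raise ValueError("nodes are not in the same tree")


def deviation_proximity(parent, r_parent, r_child, relationship):
    # Distance from r_child to the nearest position that satisfies the query edge.
    above = path_to_root(parent, r_child)
    if r_parent in above[1:]:
        return 0 if relationship == "anc-desc" else above.index(r_parent) - 1
    return tree_distance(parent, r_parent, r_child) + 1


def progressive_match(parent, label, ordered_edges, result_term):
    """ordered_edges: (head, tail, relationship, head_is_query_parent) in converging order."""
    prog = {}  # (query label, data node) -> accumulated deviation so far
    for head, tail, rel, head_is_parent in ordered_edges:
        tails = [n for n in parent if label[n] == tail]
        for ra in (n for n in parent if label[n] == head):
            candidates = []
            for rb in tails:
                r_par, r_chi = (ra, rb) if head_is_parent else (rb, ra)
                dp = deviation_proximity(parent, r_par, r_chi, rel)
                candidates.append(dp + prog.get((tail, rb), 0))
            # AND semantics: deviations from different branches add up.
            prog[(head, ra)] = prog.get((head, ra), 0) + min(candidates, default=math.inf)
    hits = [(n, prog.get((result_term, n), 0)) for n in parent if label[n] == result_term]
    return sorted(hits, key=lambda hit: hit[1])  # best (smallest deviation) first


parent = {"root": None, "guide": "root", "r1": "guide", "addr": "r1", "s1": "addr",
          "mall": "root", "s2": "mall", "r2": "root", "s3": "r2"}
label = {"root": "root", "guide": "food_guide", "r1": "restaurant", "addr": "address",
         "s1": "soho", "mall": "shopping_mall", "s2": "soho", "r2": "restaurant", "s3": "soho"}
edges = [("soho", "restaurant", "parent-child", False)]  # result term soho is the head
print(progressive_match(parent, label, edges, "soho"))
# [('s3', 0), ('s1', 1), ('s2', 4)]
```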
Structural Proximity Calculation
- Encodings and compressed arrays, which:
  - are compact
  - preserve the relationship to the larger graph
  - facilitate distance calculations
- Proximity searching
Encodings and Compressed Arrays
- Basic concepts: common node, terminal node, annotated node
- Path representation: representing a single path, multiple paths, multiple elements
- Compressed arrays: each encoding is a path or multi-path encoding for a node or a set of nodes
Encodings and Compressed Arrays (cont.)
Figure: example XML document graph. The top level holds Food Guide, Shopping Mall and City Guide; the graph contains elements such as Restaurant, Address, Street, Suburb, Menu, Mains, Name, Category, Branch, Shop and Type, and values such as Soho, Brown St, BBQ, Sea Food, Seafood, Something_fishy, Fastfood, Place to eat, neptunes, Apple_bee and Bonza mall. Each edge is labelled with the ordinal (1, 2, 3, ...) of the child it leads to.
Representing Single Path
Figure: the example document graph with two target nodes, y1 and y2, marked.
- Path encoding of y1: 1.1.1
- Path encoding of y2: 1.2.1.1.1.1
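Reading encodings such as 1.1.1 as the 1-based child ordinals along the root-to-node path (an assumption that matches the edge labels in the figure), a single-path encoding can be computed with a depth-first search. The tree and element names in the usage are hypothetical. In a graph with shared nodes this returns only the first path found, which is why multi-path encodings are needed.

```python
def path_encoding(children, root, target):
    """Dotted 1-based child ordinals from root to target, e.g. '1.1.1'."""
    def search(node, steps):
        if node == target:
            return steps
        for ordinal, child in enumerate(children.get(node, []), start=1):
            found = search(child, steps + [ordinal])
            if found is not None:
                return found
        return None

    steps = search(root, [])
    if steps is None:
        raise KeyError(target)
    return ".".join(str(step) for step in steps)


# Hypothetical fragment; children are listed in document order.
children = {
    "root": ["food_guide", "shopping_mall", "city_guide"],
    "food_guide": ["restaurant"],
    "restaurant": ["address", "menu", "name"],
    "address": ["street", "suburb"],
}
print(path_encoding(children, "root", "address"))  # 1.1.1
print(path_encoding(children, "root", "suburb"))   # 1.1.1.2
```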
Representing Multiple Paths
Figure: the example document graph with common nodes A, B and C and target node y3 marked.
- Multi-path encoding of y3: 1.3 B.B.2.1.1 C.3 C.C.2 y3
Representing Multiple Elements
Figure: the example document graph with common nodes A, B and C and target nodes y1, y2 and y3 marked.
- Multi-element encoding of y1, y2 and y3: 1 A.A.1.1 y1.2.1.1.1.1 y2.3 B.B.2.1.1 C.3 C.C.2 y3
Compressed Arrays
Figure: path-encoding segments stored as compressed bit arrays, each consisting of a boundary pattern and an identifier pattern.
- One array holds the segments originating from node A (labelled A, y1 and y2).
- The other holds the segments originating from nodes B and C (labelled B, C and y3).
Drawback of Encoding
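Example tree: A is reached by edge 1; A has children B (1) and C (2); B has children D (1) and E (2); C has children F (1) and G (2).
Multi-element encoding: 1A.A.1B.B.1D.2E.?.2C.C.1F.2G
The “?” marks where the encoding would need to return from E to A before it can continue with .2C.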
Proximity Searching
Multi-element comparison
- Input:
  - caN: a compressed array containing the multi-element encoding of the Near Set
  - caF: a compressed array containing the multi-path encoding (or path encoding) of all paths from the root to the specified element, EF, of the Find Set
- Output: dist, the length of the shortest path from EF to the closest element in the Near Set
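For the tree case, the distance between two elements can be read directly off their uncompressed path encodings: it is the sum of their lengths minus twice the length of their longest common prefix. The sketch below, which ignores the compressed-array representation entirely and uses hypothetical function names, takes the minimum over all Find-Set paths and Near-Set paths. With several root paths (a graph), each pair gives the length of a real walk through a shared ancestor, so the result is an upper bound on the true shortest distance rather than an exact value.

```python
def parse(encoding):
    """'1.2.1' -> [1, 2, 1]; the empty string denotes the root."""
    return [int(step) for step in encoding.split(".")] if encoding else []


def path_distance(p, q):
    """Edges between two nodes given their root paths: up to the shared prefix, then down."""
    common = 0
    for x, y in zip(p, q):
        if x != y:
            break
        common += 1
    return len(p) + len(q) - 2 * common


def min_dist(find_paths, near_paths):
    """Smallest path_distance between any root path of E_F and any Near-Set path."""
    return min(path_distance(parse(f), parse(n)) for f in find_paths for n in near_paths)


# Encoding strings in the slides' format.
print(min_dist(["1.1.1"], ["1.2.1.1.1.1", "1.3"]))  # 3
```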
Proximity Searching
Figure: the example document graph with Find Set elements x1, x2 and x3, Near Set elements y1, y2 and y3, and common nodes A, B and C marked. The MinDist values shown are 5, 4 and 2.
Progressive Score
Accumulative deviation proximity (DP), calculated from the structural proximity
Boolean operators at query-tree branches (node a with children b and c):
- AND: prog(a) = prog(b) + prog(c)
- OR: prog(a) = min(prog(b), prog(c))
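A sketch of how the two combination rules compose over a query tree, for one fixed choice of matched data nodes. The QueryNode class, its edge_dp field (the DP of the edge linking a node to its parent) and the AND/OR operator labels are our assumptions; each branch contributes its edge DP plus its own accumulated score.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryNode:
    name: str
    edge_dp: int = 0   # DP of the edge linking this node to its parent (hypothetical input)
    op: str = "AND"    # how the children's branch scores combine
    children: List["QueryNode"] = field(default_factory=list)


def prog(node: QueryNode) -> int:
    """Accumulated deviation below node; a leaf contributes nothing beyond its edge."""
    if not node.children:
        return 0
    branches = [child.edge_dp + prog(child) for child in node.children]
    if node.op == "AND":
        return sum(branches)   # prog(a) = prog(b) + prog(c)
    if node.op == "OR":
        return min(branches)   # prog(a) = min(prog(b), prog(c))
    raise ValueError(node.op)


b, c = QueryNode("b", edge_dp=2), QueryNode("c", edge_dp=3)
print(prog(QueryNode("a", children=[b, c])))           # 5
print(prog(QueryNode("a", op="OR", children=[b, c])))  # 2
```

Summing penalizes every branch that deviates, whereas taking the minimum lets the best-matching alternative determine the score.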
Experiment
Query: //restaurant/soho
XML: the example document graph (Food Guide, Shopping Mall and City Guide subtrees)
Query result: <soho, 2> <soho, 3> <soho, 4>
Thank you!
Questions & Answers