How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries
Transcript of How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries
![Page 1: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/1.jpg)
How hard is this query? Measuring the Semantic Complexity of
Schema-agnostic Queries
André Freitas, Juliano Efson Sales, Siegfried Handschuh, Edward Curry
IWCS, London 2015
![Page 2: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/2.jpg)
Outline
• Motivation
• Query Semantic Complexity & Entropy
• Entropy Measures
• Validation & Analysis
• Conclusions
![Page 3: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/3.jpg)
Motivation
![Page 4: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/4.jpg)
Shift in the Database Landscape
Very large and dynamic “schemas”.
Before 2000: 10s-100s of attributes
Circa 2015: 1,000s-1,000,000s of attributes
Brodie & Liu, 2010
![Page 5: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/5.jpg)
Databases for a Complex World
How do you query data in this scenario?
![Page 6: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/6.jpg)
Schema-agnosticism
Abstraction Layer
Who is the daughter of Bill Clinton?
Bill Clinton
Chelsea Clinton
child
![Page 7: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/7.jpg)
Schema-agnostic queries
Query approaches over structured databases which allow users to satisfy complex information needs without understanding the representation (schema) of the database.
Semantic Parsing
![Page 8: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/8.jpg)
Vocabulary Problem for Databases
Query: Who is the daughter of Bill Clinton married to?
Quantify the Semantic Gap
Possible representations
![Page 9: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/9.jpg)
Core Questions
• Can we measure the semantic complexity of a query-DB mapping?
• What defines an “easy” or a “hard” query?
• Which are the best estimators?
![Page 10: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/10.jpg)
Semantic Complexity & Entropy
![Page 11: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/11.jpg)
Configuration space of semantic matchings
Quantify the Query-DB semantic gap
Not all queries are born equal!
![Page 12: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/12.jpg)
Semantic Complexity & Entropy
• Structural/conceptual complexity
• Level of ambiguity/indeterminacy/vagueness
• Terminological gap
• Novelty
![Page 13: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/13.jpg)
Semantic Configuration Space
mΣ(Q, DB)
![Page 14: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/14.jpg)
Semantic Entropy Measures
![Page 15: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/15.jpg)
Semantic Entropy Measures
Entropy measures shown in the figure: Hsyntax, Hstruct, Hterm, Hmatching
![Page 16: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/16.jpg)
In the scope of this work
• “Entropy” here refers to an entropy estimator, i.e. an approximation.
![Page 17: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/17.jpg)
Syntactic Entropy (Hsyntax)
• The syntactic entropy of a query is defined by the possible syntactic configurations in which a query can be interpreted under the database syntax.
• Estimates the uncertainty of the translation of the query into the DB categories (IDB(Q)).
• Is a function of the probability of the syntactic interpretation of a query.
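As a rough illustration of the idea above, the sketch below computes the Shannon entropy of a probability distribution over candidate syntactic interpretations of a query. The function name `shannon_entropy` and the example probabilities are assumptions made for illustration; the slides do not specify the exact estimator behind Hsyntax.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy (in bits) of a discrete distribution.

    The input is normalised first, so unnormalised weights are accepted.
    """
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    entropy = 0.0
    for p in probabilities:
        q = p / total
        if q > 0:  # terms with zero probability contribute nothing
            entropy -= q * math.log2(q)
    return entropy


# Hypothetical distributions over syntactic interpretations of a query.
unambiguous = [1.0]                    # a single possible interpretation
ambiguous = [0.25, 0.25, 0.25, 0.25]   # four equally likely interpretations

print(shannon_entropy(unambiguous))  # 0.0
print(shannon_entropy(ambiguous))    # 2.0
```

Under this reading, a query with one plausible interpretation has zero entropy, and the value grows as probability mass spreads over more interpretations.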
![Page 18: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/18.jpg)
Structural Entropy (Hstruct)
• The structural entropy defines the complexity of a database based on the possible facts that can be encoded under its schema.
• Pollard & Biermann, A measure of semantic complexity for natural language systems (2000).
![Page 19: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/19.jpg)
Terminological Entropy (Hterm)
• The terminological entropy quantifies an estimate of the amount of ambiguity, synonymy and vagueness of the query or database terms.
• Translational Entropy (Htrans) as an estimator.
• Melamed, Measuring semantic entropy (1997).
• Translation probability based on parallel corpora.
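For reference, translational entropy can be written in the standard conditional-entropy form below, where $s$ is a query or database term, $T(s)$ is the set of its translations observed in a parallel corpus and $P(t \mid s)$ is the estimated translation probability. This is a sketch of the general form associated with Melamed (1997); the symbols and the exact estimator used in this work are assumptions, since the slides do not give the formula.

```latex
H_{trans}(s) = -\sum_{t \in T(s)} P(t \mid s) \, \log_2 P(t \mid s)
```

A term with a single dominant translation has entropy near 0, while a term whose translations are spread evenly over many targets has high entropy.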
![Page 20: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/20.jpg)
Matching Entropy (Hmatching)
• Consists of measures which describe the uncertainty involved in the query-data matching/alignment between query terms and dataset entities.
• Provides an estimate based on the set of potential alignments.
• Distributional entropy (Hdist): Estimator based on distributional semantic models.
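One possible way to turn a set of potential alignments into an entropy value is sketched below: cosine similarities between a query term vector and candidate entity vectors are normalised into a distribution, and its entropy is taken. This is only an illustrative assumption, not the authors' definition of Hdist; the function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def matching_entropy(query_vec, candidate_vecs):
    """Entropy (bits) of the similarity-weighted distribution over candidates."""
    if not candidate_vecs:
        raise ValueError("at least one candidate alignment is required")
    # Negative similarities are clipped to zero before normalising.
    scores = [max(cosine(query_vec, v), 0.0) for v in candidate_vecs.values()]
    total = sum(scores)
    if total == 0:
        # No candidate is related at all: treat them as equally (un)likely.
        return math.log2(len(scores))
    entropy = 0.0
    for s in scores:
        p = s / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


# Toy vectors (hypothetical, not from a real distributional model).
query = [1.0, 0.0]
clear_match = {"child": [1.0, 0.0], "spouse": [0.0, 1.0]}
unclear_match = {"child": [1.0, 1.0], "offspring": [1.0, 1.0]}

print(matching_entropy(query, clear_match))    # one candidate dominates
print(matching_entropy(query, unclear_match))  # two candidates tie
```

The first case yields zero entropy because only one candidate is related to the query term; the second yields one bit because two candidates are equally similar.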
![Page 21: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/21.jpg)
Query Features as Complexity Estimators
• Query features (reference to data model/query operator categories).
– Contains instance reference (named entities)
– Contains class reference
– Contains complex class reference
– Contains property
– Contains value
– Yes/No question
– Contains operator
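The feature list above can be represented as a binary vector per query, which is a convenient input for a regression model. The sketch below is a minimal illustration; the class name `QueryFeatures`, its field names and the hand annotation of the example query are assumptions, not taken from the original work.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class QueryFeatures:
    """One boolean per query feature, in the order listed on the slide."""
    has_instance: bool
    has_class: bool
    has_complex_class: bool
    has_property: bool
    has_value: bool
    is_yes_no: bool
    has_operator: bool

    def as_vector(self):
        """Return the features as a list of 0/1 integers."""
        return [int(flag) for flag in astuple(self)]


# Illustrative hand annotation of "Who is the daughter of Bill Clinton?":
# an instance reference (Bill Clinton) and a property (daughter).
example = QueryFeatures(
    has_instance=True,
    has_class=False,
    has_complex_class=False,
    has_property=True,
    has_value=False,
    is_yes_no=False,
    has_operator=False,
)

print(example.as_vector())  # [1, 0, 0, 1, 0, 0, 0]
```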
![Page 22: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/22.jpg)
Validation & Analysis
![Page 23: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/23.jpg)
Experimental Set-up
• Question Answering over Linked Data Test Collection (Unger et al. 2011).
• QALD 2011 & 2012.
• 150 natural language queries over DBpedia (RDF).
Dataset (DBpedia + YAGO classes):
– 45,768 properties
– 288,316 classes
– 9,434,677 instances
– 128,071,259 triples
![Page 24: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/24.jpg)
Query Analysis Example
![Page 25: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/25.jpg)
Query Analysis Example
![Page 26: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/26.jpg)
Experimental Set-up
• Linear regression between each entropy measure and the f-measure of the participating QA systems.
• 4 QA systems:
– QALD 2011: PowerAqua, Freya (κ = 0.501, 95% confidence interval, ‘moderate’ agreement).
– QALD 2012: QAKis, MHE (κ = 0.236, 95% confidence interval, ‘fair’ agreement).
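The regression step can be sketched as an ordinary least-squares fit of f-measure against a single entropy measure, together with the Pearson correlation. The helper names `fit_line` and `pearson_r` and the data points are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the QALD experiments.

```python
import math


def _centered_sums(xs, ys):
    """Return (sxx, syy, sxy, mean_x, mean_y) for paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy, mean_x, mean_y


def fit_line(xs, ys):
    """Least-squares slope and intercept of y = slope * x + intercept."""
    sxx, _, sxy, mean_x, mean_y = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    sxx, syy, sxy, _, _ = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must have non-zero variance")
    return sxy / math.sqrt(sxx * syy)


# Made-up points: entropy of each query vs. f-measure obtained on it.
entropy = [0.0, 1.0, 2.0]
f_measure = [1.0, 0.5, 0.0]

slope, intercept = fit_line(entropy, f_measure)
print(slope, intercept)               # -0.5 1.0
print(pearson_r(entropy, f_measure))  # -1.0
```

A negative slope corresponds to the "(-)" annotations used in the analyses: higher entropy goes with lower f-measure.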
![Page 27: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/27.jpg)
1st Analysis
• Linear regression model.
• Hsyntax, Hterm (Htrans), Hmatching (Hdist) and Hstruct
![Page 28: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/28.jpg)
1st Analysis
• Higher correlation:
– Hsyntax (-)
– Hterm (Htrans) (-)
– Hmatching (Hdist) (-)
• Lower correlation:
– Hstruct
![Page 29: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/29.jpg)
2nd Analysis
• Query features (reference to data model/query operator categories).
– Contains instance reference (named entities)
– Contains class reference
– Contains complex class reference
– Contains property
– Contains value
– Yes/No question
– Contains operator
![Page 30: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/30.jpg)
2nd Analysis
• Linear regression model.
![Page 31: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/31.jpg)
2nd Analysis
• Higher correlation:
– References to instances (+)
– Presence of operators (-)
– Presence of complex classes (complex nominals) (-)
![Page 32: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/32.jpg)
3rd Analysis
• Classification of the query-DB terminological gap for each data model category.
![Page 33: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/33.jpg)
3rd Analysis
Lower terminological gap
Higher terminological gap
![Page 34: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/34.jpg)
Query Classification
...
![Page 35: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/35.jpg)
Query Classification
• % of unanswered questions:
– Syntactic complexity (Hsyntax): 51.7%
– Vocabulary gap (Hmatching, Hterm): 68.9%
– No reference to instance (named entity) (Hstruct, Hterm): 20.6%
![Page 36: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/36.jpg)
Limitations
• The regression model still needs to be validated on a different test collection.
• Distributional entropy needs a more principled definition.
![Page 37: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/37.jpg)
Minimizing Semantic Entropy
Reflections on the Design of Schema-agnostic Query Mechanisms
Or ....
![Page 38: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/38.jpg)
Minimizing the Semantic Entropy for the Semantic Matching
Definition of a semantic pivot: first query term to be resolved in the database.
Maximizes the reduction of the semantic configuration space (Hstruct, Hmatch).
![Page 39: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/39.jpg)
Semantic Pivots (Hstruct , Hmatch)
• Who is the daughter of Bill Clinton married to?
Values shown in the figure: 437; 100,184; 62,781; > 4,580,000
dbpedia:spouse dbpedia:children :Bill_Clinton
![Page 40: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/40.jpg)
Minimizing the Semantic Entropy for the Semantic Matching
Definition of a semantic pivot: first query term to be resolved in the database.
Maximizes the reduction of the semantic configuration space (Hstruct, Hmatch).
Less prone to more complex synonymic expressions and abstraction-level differences (Hterm, Hmatch).
![Page 41: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/41.jpg)
Semantic Pivots
• Proper nouns tend to have a high percentage of string overlap for synonymic expressions.
William Jefferson Clinton
Bill Clinton
William J. Clinton
T. E. Lawrence
Thomas Edward Lawrence
Lawrence of Arabia
Who is the daughter of Bill Clinton married to?
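The string-overlap observation can be made concrete with a token-level Jaccard score. The sketch below is an assumption about how overlap might be measured, since the slides do not name a specific metric; `name_tokens` and `token_overlap` are hypothetical helper names.

```python
import string


def name_tokens(name):
    """Lower-cased tokens of a name with surrounding punctuation removed."""
    tokens = (t.strip(string.punctuation).lower() for t in name.split())
    return {t for t in tokens if t}


def token_overlap(a, b):
    """Jaccard overlap between the token sets of two expressions."""
    tokens_a, tokens_b = name_tokens(a), name_tokens(b)
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


# Variants of a proper noun still share surface tokens ...
print(token_overlap("William Jefferson Clinton", "Bill Clinton"))        # 0.25
print(token_overlap("William J. Clinton", "William Jefferson Clinton"))  # 0.5
# ... whereas common-noun synonyms often share none.
print(token_overlap("daughter", "child"))                                # 0.0
```

Even a partial overlap gives a string-matching step something to work with, which is not the case for synonym pairs such as "daughter" and "child".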
![Page 42: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/42.jpg)
Minimizing the Semantic Entropy for the Semantic Matching
Definition of a semantic pivot: first query term to be resolved in the database.
Maximizes the reduction of the semantic configuration space (Hstruct, Hmatch).
Less prone to more complex synonymic expressions and abstraction-level differences (Hterm, Hmatch).
proper nouns >> nouns >> complex nominals >> adjectives, verbs.
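The precedence order above can be read as a simple pivot-selection heuristic. The sketch below assumes the query terms have already been tagged with a category (no tagger is included); the rank table `PIVOT_RANK`, the category labels and the function `choose_pivot` are hypothetical names used for illustration.

```python
# Lower rank = preferred pivot, following
# proper nouns >> nouns >> complex nominals >> adjectives, verbs.
PIVOT_RANK = {
    "proper_noun": 0,
    "noun": 1,
    "complex_nominal": 2,
    "adjective": 3,
    "verb": 3,
}


def choose_pivot(tagged_terms):
    """Pick the (term, category) pair with the best rank.

    Ties are broken by position in the query, because min() returns the
    first minimal element.
    """
    if not tagged_terms:
        raise ValueError("the query has no terms to choose from")
    return min(tagged_terms, key=lambda pair: PIVOT_RANK[pair[1]])


# Hand-tagged terms of "Who is the daughter of Bill Clinton married to?"
query_terms = [
    ("daughter", "noun"),
    ("Bill Clinton", "proper_noun"),
    ("married to", "verb"),
]

print(choose_pivot(query_terms))  # ('Bill Clinton', 'proper_noun')
```

Resolving the proper noun first matches the slides' example, where the remaining terms are then matched in the neighbourhood of the pivot entity.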
![Page 43: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/43.jpg)
Semantic Matching
• Hsyntax is a strong estimator of query complexity.
• Hmatching can be used as an estimator for the quality of the predicate alignment.
• Hterm can be used as a heuristic for matching complexity.
![Page 44: How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic Queries](https://reader030.fdocuments.us/reader030/viewer/2022032616/55a8923a1a28ab1c608b4611/html5/thumbnails/44.jpg)
Conclusions
• Both entropy measures (Hsyntax, Hterm, Hmatching) and query features (instances, complex classes, operators) can be used as estimators of query semantic complexity.
• These can be incorporated as heuristics into schema-agnostic query planning approaches (or approximate semantic parsing) to maximize semantic matching probabilities.
• Better semantic entropy estimators still need to be constructed.