Wi presentation
Keyword-driven SPARQL Query Generation
Leveraging Background Knowledge
Authors: Saeedeh Shekarpour, Sören Auer, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, Sebastian Hellmann, Claus Stadler
AKSW group
Universität Leipzig
WI-IAT conference
Outline
• Motivation
• Entity recognition phase
• SPARQL query generation
• Evaluation
• Conclusion and future work
AKSW group - Universität Leipzig, 24 August 2011
Querying web of documents
Text retrieval
Web of Data
Motivations
Difficulties of SPARQL
• Knowledge about the underlying ontology structure.
• Proficiency in formulating formal queries.
Keyword paradigm
• Successful experience of keyword-based search in document retrieval
• Satisfactory research results about the usability of this paradigm
Bird's-eye view of the envisioned search approach
Overview of the proposed method
Mapping keywords to IRIs
• The goal is to recognize the entities that the keywords refer to.
• Mapping is based on string similarity.
• The similarity is computed against all types of entities (i.e., classes, properties and instances).
• As a result, for each keyword, we retrieve a list of candidate IRIs called anchor points.
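A minimal sketch of the mapping step, assuming a small in-memory label index and Python's `difflib` ratio as the string similarity. The slides do not name the similarity measure, the threshold, or the index structure, so `LABELS`, `label_similarity`, `anchor_points` and the `0.7` threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical toy index: IRI -> (label, entity type). Not the system's real index.
LABELS = {
    "http://dbpedia.org/ontology/Island": ("Island", "class"),
    "http://dbpedia.org/resource/Germany": ("Germany", "instance"),
    "http://dbpedia.org/ontology/country": ("country", "property"),
    "http://dbpedia.org/resource/Iceland": ("Iceland", "instance"),
}


def label_similarity(label, keyword):
    """Case-insensitive similarity in [0, 1] (difflib ratio; an assumed measure)."""
    return SequenceMatcher(None, label.lower(), keyword.lower()).ratio()


def anchor_points(keyword, labels=LABELS, threshold=0.7):
    """Return (iri, type, score) candidates whose label is close to the keyword.

    Classes, properties and instances are all matched the same way.
    """
    candidates = []
    for iri, (label, kind) in labels.items():
        score = label_similarity(label, keyword)
        if score >= threshold:
            candidates.append((iri, kind, score))
    # Best-matching candidates first.
    return sorted(candidates, key=lambda c: c[2], reverse=True)


island_candidates = anchor_points("island")
germany_candidates = anchor_points("Germany")
```

Each keyword yields its own candidate list, which is what the later ranking step sorts and truncates.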
Ranking and Selecting Anchor Points
• Ranking is based on the specificity degree.
• The specificity degree is defined in terms of string similarity and connectivity degree.
• The string similarity score measures the similarity of the label of $u \in AP_{K_i}$ to the keyword $K_i$.
• The connectivity degree $CD(u)$ of each $u \in AP_{K_i}$ is computed by counting how often $u$ occurs in the triples of the knowledge base.
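Ranking and Selecting Anchor Points
• The specificity degree $S(u)$ is defined from the label similarity between $u$ and $K_i$ and from $\log(CD(u))$. [Formula image; the operator combining the two terms did not survive extraction.]
• The anchor points of each keyword are sorted by specificity degree.
• The top-n IRIs are selected from each sorted anchor-point list.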
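The ranking step can be sketched as follows. The connectivity degree follows the slide's description (counting occurrences in the triples). How similarity and log(CD(u)) are combined into the specificity degree is not legible in this transcript, so the sketch takes the combination as a caller-supplied `combine` function; the product used in the usage lines is only a stand-in assumption, and the triples are toy data.

```python
import math
from collections import Counter


def connectivity_degree(triples):
    """Count how often each term occurs in any position of the triples."""
    counts = Counter()
    for s, p, o in triples:
        counts.update((s, p, o))
    return counts


def rank_anchor_points(candidates, cd, combine, n=2):
    """Sort (iri, similarity) candidates by a specificity score and keep the top n.

    `combine(similarity, log_cd)` is an ASSUMED hook: the slides' exact formula
    is not recoverable here, so the caller decides how the two parts are joined.
    """
    def specificity(item):
        iri, similarity = item
        return combine(similarity, math.log(cd[iri]))

    # Terms that never occur in the knowledge base have no defined log(CD).
    known = [c for c in candidates if cd[c[0]] > 0]
    return sorted(known, key=specificity, reverse=True)[:n]


# Toy knowledge base (illustrative only).
triples = [
    ("dbr:Sylt", "rdf:type", "dbo:Island"),
    ("dbr:Vilm", "rdf:type", "dbo:Island"),
    ("dbr:Sylt", "dbp:country", "dbr:Germany"),
    ("dbr:Island_Records", "rdf:type", "dbo:Company"),
]
cd = connectivity_degree(triples)
candidates = [("dbo:Island", 1.0), ("dbr:Island_Records", 0.6)]
top = rank_anchor_points(candidates, cd, combine=lambda sim, log_cd: sim * log_cd, n=1)
```

Keeping `combine` as a parameter makes it easy to swap in the actual definition once it is known, without touching the counting or top-n logic.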
Graph pattern template
• $H$ is a set of placeholders and $V$ is a set of variable identifiers, disjoint from each other and from $C \cup I \cup P$ (classes, instances and properties).
• A graph pattern template is defined as:
  $GPT = \{(s, p, o) \mid s \in (V \cup H) \wedge p \in (V \cup H) \wedge o \in (V \cup H)\}$
• After replacing the placeholders in a graph pattern template with the detected IRIs, we obtain a graph pattern with triple patterns of the form
  $(V \cup I) \times (V \cup P) \times (V \cup C \cup I)$
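As a small illustration of replacing placeholders with detected IRIs, the sketch below represents a template as a list of triples whose terms are variables (prefixed with `?`), fixed tokens, or placeholders. The placeholder spelling (`<c>`, `<o1>`) and the function name `instantiate` are assumptions made for this example, not notation from the slides.

```python
def instantiate(template, bindings):
    """Replace placeholders in a graph pattern template with IRIs.

    Terms that are not keys of `bindings` (variables such as "?s1", or the
    rdf:type shorthand "a") are left unchanged.
    """
    pattern = []
    for triple in template:
        pattern.append(tuple(bindings.get(term, term) for term in triple))
    return pattern


# A two-triple template: class membership plus a link to an instance.
# "<c>" and "<o1>" are hypothetical placeholder names.
template = [("?s1", "a", "<c>"), ("?s1", "?p1", "<o1>")]
pattern = instantiate(template, {"<c>": "dbo:Island", "<o1>": "dbr:Germany"})
```

After instantiation, every term is either a variable or an IRI, which is the shape a SPARQL basic graph pattern needs.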
Categorization of all graph pattern templates
Category                 Pattern   Pattern schema
Instance-Property (IP)   IP.P1     (s, p, ?o)
                         IP.P2     (?s, p, o)
                         IP.P3     (?s1, ?p1, o1)(?s1, p2, ?o2)
                         IP.P4     (?s1, ?p1, o1)(?o2, p2, ?s1)
                         IP.P5     (s1, ?p1, ?o1)(?s2, p2, ?o1)
                         IP.P6     (s1, ?p1, ?o1)(?o1, p2, ?o2)
Class-Instance (CI)      CI.P7     (?s1, a, c)(?s1, ?p1, o1)
                         CI.P8     (?s1, a, c)(s2, ?p1, ?s1)
Instance-Instance (II)   II.P9     (s, ?p, o)
                         II.P10    (s, ?p1, ?x)(?x, ?p2, o)
                         II.P11    (s1, ?p1, ?x)(s2, ?p2, ?x)
                         II.P12    (?s, ?p1, o1)(?s, ?p2, o2)
Class-Property (CP)      CP.P13    (?s, a, c)(?s, p, ?o)
                         CP.P14    (?s, a, c)(?x, p, ?s)
Property-Property (PP)   PP.P15    (?s, p1, ?x)(?x, p2, ?o)
                         PP.P16    (?s1, p1, ?o)(?s2, p2, ?o)
                         PP.P17    (?s, p1, ?o1)(?s, p2, ?o2)
Graph pattern templates identified as appropriate
Category                 Pattern   Pattern schema
Instance-Property (IP)   IP.P1     (s, p, ?o)
                         IP.P4     (?s1, ?p1, o1)(?o2, p2, ?s1)
                         IP.P6     (s1, ?p1, ?o1)(?o1, p2, ?o2)
Class-Instance (CI)      CI.P7     (?s1, a, c)(?s1, ?p1, o1)
                         CI.P8     (?s1, a, c)(s2, ?p1, ?s1)
Instance-Instance (II)   II.P9     (s, ?p, o)
                         II.P10    (s, ?p1, ?x)(?x, ?p2, o)
Class-Property (CP)      CP.P14    (?s, a, c)(?x, p, ?s)
Property-Property (PP)   -         -
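One way to use the table above in code is a lookup from the types of two recognized entities to the category and its appropriate templates. This is a sketch under the assumption that the category is determined solely by the unordered pair of entity types; the names `APPROPRIATE_TEMPLATES` and `templates_for` are hypothetical.

```python
# Templates kept as appropriate, per category (taken from the table above).
APPROPRIATE_TEMPLATES = {
    "IP": ["IP.P1", "IP.P4", "IP.P6"],
    "CI": ["CI.P7", "CI.P8"],
    "II": ["II.P9", "II.P10"],
    "CP": ["CP.P14"],
    "PP": [],
}

# Unordered pair of entity types -> category code.
_CATEGORY = {
    frozenset(["instance", "property"]): "IP",
    frozenset(["class", "instance"]): "CI",
    frozenset(["instance"]): "II",
    frozenset(["class", "property"]): "CP",
    frozenset(["property"]): "PP",
}


def templates_for(type_a, type_b):
    """Return (category, template ids) for two entity types.

    Pairs that the table does not cover (e.g. class/class) give (None, []).
    """
    category = _CATEGORY.get(frozenset([type_a, type_b]))
    if category is None:
        return None, []
    return category, APPROPRIATE_TEMPLATES[category]


category, template_ids = templates_for("class", "instance")
```

Using a `frozenset` as the key makes the lookup independent of keyword order, so ("instance", "class") resolves to the same category.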
Query generation algorithm
Example
Consider two keywords: "Germany" and "island".
User intention: the list of Germany's islands.
After applying the mapping and ranking functions to the user keywords, we obtain two identified IRIs:
1. http://dbpedia.org/ontology/Island, of type class
2. http://dbpedia.org/resource/Germany, of type instance
The possible graph pattern templates for these two IRIs are:
1. (?island, a, dbo:Island), (?island, ?p, dbr:Germany)
2. (?island, a, dbo:Island), (dbr:Germany, ?p, ?island)
Example
The resulting SPARQL queries are:
SELECT * WHERE { ?island a dbo:Island . ?island ?p dbr:Germany . }
SELECT * WHERE { ?island a dbo:Island . dbr:Germany ?p ?island . }
Some desired answers to be retrieved are:
dbr:Rettbergsaue a dbo:Island .
dbr:Rettbergsaue dbp:country dbr:Germany .
dbr:Sylt a dbo:Island .
dbr:Sylt dbp:country dbr:Germany .
dbr:Vilm a dbo:Island .
dbr:Vilm dbp:country dbr:Germany .
dbr:Mainau a dbo:Island .
dbr:Mainau dbp:country dbr:Germany .
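The step from instantiated graph patterns to query strings can be sketched as plain string assembly. The function name `to_sparql` and the decision to emit `PREFIX` declarations are assumptions of this sketch; the two patterns are the ones from the example, written with the `dbr:` resource prefix.

```python
# Standard DBpedia namespace IRIs for the prefixes used in the example.
PREFIXES = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
}


def to_sparql(pattern, prefixes=PREFIXES):
    """Serialize a list of (s, p, o) triple patterns as a SELECT * query."""
    header = "".join(f"PREFIX {name}: <{iri}>\n" for name, iri in prefixes.items())
    body = " ".join(f"{s} {p} {o} ." for s, p, o in pattern)
    return f"{header}SELECT * WHERE {{ {body} }}"


# The two instantiated patterns from the Germany/island example.
pattern_1 = [("?island", "a", "dbo:Island"), ("?island", "?p", "dbr:Germany")]
pattern_2 = [("?island", "a", "dbo:Island"), ("dbr:Germany", "?p", "?island")]
queries = [to_sparql(pattern_1), to_sparql(pattern_2)]
```

The two queries differ only in the direction of the unknown property `?p`, which is why both templates are tried: the data may link the island to the country or the country to the island.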
Online interface
lod-query.aksw.org
Accuracy metrics
• The user’s intention in keyword-based search is ambiguous.
• Judging the correctness of the retrieved answers is a challenging task.
• Example: given the keywords France and President, the following RDF graphs (i.e., answers) are presented to the user:
1. Nicolas_Sarkozy nationality France .
   Nicolas_Sarkozy a President .
2. Felix_Faure birthplace France .
   Felix_Faure a President .
3. Yasser_Arafat deathplace France .
   Yasser_Arafat a President .
...
Accuracy metrics
• Besides distinguishing between answers related to different interpretations, we differentiate between pure answers (just containing preferred terms) and those which contain some impurity.
• In fact, the correctness of an answer is not a bivalent value.
• We investigate two questions:
1) For how many of the keyword queries do the templates yield answers at all with respect to the original intention?
2) If answers are returned, how correct are they?
Accuracy metrics
• Correctness rate (CR). For an individual answer, the correctness rate is the fraction of correct (preferred) RDF terms occurring in it.
• Average CR. For the set of answers to a query q, the average correctness rate is the arithmetic mean of the CRs of its individual answers.
• Fuzzy precision (FP). This metric measures the overall correctness of the answers corresponding to a set of keyword queries.
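A minimal sketch of these metrics. CR and average CR follow the definitions given above. The exact fuzzy precision formula is not shown in the transcript, so `fuzzy_precision` ASSUMES it is the mean of the average CRs over the queries that returned answers. The set of preferred terms in the usage lines is a hypothetical user judgment, not data from the evaluation.

```python
def correctness_rate(answer_terms, preferred_terms):
    """Fraction of the RDF terms in one answer that are preferred (correct)."""
    if not answer_terms:
        return 0.0
    correct = sum(1 for term in answer_terms if term in preferred_terms)
    return correct / len(answer_terms)


def average_cr(answers, preferred_terms):
    """Arithmetic mean of the CRs of the individual answers to one query."""
    rates = [correctness_rate(answer, preferred_terms) for answer in answers]
    return sum(rates) / len(rates) if rates else 0.0


def fuzzy_precision(average_crs):
    """ASSUMPTION: mean of the average CRs over queries that returned answers."""
    return sum(average_crs) / len(average_crs) if average_crs else 0.0


# Two answers for the keywords France and President, each flattened to its terms.
answers = [
    ["Nicolas_Sarkozy", "nationality", "France", "Nicolas_Sarkozy", "a", "President"],
    ["Yasser_Arafat", "deathplace", "France", "Yasser_Arafat", "a", "President"],
]
# Hypothetical judgment of which terms match the user's intention.
preferred = {"Nicolas_Sarkozy", "nationality", "France", "a", "President"}

cr_values = [correctness_rate(answer, preferred) for answer in answers]
acr = average_cr(answers, preferred)
```

Because CR is a fraction rather than a yes/no flag, an answer that is only partly on target still contributes to the score, which matches the point that correctness is not a bivalent value.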
Accuracy metrics
• We also measured recall as the fraction of keyword queries for which answers were found.
Accuracy of each categorized graph pattern
Categorization based on the kind of information sought
1. Finding special characteristics of an instance: IP.P1, IP.P4, IP.P6
2. Finding similar instances: CI.P7, CI.P8, CP.P14
3. Finding associations between instances: II.P9, II.P10
Samples of keywords and results
Accuracy results for different categories
Category                         Recall   Fuzzy precision   F-score
Similar instances                0.700    0.735             0.717
Characteristics of an instance   0.625    0.700             0.660
Associations between instances   0.500    0.710             0.580
General accuracy                 0.625    0.724             0.670
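For reference, recall as described (the fraction of keyword queries for which answers were found) and an F-score can be computed as below. The slides do not state the F-score formula; this sketch ASSUMES the usual harmonic mean of recall and fuzzy precision, which gives about 0.717 for the "Similar instances" row (0.700 and 0.735). The function names and the flag list are illustrative.

```python
def recall(answered_flags):
    """Fraction of keyword queries for which answers were found.

    `answered_flags` holds one boolean per keyword query.
    """
    if not answered_flags:
        return 0.0
    return sum(1 for flag in answered_flags if flag) / len(answered_flags)


def f_score(recall_value, precision_value):
    """Harmonic mean of recall and (fuzzy) precision; an ASSUMED definition."""
    total = recall_value + precision_value
    if total == 0:
        return 0.0
    return 2 * recall_value * precision_value / total


# Illustrative flags: three of four queries returned answers.
example_recall = recall([True, True, False, True])
# Recall and fuzzy precision taken from the "Similar instances" row.
similar_instances_f = f_score(0.700, 0.735)
```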
Conclusion and future work
• We analyzed graph patterns in order to limit the search space.
• We did not separate the ontology level from the knowledge base level when generating graph patterns.
We aim to:
1. Allow a larger number of keywords.
2. Make more extensive use of linguistic features and techniques.
3. Enable users to refine the obtained queries and to add further constraints.
4. Apply this work to large-scale datasets of the Data Web.
Thank you for your attention.
Thanks to my colleagues from the AKSW research group.
Any questions?