Keyword Search in RDF Databasescgi.di.uoa.gr/.../files/dissertation_msc_presentation.pdfMSc...
Transcript of Keyword Search in RDF Databasescgi.di.uoa.gr/.../files/dissertation_msc_presentation.pdfMSc...
Keyword Search in RDF Databases
Charalampos S. [email protected]
Department of Informatics & TelecommunicationsUniversity of Athens
MSc Dissertation PresentationApril 15, 2011
Outline
Background
User Requirements
Our Approach
Implementation
Experimental Evaluation
Other Directions to Keyword Search
Conclusions & Future Work
Background
Keyword Search on (Semi-)Structured data
Data modelGraph-based data model
I Schema-aware
X
I Schema-agnostic (comeswith specialized indexes)
Exploration Algorithms
I Backward Expansion
I Bidirectional Expansion
I Variants of the above
X
Structure of the Answer
I Trees
I Graphs
I Nodes/Entities
X
I (Multi-)Relations
Ranking/Scoring of Answers
I TF/IDF and other morecomplex measures from IR
X
I Node/Edge weights
X
I Page Rank like,Hub/Authority nodes
X
I Path length
X
I Number of nodes/edges
X
Keyword Search on (Semi-)Structured data
Data modelGraph-based data model
I Schema-aware
X
I Schema-agnostic (comeswith specialized indexes)
Exploration Algorithms
I Backward Expansion
I Bidirectional Expansion
I Variants of the above
X
Structure of the Answer
I Trees
I Graphs
I Nodes/Entities
X
I (Multi-)Relations
Ranking/Scoring of Answers
I TF/IDF and other morecomplex measures from IR
X
I Node/Edge weights
X
I Page Rank like,Hub/Authority nodes
X
I Path length
X
I Number of nodes/edges
X
Keyword Search on (Semi-)Structured data
Data modelGraph-based data model
I Schema-aware
X
I Schema-agnostic (comeswith specialized indexes)
Exploration Algorithms
I Backward Expansion
I Bidirectional Expansion
I Variants of the above
X
Structure of the Answer
I Trees
I Graphs
I Nodes/Entities
X
I (Multi-)Relations
Ranking/Scoring of Answers
I TF/IDF and other morecomplex measures from IR
X
I Node/Edge weights
X
I Page Rank like,Hub/Authority nodes
X
I Path length
X
I Number of nodes/edges
X
Keyword Search on (Semi-)Structured data
Data modelGraph-based data model
I Schema-aware
X
I Schema-agnostic (comeswith specialized indexes)
Exploration Algorithms
I Backward Expansion
I Bidirectional Expansion
I Variants of the above
X
Structure of the Answer
I Trees
I Graphs
I Nodes/Entities
X
I (Multi-)Relations
Ranking/Scoring of Answers
I TF/IDF and other morecomplex measures from IR
X
I Node/Edge weights
X
I Page Rank like,Hub/Authority nodes
X
I Path length
X
I Number of nodes/edges
X
Keyword Search on (Semi-)Structured data
Data modelGraph-based data model
I Schema-aware X
I Schema-agnostic (comeswith specialized indexes)
Exploration Algorithms
I Backward Expansion
I Bidirectional Expansion
I Variants of the above X
Structure of the Answer
I Trees
I Graphs
I Nodes/Entities X
I (Multi-)Relations
Ranking/Scoring of Answers
I TF/IDF and other morecomplex measures from IRX
I Node/Edge weights X
I Page Rank like,Hub/Authority nodes X
I Path length X
I Number of nodes/edges X
User Requirements
Requirements
Use CaseHistorians interested in the evolution of Biotechnology andRenewable Energy (see Papyrus Project)
1. Search for information about certain concepts, facts, events
2. Search may involve temporal restrictions
3. Source of information may contain indefinite temporalinformation
Requirements
Use CaseHistorians interested in the evolution of Biotechnology andRenewable Energy (see Papyrus Project)
1. Search for information about certain concepts, facts, eventse.g., stem cells and public opinion
2. Search may involve temporal restrictions
3. Source of information may contain indefinite temporalinformation
Requirements
Use CaseHistorians interested in the evolution of Biotechnology andRenewable Energy (see Papyrus Project)
1. Search for information about certain concepts, facts, events
2. Search may involve temporal restrictionse.g., evolution of biotechnology in 20th century
3. Source of information may contain indefinite temporalinformation
Requirements
Use CaseHistorians interested in the evolution of Biotechnology andRenewable Energy (see Papyrus Project)
1. Search for information about certain concepts, facts, events
2. Search may involve temporal restrictions
3. Source of information may contain indefinite temporalinformatione.g., “around 7000 BC biotechnology involved brewing beer,fermenting wine and baking bread with help of yeast”
Our Approach
Data Model
I Extension of the temporal RDF data model with indefinitetime intervals for the validity of a triple: (s,p,o)[i ]
I Validity of a resource r : (r ,subclass,Resource)[i ]
I Indefinite time intervals (set I ): (s1,s2,e1,e2), wheres1,s2,e1,e2 are natural numbers
s1 s2 e1 e2
I Indefinite time point: (s,e,s,e)
s e
Data Model
I Extension of the temporal RDF data model with indefinitetime intervals for the validity of a triple: (s,p,o)[i ]
I Validity of a resource r : (r ,subclass,Resource)[i ]
I Indefinite time intervals (set I ): (s1,s2,e1,e2), wheres1,s2,e1,e2 are natural numbers
s1 s2 e1 e2
I Indefinite time point: (s,e,s,e)
s e
Data Model
I Extension of the temporal RDF data model with indefinitetime intervals for the validity of a triple: (s,p,o)[i ]
I Validity of a resource r : (r ,subclass,Resource)[i ]
I Indefinite time intervals (set I ): (s1,s2,e1,e2), wheres1,s2,e1,e2 are natural numbers
s1 s2 e1 e2
I Indefinite time point: (s,e,s,e)
s e
Query Language
I Keyword-based language extended with temporal constraintsexpressed in a controlled natural language
I Temporal constraints (set TR):I Allen’s temporal relations: before, after, meets, met by,overlaps, overlapped by, starts, started by, finishes,finished by, during, contains
I Other lexical forms (formal/informal) used in speech andwriting
I Time intervals: [s1− s2,e3− e4], with each si (ei ) being ofthe form yyyy/mm/dd
DefinitionA query is a pair (KL,φ). The expression KL is a list of keywordsand φ is a finite conjunction of formulas of the form R c , whereR ∈ TR and c ∈ I
Query Language
I Keyword-based language extended with temporal constraintsexpressed in a controlled natural language
I Temporal constraints (set TR):I Allen’s temporal relations: before, after, meets, met by,overlaps, overlapped by, starts, started by, finishes,finished by, during, contains
I Other lexical forms (formal/informal) used in speech andwriting
I Time intervals: [s1− s2,e3− e4], with each si (ei ) being ofthe form yyyy/mm/dd
DefinitionA query is a pair (KL,φ). The expression KL is a list of keywordsand φ is a finite conjunction of formulas of the form R c , whereR ∈ TR and c ∈ I
Query Language
I Keyword-based language extended with temporal constraintsexpressed in a controlled natural language
I Temporal constraints (set TR):I Allen’s temporal relations: before, after, meets, met by,overlaps, overlapped by, starts, started by, finishes,finished by, during, contains
I Other lexical forms (formal/informal) used in speech andwriting
I Time intervals: [s1− s2,e3− e4], with each si (ei ) being ofthe form yyyy/mm/dd
DefinitionA query is a pair (KL,φ). The expression KL is a list of keywordsand φ is a finite conjunction of formulas of the form R c , whereR ∈ TR and c ∈ I
Example
DatabaseBerlin has been a city since some point in the 12th century
(ex:city, rdf:type, rdfs:Class).
(ex:city, rdfs:label, "City").
(ex:berlin, foaf:name, "Berlin").
(ex:berlin, rdf:type, ex:city)[1100 - 1190, 2011].
Example
DatabaseBerlin has been a city since some point in the 12th century
(ex:city, rdf:type, rdfs:Class).
(ex:city, rdfs:label, "City").
(ex:berlin, foaf:name, "Berlin").
(ex:berlin, rdf:type, ex:city)[1100 - 1190, 2011].
Information needs (1)
I would like information about cities during the period 1200 – 1210
city during [1200, 1210]
Example
DatabaseBerlin has been a city since some point in the 12th century
(ex:city, rdf:type, rdfs:Class).
(ex:city, rdfs:label, "City").
(ex:berlin, foaf:name, "Berlin").
(ex:berlin, rdf:type, ex:city)[1100 - 1190, 2011].
Information needs (1)
I would like information about cities during the period 1200 – 1210
city during [1200, 1210]
AnswerThere is such a city named Berlin
Example
DatabaseBerlin has been a city since some point in the 12th century
(ex:city, rdf:type, rdfs:Class).
(ex:city, rdfs:label, "City").
(ex:berlin, foaf:name, "Berlin").
(ex:berlin, rdf:type, ex:city)[1100 - 1190, 2011].
Information needs (2)
I would like information about cities during the period 1100 – 1110
city during [1100, 1110]
Example
DatabaseBerlin has been a city since some point in the 12th century
(ex:city, rdf:type, rdfs:Class).
(ex:city, rdfs:label, "City").
(ex:berlin, foaf:name, "Berlin").
(ex:berlin, rdf:type, ex:city)[1100 - 1190, 2011].
Information needs (2)
I would like information about cities during the period 1100 – 1110
city during [1100, 1110]
AnswerThere is possibly a city named Berlin
Example
DatabaseBerlin has been a city since some point in the 12th century
(ex:city, rdf:type, rdfs:Class).
(ex:city, rdfs:label, "City").
(ex:berlin, foaf:name, "Berlin").
(ex:berlin, rdf:type, ex:city)[1100 - 1190, 2011].
The ideal answerThe answer to both questions should include the individual“Berlin” and the class “City”, but in the second case these twoentities should have lower score to convey the notion of possibility
Data Structures
Data graph
The underlying RDF graph
Schema graph
A summary of the data graph, containing data-driven schemainformation
Query graph
Super graph of schema graph containing elements from the RDFgraph
Data Structures
Data graph
The underlying RDF graph
Schema graph
A summary of the data graph, containing data-driven schemainformation
Query graph
Super graph of schema graph containing elements from the RDFgraph
Data Structures
Data graph
The underlying RDF graph
Schema graph
A summary of the data graph, containing data-driven schemainformation
Query graph
Super graph of schema graph containing elements from the RDFgraph
Data graph
Subject Predicate Object
pro1 type Projectpro2 type Projectpro1 name Papyruspub1 type Publicationpub1 author res1pub1 author res2pub1 year 2010pub2 type Publicationres1 type Researcherres2 type Researcheruniv1 name DI&Tres2 name Y. Ioannidisres1 name M. Koubarakisres1 worksAt univ1univ1 type Universityuniv2 type University
University subclass AgentAgent subclass Resourcepub1 hasProject pro1
subclass
subclas
s
subclasssubclass
works
At
hasProject
author
pro1
pro2
pub1
pub2
resres
2
res3
univ1
univ2
Project
Researcher
University
Publication
Person Resource
Agent
author
type
type
type
type
type
type
type
ManolisKoubarakis
YannisIoannidis
name
namePapyrus name
DI&T
name
2010
typeyear
type
1
Figure: a) RDF triples b) Data graph
Schema and Query graphs
Project Researcher
Publication Person
Agent
University
Resource
subclass
subclas
s
subclass
subcla
ss
worksAt
hasProject au
thor
2010 Resource
Agent
Researcher
Person
University
hasProject
author
subcla
ss
worksAt
year
name
subclass
subclass
subclas
s
Project
Publication
ManolisKoubarakis
Figure: a) schema graph
b) query graph for koubarakis publications
during 2010
Schema and Query graphs
Project Researcher
Publication Person
Agent
University
Resource
subclass
subclas
s
subclass
subcla
ss
worksAt
hasProject au
thor
2010 Resource
Agent
Researcher
Person
University
hasProject
author
subcla
ss
worksAt
year
name
subclass
subclass
subclas
s
Project
Publication
ManolisKoubarakis
Figure: a) schema graph b) query graph for koubarakis publications
during 2010
Keyword Search Algorithm
Query processing in 4 phases:
1. Keyword Interpretation (KI): Interpret keywords as RDF graphelements (keyword elements) and construct the query graph
2. Graph Exploration (GE): Explore the query graph startingfrom keyword elements for finding top-k subgraphs
3. Query Mapping (QM): Map top-k subgraphs to SPARQLqueries and evaluate them
4. Entity Transformation (ET): Transform and rank entitiesappearing in subgraphs and results from query evaluation toentities
Running query
koubarakis publications during 2010
Keyword Search Algorithm
Query processing in 4 phases:
1. Keyword Interpretation (KI): Interpret keywords as RDF graphelements (keyword elements) and construct the query graph
2. Graph Exploration (GE): Explore the query graph startingfrom keyword elements for finding top-k subgraphs
3. Query Mapping (QM): Map top-k subgraphs to SPARQLqueries and evaluate them
4. Entity Transformation (ET): Transform and rank entitiesappearing in subgraphs and results from query evaluation toentities
Running query
koubarakis publications during 2010
Keyword Search Algorithm
Query processing in 4 phases:
1. Keyword Interpretation (KI): Interpret keywords as RDF graphelements (keyword elements) and construct the query graph
2. Graph Exploration (GE): Explore the query graph startingfrom keyword elements for finding top-k subgraphs
3. Query Mapping (QM): Map top-k subgraphs to SPARQLqueries and evaluate them
4. Entity Transformation (ET): Transform and rank entitiesappearing in subgraphs and results from query evaluation toentities
Running query
koubarakis publications during 2010
Keyword Search Algorithm
Query processing in 4 phases:
1. Keyword Interpretation (KI): Interpret keywords as RDF graphelements (keyword elements) and construct the query graph
2. Graph Exploration (GE): Explore the query graph startingfrom keyword elements for finding top-k subgraphs
3. Query Mapping (QM): Map top-k subgraphs to SPARQLqueries and evaluate them
4. Entity Transformation (ET): Transform and rank entitiesappearing in subgraphs and results from query evaluation toentities
Running query
koubarakis publications during 2010
Keyword Search Algorithm
Query processing in 4 phases:
1. Keyword Interpretation (KI): Interpret keywords as RDF graphelements (keyword elements) and construct the query graph
2. Graph Exploration (GE): Explore the query graph startingfrom keyword elements for finding top-k subgraphs
3. Query Mapping (QM): Map top-k subgraphs to SPARQLqueries and evaluate them
4. Entity Transformation (ET): Transform and rank entitiesappearing in subgraphs and results from query evaluation toentities
Running query
koubarakis publications during 2010
Keyword Interpretationkoubarakis publications during 2010
Project Researcher
Publication Person
Agent
University
Resourcesubclass
subclas
s
subclass
subcla
ss
worksAt
hasProject au
thor
Keyword Interpretationkoubarakis publications during 2010
2010 Resource
Agent
Researcher
Person
University
hasProject
author
subcla
ss
worksAt
year
name
subclass
subclass
subclas
s
Project
Publication
ManolisKoubarakis
Graph Explorationkoubarakis publications during 2010
Graph Explorationkoubarakis publications during 2010
2010 Resource
Agent
Researcher
Person
University
hasProject
author
subcla
ss
worksAt
year
name
subclass
subclass
subclas
s
Project
Publication
ManolisKoubarakis
Graph Explorationkoubarakis publications during 2010
2010 Resource
Agent
Researcher
Person
University
hasProject
author
subclass
worksAt
year
name
subcl
ass
subclass
subclas
s
Project
Publicat ion
ManolisKoubarakis
Query Mappingkoubarakis publications during 2010
Query Mappingkoubarakis publications during 2010
2010 Resource
Agent
Researcher
Person
University
hasProject
author
subclass
worksAt
year
name
subcl
ass
subclass
subclas
s
Project
Publicat ion
ManolisKoubarakis
Query Mappingkoubarakis publications during 2010
2010
Researcherauth
or
year
name
Publicat ion
ManolisKoubarakis
Query Mappingkoubarakis publications during 2010
2010
Researcherauth
or
year
name
Publicat ion
ManolisKoubarakis
Conjunctive query
name(x,”Koubarakis”)∧type(x,Researcher)∧author(y,x)∧year(y,”2010”)∧type(y,Publication)
Query Mappingkoubarakis publications during 2010
SPARQL Query
SELECT ?x ?y ?xl ?yl
WHERE {
?x rdf:type ex:Researcher .
?x foaf:name "Manolis Koubarakis" .
?y ex:author ?x .
?x rdf:type ex:Publication .
?x ex:year "2010" .
?x rdfs:label ?xl .
?y rdfs:label ?yl .
}
Query Mappingkoubarakis publications during 2010
Table: SPARQL result
?x ?y ?xl ?ylres1 pub1 “Manolis Koubarakis”
Entity Transformationkoubarakis publications during 2010
Table: Answer
Rank Entity1 “Manolis Koubarakis”2 Publication3 pub14 Researcher
How ranking is performed?
Entity Transformationkoubarakis publications during 2010
Table: Answer
Rank Entity1 “Manolis Koubarakis”2 Publication3 pub14 Researcher
How ranking is performed?
Scoring FunctionsScoring of a graph (cont’d)
For any cost function cf : V ∪E → [0,1]:
CG = ∑pi∈P
∑n∈pi
cf (n)
Path length
cf (n)≡ cp(n) = 1/diamG , for any element n
Keyword Matching
cf (n)≡ ckw (n) =
{ε > 0 , if 1− sim(n) = 01− sim(n) , otherwise
Scoring FunctionsScoring of a graph (cont’d)
For any cost function cf : V ∪E → [0,1]:
CG = ∑pi∈P
∑n∈pi
cf (n)
Path length
cf (n)≡ cp(n) = 1/diamG , for any element n
Keyword Matching
cf (n)≡ ckw (n) =
{ε > 0 , if 1− sim(n) = 01− sim(n) , otherwise
Scoring FunctionsScoring of a graph (cont’d)
For any cost function cf : V ∪E → [0,1]:
CG = ∑pi∈P
∑n∈pi
cf (n)
Path length
cf (n)≡ cp(n) = 1/diamG , for any element n
Keyword Matching
cf (n)≡ ckw (n) =
{ε > 0 , if 1− sim(n) = 01− sim(n) , otherwise
Scoring FunctionsScoring of a graph
Popularity
cf (n)≡ cpop(n) =
1− |nagg ||V | , if n ∈ VC
1− |ninc ||V | , if n ∈ VE
1− |nagg ||E | , if n ∈ LR
Combine
cf (n)≡ ccomb(n) = cp(n)∗ ckw (n)∗ cpop(n)
Scoring FunctionsScoring of a graph
Popularity
cf (n)≡ cpop(n) =
1− |nagg ||V | , if n ∈ VC
1− |ninc ||V | , if n ∈ VE
1− |nagg ||E | , if n ∈ LR
Combine
cf (n)≡ ccomb(n) = cp(n)∗ ckw (n)∗ cpop(n)
Scoring FunctionsScoring of an entity
Entity Cost
Represents the weighted average of the cost of an entity over thesubgraphs it appears
Cost(e,S) =Ccf (e)∗minSGCost
|SGS(e)| ∑SGi∈SGS (e)
1
Ccf (SGi )
Final Entity Cost
Represents the average cost of an entity derived both during graphexploration and query mapping
Cost(e) =
Cost(e,SGE)+Cost(e,QE)
2 , if e ∈ SGE ∩QECost(e,SGE ) , if e ∈ SGE and e /∈ QECost(e,QE ) , if e ∈ QE and e /∈ SGE
Scoring FunctionsScoring of an entity
Entity Cost
Represents the weighted average of the cost of an entity over thesubgraphs it appears
Cost(e,S) =Ccf (e)∗minSGCost
|SGS(e)| ∑SGi∈SGS (e)
1
Ccf (SGi )
Final Entity Cost
Represents the average cost of an entity derived both during graphexploration and query mapping
Cost(e) =
Cost(e,SGE)+Cost(e,QE)
2 , if e ∈ SGE ∩QECost(e,SGE ) , if e ∈ SGE and e /∈ QECost(e,QE ) , if e ∈ QE and e /∈ SGE
Implementation
System Architecture
Figure: The architecture of the keyword querying system
System Implementation
RDF Store:
Sesame
Keyword index and full-text search: LuceneSail
Query Processor: Java implementation
System Implementation
RDF Store: Sesame
Keyword index and full-text search: LuceneSail
Query Processor: Java implementation
System Implementation
RDF Store: Sesame
Keyword index and full-text search:
LuceneSail
Query Processor: Java implementation
System Implementation
RDF Store: Sesame
Keyword index and full-text search: LuceneSail
Query Processor: Java implementation
System Implementation
RDF Store: Sesame
Keyword index and full-text search: LuceneSail
Query Processor:
Java implementation
System Implementation
RDF Store: Sesame
Keyword index and full-text search: LuceneSail
Query Processor: Java implementation
Experimental Evaluation
Experimental Setup
Machine
I CPU: Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00 GHz, L26144 KB
I Main Memory: 4 GB, 1033 MHz
I Hard Disk: 750 GB, 7200 rpm, 8 MB Buffer
Datasets
I History Ontology (HO)
I Semantic Web Dog Food Ontology (SWDF)
I Digital Bibliography & Library Project Ontology (DBLP)
Evaluated Dimensions
I Efficiency: Load Scalability, Query Answering Performance
I Effectiveness: Precision/Recall, F-measure, NDCG
Experimental Setup
Machine
I CPU: Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00 GHz, L26144 KB
I Main Memory: 4 GB, 1033 MHz
I Hard Disk: 750 GB, 7200 rpm, 8 MB Buffer
Datasets
I History Ontology (HO)
I Semantic Web Dog Food Ontology (SWDF)
I Digital Bibliography & Library Project Ontology (DBLP)
Evaluated Dimensions
I Efficiency: Load Scalability, Query Answering Performance
I Effectiveness: Precision/Recall, F-measure, NDCG
Experimental Setup
Machine
I CPU: Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00 GHz, L26144 KB
I Main Memory: 4 GB, 1033 MHz
I Hard Disk: 750 GB, 7200 rpm, 8 MB Buffer
Datasets
I History Ontology (HO)
I Semantic Web Dog Food Ontology (SWDF)
I Digital Bibliography & Library Project Ontology (DBLP)
Evaluated Dimensions
I Efficiency: Load Scalability, Query Answering Performance
I Effectiveness: Precision/Recall, F-measure, NDCG
Dataset Characteristics
Triples Classes Properties Instances Avg. Inst./ClassHO 8,327 367 727 1,405 4
SWDF 88,996 96 369 8,580 89DBLP 55,364,046 12 20 3,609,294 300,775
HO: High schema complexity and small number of instances
SWDF: Medium schema complexity and number of instances
DBLP: Low schema complexity and high number of instances
EfficiencyLoad/Service Scalability
# users ≡ # sessions ≡ # queries
10
100
1000
10000
1 2 4 8 16 32 64 128 256
#sec
onds
(lo
g sc
ale)
#users
HOSWDF
0
5000
10000
15000
20000
25000
30000
1 2 4 8 16 32
#sec
onds
(lo
g sc
ale)
#users
Figure: Load scalability for a) HO/SWDF and b) DBLP
EfficiencyIndex Performance (cont’d)
0
5
10
15
20
25
30
0 10 20 30 40 50 60
#sec
onds
(th
ousa
nds)
#triples (millions)
LucenInfInference
LuceneNative
Figure: Index construction (DBLP)
Native Triple indexingon disk
Lucene Native+Keywordindexing
Inference Native+Schemagraph inference
LuceneInf Lucene+Schemagraph inference
EfficiencyIndex Performance
0
5000
10000
15000
20000
25000
30000
HISTORY DOGFOOD DBLP
#sec
onds
Dataset
Native Lucene Inference LuceneInf
0
1
2
3
4
5
6
7
8
9
HISTORY DOGFOOD
100
200
300
400
500
600
700
800
900
1000
1100
HISTORY DOGFOOD DBLP
#ms
Dataset
Figure: a) Index construction time, b) Load times of schema graph index
EfficiencyScoring Function Performance (cont’d)
Figure: Tasks performance: HO
EfficiencyScoring Function Performance (cont’d)
Figure: Tasks performance: SWDF
EfficiencyScoring Function Performance
Conclusion
Scoring functions have similar computational characteristics
Figure: Tasks performance: DBLP
EfficiencyScoring Function Performance
Conclusion
Scoring functions have similar computational characteristics
Figure: Tasks performance: DBLP
EfficiencyQuery Performance (cont’d)
Figure: Tasks performance: HO
EfficiencyQuery Performance (cont’d)
Figure: Tasks performance: SWDF
EfficiencyQuery Performance (cont’d)
Figure: Tasks performance: DBLP
EfficiencyQuery Performance
Table: Tasks performance: Overall contribution
KI GE QM ETHO 0.23 0.60 0.13 0.02
SWDF 0.35 0.52 0.11 0.01DBLP 0.90 0.05 0.04 0.01
Conclusions
HO: QP dominated by graph exploration
SWDF: QP rather fairly distributed among querying tasks
DBLP: QP dominated by keyword indexing
EfficiencyQuery Performance
Table: Tasks performance: Overall contribution
KI GE QM ETHO 0.23 0.60 0.13 0.02
SWDF 0.35 0.52 0.11 0.01DBLP 0.90 0.05 0.04 0.01
Conclusions
HO: QP dominated by graph exploration
SWDF: QP rather fairly distributed among querying tasks
DBLP: QP dominated by keyword indexing
EffectivenessMeasurement methodology (cont’d)
Precision (or how succinct is the answer)
P =#(relevant items retrieved)
#(retrieved items)
Recall (or how much did it cover the question)
R =#(relevant items retrieved)
#(relevant items)
F1-measure (or how much it is to the point)
F1 =2PR
P +R
EffectivenessMeasurement methodology (cont’d)
Precision (or how succinct is the answer)
P =#(relevant items retrieved)
#(retrieved items)
Recall (or how much did it cover the question)
R =#(relevant items retrieved)
#(relevant items)
F1-measure (or how much it is to the point)
F1 =2PR
P +R
EffectivenessMeasurement methodology (cont’d)
Precision (or how succinct is the answer)
P =#(relevant items retrieved)
#(retrieved items)
Recall (or how much did it cover the question)
R =#(relevant items retrieved)
#(relevant items)
F1-measure (or how much it is to the point)
F1 =2PR
P +R
EffectivenessMeasurement methodology
NDCG (or how good was the ranking)
NDCGk =r1 + ∑
ki=2
rilog2(i)
IDCGk, k = 15
Judgement
I 5 history experts
I 20 keyword queries (2-3 keywords each)
I Relevance judgements for relevant entities
EffectivenessMeasurement methodology
NDCG (or how good was the ranking)
NDCGk =r1 + ∑
ki=2
rilog2(i)
IDCGk, k = 15
Judgement
I 5 history experts
I 20 keyword queries (2-3 keywords each)
I Relevance judgements for relevant entities
Effectiveness
Figure: Effectiveness evaluation results (HO)
Other Directions to KeywordSearch
Other Directions to Keyword Search
Browsing
I Rich interfaces for result exploration, discovery of hiddeninter-connections, and query refinement
I Zero-effort web publishing
Other Directions to Keyword Search
Result Snippets
Generation of small passage descriptions for quick judgement ofresults
Other Directions to Keyword Search
Result Clustering
Clustering of results according to different interpretations of thesemantics of the query
Other Directions to Keyword Search
Query Cleaning
I Semantic linkage and spelling corrections of database-relevantquery keywords
I Segmentation of nearby query keywords so that each segmentcorresponds to a high quality data term
Conclusions & Future Work
Conclusions
I Design & implementation of a keyword-based system on RDFgraphs with indefinite temporal information
I Evaluation results:
rather scalable, modest performance,modest effectiveness
I Keyword interpretation and graph exploration call forimprovement
Conclusions
I Design & implementation of a keyword-based system on RDFgraphs with indefinite temporal information
I Evaluation results: rather scalable
, modest performance,modest effectiveness
I Keyword interpretation and graph exploration call forimprovement
Conclusions
I Design & implementation of a keyword-based system on RDFgraphs with indefinite temporal information
I Evaluation results: rather scalable, modest performance
,modest effectiveness
I Keyword interpretation and graph exploration call forimprovement
Conclusions
I Design & implementation of a keyword-based system on RDFgraphs with indefinite temporal information
I Evaluation results: rather scalable, modest performance,modest effectiveness
I Keyword interpretation and graph exploration call forimprovement
Conclusions
I Design & implementation of a keyword-based system on RDFgraphs with indefinite temporal information
I Evaluation results: rather scalable, modest performance,modest effectiveness
I Keyword interpretation and graph exploration call forimprovement
Future Work
I Sesame/LuceneSail substitution by BigOWLIM
I Keyword interpretation and graph exploration improvement
I Challenge: Integration and querying of semi-structured dataand linked-data, stored in different formats (RDF, XML) anddata sources (ontologies, knowledge bases, databases)
I Schema-agnostic or hybrid data model
Future Work
I Sesame/LuceneSail substitution by BigOWLIM
I Keyword interpretation and graph exploration improvement
I Challenge: Integration and querying of semi-structured dataand linked-data, stored in different formats (RDF, XML) anddata sources (ontologies, knowledge bases, databases)
I Schema-agnostic or hybrid data model
Future Work
I Sesame/LuceneSail substitution by BigOWLIM
I Keyword interpretation and graph exploration improvement
I Challenge: Integration and querying of semi-structured dataand linked-data, stored in different formats (RDF, XML) anddata sources (ontologies, knowledge bases, databases)
I Schema-agnostic or hybrid data model
Future Work
I Sesame/LuceneSail substitution by BigOWLIM
I Keyword interpretation and graph exploration improvement
I Challenge: Integration and querying of semi-structured dataand linked-data, stored in different formats (RDF, XML) anddata sources (ontologies, knowledge bases, databases)
I Schema-agnostic or hybrid data model
This is the End. . .