Hybrid Acquisition of Temporal Scopes for RDF Data
Anisa Rula¹, Matteo Palmonari¹, Axel-Cyrille Ngonga Ngomo², Daniel Gerber², Jens Lehmann², and Lorenz Bühmann²
¹ University of Milano-Bicocca, SITI Lab
² Universität Leipzig, Institut für Informatik, AKSW
Outline
Anisa Rula
1. Introduction & Motivation
2. Approach Overview
3. Details of the Approach
4. Experimental Evaluation
5. Conclusions
Temporal Scoping of RDF triples
Some facts are always valid, while other facts are valid only for a certain time interval (volatile facts).
Volatile facts are represented by triples whose validity is defined by a time interval, i.e., the temporal scope.
[Figure: temporally annotated RDF triples. <Alexandre Pato, team, A.C. Milan> with temporal scope 2007-2013 and <Alexandre Pato, team, S.C. Corinthians> with temporal scope 2013-2014; temporal scopes are represented by time intervals.]
Motivation & Challenges
Motivation
• World changes: relations represented in RDF triples may be valid only for a specific time interval [Gutierrez et al., 2005]
o E.g. <Alexandre_Pato, team, A.C._Milan> [2007,2013]
• Many applications need temporally annotated RDF triples
o E.g. Temporal Query Answering, Question Answering over KBs, Temporal Reasoning, Timelines
• Temporally annotated RDF triples are largely unavailable or incomplete in the LOD [Rula et al., 2012]
Challenges
• Low availability and quality of temporal information in RDF data
• NLP challenges for web-scale temporal information extraction (scalability, availability of corpora, conflicting information) [Derczynski et al., 2013; Ling et al., 2010]
Approach Overview: Use the Web as Source of Evidence
Use evidence from the Web for temporal scoping of RDF triples.
[Figure: the Web of Data - RDF (61.9 Billion) and the World Wide Web (1.8 Billion) serve as sources of evidence; the triples <Alexandre Pato, team, A.C. Milan> and <Alexandre Pato, team, S.C. Corinthians> become temporally annotated RDF triples with temporal scopes 2007-2013 and 2013-2014.]
Approach Overview: Hybrid Acquisition of Time Scopes
Input: an RDF triple <s,p,o>
• Temporal Information Extraction from the Web of Documents and the Web of Data: for each fact, a distribution of time points and their occurrences (t1 occ1, t2 occ2, t3 occ3, t4 occ4)
• Mapping facts to time intervals: Matching, Selection, Reasoning
Output: temporally annotated RDF triples, i.e., a set of disconnected time intervals <s,p,o>[x1,y1],…,[xn,yn]
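The output format above can be pictured as a small data structure. The following is a minimal Python sketch, assuming closed intervals at year granularity; the class `TemporalTriple` is hypothetical and not taken from the authors' implementation. It keeps the intervals sorted and rejects scopes that share a time point, so every instance holds a set of disconnected intervals.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TemporalTriple:
    """Hypothetical container for <s,p,o>[x1,y1],...,[xn,yn] at year granularity."""
    s: str
    p: str
    o: str
    scopes: tuple[tuple[int, int], ...] = field(default_factory=tuple)

    def __post_init__(self) -> None:
        ordered = sorted(self.scopes)
        for start, end in ordered:
            if start > end:
                raise ValueError(f"invalid interval [{start},{end}]")
        # Disconnected: each interval must end before the next one starts.
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
            if prev_end >= next_start:
                raise ValueError("temporal scopes must be disconnected")
        # Frozen dataclass: store the sorted scopes via object.__setattr__.
        object.__setattr__(self, "scopes", tuple(ordered))


pato_milan = TemporalTriple("Alexandre_Pato", "team", "A.C._Milan", ((2007, 2013),))
print(pato_milan.scopes)  # ((2007, 2013),)
```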
Temporal Information Extraction - Web Documents
DeFacto [Lehmann et al., 2012]
• Retrieves a set of web pages that confirm the given RDF triple
• The RDF triple issued to the search engine is verbalized using natural language patterns
Temporal Extension for DeFacto (TempDeFacto)
• Apply a Named Entity Tagger to extract the entities of the Date class
• Consider occurrences where the labels of the subject and object appear less than 20 tokens apart
• Analyze the context window of n characters before and after each subject-object occurrence to retrieve the time points
• Return a distribution vector of dates and their number of occurrences
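The TempDeFacto steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: a plain four-digit year regular expression stands in for the Named Entity Tagger, tokens are whitespace-separated, `window_chars` plays the role of n (its default of 50 is arbitrary), and the function name `date_distribution` is hypothetical. Only the 20-token co-occurrence limit comes from the slide.

```python
import re
from collections import Counter

# Stand-in for the Named Entity Tagger (assumption): four-digit years 1000-2099.
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def date_distribution(pages, subj_label, obj_label, max_tokens=20, window_chars=50):
    """Hypothetical TempDeFacto-style extractor returning Counter {year: occurrences}."""
    dfv = Counter()
    for text in pages:
        subj_hits = [m.start() for m in re.finditer(re.escape(subj_label), text)]
        obj_hits = [m.start() for m in re.finditer(re.escape(obj_label), text)]
        for s in subj_hits:
            for o in obj_hits:
                first, second = min(s, o), max(s, o)
                # Keep only pairs whose labels are fewer than max_tokens tokens apart.
                if len(text[first:second].split()) >= max_tokens:
                    continue
                second_len = len(obj_label) if second == o else len(subj_label)
                # Context window: n characters before and after the subject-object span.
                start = max(0, first - window_chars)
                context = text[start:second + second_len + window_chars]
                dfv.update(int(y) for y in YEAR.findall(context))
    return dfv


print(date_distribution(["Pato played for A.C. Milan from 2007 to 2013."], "Pato", "A.C. Milan"))
```

A year that falls inside several overlapping context windows is counted once per window in this sketch; the slide does not say how TempDeFacto treats such duplicates.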
Temporal Information Extraction - Web Documents
Example triple: <Alexandre_Pato, team, A.C._Milan>
• Occurrences of the labels of the subject and object: “Alexandre Pato” “played for” “A.C. Milan”; “Pato” “’s striker” “Milan”; “CR7” “Mi”
• Context window of n characters before and after subject-object occurrences: Pato played for A.C. Milan from 2007 to 2013. A.C. Milan’s top striker Pato left in 2013. In 2013 Pato visited Milan for a short holiday.
• Named Entity Tagger: the dates found in the context windows form the DeFacto Vector (dfv)
DeFacto Vector (dfv), year and number of occurrences:
2013  17
2007  11
2006  1
…     …
2010  4
2009  4
1989  2
Temporal Information Extraction - Web of Data
• Dereference <Alexandre_Pato> via content negotiation to obtain the RDF document d_Alexandre_Pato
• Extract the relevant time points with regular expressions: T_Alexandre_Pato = {1989, 2000, 2006, 2007, 2008, 2013}
• Relevant Interval Matrix (RIM): the set of time intervals for a given triple whose starting and ending time points are taken from the set of relevant time points

Relevant Interval Matrix (RIM), rows = starting point, columns = ending point:
        1989  2000  2006  2007  2008  2013
1989    null  null  null  null  null  null
2000    0     null  null  null  null  null
2006    0     0     null  null  null  null
2007    0     0     0     null  null  null
2008    0     0     0     0     null  null
2013    0     0     0     0     0     null

$$\forall\, rim_{t_i t_j} \in RIM,\ i, j > 0:\qquad rim_{t_i t_j} = \begin{cases} \text{null} & \text{if } i \le j \\ 0 & \text{if } i > j \end{cases}$$
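A minimal sketch of the Web of Data step, assuming the dereferenced document is Turtle or N-Triples and that dates appear as typed literals such as "1989-09-02"^^xsd:date or "2007"^^xsd:gYear. The regular expression, the Turtle `Accept` header, and the helper names are assumptions for illustration; `urllib.request` is Python's standard HTTP client.

```python
import re
import urllib.request

# Assumed date pattern: the year that opens a typed date literal such as
# "1989-09-02"^^xsd:date, "2007-08"^^xsd:gYearMonth or "2007"^^xsd:gYear.
DATE_LITERAL = re.compile(r'"(\d{4})(?:-\d{2}){0,2}"\^\^')


def fetch_rdf(uri: str) -> str:
    """Dereference a resource URI, asking for Turtle via content negotiation."""
    request = urllib.request.Request(uri, headers={"Accept": "text/turtle"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def relevant_time_points(rdf_text: str) -> list[int]:
    """Sorted, de-duplicated years found in the typed date literals of a document."""
    return sorted({int(y) for y in DATE_LITERAL.findall(rdf_text)})


def relevant_interval_matrix(points: list[int]) -> list[list]:
    """RIM: row i = start t_i, column j = end t_j; None (null) if i <= j, else 0."""
    n = len(points)
    return [[None if i <= j else 0 for j in range(n)] for i in range(n)]
```

Calling `relevant_time_points(fetch_rdf(uri))` on a dereferenceable resource URI chains the two steps; whether it yields the same T as on the slide depends on the document and on the patterns the authors actually used.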
Mapping Facts to Time Intervals - Matching
Phases: Matching, Selection, Reasoning
1. Matching the temporal distribution (dfv) against the relevant interval matrix (RIM) built from RDF data
dfv (year: occurrences): 2013: 17, 2007: 11, 2006: 1, 2011: 6, 2008: 2, 2016: 3, 2012: 15, 2010: 4, 2009: 4, 1989: 2
RIM: null for every cell whose starting point ≤ ending point, 0 otherwise
Significance Matrix (SM), rows = starting point, columns = ending point:
        1989   2000   2006   2007   2008   2013
1989    0.004  0.166  0.166  0.736  0.8    2.48
2000    0      0      0.142  1.5    1.555  4.2
2006    0      0      0.002  6      4.666  7.5
2007    0      0      0      0.026  6.5    8.428
2008    0      0      0      0      0.004  8
2013    0      0      0      0      0      0.040
Example:
$$si_{2007:2008} = \frac{11 + 2}{2} = 6.5$$
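The SM can be computed from the dfv and the relevant time points. This sketch assumes that the significance of an interval [t_i, t_j] with i < j is the average number of occurrences per year, i.e., the sum of the dfv counts of the years t_i ≤ t ≤ t_j divided by t_j - t_i + 1. That assumption reproduces the off-diagonal entries of the SM above, for example 6.5 for [2007, 2008], 7.5 for [2006, 2013] and 8.428 for [2007, 2013] (the slide truncates 59/7), but not the diagonal, whose normalization the slide does not give, so single-year intervals are left out. The function name is hypothetical.

```python
def significance_matrix(dfv: dict[int, int], points: list[int]) -> dict[tuple[int, int], float]:
    """Score each interval [t_i, t_j], i < j, by its average dfv count per year (assumed)."""
    scores = {}
    for i, start in enumerate(points):
        for end in points[i + 1:]:
            hits = sum(count for year, count in dfv.items() if start <= year <= end)
            scores[(start, end)] = hits / (end - start + 1)
    return scores


dfv = {2013: 17, 2007: 11, 2006: 1, 2011: 6, 2008: 2, 2016: 3,
       2012: 15, 2010: 4, 2009: 4, 1989: 2}
T = [1989, 2000, 2006, 2007, 2008, 2013]
sm = significance_matrix(dfv, T)
print(sm[(2007, 2008)])            # 6.5
print(round(sm[(2007, 2013)], 3))  # 8.429
```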
Mapping Facts to Time Intervals - Selection
Phases: Matching, Selection, Reasoning
2. Mapping Selection on the Significance Matrix (SM):
• top-k: selects the k intervals with the highest scores in the SM
• neighbor-x: selects the set of intervals whose significance score is close to the maximum significance score in the SM, up to a certain threshold x
• neighbor-k-x: selects the top-k intervals in the neighborhood of the interval with the highest significance score
Results on the SM of the running example:
o top-k, k = 3: [2006, 2013], [2007, 2013], [2008, 2013]
o neighbor-x, x = 23: [2007, 2008], [2006, 2013], [2007, 2013], [2008, 2013]
o neighbor-k-x, k = 2, x = 23: [2007, 2013], [2008, 2013]
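The three selection functions can be sketched in Python over the SM values of the running example. Two readings are assumptions: x is treated as a percentage below the maximum score (an interval qualifies if its score is at least max × (1 - x/100)), and neighbor-k-x is read as top-k restricted to the neighbor-x set. With these readings the sketch returns the same intervals as the slide for top-k (k = 3), neighbor-x (x = 23) and neighbor-k-x (k = 2, x = 23). Function names are hypothetical.

```python
POINTS = [1989, 2000, 2006, 2007, 2008, 2013]
SM_ROWS = [  # upper triangle of the slide's SM: row = start, entries from the diagonal on
    [0.004, 0.166, 0.166, 0.736, 0.8, 2.48],
    [0, 0.142, 1.5, 1.555, 4.2],
    [0.002, 6, 4.666, 7.5],
    [0.026, 6.5, 8.428],
    [0.004, 8],
    [0.040],
]
SM = {(POINTS[i], POINTS[i + k]): v
      for i, row in enumerate(SM_ROWS) for k, v in enumerate(row)}


def top_k(scores, k):
    """The k intervals with the highest significance scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def neighbor_x(scores, x):
    """Intervals scoring within x percent of the maximum score (assumed reading of x)."""
    best = max(scores.values())
    return [iv for iv, s in scores.items() if s >= best * (1 - x / 100)]


def neighbor_k_x(scores, k, x):
    """Top-k intervals inside the neighbor-x neighborhood of the best interval."""
    hood = {iv: scores[iv] for iv in neighbor_x(scores, x)}
    return top_k(hood, k)


print(top_k(SM, 3))            # [(2007, 2013), (2008, 2013), (2006, 2013)]
print(neighbor_x(SM, 23))      # [(2006, 2013), (2007, 2008), (2007, 2013), (2008, 2013)]
print(neighbor_k_x(SM, 2, 23)) # [(2007, 2013), (2008, 2013)]
```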
Mapping Facts to Time Intervals - Reasoning
Phases: Matching, Selection, Reasoning
3. Interval merging via reasoning based on Allen's interval algebra relations
Example for <Alexandre_Pato, playsFor, A.C._Milan>: the selected intervals [2007, 2013] and [2008, 2013] are merged into [2007, 2013]
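A minimal sketch of the reasoning step, assuming closed intervals with integer (year) endpoints. `allen_relation` names the Allen relation holding between two intervals; `merge_intervals` then merges every pair of intervals that is neither before nor after the other into their union. Which relations the authors actually merge on is not stated on the slide, so this merging rule is an assumption; it does reproduce the slide's example. Both function names are hypothetical.

```python
def allen_relation(a, b):
    """Allen relation of closed interval a = (start, end) with respect to b."""
    (a1, a2), (b1, b2) = a, b
    if (a1, a2) == (b1, b2):
        return "equals"
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    return "overlaps" if a1 < b1 else "overlapped-by"


def merge_intervals(intervals):
    """Merge intervals that are not before/after each other; result is disconnected."""
    merged = []
    for iv in sorted(intervals):  # sorted by start, so "after" cannot occur below
        if merged and allen_relation(merged[-1], iv) != "before":
            merged[-1] = (merged[-1][0], max(merged[-1][1], iv[1]))
        else:
            merged.append(iv)
    return merged


print(allen_relation((2007, 2013), (2008, 2013)))    # finished-by
print(merge_intervals([(2007, 2013), (2008, 2013)]))  # [(2007, 2013)]
```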
Experimental Setup - Dataset
Dataset   # facts   Domain        Property   Equivalent property (Freebase)   Equivalent property (Yago2)
DBpedia   1000      Sport         team       team                             playsFor
DBpedia   1000      Politicians   office     government_positions_held        holdsPoliticalPosition
DBpedia   500       Celebrities   spouse     spouse                           ismarriedTo
• Dataset: 2500 DBpedia triples with semantically equivalent triples in Freebase and Yago2
• Gold standard: triples annotated with temporal scopes in Yago2, manually curated to correct missing or wrong values
Experimental Setup - Evaluation Measures
The evaluation measures capture the degree of overlap between the retrieved intervals and the intervals in the gold standard.
• Precision (for a triple): fraction of the time points in the retrieved temporal scope that fall into the time interval in the gold standard
• Recall (for a triple): fraction of the time points in the gold standard that are covered by the retrieved temporal scope
• F1 measure (for a triple): the harmonic mean of precision and recall
• Macro-averaged F1 (avgF1): aggregated measure for a set of triples
Examples (Ref = gold-standard interval, R = retrieved interval):
o Ref [2007, 2011], R [2007, 2011]: F1 = 1
o Ref [2007, 2011], R [2006, 2012]: F1 = 0.83
o Ref [2007, 2011], R [2008, 2010]: F1 = 0.75
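The per-triple measures can be sketched as set operations over time points. The sketch assumes year granularity and closed intervals, and returns zero scores when the retrieved scope and the gold standard share no time point; with these assumptions it reproduces the F1 values 1, 0.83 and 0.75 of the example above. Function names are hypothetical.

```python
def years(intervals):
    """Set of yearly time points covered by closed intervals [(start, end), ...]."""
    return {y for start, end in intervals for y in range(start, end + 1)}


def prf1(retrieved, reference):
    """Per-triple precision, recall and F1 over yearly time points."""
    r, g = years(retrieved), years(reference)
    hit = len(r & g)
    if hit == 0:
        return 0.0, 0.0, 0.0
    precision, recall = hit / len(r), hit / len(g)
    return precision, recall, 2 * precision * recall / (precision + recall)


def macro_f1(pairs):
    """avgF1: mean of the per-triple F1 over (retrieved, reference) pairs."""
    return sum(prf1(r, g)[2] for r, g in pairs) / len(pairs)


print(round(prf1([(2006, 2012)], [(2007, 2011)])[2], 2))  # 0.83
print(round(prf1([(2008, 2010)], [(2007, 2011)])[2], 2))  # 0.75
```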
Experimental Results - Accuracy of Best Configurations for all Properties
• Different sources for the creation of the RIM
• Different configurations in the selection and reasoning steps
o E.g. config top-3 refers to selection function top-3 and reasoning = yes

                         DBpedia                     Freebase                    TemporalDeFacto
Temp prop                Config      #facts  avgF1   Config      #facts  avgF1   Config  #facts  avgF1
playsFor                 top-1 loc   264     0.505   top-1 loc   213     0.477   top-3   311     0.511
holdsPoliticalPosition   neigh-10    702     0.699   neigh-10-2  242     0.549   top-3   709     0.586
ismarriedTo              neigh-10    702     0.600   neigh-10    524     0.547   top-3   709     0.545

• Good quality of the approach, with an avgF1 of up to 70%
• Using evidence from RDF documents, the performance can be significantly improved (significantly better results for two properties and negligibly worse results for one property)
Experimental Results - Accuracy with vs. without Reasoning for all Properties
The best configurations for the three properties:
                                                     With reasoning     Without reasoning
Temp prop               Source        Configuration  #facts  avgF1      #facts  avgF1
playsFor                TempDeFacto   top-3          311     0.511      505     0.467
holdsPoliticalPosition  DBpedia       neigh-10       702     0.699      822     0.667
ismarriedTo             DBpedia       neigh-10       705     0.600      977     0.563
• The best results are obtained when reasoning is enabled
Conclusions & Future Work
Summary
• Temporal extension of the DeFacto framework
• Modeling a space of relevant time intervals given an RDF triple
• Mapping volatile facts to time intervals based on a three-phase algorithm
• Unsupervised method
Future work
• Determine whether to add the temporal scope based on the confidence of the acquisition process
• Collect additional relevant time points to improve the overall results
• Show the effectiveness of acquired temporal scopes in temporal query answering
References
[Rula&2012] Anisa Rula, Matteo Palmonari, Andreas Harth, Steffen Stadtmüller, Andrea Maurino: On the Diversity and Availability of Temporal Information in Linked Open Data. International Semantic Web Conference (1) 2012: 492-507
[Gutiérrez&2005] C. Gutierrez, C. A. Hurtado, A. A. Vaisman: Temporal RDF. The 2nd ESWC, 2005: 93-107
[Lehmann&2012] Jens Lehmann, Daniel Gerber, Mohamed Morsey, Axel-Cyrille Ngonga Ngomo: DeFacto - Deep Fact Validation. International Semantic Web Conference (1) 2012: 312-327
[Ling&2010] X. Ling, D. S. Weld: Temporal Information Extraction. 25th AAAI, 2010
[Derczynski&2013] L. Derczynski, R. Gaizauskas: Information Retrieval for Temporal Bounding. 4th ICTIR, ACM, 2013: 29:129-29:130
Approach Overview: Time Interval Representation and Relevant Interval Matrix
When does <Alexandre_Pato, playsFor, A.C._Milan> hold?
• All possible time intervals from all possible time points: triangular matrix over the vector of time points … 1900, 1901, …, 2003, 2004, …, 2012, 2013, now
• Relevant time intervals from a set of relevant time points: triangular matrix over 1989, 2000, 2006, 2007, 2008, 2014
Intuition: use evidence from the Web to reduce the set of considered time intervals and to identify the most significant time interval.
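The size reduction behind this intuition follows from the triangular shape: a vector of n time points yields one candidate interval [t_i, t_j] per pair with i ≤ j, so the count grows quadratically with n. For the six relevant time points shown, only 21 intervals remain to be scored:

$$\left|\{[t_i, t_j] : 1 \le i \le j \le n\}\right| = \frac{n(n+1)}{2}, \qquad n = 6 \;\Rightarrow\; \frac{6 \cdot 7}{2} = 21$$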
Experimental Results - Accuracy with Different Selection Functions
• For higher k in top-k selection, recall increases while precision decreases
• Best precision-recall trade-off with neighbor-x, x = 10
Selection function    precision   recall   F1
top-k, k = 1          0.686       0.654    0.67
top-k, k = 2          0.515       0.865    0.645
top-k, k = 3          0.426       0.924    0.583
neighbor-x, x = 10    0.689       0.709    0.699
• Dataset: DBpedia and property: <holdsPoliticalPosition>