Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
Link Discovery Tutorial: Benchmarking for Instance Matching Systems
Axel-Cyrille Ngonga Ngomo (1), Irini Fundulaki (2), Mohamed Ahmed Sherif (1)
(1) Institute for Applied Informatics, Germany; (2) FORTH, Greece
October 18th, 2016, Kobe, Japan
Ngonga Ngomo et al. (AKSW & FORTH), LD Tutorial: Benchmarking, October 17, 2016
The Question(s)
Instance matching research has led to the development of various systems.
- What are the problems that I wish to solve?
- What are the relevant key performance indicators?
- What is the behavior of the existing engines w.r.t. the key performance indicators?
Which are the tool(s) that I should use for my data and for my use case?
Importance of Benchmarking
Benchmarks exist:
- To allow adequate measurements of systems
- To provide evaluation of engines for real (or close to real) use cases

They help:
- Designers and developers to assess the performance of their tools
- Users to compare the different available tools and evaluate their suitability for their needs
- Researchers to compare their work to that of others

They lead to improvements:
- Vendors can improve their technology
- Researchers can address new challenges
- Current benchmark design can be improved to cover new necessities and application domains
The Answer: Benchmark your engines!
An instance matching/linking benchmark comprises:
- Datasets: The raw material of the benchmark. These are the source and the target dataset that will be matched against each other to find the links between resources.
- Test Cases: Address the heterogeneities (structural, value, semantic) of the datasets to be matched.
- Gold Standard (Ground Truth / Reference Alignment): The "correct answer sheet" used to judge the completeness and soundness of the instance matching algorithms.
- Metrics: The performance metric(s) that determine the system's behaviour and performance.
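These four ingredients can be pictured as one bundle. A minimal sketch in Python; the class, its fields, and the example URIs are purely illustrative and not taken from any actual benchmark framework:

```python
from dataclasses import dataclass

@dataclass
class InstanceMatchingBenchmark:
    """Illustrative bundle of the four benchmark ingredients."""
    source_dataset: list   # resources of the source dataset
    target_dataset: list   # resources of the target dataset
    test_cases: list       # heterogeneities applied (value/structural/semantic)
    gold_standard: set     # reference alignment: (source_uri, target_uri) pairs

    def correct_links(self, predicted_links):
        """Links a system found that the gold standard confirms."""
        return predicted_links & self.gold_standard

bench = InstanceMatchingBenchmark(
    source_dataset=["s:alice", "s:bob"],
    target_dataset=["t:alice1", "t:bob1"],
    test_cases=["typographical errors"],
    gold_standard={("s:alice", "t:alice1"), ("s:bob", "t:bob1")},
)
print(bench.correct_links({("s:alice", "t:alice1"), ("s:alice", "t:bob1")}))
# → {('s:alice', 't:alice1')}
```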
Benchmark Datasets: Characteristics
Nature
- Real datasets: Widely used datasets from a domain of interest
  + Realistic conditions for heterogeneity problems
  + Realistic distributions
  - Error prone; the reference alignment is hard to create
- Synthetic datasets: Produced with a data generator (that hopefully produces data with interesting characteristics)
  + Fully controlled test conditions
  + Accurate, easy-to-create reference alignments
  - Unrealistic distributions
  - Systematic heterogeneity problems

Schema
Datasets to be matched have the same or different schemas.

Domain
Datasets come from the same or different domains.
Benchmark Test Cases: Variations
Value
Name-style abbreviations, typographical errors, format changes (date/gender/number), synonym changes, language changes (multilinguality)

Structural
Changes of property depth, deletion/addition of properties, splitting of property values, transformation of object/datatype properties to datatype/object properties

Semantic
Class deletion/modification, inverted property assertions, changes to the class/property hierarchy, asserted class disjointness

Combinations of Variations
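Value variations like these are what synthetic generators apply to a source dataset in order to produce a heterogeneous target. A minimal, hypothetical sketch of three such transformations (not the code of SWING, Lance, or any other actual generator):

```python
def swap_typo(value: str, i: int) -> str:
    """Typographical error: swap the characters at positions i and i+1."""
    if not 0 <= i < len(value) - 1:
        return value
    return value[:i] + value[i + 1] + value[i] + value[i + 2:]

def abbreviate_name(value: str) -> str:
    """Name-style abbreviation: keep the last word, abbreviate the rest."""
    words = value.split()
    if len(words) < 2:
        return value
    return " ".join(w[0] + "." for w in words[:-1]) + " " + words[-1]

def reformat_date(value: str) -> str:
    """Format change: YYYY-MM-DD to DD/MM/YYYY."""
    y, m, d = value.split("-")
    return f"{d}/{m}/{y}"

print(swap_typo("Kobe", 1))                          # → "Kboe"
print(abbreviate_name("Axel Cyrille Ngonga Ngomo"))  # → "A. C. N. Ngomo"
print(reformat_date("2016-10-18"))                   # → "18/10/2016"
```

A generator would apply such functions to selected property values of the source instances and record every (source, transformed target) pair in the gold standard.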
Benchmark: Gold Standard
The "correct answer sheet" used to judge the completeness and soundness of the instance matching algorithms.

Characteristics: existence of errors / missing alignments

Representation: owl:sameAs and skos:exactMatch
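The reference alignment can be serialized directly with either predicate. A small sketch that emits one N-Triples statement per gold-standard pair; the helper function and example URIs are illustrative:

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

def gold_to_ntriples(pairs, predicate=OWL_SAME_AS):
    """Serialize (source, target) URI pairs as one N-Triples statement each."""
    return "\n".join(f"<{s}> <{predicate}> <{t}> ." for s, t in sorted(pairs))

pairs = {("http://src/alice", "http://tgt/alice1")}
print(gold_to_ntriples(pairs))
# → <http://src/alice> <http://www.w3.org/2002/07/owl#sameAs> <http://tgt/alice1> .
```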
Benchmark: Metrics

Precision: P = tp / (tp + fp)
Recall: R = tp / (tp + fn)
F-measure: F = 2 × P × R / (P + R)
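Given a system's predicted links and the gold standard as sets of (source, target) pairs, the three metrics are straightforward to compute. A minimal sketch, not tied to any particular evaluation framework:

```python
def evaluate(predicted, gold):
    """Score a set of predicted links against the gold standard.

    Both arguments are sets of (source_uri, target_uri) pairs.
    Returns (precision, recall, f_measure).
    """
    tp = len(predicted & gold)   # links found and confirmed by the gold standard
    fp = len(predicted - gold)   # spurious links
    fn = len(gold - predicted)   # missed links
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if gold else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

gold = {("s:a", "t:a"), ("s:b", "t:b"), ("s:c", "t:c"), ("s:d", "t:d")}
predicted = {("s:a", "t:a"), ("s:b", "t:b"), ("s:x", "t:y")}
print(evaluate(predicted, gold))  # P = 2/3, R = 1/2, F = 4/7
```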
Instance Matching Benchmarks: Desirable Attributes
Systematic Procedure: matching tasks should be reproducible and their execution must be comparable
Availability: the benchmark should be available
Quality: precise evaluation rules and high-quality ontologies must be provided
Equity: the evaluation process should not privilege any system
Dissemination: the benchmark should be used to evaluate instance matching systems
Volume: dataset size
Gold Standard: a gold standard should exist and be as accurate as possible
What about Benchmarks?
Instance matching techniques have, until recently, been benchmarked in an ad-hoc way.
There is no standard way of benchmarking the performance of the systems when it comes to Linked Data.
Ontology Alignment Evaluation Initiative
IM benchmarks have been mainly driven forward by the Ontology Alignment Evaluation Initiative (OAEI), which
- has organized an annual campaign for ontology matching since 2005
- hosts independent benchmarks

In 2009, OAEI introduced the Instance Matching (IM) Track, which focuses on the evaluation of different instance matching techniques and tools for Linked Data.
Instance Matching Benchmarks
- Benchmark Generators
- Synthetic Benchmarks
- Real Benchmarks
Semantic Web Instance Generation (SWING) [FMN+11]
Semi-automatic generator of instance matching benchmarks
- Contributed to the generation of the IIMB benchmarks of the OAEI 2010, 2011 and 2012 Instance Matching Tracks
- Freely available at https://code.google.com/p/swing-generator/
- All kinds of variations supported except multilinguality
- Automatically produced gold standard
Lance [SDF+15b]
Flexible, generic and domain-independent benchmark generator which takes into consideration RDFS and OWL constructs in order to evaluate instance matching systems.
Lance [SDF+15b]
Lance provides support for:
- Semantics-aware transformations
  - Complex class definitions (union, intersection)
  - Complex property definitions (functional properties, inverse functional properties)
  - Disjointness (properties)
- Standard value- and structure-based transformations
- Weighted gold standard based on tensor factorization
- Varying degrees of difficulty and fine-grained evaluation metrics

Available at http://github.com/jsaveta/Lance
Lance Architecture
Synthetic Benchmarks
Ontology Alignment Evaluation Benchmarks
Synthetic Instance Matching Benchmarks: Overview (1)

                       IIMB   IIMB   PR     IIMB   Sandbox  IIMB   RDFT   ID-REC  Author
                       2009   2010   2010   2011   2012     2012   2013   2014    Task2015
Systematic Procedure   √      √      √      √      √        √      √      √       √
Availability           √      √      √      √      √        -      -      √       √
Quality                √      √      √      √      √        √      √      √       √
Equity                 √      √      √      √      √        √      √      √       √
Dissemination          6      3      6      1      3        4      4      5       5
Volume                 0.2K   1.4K   0.86K  4K     0.375K   1.5K   0.43K  2.650K  10K
Gold Standard          √      √      √      √      √        √      √      √       √
Synthetic Instance Matching Benchmarks: Overview (2)

                       IIMB   IIMB   PR     IIMB   Sandbox  IIMB   RDFT   ID-REC  Author
                       2009   2010   2010   2011   2012     2012   2013   2014    Task2015
Value Variations       √      √      √      √      √        √      √      √       √
Structural Variations  √      √      √      √      -        -      -      +       +
Logical Variations     √      √      -      √      -        √      -      -       -
Multilinguality        -      -      -      -      -        -      √      √       √
Blind Evaluations      -      -      -      -      -        -      √      √       √
1-n Mappings           -      -      √      -      -        -      √      √       -
Synthetic Instance Matching Benchmarks: Overview (3)

                       IIMB   IIMB   PR     IIMB   Sandbox  IIMB   RDFT   ID-REC  Author    Lance
                       2009   2010   2010   2011   2012     2012   2013   2014    Task2015  2015
Systematic Procedure   √      √      √      √      √        √      √      √       √         √
Availability           √      √      √      √      √        -      -      √       √         √
Quality                √      √      √      √      √        √      √      √       √         √
Equity                 √      √      √      √      √        √      √      √       √         √
Dissemination          6      3      6      1      3        4      4      5       5         2
Volume                 0.2K   1.4K   0.86K  4K     0.375K   1.5K   0.43K  2.650K  10K       > 1M
Gold Standard          √      √      √      √      √        √      √      √       √         √
Synthetic Instance Matching Benchmarks: Overview (4)

                       IIMB   IIMB   PR     IIMB   Sandbox  IIMB   RDFT   ID-REC  Author    Lance
                       2009   2010   2010   2011   2012     2012   2013   2014    Task2015  2015
Value Variations       √      √      √      √      √        √      √      √       √         √
Structural Variations  √      √      √      √      -        -      -      +       +         +
Logical Variations     √      √      -      √      -        √      -      -       -         +
Multilinguality        -      -      -      -      -        -      √      √       √         √
Synthetic Instance Matching Benchmarks: Overview (5)

                       IIMB   IIMB   PR     IIMB   Sandbox  IIMB   RDFT   ID-REC  Author    Lance
                       2009   2010   2010   2011   2012     2012   2013   2014    Task2015  2015
Blind Evaluations      -      -      -      -      -        -      √      √       √         √
1-n Mappings           -      -      √      -      -        -      √      √       -         -
Real Benchmarks
Real Instance Matching Benchmarks: Overview (1)

                       ARS    DI 2010  DI 2011
Systematic Procedure   √      √        √
Availability           √      √        -
Quality                √      √        √
Equity                 √      √        √
Dissemination          5      2        3
Volume                 100K   6K       NA
Gold Standard          √      √        +
Real Instance Matching Benchmarks: Overview (2)

                       ARS    DI 2010  DI 2011
Value Variations       √      √        √
Structural Variations  √      √        -
Logical Variations     -      -        -
Multilinguality        -      -        -
Blind Evaluations      -      -        -
Wrapping Up

- Multilinguality
- Value Variations
- Structural Variations
- Logical Variations
- Combinations of Variations
- Scalability
Open Issues
- Only one benchmark tackles both the combination of variations and scalability issues
- Not enough IM benchmarks use the full expressiveness of the RDF/OWL languages
Systems
- Systems can handle value variations, structural variations, and simple logical variations separately.
- More work is needed for complex variations (combinations of value, structural, and logical variations).
- More work is needed for structural variations.
- Systems should be enhanced to cope with the clustering of mappings (1-n mappings).
Conclusions
- Many instance matching benchmarks have been proposed, each answering some of the needs of instance matching systems.
- It is essential to start creating benchmarks that will "show the way to the future" and extend the limits of existing systems.
Acknowledgment
This work was supported by grants from the EU H2020 Framework Programmeprovided for the project HOBBIT (GA no. 688227).