Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems


Transcript of Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems

Link Discovery Tutorial: Benchmarking for Instance Matching Systems

Axel-Cyrille Ngonga Ngomo (1), Irini Fundulaki (2), Mohamed Ahmed Sherif (1)

(1) Institute for Applied Informatics, Germany; (2) FORTH, Greece

October 18th, 2016, Kobe, Japan


The Question(s)

Instance matching research has led to the development of various systems.

- What are the problems that I wish to solve?
- What are the relevant key performance indicators?
- What is the behavior of the existing engines w.r.t. the key performance indicators?

Which tool(s) should I use for my data and for my use case?


Importance of Benchmarking

Benchmarks exist:
- To allow adequate measurement of systems
- To provide evaluation of engines for real (or close-to-real) use cases

Benchmarks help:
- Designers and developers to assess the performance of their tools
- Users to compare the different available tools and evaluate their suitability for their needs
- Researchers to compare their work to that of others

Benchmarking leads to improvements:
- Vendors can improve their technology
- Researchers can address new challenges
- Current benchmark designs can be improved to cover new necessities and application domains


The Answer: Benchmark your engines!

An instance matching/linking benchmark comprises:

Datasets: the raw material of the benchmark, i.e., the source and target datasets that will be matched to find the links between resources

Test Cases: address the heterogeneities (structural, value, semantic) of the datasets to be matched

Gold Standard (Ground Truth / Reference Alignment): the "correct answer sheet" used to judge the completeness and soundness of the instance matching algorithms

Metrics: the performance metric(s) that determine a system's behaviour and performance


Benchmark Datasets: Characteristics

Nature

Real datasets: widely used datasets from a domain of interest
+ Realistic conditions for heterogeneity problems
+ Realistic distributions
- Error-prone; hard to create a reference alignment

Synthetic datasets: produced with a data generator (that hopefully produces data with interesting characteristics)
+ Fully controlled test conditions
+ Accurate, easy-to-create reference alignments
- Unrealistic distributions
- Systematic heterogeneity problems

Schema: the datasets to be matched have the same or different schemas

Domain: the datasets come from the same or different domains


Benchmark Test Cases: Variations

Value: name-style abbreviations, typographical errors, format changes (date/gender/number), synonym changes, language changes (multilinguality)

Structural: changing property depth, deleting/adding properties, splitting property values, transforming object-type properties to datatype properties and vice versa

Semantic: class deletion/modification, inverted property assertions, changes to the class/property hierarchy, asserted class disjointness

Combinations of Variations (see the sketch below)
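To make these test-case variations concrete, here is a minimal, hypothetical Python sketch (the resource data and helper names are invented for illustration and are not part of any benchmark named in this tutorial) of how a generator might derive a target resource from a source resource by applying two value variations and one structural variation, i.e., a combination of variations:

```python
import random

def inject_typo(value: str) -> str:
    """Value variation: swap two adjacent characters to simulate a typo."""
    if len(value) < 2:
        return value
    i = random.randrange(len(value) - 1)
    return value[:i] + value[i + 1] + value[i] + value[i + 2:]

def reformat_date(iso_date: str) -> str:
    """Value variation: change the date format from YYYY-MM-DD to DD/MM/YYYY."""
    year, month, day = iso_date.split("-")
    return f"{day}/{month}/{year}"

def split_property(resource: dict, prop: str) -> dict:
    """Structural variation: split one property value into several properties."""
    tokens = resource.pop(prop).split()
    for n, token in enumerate(tokens, start=1):
        resource[f"{prop}_{n}"] = token
    return resource

source = {"name": "Jane Smith", "birthDate": "1970-04-02"}
target = dict(source)
target["name"] = inject_typo(target["name"])              # value variation
target["birthDate"] = reformat_date(target["birthDate"])  # value variation
target = split_property(target, "name")                   # structural variation
print(source, target, sep="\n")
```

Since both resources still describe the same real-world entity, a generator would record the (source, target) pair in the gold standard.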


Benchmark: Gold Standard

The "correct answer sheet" used to judge the completeness and soundness ofthe instance matching algorithms

Characteristics: possible existence of errors / missing alignments

Representation: owl:sameAs and skos:exactMatch
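As a minimal sketch of this representation, assuming the rdflib Python library and made-up resource URIs, a gold standard can be serialized as owl:sameAs links:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

# Build a tiny reference alignment: each triple states that a source
# resource and a target resource refer to the same real-world entity.
gold = Graph()
gold.add((
    URIRef("http://source.example.org/person/123"),  # made-up URI
    OWL.sameAs,
    URIRef("http://target.example.org/agent/abc"),   # made-up URI
))
print(gold.serialize(format="turtle"))
```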


Benchmark: Metrics

Precision: P = tp / (tp + fp)

Recall: R = tp / (tp + fn)

F-measure: F = 2 × P × R / (P + R)
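As an illustration (the link sets below are invented), these metrics can be computed by comparing the set of links a system produces against the gold standard:

```python
def evaluate(produced: set, gold: set):
    """Compute precision, recall, and F-measure for a set of produced links."""
    tp = len(produced & gold)   # correct links found by the system
    fp = len(produced - gold)   # spurious links
    fn = len(gold - produced)   # missed links
    precision = tp / (tp + fp) if produced else 0.0
    recall = tp / (tp + fn) if gold else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

gold = {("s1", "t1"), ("s2", "t2"), ("s3", "t3")}
produced = {("s1", "t1"), ("s2", "t9")}
print(evaluate(produced, gold))  # (0.5, 0.333..., 0.4)
```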


Instance Matching Benchmarks: Desirable Attributes

Systematic Procedure: matching tasks should be reproducible and their execution comparable
Availability: the benchmark should be available
Quality: precise evaluation rules and high-quality ontologies must be provided
Equity: the evaluation process should not privilege any system
Dissemination: the benchmark should be used to evaluate instance matching systems
Volume: dataset size
Gold Standard: a gold standard should exist and be as accurate as possible


What about Benchmarks?

Instance matching techniques have, until recently, been benchmarked in an ad-hoc way.

There is no standard way of benchmarking the performance of these systems when it comes to Linked Data.


Ontology Alignment Evaluation Initiative

IM benchmarks have been mainly driven forward by the Ontology Alignment Evaluation Initiative (OAEI), which:

- has organized an annual campaign for ontology matching since 2005
- hosts independent benchmarks

In 2009, OAEI introduced the Instance Matching (IM) Track, which focuses on the evaluation of different instance matching techniques and tools for Linked Data.


Instance Matching Benchmarks

- Benchmark Generators
- Synthetic Benchmarks
- Real Benchmarks


Semantic Web Instance Generation (SWING) [FMN+11]

Semi-automatic generator of instance matching benchmarks

- Contributed to the generation of the IIMB benchmarks of the OAEI 2010, 2011, and 2012 Instance Matching Tracks
- Freely available at https://code.google.com/p/swing-generator/
- Supports all kinds of variations in the generated benchmarks except multilinguality
- Automatically produces the gold standard


Lance [SDF+15b]

Flexible, generic, and domain-independent benchmark generator that takes RDFS and OWL constructs into consideration in order to evaluate instance matching systems.


Lance [SDF+15b]

Lance provides support for:
- Semantics-aware transformations:
  - Complex class definitions (union, intersection)
  - Complex property definitions (functional properties, inverse functional properties)
  - Disjointness (properties)
- Standard value- and structure-based transformations
- A weighted gold standard based on tensor factorization
- Varying degrees of difficulty and fine-grained evaluation metrics

Available at http://github.com/jsaveta/Lance


Lance Architecture


Synthetic Benchmarks

Ontology Alignment Evaluation Benchmarks


Synthetic Instance Matching Benchmarks: Overview (1)

|                      | IIMB2009 | IIMB2010 | PR2010 | IIMB2011 | Sandbox2012 | IIMB2012 | RDFT2013 | ID-REC2014 | AuthorTask2015 |
|----------------------|----------|----------|--------|----------|-------------|----------|----------|------------|----------------|
| Systematic Procedure | √        | √        | √      | √        | √           | √        | √        | √          | √              |
| Availability         | √        | √        | √      | √        | √           | -        | -        | √          | √              |
| Quality              | √        | √        | √      | √        | √           | √        | √        | √          | √              |
| Equity               | √        | √        | √      | √        | √           | √        | √        | √          | √              |
| Dissemination        | 6        | 3        | 6      | 1        | 3           | 4        | 4        | 5          | 5              |
| Volume               | 0.2K     | 1.4K     | 0.86K  | 4K       | 0.375K      | 1.5K     | 0.43K    | 2.65K      | 10K            |
| Gold Standard        | √        | √        | √      | √        | √           | √        | √        | √          | √              |


Synthetic Instance Matching Benchmarks: Overview (2)

|                       | IIMB2009 | IIMB2010 | PR2010 | IIMB2011 | Sandbox2012 | IIMB2012 | RDFT2013 | ID-REC2014 | AuthorTask2015 |
|-----------------------|----------|----------|--------|----------|-------------|----------|----------|------------|----------------|
| Value Variations      | √        | √        | √      | √        | √           | √        | √        | √          | √              |
| Structural Variations | √        | √        | √      | √        | -           | -        | -        | +          | +              |
| Logical Variations    | √        | √        | -      | √        | -           | √        | -        | -          | -              |
| Multilinguality       | -        | -        | -      | -        | -           | -        | √        | √          | √              |
| Blind Evaluations     | -        | -        | -      | -        | -           | -        | √        | √          | √              |
| 1-n Mappings          | -        | -        | √      | -        | -           | -        | √        | √          | -              |


Synthetic Instance Matching Benchmarks: Overview (3)

|                      | IIMB2009 | IIMB2010 | PR2010 | IIMB2011 | Sandbox2012 | IIMB2012 | RDFT2013 | ID-REC2014 | AuthorTask2015 | Lance2015 |
|----------------------|----------|----------|--------|----------|-------------|----------|----------|------------|----------------|-----------|
| Systematic Procedure | √        | √        | √      | √        | √           | √        | √        | √          | √              | √         |
| Availability         | √        | √        | √      | √        | √           | -        | -        | √          | √              | √         |
| Quality              | √        | √        | √      | √        | √           | √        | √        | √          | √              | √         |
| Equity               | √        | √        | √      | √        | √           | √        | √        | √          | √              | √         |
| Dissemination        | 6        | 3        | 6      | 1        | 3           | 4        | 4        | 5          | 5              | 2         |
| Volume               | 0.2K     | 1.4K     | 0.86K  | 4K       | 0.375K      | 1.5K     | 0.43K    | 2.65K      | 10K            | > 1M      |
| Gold Standard        | √        | √        | √      | √        | √           | √        | √        | √          | √              | √         |


Synthetic Instance Matching Benchmarks: Overview (4)

|                       | IIMB2009 | IIMB2010 | PR2010 | IIMB2011 | Sandbox2012 | IIMB2012 | RDFT2013 | ID-REC2014 | AuthorTask2015 | Lance2015 |
|-----------------------|----------|----------|--------|----------|-------------|----------|----------|------------|----------------|-----------|
| Value Variations      | √        | √        | √      | √        | √           | √        | √        | √          | √              | √         |
| Structural Variations | √        | √        | √      | √        | -           | -        | -        | +          | +              | +         |
| Logical Variations    | √        | √        | -      | √        | -           | √        | -        | -          | -              | +         |
| Multilinguality       | -        | -        | -      | -        | -           | -        | √        | √          | √              | √         |


Synthetic Instance Matching Benchmarks: Overview (5)

|                   | IIMB2009 | IIMB2010 | PR2010 | IIMB2011 | Sandbox2012 | IIMB2012 | RDFT2013 | ID-REC2014 | AuthorTask2015 | Lance2015 |
|-------------------|----------|----------|--------|----------|-------------|----------|----------|------------|----------------|-----------|
| Blind Evaluations | -        | -        | -      | -        | -           | -        | √        | √          | √              | √         |
| 1-n Mappings      | -        | -        | √      | -        | -           | -        | √        | √          | -              | -         |


Real Benchmarks


Real Instance Matching Benchmarks: Overview (1)

|                      | ARS  | DI 2010 | DI 2011 |
|----------------------|------|---------|---------|
| Systematic Procedure | √    | √       | √       |
| Availability         | √    | √       | -       |
| Quality              | √    | √       | √       |
| Equity               | √    | √       | √       |
| Dissemination        | 5    | 2       | 3       |
| Volume               | 100K | 6K      | NA      |
| Gold Standard        | √    | √       | +       |


Real Instance Matching Benchmarks: Overview (2)

|                       | ARS | DI 2010 | DI 2011 |
|-----------------------|-----|---------|---------|
| Value Variations      | √   | √       | √       |
| Structural Variations | √   | √       | -       |
| Logical Variations    | -   | -       | -       |
| Multilinguality       | -   | -       | -       |
| Blind Evaluations     | -   | -       | -       |


Wrapping Up

- Multilinguality
- Value Variations
- Structural Variations
- Logical Variations
- Combinations of Variations
- Scalability

Open Issues

- Only one benchmark tackles both combinations of variations and scalability issues
- Not enough IM benchmarks use the full expressiveness of the RDF/OWL languages


Systems

- Systems can handle value variations, structural variations, and simple logical variations separately
- More work is needed for complex variations (combinations of value, structural, and logical variations)
- More work is needed for structural variations
- Systems should be enhanced to cope with the clustering of mappings (1-n mappings)


Conclusions

- Many instance matching benchmarks have been proposed, each answering some of the needs of instance matching systems
- It is essential to start creating benchmarks that "show the way to the future"
- Such benchmarks should extend the limits of existing systems


Acknowledgment

This work was supported by grants from the EU H2020 Framework Programme for the project HOBBIT (GA no. 688227).


References I

[FMN+11] A. Ferrara, S. Montanelli, J. Noessner, H. Stuckenschmidt: Benchmarking Matching Applications on the Semantic Web. ESWC 2011.

[SDF+15b] T. Saveta, E. Daskalaki, G. Flouris, I. Fundulaki, M. Herschel, A.-C. Ngonga Ngomo: LANCE: Piercing to the Heart of Instance Matching Tools. ISWC 2015.
