Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems


Transcript of Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems

Link Discovery Tutorial: Benchmarking for Instance Matching Systems

Axel-Cyrille Ngonga Ngomo (1), Irini Fundulaki (2), Mohamed Ahmed Sherif (1)

(1) Institute for Applied Informatics, Germany; (2) FORTH, Greece

October 18th, 2016, Kobe, Japan


The Question(s)

Instance matching research has led to the development of various systems.

- What are the problems that I wish to solve?
- What are the relevant key performance indicators?
- What is the behavior of the existing engines w.r.t. the key performance indicators?

Which tool(s) should I use for my data and for my use case?


Importance of Benchmarking

Benchmarks exist:
- To allow adequate measurement of systems
- To provide evaluation of engines for real (or close-to-real) use cases

Benchmarks help:
- Designers and developers to assess the performance of their tools
- Users to compare the different available tools and evaluate their suitability for their needs
- Researchers to compare their work to that of others

Benchmarking leads to improvements:
- Vendors can improve their technology
- Researchers can address new challenges
- Current benchmark designs can be improved to cover new necessities and application domains


The Answer: Benchmark your engines!

An instance matching/linking benchmark comprises:

Datasets: the raw material of the benchmark, i.e., the source and target datasets that will be matched to find the links between resources

Test Cases: address the heterogeneities (structural, value, semantic) of the datasets to be matched

Gold Standard (Ground Truth / Reference Alignment): the "correct answer sheet" used to judge the completeness and soundness of the instance matching algorithms

Metrics: the performance metric(s) that determine a system's behaviour and performance


Benchmark Datasets: Characteristics

Nature

Real datasets: widely used datasets from a domain of interest
+ Realistic conditions for heterogeneity problems
+ Realistic distributions
- Error-prone; hard to create a reference alignment

Synthetic datasets: produced with a data generator (that hopefully produces data with interesting characteristics)
+ Fully controlled test conditions
+ Accurate, easy-to-create reference alignments
- Unrealistic distributions
- Systematic heterogeneity problems

Schema: the datasets to be matched have the same or different schemas

Domain: the datasets come from the same or different domains


Benchmark Test Cases: Variations

Value: name-style abbreviations, typographical errors, format changes (date/gender/number), synonym changes, language changes (multilinguality)

Structural: changing property depth, deleting/adding properties, splitting property values, transforming object-type properties to datatype properties and vice versa

Semantic: class deletion/modification, inverted property assertions, changes to the class/property hierarchy, asserted class disjointness

Combinations of Variations (see the sketch below)
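To make these test-case variations concrete, here is a minimal, hypothetical Python sketch (the resource data and helper names are invented for illustration and are not part of any benchmark named in this tutorial) of how a generator might derive a target resource from a source resource by applying two value variations and one structural variation, i.e., a combination of variations:

```python
import random

def inject_typo(value: str) -> str:
    """Value variation: swap two adjacent characters to simulate a typo."""
    if len(value) < 2:
        return value
    i = random.randrange(len(value) - 1)
    return value[:i] + value[i + 1] + value[i] + value[i + 2:]

def reformat_date(iso_date: str) -> str:
    """Value variation: change the date format from YYYY-MM-DD to DD/MM/YYYY."""
    year, month, day = iso_date.split("-")
    return f"{day}/{month}/{year}"

def split_property(resource: dict, prop: str) -> dict:
    """Structural variation: split one property value into several properties."""
    tokens = resource.pop(prop).split()
    for n, token in enumerate(tokens, start=1):
        resource[f"{prop}_{n}"] = token
    return resource

source = {"name": "Jane Smith", "birthDate": "1970-04-02"}
target = dict(source)
target["name"] = inject_typo(target["name"])              # value variation
target["birthDate"] = reformat_date(target["birthDate"])  # value variation
target = split_property(target, "name")                   # structural variation
print(source, target, sep="\n")
```

Since both resources still describe the same real-world entity, a generator would record the (source, target) pair in the gold standard.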


Benchmark: Gold Standard

The "correct answer sheet" used to judge the completeness and soundness ofthe instance matching algorithms

Characteristics: possible existence of errors / missing alignments

Representation: owl:sameAs and skos:exactMatch
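As a minimal sketch of this representation, assuming the rdflib Python library and made-up resource URIs, a gold standard can be serialized as owl:sameAs links:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

# Build a tiny reference alignment: each triple states that a source
# resource and a target resource refer to the same real-world entity.
gold = Graph()
gold.add((
    URIRef("http://source.example.org/person/123"),  # made-up URI
    OWL.sameAs,
    URIRef("http://target.example.org/agent/abc"),   # made-up URI
))
print(gold.serialize(format="turtle"))
```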


Benchmark: Metrics

Precision: P = tp / (tp + fp)

Recall: R = tp / (tp + fn)

F-measure: F = 2 × P × R / (P + R)
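As an illustration (the link sets below are invented), these metrics can be computed by comparing the set of links a system produces against the gold standard:

```python
def evaluate(produced: set, gold: set):
    """Compute precision, recall, and F-measure for a set of produced links."""
    tp = len(produced & gold)   # correct links found by the system
    fp = len(produced - gold)   # spurious links
    fn = len(gold - produced)   # missed links
    precision = tp / (tp + fp) if produced else 0.0
    recall = tp / (tp + fn) if gold else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

gold = {("s1", "t1"), ("s2", "t2"), ("s3", "t3")}
produced = {("s1", "t1"), ("s2", "t9")}
print(evaluate(produced, gold))  # (0.5, 0.333..., 0.4)
```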


Instance Matching Benchmarks: Desirable Attributes

Systematic Procedure: matching tasks should be reproducible and their execution comparable
Availability: the benchmark should be available
Quality: precise evaluation rules and high-quality ontologies must be provided
Equity: the evaluation process should not privilege any system
Dissemination: the benchmark should be used to evaluate instance matching systems
Volume: dataset size
Gold Standard: a gold standard should exist and be as accurate as possible


What about Benchmarks?

Instance matching techniques have, until recently, been benchmarked in an ad-hoc way.

There is no standard way of benchmarking the performance of these systems when it comes to Linked Data.


Ontology Alignment Evaluation Initiative

IM benchmarks have been mainly driven forward by the Ontology Alignment Evaluation Initiative (OAEI), which:

- has organized an annual campaign for ontology matching since 2005
- hosts independent benchmarks

In 2009, OAEI introduced the Instance Matching (IM) Track, which focuses on the evaluation of different instance matching techniques and tools for Linked Data.


Instance Matching Benchmarks

- Benchmark Generators
- Synthetic Benchmarks
- Real Benchmarks


Semantic Web Instance Generation (SWING) [FMN+11]

Semi-automatic generator of instance matching benchmarks

- Contributed to the generation of the IIMB benchmarks of the OAEI 2010, 2011, and 2012 Instance Matching Tracks
- Freely available at https://code.google.com/p/swing-generator/
- Supports all kinds of variations in the generated benchmarks except multilinguality
- Automatically produces the gold standard


Lance [SDF+15b]

Flexible, generic, and domain-independent benchmark generator that takes RDFS and OWL constructs into consideration in order to evaluate instance matching systems.


Lance [SDF+15b]

Lance provides support for:
- Semantics-aware transformations:
  - Complex class definitions (union, intersection)
  - Complex property definitions (functional properties, inverse functional properties)
  - Disjointness (properties)
- Standard value- and structure-based transformations
- A weighted gold standard based on tensor factorization
- Varying degrees of difficulty and fine-grained evaluation metrics

Available at http://github.com/jsaveta/Lance


Lance Architecture


Synthetic Benchmarks

Ontology Alignment Evaluation Benchmarks


Synthetic Instance Matching Benchmarks: Overview (1)

|                      | IIMB2009 | IIMB2010 | PR2010 | IIMB2011 | Sandbox2012 | IIMB2012 | RDFT2013 | ID-REC2014 | AuthorTask2015 |
|----------------------|----------|----------|--------|----------|-------------|----------|----------|------------|----------------|
| Systematic Procedure | √        | √        | √      | √        | √           | √        | √        | √          | √              |
| Availability         | √        | √        | √      | √        | √           | -        | -        | √          | √              |
| Quality              | √        | √        | √      | √        | √           | √        | √        | √          | √              |
| Equity               | √        | √        | √      | √        | √           | √        | √        | √          | √              |
| Dissemination        | 6        | 3        | 6      | 1        | 3           | 4        | 4        | 5          | 5              |
| Volume               | 0.2K     | 1.4K     | 0.86K  | 4K       | 0.375K      | 1.5K     | 0.43K    | 2.65K      | 10K            |
| Gold Standard        | √        | √        | √      | √        | √           | √        | √        | √          | √              |


Synthetic Instance Matching Benchmarks: Overview (2)

|                       | IIMB2009 | IIMB2010 | PR2010 | IIMB2011 | Sandbox2012 | IIMB2012 | RDFT2013 | ID-REC2014 | AuthorTask2015 |
|-----------------------|----------|----------|--------|----------|-------------|----------|----------|------------|----------------|
| Value Variations      | √        | √        | √      | √        | √           | √        | √        | √          | √              |
| Structural Variations | √        | √        | √      | √        | -           | -        | -        | +          | +              |
| Logical Variations    | √        | √        | -      | √        | -           | √        | -        | -          | -              |
| Multilinguality       | -        | -        | -      | -        | -           | -        | √        | √          | √              |
| Blind Evaluations     | -        | -        | -      | -        | -           | -        | √        | √          | √              |
| 1-n Mappings          | -        | -        | √      | -        | -           | -        | √        | √          | -              |


Synthetic Instance Matching Benchmarks: Overview (3)

|                      | IIMB2009 | IIMB2010 | PR2010 | IIMB2011 | Sandbox2012 | IIMB2012 | RDFT2013 | ID-REC2014 | AuthorTask2015 | Lance2015 |
|----------------------|----------|----------|--------|----------|-------------|----------|----------|------------|----------------|-----------|
| Systematic Procedure | √        | √        | √      | √        | √           | √        | √        | √          | √              | √         |
| Availability         | √        | √        | √      | √        | √           | -        | -        | √          | √              | √         |
| Quality              | √        | √        | √      | √        | √           | √        | √        | √          | √              | √         |
| Equity               | √        | √        | √      | √        | √           | √        | √        | √          | √              | √         |
| Dissemination        | 6        | 3        | 6      | 1        | 3           | 4        | 4        | 5          | 5              | 2         |
| Volume               | 0.2K     | 1.4K     | 0.86K  | 4K       | 0.375K      | 1.5K     | 0.43K    | 2.65K      | 10K            | > 1M      |
| Gold Standard        | √        | √        | √      | √        | √           | √        | √        | √          | √              | √         |


Synthetic Instance Matching Benchmarks: Overview (4)

|                       | IIMB2009 | IIMB2010 | PR2010 | IIMB2011 | Sandbox2012 | IIMB2012 | RDFT2013 | ID-REC2014 | AuthorTask2015 | Lance2015 |
|-----------------------|----------|----------|--------|----------|-------------|----------|----------|------------|----------------|-----------|
| Value Variations      | √        | √        | √      | √        | √           | √        | √        | √          | √              | √         |
| Structural Variations | √        | √        | √      | √        | -           | -        | -        | +          | +              | +         |
| Logical Variations    | √        | √        | -      | √        | -           | √        | -        | -          | -              | +         |
| Multilinguality       | -        | -        | -      | -        | -           | -        | √        | √          | √              | √         |


Synthetic Instance Matching Benchmarks: Overview (5)

|                   | IIMB2009 | IIMB2010 | PR2010 | IIMB2011 | Sandbox2012 | IIMB2012 | RDFT2013 | ID-REC2014 | AuthorTask2015 | Lance2015 |
|-------------------|----------|----------|--------|----------|-------------|----------|----------|------------|----------------|-----------|
| Blind Evaluations | -        | -        | -      | -        | -           | -        | √        | √          | √              | √         |
| 1-n Mappings      | -        | -        | √      | -        | -           | -        | √        | √          | -              | -         |


Real Benchmarks


Real Instance Matching Benchmarks: Overview (1)

|                      | ARS  | DI 2010 | DI 2011 |
|----------------------|------|---------|---------|
| Systematic Procedure | √    | √       | √       |
| Availability         | √    | √       | -       |
| Quality              | √    | √       | √       |
| Equity               | √    | √       | √       |
| Dissemination        | 5    | 2       | 3       |
| Volume               | 100K | 6K      | NA      |
| Gold Standard        | √    | √       | +       |


Real Instance Matching Benchmarks: Overview (2)

|                       | ARS | DI 2010 | DI 2011 |
|-----------------------|-----|---------|---------|
| Value Variations      | √   | √       | √       |
| Structural Variations | √   | √       | -       |
| Logical Variations    | -   | -       | -       |
| Multilinguality       | -   | -       | -       |
| Blind Evaluations     | -   | -       | -       |


Wrapping Up

- Multilinguality
- Value Variations
- Structural Variations
- Logical Variations
- Combinations of Variations
- Scalability

Open Issues

- Only one benchmark tackles both combinations of variations and scalability issues
- Not enough IM benchmarks use the full expressiveness of the RDF/OWL languages


Systems

- Systems can handle value variations, structural variations, and simple logical variations separately
- More work is needed for complex variations (combinations of value, structural, and logical variations)
- More work is needed for structural variations
- Systems should be enhanced to cope with the clustering of mappings (1-n mappings)


Conclusions

- Many instance matching benchmarks have been proposed, each answering some of the needs of instance matching systems
- It is essential to start creating benchmarks that "show the way to the future"
- Such benchmarks should extend the limits of existing systems


Acknowledgment

This work was supported by grants from the EU H2020 Framework Programme for the project HOBBIT (GA no. 688227).


References I

[FMN+11] A. Ferrara, S. Montanelli, J. Noessner, H. Stuckenschmidt: Benchmarking Matching Applications on the Semantic Web. ESWC 2011.

[SDF+15b] T. Saveta, E. Daskalaki, G. Flouris, I. Fundulaki, M. Herschel, A.-C. Ngonga Ngomo: LANCE: Piercing to the Heart of Instance Matching Tools. ISWC 2015.
