Melinda: Methods and tools for Web Data Interlinking

Presentation given at STI Innsbruck the 17th of December 2009.

Page 1: Melinda: Methods and tools for Web Data Interlinking

Introduction Framework Tools Application Conclusions

Melinda

Methods and tools for Web data Interlinking

François Scharffe

17 December 2009 @ STI Innsbruck

Page 2: Melinda: Methods and tools for Web Data Interlinking

1 Introduction

2 Framework

3 Tools

4 Application

5 Conclusions

Page 3: Melinda: Methods and tools for Web Data Interlinking

Publishing datasets on the Web

Four publication principles

1. Resources are identified by URIs.

2. URIs are dereferenceable.

3. When a URI is dereferenced, a description of the identified resource should be returned, ideally adapted through content negotiation.

4. Published Web datasets must contain links to other Web datasets.
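
Principles 2 and 3 can be illustrated with a minimal Python sketch that prepares a content-negotiated request. The helper name `build_description_request` and the choice of `text/turtle` as the requested media type are assumptions made for illustration; the request is only built, not sent.

```python
from urllib.request import Request


def build_description_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare, without sending, a GET request asking for `media_type`."""
    # The Accept header is what drives content negotiation on the server side.
    return Request(uri, headers={"Accept": media_type}, method="GET")


request = build_description_request(
    "http://dbpedia.org/resource/Johann_Sebastian_Bach"
)
# urllib.request.urlopen(request) would perform the actual dereferencing.
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

A server that supports content negotiation may answer such a request with an RDF description, and with an HTML page when a browser asks for `text/html`.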

Page 4: Melinda: Methods and tools for Web Data Interlinking

Interlinking datasets

Links are contained in specific datasets

<http://www.example.org/linkset/DBPedia-MB>
    a void:Linkset ;
    void:target <http://www.dbpedia.org> ;
    void:target <http://www.musicbrainz.org> .

<http://www.example.org/linkset/DBPedia-MB>
    <http://www.dbpedia.org/resource/Johann_Sebastian_Bach>
    owl:sameAs
    <http://www.musicbrainz.org/artist/24f1766e-9635-4d58-a4d4-9413f9f98a4c> .
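
As a rough illustration of what a linkset holds, the sketch below serializes pairs of URIs as owl:sameAs statements in N-Triples. The function name `links_to_ntriples` is hypothetical, and the sketch leaves out the voiD description of the linkset itself.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def links_to_ntriples(links):
    """Serialize (source URI, target URI) pairs as owl:sameAs N-Triples lines."""
    return "\n".join(
        f"<{source}> <{OWL_SAME_AS}> <{target}> ." for source, target in links
    )


links = [
    (
        "http://www.dbpedia.org/resource/Johann_Sebastian_Bach",
        "http://www.musicbrainz.org/artist/24f1766e-9635-4d58-a4d4-9413f9f98a4c",
    )
]
print(links_to_ntriples(links))
```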

Page 5: Melinda: Methods and tools for Web Data Interlinking

Web Data Cloud

Page 6: Melinda: Methods and tools for Web Data Interlinking

Good news: Open Data is on the rise

data.gov, US Data Act

data.gov.uk, with Sir Tim Berners-Lee involved

Other initiatives are under way, including Open Data initiatives from the EU

Page 7: Melinda: Methods and tools for Web Data Interlinking

What do we do?

We propose a framework capturing the various data interlinking methods.

We study existing tools and position them in the framework.

We propose an architecture that connects ontology alignment and interlinking tools.

Page 8: Melinda: Methods and tools for Web Data Interlinking

General approach

[Figure: URI1 and URI2 connected by an owl:sameAs link produced by data interlinking.]

Fig.: The data interlinking problem.

Page 9: Melinda: Methods and tools for Web Data Interlinking

Manual resource alignment

[Figure: URI1 and URI2 connected by an owl:sameAs link obtained through URI transformation.]

Fig.: URI transformation.

Page 10: Melinda: Methods and tools for Web Data Interlinking

Matching identifiers - Example

http://dbpedia.org/resource/Johann_Sebastian_Bach

http://www.lastfm.fr/music/Johann+Sebastian+Bach

[Figure: the two URIs above connected by an owl:sameAs link obtained through URI alignment.]

Fig.: URI transformation example
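
The URI transformation in this example can be sketched in a few lines of Python. The function name `dbpedia_to_lastfm` and the rule it applies (underscores become spaces, which are then encoded as `+`) are assumptions inferred from the two URIs shown on the slide, not a documented mapping between the two sites.

```python
from urllib.parse import quote_plus, unquote

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"
LASTFM_PREFIX = "http://www.lastfm.fr/music/"


def dbpedia_to_lastfm(uri: str) -> str:
    """Rewrite a DBpedia resource URI into the corresponding Last.fm artist URI."""
    if not uri.startswith(DBPEDIA_PREFIX):
        raise ValueError(f"not a DBpedia resource URI: {uri}")
    # DBpedia local names use underscores where the label has spaces.
    label = unquote(uri[len(DBPEDIA_PREFIX):]).replace("_", " ")
    # quote_plus encodes spaces as '+', the form used in the Last.fm URI.
    return LASTFM_PREFIX + quote_plus(label)


print(dbpedia_to_lastfm("http://dbpedia.org/resource/Johann_Sebastian_Bach"))
```

A rule of this kind only works when both datasets derive their identifiers from the same label, which is why the framework also covers matching based on resource descriptions.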

Page 11: Melinda: Methods and tools for Web Data Interlinking

Datasets sharing a common ontology

[Figure: URI1 and URI2, both described by ontology O1, connected by an owl:sameAs link through resource matching of datasets described by the same ontology.]

Fig.: Matching two datasets described according to the same ontology.

Page 12: Melinda: Methods and tools for Web Data Interlinking

Datasets sharing a common ontology - Example

[Figure: URI1 (DBPedia) and URI2 (Musicbrainz) are both of type mo:MusicArtist and carry "first" and "last" name properties; the names shown are "Johann-Sebastian Bach" and "Jean-Sébastien Bach". A resource matching algorithm for datasets described according to a common ontology compares the two resources.]

Fig.: Matching data sharing a common ontology
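
When both datasets use the same ontology, the same properties can be compared directly. The sketch below is one possible illustration, not the algorithm of any tool discussed here: it normalizes the two names from the figure and compares them with `difflib.SequenceMatcher` from the Python standard library. The function names and the 0.7 threshold are assumptions.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case a name, strip accents, and treat hyphens as spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.lower().replace("-", " ")


def name_similarity(name1: str, name2: str) -> float:
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(name1), normalize(name2)).ratio()


def same_artist(name1: str, name2: str, threshold: float = 0.7) -> bool:
    """Propose an owl:sameAs link when the names are similar enough."""
    return name_similarity(name1, name2) >= threshold


print(same_artist("Johann-Sebastian Bach", "Jean-Sébastien Bach"))
```

Comparing a single property is fragile in practice, so real link specifications usually aggregate several comparisons.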

Page 13: Melinda: Methods and tools for Web Data Interlinking

Matching datasets having heterogeneous ontologies

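[Figure: URI1 described by ontology O1 and URI2 described by ontology O2, with an implicit alignment between O1 and O2; the two URIs are connected by an owl:sameAs link through resource matching of datasets described by different ontologies.]

Fig.: Two datasets matched using an implicit alignment.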

Page 14: Melinda: Methods and tools for Web Data Interlinking

Example

[Figure: URI1 (OpenCyc) is of type Classical Music Performer and URI2 (Musicbrainz) is of type mo:MusicArtist; the properties English ID, givenname and name carry the values "JohannSebastianBach", "Jean-Sébastien" and "Bach".]

Page 15: Melinda: Methods and tools for Web Data Interlinking

General interlinking framework

[Figure: ontology matching between O1 and O2 produces an alignment; data interlinking uses this alignment to connect URI1 and URI2 with an owl:sameAs link.]

Fig.: General framework for data interlinking involving ontology matching.

Page 16: Melinda: Methods and tools for Web Data Interlinking

Processes and specifications

            process               result
instance    link specification    linkset
class       matcher               alignment

Tab.: Matching process, interlinks, and their results.

Page 17: Melinda: Methods and tools for Web Data Interlinking

Analysis criteria

Degree of automation

  Is the tool completely automatic?

  Does the tool need to be parametrized by the user? What kind of parameters (data matching techniques, ontology alignment)?

Matching techniques used

  String matching?

  External functions (value conversions, data transformations)?

  Similarity propagation?

  Other techniques?

Domain: Is the tool specific to a given domain?

Page 18: Melinda: Methods and tools for Web Data Interlinking

Analysis criteria

Ontologies

  Does the tool take into account the ontologies associated with the datasets?

  Does the tool allow interlinking datasets described according to different ontologies?

  If the ontologies differ, does the tool perform ontology alignment?

Output

  What does the tool produce as output?

  Does the tool offer to merge the two input datasets?

Postprocessing: Does the tool perform any post-processing operations?

Page 19: Melinda: Methods and tools for Web Data Interlinking

Six interlinking tools

RKB-CRS: Coreference resolution service of the RKB RDF Knowledge Base.

LD-Mapper: Interlinking tool for the Music Ontology (MO).

ODD-Linker: Interlinking tool based on SQL record matching.

RDF-AI: Interlinking and data fusion tool.

Silk and Silk LSL: Interlinking tool and link specification language.

Knofuss architecture: Interlinking and data fusion tool with ontology alignment.

Page 20: Melinda: Methods and tools for Web Data Interlinking

Six interlinking tools

[Figure: the framework diagram (URI 1, URI 2, O1, O2, resource comparison method, ontology matching system, implicit alignment, explicit alignment, owl:sameAs) annotated with the tools Silk, ODD-Linker, LD-Mapper, RDF-AI, Knofuss and RKB-CRS.]

Fig.: Tools positioned in the defined framework

Page 21: Melinda: Methods and tools for Web Data Interlinking

Application

Let us consider a link specification between DBPedia and Geonames:

<Silk>
  <Prefix id="rdfs" namespace="http://www.w3.org/2000/01/rdf-schema#" />
  <Prefix id="dbpedia" namespace="http://dbpedia.org/ontology/" />
  <Prefix id="gn" namespace="http://www.geonames.org/ontology#" />

  <DataSource id="dbpedia">
    <EndpointURI>http://demo_sparql_server1/sparql</EndpointURI>
    <Graph>http://dbpedia.org</Graph>
  </DataSource>

  <DataSource id="geonames">
    <EndpointURI>http://demo_sparql_server2/sparql</EndpointURI>
    <Graph>http://sws.geonames.org/</Graph>
  </DataSource>

  <Thresholds accept="0.9" verify="0.7" />
  <Output acceptedLinks="accepted_links.n3"
          verifyLinks="verify_links.n3" mode="truncate" />

  <Interlink id="cities">
    <LinkType>owl:sameAs</LinkType>
    <SourceDataset dataSource="dbpedia" var="a">
      <RestrictTo>?a rdf:type dbpedia:City</RestrictTo>
    </SourceDataset>
    <TargetDataset dataSource="geonames" var="b">
      <RestrictTo>?b rdf:type gn:P</RestrictTo>
    </TargetDataset>
    <LinkCondition>
      <AVG>
        <Compare metric="jaroSimilarity">
          <Param name="str1" path="?a/rdfs:label" />
          <Param name="str2" path="?b/gn:name" />
        </Compare>
        <Compare metric="numSimilarity">
          <Param name="num1" path="?a/dbpedia:populationTotal" />
          <Param name="num2" path="?b/gn:population" />
        </Compare>
      </AVG>
    </LinkCondition>
  </Interlink>
</Silk>
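
The link condition above averages a string similarity and a numeric similarity, then compares the result with the accept (0.9) and verify (0.7) thresholds. The Python sketch below mimics that logic under stated assumptions: `jaro_similarity` is the textbook Jaro measure, `num_similarity` is an assumed relative-difference formula, and neither is claimed to reproduce Silk's own implementation of `jaroSimilarity` or `numSimilarity`.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Textbook Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unmatched equal character within the sliding window.
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i in range(len(s1)):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (
        matches / len(s1) + matches / len(s2)
        + (matches - transpositions) / matches
    ) / 3


def num_similarity(n1: float, n2: float) -> float:
    """Assumed numeric similarity: 1 minus the relative difference."""
    largest = max(abs(n1), abs(n2))
    return 1.0 if largest == 0 else 1.0 - abs(n1 - n2) / largest


def link_decision(label1, label2, pop1, pop2, accept=0.9, verify=0.7) -> str:
    """Average both similarities and classify the candidate link."""
    score = (jaro_similarity(label1, label2) + num_similarity(pop1, pop2)) / 2
    if score >= accept:
        return "accept"
    return "verify" if score >= verify else "reject"


print(link_decision("Innsbruck", "Innsbruck", 100, 100))
```

Links classified as "verify" correspond to the candidates written to verify_links.n3 for manual checking, while accepted links go to accepted_links.n3.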

Page 22: Melinda: Methods and tools for Web Data Interlinking

Application

The alignment is implicitly contained in the link specification.

:dbp-geo a align:Alignment ;
    align:onto1 <http://dbpedia.org/ontology/> ;
    align:onto2 <http://www.geonames.org/ontology#> ;
    align:map :map1, :map2, :map3 .

:map1 a align:Cell ;
    align:entity1 dbpedia:City ;
    align:entity2 gn:P ;
    align:relation align:subsumedBy .

:map2 a align:Cell ;
    align:entity1 dbpedia:populationTotal ;
    align:entity2 gn:population ;
    align:relation align:equivalent .

:map3 a align:Cell ;
    align:entity1 rdfs:label ;
    align:entity2 gn:name ;
    align:relation align:equivalent .

The two property correspondences can be made more precise by restricting each property to its domain:

:map2 a align:Cell ;
    align:entity1 [ a align:Property ;
        edoal:and dbpedia:populationTotal ;
        edoal:and [ a edoal:PropertyDomainRestriction ;
                    edoal:domain dbpedia:City ] ] ;
    align:entity2 [ a align:Property ;
        edoal:and gn:population ;
        edoal:and [ a edoal:PropertyDomainRestriction ;
                    edoal:domain gn:P ] ] ;
    align:relation align:equivalent .

:map3 a align:Cell ;
    align:entity1 [ a align:Property ;
        edoal:and rdfs:label ;
        edoal:and [ a edoal:PropertyDomainRestriction ;
                    edoal:domain dbpedia:City ] ] ;
    align:entity2 [ a align:Property ;
        edoal:and gn:name ;
        edoal:and [ a edoal:PropertyDomainRestriction ;
                    edoal:domain gn:P ] ] ;
    align:relation align:equivalent .

Page 23: Melinda: Methods and tools for Web Data Interlinking

Application

Using the alignment, the link specification can be simplified.

<UseAlignment rdf:resource="#dbp-geo" />

<Interlink id="cities">
  <LinkType>owl:sameAs</LinkType>
  <LinkCell rdf:resource="#map1" />
  <LinkCondition>
    <AVG>
      <Compare metric="jaroSimilarity">
        <CellParam rdf:resource="#map3" />
      </Compare>
      <Compare metric="numSimilarity">
        <CellParam rdf:resource="#map2" />
      </Compare>
    </AVG>
  </LinkCondition>
  <Thresholds accept="0.9" verify="0.7" />
  <Output acceptedLinks="accepted_links.n3"
          verifyLinks="verify_links.n3" mode="truncate" />
</Interlink>
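
One way to read the simplified specification is that each cell reference stands for the pair of entities recorded in the alignment. The sketch below shows how a tool might expand such references back into dataset restrictions and property paths. It is an illustration under assumptions: the dictionary `ALIGNMENT_CELLS` and the functions `expand_link_cell` and `expand_cell_param` are hypothetical names, and the cell contents are copied from the alignment on the previous slide.

```python
# Cell id -> (entity in the source ontology, entity in the target ontology).
ALIGNMENT_CELLS = {
    "map1": ("dbpedia:City", "gn:P"),
    "map2": ("dbpedia:populationTotal", "gn:population"),
    "map3": ("rdfs:label", "gn:name"),
}


def expand_link_cell(cell_id, source_var="a", target_var="b"):
    """Turn a class correspondence into the two RestrictTo patterns."""
    class1, class2 = ALIGNMENT_CELLS[cell_id]
    return (
        f"?{source_var} rdf:type {class1}",
        f"?{target_var} rdf:type {class2}",
    )


def expand_cell_param(cell_id, source_var="a", target_var="b"):
    """Turn a property correspondence into the two Param paths."""
    property1, property2 = ALIGNMENT_CELLS[cell_id]
    return (f"?{source_var}/{property1}", f"?{target_var}/{property2}")


print(expand_link_cell("map1"))
print(expand_cell_param("map2"))
print(expand_cell_param("map3"))
```

Keeping the correspondences in the alignment means the same cells can be reused by other link specifications between the two datasets.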

Page 24: Melinda: Methods and tools for Web Data Interlinking

Conclusions

We propose a framework for data interlinking on the Web of data.

We have presented existing tools and positioned them with respect to the framework.

We propose a simplification of the interlinking task and demonstrate it on an example.

Our current work goes towards more interoperability for link specifications:

  Is it possible to construct more generic link specifications, i.e. specifications attached to datasets or ontologies?

  Is it possible to automatically find the key properties that identify matching pairs?

Page 25: Melinda: Methods and tools for Web Data Interlinking

For more

http://melinda.inrialpes.fr

François Scharffe and Jérôme Euzenat. Linked data meets ontology matching: enhancing data interlinking through ontology alignments. (Submitted to WWW'2010.)