DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

Bernhard Haslhofer, Niko Popitsch DSNotify - Detecting and Fixing Broken Links in Linked Data Sets WebS ’09 @ DEXA 2009 Linz, 02/09/2009 Bernhard Haslhofer and Niko Popitsch

Bernhard Haslhofer and Niko Popitsch, University of Vienna. Web Semantic Workshop, DEXA 2009, Linz, 2 September 2009

Page 1: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

Bernhard Haslhofer, Niko Popitsch

DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

WebS ’09 @ DEXA 2009

Linz, 02/09/2009

Bernhard Haslhofer and Niko Popitsch

Page 2: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

Summary

Page 3: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets
Page 4: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets
Page 5: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets
Page 6: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

<mo:MusicGroup rdf:about="/music/artists/084308bd-1654-436f-ba03-df6697104e19#artist">
  <foaf:name>Green Day</foaf:name>
  <owl:sameAs rdf:resource="http://dbpedia.org/resource/Green_Day" />
  <mo:image rdf:resource="/music/images/artists/7col_in/084308bd-1654-436f-ba03-df6697104e19.jpg" />
  <foaf:page rdf:resource="/music/artists/084308bd-1654-436f-ba03-df6697104e19.html" />
  <mo:musicbrainz rdf:resource="http://musicbrainz.org/artist/084308bd-1654-436f-ba03-df6697104e19.html" />
  <mo:homepage rdf:resource="http://www.greenday.com/" />
  <mo:fanpage rdf:resource="http://www.greendayvideos.com/" />
  <mo:fanpage rdf:resource="http://www.greenday.net" />
  <mo:imdb rdf:resource="http://www.imdb.com/name/nm1554564/" />
  <mo:myspace rdf:resource="http://www.myspace.com/greenday" />

...

Page 7: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

...

<rdf:Description rdf:about="http://dbpedia.org/resource/Green_Day">
  <dbpprop:abstract xmlns:dbpprop="http://dbpedia.org/property/" xml:lang="en">Green Day is an American rock trio formed in 1987. The band has consisted of Billie Joe Armstrong (vocals, guitar), Mike Dirnt, and Tré Cool for the majority of its existence...
  </dbpprop:abstract>
</rdf:Description>

...

<rdf:Description rdf:about="http://dbpedia.org/resource/Green_Day">
  <dbpprop:abstract xmlns:dbpprop="http://dbpedia.org/property/" xml:lang="de">Green Day [gɹiːn deɪ] ist eine US-amerikanische Punk-Rock-Band, mit der Anfang der 1990er das Punk-Revival begann. Die Band wurde 1987 von Billie Joe Armstrong und Mike Dirnt zusammen mit dem Schlagzeuger John Kiffmeyer alias Al Sobrante als The Sweet Children....
  </dbpprop:abstract>
</rdf:Description>

...

Page 8: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

...but...

Page 9: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

Some numbers...

• Events between DBpedia 3.2 (10/2008) and 3.3 (05/2009)

• # resources created: 29449

• # resources removed: 4789

• # resources moved: 729

Page 10: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets
Page 11: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

Link Integrity...

• is a qualitative property that is given when all links within and between a set of data sources are valid and deliver the result intended by the link creator.

• cf. referential integrity in RDBMS

• demands a solution that

• detects broken links between resources

• provides support for fixing broken links
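The detection half of this requirement can be sketched in a few lines of Python. This is a minimal illustration, not DSNotify code: the `Link` type, the function names, and the idea of passing in a precomputed set of resolvable URIs are all assumptions made for the example. It only covers the "valid" part of the definition; whether a link still delivers the result its creator intended cannot be checked this way.

```python
from typing import Iterable, List, NamedTuple, Set


class Link(NamedTuple):
    source: str  # URI of the resource that holds the link
    target: str  # URI the link points to


def find_broken_links(links: Iterable[Link], resolvable: Set[str]) -> List[Link]:
    """Return every link whose target is not among the resolvable URIs."""
    return [link for link in links if link.target not in resolvable]


def has_link_integrity(links: Iterable[Link], resolvable: Set[str]) -> bool:
    """Link integrity (in the narrow sense used here) holds if no link is broken."""
    return not find_broken_links(links, resolvable)


artist = "/music/artists/084308bd-1654-436f-ba03-df6697104e19#artist"
links = [
    Link(artist, "http://dbpedia.org/resource/Green_Day"),
    Link(artist, "http://www.greenday.com/"),
]
# Hypothetical state in which the DBpedia resource is no longer available.
resolvable = {"http://www.greenday.com/"}
broken = find_broken_links(links, resolvable)
```

As with referential integrity in an RDBMS, the check is a simple membership test once the set of existing targets is known; the hard part on the Web is obtaining that set.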

Page 12: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

Types of broken links

• Removed link targets

• e.g., resource deleted, server not available anymore, etc.

• Moved link targets

• available at another Web location

• e.g., reorganization of Web resources

• Modified link targets

Page 13: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

The DSNotify Approach

• periodically monitor items (resources) in a specific Linked Data source

• extract a descriptive feature vector for each item

• store item + feature vector in index

• use feature vectors to detect if items have been removed or moved to another location

• if moved, add relationship between “old” and “new” item
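The monitoring steps listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the DSNotify implementation: the class and method names are hypothetical, a feature vector is modelled as a set of (property, value) pairs, and exact equality of feature vectors stands in for the similarity heuristic.

```python
from typing import Dict, FrozenSet, Tuple

# Assumption: a feature vector is a hashable set of (property, value) pairs.
FeatureVector = FrozenSet[Tuple[str, str]]
Snapshot = Dict[str, Dict[str, str]]  # item URI -> description


def extract_features(description: Dict[str, str]) -> FeatureVector:
    """Turn a resource description into a feature vector."""
    return frozenset(description.items())


class ItemIndex:
    """Stores items with their feature vectors and records detected moves."""

    def __init__(self) -> None:
        self.items: Dict[str, FeatureVector] = {}
        self.moved_to: Dict[str, str] = {}  # "old" URI -> "new" URI

    def monitor(self, snapshot: Snapshot) -> Dict[str, str]:
        """Index one monitoring pass and return events for vanished items."""
        seen = {uri: extract_features(d) for uri, d in snapshot.items()}
        gone = {u: fv for u, fv in self.items.items() if u not in seen}
        new = {u: fv for u, fv in seen.items() if u not in self.items}
        events: Dict[str, str] = {}
        for old_uri, old_fv in gone.items():
            # Exact match is a placeholder for a similarity heuristic.
            match = next((u for u, fv in new.items() if fv == old_fv), None)
            if match is None:
                events[old_uri] = "removed"
            else:
                events[old_uri] = "moved"
                self.moved_to[old_uri] = match
        self.items = seen
        return events


index = ItemIndex()
green_day = {"foaf:name": "Green Day"}
index.monitor({"http://dbpedia.org/resource/Green_Day": green_day})
events = index.monitor({"http://dbpedia.org/resource/band/Green_Day": green_day})
```

After the second pass, `events` marks the old URI as moved and `index.moved_to` holds the relationship between the "old" and the "new" item.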

Page 14: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

Architecture

[Figure: DSNotify architecture. Labels: DSNotify; LOD sources (connected via owl:sameAs); Monitor (feature extraction); event log; indices II, RII, and AII; Move Detector (heuristic); Decider (decision making, user); LOD source updater; LOD "consuming" application. Arrows are labelled monitor, update, notifications, and querying.]

Page 15: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

Index Interaction

[Figure: timeline from t1 to t4 showing the contents of the Item Index (II), the Removed Item Index (RII), and the Archived Item Index (AII). The example entries are http://dbpedia.org/resource/Green_Day (band), http://dbpedia.org/resource/band/Green_Day, and http://dbpedia.org/resource/band/Alternative/Green_Day.]

Page 16: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

Move Detection

• is a semi-automatic process

• calculate similarity between items based on their feature vectors using domain-specific heuristics

• probability > given threshold: automatic decision

• probability < given threshold: ask expert user
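The threshold rule above might look like this in Python. The sketch makes several assumptions that are not in the slides: Jaccard similarity over (property, value) pairs is used as a stand-in for the domain-specific heuristics, the function names are hypothetical, and a score exactly equal to the threshold is handed to the expert user because the slide leaves that case open.

```python
from typing import Dict, Set, Tuple

Features = Set[Tuple[str, str]]  # assumed shape of a feature vector


def jaccard(a: Features, b: Features) -> float:
    """Share of (property, value) pairs that both feature sets contain."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def detect_move(
    removed: Features, candidates: Dict[str, Features], threshold: float
) -> Tuple[str, float, str]:
    """Return the most similar candidate URI, its score, and the decision.

    `candidates` must not be empty; handling of the empty case is left out.
    """
    best_uri = max(candidates, key=lambda uri: jaccard(removed, candidates[uri]))
    score = jaccard(removed, candidates[best_uri])
    # Assumption: a score equal to the threshold is not decided automatically.
    decision = "automatic decision" if score > threshold else "ask expert user"
    return best_uri, score, decision


# Hypothetical feature sets for illustration only.
removed_item = {("name", "Green Day"), ("type", "band"), ("formed", "1987")}
candidates = {
    "http://dbpedia.org/resource/band/Green_Day": {
        ("name", "Green Day"), ("type", "band"),
        ("formed", "1987"), ("genre", "punk"),
    },
    "http://example.org/film/Green_Day": {
        ("name", "Green Day"), ("type", "film"),
    },
}
result = detect_move(removed_item, candidates, threshold=0.7)
```

Raising the threshold sends more cases to the expert user and lowers the risk of wrong automatic moves; lowering it does the opposite.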

Page 17: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

DSNotify HTTP Interface

• GET http://<server>:<port>/<dsnotify>/item/<uri>

• find out what happened with an item

• GET http://<server>:<port>/<dsnotify>/eventChoice

• retrieve pending event choices (move / remove)

• ...
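A client could build the two request URLs shown above as follows. Everything beyond the URL templates is an assumption: the slides do not say how `<uri>` is encoded into the path or what the responses look like, so this sketch percent-encodes the item URI into a single path segment and returns the raw response body. The host, port, and `dsnotify` context name in the example are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen


def item_url(server: str, port: int, context: str, item_uri: str) -> str:
    """URL for finding out what happened with an item."""
    # Assumption: the item URI is percent-encoded as one path segment.
    return f"http://{server}:{port}/{context}/item/{quote(item_uri, safe='')}"


def event_choice_url(server: str, port: int, context: str) -> str:
    """URL for retrieving pending event choices (move / remove)."""
    return f"http://{server}:{port}/{context}/eventChoice"


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Issue a GET request and return the raw body (format not specified)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


# Hypothetical deployment details; no request is sent here.
url = item_url("localhost", 8080, "dsnotify",
               "http://dbpedia.org/resource/Green_Day")
```

Calling `fetch(url)` against a running instance would perform the actual GET; it is left out of the example because it needs a live server.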

Page 18: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

Evaluation Plan

[Figure: timeline t-n ... t-2, t-1, t0 over the DBpedia releases 2.0, 3.0, 3.1, and 3.2. Between neighbouring releases there is a diff, each labelled with mv, rm, and manual classification.]

Page 19: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

Status / Future Work

• 1st prototype (infrastructure) ready

• annotated test-data set based on DBpedia available

• Currently working on:

• system for simulating past modifications in DBpedia

• the DSNotify evaluation

Page 20: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

Fixing Your Web since 2009

Page 21: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

Backup

Page 22: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

Evaluation Plan

• Monitor simulated DBpedia evolution (t-n - t0)

• Precision / recall of automatic move detection

• with different similarity thresholds

• with different heuristics / and feature vectors
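The precision / recall measurement across thresholds can be sketched as below. The data shapes and names are assumptions for illustration: a move is an (old URI, new URI) pair, `scores` holds the similarity the detector assigned to each candidate pair, and `actual` is the manually classified ground truth. The numbers are made up and are not evaluation results.

```python
from typing import Dict, Set, Tuple

Move = Tuple[str, str]  # (old URI, new URI)


def precision_recall(detected: Set[Move], actual: Set[Move]) -> Tuple[float, float]:
    """Precision and recall of detected moves against the ground truth."""
    true_positives = len(detected & actual)
    # Convention chosen here: an empty set yields 1.0 instead of dividing by zero.
    precision = true_positives / len(detected) if detected else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


def evaluate(scores: Dict[Move, float], actual: Set[Move], threshold: float):
    """Treat every candidate scoring above the threshold as a detected move."""
    detected = {move for move, score in scores.items() if score > threshold}
    return precision_recall(detected, actual)


# Made-up candidate scores and ground truth.
actual = {("a", "a2"), ("b", "b2")}
scores = {("a", "a2"): 0.9, ("b", "b2"): 0.6, ("c", "c2"): 0.7}

results = {t: evaluate(scores, actual, t) for t in (0.5, 0.8)}
```

In this toy data the lower threshold finds both real moves but also accepts one wrong pair, while the higher threshold accepts only correct pairs and misses one real move, which is the trade-off the evaluation is meant to quantify.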

Page 23: DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

Linked Data / Web of Data

• Data management paradigm on the basis of Web technologies

• HTTP, URI, and RDF/S are the key technologies

• Applications (not Web browsers) are data consumers

• Links between resources play a major role
