RDFUnit - Test-Driven Linked Data Quality Assessment


RDFUnit methodology presentation in the WWW2014 Semantic Web research track


Test-driven Evaluation of Linked Data Quality

Dimitris Kontokostas (1,4), Patrick Westphal (1), Sören Auer (2),
Sebastian Hellmann (1,4), Jens Lehmann (1), Roland Cornelissen (3),
Amrapali Zaveri (1)

(1) AKSW, University of Leipzig
(2) University of Bonn
(3) Stichting Bibliotheek.nl
(4) DBpedia Association

2014-04-11

Kontokostas et al. (WWW2014) TD-LD Quality Evaluation 2014-04-11 1 / 20


Problem Definition

Unprecedented volume of structured data on the Web

Datasets are of varying quality

LOD contains many crowd-sourced datasets with good coverage, but often poor, non-uniform quality

OWL schemas are often not sufficiently developed or exploited for quality evaluation


Motivation

Quality is fitness for use
    Key to the success of the data web
    Major barrier for industry adoption

Methodology inspired by test-driven software development

Vocabularies, ontologies and knowledge bases should be accompanied by a number of test cases, which help to ensure a basic level of quality


Test-Driven Development (Software)

Test case: input on which the program under test is executed during testing.

Test suite: a set of test cases for testing a program

Status: Success or Fail (Error)

Test cases are implemented largely manually or with limited programmatic support

H. Zhu, P. A. V. Hall, and J. H. R. May. Software unit test coverage and adequacy. ACM Comput. Surv., 29(4):366-427, 1997.


Test-Driven Development (RDF)

Test case: a data constraint that involves one or more triples

Test suite: a set of test cases for testing a dataset

Status: Success, Fail, Timeout (complexity) or Error (e.g. network)

Fail: Error, warning or notice

RDF: basis for both data and schema

    Unified model facilitates automatic test case generation
    SPARQL serves as the test case definition language


Example test case

A person should never have a birth date after a death date

Test cases are written in SPARQL

    SELECT ?s WHERE {
      ?s dbo:birthDate ?v1 .
      ?s dbo:deathDate ?v2 .
      FILTER ( ?v1 > ?v2 )
    }

We query for errors

Success: Query returns empty result set

Fail: Query returns results

Every result we get is a violation instance

Timeout / Error: needs further investigation on SPARQL engine capabilities, query syntax or query complexity
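The status rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not RDFUnit's implementation: `run_query` is a hypothetical stand-in for a SPARQL client, and mapping `TimeoutError` to the Timeout status is an assumption about how such a client would signal a timeout.

```python
from typing import Callable, List, Tuple


def evaluate_test_case(run_query: Callable[[str], List[dict]], query: str) -> Tuple[str, int]:
    """Return (status, violation_count) for one SPARQL test case.

    run_query is a hypothetical stand-in for a SPARQL client: it takes a
    query string and returns the result rows as a list of dicts.
    """
    try:
        rows = run_query(query)
    except TimeoutError:
        return ("Timeout", 0)   # query too complex for the engine
    except Exception:
        return ("Error", 0)     # e.g. network failure or syntax error
    if not rows:
        return ("Success", 0)   # empty result set: no violations
    return ("Fail", len(rows))  # every returned row is a violation instance


# Stubbed endpoints for illustration (no real SPARQL engine involved).
def no_violations(_query):
    return []


def two_violations(_query):
    return [{"s": "ex:Alice"}, {"s": "ex:Bob"}]


def too_slow(_query):
    raise TimeoutError("query timed out")


results = [
    evaluate_test_case(endpoint, "SELECT ?s WHERE { ... }")
    for endpoint in (no_violations, two_violations, too_slow)
]
print(results)
```

Because the query asks for errors rather than for valid data, the violation count falls out of the result set directly.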


Patterns & Bindings

Data Quality Test Patterns (DQTP): abstract patterns, which can be further refined into concrete data quality test cases using test pattern bindings

Existing library of 20 patterns (DBpedia mailing lists since 2008)

    SELECT ?s WHERE {
      ?s %%P1%% ?v1 .
      ?s %%P2%% ?v2 .
      FILTER ( ?v1 %%OP%% ?v2 )
    }

Bindings: mapping of variables to valid pattern replacements

    P1 => dbo:birthDate | SELECT ?s WHERE {
    P2 => dbo:deathDate |   ?s dbo:birthDate ?v1 .
    OP => >             |   ?s dbo:deathDate ?v2 .
                        |   FILTER ( ?v1 > ?v2 ) }
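Turning a pattern into a concrete test case amounts to placeholder substitution. The Python sketch below shows one way to do it; the function name `instantiate` and the choice to fail on unbound placeholders are assumptions made for illustration, not RDFUnit's API.

```python
import re

# The comparison pattern from this slide, with %%...%% placeholders.
PATTERN = (
    "SELECT ?s WHERE {\n"
    "  ?s %%P1%% ?v1 .\n"
    "  ?s %%P2%% ?v2 .\n"
    "  FILTER ( ?v1 %%OP%% ?v2 )\n"
    "}"
)


def instantiate(pattern: str, bindings: dict) -> str:
    """Replace every %%NAME%% placeholder with its bound value."""
    def substitute(match):
        name = match.group(1)
        if name not in bindings:
            raise KeyError(f"no binding for placeholder %%{name}%%")
        return bindings[name]

    return re.sub(r"%%(\w+)%%", substitute, pattern)


test_case = instantiate(
    PATTERN,
    {"P1": "dbo:birthDate", "P2": "dbo:deathDate", "OP": ">"},
)
print(test_case)
```

Raising on a missing binding keeps a half-instantiated query from ever reaching the SPARQL endpoint.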


Test Auto Generators (TAGs)

RDF(s) & OWL (partial) support

Query schema for supported axioms

    SELECT DISTINCT ?T1 ?T2 WHERE {
      ?T1 owl:disjointWith ?T2 .
    }

For every result a binding to a pattern is generated & a test case instantiated

Supported axioms at the moment:

    RDFS: domain & range
    OWL: minCardinality, maxCardinality, cardinality, functionalProperty, InverseFunctionalProperty, disjointClass, propertyDisjointWith, AsymmetricProperty and deprecated


TAG Example

INVFUNC pattern

    SELECT ?a WHERE {
      ?a %%P1%% ?resource .
      ?b %%P1%% ?resource .
      FILTER ( ?a != ?b )
    }

Test Auto Generators => query the schema & generate bindings, e.g. for owl:InverseFunctionalProperty

    SELECT DISTINCT ?P1 ?MESSAGE WHERE {
      { ?P1 rdf:type owl:InverseFunctionalProperty . }
      UNION
      { ?P1 rdfs:subPropertyOf+ owl:InverseFunctionalProperty }
    }

Bindings => for every result of a TAG, bind values to a pattern and instantiate test cases, e.g. bind foaf:homepage to %%P1%%
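The TAG step can be sketched as a loop over the rows returned by the schema query, producing one test case per row. In this Python sketch the TAG result is hard-coded as illustrative rows rather than fetched from a schema, `generate_test_cases` is a hypothetical name, and the pattern projects `?a`, one of the variables bound in the pattern body.

```python
# INVFUNC pattern: two different subjects share a value for the same property.
INVFUNC_PATTERN = (
    "SELECT ?a WHERE {\n"
    "  ?a %%P1%% ?resource .\n"
    "  ?b %%P1%% ?resource .\n"
    "  FILTER ( ?a != ?b )\n"
    "}"
)


def generate_test_cases(pattern: str, tag_rows: list) -> list:
    """Instantiate one test case per TAG result row.

    Each row maps placeholder names (e.g. "P1") to schema terms, in the
    shape a TAG query over the schema would return them.
    """
    test_cases = []
    for row in tag_rows:
        query = pattern
        for name, value in row.items():
            query = query.replace(f"%%{name}%%", value)
        test_cases.append(query)
    return test_cases


# Illustrative TAG result: two properties typed as inverse functional.
tag_rows = [{"P1": "foaf:homepage"}, {"P1": "foaf:mbox"}]
test_cases = generate_test_cases(INVFUNC_PATTERN, tag_rows)
print(test_cases[0])
```

The number of generated test cases therefore grows with the number of matching axioms in the schema, not with the size of the dataset.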


Test Coverage (1/2)

Coverage computation function $f : Q \to 2^E$
    Takes a SPARQL query $q \in Q$ corresponding to a test case pattern binding as input and returns a set of entities.

Property domain coverage (dom): Identifies the ratio of property occurrences where a test-case is defined for verifying domain restrictions of the property.
    $F'(QS, D) = \sum_{p \in F(QS)} \mathit{pfreq}(p)$, where $\mathit{pfreq}(p)$ is the frequency of a property $p$
    $f_{dom}$ returns the set of all properties $p$ such that the triple pattern $(?s, p, ?o)$ occurs in $q$ and there is at least one other triple pattern using $?s$ in $q$.

Property range coverage (ran): Identifies the ratio of property occurrences where a test-case is defined for verifying range restrictions of the property. (similar to dom)
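A small worked example may make the property-based coverage sum concrete. The Python sketch below assumes that pfreq(p) is a relative frequency (occurrences of p divided by the total number of triples in D), which is one reading of "ratio of property occurrences"; the toy dataset and the covered set are hypothetical.

```python
from collections import Counter


def property_coverage(covered: set, triples: list) -> float:
    """F'(QS, D): sum of pfreq(p) over the properties covered by test cases.

    Assumption: pfreq(p) is the relative frequency of p, i.e. occurrences
    of p divided by the total number of triples in D.
    """
    counts = Counter(p for (_s, p, _o) in triples)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Counter returns 0 for properties that never occur in D.
    return sum(counts[p] for p in covered) / total


# Toy dataset D (hypothetical): four triples, three distinct properties.
D = [
    ("ex:a", "dbo:birthDate", "1900-01-01"),
    ("ex:a", "dbo:deathDate", "1950-01-01"),
    ("ex:b", "dbo:birthDate", "1920-05-05"),
    ("ex:b", "rdfs:label", "B"),
]

# Properties returned by the coverage function over all test case queries
# (hypothetical): only dbo:birthDate has a test case.
covered = {"dbo:birthDate"}
coverage = property_coverage(covered, D)
print(coverage)
```

Weighting by frequency means that a test case on a heavily used property raises coverage more than one on a rarely used property.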


Test Coverage (2/2)

Property dependency coverage (pdep): Ratio of property occurrences where a test-case is defined for verifying dependencies with other properties.
    At least two different properties

Property cardinality coverage (card): Ratio of property occurrences where a test-case is defined for verifying the cardinality of the property.
    GROUP BY ?s/o and HAVING(count(?s/o) <op> <number>)

Class instance coverage (mem): Ratio of classes with test-cases regarding class membership.
    (?s, rdf:type, c)

Class dependency coverage (cdep): Ratio of class occurrences for which test-cases verifying relationships with other classes are defined.
    At least two different classes

$$Cov(QS, D) = \frac{1}{6}\left(F'_{dom}(QS, D) + F'_{ran}(QS, D) + F'_{pdep}(QS, D) + F'_{card}(QS, D) + F'_{mem}(QS, D) + F'_{cdep}(QS, D)\right)$$


Schema Enrichment

Optionally run schema enrichment on the dataset (e.g. DL-Learner)

Get new axioms (filter manually)

Run TAGs on the axioms and get additional automatic test cases

[Figure: 3-Phase Enrichment Learning Approach (DL-Learner). Input: Entity URI, Axiom Type, Knowledge Base (SPARQL Endpoint). Phases: 1. obtain schema information; 2. obtain axiom type and entity specific data; 3. run machine learning algorithm. Result: List of Axiom Suggestions + Metadata. Other labels in the figure: Reasoner, SPARQL Endpoint, Enrichment Ontology, Background Knowledge, Background Knowledge + Relevant Instance Data, Learner. Annotations: "(optional invocation)", "(only executed once per knowledge base)", "(sample data if necessary)", "iterate over all axiom types and schema entities for full enrichment".]


Test Case Elicitation Workflow


Test case elicitation

Test cases depend on a schema or a dataset

Automatic & Manual test cases are reusable

A dataset can be tested against a number of schemas

e.g. dbo, foaf, skos

Linked Open Vocabularies (http://lov.okfn.org):

Describes 400 vocabularies in RDF (prefix, URI, description, etc.)

Run TAGs on all vocabularies

32,293 unique reusable test cases (10/2013)

Used for prefix dereferencing


Evaluation

Implemented the methodology in the RDFUnit tool (formerly Databugger)

Tested on 3 crowd-sourced and 2 library datasets

    dbpedia.org (owl, dbo, foaf, dcterms, dc, skos, geo, prov)
    nl.dbpedia.org (owl, dbo, foaf, dcterms, dc, skos, geo, prov)
    linkedgeodata.org (ngeo, spatial, lgdo, dcterms, gsp, owl, geo, skos, foaf)
    id.loc.gov (owl, foaf, dcterms, skos, mads, mrel, premis)
    datos.bne.es (owl, frbrer, isbd, dcterms and skos)

Defined manual test cases for dbo (22), lgdo (6) & skos (20)

Enriched datasets with DL-Learner to get additional test cases


Evaluation Results

Dataset  Triples      Subjects    TC     Pass   Fail   TO  Errors      ManEr      EnrEr    E/R
dben     817,467,330  24,922,670  6,064  4,288  1,860  55  63,644,169  5,224,298  249,857  2.55
dbnl      74,790,253   4,831,594  5,173  4,149    812  73   5,375,671    211,604   15,041  1.11
lgd      274,690,851  51,918,417    634    545     86   3  57,693,912    133,140        1  1.11
datos     60,017,091   7,470,044  2,473  2,376     89   8  27,943,993         25      537  3.74
loc      436,126,273  53,072,042    536    499     28   9   9,392,909         49    3,663  0.18

TC: test cases; TO: timeouts; ManEr: errors from manual test cases; EnrEr: errors from enrichment test cases; E/R: errors per resource (subject).

Pattern      dben  dbnl  lgd   datos  loc
COMP         1.7M  7     -     -      -
INVFUNC      279K  13K   -     511    3.5K
LITRAN       9     -     -     -      -
MATCH        171K  103K  637   -      -
OWLASYMP     19K   3K    -     -      -
OWLCARD      610   291   1     1      3
OWLDISJC     92    -     -     8.1K   1.1K
OWLDISJP     3.4K  7K    -     53     223
OWLIRREFL    1.4K  14    -     -      -
PVT          267K  1.2K  22    -      -
RDFSDOMAIN   31M   2.3M  55M   28M    9M
RDFSRANGE    26M   2.5M  191K  320K   111K
RDFSRANGED   760K  286K  2.7M  2      -
TRIPLE       -     -     132K  -      -
TYPDEP       674K  -     -     -      -
TYPRODEP     2M    100K  -     -      -

               Errors
Schema   TC    dben  dbnl  lgd   datos  loc
dbo      5.7K  7.9M  716K  -     -      -
frbrer   2.1K  -     -     -     11K    -
lgdo     224   -     -     2.8M  -      -
isbd     179   -     -     -     28M    -
prov     125   25M   -     -     -      -
foaf     95    25M   4.6M  -     -      59
gsp      83    -     -     39M   -      -
mads     75    -     -     -     -      0.3M
owl      48    5     3     2     5      -
skos     28    41    -     -     -      9M
dcterms  28    960   881   191K  37K    659
ngeo     18    -     -     119   -      -
geo      7     2.8M  120K  16M   -      -
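The E/R column of the first table appears to be the total number of errors divided by the number of distinct subjects. The Python check below recomputes it from the Subjects and Errors columns under that assumption; the function name is hypothetical.

```python
# (dataset, subjects, errors, reported E/R), copied from the first table.
ROWS = [
    ("dben", 24_922_670, 63_644_169, 2.55),
    ("dbnl", 4_831_594, 5_375_671, 1.11),
    ("lgd", 51_918_417, 57_693_912, 1.11),
    ("datos", 7_470_044, 27_943_993, 3.74),
    ("loc", 53_072_042, 9_392_909, 0.18),
]


def errors_per_resource(errors: int, subjects: int) -> float:
    """Assumed definition of E/R: violation instances per distinct subject."""
    return errors / subjects


recomputed = {
    name: round(errors_per_resource(errors, subjects), 2)
    for name, subjects, errors, _reported in ROWS
}

for name, _subjects, _errors, reported in ROWS:
    print(name, recomputed[name], reported)
```

Normalizing by subjects makes datasets of very different sizes comparable, which the raw error counts alone do not allow.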


Evaluation Coverage

Richer / stricter schemas result in higher test coverage

Metric       dben    lgd     datos   loc
fpdom        20.32%  8.98%   72.26%  20.35%
fpran        23.67%  10.78%  37.64%  28.78%
fpdep        24.93%  13.65%  77.75%  29.78%
fcard        23.67%  10.78%  37.63%  28.78%
fmem         73.51%  12.78%  93.57%  58.62%
fcdep        37.55%  0%      93.56%  36.86%
Cov(QS, D)   33.94%  9.49%   68.74%  33.86%
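The last row can be checked against the six metric rows, since Cov(QS, D) is defined as their unweighted mean. The Python sketch below does this for the dben column; the dictionary keys are shorthand for the metric names and `overall_coverage` is a hypothetical helper.

```python
# Per-metric coverage for dben in percent, copied from the table
# (dom, ran, pdep, card, mem, cdep in table order).
dben = {
    "dom": 20.32,
    "ran": 23.67,
    "pdep": 24.93,
    "card": 23.67,
    "mem": 73.51,
    "cdep": 37.55,
}


def overall_coverage(metrics: dict) -> float:
    """Cov(QS, D): unweighted mean of the six coverage metrics."""
    expected = {"dom", "ran", "pdep", "card", "mem", "cdep"}
    if set(metrics) != expected:
        raise ValueError("need exactly the six coverage metrics")
    return sum(metrics.values()) / 6


cov_dben = overall_coverage(dben)
print(f"{cov_dben:.2f}%")
```

For dben the mean comes out at roughly 33.94%, in line with the reported value.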


Related Work

SPIN: Expresses SPARQL queries in RDF and allows SPARQL queries with arguments
    Does not fully support our Pattern Bindings (e.g. operators)
    Compatible but would exponentially expand our Pattern library

    SELECT ?x WHERE {
      ?c1 owl:disjointWith ?c2 .
      ?x a ?c1 .
      ?x a ?c2 .
    }

PelletICV: converts OWL constraints to SPARQL
    Expresses constraints only within OWL
    Does not support the (re-)use of DQTPs
    No schema enrichment step


Conclusion

Methodology to define reusable test cases

Evaluation revealed a substantial amount of data quality issues

First step in a larger research and development agenda

Future directions

    Web service
    Test-driven data quality cockpit
    Automatic repair strategies


Thank you!

Dimitris Kontokostas

With kind support of

http://rdfunit.aksw.org

http://github.com/AKSW/RDFUnit
