Query-Driven Management of Linked Data Quality

Query-Driven Management of Linked Data Quality Fariz Darari Supervised by: Prof. Werner Nutt Free University of Bozen-Bolzano [email protected] Dec 5, 2014 Fariz Darari (UniBZ) Dec 5, 2014 1 / 19

Transcript of Query-Driven Management of Linked Data Quality

Page 1: Query-Driven Management of Linked Data Quality

Query-Driven Management of Linked Data Quality

Fariz Darari

Supervised by: Prof. Werner Nutt

Free University of Bozen-Bolzano

Dec 5, 2014

Page 2: Query-Driven Management of Linked Data Quality

Overview

1 Quality Linked Data is Everywhere?

2 Query-Driven Data Quality

3 Research Problems

4 Current Results

Page 3: Query-Driven Management of Linked Data Quality

Section 1

Quality Linked Data is Everywhere?

Page 4: Query-Driven Management of Linked Data Quality

DBpedia

Everybody knows Wikipedia.

DBpedia is a Linked Data version of Wikipedia.

Page 5: Query-Driven Management of Linked Data Quality

Google’s Knowledge Graph

Page 6: Query-Driven Management of Linked Data Quality

Linked Data is Everywhere

Linked Datasets as of August 2014

Figure: the Linking Open Data (LOD) cloud diagram, showing interlinked datasets such as DBpedia, Freebase, YAGO, GeoNames, and LinkedGeoData. The legend groups the datasets into the domains Linguistics, Social Networking, Life Sciences, Cross-Domain, Government, User-Generated Content, Publications, Geographic, and Media.

Page 7: Query-Driven Management of Linked Data Quality

Data Completeness

Page 8: Query-Driven Management of Linked Data Quality

Data Correctness

Page 9: Query-Driven Management of Linked Data Quality

Data Timeliness

Page 10: Query-Driven Management of Linked Data Quality

Section 2

Query-Driven Data Quality

Page 11: Query-Driven Management of Linked Data Quality

Sieve’s Approach

Pablo N. Mendes, Hannes Mühleisen, Christian Bizer: Sieve: Linked Data Quality Assessment and Fusion. EDBT/ICDT Workshops 2012.

Page 12: Query-Driven Management of Linked Data Quality

Example: Query-Driven Data Completeness

Page 13: Query-Driven Management of Linked Data Quality

Generalization: Query-Driven Data Quality

2-step query-driven Linked Data quality management:

We annotate parts of the data with data quality aspects such as completeness, correctness, and timeliness.

Then, query answers are checked to see whether they reside inside the annotated parts.
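
The two steps can be illustrated with a minimal Python sketch. It is not the framework from the talk: the `Triple` and `Annotation` types, the wildcard convention, and the `ex:` identifiers are all hypothetical, and the check is a plain pattern match on the triples that support an answer rather than a full reasoning procedure.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    s: str
    p: str
    o: str


class Annotation(NamedTuple):
    # Hypothetical annotation: a triple pattern plus a quality aspect.
    # None acts as a wildcard in the subject, predicate, or object position.
    s: Optional[str]
    p: Optional[str]
    o: Optional[str]
    aspect: str  # e.g. "completeness", "correctness", "timeliness"


def covers(ann: Annotation, t: Triple) -> bool:
    """True if the triple falls inside the part of the data the annotation describes."""
    return all(a is None or a == v for a, v in zip(ann[:3], t))


def aspects_of(t: Triple, annotations: list) -> set:
    """Quality aspects annotated for a single triple."""
    return {a.aspect for a in annotations if covers(a, t)}


def answer_quality(supporting_triples: list, annotations: list) -> set:
    """Aspects that hold for every triple a query answer was derived from."""
    sets = [aspects_of(t, annotations) for t in supporting_triples]
    return set.intersection(*sets) if sets else set()


# Step 1: annotate parts of the data (assumed example annotations).
annotations = [
    Annotation("ex:UniBZ", "ex:hasFaculty", None, "completeness"),
    Annotation(None, "ex:hasFaculty", None, "correctness"),
]

# Step 2: check whether the triples behind each answer lie in annotated parts.
inside = [Triple("ex:UniBZ", "ex:hasFaculty", "ex:ComputerScience")]
partly_outside = inside + [Triple("ex:ComputerScience", "ex:hasDean", "ex:Someone")]

print(answer_quality(inside, annotations))
print(answer_quality(partly_outside, annotations))
```

The first answer is supported only by annotated triples, so it inherits both aspects; the second relies on a triple outside every annotated part, so no aspect can be claimed for it. Taking the intersection is one possible design choice: an answer is only as good as the weakest triple it depends on.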

Page 14: Query-Driven Management of Linked Data Quality

Section 3

Research Problems

Page 15: Query-Driven Management of Linked Data Quality

Interrelations between Data Quality Aspects

There is already the query-driven data completeness framework [1].

How can we generalize the model to support the representation of data correctness, data completeness, data timeliness, and other quality aspects over data sources, and provide reasoning techniques for the interrelations between them?

[1] Fariz Darari, Werner Nutt, Giuseppe Pirrò, Simon Razniewski: Completeness Statements about RDF Data Sources and Their Use for Query Answering. ISWC 2013.

Page 16: Query-Driven Management of Linked Data Quality

Other Research Problems

Data Quality and Data Provenance

Data Completeness and Linked Data Streams

Data Completeness and SPARQL Queries

Data Quality and Data Privacy

Page 17: Query-Driven Management of Linked Data Quality

Section 4

Current Results

Page 18: Query-Driven Management of Linked Data Quality

Current Results

Timestamped completeness statements

Efficient techniques for completeness reasoning [2]

Completeness reasoning implementation [3]

RDF and SPARQL reconciliation [4]

[2] To be submitted to ACM TWEB.

[3] Fariz Darari, Radityo Eko Prasojo, Werner Nutt: CORNER: A Completeness Reasoner for SPARQL Queries Over RDF Data Sources. ESWC 2014 Demo.

[4] Fariz Darari, Simon Razniewski, Werner Nutt: Bridging the Semantic Gap between RDF and SPARQL using Completeness Statements. ISWC 2014 Poster.

Page 19: Query-Driven Management of Linked Data Quality

Query-Driven Management of Linked Data Quality

Take-away message:

The query-driven data quality approach is a formal yet flexible way to manage Linked Data quality.

Thank you!
