Querying Incomplete Geospatial Information in RDF Charalampos Nikolaou and Manolis Koubarakis Department of Informatics and Telecommunications National and Kapodistrian University of Athens International Symposium on Spatial and Temporal Databases (SSTD) 2013 August 23, 2013


Page 1: Querying Incomplete Geospatial Information in RDF (cgi.di.uoa.gr/~charnik/files/pubs/2013/nikolaou-sstd2013.pdf)


Page 2:

Motivation

•  Increased interest in publishing geospatial datasets as linked data (i.e., encoded in RDF and with semantic links to other datasets)

•  Geospatial information might be:
   o  Quantitative (e.g., exact geometric information)
   o  Qualitative (e.g., topological relations)

... and express knowledge that is
   o  Complete
   o  Incomplete (or indefinite)

Page 3:

Ordnance Survey (UK)

73,546,231 triples

Page 4:

Global Administrative Areas (GADM)

9,896,532 triples

Page 5:

Nomenclature of Territorial Units for Statistics (NUTS)

316,246 triples

Page 6:

Linked Geospatial Data

[Figure: Linked Open Data cloud diagram, as of September 2011. Legend categories: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content.]

~ 62 billion triples

Page 7:

Question

How do we manage (represent, store, query) this data efficiently?

Page 8:

Challenges: Theory
①  RDF extensions for representing and querying incomplete qualitative and quantitative geospatial information

•  GeoSPARQL
   o  Standard OGC query language for RDF data with geospatial information
   o  Topological relations can be expressed/queried, but no reasoning is offered.

•  We proposed RDFi
   o  Can work with any topological/temporal constraint language with/without constant symbols (e.g., RCC-5, RCC-8, IA)
   o  Formal semantics and algorithm for computing certain answers
   o  Preliminary complexity results for various constraint languages

•  No published algorithm for query processing when considering RCC-8 and constants

Open issue

Page 9:

RDFi by example

gag:Region rdfs:subClassOf geo:Feature.
gag:WestGreece rdf:type gag:Region.
gag:Municipality rdfs:subClassOf geo:Feature.
gag:OlympiaMuni rdf:type gag:Municipality.

noa:Hotspot rdfs:subClassOf geo:Feature.
noa:hotspot rdf:type noa:Hotspot.
noa:Fire rdfs:subClassOf geo:Feature.
noa:fire rdf:type noa:Fire.

gag:OlympiaMuni geo:hasGeometry ex:oGeo.
ex:oGeo rdf:type sf:Polygon.
ex:oGeo geo:asWKT "POLYGON((..))"^^geo:wktLiteral.

noa:hotspot geo:hasGeometry ex:rec.
ex:rec geo:asWKT "POLYGON((..))"^^geo:wktLiteral.

gag:WestGreece geo:sfContains gag:OlympiaMuni.
noa:hotspot geo:sfContains noa:fire.

[Figure labels: West Greece, Olympia]

Page 10:

RDFi by example (cont’d)

Query: Find fires inside the region of West Greece.

GeoSPARQL query:
CERTAIN SELECT ?f
WHERE {
  ?f rdf:type noa:Fire.
  gag:WestGreece geo:sfContains ?f.
}

[Figure labels: West Greece, Olympia]

Page 11:

RDFi by example (cont’d)

Query: Find fires inside the region of West Greece.

GeoSPARQL query:
CERTAIN SELECT ?f
WHERE {
  ?f rdf:type noa:Fire.
  gag:WestGreece geo:sfContains ?f.
}

[Figure: West Greece and Olympia, with three edges each labeled "contains"]
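One way to read this example is that the fire becomes a certain answer once the "contains" edges are chained together. The sketch below is only an illustration in Python, not the RDFi algorithm: it treats containment as transitive and follows edges from West Greece. The two stated edges come from the triples of the example; the third edge (Olympia contains the hotspot) is an assumption here, standing in for a relation that would have to be derived from the two polygons. The function name `contained_in` is made up for this sketch.

```python
from collections import defaultdict


def contained_in(region, contains_edges):
    """Return every node reachable from `region` by following 'contains' edges."""
    graph = defaultdict(set)
    for outer, inner in contains_edges:
        graph[outer].add(inner)
    seen, stack = set(), [region]
    while stack:
        for inner in graph[stack.pop()]:
            if inner not in seen:
                seen.add(inner)
                stack.append(inner)
    return seen


edges = [
    ("gag:WestGreece", "gag:OlympiaMuni"),  # stated triple (geo:sfContains)
    ("noa:hotspot", "noa:fire"),            # stated triple (geo:sfContains)
    ("gag:OlympiaMuni", "noa:hotspot"),     # ASSUMED: would be derived from the two polygons
]
fires = {"noa:fire"}  # instances of noa:Fire in the example

answer = sorted(contained_in("gag:WestGreece", edges) & fires)
print(answer)
```

Without the assumed third edge the chain is broken and the fire is not reachable from West Greece, which is why the geometric information matters for the answer.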

Page 12:

Challenges: Theory
②  Efficient computation of the entailment relation Φ ⊨ Θ

•  where Φ and Θ are quantifier-free first-order formulas of a constraint language expressing the topological relations of various frameworks (RCC-8, DE-9IM, etc.)

Page 13:

Challenges: Theory
③  Computing entailment is equivalent to checking consistency of formulas with constraint networks

•  Constraint networks:
   o  Spatial relations among regions
   o  Regions might be constant ones (exact geometric information) or identified by a URI

•  Most recent results considered basic and complete RCC-5 networks with polygonal regions

•  For RCC-8, deciding consistency is NP-complete
•  No published algorithm for checking consistency
•  Are there tractable cases?

Open issue
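The link between entailment and consistency stated on this slide can be written out as follows. This is the standard reduction, given here as a reading aid; the symbol \mathcal{B} for the set of base relations is introduced for illustration and is not the slides' notation. The second line assumes the base relations are jointly exhaustive and pairwise disjoint, as they are in RCC-5 and RCC-8.

```latex
\Phi \models \Theta
  \quad\Longleftrightarrow\quad
  \Phi \wedge \neg\Theta \ \text{is inconsistent}

\Phi \models r(a,b)
  \quad\Longleftrightarrow\quad
  \Phi \wedge r'(a,b) \ \text{is inconsistent for every } r' \in \mathcal{B} \setminus \{r\}
```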

Page 14:

Challenges: Practice
④  Scale to billions of triples

•  Reasoners from QSR (qualitative spatial reasoning) scale only up to hundreds of regions with complex spatial relations

How do they perform in our case?

•  Setting:
   o  Real linked geospatial datasets
   o  No constants
   o  Only base RCC-8 relations
   o  Evaluation of consistency checking using the well-known path-consistency algorithm
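To make the path-consistency step concrete, here is a minimal sketch in Python. It deliberately swaps RCC-8 for the much smaller point algebra (relations <, =, > between points), because its 3x3 composition table fits in a few lines; an RCC-8 version would need the 8x8 RCC-8 composition table instead. The function names are made up for this sketch, and the loop is the naive fixed-point variant (every sweep inspects all node triples), not an optimised implementation such as the reasoners evaluated in the talk.

```python
from itertools import product

BASE = frozenset("<=>")
INVERSE = {"<": ">", "=": "=", ">": "<"}
# Composition table of the point algebra: COMPOSE[r, s] lists the relations
# possible between x and z when x r y and y s z hold.
COMPOSE = {
    ("<", "<"): "<", ("<", "="): "<", ("<", ">"): "<=>",
    ("=", "<"): "<", ("=", "="): "=", ("=", ">"): ">",
    (">", "<"): "<=>", (">", "="): ">", (">", ">"): ">",
}


def compose(r, s):
    """Compose two sets of base relations."""
    out = set()
    for a, b in product(r, s):
        out.update(COMPOSE[a, b])
    return frozenset(out)


def path_consistency(n, constraints):
    """Refine an n-node network; return the matrix, or None if a relation becomes empty.

    constraints maps (i, j) to a string of allowed base relations, e.g. "<" or "<=".
    """
    net = [[BASE] * n for _ in range(n)]  # dense matrix: O(n^2) memory
    for i in range(n):
        net[i][i] = frozenset("=")
    for (i, j), rel in constraints.items():
        net[i][j] = net[i][j] & frozenset(rel)
        net[j][i] = net[j][i] & frozenset(INVERSE[a] for a in rel)
    changed = True
    while changed:
        changed = False
        for i, k, j in product(range(n), repeat=3):  # n^3 triples per sweep
            refined = net[i][j] & compose(net[i][k], net[k][j])
            if not refined:
                return None  # empty relation: the network is inconsistent
            if refined != net[i][j]:
                net[i][j] = refined
                changed = True
    return net


ok = path_consistency(3, {(0, 1): "<", (1, 2): "<"})
bad = path_consistency(3, {(0, 1): "<", (1, 2): "<", (2, 0): "<"})
print(sorted(ok[0][2]), bad)
```

In the first call the relation between nodes 0 and 2 is tightened from "anything" to "<"; in the second call the cycle 0 < 1 < 2 < 0 empties a relation and the function reports inconsistency.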

Page 15:

Experimental evaluation

[Figure: elapsed time in minutes (log scale, 0.01 to 1000) per dataset (gag, nuts, admingeo, gadm-geovocab). Legend: Timeout, PostgreSQL, Renz, PyRCC8, PPyRCC8. Annotations: "hundreds of regions" (once), "thousands of regions" (three times), "after one day".]

Setup: Intel Xeon E5620, 2.4 GHz, 12MB L3, 48GB RAM, RAID 5, Ubuntu 12.04

•  Computation of the complete constraint network

•  Running time: O(n³)

•  Memory requirements: O(n²)

n ≈ thousands to millions
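The O(n²) and O(n³) bounds can be turned into rough numbers for the datasets of this talk. The sketch below takes the region counts from the dataset characteristics table at the end of the deck and assumes a dense n×n matrix with one byte per cell (an assumption for illustration: the eight RCC-8 base relations fit in eight bits, but real implementations may store relations differently).

```python
# Region counts as read from the dataset characteristics table.
REGIONS = {
    "GAG": 412,
    "NUTS-RDF": 2236,
    "ADMGB": 11762,
    "GADM-RDF-EUROPE": 23037,
    "GADM-RDF": 276728,
}
BYTES_PER_CELL = 1  # ASSUMPTION: one byte per relation (8 base relations as a bitmask)


def dense_cells(n):
    """Number of cells in a dense n x n constraint matrix: the O(n^2) term."""
    return n * n


def node_triples(n):
    """Number of (i, k, j) triples inspected in one full sweep: the O(n^3) term."""
    return n ** 3


for name, n in REGIONS.items():
    gigabytes = dense_cells(n) * BYTES_PER_CELL / 1e9
    print(f"{name:16} n={n:>7}  matrix ~{gigabytes:8.3f} GB  triples per sweep ~{node_triples(n):.2e}")
```

Under these assumptions the dense matrix for the full GADM-RDF network (276 728 regions) has 76 578 385 984 cells, about 76.6 GB, which is more than the 48 GB of RAM listed in the setup.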

Page 16:

Network structure

•  We have started working on algorithms taking into account the structure of these networks:
   o  Node degrees fit a power-law distribution
   o  Network is sparse

[Figure: log-log plot of number of nodes (10^0 to 10^6) against degree (10^0 to 10^5), with a fitted power law with exponent 2.1.]
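The sparsity claim can be checked against the figures in the dataset characteristics table at the end of the deck. The sketch below assumes that each stated RCC-8 relation is one undirected edge between two distinct regions, which is an interpretation of the table and not something the slides state; the function names are made up for this sketch.

```python
# (regions, stated RCC-8 relations) as read from the dataset characteristics table.
STATED = {
    "ADMGB": (11762, 77907),
    "GAG": (412, 3023),
    "NUTS-RDF": (2236, 3176),
    "GADM-RDF": (276728, 590445),
    "GADM-RDF-EUROPE": (23037, 51309),
}


def density(n_regions, n_relations):
    """Fraction of all region pairs that carry a stated relation."""
    possible_pairs = n_regions * (n_regions - 1) // 2
    return n_relations / possible_pairs


def average_degree(n_regions, n_relations):
    """Mean number of stated relations per region (each edge counted at both ends)."""
    return 2 * n_relations / n_regions


for name, (n, m) in STATED.items():
    print(f"{name:16} density={density(n, m):.6f}  average degree={average_degree(n, m):.2f}")
```

For NUTS-RDF, for example, 3176 relations over 2236 regions cover about 0.13% of the 2 498 730 possible pairs, so almost every entry of the initial constraint matrix is unconstrained.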

Page 17:

Network structure (cont’d)

•  Edges of three kinds: externally connected, equals, non-tangential proper part

•  Reflect networks composed of components with hierarchical structure
   o  R-tree extensions (Papadias, Kalnis, Mamoulis, AAAI’99)

•  Parallel algorithms combined with backward-chaining techniques for lazy query processing
   o  Graph partitioning
   o  Path compression data structures and indexes
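The slides do not say which path compression structure is meant. One common reading is the union-find (disjoint-set) structure, whose `find` operation compresses paths; the sketch below uses it to merge regions linked by "equals" edges so that each group can be treated as a single node. This is an illustration under that assumption, not the authors' design, and the region identifiers are hypothetical.

```python
class DisjointSet:
    """Union-find with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative of x's group, flattening the path on the way."""
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # path compression: point every visited node at the root
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, a, b):
        """Merge the groups of a and b."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


# Hypothetical "equals" edges between region identifiers.
equals_edges = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:d", "ex:e")]

groups = DisjointSet()
for left, right in equals_edges:
    groups.union(left, right)

print(groups.find("ex:a") == groups.find("ex:c"))
print(groups.find("ex:a") == groups.find("ex:d"))
```

After the merges, ex:a, ex:b and ex:c share one representative while ex:d and ex:e form a separate group, so relations only need to be stored once per group.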

Page 18:

Related work: Spatial

•  Qualitative spatial reasoning
   -  Efficient algorithms for consistency checking of constraint networks (complex spatial relations, small number of regions)
   -  Does not consider query processing

•  Description logic reasoners
   -  PelletSpatial: RCC-8 reasoning (cannot handle disjunctions)
   -  RacerPro: RCC-8 reasoning

Page 19:

Related work: Temporal

•  Chaudhuri (VLDB’88)

•  The knowledge representation language Telos (TOIS’90)

•  Foundations of temporal constraint databases (Koubarakis, PhD thesis, ’94)

•  Qualitative temporal reasoning community (since 80s)

•  SQL+i system (BNCOD’96)

•  Later system (IEEE’97)

•  Hurtado and Vaisman (2006)

Page 20:

Conclusions

•  What’s the CHALLENGE?

Implementing an efficient query processing system for incomplete geospatial information in RDFi

•  The desired system should:
   o  reason about qualitative and quantitative spatial information that might be incomplete
   o  be scalable to billions of triples in the most useful cases

Page 21:

Thank you

Page 22:

Dataset characteristics

Dataset           #triples    #regions   #RCC-8 relations   #definite relations (after PC)   #indefinite relations (after PC)
ADMGB             149 046     11 762     77 907             46 777 728                       45 777 577
GAG               11 780      412        3023               4870                             82 231
NUTS-RDF          316 246     2236       3176               906 558                          2 045 451
GADM-RDF          9 896 532   276 728    590 445            X                                X
GADM-RDF-EUROPE   355 656     23 037     51 309             X                                X