WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic...

20
WWW 2017 Tutorial: Semantic Data Management in Practice Part 2: Storage and Querying Olaf Hartig Linköping University [email protected] @olafhartig Olivier Curé University of Paris-Est Marne la Vallée [email protected] @oliviercure

Transcript of WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic...

Page 1: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

WWW 2017 Tutorial:Semantic Data Management in Practice

Part 2: Storage and Querying

Olaf HartigLinköping University

[email protected]

@olafhartig

Olivier CuréUniversity of Paris-Est Marne la Vallée

[email protected]

@oliviercure

Page 2: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

2WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

Goals

● Achieve an initial understanding of the RDF database management ecosystem

● Understand differences between 7 identified production-ready stores

Page 3: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

3WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

Overview

● RDF storage● Seven production-ready RDF stores● Ontology Based Data Access● Demo● APIs

Page 4: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

4WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

RDF Storage

● Although most production-ready RDF stores support ACID properties, they are best considered as

– OLAP (online analytical processing) – not OLTP (On line transaction processing)

● This implies that updates are performed in batch– Mainly due to reasoning (see Section 5)

Page 5: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

5WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

RDF Storage

● RDF is a logical data model and thus does not impose any physical storage solution

● Existing RDF stores are either– based on an existing DataBase Management

System, ● relational model, e.g., PostgreSQL● NoSQL, e.g., Cassandra

– Designed from scratch, e.g., as a Graph store

Page 6: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

6WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

RDF Stores Taxonomy

Page 7: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

7WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

RDF Store Ecosystem

Page 8: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

8WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

RDF Distributed data management

● RDF storage is part of Big data● Distribution of RDF triples over a cluster of machines

Page 9: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

9WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

Overview

● RDF storage● Seven production-ready RDF stores● Ontology Based Data Access● Demo● APIs

Page 10: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

10WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

7 Production-Ready Systems

● They all guarantee– ACID transactions– Replication (mostly Master-Slave, some Master-

Master)– Partition (Range, Hashing)

Page 11: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

11WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

Data Models and Querying

● Some of these systems support other data models– XML for MarkLogic and Virtuoso– Property graph for GraphDB, BlazeGraph and

Stardog

http://njh.me/ssn#QBE01  geo­lat 48.83

geo­lon 2.21comment “Transm..”

observes

SensingDevice

DébitMètre

type

since 2012

Page 12: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

12WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

Data Models and Querying

● Some of these systems support other data models– XML for MarkLogic and Virtuoso– Property graph for GraphDB, BlazeGraph and

Stardog– Relational for Virtuoso and Oracle – Document for MarkLogic

● Hence other query languages than SPARQL (v1.1) can be supported– Gremlin for property graph, Xquery for XML, SQL

for relational, Prolog

Page 13: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

13WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

License

● Some of these systems have free editions but with some feature or use limitations:– MarkLogic’s dev license is free for up to 1TB and

10 months max– Stardog: community (10DB max with 25M

triples/DB, 4 users), dev (no limits but 30 day trial)– Allegrograph: free and dev have restrictions of 5M

and 50M respectively– Virtuoso and GraphDB: free but no clustering and

no replication– Blazegraph: free for a single machine

● All systems have commercial editions (Oracle is commercial only)

Page 14: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

14WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

Summary of production-ready systemsTriple store Full-text

searchCloud-ready

Extra features

Allegrograph Integrated + solr

AMI

Blazegraph Integrated + solr

AMI Reification done right

GraphDB Integrated + solr +

elacticsearch (ent.)

AMI RDF ranking

MarkLogic Integrated AMI With Xquery, Javascript

Oracle IntegratedInline in SQL

Stardog Integrated + Lucene

AMI Integrity constraints, Explanations

Virtuoso Integrated AMI Inline in SQL

Page 15: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

15WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

Overview

● RDF storage● Seven production-ready RDF stores● Ontology Based Data Access● Demo● APIs

Page 16: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

16WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

OBDA (Ontology Based Data Access) Alternative

● Relevant when you have an existing (relational) database and want to reason over it using an ontology

● The ontology models the domain, hides the structure of the data sources and enriches incomplete data

● The ontology is connected to the data sources via mappings that relate concepts and properties to SQL views over the sources

● Queries, expressed in SPARQL, are translated into the sources query language (usually SQL)

● State of the art is Ontop

Page 17: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

17WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

Overview

● RDF storage● Seven production-ready RDF stores● Ontology Based Data Access● Demo● APIs

Page 18: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

18WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

Demo

● With Blazegraph (v2.1.4)– Website: https://www.blazegraph.com/ – Download:

https://sourceforge.net/projects/bigdata/files/bigdata/2.1.4/blazegraph.jar/download

– Start: java -server -Xmx4g -jar blazegraph.jar– http://localhost:9999/blazegraph

● And an extract of our sensor database instantiating the Semantic Sensor Network ontology

Page 19: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

19WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

Overview

● RDF storage● Seven production-ready RDF stores● Ontology Based Data Access● Demo● APIs

Page 20: WWW 2017 Tutorial: Semantic Data Management in Practice ...€¦ · WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and Querying 12 Olaf Hartig and Olivier

20WWW 2017 Tutorial: Semantic Data Management in Practice Part 2 – Storage and QueryingOlaf Hartig and Olivier Curé

Available APIs

● Two popular Java APIs to process and handle RDF data and SPARQL queries are:– RDF4J (formerly Sesame)

– Apache Jena

● They both – provide a JDBC-like API and REST-like API– storing, querying and reasoning capabilities