A Path-based Relational RDF Database

A. Matono, T. Amagasa, M. Yoshikawa, S. Uemura. ADC 2005. SNU IDB Lab., Hyewon Lim. January 9th, 2009.


Page 1: A Path-based Relational RDF Database

A Path-based Relational RDF Database

A. Matono, T. Amagasa, M. Yoshikawa, S. Uemura
ADC 2005

SNU IDB Lab.
Hyewon Lim

January 9th, 2009

Page 2: A Path-based Relational RDF Database

Contents

Introduction
An Overview of RDF
Related Work and the Differences with Our Work
Path-based Approach for Storing RDF Data in Relational Databases
Performance Evaluation
Conclusions

Page 3: A Path-based Relational RDF Database

Introduction (1/8)

Quality and quantity of metadata
  The Semantic Web makes it possible to perform high-level processes
    Reasoning, deduction, semantic searches

Metadata
  Described by the Resource Description Framework (RDF)
  RDF describes data and their semantics

Page 4: A Path-based Relational RDF Database

Introduction (2/8)

The specification defines an RDF model and an RDF syntax

RDF model
  A statement describes a relationship between a pair of terms
  A set of statements represents metadata whose structure is a directed graph

Page 5: A Path-based Relational RDF Database

Introduction (3/8)

RDF is commonly used as a format to describe various types of metadata
  Typical usage: describing large-scale metadata
    WordNet (35MB), Gene Ontology (365MB), Open Directory Project (2GB)

To handle such data efficiently, RDF DBs that can manage massive RDF data are essential

Page 6: A Path-based Relational RDF Database

Introduction (4/8)

One naïve approach is to use XML DBs
  Any RDF data can be serialized as XML data

This approach is impractical
  The semantic structure of RDF data differs from the syntactic structure of its XML serialization
  The semantics cannot be stored in XML DBs

Page 7: A Path-based Relational RDF Database

Introduction (5/8)

Another way: use relational DBs or Berkeley DB
  Several RDF DBs have been proposed
  Such conventional RDF DBs can be classified into two groups
    1. The relational schema is designed based on the RDF schema
         Cannot handle RDF data that have no accompanying RDF schema
    2. RDF data are stored as triples

Page 8: A Path-based Relational RDF Database

Introduction (6/8)

Problems of processing large RDF data using conventional RDF databases
  Ability to handle RDF schema
    Queries that use RDF schema information are an important class of RDF queries
    The second group does not make any distinction between schema information and instance data
    The first group can process such queries
  Poor performance in processing path queries
    A join operation is needed for each path step

Page 9: A Path-based Relational RDF Database

Introduction (7/8)

We propose a path-based relational RDF DB
  The relational schema is designed to be independent of RDF schema information, and
  designed to make the distinction between schema information and instance data
    Can handle schemaless RDF data as well as RDF data with schema
  Extract all reachable path expressions for each resource, and store them
    To improve performance for path queries
    No need to perform join operations

Page 10: A Path-based Relational RDF Database

Introduction (8/8)

Steps
  Classify every statement into categories according to the type of predicate
  Construct subgraphs for each category
  Store the subgraphs in distinct relational tables
    Apply appropriate techniques for representing the semantics of each subgraph

Limitation: the structure of each subgraph is restricted to a DAG

Page 11: A Path-based Relational RDF Database

An Overview of RDF (1/4)

RDF
  A foundation for representing and manipulating metadata on Web resources
  Usable as long as the location of a Web resource is identifiable in terms of a URI
  Statements represent binary relationships between two distinct (or identical) resources
  RDF data are modeled as a directed graph
    Nodes and arcs represent resources and relationships
    Example: “This paper is authored by Akiyoshi MATONO.”

[Figure: www.matono.net/paper --authored--> “Akiyoshi MATONO”]

Page 12: A Path-based Relational RDF Database

An Overview of RDF (2/4)

RDF Schema
  A specification for defining schematic information of RDF data
  We can define:
    Classes (rdfs:Class) as types of resources
    Properties of a class (rdf:Property)
    Domains (rdfs:domain) and ranges (rdfs:range) of the properties
    Inheritance relationships (rdfs:subClassOf, rdfs:subPropertyOf) among classes or properties
    Types (rdf:type)

Page 13: A Path-based Relational RDF Database

An Overview of RDF (3/4)

Using RDF and RDF Schema, we can represent complex information

Page 14: A Path-based Relational RDF Database

An Overview of RDF (4/4)

Classifying RDF data
  Large size
    WordNet, ODP, and Gene Ontology
    Created mainly for systematic organization of data resources
    Do not contain cycles
    Simple structure
  Small size
    RSS, FOAF, and Dublin Core
    Used as metadata of images or Web pages

Page 15: A Path-based Relational RDF Database

Related Work and the Differences with Our Work (1/3)

Several RDF DBs have been proposed
  Most of them use relational DBs or Berkeley DB as their underlying data storage
  Approaches using RDB
    Flat approach: stores statements flatly in a single relational table
    Schema approach: creates relational tables for the classes and properties defined in the RDF schema information, storing resources according to their classes/properties
  Approaches using BDB
    Hash approach: creates three hash tables
      Keys: subjects, predicates, objects

Page 16: A Path-based Relational RDF Database

Related Work and the Differences with Our Work (2/3)

Problems of the conventional approaches
  Flat and hash approaches
    Difficult to perform schema queries
    They do not make any distinction between schema information and resource descriptions
  Schema approach
    Able to process queries about RDF schema
    Cannot handle RDF data without RDF schema information
      The relational schema is designed based on that information
    Costly to maintain under schema evolution
  The capabilities of the three approaches for processing path-based queries are not sufficient

Page 17: A Path-based Relational RDF Database

Related Work and the Differences with Our Work (3/3)

In conventional RDF databases, statement-based queries can be processed efficiently
  RDF data are decomposed into a large number of statements

When processing a path-based query
  A number of join operations are required, according to the number of steps in the path expression
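To make the join cost concrete, here is a minimal sketch of the flat approach using Python's built-in sqlite3 module. The table name `triple`, its columns, and the sample resources are illustrative assumptions, not taken from the slides or from any particular RDF store.

```python
import sqlite3

# Hypothetical flat triple table: one row per RDF statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("#picasso", "#paints", "#guernica"),
        ("#guernica", "#title", "Guernica"),
        ("#rodin", "#sculpts", "#thinker"),
        ("#thinker", "#title", "The Thinker"),
    ],
)

# A two-step path (#paints, then #title) already needs one self-join;
# every further step in the path would add another self-join.
rows = conn.execute(
    """
    SELECT t2.object
    FROM triple AS t1
    JOIN triple AS t2 ON t1.object = t2.subject
    WHERE t1.predicate = '#paints' AND t2.predicate = '#title'
    """
).fetchall()
titles = [row[0] for row in rows]
print(titles)
```

The number of self-joins grows with the length of the path expression, which is the cost the path-based storage scheme aims to avoid.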

Page 18: A Path-based Relational RDF Database

Path-based Approach for Storing RDF Data in Relational Databases

- Subgraph extraction from RDF graph (1/2)

When storing RDF data, the system
  parses the RDF data
  generates its own RDF graph
  decomposes the graph into five subgraphs according to the type of predicate
    Class Inheritance (CI) graphs – rdfs:subClassOf
    Property Inheritance (PI) graphs – rdfs:subPropertyOf
    Type (T) graphs – rdf:type
    Domain-Range (DR) graphs – rdfs:domain, rdfs:range
    Generic (G) graphs – all other predicates
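The classification step can be sketched as a lookup on the predicate URI. This is only an illustration under simple assumptions: statements are plain (subject, predicate, object) tuples, and the function name `split_into_subgraphs` is hypothetical.

```python
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Predicate URI -> subgraph category, following the five categories on the slide.
CATEGORY_BY_PREDICATE = {
    RDFS + "subClassOf": "CI",
    RDFS + "subPropertyOf": "PI",
    RDF + "type": "T",
    RDFS + "domain": "DR",
    RDFS + "range": "DR",
}


def split_into_subgraphs(triples):
    """Group (subject, predicate, object) triples by subgraph category.

    Any predicate not listed above falls into the generic graph "G".
    """
    subgraphs = defaultdict(list)
    for subject, predicate, obj in triples:
        category = CATEGORY_BY_PREDICATE.get(predicate, "G")
        subgraphs[category].append((subject, predicate, obj))
    return dict(subgraphs)


example = [
    (":Painter", RDFS + "subClassOf", ":Artist"),
    (":paints", RDFS + "domain", ":Painter"),
    (":picasso", RDF + "type", ":Painter"),
    (":picasso", ":paints", ":guernica"),
]
result = split_into_subgraphs(example)
print(sorted(result))
```

Each resulting group can then be stored in its own relational table, independent of whatever classes and properties the RDF schema defines.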

Page 19: A Path-based Relational RDF Database

Path-based Approach for Storing RDF Data in Relational Databases

- Subgraph extraction from RDF graph (2/2)

Advantages of dividing an RDF graph
  RDF data are stored in distinct relational tables
    The relational schema can be designed to be independent of RDF schema information
  The structures of the resulting subgraphs are less complex than the original RDF graph
    Opportunities to apply several techniques for representing each subgraph by considering each graph structure

Page 20: A Path-based Relational RDF Database

Path-based Approach for Storing RDF Data in Relational Databases

- Path expressions (1/3)

Most queries on RDF data are
  Queries to detect subgraphs matching a given graph
  Queries to detect a set of nodes that can be reached via given path expressions

These queries are represented as path expressions
  Storage based on path expressions
    Decreases the number of join operations

Page 21: A Path-based Relational RDF Database

Path-based Approach for Storing RDF Data in Relational Databases

- Path expressions (2/3)

Path-based storage is applied not to the entire RDF graph but only to graph G, to which path-based queries are frequently posed
  Graphs CI and PI should be stored by a scheme that can detect ancestor-descendant relationships

Queries for RDF data use path expressions consisting of arcs
  Arc paths are stored in a relational table

Page 22: A Path-based Relational RDF Database

Path-based Approach for Storing RDF Data in Relational Databases

- Path expressions (3/3)

Arc path
  Given a DAG g with node set V(g) and arc set E(g)
    An arc path is a finite sequence of arcs (v_0, v_1), (v_1, v_2), …, (v_{k-2}, v_{k-1}), (v_{k-1}, v_k)
    The path expression of the arc path is l(v_0, v_1), l(v_1, v_2), …, l(v_{k-2}, v_{k-1}), l(v_{k-1}, v_k)
    Here l(v_m, v_n) denotes the label of the arc (v_m, v_n)

Absolute arc path
  An arc path whose source node is a root
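The definition above can be illustrated by enumerating every absolute arc path in a small DAG. This is a sketch under assumptions: arcs are (source, label, target) tuples, the function names are hypothetical, and the `<`-separated, last-arc-first string format in `to_pathexp` is inferred from the '#title<#paints' literal in the query-processing example rather than stated in the slides.

```python
def absolute_arc_paths(arcs):
    """Map each node of a DAG to the label sequences of all arc paths
    that start at a root (a node with no incoming arc) and end at that node.

    arcs: iterable of (source, label, target) tuples; the graph must be acyclic.
    """
    children = {}
    nodes, targets = set(), set()
    for source, label, target in arcs:
        children.setdefault(source, []).append((label, target))
        nodes.update((source, target))
        targets.add(target)

    paths = {node: set() for node in nodes}

    def walk(node, labels):
        # Extend the current label sequence along every outgoing arc.
        for label, child in children.get(node, []):
            extended = labels + (label,)
            paths[child].add(extended)
            walk(child, extended)

    for root in nodes - targets:
        walk(root, ())
    return paths


def to_pathexp(labels):
    """Render a label sequence last-arc-first, joined by '<' (assumed format)."""
    return "<".join(reversed(labels))


arcs = [
    ("#picasso", "#paints", "#guernica"),
    ("#museum", "#exhibits", "#guernica"),
    ("#guernica", "#title", "Guernica"),
]
paths = absolute_arc_paths(arcs)
print(sorted(to_pathexp(p) for p in paths["Guernica"]))
# prints ['#title<#exhibits', '#title<#paints']
```

Because a node in a DAG can be reached from several roots, a single resource may carry several absolute path expressions.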

Page 23: A Path-based Relational RDF Database

Path-based Approach for Storing RDF Data in Relational Databases

- Extended interval numbering scheme for DAGs (1/2)

Interval numbering scheme
  Detects ancestor-descendant relationships between two nodes in a tree
  We use it to detect inheritance relationships between classes or properties
  We extend the scheme to apply it to DAGs

Page 24: A Path-based Relational RDF Database

Path-based Approach for Storing RDF Data in Relational Databases

- Extended interval numbering scheme for DAGs (2/2)

The relationship between two nodes can be verified by subsumption
  v is an ancestor of u iff pre(v) < pre(u) ∧ post(u) < post(v)
  v is a parent of u if, in addition, depth(u) - depth(v) = 1

Examples, reading each node label as (pre, post, depth)
  v = (2, 5, 1), u = (4, 1, 3): v is an ancestor of u
  v = (5, 4, 2), u = (6, 3, 3): v is a parent of u
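The subsumption test can be tried out on a small class hierarchy. The sketch below covers only the plain tree case: it assigns (pre, post, depth) labels by one depth-first traversal. How the scheme is extended to DAGs is not shown in the slides, so that part is left out, and the class names and function names are illustrative assumptions.

```python
def number_tree(children, root):
    """Assign (pre, post, depth) to every node of a tree.

    children: dict mapping a node to the list of its child nodes.
    pre is the preorder rank, post the postorder rank, depth starts at 1.
    """
    labels = {}
    counters = {"pre": 0, "post": 0}

    def visit(node, depth):
        counters["pre"] += 1
        pre = counters["pre"]
        for child in children.get(node, []):
            visit(child, depth + 1)
        counters["post"] += 1
        labels[node] = (pre, counters["post"], depth)

    visit(root, 1)
    return labels


def is_ancestor(v, u):
    """v and u are (pre, post, depth) labels; True if v's interval subsumes u's."""
    return v[0] < u[0] and u[1] < v[1]


def is_parent(v, u):
    """A parent is an ancestor exactly one level above."""
    return is_ancestor(v, u) and u[2] - v[2] == 1


hierarchy = {
    "Resource": ["Artist", "Artifact"],
    "Artist": ["Painter", "Sculptor"],
    "Artifact": ["Painting"],
}
labels = number_tree(hierarchy, "Resource")
print(labels["Artist"], labels["Painter"])
# prints (2, 3, 2) (3, 1, 3)
```

With these labels, an ancestor or parent test is a pair of integer comparisons, so no graph traversal is needed at query time.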

Page 25: A Path-based Relational RDF Database


Path-based Approach for Storing RDF Data in Relational Databases

- Proposed relational schema (1/2)

Designed relational schema for storing RDF data based on the subgraphs

Page 26: A Path-based Relational RDF Database


Path-based Approach for Storing RDF Data in Relational Databases

- Proposed relational schema (2/2)

Storage example of the RDF data

Page 27: A Path-based Relational RDF Database

Path-based Approach for Storing RDF Data in Relational Databases

- Query Processing

Examples
  Find the title of something painted by someone

    -- One lookup on the stored path expression; no join per path step
    SELECT r.resourceName
    FROM path AS p, resource AS r
    WHERE p.pathID = r.pathID
      AND p.pathexp = '#title<#paints'

  Find the names of the classes that are direct subclasses of http://www.w3.org/2000/01/rdf-schema#Resource

    -- c is the named class; c1 ranges over its descendants exactly one level below
    SELECT c1.className
    FROM class AS c, class AS c1
    WHERE c.pre < c1.pre AND c.post > c1.post
      AND c.depth = c1.depth - 1
      AND c.className = 'http://www.w3.org/2000/01/rdf-schema#Resource'
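The first query above can be run against a toy database with Python's sqlite3 module. The table and column names are taken from the query itself, but the full schema is not visible in this transcript, so the column types and the sample rows below are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal assumed schema: one row per distinct path expression,
# and one row per (resource, path expression that reaches it).
conn.executescript(
    """
    CREATE TABLE path (pathID INTEGER PRIMARY KEY, pathexp TEXT);
    CREATE TABLE resource (resourceName TEXT, pathID INTEGER);
    """
)
conn.executemany(
    "INSERT INTO path VALUES (?, ?)",
    [(1, "#paints"), (2, "#title<#paints"), (3, "#title<#sculpts")],
)
conn.executemany(
    "INSERT INTO resource VALUES (?, ?)",
    [("#guernica", 1), ("Guernica", 2), ("The Thinker", 3)],
)

# The whole two-step path is matched as one string, so the query needs
# a single path/resource join regardless of how long the path is.
rows = conn.execute(
    """
    SELECT r.resourceName
    FROM path AS p, resource AS r
    WHERE p.pathID = r.pathID
      AND p.pathexp = '#title<#paints'
    """
).fetchall()
painted_titles = [row[0] for row in rows]
print(painted_titles)
```

Compared with a flat triple table, the work of following the path has been moved to load time, when the path expressions are extracted and stored.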

Page 28: A Path-based Relational RDF Database

Performance Evaluation

We compared the processing time of our approach with that of Jena2
  Jena2: based on the flat approach

The performance of schema-based queries could not be evaluated
  No RDF data with schema information on the Web is large enough to be used in our experiments

Environment
  Athlon 1.4 GHz CPU, 1GB memory, Gentoo Linux 1.4, PostgreSQL 7.4.3

Page 29: A Path-based Relational RDF Database

Performance Evaluation

- Schema-based Queries (1/3)

Basic schema queries
  Find immediate children (or parents) of a given class (or property)
  Find inheritance relationships between two given classes (or properties)
  Find classes that are the domain (or range) of a given property

Querying the meta-schema
  Find all resources, that is, instances of “rdfs:Resource”
  Find all literals

Page 30: A Path-based Relational RDF Database

Performance Evaluation

- Schema-based Queries (2/3)

Querying type information
  Find the set of instances of a given class
  Find the set of statements using a given property

When the above queries are processed, there are two cases:
  The answer is obtained by a single access to data storage, or by multiple accesses

Page 31: A Path-based Relational RDF Database

Performance Evaluation

- Schema-based Queries (3/3)

The ability of each approach for schema-based queries
  Our approach is efficient because of the interval numbering scheme
  In meta-schema queries, if the RDF graph includes many multiple paths, the redundancy is increased

Page 32: A Path-based Relational RDF Database

Performance Evaluation

- Path-based Queries (1/2)

Datasets
  Sufficient size to see scalability
  The G graph of the data does not contain any cycles
  The G graph of the data contains long absolute path expressions
  We use the Gene Ontology

Page 33: A Path-based Relational RDF Database


Performance Evaluation

- Path-based Queries (2/2)

Experiment results

Page 34: A Path-based Relational RDF Database

Conclusions

We can handle schemaless RDF data
We can process schema-based queries using the interval numbering scheme
For path-based queries
  Achieved high performance
  To reduce the number of join operations, we stored RDF data based on path expressions

Future work
  Investigate query-processing techniques
    Query language, query transformation, and query optimization for RDF data