A Path-based Relational RDF Database


A. Matono, T. Amagasa, M. Yoshikawa, S. Uemura, ADC 2005

SNU IDB Lab., Hyewon Lim

January 9th, 2009


Contents

Introduction
An Overview of RDF
Related Work and the Differences with Our Work
Path-based Approach for Storing RDF Data in Relational Databases
Performance Evaluation
Conclusions


Introduction (1/8)

Quality and quantity of metadata
The Semantic Web makes it possible to perform high-level processes such as reasoning, deduction, and semantic search.

Metadata
Described by the Resource Description Framework (RDF)
RDF describes data and their semantics.


Introduction (2/8)

The specification defines an RDF model and an RDF syntax.

RDF model
Statements describe a relationship between a pair of terms.
A set of statements represents metadata whose structure is a directed graph.
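The following minimal Python sketch makes the model concrete. It treats each statement as a (subject, predicate, object) triple and builds the labeled directed graph that the triples form. The `ex:` names and the `to_graph` helper are hypothetical illustrations and do not come from the paper.

```python
from collections import defaultdict

# Hypothetical statements: each one is a (subject, predicate, object) triple.
STATEMENTS = [
    ("ex:paper1", "ex:author", "ex:matono"),
    ("ex:matono", "ex:name", "Akiyoshi MATONO"),
    ("ex:paper1", "ex:title", "A Path-based Relational RDF Database"),
]


def to_graph(triples):
    """Build a directed graph from statements.

    Every subject and object becomes a node. Every statement becomes an
    arc from its subject to its object, labeled with its predicate.
    Returns a dict mapping each subject to its list of (label, object) arcs.
    """
    arcs = defaultdict(list)
    for subject, predicate, obj in triples:
        arcs[subject].append((predicate, obj))
    return dict(arcs)


if __name__ == "__main__":
    for node, out_arcs in to_graph(STATEMENTS).items():
        for label, target in out_arcs:
            print(f"{node} --{label}--> {target}")
```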


Introduction (3/8)

RDF is commonly used as a format to describe various types of metadata.
Typical usage: describing large-scale metadata
WordNet (35 MB), Gene Ontology (365 MB), Open Directory Project (2 GB)

To handle such data efficiently, RDF DBs that can manage massive RDF data are essential.


Introduction (4/8)

One naïve approach is to use XML DBs, since any RDF data can be serialized as XML data.

This approach is impractical. The semantic structure of RDF data differs from the syntactic structure of its XML serialization, so the semantics cannot be stored in XML DBs.


Introduction (5/8)

Another way is to utilize relational DBs or Berkeley DB.
Several RDF DBs have been proposed. These conventional RDF DBs can be classified into two groups:
1. The database schema is designed based on the RDF schema.
Such DBs cannot handle RDF data that lack an accompanying RDF schema.
2. The RDF DB stores RDF data in terms of triples.


Introduction (6/8)

Problems of processing large RDF data using conventional RDF databases
Ability to handle RDF schema
RDF queries that use RDF schema information are an important class of RDF queries.
The second group does not make any distinction between schema information and instance data.
The first group can process such queries.

Poor performance in processing path queries
A join operation must be performed for each path step.


Introduction (7/8)

We propose a path-based relational RDF DB.
The relational schema is designed to be independent of RDF schema information, and to distinguish schema information from instance data.
It can handle schemaless RDF data as well as RDF data with a schema.
All reachable path expressions are extracted for each resource and stored, to improve performance for path queries.
Path queries therefore do not need to perform a join operation per path step.


Introduction (8/8)

Steps
1. Classify every statement into categories according to the type of its predicate.
2. Construct a subgraph for each category.
3. Store the subgraphs in distinct relational tables.
4. Apply appropriate techniques for representing the semantics of each subgraph.

Limitation: the structure of each subgraph is restricted to a DAG.
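Each subgraph is restricted to a DAG, so a loader needs some way to reject cyclic input. The slides do not say how this is checked. The sketch below assumes a subgraph is given as (source, target) pairs. It uses Kahn's topological-sort algorithm, a standard technique chosen here only for illustration.

```python
from collections import defaultdict, deque


def is_dag(arcs):
    """Return True if the directed graph given by (source, target) arcs has no cycle.

    Kahn's algorithm: repeatedly remove nodes that have no remaining
    incoming arcs. The graph is acyclic iff every node gets removed.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in arcs:
        successors[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return removed == len(nodes)


if __name__ == "__main__":
    print(is_dag([("a", "b"), ("b", "c"), ("a", "c")]))  # acyclic
    print(is_dag([("a", "b"), ("b", "a")]))              # contains a cycle
```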


An Overview of RDF (1/4)

RDF
A foundation for representing and manipulating metadata on Web resources
Usable as long as the location of a Web resource is identifiable in terms of a URI
Statements represent binary relationships between two distinct (or identical) resources.
RDF data are modeled as a directed graph whose nodes and arcs represent resources and relationships, respectively.
Example: “This paper is authored by Akiyoshi MATONO.”

Figure: www.matono.net/paper --authored--> “Akiyoshi MATONO”



RDF Schema
A specification for defining schema information for RDF data
We can define:
Classes (rdfs:Class) as types of resources
Properties of a class (rdf:Property)
Domains (rdfs:domain) and ranges (rdfs:range) of properties
Inheritance relationships (rdfs:subClassOf, rdfs:subPropertyOf) among classes or properties
Types (rdf:type)



Using RDF and RDF Schema, we can represent complex information.



Classifying RDF data
Large size
WordNet, ODP, and Gene Ontology
Created mainly for the systematic organization of data resources
Do not contain cycles; simple structure

Small size
RSS, FOAF, and Dublin Core
Used as metadata of images or Web pages


Related Work and the Differences with Our Work (1/3)

Several RDF DBs have been proposed, most of which use relational DBs or Berkeley DB as their underlying data storage.

Approaches using RDB
Flat approach: stores statements in a single relational table.
Schema approach: creates relational tables for the classes and properties defined in the RDF schema information, and stores resources according to their classes/properties.

Approaches using BDB
Hash approach: creates three hash tables, keyed by subjects, predicates, and objects, respectively.
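Here is a rough sketch of the BDB-style layout, using Python dicts as stand-ins for the three hash tables. This is not the Berkeley DB API. Each statement is stored once under each key, so a lookup by subject, predicate, or object is a single hash access. Following a path still takes one lookup per step.

```python
from collections import defaultdict


def build_hash_indexes(triples):
    """Index each statement under its subject, its predicate, and its object.

    Python dicts stand in for the three Berkeley DB hash tables (an
    illustrative assumption, not the storage engine itself).
    """
    by_subject = defaultdict(list)
    by_predicate = defaultdict(list)
    by_object = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((s, p, o))
        by_predicate[p].append((s, p, o))
        by_object[o].append((s, p, o))
    return by_subject, by_predicate, by_object


if __name__ == "__main__":
    # Hypothetical sample statements.
    data = [
        ("ex:rembrandt", "ex:paints", "ex:nightwatch"),
        ("ex:nightwatch", "ex:title", "The Night Watch"),
    ]
    subj, pred, obj = build_hash_indexes(data)
    # Each statement-level lookup is a single hash access.
    print(subj["ex:nightwatch"], pred["ex:paints"], obj["The Night Watch"])
```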



Problems of the conventional approaches
Flat and hash approaches
Schema queries are difficult to perform, because these approaches make no distinction between schema information and resource descriptions.

Schema approach
Able to process queries about RDF schema
Cannot handle RDF data without RDF schema information, because the relational schema is designed based on it
Costly to maintain under schema evolution

The capabilities of all three approaches for processing path-based queries are insufficient.



In conventional RDF databases, statement-based queries can be processed efficiently.
RDF data are decomposed into a large number of statements.
When processing a path-based query, a number of join operations are required, depending on the number of steps in the path expression.
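The following minimal sketch of the flat approach shows why the join count grows. It uses Python's built-in sqlite3 module. The table name `triple`, its columns, and the sample data are hypothetical. The query "find the title of something painted by someone" needs one copy of the statement table per path step, linked by object = subject join conditions.

```python
import sqlite3

# Hypothetical sample data for a single "flat" statement table.
SAMPLE = [
    ("ex:rembrandt", "ex:paints", "ex:nightwatch"),
    ("ex:nightwatch", "ex:title", "The Night Watch"),
]


def build_triple_store(triples):
    """Create an in-memory table holding one row per statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", triples)
    return conn


def path_query_sql(num_steps):
    """SQL that follows num_steps predicates in order over the triple table.

    Every step needs its own copy of the table, so a path of k steps
    joins k copies (k - 1 conditions linking object to subject).
    """
    aliases = [f"t{i}" for i in range(num_steps)]
    conditions = [f"{a}.predicate = ?" for a in aliases]
    conditions += [
        f"{aliases[i - 1]}.object = {aliases[i]}.subject"
        for i in range(1, num_steps)
    ]
    tables = ", ".join(f"triple AS {a}" for a in aliases)
    return (f"SELECT {aliases[-1]}.object FROM {tables} "
            f"WHERE {' AND '.join(conditions)}")


if __name__ == "__main__":
    conn = build_triple_store(SAMPLE)
    steps = ["ex:paints", "ex:title"]
    print(conn.execute(path_query_sql(len(steps)), steps).fetchall())
```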


Path-based Approach for Storing RDF Data in Relational Databases

- Subgraph extraction from RDF graph (1/2)

When storing RDF data, the system parses the data, generates its own RDF graph, and decomposes the graph into five subgraphs according to the type of predicate:
Class Inheritance (CI) graphs – rdfs:subClassOf
Property Inheritance (PI) graphs – rdfs:subPropertyOf
Type (T) graphs – rdf:type
Domain-Range (DR) graphs – rdfs:domain, rdfs:range
Generic (G) graphs – all other predicates
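The decomposition step can be sketched as a lookup from predicate URI to subgraph. The sketch uses the standard RDF and RDFS namespace URIs. The function name and the in-memory lists are hypothetical and stand in for the system's actual relational tables.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Predicate -> subgraph, following the five categories described in the text.
CATEGORY = {
    RDFS + "subClassOf": "CI",
    RDFS + "subPropertyOf": "PI",
    RDF + "type": "T",
    RDFS + "domain": "DR",
    RDFS + "range": "DR",
}


def split_into_subgraphs(triples):
    """Distribute (subject, predicate, object) statements over five subgraphs.

    Any predicate not listed in CATEGORY goes to the generic graph G.
    """
    subgraphs = {name: [] for name in ("CI", "PI", "T", "DR", "G")}
    for s, p, o in triples:
        subgraphs[CATEGORY.get(p, "G")].append((s, p, o))
    return subgraphs


if __name__ == "__main__":
    # Hypothetical statements, one for most categories.
    sample = [
        ("ex:Painter", RDFS + "subClassOf", RDFS + "Resource"),
        ("ex:rembrandt", RDF + "type", "ex:Painter"),
        ("ex:paints", RDFS + "domain", "ex:Painter"),
        ("ex:rembrandt", "ex:paints", "ex:nightwatch"),
    ]
    for name, stmts in split_into_subgraphs(sample).items():
        print(name, len(stmts))
```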



- Subgraph extraction from RDF graph (2/2)

Advantages of dividing an RDF graph
RDF data can be stored in distinct relational tables.
The relational schema can be designed to be independent of RDF schema information.
The resulting subgraphs are structurally less complex than the original RDF graph. This creates opportunities to apply a suitable representation technique to each subgraph based on its graph structure.



- Path expressions (1/3)

Most queries on RDF data are either
queries to detect subgraphs matching a given graph, or
queries to detect a set of nodes that can be reached via given path expressions.
These queries are represented as path expressions.

Storage based on path expressions
Decreases the number of join operations




Path expressions are stored not for the entire RDF graph, but only for graph G, to which path-based queries are frequently posed.
Graphs CI and PI should be stored by a scheme that can detect ancestor-descendant relationships.

Queries on RDF data use path expressions consisting of arcs, so arc paths are stored in a relational table.




Arc path
Given a DAG $g$ with node set $V(g)$ and arc set $E(g)$, an arc path is a finite sequence of arcs $(v_0, v_1), (v_1, v_2), \ldots, (v_{k-2}, v_{k-1}), (v_{k-1}, v_k)$.
The path expression of the arc path is the sequence of arc labels $l(v_0, v_1), l(v_1, v_2), \ldots, l(v_{k-2}, v_{k-1}), l(v_{k-1}, v_k)$.

Absolute arc path: an arc path whose source node $v_0$ is a root

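The definitions translate into a short traversal. The sketch below lists the label sequence of every absolute arc path in a DAG whose arcs are given as (source, label, target). It represents path expressions as Python tuples. The string encoding used in the paper's path table (for example '#title<#paints' in the query examples) is not reproduced, and all names are hypothetical.

```python
from collections import defaultdict


def absolute_path_expressions(arcs):
    """List the path expression of every absolute arc path in a DAG.

    arcs: iterable of (source, label, target). A root is a node with no
    incoming arc. Each result is (labels, end_node), where labels is the
    tuple l(v0, v1), ..., l(v_{k-1}, v_k) of an arc path starting at a root.
    The graph must be acyclic, otherwise the traversal would not terminate.
    """
    out = defaultdict(list)
    nodes, targets = set(), set()
    for src, label, dst in arcs:
        out[src].append((label, dst))
        nodes.update((src, dst))
        targets.add(dst)

    results = []

    def walk(node, labels):
        for label, nxt in out[node]:
            path = labels + (label,)
            results.append((path, nxt))
            walk(nxt, path)

    for root in sorted(nodes - targets):
        walk(root, ())
    return results


if __name__ == "__main__":
    g = [("ex:rembrandt", "ex:paints", "ex:nightwatch"),
         ("ex:nightwatch", "ex:title", "The Night Watch")]
    for labels, end in absolute_path_expressions(g):
        print(labels, "->", end)
```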



- Extended interval numbering scheme for DAGs (1/2)

Interval numbering scheme
Detects ancestor-descendant relationships between two nodes in a tree
We use it to detect inheritance relationships between classes or properties.
We extend the scheme so that it can be applied to DAGs.



- Extended interval numbering scheme for DAGs (2/2)

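The relationship between two nodes can be verified by interval subsumption, with each node labeled (pre, post, depth):
v is an ancestor of u iff pre(v) < pre(u) ∧ post(u) < post(v)
v is a parent of u iff v is an ancestor of u ∧ depth(u) - depth(v) = 1

Figure examples:
v = (2, 5, 1), u = (4, 1, 3): v is an ancestor of u, but not its parent
v = (5, 4, 2), u = (6, 3, 3): v is the parent of u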
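Here is a sketch of the labeling and the two tests. The slides do not detail how the scheme is extended to DAGs. This sketch assumes one straightforward option: the DAG is unfolded during a depth-first traversal, so a node reachable through several parents receives one (pre, post, depth) label per path. Two nodes are then related if some pair of their labels satisfies the test. Function names are hypothetical.

```python
from collections import defaultdict


def interval_labels(children, root):
    """Assign (pre, post, depth) labels by a depth-first traversal from root.

    children maps each node to its list of child nodes. In a DAG, a node
    reachable through several parents is visited once per path, so it
    receives one label per path (the unfolding assumed in this sketch).
    """
    labels = defaultdict(list)
    counter = {"pre": 0, "post": 0}

    def visit(node, depth):
        counter["pre"] += 1
        pre = counter["pre"]
        for child in children.get(node, []):
            visit(child, depth + 1)
        counter["post"] += 1
        labels[node].append((pre, counter["post"], depth))

    visit(root, 0)
    return dict(labels)


def is_ancestor(v, u):
    """True iff label v = (pre, post, depth) subsumes label u."""
    return v[0] < u[0] and u[1] < v[1]


def is_parent(v, u):
    """True iff v is an ancestor of u exactly one level above it."""
    return is_ancestor(v, u) and u[2] - v[2] == 1


if __name__ == "__main__":
    # The two example pairs from the slide.
    print(is_ancestor((2, 5, 1), (4, 1, 3)), is_parent((2, 5, 1), (4, 1, 3)))
    print(is_parent((5, 4, 2), (6, 3, 3)))
    # A small DAG in which D has two parents, B and C.
    print(interval_labels({"A": ["B", "C"], "B": ["D"], "C": ["D"]}, "A"))
```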



- Proposed relational schema (1/2)

The relational schema designed for storing RDF data, based on the subgraphs



- Proposed relational schema (2/2)

Storage example of the RDF data



- Query Processing

Examples
Find the title of something painted by someone:

SELECT r.resourceName
FROM path AS p, resource AS r
WHERE p.pathID = r.pathID AND p.pathexp = '#title<#paints'

Find the names of the classes whose direct superclass is http://www.w3.org/2000/01/rdf-schema#Resource:

SELECT c1.className
FROM class AS c, class AS c1
WHERE c.pre < c1.pre AND c.post > c1.post
  AND c.depth = c1.depth - 1
  AND c.className = 'http://www.w3.org/2000/01/rdf-schema#Resource'
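The two queries can be tried end to end with Python's sqlite3 module. The table and column names (path, resource, class, pathID, pathexp, resourceName, className, pre, post, depth) come from the SQL examples. The column types, the sample rows, and the ORDER BY clause are hypothetical additions, so this is only a toy stand-in for the paper's schema. The sample class rows number a small tree rooted at rdfs:Resource, so the second query returns the classes exactly one level below it.

```python
import sqlite3

RDFS_RESOURCE = "http://www.w3.org/2000/01/rdf-schema#Resource"


def build_example_db():
    """Create toy path/resource/class tables with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE path (pathID INTEGER PRIMARY KEY, pathexp TEXT);
        CREATE TABLE resource (resourceName TEXT, pathID INTEGER);
        CREATE TABLE class (className TEXT, pre INTEGER, post INTEGER, depth INTEGER);
    """)
    conn.execute("INSERT INTO path VALUES (1, '#title<#paints')")
    conn.execute("INSERT INTO resource VALUES ('The Night Watch', 1)")
    # (pre, post, depth) for the tree Resource -> {Painter -> Cubist, Painting}.
    conn.executemany("INSERT INTO class VALUES (?, ?, ?, ?)", [
        (RDFS_RESOURCE, 1, 4, 0),
        ("ex:Painter", 2, 2, 1),
        ("ex:Cubist", 3, 1, 2),
        ("ex:Painting", 4, 3, 1),
    ])
    return conn


# One join between path and resource, regardless of path length.
PATH_QUERY = """
    SELECT r.resourceName
    FROM path AS p, resource AS r
    WHERE p.pathID = r.pathID AND p.pathexp = ?
"""

# Interval subsumption plus a depth check selects the immediate children.
DIRECT_SUBCLASS_QUERY = """
    SELECT c1.className
    FROM class AS c, class AS c1
    WHERE c.pre < c1.pre AND c.post > c1.post
      AND c.depth = c1.depth - 1 AND c.className = ?
    ORDER BY c1.className
"""

if __name__ == "__main__":
    conn = build_example_db()
    print(conn.execute(PATH_QUERY, ("#title<#paints",)).fetchall())
    print(conn.execute(DIRECT_SUBCLASS_QUERY, (RDFS_RESOURCE,)).fetchall())
```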


Performance Evaluation

We compared the processing time of our approach with that of Jena2, which is based on the flat approach.

We could not evaluate the performance of schema-based queries, because no RDF data with schema information large enough for our experiments exist on the Web.

Environment: Athlon 1.4 GHz CPU, 1 GB memory, Gentoo Linux 1.4, PostgreSQL 7.4.3



- Schema-based Queries (1/3)

Basic schema queries
Find the immediate children (or parents) of a given class (or property)
Find inheritance relationships between two given classes (or properties)
Find the classes that are the domain (or range) of a given property

Querying the meta-schema
Find all resources, that is, instances of “rdfs:Resource”
Find all literals




Querying type information
Find the set of instances of a given class
Find the set of statements using a given property

When the above queries are processed, there are two cases: the answer is obtained either by a single access to the data storage, or by multiple accesses.




The ability of each approach to process schema-based queries

Our approach is efficient because of the interval numbering scheme.

In meta-schema queries, if the RDF graph includes many multiple paths, the redundancy increases.
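To get a feel for how the redundancy can grow, assume that the extended scheme gives a node one interval label per distinct path from the root. The slide itself only states that redundancy increases. Under that assumption, the label count can grow exponentially. The hypothetical `diamond_chain` hierarchy below stacks k diamond-shaped inheritance patterns, and the bottom node then needs 2^k labels.

```python
from functools import lru_cache


def diamond_chain(k):
    """Hypothetical hierarchy: k diamonds stacked under the root n0.

    Returns a parents map: node -> tuple of its parent nodes.
    """
    parents = {}
    for i in range(1, k + 1):
        prev, left, right, node = f"n{i - 1}", f"a{i}", f"b{i}", f"n{i}"
        parents[left] = (prev,)
        parents[right] = (prev,)
        parents[node] = (left, right)
    return parents


def label_count(parents, root, node):
    """Number of distinct root-to-node paths, i.e. the number of interval
    labels the node receives if the DAG is unfolded into a tree."""
    @lru_cache(maxsize=None)
    def count(n):
        if n == root:
            return 1
        return sum(count(p) for p in parents.get(n, ()))
    return count(node)


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(k, label_count(diamond_chain(k), "n0", f"n{k}"))
```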



- Path-based Queries (1/2)

Dataset requirements
Sufficient size to assess scalability
The G graph of the data does not contain any cycles.
The G graph of the data contains long absolute path expressions.
We use the Gene Ontology.



- Path-based Queries (2/2)

Experimental results


Conclusions

We can handle schemaless RDF data.
We can process schema-based queries using the interval numbering scheme.
For path-based queries, we achieved high performance: to reduce the number of join operations, we stored RDF data based on path expressions.

Future work
Investigate query-processing techniques: a query language, query transformation, and query optimization for RDF data
