Post on 18-Dec-2015

On Triple Dissemination, Forward-Chaining and Load Balancing in DHT Based RDF Stores

Dominic Battre, Felix Heine, Andre Höing, and Odej Kao

Presented by Aldarwich Yaser

Albert-Ludwigs-University Freiburg, SS 2009

Department of Computer Science, Computer Networks and Telematics

Prof. Christian Schindelhauer

Overview

• Motivation
• Introduction
  • RDF
  • DHT
  • Pastry
• Triples dissemination
• Reasoning
• Load balancing
• References


Motivation

Shortcomings of centralized databases
• Unable to handle high load
• Limited capacity, as in Sesame and Jena

Decentralized databases
• Examples: BabelPeers, RDFPeers and Edutella
• Provide scalability, efficiency and capacity

Reasoning
• Infer new data from existing information

Load balancing

RDF Introduction

• Resource Description Framework (RDF)
• Used for representing information on the Web
• RDFS (RDF Schema) provides a powerful model for storing and inferencing knowledge
• In RDF everything is represented by triples of the form (S, P, O)

Example: Germany (S) hasCapital (P) Berlin (O)


DHT Introduction

• Solves the item location problem in a distributed network of nodes
• Uses a key k to calculate the ID: ID = hash(k)
• Operations:
  • Put(k, x)
  • Get(k)
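The Put/Get interface above can be illustrated with a minimal single-process sketch. The `ToyDHT` class, the use of SHA-1, the 32-bit identifier space and the rule that a key belongs to the first node whose ID is greater than or equal to the hashed key (wrapping around) are assumptions made for illustration; they are not taken from the slides.

```python
import hashlib
from bisect import bisect_left


def hash_id(key: str, bits: int = 32) -> int:
    """ID = hash(k): map a key onto a fixed-size identifier space."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class ToyDHT:
    """Single-process stand-in for a DHT (hypothetical, for illustration)."""

    def __init__(self, node_names):
        # Nodes get their IDs from the same hash function as the keys.
        self.ring = sorted((hash_id(name), name) for name in node_names)
        self.storage = {name: {} for name in node_names}

    def responsible_node(self, key: str) -> str:
        # Assumed rule: first node ID >= hash(key), wrapping around the ring.
        ids = [node_id for node_id, _ in self.ring]
        index = bisect_left(ids, hash_id(key)) % len(self.ring)
        return self.ring[index][1]

    def put(self, key: str, value) -> None:
        node = self.responsible_node(key)
        self.storage[node].setdefault(key, []).append(value)

    def get(self, key: str):
        node = self.responsible_node(key)
        return self.storage[node].get(key, [])


dht = ToyDHT(["node1", "node2", "node3", "node4"])
dht.put("Berlin", "capital of Germany")
print(dht.get("Berlin"))
```

Because both Put and Get recompute the responsible node from the key alone, no central index is needed to find an item.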


Triple dissemination

Triple T = (s, p, o)
• identifier = hash(s), stored at the node responsible for s
• identifier = hash(p), stored at the node responsible for p
• identifier = hash(o), stored at the node responsible for o

Query q = (s, p, o)
• identifier = hash(p)

Source: http://videolectures.net/iswc08_kaoudi_rdfs/
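A small sketch of this indexing scheme: each triple is stored three times, once under the hash of each component, and a query is sent to the node responsible for one known component (here the predicate). The node names, the SHA-1 hash and the modulo placement rule are assumptions for illustration only; a real DHT assigns keys by closeness in the identifier space.

```python
import hashlib

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names


def responsible_node(term: str) -> str:
    """Assumed placement rule: hash the term, then take it modulo the node count."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest, "big") % len(NODES)]


# One triple database per node, kept in a dict for this sketch.
stores = {name: set() for name in NODES}


def disseminate(triple):
    """Store the triple at the nodes responsible for s, p and o."""
    targets = {responsible_node(term) for term in triple}
    for node in targets:
        stores[node].add(triple)
    return targets


def query_by_predicate(predicate: str):
    """Answer a query by contacting only the node responsible for the predicate."""
    node = responsible_node(predicate)
    return {t for t in stores[node] if t[1] == predicate}


triple = ("Germany", "hasCapital", "Berlin")
disseminate(triple)
print(query_by_predicate("hasCapital"))
```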


Pastry Protocol

Each peer has a 128-bit ID: nodeID
• Unique and uniformly distributed
• Computed by applying a cryptographic hash function to the IP address

A message takes O(log N) steps to reach its destination

Node state contains:
• Leaf set
• Routing table
• Neighborhood set

Pastry (prefix-matching)

Figure: prefix-matching example for Route(m, 323310), with the labels "Node-id" and "Key" and the IDs 323310, 323211, 322021, 313221 and 103231.
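The idea behind prefix matching can be checked with the IDs from the slide. The sketch below assumes that 323310 is the key and that the other four IDs are nodes visited in the order 103231, 313221, 322021, 323211; the original figure is not preserved, so this ordering is an assumption. It only computes shared prefix lengths and does not implement Pastry's routing table or leaf set.

```python
def shared_prefix_length(a: str, b: str) -> int:
    """Number of leading digits that two IDs have in common."""
    length = 0
    for digit_a, digit_b in zip(a, b):
        if digit_a != digit_b:
            break
        length += 1
    return length


key = "323310"
hops = ["103231", "313221", "322021", "323211"]  # IDs from the slide, assumed order

prefix_lengths = [shared_prefix_length(node_id, key) for node_id in hops]
for node_id, length in zip(hops, prefix_lengths):
    print(node_id, "shares", length, "leading digits with", key)
```

Under this assumed order, every hop shares one more leading digit with the key than the previous one, which is what bounds the number of routing steps.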

RDF Reasoning

• The query is formulated generally
• RDFS extracts data even if the description does not exactly match the query

Example:

Christian fatherOf Schindelhauer
fatherOf subPropertyOf relativeOf

=> Christian relativeOf Schindelhauer

RDFS Rules

Rule name   Precondition                                             Generated triple
rdfs2       (a, rdfs:domain, x), (u, a, v)                           (u, rdf:type, x)
rdfs3       (a, rdfs:range, x), (u, a, v)                            (v, rdf:type, x)
rdfs5       (u, rdfs:subPropertyOf, v), (v, rdfs:subPropertyOf, x)   (u, rdfs:subPropertyOf, x)
rdfs9       (u, rdfs:subClassOf, x), (v, rdf:type, u)                (v, rdf:type, x)
rdfs11      (u, rdfs:subClassOf, v), (v, rdfs:subClassOf, x)         (u, rdfs:subClassOf, x)
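Forward chaining applies these rules repeatedly until no new triple is generated. The following is a minimal centralized sketch of the five rules listed here; it is not the distributed algorithm of the paper, and the sample triples (`hasCapital`, `Country`, `Place`) are made up for illustration.

```python
RDF_TYPE = "rdf:type"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
SUB_PROPERTY = "rdfs:subPropertyOf"
SUB_CLASS = "rdfs:subClassOf"


def apply_rules(triples):
    """Return the triples generated by one round of rdfs2, 3, 5, 9 and 11."""
    generated = set()
    for (s1, p1, o1) in triples:
        for (s2, p2, o2) in triples:
            if p1 == DOMAIN and p2 == s1:  # rdfs2
                generated.add((s2, RDF_TYPE, o1))
            if p1 == RANGE and p2 == s1:  # rdfs3
                generated.add((o2, RDF_TYPE, o1))
            if p1 == SUB_PROPERTY and p2 == SUB_PROPERTY and o1 == s2:  # rdfs5
                generated.add((s1, SUB_PROPERTY, o2))
            if p1 == SUB_CLASS and p2 == RDF_TYPE and o2 == s1:  # rdfs9
                generated.add((s2, RDF_TYPE, o1))
            if p1 == SUB_CLASS and p2 == SUB_CLASS and o1 == s2:  # rdfs11
                generated.add((s1, SUB_CLASS, o2))
    return generated


def forward_chain(triples):
    """Repeat rule application until a fixed point is reached."""
    closure = set(triples)
    while True:
        new = apply_rules(closure) - closure
        if not new:
            return closure
        closure |= new


facts = {
    ("hasCapital", DOMAIN, "Country"),   # hypothetical schema triple
    ("Country", SUB_CLASS, "Place"),     # hypothetical schema triple
    ("Germany", "hasCapital", "Berlin"),
}
closure = forward_chain(facts)
print(sorted(closure - facts))
```

In this example rdfs2 first derives that Germany is a Country, and a second round of rdfs9 then derives that Germany is a Place, which is why the rules have to be iterated.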


Node Architecture

Each node hosts multiple RDF databases:
• Local triples database
• Received triples database
• Replica database
• Generated triples database

Triple dissemination in DHT

Figure: Node1, Node2, Node3 and Node4, each with its Generated Triples, Local Triples, Received Triples and Replica databases.

Triples life-cycle

• Triples are subject to different events, such as node joining and departure
• Triple lifetime
  • Long-lifetime triples need few refreshes
  • Short-lifetime triples (generated triples)
• Updating triples also updates the inferred triples
• Soft state
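Soft state means that a triple disappears unless it is refreshed before its lifetime ends. The sketch below shows this with an explicit clock value passed in by the caller; the `SoftStateStore` class and the lifetimes of 100 and 10 time units are hypothetical and chosen only to contrast a long-lived triple with a short-lived generated triple.

```python
class SoftStateStore:
    """Triple store in which entries expire unless refreshed (illustrative sketch)."""

    def __init__(self):
        self._expires_at = {}

    def put(self, triple, now, lifetime):
        # Inserting an existing triple again acts as a refresh.
        self._expires_at[triple] = now + lifetime

    def live_triples(self, now):
        # Drop everything whose lifetime has run out, then report the rest.
        self._expires_at = {
            triple: expiry
            for triple, expiry in self._expires_at.items()
            if expiry > now
        }
        return set(self._expires_at)


store = SoftStateStore()
local = ("Germany", "hasCapital", "Berlin")
generated = ("Germany", "rdf:type", "Country")

store.put(local, now=0, lifetime=100)      # long lifetime, few refreshes
store.put(generated, now=0, lifetime=10)   # short lifetime (generated triple)

alive_at_5 = store.live_triples(now=5)
store.put(generated, now=8, lifetime=10)   # refresh before expiry
alive_at_15 = store.live_triples(now=15)
alive_at_30 = store.live_triples(now=30)   # no further refresh: generated triple is gone
print(alive_at_5, alive_at_15, alive_at_30)
```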

Node Departure

• Node substitution
• Correction of routing table
• Replica duty
• Decreasing number of replicas

Figure: nodes n1, n2, n3, n4 and n9.

Node Arrival

• More complicated
• Query receiving
• Task of replica nodes
• Time reduction

Figure: nodes n1, n2, n3, n4, n6 and n9.

Load balancing

• Major criticism of DHT-based RDF stores
• Many collisions are unavoidable
• Example:
  • The DHT stores many triples with predicate rdf:type
  • rdfs:subClassOf creates many more triples with predicate rdf:type
• Overlay tree
  • Built for discrete DHT positions, such as the one that stores the triples with rdf:type

Figure: Load balancing with remote triples database. Labels: Node1, Node2, Node3, Node4; Local Triples, Received Triples, Generated Triples, Remote Triples; references.

Replicated overlay tree

Figure: overlay tree with Root, Rank1 and Rank2.

Query routing in overlay tree

Figure: overlay tree (Root, Rank1, Rank2) with the labels Query and Result.

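Handling RDFS rules in load balancing

Problem with RDFS rules
• When a node is overloaded, its triples are split across other nodes
• Example: the precondition triples (a, rdfs:domain, x) and (u, a, v) can end up on different nodes, so the rule can no longer be applied locally

Figure: Node1, Node2 and Node3 holding the triples (a, rdfs:domain, x) and (u, a, v) after the split.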

Handling RDFS rules in load balancing

Solution
• Copy the most common RDFS schema triples to each node in the overlay tree

Figure: Node1, Node2, Node3 and Node4; (a, rdfs:domain, x) is present on every node alongside the (u, a, v) triples.
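The effect of copying the schema triple can be shown with rule rdfs2 applied per node. This is a sketch under assumptions: the three node names, the instance triples (u1, a, v1) and so on, and the helper `rdfs2_local` are hypothetical, and the overlay tree itself is not modelled.

```python
RDF_TYPE = "rdf:type"
DOMAIN = "rdfs:domain"


def rdfs2_local(node_triples):
    """Apply rdfs2 using only the triples stored on a single node."""
    domains = [(s, o) for (s, p, o) in node_triples if p == DOMAIN]
    return {
        (u, RDF_TYPE, x)
        for (a, x) in domains
        for (u, p, v) in node_triples
        if p == a
    }


schema = ("a", DOMAIN, "x")

# After splitting an overloaded node, only node1 still holds the schema triple.
split = {
    "node1": {schema, ("u1", "a", "v1")},
    "node2": {("u2", "a", "v2")},
    "node3": {("u3", "a", "v3")},
}
without_copy = set().union(*(rdfs2_local(t) for t in split.values()))

# Assumed fix from the slide: copy the schema triple to every node.
replicated = {name: triples | {schema} for name, triples in split.items()}
with_copy = set().union(*(rdfs2_local(t) for t in replicated.values()))

print(len(without_copy), len(with_copy))
```

Without the copy only node1 can fire the rule; with the schema triple on every node, each node derives the rdf:type triple for its own share of the data.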

Conclusion

• P2P-based distributed databases offer better scalability and source integration
• The real power of RDF stems from the possibility of deriving new data from explicit knowledge
• The overlay tree is the solution to the overloading problem

References

• http://www.videolectures.net
• http://cone.informatik.uni-freiburg.de
• http://www.w3schools.com
• http://www.w3.org/TR/rdf-schema/
• http://peersim.sourceforge.net/
• http://infolab.stanford.edu
• http://www.edutella.org/edutella.shtml
• Battre, Heine, Kao: Top k RDF query evaluation in p2p


Thanks for your Attention