1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with...

35
1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué, Gia-Hien Nguyen, Laurent Simon

Transcript of 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with...

Page 1: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

1

Building scalable semantic PDMS: the SomeWhere approach.

Marie-Christine Rousset

Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué, Gia-Hien Nguyen, Laurent Simon

Page 2: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

2

How to make semantic approaches scalable to the web ?

A data centered vision of the Semantic Web – viewed as a huge semantic and distributed data

management system

SomeWhere – a peer to peer infrastructure– based on simple personalized ontologies and

mappings distributed at large scale

Focus of this talk

Page 3: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

3

P2P Data Management Systems

Logical network of peers (≠ physical network)– each peer is characterized by

its physical address (IP) a description of the stored resources its neighbors in the network

– the peers to which it can transmit messages (queries,...)

Various topologies – random and dynamic (Gnutella)– fixed (Chord, Hypercube)– guided by the semantics

SON, Edutella, Piazza, DRAGO, coDB, Somewhere

Page 4: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

4

SomeWhere logical networks The topology is not fixed Guided by mappings

– A peer joins by declaring mappings

between its ontology and the ontologies of some peers that it knows

leaves by removing the mappings with its acquaintances in the network

Page 5: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

5

SomeWhere in a nutshell

Simple data model based on a propositional language of classes – for defining ontologies, mappings, and queries– a sublanguage of OWL DL (W3C)

Scales up to one thousand peers – logical network : « small world »

Page 6: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

6

SomeWhere Data Model

A0

A1 A2

A3 A4 A5

St_A3St_A5

Data Data

Ontology: hierarchy of intentional classes

Schema+Data

Storage description: extensional classes

More complex inclusion statement: St_A1 A1 ¬A2

Page 7: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

7

SomeWhere Data Model

Mappings:

Q1 Q2

Q

A1 (A2 A3) B1 ¬B3

A1 A2

A3

A

B

B1 B2

B3

Queries:

Logical combination of class literals: A1 ¬A3

A1 B3 B3 A1

Page 8: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

8

Semantics

Standard FO logical semantics– one single domain of interpretation– a distributed set of formulas interpreted in the same way as if

they were not distributed– in contrast with some other approaches

coDB: epistemic logic DRAGO: distributed semantics of DDL or DFOL based on a

collection of domains of interpretations Our assumption:

– the objects have a unique URI– objects stored at different peers and having the same URI are

interpreted as being the same

Page 9: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

9

Data model: example

Musique

Rock Pop Classique

Français US St_pop Tchaikovsky

St_Français St_US St_Tchaikovsky

Radio

Mouv Nostalgie

St_Mouv St_Nostalgie

Mouv Rock

Music

Pop_Rock Classical

St_Pop_Rock

St_Ru Tchai

St_Tchai

Ru It

Rock Pop Pop_Rock

Tchai Tchaikovsky

Tchaikovsky Tchai

P1P2

Page 10: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

10

Query answering : illustration

Musique

Rock Pop Classique

Français US St_pop Tchaikovsky

St_Français St_US St_Tchaikovsky

Music

Pop_Rock Classical

St_Pop_Rock

St_Ru Tchai

St_Tchai

Radio

Mouv Nostalgie

St_Mouv St_Nostalgie

Rock Pop Pop_Rock

Tchai Tchaikovsky

Mouv Rock

Pop_Rock ?

Ru It

Pop? Rock ?

Mouv ?

St_Pop_Rock

St_Francais St_Pop

Rewritings

St_USSt_Français

St_Mouv

St_US St_Pop

St_Mouv St_Pop

St_Pop

Tchaikovsky Tchai

St_Pop

St_Français

St_Mouv

St_Pop St_USSt_Pop

P1P2

Page 11: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

11

Query answering in SomeWhere Decomposition of queries/recombination of answers

– only atomic queries are transmitted to peers a complex query is splitted into atomic queries

– each solicited peer processes a given atomic query q and incrementally sends back intentional answers for it

(conjunction of) extensional classes that are rewritings of q– intentional answers of different atomic queries resulting from the split

of a complex query must be recombined intentional answers can combine extensional classes of different peers

Can be reduced to a consequence finding problem in distributed clausal propositional theories

– Ontologies and mappings are encoded as clauses– The maximal conjunctive rewritings of a conjunctive query Q correspond

exactly to the negation of the clauses that are proper prime implicates of the negation of Q w.r.t the union of the local theories and the mappings

Page 12: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

12

Query answering algorithm

Message based local algorithm running on each peer– query, answer, and termination messages

Global properties – soundness– completeness– termination (even for cyclic networks)

Page 13: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

13

extension

Flash demo

of the

SomeWhere

http://www.lri.fr/~adjiman/somewhere/

Page 14: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

Comedy Humor

Humor Comedy

Action Suspense Thriller

Animation Cartoons

MyBookmarks

Movies

Comedy Thriller Cartoons Adult

P3

DramaComedy

Drama

Friends Humor

MyBookmarks

Genre

Drama Fantastic Gore

P5

MyBookmarks

IMDB

Drama Comedy

TVShows

Friends Alias

P6

MyBookmarks

DVD

Humor Romance

P4

BenStiller Comedy

MyBookmarks

DVD

Animation Action Suspense

P1

MyBookmarks

Actors

BruceWillis BenStiller JuliaRoberts

P2

BruceWillis Action

Drama Thriller

Page 15: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

BruceWillis Action

BenStiller Comedy Drama Thriller

DramaComedy

Drama

Friends Humor

Comedy Humor

Humor Comedy

Action Suspense Thriller

Animation Cartoons

P3

P1

P2 P5

P6

P4

Classes extensions

Page 16: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

MyBookmarks

Movies

Comedy Thriller Cartoons Adult

MyBookmarks

Genre

Drama Fantastic Gore

MyBookmarks

DVD

Humor Romance

MyBookmarks

IMDB

Drama Comedy

TVShows

Friends Alias

Drama Thriller

DramaComedy

Drama

Friends Humor

Comedy Humor

Humor Comedy

Action Suspense Thriller

Animation Cartoons

P3

P5

P6

P4

MyBookmarks

DVD

Animation Action Suspense

MyBookmarks

Actors

BruceWillis BenStiller JuliaRoberts

P2

BruceWillis Action

BenStiller Comedy

P1

Q1: Thriller ?

Q2: NOT Adult ?

Q3: Thriller AND Comedy ?

Page 17: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

MyBookmarks

Movies

Comedy Thriller Cartoons Adult

MyBookmarks

DVD

Animation Action Suspense

MyBookmarks

Actors

BruceWillis BenStiller JuliaRoberts

MyBookmarks

Genre

Drama Fantastic Gore

MyBookmarks

DVD

Humor Romance

MyBookmarks

IMDB

Drama Comedy

TVShows

Friends Alias

BruceWillis Action

BenStiller Comedy Drama Thriller

DramaComedy

Drama

Friends Humor

Comedy Humor

Humor Comedy

Action Suspense Thriller

Animation Cartoons

P3

P1

P2 P5

P6

P4

Thriller ?

P3:ThrillerP1:Action P1:SuspenseP5:DramaP6:DramaComedyP2:BruceWillis P1:Suspense

Rewritings:

Page 18: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

P3:Thriller

P1:Action P1:Suspence

P5:Drama

P6:DramaComedy

P2:BruceWillis P1:Suspense

Rewritings of Thriller: evaluation

P3

P1

P5

P6

P5

Local

Navigational

Navigational

Navigational

Integration

Page 19: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

19

SomeWhere infrastructure

1 machine

N peers

N machines

K peers per machine

N machines

1 peer per machine

Page 20: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

20

SomeWhere infrastructure

Zoom on one machine

100 % JAVA 1.5

somewhere.jar ~ 250 Ko

Page 21: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

21

Scalability experiments [IJCAI 05]

on randomly generated networks– 1000 peers deployed on a cluster of 75 machines– small world topology

Close to the topology of the web

peers– ontologies

random clauses of length 2

– mappings random clauses of length 2 or 3

Page 22: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

22

P = 0.01

Varying topologies

1000 peers Ring, 10 neighbours/peer

P = 0.1Small world

P = 1

Random graph

P: probability of redirecting an edge

Model of Watts and Strogatz

Page 23: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

23

Scalability results

Varying parameters Number of mappings between peers complexity of mappings

– ratio of clauses of length 3 (0%, 20%, 100%)

timeout : 30 s/query

Depth of query processing Small depth (less than 7) even on the hard cases

Time to produce a number of answers In 90% cases, the first answer is produced within 2 seconds Easy cases (simple mappings):

– few answers per query (5 on average)– very fast (less than 0.1s) to compute all the answers without timeouts

Hard cases (complex and more mappings per edge)– around 1000 answers per query (but > 30% queries not complete : timeouts)– quite fast to obtain them (less than 20s)

Page 24: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

24

Ongoing work (1)

Extending the data model to RDF(S)– W3C recommendation for describing web resources– Classes and (binary) relations between objects– each object is identified by a URI

http://www.louvre.fr "Le Louvre" MuseumName

http://www.paris.fr

Located

" Paris" CityName

Triple notation: <resource, property, value>Relational notation: property(resource, value)

Page 25: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

25

RDFS

ArtistNameLocated

MadeBy

Contains

City CityName

Literal

Museum Work ArtistMuseumNameLiteral

Is-a

ArcheologyMuseum

Is-a

ModernMuseum

Literal

CulturalPlace

Is-a

WorkName

Literal

Page 26: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

26

SomeRDFS: data model

a simple fragment of RDFS

distributed through simple mappings

(using the same constructors)

Page 27: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

27

Query rewriting

Propositionalisation of RDFS statements

Query rewriting using SomeWhere

C1dom C2

dom C1range C2

range

P1rel P2

rel

Prel Cdom Prel Crange

Page 28: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

28

illustration

Q(X,Y): P2.Work(X)P2.refersTo(X,Y)

P2.Workdom P2.Workrange

P2.Paintingdom …

SomeWhere rewriting

SomeWhere rewriting

P1.Paintsrel …

SomeWhere rewriting

P1.belongsTorel …

P2.Painting(X) P1.Paints(Z,X) P1.belongsTo(X,Y)

R1(X,Y): P2.Painting(X)P1.belongsTo(X,Y)

P2.refersTorel

Page 29: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

29

illustration

Q(X,Y): P2.Work(X)P2.refersTo(X,Y)

P2.Workdom P2.Workrange

P2.Paintingdom …

SomeWhere rewriting

SomeWhere rewriting

P1.Paintsrel …

SomeWhere rewriting

P1.belongsTorel …

P2.Painting(X) P1.Paints(Z,X) P1.belongsTo(X,Y)

P2.refersTorel

R2(X,Y): P1.Paints(Z,X)P1.belongsTo(X,Y)

Page 30: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

30

Ongoing work (2)

Handling inconsistencies– how to define them ?

insatisfiability (no model) => inconsistency not a necessary condition

– how to check consistency? at each join of a new peer

– how to deal with inconsistency? correct it or reason with it ?

A’

A

B’

B

B’ A’

A B

for each A, there exists a model in which A is non empty: S | A

Page 31: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

31

illustration

path m1:

AIPubli is a subclass of Conf.

inconsistencies are caused by mappings.

Article

Theory Expe

P2

path m0 -> m2: AIPublic is a subclass of Journal.

Conf and Journal are disjoint, therefore AIPUbli is necessarily empty

Publi

Conf Journal

P3

>--<

2005

AIPubli BDPubli

P1

AIPubli Theory

m0

Theory Journal

m2

2005 Conf

m1

Page 32: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

32

P2P detecting of inconsistencies

Propagation of m1: { ¬AIPubli v Conf;¬AIPubli v Publi;¬AIPubli v ¬Journal; ¬BDPubli v Conf; ¬BDPubli v Publi; ¬BDPubli v ¬Journal }.

No production of unit clause No inconsistency

Propagation of m2: { ¬Theory v Journal; ¬AIPubli v Journal;…..; ¬AIPubli ;…; ¬AIPubli v ¬Conf}. Production of a unit clause Inconsistency{m1,m2} is a NoGood stored at P3

¬Conf v Publi¬Journal v Publi¬Journal v ¬Conf

¬AIPubli v 2005¬BDPubli v 2005

¬Theory v Article¬Expe v Article

¬AIPubli v Theory

¬2005 v Conf

m1

¬Theory v Journal

m2

Page 33: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

33

Distributed storage of the NoGoods

{ M*2

M*1

M*n

… }{ }

{ }

Page 34: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

34

Principle:– avoid the inconsistencies when constructing answers

Semantics of « well-founded » answer:– obtained from a consistent subset of formulas

Algorithm:– for each answer,

build its set of mapping supports and return the set of NoGoods encountered during the reasoning,

discard the mapping supports including a NoGood

– return the answers having a not empty set of mapping supports

P2P well-founded reasoning

It will be presented in more details at ECAI 06

Page 35: 1 Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué,

35

Perspectives

Coupling SomeWhere to a DHT for optimizing lookup queries

Adapting the SomeWhere algorithm to support the epistemic semantics

Modeling and handling trust in P2P Semantic overlay networks

– based on a logical approach

P2P discovery and composition of smart devices– based on a semantic description of the functionality, inputs and

outputs of devices