Distributed data management with the rule-based language...

52
Distributed data management with the rule-based language: Webdamlog Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL Webdam Inria ENS-Cachan Université de Paris Sud December 5th, 2013 Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 1 / 43

Transcript of Distributed data management with the rule-based language...

Page 1: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Distributed data management with therule-based language: Webdamlog

Ph.D. defense

Émilien ANTOINE

Supervisor: Serge ABITEBOUL

Webdam Inria ENS-Cachan Université de Paris Sud

December 5th, 2013

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 1 / 43

Page 2: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

1 Context and problematic

2 Webdamlog a language for distributed knowledge

3 System implementation of a Webdamlog engine

4 Proof of concept application and feasibility of Webdamlog

5 Conclusion

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 2 / 43

Page 3: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Focus of this thesis

Allow the Web users to manage their personal datain place

Two main aspects:personal data are distributedWeb users want to automate tasks and they are notprogrammers

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 3 / 43

Page 4: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Problem overview

We are interested in all kinds of data of a Web userpersonal data photos, movies, music, emailssocial data annotations, recommendations, social linkslocalization bookmarks, phone numbers

privacy logins/passwords, ssh keys

Data of users are heterogeneous by nature

Data are distributed on severaldevices laptops, smartphones, Internet boxes, cloud-storage, . . .

systems Facebook, Picasa, Gmail, . . .

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 4 / 43

Page 5: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Typical distribution of knowledge

Internet

Alice

Bob

CharlieP2P

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 5 / 43

Page 6: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Example of distributed task on personal dataAlice has

a blog on Wordpress.com to publish movie reviewsa Facebook and Gmail account to talk to friendsa Dropbox account to share files

she wishes to advertise friends about new reviews and share themovieCumbersome tasks for humans

remember each login/passwordsign on each websiteuse each GUI

Solutionshope that a system, e.g. Facebook, provides an appropriatewrappertry to specify it with a system such as ifttt.com “if this then that”if everything fails, write a script (only for hackers)

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 6 / 43

Page 7: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Example of distributed task on personal dataAlice has

a blog on Wordpress.com to publish movie reviewsa Facebook and Gmail account to talk to friendsa Dropbox account to share files

she wishes to advertise friends about new reviews and share themovieCumbersome tasks for humans

remember each login/passwordsign on each websiteuse each GUI

Solutionshope that a system, e.g. Facebook, provides an appropriatewrappertry to specify it with a system such as ifttt.com “if this then that”if everything fails, write a script (only for hackers)

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 6 / 43

Page 8: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

The Web as a distributed knowledge base

Give to Alice a system thatuse the Web service to manage her knowledgeallow her to specify taskshide the network and protocol issues

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 7 / 43

Page 9: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Principles underlying the approach

Knowledge and computation are distributed on several peersThese peers are autonomous (P2P)They are willing to collaborate (delegation)Knowledge from heterogeneous systems is integrated using“wrappers” in a mediation style

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 8 / 43

Page 10: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

1 Context and problematic

2 Webdamlog a language for distributed knowledgeDatalogWebdamlog

3 System implementation of a Webdamlog engine

4 Proof of concept application and feasibility of Webdamlog

5 Conclusion

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 9 / 43

Page 11: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Representing knowledge in datalog

Facts in relations:

Friend Picture Locationuser friend withAlice BobAlice CharlieBob Charlie

id tagpic1 Alicepic2 Bobpic3 Charlie

pic id locationpic1 Grenoblepic2 Nantespic3 Rennes

Rules:Where($someone,$loc)︸ ︷︷ ︸

head

:- Picture($pic, $someone)︸ ︷︷ ︸atom

, Location($pic, $loc)︸ ︷︷ ︸atom︸ ︷︷ ︸

body

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 10 / 43

Page 12: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Representing knowledge in datalog

In datalog relations are either:extensional: Friend,Picture,Location

I a list of facts stored in databaseI only in the body of datalog rules

intensional: WhereI a list of facts defined by rules ie. a viewI appear at least once in the head of a rule

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 11 / 43

Page 13: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Datalog supports recursion

Recursion needed in graphs e.g. network path, social link

FoF($x,$y) :- Friend($x,$y)

FoF($x,$y) :- Friend($x,$z), FoF($z,$y)

Already exists in SQL but ugly

1 WITH RECURSIVE FoF(From, To) AS

2 (

3 SELECT From, To, FROM Friend

4 UNION

5 SELECT Friend.From, FoF.End

6 FROM Friend, FoF

7 WHERE Friend.To = FoF.From);

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 12 / 43

Page 14: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Some datalog extensions we use

update extensional relations in the head of rules:Friend($y,$x) :- Friend($x,$y)

negation in the body of rulesPicture($x,Charlie) :- Picture($x,Alice), ¬Picture($x,Bob)

distributed relations are distributed over the networkPicture@Bob($x,Bob) :- Picture@Alice($x,Bob)

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 13 / 43

Page 15: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Webdamlog

Schema(π,E , I, σ) where

π is a possibly infinite set of peer namesE is a set of extensional relations of the form m@p for p ∈ π

I is a set of intensional relations of the form m@p for p ∈ π

σ(m@p) typing function, arity and sorts of m@p fields

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 14 / 43

Page 16: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Webdamlog

Factsm@p(a1, ...,an), where

m is a relation namep is a peer name in πn = σ(m@p) and a1, . . . ,an are data valuesdata values includes the relations and peer names

ExtensionalFriend@Alice(Alice,Bob)

Friend@Alice(Alice,Charlie)

Friend@Alice(Bob,Charlie)

Picture@Alice(pic1,Alice)

. . .

IntensionalWhere@Alice(Alice,Grenoble)

Where@Alice(Bob,Nantes)

Where@Alice(Charlie,Rennes)

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 15 / 43

Page 17: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

WebdamlogRules$Rn+1@$Pn+1($Un+1) :- (¬)$R1@$P1($U1), . . . , (¬)$Rn@$Pn($Un)

$Ri are relation terms possibly variables$Pi are peer terms possibly variables$U i are tuples of termsread body from left to right

Safety condition$Rn+1 $Pn+1 must appear positively bound in the body$Pi must be previously boundeach variables must appear in positive atom before being used

Rules reside at peers:[at Alice]

Picture@Alice($x,$y) :- Friend@Alice(Alice, $friend ),

Picture@ $friend ($x,$y)

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 16 / 43

Page 18: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

WebdamlogRules$Rn+1@$Pn+1($Un+1) :- (¬)$R1@$P1($U1), . . . , (¬)$Rn@$Pn($Un)

$Ri are relation terms possibly variables$Pi are peer terms possibly variables$U i are tuples of termsread body from left to right

Safety condition$Rn+1 $Pn+1 must appear positively bound in the body$Pi must be previously boundeach variables must appear in positive atom before being used

Rules reside at peers:[at Alice]

Picture@Alice($x,$y) :- Friend@Alice(Alice, $friend ),

Picture@ $friend ($x,$y)

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 16 / 43

Page 19: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Particularity of Webdamlog rules

[at p]r0@p0(xo) :- r1@p1(x1), r2@p2(x2), ...,rn@pn(xn)

Semantic depending on 3 criteriap1, . . . ,pn = p, the rule is called localhead r0@p0(xo) is intensional or extensionalhead r0@p0(xo) is local or not (p0 = p or not)

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 17 / 43

Page 20: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Local rules with local intensional head

Extensional: Friend@Alice: (Alice,Bob),(Alice,Charlie),(Bob,Charlie)Intensional: FoF@Alice

[at Alice]FoF@Alice($x,$y):-Friend@Alice($x,$y)

FoF@Alice($x,$y):- Friend@Alice($x,$z), FoF@Alice($z,$y)

FoF will contain the transitive closure of Friend:(Alice,Bob), (Alice,Charlie), (Bob,Charlie), (Alice,Charlie)

IntuitionThis is standard datalog evaluation

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 18 / 43

Page 21: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Local rules with local extensional head

Extensional: Friend@AliceStep 0: (Alice,Bob),(Alice,Charlie),(Bob,Charlie)

[at Alice]Friend@Alice($y,$x):-Friend@Alice($x,$y)

Step 1: (Bob,Alice),(Charlie,Alice),(Charlie,Bob)Step 2: (Alice,Bob),(Alice,Charlie),(Bob,Charlie)

IntuitionDatabase updates

Remarks:by default extensional facts are consumedunless we declare the relation as persistent

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 19 / 43

Page 22: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Local rules with non-local extensional head

Extensional: Today@Alice(december 5)

Extensional: Event@Alice(birthday,december 5,SMS,Bob-phone)

[at Alice]$r@$p(Happy birthday):-Today@Alice($date),

Event@Alice(birthday,$date,$r,$p)

Produce SMS@Bob-phone(Happy birthday)

IntuitionMessaging between peers

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 20 / 43

Page 23: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Local rules with non-local intensional head

Extensional: Friend@AliceIntensional: Contact@Bob

[at Alice]Contact@Bob($x,$y):- Friend@Alice($x,$y)

Bob gets a view on Alice’s friends

IntuitionExternal view definition

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 21 / 43

Page 24: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Non-local rules: delegationThe main novelty of Webdamlog

Extensional: Friend@Alice: (Alice,Bob),(Alice,Charlie),(Bob,Charlie)

[at Alice]Picture@Alice($pic,Alice):-Friend@Alice(Alice,$f),

Picture@$f($pic,Alice)

This will install two rules:

[at Bob] Picture@Alice($pic,Alice):-Picture@Bob($pic,Alice)[at Charlie] Picture@Alice($pic,Alice):-Picture@Charlie($pic,Alice)

RemarkIf Friend@Alice(Alice,Bob) no longer holds the delegation isuninstalled

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 22 / 43

Page 25: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Non-local rules: delegation 2The main novelty of Webdamlog

IntuitionPossible to ask another peer to perform some tasks for you(distributed computation)Possible to exchange knowledge between peers

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 23 / 43

Page 26: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Webdamlog semantic

State

(I, Γ, Γ̃)

I(p) the local state of p is a finite set of extensional factsΓ(p) is the finite set of rules at pΓ̃(p,q) (p 6= q) is the set of rules that p delegated to q

State transitionChoose some peer p randomly – asynchronouslyCompute the transition: (I0, Γ0, Γ̃0) → (I1, Γ1, Γ̃1) → (I2, Γ2, Γ̃2) → . . .

the database updates at pthe messages sent to the other peersthe delegations of rules to other peers

Fair sequence: each peer is selected infinitely often

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 24 / 43

Page 27: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Webdamlog summary

Webdamlog is datalog with novel extensionsvariables in relation and peer namesdelegation

both imply installing rules at run-time

Results:formal definition of Webdamlogexpressivity results:

I the model with delegation is more general, unless all peers andprograms are known in advance

convergence is very hard to achieve because of asynchronicity

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 25 / 43

Page 28: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

1 Context and problematic

2 Webdamlog a language for distributed knowledge

3 System implementation of a Webdamlog engineWebdamlog evaluationDeletions in Webdamlog

4 Proof of concept application and feasibility of Webdamlog

5 Conclusion

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 26 / 43

Page 29: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Implementation on top of Bud

Bud [UC Berkeley] is a distributed datalog engine with updates; itsupports

local rules with local extensional head (semi-naive)local rules with non-local extensional head (semi-naive)

Bud does not supportnegationintensional relation optimizations (query sub-query, magic sets)

Bud neither any other engine implements requirements forWebdamlog

variables in relation and peer namesdelegationinstalling rules at run-time

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 27 / 43

Page 30: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Semi-naiveNaive evaluation leads to redundant computationEdge: (1,2),(2,3),(3,4),(4,5)

Path($x,$y):-Edge($x,$y)

Path($x,$y):-Edge($x,$z),Path($z,$y)

Path(I)0 =∅

Path(I)1 =Path(I)0 ∪ {Path(1, 2),Path(2, 3),Path(3, 4),Path(4, 5)}

Path(I)2 =Path(I)1 ∪ {Path(1, 3),Path(2, 4),Path(3, 5)}

Path(I)3 =Path(I)2 ∪ {Path(1, 4),Path(2, 5)}

Path(I)4 =Path(I)3 ∪ {Path(1, 5)}

Path(I)5 =Path(I)4

︸ ︷︷ ︸∆ new facts since previous step

Semi-naive use delta of previous step to compute the current stepPathi($x,$y):-Edge($x,$z),∆Pathi−1($z,$y)

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 28 / 43

Page 31: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Webdamlog engine run

Run a Webdamlog stage (I, Γ, Γ̃) → (I′, Γ′, Γ̃′) in three steps

1 inputs are collected and a new state is definedI insert/delete Webdamlog facts and rulesI update deltas of relationsI compile the Webdamlog program into a Bud program

2 semi-naive evaluation: monotonic Bud program with a fixed setof local rules run to fixpoint

3 output messages and delegations are collected and sent to otherpeers for next stage

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 29 / 43

Page 32: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Management of variables[at p]Fact: r1@p(relname,peername)Rule: r0@p(xo)︸ ︷︷ ︸

head

:- r1@p($x,$y),︸ ︷︷ ︸static

$x@p(x1),r2@$y(x2),. . .︸ ︷︷ ︸dynamic

Stage i:

step 1 install at p: i@p($x,$y) :- r1@p($x,$y)︸ ︷︷ ︸static

step 2 semi-naive evaluation evaluate i@pstep3 FOR EACH fact F(vx ,vy ) IN i@p

send new rule: head :- dynamicwith variables bounded by F(vx ,vy )r0@p(x0)︸ ︷︷ ︸

head

:- vx@p(x1),r2@vy (x2), . . .︸ ︷︷ ︸dynamic

Stage i + 1, at step 1:receive r0@p(xo) :- relname@p(x1),r2@peername(x2),. . .

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 30 / 43

Page 33: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Management of variables[at p]Fact: r1@p(relname,peername)Rule: r0@p(xo)︸ ︷︷ ︸

head

:- r1@p($x,$y),︸ ︷︷ ︸static

$x@p(x1),r2@$y(x2),. . .︸ ︷︷ ︸dynamic

Stage i:

step 1 install at p: i@p($x,$y) :- r1@p($x,$y)︸ ︷︷ ︸static

step 2 semi-naive evaluation evaluate i@pstep3 FOR EACH fact F(vx ,vy ) IN i@p

send new rule: head :- dynamicwith variables bounded by F(vx ,vy )r0@p(x0)︸ ︷︷ ︸

head

:- vx@p(x1),r2@vy (x2), . . .︸ ︷︷ ︸dynamic

Stage i + 1, at step 1:receive r0@p(xo) :- relname@p(x1),r2@peername(x2),. . .

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 30 / 43

Page 34: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Management of variables[at p]Fact: r1@p(relname,peername)Rule: r0@p(xo)︸ ︷︷ ︸

head

:- r1@p($x,$y),︸ ︷︷ ︸static

$x@p(x1),r2@$y(x2),. . .︸ ︷︷ ︸dynamic

Stage i:

step 1 install at p: i@p($x,$y) :- r1@p($x,$y)︸ ︷︷ ︸static

step 2 semi-naive evaluation evaluate i@pstep3 FOR EACH fact F(vx ,vy ) IN i@p

send new rule: head :- dynamicwith variables bounded by F(vx ,vy )r0@p(x0)︸ ︷︷ ︸

head

:- vx@p(x1),r2@vy (x2), . . .︸ ︷︷ ︸dynamic

Stage i + 1, at step 1:receive r0@p(xo) :- relname@p(x1),r2@peername(x2),. . .

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 30 / 43

Page 35: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Management of variables[at p]Fact: r1@p(relname,peername)Rule: r0@p(xo)︸ ︷︷ ︸

head

:- r1@p($x,$y),︸ ︷︷ ︸static

$x@p(x1),r2@$y(x2),. . .︸ ︷︷ ︸dynamic

Stage i:

step 1 install at p: i@p($x,$y) :- r1@p($x,$y)︸ ︷︷ ︸static

step 2 semi-naive evaluation evaluate i@pstep3 FOR EACH fact F(vx ,vy ) IN i@p

send new rule: head :- dynamicwith variables bounded by F(vx ,vy )r0@p(x0)︸ ︷︷ ︸

head

:- vx@p(x1),r2@vy (x2), . . .︸ ︷︷ ︸dynamic

Stage i + 1, at step 1:receive r0@p(xo) :- relname@p(x1),r2@peername(x2),. . .

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 30 / 43

Page 36: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Management of variables[at p]Fact: r1@p(relname,peername)Rule: r0@p(xo)︸ ︷︷ ︸

head

:- r1@p($x,$y),︸ ︷︷ ︸static

$x@p(x1),r2@$y(x2),. . .︸ ︷︷ ︸dynamic

Stage i:

step 1 install at p: i@p($x,$y) :- r1@p($x,$y)︸ ︷︷ ︸static

step 2 semi-naive evaluation evaluate i@pstep3 FOR EACH fact F(vx ,vy ) IN i@p

send new rule: head :- dynamicwith variables bounded by F(vx ,vy )r0@p(x0)︸ ︷︷ ︸

head

:- vx@p(x1),r2@vy (x2), . . .︸ ︷︷ ︸dynamic

Stage i + 1, at step 1:receive r0@p(xo) :- relname@p(x1),r2@peername(x2),. . .

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 30 / 43

Page 37: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Management of variables[at p]Fact: r1@p(relname,peername)Rule: r0@p(xo)︸ ︷︷ ︸

head

:- r1@p($x,$y),︸ ︷︷ ︸static

$x@p(x1),r2@$y(x2),. . .︸ ︷︷ ︸dynamic

Stage i:

step 1 install at p: i@p($x,$y) :- r1@p($x,$y)︸ ︷︷ ︸static

step 2 semi-naive evaluation evaluate i@pstep3 FOR EACH fact F(vx ,vy ) IN i@p

send new rule: head :- dynamicwith variables bounded by F(vx ,vy )r0@p(x0)︸ ︷︷ ︸

head

:- vx@p(x1),r2@vy (x2), . . .︸ ︷︷ ︸dynamic

Stage i + 1, at step 1:receive r0@p(xo) :- relname@p(x1),r2@peername(x2),. . .

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 30 / 43

Page 38: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Evaluation of deletion in datalog

Edge@Alice: (1,2),(2,3),(3,4),(4,5)

Rule1: Path@Alice($x,$y):-Edge@Alice($x,$y)

Rule2: Path@Alice($x,$y):-Edge@Alice($x,$z),Path@Alice($z,$y)

Path@Bob: (1,3),(1,5)

Rule3: Path@Alice($x,$y):-Path@Bob($x,$y)

Path@Alice:

(1,2), (2,3), (3,4), (4,5),

Rule3︷ ︸︸ ︷(1,3), (2,4), (3,5), (1,4), (2,5),

Rule 3︷ ︸︸ ︷(1,5)︸ ︷︷ ︸

Rule 1 and 2

Delete Path@Alice(2,3)semi-naive requires full recomputationwe introduce a novel optimization based on provenance

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 31 / 43

Page 39: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Example for provenance in Webdamlog evaluation

Alice creates a gallery of pictures where she is tagged from her ownpictures and the pictures of her friendsPicture@Alice(pic1,Alice)

Friend@Alice(Alice,Bob)

Picture@Bob(pic1,Alice)

[Rule 1 at Alice]Gallery@Alice($pic,Alice):-Picture@Alice($pic,Alice)[Rule 2 at Alice]Gallery@Alice($pic,Alice):-Friend@Alice(Alice,$f),

Picture@$f($pic,Alice)

[Rule 3 at Bob] delegationGallery@Alice($pic,Alice):-Picture@Bob($pic,Alice)

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 32 / 43

Page 40: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Example for provenance in Webdamlog evaluation

Alice creates a gallery of pictures where she is tagged from her ownpictures and the pictures of her friendsPicture@Alice(pic1,Alice)

Friend@Alice(Alice,Bob)

Picture@Bob(pic1,Alice)

[Rule 1 at Alice]Gallery@Alice($pic,Alice):-Picture@Alice($pic,Alice)[Rule 2 at Alice]Gallery@Alice($pic,Alice):-Friend@Alice(Alice,$f),

Picture@$f($pic,Alice)

[Rule 3 at Bob] delegationGallery@Alice($pic,Alice):-Picture@Bob($pic,Alice)

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 32 / 43

Page 41: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Example for provenance in Webdamlog evaluation

Alice creates a gallery of pictures where she is tagged from her ownpictures and the pictures of her friendsPicture@Alice(pic1,Alice)

Friend@Alice(Alice,Bob)

Picture@Bob(pic1,Alice)

[Rule 1 at Alice]Gallery@Alice($pic,Alice):-Picture@Alice($pic,Alice)[Rule 2 at Alice]Gallery@Alice($pic,Alice):-Friend@Alice(Alice,$f),

Picture@$f($pic,Alice)

[Rule 3 at Bob] delegationGallery@Alice($pic,Alice):-Picture@Bob($pic,Alice)

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 32 / 43

Page 42: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Optimization deletion using provenance

Time nodes x are conjunctionPlus nodes + are disjunction

Picture@Alice(pic1,Alice)

Rule 1 +

xFriend@Alice(Alice,Bob)

Picture@Bob(pic1,Alice)

Rule 2 x

Peer Bob

Peer Alice

Gallery@Alice(pic1,Alice)

Rule 3

x

avoid to rederive the full relationskeep trace of multiple proofs for the same fact

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 33 / 43

Page 43: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Optimization deletion using provenance

Time nodes x are conjunctionPlus nodes + are disjunction

Picture@Alice(pic1,Alice)

Rule 1 +

x

Picture@Bob(pic1,Alice)

Rule 2 x

Peer Bob

Peer Alice

Gallery@Alice(pic1,Alice)

Rule 3

x

avoid to rederive the full relationskeep trace of multiple proofs for the same fact

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 34 / 43

Page 44: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Optimization deletion using provenance

Time nodes x are conjunctionPlus nodes + are disjunction

Picture@Alice(pic1,Alice)

Rule 1 +Picture@Bob(pic1,Alice)

Rule 2 x

Peer Bob

Peer Alice

Gallery@Alice(pic1,Alice)

Rule 3

x

avoid to rederive the full relationskeep trace of multiple proofs for the same fact

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 35 / 43

Page 45: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Optimization deletion using provenance

Time nodes x are conjunctionPlus nodes + are disjunction

Picture@Alice(pic1,Alice)

Rule 1 +Picture@Bob(pic1,Alice)

Rule 2 x

Peer Bob

Peer Alice

Gallery@Alice(pic1,Alice)

x

avoid to rederive the full relationskeep trace of multiple proofs for the same fact

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 36 / 43

Page 46: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Optimization deletion using provenance

Time nodes x are conjunctionPlus nodes + are disjunction

Picture@Alice(pic1,Alice)

Rule 1 +Picture@Bob(pic1,Alice)

Rule 2

Peer Bob

Peer Alice

Gallery@Alice(pic1,Alice)

x

avoid to rederive the full relationskeep trace of multiple proofs for the same fact

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 37 / 43

Page 47: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

1 Context and problematic

2 Webdamlog a language for distributed knowledge

3 System implementation of a Webdamlog engine

4 Proof of concept application and feasibility of WebdamlogArchitectureExperimentsUser study

5 Conclusion

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 38 / 43

Page 48: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Architecture of a Webdamlog peer

IMAPSMTP

HTMLJavascript

ORM

HTTPJSON

Web interface

Database

Webdamlog engine

Facebook

Email

Event pool

Reactorpattern

Webdamlog peer

updatesWrappers

Facts/rules updates

Wrappers translate external commands into facts/rules updatesReactor pattern activates the Webdamlog engine if needed

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 39 / 43

Page 49: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Experiments

QSQ style optimization

20 40 60 80 100

0.3

0.4

0.5

0.6

0.7

0.8

0.9

% of matched facts

wai

ting

time

at S

ue (s

ec)

QSQ evaluationfull materialization

Cost of provenance graph maintenance

0 200 400 600 800 1000 1200 1400

0.2

0.4

0.6

0.8

1.0

1.2

1.4

# of facts deleted

tota

l tim

e on

all

peer

s (s

ec)

with provenancewithout provenance

Deletion of facts

0 2000 4000 6000 8000 10000

24

68

# of facts deleted

tota

l tim

e on

all

peer

s (s

ec)

propagationrecomputation

Deletion of rules

0 20 40 60 80 1000

24

68

# of peers deleted from allFriends@Sue

time

(sec

)

propagation totalrecomputation totalpropagation waitingrecomputation waiting

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 40 / 43

Page 50: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

User studyTo show that Webdamlog is declarative and user-friendly

Settings:pool of 27 participants with and without IT traininga 20 minute lesson about the languagean exam to test

I understanding of Webdamlog programsI ability to write Webdamlog programs

Results:everyone but 2 participants perfectly understood the languagea large majority wrote correct rulesnon technical participants took longer to answer

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 41 / 43

Page 51: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Demonstration

A demonstration showed at SIGMOD 2013A social network to share pictures among the attendees of theconferenceThis application “Wepic” run thanks to a Webdamlog engine andwrappers (web interface, database, . . . )Scenario

I SIGMOD runs a “Wepic” peer with an empty programI attendees run their own peer with a basic program to exchange

contact with SIGMODI peer can be customized by adding rules

Play the video

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 42 / 43

Page 52: Distributed data management with the rule-based language ...webdam.inria.fr/emilien.antoine/wp-content/uploads/defense.pdf · Ph.D. defense Émilien ANTOINE Supervisor: Serge ABITEBOUL

Conclusion

We proposea formal language for data managementan implementation of an engine for this language (withoptimizations)a system for application development (with wrappers)

Future workon access control (on-going work)a better graphical interfacea more in-depth user studyan API for the development of applications and wrappersthe enhancement Webdamlog with ontology technologyoptimization techniques, e.g. dQSQ

Émilien Antoine (Inria) Ph.D. defense December 5th, 2013 43 / 43