WebdamExchange and WebdamLog: some models for web data management

Emilien Antoine, Meghyn Bienvenu, Alban Galland

Webdam WS, 04/03/2011

Organization

• Introduction
• Representing all Web information as logical sentences
• Representing all Web data management as logical rules
• Some clues about WebdamPoor
• Some clues about implementation
• Conclusion

Introduction

Context of the work presented here

• Joint work with many people: Émilien Antoine, Serge Abiteboul, Meghyn Bienvenu, David Gross-Amblard, Marilena Oita, Amélie Marian, Bruno Marnette, Neoklis Polyzotis, Philippe Rigaux, Marie-Christine Rousset…

Context: Web data management

• Scale: lots of users, servers, large volume of data…
• Distribution heterogeneity: Cloud (social networks), P2P (DHT, gossiping)…
• Security heterogeneity: login, https, crypto, hidden URL…
• Terminology heterogeneity: annotation, semantic Web, ontologies…
• Incomplete information: inconsistencies, belief, trust…
• The heterogeneity keeps increasing as new systems and new applications arrive
• Consequence 1: data integration and management are difficult
• Consequence 2: users cannot keep control over their own data

Thesis: Web data = distributed knowledge

• Work plan
  1. Represent all Web information as logical sentences
  2. Represent all Web data management as logical rules
  3. Develop a system to validate these ideas
• Motivation for the approach
  • Facilitate the design/implementation of complex systems
  • Facilitate the control/surveillance of complex systems
  • Use reasoning to optimize query evaluation
  • Use reasoning for semantics/ontologies
  • Use reasoning to manage access control and protect data
  • Use reasoning to analyze properties of systems

Motivating example

• Alice: get me the pictures of my friends where I am with Bob
• What is going on:
  • Find the friends of Alice (Alice's iPhone may remember them)
  • For each answer, say Sue, find where Sue keeps her pictures (she may keep them on Picasa)
  • Find the means to access Sue’s pictures (Alice may ask a common friend for the private URL)
  • Find the photos with Bob and Alice (e.g. by querying the meta-data)

Motivating example

• Alice: get me the pictures of my friends where I am with Bob
• Issues: heterogeneity of friends
  • Heterogeneity of hosting: some keep their pictures on trusted servers such as Picasa, some put them on an untrusted DHT, some have them on their smartphones…
  • Heterogeneity of access control: some pictures are public, some are protected by login/password, some by a private URL, some by cryptography…
  • Heterogeneity of data description: friends may use different models of meta-data (taxonomies, ontologies…)

Complicated application organization…

• Example of our SocialRock demo:

Representing all Web information as logical sentences

The information belongs to someone

• Each piece of information belongs to a principal
• A principal has an identity (URI) which can be authenticated
• Two kinds of principal: peer and virtual principal
  • A peer: alice-laptop, alice-iPhone, picasa, facebook, dht-peer-124, …
    • Storage and processing capabilities
    • A peer typically has a URL and can be sent query/update requests
  • A virtual principal: alice, alice-friends, roc14
    • A virtual principal relies on peers for storage and processing

The kind of information we are talking about

• Data: pictures, movies, music, emails, ebooks, reports
• Localization: bookmarks, knowledge such as Alice has an account in Facebook, Sue puts her pictures in Picasa
• Access: login/password, access rights on servers
• Annotations/Ontologies: semantic tags in Picasa, RDFS, OWL
• Services: search engines, yellow pages, dictionaries…
• Incomplete information: beliefs, probabilistic information…
• And more…

Logical statements to represent information

• Data:
  • Document: picture34@alice-iPhone(picture34.jpg,09/12/2009,…)
  • Collection: pictures@alice(picture34@alice-iPhone)
• Localization: where@alice(picture37, picasa/alice)
• Access right: isOwner@picasa/alice(alice)
• Access secret: ownSecret@picasa/alice(“alice”, “HG-FT23”)
• Ontologies: [email protected](“alice”, human-being)
• Services: [email protected]($Person, $City, $Y)
• Belief: picture34@alice-iPhone(picture34.jpg,09/12/2009,…,75%)
• Etc.
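The statements above all share the shape relation@principal(arguments). A minimal Python sketch of that shape follows; the class name `Fact`, the helper `facts_of` and the set `knowledge_base` are illustrative assumptions, not part of WebdamExchange, and the fields elided with "…" on the slide are simply left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Fact:
    """One logical statement of the form relation@principal(args)."""
    relation: str
    principal: str
    args: Tuple[object, ...] = ()

    def __str__(self) -> str:
        rendered = ", ".join(str(a) for a in self.args)
        return f"{self.relation}@{self.principal}({rendered})"


# Statements taken from the slide (elided fields omitted).
picture = Fact("picture34", "alice-iPhone", ("picture34.jpg", "09/12/2009"))
collection = Fact("pictures", "alice", ("picture34@alice-iPhone",))
location = Fact("where", "alice", ("picture37", "picasa/alice"))
owner = Fact("isOwner", "picasa/alice", ("alice",))

knowledge_base = {picture, collection, location, owner}


def facts_of(principal: str, facts: Iterable[Fact]) -> List[str]:
    """Hypothetical helper: render all statements belonging to one principal."""
    return sorted(str(f) for f in facts if f.principal == principal)
```

Because every kind of information (data, localization, access rights) uses the same record, a single query such as `facts_of("alice", knowledge_base)` ranges over all of them.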

WebdamExchange focus: authenticated knowledge

• Base statement:
  • someone states picture37@alice (….)
  • It is annotated with a proof that “someone” can write data of alice
  • In the cryptographic setting, it is a signature of the whole statement using the write secret key of alice
• Keeping track of provenance:
  • alice-laptop states picture37@alice (….) requester bob at 12:30, 10/08/2009
  • alice-laptop is the performer (the peer that performed the update of Alice's data)
  • bob is the requester (the peer or the user who requested the update)
• The content is possibly encrypted:
  • alice-laptop states picture37@alice (….) protected for reader@alice requester bob at 12:30, 10/08/2009
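To make the idea of a statement "annotated with a proof" concrete, here is a minimal sketch under strong assumptions. The slides speak of a signature with the write secret key of alice; this sketch swaps in an HMAC from the Python standard library as a stand-in, which is a symmetric authentication code and not a public-key signature. The key value, the dictionary layout and the function names are all hypothetical.

```python
import hashlib
import hmac
import json


def canonical(statement: dict) -> bytes:
    """Serialize the whole statement deterministically before authenticating it."""
    return json.dumps(statement, sort_keys=True).encode("utf-8")


def sign(statement: dict, write_key: bytes) -> str:
    """Stand-in for the signature: an HMAC-SHA256 over the whole statement."""
    return hmac.new(write_key, canonical(statement), hashlib.sha256).hexdigest()


def verify(statement: dict, proof: str, write_key: bytes) -> bool:
    """Check that the proof matches the statement under the given write key."""
    return hmac.compare_digest(sign(statement, write_key), proof)


# Hypothetical write secret of the virtual principal alice.
alice_write_key = b"hypothetical-write-secret-of-alice"

statement = {
    "performer": "alice-laptop",
    "fact": "picture37@alice(...)",
    "requester": "bob",
    "time": "12:30, 10/08/2009",
}
proof = sign(statement, alice_write_key)

# Changing any part of the statement (here the requester) invalidates the proof.
tampered = dict(statement, requester="mallory")
```

The point of covering the whole statement is that performer, requester and time are protected together with the fact itself, so the provenance cannot be altered without breaking the proof.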

WebdamExchange focus: authenticated knowledge

• Communication: external knowledge is knowledge about other principals:
  • alice-laptop says (alice-laptop states picture37@Alice (….) requester bob at 12:30, 10/08/2009) to sue-iphone at 13:15, 15/10/2009
  • alice-laptop is the performer of the communication
  • sue-iphone is the receiver of the communication
  • External knowledge is authenticated by the performer and is stored by the receiver.
• External knowledge keeps a trusted trace of the provenance, and communications are piled up (nested):
  • sue-iphone says (alice-laptop says (alice-laptop states picture37@Alice (….) requester bob at 12:30, 10/08/2009) to sue-iphone at 13:15, 15/10/2009) to bob-iphone at 13:10, 15/10/2009
• The time is the time of the performer; there is no global clock
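The nesting of "says" around "states" can be pictured as a recursive record. The sketch below is only an illustration: the class names `States` and `Says` and the helper `provenance` are assumptions, and authentication of each layer is left out.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class States:
    """Base statement: performer states fact, on behalf of requester, at a local time."""
    performer: str
    fact: str
    requester: str
    time: str


@dataclass(frozen=True)
class Says:
    """Communication: performer says some knowledge to receiver at a local time."""
    performer: str
    content: Union["Says", States]
    receiver: str
    time: str


def provenance(knowledge: Union[Says, States]) -> List[str]:
    """Return the performers from the outermost communication down to the base statement."""
    chain = []
    while isinstance(knowledge, Says):
        chain.append(knowledge.performer)
        knowledge = knowledge.content
    chain.append(knowledge.performer)
    return chain


base = States("alice-laptop", "picture37@Alice(...)", "bob", "12:30, 10/08/2009")
first_hop = Says("alice-laptop", base, "sue-iphone", "13:15, 15/10/2009")
second_hop = Says("sue-iphone", first_hop, "bob-iphone", "13:10, 15/10/2009")
```

Each timestamp is kept as the performer's own local string, which is why the second hop may carry an earlier time than the first without contradiction.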

The model covers a wide range of data

• The model does not prescribe any particular architecture for distribution
  • Gossiping, DHT, centralized server
  • Combination of these
  • Based on an abstract notion of localization
• The model does not prescribe how access control is enforced, e.g.:
  • Documents in Web servers with access protected by login/password
  • Documents protected by cryptographic keys in public sites
  • Based on an abstract notion of secret and hint
• See presentation of Emilien on WebdamPoor

Summary of WebdamExchange

• All the information forms a trusted knowledge base
• Each peer manages some portion of the knowledge base

• Now, we have to use this distributed knowledge base … for the management of the distributed knowledge base!

Representing all Web data management as logical rules

From WebdamExchange to Webdamlog

• The logical part of the WebdamExchange statements can easily be translated into datalog facts.

• Now we want to perform reasoning on these facts in order to locate, exchange, and update information
  • Example: use logical reasoning among peers to locate the pictures of Alice’s friends in which she appears with Bob

• This motivates Webdamlog, a rule-based language for web data management

Why datalog?

• Datalog: very popular in the 90’s, prehistory by Web time
  + Natural syntax; reasonably expressive; easy to extend
  - Recursion not really essential in most applications
• Datalog extensions
  • Negation and aggregate functions: lots of work on these
  • Updates, time, trees, distribution: less work on these
• We use a datalog-like language influenced by
  • Active XML for distribution and delegation
  • Hellerstein’s Dedalus for time and performance

Webdamlog

• Facts (messages) of the form m@p(a1,...,an)
• Rules of the form R@P(U) :- (¬) R1@P1(U1), …, (¬) Rn@Pn(Un)
  • R, Ri are relation terms
  • P, Pi are peer terms
  • U, Ui are tuples of terms
  • Safety condition
• Intuition: if the body holds for some valuation v, the fact vR@vP(vU) is sent to the peer vP
• What happens if the body of the rule mentions different peers?
  • Peers need to collaborate to evaluate the rule: this is rule delegation
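The intuition "if the body holds for some valuation v, the fact vR@vP(vU) is sent to vP" can be sketched in a few lines of Python. This is a simplified, centralized view and not the Webdamlog implementation: it handles positive rules only, ignores the safety condition, and assumes all facts sit in one set. The tuple encoding of atoms and the names `unify`, `substitute` and `fire` are assumptions made for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

# An atom or fact is (relation term, peer term, argument terms).
Atom = Tuple[str, str, Tuple[str, ...]]
Valuation = Dict[str, str]


def is_var(term: str) -> bool:
    """Variables are written with a leading '$', as on the slides."""
    return term.startswith("$")


def unify(atom: Atom, fact: Atom, v: Valuation) -> Optional[Valuation]:
    """Extend valuation v so that atom matches fact, or return None."""
    if len(atom[2]) != len(fact[2]):
        return None
    extended = dict(v)
    for term, value in zip((atom[0], atom[1]) + atom[2], (fact[0], fact[1]) + fact[2]):
        if is_var(term):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def substitute(atom: Atom, v: Valuation) -> Atom:
    """Apply valuation v to the relation, peer and argument terms of atom."""
    return (v.get(atom[0], atom[0]), v.get(atom[1], atom[1]),
            tuple(v.get(a, a) for a in atom[2]))


def fire(head: Atom, body: List[Atom], facts: Set[Atom]) -> Set[Atom]:
    """Return the head facts for every valuation that satisfies the whole body."""
    valuations: List[Valuation] = [{}]
    for atom in body:
        next_valuations = []
        for v in valuations:
            for fact in facts:
                extended = unify(atom, fact, v)
                if extended is not None:
                    next_valuations.append(extended)
        valuations = next_valuations
    return {substitute(head, v) for v in valuations}


facts = {
    ("friends", "alice-iphone", ("Sue",)),
    ("photos", "picasa", ("Sue", "pic1", "meta1")),
}
head = ("result", "alice-iphone", ("$Photo",))
body = [
    ("friends", "alice-iphone", ("$X",)),
    ("$R", "$P", ("$X", "$Photo", "$Meta")),  # relation and peer are variables too
]
derived = fire(head, body, facts)
```

The second component of each derived fact is the peer it would be sent to. Note that relation and peer terms go through the same unification as ordinary arguments, which is what allows `$R@$P` in a rule body.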

Webdamlog

System:
• A finite set of peers
• Each peer p has a local program P(p) and a delegated program D(p), which are both finite sets of rules
• Each peer p also has a database I(p) consisting of a finite set of facts of the form m@p(u)

Semantics:
• In a state (P,D,I), choose randomly some p
• Evaluate (P(p)∪D(p))(I(p))
• This defines the new DB I’(p)
• Send facts and update delegations of the other peers to define (D’(q),I’(q)) for each peer q≠p
• The changes to each q are installed instantaneously (we will see how to avoid this if desired)
• Choose another peer and keep going (in a fair way)
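The loop "choose a peer, evaluate its program on its database, send the resulting facts, repeat fairly" might be simulated as follows. This sketch makes several simplifying assumptions that go beyond the slide: programs are plain Python functions instead of rule sets, delegations D(p) are not modeled, facts accumulate in the receiving database, and fairness is obtained by firing every peer once per round in a shuffled order. The relation `wanted` and all function names are hypothetical.

```python
import random
from typing import Callable, Dict, Set, Tuple

Fact = Tuple[str, str, Tuple[str, ...]]  # (relation, peer, arguments)
Program = Callable[[Set[Fact]], Set[Fact]]


def step(peer: str, programs: Dict[str, Program], databases: Dict[str, Set[Fact]]) -> None:
    """Evaluate one peer's program on its own database and deliver the derived facts."""
    derived = programs[peer](set(databases[peer]))
    for fact in derived:
        # fact[1] is the destination peer; delivery is instantaneous in this sketch.
        databases.setdefault(fact[1], set()).add(fact)


def run(programs: Dict[str, Program], databases: Dict[str, Set[Fact]],
        rounds: int, seed: int = 0) -> None:
    """Fire every peer once per round, in a random order chosen per round."""
    rng = random.Random(seed)
    peers = sorted(programs)
    for _ in range(rounds):
        rng.shuffle(peers)
        for peer in peers:
            step(peer, programs, databases)


def alice_program(db: Set[Fact]) -> Set[Fact]:
    # wanted@picasa($X) :- friends@alice-iphone($X)
    return {("wanted", "picasa", args) for rel, _, args in db if rel == "friends"}


def picasa_program(db: Set[Fact]) -> Set[Fact]:
    # result@alice-iphone($Photo) :- wanted@picasa($X), photos@picasa($X, $Photo)
    wanted = {args[0] for rel, _, args in db if rel == "wanted"}
    return {("result", "alice-iphone", (args[1],))
            for rel, _, args in db if rel == "photos" and args[0] in wanted}


programs = {"alice-iphone": alice_program, "picasa": picasa_program}
databases = {
    "alice-iphone": {("friends", "alice-iphone", ("Sue",))},
    "picasa": {("photos", "picasa", ("Sue", "pic1")), ("photos", "picasa", ("Tom", "pic2"))},
}
run(programs, databases, rounds=2)
```

Whatever order the scheduler picks, two rounds are enough here for the request to reach picasa and for the answer to come back, which gives a small instance of different runs reaching the same state.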

Features of Webdamlog illustrated

Alice: get me the pictures of my friends where I am with Bob

result@alice-iphone($Photo) :- friends@alice-iphone($X),
    findPhotos@alice-iphone($X, $R, $P),
    $R@$P($X, $Photo, $Meta),
    contains@$P($Meta, “Alice”),
    contains@$P($Meta, “Bob”)

findPhotos@alice-iphone($X, photos, picasa) :- member($X, picasa)

friends@alice-iphone(Sue)
member(Sue,picasa)

- Peers and relations are treated as data: they are reified
- $R@$P: will be instantiated with a concrete relation and peer
- friends@alice-iphone is extensional: it occurs in the data at alice-iphone
- findPhotos@alice-iphone is intensional: it is derived from data + rules

Features of Webdamlog illustrated

Alice: get me the pictures of my friends where I am with Bob

result@alice-iphone($Photo) :- friends@alice-iphone($X),
    findPhotos@alice-iphone($X, $R, $P),
    $R@$P($X, $Photo, $Meta),
    contains@$P($Meta, “Alice”),
    contains@$P($Meta, “Bob”)

findPhotos@alice-iphone($X, photos, picasa) :- member($X, picasa)

friends@alice-iphone(Sue)
member(Sue,picasa)

Partial evaluation at alice-iphone ($X → Sue, $R → photos, $P → picasa)

Then alice-iphone installs the rest of the rule at picasa:

result@alice-iphone($Photo,Sue) :- photos@picasa(Sue,$Photo,$Meta),
    contains@picasa($Meta, “Alice”),
    contains@picasa($Meta, “Bob”)

Peer picasa will send the photos as extensional facts to alice-iphone.

When Alice terminates her query, she cancels all the delegations.
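The partial evaluation step can be sketched as: evaluate the leading atoms that live at the local peer, and as soon as an atom refers to another peer, ship the instantiated remainder of the rule there. The Python below is an illustrative sketch, not the actual Webdamlog algorithm. It assumes that findPhotos@alice-iphone(Sue, photos, picasa) has already been derived and stored as a local fact, it keeps the single-argument head of the original rule, and the names `delegate`, `unify` and `substitute` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, str, Tuple[str, ...]]  # (relation term, peer term, argument terms)
Valuation = Dict[str, str]
Rule = Tuple[Atom, List[Atom]]           # (head, body)


def unify(atom: Atom, fact: Atom, v: Valuation) -> Optional[Valuation]:
    """Extend v so that atom matches fact; terms starting with '$' are variables."""
    if len(atom[2]) != len(fact[2]):
        return None
    extended = dict(v)
    for term, value in zip((atom[0], atom[1]) + atom[2], (fact[0], fact[1]) + fact[2]):
        if term.startswith("$"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def substitute(atom: Atom, v: Valuation) -> Atom:
    """Replace every bound variable of atom by its value under v."""
    return (v.get(atom[0], atom[0]), v.get(atom[1], atom[1]),
            tuple(v.get(a, a) for a in atom[2]))


def delegate(head: Atom, body: List[Atom], local_peer: str, local_facts: Set[Atom],
             v: Optional[Valuation] = None) -> List[Tuple[str, Rule]]:
    """Return (remote peer, residual rule) pairs produced by local partial evaluation."""
    v = {} if v is None else v
    if not body:
        return []  # the whole body was local: nothing left to delegate
    first = substitute(body[0], v)
    if first[1] != local_peer:
        residual = (substitute(head, v), [substitute(a, v) for a in body])
        return [(first[1], residual)]
    delegations = []
    for fact in local_facts:
        extended = unify(first, fact, v)
        if extended is not None:
            delegations.extend(delegate(head, body[1:], local_peer, local_facts, extended))
    return delegations


local_facts = {
    ("friends", "alice-iphone", ("Sue",)),
    ("findPhotos", "alice-iphone", ("Sue", "photos", "picasa")),  # assumed already derived
}
head = ("result", "alice-iphone", ("$Photo",))
body = [
    ("friends", "alice-iphone", ("$X",)),
    ("findPhotos", "alice-iphone", ("$X", "$R", "$P")),
    ("$R", "$P", ("$X", "$Photo", "$Meta")),
    ("contains", "$P", ("$Meta", "Alice")),
    ("contains", "$P", ("$Meta", "Bob")),
]
delegations = delegate(head, body, "alice-iphone", local_facts)
```

With one friend at one host, the result is a single residual rule addressed to picasa. With several friends on several hosts, the same call would yield one residual rule per valuation, each addressed to the peer bound to `$P`.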

What can we show?

• In general, asynchronicity yields non-deterministic systems
• Identified two types of Webdamlog systems (only positive rules / appropriately stratified negation) for which we have:
  • convergence: all runs eventually reach the same state
  • simulation by a centralized datalog program
• Interesting to compare expressivity of different variants of WebdamLog: full / limited / no delegation, presence of time-stamps or ordering of peers…
  • For an appropriate notion of simulation, can show that full delegation > limited delegation > no delegation

More refined asynchronicity

• To model transmission of facts from peer p to peer q, we may use a “peer” netpq that captures the network
  • Replace m@q(u) at p by m@netpq(u)
  • netpq should just relay messages: $M@q($U) :- $M@netpq($U)
  • Problem: all messages stored in netpq arrive at the same time
• Better with time
  • m@netpq(u,t) where t is the time at p
  • $M@q($U) :- $M@netpq($U, $T), min($T, $M@netpq($U, $T)), using the min aggregate function
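The effect of the min aggregate is that netpq relays one message per activation, the one with the smallest timestamp, instead of flushing everything at once. A minimal Python sketch of that behaviour follows; the class `NetworkPeer` and its methods are assumptions for illustration, and the minimum is taken over all pending messages, which is one possible reading of the rule.

```python
import heapq
from typing import List, Optional, Tuple

Message = Tuple[str, Tuple[str, ...]]  # (relation m, arguments u)


class NetworkPeer:
    """Hypothetical model of net_pq: holds facts m@net_pq(u, t) sent by p for q."""

    def __init__(self) -> None:
        self._pending: List[Tuple[int, str, Tuple[str, ...]]] = []

    def send(self, relation: str, args: Tuple[str, ...], t: int) -> None:
        """Peer p deposits m@net_pq(u, t), where t is p's local time."""
        heapq.heappush(self._pending, (t, relation, args))

    def relay_one(self) -> Optional[Message]:
        """Deliver to q the pending message with the minimal timestamp, if any."""
        if not self._pending:
            return None
        _, relation, args = heapq.heappop(self._pending)
        return (relation, args)


net = NetworkPeer()
net.send("photo", ("late.jpg",), 5)
net.send("photo", ("early.jpg",), 2)
net.send("comment", ("hello",), 3)

# Messages reach q one at a time, in the order of p's timestamps.
delivered = [net.relay_one(), net.relay_one(), net.relay_one(), net.relay_one()]
```

Because only p's own clock is used for ordering, this keeps messages from p to q in sending order without requiring a global clock.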

Summary of Webdamlog

• Peers asynchronously run their own datalog programs
• They interact by exchanging facts and delegating rules

Some things to look at:
• Evaluation and optimization of queries
• Acquisition of new rules
• Reasoning with social information (trust, provenance, etc.)