May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of...

15
May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John Mylopoulos Univ. of Toronto Luciano Serafini Univ. of Trento Ilya Zaihrayeu Univ. of Trento Data Management for Peer-to-Peer Computing: A Vision

Transcript of May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of...

Page 1: May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John.

May 28, 2002 P2P Databases 1

Philip A. Bernstein Microsoft Research

Fausto Giunchiglia Univ. of Trento

Anastasios Kementsietsidis Univ. of Toronto

John Mylopoulos Univ. of Toronto

Luciano Serafini Univ. of Trento

Ilya Zaihrayeu Univ. of Trento

Data Management for Peer-to-Peer Computing:

A Vision

Page 2: May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John.

May 28, 2002 P2P Databases 2

A Peer to Peer (P2P) Research Project Database and AI researchers intermittently connect to

exchange research ideas … about P2P DBs

SeattleSeattle TorontoTorontoTrento (Italy)Trento (Italy)

Page 3: May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John.

May 28, 2002 P2P Databases 3

Is There a Role for P2P DBs ?

Peers come and go, but must still be able to interoperate.

To us, the big question is how to cope with DBs that are incomplete, overlapping, and mutually inconsistent dynamically appear and disappear have limited connectivity.

Scenario Databases of medical patients Complete integration is likely to be infeasible But dynamic integration of DBs relevant to one patient

could have high value.

Page 4: May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John.

May 28, 2002 P2P Databases 4

Contributions

Why P2P databases are different

A P2P database scenario

A logic for P2P databases

Architecture and implementation issues

Page 5: May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John.

May 28, 2002 P2P Databases 5

A Model for P2P Databases Each peer is a node with a database. It exchanges data

and services with acquaintances (i.e. other peers).

The set of acquaintances changes often, due to site availability changing usage patterns

Peers are fully autonomous. No global control or central server.

Hence no global schema It would be impractical to build one for each peer And it might be impossible because of inconsistencies

Page 6: May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John.

May 28, 2002 P2P Databases 6

A Motivating Scenario A patient may be described

in several DBs, which use different patient id formats, disease descriptions, etc.

When a patient is admitted to the hospital, H becomes acquainted with D

The acquaintance is dropped when treatment is over

When the doctor prescribes a drug, D becomes acquainted with P

A patient is injured skiing, so more DBs get involved

H: HospitalP: Pharmacist

D: Doctor

Ski Clinic

Page 7: May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John.

May 28, 2002 P2P Databases 7

Proposal: Local Relational Model (LRM)

A logic for P2P data integration

Instead of a global schema, each peer has coordination formulas – each specifies semantic

interdependencies between two acquaintances binary domain relations – each specifies how

symbols in one database translate to symbols in an acquaintance’s database.

Each expression in a coordination formula is relative to just one participating database

Use coordination formulas and domain relations for query and update processing.

Page 8: May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John.

May 28, 2002 P2P Databases 8

A Coordination Formula p: pharmacist DB

medication(PrescriptionID, Pid, Prod)

d: doctor DB

treatment(Tid, Pid, Description, Type)

where type {“hospital”, “home”}. (i:x).A(x) means

for all v in the domain of database i, A(v) is true.

A coordination formula

(p:y).(p:z).(p: (x).medication(x, y, z) d: (w).treatment(w, y, z, “home”) ) “There’s a row in treatment in the doctor DB

for each row in medication in the pharmacist DB”

Page 9: May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John.

May 28, 2002 P2P Databases 9

Domain Relation

A row <d1,d2> in domain relation rik specifies that value d1 in DBi corresponds to value d2 in DBk

rik may be partial

rik,rki need not be symmetric

Example - DBi contains lengths in meters and

DBk in kilometers (total but not symmetric)

rik(x) = roundToClosestK(x) rik(653)=1, rik(453)=0

rki(x) = x*1000 rki(1)=1000

Page 10: May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John.

May 28, 2002 P2P Databases 10

Queries A query is a coordination formula of the form A(x) i: q(x), where A(x) is a coordination formula x has n variables i is the database against which the query is posed q is a new n-ary predicate symbol

A relational space is a pair <db,r> where db is a set of DBs and r associates an rik with each pair of DBs

<db,r> ⊨ A relational space <db,r> satisfies a coordination formula

The answer to a query:

{ddomi | <db,r> ⊨ ((i:x).A(x) i:x=d)}n

Page 11: May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John.

May 28, 2002 P2P Databases 11

Interpreting a Query

A query

((i:P(x) j:R(y)) k:S(x,y) ) h: q(x,y)

Evaluate P,R,S in i,j,k (respectively)

Map these results via rih,rjh,rkh to sets si,sj,sk

And then compute ((si sj) sk)

Page 12: May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John.

May 28, 2002 P2P Databases 12

View-Based Data Integration

The standard model for data integration is based on Global as view Local as view

How does this relate to LRM? Could use LRM to express view definitions

Either map “standard” view defns to LRM, or Specify a restricted LRM for view definitions

Could customize LAV/GAV query interpretation for such LRM views and queries

This is work in progress.

Page 13: May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John.

May 28, 2002 P2P Databases 13

Implementation Architecture A classic multi-database system, with

A protocol for adding/dropping acquaintances LRM query processing (domain mapping logic) that can

cope with chains of acquaintances Dynamic approach to materialized view creation

Tools to help a user establish an acquaintance (e.g. yellow pages, defining domain relations & coord formulas, ...)

Local Information Source

Wrapper

Query Mgr Update Mgr

User Interface

LRM layer

A Node

P2P Network

Page 14: May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John.

May 28, 2002 P2P Databases 14

Summary

Why P2P databases are different

A P2P database scenario

A logic for P2P databases (LRM) Coordination formulas and domain relations Query semantics

Architecture and implementation issues

Page 15: May 28, 2002 P2P Databases 1 Philip A. Bernstein Microsoft Research Fausto Giunchiglia Univ. of Trento Anastasios Kementsietsidis Univ. of Toronto John.

May 28, 2002 P2P Databases 15

Theoretical Results Provide inference rules for coordination formulas Prove that the rules are sound and complete. Define a generalized relational theory as a theory with

domain closure, distinct domain values, and a finite number of possible relation extensions (CWA).

Define relational multi-context system <T,R> as a family of relational languages (one per database) with a generalized relational theory (in T) and a set of coordination formulas (in R) for each one.

Prove that for any relational multi-context system, there’s a unique maximal relational space that satisfies it. (Generalizes Reiter’s result on CWA.)

Other results on recursive queries and query evaluation.