Data Management for Peer-to-Peer Computing: A Vision Ali Rahbari.

Outline

• P2P Data Networks

• Why P2P Databases are Different

• A P2P Database Scenario

• A logic for P2P Databases

• Propagation Strategy

• Architecture and Implementation Issues

P2P Data Networks: Basic Notions

• Node

– Database, File System, etc

• P2P network

– Indexed nodes with equal participant rights

• Services

– Query answering

– Query, results and update propagation

• Locality

– No global schema, no centralized control

– Nodes have only a partial vision of the world

• Autonomy

– Nodes are largely independent in their choice of language, content, etc.

Roles for P2P DBs?

• Peers come and go, but must still be able to interoperate.

• To us, the big question is how to cope with DBs that

– are incomplete, overlapping, and mutually inconsistent

– dynamically appear and disappear

– have limited connectivity.

• Scenario

– Databases of medical patients

– Complete integration is likely to be infeasible

– But dynamic integration of DBs relevant to one patient could have high value.

A Model for P2P Databases

• Each peer is a node with a database. It exchanges data and services with acquaintances (i.e. other peers).

• The set of acquaintances changes often, due to

– site availability

– changing usage patterns

• Peers are fully autonomous.

– No global control or central server.

Legend: H = Hospital, P = Pharmacist, D = Doctor

A Motivating Scenario

A patient may be described in several DBs, but those DBs can use different patient ID formats, disease descriptions, etc.

1. When a patient is admitted to the hospital, H becomes acquainted with D

2. The acquaintance is dropped when treatment is over

3. When the doctor prescribes a drug, D becomes acquainted with P

4. A patient is injured skiing, so more DBs (such as a ski clinic's) get involved
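The scenario can be mimicked with a small data structure. The following is only a minimal sketch: the `Peer` class, its method names, the choice to make acquaintance mutual, and the decision that H is the peer that meets the ski clinic (called "S" here) are all assumptions made for illustration, not part of the proposal.

```python
class Peer:
    """A peer with a name and a changing set of acquaintances (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.acquaintances = set()

    def acquaint(self, other):
        # Assumption of this sketch: acquaintance is recorded on both sides.
        self.acquaintances.add(other.name)
        other.acquaintances.add(self.name)

    def drop(self, other):
        self.acquaintances.discard(other.name)
        other.acquaintances.discard(self.name)


hospital, doctor, pharmacist, ski_clinic = (Peer(n) for n in ("H", "D", "P", "S"))

hospital.acquaint(doctor)      # 1. patient admitted: H meets D
doctor.acquaint(pharmacist)    # 3. drug prescribed: D meets P
hospital.acquaint(ski_clinic)  # 4. skiing injury: another DB joins (pairing assumed)
hospital.drop(doctor)          # 2. treatment over: H and D part ways
```

No peer ever needs a global list of participants; each one only tracks its own current acquaintances.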

Proposal: Local Relational Model (LRM)

• A logic for P2P data integration

• Instead of a global schema, each peer has

– coordination formulas: each specifies semantic interdependencies between two acquaintances

– binary domain relations: each specifies how symbols in one database translate to symbols in an acquaintance’s database

• Each expression in a coordination formula is relative to just one participating database

• Use coordination formulas and domain relations for query and update processing.

A Coordination Formula

• p: pharmacist DB

medication(PrescriptionID, PatientID, Prod)

• d: doctor DB

treatment(TreatmentID, PatientID, Description, Type)

where Type ∈ {“hospital”, “home”}

• ∀(i:x).A(x) means for all x in the domain of database i, A(x) is true.

• A coordination formula:

∀(p:y).∀(p:z).( p: ∃(x).medication(x, y, z) → d: ∃(w).treatment(w, y, z, “home”) )

“There’s a row in treatment in the doctor DB for each row in medication in the pharmacist DB”
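To make the reading of the formula concrete, here is a minimal sketch that checks it over two tiny in-memory tables. The sample rows and the helper name `violations` are hypothetical, and the sketch assumes the domain relations between p and d are the identity (patient IDs and product names are spelled the same in both databases).

```python
# Pharmacist DB p: medication(PrescriptionID, PatientID, Prod)
medication = [
    ("rx1", "pat7", "aspirin"),
    ("rx2", "pat9", "ibuprofen"),
]

# Doctor DB d: treatment(TreatmentID, PatientID, Description, Type)
treatment = [
    ("t1", "pat7", "aspirin", "home"),
    ("t2", "pat9", "ibuprofen", "hospital"),
]


def violations(medication, treatment):
    """Return the (PatientID, Prod) pairs in p that lack a matching "home" row in d."""
    home = {(pat, desc) for (_tid, pat, desc, typ) in treatment if typ == "home"}
    return [(pat, prod) for (_rx, pat, prod) in medication if (pat, prod) not in home]


missing = violations(medication, treatment)
```

Here `missing` contains the pair for `pat9`, because the only matching treatment row has type “hospital”; the formula holds between the two databases exactly when the list is empty.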

Domain Relation

• A row <d1,d2> in domain relation r_ik specifies that value d1 in DB_i corresponds to value d2 in DB_k

• r_ik may be partial

• r_ik and r_ki need not be symmetric

• Example: DB_i contains lengths in meters and DB_k in kilometers (total but not symmetric)

– r_ik(x) = roundToClosestK(x), e.g. r_ik(653) = 1, r_ik(453) = 0

– r_ki(x) = x*1000, e.g. r_ki(1) = 1000
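The meters/kilometers example can be written out directly. This is a small sketch; the function names `r_ik` and `r_ki` simply mirror the two domain relations, and rounding half up is an assumed reading of "round to closest kilometer".

```python
import math


def r_ik(meters):
    """Map a length in meters (DB i) to the closest whole kilometer (DB k)."""
    return math.floor(meters / 1000 + 0.5)


def r_ki(kilometers):
    """Map a length in kilometers (DB k) back to meters (DB i)."""
    return kilometers * 1000


round_trip = r_ki(r_ik(653))  # goes 653 -> 1 -> 1000, so the value is not recovered
```

Both mappings are total, yet translating to the other database and back does not return the original value, which is what "not symmetric" means here.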

Queries

• A query is a coordination formula of the form A(x) → i: q(x), where

– A(x) is a coordination formula

– x has n variables

– i is the database against which the query is posed

– q is a new n-ary predicate symbol

• A relational space is a pair <db,r>, where db is a set of DBs and r associates a domain relation r_ik with each pair of DBs

• <db,r> ⊨ f means that the relational space <db,r> satisfies the coordination formula f

• The answer to a query:

{d ∈ dom_i | <db,r> ⊨ (∃(i:x).A(x) ∧ i:x=d)}

Interpreting a Query

• A query:

((i:P(x) ∧ j:R(y)) ∨ k:S(x,y)) → h:q(x,y)

• Evaluate P, R, S in i, j, k (respectively)

• Map these results via r_ih, r_jh, r_kh to sets s_i, s_j, s_k

• And then compute ((s_i × s_j) ∪ s_k)
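The three steps (evaluate locally, map through the domain relations, combine at h) can be sketched as follows. Several things here are assumptions made for illustration: the sample data, the helper names `map_values` and `map_pairs`, the representation of a domain relation as a set of (source, target) pairs, and the reading of the query body as a conjunction of P and R followed by a disjunction with S, which yields a cross product followed by a union.

```python
from itertools import product


def map_values(values, relation):
    """Translate a set of values through a domain relation of (source, target) pairs."""
    return {t for (s, t) in relation if s in values}


def map_pairs(pairs, relation):
    """Translate each component of every pair through the same domain relation."""
    return {(tx, ty)
            for (x, y) in pairs
            for (sx, tx) in relation if sx == x
            for (sy, ty) in relation if sy == y}


# Step 1: results of evaluating P, R, S locally in databases i, j, k.
P_i = {1, 2}
R_j = {"a"}
S_k = {(30, "c")}

# Domain relations from i, j, k into the querying database h.
r_ih = {(1, 10), (2, 20)}
r_jh = {("a", "A")}
r_kh = {(30, 30), ("c", "C")}

# Step 2: map the local results into h's vocabulary.
s_i = map_values(P_i, r_ih)
s_j = map_values(R_j, r_jh)
s_k = map_pairs(S_k, r_kh)

# Step 3: combine at h (assumed: conjunction -> cross product, disjunction -> union).
answer = set(product(s_i, s_j)) | s_k
```

Note that a value with no row in the relevant domain relation simply drops out in step 2, which is how a partial domain relation limits what h can see.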

P2P Databases: Proposed Solution

Coordinate query and update exchange between autonomous DBs using:

• Coordination Formulas

– Specify semantic interdependencies between data from two nodes

table to table: Cust corresponds to Customer

column to column: name(Cust) corresponds to nm(Customer)

• Binary Domain Relations

– Specify how the symbols used in one database translate to symbols used in another database

‘one’ translates to ‘uno’

CAN$1.00 translates to US$0.65

• Keep AUTONOMY and COORDINATION, as much as possible

What’s New in the Solution?

• No global schema, no central registry, no form of control

• No need for system restructuring when new nodes arrive and old ones leave

• We do not integrate, we COORDINATE.

– Integration is built at design time

– Coordination happens at runtime

Propagation Strategy: Basic notions

• Acquaintance

– Pair of nodes that have coordination formulas and binary domain relations with respect to each other

– Acquaintances can exchange data and services

• Interest Group

– Set of mutually acquainted nodes that have related content

• Group Manager (GM)

– Node of an Interest Group that is dedicated to group management and query propagation management

– GM has higher stability requirements: it must be permanently active

• Query Scope (QS)

– Set of nodes that are supposed to answer a given query. The Query Scope is defined by the Group Manager


Query Propagation Strategy

1. User submits query Q

2. Node defines the query topic

3. Node sends the Group Manager (GM) a request to define the Query Scope (QS)

4. GM computes QS and sends it back

5. Node 1 sends the query to its acquaintances in QS and reports this to GM

6. Nodes 2 and 4 send their answers to node 1

7. Nodes propagate the query to their acquaintances in QS and report this to GM

8. And so on…

9. Nodes that do not propagate any further report this to GM

10. Propagation stops when “no more propagation” has been received from all boundary nodes

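[Figure: example propagation over a network of nodes 1 to 11 with a Group Manager (GM). Labelled messages: 1. query Q is submitted at node 1; 2. Q is tagged with its topic; 3. node 1 asks GM for the QS of (Q, topic); 4. GM replies QS = (2, 4, 6, 8, 9, 11); 5. node 1 reports “nodes 2 and 4 are reached”; results Res2 and Res4 flow back to node 1; later reports to GM: “node 6 is reached”, “node 8 is reached”, “no more propagation from 8”, “no more propagation from 9”.]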
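The propagation steps can be simulated in a few lines. This is a minimal sketch under stated assumptions: the function name `propagate`, the acquaintance topology, and the breadth-first order are hypothetical; only the Query Scope (2, 4, 6, 8, 9, 11) and the two kinds of report sent to the GM are taken from the example.

```python
from collections import deque


def propagate(acquaintances, scope, origin):
    """Simulate query propagation; return (nodes reached, reports received by the GM)."""
    reports = []          # what the Group Manager hears, in order
    reached = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        # Forward only to acquaintances inside the Query Scope that are not yet reached.
        targets = sorted(n for n in acquaintances.get(node, ())
                         if n in scope and n not in reached)
        if targets:
            reached.update(targets)
            queue.extend(targets)
            reports.append((node, "reached", tuple(targets)))
        else:
            reports.append((node, "no more propagation", ()))
    return reached, reports


# Hypothetical acquaintance links; node 3 is acquainted with node 1 but lies outside QS.
acquaintances = {1: [2, 3, 4], 2: [6], 3: [5], 4: [8, 9], 6: [11]}
query_scope = {2, 4, 6, 8, 9, 11}

reached, reports = propagate(acquaintances, query_scope, origin=1)
boundary = [node for (node, kind, _) in reports if kind == "no more propagation"]
```

In this run the first report is that node 1 reached nodes 2 and 4, and the loop ends once the boundary nodes 8, 9 and 11 have each reported “no more propagation”, which mirrors the stopping condition in step 10.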

Implementation Architecture

• A classic multi-database system, with

– A protocol for adding/dropping acquaintances

– LRM query processing (domain mapping logic) that can cope with chains of acquaintances

– Dynamic approach to materialized view creation

• Tools to help a user establish an acquaintance

Architecture

• P2P Layer

– Add-on that provides the P2P functionality

• Local Data Source (LDS)

– Database

– File system

• User Interface

– User queries

– Results

• Query Manager (QM) and Update Manager (UM)

– Responsible for query and update propagation

– Manage coordination and correspondence rules, acquaintances, and interest groups

• Wrapper

– Provides a translation layer between the QM and UM on one side and the LDS on the other
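The layering can be illustrated with a toy example in which a Query Manager never touches the Local Data Source directly. Everything named here (`DictSource`, `Wrapper.select`, `QueryManager.answer`, the sample row) is hypothetical and only meant to show where the translation layer sits, not how the actual system is built.

```python
class DictSource:
    """Stand-in Local Data Source: one table held as a list of dicts."""

    def __init__(self, rows):
        self.rows = rows


class Wrapper:
    """Translates attribute=value requests into a scan of the underlying source."""

    def __init__(self, source):
        self.source = source

    def select(self, **conditions):
        return [row for row in self.source.rows
                if all(row.get(key) == value for key, value in conditions.items())]


class QueryManager:
    """Answers local queries through the wrapper only (propagation is omitted here)."""

    def __init__(self, wrapper):
        self.wrapper = wrapper

    def answer(self, **conditions):
        return self.wrapper.select(**conditions)


lds = DictSource([{"PatientID": "pat7", "Prod": "aspirin"}])
qm = QueryManager(Wrapper(lds))
result = qm.answer(PatientID="pat7")
```

Swapping the database for a file system would then only require a different wrapper, leaving the Query Manager unchanged.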

Summary

• Why P2P databases are different

• A P2P database scenario

• A logic for P2P databases (LRM)

– Coordination formulas and domain relations

– Query semantics

• Architecture and implementation issues
