Term 2, 2004, Lecture 9, Distributed Databases Marian Ursu, Department of Computing, Goldsmiths College

Distributed databases

3

Outline

generalities
objectives
problems

1. Generalities

Introduction

[Figure: applications at several sites connect to servers linked by a communication network; each server is a DBMS in its own right]

Introduction

distributed database = collection of connected sites

each site is a DB in its own right (1)
  • has its own DBMS and its own users
  • operations can be performed locally as if the DB was not distributed

the sites collaborate (transparently from the user’s point of view)

the union of all DBs = the DB of the whole organisation (institution)
  • (as opposed to (1))

physical or logical distribution

strict homogeneity (assumption)

Motivation

advantages
  matches the structure of the organisation
    • example
  efficiency of processing
    • data is stored close to where it is being used
  increased accessibility
    • remote DBs can be accessed

disadvantage
  complexity

Implementations (systems)

commercial
  ORACLE (Oracle Corporation)
  INGRES/STAR (Ask Group Inc. Ingres Division)
  DB2 (IBM)

they all provide some sort of features for distributed databases

Fundamental principle

a distributed DB system should look to the user exactly like a non-distributed DB system

2. Objectives

Objectives

local autonomy

no reliance on central site

location independence

fragmentation independence

replication independence

distributed query processing

distributed transaction management

Objectives are:

not independent from each other
not exhaustive
sometimes contradicting
different degree of importance (for the user)

Local autonomy

all operations at a certain site are fully controlled by that site

  not achievable (why?)
  therefore, autonomy should be achieved to the maximum extent possible

local data is locally owned and managed
  local data belongs to the local server even if it is accessible from other servers
  security, integrity, ..., are the responsibility of the local server

No reliance on a central site

reasons
  bottleneck
  vulnerability

conclusion
  all sites must be equal

Location independence

users should not have to know where data is physically stored

why do you think this is needed?
  • think of application programs

what does this objective look like?

Data fragmentation

data fragmentation
  a relation can be divided into “fragments” for storage purposes
  motivation: performance - data is stored where it is most used

definition
  fragment = any subrelation derivable via restriction or projection

Data fragmentation - example

FRAGMENT Emp INTO
    Lo_Emp AT SITE ‘London’ WHERE Dept_id = ‘Sales’
    Le_Emp AT SITE ‘Leeds’ WHERE Dept_id = ‘Dev’ ;
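The fragmentation of Emp can be mimicked in a few lines of Python. This is only a sketch: the rows, the `emp_id` and `name` attributes, and the helper names `restrict` and `reconstruct` are made up for illustration; only `Dept_id`, the fragment names, and the two sites come from the slide. It shows that each fragment is a restriction of Emp and that the union of the fragments gives back the original relation.

```python
# Hypothetical Emp rows; only Dept_id and the site names come from the slide.
EMP = [
    {"emp_id": 1, "name": "Ann", "Dept_id": "Sales"},
    {"emp_id": 2, "name": "Bob", "Dept_id": "Dev"},
    {"emp_id": 3, "name": "Cy", "Dept_id": "Sales"},
]


def restrict(relation, predicate):
    """Relational restriction: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# Each fragment is a restriction of Emp, stored at its own site.
fragments = {
    "London": restrict(EMP, lambda r: r["Dept_id"] == "Sales"),  # Lo_Emp
    "Leeds": restrict(EMP, lambda r: r["Dept_id"] == "Dev"),     # Le_Emp
}


def reconstruct(frags):
    """Rebuild the logical relation as the union of its horizontal fragments."""
    rows = [row for site_rows in frags.values() for row in site_rows]
    return sorted(rows, key=lambda r: r["emp_id"])


lo_emp = fragments["London"]
le_emp = fragments["Leeds"]
rebuilt = reconstruct(fragments)
```

Fragments obtained by projection would instead be recombined with a join on a shared key; that case is not shown here.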

Fragmentation independence / transparency

users should perceive data as if it were not fragmented

why?

it is the optimiser’s responsibility to determine which fragments need to be physically accessed

similar to views
  retrieving
  updating (JOIN and UNION views)
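One way to picture the optimiser's job is as fragment pruning: the user queries Emp, and the system works out which fragments can contain matching rows. The sketch below assumes a fragmentation schema keyed on `Dept_id`, as in the Emp example; the `FRAGMENTS` table and the function `fragments_to_access` are hypothetical names, not part of any real DBMS.

```python
# Fragmentation schema: fragment name -> (site, Dept_id value it holds).
# Values follow the Emp example; the structure itself is an assumption.
FRAGMENTS = {
    "Lo_Emp": ("London", "Sales"),
    "Le_Emp": ("Leeds", "Dev"),
}


def fragments_to_access(dept_id=None):
    """Return the (fragment, site) pairs a query on Emp must touch.

    dept_id=None models a query with no restriction on Dept_id.
    """
    return sorted(
        (name, site)
        for name, (site, dept) in FRAGMENTS.items()
        if dept_id is None or dept == dept_id
    )


only_dev = fragments_to_access("Dev")   # one fragment is enough
everything = fragments_to_access()      # both fragments are needed
```

The user writes the same query on Emp in both cases; only the set of sites contacted changes.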

Data replication

copies of the same fragment can exist at different sites

reasons
  better availability
  better performance

disadvantage
  update propagation

Replication independence / transparency

users should not have to be aware of data replication

it is the optimiser’s responsibility to choose which replica to use

commercial systems
  no full support for replication independence (update problems) - primary copy
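Choosing a replica can be sketched as picking the cheapest copy that is currently reachable. The function name `choose_replica`, the cost figures, and the idea of a single numeric cost per site are assumptions made for illustration; a real optimiser would use a much richer cost model.

```python
def choose_replica(replicas, costs, available):
    """Pick the available replica site with the lowest access cost."""
    candidates = [site for site in replicas if site in available]
    if not candidates:
        raise LookupError("no replica is currently available")
    return min(candidates, key=lambda site: costs[site])


# Hypothetical access costs as seen from the site issuing the query.
costs = {"London": 1, "Leeds": 5}
normal = choose_replica(["London", "Leeds"], costs, {"London", "Leeds"})
london_down = choose_replica(["London", "Leeds"], costs, {"Leeds"})
```

The second call illustrates the availability argument for replication: the query still succeeds when the preferred site is down.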

Distributed query processing

the system must have set-level operators
  one record at a time - too many messages (traffic)
  relational - indicated

optimisation
  particularly relevant!
  find best way to move data across the network
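The traffic argument can be made concrete with a rough message count. The sketch assumes a simple model in which every remote request costs one message and every reply costs one message; the function names are hypothetical.

```python
def messages_record_at_a_time(matching_rows):
    """One request and one reply for every row fetched from the remote site."""
    return 2 * matching_rows


def messages_set_level(matching_rows):
    """One request carrying the whole query, one reply carrying the result set."""
    return 2


rows = 1000  # hypothetical number of matching rows at the remote site
record_level = messages_record_at_a_time(rows)
set_level = messages_set_level(rows)
```

Under this model the record-at-a-time interface grows linearly with the result size, while the set-level (relational) interface stays constant.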

3. Problems

Problems

occur due to network utilisation

aim: minimise network utilisation

query processing

catalogue management

update propagation

recovery control

concurrency control

Query processing

in a distributed environment
  query execution is distributed
  query optimisation is distributed
    • global optimisation
    • local optimisation

example
  • query on relation R issued at site X
  • part of R, say Ry, stored at Y
  • part of R, say Rz, stored at Z
  • where is the query going to be executed?
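The question in the example can be explored by costing two candidate strategies. The sketch below assumes a simplified cost model (communication time = number of messages × access delay + data volume / data rate) and invented figures for delay, rate, row size, and row counts; none of these numbers come from the lecture.

```python
def comm_time(messages, bytes_moved, delay_s=0.1, rate_bytes_per_s=50_000):
    """Communication time = messages * access delay + data volume / data rate."""
    return messages * delay_s + bytes_moved / rate_bytes_per_s


def ship_fragments_to_x(ry_rows, rz_rows, row_bytes):
    """Strategy A: copy Ry and Rz to X in full and evaluate the query at X."""
    # One request and one reply per remote site: 4 messages in total.
    return comm_time(4, (ry_rows + rz_rows) * row_bytes)


def execute_at_y_and_z(ry_hits, rz_hits, row_bytes):
    """Strategy B: evaluate the query at Y and Z, ship only matching rows to X."""
    return comm_time(4, (ry_hits + rz_hits) * row_bytes)


# Hypothetical workload: 10,000 rows per fragment, 10 matching rows in each.
cost_a = ship_fragments_to_x(10_000, 10_000, row_bytes=100)
cost_b = execute_at_y_and_z(10, 10, row_bytes=100)
best = "execute at Y and Z" if cost_b < cost_a else "ship fragments to X"
```

With these figures, executing the query where the fragments live is far cheaper, but the answer depends on the selectivity of the query, which is why optimisation has to be done globally as well as locally.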

Catalogue management

what ‘other’ data does the catalogue include?
  fragmentation, replication ...

where should the catalogue be stored?
  centralised
  fully replicated
    • loss of autonomy - update propagation!
  partitioned
    • non-local operations - very expensive!
  combination of first and third

Central Catalogue

all updates, including local updates, have to be recorded in the central catalogue

disadvantages:
  bottleneck
  conflicts with the “no reliance on a central site” objective

Fully Replicated Catalogue

the entire database catalogue (not only the local one) is stored at each site

every time an update is made, it has to be recorded at each site

disadvantages
  loss of local autonomy
  time and network traffic consuming updates

Update propagation

problems because of replication
  data might become less available

primary copy scheme
  one copy is designated primary copy (unique)
  primary copies exist at different sites (distributed)
  an update is logically complete if the primary copy has been updated
    • the site holding the primary copy would have to propagate the updates
  violation of local autonomy
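The primary copy scheme can be sketched with a toy model. Everything here is illustrative: the classes `Site` and `PrimaryCopyItem`, the `up` flag that models site availability, and the `stale` list are assumptions, not a description of how any particular DBMS implements the scheme.

```python
class Site:
    def __init__(self, name):
        self.name = name
        self.data = {}   # item -> value held in this site's copy
        self.up = True   # False models an unavailable site


class PrimaryCopyItem:
    """A replicated item with one designated primary copy."""

    def __init__(self, item, primary, replicas):
        self.item = item
        self.primary = primary
        self.replicas = replicas
        self.stale = []  # replica sites that still hold an old value

    def update(self, value):
        # The update needs the primary copy; this is where autonomy is lost.
        if not self.primary.up:
            raise RuntimeError("primary copy unavailable")
        self.primary.data[self.item] = value  # logically complete here
        self.stale = list(self.replicas)
        self.propagate()

    def propagate(self):
        """The primary's site pushes the current value to reachable replicas."""
        if not self.stale:
            return
        current = self.primary.data[self.item]
        remaining = []
        for site in self.stale:
            if site.up:
                site.data[self.item] = current
            else:
                remaining.append(site)
        self.stale = remaining


london, leeds, york = Site("London"), Site("Leeds"), Site("York")
leeds.up = False                      # one replica is down
x = PrimaryCopyItem("x", london, [leeds, york])
x.update(42)                          # succeeds: the primary is available
leeds_before = leeds.data.get("x")    # replica at Leeds not yet updated
leeds.up = True
x.propagate()                         # Leeds catches up later
```

The update succeeds even though Leeds is down, but a site holding only a secondary copy cannot update its own data while London is unreachable, which is the autonomy violation noted above.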

Concurrency control

locking overhead - increased number of messages

primary copy strategy
  locking only the primary copy
  the primary copy’s site will propagate the update
  loss of autonomy (severely)

global deadlock
  two interlocked (waiting for each other) sites
  cannot be detected using a single site’s local wait-for graph - therefore, communication overhead
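A small sketch shows why a global deadlock is invisible locally. The transaction names and the two edge lists are invented; each edge `(waiter, holder)` means that the first transaction waits for a lock held by the second at that site.

```python
def has_cycle(edges):
    """Detect a cycle in a wait-for graph given as (waiter, holder) edges."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)
        graph.setdefault(holder, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node already on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)


# Hypothetical local wait-for graphs at two sites.
site_x = [("T2", "T1")]  # at X, T2 waits for a lock held by T1
site_y = [("T1", "T2")]  # at Y, T1 waits for a lock held by T2

local_x = has_cycle(site_x)
local_y = has_cycle(site_y)
global_view = has_cycle(site_x + site_y)
```

Neither local graph contains a cycle; the deadlock only appears once the graphs are combined, and combining them requires the sites to exchange messages.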

Conclusion

generalities
objectives – in brief
problems – in brief