Unit 1 DD Overview

Distributed Databases : An Overview Unit-1

Transcript of Unit 1 DD Overview

Page 1: Unit 1 DD Overview

Distributed Databases: An Overview

Unit-1

Page 2: Unit 1 DD Overview

Contents

UNIT – I

Chapter 1.
1.0 What is a Distributed Database [DDB]
1.1 Features of Distributed versus Centralized Databases

Chapter 3. Levels of Distribution Transparency
3.1 Reference Architecture for Distributed Databases
3.2 Types of Data Fragmentation
3.6 Integrity Constraints in Distributed Databases

Book-1: Distributed Databases, by Stefano Ceri, Giuseppe Pelagatti, Tata McGraw-Hill edn 2008: 1.1; 3.1, 3.2, 3.6

Page 3: Unit 1 DD Overview

1.1 Features of Distributed versus Centralized Databases

What is a Distributed Database [DDB]?

A simple definition: A collection of data which belongs to the same enterprise, spread over the sites of a computer network.

The two important aspects of a DDB are:
• Distribution [of data]: in a centralized database, by contrast, data is at a single site [host].
• Logical Correlation: how exactly the data at the different sites are related.

Illustration of DDB through example:

Page 4: Unit 1 DD Overview

Different Scenarios of DB applications

Personal Computer
• One DB application
• One computer

Page 5: Unit 1 DD Overview

• One/more application(s) on a single computer with multiple [dumb] terminals / users

Different Scenarios of DB applications

Page 6: Unit 1 DD Overview

• Multiple networked computers, each with its own local DB, local application and local users

Different Scenarios of DB applications

Page 7: Unit 1 DD Overview

• Multiple networked computers, each with its own local DB and local users, with a global application accessing data from these sites

Different Scenarios of DB applications

Page 8: Unit 1 DD Overview

• Multiple networked computers each with its own local DB and local users with multiple global applications, each accessing data from these multiple sites

Different Scenarios of DB applications

Page 9: Unit 1 DD Overview

Example 1

• A bank with 3 branches at different locations. At each branch, a computer controls the teller terminals of the branch and the account database of the branch.
• Each branch with its local database constitutes one site of the distributed database.
• The computers are connected by a communication network.
• Each site handles only local applications: operations requested from a terminal to access the DB of that branch.

Does the logical correlation property hold here? Should this be considered an example of a DDB or a set of local DBs?

A global application, e.g. an application that transfers funds from one site to another, is what makes this a DDB.
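The fund-transfer application mentioned in Example 1 could be sketched roughly as below. This is only an illustration: each branch "site" is simulated by an in-memory SQLite database, and the table name `account`, the helper names and the account ids are all assumptions, not part of the course material. The two sites are committed one after the other, which is not an atomic commitment protocol; a real DDBMS needs more than this to survive a failure between the two commits.

```python
import sqlite3


def make_branch(balances):
    """Create one simulated site: an in-memory DB holding one branch's accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", list(balances.items()))
    conn.commit()
    return conn


def balance(site, account_id):
    """Local application: read one balance at one site."""
    row = site.execute(
        "SELECT balance FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]


def transfer(src_site, src_id, dst_site, dst_id, amount):
    """Global application: debit at one site and credit at another.

    Returns True on success. On any error both sites are rolled back.
    """
    try:
        if balance(src_site, src_id) < amount:
            raise ValueError("insufficient funds")
        src_site.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, src_id),
        )
        dst_site.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?",
            (amount, dst_id),
        )
        # Simplification: sequential commits, no atomic commit protocol.
        src_site.commit()
        dst_site.commit()
        return True
    except Exception:
        src_site.rollback()
        dst_site.rollback()
        return False
```

The functions `balance` and `make_branch` touch a single site, so they correspond to local applications; only `transfer` needs data from two sites, which is the property that makes the system a DDB in the sense of Example 1.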

Page 10: Unit 1 DD Overview

Example 2

• Same as the previous Example 1.
• Now the computers and their respective DBs have been moved from the branches to a common building and are connected with a high-bandwidth local network.
• Tellers are connected to their respective computers by telephone lines.
• Each processor and its DB constitute a site of the local computer network.

Should this be considered an example of a DDB or a set of local DBs?

Fig 1.2

Same as Example 1 except for the geographical distribution of the computers.

What are the major differences between the two from the viewpoint of functioning and performance?

Page 11: Unit 1 DD Overview

Example 3

• Here the data of the different branches are distributed on three “backend” computers, which perform the DBMS functions.
• The application programs are executed by a different computer [front-end], which requests database access services from the backends when necessary.

Computer Center

Fig 1.3 A multiprocessor System

Should this be considered an example of a DDB or a set of local DBs? No. Though the data is distributed, its distribution is not relevant from the application point of view. What is missing here is the local application.

Page 12: Unit 1 DD Overview

1.1 Features of Distributed versus Centralized Databases

From the examples we can arrive at the following working definition of a Distributed Database [DDB]:

A DDB is an integrated database which is built on top of a computer network rather than on a single computer. The data which constitute the database are stored at the different sites of the computer network, and the application programs which are run by the computers access data at different sites.

Page 13: Unit 1 DD Overview

Taxonomy of DDS

Page 14: Unit 1 DD Overview

Homogeneous Distributed Databases

In a homogeneous distributed database:
• All sites have identical software
• Sites are aware of each other and agree to cooperate in processing user requests
• Each site surrenders part of its autonomy in terms of the right to change schemas or software
• The system appears to the user as a single system

Page 15: Unit 1 DD Overview

Architecture of Homogeneous DDBMS

Page 16: Unit 1 DD Overview

Schema Architecture of a Homogeneous DDBMS

Page 17: Unit 1 DD Overview

Heterogeneous Distributed Databases

In a heterogeneous distributed database:
• Different sites may use different schemas and software
• Difference in schema is a major problem for query processing
• Difference in software is a major problem for transaction processing
• Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing
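To see why a difference in schema complicates query processing, consider the small sketch below. It assumes two hypothetical sites that store the same account data under different column names; the site names, column names and the mapping table are invented for illustration only. Before a global query can combine rows from both sites, each local row has to be translated into one common (global) schema.

```python
# Hypothetical mappings: local column name -> global column name, per site.
SCHEMA_MAPS = {
    "site1": {"acc_no": "account_id", "bal": "balance"},
    "site2": {"AccountNumber": "account_id", "CurrentBalance": "balance"},
}


def to_global(site, local_row):
    """Translate one local row (a dict) into the global schema."""
    mapping = SCHEMA_MAPS[site]
    return {mapping[column]: value for column, value in local_row.items()}


def global_scan(rows_by_site):
    """Combine rows from all sites after translating them to the global schema."""
    result = []
    for site, rows in rows_by_site.items():
        result.extend(to_global(site, row) for row in rows)
    return result
```

In a homogeneous system this translation step is unnecessary because all sites already share the same schema and software.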

Page 18: Unit 1 DD Overview

Overall Architecture of multidatabase Systems

Page 19: Unit 1 DD Overview

1. Distributed Database System

• Tightly Coupled
• Loosely Coupled

Page 20: Unit 1 DD Overview

Schema Architecture of Tightly-Coupled MDBS

• Advantages of Replication
– Availability: failure of a site containing relation r does not result in unavailability of r if replicas exist.
– Parallelism: queries on r may be processed by several nodes in parallel.
– Reduced data transfer: relation r is available locally at each site containing a replica of r.

– r_i = Π_{R_i}(r)
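The availability advantage of replication listed above can be illustrated with a short sketch. The classes and names here (`Site`, `SiteDown`, `read_relation`) are hypothetical, and site failure is simulated with a simple flag; the point is only that a read of relation r succeeds as long as at least one replica is reachable.

```python
class SiteDown(Exception):
    """Raised when a (simulated) site cannot be reached."""


class Site:
    def __init__(self, name, relations, up=True):
        self.name = name
        self.relations = relations  # relation name -> list of tuples
        self.up = up                # simulated availability flag

    def read(self, relation):
        if not self.up:
            raise SiteDown(self.name)
        return self.relations[relation]


def read_relation(relation, replica_sites):
    """Return (site name, tuples) from the first reachable replica."""
    for site in replica_sites:
        try:
            return site.name, site.read(relation)
        except SiteDown:
            continue  # try the next replica
    raise SiteDown(f"no replica of {relation!r} is available")
```

Trying the local site first in `replica_sites` would also reflect the "reduced data transfer" point: the remote replicas are only contacted when the local copy is unavailable.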

Page 21: Unit 1 DD Overview

1. Distributed Database System

• Loosely Coupled
• A distributed database system consists of loosely coupled sites that share no physical component
• Database systems that run on each site are independent of each other
• Transactions may access data at one or more sites

Page 22: Unit 1 DD Overview

Loosely Coupled MDBS with Export Schema

Page 23: Unit 1 DD Overview

Loosely Coupled MDBS with No Export Schema

Page 24: Unit 1 DD Overview

DBS Architectures

DBS-Architecture

Page 25: Unit 1 DD Overview

Features of a Centralized DB vs. DDBs

Page 26: Unit 1 DD Overview

Centralized vs. DDBs

Review:
• What is a centralized DB? Traditional databases
• What is a DDB?

Features that characterize a Centralized DB:
• Centralized Control
• Data independence
• Reduction of redundancy
• Complex physical structures and efficient access
• Integrity, Recovery and Concurrency Control
• Privacy and Security

Page 27: Unit 1 DD Overview

Centralized vs. DDBs: Centralized Control

CDB
• One-point control of the entire DB
• Single Database Administrator [DBA]

DDB
• Multi-point (source) control
• Global Database Administrator [GDBA] and Local Database Administrator [LDBA]
• “Site / Local Autonomy” decides the freedom of the local administrator

Page 28: Unit 1 DD Overview

Centralized vs. DDBs: Data Independence

What is data independence?
• The organization of data (physical storage of data in a DB) is transparent to the application developer

How is it achieved?
• Layered design / Levels of Abstraction
– Logical Level [conceptual design: schemas, tuples, attributes]
– Physical Level [how data is stored on the hard disk]

Benefit
• Application developers need not know how data is stored in the database

In CDB
• Allows the two layers to be designed independently
• How does this help? Each can be designed / changed independently of the other.

Page 29: Unit 1 DD Overview

Centralized vs. DDBs: Data Independence (contd.)

In DDB
• Also provides data independence, with an additional feature called Distribution Transparency
• Application programmers need not know:
– How data is stored, nor
– On which site it is stored.
• Thus, in addition to the traditional schemas
– Conceptual Schema
– Storage Schema
– External Schema
a DDB introduces further levels and schemata that describe how the data is distributed.

Page 30: Unit 1 DD Overview

Centralized vs. DDBs: Redundancy Reduction

In CDB
• Redundancy: repetition of data
• Reduced as much as possible for TWO reasons:
– To avoid inconsistencies
– To minimize the storage required
• It is one of the main concerns; normalization is used

In DDB
• Redundancy is allowed ...

Page 31: Unit 1 DD Overview

Centralized vs. DDBs: Redundancy Reduction (contd.)

In DDB
• Redundancy is allowed
• Reasons
– Faster access [local data can be accessed faster]
» Higher throughput
» Higher availability
» More fault tolerant
• Makes design, development and data modification complex.

Page 32: Unit 1 DD Overview

Centralized vs. DDBs: Complex Physical Structure & Efficient Access

In CDB
• Uses indexing, hashing, interfile chains and so on
• Purpose: faster / efficient access

In DDB
• Complex structures alone cannot solve access problems
• Efficient access is still an issue
• Complex structures at the local level alone [local optimization] are not enough. The network delays dominate the disk access delays.
• A global optimization is necessary; it includes a local optimization plan plus an additional “network access plan”
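The idea that a global plan combines local cost with a network access cost can be made concrete with a toy cost model. Everything below is an assumption made for illustration: the cost formula (local time plus message latency plus transfer time), the `Plan` fields and the example numbers are not taken from the course material or from any real optimizer.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    local_cost_ms: float  # disk access and local processing time
    bytes_shipped: int    # data moved between sites
    messages: int = 1     # number of network messages exchanged


def total_cost_ms(plan, latency_ms, bandwidth_bytes_per_ms):
    """Toy global cost: local cost plus network access cost."""
    network = plan.messages * latency_ms + plan.bytes_shipped / bandwidth_bytes_per_ms
    return plan.local_cost_ms + network


def choose_plan(plans, latency_ms, bandwidth_bytes_per_ms):
    """Pick the plan with the lowest combined local + network cost."""
    return min(plans, key=lambda p: total_cost_ms(p, latency_ms, bandwidth_bytes_per_ms))


# Hypothetical plans for the same query.
SHIP_WHOLE_RELATION = Plan("ship whole relation", local_cost_ms=5, bytes_shipped=1_000_000)
FILTER_AT_REMOTE_SITE = Plan("filter at remote site", local_cost_ms=20, bytes_shipped=10_000)
```

With an assumed latency of 50 ms and a bandwidth of 1000 bytes per ms, shipping the whole relation costs 5 + 50 + 1000 = 1055 ms while filtering remotely costs 20 + 50 + 10 = 80 ms. The plan with the higher local cost wins because the network term dominates, which is the point made on the slide.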

Page 33: Unit 1 DD Overview

Centralized vs. DDBs: Integrity, Recovery & Concurrency Control

In CDB
• Integrity: requires enforcing ACID properties
• Integrity in a concurrent environment
• Concurrency control: various protocols (two-phase, time-stamp, tree, etc.)
• Recovery: log-based approach, checkpointing, etc.

In DDB
• All of these are enforced
• Distribution of data makes these protocols more complex.

Page 34: Unit 1 DD Overview

Centralized vs. DDBs: Privacy & Security

In CDB
• The DBA ensures authorized access to data
• Also requires additional specialized control

In DDB
• Has similar problems, in addition to threats over the network
• Local autonomy helps the local DBA to enforce security
• Additional security measures are required for global / over-the-network threats.

Page 35: Unit 1 DD Overview

Why DDBS?
• Organizational & economic reasons
• Interconnection of existing DBs
• Incremental growth
• Reduced communication overhead
• Performance considerations
• Reliability & availability

None of these problems is new. Why, then, has the development of DDBSs taken this long?
• First, it had to wait for the development of inexpensive, powerful small computers
• Second, for want of the necessary network, middleware & DB technologies

Page 36: Unit 1 DD Overview

DDBMS: Distributed Database Management Systems

• They support the creation & maintenance of DDBSs
• They contain additional components which extend the capabilities of CDBMSs.

The typical software components are:
• The database management component (DB)
• The data communication component (DC): ODBC, JDBC, TCP/IP
• The data dictionary (DD): extended to include information about the distribution of data over the network (fragmentation schema & allocation schema)
• The distributed database component (DDB)
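The role of the extended data dictionary can be sketched as two small lookup tables: a fragmentation schema (which fragment a row belongs to) and an allocation schema (which sites store each fragment). The relation name `ACCOUNT`, the fragment names, the `branch` attribute and the site names below are all hypothetical and chosen to match the bank example used earlier.

```python
def in_branch_1(row):
    return row["branch"] == 1


def in_branch_2(row):
    return row["branch"] == 2


def in_branch_3(row):
    return row["branch"] == 3


# Fragmentation schema: global relation -> {fragment name: membership predicate}.
FRAGMENTATION = {
    "ACCOUNT": {
        "ACCOUNT_1": in_branch_1,
        "ACCOUNT_2": in_branch_2,
        "ACCOUNT_3": in_branch_3,
    }
}

# Allocation schema: fragment name -> sites holding a copy of it.
ALLOCATION = {
    "ACCOUNT_1": ["site1"],
    "ACCOUNT_2": ["site2"],
    "ACCOUNT_3": ["site3", "site1"],  # this fragment is replicated at two sites
}


def sites_for(relation, row):
    """Sites that store the fragment(s) a given row of a global relation belongs to."""
    sites = []
    for fragment, predicate in FRAGMENTATION[relation].items():
        if predicate(row):
            sites.extend(ALLOCATION[fragment])
    return sites
```

An application that calls something like `sites_for("ACCOUNT", {"branch": 3})` never names a site itself; the dictionary lookup does that for it, which is one way to picture distribution transparency.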

Page 37: Unit 1 DD Overview

Components of a commercial DDBMS

[Figure: two sites (Site 1, Site 2), each with DB, DC, DD and DDB components and a local database (Local database-1, Local database-2)]

Page 38: Unit 1 DD Overview

Components of a commercial DDBMS

Services supported by the above systems:
• Remote database access by an application: RPC, ODBC, JDBC, TCP/IP, named pipes
• Some degree of distribution transparency
• Support for database administration & control
• Some support for concurrency control

Page 39: Unit 1 DD Overview

Assignment-1

1. List all the key words introduced in this chapter and write a brief definition/explanation for each of them.
2. Select any TWO commercial DBMSs of your choice and describe their salient features as DDBMSs.

DUE: next week, the same hour.

Questions:
1. What are the different types of DDBS? Explain them briefly.
2. What are the major differences between CDB & DDB? Explain.

Page 40: Unit 1 DD Overview

Seminars Sai sandeepShekun Bee IndexingRamya KrishnaSwathi GSwathi CSameeraRajeswriSharon SamuelSri RamyaSravanthi

Page 41: Unit 1 DD Overview

Seminars

Naga subramanyamGiridharSyed AbdullaBhaskar Aunusha-1Najma KanamAmruthaAnusha-2

Page 42: Unit 1 DD Overview

JAI SAI RAM