CGS 2545: Database Concepts Spring 2012 Distributed Database Management Systems
![Page 1: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/1.jpg)
1
Distributed Database Systems
![Page 2: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/2.jpg)
2
A Distributed Database on a Geographically Dispersed Network
![Page 3: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/3.jpg)
3
A Distributed Database on a Local Network
![Page 4: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/4.jpg)
4
A Multi-Processor System
![Page 5: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/5.jpg)
5
Types of Accesses to a Distributed Database
![Page 6: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/6.jpg)
6
Distributed Access Plan
1) At site 1: Send sites 2 and 3 the supplier number SN.
2) At sites 2 and 3: Execute in parallel, upon receipt of the supplier number, the following program:
   Find all PARTS records having SUP# = SN;
   Send result to site 1.
3) At site 1: Merge results from sites 2 and 3; output the result.
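The three-step plan can be sketched in Python as a scatter/gather over the remote sites. This is only an illustration: the `SITE_PARTS` data, the function names, and the use of threads to stand in for parallel remote execution are all assumptions, not part of the slides.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical PARTS fragments stored at sites 2 and 3 (illustrative data only).
SITE_PARTS = {
    2: [{"part": "P1", "sup": "S1"}, {"part": "P2", "sup": "S2"}],
    3: [{"part": "P3", "sup": "S1"}, {"part": "P4", "sup": "S3"}],
}

def find_parts(site, sn):
    """Step 2: runs at a remote site; select PARTS records with SUP# = sn."""
    return [rec for rec in SITE_PARTS[site] if rec["sup"] == sn]

def distributed_query(sn, sites=(2, 3)):
    """Steps 1 and 3: runs at site 1; send sn to each site, then merge replies."""
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        # pool.map keeps the results in the same order as `sites`.
        partials = list(pool.map(lambda s: find_parts(s, sn), sites))
    # Merge the partial results from all sites into one answer.
    return [rec for partial in partials for rec in partial]

result = distributed_query("S1")
print(result)
```

Only the supplier number and the matching records cross the network in this sketch, which is the point of pushing the selection to the sites that hold the data.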
![Page 7: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/7.jpg)
7
![Page 8: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/8.jpg)
8
Components of a Commercial DDBMS
![Page 9: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/9.jpg)
9
Data Distribution
Problem: Choose a unit of the logical database to use for assignment to data modules.
Possibilities:
  Relations – Distribution issues will influence logical database design.
  Columns – Distribution issues will influence logical database design.
  Rows – Too many; directories become too large.
  Data Items – Too many; directories become too large.
![Page 10: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/10.jpg)
10
Data Distribution
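Fragments – Logically defined rectangular subsets of relations.
[Figure: Relation 1 and Relation 2, each partitioned into fragments; the labels shown are Fragment 1, Fragment 2, and Fragment 3 for one relation and Fragment 1 and Fragment 2 for the other.]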
![Page 11: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/11.jpg)
11
Data Distribution
Logical definition of fragments:
Name   Age  $    Job-Title  Supervisor  Dept.
Jones  35   32K  Salesman   Black       A
[Figure: the relation is divided into Fragment 1, Fragment 2, and Fragment 3; the row boundary is defined by the predicates $ > 30K and $ < 30K.]
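A fragment defined this way is a rectangle: a column subset combined with a row predicate. The sketch below shows one way to express that in Python. The `fragment` helper, the second employee row, and the particular column choices are assumptions made for illustration; only the Jones row and the two salary predicates come from the slide.

```python
# The Jones row is from the slide; the Smith row is hypothetical.
EMPLOYEES = [
    {"Name": "Jones", "Age": 35, "Salary": 32_000,
     "Job-Title": "Salesman", "Supervisor": "Black", "Dept": "A"},
    {"Name": "Smith", "Age": 41, "Salary": 28_000,
     "Job-Title": "Clerk", "Supervisor": "Black", "Dept": "A"},
]

def fragment(relation, columns, predicate):
    """Return the rectangular subset: chosen columns of the rows that match."""
    return [{c: row[c] for c in columns} for row in relation if predicate(row)]

# Row predicates taken from the slide: $ > 30K and $ < 30K.
high_paid = fragment(EMPLOYEES, ["Name", "Salary"], lambda r: r["Salary"] > 30_000)
low_paid = fragment(EMPLOYEES, ["Name", "Salary"], lambda r: r["Salary"] < 30_000)

print(high_paid)
print(low_paid)
```

Note that with strict inequalities on both sides, a row with a salary of exactly 30K would belong to neither fragment; a real definition would need to make one of the bounds inclusive.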
![Page 12: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/12.jpg)
12
Data Distribution
[Figure: Assignment of Fragments to Datamodules. Fragments F1, F2, and F3 of Personnel and fragments F1 and F2 of Inventory are assigned to datamodules DM1, DM2, and DM3.]
![Page 13: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/13.jpg)
13
Data Distribution
Advantages of fragments as units of distribution:
  Very flexible in size and definition.
  Distribution choices are largely independent of logical design.
![Page 14: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/14.jpg)
14
System Considerations
Reliable Network
Pipelining
Logical Data Items
Database Operations:
  Read
  Write
Transactions:
  Read Set
  Write Set
  Atomic – “All or Nothing” Effect
![Page 15: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/15.jpg)
15
System Considerations (cont’d)
Each site in the DDBMS has one or both of the following software modules:
  Transaction Manager (TM)
  Data Manager (DM)
TMs:
  Read, parse, and optimize user queries
  Handle all interface with the user
DMs:
  Maintain the physical database
  Perform actual reads and writes
![Page 16: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/16.jpg)
16
System Considerations (cont’d)
[Figure: transactions are submitted to TMs; the TMs are connected to the DMs, and each DM holds its own data.]
TMs communicate only with DMs.
DMs communicate only with TMs.
![Page 17: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/17.jpg)
17
Transaction Execution
Transaction   TM's Action
Begin         Set up temporary workspace.
Read (X)      Select a DM which stores X; send a message to this DM requesting X; place X in workspace.
Read (X)      No action necessary; X is already in workspace.
Write (X)     Change the value of X.
Read (X)      No action necessary.
End           Send a pre-commit to each DM that stores a copy of X; await acknowledgements; send commit message.
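The TM actions in this table can be sketched as a small Python model. The class names, method names, and in-memory dictionaries are assumptions for illustration, and the sketch assumes writes are buffered in the workspace until End. The slide does not describe what happens when an acknowledgement is missing, so that path simply reports failure here.

```python
class DataManager:
    """Holds stored copies of data items (a stand-in for a real DM)."""
    def __init__(self, store):
        self.store = dict(store)
        self.pending = {}                 # txn_id -> updates awaiting commit

    def read(self, item):
        return self.store[item]

    def pre_commit(self, txn_id, updates):
        self.pending[txn_id] = dict(updates)
        return True                       # acknowledgement

    def commit(self, txn_id):
        self.store.update(self.pending.pop(txn_id))


class TransactionManager:
    def __init__(self, dms):
        self.dms = dms

    def begin(self):
        self.workspace = {}               # temporary workspace
        self.written = set()

    def read(self, item):
        if item not in self.workspace:    # first read: fetch from some DM
            dm = next(d for d in self.dms if item in d.store)
            self.workspace[item] = dm.read(item)
        return self.workspace[item]       # later reads: no action necessary

    def write(self, item, value):
        self.workspace[item] = value      # change the value in the workspace only
        self.written.add(item)

    def end(self, txn_id):
        # Pre-commit to every DM that stores a copy of a written item.
        targets = []
        for dm in self.dms:
            updates = {k: self.workspace[k] for k in self.written if k in dm.store}
            if updates:
                targets.append((dm, updates))
        acks = [dm.pre_commit(txn_id, updates) for dm, updates in targets]
        if not all(acks):
            return False                  # abort handling is not covered here
        for dm, _ in targets:
            dm.commit(txn_id)
        return True


dm1 = DataManager({"X": 1})
dm2 = DataManager({"X": 1, "Y": 5})
tm = TransactionManager([dm1, dm2])
tm.begin()
tm.write("X", tm.read("X") + 1)
committed = tm.end("t1")
print(committed, dm1.store, dm2.store)
```

Because the write only touches the workspace, no DM sees the new value of X until the pre-commit and commit messages at End, and both copies of X are updated together.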
![Page 18: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/18.jpg)
18
Optimal File Allocation In A Distributed Database System
Given a number of computers that process common information files, how can we:
  allocate the files optimally so that the allocation yields minimum overall operating costs (storage and communication)?
  meet access time requirements for each file?
  not exceed the storage capacity of each computer?
Note: A file may be viewed as a segment.
![Page 19: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/19.jpg)
19
System Parameters
n computers
m files
Size of each file
Usage distribution for each file at each computer
Frequency of modification of each file at each computer during usage
Access time requirement for each file at each computer
Storage capacity of each computer
Cost of storage per unit file length per computer
Cost of transmission per unit file length per second per pair of computers
![Page 20: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/20.jpg)
20
Model
COSTS
Total Cost = Storage Costs + Transmission Costs
TC = CS + CT
Transmission Costs = Costs for Retrievals + Cost for Updates
CT = CTR + CTU
CONSTRAINTS
Each file must be stored in at least one computer.
The storage capacity of each computer must not be exceeded.
The probability of exceeding the required access time for each file must be less than a specified bound.
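To make the cost structure concrete, here is a brute-force Python sketch that enumerates every allocation and keeps the cheapest feasible one. The specific cost formulas are simplifying assumptions and not the deck's mathematical model: a retrieval is served by the cheapest copy, an update is sent to every copy, and the access-time constraint is left out. All parameter names and sample numbers are hypothetical.

```python
from itertools import combinations, product

def nonempty_subsets(n):
    """All non-empty sets of computers (each file needs at least one copy)."""
    return [frozenset(c) for r in range(1, n + 1)
            for c in combinations(range(n), r)]

def total_cost(alloc, size, usage, mod, store_cost, trans_cost):
    """TC = CS + CTR + CTU under the simplified assumptions stated above."""
    cs = ctr = ctu = 0.0
    for f, sites in enumerate(alloc):
        cs += sum(store_cost[k] * size[f] for k in sites)
        for i in range(len(store_cost)):
            # Retrieval: fetch from the cheapest copy (cost 0 if stored locally).
            ctr += usage[i][f] * size[f] * min(trans_cost[i][k] for k in sites)
            # Update: every copy must receive the modification.
            ctu += usage[i][f] * mod[i][f] * size[f] * sum(trans_cost[i][k] for k in sites)
    return cs + ctr + ctu

def best_allocation(size, usage, mod, capacity, store_cost, trans_cost):
    n = len(capacity)
    best = None
    for alloc in product(nonempty_subsets(n), repeat=len(size)):
        used = [sum(size[f] for f, s in enumerate(alloc) if k in s) for k in range(n)]
        if any(u > c for u, c in zip(used, capacity)):
            continue                      # storage capacity exceeded
        cost = total_cost(alloc, size, usage, mod, store_cost, trans_cost)
        if best is None or cost < best[0]:
            best = (cost, alloc)
    return best

# Hypothetical instance: 2 computers, 1 file, mostly used at computer 0.
cost, alloc = best_allocation(
    size=[10],
    usage=[[5], [1]],            # usage[i][f]: accesses of file f at computer i
    mod=[[0.2], [0.2]],          # fraction of accesses that are modifications
    capacity=[100, 100],
    store_cost=[1, 1],
    trans_cost=[[0, 2], [2, 0]],
)
print(cost, alloc)
```

The search space grows as (2^n - 1)^m, so exhaustive enumeration is only practical for toy instances; it is shown here to illustrate the trade-off between replicating a file (lower retrieval cost) and keeping a single copy (lower storage and update cost).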
![Page 21: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/21.jpg)
21
Mathematical Representation Model
![Page 22: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/22.jpg)
22
![Page 23: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/23.jpg)
23
![Page 24: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/24.jpg)
24
![Page 25: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/25.jpg)
25
![Page 26: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/26.jpg)
26
![Page 27: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/27.jpg)
27
Transmission Paths Between Each Pair of Computers
![Page 28: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/28.jpg)
28
![Page 29: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/29.jpg)
29
Reliability Constraint
Assuming processors and channels each have identical reliability,
$a_p$ = availability of the processor
$a_c$ = availability of the channel
$r_j$ = number of redundant copies of the $j$th file
$A_j$ = availability of the $j$th file
$$A_j = a_p \left[ 1 - (1 - a_c a_p)^{r_j} \right]$$
For example, with $a_p = 0.98$ and $a_c = 0.99$:
$A_j = 0.951$ for $r_j = 1$
$A_j = 0.979$ for $r_j = 2$
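The two example values can be checked with a few lines of Python. The function name is an assumption; the formula is the file availability A_j = a_p * [1 - (1 - a_c * a_p)^r_j] with the slide's parameter values.

```python
def file_availability(a_p, a_c, r_j):
    """A_j = a_p * (1 - (1 - a_c * a_p) ** r_j) for r_j redundant copies."""
    return a_p * (1 - (1 - a_c * a_p) ** r_j)

for r_j in (1, 2):
    print(r_j, round(file_availability(0.98, 0.99, r_j), 3))
```

The leading factor a_p bounds the result: no number of extra copies can push the availability of the file above the availability of a single processor, here 0.98.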
![Page 30: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/30.jpg)
30
![Page 31: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/31.jpg)
31
File Directory for Distributed Databases
![Page 32: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/32.jpg)
32
[Figure: Overview of the Directory Manager. Components shown: User Transaction, and within the DDBMS the Transaction Manager, Directory Manager, and Database Manager, together with the Directory, the Database Fragment, and a link to other nodes. Legend: High-Level Request, Standard Database Call, Physical Access Call, Non-Local Request.]
![Page 33: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/33.jpg)
33
Content of Directory
Global description
Fragmentation description
Allocation description
Mappings to local names
Access method description
Statistics on the database
Consistency information
![Page 34: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/34.jpg)
34
Content of a Directory System
Physical (Static)
Location (Site, Copy #, Disk, Page);
Creator;
Creation Date;
Version of the File Size;
Code Format;
Date of Last Update;
Logical (Dynamic)
File Status (R, W)
Number of Backlog Jobs;
Site Availability;
Resource Requirement;
Processing Cost;
Communication Cost;
Translation Cost;
Security
(File, User, C);
C=Read/Write;
Read Only;
Write Only;
Operation
Compression ratio (Logical Operation Query Data Value);
Query Access Optimizer;
Statistical Data Gathering;
Protocols
![Page 35: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/35.jpg)
35
The Functional Objectives of Integrated Dictionary/Directory
To support the control of data resources
  Maintaining data independence, security, and integrity
To support applications development
  Offering standardized data definitions and usage characteristics
  Established program entities, DDL
To provide independence of directory data elements
  Different hardware and software environments
  Changes in these environments
![Page 36: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/36.jpg)
36
Possible Data Types In IDD
Data names, definitions, formats and sizes.
Integrity constraints, authorization tables, and usage statistics for transaction management.
Schemas and sub-schemas.
Description of standardized transactions and reports.
Characteristics of hardware, such as processors, lines, and terminals.
Description of users.
The IDD must support the maintenance of relationships between various entities, such as associations between:
  Authorization tables and data
  Users and transactions
  Reports
The IDD supplies version control
![Page 37: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/37.jpg)
37
[Figure 1: Two entities connected by a relationship, with attributes attached to the elements of the diagram.]
![Page 38: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/38.jpg)
38
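[Figure 2: Example dictionary entries. The entity "Payroll Record" (maximum length 400 characters) is linked by the relationship "Contains" (created 820708) to the entity "Social Security Number" (length 9 characters). The two entities carry the creation dates 820114 and 820519, and a Comments attribute is also shown.]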
![Page 39: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/39.jpg)
39
Table 1

| Schema Model Level (typical meta-entity-types) | Schema Level (typical entity-types, relationship-types, and attribute-types) | Dictionary Level (typical entities, relationships, and attributes) |
|---|---|---|
| Entity-Type | Element | Social-Security-Number, Agency-Name |
| | Record | Employee Record, Payroll Record |
| | Document | Form 1040, FIPS Guideline |
| Relationship-Type | Record-Contains-Element | Payroll-Record-Contains-Employee-Name |
| Attribute-Type | Length | 9 Characters |
| | Creator | ADP Division |
![Page 40: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/40.jpg)
40
Classes of Directory
Centralized Directory
  Single Master Directory
  Extended Centralized Directory
  Multiple Master Directory
Local Directory
Distributed Directory
![Page 41: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/41.jpg)
41
![Page 42: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/42.jpg)
42
![Page 43: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/43.jpg)
43
![Page 44: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/44.jpg)
44
![Page 45: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/45.jpg)
45
![Page 46: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/46.jpg)
46
Causes For Directory Update
Changing the description or structure of the user database.
Moving user database entities from one node to another.
Changing the description of a user or node.
Changing a user view.
Changing a network node’s status.
![Page 47: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/47.jpg)
47
Specific Drawbacks with Globally Replicated Directories
1) Additional remote activity to maintain directory coherence.
2) Difficulty of posting directory changes to a down site.
3) Difficulty of integrating a new site.
4) Storage of directory entries where they are not referenced.
5) Blurred responsibility for maintaining the directory.
![Page 48: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/48.jpg)
48
Performance Measure
Operating Cost/Unit Time = Communication Cost (Query + Update) + Storage Cost + Code Translation Cost (Query + Update)
Response Time
![Page 49: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/49.jpg)
49
Operating Cost for the Centralized Directory System
![Page 50: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/50.jpg)
50
![Page 51: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/51.jpg)
51
Cost Trade-offs of Directory Systems
Assume:
  Communication cost is much greater than storage cost
  No translation cost
  All computers have the same directory update rate
Then the cost trade-off point is at directory update rate.
  P(C,EC) = 2/(N – 1)
  P(C,D) = 2/(N – 1)
  P(L,D) = 1
![Page 52: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/52.jpg)
52
![Page 53: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/53.jpg)
53
Directory Design Alternatives

| Type | Description | Advantages | Disadvantages |
|---|---|---|---|
| Centralized | Single master directory | Simplicity; ease of update | Transmission costs and delays |
| Extended Centralized | Variation of the centralized case in which the directory information is permanently appended in the local node once it is obtained from the master directory | Reduces transmission costs and delays | Coordinating updates of local directories; knowledge of appended directories |
| Multiple Master | Variation of the centralized case in which redundant copies of the master directory exist | Reduces transmission costs and delays | Storage requirements; coordinating update of redundant copies |
| Distributed Master | Master at every node | Fall-soft characteristics; fast response | Storage costs; transmission costs for updates to the directory |
| Localized | Local directory at each node without replication | Simple update procedure | Transmission costs for non-local queries |
![Page 54: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/54.jpg)
54
The Distributed Ingres Dictionary/Directory contains four types of data:
  Relation name and location
  Information for parsing queries (domain names, formats, etc.)
  Performance information (number of tuples, storage structures, etc.)
  Consistency information (protection, integrity constraints, etc.); this does not include control data for concurrency control and synchronization
![Page 55: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/55.jpg)
55
SDD-1 Dictionary/Directory
The directory itself is defined and maintained like any other user data. It can be logically fragmented, distributed, and replicated across the distributed DBMS’s.
A directory locator (a small highly static file of directory fragment locations) is kept at every site and is used by the TMs and DMs to plan and control transactions and to help ensure DB integrity and consistency across concurrent accesses of data elements.
The transaction modules are capable of caching remotely accessed directory data for subsequent usage. This facility is provided on the presumption that DB operations will exhibit the locality-of-reference characteristic.
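The locator-plus-cache idea can be illustrated with a short Python sketch. This is not SDD-1's implementation: the `DirectoryClient` class, the `fetch_remote` callback, and the fragment and site names are all hypothetical, and cache invalidation after a directory update is deliberately left out.

```python
class DirectoryClient:
    """A transaction module's view of the directory: a locator plus a cache."""

    def __init__(self, locator, fetch_remote):
        self.locator = locator            # directory fragment name -> site
        self.fetch_remote = fetch_remote  # callable(site, fragment) -> entry
        self.cache = {}                   # remotely fetched directory data
        self.remote_fetches = 0

    def lookup(self, fragment):
        if fragment not in self.cache:
            # Use the small, static locator to find the site, then go remote.
            site = self.locator[fragment]
            self.cache[fragment] = self.fetch_remote(site, fragment)
            self.remote_fetches += 1
        # Repeated references are served locally (locality of reference).
        return self.cache[fragment]


# Hypothetical remote directory contents, keyed by (site, fragment).
REMOTE_DIRECTORY = {("site2", "PARTS"): {"stored_at": ["site2", "site3"]}}

client = DirectoryClient(
    locator={"PARTS": "site2"},
    fetch_remote=lambda site, frag: REMOTE_DIRECTORY[(site, frag)],
)
first = client.lookup("PARTS")
second = client.lookup("PARTS")
print(first, client.remote_fetches)
```

The second lookup costs no remote message, which is the benefit the locality-of-reference presumption is meant to deliver; a real system would also need a way to refresh or invalidate cached entries when the directory changes.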
![Page 56: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/56.jpg)
56
[Figure 17: Pictorial diagram showing usefulness of keys. Boxes shown: Vpatient : PatientClass (name, SSN, age, patID, {report}); PatientDB1 (name, SSN, age); PatientDB2 (name, SSN, patID); PatReportDB2 (patID, report).]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
![Page 57: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/57.jpg)
57
[Figure 15: Pictorial diagram showing correspondence between virtual and real attributes. Boxes shown: Vperson : PersonClass (name, sex, age, ssn, job), labeled as the virtual collection People; personDB1 (name, sex, age, ssn); personDB2 (name, gender, ssn, job). Conversion functions shown on the mappings: Character_to_String, LargePositiveInteger_to_String.]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
![Page 58: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/58.jpg)
58
[Figure 18: Pictorial diagram for aggregation. Boxes shown: Vretiree : retireClass (name, income); Vincome : incomeClass (stockAmount, pension); financeDB1 (name, stockAmount); financeDB2 (name, pension).]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
![Page 59: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/59.jpg)
59
[Figure 19: Pictorial diagram of computed attribute. Boxes shown: Vname : nameClass (first, middle, last); personDB1 (name). Functions shown on the mappings: getfirst, getmiddle, getlast.]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
![Page 60: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/60.jpg)
60
[Figure 20: Pictorial diagram of computed attribute. Boxes shown: Vretiree : retireClass (name, income); financeDB1 (name, stockAmount); financeDB2 (name, pension). The mappings are labeled 1 and 2.]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
![Page 61: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/61.jpg)
61
[Figure 21: Pictorial diagram showing grouping. Boxes shown: Vinsurance : insuranceClass (name, {insuranceAmounts}); carInsuranceDB1 (carOwner, amount); houseInsuranceDB2 (houseOwner, amount).]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
![Page 62: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/62.jpg)
62
[Figure 22: Pictorial diagram showing relationship. Boxes shown: Vpatient : patientClass (name, {doctors}); Vdoctors : doctorClass (name, docID, salary); real collections labeled patientDB1 and patientDB2, with attributes including name, salary, docID, and physician. Annotations: (key), (pointer), relationship.]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
![Page 63: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/63.jpg)
63
[Figure 23: Pictorial diagram showing a named relationship. Boxes shown: VtreatedBy : treatedByClass (patient, doctor, amountOwed), with patient and doctor marked (key); Vpatient : PatientClass; Vdoctor : DoctorClass; patientDB1 (name, docID, amountOwed).]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
![Page 64: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/64.jpg)
64
[Figure 24: Pictorial diagram showing relationship. Boxes shown: VpersonPatient : personClass (name); Vpatient : patientClass (patID, amount); VpersonDoctor : personClass (name); Vdoctor : DoctorClass (docID, salary); patientDB1 (name, SSN, payment); doctorDB2 (name, docID, salary). Virtual collections shown: Vpatient, patient, Vdoctor, doctor, person, VpersonPatient, VpersonDoctor.]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
![Page 65: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/65.jpg)
65
[Figure 30: Derivation of Virtual Entity Vconcept. Boxes shown: ConceptSemType (conceptID, semTypeID); Concept (conceptID, termID, stringType, stringID, stringVal); Vconcept (conceptID, semType, {termSet}); Vterm (termID, {stringSet}); Vstring (stringName, stringID, stringType). One attribute is marked (key).]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
![Page 66: Distributed Database Systems](https://reader036.fdocuments.us/reader036/viewer/2022081501/568133fb550346895d9aeffa/html5/thumbnails/66.jpg)
66
[Figure 31: Derivation of Virtual Entity VsemType. Boxes shown: DsemType (ID, name, definition, {relatedTo}); DsemRelate (relName, semName, status); SemTypeDef (ID, name, definition); SemTypeRel (name1, rel, name2, status).]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.