Distributed databases,types of database
-
Upload
boomadevi-shanmugam -
Category
Documents
-
view
580 -
download
4
Transcript of Distributed databases,types of database
![Page 1: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/1.jpg)
R.BOOMADEVI.,M.C.A.,M.E[A/P] CSE DEPARTMENT
CHRIST THE KING ENGINEERING COLLEGECOIMBATORE
![Page 2: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/2.jpg)
A distributed database system consists of loosely coupled sites that share no physical component
Appears to user as a single systemDatabase systems that run on each site
are independent of each otherProcessing maybe done at a site other
than the initiator of request
![Page 3: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/3.jpg)
All sites have identical software They are aware of each other and agree to
cooperate in processing user requests It appears to user as a single system
![Page 4: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/4.jpg)
A distributed system connects three databases: hq, mfg, and sales
An application can simultaneously access or modify the data in several databases in a single distributed environment.
![Page 5: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/5.jpg)
In a heterogeneous distributed database system, at least one of the databases uses different schemas and software.
A database system having different schema may cause a major problem for query processing.
A database system having different software may cause a major problem for transaction processing.
![Page 6: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/6.jpg)
Replication◦ System maintains multiple copies of data,
stored in different sites, for faster retrieval and fault tolerance.
Fragmentation◦ Relation is partitioned into several fragments
stored in distinct sites
Replication and fragmentation can be combined• Relation is partitioned into several fragments:
system maintains several identical replicas of each such fragment.
![Page 7: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/7.jpg)
Availability: failure of site containing relation r does not result in unavailability of r is replicas exist.
Parallelism: queries on r may be processed by several nodes in parallel.
Reduced data transfer: relation r is available locally at each site containing a replica of r.
![Page 8: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/8.jpg)
Increased cost of updates: each replica of relation r must be updated.
Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented. One solution: choose one copy as primary copy and
apply concurrency control operations on primary copy.
![Page 9: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/9.jpg)
Data can be distributed by storing individual tables at different sites
Data can also be distributed by decomposing a table and storing portions at different sites – called Fragmentation
Fragmentation can be horizontal or vertical
![Page 10: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/10.jpg)
Usage - in general applications use views so it’s appropriate to work with subsets
Efficiency - data stored close to where it is most frequently used
Parallelism - a transaction can divided into several sub-queries to increase degree of concurrency
Security - data more secure - only stored where it is needed
Disadvantages:
Performance - may be slower
Integrity - more difficult
![Page 11: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/11.jpg)
Each fragment, Ti , of table T contains a subset of the rows
Each tuple of T is assigned to one or more fragments.
Horizontal fragmentation is lossless
![Page 12: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/12.jpg)
A bank account schema has a relation Account-schema = (branch-name, account-number, balance).
It fragments the relation by location and stores each fragment locally: rows with branch-name = `Hillside` are stored in the Hillside in a fragment
![Page 13: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/13.jpg)
Each fragment, Ti, of T contains a subset of the columns, each column is in at least one fragment, and each fragment includes the key:
Ti = attr_listi (T)
T = T1 T2 ….. Tn
All schemas must contain a common candidate key (or superkey) to ensure lossless join property.
A special attribute, the tuple-id attribute may be added to each schema to serve as a candidate key.
![Page 14: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/14.jpg)
A employee-info schema has a relation employee-info schema = (designation, name, Employee-id, salary).
It fragments the relation to put information in two tables for security concern.
![Page 15: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/15.jpg)
Commit protocols are used to ensure atomicity across sites
Atomicity states that database modifications must follow an “all or nothing” rule.
a transaction which executes at multiple sites must either be committed at all the sites, or aborted at all the sites.
![Page 16: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/16.jpg)
What is this?
Two-phase commit is a transaction protocol designed for the complications that arise with distributed resource managers.
Two-phase commit technology is used for hotel and airline reservations, stock market transactions, banking applications, and credit card systems.
With a two-phase commit protocol, the distributed transaction manager employs a coordinator to manage the individual resource managers. The commit process proceeds as follows:
![Page 17: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/17.jpg)
Step 1 Coordinator asks all participants to prepare to commit transaction Ti.
Ci adds the records <prepare T> to the log and forces log to stable storage (a log is a file which maintains a record of all changes to the database)
sends prepare T messages to all sites where T
executed
![Page 18: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/18.jpg)
Step 2 Upon receiving message, transaction manager at site determines if it can commit the transaction if not:
add a record <no T> to the log and send abort T message to Ci
if the transaction can be committed, then:1). add the record <ready T> to the log2). force all records for T to stable storage3). send ready T message to Ci
![Page 19: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/19.jpg)
Step 1 T can be committed of Ci received a ready T message from all the participating sites: otherwise T must be aborted.
Step 2 Coordinator adds a decision record, <commit T> or <abort T>, to the log and forces record onto stable storage. Once the record is in stable storage, it cannot be revoked (even if failures occur)
Step 3 Coordinator sends a message to each participant informing it of the decision (commit or abort)
Step 4 Participants take appropriate action locally.
![Page 20: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/20.jpg)
![Page 21: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/21.jpg)
There have been two performance issues with two phase commit: ◦ If one database server is unavailable, none of the
servers gets the updates. ◦ This is correctable through network tuning and
correctly building the data distribution through database optimization techniques.
![Page 22: Distributed databases,types of database](https://reader036.fdocuments.us/reader036/viewer/2022062303/556d0c5cd8b42ad34f8b4d35/html5/thumbnails/22.jpg)
THANK YOU