Investigating Distributed Database Systems
Challenges and Technology
Kishore Puppala Rao
Definitions
A database is a logically related collection of data, stored in one or many files
A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network
Architecture
Client/server architectures
Multiple clients, single server: the most common and straightforward implementation
Multiple clients, multiple servers: more flexible. The DB is distributed over multiple servers, and each client directs requests to a “home” server.
Architecture (cont’d)
DB is physically distributed by fragmenting and replicating data (discussed later)
Regardless of architecture, implementation details of queries, transactions and DB operations should be transparent to users.
Architecture (Peer-to-peer)
No distinction between client and server
Each site has the functionality of both client and server
E.g. file-sharing apps such as BearShare, LimeWire
Sophisticated protocols are needed to manage data distributed across multiple sites
Fragmentation
Partitions the data
Subdivides each relation either vertically (by a projection operation) or horizontally (by a selection operation)
Facilitates the placement of data close to its place of use, reducing transmission costs
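The two kinds of fragmentation above can be illustrated with a minimal sketch. The EMPLOYEE relation, its columns, and the helper names are hypothetical and chosen only for illustration; a real DBMS would express the same thing with selection and projection over stored tables.

```python
# Toy relation: a list of rows (dicts). All names and values are illustrative.
EMPLOYEE = [
    {"id": 1, "name": "Ana", "city": "Paris", "salary": 50},
    {"id": 2, "name": "Bo", "city": "Tokyo", "salary": 60},
    {"id": 3, "name": "Cy", "city": "Paris", "salary": 70},
]

def select(rows, predicate):
    """Horizontal fragment: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, columns):
    """Vertical fragment: keep only the named columns of every row."""
    return [{c: row[c] for c in columns} for row in rows]

# Horizontal fragmentation by city; each fragment can be stored near its users.
paris = select(EMPLOYEE, lambda r: r["city"] == "Paris")
other = select(EMPLOYEE, lambda r: r["city"] != "Paris")

# Vertical fragmentation; the key "id" is repeated so rows can be rejoined.
public = project(EMPLOYEE, ["id", "name", "city"])
payroll = project(EMPLOYEE, ["id", "salary"])

def union(*fragments):
    """Rebuild a relation from horizontal fragments, ordered by key."""
    return sorted((row for f in fragments for row in f), key=lambda r: r["id"])

def join_on_id(left, right):
    """Rebuild a relation from vertical fragments by joining on the key."""
    by_id = {row["id"]: row for row in right}
    return [{**row, **by_id[row["id"]]} for row in left]
```

In this sketch the original relation can be recovered from either set of fragments, by a union for horizontal fragments or by a join on the key for vertical ones.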
Replication
Refers to duplication of data for access and/or security purposes
Fragments or whole database may be replicated
Replication involves keeping physically separate copies of data at different sites
Distributed vs. Parallel
Distributed DBMS are not parallel DBMS, although the distinction may be unclear
Distributed DBMS assume a loose connection between processors that operate independently, perhaps under different operating systems
Parallel DBMS
Multiple processors under same operating system.
Architecture: shared-nothing, shared-disk, or shared-memory
Shared-Nothing: Each processor has exclusive access to its main memory and disk. Each processing element (PE) is a local site.
Parallel DBMS (cont’d)
Shared-memory: Each PE has access to any memory module or disk through some fast connection (e.g. high-speed bus or cross-bar switch)
Shared-disk: Each PE has exclusive access to its own memory, but shared access to any disk via a fast connection. Each PE reads DB pages from the shared disk and copies them into its local cache
Transparency
Distributed (and parallel) DBMS must provide the same functionality and consistency as a centralized DBMS.
Transparency implies presenting a consistent view that shields the user from implementation details such as fragmentation, replication, and distribution.
Providing this transparency introduces major challenges
Challenges
Query processing and optimization
Concurrency control
Reliability protocols
Replication protocols
Query Processing and Optimization
Techniques needed to address difficulties arising from data distribution and fragmentation. Localization techniques employed.
Algebraic queries on global relations are transformed to operate on fragments
Opportunities for parallel processing are identified (fragments are stored at different sites), and unnecessary work is eliminated (not all fragments may be involved in the query)
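The localization step described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical ORDERS relation that is horizontally fragmented by ranges of an "amount" attribute; the fragment names, sites, and ranges are invented for the example.

```python
# Hypothetical fragmentation schema: each horizontal fragment of ORDERS holds
# the rows whose "amount" lies in a half-open range [low, high) at some site.
FRAGMENTS = {
    "orders_small": {"site": "A", "low": 0, "high": 100},
    "orders_medium": {"site": "B", "low": 100, "high": 1000},
    "orders_large": {"site": "C", "low": 1000, "high": float("inf")},
}

def localize(query_low, query_high):
    """Return the fragments that can hold rows with query_low <= amount < query_high."""
    relevant = []
    for name, frag in FRAGMENTS.items():
        # Two half-open ranges overlap when each starts before the other ends.
        if frag["low"] < query_high and query_low < frag["high"]:
            relevant.append((name, frag["site"]))
    return relevant

# A query on the global relation with 50 <= amount < 500 touches two fragments;
# the sub-queries on sites A and B could then run in parallel.
plan = localize(50, 500)
```

The fragment at site C is dropped from the plan because its defining range cannot overlap the query range, which is the "unnecessary work" that localization removes.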
Query optimization
Determining the execution sites for distributed operations
Identifying the best distributed algorithm for distributed operations
Changing the order of operations in a query
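As a rough illustration of the first point, the sketch below picks an execution site for a join by counting only the bytes that would have to be shipped. The cost model, function name, and sizes are assumptions made for the example; real optimizers also weigh local processing cost, selectivity estimates, and alternatives such as semijoins.

```python
def cheapest_join_site(relation_sizes, result_size, result_site):
    """Pick the site at which to run a join so that the fewest bytes are shipped.

    relation_sizes maps site -> size in bytes of the operand stored there.
    """
    costs = {}
    for site in set(relation_sizes) | {result_site}:
        # Ship every operand that is not already at the candidate site ...
        shipped = sum(size for s, size in relation_sizes.items() if s != site)
        # ... and ship the result if it is produced somewhere else.
        if site != result_site:
            shipped += result_size
        costs[site] = shipped
    # Break ties by site name so the choice is deterministic.
    best = min(costs, key=lambda s: (costs[s], s))
    return best, costs

# The user sits at site B, but the large operand lives at site A.
best, costs = cheapest_join_site({"A": 1000, "B": 50}, result_size=20, result_site="B")
```

Here running the join at site A and shipping back the small result is cheaper than moving the large operand to the user's site.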
Concurrency Control
The challenge in synchronizing user transactions is to extend serializability and concurrency control to the distributed execution environment
Serializability: the ability to execute a set of transactions concurrently with the same effect as if they had been executed in some serial order. In a distributed DBMS this requires that:
(a) the execution of the set of transactions at each site is serializable
(b) the serialization orders of these transactions at all these sites are identical
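Condition (b) can be checked mechanically. The sketch below assumes condition (a) already holds, so that each site can report its local serialization order as a list of transaction ids (a hypothetical input format), and then looks for one global order compatible with every site. It uses the standard-library graphlib module, which requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter

def globally_serializable(site_orders):
    """Check that per-site serialization orders agree on one global order.

    site_orders maps site -> list of transaction ids in that site's
    serialization order. Returns a global order, or None if none exists.
    """
    sorter = TopologicalSorter()
    for order in site_orders.values():
        for position, txn in enumerate(order):
            # Each transaction must follow all earlier ones at the same site.
            sorter.add(txn, *order[:position])
    try:
        return list(sorter.static_order())
    except CycleError:
        # Two sites order some pair of transactions differently.
        return None

consistent = globally_serializable({"S1": ["T1", "T2"], "S2": ["T1", "T2"]})
conflicting = globally_serializable({"S1": ["T1", "T2"], "S2": ["T2", "T1"]})
```

In the second call each site is serializable on its own, yet no single global order exists, so the distributed execution is not serializable.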
Concurrency (cont’d)
If locking-based algorithms are used, lock management may be centralized or distributed
Deadlocks must be avoided
Deadlock detection and management in a distributed database can be difficult
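One reason distributed deadlock detection is difficult is that no single site may see the whole cycle. The sketch below illustrates one common approach, which is an assumption here rather than something the slides prescribe: merge the local wait-for graphs into a global graph and search it for a cycle. The input format and function name are hypothetical.

```python
def find_deadlock(local_wait_for_graphs):
    """Return a cycle of transactions in the global wait-for graph, or None.

    Each local graph maps a waiting transaction to the set of transactions
    that hold the locks it is waiting for at that site.
    """
    # Merge the per-site graphs into one global wait-for graph.
    global_graph = {}
    for graph in local_wait_for_graphs:
        for waiter, holders in graph.items():
            global_graph.setdefault(waiter, set()).update(holders)

    def visit(txn, path, done):
        if txn in path:
            return path[path.index(txn):]  # found a cycle
        if txn in done:
            return None  # already explored, no cycle through it
        for holder in sorted(global_graph.get(txn, ())):
            cycle = visit(holder, path + [txn], done)
            if cycle:
                return cycle
        done.add(txn)
        return None

    done = set()
    for txn in sorted(global_graph):
        cycle = visit(txn, [], done)
        if cycle:
            return cycle
    return None

# At site 1, T1 waits for T2; at site 2, T2 waits for T1.
site1 = {"T1": {"T2"}}
site2 = {"T2": {"T1"}}
cycle = find_deadlock([site1, site2])
```

Neither local graph contains a cycle on its own; the deadlock only becomes visible once the graphs are combined.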
Reliability protocols
Several types of failures: System, media, transaction, communication
It may be difficult to tell which type of failure has occurred
Distributed reliability protocols enforce transaction atomicity (commit all or commit nothing)
Reliability (cont’d)
Example of an atomic commitment protocol: two-phase commit (2PC)
All sites involved in the execution of a distributed transaction must agree to commit the transaction before it is made permanent.
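The agreement rule above can be sketched as a coordinator that collects votes and then broadcasts one decision. This is a simplified illustration with hypothetical class and method names; it omits the logging, timeouts, and recovery steps that a real two-phase commit implementation needs to survive failures.

```python
class Participant:
    """A site taking part in a distributed transaction (illustrative only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit everywhere only if every site votes yes."""
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2 (decision): send the same outcome to every participant.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision
```

A single "no" vote is enough to abort the transaction at every site, which is how the protocol enforces all-or-nothing atomicity.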
Replication protocols
Each logical data item has a number of physical instances
Challenge is to maintain (or approximate) consistency among physical copies as user updates logical data
Example criterion: One-copy equivalence – All physical copies of logical data should be equivalent after being updated by a transaction
Read-One/Write All (ROWA) protocol – enforces one-copy equivalence. Disadvantage: failure of one site may block entire transaction
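The ROWA disadvantage noted above can be seen in a small sketch. The class names and the way site availability is tracked are assumptions made for the example, not part of any particular system.

```python
class SiteDown(Exception):
    """Raised when an operation cannot reach the copies it needs (hypothetical name)."""


class RowaItem:
    """One logical data item with a physical copy at each site."""

    def __init__(self, sites, value=None):
        self.copies = {site: value for site in sites}
        self.up = set(sites)  # sites that are currently reachable

    def read(self):
        # Read-one: any single available copy is enough.
        if not self.up:
            raise SiteDown("no copy is available")
        return self.copies[min(self.up)]

    def write(self, value):
        # Write-all: refuse the update unless every copy can be written.
        down = set(self.copies) - self.up
        if down:
            raise SiteDown(f"cannot write, sites down: {sorted(down)}")
        for site in self.copies:
            self.copies[site] = value
```

With three copies and one site marked as down, reads still succeed from either remaining copy, but every write is rejected until the failed site returns.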
Replication (cont’d)
Alternative algorithms relax ROWA by mapping each write to a subset of the physical copies
Quorum-based voting: Copies are assigned votes; read and (especially) write operations have to collect votes and reach a quorum to commit data. (see class notes)
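A minimal sketch of the vote counting follows. It assumes the commonly used weighted-voting constraints, namely that the read and write quorums together exceed the total number of votes and that the write quorum exceeds half of the total; the slides defer the details to the class notes, so the exact rules intended there may differ.

```python
def valid_quorums(votes, read_quorum, write_quorum):
    """Check the usual quorum constraints for a vote assignment.

    votes maps each copy (site) to its number of votes.
    """
    total = sum(votes.values())
    # Every read quorum overlaps every write quorum, so a read sees the latest write.
    read_write_overlap = read_quorum + write_quorum > total
    # Any two write quorums overlap, so conflicting writes cannot both proceed.
    write_write_overlap = 2 * write_quorum > total
    return read_write_overlap and write_write_overlap


def has_quorum(votes, available_sites, quorum):
    """True if the reachable copies together hold at least `quorum` votes."""
    return sum(votes[site] for site in available_sites) >= quorum


votes = {"A": 1, "B": 1, "C": 1}
# Majority quorums: reads and writes each need 2 of the 3 votes.
majority_ok = valid_quorums(votes, read_quorum=2, write_quorum=2)
# ROWA is the special case read_quorum = 1, write_quorum = all votes.
rowa_ok = valid_quorums(votes, read_quorum=1, write_quorum=3)
# With site C down, a majority write can still proceed but a ROWA write cannot.
majority_write = has_quorum(votes, {"A", "B"}, 2)
rowa_write = has_quorum(votes, {"A", "B"}, 3)
```

The comparison shows what relaxing ROWA buys: with majority quorums a single site failure no longer blocks updates, at the price of reads having to contact more than one copy.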
Research and Trends
Workflow models (advanced transaction models)
Network scaling problems
Multi-database systems and interoperability
Distributed object management
Trends (cont’d)
Primitive objects are no longer simply structured data; they can consist of programs, voice, images, etc.
Distributed DBMS must handle increasingly large data objects. E.g. 1 MB of storage is needed for one digital X-ray image (1024x1024) at 8 bits/pixel
Most commercial DBMS (e.g. MS SQL Server 2000, Oracle 8i) provide some sort of distribution
Emergence of broadband networks eliminates the network as a bottleneck
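The X-ray storage figure quoted above follows directly from the image dimensions. A short check, assuming an uncompressed image with no header or metadata:

```python
width, height = 1024, 1024  # pixels, as stated on the slide
bits_per_pixel = 8

size_bytes = width * height * bits_per_pixel // 8  # 8 bits per byte
size_mib = size_bytes / 2**20                      # 2**20 bytes per MiB
```

Under these assumptions the image occupies 1,048,576 bytes, which is exactly 1 MiB.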
Trends (cont’d)
Mobile computing is escalating in interest and prevalence
Mobile stations may download data as needed
Alternatively, more powerful mobile stations may store native data for sharing with others
Mobility raises issues of address migration, maintenance of directories, and determining the location of stations
Distributed object platforms, e.g. CORBA (platform-independent), COM/OLE (Microsoft-specific)
CORBA
Common Object Request Broker Architecture
Facilitates maintenance of, and database access to, data from a number of autonomous and heterogeneous sources (e.g. file systems, spreadsheets) via a multidatabase approach
Provides a generic platform for distributed computing
CORBA (cont’d)
In multidatabase systems, the main problem is the heterogeneity extant at four levels: platform, communication, database system, and semantic.
CORBA facilitates implementation transparency by providing client access via interfaces defined in a special Interface Definition Language (IDL), independent of the database's actual software and hardware environment.
Provides location transparency, allowing clients to access DB objects independent of location and communication protocols
CORBA (cont’d)
Provides a common interface to mask heterogeneity among native database system implementations based on different data models (e.g. flat-file, relational, spreadsheet) and query languages
Common interface overcomes semantic conflicts such as schema and data conflicts
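The idea of a common interface that hides different data models can be illustrated by analogy in plain Python. This is not CORBA or IDL; the class names, the `find` method, and the sample data are hypothetical, and the sketch only shows how a client can query a flat file and a relational table through one interface.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Common interface; clients never see the underlying data model."""

    @abstractmethod
    def find(self, column, value):
        """Return matching rows as a list of dicts."""


class FlatFileSource(DataSource):
    def __init__(self, csv_text):
        self.rows = list(csv.DictReader(io.StringIO(csv_text)))

    def find(self, column, value):
        return [dict(row) for row in self.rows if row[column] == str(value)]


class RelationalSource(DataSource):
    def __init__(self, connection, table):
        self.connection = connection
        self.connection.row_factory = sqlite3.Row
        self.table = table  # trusted, illustrative table name

    def find(self, column, value):
        cursor = self.connection.execute(
            f"SELECT * FROM {self.table} WHERE {column} = ?", (str(value),)
        )
        return [dict(row) for row in cursor]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE parts (id TEXT, name TEXT)")
db.execute("INSERT INTO parts VALUES ('2', 'bolt')")

sources = [FlatFileSource("id,name\n1,nut\n2,washer\n"), RelationalSource(db, "parts")]
# The client issues the same call regardless of how each source stores its data.
results = [row for source in sources for row in source.find("id", 2)]
```

The client code in the last line does not change when a source is swapped for one based on a different data model, which is the kind of masking the common interface is meant to provide.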
References
M.T. Ozsu and P. Valduriez, "Distributed and Parallel Database Systems – Technology and Current State-of-the-Art", ACM Computing Surveys, 28(1): 125-128, March 1996.
A. Dogac, C. Dengi and M.T. Ozsu, "Distributed Object Computing Platforms", Communications of the ACM, 41(9): 95-103, September 1998.
J.N. Gray, "Notes on Data Base Operating Systems", in Operating Systems: An Advanced Course, R. Bayer, R.M. Graham (eds.), New York: Springer-Verlag, 1979, pp. 393-481.
References (cont’d)
M.T. Ozsu, "The Push/Pull Effect - Can Distributed Database Technology Meet The Challenges of New Applications?", Database Programming & Design, April 1997.
Thank you