TM 11-1 Copyright © 1999 Addison Wesley Longman, Inc. Lecture 11 Distributed Databases and Cloud...

Lecture 11

Distributed Databases and

Cloud computing

Definitions

• Distributed Database: A single logical database that is spread physically across computers in multiple locations that are connected by a data communications link.

• Decentralized Database: A collection of independent databases on non-networked computers.

Reasons for Distributed Database

• Local business units want control over data.

• Consolidate data across local databases for integrated decision making.

• Reduce telecommunications costs.

• Reduce the risk of telecommunications failures.

Distributed Database Options

• Homogeneous - Same DBMS at each node.

– Autonomous - Independent DBMSs.

– Non-autonomous - Central, coordinating DBMS.

• Heterogeneous - Different DBMSs at different nodes.

– Gateways - Simple paths are created to other databases without the benefits of one logical database.

Distributed database environments

Homogeneous, Non-Autonomous Database

• Data is distributed across all the nodes.

• Same DBMS at each node.

• All data is managed by the distributed DBMS (no exclusively local data).

• All access is through one global schema.

• The global schema is the union of all the local schemas.

Focus on The Following Heterogeneous Environment

• Data distributed across all the nodes.

• Different DBMS may be used at each node.

• Local access is done using the local DBMS and schema.

• Remote access is done using the global schema.

Objectives and Trade-offs

• Location Transparency - User does not have to know the location of the data.

• Local Autonomy - Local site can operate with its database when central site is down.

• Synchronous Distributed Database - All copies of the same data are always identical.

• Asynchronous Distributed Database - Some data inconsistency is tolerated.
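
Location transparency can be illustrated with a small sketch. It assumes a global catalog that maps each table to the site holding it; the site names, table names, and the `read_table` helper are hypothetical and stand in for real DBMS nodes.

```python
# Hypothetical in-memory "sites"; in practice each would be a separate DBMS node.
SITES = {
    "chicago": {"customer": [{"id": 1, "name": "Ada"}]},
    "dallas": {"orders": [{"id": 10, "customer_id": 1}]},
}

# Assumed global catalog: maps each table name to the site that stores it.
CATALOG = {"customer": "chicago", "orders": "dallas"}


def read_table(table):
    """Return all rows of `table`; the caller never names a site."""
    site = CATALOG[table]      # the catalog resolves the location
    return SITES[site][table]  # fetch from whichever site holds the data


if __name__ == "__main__":
    print(read_table("orders"))
```

The caller asks only for `orders`; moving the table to another site would change the catalog entry, not the calling code.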

Advantages of Distributed Database

• Increased reliability and availability.

• Local control over data.

• Modular growth.

• Lower communication costs.

• Faster response for certain queries.

Disadvantages of Distributed Database

• Software cost and complexity.

• Processing overhead.

• Data integrity exposure.

• Slower response for certain queries.

Options for Distributing a Database

• Data replication.

• Horizontal partitioning.

• Vertical partitioning.

• Combinations of the above.

Data Replication

• Advantages -

– Reliability.

– Fast response.

– May avoid complicated distributed transaction integrity routines (if replicated data is refreshed at scheduled intervals).

– De-couples nodes (transactions proceed even if some nodes are down).

– Reduced network traffic at prime time (if updates can be delayed).

Data Replication

• Disadvantages -

– Additional requirements for storage space.

– Additional time for update operations.

– Complexity and cost of updating.

– Integrity exposure of getting incorrect data if replicated data is not updated simultaneously.

• Therefore, better when used for non-volatile data.

Types of Data Replication

• Snapshot Replication -

– Changes are periodically sent to a master site, which sends an updated snapshot out to the other sites.

• Near Real-Time Replication -

– Broadcast update orders without requiring confirmation.

• Pull Replication -

– Each site controls when it wants updates.
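
The difference between snapshot and pull replication can be sketched as follows. This is a minimal illustration, not a description of any particular DBMS: the `MasterSite` and `ReplicaSite` classes, the version counter, and `push_snapshot` are all assumptions made for the example.

```python
import copy


class MasterSite:
    """Hypothetical master site that collects changes and serves snapshots."""

    def __init__(self):
        self.data = {}
        self.version = 0

    def apply_changes(self, changes):
        self.data.update(changes)
        self.version += 1

    def snapshot(self):
        # Deep copy so replicas never share mutable state with the master.
        return self.version, copy.deepcopy(self.data)


class ReplicaSite:
    """Hypothetical replica holding a possibly stale copy of the data."""

    def __init__(self):
        self.data = {}
        self.version = 0

    def pull(self, master):
        # Pull replication: the replica decides when to refresh.
        version, data = master.snapshot()
        if version > self.version:
            self.version, self.data = version, data


def push_snapshot(master, replicas):
    # Snapshot replication: the master sends its current snapshot to the sites.
    for replica in replicas:
        replica.version, replica.data = master.snapshot()
```

Between refreshes a replica keeps serving its old copy, which is the data timeliness concern that comes with any delayed replication scheme.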

Issues in Data Replication Use

• Data timeliness.

• Useful if DBMS cannot reference data from more than one node.

• Batched updates can cause performance problems.

• Updates complicated with heterogeneous DBMSs or database design.

• Telecommunications speeds may limit mass updates.

Horizontal Partitioning

• Different records of a file at different sites.

• Advantages -

– Data stored close to where it is used.

– Local access optimization.

– Security.

• Disadvantages -

– Accessing data across partitions.

– No data replication.
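
A minimal sketch of horizontal partitioning, assuming rows are plain dictionaries and that a `region` column decides which site stores each record. The function names and the sample column are illustrative only.

```python
from collections import defaultdict


def partition_horizontally(rows, key):
    """Place each record at the site named by its `key` column."""
    sites = defaultdict(list)
    for row in rows:
        sites[row[key]].append(row)
    return dict(sites)


def query_all(sites, predicate):
    """A cross-partition query has to visit every site."""
    return [row for rows in sites.values() for row in rows if predicate(row)]


if __name__ == "__main__":
    rows = [
        {"id": 1, "region": "east"},
        {"id": 2, "region": "west"},
        {"id": 3, "region": "east"},
    ]
    sites = partition_horizontally(rows, "region")
    print(sites["east"])                           # local access: one site
    print(query_all(sites, lambda r: r["id"] > 1))  # global access: all sites
```

Local queries touch a single site, while `query_all` shows the cost listed under disadvantages: a query that spans partitions must contact every one of them.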

Vertical Partitioning

• Different columns of a file at different sites.

• Advantages and disadvantages are the same as for horizontal partitioning except that combining data across partitions is more difficult because it requires joins.
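
The join requirement can be seen in a small sketch. It assumes each vertical partition repeats the primary key so that rows can be matched up again; the helper names and sample columns are hypothetical.

```python
def partition_vertically(rows, key, columns_a):
    """Split columns across two sites; both partitions are indexed by `key`."""
    part_a = {row[key]: {c: row[c] for c in columns_a} for row in rows}
    part_b = {
        row[key]: {c: v for c, v in row.items() if c != key and c not in columns_a}
        for row in rows
    }
    return part_a, part_b


def join_partitions(part_a, part_b, key):
    """Rebuild full records by joining the two partitions on the key."""
    return [{key: k, **part_a[k], **part_b[k]} for k in part_a if k in part_b]


if __name__ == "__main__":
    rows = [{"id": 1, "name": "Ada", "salary": 100}]
    a, b = partition_vertically(rows, "id", ["name"])
    print(a, b)
    print(join_partitions(a, b, "id"))
```

Any query that needs columns from both sites has to run something like `join_partitions`, which is the extra work the slide refers to.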

Factors in Choice of Distributed Strategy

• Funding, autonomy, security.

• Site data referencing patterns.

• Growth and expansion needs.

• Technological capabilities.

• Costs of managing complex technologies.

• Need for reliable service.

Cloud computing

• Cloud computing is the latest evolution of Internet-based computing.

• The potential benefits of cloud computing are substantial. However, attaining these benefits requires that each aspect of the cloud platform support the key design principles of the cloud model.

• One of the core design principles is dynamic scalability, or the ability to provision and decommission servers on demand.

• Unfortunately, the majority of today’s database servers are incapable of satisfying this requirement.

Key Benefits of Cloud Computing:

• Lower costs: All resources, including expensive networking equipment, servers, IT personnel, etc. are shared, resulting in reduced costs, especially for small to mid-sized applications and prototypes.

• Dynamic scalability: Most applications experience spikes in traffic. Instead of over-buying your own equipment to accommodate these spikes, many cloud services can smoothly and efficiently scale to handle these spikes with a more cost-effective pay-as-you-go model.

• Simplified maintenance: Upgrades are rapidly deployed across the shared infrastructure, as are backups.

• Large scale testing: Cloud computing makes large scale prototyping and load testing much easier. You can easily spawn 1,000 servers in the cloud to load test your application and then release them as soon as you are done.

• Faster development: Cloud computing platforms provide many of the core services that, under traditional development models, would normally be built in house. These services, plus templates and other tools can significantly accelerate the development cycle.

Evolving Cloud Database Requirements

• Cloud database usage patterns are evolving, and business adoption of these technologies accelerates that evolution. Initially, cloud databases serviced consumer applications. These early applications put a priority on read access, because the ratio of reads to writes was very high. Delivering high-performance read access was the primary purchase criterion. However, this is changing.

• Consumer-centric cloud database applications have been evolving with the adoption of Web 2.0 technologies. User-generated content, particularly in the form of social networking, has placed somewhat more emphasis on updates. Reads still outnumber writes, but the gap is narrowing. With support for transactional business applications, the gap between reads and updates is shrinking further. Business applications also demand that the cloud database be ACID compliant: providing Atomicity, Consistency, Isolation and Durability.
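
As an illustration of the atomicity part of ACID, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The `account` table, the `transfer` function, and the balances are made up for the example; it shows a single-node transaction, not a distributed one.

```python
import sqlite3


def make_demo_db():
    """Create a throwaway in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
```

A failed transfer leaves both balances unchanged because the debit is rolled back along with everything else in the transaction.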

The Achilles Heel of Cloud Databases

• Dynamic scalability, one of the core principles of cloud computing, has proven to be a particular problem for databases. The reason is simple: most databases use a shared-nothing architecture. The shared-nothing architecture relies on splitting (partitioning) the data into separate silos of data, one per server.
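
One way to see why partitioned data resists adding servers on demand is to count how many rows must move when a server is added. The sketch assumes the simplest placement rule, key modulo server count; this rule is an assumption for illustration, and real systems may use range partitioning or consistent hashing, which move fewer rows.

```python
def server_for(key, n_servers):
    """Assumed placement rule: integer key modulo the number of servers."""
    return key % n_servers


def keys_moved(keys, old_n, new_n):
    """Count keys whose home server changes when the cluster is resized."""
    return sum(1 for k in keys if server_for(k, old_n) != server_for(k, new_n))


if __name__ == "__main__":
    keys = range(1200)
    moved = keys_moved(keys, 3, 4)
    print(f"{moved} of {len(keys)} keys change servers")
```

Under this rule, growing from three servers to four relocates 900 of the 1,200 keys, so provisioning a server also means a large data reshuffle.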

Are Replicated Tables the Answer?

• Since data partitioning and cloud databases are inherently incompatible, Amazon, Facebook and Google have taken another approach to solve the cloud database challenge. They have created persistence engines (technically not databases) that abandon typical ACID compliance in favor of replicated tables of data that store and retrieve information while supporting dynamic or elastic scalability.

• Google offers BigTable, Amazon has SimpleDB, and Facebook is working on Cassandra. However, these are not a replacement for a real database, and they do not address corporate cloud computing requirements.

The Shared-Disk Database Architecture is Ideal for Cloud Databases

• The database architecture called shared-disk, which eliminates the need to partition data, is ideal for cloud databases. Shared-disk databases allow clusters of low-cost servers to use a single collection of data, typically served up by a Storage Area Network (SAN) or Network Attached Storage (NAS). All of the data is available to all of the servers; there is no partitioning of the data. As a result, if you are using two servers and your query takes 0.5 seconds, you can dynamically add another server and the same query might now take 0.35 seconds. In other words, shared-disk databases support elastic scalability.

• The shared-disk DBMS architecture has other important advantages—in addition to elastic scalability—that make it very appealing for deployment in the cloud.
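
The two-server to three-server figures quoted above can be checked against ideal linear scaling. The helpers below are a back-of-the-envelope sketch that assumes query time would ideally shrink in proportion to the number of servers; the function names are illustrative.

```python
def ideal_time(t_old, n_old, n_new):
    """Query time under perfectly linear scaling from n_old to n_new servers."""
    return t_old * n_old / n_new


def scaling_efficiency(t_old, n_old, t_new, n_new):
    """Ratio of the ideal time to the observed time (1.0 means perfectly linear)."""
    return ideal_time(t_old, n_old, n_new) / t_new


if __name__ == "__main__":
    print(round(ideal_time(0.5, 2, 3), 3))                # ideal time with 3 servers
    print(round(scaling_efficiency(0.5, 2, 0.35, 3), 3))  # efficiency of the 0.35 s result
```

With these numbers the ideal time is about 0.33 seconds, so the quoted 0.35 seconds corresponds to roughly 95% of linear scaling.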

Conclusion

• Whether you are assembling, managing or developing on a cloud computing platform, you need a cloud-compatible database.

• Shared-nothing databases require data partitioning, which is structurally incompatible with dynamic scalability, a core foundation of cloud computing.

• The shared-disk database architecture, on the other hand, does support elastic scalability. It also supports other cloud objectives such as lower costs for hardware, maintenance, tuning and support.