Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms


Description

Multitenant data infrastructures for large cloud platforms hosting hundreds of thousands of applications face the challenge of serving applications characterized by small data footprints and unpredictable load patterns. When such a platform is built on an elastic pay-per-use infrastructure, an added challenge is to minimize the system’s operating cost while guaranteeing the tenants’ service level agreements (SLAs). Elastic load balancing is therefore an important feature, enabling scale-up during high load and scale-down when the load is low. Live migration, a technique to migrate tenants with minimal service interruption and no downtime, is critical to allow lightweight elastic scaling. We focus on the problem of live migration in the database layer. We propose Zephyr, a technique to efficiently migrate a live database in a shared nothing transactional database architecture. Zephyr uses phases of on-demand pull and asynchronous push of data, requires minimal synchronization, results in no service unavailability and few or no aborted transactions, minimizes the data transfer overhead, provides ACID guarantees during migration, and ensures correctness in the presence of failures. We outline a prototype implementation using an open source relational database engine and present a thorough evaluation using various transactional workloads. Zephyr’s efficiency is evident from the few tens of failed operations, 10-20% change in average transaction latency, minimal messaging, and no overhead during normal operation when migrating a live database.

Transcript of Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms

Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms

Aaron J. Elmore, Sudipto Das, Divyakant Agrawal, Amr El Abbadi

Distributed Systems Lab, University of California, Santa Barbara

Sudipto Das {sudipto@cs.ucsb.edu}

Cloud Application Platforms

Serve thousands of applications (tenants)
◦ AppEngine, Azure, Force.com

Tenants are (typically)
◦ Small
◦ SLA sensitive
◦ Erratic load patterns
◦ Subject to flash crowds, i.e. the fark, digg, slashdot, reddit effect (for now)

Support for multitenancy is critical
Our focus: DBMSs serving these platforms

Multitenancy…

[Figure: what the tenant wants vs. what the service provider wants — static allocation leaves unused resources]

Cloud Infrastructure is Elastic

Static provisioning for peak is inelastic

[Figure: resources vs. time, comparing demand and provisioned capacity for traditional infrastructures and deployment in the cloud]

Slide Credits: Berkeley RAD Lab

Elasticity in a Multitenant DB

[Figure: load balancer in front of the application/web/caching tier, with an elastic database tier behind it]

Live Database Migration

Migrate a tenant’s database in a live system
◦ A critical operation to support elasticity

Different from
◦ Migration between software versions
◦ Migration in case of schema evolution

VM Migration for DB Elasticity

VM migration [Clark et al., NSDI 2005]

One tenant per VM
◦ Pros: allows fine-grained load balancing
◦ Cons: performance overhead; poor consolidation ratio [Curino et al., CIDR 2011]

Multiple tenants in a VM
◦ Pros: good performance
◦ Cons: must migrate all tenants together; coarse-grained load balancing

Problem Formulation

Multiple tenants share the same database process
◦ Shared process multitenancy
◦ Example systems: SQL Azure, ElasTraS, RelationalCloud, and many more

Migrate individual tenants
◦ VM migration cannot be used for fine-grained migration

Target architecture: shared nothing
◦ For shared storage architectures, see our VLDB 2011 paper

Shared Nothing Architecture

[Figure: shared nothing architecture — each node runs its own database process with local storage]

Why is Live Migration hard?

How to ensure no downtime?
◦ Need to migrate the persistent database image (tens of MBs to GBs)

How to guarantee correctness during failures?
◦ Nodes can fail during migration
◦ How to ensure transaction atomicity and durability?
◦ How to recover migration state after failure? Nodes recover after a failure

How to guarantee serializability?
◦ Transaction correctness equivalent to normal operation

How to minimize migration cost? …

Migration Cost Metrics

Downtime
◦ Time the tenant is unavailable

Service interruption
◦ Number of operations failing/transactions aborting

Migration overhead/performance impact
◦ During normal operation, during migration, and after migration

Additional data transferred
◦ Data transferred in addition to the DB’s persistent image

How did we do it?

Migration executed in phases
◦ Starts with the transfer of minimal information to the destination (the “wireframe”)
◦ Source and destination concurrently execute transactions in one migration phase

Database pages used as the granule of migration
◦ Pages “pulled” by the destination on demand

Minimal transaction synchronization
◦ A page is uniquely owned by either source or destination (see the sketch below)
◦ Leverage page-level locking

Logging and handshaking protocols to tolerate failures
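The ownership rule can be sketched in a few lines. The following is a minimal illustration, not the paper's code; the names (PageTracker, fetch_from_source) are hypothetical. Each node tracks which pages it owns, and the destination pulls an un-owned page from the source on first access:

```python
# Minimal sketch (hypothetical names) of unique page ownership with
# on-demand pull: every page is owned by exactly one node at a time.

class PageTracker:
    """Per-node bookkeeping of page ownership during migration."""

    def __init__(self, owned_pages, unowned_page_ids):
        self.pages = dict(owned_pages)            # pid -> page contents
        self.owned = {pid: True for pid in owned_pages}
        self.owned.update({pid: False for pid in unowned_page_ids})

    def access(self, pid, fetch_from_source):
        """Return a page, pulling it on demand on an ownership miss."""
        if not self.owned[pid]:
            self.pages[pid] = fetch_from_source(pid)
            self.owned[pid] = True
        return self.pages[pid]

# Usage: the destination starts with no pages; the source gives up
# ownership before handing a page over, so each page moves only once.
source = PageTracker({1: b"page-1", 2: b"page-2"}, [])
dest = PageTracker({}, [1, 2])

def pull(pid):
    source.owned[pid] = False         # source relinquishes ownership
    return source.pages.pop(pid)

assert dest.access(1, pull) == b"page-1"
```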


Simplifying Assumptions

For this talk
◦ Small tenants, i.e. not sharded across nodes
◦ No replication
◦ No structural changes to indices

Extensions in the paper relax these assumptions

Design Overview

[Figure: the source node holds owned pages P1, P2, P3, …, Pn and active transactions TS1, …, TSk; the destination holds nothing yet. Legend: page owned by node vs. page not owned by node]

Init Mode

Freeze the index wireframe and migrate it to the destination

[Figure: the source keeps owned pages P1, …, Pn and active transactions TS1, …, TSk; the destination receives the wireframe with un-owned copies of P1, …, Pn]

What is an index wireframe?

[Figure: the internal index structure copied from source to destination, without the leaf data pages; a sketch follows below]
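As a rough illustration (assuming a B+-tree-style index; the node layout here is an assumption, not H2's), the wireframe keeps the internal routing structure while replacing leaf data pages with un-owned stubs:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """Simplified index node; real engines store pages, not objects."""
    is_leaf: bool
    page_id: int = -1
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

def wireframe(node: Node) -> dict:
    """Copy routing nodes; leaves become un-owned page stubs."""
    if node.is_leaf:
        return {"leaf_page": node.page_id, "owned": False}
    return {"keys": node.keys[:],
            "children": [wireframe(c) for c in node.children]}
```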


Dual Mode

Requests for un-owned pages can block
Index wireframes remain frozen

[Figure: the source executes old, still-active transactions TSk+1, …, TSl; the destination executes new transactions TD1, …, TDm. When transaction TDi accesses page P3, P3 is pulled from the source; a sketch of the routing follows below]
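A minimal dual-mode sketch under the stated assumptions (hypothetical router; reuses the PageTracker sketch above): transactions that were active when migration began finish at the source, new transactions start at the destination, and the destination resolves ownership misses with a synchronous pull:

```python
# Dual-mode sketch: route old transactions to the source, new ones to
# the destination; the destination pulls un-owned pages on first access.

def route(started_before_migration: bool) -> str:
    # A SQL router in front of both nodes makes this decision.
    return "source" if started_before_migration else "destination"

def destination_read(dest, pid, pull_from_source):
    # May block while the source finishes local work on the page and
    # releases its page-level lock before transferring ownership.
    return dest.access(pid, pull_from_source)
```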


Finish Mode

Pages can still be pulled by the destination, if needed

[Figure: transactions at the source have completed; the destination executes TDm+1, …, TDn while the remaining pages P1, P2, … are pushed from the source; a sketch of the push loop follows below]
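A sketch of the finish-mode push (hypothetical names, reusing the trackers above): the source pushes every page it still owns, while on-demand pull remains available for pages a destination transaction needs first:

```python
# Finish-mode sketch: asynchronously push all remaining source-owned
# pages; ownership transfers exactly once per page.

def finish_mode_push(source, dest):
    for pid in [p for p, owned in source.owned.items() if owned]:
        page = source.pages.pop(pid)
        source.owned[pid] = False    # source gives up ownership first
        dest.pages[pid] = page
        dest.owned[pid] = True       # destination now serves this page
```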


Normal Operation

Index wireframe is un-frozen

[Figure: the destination owns all pages P1, …, Pn and executes transactions TDn+1, …, TDp]

Artifacts of this design

Once migrated, pages are never pulled back by the source
◦ Transactions at the source accessing migrated pages are aborted (see the sketch below)

No structural changes to indices during migration
◦ Transactions (at both nodes) that make structural changes to indices abort

Destination “pulls” pages on demand
◦ Transactions at the destination experience higher latency compared to normal operation
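The first artifact can be made concrete (hypothetical names, reusing the tracker above): the source checks ownership on every page access and aborts rather than pulling a page back:

```python
# Sketch: the source never re-acquires a migrated page; a transaction
# touching one is aborted instead.

class PageMigratedError(Exception):
    """Signals that the enclosing source transaction must abort."""

def source_access(source, pid):
    if not source.owned[pid]:
        raise PageMigratedError(f"abort: page {pid} owned by destination")
    return source.pages[pid]
```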


Serializability (proofs in paper)

Only concern is “dual mode”
◦ Init and Finish: only one node is executing transactions

Local predicate locking of the internal index and exclusive page-level locking between nodes ⇒ no phantoms

Strict 2PL ⇒ transactions are locally serializable

Pages transferred only once
◦ No Tdest → Tsource conflict dependency

⇒ Guaranteed serializability

Recovery (proofs in paper)

Transaction recovery
◦ For every database page, transactions at the source are ordered before transactions at the destination
◦ After a failure, conflicting transactions are replayed in the same order

Migration recovery
◦ Atomic transitions between migration modes: logging and handshake protocols
◦ Every page has exactly one owner: bookkeeping at the index level

(A sketch of the recovery bookkeeping follows below.)
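The log records below are illustrative only, not Zephyr's actual log layout; the point is that ownership transfers and mode transitions are durably logged, so replay leaves every page with exactly one owner and both nodes in an agreed mode:

```python
# Sketch of migration recovery state reconstructed from a log of
# ("mode", name) and ("transfer", page_id) records (assumed formats).

def recover(log):
    """Replay the migration log after a crash."""
    mode, transferred = "normal", set()
    for kind, value in log:
        if kind == "mode":
            mode = value            # last acknowledged transition wins
        elif kind == "transfer":
            transferred.add(value)  # page now owned by the destination
    return mode, transferred

# Example: crash during dual mode after pages 3 and 7 moved.
log = [("mode", "init"), ("mode", "dual"), ("transfer", 3), ("transfer", 7)]
assert recover(log) == ("dual", {3, 7})
```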


Correctness (proofs in paper)

In the presence of arbitrary repeated failures, Zephyr ensures:
◦ Updates made to database pages are consistent
◦ A failure does not leave a page without an owner
◦ Both source and destination are in the same migration mode

Guaranteed termination and starvation freedom

Extensions (details in the paper)

Replicated tenants
Sharded tenants
Structural changes to the indices
◦ Using shared lock managers in the dual mode

Implementation

Prototyped using H2, an open source OLTP database
◦ Supports standard SQL/JDBC API
◦ Serializable isolation level
◦ Tree indices
◦ Relational data model

Modified the database engine
◦ Added support for freezing indices
◦ Page migration status maintained using the index
◦ Details in the paper…

Tungsten SQL Router migrates JDBC connections during migration

Experimental Setup

Two database nodes, each with a DB instance running

Synthetic benchmark as load generator
◦ Modified YCSB to add transactions
◦ Small read/write transactions

Compared against Stop and Copy (S&C)

Experimental Methodology

Default transaction parameters: 10 operations per transaction; 80% reads, 15% updates, 5% inserts

Workload: 60 sessions, 100 transactions per session

Default DB size: 100k rows (~250 MB)

Hardware: 2.4 GHz Intel Core 2 Quad, 8 GB RAM, 7200 RPM SATA HDs with 32 MB cache, Gigabit Ethernet

[Figure: a system controller holding metadata issues the Migrate command; a sketch of the workload mix follows below]
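The workload mix is simple to restate in code; the following generator is a sketch of the defaults above (the actual modified-YCSB harness is not shown in the talk):

```python
import random

# Sketch of the default workload: 60 sessions x 100 transactions,
# 10 operations each, drawn as 80% reads, 15% updates, 5% inserts
# over a ~100k-row table.

def make_transaction(num_rows=100_000, ops_per_txn=10):
    ops = []
    for _ in range(ops_per_txn):
        r = random.random()
        if r < 0.80:
            ops.append(("read", random.randrange(num_rows)))
        elif r < 0.95:
            ops.append(("update", random.randrange(num_rows)))
        else:
            ops.append(("insert", None))  # key assigned by the database
    return ops

workload = [[make_transaction() for _ in range(100)]  # 100 txns/session
            for _ in range(60)]                        # 60 sessions
```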


Results Overview

Downtime (tenant unavailability)
◦ S&C: 3–8 seconds (needed to migrate; unavailable for updates)
◦ Zephyr: no downtime; either source or destination is available

Service interruption (failed operations)
◦ S&C: hundreds to thousands of operations; all transactions with updates are aborted
◦ Zephyr: tens to hundreds of operations; orders of magnitude less interruption

Results Overview

Average increase in transaction latency (compared to the 6,000-transaction workload without migration)
◦ S&C: 10–15%; cold cache at destination
◦ Zephyr: 10–20%; pages fetched on demand

Data transfer
◦ S&C: the persistent database image
◦ Zephyr: 2–3% additional data transfer (messaging overhead)

Total time taken to migrate
◦ S&C: 3–8 seconds; unavailable for any writes
◦ Zephyr: 10–18 seconds; no unavailability

Failed Operations

[Chart: Zephyr incurs orders of magnitude fewer failed operations than S&C]

Contributions

Proposed Zephyr, a live database migration technique with no downtime for shared nothing architectures
◦ The first end-to-end solution with safety, correctness, and liveness guarantees

Prototype implementation on a relational OLTP database

Low cost on a variety of workloads

Back-up


More details

[Backup figures stepping through migration between source and destination: freeze indexes; duplicate indexes with sentinels; dual mode; finish mode]

Guarantees

Either source or destination is serving the tenant
◦ No downtime

Serializable transaction execution
◦ Unique page ownership
◦ Local multi-granularity locking

Safety in the presence of failures
◦ Transactions are atomic and durable
◦ Migration state is recovered from the log, ensuring consistency of the database state

Migration Cost Analysis

Wireframe copy
◦ Typically orders of magnitude smaller than the data

Operational overhead during migration
◦ Extra data (in addition to database pages) transferred
◦ Transactions aborted during migration

Effect of Inserts on Zephyr

[Chart: failed operations; failures are due to attempted modifications of the index structure]

Average Transaction Latency

[Chart: average transaction latency]
◦ Only committed transactions are reported
◦ Loss of cache affects both migration types
◦ Zephyr additionally incurs remote page fetches