
Rocksteady: Fast Migration for Low-Latency In-memory Storage

Chinmay Kulkarni, Aniraj Kesavan, Tian Zhang, Robert Ricci, Ryan Stutsman

1

Introduction

• Distributed low-latency in-memory key-value stores are emerging
  • Predictable response times: ~10 µs median, ~60 µs 99.9th-tile

• Problem: Must migrate data between servers
  • Minimize performance impact of migration → go slow?
  • Quickly respond to hot spots, skew shifts, load spikes → go fast?

• Solution: Fast data migration with low impact
  • Early ownership transfer of data, leverage workload skew
  • Low priority, parallel and adaptive migration

• Result: Migration protocol for the RAMCloud in-memory key-value store
  • Migrates 256 GB in 6 minutes, 99.9th-tile latency less than 250 µs
  • Median latency recovers from 40 µs to 20 µs in 14 s

2


Why Migrate Data?

Poor spatial locality → High multiGet() fan-out → More RPCs

3

[Figure: keys A–D are scattered across Server 1 and Server 2, so each client's multiGet() fans out to both servers (no locality, fanout = 7); throughput: 6 million ops/s]


Migrate To Improve Spatial Locality

4

[Figure: keys are migrated between Server 1 and Server 2 so that each client's keys end up co-located on a single server; during migration there is still no locality (fanout = 7) and throughput stays at 6 million ops/s]


Spatial Locality Improves Throughput

Better spatial locality → Fewer RPCs → Higher throughput
Benefits multiGet(), range scans

5

[Figure: with full locality, each client's multiGet() hits a single server (fanout = 1) and throughput rises from 6 million to 25 million ops/s compared to the no-locality case (fanout = 7)]
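The throughput difference comes from how a client batches a multiGet(): keys are grouped by owning server and one RPC goes to each distinct owner, so the fan-out equals the number of owners touched. Below is a minimal, illustrative sketch of that grouping; the types (ServerId, ownerOf, multiGetFanout) are invented and are not RAMCloud's client API, and the slide's 7-vs-1 fan-out is simplified here to 2-vs-1.

    // Sketch only: illustrates multiGet fan-out, not RAMCloud's real client library.
    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    using ServerId = uint32_t;

    // Hypothetical ownership lookup: maps a key to the server that stores it.
    ServerId ownerOf(const std::string& key, const std::map<std::string, ServerId>& tabletMap) {
        return tabletMap.at(key);
    }

    // Group keys by owner; one RPC per distinct server, so fan-out == groups.size().
    size_t multiGetFanout(const std::vector<std::string>& keys,
                          const std::map<std::string, ServerId>& tabletMap) {
        std::map<ServerId, std::vector<std::string>> groups;
        for (const auto& k : keys)
            groups[ownerOf(k, tabletMap)].push_back(k);
        return groups.size();   // each group becomes one multiGet RPC
    }

    int main() {
        // No locality: related keys scattered across two servers -> fan-out 2.
        std::map<std::string, ServerId> scattered{{"a", 1}, {"b", 2}, {"c", 1}, {"d", 2}};
        // Full locality: the same keys co-located on one server -> fan-out 1.
        std::map<std::string, ServerId> colocated{{"a", 1}, {"b", 1}, {"c", 1}, {"d", 1}};
        std::vector<std::string> keys{"a", "b", "c", "d"};
        std::cout << "fan-out, no locality:   " << multiGetFanout(keys, scattered) << "\n";
        std::cout << "fan-out, full locality: " << multiGetFanout(keys, colocated) << "\n";
    }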

The RAMCloud Key-Value Store

6

[Figure: RAMCloud cluster — clients connect over the data center fabric to storage servers (each running a master and a backup) and to a coordinator. All data is kept in DRAM; kernel bypass (DPDK) networking gives ~10 µs reads]

The RAMCloud Key-Value Store

7

[Figure: same cluster diagram; a client issues a Write RPC to one of the masters]

The RAMCloud Key-Value Store

8

[Figure: on a write, the master keeps one copy in DRAM (1x) and the data is replicated 3x on disk across backups]

Fault-tolerance & Recovery In RAMCloud

9

[Figure: RAMCloud cluster; one master fails]

Fault-tolerance & Recovery In RAMCloud

10

[Figure: the crashed master's data is reconstructed from backups spread across the cluster; 2 seconds to recover]

Performance Goals For Migration

• Maintain low access latency
  • 10 µs median latency → System extremely sensitive
  • Tail latency matters at scale → Even more sensitive

• Migrate data fast
  • Workloads dynamic → Respond quickly
  • Growing DRAM storage: 512 GB per server
  • Slow data migration → Entire day to scale cluster

11

Rocksteady Overview: Early Ownership Transfer

Problem: Loaded source can bottleneck migration
Solution: Instantly shift ownership and all load to target

12

[Figure: all four clients send their reads and writes to the Source Server; the Target Server is idle]

Rocksteady Overview: Early Ownership Transfer

Problem: Loaded source can bottleneck migration
Solution: Instantly shift ownership and all load to target

13

[Figure: the clients are instantly redirected; all future operations are serviced at the Target]

Creates “headroom” to speed migration
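As a rough illustration of early ownership transfer, the sketch below uses an invented coordinator-side TabletMap (not RAMCloud's actual structures): the owner of the migrating key-hash range is flipped to the target before any data moves, so clients that refresh their routing information are redirected immediately and the source sheds that load.

    // Sketch only: invented coordinator-side structures, not RAMCloud's actual code.
    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <mutex>

    using ServerId = uint64_t;

    // A tablet covers a contiguous range of 64-bit key hashes.
    struct HashRange {
        uint64_t first, last;
        bool operator<(const HashRange& o) const { return first < o.first; }
    };

    class TabletMap {
        std::map<HashRange, ServerId> owners;
        std::mutex lock;
    public:
        void setOwner(HashRange r, ServerId s) {
            std::lock_guard<std::mutex> g(lock);
            owners[r] = s;
        }
        // Early ownership transfer: flip the owner before migrating any data.
        // Clients that look up this range from now on are sent to the target,
        // which services misses with on-demand pulls from the source.
        void transferOwnership(HashRange r, ServerId target) { setOwner(r, target); }
        ServerId ownerOf(uint64_t hash) {
            std::lock_guard<std::mutex> g(lock);
            for (auto& [range, server] : owners)
                if (hash >= range.first && hash <= range.last) return server;
            return 0;  // unknown
        }
    };

    int main() {
        TabletMap map;
        map.setOwner({0, ~0ull}, /*source=*/1);
        std::cout << "before: owner = " << map.ownerOf(42) << "\n";
        map.transferOwnership({0, ~0ull}, /*target=*/2);   // migration starts here
        std::cout << "after:  owner = " << map.ownerOf(42) << "\n";
    }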

Rocksteady Overview: Leverage Skew

14

[Figure: a client read arrives at the Target for a key that has not been pulled yet; the Target issues an on-demand pull to the Source]

Problem: Data has not arrived at the Target yet
Solution: On-demand migration of unavailable data

Rocksteady Overview: Leverage Skew

15

[Figure: a client read is served at the Target during migration]

Problem: Data has not arrived at the Target yet
Solution: On-demand migration of unavailable data

Hot keys move early

Median latency recovers to 20 µs in 14 s
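The sketch below gives one plausible shape for the target's read path during migration; the names (Migration, priorityPullFromSource) are placeholders, not Rocksteady's code. A miss on a not-yet-migrated key triggers an on-demand pull from the source at higher priority than the background pulls, which is why hot keys arrive first under skew.

    // Sketch only: invented names; illustrates the Target's read path while a
    // migration is in progress.
    #include <iostream>
    #include <optional>
    #include <string>
    #include <unordered_map>

    struct Value { std::string bytes; };

    // Stand-in for the migration state of one tablet at the Target.
    struct Migration {
        bool inProgress = true;
        // Placeholder for the RPC that pulls one key (or its hash bucket) from the
        // Source at higher priority than the background pulls.
        std::optional<Value> priorityPullFromSource(const std::string& key) {
            return Value{"value-of-" + key};   // stub: pretend the Source replied
        }
    };

    class TargetServer {
        std::unordered_map<std::string, Value> hashTable;   // data migrated so far
        Migration migration;
    public:
        std::optional<Value> read(const std::string& key) {
            auto it = hashTable.find(key);
            if (it != hashTable.end())
                return it->second;                           // already pulled: fast path
            if (migration.inProgress) {
                // Miss during migration: fetch just this key from the Source now,
                // install it, and serve the request. Skewed workloads mean the few
                // hot keys migrate this way almost immediately after ownership moves.
                if (auto v = migration.priorityPullFromSource(key)) {
                    hashTable.emplace(key, *v);
                    return v;
                }
            }
            return std::nullopt;                             // key does not exist
        }
    };

    int main() {
        TargetServer target;
        std::cout << target.read("hotKey")->bytes << "\n";   // served via on-demand pull
    }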

Rocksteady Overview: Adaptive and Parallel

16

[Figure: the Target issues parallel background pulls and on-demand pulls to the Source while serving all four clients]

Problem: Old single-threaded protocol limited to 130 MB/s
Solution: Pipelined and parallel at source and target

Rocksteady Overview: Adaptive and Parallel

17

[Figure: the Target-driven parallel pulls yield to on-demand pulls and move 758 MB/s]

Problem: Old single-threaded protocol limited to 130 MB/s
Solution: Pipelined and parallel at source and target

18

Rocksteady Overview: Eliminate Sync Replication

[Figure: alongside the parallel and on-demand pulls, the Target must also replicate incoming data to backups]

Problem: Synchronous replication bottleneck at target
Solution: Safely defer replication until after migration

19

Rocksteady Overview: Eliminate Sync Replication

[Figure: the Target defers replication of migrated data until after the migration completes]

Problem: Synchronous replication bottleneck at target
Solution: Safely defer replication until after migration
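A simplified sketch of that split in responsibilities, with invented types: client mutations arriving during migration are appended to the target's recovery log and replicated right away, while records pulled from the source live only in DRAM until a single re-replication pass after the last pull.

    // Sketch only: invented types; shows the split the slides describe — client
    // mutations are replicated by the Target right away, while data pulled from
    // the Source is installed in DRAM only and re-replicated after migration.
    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct Record { std::string key, value; };

    // Stand-in for the Target's distributed recovery log (3x on backups' disks).
    struct RecoveryLog {
        std::vector<Record> entries;
        void appendAndReplicate(const Record& r) { entries.push_back(r); /* + backup RPCs */ }
    };

    class TargetServer {
        std::unordered_map<std::string, std::string> hashTable;  // DRAM copy
        std::vector<Record> pulledButUnlogged;                    // migrated, not yet in our log
        RecoveryLog log;
    public:
        // A mutation during migration: only the Target can make it durable.
        void write(const Record& r) {
            hashTable[r.key] = r.value;
            log.appendAndReplicate(r);
        }
        // A record arriving via a background or on-demand pull: DRAM only for now.
        // If the Target crashes first, it is still recoverable from the Source's log.
        void installPulled(const Record& r) {
            hashTable[r.key] = r.value;
            pulledButUnlogged.push_back(r);
        }
        // After the last pull: re-replicate the migrated data in one batch; the
        // real system uses record versions so a stale pulled value never wins
        // over a newer mutation at recovery.
        void completeMigration() {
            for (const auto& r : pulledButUnlogged)
                log.appendAndReplicate(r);
            pulledButUnlogged.clear();
            std::cout << "migration durable: " << log.entries.size() << " log entries\n";
        }
    };

    int main() {
        TargetServer t;
        t.installPulled({"a", "1"});   // pulled from the Source, not yet logged here
        t.write({"c", "2"});           // client mutation, logged and replicated now
        t.completeMigration();         // prints: migration durable: 2 log entries
    }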

Rocksteady: Putting it all together

• Instantaneous ownership transfer
  • Immediate load reduction at overloaded source
  • Creates “headroom” for migration work

• Leverage skew to rapidly migrate hot data
  • Target comes up to speed with little data movement

• Adaptive parallel, pipelined at source and target
  • All cores avoid stalls, but yield to client-facing operations

• Safely defer replication at target
  • Eliminates replication bottleneck and contention

20

Rocksteady

• Instantaneous ownership transfer

• Leverage skew to rapidly migrate hot data

• Adaptive parallel, pipelined at source and target

• Safely defer synchronous replication at target

21


Evaluation Setup

22

[Figure: the Source Server holds 300 million records (45 GB); a Target Server stands by; four clients run YCSB-B (95% reads / 5% writes) with Zipfian skew 0.99]

Evaluation Setup

23

[Figure: half the table (150 million records, 22.5 GB) is migrated to the Target, leaving 150 million records (22.5 GB) on the Source, while the four YCSB-B (95/5, skew 0.99) clients keep running]

Instantaneous Ownership Transfer

Before migration: Source over-loaded, Target under-loaded
Ownership transfer creates Source headroom for migration

24

[Chart: Source CPU utilization is 80% before ownership transfer and 25% immediately after]

Created 55% Source CPU headroom

Rocksteady

• Instantaneous ownership transfer

• Leverage skew to rapidly migrate hot data

• Adaptive parallel, pipelined at source and target

• Safely defer synchronous replication at target

25

Leverage Skew To Move Hot Data

After ownership transfer, hot keys pulled on-demand
More skew → Median restored faster (migrate fewer hot keys)

26

[Chart: access latency during migration, by workload skew]

                        Uniform (Low)   Skew=0.99   Skew=1.5 (High)
  99.9th-tile latency       240 µs        245 µs        155 µs
  Median latency             75 µs         28 µs         17 µs

Before migration: Median = 10 µs, 99.9th = 60 µs

Rocksteady

• Instantaneous ownership transfer

• Leverage skew to rapidly migrate hot data

• Adaptive parallel, pipelined at source and target

• Safely defer synchronous replication at target

27

Parallel, Pipelined, & Adaptive Pulls

28

[Figure: Target server — the dispatch core polls the NIC and hands requests (e.g. read(B)) to worker cores; the migration manager keeps pull buffers in flight while worker cores replay pulled data into per-core buffers of the hash table]

• Target driven: a migration manager at the Target coordinates pulls

• Co-partitioned hash tables, pull from partitions in parallel

• Replay pulled data into per-core buffers
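To make the target-side structure concrete, here is a hedged sketch with invented names and a single-threaded stub in place of the asynchronous pull RPC: a migration manager keeps pulls outstanding across the co-partitioned hash-table slices, and idle worker cores drain completed pull buffers into per-core buffers so replay needs no cross-core locking.

    // Sketch only: invented names; shows the shape of Target-driven, partitioned,
    // pipelined pulls with per-core replay, not Rocksteady's actual classes.
    #include <cstdint>
    #include <deque>
    #include <iostream>
    #include <string>
    #include <vector>

    struct Record { std::string key, value; };

    // One slice of the Source's hash table (co-partitioned with the Target's).
    struct Partition {
        uint64_t nextBucket = 0;   // resume point for the next pull RPC
        bool done = false;
    };

    struct PullResult { size_t partition; std::vector<Record> records; };

    // Stub for the asynchronous pull RPC to the Source; the real RPC returns
    // roughly 20 KB of records starting at the partition's resume point.
    PullResult issuePullRpc(size_t idx, Partition& p) {
        p.nextBucket += 1;
        p.done = true;             // one batch per partition in this sketch
        return {idx, {{"key" + std::to_string(idx), "value"}}};
    }

    class MigrationManager {
        std::vector<Partition> partitions;       // e.g. one per Source hash-table slice
        std::deque<PullResult> completedPulls;   // pipeline: pulled, not yet replayed
    public:
        explicit MigrationManager(size_t n) : partitions(n) {}

        // Keep pulls outstanding on every partition so the Source's cores and the
        // network stay busy without any single thread becoming the bottleneck.
        void pumpPulls() {
            for (size_t i = 0; i < partitions.size(); i++)
                if (!partitions[i].done)
                    completedPulls.push_back(issuePullRpc(i, partitions[i]));
        }

        // Run on an idle worker core: replay one pulled buffer into that core's
        // private buffer, so replay needs no cross-core synchronization.
        bool replayOneBuffer(std::vector<Record>& perCoreBuffer) {
            if (completedPulls.empty()) return false;
            for (auto& r : completedPulls.front().records)
                perCoreBuffer.push_back(std::move(r));
            completedPulls.pop_front();
            return true;
        }
    };

    int main() {
        MigrationManager mgr(8);                  // 8 partitions pulled in parallel
        mgr.pumpPulls();
        std::vector<Record> coreLocalBuffer;      // one of the per-core buffers
        while (mgr.replayOneBuffer(coreLocalBuffer)) {}
        std::cout << "replayed " << coreLocalBuffer.size() << " records\n";
    }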

Parallel, Pipelined, & Adaptive Pulls

29

[Figure: Source server — the dispatch core polls the NIC and keeps serving reads (e.g. read(A)); concurrent pull RPCs (e.g. pull(11), pull(17)) scan disjoint hash-table partitions and build gather lists of record addresses for the NIC to copy]

• Stateless passive Source

• Granular 20 KB pulls
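The source side can be pictured as below; the structures are assumptions, and the real source hands the NIC a zero-copy gather list of record addresses rather than copying values. Each pull scans one hash-table partition from a target-supplied resume point and stops after roughly 20 KB, which keeps the source stateless and each RPC short.

    // Sketch only: invented structures; the real Source builds a zero-copy gather
    // list of record addresses for the NIC instead of copying values.
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <vector>

    struct Record { std::string key, value; };

    struct PullRequest {
        uint64_t firstBucket;   // resume point chosen by the Target (Source keeps no state)
        uint64_t lastBucket;    // end of this hash-table partition
    };

    struct PullReply {
        std::vector<const Record*> gatherList;  // references to records in the Source's log
        uint64_t nextBucket;                    // where the Target should resume
    };

    // Handle one pull: walk buckets from the resume point and stop after ~20 KB,
    // so each RPC stays small enough to interleave with normal client requests.
    PullReply handlePull(const std::vector<std::vector<Record>>& hashTable,
                         const PullRequest& req) {
        constexpr size_t kPullBytes = 20 * 1024;
        PullReply reply;
        size_t bytes = 0;
        uint64_t b = req.firstBucket;
        for (; b <= req.lastBucket && bytes < kPullBytes; b++) {
            for (const Record& r : hashTable[b]) {
                reply.gatherList.push_back(&r);
                bytes += r.key.size() + r.value.size();
            }
        }
        reply.nextBucket = b;   // the Target stores this; the Source stays stateless
        return reply;
    }

    int main() {
        // Four buckets in this partition, each holding one small record.
        std::vector<std::vector<Record>> table{
            {{"a", "1"}}, {{"b", "2"}}, {{"c", "3"}}, {{"d", "4"}}};
        PullReply r = handlePull(table, {0, 3});
        std::cout << "gathered " << r.gatherList.size()
                  << " records, resume at bucket " << r.nextBucket << "\n";
    }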

Parallel, Pipelined, & Adaptive Pulls

30

• Redirect any idle CPU for migration

• Migration yields to regular requests, on-demand pulls
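A toy sketch of that priority order on a worker core (the scheduler here is invented, not RAMCloud's dispatch): migration work is taken only when no client request or on-demand pull is waiting.

    // Sketch only: invented scheduler; shows migration work running only when a
    // worker core has nothing client-facing to do.
    #include <deque>
    #include <functional>
    #include <iostream>

    struct Task { std::function<void()> run; };

    class WorkerCore {
        std::deque<Task> clientRequests;   // normal reads/writes: highest priority
        std::deque<Task> onDemandPulls;    // misses at the Target: next priority
        std::deque<Task> migrationWork;    // background pull/replay: lowest priority
    public:
        std::deque<Task>& clients()   { return clientRequests; }
        std::deque<Task>& pulls()     { return onDemandPulls; }
        std::deque<Task>& migration() { return migrationWork; }

        // One iteration of the core's polling loop: migration replay runs only when
        // nothing client-facing is waiting, so any idle CPU speeds migration without
        // adding to access latency.
        void pollOnce() {
            if (!clientRequests.empty()) {
                clientRequests.front().run(); clientRequests.pop_front();
            } else if (!onDemandPulls.empty()) {
                onDemandPulls.front().run(); onDemandPulls.pop_front();
            } else if (!migrationWork.empty()) {
                migrationWork.front().run(); migrationWork.pop_front();
            }
        }
    };

    int main() {
        WorkerCore core;
        core.migration().push_back({[] { std::cout << "replay one pulled buffer\n"; }});
        core.clients().push_back({[] { std::cout << "serve client read\n"; }});
        core.pollOnce();   // the client read runs first
        core.pollOnce();   // then the migration work
    }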

Rocksteady

• Instantaneous ownership transfer

• Leverage skew to rapidly migrate hot data

• Adaptive parallel, pipelined at source and target

• Safely defer synchronous replication at target

31

Each server has a recovery log distributed across the cluster


Naïve Fault Tolerance During Migration

32

[Figure: the Source holds A, B, C, triplicated across the cluster's backups in the Source recovery log; the Target has pulled A, but the Target recovery log is still empty]

Naïve Fault Tolerance During Migration

Migrated data needs to be triplicated to target’s recovery log

33

[Figure: same setup; the copy of A pulled to the Target is not yet in the Target recovery log]

Naïve Fault Tolerance During Migration

Migrated data needs to be triplicated to target’s recovery log

34

[Figure: with naïve fault tolerance, A is triplicated into the Target recovery log as soon as it is pulled]

Synchronous Replication Bottlenecks Migration

Synchronous replication cuts migration speed by 34%

35

[Figure: as A and B are pulled, each must be synchronously triplicated into the Target recovery log, competing with migration for the Target's resources]

Replicate at Target only after all data has been moved over

36

[Figure: all of A, B, C have been pulled to the Target; only now is the migrated data replicated into the Target recovery log]

Rocksteady: Safely Defer Replication At The Target

Writes/Mutations Served By Target

Mutations have to be replicated by the target

37

[Figure: a client write creates C’ at the Target; C’ is immediately triplicated into the Target recovery log, unlike the migrated data A and B]

Crash Safety During Migration

• Need both Source and Target recovery log for data recovery
  • Initial table state on Source recovery log
  • Writes/Mutations on Target recovery log

• Transfer ownership back to Source in case of crash
  • Migration cancelled
  • Recovery involves both recovery logs

• Source takes a dependency on Target recovery log at migration start
  • Stored reliably at the cluster coordinator
  • Identifies position after which mutations present

38
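The dependency can be pictured as a small record kept at the coordinator; the sketch below uses invented names (MigrationDependency, startMigration, onCrash) to show what it captures and how a crash is handled: cancel the migration, return ownership to the source, and recover from both logs.

    // Sketch only: invented coordinator-side bookkeeping for the cross-log
    // dependency that makes migration crash-safe.
    #include <cstdint>
    #include <iostream>
    #include <map>

    using ServerId = uint64_t;

    // Recorded durably at the cluster coordinator when migration starts. To recover
    // the migrating tablet, replay the Source's recovery log for the initial data,
    // plus the Target's recovery log from `targetLogHead` onward for mutations
    // applied after ownership moved.
    struct MigrationDependency {
        ServerId source;
        ServerId target;
        uint64_t targetLogHead;   // position after which the tablet's mutations appear
    };

    class Coordinator {
        std::map<uint64_t, MigrationDependency> activeMigrations;  // keyed by tablet id
    public:
        void startMigration(uint64_t tabletId, MigrationDependency d) {
            activeMigrations[tabletId] = d;        // stored reliably before any pulls
        }
        // If the Source or Target crashes mid-migration: cancel, hand ownership
        // back to the (recovered) Source, and recover using both logs.
        void onCrash(uint64_t tabletId) {
            auto it = activeMigrations.find(tabletId);
            if (it == activeMigrations.end()) return;
            const auto& d = it->second;
            std::cout << "cancel migration of tablet " << tabletId
                      << "; replay source log of server " << d.source
                      << " plus target log of server " << d.target
                      << " from position " << d.targetLogHead << "\n";
            activeMigrations.erase(it);
        }
    };

    int main() {
        Coordinator c;
        c.startMigration(7, {/*source=*/1, /*target=*/2, /*targetLogHead=*/4096});
        c.onCrash(7);
    }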

If The Source Crashes During Migration

Recover the Source from its recovery log; post-transfer mutations are recovered from the Target recovery log

39

[Figure: the Source crashes; A, B, C are recovered from the Source recovery log and the post-transfer mutation C’ from the Target recovery log]

If The Target Crashes During Migration

Recover the Target using both the Source and the Target recovery logs

40

[Figure: the Target crashes; the tablet is recovered using both the Source recovery log (A, B, C) and the Target recovery log (C’)]

Crash Safety During Migration

• Need both Source and Target recovery log for data recovery
  • Initial table state on Source recovery log
  • Writes/Mutations on Target recovery log

• Transfer ownership back to Source in case of crash
  • Migration cancelled
  • Recovery involves both recovery logs

• Source takes a dependency on Target recovery log at migration start
  • Stored reliably at the cluster coordinator
  • Identifies position after which mutations present

41

Safely Transfer Ownership At Migration Start

Safely Delay Replication Till All Data Has Been Moved

Performance of Rocksteady

42

[Chart: median access latency (µs) vs. time since experiment start (s), comparing Rocksteady against a baseline where the Source keeps ownership and uses synchronous replication]

YCSB-B, 300 Million objects (30 B key, 100 B value), migrate half

Performance of Rocksteady

43

[Chart: same comparison; Rocksteady's median latency becomes better than the baseline's after ~14 seconds]

YCSB-B, 300 Million objects (30 B key, 100 B value), migrate half

Performance of Rocksteady

44

[Chart: same comparison; Rocksteady's median latency is better after ~14 seconds and migration completes 28% faster]

YCSB-B, 300 Million objects (30 B key, 100 B value), migrate half

Related Work

• Dynamo: Pre-partition hash keys

• Spanner: Applications given control over locality (Directories)

• FaRM and DrTM: Re-use in-memory redundancy for migration

• Squall: Reconfiguration protocol for H-Store
  • Early ownership transfer
  • Paced background migration
  • Fully partitioned, serial execution, no synchronization
  • Each migration pull stalls execution
  • Synchronous replication at the target

45

Conclusion

• Distributed low-latency in-memory key-value stores are emerging
  • Predictable response times: ~10 µs median, ~60 µs 99.9th-tile

• Problem: Must migrate data between servers
  • Minimize performance impact of migration → go slow?
  • Quickly respond to hot spots, skew shifts, load spikes → go fast?

• Solution: Fast data migration with low impact
  • Leverage skew: Transfer ownership before data, move hot data first
  • Low priority, parallel and adaptive migration

• Result: Migration protocol for the RAMCloud in-memory key-value store
  • Migrates at 758 MB/s with 99.9th-tile latency < 250 µs

Source Code: https://github.com/utah-scs/RAMCloud/tree/rocksteady-sosp2017

46

Backup Slides

47

Rocksteady Tail Latency Breakdown

[Chart: 99.9th-percentile access latency (µs, axis 0–300) during migration for Rocksteady]

48

Rocksteady Tail Latency Breakdown

[Chart: 99.9th-percentile access latency (µs, axis 0–300) during migration for Rocksteady]

49

• Disabling parallel pulls brings tail latency down to 160 µsec

Rocksteady Tail Latency Breakdown

[Chart: 99.9th-percentile access latency (µs, axis 0–300) during migration for Rocksteady]

50

• Disabling parallel pulls brings tail latency down to 160 µsec

• Synchronous on-demand pulls further bring tail latency down to 135 µsec