Review - Scalable Atomic Visibility with RAMP Transactions



Introduction

Scalable Atomic Visibility with RAMP Transactions

Peter Bailis, Alan Fekete², Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica
UC Berkeley and ²University of Sydney

Iskandar Setiadi
13511073

Advanced Distributed System
Institut Teknologi Bandung

April 21, 2015

April 14, 2023 1

Iskandar Setiadi

Outline

Introduction
Overview and Motivation
Semantic and System Model
RAMP Transaction Algorithms
Experimental Evaluation

Note: Due to time restrictions, several additional details and further optimizations are left as an exercise for the reader.


Introduction

Transaction
A sequence of operations performed as a single logical unit of work.

Atomically Visible Transactional Access
Cases where all or none of each transaction’s effects should be visible.
Example: if a transaction T1 writes x = 1 and y = 1, then another transaction T2 should not read x = 1 and y = null.


Current Problems

Scalability and Atomic Visibility
Many traditional transactional mechanisms use two-phase locking and variants of optimistic concurrency control to ensure the correctness of transactions.
These algorithms are slow and, under failure, unavailable in a distributed environment.


Read Atomic (RA) Isolation

Read Atomic Multi-Partition (RAMP)
This algorithm enforces atomic visibility while offering excellent scalability, guaranteed commit despite partial failures (via synchronization independence), and minimized communication between servers (via partition independence).

RAMP transactions allow reads to “race” writes: a reader can autonomously detect the presence of non-atomic reads and, if necessary, repair them via a second round of communication with servers.


Overview

RAMP uses an atomic commitment protocol (ACP) with non-blocking concurrency control mechanisms: individual transactions can stall due to failures or communication delays without forcing other transactions to stall.


Motivation: Foreign Key Constraints

Facebook and LinkedIn Espresso allow a user to perform a “like” action on a message or post.
Violations of atomic visibility may surface as broken bi-directional relationships (e.g., the friend relationship in Facebook) and dangling references.


Motivation (Cont.)

Secondary Indexing
Searching data via secondary attributes (e.g., birth date) is challenging. Cassandra and Google Megastore provide local secondary indexes, which require contacting every partition for secondary attribute lookups.

Materialized View Maintenance
Example: a mailbox “unread message counter”.


Semantic and System Model

Fractured Reads
A transaction T_j exhibits fractured reads if transaction T_i writes versions x_m and y_n (in any order, with x possibly but not necessarily equal to y), T_j reads version x_m and version y_k, and k < n.

Read Atomic (RA) isolation prevents fractured-read anomalies and also prevents transactions from reading uncommitted, aborted, or intermediate data (snapshot view).
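
The fractured-reads definition above can be checked mechanically. The following is a minimal Python sketch, assuming each write transaction is summarized as a mapping from item to the version number it wrote and each read transaction as a mapping from item to the version it observed. The function name `has_fractured_reads` and this data layout are illustrative assumptions, not part of the paper.

```python
def has_fractured_reads(writes_by_txn, reads):
    """Return True if `reads` saw one write of a transaction but an older
    version of another item that the same transaction also wrote.

    writes_by_txn: list of dicts, one per write transaction (item -> version written)
    reads:         dict (item -> version observed by the reading transaction)
    Versions are assumed to be integers ordered per item (hypothetical layout).
    """
    for written in writes_by_txn:
        # Items of this transaction whose exact version the reader observed.
        seen = [item for item, v in written.items() if reads.get(item) == v]
        if not seen:
            continue
        for item, v in written.items():
            if item in reads and reads[item] < v:
                return True  # reader saw version m of x but version k < n of y
    return False


t1 = {"x": 1, "y": 1}  # T1 writes version 1 of x and version 1 of y
print(has_fractured_reads([t1], {"x": 1, "y": 0}))  # True: saw x but an older y
print(has_fractured_reads([t1], {"x": 1, "y": 1}))  # False: saw both writes
print(has_fractured_reads([t1], {"x": 0, "y": 0}))  # False: saw neither write
```

Version 0 stands in for the state before T1 wrote anything (the "null" in the introductory example).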


RA Implications & Limitations

RA does not prevent concurrent updates or provide serial access to data items.
Example: RA cannot be used to maintain bank account balances. RA is a better fit for the “friend” operation.
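
To see why bank balances are a poor fit, consider two concurrent deposits that each read the balance and write back a new total. The sketch below is only an illustration under a "last writer wins" policy; the tuple layout `(timestamp, balance)` and the helper `last_writer_wins` are assumptions made for this example.

```python
def last_writer_wins(versions):
    """Return the value carrying the highest timestamp."""
    return max(versions, key=lambda v: v[0])[1]


# Two clients both read balance = 100, then each writes back its own result.
initial = (0, 100)              # (timestamp, balance)
read_a = read_b = initial[1]
write_a = (1, read_a + 10)      # deposit 10
write_b = (2, read_b + 20)      # deposit 20, unaware of write_a

balance = last_writer_wins([initial, write_a, write_b])
print(balance)  # 120 instead of 130: the deposit of 10 is lost
```

A “friend” operation does not depend on a previously read value, so it is not affected by this kind of lost update; it only needs both sides of the relationship to become visible together.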


RAMP Transaction Algorithms

Given the specification of RA isolation and scalability, the following algorithms focus on providing read-only and write-only transactions with a “last writer wins” overwrite policy.

3 types:
1. RAMP-Fast (RAMP-F): metadata size is linear in transaction size (not data size)
2. RAMP-Hybrid (RAMP-H): constant-factor metadata
3. RAMP-Small (RAMP-S): constant-factor metadata


RAMP-Fast

One RTT for reads in the stable case; a second RTT is needed only for partial reads (reads that race a concurrent write)
Two RTTs for writes


RAMP-Fast (Cont.)


RAMP-Fast (Cont.)

Write
In the PREPARE phase, each partition adds the write to its local database.
In the COMMIT phase, each partition updates an index containing the highest-timestamped committed version of each item.

Read
Fetch the last committed version of each item and calculate whether any versions are “missing”.
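
The write and read steps above can be sketched in a few lines of Python. This is a simplified, single-process illustration and not the paper's implementation: it assumes one partition per item, and the names `Partition`, `put_all`, `get_all`, and the `commit_on` knob (used only to simulate a write whose COMMIT has reached some partitions but not others) are hypothetical.

```python
from collections import namedtuple

Version = namedtuple("Version", "item value ts siblings")
NULL_TS = -1  # timestamp standing in for "no version yet"


class Partition:
    def __init__(self):
        self.versions = {}     # (item, ts) -> Version, including prepared-only writes
        self.last_commit = {}  # item -> highest committed timestamp

    def prepare(self, v):
        self.versions[(v.item, v.ts)] = v

    def commit(self, ts):
        for (item, vts) in self.versions:
            if vts == ts:
                self.last_commit[item] = max(self.last_commit.get(item, NULL_TS), ts)

    def get(self, item, ts=None):
        if ts is None:  # first round: latest committed version
            ts = self.last_commit.get(item, NULL_TS)
        return self.versions.get((item, ts), Version(item, None, NULL_TS, frozenset()))


def put_all(partitions, writes, ts, commit_on=None):
    """Two-phase write: PREPARE everywhere, then COMMIT (optionally only on some items)."""
    for item, value in writes.items():
        siblings = frozenset(writes) - {item}  # metadata: the other items written
        partitions[item].prepare(Version(item, value, ts, siblings))
    for item in (writes if commit_on is None else commit_on):
        partitions[item].commit(ts)


def get_all(partitions, items):
    ret = {i: partitions[i].get(i) for i in items}  # round 1
    latest = {}
    for v in ret.values():
        for sib in v.siblings:
            latest[sib] = max(latest.get(sib, NULL_TS), v.ts)
    for i in items:
        if latest.get(i, NULL_TS) > ret[i].ts:        # a sibling write is "missing"
            ret[i] = partitions[i].get(i, latest[i])  # round 2
    return {i: v.value for i, v in ret.items()}


parts = {"x": Partition(), "y": Partition()}
put_all(parts, {"x": 1, "y": 1}, ts=1, commit_on=["x"])  # COMMIT has reached only x
print(get_all(parts, ["x", "y"]))  # {'x': 1, 'y': 1}
```

In the usage example the reader finds y's write through the sibling metadata attached to x and fetches the prepared version of y by timestamp, so it never returns x = 1 together with y = null.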


RAMP-Fast (Algorithm)


RAMP-Small

RAMP-S uses constant-size metadata but always requires two RTTs for reads.
First round of reads: fetch the highest committed timestamp for each item from its respective partition.
Second round of reads: retrieve the highest-timestamped version of each item that also appears in the supplied set of timestamps.
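
The two read rounds described above can be sketched as follows. This is a minimal single-process illustration, assuming one partition per item; `SmallPartition`, `latest_ts`, and `get_all` are hypothetical names, and -1 is used as a placeholder timestamp for "nothing committed yet".

```python
class SmallPartition:
    def __init__(self):
        self.versions = {}     # item -> {ts: value}, prepared and committed writes
        self.last_commit = {}  # item -> highest committed timestamp

    def prepare(self, item, value, ts):
        self.versions.setdefault(item, {})[ts] = value

    def commit(self, item, ts):
        self.last_commit[item] = max(self.last_commit.get(item, -1), ts)

    def latest_ts(self, item):
        # Round 1: only a timestamp is returned, no value and no sibling list.
        return self.last_commit.get(item, -1)

    def get(self, item, ts_set):
        # Round 2: highest-timestamped version whose timestamp is in ts_set.
        matches = [ts for ts in self.versions.get(item, {}) if ts in ts_set]
        return self.versions[item][max(matches)] if matches else None


def get_all(partitions, items):
    ts_set = {partitions[i].latest_ts(i) for i in items}     # round 1
    return {i: partitions[i].get(i, ts_set) for i in items}  # round 2


parts = {"x": SmallPartition(), "y": SmallPartition()}
for item in ("x", "y"):
    parts[item].prepare(item, 1, ts=1)
parts["x"].commit("x", ts=1)        # COMMIT has reached x but not yet y
print(get_all(parts, ["x", "y"]))   # {'x': 1, 'y': 1}
```

Because the timestamp 1 learned from x is included in the set sent to y's partition, the prepared version of y is returned even though its COMMIT has not arrived.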


RAMP-Small (Algorithm)


RAMP-Hybrid

RAMP-H Write: store a Bloom filter as the metadata.
RAMP-H Read: same as RAMP-F, except that the algorithm computes a list of potentially higher-timestamped writes for each item from the Bloom filter. Any potentially missing versions are fetched in a second round of reads.
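
As a rough illustration of the read-side check, the sketch below pairs a small Bloom filter with a function that decides which timestamps to request in the second round. The filter parameters (256 bits, 4 hash functions), the SHA-256 based hashing, and the names `BloomFilter` and `possibly_missing` are assumptions chosen for this example rather than details from the paper.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=256, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def possibly_missing(first_round):
    """first_round: item -> (ts, bloom) as returned by the first round of reads.
    Returns item -> set of timestamps to request in the second round."""
    wanted = {}
    for item, (ts, _) in first_round.items():
        for other, (other_ts, other_bloom) in first_round.items():
            if other != item and other_ts > ts and item in other_bloom:
                wanted.setdefault(item, set()).add(other_ts)
    return wanted


write_set = BloomFilter()
for written_item in ("x", "y"):       # a transaction with timestamp 1 wrote x and y
    write_set.add(written_item)

# The reader saw the new x (ts 1) but no committed version of y (ts -1).
first_round = {"x": (1, write_set), "y": (-1, BloomFilter())}
print(possibly_missing(first_round))  # {'y': {1}}
```

A Bloom filter never reports a false negative, so a sibling write that really exists is always requested; the filter only trades the exact item list of RAMP-F for a fixed-size summary.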


Bloom Filter


RAMP-Hybrid (Cont.)

Safety Properties
A Bloom filter may return false positives. The appendix proves that a false positive does not compromise the integrity of the result set: with unique timestamps, any read issued because of a false positive returns null.
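
The safety argument above can be illustrated with a tiny lookup. In this sketch, which assumes a partition stores versions in a dictionary keyed by `(item, timestamp)`, a second-round request triggered by a false positive finds no matching version and the first-round value is kept. The helper names `second_round_get` and `repair` are hypothetical.

```python
def second_round_get(store, item, ts):
    """Look up the exact version (item, ts) on the item's partition.

    store: dict mapping (item, timestamp) -> value (hypothetical layout).
    Returns None (null) when no write with that timestamp touched `item`.
    """
    return store.get((item, ts))


def repair(first_round_value, fetched):
    # A null answer means the Bloom filter hit was a false positive,
    # so the first-round value is kept unchanged.
    return first_round_value if fetched is None else fetched


# The transaction with timestamp 2 wrote only x, but its Bloom filter
# (falsely) reports that it also wrote y.
store = {("x", 2): "x2", ("y", 1): "y1"}
fetched = second_round_get(store, "y", 2)
print(fetched)                # None
print(repair("y1", fetched))  # y1
```

Because each timestamp belongs to exactly one transaction, no version of y can carry timestamp 2 unless that transaction really wrote y, which is why the extra request costs a round trip but never changes the result.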


RAMP-Hybrid (Algorithm)


Summary of Basic Algorithms


Experimental Evaluation

RAMP-F, RAMP-H, and often RAMP-S outperform existing solutions across a range of workload conditions while exhibiting overheads typically within 8% and no more than 48% of peak throughput.
Each algorithm is evaluated using the YCSB benchmark on several cr1.8xlarge instances on Amazon EC2, with a workload of 95% reads and 5% writes.


Notation

LWLR: Long write locks and long read locks, providing Repeatable Read Isolation (PL-2.99)
LWSR: Long write locks with short read locks, providing Read Committed Isolation (PL-2L, ≠ RA)
LWNR: Long write locks with no read locks, providing Read Uncommitted Isolation (≠ RA)
NWNR: No locks, base performance for parallelized operations
E-PCI: Eiger system’s 2PC-PCI, where a designated “coordinator” server for each transaction enforces RA isolation


Result


Result (Cont.)


Experimental: CTP Overhead

Cooperative Termination Protocol (CTP)
Some transactions may stall and leave their operations blocked. CTP is used to “free” these stalled operations.

In a real environment, blocked operations should occur at a modest failure rate of about 1 in 1000 writes. Thus, the average-case overheads are small.

CTP Reference: P. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley, New York, 1987.
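
A rough sketch of how a partition might resolve a stalled write by asking its siblings is shown below. It assumes the usual cooperative-termination rules (commit if any participant has committed, abort if some participant never prepared, otherwise keep waiting); the `Status` enum and `resolve_stalled_write` are hypothetical names, and the slides do not spell out these rules.

```python
from enum import Enum


class Status(Enum):
    NOT_PREPARED = "not prepared"  # sibling never received PREPARE for the transaction
    PREPARED = "prepared"
    COMMITTED = "committed"


def resolve_stalled_write(sibling_statuses):
    """Decide what a partition holding a prepared-but-uncommitted write should do,
    given the statuses reported by the other partitions in the write set."""
    statuses = list(sibling_statuses)
    if Status.COMMITTED in statuses:
        return "commit"  # some partition saw COMMIT, so the transaction committed
    if Status.NOT_PREPARED in statuses:
        # Assumes that sibling also promises never to accept this transaction later.
        return "abort"
    return "wait"        # every sibling is prepared; the outcome is still unknown


print(resolve_stalled_write([Status.PREPARED, Status.COMMITTED]))     # commit
print(resolve_stalled_write([Status.PREPARED, Status.NOT_PREPARED]))  # abort
print(resolve_stalled_write([Status.PREPARED, Status.PREPARED]))      # wait
```

The extra status messages are sent only for writes that have stalled, which is consistent with the small average-case overhead reported above.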


Experimental: Scalability

With 100 servers (spread across several EC2 availability zones), RAMP-F was within 2.6%, RAMP-H within 3.4%, and RAMP-S within 45% of NWNR.
