Retso hotdep-2011

Lock-free transactional support for large-scale storage systems Flavio Junqueira, Benjamin Reed, Maysam Yabandeh Yahoo! Research June 2011

Talk at the 7th Workshop on Hot Topics in Systems Dependability (HotDep 2011)

Transcript of Retso hotdep-2011

Page 1: Retso hotdep-2011

Lock-free transactional support for large-scale storage systems

Flavio Junqueira, Benjamin Reed, Maysam Yabandeh

Yahoo! Research, June 2011

Page 2: Retso hotdep-2011

June 2011

Big data

• Large data sets

✓ Unstructured, semi-structured data

✓ Critical for business logic

• Examples of such data

✓ Web logs, server logs, social media, etc.

Page 3: Retso hotdep-2011

Big data

+43% clicks vs. editor selected

+160% clicks vs. one-size-fits-all

Eric Baldeschwieler @IBM Big Data, May 2011

Page 4: Retso hotdep-2011

Big data: Hadoop

Eric Baldeschwieler @IBM Big Data, May 2011

Page 5: Retso hotdep-2011

Background

• Database generations in batches

• Online concurrent updates

[Figure: batch generation (Input DB, hours of MapReduce, Output DB, repeated) contrasted with online updates (Input, Input txn, Output DB)]

Require transactional support

e.g., HBase, HDFS

Page 6: Retso hotdep-2011

Examples

• Mutable tables

• Various indexes: Web, news, shopping, coupons

• User and content models

• Characteristics

✓ Concurrency

✓ Losing updates is undesirable

✓ There are concurrent reads and they must be consistent

Page 7: Retso hotdep-2011

Semantics

• Read only previously committed values

[Figure: timeline (Time) of a transaction (Txn) with writes w(x,v) and w(x,v’) and a read r(x) = v]
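The read rule on this slide can be illustrated with a small multi-version cell. This is only a sketch under assumptions that are not in the slides: the class `VersionedRow`, its `install` and `read` methods, and the use of plain integers as timestamps are all hypothetical. It shows a reader receiving the newest value whose commit timestamp precedes the reader's start timestamp.

```python
from bisect import bisect_left


class VersionedRow:
    """Hypothetical multi-version cell holding values ordered by commit timestamp."""

    def __init__(self):
        self._commit_ts = []   # ascending commit timestamps
        self._values = []      # values aligned with _commit_ts

    def install(self, commit_ts, value):
        # Assumption: versions are installed in increasing commit order.
        assert not self._commit_ts or commit_ts > self._commit_ts[-1]
        self._commit_ts.append(commit_ts)
        self._values.append(value)

    def read(self, start_ts):
        # Return the newest value committed strictly before the reader started.
        i = bisect_left(self._commit_ts, start_ts)
        return self._values[i - 1] if i > 0 else None


row = VersionedRow()
row.install(commit_ts=5, value="v")
row.install(commit_ts=12, value="v'")
```

A reader that started at timestamp 10 sees `v` even though `v'` is written later, which matches r(x) = v in the figure.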

Page 8: Retso hotdep-2011

Semantics

• No concurrent writes to the same row

[Figure: timeline (Time) of transactions (Txn) writing the same row concurrently, w(x,v’) and w(x,v)]

At least one must abort

Page 9: Retso hotdep-2011

Snapshot Isolation

• Known in the database realm

• Conflicting transactions

✓ Write to the same element (e.g., row)

✓ Time ranges between start and commit overlap

• Efficient implementation by versioning
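The two conflict conditions listed above can be written as a short predicate. This is a minimal sketch, assuming integer timestamps and a hypothetical `Txn` record; neither name comes from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    """Hypothetical transaction record used only for this illustration."""
    start_ts: int
    commit_ts: int
    write_set: set = field(default_factory=set)


def conflicts(a: Txn, b: Txn) -> bool:
    """True if a and b write a common row and their start-to-commit ranges overlap."""
    overlap_in_time = a.start_ts < b.commit_ts and b.start_ts < a.commit_ts
    return overlap_in_time and bool(a.write_set & b.write_set)


t1 = Txn(start_ts=1, commit_ts=4, write_set={"x"})
t2 = Txn(start_ts=3, commit_ts=6, write_set={"x", "y"})
t3 = Txn(start_ts=5, commit_ts=7, write_set={"x"})
```

Here `t1` and `t2` conflict because both write `x` while their lifetimes overlap, whereas `t1` and `t3` do not because `t1` commits before `t3` starts.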

Page 10: Retso hotdep-2011

Locks?

• Previous approaches: Lock data to modify

✓ Convoy effect

✓ Delays of several seconds

✓ Higher overhead on data servers

• Our approach

✓ Lock-free, centralized transaction manager

✓ Single point of failure, potential bottleneck?

[Percolator, OSDI’10]

Page 11: Retso hotdep-2011

Transaction Status Oracle

• Single process

✓ Processes client inquiries about transactions

✓ Includes a timestamp oracle

[Figure: Client, the TSO (which contains the timestamp oracle, TO), and data servers DB1 and DB2. The TSO keeps state about committed rows.]


Page 16: Retso hotdep-2011

Transaction Status Oracle

[Figure: message flow among Client, the TSO (with timestamp oracle, TO), DB1, and DB2; the TSO keeps state about committed rows]

① Ts

② r(r1), answered with v1, Ts(txnw), ⊥, where Ts(txnw) < Ts(txnr)

② w(r2, v2, Ts(txnr)), answered with ACK

③ Tc(txnw) < Ts(txnr)?

④ Commit r2

⑤ Cleanup(r2, txnr)
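The numbered steps on this slide (start timestamp, status inquiry, commit) can be sketched as a tiny in-memory oracle. This is an illustration under stated assumptions, not the authors' implementation: the class `StatusOracle`, its method names, and the abort rule "a written row was committed after this transaction started" are hypothetical choices made to match the snapshot-isolation conflict definition given earlier.

```python
import itertools


class StatusOracle:
    """Hypothetical in-memory status oracle with a built-in timestamp oracle."""

    def __init__(self):
        self._clock = itertools.count(1)   # timestamp oracle
        self._last_commit = {}             # row -> commit ts of its last committed writer
        self._commit_of = {}               # start ts -> commit ts, committed txns only

    def begin(self):
        # Step 1: hand out a start timestamp Ts.
        return next(self._clock)

    def committed_before(self, writer_start_ts, reader_start_ts):
        # Step 3: is Tc(txn_w) < Ts(txn_r)? Uncommitted writers are never visible.
        tc = self._commit_of.get(writer_start_ts)
        return tc is not None and tc < reader_start_ts

    def commit(self, start_ts, rows):
        # Step 4: abort if any written row was committed after this txn started.
        if any(self._last_commit.get(r, 0) > start_ts for r in rows):
            return None
        tc = next(self._clock)
        for r in rows:
            self._last_commit[r] = tc
        self._commit_of[start_ts] = tc
        return tc


tso = StatusOracle()
a = tso.begin()
b = tso.begin()
tc_a = tso.commit(a, {"r2"})   # first writer of r2 commits
tc_b = tso.commit(b, {"r2"})   # concurrent writer of r2 must abort
c = tso.begin()
```

No locks are taken on the data servers in this sketch: conflicts are detected only at commit time, inside the oracle.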

Page 17: Retso hotdep-2011

ReTSO: Design choices

• TSO

✓ Keeps state of modified rows

• In-memory state

✓ Highest commit timestamp of all garbage-collected rows

• Auto-GC Hash map

✓ Lazy garbage-collection

✓ Upon a hit
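One possible reading of the design choices above is a fixed-size map that evicts an entry only when a new row lands on an occupied slot, and remembers the highest commit timestamp it has discarded. The sketch below follows that reading; the class `AutoGcMap`, the one-entry-per-slot layout, and the name `t_max` are assumptions for illustration and are not taken from the slides.

```python
class AutoGcMap:
    """Hypothetical fixed-size map with one (row, commit_ts) entry per slot."""

    def __init__(self, slots):
        self._slots = [None] * slots
        self.t_max = 0   # highest commit ts among evicted (garbage-collected) rows

    def put(self, row, commit_ts):
        i = hash(row) % len(self._slots)
        old = self._slots[i]
        if old is not None and old[0] != row:
            # Lazy GC: evict only when a new row lands on an occupied slot.
            self.t_max = max(self.t_max, old[1])
        self._slots[i] = (row, commit_ts)

    def last_commit(self, row):
        entry = self._slots[hash(row) % len(self._slots)]
        if entry is not None and entry[0] == row:
            return entry[1]
        # Unknown row: it may have been evicted, so answer conservatively.
        return self.t_max


# Integer row ids keep the slot assignment deterministic in this example.
m = AutoGcMap(slots=2)
m.put(0, 5)
m.put(2, 9)   # rows 0 and 2 share slot 0, so row 0 is evicted
```

Answering with `t_max` for an unknown row can cause an unnecessary abort, but it never hides a real conflict, and it keeps memory use bounded.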

Page 18: Retso hotdep-2011

ReTSO: Increasing dependability

• Remote write-ahead log

[Figure: ReTSO, with arrows labeled Inquiries and Updates; WAL (e.g., NFS, BookKeeper [http://zookeeper.apache.org/bookkeeper]); Backup ReTSO, warm or cold]

Writes to WAL are synchronous but do not block other txns
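One way to read the note about WAL writes is that each transaction waits for its own log record before it is acknowledged, while the oracle keeps accepting other commits in the meantime. The sketch below illustrates that reading only; `LoggedOracle` is a hypothetical name, a Python list stands in for the remote log, and nothing here reflects the BookKeeper API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LoggedOracle:
    """Hypothetical oracle that logs each commit before acknowledging it."""

    def __init__(self, log):
        self._log = log                                    # stands in for a remote WAL
        self._writer = ThreadPoolExecutor(max_workers=1)   # appends in submission order
        self._lock = threading.Lock()
        self._next_ts = 1

    def _append(self, record):
        self._log.append(record)   # a real WAL would wait for a durable remote write
        return record[1]

    def commit(self, txn_id):
        with self._lock:           # held only to assign the timestamp, not for the I/O
            ts = self._next_ts
            self._next_ts += 1
            # Submit under the lock so log order matches timestamp order.
            return self._writer.submit(self._append, (txn_id, ts))

    def close(self):
        self._writer.shutdown(wait=True)


log = []
oracle = LoggedOracle(log)
f1 = oracle.commit("t1")   # returns immediately with a future
f2 = oracle.commit("t2")   # accepted without waiting for t1's log write
results = (f1.result(), f2.result())   # each caller waits only for its own record
oracle.close()
```

Each future resolves only after the record is in the log, so a reply sent on resolution is synchronous from the transaction's point of view.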

Page 19: Retso hotdep-2011

Preliminary results

• Coded in Java

✓ Except for hash map (C++ with JNI interface)

• Uses BookKeeper for WAL

• 10 identical servers

✓ 2.13 GHz dual-core Intel Xeon

✓ 4GB of RAM

✓ 1 Gigabit interfaces

Page 20: Retso hotdep-2011

Preliminary results

• Average throughput observed

✓ 3 clients, 1,000 concurrent transactions

✓ 81k TPS

• Average latency

✓ 1 client, 1 txn

✓ 0.87 ms (with WAL)

✓ 0.17 ms (without WAL)

Page 21: Retso hotdep-2011

Preliminary results

• Increasing the load of the system

✓ 1 to 16 clients

✓ Max is 72k TPS

[Figure: latency in ms (0 to 18) versus throughput in TPS (20,000 to 120,000); series: ReTSO, WAL-disabled]

Page 22: Retso hotdep-2011

What’s baking?

• Integration

✓ HBase

✓ Query engine

• Real workloads

Page 23: Retso hotdep-2011

Summary

• Transaction management for large-scale data repositories

• Lock-based vs. Lock-free

✓ ReTSO is lock-free and dependable

✓ Reduced load on storage nodes

✓ Low latency despite faults

• Performance sufficient for realistic applications
