Transaction chains: achieving serializability with low-latency in geo-distributed storage systems
description
Transcript of Transaction chains: achieving serializability with low-latency in geo-distributed storage systems
![Page 1: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/1.jpg)
Transaction chains: achieving serializability with low-latency in geo-distributed storage systems
Yang Zhang Russell Power Siyuan ZhouYair Sovran *Marcos K. Aguilera Jinyang Li
New York University *Microsoft Research Silicon Valley
![Page 2: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/2.jpg)
Large-scale Web applications
Why geo-distributed storage?
Geo-distributed storage
Replication
![Page 3: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/3.jpg)
Geo-distribution is hard
Low latency:O(Intra-datacenter RTT)
Strong semantics:relational tables w/
transactions
![Page 4: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/4.jpg)
?Low latency
Key/value only
Limited forms of transaction
General transaction
Prior workStrictserializable
Serializable
Eventual
Variousnon-serializable
High latency
Provably high latency according to CAP
Spanner [OSDI’12]
Dynamo [SOSP’07]
COPS [SOSP’11]
Walter [SOSP’11]
Eiger [NSDI’13]
Our work
![Page 5: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/5.jpg)
Our contributions1. A new primitive: transaction chain– Allow for low latency, serializable transactions
2. Lynx geo-storage system: built with chains– Relational tables– Secondary indices, materialized join views
![Page 6: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/6.jpg)
Talk Outline• Motivation• Transaction chains• Lynx• Evaluation
![Page 7: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/7.jpg)
Why transaction chains?
Bidder Item Price Seller Item Highest bidBids Items
Alice Book $100
Bob Book $20
Alice iPhone $20
Bob
Datacenter-1 Datacenter-2
Alice
Bob Camera $100
Auction service
![Page 8: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/8.jpg)
Why transaction chains?
Alice’s BidsAlice Book $100
Bob
Datacenter-1 Datacenter-2
AliceBob Camera $100
Bob’s Items
1. Insert bid to Alice’s Bids
2. Update highest bid on Bob’s Items
Operation: Alice bids on Bob’s camera
1. Insert bid to Alice’s Bids
![Page 9: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/9.jpg)
Why transaction chains?
Alice’s BidsAlice Book $100
Bob
Datacenter-1 Datacenter-2
AliceBob Camera $100
Bob’s Items
2. Update highest bid on Bob’s Items
Operation: Alice bids on Bob’s camera
1. Insert bid to Alice’s Bids
![Page 10: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/10.jpg)
Low latency with first-hop return
Alice’s BidsAlice Book $100
Bob
Datacenter-1 Datacenter-2
Alice
Bob Camera $100Bob’s Items
bid on Bob’s camera
Alice Camera $500
$500
![Page 11: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/11.jpg)
Problem: what if chains fail?
1. What if servers fail after executing first-hop?
2. What if a chain is aborted in the middle?
![Page 12: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/12.jpg)
Solution: provide all-or-nothing atomicity
1. Chains are durably logged at first-hop– Logs are replicated to another closest data center– Chains are re-executed upon recovery
2. Chains allow user-aborts only at first hop
• Guarantee: First hop commits all hops eventually commit
![Page 13: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/13.jpg)
Problem: non-serializable interleaving• Concurrent chains ordered inconsistently at different hops
X=1 Y=1
X=2 Y=2
Time
T1
T2
Server-X: T1 < T2 Server-Y: T2 < T1
Not serializable!
T2 T1
• Traditional 2PL+2PC prevents non-serializable interleaving at the cost of high latency
![Page 14: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/14.jpg)
Conflict?
Solution: detect non-serializable interleaving via static analysis
• Statically analyze all chains to be executed– Web applications invoke fixed set of operations
X=1 Y=1
X=2 Y=2
Serializable if no SC-cycle [Shasha et. al TODS’95]
A SC-cycle has both red and blue edges
T1
T2
![Page 15: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/15.jpg)
Outline• Motivation• Transaction chains• Lynx’s design• Evaluation
![Page 16: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/16.jpg)
How Lynx uses chains
• User chains: used by programmers to implement application logic
• System chains: used internally to maintain– Secondary indexes– Materialized join views– Geo-replicas
![Page 17: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/17.jpg)
Example: secondary index
Bob Car $20Alice Book $20
Bob Camera $100Alice iPhone $100
Bidder Item PriceBids (base table)
Alice Camera $100
Bob iPhone $20
Bidder Item PriceBids (secondary index)
Alice Camera $100
Bob Car $20
![Page 18: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/18.jpg)
Example user and system chain
Alice Book $100
Bob
Datacenter-1 Datacenter-2
Alice
Bob Camera $100
bid on Bob’s camera
Alice Camera $100
![Page 19: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/19.jpg)
Insert to Bids table
Update Items table
Lynx statically analyzes all chains beforehand
Put-bid
Read-bids
Put-bidInsert to Bids table
Update Items table
Read-bids
SC-cycleOne solution: execute chain as a distributed transaction
Read Bids table
Read Bids table
![Page 20: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/20.jpg)
Insert to Bids table
Update Items table
SC-cycle source #1: false conflicts in user chains
Put-bid
Insert to Bids table
Update Items tablePut-bid
False conflict because max(bid, current_price)
commutes
![Page 21: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/21.jpg)
Insert to Bids table
Update Items table
Solution: users annotate commutativity
Put-bid
Insert to Bids table
Update Items tablePut-bid
com
mut
es
![Page 22: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/22.jpg)
SC-cycle source #2: system chains
Insert to Bids table
…Put-bid
Insert to Bids table
…Put-bid
Insert to Bids-secondary
Insert to Bids-secondary
SC-cycle
![Page 23: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/23.jpg)
Solution: chains provide origin-ordering• Observation: conflicting system chains originate at the
same first hop server.
Both write the same row of Bids table
• Origin-ordering: if chains T1 < T2 at same first hop, then T1 < T2 at all subsequent overlapping hops.– Can be implemented cheaply sequence number vectors
T1
Insert to Bids table
Insert to Bids-secondary
T2
Insert to Bids table
Insert to Bids-secondary
![Page 24: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/24.jpg)
Limitations of Lynx/chains1. Chains are not strictly serializable, only serializable.2. Programmers can abort only at first hop
• Our application experience: limitations are managable
![Page 25: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/25.jpg)
Outline• Motivation• Transaction chains• Lynx’s design• Evaluation
![Page 26: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/26.jpg)
Simple Twitter Clone on Lynx
Author Tweet
Tweets
Alice New York rocks
From To
Follow-Graph
Alice Bob
Alice Eve
Bob Time to sleep
To From
Follow-Graph (secondary)
Bob Alice
Bob Clark
Geo-replicated
Geo-replicated
Author(=to)
From Tweet
Bob Alice Time to sleep
Eve Alice Hi there
Tweets JOIN Follow-Graph (Timeline)
Eve Hi there
![Page 27: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/27.jpg)
Experimental setup
us-west
europe
us-east
82ms
153ms
102ms
Lynx protoype:• In-memory database• Local disk logging only.
![Page 28: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/28.jpg)
Returning on first-hop allows low latency
Follow-user Post-tweet Follow-user Post-tweet Read-timeline0
50
100
150
200
250
300
174
252
3.2 3.1 3.1
Late
ncy
(ms)
First hop return
Chain completion
![Page 29: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/29.jpg)
Applications achieve good throughput
Follow-User Post-Tweet Read-Timeline0
200000
400000
600000
800000
1000000
1200000
1400000
1600000
184000 173000
1350000
Mill
ion
ops/
sec
![Page 30: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/30.jpg)
Related work• Transaction decomposition– SAGAS [SIGMOD’96], step-decomposed transactions
• Incremental view maintenance– Views for PNUTS [SIGMOD’09]
• Various geo-distributed/replicated storage– Spanner[OSDI’12], MDCC[Eurosys’13],
Megastore[CIDR’11], COPS [SOSP’11], Eiger[NSDI’13], RedBlue[OSDI’12].
![Page 31: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/31.jpg)
Conclusion• Chains support serializability at low latency– With static analysis of SC-cycles
• Key techniques to reduce SC-cycles– Origin ordering– Commutative annotation
• Chains are useful – Performing application logic – Maintaining indices/join views/geo-replicas
![Page 32: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/32.jpg)
![Page 33: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/33.jpg)
Limitations of Lynx/chains1. Chains are not strict serializable
Time
Remedies: – Programmers can wait for chain completion– Lynx provides read-your-own-writes
2. Programmers can only abort at first hop• Our application experience shows the limitations are managable
Serializable Strict serializable
![Page 34: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/34.jpg)
![Page 35: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/35.jpg)
2PC and chainsThe easy way
W(A)
R(A)
W(B)
W(A) W(B)
R(A)
2PC-W(AB)
R(A)
R(A)
T1
T2
T2
T1
T2
T1
T1
![Page 36: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/36.jpg)
2PC and chainsThe hard way
W(A)
R(A) R(B)
W(B)
W(A) W(B)
R(A) R(B)
2PC-W(AB)
R(A) R(B)
R(A) R(B)
T1
T2
T2
T1
T2
T1
T1
![Page 37: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/37.jpg)
2PC and chainsThe hard way
Chain
DC1 DC2 DC3 DC4
A B C D
2PC retry
Parallelunlock
![Page 38: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/38.jpg)
![Page 39: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/39.jpg)
Lynx is scalable
1 2 4 80
500
1000
1500
2000
2500
3000
48 93 184374
42 86 173356265
586
1350
2770
FollowTweetTimeline
#Servers per DC
QPS
(K/
s)
![Page 40: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/40.jpg)
![Page 41: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/41.jpg)
1. Insert bid into bid history 2. Update max price on item
1. Insert bid into bid history 2. Update max price on item
T1
T2
Conflict onbid history
Conflict onitem
SC-cycle Not serializable
Challenge of static analysis: false conflict
![Page 42: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/42.jpg)
Solution: communitivity annotations
1. Insert bid into bid history 2. Update max price on item
1. Insert bid into bid history 2. Update max price on item
T1
T2
Conflict onbid history
Commutativeoperation
No SC-cycle Serializable
Conflict onitem
No real conflict because bid ids
are unique
Updating max commutes
Commutativeoperation
![Page 43: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/43.jpg)
ACID: all-or-nothing atomicity• Chain’s failure guarantee:– If the first hop of a chain commits, then all hops
eventually commit• Users are only allowed to abort a chain in the first hop
• Achievable with low latency:– Log chains durably at the first hop• Logs replicated to a nearby datacenter
– Re-execute stalled chains upon failure recovery
![Page 44: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/44.jpg)
ACID: serializability• Serializability– Execution result appears as if obey a serial order
for all transactions– No restrictions on the serial order
Ordering 1 Ordering 2
Transactions
![Page 45: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/45.jpg)
Problem #2: unsafe interleaving• Serializability– Execution result appears as if obey a serial order
for all transactions– No restrictions on the serial order
Ordering 1 Ordering 2
Transactions
![Page 46: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/46.jpg)
Chains are not linearizable• Serializability• Linearability
Ordering 1 Ordering 2
Transactions
Time
Linearizable
a total ordering of chains a total ordering of chains
& total order obeys the issue order
![Page 47: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/47.jpg)
Transaction chains: recap• Chains provide all-or-nothing atomicity• Chains ensure serializability via static analysis• Practical challenges:– How to use chains?– How to avoid SC-cycles?
![Page 48: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/48.jpg)
Example user chain
Bidder Item PriceBids
Alice Camera 100
1. Insert bid into Alice’s bid history
Alice Bob
Seller Item HighestItems
Bob CameraBob Camera 100
2. Update max price on Bob’s camera
![Page 49: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/49.jpg)
Lynx implementation
• 5000 lines C++ and 3500 lines RPC library• Uses an in-memory key/value store• Support user chains in Javascript (via V8)
![Page 50: Transaction chains: achieving serializability with low-latency in geo-distributed storage systems](https://reader033.fdocuments.us/reader033/viewer/2022051003/56816553550346895dd7cce1/html5/thumbnails/50.jpg)
Geo-distributed storage is hard• Applications demand simplicity & performance– Friendly programming model
• Relational tables• Transactions
– Fast response• Ideally, operation latency = O(intra-datacenter RTT)
• Geo-distribution leads to high latency– Coordinate data access across datacenters
• Operation latency = O(inter-datacenter RTT) = O(100ms)