Cassandra summit LWTs
![Page 1: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/1.jpg)
LWTs in practice
Christopher Batey
@chbatey
The Last Pickle
![Page 2: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/2.jpg)
![Page 3: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/3.jpg)
![Page 4: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/4.jpg)
Overview
Review of Cassandra’s consistency model
What are LWTs?
Why do we need them?
How do they work?
How do you use them?
![Page 5: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/5.jpg)
Writes
[Diagram: Client A and a node labeled C]
![Page 6: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/6.jpg)
Concurrent writes
[Diagram: Client A and Client B, each with a node labeled C]
![Page 7: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/7.jpg)
QUORUM based consistency
[Diagram: Client A and Client B, each with a node labeled C]
![Page 8: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/8.jpg)
QUORUM based consistency
[Diagram: Client A and Client B, each with a node labeled C]
![Page 9: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/9.jpg)
QUORUM based consistency
[Diagram: Client A and Client B, each with a node labeled C]
![Page 10: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/10.jpg)
Voucher example
CREATE TABLE vouchers_mutable (
name text PRIMARY KEY,
sold int
)
![Page 11: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/11.jpg)
Read and write race condition with Quorum
● Client A reads the number of ticket sales as 299
● Client B reads the number of ticket sales as 299
● Client A sells ticket 300
● Client B sells ticket 300
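The race above can be reproduced without a cluster. This is a minimal sketch, assuming a hypothetical in-memory `Row` standing in for the `vouchers_mutable` row; it is not Cassandra code, and the helper names are invented for illustration.

```python
# Hypothetical in-memory stand-in for one vouchers_mutable row (not Cassandra code).
class Row:
    def __init__(self, sold):
        self.sold = sold


def read_sold(row):
    return row.sold


def write_sold(row, value):
    # Unconditional write: whatever was stored before is overwritten.
    row.sold = value


row = Row(sold=299)

# Both clients read before either one writes.
seen_by_a = read_sold(row)
seen_by_b = read_sold(row)

# Each client sells "the next" ticket based on its own, now stale, read.
write_sold(row, seen_by_a + 1)
write_sold(row, seen_by_b + 1)

print(row.sold)  # 300, although two tickets were sold
```

Both writes succeed, so one sale is silently lost. A stronger consistency level does not help, because neither write checks what the other client did.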
![Page 12: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/12.jpg)
Compare and set
![Page 13: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/13.jpg)
Enter Light Weight Transactions
● Client A reads the number of ticket sales as 299
● Client B reads the number of ticket sales as 299
● Client A sells ticket 300 if total sold is 299
● Client B sells ticket 300 if total sold is 299
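The conditional version of the same sequence, as a minimal sketch. `Row` and `compare_and_set` are hypothetical names for illustration. In Cassandra the check and the write are made atomic by Paxos; this single-threaded sketch only shows the semantics.

```python
# Hypothetical in-memory stand-in for one vouchers_mutable row (not Cassandra code).
class Row:
    def __init__(self, sold):
        self.sold = sold


def compare_and_set(row, expected, new_value):
    """Apply the write only if the current value equals `expected`."""
    if row.sold != expected:
        return False  # condition not met, nothing is written
    row.sold = new_value
    return True


row = Row(sold=299)

# Both clients read 299, as before.
seen_by_a = row.sold
seen_by_b = row.sold

# Only the first conditional write can succeed.
applied_a = compare_and_set(row, expected=seen_by_a, new_value=300)
applied_b = compare_and_set(row, expected=seen_by_b, new_value=300)

print(applied_a, applied_b, row.sold)  # True False 300
```

Client B is told its write was not applied, so it can re-read and decide what to do next.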
![Page 14: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/14.jpg)
Examples
![Page 15: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/15.jpg)
Uniqueness
CREATE TABLE users (
user_name text PRIMARY KEY,
email text,
password text
);
INSERT INTO users (user_name, password, email )
VALUES ( 'chbatey', 'different',
'[email protected]' ) IF NOT EXISTS;
![Page 16: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/16.jpg)
Finite resource
CREATE TABLE vouchers_mutable (
name text PRIMARY KEY,
sold int
);
UPDATE vouchers_mutable SET sold = 1
WHERE name = 'free tv' IF sold = 0;
![Page 17: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/17.jpg)
Immutable events
CREATE TABLE vouchers (
name text,
when timeuuid,
who text,
PRIMARY KEY (name, when)
);
![Page 18: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/18.jpg)
Batches + LWTs
CREATE TABLE vouchers (
name text,
when timeuuid,
sold int static,
who text,
PRIMARY KEY (name, when)
);
INSERT INTO vouchers (name, sold) VALUES
( 'free tv', 0);
![Page 19: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/19.jpg)
Batches + LWTs
BEGIN BATCH
UPDATE vouchers SET sold = 1 WHERE name = 'free tv' IF sold = 0
INSERT INTO vouchers (name, when, who) VALUES ( 'free tv', now(), 'chris')
APPLY BATCH ;
[applied]
-----------
True
![Page 20: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/20.jpg)
Batches + LWTs
BEGIN BATCH
UPDATE vouchers SET sold = 1 WHERE name = 'free tv' IF sold = 0
INSERT INTO vouchers (name, when, who) VALUES ( 'free tv', now(), 'charlie')
APPLY BATCH ;
[applied] | name | when | sold
-----------+---------+------+------
False | free tv | null | 1
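The two batch results above follow an all-or-nothing rule: the condition on the static `sold` column guards every statement in the batch. A minimal sketch of that rule, assuming a hypothetical dict-based partition and an integer `when` in place of a timeuuid:

```python
def apply_conditional_batch(partition, expected_sold, new_sold, who, when):
    """Apply every statement in the batch together, or none of them."""
    if partition["sold"] != expected_sold:
        # Like [applied] = False: report the current value and write nothing.
        return {"applied": False, "sold": partition["sold"]}
    partition["sold"] = new_sold                              # the UPDATE
    partition["rows"].append({"when": when, "who": who})      # the INSERT
    return {"applied": True}


# Hypothetical stand-in for the 'free tv' partition of the vouchers table.
partition = {"sold": 0, "rows": []}

first = apply_conditional_batch(partition, 0, 1, who="chris", when=1)
second = apply_conditional_batch(partition, 0, 1, who="charlie", when=2)

print(first)                                  # {'applied': True}
print(second)                                 # {'applied': False, 'sold': 1}
print([r["who"] for r in partition["rows"]])  # ['chris']
```

The second batch leaves no row for 'charlie', because the failed condition blocks the INSERT as well as the UPDATE.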
![Page 21: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/21.jpg)
How they work
![Page 22: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/22.jpg)
LWTs be puzzling
1. Why does a LWT have two consistency levels?
2. What is this SERIAL consistency I keep hearing about?
3. What are SERIAL reads?
4. Why does my LWT fail but the value still get written?
5. Why are they so damn slow?
![Page 23: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/23.jpg)
Consensus for a partition
![Page 24: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/24.jpg)
Consensus for a partition
![Page 25: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/25.jpg)
Stages of a LWT
● Prepare and promise
● Read existing value
● Propose and accept
● Commit
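The four stages can be sketched end to end. This is a simplified simulation, not Cassandra's implementation: `Replica` and `lwt` are hypothetical names, ballots are plain integers, the read looks at one replica because all replicas agree here, and recovery of in-progress proposals is left out.

```python
class Replica:
    def __init__(self, value):
        self.promised = 0      # highest ballot promised so far
        self.accepted = None   # (ballot, value) accepted but not yet committed
        self.value = value     # committed value

    def prepare(self, ballot):
        if ballot <= self.promised:
            return False
        self.promised = ballot
        return True

    def propose(self, ballot, value):
        if ballot < self.promised:
            return False
        self.accepted = (ballot, value)
        return True

    def commit(self, ballot):
        if self.accepted and self.accepted[0] == ballot:
            self.value = self.accepted[1]
            self.accepted = None


def quorum(replicas):
    return len(replicas) // 2 + 1


def lwt(replicas, ballot, expected, new_value):
    # 1. Prepare and promise
    if sum(r.prepare(ballot) for r in replicas) < quorum(replicas):
        return "contention"
    # 2. Read existing value (simplified: all replicas agree in this sketch)
    if replicas[0].value != expected:
        return "condition not met"
    # 3. Propose and accept
    if sum(r.propose(ballot, new_value) for r in replicas) < quorum(replicas):
        return "contention"
    # 4. Commit
    for r in replicas:
        r.commit(ballot)
    return "applied"


replicas = [Replica(4) for _ in range(3)]
print(lwt(replicas, ballot=1, expected=4, new_value=5))  # applied
print(lwt(replicas, ballot=2, expected=4, new_value=5))  # condition not met
```

Each stage is a round trip to the replicas, which is one reason a LWT costs more than a plain write.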
![Page 26: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/26.jpg)
Consensus for a partition
I want the value to be 5, as long as it is currently 4
prepare -> Promised -> read -> Condition met -> propose -> Accepted -> commit -> Committed
![Page 27: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/27.jpg)
Prepare and promise 1
Client
partition table promised accepted committed
Prepare 1
![Page 28: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/28.jpg)
Prepare and promise 1
Client
partition table promised accepted committed
A vouchers 1
![Page 29: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/29.jpg)
Prepare and promise 2
Client
partition table promised accepted committed
A vouchers 1
Prepare 2
![Page 30: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/30.jpg)
Prepare and promise 2
Client
partition table promised accepted committed
A vouchers 2
![Page 31: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/31.jpg)
Prepare and promise - rejection
Client
partition table promised accepted committed
A vouchers 2
Prepare 1
![Page 32: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/32.jpg)
Prepare and promise - rejection
Client
partition table promised accepted committed
A vouchers 2
Rejected - ClientRequest.CASWrite.contentions
Prepare 1
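The promise rule from the last few slides, as a minimal sketch. `PaxosState` is a hypothetical class modelling one row of the state table shown above. Ballots are small integers here for readability; the ballots in the traces are time-based UUIDs.

```python
class PaxosState:
    """One row of a simplified state table: partition, table, promised ballot."""

    def __init__(self, partition, table):
        self.partition = partition
        self.table = table
        self.promised = None

    def prepare(self, ballot):
        # Promise only ballots newer than anything promised before.
        if self.promised is not None and ballot <= self.promised:
            return "rejected"
        self.promised = ballot
        return "promised"


state = PaxosState("A", "vouchers")
results = [state.prepare(1), state.prepare(2), state.prepare(1)]
print(results)         # ['promised', 'promised', 'rejected']
print(state.promised)  # 2
```

The late Prepare 1 is rejected because ballot 2 has already been promised, which is what shows up as contention.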
![Page 33: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/33.jpg)
Prepare and promise - trace
Parsing insert into users (user_name, password, email ) values ( 'chbatey', 'chrisrocks', '[email protected]' ) if not exists; [SharedPool-Worker-1] | 2016-08-22 12:38:44.132000 | 127.0.0.1 | 1125
Sending PAXOS_PREPARE message to /127.0.0.3 [MessagingService-Outgoing-/127.0.0.3] | 2016-08-22 12:38:44.141000 | 127.0.0.1 | 10414
Sending PAXOS_PREPARE message to /127.0.0.2 [MessagingService-Outgoing-/127.0.0.2] | 2016-08-22 12:38:44.142000 | 127.0.0.1 | 10908
Promising ballot fb282190-685c-11e6-71a2-e0d2d098d5d6 [SharedPool-Worker-1] | 2016-08-22 12:38:44.147000 | 127.0.0.3 | 4325
![Page 34: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/34.jpg)
Prepare and promise - trace
Promising ballot fb282190-685c-11e6-71a2-e0d2d098d5d6 [SharedPool-Worker-1] | 2016-08-22 12:38:44.147000 | 127.0.0.3 | 4325
Promising ballot fb282190-685c-11e6-71a2-e0d2d098d5d6 [SharedPool-Worker-3] | 2016-08-22 12:38:44.166000 | 127.0.0.1 | 35282
![Page 35: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/35.jpg)
Read
LOCAL_SERIAL => LOCAL_QUORUM
SERIAL => QUORUM
ClientRequest.CASWrite.conditionNotMet
![Page 36: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/36.jpg)
Propose and accept
Client
partition table promised accepted committed
A vouchers 1
Propose 1
![Page 37: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/37.jpg)
Propose and accept
Client
partition table promised accepted committed
A vouchers 1 1
Propose 1
![Page 38: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/38.jpg)
Propose and accept - rejection
Client
partition table promised accepted committed
A vouchers 2
Propose 1
Rejected - ClientRequest.CASWrite.contentions
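The accept rule, sketched the same way. `PaxosState` is a hypothetical class and ballots are plain integers; the sketch only covers the check a replica makes when a proposal arrives.

```python
class PaxosState:
    """Simplified replica state for one partition: promised and accepted ballots."""

    def __init__(self, promised):
        self.promised = promised
        self.accepted = None

    def propose(self, ballot):
        # Accept only if no newer ballot has been promised in the meantime.
        if ballot < self.promised:
            return "rejected"
        self.accepted = ballot
        return "accepted"


in_time = PaxosState(promised=1)    # nobody else prepared after ballot 1
overtaken = PaxosState(promised=2)  # another client prepared ballot 2 first

print(in_time.propose(1), in_time.accepted)      # accepted 1
print(overtaken.propose(1), overtaken.accepted)  # rejected None
```

A proposal can therefore fail even after its prepare succeeded, if a competing prepare arrived in between.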
![Page 39: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/39.jpg)
Propose and accept - trace
Sending PAXOS_PROPOSE message to /127.0.0.2 [MessagingService-Outgoing-/127.0.0.2] | 2016-08-22 12:38:44.196000 | 127.0.0.1 | 65606
Sending PAXOS_PROPOSE message to /127.0.0.1 [MessagingService-Outgoing-/127.0.0.1] | 2016-08-22 12:38:44.196000 | 127.0.0.1 | 65606
PAXOS_PROPOSE message received from /127.0.0.1 [MessagingService-Incoming-/127.0.0.1] | 2016-08-22 12:38:44.197000 | 127.0.0.1 | 65986
Sending PAXOS_PROPOSE message to /127.0.0.3 [MessagingService-Outgoing-/127.0.0.3] | 2016-08-22 12:38:44.197000 | 127.0.0.1 | 66139
![Page 40: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/40.jpg)
Propose and accept - trace
Accepting proposal Commit(fb282190-685c-11e6-71a2-e0d2d098d5d6, [lwts.users] key=chbatey columns=[[] | [email password]]\n Row: EMPTY | [email protected], password=chrisrocks) [SharedPool-Worker-2] | 2016-08-22 12:38:44.199000 | 127.0.0.1 | 67804
![Page 41: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/41.jpg)
Commit
● The regular (non-SERIAL) consistency level is now used for the commit
![Page 42: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/42.jpg)
Consensus for a partition
I want the value to be 5, as long as it is currently 4
prepare -> Promised -> read -> Condition met -> propose -> Accepted -> commit -> Committed
![Page 43: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/43.jpg)
SERIAL reads
o.a.c.s.StorageProxy.readWithPaxos
● For a single partition
● Runs a prepare and ensures all replicas have the latest commit
● Then runs the read at either QUORUM or LOCAL_QUORUM
![Page 44: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/44.jpg)
Write timestamps
[Diagram: Client A, two nodes labeled C, and nodes numbered 1, 2 and 3]
![Page 45: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/45.jpg)
Write timestamps
[Diagram: Client A, two nodes labeled C, and nodes numbered 1, 2 and 3]
![Page 46: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/46.jpg)
Write timestamps
[Diagram: Client A, two nodes labeled C, nodes numbered 1, 2 and 3, and write timestamps T = 1 and T = 2]
![Page 47: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/47.jpg)
Some numbers
![Page 48: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/48.jpg)
Setup
4 * i2.xlarge
RF = 3
10 clients trying to buy 1000 vouchers each - 10k total operations
Contention: all clients buying the same voucher (same partition)
No contention: each client buying a different voucher (different partitions)
![Page 49: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/49.jpg)
Mutable field
CREATE TABLE vouchers_mutable (
name text PRIMARY KEY,
sold int
);
1) UPDATE vouchers_mutable SET sold = 1
WHERE name = 'free tv' IF sold = 0;
2) UPDATE vouchers_mutable SET sold = 1
WHERE name = 'free tv';
![Page 50: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/50.jpg)
Histogram
![Page 51: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/51.jpg)
Histogram
![Page 52: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/52.jpg)
Batches
CREATE TABLE vouchers (
name text,
when timeuuid,
sold int static,
who text,
PRIMARY KEY (name, when)
);
BEGIN BATCH
UPDATE vouchers SET sold = 1 WHERE name = 'free tv' IF sold = 0
INSERT INTO vouchers (name, when, who) VALUES ( 'free tv', now(), 'charlie')
APPLY BATCH ;
![Page 53: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/53.jpg)
Histogram
![Page 54: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/54.jpg)
Histogram
![Page 55: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/55.jpg)
Summary
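LWT | Batch | Contention | Incorrect results | 99th %ile (milliseconds)
-----+-------+------------+-------------------------------+--------------------------
N | N | Y | 87% Lost | 48
Y | N | Y | 0% Lost, 1% Unknown, 81% CNM | 191
Y | N | N | 0% Lost, 0% Unknown, 0% CNM | 52
Y | Y | Y | 0% Lost, <1% Unknown, 82% CNM | 192
CNM: condition not met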
![Page 56: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/56.jpg)
Summary
![Page 57: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/57.jpg)
Summary
● LWTs are expensive
● They are more complex and less mature than the regular read and write path
● Might be a lot easier than bringing in a second technology
![Page 58: Cassandra summit LWTs](https://reader031.fdocuments.us/reader031/viewer/2022021813/589d9b5e1a28abfb3d8b5a73/html5/thumbnails/58.jpg)
Questions?
Christopher Batey
@chbatey
The Last Pickle