PBS VLDB 2012 - Shivaramshivaram.org/talks/pbs-vldb12-talk.pdf · Soundcloud Twitter Mozilla Yammer...

Peter Bailis, Shivaram Venkataraman, Mike Franklin, Joe Hellerstein, Ion Stoica VLDB 2012 UC Berkeley Probabilistically Bounded Staleness for Practical Partial Quorums PBS


Page 1: PBS VLDB 2012 - Shivaramshivaram.org/talks/pbs-vldb12-talk.pdf · Soundcloud Twitter Mozilla Yammer Ask.com Aol GitHub JoyentCloud Best Buy LinkedIn Boeing Comcast Cassandra Voldemort

Peter Bailis, Shivaram Venkataraman, Mike Franklin, Joe Hellerstein, Ion Stoica

VLDB 2012

UC Berkeley

Probabilistically Bounded Staleness for Practical Partial Quorums

PBS

Page 2:

NOTE TO READER

watch a recording (of an earlier talk) at: http://vimeo.com/37758648

Page 3:

NOTE TO READER

play with a live demo at: http://pbs.cs.berkeley.edu/#demo

Page 4:

R+W

strong consistency, higher latency

eventual consistency, lower latency

Page 5:

consistency is a continuum (not a binary choice)

strong ... eventual

Page 6:

our focus:

latency vs. consistency

informed by practice

not in this talk:

availability, partitions, failures

Page 7:

our contributions

quantify eventual consistency: wall-clock time (“how eventual?”) and versions (“how consistent?”)

analyze real-world systems: EC is often strongly consistent; describe when and why

Page 8:

intro, system model, practice, metrics, insights, integration

Page 9:

Apache, DataStax

Project Voldemort

Dynamo: Amazon’s Highly Available Key-value Store

SOSP 2007

Page 10:

Adobe, Cisco, Digg, Gowalla, IBM, Morningstar, Netflix, Palantir, Rackspace, Reddit, Rhapsody, Shazam, Spotify, Soundcloud, Twitter, Mozilla, Ask.com, Yammer, Aol, GitHub, JoyentCloud, Best Buy, LinkedIn, Boeing, Comcast, Gilt Groupe

Cassandra, Riak, Voldemort

Page 11:

N replicas/key

read: wait for R replies

write: wait for W acks

N=3, R=2

Page 12:

if: R+W > N

then: “strong” consistency

else: eventual consistency
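The R+W > N rule can be checked by brute force. The sketch below is illustrative only: the function names are hypothetical and it assumes read and write quorums may be any R or W of the N replicas.

```python
from itertools import combinations


def is_strict_quorum(n: int, r: int, w: int) -> bool:
    """The rule from the slide: R+W > N."""
    return r + w > n


def always_intersect(n: int, r: int, w: int) -> bool:
    """Brute force: does every R-subset share a replica with every W-subset?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def consistency_mode(n: int, r: int, w: int) -> str:
    """Label a configuration the way the slide does."""
    return "strong" if is_strict_quorum(n, r, w) else "eventual"
```

For example, `consistency_mode(3, 1, 1)` gives `"eventual"`, while `consistency_mode(3, 2, 2)` gives `"strong"`.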

Page 13:

“strong” consistency: regular register

R+W > N

reads return the last acknowledged write or an in-flight write (per-key)

Page 14:

Latency (LinkedIn disk-based model, N=3)

R    99th     99.9th
1    1x       1x
2    1.59x    2.35x
3    4.8x     6.13x

W    99th     99.9th
1    1x       1x
2    2.01x    1.9x
3    4.96x    14.96x

Page 15:

⇧ consistency, ⇧ latency: wait for more replicas, read more recent data

⇩ consistency, ⇩ latency: wait for fewer replicas, read less recent data

Page 16:

eventual consistency

“if no new updates are made to the object, eventually all accesses will return the last updated value”

W. Vogels, CACM 2008

R+W ≤ N

Page 17:

How eventual?

How long do I have to wait?

Page 18:

How consistent?

What happens if I don’t wait?

Page 19:

R+W

strong consistency, higher latency

eventual consistency, lower latency

Page 20:

intro, system model, practice, metrics, insights, integration

Page 21:

Cassandra: R=W=1, N=3 by default (1+1 ≯ 3)

Page 22:

eventual consistency in the wild

“maximum performance”

“very low latency”

okay for “most data”

“general case”

Page 23:

anecdotally, EC “good enough” for many kinds of data

How eventual? How consistent?

“eventual and consistent enough”

Page 24:

Can we do better?

Probabilistically Bounded Staleness

can’t make promises, can give expectations

Page 25:

intro, system model, practice, metrics, insights, integration

Page 26:

How eventual?

How long do I have to wait?

Page 27:

How eventual?

t-visibility: probability p of consistent reads after t seconds

(e.g., 10 ms after write, 99.9% of reads consistent)

Page 28:

t-visibility depends on messaging and processing delays

Page 29:

[Diagram: messages between Coordinator and Replica over Time, once per replica]

write, ack: wait for W responses

t seconds elapse

read, response: wait for R responses

response is stale if read arrives before write

Page 30:

[Diagram: timeline with clients Alice and Bob and replicas R1 and R2 (N=2, W=1, R=1), showing write, ack, read, and response messages; the response from R2 is inconsistent]

Page 31:

[Diagram: messages between Coordinator and Replica over Time, once per replica]

write (W), ack (A): wait for W responses

t seconds elapse

read (R), response (S): wait for R responses

response is stale if read arrives before write

Page 32:

solving WARS: order statistics

dependent variables

instead: Monte Carlo methods

Page 33:

to use WARS:

gather latency data

W: 53.2, 44.5, 101.1, ...
A: 10.3, 8.2, 11.3, ...
R: 15.3, 22.4, 19.8, ...
S: 9.6, 14.2, 6.7, ...

run simulation (Monte Carlo, sampling), e.g. W=44.5, A=11.3, R=15.3, S=14.2
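A minimal sketch of the Monte Carlo step might look like the following. It draws synthetic exponential latencies (an assumption made here for self-containment; the slides use measured W, A, R, S data), and the function name and parameters are hypothetical.

```python
import random


def t_visibility(n, r, w, t, trials=10_000, mean=1.0, seed=0):
    """Estimate P(consistent read) for a read issued t after the write commits.

    Assumption: W, A, R, S delays are i.i.d. exponential with the given mean.
    """
    rng = random.Random(seed)
    consistent = 0
    for _ in range(trials):
        # One (W, A, R, S) sample per replica.
        ws = [rng.expovariate(1.0 / mean) for _ in range(n)]
        acks = [rng.expovariate(1.0 / mean) for _ in range(n)]
        rs = [rng.expovariate(1.0 / mean) for _ in range(n)]
        ss = [rng.expovariate(1.0 / mean) for _ in range(n)]

        # The write commits when the W-th ack reaches the coordinator.
        commit = sorted(ws[i] + acks[i] for i in range(n))[w - 1]
        read_start = commit + t

        # The coordinator uses the first R read responses to come back.
        responders = sorted(range(n), key=lambda i: rs[i] + ss[i])[:r]

        # A replica answers with the new value if the write reached it
        # before the read request did.
        if any(ws[i] <= read_start + rs[i] for i in responders):
            consistent += 1
    return consistent / trials
```

With R+W > N the estimate is exactly 1.0, since at least one responder has already acknowledged the write; with partial quorums it rises toward 1.0 as t grows.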

Page 34:

WARS accuracy

real Cassandra cluster, varying latencies:

t-visibility RMSE: 0.28%

latency N-RMSE: 0.48%

Page 35:

How eventual?

t-visibility: consistent reads with probability p after t seconds

key: WARS model

need: latencies

Page 36:

intro, system model, practice, metrics, insights, integration

Page 37:

Yammer: 100K+ companies, uses Riak

LinkedIn: 175M+ users, built and uses Voldemort

production latencies fit gaussian mixtures

Page 38:

10 ms

N=3

Page 39:

LNKD-DISK, N=3

R=3, W=1: 100% consistent. Latency: 15.01 ms

R=2, W=1, t = 13.6 ms: 99.9% consistent. Latency: 12.53 ms

16.5% faster: worthwhile?

Latency is combined read and write latency at 99.9th percentile

Page 40:

N=3

Page 41:

LNKD-SSD, N=3

R=3, W=1: 100% consistent. Latency: 4.20 ms

R=1, W=1, t = 1.85 ms: 99.9% consistent. Latency: 1.32 ms

59.5% faster

Latency is combined read and write latency at 99.9th percentile

Page 42:

[Figure: CDFs of write latency (ms, log scale from 10^-2 to 10^3) for W=1, W=2, and W=3, comparing LNKD-SSD, LNKD-DISK, YMMR, and WAN; N=3]

Page 43:

[Diagram: messages between Coordinator and Replica, once per replica]

write (W), ack (A): wait for W responses

t seconds elapse

read (R), response (S): wait for R responses

response is stale if read arrives before write

SSDs reduce variance compared to disks!

Page 44:

Yammer, N=3

latency: 81.1% (187 ms)

t-visibility at 99.9th percentile: 202 ms

Page 45:

How consistent?

in the paper:

k-staleness (versions)

monotonic reads

quorum load
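As a rough illustration of k-staleness, the sketch below uses the standard closed form for probabilistic quorums. It assumes each read and write quorum is chosen uniformly at random and independently per version; the function names are hypothetical, and the paper should be consulted for the exact statement.

```python
from math import comb


def p_nonintersect(n: int, r: int, w: int) -> float:
    """P(a random R-subset misses a random W-subset) = C(N-W, R) / C(N, R)."""
    return comb(n - w, r) / comb(n, r)


def p_within_k_versions(n: int, r: int, w: int, k: int) -> float:
    """P(a read returns one of the last k versions), assuming independent quorums."""
    return 1.0 - p_nonintersect(n, r, w) ** k
```

For N=3, R=W=1 this gives 1/3 for k=1 and 19/27 for k=3; any configuration with R+W > N gives 1.0.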

Page 46:

in the paper

<k,t>-staleness: versions and time

Page 47:

latency distributions

WAN model

varying quorum sizes

staleness detection

in the paper

Page 48:

intro, system model, practice, metrics, insights, integration

Page 49:

Integration

1. Tracing
2. Simulation
3. Tune N, R, W

Project Voldemort
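The simulate-then-tune loop could be sketched as follows. This is not the Voldemort integration itself: latencies are synthetic exponentials rather than traces, and `simulate`, `tune`, and their parameters are hypothetical names chosen for illustration.

```python
import random


def percentile(values, q):
    """Nearest-rank style percentile of a list of numbers."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def simulate(n, r, w, t, trials, mean, rng):
    """Return (P(consistent read at time t), 99.9th percentile read+write latency)."""
    hits, latencies = 0, []
    for _ in range(trials):
        draw = lambda: [rng.expovariate(1.0 / mean) for _ in range(n)]
        ws, acks, rs, ss = draw(), draw(), draw(), draw()
        # Write latency: the W-th ack arrives at the coordinator.
        commit = sorted(x + a for x, a in zip(ws, acks))[w - 1]
        # Read latency: the R-th response arrives at the coordinator.
        by_reply = sorted(range(n), key=lambda i: rs[i] + ss[i])
        read_latency = rs[by_reply[r - 1]] + ss[by_reply[r - 1]]
        hits += any(ws[i] <= commit + t + rs[i] for i in by_reply[:r])
        latencies.append(commit + read_latency)
    return hits / trials, percentile(latencies, 0.999)


def tune(n, t, target, trials=5000, mean=1.0, seed=0):
    """Pick the (R, W) with the lowest tail latency whose consistency meets target."""
    best = None
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            rng = random.Random(seed)  # same samples for every configuration
            p, tail = simulate(n, r, w, t, trials, mean, rng)
            if p >= target and (best is None or tail < best[3]):
                best = (r, w, p, tail)
    return best
```

`tune(3, 0.0, 1.0)` returns a tuple `(R, W, p, tail_latency)`; because strict quorums always reach p = 1.0 in this model, some configuration always qualifies.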

Page 52:

http://pbs.cs.berkeley.edu/#demo

Page 53:

Related Work

Quorum Systems
• probabilistic quorums [PODC ’97]
• deterministic k-quorums [DISC ’05, ’06]

Consistency Verification
• Golab et al. [PODC ’11]
• Bermbach and Tai [M4WSOC ’11]
• Wada et al. [CIDR ’11]
• Anderson et al. [HotDep ’10]
• Transactional consistency: Zellag and Kemme [ICDE ’11], Fekete et al. [VLDB ’09]

Latency-Consistency
• Daniel Abadi [Computer ’12]
• Kraska et al. [VLDB ’09]

Bounded Staleness Guarantees
• TACT [OSDI ’00]
• FRACS [ICDCS ’03]
• AQuA [IEEE TPDS ’03]

Page 54:

R+W

strong consistency, higher latency

eventual consistency, lower latency

Page 55:

consistency is a continuum

strong ... eventual

Page 56:

quantify eventual consistency

model staleness in time, versions

latency-consistency trade-offs

analyze real systems and hardware

quantify which choice is best and explain why EC is often strongly consistent

PBS: pbs.cs.berkeley.edu

Page 57:

Extra Slides

Page 58:

PBS and apps

Page 59:

staleness requires either:

staleness-tolerant data structures: timelines, logs

cf. commutative data structures, logical monotonicity

asynchronous compensation code: detect violations after data is returned (see paper), write code to fix any errors

cf. “Building on Quicksand”: memories, guesses, apologies

Page 60:

asynchronous compensation

minimize: (compensation cost) × (# of expected anomalies)

Page 61:

Read only newer data? (monotonic reads session guarantee)

# versions of tolerable staleness =

client’s read rate, global write rate

(for a given key)

Page 62:

Failure?

Page 63:

Treat failures as latency spikes

Page 64:

How long do partitions last?

Page 65:

what time interval?

99.9% uptime/yr ⇒ 8.76 hours downtime/yr

8.76 consecutive hours down ⇒ bad 8-hour rolling average

hide in tail of distribution OR continuously evaluate SLA, adjust
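The downtime figure is simple arithmetic, sketched below with a hypothetical helper and assuming a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(uptime_fraction: float) -> float:
    """Hours of allowed downtime per year for a given uptime fraction."""
    return (1.0 - uptime_fraction) * HOURS_PER_YEAR
```

`downtime_hours_per_year(0.999)` is approximately 8.76, matching the slide.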

Page 66:

[Figure: CDFs of write latency (ms, log scale from 10^-2 to 10^3) for W=1, W=2, and W=3, comparing LNKD-SSD, LNKD-DISK, YMMR, and WAN; N=3]

Page 67:

[Figure: latency CDFs (ms, log scale from 10^-2 to 10^3) comparing LNKD-SSD, LNKD-DISK, YMMR, and WAN; panels labeled R=3, W=1, W=2; x-axis labeled Write Latency (ms); N=3]

(LNKD-SSD and LNKD-DISK identical for reads)

Page 68:

<k,t>-staleness: versions and time

approximation: exponentiate t-staleness by k

Page 69:

Synthetic, Exponential Distributions

N=3, W=1, R=1

W 1/4x ARS

W 10x ARS

Page 70:

concurrent writes: deterministically choose

Coordinator, R=2

(“key”, 1) (“key”, 2)
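One way a coordinator could choose deterministically is sketched below. It assumes the second element of each pair is a version number and that the highest version wins; the slide does not specify the rule, and `resolve` is a hypothetical name.

```python
def resolve(responses):
    """Pick one (key, version) response; input order does not affect the result.

    Assumption: the highest version wins, with the key as a tie-breaker.
    """
    return max(responses, key=lambda kv: (kv[1], kv[0]))
```

With R=2 and responses `("key", 1)` and `("key", 2)`, the coordinator would return `("key", 2)` regardless of arrival order.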

Page 75:

[Diagram: read with R=3 and N = 3 replicas. The client sends read(“key”) to the Coordinator; replicas R1, R2, R3 each hold (“key”, 1) and each respond with (“key”, 1); the Coordinator returns (“key”, 1)]

Page 76:

[Diagram: read with R=3 and N = 3 replicas. The client sends read(“key”) to the Coordinator; replicas R1, R2, R3 each hold (“key”, 1) and each respond with (“key”, 1); the Coordinator returns (“key”, 1)]

Page 77:

[Diagram: read with R=1 and N = 3 replicas. The client sends read(“key”) to the Coordinator; replicas R1, R2, R3 each hold (“key”, 1); the Coordinator returns (“key”, 1)]

send read to all

Page 78:

[Diagram: write with W=1 and read with R=1, two Coordinators, replicas R1, R2, R3 each initially holding (“key”, 1). One Coordinator sends write(“key”, 2) to the replicas, which respond with ack(“key”, 2) as they store (“key”, 2); the other Coordinator handles read(“key”) and receives (“key”, 1)]

Page 79:

N=3