Ensuring Consistency in a Replicated World
Transcript of Ensuring Consistency in a Replicated World
![Page 1: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/1.jpg)
Ensuring Consistency in a Replicated World
Josh Snyder 2014-09-30
![Page 2: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/2.jpg)
2
what is Yelp?
![Page 3: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/3.jpg)
• we operate in a bunch of markets
• aim to be globally distributed
• our users should never see stale content
• our developers should be able to design an application resilient to replication delay
3
goals
![Page 4: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/4.jpg)
4
a sample architecture
![Page 5: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/5.jpg)
• a small set of moving parts
• enables us to do more with fewer shards
• masks geographic traffic split from users and developers
• enhanced tolerance to replication delay
• ability to
  – perform online replication hierarchy changes
  – batch-load data
5
our toolset
![Page 6: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/6.jpg)
6
cookies
![Page 7: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/7.jpg)
• give the client a short-lived “dirty session” cookie
• encode the time of the latest interaction between you and them
• expire or ignore the cookie after replicas have caught up
7
cookies
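The cookie idea above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the talk: the function names, the plain-float encoding, and the choice to treat an unparseable cookie as dirty are all assumptions, and a real deployment would likely sign the value.

```python
import time
from typing import Optional


def make_dirty_cookie(write_time: Optional[float] = None) -> str:
    """Encode the time of the user's latest write as the cookie value.

    Hypothetical encoding: seconds since the epoch with millisecond precision.
    """
    ts = time.time() if write_time is None else write_time
    return f"{ts:.3f}"


def cookie_is_dirty(cookie_value: Optional[str], replicated_up_to: float) -> bool:
    """Return True while replicas may not yet have the user's last write.

    `replicated_up_to` is the point in time that every replica is known to
    have caught up to (assumed to come from some replication monitor).
    """
    if not cookie_value:
        return False  # no cookie: the user has no recent writes to protect
    try:
        last_write = float(cookie_value)
    except ValueError:
        return True  # assumption: when in doubt, overestimate and stay dirty
    return last_write >= replicated_up_to
```

Once `cookie_is_dirty` returns False the cookie can be expired or simply ignored, which matches the last bullet on the slide.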
![Page 8: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/8.jpg)
• load balancer:
• POST?
• GET? -> cookie?
• routes the request into the appropriate datacenter
• adds headers to requests
8
request routing
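A rough sketch of the routing decision, under the assumption that writes always go to the datacenter hosting the master and that reads go there only while the dirty session cookie is present. The cookie name `dirty_session`, the header name `X-Routing-Reason`, and the function itself are hypothetical; the slides do not name them.

```python
from typing import Dict, Tuple

# Methods assumed to be read-only for routing purposes.
SAFE_METHODS = {"GET", "HEAD"}


def choose_datacenter(
    method: str,
    cookies: Dict[str, str],
    local_dc: str,
    master_dc: str,
) -> Tuple[str, Dict[str, str]]:
    """Pick a target datacenter and the headers to add to the request."""
    if method.upper() not in SAFE_METHODS:
        # Writes must reach the datacenter that hosts the master database.
        target, reason = master_dc, "write"
    elif "dirty_session" in cookies:
        # The user wrote recently; local replicas may still be behind.
        target, reason = master_dc, "dirty-session"
    else:
        # Clean read: serve it from the nearest datacenter.
        target, reason = local_dc, "clean-read"
    return target, {"X-Routing-Reason": reason}
```

For example, `choose_datacenter("GET", {}, "eu", "us")` keeps the request local, while the same call with a `dirty_session` cookie sends it to `"us"`.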
![Page 9: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/9.jpg)
• users get read-after-write consistency
• routing a user’s request between datacenters increases latency
• getting it wrong: increased load on the master database
9
tradeoffs
![Page 10: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/10.jpg)
• we need to be assured that a user’s request falls back to a datacenter that has all of their data
10
tradeoffs
![Page 11: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/11.jpg)
• we need a clear picture of it
• never underestimate replication delay, always overestimate
11
replication delay
![Page 12: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/12.jpg)
• made of lies (for this purpose)
• underestimates most of the time
• overestimates some of the time
12
Seconds_Behind_Master
http://bugs.mysql.com/bug.php?id=66921
![Page 13: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/13.jpg)
13
heartbeats
![Page 14: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/14.jpg)
• insert known data on the master
• wait until you see it on the slave
• time waited is replication delay
14
heartbeats
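The three heartbeat steps can be sketched as a small polling function. This is an illustration only: `write_to_master` and `read_from_replica` are hypothetical callables standing in for real database access, and the clock and sleep functions are injectable so the sketch can be exercised without a database. Note that all timing is taken from a single monotonic clock on the observer.

```python
import time
import uuid
from typing import Callable, Optional


def measure_replication_delay(
    write_to_master: Callable[[str], None],
    read_from_replica: Callable[[], Optional[str]],
    poll_interval: float = 0.01,
    timeout: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Insert a known token on the master and time how long the replica takes to show it."""
    token = uuid.uuid4().hex  # the "known data"
    start = clock()
    write_to_master(token)
    while read_from_replica() != token:
        if clock() - start > timeout:
            raise TimeoutError("replica did not receive the heartbeat in time")
        sleep(poll_interval)
    # The result is rounded up to the polling granularity, so it
    # overestimates rather than underestimates the delay.
    return clock() - start
```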
![Page 15: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/15.jpg)
15
clocks are evil
![Page 16: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/16.jpg)
16
clocks are evil (2)
![Page 17: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/17.jpg)
17
pt-heartbeat
![Page 18: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/18.jpg)
18
yelp_heartbeat
![Page 19: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/19.jpg)
19
the secret sauce
![Page 20: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/20.jpg)
• A sensu check:
20
what does that get us? (pt 1)
![Page 21: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/21.jpg)
21
why that way?
![Page 23: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/23.jpg)
• aggregates heartbeat information
• provides it to the webapp
• determines when to expire the dirty session cookie
23
repl_delay_reporter
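To make the aggregation role concrete, here is a toy in-memory model of such a reporter. It is an assumption-laden sketch, not the actual repl_delay_reporter: the class name, its methods, and the rule "a cookie may expire once the lowest heartbeat is past the cookie's time" are inferred from the bullets above.

```python
from typing import Dict


class ReplDelayReporter:
    """Hypothetical aggregator of the latest heartbeat seen on each replica."""

    def __init__(self) -> None:
        self._latest: Dict[str, float] = {}

    def record(self, replica: str, heartbeat: float) -> None:
        # Ignore out-of-order reports so each replica's value never goes backwards.
        previous = self._latest.get(replica, heartbeat)
        self._latest[replica] = max(previous, heartbeat)

    def lowest_heartbeat(self) -> float:
        """Heartbeat of the most lagged replica, i.e. the pessimistic view."""
        if not self._latest:
            raise LookupError("no heartbeats recorded yet")
        return min(self._latest.values())

    def cookie_can_expire(self, cookie_time: float) -> bool:
        """True once every known replica has passed the cookie's write time."""
        return bool(self._latest) and self.lowest_heartbeat() > cookie_time
```

Using the minimum across replicas is what makes the estimate err on the side of overestimating delay.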
![Page 24: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/24.jpg)
• Wait for replication:
  • “I inserted some data; when will it be available on all replicas?”
• Throttle to replication:
  • “I want to bulk insert data. Will doing so cause too much replication delay?”
24
operations
![Page 25: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/25.jpg)
• insert some data
• ask the master database “what’s the heartbeat right now?”
• ask the repl_delay_reporter “what’s the lowest heartbeat right now?”
• wait a bit
• loop until the lowest heartbeat exceeds the original master heartbeat
25
wait for replication
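The loop described on this slide can be sketched as follows. The two getter callables are hypothetical stand-ins for "ask the master" and "ask the repl_delay_reporter"; the timeout and the injectable clock and sleep are additions for safety and testability, not something the slides specify.

```python
import time
from typing import Callable


def wait_for_replication(
    get_master_heartbeat: Callable[[], float],
    get_lowest_heartbeat: Callable[[], float],
    poll_interval: float = 0.1,
    timeout: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until every replica has passed the master's current heartbeat.

    Call this right after inserting data. Returns the number of polls made.
    """
    target = get_master_heartbeat()  # heartbeat on the master after our write
    deadline = clock() + timeout
    polls = 0
    while True:
        polls += 1
        # Strictly greater: the lowest heartbeat must exceed the original one.
        if get_lowest_heartbeat() > target:
            return polls
        if clock() >= deadline:
            raise TimeoutError("replicas did not catch up before the timeout")
        sleep(poll_interval)  # "wait a bit"
```

Heartbeat values are only compared with each other, so the local clock is used for the timeout alone and never to judge replication progress.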
![Page 26: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/26.jpg)
• determines when to expire the dirty session cookie
• relies on only 1 clock, and only for monotonicity
• used heavily by batches
  – provides read-after-write consistency
26
wait for replication
![Page 27: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/27.jpg)
• prevents batches from causing excessive replication delay
• operates before the beginning of each transaction
  – batches ask “is replication delay low enough for me to write right now?”
• batches are required to keep their transactions reasonably-sized
27
throttle to replication
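A minimal sketch of a batch that throttles itself before each transaction. Everything named here is hypothetical: `get_delay_seconds` stands for whatever reports current replication delay, `write_chunk` for one reasonably-sized transaction, and the 1-second threshold is an arbitrary illustrative default.

```python
import time
from typing import Callable, Iterable, List, Sequence


def run_throttled_batch(
    chunks: Iterable[Sequence[object]],
    write_chunk: Callable[[Sequence[object]], None],
    get_delay_seconds: Callable[[], float],
    max_delay: float = 1.0,
    poll_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Write each chunk in its own transaction, pausing while replicas lag.

    Returns how many times the batch had to pause.
    """
    pauses = 0
    for chunk in chunks:
        # Ask before every transaction: "is replication delay low enough
        # for me to write right now?"
        while get_delay_seconds() > max_delay:
            pauses += 1
            sleep(poll_interval)
        write_chunk(chunk)
    return pauses


def chunked(rows: Sequence[object], size: int) -> List[Sequence[object]]:
    """Split rows into small pieces so each transaction stays small."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]
```

Because the check happens only between transactions, keeping chunks small is what bounds how much delay a single write can add.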
![Page 28: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/28.jpg)
• load on masters
• laggards
• over-throttling
28
gotchas
![Page 29: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/29.jpg)
• batch data can reside on the same shards that serve OLTP requests
• support databases with heterogeneous SLAs
• automatic load-shedding when there is a replication issue
29
what this gets us
![Page 30: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/30.jpg)
• shunting of nearly ALL reading and reporting off of the master
• better mileage out of the Percona toolkit
• on-line replication hierarchy changes
30
what this gets us
![Page 31: Ensuring Consistency in a Replicated World](https://reader033.fdocuments.us/reader033/viewer/2022060121/559427ab1a28ab07418b45a2/html5/thumbnails/31.jpg)