Crossing the Production Barrier: Development at Scale
[email protected] / @johngoulah
Crossing the Production Barrier: Development At Scale
The world's handmade marketplace: a platform for people to sell handmade crafts and vintage goods
42MM unique visitors/mo.
1.5B+ page views/mo.
850K shops / 200 countries
895MM sales in 2012

big cluster: 20 shards, and adding 5 more
over 40% increase in QPS from last year (25K last year); an additional 30K moving over from Postgres
60K+ queries/sec avg
99.99% of queries under 1ms
4TB InnoDB buffer pool (1/3 of RAM is not dedicated to the pool: OS, disk, network buffers, etc.)
20TB+ data stored
~1.2Gbps outbound (plain text)
50+ MySQL servers / 800 CPUs
Server Spec: HP DL380 G7
24 cores
96GB RAM
16 spindles: 16 x 146GB, 1TB RAID 10
The Problem
Etsy has been around since '05; we hit this problem a few years ago, and every big company probably has this issue
DATA
sync prod to dev, until prod data gets too big
http://www.flickr.com/photos/uwwresnet/6280880034/sizes/l/in/photostream/
Some Approaches
subsets of data
generated data
subsets have to end somewhere (a shop has favorites that are connected to people, connected to shops, etc.); generated data can be time-consuming to fake
But...
but there is a problem with both of those approaches
Edge Cases
what about testing edge cases and difficult-to-diagnose bugs? it's hard to model the same data set that produced a user-facing bug
http://www.flickr.com/photos/sovietuk/141381675/sizes/l/in/photostream/
Perspective
another issue is testing problems at scale, with complex and large gobs of data. a real social-network ecosystem can be difficult to generate (favorites, follows); the activity feed and "similar items" search give better results with real data
http://www.flickr.com/photos/donsolo/2136923757/sizes/l/in/photostream/
Prod -> Dev?
what most people do before data gets too big. syncing 20TB takes almost 2 days over a 1Gbps link, 5 hrs over 10Gbps. bringing the prod dataset to dev was expensive: hardware/maintenance, keeping parity with prod, and applying schema changes would take at least as long
Use Production (sometimes)
so we did what we saw as the last resort: use production. not for greenfield development; more for mature features and diagnosing bugs. we still have a dev database, but the data is sparse and unreliable
goes without saying this can be dangerous; it's also difficult to do right. we've been working on this for a year
http://www.flickr.com/photos/stuckincustoms/432361985/sizes/l/in/photostream/
Approach
two big things: cultural and technical
Solve Culture Issues First
part of figuring this out was exhausting all other options and getting buy-in from major stakeholders
Two “Simple” Technical Issues
step 0:
failure recovery
step 1:
make it safe: how to have test data in production and prevent stupid mistakes
phased rollout
read-only
r/w dev shard only
full r/w
How?
how did we do it?
Quick Overview
high level view
http://www.flickr.com/photos/h-k-d/7852444560/sizes/o/in/photostream/
tickets | index
shard 1 | shard 2 | shard N

tickets: Unique IDs
index: Shard Lookup
shards: Store/Retrieve Data
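The tickets/index/shards layout above can be sketched as a simple lookup flow. This is a hypothetical sketch, not Etsy's actual code: `ticket_next()` stands in for a ticket server that hands out globally unique IDs, and a plain dict stands in for the index table that maps object IDs to shards.

```python
# Hypothetical sketch of the tickets/index/shards flow.
# ticket_next() stands in for a ticket server handing out unique IDs;
# `index` is a dict standing in for the shard-lookup index.

_next_ticket = 0

def ticket_next():
    """Stand-in for a ticket server: return a new globally unique id."""
    global _next_ticket
    _next_ticket += 1
    return _next_ticket

SHARDS = ["shard_1", "shard_2", "shard_N", "dev_shard"]
index = {}                                   # object id -> shard name
shard_data = {name: {} for name in SHARDS}   # per-shard key/value store

def create_object(payload, from_dev=False):
    """Objects created from the dev environment are written to the dev shard."""
    obj_id = ticket_next()
    shard = "dev_shard" if from_dev else SHARDS[obj_id % 3]
    index[obj_id] = shard
    shard_data[shard][obj_id] = payload
    return obj_id

def fetch_object(obj_id):
    """Look the shard up in the index, then read from that shard."""
    shard = index[obj_id]
    return shard_data[shard][obj_id]
```

The point of the dev shard in this scheme is that the index routes dev-created objects transparently, so reads work the same way regardless of which shard holds the row.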
dev shard
introducing....
the dev shard is the shard used for initial writes of data created from the dev environment
tickets | index
shard 1 | shard 2 | shard N | DEV shard

www.etsy.com writes to the normal shards; www.goulah.vm (a dev VM) sends its Initial Writes to the DEV shard
mysql proxy
proxy hits all of the shards/index/tickets
http://www.oreillynet.com/pub/a/databases/2007/07/12/getting-started-with-mysql-proxy.html
dangerous/unnecessary queries

(DEV) etsy_rw@jgoulah [test]> select * from fred_test;
ERROR 9001 (E9001): Selects from tables must have where clauses

-- filter dangerous queries (queries without a WHERE)
-- remove unnecessary queries (instead of DELETE, have a flag; ALTER statements don't run from dev)
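The WHERE-clause rule the proxy enforces can be sketched as a naive query filter. A minimal sketch only: the real proxy presumably parses SQL properly rather than string-matching, and the error text and E9001 code are taken from the slide above.

```python
import re

class DangerousQueryError(Exception):
    """Mirrors the proxy's ERROR 9001 (E9001) rejection from the slide."""

# Statement types that should never run from dev (per the slide: ALTERs
# don't run from dev); DROP/TRUNCATE added here as illustrative guesses.
BLOCKED_FROM_DEV = ("ALTER", "DROP", "TRUNCATE")

def check_query(sql):
    """Naive sketch of the dev-proxy query filter.

    Rejects blocked statement types outright, and rejects
    SELECT/UPDATE/DELETE statements that have no WHERE clause.
    Returns the cleaned statement if it passes.
    """
    stmt = sql.strip().rstrip(";")
    verb = stmt.split(None, 1)[0].upper()
    if verb in BLOCKED_FROM_DEV:
        raise DangerousQueryError("%s statements don't run from dev" % verb)
    if verb in ("SELECT", "UPDATE", "DELETE") and not re.search(r"\bWHERE\b", stmt, re.I):
        raise DangerousQueryError("Selects from tables must have where clauses")
    return stmt
```

Note this toy version would also reject harmless statements like `SELECT 1`; a real filter would need actual SQL parsing to tell a table scan from a constant select.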
known ingress/egress funnel
we know where all of the queries from dev originate from
http://www.flickr.com/photos/medevac71/4875526920/sizes/l/in/photostream/
explicitly enabled
% dev_proxy on
Dev-Proxy config is now ON. Use 'dev_proxy off' to turn it off.
Not on all the time
visual notifications
notify engineers that they are using the proxy and that this is read-only mode
read/write mode
read-write mode, needed for login and other things that write data
stealth data
hiding data from users (favorites go on both the dev and prod shards; making sure test users/shops don't show up)
http://www.flickr.com/photos/davidyuweb/8063097077/sizes/h/in/photostream/
Security
http://www.flickr.com/photos/sidelong/3878741556/sizes/l/in/photostream/
PCI
off-limits
token exchange only, locked down for most people
anomaly detection
another part of our security setup is detection
logging
basics of anomaly detection is log collection
2013-04-22 18:05:43 485370821 devproxy --
/* DEVPROXY source=10.101.194.19:40198
uuid=c309e8db-ca32-4171-9c4a-6c37d9dd3361
[htSp8458VmHlC] [etsy_index_B] [browse.php] */
SELECT id FROM table;

date: 2013-04-22 18:05:43
thread id: 485370821
source ip: 10.101.194.19:40198
unique id generated by proxy: c309e8db-ca32-4171-9c4a-6c37d9dd3361
app request id: htSp8458VmHlC
dest. shard: etsy_index_B
script: browse.php
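The DEVPROXY comment fields above are regular enough that a log collector can pull them apart with a regex. A sketch under the assumption that every entry follows the exact layout shown in the sample; the function name is hypothetical.

```python
import re

# Matches the DEVPROXY comment the proxy prepends to each query,
# as shown in the sample log entry above.
DEVPROXY_RE = re.compile(
    r"/\*\s*DEVPROXY\s+source=(?P<source>\S+)\s+"
    r"uuid=(?P<uuid>\S+)\s+"
    r"\[(?P<request_id>[^\]]+)\]\s+"
    r"\[(?P<shard>[^\]]+)\]\s+"
    r"\[(?P<script>[^\]]+)\]\s*\*/"
)

def parse_devproxy(entry):
    """Extract source ip, proxy uuid, app request id, destination shard
    and script from a DEVPROXY-annotated query; None if it doesn't match."""
    m = DEVPROXY_RE.search(entry)
    return m.groupdict() if m else None
```

With fields split out like this, anomaly detection reduces to counting and alerting on the parsed dimensions (per source ip, per shard, per script).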
login-as
(read only, logged w/ reason for access)
reason is recorded and reviewed
Recovery
sources of restore data:
Hadoop
Backups
Delayed Slaves
pt-slave-delay watches a slave and starts and stops its replication SQL thread as necessary to hold it a set distance behind the master
http://www.flickr.com/photos/xploded/141295823/sizes/o/in/photostream/
Delayed Slaves
4 hour delay behind master
produce row based binary logs
allow for quick recovery

the role of the delayed slave; it's also a source for BCP (business continuity planning: prevention of and recovery from threats)
pt-slave-delay --daemonize --pid /var/run/pt-slave-delay.pid --log /var/log/pt-slave-delay.log --delay 4h --interval 1m --nocontinue

the last 3 options are the most important: 4h delay; interval is how frequently it should check whether the slave should be started or stopped; nocontinue means don't continue replication normally on exit. user/pass eliminated for brevity
Shard Pair (R/W + R/W) -> Slave (pt-slave-delay, row based binlogs) -> Parse/Transform -> HDFS / Vertica

in addition, we can use the slaves to send data to other stores for offline queries: 1) parse each binlog file to generate a sequence file of row changes 2) apply the row changes to a previous set for the latest version
something bad happens... a bad query is run (a bad update, etc.)
http://www.flickr.com/photos/focalintent/1332072795/sizes/o/in/photostream/
A + B (shard pair), with delayed Slave

Before Restoration....
1) stop delayed slave replication
2) pull side A
3) stop master-master replication

master.info should be pointing to the right place
step 2 could be flipping a physical box (for faster recovery, such as index servers)
> SHOW SLAVE STATUS
Relay_Log_File: dbslave-relay.007178
Relay_Log_Pos: 8666654

on delayed slave: get the relay position
mysql> show relaylog events in "dbslave-relay.007178" from 8666654 limit 1\G
*************************** 1. row ***************************
   Log_name: dbslave-relay.007178
        Pos: 8666654
 Event_type: Query
  Server_id: 1016572
End_log_pos: 8666565
       Info: use `etsy_shard`; /* [CVmkWxhD7gsatX8hLbkDoHk29iKo] [etsy_shard_001_B] [/your/activity/index.php] */ UPDATE `news_feed_stats` SET `time_last_viewed` = 1366406780, `update_time` = 1366406780 WHERE `owner_id` = 30793071 AND `owner_type_id` = 2 AND `feed_type` = 'owner'
2 rows in set (0.00 sec)

on delayed slave: SHOW RELAYLOG EVENTS shows statements from the relay log; pass the relay log file and position to start from
filter bad queries: cycle through all the logs and analyze Query events. Rotate events point to the next log file; the last relay log points to the master's binlog (the server_id is the master's, and the binlog coordinates match master_log_file/pos)
http://www.flickr.com/photos/chriswaits/6607823843/sizes/l/in/photostream/
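The filtering step above can be sketched as a pass over decoded relay-log events. This is hypothetical: it assumes the events have already been decoded (e.g. from SHOW RELAYLOG EVENTS output) into (event_type, info) pairs, and that the operator supplies a predicate identifying the bad query.

```python
def filter_relay_events(events, is_bad):
    """Sketch of replaying a delayed slave's relay logs minus the bad query.

    `events` is an iterable of (event_type, info) tuples as you might get
    from parsing SHOW RELAYLOG EVENTS output; `is_bad` is an operator-supplied
    predicate over the query text. Rotate events just name the next log file,
    so they are followed but never replayed; Query events are kept unless
    the predicate flags them.
    """
    replay = []
    for event_type, info in events:
        if event_type == "Rotate":
            continue  # points at the next relay log file, nothing to replay
        if event_type == "Query" and not is_bad(info):
            replay.append(info)
    return replay
```

In practice the surviving statements would be applied to the delayed slave to roll it forward to just before the mistake, rather than collected into a list.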
After Delayed Slave Data Is Restored....
1) stop mysql on A and slave
2) copy data files to A
3) restart B to A replication, let A catch up to B
4) restart A to B replication, put A back in, then pull B

master.info should be pointing to the right place
step 2 could be flipping a physical box (for faster recovery, such as index servers)
Other Forms of Recovery
Migrate Single Object (user/shop/etc)
Hadoop Deltas
Backup + Binlogs

migrate an object from the delayed slave (similar to a shard migration); we can generate deltas from Hadoop; if the delayed slave has already "played" the bad data, go from last night's backup (slower)
Use Cases
what are some use cases?
http://www.flickr.com/photos/seatbelt67/502255276/sizes/o/in/photostream/
user reports a bug...
a user files a bug; I can trace the code for the exact page they're on right from my dev machine
testing “dry” writes
testing how the application runs a "dry" write: in r/o mode, an exception is thrown with the exact query it would have attempted to run, the values it tried to use, etc.
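The dry-write behaviour described above can be sketched as a connection wrapper that raises instead of executing when in read-only mode, carrying the exact query and bind values in the exception. The class and exception names here are hypothetical, not Etsy's actual code.

```python
class DryWriteAttempt(Exception):
    """Raised in read-only mode with the exact query and bind values
    that would have been executed, so the engineer can inspect them."""
    def __init__(self, query, params):
        super().__init__("dry write: %s params=%r" % (query, params))
        self.query = query
        self.params = params

class Connection:
    """Hypothetical sketch of a db handle with a read-only 'dry write' mode."""
    def __init__(self, read_only=True):
        self.read_only = read_only
        self.executed = []  # stands in for statements actually sent to MySQL

    def execute(self, query, params=()):
        verb = query.strip().split(None, 1)[0].upper()
        if verb in ("INSERT", "UPDATE", "DELETE") and self.read_only:
            # Don't touch the database; surface what would have run.
            raise DryWriteAttempt(query, params)
        self.executed.append((query, params))
```

This gives the read-only rollout phase a useful side effect: you can exercise a write path end to end and see exactly what it would have done, without mutating production data.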
search ads campaign consistency
starting campaigns and maintaining consistency across the entire ad system is nearly impossible in dev. search ads data is stored in more than a dozen DB tables, and state changes are driven by a combination of browsers triggering ads, sellers managing their campaigns, and a slew of crons running anywhere from once per 5 minutes to once a month. e.g. to test pausing campaigns that run out of money mid-day, we can pull large numbers of campaigns from prod and operate on those to verify that the data will still be consistent
google product listing ads
GPLA is where we syndicate our listings to Google to be used in Google product search ads. we can test edge cases in GPLA syndication where it would be difficult to recreate the state in dev
testing prototypes
features like similar-items search give better results in production because of the amount of data; this allowed us to test the quality of the listings a prototype was displaying
performance testing
we need a real data set to test pages like treasury search, with lots of threads/avatars/etc. the dev data is too sparse: xhprof traces don't mean anything, and missing avatars change the performance characteristics
hadoop generated datasets
datasets produced from Hadoop (recommendations for users, or statistics about usage): since Hadoop runs on prod data, the output is for prod users/listings/shops, so we have to check it against prod. syncing it to dev would fill the dev DBs and the data wouldn't line up (because it's prod data)
browse slices
browse slices have a complex population process, so it's easier to test an experiment against prod data.
there are not enough listings to populate the narrower subcategories, and it just takes too long
Thank You
etsy.com/jobs
We’re hiring