Crossing the Production Barrier: Development at Scale
[email protected] / @johngoulah
Crossing the Production Barrier: Development At Scale
The world's handmade marketplace: a platform for people to sell handmade crafts and vintage goods
42MM unique visitors/mo.
1.5B+ page views/mo.
850K shops / 200 countries
895MM sales in 2012

big cluster: 20 shards, and adding 5 more
over 40% increase in QPS from last year (25K last year); an additional 30K moving over from Postgres
60K+ queries/sec avg
99.99% of queries under 1ms
4TB InnoDB buffer pool (1/3 of RAM is not dedicated to the pool: OS, disk, network buffers, etc.)
20TB+ data stored
~1.2Gbps outbound (plain text)
50+ MySQL servers / 800 CPUs
Server Spec: HP DL380 G7
24 cores
96GB RAM
16 spindles: 16 x 146GB, 1TB RAID 10
The Problem
Etsy has been around since '05; we hit this problem a few years ago, and every big company probably has this issue
DATA
sync prod to dev, until prod data gets too big
http://www.flickr.com/photos/uwwresnet/6280880034/sizes/l/in/photostream/
Some Approaches
subsets of data
generated data
subsets have to end somewhere (a shop has favorites that are connected to people, connected to shops, etc.); generated data can be time-consuming to fake
But...
but there is a problem with both of those approaches
Edge Cases
what about testing edge cases and difficult-to-diagnose bugs? it's hard to model the same data set that produced a user-facing bug
http://www.flickr.com/photos/sovietuk/141381675/sizes/l/in/photostream/
Perspective
another issue is testing problems at scale, with complex and large gobs of data. a real social-network ecosystem can be difficult to generate (favorites, follows); the activity feed and "similar items" search give better results with real data
http://www.flickr.com/photos/donsolo/2136923757/sizes/l/in/photostream/
Prod -> Dev?
what most people do before data gets too big. syncing 20TB takes almost 2 days over a 1Gbps link, 5 hrs over 10Gbps. bringing the prod dataset to dev was expensive: hardware/maintenance, keeping parity with prod, and applying schema changes would take at least as long
Use Production (sometimes)
so we did what we saw as the last resort: use production. not for greenfield development; more for mature features and diagnosing bugs. we still have a dev database, but the data is sparse and unreliable
goes without saying this can be dangerous; it's also difficult to do right. we've been working on this for a year
http://www.flickr.com/photos/stuckincustoms/432361985/sizes/l/in/photostream/
Approach
two big things: cultural and technical
Solve Culture Issues First
part of figuring this out was exhausting all other options and getting buy-in from major stakeholders
Two “Simple” Technical Issues
step 0:
failure recovery
step 1:
make it safe: how to have test data in production and prevent stupid mistakes
phased rollout
read-only
r/w dev shard only
full r/w
How?
how did we do it?
Quick Overview
high level view
http://www.flickr.com/photos/h-k-d/7852444560/sizes/o/in/photostream/
tickets | index
shard 1 | shard 2 | shard N

tickets: Unique IDs
index: Shard Lookup
shards: Store/Retrieve Data
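The tickets/index/shards layout above can be sketched as a simple lookup flow. This is a hypothetical sketch, not Etsy's actual code: `ticket_next()` stands in for a ticket server that hands out globally unique IDs, and a plain dict stands in for the index table that maps object IDs to shards.

```python
# Hypothetical sketch of the tickets/index/shards flow.
# ticket_next() stands in for a ticket server handing out unique IDs;
# `index` is a dict standing in for the shard-lookup index.

_next_ticket = 0

def ticket_next():
    """Stand-in for a ticket server: return a new globally unique id."""
    global _next_ticket
    _next_ticket += 1
    return _next_ticket

SHARDS = ["shard_1", "shard_2", "shard_N", "dev_shard"]
index = {}                                   # object id -> shard name
shard_data = {name: {} for name in SHARDS}   # per-shard key/value store

def create_object(payload, from_dev=False):
    """Objects created from the dev environment are written to the dev shard."""
    obj_id = ticket_next()
    shard = "dev_shard" if from_dev else SHARDS[obj_id % 3]
    index[obj_id] = shard
    shard_data[shard][obj_id] = payload
    return obj_id

def fetch_object(obj_id):
    """Look the shard up in the index, then read from that shard."""
    shard = index[obj_id]
    return shard_data[shard][obj_id]
```

The point of the dev shard in this scheme is that the index routes dev-created objects transparently, so reads work the same way regardless of which shard holds the row.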
dev shard
introducing....
the dev shard is the shard used for initial writes of data created from the dev environment
tickets | index
shard 1 | shard 2 | shard N | DEV shard

www.etsy.com writes to the normal shards; www.goulah.vm (a dev VM) sends its Initial Writes to the DEV shard
mysql proxy
proxy hits all of the shards/index/tickets
http://www.oreillynet.com/pub/a/databases/2007/07/12/getting-started-with-mysql-proxy.html
dangerous/unnecessary queries

(DEV) etsy_rw@jgoulah [test]> select * from fred_test;
ERROR 9001 (E9001): Selects from tables must have where clauses

-- filter dangerous queries (queries without a WHERE)
-- remove unnecessary queries (instead of DELETE, have a flag; ALTER statements don't run from dev)
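The WHERE-clause rule the proxy enforces can be sketched as a naive query filter. A minimal sketch only: the real proxy presumably parses SQL properly rather than string-matching, and the error text and E9001 code are taken from the slide above.

```python
import re

class DangerousQueryError(Exception):
    """Mirrors the proxy's ERROR 9001 (E9001) rejection from the slide."""

# Statement types that should never run from dev (per the slide: ALTERs
# don't run from dev); DROP/TRUNCATE added here as illustrative guesses.
BLOCKED_FROM_DEV = ("ALTER", "DROP", "TRUNCATE")

def check_query(sql):
    """Naive sketch of the dev-proxy query filter.

    Rejects blocked statement types outright, and rejects
    SELECT/UPDATE/DELETE statements that have no WHERE clause.
    Returns the cleaned statement if it passes.
    """
    stmt = sql.strip().rstrip(";")
    verb = stmt.split(None, 1)[0].upper()
    if verb in BLOCKED_FROM_DEV:
        raise DangerousQueryError("%s statements don't run from dev" % verb)
    if verb in ("SELECT", "UPDATE", "DELETE") and not re.search(r"\bWHERE\b", stmt, re.I):
        raise DangerousQueryError("Selects from tables must have where clauses")
    return stmt
```

Note this toy version would also reject harmless statements like `SELECT 1`; a real filter would need actual SQL parsing to tell a table scan from a constant select.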
known ingress/egress funnel
we know where all of the queries from dev originate from
http://www.flickr.com/photos/medevac71/4875526920/sizes/l/in/photostream/
explicitly enabled
% dev_proxy on
Dev-Proxy config is now ON. Use 'dev_proxy off' to turn it off.
Not on all the time
visual notifications
notify engineers that they are using the proxy and that this is read-only mode
read/write mode
read-write mode, needed for login and other things that write data
stealth data
hiding data from users (favorites go on both the dev and prod shards; making sure test users/shops don't show up)
http://www.flickr.com/photos/davidyuweb/8063097077/sizes/h/in/photostream/
Security
http://www.flickr.com/photos/sidelong/3878741556/sizes/l/in/photostream/
PCI
off-limits
token exchange only, locked down for most people
anomaly detection
another part of our security setup is detection
logging
basics of anomaly detection is log collection
2013-04-22 18:05:43 485370821 devproxy --
/* DEVPROXY source=10.101.194.19:40198
uuid=c309e8db-ca32-4171-9c4a-6c37d9dd3361
[htSp8458VmHlC] [etsy_index_B] [browse.php] */
SELECT id FROM table;

date: 2013-04-22 18:05:43
thread id: 485370821
source ip: 10.101.194.19:40198
unique id generated by proxy: c309e8db-ca32-4171-9c4a-6c37d9dd3361
app request id: htSp8458VmHlC
dest. shard: etsy_index_B
script: browse.php
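The DEVPROXY comment fields above are regular enough that a log collector can pull them apart with a regex. A sketch under the assumption that every entry follows the exact layout shown in the sample; the function name is hypothetical.

```python
import re

# Matches the DEVPROXY comment the proxy prepends to each query,
# as shown in the sample log entry above.
DEVPROXY_RE = re.compile(
    r"/\*\s*DEVPROXY\s+source=(?P<source>\S+)\s+"
    r"uuid=(?P<uuid>\S+)\s+"
    r"\[(?P<request_id>[^\]]+)\]\s+"
    r"\[(?P<shard>[^\]]+)\]\s+"
    r"\[(?P<script>[^\]]+)\]\s*\*/"
)

def parse_devproxy(entry):
    """Extract source ip, proxy uuid, app request id, destination shard
    and script from a DEVPROXY-annotated query; None if it doesn't match."""
    m = DEVPROXY_RE.search(entry)
    return m.groupdict() if m else None
```

With fields split out like this, anomaly detection reduces to counting and alerting on the parsed dimensions (per source ip, per shard, per script).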
login-as
(read only, logged w/ reason for access)
reason is recorded and reviewed
Recovery
sources of restore data:
Hadoop
Backups
Delayed Slaves
pt-slave-delay watches a slave and starts and stops its replication SQL thread as necessary to hold it a set distance behind the master
http://www.flickr.com/photos/xploded/141295823/sizes/o/in/photostream/
Delayed Slaves
4 hour delay behind master
produce row based binary logs
allow for quick recovery

the role of the delayed slave; it's also a source for BCP (business continuity planning: prevention of and recovery from threats)
pt-slave-delay --daemonize --pid /var/run/pt-slave-delay.pid --log /var/log/pt-slave-delay.log --delay 4h --interval 1m --nocontinue

the last 3 options are the most important: 4h delay; interval is how frequently it should check whether the slave should be started or stopped; nocontinue means don't continue replication normally on exit. user/pass eliminated for brevity
Shard Pair (R/W + R/W) -> Slave (pt-slave-delay, row based binlogs) -> Parse/Transform -> HDFS / Vertica

in addition, we can use the slaves to send data to other stores for offline queries: 1) parse each binlog file to generate a sequence file of row changes 2) apply the row changes to a previous set for the latest version
something bad happens... a bad query is run (a bad update, etc.)
http://www.flickr.com/photos/focalintent/1332072795/sizes/o/in/photostream/
A + B (shard pair), with delayed Slave

Before Restoration....
1) stop delayed slave replication
2) pull side A
3) stop master-master replication

master.info should be pointing to the right place
step 2 could be flipping a physical box (for faster recovery, such as index servers)
> SHOW SLAVE STATUS
Relay_Log_File: dbslave-relay.007178
Relay_Log_Pos: 8666654

on delayed slave: get the relay position
mysql> show relaylog events in "dbslave-relay.007178" from 8666654 limit 1\G
*************************** 1. row ***************************
   Log_name: dbslave-relay.007178
        Pos: 8666654
 Event_type: Query
  Server_id: 1016572
End_log_pos: 8666565
       Info: use `etsy_shard`; /* [CVmkWxhD7gsatX8hLbkDoHk29iKo] [etsy_shard_001_B] [/your/activity/index.php] */ UPDATE `news_feed_stats` SET `time_last_viewed` = 1366406780, `update_time` = 1366406780 WHERE `owner_id` = 30793071 AND `owner_type_id` = 2 AND `feed_type` = 'owner'
2 rows in set (0.00 sec)

on delayed slave: SHOW RELAYLOG EVENTS shows statements from the relay log; pass the relay log file and position to start from
filter bad queries: cycle through all the logs and analyze Query events. Rotate events point to the next log file; the last relay log points to the master's binlog (the server_id is the master's, and the binlog coordinates match master_log_file/pos)
http://www.flickr.com/photos/chriswaits/6607823843/sizes/l/in/photostream/
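The filtering step above can be sketched as a pass over decoded relay-log events. This is hypothetical: it assumes the events have already been decoded (e.g. from SHOW RELAYLOG EVENTS output) into (event_type, info) pairs, and that the operator supplies a predicate identifying the bad query.

```python
def filter_relay_events(events, is_bad):
    """Sketch of replaying a delayed slave's relay logs minus the bad query.

    `events` is an iterable of (event_type, info) tuples as you might get
    from parsing SHOW RELAYLOG EVENTS output; `is_bad` is an operator-supplied
    predicate over the query text. Rotate events just name the next log file,
    so they are followed but never replayed; Query events are kept unless
    the predicate flags them.
    """
    replay = []
    for event_type, info in events:
        if event_type == "Rotate":
            continue  # points at the next relay log file, nothing to replay
        if event_type == "Query" and not is_bad(info):
            replay.append(info)
    return replay
```

In practice the surviving statements would be applied to the delayed slave to roll it forward to just before the mistake, rather than collected into a list.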
After Delayed Slave Data Is Restored....
1) stop mysql on A and slave
2) copy data files to A
3) restart B to A replication, let A catch up to B
4) restart A to B replication, put A back in, then pull B

master.info should be pointing to the right place
step 2 could be flipping a physical box (for faster recovery, such as index servers)
Other Forms of Recovery
Migrate Single Object (user/shop/etc)
Hadoop Deltas
Backup + Binlogs

migrate an object from the delayed slave (similar to a shard migration); we can generate deltas from Hadoop; if the delayed slave has already "played" the bad data, go from last night's backup (slower)
Use Cases
what are some use cases?
http://www.flickr.com/photos/seatbelt67/502255276/sizes/o/in/photostream/
user reports a bug...
a user files a bug; I can trace the code for the exact page they're on right from my dev machine
testing “dry” writes
testing how the application runs a "dry" write: in r/o mode, an exception is thrown with the exact query it would have attempted to run, the values it tried to use, etc.
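The dry-write behaviour described above can be sketched as a connection wrapper that raises instead of executing when in read-only mode, carrying the exact query and bind values in the exception. The class and exception names here are hypothetical, not Etsy's actual code.

```python
class DryWriteAttempt(Exception):
    """Raised in read-only mode with the exact query and bind values
    that would have been executed, so the engineer can inspect them."""
    def __init__(self, query, params):
        super().__init__("dry write: %s params=%r" % (query, params))
        self.query = query
        self.params = params

class Connection:
    """Hypothetical sketch of a db handle with a read-only 'dry write' mode."""
    def __init__(self, read_only=True):
        self.read_only = read_only
        self.executed = []  # stands in for statements actually sent to MySQL

    def execute(self, query, params=()):
        verb = query.strip().split(None, 1)[0].upper()
        if verb in ("INSERT", "UPDATE", "DELETE") and self.read_only:
            # Don't touch the database; surface what would have run.
            raise DryWriteAttempt(query, params)
        self.executed.append((query, params))
```

This gives the read-only rollout phase a useful side effect: you can exercise a write path end to end and see exactly what it would have done, without mutating production data.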
search ads campaign consistency
starting campaigns and maintaining consistency across the entire ad system is nearly impossible in dev. search ads data is stored in more than a dozen DB tables, and state changes are driven by a combination of browsers triggering ads, sellers managing their campaigns, and a slew of crons running anywhere from once per 5 minutes to once a month. e.g. to test pausing campaigns that run out of money mid-day, we can pull large numbers of campaigns from prod and operate on those to verify that the data will still be consistent
google product listing ads
GPLA is where we syndicate our listings to Google to be used in Google product search ads. we can test edge cases in GPLA syndication where it would be difficult to recreate the state in dev
testing prototypes
features like similar-items search give better results in production because of the amount of data; this allowed us to test the quality of the listings a prototype was displaying
performance testing
we need a real data set to test pages like treasury search, with lots of threads/avatars/etc. the dev data is too sparse: xhprof traces don't mean anything, and missing avatars change the performance characteristics
hadoop generated datasets
datasets produced from Hadoop (recommendations for users, or statistics about usage): since Hadoop runs on prod data, the output is for prod users/listings/shops, so we have to check it against prod. syncing it to dev would fill the dev DBs and the data wouldn't line up (because it's prod data)
browse slices
browse slices have a complex population process, so it's easier to test an experiment against prod data.
there are not enough listings to populate the narrower subcategories, and it just takes too long
Thank You
etsy.com/jobs
We’re hiring