Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    1/87

    Author: Fred Hatfull

    The Dirty Work: Scaling Out Websites With Your Own Two Hands

    Date: CWRU, October 27, 2012

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    2/87

    Who Am I?

    CWRU Alumnus

    Software Developer at Yelp

    Infrastructure Engineer

    Availability

    Performance

    Productivity

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    3/87

    Overview

    What's Yelp?

    Scaling the Backend

    Accelerating Content Delivery

    Monitoring Performance

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    4/87

    What's Yelp?

    Help consumers find great local businesses

    Help business owners find more customers

    Numbers current as of 2012Q2.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    5/87

    What's Yelp?

    We're hiring! yelp.com/careers

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    6/87

    What's Yelp?

    Five Sites

    www.yelp.com

    biz.yelp.com

    api.yelp.com

    m.yelp.com

    admin.yelp.com

    www - consumer-facing website

    biz - business owners' website for managing ads, biz page, etc.

    api - public and private APIs (mobile apps, too)

    m - mobile site (web browsers on mobile devices)

    admin - administrative tools

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    7/87

    What's Yelp?

    Numerous Open-Source Projects:

    mrjob

    firefly

    testify

    tron

    many more: github.com/Yelp

    mrjob - Python Map/Reduce framework

    firefly - time-series statistics graphing

    testify - a more Pythonic test framework

    tron - distributed cron

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    8/87


  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    9/87

    In the Beginning...

    Like most websites, it all started with a single server...

    You have probably set up a website like this...

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    10/87

    In the Beginning

    Apache, Python, MySQL, Linux.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    11/87

    In the Beginning

    16.32.64.128

    No load balancer, no internal DNS, no web framework. Just us, mod_python, and MySQL.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    12/87

    Up and Running

    16.32.64.128

    web1

    web2

    db1

    * Names changed to protect the innocent

    Traffic starts picking up. One box doesn't cut it any more... time to scale out horizontally. Adding webs is the low-hanging fruit.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    13/87

    Up and Running

    16.32.64.128

    web1

    web2

    web3

    web4

    web5

    web6

    db1

    * Names changed to protect the innocent

    OK, we begin to hit the limits of horizontal scaling. The webapp can always benefit from having more machines (+ HAProxy).

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    14/87

    Up and Running

    16.32.64.128

    web1

    web2

    web3

    web4

    web5

    web6

    db1 (!)

    * Names changed to protect the innocent

    But the database is now under very heavy load!

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    15/87

    Scaling the Database

    Options

    Find a faster data store?

    Use separate databases?

    Sharding?

    Replication?

    There are a few classic options for scaling up MySQL. We could switch datastores... maybe MySQL is just slow? How about Oracle? MSSQL? etc. We could introduce an entirely new database machine with its own MySQL instance... basically just a clone of the current one. Neither DB knows anything about the other. We could shard the database by having multiple machines where each is responsible for a certain set of keys. Or we could just replicate the current database to accommodate more traffic, and hope writes don't get overwhelming.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    16/87

    Scaling the Database

    Options

    Find a faster data store?

    Use separate databases?

    Sharding?

    Replication?

    OK. It's 2004... NoSQL isn't around, really, and our data is pretty relational anyway. MySQL is looking like the fastest store that meets our requirements.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    17/87

    Scaling the Database

    Options

    Find a faster data store?

    Use separate databases?

    Sharding?

    Replication?

    We could just set up an entirely new database machine and run the new database in parallel. We would have to make two read queries instead of one now and figure out where to send writes, and keeping schemas in sync is a nightmare. Clearly not scalable.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    18/87

    Scaling the Database

    Options

    Find a faster data store?

    Use separate databases?

    Sharding?

    Replication

    We are actually doing a form of sharding here, but it's not quite the conventional master-master sharding that usually comes to mind. Master-master would work, but it's a huge pain to get right and requires a lot of effort to make sure keys go to and are retrieved from the shards where they belong. Our read-heavy traffic patterns make us an ideal candidate for replication to massively increase read capacity while reducing engineering overhead.
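    To make the key-routing effort in the sharding option concrete, here is a minimal sketch of hash-based shard selection. The shard names and the `shard_for` helper are hypothetical and not from the talk; the sketch only illustrates that every read and every write has to agree on where a key lives.

```python
import zlib

# Hypothetical shard names, for illustration only.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(key, shards=SHARDS):
    """Pick the shard responsible for a key using a stable hash.

    zlib.crc32 is deterministic across processes (unlike Python's built-in
    hash() for strings), so every web sends the same key to the same shard.
    """
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]
```

    With plain modulo hashing like this, changing the number of shards remaps most keys, which is part of the engineering overhead that replication avoids.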

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    19/87


  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    20/87


  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    21/87

    Database Replication

    db1

    webs

    Reads/Writes

    readdb1

    Reads

    Replicated Writes

    Simple two-database replication scheme. All write traffic hits the master, `db1`. Read traffic is split between the master and a read-only database. When writes come through the master database, the master database informs the slave database that the write happened so that the slave replicates the action taken on the master.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    22/87

    Database Replication

    db1

    webs

    Reads/Writes

    readdb1

    Reads

    Replicated Writes

    db2

    It's good practice to keep another master early in the replication stream that can be promoted to write master if the write master fails.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    23/87

    Database Replication

    db1

    webs

    Reads/Writes

    readdb1

    Reads

    Replicated Writes

    db2

    Replication can't always happen immediately, depending on the load on the slave DB and the master, how far apart they are, and network congestion.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    24/87

    Life used to be so easy... :(

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    25/87

    Database Replication

    Strong Consistency

    vs.

    Eventual Consistency

    A shift in thinking. Life is easy in strongly-consistent systems. Although scaling can be challenging, you get a guarantee that data is always up-to-date. Eventual consistency is a big change and has lots of nasty corner cases ready to bite.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    26/87

    Database Replication

    Replication has lots of cons:

    Expensive/poorly formed queries have multiplicative effect

    Replication delay can lead to inconsistent views

    Figuring out when to hit master vs. slave

    etc...

    While horizontally-scalable read capacity is a big win, there are lots of new things to think about.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    27/87

    Replication: Gross Queries

    webs

    master

    slaves

    2ms

    1ms

    4500ms

    A normal replication stream. Lots of nice, small queries floating by. Master has no problem.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    28/87

    Replication: Gross Queries

    webs

    master

    slaves

    2ms

    1ms

    4500ms

    2ms

    2ms

    Small writes get replicated fine, but the big, nasty table scan suddenly locks a whole bunch of rows and waits seconds for MySQL to figure out which rows should come back. This delays all the writes in the replication stream and causes an increase in replication delay, exacerbating the inconsistency.
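    The effect of one slow statement on everything queued behind it can be worked through with a small simulation. This is a sketch under a stated assumption: the replica applies the replication stream one statement at a time, in order. The `replication_lag` function and the arrival times are made up for illustration; the durations reuse the 2ms, 1ms and 4500ms figures from the slide.

```python
def replication_lag(events):
    """Simulate a replica that applies statements one at a time, in order.

    events: list of (arrival_ms, duration_ms) tuples sorted by arrival time.
    Returns the lag of each statement: its finish time on the replica minus
    the time it arrived in the replication stream.
    """
    lags = []
    free_at = 0  # time at which the replica finishes its current statement
    for arrival_ms, duration_ms in events:
        start = max(arrival_ms, free_at)  # wait for earlier statements
        free_at = start + duration_ms
        lags.append(free_at - arrival_ms)
    return lags


# Two small statements, one slow one, then two more small ones.
stream = [(0, 2), (10, 1), (20, 4500), (30, 2), (40, 2)]
print(replication_lag(stream))  # [2, 1, 4500, 4492, 4484]
```

    The two 2ms statements that arrive after the slow one each finish about 4.5 seconds late, even though they are cheap on their own.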

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    29/87

    Replication: Gross Queries

    webs

    master

    slaves

    1ms

    4500ms

    4500ms

    4500ms

    In the case of a big INSERT/UPDATE/DELETE, that query also needs to replicate to the slaves, causing big delays on all of the slaves.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    30/87

    Replication: Consistency

    web

    db master

    db replica

    Here's an example of where replication can introduce inconsistency. Our user wants to know about the restaurant Happy Dog.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    31/87

    Replication: Consistency

    web

    db master

    db replica

    As part of the request, the web server handling the request asks a DB replica for information about Happy Dog.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    32/87

    Replication: Consistency

    web

    db master

    db replica

    The replica comes back with the requested information...

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    33/87

    Replication: Consistency

    web

    db master

    db replica

    ...and it's returned to the user. Fine. Everything here is as it used to be.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    34/87

    Replication: Consistency

    web

    db master

    db replica

    Now our user wants to write a review about Happy Dog.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    35/87

    Replication: Consistency

    web

    db master

    db replica

    Since we have to write data, our web connects to the master database and issues the writes for the new review.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    36/87

    Replication: Consistency

    web

    db master

    db replica

    That's the end of our user's web request. After she POSTs the review she gets redirected back to Happy Dog's page. Like before, her web hits a replica instead of the master because it only needs to do reads. However, her request gets through the web and to the replica before her review makes it to the replica in the replication stream...

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    37/87

    Replication: Consistency

    web

    db master

    db replica

    where's my review??

    As a result, our user sees the stale information, even though she just contributed content! Now the user thinks her content has disappeared.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    38/87

    Replication: Consistency

    Writes:

    always master

    Reads:

    can use either master or slave

    majority can use slave

    sometimes you want to hit the master for consistency

    Here's how DB access is split up based on what kind of activity you are doing. Writes (INSERTs/UPDATEs/DELETEs) always hit the master, since nothing else will accept writes. Reads (SELECTs) can use either the master or a slave, and usually only need a slave. It's up to the application to figure out if it needs to hit the master, and that can be tricky.
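    The read/write split described here can be sketched as a small routing helper. `ConnectionRouter`, its method names and the host strings are hypothetical and not Yelp's code; a real application would hand back database connections rather than host names.

```python
import random


class ConnectionRouter:
    """Choose a database host for a statement (illustrative sketch)."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE")

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)

    def host_for(self, sql, need_consistent_read=False):
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master  # only the master accepts writes
        if need_consistent_read or not self.replicas:
            return self.master  # the application asked for fresh data
        return random.choice(self.replicas)  # ordinary reads go to a slave
```

    The `need_consistent_read` flag is where the tricky part lives: the helper cannot know on its own whether stale data is acceptable, so the caller has to say so.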

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    39/87

    Replication: Consistency

    When Does Consistency Matter?

    Consistency only matters when it's expected

    example: users writing reviews

    If the user doesn't know information is out of date... is it really out of date?

    The dirty secret is: consistency doesn't really always matter.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    40/87

    Replication: Consistency

    Asking for the master:

    after writes, hit the master until replication catches up

    webs/load-balancers can remember user state

    but expensive, brittle

    instead, teach clients to ask for the master

    dirty session cookie

    Always hit the master for writes. After writes, hang on to a cookie for X seconds. While the user has the cookie, hit the master.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    41/87

    Replication: Dirty Session

    After writes, issue dirty session cookie

    cookie contains a timestamp in the future

    webs check for cookie

    if time is before timestamp in cookie:

        redirect to master db

    else:

        remove cookie, continue via replica
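    The dirty-session steps on this slide can be sketched in Python. The cookie name, the 30-second window and the use of a plain dict for cookies are assumptions made for illustration; the talk only says the cookie holds a timestamp in the future.

```python
import time

DIRTY_COOKIE = "dirty_session"  # hypothetical cookie name
DIRTY_SECONDS = 30              # assumed window; tune to observed replication delay


def mark_dirty(cookies, now=None):
    """After a write, store a timestamp in the future in the user's cookies."""
    now = time.time() if now is None else now
    cookies[DIRTY_COOKIE] = str(now + DIRTY_SECONDS)


def choose_db(cookies, now=None):
    """Return "master" while the dirty cookie is still valid, else "replica"."""
    now = time.time() if now is None else now
    raw = cookies.get(DIRTY_COOKIE)
    if raw is None:
        return "replica"
    try:
        expires = float(raw)
    except ValueError:
        expires = 0.0  # malformed cookie: treat as expired
    if now < expires:
        return "master"
    del cookies[DIRTY_COOKIE]  # expired: remove cookie, continue via replica
    return "replica"
```

    Keeping the state in the client's cookie means no web or load balancer has to remember which users wrote recently.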

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    42/87

    Caches

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    43/87

    Caches

    Fastest thing since sliced bread

    Often seen as a drop-in performance enhancement

    Can be hard to get right

    Present hidden availability implications

    Take advantage of precomputed/pre-retrieved results in-memory. While they seem like a drop-in speed upgrade, caches can be surprisingly hard to get right.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    44/87

    Caches: Types

    webs

    dbs

    load balancer

    Several different types of caches in several different places...

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    45/87

    Caches: Types

    webs

    dbs

    load balancer

    HTTP caches (Varnish etc.)

    HTTP Caches - cache full HTTP responses. Great for static sites or dynamic sites with content that changes infrequently. Ex: Varnish, Squid

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    46/87

    Caches: Types

    webs

    dbs

    load balancer

    HTTP caches (Varnish etc.)

    in-memory caches

    Memoization of results from ... things. Functions, DB queries, etc. Typically per-node (not shared between webs, for example).
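    A per-process in-memory cache of this kind can be sketched with the standard library's `functools.lru_cache`. The `business_summary` function is a hypothetical stand-in for an expensive computation or DB query.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive body actually runs


@lru_cache(maxsize=1024)
def business_summary(business_id):
    """Stand-in for an expensive computation or DB query (hypothetical)."""
    global calls
    calls += 1
    return (business_id, "business-%d" % business_id)


business_summary(42)
business_summary(42)  # served from this process's cache
print(calls)  # 1
```

    The cache lives inside one Python process, so each web keeps its own copy and nothing is shared between machines.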

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    47/87

    Caches: Types

    webs

    dbs

    load balancer

    HTTP caches (Varnish etc.)

    in-memory caches

    memcache

    Memcache! Frequently used to store computed results for faster lookup and load reduction. Used to cache anything from raw DB rows to larger queries (joins) to gzip'd blobs to serialized data structures.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    48/87

    Caches: Advantages

    Primary cache in most places: memcache

    Takes advantage of fast in-memory key/value lookups

    Good for expensive operations

    Complex DB queries - 100s of ms

    Network roundtrip to memcache - 2-3ms

    Misses are cheap - only network roundtrip cost
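    The lookup pattern behind these numbers can be sketched as a read-through helper. `FakeMemcache` is an in-process stand-in written for this example so the sketch runs without a memcache server; it is not a real client library, and `cached` is a hypothetical helper name.

```python
class FakeMemcache:
    """In-process stand-in for a memcache client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def cached(cache, key, compute):
    """Return the cached value for key, computing and storing it on a miss."""
    value = cache.get(key)
    if value is None:      # miss: so far only the cache round trip was paid
        value = compute()  # the expensive part, e.g. a complex DB query
        cache.set(key, value)
    return value
```

    On a hit the caller pays only the round trip to the cache; on a miss it pays that round trip plus the expensive operation once. In this sketch a computed value of `None` is never treated as a hit.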

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    49/87

    Caches: Pitfalls

    Cache libraries can make it easy to cache weird things:

    Database models

    memcache connections (!)

    ??? - anything else that can be serialized (via pickle etc)

    Causes problems when object definitions change

    Cannot enumerate cache contents easily to fix polluted caches

    Especially in dynamic languages, it can be easy to say "oh, do some serialization or whatever and then cache it!" However, many times you'll have things like SQLAlchemy models (which have connections to your database!), the connection you are using to access memcache, and more. If any of those object definitions change (or you change/remove code that pickle/json expects to have to deserialize), you may end up with a polluted cache which contains entries that you can't decode. Memcache also doesn't allow you to enumerate cache entries, so programmatically invalidating certain subsets of keys is hard if not impossible.
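    One commonly used way to live with these pitfalls, which the talk itself does not describe, is to cache only plain data, put a version in every key, and treat undecodable entries as misses. The sketch below assumes that approach; `SCHEMA_VERSION` and all helper names are hypothetical.

```python
import pickle

SCHEMA_VERSION = 2  # hypothetical: bump whenever the cached shape changes


def cache_key(namespace, ident, version=SCHEMA_VERSION):
    """Build a key such as 'v2:biz:happy-dog'."""
    return "v%d:%s:%s" % (version, namespace, ident)


def dumps_plain(record):
    """Serialize only plain data, never live objects such as connections."""
    if not isinstance(record, dict):
        raise TypeError("cache plain dicts, not model or connection objects")
    return pickle.dumps(record, protocol=2)


def loads_or_none(blob):
    """Treat an undecodable entry as a miss instead of raising.

    pickle must only be used on data you trust, such as your own cache.
    """
    try:
        return pickle.loads(blob)
    except Exception:
        return None
```

    Bumping the version makes old entries unreachable without enumerating them; they simply age out of the cache.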

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    50/87

    Caches: Pitfalls

    Makes exceeding failover capacity really easy

    If memcache cluster goes down what happens?

    How do you handle additional web and DB load?

    Solution:

    Build in additional capacity

    Be able to isolate and turn off expensive features

    Have an emergency maintenance mode

    Memcache helps to reduce load, but it's another point of failure. Memcache outages can cause increased load proportional to what it offloaded for you, which can easily cause cascading failures if not handled correctly.
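    The "load proportional to what it offloaded" point can be made concrete with a little arithmetic. This is a simplified model with an assumed uniform hit rate; the hit rates below are illustrative and not Yelp's figures.

```python
def db_load_multiplier(hit_rate):
    """Factor by which backend load grows if a cache with this hit rate vanishes.

    With hit rate h, only (1 - h) of requests reach the backend while the
    cache is up; with the cache down, all of them do.
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return 1.0 / (1.0 - hit_rate)


for h in (0.5, 0.9, 0.99):
    print("hit rate %.2f -> about %.0fx backend load" % (h, db_load_multiplier(h)))
```

    The better the cache works, the larger the jump when it disappears, which is why spare capacity and the ability to switch off expensive features matter.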

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    51/87

    Datacenters

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    52/87

    Datacenters

    Geographic distribution helps you mitigate the speed of light

    Replication problems expand to more systems:

    memcache

    code deployments

    offline batch processing

    database slaves see non-trivial replication delay

    It's like database replication for your whole system. Out-of-sync caches can be problematic, and database replication becomes a super-non-trivial problem.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    53/87

    Datacenters

    Solutions:

    Each datacenter gets write master

    Still only One True Master, other write masters replicate

    Read/reporting slaves replicate from local write master

    Replicate cache inserts, invalidations

    Take advantage of existing MySQL replication stream

    Provide utilities for monitoring replication delay for services using the replication stream

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    54/87


  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    55/87

    Front-End: Principles

    Reduce HTTP Round-Trips

    Reduce download sizes

    Don't do things browsers don't like

    Lots of front-end performance tips/tricks/hacks. Most of them are based on these guidelines.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    56/87

    CDNs

    Content Delivery Networks

    Maintains copies of your assets

    Probably serves your assets faster than you do

    Examples:

    Akamai

    Cloudfront (Amazon Web Services)

    Cotendo

    Like a big, giant, mega-cache.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    57/87

    CDNs: Why?

    Huge networks of globally distributed edge nodes

    e.g. Akamai at > 100,000 [1]

    Easy to set up and drop in

    Transparent layer, just change hostnames to CDN

    Much lower bandwidth and equipment costs

    Asset gets uploaded to CDN once (ish)

    [1] http://www.datacenterknowledge.com/archives/2009/05/14/whos-got-the-most-web-servers/

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    58/87

    Subdomain Sharding

    RFC 2616 (HTTP 1.1):

    A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy.

    http://www.ietf.org/rfc/rfc2616.txt

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    59/87


  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    60/87

    Subdomain Sharding

    Distribute asset traffic across sharded subdomains

    Before: media.yelp.com -> 16.32.64.128

    After: media[1-4].yelp.com -> media.yelp.com -> 16.32.64.128
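    Choosing which of the sharded subdomains serves a given asset can be sketched as below. The hashing scheme and the `media_host` and `media_url` helpers are assumptions made for illustration; the slide only shows the `media[1-4].yelp.com` hostnames.

```python
import zlib

NUM_SHARDS = 4  # media1 .. media4, as on the slide


def media_host(path, num_shards=NUM_SHARDS):
    """Map an asset path to one of media1..mediaN.yelp.com, deterministically."""
    index = zlib.crc32(path.encode("utf-8")) % num_shards + 1
    return "media%d.yelp.com" % index


def media_url(path):
    """Build the full URL for an asset path that starts with a slash."""
    return "http://%s%s" % (media_host(path), path)
```

    The mapping is deterministic on purpose: if the same asset moved between hostnames from one page view to the next, the browser would treat it as a different URL and download it again.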

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    61/87

    Cache Me If You Can

    Reduce round trips by caching assets

    Use HTTP Cache-Control headers

    Use very long times e.g. 10 years

    Version assets via URL path or query params

    There are alternatives... e.g. ETag and If-Modified-Since. These require HTTP round-trips to compute, though, so even though you don't end up needing to re-download the asset you still end up with more TCP connections.

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    62/87

    Cache Me If You Can

    GET /assets/js/1/32dce72546/main.js HTTP/1.1

    200 OK

    Cache-Control: max-age=315360000

    Content-Encoding: gzip, deflate

    Content-Length: 8437...

  • 7/31/2019 Link-State 2012 - Fred Hatfull - The Dirty Work: Scaling Out Websites With Your Own Two Hands

    63/87

    PAGE:

    GET /assets/js/1/32dce72546/main.js HTTP/1.1

    200 OK

    Cache-Control: max-age=315360000

    Content-Encoding: gzip, deflate

    Content-Length: 8437...

    CWRU, 27 October 2012 - Fred Hatfull

    Cache Me If You Can

    63

    /1/32dce72546/ in the path: global version + hash of asset

    max-age=315360000: 10 years
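A small sketch of how such a URL and header could be produced at build time. The path layout follows the request above; the helper names, the choice of SHA-256, and the 10-character prefix are assumptions made for illustration.

```python
import hashlib

TEN_YEARS = 10 * 365 * 24 * 60 * 60  # 315360000 seconds, the max-age above


def versioned_path(name: str, content: bytes, global_version: int = 1) -> str:
    """Build /assets/js/<global version>/<content hash>/<name>.

    The hash changes whenever the file changes, so a deploy produces a new
    URL and the old, long-cached copy is simply never requested again.
    """
    digest = hashlib.sha256(content).hexdigest()[:10]
    return f"/assets/js/{global_version}/{digest}/{name}"


def cache_headers() -> dict:
    """Headers that let browsers keep the asset without revalidating."""
    return {"Cache-Control": f"max-age={TEN_YEARS}"}
```

Bumping `global_version` invalidates every asset at once, while the per-file hash invalidates only the files that actually changed.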


    GET / HTTP/1.1
    Host: www.yelp.com
    Connection: keep-alive
    Cache-Control: max-age=0
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.52 Safari/537.11
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Encoding: gzip,deflate,sdch
    Accept-Language: en-US,en;q=0.8
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
    Cookie: yuv=moFlBQmJAAfti607vjGInzN7FFk8_DSjAfHQ_lW4YaGwZEJqQqZmcEyNyeNTQam7Rqm2q6EOieOhxmRTZiNuMNmm_G7pet1m; __qca=P0-2101128917-1317837280654; __gads=ID=91ec4bf2b3418833:T=1321911452:S=ALNI_MbtB7iIIKDvemgnlX95Ywi4BsWEPg; bse=63c34cdcaef7b20a01cc89cc34ccfff5; fd=0; searchPrefs=%7B%22seen_pop%22%3Atrue%2C%22seen_crop_pop%22%3Atrue%2C%22prevent_scroll%22%3Afalse%2C%22maptastic_mode%22%3Atrue%2C%22mapsize%22%3A%22large%22%2C%22rpp%22%3A40%7D; fbm_97534753161=base_domain=.yelp.com; s=YGc7FduEf1Wv2m5hE1sMWU5pMolrEG8x; hl=en_US; recentlocations=New+York%2C+NY%2C+USA%3B%3B706+Mission+St%2C+San+Francisco%2C+CA%2C+USA%3B%3BLower+Pac+Heights%2C+San+Francisco%2C+CA%2C+USA%3B%3BFillmore%2C+MO%2C+USA%3B%3B1251+Waller+St%2C+San+Francisco%2C+CA%2C+USA%3B%3BAnn+Arbor%2C+MI%2C+USA%3B%3BHaight-Ashbury%2C+San+Francisco%2C+CA%2C+USA%3B%3BSOMA%2C+San+Francisco%2C+CA%2C+USA%3B%3BPittsburgh%2C+PA%2C+USA%3B%3BUnion+Square%2C+San+Francisco%2C+CA%2C+USA%3B%3BLondon%2C+UK%3B%3B706+Mission%2C+Kingsburg%2C+CA%2C+USA%3B%3BMiami%2C+FL%2C+USA%3B%3B510+Central+Ave%2C+Hot+Springs%2C+AR%2C+USA; location=%7B%22unformatted%22%3A+%22San+Francisco%2C+CA%22%2C+%22city%22%3A+%22San+Francisco%22%2C+%22state%22%3A+%22CA%22%2C+%22country%22%3A+%22US%22%7D; __utma=165223479.655521012.1316285892.1350862721.1351317047.69; __utmb=165223479.3.10.1351317047; __utmc=165223479; __utmz=165223479.1338263014.44.13.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); fbsr_97534753161=Le92MKZfPPnUrIyfRghoVKhIdhfzAt4wW1jpJHo3fIk.eyJhbGdvcml0aG0iOiJITUFDLVNIQTI1NiIsImNvZGUiOiIzNWM5OTViNzIxYTgyYjQzMzVmZjNlNDAuMS0xNDczNjMwMTIyfDEzNTEzMTczNDl8Vm1mLU5DLXh1RkswSHRmdF9wQWRDR0NRTGNZIiwiaXNzdWVkX2F0IjoxMzUxMzE3MDQ5LCJ1c2VyX2lkIjoiMTQ3MzYzMDEyMiJ9

    Cookieless Domains

    This is a big request...


    2128 bytes = 2k!


    Could be as big as the asset itself!


    Only assign cookies to domains which need them

    Put static assets/other cookie-less content elsewhere

    *.yelp.com vs. *.yelp-cdn.com


    Previous request becomes 399 bytes of overhead.
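The saving can be estimated by measuring the request head with and without the Cookie header. This is a rough sketch: the hostname `static.example-cdn.com` and the stand-in cookie value are hypothetical, and the result will not reproduce the exact byte counts quoted in the talk.

```python
def request_size(method: str, path: str, headers: dict) -> int:
    """Size in bytes of an HTTP/1.1 request head as sent on the wire."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return len(head.encode("latin-1"))


# Stand-in for a large real-world cookie (hypothetical value).
COOKIE = "hl=en_US; fd=0; s=" + "x" * 1500

base = {"Host": "static.example-cdn.com", "Accept-Encoding": "gzip,deflate"}
with_cookie = dict(base, Cookie=COOKIE)

overhead = (request_size("GET", "/assets/js/main.js", with_cookie)
            - request_size("GET", "/assets/js/main.js", base))
```

Because browsers attach cookies to every request for a matching domain, `overhead` is paid once per asset, which is why moving assets to a domain that never sets cookies adds up quickly.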


    Use gzip

    Always include styles in the <head>

    Always put scripts at the end of <body>

    Avoid inline styles

    Avoid manipulating the DOM

    Try not to trigger repaints

    Load non-critical content via AJAX (if applicable)

    Assorted Front-End Tips

    Lots more... the web abounds with tips. These are some of the ones we use.


    Mission-critical

    Needs to be simple, easy-to-understand, durable

    Strategies vary widely based on application requirements

    Drop-in products only get you so far

    Exposes:

    what is broken when

    what works and how well it works

    Monitoring


    Nagios - Alerts

    Firefly - Performance and availability analysis

    Tertiary tools

    Smokeping

    Pharos

    Ganglia

    etc...

    Monitoring: Our Strategy


    Nagios

    Off-the-shelf solution

    Flexible custom reporting

    Well-understood for monitoring systems

    e.g. load, memory/disk usage, hardware failures, etc

    Needs well-known states (CRITICAL/OK)
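The "well-known states" requirement means every check has to collapse a continuous measurement into a discrete status. A minimal sketch of that step, using the standard Nagios plugin exit codes; the function name and the thresholds are illustrative assumptions.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def check_load(load_1min: float, warn: float, crit: float):
    """Collapse a load average into one of the well-known states.

    Returns (exit_code, status_line). A real plugin would print the status
    line and exit with the code.
    """
    if load_1min >= crit:
        return CRITICAL, f"CRITICAL - load average {load_1min:.2f}"
    if load_1min >= warn:
        return WARNING, f"WARNING - load average {load_1min:.2f}"
    return OK, f"OK - load average {load_1min:.2f}"
```

This works well for system metrics with obvious thresholds; metrics without a clear good/bad boundary are harder to express this way.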

    Monitoring: Alerts


    Firefly

    Graphical front-end to time-series data

    Extensible data ingestion API

    Open-source: github.com/Yelp/firefly

    Statmonster

    Code-name for data collection and preprocessing

    Upstream of Firefly

    Turns log lines into stats, analyzes, and stores

    Monitoring: Performance


    Brief Aside: Logging

    Incredibly important for application development

    Often only source of information when everything blows up

    Useful for pulling data out of the webapp on the fly

    Huge volumes of log data require special infrastructure

    Monitoring: Logging


    Distributed log aggregation system

    Composed of leaf and aggregator nodes

    Leaves collect log lines

    Aggregators aggregate incoming lines based on channel

    Each line associated with a channel

    e.g. (yelp_timings, "Homepage render took 32.9ms")

    Eventually consistent

    Monitoring: Scribe

    Note: eventual consistency != strong consistency. Lines may be delayed/out-of-order


    [Diagram: web servers ("webs") send (channel, message) pairs to leaves; leaves forward them to aggregators, each handling a set of channels ([channel1] and [channel2, channel3]); aggregators feed local consumers.]
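The routing in the diagram can be sketched in memory as follows. This is a toy model, not Scribe's actual API: the class names are hypothetical, and delivery here is synchronous, whereas the real system buffers lines and is only eventually consistent.

```python
from collections import defaultdict


class Aggregator:
    """Collects lines for the channels it is responsible for."""

    def __init__(self, channels):
        self.channels = set(channels)
        self.lines = defaultdict(list)  # channel -> lines received so far

    def receive(self, channel, message):
        self.lines[channel].append(message)


class Leaf:
    """Runs near the web servers and forwards each line by channel."""

    def __init__(self, aggregators):
        # Build a channel -> aggregator routing table once, up front.
        self.route = {ch: agg for agg in aggregators for ch in agg.channels}

    def log(self, channel, message):
        self.route[channel].receive(channel, message)


agg_a = Aggregator(["channel1"])
agg_b = Aggregator(["channel2", "channel3"])
leaf = Leaf([agg_a, agg_b])
leaf.log("channel3", "message")
leaf.log("channel1", "message")
leaf.log("channel2", "message")
```

Grouping by channel at the aggregator means a consumer interested in one kind of line only has to read from one place.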

    Monitoring: Performance

    [Diagram: scribe delivers log lines (JSON etc.) to a log digestion / stat generation stage, which emits stats such as (performance.home, 1.2); a windowing / additional statistics stage then produces RRD data chunks.]


    Log line:

    {
        "time_start": 10,
        "time_dispatch": 12,
        "time_end": 44,
        "checkpoints": {
            "user_details": 22,
            "review_collection": 37,
            "template_render": 43
        }
    }

    Digest function:

    def digest(e):
        time_start = e["time_start"]
        checkpoints = e["checkpoints"]
        total_time = e["time_end"] - time_start
        # template_render is recorded under checkpoints
        compute_time = checkpoints["template_render"] - time_start
        reviews_time = checkpoints["review_collection"] - time_start

        emit(["performance", "total_time"], total_time)
        emit(["performance", "compute_time"], compute_time)
        emit(["checkpoint_times", "reviews"], reviews_time)

    Emitted stats:

    (["performance", "total_time"], 34, 10)
    (["performance", "compute_time"], 33, 10)
    (["checkpoint_times", "reviews"], 27, 10)

    Trailing 10 in the emitted stats is the timestamp
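A hedged sketch of the harness a digest function might run inside. The slide treats `emit` as something the framework provides; here it is passed in explicitly, and stamping each stat with the event's `time_start` is an assumption chosen to match the trailing timestamp. The names `run_digest` and `total_time_digest` are hypothetical.

```python
import json


def run_digest(raw_line: str, digest) -> list:
    """Parse one JSON log line and collect the stats a digest function emits."""
    event = json.loads(raw_line)
    emitted = []

    def emit(key, value):
        # Assumption: each stat is stamped with the event's time_start.
        emitted.append((key, value, event["time_start"]))

    digest(event, emit)
    return emitted


def total_time_digest(e, emit):
    """Emit only the end-to-end request time."""
    emit(["performance", "total_time"], e["time_end"] - e["time_start"])


line = '{"time_start": 10, "time_dispatch": 12, "time_end": 44}'
stats = run_digest(line, total_time_digest)
```

Keeping digest functions free of I/O like this makes them easy to test against sample log lines.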

    [Diagram: each emitted stat, e.g. ([performance, total_time], 34, 10), is appended as a (timestamp, value) pair to a 10s buffer for its key. After (10,34) arrives, the buffer for [performance, total_time] holds:]

    (8,32)

    (8,40)

    (10,23)

    (9,36)

    (11,29)

    (9,33)

    (10,35)

    (11,28)

    (12,39)

    (10,34)

    stats: 50th, 75th, 95th, 99th percentiles, mean, count, etc.
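A minimal sketch of the windowing step, reducing one buffer of (timestamp, value) pairs to summary statistics. The nearest-rank percentile definition and the function name `summarize` are assumptions; the production system may compute percentiles differently. The buffer is assumed to be non-empty.

```python
def summarize(buffer, percentiles=(50, 75, 95, 99)):
    """Reduce a window of (timestamp, value) pairs to summary statistics."""
    values = sorted(value for _, value in buffer)
    n = len(values)
    stats = {"count": n, "mean": sum(values) / n}
    for p in percentiles:
        # Nearest-rank percentile: ceil(p/100 * n), using integer arithmetic.
        rank = max(1, (p * n + 99) // 100)
        stats[f"p{p}"] = values[rank - 1]
    return stats


# The ten pairs shown in the buffer above.
window = [(8, 32), (8, 40), (10, 23), (9, 36), (11, 29),
          (9, 33), (10, 35), (11, 28), (12, 39), (10, 34)]
result = summarize(window)
```

For the ten pairs above this gives a count of 10, a mean of 32.9, and a 50th percentile of 33. With so few samples the 95th and 99th percentiles both fall on the maximum, which is why high percentiles only become meaningful at real traffic volumes.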


    Questions?


    Full-Time

    Interns

    Front-End and Back-End Engineering

    http://www.yelp.com/careers


    We're Hiring!


    Thanks!