Django at Scale

Django at Scale Brett Hoerner @bretthoerner http://bretthoerner.com Whirlwind of various tools and ideas, nothing too deep. I tried to pick things that are applicable/useful even for smaller sites.


Page 1: Django at Scale

Django at Scale

Brett Hoerner

@bretthoerner

http://bretthoerner.com

Whirlwind of various tools and ideas, nothing too deep. I tried to pick things that are applicable/useful even for smaller sites.

Page 2: Django at Scale

Who?

Django Weekly Review in November 2005. I took that job in Dallas. Django for 5+ years. Disqus for 2 years.

Page 3: Django at Scale

DISQUS

A commenting system with an emphasis on connecting online communities. Almost a million ‘forums’ (sites), millions and millions of users and comments.

Page 4: Django at Scale

“The embed”

You’ve probably seen it somewhere; if you haven’t seen it, you’ve probably loaded it. More customization than one might think at first glance, or than you would build for your own system.

Page 5: Django at Scale

How big?

• 19 employees, 9 devs/ops

• 25,000 requests/second peak

• 500 million unique monthly visitors

• 230 million requests to Python in one day

Slightly dated traffic information, higher now. Except the 230MM number I just pulled from logs: doesn’t include cached varnish hits, media, etc. Growing rapidly, when I joined I thought it was “big”... hahaha.

Page 6: Django at Scale

Long Tail

Today’s news is in the green, but the yellow is very long and represents all of the older posts people are hitting 24/7. Hard to cache everything. Hard to know where traffic will be. Hard to do maintenance since we’re part of other people’s sites.

Page 7: Django at Scale

Infrastructure

• Apache

• mod_wsgi

• PostgreSQL

• Memcached

• Redis

• Solr

• Nginx

• Haproxy

• Varnish

• RabbitMQ

• ... and more

A little over 100 total servers; not Google/FB scale, but big. Don’t need our own datacenter. Still one of the largest pure Python apps, afaik. Not going deep on non-python/app stuff, happy to elaborate now/later.

Page 8: Django at Scale

But first ...

... a PSA

Page 9: Django at Scale

USE PUPPET OR CHEF

No excuses if this isn’t a pet project. If you do anything else you’re reinventing wheels. It’s not that hard. Your code 6 months later may as well be someone else’s, same holds true for sysadmin work. But ... not really the subject of this talk.

Page 10: Django at Scale

Application Monitoring

• Graphite

• http://graphite.wikidot.com/

You should already be using Nagios, Munin, etc. It’s Python! (and Django, I think) Push data in, click it to add to graph, save graph for later. Track errors, new rows, logins - it’s UDP so it’s safe to call a lot from inside your app. Stores rates and more ... I think?

Page 11: Django at Scale

Using Graphite / statsd

statsd.increment('api.3_0.endpoint_request.' + endpoint)

That’s it.

Periods are “namespaces”, created automatically. From devs at Etsy, check out their blog.
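The one-liner above amounts to a single small UDP datagram. A minimal sketch, assuming the plain-text statsd counter format ("name:value|c") and a daemon listening on localhost port 8125; the function names here are illustrative and are not the API of the client library used in the talk.

```python
import socket


def format_counter(name, value=1):
    """Build a statsd counter payload, e.g. b'api.hits:1|c'."""
    return "{0}:{1}|c".format(name, value).encode("ascii")


def increment(name, value=1, host="127.0.0.1", port=8125):
    """Fire-and-forget: send one UDP datagram and never wait for a reply."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_counter(name, value), (host, port))
    except socket.error:
        pass  # metrics must never break the request
    finally:
        sock.close()


# The dotted name becomes the "namespace" path in Graphite.
increment("api.3_0.endpoint_request." + "posts")
```

Because nothing is read back, a slow or missing stats daemon costs the request almost nothing, which is why it is safe to call often.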

Page 12: Django at Scale

Error Logging

• Exception emails suck

• Want to ...

• ... group by issue

• ... store more than exceptions

• ... mark things fixed

• ... store more detailed output

• ... tie unique ID of a 500 to an exception

We were regularly locked out of Gmail when we used exception emails.

Page 13: Django at Scale

Sentry dashboard

Page 14: Django at Scale

Sentry detail

Page 15: Django at Scale

Using Sentry

import logging
import sys

from sentry.client.handlers import SentryHandler

logger = logging.getLogger()
logger.addHandler(SentryHandler())

# usage
logging.error('There was some crazy error', exc_info=sys.exc_info(), extra={
    # Optionally pass a request and we'll grab any information we can
    'request': request,

    # Otherwise you can pass additional arguments to specify request info
    'view': 'my.view.name',
    'url': request.build_absolute_uri(),

    'data': {
        # You may specify any values here and Sentry will log and output them
        'username': request.user.username
    }
})

Try generating and sending unique IDs, send them out with your 500 so you can search for them later (from user support requests, etc).
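A minimal sketch of the unique-ID idea, using only the standard library. The helper names and the wording of the error page are hypothetical; with Sentry the same ID would simply travel in the `extra` dict of the logging call.

```python
import logging
import uuid

logger = logging.getLogger("errors")


def log_exception_with_id(message):
    """Log the active exception under a fresh ID and return that ID."""
    error_id = uuid.uuid4().hex
    logger.error("%s [error_id=%s]", message, error_id,
                 exc_info=True, extra={"error_id": error_id})
    return error_id


def render_500(error_id):
    """Body of the error page; the user can quote the reference to support."""
    return "Something went wrong. Reference: {0}".format(error_id)


try:
    1 / 0
except ZeroDivisionError:
    page = render_500(log_exception_with_id("view blew up"))
```

When a support request arrives with the reference, searching the error log for that ID leads straight to the traceback.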

Page 16: Django at Scale

Background Tasks

• Slow external APIs

• Analytics and data processing

• Denormalization

• Sending email

• Updating avatars

• Running large imports/exports/deletes

Everyone can use this, it helps with scale but is useful for even the smallest apps.

Page 17: Django at Scale

Celery + RabbitMQ

• http://celeryproject.org/

• Super simple wrapper over AMQP (and more)

@task
def check_spam(post):
    if slow_api.check_spam(post):
        post.update(spam=True)

# usage
post = Post.objects.all()[0]
check_spam.delay(post)

Tried inventing our own queues and failed, don’t do it. Currently have over 40 queues. We have a Task subclass to help with testing (enable only tasks you want to run). Also good for throttling.

Page 18: Django at Scale

Celery + Eventlet = <3

• Especially for slow HTTP APIs

• Run hundreds/thousands of requests simultaneously

• Save yourself gigs of RAM, maybe a machine or two

Can be a bit painful... shoving functionality into Python that nobody expected. We have hacks to use the Django ORM, ask if you need help. Beware: “threading” issues pop up with greenthreads, too.

Page 19: Django at Scale

Delayed Signals

• Typical Django signals sent to a queue

# in models.py
post_save.connect(delayed.post_save_sender, sender=Post, weak=False)

# elsewhere
def check_spam(sender, data, created, **kwargs):
    post = Post.objects.get(pk=data['id'])
    if slow_api.check_spam(post):
        post.update(spam=True)

delayed.post_save_receivers['spam'].connect(check_spam, sender=Post)

# usage
post = Post.objects.create(message="v1agr4!")

Not really for ‘scale’, more dev ease of use. We don’t serialize the object (hence the query). Not open sourced currently, easy to recreate. Questionable use ... it’s pretty easy to just task.delay() inside a normal post_save handler.

Page 20: Django at Scale

Dynamic Settings

• Change settings ...

• ... without re-deploying

• ... in realtime

• ... as a non-developer

Things that don’t deserve their own table. Hard to think of an example right now (but we built something more useful on top of this... you’ll see).

Page 21: Django at Scale

modeldict

class Setting(models.Model):
    key = models.CharField(max_length=32)
    value = models.CharField(max_length=200)

settings = ModelDict(Setting, key='key', value='value', instances=False)

# access missing value
settings['foo']
>>> KeyError

# set the value
settings['foo'] = 'hello'

# fetch the current value using either method
Setting.objects.get(key='foo').value
>>> 'hello'

settings['foo']
>>> 'hello'

https://github.com/disqus/django-modeldict

Backed by the DB. Cached, invalidated on change, fetched once per request.
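The "cached, invalidated on change" behaviour can be sketched with a dict-like wrapper. This is an illustration of the idea only, not django-modeldict's implementation: the class name is hypothetical and a plain dict stands in for the Setting table.

```python
class CachedSettings(object):
    """Dict-like view over a slow key/value store with a local cache."""

    def __init__(self, store):
        self._store = store   # stands in for the Setting table
        self._cache = None    # filled lazily, dropped on every write
        self.loads = 0        # how many times the store was read

    def _load(self):
        if self._cache is None:
            self._cache = dict(self._store)
            self.loads += 1
        return self._cache

    def __getitem__(self, key):
        return self._load()[key]   # raises KeyError for missing keys

    def __setitem__(self, key, value):
        self._store[key] = value   # write through to the store
        self._cache = None         # invalidate so the next read reloads


table = {}
settings = CachedSettings(table)
settings["foo"] = "hello"
value = settings["foo"]   # first read loads from the store
again = settings["foo"]   # second read is served from the cache
```

A real version would also have to invalidate the cache in other processes, which is where a shared cache such as memcached comes in.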

Page 22: Django at Scale

Feature Switches

• Do more development in master

• Dark launch risky features

• Release big changes slowly

• Free and easy beta testing

• Change all of this live without knowing how to code (and thus without needing to deploy)

No DB Magic, your stuff needs to be backwards compatible on the data layer.

Page 23: Django at Scale

Gargoyle

• https://github.com/disqus/gargoyle

Powered by modeldict. Everything remotely big goes under a switch. We have many; we eventually clean them up once the feature is stable.

Page 24: Django at Scale

Using Gargoyle

from gargoyle import gargoyle

def my_function(request):
    if gargoyle.is_active('my switch name', request):
        return 'foo'
    else:
        return 'bar'

Also usable as a decorator, check out the docs. You can extend it for other models like .is_active('foo', forum). Super handy but still overhead to support both versions, not free.
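To make "release big changes slowly" concrete, here is a sketch of a percentage-based switch. It is not Gargoyle's API: the `SWITCHES` table, the switch names and the bucketing scheme are all assumptions chosen for illustration.

```python
import zlib

# Hypothetical switch table: name -> percent of users who get the feature.
SWITCHES = {"new_editor": 25, "everyone": 100}


def is_active(name, user_id):
    """Return True if this user falls inside the switch's rollout percentage."""
    percent = SWITCHES.get(name, 0)   # unknown switches are off
    token = "{0}:{1}".format(name, user_id).encode("utf-8")
    bucket = zlib.crc32(token) % 100  # stable bucket in the range 0..99
    return bucket < percent


def my_function(user_id):
    return "foo" if is_active("new_editor", user_id) else "bar"
```

Hashing the switch name together with the user ID keeps each user's experience stable across requests while giving every switch its own slice of users.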

Page 25: Django at Scale

Caching

• Use pylibmc + libmemcached

• Use consistent hashing behavior (ketama)

• A few recommendations...

Page 26: Django at Scale

def update_homepage(request):
    page = Page.objects.get(name='home')
    page.body = 'herp derp'
    page.save()

    cache.delete("page:home")

    return HttpResponse("yay")

def homepage(request):
    page = cache.get("page:home")
    if not page:
        page = Page.objects.get(name='home')
        cache.set("page:home", page)

    return HttpResponse(page.body)

Caching problem in update_homepage?

See any problems related to caching in “update_homepage”? If not, imagine the homepage is being hit 1000/sec, still?

Page 27: Django at Scale

Set don’t delete

• If possible, always set to prevent ...

• ... races

• ... stampedes

Previous slide: Race: another request in a transaction stores the old copy when it gets a cache miss. Stampede: 900 users start a DB query to fill the empty cache. Setting > deleting fixes both of these. This happened to us a lot when we went from “pretty busy” to “constantly under high load”. Can still happen (more rarely) on small sites. Confuses users, gets you support tickets.
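A minimal sketch of the fix, with plain dicts standing in for memcached and the Page table so that it runs on its own. In the Django view the same change is replacing `cache.delete("page:home")` with `cache.set("page:home", page)`.

```python
cache = {}                  # stands in for memcached
db = {"home": "old body"}   # stands in for the Page table


def update_homepage(body):
    db["home"] = body
    cache["page:home"] = body   # set, don't delete: readers never see a gap


def homepage():
    body = cache.get("page:home")
    if body is None:            # only a genuinely cold cache reaches the DB
        body = db["home"]
        cache["page:home"] = body
    return body


update_homepage("herp derp")
```

Since the key is never empty after an update, there is no miss for 900 concurrent readers to pile onto, and no window in which a stale copy can be written back.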

Page 28: Django at Scale

‘Keep’ cache

• Store in thread local memory

• Flush dict after request finishes

cache.get("moderators:cnn", keep=True)

Useful when something that hits cache may be called multiple times in different parts of the codebase. Yes, you can solve this in lots of other ways, I just feel like “keep” should be on by default. No released project, pretty easy to implement. Surprised I haven’t seen this elsewhere? Does anyone else do this?
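Since there is no released project, here is one way "keep" might be implemented. All names are hypothetical, and `backend` is anything with a `get(key)` method, such as Django's cache object.

```python
import threading

_local = threading.local()


def _kept():
    """Per-thread dict of values already fetched during this request."""
    if not hasattr(_local, "kept"):
        _local.kept = {}
    return _local.kept


def cache_get(backend, key, keep=False):
    """Read from backend; with keep=True remember the value for this request."""
    if keep and key in _kept():
        return _kept()[key]
    value = backend.get(key)
    if keep:
        _kept()[key] = value
    return value


def flush_kept():
    """Call when the request finishes, e.g. from a middleware or signal."""
    _kept().clear()
```

Flushing at the end of every request matters: without it a long-lived worker thread would keep serving values from earlier requests.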

Page 29: Django at Scale

Mint Cache

• Stores (val, refresh_time, refreshed)

• One (or few) clients will refresh cache, instead of a ton of them

• django-newcache does this

One guy gets an early miss, causing him to update the cache. Alternative is: item falls out of cache, stampede of users all go to update it at once. Check out newcache for code.
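The (val, refresh_time, refreshed) tuple can be sketched as follows. This illustrates the idea and is not django-newcache's code: a dict stands in for memcached, the function names are hypothetical, and the `now` parameter exists only to make the behaviour easy to follow.

```python
import time

_store = {}  # stands in for memcached; a real entry would also have a hard expiry


def mint_set(key, value, timeout, now=None):
    """Store the value with a soft deadline after which it should be refreshed."""
    now = time.time() if now is None else now
    _store[key] = (value, now + timeout, False)


def mint_get(key, now=None):
    """Return the cached value, or None if the caller should recompute it."""
    now = time.time() if now is None else now
    packed = _store.get(key)
    if packed is None:
        return None
    value, refresh_time, refreshed = packed
    if now > refresh_time and not refreshed:
        # First caller past the soft deadline: flag the entry so everyone
        # else keeps getting the stale value, then report a miss to this one.
        _store[key] = (value, refresh_time, True)
        return None
    return value
```

Only the caller that receives the early miss recomputes and calls `mint_set` again; everyone else is served the slightly stale value in the meantime.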

Page 30: Django at Scale

Django Patches

• https://github.com/disqus/django-patches

• Too deep, boring, use-case specific to go through here

• Not comprehensive

• All for 1.2, I have a (Disqus) branch where they’re ported to 1.3 ... can release if anyone cares

Maybe worth glancing through. Just wanted to point this out. Some of these MAY be needed for edge cases inside of our own open source Django projects... we should really check. :)

Page 31: Django at Scale

DB, or: The Bottleneck

• You should use Postgres (ahem)

• But none of this is specific to Postgres

• Joins are great, don’t shard until you have to

• Use an external connection pooler

• Beware NoSQL promises but embrace the shit out of it

External connection poolers have other advantages like sharing/re-using autocommit connections. Ad-hoc queries, relations and joins help you build most features faster, period. Also come to the Austin NoSQL meetup.

Page 32: Django at Scale

multidb

• Very easy to use

• Testing read slave code can be weird, check out our patches or ask me later

• Remember: as soon as you use a read slave you’ve entered the world of eventual consistency

No general solution to consistency problem, app specific. Huge annoyance/issue for us. Beware, here there be dragons.
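For reference, a read-slave router is only a few lines. A minimal sketch: `db_for_read` and `db_for_write` are the hooks Django's multi-db support calls, while the class name and the database aliases are hypothetical and would have to match the project's DATABASES setting.

```python
import random


class ReadReplicaRouter(object):
    """Send reads to a random read slave and all writes to the master."""

    replicas = ["replica1", "replica2"]  # hypothetical aliases from DATABASES

    def db_for_read(self, model, **hints):
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        return "default"
```

The router is activated by listing its dotted path in the DATABASE_ROUTERS setting. Note that it does nothing about consistency: a row written to the master may not be on the replica yet when the next read arrives.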

Page 33: Django at Scale

Update don’t save

• Just like “set don’t delete”

• .save() flushes the entire row

• Someone else only changes ColA, you only change ColB ... if you .save() you revert his change

We send signals on update (lots of denormalization happens via signals), you may want to do this also. (in 1.3? a ticket? dunno)

Page 34: Django at Scale

Instance update

https://github.com/andymccurdy/django-tips-and-tricks/blob/master/model_update.py

# instead of
Model.objects.filter(pk=instance.id).update(foo=1)

# we can now do
instance.update(foo=1)

Prefer this to saving in nearly all cases.
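The lost-update problem is easy to reproduce outside Django. A sketch using sqlite3 from the standard library; the table and helper names are made up, and `save_full_row` mimics the write-every-column behaviour of `.save()` described in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, col_a TEXT, col_b TEXT)")
conn.execute("INSERT INTO post VALUES (1, 'a0', 'b0')")

# We load the row into memory, like a model instance.
row = conn.execute("SELECT id, col_a, col_b FROM post").fetchone()
stale = dict(zip(("id", "col_a", "col_b"), row))

# Meanwhile someone else changes col_a.
conn.execute("UPDATE post SET col_a = 'a1' WHERE id = 1")


def save_full_row(instance):
    """Write every column back, including ones we never touched."""
    conn.execute("UPDATE post SET col_a = ?, col_b = ? WHERE id = ?",
                 (instance["col_a"], instance["col_b"], instance["id"]))


def update_columns(pk, **changes):
    """Write only the named columns (names come from code, never user input)."""
    assignments = ", ".join("{0} = ?".format(col) for col in changes)
    conn.execute("UPDATE post SET {0} WHERE id = ?".format(assignments),
                 list(changes.values()) + [pk])
```

Calling `update_columns(1, col_b='b1')` leaves the other writer's `col_a` intact, while `save_full_row(stale)` silently puts the old `col_a` back.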

Page 35: Django at Scale

ALTER hurts

• Large tables under load are hard to ALTER

• Especially annoying if you’re not adding anything complex

• Most common case (for us): new boolean

Page 36: Django at Scale

bitfield

https://github.com/disqus/django-bitfield

from django.db import models
from django.db.models import F

from bitfield import BitField

class Foo(models.Model):
    flags = BitField(flags=(
        'awesome_flag',
        'flaggy_foo',
        'baz_bar',
    ))

# Add awesome_flag
Foo.objects.filter(pk=o.pk).update(flags=F('flags') | Foo.flags.awesome_flag)

# Find by awesome_flag
Foo.objects.filter(flags=Foo.flags.awesome_flag)

# Test awesome_flag
if o.flags.awesome_flag:
    print("Happy times!")

Uses a single BigInt field for 64 booleans. Put one on your model from the start and you probably won’t need to add booleans ever again.
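Underneath, this is ordinary bit arithmetic on one integer. A sketch with plain ints, reusing the flag names from the slide; the helper functions are illustrative and not part of django-bitfield.

```python
FLAGS = ("awesome_flag", "flaggy_foo", "baz_bar")

# Each flag gets one bit: awesome_flag=1, flaggy_foo=2, baz_bar=4.
MASKS = dict((name, 1 << i) for i, name in enumerate(FLAGS))


def set_flag(value, name):
    return value | MASKS[name]


def clear_flag(value, name):
    return value & ~MASKS[name]


def has_flag(value, name):
    return bool(value & MASKS[name])


flags = set_flag(0, "awesome_flag")
flags = set_flag(flags, "baz_bar")
```

Adding a fourth flag only means appending a name to `FLAGS`; the column itself does not change, which is the whole point of avoiding the ALTER.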

Page 37: Django at Scale

(Don’t default to) Transactions

• Default to autocommit=True

• Don’t use TransactionMiddleware unless you can prove that you need it

• Scalability pits that are hard to dig out of

Middleware was sexy as hell when I first saw it, now sworn mortal enemy. Hurts connection pooling, hurts the master DB, most apps just don’t need it.

Page 38: Django at Scale

Django DB Utils

• attach_foreignkey

• queryset_to_dict

• SkinnyQuerySet

• RangeQuerySet

https://github.com/disqus/django-db-utils

See GitHub page for explanations.

Page 39: Django at Scale

NoSQL

• We use a lot of Redis

• We’ve used and moved off of Mongo, Membase

• I’m a Riak fanboy

We mostly use Redis for denormalization, counters, things that aren’t 100% critical and can be re-filled on data loss.Has helped a ton with write load on Postgres.

Page 40: Django at Scale

Nydus

https://github.com/disqus/nydus

from nydus.db import create_cluster

conn = create_cluster({
    'engine': 'nydus.db.backends.redis.Redis',
    'router': 'nydus.db.routers.redis.PartitionRouter',
    'hosts': {
        0: {'db': 0},
        1: {'db': 1},
        2: {'db': 2},
    }
})

res = conn.incr('foo')
assert res == 1

It’s like django.db.connections for NoSQL. Notice that you never told conn which Redis host to use, the Router decided that for you based on key. Doesn’t do magic like rebalancing if you add a node (don’t do that), just a cleaner API.
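Routing "based on key" can be as simple as a hash modulo the number of hosts. A sketch under that assumption; Nydus's own router may well use a different hash, and `host_for_key` is a hypothetical name.

```python
import zlib

HOSTS = {0: {"db": 0}, 1: {"db": 1}, 2: {"db": 2}}


def host_for_key(key, hosts=HOSTS):
    """Pick a host index from the key alone, so every client agrees."""
    return zlib.crc32(key.encode("utf-8")) % len(hosts)


# Every client computes the same host for 'foo' without any coordination.
target = HOSTS[host_for_key("foo")]
```

It also shows why adding a node hurts: changing `len(hosts)` sends most keys to a different host, and nothing moves the existing data there.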

Page 41: Django at Scale

Sharding

• Django Routers and some Postgres/Slony hackery make this pretty easy

• Need a good key to shard on, very app specific

• Lose full-table queries, aggregates, joins

• If you actually need it let’s talk

Fun to talk about but not general or applicable to 99% of sites.

Page 42: Django at Scale

Various Tools

• Mule https://github.com/disqus/mule

• Chishop https://github.com/disqus/chishop

• Jenkins http://jenkins-ci.org/

• Fabric http://fabfile.org/

• coverage.py http://nedbatchelder.com/code/coverage/

• Vagrant http://vagrantup.com/

Not to mention virtualenv, pip, pyflakes, git-hooks ...

Page 43: Django at Scale

Get a job.

• Want to live & work in San Francisco?

http://disqus.com/jobs/