Scaling Django Dc09

Scaling Django Web Apps

Mike Malone

djangocon 2009

Thursday, September 10, 2009

http://www.flickr.com/photos/kveton/2910536252/

Pownce

• Large scale

• Hundreds of requests/sec

• Thousands of DB operations/sec

• Millions of user relationships

• Millions of notes

• Terabytes of static data


Pownce

• Encountered and eliminated many common scaling bottlenecks

• Real world example of scaling a Django app

• Django provides a lot for free

• I’ll be focusing on what you have to build yourself, and the rare places where Django got in the way

Scalability

Scalability

Scalability is NOT:

• Speed / Performance

• Generally affected by language choice

• Achieved by adopting a particular technology

A Scalable Application

import time

def application(environ, start_response):
    # Slow (10 seconds per request), but it keeps no shared state,
    # so adding servers adds capacity.
    time.sleep(10)
    start_response('200 OK', [('content-type', 'text/plain')])
    return (b'Hello, world!',)

A High Performance Application

def application(environ, start_response):
    # Fast while the log is small, but every request rereads the whole log,
    # so it slows down as data grows, and the local file can't be shared across servers.
    remote_addr = environ['REMOTE_ADDR']
    with open('access-log', 'a+') as f:
        f.write(remote_addr + "\n")
        f.flush()
        f.seek(0)
        hits = sum(1 for l in f if l.strip() == remote_addr)
    start_response('200 OK', [('content-type', 'text/plain')])
    return (str(hits).encode('ascii'),)


Scalability

A scalable system doesn’t need to change when the size of the problem changes.


Scalability

• Accommodate increased usage

• Accommodate increased data

• Maintainable


Scalability

• Two kinds of scalability

• Vertical scalability: buying more powerful hardware, replacing what you already own

• Horizontal scalability: buying additional hardware, supplementing what you already own


Vertical Scalability

• Costs don’t scale linearly (a server that’s twice as fast costs more than twice as much)

• Inherently limited by current technology

• But it’s easy! If you can get away with it, good for you.


Vertical Scalability

Skyscrapers are special. Normal buildings don’t need 10-floor foundations. Just build!

- Cal Henderson


Horizontal Scalability

The ability to increase a system’s capacity by adding more processing units (servers)

It’s how large apps are scaled.

• A lot more work to design, build, and maintain

• Requires some planning, but you don’t have to do all the work up front

• You can scale progressively...

• The rest of the presentation follows roughly that order

Caching


Caching

• Several levels of caching available in Django

• Per-site cache: caches every page that doesn’t have GET or POST parameters

• Per-view cache: caches output of an individual view

• Template fragment cache: caches fragments of a template

• None of these are that useful if pages are heavily personalized


Caching

• Low-level Cache API

• Much more flexible, allows you to cache at any granularity

• At Pownce we typically cached

• Individual objects

• Lists of object IDs

• Hard part is invalidation
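A minimal sketch of caching lists of object IDs alongside individual objects. FakeCache is a hypothetical in-memory stand-in for the low-level cache API (its get_many/set_many mirror memcached’s multi-get and multi-set), and NOTES, NOTES_BY_USER and the key formats are made up for illustration. Each object is cached once under its own key, so every list that mentions it shares one cached copy, and editing an object only invalidates one key.

```python
class FakeCache:
    """In-memory stand-in for a cache backend (hypothetical; use memcached in production)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def get_many(self, keys):
        return {k: self._data[k] for k in keys if k in self._data}

    def set(self, key, value):
        self._data[key] = value

    def set_many(self, mapping):
        self._data.update(mapping)


cache = FakeCache()

# Hypothetical "database" tables and a log of the queries issued against them.
NOTES = {1: "hello", 2: "scaling", 3: "django"}
NOTES_BY_USER = {42: [3, 1]}
db_queries = []


def note_ids_for_user(user_id):
    """Cache the list of IDs, not the objects themselves."""
    key = "note_ids_for_%s" % user_id
    ids = cache.get(key)
    if ids is None:
        db_queries.append(("ids", user_id))
        ids = list(NOTES_BY_USER.get(user_id, []))
        cache.set(key, ids)
    return ids


def notes_by_id(ids):
    """Fetch individual objects with one multi-get, loading only the misses from the DB."""
    keys = ["note_%s" % i for i in ids]
    found = cache.get_many(keys)
    missing = [i for i, k in zip(ids, keys) if k not in found]
    if missing:
        db_queries.append(("notes", tuple(missing)))
        fetched = {"note_%s" % i: NOTES[i] for i in missing}
        cache.set_many(fetched)
        found.update(fetched)
    return [found[k] for k in keys]


def notes_for_user(user_id):
    return notes_by_id(note_ids_for_user(user_id))
```

The second call to notes_for_user(42) is served entirely from the cache; the hard part, as the slide says, is knowing which list keys to invalidate when a note is added or removed.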


Caching

• Cache backends:

• Memcached

• Database caching

• Filesystem caching


Caching

Use Memcache.


Sessions

Use Memcache.


Sessions

Or Tokyo Cabinet: http://github.com/ericflo/django-tokyo-sessions/

Thanks @ericflo

Caching

Basic caching comes free with Django:

from django.core.cache import cache
from django.db import models

class UserProfile(models.Model):
    ...
    def get_social_network_profiles(self):
        cache_key = 'networks_for_%s' % self.user.id
        profiles = cache.get(cache_key)
        if profiles is None:
            # Cache miss: query the database and store the evaluated list.
            profiles = list(self.user.social_network_profiles.all())
            cache.set(cache_key, profiles)
        return profiles

Caching

Invalidate when a model is saved or deleted:

from django.core.cache import cache
from django.db.models import signals
from myapp.models import SocialNetworkProfile  # hypothetical app module

def nuke_social_network_cache(sender, instance, **kwargs):
    # Signal handlers are plain functions: sender is the model class,
    # instance is the object that was saved or deleted.
    cache_key = 'networks_for_%s' % instance.user_id
    cache.delete(cache_key)

signals.post_save.connect(nuke_social_network_cache, sender=SocialNetworkProfile)
signals.post_delete.connect(nuke_social_network_cache, sender=SocialNetworkProfile)


Caching

• Invalidate post_save, not pre_save

• Still a small race condition

• Simple solution, worked for Pownce:

• Instead of deleting, set the cache key to None for a short period of time

• Instead of using set to cache objects, use add, which fails if there’s already something stored for the key
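The Pownce fix can be sketched with a toy cache. TimedCache is a hypothetical in-memory stand-in with memcached-style get/set/add and expiry, and TOMBSTONE_SECONDS is an assumed value. Invalidation parks None on the key for a short time; readers that miss still load from the database, but because they populate the cache with add(), a possibly stale read can’t be written back until the tombstone expires.

```python
import time


class TimedCache:
    """In-memory stand-in with memcached-like get/set/add and expiry (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def _entry(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] is not None and self._clock() >= entry[1]:
            del self._data[key]  # expired
            return None
        return entry

    def get(self, key):
        entry = self._entry(key)
        return None if entry is None else entry[0]

    def set(self, key, value, timeout=None):
        expires = None if timeout is None else self._clock() + timeout
        self._data[key] = (value, expires)

    def add(self, key, value, timeout=None):
        # Like memcached's add: fails if anything is stored for the key, even None.
        if self._entry(key) is not None:
            return False
        self.set(key, value, timeout)
        return True


TOMBSTONE_SECONDS = 5  # hypothetical; should cover commit and replication lag


def invalidate(cache, key):
    """Call from post_save/post_delete: park None on the key instead of deleting it."""
    cache.set(key, None, TOMBSTONE_SECONDS)


def cached_fetch(cache, key, load):
    """Read-through: on a miss, load from the DB and populate with add(), never set()."""
    value = cache.get(key)
    if value is None:
        value = load()
        cache.add(key, value)  # refused while the None tombstone is present
    return value
```

The cost is a few seconds of extra database reads for that key after each invalidation, since a stored None looks like a miss to get().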


Advanced Caching

• Memcached’s atomic increment and decrement operations are useful for maintaining counts

• They were added to the Django cache API in Django 1.1


Advanced Caching

• On earlier versions, you can still use them if you poke at the internals of the cache object a bit

• cache._cache is the underlying cache object

try:
    result = cache._cache.incr(cache_key, delta)
except ValueError:
    # Nonexistent key raises ValueError.
    # Do it the hard way, store the result.
    result = ...  # placeholder
return result
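A self-contained sketch of the counter pattern with the “hard way” filled in under one assumption: the count can be recomputed from the database, and that recount already includes the change being recorded. CounterCache and count_from_db are hypothetical; CounterCache.incr raises ValueError for a missing key, matching how Django’s cache.incr is documented to behave.

```python
class CounterCache:
    """Hypothetical in-memory stand-in; incr() raises ValueError for a missing key."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def incr(self, key, delta=1):
        if key not in self._data:
            raise ValueError("Key %r not found" % key)
        self._data[key] += delta
        return self._data[key]


def incr_count(cache, key, delta, count_from_db):
    try:
        # Fast path: atomic increment on the cache server.
        return cache.incr(key, delta)
    except ValueError:
        # Slow path: recount from the database (assumed to include this change)
        # and seed the counter; add() won't clobber a value another process stored first.
        total = count_from_db()
        cache.add(key, total)
        return total
```

A small race remains when two processes miss at the same moment; for display counts that is usually acceptable.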


Advanced Caching

• Other missing cache API

• delete_multi & set_multi

• append: add data to existing key after existing data

• prepend: add data to existing key before existing data

• cas: store this data, but only if no one has edited it since I fetched it


Advanced Caching

• It’s often useful to cache objects ‘forever’ (i.e., until you explicitly invalidate them)

• User and UserProfile

• fetched almost every request

• rarely change

• But Django won’t let you

• IMO, this is a bug :(

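The Memcache Backend

Django’s backend (excerpt): timeout or self.default_timeout turns a timeout of 0, which memcached treats as “never expire”, into the default timeout.

class CacheClass(BaseCache):
    def __init__(self, server, params):
        BaseCache.__init__(self, params)
        self._cache = memcache.Client(server.split(';'))

    def add(self, key, value, timeout=0):
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        # 0 is falsy, so "cache forever" silently becomes the default timeout.
        return self._cache.add(smart_str(key), value, timeout or self.default_timeout)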

The Memcache Backend

A fix: fall back to the default only when no timeout is passed.

class CacheClass(BaseCache):
    def __init__(self, server, params):
        BaseCache.__init__(self, params)
        self._cache = memcache.Client(server.split(';'))

    def add(self, key, value, timeout=None):
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        if timeout is None:  # only an omitted timeout gets the default; 0 passes through
            timeout = self.default_timeout
        return self._cache.add(smart_str(key), value, timeout)
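The difference between the two backends comes down to Python truthiness. This stripped-down sketch isolates only the timeout logic (DEFAULT_TIMEOUT and the function names are hypothetical). Memcached treats an expiration time of 0 as “never expire”, so only the fixed version lets you cache an object forever.

```python
DEFAULT_TIMEOUT = 300  # seconds; hypothetical settings value


def effective_timeout_original(timeout=0):
    # Mirrors "timeout or self.default_timeout": 0 is falsy, so it becomes the default.
    return timeout or DEFAULT_TIMEOUT


def effective_timeout_fixed(timeout=None):
    # Only an omitted timeout falls back to the default; 0 is passed through.
    return DEFAULT_TIMEOUT if timeout is None else timeout
```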


Advanced Caching

• Typical setup has memcached running on web servers

• Pownce web servers were I/O and memory bound, not CPU bound

• Since we had some spare CPU cycles, we compressed large objects before caching them

• The Python memcache library can do this automatically (via the min_compress_len argument to set), but Django’s cache API doesn’t expose it

Monkey Patching core.cache

import inspect

from django.core.cache import cache
from django.utils.encoding import smart_str

# Only patch if the underlying memcache client's set() accepts min_compress_len.
if 'min_compress_len' in inspect.getargspec(cache._cache.set)[0]:
    class CacheClass(cache.__class__):
        def set(self, key, value, timeout=None, min_compress_len=150000):
            if isinstance(value, unicode):
                value = value.encode('utf-8')
            if timeout is None:
                timeout = self.default_timeout
            # Values longer than min_compress_len bytes may be zlib-compressed by the client.
            return self._cache.set(smart_str(key), value, timeout, min_compress_len)
    cache.__class__ = CacheClass
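If you’d rather not monkey patch, the same idea can be applied explicitly before values reach the cache. A sketch using only the standard library: pickle the value, zlib-compress it when the payload reaches a threshold (150000 bytes, as on the slide), and prefix a one-byte flag so the reader knows whether to decompress. The flag format and function names are hypothetical, and the result is not byte-compatible with the memcache client’s own compression.

```python
import pickle
import zlib

MIN_COMPRESS_LEN = 150000  # bytes, the threshold used on the slide

RAW, COMPRESSED = b"r", b"z"  # hypothetical one-byte format flags


def encode(value, min_compress_len=MIN_COMPRESS_LEN):
    data = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
    if len(data) >= min_compress_len:
        packed = zlib.compress(data)
        if len(packed) < len(data):  # keep compression only if it actually helps
            return COMPRESSED + packed
    return RAW + data


def decode(blob):
    flag, payload = blob[:1], blob[1:]
    if flag == COMPRESSED:
        payload = zlib.decompress(payload)
    return pickle.loads(payload)
```

This spends spare CPU on the web servers to save memory and network I/O, which is the trade-off the slide describes.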

Advanced Caching

• Useful tool: automagic single object cache

• Use a manager to check the cache prior to any single object get by pk

• Invalidate cached objects on save and delete

• Eliminated several hundred QPS at Pownce
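A framework-free sketch of the single object cache: every get by primary key checks the cache first, and saves and deletes invalidate that object’s key after the write. CachingManager, the dict-backed table and the key prefix are hypothetical; a Django version would do this in a custom manager plus post_save/post_delete handlers.

```python
class CachingManager:
    """Check the cache before any single-object get by pk (sketch; not Django's Manager API)."""

    def __init__(self, table, cache, prefix):
        self.table = table    # hypothetical backing store: pk -> row dict
        self.cache = cache    # any dict-like cache
        self.prefix = prefix
        self.db_hits = 0

    def _key(self, pk):
        return "%s:%s" % (self.prefix, pk)

    def get(self, pk):
        key = self._key(pk)
        obj = self.cache.get(key)
        if obj is None:
            self.db_hits += 1
            obj = dict(self.table[pk])  # KeyError plays the role of DoesNotExist
            self.cache[key] = obj
        return obj

    def save(self, pk, row):
        self.table[pk] = row
        self.cache.pop(self._key(pk), None)  # invalidate after the write, like post_save

    def delete(self, pk):
        del self.table[pk]
        self.cache.pop(self._key(pk), None)  # like post_delete
```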


Advanced Caching

All this and more at:

http://github.com/mmalone/django-caching/


Caching

Now you’ve made life easier for your DB server, the next thing to fall over is your app server.

Load Balancing


Load Balancing

• Out of the box, Django uses a shared nothing architecture

• App servers have no single point of contention

• Responsibility pushed down the stack (to DB)

• This makes scaling the app layer trivial: just add another server


Load Balancing

Spread work between multiple nodes in a cluster using a load balancer.

(Diagram: a load balancer in front of a pool of app servers, which share a database)

• Hardware or software

• Layer 7 or Layer 4
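To make the idea concrete, here is a sketch of the simplest balancing policy: round robin over a pool of app servers, skipping any backend marked down. Real load balancers implement policies like this (plus health checks) for you; the class and backend names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand requests to backends in turn, skipping any marked down (hypothetical sketch)."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.down.add(backend)

    def mark_up(self, backend):
        self.down.discard(backend)

    def pick(self):
        # At most one full lap around the pool before giving up.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend not in self.down:
                return backend
        raise RuntimeError("no healthy backends")
```

Because Django app servers share nothing, any backend can serve any request, which is what makes a policy this simple workable.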

Load Balancing

• Hardware load balancers

• Expensive, like $35,000 each, plus maintenance contracts

• Need two for failover / high availability

• Software load balancers

• Cheap and easy, but more difficult to eliminate as a single point of failure

• Lots of options: Perlbal, Pound, HAProxy, Varnish, Nginx


Load Balancing

• Most of these are layer 7 proxies, and some software balancers do cool things

• Caching

• Re-proxying

• Authentication

• URL rewriting


Load Balancing

A common setup for large operations is to use redundant layer 4 hardware balancers in front of a pool of layer 7 software balancers.

(Diagram: hardware balancers in front of software balancers, in front of app servers)

Load Balancing

• At Pownce, we used a single Perlbal balancer

• Easily handled all of our traffic (hundreds of simultaneous connections)

• A SPOF, but we didn’t have $100,000 for black box solutions, and weren’t worried about service guarantees beyond three or four nines

• Plus there were some neat features that we took advantage of


Perlbal Reproxying

Perlbal reproxying is a really cool, and really poorly documented, feature.

Perlbal Reproxying

1. Perlbal receives request

2. Proxies it to an app server

   a. App server checks auth (etc.)

   b. Returns HTTP 200 with X-Reproxy-URL header set to internal file server URL

3. Perlbal fetches the file from the file server and serves it to the user

Perlbal Reproxying

• Completely transparent to end user

• Doesn’t tie up a large app server process while the file is being served

• Users can’t access files directly (like they could with a 302)

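Perlbal Reproxying

Plus, it’s really easy:

from django.http import HttpResponse

FILE_SERVER = 'http://files.internal'  # hypothetical internal file server URL

def download(request, filename):
    # Check auth, do your thing
    response = HttpResponse()
    # Perlbal fetches this URL itself and streams the file to the client.
    response['X-REPROXY-URL'] = '%s/%s' % (FILE_SERVER, filename)
    return response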

Load Balancing

Best way to reduce load on your app servers: don’t use them to do hard stuff.

Queuing


Queuing

• A queue is simply a bucket that holds messages until they are removed for processing by clients

• Many expensive operations can be queued and performed asynchronously

• User experience doesn’t have to suffer

• Tell the user that you’re running the job in the background (e.g., transcoding)

• Make it look like the job was done real-time (e.g., note distribution)
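An in-process sketch of the pattern using Python’s standard queue and a worker thread: the “view” enqueues the expensive fan-out (note distribution, say) and returns immediately, while a consumer does the work in the background. queue.Queue lives in one process and is neither durable nor shared between machines, so treat it as a stand-in for a real queue; the function names are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
delivered = []  # hypothetical record of completed work


def distribute_note(note_id, recipients):
    # The expensive part, done off the request path.
    for user_id in recipients:
        delivered.append((note_id, user_id))


def worker():
    while True:
        job = jobs.get()
        if job is None:  # sentinel value tells the worker to stop
            jobs.task_done()
            break
        func, args = job
        try:
            func(*args)
        finally:
            jobs.task_done()


def handle_post_note(note_id, recipients):
    # The "view": enqueue the work and respond right away.
    jobs.put((distribute_note, (note_id, recipients)))
    return "Note posted"


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()
```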


Queuing

• Lots of open source options for queuing

• Ghetto Queue (MySQL + Cron)

• this is the official name.

• Gearman

• TheSchwartz

• RabbitMQ

• Apache ActiveMQ

• ZeroMQ


Queuing

• Lots of fancy features: brokers, exchanges, routing keys, bindings...

• Don’t let that crap get you down, this is really simple stuff

• Biggest decision: persistence

• Does your queue need to be durable and persistent, able to survive a crash?

• This requires logging to disk which slows things down, so don’t do it unless you have to


Queuing

• Pownce used a simple ghetto queue built on MySQL / cron

• Problematic if you have multiple consumers pulling jobs from the queue

• No point in reinventing the wheel, there are dozens of battle-tested open source queues to choose from
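With a ghetto queue, the multiple-consumer problem is that two workers can select the same pending row. One common fix is to claim a job with a conditional UPDATE and check how many rows it changed. A sketch using the standard library’s sqlite3 as a stand-in for MySQL; the jobs table, its columns and the function names are hypothetical.

```python
import sqlite3


def make_queue():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending', worker TEXT)"
    )
    return conn


def enqueue(conn, payload):
    with conn:  # commits on success
        conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))


def claim_next(conn, worker):
    """Claim the oldest pending job for worker; return (id, payload) or None."""
    while True:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        with conn:
            cur = conn.execute(
                "UPDATE jobs SET status = 'claimed', worker = ?"
                " WHERE id = ? AND status = 'pending'",
                (worker, row[0]),
            )
        if cur.rowcount == 1:
            return row  # our UPDATE won; the row is no longer 'pending' for anyone else
        # Another consumer claimed it between our SELECT and UPDATE: try the next job.


def finish(conn, job_id):
    with conn:
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
```

Getting details like this right (plus retries, timeouts for crashed workers, and so on) is exactly the work the open source queues have already done.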


Django Standalone Scripts

Consumers need to set up the Django environment:

from django.core.management import setup_environ
from mysite import settings

setup_environ(settings)

THE DATABASE!

The Database

• Until now we’ve been talking about

• Shared nothing

• Pushing problems down the stack

• But we have to store a persistent and consistent view of our application’s state somewhere

• Enter, the database...


CAP Theorem

• Three properties of a shared-data system

• Consistency: all clients see the same data

• Availability: all clients can see some version of the data

• Partition Tolerance: system properties hold even when the system is partitioned & messages are lost

• But you can only have two


CAP Theorem

• Big long proof... here’s my version.

• Empirically, seems to make sense.

• Eric Brewer

• Professor at University of California, Berkeley

• Co-founder and Chief Scientist of Inktomi

• Probably smarter than me


CAP Theorem

• The relational database systems we all use were built with consistency as their primary goal

• But at scale our system needs to have high availability and must be partitionable

• The RDBMS’s consistency requirements get in our way

• Most sharding / federation schemes are kludges that trade consistency for availability & partition tolerance


The Database

• There are lots of non-relational databases coming onto the scene

• CouchDB

• Cassandra

• Tokyo Cabinet

• But they’re not that mature, and they aren’t easy to use with Django

Denormalization


Denormalization

• Django encourages normalized data, which is usually good

• But at scale you need to denormalize

• Corollary: joins are evil

• Django makes it really easy to do joins using the ORM, so pay attention


Denormalization

• Start with a normalized database

• Selectively denormalize things as they become bottlenecks

• Denormalized counts, copied fields, etc. can be updated in signal handlers
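A framework-free sketch of keeping a denormalized count current from signal handlers. The tiny connect/send registry stands in for Django’s post_save and post_delete signals (it is not Django’s API), and the users and notes dicts stand in for tables; note_count is the copied value that spares you a COUNT(*) or a join on every page view.

```python
from collections import defaultdict

# Hypothetical minimal signal registry standing in for post_save / post_delete.
_handlers = defaultdict(list)


def connect(signal, handler):
    _handlers[signal].append(handler)


def send(signal, instance, **kwargs):
    for handler in _handlers[signal]:
        handler(instance=instance, **kwargs)


users = {1: {"name": "mike", "note_count": 0}}  # denormalized count lives on the user row
notes = {}


def note_saved(instance, created=False, **kwargs):
    if created:  # edits don't change the count
        users[instance["user_id"]]["note_count"] += 1


def note_deleted(instance, **kwargs):
    users[instance["user_id"]]["note_count"] -= 1


connect("post_save", note_saved)
connect("post_delete", note_deleted)


def save_note(note_id, user_id, text):
    created = note_id not in notes
    notes[note_id] = {"user_id": user_id, "text": text}
    send("post_save", notes[note_id], created=created)


def delete_note(note_id):
    note = notes.pop(note_id)
    send("post_delete", note)
```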

Replication


Replication

• Typical web app is 80 to 90% reads

• Adding read capacity will get you a long way

• MySQL Master-Slave replication

(Diagram: the master handles reads and writes; slaves are read only)

Replication

• Django doesn’t make it easy to use multiple database connections, but it is possible

• Some caveats

• Slave lag interacts with caching in weird ways

• You can only save to your primary DB (the one you configure in settings.py)

• Unless you get really clever...

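Replication

1. Create a custom database wrapper by subclassing DatabaseWrapper

# Assumes the MySQL backend of the Django 1.0 era, where _cursor receives settings.
from django.db.backends.mysql.base import (
    CursorWrapper, Database, DatabaseWrapper, django_conversions)

class SlaveDatabaseWrapper(DatabaseWrapper):
    def _cursor(self, settings):
        if not self._valid_connection():
            kwargs = {
                'conv': django_conversions,
                'charset': 'utf8',
                'use_unicode': True,
            }
            # pick_random_slave is your own helper: it returns connection kwargs
            # for one entry in settings.SLAVE_DATABASES.
            kwargs.update(pick_random_slave(settings.SLAVE_DATABASES))
            self.connection = Database.connect(**kwargs)
            ...
        cursor = CursorWrapper(self.connection.cursor())
        return cursor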

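Replication

2. Custom QuerySet that uses primary DB for writes

from django.db import connection as default_connection
from django.db.models.query import QuerySet

class MultiDBQuerySet(QuerySet):
    ...
    def update(self, **kwargs):
        # Point the query at the primary (default) connection for the UPDATE,
        # then restore the slave connection even if the update fails.
        slave_conn = self.query.connection
        self.query.connection = default_connection
        try:
            return super(MultiDBQuerySet, self).update(**kwargs)
        finally:
            self.query.connection = slave_conn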

Replication

3. Custom Manager that uses your custom QuerySet

from django.db import models
from django.db.models import sql

class SlaveDatabaseManager(models.Manager):
    def get_query_set(self):
        return MultiDBQuerySet(self.model, query=self.create_query())

    def create_query(self):
        # connection: your slave connection, e.g. an instance of SlaveDatabaseWrapper
        return sql.Query(self.model, connection)

Replication

Example on github:

http://github.com/mmalone/django-multidb/

http://bit.ly/multidb

Replication

• Goal:

• Read-what-you-write consistency for writer

• Eventual consistency for everyone else

• Slave lag screws things up
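One way to approximate that goal, sketched under stated assumptions: writes go to the primary, and reads go to a random slave unless the same user wrote recently, in which case they are pinned to the primary long enough for replication to catch up. ReplicaRouter, the database names and pin_seconds are hypothetical, and pin_seconds only helps if it exceeds your actual slave lag.

```python
import random
import time


class ReplicaRouter:
    """Writes go to the primary; reads go to a random replica unless this user wrote recently."""

    def __init__(self, primary, replicas, pin_seconds=5.0, clock=time.monotonic, rng=None):
        self.primary = primary
        self.replicas = list(replicas)
        self.pin_seconds = pin_seconds  # hypothetical; should exceed typical slave lag
        self.clock = clock
        self.rng = rng or random.Random()
        self._last_write = {}  # user_id -> time of that user's last write

    def db_for_write(self, user_id):
        self._last_write[user_id] = self.clock()
        return self.primary

    def db_for_read(self, user_id):
        wrote_at = self._last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.pin_seconds:
            return self.primary  # read-what-you-write for the writer
        return self.rng.choice(self.replicas)  # eventual consistency for everyone else
```

The last-write timestamps here live in one process; with several app servers they would need to live somewhere shared, such as the user’s session or memcached.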


Replication

What happens when you become write saturated?

Federation


Federation

• Start with vertical partitioning: move tables that aren’t joined with each other onto separate database servers

• Actually pretty easy

• Except not with Django


Federation

django.db.models.base


Federation

• At some point you’ll need to split a single table across databases (e.g., user table)

• Auto-increment PKs won’t work

• It’d be nice to have a UUIDField for PKs

• You can probably build this yourself
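A sketch of the two pieces a split user table needs, under stated assumptions: primary keys generated by the application instead of auto-increment (a random UUID, the kind of value a UUIDField would hold), and a deterministic rule mapping a key to a database. SHARDS and both function names are hypothetical, and hashing with modulo is only one placement scheme.

```python
import uuid
import zlib

SHARDS = ["users_db_0", "users_db_1", "users_db_2"]  # hypothetical database names


def new_pk():
    # 128 random bits: effectively unique across shards without asking any
    # database for the next ID.
    return uuid.uuid4().hex


def shard_for(pk, shards=SHARDS):
    # crc32 gives the same answer in every process (unlike hash() on str,
    # which is randomized per process), so every app server agrees on placement.
    return shards[zlib.crc32(pk.encode("ascii")) % len(shards)]
```

With modulo placement, changing the number of shards moves most rows; a lookup table or consistent hashing avoids that at the cost of more machinery.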

Profiling, Monitoring & Measuring

Know your SQL

>>> Article.objects.filter(pk=3).query.as_sql()
('SELECT "app_article"."id", "app_article"."name", "app_article"."author_id" FROM "app_article" WHERE "app_article"."id" = %s ', (3,))

Know your SQL

>>> import sqlparse
>>> def pp_query(qs):
...     t = qs.query.as_sql()
...     sql = t[0] % t[1]
...     print(sqlparse.format(sql, reindent=True, keyword_case='upper'))
...
>>> pp_query(Article.objects.filter(pk=3))
SELECT "app_article"."id", "app_article"."name", "app_article"."author_id"
FROM "app_article"
WHERE "app_article"."id" = 3

Know your SQL

>>> from django.db import connection
>>> connection.queries  # only recorded when settings.DEBUG is True
[{'time': '0.001', 'sql': u'SELECT "app_article"."id", "app_article"."name", "app_article"."author_id" FROM "app_article"'}]

Know your SQL

• It’d be nice if a lightweight stacktrace could be done in QuerySet.__init__

• Stick the result in connection.queries

• Now we know where the query originated
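A sketch of the idea outside Django: when a query is recorded, capture a summary of the call stack with traceback.extract_stack and store the last few frames next to the SQL. queries stands in for connection.queries, and record_query and load_article are hypothetical; the slide suggests doing the capture in QuerySet.__init__. It adds overhead to every query, so treat it as a debugging aid.

```python
import traceback

queries = []  # stand-in for django.db.connection.queries


def record_query(sql, depth=3):
    # extract_stack() returns FrameSummary objects (filename, lineno, name);
    # drop the last entry, which is this function itself.
    stack = traceback.extract_stack()[:-1]
    origin = [(f.filename, f.lineno, f.name) for f in stack[-depth:]]
    queries.append({"sql": sql, "origin": origin})


def load_article(pk):
    record_query('SELECT ... FROM "app_article" WHERE "app_article"."id" = %s' % pk)
```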


Measuring

Django Debug Toolbar

http://github.com/robhudson/django-debug-toolbar/


Monitoring

• Ganglia

• Munin

You can’t improve what you don’t measure.


Measuring & Monitoring

• Measure

• Server load, CPU usage, I/O

• Database QPS

• Memcache QPS, hit rate, evictions

• Queue lengths

• Anything else interesting
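memcached itself reports hits, misses and evictions through its stats command (get_hits, get_misses, evictions). If you also want numbers from the application’s point of view, a thin wrapper can count them. InstrumentedCache is a hypothetical sketch wrapping any dict-like cache.

```python
class InstrumentedCache:
    """Wrap a dict-like cache and count hits and misses (hypothetical sketch)."""

    def __init__(self, backend):
        self.backend = backend
        self.hits = 0
        self.misses = 0

    def get(self, key, default=None):
        value = self.backend.get(key)
        if value is None:
            self.misses += 1
            return default
        self.hits += 1
        return value

    def set(self, key, value):
        self.backend[key] = value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```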

All done... Questions?

Contact me at mjmalone@gmail.com or @mjmalone
