Post on 12-May-2015
Caching techniques in Python
Michael Domanski
EuroPython 2010
Thursday, 22 July 2010
who I am
• Python developer, professionally for a few years now
• also experienced in C and Objective-C
• currently working for 10clouds.com
Interesting intro
• a bit of theory
• common patterns
• common problems
• common solutions
How I think about cache
• imagine a giant dict storing all your data
• you have to manage all data manually
• or provide some automated behaviour
similar to...
• manual memory management in C
• cache is memory
• and you have to control it manually
benefits
• improved performance
• ...?
problems
• managing any type of memory is hard
• automation often has to be custom-built each time
common patterns
memoization
• very old pattern (circa 1968)
• we owe the name to Donald Michie
how it works
• we associate input with output and store it somewhere
• based on the assumption that for a given input, the output is always the same
code example
CACHE_DICT = {}

def cached(key):
    # decorator factory: the result is stored under one fixed key
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                # cache miss: compute once and remember the result
                value = func(*args, **kwargs)
                CACHE_DICT[key] = value
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper
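The decorator above stores one result per fixed key, whatever arguments are passed. A minimal sketch of the classic argument-keyed variant follows, assuming every argument is hashable. The names `memoize`, `slow_square` and `CALLS` are illustrative and do not come from the talk.

```python
import functools

def memoize(func):
    """Cache results per argument tuple (arguments must be hashable)."""
    results = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in results:
            # first time we see these arguments: compute and remember
            results[args] = func(*args)
        return results[args]
    return wrapper

CALLS = []  # records every real computation, for demonstration only

@memoize
def slow_square(n):
    CALLS.append(n)
    return n * n

slow_square(4)
slow_square(4)   # served from the cache, the function body does not run
slow_square(5)
```

In Python 3.2 and later, `functools.lru_cache` from the standard library provides the same idea with an optional size limit.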
what if output can change?
• our pattern is still useful
• we simply need to add something
cache invalidation
There are only two hard problems in Computer Science: cache invalidation and naming things
Phil Karlton
• basically, we update data in cache
• we need to know when and what to change
• the more granular you want to be, the harder it gets
code example
def invalidate(key):
    try:
        del CACHE_DICT[key]
    except KeyError:
        print("someone tried to invalidate a key that is not present: %s" % key)
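To see what invalidation buys us, here is a small self-contained sketch of a read before and after dropping a key. `DB`, `CACHE` and `get_greeting` are hypothetical stand-ins for a real data source and a real cached function.

```python
CACHE = {}
DB = {"greeting": "hello"}   # hypothetical stand-in for a real data source

def get_greeting():
    if "greeting" not in CACHE:
        CACHE["greeting"] = DB["greeting"]
    return CACHE["greeting"]

def invalidate(key):
    # pop() with a default never raises, so invalidating twice is harmless
    CACHE.pop(key, None)

first = get_greeting()
DB["greeting"] = "hi"        # the source changes behind the cache's back
stale = get_greeting()       # still the old value
invalidate("greeting")
fresh = get_greeting()       # reloaded from the source
```

Using `pop(key, None)` instead of `del` is a design choice in this sketch: it trades the "key not present" warning for idempotent invalidation.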
common problems
invalidating too much/not enough
• flushing all data any time something changes
• not flushing cache at all
• tragic effects
@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE
@cached('big_key1')
def some_bigger_function():
    """
    this function depends on big_key1, key1 and key2
    """
    def inner_workings():
        db_set(1, 'something totally new')
    #######
    ## imagine 100 lines of code here :)
    ######
    inner_workings()
    return [simple_function1(), simple_function2()]

if __name__ == '__main__':
    simple_function1()
    simple_function2()
    a, b = some_bigger_function()
    assert a == db_get(id=1), "this fails because we didn't invalidate the cache properly"
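The cost of invalidating too much can be made visible by counting reloads. This is only a sketch under simple assumptions: `DB` is a hypothetical in-memory stand-in for a database, and `load`, `warm` and `LOADS` are names invented for the illustration.

```python
CACHE = {}
DB = {1: "a", 2: "b", 3: "c"}   # hypothetical stand-in for a database
LOADS = []                      # ids that were actually read from DB

def load(item_id):
    key = "item:%d" % item_id
    if key not in CACHE:
        LOADS.append(item_id)
        CACHE[key] = DB[item_id]
    return CACHE[key]

def warm():
    return [load(i) for i in sorted(DB)]

warm()                          # cold cache: every item is loaded once
DB[1] = "a2"
CACHE.clear()                   # too much: everything is thrown away
before = len(LOADS)
warm()
loads_after_flush = len(LOADS) - before

DB[2] = "b2"
CACHE.pop("item:2", None)       # granular: only the changed item is dropped
before = len(LOADS)
result = warm()
loads_after_pop = len(LOADS) - before
```

The full flush reloads every item for a single change, while the granular version reloads one, at the price of having to know exactly which key to drop.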
invalidating too soon/too late
• your cache has to be synchronised with your db
• sometimes very hard to spot
• leads to tragic mistakes
@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE
def some_bigger_function():
    db_set(1, 'something')
    value = simple_function1()
    db_set(2, 'something else')
    #### now we know we used 2 cached functions so....
    invalidate('key1')
    invalidate('key2')
    #### now we know we are safe, but for a price
    return simple_function2()

if __name__ == '__main__':
    some_bigger_function()
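One way to avoid invalidating too soon or too late is to make the write and the invalidation a single step. The sketch below assumes a single process and a hypothetical in-memory `DB`; `db_set_and_invalidate` and `cached_get` are illustrative names, not functions from the talk.

```python
CACHE = {}
DB = {}   # hypothetical stand-in for a database

def db_get(item_id):
    return DB.get(item_id)

def cached_get(item_id):
    key = "key%d" % item_id
    if key not in CACHE:
        CACHE[key] = db_get(item_id)
    return CACHE[key]

def db_set_and_invalidate(item_id, value):
    # write first, then drop the cached copy, so the next read reloads it
    DB[item_id] = value
    CACHE.pop("key%d" % item_id, None)

db_set_and_invalidate(1, "something")
first = cached_get(1)
db_set_and_invalidate(1, "something else")
second = cached_get(1)
```

With several processes or threads a reader can still slip in between the write and the invalidation, so this narrows the window rather than closing it.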
superposition of dependencies
• a somewhat less obvious problem
• eventually you will start caching results of computations
• you have to know very precisely what your data depends on
@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE
@cached('key')
def some_bigger_function():
    return {
        '1': simple_function1(),
        '2': simple_function2(),
        '3': db_get(id=3),
    }

if __name__ == '__main__':
    simple_function1()
    # somewhere else
    db_set(1, 'foobar')
    # and again
    db_set(3, 'bazbar')
    invalidate('key')
    # ooops, we forgot something
    data = some_bigger_function()
    assert data['1'] == db_get(id=1), "this fails because we didn't manage to invalidate all the keys"
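One possible way to stop forgetting keys is to write the dependencies down and let invalidation follow them. This is a sketch only: the `DEPENDENTS` map and the `db:N` source names are assumptions made for the illustration, mirroring the keys used in the example above.

```python
CACHE = {}

# hypothetical dependency map: name -> cache keys built from it
DEPENDENTS = {
    "db:1": ["key1"],
    "db:2": ["key2"],
    "db:3": ["key"],
    "key1": ["key"],
    "key2": ["key"],
}

def invalidate(name, _seen=None):
    """Drop `name` and, recursively, every key that depends on it."""
    seen = set() if _seen is None else _seen
    if name in seen:             # guard against cycles in the map
        return
    seen.add(name)
    CACHE.pop(name, None)
    for dependent in DEPENDENTS.get(name, []):
        invalidate(dependent, seen)

CACHE.update({"key1": "old", "key2": "b", "key": {"1": "old", "2": "b"}})
invalidate("db:1")               # row 1 changed: key1 and key must both go
remaining = sorted(CACHE)
```

The map still has to be maintained by hand, so this moves the problem to one visible place rather than removing it.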
summing up
• know your data
• be aware of what you cache and when
• take care when using cached data in computation
common solutions
process level cache
why?
• very fast access
• simple to implement
• very effective as long as you’re using a single process
clever tricks with dicts
code example
CACHE_DICT = {}

def cached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                value = func(*args, **kwargs)
                CACHE_DICT[key] = value
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper
invalidation
code example
def invalidate(key):
    try:
        del CACHE_DICT[key]
    except KeyError:
        print("someone tried to invalidate a key that is not present: %s" % key)
application level cache
memcache
• battle tested
• scales
• fast
• supports a few cool features
• behaves a lot like dict
• supports time-based expiration
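Time-based expiration is easy to picture with a tiny in-process model. The `TTLCache` class below is a hypothetical stand-in written for this illustration and is not the memcached client API; it only mimics the idea that an entry stored with a lifetime disappears once that lifetime has passed, and that a lifetime of 0 means the entry never expires.

```python
import time

class TTLCache:
    """Hypothetical in-process stand-in; not the memcached client API."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}     # key -> (value, expires_at or None)

    def set(self, key, value, ttl=0):
        # ttl == 0 means "never expires"
        expires_at = self._clock() + ttl if ttl else None
        self._items[key] = (value, expires_at)

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]     # lazily evict the expired entry
            return None
        return value

now = [100.0]                        # fake clock keeps the example deterministic
cache = TTLCache(clock=lambda: now[0])
cache.set("session", "abc", ttl=30)
before_expiry = cache.get("session")
now[0] += 31                         # 31 "seconds" later
after_expiry = cache.get("session")
```

Expiration by time needs no invalidation calls at all, at the price of serving data that may be up to `ttl` seconds old.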
libraries?
• python-memcache
• python-libmemcache
• python-cmemcache
• pylibmc
why no benchmarks?
• not the point of this talk :)
• benchmarks are generic, caching is specific
• pick your flavour, think for yourself
code example
import memcache

cache = memcache.Client(['localhost:11211'])

def memcached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            value = cache.get(str(key))
            # get() returns None on a miss; testing "is None" keeps
            # falsy results such as 0 or '' cached
            if value is None:
                value = func(*args, **kwargs)
                cache.set(str(key), value)
            return value
        return arg_wrapper
    return func_wrapper
invalidation
code example
def mem_invalidate(key):
    # a stored None is read back as a miss by the memcached decorator
    cache.set(str(key), None)
batch key management
• what if I don’t want to expire each key manually?
• that’s a lot to remember
• and we have to be careful :(
groups?
• group keys into sets
• which are tied to one key per set
• expire one key, instead of twenty
how to get there?
• store some extra data
• you can store dicts in cache
• and cache behaves like dict
• so it’s a case of comparing keys and values
# we start with a specified key and group
key = 'some_key'
group = 'some_group'

# now retrieve some data from memcached (get_multi takes a list of keys)
data = memcached_client.get_multi([key, group])
# now data is a dict that should look like
# {'some_key': {'group_key': '1234',
#               'value': 'some_value'},
#  'some_group': '1234'}
if data and (key in data) and (group in data):
    if data[key]['group_key'] == data[group]:
        return data[key]['value']
def cached(key, group_key='', exp_time=0):
    # we don't want to mix time based and event based expiration models
    if group_key:
        assert exp_time == 0, "can't set expiration time for grouped keys"

    def f_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            value = None
            if group_key:
                data = cache.get_multi([tools.make_key(group_key)] + [tools.make_key(key)])
                data_dict = data.get(tools.make_key(key))
                if data_dict:
                    value = data_dict['value']
                    group_value = data_dict['group_value']
                    # .get() so that a missing group key counts as a mismatch
                    if group_value != data.get(tools.make_key(group_key)):
                        value = None
            else:
                # read with the same key transformation that set() uses below
                value = cache.get(tools.make_key(key))
            if value is None:
                value = func(*args, **kwargs)
                if exp_time:
                    cache.set(tools.make_key(key), value, exp_time)
                elif not group_key:
                    cache.set(tools.make_key(key), value)
                else:
                    # exp_time not set and we have group_keys
                    group_value = make_group_value(group_key)
                    data_dict = {'value': value, 'group_value': group_value}
                    cache.set_multi({
                        tools.make_key(key): data_dict,
                        tools.make_key(group_key): group_value,
                    })
            return value
        arg_wrapper.__name__ = func.__name__
        return arg_wrapper
    return f_wrapper
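The group idea is easier to follow without the decorator plumbing. Below is a minimal sketch that uses a plain dict, `STORE`, as a hypothetical stand-in for the memcached client; `set_in_group`, `get_in_group`, `invalidate_group` and `new_group_value` are names invented here and are not the helpers used in the talk's code.

```python
import uuid

STORE = {}   # hypothetical in-process stand-in for the memcached client

def new_group_value():
    return uuid.uuid4().hex

def set_in_group(key, group, value):
    # reuse the current group value, or create one if the group is new
    if group not in STORE:
        STORE[group] = new_group_value()
    STORE[key] = {"value": value, "group_value": STORE[group]}

def get_in_group(key, group):
    entry = STORE.get(key)
    if entry is None or entry["group_value"] != STORE.get(group):
        return None          # missing, or stored under an older group value
    return entry["value"]

def invalidate_group(group):
    # one write makes every entry of the group stale
    STORE[group] = new_group_value()

set_in_group("user:1", "users", "alice")
set_in_group("user:2", "users", "bob")
hit = get_in_group("user:1", "users")
invalidate_group("users")
misses = [get_in_group("user:1", "users"), get_in_group("user:2", "users")]
```

Stale entries are never deleted in this sketch; they are simply ignored on read, which in a real memcached setup would leave eviction to the server.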
questions?
code samples @ http://github.com/mdomans/europython2010
follow me
twitter: mdomans
blog: blog.mdomans.com