Everything You Always Wanted to Know About Memory in Python But Were Afraid to Ask

Piotr Przymus, Nicolaus Copernicus University
EuroPython 2014, Berlin

Outline: Introduction, Basic stuff, Notes on memory model, Memory profiling tools, Summary, References

description

Have you ever wondered what happens to all the precious RAM after running your 'simple' CPython code? Prepare yourself for a short introduction to CPython memory management! This presentation will try to answer some memory related questions you always wondered about. It will also discuss basic memory profiling tools and techniques.

Transcript of Everything You Always Wanted to Know About Memory in Python But Were Afraid to Ask


About Me

Piotr Przymus
- PhD student / Research Assistant at Nicolaus Copernicus University.
- Interests: databases, GPGPU computing, data mining.
- 8 years of Python experience.
- Some of my Python projects:
  - Parts of the trading platform at turbineam.com (backtesting, trading algorithms).
  - Mussels bio-monitoring analysis and data mining software.
  - Simulator of a heterogeneous processing environment for evaluation of database query scheduling algorithms.


Size of objects

Table: Size of different types in bytes

Type                              32-bit Python               64-bit Python
int (py-2.7)                      12                          24
long (py-2.7) / int (py-3.3)      14 + 2 · number of digits   30 + 2 · number of digits
float                             16                          24
complex                           24                          32
str (py-2.7) / bytes (py-3.3)     24 + 2 · length             40 + 2 · length
unicode (py-2.7) / str (py-3.3)   28 + (2 or 4) · length      52 + (2 or 4) · length
tuple                             24 + (4 · length)           64 + (8 · length)


DIY – check size of objects

sys.getsizeof(obj)

From the documentation:
- Available since Python 2.6.
- Returns the size of an object in bytes. The object can be any type.
- All built-in objects will return correct results.
- May not be true for third-party extensions, as it is implementation specific.
- Calls the object's __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.
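A minimal sketch of checking sizes yourself, written for Python 3. The helper names (shallow_sizes, per_item_cost) are made up for this illustration, and the numbers printed depend on the interpreter version and on whether the build is 32-bit or 64-bit. Note that sys.getsizeof() reports a shallow size: objects referenced by a container are not included.

```python
import sys

def shallow_sizes():
    """Return sys.getsizeof() for a few built-in objects (values vary by build)."""
    samples = {
        "int": 1,
        "float": 1.0,
        "complex": 1j,
        "empty bytes": b"",
        "empty str": "",
        "empty tuple": (),
        "empty list": [],
    }
    return {name: sys.getsizeof(obj) for name, obj in samples.items()}

def per_item_cost(make, n=100):
    """Estimate bytes added per element by comparing sizes at two lengths."""
    return (sys.getsizeof(make(n)) - sys.getsizeof(make(0))) / n

sizes = shallow_sizes()
for name, size in sizes.items():
    print(f"{name:12s} {size} bytes")

# A tuple stores one pointer per element, so the cost matches the pointer size.
tuple_item = per_item_cost(lambda n: tuple(range(n)))
print("bytes per tuple element:", tuple_item)
```

The per-element figure for tuples corresponds to the "+(4 · length)" and "+(8 · length)" terms in the size table.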


Objects interning – fun example

    a = [i % 257 for i in xrange(2**20)]

Listing 1: List of interned integers

    b = [1024 + i % 257 for i in xrange(2**20)]

Listing 2: List of integers

Is there any allocation difference between Listing 1 and Listing 2?

Results measured using psutil:
- Listing 1: (resident=15.1M, virtual=2.3G)
- Listing 2: (resident=39.5M, virtual=2.4G)
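One way to see where the difference comes from, without measuring process memory, is to count how many distinct objects each list refers to. This is a sketch for Python 3 on CPython (range instead of xrange, and a shorter list than in the listings); count_distinct_objects is a hypothetical helper, and the behaviour it exposes is a CPython implementation detail.

```python
def count_distinct_objects(values):
    """Count how many different objects a list actually refers to."""
    return len({id(v) for v in values})

n = 10_000  # smaller than the 2**20 used in the listings, to keep the demo fast

small = [i % 257 for i in range(n)]          # values 0..256
large = [1024 + i % 257 for i in range(n)]   # values 1024..1280

distinct_small = count_distinct_objects(small)
distinct_large = count_distinct_objects(large)

print("distinct objects, small values:", distinct_small)
print("distinct objects, large values:", distinct_large)
```

On CPython the first list should reuse 257 shared integer objects, while the second holds a separately allocated integer for nearly every element.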


Objects interning – explained

Objects and variables – general rule:
- Objects are allocated on assignment.
- Variables just point to objects (i.e. they do not hold the memory).

Interning of objects:
- This is an exception to the general rule.
- Python implementation specific (examples from CPython).
- "Often" used objects are preallocated and shared instead of paying for a costly new allocation.
- Mainly a performance optimization.

    >>> a = 0
    >>> b = 0
    >>> a is b, a == b
    (True, True)

Listing 5: Interning of objects

    >>> a = 1024
    >>> b = 1024
    >>> a is b, a == b
    (False, True)

Listing 6: Objects allocation


Objects interning – behind the scenes

Warning:
- This is Python implementation dependent.
- This may change in the future.
- This is not documented because of the above reasons.
- For reference consult the source code.

CPython 2.7 - 3.4 keeps single instances for:
- int – in range [−5, 257)
- str / unicode – empty string and all length=1 strings
- unicode / str – empty string and all length=1 strings for Latin-1
- tuple – empty tuple
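The list above can be probed from Python itself. The following is a sketch for Python 3 on CPython; is_shared and cached_ints are hypothetical helpers, and since the behaviour is an undocumented implementation detail, the output may differ between versions and interpreters.

```python
def is_shared(make):
    """True if two independent calls to make() return the very same object."""
    return make() is make()

def cached_ints(lo=-10, hi=300):
    """List integers in [lo, hi) for which parsing the same text twice
    yields one shared object instead of two fresh ones."""
    return [n for n in range(lo, hi) if is_shared(lambda: int(str(n)))]

shared = cached_ints()
empty_tuple_shared = is_shared(tuple)
empty_str_shared = is_shared(str)

print("shared ints span:", shared[0], "to", shared[-1])
print("empty tuple shared:", empty_tuple_shared)
print("empty str shared:", empty_str_shared)
```

Both objects are kept alive during the identity check, so a True result cannot be caused by a freed object's address being reused.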


String interning – example

    >>> a, b = "strin", "string"
    >>> a + 'g' is b                    # returns False
    >>> intern(a + 'g') is intern(b)    # returns True
    >>> a = ["spam %d" % (i % 257)
    ...      for i in xrange(2**20)]
    >>> # memory usage (resident=57.6M, virtual=2.4G)
    >>> a = [intern("spam %d" % (i % 257))
    ...      for i in xrange(2**20)]
    >>> # memory usage (resident=14.9M, virtual=2.3G)

Listing 7: String interning
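The listing uses Python 2 names (intern, xrange). A rough Python 3 equivalent, using sys.intern and range, might look like the sketch below. The build helper is hypothetical and exists only to force string creation at run time, and the list is shorter than in the listing to keep the demo fast.

```python
import sys

def build(prefix, suffix):
    """Concatenate at run time so the result is a fresh string object."""
    return prefix + suffix

a = build("strin", "g")
b = build("str", "ing")

equal_but_distinct = (a == b) and (a is not b)
same_after_intern = sys.intern(a) is sys.intern(b)

# Interning every element leaves one object per distinct value.
labels = [sys.intern("spam %d" % (i % 257)) for i in range(10_000)]
distinct_labels = len({id(s) for s in labels})

print(equal_but_distinct, same_after_intern, distinct_labels)
```

With interning, the list of labels refers to one string object per distinct value, which is consistent with the lower resident memory reported in the listing.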


String interning – explained

String interning definition:
String interning is a method of storing only one copy of each distinct string value, which must be immutable.

intern (py-2.x) / sys.intern (py-3.x), from the CPython documentation:
- Enter string in the table of "interned" strings.
- Return the interned string (string or string copy).
- Useful to gain a little performance on dictionary lookup (key comparisons after hashing can be done by a pointer compare instead of a string compare).
- Names used in programs are automatically interned.
- Dictionaries used to hold module, class or instance attributes have interned keys.


Mutable Containers Memory Allocation Strategy

Plan for growth and shrinkage:
- Slightly overallocate the memory needed by the container.
- Leave room for growth.
- Shrink when the overallocation threshold is reached.

Reduce the number of expensive function calls:
- realloc()
- memcpy()

Use optimal layout.

Containers covered: lists, sets, dictionaries.


List allocation – example

Figure: List growth example


List allocation strategy

- Represented as a fixed-length array of pointers.
- Overallocation for list growth (by append):
  - List size growth: 4, 8, 16, 25, 35, 46, ...
  - For large lists the overallocation is less than 12.5%.
- Due to the memory actions involved, operations:
  - at the end of the list are cheap (rare realloc),
  - in the middle or at the beginning require a memory copy or shift!
- Note that for lists of 1, 2 or 5 elements, space is wasted.
- List allocation size:
  - 32 bits: 32 + (4 * length)
  - 64 bits: 72 + (8 * length)
- Shrinking happens only when the list size is < 1/2 of the allocated space.
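The overallocation can be observed by appending one element at a time and watching when sys.getsizeof() changes. This is a sketch for Python 3; growth_points is a hypothetical helper, and the exact lengths and byte counts depend on the CPython version, so they may not match the growth sequence quoted above.

```python
import sys

def growth_points(n_appends=64):
    """Append one item at a time and record (length, size in bytes)
    each time the list's allocated size changes."""
    items = []
    last = sys.getsizeof(items)
    points = [(0, last)]
    for i in range(n_appends):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last:
            points.append((len(items), size))
            last = size
    return points

points = growth_points()
for length, size in points:
    print(f"len={length:3d}  size={size} bytes")
```

The size changes far less often than once per append, which is the "rare realloc" behaviour described above.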


Overallocation of dictionaries/sets

- Represented as fixed-length hash tables.
- Overallocation for dicts/sets happens when 2/3 of the capacity is reached:

    if number of elements < 50000: quadruple the capacity
    else: double the capacity

    // dict growth strategy
    (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used;
    // set growth strategy
    (so->used > 50000 ? so->used*2 : so->used*4);

Dict/Set growth/shrink code:

    for (newsize = PyDict_MINSIZE;
         newsize <= minused && newsize > 0;
         newsize <<= 1);

- Shrinkage happens if the dictionary/set fill (real and dummy elements) is much larger than the number of used elements (real elements), i.e. a lot of keys have been deleted.
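The rule above can be traced with a small pure-Python simulation. This is only a sketch of the strategy as stated on this slide, not a measurement of any real interpreter: it assumes a minimum table size of 8 (standing in for PyDict_MINSIZE), insert-only usage (so fill equals used), and the names next_capacity and simulate_inserts are invented for the illustration.

```python
MINSIZE = 8  # assumed to mirror PyDict_MINSIZE

def next_capacity(min_used, minsize=MINSIZE):
    """Smallest power-of-two capacity strictly greater than min_used,
    following the loop shown above."""
    newsize = minsize
    while newsize <= min_used:
        newsize <<= 1
    return newsize

def simulate_inserts(n):
    """Return [(element count, new capacity)] for each simulated resize."""
    capacity, resizes = MINSIZE, []
    for used in range(1, n + 1):
        if used * 3 >= capacity * 2:          # 2/3 of capacity reached
            factor = 2 if used > 50000 else 4
            capacity = next_capacity(factor * used)
            resizes.append((used, capacity))
    return resizes

resizes = simulate_inserts(100)
print(resizes)  # [(6, 32), (22, 128), (86, 512)]
```

Under these assumptions the table is resized only three times while inserting 100 keys, each time to a power of two larger than four times the current element count.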


Various data representation

    # Fields: field1, field2, field3, ..., field8
    # Data: "foo 1", "foo 2", "foo 3", ..., "foo 8"
    class OldStyleClass:  # only py-2.x
        ...
    class NewStyleClass(object):  # default for py-3.x
        ...
    class NewStyleClassSlots(object):
        __slots__ = ('field1', 'field2', ...)
        ...

    import collections as c
    NamedTuple = c.namedtuple('nt', ['field1', ...,])

    TupleData = ('value1', 'value2', ...)
    ListaData = ['value1', 'value2', ...]
    DictData = {'field1': 'value1', 'field2': 'value2', ...}

Listing 8: Various data representation
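Listing 8 is schematic. A runnable Python 3 sketch of two of the variants (a regular class and a class with __slots__) is shown below, with a shallow per-instance size comparison. The class names WithDict and WithSlots are made up for this illustration, and the byte counts depend on the CPython version.

```python
import sys

FIELDS = tuple(f"field{i}" for i in range(1, 9))   # field1 .. field8

class WithDict:
    """Regular class: attributes live in a per-instance dictionary."""
    def __init__(self, *values):
        for name, value in zip(FIELDS, values):
            setattr(self, name, value)

class WithSlots:
    """Same fields, but __slots__ reserves fixed storage and no __dict__."""
    __slots__ = FIELDS
    def __init__(self, *values):
        for name, value in zip(FIELDS, values):
            setattr(self, name, value)

values = [f"foo {i}" for i in range(1, 9)]
plain, slotted = WithDict(*values), WithSlots(*values)

# Shallow sizes only: the eight field values are shared and not counted.
plain_bytes = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_bytes = sys.getsizeof(slotted)

print("instance with __dict__ :", plain_bytes, "bytes")
print("instance with __slots__:", slotted_bytes, "bytes")
```

The comparison counts only the instance and its attribute dictionary, so it shows the per-object overhead rather than the total process memory plotted on the following slide.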


Various data representation – allocated memory

Figure: Allocated memory after creating 100000 objects with 8 fields each. Bar chart (axis from 0 MB to 150 MB) comparing OldStyleClass, NewStyleClass, DictData, NamedTuple, TupleData, ListaData and NewStyleClassWithSlots under Python 2.x and Python 3.x.


Notes on garbage collector, reference count and cycles

Python garbage collector:
- Uses reference counting.
- Offers cycle detection.
- Objects are garbage-collected when the count goes to 0.
- Reference increment, e.g.: object creation, additional aliases, passed to a function.
- Reference decrement, e.g.: local reference goes out of scope, alias is destroyed, alias is reassigned.

Warning – from the documentation:
Objects that have __del__() methods and are part of a reference cycle cause the entire reference cycle to be uncollectable!
- Python doesn't collect such cycles automatically.
- It is not possible for Python to guess a safe order in which to run the __del__() methods.
- This describes CPython before 3.4; since Python 3.4 (PEP 442), cycles containing objects with __del__() methods can be collected.
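Both mechanisms can be watched from Python 3 with the standard sys, gc and weakref modules. The Node class below is a made-up example type. The sketch disables the collector briefly so the cycle is guaranteed to survive until the explicit gc.collect() call.

```python
import gc
import sys
import weakref

class Node:
    """Small object that can point at another Node."""
    def __init__(self):
        self.other = None

# Reference counting: each extra alias raises the count by one.
n = Node()
before = sys.getrefcount(n)
alias = n
after = sys.getrefcount(n)
del alias

# A cycle keeps both counts above zero even when no name refers to it.
gc.disable()                      # keep the collector from running mid-demo
a, b = Node(), Node()
a.other, b.other = b, a
probe = weakref.ref(a)            # lets us observe a without keeping it alive
del a, b
alive_before_collect = probe() is not None
gc.collect()                      # cycle detection frees the pair
alive_after_collect = probe() is not None
gc.enable()

print(after - before, alive_before_collect, alive_after_collect)
```

Reference counting alone never frees the pair, because each node still holds a reference to the other; only the cycle detector reclaims them.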


Tools

- psutil
- memory_profiler
- objgraph
- Meliae (could be combined with runsnakerun)
- Heapy


Tools – psutil

psutil – A cross-platform process and system utilities module for Python.

    import psutil
    import os
    ...
    p = psutil.Process(os.getpid())
    pinfo = p.as_dict()
    ...
    print pinfo['memory_percent'],
    print pinfo['memory_info'].rss, pinfo['memory_info'].vms

Listing 9: psutil usage


Tools – memory profiler

memory_profiler – a module for monitoring memory usage of a Python program.

- Recommended dependency: psutil.
- May work as:
  - Line-by-line profiler.
  - Memory usage monitoring (memory in time).
  - Debugger trigger – setting debugger breakpoints.


memory profiler – Line-by-line profiler

Preparation:
To track particular functions use the profile decorator.

Running:

    python -m memory_profiler

    Line #    Mem usage    Increment   Line Contents
    ================================================
        45    9.512 MiB    0.000 MiB   @profile
        46                             def create_lot_of_stuff(times=10000, cl=OldStyleClass):
        47    9.516 MiB    0.004 MiB       ret = []
        48    9.516 MiB    0.000 MiB       t = "foo %d"
        49  156.449 MiB  146.934 MiB       for i in xrange(times):
        50  156.445 MiB   -0.004 MiB           l = [t % (j + i%8) for j in xrange(8)]
        51  156.449 MiB    0.004 MiB           c = cl(*l)
        52  156.449 MiB    0.000 MiB           ret.append(c)
        53  156.449 MiB    0.000 MiB       return ret

Listing 10: Results


memory profiler – memory usage monitoring

Preparation:
To track particular functions use the profile decorator.

Running and plotting:

    mprof run --python python uniwerse.py -f 100 100 -s 100 100 10
    mprof plot

Figure: Results


memory profiler – Debugger trigger

    eror@eror-laptop:~$ python -m memory_profiler --pdb-mmem=10 uniwerse.py -s 100 100 10
    Current memory 20.80 MiB exceeded the maximum of 10.00 MiB
    Stepping into the debugger
    > /home/eror/uniwerse.py(52)connect()
    -> self.adj.append(n)
    (Pdb)

Listing 11: Debugger trigger – setting debugger breakpoints.


Tools – objgraph

objgraph – draws Python object reference graphs with graphviz.

    import objgraph
    x = []
    y = [x, [x], dict(x=x)]
    objgraph.show_refs([y], filename='sample-graph.png')
    objgraph.show_backrefs([x], filename='sample-backref-graph.png')

Listing 12: Tutorial example

Figure: Reference graph
Figure: Back reference graph


Tools – Heapy/Meliae

Heapy:
- The heap analysis toolset. It can be used to find information about the objects in the heap and display the information in various ways.
- Part of "Guppy-PE – A Python Programming Environment".

Meliae:
- Python Memory Usage Analyzer.
- "This project is similar to heapy (in the 'guppy' project), in its attempt to understand how memory has been allocated."
- runsnakerun GUI support.


Tools – Heapy

    from guppy import hpy
    hp = hpy()
    h1 = hp.heap()
    l = [range(i) for i in xrange(2**10)]
    h2 = hp.heap()
    print h2 - h1

Listing 13: Heapy example

    Partition of a set of 294937 objects. Total size = 11538088 bytes.
     Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
         0 293899 100  7053576  61    7053576  61 int
         1   1025   0  4481544  39   11535120 100 list
         2      6   0     1680   0   11536800 100 dict (no owner)
         3      2   0      560   0   11537360 100 dict of guppy.etc.Glue.Owner
         4      1   0      456   0   11537816 100 types.FrameType
         5      2   0      144   0   11537960 100 guppy.etc.Glue.Owner
         6      2   0      128   0   11538088 100 str

Listing 14: Results


Meliae and runsnakerun

    from meliae import scanner
    scanner.dump_all_objects("representation_meliae.dump")
    # In shell: runsnakemem representation_meliae.dump

Listing 15: Meliae example

Figure: Meliae and runsnakerun


malloc() alternatives – libjemalloc and libtcmalloc

Pros:
- In some cases using a different malloc() implementation "may" help to return memory from CPython back to the system.

Cons:
- But equally it may work against you.

    $ LD_PRELOAD="/usr/lib/libjemalloc.so.1" python int_float_alloc.py
    $ LD_PRELOAD="/usr/lib/libtcmalloc_minimal.so.4" python int_float_alloc.py

Listing 16: Changing memory allocator


malloc() alternatives – libjemalloc and libtcmalloc

Step      malloc             jemalloc            tcmalloc
          res      virt      res      virt       res      virt
step 1    7.4M     46.5M     8.0M     56.9M      9.4M     56.1M
step 2    40.0M    79.1M     41.6M    88.9M      42.5M    89.3M
step 3    16.2M    55.3M     8.2M     88.9M      42.5M    89.3M
step 4    40.0M    84.3M     41.5M    100.9M     51.5M    98.4M
step 5    8.2M     47.3M     8.5M     100.9M     51.5M    98.4M


Other useful tools

- Build Python in debug mode (./configure --with-pydebug ...):
  - Maintains a list of all active objects.
  - Upon exit (or after every statement in interactive mode), prints all existing references.
  - Tracks total allocation.
- valgrind – a programming tool for memory debugging, leak detection, and profiling. Rather low level.
  - CPython can cooperate with valgrind (for >= py-2.7, py-3.2).
- gdb-heap (gdb extension):
  - low level, still experimental
  - can be attached to running processes
  - may be used with a core file
- Web applications memory leaks:
  - dowser – cherrypy application that displays sparklines of Python object counts.
  - dozer – WSGI middleware version of the cherrypy memory leak debugger (any WSGI application).


Summary

Summary:
- Try to better understand the underlying memory model.
- Pay attention to hot spots.
- Use profiling tools.
- "Seek and destroy" – find the root cause of the memory leak and fix it ;)

Quick and sometimes dirty solutions:
- Delegate memory-intensive work to another process.
- Regularly restart the process.
- Go for low-hanging fruit (e.g. __slots__, different allocators).


References

- Wesley J. Chun, Principal CyberWeb Consulting, "Python 103... MMMM: Understanding Python's Memory Model, Mutability, Methods".
- David Malcolm, Red Hat, "Dude – Where's My RAM?" A deep dive into how Python uses memory.
- Evan Jones, Improving Python's Memory Allocator.
- Alexander Slesarev, Memory reclaiming in Python.
- Source code of Python.
- Tools documentation.
