The Data Structures of Python
8/7/2019 The Data Structures of Python
http://slidepdf.com/reader/full/the-data-structures-of-python 1/47
Data structures and Python
Alex Gaynor
PyCon 2011
Monday, April 11, 2011
Who here remembers
their CS data structures class?
Who cares?
“We read Knuth so you don't have to.”
- Tim Peters
If this isn’t CS102, what is it?
Why Python is awesome
As seen through data structures
I. What’s built in, and how do we use them?
II. Reaching for a little more
III. A little “do it yourself”
The builtins
• list
• tuple
• dict
• set
• frozenset
A brief interlude
Big O Notation
(in 30 seconds)
• How efficient an operation is, in terms of the number of items.
• O(1) - does the same number of operations, regardless of how many items there are
• O(n) - does a constant number of operations per item, so the total work grows in step with the number of items
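To make the two growth rates concrete, here is a small sketch that counts steps instead of timing them (the helper names `index_steps` and `scan_steps` are made up for illustration): indexing a list costs one step however long it is, while a linear scan may have to touch every item.

```python
def index_steps(items):
    """Fetching items[0] is O(1): one step, however long the list is."""
    _ = items[0]
    return 1


def scan_steps(items, target):
    """A linear search is O(n): in the worst case it looks at every item."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps


for size in (10, 1000):
    data = list(range(size))
    # Worst case for the scan: the target is the very last item.
    print(size, index_steps(data), scan_steps(data, size - 1))
```

Growing the list 100x leaves the indexing cost unchanged but multiplies the worst-case scan cost by 100.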
Now where were we?
Awesome cars
lists, tuples, sets,
frozensets, and camels
• You might notice I didn’t include the dict, and it’s not just because camels are awesome.
• dicts are mappings (keys to values); these are all plain collections of items (strictly, only lists and tuples are sequences: sets have no order)
• That doesn’t make them interchangeable
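One way to see the mapping/collection split is to ask the abstract base classes in `collections.abc` (available in that module since Python 3.3). This sketch simply checks each builtin against three of them; the `examples` dict is just scaffolding for the loop.

```python
from collections.abc import Mapping, Sequence, Set

examples = {"list": [], "tuple": (), "set": set(), "frozenset": frozenset(), "dict": {}}

for name, obj in examples.items():
    print(
        f"{name:10}",
        "Sequence" if isinstance(obj, Sequence) else "-",
        "Set" if isinstance(obj, Set) else "-",
        "Mapping" if isinstance(obj, Mapping) else "-",
    )
```

Lists and tuples report as sequences, sets and frozensets as sets, and only dict as a mapping.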
The pairings
• list vs tuple
• list vs set
• set vs frozenset
list vs tuple
• The obvious: lists are mutable and tuples aren’t
• So...
First the semantics
• I have a pretty simple rule about when to use tuples: only if using a namedtuple would be equally appropriate
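A sketch of that rule, using a hypothetical `Stock` record: a tuple earns its place when each position has its own meaning, which is exactly what `namedtuple` lets you name. A list, by contrast, holds any number of the same kind of thing.

```python
from collections import namedtuple

# Hypothetical record type: position 0 is the ticker, position 1 the company name.
Stock = namedtuple("Stock", ["symbol", "name"])

plain = ("GOOG", "Google")       # a plain tuple: fields known only by position
named = Stock("GOOG", "Google")  # same data, with the fields named

# A namedtuple is still a tuple, so the two compare equal.
print(named == plain, named.symbol, named[1])

# A list is for "more of the same": any length, every item plays the same role.
tickers = ["GOOG", "T", "AAPL"]
tickers.append("MSFT")
```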
class ArticleModelAdmin(admin.ModelAdmin):
    list_display = ["title", "author", "published", "get_absolute_url"]

class ArticleModelAdmin(admin.ModelAdmin):
    list_display = ("title", "author", "published", "get_absolute_url")
It’s a 2 character difference...
Which is right?
stocks = [
    ("GOOG", "Google"),
    ("T", "AT&T"),
    ("AAPL", "Apple"),
]
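Here each entry is a fixed (symbol, name) pair, so a tuple fits, while the collection of entries can grow, so a list fits. A minimal sketch of what that shape buys you: clean tuple unpacking, and `dict()` accepts a list of pairs directly.

```python
stocks = [
    ("GOOG", "Google"),
    ("T", "AT&T"),
    ("AAPL", "Apple"),
]

# Tuple unpacking reads naturally because every entry has the same shape.
for symbol, name in stocks:
    print(f"{symbol}: {name}")

# A list of 2-tuples converts straight into a symbol -> name lookup table.
names = dict(stocks)
print(names["T"])  # AT&T
```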
And sometimes you
don’t have a choice
def memoize(func):
    cache = {}
    def inner(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return inner
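The reason memoize has no choice: `*args` always arrives as a tuple, and dict keys must be hashable. A quick check (the `is_hashable` helper is just for illustration) shows which containers qualify.

```python
def is_hashable(value):
    """Return True if value can be used as a dict key or set member."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(is_hashable((1, 2)))          # True: a tuple of hashable items
print(is_hashable([1, 2]))          # False: lists are mutable
print(is_hashable((1, [2])))        # False: a tuple is only hashable if its contents are
print(is_hashable(frozenset([1])))  # True
```

The same rule means that calling a memoized function with a list argument raises `TypeError` at the `args not in cache` check.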
The 2 Commandments
(so far)
1. Use types idiomatically.
2. Sometimes you don’t get a choice.
sets vs lists
• lists have an order, sets don’t
• list items can be anything, set items must be hashable
• Computational complexity: the same operations cost very different amounts
def remove_dupes(seq):
    seen = set()
    items = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            items.append(item)
    return items
O(n)
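For contrast, a sketch of the same function using the output list for the membership check: each `in` becomes an O(n) scan, so the whole function is O(n²). On Python 3.7+, where dicts keep insertion order, `dict.fromkeys` gives the same order-preserving result in one line. Both function names here are hypothetical.

```python
def remove_dupes_slow(seq):
    items = []
    for item in seq:
        if item not in items:  # O(n) scan of a list, done n times: O(n**2) overall
            items.append(item)
    return items


def remove_dupes_dict(seq):
    # dict keys are unique and (since Python 3.7) keep insertion order
    return list(dict.fromkeys(seq))


data = [3, 1, 3, 2, 1]
print(remove_dupes_slow(data), remove_dupes_dict(data))  # [3, 1, 2] [3, 1, 2]
```

The slow version does keep one advantage: it also works for unhashable items such as lists.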
Operation     list   set
append/add    O(1)   O(1)
in/not in     O(n)   O(1)
remove        O(n)   O(1)
3 Commandments
(and counting...)
1. Use types idiomatically.
2. Sometimes you don’t get a choice.
3. Be efficient, when it doesn’t cost you
anything.
set vs frozenset
• Mutable vs immutable
class Lexer(object):
    keywords = frozenset([
        "for", "if", "while",  # etc...
    ])
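A short sketch of why a lexer's keyword table suits a frozenset: membership is the same average O(1) hash lookup as a set, the keywords can't be changed by accident, and because a frozenset is hashable it can itself be a dict key. The `is_keyword` helper and `grammar` dict are hypothetical.

```python
KEYWORDS = frozenset(["for", "if", "while"])


def is_keyword(word):
    # Average-case O(1) membership test, same as a regular set
    return word in KEYWORDS


print(is_keyword("while"), is_keyword("whale"))  # True False

# No mutating methods: frozenset has no add(), discard(), and so on.
print(hasattr(KEYWORDS, "add"))  # False

# Hashable, so it can be a dict key (a regular set cannot).
grammar = {KEYWORDS: "reserved words"}
print(grammar[frozenset(["if", "while", "for"])])  # equal frozensets hash equally
```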
Why no frozendict?
• lists have tuples, sets have frozensets: dict is the only builtin container without an immutable partner
• Not a ton of use cases
• Semantics might surprise people: do
both the key and the value need to be immutable?
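The closest thing in the standard library (Python 3.3+) is `types.MappingProxyType`, which is a read-only *view* of a dict rather than a true frozendict. A small sketch with made-up settings shows the semantic puzzle from the slide: keys can't be reassigned through the view, but a mutable value is still mutable.

```python
from types import MappingProxyType

_settings = {"debug": False, "plugins": ["auth"]}
settings = MappingProxyType(_settings)  # read-only view, not a copy

try:
    settings["debug"] = True
except TypeError as exc:
    print("refused:", exc)

# The view is shallow: a mutable value can still be changed in place...
settings["plugins"].append("cache")
print(settings["plugins"])  # ['auth', 'cache']

# ...and changes to the underlying dict show through the view.
_settings["debug"] = True
print(settings["debug"])  # True
```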
It’s kind of amazing
how far you can go with just those
Or: what to do if you need a little more
The stdlib
collections.OrderedDict
• New in 2.7/3.1
• Previous implementations existed in
tons of 3rd party libs
• For when you’ve got a dict that needs to remember the order its keys were inserted in!
class Food(Model):
    name = Field()
    kind = Field()
    delicious = Field()
Solution!
OrderedDict([
    ("name", Field()),
    ("kind", Field()),
    ("delicious", Field()),
])
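A small sketch of what OrderedDict gives you, with strings standing in for the slide's `Field()` objects. Worth knowing today: since Python 3.7 plain dicts also keep insertion order, but OrderedDict still has order-aware extras such as `move_to_end()` and order-sensitive equality.

```python
from collections import OrderedDict

fields = OrderedDict([
    ("name", "text"),
    ("kind", "text"),
    ("delicious", "bool"),
])
print(list(fields))  # ['name', 'kind', 'delicious']

fields.move_to_end("name")  # reorder in place (available since Python 3.2)
print(list(fields))  # ['kind', 'delicious', 'name']

# Equality between two OrderedDicts takes order into account; plain dicts ignore it.
print(OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1))  # False
print(dict(a=1, b=2) == dict(b=2, a=1))  # True
```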
4 Articles of the data
structure constitution
1. Use types idiomatically.
2. Sometimes you don’t get a choice.
3. Be efficient, when it doesn’t cost you anything.
4. Sometimes you have more than one concern to deal with. The standard lib
can help!
collections.deque
• OK, I lied: I do use linked lists sometimes.
• Fact: list.pop(0) and list.insert(0, x) are
slow: O(n), because every other item has to shift over.
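A sketch of a first-in, first-out queue on a deque: both ends support O(1) appends and pops, which is exactly where a list is slow.

```python
from collections import deque

queue = deque(["a", "b", "c"])
queue.append("d")        # add at the right end, O(1)
first = queue.popleft()  # remove from the left end, O(1) (list.pop(0) is O(n))
queue.appendleft("z")    # add at the left end, O(1) (list.insert(0, x) is O(n))

print(first)        # a
print(list(queue))  # ['z', 'b', 'c', 'd']
```

The trade-off runs the other way for indexing: reaching into the middle of a deque is O(n), whereas any list index is O(1).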
Also a really nifty ring
buffer
• Since 2.6, takes a maxlen parameter
• You keep on appending, and it never gets too big: the oldest items fall off the other end.
• Good for in-memory logs and such.
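A sketch of the in-memory log idea, with made-up log messages: once `maxlen` is reached, each new append silently discards the oldest entry.

```python
from collections import deque

recent = deque(maxlen=3)  # keep only the 3 newest entries
for i in range(5):
    recent.append(f"event {i}")

print(list(recent))   # ['event 2', 'event 3', 'event 4']
print(recent.maxlen)  # 3
```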
See also
• The rest of collections
• array
• heapq (sort of)
• ... and more
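heapq is not a type of its own: it is a set of functions that keep an ordinary list in heap order. A sketch of a priority queue built that way, with made-up tasks:

```python
import heapq

tasks = []  # a plain list, kept in heap order by the heapq functions
heapq.heappush(tasks, (2, "write slides"))
heapq.heappush(tasks, (1, "book flight"))
heapq.heappush(tasks, (3, "give talk"))

# heappop always returns the smallest item; push and pop are O(log n)
print(heapq.heappop(tasks))  # (1, 'book flight')
print(tasks[0])              # (2, 'write slides'), the smallest remaining item

print(heapq.nsmallest(2, [5, 1, 4, 2]))  # [1, 2]
```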
DIY
collections.abc
Or: don’t subclass dict
ever
>>> class SpecialDict(dict):
...     def __getitem__(self, key):
...         return 42
...
>>> SpecialDict()["a"]
42
>>> SpecialDict().get("a")
>>> SpecialDict().get("a", 12)
12
• Subclassing Python’s builtin containers tends not to produce the
results we want or expect: methods like get() don’t go through our overridden __getitem__.
• Subclassing the ABCs does.
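The same trap shows up with `__setitem__`. In this sketch (both class names are hypothetical), dict's own constructor and `update()` store items directly and skip the override, while `collections.abc.MutableMapping` implements `update()` in terms of `__setitem__`, so the override is honored everywhere.

```python
from collections.abc import MutableMapping


class UpperDict(dict):
    """Subclasses dict: update() and __init__ silently skip this override."""
    def __setitem__(self, key, value):
        super().__setitem__(key, value.upper())


class UpperMapping(MutableMapping):
    """Built on the ABC: the mixin update() routes through our __setitem__."""
    def __init__(self, *args, **kwargs):
        self._data = {}
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value.upper()

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


for cls in (UpperDict, UpperMapping):
    d = cls(a="x")
    d["b"] = "y"
    d.update(c="z")
    print(cls.__name__, dict(d))
# UpperDict {'a': 'x', 'b': 'Y', 'c': 'z'}
# UpperMapping {'a': 'X', 'b': 'Y', 'c': 'Z'}
```

The ABC version costs five short methods, but every inherited method (`update`, `setdefault`, `get`, `pop`, ...) is then consistent with them.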
>>> from collections.abc import Mapping
>>> class SpecialDict(Mapping):
...     def __getitem__(self, key):
...         return 42
...
>>> SpecialDict()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class SpecialDict with abstract methods __iter__, __len__
A tiny bit of checking for us
>>> from collections.abc import Mapping
>>> class SpecialDict(Mapping):
...     def __len__(self):
...         return 0
...     def __iter__(self):
...         return iter([])
...     def __getitem__(self, key):
...         return 72  # Best number evar!
...
>>> SpecialDict()["a"]
72
>>> SpecialDict().get("a")
72
Hallelujah!
I see your OrderedDict
and raise you:
OrderedSet
• Ordered collection, with that O(1) “in”
check we want
• But Python doesn’t include one,
whatever will I do!
import collections.abc  # collections.MutableSet was removed in Python 3.10

KEY, PREV, NEXT = range(3)

class OrderedSet(collections.abc.MutableSet):

    def __init__(self, iterable=None):
        self.end = end = []
        end += [None, end, end]  # sentinel node for doubly linked list
        self.map = {}            # key --> [key, prev, next]
        if iterable is not None:
            self |= iterable

    def __len__(self): return len(self.map)

    def __contains__(self, key): return key in self.map

    def add(self, key):
        if key not in self.map:
            end = self.end
            curr = end[PREV]
            curr[NEXT] = end[PREV] = self.map[key] = [key, curr, end]

    def discard(self, key):
        if key in self.map:
            key, prev, next = self.map.pop(key)
            prev[NEXT] = next
            next[PREV] = prev

    def __iter__(self):
        end = self.end
        curr = end[NEXT]
        while curr is not end:
            yield curr[KEY]
            curr = curr[NEXT]

    def __reversed__(self):
        end = self.end
        curr = end[PREV]
        while curr is not end:
            yield curr[KEY]
            curr = curr[PREV]

    def pop(self, last=True):
        if not self:
            raise KeyError('set is empty')
        key = next(reversed(self)) if last else next(iter(self))
        self.discard(key)
        return key

    def __repr__(self):
        if not self:
            return '%s()' % (self.__class__.__name__,)
        return '%s(%r)' % (self.__class__.__name__, list(self))

    def __eq__(self, other):
        if isinstance(other, OrderedSet):
            return len(self) == len(other) and list(self) == list(other)
        return set(self) == set(other)

    def __del__(self):
        self.clear()  # remove circular references
http://code.activestate.com/recipes/576694/
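On Python 3.7 and later, where plain dicts remember insertion order, the same ABC trick allows a much shorter sketch. This is not the recipe above: `DictOrderedSet` is a hypothetical name, it has no `__reversed__`, and its inherited `pop()` removes the oldest item rather than the newest.

```python
from collections.abc import MutableSet


class DictOrderedSet(MutableSet):
    """Insertion-ordered set backed by a dict (relies on Python 3.7+ dict ordering)."""

    def __init__(self, iterable=()):
        # dict.fromkeys drops duplicates but keeps first-seen order
        self._items = dict.fromkeys(iterable)

    def __contains__(self, item):
        return item in self._items  # average O(1), like a set

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        self._items[item] = None  # re-adding an existing item keeps its position

    def discard(self, item):
        self._items.pop(item, None)

    def __repr__(self):
        return f"{type(self).__name__}({list(self._items)!r})"


s = DictOrderedSet("abracadabra")
print(s)                  # DictOrderedSet(['a', 'b', 'r', 'c', 'd'])
s.discard("b")
s.add("a")
print(list(s | {"z"}))   # ['a', 'r', 'c', 'd', 'z']
```

Only five methods are written by hand; union, intersection, `|=`, `pop()`, `clear()` and comparisons all come from the MutableSet mixins.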
5 tablets of data
structures from on high
1. Use types idiomatically.
2. Sometimes you don’t get a choice.
3. Be efficient, when it doesn’t cost you anything.
4. Sometimes you have more than one concern
to deal with. The standard lib can help!
5. Don’t do more than you have to: ABCs are
there to help.
Questions? Comments? Thrown fruit?
http://alexgaynor.net