The Data Structures of Python

Data structures and Python Alex Gaynor PyCon 2011 Monday, April 11, 2011


Data structures and Python

Alex Gaynor

PyCon 2011

Monday, April 11, 2011


Who here remembers

their CS data structures class?


Who cares?


“We read Knuth so you don't have to.”

- Tim Peters


If this isn’t CS102, what is it?


Why Python is awesome

As seen through data structures


I. What’s builtin, and how do we use them?

II. Reaching for a little more

III. A little “do it yourself”


The builtins

• list

• tuple

• dict

• set

• frozenset


A brief interlude


Big O Notation

(in 30 seconds)

• How efficient an operation is, in terms of the number of items.

• O(1) - does the same number of operations, regardless of how many items

• O(n) - does a constant number of operations per item
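As a rough illustration of the two cases above, the sketch below counts steps by hand. The helper names `first_item` and `linear_search` are made up for this example and are not from the talk.

```python
def first_item(items):
    """O(1): a single step, however long the list is."""
    return items[0], 1


def linear_search(items, target):
    """O(n): in the worst case, one comparison per item."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps


small = list(range(10))
large = list(range(10000))

# Indexing costs the same single step for both lists.
small_first, small_steps = first_item(small)
large_first, large_steps = first_item(large)

# Searching for the last item touches every element.
small_search = linear_search(small, 9)
large_search = linear_search(large, 9999)
```

The step count for indexing stays flat as the list grows, while the search count grows in proportion to the list.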


Now where were we?


Awesome cars


lists, tuples, sets, frozensets, and camels

• You might notice I didn’t include the dict, and it’s not just because camels are awesome.

• dicts are a mapping; these are all sequences (loosely speaking: collections of individual items)

• That doesn’t make them interchangeable


The pairings

• list vs tuple

• list vs set

• set vs frozenset


list vs tuple

• The obvious: lists are mutable and tuples aren’t

• So...


First the semantics

• I have a pretty simple rule about when to use tuples: only if using a namedtuple would be equally appropriate


class ArticleModelAdmin(admin.ModelAdmin):
    list_display = ["title", "author", "published", "get_absolute_url"]

class ArticleModelAdmin(admin.ModelAdmin):
    list_display = ("title", "author", "published", "get_absolute_url")

It’s a 2 character difference...

Which is right?


stocks = [
    ("GOOG", "Google"),
    ("T", "AT&T"),
    ("AAPL", "Apple"),
]
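One way to read the namedtuple rule against this stocks example: the outer list is a growable collection of like items, and each inner tuple is a fixed-shape record. The sketch below spells that out; the `Stock` type, its field names, and the extra "MSFT" entry are assumptions added for illustration.

```python
from collections import namedtuple

# Hypothetical record type for the (symbol, name) pairs.
Stock = namedtuple("Stock", ["symbol", "name"])

stocks = [
    Stock("GOOG", "Google"),
    Stock("T", "AT&T"),
    Stock("AAPL", "Apple"),
]

# The list is a homogeneous collection that can grow...
stocks.append(Stock("MSFT", "Microsoft"))

# ...while each record has a fixed shape and still unpacks like a tuple.
symbol, name = stocks[0]
names = [stock.name for stock in stocks]
```

If giving the fields names feels natural, as it does here, a tuple is probably the right shape; if it feels forced, a list is likely the better fit.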


And sometimes you don’t have a choice

def memoize(func):
    cache = {}
    def inner(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return inner
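A short usage sketch of the decorator may help show why there is no choice here: `*args` arrives as a tuple, and only hashable objects can be dict keys. The `square` function and the `calls` list are made up for this example, and `functools.wraps` is an addition that the slide does not use.

```python
import functools


def memoize(func):
    cache = {}

    @functools.wraps(func)
    def inner(*args):
        # args is a tuple, so it can be used as a dict key.
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return inner


calls = []


@memoize
def square(n):
    calls.append(n)  # record each real computation
    return n * n


first = square(4)
second = square(4)  # served from the cache; the body does not run again

# A list cannot be a dict key, which is why a tuple is required.
try:
    {}[[4]] = 16
    list_key_ok = True
except TypeError:
    list_key_ok = False
```

Note that this simple cache only works when every argument is itself hashable.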


The 2 Commandments

(so far)

1. Use types idiomatically.

2. Sometimes you don’t get a choice.


sets vs lists

• lists have an order, sets don’t

• list items can be anything, set items must be hashable

• Computational complexity 


def remove_dupes(seq):
    seen = set()
    items = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            items.append(item)
    return items

O(n)


              list    set
append/add    O(1)    O(1)
in/not in     O(n)    O(1)
remove        O(n)    O(1)
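The "in" row of the table can be observed directly by counting equality comparisons instead of timing anything. This is a sketch under assumptions: the `Counted` wrapper class is hypothetical, and the exact count for the set depends on hash collisions, so only the list count is treated as exact.

```python
class Counted:
    """Hypothetical wrapper that counts how often == is evaluated."""
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.eq_calls += 1
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


n = 1000
items = [Counted(i) for i in range(n)]
as_set = set(items)
target = Counted(n - 1)  # equal to the last item, but a distinct object

# List membership compares against items one by one.
Counted.eq_calls = 0
found_in_list = target in items
list_eq_calls = Counted.eq_calls

# Set membership hashes the target and compares only matching candidates.
Counted.eq_calls = 0
found_in_set = target in as_set
set_eq_calls = Counted.eq_calls
```

Both lookups find the target, but the list needs one comparison per item while the set needs only a handful at most.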


3 Commandments

(and counting...)

1. Use types idiomatically.

2. Sometimes you don’t get a choice.

3. Be efficient, when it doesn’t cost you anything.


set vs frozenset

• Mutable vs immutable


class Lexer(object):
    keywords = frozenset([
        "for", "if", "while", # etc...
    ])
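To make the practical difference concrete, here is a small sketch of what a frozenset allows and forbids. The variable names and the `groups` dict are assumptions for illustration only.

```python
keywords = frozenset(["for", "if", "while"])

# Membership tests work exactly as with a set.
is_keyword = "while" in keywords

# A frozenset is hashable, so it can be a dict key or a set member.
groups = {frozenset(["a", "b"]): "first pair"}
lookup = groups[frozenset(["b", "a"])]  # element order is irrelevant

# There is no add() or remove(), so mutation attempts fail.
try:
    keywords.add("def")
    mutated = True
except AttributeError:
    mutated = False

# A plain set, by contrast, is unhashable.
try:
    hash(set(["a"]))
    set_hashable = True
except TypeError:
    set_hashable = False
```

For a class-level constant like the lexer's keywords, the frozenset also signals that nobody is expected to modify it.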


Why no frozendict?

• lists and tuples, sets and frozensets

• Not a ton of use cases

• Semantics might surprise people: do both the key and the value need to be immutable?


It’s kind of amazing how far you can go with just those

Or: what to do if you need a little more


The stdlib


collections.OrderedDict

• New in 2.7/3.1

• Previous implementations existed in

tons of 3rd party libs

• For when you’ve got a dict that has order!


class Food(Model):
    name = Field()
    kind = Field()
    delicious = Field()


Solution!

OrderedDict([
    ("name", Field()),
    ("kind", Field()),
    ("delicious", Field()),
])
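A runnable sketch of the same idea, with plain strings standing in for the `Field()` objects (the string values are placeholders, not part of the talk). It is worth knowing that plain dicts also preserve insertion order since Python 3.7, but OrderedDict still differs in that its equality check is order-sensitive.

```python
from collections import OrderedDict

fields = OrderedDict([
    ("name", "Field A"),
    ("kind", "Field B"),
    ("delicious", "Field C"),
])

# Iteration follows insertion order.
field_names = list(fields)

# Two OrderedDicts compare equal only if the order matches too.
reordered = OrderedDict([
    ("kind", "Field B"),
    ("name", "Field A"),
    ("delicious", "Field C"),
])
same_as_ordered = fields == reordered

# Converted to plain dicts, the same contents compare equal.
same_as_plain = dict(fields) == dict(reordered)
```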


4 Articles of the data 

structure constitution

1. Use types idiomatically.

2. Sometimes you don’t get a choice.

3. Be efficient, when it doesn’t cost you anything.

4. Sometimes you have more than one concern to deal with. The standard lib can help!


collections.deque

• Ok I lied, I do use linked lists sometimes.

• Fact: list.pop(0) and list.insert(0, x) are slow: each one is O(n), because every remaining item has to shift.
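A minimal sketch of the deque operations that replace those slow list calls. The queue contents are arbitrary example values.

```python
from collections import deque

queue = deque(["b", "c"])

# Both ends support O(1) insertion and removal.
queue.appendleft("a")    # instead of list.insert(0, "a")
queue.append("d")
first = queue.popleft()  # instead of list.pop(0)
last = queue.pop()

remaining = list(queue)
```

The trade-off is that indexing into the middle of a deque is slower than with a list, so it suits queue-like access rather than random access.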


Also a really nifty ring buffer

• Since 2.6 the constructor takes a maxlen parameter

• You keep on appending, it never gets too big.

• Good for in memory logs and such.
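The in-memory log case could look like the sketch below. The "event N" strings and the buffer size of 3 are arbitrary choices for the example.

```python
from collections import deque

# Keep only the three most recent log lines.
recent = deque(maxlen=3)

for i in range(1, 6):
    recent.append("event %d" % i)

# Older entries were dropped from the left as new ones arrived.
snapshot = list(recent)
```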


See also

• The rest of collections

• array 

• heapq (sort of)

• ... and more
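For a quick taste of a few of these, the sketch below touches `heapq` plus two other members of `collections`. Picking `Counter` and `defaultdict` as "the rest of collections" is an assumption about what the slide had in mind.

```python
import heapq
from collections import Counter, defaultdict

# heapq keeps a plain list arranged as a min-heap.
heap = []
for priority in [5, 1, 4, 2]:
    heapq.heappush(heap, priority)
smallest = heapq.heappop(heap)

# Counter tallies hashable items.
letter_counts = Counter("banana")

# defaultdict creates missing values on first access.
by_length = defaultdict(list)
for word in ["set", "dict", "list"]:
    by_length[len(word)].append(word)
```

The "sort of" next to heapq fits: it is a set of functions operating on an ordinary list rather than a container type of its own.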


DIY 


collections.abc


Or: don’t subclass dict ever

>>> class SpecialDict(dict):
...     def __getitem__(self, key):
...         return 42
...
>>> SpecialDict()["a"]
42
>>> SpecialDict().get("a")
>>> SpecialDict().get("a", 12)
12


• Subclassing Python’s builtin containers tends not to produce the results we want or expect.

• Subclassing the ABCs does.


>>> from collections.abc import Mapping  # collections.Mapping before Python 3.3
>>>
>>> class SpecialDict(Mapping):
...     def __getitem__(self, key):
...         return 42
...
>>> SpecialDict()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class SpecialDict with abstract methods __iter__, __len__

A tiny bit of checking for us


>>> from collections.abc import Mapping  # collections.Mapping before Python 3.3
>>> class SpecialDict(Mapping):
...     def __len__(self):
...         return 0
...     def __iter__(self):
...         return iter([])
...     def __getitem__(self, key):
...         return 72  # Best number evar!
...
>>> SpecialDict()["a"]
72
>>> SpecialDict().get("a")
72

Hallelujah!


I see your OrderedDict

and raise you:

OrderedSet

• Ordered collection, with that O(1) “in”

check we want

• But Python doesn’t include one, whatever will I do!


import collections.abc  # the ABCs lived directly in collections on Python 2

KEY, PREV, NEXT = range(3)

class OrderedSet(collections.abc.MutableSet):

    def __init__(self, iterable=None):
        self.end = end = []
        end += [None, end, end]  # sentinel node for doubly linked list
        self.map = {}            # key --> [key, prev, next]
        if iterable is not None:
            self |= iterable

    def __len__(self):
        return len(self.map)

    def __contains__(self, key):
        return key in self.map

    def add(self, key):
        # Link a new node in just before the sentinel, i.e. at the end.
        if key not in self.map:
            end = self.end
            curr = end[PREV]
            curr[NEXT] = end[PREV] = self.map[key] = [key, curr, end]

    def discard(self, key):
        # Unlink the node by joining its two neighbours.
        if key in self.map:
            key, prev, next = self.map.pop(key)
            prev[NEXT] = next
            next[PREV] = prev

    def __iter__(self):
        end = self.end
        curr = end[NEXT]
        while curr is not end:
            yield curr[KEY]
            curr = curr[NEXT]

    def __reversed__(self):
        end = self.end
        curr = end[PREV]
        while curr is not end:
            yield curr[KEY]
            curr = curr[PREV]

    def pop(self, last=True):
        if not self:
            raise KeyError('set is empty')
        key = next(reversed(self)) if last else next(iter(self))
        self.discard(key)
        return key

    def __repr__(self):
        if not self:
            return '%s()' % (self.__class__.__name__,)
        return '%s(%r)' % (self.__class__.__name__, list(self))

    def __eq__(self, other):
        if isinstance(other, OrderedSet):
            return len(self) == len(other) and list(self) == list(other)
        return set(self) == set(other)

    def __del__(self):
        self.clear()  # remove circular references

http://code.activestate.com/recipes/576694/
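For comparison, here is a much shorter sketch of the same interface. It is not the recipe's linked-list technique: it swaps in a plain dict as storage and relies on dicts preserving insertion order, which is only guaranteed from Python 3.7 on. The class name `DictOrderedSet` is hypothetical.

```python
from collections.abc import MutableSet


class DictOrderedSet(MutableSet):
    """Hypothetical ordered set that stores its items as dict keys."""

    def __init__(self, iterable=()):
        # dict.fromkeys drops duplicates and keeps first-seen order.
        self._items = dict.fromkeys(iterable)

    def __contains__(self, key):
        return key in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, key):
        self._items[key] = None

    def discard(self, key):
        self._items.pop(key, None)

    def __repr__(self):
        return "%s(%r)" % (type(self).__name__, list(self))


letters = DictOrderedSet("abracadabra")
letters.discard("r")
letters.add("z")
order = list(letters)

# Set operators such as & come from the MutableSet mixin for free.
common = letters & DictOrderedSet("zebra")
```

Only five methods are written by hand; the ABC supplies the operators, comparisons, and helpers like `clear()` and `pop()` on top of them.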


5 tablets of data structures from on high

1. Use types idiomatically.

2. Sometimes you don’t get a choice.

3. Be efficient, when it doesn’t cost you anything.

4. Sometimes you have more than one concern to deal with. The standard lib can help!

5. Don’t do more than you have to: ABCs are there to help.


Questions? Comments? Thrown fruit?


http://alexgaynor.net