Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Transcript
Page 1

Byterun: A (C)Python interpreter in Python

Allison Kaptur

github.com/akaptur akaptur.github.io

@akaptur

Page 2

Byterun, with Ned Batchelder

Based on pyvm2 by Paul Swartz (z3p), from http://www.twistedmatrix.com/users/z3p/

Page 3

Why would you do such a thing?

>>> if a or b:
...     do_stuff()

Page 4

Some things we can do

out = ""
for i in range(5):
    out = out + str(i)
print(out)

Page 5

Some things we can do

def fn(a, b=17, c="Hello", d=[]):
    d.append(99)
    print(a, b, c, d)

fn(1)
fn(2, 3)
fn(3, c="Bye")
fn(4, d=["What?"])
fn(5, "b", "c")

Page 6

Some things we can do

def verbose(func):
    def _wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return _wrapper

@verbose
def add(x, y):
    return x+y

add(7, 3)

Page 7

Some things we can do

try:
    raise ValueError("oops")
except ValueError as e:
    print("Caught: %s" % e)
print("All done")

Page 8

Some things we can do

class NullContext(object):
    def __enter__(self):
        l.append('i')
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        l.append('o')
        return False

l = []
for i in range(3):
    with NullContext():
        l.append('w')
        if i % 2:
            break
        l.append('z')
    l.append('e')

l.append('r')
s = ''.join(l)
print("Look: %r" % s)
assert s == "iwzoeiwor"

Page 9

Some things we can do

g = (x*x for x in range(3))
print(list(g))
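Each snippet above is a small program that an interpreter like Byterun should run exactly as CPython does. A minimal sketch of one way to check that, assuming we only compare what gets printed: compile the source with the built-in compile(), execute the code object with exec(), and capture stdout. The helper name run_and_capture is hypothetical, and this runs the snippet on CPython itself; checking another interpreter would mean running the same code object through it and comparing the two outputs.

```python
import contextlib
import io
import textwrap


def run_and_capture(source):
    """Compile and execute `source`, returning everything it printed.

    Hypothetical helper: this runs the code on CPython itself. An
    alternative interpreter could be checked by running the same code
    object through it and comparing outputs.
    """
    code = compile(textwrap.dedent(source), "<snippet>", "exec")
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # fresh globals so snippets don't interfere
    return buffer.getvalue()


loop_output = run_and_capture("""
    out = ""
    for i in range(5):
        out = out + str(i)
    print(out)
""")
print(repr(loop_output))  # '01234\n'

genexp_output = run_and_capture("""
    g = (x*x for x in range(3))
    print(list(g))
""")
print(repr(genexp_output))  # '[0, 1, 4]\n'
```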

Page 10

A problem

g = (x*x for x in range(5))
h = (y+1 for y in g)
print(list(h))
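For reference, CPython prints [1, 2, 5, 10, 17] for this program: h pulls each value from g, and each generator object keeps its own suspended frame between values (CPython exposes it as gi_frame). A small sketch for observing that, using only standard generator attributes:

```python
g = (x*x for x in range(5))
h = (y+1 for y in g)

# Before iteration starts, each generator already owns a frame object.
print(g.gi_frame is not None, h.gi_frame is not None)  # True True

first = next(h)  # resumes h's frame, which in turn resumes g's frame once
print(first)     # 1  (0*0 + 1)

rest = list(h)
print(rest)      # [2, 5, 10, 17]

# Once a generator is exhausted, CPython drops its frame.
print(g.gi_frame, h.gi_frame)  # None None
```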

Page 11

The Python virtual machine:

A bytecode interpreter

Page 12

Bytecode: the internal representation of a Python program in the interpreter

Page 13

Bytecode: it’s bytes!

>>> def mod(a, b):
...     ans = a % b
...     return ans

Page 14

Bytecode: it’s bytes!

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod.func_code.co_code

(mod: function; func_code: code object; co_code: bytecode)

Page 15

Bytecode: it’s bytes!

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod.func_code.co_code
'|\x00\x00|\x01\x00\x16}\x02\x00|\x02\x00S'

Page 16

Bytecode: it’s bytes!

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod.func_code.co_code
'|\x00\x00|\x01\x00\x16}\x02\x00|\x02\x00S'
>>> [ord(b) for b in mod.func_code.co_code]
[124, 0, 0, 124, 1, 0, 22, 125, 2, 0, 124, 2, 0, 83]
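The session above is Python 2. On Python 3 the code object is reached through __code__ (func_code no longer exists), and co_code is a bytes object, so iterating over it already yields integers and ord() is unnecessary. A minimal sketch; the exact byte values differ between CPython versions (3.6 and later use two-byte "wordcode" instructions, for example), so it prints them rather than assuming them:

```python
import dis


def mod(a, b):
    ans = a % b
    return ans


code = mod.__code__         # the function's code object
raw = code.co_code          # the bytecode itself: a bytes object
print(type(raw).__name__)   # bytes
print(list(raw))            # already ints on Python 3; values are version-specific

# Map the first byte back to a human-readable instruction name.
print(dis.opname[raw[0]])
```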

Page 17

dis, a bytecode disassembler

>>> import dis
>>> dis.dis(mod)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_MODULO
              7 STORE_FAST               2 (ans)

  3          10 LOAD_FAST                2 (ans)
             13 RETURN_VALUE

Page 18

dis, a bytecode disassembler

>>> import dis
>>> dis.dis(mod)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_MODULO
              7 STORE_FAST               2 (ans)

  3          10 LOAD_FAST                2 (ans)
             13 RETURN_VALUE

Reading the columns from left to right: line number; index in bytecode; instruction name, for humans; more bytes, the argument to each instruction; hint about arguments.
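On Python 3.4 and later, the same columns are available programmatically through dis.get_instructions(), which yields Instruction objects with fields such as offset, opname and argrepr. A short sketch; the listing function is a hypothetical helper, and opcode names and offsets vary by CPython version (3.11 and later, for instance, report BINARY_OP with a "%" hint instead of BINARY_MODULO), so treat the printed output as version-specific:

```python
import dis


def mod(a, b):
    ans = a % b
    return ans


def listing(func):
    """Return (offset, opname, argrepr) for each instruction in func."""
    return [(ins.offset, ins.opname, ins.argrepr)
            for ins in dis.get_instructions(func)]


for offset, name, hint in listing(mod):
    # offset: index in bytecode; name: instruction name for humans;
    # hint: human-readable form of the argument (may be empty)
    print(f"{offset:4} {name:<20} {hint}")
```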

Page 19

Data stack on a frame (figure):
Before: whatever, some other thing, something
After LOAD_FAST: whatever, some other thing, something, a, b
After BINARY_MODULO: whatever, some other thing, something, ans
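The figure can be replayed with an ordinary Python list standing in for the frame's data stack (top of stack = end of the list). This is a hypothetical hand-simulation of the instructions in mod, not CPython's implementation; the placeholder strings stand for whatever was already on the stack.

```python
def simulate_mod(a, b):
    """Hand-simulate mod's bytecode on an explicit data stack (illustrative)."""
    local_vars = {"a": a, "b": b}                            # the frame's locals
    stack = ["whatever", "some other thing", "something"]   # pre-existing items
    snapshots = [("Before", list(stack))]

    stack.append(local_vars["a"])   # LOAD_FAST a
    stack.append(local_vars["b"])   # LOAD_FAST b
    snapshots.append(("After LOAD_FAST", list(stack)))

    w = stack.pop()                 # BINARY_MODULO: pop the right operand,
    v = stack.pop()                 # pop the left operand,
    stack.append(v % w)             # and push the result
    snapshots.append(("After BINARY_MODULO", list(stack)))

    local_vars["ans"] = stack.pop() # STORE_FAST ans
    stack.append(local_vars["ans"]) # LOAD_FAST ans
    return stack.pop(), snapshots   # RETURN_VALUE


result, snapshots = simulate_mod(15, 4)
for label, items in snapshots:
    print(f"{label:<20} {items}")
print("returned", result)           # returned 3
```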

Page 20

def foo():
    x = 1
    def bar(y):
        z = y + 2
        return z
    return bar(x)
foo()                  # <--- (1)

call stack
    ---------------------------
    | main (module) Frame     | -> blocks: []
    | (oldest)                | -> data: [<foo>]
    ---------------------------

Page 21

def foo():
    x = 1
    def bar(y):
        z = y + 2
        return z
    return bar(x)      # <--- (2)
foo()                  # <--- (1)

call stack
    ---------------------------
    | foo Frame               | -> blocks: []
    |                         | -> data: [<bar>, 1]
    ---------------------------
    | main (module) Frame     | -> blocks: []
    | (oldest)                | -> data: []
    ---------------------------

Page 22

def foo():
    x = 1
    def bar(y):
        z = y + 2      # <--- (3)
        return z
    return bar(x)      # <--- (2)
foo()                  # <--- (1)

call stack
    ---------------------------
    | bar Frame               | -> blocks: []
    | (newest)                | -> data: [1, 2]
    ---------------------------
    | foo Frame               | -> blocks: []
    |                         | -> data: []
    ---------------------------
    | main (module) Frame     | -> blocks: []
    | (oldest)                | -> data: []
    ---------------------------

Page 23

def foo():
    x = 1
    def bar(y):
        z = y + 2      # <--- (3)
        return z
    return bar(x)      # <--- (2)
foo()                  # <--- (1)

call stack
    ---------------------------
    | bar Frame               | -> blocks: []
    | (newest)                | -> data: [3]
    ---------------------------
    | foo Frame               | -> blocks: []
    |                         | -> data: []
    ---------------------------
    | main (module) Frame     | -> blocks: []
    | (oldest)                | -> data: []
    ---------------------------

Page 24

def foo():
    x = 1
    def bar(y):
        z = y + 2      # <--- (3)
        return z
    return bar(x)      # <--- (2)
foo()                  # <--- (1)

call stack
    ---------------------------
    | foo Frame               | -> blocks: []
    |                         | -> data: [3]
    ---------------------------
    | main (module) Frame     | -> blocks: []
    | (oldest)                | -> data: []
    ---------------------------

Page 25

def foo():
    x = 1
    def bar(y):
        z = y + 2      # <--- (3)
        return z
    return bar(x)      # <--- (2)
foo()                  # <--- (1)

call stack
    ---------------------------
    | main (module) Frame     | -> blocks: []
    | (oldest)                | -> data: [3]
    ---------------------------
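CPython exposes these frame objects to Python code. A small sketch using inspect.currentframe() (a CPython implementation detail; other implementations may return None) and each frame's f_back link to walk from the newest frame toward the oldest while bar is running. The helper frame_names and the list captured are hypothetical additions for illustration.

```python
import inspect


def frame_names():
    """Names of the code objects on the caller's call stack, newest first."""
    names = []
    frame = inspect.currentframe().f_back  # skip frame_names' own frame
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back               # link to the calling (older) frame
    return names


captured = []


def foo():
    x = 1
    def bar(y):
        z = y + 2
        captured.append(frame_names())     # snapshot taken at point (3)
        return z
    return bar(x)


print(foo())        # 3
print(captured[0])  # e.g. ['bar', 'foo', '<module>'] when run as a script
```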

Page 26

dis, a bytecode disassembler

>>> import dis
>>> dis.dis(mod)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_MODULO
              7 STORE_FAST               2 (ans)

  3          10 LOAD_FAST                2 (ans)
             13 RETURN_VALUE

Page 27

Page 28

/* Main switch on opcode */
READ_TIMESTAMP(inst0);

switch (opcode) {
...
} /*switch*/

Page 29

#ifdef CASE_TOO_BIG
    default: switch (opcode) {
#endif

/* Turn this on if your compiler chokes on the big switch: */
/* #define CASE_TOO_BIG 1 */
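The giant switch in ceval.c is a dispatch loop: fetch the next instruction, jump to the code for it, repeat. An interpreter written in Python can do the same by looking up one method per instruction name. The sketch below is a hypothetical toy, not Byterun's code: instructions are hand-written (name, argument) tuples rather than real bytecode, and the byte_<NAME> method naming is a convention chosen here for illustration.

```python
class ToyVM:
    """A toy stack machine: one method per instruction, dispatched by name."""

    def __init__(self, local_vars):
        self.locals = dict(local_vars)
        self.stack = []

    def run(self, instructions):
        for name, arg in instructions:                     # fetch
            handler = getattr(self, "byte_" + name, None)  # look up the "case"
            if handler is None:
                raise NotImplementedError(name)
            result = handler(arg)                          # execute
            if name == "RETURN_VALUE":
                return result
        return None

    def byte_LOAD_FAST(self, name):
        if name not in self.locals:
            raise UnboundLocalError(name)
        self.stack.append(self.locals[name])

    def byte_STORE_FAST(self, name):
        self.locals[name] = self.stack.pop()

    def byte_BINARY_MODULO(self, _arg):
        w = self.stack.pop()
        v = self.stack.pop()
        self.stack.append(v % w)  # Python's own % performs the type dispatch

    def byte_RETURN_VALUE(self, _arg):
        return self.stack.pop()


# Hand-written equivalent of mod's bytecode (not produced by the compiler).
MOD_CODE = [
    ("LOAD_FAST", "a"),
    ("LOAD_FAST", "b"),
    ("BINARY_MODULO", None),
    ("STORE_FAST", "ans"),
    ("LOAD_FAST", "ans"),
    ("RETURN_VALUE", None),
]

print(ToyVM({"a": 15, "b": 4}).run(MOD_CODE))                     # 3
print(ToyVM({"a": "%s%s", "b": ("Py", "Gotham")}).run(MOD_CODE))  # PyGotham
```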

Page 30

Back to that bytecode

>>> dis.dis(mod)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_MODULO
              7 STORE_FAST               2 (ans)

  3          10 LOAD_FAST                2 (ans)
             13 RETURN_VALUE

Page 31

case LOAD_FAST:
    x = GETLOCAL(oparg);
    if (x != NULL) {
        Py_INCREF(x);
        PUSH(x);
        goto fast_next_opcode;
    }
    format_exc_check_arg(PyExc_UnboundLocalError,
        UNBOUNDLOCAL_ERROR_MSG,
        PyTuple_GetItem(co->co_varnames, oparg));
    break;
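The x != NULL branch is the fast path. The other branch fires when a local variable's slot is still empty, which Python code can trigger by reading a local before assigning it: the compiler treats the name as local because it is assigned somewhere in the function. A minimal demonstration; newer CPythons may compile such a read to a checked variant (3.12 added LOAD_FAST_CHECK), and the message wording differs across versions, so only the exception type is relied on:

```python
def read_before_write():
    print(ans)  # 'ans' is assigned below, so it is a local; its slot is still empty
    ans = 1
    return ans


try:
    read_before_write()
except UnboundLocalError as exc:
    print(type(exc).__name__, "->", exc)
```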

Page 32

case BINARY_MODULO:
    w = POP();
    v = TOP();
    if (PyString_CheckExact(v))
        x = PyString_Format(v, w);
    else
        x = PyNumber_Remainder(v, w);
    Py_DECREF(v);
    Py_DECREF(w);
    SET_TOP(x);
    if (x != NULL) continue;
    break;
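A line-by-line Python paraphrase of that case, operating on a list used as the stack. It is a reading aid for the C, not a drop-in replacement: the exact-type check becomes type(v) is str (the Python 2 code checks for a byte string, PyString), operator.mod stands in for PyNumber_Remainder, and the reference counting (Py_DECREF) has no Python equivalent. The function name binary_modulo is hypothetical.

```python
import operator


def binary_modulo(stack):
    """Python paraphrase of ceval.c's BINARY_MODULO case (illustrative)."""
    w = stack.pop()         # w = POP();  right operand
    v = stack[-1]           # v = TOP();  left operand, left in place
    if type(v) is str:      # PyString_CheckExact(v): exactly str, not a subclass
        x = v % w           # PyString_Format(v, w): string formatting
    else:
        x = operator.mod(v, w)  # PyNumber_Remainder(v, w): generic dispatch
    stack[-1] = x           # SET_TOP(x): the result replaces v


stack = [15, 4]
binary_modulo(stack)
print(stack)  # [3]

stack = ["%s%s", ("Py", "Gotham")]
binary_modulo(stack)
print(stack)  # ['PyGotham']
```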

Page 33

It’s “dynamic”

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod(15, 4)
3

Page 34

“Dynamic”

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod(15, 4)
3
>>> mod("%s%s", ("Py", "Gotham"))

Page 35

“Dynamic”

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod(15, 4)
3
>>> mod("%s%s", ("Py", "Gotham"))
'PyGotham'

Page 36

“Dynamic”

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod(15, 4)
3
>>> mod("%s%s", ("Py", "Gotham"))
'PyGotham'
>>> print "%s%s" % ("Py", "Gotham")
PyGotham
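The point is that mod's bytecode never changes; which % actually runs is decided only when BINARY_MODULO executes, by the types of the values on the stack. The same demonstration in Python 3 syntax, with a float added, calling one function object with three kinds of arguments:

```python
def mod(a, b):
    ans = a % b
    return ans


before = mod.__code__.co_code  # snapshot of the bytecode

print(mod(15, 4))                     # 3: integer remainder
print(mod("%s%s", ("Py", "Gotham")))  # PyGotham: string formatting
print(mod(7.5, 2))                    # 1.5: float remainder

# Same code object throughout; only the operand types differed.
print(mod.__code__.co_code == before)  # True
```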

Page 37

case BINARY_MODULO:
    w = POP();
    v = TOP();
    if (PyString_CheckExact(v))
        x = PyString_Format(v, w);
    else
        x = PyNumber_Remainder(v, w);
    Py_DECREF(v);
    Py_DECREF(w);
    SET_TOP(x);
    if (x != NULL) continue;
    break;

Page 38

>>> class Surprising(object):
...     def __mod__(self, other):
...         print "Surprise!"
...
>>> s = Surprising()
>>> t = Surprising()
>>> s % t
Surprise!

Page 39

“In the general absence of type information, almost every instruction must be treated as INVOKE_ARBITRARY_METHOD.”

- Russell Power and Alex Rubinsteyn, “How Fast Can We Make Interpreted Python?”
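Roughly what that means for %: with no type information, BINARY_MODULO has to look up a method on the left operand's type at run time and, if that declines, try the right operand's reflected method. A simplified sketch of that lookup; the helper name call_mod is hypothetical, and CPython's real rules include details omitted here, such as trying a right-hand subclass's __rmod__ first.

```python
def call_mod(v, w):
    """Simplified model of how `v % w` finds the code to run."""
    method = getattr(type(v), "__mod__", None)  # looked up on the type
    if method is not None:
        result = method(v, w)
        if result is not NotImplemented:
            return result
    reflected = getattr(type(w), "__rmod__", None)
    if reflected is not None and type(w) is not type(v):
        result = reflected(w, v)
        if result is not NotImplemented:
            return result
    raise TypeError(f"unsupported operand type(s) for %: "
                    f"{type(v).__name__!r} and {type(w).__name__!r}")


class Surprising:
    def __mod__(self, other):
        return "Surprise!"  # arbitrary user code runs here


class RightSide:
    def __rmod__(self, other):
        return f"rmod with {other!r}"


print(call_mod(15, 4))                       # 3
print(call_mod("%s!", "hi"))                 # hi!
print(call_mod(Surprising(), Surprising()))  # Surprise!
print(call_mod(10, RightSide()))             # rmod with 10
```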

Page 40

Back to our problem

g = (x*x for x in range(5))
h = (y+1 for y in g)
print(list(h))

Pages 41-46: the call-stack walkthrough from pages 20-25, repeated.

Page 47

Back to our problem

g = (x*x for x in range(5))
h = (y+1 for y in g)
print(list(h))

Page 48

More

Great blogs:
http://tech.blog.aknin.name/category/my-projects/pythons-innards/ by @aknin
http://eli.thegreenplace.net/ by Eli Bendersky

Contribute! Find bugs! https://github.com/nedbat/byterun

Apply to Hacker School! www.hackerschool.com/apply