Python Compiler Internals Presentation Slides

Python Compiler Internals Thomas Lee Shine Technologies, Melbourne


Python Compiler Internals

Thomas Lee
Shine Technologies, Melbourne

My Contributions to Python

● try/except/finally syntax for 2.5
● AST to bytecode compilation from Python
● “optimization” at the AST level

What will you get out of this?

Compiler development != rocket science.
No magic: it's just code.

Compiler? Isn't Python interpreted?

Well, yes and no.
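A small sketch of the "yes and no", using only the standard library: source text is first compiled to a code object holding bytecode, and the virtual machine then interprets that bytecode. The variable names here are illustrative, not from the talk.

```python
import dis

source = "answer = 6 * 7"

# Compile step: source text -> code object containing bytecode.
code = compile(source, "<example>", "exec")

# Interpret step: the virtual machine executes the bytecode.
namespace = {}
exec(code, namespace)

opnames = [ins.opname for ins in dis.get_instructions(code)]
print(type(code).__name__)   # code
print(namespace["answer"])   # 42
print(opnames)               # exact opcodes differ between CPython versions
```

So Python is compiled and interpreted: compiled to bytecode, then interpreted by a VM.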

WTF

It ... It's Java?

Aaanyway

Let's screw with the compiler.

Erm ... Before we start ...

A patch to fix a quirk in the build process.

This has been reported on the bug tracker!

An overview of the compiler

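[Diagram: compiler pipeline. Stages in order, with the data passed between them in parentheses.]

Tokenizer -> (Tokens) -> Parser -> (Parse Tree) -> AST Translation -> (AST) -> Symtable Construction -> (AST + Symtable) -> Bytecode Generation -> (Bytecode + Data) -> Bytecode Optimization -> Execution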
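Most of these stages can be poked at from Python itself. The following is a rough sketch using standard-library modules, assuming a recent CPython 3; it is not how the C implementation is wired together, and the parse tree stage is not exposed separately here.

```python
import ast
import dis
import io
import symtable
import tokenize

source = "x = 1\nif not x:\n    y = 2\n"

# Tokenizer: source text -> tokens
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Parser + AST translation: source -> AST
tree = ast.parse(source)

# Symtable construction: which names exist, and in which scope
table = symtable.symtable(source, "<example>", "exec")
names = sorted(sym.get_name() for sym in table.get_symbols())

# Bytecode generation (and optimization): AST -> code object
code = compile(tree, "<example>", "exec")
instructions = list(dis.get_instructions(code))

# Execution
namespace = {}
exec(code, namespace)

print(tokens[0].string)              # x
print(type(tree.body[1]).__name__)   # If
print(names)                         # ['x', 'y']
print(namespace["x"])                # 1
```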

New construct: the “unless” statement

unless may_give_you_up:
    print "never gonna give you up..."

Semantics of “unless”

Works just like “if not” ...

if not would_consider_letting_you_down:
    print "never gonna let you down..."

Implementing “unless”

1. Modify the Grammar
2. Change the AST definition
3. Generate bytecode for “unless”

So what's first?

1. Modify the Grammar
2. Change the AST definition
3. Generate bytecode for “unless”

What does this change affect?

Tokenizer -> (Tokens) -> Parser -> (Parse Tree) -> AST Translation -> (AST) -> Symtable Construction -> (AST + Symtable) -> Bytecode Generation -> (Bytecode + Data) -> Bytecode Optimization -> Execution

WTF is a “grammar”?

Defines the syntactic structure of the language.

Why a tokenizer?

Makes parsing easier.

So what does a tokenizer do, then?

Generates a stream of tokens for the parser to consume.
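To see such a stream, the standard-library tokenize module can be run over a snippet. This is the pure-Python tokenizer rather than the C one inside the compiler, so treat it as an approximation of what the parser receives.

```python
import io
import tokenize

source = "if not x:\n    y = 1\n"

# Each token carries a type and the exact text it was built from.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

for kind, text in tokens:
    print(kind, repr(text))
```

Note that keywords such as "if" and "not" arrive as plain NAME tokens, and that indentation shows up as explicit INDENT and DEDENT tokens.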

What does the parser do?

Organises tokens into the structure defined by the grammar.
This is the “parse tree”.
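As a toy illustration, here is a hand-written parser for a made-up two-rule grammar. The grammar, the function names and the tuple layout are all hypothetical; CPython generates its parser from its grammar file rather than writing it by hand like this.

```python
# Hypothetical toy grammar (not CPython's):
#   unless_stmt: 'unless' test ':' NAME
#   test: 'not' test | NAME

def parse_test(tokens, pos):
    """Parse the `test` rule; return (subtree, next position)."""
    if tokens[pos] == "not":
        child, pos = parse_test(tokens, pos + 1)
        return ("test", "not", child), pos
    return ("test", tokens[pos]), pos + 1

def parse_unless(tokens):
    """Parse the `unless_stmt` rule into a nested-tuple parse tree."""
    if tokens[0] != "unless":
        raise SyntaxError("expected 'unless'")
    test, pos = parse_test(tokens, 1)
    if tokens[pos] != ":":
        raise SyntaxError("expected ':'")
    # Every token is kept, including punctuation: that is what makes
    # this a parse tree rather than an AST.
    return ("unless_stmt", "unless", test, ":", tokens[pos + 1])

tree = parse_unless(["unless", "not", "ready", ":", "wait"])
print(tree)
# ('unless_stmt', 'unless', ('test', 'not', ('test', 'ready')), ':', 'wait')
```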

What's next?

1. Modify the Grammar
2. Change the AST definition
3. Generate bytecode for “unless”

What's affected by this change?

Tokenizer -> (Tokens) -> Parser -> (Parse Tree) -> AST Translation -> (AST) -> Symtable Construction -> (AST + Symtable) -> Bytecode Generation -> (Bytecode + Data) -> Bytecode Optimization -> Execution

What's an AST?

Similar to a parse tree, but not bound to the syntax

“Abstract Syntax Tree”

Why does Python use an AST?

Much easier to operate upon an AST than a parse tree.
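A quick way to see "not bound to the syntax" is the standard-library ast module: redundant parentheses in the source leave no trace in the tree. A minimal sketch, assuming Python 3:

```python
import ast

with_parens = ast.parse("if not (x):\n    y = 1\n")
without_parens = ast.parse("if not x:\n    y = 1\n")

node = with_parens.body[0]
print(ast.dump(node))

# The two spellings produce the same tree once positions are ignored.
same_tree = ast.dump(with_parens) == ast.dump(without_parens)
print(same_tree)   # True
```

The dump shows an If node whose test is a UnaryOp with a Not operator: no colons, no parentheses, no indentation tokens.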

AST reuse

New constructs can sometimes be expressed using existing AST nodes ...

AST reuse

... so we could implement “unless” using “if” and “not” AST nodes.

But then I wouldn't be able to show you the code generator ...

Unless(test, body) = If(Not(test), body)
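The equivalence can be tried out from Python with the ast module. The helper below is a hypothetical sketch, not part of the patch shown in the talk: it builds If(Not(test), body) by hand from two source snippets and runs the result.

```python
import ast

def unless(test_src, body_src):
    """Build a module containing If(Not(test), body) (hypothetical helper)."""
    test = ast.parse(test_src, mode="eval").body
    body = ast.parse(body_src).body
    node = ast.If(
        test=ast.UnaryOp(op=ast.Not(), operand=test),
        body=body,
        orelse=[],
    )
    module = ast.Module(body=[node], type_ignores=[])
    # Hand-built nodes have no line numbers; fill them in before compiling.
    return ast.fix_missing_locations(module)

namespace = {"may_give_you_up": False, "lines": []}
tree = unless("may_give_you_up", "lines.append('never gonna give you up...')")
exec(compile(tree, "<unless>", "exec"), namespace)
print(namespace["lines"])   # ['never gonna give you up...']
```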

What's next?

1. Modify the Grammar
2. Change the AST definition
3. Generate bytecode for “unless”

What's affected by this change?

Tokenizer -> (Tokens) -> Parser -> (Parse Tree) -> AST Translation -> (AST) -> Symtable Construction -> (AST + Symtable) -> Bytecode Generation -> (Bytecode + Data) -> Bytecode Optimization -> Execution

Generating bytecode for “unless”

● Add a hook for our new AST node
● Generate bytecode implementing “unless” logic
● Use basicblocks as labels for jumps
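The idea behind these three steps can be sketched with a toy code generator. Everything below is hypothetical: the Unless node, the instruction names and the tiny interpreter are made up for illustration, and the real code generator is written in C with its own basic-block machinery.

```python
from collections import namedtuple

# Hypothetical miniature AST node: test is a variable name,
# body is a list of strings to emit.
Unless = namedtuple("Unless", "test body")

def compile_unless(node):
    """Emit toy instructions; the 'end' label stands in for a basic block."""
    code = [
        ("LOAD_NAME", node.test),
        ("JUMP_IF_TRUE", "end"),   # skip the body when the test is true
    ]
    code += [("EMIT", text) for text in node.body]
    code.append(("LABEL", "end"))
    return code

def run(code, env):
    """Interpret the toy instructions and return the emitted output."""
    labels = {arg: i for i, (op, arg) in enumerate(code) if op == "LABEL"}
    out, stack, pc = [], [], 0
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == "LOAD_NAME":
            stack.append(env[arg])
        elif op == "JUMP_IF_TRUE":
            if stack.pop():
                pc = labels[arg]
        elif op == "EMIT":
            out.append(arg)
        # LABEL is only a marker and does nothing at run time
    return out

program = compile_unless(Unless("may_give_you_up", ["never gonna give you up..."]))
print(run(program, {"may_give_you_up": False}))   # ['never gonna give you up...']
print(run(program, {"may_give_you_up": True}))    # []
```

The only difference from an "if" is the sense of the conditional jump: the body is skipped when the test is true instead of when it is false.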

What about the rest of the compiler?

Tokenizer -> (Tokens) -> Parser -> (Parse Tree) -> AST Translation -> (AST) -> Symtable Construction -> (AST + Symtable) -> Bytecode Generation -> (Bytecode + Data) -> Bytecode Optimization -> Execution

“unless” in action

Let's see if this stuff works ...

Where to go from here?

● Use the Source.
● Don't try to grok it all at once.
● Don't be afraid to be wrong.
● Annoy the python-dev mailing list, like I do!

Thanks!

Questions?