Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation...

48
The University of North Carolina at Chapel Hill Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January 15, 2009 Based on notes by A. Block, N. Fisher, F. Hernandez-Campos, and D. Stotts

Transcript of Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation...

Page 1: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Lecture 2: Compilation and Interpretation

COMP 524 Programming Language Concepts Stephen OlivierJanuary 15, 2009

Based on notes by A. Block, N. Fisher, F. Hernandez-Campos, and D. Stotts

Page 2: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Goals

•In this lecture, we will discuss how to translate source code into machine executable code.

Page 3: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Compilation and Interpretation

•There are two primary methods for translating high-level code:

• Compilation

• Interpretation

Page 4: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Compilation

Source Program

Compiler

Target ProgramInput Output

Page 5: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Compilation

Source Program

Compiler

Target ProgramInput Output

The “target program” is called the object code.

Page 6: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Compilation

Source Program

Compiler

Target ProgramInput Output

You translate once and run many times.

Page 7: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Interpretation

Interpreter

Source Program Input

Output

Page 8: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Interpretation

Interpreter

Source Program Input

Output

You translate for each run.

Page 9: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Comparison

Source

Compiler

Target In Out

Source In

Out

Interpreter

Page 10: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Comparison

Source

Compiler

Target In Out

Source In

Out

Compilers are usually faster than interpreters.

Interpreter

Page 11: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Comparison

Source

Compiler

Target In Out

Interpreter

Source In

Out

Interpreters are usually more flexible and easier to debug than compilers.

Page 12: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Compilation and Interpretation

Source Program

Translator

Intermediate Program

Input

OutputVirtual

Machine

Page 13: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Compilation and Interpretation

Source Program

Translator

Intermediate Program

Input

OutputVirtual

Machine

The Virtual Machine acts as an interpreter.

Page 14: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Compilation and Interpretation

Source Program

Translator

Intermediate Program

Input

OutputVirtual

Machine

The translator can be a compiler or an interpreter. It is considered to be a compiler if:1. There is a thorough analysis of the program

2. The transformation is non-trivial.

Page 15: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Compilation and Interpretation

Source Program

Translator

Intermediate Program

Input

OutputVirtual

Machine

This is exactly the process that Java uses.

Page 16: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Linking Source Program

Compiler

LibraryRoutines

Incomplete machine language

Linker

Machinelanguageprogram

Page 17: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Linking Source Program

Compiler

LibraryRoutines

Incomplete machine language

Linker

Machinelanguageprogram

The Linker is used to connect subroutines that are not contained in the original code.

Page 18: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Linking Source Program

Compiler

LibraryRoutines

Incomplete machine language

Linker

Machinelanguageprogram

This is used by Fortran

Page 19: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Preprocessing

Source Program

Preprocessor

Modified source program

Page 20: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Preprocessing

Source Program

Preprocessor

Modified source program

For example, this may be used to remove comments or use replacement macros.

Page 21: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Preprocessing

Source Program

Preprocessor

Modified source program

On the note about macros, consider the macro#define FALSE 0.

This code will replace all instances of the word “FALSE” with the value 0.

Page 22: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Tombstone Diagrams

•Useful for graphically representing programs and translators.

• Machine (x86):

• Program (in C): • Interpreter (for Perl, impl. in x86)

• Translator (C to x86, impl. in x86)

Page 23: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Tombstone Diagrams

•Example: Compiling a C program to run on x86

Page 24: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

•Pascal came shipped with three things:

• A Pascal to P-Code Compiler, written in Pascal.

• A P-code interpreter, in Pascal.

• A Pascal to P-Code Compiler, written in P-Code.

The University of North Carolina at Chapel Hill

Bootstrapping

Page 25: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

•Pascal came shipped with three things:

• A Pascal to P-Code Compiler, written in Pascal.

• A P-code interpreter, in Pascal.

• A Pascal to P-Code Compiler, written in P-Code.

The University of North Carolina at Chapel Hill

Bootstrapping

P-Code is a very simple language that can easily be translated into any machine

language.

Page 26: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

•Pascal came shipped with three things:

• A Pascal to P-Code Compiler, written in Pascal.

• A P-code interpreter, in Pascal.

• A Pascal to P-Code Compiler, written in P-Code.

The University of North Carolina at Chapel Hill

Bootstrapping

Page 27: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

•Pascal came shipped with three things:

• A Pascal to P-Code Compiler, written in Pascal.

• A P-code interpreter, in Pascal.

• A Pascal to P-Code Compiler, written in P-Code.

The University of North Carolina at Chapel Hill

Bootstrapping

The interpreter is hand translated into a supported language.

Page 28: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

•Pascal came shipped with three things:

• A Pascal to P-Code Compiler, written in Pascal.

• A P-code interpreter, in Pascal.

• A Pascal to P-Code Compiler, written in P-Code.

The University of North Carolina at Chapel Hill

Bootstrapping

The interpreter is hand translated into a supported language.And then compiled into machine code.

Page 29: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Bootstrapping

•To get a simple (but slow) compiler, one could use the “Pascal to P-Code compiler, in P-Code” and the “P-Code to machine lang. interpreter” to compile Pascal code.

Page 30: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Bootstrapping

•To get a simple (but slow) compiler, one could use the “Pascal to P-Code compiler, in P-Code” and the “P-Code to machine lang. interpreter” to compile Pascal code.

This interpreter is used to run the Pascal to P-Code compiler and the program.

Page 31: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Bootstrapping (Faster)

•To create a faster compiler, we modify the “Pascal to P-Code compiler, in Pascal” so that it is a “Pascal to machine language compiler, in Pascal.”

Page 32: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Bootstrapping (Faster)

•To create a faster compiler, we modify the “Pascal to P-Code compiler, in Pascal” so that it is a “Pascal to machine language compiler, in Pascal.”

This is MUCH HARDER than creating a P-Code to Machine language interpreter.

Page 33: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Bootstrapping (Faster)

Now construct the “Pascal to machine lang. compiler, in P-Code.”

Page 34: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Bootstrapping (Faster)

Then run the “Pascal to machine lang. compiler, in Pascal” through the “P-

Code” version to produce the “Machine lang.”

version.

Page 35: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Bootstrapping (Faster)

Page 36: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Compiling

Scanner (lexical analysis)

Parser (syntax analysis)

Semantic analysis & intermediate code gen.

Machine-independent optimization (optional)

Target code generation.

Machine-specific optimization (optional)

Symbol Table

Character Stream

Token Stream

Parse Tree

Abstract syntax tree

Modified intermediate form

Machine language

Modified target language

Page 37: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Example Program GCD

program gcd(input, output);var i, j: integer;beginread(i,j); // get i & j from readwhile i<>j doif i>j then i := i-jelse j := j-1;

writeln(i)end.

Page 38: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Lexical Analysis

•Recognize structures without regard to meaning and groups them into tokens.

•The purpose of the scanner is to simplify the parser by reducing the size of the input.

Scanner (lexical analysis)

program gcd(input, ouput);

program gcd ( input , output ) ;

Page 39: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Syntax Analysis

•Parsing organizes the tokens into a context-free grammar (i.e., syntax).

program gcd ( input , output ) ;

Parser (syntax analysis)

program

id(GCD) ( id(INPUT) more_ids ) ; block

, id(OUTPUT) more_ids

empty

Rest of code

Page 40: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Syntax Analysis

•Parsing organizes the tokens into a context-free grammar (i.e., syntax).

program gcd ( input , output ) ;

Parser (syntax analysis)

program

id(GCD) ( id(INPUT) more_ids ) ; block

, more_ids

empty

Rest of code

The Syntax analysis catches all malformed statements

id(OUTPUT)

Page 41: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Syntax Analysis

•Parsing organizes the tokens into a context-free grammar (i.e., syntax).

program gcd ( input , output ) ;

Parser (syntax analysis)

program

id(GCD) ( id(INPUT) more_ids ) ; block

, more_ids

empty

Rest of code

The parse tree is sometimes called a concrete syntax tree because it contains

how all tokens are derived...

id(OUTPUT)

Page 42: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Syntax Analysis

•Parsing organizes the tokens into a context-free grammar (i.e., syntax).

program gcd ( input , output ) ;

Parser (syntax analysis)

program

id(GCD) ( id(INPUT) more_ids ) ; block

, more_ids

empty

Rest of code

...however, much of this information is extraneous for the “meaning” of the

code(e.g., the only purpose of “;” is to end a statement).

id(OUTPUT)

Page 43: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Semantic Analysis

•Semantic analysis discovers the meaning of a program. by creating an abstract syntax tree that removes “extraneous” tokens.

•To do this, the analyzer builds & maintains a symbol table to map identifiers to information known about it. (i.e., scope, internal structure, etc...)

•By using the symbol table, the semantic analyzer can catch problems not caught by the parser. For example,

• Identifiers are declared before used

• subroutine calls provide correct number and type of arguments.

Semantic analysis & intermediate code gen.

Page 44: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Semantic Analysis

•Not all semantic rules can be checked at compile time.

• Those that can are called static semantics of the language.

• Those that cannot are called dynamic semantics of the language. For example,

• Arithmetic operations do not overflow.

• Array subscripts expressions lie within the bounds of the array.

Semantic analysis & intermediate code gen.

Page 45: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Example Program GCD

program gcd(input, output);var i, j: integer;beginread(i,j); // get i & j from readwhile i<>j doif i>j then i := i-jelse j := j-1;

writeln(i)end.

Page 46: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Semantic Analysis Semantic analysis & intermediate code gen.

program

id(GCD) ( id(INPUT) more_ids ) ; block

program

(5) read

(3) (6) read

(3) (7)

Rest of code

Index Symbol Type1 INTEGER type

2 TEXTFILE type

3 INPUT 2

4 OUTPUT 2

5 GCD program

6 I 1

7 J 1

Page 47: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Target code generation

•Code generation takes the abstract syntax tree and the symbol table to produce machine readable code.

•Simple code follows directly from the abstract syntax tree and symbol table.

Target code generation.

Page 48: Lecture 2: Compilation and Interpretationolivier/comp524/Lecture02.pdf · Lecture 2: Compilation and Interpretation COMP 524 Programming Language Concepts Stephen Olivier January

The University of North Carolina at Chapel Hill

Optimization

•The process so far will produce correct code, but it may not be fast.

•Optimization will adjust the code to improve performance.

• A possible machine-indp. optimization would be to keep the variables i and j in registers throughout the main loop.

• A possible machine-spec. optimization would be to assign the variables i and j to specific registers.

Machine-independent optimization (optional)

Machine-specific optimization (optional)