
Introduction

CPSC 388, Ellen Walker, Hiram College


Why Learn About Compilers?

• Practical application of important computer science theory

• Ties together computer architecture and programming

• Useful tools for developing language interpreters
– Not just programming languages!


Computer Languages

• Machine language
– Binary numbers stored in memory
– Bits correspond directly to machine actions

• Assembly language
– A “symbolic face” for machine language
– Line-for-line translation

• High-level language (our goal!)
– Closer to human expressions of problems, e.g. mathematical notation


Assembler vs. HLL

• Assembler
Ldi $r1, 2   -- put the value 2 in R1
Sto $r1, x   -- store that value in X

• HLL
X = 2;


Characteristics of HLLs

• Easier to learn (and remember)

• Machine independent
– No knowledge of architecture needed

– … as long as there is a compiler for that machine!


Early Milestones

• FORTRAN (Formula Translation)
– IBM (John Backus), 1954-1957
– First high-level language, and first compiler

• Chomsky Hierarchy (1950s)
– Formal description of natural language structure

– Ranks languages according to the complexity of their grammar


Chomsky Hierarchy

• Type 3: Regular languages
– Too simple for programming languages
– Good for tokens, e.g. numbers (see the finite-automaton sketch after this list)

• Type 2: Context-free languages
– Standard representation of programming languages

• Type 1: Context-sensitive languages

• Type 0: Unrestricted
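To make the “good for tokens” point concrete, here is a minimal sketch, not from the slides, of a hand-coded finite automaton in C that accepts unsigned integer tokens; the function name is_integer_token is hypothetical.

  #include <ctype.h>
  #include <stdio.h>

  /* Sketch: a two-state finite automaton that accepts unsigned
     integers such as "42".  Returns 1 if the whole string is a
     valid integer token, 0 otherwise. */
  int is_integer_token(const char *s)
  {
      int state = 0;                     /* 0 = start, 1 = saw at least one digit */
      for (; *s != '\0'; s++) {
          if (isdigit((unsigned char)*s))
              state = 1;                 /* stay in (or enter) the accepting state */
          else
              return 0;                  /* any non-digit rejects */
      }
      return state == 1;                 /* accept only if we saw >= 1 digit */
  }

  int main(void)
  {
      printf("%d %d\n", is_integer_token("42"), is_integer_token("4x2"));  /* prints 1 0 */
      return 0;
  }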


Another View of the Hierarchy

[Figure: nested sets, with RL contained in CFL contained in CSL; each class includes all languages of the simpler classes.]


Formal Language & Automata Theory

• Machines that recognize each language class
– Turing Machine (computable languages)
– Push-down Automaton (context-free languages)

– Finite Automaton (regular languages)

• Use machines to prove that a given language belongs to a class

• Formally prove that a given language does not belong to a class


Practical Applications of Theory

• Translate from grammar to formal machine description

• Implement the formal machine to parse the language

• Tools:
– Scanner Generator (RL / FA): LEX, FLEX (a driver sketch follows this slide)
– Parser Generator (CFL / PDA): YACC, Bison
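For context, a driver for a Flex-generated scanner is usually just a loop around yylex(); the sketch below assumes the standard yylex()/yytext interface that Flex emits and token codes defined elsewhere (for example, by a Bison grammar). It is a sketch of the calling pattern, not a complete build.

  /* Sketch of a driver loop for a Flex-generated scanner.
     Assumes yylex() and yytext from the generated lex.yy.c. */
  #include <stdio.h>

  extern int yylex(void);     /* provided by Flex */
  extern char *yytext;        /* text of the current token */

  int main(void)
  {
      int tok;
      while ((tok = yylex()) != 0)        /* 0 means end of input */
          printf("token %d: '%s'\n", tok, yytext);
      return 0;
  }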


Beyond Parsing

• Code generation
• Optimization

– Techniques to “mindlessly” improve code

– Usually after code generation
– Rarely “optimal”, simply better


Phases of a Compiler

• Scanner -> tokens
• Parser -> syntax tree
• Semantic Analyzer -> annotated tree

• Source code optimizer -> intermediate code

• Code generator -> target code
• Target code optimizer -> better target code
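As a rough picture of how the phases hand data to one another, here is a minimal C sketch in which each phase is a stub that just passes a string along; all function names here are hypothetical stand-ins for the real data structures (token lists, trees, intermediate code).

  /* Purely illustrative pipeline: stubs that show only the phase order. */
  #include <stdio.h>

  static const char *scan(const char *src)           { return src; }
  static const char *parse(const char *toks)         { return toks; }
  static const char *analyze(const char *tree)       { return tree; }
  static const char *optimize_source(const char *t)  { return t; }
  static const char *generate(const char *ir)        { return ir; }
  static const char *optimize_target(const char *tc) { return tc; }

  int main(void)
  {
      const char *source = "a[j] = 4 + 2";
      const char *result =
          optimize_target(generate(optimize_source(analyze(parse(scan(source))))));
      printf("%s\n", result);
      return 0;
  }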


Additional Tables

• Symbol table (a small sketch follows this slide)
– Tracks all variable names and other symbols that will have to be mapped to addresses later

• Literal table
– Tracks literals (such as numbers and strings) that will have to be stored along with the eventual program
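A minimal symbol-table sketch in C, assuming a linear-search table; the struct fields and the function lookup_or_insert are illustrative only (real compilers typically hash on the name).

  #include <stdio.h>
  #include <string.h>

  /* Sketch of a tiny symbol table: a fixed array searched linearly. */
  struct Symbol {
      char name[32];      /* identifier as written in the source */
      int  type;          /* e.g. an enum code for int, array, ... */
      int  address;       /* filled in later, when storage is assigned */
  };

  static struct Symbol table[256];
  static int n_symbols = 0;

  /* Return the index of name, inserting it if it is not yet present. */
  int lookup_or_insert(const char *name)
  {
      for (int i = 0; i < n_symbols; i++)
          if (strcmp(table[i].name, name) == 0)
              return i;
      strncpy(table[n_symbols].name, name, sizeof table[0].name - 1);
      table[n_symbols].name[sizeof table[0].name - 1] = '\0';
      return n_symbols++;
  }

  int main(void)
  {
      int a1 = lookup_or_insert("a");
      int j  = lookup_or_insert("j");
      int a2 = lookup_or_insert("a");                 /* same entry as the first "a" */
      printf("a=%d j=%d a(again)=%d\n", a1, j, a2);   /* prints a=0 j=1 a(again)=0 */
      return 0;
  }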


Scanner

• Read a stream of characters
• Perform lexical analysis to generate tokens

• Update symbol and literal tables as needed

• Example (a hand-written scanner for this input is sketched below):
Input: a[j] = 4 + 2
Tokens: ID Lbrack ID Rbrack EQL NUM PLUS NUM
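A minimal sketch of a hand-written scanner for exactly this input; the token names follow the slide, but the code (next_token and the hard-coded src string) is illustrative, not a general lexer.

  #include <stdio.h>
  #include <ctype.h>

  enum Token { END, ID, LBRACK, RBRACK, EQL, NUM, PLUS };

  static const char *src = "a[j] = 4 + 2";

  enum Token next_token(void)
  {
      while (*src == ' ') src++;                 /* skip blanks */
      if (*src == '\0') return END;
      if (isalpha((unsigned char)*src)) { while (isalpha((unsigned char)*src)) src++; return ID; }
      if (isdigit((unsigned char)*src)) { while (isdigit((unsigned char)*src)) src++; return NUM; }
      switch (*src++) {
      case '[': return LBRACK;
      case ']': return RBRACK;
      case '=': return EQL;
      case '+': return PLUS;
      }
      return END;
  }

  int main(void)
  {
      static const char *names[] = {"END","ID","Lbrack","Rbrack","EQL","NUM","PLUS"};
      for (enum Token t; (t = next_token()) != END; )
          printf("%s ", names[t]);               /* prints: ID Lbrack ID Rbrack EQL NUM PLUS NUM */
      printf("\n");
      return 0;
  }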


Parser

• Performs syntax analysis
• Relates the sequence of tokens to the grammar

• Builds a tree that represents this relationship, the parse tree


Partial Grammar

• assign-expr -> expr = expr
• array-expr -> ID [ expr ]
• expr -> array-expr
• expr -> expr + expr
• expr -> ID
• expr -> NUM
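A sketch of a recursive-descent parser that follows the spirit of this grammar; because recursive descent cannot use the left-recursive rule expr -> expr + expr directly, the sketch rewrites it as expr -> primary { + primary }. The token array, the primary nonterminal, and all function names are illustrative.

  #include <stdio.h>
  #include <stdlib.h>

  enum Tok { ID, LBRACK, RBRACK, EQL, NUM, PLUS, END };

  /* Token stream for  a[j] = 4 + 2  */
  static enum Tok toks[] = { ID, LBRACK, ID, RBRACK, EQL, NUM, PLUS, NUM, END };
  static int pos = 0;

  static void match(enum Tok t)
  {
      if (toks[pos] != t) { fprintf(stderr, "syntax error\n"); exit(1); }
      pos++;
  }

  static void expr(void);

  /* primary -> ID [ expr ] | ID | NUM */
  static void primary(void)
  {
      if (toks[pos] == NUM) { match(NUM); return; }
      match(ID);
      if (toks[pos] == LBRACK) { match(LBRACK); expr(); match(RBRACK); }
  }

  /* expr -> primary { + primary } */
  static void expr(void)
  {
      primary();
      while (toks[pos] == PLUS) { match(PLUS); primary(); }
  }

  /* assign-expr -> expr = expr */
  static void assign_expr(void)
  {
      expr(); match(EQL); expr();
  }

  int main(void)
  {
      assign_expr();
      match(END);
      printf("parsed successfully\n");
      return 0;
  }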


Example Parse

[Parse tree for a[j] = 4 + 2: the root assign-expression derives expression = expression; the left expression derives an array-expression, ID [ expression ], whose inner expression derives ID; the right expression derives an add-expression, expression + expression, in which each expression derives NUM.]


Abstract Syntax Tree

[Abstract syntax tree for the same statement: an assign-expression node whose children are an array-expression (over the two IDs, the array name and its index) and an add-expression (over the two NUMs); the brackets, =, and + of the concrete syntax are implicit in the node kinds.]
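A minimal sketch of what such tree nodes might look like in C; the Kind values, the field names, and the mk helper are hypothetical, and the constant 2 assumes the running example a[j] = 4 + 2.

  #include <stdio.h>
  #include <stdlib.h>

  /* Sketch of AST nodes; node kinds and field names are illustrative. */
  typedef enum { ASSIGN, SUBSCRIPT, ADD, IDENT, NUMBER } Kind;

  typedef struct Node {
      Kind kind;
      struct Node *left, *right;   /* children; NULL for leaves */
      const char *name;            /* used by IDENT nodes       */
      int value;                   /* used by NUMBER nodes      */
  } Node;

  static Node *mk(Kind k, Node *l, Node *r, const char *name, int value)
  {
      Node *n = malloc(sizeof *n);
      n->kind = k; n->left = l; n->right = r; n->name = name; n->value = value;
      return n;
  }

  int main(void)
  {
      /* AST for  a[j] = 4 + 2  */
      Node *tree =
          mk(ASSIGN,
             mk(SUBSCRIPT, mk(IDENT, NULL, NULL, "a", 0),
                           mk(IDENT, NULL, NULL, "j", 0), NULL, 0),
             mk(ADD,       mk(NUMBER, NULL, NULL, NULL, 4),
                           mk(NUMBER, NULL, NULL, NULL, 2), NULL, 0),
             NULL, 0);
      printf("root kind = %d\n", tree->kind);      /* prints 0 (ASSIGN) */
      return 0;
  }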


Semantic Analyzer

• Determine the meaning (not structure) of the program

• This is “compile-time” or static semantics only

• Example: a[j] = 4 + 2
– a refers to an array location
– a contains integers
– j is an integer
– j is in the range of the array (not checked in C)

• Parse or Syntax tree is “decorated” with this information
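A minimal sketch, in C, of the kind of static checks listed above, using made-up type codes; the enum, the function check_indexed_assignment, and its arguments are all illustrative, not part of any real compiler.

  #include <stdio.h>

  enum Type { T_INT, T_INT_ARRAY, T_OTHER };

  /* Sketch of the static checks for  a[j] = 4 + 2 , given the types
     recorded for "a" and "j" in a hypothetical symbol table. */
  static int check_indexed_assignment(enum Type array_sym, enum Type index_sym)
  {
      if (array_sym != T_INT_ARRAY) { puts("error: 'a' is not an array of int"); return 0; }
      if (index_sym != T_INT)       { puts("error: index 'j' is not an int");    return 0; }
      /* the range of j is NOT checked at compile time (nor at run time in C) */
      return 1;
  }

  int main(void)
  {
      printf("ok = %d\n", check_indexed_assignment(T_INT_ARRAY, T_INT));  /* ok = 1 */
      return 0;
  }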


Source Code Optimizer

• Simplify and improve the source code by applying rules
– Constant folding: replace “4+2” by 6 (a folding sketch follows this slide)
– Combine common sub-expressions
– Reorder expressions (often prior to constant folding)

– Etc.

• Result: modified, decorated syntax tree or Intermediate Representation
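A minimal constant-folding sketch in C over a stripped-down expression node; the Expr type and the fold function are illustrative only, but the transformation is exactly the “replace 4+2 by 6” rule above.

  #include <stdio.h>

  /* Stripped-down expression node: either a NUMBER or an ADD of two children. */
  typedef struct Expr {
      enum { NUMBER, ADD } kind;
      int value;                      /* for NUMBER */
      struct Expr *left, *right;      /* for ADD    */
  } Expr;

  /* Fold an ADD node whose children are both numbers into a single NUMBER. */
  static void fold(Expr *e)
  {
      if (e->kind != ADD) return;
      fold(e->left);
      fold(e->right);
      if (e->left->kind == NUMBER && e->right->kind == NUMBER) {
          e->value = e->left->value + e->right->value;   /* 4 + 2 -> 6 */
          e->kind  = NUMBER;
      }
  }

  int main(void)
  {
      Expr four = { NUMBER, 4, 0, 0 }, two = { NUMBER, 2, 0, 0 };
      Expr sum  = { ADD, 0, &four, &two };
      fold(&sum);
      printf("%d\n", sum.value);      /* prints 6 */
      return 0;
  }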


Code Generator

• Generates code for the target machine

• Example:
– MOV R0, j     -- value of j into R0
– MUL R0, 2     -- 2*j in R0 (int = 2 words)
– MOV R1, &a    -- address of a into R1
– ADD R1, R0    -- a + 2*j in R1 (address of a[j])
– MOV *R1, 6    -- 6 into the address in R1
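A small sketch in C of a routine that could emit this sequence; the mnemonics are the slide's illustrative target machine, and emit_indexed_store, the register choices, and the hard-coded word size of 2 are assumptions for this example only.

  #include <stdio.h>

  /* Emit the slide's instruction sequence for  array[index] = value
     (after constant folding), assuming a 2-word int. */
  static void emit_indexed_store(const char *array, const char *index, int value)
  {
      printf("MOV  R0, %s\n", index);        /* value of index into R0       */
      printf("MUL  R0, 2\n");                /* scale by word size (int = 2) */
      printf("MOV  R1, &%s\n", array);       /* address of array into R1     */
      printf("ADD  R1, R0\n");               /* R1 = address of array[index] */
      printf("MOV  *R1, %d\n", value);       /* store the constant           */
  }

  int main(void)
  {
      emit_indexed_store("a", "j", 6);
      return 0;
  }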


Target Code Optimizer

• Apply rules to improve machine code
• Example:
– MOV R0, j
– SHL R0          -- shift left to multiply by 2
– MOV &a[R0], 6   -- one more complex machine instruction replaces several simpler ones
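As a toy illustration of applying such a rule mechanically, the sketch below scans lines of emitted code and rewrites a multiply-by-2 into a shift; the instruction text and the rule are the slide's, but the string-matching approach is purely illustrative.

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      /* A few lines of "emitted" code, as plain strings. */
      const char *code[] = { "MOV  R0, j", "MUL  R0, 2", "MOV  R1, &a" };
      int n = sizeof code / sizeof code[0];

      for (int i = 0; i < n; i++) {
          if (strcmp(code[i], "MUL  R0, 2") == 0)
              puts("SHL  R0           -- shift left to multiply by 2");
          else
              puts(code[i]);
      }
      return 0;
  }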


Major Data Structures

• Tokens
• Syntax Tree
• Symbol Table
• Literal Table
• Intermediate Code
• Temporary files


Structuring a Compiler

• Analysis vs. Synthesis
– Analysis = understanding the source code
– Synthesis = generating the target code

• Front end vs. Back end
– Front end: parsing & intermediate code generation (target machine-independent)

– Back end: target code generation

• Optimization included in both parts


Multiple Passes

• Each pass processes the source code once
– One pass per phase
– One pass for several phases
– One pass for the entire compilation

• Language definition can preclude one-pass compilation


Runtime Environments

• Static (e.g. FORTRAN)
– No pointers, no dynamic allocation, no recursion

– All memory allocation done prior to execution

• Stack-based (e.g. C family)
– Stack for nested allocation (call/return)
– Heap for random allocation (new)

• Fully dynamic (LISP)
– Allocation is automatic (not in source code)

– Garbage collection required
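To contrast these in familiar terms, here is a minimal C sketch, not from the slides, showing statically allocated, stack-allocated, and heap-allocated storage; fully automatic allocation with garbage collection, as in LISP, has no direct C counterpart.

  #include <stdio.h>
  #include <stdlib.h>

  static int static_storage = 1;          /* fixed address, set up before execution */

  int main(void)
  {
      int stack_local = 2;                            /* lives in this call's stack frame */
      int *heap_cell  = malloc(sizeof *heap_cell);    /* explicit heap allocation         */
      *heap_cell = 3;
      printf("%d %d %d\n", static_storage, stack_local, *heap_cell);
      free(heap_cell);                                /* no garbage collector: freed by hand */
      return 0;
  }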


Error Handling

• Each phase finds and handles its own types of errors
– Scanning: errors like 1o1 (invalid ID)

– Parsing: syntax errors
– Semantic Analysis: type errors

• Runtime errors handled by the runtime environment
– Exception handling by programmer often allowed


Compiling the Compiler

• Using machine language
– Immediately executable, hard to write

– Necessary for the first (FORTRAN) compiler

• Using a language with an existing compiler and the same target machine

• Using the language to be compiled (bootstrapping)


Bootstrapping

• Write a “quick & dirty” compiler for a subset of the language (using machine language or another available HLL)

• Write a complete compiler in the language subset

• Compile the complete compiler using the “quick & dirty” compiler