
Topic 3 - Binding Time and Symbol Tables

Dr. William A. Maniatty, Assistant Prof.

Dept. of Computer Science, University at Albany

CSI 511 Programming Languages and Systems Concepts

Fall 2002

Monday/Wednesday 2:30-3:50, LI 99

Introduction to Binding

Binding refers to associating an entity with a value, such as:

Variable name with an address
Result of an expression with ephemeral storage
Constant with its value
Separately compiled function with an address

Binding Time

Binding time refers to when the association between entities and their values is made.

Design Binding Times

Programming language designers and implementers have additional, earlier binding times available to them.

Language Design Time - Choose fundamental primitives, reserved words, etc.

Compiler/Interpreter Implementation Time - How to internally represent language constructs.

Programming Time - Language users pick the algorithms and data structures.

Object - What does it mean?

The word Object has many meanings in programming languages.

Object Module - A compiled (but not linked) module of a program.

Object (OOP sense) - An instance of a class in Object Oriented Programming.

Object (Programming Language Sense) - The entities that are bound to values.

We use the programming language sense for now.

Binding Time Design Issues

Late binding of objects generally indicates an interpreted implementation; early binding favors compilation.

Dynamic Type Systems

Care needs to be taken to avoid ambiguity when binding.

Name space collisions
Polymorphism (overloading)

Object Attributes

Objects have many attributes:
Lifetime (persistence)
Type
Scope
Value/Address

The language should:
Precisely specify attributes
Be orthogonal - separate controls for each attribute

Object Persistence vs. Lifetime

Persistence - Persistent objects last longer than the process that created them.

Examples: files, databases.

Memory for nonpersistent objects is called volatile (you lose data if powered down).

Lifetime - When is the storage allocated to an object available?

Events Impacting Object Lifetime

Lifetime has several aspects:
Creation of objects
Creation of bindings
References to variables/subroutines/types/etc.
(Re)activation and deactivation of bindings
Destruction of bindings
Destruction of objects

Allocation and Object Lifetime

How can objects be allocated?
Statically - exist during the program's lifetime
Stack - used for ephemeral objects (e.g. subroutine activation records)
Heap objects - have controlled lifetimes

Deallocation: how is it indicated?
Explicitly - destructors/free/delete
Implicitly - garbage collection

Initialization - Separate (Constructors)
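As a concrete illustration (not from the original slides), here is a small C++ sketch of the three allocation categories; the variable and function names are hypothetical.

#include <cstdlib>

static int counter = 0;          // static allocation: exists for the program's lifetime

int square(int x) {
    int result = x * x;          // stack allocation: lives only during this activation
    return result;
}

int main() {
    counter = square(3);         // 'result' is allocated on call and freed on return
    int* p = new int(42);        // heap allocation: lifetime controlled by the program
    delete p;                    // explicit deallocation (no garbage collection here)
    return 0;
}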

Static Allocation

Done at compile time:
Literals (and constants) bound to values
Variables bound to addresses

The compiler notes undefined symbols:
Library functions
Global variables and system constants

The linker (and loader, if DLLs are used) resolves undefined references.

Stack Based Allocation

Stack layout is determined at compile time.
Variables are bound to offsets from the top of the stack.

Layout called stack frame or activation record

Compilers also place some values in registers rather than on the stack.

Function parameters and results need consistent treatment across modules

C/C++ use prototypes.
Eiffel/Java/Oberon use a single definition.

Parameter Passing Conventions

Actual Parameters - at the call site

Formal Parameters - at the subroutine declaration

Address - a memory location; data objects containing addresses can be called:

Pointer - uses an explicit dereferencing operation.
Reference - uses implicit dereferencing.

Parameter Passing Conventions

Call by value - Copy to the function

Call by reference - Pass reference

Call by address - Pass address to function

Call by result - Pass result back to caller

Call by value result - Copy inputs to the function and copy results to caller.

Parameters can be on stack or in registers.
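A brief C++ sketch (illustrative only) of call by value, call by reference, and passing an explicit address (pointer); call by result and value-result have no direct C++ equivalent and are omitted.

#include <iostream>

void byValue(int x)      { x = 10; }        // copy: the caller's argument is unchanged
void byReference(int& x) { x = 20; }        // implicit dereference: the caller sees the change
void byAddress(int* x)   { *x = 30; }       // explicit dereference through a pointer

int main() {
    int a = 1;
    byValue(a);      std::cout << a << '\n';   // prints 1
    byReference(a);  std::cout << a << '\n';   // prints 20
    byAddress(&a);   std::cout << a << '\n';   // prints 30
    return 0;
}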

Call Site Code Generation for Stack Allocation

Call setup:
Push register values on the stack (if caller saves)
Push parameters on the stack (or load into registers)

Call the function:
Push the return address on the stack
Goto the function's start address

Call cleanup (if caller saves)

Subroutine Code Generation for Stack Allocation

Prologue - Push registers that will be overwritten onto the stack (if callee saves)

Body of function

Call cleanup (if caller saves):
Copy results (if any)
Pop parameters off the stack
Pop registers
Return

Stack and Frame Layout

Stack here grows toward low addresses.

Heap Allocation

Heap provides dynamic memory management.

Not to be confused with binary heap or binomial heap data structures.

Under the hood, the allocator may periodically need to request additional memory from the O/S.

Large regions are requested at a time (requests are expensive).

Done using a library (e.g. C), or as part of the language (C++, Java, Lisp).

Heap Data Structures

Must track allocated and free memory.

Metadata is added (pointers, size, etc).

Memory Management

Holes can form where memory is freed:
Coalesce adjacent holes.
Small holes fragment the memory.

When allocating a chunk smaller than the available holes, which hole do we take it from?

First fit - the first hole found that it fits into
Best fit - the smallest segment it fits into
Worst fit - the largest segment it fits into
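A rough C++ sketch of the three fit policies over a free list; FreeBlock, the Fit enum, and the list contents are hypothetical, and a real allocator tracks much more metadata.

#include <cstddef>
#include <vector>

enum class Fit { First, Best, Worst };

struct FreeBlock { std::size_t size; };      // hypothetical free-list entry (size of the hole)

// Returns the index of the chosen hole, or -1 if no hole is large enough.
int findHole(const std::vector<FreeBlock>& holes, std::size_t request, Fit policy) {
    int chosen = -1;
    for (int i = 0; i < static_cast<int>(holes.size()); ++i) {
        if (holes[i].size < request) continue;                           // too small, skip it
        if (policy == Fit::First) return i;                              // first fit: first hole that fits
        if (chosen == -1 ||
            (policy == Fit::Best  && holes[i].size < holes[chosen].size) ||   // smallest that fits
            (policy == Fit::Worst && holes[i].size > holes[chosen].size))     // largest that fits
            chosen = i;
    }
    return chosen;
}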

When to Free Memory

Depends on the language.

Explicit deallocation - needed for library approaches (e.g. C).

Implicit deallocation - aka garbage collection:
Garbage is unreferenced memory.
Compaction moves allocated memory to contiguous addresses (coalescing all holes).
Can cause timing variations (care is needed in real-time systems).

Speeding Up Searching for a Free Block

Recall that all fitting schemes require finding sufficiently large blocks.

Idea: Organize Free List according to block size.

Fibonacci heap - use Fibonacci numbers for block sizes.

Buddy system - use block sizes of 2^k.
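To illustrate the buddy-system idea of power-of-two size classes, here is a small hypothetical C++ helper that rounds a request up to the next 2^k; free lists are then kept per size class, so finding a sufficiently large block is fast.

#include <cstddef>

// Round a request up to the next power of two, as a buddy-system allocator would.
std::size_t buddyBlockSize(std::size_t request) {
    std::size_t size = 1;
    while (size < request) size <<= 1;   // double until the block is large enough
    return size;                         // e.g. a 100-byte request gets a 128-byte block
}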

Introduction to Scope

Scope refers to the region of a program during which a binding is active.

Consider the following code segment; what should the output be?

program(output) {
    const int i = 1;
    procedure b() {
        write(output, i); // What value is output here?
    }
    procedure a() {
        const int i = 2;
        b();
    }
    a(); // invoke a
}

Scope Rules

Two popular answers to the problem:

Static (lexical) scope - Resolved by compile-time analysis. Normally, in block-structured languages, the containing scope is preferred; the output is 1 in this case.

Dynamic scope - The value is found at run time by resolving to the nearest stack frame in which it is defined; the output is 2 in this case.

Lexical scope is more popular.

Variants of Static Scope

Single Global Scope (BASIC) - simplest

Global and Local (Fortran)

Fortran common blocks:
Support separate compilation
Give the base address of a region
Each program specifies a (possibly different) layout

Block Structured (Pascal)

Modules and Separate Compilation

Modules support encapsulation (much like classes).

Found in Modula 2, Euclid, Oberon and Ada.

For separate compilation, define interfaces (data and subroutines):
Export statements - publish interfaces
Import statements - use published interfaces

Classes are extensions of modules

More Notation

Fundamental question: Does the scope need to be explicitly imported to be visible?

Yes - referred to as closed scope.
No - referred to as open scope.

Aliasing - having more than one way to refer to the same object.

Classes and Scope

Classes provide encapsulation in object oriented programming (OOP).

Supports aggregating heterogeneous data and operations together.

Interfaces are published (e.g. the public section of a C++ class).
Internals can be hidden (a la the private section in C++).
Constructors and destructors are supported.
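A minimal C++ sketch of these points (the Buffer class is hypothetical, not from the slides): a published public interface, hidden internals, and a constructor/destructor pair.

#include <cstddef>

class Buffer {
public:                              // published interface
    Buffer(std::size_t n) : size(n), data(new char[n]) {}   // constructor initializes the object
    ~Buffer() { delete[] data; }                            // destructor releases resources
    std::size_t length() const { return size; }
private:                             // hidden internals
    std::size_t size;
    char* data;
};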

OOP Features

I think of OOP as providing:

Encapsulation - groups data with operations

Inheritance - permits extension of more general base classes (and overriding behaviors)

Polymorphism (overloading) - allows operators/subroutines to have behaviors dependent on the types of arguments and results expected

Dynamic Scope

Dynamic scoping prefers the instance defined in the most recently invoked function.

Not very popular currently (hard to debug).

Found in interpreted languages (APL, older Lisp dialects, e.g. Emacs Lisp).

Fans claim that it makes customizing subroutines easier.

Another Dynamic vs. Static Scope Example

Symbol Table Design Criteria

Symbol tables require:
Fast insertion
Fast lookup
Occasional deletion (should also be fast)

This motivates the use of hash tables.

But ordinary hash tables do not handle nesting well (as with classes/records/subroutines).

Operations on Symbol Tables (Static Scope)

A symbol table should support:
Entering a scope
Leaving a scope
Inserting a symbol (with scope information)
Looking up a symbol (with scope information)

It is often useful to store the symbol table in object files/executables,

e.g. For debugging or source level analysis

LeBlanc-Cook Symbol Table Lookup 1/5

LeBlanc-Cook symbol table lookup:
Each scope is assigned a serial number
Elements are never deleted from the table
A scope counter is maintained:
The first scope is 0
Every new scope encountered increments the counter
To track nesting, a scope stack is maintained:
Push to enter a scope, pop when leaving a scope

LeBlanc-Cook Symbol Table Lookup 2/5

Put all symbols in a single hash table.
Keywords are not inserted (can use another hash table).
Entries are indexed using both name and scope.

To look up a name:
Look in the hash table for the (name, scope) pair.
If not found:
The parent scope is found using the scope stack.
Test whether the parent scope is open or exports the symbol.
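A rough C++ sketch of this lookup scheme under the assumptions stated above (one hash table keyed by name, a chain of (scope, visibility) entries per name, and a scope stack); the class and field names are illustrative, not LeBlanc and Cook's original code.

#include <string>
#include <unordered_map>
#include <vector>

struct SymbolInfo { int scope; bool exported; /* type, offset, ... */ };

class SymbolTable {
    std::unordered_map<std::string, std::vector<SymbolInfo>> table; // all scopes share one table
    std::vector<int> scopeStack;   // innermost scope on top
    int scopeCounter = 0;          // first scope is 0; serial numbers are never reused

public:
    void enterScope() { scopeStack.push_back(scopeCounter++); }
    void leaveScope() { scopeStack.pop_back(); }   // entries stay in the table

    // Assumes enterScope() has been called at least once.
    void insert(const std::string& name, bool exported) {
        table[name].push_back({scopeStack.back(), exported});
    }

    // Look up a name: try the innermost scope first, then walk outward via the scope stack.
    const SymbolInfo* lookup(const std::string& name) const {
        auto it = table.find(name);
        if (it == table.end()) return nullptr;
        for (int i = (int)scopeStack.size() - 1; i >= 0; --i) {
            bool innermost = (i == (int)scopeStack.size() - 1);
            for (const SymbolInfo& s : it->second)
                if (s.scope == scopeStack[i] && (innermost || s.exported))
                    return &s;     // enclosing scopes must be open/export the symbol
        }
        return nullptr;
    }
};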

LeBlanc-Cook Symbol Table Lookup 3/5

About hashing and hash functions: Is the universe of keys known in advance?

Yes - perfect minimal hashing may be possible.
No - must handle collisions, e.g. quadratic rehash or chaining.

The symbol table algorithm has to handle collisions if hashing is used.

LeBlanc-Cook Symbol Table Lookup 4/5

LeBlanc-Cook Symbol Table Lookup 5/5

LeBlanc-Cook Symbol Table (An Example)

Dynamic Scope and Symbol Table Management

Dynamic scope has different symbol table management needs than static scope

It needs insert, lookup, enter scope, and leave scope operations, just like static scope.

Competing approaches: simplicity vs. speed.

Association lists - simple; fast scope entry/exit.

Central reference table - like LeBlanc-Cook sans the reference stack; faster lookup (the common case?), slower scope entry/exit.

Association Lists

Association Lists (A-Lists) combine list and stack treatment.

When a new scope is entered:
Push its symbols on the stack.
Use a unidirectional linked list to implement the stack.

To find an item:
Scan the stack starting at the top.

When leaving a scope:
Pop all of the scope's symbols from the stack.
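A small C++ sketch of an association list as a linked stack of (name, scope) bindings; the names and the global top pointer are illustrative only.

#include <string>

struct AListNode {
    std::string name;
    int scope;
    AListNode* next;               // unidirectional link toward older bindings
};

AListNode* top = nullptr;          // top of the binding stack

// Entering a scope: push its symbols one by one.
void pushBinding(const std::string& name, int scope) {
    top = new AListNode{name, scope, top};
}

// Lookup: linear scan from the top, so the most recent binding wins.
AListNode* find(const std::string& name) {
    for (AListNode* p = top; p != nullptr; p = p->next)
        if (p->name == name) return p;
    return nullptr;
}

// Leaving a scope: pop every binding that belongs to it.
void leaveScope(int scope) {
    while (top != nullptr && top->scope == scope) {
        AListNode* old = top;
        top = top->next;
        delete old;
    }
}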

Central Reference Tables (1)

Central reference tables use hashing:
Elements are keyed by symbol.
Each element is a stack.

So we have one stack per symbol, with the newest scope on top.

Use a unidirectional linked list to implement each stack.

Central Reference Tables (2)

To insert a symbol/scope:
Hash on the symbol; push the symbol/scope on its stack.

To find a symbol in a scope:
Hash to the symbol's stack.
Use the scope at the top of the stack.

When leaving a scope:
Pop all symbols in that scope from the top of their respective stacks.
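A corresponding C++ sketch of a central reference table: one hash table keyed by symbol, each entry holding a per-symbol stack of scopes. The class name is hypothetical, and a vector stands in for the linked-list stack.

#include <string>
#include <unordered_map>
#include <vector>

class CentralReferenceTable {
    std::unordered_map<std::string, std::vector<int>> stacks;  // one stack of scopes per symbol

public:
    // Insert: push the scope onto the symbol's stack (newest scope ends up on top).
    void insert(const std::string& name, int scope) {
        stacks[name].push_back(scope);
    }

    // Lookup: the binding in effect is the scope on top of the symbol's stack (-1 if unbound).
    int lookup(const std::string& name) const {
        auto it = stacks.find(name);
        if (it == stacks.end() || it->second.empty()) return -1;
        return it->second.back();
    }

    // Leaving a scope: pop that scope from the top of each symbol's stack.
    // (A real implementation would keep a per-scope list of symbols to avoid scanning the table,
    //  which is why scope exit is the slower operation here.)
    void leaveScope(int scope) {
        for (auto& entry : stacks)
            if (!entry.second.empty() && entry.second.back() == scope)
                entry.second.pop_back();
    }
};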

Resolving Static Scope at Run Time

Consider a function F containing G. i.e. F and G are nested functions

Suppose G uses an identifier in F's scope. How can G find F's frame pointer at run time?

If G is always invoked by F, just do base + offset.
Called static chaining - the offset is computed at compile time.

But what if G is separated from F by recursive invocations?
Use pointer jumping (exploit transitivity and associativity).
Called dynamic chaining - requires run-time support.

An Example Requiring Dynamic Chaining

program DynChain(input, output);
var
  basex, basey, TimesCalled : integer;

function Ackermann(x, y : integer) : integer;
begin
  TimesCalled := TimesCalled + 1;
  if (x = 0) then
  begin
    writeln('Returning ', y + 1, ', bx = ', basex, ', by = ', basey,
            ', TimesCalled = ', TimesCalled);
    Ackermann := y + 1;
  end
  else if (y = 0) then
    Ackermann := Ackermann(x - 1, 1)
  else
    Ackermann := Ackermann(x - 1, Ackermann(x, y - 1));
end; { Ackermann }

begin
  TimesCalled := 0;
  writeln('Enter basex and basey');
  readln(basex, basey);
  writeln('Ackermann(basex = ', basex, ', basey = ', basey, ') = ',
          Ackermann(basex, basey), ', TimesCalled = ', TimesCalled);
end.

Subroutine Closures

Consider when a function, F, is passed as an argument to another function, G.

E.g., comparison operators for sorting.

When G invokes F, how can we determine the scope?

Subroutine closures describe a function's scope along with its instruction-space (code) address.
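A small C++ illustration (not from the slides): a sort routine G receives a comparison function F; the lambda version also captures part of its defining environment, which is the essence of a closure.

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <vector>

bool descending(int a, int b) { return a > b; }   // F: a plain comparison function

int main() {
    std::vector<int> v = {3, 1, 4, 1, 5};
    std::sort(v.begin(), v.end(), descending);    // G = std::sort invokes F

    int pivot = 3;                                 // part of main's environment
    auto nearPivot = [pivot](int a, int b) {       // closure: captures 'pivot' along with the code
        return std::abs(a - pivot) < std::abs(b - pivot);
    };
    std::sort(v.begin(), v.end(), nearPivot);
    for (int x : v) std::cout << x << ' ';
    std::cout << '\n';
    return 0;
}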

Overloading Defined

An overloaded function or operator selects its semantics based on the types of its parameters and result

Implicit overloading - provided by the language:
e.g. addition in Pascal can handle reals or integers
Write and Writeln in Pascal

Explicit overloading - programmers resolve actions

e.g. Overloaded operators and methods in C++
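A brief C++ example of explicit overloading (the Complex type is hypothetical): the compiler selects the operator body based on the parameter types.

#include <iostream>

struct Complex { double re, im; };

// Explicit overloading: the programmer defines what '+' means for Complex.
Complex operator+(Complex a, Complex b) {
    return {a.re + b.re, a.im + b.im};
}

int main() {
    int i = 1 + 2;                                // built-in (implicit) integer addition
    double d = 1.5 + 2.5;                         // built-in real addition
    Complex c = Complex{1, 2} + Complex{3, 4};    // user-defined overload selected by types
    std::cout << i << ' ' << d << ' ' << c.re << '+' << c.im << "i\n";
    return 0;
}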

Some thoughts on Overloading

Should user defined overloading of operators be permitted?

Pro: Permits a consistent interface.
e.g. A = B * C; good for integer, real, complex ...

Cons: You may need to read the entire program to understand a single line of code.

e.g. A = B * C; What if B and C are objects? Inheritance?

What to do with ephemeral objects? e.g. A * B * C (the intermediate result of B * C is ephemeral).

More Thoughts

Meyer's Eiffel overloads the notation A(i):
A single-parameter function
A single-index array
Because functions and arrays are often interchangeable!

Operator vs. function overloading:
Operator - syntactic sugar
Function - programmers know to read the code

Challenges of Overloading

Compiler needs to be smart about types

Separate compilation is hard:
e.g. the Unix linker predates C++

Name mangling:
Can break system tools (profilers/debuggers)
The compiler creates a unique name based on the operator/function name and the parameter/result types
No standard is defined
Hard to link code compiled by different compilers

Templates

Templates in C++ are used for container classes.

The base type describes elements in the container.

The base type is a parameter to the template passed when instantiated (or in a typedef).

Makes separate compilation hard: typically the interface needs to be compiled by both the publisher and the user (header files).
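A minimal C++ template sketch (the FixedArray container is hypothetical) showing the base type as a parameter supplied at instantiation; the whole definition would live in a header so that both publisher and user can compile it.

#include <cstddef>

// A hypothetical fixed-size container; T is the base type of the elements.
template <typename T, std::size_t N>
class FixedArray {
public:
    T&       operator[](std::size_t i)       { return elems[i]; }
    const T& operator[](std::size_t i) const { return elems[i]; }
    std::size_t size() const { return N; }
private:
    T elems[N];
};

// Instantiation: the base type (and size) are supplied when the template is used.
FixedArray<int, 10> counts;
typedef FixedArray<double, 3> Vec3;   // or bound in a typedef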

Templates Pros and Cons

Templates promote code reuse, but also promote compiled-code bloat.

Recovering from syntax errors is hard!
Make a small STL error, get pages of errors.
And the error messages are not helpful!

Vandevoorde's Xroma - have the template developer give the compiler hints (also for code generation).

Summary

Binding associates names and values

Scope rules govern which name binds to which value in the event that a name is reused.

Naming combined with type information permits overloading (promoting code reuse).