Names and Binding
• In procedural programming, you write instructions that manipulate the “state” of the process, where the “state” is the collection of variables and their values
  – in this chapter we will consider the idea of variables, storage, binding, types and scope
• Design issues for identifiers (names)
  – does the language have a maximum length?
    • most languages either have no restriction or the restriction is large enough to be immaterial (e.g., 31 in C and Pascal, 30 in COBOL)
  – does the language have legal connectors?
    • most languages use _ or “camel” notation
    • COBOL uses - (hyphen), which detracts from readability
  – are letters case sensitive?
    • this can detract from both readability and writability
  – are the special words of the language context sensitive (key words) or reserved?
    • in FORTRAN, INTEGER and REAL are context sensitive, leading to this possibility
      – INTEGER REAL (declares an integer variable named REAL)
      – REAL INTEGER (declares a real variable named INTEGER)

Variables
• A variable is an abstraction of
  – a memory location (for reference)
  – and the type-specific methods to perform operations
• The address is sometimes referred to as the l-value
  – the memory address or location of the left-hand side variable
• Aliases arise when multiple identifiers refer to the same memory location
  – aliases are created through
    • pointers pointing at the same allocated piece of memory
    • parameter passing when the item is passed through a pointer
    • union types (we explore this in chapter 6)
• Type – specifies the allowable range of values to be stored and allowable operations
• Value – the current value stored in memory, sometimes referred to as the r-value
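As a rough illustration of aliasing, the C sketch below changes one variable through a second pointer and through a pointer parameter. The names `bump` and `alias_demo` are hypothetical and exist only for this example.

```c
#include <stdio.h>

/* Hypothetical helper: increments the int that p points at. */
static void bump(int *p) {
    *p += 1;
}

/* Modifies `total` through two aliases and returns its final value. */
int alias_demo(void) {
    int total = 10;       /* `total` names a memory location (its l-value) */
    int *alias = &total;  /* `alias` now refers to the same location       */

    *alias = 20;          /* writing through the pointer changes `total`   */
    bump(&total);         /* passing a pointer creates another alias       */

    printf("total = %d\n", total);
    return total;         /* r-value of `total` is now 21 */
}
```

Calling `alias_demo()` shows that `total` changed even though it never appears on the left-hand side of an assignment after its declaration.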

Binding
• Binding is the association of an object to its attributes, operations, or name
  – there are many different types of bindings and bindings can occur at different times
  – for example:
    • design time binding: binding * to multiplication
    • language implementation time binding: int size and operations
    • compile time binding: binding type to a variable
      – for instance in int x; x will be treated as an int from here forward
    • link time binding: binding function name to a specific function definition
    • load time binding: binding variable to memory location
    • run time binding: binding variable to memory location or binding a polymorphic variable to a specific class type
• Binding is static if it occurs before run time and remains unchanged, otherwise binding is dynamic

Bindings: Variable Declarations
• Binding the identifier name to the declared type
• Are variables declared explicitly or implicitly?
  – most languages require explicit variable declarations
  – exceptions are
    • FORTRAN: if the first letter is I..N, then the variable is an integer, otherwise a real
    • BASIC and PL/I: type binding occurs when the variable is first assigned a value
    • Perl, JavaScript, Ruby, Lisp: variables are typeless – that is, they can change types any time the values assigned to them are changed – see dynamic type binding below
• Dynamic Type Binding
  – binding of type is not explicit but derived by examining assignment statements at runtime
    • LISP, JavaScript, Perl, Ruby, and PHP are all like this
    • note that in Perl, different types of variables are determined by their name
      – $name is a scalar, @name is an array, %name is a hash structure
• Type Inference
  – using inference rules to determine the type returned by a function – this is the case in the languages ML, Miranda and Haskell

Static Lifetime
• The lifetime is the time from when a variable is bound to memory until it is unbound
  – a static lifetime means that the variable is bound before program execution begins and remains bound to the same location until program termination
• Variables are statically bound if they are
  – global variables
  – variables in a C function declared as static
  – variables in a FORTRAN subroutine
    • this allows the subprogram to retain the value of the variable after the subprogram terminates
    • however, this prohibits recursion or shadowing
• Static is the most efficient form of binding
  – access is performed using direct memory addressing mode
  – no overhead needed for allocation or deallocation at runtime
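A minimal C sketch of a static lifetime: the hypothetical function `next_id` keeps a `static` local, so the value survives from one call to the next.

```c
/* Hypothetical counter: `calls` has a static lifetime, so it is bound to
   storage once and keeps its value between invocations. */
int next_id(void) {
    static int calls = 0;  /* initialized once, before the first call */
    calls += 1;
    return calls;
}
```

Three successive calls return 1, 2 and 3. Without the `static` keyword every call would start again from 0 and return 1.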

Stack Dynamic Lifetime
• The variable is bound when execution reaches the variable declaration in the code and unbound when the code that includes the declaration terminates
  – local variables and parameters in procedures, functions, methods are stack dynamic for the Algol-descended languages (C/C++/Java, Pascal, Ada, etc.)
  – variables are pushed onto the run-time stack when the function/method begins execution and are popped off the stack when it terminates
    • allocation and deallocation are performed by the run-time environment
  – stack dynamic binding allows for recursion
    • FORTRAN’s handling of parameters being static does not allow for recursion
    • extra runtime overhead is needed for allocation and deallocation
  – memory space for these variables is provided on the run-time stack
  – variables declared inside a block are stack dynamic
    • { int x; …} in C/C++/Java, or inside begin…end blocks in Pascal/Ada/Algol
  – stack dynamic lifetimes will not allow a history of values to be saved after the block of code in which they were declared has terminated
    • FORTRAN 95 includes a Save list instruction to save the list of variables somewhere other than the stack
    • if you want to retain a variable’s value in C, declare it as static
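The link between stack dynamic variables and recursion can be sketched in C as follows. The function name `sum_to` is hypothetical; the point is only that each activation gets its own copy of `n` and `partial`.

```c
/* Each activation of sum_to has its own n and partial on the run-time stack. */
int sum_to(int n) {
    int partial;              /* stack dynamic: allocated on entry, released on return */
    if (n <= 0) {
        return 0;
    }
    partial = sum_to(n - 1);  /* the recursive call cannot disturb this activation's n */
    return partial + n;
}
```

If `n` and `partial` were static instead, every level of the recursion would share one copy and the inner calls would overwrite the values the outer calls still need.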

Explicit Dynamic Lifetime
• Variables are allocated and deallocated explicitly at runtime
  – memory for these types of variables comes from a system reserved area called the heap
  – allocated memory from the heap is nameless
    • that is, there is no binding of a named variable to a memory location
  – these locations must be referenced by pointers
    • the pointer might be named, as in int *p;
    • or the pointer may be part of a struct that is itself allocated from the heap
  – we use this memory for dynamic structures like linked lists, trees
  – in C/C++, we allocate heap memory using malloc and calloc
  – in Java, C++, Pascal and Ada, we use new
  – in PL/I, we use allocate
  – we must explicitly deallocate heap memory in most languages (but not Java or C# where garbage collection is used)
  – C# has parameters that can be either stack dynamic or explicit dynamic

• The variable’s type is bound at compile time even though the variable’s memory is not allocated until run-time
  – how deallocated memory is reclaimed is implementation specific (for instance, free in C makes the block available for reuse by the program but need not return it to the operating system)
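A small C sketch of explicit dynamic allocation, using a singly linked list. The type `struct node` and the functions `push_front`, `list_sum` and `list_free` are hypothetical names chosen for this illustration.

```c
#include <stdlib.h>

/* Hypothetical list node; the nodes themselves are nameless heap objects. */
struct node {
    int value;
    struct node *next;   /* a pointer stored inside heap memory itself */
};

/* Allocates a node on the heap and links it in front of head.
   Returns the new head, or the old head unchanged if allocation fails. */
struct node *push_front(struct node *head, int value) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL) {
        return head;
    }
    n->value = value;
    n->next = head;
    return n;
}

/* Sums the values stored in the list. */
int list_sum(const struct node *head) {
    int sum = 0;
    for (; head != NULL; head = head->next) {
        sum += head->value;
    }
    return sum;
}

/* Explicit deallocation: every node obtained from malloc is released with free. */
void list_free(struct node *head) {
    while (head != NULL) {
        struct node *next = head->next;  /* save the link before freeing */
        free(head);
        head = next;
    }
}
```

Only the named pointer that holds the head of the list is an ordinary variable; every node is reachable solely by following pointers from it.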

Implicit Dynamic Lifetime
• The variable’s memory is only bound while it is assigned a value
  – the variable’s attributes are bound during this time so that, if the variable is unbound and bound to a new memory location, then its attributes (including type) change
    • Lisp and JavaScript both use this approach
    • ALGOL 68 also used this approach for “flex” arrays
  – like explicit dynamic lifetime binding, the memory comes from the run-time heap, but here, allocation is implicit – that is, you do not have to explicitly use an allocation instruction like new or malloc
  – as with Java and C#, there is no need to explicitly deallocate the memory; it is garbage collected when no pointers are pointing at it
  – this form of lifetime has the highest degree of flexibility but also the highest amount of overhead
    • no compile-time type checking is possible since the type can change, so any type checking (if performed at all) must be done at run-time
  – allows for generic code which can operate on any type

Type Checking
• Ensures that operands of an operator are compatible
  – a compatible type is either one that is legal for the operator or is allowed via an implicit coercion (coercion is the automatic conversion of a value’s type to a legal type for the operation)
    • in C and Java, an int can be coerced into a float and a float can be coerced into a double; Java requires an explicit cast to go the other way, while C also narrows implicitly on assignment
    • in Pascal, an integer can be coerced into a string but in C++ or Java, it must be cast into a string
• A type error (often called a type mismatch) occurs if an operand is not appropriate for an operator and cannot be coerced
• If all bindings of variable to type are static, then type checking can be done completely at compile time
  – JavaScript, Lisp and APL perform dynamic (run-time) type checking
  – type checking is complicated if a memory location can store different types at different times of a program’s execution
    • which is the case in FORTRAN or with Union types (we visit this in chapter 6)
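The following C sketch shows coercion at work in three small hypothetical functions (`scale`, `half_truncated` and `truncate_to_int` are names invented for this example).

```c
/* Mixed-mode arithmetic: `count` is implicitly coerced from int to double. */
double scale(int count, double factor) {
    return count * factor;
}

/* Both operands are int, so integer division happens first;
   only the int result is coerced to double for the return. */
double half_truncated(int n) {
    return n / 2;
}

/* An explicit cast requests the narrowing conversion and drops the fraction. */
int truncate_to_int(double x) {
    return (int)x;
}
```

Here `scale(3, 0.5)` yields 1.5, `half_truncated(7)` yields 3.0 rather than 3.5, and `truncate_to_int(2.9)` yields 2. The second case is a reminder that coercion is applied per operator, not per statement.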

Strongly Typed
• A programming language is strongly typed if all type checking errors are detectable before run-time
  – a more restrictive definition is if every identifier in a program has a single type associated with it and known at compile time
• Both definitions require static binding
  – having a strongly typed language is desirable because it offers the best reliability with regard to type errors
• Few languages are strongly typed because nearly all languages have a mechanism to get around type checking
  – FORTRAN – in early versions, parameters were not checked, union types are available (EQUIVALENCE)
  – Pascal – has variant records (we will examine this in chapter 6)
  – C, C++ – functions may have params that skip type checking, and they have union types
  – Ada, Java and C# – they are close to being strongly typed in that no type errors can arise implicitly; however, all three languages have unchecked conversions (casts in Java and C#) which can lead to typing errors
  – APL, SNOBOL, LISP – dynamic type binding
  – ML is strongly typed but some of the types are inferred instead of declared
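To make the point about C union types concrete, here is a sketch in which the same storage is written as a float and read back as an unsigned integer. The names `union number` and `float_bits` are hypothetical, and the example assumes that `float` occupies 32 bits on the target machine.

```c
#include <stdint.h>

/* A union lets the same storage be viewed as two unrelated types. */
union number {
    float    as_float;
    uint32_t as_bits;
};

/* Stores a float, then reads the same bytes back as an unsigned integer.
   The compiler accepts this without reporting any type error. */
uint32_t float_bits(float f) {
    union number n;
    n.as_float = f;
    return n.as_bits;   /* the bytes are reinterpreted; no conversion is performed */
}
```

The result of `float_bits(1.0f)` is the machine's bit pattern for 1.0, not the integer 1, and neither the compiler nor the run-time system flags the mismatch. This is the kind of loophole that keeps C from being strongly typed.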

Type Compatibility
• Type compatibility determines whether a type error should arise or not
  – types are compatible if one is coercible into another
• Languages will determine compatibility based on one of two strategies:
  – name type compatibility
    • if the variables are declared using the same declaration or the same type
    • example: int x; int y; // x and y are name type compatible
  – compatible by structure
    • if the variables have the same structure even though they are of differently named types, for example:
      – struct foo { int x; float y; };
      – struct bar { int x; float y; };
      – foo a; bar b; // under structure compatibility, a and b are compatible
• C uses compatibility by structure for most types but name compatibility for struct, enum and union types, while C++ uses name compatibility
• Ada uses name compatibility except for anonymous arrays which use structure compatibility
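How C itself treats these cases can be sketched as below. The typedef names `celsius` and `fahrenheit` and the functions `mix_units` and `copy_fields` are hypothetical. Two typedef names for `int` are interchangeable, whereas two struct types with identical members remain distinct types.

```c
/* Two typedef names for the same underlying type are interchangeable in C. */
typedef int celsius;
typedef int fahrenheit;

/* Two struct types with identical members are still distinct types in C. */
struct foo { int x; float y; };
struct bar { int x; float y; };

fahrenheit mix_units(celsius c) {
    fahrenheit f = c;   /* accepted: typedef creates an alias, not a new type */
    return f;
}

struct bar copy_fields(struct foo a) {
    struct bar b;
    /* b = a;  would be rejected: struct foo and struct bar are incompatible */
    b.x = a.x;          /* so the members are copied one at a time */
    b.y = a.y;
    return b;
}
```

A language using pure structure compatibility would accept the direct assignment `b = a`; a language using pure name compatibility could also reject mixing `celsius` and `fahrenheit`.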

Scope
• Scope is the range of statements from which a variable is “visible” (where the variable can be referenced)
• Scope rules of a given language determine how the name being referenced will be associated with a particular variable in memory of that name
  – this is necessary when dealing with non-local references to variables, or when variables are re-declared inside of blocks
• We will examine two forms of scope, static scope and dynamic scope
  – nearly all languages use static scope because it is easier to understand and type checking can be performed at compile time
    • Lisp is one of the few that has used dynamic scope, so we will consider this, although today Lisp makes static scope available because of the difficulties in reading code that is dynamically scoped

Static Scope
• Introduced in ALGOL 60 to bind names to non-local variables and has been copied by most languages since
  – scope for any variable can be determined prior to execution
  – two subtypes of static scoping
    • languages where subprograms can be nested (e.g., Ada, Pascal)
    • languages where subprograms cannot be nested (e.g., C, Java)
  – if subprograms can be nested, then this creates a hierarchy of scopes formed by the definitions of the subprograms
    • example: if sub1 is defined inside of sub2, which is inside of sub3, then a variable referenced in sub1 but not declared in sub1 would be sought in sub2, and if not in sub2, then in sub3
  – if two or more enclosing subprograms use the same name for a variable, the reference is to the declaration in the innermost subprogram enclosing the reference, and the outer variable is “hidden”
    • in Ada, a hidden variable can still be accessed via notation like: sub1.x

Example of Static Scope

Consider a program with nested subprograms: MAIN contains A and B, A contains C and D, B contains E

In the language of this program, a subprogram can call any subprogram declared above it within the same enclosing subprogram, and can be called by the subprogram it is nested in

So A can call C and D, D can call C, but C cannot call D

B (which appears below A) can call A or E, but D and C cannot call B or E

Assume x is declared in MAIN, B and C, and assume MAIN calls B, which calls E, which calls A, which calls D

If x is referenced in E, it is B’s x, whereas if x is referenced in D, it is MAIN’s x (not C’s) because D is statically scoped inside of A and MAIN but not in C (and A does not declare x)

[Figure: nesting tree for the example, with MAIN at the root, A and B below it, C and D below A, and E below B]

Blocks
• In C-like languages
  – static scoping is not as much an issue because subprograms are not nested inside one another
    • any non-local variable will be a variable declared in the file (otherwise there is an error)
• C-languages do allow blocks, which have the same scope rules as static scoping
  – a variable is found in the section of code in which it is referenced, or else you must follow the blocks outward until you reach the definition or reach the end of the function

void scopeexample(int x) {
    …                       // reference 1
    {
        int x;              // declaration A
        …                   // reference 2
        {
            …               // reference 3
            {
                int x;      // declaration B
                …           // reference 4
                {
                    …       // reference 5
                }
            }
        }
    }
    …                       // reference 6
}

In the above example, reference 1 is to the parameter, references 2 and 3 are to declaration A, references 4 and 5 are to declaration B, and reference 6 is to the parameter again. If x were not the name of the parameter (and no x were declared at file level), references 1 and 6 would yield compile-time errors for an undeclared identifier
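A compilable variant of the block example is sketched below. It is an adaptation, not the slide's own code: the elided statements are replaced by stores into a hypothetical `seen` array, and declarations A and B are given the arbitrary values 100 and 200 so the references can be told apart.

```c
/* Records which x each reference sees: the parameter, declaration A (100)
   or declaration B (200). seen must have room for six ints. */
void scope_example(int x, int seen[6]) {
    seen[0] = x;                  /* reference 1: the parameter */
    {
        int x = 100;              /* declaration A */
        seen[1] = x;              /* reference 2: A */
        {
            seen[2] = x;          /* reference 3: A */
            {
                int x = 200;      /* declaration B */
                seen[3] = x;      /* reference 4: B */
                {
                    seen[4] = x;  /* reference 5: B */
                }
            }
        }
    }
    seen[5] = x;                  /* reference 6: the parameter again */
}
```

Calling `scope_example(1, seen)` fills `seen` with 1, 100, 100, 200, 200, 1, matching the resolution described for references 1 through 6.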

Dynamic Scoping
• Scope is based on the sequence of calling subprograms
  – not their physical location
• To resolve a reference
  – search backward through the chain of subprogram calls until the subprogram in which the variable was declared is found
• Dynamic scoping was used in APL, SNOBOL4, early LISPs

MAIN
  - declaration of x
  SUB1
    - declaration of x
    - ...
    call SUB2
    ...
  SUB2
    ...
    - reference to x
    - ...

MAIN calls SUB1
SUB1 calls SUB2
SUB2 references x

Static scoping - reference to x is to MAIN's x

Dynamic scoping - reference to x is to SUB1's x

Dynamic scoping makes a program less readable and less reliable because a non-local reference cannot be resolved until run-time

From the previous example, if MAIN calls B, which calls E, which calls A, which calls D, and we access x in D, then we will reference B’s x, not MAIN’s
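C is statically scoped, so dynamic scoping can only be emulated. The sketch below does so with an explicit stack of bindings for the name x, which is one common way to picture the rule but is an assumption of this example and not how any particular language implements it. All names (`x_stack`, `declare_x`, `sub1_dynamic` and so on) are hypothetical, MAIN's x is given the value 1 and SUB1's x the value 2, and the stack has no overflow checks.

```c
#include <stddef.h>

/* Hypothetical binding stack: the most recent active declaration of x is on top. */
#define MAX_BINDINGS 16
static int    x_stack[MAX_BINDINGS];
static size_t x_depth = 0;

static void declare_x(int value) { x_stack[x_depth++] = value; }  /* on subprogram entry */
static void undeclare_x(void)    { x_depth--; }                    /* on subprogram exit  */
static int  lookup_x(void)       { return x_stack[x_depth - 1]; }  /* search the call chain */

static int main_x = 1;   /* MAIN's x as seen under static scoping */

/* SUB2 declares no x of its own; it only references x. */
static int sub2_static(void)  { return main_x; }      /* resolved from the program text */
static int sub2_dynamic(void) { return lookup_x(); }  /* resolved from the call chain   */

/* SUB1 declares its own x (value 2) and then calls SUB2. */
int sub1_static(void) {
    int x = 2;            /* local x, invisible to SUB2 under static scoping */
    (void)x;
    return sub2_static();
}

int sub1_dynamic(void) {
    int result;
    declare_x(2);         /* SUB1's x becomes the newest binding */
    result = sub2_dynamic();
    undeclare_x();
    return result;
}

/* MAIN declares x (value 1) and calls SUB1. */
int main_dynamic(void) {
    int result;
    declare_x(1);
    result = sub1_dynamic();
    undeclare_x();
    return result;
}
```

Here `sub1_static()` returns 1 (MAIN's x) while `main_dynamic()` returns 2 (SUB1's x), mirroring the two outcomes listed for the MAIN, SUB1, SUB2 example.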

Referencing Environment
• The referencing environment is the collection of variables which are accessible (visible) to a given statement
  – in statically scoped languages, each statement’s referencing environment can be determined at compile time
  – in dynamically scoped languages, the referencing environment of a statement consists of all variables in the local subprogram plus the visible variables of all other active subprograms (i.e., execution started but not yet terminated)

Example in Ada

procedure Example is
  A, B : Integer;
  …
  procedure Sub1 is
    X, Y : Integer;
  begin
    …  -- point 1
  end;
  procedure Sub2 is
    X : Integer;
    …
    procedure Sub3 is
      X : Integer;
    begin
      …  -- point 2
    end;
  begin
    …  -- point 3
  end;
begin
  …  -- point 4
end;

Referencing environment at
1: X, Y of Sub1; A, B of Example
2: X of Sub3; A and B of Example (X of Sub2 is hidden but accessible as Sub2.X)
3: X of Sub2; A and B of Example
4: A and B of Example

Constants, Variable Initializations
• A named constant is an identifier bound to a value at the time it is bound to storage and unalterable during its lifetime
  – constants can aid readability and reliability of a program
  – Ada, C++, Java allow dynamic binding of constants so that the value is not set by the programmer but can be determined at runtime (for instance, passed into a method as a parameter)
• For convenience, variable initialization can occur prior to execution
  – FORTRAN:
      Integer Sum
      Data Sum /0/
  – Ada: Sum : Integer := 0;
  – ALGOL 68: int first := 10;
  – Java: int num = 5;
  – LISP: (Let (x (y 10)) ... )
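C is not among the languages listed above, but its const-qualified local variables behave in a similar way, which makes for a short sketch of a constant whose value is determined at run time. The function `clamp_to` and its parameters are hypothetical.

```c
/* `limit` is a named constant whose value is bound when the function is
   entered; it depends on a parameter and so is not known at compile time. */
int clamp_to(int value, int max_allowed) {
    const int limit = max_allowed * 2;  /* computed at run time, then unalterable */
    /* limit = 0;  would be rejected by the compiler */
    return value > limit ? limit : value;
}
```

With these definitions, `clamp_to(50, 10)` returns 20 and `clamp_to(5, 10)` returns 5. The constant is bound to its value at the same moment it is bound to storage, as the definition of a named constant requires.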
