Principles of Programming Languages Full Note
Principles of Programming Languages
Page 1
Unit-1
Introduction:
A programming language is an artificial language designed to communicate instructions to a
machine, particularly a computer. Programming languages can be used to create programs that
control the behavior of a machine and/or to express algorithms precisely.
The earliest programming languages predate the invention of the computer, and were used to
direct the behavior of machines such as Jacquard looms and player pianos. Thousands of
different programming languages have been created, mainly in the computer field, with many
more being created every year. Most programming languages describe computation in an
imperative style, i.e., as a sequence of commands, although some languages, such as those that
support functional programming or logic programming, use alternative forms of description.
The description of a programming language is usually split into the two components of syntax
(form) and semantics (meaning). Some languages are defined by a specification document (for
example, the C programming language is specified by an ISO Standard), while other languages,
such as Perl 5 and earlier, have a dominant implementation that is used as a reference.
1. What is a name in programming languages? A name is a mnemonic character string used to represent something else.
Names are a central feature of all programming languages. In the earliest programs, numbers
were used for all purposes, including machine addresses. Replacing numbers by symbolic names
was one of the first major improvements to program notation.
e.g. variables, constants, executable code (methods, procedures, subroutines, functions, even
whole programs), data types, classes, etc.
In general, names are of two different types:
1. Special symbols: +, -, *
2. Identifiers: sequences of alphanumeric characters (in most cases beginning with a letter), plus
in many cases a few special characters such as '_' or '$'.
2. What is binding? A binding is an association between two things, such as a name and the thing it names.
E.g. the binding of a class name to a class, or of a variable's name to a variable.
Static and Dynamic binding:
A binding is static if it first occurs before run time and remains unchanged throughout program
execution.
A binding is dynamic if it first occurs during execution or can change during execution of the
program.
3. What is binding time? Binding time is the time when an association is established.
In programming, a name may have several attributes, and they may be bound at different times.
Example1:
int n;
n = 6;
first line binds the type int to n and the second line binds the value 6 to n. The first binding
occurs when the program is compiled. The second binding occurs when the program is executed.
Example2:
void f()
{
    int n = 7;
    printf("%d", n);
}

void main()
{
    int k;
    scanf("%d", &k);
    if (k > 0)
        f();
}
Here the binding of the value 7 to n inside f() happens only if f() is actually called, i.e. only when the input k is positive, so it is established at run time.
In FORTRAN, addresses are bound to variable names at compile time. The result is that, in the
compiled code, variables are addressed directly, without any indexing or other address
calculations. (In reality, the process is somewhat more complicated. The compiler assigns an
address relative to a compilation unit. When the program is linked, the address of the unit within
the program is added to this address. When the program is loaded, the address of the program is
added to the address. The important point is that, by the time execution begins, the absolute address of the variable is known.)
FORTRAN is efficient, because absolute addressing is used. It is inflexible, because all
addresses are assigned at load time. This leads to wasted space, because all local variables
occupy space whether or not they are being used, and also prevents the use of direct or indirect
recursion.
Early and Late Binding:
Early binding - efficiency
Late binding - flexibility
4. Explain about different times at which decisions may be bound? Or
Explain different types of binding times?
In the context of programming languages, there are quite a few alternatives for binding time.
1. Language design time
2. Language implementation time
3. Program writing time
4. Compile time
5. Link time
6. Load time
7. Run time
1. Language design time:
In most languages, the control flow constructs the set of fundamental types, the available
constructors for creating complex types, and many other aspects of language semantics
are chosen when the language is designed.
Or
During the design of the programming language, the designer decides what symbols
should be used to represent operations.
e.g. binding of operator symbols to operations:
( * , + ) (multiplication, addition)
2. Language Implementation Time: Most language manuals leave a variety of issues to the discretion of the language
implementor. Typical examples include the precision of the fundamental types, the
coupling of I/O to the operating system's notion of files, the organization and maximum
sizes of the stack and heap, and the handling of run-time exceptions such as arithmetic
overflow.
Or
During language implementation, the implementor decides what range of
values a data type should have,
e.g. binding a data type such as int in C to its range of possible values.
Example: the C language does not specify the range of values for the type int. Implementations
on early microcomputers typically used 16 bits for ints, yielding a range of values from -32768
to +32767. On early large computers, and on all computers today, C implementations typically
use 32 bits for int.
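These implementation-time choices can be inspected from within a C++ program through &lt;climits&gt;. A small sketch (the helper function names are ours, invented for illustration):

```cpp
#include <climits>

// The C/C++ standards only guarantee that int can hold at least the
// range -32767..32767 (i.e. at least 16 bits); the exact range is
// chosen at language implementation time and exposed via <climits>.
constexpr bool int_is_at_least_16_bits() {
    return INT_MAX >= 32767 && INT_MIN <= -32767;
}

constexpr int int_bits() {
    // CHAR_BIT (bits per byte) is itself implementation-defined,
    // though it is 8 on virtually all current machines.
    return static_cast<int>(sizeof(int)) * CHAR_BIT;
}
```

On a typical modern implementation int_bits() yields 32, matching the text above; on an early microcomputer it would have yielded 16.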
3. Program writing time: Programmers, of course, choose algorithms, data structures, and names.
Or
While writing programs, programmers bind certain names with procedure, class etc
Example: many names are bound to specific meanings when a person writes a program
4. Compile time: The time when a single compilation unit is compiled; at compile time, for example, the type of a
variable can be identified.
Example: int c; [at compile time int, c forms an association]
5. Link time: The time when all the compilation units comprising a single program are linked as the
final step in building the program
[The separate modules of a single program will be bound only at link time]
6. Load time: Load time refers to the point at which the operating system loads the program into
memory so that it can run
7. Run time: Run time is actually a very broad term that covers the entire span from the beginning to
the end of execution. If we give the value of a variable during runtime, it is known as
runtime binding
Ex. printf("Enter the value of X"); scanf("%f", &X);
5. What is Scope?
The textual region of the program in which a binding is active is its scope.
The scope of a name binding is the portion of the text of the source program in which that
binding is in effect - i.e. the name can be used to refer to the corresponding object. Using a name
outside the scope of a particular binding implies one of two things: either it is undefined, or it
refers to a different binding. In C++, for example, the scope of a local variable starts at the
declaration of the variable and ends at the end of the block in which the declaration appears.
Scope is a static property of a name that is determined by the semantics of the PL and the text of
the program.
Example:
class Foo
{
    private int n;
    void foo() {
        // 1
    }
    void bar() {
        int m, n;
        ...
        // 2
    }
    ...
}
A reference to m at point 1 is undefined.
A reference to n at point 1 refers to the instance variable of Foo.
A reference to n at point 2 refers to the local variable declared in bar(), which hides the instance variable.
6. Describe the difference between static and dynamic scoping(scope rules)
The scope rules (static, dynamic) of a language determine how references to names are
associated with variables
Static Scoping:
In a language with static scoping, the bindings between names and objects can be
determined at compile time by examining the text of the program, without consideration of the
flow of control at run time.
Scope rules are somewhat more complex in FORTRAN, though not much more. FORTRAN
distinguishes between global and local variables. The scope of a local variable is limited to the
subroutine in which it appears; it is not visible elsewhere. Variable declarations are optional. If a
variable is not declared, it is assumed to be local to the current subroutine and to be of type
integer if its name begins with the letters I-N, or real otherwise.
Global variables in FORTRAN may be partitioned into common blocks, which are then imported
by subroutines. Common blocks are designed for separate compilation: they allow a subroutine
to import only the sets of variables it needs. Unfortunately, FORTRAN requires each subroutine
to declare the names and types of the variables in each of the common blocks it uses, and there is
no standard mechanism to ensure that the declarations in different subroutines are the same.
Nested scopes- Many programming languages allow scopes to be nested inside each other.
Example: Java actually allows classes to be defined inside classes or even inside methods, which
permits multiple scopes to be nested.
class Outer
{
int v1; // 1
void methodO()
{
float v2; // 2
class Middle
{
char v3; // 3
void methodM()
{
boolean v4; // 4
class Inner
{
double v5; // 5
void methodI()
{
String v6; // 6
}
}
}
}
}
}
The scope of the binding for v1 is the whole class Outer, including its nested classes and their methods.
The scope of the binding for v2 is methodO and all of classes Middle and Inner, including their methods.
The scope of the binding for v3 is all of classes Middle and Inner, including their methods.
The scope of the binding for v4 is methodM and all of class Inner, including its method.
The scope of the binding for v5 is all of class Inner, including its method.
The scope of the binding for v6 is just methodI.
Some programming languages - including Pascal and its descendants (e.g. Ada) allow
procedures to be nested inside procedures. (C and its descendants do not allow this)
Declaration order
- A field or method declared in a Java class can be used anywhere in the class, even before its
declaration.
- A local variable declared in a method cannot be used before the point of its declaration
Example: class Demo
{
public void method()
{
// Point 1
int y;
}
private int x;
}
The instance variable x can be used at Point 1, but not y
Example: Java
class Demo
{
public void method1()
{
...
method2();
...
}
public void method2()
{
...
method1();
...
}
}
Example: C/C++:
void method2(); // Incomplete declaration (prototype)
void method1()
{
    ...
    method2();
    ...
}
void method2() // Definition completes the above declaration
{
    ...
    method1();
    ...
}
Dynamic Scoping
In a language with dynamic scoping, the bindings between names and objects
depend on the flow of control at run time, and in particular on the order in which subroutines are
called.
Dynamic scope rules are generally quite simple: the current binding for a given name is the one
encountered most recently during execution, and not yet destroyed by returning from its scope.
Languages with dynamic scoping include APL, Snobol, Perl etc. Because the flow of control
cannot in general be predicted in advance, the binding between names and objects in a language
with dynamic scoping cannot in general be determined by a compiler. As a result, many semantic
rules in a language with dynamic scoping become a matter of dynamic semantics rather than
static semantics.
Ex.
procedure Big is
    X : integer;
    procedure sub1 is
        X : integer;
    begin
        ...
    end;
    procedure sub2 is
    begin
        ... X ...
    end;
begin
    ...
end;
In dynamic scoping, the X used inside sub2 may refer either to the X in Big or to the X in sub1, depending on
whether sub2 is called from Big or from sub1, i.e. on the calling sequence of the procedures.
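C++ itself is statically scoped, but the "most recent binding" rule can be simulated with an explicit stack of bindings per name - a sketch of the lookup a dynamically scoped implementation performs. All function and variable names here are invented for illustration:

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Each name maps to a stack of values: entering a scope that declares
// X pushes a binding, leaving it pops. The current binding is the one
// created most recently and not yet destroyed.
static std::map<std::string, std::vector<int>> bindings;

void bind(const std::string& name, int value) { bindings[name].push_back(value); }
void unbind(const std::string& name)          { bindings[name].pop_back(); }

int lookup(const std::string& name) {
    auto it = bindings.find(name);
    if (it == bindings.end() || it->second.empty())
        throw std::runtime_error("unbound name: " + name);
    return it->second.back();   // most recent still-active binding
}

// sub2 uses X non-locally: under dynamic scoping it sees whichever X
// was bound most recently along the current call chain.
int sub2() { return lookup("X"); }

int sub1() {
    bind("X", 2);               // sub1's local X hides Big's X
    int r = sub2();             // sees sub1's X -> 2
    unbind("X");
    return r;
}

int big() {
    bind("X", 1);               // Big's X
    int direct = sub2();        // called from Big: sees X = 1
    int nested = sub1();        // called via sub1: sees X = 2
    unbind("X");
    return direct * 10 + nested;
}
```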
7. What is Local Scope? In block-structured languages, names declared in a block are local to the block. In Algol 60 and
C++, local scopes can be as small as the programmer needs. Any statement context can be
instantiated by a block that contains local variable declarations. In other languages, such as
Pascal, local declarations can be used only in certain places. Although a few people do not like
the fine control offered by Algol 60 and C++, it seems best to provide the programmer with as
much freedom as possible and to keep scopes as small as possible.
Local scope in C++
{
    // x not accessible
    T x;        // declaration of x (T is some type)
    // x accessible
    ...
    {
        // x still accessible in inner block
        ...
    }
    // x still accessible
}
// x not accessible
8. What is Global Scope? The name is visible throughout the program. Global scope is useful for pervasive entities, such as
library functions and fundamental constants (π = 3.1415926), but is best avoided for application
variables.
FORTRAN does not have global variables, although programmers simulate them by overusing
COMMON declarations. Subroutine names in FORTRAN are global.
Names declared at the beginning of the outermost block of Algol 60 and Pascal programs have
global scope. (There may be holes in these global scopes if the program contains local declarations of the same names.)
9. Explain about Object Lifetime? The word "lifetime" is used in two slightly different ways
1. To refer to the lifetime of an object
2. The lifetime of the binding of a name to an object.
a. An object can exist before the binding of a particular name to it
Example (Java):
void something(Object o) {
// 2
}
....
Object p = new Object();
// 1
....
something(p);
....
// 3
(The object named by p exists at point 1, but the name o is not
bound to it until point 2)
b. An object can exist after the binding of a particular name to it has ceased
Example:
void something(Object o) {
// 2
}
....
Object p = new Object();
// 1
....
something(p);
....
// 3
(The object named by p continues to exist at point 3, even though the binding of the name
o to it ended when method something() completed)
c. A name can be bound to an object that does not yet exist
Example (Java)
Object o;
// 1
....
o = new Object(); // 2
(At point 1, the name o is bound to an object that does not come into existence until point
2)
d. A name can be bound to an object that has actually ceased to exist
Example (C++ - not possible in Java)
Object* o = new Object();
...
delete o; // 1
...
// 2
At 2, the name o is bound to an object that has ceased to exist.
(The technical name for this is a dangling reference).
e. It is also possible for an object to exist without having any name at all bound to it.
Example (Java; in C++, the same situation arises with pointers)
Object o = new Object();
// 1
...
o = new Object();
// 2
At point 2, the object that o was bound to at point 1 still exists, but o no longer is bound
to it. In the absence of any other name bindings to it between points 1 and 2, this object
now has no name referring to it. (The technical name for this is garbage).
10. Storage Allocation Mechanisms?
Static (permanent) Allocation:- The object exists during the entire time the program is
running.
a. Global variables
- The precise way of declaring global variables differs from language to language
- In some languages (e.g. BASIC), any variable is global
- In FORTRAN any variable declared in the main program or in
a COMMON block
- In Java, any class field explicitly declared static
- In C/C++, any variable declared outside of a class, or (C++)
declared static inside a class
b. Static local variables - only available in some languages
C/C++ local variables explicitly declared static
int foo() {
    static int i;
    ...
}
A static local variable retains its value from one call of a routine to another
c. Many constants (but not constants that are actually read-only variables)
Advantages: efficiency (direct addressing), history-sensitive subprogram support (static
variables retain values between calls of subroutines).
Disadvantage: lack of flexibility (does not support recursion)
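The static-local behavior described in (b) above can be demonstrated directly in C++ (the function names are ours, invented for illustration):

```cpp
// A static local variable is allocated once, before the first call,
// and retains its value between calls (history-sensitive behavior).
int next_ticket() {
    static int issued = 0;  // initialized once, not on every call
    return ++issued;
}

// An ordinary (stack-allocated) local starts fresh on every call.
int always_one() {
    int issued = 0;
    return ++issued;
}
```

Successive calls to next_ticket() return 1, 2, 3, ..., while always_one() returns 1 every time.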
Stack Based Allocation:- Storage bindings are created for variables when their declaration
statements are elaborated
Typically, the local variables and parameters of a method have stack lifetime. This name comes
from the normal way of implementing routines, regardless of language.
Since routines obey a LIFO call return discipline, they can be managed by using a stack
composed of stack frames - one for each currently active routine.
Example:
void d() { /* 1 */ }
void c() { ... d() ... }
void b() { ... c() ... }
void a() { ... b() ... }
int main() { ... a() ... }
Stack at point 1 (one frame per currently active routine, most recent on top):
| Frame for d    |
| Frame for c    |
| Frame for b    |
| Frame for a    |
| Frame for main |
Heap-Based Allocation:- Objects are allocated and deallocated at arbitrary times from a region of
storage called the heap.
As the program runs, Heap space grows as objects are created. However, to prevent growth
without limit, there must be some mechanism for recycling the storage used by objects that are
no longer alive.
A language implementation typically uses one of three approaches to "recycling" space used by
objects that are no longer alive:
Explicit - the program is responsible for releasing space needed by objects that are no
longer needed using some construct such as delete (C++)
Reference counting: each heap object maintains a count of the number of external
pointers/references to it. When this count drops to zero, the space utilized by the object
can be reallocated
Garbage collection: The process of deallocating the memory given to a variable is known
as garbage collection. The system can work in proper order only with proper garbage
collection. C and C++ do not do garbage collection implicitly, but Java has an implicit
garbage collector.
Advantage: provides for dynamic storage management.
Disadvantage: inefficient (compared to static allocation) and unreliable.
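For the reference-counting scheme above, C++'s std::shared_ptr is a ready-made illustration: it keeps exactly such a per-object count, visible through use_count(). A sketch (Node and demo_refcount are names invented for illustration):

```cpp
#include <cassert>
#include <memory>

// std::shared_ptr implements reference counting: each heap object
// carries a count of the references to it, and the object is
// deallocated when the count drops to zero.
struct Node { int value; };

int demo_refcount() {
    std::shared_ptr<Node> p = std::make_shared<Node>(Node{42});
    assert(p.use_count() == 1);      // one reference
    {
        std::shared_ptr<Node> q = p; // aliasing raises the count
        assert(p.use_count() == 2);
    }                                // q destroyed: count back to 1
    assert(p.use_count() == 1);
    return p->value;
}                                    // p destroyed: count 0, Node freed
```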
11. What are internal and external fragmentations? A heap is a region of storage in which subblocks can be allocated and deallocated at arbitrary
times. Heaps are required for the dynamically allocated pieces of linked data structures, and for
objects like fully general character strings, lists, and sets, whose size may change as a result of an
assignment statement or other update operation.
[Figure: a heap containing in-use and free blocks, with an incoming allocation request]
The shaded blocks are in use, the clear blocks are free. Cross hatched space at the ends of in use
blocks represents internal fragmentation. The discontiguous free blocks indicate external
fragmentation.
Internal fragmentation occurs when a storage management algorithm allocates a block that is
larger than required to hold a given object, the extra space is then unused. External fragmentation
occurs when the blocks that have been assigned to active objects are scattered through the heap
in such a way that the remaining, unused space is composed of multiple blocks: there may be
quite a lot of free space, but no one piece of it may be large enough to satisfy some future
request.
12. What is garbage collection? The process of deallocating the memory given to a variable is known as garbage collection (GC).
The system can work in proper order only with proper garbage collection. C and C++ do not do
garbage collection implicitly, but Java has an implicit garbage collector.
The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by
objects that are no longer in use by the program. Garbage collection was invented by John
McCarthy around 1959 to solve problems in Lisp.
Garbage collection does not traditionally manage limited resources other than memory that
typical programs use, such as network sockets, database handles, user interaction windows, and
file and device descriptors. Methods used to manage such resources, particularly destructors,
may suffice as well to manage memory, leaving no need for GC. Some GC systems allow such
other resources to be associated with a region of memory that, when collected, causes the other
resource to be reclaimed; this is called finalization. Finalization may introduce complications
limiting its usability, such as intolerable latency between disuse and reclaim of especially limited
resources, or a lack of control over which thread performs the work of reclaiming.
13. What is Aliasing? Two or more names that refer to the same object at the same point in the program are said to be
aliases.
1. Aliases can be created by assignment of pointers/references
Example: Java: Robot karel = ...
Robot foo = karel;
foo and karel are aliases for the same Robot object
2. Aliases can be created by passing reference parameters
Example: C++
void something(int a [], int & b)
{
// 1
...
}
int x [100];
int y;
something(x, y);
During the call to something(x, y), at point 1, x and a are aliases for the same array
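The same reference-parameter aliasing can be shown with a minimal runnable C++ sketch (the names are ours, invented for illustration):

```cpp
// A C++ reference parameter makes the parameter an alias of the
// argument: writing through one name is visible through the other.
void increment(int& b) {   // b becomes an alias for the caller's variable
    b = b + 1;
}

int alias_demo() {
    int y = 5;
    increment(y);          // inside the call, b and y name the same object
    return y;              // the write through b is seen through y
}
```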
14. What is Overloading? A name is said to be overloaded if, in some scope, it has two or more meanings, with the actual
meaning being determined by how it is used.
Example: C++
void something(char x)
...
void something(double x)
...
// 1
At point 1, something can refer to one or the other of the two methods, depending
on its parameter.
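A self-contained variant of the overloading example, with bodies that report which overload was selected (the helper names are ours, invented for illustration):

```cpp
// Overload resolution picks a meaning for the name `something`
// at compile time from the argument's declared type.
char something(char x)   { (void)x; return 'c'; }  // chosen for char
char something(double x) { (void)x; return 'd'; }  // chosen for double

char which_for_char()   { return something('a'); }  // selects the char overload
char which_for_double() { return something(1.5); }  // selects the double overload
```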
15. What is Polymorphism? Polymorphism is the concept that supports the capability of an object of a class to behave
differently in response to a message or action.
1. Compile-time (static) polymorphism: the meaning of a name is determined at compile-time
from the declared types of what it uses
2. Run-time (dynamic) polymorphism: the meaning of a name is determined when the program is
running from the actual types of what it uses.
Example: Java has both types in different contexts:
a. When a name is overloaded in a single class, static polymorphism is used to determine which
declaration is meant.
b. When a name is overridden in a subclass, dynamic polymorphism is used to determine which
version to use.
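A minimal C++ sketch of dynamic polymorphism through overriding (the class and method names are invented for illustration):

```cpp
#include <string>

// The call a->speak() is bound at run time from the object's actual
// class (dynamic polymorphism), not from the declared type of the
// pointer used to reach it.
struct Animal {
    virtual std::string speak() const { return "..."; }
    virtual ~Animal() = default;
};
struct Dog : Animal {
    std::string speak() const override { return "woof"; }
};

std::string heard() {
    Dog d;
    const Animal* a = &d;   // declared type Animal*, actual type Dog
    return a->speak();      // dynamic dispatch selects Dog::speak
}
```

Had speak() not been declared virtual, the call would instead be bound statically to Animal::speak from the pointer's declared type.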
16. What is control flow? The order in which operations are executed in a program
e.g. in a C++-like language,
a = 1;
b = a + 1;
if (a > 100) b = a - 1; else b = a + 1;
a - b + c
17. Name eight major categories of control flow mechanisms? a. Sequencing: - Statements are to be executed in a certain specified order- usually the order
in which they appear in the program text.
b. Selection: - Depending on some run time condition, a choice is to be made among two or more statements or expressions. The most common selection constructs are if and case
statements. Selection is also sometimes referred to as alternation.
c. Iteration: - A given fragment of code is to be executed repeatedly, either a certain number of times, or until a certain run- time condition is true. Iteration constructs include for/do,
while, and repeat loops.
d. Procedural abstraction: - A potentially complex collection of control constructs is encapsulated in a way that allows it to be treated as a single unit, usually subject to
parameterization.
e. Recursion: - An expression is defined in terms of itself, either directly or indirectly; the computational model requires a stack on which to save information about partially
evaluated instances of the expression. Recursion is usually defined by means of self-
referential subroutines.
f. Concurrency:- Two or more program fragments are to be executed/evaluated at the same time, either in parallel on separate processors, or interleaved on a single processor in a
way that achieves the same effect.
g. Exception handling and speculation:- A program fragment is executed optimistically, on the assumption that some expected condition will be true. If that condition turns out to be
false, execution branches to a handler that executes in place of the remainder of the
protected fragment or in place of the entire protected fragment. For speculation, the
language implementation must be able to undo, or roll back any visible effects of the
protected code.
h. Nondeterminacy: - The ordering or choice among statements or expressions is deliberately left unspecified, implying that any alternative will lead to correct results.
Some languages require the choice to be random, or fair, in some formal sense of the
word.
18. What distinguishes operators from other sort of functions? An expression generally consists of either a simple object or an operator or function applied to a
collection of operands or arguments, each of which in turn is an expression. It is conventional to
use the term operator for built in functions that use special, simple syntax, and to use the term
operand for an argument of an operator. In most imperative languages, function call consists of a
function name followed by a parenthesized, comma-separated list of arguments, as in
my_func (A, B, C)
Operators are typically simpler, taking only one or two arguments, and dispensing with the
parentheses and commas:
a + b
-c
In general, a language may specify that function calls employ prefix, infix, or postfix notation.
These terms indicate, respectively, whether the function name appears before, among, or after its
several arguments:
Prefix: op a b
Infix: a op b
Postfix: a b op
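Postfix notation is attractive precisely because it can be evaluated with a simple stack and no precedence rules or parentheses. A sketch for single-digit integer operands (the function name is ours, invented for illustration):

```cpp
#include <cctype>
#include <string>
#include <vector>

// Postfix ("a b op"): scan left to right, push operands, and apply
// each operator to the two most recently pushed values.
int eval_postfix(const std::string& expr) {
    std::vector<int> stack;
    for (char c : expr) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            stack.push_back(c - '0');
        } else if (c == '+' || c == '-' || c == '*') {
            int b = stack.back(); stack.pop_back();
            int a = stack.back(); stack.pop_back();
            stack.push_back(c == '+' ? a + b : c == '-' ? a - b : a * b);
        }                       // anything else (e.g. spaces) is skipped
    }
    return stack.back();
}
// eval_postfix("13+2*") computes (1+3)*2 = 8, the same expression as
// the Cambridge Polish form (* (+ 1 3) 2).
```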
19. Explain the difference between prefix, infix, and postfix notation. What is Cambridge polish notation?
Most imperative languages use infix notation for binary operators and prefix notation for unary
operators and other functions. Lisp uses prefix notation for all functions.
Cambridge polish notation places the function name inside the parentheses:
( * ( + 1 3) 2)
(append a b c my_list)
20. What is an L-value? An r-value? Consider the following assignments in C:
d = a;
a = b + c;
In the first statement, the right- hand side of the assignment refers to the value of a, which we
wish to place into d. In the second statement, the left hand side refers to the location of a, where
we want to put the sum of b and c. Both interpretations-value and location-are possible because a
variable in C is a named container for a value. We sometimes say that languages like C use a
value model of variables. Because of their use on the left hand side of assignment statements,
expressions that denote locations are referred to as L-values. Expressions that denote values are
referred to as r- values. Under a value model of variables, a given expression can be either an L-
value or an r-value, depending on the context in which it appears.
Of course, not all expressions can be L-values, because not all values have a location, and not all
names are variables. In most languages it makes no sense to say 2+3 =a, or even a= 2+3, if a is
the name of a constant. By the same token, not all L-values are simple names; both L-values and
r-values can be complicated expressions. In C one may write
(f(a) + 3)->b[c] = 2;
In this expression f(a) returns a pointer to some element of an array of pointers to structures. The
assignment places the value 2 into the c-th element of field b of the structure pointed at by the
third array element after the one to which f's return value points.
In C++ it is even possible for a function to return a reference to a structure, rather than a pointer
to it, allowing one to write
g(a).b[c] = 2;
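A compilable C++ sketch of the same idea - a function call appearing as an L-value because it returns a reference (all names here are invented for illustration):

```cpp
// Not only simple names are L-values: any expression denoting a
// location may appear on the left of an assignment. Here g() returns
// a reference to a structure, so g(i).b[c] = 2 stores into it.
struct S { int b[4]; };

S pool[2];                       // hypothetical data for illustration

S& g(int i) { return pool[i]; }  // returns a reference, not a copy

int lvalue_demo() {
    g(1).b[2] = 2;               // the call result is an L-value
    return pool[1].b[2];         // the assignment updated the pool
}
```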
21. Define orthogonality in the context of programming language design? One of the principal design goals of Algol 68 was to make the various features of the languages
as orthogonal as possible. Orthogonality means that features can be used in any combination, the
combinations all make sense, and the meaning of a given feature is consistent, regardless of the
other features with which it is combined.
Algol 68 was one of the first languages to make orthogonality a principal design goal, and in fact
few languages since have given the goal such weight. Among other things, Algol 68 is said to be
expression oriented: it has no separate notion of statement. Arbitrary expressions can appear in
contexts that would call for a statement in a language like Pascal, and constructs that are
considered to be statements in other languages can appear within expressions. The following, for
example is valid in Algol 68:
begin
    a := if b < c then d else e;
    a := begin f(b); g(c) end;
    g(d);
    2 + 3
end
22. Expression Evaluation
An expression consists of
o A simple object, e.g. number or variable
o An operator applied to a collection of operands or arguments which are
expressions
Common syntactic forms for operators:
o Function call notation, e.g. somefunc(A, B, C), where A, B, and C are expressions
o Infix notation for binary operators, e.g. A + B
o Prefix notation for unary operators, e.g. -A
o Postfix notation for unary operators, e.g. i++
o Cambridge Polish notation, e.g. (* (+ 1 3) 2) in Lisp, which evaluates to (1+3)*2 = 8
Expression Evaluation Ordering: Precedence and Associativity:-
Precedence rules specify that certain operators, in the absence of parentheses, group more
tightly than other operators. In most languages multiplication and division group more tightly
than addition and subtraction, so 2+3*4 is 14 and not 20.
In Java all binary operators except assignments are left associative
3 - 3 + 5
x = y = f()
(assignments evaluate to the value being assigned)
In C++ arithmetic operators (+, -, *, ...) have higher precedence than relational operators (<, <=, >, >=, ==, !=)
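These precedence and associativity rules can be checked directly in C++ (the helper names are ours, invented for illustration):

```cpp
// * groups more tightly than +, so 2 + 3 * 4 parses as 2 + (3 * 4).
constexpr bool mult_binds_tighter = (2 + 3 * 4 == 14);

// - is left associative, so 3 - 3 + 5 parses as (3 - 3) + 5,
// not 3 - (3 + 5).
constexpr bool minus_left_assoc = (3 - 3 + 5 == 5);

int chained_assignment() {
    int x, y;
    x = y = 7;      // = is right associative: x = (y = 7)
    return x + y;   // the assignment y = 7 itself evaluates to 7
}
```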
The use of infix, prefix, and postfix notation leads to ambiguity as to what is an operand
of what, e.g. a+b*c**d**e/f in Fortran
The choice among alternative evaluation orders depends on
o Operator precedence: higher operator precedence means that a (collection of)
operator(s) group more tightly in an expression than operators of lower
precedence
o Operator associativity: determines evaluation order of operators of the same
precedence
left associative: operators are evaluated left-to-right (most common)
right associative: operators are evaluated right-to-left (Fortran power
operator **, C assignment operator = and unary minus)
non-associative: requires parenthesis when composed (Ada power
operator **)
Evaluation Order in Expressions:
Precedence and associativity define rules for structuring expressions
But do not define operand evaluation order
o Expression a-f(b)-c*d is structured as (a-f(b))-(c*d) by compiler, but either (a-
f(b)) or (c*d) can be evaluated first at run-time
Knowing the operand evaluation order is important
o Side effects: e.g. if f(b) above modifies d (i.e. f(b) has a side effect) the expression
value will depend on the operand evaluation order
o Code improvement: compilers rearrange expressions to maximize efficiency
Improve memory loads:
a := B[i]; load a from memory
c := 2*a + 3*d; compute 3*d first, because the processor is still waiting for a
to arrive from memory
Common subexpression elimination:
a:=b+c;
d:=c+e+b; rearranged as d:=b+c+e, it can be rewritten into d:=a+e
Expression Reordering Problems
Rearranging expressions may lead to arithmetic overflow or different floating point
results
o Assume b, d, and c are very large positive integers, then if b-c+d is rearranged
into (b+d)-c arithmetic overflow occurs
o Floating point value of b-c+d may differ from b+d-c
o Most programming languages will not rearrange expressions when parentheses are
used, e.g. write (b-c)+d to avoid problems
Java: expressions evaluation is always left to right and overflow is always detected
Pascal: expression evaluation is unspecified and overflows are always detected
C and C++: expression evaluation is unspecified and overflow detection is
implementation dependent
Short-Circuit Evaluation
Short-circuit evaluation of Boolean expressions means that computations are skipped
when logical result of a Boolean operator can be determined from the evaluation of one
operand
C, C++, and Java use conditional and/or operators: && and ||
o If a in a&&b evaluates to false, b is not evaluated
o If a in a||b evaluates to true, b is not evaluated
o Avoids the Pascal problem
o Useful to increase program efficiency, e.g.
if (unlikely_condition && expensive_condition()) ...
Pascal does not use short-circuit evaluation
o The program fragment below has the problem that element a[11] can be accessed,
resulting in a dynamic semantic error:
o var a : array [1..10] of integer;
...
i := 1;
while (i <= 10) and (a[i] <> 0) do
i := i + 1;
(both operands of and are always evaluated, so a[11] is read when i reaches 11)
o Compiler produces better code, because the address of a variable is only
calculated once
Multiway assignments in Clu, ML, and Perl
o a,b := c,d assigns c to a and d to b simultaneously, e.g. a,b := b,a swaps a with b
a,b := 1 assigns 1 to both a and b
23. What is short circuit Boolean evaluation? Why is it useful?
Short-circuit evaluation of Boolean expressions means that computations are skipped when
logical result of a Boolean operator can be determined from the evaluation of one operand
C, C++, and Java use conditional and/or operators: && and ||
a. If a in a&&b evaluates to false, b is not evaluated
b. If a in a||b evaluates to true, b is not evaluated
c. Avoids the Pascal problem
d. Useful to increase program efficiency, e.g.
if (unlikely_condition && expensive_condition()) ...
Pascal does not use short-circuit evaluation
e. The program fragment below has the problem that element a[11] can be accessed, resulting in a dynamic semantic error:
f. var a : array [1..10] of integer;
...
i := 1;
while (i <= 10) and (a[i] <> 0) do
i := i + 1;
Iteration: for and while loop statements
Subroutine calls and recursion
Sequencing: One statement appearing after another
- A list of statements in a program text is executed in top-down order
- A compound statement is a delimited list of statements
o A compound statement is a block when it includes variable declarations
o C, C++, and Java use { and } to delimit a block
o Pascal and Modula use begin ... end
o Ada uses declare ... begin ... end
- C, C++, and Java: expressions can be used where statements can appear
In pure functional languages, sequencing is impossible (and not desired!)
Selection: Selects which statements to execute next
Forms of if-then-else selection statements:
o C and C++ EBNF syntax:
if (<expr>) <statement> [else <statement>]
The condition is an integer-valued expression. When it evaluates to 0, the else-clause
statement is executed; otherwise the then-clause statement is executed. If more
than one statement is used in a clause, grouping with { and } is required
o Java syntax is like C/C++, but condition is Boolean type
o Ada syntax allows use of multiple elsif's to define nested conditions:
if <condition> then
<statements>
elsif <condition> then
<statements>
elsif <condition> then
...
else
<statements>
end if;
Case/switch statements are different from if-then-else statements in that an expression
can be tested against multiple constants to select statement(s) in one of the arms of the
case statement:
o C, C++, and Java syntax:
switch (<expr>)
{ case <constant>: <statements> break;
case <constant>: <statements> break;
...
default: <statements>
}
o break is necessary to transfer control at the end of an arm to the end of the switch
statement
A switch statement can be compiled much more efficiently than nested if-then-else statements, typically using a jump table
Iteration
Iteration means the act of repeating a process usually with the aim of approaching a desired goal
or target or result. Each repetition of the process is also called an "iteration," and the results of
one iteration are used as the starting point for the next iteration.
A conditional that keeps executing as long as the condition is true
e.g: while, for, loop, repeat-until, ...
Iteration and recursion are the two mechanisms that allow a computer to perform similar
operations repeatedly. Without at least one of these mechanisms, the running time of a program
would be a linear function of the size of the program text. In a very real sense, it is iteration and
recursion that make computers useful.
Enumeration-Controlled Loops
Enumeration controlled iteration originated with the do loop of Fortran I. Similar mechanisms have been
adopted in some form by almost every subsequent language, but syntax and semantics vary widely.
Fortran-IV:
DO 20 i = 1, 10, 2
...
20 CONTINUE
which is defined to be equivalent to
i = 1
20 ...
i = i + 2
IF (i .LE. 10) GOTO 20
Algol-60 combines logical conditions:
o for <id> := <enumerator> [, <enumerator>]* do <statement>
where the EBNF syntax of <enumerator> is
<enumerator> -> <expr>
| <expr> step <expr> until <expr>
| <expr> while <condition>
Difficult to understand and too many forms that behave the same:
for i := 1, 3, 5, 7, 9 do ...
for i := 1 step 2 until 10 do ...
for i := 1, i+2 while i < 10 do ...
Pascal has simple design:
o for <id> := <expr> to <expr> do <statement>
for <id> := <expr> downto <expr> do <statement>
o Can iterate over any discrete type, e.g. integers, chars, elements of a set
o Index variable cannot be assigned and its terminal value is undefined
Ada for loop is much like Pascal's:
o for <id> in <expr>..<expr> loop
<statements>
end loop;
for <id> in reverse <expr>..<expr> loop
<statements>
end loop;
o Index variable has a local scope in loop body, cannot be assigned, and is not
accessible outside of the loop
C, C++, and Java do not have enumeration-controlled loops, although the logically-
controlled for statement can be used to create an enumeration-controlled loop:
o for (i = 1; i <= 10; i++) ...
Problems with Enumeration-Controlled Loops:
C/C++:
o This C program never terminates, because the loop counter wraps around on
overflow, so the condition i <= INT_MAX can never become false:
#include <limits.h>
main()
{ int i;
for (i = 0; i <= INT_MAX; i++) ...
}
Logically-Controlled Post-test Loops:
Logically-controlled post-test loops test an exit condition after each loop iteration
Not available in Fortran-77 (!)
Pascal:
o repeat <statement> [; <statement>]* until <condition>
where the condition is a Boolean expression and the loop will terminate when the
condition is true
C, C++:
o do <statement> while (<expr>)
where the loop will terminate when the expression evaluates to 0 and multiple
statements need to be enclosed in { and }
Java is like C++, but condition is a Boolean expression
Logically-Controlled Mid-test Loops:
Logically-controlled mid-test loops test exit conditions within the loop
Ada:
o loop
<statements>
exit when <condition>;
<statements>
exit when <condition>;
...
end loop;
o Also allows exit of outer loops using labels:
outer: loop
...
for i in 1..n loop
...
exit outer when cond;
...
end loop;
end loop outer;
C, C++:
o Use break statement to exit loops
o Use continue to jump to beginning of loop to start next iteration
Java is like C++, but combines Ada's loop label idea to allow jumps to outer loops
Recursion: When a function may directly or indirectly call itself
Can be used instead of loops; functional languages frequently have no loops but only recursion
Iteration and recursion are equally powerful: iteration can be expressed by recursion and
vice versa
Recursion can be less efficient, but most compilers for functional languages will optimize
recursion and are often able to replace it with iterations
Recursion can be more elegant to use to solve a problem that is recursively defined
Tail Recursive Functions
Tail recursive functions are functions in which no computations follow a recursive call in
the function
A recursive call could in principle reuse the subroutine's frame on the run-time stack and
avoid deallocation of old frame and allocation of new frame
This observation is the key idea to tail-recursion optimization in which a compiler
replaces recursive calls by jumps to the beginning of the function
For the gcd example, a good compiler will optimize the function into:
int gcd(int a, int b)
{ start:
if (a==b) return a;
else if (a>b) { a = a-b; goto start; }
else { b = b-a; goto start; }
}
which is just as efficient as the iterative implementation of gcd:
int gcd(int a, int b)
{ while (a!=b)
if (a>b) a = a-b;
else b = b-a;
return a;
}
Continuation-Passing-Style:
Even functions that are not tail-recursive can be optimized by compilers for functional
languages by using continuation-passing style:
o With each recursive call an argument is included in the call that is a reference
(continuation function) to the remaining work
The remaining work will be done by the recursively called function, not after the call, so the
function appears to be tail-recursive
Other Recursive Function Optimizations
Another function optimization that can be applied by hand is to remove the work after the
recursive call and include it in some other form as an argument to the recursive call
For example:
typedef int (*int_func)(int);
int summation(int_func f, int low, int high)
{ if (low==high) return f(low);
else return f(low)+summation(f, low+1, high);
}
can be rewritten into the tail-recursive form:
int summation(int_func f, int low, int high, int subtotal)
{ if (low==high) return subtotal+f(low);
else return summation(f, low+1, high, subtotal+f(low));
}
This example in Scheme:
(define summation (lambda (f low high)
(if (= low high) ;condition
(f low) ;then part
(+ (f low) (summation f (+ low 1) high))))) ;else-part rewritten:
(define summation (lambda (f low high subtotal)
(if (= low high)
(+ subtotal (f low))
(summation f (+ low 1) high (+ subtotal (f low))))))
Nondeterminacy: Our final category of control flow is nondeterminacy. A nondeterministic construct is one in
which the choice between alternatives is deliberately unspecified. Some languages, notably
Algol 68 and various concurrent languages, provide more extensive nondeterministic
mechanisms, which cover statements as well.
25. Data Types
Most programming languages require the programmer to declare the data type of every data
object, and most database systems require the user to specify the type of each data field. The
available data types vary from one programming language to another, and from one database
application to another, but the following usually exist in one form or another:
integer : In more common parlance, whole number; a number that has no fractional
part.
floating-point : A number with a decimal point. For example, 3 is an integer, but 3.5
is a floating-point number.
character (text ): Readable text
26. What purpose do types serve in a programming language?
a. Types provide implicit context for many operations, so that the programmer does
not have to specify that context explicitly. In C, for instance, the expression a+b
will use integer addition if a and b are of integer type, and floating-point
addition if a and b are of double type.
b. Types limit the set of operations that may be performed in a semantically valid
program. They prevent the programmer from adding a character and a record, for
example, or from taking the arctangent of a set, or passing a file as a parameter to
a subroutine that expects an integer.
27. Discuss about Type Systems
A type system consists of (1) a mechanism to define types and associate them with certain
language constructs, and (2) a set of rules for type equivalence, type compatibility, and type
inference. The constructs that must have types are precisely those that have values or that can
refer to objects that have values. These constructs include named constants, variables, record
fields, parameters, and sometimes subroutines, as well as literal constants. Type equivalence
rules determine when the types of two values are the same. Type compatibility rules determine
when a value of a given type can be used in a given context. Type inference rules define the type
of an expression based on the types of its constituent parts or the surrounding context.
Type checking:-
Type checking is the process of ensuring that a program obeys the language's type compatibility
rules. A violation of the rules is known as a type clash. A language is said to be strongly typed if
it prohibits, in a way that the language implementation can enforce, the application of any
operation to any object that is not intended to support that operation. A language is said to be
statically typed if it is strongly typed and type checking can be performed at compile time.
Ex: Ada is strongly typed and for the most part statically typed. A Pascal implementation can
also do most of its type checking at compile time, though the language is not quite strongly
typed: untagged variant records are its only loophole.
Dynamic type checking is a form of late binding, and tends to be found in languages that delay
other issues until run time as well. Lisp and Smalltalk are dynamically typed. Most scripting
languages are also dynamically typed; some (Python, Ruby) are strongly typed.
Classification of Types:-
The terminology for types varies some from one language to another. Most languages provide
built in types similar to those supported in hardware by most processors: integers, characters,
Boolean, and real (floating point) numbers.
Booleans are typically implemented as single byte quantities with 1 representing true and 0
representing false.
Characters have traditionally been implemented as one byte quantities as well, typically using the
ASCII encoding. More recent languages use a two byte representation designed to accommodate
the Unicode character set.
Numeric Types:-
A few languages (C, Fortran) distinguish between different lengths of integers and real numbers;
most do not, and leave the choice of precision to the implementation. Unfortunately, differences
in precision across language implementations lead to a lack of portability: programs that run
correctly on one system may produce run-time errors or erroneous results on another.
A few languages, including C, C++, C#, and Modula-2, provide both signed and unsigned integers.
A few languages (Fortran, C99, Common Lisp) provide a built-in complex type, usually
implemented as a pair of floating point numbers that represent the real and imaginary Cartesian
coordinates; other languages support these as a standard library class. Most scripting languages
support integers of arbitrary precision. Ada supports fixed point types, which are represented
internally by integers. Integers, Booleans, characters are examples of discrete types.
Enumeration Types:-
Enumerations were introduced by Wirth in the design of Pascal. They facilitate the creation of
readable programs, and allow the compiler to catch certain kinds of programming errors. An
enumeration type consists of a set of named elements. In Pascal one can write:
type weekday = (sun, mon, tue, wed, thu, fri, sat);
Values of an enumeration type are ordered, so comparisons are generally valid (mon < tue).
In Ada one would write
type test_score is new integer range 0..100;
subtype workday is weekday range mon..fri;
The range portion of the definition in Ada is called a type constraint. test_score is a derived
type, incompatible with integers. Values of type workday and weekday can be more or less freely
intermixed.
Composite Types:-
Nonscalar types are usually called composite, or constructed types. They are generally created by
applying a type constructor to one or more simpler types. Common composite types include
records, variant records, arrays, sets, pointers, lists, and files.
Records - Introduced by Cobol, and have been supported by most languages since the
1960s. A record consists of a collection of fields, each of which belongs to a simpler type.
Variant records - Differ from normal records in that only one of a variant record's fields
is valid at any given time.
Arrays - The most commonly used composite types. An array can be thought of as a
function that maps members of an index type to members of a component type.
Sets - Introduced by Pascal. A set type is the mathematical powerset of its base type,
which must often be discrete.
Pointers - A pointer value is a reference to an object of the pointer's base type. Pointers
are often but not always implemented as addresses. They are most often used to
implement recursive data types.
Lists - Contain a sequence of elements, but there is no notion of mapping or indexing.
Rather, a list is defined recursively as either an empty list or a pair consisting of a head
element and a reference to a sublist. To find a given element of a list, a program must
examine all previous elements, recursively or iteratively, starting at the head. Because of
their recursive definition, lists are fundamental to programming in most functional
languages.
Files - Intended to represent data on mass storage devices, outside the memory in which
other program objects reside.
28. Discuss about Type checking
In most statically typed languages, every definition of an object must specify the object's type.
Type equivalence:-
In a language in which the user can define new types, there are two principal ways of defining
type equivalence. Structural equivalence is based on the content of type definitions. Name
equivalence is based on the lexical occurrence of type definitions. Structural equivalence is used
in Algol-68, Modula-3, and C. Name equivalence is the more popular approach in recent
languages. It is used in Java, C#, standard Pascal, and Ada.
The exact definition of structural equivalence varies from one language to another.
Structural equivalence in Pascal:
Type R2=record
a,b : integer
end;
should probably be considered the same as
type R3 = record
a : integer;
b : integer
end;
But what about
Type R4 = record
b : integer;
a : integer
end;
should the reversal of the order of the fields change the type? Most languages say yes.
In a similar vein, consider the following arrays, again in a Pascal-like notation:
type str = array [1..10] of char;
type str = array [0..9] of char;
Here the length of the array is the same in both cases, but the index values are different. Should
these be considered equivalent? Most languages say no, but some (Fortran, Ada) consider them
compatible.
Type compatibility:-
Most languages do not require equivalence of types in every context. Instead, they merely say
that a value's type must be compatible with that of the context in which it appears. In an
assignment statement, the type of the right hand side must be compatible with that of the left-
hand side. The types of the operands of + must both be compatible with some common type that
supports addition. In a subroutine call, the types of any arguments passed into the subroutine
must be compatible with the types of the corresponding formal parameters, and the types of any
formal parameters passed back to the caller must be compatible with the types of the
corresponding arguments.
Coercion:-
Whenever a language allows a value of one type to be used in a context that expects another, the
language implementation must perform an automatic, implicit conversion to the expected type.
This conversion is called a type coercion.
29. Discuss about Records(Structures) and Variants(Unions)
Record types allow related data of heterogeneous types to be stored and manipulated together.
Some languages (Algol 68, C, C++, Common Lisp) use the term structure instead of record.
Fortran 90 simply calls its records types. Structures in C++ are defined as a special form of class.
Syntax and Operations:-
In C a simple record might be defined as follows:
struct element {
char name[2];
int atomic_number;
double atomic_weight;
_Bool metallic;
};
In Pascal the corresponding declarations would be
type two_chars = packed array [1..2] of char;
type element = record
name : two_chars;
atomic_number : integer;
atomic_weight : real;
metallic : Boolean
end;
Memory layout and its impact:
The fields of a record are usually stored in adjacent locations in memory. In its symbol table, the
compiler keeps track of the offset of each field within each record type. When it needs to access
a field, the compiler typically generates a load or store instruction with displacement addressing.
For a local object, the base register is the frame pointer; the displacement is the sum of the
record's offset from the register and the field's offset within the record.
Variant Records (Unions):
Programming languages of the 1960s and 1970s were designed in an era of severe memory
constraints. Many allowed the programmer to specify that certain variables should be allocated
on top of one another, sharing the same bytes in memory. C's syntax is heavily influenced by
Algol 68:
union {
int i;
double d;
_Bool b;
};
In practice, unions have been used for two main purposes. The first arises in systems programs,
where unions allow the same set of bytes to be interpreted in different ways at different times.
The canonical example occurs in memory management, where storage may sometimes be treated
as unallocated space, sometimes as bookkeeping information, and sometimes as user allocated
data of arbitrary type.
The second common purpose for unions is to represent alternative sets of fields within a record.
A record representing an employee, for example, might have several common fields (name,
address, phone, department) and various other fields such as salaried, hourly or consulting basis.
30. Briefly describe two purposes for unions/variant records
Programming languages of the 1960s and 1970s were designed in an era of severe memory
constraints. Many allowed the programmer to specify that certain variables should be allocated
on top of one another, sharing the same bytes in memory. C's syntax is heavily influenced by
Algol 68:
union {
int i;
double d;
_Bool b;
};
In practice, unions have been used for two main purposes. The first arises in systems programs,
where unions allow the same set of bytes to be interpreted in different ways at different times.
The canonical example occurs in memory management, where storage may sometimes be treated
as unallocated space, sometimes as bookkeeping information, and sometimes as user allocated
data of arbitrary type.
The second common purpose for unions is to represent alternative sets of fields within a record.
A record representing an employee, for example, might have several common fields (name,
address, phone, department) and various other fields such as salaried, hourly or consulting basis.
31. What is an Array?
Arrays are the most common and important composite data types. They have been a fundamental
part of almost every high-level language, beginning with Fortran I. Unlike records, which group
related fields of disparate types, arrays are usually homogeneous. Semantically, they can be
thought of as a mapping from an index type to a component or element type.
Syntax and operations:
Most languages refer to an element of an array by appending a subscript, delimited by
parentheses or square brackets, to the name of the array. In Fortran and Ada one says A(3); in
Pascal and C, one says A[3]. Since parentheses are generally used to delimit the arguments to a
subroutine call, square bracket subscript notation has the advantage of distinguishing between
the two. The difference in notation makes a program easier to compile and arguably easier to
read.
Declarations:
In some languages one declares an array by appending subscript notation to the syntax that
would be used to declare a scalar. In C:
char upper[26];
In Fortran:
character, dimension (1:26)::upper
character (26) upper ! shorthand notation
In C the lower bound of an index range is always zero; the indices of an n-element array are
0..n-1. In Fortran the lower bound of the index range is one by default.
32. What is an Array Slice?
A slice or section is a rectangular portion of an array. Fortran 90 and Single Assignment C
provide extensive facilities for slicing, as do many scripting languages including Perl, Python,
Ruby and R. A slice is simply a contiguous range of elements in a one- dimensional array.
Fortran 90 has a very rich set of array operations: built in operations that take entire arrays as
arguments. Because Fortran uses structural type equivalence, the operands of an array operator
need have the same element type and shape. In particular, slices of the same shape can be
intermixed in array operations, even if the arrays from which they were sliced have very different
shapes. Any of the built in arithmetic operators will take arrays as operands; the result is an
array,of the same shape as the operands, whose elements are the result of applying the operator
to corresponding elements.
Array slices (sections) in Fortran 90.
[ a : b : c in a subscript indicates positions a, a + c, . . . through b. If a or b is omitted, the
corresponding bound is assumed. If c is omitted, 1 is assumed. If c is negative, then we select
positions in reverse order. The slashes in the second subscript of the lower-right example delimit
an explicit list of positions.]
33. How are arrays allocated?
Storage management is more complex for arrays whose shape is not known until elaboration
time, or whose shape may change during execution. For these the compiler must arrange not only
to allocate space, but also to make shape information available at run time. Some dynamically
typed languages allow run time binding of both the number and bounds of dimensions. Compiled
languages may allow the bounds to be dynamic, but typically require the number of dimensions
to be static. A local array whose shape is known at elaboration time may still be allocated in the
stack. An array whose size may change during execution must generally be allocated in the heap.
Global lifetime, static shape: allocate space for the array in static global memory
Local lifetime, static shape: space can be allocated in the subroutine's stack frame at run time
Local lifetime, shape bound at elaboration time: an extra level of indirection is required to place the space for the array in the stack frame of its subroutine (Ada, C)
Arbitrary lifetime, shape bound at elaboration time: at elaboration time either space is allocated or a preexistent reference from another array is assigned (Java, C#)
Arbitrary lifetime, dynamic shape: must generally be allocated from the heap. A pointer to the array still resides in the fixed-size portion of the stack frame (if local lifetime).
Allocation in Ada of local arrays whose shape is bound at elaboration time
-- Ada:
procedure foo (size : integer) is
M : array (1..size, 1..size) of real;
...
begin
...
end foo;
//C99:
void foo (int size)
{
double M[size][size];
}
[The compiler arranges for a pointer to M to reside at a static offset from the frame pointer. M
cannot be placed among the other local variables because it would prevent those higher in the
frame from having static offsets.]
34. Discuss about Memory Layout of Arrays
Arrays in most language implementations are stored in contiguous locations in memory. In a
one-dimensional array the second element of the array is stored immediately after the first; the third
is stored immediately after the second, and so forth. For arrays of records, it is common for each
subsequent element to be aligned at an address appropriate for any type; small holes between
consecutive records may result.
For multidimensional arrays, there are two layouts: row-major order and column-major order
In row-major order, consecutive locations in memory hold elements that differ by one in the final subscript (except at the ends of rows).
In column-major order, consecutive locations hold elements that differ by one in the initial subscript
[ In row major order, the elements of a row are contiguous in memory; in column-major order,
the elements of a column are contiguous. The second cache line of each array is shaded, on the
assumption that each element is an eight-byte floating point number, that cache lines are 32 bytes
long and that the array begins at a cache line boundary. If the array is indexed from A[0,0] to
A[9,9],then in the row major case elements A[0,4] through A[0,7] share a cache line; in the
column-major case elements A[4,0] through A[7,0] share a cache line.]
Row-Pointer Layout:
Allow the rows of an array to lie anywhere in memory, and create an auxiliary array of pointers to the rows.
Technically speaking, only the contiguous layout is a true multidimensional array
This row-pointer memory layout requires more space in most cases but has three potential advantages.
It sometimes allows individual elements of the array to be accessed more quickly, especially on CISC machines with slow multiplication instructions
It allows the rows to have different lengths, without devoting space to holes at the ends of the rows; the lack of holes may sometimes offset the increased space for
pointers
It allows a program to construct an array from preexisting rows (possibly scattered throughout memory) without copying
C, C++, and C# provide both contiguous and row-pointer organizations for multidimensional arrays
Java uses the row-pointer layout for all arrays
Row-Pointer Layout in C:
char days[][10] = {
"Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"
};
... days[2][3] == 's'; /* the 's' in "Tuesday" */
[ It is a true two dimensional array. The slashed boxes are NUL bytes; the shaded areas are
holes.]
char *days[] = {
"Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"
};
...
days[2][3] == 's'; /* the 's' in "Tuesday" */
[ It is a ragged array of pointers to arrays of characters.]
35. What is a dope vector? What purposes does it serve?
A dope vector will contain the lower bound of each dimension and the size of each dimension
other than the last. If the language implementation performs dynamic semantic checks for out of
bounds subscripts in array references, then the dope vector may contain upper bounds as well.
Given lower bounds and sizes, the upper bound information is redundant, but it is usually
included anyway, to avoid computing it repeatedly at run time.
The contents of the dope vector are initialized at elaboration time, or whenever the number or
bounds of dimensions change. In a language like Fortran 90, whose notion of shape includes
dimension sizes but not lower bounds, an assignment statement may need to copy not only the
data of an array, but dope vector contents as well.
In a language that provides both a value model of variables and arrays of dynamic shape, we
must consider the possibility that a record will contain a field whose size is not statically known.
In this case the compiler may use dope vectors not only for dynamic shape arrays, but also for
dynamic shape records. The dope vector for a record typically indicates the offset of each field
from the beginning of the record.
36. Discuss about String representations in programming languages
In many languages, a string is simply an array of characters. In other languages, strings
have special status, with operations that are not available for arrays of other sorts.
It is easier to provide special features for strings than for arrays in general because strings are one-dimensional
Manipulation of variable-length strings is fundamental to a huge number of computer applications
Particularly powerful string facilities are found in various scripting languages such as Perl, Python and Ruby.
C, Pascal, and Ada require that the length of a string-valued variable be bound no later than elaboration time, allowing contiguous space allocation in the current stack frame
Lisp, Icon, ML, Java, C# allow the length of a string-valued variable to change over its lifetime, requiring that space be allocated by a block or chain of blocks in the heap
37. Sets
A set is an unordered collection of an arbitrary number of distinct values of a common type.
Sets were introduced by Pascal and are found in many more recent languages as well. Pascal
supports sets of any discrete type, and provides union, intersection, and difference operations:
var A, B, C : set of char;
    D, E : set of weekday;
...
A := B + C;   { union        }
A := B * C;   { intersection }
A := B - C;   { difference   }
There are many ways to implement sets, including arrays, hash tables, and various forms of
trees. The most common implementation employs a bit vector whose length (in bits) is the
number of distinct values of the base type.
Operations on bit-vector sets can make use of fast logical instructions on most machines.
Union is bit-wise or; intersection is bit-wise and; difference is bit-wise not, followed by
bit-wise and.
38. Discuss the tradeoffs between Pointers and the Recursive Types that arise naturally in a language with a reference model of variables.
A recursive type is one whose objects may contain one or more references to other objects of the
type. Most recursive types are records, since they need to contain something in addition to the
reference, implying the existence of heterogeneous fields. Recursive types are used to build a
wide variety of linked data structures, including lists and trees.
In some languages (Pascal, Ada 83, Modula-3) pointers are restricted to point only to objects in
the heap. The only way to create a new pointer value is to call a built-in function that allocates a
new object in the heap and returns a pointer to it. In other languages (PL/I, Algol 68, C, C++,
Ada 95) one can create a pointer to a nonheap object by using an address-of operator.
Syntax and Operations:-
Operations on pointers include allocation and deallocation of objects in the heap, dereferencing
of pointers to access the objects to which they point, and assignment of one pointer into another.
The behavior of these operations depends heavily on whether the language is functional or
imperative and on whether it employs a reference or value model for variables.
Functional languages generally employ a reference model for names. Objects in a functional
language tend to be allocated automatically as needed, with a structure determined by the
language implementation. Variables in an imperative language may use either a value or a
reference model, or some combination of the two. In C, Pascal, or Ada, which employ a value
model, the assignment A := B puts the value of B into A. If we want B to refer to an object and
we want A := B to make A refer to the object to which B refers, then A and B must be pointers.
Reference Model:
In Lisp, which uses a reference model of variables but is not statically typed, the tree could be
specified textually as (#\R (#\X () ()) (#\Y (#\Z () ()) (#\W () ()))).
[Implementation of a tree in Lisp. A diagonal slash through a box indicates a null pointer. The C
and A tags serve to distinguish the two kinds of memory blocks: cons cells and blocks containing
atoms.]
In Pascal tree data types would be declared as follows:
type chr_tree_ptr = ^chr_tree;
chr_tree = record
left,right : chr_tree_ptr;
val : char
end;
In Ada:
type chr_tree;
type chr_tree_ptr is access chr_tree;
type chr_tree is record
left,right : chr_tree_ptr;
val : character;
end record;
In C:
struct chr_tree
{
struct chr_tree * left, *right;
char val;
};
39. What are Dangling References? How are they created?
When a heap-allocated object is no longer live, a long-running program needs to reclaim the
object's space. Stack objects are reclaimed automatically as part of the subroutine calling
sequence. There are two alternatives for reclaiming heap objects: explicit reclamation by the
programmer, or automatic garbage collection. Languages like Pascal, C, and C++ require the
programmer to reclaim an object explicitly.
In Pascal:
dispose(my_ptr);
In C:
free (my_ptr);
In C++:
delete my_ptr;
A dangling reference is a live pointer that no longer points to a valid object.
Dangling reference to a stack variable in C++:
int i = 3;
int *p = &i;
...
void foo()
{
    int n = 5;
    p = &n;
}
...
cout << *p;    // if foo() has been called, p points into its dead frame: undefined behavior
[Reference counts and circular lists]
41. Summarize the differences among mark-and-sweep, stop-and-copy, pointer reversal and generational garbage collection.
1). Mark-and-Sweep: - The classic mechanism to identify useless blocks. It proceeds in three
main steps, executed by the garbage collector when the amount of free space remaining in the
heap falls below some minimum threshold.
a). The collector walks through the heap, tentatively marking every block as useless.
b). Beginning with all pointers outside the heap, the collector recursively explores all linked
data structures in the program, marking each newly discovered block as useful.
c). The collector again walks through the heap, moving every block that is still marked useless
to the free list.
2). Pointer Reversal :- When the collector explores the path to a given block, it reverses the
pointers it follows, so that each points back to the previous block instead of forward to the next.
As it explores, the collector keeps track of the current block and the block from where it came.
3). Stop and Copy: - In a language with variable size heap blocks, the garbage collector can
reduce external fragmentation by performing storage compaction. Many garbage collector
employ a technique known as stop and copy that achieves compaction. Specifically they divide
the heap into two regions of equal size. All allocation happens in the first half. When this half is
full, the collector begins its exploration of reachable data structures. Each reachable block is
copied into the second half of the heap; the old copy is overwritten with a useful flag and a
pointer to the new location. Any other pointer that refers to the same block is set to point to the
new location. When
the collector finishes its exploration, all useful objects have been moved into the second half of
the heap, and nothing in the first half is needed anymore. The collector can therefore swap its
notion of first and second halves, and the program can continue.
4). Generational collection: - The heap is divided into multiple regions. When space runs low the
collector first examines the youngest region, which it assumes is likely to have the highest
proportion of garbage. Only if it is unable to reclaim sufficient space in this region does the
collector examine the next older region. To avoid leaking storage in long running systems, the
collector must be prepared, if necessary, to examine the entire heap. In most cases, however, the
overhead of collection will be proportional to the size of the youngest region only.
Any object that survives some small number of collections in its current region is promoted to
the next older region, in a manner reminiscent of stop and copy. Promotion requires, of course,
that pointers from old objects to new objects be updated to reflect the new locations. While such
old-space-to-new-space pointers tend to be rare, a generational collector must be able to find
them all quickly. At each pointer assignment, the compiler generates code to check whether the
new value is an old-to-new pointer; if so, it adds the pointer to a hidden list accessible to the
collector.
5). Conservative Collection:- When space runs low, the collector tentatively marks all blocks in
the heap as useless. It then scans all word aligned quantities in the stack and in global storage. If
any of these words appears to contain the address of something in the heap, the collector marks
the block that contains that address as useful. Recursively, the collector then scans all word-
aligned quantities in the block, and marks as useful any other blocks whose addresses are found
therein. Finally the collector reclaims any blocks that are still marked useless.
42. Why are Lists so heavily used in functional programming languages?
A list is defined recursively as either the empty list or a pair consisting of an object and another
list. Lists are ideally suited to programming in functional and logic languages, which do most of
their work via recursion and higher-order functions. In Lisp, in fact, a program is a list, and can
extend itself at run time by constructing a list and executing it.
Lists in ML and Lisp:
Lists in ML are homogeneous: every element of the list must have the same type. Lisp
lists, by contrast are heterogeneous: any object may be placed in a list, so long as it is
never used in an inconsistent fashion. An ML list is usually a chain of blocks, each of
which contains an element and a pointer to the next block. A Lisp list is a chain of cons
cells, each of which contains two pointers, one to the element and one to the next cons
cell.
An ML list is enclosed in square brackets, with elements separated by commas:[a, b, c, d]
A Lisp list is enclosed in parentheses, with elements separated by white space: (a b c d).
In both cases, the notation represents a proper list- one whose innermost pair consists of
the final element and the empty list. In Lisp it is also possible to construct an improper
list, whose final pair contains two elements.
The most fundamental operations on lists are those that construct them from their
components or extract their components from them.
In Lisp:
(cons 'a '(b))    => (a b)
(car '(a b))      => a
(car nil)         => ??
(cdr '(a b c))    => (b c)
(cdr '(a))        => nil
(cdr nil)         => ??
(append '(a b) '(c d)) => (a b c d)
Here we have used => to mean "evaluates to". The car and cdr of the empty list (nil) are
defined to be nil in Common Lisp.
In ML the equivalent operations are written as follows:
a :: [b]        => [a, b]
hd [a, b]       => a
hd [ ]          => run-time exception
tl [a, b, c]    => [b, c]
tl [a]          => nil
tl [ ]          => run-time exception
[a, b] @ [c, d] => [a, b, c, d]
43. Discuss about Files and Input/Output
Input/output facilities allow a program to communicate with the outside world. Interactive I/O
generally implies communication with human users or physical devices, which work in parallel
with the running program, and whose input to the program may depend on earlier output from
the program. Files generally refer to off-line storage implemented by the operating system. Files
may be further categorized into those that are temporary and those that are persistent. Temporary
files exist for the duration of a single program run; their purpose is to store information that is
too large to fit in the memory available to the program. Persistent files allow a program to read
data that existed before the program began running, and to write data that will continue to exist
after the program has ended. Some languages provide built in file data types and special syntactic
constructs for I/O. Others relegate I/O entirely to library packages, which export a file type and a
variety of input and output subroutines. The principal advantage of language integration is the
ability to employ non-subroutine call syntax, and to perform operations that may not otherwise
be available to library routines.
Unit-2
44. Discuss about Static and Dynamic links
In a language with nested subroutines and static scoping, objects that lie in surrounding
subroutines, and that are thus neither local nor global, can be found by maintaining a static chain.
Each stack frame contains a reference to the frame of the lexically surrounding subroutine. This
reference is called the static link. By analogy, the saved value of the frame pointer, which will be
restored on subroutine return, is called the dynamic link. The static and dynamic links may or
may not be the same, depending on whether the current routine was called by its lexically
surrounding routine, or by some other routine nested in that surrounding routine.
Whether or not a subroutine is called directly by the lexically surrounding routine, we can be
sure that the surrounding routine is active; there is no other way that the current routine could
have been visible, allowing it to be called.
If subroutine D is called directly from B, then clearly B's frame will already be on the stack.
Because D is nested inside of B, it comes into view only when control enters B. It can therefore
be called by C, or by any other routine that is nested inside C or D, but only because these are
also within B.
45. Calling Sequences
Maintenance of the subroutine call stack is the responsibility of the calling sequence. Sometimes
the term calling sequence is used to refer to the combined operations of the caller, the prologue,
and the epilogue.
Tasks that must be accomplished on the way into a subroutine include passing parameters,
saving the return address, changing the program counter, changing the stack pointer to allocate
space, saving registers that contain important values and that may be overwritten by the callee,
changing the frame pointer to refer to the new frame, and executing initialization code for any
objects in the new frame that require it. Tasks that must be accomplished on the way out include
passing return parameters or function values, executing finalization code for any local objects