(MC0073) System Programming


Master of Computer Application (MCA) – Semester 3

MC0073 – System Programming

Assignment Set – 1

Que 1. Describe the following with respect to Language Specification: A) Fundamentals of Language Processing B) Language Processor development tools

Ans:

Fundamentals of Language Processing

Language Processing – Definition:

Language Processing = Analysis of Source Program + Synthesis of Target Program.

This definition motivates a generic model of language processing activities. We refer to the collection of language processor components engaged in analysing a source program as the analysis phase of the language processor. Components engaged in synthesizing a target program constitute the synthesis phase.

A specification of the source language forms the basis of source program analysis. The specification consists of three components:

1. Lexical rules which govern the formation of valid lexical units in the source language.

2. Syntax rules which govern the formation of valid statements in the source language.

3. Semantic rules which associate meaning with valid statements of the language.

Thus, analysis of a source statement consists of lexical, syntax and semantic analysis.

Lexical analysis (Scanning)

Lexical analysis identifies the lexical units in a source statement. It then classifies the units into different lexical classes, e.g. identifiers, constants, reserved words, etc. and enters them into different tables. Lexical analysis builds a descriptor, called a token, for each lexical unit.

Syntax analysis (Parsing)

Syntax analysis processes the string of tokens built by lexical analysis to determine the statement class, e.g. assignment statement, if statement, etc. It then builds an intermediate code (IC) which


represents the structure of the statement. The IC is passed to semantic analysis to determine the meaning of the statement.

Semantic analysis

Semantic analysis of declaration statements differs from the semantic analysis of imperative statements. The former results in addition of information to the symbol table, e.g. type, length and dimensionality of variables. The latter identifies the sequence of actions necessary to implement the meaning of a source statement. In both cases the structure of a source statement guides the application of the semantic rules.

Example : Consider the statement

percent-profit := (profit * 100) / cost-price;

in some programming language. Lexical analysis identifies :=, * and / as operators, 100 as a constant and the remaining strings as identifiers. Syntax analysis identifies the statement as an assignment statement with percent-profit as the left hand side and (profit * 100) / cost-price as the expression on the right hand side. Semantic analysis determines the meaning of the statement to be the assignment of the value of (profit * 100) / cost-price to the variable percent-profit.

The synthesis phase is concerned with the construction of target language statement(s) which have the same meaning as a source statement. Typically, this consists of two main activities:

· Creation of data structures in the target program

· Generation of target code.

We refer to these activities as memory allocation and code generation, respectively.

Phases and Passes of a language processor

From the preceding discussion it is clear that a language processor consists of two distinct phases – the analysis phase and the synthesis phase. This process is too complex to be performed as a single step, from either a logical or an implementation point of view. For this reason, it is customary to partition the compilation process into a series of subprocesses called phases.


Phase:

A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation.

Pass: The portions of one or more phases are combined into a module called a pass. A pass reads the source program or output of another pass, makes the transformations specified by its phases and writes the output to an intermediate file, which may then be read by a subsequent pass.

Intermediate representation of programs

The language processor performs certain processing more than once. In pass I, it analyses the source program to note the type information. In pass II, it once again analyses the source program to generate target code using the type information noted in pass I. This can be avoided using an intermediate representation of the source program.

Intermediate Representation (IR)

The first pass performs analysis of the source program, and reflects its results in the intermediate representation. The second pass reads and analyses the IR, instead of the source program, to perform synthesis of the target program. This avoids repeated processing of the source program. The first pass is concerned exclusively with source language issues. Hence it is called the front end of the language processor. The second pass is concerned with program synthesis for a specific target language. Hence it is called the back end of the language processor.

Desirable properties of an IR are:

· Ease of use: IR should be easy to construct and analyse.

· Processing efficiency: efficient algorithms must exist for constructing and analysing the IR.

· Memory efficiency: IR must be compact.

Language Processor Development Tools (LPDT)

There are two LPDTs widely used in practice. These are the lexical analyzer generator LEX, and the parser generator YACC. The input to these tools is a specification of the lexical and syntactic constructs of a language L, and the semantic actions to be performed on recognizing the constructs.

Compiler or Interpreter for a programming language is often decomposed into two parts:

1. Read the source program and discover its structure.

2. Process this structure, e.g. to generate the target program.


Lex and Yacc can generate program fragments that solve the first task.

The task of discovering the source structure again is decomposed into subtasks:

1. Split the source file into tokens (Lex).

2. Find the hierarchical structure of the program (Yacc).

Lex – A Lexical Analyzer Generator

Lex helps write programs whose control flow is directed by instances of regular expressions in the input stream. It is well suited for editor-script type transformations and for segmenting input in preparation for a parsing routine.

Lex source is a table of regular expressions and corresponding program fragments. The table is translated to a program which reads an input stream, copying it to an output stream and partitioning the input into strings which match the given expressions. As each such string is recognized the corresponding program fragment is executed. The recognition of the expressions is performed by a deterministic finite automaton generated by Lex. The program fragments written by the user are executed in the order in which the corresponding regular expressions occur in the input stream.
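A minimal Lex source file, shown here as an illustrative sketch rather than a quotation from the text, pairs each regular expression with a C fragment to run on a match:

```lex
%{
/* C declarations copied verbatim into the generated scanner. */
#include <stdio.h>
%}
%%
[0-9]+               { printf("CONST %s\n", yytext); }
[a-zA-Z][a-zA-Z0-9]* { printf("ID %s\n", yytext); }
[ \t\n]              { /* skip white space */ }
.                    { printf("OP %s\n", yytext); }
%%
int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }
```

Running `lex` (or `flex`) on this file generates the table-driven DFA scanner described above.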

Yacc (Yet Another Compiler-Compiler)

Computer program input generally has some structure; in fact, every computer program that does input can be thought of as defining an “input language” which it accepts. An input language may be as complex as a programming language, or as simple as a sequence of numbers. Unfortunately, usual input facilities are limited, difficult to use, and often are lax about checking their inputs for validity.

Yacc provides a general tool for describing the input to a computer program. The Yacc user specifies the structures of his input, together with code to be invoked as each such structure is recognized. Yacc turns such a specification into a subroutine that handles the input process; frequently, it is convenient and appropriate to have most of the flow of control in the user’s application handled by this subroutine.
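For illustration (an assumed sketch, not the author's code), a Yacc specification attaches an action to each grammar rule; this fragment recognizes sums of numbers and evaluates them as the structure is recognized:

```yacc
%token NUMBER
%%
expr : expr '+' term  { $$ = $1 + $3; }  /* action runs when the rule is recognized */
     | term           { $$ = $1; }
     ;
term : NUMBER         { $$ = $1; }
     ;
%%
```

Yacc turns this grammar into a parsing subroutine (`yyparse`) that calls the Lex-generated scanner for tokens.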

Que 2. Define the following: A) Addressing modes for CISC (Motorola and Intel) B) Addressing modes for RISC Machines.

Ans:


Addressing Modes

Addressing modes, a concept from computer science, are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how machine language instructions in that architecture identify the operand (or operands) of each instruction. An addressing mode specifies how to calculate the effective memory address of an operand by using information held in registers and/or constants contained within a machine instruction or elsewhere.

2.3.1 Addressing Modes of CISC

The 68000 (Motorola) addressing modes

· Register to Register,

· Register to Memory,

· Memory to Register, and

· Memory to Memory

The 68000 supports a wide variety of addressing modes.

· Immediate mode – the operand immediately follows the instruction

· Absolute address – the address (in either the "short" 16-bit form or "long" 32-bit form) of the operand immediately follows the instruction

· Program Counter relative with displacement – A displacement value is added to the program counter to calculate the operand’s address. The displacement can be positive or negative.

· Program Counter relative with index and displacement – The instruction contains both the identity of an "index register" and a trailing displacement value. The contents of the index register, the displacement value, and the program counter are added together to get the final address.

· Register direct – The operand is contained in an address or data register.

· Address register indirect – An address register contains the address of the operand.

· Address register indirect with predecrement or postincrement – An address register contains the address of the operand in memory. With the predecrement option set, a predetermined value is subtracted from the register before the (new) address is used. With the postincrement option set, a predetermined value is added to the register after the operation completes.


· Address register indirect with displacement – A displacement value is added to the register’s contents to calculate the operand’s address. The displacement can be positive or negative.

· Address register relative with index and displacement – The instruction contains both the identity of an "index register" and a trailing displacement value. The contents of the index register, the displacement value, and the specified address register are added together to get the final address.

2.3.2 Addressing Modes for the Intel 80x86 Architecture

· Simple Addressing Modes (3)

- Immediate Mode : operand is part of the instruction

Example:
        mov ah, 09h
        mov dx, offset Prompt

- Register Addressing : operand is contained in register

Example:   add ax, bx

- Direct : operand field of instruction contains effective address

Example:   add ax, a

· Register Indirect Mode – contents of register is effective address

Example:
        mov bx, offset Table
        add ax, [bx]

- Only the base registers BX, BP and the index registers SI, DI can be used for register indirect addressing. However, for reasons given below, do not use the BP register.

- Register indirect can be used to implement arrays

Example: To sum an array of word-length integers

        mov cx, size          ; set up size of Table
        mov bx, offset Table  ; BX <- address of Table
        xor ax, ax            ; zero out Sum
Loop1:  add ax, [bx]
        inc bx                ; words are 2 bytes long
        inc bx
        loop Loop1

- Push and Pop instructions are implemented using register indirection with the SP register.

- The DS segment register is used with the BX, SI, and DI registers. However since the SS segment register is used with the BP, using BP for register indirection will access the stack and not the data segment.

· Base + Offset Indirect or Index + Offset Indirect


The effective address is obtained by adding the offset value contained in the operand field of the instruction to the contents of a register

Example     add ax, Table[bx]

Here the effective address is obtained by adding the value Table (not the contents stored at location Table) to the BX register. This is the effective address.

- Base + Offset Indirect (Index + Offset Indirect) makes use of the Base registers BX and BP (but avoid BP for reasons given above) or the Index registers SI and DI.

- Base + Offset Indirect provides an alternate method for implementing arrays

Example:

        mov cx, size      ; set up size of Table
        xor bx, bx        ; BX <- 0 for zero offset
        xor ax, ax
Loop2:  add ax, Table[bx]
        inc bx            ; words are 2 bytes long
        inc bx
        loop Loop2

- Array Implementation – Offset contains fixed value (usually address of zeroth byte in array) while the contents of the Base register is incremented to compute offset addresses within array. See above example.

- Record Implementation – Fields within records are accessed as fixed offsets from the Base address of the record. For example, a record might consist of an integer field (2 bytes) followed by a character field (1 byte) followed by a 12-byte string field. Offsets for the integer field, character field and string field are 0, 2 and 3 respectively. Thus, to access the character field use

        mov bx, offset Record
        mov al, [bx]+2

- Syntax for Base + Offset Indirect Addressing. The following are equivalent

        add ax, Table[bx]
        add ax, [Table+bx]
        add ax, Table+[bx]
        add ax, [bx]+Table

· Base + Index + Offset Indirect

The effective address is obtained by adding the contents of a Base register (BX or BP but avoid BP) to the contents of an Index register (SI or DI) plus an offset (operand field of instruction).  That is

EAddr <- C(Base Reg) + C(Index Reg) + Offset.

· Relative (branch instructions only): IP <- IP + offset; i.e., addressing relative to the instruction pointer.


Que 3. Explain the design of single pass and multi pass assemblers.

Ans:

Single Pass Assembler

A single pass assembler scans the program only once and creates the equivalent binary program. The assembler substitutes all of the symbolic instructions with machine code in one pass.

Advantage: Every source statement needs to be processed only once.

Disadvantage: We cannot use any forward reference in our program.

The ability to compile in a single pass has classically been seen as a benefit because it simplifies the job of writing a compiler and one pass compilers generally compile faster than multi-pass compilers. Thus, partly driven by the resource limitations of early systems, many early languages were specifically designed so that they could be compiled in a single pass (e.g., Pascal).

In some cases the design of a language feature may require a compiler to perform more than one pass over the source. For instance, consider a declaration appearing on line 20 of the source which affects the translation of a statement appearing on line 10. In this case, the first pass needs to gather information about declarations appearing after statements that they affect, with the actual translation happening during a subsequent pass.

The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated optimizations needed to generate high quality code. It can be difficult to count exactly how many passes an optimizing compiler makes. For instance, different phases of optimization may analyse one expression many times but only analyse another expression once.

Splitting a compiler up into small programs is a technique used by researchers interested in producing provably correct compilers. Proving the correctness of a set of small programs often requires less effort than proving the correctness of a larger, single, equivalent program.

Multi pass assembler:

If we use a two-pass assembler, the following symbol definition cannot be allowed.

ALPHA  EQU  BETA

BETA   EQU  DELTA


DELTA  RESW 1

This is because ALPHA and BETA cannot be defined in pass 1. Actually, if we allow multi-pass processing, DELTA is defined in pass 1, BETA is defined in pass 2, and ALPHA is defined in pass 3, and the above definitions can be allowed.

This is the motivation for using a multi-pass assembler.

It is unnecessary for a multi-pass assembler to make more than two passes over the entire program. Instead, only the parts of the program involving forward references need to be processed in multiple passes. The method presented here can be used to process any kind of forward reference. Use a symbol table to store symbols that are not yet totally defined. For an undefined symbol, in its entry:

· We store the names and the number of undefined symbols which contribute to the calculation of its value.

· We also keep a list of symbols whose values depend on the defined value of this symbol.

When a symbol becomes defined, we use its value to reevaluate the values of all of the symbols that are kept in this list.

The above step is performed recursively.

Que 4. Explain the following with respect to Macros and Macro Processors: A) Macro Definition and Expansion B) Conditional Macro Expansion C) Macro Parameters

Ans: Macro definition and Expansion

Definition : macro

A macro name is an abbreviation, which stands for some related lines of code. Macros are useful for the following purposes:

· To simplify and reduce the amount of repetitive coding

· To reduce errors caused by repetitive coding

· To make an assembly program more readable.

A macro consists of a name, a set of formal parameters and a body of code. The use of a macro name with a set of actual parameters is replaced by the code generated from its body. This is called macro expansion.


Macros allow a programmer to define pseudo operations: typically operations that are generally desirable, are not implemented as part of the processor instruction set, and can be implemented as a sequence of instructions. Each use of a macro generates new program instructions; the macro has the effect of automating the writing of the program.

Macros can be defined and used in many programming languages, like C, C++ etc.

Example: macros in C programming. Macros are commonly used in C to define small snippets of code. If the macro has parameters, they are substituted into the macro body during expansion; thus, a C macro can mimic a C function. The usual reason for doing this is to avoid the overhead of a function call in simple cases, where the code is lightweight enough that function call overhead has a significant impact on performance.

For instance,

#define max(a, b) a>b?a:b

defines the macro max, taking two arguments a and b. This macro may be called like any C function, using identical syntax. Therefore, after preprocessing

z = max(x, y);

becomes z = x>y?x:y;

While this use of macros is very important for C, for instance to define type-safe generic data-types or debugging tools, it is also slow, rather inefficient, and may lead to a number of pitfalls.

C macros are capable of mimicking functions, creating new syntax within some limitations, as well as expanding into arbitrary text (although the C compiler will require that text to be valid C source code, or else comments), but they have some limitations as a programming construct. Macros which mimic functions, for instance, can be called like real functions, but a macro cannot be passed to another function using a function pointer, since the macro itself has no address.

In programming languages such as C or assembly language, a macro is a name that defines a set of commands that are substituted for the macro name wherever the name appears in a program (a process called macro expansion) when the program is compiled or assembled. Macros are similar to functions in that they can take arguments and in that they are calls to lengthier sets of instructions. Unlike functions, macros are replaced by the actual commands they represent when the program is prepared for execution; function instructions, in contrast, are copied into a program only once.

Macro Expansion.

A macro call leads to macro expansion. During macro expansion, the macro statement is replaced by a sequence of assembly statements.


Figure 1.1 Macro expansion on a source program.

Example

In the above program a macro call, INITZ, is shown in the middle of the figure. Every macro begins with the MACRO keyword and ends with ENDM (end macro). Whenever a macro is called, its entire code is substituted into the program at the point of the call. The result of the macro expansion is shown on the rightmost side of the figure.

Macro calling in high-level programming languages (C programming):

#define max(a,b) a>b?a:b

main() {
    int x, y, z;
    x = 4; y = 6;
    z = max(x, y);
}

The above program was written using C programming statements. It defines the macro max, taking two arguments a and b. This macro may be called like any C function, using identical syntax. Therefore, after preprocessing,

z = max(x, y); becomes z = x>y?x:y;

After macro expansion, the whole code would appear like this.

#define max(a,b) a>b?a:b

main()


{
    int x, y, z;
    x = 4; y = 6;
    z = x>y?x:y;
}

Example 2:

Consider a typical scenario where one needs to do a number of divisions of the AX register by 10. The following lists the typical evolution of macro development and usage. The final result is the expansion of the macro, which becomes part of the program. In the following example the macro use simply inserts the three instructions of the macro definition.

Conditional Assembly

Conditional assembly means that some sections of the program may be optional, either included or not in the final program, depending upon specified conditions. A reasonable use of conditional assembly would be to combine two versions of a program: one that prints debugging information during test executions for the developer, and another version for production operation that displays only results of interest for the average user. A program fragment that assembles the instructions to print the AX register only if Debug is true is given below. Note that true is any non-zero value.

Here is a conditional statement in C programming. The following statement tests the expression `BUFSIZE == 1020`, where `BUFSIZE` must be a macro.

#if BUFSIZE == 1020

printf ("Large buffers!\n");

#endif /* BUFSIZE is large */


Note: In C programming, macros are defined above main().

Parameters in Macros

Macros may have any number of parameters, as long as they fit on one line. Parameter names are local symbols, which are known within the macro only. Outside the macro they have no meaning!

Syntax:

<macro name> MACRO <parameter 1>, …, <parameter n>

<body line 1>

<body line 2>

.

.

<body line m>

ENDM

Valid macro arguments are

1. arbitrary sequences of printable characters, not containing blanks, tabs, commas, or semicolons

2. quoted strings (in single or double quotes)

3. Single printable characters, preceded by ‘!’ as an escape character

4. Character sequences, enclosed in literal brackets < … >, which may contain blanks, commas and semicolons

5. Arbitrary sequences of valid macro arguments

6. Expressions preceded by a ‘%’ character

During macro expansion, these actual arguments replace the symbols of the corresponding formal parameters, wherever they are recognized in the macro body. The first argument replaces the symbol of the first parameter, the second argument replaces the symbol of the second parameter, and so forth. This is called substitution.

Example 3


MY_SECOND MACRO CONSTANT, REGISTER

MOV A,#CONSTANT

ADD A,REGISTER

ENDM

MY_SECOND 42, R5

After calling the macro MY_SECOND, the body lines

MOV A,#42

ADD A,R5

are inserted into the program, and assembled. The parameter names CONSTANT and REGISTER have been replaced by the macro arguments "42" and "R5". The number of arguments passed to a macro can be less (but not greater) than the number of its formal parameters. If an argument is omitted, the corresponding formal parameter is replaced by an empty string. If arguments other than the last ones are to be omitted, they can be represented by commas.

Macro parameters support code reuse, allowing one macro definition to implement multiple algorithms. In the following, the .DIV macro has a single parameter N. When the macro is used in the program, the actual parameter used is substituted for the formal parameter defined in the macro prototype during the macro expansion. Now the same macro, when expanded, can produce code to divide by any unsigned integer.

Fig. 3.0

Example 4

The macro OPTIONAL has eight formal parameters:

OPTIONAL MACRO P1,P2,P3,P4,P5,P6,P7,P8

.

.

<macro body>

.


.

ENDM

If it is called as follows,

OPTIONAL 1,2,,,5,6

the formal parameters P1, P2, P5 and P6 are replaced by the arguments 1, 2, 5 and 6 during substitution. The parameters P3, P4, P7 and P8 are replaced by a zero length string.

Que 5. Describe the process of Bootstrapping in the context of Linkers.

Ans. Bootstrapping

In computing, bootstrapping refers to a process where a simple system activates another, more complicated system that serves the same purpose. It is a solution to the chicken-and-egg problem of starting a certain system without the system already functioning. The term is most often applied to the process of starting up a computer, in which a mechanism is needed to execute the software program that is responsible for executing software programs (the operating system).

Bootstrap loading

The discussions of loading up to this point have all presumed that there’s already an operating system or at least a program loader resident in the computer to load the program of interest. The chain of programs being loaded by other programs has to start somewhere, so the obvious question is how is the first program loaded into the computer?

In modern computers, the first program the computer runs after a hardware reset is invariably stored in a ROM known as the bootstrap ROM, as in "pulling oneself up by the bootstraps." When the CPU is powered on or reset, it sets its registers to a known state. On x86 systems, for example, the reset sequence jumps to the address 16 bytes below the top of the system’s address space. The bootstrap ROM occupies the top 64K of the address space and the ROM code then starts up the computer. On IBM-compatible x86 systems, the boot ROM code reads the first block of the floppy disk into memory, or if that fails, the first block of the first hard disk, into memory location zero and jumps to location zero. The program in block zero in turn loads a slightly larger operating system boot program from a known place on the disk into memory, and jumps to that program, which in turn loads in the operating system and starts it. (There can be even more steps, e.g., a boot manager that decides from which disk partition to read the operating system boot program, but the sequence of increasingly capable loaders remains.)

Why not just load the operating system directly? Because you can’t fit an operating system loader into 512 bytes. The first level loader typically is only able to load a single-segment


program from a file with a fixed name in the top-level directory of the boot disk. The operating system loader contains more sophisticated code that can read and interpret a configuration file, uncompress a compressed operating system executable, and address large amounts of memory (on an x86 the loader usually runs in real mode, which means that it’s tricky to address more than 1MB of memory). The full operating system can turn on the virtual memory system, load the drivers it needs, and then proceed to run user-level programs.

Many Unix systems use a similar bootstrap process to get user-mode programs running. The kernel creates a process, then stuffs a tiny little program, only a few dozen bytes long, into that process. The tiny program executes a system call that runs /etc/init, the user mode initialization program that in turn runs configuration files and starts the daemons and login programs that a running system needs.

None of this matters much to the application level programmer, but it becomes more interesting if you want to write programs that run on the bare hardware of the machine, since then you need to arrange to intercept the bootstrap sequence somewhere and run your program rather than the usual operating system. Some systems make this quite easy (just stick the name of your program in AUTOEXEC.BAT and reboot Windows 95, for example), others make it nearly impossible. It also presents opportunities for customized systems. For example, a single-application system could be built over a Unix kernel by naming the application /etc/init.

Software Bootstrapping & Compiler Bootstrapping

Bootstrapping can also refer to the development of successively more complex, faster programming environments. The simplest environment will be, perhaps, a very basic text editor (e.g. ed) and an assembler program. Using these tools, one can write a more complex text editor, and a simple compiler for a higher-level language and so on, until one can have a graphical IDE and an extremely high-level programming language.

5.2.3 Compiler Bootstrapping

In compiler design, a bootstrap or bootstrapping compiler is a compiler that is written in the target language, or a subset of the language, that it compiles. Examples include gcc, GHC, OCaml, BASIC, PL/I and more recently the Mono C# compiler.

Que 6. Describe the procedure for the design of a Linker.

Ans. Design of a linker

Relocation and linking requirements in segmented addressing


The relocation requirements of a program are influenced by the addressing structure of the computer system on which it is to execute. Use of the segmented addressing structure reduces the relocation requirements of a program.

17