System Software and Languages

7/30/2019 System Software and Languages


SYSTEM SOFTWARE AND LANGUAGES

INTRODUCTION TO COMPUTER SOFTWARE

A computer contains two basic parts: (i) hardware and (ii) software. The first two units covered hardware in some detail. This unit and the remaining units of this block discuss topics related to software. Without software, a computer is just a piece of metal. With software, a computer can store and retrieve data, solve different types of problems and provide a friendly environment for software development.

The process of software development is called programming. To program, one needs knowledge of (i) a particular programming language and (ii) a set of procedures (an algorithm) for solving a problem or developing software. The development of an algorithm is basic to computer programming and is an important part of computer science studies. Developing a computer program is a detailed process that requires serious thought, careful planning and accuracy. It is a challenging and exacting task that draws on the creativity of the programmer.

Once an algorithm is obtained, the next step towards a computer solution is to program the algorithm using mathematical and data processing techniques. Programming languages constitute the vehicle for this stage of problem solving. The development of programming languages is one of the finest intellectual achievements in computer science. It has been said: "To understand a computer, it is necessary to understand a programming language. Understanding them does not really mean only being able to use them. A lot of people can use them without really fully understanding them".

An operating system is system software that may be viewed as an organized collection of procedures for operating a computer and providing an environment for the execution of programs. It acts as an interface between users and the hardware of a computer system.

There are many important reasons for studying operating systems. Some of them are:

The user interacts with the computer through the operating system in order to accomplish a task, since it is the user's primary interface with the computer.
It helps users understand the inner functions of a computer very closely.
Many concepts and techniques found in operating systems have general applicability in other applications.

In this unit we discuss the concepts relating to programming languages, and in the next unit we deal with operating system concepts.

INTRODUCTION TO SYSTEM SOFTWARE

Computer software consists of sets of instructions that mould the raw arithmetic and logical capabilities of the hardware units to perform useful tasks.

In order to communicate with each other, we use natural languages like Hindi, English, Bengali, Tamil, Marathi, Gujarati etc. In the same way, programming languages of one type or another are used to communicate instructions and commands to a computer for solving problems. Learning a programming language requires learning the symbols, words and rules of the language.

Program and Programming: A computer can neither think nor make any judgment on its own. It is also impossible for a computer to independently analyse given data and follow its own method of solution. It needs a program to tell it what to do. A program is a set of instructions arranged in a sequence that guides the computer to solve a problem. The process of writing a program is called programming. Programming is a critical step in data processing. If the system is not correctly programmed, it delivers results that cannot be used. There are two ways in which we can acquire a program. One is to purchase an existing program, which is normally referred to as packaged software, and the other is to prepare a new program from scratch, in which case it is called customized software. Computer software can be broadly classified into two categories: system software and application software. Today, many languages are available for developing software. These languages are designed with specific areas of application in mind. Thus, some languages may be good for writing system software while others are better for application software.



Since a computer can be used for writing various types of application and system software, there are different programming languages.

i) System Programming Languages: System programs are designed to make the computer easier to use. An example of system software is an operating system, which consists of many other programs for controlling input/output devices, memory, the processor etc. To write an operating system, the programmer needs instructions to control the computer's circuitry (the hardware part), for example instructions that move data from a storage location to a register of the processor. The C and C++ languages are widely used to develop system software.

ii) Application Programming Languages: Application programs are designed for specific applications, such as payroll processing, inventory control etc. To write programs for payroll processing or other applications, the programmer does not need to control the basic circuitry of a computer. Instead, the programmer needs instructions that make it easy to input data, produce output, do calculations and store and retrieve data. Programming languages that are suitable for such application programs support these instructions but not necessarily the types of instructions needed for the development of system programs.

There are two main categories of application programs: business programs and scientific application programs. Most programming languages are designed to be good for one category of applications but not necessarily for the other, although there are some general purpose languages that support both types. Business applications are characterized by the processing of large inputs and large outputs and by high volume data storage and retrieval, but call for simple calculations. Languages that are suitable for business program development must support high volume input, output and storage but do not need to support complex calculations. On the other hand, programming languages that are designed for writing scientific programs contain very powerful instructions for calculations but rather poor instructions for input, output etc. Amongst traditionally used programming languages, COBOL (Common Business Oriented Language) is more suitable for business applications whereas FORTRAN (Formula Translation) is more suitable for scientific applications. Before we discuss more about languages, let us briefly look at the categories of software, viz. system and application software.

SYSTEM SOFTWARE

Language Translator

A language translator is system software that translates a computer program written by a user into a machine understandable form.

Operating System

An operating system (OS) is the most important system software and is a must to operate a computer system. An operating system manages a computer's resources effectively, takes care of scheduling multiple jobs for execution and manages the flow of data and instructions between the input/output units and the main memory. Advances in the field of computer hardware have also helped in the development of more efficient operating systems.

Utilities

Utility programs are those which are very often requested by many application programs. A few examples are SORT/MERGE utilities, which are used for sorting large volumes of data and merging them into a single sorted list, formatting utilities etc.

APPLICATION SOFTWARE

Application software is written to enable the computer to solve a specific data processing task. A number of powerful application software packages, which do not require significant programming knowledge, have been developed. These are easy to learn and use as compared to programming languages.

Although these packages can perform many general and special functions, there are applications where these packages are not found adequate. In such cases, an application program is written to meet the exact requirements. A user application program may be written using one of these packages or a programming language. The most important categories of software packages available are:

Data Base Management Software

Spreadsheet Software

Word Processing, Desktop Publishing (DTP) and Presentation Software

Graphics Software

Data Communication Software

Statistical and Operational Research Software

Data Base Management Software

Database management software is very useful for creating, maintaining and querying databases and for generating reports. Many of today's database management systems are relational database management systems (RDBMS). Many RDBMS packages provide smart assistants for creating simple databases for invoices, orders and contact lists. Many database management systems are available in the market these days. You can select one based on your needs. For example, if you have only a few databases then a package like dBase, FoxPro etc. may be good. If you require some additional features and have a moderate work load then Lotus Approach or Microsoft Access is all right. However, if you have high end database requirements that call for a multi-user environment, data security, access rights, a very good user interface etc., then you must go for a professional RDBMS package like Ingres, Oracle, Integra etc.

Accounting Package

Accounting packages are among the most important packages for an office. Some of the features you may look for in an accounting package are:

tax planner facility
facility for producing charts and graphs
finding accounts payable
simple inventory control facility
payroll functions
on-line connection to stock quotes
easy creation of invoices

One of the good packages in this connection is Quicken for Windows.

Communication Package

Communication software includes software for fax. The fax software market is growing. An important fax package is Delrina's WinFax PRO 4.0. Features such as Remote Retrieval and Fax Mailbox should be looked for in fax software. These features ensure that you will receive the fax message irrespective of your location. Another important feature is Fax Broadcast. This allows you to send out huge numbers of faxes without tying up your fax machine all day.

If you constantly have to transfer files from your notebook computer to a desktop computer, then you need a software program that coordinates and updates documents. One such program is LapLink for Windows. This software offers features that are very convenient to use. For example, simply dragging and dropping a file starts a file transfer. This software can work over a serial cable, a Novell network or a modem connection.

Desktop Publishing Packages

Desktop publishing packages are very popular in the Indian context. Newer publishing packages also provide certain built-in formats such as brochures, newsletters, flyers etc., which can be used directly. Already created text can be put into these packages very easily, and so can graphics. Many DTP packages for English and for languages other than English are available. Microsoft Publisher, PageMaker and Corel Ventura are a few popular names. Desktop publishing packages, in general, are better equipped on Apple Macintosh computers.

CATEGORIES OF LANGUAGES

We can choose any language for writing a program according to the need. But a computer executes programs only after they are represented internally in binary form (sequences of 1s and 0s). Programs written in any other language must be translated to the binary representation of the instructions before the computer can execute them. Programs written for a computer may be in one of the following categories of languages.

MACHINE LANGUAGE

This is a sequence of instructions written in the form of binary numbers consisting of 1s and 0s, to which the computer responds directly. Machine language was initially referred to as code, although now the term code is used more broadly to refer to any program text. An instruction prepared in any machine language will have at least two parts. The first part is the command or operation, which tells the computer what function is to be performed. Computers have an operation code for each of their functions. The second part of the instruction is the operand, which tells the computer where to find or store the data that has to be manipulated. Just as hardware is classified into generations based on technology, computer languages also have a generation classification based on the level of interaction with the machine. Machine language is considered to be the first generation language.

Advantage of Machine Language

It is faster in execution since the computer directly starts executing it.

Disadvantage of Machine Language

It is difficult to understand and develop a program using machine language. Anybody going through such a program for checking will have a difficult task understanding what will be achieved when the program is executed. Nevertheless, the computer hardware recognizes only this type of instruction code.

The following program is an example of a machine language program for adding two numbers.

0011 1110    Load A register with
0000 0111    value 7
0000 0110    Load B register with
0000 1010    value 10
1000 0000    A = A + B
0011 1010    Store the result
0110 0100    into the memory location
0000 0000    whose address is 100 (decimal)
0111 0110    Halt processing

ASSEMBLY LANGUAGE

Assembly language unlocks the secrets of your computer's hardware and software. It teaches you about the way the computer's hardware and operating system work together and how application programs communicate with the operating system. Assembly language, unlike high level languages, is machine dependent. Each microprocessor has its own set of instructions that it can support.

When we employ symbols (letters, digits or special characters) for the operation part, the address part and other parts of the instruction code, this representation is called an assembly language program. This is considered to be the second-generation language. Machine and assembly languages are referred to as low level languages since the coding for a problem is at the individual instruction level.

Each machine has its own assembly language, which is dependent upon the internal architecture of the processor. An assembler is a translator that takes its input in the form of an assembly language program and produces machine language code as its output. The following program is an example of an assembly language program for adding two numbers X and Y and storing the result in some memory location.

LD A,7        Load register A with 7
LD B,10       Load register B with 10
ADD A,B       A <- A + B
LD (100),A    Save the result in the location 100
HALT          Halt process

From this program, it is clear that the use of mnemonics (in our example LD, ADD and HALT are the mnemonics) has improved the readability of our program significantly.

A machine cannot execute an assembly language program directly, as it is not in binary form. An assembler is needed in order to translate an assembly language program into object code executable by the machine. This is illustrated in Figure 1.

Figure 1: Assembler
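The translation step performed by an assembler can be sketched in a few lines of Python. This is only a toy for the five-line example program, written with the mnemonics LD, ADD and HALT. The opcode values are copied from the machine language listing given earlier in this unit and are assumptions for illustration, not a real processor's instruction set. The names OPCODES and assemble are hypothetical.

```python
# Toy assembler for the five-line example program.
# Opcode values are taken from the machine language listing in this unit;
# they are assumptions for illustration, not a real instruction set.
OPCODES = {
    ("LD", "A"): 0b00111110,      # load register A with an immediate value
    ("LD", "B"): 0b00000110,      # load register B with an immediate value
    ("ADD", "A,B"): 0b10000000,   # A = A + B
    ("LD", "(mem)"): 0b00111010,  # store A at a 16-bit memory address
    ("HALT", ""): 0b01110110,     # stop processing
}

def assemble(lines):
    """Translate a list of assembly lines into a list of byte values."""
    code = []
    for line in lines:
        mnemonic, _, operands = line.strip().partition(" ")
        operands = operands.replace(" ", "")
        if mnemonic == "LD" and operands.startswith("("):
            # LD (addr),A : opcode, then address low byte, then high byte
            addr = int(operands[1:operands.index(")")])
            code += [OPCODES[("LD", "(mem)")], addr & 0xFF, addr >> 8]
        elif mnemonic == "LD":
            # LD reg,value : opcode followed by the immediate value
            reg, value = operands.split(",")
            code += [OPCODES[("LD", reg)], int(value)]
        else:
            # Instructions without immediate data are a single byte
            code.append(OPCODES[(mnemonic, operands)])
    return code

program = ["LD A,7", "LD B,10", "ADD A,B", "LD (100),A", "HALT"]
machine_code = assemble(program)
for byte in machine_code:
    print(f"{byte:08b}")
```

Each mnemonic is looked up in a table and replaced by its binary code, which is essentially what the assembler in Figure 1 does on a much larger scale.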

Advantage of Assembly Language

Writing a program in assembly language is more convenient than in machine language. Instead of binary sequences, as in machine language, it is written in the form of symbolic instructions. Therefore, it is somewhat more readable.

Disadvantages of Assembly Language

An assembly language program is specific to a particular machine architecture. Assembly languages are designed for a specific make and model of microprocessor. It means that assembly language programs written for one processor will not work on a different processor if it is architecturally different. That is why an assembly language program is not portable. An assembly language program is also not as fast to run as a machine language program, since it first has to be translated into machine (binary) language code.

VARIABLES, CONSTANTS, DATA TYPE, ARRAY AND EXPRESSIONS

These are the smallest components of a programming language.

Figure 2: Memory Organization

Variable

The first thing we must learn is how to use the internal memory of a computer in writing a program. Memory may be pictured as a series of separate memory cells, as shown in Figure 2. Computer memory is divided into several locations. Each location has its own address. Each storage location holds a piece of information. In order to store or retrieve information from a memory location, we must give that particular location a name. Now study the following definition.

Variable: It is a character or group of characters assigned by the programmer to a single memory location and used in the program as the name of that memory location in order to access the value stored in it.

For example, in the expression A = 5, A is the name of a memory location, i.e. a variable, where 5 is stored.
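The idea of a variable as a name for a memory location can be sketched as follows. This is a simplified model, assuming a memory of eight cells and an arbitrarily chosen address 3 for A. The names memory, symbol_table, store and fetch are hypothetical.

```python
# A toy model of memory: a row of numbered cells, all initially 0.
memory = [0] * 8

# The symbol table maps each variable name to the address of its cell.
# The address 3 is an arbitrary choice made for this illustration.
symbol_table = {"A": 3}

def store(name, value):
    """Put a value into the cell that the name refers to."""
    memory[symbol_table[name]] = value

def fetch(name):
    """Read the value from the cell that the name refers to."""
    return memory[symbol_table[name]]

store("A", 5)      # corresponds to the expression A = 5
print(fetch("A"))  # value read back through the name A
print(memory)      # the whole memory, with 5 in cell 3
```

The program never mentions the address directly; it uses only the name, and the table does the rest.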

Constant

A constant has a fixed value, in the sense that two cannot be equal to four. A string constant is simply a sequence of characters such as "computer", which is a string of 8 characters. A numeric constant can be an integer representing a whole quantity or a number with a decimal point representing a number with a fractional part. The constant is probably the most familiar concept to us, since we have used constants in everything that has to do with numbers. Numeric constants can be added, subtracted, multiplied, divided, and also compared to say whether two of them are equal, less than or greater than each other.

As string constants are sequences of characters, a related string constant may be obtained from a given one by chopping off some characters from the beginning or end or both, or by appending another string constant at the beginning or end. For example, from 'Gone with the wind', we can get 'one with ', 'Gone with wind', and so on. String constants can also be compared in a lexicographic (dictionary) sense to say whether two of them are equal, not equal, less than or greater than each other.
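The string operations described above can be tried out in Python. This is a small sketch; the variable names are chosen only for illustration, and Python compares strings character by character using character codes, which agrees with dictionary order for words written in the same case.

```python
title = "Gone with the wind"

# Chop characters off the beginning and the end.
middle = title[1:10]              # 'one with '

# Chop off the end, then append another string constant.
shortened = title[:10] + "wind"   # 'Gone with wind'

print(repr(middle))
print(repr(shortened))

# Lexicographic (dictionary) comparison of string constants.
print("apple" < "banana")   # True: 'a' comes before 'b'
print("wind" == "wind")     # True: identical sequences of characters
print("wind" > "with")      # False: 'n' comes before 't' at the third letter
```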

Data type

In computer programming, the term data refers to anything and everything processed by the computer. There are different types of data processed by the computer: numbers are one type of data and words are another. In addition, the operations that are performed on data differ from one type of data to another. For example, multiplication applies to numbers and not to words or sentences.

A data type defines a set of related values (integers, numbers with fractions, characters) and a set of specific operations that can be performed on those values. In BASIC, the statement LET A = 15 denotes that A is of numeric data type because it contains a number, but in the statement LET A$ = "BOMBAY", A$ is a variable of character data type. A data type also determines how many contiguous memory cells should be allocated for a particular variable.

Array

In programming we deal with large amounts of related data. Without arrays, each data element has to be treated as a separate variable. For example, if we have to analyse the sales performance of a particular company for the last 10 years, we can take ten different variables (names), each one representing the sales of a particular year. If we analyse sales information for more than 10 years, the number of variables increases accordingly. It is very difficult to manage a large number of variables in a program. To deal with such a situation, an array is used. An array is a collection of data of the same type (either string or numeric), all of which is referenced by the same name. For example, a list of 5 years of sales information of a company can be referred to by the same array name A.

A(1)      A(2)        A(3)        A(4)        A(5)
50,000    1,00,000    5,00,000    8,00,000    9,00,000

A(1) specifies the sales information of the first year
A(2) specifies the sales information of the second year
A(5) specifies the sales information of the fifth year
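The sales array can be sketched in Python as a list. Python numbers list elements from 0, so the hypothetical helper A below subtracts 1 to keep the 1-based subscripts used in the text; the figures are the five sales values from the example.

```python
# Five years of sales held under one name instead of five variables.
sales = [50_000, 100_000, 500_000, 800_000, 900_000]

def A(year):
    """Return the sales of the given year, using 1-based subscripts."""
    return sales[year - 1]

print(A(1))   # sales of the first year
print(A(5))   # sales of the fifth year

# One loop handles every year, however many there are.
total = 0
for year in range(1, len(sales) + 1):
    total += A(year)
print(total)
```

Adding a sixth year means appending one value to the list; the loop does not change.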

Expression

We know that we can express intended arithmetic operations using expressions such as X + Y + Z and so on. Several simple expressions can even be nested together using parentheses to form complex expressions. Every computer language specifies an order in which the various arithmetic operators are evaluated in a given expression. An expression may contain operators such as:

Parentheses                 ( )
Exponentiation              ^
Negation                    -
Multiplication, division    *, /
Addition, subtraction       +, -

The operators are evaluated in the order given above. For example, the expression 2+8*(4 - 6/3) can be considered to be evaluated as follows:

2+8*(4 - 6/3)    The sub-expression (4 - 6/3) is taken up first.
2+8*(4 - 2)      The division 6/3 within (4 - 6/3) has higher priority than the subtraction 4 - 6, so it is performed first.
2+8*2            The subtraction (4 - 2) is performed next; (4 - 6/3) is now complete.
2+16             The multiplication 8*2 is executed before the addition.
18               The result is then added to 2, that is 16 + 2 = 18.
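The same evaluation can be checked step by step in Python, which follows the same order for these operators. One difference to keep in mind is that Python writes exponentiation as ** rather than ^, and its / operator always gives a result with a fractional part, so the value is shown as 18.0. The variable names are chosen only for illustration.

```python
# Evaluate 2+8*(4 - 6/3) one step at a time.
quotient = 6 / 3             # division inside the parentheses comes first
inner = 4 - quotient         # then the subtraction completes (4 - 6/3)
product = 8 * inner          # multiplication before addition
result = 2 + product         # finally the addition

print(quotient, inner, product, result)

# The language applies the same order when given the whole expression.
print(2 + 8 * (4 - 6 / 3))

# Without the parentheses the priorities give a different value:
# 8*4 and 6/3 are done first, then 2 + 32 - 2.
print(2 + 8 * 4 - 6 / 3)
```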

It is useful to remember the order of priority of the various operators. But it is safer to simplify expressions and enclose them in parentheses to avoid unpleasant surprises. So far we have focused on arithmetic expressions. But the expression is a very general concept. We mentioned earlier that apart from arithmetic operations we could compare numbers or strings. We do this by using relational operators in expressions.

The following is a list of relational operators:

=      equal to
<>     not equal to
<      less than
>      greater than
<=     less than or equal to
>=     greater than or equal to

These operators have the same level of priority among themselves but a lower priority than the arithmetic operators mentioned earlier. A relational expression results in one of the truth values, either TRUE or FALSE. In some languages, when a relational expression such as (3 > 5) is evaluated to be FALSE, the value 0 is assigned, whereas (5 < 7) is evaluated to be TRUE and the value 1 is assigned.
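Python is one language that behaves this way, so the truth values can be inspected directly. Note that Python writes "equal to" as == and "not equal to" as !=, which differs from the = and <> symbols in the list above.

```python
# Relational expressions produce truth values.
print(3 > 5)        # False
print(5 < 7)        # True

# Converting the truth values to numbers gives 0 and 1.
print(int(3 > 5))   # 0
print(int(5 < 7))   # 1

# Arithmetic operators have higher priority than relational ones,
# so 2 + 3 is computed before the comparison with 6.
print(2 + 3 < 6)    # True
```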

Note that a relational expression can compare only two values separated by an appropriate relational operator. If we want to express an idea such as whether the number 7 lies between two other numbers 4 and 10, we may be tempted to write the relational expression 4 < 7 < 10.

... 2) OR (7 > 2) is TRUE



XOR

TRUE only if one of the adjoining expressions is TRUE and the other is FALSE. XOR has the same priority as OR. (4 < 7) XOR (7 < 10) is FALSE.
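A short Python sketch of combining relational expressions with logical operators follows. Python has no XOR keyword for truth values, so the hypothetical helper xor below uses the fact that two truth values differ exactly when one is TRUE and the other is FALSE. The test for 7 lying between 4 and 10 is written as two separate comparisons joined by and.

```python
def xor(p, q):
    """TRUE only if exactly one of the two truth values is TRUE."""
    return bool(p) != bool(q)

# Is 7 between 4 and 10? Two comparisons joined by a logical operator.
between = (4 < 7) and (7 < 10)
print(between)                  # True

# The XOR example from the text: both sides are TRUE, so XOR is FALSE.
print(xor(4 < 7, 7 < 10))       # False

# Full truth table for XOR.
for p in (False, True):
    for q in (False, True):
        print(p, q, xor(p, q))
```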

ASSEMBLY LANGUAGE FUNDAMENTALS

The best way to learn to write assembly language programs is to first study a simple program written in assembly language. We shall do just that in this section.

A Sample Program

;ABSTRACT   : This program adds 2 8-bit words in the memory
;           : locations called NUM1 and NUM2. The result is
;           : stored in the memory location called RESULT. If
;           : there was a carry from the addition it will be stored
;           : as 0000 0001 in the location CARRY
;ALGORITHM  :
;             get NUM1
;             add NUM2
;             put sum into memory at RESULT
;             position carry in LSB of byte register
;             mask off upper seven bits
;             store the result in the CARRY location
;
;PORTS      : None used
;PROCEDURES : None used
;REGISTERS  : Uses CS, DS, AX

DATA    SEGMENT
NUM1    DB 15h              ; First number stored here
NUM2    DB 20h              ; Second number stored here
RESULT  DB ?                ; Put sum here
CARRY   DB ?                ; Put any carry here
DATA    ENDS

CODE    SEGMENT
        ASSUME CS:CODE, DS:DATA
START:  MOV AX, DATA        ; Initialize data segment
        MOV DS, AX          ; register
        MOV AL, NUM1        ; Get the first number
        ADD AL, NUM2        ; Add it to 2nd number
        MOV RESULT, AL      ; Store the result
        RCL AL, 01          ; Rotate carry into LSB
        AND AL, 00000001B   ; Mask out all but LSB
        MOV CARRY, AL       ; Store the carry result
        MOV AX, 4C00h       ; DOS function 4Ch: terminate program
        INT 21h             ; Call DOS
CODE    ENDS
        END START
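The arithmetic performed by the sample program can be checked with a short Python sketch. It models only the outcome (an 8-bit sum and a carry of 0 or 1), not the individual 8086 instructions. The function name add_bytes is hypothetical, and the second pair of operands is an added example chosen to produce a carry.

```python
def add_bytes(num1, num2):
    """Add two 8-bit values; return (8-bit result, carry bit)."""
    total = num1 + num2
    result = total & 0xFF        # keep the low eight bits, as register AL would
    carry = (total >> 8) & 1     # the ninth bit is the carry out of the addition
    return result, carry

# The values used in the sample program: 15h + 20h
result, carry = add_bytes(0x15, 0x20)
print(f"RESULT = {result:02X}h, CARRY = {carry:08b}")

# A pair that overflows eight bits: F0h + 20h = 110h
result2, carry2 = add_bytes(0xF0, 0x20)
print(f"RESULT = {result2:02X}h, CARRY = {carry2:08b}")
```

With the values in the program the sum fits in eight bits, so CARRY ends up as 0000 0000; only an overflowing pair stores 0000 0001.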

The program contains certain additional mnemonics, in addition to the instructions you have studied so far. These are called assembler directives or pseudo operations. They are directions for the assembler. Their meaning is valid only at assembly time. No code is generated for them.

SEGMENT and ENDS Directive

The SEGMENT and ENDS directives are used to identify a group of data items or a group of instructions, called a segment. These directives are used in the same way as parentheses are used in algebra, to group like items together. A group of data statements or instructions placed between the SEGMENT and ENDS directives is said to constitute a logical segment. This segment is given a name. In our example, CODE and DATA are the names given to the code and data segments respectively.

Each segment should have a unique name. There can be no blanks within the segment name, and the name can be up to 31 characters long. The name of a mnemonic or any other reserved word is not allowed as a segment name or label.

Data Definition Directives

In assembly language, we define storage for variables using data definition directives. Data definition directives create storage at assembly time, and can even initialize a variable or string to a starting value. The directives are summarized in the following table:

Directive   Description          Number of bytes   Attribute
DB          Define byte          1                 byte
DW          Define word          2                 word
DD          Define double-word   4                 double word
DQ          Define quadword      8                 quad word
DT          Define 10 bytes      10                ten bytes

As we see from the table above, the variable being defined is given an attribute. The attribute refers to the basic unit of storage used when the variable was defined. These variables can be given a name as follows:

Example

CHAR_VAR DB 'A'        ; CHAR_VAR = 41h
WORD_VAR DW 01234h     ; a hex number must begin with a digit, hence the leading zero
LIST     DB 1,2,3,4    ; list of 4 bytes initialized by numbers 1,2,3,4
NUM      DW 4200
DEN      DB 20

The DUP directive is used to duplicate the basic data definition 'n' number of times. Example:

ARRAY DB 10 DUP (0)

This defines an array ARRAY of 10 data bytes, each byte initialized to 0. The initial value can be anything acceptable to the basic data type.

The EQU directive is used to give a name to a constant. Example:

CONS EQU 20

This defines a constant with value 20. Now, wherever you want to use 20 in your program, you can use the name instead. The advantage is this: say you want to change the value of CONS to 10 at some point. Instead of making changes everywhere in the program, you just have to change the EQU definition and assemble the program again. The change will be made automatically at all places.
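The storage that these directives set aside can be pictured with Python's struct module. This is a sketch, assuming the 8086 convention of storing the low-order byte of a word first (little-endian); the Python variable names simply mirror the assembly names.

```python
import struct

# Bytes reserved by each directive, as listed in the table of directives.
SIZES = {"DB": 1, "DW": 2, "DD": 4, "DQ": 8, "DT": 10}

char_var = b"A"                        # CHAR_VAR DB 'A'
word_var = struct.pack("<H", 0x1234)   # WORD_VAR DW 01234h, low byte first
list_var = bytes([1, 2, 3, 4])         # LIST DB 1,2,3,4
num = struct.pack("<H", 4200)          # NUM DW 4200
den = struct.pack("<B", 20)            # DEN DB 20
array = bytes([0] * 10)                # ARRAY DB 10 DUP (0)

CONS = 20                              # CONS EQU 20: a name, no storage

for name, data in [("CHAR_VAR", char_var), ("WORD_VAR", word_var),
                   ("LIST", list_var), ("NUM", num), ("DEN", den),
                   ("ARRAY", array)]:
    print(f"{name:8} {len(data):2} byte(s): {data.hex(' ')}")
```

The output shows, for instance, that the word 1234h occupies two bytes stored as 34 12.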

Types of numbers used in data statements can be octal, binary, hexadecimal, decimal and ASCII. Following are examples of each type:

TEMP_MAX   DB 01101100B   ; Binary
OLD_VAL    DW 7341Q       ; Octal
DECIMAL    DB 49          ; Decimal
HEX_VAL    DW 03B2Ah      ; Hex
ASCII_VAL  DB 'EXAMPLE'   ; ASCII
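How an assembler might read such constants can be sketched with Python's int function, which converts a string of digits in a given base. The helper parse_number is hypothetical and handles only the suffix letters B (binary), O or Q (octal) and H (hexadecimal); anything else is read as decimal.

```python
def parse_number(text):
    """Convert an assembler-style numeric constant to an integer."""
    text = text.strip().upper()
    bases = {"B": 2, "O": 8, "Q": 8, "H": 16}
    suffix = text[-1]
    if suffix in bases:
        # Drop the suffix letter and convert the digits in that base.
        return int(text[:-1], bases[suffix])
    return int(text, 10)

print(parse_number("01101100B"))   # binary
print(parse_number("7341Q"))       # octal
print(parse_number("49"))          # decimal
print(parse_number("03B2Ah"))      # hexadecimal

# An ASCII string is stored as one character code per byte.
print(list("EXAMPLE".encode("ascii")))
```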



The ASSUME Directive

The 8086 has four types of segments, discussed in the previous unit. A program can define more than one code segment, data segment or extra segment. However, only one of each type can be active at a time. The ASSUME directive tells the assembler which segment is to be used as the active segment at any instant, and with respect to which it has to calculate the offsets of variables or instructions. It is usually placed immediately after the SEGMENT directive in the code segment, but you can have as many additional ASSUME directives as you like. Each time an ASSUME is encountered, the assembler starts to calculate offsets with respect to that segment. In the example above, CODE and DATA are the two segments defined, one each for code and data.

Initializing Segment Registers

ASSUME is only a directive, which is used to calculate the offsets of variables, instructions or stack elements with respect to a specific segment of its type. It does not initialize the segment registers. Initialization of the segment registers has to be done explicitly using MOV instructions as follows:

MOV AX,DATA
MOV DS,AX

The above statements initialize the data segment register. A segment register cannot be loaded directly with an immediate value such as a segment name. Therefore, the segment name is first moved into a general purpose register, which is then moved into the segment register. All segment registers can be initialized in the same manner. The code segment register is initialized automatically by the loader.
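The role of the segment register and the offset can be made concrete with a small calculation. On the 8086, a 20-bit physical address is formed by shifting the 16-bit segment value left by four bits (multiplying it by 16) and adding the 16-bit offset. The sketch below shows this rule; the function name physical_address and the sample values are chosen only for illustration.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    # Shift the segment left four bits, add the offset, keep 20 bits.
    return ((segment << 4) + offset) & 0xFFFFF

# A variable at offset 0005h in a data segment loaded at 1234h
print(f"{physical_address(0x1234, 0x0005):05X}h")

# The same offset in a different segment gives a different address,
# which is why DS must be initialized before variables are accessed.
print(f"{physical_address(0x2000, 0x0005):05X}h")
```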

END Directive

The END directive tells the assembler to stop reading and assembling the program from there on. Any statement after the END will be ignored by the assembler. There can be only one END in the program, and it is the last statement of the program.

THE ASSEMBLY LANGUAGE PROGRAMS

Assembly language programs can be written in two ways: one in which all code and data are written as part of one segment, called COM programs, and the other in which there is more than one segment, called EXE programs. We shall study each of them in brief, looking at their advantages and disadvantages.

COM Programs

A COM (Command) program is simply a binary image of a machine language program. It is loaded in memory at the lowest available segment address. The program code begins at offset 100h, the first 100h (256) bytes of the segment being occupied by the program segment prefix (PSP) that DOS creates for the program. All segment registers are set to the base segment address of the program.

A COM program keeps its code, data and stack within the same segment. Thus, its total size should not exceed 64K bytes. A sample COM program is shown below. The program's only segment (CSEG) must be declared explicitly using segment directives.

;TITLE ADD TWO NUMBERS AND STORE THE CARRY IN A THIRD
; VARIABLE
CSEG    SEGMENT
        ASSUME CS:CSEG, DS:CSEG, SS:CSEG
        ORG 100h
START:  MOV AX, CS          ; Initialize data segment register from CS
        MOV DS, AX          ; (a COM file cannot hold the segment fixup needed by MOV AX, CSEG)
        MOV AL, NUM1        ; Get the first number
        ADD AL, NUM2        ; Add it to 2nd number
        MOV RESULT, AL      ; Store the result
        RCL AL, 01          ; Rotate carry into LSB
        AND AL, 00000001B   ; Mask out all but LSB
        MOV CARRY, AL       ; Store the carry result
        MOV AX, 4C00h       ; DOS function 4Ch: terminate program
        INT 21h             ; Call DOS
NUM1    DB 15h              ; First number stored here
NUM2    DB 20h              ; Second number stored here
RESULT  DB ?                ; Put sum here
CARRY   DB ?                ; Put any carry here
CSEG    ENDS
        END START

The ORG directive sets the location counter to offset 100h before any instruction is generated. A COM program takes up less space on disk than an EXE program. In spite of this, it is allocated all available RAM when loaded. COM programs require at least one full segment, because they automatically place their stack at the end of the segment.

EXE Programs

An EXE program is stored on disk with the extension EXE. EXE programs are longer than COM programs, because each EXE program has an EXE header followed by a load module containing the program itself. The EXE header is of fixed size, 256 bytes, and contains information that DOS uses to correctly calculate the addresses of segments and other components. We will not go into the details of these.

The load module consists of separate segments, which may be thought of as reserved areas for instructions, variables and the stack. An EXE program may contain up to 64K segments, although at most four segments may be active at any time. The segments may be of variable size, the maximum being 64K bytes.

Advantages of EXE programs are:

EXE programs are better suited to debugging.
EXE-format assembler programs are more easily converted into subroutines for high-level languages.
EXE programs are more easily relocatable, because there is no ORG statement forcing the program to be loaded from a specific address. This third advantage has to do with memory management: to make full use of a multitasking operating system, programs must be able to share computer memory and resources, and an EXE program is easily able to do this.

ASSEMBLER / MACRO PROCESSOR

INTRODUCTION

Computers have changed a lot since the days when people used to communicate with them through on and off switches denoting primitive instructions. With present day computers, interaction has become more user-friendly because of advances in hardware and software tools. One category of software that assists in the mechanics of software development is system software. Assemblers, linkers/loaders, compilers and operating systems all belong to the realm of system software. We have discussed several components of programming languages, the basic definitions of assemblers, compilers and interpreters, and the differences among them. In this unit our focus will be on the implementation and use of assemblers. We will also cover broadly the use of macro processors, loaders and linkers. This unit is organized as follows:

ASSEMBLER

Assembler Implementation

An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader (Figure 1).



Fig. 1: Assembler

For example, externally defined symbols (such as library programs) must be indicated to the loader: the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory and place the values of these symbols in the calling program. Here we will discuss the different approaches to the design of an assembler and its related programs.

Assembler and its related Program

The assembler-language program contains three kinds of entities. Absolute entities include operation codes, numeric and string constants and fixed addresses. The values of absolute entities are independent of which storage locations the resulting machine code will eventually occupy.

Relative entities include the addresses of instructions and of working storage. These are fixed only with respect to each other, and are normally stated relative to the address of the beginning of the module. An externally defined entity is used within a module but not defined within it. Whether it is absolute or relative is not necessarily known at the time the module is translated.

The object program includes identification of which addresses are relative, which symbols are defined externally, and which internally defined symbols are expected to be referenced externally. In the modules in which the latter are used, they are considered to be externally defined. These external references are resolved for two or more object programs by a linker. The linker accepts the several object programs as input and produces a single program ready for loading, hence termed a load module.

The load module is free of external references and consists essentially of machine-language code accompanied by a specification of which addresses are relative. When the actual main storage locations to be occupied by the program become known, a relocating loader reads the program into storage and adjusts the relative addresses to refer to those actual locations. The output from the loader is a machine-language program ready for execution. The overall process is depicted in Figure 3. If only a single source-language module containing no external references is translated, it can be loaded directly without intervention by the linker. In some programming systems the format of linker output is sufficiently compatible with that of its input to permit the linking of a previously produced load module with some new object modules.

The functions of linking and loading are sometimes both effected by a single program, called a linking loader. Despite the convenience of combining the linking and loading functions, it is important to realize that they are distinct functions, each of which can be performed independently of the other.



Fig. 3 : Program Translation

LOAD AND GO ASSEMBLER

The simplest assembler program is the load and go assembler. It accepts as input a program whose instructions are essentially in one-to-one correspondence with those of machine language, but with symbolic names used for operators and operands. It produces machine language as output, which is loaded directly into main memory and executed. The translation is usually performed in a single pass over the input program text. The resulting machine language program occupies storage locations which are fixed at the time of translation and cannot be changed subsequently. The program can call library subroutines, provided that they occupy locations other than those required by the program. No provision is made for combining separate subprograms translated in this manner.

The load and go assembler forgoes the advantages of modular program development. Among the most important of these are:

(1) the ability to design, code and test different program components in parallel;
(2) a change in one particular module does not require scanning the rest of the program.

Most assemblers are therefore designed to satisfy the desire to create programs in modules. These module assemblers are generally developed as two-pass translators. During the first pass the assembler examines the assembler-language program and collects the symbolic names into a table. During the second pass, the assembler generates code which is not quite in machine language. It is rather in a similar form, sometimes called "relocatable code" and here called object code. The program module in object-code form is typically called an object module.

ONE-PASS MODULE ASSEMBLER

The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of two-pass assemblers are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
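The text does not say which restrictions a one-pass assembler imposes. One commonly used approach, shown here only as an illustrative sketch, is to keep the whole object code in memory and patch each forward reference once its label has been seen. The tuple layout (label, first word, operand names) and the name one_pass are assumptions made for this example.

```python
def one_pass(program):
    """Assemble in a single pass, patching forward references afterwards.

    Each statement is (label, first_word, operand_names); first_word is the
    numeric opcode of an instruction or the value of a data word.
    """
    code = []       # object code, kept entirely in memory
    symbols = {}    # label -> address
    fixups = []     # (position in code, symbol name) still to be filled in
    for label, first_word, operand_names in program:
        if label:
            symbols[label] = len(code)
        code.append(first_word)
        for name in operand_names:
            if name in symbols:                 # backward reference: known now
                code.append(symbols[name])
            else:                               # forward reference: patch later
                fixups.append((len(code), name))
                code.append(None)
    for position, name in fixups:
        code[position] = symbols[name]          # KeyError means an undefined symbol
    return code

PROGRAM = [
    ("", 3, ["X"]),    # LOAD X   (X is defined further down)
    ("", 7, ["Y"]),    # STORE Y
    ("", 11, []),      # STOP
    ("X", 1, []),      # data word holding 1
    ("Y", 0, []),      # data word holding 0
]
object_code = one_pass(PROGRAM)
```

The restriction in this sketch is that the complete object code must fit in main storage until the end of the source text is reached.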

TWO PASS ASSEMBLER

Most assemblers are designed in two passes (stages); therefore, they are called two-pass assemblers. The pass-wise grouping of tasks in a two-pass assembler is given below:

Pass I

(1) Separate the symbols, mnemonic op-codes and operand fields.
(2) Determine the storage requirement of every assembly language statement and update the location counter.
(3) Build the symbol table (the table that is used to store each label and its corresponding value).

Pass II

(1) Generate object code.

FUNCTION

The program of Figure 4, although written in a hypothetical assembler language, contains the basic elements which need to be translated into machine language. (It is not essential for students to understand the meaning of each statement of the program.) For ease of reference, each instruction is identified by a line number, which is not part of the program. Each instruction in our language contains either an operation specification (lines 1-15) or a storage specification (lines 16-21). An operation specification is a symbolic operation code, which may be preceded by a label and must be followed by 0, 1, or 2 operand specifications, as appropriate to the operation. A storage specification is a symbolic instruction to the assembler. In our assembler language, it must be preceded by a label and must be followed, if appropriate, by a constant. Labels and operand specifications are symbolic addresses; every operand specification must appear somewhere in the program as a label.

Line  Label  Operation  Operand 1  Operand 2
 1           COPY       ZERO       OLDER
 2           COPY       ONE        OLD
 3           READ       LIMIT
 4           WRITE      OLD
 5    FRONT  LOAD       OLDER
 6           ADD        OLD
 7           STORE      NEW
 8           SUB        LIMIT
 9           JMPPOS     FINAL
10           WRITE      NEW
11           COPY       OLD        OLDER
12           COPY       NEW        OLD
13           JMP        FRONT
14    FINAL  WRITE      LIMIT
15           STOP
16    ZERO   CONST      0
17    ONE    CONST      1
18    OLDER  SPACE
19    OLD    SPACE
20    NEW    SPACE
21    LIMIT  SPACE

Figure 4: Sample Assembler-Language Program

Operation Code       Length   No. of
Symbolic   Machine   (words)  Operands  Action
ADD        02        2        1         ACC ← ACC + OPD1
JMP        00        2        1         Jump to OPD1
JMPNEG     05        2        1         Jump to OPD1 if ACC < 0
JMPPOS     01        2        1         Jump to OPD1 if ACC > 0
JMPZERO    04        2        1         Jump to OPD1 if ACC = 0
COPY       13        3        2         OPD2 ← OPD1
DIVIDE     10        2        1         ACC ← ACC / OPD1
LOAD       03        2        1         ACC ← OPD1
MULT       14        2        1         ACC ← ACC x OPD1
READ       12        2        1         OPD1 ← input stream
STOP       11        1        0         Stop execution
STORE      07        2        1         OPD1 ← ACC
SUB        06        2        1         ACC ← ACC - OPD1
WRITE      08        2        1         Output stream ← OPD1

Figure 5: Instruction Set

Our hypothetical machine has a single accumulator and a main storage of unspecified size. Its 14 instructions are listed in Figure 5. The first column shows the symbolic operation code and the second gives the machine-language equivalent (in decimal). The fourth column specifies the number of operands, and the last column describes the action which ensues when the instruction is executed. In that column "ACC", "OPD1", and "OPD2" refer to the contents of the accumulator, of the first operand location, and of the second operand location, respectively. The length of each instruction in words is 1 greater than the number of its operands.

Thus if the machine has 12-bit words, an ADD instruction is 2 words, or 24 bits, long. The table's third column, which is redundant, gives the instruction length. If our hypothetical computer had a fixed instruction length, the third and fourth columns could both be omitted.
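The length rule can be checked with a small Python sketch of the instruction set table. The dictionary layout and function names are illustrative assumptions; the JMPPOS entry (opcode 01) is inferred from the object code shown later for line 9 and should be read as an assumption too.

```python
# Instruction set of the hypothetical machine: mnemonic -> (opcode, operand count).
# JMPPOS is inferred from the generated object code, not stated outright in the text.
OPTABLE = {
    "ADD": (2, 1), "JMP": (0, 1), "JMPNEG": (5, 1), "JMPPOS": (1, 1),
    "JMPZERO": (4, 1), "COPY": (13, 2), "DIVIDE": (10, 1), "LOAD": (3, 1),
    "MULT": (14, 1), "READ": (12, 1), "STOP": (11, 0), "STORE": (7, 1),
    "SUB": (6, 1), "WRITE": (8, 1),
}

def instruction_length(mnemonic):
    """Length in words: one word for the opcode plus one word per operand."""
    _opcode, operand_count = OPTABLE[mnemonic]
    return 1 + operand_count

def length_in_bits(mnemonic, word_size=12):
    """Length in bits for a machine with the given word size."""
    return instruction_length(mnemonic) * word_size
```

Because the length follows from the operand count, a table like this does not need to store the length column separately.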

The storage specification SPACE reserves one word of storage which presumably will eventually hold a number; there is no operand. The storage specification CONST also reserves a word of storage; it has an operand which is the value of a number to be placed in that word by the assembler.

The instructions of the program are presented in four fields, and might indeed be constrained to such a format on the input medium. The label, if present, occupies the first field. The second field contains the symbolic operation code or storage specification, which will henceforth be referred to simply as the operation. The third and fourth fields hold the operand specifications, or simply operands, if present.

Although it is not at all important to our discussion to understand what the example program does, the foregoing specifications of the machine and of its assembler language reveal the algorithm. The program simply computes the so-called Fibonacci numbers (0, 1, 1, 2, 3, 5, 8, ...). This program is also written in the BASIC programming language in Unit 1 of Course 2.

Now that we have seen the elements of an assembler-language program, we can ask what functions the assembler must perform in translating it. Here is the list:

(1) Replace symbolic addresses by numeric addresses.
(2) Replace symbolic operation codes by machine operation codes.
(3) Reserve storage for instructions and data.
(4) Translate constants into machine representation.

The assignment of numeric addresses can be performed without prior knowledge of what actual locations will eventually be occupied by the assembled program. It is necessary only to generate addresses relative to the start of the program. We shall assume that our assembler normally assigns addresses starting at 0. In translating line 1 of our example program, the resulting machine instruction will therefore be assigned address 0 and occupy 3 words, because COPY instructions are 3 words long. Hence the instruction corresponding to line 2 will be assigned address 3, the READ instruction will be assigned address 6, the WRITE instruction of line 4 will be assigned address 8, and so on to the end of the program. But what addresses will be assigned to the operands named ZERO and OLDER? These addresses must be inserted in the machine-language representation of the first instruction.

IMPLEMENTATION

The assembler uses a counter to keep track of machine-language addresses. Because these addresses will ultimately specify locations in main storage, the counter is called the location counter. Before assembly, the location counter is initialized to zero. After each source line has been examined on the first pass, the location counter is incremented by the length of the machine-language code which will ultimately be generated to correspond to that source line.

When the assembler first encounters line 1 of the example program, it cannot replace the symbols ZERO and OLDER by addresses, because those symbols make forward references to source-language program lines not yet reached by the assembler. The most straightforward way to cope with the problem of forward references is to examine the entire program text once, before attempting to complete the translation. During that examination, the assembler determines the address which corresponds to each symbol, and places both the symbols and their addresses in a symbol table. This is possible because each symbol used in an operand field must also appear as a label. The address corresponding to a label is just the address of the line which that label begins, that is, the value of the location counter when the line is reached. Creation of the symbol table requires one pass over the source text. During a second pass, the assembler uses the addresses collected in the symbol table to perform the translation.
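The first pass can be sketched in Python as follows. This is a minimal illustration, not the text's own implementation: the names parse, statement_length and pass_one are assumptions, and the source listing writes lines 8 and 9 as SUB and JMPPOS and line 17 as CONST 1, chosen to agree with the instruction set and the generated object code.

```python
# The sample program, one statement per line; a line starting with a blank has no label.
SOURCE = """\
       COPY   ZERO OLDER
       COPY   ONE OLD
       READ   LIMIT
       WRITE  OLD
FRONT  LOAD   OLDER
       ADD    OLD
       STORE  NEW
       SUB    LIMIT
       JMPPOS FINAL
       WRITE  NEW
       COPY   OLD OLDER
       COPY   NEW OLD
       JMP    FRONT
FINAL  WRITE  LIMIT
       STOP
ZERO   CONST  0
ONE    CONST  1
OLDER  SPACE
OLD    SPACE
NEW    SPACE
LIMIT  SPACE
"""

def parse(source):
    """Yield (label, operation, operands) for each source line."""
    for line in source.splitlines():
        fields = line.split()
        label = "" if line[0].isspace() else fields.pop(0)
        yield label, fields[0], fields[1:]

def statement_length(operation, operands):
    """CONST and SPACE occupy one word; an instruction occupies 1 + its operand count."""
    return 1 if operation in ("CONST", "SPACE") else 1 + len(operands)

def pass_one(source, start=0):
    """Build the symbol table and record the address assigned to each line."""
    symbols, addresses, location = {}, [], start
    for label, operation, operands in parse(source):
        if label:
            symbols[label] = location   # label value = current location counter
        addresses.append(location)
        location += statement_length(operation, operands)
    return symbols, addresses, location

symbols, addresses, final_location = pass_one(SOURCE)
```

Calling pass_one(SOURCE, start=200) gives the same table with every address raised by 200.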

As each symbolic address is encountered in the second pass, the corresponding numeric address is substituted for it in the object code. Two of the most common logical errors in assembler-language programming involve improper use of symbols. If a symbol appears in the operand field of some instruction, but nowhere in a label field, it is undefined. If a symbol appears in the label fields of more than one instruction, it is multiply defined.

In building the symbol table on the first pass, the assembler must examine the label field of each instruction to permit it to associate the location counter value with each symbol. Multiply-defined symbols will be found on this pass. Undefined symbols, on the other hand, will not be found on the first pass unless the assembler also examines operand fields for symbols. Although this examination is not required for construction of the symbol table, normal practice is to perform it anyhow, because of its value in early detection of program errors. There are many ways to organize a symbol table. The organisation of a symbol table will not be discussed in this Unit.
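The two symbol errors can be detected in a single scan, as in the following sketch. The statement format (label, operation, operands) and the name check_symbols are assumptions for illustration; numeric operands are treated as constants and not as symbols.

```python
def check_symbols(program):
    """Return (undefined symbols, multiply-defined symbols), each sorted."""
    defined, multiply_defined, used = set(), set(), set()
    for label, _operation, operands in program:
        if label:
            if label in defined:
                multiply_defined.add(label)   # second appearance in a label field
            defined.add(label)
        # Examine operand fields as well, so undefined symbols are found early.
        used.update(name for name in operands if not name.lstrip("-").isdigit())
    return sorted(used - defined), sorted(multiply_defined)

FAULTY = [
    ("LOOP", "LOAD", ["X"]),
    ("", "ADD", ["Y"]),          # Y never appears as a label
    ("LOOP", "STORE", ["X"]),    # LOOP appears in two label fields
    ("X", "SPACE", []),
]
undefined, duplicated = check_symbols(FAULTY)
```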

The state of processing after line 3 is shown in Figure 6. During processing of line 1, the symbols ZERO and OLDER were encountered and entered into the first two positions of the symbol table. The operation COPY was identified, and instruction length information from Figure 5 was used to advance the location counter from 0 to 3. During processing of line 2, two more symbols were encountered and entered in the symbol table, and the location counter was advanced from 3 to 6. Line 3 yielded the fifth symbol, LIMIT, and caused the location counter to be incremented from 6 to 8. At this point the symbol table holds five symbols, none of which yet has an address. The location counter holds the address 8, and processing is ready to continue from line 4. Neither the line numbers nor the addresses shown in part (a) of the figure are actually part of the source-language program. The addresses record the history of incrementation of the location counter; the line numbers permit easy reference. Clearly, the assembler needs not only a location counter, but also a line counter to keep track of which source line is being processed.

Line  Address  Label  Operation  Operand 1  Operand 2
1     0               COPY       ZERO       OLDER
2     3               COPY       ONE        OLD
3     6               READ       LIMIT

(a) Source text scanned

Symbol  Address
ZERO    --
OLDER   --
ONE     --
OLD     --
LIMIT   --

Location counter: 8
Line counter: 4

(b) Symbol table; counters

Figure 6: First Pass After Scanning Line 3

During processing of line 4 the symbol OLD is encountered for the second time. Because it is already in the symbol table, it is not entered again. During processing of line 5, the symbol FRONT is encountered in the label field. It is entered into the symbol table, and the current location counter value, 10, is entered with it as its address. Figure 7 displays the state of the translation after the last line has been processed, that is, at the end of the first pass.

Line  Address  Label  Operation  Operand 1  Operand 2
 1     0              COPY       ZERO       OLDER
 2     3              COPY       ONE        OLD
 3     6              READ       LIMIT
 4     8              WRITE      OLD
 5    10       FRONT  LOAD       OLDER
 6    12              ADD        OLD
 7    14              STORE      NEW
 8    16              SUB        LIMIT
 9    18              JMPPOS     FINAL
10    20              WRITE      NEW
11    22              COPY       OLD        OLDER
12    25              COPY       NEW        OLD
13    28              JMP        FRONT
14    30       FINAL  WRITE      LIMIT
15    32              STOP
16    33       ZERO   CONST      0
17    34       ONE    CONST      1
18    35       OLDER  SPACE
19    36       OLD    SPACE
20    37       NEW    SPACE
21    38       LIMIT  SPACE

(a) Source text scanned

Symbol  Address
ZERO    33
OLDER   35
ONE     34
OLD     36
LIMIT   38
FRONT   10
NEW     37
FINAL   30

Location counter: 39
Line counter: 22

(b) Symbol table; counters

Figure 7

The XX entries in the object code (Figure 8) can be thought of as a specification to the loader, which will eventually process the object code, that the content of the corresponding location, for example address 35, does not need to have any specific value loaded. The loader can then just skip over that location. Some assemblers specify a particular value for reserved storage locations anyway, often zeros. There is no logical requirement to do so, however, and the user unfamiliar with his assembler is ill-advised to count on a particular value.

Address  Length  Machine Code
00       3       13 33 35
03       3       13 34 36
06       2       12 38
08       2       08 36
10       2       03 35
12       2       02 36
14       2       07 37
16       2       06 38
18       2       01 30
20       2       08 37
22       3       13 36 35
25       3       13 37 36
28       2       00 10
30       2       08 38
32       1       11
33       1       00
34       1       01
35       1       XX
36       1       XX
37       1       XX
38       1       XX

Figure 8: Object Code Generated on 2nd Pass
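The second pass can be sketched in Python as below, taking the symbol table of the first pass as given. The names pass_two, OPCODES and STATEMENTS are illustrative assumptions, labels are omitted because the second pass does not need them, and a reserved word (XX) is represented by None.

```python
# Symbol table produced by the first pass (see Figure 7).
SYMBOLS = {"ZERO": 33, "OLDER": 35, "ONE": 34, "OLD": 36,
           "LIMIT": 38, "FRONT": 10, "NEW": 37, "FINAL": 30}

# Machine operation codes used by the sample program (decimal).
OPCODES = {"COPY": 13, "READ": 12, "WRITE": 8, "LOAD": 3, "ADD": 2,
           "STORE": 7, "SUB": 6, "JMPPOS": 1, "JMP": 0, "STOP": 11}

# Operation and operand fields of the 21 source lines, in order.
STATEMENTS = """COPY ZERO OLDER
COPY ONE OLD
READ LIMIT
WRITE OLD
LOAD OLDER
ADD OLD
STORE NEW
SUB LIMIT
JMPPOS FINAL
WRITE NEW
COPY OLD OLDER
COPY NEW OLD
JMP FRONT
WRITE LIMIT
STOP
CONST 0
CONST 1
SPACE
SPACE
SPACE
SPACE""".splitlines()

def pass_two(statements, symbols, start=0):
    """Return object code as a list of (address, length, words) records."""
    records, location = [], start
    for statement in statements:
        operation, *operands = statement.split()
        if operation == "CONST":
            words = [int(operands[0])]     # constant translated to a number
        elif operation == "SPACE":
            words = [None]                 # XX: reserved, no value to load
        else:
            words = [OPCODES[operation]] + [symbols[name] for name in operands]
        records.append((location, len(words), words))
        location += len(words)
    return records

records = pass_two(STATEMENTS, SYMBOLS)
```

Each record corresponds to one row of the object code table: address, length in words, and the machine code words themselves.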

The specifications CONST and SPACE do not correspond to machine instructions. They are really instructions to the assembler program. Because of this, we shall refer to them as assembler instructions. Another common designation for them is pseudo-instructions. Neither term is really satisfactory. Of the two types of assembler instructions in our example program, one results in the generation of machine code and the other in the reservation of storage. Later we shall see assembler instructions which result in neither of these actions. The assembler must be able to tell assembler instructions apart from machine operations. One organization is to use a separate table, which is usually searched before the operation code table is searched. Another is to include both machine operations and assembler instructions in the same table. A field in the table entry then identifies the type to the assembler.
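The second organization, a single table with a type field, might look like the following sketch. The field names kind, opcode and action, and the function classify, are assumptions made for illustration only.

```python
# One table for both machine operations and assembler instructions.
# The "kind" field tells the assembler which type of entry it has found.
OPERATIONS = {
    "LOAD":  {"kind": "machine", "opcode": 3, "length": 2},
    "STOP":  {"kind": "machine", "opcode": 11, "length": 1},
    "CONST": {"kind": "assembler", "action": "generates code", "length": 1},
    "SPACE": {"kind": "assembler", "action": "reserves storage", "length": 1},
}

def classify(operation):
    """Describe how the assembler would treat the given operation field."""
    entry = OPERATIONS.get(operation)
    if entry is None:
        return "unknown operation"
    if entry["kind"] == "machine":
        return f"machine instruction, opcode {entry['opcode']:02d}"
    return f"assembler instruction: {entry['action']}"
```

With a single table, one lookup per source line is enough; the separate-table organization needs up to two lookups but keeps the machine operation table unchanged.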

A few variations on the foregoing process can be considered. Some of the translation can actually be performed during the first pass. Operation fields must be examined during the first pass to determine their effect on the location counter. The second-pass table lookup to determine the machine operation code can be obviated at the cost of producing intermediate text which holds the machine operation code and instruction length in addition to the source text.

Another translation which can be performed during the first pass is that of constants, e.g. from source-language decimal to machine-language binary. The translation of any symbolic addresses which refer backward in the text, rather than forward, could be performed on the first pass, but it is more convenient to wait for the second pass and treat all symbolic addresses uniformly.

A minor variation is to assemble addresses relative to a starting address other than 0. The location counter is merely initialized to the desired address. If, for example, the value 200 is chosen, the symbol table would appear as in Figure 9. The object code corresponding to line 1 would be 200 3 13 233 235.

Symbol  Address
ZERO    233
OLDER   235
ONE     234
OLD     236
LIMIT   238
FRONT   210
NEW     237
FINAL   230

Figure 9: Symbol Table with Starting Location 200

If it were known at assembly time that the program is to reside at location 200 for execution, then full object code with address and length need not be generated. The machine code alone would suffice. In this event the result of translation would be the following 39-word sequence.

13  233 235 13  234 236 12  238 08
236 03  235 02  236 07  237 06  238
01  230 08  237 13  236 235 13  237
236 00  210 08  238 11  00  01  XX
XX  XX  XX

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.

Macro Definition and Usage

The following example highlights the salient aspects of a macro processor. It is very similar to the assembly language of Intel's 8-bit microprocessors.

Example:

MACRO
INCRMT  &A, &B
LOAD    &A              Macro definition
ADD     &B
STORE   &A
ENDM

INCRMT  X, Y            Macro call

LOAD    X               Macro expansion
ADD     Y
STORE   X

Figure 10

A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus, a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition units will exist at the start of the program. Each definition unit contains a new operation and defines it to consist of a sequence of assembly language statements.

In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence. The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In Figure 10, the call

INCRMT X,Y

is shown to lead to insertion of the assembly statements

LOAD X
ADD Y
STORE X

in its place. All macro calls in a program are expanded in this fashion.

DEFINING A MACRO

Let us take another look at the macro definition unit appearing in Figure 10. The macro header statement (MACRO) indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.

The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.

Positional Parameters

The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character '&'. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character '&'. These are known as actual parameters.

The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In Figure 10, this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first formal parameter, and so on.

Consider the prototype and macro call statements once again.

INCRMT &A,&B ... prototype
INCRMT X,Y ... macro call

We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:

LOAD X
ADD Y
STORE X
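Positional substitution can be sketched in a few lines of Python. The function names split_statement and expand_call are assumptions for illustration, and the sketch handles only positional parameters with no labels on the model statements.

```python
def split_statement(statement):
    """Split 'INCRMT &A, &B' into ('INCRMT', ['&A', '&B'])."""
    name, _, rest = statement.strip().partition(" ")
    operands = [field.strip() for field in rest.split(",") if field.strip()]
    return name, operands

def expand_call(prototype, model_statements, call):
    """Expand one macro call by pairing formal and actual parameters by position."""
    macro_name, formals = split_statement(prototype)
    call_name, actuals = split_statement(call)
    if call_name != macro_name or len(actuals) != len(formals):
        raise ValueError("call does not match prototype")
    binding = dict(zip(formals, actuals))       # &A -> X, &B -> Y
    expanded = []
    for statement in model_statements:
        operation, operands = split_statement(statement)
        operands = [binding.get(operand, operand) for operand in operands]
        expanded.append(f"{operation} {','.join(operands)}".strip())
    return expanded

expansion = expand_call(
    "INCRMT &A, &B",
    ["LOAD &A", "ADD &B", "STORE &A"],
    "INCRMT X,Y",
)
```

Here expansion holds the three statements LOAD X, ADD Y and STORE X, in that order.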



Schematics for Macro-Expansion

Above we touched upon the fundamental aspects of macro expansion. From the discussion, it appears that the process of macro expansion is similar to language translation. The source program containing macro definitions and calls is translated into an assembly language program without any macro definitions or calls. This form of the program can now be handed over to a conventional assembler to obtain the target language form of the program.

In such a schematic (Figure 12), the process of macro expansion is completely segregated from the process of assembling the program. The translator which performs macro expansion in this manner is called a macro pre-processor. The advantage of this scheme is that any existing conventional assembler can be enhanced in this manner to incorporate macro processing. It would reduce the programming cost involved in making a macro facility available to programmers using a computer system. The disadvantage is that this scheme is probably not very efficient, because of the time spent in generating assembly language statements and processing them again for the purpose of translation to the target language.

Fig. 12 : A pre-processor based scheme for macro assembly

ISSUES RELATED TO THE DESIGN OF A MACRO PRE-PROCESSOR

As against this schematic of prefixing a conventional assembler with a macro pre-processor, it is possible to design a macro assembler which not only processes macro definitions and macro calls for the purpose of expansion, but also assembles the expanded statements along with the original assembly statements. The macro assembler should require fewer passes over the program than the pre-processor scheme. This holds out a promise of better efficiency. For the sake of simplicity, however, in this section we will discuss the issues related to the implementation of a macro pre-processor rather than an actual implementation.

Our discussion regarding the definition and use of macros in an assembly program has brought out to some extent the working principles of a macro pre-processor. To summarise, we should be able to differentiate between macro names and invalid operation code mnemonics. On thus recognizing a call on a macro, we should be able to access the text of its definition so that we can expand the call. For generating a statement during expansion, we need to develop a simple scheme for substituting each appearance of a formal parameter with its value. Correspondence between a formal parameter and its value will have to be established for this purpose. It is desirable that, instead of performing this action for every appearance of a formal parameter, correspondence between formal parameters and their values should be established once and for all, at the start of macro expansion.

Considerations of positional and keyword correspondence would thus be localized to the start of macro expansion only. This would have the further advantage that no distinction would need to be made between keyword and positional parameters during macro expansion.

Step 1:

Scan all macro definitions one by one. For each macro defined:
(a) enter its name in the Macro Name Table (MNT);
(b) store the entire macro definition in the Macro Definition Table (MDT);
(c) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT.

Step 2:

Examine all statements in the assembly source program to detect macro calls. For each macro call:
(a) locate the macro in the MNT;
(b) obtain information from the MNT regarding the position of the macro definition in the MDT;
(c) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters);
(d) expand the macro call by following the procedure given in Step 3.

Step 3:

Process the statements in the macro definition as found in the MDT in their expansion time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables.

In order to have a complete working scheme within the above framework, we need to finalise the following details:

(1) Method of establishing correspondence between a formal parameter and its value.
(2) Method of sequencing through the statements comprising a macro definition in expansion time order.
(3) Method of expanding a model statement.
(4) Allocation of storage for expansion time variables and access to their values during expansion.
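The three steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions: all macro definitions come first, statements carry no labels, parameters are positional only, and AIF, AGO and expansion time variables are not handled. The names collect_definitions, expand_calls and split_statement are hypothetical.

```python
def split_statement(statement):
    """Split 'INCRMT &A, &B' into ('INCRMT', ['&A', '&B'])."""
    fields = statement.split(None, 1)
    operands = [p.strip() for p in fields[1].split(",")] if len(fields) > 1 else []
    return fields[0], operands

def collect_definitions(lines):
    """Step 1: fill the MNT and MDT; return the statements after the definitions."""
    mnt, mdt = {}, []
    position = 0
    while position < len(lines) and lines[position].strip() == "MACRO":
        name, formals = split_statement(lines[position + 1])   # prototype
        mnt[name] = (len(mdt), formals)     # where the definition starts in the MDT
        position += 2
        while lines[position].strip() != "ENDM":
            mdt.append(lines[position].strip())
            position += 1
        mdt.append("ENDM")
        position += 1
    return mnt, mdt, lines[position:]

def expand_calls(mnt, mdt, statements):
    """Steps 2 and 3: replace each macro call by its model statements."""
    output = []
    for statement in statements:
        if not statement.strip():
            continue
        name, actuals = split_statement(statement)
        if name not in mnt:
            output.append(statement.strip())        # ordinary assembly statement
            continue
        index, formals = mnt[name]
        binding = dict(zip(formals, actuals))       # established once per call
        while mdt[index] != "ENDM":
            operation, operands = split_statement(mdt[index])
            operands = [binding.get(o, o) for o in operands]
            output.append(f"{operation} {','.join(operands)}".strip())
            index += 1
    return output

SOURCE_LINES = ["MACRO", "INCRMT &A, &B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
                "INCRMT X,Y", "INCRMT P,Q", "STOP"]
mnt, mdt, rest = collect_definitions(SOURCE_LINES)
expanded = expand_calls(mnt, mdt, rest)
```

The binding dictionary is built once at the start of each expansion, which matches the aim of establishing the parameter correspondence once rather than at every appearance of a formal parameter.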

COMPILER/ LINKER LOADER

LOADERS

INTRODUCTION

The purpose of this section is to discuss the various functions of a loader. The loader is a program which accepts object code and prepares it for execution. Object code produced by an assembler/compiler cannot be executed without modification. As many as four more functions must be performed first. These functions are performed by a loader. They are:

(1) Allocation of space in main memory for the programs.
(2) Linking of programs with each other, for example with library programs.
(3) Adjustment of all address-dependent locations, such as address constants, to correspond to the allocated space. This is also called relocation.
(4) Physically loading the machine instructions and data into memory.

Figure 1 shows the function of a loader.

Fig. 1: Function of a loader.

Let us examine the need for some of these functions of the loader.

Linking



The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a 'stand-alone' nature. That is, a program generally cannot execute on its own, without requiring the presence of some other programs in the computer's memory.

For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf(), scanf() etc., which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
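The linking function can be sketched in Python as below. The module records, their field names, and the lengths and offsets used are hypothetical values chosen for illustration; a real object module format carries more information.

```python
def link(modules):
    """Assign each module an origin and resolve every external reference.

    Each module is a dict with: name, length (words), defines (symbol ->
    offset within the module) and uses (external symbols it refers to).
    """
    origins, global_symbols, origin = {}, {}, 0
    for module in modules:
        origins[module["name"]] = origin
        for symbol, offset in module["defines"].items():
            global_symbols[symbol] = origin + offset   # address known to all modules
        origin += module["length"]
    resolved = {}
    for module in modules:
        for symbol in module["uses"]:
            if symbol not in global_symbols:
                raise KeyError(f"unresolved external symbol: {symbol}")
        resolved[module["name"]] = {s: global_symbols[s] for s in module["uses"]}
    return origins, resolved

MODULES = [
    {"name": "A", "length": 100, "defines": {"main": 0}, "uses": ["printf"]},
    {"name": "LIBIO", "length": 50, "defines": {"printf": 0, "scanf": 20}, "uses": []},
]
origins, resolved = link(MODULES)
```

After linking, program A knows that printf starts at address 100, so a call in A can transfer control there during execution.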

RELOCATION

Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.

If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus, A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage locations. Therefore, the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste. It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.

The task of relocation is to add some constant value to each relative address in the segment. (A segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called a relocating loader.
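Relocation as described here, adding a constant to each relative address, can be sketched as follows. The function name relocate and the way relative positions are listed are assumptions; the words are the first three instructions of the sample assembler program as assembled at origin 0.

```python
def relocate(words, relative_positions, load_address):
    """Return a copy of words with load_address added to every relative address."""
    relocated = list(words)
    for position in relative_positions:
        relocated[position] += load_address
    return relocated

# COPY ZERO,OLDER -> 13 33 35 ; COPY ONE,OLD -> 13 34 36 ; READ LIMIT -> 12 38
WORDS = [13, 33, 35, 13, 34, 36, 12, 38]
RELATIVE = [1, 2, 4, 5, 7]     # positions holding addresses; opcodes are absolute

loaded_at_200 = relocate(WORDS, RELATIVE, 200)
```

Only the address words change; the operation codes 13 and 12 are absolute entities and stay as they are.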

LOADER SCHEMES

There are several schemes for accomplishing the four loading functions. These schemes include (i) Absolute Loader, (ii) Relocating Loader, (iii) Direct Linking Loader, (iv) Dynamic Loading and (v) Dynamic Linking.

Absolute Loader: The task of an absolute loader is virtually trivial. The loader simply accepts the machine language code produced by the assembler and places it into main memory at the location specified by the assembler.
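As a rough illustration of how little an absolute loader has to do, the sketch below copies each block of code to the address recorded with it. The record format (a start address paired with a list of words), the function name `absolute_load` and the tiny 16-word memory are assumptions made for this example only.

```python
def absolute_load(memory, object_program):
    """Copy each (start_address, words) record into `memory` at its stated address."""
    for start, words in object_program:
        end = start + len(words)
        if start < 0 or end > len(memory):
            raise ValueError(f"record at {start} does not fit in memory")
        memory[start:end] = words     # no relocation, no linking: just placement
    return memory


memory = [0] * 16                     # hypothetical 16-word main memory
absolute_load(memory, [(4, [11, 22, 33]), (10, [44])])
print(memory)
# prints [0, 0, 0, 0, 11, 22, 33, 0, 0, 0, 44, 0, 0, 0, 0, 0]
```

The loader makes no decisions of its own here: every address was fixed before loading, which is why the scheme is simple but inflexible.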

Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced. The assembler output that a relocating loader accepts consists of the object program and information about all other programs it references. In addition, there is relocation information identifying the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.

Direct Linking Loader: It is a general relocatable loader, and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving the programmer complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs. The other two loader schemes are discussed in the sections that follow.

Dynamic Loading And Linking: There are numerous variations on the previously presented loader schemes. One disadvantage of the direct-linking loader, as presented, is that it is necessary to allocate, relocate, link and load all of the subroutines each time in order to execute a program. Since there may be tens and often hundreds of subroutines involved, especially when we include utility routines such as SQRT, this loading process can be extremely time-consuming.

Furthermore, even though the loader program may be smaller than the assembler, it does absorb a considerable amount of space. These problems can be solved by dividing the loading process into two separate programs: a binder and a module loader. A binder is a program that performs the same functions as the direct-linking loader in binding subroutines together, but rather than placing the relocated and linked text directly into memory, it outputs the text as a file. This output file is in a format ready to be loaded and is typically called a load module. The module loader merely has to physically load the module into main memory. The binder essentially performs the functions of allocation, relocation and linking; the module loader merely performs the function of loading.

There are two major classes of binders. The simplest type produces a load module that looks very much like a single absolute loader file. This means that the specific memory allocation of the program is performed at the time that the subroutines are bound together. A more sophisticated binder, called a linkage editor, can keep track of the relocation information so that the resulting load module can be further relocated and thereby loaded anywhere in memory. In this case the module loader must perform additional allocation and relocation as well as loading, but it does not have to worry about the complex problems of linking.

In both cases, a program that is to be used repeatedly need only be bound once and then can be loaded whenever required. The first binder is relatively simple and fast. The second one (the linkage editor) is somewhat more complex but allows a more flexible allocation and loading scheme.
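The division of labour between a binder and a module loader can be sketched as follows. This is a simplified model of the first (simplest) class of binder, not a description of any real tool: the module format, the names `bind` and `module_load`, and the sample modules MAIN and SQRT are all assumptions made for illustration, and external references are resolved only to the start of another module.

```python
def bind(modules):
    """Allocate, relocate and link `modules` into one absolute load module."""
    # Allocation: assign each module a base address within the load module.
    bases, next_free = {}, 0
    for name, module in modules.items():
        bases[name] = next_free
        next_free += len(module["words"])

    load_module = []
    for name, module in modules.items():
        words = list(module["words"])
        # Relocation: adjust addresses that are relative to the module's own start.
        for index in module.get("relocate", []):
            words[index] += bases[name]
        # Linking: fill in the addresses of other modules that are referenced.
        for index, target in module.get("externals", {}).items():
            words[index] = bases[target]
        load_module.extend(words)
    return load_module


def module_load(memory, load_module):
    """Loading only: copy an already bound load module into memory."""
    memory[0:len(load_module)] = load_module
    return memory


modules = {
    "MAIN": {"words": [10, 0, 20, 2], "relocate": [3], "externals": {1: "SQRT"}},
    "SQRT": {"words": [30, 1], "relocate": [1]},
}
load_module = bind(modules)            # done once, result could be kept in a file
print(module_load([0] * 8, load_module))
# prints [10, 4, 20, 2, 30, 5, 0, 0]
```

Because `bind` is run once and its output reused, the repeated cost at execution time is only the copy performed by `module_load`.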

Dynamic Loading

In each of the previous loader schemes we have assumed that all of the subroutines needed are loaded into main memory at the same time. If the total amount of memory required by all these subroutines exceeds the amount available, as is common with large programs on small computers, there is trouble. There are several hardware techniques, such as paging and segmentation, that attempt to solve this problem.

Usually the subroutines of a program are needed at different times: for example, pass 1 and pass 2 of an assembler are mutually exclusive (pass 1 and pass 2 need not simultaneously occupy memory). By explicitly recognizing which subroutines call other subroutines, it is possible to produce an overlay structure that identifies mutually exclusive subroutines.

Figure 2 illustrates a program consisting of five subprograms (A, B, C, D and E) that require 100K bytes of memory. The arrows indicate that subprogram A only calls B, D and E; subprogram B only calls C and E; subprogram D only calls E; and subprograms C and E do not call any other routines. Figure 2(a) highlights the interdependencies between the procedures. Note that procedures B and D are never in use at the same time; neither are C and E. If we load only those procedures that are actually to be used at any particular time, the amount of memory needed is equal to the longest path of the overlay structure. This happens to be 70K for the example in Figure 2(b): procedures A, B and C. Figure 2(c) illustrates a storage assignment for each procedure consistent with the overlay structure.

In order for the overlay structure to work, it is necessary for the module loader to load the various procedures as they are needed. We will not go into the specific details, but there are many binders capable of processing and allocating an overlay structure. The portion of the loader that actually intercepts the calls and loads the necessary procedure is called the overlay supervisor or simply the flipper. This overall scheme is called dynamic loading or load-on-call.

Figure 2 ( A )

Figure 2 ( B )

Figure 2 ( C )



Figure 2 ( D )

Fig. 2 : Dynamic Loading
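The "longest path" rule for the overlay structure can be checked with a short computation. The call relationships below are the ones stated in the text; the individual procedure sizes are not given in the text, so the values used here are assumed purely for illustration (chosen so that they total 100K). The function name `overlay_memory` is likewise hypothetical.

```python
def overlay_memory(sizes, calls, root):
    """Memory needed when only one root-to-leaf path of the overlay is resident."""
    callees = calls.get(root, [])
    deepest = max((overlay_memory(sizes, calls, c) for c in callees), default=0)
    return sizes[root] + deepest


# Who calls whom, as described in the text.
calls = {"A": ["B", "D", "E"], "B": ["C", "E"], "D": ["E"]}
# Sizes in K bytes: assumed values, not taken from the figure.
sizes = {"A": 20, "B": 20, "C": 30, "D": 10, "E": 20}

print(sum(sizes.values()))                 # prints 100 (everything resident at once)
print(overlay_memory(sizes, calls, "A"))   # prints 70  (path A, B, C)
```

With these assumed sizes the path A, B, C is the longest, so the overlay needs 70K rather than the 100K required to keep all five procedures in memory together.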

DYNAMIC LINKING

The major disadvantage of all of the previous loading schemes is that if a subroutine is referenced but never executed (e.g. if the programmer had placed a call statement in the program but this statement was never executed because a condition was not satisfied), the loader would still incur the overhead of linking the subroutine.

Furthermore, all of these schemes require the programmer to explicitly name all procedures that might be called. A very general type of loading scheme is called dynamic linking. This is a mechanism by which loading and linking of external references are postponed until execution time. The loader loads only the main program. If the main program should execute a transfer instruction to an external address, or should reference an external variable (that is, a variable that has not been defined in this procedure segment), the loader is called. Only then is the segment containing the external reference loaded. An advantage here is that no overhead is incurred unless the procedure to be called or referenced is actually used. A further advantage is that the system can be dynamically reconfigured. The major drawback to using this type of loading scheme is the considerable overhead and complexity incurred, due to the fact that we have postponed most of the binding process until execution time.
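The idea of postponing loading until the first reference can be imitated in a few lines. The sketch below is an analogy rather than a real loader: the class name `DynamicLinker`, the `library` table that stands in for segments on disk, and the counter used to observe loads are all assumptions introduced for this example.

```python
import math


class DynamicLinker:
    """Loads a routine only when it is first called (a toy model)."""

    def __init__(self, library):
        self.library = library    # name -> function that "loads" the segment
        self.loaded = {}          # segments brought into memory so far
        self.load_count = 0       # how many times the loader was invoked

    def call(self, name, *args):
        if name not in self.loaded:                 # first reference: call the loader
            self.loaded[name] = self.library[name]()
            self.load_count += 1
        return self.loaded[name](*args)             # later calls go straight through


linker = DynamicLinker({
    "sqrt": lambda: math.sqrt,    # stands in for fetching the SQRT segment
    "unused": lambda: print,      # referenced in the program but never executed
})
print(linker.call("sqrt", 16.0))  # prints 4.0
print(linker.call("sqrt", 25.0))  # prints 5.0
print(linker.load_count)          # prints 1
```

The routine named "unused" is never loaded, which corresponds to the advantage noted above; the extra check on every call hints at the run-time overhead that is the scheme's main drawback.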

Now we will discuss the implementation of the simplest type of loader scheme, which is called an absolute loader.

Implementation of an Absolute Loader

Absolute loaders are simple to implement but they do have disadvantages. First, the programmer must specify to the assembler the address in memory where the program is to be loaded. Further, if there are multiple functions to be called within a program, the programmer must remember the address of each and use that absolute address explicitly in the other functions to perform linking. Figure 3 illustrates the operation of an absolute loader. The programmer must be careful not to assign two functions to the same or overlapping addresses.



Figure 3 : Absolute Loader

The program first.c is assigned to locations 100-300 and the sqrt function is assigned to locations 400-450. If changes were made to first.c that increased its length to more than 300 bytes, the end of first.c (at 100 + 300 = 400) would overlap the start of sqrt (at 400). It would then be necessary to assign sqrt to a new address. Furthermore, it would also be necessary to modify all other functions that referred to sqrt. In situations where dozens of subroutines are being used, this manual shuffling can get very complex, tedious and wasteful of time and memory.
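The bookkeeping that the programmer has to do by hand under absolute loading amounts to checking that no two address ranges collide. A minimal sketch of that check is given below; the function name `find_overlaps` and the representation of each assignment as a (start, length) pair are assumptions made for this example, while the addresses follow the first.c and sqrt case above.

```python
def find_overlaps(assignments):
    """Return pairs of names whose [start, start + length) ranges overlap."""
    items = sorted(assignments.items(), key=lambda item: item[1][0])
    overlaps = []
    for i, (name_a, (start_a, length_a)) in enumerate(items):
        for name_b, (start_b, _) in items[i + 1:]:
            if start_b < start_a + length_a:      # b begins before a has ended
                overlaps.append((name_a, name_b))
    return overlaps


# first.c starts at 100 and is 300 bytes long, so it ends just before 400.
print(find_overlaps({"first.c": (100, 300), "sqrt": (400, 50)}))
# prints []

# One extra byte in first.c pushes it into the start of sqrt.
print(find_overlaps({"first.c": (100, 301), "sqrt": (400, 50)}))
# prints [('first.c', 'sqrt')]
```

A relocating scheme moves this check, and the reassignment that follows a collision, from the programmer to the loader.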

The four loader functions are accomplished as follows in an absolute loading scheme:

Macro Program

Macro definition:

    MACRO
    INCRMT &A, &B
    LOAD   &A
    ADD    &B
    STORE  &A
    ENDM

Macro call:

    INCRMT X, Y

Macro expansion:

    LOAD   X
    ADD    Y
    STORE  X
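The relationship between the macro definition and its expansion shown above can be reproduced with a very small expander. This is a sketch of plain textual parameter substitution only; the function name `expand` and the handling of lines are assumptions for illustration, and the naive string replacement would misbehave if one parameter name were a prefix of another.

```python
def expand(source_lines):
    """Collect MACRO ... ENDM definitions and expand later calls to them."""
    macros, output, i = {}, [], 0
    while i < len(source_lines):
        line = source_lines[i].strip()
        if line == "MACRO":
            # The line after MACRO gives the macro name and its formal parameters.
            name, _, params = source_lines[i + 1].strip().partition(" ")
            formals = [p.strip() for p in params.split(",")]
            body = []
            i += 2
            while source_lines[i].strip() != "ENDM":
                body.append(source_lines[i].strip())
                i += 1
            macros[name] = (formals, body)
        else:
            name, _, args = line.partition(" ")
            if name in macros:
                formals, body = macros[name]
                actuals = [a.strip() for a in args.split(",")]
                for text in body:
                    for formal, actual in zip(formals, actuals):
                        text = text.replace(formal, actual)
                    output.append(text)
            elif line:
                output.append(line)       # ordinary statement: copy unchanged
        i += 1
    return output


program = ["MACRO", "INCRMT &A, &B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
           "INCRMT X, Y"]
print(expand(program))
# prints ['LOAD X', 'ADD Y', 'STORE X']
```

The definition itself produces no output; only the call INCRMT X, Y is replaced by the three statements of the body with X and Y substituted for &A and &B.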

COMPILER

The study of compiler design forms a central theme in the field of computer science. An understanding of the techniques used by high level language compilers can give the programmer a set of skills applicable in many aspects of software design; one does not have to be a compiler writer to make use of them.

An assembler translates an assembly language program into machine language. Here we will look at another type of translator, called a compiler. Compiler writing is not confined to one discipline but rather spans several: programming languages, computer architecture, theory of programming languages, algorithms, etc. Today a few basic compiler writing techniques can be used to construct translators for a wide variety of languages. This unit is intended as an introduction to the basic essential features of compiler design.

WHAT IS A COMPILER?

A compiler is a piece of software (a program) that reads a program written in a source language and translates it into an equivalent program in another language, the target language (see figure 4). An important aspect of the compilation process is to produce diagnostics (error messages) for the source program. These error messages are mainly due to grammatical mistakes made by the programmer. A familiarity with the material covered in this unit will be a great help in understanding the inner workings of a compiler.



Fig. 4 A Compiler

There are thousands of source languages, ranging from C and PASCAL to specialized languages that have arisen in virtually every area of computer application. Target languages also number in the thousands. A target language may be another programming language, a machine language or an assembly language. Compilers are classified as single-pass, multi-pass, debugging or optimizing, depending on how they have been constructed or on what functions they are supposed to perform. Early on (in the 1950s), compilers were considered difficult programs to write.

The first FORTRAN compiler, for example, took 18 staff-years to implement. But now several new techniques and tools have been developed for handling many of the important tasks that occur during the compilation process. Good implementation languages, programming environments (editors, debuggers, etc.) and software tools have also been developed. With these developments, compiler writing has become easier.

Approaches To Compiler Development

There are several approaches to compiler development. Here we will look at some of them.

Assembly Language Coding

Early compilers were mostly coded in assembly language. The main consideration was to increase efficiency. This approach worked very well for small High Level Languages (HLL). As languages and their compilers became larger, lots of bugs started surfacing which were difficult to remove. The major difficulty with assembly language implementation was poor software maintainability.

Around this time, it was realised that coding compilers in a high level language would overcome this disadvantage of poor maintenance. Many compilers were therefore coded in FORTRAN, the only widely available HLL at that time. For example, the FORTRAN H compiler for the IBM/360 was coded in FORTRAN. Later, many system programming languages were developed to ensure the efficiency of compilers written in an HLL. Assembly language is still being used, but the trend is towards compiler implementation through HLLs.

Cross-Compiler

A cross-compiler is a compiler which runs on one machine and generates code for another machine. The only difference between a cross-compiler and a normal compiler is in terms of the code generated by it. For example, consider the problem of implementing a Pascal compiler on a new piece of hardware (a computer called X) on which assembly language is the only programming language already available. Under these circumstances, the obvious approach is to write the Pascal compiler in assembler. Hence, the compiler in this case is a program that takes Pascal source as input, produces machine code for the target machine as output and is written in the assembly language of the target machine. The languages characterizing this compiler can be represented as:



The notation shows that Pascal source is translated by a program written in X assembly language (the compiler), running on machine X, into X's object code. This code can then be run on the target machine. This notation is essentially equivalent to the T-diagram. The T-diagram for this compiler is shown in figure 5.

Fig. 5 T-diagram

The language accepted as input by the compiler is stated on the left, the language output by the compiler is shown on the right and the language in which the compiler is written is shown at the bottom. The advantage of this particular notation is that several T-diagrams can be meshed together to represent more complex compiler implementation methods. This compiler implementation involves a great deal of work since a large assembly language program has to be written for X. It is to be noticed in this case that the compiler is very machine specific; that is, not only does it run on X but it also produces machine code suitable for running on X. Furthermore, only one computer is involved in the entire implementation process.

The use of a high-level language for coding the compiler can offer great savings in implementation effort. If the language in which the compiler is being written is already available on the computer in use, then the process is simple. For example, Pascal might already be available on machine X, thus permitting the coding of, say, a Modula-2 compiler in Pascal.

Such a compiler can be represented as:



If the language in which the compiler is being written is not available on the machine, then all is not lost, since it may be possible to make use of an implementation of that language on another machine. For example, a Modula-2 compiler could be implemented in Pascal on machine Y, producing object code for machine X:

The object code for X generated on machine Y would of course have to be transferred to X for its execution. This process of generating code on one machine for execution on another is called cross-compilation.

At first sight, the introduction of a second computer to the compiler implementation plan seems to offer a somewhat inconvenient solution. Each time a compilation is required, it has to be done on machine Y and the object code transferred, perhaps via a slow or laborious mechanism, to machine X for execution. Furthermore, both computers have to be running and inter-linked somehow for this approach to work.

BOOTSTRAPPING

Bootstrapping is the concept of developing a compiler for a language by using subsets (small parts) of the same language. Suppose that a Modula-2 compiler is required for machine X, but that the compiler itself is to be coded in Modula-2. Coding the compiler in the language it is to compile is nothing special and, as will be seen, it has a great deal in its favour. Suppose further that Modula-2 is already available on machine Y. In this case, the compiler can be run on machine Y, producing object code for machine X:

This is the same situation as before except that the compiler is coded in Modula-2 rather than Pascal. The special feature of this approach appears in the next step. The compiler, running on Y, is nothing more than a large program written in Modula-2. Its function is to transform an input file of Modula-2 statements into a functionally equivalent sequence of statements in X's machine code. Therefore, the source statements of this Modula-2 compiler can be passed into itself running on Y to produce a file containing X's machine code. This file is of course a Modula-2 compiler, which is capable of being run on X. By making the compiler compile itself, a version of the compiler that runs on X has been created.



Once this machine code has been transferred to X, a self-sufficient Modula-2 compiler is available on X; hence there is no further use for machine Y in supporting Modula-2 compilation.

This implementation plan is very attractive. Machine Y is only required for compiler development, and once this development has reached the stage at which the compiler can (correctly) compile itself, machine Y is no longer required. Consequently, the original compiler implemented on Y need not be of the highest quality; for example, optimization can be completely disregarded. Further development (and obviously conventional use) of the compiler can then continue at leisure on machine X. This approach to compiler implementation is called bootstrapping. Many languages, including C, Pascal, FORTRAN and LISP, have been implemented in this way.

Pascal was first implemented by writing a compiler in Pascal itself. This was done through several bootstrapping processes. The compiler was then translated "by hand" into an available low level language.
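The two compilation steps in the bootstrapping plan can be traced with a small model in which a compiler is described by the three languages of its T-diagram. This is a sketch for following the argument, not a tool: the class `Compiler`, the function `compile_compiler` and the labels "X code" and "Y code" are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    source: str       # language accepted as input (left of the T)
    target: str       # language produced as output (right of the T)
    written_in: str   # language the compiler is expressed in (bottom of the T)


def compile_compiler(program: Compiler, using: Compiler) -> Compiler:
    """Translate the compiler `program` with the compiler `using`.

    The result does the same job as `program` but is now expressed in the
    target language of `using`.
    """
    if program.written_in != using.source:
        raise ValueError("the compiler in use cannot read this program")
    return Compiler(program.source, program.target, using.target)


# The new compiler: Modula-2 to X machine code, coded in Modula-2.
new_compiler = Compiler("Modula-2", "X code", "Modula-2")
# The Modula-2 implementation assumed to exist already on machine Y.
on_y = Compiler("Modula-2", "Y code", "Y code")

cross = compile_compiler(new_compiler, on_y)     # runs on Y, produces X code
native = compile_compiler(new_compiler, cross)   # the compiler compiling itself
print(cross)
print(native)
```

After the second step the result is written in X's machine code and also produces X's machine code, which is the self-sufficient compiler described above; machine Y is needed only for the two calls to `compile_compiler`.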

Compiler Designing Phases

The compiler, being a complex program, is developed through several phases. Each phase transforms the source program from one representation to another. The tasks of a compiler can be divided very broadly into two sub-tasks:

(i) The analysis of a source program
(ii) The synthesis of the object program

In a typical compiler, the analysis task consists of three phases:

(i) Lexical analysis
(ii) Syntax analysis
(iii) Semantic analysis

The synthesis task is usually considered as a code generation phase, but it can be divided into other distinct phases such as intermediate code generation and code optimization. These four phases, functioning in sequence, are shown in figure 6. Code optimization is beyond the scope of this unit.

The nature of the interface between these four phases depends on the compiler. It is perfectly possible for the four phases to exist as four separate programs.

Fig. 6 Compiler Design Phases

Lexical Analysis

Lexical analysis, also called scanning, is the first phase of a compiler. It performs two important tasks. First, it scans the source program from left to right, character by character, and groups the characters into tokens (or syntactic elements) having a collective meaning. Each token, or basic syntactic element, represents a logically cohesive sequence of characters such as an identifier (also called a variable), a keyword (if, then, else, etc.) or a multi-character operator (<=, etc.). The output of this phase goes to the next phase, i.e., syntax analysis or parsing. The interaction between the two phases is shown below in figure 7.



Fig. 7 Interaction between the first two phases
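The grouping of characters into tokens can be illustrated with a small scanner. The sketch below is a simplification under stated assumptions: the keyword set, the token categories, the brace-delimited comment form and the function name `tokenize` are chosen for this example and do not describe any particular compiler. It also enters each new identifier into a symbol table, discards blanks and comments, and reports an unexpected character as an error.

```python
import re

KEYWORDS = {"for", "to", "do", "if", "then", "else"}   # a small assumed subset

TOKEN_RE = re.compile(r"""
    (?P<comment>\{[^}]*\})                   # brace-delimited comment, discarded
  | (?P<space>\s+)                           # blanks and tabs, discarded
  | (?P<number>\d+)                          # unsigned integer
  | (?P<name>[A-Za-z_]\w*)                   # keyword or identifier
  | (?P<op>:=|<=|>=|<>|[-+*/=<>\[\];()])     # multi-character operators first
""", re.VERBOSE)


def tokenize(text, symbol_table):
    """Scan `text` left to right and return a list of (kind, value) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        pos = match.end()
        kind, value = match.lastgroup, match.group()
        if kind in ("comment", "space"):
            continue
        if kind == "name":
            if value.lower() in KEYWORDS:
                kind = "keyword"
            else:
                kind = "identifier"
                symbol_table.setdefault(value, len(symbol_table))  # enter if absent
        tokens.append((kind, value))
    return tokens


symbols = {}
statement = "for i := 1 to 50 do sum := sum + x[i]; { sum of numbers stored in array x }"
for token in tokenize(statement, symbols):
    print(token)
print(symbols)
```

Listing `:=` and `<=` ahead of the single-character operators in the pattern is what lets the scanner treat them as one token each rather than two.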

The second task performed during lexical analysis is to enter a token into the symbol table if it is not already there. Some other tasks performed during lexical analysis are:

(i) to remove all comments, tabs, blank spaces and machine characters.
(ii) to produce error messages (also called diagnostics) for errors that occur in the source program.

Let us consider the following Pascal language statement.

for i := 1 to 50 do sum := sum + x[i]; { sum of numbers stored in array x }

After going through the statement, the lexical analysis transforms it into the sequence of tokens
