Post on 14-Apr-2018
7/30/2019 System Software and Languages
SYSTEM SOFTWARE AND LANGUAGES
INTRODUCTION TO COMPUTER SOFTWARE
A computer contains two basic parts: (i) hardware and (ii) software. In the first two units we touched upon hardware issues in some detail. In this unit and in the rest of the units of this block we will discuss topics related to software. Without software, a computer remains just a piece of metal. With software, a computer can store and retrieve data, solve different types of problems, create a friendly environment for software development, and so on.
The process of software development is called programming. To do programming one should have knowledge of (i) a particular programming language and (ii) a set of procedures (an algorithm) to solve a problem or develop software. The development of an algorithm is basic to computer programming and is an important part of computer science studies. Developing a computer program is a detailed process, which requires serious thought, careful planning and accuracy. It is a challenging and exacting task, drawing on the creativity of the programmer.
Once an algorithm is obtained, the next step for a solution using a computer is to program the algorithm using mathematical and data processing techniques. Programming languages constitute the vehicle for this stage of problem solving. The development of programming languages is one of the finest intellectual achievements in computer science. It has been said: "To understand a computer, it is necessary to understand a programming language. Understanding them does not really mean only being able to use them. A lot of people can use them without really fully understanding them."
An operating system is system software, which may be viewed as an organized collection of software consisting of procedures for operating a computer and providing an environment for execution of programs. It acts as an interface between users and the hardware of a computer system.
There are many important reasons for studying operating systems. Some of them are:
i) The user interacts with the computer through the operating system in order to accomplish a task, since it is the user's primary interface with a computer.
ii) It helps users to understand the inner functions of a computer very closely.
iii) Many concepts and techniques found in operating systems have general applicability in other applications.
In this unit, we will discuss the concepts relating to a programming language, and in the next unit we will deal with operating system concepts.
INTRODUCTION TO SYSTEM SOFTWARE
Computer software consists of sets of instructions that mould the raw arithmetic and logical capabilities of the hardware units to perform useful tasks.
In order to communicate with each other, we use natural languages like Hindi, English, Bengali, Tamil, Marathi, Gujarati etc. In the same way, programming languages of one type or another are used in order to communicate instructions and commands to a computer for solving problems. Learning a programming language requires learning the symbols, words and rules of the language.
Program and Programming: A computer can neither think nor make any judgment on its own. It is also impossible for any computer to independently analyse given data and follow its own method of solution. It needs a program to tell it what to do. A program is a set of instructions arranged in a sequence that guides the computer to solve a problem. The process of writing a program is called programming. Programming is a critical step in data processing. If the system is not correctly programmed, it delivers results that cannot be used. There are two ways in which we can acquire a program. One is to purchase an existing program, which is normally referred to as packaged software, and the other is to prepare a new program from scratch, in which case it is called customized software. Computer software can be broadly classified into two categories: system software and application software. Today, there are many languages available for developing software. These languages are designed keeping in mind some specific areas of application. Thus, some of the languages may be good for writing system programs/software while others are better for application software.
Since a computer can be used for writing various types of application/system software, there are different programming languages.
i) System Programming Languages: System programs are designed to make the computer easier to use. An example of system software is an operating system, which consists of many other programs for controlling input/output devices, memory, processor etc. To write an operating system, the programmer needs instructions to control the computer's circuitry (hardware part), for example, instructions that move data from one location of storage to a register of the processor. C and C++ are widely used to develop system software.
ii) Application Programming Languages: Application programs are designed for specific applications, such as payroll processing, inventory control etc. To write programs for payroll processing or other applications, the programmer does not need to control the basic circuitry of a computer. Instead, the programmer needs instructions that make it easy to input data, produce output, do calculations and store and retrieve data. Programming languages that are suitable for such application programs support these instructions but not necessarily the types of instructions needed for development of system programs.
There are two main categories of application programs: business programs and scientific application programs. Most programming languages are designed to be good for one category of applications but not necessarily for the other, although there are some general purpose languages that support both types. Business applications are characterized by processing of large inputs and large outputs and high volume data storage and retrieval, but call for simple calculations. Languages which are suitable for business program development must support high volume input, output and storage but do not need to support complex calculations. On the other hand, programming languages that are designed for writing scientific programs contain very powerful instructions for calculations but rather poor instructions for input, output etc. Amongst traditionally used programming languages, COBOL (Common Business Oriented Language) is more suitable for business applications whereas FORTRAN (Formula Translation) is more suitable for scientific applications. Before we discuss more about languages, let us briefly look at the categories of software, viz. system and application software.
SYSTEM SOFTWARE
Language Translator
A language translator is system software which translates a computer program written by a user into a machine-understandable form.
Operating System
An operating system (OS) is the most important system software and is a must to operate a computer system. An operating system manages a computer's resources, takes care of scheduling multiple jobs for execution and manages the flow of data and instructions between the input/output units and the main memory. Advances in the field of computer hardware have also helped in the development of more efficient operating systems.
Utilities
Utility programs are those which are very often requested by many application programs. A few examples are:
SORT/MERGE utilities, which are used for sorting large volumes of data and merging them into a single sorted list, formatting etc.
APPLICATION SOFTWARE
Application software is written to enable the computer to solve a specific data processing task. A number of powerful application software packages, which do not require significant programming knowledge, have been developed. These are easy to learn and use as compared to programming languages.
Although these packages can perform many general and special functions, there are applications where these packages are not found adequate. In such cases, an application program is written to meet the exact requirements. A user application program may be written using one of these packages or a programming language. The most important categories of software packages available are:
Data Base Management Software
Spreadsheet Software
Word Processing, Desktop Publishing (DTP) and Presentation Software
Graphics Software
Data Communication Software
Statistical and Operational Research Software
Data Base Management Software
Database management software is very useful for creating, maintaining and querying databases and for generating reports. Many of today's database management systems are Relational Database Management Systems (RDBMS). Many RDBMS packages provide smart assistants for creation of simple databases for invoices, orders and contact lists. Many database management systems are available in the market these days. You can select any one based on your needs. For example, if you have only a few databases, then a package like dBase, FoxPro etc. may be good. If you require some additional features and have a moderate work load, then Lotus Approach or Microsoft Access are all right. However, if you have high end database requirements which call for a multi-user environment, data security, access rights, a very good user interface etc., then you must go for a professional RDBMS package like Ingres, Oracle, Integra etc.
Accounting Package
Accounting packages are among the most important packages for an office. Some of the features which you may look for in an accounting package are:
tax planner facility
facility for producing charts and graphs
finding accounts payable
simple inventory control facility
payroll functions
on-line connection to stock quotes
easy creation of invoices
One of the good packages in this connection is Quicken for Windows.
Communication Package
Communication software includes software for fax. The fax-software market is growing. An important fax software package is Delrina's WinFax PRO 4.0. Features such as Remote Retrieval and Fax Mailbox should be looked for in fax software. These features ensure that, irrespective of your location, you will receive the fax message. Another important feature is Fax Broadcast. This allows you to send out huge numbers of faxes without tying up your fax machine all day.
If you have to transfer files from your notebook computer to a desktop computer constantly, then you need a software program that coordinates and updates documents. One such software package is LapLink for Windows. This software offers very convenient features. For example, simply dragging and dropping a file starts a file transfer. This software can work if a serial cable, a Novell network or a modem connects you.
Desktop Publishing Packages
Desktop publishing packages are very popular in the Indian context. Newer publishing packages also provide certain built-in formats such as brochures, newsletters, flyers etc., which can be used directly. Already created text can be placed very easily in these packages, and so can graphics. Many DTP packages for English and for languages other than English are available. Microsoft Publisher, PageMaker and Corel Ventura are a few popular names. Desktop publishing packages, in general, are better equipped on Apple Macintosh computers.
CATEGORIES OF LANGUAGES
We can choose any language for writing a program according to the need. But a computer executes programs only after they are represented internally in binary form (sequences of 1s and 0s). Programs written in any other language must be translated to the binary representation of the instructions before the computer can execute them. Programs written for a computer may be in one of the following categories of languages.
MACHINE LANGUAGE
This is a sequence of instructions written in the form of binary numbers consisting of 1s and 0s, to which the computer responds directly. Machine language was initially referred to as code, although now the term code is used more broadly to refer to any program text. An instruction prepared in any machine language will have at least two parts. The first part is the command or operation, which tells the computer what function is to be performed. All computers have an operation code for each of their functions. The second part of the instruction is the operand, which tells the computer where to find or store the data that has to be manipulated. Just as hardware is classified into generations based on technology, computer languages also have a generation classification based on the level of interaction with the machine. Machine language is considered to be the first generation language.
Advantage of Machine Language
It is faster in execution since the computer directly starts executing it.
Disadvantage of Machine Language
It is difficult to understand and develop a program using machine language. Anybody going through such a program for checking will have a difficult task understanding what will be achieved when the program is executed. Nevertheless, the computer hardware recognizes only this type of instruction code.
The following program is an example of a machine language program for adding two numbers.
0011 1110    Load A register with
0000 0111    value 7
0000 0110    Load B register with
0000 1010    value 10
1000 0000    A = A + B
0011 0010    Store the result
0110 0100    into the memory location
0000 0000    whose address is 100 (decimal)
0111 0110    Halt processing
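The way a processor steps through such a listing can be illustrated with a small simulator. The Python sketch below is illustrative only. It assumes the bit patterns follow 8080/Z80-style encodings (the text does not name a processor), that the store instruction is 0011 0010 followed by the address with its low byte first, and that address 100 (decimal) is written as 0110 0100. The name `run` and the dictionary used as memory are hypothetical.

```python
# Hypothetical simulator for the nine-byte program above.
# Assumption: opcodes follow 8080/Z80-style encodings.
PROGRAM = [
    0b00111110, 0b00000111,              # load A with 7
    0b00000110, 0b00001010,              # load B with 10
    0b10000000,                          # A = A + B
    0b00110010, 0b01100100, 0b00000000,  # store A at address 100 (low byte first)
    0b01110110,                          # halt
]


def run(program):
    """Fetch, decode and execute bytes until the halt instruction is reached."""
    registers = {"A": 0, "B": 0}
    memory = {}
    pc = 0  # program counter: index of the next byte to fetch
    while True:
        opcode = program[pc]
        if opcode == 0b00111110:      # load A with the following byte
            registers["A"] = program[pc + 1]
            pc += 2
        elif opcode == 0b00000110:    # load B with the following byte
            registers["B"] = program[pc + 1]
            pc += 2
        elif opcode == 0b10000000:    # 8-bit addition, result kept in A
            registers["A"] = (registers["A"] + registers["B"]) & 0xFF
            pc += 1
        elif opcode == 0b00110010:    # store A at a 16-bit address
            address = program[pc + 1] | (program[pc + 2] << 8)
            memory[address] = registers["A"]
            pc += 3
        elif opcode == 0b01110110:    # halt
            return registers, memory
        else:
            raise ValueError(f"unknown opcode {opcode:08b} at byte {pc}")


registers, memory = run(PROGRAM)
print(memory[100])
```

Each branch reads the operation part first and then consumes as many operand bytes as that operation needs, which mirrors the two-part instruction format described earlier.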
ASSEMBLY LANGUAGE
Assembly language unlocks the secrets of your computer's hardware and software. It teaches you about the way the computer's hardware and operating system work together and how the application programs communicate with the operating system. Assembly language, unlike high level languages, is machine dependent. Each microprocessor has its own set of instructions that it can support.
When we employ symbols (letters, digits or special characters) for the operation part, the address part and other parts of the instruction code, this representation is called an assembly language program. This is considered to be the second-generation language. Machine and assembly languages are referred to as low level languages since the coding for a problem is at the individual instruction level.
Each machine has its own assembly language, which is dependent upon the internal architecture of the processor. An assembler is a translator which takes its input in the form of an assembly language program and produces machine language code as its output. The following program is an example of an assembly language program for adding two numbers and storing the result in some memory location.
LD A,7        Load register A with 7
LD B,10       Load register B with 10
ADD A,B       A = A + B
LD (100),A    Save the result in the location 100
HALT          Halt process
From this program, it is clear that the usage of mnemonics (in our example LD, ADD and HALT are the mnemonics) has improved the readability of our program significantly.
A machine cannot execute an assembly language program directly, as it is not in binary form. An assembler is needed in order to translate an assembly language program into the object code executable by the machine. This is illustrated in figure 1.
Figure 1: Assembler
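The translation step performed by an assembler can be sketched in a few lines. The following Python example is a hypothetical illustration, not a real assembler: it handles only the five statements of the sample program, and the opcode values are assumed to follow Z80-style encodings, which the text itself does not state.

```python
# Hypothetical two-register assembler for the five-line sample program.
# Assumption: opcode values follow Z80-style encodings.
LOAD_IMMEDIATE = {"A": 0x3E, "B": 0x06}

SOURCE = """
LD A,7
LD B,10
ADD A,B
LD (100),A
HALT
"""


def assemble(source):
    """Translate each mnemonic line into its machine-code bytes."""
    code = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        mnemonic, _, rest = line.partition(" ")
        operands = [part.strip() for part in rest.split(",")] if rest.strip() else []
        if mnemonic == "LD" and len(operands) == 2 and operands[0] in LOAD_IMMEDIATE:
            # Load a register with a constant: opcode, then the value.
            code += [LOAD_IMMEDIATE[operands[0]], int(operands[1])]
        elif (mnemonic == "LD" and len(operands) == 2
              and operands[0].startswith("(") and operands[1] == "A"):
            # Store A in memory: opcode, then the address, low byte first.
            address = int(operands[0].strip("()"))
            code += [0x32, address & 0xFF, address >> 8]
        elif mnemonic == "ADD" and operands == ["A", "B"]:
            code.append(0x80)
        elif mnemonic == "HALT" and not operands:
            code.append(0x76)
        else:
            raise ValueError(f"cannot assemble: {line}")
    return bytes(code)


object_code = assemble(SOURCE)
print(" ".join(f"{byte:08b}" for byte in object_code))
```

The output is a sequence of nine bytes, one group per source statement, which is the kind of binary listing a machine language programmer would otherwise have to write by hand.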
Advantage of Assembly Language
Writing a program in assembly language is more convenient than in machine language. Instead of binary sequences, as in machine language, it is written in the form of symbolic instructions. Therefore, it gives a little more readability.
Disadvantages of Assembly Language
An assembly language program is specific to a particular machine architecture. Assembly languages are designed for a specific make and model of microprocessor. This means that assembly language programs written for one processor will not work on a different processor if it is architecturally different. That is why an assembly language program is not portable. An assembly language program also cannot be executed as directly as a machine language program, because it first has to be translated into machine (binary) language code.
VARIABLES, CONSTANTS, DATA TYPE, ARRAY AND EXPRESSIONS
These are the smallest components of a programming language.
Figure 2: Memory Organization
Variable
The first thing we must learn is how to use the internal memory of a computer in writing a program. Memory may be pictured as a series of separate memory cells, as shown in figure 2. Computer memory is divided into several locations. Each location has its own address. Each storage location holds a piece of information. In order to store or retrieve information from a memory location, we must give that particular location a name. Now study the following definition.
Variable: It is a character or group of characters assigned by the programmer to a single memory location and used in the program as the name of that memory location in order to access the value stored in it.
For example, in the expression A = 5, A is the name of a memory location, i.e. a variable, where 5 is stored.
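The relation between a variable name and a memory location can be modelled in a few lines. The Python sketch below is a simplified, hypothetical model: `memory`, `symbol_table`, `store` and `load` are names invented for illustration, and a real language translator allocates storage in a more elaborate way.

```python
# Hypothetical model: memory is a row of numbered cells, and the
# symbol table maps each variable name to the address of one cell.
memory = [0] * 8
symbol_table = {}


def store(name, value):
    """Put value in the cell named by the variable, allocating a cell on first use."""
    if name not in symbol_table:
        symbol_table[name] = len(symbol_table)  # next free address
    memory[symbol_table[name]] = value


def load(name):
    """Return the value held in the cell that the variable names."""
    return memory[symbol_table[name]]


store("A", 5)  # corresponds to the expression A = 5
print(symbol_table["A"], load("A"))
```

The program only ever mentions the name A; the address behind it is looked up on every access.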
Constant
A constant has a fixed value, in the sense that two cannot be equal to four. A string constant is simply a sequence of characters such as "computer", which is a string of 8 characters. A numeric constant can be an integer representing a whole quantity, or a number with a decimal point representing a number with a fractional part. The constant is probably the most familiar concept to us, since we have used it in everything that has to do with numbers. Numeric constants can be added, subtracted, multiplied, divided, and also compared to say whether two of them are equal, less than or greater than each other.
As string constants are sequences of characters, a related string constant may be obtained from a given one by chopping off some characters from the beginning or the end or both, or by appending another string constant at the beginning or the end. For example, from 'Gone with the wind', we can get 'one with ', 'Gone with wind', and so on. String constants can also be compared in a lexicographic (dictionary) sense to say whether two of them are equal, not equal, less than or greater than each other.
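These string operations can be tried out directly. The short Python sketch below reproduces the two derived strings from the example; the variable names are hypothetical, and note that Python compares strings by character code, which matches dictionary order only for text in a single letter case.

```python
title = "Gone with the wind"

# Chop one character off the front and everything after position 9.
chopped = title[1:10]

# Chop the end off after "Gone with", then append another string constant.
rebuilt = title[:9] + " wind"

print(repr(chopped))
print(repr(rebuilt))

# Lexicographic comparison of string constants.
earlier = "apple" < "banana"
same = "wind" == "wind"
print(earlier, same)
```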
Data type
In computer programming, the term data refers to anything and everything processed by the computer. There are different types of data processed by the computer: numbers are one type of data and words are another. In addition, the operations that are performed on data differ from one type of data to another. For example, multiplication applies to numbers and not to words or sentences.
A data type defines a set of related values (integers, numbers with fractions, characters) and a set of specific operations that can be performed on those values. In BASIC, the statement LET A = 15 denotes that A is of numeric data type because it contains a number, but in the statement LET A$ = "BOMBAY", A$ is a variable of character data type. A data type also determines how many contiguous memory cells should be allocated for a particular variable.
Array
In programming we deal with large amounts of related data. Without arrays, each data element has to be represented by a separate variable. For example, if we have to analyse the sales performance of a particular company for the last 10 years, we can take ten different variables (names), each one representing the sales of a particular year. If we analyse sales information for more than 10 years, then the number of variables will increase accordingly. It is very difficult to manage a large number of variables in a program. To deal with such a situation an array is used. An array is a collection of data of the same type (either string or numeric), all of which are referenced by the same name. For example, a list of 5 years of sales information of a company can be referred to by the same array name A.
A(1)      A(2)       A(3)       A(4)       A(5)
50,000    1,00,000   5,00,000   8,00,000   9,00,000
A(1) specifies the sales information of the first year
A(2) specifies the sales information of the second year
A(5) specifies the sales information of the fifth year
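The sales example can be written as a short program. The Python sketch below is a minimal illustration: the helper `a` is hypothetical and exists only to mimic the 1-based subscripts A(1) to A(5) of the text, because Python lists start counting at 0.

```python
# Five years of sales figures held under one name instead of five variables.
sales = [50_000, 100_000, 500_000, 800_000, 900_000]


def a(year):
    """Return A(year) using the 1-based subscripts of the text."""
    return sales[year - 1]


first_year = a(1)
fifth_year = a(5)
total = sum(sales)  # one loop-style operation covers every element

print(first_year, fifth_year, total)
```

Adding a sixth year only means appending one more value to the list; no new variable name is needed.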
Expression
We know that we can express intended arithmetic operations using expressions such as X + Y + Z and so on. Several simple expressions can even be nested together using parentheses to form complex expressions. Every computer language specifies an order in which the various arithmetic operators are evaluated in a given expression. An expression may contain operators such as:
Parentheses                 ( )
Exponentiation              ^
Negation                    -
Multiplication, division    *, /
Addition, subtraction       +, -
The operators are evaluated in the order given above. For example, the expression
2+8*(4 - 6/3)
can be considered to be evaluated as follows:
2+8*(4 - 6/3)    The sub-expression (4 - 6/3) is taken up first.
2+8*(4 - 2)      The division 6/3 within (4 - 6/3) has higher priority than the subtraction 4 - 6.
2+8*2            The subtraction (4 - 2) is performed next; (4 - 6/3) is now complete.
2+16             The multiplication 8*2 is executed first.
18               Its result is then added to 2, that is, 16 + 2 = 18.
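The evaluation order can be checked by computing the expression one step at a time and comparing with the direct result. The Python sketch below does this; the variable names are hypothetical. Note two differences from the notation of the text: Python writes exponentiation as `**` (its `^` means something else), and its `/` always produces a number with a fractional part.

```python
# Step-by-step evaluation of 2 + 8 * (4 - 6 / 3), innermost part first.
quotient = 6 / 3            # division inside the parentheses
difference = 4 - quotient   # completes the parenthesised sub-expression
product = 8 * difference    # multiplication before addition
stepwise = 2 + product

# The language applies the same priorities when given the whole expression.
direct = 2 + 8 * (4 - 6 / 3)

# Exponentiation binds tighter than negation, as in the priority list.
negated_square = -2 ** 2

print(stepwise, direct, negated_square)
```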
It is useful to remember the order of priority of the various operators. But it is safer to simplify expressions and enclose them in parentheses to avoid unpleasant surprises. So far we have focused on arithmetic expressions. But expression is a very general concept. We mentioned earlier that apart from arithmetic operations we could compare numbers or strings. We do it by using relational operators in expressions.
The following is a list of relational operators:
=     equal to
<>    not equal to
<     less than
>     greater than
<=    less than or equal to
>=    greater than or equal to
These operators have the same level of priority among themselves but a lower priority than the arithmetic operators mentioned earlier. Relational expressions result in one of the truth values, either TRUE or FALSE. In some languages, when a relational expression such as (3 > 5) is evaluated to be FALSE, a value 0, that is false, is assigned, whereas (5 < 7) will be evaluated to be TRUE, and the value 1 will be assigned.
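Python is one language in which truth values convert to 0 and 1 in this way, so the two examples can be checked directly. This is a small sketch with hypothetical variable names; other languages may represent TRUE differently.

```python
# A relational expression yields a truth value, which converts to 0 or 1.
false_value = int(3 > 5)
true_value = int(5 < 7)

# Arithmetic operators have higher priority than relational ones,
# so 2 + 3 is computed before the comparison with 4.
mixed = 2 + 3 > 4

print(false_value, true_value, mixed)
```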
Note that relational expressions are capable of comparing only two values separated by an appropriate relational operator. If we want to express an idea such as whether the number 7 happens to be within two other numbers 4 and 10, we may be tempted to write the relational expression 4 < 7 < 10 ... 2) OR (7 > 2) is TRUE
XOR    TRUE only if one of the adjoining expressions is TRUE and the other is FALSE.
The XOR has the same priority as OR. (4 < 7) XOR (7 < 10) is FALSE.
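Combining relational expressions with logical operators can be sketched as follows. This Python example is illustrative only: the `xor` helper is hypothetical (Python has no `xor` keyword for truth values), and the use of `and` for the "within two numbers" test is an assumption based on the usual meaning of AND, since the corresponding part of the text is not available.

```python
def xor(left, right):
    """TRUE only if exactly one of the two truth values is TRUE."""
    return left != right


# Is 7 within 4 and 10? Two relational expressions joined by AND.
within = (4 < 7) and (7 < 10)

# OR is TRUE when at least one side is TRUE.
either = (4 > 7) or (7 > 2)

# Both sides are TRUE here, so XOR gives FALSE, as stated in the text.
exclusive = xor(4 < 7, 7 < 10)

print(within, either, exclusive)
```

Python happens to accept the chained form `4 < 7 < 10` as shorthand for the AND of the two comparisons, but many languages do not, which is why the explicit form is shown.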
ASSEMBLY LANGUAGE FUNDAMENTALS
The best way to learn to write an assembly language program is to first study a simple program written in assembly language. We shall do just that in this section.
A Sample Program
;ABSTRACT   : This program adds 2 8-bit words in the memory
;           : locations called NUM1 and NUM2. The result is
;           : stored in the memory location called RESULT. If
;           : there was a carry from the addition it will be stored
;           : as 0000 0001 in the location CARRY
;ALGORITHM  :
;             get NUM1
;             add NUM2
;             put sum into memory at RESULT
;             position carry in LSB of byte register
;             mask off upper seven bits
;             store the result in the CARRY location
;
;PORTS      : None used
;PROCEDURES : None used
;REGISTERS  : Uses CS, DS, AX
DATA    SEGMENT
NUM1    DB 15h                ; First number stored here
NUM2    DB 20h                ; Second number stored here
RESULT  DB ?                  ; Put sum here
CARRY   DB ?                  ; Put any carry here
DATA    ENDS
CODE    SEGMENT
        ASSUME CS:CODE, DS:DATA
START:  MOV AX, DATA          ; Initialize data segment
        MOV DS, AX            ; register
        MOV AL, NUM1          ; Get the first number
        ADD AL, NUM2          ; Add it to 2nd number
        MOV RESULT, AL        ; Store the result
        RCL AL, 01            ; Rotate carry into LSB
        AND AL, 00000001B     ; Mask out all but LSB
        MOV CARRY, AL         ; Store the carry result
        MOV AX, 4C00h         ; DOS function 4Ch: terminate program
        INT 21h               ; Call DOS
CODE    ENDS
        END START
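The arithmetic performed by the sample program can be traced with a small model. The Python sketch below is a hypothetical simulation, not 8086 code: it assumes that ADD sets the carry flag when the 8-bit sum overflows, and that RCL rotates AL left by one bit through the carry flag, so that the old carry ends up in the least significant bit. The function name is invented for illustration.

```python
def add_bytes_with_carry(num1, num2):
    """Model ADD, RCL and AND on an 8-bit register; return (RESULT, CARRY)."""
    total = num1 + num2
    al = total & 0xFF        # ADD AL, NUM2 keeps only the low 8 bits
    carry_flag = total >> 8  # 1 if the sum did not fit in 8 bits
    result = al              # MOV RESULT, AL

    # RCL AL, 01: bit 7 moves into the carry flag and the old carry
    # flag moves into bit 0.
    new_carry_flag = (al >> 7) & 1
    al = ((al << 1) | carry_flag) & 0xFF
    carry_flag = new_carry_flag

    al &= 0b00000001         # AND AL, 00000001B masks off the upper seven bits
    return result, al        # MOV CARRY, AL


no_overflow = add_bytes_with_carry(0x15, 0x20)  # the values used in the listing
overflow = add_bytes_with_carry(0xF0, 0x20)     # a pair chosen to produce a carry

print(no_overflow, overflow)
```

With the values in the listing (15h and 20h) the sum is 35h and no carry occurs, so CARRY receives 0; the second pair shows the case in which the sum exceeds eight bits.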
The program contains certain additional mnemonics, in addition to the instructions you have studied so far. These are called assembler directives or pseudo-operations. They are directions for the assembler. Their meaning is valid only during assembly. No code is generated for them.
SEGMENT and ENDS Directive
The SEGMENT and ENDS directives are used to identify a group of data items or a group of instructions, called a segment. These directives are used in the same way as parentheses are used in algebra, to group like items together. A group of data statements or instructions placed between the SEGMENT and ENDS directives is said to constitute a logical segment. This segment is given a name. In our example, CODE and DATA are the names given to the code and data segments respectively.
Each segment should have a unique name. There can be no blanks within the segment name, and the length of the segment name can be up to 31 characters. The name of a mnemonic or any other reserved word is not allowed as a segment name or label.
Data Definition Directives
In assembly language, we define storage for variables using data definition directives. Data definition directives create storage at assembly time, and can even initialize a variable or string to a starting value. The directives are summarized in the following table:
Directive   Description          Number of bytes   Attribute
DB          Define byte          1                 byte
DW          Define word          2                 word
DD          Define double-word   4                 double word
DQ          Define quadword      8                 quad word
DT          Define 10 bytes      10                ten bytes
As we see from the above table, the variable being defined is given an attribute. The attribute refers to the basic unit of storage used when the variable was defined. These variables can be given a name as follows:
Example
CHAR_VAR  DB 'A'        ; CHAR_VAR = 41h
WORD_VAR  DW 01234h     ; a hex number must begin with a digit, hence the leading zero
LIST      DB 1,2,3,4    ; list of 4 bytes initialized by numbers 1,2,3,4
NUM       DW 4200
DEN       DB 20
The DUP directive is used to duplicate the basic data definition 'n' number of times. Example:
ARRAY DB 10 DUP (0)
defines an array ARRAY of 10 data bytes, each byte initialized to 0. The initial value can be anything acceptable to the basic data type.
The EQU directive is used to give a name to a constant. Example:
CONS EQU 20
will define a constant with value 20. Now, wherever you want to use 20 in your program, you can use the name instead. The advantage is this: suppose you want to change the value of CONS to, say, 10 at some point. Instead of making changes everywhere in the program, you just have to change the EQU definition and assemble the program again. The change will then take effect automatically at all places.
Types of numbers used in data statements can be octal, binary, hexadecimal, decimal and ASCII.
Following are the examples of each type:
TEMP_MAX   DB 01101100B   ; Binary
OLD_VAL    DW 7341O       ; Octal (suffix is the letter O)
DECIMAL    DB 49          ; Decimal
HEX_VAL    DW 03B2Ah      ; Hex
ASCII_VAL  DB 'EXAMPLE'   ; ASCII
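The values that such literals stand for can be worked out with a short helper. The Python sketch below is a hypothetical illustration of suffix-based number notation (B for binary, O or Q for octal, H for hexadecimal, D or no suffix for decimal); it is not the assembler's actual parsing routine, and `parse_literal` is an invented name.

```python
SUFFIX_BASES = {"H": 16, "B": 2, "O": 8, "Q": 8, "D": 10}


def parse_literal(text):
    """Return the integer value of a numeric literal whose suffix selects the base."""
    text = text.strip().upper()
    if text[-1] in SUFFIX_BASES:
        return int(text[:-1], SUFFIX_BASES[text[-1]])
    return int(text, 10)  # no suffix: plain decimal


binary_value = parse_literal("01101100B")
octal_value = parse_literal("7341O")
decimal_value = parse_literal("49")
hex_value = parse_literal("03B2Ah")

# An ASCII string is stored as one byte per character.
ascii_bytes = list("EXAMPLE".encode("ascii"))

print(binary_value, octal_value, decimal_value, hex_value)
print(ascii_bytes)
```

The helper looks only at the last character, which works because a hexadecimal literal always ends in H even when its digits include B or D.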
The ASSUME Directive
The 8086 has four types of segments, discussed in the previous unit. A program can define more than one code segment, data segment or extra segment. However, only one of each type can be active at a time. The ASSUME directive tells the assembler which segment is to be used as the active segment at any instant, and with respect to which segment it has to calculate the offsets of variables or instructions. It is usually placed immediately after the SEGMENT directive in the code segment, but you can have as many additional ASSUMEs as you like.
Each time an ASSUME is encountered, the assembler starts to calculate offsets with respect to that segment. In the example above, CODE and DATA are the two segments defined, one each for code and data.
Initializing Segment Registers
ASSUME is only a directive, which is used to calculate the offset of variables, instructions or stack elements with respect to a specific segment of its type. It does not initialize the segment registers. Initialization of the segment registers has to be done explicitly using MOV instructions as follows:
MOV AX,DATA
MOV DS,AX
The above statements are used to initialize the data segment register. A segment register cannot be loaded directly with an immediate value such as a segment name. Therefore, the segment name is first moved into some general purpose register, which is then moved into the segment register. All segment registers can be initialized in the same manner. The code segment register is initialized automatically by the loader.
END Directive
The END directive tells the assembler to stop reading and assembling the program from there on. Any statement after the END will be ignored by the assembler. There can be only one END in the program, which is the last statement of the program.
THE ASSEMBLY LANGUAGE PROGRAMS
Assembly language programs can be written in two ways: one in which all code and data are written as part of one segment, called COM programs, and the other where you have more than one segment, called EXE programs. We shall study each of them in brief, looking at their advantages and disadvantages.
COM Programs
A COM (Command) program is simply a binary image of a machine language program. It is loaded in memory at the lowest available segment address. The program code begins at offset 100h within that segment, because the first 100h (256) bytes are occupied by the program segment prefix (PSP) that DOS builds for the program. All segment registers are set to the base segment address of the program.
A COM program keeps its code, data and stack within the same segment. Thus, its total size should not exceed 64K bytes. A sample COM program is shown below. The program's only segment (CSEG) must be declared explicitly using segment directives.
;TITLE ADD TWO NUMBERS AND STORE THE CARRY IN A THIRD
; VARIABLE
CSEG    SEGMENT
        ASSUME CS:CSEG, DS:CSEG, SS:CSEG
        ORG 100h
START:  MOV AX, CSEG          ; Initialize data segment
        MOV DS, AX            ; register
        MOV AL, NUM1          ; Get the first number
        ADD AL, NUM2          ; Add it to 2nd number
        MOV RESULT, AL        ; Store the result
        RCL AL, 01            ; Rotate carry into LSB
        AND AL, 00000001B     ; Mask out all but LSB
        MOV CARRY, AL         ; Store the carry result
        MOV AX, 4C00h         ; DOS function 4Ch: terminate program
        INT 21h               ; Call DOS
NUM1    DB 15h                ; First number stored here
NUM2    DB 20h                ; Second number stored here
RESULT  DB ?                  ; Put sum here
CARRY   DB ?                  ; Put any carry here
CSEG    ENDS
        END START
The ORG directive sets the location counter at offset 100h before generating any instruction. A COM program takes up less space on disk than an EXE program. In spite of this, it is allocated all available RAM when loaded. COM programs require at least one full segment, because they automatically place their stack at the end of the segment.
EXE Programs
An EXE program is stored on disk with the extension EXE. EXE programs are longer than COM programs, because each EXE program has an EXE header followed by a load module containing the program itself. The EXE header is of fixed 256 bytes and contains information which is used by DOS to correctly calculate the addresses of segments and other components. We will not go into the details of these.
The load module consists of separate segments, which may be thought of as reserved areas for instructions, variables and stack. An EXE program may contain up to 64K segments, although at most four segments may be active at any time. The segments may be of variable size, with the maximum being 64K bytes.
Advantages of EXE programs are:
i) EXE programs are better suited to debugging.
ii) EXE-format assembler programs are more easily converted into subroutines for high-level languages.
iii) The third reason has to do with memory management. EXE programs are more easily relocatable, because there is no ORG statement forcing the program to be loaded from a specific address. Also, to fully use a multitasking operating system, programs must be able to share computer memory and resources. An EXE program is easily able to do this.
ASSEMBLER / MACRO PROCESSOR
INTRODUCTION
Computers have changed a lot since the days when people used to communicate with them by on and off switches denoting primitive instructions. With present-day computers, interaction has become more user-friendly because of the advancement in hardware and software tools. One category of software which assists in the mechanics of software development is system software. Assemblers, linkers/loaders, compilers and operating systems all belong to the realm of system software.
We have discussed several components of programming languages, the basic definitions of assembler, compiler and interpreter, and the differences among them. In this unit our focus will be on the implementation and use of assemblers. We will also broadly cover the use of macro processors, loaders and linkers. This unit is organized as follows:
ASSEMBLER
Assembler Implementation
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader (Figure 1).
Fig. 1: Assembler
For example, externally defined symbols (such as library programs) must be indicated to the loader. The assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory and place the values of these symbols in the calling program. Here we will discuss the different approaches to the design of an assembler and its related programs.
Assembler and its related Programs
The assembler-language program contains three kinds of entities. Absolute entities include operation codes, numeric and string constants, and fixed addresses. The values of absolute entities are independent of which storage locations the resulting machine code will eventually occupy.
Relative entities include the addresses of instructions and of working storage. These are fixed only with respect to each other, and are normally stated relative to the address of the beginning of the module. An externally defined entity is used within a module but not defined within it. Whether it is absolute or relative is not necessarily known at the time the module is translated.
The object program includes identification of which addresses are relative, which symbols are defined externally, and which internally defined symbols are expected to be referenced externally. In the modules in which the latter are used, they are considered to be externally defined. These external references are resolved for two or more object programs by a linker. The linker accepts the several object programs as input and produces a single program ready for loading, hence termed a load module.
The load module is free of external references and consists essentially of machine-language code accompanied by a specification of which addresses are relative. When the actual main storage locations to be occupied by the program become known, a relocating loader reads the program into storage and adjusts the relative addresses to refer to those actual locations. The output from the loader is a machine-language program ready for execution. The overall process is depicted in Figure 3. If only a single source-language module containing no external references is translated, it can be loaded directly without intervention by the linker. In some programming systems the format of linker output is sufficiently compatible with that of its input to permit the linking of a previously produced load module with some new object modules.
The functions of linking and loading are sometimes both effected by a single program, called a linking loader. Despite the convenience of combining the linking and loading functions, it is important to realize that they are distinct functions, each of which can be performed independently of the other.
Fig. 3 : Program Translation
LOAD AND GO ASSEMBLER
The simplest assembler program is the load and go assembler. It accepts as input a program whose instructions are essentially in one-to-one correspondence with those of machine language, but with symbolic names used for operators and operands. It produces as output machine language that is loaded directly into main memory and executed. The translation is usually performed in a single pass over the input program text. The resulting machine language program occupies storage locations which are fixed at the time of translation and cannot be changed subsequently. The program can call library subroutines, provided that they occupy locations other than those required by the program. No provision is made for combining separate subprograms translated in this manner. The load and go assembler forgoes the advantages of modular program development. Among the most important of these are:
(1) the ability to design, code and test different program components in parallel;
(2) a change in one particular module does not require scanning the rest of the program.
Most assemblers are therefore designed to satisfy the desire to create programs in modules. These module assemblers are generally implemented as two-pass translators. During the first pass the assembler examines the assembler-language program and collects the symbolic names into a table. During the second pass, the assembler generates code which is not quite in machine language. It is rather in a similar form, sometimes called "relocatable code" and here called object code. The program module in object-code form is typically called an object module.
ONE-PASS MODULE ASSEMBLER
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
TWO PASS ASSEMBLER
Most assemblers are designed to work in two passes (stages); they are therefore called two-pass assemblers. The pass-wise grouping of tasks in a two-pass assembler is given below:
Pass I
Separate the symbol, mnemonic op-code and operand fields.
Determine the storage requirement for every assembly language statement and update the location counter.
Build the symbol table (the table that is used to store each label and its corresponding value).
Pass II
Generate object code.
FUNCTION
The program of Figure 4, although written in a hypothetical assembler language, contains the basic elements which need to be translated into machine language. (It is not essential for students to understand the meaning of each statement of the program.) For ease of reference, each instruction is identified by a line number, which is not part of the program. Each instruction in our language contains either an operation specification (lines 1-15) or a storage specification (lines 16-21). An operation specification is a symbolic operation code, which may be preceded by a label and must be followed by zero, one or two operand specifications, as appropriate to the operation. A storage specification is a symbolic instruction to the assembler. In our assembler language, it must be preceded by a label and must be followed, if appropriate, by a constant. Labels and operand specifications are symbolic addresses; every operand specification must appear somewhere in the program as a label.
Line  Label  Operation  Operand 1  Operand 2
 1           COPY       ZERO       OLDER
 2           COPY       ONE        OLD
 3           READ       LIMIT
 4           WRITE      OLD
 5    FRONT  LOAD       OLDER
 6           ADD        OLD
 7           STORE      NEW
 8           SUB        LIMIT
 9           JMPPOS     FINAL
10           WRITE      NEW
11           COPY       OLD        OLDER
12           COPY       NEW        OLD
13           JMP        FRONT
14    FINAL  WRITE      LIMIT
15           STOP
16    ZERO   CONST      0
17    ONE    CONST      1
18    OLDER  SPACE
19    OLD    SPACE
20    NEW    SPACE
21    LIMIT  SPACE
Figure 4: Sample Assembler-Language Program
Operation code                No. of
Symbolic   Machine   Length   operands   Action
ADD        02        2        1          ACC <- ACC + OPD1
JMP        00        2        1          Jump to OPD1
JMPNEG     05        2        1          Jump to OPD1 if ACC < 0
JMPPOS     01        2        1          Jump to OPD1 if ACC > 0
JMPZERO    04        2        1          Jump to OPD1 if ACC = 0
COPY       13        3        2          OPD2 <- OPD1
DIVIDE     10        2        1          ACC <- ACC / OPD1
LOAD       03        2        1          ACC <- OPD1
MULT       14        2        1          ACC <- ACC x OPD1
READ       12        2        1          OPD1 <- input stream
STOP       11        1        0          Stop execution
STORE      07        2        1          OPD1 <- ACC
SUB        06        2        1          ACC <- ACC - OPD1
WRITE      08        2        1          Output stream <- OPD1
Figure 5: Instruction Set
Our hypothetical machine has a single accumulator and a main storage of unspecified size. Its 14 instructions are listed in Figure 5. The first column shows the symbolic operation code and the second gives the machine-language equivalent (in decimal). The fourth column specifies the number of operands, and the last column describes the action which ensues when the instruction is executed. In that column "ACC", "OPD1" and "OPD2" refer to the contents of the accumulator, of the first operand location, and of the second operand location, respectively. The length of each instruction in words is 1 greater than the number of its operands. Thus if the machine has 12-bit words, an ADD instruction is 2 words, or 24 bits, long. The table's third column, which is redundant, gives the instruction length. If our hypothetical computer had a fixed instruction length, the third and fourth columns could both be omitted.
The storage specification SPACE reserves one word of storage which presumably will eventually hold a number; there is no operand. The storage specification CONST also reserves a word of storage; it has an operand which is the value of a number to be placed in that word by the assembler.
The instructions of the program are presented in four fields, and might indeed be constrained to such a format on the input medium. The label, if present, occupies the first field. The second field contains the symbolic operation code or storage specification, which will henceforth be referred to simply as the operation. The third and fourth fields hold the operand specifications, or simply operands, if present.
Although it is not at all important to our discussion to understand what the example program does, the foregoing specifications of the machine and of its assembler language reveal the algorithm. The program simply computes the so-called Fibonacci numbers (0, 1, 1, 2, 3, 5, 8, ...). This program is also written in the BASIC programming language in Unit 1 of Course 2. Now that we have seen the elements of an assembler-language program, we can ask what functions the assembler must perform in translating it. Here is the list:
Replace symbolic addresses by numeric addresses.
Replace symbolic operation codes by machine operation codes.
Reserve storage for instructions and data.
Translate constants into machine representation.
The assignment of numeric addresses can be performed without prior knowledge of what actual locations will eventually be occupied by the assembled program. It is necessary only to generate addresses relative to the start of the program. We shall assume that our assembler normally assigns addresses starting at 0. In translating line 1 of our example program, the resulting machine instruction will therefore be assigned address 0 and occupy 3 words, because COPY instructions are 3 words long. Hence the instruction corresponding to line 2 will be assigned address 3, the READ instruction will be assigned address 6, the WRITE instruction of line 4 will be assigned address 8, and so on to the end of the program. But what addresses will be assigned to the operands named ZERO and OLDER? These addresses must be inserted in the machine-language representation of the first instruction.
IMPLEMENTATION
The assembler uses a counter to keep track of machine-language addresses. Because these addresses will ultimately specify locations in main storage, the counter is called the location counter. Before assembly, the location counter is initialized to zero. After each source line has been examined on the first pass, the location counter is incremented by the length of the machine-language code which will ultimately be generated to correspond to that source line.
When the assembler first encounters line 1 of the example program, it cannot replace the symbols ZERO and OLDER by addresses, because those symbols make forward references to source-language program lines not yet reached by the assembler. The most straightforward way to cope with the problem of forward references is to examine the entire program text once, before attempting to complete the translation. During that examination, the assembler determines the address which corresponds to each symbol, and places both the symbols and their addresses in a symbol table. This is possible because each symbol used in an operand field must also appear as a label. The address corresponding to a label is just the address of the line which that label marks, that is, the value of the location counter when that line is reached. Building the symbol table requires one pass over the source text. During a second pass, the assembler uses the addresses collected in the symbol table to perform the translation.
As each symbolic address is encountered in the second pass, the corresponding numeric address is substituted for it in the object code. Two of the most common logical errors in assembler-language programming involve improper use of symbols. If a symbol appears in the operand field of some instruction, but nowhere in a label field, it is undefined. If a symbol appears in the label fields of more than one instruction, it is multiply defined.
In building the symbol table on the first pass, the assembler must examine the label field of each instruction to permit it to associate the location counter value with each symbol. Multiply-defined symbols will be found on this pass. Undefined symbols, on the other hand, will not be found on the first pass unless the assembler also examines operand fields for symbols. Although this examination is not required for construction of the symbol table, normal practice is to perform it anyhow, because of its value in early detection of program errors. There are many ways to organize a symbol table. The organisation of a symbol table will not be discussed in this Unit.
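The first pass described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the course material: the tuple encoding of source lines, the names LENGTH, PROGRAM and first_pass, and the use of a dictionary as the symbol table are assumptions made for the example. Instruction lengths follow the rule that an instruction is one word longer than its number of operands, CONST and SPACE are each taken to reserve one word, and lines 8 and 9 of the sample program are taken to be SUB LIMIT and a jump-if-positive spelled JMPPOS (the spelling is an assumption).

```python
# Length in words of each operation; CONST and SPACE each reserve one word.
LENGTH = {"ADD": 2, "JMP": 2, "JMPNEG": 2, "JMPPOS": 2, "JMPZERO": 2,
          "COPY": 3, "DIVIDE": 2, "LOAD": 2, "MULT": 2, "READ": 2,
          "STOP": 1, "STORE": 2, "SUB": 2, "WRITE": 2,
          "CONST": 1, "SPACE": 1}

# The sample program as (label, operation, operands) tuples (hypothetical encoding).
PROGRAM = [
    ("", "COPY", ["ZERO", "OLDER"]),
    ("", "COPY", ["ONE", "OLD"]),
    ("", "READ", ["LIMIT"]),
    ("", "WRITE", ["OLD"]),
    ("FRONT", "LOAD", ["OLDER"]),
    ("", "ADD", ["OLD"]),
    ("", "STORE", ["NEW"]),
    ("", "SUB", ["LIMIT"]),
    ("", "JMPPOS", ["FINAL"]),
    ("", "WRITE", ["NEW"]),
    ("", "COPY", ["OLD", "OLDER"]),
    ("", "COPY", ["NEW", "OLD"]),
    ("", "JMP", ["FRONT"]),
    ("FINAL", "WRITE", ["LIMIT"]),
    ("", "STOP", []),
    ("ZERO", "CONST", ["0"]),
    ("ONE", "CONST", ["1"]),
    ("OLDER", "SPACE", []),
    ("OLD", "SPACE", []),
    ("NEW", "SPACE", []),
    ("LIMIT", "SPACE", []),
]


def first_pass(program, start=0):
    """Build the symbol table and report multiply-defined and undefined symbols."""
    location = start            # the location counter
    table = {}                  # symbol -> address (None until its label is seen)
    multiply_defined = []
    for label, operation, operands in program:
        if label:
            if table.get(label) is not None:
                multiply_defined.append(label)
            else:
                table[label] = location
        if operation != "CONST":            # a CONST operand is a number, not a symbol
            for symbol in operands:
                table.setdefault(symbol, None)
        location += LENGTH[operation]
    undefined = [s for s, address in table.items() if address is None]
    return table, location, multiply_defined, undefined


table, location, multiply_defined, undefined = first_pass(PROGRAM)
print(table)
print("location counter at end of pass:", location)
```

Because operand fields are examined as well as label fields, a symbol that never appears as a label is still entered in the table and is reported as undefined at the end of the pass.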
The state of processing after line 3 is shown in Figure 6. During processing of line 1, the symbols ZERO and OLDER were encountered and entered into the first two positions of the symbol table. The operation COPY was identified, and instruction length information from Figure 5 was used to advance the location counter from 0 to 3. During processing of line 2, two more symbols were encountered and entered in the symbol table, and the location counter was advanced from 3 to 6. Line 3 yielded the fifth symbol, LIMIT, and caused the location counter to be incremented from 6 to 8. At this point the symbol table holds five symbols, none of which yet has an address. The location counter holds the address 8, and processing is ready to continue from line 4. Neither the line numbers nor the addresses shown in part (a) of the figure are actually part of the source-language program. The addresses record the history of incrementing the location counter; the line numbers permit easy reference. Clearly, the assembler needs not only a location counter, but also a line counter to keep track of which source line is being processed.
Line  Address  Label  Operation  Operand 1  Operand 2
 1     0              COPY       ZERO       OLDER
 2     3              COPY       ONE        OLD
 3     6              READ       LIMIT
(a) Source text scanned
Symbol  Address
ZERO    --
OLDER   --
ONE     --
OLD     --
LIMIT   --
Location counter: 8
Line counter: 4
(b) Symbol table and counters
Figure 6: First Pass After Scanning Line 3
During processing of line 4 the symbol OLD is encountered for the second time. Because it is already in the symbol table, it is not entered again. During processing of line 5, the symbol FRONT is encountered in the label field. It is entered into the symbol table, and the current location counter value, 10, is entered with it as its address. Figure 7 displays the state of the translation at the end of the first pass, after all 21 lines have been processed.
Line  Address  Label  Operation  Operand 1  Operand 2
 1     0              COPY       ZERO       OLDER
 2     3              COPY       ONE        OLD
 3     6              READ       LIMIT
 4     8              WRITE      OLD
 5    10       FRONT  LOAD       OLDER
 6    12              ADD        OLD
 7    14              STORE      NEW
 8    16              SUB        LIMIT
 9    18              JMPPOS     FINAL
10    20              WRITE      NEW
11    22              COPY       OLD        OLDER
12    25              COPY       NEW        OLD
13    28              JMP        FRONT
14    30       FINAL  WRITE      LIMIT
15    32              STOP
16    33       ZERO   CONST      0
17    34       ONE    CONST      1
18    35       OLDER  SPACE
19    36       OLD    SPACE
20    37       NEW    SPACE
21    38       LIMIT  SPACE
(a) Source text scanned
Symbol  Address
ZERO    33
OLDER   35
ONE     34
OLD     36
LIMIT   38
FRONT   10
NEW     37
FINAL   30
Location counter: 39
Line counter: 22
(b) Symbol table and counters
Figure 7
The object code generated on the second pass is shown in Figure 8. The XX in that figure can be thought of as a specification to the loader, which will eventually process the object code, that the content of the location corresponding to address 35 does not need to have any specific value loaded. The loader can then just skip over that location. Some assemblers nevertheless specify a particular value for reserved storage locations, often zero. There is no logical requirement to do so, however, and the user unfamiliar with his assembler is ill-advised to count on a particular value.
Address  Length  Machine code
00       3       13 33 35
03       3       13 34 36
06       2       12 38
08       2       08 36
10       2       03 35
12       2       02 36
14       2       07 37
16       2       06 38
18       2       01 30
20       2       08 37
22       3       13 36 35
25       3       13 37 36
28       2       00 10
30       2       08 38
32       1       11
33       1       00
34       1       01
35       1       XX
36       1       XX
37       1       XX
38       1       XX
Figure 8: Object Code Generated on 2nd Pass
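As a rough illustration of the second pass, the following Python sketch substitutes machine operation codes and symbol-table addresses into each line and emits records in the address, length, machine code layout of Figure 8. It is a sketch under stated assumptions rather than a definitive implementation: the names OPCODE, SYMBOLS, PROGRAM and second_pass are hypothetical, the symbol addresses are those collected at the end of the first pass, opcode 01 is assumed to belong to the jump-if-positive instruction (spelled JMPPOS here), and "XX" marks a reserved word with no specified value.

```python
# Machine operation codes in decimal (JMPPOS = 01 is an assumption of this sketch).
OPCODE = {"ADD": 2, "JMP": 0, "JMPNEG": 5, "JMPPOS": 1, "JMPZERO": 4,
          "COPY": 13, "DIVIDE": 10, "LOAD": 3, "MULT": 14, "READ": 12,
          "STOP": 11, "STORE": 7, "SUB": 6, "WRITE": 8}

# Symbol table as left by the first pass.
SYMBOLS = {"ZERO": 33, "OLDER": 35, "ONE": 34, "OLD": 36,
           "LIMIT": 38, "FRONT": 10, "NEW": 37, "FINAL": 30}

# Labels are no longer needed on the second pass, so only the
# operation and operand fields of the 21 source lines are kept.
PROGRAM = [
    ("COPY", ["ZERO", "OLDER"]), ("COPY", ["ONE", "OLD"]),
    ("READ", ["LIMIT"]), ("WRITE", ["OLD"]),
    ("LOAD", ["OLDER"]), ("ADD", ["OLD"]), ("STORE", ["NEW"]),
    ("SUB", ["LIMIT"]), ("JMPPOS", ["FINAL"]), ("WRITE", ["NEW"]),
    ("COPY", ["OLD", "OLDER"]), ("COPY", ["NEW", "OLD"]),
    ("JMP", ["FRONT"]), ("WRITE", ["LIMIT"]), ("STOP", []),
    ("CONST", ["0"]), ("CONST", ["1"]),
    ("SPACE", []), ("SPACE", []), ("SPACE", []), ("SPACE", []),
]


def second_pass(program, symbols, start=0):
    """Return a list of (address, length, words) object-code records."""
    location = start
    records = []
    for operation, operands in program:
        if operation == "SPACE":
            words = ["XX"]                              # reserved, value unspecified
        elif operation == "CONST":
            words = [f"{int(operands[0]):02d}"]         # constant translated to a word
        else:
            words = [f"{OPCODE[operation]:02d}"]
            words += [f"{symbols[name]:02d}" for name in operands]
        records.append((location, len(words), words))
        location += len(words)
    return records


for address, length, words in second_pass(PROGRAM, SYMBOLS):
    print(f"{address:02d} {length} {' '.join(words)}")
```

Looking up `symbols[name]` raises a KeyError for a symbol that is missing from the table, which is one simple way an undefined symbol would surface on this pass.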
The specifications CONST and SPACE do not correspond to machine instructions. They are really instructions to the assembler program. Because of this, we shall refer to them as assembler instructions. Another common designation for them is pseudo-instructions. Neither term is really satisfactory. Of the two types of assembler instructions in our example program, one results in the generation of machine code and the other in the reservation of storage. Later we shall see assembler instructions which result in neither of these actions. The assembler must be able to tell assembler instructions apart from machine operations. One organization is to use a separate table for assembler instructions, which is usually searched before the operation code table is searched. Another is to include both machine operations and assembler instructions in the same table. A field in the table entry then identifies the type to the assembler.
A few variations to the foregoing process can be considered. Some of the translation can actually be performed during the first pass. Operation fields must be examined during the first pass to determine their effect on the location counter. The second-pass table lookup to determine the machine operation code can be obviated at the cost of producing intermediate text which holds the machine operation code and instruction length in addition to the source text.
Another translation which can be performed during the first pass is that of constants, e.g. from source-language decimal to machine-language binary. The translation of any symbolic addresses which refer backward in the text, rather than forward, could be performed on the first pass, but it is more convenient to wait for the second pass and treat all symbolic addresses uniformly.
A minor variation is to assemble addresses relative to a starting address other than 0. The location counter is merely initialized to the desired address. If, for example, the value 200 is chosen, the symbol table would appear as in Figure 9. The object code corresponding to line 1 would be 200 3 13 233 235.
Symbol Address
ZERO 233
OLDER 235
ONE 234
OLD 236
LIMIT 238
FRONT 210
NEW 237
FINAL 230
Figure 9: Symbol Table with Starting Location 200
If it were known at assembly time that the program is to reside at location 200 for execution, then full object code with address and length need not be generated. The machine code alone would suffice. In this event the result of translation would be the following 39-word sequence.
13 233 235 13 234 236 12 238 08
236 03 235 02 236 07 237 06 238
01 230 08 237 13 236 235 13 237
236 00 210 08 238 11 00 01 XX
XX XX XX
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in his program, the macro processing assembler substitutes the entire block.
Macro Definition and Usage
The following example highlights the salient aspects of a macro processor. It is very similar to the assembly language of Intel's 8-bit microprocessors.
Example:
MACRO
INCRMT  &A, &B
LOAD    &A              Macro
ADD     &B              definition
STORE   &A
ENDM
INCRMT  X,Y             LOAD    X
                        ADD     Y       Macro expansion
                        STORE   X
Figure 10: Macro program
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while an ENDM statement indicates the end of a macro definition. Thus, a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition units will exist at the start of the program. Each definition unit names a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence. The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In Figure 10, the macro call
INCRMT X,Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
DEFINING A MACRO
Let us take another look at the macro definition unit appearing in Figure 10. The macro header statement (MACRO) indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are the assembly statements which will replace the macro call as a result of macro expansion.
Positional Parameters
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character '&'. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character '&'. These are known as actual parameters.
The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In Figure 10, this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, and so on.
Consider the prototype and macro call statements once again:
INCRMT &A,&B ... prototype
INCRMT X,Y ... macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:
LOAD X
ADD Y
STORE X
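The positional pairing of formal and actual parameters can be sketched in Python as follows. This is a minimal illustration, assuming that statements are plain strings with comma-separated operands; the helper names split_statement and expand are hypothetical and are not taken from any real macro processor.

```python
import re


def split_statement(statement):
    """Split 'NAME A, B' into the name and a list of operands."""
    name, _, rest = statement.strip().partition(" ")
    operands = [part.strip() for part in rest.split(",") if part.strip()]
    return name, operands


def expand(prototype, model_statements, call):
    """Expand one macro call using positional parameter correspondence."""
    macro_name, formals = split_statement(prototype)
    call_name, actuals = split_statement(call)
    if call_name != macro_name or len(actuals) != len(formals):
        raise ValueError("call does not match prototype")
    # The first actual parameter is paired with the first formal, and so on.
    binding = dict(zip(formals, actuals))

    def substitute(match):
        # Replace a formal parameter such as &A by its actual parameter.
        return binding.get(match.group(0), match.group(0))

    return [re.sub(r"&\w+", substitute, line) for line in model_statements]


expansion = expand("INCRMT &A, &B",
                   ["LOAD &A", "ADD &B", "STORE &A"],
                   "INCRMT X,Y")
print(expansion)   # ['LOAD X', 'ADD Y', 'STORE X']
```

The binding is built once per call, before any model statement is processed, so each model statement needs only a simple lookup for every formal parameter it contains.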
Schematics for Macro-Expansion
Above we touched upon the fundamental aspects of macro expansion. From the discussion, it appears that the process of macro expansion is similar to language translation. The source program containing macro definitions and calls is translated into an assembly language program without any macro definitions or calls. This form of the program can now be handed over to a conventional assembler to obtain the target language form of the program.
In such a schematic (Figure 11), the process of macro expansion is completely segregated from the process of assembling the program. The translator which performs macro expansion in this manner is called a macro pre-processor. The advantage of this scheme is that any existing conventional assembler can be enhanced in this manner to incorporate macro processing. It would reduce the programming cost involved in making a macro facility available to programmers using a computer system. The disadvantage is that this scheme is probably not very efficient, because of the time spent in generating assembly language statements and processing them again for the purpose of translation to the target language.
Fig. 11: A pre-processor based scheme for macro assembly
ISSUES RELATED TO THE DESIGN OF A MACRO PRE-PROCESSOR
As against this schematic of prefixing a conventional assembler with a macro pre-processor, it is possible to design a macro assembler which not only processes macro definitions and macro calls for the purpose of expansion, but also assembles the expanded statements along with the original assembly statements. The macro assembler should require fewer passes over the program than the pre-processor scheme. This holds out a promise of better efficiency. For the sake of simplicity, however, in this section we will discuss the issues related to the implementation of a macro pre-processor rather than an actual implementation.
Our discussion regarding the definition and use of macros in an assembly program has brought out to some extent the working principles of a macro pre-processor. To summarise, we should be able to differentiate between macro names and invalid operation code mnemonics. On thus recognizing a call on a macro, we should be able to access the text of its definition so that we can expand the call. For generating a statement during expansion, we need to develop a simple scheme for substituting each appearance of a formal parameter with its value. Correspondence between a formal parameter and its value will have to be established for this purpose. It is desirable that, instead of performing this action for every appearance of a formal parameter, correspondence between formal parameters and their values should be established once and for all, at the start of macro expansion.
Considerations of positional and keyword correspondence would thus be localized to the start of macro expansion only. This would have the further advantage that no distinction would need to be made between keyword and positional parameters during macro expansion.
Step 1:
Scan all macro definitions one by one. For each macro defined:
enter its name in the Macro Name Table (MNT);
store the entire macro definition in the Macro Definition Table (MDT);
add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT.
Step 2:
Examine all statements in the assembly source program to detect macro calls. For each macro call:
locate the macro in the MNT;
obtain information from the MNT regarding the position of the macro definition in the MDT;
process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters);
expand the macro call by following the procedure given in Step 3.
Step 3:
Process the statements in the macro definition as found in the MDT in their expansion time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion time relations between values of formal parameters and expansion time variables.
In order to have a complete working scheme within the above framework, we need to finalise the following details:
Method of establishing correspondence between a formal parameter and its value.
Method of sequencing through the statements comprising a macro definition in expansion time order.
Method of expanding a model statement.
Allocation of storage for expansion time variables and access to their values during expansion.
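The three steps above can be sketched as a small Python pre-processor. This is a simplified sketch under several assumptions, not a complete design: the function name preprocess and the table layouts are hypothetical, only positional parameters are supported, the conditional assembly statements AIF and AGO and expansion time variables are left out, and because macro definitions precede the main program, Steps 1 and 2 are folded into a single scan of the source.

```python
import re


def preprocess(source_lines):
    """Expand macro calls; return (expanded program, MNT, MDT)."""
    mnt = {}     # macro name -> (index of its first statement in the MDT, formals)
    mdt = []     # model statements of every macro, each definition closed by ENDM
    output = []
    lines = iter(source_lines)
    for line in lines:
        fields = line.replace(",", " ").split()
        if not fields:
            continue
        if fields[0] == "MACRO":
            # Step 1: enter the name in the MNT and store the definition in the MDT.
            prototype = next(lines).replace(",", " ").split()
            mnt[prototype[0]] = (len(mdt), prototype[1:])
            for body in lines:
                mdt.append(body.strip())
                if body.strip() == "ENDM":
                    break
        elif fields[0] in mnt:
            # Step 2: locate the macro and bind formal to actual parameters once.
            start, formals = mnt[fields[0]]
            binding = dict(zip(formals, fields[1:]))
            # Step 3: process model statements in order until ENDM is reached.
            index = start
            while mdt[index] != "ENDM":
                output.append(re.sub(
                    r"&\w+",
                    lambda m: binding.get(m.group(0), m.group(0)),
                    mdt[index]))
                index += 1
        else:
            output.append(line.strip())     # ordinary assembly statement
    return output, mnt, mdt


source = ["MACRO", "INCRMT &A, &B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
          "INCRMT X, Y", "STOP"]
expanded, mnt, mdt = preprocess(source)
print(expanded)   # ['LOAD X', 'ADD Y', 'STORE X', 'STOP']
```

Supporting AIF and AGO would mean replacing the simple `index += 1` in Step 3 with a jump to another position in the MDT, which is why the definition is stored as an indexed table rather than expanded on the fly.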
COMPILER/ LINKER LOADER
LOADERS
INTRODUCTION
The purpose of this section is to discuss the various functions of a loader. The loader is a program which accepts object code and prepares it for execution. The object code produced by an assembler or compiler cannot be executed without modification. As many as four more functions must be performed first. These functions, which are performed by a loader, are:
Allocation of space in main memory for the programs.
Linking of programs with each other, for example with library programs.
Relocation, that is, adjusting all address-dependent locations, such as address constants, to correspond to the allocated space.
Loading, that is, physically placing the machine instructions and data into memory.
Figure 1 shows the functions of a loader.
Fig. 1: Function of a loader.
Let us examine the need for some of these functions of the loader.
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a 'stand-alone' nature. That is, a program generally cannot execute on its own, without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language such as C. Such a program may contain calls on certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
RELOCATION
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them might go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus, A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage locations. Therefore, the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste. It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
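Adding a constant to each relative address can be sketched in a few lines. The word values, the list of address-field positions and the load address of 500 below are made up for illustration; a real object format would carry this relocation information in its own layout.

```python
def relocate(code, relocation_offsets, load_address):
    """Return a copy of `code` with every marked address field adjusted.

    `code` is a list of words translated as if the segment started at 0.
    `relocation_offsets` lists the positions that hold relative addresses.
    """
    relocated = list(code)
    for offset in relocation_offsets:
        relocated[offset] += load_address  # add the constant to the address field
    return relocated

# Hypothetical segment: the words at positions 1 and 3 are address fields.
code = [10, 4, 20, 0, 99]
relocation_offsets = [1, 3]
relocated = relocate(code, relocation_offsets, 500)
print(relocated)  # [10, 504, 20, 500, 99]
```

Only the marked address fields change; the remaining words are copied untouched, which is why relocation is an adjustment of address fields rather than a movement of the program.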
LOADER SCHEMES
There are several schemes for accomplishing the four loading functions. These schemes are (i) Absolute Loader (ii) Relocating Loader (iii) Direct Linking Loader (iv) Dynamic Loading (v) Dynamic Linking etc.
Absolute Loader: The task of an absolute loader is virtually trivial. The loader simply accepts the machine language code produced by the assembler and places it into main memory at the location specified by the assembler.
Relocating Loader: The general class of relocating loaders was introduced to avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer. A relocating loader is given the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader, and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments. This provides
flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs. The other two loader schemes will be discussed in the next section.
Dynamic Loading And Linking: There are numerous variations to the previously presented loader schemes. One disadvantage of the direct-linking loader, as presented, is that it is necessary to allocate, relocate, link, and load all of the subroutines each time in order to execute a program. Since there may be tens and often hundreds of subroutines involved, especially when we include utility routines such as SQRT etc., this loading process can be extremely time-consuming.
Furthermore, even though the loader program may be smaller than the assembler, it does absorb a considerable amount of space. These problems can be solved by dividing the loading process into two separate programs: a binder and a module loader. A binder is a program that performs the same functions as the direct-linking loader in binding subroutines together, but rather than placing the relocated and linked text directly into memory, it outputs the text as a file. This output file is in a format ready to be loaded and is typically called a load module. The module loader merely has to physically load the module into main memory. The binder essentially performs the functions of allocation, relocation, and linking; the module loader merely performs the function of loading. There are two major classes of binders. The simplest type produces a load module that looks very much like a single absolute loader file. This means that the specific memory allocation of the program is performed at the time that the subroutines are bound together. A more sophisticated binder, called a linkage editor, can keep track of the relocation information so that the resulting load module can be further relocated and thereby loaded anywhere in memory. In this case the module loader must perform additional allocation and relocation as well as loading, but it does not have to worry about the complex problems of linking.
In both cases, a program that is to be used repeatedly need only be bound once and then can be loaded whenever required. The first binder is relatively simple and fast. The second one (linkage editor binder) is somewhat more complex but allows a more flexible allocation and loading scheme.
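The difference between the two classes of binder can be sketched as below. This is a simplified model under stated assumptions: a module is a list of words plus the set of positions holding module-relative addresses, cross-module linking is left out, and the names `bind` and `module_loader` are invented for the example.

```python
def bind(modules, origin=None):
    """Lay modules out back to back and return a load module.

    With `origin` given, addresses are fixed at bind time (simple binder).
    Without it, relocation information is kept (linkage-editor style).
    """
    base = 0 if origin is None else origin
    text, relocation = [], []
    for words, address_fields in modules:
        start = len(text)  # where this module begins inside the load module
        for i, word in enumerate(words):
            if i in address_fields:
                word += base + start       # rebase the module-relative address
                relocation.append(start + i)
            text.append(word)
    if origin is None:
        return {"text": text, "relocation": relocation}
    return {"text": text, "origin": origin}

def module_loader(load_module, load_address=None):
    """Place a load module in memory, relocating it only if it is relocatable."""
    text = list(load_module["text"])
    if "relocation" in load_module:
        for i in load_module["relocation"]:
            text[i] += load_address
        start = load_address
    else:
        start = load_module["origin"]
    return {start + i: word for i, word in enumerate(text)}

# Two hypothetical modules; position 1 of each holds a relative address.
modules = [([1, 2, 7], {1}), ([5, 0], {1})]
absolute_image = module_loader(bind(modules, origin=1000))
relocatable_image = module_loader(bind(modules), load_address=1000)
print(absolute_image == relocatable_image)  # True
```

Both routes give the same memory image at address 1000, but only the relocatable load module could also be loaded somewhere else without binding again.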
Dynamic Loading
In each of the previous loader schemes we have assumed that all of the subroutines needed are loaded into main memory at the same time. If the total amount of memory required by all these subroutines exceeds the amount available, as is common with large programs on small computers, there is trouble! There are several hardware techniques, such as paging and segmentation, that attempt to solve this problem.
Usually the subroutines of a program are needed at different times: for example, pass 1 and pass 2 of an assembler are mutually exclusive (pass 1 and pass 2 need not simultaneously occupy memory resources). By explicitly recognizing which subroutines call other subroutines it is possible to produce an overlay structure that identifies mutually exclusive subroutines.
Figure 2 illustrates a program consisting of five subprograms (A, B, C, D and E) that require 100K bytes of memory. The arrows indicate that subprogram A only calls B, D and E; subprogram B only calls C and E; subprogram D only calls E; and subprograms C and E do not call any other routines. Figure 2(a) highlights the interdependencies between the procedures. Note that procedures B and D are never in use at the same time; neither are C and E. If we load only those procedures that are actually to be used at any particular time, the amount of memory needed is equal to the longest path of the overlay structure. This happens to be 70K for the example in Figure 2(b): procedures A, B and C. Figure 2(c) illustrates a storage assignment for each procedure consistent with the overlay structure.
In order for the overlay structure to work it is necessary for the module loader to load the various procedures as they are needed. We will not go into their specific details, but there
are many binders capable of processing and allocating an overlay structure. The portion of the loader that actually intercepts the calls and loads the necessary procedure is called the overlay supervisor or simply the flipper. This overall scheme is called dynamic loading or load-on-call.
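The memory needed by an overlay structure is the size of its longest call path, which can be computed with a short recursive sketch. The call relationships below are those described for subprograms A to E; the individual sizes are assumptions chosen only so that they add up to 100K, since the text does not list them.

```python
# Assumed sizes in K (hypothetical, chosen to total 100K).
sizes = {"A": 20, "B": 20, "C": 30, "D": 10, "E": 20}

# Which subprogram calls which, as described in the text.
calls = {"A": ["B", "D", "E"], "B": ["C", "E"], "C": [], "D": ["E"], "E": []}

def overlay_memory(root):
    """Memory needed when only one call path is resident at a time."""
    # Only the most demanding callee path must fit below this procedure.
    deepest = max((overlay_memory(callee) for callee in calls[root]), default=0)
    return sizes[root] + deepest

total = sum(sizes.values())
needed = overlay_memory("A")
print(total)   # 100: everything loaded at once
print(needed)  # 70: longest path A, B, C with the assumed sizes
```

With these assumed sizes the saving comes from B and D, and from C and E, sharing the same storage at different times.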
Fig. 2 (A)-(D) : Dynamic Loading
DYNAMIC LINKING
The major disadvantage of all of the previous loading schemes is that if a subroutine is referenced but never executed (e.g. if the programmer had placed a call statement in his program but this statement was never executed because a condition was not satisfied) the loader would still incur the overhead of linking the subroutine.
Furthermore, all of these schemes require the programmer to explicitly name all procedures that might be called. A very general type of loading scheme is called dynamic linking. This is a mechanism by which loading and linking of external references are postponed until execution time. The loader loads only the main program. If the main program should execute a transfer instruction to an external address, or should reference an external variable (that is, a variable that has not been defined in this procedure segment), the loader is called. Only then is the segment containing the external reference loaded. An advantage here is that no overhead is incurred unless the procedure to be called or referenced is actually used. A further advantage is that the system can be dynamically reconfigured. The major drawback to using this type of loading scheme is the considerable overhead and complexity incurred, due to the fact that we have postponed most of the binding process until execution time.
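Postponing loading and linking until the first reference can be imitated as follows. This is a loose analogy rather than a real loader: the class name `DynamicLinker`, the `library` table and the `report` routine are all hypothetical, and "loading a segment" is modelled as calling a factory function.

```python
import math

class DynamicLinker:
    """Loads a routine only when it is called for the first time."""

    def __init__(self, library):
        self.library = library  # name -> factory that "loads" the segment
        self.loaded = {}        # segments brought into memory so far

    def call(self, name, *args):
        if name not in self.loaded:
            # First reference: the loader is invoked and the segment is bound now.
            self.loaded[name] = self.library[name]()
        return self.loaded[name](*args)

# Hypothetical library of external routines available for linking.
library = {
    "sqrt": lambda: math.sqrt,
    "report": lambda: (lambda text: text.upper()),
}

linker = DynamicLinker(library)
result = linker.call("sqrt", 16.0)
print(result)               # 4.0
print(list(linker.loaded))  # ['sqrt']
```

The `report` routine is named in the library but never called, so it is never loaded, which is the saving the scheme offers. The price is the extra check on every call.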
Now we will discuss the implementation of the simplest type of loader scheme, which is called an absolute loader.
Implementation of an Absolute Loader
Absolute loaders are simple to implement but they do have disadvantages. First, the programmer must specify to the assembler the address in memory where the program is to be loaded. Further, if there are multiple functions to be called within a program, the programmer must remember the address of each and use that absolute address explicitly in his other functions to perform linking of functions. Figure 3 illustrates the operation of an absolute loader. The programmer must be careful not to assign two functions to the same or overlapping addresses.
Figure 3 : Absolute Loader
The program first.c is assigned to locations 100-300 and the sqrt function is assigned locations 400-450. If changes were made to first.c that increased its length to more than 300 bytes, the end of first.c (at 100+300 = 400) would overlap the start of sqrt (at 400). It would then be necessary to assign sqrt to a new address. Furthermore, it would also be necessary to modify all other functions that referred to sqrt. In situations where dozens of subroutines are being used, this manual shuffling can get very complex, tedious and wasteful of time and memory.
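The placement performed by an absolute loader, and the overlap problem just described, can be sketched as below. Memory is modelled as a dictionary from address to word, the segment contents are dummy zeros, and the function name `load_absolute` is an assumption made for the example.

```python
def load_absolute(segments):
    """Copy each (name, start, words) segment to its assigned address."""
    memory = {}
    owner = {}  # address -> name of the segment occupying it
    for name, start, words in segments:
        for i, word in enumerate(words):
            address = start + i
            if address in owner:
                raise ValueError(
                    f"{name} overlaps {owner[address]} at address {address}")
            memory[address] = word
            owner[address] = name
    return memory

# first.c at 100 and sqrt at 400, with hypothetical lengths.
memory = load_absolute([("first.c", 100, [0] * 200), ("sqrt", 400, [0] * 50)])
print(len(memory))  # 250

# If first.c grows past 300 words it runs into sqrt at address 400.
try:
    load_absolute([("first.c", 100, [0] * 301), ("sqrt", 400, [0] * 50)])
    message = ""
except ValueError as error:
    message = str(error)
print(message)  # sqrt overlaps first.c at address 400
```

The loader itself does no allocation or relocation here; it can at best detect the clash, and the programmer still has to pick a new address for sqrt by hand.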
The four loader functions are accomplished as follows in an absolute loading scheme:
Macro Program
Macro definition:
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X, Y
Macro expansion:
LOAD X
ADD Y
STORE X
COMPILER
The study of compiler design forms a central theme in the field of computer science. An understanding of the techniques used by high level language compilers can give the programmer a set of skills applicable in many aspects of software design - one does not have to be a compiler writer to make use of them.
An assembler translates an assembly language program into machine language. Here we will look at another type of translator, called a compiler. Compiler writing is not confined to one discipline only but rather spans several other disciplines: programming languages, computer architecture, theory of programming languages, algorithms, etc. Today a few basic compiler writing techniques can be used to construct translators for a wide variety of languages. This unit is intended as an introduction to the basic essential features of compiler design.
WHAT IS A COMPILER?
A compiler is software (a program) that reads a program written in a source language and translates it into an equivalent program in another language - the target language (see figure 4). An important aspect of the compilation process is to produce diagnostics (error messages) about the source program. These error messages are mainly due to grammatical mistakes made by the programmer. A familiarity with the material covered in this unit will be a great help in understanding the inner functioning of a compiler.
Fig. 4 : A Compiler
There are thousands of source languages, ranging from C and PASCAL to specialized languages that have arisen in virtually every area of computer application. Target languages also number in the thousands. A target language may be another programming language, the machine language or an assembly language. Compilers are classified as single-pass, multi-pass, debugging or optimizing, depending on how they have been constructed or on what functions they are supposed to perform. Earlier (in the 1950s) compilers were considered difficult programs to write.
The first FORTRAN compiler, for example, took 18 staff-years to implement. But now several new techniques and tools have been developed for handling many of the important tasks that occur during the compilation process. Good implementation languages, programming environments (editors, debuggers, etc.) and software tools have also been developed. With these developments, compiler writing has become easier.
Approaches To Compiler Development
There are several approaches to compiler development. Here we will look at some of them.
Assembly Language Coding
Early compilers were mostly coded in assembly language. The main consideration was to increase efficiency. This approach worked very well for small High Level Languages (HLL). As languages and their compilers became larger, lots of bugs started surfacing which were difficult to remove. The major difficulty with assembly language implementation was poor software maintenance.
Around this time, it was realised that coding the compilers in a high level language would overcome this disadvantage of poor maintenance. Many compilers were therefore coded in FORTRAN, the only widely available HLL at that time. For example, the FORTRAN H compiler for the IBM/360 was coded in FORTRAN. Later many system programming languages were developed to ensure efficiency of compilers written in an HLL. Assembly language is still being used, but the trend is towards compiler implementation through HLL.
Cross-Compiler
A cross-compiler is a compiler which runs on one machine and generates code for another machine. The only difference between a cross-compiler and a normal compiler is in terms of the code generated by it. For example, consider the problem of implementing a Pascal compiler on a new piece of hardware (a computer called X) on which assembly language is the only programming language already available. Under these circumstances, the obvious approach is to write the Pascal compiler in assembler. Hence, the compiler in this case is a program that takes Pascal source as input, produces machine code for the target machine as output and is written in the assembly language of the target machine. The languages characterizing this compiler can be represented as:
The representation shows that Pascal source is translated by a program written in X assembly language (the compiler) running on machine X into X's object code. This code can then be run on the target machine. This notation is essentially equivalent to the T-diagram. The T-diagram for this compiler is shown in figure 5.
Fig. 5 T-diagram
The language accepted as input by the compiler is stated on the left, the language output by the compiler is shown on the right and the language in which the compiler is written is shown at the bottom. The advantage of this particular notation is that several T-diagrams can be meshed together to represent more complex compiler implementation methods. This compiler implementation involves a great deal of work since a large assembly language program has to be written for X. Note that in this case the compiler is very machine specific; that is, not only does it run on X but it also produces machine code suitable for running on X.
Furthermore, only one computer is involved in the entire implementation process.
The use of a high-level language for coding the compiler can offer great savings in implementation effort. If the language in which the compiler is being written is already available on the computer in use, then the process is simple. For example, Pascal might already be available on machine X, thus permitting the coding of, say, a Modula-2 compiler in Pascal.
Such a compiler can be represented as:
If the language in which the compiler is being written is not available on the machine, then all is not lost, since it may be possible to make use of an implementation of that language on another machine. For example, a Modula-2 compiler could be implemented in Pascal on machine Y, producing object code for machine X:
The object code for X generated on machine Y would of course have to be transferred to X for its execution. This process of generating code on one machine for execution on another is called cross-compilation.
At first sight, the introduction of a second computer to the compiler implementation plan seems to offer a somewhat inconvenient solution. Each time a compilation is required, it has to be done on machine Y and the object code transferred, perhaps via a slow or laborious mechanism, to machine X for execution. Furthermore, both computers have to be running and inter-linked somehow for this approach to work.
BOOTSTRAPPING
Bootstrapping is the concept of developing a compiler for a language by using a subset (a small part) of the same language. Suppose that a Modula-2 compiler is required for machine X, and that the compiler is to be coded in Modula-2. Coding the compiler in the language it is to compile is nothing special and, as will be seen, it has a great deal in its favour. Suppose further that Modula-2 is already available on machine Y. In this case, the compiler can be run on machine Y, producing object code for machine X:
This is the same situation as before except that the compiler is coded in Modula-2 rather than Pascal. The special feature of this approach appears in the next step. The compiler, running on Y, is nothing more than a large program written in Modula-2. Its function is to translate an input file of Modula-2 statements into a functionally equivalent sequence of statements in X's machine code.
Therefore, the source statements of this Modula-2 compiler can be passed into itself running on Y to produce a file containing X's machine code. This file is of course a Modula-2 compiler, which is capable of being run on X. By making the compiler compile itself, a version of the compiler that runs on X has been created.
Once this machine code has been transferred to X, a self-sufficient Modula-2 compiler is available on X; hence there is no further use for machine Y for supporting Modula-2 compilation.
This implementation plan is very attractive. Machine Y is only required for compiler development and once this development has reached the stage at which the compiler can (correctly) compile itself, machine Y is no longer required. Consequently, the original compiler implemented on Y need not be of the highest quality - for example, optimization can be completely disregarded. Further development (and obviously conventional use) of the compiler can then continue at leisure on machine X.
This approach to compiler implementation is called bootstrapping. Many languages, including C, Pascal, FORTRAN and LISP have been implemented in this way.
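The two bootstrapping steps can be traced with a small model of the T-diagram, in which a compiler is described only by its source language, its target language and the language it is written in. The tuple type `Compiler`, the function `compile_compiler` and the labels "X code" and "Y code" are names made up for this sketch, and the compiler assumed to exist already on Y is taken to produce Y's machine code.

```python
from collections import namedtuple

# The three languages of a T-diagram: left, right and bottom.
Compiler = namedtuple("Compiler", ["source", "target", "written_in"])

def compile_compiler(subject, tool):
    """Translate the compiler `subject` using the compiler `tool`."""
    if subject.written_in != tool.source:
        raise ValueError("tool cannot read the language the subject is written in")
    # What the subject translates is unchanged; only its own form changes.
    return Compiler(subject.source, subject.target, tool.target)

# Assumed starting point: a Modula-2 compiler that already runs on Y.
existing_on_y = Compiler("Modula-2", "Y code", "Y code")

# The new compiler: written in Modula-2, producing code for X.
new_compiler_source = Compiler("Modula-2", "X code", "Modula-2")

# Step 1: compile it on Y, giving a cross-compiler that runs on Y.
cross = compile_compiler(new_compiler_source, existing_on_y)

# Step 2: pass the same source through the cross-compiler.
native = compile_compiler(new_compiler_source, cross)

print(cross)
print(native)
```

After step 2 the compiler is expressed in X's machine code and produces X's machine code, so machine Y is no longer needed.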
Pascal was first implemented by writing a compiler in Pascal itself. This was done through several
bootstrapping processes. The compiler was then translated "by hand" into an available low level
language.
Compiler Designing Phases
The compiler, being a complex program, is developed through several phases. Each phase transforms the source program from one representation to another. The tasks of a compiler can be divided very broadly into two sub-tasks:
The analysis of a source program
The synthesis of the object program
In a typical compiler, the analysis task consists of 3 phases:
Lexical analysis
Syntax analysis
Semantic analysis
The synthesis task is usually considered as a code generation phase but it can be divided into some other distinct phases like intermediate code generation and code optimization. These four phases, functioning in sequence, are shown in figure 6. Code optimization is beyond the scope of this unit.
The nature of the interface between these four phases depends on the compiler. It is perfectly possible for the four phases to exist as four separate programs.
Fig. 6 Compiler Design Phases
Lexical Analysis
Lexical analysis, also called scanning, is the first phase of a compiler. It performs two important tasks. First, it scans a source program character by character from left to right and groups the characters into tokens (or syntactic elements) having a collective meaning. Each token or basic syntactic element represents a logically cohesive sequence of characters such as an identifier (also called a variable), a keyword (if, then, else, etc.), a multi-character operator (<=, etc.), and so on. The output of this phase goes to the next phase, i.e., syntax analysis or parsing. The interaction between the two phases is shown below in figure 7.
Fig. 7 Interaction between the first two phases
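The grouping of characters into tokens can be sketched with a small scanner. This is a minimal illustration, not the scanner of any particular compiler: the keyword set, the token categories and the sample input line are assumptions made for the example.

```python
import re

# A small, assumed set of keywords for the illustration.
KEYWORDS = {"if", "then", "else", "for", "to", "do"}

# Multi-character operators are listed before single characters so that
# "<=" is recognised as one token rather than "<" followed by "=".
TOKEN_PATTERN = re.compile(r"""
    (?P<number>\d+)
  | (?P<name>[A-Za-z][A-Za-z0-9]*)
  | (?P<operator><=|>=|:=|<>|[-+*/<>=\[\];()])
  | (?P<skip>\s+)
""", re.VERBOSE)

def tokenize(text):
    """Scan `text` left to right and return a list of (kind, lexeme) pairs."""
    tokens = []
    position = 0
    while position < len(text):
        match = TOKEN_PATTERN.match(text, position)
        if match is None:
            raise SyntaxError(
                f"illegal character {text[position]!r} at position {position}")
        position = match.end()
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":
            continue  # blanks and tabs are discarded
        if kind == "name":
            kind = "keyword" if lexeme.lower() in KEYWORDS else "identifier"
        tokens.append((kind, lexeme))
    return tokens

tokens = tokenize("if count <= 50 then total := total + 1")
for token in tokens:
    print(token)
```

Each pair produced here is what would be handed to the parser on request; an unrecognised character raises an error, which corresponds to the diagnostics produced in this phase.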
The second task performed during lexical analysis is to make an entry for each token in a symbol table if it is not already there. Some other tasks performed during lexical analysis are:
to remove all comments, tabs, blank spaces and machine characters.
to produce error messages (also called diagnostics) for errors that occur in a source program.
Let us consider the following Pascal language statement.
For i = 1 To 50 do sum = sum + x [i]; sum of numbers stored in array x
After going through the statement, the lexical analysis transforms it into the sequence of tokens