Chapter 2 Assemblers


Page 1: Chapter 2 Assemblers

Chapter 2 Assemblers

System Software

Chih-Shun Hsu

Page 2: Chapter 2 Assemblers

Basic Assembler Functions

Convert mnemonic operation codes to their machine language equivalents

Convert symbolic operands to their equivalent machine addresses

Build the machine instructions in the proper format

Convert the data constants specified in the source program into their machine representations

Write the object program and the assembly listing

Page 3: Chapter 2 Assemblers

Two Pass Assembler(2/1)

Forward reference: a reference to a label that is defined later in the program

Because of forward references, most assemblers make two passes over the source program

The first pass does little more than scan the source program for label definitions and assign addresses

The second pass performs most of the actual translation

Assembler directives (or pseudo-instructions) provide instructions to the assembler itself

Page 4: Chapter 2 Assemblers

Two Pass Assembler(2/2)

Pass 1 (define symbols):

Assign addresses to all statements in the program

Save the values (addresses) assigned to all labels

Perform some processing of assembler directives

Pass 2 (assemble instructions and generate object program):

Assemble instructions (translating operation codes and looking up addresses)

Generate data values defined by BYTE, WORD, etc.

Perform processing of assembler directives not done during Pass 1

Write the object program and the assembly listing

Page 5: Chapter 2 Assemblers

Assembler Data Structures and Variable

Two major data structures:

Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents

Symbol Table (SYMTAB): used to store values (addresses) assigned to labels

Variable:

Location Counter (LOCCTR): used to help in the assignment of addresses

LOCCTR is initialized to the beginning address specified in the START statement

The length of the assembled instruction or data area to be generated is added to LOCCTR

Page 6: Chapter 2 Assemblers

OPTAB and SYMTAB

OPTAB must contain the mnemonic operation code and its machine language equivalent

In more complex assemblers, it also contains information about instruction format and length

For a machine that has instructions of different lengths, we must search OPTAB in the first pass to find the instruction length for incrementing LOCCTR

SYMTAB includes the name and value (address) for each label, together with flags to indicate error conditions

OPTAB and SYMTAB are usually organized as hash tables, with mnemonic operation code or label name as the key, for efficient retrieval
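
A minimal sketch of the two tables as Python dictionaries (a Python dict is a hash table). The opcode values are the ones quoted elsewhere in these slides; the helper name insert_symbol and the "duplicate" flag string are assumptions made for illustration, not part of the slides.

```python
# OPTAB: mnemonic -> (machine opcode, instruction length in bytes). Static table.
OPTAB = {
    "LDA": (0x00, 3),
    "STL": (0x14, 3),
    "J": (0x3C, 3),
    "JSUB": (0x48, 3),
    "STCH": (0x54, 3),
}

# SYMTAB: label -> {"value": address, "flags": set of error flags}. Built in Pass 1.
SYMTAB = {}

def insert_symbol(label, locctr):
    """Enter a label; if it is already defined, flag it instead of overwriting."""
    if label in SYMTAB:
        SYMTAB[label]["flags"].add("duplicate")
    else:
        SYMTAB[label] = {"value": locctr, "flags": set()}

insert_symbol("FIRST", 0x1000)
insert_symbol("CLOOP", 0x1003)
insert_symbol("CLOOP", 0x2000)   # second definition: flagged, first value kept
```

Both lookups and insertions are keyed by the mnemonic or the label name, which is the access pattern the slide describes.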

Page 7: Chapter 2 Assemblers

Example of a SIC Assembler Language Program (3/1)

Page 8: Chapter 2 Assemblers

Example of a SIC Assembler Language Program (3/2)

int i;
for (i = 0; i < 4096; i++) {
    scanf("%c", &BUFFER[i]);
    if (BUFFER[i] == 0) break;
}
LENGTH = i;

Page 9: Chapter 2 Assemblers

Example of a SIC Assembler Language Program (3/3)

for (int i = 0; i < LENGTH; i++) {
    printf("%c", BUFFER[i]);
}

Page 10: Chapter 2 Assemblers

Program with Object Code (3/1)

14 1033 (STL RETADR: opcode 14 followed by address 1033 gives object code 141033)

Page 11: Chapter 2 Assemblers

Program with Object Code (3/2)

54 1039+8000=9039
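
The arithmetic in this annotation can be sketched as follows, assuming the standard SIC instruction layout (8-bit opcode, 1-bit index flag x, 15-bit address). The x flag is the high-order bit of the last two bytes, which is why indexed addressing adds 8000 (hex) to the address. The function name assemble_sic is hypothetical.

```python
def assemble_sic(opcode, address, indexed=False):
    """Pack opcode (8 bits), x flag (1 bit) and address (15 bits) into 6 hex digits."""
    if not 0 <= address < 0x8000:
        raise ValueError("SIC addresses must fit in 15 bits")
    word = (opcode << 16) | (0x8000 if indexed else 0) | address
    return f"{word:06X}"

print(assemble_sic(0x14, 0x1033))                # STL RETADR   -> 141033
print(assemble_sic(0x54, 0x1039, indexed=True))  # STCH BUFFER,X -> 549039
```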

Page 12: Chapter 2 Assemblers

Program with Object Code (3/3)

Page 13: Chapter 2 Assemblers

SYMTAB

Symbol  Value  Flags
FIRST   1000
CLOOP   1003
ENDFIL  1015
EOF     102A
THREE   102D
ZERO    1030
RETADR  1033
LENGTH  1036
BUFFER  1039
RDREC   2039
RLOOP   203F
EXIT    2057
INPUT   205D
MAXLEN  205E
WRREC   2061
WLOOP   2064
OUTPUT  2079

Page 14: Chapter 2 Assemblers

Object Program Format

Header record (H)

Col. 2-7 Program name

Col. 8-13 Starting address of object program (Hex)

Col. 14-19 Length of object program in bytes (Hex)

Text record (T)

Col. 2-7 Starting address for object code in this record (Hex)

Col. 8-9 Length of object code in this record in bytes (Hex)

Col. 10-69 Object code, represented in Hex

End record (E)

Col. 2-7 Address of first executable instruction in object program (Hex)
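
A short sketch of how the three record types could be produced, following the column layout above and assuming Col. 1 holds the record-type letter. The function names are hypothetical, and the program name COPY and length 107A in the example are assumed values, not taken from this slide.

```python
def header_record(name, start, length):
    # Col. 1 'H', Col. 2-7 name (padded to 6), Col. 8-13 start, Col. 14-19 length
    return f"H{name:<6.6}{start:06X}{length:06X}"

def text_record(start, object_codes):
    # Col. 1 'T', Col. 2-7 start, Col. 8-9 length in bytes, Col. 10-69 object code
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a Text record holds at most 60 hex digits (Col. 10-69)")
    return f"T{start:06X}{len(body) // 2:02X}{body}"

def end_record(first_instruction):
    # Col. 1 'E', Col. 2-7 address of the first executable instruction
    return f"E{first_instruction:06X}"

print(header_record("COPY", 0x1000, 0x107A))      # HCOPY  00100000107A
print(text_record(0x1000, ["141033", "549039"]))  # T00100006141033549039
print(end_record(0x1000))                         # E001000
```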

Page 15: Chapter 2 Assemblers

Object Program

Page 16: Chapter 2 Assemblers

Algorithm for Pass 1 of Assembler(3/1)

read first input line
if OPCODE = 'START' then
   begin
      save #[OPERAND] as starting address
      initialize LOCCTR to starting address
      write line to intermediate file
      read next input line
   end {if START}
else
   initialize LOCCTR to 0
while OPCODE ≠ 'END' do
   begin
      if this is not a comment line then
         begin
            if there is a symbol in the LABEL field then

Page 17: Chapter 2 Assemblers

Algorithm for Pass 1 of Assembler(3/2)

               begin
                  search SYMTAB for LABEL
                  if found then
                     set error flag (duplicate symbol)
                  else
                     insert (LABEL, LOCCTR) into SYMTAB
               end {if symbol}
            search OPTAB for OPCODE
            if found then
               add 3 {instruction length} to LOCCTR
            else if OPCODE = 'WORD' then
               add 3 to LOCCTR
            else if OPCODE = 'RESW' then
               add 3 * #[OPERAND] to LOCCTR

Page 18: Chapter 2 Assemblers

Algorithm for Pass 1 of Assembler(3/3)

            else if OPCODE = 'RESB' then
               add #[OPERAND] to LOCCTR
            else if OPCODE = 'BYTE' then
               begin
                  find length of constant in bytes
                  add length to LOCCTR
               end {if BYTE}
            else
               set error flag (invalid operation code)
         end {if not a comment}
      write line to intermediate file
      read next input line
   end {while not END}
write last line to intermediate file
save (LOCCTR - starting address) as program length
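
The Pass 1 algorithm above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the slides' implementation: source lines are already parsed into (label, opcode, operand) tuples, comment lines have been filtered out, and the intermediate file is a list in memory. The names pass1 and byte_length and the shortened sample program are hypothetical.

```python
def byte_length(operand):
    """Length in bytes of a BYTE constant such as C'EOF' or X'F1'."""
    body = operand[2:-1]
    return len(body) if operand[0] == "C" else len(body) // 2

def pass1(lines, optab):
    """Return (symtab, errors, intermediate, program_length)."""
    symtab, errors, intermediate = {}, [], []
    start = 0
    it = iter(lines)
    label, opcode, operand = next(it)
    if opcode == "START":
        start = int(operand, 16)                 # operand is a hex address
        intermediate.append((start, label, opcode, operand))
        label, opcode, operand = next(it)
    locctr = start
    while opcode != "END":
        intermediate.append((locctr, label, opcode, operand))
        if label:
            if label in symtab:
                errors.append(f"duplicate symbol {label}")
            else:
                symtab[label] = locctr
        if opcode in optab or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += byte_length(operand)
        else:
            errors.append(f"invalid operation code {opcode}")
        label, opcode, operand = next(it)
    intermediate.append((locctr, label, opcode, operand))   # the END line
    return symtab, errors, intermediate, locctr - start

source = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),
    ("", "LDA", "THREE"),
    ("EOF", "BYTE", "C'EOF'"),
    ("THREE", "WORD", "3"),
    ("RETADR", "RESW", "1"),
    ("BUFFER", "RESB", "4096"),
    ("", "END", "FIRST"),
]
symtab, errors, intermediate, length = pass1(source, {"STL", "LDA"})
print({name: f"{value:04X}" for name, value in symtab.items()}, f"{length:04X}")
```

As in the pseudocode, the label on the START line is not entered into SYMTAB, and every instruction is assumed to be 3 bytes long.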

Page 19: Chapter 2 Assemblers

Algorithm for Pass 2 of Assembler(3/1)

read first input line (from intermediate file)
if OPCODE = 'START' then
   begin
      write listing line
      read next input line
   end {if START}
write Header record to object program
initialize first Text record
while OPCODE ≠ 'END' do
   begin
      if this is not a comment line then
         begin
            search OPTAB for OPCODE
            if found then
               begin

Page 20: Chapter 2 Assemblers

Algorithm for Pass 2 of Assembler(3/2)

                  if there is a symbol in OPERAND field then
                     begin
                        search SYMTAB for OPERAND
                        if found then
                           store symbol value as operand address
                        else
                           begin
                              store 0 as operand address
                              set error flag (undefined symbol)
                           end
                     end {if symbol}
                  else
                     store 0 as operand address
                  assemble the object code instruction
               end {if opcode found}

Page 21: Chapter 2 Assemblers

Algorithm for Pass 2 of Assembler(3/3)

            else if OPCODE = 'BYTE' or 'WORD' then
               convert constant to object code
            if object code will not fit into the current Text record then
               begin
                  write Text record to object program
                  initialize new Text record
               end
            add object code to Text record
         end {if not comment}
      write listing line
      read next input line
   end {while not END}
write last Text record to object program
write End record to object program
write last listing line
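
A rough Python sketch of Pass 2 for the simple SIC case, assuming the intermediate file is a list of (LOCCTR, label, opcode, operand) tuples, OPTAB maps each mnemonic to its opcode, and no listing is written. Starting a new Text record at RESW/RESB is an assumption not stated in the pseudocode; the function name and sample data are hypothetical.

```python
def pass2(intermediate, symtab, optab, name, length):
    """Return (object program records, error messages)."""
    errors, records = [], []
    start = intermediate[0][0]
    records.append(f"H{name:<6.6}{start:06X}{length:06X}")
    text_start, text_body = None, ""

    def flush():
        # write the current Text record (if any) and start a fresh one
        nonlocal text_start, text_body
        if text_body:
            records.append(f"T{text_start:06X}{len(text_body) // 2:02X}{text_body}")
        text_start, text_body = None, ""

    for locctr, label, opcode, operand in intermediate:
        if opcode in ("START", "END"):
            continue
        if opcode in optab:
            address = 0
            if operand:
                symbol, _, index = operand.partition(",")
                if symbol in symtab:
                    address = symtab[symbol]
                else:
                    errors.append(f"undefined symbol {symbol}")
                if index == "X":
                    address |= 0x8000            # set the index bit
            code = f"{optab[opcode]:02X}{address:04X}"
        elif opcode == "WORD":
            code = f"{int(operand) & 0xFFFFFF:06X}"
        elif opcode == "BYTE":
            body = operand[2:-1]
            code = body if operand[0] == "X" else "".join(f"{ord(c):02X}" for c in body)
        else:
            flush()                              # RESW/RESB: no object code (assumption)
            continue
        if len(text_body) + len(code) > 60:      # will not fit into this record
            flush()
        if text_start is None:
            text_start = locctr
        text_body += code
    flush()
    first = symtab.get(intermediate[-1][3], start)
    records.append(f"E{first:06X}")
    return records, errors

intermediate = [
    (0x1000, "COPY", "START", "1000"),
    (0x1000, "FIRST", "STL", "RETADR"),
    (0x1003, "", "LDA", "THREE"),
    (0x1006, "EOF", "BYTE", "C'EOF'"),
    (0x1009, "THREE", "WORD", "3"),
    (0x100C, "RETADR", "RESW", "1"),
    (0x100F, "", "END", "FIRST"),
]
symtab = {"FIRST": 0x1000, "EOF": 0x1006, "THREE": 0x1009, "RETADR": 0x100C}
records, errors = pass2(intermediate, symtab, {"STL": 0x14, "LDA": 0x00}, "COPY", 0x0F)
print("\n".join(records))
```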

Page 22: Chapter 2 Assemblers

Machine-Dependent Assembler Features

Indirect addressing is indicated by adding the prefix @ to the operand

Immediate operands are denoted with the prefix #

The assembler directive BASE is used in conjunction with base relative addressing

The extended instruction format is specified with the prefix + added to the operation code

Register-to-register instructions are faster than the corresponding register-to-memory operations because they are shorter and because they do not require another memory reference

Page 23: Chapter 2 Assemblers

Example of SIC/XE Program(3/1)

Page 24: Chapter 2 Assemblers

Example of SIC/XE Program(3/2)

Page 25: Chapter 2 Assemblers

Example of SIC/XE Program(3/3)

Page 26: Chapter 2 Assemblers

Program with Object Code (3/1)

Page 27: Chapter 2 Assemblers

Object Code Translation

Line 10: STL=14, n=1, i=1, ni=3, op+ni=14+3=17, RETADR=0030, x=0, b=0, p=1, e=0, xbpe=2, PC=0003, disp=RETADR-PC=030-003=02D, xbpe+disp=202D, obj=17202D

Line 12: LDB=68, n=0, i=1, ni=1, op+ni=68+1=69, LENGTH=0033, x=0, b=0, p=1, e=0, xbpe=2, PC=0006, disp=LENGTH-PC=033-006=02D, xbpe+disp=202D, obj=69202D

Line 15: JSUB=48, n=1, i=1, ni=3, op+ni=48+3=4B, RDREC=01036, x=0, b=0, p=0, e=1, xbpe=1, xbpe+RDREC=101036, obj=4B101036

Line 40: J=3C, n=1, i=1, ni=3, op+ni=3C+3=3F, CLOOP=0006, x=0, b=0, p=1, e=0, xbpe=2, PC=001A, disp=CLOOP-PC=0006-001A=-14 (hex)=FEC (2's complement), xbpe+disp=2FEC, obj=3F2FEC

Line 55: LDA=00, n=0, i=1, ni=1, op+ni=00+1=01, disp=#3=003, x=0, b=0, p=0, e=0, xbpe=0, xbpe+disp=0003, obj=010003

Format 3: op(6) n i x b p e disp(12)

Format 4: op(6) n i x b p e address(20)
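
The translations above can be reproduced with a small sketch of the Format 3 and Format 4 bit layouts. The function names are hypothetical; the range check on the displacement reflects the 12-bit signed field used for program-counter relative addressing.

```python
def encode_format3(opcode, n, i, x, b, p, disp):
    """op(6) n i x b p e disp(12) with e = 0; returns 6 hex digits."""
    first = opcode | (n << 1) | i            # n and i occupy the low 2 bits of byte 1
    xbpe = (x << 3) | (b << 2) | (p << 1)    # e = 0
    return f"{first:02X}{xbpe:X}{disp & 0xFFF:03X}"

def encode_format4(opcode, n, i, x, address):
    """op(6) n i x b p e address(20) with b = p = 0, e = 1; returns 8 hex digits."""
    first = opcode | (n << 1) | i
    xbpe = (x << 3) | 1
    return f"{first:02X}{xbpe:X}{address & 0xFFFFF:05X}"

def pc_relative(target, pc):
    """Displacement from the address of the next instruction to the target."""
    disp = target - pc
    if not -2048 <= disp <= 2047:
        raise ValueError("target out of range for PC-relative addressing")
    return disp

print(encode_format3(0x14, 1, 1, 0, 0, 1, pc_relative(0x0030, 0x0003)))  # 17202D
print(encode_format3(0x68, 0, 1, 0, 0, 1, pc_relative(0x0033, 0x0006)))  # 69202D
print(encode_format4(0x48, 1, 1, 0, 0x01036))                            # 4B101036
print(encode_format3(0x3C, 1, 1, 0, 0, 1, pc_relative(0x0006, 0x001A)))  # 3F2FEC
print(encode_format3(0x00, 0, 1, 0, 0, 0, 3))                            # 010003
```

Masking the displacement with FFF yields the 2's complement form for negative values, as in Line 40.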

Page 28: Chapter 2 Assemblers

Program with Object Code (3/2)

Page 29: Chapter 2 Assemblers

Object Code Translation

Line 125: CLEAR=B4, r1=X=1, r2=0, obj=B410

Line 133: LDT=74, n=0, i=1, ni=1, op+ni=74+1=75, x=0, b=0, p=0, e=1, xbpe=1, #4096=01000, xbpe+address=101000, obj=75101000

Line 160: STCH=54, n=1, i=1, ni=3, op+ni=54+3=57, BUFFER=0036, B=0033, disp=BUFFER-B=003, x=1, b=1, p=0, e=0, xbpe=C, xbpe+disp=C003, obj=57C003

Format 2: op(8) r1(4) r2(4)
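
A sketch of the Format 2 encoding and of the base relative case from Line 160. The register numbers are the standard SIC/XE assignments; the function names are hypothetical.

```python
REGISTERS = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4, "T": 5, "F": 6}

def encode_format2(opcode, r1, r2=None):
    """op(8) r1(4) r2(4); a missing second register is encoded as 0."""
    n1 = REGISTERS[r1]
    n2 = REGISTERS[r2] if r2 else 0
    return f"{opcode:02X}{n1:X}{n2:X}"

def base_relative(target, base):
    """Displacement from the contents of register B to the target."""
    disp = target - base
    if not 0 <= disp <= 4095:
        raise ValueError("target out of range for base relative addressing")
    return disp

print(encode_format2(0xB4, "X"))               # CLEAR X -> B410

# STCH BUFFER,X with n=i=1, x=1, b=1, p=0, e=0
disp = base_relative(0x0036, 0x0033)
print(f"{0x54 | 0b11:02X}{0b1100:X}{disp:03X}")  # 57C003
```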

Page 30: Chapter 2 Assemblers

Program with Object Code (3/3)

Page 31: Chapter 2 Assemblers

SYMTAB

Symbol  Value  Flags
FIRST   0000
CLOOP   0006
ENDFIL  001A
EOF     002D
RETADR  0030
LENGTH  0033
BUFFER  0036
RDREC   1036
RLOOP   1040
EXIT    1056
INPUT   105C
WRREC   105D
WLOOP   1062
OUTPUT  1076

Page 32: Chapter 2 Assemblers

Program Relocation

The actual starting address of the program is not known until load time

An object program that contains the information necessary to perform this kind of modification is called a relocatable program

No modification is needed when an operand uses program-counter relative or base relative addressing

The only parts of the program that require modification at load time are those that specify direct (as opposed to relative) addresses

Modification record (M)

Col. 2-7 Starting location of the address field to be modified, relative to the beginning of the program (Hex)

Col. 8-9 Length of the address field to be modified, in half-bytes (Hex)
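
To make the Modification record concrete, here is a sketch of how a loader might apply one. It uses the +JSUB RDREC instruction (4B101036) at address 0006 from the earlier translation: its 20-bit address field starts in the byte at 0007 and is 5 half-bytes long. The load address 5000 and the function names are assumptions for illustration.

```python
def modification_record(field_start, half_bytes):
    """Col. 1 'M', Col. 2-7 field start, Col. 8-9 field length in half-bytes."""
    return f"M{field_start:06X}{half_bytes:02X}"

def relocate(memory, field_start, half_bytes, load_address):
    """Add load_address to the address field, in place."""
    n_bytes = (half_bytes + 1) // 2
    mask = (1 << (4 * half_bytes)) - 1       # low-order half-bytes hold the address
    field = int.from_bytes(memory[field_start:field_start + n_bytes], "big")
    patched = (field & ~mask) | ((field + load_address) & mask)
    memory[field_start:field_start + n_bytes] = patched.to_bytes(n_bytes, "big")

memory = bytearray(16)
memory[6:10] = bytes.fromhex("4B101036")     # +JSUB RDREC assembled at 0006
print(modification_record(7, 5))             # M00000705
relocate(memory, 7, 5, 0x5000)
print(memory[6:10].hex().upper())            # 4B106036
```

With an odd number of half-bytes the field begins in the middle of its first byte, so the sketch leaves the high half-byte (the xbpe flags here) untouched.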

Page 33: Chapter 2 Assemblers

Examples of Program Relocation

Page 34: Chapter 2 Assemblers

Object Program

Page 35: Chapter 2 Assemblers

Machine-Independent Assembler Features

Literals

Symbol-defining statements

Expressions

Program blocks

Control sections and program linking

Page 36: Chapter 2 Assemblers

Program with Additional Assembler Features(3/1)

Page 37: Chapter 2 Assemblers

Program with Additional Assembler Features(3/2)

Page 38: Chapter 2 Assemblers

Program with Additional Assembler Features(3/3)

Page 39: Chapter 2 Assemblers

Literals(2/1)

Write the value of a constant operand as a part of the instruction that uses it

Such an operand is called a literal

This avoids having to define the constant elsewhere in the program and make up a label for it

A literal is identified with the prefix =, which is followed by a specification of the literal value

Examples of literals in the statements:

45   001A  ENDFIL  LDA  =C'EOF'  032010
215  1062  WLOOP   TD   =X'05'   E32011

Page 40: Chapter 2 Assemblers

Literals(2/2)

With a literal, the assembler generates the specified value as a constant at some other memory location

The address of this generated constant is used as the target address for the machine instruction

All of the literal operands used in the program are gathered together into one or more literal pools

Normally literals are placed into a pool at the end of the program

A LTORG statement creates a literal pool that contains all of the literal operands used since the previous LTORG

Most assemblers recognize duplicate literals (the same literal used in more than one place) and store only one copy of the specified data value

LITTAB (literal table): contains the literal name, the operand value and length, and the address assigned to the operand when it is placed in a literal pool
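
A minimal sketch of a LITTAB, assuming duplicates are recognized by comparing literal names and that a pool is placed by a single call at LTORG or at the end of the program. The class and method names are hypothetical, and the example places both literals in one pool purely for illustration.

```python
def literal_bytes(literal):
    """Value of a literal such as =C'EOF' (characters) or =X'05' (hex digits)."""
    kind, body = literal[1], literal[3:-1]
    return body.encode("ascii") if kind == "C" else bytes.fromhex(body)

class LitTab:
    def __init__(self):
        self.entries = {}   # literal name -> {"value", "length", "address"}

    def add(self, literal):
        # a literal already in the table is a duplicate: keep only one copy
        if literal not in self.entries:
            value = literal_bytes(literal)
            self.entries[literal] = {"value": value, "length": len(value), "address": None}

    def place_pool(self, locctr):
        """Assign addresses to all literals not yet placed; return the new LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr

littab = LitTab()
for operand in ("=C'EOF'", "=X'05'", "=C'EOF'"):
    littab.add(operand)
next_locctr = littab.place_pool(0x002D)
print({name: f"{e['address']:04X}" for name, e in littab.entries.items()})
```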

Page 41: Chapter 2 Assemblers

Symbol-Defining Statements

EQU: assembler directive that allows the programmer to define symbols and specify their values

General form: symbol EQU value

Line 133: +LDT #4096 can be written as

MAXLEN  EQU   4096
        +LDT  #MAXLEN

It is then much easier to find and change the value of MAXLEN

ORG: assembler directive that indirectly assigns values to symbols

Defining the fields of a symbol table entry with EQU:

STAB    RESB  1100
SYMBOL  EQU   STAB
VALUE   EQU   STAB+6
FLAGS   EQU   STAB+9

Defining the same fields with ORG:

STAB    RESB  1100
        ORG   STAB
SYMBOL  RESB  6
VALUE   RESW  1
FLAGS   RESB  2
        ORG   STAB+1100
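
The effect of ORG on the location counter can be traced with a small sketch. It assumes STAB starts at the hypothetical address 1000 (hex), that each table entry is 6 + 3 + 2 = 11 bytes (so FLAGS occupies 2 bytes), and that ORG operands have the form SYMBOL or SYMBOL+n; the function names are illustrative.

```python
SIZES = {"RESB": 1, "RESW": 3}   # bytes reserved per unit

def evaluate(expr, symtab):
    """Evaluate 'SYMBOL' or 'SYMBOL+n' with n in decimal."""
    symbol, _, offset = expr.partition("+")
    return symtab[symbol] + (int(offset) if offset else 0)

def assign(statements, symtab, locctr):
    """Assign addresses to labels; ORG resets LOCCTR to the given value."""
    for label, op, operand in statements:
        if op == "ORG":
            locctr = evaluate(operand, symtab)
            continue
        if label:
            symtab[label] = locctr
        locctr += SIZES[op] * int(operand)
    return locctr

symtab = {}
end = assign(
    [
        ("STAB", "RESB", "1100"),
        ("", "ORG", "STAB"),
        ("SYMBOL", "RESB", "6"),
        ("VALUE", "RESW", "1"),
        ("FLAGS", "RESB", "2"),      # 2 bytes, so an entry is 11 bytes long
        ("", "ORG", "STAB+1100"),
    ],
    symtab,
    0x1000,
)
print({name: f"{value:04X}" for name, value in symtab.items()}, f"{end:04X}")
```

The resulting values match the EQU version: SYMBOL = STAB, VALUE = STAB+6, FLAGS = STAB+9, and the final ORG restores LOCCTR to the end of the table.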

Page 42: Chapter 2 Assemblers

Expressions

Assemblers allow arithmetic expressions formed according to the normal rules using the operators +, -, *, and /

Individual terms in the expression may be constants, user-defined symbols, or special terms

The most common such special term is the current value of the location counter (designated by *)

Expressions are classified as either absolute expressions or relative expressions

Symbol   Type  Value
RETADR   R     0030
BUFFER   R     0036
BUFFEND  R     1036
MAXLEN   A     1000
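
One way to classify an expression is to count its relative terms with their signs: if they cancel in pairs the expression is absolute, and if exactly one positive relative term remains it is relative. The sketch below assumes expressions built only from + and - over decimal constants and symbols (no * or /, no location-counter term); the function name is hypothetical.

```python
import re

def classify(expression, symtypes):
    """symtypes maps symbol -> 'R' (relative) or 'A' (absolute)."""
    net_relative = 0
    for sign, term in re.findall(r"([+-]?)\s*(\w+)", expression):
        if term.isdigit() or symtypes[term] == "A":
            continue                      # absolute terms do not affect the type
        net_relative += -1 if sign == "-" else 1
    if net_relative == 0:
        return "A"
    if net_relative == 1:
        return "R"
    raise ValueError("expression is neither absolute nor relative")

symtypes = {"RETADR": "R", "BUFFER": "R", "BUFFEND": "R", "MAXLEN": "A"}
print(classify("BUFFEND-BUFFER", symtypes))   # A
print(classify("BUFFER-1", symtypes))         # R
```

This agrees with the table: MAXLEN, defined as BUFFEND-BUFFER = 1036-0036 = 1000, is absolute.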

Page 43: Chapter 2 Assemblers

Program Block(2/1)

Program blocks: segments of code that are rearranged within a single object program unit

Control sections: segments that are translated into independent object program units

USE indicates which portions of the source program belong to the various blocks

Block name  Block number  Address  Length
(default)   0             0000     0066
CDATA       1             0066     000B
CBLKS       2             0071     1000
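
The Address column follows from the Length column: each block starts where the previous one ends. A small sketch, with a hypothetical function name, that reproduces the table from the block lengths collected in Pass 1:

```python
def assign_block_addresses(lengths):
    """lengths: list of (block name, length) in block-number order."""
    table, address = [], 0
    for number, (name, length) in enumerate(lengths):
        table.append((name, number, address, length))
        address += length                 # next block starts right after this one
    return table, address

table, total = assign_block_addresses(
    [("(default)", 0x0066), ("CDATA", 0x000B), ("CBLKS", 0x1000)]
)
for name, number, address, length in table:
    print(f"{name:<10} {number}  {address:04X}  {length:04X}")
print(f"total length {total:04X}")
```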

Page 44: Chapter 2 Assemblers

Program Block(2/2)

Because the large buffer area is moved to the end of the object program, we no longer need to use extended format instructions

Program readability is improved if the definitions of data areas are placed in the source program close to the statements that reference them

It does not matter that the Text records of the object program are not in sequence by address; the loader will simply load the object code from each record at the indicated address

Page 45: Chapter 2 Assemblers

Example Program with Multiple Program Blocks(3/1)

Page 46: Chapter 2 Assemblers

Example Program with Multiple Program Blocks(3/2)

Page 47: Chapter 2 Assemblers

Example Program with Multiple Program Blocks(3/3)

Page 48: Chapter 2 Assemblers

Program Blocks Traced Through Assembly and Loading Processes

Page 49: Chapter 2 Assemblers

Object Program

Page 50: Chapter 2 Assemblers

Control sections(3/1)

References between control sections are called external references

The assembler generates information for each external reference that will allow the loader to perform the required linking

The EXTDEF (external definition) statement in a control section names symbols, called external symbols, that are defined in this control section and may be used by other sections

The EXTREF (external reference) statement names symbols that are used in this control section and are defined elsewhere

Page 51: Chapter 2 Assemblers

Control sections(3/2)

Define record (D)

Col. 2-7 Name of external symbol defined in this control section

Col. 8-13 Relative address of symbol within this control section (Hex)

Col. 14-73 Repeat information in Col. 2-13 for other external symbols

Refer record (R)

Col. 2-7 Name of external symbol referred to in this control section

Col. 8-73 Names of other external reference symbols

Page 52: Chapter 2 Assemblers

Control sections(3/3)

Modification record (revised: M)

Col. 2-7 Starting address of the field to be modified, relative to the beginning of the control section (Hex)

Col. 8-9 Length of the field to be modified, in half-bytes (Hex)

Col. 10 Modification flag (+ or -)

Col. 11-16 External symbol whose value is to be added to or subtracted from the indicated field
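
A sketch of how a linking loader might apply a revised Modification record, parsing the columns listed above. The external symbol table estab, the sample object code, and the addresses assigned to RDREC, BUFEND, and BUFFER are hypothetical values chosen for illustration.

```python
def apply_modification(memory, record, estab, section_start=0):
    """Add or subtract an external symbol's value in the indicated field."""
    offset = int(record[1:7], 16)          # Col. 2-7
    half_bytes = int(record[7:9], 16)      # Col. 8-9
    sign = 1 if record[9] == "+" else -1   # Col. 10
    symbol = record[10:16].strip()         # Col. 11-16
    n_bytes = (half_bytes + 1) // 2
    start = section_start + offset
    mask = (1 << (4 * half_bytes)) - 1
    field = int.from_bytes(memory[start:start + n_bytes], "big")
    low = ((field & mask) + sign * estab[symbol]) & mask
    memory[start:start + n_bytes] = ((field & ~mask) | low).to_bytes(n_bytes, "big")

estab = {"RDREC": 0x4033, "BUFEND": 0x1033, "BUFFER": 0x0033}

# An extended-format JSUB to an external symbol is assembled with address 0
code = bytearray.fromhex("4B100000")
apply_modification(code, "M00000105+RDREC", estab)
print(code.hex().upper())                  # 4B104033

# A word holding the difference of two external symbols needs two records
word = bytearray(3)
apply_modification(word, "M00000006+BUFEND", estab)
apply_modification(word, "M00000006-BUFFER", estab)
print(word.hex().upper())                  # 001000
```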

Page 53: Chapter 2 Assemblers

Example Program with Control Sections(3/1)

Page 54: Chapter 2 Assemblers

Example Program with Control Sections(3/2)

Page 55: Chapter 2 Assemblers

Example Program with Control Sections(3/3)

Page 56: Chapter 2 Assemblers

Object Program(2/1)

Page 57: Chapter 2 Assemblers

Object Program(2/2)

Page 58: Chapter 2 Assemblers

One-Pass Assemblers

Eliminate forward references to data items: require that all such areas be defined in the source program before they are referenced

One-pass assembler:

Generates object code in memory for immediate execution

A load-and-go assembler is useful in a system that is oriented toward program development and testing

Page 59: Chapter 2 Assemblers

Handle Forward Reference

The symbol used as an operand is entered into the symbol table

This entry is flagged to indicate that the symbol is undefined

The address of the operand field of the instruction that refers to the undefined symbol is added to a list of forward references associated with the symbol table entry

When the definition for a symbol is encountered, the forward reference list for that symbol is scanned, and the proper address is inserted into any instructions previously generated
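
The steps above can be sketched as a symbol table that keeps a list of operand-field addresses for each undefined symbol and patches them once the definition appears. The sketch assumes a load-and-go assembler writing SIC object code (2-byte operand field) into a bytearray; the class name, method names, and the address 203D used for RDREC are hypothetical.

```python
class OnePassSymtab:
    def __init__(self, memory):
        self.memory = memory    # bytearray holding the generated object code
        self.defined = {}       # symbol -> address
        self.pending = {}       # undefined symbol -> operand-field addresses

    def reference(self, symbol, operand_field):
        """Return the address to assemble now; record the field if still undefined."""
        if symbol in self.defined:
            return self.defined[symbol]
        self.pending.setdefault(symbol, []).append(operand_field)
        return 0

    def define(self, symbol, address):
        """Enter the definition and patch every instruction waiting for it."""
        self.defined[symbol] = address
        for field in self.pending.pop(symbol, []):
            old = int.from_bytes(self.memory[field:field + 2], "big")
            new = (old & 0x8000) | address          # keep the index bit
            self.memory[field:field + 2] = new.to_bytes(2, "big")

memory = bytearray(6)
tab = OnePassSymtab(memory)
for locctr in (0, 3):                               # two JSUB RDREC instructions
    memory[locctr] = 0x48
    address = tab.reference("RDREC", locctr + 1)    # operand field follows the opcode
    memory[locctr + 1:locctr + 3] = address.to_bytes(2, "big")
print(memory.hex().upper())                         # 480000480000
tab.define("RDREC", 0x203D)
print(memory.hex().upper())                         # 48203D48203D
```

Any symbol still present in pending at the end of the program is an undefined symbol and should be reported as an error.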

Page 60: Chapter 2 Assemblers

Sample Program for One-Pass assembler(3/1)

Page 61: Chapter 2 Assemblers

Sample Program for One-Pass assembler(3/2)

Page 62: Chapter 2 Assemblers

Sample Program for One-Pass assembler(3/3)

Page 63: Chapter 2 Assemblers

Example of Handling Forward Reference(2/1)

Page 64: Chapter 2 Assemblers

Example of Handling Forward Reference(2/2)

Page 65: Chapter 2 Assemblers

Multi-Pass Assemblers(6/1)

HALFSZ   EQU   MAXLEN/2
MAXLEN   EQU   BUFFEND-BUFFER
PREVBT   EQU   BUFFER-1
...
BUFFER   RESB  4096
BUFFEND  EQU   *
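
The definitions above cannot be resolved in a single scan because HALFSZ depends on MAXLEN, which in turn depends on symbols defined later. The sketch below resolves them by repeatedly rescanning the unresolved definitions until no more progress is made. This is a simplified stand-in, not the dependency-list bookkeeping a real multi-pass assembler keeps in its symbol table; the address 1034 for BUFFER and the function names are assumptions.

```python
import re

def evaluate(expr, symtab):
    """Evaluate 'A', 'A+B', 'A-B' or 'A/B'; return None if a symbol is undefined."""
    left, op, right = re.fullmatch(r"(\w+)(?:([-+/])(\w+))?", expr).groups()
    values = []
    for term in (left, right):
        if term is None:
            continue
        if term.isdigit():
            values.append(int(term))
        elif term in symtab:
            values.append(symtab[term])
        else:
            return None
    if op is None:
        return values[0]
    a, b = values
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    return a // b

def resolve(pending, symtab):
    """Keep evaluating unresolved EQU definitions until all are defined."""
    pending = dict(pending)
    while pending:
        progress = False
        for symbol, expr in list(pending.items()):
            value = evaluate(expr, symtab)
            if value is not None:
                symtab[symbol] = value
                del pending[symbol]
                progress = True
        if not progress:
            raise ValueError(f"unresolvable symbols: {sorted(pending)}")
    return symtab

symtab = {"BUFFER": 0x1034, "BUFFEND": 0x1034 + 4096}
resolve({"HALFSZ": "MAXLEN/2", "MAXLEN": "BUFFEND-BUFFER", "PREVBT": "BUFFER-1"}, symtab)
print({name: f"{value:X}" for name, value in symtab.items()})
```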

Page 66: Chapter 2 Assemblers

Multi-Pass Assemblers(6/2)

Page 67: Chapter 2 Assemblers

Multi-Pass Assemblers(6/3)

Page 68: Chapter 2 Assemblers

Multi-Pass Assemblers(6/4)

Page 69: Chapter 2 Assemblers

Multi-Pass Assemblers(6/5)

Page 70: Chapter 2 Assemblers

Multi-Pass Assemblers(6/6)

Page 71: Chapter 2 Assemblers

MASM Assembler

An MASM assembler language program is written as a collection of segments

Commonly used classes are CODE, DATA, CONST, and STACK

During program execution, segments are addressed via the x86 segment registers

ASSUME tells MASM the contents of a segment register; a programmer must provide instructions to load this register when the program is executed

A near jump is a jump to a target in the same code segment; a far jump is a jump to a target in a different code segment

Page 72: Chapter 2 Assemblers

SPARC Assembler

A SPARC assembler language program is divided into units called sections

.TEXT Executable instructions

.DATA Initialized read/write data

.RODATA Read-only data

.BSS Uninitialized data areas

A global symbol is either a symbol that is defined in the program and made accessible to others, or a symbol that is referenced in the program and defined externally

A weak symbol is similar to a global symbol, but the definition of a weak symbol may be overridden by a global symbol with the same name

SPARC branch instructions are delayed branches: the instruction immediately following a branch instruction is actually executed before the branch is taken

Programmers often place NOP (no-operation) instructions in delay slots