8087 NDP Paper p174 Palmer



Page 1: 8087 NDP Paper p174 Palmer

THE INTEL® 8087 NUMERIC DATA PROCESSOR

John Palmer

Intel Corporation

This paper describes a new device, the Intel® 8087 Numeric Data Processor, with unprecedented speed, accuracy and capability. Its modified stack architecture and instruction set are explained and illustrative examples are included. The 8087, which conforms to the proposed IEEE Floating-Point Standard, is a coprocessor in the Intel® 8086 family. It supports seven data types: three REAL, three INTEGER and one packed BCD format, and performs all necessary numeric operations from addition to logarithmic and trigonometric functions.

1.0 INTRODUCTION

The Intel® 8087 is a high performance general purpose numeric data processor. It is a part of the Intel® 8086 family and can be used with either the 8086 or the 8088 to extend their instruction sets by over 120 numeric data manipulation operations. The 8087 is not a peripheral but a coprocessor; it monitors the instruction stream and when an 8086/8088 ESCAPE instruction is read, the 8087 takes over the bus and interprets and executes the ESCAPE instruction as one of its own instructions. This tightly coupled coprocessing interface permits the 8087 to execute numeric instructions while the 8086 executes any others. The concurrent instruction execution increases the throughput of the system. Furthermore, the 8087 is the only chip that must be added to an 8086 (8088) system to provide numeric capability that exceeds software in speed by more than a factor of 100.

The 8087 is intended to be general purpose and satisfy a very wide range of needs for mathematical computation. It is fast enough for a great many scientific and statistical calculations; it is accurate enough for business and commercial computation; and it is precise enough for entirely new applications, most notably interval arithmetic [1]. The 8087 provides an unprecedented level of capability, safety and reliability with high performance and low cost and is a prime example of the almost incredible possibilities in combining software and architectural expertise with VLSI processing capability.

2.0 8087 OVERVIEW

The 8087 consists of a stack of registers for holding operands and results, a set of registers constituting its environment and a set of instructions.

The stack is a set of 8 registers, each 80 bits wide. Associated with the stack is a three bit stack pointer, TOP, and with each stack element a two bit tag. (Both the tags and TOP physically belong to the ENVIRONMENT but will be shown with the stack.) The stack elements are numbered relative to TOP (ST(i) means the ith stack element from the top of stack) as shown below.

[Figure: the register STACK, ST(0) through ST(7), each register divided into SIGN, EXPONENT and SIGNIFICAND fields and paired with its entry in TAGS.]

The tag field is used to detect uninitialized stack elements and to designate special values (e.g. zero) for microcode optimization.

The value represented in a register has 64 bits of precision and a range of about 10^±4900 (15 bit exponent). A more complete description of the register values will be given in Section 3.

CH1494-4/80/0000-0174 $00.75 © 1980 IEEE


The 8087 environment consists of seven words as illustrated below.

[Figure: the seven-word environment: STATUS word (fields from bit 15 down to bit 0: B, Z, TOP, C, A, S, N, *, P, U, O, Q, D, I), CONTROL WORD, TAG WORD, INSTRUCTION ADDRESS and DATA ADDRESS.]

The STATUS word consists of the EXCEPTION flags (0-7) and the STATUS bits (8-15) where the meanings are (* indicates a field reserved for future use):

EXCEPTION FLAGS

I : invalid operation
D : denormalized operand
Q : division of nonzero by zero
O : overflow
U : underflow
P : inexact (precision)
N : indicates a pending interrupt

STATUS BITS

Z,C,A,S : condition code bits for various instructions (e.g. COMPARE)

TOP : stack pointer

B : indicates whether the 8087 is BUSY (used for synchronization)
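The field list above can be turned into a small decoder. The following Python sketch is illustrative only: the function name `decode_status` is hypothetical, and the bit positions (B = 15, Z = 14, TOP = 13-11, C = 10, A = 9, S = 8, N = 7, P = 5, U = 4, O = 3, Q = 2, D = 1, I = 0) are an assumption read off the order of fields in the status word figure.

```python
def decode_status(word: int) -> dict:
    """Split a 16-bit STATUS word into the fields named in the text.

    Bit positions are an assumption taken from the field order in the
    status word figure (bit 15 on the left, bit 0 on the right).
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("status word must fit in 16 bits")
    flag_names = "IDQOUP"  # exception flags, assumed to occupy bits 0..5
    return {
        "exceptions": [n for bit, n in enumerate(flag_names) if word >> bit & 1],
        "interrupt_pending": bool(word >> 7 & 1),  # N
        "condition": {
            "S": word >> 8 & 1,
            "A": word >> 9 & 1,
            "C": word >> 10 & 1,
            "Z": word >> 14 & 1,
        },
        "top": word >> 11 & 0b111,                 # three-bit stack pointer
        "busy": bool(word >> 15 & 1),              # B
    }
```

For example, `decode_status(0b1001100000100101)` reports a busy chip with TOP = 3 and the I, Q and P flags set.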

The CONTROL WORD consists of EXCEPTION MASKS and CONTROL BITS. Each exception has a mask. If the mask is reset, the exception is allowed to generate an interrupt (provided M = 0); if the mask is set, the interrupt is suppressed and the 8087 executes an on-chip default exception handling procedure and continues (the procedure will be explained in Section 3). The M mask is the 8087 interrupt enable/disable bit. The CONTROL BITS have the following meanings:

PC : precision control - results are rounded to one of three precisions: Temporary Real (64 bits), Long Real (53 bits), Real (24 bits).

RC : rounding control - results are rounded in one of four directions: unbiased round to nearest, round towards +∞, round towards -∞, round towards zero.

IC : infinity control - there are two types of infinity arithmetic provided: affine and projective.

The TAG word contains tags describing the contents of the corresponding stack elements. The instruction and data pointers hold the addresses of an instruction (and of its referenced data, if any) when that instruction causes an exception that generates an interrupt.

There are four types of 8087 instructions: the CORE set, the EXTENDED set, the SPECIAL FUNCTION set and the ADMINISTRATIVE set. The core set includes load and store of the stack values and arithmetic operators: add, subtract, multiply, divide and compare. The extended set is for loading and storing three special formats (see Section 3). The special function set includes square root and transcendental function support. The administrative instructions are used for context switching and processor control. Most of the instructions will be described in more detail as the 8087 design goals are explained.

3.0 DESIGN GOALS

The 8087 is designed to achieve several major goals. First, the 8087 conforms to an improved and expanded version of Intel's standard for floating-point arithmetic [2]. Second, the 8087 provides significantly more capability than mainframe and minicomputer floating-point processors and consequently has applications beyond scientific computation. Third, the 8087 is convenient to use in assembly language and easy to generate code for in high level language. And finally the capabilities of VLSI are used to provide all this functionality with high performance and efficiency in a single device.

3.1 Floating-Point Standard

The Intel floating-point standard, called the REALMATH standard, was originally specified in 1977 [2] and implemented in several products (FPAL, SBC-310, FORTRAN-80, BASIC-80). At about that time an IEEE committee was formed to propose a floating-point standard for microprocessors. Intel was invited to participate and offered its standard for consideration.

At the time this paper was written it had become apparent that the majority of the committee had agreed on a revised and expanded version of Intel's standard [3]. The standard specifies data formats, rounding algorithms and exception handling.

The standard specifies and the 8087 supports three floating-point data types: Real (single precision), Long Real (double precision) and Temporary Real (extended precision). All formats are binary and each has a biased exponent. The values represented by the three formats are shown below.

[Figure: bit layouts of the Real, Long Real and Temporary Real formats.]

                          REAL                LONG REAL           TEMP REAL
TOTAL LENGTH              32 bits             64 bits             80 bits
EXPONENT LENGTH           8 bits              11 bits             15 bits
EXPONENT BIAS             2^7 - 1             2^10 - 1            2^14 - 1
VALUE (e ≠ 0, e ≠ 1...1)  (-1)^s 2^(e-bias) (1.f)  (-1)^s 2^(e-bias) (1.f)  (-1)^s 2^(e-bias) (i.f)
INFINITY                  e = 1...1, f = 0    e = 1...1, f = 0    e = 1...1, i = 1, f = 0
NOT-A-NUMBER (NAN)        e = 1...1, f ≠ 0    e = 1...1, f ≠ 0    e = 1...1, i = 1, f ≠ 0

The Temporary Real format (identical to the 8087 register format) is intended to hold intermediates and to support accurate Long Real calculations. It has an explicit leading bit (i) in the significand thus allowing unnormalized arithmetic. However, the algorithms are designed so that normalized operands will always yield normalized results.
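To make the role of the explicit leading bit concrete, the following Python sketch decodes an 80-bit pattern into an exact rational value. It is an illustration under stated assumptions, not the chip's algorithm: the name `decode_temp_real` is hypothetical, the field order (sign in bit 79, exponent in bits 78-64, significand in bits 63-0 with the explicit bit i at bit 63) is assumed, and the bias is taken to be 2^14 - 1 = 16383. Denormal, infinity and NAN encodings are deliberately left out.

```python
from fractions import Fraction

BIAS = 16383  # assumed bias for the 15-bit exponent (2**14 - 1)


def decode_temp_real(bits: int) -> Fraction:
    """Exact value of an 80-bit pattern: 1 sign, 15 exponent, 64 significand bits."""
    sign = bits >> 79 & 1
    exponent = bits >> 64 & 0x7FFF
    significand = bits & ((1 << 64) - 1)  # bit 63 is the explicit leading bit i
    if exponent == 0x7FFF:
        raise ValueError("infinity or NAN has no finite value")
    if exponent == 0:
        if significand == 0:
            return Fraction(0)
        raise ValueError("denormal encodings are outside this sketch")
    # The significand is read as i.f, i.e. an integer scaled by 2**-63.
    value = Fraction(significand, 1 << 63) * Fraction(2) ** (exponent - BIAS)
    return -value if sign else value
```

Because the leading bit is stored rather than implied, two different patterns can name the same number: exponent 0x3FFF with significand 1.000... and exponent 0x4000 with the unnormalized significand 0.100... both decode to 1.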

The algorithms specified by the standard require that the completely precise result of an operation be rounded to the nearest representable number, breaking ties by rounding to the nearest even number. This default mode of rounding is called "unbiased round to nearest". There are optional "directed rounding" modes that are specified to yield

1. the nearest neighbor less than or equal to the true result.

2. the nearest neighbor greater than or equal to the true result.

The 8087 provides these rounding modes as controlled by a field (RC) in the CONTROL WORD.

The 8087, which does all calculations in Temporary Real format, has another field in the CONTROL word for specifying the precision to which a result is rounded (PC). Thus, the precision of results is independent of the precision of operands and, though held in Temporary Real format and benefitting from extended range, may be forced to Real, Long Real or Temporary Real. This control is provided for languages that do not allow extended precision intermediates and to allow the same code to be run under different precision settings as an aid to error estimation.
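The combined effect of rounding control and precision control can be modelled exactly with rational arithmetic. The Python sketch below is a minimal model, not the 8087 microcode: `round_to` and its mode names are hypothetical, the exponent range is treated as unbounded, and the precision is passed as a number of significand bits (24, 53 or 64 in the paper's terms).

```python
import math
from fractions import Fraction


def round_to(x: Fraction, bits: int, mode: str = "nearest") -> Fraction:
    """Round x to `bits` significant binary digits in the requested direction."""
    if x == 0:
        return x
    # Find e with 2**e <= |x| < 2**(e + 1), using exact integer arithmetic.
    e = abs(x.numerator).bit_length() - x.denominator.bit_length()
    if abs(x) < Fraction(2) ** e:
        e -= 1
    unit = Fraction(2) ** (e - bits + 1)  # spacing of representable neighbors
    q = x / unit
    if mode == "nearest":
        n = round(q)            # Fraction rounds ties to the even integer
    elif mode == "down":
        n = math.floor(q)       # towards -infinity
    elif mode == "up":
        n = math.ceil(q)        # towards +infinity
    elif mode == "zero":
        n = math.trunc(q)
    else:
        raise ValueError("unknown rounding mode: " + mode)
    return n * unit
```

Running the same value through `round_to(x, 24)`, `round_to(x, 53)` and `round_to(x, 64)` shows how much of a result survives each precision setting, which is the kind of error estimate the paragraph above describes.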

The standard also specifies that all exceptions must be detected and that an implementation should permit exception handling. The 8087 supports this by detecting six types of exceptions and by generating an interrupt if the exception is not masked. If an interrupt is generated, the interrupt procedure (exception handler) has available the exception flags, a pointer to the instruction causing the interrupt and a pointer to the datum if memory was addressed. The six exceptions, each of which has an associated "sticky" flag (once set it remains set until reset by software), are listed below.

1. I : invalid operation
   this exception is signaled by stack overflow or underflow, the use of a NAN as an operand and several other cases as listed in [3]

2. D : denormalized operand
   at least one operand is denormalized

3. Q : zero divisor
   the dividend is finite and nonzero while the divisor is zero

4. O : overflow
   the exponent of the result is too large for the destination's format

5. U : underflow
   the exponent of the result is too small for the destination's format

6. P : inexact result
   the delivered result is not equal to the completely precise result but has been rounded

Since the default response to overflow and zero divisor is to set the result to ∞, the 8087 supports two modes of infinity arithmetic:

1. affine - there are two infinities, one (-∞) less than all other numbers and one (+∞) greater

2. projective - there is only one infinity (the sign on ∞ is ignored) which closes the number system analogous to the point at ∞ on the Riemann sphere.

These two modes require the representation of two zeros (±0) which are "equal" in comparison and in all other operations except division, where +1/+0 = +∞ and +1/-0 = -∞. The mode of infinity arithmetic is determined by a field (IC) in the CONTROL word.
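The behaviour of signed zeros under division can be sketched in Python, whose floats also carry a sign on zero. This is a hedged illustration: `masked_divide` is a hypothetical helper, the sign rule (exclusive OR of the operand signs) is the usual convention, and in projective mode the function simply returns a positive infinity as a stand-in for the single unsigned infinity.

```python
import math


def masked_divide(x: float, y: float, projective: bool = False) -> float:
    """Divide, delivering an infinity for a zero divisor instead of raising."""
    if y != 0.0:
        return x / y
    if x == 0.0 or math.isnan(x):
        return math.nan  # 0/0 is an invalid operation, not a zero divisor
    if projective:
        return math.inf  # stand-in for the one infinity; its sign carries no meaning
    # Affine mode: the sign of the result is the XOR of the operand signs.
    negative = (math.copysign(1.0, x) < 0) != (math.copysign(1.0, y) < 0)
    return -math.inf if negative else math.inf
```

Here `0.0 == -0.0` is true, yet `masked_divide(1.0, 0.0)` and `masked_divide(1.0, -0.0)` give +∞ and -∞ respectively, which is the one place the two zeros are distinguishable.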

There are instructions that support the standard by controlling rounding, precision and infinity arithmetic and by permitting complete exception handling. These instructions load and store either the control word or the entire environment and store the exception flags.

The features and instructions discussed above support the Intel floating-point (REALMATH) standard but additional capability is also desired.

3.2 Capability Extension

The 8087, by supporting the required and optional aspects of the standard and by supporting several features not mentioned by the standard, significantly extends the capabilities of the 8086 family beyond that expected from a typical floating-point processor. These extensions include additional data types, provision of exact arithmetic, support for interval arithmetic and special functions.


The 8087 addresses seven different data types using all of the 8086 addressing modes. These data types are:

1. Real (32 bits)

2. Long Real (64 bits)

3. Temporary Real (80 bits)

4. Integer Word (16 bit 2's complement)

5. Integer (32 bit 2's complement)

6. Long Integer (64 bit 2's complement)

7. Packed BCD Integer (80 bits, 18 digits and sign)

All of the data types, when used as operands, are first converted (without rounding error) to Temporary Real and the result of the operation is also returned as Temporary Real. Thus the 8087 arithmetic unit only has to work with one kind of data. When results are desired in one of the other formats, they are automatically converted to that type before they are stored in memory.

The provision of exact arithmetic is accomplished by including the inexact exception (P) along with its mask. If a rounding error is committed, the correctly rounded result is delivered and the P flag is set. If the mask (PM) is zero an interrupt is generated, otherwise execution simply continues. This permits financial accounting functions to be performed without fear of roundoff error. Exact arithmetic is also useful in doing coefficient "preconditioning" [see 4].
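The idea of trapping on inexact results has a close analogue in Python's `decimal` module, which is used here purely as a stand-in for the 8087 mechanism. In this sketch the context name `EXACT`, the helper `exact_total` and the choice of 18 digits (echoing the 18 digit packed BCD format) are all assumptions made for illustration.

```python
from decimal import Context, Decimal, Inexact

# A context in which any rounding raises instead of passing silently,
# analogous to running with the inexact exception unmasked.
EXACT = Context(prec=18, traps=[Inexact])


def exact_total(amounts):
    """Sum decimal strings, raising decimal.Inexact if any digit would be lost."""
    total = Decimal(0)
    for amount in amounts:
        total = EXACT.add(total, Decimal(amount))
    return total
```

A ledger such as `exact_total(["19.99", "0.01"])` completes normally, while `EXACT.divide(Decimal(1), Decimal(3))` raises `Inexact` because the quotient cannot be represented without rounding.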

The support of interval arithmetic is considered one of the most important features of the 8087. As stated by W. Kahan [5]:

"No other feature would enhance safe numerical computation more than the provision of INTERVAL as a data type in FORTRAN as readily accessible as INTEGER or REAL."

This new INTERVAL data type, which the 8087 supports through the rounding modes (RC) and the signed zeros and infinities, can be represented as an ordered pair: INTERVAL, I = [a,b]. If a ≤ b then I includes all numbers between a and b; but if a > b then I includes all numbers x where x ≥ a or x ≤ b. An illustration may help clarify the concept. Consider the set of numbers as a circle with the two cases described above pictured as

[Figure: two circles representing the number system closed at infinity, one marking the interval for a ≤ b and one for a > b.]

Start at a and proceed clockwise until b is reached; all numbers covered belong to I. The signs on zero and infinity permit us to have open or closed intervals when zero or infinity is an end point with the sign denoting which case pertains. If an endpoint is neither zero nor infinity then the interval is always closed. A complete definition of interval arithmetic cannot be given here; however, we can list some of its uses. In addition to its obvious ability to bound rounding errors, interval arithmetic can be used to estimate the effect of noise in data, to compute confidence intervals and to do worst-case analysis.
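The way directed rounding supports intervals can be sketched with two Python `decimal` contexts, one rounding towards -∞ and one towards +∞. This is an illustration rather than a definition of interval arithmetic: the helper names are hypothetical, the six digit precision is chosen only to make the rounding visible, and only ordinary intervals with a ≤ b and finite endpoints are modelled.

```python
from decimal import Context, Decimal, ROUND_CEILING, ROUND_FLOOR

DOWN = Context(prec=6, rounding=ROUND_FLOOR)    # round towards -infinity
UP = Context(prec=6, rounding=ROUND_CEILING)    # round towards +infinity


def iv_ratio(n: int, d: int):
    """Interval guaranteed to contain n/d."""
    return (DOWN.divide(Decimal(n), Decimal(d)), UP.divide(Decimal(n), Decimal(d)))


def iv_add(a, b):
    """Lower bounds are added rounding down, upper bounds rounding up."""
    return (DOWN.add(a[0], b[0]), UP.add(a[1], b[1]))


def iv_mul(a, b):
    """Take the extreme endpoint products, each rounded outwards."""
    lows = [DOWN.multiply(x, y) for x in a for y in b]
    highs = [UP.multiply(x, y) for x in a for y in b]
    return (min(lows), max(highs))
```

Starting from `iv_ratio(1, 3)`, which is [0.333333, 0.333334], every later result still brackets the true value, so the width of the final interval bounds the accumulated rounding error.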

In addition to exact and interval arithmetic, the 8087 provides several special instructions for efficient evaluation of many important mathematical functions with unprecedented accuracy. One of these instructions is square root. It overwrites the contents of the top of stack with its correctly rounded (according to RC and PC) square root. Besides being correctly rounded the square root operation is as fast as the divide instruction. Thus algorithms need not be contorted to remove square roots.

There are two instructions to aid in argument reduction for transcendental function evaluation: DECOMPOSE and REMAINDER. The decompose instruction overwrites the contents of the top of stack with the integral value of its exponent in Temporary Real format, decrements the stack pointer and loads into the new top of stack the value of the significand of the original stack top scaled between 1 and 2 (or -1 and -2 if negative). The operation is illustrated below.

[Figure: the stack before DECOMPOSE, with the operand at TOP, and after, with the exponent beneath the significand at the new TOP.]

(If the original top of stack is zero then both results are zero.)
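The decompose operation corresponds closely to the C library's `frexp`, apart from the scaling of the significand. The Python sketch below models it on ordinary doubles; the function name `decompose` is hypothetical, the results are returned as a pair (exponent, significand) rather than left on a stack, and only finite inputs are considered.

```python
import math


def decompose(x: float):
    """Return (exponent, significand) with 1 <= |significand| < 2.

    Zero yields (0.0, 0.0), matching the rule that both results are zero.
    """
    if x == 0.0:
        return 0.0, 0.0
    m, e = math.frexp(x)       # x == m * 2**e with 0.5 <= |m| < 1
    return float(e - 1), m * 2.0
```

For instance `decompose(-6.0)` gives `(2.0, -1.5)`, since -6 = -1.5 × 2^2, and `math.ldexp(sig, int(exp))` always reconstructs the original value exactly.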

The remainder instruction is for reducing arguments of periodic functions to a primary range. It calculates the exact remainder (no roundoff error) of the top two stack elements:

REM = (TOP) modulo (next-of-TOP)

The remainder is returned to the stack top and the next-of-TOP ("divisor") is not changed. Since the execution of a floating-point remainder could be very lengthy, the remainder instruction is actually a primitive: the result is either the remainder or the partial remainder after a fixed number of steps. Thus to compute a remainder requires a software loop that terminates when |(TOP)| is less than |(TOP+1)|. Even by using remainder we will not have trigonometric functions with period 2π since π cannot be exactly represented in the 8087. However, the functions will be exactly periodic with period 2π* (where π* is the machine approximation to π) and thus will obey the identities that do not explicitly involve π.
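The software loop around a partial remainder primitive can be sketched as follows. This Python model is an assumption-laden illustration: `partial_remainder` is a hypothetical stand-in that reduces the exponent gap by at most `max_bits` binary places per call (the chip's actual step count is not given in the text), it relies on `math.fmod` being exact, and it assumes finite operands with a nonzero divisor.

```python
import math


def partial_remainder(top: float, divisor: float, max_bits: int = 16) -> float:
    """One bounded reduction step; every intermediate result is exact."""
    e_top = math.frexp(top)[1]
    e_div = math.frexp(divisor)[1]
    # Reduce against a scaled-up divisor when the exponents are far apart.
    shift = max(e_top - e_div - max_bits, 0)
    return math.fmod(top, math.ldexp(divisor, shift))


def remainder(top: float, divisor: float) -> float:
    """Repeat the primitive until |TOP| is less than |next-of-TOP|."""
    while abs(top) >= abs(divisor):
        top = partial_remainder(top, divisor)
    return top
```

Each step subtracts an exact multiple of the divisor, so the loop arrives at the same value as a single exact reduction: `remainder(1e20, 3.0)` equals `math.fmod(1e20, 3.0)`, only reached over several bounded steps.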

The other instructions provided for special functions are TANGENT, ARCTANGENT, EXPONENTIAL and LOGARITHM.

The tangent assumes the top of stack, X, is between zero and π/4 and returns two results as shown:

[Figure: TAN. Before: the argument at TOP. After: two stack elements whose ratio gives the tangent.]

The arctangent works in reverse by using two arguments and returning one:

[Figure: ATAN. Before: Y and X on the stack, with X at TOP and Y/X > 0. After: arctan(Y/X) at TOP.]

The exponential instruction, which calculates 2^X - 1, assumes that 0 ≤ X ≤ 1/2 and overwrites the argument on the top of the stack with the result. The logarithm function, which computes Y * log2(X), uses two arguments and returns a single result as shown

[Figure: LOG. Before: Y and X on the stack, with X > 0 at TOP. After: Y * log2(X) at TOP.]

The error bound for all these functions is about 2 units in the last place, thus allowing functions of Long Real arguments to be computed to Long Real accuracy. The provision of the described special functions supports the goal of increased capability.
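The restricted argument ranges mean that software wraps these primitives to obtain the usual library functions. The Python sketch below shows one possible wrapping, with the standard `math` module standing in for the chip: `two_x_minus_1` and `y_log2_x` are hypothetical stand-ins for the two primitives, and `exp2` and `natural_log` are illustrative compositions rather than Intel's recommended sequences.

```python
import math

LN2 = math.log(2.0)


def two_x_minus_1(x: float) -> float:
    """Stand-in for the exponential primitive, valid only for 0 <= x <= 1/2."""
    if not 0.0 <= x <= 0.5:
        raise ValueError("argument outside [0, 1/2]")
    return math.expm1(x * LN2)


def y_log2_x(y: float, x: float) -> float:
    """Stand-in for the logarithm primitive, which requires x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    return y * math.log2(x)


def exp2(x: float) -> float:
    """2**x for finite x: split off the integer part, then halve the fraction."""
    n = math.floor(x)
    half = (x - n) / 2.0                 # lies in [0, 1/2]
    root = 1.0 + two_x_minus_1(half)     # 2**(fraction / 2)
    return math.ldexp(root * root, n)    # square it, then scale by 2**n


def natural_log(x: float) -> float:
    """ln(x) = ln(2) * log2(x), using the Y operand as the scale factor."""
    return y_log2_x(LN2, x)
```

The Y operand of the logarithm is what makes a change of base free: loading ln 2 or log10 2 as Y yields natural or common logarithms from the same primitive.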

3.3 Ease of Use

As stated above, ease of use, along with support of the standard and extended capability, is a major 8087 goal. We have made the 8087 easy and convenient for programmers and automatic code generators by providing software emulation, a deep (8 levels) internal stack of very wide precision (64 bits) and large range (10^±4900), optimized symmetric mixed mode arithmetic and on chip default exception handling.

The interface between the 8086 (8088) and 8087 allows for software emulation of the 8087, permitting software for the 8087 to be developed, debugged and executed on a system containing only an 8086 (8088). In order to run the developed software on an 8087 it is not necessary to recompile but only relink. To understand how one can delay the resolution of either 8087 or emulator until the link stage, it is necessary to explain the 8086-8087 interface.

The 8086 (8088) has a set of ESCAPE instructions that, in memory addressing mode, cause the 8086 to calculate the address and read the contents of that address. The 8086 ignores the word it reads and then proceeds to execute subsequent instructions. The 8087 is monitoring the same instruction stream and when it detects an ESCAPE it knows that it is being instructed to do something. It latches the opcode and if there was an address calculated the 8087 captures both the address and the datum read by the 8086. By decoding the instruction the 8087 knows how many more words it needs from memory and it increments the address and fetches data until all required data is read. The 8087 then releases the bus and begins calculating while the 8086 continues executing the instruction stream. Because of the overlapped coprocessing of the 8086-8087 it is necessary to precede 8087 instructions (ESCAPE) with a WAIT instruction in order to synchronize the two processors. In place of the WAIT, when the software emulator is to be invoked, an INTERRUPT instruction is inserted. There are some other differences between the hardware and software interfaces, but the two instruction sequences are the same length and use the same addressing mechanism. This permits a compiler to output an external reference instead of the WAIT-ESCAPE and let the LINKER fill in with either WAIT-ESCAPE or INTERRUPT depending on whether the user has an 8087 or desires to use the emulator.

In addition to software emulation to aid software development, the 8087 has an eight level stack of registers that supports the Temporary Real (80 bit) format and makes the 8087 far easier to use than other floating-point processors. All calculations are done in this extended format and as long as intermediates are kept in the stack or its equivalent memory format (if eight is not enough) then the threat of roundoff damage and risk of overflow or underflow is greatly reduced. Roundoff error is reduced because Temporary Real intermediates are more precise than Long Real data or final results by eleven guard bits. Most overflows and underflows occur on intermediate calculations and the extended range of Temporary over Long Real (10^±4900 vs. 10^±308) ensures that on intermediates these exceptions need seldom, if ever, occur.

The symmetric mixed mode instruction set also contributes to ease of use. The CORE instructions, which include LOAD, STORE & POP, STORE, ADD, SUBTRACT, SUBTRACT REVERSE, MULTIPLY, DIVIDE, DIVIDE REVERSE, COMPARE, and COMPARE & POP, take one operand from the top of stack and a second operand from either memory or a stack element. There are thus two forms of CORE instructions: memory addressed and stack addressed. The memory addressed form supports four memory formats in all 8086 addressing modes:

Integer Word (16 bit 2's complement)
Integer (32 bit 2's complement)
Real (32 bit)
Long Real (64 bit)


The LOAD Integer instruction converts an integer to Temporary Real format and pushes it on the stack; the ADD Long Real instruction converts a Long Real operand to Temporary Real and adds it to the top of the stack; and the STORE Integer Word instruction converts the top of stack to a 16 bit integer and stores it in memory (without altering the contents of the stack).

The stack addressed form of the CORE instructions obtains the second operand from one of the stack elements instead of memory. The reference is always relative to the top of stack; thus stack element i, where i = 0, ..., 7, refers to the ith element of the stack under the top of stack. The stack addressed form has two options for the destination of the result. The result can either overwrite the top of stack or replace the contents of the ith stack element depending on the setting of the "direction" (D) bit in the instruction. If the destination is the ith stack element then depending on the setting of another bit (the "pop" (P) bit) the stack is popped or left unaltered.
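The TOP-relative addressing and the D and P options can be modelled in a few lines of Python. This sketch is a simplified model, not a description of the hardware: the class `RegisterStack` and its method names are hypothetical, a boolean per register stands in for the two bit tag, ordinary doubles stand in for Temporary Real, and only ADD is shown.

```python
class RegisterStack:
    """Eight registers addressed relative to a three-bit TOP pointer."""

    def __init__(self):
        self.regs = [0.0] * 8
        self.empty = [True] * 8   # stands in for the two-bit tags
        self.top = 0

    def _index(self, i: int) -> int:
        return (self.top + i) % 8

    def st(self, i: int = 0) -> float:
        """Read ST(i); an empty element is an invalid operation."""
        k = self._index(i)
        if self.empty[k]:
            raise LookupError("invalid operation: empty stack element")
        return self.regs[k]

    def push(self, value: float) -> None:
        k = (self.top - 1) % 8    # a push decrements TOP
        if not self.empty[k]:
            raise OverflowError("invalid operation: stack overflow")
        self.top = k
        self.regs[k], self.empty[k] = value, False

    def pop(self) -> float:
        value = self.st(0)
        self.empty[self.top] = True
        self.top = (self.top + 1) % 8
        return value

    def add(self, i: int, to_st_i: bool = False, pop: bool = False) -> None:
        """ST(0) + ST(i); to_st_i models the D bit, pop models the P bit."""
        result = self.st(0) + self.st(i)
        dest = self._index(i) if to_st_i else self.top
        self.regs[dest] = result
        if to_st_i and pop:       # the pop option applies only to an ST(i) destination
            self.pop()
```

With 1, 2 and 3 pushed in that order, `add(2, to_st_i=True, pop=True)` accumulates the top into the deepest element and discards the top, which is the pattern used to keep several running sums on the stack at once.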

The EXTENDED instruction set consists of two memory addressed types of instructions, LOAD and STORE & POP, that support three additional memory formats:

Long Integer (64 bit 2's complement)
Temporary Real (80 bit)
Packed BCD (80 bit)

The Temporary Real format is supported for extending the 8087 stack to memory when necessary; the Packed BCD format, which is a signed 18 digit integer as shown,

[Figure: the Packed BCD format, a sign followed by 18 decimal digits.]

is used to aid binary-decimal conversion and COBOL type calculations; and the Long Integer format is supported for applications requiring very wide precision exact computation. Again it is important to note that conversion of these formats to Temporary Real is done with no rounding error.
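A signed 18 digit packed BCD integer can be encoded and decoded as sketched below in Python. The byte layout used here is an assumption not spelled out in the text: ten bytes stored low byte first, two digits per byte in the first nine bytes with the less significant digit in the low nibble, and the sign in the top bit of the tenth byte. The function names are hypothetical.

```python
def bcd_pack(value: int) -> bytes:
    """Encode a signed integer of up to 18 decimal digits into 10 bytes."""
    if abs(value) >= 10 ** 18:
        raise ValueError("needs more than 18 decimal digits")
    digits = abs(value)
    out = bytearray(10)
    for k in range(9):
        digits, pair = divmod(digits, 100)
        out[k] = (pair // 10) << 4 | pair % 10   # two digits per byte
    out[9] = 0x80 if value < 0 else 0x00         # assumed sign position
    return bytes(out)


def bcd_unpack(data: bytes) -> int:
    """Decode the 10-byte layout produced by bcd_pack."""
    if len(data) != 10:
        raise ValueError("packed BCD operand must be 10 bytes")
    magnitude = 0
    for byte in reversed(data[:9]):
        magnitude = magnitude * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return -magnitude if data[9] & 0x80 else magnitude
```

Since 10^18 is less than 2^63, every such value fits in the 64 bit significand, which is why the conversion to Temporary Real introduces no rounding error.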

Another instruction, included to make the 8087 easy to use, is in neither the CORE nor the EXTENDED set but its value is obvious. That instruction is EXCHANGE top of stack with the ith stack element. This instruction has no memory form and ignores the D and P bits.

A further user convenience in the 8087 is its on-chip default exception handling. Though it is possible to handle exceptions with software, it is often an onerous task to write, debug and maintain exception handlers. The default 8087 response to an exception is invoked by masking that exception in the CONTROL WORD. The 8087's response to masked exceptions balances safety with the utility of continued calculation. Listed below are the default responses to masked exceptions:

1. Invalid Operation - if either operand is NAN, the 8087 propagates the lexicographically larger (ignoring the sign) otherwise it generates a special NAN called INDEFINITE as the result.

2. Denormalized Operand - the operand is converted to an equivalent unnormalized representation preserving the same number of leading zeros.

3. Zero Divisor - since the dividend is nonzero the result is ±∞ with the sign set in the usual way (XOR of the signs of the operands).

4. Overflow - the result is ∞ with the sign of the overflowed result.

5. Underflow - the result is denormalized to fit the destination's format ("gradual underflow" [4]).

6. Inexact Result - the correctly rounded result is returned.

All of the features discussed above: software emulation, deep Temporary Real stack, symmetric and powerful instruction set and default exception handling, make the 8087 easy and convenient to use; but to be useful it must also be efficient.

3.4 Efficiency

Efficiency was a major goal in the design of the 8087. An extensive treatment of the internal hardware and algorithms will be given elsewhere, but a brief description will illustrate our concern for performance. The 8087's main ALU is more than 64 bits wide. This is to handle efficiently 64 bit operands with guard, round and sticky bits [6] and at least one overflow bit. Its shifter can shift right or left from 0 to 63 places in one clock cycle. This is useful for formatting, normalizing and denormalizing and for the transcendental functions. For normalizing there is hardware for detecting the position of the most significant one. Finally, there is special hardware to permit multiply, divide, remainder and square root to be calculated rapidly. Approximate speeds of the basic operations for stack operands are summarized below:

OPERATION               MICROSECONDS (5 MHz)

COMPARE                  5
ADD (MAGNITUDE)         10
SUBTRACT (MAGNITUDE)    16
MULTIPLY                16, 24*
DIVIDE                  38
SQUARE ROOT             38

* shorter time if either operand was originally Real (32 bit)

The above timings apply for Real, Long Real or Temporary Real operands and results. The previously described overlapped instruction execution by the 8086 and 8087 also increases throughput. However, more important than absolute execution speeds is the stack with its internal addressing that minimizes memory referencing. There is an instruction for scaling that is much faster than multiply. For rapid context switching, the 8087 has SAVE and RESTORE instructions. The instruction set and the hardware to execute it rapidly give the 8087 very high performance without sacrificing quality.

4.0 CONCLUSION

To i l l u s t r a t e the capab i l i t i e s of the 8087 an extensive set of programs would be very usefu l . We w i l l here give two examples that should re in force many of the points made e a r l i e r . The f i r s t example is to ca lcu la te the length of a vector. The task is conceptual ly simple but a r e l i a b l e , robust pro- gram for the typ ica l f l o a t i n g - p o i n t system is hard to produce. With the 8087 i t is easy, almost auto- matic, to produce such a program.

    Temporary Real : SUM
    Long Real      : X(I), L

    SUM := 0
    For I = 1 to N Do
        SUM := SUM + X(I)**2
    L := SQRT(SUM)

This program is free from intermediate overflow or underflow problems and unless N is very large its only significant rounding error is in the last instruction, where it is unavoidable but easy to analyze.
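The effect of the wider accumulator can be imitated in software. The sketch below is an illustration under stated assumptions, not the 8087 program: it stands in for Temporary Real with a `decimal` context of 20 significant digits and a decimal exponent range of about ±4932, which approximates that format's range and precision but is not bit-exact. The function names are hypothetical.

```python
import math
from decimal import Context, Decimal

# Assumption: approximate Temporary Real by 20 decimal digits and a decimal
# exponent range of roughly +/-4932 (a stand-in, not a bit-exact model).
TEMP_REAL = Context(prec=20, Emax=4932, Emin=-4931)


def length_naive(x):
    """Accumulate the sum of squares in ordinary double precision."""
    total = 0.0
    for xi in x:
        total += xi * xi  # may overflow to inf or underflow to 0.0
    return math.sqrt(total)


def length_extended(x):
    """Accumulate in the wider format; round to double only at the end."""
    total = Decimal(0)
    for xi in x:
        d = Decimal(xi)  # exact conversion of the double
        total = TEMP_REAL.add(total, TEMP_REAL.multiply(d, d))
    return float(TEMP_REAL.sqrt(total))  # the single final rounding to double
```

For the vector (3e200, 4e200) the naive version overflows to infinity while the extended version returns approximately 5e200. For (3e-200, 4e-200) the naive version underflows to zero while the extended version returns approximately 5e-200.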

The second example demonstrates how several accumulations can be calculated efficiently in the 8087. If we have two sets of data, Xi and Yi, that we want to analyze, we very likely will want means, standard deviations and correlation coefficients. We thus want to calculate:

$$M_x = \sum x_i, \qquad M_y = \sum y_i, \qquad S_x = \sum x_i^2, \qquad S_y = \sum y_i^2, \qquad C_{xy} = \sum x_i y_i$$

In an ordinary stack machine, the five values listed above would probably be calculated in five separate passes through the data, requiring that each datum be read three times.
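For reference, the five sums are sufficient to obtain the statistics mentioned. The formulas below are the standard textbook forms, not taken from the paper. They assume that N denotes the number of data pairs and that the population (divide-by-N) form of the standard deviation is wanted:

$$\bar{x} = \frac{M_x}{N}, \qquad \sigma_x = \sqrt{\frac{S_x}{N} - \bar{x}^2}, \qquad r_{xy} = \frac{N C_{xy} - M_x M_y}{\sqrt{\left(N S_x - M_x^2\right)\left(N S_y - M_y^2\right)}}$$

The mean and standard deviation of the y data follow in the same way from $M_y$ and $S_y$.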

On the 8087 the five values can be calculated in one pass through the data, requiring that each datum be read only once. The architectural feature that permits this increase in efficiency is the ability to do arithmetic with operands from any stack element. The algorithm is described below.

STEP  ACTION

 0.   Clear five stack elements (push zero five times): Mx, My, Sx, Sy, Cxy
 1.   LOAD Xi
 2.   Add TOP (Xi) to Mx
 3.   Duplicate TOP of stack
 4.   Square TOP
 5.   Add TOP (Xi^2) to Sx and POP
 6.   LOAD Yi
 7.   Add TOP (Yi) to My
 8.   Multiply Xi by TOP (Yi)
 9.   Square TOP
10.   Add TOP (Yi^2) to Sy and POP
11.   Add TOP (XiYi) to Cxy and POP
12.   Loop to Step 1

The inner loop of this program has only eleven 8087 instructions and has the same properties of reliability and robustness as the first example. It is also efficient since the minimum computation and memory addressing is done.
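The steps above can be traced with a small software model of the register stack. This is a sketch in ordinary double precision, not 8087 code. The stack is a Python list with index 0 playing the role of TOP, and the order in which the five accumulators sit on the stack is an assumption, since the paper does not fix it. The function name `accumulate` is illustrative.

```python
def accumulate(xs, ys):
    """One pass over paired data; returns (Mx, My, Sx, Sy, Cxy)."""
    # Step 0: push zero five times. With index 0 as TOP the assumed layout
    # is [Mx, My, Sx, Sy, Cxy].
    st = [0.0] * 5
    for x, y in zip(xs, ys):   # Step 12 (the loop) is this for statement
        st.insert(0, x)        # 1. LOAD Xi              stack: Xi, ...
        st[1] += st[0]         # 2. add TOP to Mx        (Mx is ST(1))
        st.insert(0, st[0])    # 3. duplicate TOP        stack: Xi, Xi, ...
        st[0] *= st[0]         # 4. square TOP           stack: Xi^2, Xi, ...
        st[4] += st[0]         # 5. add TOP to Sx        (Sx is ST(4))
        st.pop(0)              #    and POP              stack: Xi, ...
        st.insert(0, y)        # 6. LOAD Yi              stack: Yi, Xi, ...
        st[3] += st[0]         # 7. add TOP to My        (My is ST(3))
        st[1] *= st[0]         # 8. multiply Xi by TOP   stack: Yi, XiYi, ...
        st[0] *= st[0]         # 9. square TOP           stack: Yi^2, XiYi, ...
        st[5] += st[0]         # 10. add TOP to Sy       (Sy is ST(5))
        st.pop(0)              #     and POP             stack: XiYi, ...
        st[5] += st[0]         # 11. add TOP to Cxy      (Cxy is ST(5))
        st.pop(0)              #     and POP             accumulators only
    return tuple(st)
```

In this model the stack never holds more than seven elements (after Steps 3 and 6), so the computation fits within the eight registers. Each datum enters the stack exactly once.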

The Intel 8087 Numeric Data Processor, along with its design goals of meeting Intel's REALMATH standard, and providing increased capability, ease of use and performance, has been described. We have attempted to balance safety and utility and have provided an unprecedented level of capability, accuracy and reliability in a math processor.

5.0 ACKNOWLEDGEMENTS

There are a great number of people who deserve recognition for their contribution to the 8087. The initial architectural design was the joint work of the author and Bruce Ravenel, relying heavily on the advice of Professor W. Kahan. Robert Koehler made significant contributions to the systems aspects of the 8087 and Janis Baron was responsible for designing the assembly language and implementing the emulator. A great deal of credit must go to Rafi Nave and his team in Intel Israel for implementing the 8087 and to Dai-Sun Tsien for carefully reviewing and checking the implementation. Perhaps most significant of all, we acknowledge the management of Intel for being willing to commit significant resources to both implementation and promotion of a standard for reliable numeric data processing.

BIBLIOGRAPHY

1. Moore, R.E. (1979), "Methods and Applications of Interval Analysis," SIAM Studies in Applied Mathematics, SIAM, Philadelphia.

2. Palmer, J. (1977), "The Intel Standard for Floating-Point Arithmetic," Proc. COMPSAC, 107-112.

3. Coonan, J., W. Kahan, J. Palmer, T. Pittman and D. Stevenson (1979), "A Proposed Standard for Floating-Point Arithmetic," SIGNUM Newsletter, October, 1979.

4. Kahan, W., J. Palmer (1979), "On a Proposed Floating-Point Standard," SIGNUM Newsletter, October, 1979.

5. Kahan, W. (1972), "A Survey of Error Analysis," Information Processing 71, North Holland Publishing Company, 1214-1239.

6. Yohe, J. (1973), "Roundings in Floating-Point Arithmetic," IEEE Trans. Computers, Vol. C-22, No. 6, 577-586.