Journal of Scientific & Industrial Research
Vol. 65, November 2006, pp. 900-904
Comparison of pipelined IEEE-754 standard floating point multiplier with
unpipelined multiplier
Kavita Khare1,*, R P Singh1 and Nilay Khare2
1Department of Electronics and Communication Engineering, MANIT, Bhopal
2CSE & IT Department, University Institute of Technology, Rajiv Gandhi Prodyougiki Vishvavidyalaya, Bhopal
Received 07 April 2005; revised 21 June 2006; accepted 20 July 2006
The IEEE-754 standard floating point multiplier, which provides highly precise computations, has been improved by
inserting pipeline stages to achieve high throughput and low area on the IC. A floating point multiplier using pipelining
has been simulated and analyzed, and its superiority over traditional designs is discussed. To achieve pipelining, one must
subdivide the input process into a sequence of subtasks, each of which can be executed by a specialized hardware stage that
operates concurrently with the other stages in the pipeline, without the need for extra computing units. Detailed synthesis and
simulation reports obtained with Xilinx ISE 5.2i and ModelSim software are given. The hardware design is implemented on Virtex
FPGA chips.
Keywords: Floating point adder, IEEE floating point standard, Latency, Model-Sim, VHDL, Xilinx ISE 5.2i
Introduction
Until recently, any meaningful floating-point
arithmetic (FPA) has been virtually impossible to
implement on Field Programmable Gate Arrays
(FPGA) based systems due to the limited density and
speed of older FPGAs. In addition, mapping
difficulties occurred due to inherent complexity of
FPA. With the introduction of high-level languages
such as VHDL, rapid prototyping of floating point
units has become possible. Advanced digital signal
processing requires FPA to achieve higher accuracy
and high dynamic range for numerical computation.
The IEEE has produced a standard for FPA. This
standard specifies how single precision (32 bit) and
double precision (64 bit) floating point numbers are to
be represented, as well as how arithmetic should be
carried out on them.
Methodology
In this paper, single precision representation is
dealt with1. The IEEE single precision floating point
standard representation requires a 32 bit word, whose
bits may be numbered from 0 to 31, left to right. The first bit
is the sign bit, s, the next eight bits are the exponent
bits, E, and the final 23 bits are the mantissa, m. In
IEEE-754 format2, the significand always takes on an
implied ‘1’ for the most significant digit, assuming the
value represented is normalized (Table 1). The essential
idea behind floating point number systems is to
formulate representations and computation procedures
in which the scaling procedures introduced by fixed-
point systems are built in2-4.
The value of a number is
$$N = (-1)^{s} \times 2^{E-127} \times (1.m)$$
where $0 < E < 255$ and the actual exponent is $e = E - 127$.
The magnitude of numbers is in the range $2^{-126} \times (1.0)$ to $2^{127} \times (2 - 2^{-23})$.
Table 1 Single precision floating point number
Exponent    Significand    Number represented
0           0              0
0           Non zero       Denormalized number (may be returned as a result of underflow in multiplication)
1 to 254    Anything       Floating point number
255         0              Infinity (a positive number divided by zero yields “infinity”)
255         Non zero       NaN, “Not A Number” (zero divided by zero yields NaN)
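The field layout, the value formula and the cases of Table 1 can be illustrated with a short Python sketch. The function names (`decode`, `classify`, `value`) are hypothetical and chosen only for this illustration; they are not part of the reported VHDL design. The sketch treats the word as an unsigned integer, so the sign bit (the leftmost bit, bit 0 in the numbering above) is the most significant bit of the integer.

```python
def decode(bits):
    """Split a 32-bit word into sign s, exponent field E and mantissa m."""
    s = (bits >> 31) & 0x1       # leftmost bit: sign
    E = (bits >> 23) & 0xFF      # next eight bits: biased exponent
    m = bits & 0x7FFFFF          # final 23 bits: mantissa
    return s, E, m

def classify(E, m):
    """Return the category of Table 1 for the given fields."""
    if E == 0:
        return "zero" if m == 0 else "denormalized"
    if E == 255:
        return "infinity" if m == 0 else "NaN"
    return "normalized"

def value(bits):
    """Evaluate N = (-1)^s * 2^(E-127) * (1.m) for a normalized number."""
    s, E, m = decode(bits)
    if classify(E, m) != "normalized":
        raise ValueError("formula applies to normalized numbers only")
    return (-1) ** s * 2.0 ** (E - 127) * (1 + m / 2 ** 23)
```

For example, the word 0x40700000 has s = 0, E = 128 and m = 0x700000, so `value(0x40700000)` evaluates 2^1 × 1.875.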
___________
*Author for correspondence
Tel: 0755-2420777; Fax: 07552670538
E-mail: [email protected]
KHARE et al.: COMPARISON OF PIPELINED IEEE-754 MULTIPLIER WITH UNPIPELINED MULTIPLIER
Pipelining offers an economical way to realize
temporal parallelism in digital systems, achieving
faster clock rates at the cost of latency1. Most
modern processors, from PCs to supercomputers, rely
on pipeline techniques and floating-point multipliers
(FPMs)/adders to achieve high throughput. A new
algorithm for pipeline insertion is developed here and
used for FP multiplication. The method of pipeline
insertion consists of introducing rows of
latches through the multiplier structure, dividing it
into rows of cells that operate independently of
each other5.
A multiplication operator is expected to produce its
result within a single clock cycle, which leads to a
circuit requiring substantial amounts of CLB
resources. Instead, a pipelined approach for the integer
multiplier has been examined, which still produces a
result in each clock cycle. With a pipelined
multiplier, resource consumption decreases and speed
increases. The FPMs are designed and synthesized through
Xilinx ISE 5.2i into a Virtex device.
Floating Point Multiplier and its VHDL Implementation
Assuming that the operands are already in the IEEE
754 format, performing the floating-point multiplication
$$R = X \times Y = (-1)^{X_s} (X_m \times 2^{X_e}) \times (-1)^{Y_s} (Y_m \times 2^{Y_e})$$
involves the following steps:
1) If one or both operands are equal to zero, return the result as zero; otherwise continue with the remaining steps.
2) Compute the sign of the result: Xs XOR Ys.
3) Compute the mantissa of the result: a) multiply the mantissas, Xm * Ym; b) round the result to the allowed number of mantissa bits.
4) Compute the exponent of the result: result exponent = biased exponent (X) + biased exponent (Y) - bias.
5) Normalize if needed, by shifting the mantissa right and incrementing the result exponent.
6) Check the result exponent for overflow/underflow: a) if larger than the maximum exponent allowed, return exponent overflow; b) if smaller than the minimum exponent allowed, return exponent underflow.
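The six steps listed above can be sketched in software as follows. This is only a behavioural illustration in Python, not the VHDL implementation reported here. The helper names (`float_to_bits`, `bits_to_float`, `fp_multiply`) are hypothetical. As simplifying assumptions, the sketch accepts only zero or normalized operands, rounds by truncation, returns infinity on overflow and flushes underflow to zero.

```python
import struct

BIAS = 127

def float_to_bits(x):
    """Single-precision bit pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(bits):
    """Python float for a single-precision bit pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_multiply(x_bits, y_bits):
    xs, xe, xm = x_bits >> 31, (x_bits >> 23) & 0xFF, x_bits & 0x7FFFFF
    ys, ye, ym = y_bits >> 31, (y_bits >> 23) & 0xFF, y_bits & 0x7FFFFF

    # Step 2: sign of the result (computed first so a zero result keeps it)
    sign = xs ^ ys

    # Step 1: a zero operand gives a zero result
    if (xe == 0 and xm == 0) or (ye == 0 and ym == 0):
        return sign << 31

    # Step 3a: multiply the 24-bit significands (hidden bit re-inserted)
    product = (xm | 0x800000) * (ym | 0x800000)   # up to 48 bits

    # Step 4: biased result exponent
    exp = xe + ye - BIAS

    # Step 5: normalize so that the leading 1 sits at bit 46
    if product & (1 << 47):
        product >>= 1
        exp += 1

    # Step 3b: keep 24 bits (truncation), then drop the hidden bit
    mantissa = (product >> 23) & 0x7FFFFF

    # Step 6: exponent overflow / underflow (simplified handling)
    if exp >= 255:
        return (sign << 31) | (0xFF << 23)        # infinity
    if exp <= 0:
        return sign << 31                         # flushed to zero
    return (sign << 31) | (exp << 23) | mantissa
```

For instance, `bits_to_float(fp_multiply(float_to_bits(1.5), float_to_bits(2.5)))` follows the path without a normalization shift, whereas multiplying 3.0 by 3.0 exercises the right shift and exponent increment of step 5.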
These independent operations within a multiplier
make it ideal for pipelining. The multiplier can be
divided into three stages: 1) Unpack the operands, re-insert
the hidden bit, and check for any exceptions on the
operands (such as zeros or NaN); 2) Multiply
the significands, calculate the sign of the
result and add the exponents;
and 3) Normalize and adjust the
exponent5.
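The effect of the three stages on latency and throughput can be illustrated with a small cycle-by-cycle model in Python. It is a sketch under stated assumptions, not the reported design: the names `stage1`, `stage2`, `stage3` and `run_pipeline` are hypothetical, the variables `reg1` and `reg2` stand in for the registers between stages, operands are assumed to be zero or normalized, and exponent range checks are omitted for brevity.

```python
import struct

def to_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]

def to_float(bits):
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def unpack(bits):
    """Return (sign, biased exponent, 24-bit significand with hidden bit)."""
    return bits >> 31, (bits >> 23) & 0xFF, (bits & 0x7FFFFF) | 0x800000

def stage1(pair):
    """Unpack the operands, re-insert the hidden bit, flag zero operands."""
    x, y = pair
    is_zero = (x & 0x7FFFFFFF) == 0 or (y & 0x7FFFFFFF) == 0
    return unpack(x), unpack(y), is_zero

def stage2(latched):
    """Multiply significands, compute the sign, add the exponents."""
    (xs, xe, xm), (ys, ye, ym), is_zero = latched
    return xs ^ ys, xe + ye - 127, xm * ym, is_zero

def stage3(latched):
    """Normalize and adjust the exponent, then repack (truncating)."""
    sign, exp, product, is_zero = latched
    if is_zero:
        return sign << 31
    if product >> 47:
        product >>= 1
        exp += 1
    return (sign << 31) | (exp << 23) | ((product >> 23) & 0x7FFFFF)

def run_pipeline(pairs):
    """Feed one operand pair per cycle; return (cycle, result_bits) tuples."""
    reg1 = reg2 = None                     # registers between the stages
    finished = []
    feed = list(pairs) + [None, None]      # two empty cycles drain the pipe
    for cycle, item in enumerate(feed, start=1):
        if reg2 is not None:
            finished.append((cycle, stage3(reg2)))
        reg2 = stage2(reg1) if reg1 is not None else None
        reg1 = stage1(item) if item is not None else None
    return finished
```

In this model the first result appears in cycle 3 and one further result appears in every following cycle, so n multiplications complete in n + 2 cycles. This is the sense in which pipelining trades latency for throughput.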
Rounding occurs in floating point multiplication
when the mantissa of the product is reduced from 48
bits to 24 bits. The least significant 24 bits are
discarded. Overflow occurs when the sum of the
exponents exceeds 127, the largest value which is
defined in bias-127 exponent representation. When
this occurs, the exponent is set to 128 (E = 255) and
the mantissa is set to zero, indicating + or - infinity.
Underflow occurs when the sum of the exponents is
more negative than -126, the most negative value
which is defined in bias-127 exponent representation.
When this occurs, the exponent field is set to E = 0.
If m = 0, the number is exactly zero. If m is not zero,
then a denormalized number is indicated, which has a
hidden bit of 0 and is scaled by $2^{-126}$. The smallest
such number which is not zero is $2^{-149}$. This number
retains only a single bit of precision in the rightmost
bit of the mantissa.
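The truncation and the exponent limits described above can be checked with a few lines of Python. The function names (`truncate_product`, `exponent_field`, `bits_to_float`) are hypothetical and serve only this illustration. Mapping every underflow to E = 0 follows the text; a full design would also have to shift the mantissa to form the denormalized result, which this sketch does not attempt.

```python
import struct

def bits_to_float(bits):
    """Python float for a single-precision bit pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def truncate_product(product48):
    """Reduce a 48-bit product to 24 bits by discarding the low 24 bits."""
    return product48 >> 24

def exponent_field(e):
    """Map a true exponent e to the 8-bit field E with the limits above."""
    if e > 127:        # overflow: E = 255 (with m = 0 this is infinity)
        return 255
    if e < -126:       # underflow: E = 0 (zero or denormalized)
        return 0
    return e + 127     # normal case: add the bias

# Bit patterns for the limiting values named in the text
POS_INFINITY = bits_to_float(0x7F800000)        # E = 255, m = 0
SMALLEST_DENORMAL = bits_to_float(0x00000001)   # E = 0, only the last bit of m set
```

Here `SMALLEST_DENORMAL` equals `2.0 ** -149`, and `truncate_product(0xC00000 * 0xC00000)` gives `0x900000`, the upper 24 bits of the product 1.5 × 1.5.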
Various VHDL modules developed are7,8:
multiplier_pckg.vhd: declares the various data types, functions and procedures in the design;
multiplier.vhd: consists of the various component instantiations and their port mapping;
flag_check_load.vhd: first stage of the pipeline that performs the function of loading the operands, checking for the exceptional inputs, compares the exponents and generates the exponent difference;
prod_sign.vhd: second stage in the pipeline that shifts the mantissa according to the exponent difference value generated in the previous stage;
speip_flag.vhd: third stage in the pipeline that performs the basic addition or subtraction; and
reg.vhd, reg_bit.vhd, reg_bitvector.vhd, reg_exp.vhd, reg_int.vhd, reg_mantissa.vhd, reg_mnt.vhd: describe the various registers used to interface the various stages.
Field Programmable Gate Arrays (FPGA)
An FPGA can be volatile or non-volatile. It consists of
a two-dimensional array of logic blocks. Each logic
block is programmable to implement any logic
function; thus, they are also called configurable logic
blocks (CLBs). Switchboxes or channels contain
interconnection resources that can be programmed to
connect CLBs to implement more complex logic
functions. Designers can use existing CAD tools to
convert HDL code in order to program FPGAs. An
FPGA contains 2,000-2,000,000 gates (or more).
Since an FPGA can be reprogrammed, the turnaround
time is only a few minutes. Advantages of FPGAs are
lower prototyping costs and shorter production lead
times, which advance the time-to-market and in turn
increase profitability. FPGAs can also ensure the reliability
of the design on the board9,10.
The Xilinx Virtex-II FPGA used here has input output
blocks (IOBs) in groups of two or four on the perimeter of each
device. Each IOB includes 6 storage elements, each of which can be
Table 2 Comparison between pipelined and unpipelined multipliers

Device utilization summary [Selected device: Virtex 2p (2vp50ff1517-6)]
Results                      Unpipelined multiplier      Pipelined multiplier
Number of slices             2222 out of 10304 (21%)     756 out of 24640 (3%)
Number of slice flipflops    102 out of 20608 (0%)       305 out of 49280 (0%)
Number of 4 input LUTs       4234 out of 20608 (20%)     1316 out of 49280 (2%)
Number of bonded IOBs        102 out of 588 (17%)        100 out of 916 (10%)

Timing summary (Speed Grade: -6)
Results                                      Unpipelined multiplier    Pipelined multiplier
Minimum period                               63.812 ns                 3.070 ns
Maximum frequency                            15.671 MHz                325.733 MHz
Minimum input arrival time before clock      70.262 ns                 1.265 ns
Maximum output required time after clock     5.690 ns                  1.265 ns

Thermal summary of multiplier
Results                           Unpipelined multiplier    Pipelined multiplier
Estimated junction temperature    25                        25
Ambient temp                      25                        25
Case temp                         25                        25
Theta J-A                         0 C/W                     0 C/W

Power summary of multiplier
S No.  Results                              Unpipelined multiplier    Pipelined multiplier
                                            I (mA)    P (mW)          I (mA)    P (mW)
1      Total estimated power consumption              938                       55
2      Vccint 1.5V                          533       933             300       450
3      Vcc.5V                               2         5               2         5
4      Clocks                               33        57              0         0
5      Nets                                 0         0               0         0
6      Logic                                0         0               0         0
7      Inputs                               1         1               0         0
8      Outputs                              0         0
9      Quiescent 1.5V                       500       875             300       450
10     Quiescent 2.5V                       2         5               2         5
Fig. 1 Flow diagram of pipelined multiplier
Fig. 2 Chip schematic of pipelined and unpipelined multipliers
Fig. 4 Simulation results of: a) Unpipelined multiplier; b) Pipelined multiplier
configured as an edge-triggered D-type flip-flop or a
level-sensitive latch. The device has CLBs arranged in arrays,
each connected to a switch matrix. Each CLB has 4 slices.
Fig. 3 RTL schematic of: a) Unpipelined multiplier (32 pages); b) Pipelined multiplier
Results and Conclusions
Both unpipelined and pipelined FP multipliers have
been implemented in VHDL (Figs 1-5). The
device utilization and timing summaries are
given in Table 2. Several FP multiplier units were
synthesized to quantify the performance and space
requirements under the reported approach.
Synthesis was carried out from a VHDL source and the
target device was a Xilinx Virtex-II FPGA
(2V1000FG456-6)11. Increasing the number
of pipeline stages increases the operating
frequency. If the pipelined multiplier is used, device
utilization and power consumption are reduced; further,
the maximum operating frequency increases from 15.671 to
325.733 MHz, and hence throughput increases (Table 2).
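The relationship between the minimum periods and maximum frequencies reported in Table 2 can be verified with a short calculation, sketched here in Python. The function name `max_frequency_mhz` and the constant names are hypothetical; the numerical inputs are the timing figures from Table 2.

```python
def max_frequency_mhz(min_period_ns):
    """Maximum clock frequency in MHz for a minimum clock period in ns."""
    return 1000.0 / min_period_ns

# Minimum periods reported in the timing summary of Table 2
UNPIPELINED_PERIOD_NS = 63.812
PIPELINED_PERIOD_NS = 3.070

f_unpipelined = max_frequency_mhz(UNPIPELINED_PERIOD_NS)
f_pipelined = max_frequency_mhz(PIPELINED_PERIOD_NS)

# Ratio of the clock rates, i.e. the gain in results per unit time
# when one result is produced in every clock cycle
speedup = f_pipelined / f_unpipelined
```

The computed frequencies agree with the reported 15.671 MHz and 325.733 MHz to within rounding, and the clock rate of the pipelined design is roughly 20.8 times that of the unpipelined one.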
References
1 Khare K, Singh R P & Khare N, Comparison of pipelined
IEEE-754 standard floating point adder with unpipelined
adder, J Sci Ind Res, 64 (2005) 354-357.
2 Shirazi Nabeel & Athanas P, Quantitative analysis of floating
point arithmetic based custom computing machines, IEEE
Symp on FPGA for Custom Computing Machines (Napa
Valley, California) 1995, 333-334.
3 Eldon John A & Robertson Craig, A floating point format for
signal processing, IEEE Acoustics, Speech, and Signal
Processing Conf (USA) 1992, 717-720.
4 Yalamanchi S & Koltur R, Single Precision Floating-Point
Unit, FDU project, 2001.
5 Asato C D, A data-path multiplier with automatic insertion
of pipeline stages, IEEE J Solid-State Circuits, 4 (1990)
383-885.
6 Walters A, Scaleable filter implement using 32 bit floating
point complex arithmetic on a FPGA based custom
computing platform, M S Thesis, Blacksburg, Virginia, 2002.
7 Ashenden Peter J, The Designers Guide to VHDL (Harcourt
Asia Pvt Ltd., Singapore) 2000, 53-335.
8 Douglas P, VHDL, 2nd edn (McGraw Hill, Singapore) 1994,
15-165.
9 Armstrong J R & Gray F G, Structured Logic Design with
VHDL (Prentice Hall, India) 1993, 15-139.
10 Eshraghian K & Weste Neil H E, Principle of CMOS and
VLSI Design: A system perspective, 2nd edn (Addison
Wesley Publishing company, Singapore) 1993, 175-459.
11 Puspam Vikram, Miller Andy & Chappman Ken, Xilinx
application notes Xapp 219, Oct 2001.
Fig. 5 FPGA Editor of: a) Unpipelined multiplier; b) Pipelined multiplier