
Page 1: Synthesizable Ip Core for 32

32-BIT FLOATING POINT PROCESSOR TEC

Abstract

This project deals with the design of a 32-bit floating point multiplier DSP processor for

RISC/DSP processor applications. It is capable of representing real and decimal numbers.

The floating point operations are incorporated into the design as functions. The numbers in

contention must first be converted into the standard IEEE floating point

representation before any sort of operation is conducted on them. In the standard

single precision format, a floating point number is a 32-bit word that is

segmented to represent the fields of the floating point number.

The IEEE single precision format consists of three fields: the sign, the exponent, and the

mantissa. The first bit is the sign bit, the next eight bits give the exponent magnitude, and

the remaining 23 bits represent the mantissa. The exponent in this IEEE standard is

represented in excess-127 format. Multiplication will be implemented by the processor.

The main functional blocks of the floating point arithmetic processor design include the

arithmetic logic unit (ALU), register organization, control and decoding unit, memory

block, and 32-bit floating point multiplication. This processor IP core can

be embedded in many places, such as a co-processor for an embedded DSP or an embedded RISC

controller.

The overall system architecture will be designed using an HDL language and simulation will be

done using FPGA-based EDA tools.

. DEPARTMENT OF ECE PAGE NO.-1


CHAPTER 1

INTRODUCTION

1.1 Objective

The main objective of this project is to design a 32 bit floating point basic arithmetic unit that can be used for DSP processor applications. The numbers in contention must first be converted into the standard IEEE floating point representation before any sort of operation is conducted on them. In the standard single precision format, a floating point number is a 32-bit word that is segmented to represent the fields of the floating point number.

The IEEE format consists of three fields: the sign, the exponent, and the mantissa. The first bit is the sign bit, the next eight bits give the exponent magnitude, and the remaining 23 bits represent the mantissa. The exponent in this IEEE standard is represented in excess-127 format. This processor IP core can be embedded in many places, such as a co-processor for an embedded DSP or an embedded RISC controller.

The following figure represents basic arithmetic unit for 32 bit floating point computations:

Fig 1.1 Basic Arithmetic Unit For 32 Bit Computations


When the input is given as a floating point number, it is represented in the IEEE 32 bit floating point standard. After this representation, the required operation is selected and performed, and the result is also represented in the 32 bit IEEE floating point standard.

1.2 Methodology

The overall system architecture will be designed using an HDL language, and simulation, synthesis and implementation (translation, mapping, placing and routing) will be done using various FPGA-based EDA tools. Finally, the performance of the proposed system architecture (speed, area, power and throughput) will be compared with already existing system implementations.

VLSI EDA tools used for development of code and simulation of modules are:

ACTIVE-HDL (ALDEC):

Active HDL is an integrated environment designed for development of VHDL, VERILOG, EDIF and mixed VHDL-VERILOG-EDIF RTL/behavioral/simulation models.

XILINX ISE:

Integrated Software Environment (ISE) enables you to quickly verify the functionality of the sources using the integrated simulation capabilities and the HDL Bencher test bench generator.

1.3 Conceptual Survey

Fixed point DSPs are generally cheaper than floating point devices. But since the fixed point format gives less precision and a smaller dynamic range, the floating point format is much preferred. A 32 bit floating point system can outperform a 16 bit fixed point system, and the difference can be rated in terms of signal to noise ratio and quantisation noise.

A longer instruction sequence is required in the fixed point format, even when a simple operation is to be performed.


1.4 Organisation of Thesis

Chapter 1 includes the objective of the project, the methodology used in designing it, and the conceptual survey.

Chapter 2 deals with the representation of floating point numbers in the IEEE floating point standard and how the basic arithmetic operations are carried out using this representation.

Chapter 3 gives a brief outline regarding DSP processors and why the 32 bit floating point format is much preferred for DSP processors rather than the 16 bit fixed point format.

Chapter 4 gives a brief description of VLSI language and tools used.

Chapter 5 shows the simulation results.

Chapter 6 includes applications of 32 bit floating point in DSP processor.

Chapter 7 gives conclusion and future scope of the project.


CHAPTER 2

FUNCTIONAL BLOCKS OF 32 BIT FLOATING POINT ARITHMETIC UNIT

2.1. Real Number System

A number representation specifies some way of storing a number that may be encoded as a string of digits. The arithmetic is defined as a set of actions on the representation that simulate classical arithmetic operations.

As shown in Figure 2.1, the real-number system comprises the continuum of real numbers from minus infinity (-∞) to plus infinity (+∞).

Figure 2.1: Binary Real Number System

Because the size and number of registers that any computer can have is limited, only a subset of the real-number continuum can be used in real-number calculations. As shown at the bottom of Figure 2.1, the subset of real numbers that a particular FPU supports represents an approximation of the real number system. The range and precision of this real-number subset is determined by the format that the FPU uses to represent real numbers.


There are several mechanisms by which strings of digits can represent numbers. In common mathematical notation, the digit string can be of any length, and the location of the radix point is indicated by placing an explicit "point" character (dot or comma) there. If the radix point is omitted then it is implicitly assumed to lie at the right (least significant) end of the string (that is, the number is an integer). In fixed-point systems, some specific assumption is made about where the radix point is located in the string.

2.2. Fixed-point Vs floating-point in digital signal processing

Fig 2.2: Fixed Vs Floating Point System

Digital signal processors (DSPs) are essential for real-time processing of real-world digitized data, performing the high-speed numeric calculations necessary to enable a broad range of applications – from basic consumer electronics to sophisticated industrial instrumentation. Software programmable for maximum flexibility and supported by easy-to-use, low-cost development tools, DSPs enable designers to build innovative features and differentiating value into their products, and get these products to market quickly and cost-effectively.

There are many considerations that system developers weigh when selecting digital signal processors for their applications. Among the key factors to consider are the computational capabilities required for the application, processor and system costs, performance attributes, and ease of development. Balancing these factors together, designers can identify the DSP that is best suited for an application.

Digital Signal Processing can be divided into two categories, fixed point and floating point. The basic difference between fixed point DSP hardware and floating point DSP hardware is that fixed-point DSP hardware performs strictly integer arithmetic, while floating-point DSPs support either integer or real arithmetic. Binary fixed point is usually used in special-purpose applications on embedded processors that can only do integer arithmetic, but decimal fixed point is common in commercial applications. These refer to the format used to store and manipulate numbers within the devices. Fixed point DSPs usually represent each number with a minimum of 16 bits, although a different length can be used. For instance, Motorola manufactures a family of fixed point DSPs that use 24 bits. There are four common ways that these 2^16 = 65,536 possible bit patterns can represent a number. In unsigned integer, the stored number can take on any integer value from 0 to 65,535. Similarly, signed integer uses two's complement to make the range include negative numbers, from -32,768 to 32,767. With unsigned fraction notation, the 65,536 levels are spread uniformly between 0 and 1. Lastly, the signed fraction format allows


negative numbers, equally spaced between -1 and 1. In comparison, floating point DSPs typically use a minimum of 32 bits to store each value. This results in many more bit patterns than for fixed point, 2^32 = 4,294,967,296 to be exact.
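The four fixed-point interpretations described above can be sketched in Python (used here purely for illustration, since the design itself is in HDL; the helper names are my own, and the exact fraction scaling is one common convention, which varies by device):

```python
# Four common readings of the same 16-bit pattern, as described above.

def interpret_u16(bits):    # unsigned integer: 0 .. 65,535
    return bits

def interpret_s16(bits):    # two's complement signed: -32,768 .. 32,767
    return bits - 65536 if bits & 0x8000 else bits

def interpret_ufrac(bits):  # unsigned fraction: levels spread over [0, 1)
    return bits / 65536

def interpret_sfrac(bits):  # signed fraction: levels spread over [-1, 1)
    return interpret_s16(bits) / 32768

pattern = 0x8000
print(interpret_u16(pattern))    # 32768
print(interpret_s16(pattern))    # -32768
print(interpret_ufrac(pattern))  # 0.5
print(interpret_sfrac(pattern))  # -1.0
```

The same 16 bits thus stand for four very different values; only the chosen convention tells the hardware which one is meant.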

A key feature of floating point notation is that the represented numbers are not uniformly spaced. In the most common format (ANSI/IEEE Std. 754-1985), the largest and smallest normalized numbers are ±3.4 × 10^38 and ±1.2 × 10^-38, respectively. The represented values are unequally spaced between these two extremes, such that the gap between any two adjacent numbers is about ten million times smaller than the value of the numbers. This is important because it places large gaps between large numbers, but small gaps between small numbers.
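This non-uniform spacing can be observed by stepping a float32 bit pattern by one unit in the last place (a Python sketch for illustration; the struct module reinterprets the bits):

```python
import struct

def next_float32(x):
    """Next representable float32 above x (assumes x positive and finite)."""
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    (nxt,) = struct.unpack('>f', struct.pack('>I', bits + 1))
    return nxt

for x in (1.0, 1000000.0):
    gap = next_float32(x) - x
    # Absolute gap grows with x, but the relative gap stays near 2**23.
    print(x, gap, x / gap)
```

Near 1.0 the gap is 2**(-23), while near one million it is 1/16; in both cases the value is roughly 2**23 ≈ 8.4 million times the gap, matching the "about ten million times" figure above.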

All floating point DSPs can also handle fixed point numbers, a necessity to implement counters, loops, and signals coming from the ADC and going to the DAC. However, this doesn't mean that fixed point math will be carried out as quickly as the floating point operations; it depends on the internal architecture. For instance, the SHARC DSPs are optimized for both floating point and fixed point operations, and execute them with equal efficiency. For this reason, the SHARC devices are often referred to as "32-bit DSPs," rather than just "Floating Point."

2.3. Floating point

A floating-point number is one which is capable of representing real and decimal numbers. The floating-point operations are incorporated into the design as functions. The logic for these is different from the ordinary arithmetic functions. Floating point describes a system for representing numbers that would be too large or too small to be represented as integers. The term "floating point" refers to the fact that the radix point can "float"; that is, it can be placed anywhere relative to the significant digits of the number. This position is indicated separately in the internal representation, and floating-point representation can thus be thought of as a computer realization of scientific notation. These numbers are in general represented approximately to a fixed number of significant digits and scaled using an exponent. The advantage of floating-point representation over fixed-point (and integer) representation is that it can support a much wider range of values.

The speed of floating-point operations is an important measure of performance for computers in many application domains. It is measured in "FLOPS".

2.4. General floating point format:

A floating-point number consists of:

A signed digit string of a given length in a given base (or radix). This is known as the significand, or sometimes the mantissa (see below) or coefficient. The radix point is not explicitly included, but is implicitly assumed to always lie in a certain position within the significand often just after or just before the most significant


digit, or to the right of the rightmost digit. The length of the significand determines the precision to which numbers can be represented.

A signed integer exponent, also referred to as the characteristic or scale, which modifies the magnitude of the number.

The typical number that can be represented exactly is of the form:

significand digits × base^exponent

The base for the scaling is normally 2, 10 or 16.

Symbolically, this final value is s × b^e, where s is the value of the significand (after taking into account the implied radix point), b is the base, and e is the exponent.

The significand is multiplied by the base raised to the power of the exponent, equivalent to shifting the radix point from its implied position by a number of places equal to the value of the exponent to the right if the exponent is positive or to the left if the exponent is negative.

A real-valued number is represented in a floating-point format as: (-1)^Sign × Significand × 2^Exponent

where:

Sign is 0 for positive values, 1 for negative values. Significand is a real number, composed as Integer.Fraction. Exponent is an integer value.

The range of floating-point numbers depends on the number of bits or digits used for representation of the significand (the significant digits of the number) and for the exponent.

The floating-point format needs slightly more storage (to encode the position of the radix point), so when stored in the same space, floating-point numbers achieve their greater range at the expense of precision.
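The value formula above, (-1)^Sign × Significand × 2^Exponent, can be evaluated directly (a minimal Python sketch with illustrative names, not part of the processor design):

```python
def fp_value(sign, significand, exponent, base=2):
    """Evaluate (-1)**sign * significand * base**exponent."""
    return (-1) ** sign * significand * base ** exponent

print(fp_value(0, 1.5, 3))   # +1.5 * 2**3 = 12.0
print(fp_value(1, 1.5, 2))   # -1.5 * 2**2 = -6.0
```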

2.5. IEEE format for floating point:

The IEEE has standardized the computer representation for binary floating-point numbers in IEEE 754. The IEEE-754 standard was created in the early 1980s after word sizes of 32 bits (or 16 or 64) had been generally settled upon. Prior to the IEEE-754 standard, computers used many different forms of floating-point. These differed in the word sizes, the format of the representations, and the rounding behaviour of operations. These differing systems implemented different parts of the arithmetic in hardware and software, with varying accuracy. The bit representation of an IEEE binary floating-point number is proportional to its base 2 logarithm, with an average error of about 3%. (This is because the exponent field is in


the more significant part of the datum.) This can be exploited in some applications, such as volume ramping in digital sound processing. The IEEE 754 binary floating point standard is used to represent floating point numbers and define the results of arithmetic operations. A float is represented using 32 bits, and each possible combination of bits represents one real number. This means that at most 2^32 possible real numbers can be exactly represented, even though there are infinitely many real numbers (even between 0 and 1).

IEEE-754 specifies binary representations for floating point numbers:

Table 2.5: Representations for Floating Point Numbers

IEEE floating point numbers have three basic components: the sign, the exponent, and the mantissa.

The Sign Bit: The sign bit is as simple as it gets. 0 denotes a positive number; 1 denotes a negative number. Flipping the value of this bit flips the sign of the number.

The Exponent: The exponent field needs to represent both positive and negative exponents. To do this, a bias is added to the actual exponent in order to get the stored exponent.

The Mantissa: The mantissa, also known as the significand, represents the precision bits of the number. It is composed of an implicit leading bit and the fraction bits.

So, to sum up:

1. The sign bit is 0 for positive, 1 for negative.
2. The exponent's base is two.
3. The exponent field contains 127 plus the true exponent for single precision, or 1023 plus the true exponent for double precision.
4. The first bit of the mantissa is typically assumed to be 1.f, where f is the field of fraction bits.

There are many formats that are used for representation of floating point number. A few among them are:

16-bit : Half (binary16)

32-bit : Single (binary32)

64-bit : Double (binary64)

128-bit : Quadruple (binary128)


2.5.1 Half precision format:

The following figure shows the layout of the half precision (16 bit) floating point format, in which the sign field is 1 bit, the exponent field is 5 bits and the significand field is 10 bits.

Table 2.5.1: Half Precision Floating Point Format

Type Sign Exponent Fraction Total Bits Bits precision Exponent Bias

Half Precision 1 [15] 5 [14-10] 10 [09-00] 16 11 15

2.5.2 Single Precision Format:

The following figures show the layout for single (32-bit) precision floating-point values. The number of bits for each field are shown (bit ranges are in square brackets):

Table 2.5.2: Single (32-Bit) Precision Floating-Point Format

Type Sign Exponent Fraction Total Bits Bits precision Exponent Bias

Single Precision 1 [31] 8 [30-23] 23 [22-00] 32 24 127

The IEEE single precision floating point standard representation requires a 32 bit word, which may be represented as numbered from 0 to 31, left to right.

IEEE floating point 32 standard is:

V = S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
    31 30-23   22-0

The first bit is the sign bit, S, the next eight bits are the exponent bits, 'E', and the final 23 bits are the fraction 'F':

For example: 0 10000100 10101010110011000101010

where the 1st bit represents the sign bit (0), the next 8 bits represent the exponent bits (10000100) and the remaining 23 bits represent the mantissa bits (10101010110011000101010).
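The field split just shown can be reproduced with shifts and masks (a Python sketch for illustration; the struct module then reinterprets the same 32 bits as a float for comparison):

```python
import struct

pattern = '01000010010101010110011000101010'   # the example word above
bits = int(pattern, 2)

sign = bits >> 31                 # bit 31
exponent = (bits >> 23) & 0xFF    # bits 30-23, stored (biased) exponent
mantissa = bits & 0x7FFFFF        # bits 22-0, fraction field

print(sign, format(exponent, '08b'), format(mantissa, '023b'))

# Reinterpreting the same bits as a float gives (-1)**S * 1.F * 2**(E-127):
(value,) = struct.unpack('>f', struct.pack('>I', bits))
print(value)
```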

The value V represented by the word may be determined as follows:

If E=255 and F is nonzero, then V=NaN ("Not a number")
If E=255 and F is zero and S is 1, then V=-Infinity
If E=255 and F is zero and S is 0, then V=Infinity


If 0<E<255 then V=(-1)**S * 2 ** (E-127) * (1.F) where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.

If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-126) * (0.F). These are "denormalized" values.

If E=0 and F is zero and S is 1, then V=-0
If E=0 and F is zero and S is 0, then V=0

In particular,

0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0

0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = -Infinity

0 11111111 00000100000000000000000 = NaN
1 11111111 00100010001001010101010 = NaN

0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2
0 10000001 10100000000000000000000 = +1 * 2**(129-127) * 1.101 = 6.5
1 10000001 10100000000000000000000 = -1 * 2**(129-127) * 1.101 = -6.5

0 00000001 00000000000000000000000 = +1 * 2**(1-127) * 1.0 = 2**(-126)
0 00000000 10000000000000000000000 = +1 * 2**(-126) * 0.1 = 2**(-127)
0 00000000 00000000000000000000001 = +1 * 2**(-126) * 0.00000000000000000000001 = 2**(-149) (smallest positive value)
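The rules for V can also be written out directly and cross-checked against the machine's own IEEE 754 interpretation of the same bits (a Python sketch, for illustration only):

```python
import struct

def decode_float32(bits):
    """Apply the E/F rules above to a 32-bit pattern."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 255:
        if f != 0:
            return float('nan')                      # E=255, F nonzero
        return float('-inf') if s else float('inf')  # E=255, F zero
    if e == 0:                                       # zero and denormalized
        return (-1) ** s * (f / 2 ** 23) * 2 ** -126
    return (-1) ** s * (1 + f / 2 ** 23) * 2 ** (e - 127)

for p in ('01000000000000000000000000000000',   # +2
          '11000000110100000000000000000000',   # -6.5
          '00000000000000000000000000000001'):  # 2**(-149)
    b = int(p, 2)
    (ref,) = struct.unpack('>f', struct.pack('>I', b))
    print(decode_float32(b), ref)
```

For every pattern the hand-computed value agrees with the value struct obtains by reinterpreting the bits as a hardware float32.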

Examples of IEEE 754 single precision format:

-0.3125 = -0.0101 (binary) = -1.01 × 2^-2

The biased exponent is -2+127 = 125 = 01111101, so -0.3125 is

1 01111101 01000000000000000000000

1.0 = 1.0 × 2^0

The biased exponent is 0+127 = 127 = 01111111, so 1.0 is

0 01111111 00000000000000000000000


37.5 = 100101.1 (binary) = 1.001011 × 2^5

The biased exponent is 127+5 = 132 = 10000100, so 37.5 is

0 10000100 00101100000000000000000

-78.25 = -1001110.01 (binary) = -1.00111001 × 2^6

The biased exponent is 127+6 = 133 = 10000101, so -78.25 is

1 10000101 00111001000000000000000

-1313.3125

1313 (decimal) = 10100100001 (binary)

0.3125 × 2 = 0.625 0

0.625 × 2 = 1.25 1

0.25 × 2 = 0.5 0

0.5 × 2 = 1.0 1

1313.3125 (decimal) = 10100100001.0101 (binary)

= 1.01001000010101 × 2^10

The biased exponent is 10 + 127 = 137 = 10001001, and the sign bit is 1.

So -1313.3125 is

1 10001001 01001000010101000000000

0.1015625

0.1015625 × 2 = 0.203125 0

0.203125 × 2 = 0.40625 0

0.40625 × 2 = 0.8125 0


0.8125 × 2 = 1.625 1

0.625 × 2 = 1.25 1

0.25 × 2 = 0.5 0

0.5 × 2 = 1.0 1

0.1015625 (decimal) = 0.0001101 (binary) = 1.101 × 2^-4

The biased exponent is -4 + 127 = 123 = 01111011

So 0.1015625 is

0 01111011 10100000000000000000000
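The repeated multiply-by-2 steps used in the last two examples can be captured in a small routine (a Python sketch; the function name is illustrative, not from this project):

```python
def fraction_bits(frac, n):
    """First n binary digits of a decimal fraction, by repeated doubling."""
    bits = []
    for _ in range(n):
        frac *= 2
        bit = int(frac)      # the integer part that appears is the next bit
        bits.append(str(bit))
        frac -= bit          # keep only the fractional part and continue
    return ''.join(bits)

print(fraction_bits(0.3125, 4))     # 0101
print(fraction_bits(0.1015625, 7))  # 0001101
```

Both outputs match the digit columns produced step by step above.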

2.5.3. Double Precision Format:

The following figures show the layout for double (64-bit) precision floating-point values. The number of bits for each field are shown (bit ranges are in square brackets):

Table 2.5.3: Double (64-Bit) Precision Floating-Point Format

Type Sign Exponent Fraction Total Bits Bits precision Exponent Bias

Double Precision 1 [63] 11 [62-52] 52 [51-00] 64 53 1023

The IEEE double precision floating point standard representation requires a 64 bit word, which may be represented as numbered from 0 to 63, left to right. The first bit is the sign bit, S, the next eleven bits are the exponent bits, 'E', and the final 52 bits are the fraction 'F':

V = S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    63 62-52      51-0

For example: 0 10100010101 010000101011100001011100111100001011111001110001001

where the 1st bit represents the sign bit (0), the next 11 bits represent the exponent bits (10100010101) and the remaining part represents the mantissa bits (010000101011100001011100111100001011111001110001001).

The value V represented by the word may be determined as follows:


If E=2047 and F is nonzero, then V=NaN ("Not a number")
If E=2047 and F is zero and S is 1, then V=-Infinity
If E=2047 and F is zero and S is 0, then V=Infinity
If 0<E<2047 then V=(-1)**S * 2 ** (E-1023) * (1.F) where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.

If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-1022) * (0.F). These are "denormalized" values.

If E=0 and F is zero and S is 1, then V=-0 If E=0 and F is zero and S is 0, then V=0
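The 64-bit field split works the same way as the 32-bit one, with 11 exponent bits and a bias of 1023 (a Python sketch for illustration):

```python
import struct

def float64_fields(x):
    """Split x into the S, E, F fields of the 64-bit format above."""
    (bits,) = struct.unpack('>Q', struct.pack('>d', x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # 11 stored exponent bits
    fraction = bits & ((1 << 52) - 1)     # 52 fraction bits
    return sign, exponent, fraction

s, e, f = float64_fields(-6.5)
print(s, e - 1023)   # sign 1, true exponent 2 (since -6.5 = -1.101 * 2**2)
print((-1) ** s * (1 + f / 2 ** 52) * 2 ** (e - 1023))   # -6.5
```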

2.5.4. Quadruple Precision Format:

The following table represents the format for 128 bit Quadruple precision, with 1 sign bit, 15 exponent bits and 112 significand bits.

Table 2.5.4: Quadruple (128-Bit) Precision Floating-Point Format

Type Sign Exponent Fraction Total Bits Bits precision Exponent Bias

Quadruple Precision 1 [127] 15 [126-112] 112 [111-00] 128 113 16383

2.6 Ranges of Floating-Point Numbers

By allowing the radix point to be adjustable, floating-point notation allows calculations over a wide range of magnitudes, using a fixed number of digits, while maintaining good precision. The range of floating-point numbers depends on the number of bits or digits used for representation of the significand (the significant digits of the number) and for the exponent.

Here's a table of the effective range (excluding infinite values) of IEEE floating-point numbers:

Table 2.6.1: Effective Range of IEEE Floating Point Number


Type Binary Decimal

Single (2-2^-23) × 2^127 ~ 10^38.53

Double (2-2^-52) × 2^1023 ~ 10^308.25


The range of positive floating point numbers can be split into normalized numbers (which preserve the full precision of the mantissa), and denormalized numbers (discussed later) which use only a portion of the fraction’s precision.

Table 2.6.2: Effective Range of IEEE Floating Point Number with Denormalized, Normalized And Approximate Decimal Values.

Type Denormalized Normalized Approximate Decimal

Single Precision 2^-149 to (1-2^-23) × 2^-126 2^-126 to (2-2^-23) × 2^127 ~10^-44.85 to ~10^38.53

Double Precision 2^-1074 to (1-2^-52) × 2^-1022 2^-1022 to (2-2^-52) × 2^1023 ~10^-323.3 to ~10^308.3

Since the sign of floating point numbers is given by a special leading bit, the range for negative numbers is given by the negation of the above values.

The number of normalized floating point numbers in a system F(B, P, L, U) (where B is the base of the system, P is the precision of the system to P numbers, L is the smallest exponent representable in the system, and U is the largest exponent used in the system) is: 2(B − 1)(B^(P−1))(U − L + 1).

There is a smallest positive normalized floating-point number, Underflow level = UFL = B^L, which has a 1 as the leading digit and 0 for the remaining digits of the significand, and the smallest possible value for the exponent.

There is a largest floating point number, Overflow level = OFL = (1 − B^-P)(B^(U+1)), which has B − 1 as the value for each digit of the significand and the largest possible value for the exponent.
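Plugging in the single precision parameters B = 2, P = 24, L = -126, U = 127 reproduces the familiar limits (a Python sketch of the formulas above, for illustration):

```python
# Single precision: base 2, 24 significand bits (23 stored + hidden bit).
B, P, L, U = 2, 24, -126, 127

count = 2 * (B - 1) * B ** (P - 1) * (U - L + 1)  # normalized numbers
UFL = B ** L                                      # smallest normalized
OFL = (1 - B ** -P) * B ** (U + 1)                # largest finite

print(count)   # 4261412864
print(UFL)     # 2**(-126), about 1.18e-38
print(OFL)     # about 3.4028e38, the float32 maximum
```

Note that (1 − 2^-24) × 2^128 is algebraically the same quantity as (2 − 2^-23) × 2^127 from Table 2.6.1.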

There are five distinct numerical ranges that single-precision floating-point numbers are not able to represent:

1. Negative numbers less than -(2-2^-23) × 2^127 (negative overflow)
2. Negative numbers greater than -2^-149 (negative underflow)
3. Zero


4. Positive numbers less than 2^-149 (positive underflow)
5. Positive numbers greater than (2-2^-23) × 2^127 (positive overflow)

Overflow occurs when the sum of the exponents exceeds 127, the largest value which is defined in bias-127 exponent representation. When this occurs, the exponent is set to 128 (E = 255) and the mantissa is set to zero indicating + or - infinity.

Underflow occurs when the sum of the exponents is more negative than -126, the most negative value which is defined in bias-127 exponent representation. When this occurs, the exponent is set to -127 (E = 0). If M = 0, the number is exactly zero.

2.7 Benefits Of Using Floating Point Arithmetic Over Fixed Point Arithmetic:

Image and digital signal processing applications require high floating-point calculation throughput, and nowadays FPGAs are being used for performing these Digital Signal Processing (DSP) operations. Floating point operations are hard to implement on FPGAs as their algorithms are quite complex. These floating point arithmetic operations are one of the performance bottlenecks in high speed and low power image and digital signal processing applications. Recently, there has been significant work on analysis of high-performance floating-point arithmetic on FPGAs. It is a well known concept that the single precision floating point algorithm is divided into three main parts corresponding to the three parts of the single precision format. Fixed point numbers usually allow only 8 bits (in 32 bit computing) for the fractional portion of the number, which means many decimal numbers are recorded inaccurately. Floating point numbers use exponents to shift the decimal point, so they can store more accurate fractional values than fixed point numbers. However, the CPU has to perform extra arithmetic to read a number stored in this format.

2.8 Architecture of 32-Bit Floating Point Basic Arithmetic Unit


Fig 2.8 : Architecture of 32-Bit Floating Point Basic Arithmetic Unit

There are four basic arithmetic operations performed using floating point numbers. They are:

1. Addition
2. Subtraction
3. Multiplication
4. Division

2.8.1. Addition

Whenever addition is performed with two real numbers, the digits after the decimal point are often discarded. By using floating point addition this can be avoided to some extent. Floating point addition is analogous to addition using scientific notation. For example, let us consider two numbers a = 2.25 × 10^0 and b = 1.340625 × 10^2. For the addition of these two numbers the following steps are performed:

1. Shift the decimal point of the smaller number until the exponents are equal, i.e., as the smaller number here is a = 2.25 × 10^0 and the larger exponent is 2, the decimal point of 'a' is shifted left (its mantissa is shifted right) until both exponents become equal. Hence the value of number 'a' becomes 0.0225 × 10^2.

2. Add the numbers with decimal points aligned.

Now as both the exponent values are same, both the numbers are added.

3. Normalize the result.


The normalised result retains the required number of digits, discarding the unwanted part. But the discarded part may sometimes carry a needed portion of the result.

Consider an example in which a = 1.234567 × 10^0 and b = 9.876543 × 10^-8. If the addition is performed, the following may occur:

1. The smaller number is shifted right so as to equalise its exponent with that of the larger number, i.e., b = 9.876543 × 10^-8 after shifting becomes b = 0.00000009876543 × 10^0.

2. Now both the numbers are added, i.e.

a = 1.234567 × 10^0
b = 0.00000009876543 × 10^0
c = 1.23456709876543 × 10^0

In this case the normalised result after rounding to seven digits becomes 1.2345670 × 10^0,

in which the discarded part (9876543) also carried part of the result. Thus this case can be said to have rounding errors. Rounding errors occur when the normalized result has lost its last digits, which were discarded in rounding off the value.

2.8.1.1. Block diagram representation of floating point adder:

Fig 2.8.1.1: Block Diagram of Floating Point Adder (inputs ExpA, ExpB, ManA, ManB, SignA, SignB; blocks: exponent calculator, mantissa adder, sign calculator, normalization unit)

Let A and B be two numbers represented in the IEEE floating point standard, with SignA as the sign of number A, ExpA as the exponent of A, ManA as the mantissa of A, SignB as the sign of number B, ExpB as the exponent of B, and ManB as the mantissa of B. The exponents are made the same for both numbers by right shifting the mantissa of the smaller number. The mantissas of A and B are then added. If both numbers are positive, bit 0 is produced for the sign, and if both are negative, bit 1 is produced. If the numbers have opposite signs, the sign of the greater number is taken.

2.8.1.2. Addition of floating points using IEEE 754 format:

Addition of two floating point numbers using the IEEE 754 format involves the following steps, which can be represented in the form of a flow chart as follows:


Fig 2.8.1.2: Flow Chart for Floating Point Adder

1. Firstly, the numbers are represented in IEEE floating point format.

2. The exponents of the two numbers are compared, and the smaller number's mantissa is shifted right until the exponents of both numbers are the same. If the exponents are stored in biased form, their sum would contain double the bias; thus, the bias value must be subtracted from the sum.

3. Addition of significands is done.

4. The result is normalised either by shifting the significand to the right and incrementing the exponent, or by shifting the significand to the left and decrementing the exponent.

5. If there is an underflow or overflow, an exception is raised.

6. If not, the significand is rounded to the appropriate number of bits and normalization is checked again.




7. If the result is already normalized, it becomes the final result; otherwise it is normalized again and converted back to floating point form.

Negative mantissas are handled by first converting them to 2's complement and then performing the addition. After the addition is performed, the result is converted back to sign-magnitude form.
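This sign-magnitude / 2's complement round trip can be sketched in Python. It is an illustrative model only, assuming the 24-bit mantissa width used elsewhere in this document.

```python
WIDTH = 24                  # mantissa width assumed from the text
MASK = (1 << WIDTH) - 1

def to_twos_complement(sign, magnitude):
    """Convert a sign-magnitude mantissa to WIDTH-bit two's complement."""
    return magnitude if sign == 0 else (-magnitude) & MASK

def to_sign_magnitude(value):
    """Convert a WIDTH-bit two's complement value back to (sign, magnitude)."""
    if value & (1 << (WIDTH - 1)):      # negative when the top bit is set
        return 1, (-value) & MASK
    return 0, value

# Add +5 and -3 as sign-magnitude mantissas: convert, add, convert back.
total = (to_twos_complement(0, 5) + to_twos_complement(1, 3)) & MASK
print(to_sign_magnitude(total))   # (0, 2)
```

The wrap-around mask plays the role of discarding the adder's carry-out, exactly as a fixed-width hardware adder would.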

When adding numbers of opposite sign, cancellation may occur, resulting in a sum which is arbitrarily small, or even zero if the numbers are equal in magnitude. Normalization in this case may require shifting by the total number of bits in the mantissa, resulting in a large loss of accuracy.

Consider addition of the numbers 2.25 x 10^0 and 1.340625 x 10^2.

The number 2.25 in IEEE Floating Point Standard is:

0 10000000 00100000000000000000000

The number 134.0625 in IEEE Floating Point Standard is:

0 10000110 00001100001000000000000

1. To align the binary points, the smaller exponent is incremented and the mantissa is shifted right until the exponents are equal. Thus, 2.25 becomes 0.000001001 x 2^7.

2. The mantissas are added using integer addition:

1.00001100001 + 0.00000100100 = 1.00010000101, i.e. 1.00010000101 x 2^7 = 136.3125.

3. The result is already in normal form. If the sum overflows the position of the hidden bit, the mantissa must be shifted one bit to the right and the exponent incremented. Each mantissa is less than 2, so their sum is less than 4, and at most one such shift is ever needed.
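The align/add/normalize procedure above can be modelled in Python. This is an illustrative sketch, not the Verilog design: it handles positive normal numbers only, ignores rounding and special values, and uses struct to reach the IEEE 754 single-precision fields.

```python
import struct

def fields(x):
    """Unpack a float into IEEE 754 single-precision (sign, exp, mantissa)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, (bits & 0x7FFFFF) | 0x800000

def fp_add(a, b):
    """Align, add, normalize -- the three steps described above.
    Positive normal numbers only; rounding and specials are ignored."""
    _, ea, ma = fields(a)
    _, eb, mb = fields(b)
    # 1. Align: right-shift the mantissa belonging to the smaller exponent.
    if ea < eb:
        ma >>= eb - ea
        ea = eb
    else:
        mb >>= ea - eb
    # 2. Add the significands (hidden bits included).
    m = ma + mb
    # 3. Normalize: an overflow past the hidden bit costs one right shift.
    while m >= (1 << 24):
        m >>= 1
        ea += 1
    bits = (ea << 23) | (m & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(fp_add(2.25, 134.0625))   # 136.3125
```

Calling fp_add(2.25, 134.0625) reproduces the worked example, yielding 136.3125.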



2.8.2. Subtraction

Consider two numbers a = 2.25 x 10^0 and b = 1.340625 x 10^2. For the subtraction of these two numbers the following steps are performed:

1. Shift the decimal point of the number with the smaller exponent to the left, incrementing its exponent, until the exponents are equal.

i.e., as the smaller number here is a = 2.25 x 10^0 and the larger exponent is 2, the decimal point of 'a' is shifted to the left until both exponents become equal. Hence the value of number 'a' becomes 0.0225 x 10^2.

2. Subtract the numbers with decimal points aligned.

Now that both exponent values are the same, the mantissas are subtracted: 1.340625 - 0.0225 = 1.318125, giving 1.318125 x 10^2.

3. Normalize the result.

The normalised result retains the required number of digits, discarding the unwanted part.

2.8.2.1. Block diagram representation of floating point subtraction:

[Figure: inputs ExpA, ExpB, ManA, ManB, SignA and SignB feed an Exponent calculator, a Mantissa subtraction unit and a Sign Calculator; their results pass through the Normalization Unit to the output.]

Fig 2.8.2.1.1: Block Diagram of Floating Point Subtraction

Let A and B be two numbers represented in the IEEE floating point standard, with SignA as the sign, ExpA the exponent and ManA the mantissa of number A, and SignB, ExpB and ManB likewise for number B. The exponents are made equal by right-shifting the mantissa of the number with the smaller exponent. The mantissas of A and B are then subtracted. If both numbers are positive, the sign of the result depends on the operands: if the larger number is subtracted from the smaller one, the sign bit is '1', indicating a negative result, and if the smaller number is subtracted from the larger one, the sign bit is '0', indicating a positive result.



2.8.2.2. Subtraction of floating points using IEEE 754 format:

Subtraction of floating point numbers using the IEEE floating point standard involves the following steps:

1. The numbers are represented in IEEE floating point format.

2. The exponents of the two numbers are compared, and the mantissa of the number with the smaller exponent is shifted right until the two exponents are equal.

3. Subtraction of significands is done.

4. The result is normalised either by shifting the significand to the right and incrementing the exponent, or by shifting the significand to the left and decrementing the exponent.

5. If there is an underflow or overflow, an exception is raised.

6. If not, the significand is rounded to the appropriate number of bits and normalization is checked again.

Consider subtraction of the numbers 2.25 x 10^0 and 1.340625 x 10^2.

The number 2.25 in IEEE Floating Point Standard is:

0 10000000 00100000000000000000000

The number 134.0625 in IEEE Floating Point Standard is:

0 10000110 00001100001000000000000

1. To align the binary points, the smaller exponent is incremented and the mantissa is shifted right until the exponents are equal. Thus, 2.25 becomes 0.000001001 x 2^7.

2. The mantissas are subtracted using integer subtraction:

1.00001100001 - 0.00000100100 = 1.00000111101, i.e. 1.00000111101 x 2^7 = 131.8125.



The result is already in normal form. If the difference overflows the position of the hidden bit, the mantissa must be shifted one bit to the right and the exponent incremented.

2.8.2.3. Flow chart for floating point subtraction:

Fig 2.8.2.3: Flow chart for floating point subtraction

The flow chart can be explained as follows. For subtraction, consider two numbers X and Y, with resultant Z.

In the first step, number X is checked. If it is '0', the result is Z = Y. If X is not '0', number Y is checked; if it is '0', the result is Z = X.

If both X and Y are non-zero, the following steps are followed. The exponents of the two numbers are compared. If the exponents are the same, the significands of X and Y are subtracted. If the resulting significand is zero it is returned; if not, significand overflow is checked. If overflow occurs, the significand bits are shifted right, the exponent is incremented, and exponent overflow is checked; if exponent overflow has occurred, it is reported and the routine returns. Otherwise the result is normalized: if it is already normalized it is returned, and if not it is normalized by decreasing the exponent




and shifting the significand to the left, and exponent underflow is checked. If underflow has occurred it is reported; if not, the normalized result is given out. If the exponents are not the same, the smaller exponent is incremented, and its significand shifted right, until the exponents of X and Y are equal. If that significand becomes zero, the other number is given as the result Z; otherwise the subtraction and further processing are carried out.

2.8.3. Multiplication

The multiplication of two floating point numbers is done by multiplying the mantissa parts, adding the exponents and adjusting the sign. For example, to multiply 1.8 x 10^1 by 9.5 x 10^0:

1. Perform unsigned integer multiplication of the mantissas. The decimal point in the product is positioned so that the number of decimal places equals the sum of the number of decimal places in the operands.

  1.8
x 9.5
-----
17.10

2. Add the exponents:

  1
+ 0
---
  1

3. Normalize the result: 17.10 x 10^1 = 1.71 x 10^2.

4. Set the sign of the result.

2.8.3.1. Multiplication using IEEE floating point standard:

The multiplication of two IEEE FPS numbers is performed similarly. The number 18.0 in IEEE FPS format is:

0 10000011 00100000000000000000000

The number 9.5 in IEEE FPS format is:

0 10000010 00110000000000000000000

1. The product of the 24 bit mantissas produces a 48 bit result with 46 bits to the right of the binary point:



Truncated to 24 bits with the hidden bit in parentheses, the mantissa is:

(1)01010110000000000000000

2. The biased-127 exponents are added. Addition in biased-127 representation can be performed in 2's complement with an additional bias of -127, since (Ea + 127) + (Eb + 127) - 127 = (Ea + Eb) + 127.

The sum of the exponents is:

  1000 0011   (4)
+ 1000 0010   (3)
-----------
  0000 0101
+ 1000 0001   (-127)
-----------
  1000 0110   (+7)

3. The mantissa is already in normal form. If the position of the hidden bit overflows, the mantissa must be shifted right and the exponent incremented.

4. The sign of the result is the xor of the sign bits of the two numbers.

When the fields are assembled in IEEE FPS format, the result is:

0 10000110 01010110000000000000000 (= 171.0)
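The whole procedure (mantissa product, biased-exponent sum minus 127, xor of the signs) can be modelled in Python. Again this is an illustrative sketch, not the Verilog design: normal numbers only, with rounding and special values ignored.

```python
import struct

def fields(x):
    """Unpack a float into IEEE 754 single-precision (sign, exp, mantissa)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, (bits & 0x7FFFFF) | 0x800000

def fp_mul(a, b):
    """Mantissa product, biased exponent sum minus 127, xor of signs.
    Normal numbers only; rounding and special values are ignored."""
    sa, ea, ma = fields(a)
    sb, eb, mb = fields(b)
    sign = sa ^ sb                 # the sign is the xor of the sign bits
    exp = ea + eb - 127            # remove the doubled bias
    prod = ma * mb                 # 24 x 24 -> up to 48-bit product
    if prod >= 1 << 47:            # product of [1,2) mantissas lies in [1,4)
        prod >>= 1
        exp += 1
    man = (prod >> 23) & 0x7FFFFF  # truncate back to the 23 stored bits
    bits = (sign << 31) | (exp << 23) | man
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(fp_mul(18.0, 9.5))   # 171.0
```

fp_mul(18.0, 9.5) reproduces the worked example, assembling the same 171.0 result.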

2.8.3.2. Block diagram of floating point multiplication:



Fig 2.8.3.2: Block Diagram of Floating Point Multiplication

Let A and B be two numbers represented in the IEEE floating point standard, with SignA as the sign, ExpA the exponent and ManA the mantissa of number A, and SignB, ExpB and ManB likewise for number B. The exponents of the two numbers are added, and the bias 127 is subtracted from the sum. The mantissas of A and B are multiplied. The sign of the result is given by the xor of SignA and SignB. The resultant mantissa is truncated and normalized to fit the IEEE format.

The xor operation for the sign bit can be given as follows:

Table 2.8.3.2: XOR Operation

Sign A | Sign B | Resultant sign
   0   |   0    |       0
   0   |   1    |       1
   1   |   0    |       1
   1   |   1    |       0

2.8.3.3. Flow chart for floating point multiplication

This flow chart can be explained as follows:

Let the two numbers which have to be multiplied be X and Y which are represented in IEEE floating point standard and the result be Z which has to be also represented in IEEE floating point standard.

At the first step, number X is checked; if it is zero, the result Z is also zero. If not, number Y is checked, and if it is zero the result is zero. If both X and Y are nonzero, the exponents are added and a bias of 127 is subtracted from the result. Because the exponents are stored in biased form, their sum contains the bias twice, and so the bias value



must be subtracted from the sum. Next, exponent overflow and underflow conditions are checked, and if either is present it is reported.

If exponent overflow and underflow are not present, the next step is multiplication of the significand bits; the result is truncated, normalized, rounded and returned. The resultant sign bit is the xor of the sign bits of X and Y.

Fig 2.8.3.3: Flow Chart For Floating Point Multiplication.

2.8.4. Division

Consider an example of dividing a = 0.3 by b = 0.2. In general floating point division, the exponents of the two numbers are subtracted and the significands are divided.

In binary, the exponent of a is -2 and the exponent of b is -3, so the resultant exponent is -2 - (-3) = 1. When the division of the significands is done, the quotient is 0.75, i.e., 0.3 / 0.2 = 0.75 x 2^1 = 1.5.

Hence the result can be given as 1.5.



2.8.4.1. Block diagram for floating point division:

Fig 2.8.4.1: Block Diagram of Floating Point Division

Let A and B be two numbers represented in the IEEE floating point standard, with SignA as the sign, ExpA the exponent and ManA the mantissa of number A, and SignB, ExpB and ManB likewise for number B. The exponents are subtracted and re-biased using the bias value. The mantissa of A is divided by the mantissa of B. If exactly one of the two numbers is negative, the result is negative, represented by sign bit '1'. If both numbers are positive or both are negative, the resultant sign is positive, represented by bit '0'.

2.8.4.2 Floating point division using IEEE floating point standard

Floating point division using IEEE floating point standard can be performed using the following steps:

1. Perform unsigned integer division of the dividend mantissa by the divisor mantissa.
2. Subtract the exponent of the divisor from the exponent of the dividend.
3. Normalize the result.
4. Set the sign of the result.

In the first step, the dividend mantissa is extended to 48 bits by adding 0's to the right of the least significant bit. When divided by a 24 bit divisor, a 24 bit quotient is produced.

The exponent arithmetic is performed by subtracting the exponent of the divisor from the exponent of the dividend. The negative of a bias-127 number is formed by complementing the bits and adding -1 (1111 1111) in 2's complement.



As in floating point multiplication, overflow and underflow occur when the difference of the exponents is outside the range of the bias-127 exponent representation. Special representations are employed for these cases: E = 255 and M = 0 for overflow (infinity) and E = 0 for underflow. Division of a nonzero number by zero yields this infinity representation, while 0/0 is undefined and has a special representation in which E = 255 and M is nonzero. This value is called Not a Number, or NaN.
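A small Python sketch can classify these special encodings. The bit patterns below are standard IEEE 754 single-precision values chosen purely for illustration.

```python
def classify(bits):
    """Classify a 32-bit pattern per the special cases above."""
    e = (bits >> 23) & 0xFF
    m = bits & 0x7FFFFF
    if e == 255:
        return "infinity" if m == 0 else "NaN"
    if e == 0:
        return "zero" if m == 0 else "denormalized"
    return "normal"

print(classify(0x7F800000))   # infinity: E = 255, M = 0
print(classify(0x7FC00001))   # NaN:      E = 255, M != 0
print(classify(0x00000000))   # zero
print(classify(0x3FC00000))   # normal (this pattern encodes 1.5)
```

Checking the exponent field first keeps the decode cheap: only two compares separate the special cases from normal numbers.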

Considering a = 0.3 and b = 0.2:

0.3 can be represented (with the significand rounded to 24 bits) as

S E        M
0 01111101 00110011001100110011010

0.2 can be represented as

S E        M
0 01111100 10011001100110011001101

The exponents are to be subtracted. Subtraction is performed by adding the 2's complement of the subtrahend; if the raw difference is negative, it is 2's complemented again and a negative sign is assigned.

2's complement of 0111 1100 is 1000 0100

  0111 1101
+ 1000 0100
-----------
  0000 0001   (carry out discarded)

The difference is +1, which when biased becomes 127 + 1 = 128, i.e. 1000 0000.

When the mantissas are divided (1.00110011... / 1.10011001... = 0.75 approximately), the result is normalized by shifting the significand one place left and decrementing the exponent to 0111 1111, then truncated, and can be given as

S E        M
0 01111111 10000000000000000000000

i.e. 1.5.

2.8.4.3. Flow chart for floating point division

If division of two numbers X and Y with result Z is considered, all represented in IEEE floating point standard, then the steps that occur are:

1. Numbers X and Y are checked. If number X is zero then the result is zero '0', and if number Y is zero '0' then the result is infinity.



2. If the numbers X and Y are nonzero, the exponents are subtracted, the bias 127 is added back, and exponent overflow and underflow conditions are checked.

3. If they are present, those conditions are reported. If not, the mantissas are divided, and the truncated and normalized result is given out.

Fig 2.8.4.3: Flow chart for floating point division

Problems may arise as the result of these arithmetic operations:

• Exponent overflow: A positive exponent exceeds the maximum possible exponent value. In some systems, this may be designated as +∞ or -∞.
• Exponent underflow: A negative exponent is less than the minimum possible exponent value (e.g., -200 is less than -127). This means that the number is too small to be represented, and it may be reported as 0.
• Significand underflow: In the process of aligning significands, digits may flow off the right end of the significand. As we shall discuss, some form of rounding is required.
• Significand overflow: The addition of two significands of the same sign may result in a carry out of the most significant bit.



2.9. Rounding Error

In floating point arithmetic, rounding errors occur as a result of the limited precision of the mantissa.

Rounding occurs in floating point multiplication when the mantissa of the product is reduced from 48 bits to 24 bits. The least significant 24 bits are discarded.

The IEEE FPS defines four rounding rules for choosing the closest floating point when a rounding error occurs:

RN: Round to Nearest. Break ties by choosing the result whose least significant bit is 0.
RZ: Round toward Zero. Same as truncation in sign-magnitude.
RP: Round toward Positive infinity.
RM: Round toward Minus infinity. Same as truncation in 2's complement.

RN is generally preferred and introduces less systematic error than the other rules.
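The four rules can be sketched in Python on a mantissa that carries extra guard bits. This is a simplified model (no exponent adjustment on carry-out), and the example value below is chosen only to exhibit a tie.

```python
def round_mantissa(man, guard_bits, sign, mode):
    """Round a mantissa carrying `guard_bits` extra low-order bits,
    per the four rules above. Exponent adjustment on overflow is omitted."""
    keep = man >> guard_bits                # the bits we retain
    lost = man & ((1 << guard_bits) - 1)    # the discarded guard bits
    half = 1 << (guard_bits - 1)
    if mode == "RZ":                        # toward zero: plain truncation
        return keep
    if mode == "RP":                        # toward +infinity
        return keep + 1 if (lost and sign == 0) else keep
    if mode == "RM":                        # toward -infinity
        return keep + 1 if (lost and sign == 1) else keep
    # RN: round to nearest, breaking ties so the LSB becomes 0
    if lost > half or (lost == half and keep & 1):
        return keep + 1
    return keep

# 0b101 with guard bits 0b10: exactly halfway, and 0b101 is odd.
print(round_mantissa(0b10110, 2, 0, "RN"))   # 6 (tie rounds to even 0b110)
print(round_mantissa(0b10110, 2, 0, "RZ"))   # 5
```

The RN tie-break is what keeps the rule's systematic error low: halfway cases go up and down equally often.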

The absolute error introduced by rounding is the actual difference between the exact value and the floating point representation. The size of the absolute error is proportional to the magnitude of the number: for numbers in IEEE FPS format, the absolute error is less than 2^-24 times the magnitude of the number.

The largest absolute rounding error occurs when the exponent is 127, and is approximately 2^103, since 2^127 x 2^-24 = 2^103.

The relative error is the absolute error divided by the magnitude of the number. For normalized floating point numbers, the relative error is approximately 2^-24, since the mantissa is kept to 24 significant bits.

For denormalized numbers (E = 0), relative errors increase as the magnitude of the number decreases toward zero. However, the absolute error of a denormalized number is less than 2^-149, since the truncation error in a denormalized number is at most one unit in the last of its 23 stored bits, i.e. 2^-23 x 2^-126.

2.10. Normalization

Normalization achieves the highest precision available from the significand bits.



To efficiently use the bits available for the significand, it is shifted to the left until all leading 0's disappear (as they make no contribution to the precision). The value can be kept unchanged by adjusting the exponent accordingly.

Moreover, as the MSB of the significand is always 1, it does not need to be stored explicitly; the significand can be shifted left by one more bit to gain one extra bit of precision. The first bit 1 before the decimal point is implicit, and the actual value represented is 1.significand x 2^exponent.

However, to avoid possible confusion, in the following the default normalization does not assume this implicit 1 unless otherwise specified.

Zero is represented by all 0's and is not (and cannot be) normalized.

Example:

A binary number can be represented in 14-bit floating-point form in several ways (1 sign bit, a 4-bit exponent field and a 9-bit significand field), depending on where the leading 1 of the significand is placed.

2.11. Truncation

To retain maximum accuracy, extra bits produced during an operation (called guard bits) are kept (e.g., in multiplication). If we assume n bits are used in the final representation of a number, k extra guard bits are kept during the operation. By the end of the operation, the resulting n + k bits need to be truncated to n bits by one of three methods.

1. Chopping: simply drop all k guard bits.



We define the truncation error as the difference between the value before and after truncation.

We see that the truncation error of chopping is always positive: whatever value the guard bits held is simply lost.

As this error is always greater than 0, we say this truncation error is biased.

2. Von Neumann Rounding: If at least one of the guard bits is 1, set the LSB of the retained bits to 1 (no matter whether it is originally 0 or 1); otherwise do nothing.

In the two worst cases the errors are of equal magnitude and opposite sign, so both cases can be summarized by saying that the Von Neumann rounding error is unbiased.

3. Rounding:

a) If the highest guard bit is 1 and the rest of the guard bits are not all 0, add 1 to the LSB.

Interpretation: the value represented by the guard bits is greater than 0.5 of the last retained place, so round up.




b) If the highest guard bit is 0, drop all guard bits.

Interpretation: the value represented by the guard bits is smaller than 0.5 of the last retained place, so round down.

c) If the highest guard bit is 1 and the rest of the guard bits are all 0, the rounding depends on the LSB:

if the LSB is 0, round down (drop the guard bits);

or if the LSB is 1, round up (add 1 to the LSB).

Interpretation: the value represented by the guard bits is exactly 0.5 of the last retained place; it is rounded either up or down with equal probability (50%).

The rounding error of these cases can be summarized as unbiased: on average it is zero.
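The three truncation methods can be compared in Python. The 8-bit example value and k = 4 are arbitrary illustrative choices, and case c)'s random tie-break is replaced here by deterministic round-up for simplicity.

```python
def chop(bits, k):
    """Chopping: simply drop all k guard bits."""
    return bits >> k

def von_neumann(bits, k):
    """Von Neumann rounding: if any guard bit is 1, force the LSB to 1."""
    kept = bits >> k
    if bits & ((1 << k) - 1):
        kept |= 1
    return kept

def round_half_up(bits, k):
    """Rounding per cases a)-c): add half a unit in the last retained
    place, then drop the guard bits. The 50/50 tie of case c) is
    resolved upward here, a simplification of the text's random choice."""
    return (bits + (1 << (k - 1))) >> k

x = 0b10100011          # arbitrary 8-bit value; the guard bits are 0011
print(bin(chop(x, 4)))           # 0b1010
print(bin(von_neumann(x, 4)))    # 0b1011
print(bin(round_half_up(x, 4)))  # 0b1010
```

For this value the three methods disagree: chopping and rounding keep 0b1010, while Von Neumann jamming forces the LSB to 1.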



CHAPTER - 3

Floating Point Functions

A floating-point number is one that is capable of representing real and decimal numbers. The floating-point operations are incorporated into the design as functions. The logic for these differs from that of the ordinary arithmetic functions.

The numbers in contention have to be first converted into the standard IEEE 754-1985 floating point representation before any sort of operation is conducted on them. The floating-point representation for a standard single precision number is:

A single precision number is a 32-bit number that is segmented to represent the floating-point number. The above representation is the IEEE 754-1985 standard representation. The MSB is the sign bit, i.e. the sign of the floating point number. The next eight bits are those of the exponent.

The exponent in this IEEE standard is represented in excess-127 format, i.e. the exponent obtained by balancing operations is added to 0111 1111. Therefore a zero exponent is represented by 0111 1111; positive exponents are represented by binary values greater than 0111 1111 and negative exponents by binary values less than it.
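A small sketch of the excess-127 encoding described above (illustrative Python, not part of the design):

```python
BIAS = 0b0111_1111   # 127, the excess-127 offset described above

def encode_exponent(e):
    """Encode a true exponent in excess-127 form."""
    return e + BIAS

def decode_exponent(biased):
    """Recover the true exponent from its excess-127 encoding."""
    return biased - BIAS

print(bin(encode_exponent(0)))      # 0b1111111  (zero -> 0111 1111)
print(bin(encode_exponent(4)))      # 0b10000011 (positive -> above 0111 1111)
print(decode_exponent(0b01111100))  # -3         (below 0111 1111 -> negative)
```

Because the encoding is a plain offset, exponent comparisons reduce to unsigned integer comparisons on the stored field, which is precisely why the format uses it.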

The logic for floating point addition, subtraction, multiplication and division is presented in the following pages.


S E7-E0 Ma23-Ma0


3.1 Floating Point Addition

The real number is first represented in the IEEE 754 standard floating point representation.

These numbers are stored in memory, from which they are read and processed. The numbers from memory are loaded into two registers, namely the Accumulator and the Temp register, which loads the value appearing on the data bus.

These numbers are distinct, so to add their mantissas we must first normalize their exponents: we compare the exponents and increment the lower exponent while right-shifting its mantissa. This is done until the lower exponent becomes equal to the higher one.

Once the exponents are normalized, the mantissas are added and the result is stored in a temporary register.

The normalized exponent is concatenated with the resulting mantissa and the sign of the result, which is calculated separately.



3.2 Floating Point Subtraction

The real number is first represented in the IEEE 754 standard floating point representation.

These numbers are stored in memory, from which they are read and processed. The numbers from memory are loaded into two registers, namely the Accumulator and the Temp register, which loads the value appearing on the data bus.

These numbers are distinct, so to subtract their mantissas we must first normalize their exponents: we compare the exponents and increment the lower exponent while right-shifting its mantissa. This is done until the lower exponent becomes equal to the higher one.

Once the exponents are normalized, the mantissas are subtracted and the result is stored in a temporary register.

The normalized exponent is concatenated with the resulting mantissa and the sign of the result, which is calculated separately.

This is possible because binary single-bit addition and subtraction are defined in Verilog-HDL.

The major difference between addition and subtraction is in the sign of the final result, which is calculated separately. Apart from that, there is no difference in the procedure of normalizing the numbers before the addition or subtraction is carried out.



3.3 Floating Point Multiplication

Here the exponents and mantissas of the numbers in contention don’t have to be normalized.

In multiplication the operations are done simultaneously and separately on the mantissa and the exponent.

Binary multiplication is defined for single-bit numbers in Verilog-HDL, so the exponents are just added and the mantissas are multiplied to get the final result.

The final output is obtained by concatenating the product of the mantissas, the resulting exponent and the sign of the result that is calculated separately.

There is, however, a limitation to this operation: if two N-bit numbers are multiplied, the resulting number will be 2N bits wide, thereby restricting the numerical range of the inputs.

So each input should not exceed 12 bits in length, so that the result is restricted to no more than 24 bits.
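The width doubling is easy to confirm in Python; N = 12 below matches the 12-bit input restriction mentioned above.

```python
N = 12                      # input width suggested in the text
max_input = (1 << N) - 1    # largest N-bit value

product = max_input * max_input
print(product.bit_length())  # 24: an N-bit by N-bit product needs up to 2N bits
```

The worst case is the product of the two largest N-bit operands, which is why the inputs must be held to half the width of the result register.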



3.4 Floating Point Division

This is more complicated than multiplication, owing to the fact that apart from taking care of the exponent we have to consider several cases when dealing with the mantissa.

The logic for floating point division is as follows. First the exponents are directly added or subtracted, depending on which is bigger. Apart from that, the final sign of the division is calculated separately.

Now both numbers in the IEEE 754 standard format are compared. The convention here is that the numerator should always be less than the denominator. This ensures that whatever comes out as the result lies after the decimal point; the decimal point is assumed to be before the MSB of the resulting quotient.

Now that the greater of the two numbers has been decided, if the numerator is less than the denominator we append 24 zeros to the numerator and load it into an internal register, say Temp, that consists of 49 bits. The first 24 bits from the MSB are then compared with the divisor.

If the divisor is greater than the dividend, we left-shift the dividend by 1 and add it to the two's complement of the divisor.

The result is stored in Temp. If the MSB (the 49th bit) is one, we place a one in the quotient; if it is zero, we place a zero in the quotient.

We initialize a counter and repeat this process 24 times, until the quotient is full. Once the quotient is full, we append to it the exponent value and the sign of the division, which are calculated separately.
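The shift-and-subtract loop can be modelled in Python as restoring long division. This is an illustrative sketch only: it produces the 24 quotient bits and assumes, per the convention above, that the numerator is smaller than the denominator.

```python
def divide_mantissas(num, den, bits=24):
    """Bit-serial restoring division: one shift/subtract per quotient bit.
    Per the convention above, num must be less than den, so the binary
    point of the quotient sits just before its MSB."""
    assert num < den
    rem = num
    quotient = 0
    for _ in range(bits):       # one iteration per quotient bit, 24 in all
        rem <<= 1               # left-shift the partial remainder
        quotient <<= 1
        if rem >= den:          # the subtraction succeeds: quotient bit is 1
            rem -= den
            quotient |= 1
    return quotient

# 3/4 = 0.11 in binary, so the top two of the 24 quotient bits are 1.
print(bin(divide_mantissas(3, 4)))   # 0b110000000000000000000000
```

In hardware the comparison and conditional subtraction are done in one step by adding the divisor's two's complement and inspecting the carry-out, as the text describes; the Python comparison is the behavioural equivalent.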



CHAPTER - 4

DSP Processors

4.1 Processor:

The processor is an electronic circuit that operates at the speed of an internal clock. The clock speed (also called the cycle rate) corresponds to the number of pulses per second, expressed in Hertz (Hz). The clock frequency is generally a multiple of the system frequency, meaning a multiple of the motherboard frequency. With each clock peak, the processor performs an action that corresponds to an instruction or a part thereof. A measure called CPI (Cycles Per Instruction) gives a representation of the average number of clock cycles required for a microprocessor to execute an instruction. A microprocessor's power can thus be characterized by the number of instructions per second that it is capable of processing. MIPS (millions of instructions per second) is the unit used, and corresponds to the processor frequency divided by the CPI. The number of bits in an instruction varies according to the type of data (between 1 and 4 8-bit bytes). When the processor executes instructions, data is temporarily stored in small, local memory locations of 8, 16, 32 or 64 bits called registers. Depending on the type of processor, the overall number of registers can vary from about ten to many hundreds.

4.2 Digital Signal Processing

Digital Signal Processors are microprocessors specifically designed to handle digital signal processing tasks. These devices have seen tremendous growth in the last decade, finding use in everything from cellular telephones to advanced scientific instruments. DSPs are designed to perform the mathematical calculations needed in digital signal processing. Computers are extremely capable in two broad areas:

1. Data manipulation such as word processing and database management

2. Mathematical calculation used in science, engineering and digital signal processing.

All microprocessors can perform both tasks; however, it is difficult or expensive to make a device that is optimized for both. There are technical tradeoffs in the hardware design, such as the size of the instruction set and how interrupts are handled. There are also marketing issues involved: development and manufacturing cost, competitive position, product lifetime, and so on. DSPs can perform the mathematical calculations needed in digital signal processing. Data manipulation involves storing and sorting information. For instance, consider a word processing program. The basic task is to store the information, organize the information and then retrieve the information, such as saving the document on a floppy disk or printing it with a laser printer. These tasks are accomplished by moving data from one location to another and testing for inequalities (A=B, A<B, etc.). Consider, as another example, how a document is printed from a word processor. The computer continually tests the input device (mouse or keyboard) for the binary code that indicates "print the document". When this code is detected, the program moves the data from the computer's memory to the printer. While mathematics is occasionally used in this type of application, it is infrequent and does not significantly affect the overall execution speed.



In comparison, the execution speed of most DSP algorithms is limited almost completely by the number of multiplications and additions required. For example, consider the implementation of an FIR digital filter, the most common DSP technique. Using standard notation, the input signal is referred to by x[ ], while the output signal is denoted by y[ ]. The task is to calculate the sample at location n in the output signal, i.e. y[n]. An FIR filter performs this calculation by multiplying appropriate samples from the input signal by a group of coefficients and summing the products; this is simply saying that the input signal has been convolved with a filter kernel. Depending on the application, there may be only a few coefficients in the filter kernel. While there is some data transfer and inequality evaluation in this algorithm, such as to keep track of the intermediate results and control the loops, the math operations dominate the execution time.

Fig4.2.1: Graphical representation of FIR digital filter design.



In FIR filtering, each sample in the output signal, y[n], is found by multiplying samples from the input signal, x[n], x[n-1], x[n-2], ..., by the filter kernel coefficients and summing the products.

In addition to performing mathematical calculations very rapidly, DSPs must also have a predictable execution time. Suppose you set your desktop computer some task, say, converting a word processing document from one format to another. It doesn't matter whether the processing takes 10 milliseconds or 10 seconds; you simply wait for the action to be completed before you give the computer its next assignment.

In comparison, most DSPs are used in applications where the processing is continuous, with no defined start or end. For instance, consider the design of an audio DSP system such as a hearing aid. If the digital signal is being received at 20,000 samples per second, the DSP must be able to maintain a sustained throughput of 20,000 samples per second. There are good reasons not to make it faster than necessary: as speed increases, so do cost, power consumption, design difficulty and so on. Hence execution time is critical when selecting the proper device, as well as the algorithms that can be applied. Digital signal processors are designed to carry out FIR filters and similar techniques quickly.
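The FIR calculation described above amounts to a multiply-accumulate loop. A minimal Python sketch follows; the 2-tap averaging kernel and the test signal are arbitrary illustrative values, not taken from the text.

```python
def fir_filter(x, h):
    """FIR filtering: each output sample y[n] is the sum of input samples
    x[n], x[n-1], ... multiplied by the kernel coefficients in h."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(h)):
            if n - k >= 0:
                acc += h[k] * x[n - k]   # the multiply-accumulate at the core of DSP
        y.append(acc)
    return y

# A 2-tap averaging kernel applied to a short test signal.
print(fir_filter([2.0, 4.0, 6.0], [0.5, 0.5]))   # [1.0, 3.0, 5.0]
```

The inner multiply-accumulate is exactly the operation DSP hardware optimizes: one kernel coefficient times one delayed input sample per cycle.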

A 32-bit processor can offer a linear 32-bit address space with accompanying quick address calculations on a 32-bit data path. Floating point calculations also require a 32-bit processor for good efficiency. 16-bit processors spend a significant amount of time manipulating stack elements when dealing with floating point numbers, whereas 32-bit processors are naturally suited to the size of the data elements. There are many instances in which scaled integer arithmetic is more appropriate than floating point to increase speed on some processors; in these cases a 16-bit processor may suffice. However, floating point math must often be used to reduce the cost of programming a project, and to support code written in high-level languages. Also, with the advent of very fast floating point processing hardware, the traditional speed advantage of integer operations over floating point operations is decreasing.

The disadvantages of 32-bit processors are cost and system complexity. 32-bit processor chips tend to cost more because they have more transistors and pins than 16-bit chips. They also require 32-bit-wide program memory and a generally larger printed circuit board than 16-bit processors. There is less room on-chip for extra features such as hardware multipliers, but these items will appear as chip fabrication technology gets denser.

4.3. Difference between off-line processing and real-time processing:

In off-line processing, the entire input signal resides in the computer at the same time. For example, a geophysicist might use a seismometer to record the ground movement during an earthquake. After the shaking is over, the information may be read into a computer and analysed in some way. The key point in off-line processing is that all of the information is simultaneously available to the processing program. This is common in scientific research and engineering; off-line processing is the realm of personal computers and mainframes.

In real-time processing, the output signal is produced at the same time that the input signal is acquired. For example, this is needed in telephone communication, hearing aids and radar.


For instance, a 10 millisecond delay in a telephone call cannot be detected by the speaker or listener. Likewise, it makes no difference if a radar signal is delayed by a few seconds before being displayed to the operator. Real-time applications input a sample, perform the algorithm and output a sample, over and over. Alternatively, they may input a group of samples, perform the algorithm and output a group of samples. This is the world of digital signal processors.

4.4. Architecture of digital signal processors:

One of the biggest bottlenecks in executing DSP algorithms is transferring information to and from memory. This includes data, such as samples from the input signal and filter coefficients, as well as program instructions, the binary codes that go into the program sequencer. The different architectures available are: Von Neumann Architecture, Harvard Architecture, and Super Harvard Architecture (SHARC).

Von Neumann architecture contains a single memory and a single bus for transferring data into and out of the central processing unit (CPU). The von Neumann design is satisfactory when the task to be executed is inherently serial; most general-purpose computers use this architecture today.

Harvard architecture has separate memories for data and program instructions, with separate buses for each. Since the buses operate independently, program instructions and data can be fetched at the same time, improving the speed over the single-bus design. Most present-day DSPs use this dual-bus architecture. A limitation of the Harvard design is that the data memory bus is busier than the program memory bus: when two numbers are multiplied, two binary values (the numbers) must be passed over the data memory bus, while only one binary value (the program instruction) is passed over the program memory bus. To improve upon this situation, we start by relocating part of the "data" to program memory.
For instance, we might place the filter coefficients in program memory, while keeping the input signal in data memory. The Super Harvard Architecture idea is to build upon the Harvard architecture by adding features to improve throughput. While the SHARC DSPs are optimized in dozens of ways, two additions are important enough to mention: an instruction cache and an I/O controller. The SHARC DSPs provide both serial and parallel communication ports. These are extremely high-speed connections. For example, at a 40 MHz clock speed, there are two serial ports that operate at 40 Mbits/second each, while six parallel ports each provide a 40 Mbytes/second data transfer. When all six parallel ports are used together, the data transfer rate is an incredible 240 Mbytes/second.


Figure 4.4.1: Different architectures. The Von Neumann architecture uses a single memory to hold both data and instructions. In comparison, the Harvard architecture uses separate memories for data and instructions, providing higher speed. The Super Harvard Architecture improves upon the Harvard design by adding an instruction cache and a dedicated I/O controller.

DSP algorithms generally spend most of their execution time in loops. This means that the same set of program instructions will continually pass from program memory to the CPU. The Super Harvard architecture takes advantage of this situation by including an instruction cache in the CPU. This is a small memory that contains about 32 of the most recent program instructions. The first time through a loop, the program instructions must be passed over the program memory bus. This results in slower operation because of the conflict with the coefficients that must also be fetched along this path. However, on additional executions of the loop, the program instructions can be pulled from the instruction cache. This means that all of the memory-to-CPU information transfers can be accomplished in a single cycle: the sample from the input signal comes over the data memory bus, the coefficient comes over the program memory bus, and the program instruction comes from the instruction cache. In the jargon of the field, this efficient transfer of data is called a high memory-access bandwidth.

Some DSPs have on-board analog-to-digital and digital-to-analog converters, a feature called mixed signal. However, all DSPs can interface with external converters through serial or parallel ports. The main buses (program memory bus and data memory bus) are also accessible from outside the chip, providing an additional interface to off-chip memory and peripherals. This allows


the SHARC DSPs to use a four-Gigaword (16 Gbyte) memory, accessible at 40 Mwords/second (160 Mbytes/second), for 32-bit data.

The math processing is broken into three sections: a multiplier, an arithmetic logic unit (ALU), and a barrel shifter. The multiplier takes the values from two registers, multiplies them, and places the result into another register. The ALU performs addition, subtraction, absolute value, logical operations (AND, OR, XOR, NOT), conversion between fixed and floating point formats, and similar functions. Elementary binary operations are carried out by the barrel shifter, such as shifting, rotating, extracting and depositing segments, and so on. A powerful feature of the SHARC family is that the multiplier and the ALU can be accessed in parallel: in a single clock cycle, data from registers 0-7 can be passed to the multiplier, data from registers 8-15 can be passed to the ALU, and the two results returned to any of the 16 registers.

At the top of the diagram are two blocks labelled Data Address Generator (DAG), one for each of the two memories. These control the addresses sent to the program and data memories, specifying where the information is to be read from or written to. In simpler microprocessors this task is handled as an inherent part of the program sequencer, and is quite transparent to the programmer.
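The barrel-shifter operations mentioned above (shifting, rotating, and extracting or depositing bit fields) can be mimicked in software. The following is an illustrative Python sketch for 32-bit words; the function names and field conventions are invented for the example, not taken from any SHARC instruction set.

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit register width

def rotate_left(value, n, width=32):
    """Rotate a 32-bit word left by n bits (a single barrel-shifter pass)."""
    n %= width
    return ((value << n) | (value >> (width - n))) & MASK32

def extract_field(value, pos, length):
    """Extract a bit field of 'length' bits starting at bit position 'pos'."""
    return (value >> pos) & ((1 << length) - 1)

def deposit_field(value, field, pos, length):
    """Deposit 'field' into 'value' at bit position 'pos'."""
    mask = ((1 << length) - 1) << pos
    return (value & ~mask & MASK32) | ((field << pos) & mask)

print(hex(rotate_left(0x80000001, 1)))       # 0x3: top bit wraps around
print(hex(extract_field(0xABCD1234, 8, 8)))  # 0x12: bits 8..15
```

In hardware, a barrel shifter produces any of these results in a single cycle regardless of the shift distance, which is what makes such bit manipulation cheap on a DSP.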

Fig 4.4.2: Typical DSP architecture. Digital Signal Processors are designed to implement tasks in parallel. This simplified diagram is of the Analog Devices SHARC DSP. Compare this architecture with the tasks needed to implement an FIR filter. All of the steps within the loop can be executed in a single clock cycle.


4.5. Comparison between Fixed Point and Floating Point System:

Both fixed- and floating-point DSPs are designed to perform the high-speed computations that underlie real-time signal processing. Both feature system-on-a-chip (SOC) integration with on-chip memory and a variety of high-speed peripherals to ensure fast throughput and design flexibility. Trade-offs of cost and ease of use often heavily influenced the fixed- or floating-point decision in the past. Today, though, selecting either type of DSP depends mainly on whether the added computational capabilities of the floating-point format are required by the application.

As the terms fixed- and floating-point indicate, the fundamental difference between the two types of DSPs is in their respective numeric representations of data. While fixed-point DSP hardware performs strictly integer arithmetic, floating-point DSPs support either integer or real arithmetic, the latter normalized in the form of scientific notation. TI's TMS320C62x™ fixed-point DSPs have two data paths operating in parallel, each with a 16-bit word width that provides signed integer values within a range from -2^15 to 2^15 - 1. TMS320C64x™ DSPs double the overall throughput with four 16-bit (or eight 8-bit or two 32-bit) multipliers. TMS320C5x™ and TMS320C2x™ DSPs, with architectures designed for handheld and control applications, respectively, are based on single 16-bit data paths.

By contrast, 32-bit floating-point DSPs divide a 32-bit data path into two parts: a 24-bit mantissa that can be used either for integer values or as the base of a real number, and an 8-bit exponent. The 24-bit mantissa provides a precision of one part in 16 M (2^24), and the 8-bit exponent supports a vastly greater dynamic range than is available with the fixed-point format. A 32-bit DSP can also perform calculations using industry-standard double-width precision (64 bits, including a 53-bit mantissa and an 11-bit exponent).
Double-width precision achieves much greater precision and dynamic range at the expense of speed, since it requires multiple cycles for each operation. All floating point DSPs can also handle fixed point numbers, a necessity to implement counters, loops, and signals coming from the ADC and going to the DAC. However, this doesn't mean that fixed point math will be carried out as quickly as the floating point operations; it depends on the internal architecture. For instance, the SHARC DSPs are optimized for both floating point and fixed point operations, and execute them with equal efficiency. For this reason, the SHARC devices are often referred to as "32-bit DSPs" rather than just "floating point". In general-purpose computers, fixed point arithmetic is much faster than floating point; with DSPs, however, the speed is about the same, a result of the hardware being highly optimized for math operations.

The internal architecture of a floating point device is more complicated than for a fixed point device: all the registers and data buses must be 32 bits wide instead of only 16, the multiplier and ALU must be able to quickly perform floating point arithmetic, the instruction set must be larger, and so on. Floating point (32 bit) has better precision and a higher dynamic range than fixed point (16 bit). In addition, floating point programs often have a shorter development cycle, since the programmer doesn't generally need to worry about issues such as overflow, underflow and round-off. On the other hand, fixed point DSPs are cheaper than floating point devices.
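The split of a 32-bit word into sign, exponent, and mantissa fields can be inspected directly. The sketch below uses the standard IEEE 754 single-precision layout, in which 23 mantissa bits are actually stored and an implicit leading 1 supplies the 24 bits of effective precision mentioned above; the exponent is stored in biased (excess-127) form.

```python
import struct

def decode_single(x):
    """Unpack an IEEE 754 single-precision value into its three bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # biased: actual exponent = this - 127
    mantissa = bits & 0x7FFFFF       # 23 stored bits; leading 1 is implicit
    return sign, exponent, mantissa

sign, exp, man = decode_single(1.0)
print(sign, exp, hex(man))  # 0 127 0x0, i.e. +1.0 x 2^(127-127)
```

Packing to single precision and unpacking the raw bits this way is a convenient check when verifying a hardware floating-point datapath against a software reference.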


Figure 4.5.1: Fixed versus floating point. Fixed point DSPs are generally cheaper, while floating point devices have better precision, higher dynamic range, and a shorter development cycle.

A 32-bit floating point system can do better than a 16-bit fixed point system in terms of performance, which can be rated in the form of signal-to-noise ratio and quantisation noise. Suppose we store a number in 32-bit floating point format. The gap between this number and its adjacent number is about one ten-millionth of the value of the number. To store the number, it must be rounded up or down by a maximum of one-half the gap size; i.e., each time we store a number in floating point notation, we add noise to the signal. The same thing happens when a number is stored as a 16-bit fixed point value, except that the added noise is much worse, because the gaps between adjacent numbers are much larger. For instance, suppose we store the number 10,000 as a signed integer: the gap between adjacent numbers is one ten-thousandth of the value of the number we are storing.

Noise in a signal is usually represented by its standard deviation, and the standard deviation of this quantisation noise is about one-third of the gap size. This means that the signal-to-noise ratio for storing a floating point number is about 30 million to one, while for a fixed point number it is only about ten thousand to one. In other words, floating point has roughly 3,000 times less quantisation noise than fixed point.

Suppose we implement an FIR filter in fixed point. To do this, we loop through each coefficient, multiply it by the appropriate sample from the input signal, and add the product to an accumulator. Here's the problem: in traditional microprocessors, this accumulator is just another 16-bit fixed point variable. To avoid overflow, we need to scale the values being added, and will correspondingly add quantization noise on each step.
In the worst case, this quantization noise will simply add, greatly lowering the signal-to-noise ratio of the system. For instance, in a 500-coefficient FIR filter, the noise on each output sample may be 500 times the noise on each input sample. The signal-to-noise ratio of ten thousand to one has dropped to a ghastly twenty to one. Although this is an extreme case, it illustrates the main point: when many operations are carried out on each sample, it's bad, really bad.

DSPs handle this problem by using an extended precision accumulator. This is a special register that has 2-3 times as many bits as the other memory locations. For example, in a 16-bit DSP it may have 32 to 40 bits, while in the SHARC DSPs it contains 80 bits for fixed point use. This extended range virtually eliminates round-off noise while the accumulation is in progress. The only round-off error suffered is when the accumulator is scaled and stored in the 16-bit memory. This strategy works very well, although it does limit how some algorithms must be carried out. In comparison, floating point has such low quantization noise that these techniques are usually not necessary.
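The gap sizes quoted above can be checked with a little arithmetic. Since Python's own floats are double precision, the single-precision gap is computed by hand here: 10,000 falls in the binade 2^13..2^14, so adjacent 24-bit-mantissa floats there differ by 2^(13-23). The figures are approximate, matching the rough ratios in the text.

```python
value = 10000.0

# Adjacent single-precision floats near 10,000 differ by 2^(13-23) = 2^-10.
float32_gap = 2.0 ** -10
print(float32_gap / value)  # about 9.8e-8: roughly one ten-millionth of the value

# A 16-bit signed integer holding 10,000: adjacent values differ by exactly 1.
int16_gap = 1.0
print(int16_gap / value)    # 1e-4: one ten-thousandth of the value
```

With quantization noise proportional to the gap, these two relative gaps are what produce the roughly 30-million-to-one versus ten-thousand-to-one signal-to-noise ratios cited above.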


In addition to having lower quantization noise, floating point systems are also easier to develop algorithms for. Most DSP techniques are based on repeated multiplications and additions. In fixed point, the possibility of an overflow or underflow needs to be considered after each operation: the programmer needs to continuously understand the amplitude of the numbers, how the quantization errors are accumulating, and what scaling needs to take place. In comparison, these issues do not arise in floating point; the numbers take care of themselves.

The number of bits used in the ADC and DAC can also be considered a trade-off between fixed and floating point. In many applications, 12-14 bits per sample is the crossover for using fixed versus floating point. For instance, television and other video signals typically use 8-bit ADCs and DACs, and the precision of fixed point is acceptable. In comparison, professional audio applications can sample with as high as 20 or 24 bits, and almost certainly need floating point to capture the large dynamic range.

The next thing to look at is the complexity of the algorithm that will be run: if it is relatively simple, think fixed point; if it is more complicated, think floating point. For example, FIR filtering and other operations in the time domain only require a few dozen lines of code, making them suitable for fixed point. In contrast, frequency domain algorithms, such as spectral analysis and FFT convolution, are very detailed and can be much more difficult to program. While they can be written in fixed point, the development time will be greatly reduced if floating point is used. When fixed point is chosen, the cost of the product will be reduced, but the development cost will probably be higher due to the more difficult algorithms. In the reverse manner, floating point will generally result in a quicker and cheaper development cycle, but a more expensive final product.
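The overflow worry described for fixed point can be made concrete with a toy 16-bit two's-complement wraparound model. This is an illustration of unsaturated integer overflow in general, not the behavior of any particular DSP.

```python
def to_int16(x):
    """Wrap an integer into the 16-bit two's-complement range (-32768..32767),
    as fixed-point hardware without saturation logic would."""
    x &= 0xFFFF
    return x - 0x10000 if x >= 0x8000 else x

a, b = 30000, 10000
print(to_int16(a + b))      # -25536: the 16-bit sum overflowed and wrapped
print(float(a) + float(b))  # 40000.0: floating point absorbs it without effort
```

This is exactly the failure the fixed point programmer must guard against by scaling intermediate values, and exactly the bookkeeping that disappears in floating point.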


Figure 4.5.2: Fixed versus floating point instructions. This describes the ways that multiplication can be carried out for both fixed and floating point formats; these are the multiplication instructions used in the SHARC DSPs.

While only a single command is needed for floating point, many options are needed for fixed point. The floating point instruction is Fn = Fx * Fy, where Fn, Fx, and Fy are any of the 16 data registers; it could not be any simpler. In comparison, look at all the possible commands for fixed point multiplication. These are the many options needed to efficiently handle the problems of round-off, scaling, and format. Rn, Rx, and Ry refer to any of the 16 data registers, and MRF and MRB are 80-bit accumulators. The vertical lines indicate options. For instance, the top-left entry in the table means that all of the following are valid commands: Rn = Rx * Ry, MRF = Rx * Ry, and MRB = Rx * Ry. In other words, the value of any two registers can be multiplied and placed into another register, or into one of the extended precision accumulators. The table also shows that the numbers may be either signed or unsigned (S or U), and may be fractional or integer (F or I). The RND and SAT options are ways of controlling rounding and register overflow.

The important idea is that the fixed point programmer must understand dozens of ways to carry out the very basic task of multiplication. In contrast, the floating point programmer can spend his time concentrating on the algorithm.
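The options in the table exist because a 16 x 16 fractional multiply needs explicit post-processing. The sketch below models a signed fractional (Q15-style) multiply with rounding and saturation flags; it is a generic illustration of the technique, not a cycle-accurate model of the SHARC's RND and SAT behavior.

```python
def q15_mult(a, b, rnd=True, sat=True):
    """Multiply two signed Q15 fractions (-32768..32767 representing -1.0..~1.0).

    The raw product is in Q30 format; shifting right by 15 realigns it to Q15.
    'rnd' rounds to nearest before the shift; 'sat' clips overflow instead of
    letting it wrap.
    """
    prod = a * b               # 32-bit product, Q30 alignment
    if rnd:
        prod += 1 << 14        # add half an LSB for round-to-nearest
    prod >>= 15                # back to Q15 alignment
    if sat:
        prod = max(-32768, min(32767, prod))
    return prod

half = 16384                   # 0.5 in Q15
print(q15_mult(half, half))    # 8192, i.e. 0.25
print(q15_mult(-32768, -32768))  # 32767: +1.0 is unrepresentable, so it saturates
```

Every one of these decisions (fractional vs. integer alignment, rounding, saturation) is a separate instruction option in fixed point, while the single floating point multiply handles them implicitly.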


4.6 Trends in DSP:

Figure 4.6.1: Major trends in DSPs. As illustrated in (a), about 38% of embedded designers have already switched from conventional microprocessors to DSPs, and another 49% are considering the change. The high throughput and computational power of DSPs often makes them an ideal choice for embedded designs. In (b), about twice as many engineers use fixed point as use floating point DSPs. This is mainly driven by consumer products that must have low-cost electronics, such as cellular telephones. However, as shown in (c), floating point is the fastest growing segment; over one-half of engineers currently using 16-bit devices plan to migrate to floating point DSPs at some time in the near future.

32-bit floating point has a higher dynamic range, meaning there is a greater difference between the largest number and the smallest number that can be represented. About twice as many engineers currently use fixed point as use floating point DSPs, but this depends greatly on the application. Fixed point is more popular in competitive consumer products where the cost of the electronics must be kept very low. A good example of this is cellular telephones: when you are in competition to sell millions of your product, a cost difference of only a few dollars can be the difference between success and failure. In comparison, floating point is more common when greater performance is needed and cost is not important. For instance, suppose you are designing a medical imaging system, such as a


computed tomography scanner. Only a few hundred of the model will ever be sold, at a price of several hundred thousand dollars each. For this application, the cost of the DSP is insignificant, but the performance is critical. In spite of the larger number of fixed point DSPs being used, the floating point market is the fastest growing segment.

4.7 Accuracy of Floating Point DSP

The greater accuracy of the floating-point format results from three factors. First, the 24-bit word width in floating-point DSPs yields greater precision than the 16-bit fixed-point word width, in integer as well as real values. Second, exponentiation vastly increases the dynamic range available for the application. A wide dynamic range is important in dealing with extremely large data sets and with data sets where the range cannot be easily predicted. Third, the internal representations of data in floating-point DSPs are more exact than in fixed-point, ensuring greater accuracy in end results.

Three data word widths are important to consider in the internal architecture of a DSP. The first is the I/O signal word width, which is 24 bits for floating-point DSPs and 16 bits (or, in some devices, 8 or 32 bits) for fixed-point DSPs. The second word width is that of the coefficients used in multiplications. While fixed-point coefficients are 16 bits, the same as the signal data, floating-point coefficients can have 24 bits or 53 bits of precision, depending on whether single or double precision is used. The precision can be extended beyond 24 and 53 bits in some cases, when the exponent can represent significant zeroes in the coefficient. Finally, there is the word width for holding the intermediate products of iterated multiply-accumulate (MAC) operations. For a single 16-bit by 16-bit multiplication, a 32-bit product would be needed, or a 48-bit product for a single 24-bit by 24-bit multiplication. However, iterated MACs require additional bits for overflow headroom.
In fixed-point devices, this overflow headroom is 8 bits, making the total intermediate product word width 40 bits (16 signal + 16 coefficient + 8 overflow). Integrating the same proportion of overflow headroom in 32-bit floating-point DSPs would require 64 intermediate product bits (24 signal + 24 coefficient + 16 overflow), which would go beyond most application requirements in accuracy. Fortunately, through exponentiation the floating-point format enables keeping only the most significant 48 bits for intermediate products, so that the hardware stays manageable while still providing more bits of intermediate accuracy than the fixed-point format offers.
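The value of the 8 headroom bits can be verified with simple arithmetic: each extra accumulator bit doubles the number of worst-case products that can be summed, so 8 bits allow 2^8 = 256 of them. A small Python check (the 256- and 1024-tap counts are illustrative):

```python
# Worst-case magnitude of a signed 16-bit x 16-bit product.
max_product = 32768 * 32768            # 2^30

taps = 256                             # 2^8 worst-case products
worst_case_sum = taps * max_product    # 2^38

acc_bits = 40                          # 16 signal + 16 coefficient + 8 overflow
acc_max = 2 ** (acc_bits - 1)          # signed 40-bit magnitude limit, 2^39

print(worst_case_sum < acc_max)        # True: 256 taps fit with room to spare
print(1024 * max_product < acc_max)    # False: 1024 worst-case taps would overflow
```

In practice real signals rarely hit the worst case on every tap, so the headroom stretches further, but this bound is what the 40-bit accumulator guarantees unconditionally.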


CHAPTER: 5

INTRODUCTION

5.1 INTRODUCTION TO VLSI:

The first digital circuits were designed using electronic components like vacuum tubes and transistors. Later, Integrated Circuits (ICs) were invented, where a designer could place a digital circuit of fewer than 10 gates on a chip, a scale called SSI (Small Scale Integration). With the advent of new fabrication techniques, designers could place more than 100 gates on an IC, called MSI (Medium Scale Integration); designing at this level, one can create digital sub-blocks (adders, multiplexers, counters, registers, etc.) on an IC. At the next scale of integration, LSI (Large Scale Integration), people succeeded in making digital subsystems (microprocessors, I/O peripheral devices, etc.) on a chip.

At this point the design process started getting very complicated: manual conversion from schematic level to gate level, or from gate level to layout level, was becoming a lengthy process, and verifying the functionality of digital circuits at the various levels became critical. This created new challenges for digital designers as well as circuit designers, who felt the need to automate these processes. Meanwhile, rapid advances in software technology and the development of new higher-level programming languages took place. Engineers were able to develop CAD/CAE (Computer Aided Design/Computer Aided Engineering) tools for designing electronic circuits with the assistance of software programs. Functional verification and logic verification of a design can be done using CAD simulation tools with greater efficiency, and it became very easy for a designer to verify the functionality of a design at various levels.

With the advent of new process technology, i.e., CMOS (Complementary Metal Oxide Semiconductor), one can fabricate a chip containing more than a million gates. At this scale the design process was still critical because of the manual conversion of the design from one level to another; the latest CAD tools solved this problem. With logic synthesis tools, a design engineer can easily translate a higher-level design description into lower levels. This way of designing (using CAD tools) is certainly a revolution in the electronics industry, leading to the development of sophisticated electronic products for both consumers and businesses.


5.2 IC DESIGN FLOW:

5.3 INTRODUCTION TO VHDL:

VHDL is an acronym for VHSIC Hardware Description Language.


IC design flow diagram: Specification -> Behavioral Description -> Behavioral Simulation -> RTL Description -> Functional Simulation -> Logic Synthesis (driven by constraints and a technology library) -> Gate-Level Netlist -> Logic Simulation -> Automatic Place & Route -> Layout -> Fabrication.


VHSIC is an acronym for Very High Speed Integrated Circuits. VHDL is a hardware description language that can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level.

The VHDL language can be regarded as an integrated amalgamation of the following languages:

Sequential language
Concurrent language
Net-list language
Timing specifications
Waveform generation language

VHDL not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a VHDL simulator. A subset of the language is usually sufficient to model most applications. The complete language, however, has sufficient power to capture the descriptions of the most complex chips, up to a complete electronic system.

5.3.1 HISTORY:

The requirements for the language were first generated in 1981 under the VHSIC program of the Department of Defense (DoD); reprocurement and reuse were also big issues. Thus, a need arose for a standardized hardware description language for the design, documentation, and verification of digital systems. The IEEE standardized the VHDL language in December 1987; this version of the language is known as IEEE Std 1076-1987. The official language description appears in the IEEE Standard VHDL Language Reference Manual, available from the IEEE. The language has also been recognized as an American National Standards Institute (ANSI) standard. According to IEEE rules, an IEEE standard has to be reballoted every 5 years so that it may remain a standard. Consequently, the language was upgraded with new features, the syntax of many constructs was made more uniform, and many ambiguities present in the 1987 version of the language were resolved. This new version of the language is known as IEEE Std 1076-1993.

5.3.2 CAPABILITIES

The following are the major capabilities that the language provides, along with the features that differentiate it from other hardware description languages.

The language can be used as exchange medium between chip vendors and CAD tool users. Different chip vendors can provide VHDL descriptions of their components to system designers.

The language can be used as a communication medium between different CAD and CAE tools

The language supports hierarchy; that is, a digital system can be modeled as a set of interconnected components, and each component, in turn, can be modeled as a set of interconnected subcomponents.


The language supports flexible design methodologies: top-down, bottom-up, or mixed. It supports both synchronous and asynchronous timing models.

Various digital modeling techniques, such as finite-state machine descriptions and Boolean equations, can be modeled using the language.

The language is publicly available, human-readable, and machine-readable. The language supports three basic modeling styles: Structural, Dataflow, and Behavioral. It supports a wide range of abstraction levels, ranging from abstract behavioral descriptions to very precise gate-level descriptions. Arbitrarily large designs can be modeled using the language, and there are no limitations imposed by the language on the size of the design.

5.3.3 HARDWARE ABSTRACTION:

VHDL is used to describe a model for a digital hardware device. This model specifies the external view of the device and one or more internal views. The internal view of the device specifies functionality or structure, while the external view specifies the interface of the device through which it communicates with the other modules in the environment.

In VHDL each device model is treated as a distinct representation of a unique device, called an Entity. The Entity is thus a hardware abstraction of the actual hardware device. Each Entity is described using one model, which contains one external view and one or more internal views.

Architecture Body:

An architecture body specifies the internal details of an entity using any of the following modeling styles:
1. As a set of interconnected components (to represent structure)
2. As a set of concurrent assignment statements (to represent data flow)
3. As a set of sequential assignment statements (to represent behavior)
4. As any combination of the above three

Structural style of modeling:

In this style, an entity is described as a set of interconnected components. Such a model for the HALF_ADDER entity is described in an architecture body:

architecture ha of HALF_ADDER is
  component Xor2
    port (X, Y : in BIT; Z : out BIT);
  end component;
  component And2
    port (L, M : in BIT; N : out BIT);
  end component;
begin
  X1: Xor2 port map (A, B, SUM);
  A1: And2 port map (A, B, CARRY);
end ha;

The name of the architecture body is ha. The entity declaration for HALF_ADDER specifies the interface ports for this architecture body. The architecture body is composed of two parts:


the declaration part and the statement part. Two component declarations are present in the declarative part of the architecture body.

The declared components are instantiated in the statement part of the architecture body using component instantiation statements. The signals in the port map of a component instantiation and the port signals in the component declaration are associated by position.
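The structural model of the HALF_ADDER shown above presumes an entity declaration for the half adder. A minimal sketch of such a declaration (the port names A, B, SUM and CARRY are assumed from the architecture body, not given in the text) is:

```vhdl
entity HALF_ADDER is
  port (
    A, B  : in  BIT;   -- operand inputs
    SUM   : out BIT;   -- sum output (A xor B)
    CARRY : out BIT    -- carry output (A and B)
  );
end HALF_ADDER;
```

The architecture body ha is then bound to this entity, and its port map associations connect A, B, SUM and CARRY to the instantiated Xor2 and And2 components.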

5.3.4 DATAFLOW STYLE OF MODELING:

In this modeling style, the flow of data through the entity is expressed primarily using concurrent signal assignment statements. The dataflow model for the half adder is described using two concurrent signal assignment statements. In a signal assignment statement, the symbol <= implies an assignment of a value to a signal.
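The two concurrent signal assignments for the half adder can be sketched as follows (assuming a HALF_ADDER entity with ports A, B, SUM and CARRY):

```vhdl
architecture ha_df of HALF_ADDER is
begin
  SUM   <= A xor B;   -- concurrent signal assignment for the sum
  CARRY <= A and B;   -- concurrent signal assignment for the carry
end ha_df;
```

Both statements execute concurrently: each re-evaluates whenever a signal on its right-hand side changes, so the order in which they are written does not matter.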

5.3.5 BEHAVIORAL STYLE OF MODELING:

The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specified order. These sequential statements, which are specified inside a process statement, do not explicitly specify the structure of the entity but merely its functionality. A process statement is a concurrent statement that can appear within an architecture body.

5.4 INTRODUCTION TO HDL TOOLS

5.4.1 SIMULATION TOOL

5.4.1.1 Active-HDL Overview:

Active-HDL is an integrated environment designed for development of VHDL, Verilog, EDIF and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs, and libraries.

5.4.2. Standards Supported

VHDL:

The VHDL simulator implemented in Active-HDL supports the IEEE Std. 1076-1993 standard.

Verilog:

The Verilog simulator implemented in Active-HDL supports the IEEE Std. 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.

EDIF:

Active-HDL supports Electronic Design Interchange Format version 2 0 0.


VITAL:

The simulator provides built-in acceleration for VITAL packages version 3.0. The VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with OVI Standard Delay Format Specification Version 2.1.

WAVES:

Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0 May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification and procurement of hardware systems.

5.4.3 ACTIVE-HDL Macro Language:

All operations in Active-HDL can be performed using Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).

1. HDL Editor:

HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging source code. The keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts, and Tcl scripts.

2. Block Diagram Editor:

Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.

3. State Diagram Editor:

State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.

4. Waveform Editor:

Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.

5. Design Browser:

The Design Browser window displays the contents of the current design, that is:

a. Resource files attached to the design.
b. The contents of the default working library of the design.


c. The structure of the design unit selected for simulation.
d. VHDL, Verilog, or EDIF objects declared within a selected region of the current design.

6. Console window:

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros, and scripts. All Active-HDL tools output their messages to Console.

5.4.4 Compilation:

Compilation is a process of analysis of a source file. Analyzed design units contained within the file are placed into the working library in a format understandable to the simulator. In Active-HDL, a source file can be one of the following:

VHDL file (.vhd)
Verilog file (.v)
EDIF net list file (.EDIF)
State diagram file (.asf)
Block diagram file (.bde)

In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog, or EDIF file containing HDL code (or net list) generated from the diagram.

A net list is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.

Active-HDL provides three compilers, respectively for VHDL, Verilog, and EDIF. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.

5.4.5 Simulation:

The purpose of simulation is to verify that the circuit works as desired. The Active-HDL simulator provides two simulation engines:

Event-Driven Simulation
Cycle-Based Simulation

The simulator supports hybrid simulation – some portions of a design can be simulated in the event-driven kernel while others run in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.


Fig 4.3.5.1: Simulation

5.4.6 SYNTHESIS TOOL:

5.4.6.1 OVERVIEW OF XILINX ISE:

Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.

ISE enables you to start your design with any of a number of different source types, including:

HDL (VHDL, Verilog HDL, ABEL)
Schematic design files
EDIF
NGC/NGO
State Machines
IP Cores

From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator.  HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE.  The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bit stream for your device configuration.

5.4.6.2 Design Entry:

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.


Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view, and edit schematics and symbols for the Design Entry step of the Xilinx® design flow.

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders, to system-level building blocks such as filters, transforms, FIFOs, and memories.

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic, and Area Group constraints.

State CAD State Machine Editor - State CAD allows you to specify states, transitions, and actions in a graphical editor.  The state machine will be created in HDL.

5.4.6.3 Implementation:

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file.

Map - The Map program maps a logical design to a Xilinx FPGA.

Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bit stream generator.

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA, and to view and modify the placed design.

FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing or routing an FPGA design, and after fitting and routing a CPLD design.

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file.

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macro cell details, equations, and pin assignments.


5.4.6.4 Device Download and Program File Formatting:

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration.

iMPACT - The iMPACT tool generates various programming file formats, and subsequently allows you to configure your device.

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.

Integration with ChipScope Pro.


CHAPTER 6

SIMULATION RESULTS

Simulation for floating point addition, subtraction, multiplication and division is done using the Active-HDL tool, and the results are as follows:

6.1 Simulation results of floating point addition

Simulation result for 32-bit floating point addition can be explained as follows:

Fig 6.1 simulation results for floating point addition

The inputs are given in hexadecimal form and converted into binary. These converted inputs are represented in the IEEE 32-bit floating point standard format; hence the 31st bit, which is the sign bit of each input, is checked, then the exponent bits (the next eight bits) are checked and the necessary operation is carried out. The remaining bits are the mantissa bits, whose addition is performed, and the result is converted back into hexadecimal form.
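The flow described above can be exercised with a small test bench. The sketch below assumes the Fadd entity listed in the Appendix, compiled into the working library; the two hexadecimal stimulus values are illustrative, not taken from the figure:

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity tb_Fadd is
end tb_Fadd;

architecture sim of tb_Fadd is
  signal a, b, y : std_logic_vector(31 downto 0);
begin
  -- unit under test: the floating point adder from the Appendix
  uut: entity work.Fadd port map (a => a, b => b, y => y);

  stim: process
  begin
    a <= X"41200000";  -- first operand, given in hexadecimal
    b <= X"40A00000";  -- second operand, given in hexadecimal
    wait for 10 ns;    -- after this delay, y holds the result in hexadecimal
    wait;
  end process;
end sim;
```

In the Active-HDL waveform viewer the signals a, b and y can then be displayed with a hexadecimal radix, matching the presentation in the figures of this chapter.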


6.2 Simulation results for floating point subtraction

Simulation result for 32-bit floating point subtraction can be explained as follows:

Fig 6.2 simulation results for floating point subtraction

The inputs are given in hexadecimal form and converted into binary. These converted inputs are represented in the IEEE 32-bit floating point standard format; hence the 31st bit, which is the sign bit of each input, is checked, then the exponent bits (the next eight bits) are checked and the necessary operation is carried out. The remaining bits are the mantissa bits, whose subtraction is performed, and the result is converted back into hexadecimal form.


6.3 Simulation results for floating point multiplication

Simulation result for 32-bit floating point multiplication can be explained as follows:

Fig 6.3 simulation results for floating point multiplication

The inputs are given in hexadecimal form and converted into binary. These converted inputs are represented in the IEEE 32-bit floating point standard format; hence the 31st bit, which is the sign bit of each input, is checked, then the exponent bits (the next eight bits) are checked and the necessary operation is carried out. The remaining bits are the mantissa bits, whose multiplication is performed, and the result is converted back into hexadecimal form.


6.4 Simulation results for floating point division

Simulation result for 32-bit floating point division can be explained as follows:

Fig 6.4 simulation results for floating point division

The inputs are given in hexadecimal form and converted into binary. These converted inputs are represented in the IEEE 32-bit floating point standard format; hence the 31st bit, which is the sign bit of each input, is checked, then the exponent bits (the next eight bits) are checked and the necessary operation is carried out. The remaining bits are the mantissa bits, whose division is performed, and the result is converted back into hexadecimal form.


CHAPTER 7

CONCLUSION AND FUTURE SCOPE

7.1 CONCLUSIONS

Floating point numbers have been converted into the required IEEE floating point standard representation.

Procedures for performing basic arithmetic operations have been formed.

Basic arithmetic operations such as addition, subtraction, multiplication and division have been performed on floating point numbers by representing the numbers in the IEEE 32-bit floating point standard.

The functional simulation has been successfully carried out, with the results matching the expected ones.

7.2 FUTURE SCOPE

32-bit floating point arithmetic operations can be extended to 64-bit floating point arithmetic operations.

The advanced Booth algorithm can also be implemented for 32-bit floating point multiplication.


CHAPTER 8

APPLICATIONS

Floating-point applications are those that require greater computational accuracy and flexibility than fixed-point DSPs can provide.

Image recognition used for medicine is similar to audio in requiring a high degree of accuracy. Many levels of signal input from light, x-rays, ultrasound and other sources must be defined and processed to create output images that provide useful diagnostic information. The greater precision of signal data, together with the device's more accurate internal representations of data, enables imaging systems to achieve a much higher level of recognition and definition for the user.

Radar for navigation and guidance is a traditional floating-point application, since it requires a wide dynamic range that cannot be defined ahead of time and uses either the divide operator or matrix inversions. The radar system may be tracking in a range from 0 to infinity, but need use only a small subset of that range for target acquisition and identification. Since the subset must be determined in real time during system operation, it would be all but impossible to base the design on a fixed-point DSP with its narrow dynamic range and quantization effects.

Wide dynamic range also plays a part in robotic design. Normally, a robot functions within a limited range of motion that might well fit within a fixed-point DSP's dynamic range. However, unpredictable events can occur on an assembly line. For instance, the robot might weld itself to an assembly unit, or something might unexpectedly block its range of motion. In these cases, feedback is well out of the ordinary operating range, and a system based on a fixed-point DSP might not offer programmers an effective means of dealing with the unusual conditions. The wide dynamic range of a floating-point DSP, however, enables the robot control circuitry to deal with unpredictable circumstances in a predictable manner.


BIBLIOGRAPHY

Fraeman, M., Hayes, J., Williams, R. & Zaremba, T. (1986) A 32 bit processor architecture for direct execution of Forth. In: 1986 FORML Conf. Proc., 28-30 November 1986, Pacific Grove CA, pp. 197-210

Jones, S. P. (1987) The Implementation of Functional Programming Languages, Prentice-Hall, New York

McKeeman, W. (1975) Stack computers. In: Stone, H. (Ed.) Introduction to Computer Architecture, Science Research Associates, Chicago, 1975, pp. 281-317

Yamamoto, M. (1981) A survey of high-level language machines in Japan. Computer, July 1981, 14(7) 68-78

REFERENCES

www.ieeexplore.ieee.org

www.computer.org/portal/web/csdl/doi/10.1109/SNPD.2007.46

www.intel.com


APPENDIX

--***************************************************************************
--Entity Name        : Fadd                                                 *
--Entity Description : Floating Point Addition involves three steps         *
--                     1.Compute Ea-Eb                                      *
--                     2.Shift the mantissa that has the lesser Exponent by *
--                       Ea-Eb places to the right                          *
--                     3.Add with the other Mantissa                        *
--***************************************************************************
--**************This Module performs Floating Point Addition****************
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;

--***************Input and Output Declarations*******************************
entity Fadd is
  port (
    a : in  std_logic_vector(31 downto 0);
    b : in  std_logic_vector(31 downto 0);
    y : out std_logic_vector(31 downto 0)
  );
end Fadd;

architecture Fadd of Fadd is
--*****************************************************************
--* This Function performs Floating Point Addition                *
--*****************************************************************
Function float_add(Acc,Data : in std_logic_vector(31 downto 0)) return std_logic_vector is
  --************Function to convert 7-bit binary to integer**********
  Function to_integer(x : in std_logic_vector(6 downto 0)) return integer is
    --**************Variable Declarations*****************************
    variable sum  : integer := 0;
    variable temp : std_logic_vector(6 downto 0);
  begin
    temp := x;
    --*********Generating loop to convert 7-bit Binary to integer*****


    xxx: for i in 0 to 6 loop
      if (temp(i) = '1') then
        sum := sum + 2**i;
      else
        sum := sum;
      end if;
    end loop;
    return sum;
  end function;

  --*********************Variable Declarations************************
  variable MaIn  : std_logic_vector(22 downto 0); -- Internal Register
  variable MbIn  : std_logic_vector(22 downto 0); -- Internal Register
  variable Ea,Eb : std_logic_vector(7 downto 0);  -- Two Exponents including Sign
  variable IR    : std_logic_vector(22 downto 0); -- Resultant Mantissa
  variable IE    : std_logic_vector(6 downto 0);  -- Resultant Exponent
  variable Ns    : integer;                       -- Number Of Shifts
  variable Ma,Mb : std_logic_vector(22 downto 0); -- Magnitude Of Two Mantissas
  variable ES    : std_logic;                     -- Sign Of Resultant Exponent
  variable a,b   : std_logic;                     -- Sign Of Two Exponents
  variable s1,s2 : std_logic;                     -- Sign Of Two Mantissas
  variable Sign  : std_logic;                     -- Sign Of Resultant Mantissa
  variable W,Z   : std_logic_vector(1 downto 0);
  variable X     : std_logic_vector(31 downto 0); -- Final Result
begin

  MaIn := Acc(22 downto 0);
  MbIn := Data(22 downto 0);
  Ea   := Acc(30 downto 23);
  Eb   := Data(30 downto 23);
  a    := Acc(30);
  b    := Data(30);
  Z    := (a & b);

  --*****************************************************************
  --*Equalization of Exponents includes two steps                   *
  --*1.Subtraction of Exponents                                     *
  --*2.Alignment of Mantissas                                       *
  --*****************************************************************
  case Z is
    when "00" =>
      Mb := MbIn;
      Ma := MaIn;
      if ((Ea(6 downto 0)) < (Eb(6 downto 0))) then
        NS := to_integer(Eb(6 downto 0) - Ea(6 downto 0));
        for x in 1 to Ns loop
          Ma := ('0' & Ma(22 downto 1));
        end loop;
        IE := Eb(6 downto 0);


        ES := Eb(7);
      elsif ((Eb(6 downto 0)) < (Ea(6 downto 0))) then
        NS := to_integer(Ea(6 downto 0) - Eb(6 downto 0));
        for x in 1 to Ns loop
          Mb := ('0' & Mb(22 downto 1));
        end loop;
        IE := Ea(6 downto 0);
        ES := Ea(7);
      else
        NS := Ns;
        Ma := Ma;
        Mb := Mb;
        IE := Ea(6 downto 0);
        ES := Ea(7);
      end if;

    when "01" =>
      Mb := MbIn;
      Ma := MaIn;
      NS := to_integer(Ea(6 downto 0) + Eb(6 downto 0));
      for x in 1 to Ns loop
        Mb := ('0' & Mb(22 downto 1));
      end loop;
      IE := Ea(6 downto 0);
      ES := Ea(7);
    when "10" =>
      Mb := MbIn;
      Ma := MaIn;
      NS := to_integer(Eb(6 downto 0) + Ea(6 downto 0));
      for x in 1 to Ns loop
        Ma := ('0' & Ma(22 downto 1));
      end loop;
      IE := Eb(6 downto 0);
      ES := Eb(7);
    when "11" =>
      Mb := MbIn;
      Ma := MaIn;
      if ((Ea(6 downto 0)) < (Eb(6 downto 0))) then
        NS := to_integer(Eb(6 downto 0) - Ea(6 downto 0));
        for x in 1 to Ns loop
          Mb := ('0' & Mb(22 downto 1));
        end loop;
        IE := Ea(6 downto 0);
        ES := Ea(7);
      elsif ((Eb(6 downto 0)) < (Ea(6 downto 0))) then


        NS := to_integer(Ea(6 downto 0) - Eb(6 downto 0));
        for x in 1 to Ns loop
          Ma := ('0' & Ma(22 downto 1));
        end loop;
        IE := Eb(6 downto 0);
        ES := Eb(7);
      else
        NS := Ns;
        Ma := Ma;
        Mb := Mb;
        IE := Ea(6 downto 0);
        ES := Ea(7);
      end if;
    when others => null;
  end case;

  --******************Addition of Mantissas****************************
  IR := Ma + Mb;

  --***********Logic for the sign of the mantissa**********************
  s1 := Acc(31);
  s2 := Data(31);
  W  := (s1 & s2);
  case W is
    when "00" => sign := '0';
    when "11" => sign := '1';
    when "01" =>
      if (Ea > Eb) then
        sign := '0';
      elsif (Ea < Eb) then
        sign := '1';
      elsif (Ea = Eb) then
        if (Ma > Mb) then
          sign := '0';
        elsif (Ma < Mb) then
          sign := '1';
        elsif (Ma = Mb) then
          sign := '0';
        else
          sign := sign;
        end if;
      else
        sign := sign;
      end if;
    when "10" =>
      if (Ea > Eb) then


        sign := '1';
      elsif (Ea < Eb) then
        sign := '0';
      elsif (Ea = Eb) then
        if (Ma > Mb) then
          sign := '1';
        elsif (Ma < Mb) then
          sign := '0';
        elsif (Ma = Mb) then
          sign := '0';
        else
          sign := sign;
        end if;
      else
        sign := sign;
      end if;
    when others => null;
  end case;

  --***********Final Result After Addition**************************
  X := (sign & ES & IE & IR(22 downto 0));
  return X;
end function;

begin
  process(a,b)
  begin
    y <= float_add(a,b);
  end process;
end Fadd;

--***************************************************************************
--Entity Name        : Fsub                                                 *
--Entity Description : Floating Point Subtraction involves three steps      *


--                     1.Compute Ea-Eb                                      *
--                     2.Shift the mantissa that has the lesser Exponent by *
--                       Ea-Eb places to the right                          *
--                     3.Subtract from the other Mantissa                   *
--***************************************************************************
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;

--***************Input and Output Declarations******************************
entity Fsub is
  port (
    a : in  STD_LOGIC_VECTOR (31 downto 0);
    b : in  STD_LOGIC_VECTOR (31 downto 0);
    y : out std_logic_vector(31 downto 0)
  );
end Fsub;

architecture F_sub of Fsub is
--*****************************************************************
--* This Function performs Floating Point Subtraction             *
--*****************************************************************
Function float_sub(Accout,Data : in std_logic_vector(31 downto 0)) return std_logic_vector is
  --************Function to convert 7-bit binary to integer**********
  Function to_integer(x : in std_logic_vector(6 downto 0)) return integer is
    --*********************Variable Declarations***********************
    variable sum  : integer := 0;
    variable temp : std_logic_vector(6 downto 0);
  begin
    temp := x;


    xxx: for i in 0 to 6 loop
      if (temp(i) = '1') then
        sum := sum + 2**i;
      else
        sum := sum;
      end if;
    end loop;
    return sum;
  end function;

  --*********************Variable Declarations***********************
  variable MaIn,MbIn : std_logic_vector(22 downto 0); -- Internal Registers
  variable Ea,Eb     : std_logic_vector(7 downto 0);  -- Two Exponents Including Sign
  variable IR        : std_logic_vector(22 downto 0); -- Resultant Mantissa
  variable IE        : std_logic_vector(6 downto 0);  -- Resultant Exponent
  variable Ns        : integer;                       -- Number Of Shifts
  variable Ma,Mb     : std_logic_vector(22 downto 0); -- Magnitude Of Two Mantissas
  variable ES        : std_logic;                     -- Sign Of Resultant Exponent
  variable a,b       : std_logic;                     -- Sign Of Two Exponents
  variable s1,s2     : std_logic;                     -- Sign Of Two Mantissas
  variable sign      : std_logic;                     -- Sign Of Resultant Mantissa
  variable W,Z       : std_logic_vector(1 downto 0);
  variable X         : std_logic_vector(31 downto 0); -- Final Result
begin

  MaIn := Accout(22 downto 0);
  MbIn := Data(22 downto 0);
  Ea   := Accout(30 downto 23);
  Eb   := Data(30 downto 23);
  a    := Accout(30);
  b    := Data(30);
  Z    := (a & b);

  --*****************************************************************
  --*Equalization of Exponents includes two steps                   *
  --*1.Subtraction of Exponents                                     *


  --*2.Alignment of Mantissas                                       *
  --*****************************************************************
  case Z is

    when "00" =>
      Mb := MbIn;
      Ma := MaIn;
      if ((Ea(6 downto 0)) < (Eb(6 downto 0))) then
        NS := to_integer(Eb(6 downto 0) - Ea(6 downto 0));
        for x in 1 to Ns loop
          Ma := ('0' & Ma(22 downto 1));
        end loop;
        IE := Eb(6 downto 0);
        ES := Eb(7);
      elsif ((Eb(6 downto 0)) < (Ea(6 downto 0))) then
        NS := to_integer(Ea(6 downto 0) - Eb(6 downto 0));
        for x in 1 to Ns loop
          Mb := ('0' & Mb(22 downto 1));
        end loop;
        IE := Ea(6 downto 0);
        ES := Ea(7);
      else
        NS := Ns;
        Ma := Ma;
        Mb := Mb;
        IE := Ea(6 downto 0);
        ES := Ea(7);
      end if;
    when "01" =>
      Mb := MbIn;
      Ma := MaIn;
      NS := to_integer(Ea(6 downto 0) + Eb(6 downto 0));
      for x in 1 to Ns loop
        Mb := ('0' & Mb(22 downto 1));
      end loop;
      IE := Ea(6 downto 0);
      ES := Ea(7);


    when "10" =>
      Mb := MbIn;
      Ma := MaIn;
      NS := to_integer(Eb(6 downto 0) + Ea(6 downto 0));
      for x in 1 to Ns loop
        Ma := ('0' & Ma(22 downto 1));
      end loop;
      IE := Eb(6 downto 0);
      ES := Eb(7);
    when "11" =>
      Mb := MbIn;
      Ma := MaIn;
      if ((Ea(6 downto 0)) < (Eb(6 downto 0))) then
        NS := to_integer(Eb(6 downto 0) - Ea(6 downto 0));
        for x in 1 to Ns loop
          Mb := ('0' & Mb(22 downto 1));
        end loop;
        IE := Ea(6 downto 0);
        ES := Ea(7);
      elsif ((Eb(6 downto 0)) < (Ea(6 downto 0))) then
        NS := to_integer(Ea(6 downto 0) - Eb(6 downto 0));
        for x in 1 to Ns loop
          Ma := ('0' & Ma(22 downto 1));
        end loop;
        IE := Eb(6 downto 0);
        ES := Eb(7);
      else
        NS := Ns;
        Ma := Ma;
        Mb := Mb;
        IE := Ea(6 downto 0);
        ES := Ea(7);
      end if;
    when others => null;
  end case;

  --******************Subtraction of Mantissas************************
  IR := Ma - Mb;


  --***********Logic for the sign of the mantissa**********************
  s1 := Accout(31);
  s2 := Data(31);
  W  := (s1 & s2);
  case W is
    when "00" => sign := '0';
    when "11" => sign := '1';
    when "01" =>
      if (Ea > Eb) then
        sign := '0';
      elsif (Ea < Eb) then
        sign := '1';
      elsif (Ea = Eb) then
        if (Ma > Mb) then
          sign := '0';
        elsif (Ma < Mb) then
          sign := '1';
        elsif (Ma = Mb) then
          sign := '0';
        else
          sign := sign;
        end if;
      else
        sign := sign;
      end if;
    when "10" =>
      if (Ea > Eb) then
        sign := '1';
      elsif (Ea < Eb) then
        sign := '0';
      elsif (Ea = Eb) then
        if (Ma > Mb) then
          sign := '1';
        elsif (Ma < Mb) then
          sign := '0';
        elsif (Ma = Mb) then
          sign := '0';
        else
          sign := sign;
        end if;
      else
        sign := sign;
      end if;


    when others => null;
  end case;

  --***********Final Result After Subtraction************************
  X := (sign & ES & IE & IR(22 downto 0));
  return X;
end function;

begin
  process(a,b)
  begin
    y <= float_sub(a,b);
  end process;

end F_sub;

--**************************************************************************
--Entity Name        : Fmul                                                *
--Entity Description : Floating Point Multiplication involves two steps    *
--                     1.Addition of the Exponents                         *
--                     2.Multiplication of the Mantissas                   *
--**************************************************************************
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;

--***************Input and Output Declarations***********************
entity Fmul is
  port (
    a : in  STD_LOGIC_VECTOR (31 downto 0);
    b : in  STD_LOGIC_VECTOR (31 downto 0);
    y : out STD_LOGIC_VECTOR (31 downto 0)
  );
end Fmul;


architecture F_mul of Fmul is
--*****************************************************************
--* This Function performs Floating Point Multiplication          *
--*****************************************************************
Function float_mul(Accout,Data : in std_logic_vector(31 downto 0)) return std_logic_vector is
  --*********************Variable Declarations***********************
  variable e1,e2 : std_logic_vector(7 downto 0);  -- Two Exponents Including Sign
  variable m1,m2 : std_logic_vector(10 downto 0); -- Magnitude Of Two Mantissas
  variable s     : std_logic;                     -- Sign Of Resultant Mantissa
  variable a,b   : std_logic;                     -- Sign Of Two Exponents
  variable s1,s2 : std_logic;                     -- Sign Of Two Mantissas
  variable Ea,Eb : std_logic_vector(6 downto 0);  -- Magnitude Of Two Exponents
  variable c     : std_logic;                     -- Sign Of Resultant Exponent
  variable e     : std_logic_vector(6 downto 0);  -- Resultant Exponent
  variable m     : std_logic_vector(21 downto 0); -- Resultant Mantissa
  variable carry : std_logic;                     -- Carry
  variable W,Z   : std_logic_vector(1 downto 0);
  variable x     : std_logic_vector(31 downto 0); -- Final Result
begin

  Carry := '0';
  e1 := Accout(30 downto 23);
  e2 := Data(30 downto 23);
  m1 := Accout(10 downto 0);
  m2 := Data(10 downto 0);

  --************Logic for the sign of the Mantissa*******************
  s1 := Accout(31);
  s2 := Data(31);
  Z  := (s1 & s2);
  case Z is
    when "00"   => s := '0';
    when "11"   => s := '0';
    when others => s := '1';
  end case;


  --************Logic for the sign of the exponent*******************
  Ea := e1(6 downto 0);
  Eb := e2(6 downto 0);
  a  := Accout(30);
  b  := Data(30);
  W  := (a & b);
  case W is
    when "00" =>
      c := '0';
      e := Ea + Eb;
    when "11" =>
      c := '1';
      e := Ea + Eb;
    when "01" =>
      if (Ea > Eb) then
        c := '0';
        e := Ea - Eb;
      elsif (Ea < Eb) then
        c := '1';
        e := Eb - Ea;
      else
        c := '0';
        e := "0000000";
      end if;
    when "10" =>
      if (Ea > Eb) then
        c := '1';
        e := Ea - Eb;
      elsif (Ea < Eb) then
        c := '0';
        e := Eb - Ea;
      else
        c := '0';
        e := "0000000";
      end if;
    when others => null;
  end case;

  --*************Logic for multiplication*************************
  m := m1 * m2;


  --***********Final Result After Multiplication******************
  x := (s & c & e(6 downto 0) & Carry & m(21 downto 0));
  return x;
end function;

begin
  process(a,b)
  begin
    y <= float_mul(a,b);
  end process;
end F_mul;

--**************************************************************************
--Entity Name        : Fdiv                                                *
--Entity Description : Floating Point Division includes five steps         *
--                     1.Check for Zeros                                   *
--                     2.Evaluate the sign                                 *
--                     3.Align the Dividend                                *
--                     4.Subtraction of Exponents                          *
--                     5.Divide the Mantissas                              *
--**************************************************************************
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;

--***************Input and Output Declarations******************************
entity Fdiv is
  port (
    a : in  STD_LOGIC_VECTOR (31 downto 0);
    b : in  STD_LOGIC_VECTOR (31 downto 0);
    y : out STD_LOGIC_VECTOR (31 downto 0)
  );
end Fdiv;

architecture F_div of Fdiv is--*****************************************************************


--* This Function performs Floating Point Division *

--*****************************************************************

Function float_div(Accout,Data:std_logic_vector(31 downto 0)) return std_logic_vector is

--*********************Variable Declarations**********************

variable m1,m2  : std_logic_vector(22 downto 0);  -- Magnitude Of Two Mantissas
variable e1,e2  : std_logic_vector(7 downto 0);   -- Two Exponents Including Sign
variable s1,s2  : std_logic;                      -- Sign Of Two Mantissas
variable s      : std_logic;                      -- Sign Of Resultant Mantissa
variable a,b    : std_logic;                      -- Sign Of Two Exponents
variable Ea,Eb  : std_logic_vector(6 downto 0);   -- Magnitude Of Two Exponents
variable c      : std_logic;                      -- Sign Of Resultant Exponent
variable e      : std_logic_vector(6 downto 0);   -- Resultant Exponent
variable temp1  : std_logic_vector(22 downto 0);  -- Temporary Register
variable temp   : std_logic_vector(45 downto 0);  -- Temporary Register
variable Q      : std_logic_vector(22 downto 0);  -- Quotient
variable Remi   : std_logic_vector(22 downto 0);  -- Remainder
variable r1,r2  : std_logic_vector(22 downto 0);
variable W,Z    : std_logic_vector(1 downto 0);
variable X      : std_logic_vector(31 downto 0);  -- Final Result
variable i      : integer;

begin

m1 := Accout(22 downto 0);
m2 := Data(22 downto 0);
e1 := Accout(30 downto 23);
e2 := Data(30 downto 23);

--*********logic for the sign of mantissa**************************

s1 := Accout(31);
s2 := Data(31);
W  := (s1 & s2);

case W is
    when "00"   => s := '0';
    when "11"   => s := '0';
    when others => s := '1';
end case;


--*************************1.Checking for Zeros********************

if ((m1 = "00000000000000000000000") and (m2 = "00000000000000000000000")) then
    report "Non Arithmetic Numbers: Please Verify Inputs";
elsif (m2 = "00000000000000000000000") then
    report "Non Arithmetic Numbers: Please Verify Inputs";
else
    m1 := m1;
    m2 := m2;
end if;

--*****************************************************************
--*Dividend Alignment :-                                          *
--*If Dividend is greater than or equal to the Divisor, then      *
--*the Dividend fraction is Shifted to the Right and              *
--*its Exponent is incremented by '1'                             *
--*****************************************************************

r1 := m1;
r2 := m2;

if (m1 > m2) then
    report "m1 is greater than m2";
    r1 := '0' & r1(22 downto 1);   -- shift right, as the comment states
    Ea := Ea + 1;
    Temp  := (r1 & "00000000000000000000000");
    Temp1 := Temp(45 downto 23);

--*******************************************************************
--*Generating the loop :-                                           *
--*1.If Dividend is smaller than the Divisor then left shift until  *
--*  it becomes greater or equal; keep those many zeros in Quotient.*
--*2.Once Dividend becomes greater, subtract Divisor from the       *
--*  Dividend and keep '1' in Quotient.                             *
--*3.Continue the procedure until the number of bits in Quotient    *
--*  becomes 23, because the Remainder never becomes Zero.          *
--*******************************************************************

    for i in 22 downto 0 loop
        if (Temp1 >= r2) then
            Remi    := Temp1 - r2;
            Remi    := Remi(21 downto 0) & '0';
            Remi(0) := Temp(i);
            Q(i)    := '1';
        else
            Remi    := Temp1 - "00000000000000000000000";
            Remi    := Remi(21 downto 0) & '0';
            Remi(0) := Temp(i);
            Q(i)    := '0';
        end if;
        Temp1 := Remi;
    end loop;

elsif (m1 = m2) then

--***Since Both Dividend and Divisor are Equal, Quotient is made '1'**
    Q := "00000000000000000000001";

elsif (m1 < m2) then

--*******************************************************************
--*Generating the loop :-                                           *
--*1.If Dividend is smaller than the Divisor then left shift until  *
--*  it becomes greater or equal; keep those many zeros in Quotient.*
--*2.Once Dividend becomes greater, subtract Divisor from the       *
--*  Dividend and keep '1' in Quotient.                             *
--*3.Continue the procedure until the number of bits in Quotient    *
--*  becomes 23, because the Remainder never becomes Zero.          *
--*******************************************************************

    Temp  := (r1 & "00000000000000000000000");
    Temp1 := Temp(45 downto 23);

    for i in 22 downto 0 loop
        if (Temp1 >= r2) then
            Remi    := Temp1 - r2;
            Remi    := Remi(21 downto 0) & '0';
            Remi(0) := Temp(i);
            Q(i)    := '1';
        else
            Remi    := Temp1 - "00000000000000000000000";
            Remi    := Remi(21 downto 0) & '0';
            Remi(0) := Temp(i);
            Q(i)    := '0';
        end if;
        Temp1 := Remi;
    end loop;

end if;
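The shift-and-subtract loop above is a restoring division over the mantissa bits: when the divisor fits into the partial remainder it is subtracted and a '1' enters the quotient, otherwise a '0', and the remainder is shifted left each step. A compact Python model of that behavior (an illustrative sketch, not the RTL):

```python
def divide_mantissas(m1: int, m2: int, bits: int = 23) -> int:
    """Restoring division: returns floor(m1 * 2**(bits-1) / m2)."""
    rem, q = m1, 0
    for _ in range(bits):
        q <<= 1
        if rem >= m2:          # divisor fits: subtract, quotient bit is '1'
            q |= 1
            rem -= m2
        rem <<= 1              # bring in the next (zero) dividend bit
    return q

# 1/2 as a 23-bit binary fraction is 0.100...0b, i.e. 2**21:
assert divide_mantissas(1, 2) == 1 << 21
```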


--**************Logic for the Sign of Exponent****************

Ea := e1(6 downto 0);
Eb := e2(6 downto 0);
a  := e1(7);
b  := e2(7);
Z  := (a & b);

case Z is

    when "00" => if (Ea > Eb) then
                     c := '0'; e := Ea - Eb;
                 elsif (Ea < Eb) then
                     c := '1'; e := Eb - Ea;   -- result exponent is negative
                 else
                     c := '0'; e := "0000000";
                 end if;

    when "11" => if (Ea > Eb) then
                     c := '1'; e := Ea - Eb;
                 elsif (Ea < Eb) then
                     c := '0'; e := Eb - Ea;
                 else
                     c := '0'; e := "0000000";
                 end if;

    -- Unlike signs: subtracting a negative exponent adds the magnitudes
    when "01" => c := '0'; e := Ea + Eb;

    when "10" => c := '1'; e := Ea + Eb;

    when others => null;

end case;
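Division subtracts the divisor's exponent from the dividend's; in signed-magnitude form, like signs require a magnitude comparison while unlike signs add the magnitudes. A Python sketch (illustrative only) of the case statement above:

```python
def sub_sm_exponents(a: int, Ea: int, b: int, Eb: int):
    """Compute (sign a, Ea) - (sign b, Eb) in signed-magnitude form."""
    if a != b:                  # Ea - (-Eb) or -Ea - Eb: magnitudes add
        return a, Ea + Eb
    if Ea > Eb:                 # like signs: subtract magnitudes
        return a, Ea - Eb
    if Ea < Eb:
        return 1 - a, Eb - Ea   # result takes the opposite sign
    return 0, 0                 # equal exponents cancel

assert sub_sm_exponents(0, 5, 0, 3) == (0, 2)   # +5 - +3 = +2
assert sub_sm_exponents(0, 3, 1, 4) == (0, 7)   # +3 - -4 = +7
assert sub_sm_exponents(1, 2, 0, 6) == (1, 8)   # -2 - +6 = -8
```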

--***********Final Result After Division***************************

X := (s & c & e(6 downto 0) & Q(22 downto 0));
return X;

end function;

begin

    process(a, b)
    begin
        y <= float_div(a, b);
    end process;

end F_div;
