
7/31/2019 Complete Floating Point(Blog)


FLOATING POINT : )

Created by: Amira Hurriff



What is meant by floating point?

The following are examples of floating-point numbers:

3.0, -111.5, 3E-5

Floating point is used to represent non-integral numbers, including very small and very large numbers.

In essence, computers are integer machines and are capable of representing real numbers only by using complex codes. The most popular code for representing real numbers is called the *IEEE Floating-Point Standard*. We will learn more about the IEEE floating-point standard after this. :D



Types float and double in C++ and C (programming languages)

In scientific notation:

i. -4.44 x 10^77 (normalized)

ii. +9.943 x 10^-5 (normalized)

iii. 0.001 x 10^3 (not normalized)

The term floating point is derived from the fact that there is no fixed number of digits before and after the decimal point; that is why it is called float.
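To make "normalized" concrete, here is a minimal Python sketch that moves the decimal point until exactly one non-zero digit sits in front of it. The function name `normalize` is a hypothetical helper made up for illustration, and the `decimal` module is used only so the digits stay exact.

```python
from decimal import Decimal

def normalize(significand, exponent):
    """Return (significand, exponent) with one non-zero digit before the point.

    Hypothetical helper for illustration; zero is returned unchanged.
    """
    value = Decimal(significand)
    if value == 0:
        return value, exponent
    # adjusted() is the power of ten of the leading non-zero digit.
    shift = value.adjusted()
    # Move the point by `shift` places and compensate in the exponent.
    return value.scaleb(-shift), exponent + shift

print(normalize("-4.44", 77))   # already normalized, so unchanged
print(normalize("0.001", 3))    # the point "floats" three places to the right
```

The value itself never changes; only the position of the point and the exponent do.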



IEEE FLOATING-POINT FORMAT

Example table of the IEEE floating-point format



The Sign Bit

The sign bit is as simple as it gets. 0 denotes a positive number; 1 denotes a negative number. Flipping the value of this bit flips the sign of the number.

Normalized significand: 1.0 ≤ |significand| < 2.0
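The sign bit can be inspected directly. The sketch below uses Python's standard `struct` module to view a number as its 32-bit single-precision pattern; the helper names (`float_to_bits`, `bits_to_float`, `sign_bit`, `flip_sign`) are made up for this illustration.

```python
import struct

def float_to_bits(x):
    """Pack x as a big-endian single-precision float and return its 32 bits as an int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(bits):
    """Inverse of float_to_bits: reinterpret a 32-bit pattern as a float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def sign_bit(x):
    """The sign is the most significant of the 32 bits."""
    return float_to_bits(x) >> 31

def flip_sign(x):
    """Toggle only the top bit; exponent and fraction are left untouched."""
    return bits_to_float(float_to_bits(x) ^ 0x80000000)

print(sign_bit(3.0), sign_bit(-111.5))
print(flip_sign(3.0), flip_sign(-111.5))
```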



Videos about IEEE floating point; hopefully they help you understand it better.



Single-precision range

Exponents 00000000 and 11111111 are reserved.

Smallest value

- Exponent: 00000001

- Actual exponent = 1 - 127 = -126

- Fraction: 000...00, so significand = 1.0

- 1.0 x 2^-126 ≈ 1.2 x 10^-38

Largest value

- Exponent: 11111110

- Actual exponent = 254 - 127 = +127

- Fraction: 111...11, so significand ≈ 2.0

- 2.0 x 2^127 ≈ 3.4 x 10^38
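These two limits can be checked by building the bit patterns by hand. This is a small Python sketch using the standard `struct` module; `single_from_bits` is a hypothetical helper name, and the hexadecimal constants are just the sign, exponent and fraction fields written out as one 32-bit word.

```python
import struct

def single_from_bits(bits):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# sign 0, exponent 00000001, fraction all zeros
smallest = single_from_bits(0x00800000)
# sign 0, exponent 11111110, fraction all ones
largest = single_from_bits(0x7F7FFFFF)

print(smallest)  # roughly 1.2e-38
print(largest)   # roughly 3.4e+38
```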



Double-Precision Range

Exponents 00000000000 and 11111111111 are reserved.

Smallest value

- Exponent: 00000000001

- Actual exponent = 1 - 1023 = -1022

- Fraction: 000...00, so significand = 1.0

- 1.0 x 2^-1022 ≈ 2.2 x 10^-308

Largest value

- Exponent: 11111111110

- Actual exponent = 2046 - 1023 = +1023

- Fraction: 111...11, so significand ≈ 2.0

- 2.0 x 2^1023 ≈ 1.8 x 10^308
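Python's own `float` is a double on common platforms, so the double-precision limits can be compared against `sys.float_info`. A minimal sketch, where `double_from_bits` is a hypothetical helper name:

```python
import struct
import sys

def double_from_bits(bits):
    """Reinterpret a 64-bit pattern as a double-precision float."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

# sign 0, exponent 00000000001, fraction all zeros
smallest = double_from_bits(0x0010000000000000)
# sign 0, exponent 11111111110, fraction all ones
largest = double_from_bits(0x7FEFFFFFFFFFFFFF)

print(smallest, sys.float_info.min)  # both roughly 2.2e-308
print(largest, sys.float_info.max)   # both roughly 1.8e+308
```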



FLOATING-POINT PRECISION

Relative precision

- Single precision: equivalent to 23 x log10 2 ≈ 23 x 0.3 ≈ 6 decimal digits of precision

- Double precision: equivalent to 52 x log10 2 ≈ 52 x 0.3 ≈ 16 decimal digits of precision
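The digit counts above come from converting fraction bits into decimal digits. A quick Python sketch of that arithmetic follows; the variable names are chosen only for this example, and the last two lines assume that Python's `float` is an IEEE double, which holds on common platforms.

```python
import math
import sys

# Each fraction bit is worth log10(2), about 0.3 decimal digits.
single_digits = 23 * math.log10(2)
double_digits = 52 * math.log10(2)

print(round(single_digits, 2))  # a little under 7
print(round(double_digits, 2))  # a little under 16

# For a double, the gap between 1.0 and the next float is 2^-52.
print(sys.float_info.epsilon == 2.0 ** -52)
print(sys.float_info.mant_dig)  # 52 fraction bits plus the hidden leading 1
```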



A lesson on how to calculate single precision, and so on.



Floating-point example

Represent -0.75

-0.75 = (-1)^1 x 1.1 (binary) x 2^-1

S = 1

Fraction = 1000...00 (binary)

Exponent = -1 + Bias

Single: -1 + 127 = 126 = 01111110 (binary)

Double: -1 + 1023 = 1022 = 01111111110 (binary)

Single: 1 01111110 1000...00

Double: 1 01111111110 1000...00
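The encoding of -0.75 can be confirmed by asking Python to pack the number and then slicing the bits into the sign, exponent and fraction fields. This is only a sketch; the variable names are made up for the example.

```python
import struct

# Pack -0.75 as single and double precision, then read the raw bits back.
single = struct.unpack(">I", struct.pack(">f", -0.75))[0]
double = struct.unpack(">Q", struct.pack(">d", -0.75))[0]

single_bits = format(single, "032b")
double_bits = format(double, "064b")

# Field widths: 1 + 8 + 23 bits for single, 1 + 11 + 52 bits for double.
print(single_bits[0], single_bits[1:9], single_bits[9:])
print(double_bits[0], double_bits[1:12], double_bits[12:])
```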



What number is represented by the single-precision float

1 10000001 01000...00

S = 1

Fraction = 01000...00 (binary)

Exponent = 10000001 (binary) = 129

x = (-1)^1 x (1 + .01 (binary)) x 2^(129-127)

= (-1) x 1.25 x 2^2

= -5.0
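The decoding steps above translate almost line for line into code. Below is a minimal Python sketch; `decode_single` is a hypothetical helper that handles normalized numbers only (it ignores the reserved exponents), and 0xC0A00000 is the bit pattern from the example written in hexadecimal.

```python
import struct

def decode_single(bits):
    """Decode a normalized single-precision pattern by hand."""
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits after the hidden leading 1
    significand = 1 + fraction / 2 ** 23
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

pattern = 0xC0A00000  # 1 10000001 01000...00
print(decode_single(pattern))

# Cross-check against the conversion done by the struct module.
print(struct.unpack(">f", struct.pack(">I", pattern))[0])
```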



Floating-point addition

Consider a 4-digit decimal example

- 9.999 x 10^1 + 1.610 x 10^-1

1. Align decimal points: shift the number with the smaller exponent

9.999 x 10^1 + 0.016 x 10^1

2. Add significands

9.999 x 10^1 + 0.016 x 10^1 = 10.015 x 10^1

3. Normalize result & check for over/underflow

1.0015 x 10^2

4. Round and renormalize if necessary

1.002 x 10^2
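The four steps can be sketched in Python with the `decimal` module. This assumes the operands are 9.999 x 10^1 and 1.610 x 10^-1, keeps only three digits after the point when aligning (mirroring the worked example rather than real hardware, which keeps extra guard digits), and skips the over/underflow check. The function name `add_decimal_fp` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN

STEP = Decimal("0.001")  # three fraction digits = four significant digits

def add_decimal_fp(sig_a, exp_a, sig_b, exp_b):
    a, b = Decimal(sig_a), Decimal(sig_b)
    # Step 1: align, shifting the operand with the smaller exponent.
    if exp_a < exp_b:
        a, exp_a, b, exp_b = b, exp_b, a, exp_a
    b = b.scaleb(exp_b - exp_a).quantize(STEP, rounding=ROUND_HALF_EVEN)
    # Step 2: add the significands.
    total, exp = a + b, exp_a
    # Step 3: normalize to one non-zero digit before the point.
    if total != 0:
        shift = total.adjusted()
        total, exp = total.scaleb(-shift), exp + shift
    # Step 4: round to four digits, renormalizing if rounding carried over.
    total = total.quantize(STEP, rounding=ROUND_HALF_EVEN)
    if abs(total) >= 10:
        total = total.scaleb(-1).quantize(STEP, rounding=ROUND_HALF_EVEN)
        exp += 1
    return total, exp

print(add_decimal_fp("9.999", 1, "1.610", -1))  # significand 1.002, exponent 2
```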



4-digit binary example

- 1.000 (binary) x 2^-1 + -1.110 (binary) x 2^-2 (that is, 0.5 + -0.4375)

1. Align binary points: shift the number with the smaller exponent

1.000 x 2^-1 + -0.111 x 2^-1

2. Add significands

1.000 x 2^-1 + -0.111 x 2^-1 = 0.001 x 2^-1

3. Normalize result & check for over/underflow

1.000 x 2^-4, with no over/underflow

4. Round and renormalize if necessary

1.000 x 2^-4 (no change) = 0.0625
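The binary example can be followed in Python using exact fractions for the significands. This is a sketch under stated assumptions: the operands are 1.000 x 2^-1 and -1.110 x 2^-2 (binary significands), the rounding step is left out because this example needs none, and `add_binary_fp` is a hypothetical name.

```python
from fractions import Fraction

def add_binary_fp(sig_a, exp_a, sig_b, exp_b):
    """Add two binary floating-point numbers given as (significand, exponent)."""
    # Step 1: align, shifting the operand with the smaller exponent.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b = sig_b / 2 ** (exp_a - exp_b)
    # Step 2: add the significands.
    total, exp = sig_a + sig_b, exp_a
    # Step 3: normalize so that 1 <= |significand| < 2.
    while total != 0 and abs(total) >= 2:
        total, exp = total / 2, exp + 1
    while total != 0 and abs(total) < 1:
        total, exp = total * 2, exp - 1
    return total, exp

a = Fraction(0b1000, 8)    # 1.000 in binary
b = -Fraction(0b1110, 8)   # -1.110 in binary, which is -1.75
sig, exp = add_binary_fp(a, -1, b, -2)
print(sig, exp)                       # 1 and -4
print(float(sig * Fraction(2) ** exp))  # 0.0625
```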



Floating-point adder hardware



Floating-point arithmetic hardware (FP adder hardware) usually does:

- Addition, subtraction, multiplication, division, reciprocal, square root

- FP <-> integer conversion

Operations usually take several cycles.



Floating-point multiplication

Consider a 4-digit decimal example: 1.110 x 10^10 x 9.200 x 10^-5

1. Add exponents

- For biased exponents, subtract the bias from the sum

- New exponent = 10 + -5 = 5

2. Multiply significands

- 1.110 x 9.200 = 10.212, giving 10.212 x 10^5

3. Normalize result & check for over/underflow

- 1.0212 x 10^6

4. Round and renormalize if necessary

- 1.021 x 10^6

5. Determine sign of result from signs of operands

- +1.021 x 10^6
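The five multiplication steps can be sketched the same way in Python. This assumes the operands are 1.110 x 10^10 and 9.200 x 10^-5, uses unbiased exponents (so no bias has to be subtracted), and skips the over/underflow check. The function name `mul_decimal_fp` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN

STEP = Decimal("0.001")  # three fraction digits = four significant digits

def mul_decimal_fp(sig_a, exp_a, sig_b, exp_b):
    a, b = Decimal(sig_a), Decimal(sig_b)
    # Step 1: add the exponents.
    exp = exp_a + exp_b
    # Step 2: multiply the significands (magnitudes only).
    product = abs(a) * abs(b)
    # Step 3: normalize to one non-zero digit before the point.
    if product != 0:
        shift = product.adjusted()
        product, exp = product.scaleb(-shift), exp + shift
    # Step 4: round to four digits, renormalizing if rounding carried over.
    product = product.quantize(STEP, rounding=ROUND_HALF_EVEN)
    if product >= 10:
        product = product.scaleb(-1).quantize(STEP, rounding=ROUND_HALF_EVEN)
        exp += 1
    # Step 5: the result is negative only if exactly one operand is negative.
    negative = (a < 0) != (b < 0)
    return (-product if negative else product), exp

print(mul_decimal_fp("1.110", 10, "9.200", -5))  # significand 1.021, exponent 6
```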