UNIT-II CENTRAL PROCESSING UNIT INTRODUCTION ARITHMETIC LOGIC UNIT FIXED POINT ARITHMETIC FLOATING...


UNIT-II CENTRAL PROCESSING UNIT

•INTRODUCTION

•ARITHMETIC LOGIC UNIT

•FIXED POINT ARITHMETIC

•FLOATING POINT ARITHMETIC

•EXECUTION OF A COMPLETE INSTRUCTION

•BASIC CONCEPTS OF PIPELINING


The arithmetic logic unit (ALU)

The central processing unit (CPU) performs operations on data. In most architectures it has three parts: an arithmetic logic unit (ALU), a control unit, and a set of registers, which are fast storage locations (see figure).

Figure: Central processing unit (CPU)


Data Representation

• The basic forms of information handled by a computer are instructions and data

• Data can be numeric or nonnumeric

• Numeric data can be further classified as fixed point or floating point


Digit Sets and Encodings

Conventional and unconventional digit sets

• Decimal digits in [0, 9]; 4-bit BCD, 8-bit ASCII

• Hexadecimal, or hex for short: digits 0-9 & a-f

• Conventional digit set for radix r is [0, r – 1]

• Conventional binary digit set is [0, 1]


Positional Number Systems

Representations of natural numbers {0, 1, 2, 3, …}

||||| ||||| ||||| ||||| ||||| ||   sticks or unary code
27   radix-10 or decimal code
11011   radix-2 or binary code
XXVII   Roman numerals

Fixed-radix positional representation with k digits

Value of a number: $x = (x_{k-1} x_{k-2} \ldots x_1 x_0)_r = \sum_{i=0}^{k-1} x_i r^i$

For example: $27 = (11011)_{\text{two}} = (1 \times 2^4) + (1 \times 2^3) + (0 \times 2^2) + (1 \times 2^1) + (1 \times 2^0)$
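The positional-value formula can be checked with a few lines of Python. This is only a sketch; the function name `positional_value` is made up for illustration and does not come from the slides.

```python
def positional_value(digits, radix):
    """Evaluate digits [x_{k-1}, ..., x_1, x_0] as the sum of x_i * radix**i."""
    value = 0
    for d in digits:                      # most significant digit first
        if not 0 <= d < radix:
            raise ValueError(f"digit {d} is outside [0, {radix - 1}]")
        value = value * radix + d         # Horner's rule, same sum as the formula
    return value


print(positional_value([1, 1, 0, 1, 1], 2))   # 27
print(positional_value([2, 7], 10))           # 27
```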


Fixed Point Representation

• Fixed-point numbers are one way of representing real-valued data.

• Because the position of the radix point is fixed, the system is called a fixed-point number system.

• A fixed-point number has a defined number of digits before and after the radix point.


Fixed-Point Numbers

Positional representation: k whole and l fractional digits

Value of a number: $x = (x_{k-1} x_{k-2} \ldots x_1 x_0 \,.\, x_{-1} x_{-2} \ldots x_{-l})_r = \sum_{i=-l}^{k-1} x_i r^i$

For example:

$2.375 = (10.011)_{\text{two}} = (1 \times 2^1) + (0 \times 2^0) + (0 \times 2^{-1}) + (1 \times 2^{-2}) + (1 \times 2^{-3})$

Numbers in the range $[0, r^k - \mathit{ulp}]$ are representable, where $\mathit{ulp} = r^{-l}$

Fixed-point arithmetic is the same as integer arithmetic (radix point implied, not explicit)

Two's complement properties (including sign change) hold here as well:

$(01.011)_{\text{2's-compl}} = (-0 \times 2^1) + (1 \times 2^0) + (0 \times 2^{-1}) + (1 \times 2^{-2}) + (1 \times 2^{-3}) = +1.375$

$(11.011)_{\text{2's-compl}} = (-1 \times 2^1) + (1 \times 2^0) + (0 \times 2^{-1}) + (1 \times 2^{-2}) + (1 \times 2^{-3}) = -0.625$
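Because the radix point is only implied, a fixed-point value is the underlying integer divided by 2 to the power l. The sketch below reproduces the three examples above; the helper name `fixed_point_value` and its parameters are assumptions for illustration.

```python
from fractions import Fraction


def fixed_point_value(bits, frac_bits, signed=False):
    """Value of a binary fixed-point string written without its radix point.

    bits      -- e.g. "10011" for 10.011
    frac_bits -- number of fractional digits l
    signed    -- interpret the pattern as two's complement
    """
    raw = int(bits, 2)                    # treat the pattern as a plain integer
    if signed and bits[0] == "1":
        raw -= 1 << len(bits)             # the leading bit carries negative weight
    return Fraction(raw, 1 << frac_bits)  # scale by ulp = 2**(-frac_bits)


print(float(fixed_point_value("10011", 3)))               # 2.375
print(float(fixed_point_value("01011", 3, signed=True)))  # 1.375
print(float(fixed_point_value("11011", 3, signed=True)))  # -0.625
```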


Unsigned Integer

• Unsigned integers represent non-negative numbers (zero and positive values)

• The decimal range of unsigned 8-bit binary numbers is 0 to 255


Unsigned Binary Integers

Figure: Schematic representation of 4-bit code for integers in [0, 15], drawn as a wheel. Inside: natural numbers 0 to 15. Outside: 4-bit encodings 0000 to 1111. Turn x notches counterclockwise to add x; turn y notches clockwise to subtract y.


Signed Integers

• So far we have dealt with representing the natural numbers

• Signed or directed whole numbers = integers

{ . . . , –3, –2, –1, 0, 1, 2, 3, . . . }

• Signed-magnitude representation

• Signed magnitude for 8-bit numbers ranges from –127 to +127

+27 in 8-bit signed-magnitude binary code: 0 0011011
–27 in 8-bit signed-magnitude binary code: 1 0011011
–27 in 2-digit decimal code with BCD digits: 1 0010 0111
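A minimal sketch of signed-magnitude encoding, reproducing the binary ±27 examples above. The function name `to_signed_magnitude` is hypothetical.

```python
def to_signed_magnitude(value, width=8):
    """Encode an integer as a sign bit followed by (width - 1) magnitude bits."""
    magnitude = abs(value)
    if magnitude >= 1 << (width - 1):
        raise OverflowError(f"{value} does not fit in {width}-bit signed magnitude")
    sign = "1" if value < 0 else "0"
    return sign + format(magnitude, f"0{width - 1}b")


print(to_signed_magnitude(27))    # 00011011
print(to_signed_magnitude(-27))   # 10011011
```

With 8 bits the largest magnitude is 127, which is why the range is –127 to +127.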


Introduction to Fixed Point Arithmetic

• Fixed-point numbers can be used to simulate floating-point numbers

• A fixed-point processor is usually cheaper than one with floating-point hardware


Addition

  1011 (11)
+ 0011 (3)
= 1110 (14)

  010010.1   (18.5)
+   0110.110 (6.75)
= 011001.010 (25.25)


Subtraction

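Subtraction is performed by adding the two's complement of the subtrahend and discarding the carry out of the leftmost position.

  010010.1   (18.5)
–   0110.110 (6.75)

becomes

   010010.100 (18.5)
+  111001.010 (–6.75)
= 1001011.110 (11.75 once the leading carry bit is discarded)

Similarly,

  1011 (11)
– 0011 (3)

becomes

   1011 (11)
+  1101 (–3)
= 11000 (8 once the leading carry bit is discarded)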
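The subtraction examples can be replayed in Python. This is a sketch under the assumption of a fixed word width; the names `twos_complement` and `subtract` are illustrative only. Fixed-point operands are passed as scaled integers (value times 8 for three fractional bits).

```python
def twos_complement(value, width):
    """Two's complement of value within the given word width."""
    return (-value) & ((1 << width) - 1)


def subtract(a, b, width):
    """Compute a - b by adding the two's complement of b and dropping the carry."""
    mask = (1 << width) - 1
    total = a + twos_complement(b, width)   # may carry out of the top bit
    return total & mask                     # discard that carry


print(format(twos_complement(0b0011, 4), "04b"))   # 1101
print(subtract(0b1011, 0b0011, 4))                 # 8

# 18.5 - 6.75 with 6 whole and 3 fractional bits (9-bit words, scale factor 8)
print(subtract(0b010010100, 0b000110110, 9) / 8)   # 11.75
```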


A Serial Multiplier


Example of Multiplication Using Serial Multiplier


Serial Divider


Division Example Using Serial Divider


Floating-Point Numbers

Floating-point representation is like scientific notation: $20\,000\,000 = 2 \times 10^{7}$ and $0.000\,000\,007 = 7 \times 10^{-9}$

In $7 \times 10^{-9}$, 7 is the significand, 10 is the exponent base, and –9 is the exponent. Also written 7E–9.

To accommodate very large integers and very small fractions, a computer must be able to represent numbers and operate on them in such a way that the position of the binary point is variable and is automatically adjusted as computation proceeds.


Floating-point Computations

• Representation: (fraction, exponent). It has three fields: sign, significant digits, and exponent

e.g. $111101.100110 = 1.11101100110 \times 2^{5}$

• Value represented = $\pm M \times 2^{E' - 127}$

In the case of a 32-bit number:

1 bit represents the sign

8 bits represent the exponent, E' = E + 127 (bias) [excess-127 format]

23 bits represent the mantissa


Floating-point Computations

• Arithmetic operations

Addition (align the exponents, then add the fractions):

  .5372400 × 10^2
+ .1580000 × 10^–1

becomes

  .5372400 × 10^2
+ .0001580 × 10^2
= .5373980 × 10^2

Subtraction (the result must be normalized):

  .56780 × 10^5
– .56430 × 10^5
= .00350 × 10^5
= .35000 × 10^3

Multiplication:

  .5372400 × 10^2
× .1580000 × 10^–1


Floating-point Computations

• Biased Exponent

– Bias: an excess number added to the exponent so that all stored exponents become non-negative

– Advantages

• Only non-negative exponents are stored

• Simpler to compare the relative magnitude


Floating-point Computations

• Standard Operand Format of floating-point numbers

– Single-precision data type: 32 bits

• ADDFS

– Double-precision data type: 64 bits

• ADDFL

Figure: IEEE Floating-Point Operand Format


Floating-point Computations

• Significand

– A leading bit to the left of the implied binary point, together with the fraction in the f field

f field    Significand    Decimal Equivalent
100…0      1.100…0        1.50
010…0      1.010…0        1.25
000…0      1.000…0        1.00


ANSI/IEEE Standard Floating-Point Format (IEEE 754)

Figure: The two ANSI/IEEE standard floating-point formats. Each has three fields: Sign, Exponent, Significand.

Short (32-bit) format: 8 exponent bits, bias = 127, exponent range –126 to 127; 23 bits for fractional part (plus hidden 1 in integer part)

Long (64-bit) format: 11 exponent bits, bias = 1023, exponent range –1022 to 1023; 52 bits for fractional part (plus hidden 1 in integer part)

Short exponent range is –127 to 128, but the two extreme values are reserved for special operands (similarly for the long format)

The revision referred to as IEEE 754R has since been published as IEEE 754-2008


Short and Long IEEE 754 Formats: Features

Table: Some features of ANSI/IEEE standard floating-point formats

Feature               Single/Short                          Double/Long
Word width in bits    32                                    64
Significand in bits   23 + 1 hidden                         52 + 1 hidden
Significand range     $[1, 2 - 2^{-23}]$                    $[1, 2 - 2^{-52}]$
Exponent bits         8                                     11
Exponent bias         127                                   1023
Zero (±0)             e + bias = 0, f = 0                   e + bias = 0, f = 0
Denormal              e + bias = 0, f ≠ 0;                  e + bias = 0, f ≠ 0;
                      represents $\pm 0.f \times 2^{-126}$  represents $\pm 0.f \times 2^{-1022}$
Infinity (∞)          e + bias = 255, f = 0                 e + bias = 2047, f = 0
Not-a-number (NaN)    e + bias = 255, f ≠ 0                 e + bias = 2047, f ≠ 0
Ordinary number       e + bias ∈ [1, 254];                  e + bias ∈ [1, 2046];
                      e ∈ [–126, 127];                      e ∈ [–1022, 1023];
                      represents $1.f \times 2^{e}$         represents $1.f \times 2^{e}$
min                   $2^{-126} \approx 1.2 \times 10^{-38}$   $2^{-1022} \approx 2.2 \times 10^{-308}$
max                   $\approx 2^{128} \approx 3.4 \times 10^{38}$   $\approx 2^{1024} \approx 1.8 \times 10^{308}$


Floating Point Arithmetic

• Floating point arithmetic differs from integer arithmetic in that exponents must be handled as well as the magnitudes of the operands.

• The exponents of the operands must be made equal for addition and subtraction. The fractions are then added or subtracted as appropriate, and the result is normalized.

• Eg: Perform the floating point operation $(.101 \times 2^{3} + .111 \times 2^{4})_2$

• Start by adjusting the smaller exponent to be equal to the larger exponent, and adjust the fraction accordingly. Thus we have $.101 \times 2^{3} = .010 \times 2^{4}$, losing $.001 \times 2^{3}$ of precision in the process.

• The resulting sum is $(.010 + .111) \times 2^{4} = 1.001 \times 2^{4} = .1001 \times 2^{5}$, and rounding to three significant digits gives $.100 \times 2^{5}$, so we have lost another $0.001 \times 2^{4}$ in the rounding process.
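The align, add, normalize sequence can be sketched for this toy format. The sketch assumes positive operands with value 0.f × 2^e, a 3-bit fraction held as an integer, and truncation of any bits shifted out (which coincides with the rounding result in this particular example). The name `fp_add` is hypothetical.

```python
def fp_add(f1, e1, f2, e2, frac_bits=3):
    """Add two positive numbers of the form 0.f * 2**e (f is a frac_bits-bit integer)."""
    # Step 1: make sure operand 1 has the larger exponent.
    if e1 < e2:
        f1, e1, f2, e2 = f2, e2, f1, e1
    # Step 2: align the smaller operand; bits shifted out are lost.
    f2 >>= (e1 - e2)
    # Step 3: add the fractions.
    total, exponent = f1 + f2, e1
    # Step 4: normalize if the sum overflowed the fraction field.
    if total >> frac_bits:
        total >>= 1          # the dropped bit is the second precision loss
        exponent += 1
    return total, exponent


fraction, exponent = fp_add(0b101, 3, 0b111, 4)
print(format(fraction, "03b"), exponent)   # 100 5
```

The exact sum is 5 + 14 = 19, while .100 × 2^5 is 16; the difference of 3 is the two losses noted above (1 from alignment, 2 from the final truncation).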


Floating Point Multiplication/Division

• Floating point multiplication/division are performed in a manner similar to floating point addition/subtraction, except that the sign, exponent, and fraction of the result can be computed separately.

• Like/unlike signs produce positive/negative results, respectively. Exponent of result is obtained by adding exponents for multiplication, or by subtracting exponents for division. Fractions are multiplied or divided according to the operation, and then normalized.

• Ex: Perform the floating point operation $(+.110 \times 2^{5}) / (+.100 \times 2^{4})_2$

• The source operand signs are the same, which means that the result will have a positive sign. We subtract exponents for division, and so the exponent of the result is 5 – 4 = 1.

• We divide fractions, producing the result: .110/.100 = 1.10.

• Putting it all together, the result of dividing $(+.110 \times 2^{5})$ by $(+.100 \times 2^{4})$ produces $(+1.10 \times 2^{1})$. After normalization, the final result is $(+.110 \times 2^{2})$.


Floating point Arithmetic

• Represent a binary number in floating point format

• $10011101011.001 = 1.0011101011001 \times 2^{10}$

• In single precision format: sign = 0, exponent = e + 127 = 10 + 127 = 137 = 10001001

• 0 1000 1001 0011101011001…0
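The encoding can be cross-checked with Python's standard `struct` module, which packs a value as an IEEE 754 single. This is just a sketch; the helper name `float32_fields` is invented here.

```python
import struct


def float32_fields(x):
    """Split the IEEE 754 single-precision encoding of x into (sign, exponent, fraction)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))   # raw 32-bit pattern
    sign = word >> 31
    exponent = (word >> 23) & 0xFF        # biased exponent E' = E + 127
    fraction = word & 0x7FFFFF            # 23 fraction bits, hidden 1 not stored
    return sign, exponent, fraction


value = 0b10011101011001 / 8              # binary 10011101011.001 = 1259.125
sign, exponent, fraction = float32_fields(value)
print(sign, format(exponent, "08b"), format(fraction, "023b"))
# 0 10001001 00111010110010000000000
```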


Floating Point Addition

• A = 0 1000 1001 0010000…0

• B = 0 1000 0101 0100000…0

• Exponent field of A = 1000 1001 = 137

• Actual exponent of A = 137 – 127 = 10

• Exponent field of B = 1000 0101 = 133

• Actual exponent of B = 133 – 127 = 6

• Number B has the smaller exponent, with a difference of 4. Hence its significand, including the hidden leading 1, is shifted right by 4 bits

• Significand of A = 1.00100000…0

• Shifted significand of B = 0.00010100…0

• Add the significands

• Sum = 1.00110100…0

• Result = 0 1000 1001 00110100…0 (that is, 1152 + 80 = 1232)


Adders and Simple ALUs

Addition is the most important arithmetic operation in computers:

– Even the simplest computers must have an adder

– An adder, plus a little extra logic, forms a simple ALU

• Simple Adders

• Carry Lookahead Adder

• Counting and Incrementing

• Design of Fast Adders

• Logic and Shift Operations

• Multifunction ALUs


Simple Adders

Figure: Binary half-adder (HA) and full-adder (FA).

Half-adder (HA)

Inputs    Outputs
x  y      c  s
0  0      0  0
0  1      0  1
1  0      0  1
1  1      1  0

Full-adder (FA)

Inputs        Outputs
x  y  c_in    c_out  s
0  0  0       0      0
0  0  1       0      1
0  1  0       0      1
0  1  1       1      0
1  0  0       0      1
1  0  1       1      0
1  1  0       1      0
1  1  1       1      1
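The two truth tables correspond to simple Boolean expressions. The sketch below builds the full adder from two half-adders plus an OR gate, which is one common construction; the function names are illustrative.

```python
def half_adder(x, y):
    """Return (carry, sum) for two input bits."""
    return x & y, x ^ y


def full_adder(x, y, c_in):
    """Return (c_out, s) using two half-adders and an OR gate."""
    c1, s1 = half_adder(x, y)
    c2, s = half_adder(s1, c_in)
    return c1 | c2, s


# Print the full-adder truth table: x y c_in -> c_out s
for x in (0, 1):
    for y in (0, 1):
        for c_in in (0, 1):
            print(x, y, c_in, *full_adder(x, y, c_in))
```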


Full-Adder Implementations

Figure: Full adder implemented with two half-adders, by means of two 4-input multiplexers, and as a two-level gate network. Panels: (a) FA built of two HAs; (b) CMOS mux-based FA; (c) Two-level AND-OR FA. Each panel has inputs x, y, c_in and outputs s, c_out.


Ripple-Carry Adder: Slow But Simple

Figure: Ripple-carry binary adder with 32-bit inputs and output. A chain of full adders (FA) for bit positions 0 to 31, with c_in = c0 entering at position 0 and c_out = c32 leaving position 31.

The critical path is the carry propagation to the MSB position. Its delay is linearly proportional to the length n of the adder.
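A bit-level sketch of the ripple-carry adder: each loop iteration plays the role of one full adder, and the carry is handed from one position to the next, which is why the delay grows linearly with the width. The function name and default width are assumptions for illustration.

```python
def ripple_carry_add(x, y, width=32, c_in=0):
    """Add two width-bit integers one bit at a time; return (sum, carry_out)."""
    carry = c_in
    result = 0
    for i in range(width):                  # bit 0 first; carry ripples upward
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        s = xi ^ yi ^ carry                 # sum bit of full adder i
        carry = (xi & yi) | (carry & (xi ^ yi))   # carry out of full adder i
        result |= s << i
    return result, carry


print(ripple_carry_add(0b1011, 0b0011, width=4))   # (14, 0)
print(ripple_carry_add(0xFFFFFFFF, 1))             # (0, 1)
```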


Carry Look ahead adder

The main part of an adder is the carry network. The rest is just a set of gates to produce the g (carry generate function) and p (carry propagate function) signals and the sum bits.

Figure: Carry network. For each position i, inputs xi and yi produce gi and pi; the carry network takes all gi, pi and c0 and produces the carries c1 . . . ck; the sum bit si is formed from pi and ci.

gi  pi   Carry is:
0   0    annihilated or killed
0   1    propagated
1   0    generated
1   1    (impossible)

gi = xi yi        pi = xi ⊕ yi

The carry lookahead adder generates the carry for every position in parallel, using an additional logic circuit referred to as the carry lookahead block.


Carry-Lookahead Addition

Gi = aibi and Pi = ai + bi

c0 = 0
c1 = G0
c2 = G1 + P1G0
c3 = G2 + P2G1 + P2P1G0
c4 = G3 + P3G2 + P3P2G1 + P3P2P1G0

• Carries are represented in terms of Gi (generate) and Pi (propagate) expressions.
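The expanded carry expressions can be evaluated directly, without waiting for a carry to ripple. The sketch below uses Gi = ai bi and Pi = ai + bi and computes every carry as a flat sum of products (with the c0 term included for generality); the function name `lookahead_carries` is hypothetical.

```python
def lookahead_carries(a, b, width=4, c0=0):
    """Return [c0, c1, ..., c_width] computed from generate/propagate signals."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]       # Gi = ai AND bi
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]     # Pi = ai OR bi
    carries = [c0]
    for i in range(width):
        c_next = 0
        prod = 1                       # running product Pi P(i-1) ... P(j+1)
        for j in range(i, -1, -1):
            c_next |= prod & g[j]      # term Pi ... P(j+1) Gj
            prod &= p[j]
        c_next |= prod & c0            # term Pi ... P0 c0
        carries.append(c_next)
    return carries


print(lookahead_carries(0b1011, 0b0011))   # [0, 1, 1, 0, 0]
```

In hardware each of these terms is a separate AND gate feeding one OR gate per carry, so all carries are available after a fixed number of gate delays; the nested loop here only enumerates those terms.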


Ripple-Carry Adder Revisited

Figure: The carry propagation network of a ripple-carry adder. Each stage i takes gi, pi and ci and produces ci+1, from c0 at the right to ck at the left.

The carry recurrence: $c_{i+1} = g_i + p_i c_i$

Latency of k-bit adder is roughly 2k gate delays:

1 gate delay for production of p and g signals, plus 2(k – 1) gate delays for carry propagation, plus 1 XOR gate delay for generation of the sum bits


The Complete Design of a Carry Look Ahead Adder

Figure: k-bit carry-lookahead adder. The inputs xi and yi of each position produce gi = xi yi and pi = xi ⊕ yi; the carry network takes all gi, pi and c0 and produces the carries c1 . . . ck; each sum bit si is formed from pi and ci.


Carry Lookahead Adder

• Maximum gate delay for the carry generation is only 3. The full adders introduce two more gate delays. Worst case path is 5 gate delays.


16-bit Group Carry Lookahead Adder

• A 16-bit GCLA is composed of four 4-bit CLAs, with additional logic that generates the carries between the four-bit groups.

GG0 = G3 + P3G2 + P3P2G1 + P3P2P1G0

GP0 = P3P2P1P0

c4 = GG0 + GP0c0

c8 = GG1 + GP1c4 = GG1 + GP1GG0 + GP1GP0c0

c12 = GG2 + GP2c8 = GG2 + GP2GG1 + GP2GP1GG0 + GP2GP1GP0c0

c16 = GG3 + GP3c12 = GG3 + GP3GG2 + GP3GP2GG1 + GP3GP2GP1GG0 + GP3GP2GP1GP0c0


16-Bit Group Carry Lookahead Adder

• Each CLA has a longest path of 5 gate delays.

• In the group carry lookahead logic (GCLL) section, GG and GP signals are generated in 3 gate delays; carry signals are generated in 2 more gate delays, resulting in 5 gate delays to generate the carry out of each GCLA group and 10 gate delays on the worst case path (which is s15, not c16).


The Booth Algorithm

• Booth multiplication reduces the number of additions for intermediate results, but can sometimes make it worse as we will see.

• Positive and negative numbers treated alike.


A Worst Case Booth Example

• A worst case situation in which the simple Booth algorithm requires twice as many additions as serial multiplication.
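Booth recoding replaces each multiplier bit b_i with the digit b_(i-1) – b_i, taking b_(-1) = 0, so every digit is –1, 0 or +1 and each nonzero digit costs one addition or subtraction. The sketch below, with hypothetical function names, shows the alternating pattern 01010101 as a worst case: 4 one-bits become 8 nonzero Booth digits.

```python
def booth_recode(multiplier, width):
    """Return the Booth digits (LSB first) of a width-bit two's complement pattern."""
    digits = []
    prev = 0                               # implied bit to the right of the LSB
    for i in range(width):
        bit = (multiplier >> i) & 1
        digits.append(prev - bit)          # each digit is -1, 0 or +1
        prev = bit
    return digits


def booth_multiply(multiplicand, multiplier, width):
    """Multiply using the recoded digits; multiplier is a width-bit two's complement pattern."""
    product = 0
    for i, d in enumerate(booth_recode(multiplier, width)):
        product += d * (multiplicand << i)   # add, subtract, or skip
    return product


worst = 0b01010101
digits = booth_recode(worst, 8)
print(digits)                                   # [-1, 1, -1, 1, -1, 1, -1, 1]
print(sum(d != 0 for d in digits), bin(worst).count("1"))   # 8 4
print(booth_multiply(5, 0b1101, 4))             # -15, since 1101 is -3 in 4 bits
```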


Bit-Pair Recoding (Modified Booth Algorithm)


Coding of Bit Pairs


Multifunction ALUs

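Figure: General structure of a simple arithmetic/logic unit. Operand 1 and Operand 2 feed a logic unit (Logic fn: AND, OR, . . .) and an arithmetic unit (Arith fn: add, sub, . . .); a two-way selector controlled by "Select fn type (logic or arith)" picks one of the two outputs as the Result.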


An ALU for MiniMIPS

Figure: A multifunction ALU with 8 control signals (2 for function class, 1 arithmetic, 3 shift, 2 logic) specifying the operation. Blocks shown: a 32-bit adder (inputs x and y, carries c0, c31 and c32, AddSub control), a shifter (shift amount taken from a 5-bit constant amount or the 5 LSBs of a variable amount, chosen by ConstVar), a logic unit, a 4-way selector driven by the function class that produces the 32-bit output s, and the Ovfl and Zero outputs (Zero from a 32-input NOR). A shorthand ALU symbol has inputs x, y and Func control, and outputs s, Ovfl and Zero.

Control signal encodings:

Function class: 00 Shift, 01 Set less, 10 Arithmetic, 11 Logic

Shift function: 00 No shift, 01 Logical left, 10 Logical right, 11 Arith right

Logic function: 00 AND, 01 OR, 10 XOR, 11 NOR

AddSub: 0 or 1


Machine Cycle

The CPU uses repeating machine cycles to execute instructions in the program, one by one, from beginning to end. A simplified cycle can consist of three phases: fetch, decode and execute.

Figure: The steps of a cycle


Load Fetch/Execute Cycle

1. PC -> MAR Transfer the address from the PC to the MAR

2. MDR -> IR Transfer the instruction to the IR

3. IR(address) -> MAR Address portion of the instruction loaded in MAR

4. MDR -> A Actual data copied into the accumulator

5. PC + 1 -> PC Program Counter incremented


Store Fetch/Execute Cycle

1. PC -> MAR Transfer the address from the PC to the MAR

2. MDR -> IR Transfer the instruction to the IR

3. IR(address) -> MAR Address portion of the instruction loaded in MAR

4. A -> MDR* Accumulator copies data into MDR

5. PC + 1 -> PC Program Counter incremented

*Notice how Step #4 differs for LOAD and STORE


ADD Fetch/Execute Cycle

1. PC -> MAR Transfer the address from the PC to the MAR

2. MDR -> IR Transfer the instruction to the IR

3. IR(address) -> MAR Address portion of the instruction loaded in MAR

4. A + MDR -> A Contents of MDR added to contents of accumulator

5. PC + 1 -> PC Program Counter incremented
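The LOAD, STORE and ADD cycles can be traced with a toy accumulator machine. This is a sketch only: the memory layout, the tuple instruction format and the function name `run` are all assumptions made for illustration, and the memory reads and writes that the step lists leave implicit are written out.

```python
def run(memory, steps):
    """Execute `steps` instructions of a toy accumulator machine, starting at address 0.

    memory maps an address to either an instruction tuple (opcode, address)
    or a plain integer data word. Returns the final accumulator value.
    """
    pc, acc = 0, 0
    for _ in range(steps):
        mar = pc                    # 1. PC -> MAR
        mdr = memory[mar]           #    memory read places the instruction in MDR
        ir = mdr                    # 2. MDR -> IR
        opcode, address = ir
        mar = address               # 3. IR(address) -> MAR
        if opcode == "LOAD":
            mdr = memory[mar]
            acc = mdr               # 4. MDR -> A
        elif opcode == "STORE":
            mdr = acc               # 4. A -> MDR
            memory[mar] = mdr       #    memory write completes the store
        elif opcode == "ADD":
            mdr = memory[mar]
            acc = acc + mdr         # 4. A + MDR -> A
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        pc += 1                     # 5. PC + 1 -> PC
    return acc


program = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12), 10: 18, 11: 7, 12: 0}
print(run(program, 3), program[12])   # 25 25
```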


The Fetch/Execute Cycle

• A five-step cycle:

1. Instruction Fetch (IF)

2. Instruction Decode (ID)

3. Data Fetch (DF)

4. Instruction Execution (EX)

5. Result Return (RR)


Instruction Interpretation

• Process of executing a program

– The computer is interpreting our commands, but in its own language


• Execution begins by moving the instruction at the address given by the PC from memory to the control unit


Instruction Interpretation (cont'd)

• Bits of the instruction are placed into the decoder circuit of the CU

• Once an instruction is fetched, the Program Counter (PC) can be readied for fetching the next instruction

• The PC is "incremented"


Instruction Interpretation (cont'd)

• In the Instruction Decode step, the ALU is set up for the indicated operation

• The Decoder will find the memory address of the instruction's data (source operands)

– Most instructions operate on 2 data values stored in memory (like ADD), so most instructions have addresses for two source operands

– These addresses are passed to the circuit that fetches the values from memory during the next step, Data Fetch

• The Decoder finds the destination address for the Result Return step, and places it in the RR circuit

• The Decoder determines what operation the ALU will perform, and sets it up appropriately


Instruction Interpretation (cont'd)

• Instruction Execution: The actual computation is performed.

• For the ADD instruction, the addition circuit adds the two source operands together to produce their sum


Instruction Interpretation (cont'd)

• Result Return: result of execution is returned to the memory location specified by the destination address.

• Once the result is returned, the cycle begins again (This is a Loop).


Execution of complete Instructions

• Consider the instruction Add (R3), R1 which adds the content of memory location pointed to by R3 to register R1.

• Executing this instruction requires the following actions

• Fetch the instruction

• Fetch the first operand

• Perform the addition

• Load the result into R1


FETCH OPERATION

• The content of the PC is loaded into MAR and a Read request is sent to the memory.

• The Select signal is set to select the constant 4 at the MUX. This constant is added to the operand at input B, which is the content of the PC, and the result is stored in register Z.

• The updated value is moved from register Z back into the PC.

• The word fetched from memory is loaded into IR.


DECODE and EXECUTING PHASE

• Interprets the content of IR

• Enables the control circuitry to activate the control signals

• The content of register R3 is transferred to MAR and a memory Read is initiated

• The content of R1 is transferred to register Y to prepare for the addition operation

• The memory operand becomes available in register MDR and the addition is performed

• The sum is stored in register Z, then transferred to R1


What Is A Pipeline?

• Pipelining is used by virtually all modern microprocessors to enhance performance by overlapping the execution of instructions.

• A common analogy for a pipeline is a factory assembly line. Assume that there are three stages:

1. Welding

2. Painting

3. Polishing

• For simplicity, assume that each task takes one hour.


What Is A Pipeline?

• If a single person were to work on the product it would take three hours to produce one product.

• If we had three people, one person could work on each stage, upon completing their stage they could pass their product on to the next person (since each stage takes one hour there will be no waiting).

• We could then produce one product per hour assuming the assembly line has been filled.


Characteristics Of Pipelining

• If the stages of a pipeline are not balanced and one stage is slower than another, the entire throughput of the pipeline is affected.

• In terms of a pipeline within a CPU, each instruction is broken up into different stages. Ideally, if each stage is balanced (all stages are ready to start at the same time and take an equal amount of time to execute), the time taken per instruction (pipelined) is defined as:

Time per instruction (pipelined) = Time per instruction (unpipelined) / Number of stages


Characteristics Of Pipelining

• The previous expression is ideal. We will see later that there are many ways in which a pipeline cannot function in a perfectly balanced fashion.

• In terms of a CPU, the implementation of pipelining has the effect of reducing the average instruction time, therefore reducing the average CPI.

• EX: If each instruction in a microprocessor takes 5 clock cycles (unpipelined) and we have a 4 stage pipeline, the ideal average CPI with the pipeline will be 1.25.
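The ideal figure in the example is simply the unpipelined cost divided by the number of stages. A one-function sketch, with a made-up name, assuming perfectly balanced stages and no stalls:

```python
def ideal_pipelined_cpi(unpipelined_cycles, stages):
    """Ideal average cycles per instruction for a perfectly balanced pipeline."""
    if stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    return unpipelined_cycles / stages


print(ideal_pipelined_cpi(5, 4))   # 1.25
```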


Instruction Pipelining

• Break the instruction cycle into stages

• Simultaneously work on each stage


Two Stage Instruction Pipeline

Break instruction cycle into two stages:

• FI: Fetch instruction

• EI: Execute instruction

Clock cycle        1   2   3   4   5   6   7
Instruction i      FI  EI
Instruction i+1        FI  EI
Instruction i+2            FI  EI
Instruction i+3                FI  EI
Instruction i+4                    FI  EI


Two Stage Instruction Pipeline

• The two-stage pipeline speeds up execution, but the rate is not doubled:

– Fetch is usually shorter than execution

– If execution involves memory accessing, the fetch stage has to wait

– Any jump or branch means that prefetched instructions are not the required instructions

• Add more stages to improve performance


Six Stage Pipelining

• Fetch instruction (FI)

• Decode instruction (DI)

• Calculate operands (CO)

• Fetch operands (FO)

• Execute instructions (EI)

• Write operand (WO)


MIPS Pipeline

• Pipeline stages:

– IF

– ID (decode + Reg fetch)

– EX

– MEM

– Write back (WB)

On each clock cycle another instruction is fetched and begins its five-step execution. If an instruction is started every clock cycle, the performance will ideally be up to five times that of a machine that is not pipelined.

Clock number       1   2   3   4    5    6    7    8    9
Instruction i      IF  ID  EX  MEM  WB
Instruction i+1        IF  ID  EX   MEM  WB
Instruction i+2            IF  ID   EX   MEM  WB
Instruction i+3                IF   ID   EX   MEM  WB
Instruction i+4                     IF   ID   EX   MEM  WB
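The timing chart follows one rule: in an ideal pipeline with no stalls, instruction i + n enters stage number s (counting from 0) in clock cycle n + s + 1. The sketch below generates the same chart; the names `pipeline_schedule` and `total_cycles` are illustrative, and hazards are ignored.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(n_instructions, stages=STAGES):
    """For each instruction, map clock cycle -> stage in an ideal (stall-free) pipeline."""
    return [
        {n + s + 1: name for s, name in enumerate(stages)}
        for n in range(n_instructions)
    ]


def total_cycles(n_instructions, n_stages):
    """Cycles needed to finish n instructions: fill the pipeline, then one per cycle."""
    return n_stages + n_instructions - 1


for n, row in enumerate(pipeline_schedule(5)):
    cells = [row.get(cycle, "").ljust(4) for cycle in range(1, 10)]
    print(f"i+{n}  " + "".join(cells))

print(total_cycles(5, 5))   # 9
```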


Looking At The Big Picture

• Overall, the most time that a non-pipelined instruction can take is 5 clock cycles. Below is a summary:

• Branch - 2 clock cycles

• Store - 4 clock cycles

• Other - 5 clock cycles

• EX: Assuming branch instructions account for 12% of all instructions and stores account for 10%, what is the average CPI of a non-pipelined CPU?

ANS: 0.12*2+0.10*4+0.78*5 = 4.54
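The answer is a weighted average of the cycle counts over the instruction mix. A small sketch, with a hypothetical function name, that repeats the calculation:

```python
def average_cpi(mix):
    """mix is a list of (fraction_of_instructions, clock_cycles) pairs."""
    if abs(sum(fraction for fraction, _ in mix) - 1.0) > 1e-9:
        raise ValueError("instruction fractions must add up to 1")
    return sum(fraction * cycles for fraction, cycles in mix)


# 12% branches (2 cycles), 10% stores (4 cycles), remaining 78% other (5 cycles)
print(round(average_cpi([(0.12, 2), (0.10, 4), (0.78, 5)]), 2))   # 4.54
```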


The Classical RISC 5 Stage Pipeline

• In an ideal case to implement a pipeline we just need to start a new instruction at each clock cycle.

• Unfortunately there are many problems with trying to implement this. Obviously we cannot have the ALU performing an ADD operation and a MULTIPLY at the same time. But if we look at each stage of instruction execution as being independent, we can see how instructions can be “overlapped”.
