Arithmetic.1 2/15 Computer Arithmetic ALU Performance is critical ( App. C5, C6 4 th ed.)
Computer Arithmetic
ALU performance is critical (App. C5, C6, 4th ed.)
Requirements: CPU needs a 32-bit ALU
(1) Functional Specification
inputs: 2 x 32-bit operands A, B; 4-bit mode
outputs: 32-bit result S, 1-bit carry, 1-bit overflow
operations: add, addu, sub, subu, and, or, xor, nor, slt, sltu
(2) Block Diagram (schematic symbol/ Verilog description)
[Figure: ALU block symbol with 32-bit inputs A and B, 4-bit mode input m, 32-bit output S, and 1-bit outputs c and ovf]
1-bit adder Review (Appendix B.5, B.6)
A B C Co Sum
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1
Sum = a!bc! + ab!c! + a!b!c + abc = a XOR b XOR c
CarryOut = a!bc + ab!c + abc! + abc
(x! denotes NOT x)
[Figure: 1-bit full adder symbol (inputs a, b, CarryIn; outputs Sum, CarryOut) and the gate-level sum circuit (inputs A, B, Cin; output sum)]
2 units of delay from A/B to sum
1 unit of delay from Cin to sum
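The truth table and the two sum-of-products equations on this slide can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function names `full_adder`, `sum_sop`, and `carry_sop` are made up for illustration, and the majority form of the carry is a standard simplification rather than something the slide states.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (carry_out, sum) for three input bits."""
    s = a ^ b ^ c                      # Sum = a XOR b XOR c
    co = (a & b) | (a & c) | (b & c)   # majority form of the carry-out
    return co, s


def sum_sop(a: int, b: int, c: int) -> int:
    """Sum as written on the slide: a!bc! + ab!c! + a!b!c + abc."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (na & b & nc) | (a & nb & nc) | (na & nb & c) | (a & b & c)


def carry_sop(a: int, b: int, c: int) -> int:
    """CarryOut as written on the slide: a!bc + ab!c + abc! + abc."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (na & b & c) | (a & nb & c) | (a & b & nc) | (a & b & c)


# Walk all eight input rows and compare both formulations.
for a, b, c in product((0, 1), repeat=3):
    co, s = full_adder(a, b, c)
    assert (co, s) == (carry_sop(a, b, c), sum_sop(a, b, c))
    assert 2 * co + s == a + b + c     # (Co, Sum) is the 2-bit count of ones
```

The last assertion is why the cell is also called a 3->2 element: three input bits are compressed into a 2-bit count.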
Carry Out circuit
[Figure: carry-out circuit with inputs a, b, CarryIn (Cin) and output CarryOut (Cout)]
2 units of delay from Cin to Cout
1-bit ALU cell: ADD, AND, OR
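[Figure: 1-bit ALU cell; inputs A and B feed a 1-bit full adder (with CarryIn and CarryOut), an AND gate, and an OR gate; a mux controlled by S-select picks add, and, or or as Result]
Full Adder (3->2 element)
A B C Co O
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1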
Additional operations: Subtract, AND, OR
• A - B = A + (-B) = A + (NOT B) + 1
– form the two's complement by inverting and adding one
[Figure: the same 1-bit ALU cell (full adder, AND, OR, mux with S-select), with an added invert control that complements B before it reaches the adder]
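The invert-and-add-one rule can be tried out on 32-bit values. This is a small Python sketch under the assumption of a 32-bit datapath; `sub_via_adder` and `to_signed` are hypothetical helper names, not names from the slides.

```python
MASK32 = 0xFFFFFFFF


def sub_via_adder(a: int, b: int) -> int:
    """32-bit A - B computed as A + (NOT B) + 1, the way the ALU cell does it."""
    inverted_b = ~b & MASK32               # invert control: complement every bit of B
    return (a + inverted_b + 1) & MASK32   # the +1 enters through the adder's carry-in


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x
```

For example, `to_signed(sub_via_adder(3, 7))` evaluates to -4, and no separate subtractor circuit is involved.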
1-bit ALU: AND, OR, a+b, a+b!
[Figure: (a) 1-bit ALU with inputs a, b, Less, and CarryIn, controls Binvert and Operation (mux inputs 0 to 3), and outputs Result and CarryOut; (b) the most significant bit ALU, which replaces CarryOut with Set and adds overflow detection producing Overflow]
ALU Delays
Result = 1 gate delay
From a to result = 2
From b to Result = 2 (ignoring the b inverter)
Final 32-bit ALU, including zero detect
[Figure: 32 1-bit ALUs (ALU0 to ALU31) with inputs a0/b0 to a31/b31, chained through CarryIn and CarryOut; Bnegate and Operation control the cells; Set from ALU31 feeds the Less input of ALU0 and the other Less inputs are 0; outputs are Result0 to Result31, Zero, and Overflow]
Behavioral Representation: Verilog, RTL (FYI)

    module ALU(A, B, m, S, c, ovf);
      input  [0:31] A, B;   // 32-bit operands
      input  [0:3]  m;      // 4-bit mode
      output [0:31] S;      // 32-bit result
      output c, ovf;        // carry and overflow flags

      reg [0:31] S;
      reg c, ovf;

      always @(A, B, m) begin
        case (m)
          0: S = A + B;
          . . .
        endcase
      end
    endmodule

• Code written, simulated & verified
• translated into hardware (mapped)
• How complex digital design is done
Overflow ?? - 4-bit example
• Examples: 7 + 3 = 10 but ...
• - 4 - 5 = - 9 but ...
| Decimal | Binary | Decimal | 2's Complement |
|---|---|---|---|
| 0 | 0000 | 0 | 0000 |
| 1 | 0001 | -1 | 1111 |
| 2 | 0010 | -2 | 1110 |
| 3 | 0011 | -3 | 1101 |
| 4 | 0100 | -4 | 1100 |
| 5 | 0101 | -5 | 1011 |
| 6 | 0110 | -6 | 1010 |
| 7 | 0111 | -7 | 1001 |
|   |      | -8 | 1000 |

      0111    (7)
    + 0011    (3)
      1010    (-6)

      1100    (-4)
    + 1011    (-5)
      0111    (7)
Overflow Detection
• Overflow: arithmetic result too large (or too small) to represent properly
– Example: -8 ≤ 4-bit binary number ≤ 7
• When adding operands with different signs, overflow cannot occur!
• Overflow occurs when adding:
– 2 positive numbers and sum is negative
– 2 negative numbers and the sum is positive
• On your own: Prove you can detect overflow by:
– Carry into MSB ≠ Carry out of MSB

    7 + 3:       0111 + 0011 = 1010 (-6); carry into MSB = 1, carry out of MSB = 0
    -4 + (-5):   1100 + 1011 = 0111 (7);  carry into MSB = 0, carry out of MSB = 1
Overflow Detection Logic
• Carry into MSB ≠ Carry out of MSB
– For an N-bit ALU: Overflow = CarryIn[N - 1] XOR CarryOut[N - 1]
[Figure: four 1-bit ALUs (A0/B0 to A3/B3, Result0 to Result3) chained through their carries; CarryIn3 and CarryOut3 feed an XOR gate that produces Overflow]
X Y X XOR Y
0 0 0
0 1 1
1 0 1
1 1 0
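The detection rule Overflow = CarryIn[N - 1] XOR CarryOut[N - 1] can be exercised on the 4-bit examples from the earlier slides. The sketch below is illustrative Python, with `add_with_overflow` as an assumed helper name; it computes the two MSB carries arithmetically rather than gate by gate.

```python
def add_with_overflow(a: int, b: int, n: int = 4) -> tuple[int, int, int]:
    """Add two n-bit two's-complement patterns.

    Returns (result, carry_out_of_msb, overflow).
    """
    mask = (1 << n) - 1
    low_mask = mask >> 1                                  # every bit below the MSB
    # Carry into the MSB: add only the lower n-1 bits and look at bit n-1.
    carry_in_msb = ((a & low_mask) + (b & low_mask)) >> (n - 1)
    total = (a & mask) + (b & mask)
    carry_out_msb = total >> n                            # carry out of the MSB
    return total & mask, carry_out_msb, carry_in_msb ^ carry_out_msb
```

With these definitions, `add_with_overflow(0b0111, 0b0011)` returns `(0b1010, 0, 1)`, matching the 7 + 3 example, while 7 + (-1) with operands of different signs reports no overflow.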
MIPS ALU requirements
• Add, AddU, Sub, SubU, AddI, AddIU
– => 2's complement adder/subtractor with overflow detection
• And, Or, AndI, OrI, Xor, XorI, Nor
– => logical AND, OR, XOR, NOR
• SLTI, SLTIU (set less than)
– => 2's complement adder with inverter, check sign bit of result
• ALU must support these ops
MIPS arithmetic instruction format - Review
• Signed arithmetic generates overflow, no carry
Bit positions: 31 25 20 15 5 0
R-type: op | Rs | Rt | Rd | funct
I-type: op | Rs | Rt | Immed 16
(op and funct values in the tables below are octal)
Type op funct
ADDI 10 xx
ADDIU 11 xx
SLTI 12 xx
SLTIU 13 xx
ANDI 14 xx
ORI 15 xx
XORI 16 xx
LUI 17 xx
Type op funct
ADD 00 40
ADDU 00 41
SUB 00 42
SUBU 00 43
AND 00 44
OR 00 45
XOR 00 46
NOR 00 47
Type op funct
00 50
00 51
SLT 00 52
SLTU 00 53
Ripple Adder Performance?
• Critical path of an n-bit ripple-carry adder is n*CP
[Figure: four 1-bit ALUs (A0/B0 to A3/B3, Result0 to Result3) chained so that each CarryOut feeds the next CarryIn]
Very slow: must improve
Assume t = carry delay / bit
32-bit ALU needs 32 * t units of delay
64-bit ALU needs 64 * t units of delay
[Figure: 1-bit sum circuit (A, B, Cin to sum) and carry-out circuit (a, b, CarryIn to CarryOut)]
2 units of delay from A/B to sum
1 unit of delay from Cin to sum
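The serial dependence behind the n * t figure shows up directly when the adder is written as a loop. This Python sketch is only a behavioral model; `ripple_carry_add` and `ripple_delay` are hypothetical names, and `t` is the per-bit carry delay assumed on the slide.

```python
def ripple_carry_add(a: int, b: int, n: int = 32, carry_in: int = 0) -> tuple[int, int]:
    """Add two n-bit values bit by bit; returns (sum, carry_out)."""
    result = 0
    carry = carry_in
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s = ai ^ bi ^ carry
        # Bit i cannot finish until the carry from bit i-1 is known.
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    return result, carry


def ripple_delay(n: int, t: int = 1) -> int:
    """Worst-case carry delay of an n-bit ripple adder: n * t."""
    return n * t
```

Each loop iteration consumes the carry produced by the previous one, which is why doubling the width from 32 to 64 bits doubles the worst-case delay.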
Fast Addition : Carry Lookahead
• Carry inputs can be precomputed by logic
p0 = a0 + b0, g0 = a0 b0
c1 = g0 + c0 p0 = a0 b0 + c0 (a0 + b0)
p1 = a1 + b1, g1 = a1 b1
c2 = g1 + p1 c1 = a1 b1 + c1 (a1 + b1) = g1 + p1 g0 + p1 p0 c0
c3 = g2 + p2 g1 + p2 p1 g0 + p2 p1 p0 c0
c4 = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0 + p3 p2 p1 p0 c0
c4 = func(a3, b3, a2, b2, a1, b1, a0, b0, c0)
1 unit of delay for each p, g
3 units of delay for each of the expanded carry equations
Fast Addition: Carry Look Ahead – 4 bits
A B C-out
0 0 0 "kill"
0 1 C-in "propagate"
1 0 C-in "propagate"
1 1 1 "generate"
g = a and b, p = a or b (1 delay each)
C0 = Cin
c1 = g0 + c0 p0
c2 = g1 + g0 p1 + c0 p0 p1
c3 = g2 + g1 p2 + g0 p1 p2 + c0 p0 p1 p2
[Figure: four adder cells on inputs a0/b0 to a3/b3, each producing a sum bit S and its g, p signals for the lookahead logic]
G0 = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0
C4 = . . .
P0 = p3 p2 p1 p0
3 units of delay for G0
3 units of delay for c1, c2, c3, (c4)
4 units of delay for S1, S2, S3
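The lookahead equations for c1 to c3 and the group signals G0 and P0 can be checked against ordinary addition. The following Python sketch is one possible behavioral model, not the slide's hardware; `cla4` is an assumed name, and the c4 = G0 + c0 P0 line uses the group form that the deck gives for the second level.

```python
def cla4(a: int, b: int, c0: int = 0) -> tuple[int, int, int, int]:
    """4-bit carry-lookahead add; returns (sum, c4, G0, P0)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]   # generate: a and b
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]   # propagate: a or b

    # Every carry depends only on g, p, and c0, never on another carry.
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])

    # Group generate and propagate for the whole 4-bit block.
    G0 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P0 = p[3] & p[2] & p[1] & p[0]
    c4 = G0 | (c0 & P0)

    carries = [c0, c1, c2, c3]
    s = 0
    for i in range(4):
        bit = ((a >> i) ^ (b >> i) ^ carries[i]) & 1    # sum bit i
        s |= bit << i
    return s, c4, G0, P0
```

Checking `s + 16 * c4 == a + b + c0` over all 512 input combinations is a quick way to confirm the equations were transcribed correctly.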
Carry Lookahead – 2nd level – 16 bits
Add a 2nd level of abstraction on top of the more practical 4-bit units
Each Pi, Gi handles 4 bits at a time (0-3, 4-7, 8-11, ...)
P0 = p3 p2 p1 p0; G0 = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0
P1 = p7 p6 p5 p4; G1 = g7 + p7 g6 + p7 p6 g5 + p7 p6 p5 g4
P2 = p11 p10 p9 p8; G2 = g11 + p11 g10 + p11 p10 g9 + p11 p10 p9 g8
P3 = p15 p14 p13 p12; G3 = …….
3 units of delay for G0, G1, G2, G3
2 units of delay for P0, P1, P2, P3
Fast Addition: Cascaded Carry Look-ahead (16-bit):
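[Figure: 16-bit adder built from 4-bit adders; each block passes its group G and P signals to a second-level CLA unit, which returns the block carries]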
c4 = G0 + C0 P0
c8 = G1 + G0 P1 + C0 P0 P1
c12 = G2 + G1 P2 + G0 P1 P2 + C0 P0 P1 P2
c16 = . . .
5 units of delay for c8, c12, c16
c4 has 4 units of delay
Carry Lookahead Homework
You are required to calculate the performance of a 16-bit carry lookahead adder similar to the one discussed in class. The design has 2 options:
1. Ripple carry is used inside each 4-bit cell
2. Carry lookahead is used inside each 4-bit cell
• Both cases use carry lookahead to predict the 4-bit boundary carries [c4, c8, c12]
• Draw a table showing the delay of each adder bit, i.e. Sum0 - Sum15, as well as the carry at each stage of the design, for the 2 designs
8-bit carry lookahead adder (4-bit block is also CLA)
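c5 = g4 + c4.p4 (delays: g4 = 1, c4 = 4, p4 = 1)
c4 = G0 + c0 P0 (2nd level carry lookahead, 4 units of delay)
[Figure: two 4-bit CLA blocks, a0b0 to a3b3 producing S0 to S3 with G0, P0, and a4b4 to a7b7 producing S4 to S7 with G1, P1, joined by the 2nd level carry lookahead; signals are annotated with delays of 3 to 6 units]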
8-bit CLA – uses ripple carry inside 4-bit block
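[Figure: two 4-bit ripple-carry blocks (a0b0 to a3b3 producing Result0 to Result3, a4b4 to a7b7 producing Result4 to Result7) joined by a 2nd level carry lookahead that supplies c4]
Delays annotated on the figure, in gate-delay units:
Bit:      0 1 2 3 4 5 6 7
Carry in: 0 2 4 6 4 6 8 10
Result:   2 3 5 7 5 7 9 11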
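The two runs of numbers on this slide (0 2 4 6 4 6 8 10 and 2 3 5 7 5 7 9 11) are consistent with a simple timing model built from the per-cell delays given earlier in the deck: 2 units from A/B to sum, 1 unit from Cin to sum, and 2 units from Cin to Cout. The Python sketch below works under that reading, which is an interpretation rather than something the slide states; `ripple_block_delays` is a hypothetical name.

```python
def ripple_block_delays(carry_in_delay: int, bits: int = 4) -> list[tuple[int, int]]:
    """Per-bit (carry-in delay, result delay) inside one ripple-carry block."""
    rows = []
    c = carry_in_delay
    for _ in range(bits):
        # Result is ready 2 units after A/B arrive or 1 unit after Cin, whichever is later.
        result_delay = max(2, c + 1)
        rows.append((c, result_delay))
        c += 2                      # Cin to Cout costs 2 units per cell
    return rows


low = ripple_block_delays(0)    # bits 0-3: c0 is available at time 0
high = ripple_block_delays(4)   # bits 4-7: c4 arrives from the lookahead unit at time 4
```

Under these assumptions the last result bit settles after 11 units, compared with 15 units if bits 4 to 7 had to wait for a carry rippling through all of bits 0 to 3.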
Additional MIPS ALU requirements
• Mult, MultU, Div, DivU => Need 32-bit multiply and divide, signed and unsigned
• Sll, Srl, Sra => Need left shift, right shift, right shift arithmetic by 0 to 31 bits
• Nor (left as an exercise!) => logical NOR, or use 2 steps: (A OR B) XOR 1111....1111
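The two-step NOR works because XOR with an all-ones word flips every bit. A brief Python illustration, assuming 32-bit words; `nor32` is a made-up name.

```python
MASK32 = 0xFFFFFFFF   # the 1111....1111 constant for a 32-bit word


def nor32(a: int, b: int) -> int:
    """NOR in two steps: OR the operands, then XOR the result with all ones."""
    return ((a | b) ^ MASK32) & MASK32
```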
Multiply, Divide & Shift
MIPS arithmetic instructions
| Instruction | Example | Meaning | Comments |
|---|---|---|---|
| add | add $1,$2,$3 | $1 = $2 + $3 | 3 operands; exception possible |
| subtract | sub $1,$2,$3 | $1 = $2 – $3 | 3 operands; exception possible |
| add immediate | addi $1,$2,100 | $1 = $2 + 100 | + constant; exception possible |
| add unsigned | addu $1,$2,$3 | $1 = $2 + $3 | 3 operands; no exceptions |
| subtract unsigned | subu $1,$2,$3 | $1 = $2 – $3 | 3 operands; no exceptions |
| add imm. unsign. | addiu $1,$2,100 | $1 = $2 + 100 | + constant; no exceptions |
| multiply | mult $2,$3 | Hi, Lo = $2 x $3 | 64-bit signed product |
| multiply unsigned | multu $2,$3 | Hi, Lo = $2 x $3 | 64-bit unsigned product |
| divide | div $2,$3 | Lo = $2 ÷ $3, Hi = $2 mod $3 | Lo = quotient, Hi = remainder |
| divide unsigned | divu $2,$3 | Lo = $2 ÷ $3, Hi = $2 mod $3 | Unsigned quotient & remainder |
| move from Hi | mfhi $1 | $1 = Hi | Used to get copy of Hi |
| move from Lo | mflo $1 | $1 = Lo | Used to get copy of Lo |
MULTIPLY (unsigned)
• Paper and pencil example (unsigned):

    Multiplicand      1000   (A)
    Multiplier      x 1001   (B)
                      1000
                     0000
                    0000
                   1000
    Product       01001000

• m bits x n bits = m+n bit product
• Binary makes it easy:
– 0 => place 0 (0 x multiplicand)
– 1 => place a copy (1 x multiplicand)
• 4 versions of multiply hardware & algorithm:
– successive refinement
Fast Multiply == Array Multiplier
• Stage i accumulates A * 2^i if Bi == 1
• Q: How much hardware for 32 bit multiplier? Critical path?
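[Figure: 4x4 array multiplier; each row i combines multiplicand bits A3..A0 with multiplier bit Bi and adds them into the running sum through full-adder (FA) cells with sum in, sum out, carry in, and carry out; the first row's sum inputs are 0; product bits are P7..P0]
Multiplicand A, Multiplier B, Product P
Cell delays?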
Multiplier operation
• At each stage shift multiplicand left ( x 2)
• Multiplier bit Bi determines : add in shifted multiplicand
• Accumulate 2n-bit partial product at each stage
[Figure: the same 4x4 array multiplier, with the multiplicand bits A3..A0 shifted left one position per row (rows B0 to B3), zero fill on the unused inputs, and product bits P7..P0]
Multiplication, using shift & Add
• long-multiplication approach

        1000      (multiplicand)
      × 1001      (multiplier)
        1000
       0000
      0000
     1000
     1001000      (product)

Length of product is the sum of operand lengths
Multiplication Hardware using shift & Add
[Figure: shift-and-add multiplication hardware; one register is labeled "Initially 0"]
Optimized Multiplier using shift & Add
• Perform steps in parallel: add/shift
One cycle per partial-product addition ok, if frequency of multiplications is low
[Figure: optimized multiplier hardware with a 32-bit ALU and the multiplicand register]
Multiply Algorithm
Start
1. Test Product0
– Product0 = 1: 1a. Add multiplicand to the left half of the product & place the result in the left half of the Product register
– Product0 = 0: go directly to step 2
2. Shift the Product register right 1 bit.
32nd repetition?
– No: < 32 repetitions, go back to step 1
– Yes: 32 repetitions, Done

| Step | Product | Multiplicand |
|---|---|---|
| Start | 0000 0011 | 0010 |
| 1 | 0010 0011 | 0010 |
| 2 | 0001 0001 | 0010 |
| 1 | 0011 0001 | 0010 |
| 2 | 0001 1000 | 0010 |
| 1 | 0001 1000 | 0010 |
| 2 | 0000 1100 | 0010 |
| 1 | 0000 1100 | 0010 |
| 2 | 0000 0110 | 0010 |
| Done | 0000 0110 | 0010 |
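The flowchart can be followed step by step in software. This Python sketch models the combined Product register as a single integer whose right half initially holds the multiplier; `shift_add_multiply` is an assumed name, and Python's unbounded integers stand in for the carry bit that real hardware would need alongside the register.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int = 32) -> int:
    """Unsigned n-bit multiply using the test / add / shift-right loop."""
    mask = (1 << n) - 1
    product = multiplier & mask                       # left half 0, right half = multiplier
    for _ in range(n):                                # n repetitions
        if product & 1:                               # step 1: test Product0
            product += (multiplicand & mask) << n     # step 1a: add into the left half
        product >>= 1                                 # step 2: shift Product right 1 bit
    return product
```

With `n=4`, `shift_add_multiply(2, 3, n=4)` passes through the same register values as the trace on this slide and ends at 0000 0110, i.e. 6.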
MIPS logical instructions

| Instruction | Example | Meaning | Comment |
|---|---|---|---|
| and | and $1,$2,$3 | $1 = $2 & $3 | 3 reg. operands; Logical AND |
| or | or $1,$2,$3 | $1 = $2 \| $3 | 3 reg. operands; Logical OR |
| xor | xor $1,$2,$3 | $1 = $2 ^ $3 | 3 reg. operands; Logical XOR |
| nor | nor $1,$2,$3 | $1 = ~($2 \| $3) | 3 reg. operands; Logical NOR |
| and immediate | andi $1,$2,10 | $1 = $2 & 10 | Logical AND reg, constant |
| or immediate | ori $1,$2,10 | $1 = $2 \| 10 | Logical OR reg, constant |
| xor immediate | xori $1,$2,10 | $1 = $2 ^ 10 | Logical XOR reg, constant |
| shift left logical | sll $1,$2,10 | $1 = $2 << 10 | Shift left by constant |
| shift right logical | srl $1,$2,10 | $1 = $2 >> 10 | Shift right by constant |
| shift right arithm. | sra $1,$2,10 | $1 = $2 >> 10 | Shift right (sign extend) |
| shift left logical | sllv $1,$2,$3 | $1 = $2 << $3 | Shift left by variable |
| shift right logical | srlv $1,$2,$3 | $1 = $2 >> $3 | Shift right by variable |
| shift right arithm. | srav $1,$2,$3 | $1 = $2 >> $3 | Shift right arith. by variable |
How shift instructions are implemented
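Two kinds:
logical: value shifted in is always "0"
arithmetic: on right shifts, sign extend
[Figure: msb-to-lsb diagrams; a logical shift fills the vacated positions with "0", an arithmetic right shift copies the msb, and a left shift fills the lsb with "0"]
instruction can request 0 to 31 bits to be shifted!
1011 1110
shift right arithmetic by 2
1100 1011
shift right logical by 2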
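The difference between the two right shifts only shows when the most significant bit is 1. Below is a short Python sketch on 32-bit patterns; `sll`, `srl`, and `sra` borrow the MIPS mnemonics as function names, and masking the shift amount to 5 bits is an assumption that mirrors the 5-bit shift field.

```python
MASK32 = 0xFFFFFFFF


def sll(x: int, shamt: int) -> int:
    """Shift left logical: zeros enter at the lsb."""
    return (x << (shamt & 31)) & MASK32


def srl(x: int, shamt: int) -> int:
    """Shift right logical: zeros enter at the msb."""
    return (x & MASK32) >> (shamt & 31)


def sra(x: int, shamt: int) -> int:
    """Shift right arithmetic: copies of the sign bit enter at the msb."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret as negative so Python's >> sign-extends
    return (x >> (shamt & 31)) & MASK32
```

For example, shifting 0x80000000 right by 2 gives 0x20000000 with `srl` but 0xE0000000 with `sra`.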
ARM :: Barrel Shifter:
– Shift value can be either:
• a 5-bit unsigned integer
• specified in the bottom byte of another register
Example: ADD r0, r1, r2, LSL#7
• Semantics: r2 is shifted left by 7 & then added to r1
[Figure: Operand 2 passes through the barrel shifter before entering the ALU; Operand 1 enters the ALU directly; the ALU produces Result]
Barrel Shifter, used in ICs
Shift Right using one transistor per switch
[Figure: switch matrix with inputs A6..A0, outputs D3..D0, and control lines SR0 to SR3 selecting the shift amount]
Barrel Shifter, used in ICs
Shift ... left & right
[Figure: switch matrix with inputs A5..A0, outputs D3..D0, and control lines SR0 to SR2 and SL1 to SL3]
Summary: Multiply & Shift
• Multiply: successive refinement to see final design
– 32-bit Adder, 64-bit shift register, 32-bit Multiplicand Register
• Fast multiply: array multiplier
• Shifter: successive refinement from a 1-bit-at-a-time shift register to a barrel shifter
Floating Point Arithmetic
• How to represent
– numbers with fractions, e.g., 3.1416
– very small numbers, e.g., .000000001
– very large numbers, e.g., $3.15576 \times 10^{9}$
• Fixed point
• Floating point: a number system with floating decimal point
• Normalized numbers: no leading 0's, single digit before decimal point
$1.0 \times 10^{-9}$, $3.1557 \times 10^{9}$, 350.03
Floating Point Notation – IEEE 754 FP
$6.02 \times 10^{23}$, $1.673 \times 10^{-24}$
[Figure labels: mantissa (sign, magnitude), decimal point, radix (base), exponent (sign, magnitude)]
IEEE F.P.: $\pm 1.M \times 2^{e - 127}$
• Issues:
– Arithmetic (+, -, *, /)
– Representation, normal form
– Range and precision: single, double
– Rounding
– Exceptions (e.g., divide by zero, overflow, underflow)
Floating-Point Arithmetic
Floating point numbers in IEEE 754 standard:
single precision: S (1 bit) | E (8 bits) | M (23 bits)
sign: S
exponent: excess-127 binary integer; actual exponent is e = E - 127
mantissa: sign + magnitude, normalized binary significand w/ hidden integer bit: 1.M
$N = (-1)^{S} \times 2^{E-127} \times (1.M)$, with 0 < E < 255
0 = 0 00000000 0 . . . 0
-1.5 = 1 01111111 10 . . . 0
Numbers that can be represented are in the range: $2^{-126} \times (1.0)$ to $2^{127} \times (2 - 2^{-23})$
Double Precision IEEE 754 [64 bits]
Exponent = 11 bits, Bias = 1023, Mantissa = 52 bits, Sign = 1 bit
Exponent Bias used to simplify comparisons
• If we use 2’s complement, not good for sorting and comparison
0000 0000: most negative exponent
1111 1111: most positive exponent
Floating Point – Example review
• Format: S | exponent | significand
• Represents $(-1)^{s} \times (1 + \text{significand}) \times 2^{(\text{exp} - \text{bias})}$
– bias = 127 for 32-bit word
– S = 1: negative; S = 0: positive or zero
• Example (from fraction to floating point representation): -0.75
Floating-Point Example - review
• Represent –0.75
– $-0.75 = (-1)^{1} \times 1.1_{2} \times 2^{-1}$
– S = 1
– Fraction = $1000\ldots00_{2}$
– Exponent = –1 + Bias
• Single: –1 + 127 = 126 = $01111110_{2}$
• Double: –1 + 1023 = 1022 = $01111111110_{2}$
• Single: 1011111101000…00
• Double: 1011111111101000…00
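The field values worked out for –0.75 can be confirmed by asking Python for the raw IEEE 754 bit patterns. The sketch uses the standard `struct` module; `float_fields` and `double_fields` are illustrative helper names.

```python
import struct


def float_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) of x as a single-precision value."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]   # 32-bit pattern
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def double_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) of x as a double-precision value."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]   # 64-bit pattern
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)
```

`float_fields(-0.75)` yields sign 1, exponent 126, and a fraction whose only set bit is the top one, which is the 1 01111110 1000…00 pattern above.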
Addition – Multiply Algorithm issues
For addition (or subtraction):
(1) compute Ye - Xe (getting ready to align binary point)
(2) right shift Xm that many positions to form $X_m 2^{X_e - Y_e}$
(3) compute $X_m 2^{X_e - Y_e} + Y_m$
(4) for multiply, doubly biased exponent must be corrected:
Xe = 7, Ye = -3, Excess 8: extra subtraction step of the bias amount
Xe = 1111 = 15 = 7 + 8
Ye = 0101 = 5 = -3 + 8
Sum = 10100 = 20 = 4 + 8 + 8
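The excess-8 example shows why a multiply needs one extra subtraction: adding two biased exponents counts the bias twice. A tiny Python sketch, with `BIAS`, `encode`, and `multiply_exponents` as assumed names:

```python
BIAS = 8   # excess-8 encoding, as in the example


def encode(e: int) -> int:
    """Biased (stored) form of a true exponent."""
    return e + BIAS


def multiply_exponents(xe_biased: int, ye_biased: int) -> int:
    """Biased exponent of a product: add the stored exponents, remove one bias."""
    return xe_biased + ye_biased - BIAS
```

Here `encode(7) + encode(-3)` is 20 (binary 10100), and subtracting the bias once gives 12, the correct stored form of the true exponent 4.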
Floating Point Addition
• Step 1: align, round
• Step 2: add
• Step 3: normalize, check overflow or underflow
• Step 4: round
• Example: 9 99 10 1610 10 1. .ten ten
Floating Point Multiplication
• Step 1: add exponents, subtract bias, Mpy mantissas
• Step 2: normalize and check over/underflow
• Step 3: round
• Step 4: check sign
• Example: $0.5 \times (-0.4375)$
FP Adder Hardware
• more complex than integer adder
• Doing it in one clock cycle takes too long
– Much longer than integer operations
– Slower clock would penalize all instructions
• FP adder usually takes several cycles
– pipelined
FP Adder Hardware
[Figure: FP adder datapath, annotated with Step 1 to Step 4 of the addition algorithm]
Floating Point: Overflow & Underflow
• Overflow: exponent too large to be represented
• Underflow: negative exponent too small to fit in exponent field
Summary of Floating Point Arithmetic
• IEEE floating point standard 32 bit and 64 bit
• Converting decimal numbers to floating point and vice versa
• Overflow and underflow
• Floating point add and multiply