Post on 07-Apr-2015
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
ECE 124AECE 124AVLSI PrinciplesVLSI Principles
Lecture 16Lecture 16
Prof. Kaustav BanerjeeElectrical and Computer Engineering
E-mail: kaustav@ece.ucsb.edu
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
A Generic Digital ProcessorA Generic Digital Processor
MEMORY
DATAPATH
CONTROL
Inpu
t / O
utpu
t
Lecture 15Static, Dynamic MemoryLatch, RegisterTiming, Stability
Interconnect
Sequential Circuit
Today……
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
DATAPATHDATAPATHThe core of a digital ProcessorDatapath consists
Logic Blocks– Combinational Logic Functions (AND, OR, XOR….)
Arithmetic Blocks– Addition– Multiplication– Comparison– Shift
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
An Intel MicroprocessorAn Intel Microprocessor9-
1 M
ux9-
1 M
ux
5-1
Mux
2-1
Mux
ck1
CARRYGEN
SUMGEN+ LU
1000um
b
s0
s1
g64
sum sumb
LU : LogicalUnit
SUM
SEL
a
to Cachenode1
REG
Fetzer, Orton, ISSCC’02
Itanium has 6 integer execution units like this
Intel Itanium®
Integer Datapath
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
BitBit--Sliced DesignSliced Design
Reg
iste
r
Add
er
Shift
er
Mul
tiple
xer
Bit 0
Bit 32
……
DA
TA I
N
DA
TA O
UT
Control
Design a single bit datapath and repeat for all bits
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
AddersAdders
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
DDeesign an Addersign an AdderFundamental Arithmetic Building BlockPerformance
Logic Level Optimization– Optimize Boolean Functions– Carry Lookahead
Circuit Level Optimization– Transistor Sizing
Power Consumption
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
HalfHalf--Adder ImplementationsAdder ImplementationsHalf-Adder X
Xy
y
s
c
Xy
Xy
sc
s
cX
y
Half-Adder
X y
Carry
SUM
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
FullFull--AdderAdder
HA
HA
AB
Cin
Cout
s
A B
Cout
Sum
Cin Fulladder
in inin in in
out in in
S A B C ABC ABC ABC ABCC AB BC AC= ⊕ ⊕ = + + +
= + +
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Express Sum and CarryExpress Sum and Carry
Generate (G) = AB
Propagate (P) = A ⊕ B
Delete (D) = A B
Co = G + PCi
S = P Ci⊕Truth Table for Full Adder
G11111G10011P10101P01001P10110P01010D01100D00000
statusCoSCiBA
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Complimentary Static CMOS Full AdderComplimentary Static CMOS Full Adder
28 Transistors
A B
B
A
Ci
Ci A
X
VDD
VDD
A B
Ci BA
B VDD
A
B
Ci
Ci
A
B
A CiB
Co
VDD
S
0( )in i i
o in in
S A B C ABC C A B CC AB BC AC= ⊕ ⊕ = + + +
= + +
C0
S
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
The RippleThe Ripple--Carry AdderCarry Adder
Propagation Delay is linearly proportional to Ntcarry dominates the propagation delay
Worst case delay: linear with the number of bits td = O(N)
tadder = (N-1)tcarry + tsum
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Inverting PropertyInverting Property
0 0
( , , ) ( , , )
( , , ) ( , , )
ii
ii
S A B C S A B C
C A B C C A B C
=
=
Inverting all inputs to FA results in inverted values for all outputs
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Minimize Critical Path by Reducing Inverting StagesMinimize Critical Path by Reducing Inverting Stages
Exploit Inversion Property
A3
FA FA FA
Even cell Odd cell
FA
A0 B0
S0
A1 B1
S1
A2 B2
S2
B3
S3
Ci,0 Co,0 Co,1 Co,3Co,2
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Mirror Adder (1)Mirror Adder (1)
G11111
G10011
P10101
P01001
P10110
P01010
D01100
D00000
statusCoSCiBA
A
A
A
A
B B
BB
VDD
CiCo
A B Ci
A B Ci
VDD
A
B
Ci
Ci
A
B
VDD
S
Carry
Truth Table for Full Adder
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
G11111
G10011
P10101
P01001
P10110
P01010
D01100
D00000
statusCoSCiBA
A
A
A
A
B B
BB
VDD
CiCo
A B Ci
A B Ci
VDD
A
B
Ci
Ci
A
B
VDD
S
Mirror Adder (2)Mirror Adder (2)Sum
Truth Table for Full Adder
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Stick Diagram
CiA B
VDD
GND
B
Co
A Ci Co Ci A B
S
Mirror Adder ImplementationMirror Adder Implementation
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Mirror Adder SummaryMirror Adder Summary24 Transistors
The NMOS and PMOS chains are completely symmetrical
A maximum of 2 series transistors in carry-generation circuitry
The transistors connected to Ci should be closest to the output
Carry-Stage transistors have to be optimized for speed
Sum-Stage transistors can be optimized for area
The most critical issue is to minimize the capacitance at Co
Co is composed of 4 diffusion capacitances, 2 internal gate
capacitances, and 6 gate capacitances in the connecting adder cell
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
A
B
P
Ci
VDDA
A A
VDD
Ci
A
P
AB
VDD
VDD
Ci
Ci
Co
S
Ci
P
P
P
P
P
Sum Generation
Carry Generation
Setup
This implementation has similar sum and carry output delay
Transmission Gate Full AdderTransmission Gate Full Adder
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
CoCi
Gi
Di
Pi
Pi
VDD CoCi
Gi
Pi
VDD
φ
φ
Manchester CarryManchester Carry--ChainChainManchester Carry Gates
Static Implementation Dynamic Implementation
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
G2
φ
C3
G3Ci,0
P0
G1
VDD
φ
G0
P1 P2 P3
C3C2C1C0
44--Bit Manchester CarryBit Manchester Carry--ChainChain
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Pi + 1 Gi + 1 φ
Ci
Inverter/Sum Row
Propagate/Generate Row
Pi Gi φ
Ci - 1Ci + 1
VDD
GND
Stick Diagram
Manchester CarryManchester Carry--Chain ImplementationChain Implementation
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Design for Long Word LengthDesign for Long Word Length
Propagation delay of chain style design is quadratic in the number of bits
Chain style design is NOT practical for long word length (e.g. 32 bits)
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
FA FA FA FA
P0 G1 P0 G1 P2 G2 P3 G3
Co,3Co,2Co,1Co,0Ci,0
FA FA FA FA
P0 G1 P0 G1 P2 G2 P3 G3
Co,2Co,1Co,0Ci,0
Co,3
Mul
tiple
xer
BP=PoP1P2P3
Idea: If (P0 and P1 and P2 and P3 = 1)then Co3 = C0, else “kill” or “generate”.
CarryCarry--Bypass AdderBypass AdderCarry-Skip Adder
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Carrypropagation
SetupBit 0–3
Sum
M bits
tsetup
tsum
Carrypropagation
SetupBit 4–7
Sum
tbypass
Carrypropagation
SetupBit 8–11
Sum
Carrypropagation
SetupBit 12–15
Sum
tadder = tsetup + Mtcarry + (N/M-1)tbypass + (M-1)tcarry + tsum
1616--bit Carrybit Carry--Bypass AdderBypass Adder
N=16 M=4
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Carry Ripple versus Carry BypassCarry Ripple versus Carry Bypass
For smaller N, bypass adder is not preferred due to the overhead of bypass multiplexer
Ripple Adder
Bypass Adder
tp
N4~8
Depends on technology
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Setup
"0" Carry Propagation
"1" Carry Propagation
Multiplexer
Sum Generation
Co,k-1 Co,k+3
"0"
"1"
P,G
Carry Vector
Linear CarryLinear Carry--Select AdderSelect Adder
Both situations are evaluated
~ 30 % Hardware overhead
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
0
1
Sum Generation
Multiplexer
1-Carry
0-Carry
Setup
Ci,0 Co,3 Co,7 Co,11 Co,15
S0–3
Bit 0–3 Bit 4–7 Bit 8–11 Bit 12–15
0
1
Sum Generation
Multiplexer
1-Carry
0-Carry
Setup
S4–7
0
1
Sum Generation
Multiplexer
1-Carry
0-Carry 0-Carry
Setup
S8–11
0
1
Sum Generation
Multiplexer
1-Carry
Setup
S12–15
tadder = tsetup + Mtcarry + (N/M)tmux + tsum
1616--bit Linear Carrybit Linear Carry--Select AdderSelect Adder
N=16 M=4
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Bit 0-1 Bit 2-4 Bit 5-8 Bit 9-13
S0-1 S2-4 S5-8 S9-13
Ci,0
(4) (5) (6) (7)
(1)
(1)
(3) (4) (5) (6)
Mux
Sum
S14-19
(7)
(8)
Bit 14-19
(9)
(3)
SquareSquare--Root CarryRoot Carry--Select AdderSelect Adder
22 3 ...... ( 1) 2
PN P= + + + + ≅
N bits P stages First Stage has M bitsFor example: 2P N=
tadder = tsetup + Mtcarry + (N/M)tmux + tsum= tsetup + Mtcarry +Ptmux + tsum
M=2
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Adder Delays Adder Delays -- Comparison Comparison
Square root select
Linear select
Ripple adder
20 40N
t p(in
uni
t del
ays)
600
10
0
20
30
40
50
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
AN-1, BN-1A1, B1
P1
S1
• • •
• • • SN-1
PN-1Ci, N-1
S0
P0Ci,0 Ci,1
A0, B0
Expanding Lookahead equations:
All the way:
C0,k=Gk+PkC0,k-1
C0,k=Gk+Pk(Gk-1+Pk-1C0,k-2)
C0,k=Gk+Pk(Gk-1+Pk-1(……+P1(G0+P0Ci,0))))))
LookLook--Ahead Ahead -- Basic IdeaBasic Idea
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Carry DeterminationCarry DeterminationCo,0=G0+P0Ci,0
Co,1=G1+P1Co,0=G1+P1(G0+P0Ci,0)=G1+P1G0+P1P0Ci,0
Co,2=G2+P2G1+P2P1G0+P2P1P0Ci,0
Co,3=G3+P3G2+P3P2G1+P3P2P1G0+P3P2P1P0Ci,0
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Co,3
Ci,0
VDD
P0
P1
P2
P3
G0
G1
G2
G3
44--bit Lookbit Look--Ahead Ahead
Large stack
Poor Performance
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Co,0=G0+P0Ci,0
Co,1=G1+P1Co,0
Co,2=G2+P2C0,1
Co,3=G3+P3Co,2
A7
F
A6A5A4A3A2A1
A0
A0A1
A2A3
A4A5
A6
A7
F
tp∼ log2(N)
tp∼ N
(g”,p”) (g’,p’)
(g,p)
g=g”+g’p”p=p’p”
Hierarchically DecompositionHierarchically Decomposition
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Logarithmic LookLogarithmic Look--Ahead AdderAhead Adder
2logpt N∝
Kogge-Stone tree(A
0, B
0)
(A1,
B1)
(A2,
B2)
(A3,
B3)
(A4,
B4)
(A5,
B5)
(A6,
B6)
(A7,
B7)
(A8,
B8)
(A9,
B9)
(A10
, B10
)
(A11
, B11
)
(A12
, B12
)
(A13
, B13
)
(A14
, B14
)
(A15
, B15
)
S 0 S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S 9 S 10 S 11 S 12 S 13 S 14 S 15
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
MultipliersMultipliers
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
The Binary MultiplicationThe Binary Multiplication
x
+
Partial products
Multiplicand
Multiplier
Result
1 0 1 0 1 0
1 0 1 0 1 0
1 0 1 0 1 0
1 1 1 0 0 1 1 1 0
0 0 0 0 0 0
1 0 1 0 1 0
1 0 1 1
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Reduce Partial ProductsReduce Partial Products
Partial products can be reduced by multiplier transformation(Booth’s Recording)
Reduce number of partial products is equivalent to reducing the number of additions
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Y0
Y1
X3 X2 X1 X0
X3
HA
X2
FA
X1
FA
X0
HA
Y2X3
FA
X2
FA
X1
FA
X0
HA
Z1
Z3Z6Z7 Z5 Z4
Y3X3
FA
X2
FA
X1
FA
X0
HA
Z2
Z0
4X4 Bit4X4 Bit--Array MultiplierArray Multiplier
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Critical PathCritical Path Y0
Y1
X3 X2 X1 X0
X3
HA
X2
FA
X1
FA
X0
HA
Y2X3
FA
X2
FA
X1
FA
X0
HA
Z1
Z3Z6Z7 Z5 Z4
Y3X3
FA
X2
FA
X1
FA
X0
HA
Z2
Z0
For MXN Bit-Array Multiplier
M
N-1
tmultiplier = [ (M-1) + (N-2) ] tcarry + (N-1) tsum + tand
tand
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
CarryCarry--Save MultiplierSave Multiplier
HA HA HA HA
FAFAFAHA
FAHA FA FA
FAHA FA HA
Vector Merging Adder
tmultiplier = (N-1) tcarry + tand + tmerge
Carry is saved for the next adder stage
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Multiplier Multiplier FloorplanFloorplan
SCSCSCSC
SCSCSCSC
SCSCSCSC
SC
SC
SC
SC
Z0
Z1
Z2
Z3Z4Z5Z6Z7
X0X1X2X3
Y1
Y2
Y3
Y0
Vector Merging Cell
HA Multiplier Cell
FA Multiplier Cell
X and Y signals are broadcastedthrough the complete array.( )
Carry-Save Multiplier
Rectangular shape is easy for integration
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
6 5 4 3 2 1 0 6 5 4 3 2 1 0
Partial products First stage
Bit position
6 5 4 3 2 1 0 6 5 4 3 2 1 0Second stage Final adder
FA HA
(a) (b)
(c) (d)
Partial Products TransformationPartial Products Transformation
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
WallaceWallace--Tree MultiplierTree MultiplierPartial products
First stage
Second stage
Final adder
FA FA FA
HA HA
FA
x3y3
z7 z6 z5 z4 z3 z2 z1 z0
x3y2x2y3
x1y1x3y0 x2y0 x0y1x0y2
x2y2x1y3
x1y2x3y1x0y3 x1y0 x0y0x2y1
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Wallace Tree MultiplierWallace Tree MultiplierPros:
Substantial hardware savingReduce propagation delay
Cons:Irregular structureCustomized layout
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
ShiftersShifters
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
The Binary ShifterThe Binary Shifter
Ai
Ai-1
Bi
Bi-1
Right Leftnop
Bit-Slice i
...
Widely used for floating point units, multiplications by constant numbers
This binary shifter is extremely slow for multi-bit applications
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
The Barrel ShifterThe Barrel Shifter
Sh3Sh2Sh1Sh0
Sh3
Sh2
Sh1
A3
A2
A1
A0
B3
B2
B1
B0
: Control Wire
: Data Wire
Area Dominated by Control Wiring
Decoder Control Signal
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
4x4 barrel shifter4x4 barrel shifter
BufferSh3S h2Sh 1Sh0
A3
A2
A 1
A 0
Widthbarrel ~ 2 pm M
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
Logarithmic ShifterLogarithmic ShifterSh1 Sh1 Sh2 Sh2 Sh4 Sh4
A3
A2
A1
A0
B1
B0
B2
B3
Lecture 16, ECE 124A, VLSI Principles Kaustav Banerjee
A3
A 2
A1
A0
Out3
Out2
Out1
Out0
00--7 bit Logarithmic Shifter7 bit Logarithmic Shifter