A Hybrid CMOS/PTL Synthesizer Embedded in the Standard-Cell-Library IC Design Flow
Presenter: Ming-Yu Tsai

Outline
Introduction
– PTL cells
– PTL synthesis
Overview of the Cell-Based Design Flow
Pure PTL Synthesis Embedded in the Cell-Based Design Flow
Hybrid CMOS/PTL Synthesis Embedded in the Cell-Based Design Flow
Application
Conclusion

Introduction
What is PTL?
– Pass-Transistor Logic (PTL)
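
Threshold Drops
[Figure: nMOS and pMOS pass transistors driving a load capacitance CL. The nMOS switch passes 0 fully but charges CL only to VDD - VTn; the pMOS switch passes VDD fully but discharges CL only to |VTp|. A device conducts while VGS > |Vt|.]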
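The threshold drop can be illustrated numerically. The following is a minimal sketch, assuming ideal switches with the gate fully on and no body effect; the function name `passed_level` and the voltage values are hypothetical and chosen only for illustration.

```python
def passed_level(v_in, vdd, vt_n, vt_p, device):
    """Steady-state load voltage after one fully-on pass transistor.

    Simplified model: body effect and leakage are ignored.
    """
    if device == "nmos":
        # Gate at VDD: the output cannot rise above VDD - VTn.
        return min(v_in, vdd - vt_n)
    if device == "pmos":
        # Gate at 0: the output cannot fall below |VTp|.
        return max(v_in, abs(vt_p))
    raise ValueError(f"unknown device: {device}")


# Hypothetical supply and threshold voltages (volts).
VDD, VTN, VTP = 2.5, 0.7, -0.7

strong_zero = passed_level(0.0, VDD, VTN, VTP, "nmos")  # nMOS passes 0 fully
weak_one = passed_level(VDD, VDD, VTN, VTP, "nmos")     # nMOS degrades 1
strong_one = passed_level(VDD, VDD, VTN, VTP, "pmos")   # pMOS passes 1 fully
weak_zero = passed_level(0.0, VDD, VTN, VTP, "pmos")    # pMOS degrades 0
```

The degraded "1" at the nMOS output is the reason a level-restoring inverter is needed after nMOS-only pass networks.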

MOS Transistors in Series/Parallel

Pass Transistor Logic (PTL)
N transistors instead of 2N
No static power consumption
Bidirectional (versus unidirectional)
[Figure: pass-transistor AND gate with inputs A and B and a constant 0; output F = A·B.]
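A behavioral check of the pass-transistor AND gate, as a minimal sketch. It assumes the common two-transistor arrangement in which the switch gated by B passes A and the switch gated by the complement of B passes the constant 0; the function name `ptl_and` is hypothetical.

```python
def ptl_and(a: int, b: int) -> int:
    """Two-switch PTL AND: exactly one switch conducts for any value of b."""
    if b:
        return a   # switch gated by B passes input A to F
    return 0       # switch gated by B' passes the constant 0 to F


# Exhaustive truth table over both inputs.
truth_table = {(a, b): ptl_and(a, b) for a in (0, 1) for b in (0, 1)}
```

Because the two gate signals are complementary, the output is always driven by exactly one path, which is the property that BDD-based PTL synthesis preserves for larger functions.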

Advantages of PTL
PTL cells
– Small area
– Better performance for MUX- and XOR-based circuits
– Lower power consumption
PTL synthesizer
– Only two types of cells
  – 2-to-1 multiplexers (MUX)
  – Inverters
    – Regular CMOS inverter
    – Special inverter with a weak feedback pMOS
– When the process technology is updated
  – All PTL cell circuits are easy to update

Different Categories of PTL Circuit Designs (1/2)
[Figure: taxonomy tree. The PTL cell family splits into single-rail and dual-rail, each subdivided into driving and non-driving. Leaves: LEAP, PTL (PTL_n, PTL_np), CMOSTG, CVSL, PPL, DCVSPG, EEPL, CPL, DPL, SRPL.]

Different Categories of PTL Circuit Designs (2/2)
Dual-rail
– CPL (Complementary Pass-transistor Logic)
– DPL (Double Pass-transistor Logic)
– SRPL (Swing-Restored Pass-transistor Logic)
– EEPL (Energy Economized Pass-transistor Logic)
– PPL (Push-Pull Pass-transistor Logic)
– CVSL (Cascode Voltage Switch Logic)
– DCVSPG (Differential Cascode Voltage Switch with Pass-Gate)
Single-rail
– LEAP (LEAn-integration Pass-transistor logic)
– CMOSTG (CMOS with Transmission Gate)
– PTL_n & PTL_np

PTL Cells (Dual-Rail)
[Figure: circuit schematics of the CPL, SRPL, and DPL cells, each with select signal S, data inputs A and B, and output Y.]

PTL Cells (Single-Rail)
[Figure: circuit schematics of the LEAP, CMOSTG, PTL_n, and PTL_np cells, each with select signal S, data inputs A and B, and output Y.]

Traditional PTL Synthesis (1/2)
BDD
– Binary Decision Diagram
– MUX-based
Example
\[ f(A, B, C, D, E, F) = A + B + C + D + E + F \]
[Figure: BDD of f with decision nodes A to E. Each 1-branch leads to the terminal 1, each 0-branch leads to the next node, and the 0-branch of E leads to F. Beside it, the corresponding chain of 2-to-1 MUXes selected by A to E.]
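The BDD-to-MUX correspondence can be sketched as follows: each BDD decision node becomes one 2-to-1 MUX whose select line is the node variable. This is a minimal sketch, assuming the example function is the six-input OR; the tuple encoding of BDD nodes and the helper names (`build_or_chain`, `evaluate`, `count_muxes`) are hypothetical.

```python
from itertools import product


def mux(sel, in0, in1):
    """2-to-1 multiplexer: in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def build_or_chain(variables):
    """BDD of an OR chain as nested (var, low_child, high_child) tuples.

    Terminals are the constants 0 and 1 or a variable name; the last
    variable is fed directly to a MUX data input.
    """
    node = variables[-1]
    for var in reversed(variables[:-1]):
        node = (var, node, 1)  # 1-branch goes straight to terminal 1
    return node


def evaluate(node, env):
    """Evaluate the BDD by treating every decision node as one MUX."""
    if isinstance(node, tuple):
        var, low, high = node
        return mux(env[var], evaluate(low, env), evaluate(high, env))
    if isinstance(node, str):
        return env[node]
    return node  # constant terminal


def count_muxes(node):
    """Number of MUX cells needed, one per decision node."""
    if not isinstance(node, tuple):
        return 0
    return 1 + count_muxes(node[1]) + count_muxes(node[2])


names = ["A", "B", "C", "D", "E", "F"]
bdd = build_or_chain(names)
results = {
    bits: evaluate(bdd, dict(zip(names, bits)))
    for bits in product((0, 1), repeat=len(names))
}
```

For this function the chain needs five MUXes in series, which illustrates why a single-level BDD mapping gives a small area but a critical path that grows with the number of primary inputs.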

Why Use BDDs
Employing BDDs for PTL ensures a sneak-path-free implementation
– Sneak path
  – Both nMOS switches are turned on at the same time
  – Results in larger power dissipation

Traditional PTL Synthesis (2/2)
Single-level
– Small area
– Long critical path
  – Depends on the number of primary inputs
Multi-level
– Large area
– Short critical path
[Figure: single-level BDD over inputs A to F with its MUX chain, compared with a multi-level decomposition that introduces an intermediate signal X and its shorter MUX network.]

Design Abstraction Levels (1/2)
[Figure: design abstraction hierarchy with the levels SYSTEM, MODULE, GATE, CIRCUIT, and DEVICE, annotated with the position of cell-based design.]

Design Abstraction Levels (2/2)
[Figure: the same design at successive levels: system (system interface / interface circuit, A/D, register, RAM), block or algorithm, schematic representation (74LS02 and 74LS01 gates with inputs A, B, C, D and output Y), devices and connections (transistors M1 and M2 between VDD and VSS), and physical layout.]

Cell-Based Design Flow
Specification → Architecture Design → RTL Coding → Logic Synthesis (with CMOS standard cells) → Floorplanning and Placement & Route → Tape-out
[Figure: example gate-level netlist with inputs A to E and outputs O1 and O2.]

Logic Synthesis
Synthesis = Translation + Optimization + Mapping
Translation = HDL code → Generic Boolean (GTECH)
– Technology independent
Optimization + Mapping = Generic Boolean → Target Technology
– Technology dependent

Layout View of C432

IC Fabrication

Cell-Based Design Flow vs. Software Design Flow
Cell-based design flow: Verilog/VHDL → Synthesis (cell library, logic cells) → logic circuit → Place & Route (cell library, physical cells) → layout → UMC/TSMC
Software design flow: C/C++ → Compiler (instruction set) → assembly code → Assembler (instruction set) → machine code → memory

Outline
Introduction
Overview of the Cell-Based Design Flow
Pure PTL Synthesis Embedded in the Cell-Based Design Flow
– Basic PTL cells
– Pure PTL synthesis method
Hybrid CMOS/PTL Synthesis Embedded in the Cell-Based Design Flow
Application
Conclusion

Basic PTL Cells
2-to-1 multiplexer
– MUX
Regular CMOS inverter
– INV
Special inverter with a weak feedback pMOS
– PINV
  – Level-restoring
[Figure: schematics and symbols of the MUX (data inputs In1 and In2, Select signals, output Out), the INV, and the PINV design with its weak feedback pMOS.]

Why PINV Insertion Is Needed (1/2)
[Figure: an nMOS pass transistor driving an inverter while In rises from 0 to VDD (device sizes 0.5/0.25, 0.5/0.25, and 1.5/0.25). The plot of voltage (V) versus time (ns) for In, x, and Out shows that the internal node x only reaches 1.8 V.]

Why PINV Insertion Is Needed (2/2)
[Figure: a chain of k series MUXes (nMOS pass transistors N1 to Nk) between an input inverter and a PINV, shown for k=3 (MUX1 to MUX3) and k=4 (MUX1 to MUX4), with the equivalent RC ladder: driver resistance R_p with capacitance C_n + C_p + C_p' at its output, pass-transistor resistances R_n1 to R_nk, internal node capacitances C_n1, C_n2 + 2C_n1, C_n3 + 2C_n2, and so on, and C_load at the far end.]
Delay of the RC ladder for k=3:
\[ \tau_1 = R_p (C_p + C_p' + C_n + C_{n1}) + (R_p + R_{n1})(2C_{n1} + C_{n2}) + (R_p + R_{n1} + R_{n2})(2C_{n2} + C_{n3}) + (R_p + R_{n1} + R_{n2} + R_{n3})(2C_{n3} + C_{load}) \]
Delay of the RC ladder for k=4:
\[ \tau_2 = R_p (C_p + C_p' + C_n + C_{n1}) + (R_p + R_{n1})(2C_{n1} + C_{n2}) + (R_p + R_{n1} + R_{n2})(2C_{n2} + C_{n3}) + (R_p + R_{n1} + R_{n2} + R_{n3})(2C_{n3} + C_{n4}) + (R_p + R_{n1} + R_{n2} + R_{n3} + R_{n4})(2C_{n4} + C_{load}) \]
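The delay expressions for the MUX chain have the form of an Elmore delay over an RC ladder: each node capacitance is multiplied by the total resistance between the driver and that node. The following is a minimal sketch under simplifying assumptions: all pass transistors are identical, and the parameter names and default values (`r_p`, `r_n`, `c_drv`, `c_n`, `c_load`, in arbitrary units) are hypothetical.

```python
def elmore(resistances, capacitances):
    """Elmore delay of an RC ladder: sum over nodes of (upstream R) * C."""
    total = 0.0
    r_path = 0.0
    for r, c in zip(resistances, capacitances):
        r_path += r          # resistance from the driver to this node
        total += r_path * c
    return total


def mux_chain_delay(k, r_p=1.0, r_n=1.0, c_drv=3.0, c_n=1.0, c_load=2.0):
    """Delay of k series pass transistors driven through resistance r_p.

    c_drv stands for the driver output capacitance (Cn + Cp + Cp').
    Node capacitances follow the pattern of the slide's expressions:
    first node c_drv + c_n, internal nodes 2*c_n + c_n, last node
    2*c_n + c_load.
    """
    rs = [r_p] + [r_n] * k
    cs = [c_drv + c_n] + [2 * c_n + c_n] * (k - 1) + [2 * c_n + c_load]
    return elmore(rs, cs)


delay_k3 = mux_chain_delay(3)
delay_k4 = mux_chain_delay(4)
```

Because every added stage increases both the path resistance and the number of loaded nodes, the delay grows faster than linearly with k, which is the motivation for restoring the signal with a PINV after a bounded number of MUX levels.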

Circuit Design of PINV (1/2)
MOS size ratio of PINV
– Logic "0" passing through nMOS is strong
– Logic "1" passing through nMOS is weak
Need a low-skew inverter design
– UMC 90 nm
  – 1/2.5
[Figure: MUX with select signal A, data inputs In1 and In2, followed by a PINV driving Out.]

Circuit Design of PINV (2/2)
Feedback pMOS
– A ratioed circuit design, causing temporary fighting
– Avoid malfunction
  – Keep the feedback pMOS as small as possible
  – Set a restriction on the maximum fanout load

Pure PTL Synthesis Embedded in Standard Cell-Based Design Flow
[Figure: cell-based design flow (Specification → Architecture Design → RTL Coding → Logic Synthesis → Floorplanning and Placement & Route → Tape-out) in which the logic synthesis step with CMOS standard cells is replaced by our PTL synthesis: logic minimization with standard logic cells, followed by PTL synthesis with PTL cells.]
PTL synthesis
1. Logic mapping based on PTL cells
2. Buffer elimination

PTL Library Logic Cells
One-level PTL logic cell circuit
Two-level PTL logic cell circuits
[Figure: a one-level cell (one MUX selected by A, data inputs In1 and In2, PINV, output Out) and a two-level cell (MUXes selected by A, B, and C, data inputs In1 to In4, PINV, output Out).]

One-Level PTL Logic Cells

Selection of One-Level PTL Logic Cells
Using all the one-level PTL logic cells is best

Multi-Level PTL Logic Cells
Case I: no clear advantage
Case II: much better circuit

PTL Logic Simplification
Remove redundant inverters
– Called buffer elimination
Use simple regular inverters
– For example, at primary inputs

Buffer Elimination (1/3)
Method
– If possible, move the PINV from the output of a MUX to its two inputs
Four exceptions
– The PINV cannot be moved
  – k-level rule (k=3)
[Figure: MUX with data inputs A1 and A2, select signals S and SB, and output Z, before and after the PINV is moved from Z to A1 and A2. A chain of PINV cells leading to a primary output illustrates the k-level check.]
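The move used by buffer elimination relies on a simple logic identity: inverting the output of a MUX is equivalent to inverting both of its data inputs. The following is a minimal sketch that checks this identity exhaustively, assuming the PINV is treated as an ideal logical inverter (its level-restoring role and the k-level rule are not modeled); the function names are hypothetical.

```python
from itertools import product


def mux(sel, in0, in1):
    """2-to-1 multiplexer: in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def inv(x):
    """Ideal logical inverter."""
    return 1 - x


def inverter_at_output(s, a1, a2):
    return inv(mux(s, a1, a2))


def inverters_at_inputs(s, a1, a2):
    return mux(s, inv(a1), inv(a2))


# Compare the two placements for every input combination.
mismatches = [
    (s, a1, a2)
    for s, a1, a2 in product((0, 1), repeat=3)
    if inverter_at_output(s, a1, a2) != inverters_at_inputs(s, a1, a2)
]

# Two inverters that end up back to back cancel and can be removed.
double_inversion_is_identity = all(inv(inv(x)) == x for x in (0, 1))
```

Once an inverter has been pushed to the inputs, it may meet another inverter on the same net, and the resulting back-to-back pair is what the elimination step removes.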

Buffer Elimination (2/3)
Inverter elimination (three cases)
[Figure: three inverter-elimination cases on signals a, b, and c, showing the PINV cells before and after elimination; two of the cases are annotated "k-level check passed".]

Example (C17)

Buffer Elimination (3/3)
Number of inverters before and after buffer elimination
– 54%~60% saving rate

Experimental Results
UMC 90 nm technology
The area of some PTL circuits is larger than that of CMOS
– Layout problem of pure nMOS cells

Layout Problems of PTL Basic Cells
Generic λ design rules
– Need a safe distance along the cell boundary
Solutions
– Separation of rows
– No pure nMOS cells
[Figure: cell layouts with dimensions in units of λ, showing the N-well, P+ and N+ regions, VDD and GND rails, well taps and substrate taps, the cell widths W_MUX and W_PINV, and a separate cell of well/substrate taps.]

Why Design Rules Are Needed (1/2)

Why Design Rules Are Needed (2/2)

Layout Compaction Methods
Separation of rows for MUX and PINV cells
[Figure: a row with nMOS only (MUX cells) and a row with pMOS + nMOS (PINV/INV cells), with VDD and GND rails, well taps, and substrate taps.]

Experimental Results
Separation of rows
ISCAS'85 benchmark circuit C432

Merging Basic PTL Cells (1/2)
Eliminate pure nMOS cells
– Merge of different types of basic cells

Merging Basic PTL Cells (2/2)
– Merge of same types of basic cells

Experimental Results
Merge of basic PTL cells
ISCAS'85 benchmark circuit C432

Synthesis Results
Use merging of basic PTL cells
Compare with the CMOS cell library
– Area-optimization constraint
– Better area, power, and area-delay-power product

Observations from Pure PTL Synthesis
CMOS is in general faster than PTL
– Critical paths use CMOS cells
– Non-critical paths use PTL cells
  – To reduce area and power
Problems with timing estimation
– Some optimizations are applied after PTL synthesis
  – Buffer elimination
  – Layout compaction
– Problems
  – Final circuits might not satisfy the original design constraints
  – Timing violations may result

PTL vs. CMOS (1/2)
[Source] R. S. Shelar and S. S. Sapatnekar, "BDD Decomposition for Delay Oriented Pass Transistor Logic Synthesis," IEEE Trans. VLSI Systems, Vol. 13, No. 8, pp. 957-970, Aug. 2005.

PTL vs. CMOS (2/2)

Hybrid PTL/CMOS Synthesis Flow
PTL basic physical cells
– MUX
  – Includes an inverter
– INV
– PINV
[Figure: cell-based design flow (Specification → Architecture Design → RTL Coding → Logic Synthesis → Floorplanning and Placement & Route → Tape-out) in which the logic synthesis step with CMOS standard cells is replaced by our PTL synthesis: logic synthesis using the multi-level PTL logic cell library together with the CMOS standard cells, followed by PTL basic-cell mapping (our programs) onto the PTL basic physical cell library. Example logic cells: CELL1_NOR2 and CELL2_18_1_1.]

One-Level PTL Logic Cells

Multi-Level PTL Logic Cells
n-level PTL logic functions
– Each input signal
  – {V_DD, GND, variable}
– Without any simplification
  – $3^{2^n}$ PTL logic cells
Three-level PTL logic cells
– 4014 cells (after simplification)
– Examples
[Figure: general n-level MUX tree with select signals S1, S2_1, S2_2, up to the n-th level, data inputs In1 to In2^n, and a PINV at the output Out; two example three-level cells, one with inputs A to I and one with inputs A, B, D, G, and H.]

PTL Logic Cells for XOR
2-input XOR
3-input XOR
– Method 1
– Method 2
[Figure: MUX-based schematics of the 2-input XOR (inputs A and B, output ZN) and of the two 3-input XOR implementations (inputs A, B, and C, output Out), each with PINV cells.]

Cell Characterization for PTL Logic Cells (1/3)
Example
– CMOS AND3 + level-3 PTL cell
  – Delay calculation for separate cells
\[ \tau_{nand3} = R_1 C_1 + (R_1 + R_2) C_2 + (R_1 + R_2 + R_3) C_3 \]
\[ \tau_{level3\_PTL} = R_4 C_4 + (R_4 + R_5) C_5 + (R_4 + R_5 + R_6) C_6 + R_7 C_7 \]
\[ \tau_{synopsys} = \tau_{nand3} + \tau_{level3\_PTL} \]
  – More accurate delay
\[ \tau_{true} = R_1 C_1 + (R_1 + R_2) C_2 + (R_1 + R_2 + R_3) C_3 + (R_1 + R_2 + R_3 + R_4) C_4 + (R_1 + R_2 + R_3 + R_4 + R_5) C_5 + (R_1 + R_2 + R_3 + R_4 + R_5 + R_6) C_6 + R_7 C_7 \]
  – Difference of the above two delay calculations
\[ \tau_{true} - \tau_{synopsys} = (R_1 + R_2 + R_3)(C_4 + C_5 + C_6) \]

Cell Characterization for PTL Logic Cells (2/3)
[Figure: RC network used for the delay calculation, with resistances R1 to R7, capacitances C1 to C7, internal nodes 3 to 6, the PINV output ZN, and FO4 loads.]

Cell Characterization for PTL Logic Cells (3/3)
Solution
– Model the drain input capacitance (at node 3) of the three-level PTL logic cell as $C_3 + C_4 + C_5 + C_6$ instead of $C_3$
Comparison with SPICE simulation
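The effect of the capacitance correction can be checked with a small numeric sketch. It assumes the RC model of the characterization example: a CMOS gate with R1 to R3 and C1 to C3 drives the drain input of a three-level PTL cell with R4 to R6 and C4 to C6, followed by an output stage R7, C7. The resistance and capacitance values are hypothetical and in arbitrary units.

```python
# Hypothetical element values, indexed as in the RC model (R1..R7, C1..C7).
R = {1: 2, 2: 3, 3: 1, 4: 4, 5: 2, 6: 5, 7: 3}
C = {1: 1, 2: 2, 3: 3, 4: 1, 5: 2, 6: 1, 7: 4}


def tau_gate(c3):
    """Delay of the CMOS gate alone, with c3 as its output load."""
    return R[1] * C[1] + (R[1] + R[2]) * C[2] + (R[1] + R[2] + R[3]) * c3


def tau_ptl():
    """Delay of the three-level PTL cell characterized on its own."""
    return (R[4] * C[4] + (R[4] + R[5]) * C[5]
            + (R[4] + R[5] + R[6]) * C[6] + R[7] * C[7])


def tau_true():
    """Delay of the whole path, where R1..R6 form one continuous ladder."""
    total, r_path = 0, 0
    for i in range(1, 7):
        r_path += R[i]
        total += r_path * C[i]
    return total + R[7] * C[7]


separate = tau_gate(C[3]) + tau_ptl()                        # per-cell sum
gap = tau_true() - separate                                  # underestimate
corrected = tau_gate(C[3] + C[4] + C[5] + C[6]) + tau_ptl()  # lumped load
```

Summing the per-cell delays misses the term in which the gate resistances charge the internal PTL nodes; declaring the drain input capacitance as the lumped sum restores exactly that term, so the per-cell sum matches the full-path delay in this model.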

Some Improvements for PTL Logic Cells
MUX_NP
– Used when one drain input of the MUX is always connected to V_DD
Inverter reduction
[Figure: MUX_NP cell with a Select signal, one data input In1, and Output; inverter-reduction example on signals A, B, and C in which one inverter can be removed.]

Experimental Results (1/2)
Synthesis results using UMC 90 nm technology
– Post-layout simulation

Experimental Results (2/2)
Cell utilization rate (%) for hybrid PTL/CMOS synthesis
Critical paths with delay-optimized synthesis constraint

Outline
Introduction
Overview of the Cell-Based Design Flow
Pure PTL Synthesis Embedded in the Cell-Based Design Flow
Hybrid CMOS/PTL Synthesis Embedded in the Cell-Based Design Flow
Application
– Reciprocal Function with Hybrid Piecewise Polynomial and Newton-Raphson Method
Conclusion

Piecewise Polynomial Method (1/3)
Polynomial approximation
\[ f(x) \approx p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} + a_n x^n = \sum_{i=0}^{n} a_i x^i \]
Degree n means that the highest-order term of the approximation is $x^n$.
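A degree-n polynomial is usually evaluated with Horner's rule, which needs n multiplications and n additions. The following is a minimal sketch; the function name `horner` and the sample coefficients are hypothetical.

```python
def horner(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x**n with coeffs = [a0, a1, ..., an]."""
    acc = 0.0
    for a in reversed(coeffs):
        acc = acc * x + a   # fold in one coefficient per step
    return acc


# Example: p(x) = 1 + 2x + 3x^2 evaluated at x = 2.
value = horner([1, 2, 3], 2)
```

In hardware the same trade-off appears as a choice between a chain of multiply-add stages and a direct sum-of-powers datapath with a squarer and parallel multipliers.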

Piecewise Polynomial Method (2/3)
Example: $f(x) = \log_2 x$, n = 5, m = 2
The N-bit input x is split into an m-bit part $x_m$ and an (N-m)-bit part $x_l$.
\[ f(x) \approx p(x) = a_0(x_m) + a_1(x_m)\, x_l \]
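The piecewise scheme can be sketched in software: the upper m bits of the input select a table entry and the remaining bits form the polynomial argument. This is a minimal sketch for $\log_2 x$ on [1, 2) with degree-1 segments. The coefficients are taken from the chord through each segment's endpoints, which is an assumption made here for simplicity; the fitting method used on the slide is not stated. The helper names are hypothetical.

```python
import math


def build_table(m):
    """One (a0, a1) pair per segment; 2**m segments cover [1, 2)."""
    table = []
    for k in range(2 ** m):
        lo = 1 + k / 2 ** m
        hi = 1 + (k + 1) / 2 ** m
        a0 = math.log2(lo)                              # value at segment start
        a1 = (math.log2(hi) - math.log2(lo)) / (hi - lo)  # chord slope
        table.append((a0, a1))
    return table


def approx_log2(x, m, table):
    """Approximate log2(x) for 1 <= x < 2 as a0(x_m) + a1(x_m) * x_l."""
    frac = x - 1.0
    k = int(frac * 2 ** m)      # x_m: upper m fraction bits, the table index
    x_l = frac - k / 2 ** m     # x_l: remaining lower bits
    a0, a1 = table[k]
    return a0 + a1 * x_l


def max_error(m, samples=1024):
    table = build_table(m)
    return max(
        abs(approx_log2(1 + i / samples, m, table) - math.log2(1 + i / samples))
        for i in range(samples)
    )


error_m2 = max_error(2)
error_m4 = max_error(4)
```

Each extra index bit doubles the table size and reduces the error of a degree-1 segment by roughly a factor of four, which is the area versus accuracy trade-off that motivates using a higher degree with a smaller table.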

Piecewise Polynomial Method (3/3)
Architecture for degree n

Newton-Raphson (NR) Method
\[ x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)} \]
$\frac{1}{d}$ is the root of $f(x) = \frac{1}{x} - d$ or $f(x) = d - \frac{1}{x}$, so
\[ x_{i+1} = x_i - \frac{\frac{1}{x_i} - d}{-\frac{1}{x_i^2}} = 2x_i - d x_i^2 = x_i (2 - d x_i) \]
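The reciprocal iteration uses only multiplications and one subtraction per step. The following is a minimal sketch in floating point; the function name `nr_reciprocal`, the divisor, and the starting value are hypothetical, and the starting value is assumed to lie in the convergence range 0 < x0 < 2/d.

```python
def nr_reciprocal(d, x0, iterations):
    """Approximate 1/d with the iteration x <- x * (2 - d * x)."""
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - d * x)
    return x


# Example: 1/1.5 starting from a rough initial guess of 0.5.
approx = nr_reciprocal(1.5, 0.5, 5)
```

The number of iterations needed depends strongly on the quality of the initial guess, which is why a table-based or polynomial initial approximation is commonly placed in front of the iteration.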

Error Analysis of NR for Reciprocal (1/2)
The error is
\[ \varepsilon_{x_i} = x_i - \frac{1}{d} \]
So $x_i$ is
\[ x_i = \frac{1}{d} + \varepsilon_{x_i} \]
Inserting $x_i$ into the iteration $x_{i+1} = x_i (2 - d x_i)$ yields
\[ x_{i+1} = \left(\frac{1}{d} + \varepsilon_{x_i}\right)\left(2 - d\left(\frac{1}{d} + \varepsilon_{x_i}\right)\right) = \frac{1}{d} - d\,\varepsilon_{x_i}^2 \]

Error Analysis of NR for Reciprocal (2/2)
Similarly, with $\varepsilon_{x_i} = x_i - \frac{1}{d}$,
\[ \varepsilon_{x_{i+1}} = x_{i+1} - \frac{1}{d} = \left(\frac{1}{d} - d\,\varepsilon_{x_i}^2\right) - \frac{1}{d} = -d\,\varepsilon_{x_i}^2 \]
The new error is proportional to the square of the previous error.
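The quadratic error relation can be verified exactly with rational arithmetic. This is a minimal sketch, assuming the error is defined as the iterate minus 1/d, so that each step should satisfy error_next = -d * error**2; the function name `nr_errors` and the sample values are hypothetical.

```python
from fractions import Fraction


def nr_errors(d, x0, iterations):
    """Exact errors x_i - 1/d for the iteration x <- x * (2 - d * x)."""
    d = Fraction(d)
    x = Fraction(x0)
    errors = [x - 1 / d]
    for _ in range(iterations):
        x = x * (2 - d * x)
        errors.append(x - 1 / d)
    return errors


d = Fraction(3, 2)
errors = nr_errors(d, Fraction(1, 2), 4)

# Each error should equal -d times the square of the previous one.
relation_holds = all(
    errors[i + 1] == -d * errors[i] ** 2 for i in range(len(errors) - 1)
)
```

Because the error is squared at every step, the number of correct bits roughly doubles per iteration, so an initial approximation accurate to about 3m bits reaches about 6m bits after a single iteration.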

Reciprocal Function Unit Using Hybrid Piecewise Polynomial and Newton-Raphson
Polynomial of degree 2
\[ f(x) \approx P(x) = a_0(x_m) + a_1(x_m)\, x_l + a_2(x_m)\, x_l^2 \]
Newton-Raphson (NR) iterations
\[ x_{i+1} = x_i (2 - d x_i) = 2x_i - d x_i^2, \quad i = 0, 1, \ldots \]
[Figure: datapath in which $x_m$ addresses the coefficient tables (a0, a1, a2) and $x_l$ feeds the squarer; Mult-1, Mult-2, and the multi-operand adder form the result. Sqr: squarer; Mult-1, Mult-2: multipliers.]
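The hybrid scheme can be sketched end to end: a degree-2 piecewise polynomial indexed by the upper m fraction bits gives the initial value, and NR iterations refine it. This is a minimal sketch for d in [1, 2). The coefficients come from a Taylor expansion at the start of each segment, which is an assumption made here for simplicity (the coefficient fitting used in the actual design is not stated), and they are computed on the fly rather than read from a table. The function names are hypothetical.

```python
def seed_reciprocal(d, m):
    """Degree-2 piecewise approximation of 1/d for 1 <= d < 2."""
    k = int((d - 1.0) * 2 ** m)   # x_m: upper m fraction bits
    c = 1.0 + k / 2 ** m          # start of the selected segment
    x_l = d - c                   # x_l: remaining lower bits
    # Taylor coefficients of 1/(c + x_l) around x_l = 0.
    a0, a1, a2 = 1 / c, -1 / c ** 2, 1 / c ** 3
    return a0 + a1 * x_l + a2 * x_l ** 2


def hybrid_reciprocal(d, m=4, nr_iterations=1):
    """Polynomial seed followed by NR steps x <- x * (2 - d * x)."""
    x = seed_reciprocal(d, m)
    for _ in range(nr_iterations):
        x = x * (2.0 - d * x)
    return x


samples = [1 + i / 997 for i in range(997)]
seed_error = max(abs(seed_reciprocal(d, 4) - 1 / d) for d in samples)
final_error = max(abs(hybrid_reciprocal(d, 4, 1) - 1 / d) for d in samples)
```

With this choice of coefficients the seed error is bounded by 2^(-3m) and one NR step brings it below 2^(-6m+1), which mirrors the 3m-bit and 6m-bit widths that appear in the architecture figures.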

Unified Architecture
Approximation of 1/d to a target accuracy in fractional bits
– According to our experiments
– Required full precision
Relations shown on the slide: $n_1 = n/2$, $n_0 = n/6$, $n = 6m$
[Figure: unified datapath with coefficient table (a0, a1, a2 of widths 3m+g, 2m+g, and m+g), squarer (Sqr), Mult-1, Mult-2, and multi-operand adder; inputs X and D; bus widths given in multiples of m, up to 6m.]

Sub-Word-Sharing Architecture
Multiplier in the Newton-Raphson operation
– $6m \times 6m$
Multipliers in the piecewise operation
– $[2m+g] \times 2m$
– $[m+g] \times 2m$
Multi-operand adder
– 3-operand
  – Instead of 5-operand
[Figure: sub-word-sharing datapath with coefficient table (a0, a1, a2 of widths 3m+g, 2m+g, and m+g), squarer (Sqr), a single multiplier (Mult), and multi-operand adder; inputs X and D; annotations "waste" and "merge" on the multiplier.]

Sub-Word-Sharing Architecture (Mult-2) (1/3)
Operand assignment for the $6m \times 6m$ multiplier

Sub-Word-Sharing Architecture (Mult-2) (2/3)

Sub-Word-Sharing Architecture (Mult-2) (3/3)
[Figure: partial-product generation for the $6m \times 6m$ multiplier. The operands are divided into m-bit sub-words 1 to 6 and A to F; partial-product generators PPG(m × 2m), PPG(2m × 2m), PPG(3m × 2m), PPG(4m × 2m), and PPG(6m × 2m) produce the sub-products A1 to F6, with PPG zero injection.]
Operand widths in the piecewise operation
– $a_0$: $[3m+g]$
– $a_1 x_{0,l}$: $[2m+g] \times 2m$
– $a_2 x_{0,l}^2$: $[m+g] \times 2m$

Comparison of Major Components in Different Architectures
Compared architectures: [P2], [P1], [P3]
[P1] M. J. Schulte, I. E. Stine, and K. E. Wires, "High-speed reciprocal approximations," Proc. 31st Asilomar Conf. on Signals, Circuits and Systems, pp. 1178-1182, 1998.
[P2] K. Umut and A. Ahmet, "Design and Implementation of Reciprocal Unit Using Table Look-up and Newton-Raphson Iteration," Proc. Euromicro Systems on Digital System Design, pp. 249-253, 2004.
[P3] J. A. Pineiro and J. D. Bruguera, "High-speed double-precision computation of reciprocal, division, square root, and inverse square root," IEEE Trans. on Computers, Vol. 51, No. 12, pp. 1377-1388, Dec. 2002.

Delay and Area Estimation Models
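
ROM (1/2)
[Equation: ROM delay model $D_{ROM}$, in gates, expressed in terms of α, f, and $\log_2 f$.]
[Figure: ROM structure with f inputs and $2^f$ lines.]
[Equation: ROM area model $A_{ROM}$, in gates, expressed in terms of α, b, f, and $2^f$.]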

ROM (2/2)
For UMC 90 nm technology

Synthesis Results (1/2)
Area comparison
– AO (area-optimized) and DO (delay-optimized)
  – Savings of 38%~46%
Delay comparison
– AO
  – Savings of 20%
– DO
  – Savings of 1%

Synthesis Results (2/2)
XOR-based cells
– Used in the Wallace tree
  – 3-2 counter
MUX-based cells
– Used in the selection unit
  – 22%~27%

Conclusion
Novel hybrid PTL/CMOS synthesis methodology
– Can be easily embedded in the standard cell-based design flow
Hybrid CMOS/PTL leads to the best results
– Area-optimized or delay-optimized synthesis

Must-Have Attitude
Your Health Is Indispensable
Time Is the Most Important Constraint
Help Your Boss
Keep Your Promises
Don't Overestimate Others or Underestimate Yourself
Work Smart & Work Right

Thanks for your attention!