Logic Restructuring for Timing Optimization


Page 1: Logic Restructuring for Timing Optimization

Outline:
• Definitions and problem statement
• Overview of techniques (motivated by adders)
  – Tree height reduction (THR)
  – Generalized bypass transform (GBX)
  – Generalized select transform (GST)
  – Partial collapsing (?)

Page 2: Logic Restructuring for Timing Optimization

Timing Optimization

Factors determining delay of circuit:
• Underlying circuit technology
  – Circuit type (e.g. domino, static CMOS, etc.)
  – Gate type
  – Gate size
• Logical structure of circuit
  – Length of computation paths
  – False paths
  – Buffering
• Parasitics
  – Wire loads
  – Layout

Page 3: Logic Restructuring for Timing Optimization

Problem Statement

Given:
• Initial circuit function description
• Library of primitive functions
• Performance constraints (arrival/required times)

Generate:
an implementation of the circuit using the primitive functions, such that:
1. performance constraints are met
2. circuit area is minimized

Page 4: Logic Restructuring for Timing Optimization

Current Design Process

[Figure: design flow, alternating representations and the steps that transform them]
Behavioral description
  → Behavior optimization (scheduling)
Logic and latches
  → Partitioning (retiming)
Logic equations
  → Logic synthesis
     • Technology independent
     • Technology mapping
Gate netlist
  → Timing driven place and route
Layout

Inputs to the flow:
• Gate library
• Perf. constraints
• Delay models

Page 5: Logic Restructuring for Timing Optimization

Technology mapping for delay

[Figure: a function tree and a buffer tree]

Page 6: Logic Restructuring for Timing Optimization

Overview of Solutions for delay

1. Circuit re-structuring
   – Rescheduling operations to reduce time of computation
2. Implementation of function trees (technology mapping)
   – Selection of gates from library
     • Minimum delay (load independent model - Kukimoto)
     • Minimize delay and area (Jongeneel, DAC’00) (combines Lehman-Watanabe and Kukimoto)
3. Implementation of buffer trees
   – Touati (LT-trees)
   – Singh
4. Resizing

Focus here on circuit re-structuring

Page 7: Logic Restructuring for Timing Optimization

Circuit re-structuring

Approaches:
Local:
• Mimic optimization techniques in adders
  – Carry lookahead (THR: tree height reduction)
  – Conditional sum (GST transformation)
  – Carry bypass (GBX transformation)
Global:
• Reduce depth of entire circuit
  – Partial collapsing
  – Boolean simplification

Page 8: Logic Restructuring for Timing Optimization

Re-structuring methods

Performance measured by
1. levels,
2. sensitizable paths,
3. technology dependent delays

• Level based optimizations:
  – Tree height reduction (Singh ‘88)
  – Partial collapsing and simplification (Touati ‘91)
  – Generalized select transform (Berman ‘90)
• Sensitizable paths
  – Generalized bypass transform (McGeer ‘91)

Page 9: Logic Restructuring for Timing Optimization

Re-structuring for delay: tree-height reduction

[Figure: a network over inputs a to g with internal nodes h to n, annotated with arrival times, shown before and after collapsing. Labels: "Critical region", "Collapsed critical region", "Duplicated logic" (node n').]

Page 10: Logic Restructuring for Timing Optimization

Restructuring for delay: path reduction

[Figure: two versions of the network over inputs a to g, annotated with arrival times. Labels: "Collapsed critical region", "Duplicated logic" (node n'). Singh ‘88.]

New delay = 5
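To make the idea of height reduction concrete, here is a minimal sketch for the simplest possible case: a single wide associative gate (for example an AND of many signals) decomposed into 2-input gates with unit delay. It compares a plain chain against a decomposition that always combines the two earliest-arriving signals first. The function names, the unit-delay model, and the arrival times are assumptions made for illustration; this is not Singh's procedure, which works on a collapsed critical region of a general network.

```python
import heapq


def chain_delay(arrivals, gate_delay=1):
    """Output arrival time when inputs are combined one after another in a chain."""
    t = arrivals[0]
    for a in arrivals[1:]:
        t = max(t, a) + gate_delay
    return t


def reduced_height_delay(arrivals, gate_delay=1):
    """Output arrival time when the two earliest signals are always combined first."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        # A 2-input gate fires one gate delay after its later input arrives.
        heapq.heappush(heap, max(first, second) + gate_delay)
    return heap[0]


# Hypothetical arrival times for seven inputs; one input arrives late.
arrivals = [0, 0, 0, 0, 2, 0, 0]
print(chain_delay(arrivals))           # 6
print(reduced_height_delay(arrivals))  # 4
```

The late input ends up close to the output of the tree, so it passes through fewer gates than it would in the chain.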

Page 11: Logic Restructuring for Timing Optimization

Generalized bypass transform (GBX)

• Make critical path false
  – Speed up the circuit
• Bypass logic of critical path(s)

McGeer ‘91

[Figure: a critical path f_m = f, f_{m+1}, …, f_n = g, before and after the transform. After the transform a multiplexor with data inputs 0 and 1 produces the new output g'; it is controlled by the Boolean difference dg/df. One fault is marked "s-a-0 redundant".]
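The bypass idea can be checked on the adder case that motivates it. The sketch below models a block of a ripple carry chain, forms the Boolean difference of the block's carry-out with respect to its carry-in, and uses it to select between the carry-in (bypass) and the slow ripple output. The function names and the generate/propagate encoding are assumptions for illustration. The bypass value is simply the carry-in here because the carry-out never decreases when the carry-in rises; a general GBX would also have to handle the inverting case.

```python
from itertools import product


def ripple_carry(c_in, generates, propagates):
    """Carry-out of a block of stages, each computing c = g | (p & c)."""
    c = c_in
    for g, p in zip(generates, propagates):
        c = g | (p & c)
    return c


def boolean_difference(c_out, generates, propagates):
    """1 exactly when flipping the carry-in flips the carry-out."""
    return c_out(0, generates, propagates) ^ c_out(1, generates, propagates)


def bypassed_carry(c_in, generates, propagates):
    """Multiplexor: take the bypass when the output depends on the carry-in."""
    select = boolean_difference(ripple_carry, generates, propagates)
    slow = ripple_carry(c_in, generates, propagates)
    return c_in if select else slow


def equivalent(stages=3):
    """Exhaustively compare the bypassed block with the original block."""
    for bits in product((0, 1), repeat=2 * stages + 1):
        c_in, gs, ps = bits[0], bits[1:stages + 1], bits[stages + 1:]
        if bypassed_carry(c_in, gs, ps) != ripple_carry(c_in, gs, ps):
            return False
    return True


print(equivalent(3))
```

Whenever the select signal is 0, the carry-out does not depend on the carry-in, so the long path from carry-in through the ripple logic is never the one that determines the output.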

Page 12: Logic Restructuring for Timing Optimization

GBX and KMS transform

GBX gives little area increase, BUT it creates an untestable fault (on the control input to the multiplexor)

KMS transform: (remove false paths without increasing delay)
1. f_k is the last node on the false path that fans out.
2. Duplicate false path {f_1, …, f_k} -> {f'_1, …, f'_k}
3. f'_j fans out to every fanout of f_j except f_{j+1}, and f_j just fans out to f_{j+1}
4. Set the f_0 input to f_1 to its controlling value and propagate the constant (can do because path is false and does not fan out)

KMS results
1. Function of every node, except f_1, …, f_k, is unchanged
2. Added k-1 nodes
3. Area added is linear in the length of the false paths; in practice small area increase.

Page 13: Logic Restructuring for Timing Optimization

KMS (Keutzer, Malik, Saldanha ‘90)

[Figure: the path f_m, f_{m+1}, …, f_k, f_{k+1}, …, f_n before and after the transform. After the transform the duplicated nodes f'_m, f'_{m+1}, …, f'_k appear alongside the original path, which is marked with a constant 0.]

Delay is not increased

Page 14: Logic Restructuring for Timing Optimization

End of lecture 20

Page 15: Logic Restructuring for Timing Optimization

Generalized select transform (GST)

Late signal feeds multiplexor

[Figure: a circuit with inputs a to g and output out, where a is the late signal. After the transform the logic over b to g appears twice, once evaluated with a=0 and once with a=1, and a multiplexor controlled by a selects out.]

Berman ‘90
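The select transform is an application of Shannon expansion on the late signal: out = a'·out(a=0) + a·out(a=1). The sketch below builds the two cofactors of an arbitrary Boolean function and recombines them with a multiplexor on the late input. The helper name `select_transform`, the `late_index` parameter, and the example function are hypothetical and chosen only for illustration.

```python
from itertools import product


def select_transform(fn, late_index):
    """Return a function equal to fn, computed as mux(late, f0, f1)."""

    def cofactor(value):
        def fixed(*early):
            args = list(early)
            args.insert(late_index, value)  # pin the late input to a constant
            return fn(*args)
        return fixed

    f0, f1 = cofactor(0), cofactor(1)  # both depend only on the early inputs

    def transformed(*args):
        early = list(args)
        late = early.pop(late_index)
        # The late signal is used only here, as the multiplexor select.
        return f1(*early) if late else f0(*early)

    return transformed


def example(a, b, c, d):
    """Hypothetical logic in which a is the late-arriving input."""
    return ((a & b) | c) ^ d


fast = select_transform(example, late_index=0)
print(all(fast(*v) == example(*v) for v in product((0, 1), repeat=4)))
```

In hardware both cofactors are computed while the late signal is still settling, so the late signal only has to pass through the multiplexor; the cost is the duplicated logic.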

Page 16: Logic Restructuring for Timing Optimization

GST vs GBX

[Figure: the same circuit (late input a, other inputs b to g, output h) transformed two ways. GST: the two copies of the logic, evaluated with a=0 and a=1, feed a multiplexor controlled by a. GBX: the same two copies form the Boolean difference dh/da, which controls the multiplexor (data inputs 0 and 1) producing g'.]

Note: Boolean difference $dh/da = h_a \oplus h_{\bar{a}}$

Page 17: Logic Restructuring for Timing Optimization

GST vs GBX

• Select transform appears to be more area efficient
• But Boolean difference generally more efficiently formed in practice
• No delay/speedup advantage for either transform
• Need
  – one MUX per fanout in GST,
  – only one MUX in GBX

[Figure: GST with two fanouts. The two copies of the logic (a=0, a=1) feed two multiplexors, each controlled by a, producing out1 and out2.]

Page 18: Logic Restructuring for Timing Optimization

Technology independent delay reductions

Generally THR, GBX, GST (critical path based methods) work OK, but not great

Why are technology independent delay reductions hard?

Lack of fast and accurate delay models
1. # levels: fast but crude
2. # levels + correction term (fanout, wires, …): a little better, but still crude (what coefficients to use?)
3. Technology mapped: reasonable, but very slow
4. Place and route: better but extremely slow
5. Silicon: best, but infeasibly slow (except for FPGAs)

(Going down the list, the models get better and slower.)

Page 19: Logic Restructuring for Timing Optimization

Clustering/partial-collapse

Traditional critical-path based methods require
– Well defined critical path
– Good delay/slack information

Problems:
– Good delay information comes from mapper and layout
– Delay estimates and models are weak

Possible solutions:
– Better delay modeling at technology independent level
– Make speedup insensitive to actual critical paths and mapped delays

Page 20: Logic Restructuring for Timing Optimization

Clustering/partial-collapse

Two-level circuits are fast
– Collapse circuit to 2-level - but
  • Huge area penalty
  • Huge capacitive loading on inputs (can be much slower)

To avoid huge area penalty
– Identify clusters of nodes
  • Each cluster has some fixed size
– Perform collapse of each cluster
– Simplify each node

Details
– How to choose the clusters?
– How to choose cluster size?
– How to simplify each node?

Page 21: Logic Restructuring for Timing Optimization

Lawler’s clustering algorithm

• Optimal in delay:
  – For a given clustering size
• May duplicate nodes (hence possible area penalty)
  – Not optimal w.r.t. duplication
  – Use a heuristic
• Fast: O(m x k)
  – m = number of edges in network
  – k = maximum cluster size

Page 22: Logic Restructuring for Timing Optimization

Clustering algorithm - overview

1. Label phase: (k is cluster size)
   – If node u is an input, label(u) := L := 0
     • Else L := max label of fanin of u
   – If (# nodes in TFI(u) with (label = L) >= k) label(u) := L+1, otherwise label(u) := L
2. Cluster phase: (outputs to inputs)
   – If node u is an output, L := infinity
     • Else L := max label of fanouts of u
   – If (label(u) < L) then create a new cluster with “root” u and with members all the nodes in TFI(u) with label = label(u)
3. Collapse phase: (order independent)
   – Collapse all nodes in a cluster into a single node
   – Note: a node may be in several clusters (causes area increase)
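The label and cluster phases can be sketched directly from the rules above. This is a minimal illustration, not a tuned implementation: it recomputes each transitive fanin explicitly, so it does not achieve the O(m x k) bound. The names `label_and_cluster`, `fanins`, `order` and `outputs` are hypothetical. Two details are assumptions: primary inputs get label 0 but are not counted toward the cluster size and are not cluster members, and every gate is assumed to be an output or to have at least one fanout.

```python
def transitive_fanin(fanins, node):
    """All nodes with a path to `node`, excluding `node` itself."""
    seen, stack = set(), list(fanins.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(fanins.get(n, []))
    return seen


def label_and_cluster(fanins, order, outputs, k):
    """fanins: gate -> fanin list (names missing from fanins are primary inputs).
    order: the gates in topological order. Returns (labels, clusters)."""
    tfi = {u: transitive_fanin(fanins, u) for u in order}
    label = {}

    # Label phase, inputs to outputs.
    for u in order:
        for v in fanins[u]:
            label.setdefault(v, 0)  # primary inputs get label 0
        L = max(label[v] for v in fanins[u])
        # Assumption: only gates (not primary inputs) count toward k.
        same = sum(1 for v in tfi[u] if v in fanins and label[v] == L)
        label[u] = L + 1 if same >= k else L

    fanouts = {u: [] for u in order}
    for u in order:
        for v in fanins[u]:
            if v in fanouts:
                fanouts[v].append(u)

    # Cluster phase, outputs to inputs.
    clusters = {}
    for u in reversed(order):
        if u in outputs:
            L = float("inf")
        else:
            L = max(label[w] for w in fanouts[u])
        if label[u] < L:
            members = {v for v in tfi[u] if v in fanins and label[v] == label[u]}
            clusters[u] = members | {u}
    return label, clusters


# Hypothetical chain of five 2-input gates, clustered with k = 3.
fanins = {"g1": ["a", "b"], "g2": ["g1", "c"], "g3": ["g2", "d"],
          "g4": ["g3", "e"], "g5": ["g4", "f"]}
order = ["g1", "g2", "g3", "g4", "g5"]
labels, clusters = label_and_cluster(fanins, order, outputs={"g5"}, k=3)
print(labels)
print(clusters)
```

For this chain the five gates fall into two clusters, rooted at g3 and g5, so after collapsing the network has two levels instead of five. No node is duplicated here because the chain has no reconvergent fanout.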

Page 23: Logic Restructuring for Timing Optimization

Example of clustering

[Figure: an example network with the node labels computed for k = 3]

Result: Lawler’s algorithm gives minimum depth circuit

Typically,
1. we decompose initial circuit into 2-input NANDs and inverters.
2. then cluster size k reflects # 2-input NANDs to be collapsed together.

Page 24: Logic Restructuring for Timing Optimization

Choosing k

• I(k): number of levels, given k
• d(k): duplication ratio
  – Number of gates in cluster network divided by number of gates in original network
• Determine k_0 where k_0/d(k_0) ~ 2.0
• For every k from 2 to k_0, compute d(k), I(k)
  – Use exhaustive enumeration: label and cluster (without collapse) for each k.
  – Each iteration is O(|E|k)
• Choose k such that
  – I(k) is minimized
    • Break ties using d(k)
  – Minimize d(k)

[Figure: plot of d(k) and I(k) against k, with the k axis marked 1, 2, …, k_0]
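The selection rule amounts to a lexicographic minimum over (I(k), d(k)). The sketch below shows that rule together with one way to compute the duplication ratio from a clustering. The function names and all numbers are hypothetical. Counting the gates of the cluster network as the sum of the cluster sizes, so that a node in several clusters is counted once per cluster, is an assumption about how d(k) is measured.

```python
def duplication_ratio(clusters, num_original_gates):
    """Gates in the clustered network divided by gates in the original one.

    Assumption: a gate that belongs to several clusters is counted once per cluster.
    """
    clustered_gates = sum(len(members) for members in clusters.values())
    return clustered_gates / num_original_gates


def choose_k(levels, duplication):
    """Pick the k with the fewest levels, breaking ties by smaller duplication."""
    return min(levels, key=lambda k: (levels[k], duplication[k]))


# Hypothetical measurements for k = 2 .. 5 (not taken from the slides).
levels = {2: 6, 3: 4, 4: 4, 5: 5}
duplication = {2: 1.0, 3: 1.3, 4: 1.6, 5: 2.1}
print(choose_k(levels, duplication))  # 3

# Two clusters sharing gate g2, in a network of three gates.
print(duplication_ratio({"r1": {"g1", "g2"}, "r2": {"g2", "g3"}}, 3))
```

Here k = 3 and k = 4 both reach four levels, and k = 3 wins the tie because it duplicates less logic.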

Page 25: Logic Restructuring for Timing Optimization

Area recovery

Area increase is due to node duplication -
– this occurs when node is in multiple clusters

Two solutions:
1. Break clusters into smaller pieces off critical path
2. After cluster and collapse, recover area

Page 26: Logic Restructuring for Timing Optimization

Relabeling procedure:

Attempt to increase node labels without exceeding cluster size

In reverse topological order

Start: assign $\text{new-label}(O_i) = \max_{j \in PO} \text{label}(O_j)$

Increase label(u) if
1. new-label(u) <= label(v) for each fanout v, and
2. new-label(u) = new-label(v) for each fanout v only if label(u) = label(v) before relabeling, and
3. no cluster size is violated

Page 27: Logic Restructuring for Timing Optimization

Relabeling example

[Figure: the example network with its node labels before and after relabeling]

Page 28: Logic Restructuring for Timing Optimization

Post-collapse area recovery

• Do algebraic factorization, but
  – Undo factorization if depth increases
• Full_simplify
  – Only consider node v as possible fanin of a node (v introduced by using don’t cares) if level of v < level of node.
• Redundancy removal

Page 29: Logic Restructuring for Timing Optimization

Conclusions

• Variety of methods for delay optimization
  – No single technique dominates (KJ Singh PhD thesis)
• When applied to ripple-carry adder get
  – Carry-lookahead adder (THR)
  – Carry-bypass adder (GBX)
  – Carry-select adder (GST)
  – ? (partial collapse)
• All techniques ignore false paths when assessing the delay and critical regions
  – Can use KMS transform to eliminate false paths without increasing delay (area increase however).