Logic Restructuring for Timing Optimization

Page 1: Logic Restructuring for Timing Optimization

Outline:
• Definitions and problem statement
• Overview of techniques (motivated by adders)
  – Tree height reduction (THR)
  – Generalized bypass transform (GBX)
  – Generalized select transform (GST)
  – Partial collapsing (?)

Page 2: Timing Optimization

Factors determining the delay of a circuit:
• Underlying circuit technology
  – Circuit type (e.g. domino, static CMOS)
  – Gate type
  – Gate size
• Logical structure of the circuit
  – Length of computation paths
  – False paths
  – Buffering
• Parasitics
  – Wire loads
  – Layout

Page 3: Problem Statement

Given:
• Initial circuit function description
• Library of primitive functions
• Performance constraints (arrival/required times)

Generate an implementation of the circuit using the primitive functions, such that:
1. performance constraints are met
2. circuit area is minimized
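
To make the notion of arrival/required-time constraints concrete, here is a minimal sketch of how they might be checked on a gate network. It assumes a fixed delay per gate (a simplification, not a delay model taken from the slides); the function name `timing_slacks` and the tiny example netlist are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def timing_slacks(fanins, delay, input_arrival, output_required):
    """Return (arrival, required, slack) dicts for a combinational DAG.

    fanins maps each gate to the list of nodes driving it; nodes that never
    appear as keys are treated as primary inputs (arrival defaults to 0).
    """
    order = list(TopologicalSorter(fanins).static_order())  # drivers first

    # Forward pass: latest arrival time at each node.
    arrival = {}
    for node in order:
        drivers = fanins.get(node, [])
        if drivers:
            arrival[node] = max(arrival[d] for d in drivers) + delay[node]
        else:
            arrival[node] = input_arrival.get(node, 0)

    # Backward pass: earliest required time at each node.
    required = {node: float("inf") for node in order}
    required.update(output_required)
    for node in reversed(order):
        for d in fanins.get(node, []):
            required[d] = min(required[d], required[node] - delay[node])

    slack = {node: required[node] - arrival[node] for node in order}
    return arrival, required, slack


# Hypothetical example: g1 = f(a, b), g2 = f(g1, c), unit gate delays,
# output g2 required at time 2.
fanins = {"g1": ["a", "b"], "g2": ["g1", "c"]}
delay = {"g1": 1, "g2": 1}
arrival, required, slack = timing_slacks(fanins, delay, {}, {"g2": 2})
```

A negative slack at any node means the performance constraints are not met; nodes with the smallest slack form the critical region that the restructuring techniques target.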

Page 4: Current Design Process

[Figure: design flow; each step transforms one representation into the next.]

Behavioral description
  -> Behavior optimization (scheduling)
Logic and latches
  -> Partitioning (retiming)
Logic equations
  -> Logic synthesis
     • Technology independent
     • Technology mapping
Gate netlist
  -> Timing-driven place and route
Layout

Inputs to the flow:
• Gate library
• Performance constraints
• Delay models

Page 5: Technology mapping for delay

[Figure: a function tree and a buffer tree.]

Page 6: Overview of Solutions for delay

1. Circuit re-structuring
   – Rescheduling operations to reduce time of computation
2. Implementation of function trees (technology mapping)
   – Selection of gates from library
     • Minimum delay (load-independent model, Kukimoto)
     • Minimize delay and area (Jongeneel, DAC '00; combines Lehman-Watanabe and Kukimoto)
3. Implementation of buffer trees
   – Touati (LT-trees)
   – Singh
4. Resizing

The focus here is on circuit re-structuring.

Page 7: Circuit re-structuring

Approaches:

Local:
• Mimic optimization techniques in adders
  – Carry lookahead (THR: tree height reduction)
  – Conditional sum (GST: generalized select transform)
  – Carry bypass (GBX: generalized bypass transform)

Global:
• Reduce depth of entire circuit
  – Partial collapsing
  – Boolean simplification

Page 8: Re-structuring methods

Performance is measured by
1. levels,
2. sensitizable paths,
3. technology-dependent delays

• Level-based optimizations:
  – Tree height reduction (Singh '88)
  – Partial collapsing and simplification (Touati '91)
  – Generalized select transform (Berman '90)
• Sensitizable paths:
  – Generalized bypass transform (McGeer '91)

Page 9: Re-structuring for delay: tree-height reduction

[Figure: a network with primary inputs a to g (arrival times 0, 0, 0, 0, 2, 0, 0), internal nodes h, i, j, k, l, m and output n, each annotated with its arrival time. The critical region is marked and then collapsed into a single node n'; logic shared with the rest of the network is duplicated.]

Page 10: Restructuring for delay: path reduction

[Figure: the collapsed critical region n' (with its duplicated logic) is re-decomposed so that late-arriving signals pass through fewer levels. Singh '88.]

New delay = 5
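
The re-decomposition step can be illustrated for the simplest case: one collapsed node that is a single associative operation (for example a wide AND) rebuilt from 2-input gates. The sketch below uses a common greedy rule, repeatedly combining the two earliest-arriving signals, under an assumed unit gate delay. It is an illustration of the idea, not necessarily the procedure used in Singh '88; the function names and arrival times are hypothetical.

```python
import heapq


def chain_arrival(arrivals, gate_delay=1):
    """Output arrival time when the inputs are combined in a linear chain,
    in the order given."""
    t = arrivals[0]
    for a in arrivals[1:]:
        t = max(t, a) + gate_delay
    return t


def balanced_arrival(arrivals, gate_delay=1):
    """Output arrival time when the two earliest signals are always combined
    first, so late inputs end up close to the output."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, max(a, b) + gate_delay)
    return heap[0]


# Hypothetical arrival times: one late input (time 4) at the head of a chain.
late_first = [4, 0, 0, 0, 0]
before = chain_arrival(late_first)     # late signal crosses all four gates
after = balanced_arrival(late_first)   # late signal crosses one gate
```

With these numbers the chain output arrives at time 8 and the rebuilt tree at time 5. With equal arrival times the same rule produces a balanced tree, which is how a ripple structure turns into a lookahead-like one.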

Page 11: Generalized bypass transform (GBX)

• Make the critical path false
  – Speeds up the circuit
• Bypass the logic of the critical path(s)

McGeer '91

[Figure: a critical path f_m = f, f_{m+1}, …, f_n = g, before and after the transform. After the transform a multiplexor produces g': its select input is the Boolean difference dg/df, input 0 is the original path output g, and input 1 is the bypass. The stuck-at-0 (s-a-0) fault on the select input is redundant.]
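
As a sanity check on the bypass idea, the sketch below applies it to the carry chain of a ripple-carry adder, with f the carry-in and g the carry-out. It relies on one fact worth stating explicitly: the carry-out is positive unate in the carry-in, so whenever dg/df = 1 the output simply equals f. For a general g the bypass value would be f XOR g|f=0. All function names are hypothetical, and the code only verifies functional equivalence; it does not model delay.

```python
from itertools import product


def carry_out(c_in, a, b):
    """Ripple the carry through every bit position (the long path f -> g)."""
    c = c_in
    for ai, bi in zip(a, b):
        c = (ai & bi) | ((ai ^ bi) & c)  # generate OR (propagate AND carry)
    return c


def boolean_difference(a, b):
    """dg/df for f = c_in, g = carry_out: 1 exactly when g depends on f."""
    return carry_out(0, a, b) ^ carry_out(1, a, b)


def carry_out_bypassed(c_in, a, b):
    """g' = MUX(select = dg/df, input 0 = g, input 1 = f)."""
    if boolean_difference(a, b):
        return c_in                # bypass: the long path is not needed
    return carry_out(c_in, a, b)   # here the result does not depend on c_in


def check(n_bits=4):
    """Exhaustively compare g' with g and dg/df with 'all bits propagate'."""
    for bits in product((0, 1), repeat=2 * n_bits + 1):
        c_in, a, b = bits[0], bits[1:n_bits + 1], bits[n_bits + 1:]
        assert carry_out_bypassed(c_in, a, b) == carry_out(c_in, a, b)
        all_propagate = int(all(x ^ y for x, y in zip(a, b)))
        assert boolean_difference(a, b) == all_propagate
    return True
```

Because g' equals g for every input, forcing the select line to 0 never changes the output, which is why the s-a-0 fault on the select input is redundant.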

Page 12: GBX and KMS transform

GBX gives little area increase, BUT it creates an untestable fault (on the control input to the multiplexor).

KMS transform (removes false paths without increasing delay):
1. f_k is the last node on the false path that fans out.
2. Duplicate the false path {f_1, …, f_k} -> {f'_1, …, f'_k}.
3. f'_j fans out to every fanout of f_j except f_{j+1}, and f_j fans out only to f_{j+1}.
4. Set the f_0 input to f_1 to a controlling value and propagate the constant (possible because the path is false and does not fan out).

KMS results:
1. The function of every node, except f_1, …, f_k, is unchanged.
2. Added k-1 nodes.
3. Area added is linear in the length of the false paths; in practice the area increase is small.

Page 13: KMS (Keutzer, Malik, Saldanha '90)

[Figure: the path f_m, f_{m+1}, …, f_k, f_{k+1}, …, f_n before and after the KMS transform. The nodes f_m, …, f_k are duplicated as f'_m, …, f'_k, and a constant 0 input appears in the transformed circuit.]

Delay is not increased.

Page 14: End of lecture 20

Page 15: Generalized select transform (GST)

Late signal feeds multiplexor

[Figure: the original circuit computes out from inputs a to g, where a is the late signal. After the transform the logic is duplicated, one copy evaluated with a = 0 and one with a = 1, and a multiplexor controlled by a selects out.]

Berman '90
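
The select transform is a Shannon expansion about the late signal. The sketch below builds the two cofactor copies and the final multiplexor for an arbitrary Boolean function and checks that the result is unchanged. The helper names and the sample function `h` are hypothetical.

```python
from itertools import product


def cofactor(h, index, value):
    """Copy of h with input position `index` tied to the constant `value`."""
    def h_fixed(bits):
        forced = list(bits)
        forced[index] = value
        return h(tuple(forced))
    return h_fixed


def select_transform(h, late):
    """Return out = MUX(select = late input, h with late=0, h with late=1)."""
    h0 = cofactor(h, late, 0)  # copy evaluated with the late signal at 0
    h1 = cofactor(h, late, 1)  # copy evaluated with the late signal at 1

    def out(bits):
        # Neither copy reads the late signal; it only drives the multiplexor.
        return h1(bits) if bits[late] else h0(bits)
    return out


# Hypothetical function of (a, b, c, d) with a (position 0) arriving late.
def h(bits):
    a, b, c, d = bits
    return ((a & b) | c) ^ d


out = select_transform(h, late=0)
equivalent = all(out(bits) == h(bits) for bits in product((0, 1), repeat=4))
```

Both copies can be evaluated before the late signal arrives, so the late signal sees only the multiplexor delay, at the cost of duplicating the logic.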

Page 16: GST vs GBX

[Figure: the same circuit h (late input a, other inputs b to g) under both transforms. GST: two copies of the logic, evaluated with a = 0 and a = 1, feed a multiplexor selected by a. GBX: a multiplexor producing g' is selected by the Boolean difference dh/da.]

Note: the Boolean difference is

$\frac{dh}{da} = h_{a} \oplus h_{\bar{a}}$

Page 17: GST vs GBX

• Select transform appears to be more area efficient
• But the Boolean difference is generally more efficiently formed in practice
• No delay/speedup advantage for either transform
• Need
  – one MUX per fanout in GST,
  – only one MUX in GBX

[Figure: GST applied to a circuit with two outputs; the a = 0 and a = 1 copies feed two multiplexors, out1 and out2, both selected by a.]

Page 18: Technology independent delay reductions

Generally THR, GBX, GST (critical-path-based methods) work OK, but not great.

Why are technology independent delay reductions hard?

Lack of fast and accurate delay models (listed from fastest and crudest to best and slowest):
1. # levels: fast but crude
2. # levels + correction term (fanout, wires, …): a little better, but still crude (what coefficients to use?)
3. Technology mapped: reasonable, but very slow
4. Place and route: better, but extremely slow
5. Silicon: best, but infeasibly slow (except for FPGAs)

Page 19: Clustering/partial-collapse

Traditional critical-path-based methods require
– Well defined critical path
– Good delay/slack information

Problems:
– Good delay information comes from mapper and layout
– Delay estimates and models are weak

Possible solutions:
– Better delay modeling at technology independent level
– Make speedup insensitive to actual critical paths and mapped delays

Page 20: Clustering/partial-collapse

Two-level circuits are fast
– Collapse circuit to 2-level, but
  • Huge area penalty
  • Huge capacitive loading on inputs (can be much slower)

To avoid huge area penalty
– Identify clusters of nodes
  • Each cluster has some fixed size
– Perform collapse of each cluster
– Simplify each node

Details
– How to choose the clusters?
– How to choose cluster size?
– How to simplify each node?

Page 21: Lawler's clustering algorithm

• Optimal in delay:
  – For a given clustering size
• May duplicate nodes (hence possible area penalty)
  – Not optimal w.r.t. duplication
  – Use a heuristic
• Fast: O(m x k)
  – m = number of edges in network
  – k = maximum cluster size

Page 22: Clustering algorithm - overview

1. Label phase (k is cluster size):
   – If node u is an input, label(u) := L := 0
   – Else L := max label of fanins of u
   – If (# nodes in TFI(u) with label = L) >= k, then label(u) := L+1; otherwise label(u) := L
2. Cluster phase (outputs to inputs):
   – If node u is an output, L := infinity
   – Else L := max label of fanouts of u
   – If label(u) < L, then create a new cluster with "root" u and with members all the nodes in TFI(u) with label = label(u)
3. Collapse phase (order independent):
   – Collapse all nodes in a cluster into a single node
   – Note: a node may be in several clusters (causes area increase)
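
The label and cluster phases can be sketched as follows. This is a straightforward version that recomputes each transitive fanin, so it does not achieve the O(m x k) bound. It also makes one interpretive assumption that the slides do not state: only gates count toward the cluster size and become cluster members, while primary inputs just carry label 0. The function names and the example network are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def transitive_fanin(fanins, u):
    """All nodes with a path to u (u itself excluded)."""
    seen, stack = set(), list(fanins.get(u, ()))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(fanins.get(v, ()))
    return seen


def lawler_clusters(fanins, k):
    """fanins maps each gate to its drivers; other nodes are primary inputs.
    Returns (label, clusters) where clusters maps a root to its member set."""
    order = list(TopologicalSorter(fanins).static_order())  # inputs first
    is_gate = lambda v: bool(fanins.get(v))

    # Label phase (inputs to outputs).
    label = {}
    for u in order:
        if not is_gate(u):
            label[u] = 0
            continue
        top = max(label[v] for v in fanins[u])
        same = sum(1 for v in transitive_fanin(fanins, u)
                   if is_gate(v) and label[v] == top)
        label[u] = top + 1 if same >= k else top

    # Cluster phase (outputs to inputs).
    fanouts = {u: [] for u in order}
    for u, drivers in fanins.items():
        for v in drivers:
            fanouts[v].append(u)
    clusters = {}
    for u in reversed(order):
        if not is_gate(u):
            continue
        # A gate with no fanout is an output (L = infinity on the slide).
        is_root = not fanouts[u] or max(label[v] for v in fanouts[u]) > label[u]
        if is_root:
            clusters[u] = {u} | {v for v in transitive_fanin(fanins, u)
                                 if is_gate(v) and label[v] == label[u]}
    return label, clusters


# Hypothetical network: a chain n1..n5 plus a side output n6 that reuses n2.
fanins = {
    "n1": ["a", "b"], "n2": ["n1", "c"], "n3": ["n2", "d"],
    "n4": ["n3", "e"], "n5": ["n4", "f"], "n6": ["n2", "g"],
}
label, clusters = lawler_clusters(fanins, k=3)
```

In this example n1 and n2 land in both the cluster rooted at n3 and the one rooted at n6, which is the node duplication noted in the collapse phase.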

Page 23: Example of clustering

[Figure: example network annotated with node labels (0, 1, 2) for cluster size k = 3.]

Result: Lawler's algorithm gives a minimum depth circuit.

Typically,
1. we decompose the initial circuit into 2-input NANDs and inverters,
2. then cluster size k reflects the # of 2-input NANDs to be collapsed together.

Page 24: Choosing k

• I(k): number of levels, given k
• d(k): duplication ratio
  – Number of gates in cluster network divided by number of gates in original network
• Determine k_0 where k_0/d(k_0) ~ 2.0
• For every k from 2 to k_0, compute d(k), I(k)
  – Use exhaustive enumeration: label and cluster (without collapse) for each k.
  – Each iteration is O(|E|k)
• Choose k such that
  – I(k) is minimized
  – Break ties using d(k): minimize d(k)

[Figure: plot of d(k) and I(k) versus k, for k from 1 to k_0.]
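
The selection rule can be written down directly once I(k) and d(k) have been measured for each candidate k. In the sketch below the duplication ratio is taken as the total number of cluster members divided by the number of original gates, which is one reading of the definition above. The function names and the sample measurements are made up for illustration.

```python
def duplication_ratio(clusters, n_gates):
    """d(k): gates in the clustered network (members counted once per
    cluster they appear in) divided by gates in the original network."""
    return sum(len(members) for members in clusters.values()) / n_gates


def choose_k(stats):
    """stats maps k -> (I(k), d(k)). Minimize levels, break ties on d(k)."""
    return min(stats, key=lambda k: (stats[k][0], stats[k][1]))


# Made-up measurements for k = 2 .. 5 (not taken from the slides).
stats = {2: (6, 1.0), 3: (4, 1.3), 4: (4, 1.2), 5: (5, 1.6)}
best_k = choose_k(stats)
```

Here k = 3 and k = 4 tie on the number of levels, and k = 4 is chosen because it duplicates less.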

Page 25: Area recovery

Area increase is due to node duplication
– this occurs when a node is in multiple clusters

Two solutions:
1. Break clusters into smaller pieces off the critical path
2. After cluster and collapse, recover area

Page 26: Relabeling procedure

Attempt to increase node labels without exceeding cluster size

In reverse topological order:

Start: for each primary output O_i, assign

$\text{new-label}(O_i) = \max_{j \in PO} \text{label}(O_j)$

Increase label(u) if
1. new-label(u) <= label(v) for each fanout v, and
2. new-label(u) = new-label(v) for each fanout v only if label(u) = label(v) before relabeling, and
3. no cluster size is violated

Page 27: Relabeling example

[Figure: the example network with its node labels before and after relabeling.]

Page 28: Post-collapse area recovery

• Do algebraic factorization, but
  – Undo factorization if depth increases
• Full_simplify
  – Only consider node v as a possible fanin of a node (v introduced by using don't cares) if level of v < level of node.
• Redundancy removal

Page 29: Conclusions

• Variety of methods for delay optimization
  – No single technique dominates (KJ Singh PhD thesis)
• When applied to a ripple-carry adder, get
  – Carry-lookahead adder (THR)
  – Carry-bypass adder (GBX)
  – Carry-select adder (GST)
  – ? (partial collapse)
• All techniques ignore false paths when assessing the delay and critical regions
  – Can use KMS transform to eliminate false paths without increasing delay (area increase however).