Simulation and Synthesis Techniques for Asynchronous FIFO Design
Introduction to asynchronous circuit design: specification and synthesis
Part III:
Advanced topics on synthesis of control circuits from STGs
Outline
• Logic decomposition
– Hazard-free decomposition
– Signal insertion
– Technology mapping
• Optimization based on timing information
– Relative timing
– Timing assumptions and constraints
– Automatic generation of timing assumptions
Design flow
Specification (STG) -> [Reachability analysis] -> State Graph -> [State encoding] -> SG with CSC -> [Boolean minimization] -> Next-state functions -> [Logic decomposition] -> Decomposed functions -> [Technology mapping] -> Gate netlist
No Hazards
[Figure: gate with inputs a, b, c and output x; state sequence abcx = 1000 -> (b+) 1100 -> (a-) 0100 -> (c+) 0110; signal values annotated on the wires.]
Decomposition May Lead to Hazards
[Figure: the same logic decomposed using an internal signal z (inputs a, b, c; output x); same state sequence abcx = 1000 -> (b+) 1100 -> (a-) 0100 -> (c+) 0110; the annotated wire values show where a hazard can appear.]
Decomposition
• Acknowledgement
• Global acknowledgement
• Generating candidates
• Hazard-free signal insertion
– Event insertion
– Signal insertion
Global acknowledgement
[Figure sequence: gate for z with inputs a, b, c and gate for y with inputs a, b, d, together with the two STG cycles below. The following steps regroup the inputs in different ways to build each gate from 2-input gates.]
d- b+ d+ y+ a- y- c+ d-
c- d+ z- b- z+ c+ a+ c-
How about 2-input gates?
Strategy for logic decomposition
• Each decomposition defines a new internal signal
• Method: Insert new internal signals such that
– After resynthesis, some large gates are decomposed
– The new specification is hazard-free
• Generate candidates for decomposition using standard logic factorization techniques:
– Algebraic factorization
– Boolean factorization (Boolean relations)
Decomposition example
[Figure sequence: an STG over signals x, y, z, w (transitions x+, x-, y+, y-, z+, z-, w+, w-) and its state graph with states 1001, 1011, 1000, 1010, 0001, 0000, 0101, 0010, 0100, 0110, 0111, 0011, split into the regions yz=1 and yz=0. The netlist uses C-elements. A new internal signal s, with transitions s+ and s- and regions s=1 and s=0, is inserted into the state graph and the STG.]
z- is delayed by the new transition s-!
Decomposition (Algebraic, Boolean relations)
Hazard-free? (Event insertion): NO / YES
Repeat until no more progress
[Figure: the function F is decomposed step by step; gate labels in the figure: C, Sr, D.]
Signal insertion for function F
[Figure: state graph partitioned into the regions F=0 and F=1, with the new transitions F+ and F-.]
Insertion by input borders
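Event insertion
[Figure: state graph fragment with events a, b, c and the excitation region ER(x); after insertion, the transitions x lead from ER(x) into the switching region SR(x).]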
Properties to preserve
[Figure: state graph diamonds over events a and b, before and after inserting event x.]
a is persistent
a is disabled by b = hazards
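Boolean decomposition
[Figure: block F with inputs x1, …, xn and output f is replaced by block H (inputs x1, …, xn; outputs h1, …, hm) feeding block G with output f.]
f = F(x1,…,xn)    f = G(H(x1,…,xn))
Our problem: Given F and G, find H
Example: G is a C-element with inputs h1, h2 and output f
state   f   next(f)   (h1,h2)
s1      0   0         (0,-) (-,0)
s2      0   1         (1,1)
s3      1   0         (0,0)
s4      1   1         (-,1) (1,-)
dc      -   -         (-,-)
This is a Boolean Relation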
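The relation in the table can be rederived mechanically. The sketch below assumes the usual C-element behavior (the output follows the inputs when they agree and holds its value otherwise) and enumerates, for each pair (f, next(f)), the input pairs (h1, h2) that produce that transition. The function names are illustrative only.

```python
from itertools import product

def c_element(h1, h2, f):
    """Next output of a C-element with inputs h1, h2 and current output f."""
    return h1 if h1 == h2 else f

def allowed_inputs(f, next_f):
    """All (h1, h2) pairs that take the output from f to next_f."""
    return {(h1, h2) for h1, h2 in product((0, 1), repeat=2)
            if c_element(h1, h2, f) == next_f}

# One entry per row s1..s4 of the table: (f, next(f)) -> allowed (h1, h2)
relation = {(f, nf): allowed_inputs(f, nf)
            for f, nf in product((0, 1), repeat=2)}

for (f, nf), pairs in sorted(relation.items()):
    print(f, nf, sorted(pairs))
```

Rows with more than one allowed pair (s1 and s4) are what makes this a relation rather than a function: H is free to pick any of the listed values in those states.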
[Figure sequence: an STG with transitions a+, a-, c+, c-, d+, d-, y+, y- and alternative decompositions of the logic for y; gate labels in the figure: R, S, D.]
Technology mapping
• Merging small gates into larger gates introduces no new hazards
• Standard synchronous techniques can be applied, e.g. BDD-based Boolean matching
• Handles sequential gates and combinational feedbacks
• Due to hazards, there is no guarantee of finding a correct mapping (some gates cannot be decomposed)
• Timing-aware decomposition can be applied in these rare cases
Timing assumptions in design flow
• Speed-independent: wire delays after a fork are smaller than fan-out gate delays
• Burst-mode: the circuit stabilizes between two changes at the inputs
• Timed circuits: Absolute bounds on gate / environment delays are known a priori (before physical design)
Relative Timing Circuits
• Assumptions: “a before b”
– for concurrent events: reduces the reachable state space
– for ordered events: permits early enabling
– both increase the don’t-care space for logic synthesis => simpler logic (better area and timing)
• “Assume - if useful - guarantee” approach: assumptions are used by the tool to derive a circuit and required timing constraints that must be met in physical design flow
• Applied to the design of the Revolving Asynchronous Pentium(TM) Processor Instruction Decoder, RAPPID (K. Stevens, S. Rotem et al., Intel Corporation)
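A toy illustration of the first effect (an assumption on concurrent events reduces the reachable state space). This is a minimal sketch, not the tool's algorithm: states are modeled simply as the set of events fired so far, and the helper name and the two-event example are assumptions made for illustration.

```python
def reachable_states(events, before=()):
    """Return every set of fired events reachable from the empty set.

    `before` holds pairs (p, q) meaning "p fires before q".
    """
    start = frozenset()
    seen = {start}
    frontier = [start]
    while frontier:
        fired = frontier.pop()
        for e in events:
            if e in fired:
                continue
            # e may fire only after every event ordered before it has fired
            if all(p in fired for p, q in before if q == e):
                nxt = fired | {e}
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen

untimed = reachable_states(["a", "b"])                     # a and b concurrent
timed = reachable_states(["a", "b"], before=[("a", "b")])  # assume a before b
removed = untimed - timed                                  # states that become don't cares
```

With the assumption, the state in which only b has fired is no longer reachable, so logic synthesis may treat it as a don't care.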
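Relative Timing Asynchronous Circuits
Speed-independent C-element
[Figure: C-element with inputs a, b and output c.]
Timing assumption (on environment): a- before b-
RT C-element: faster, smaller; correct only under timing constraint: a- before b-
[Figure: simplified RT C-element with inputs a, b and output c.]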
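Why the assumption a- before b- permits a smaller gate can be checked by enumeration. The sketch below is an illustration under stated assumptions, not the circuit from the slide: the environment model (inputs rise only when c = 0 and fall only when c = 1) and the simplified cover c = b(a + c) are both hypothetical choices made for this example.

```python
def si_c(a, b, c):
    """Speed-independent C-element: c = ab + c(a + b)."""
    return (a & b) | (c & (a | b))

def rt_c(a, b, c):
    """Hypothetical simplified cover, valid only if a- fires before b-."""
    return b & (a | c)

def reachable(assume_a_minus_before_b_minus):
    """States (a, b, c) reachable under the assumed environment model."""
    start = (0, 0, 0)
    seen, frontier = {start}, [start]
    while frontier:
        a, b, c = frontier.pop()
        successors = []
        if c == 0 and a == 0:
            successors.append((1, b, c))      # a+
        if c == 0 and b == 0:
            successors.append((a, 1, c))      # b+
        if c == 1 and a == 1:
            successors.append((0, b, c))      # a-
        if c == 1 and b == 1 and (a == 0 or not assume_a_minus_before_b_minus):
            successors.append((a, 0, c))      # b- (waits for a- under the assumption)
        if si_c(a, b, c) != c:
            successors.append((a, b, 1 - c))  # c+ or c-
        for s in successors:
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return seen

# States on which the two covers disagree, without and with the assumption
mismatch_untimed = [s for s in reachable(False) if si_c(*s) != rt_c(*s)]
mismatch_timed = [s for s in reachable(True) if si_c(*s) != rt_c(*s)]
```

Under this model the two covers disagree only in state (a, b, c) = (1, 0, 1), which is reachable only if b- fires before a-. This is why the simplified gate is correct only when the constraint is met.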
State Graph (Read cycle)
[Figure: state graph of the read cycle over the events DSr+, DSr-, DTACK+, DTACK-, LDS+, LDS-, LDTACK+, LDTACK-, D+, D-.]
Lazy Transition Systems
[Figure: state graph with the regions ER(LDS+), ER(LDS-) and FR(LDS-) and the events LDS+, LDS- and DTACK-.]
Event LDS- is lazy: firing = subset of enabling
Timing assumptions
• (a before b) for concurrent events: concurrency reduction for firing and enabling
• (a before b) for ordered events: early enabling
• (a simultaneous to b wrt c) for triples of events: combination of the above
Speed-independent Netlist
[Figure: STG with the events LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS-, LDTACK-, DSr+; netlist over DTACK, D, DSr, LDS, LDTACK with the internal nodes csc and map.]
Adding timing assumptions (I)
[Figure: the same STG and netlist (DTACK, D, DSr, LDS, LDTACK, csc, map); path labels in the figure: FAST, SLOW.]
Timing assumption: LDTACK- before DSr+
State space domain
[Figure: state graph around the concurrent events LDTACK- and DSr+; under the assumption some states are no longer reachable.]
LDTACK- before DSr+
Two more unreachable states
Boolean domain
[Figure: Karnaugh maps for LDS = 0 and LDS = 1 over the variables DTACK, DSr, D and LDTACK; the conflicting entry marked 0/1? is resolved to 1 and one 0 entry becomes a don't care (-).]
One more DC vector for all signals
One state conflict is removed
Netlist with one constraint
[Figure: STG and netlist over DTACK, D, DSr, LDS, LDTACK; the final version of the figure no longer shows csc and map.]
TIMING CONSTRAINT: LDTACK- before DSr+
Ordered events: early enabling
[Figure: STG fragments over the events a, b, c and the corresponding gates F and G for c.]
Logic for gate c may change
Adding timing assumptions (II)
[Figure: STG and netlist over DTACK, D, DSr, LDS, LDTACK.]
Timing assumption: D- before LDS-
State space domain
[Figure: state graph around the events DSr-, D- and LDS-.]
D- before LDS-
Reachable space is unchanged
For LDS-, enabling can be changed in one state
Potential enabling for LDS-
Boolean domain
[Figure: Karnaugh maps for LDS = 0 and LDS = 1 over the variables DTACK, DSr, D and LDTACK; one 1 entry becomes a don't care (-).]
One more DC vector for one signal: LDS
If used: LDS = DSr, otherwise: LDS = DSr + D
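A small check of what the extra don't care buys. This sketch only compares the two covers quoted above, LDS = DSr and LDS = DSr + D, over the two variables they mention; the function names are illustrative.

```python
from itertools import product

def lds_without_dc(dsr, d):
    """Cover when the extra don't care is not used: LDS = DSr + D."""
    return dsr | d

def lds_with_dc(dsr, d):
    """Cover when the extra don't care is used: LDS = DSr."""
    return dsr

# Input combinations (DSr, D) on which the two covers disagree
differ = [(dsr, d) for dsr, d in product((0, 1), repeat=2)
          if lds_without_dc(dsr, d) != lds_with_dc(dsr, d)]
```

The covers disagree only for DSr = 0, D = 1, so the simpler cover is usable exactly when that combination falls in the don't-care set, which is what the early-enabling assumption D- before LDS- provides.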
Before early enabling
[Figure: STG and netlist over DTACK, D, DSr, LDS, LDTACK.]
Netlist with two constraints
[Figure: STG and netlist over DTACK, D, DSr, LDS, LDTACK.]
TIMING CONSTRAINTS: LDTACK- before DSr+ and D- before LDS-
Both timing assumptions are used for optimization and become constraints
Deriving automatic timing assumptions
[Figure: state graph fragments over the events a, b, c.]
• Rule I (out of 6): a, b are non-input events
– Untimed ordering: (a || b) and (a enabled before b), but not vice versa
– Derived assumption: a fires before b
– Justification: the delay of one gate can be made shorter than the delay of two (or more) gates: del(a) < del(c) + del(b)
– Effect I: a state becomes DC for all signals
– Effect II: another state becomes a local DC for the signal of event b
Backannotation of Timing Constraints
• Timed circuits require post-verification
• Can synthesis tools help?
– Report the least stringent set of timing constraints required for the correctness of the circuit
– Not all initial timing assumptions may be required
• Petrify reports a set of constraints for order of firing that guarantee the circuit correctness
Timing constraints generation
[Figure sequence: circuit over the signals a, b, c, d, e and its state graph, showing the correct behavior and the incorrect behavior (labels 1 to 5 in the figure).]
Assumptions: d before b and c before e and a before d
Covering incorrect behavior
Candidate constraints and the labels they cover:
• d before b: {1, 3}
• d before c: {1}
• c before e: {2, 4}
Other possible constraints remove states from assumption domain => invalid
Constraints for the minimal cost solution: d before c and c before e
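Choosing the constraints can be viewed as a small covering problem. The sketch below is a brute-force minimum-cost cover and is not claimed to be the method used by the tool. The cover sets are the ones listed above; the set of labels that must be removed ({1, 2, 4}) and the costs are assumptions chosen for illustration, since the figure does not survive in this transcript.

```python
from itertools import combinations

def min_cost_cover(to_remove, candidates, cost):
    """Return (total_cost, chosen_constraints) covering every label in to_remove."""
    best = None
    names = sorted(candidates)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            covered = set().union(*(candidates[n] for n in subset))
            if to_remove <= covered:
                total = sum(cost[n] for n in subset)
                if best is None or total < best[0]:
                    best = (total, subset)
    return best

# Cover sets as listed above
candidates = {
    "d before b": {1, 3},
    "d before c": {1},
    "c before e": {2, 4},
}
# ASSUMPTION: hypothetical target set and costs, for illustration only
to_remove = {1, 2, 4}
cost = {"d before b": 2, "d before c": 1, "c before e": 1}

best = min_cost_cover(to_remove, candidates, cost)
```

With these assumed inputs the cheapest cover selects "d before c" and "c before e". Exhaustive search is fine for a handful of candidates; a larger instance would need a proper covering solver.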
Timing aware state encoding
• Solve only state conflicts reachable in the RT assumptions domain
• Generate automatic timing assumptions for inserted state signals => state signals can be implemented as RT logic
• State variables inserted concurrently with I/O events => latency and cycle time reduction
Value of Relative Timing
• RT circuits provide up to 2-3x (1.3-2x) delay and area reduction with respect to SI circuits synthesized without (with) concurrency reduction
• Automatic generation of timing assumptions => foundation for automatic synthesis of RT circuits with area/performance comparable/better than manual
• Back-annotation of timing constraints => minimal required timing information for the back-end tools
• Timing-aware state encoding allows significant area/performance optimization
Design Flow with Timing
Specification (STG + user assumptions) -> [Reachability analysis] -> Lazy State Graph -> [Timing-aware state encoding] -> Lazy SG with CSC -> [Boolean minimization] -> Next-state functions -> [Logic decomposition] -> Decomposed functions -> [Technology mapping] -> Gate netlist
Additional items in the flow: Automatic Timing Assumptions, Required Timing Constraints
FIFO example
[Figure: FIFO cell with left-side signals li, lo and right-side signals ro, ri; STG over li+, li-, lo+, lo-, ro+, ro-, ri+, ri-.]
Speed-Independent Implementation
Without concurrency reduction, 3 state signals are required
SI implementation with concurrency reduction
[Figure: netlist over li, lo, ro, ri with one state signal x, built from gC elements; STG extended with x+ and x-.]
RT implementation
[Figure: netlist over li, lo, ro, ri with state signal x, using an OR gate; STG over li+, li-, lo+, lo-, ro+, ro-, ri+, ri-, x+, x-.]
To satisfy the constraint: Delay(x-) < Delay(ri+) and Delay(lo+) + Delay(x-) < Delay(ro+) + Delay(ri+)
All constraints are either satisfied by default or easy to satisfy by sizing
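The two delay inequalities above are easy to check against a set of estimated delays. The sketch below is only an illustration: the function name and the delay values are hypothetical and do not come from any real netlist.

```python
def rt_constraints_hold(delay):
    """Check the two FIFO timing constraints for a dict of event delays."""
    first = delay["x-"] < delay["ri+"]
    second = delay["lo+"] + delay["x-"] < delay["ro+"] + delay["ri+"]
    return first and second

# ASSUMPTION: made-up delays in arbitrary units
example = {"x-": 1.0, "lo+": 1.5, "ro+": 1.5, "ri+": 3.0}
ok = rt_constraints_hold(example)

# Making x- slower than the environment response ri+ violates the first constraint
slow_x = {**example, "x-": 3.5}
violated = not rt_constraints_hold(slow_x)
```

Since ri+ is an environment response and x- is a single internal gate, the first inequality is the kind of constraint that is usually met by default or by sizing, as the slide states.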