Is the die cast for the token game?
Alex Yakovlev, Frank Burns, Alex Bystrov, Delong Shang, Danil Sokolov
University of Newcastle upon Tyne
ICATPN’02 – Adelaide
Casting dice for old and new token games …
What is this talk about?
Firstly, about the role of Petri nets in the modern hardware design process (design flow), which is a gamble of its own
Secondly, about searching for the right way of deriving logic circuits (computational structures) from Petri nets (behavioural specifications)
However, I won’t talk here about the use of Petri nets for circuit verification
Int. Technology Roadmap for Semiconductors says:
• 2010 will bring a system-on-a-chip with:
  – 4 billion 50-nanometer transistors, running at 10 GHz
  – Moore’s law: steady growth of 60% per year in the number of transistors per chip, as the functionality of a chip doubles every 1.5-2 years
• Technology troubles: process parameter variation, power dissipation (IBM S/390 chip operation PICA video), clock distribution etc. present new challenges for design and test
• But the biggest threat of all is design cost
Design productivity gap
From ITRS’99
A design team of 1000 working for 3 years on an MPU chip would cost some $1B (25% of the time spent on verification, 45% on redesign after first silicon)
Design costs and time to market
How to reduce them?
New design approaches to facilitate design component re-use (IP cores), but there is a problem of timing closure
New CAD methods to minimise costs of verification, testing and re-design
Timing problems
[Figure: clock frequency (GHz) versus year from 2000, global clock versus local clock]
A global clock cannot cope with:
• fewer gate delays per clock cycle
• greater clock skew
Clocks have to be localised
The number of time zones increases to 1000s and more
Self-timed Systems
• Get rid of global clocking and build systems based on handshaking:
  – Globally asynchronous locally synchronous (GALS)
  – Design the whole system in a self-timed way
• Whichever way is followed, new CAD tools for self-timed design are needed
The Timing Mode Spectrum
[Figure: the spectrum of timing modes, from synchronous (globally/locally clocked: single clock; clock gating and distribution; multiple clock domains) through globally asynchronous locally synchronous (GALS) to asynchronous (self-timed): with relative timing and i/o mode; burst-mode and fundamental mode; speed-independent; fully delay-insensitive]
GALS module with stoppable clock
[Figure: a clocked domain with a stoppable local clock (Local CLK, R, RCL), enclosed in an async-to-sync wrapper that exchanges handshakes Req1-Req4 / Ack1-Ack4 with the asynchronous world]
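GALS: an Example
[Figure: two synchronous units (Sync Unit 1 and Sync Unit 2), each with its own clock generator (clk1, clk2), ports In1/Out1 and In2/Out2 with enables (EnIn, EnOut) and request/acknowledge signals (RCIn, ACIn, RCOut, ACOut), connected through an asynchronous interface with handshakes R1/A1 and R2/A2]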
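GALS: Petri net model
[Figure: Petri net model of the GALS example: the clock cycle clk1+/clk1- of unit 1, mutexes MutexIn1 and MutexOut1 arbitrating between the clock and the port handshakes (RCIn1/ACIn1, RCOut1/ACOut1), the interface handshakes R1/A1 and R2/A2 with the port signals of unit 2 (RCOut2, ACOut2, ACIn2), and a test of Clk1=0]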
Main talk outline
• Motivation: design flow problems
• Backend language: Petri nets?
• New design flow: two-level control
• Direct mapping of PNs: event-based and level-based
• Direct mapping of STGs
• Case studies
• Conclusion
Motivation
• Complex self-timed controllers still cannot be designed fully automatically and in a provably correct way (cf. work at Philips, Theseus Logic, Fulcrum, Self-Timed Solutions)
• It is important to interface to high-level (HL) hardware description languages, e.g. VHDL, Verilog (the standards for digital design) and/or Tangram, Balsa (CSP-based)
• Behavioural synthesis was a success for synchronous design in the 1990s
• Parts of architectural synthesis (CDFG extraction, scheduling and allocation) are similar to synchronous design
• Synthesis of the RTL control/sequencer and its implementation should be completely new for asynchronous circuits
• Need for a good intermediate (back-end) language
Motivation (cont’d)
• Existing logic synthesis tools (cf. Petrify and Minimalist) can only cope with small-scale, low-level designs (state-space explosion, limited optimisation heuristics)
• Logic synthesis produces circuits whose structure does not correspond to the structure of their behaviour (bad for analysis and testing)
• Syntax-directed translation techniques may be a way forward, but at what level should they be applied?
Motivation for use of Petri nets
• Implications for new research targets:
  – Translation between HDLs and Petri nets, particularly the formal underpinning of semantic links between front-end and back-end formats
  – New composition and decomposition techniques (incl. various forms of refinement and transformation) applied to labelled PNs
  – New circuit mapping and optimisation techniques for different types of models (under various delay-dependence or relative-timing assumptions and different signalling schemes)
  – Combination of direct mapping with logic synthesis (e.g. circuits with ‘peephole optimisation’)
Main talk outline (now: Backend language: Petri nets?)
Intermediate language
• What is the most adequate formal language for the intermediate (still behavioural) level?
  – You don’t need one at all: directly map syntax into circuit structure (Design flow 1)
  – Petri nets, at the level of Signal Transition Graphs (STGs), followed by logic synthesis (Design flow 2)
Design Flow 1 (currently used in e.g. Tangram or Balsa)
HDL
  | syntax-directed compilation
Handshake circuit netlist
  | direct mapping with Burst-Mode FSM ‘peephole’ optimisation
QDI circuit netlist
HDL syntax directed mapping
do
  if (X=A) then
    par OP1; OP2; rap
  else
    seq OP3; OP4; qes
  fi
od
[Figure: parse tree of the program: do -> if (X=A), with then branch par (OP1, OP2) and else branch seq (OP3, OP4)]
Control flow is transferred between HDL syntax constructs rather than between operations
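As a toy illustration of Flow 1, the sketch below compiles the example’s parse tree in a syntax-directed way: one handshake component per syntax node, each activated by its parent over its own channel. The Node type, the component dictionaries and the channel names c0, c1, … are hypothetical and only mimic the idea; this is not Tangram’s or Balsa’s actual compiler. The netlist size is linear in the program size and its structure mirrors the parse tree, which is both the strength (linear size) and the weakness (no global optimisation) of this flow.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "do", "if", "par", "seq" or "op"
    name: str = ""                 # guard or operation name, if any
    children: list[Node] = field(default_factory=list)


def compile_node(node: Node, netlist: list[dict]) -> str:
    """Emit one handshake component per syntax node and return its activation channel.

    Each parent activates its children over their own channels, so control
    is passed between syntax constructs rather than between operations.
    """
    channel = f"c{len(netlist)}"
    component = {"kind": node.kind, "name": node.name,
                 "activate": channel, "children": []}
    netlist.append(component)
    for child in node.children:
        component["children"].append(compile_node(child, netlist))
    return channel


program = Node("do", children=[
    Node("if", "X=A", children=[
        Node("par", children=[Node("op", "OP1"), Node("op", "OP2")]),   # then
        Node("seq", children=[Node("op", "OP3"), Node("op", "OP4")]),   # else
    ]),
])

netlist: list[dict] = []
compile_node(program, netlist)
for c in netlist:
    print(c["activate"], c["kind"], c["name"], "->", c["children"])
```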
Pros and cons of Flow 1
• Pros:
  – Simple linear-size translation, guarantees high productivity
  – Allows local optimisation and re-synthesis of parts
  – Testing can be ‘programmed’ at high level
• Cons:
  – Lack of global optimisation
  – Circuit structure follows the parsing tree of the specification, which leads to low performance
Design Flow 2 (STG logic synthesis)
STG specification
  | analysis and optimisation (consistency, complete state coding (CSC), relative timing)
Synthesisable STG
  | extras (e.g. refining to the free-choice (FC) subclass for structural methods)
  | logic synthesis (via the full state space or structural methods)
QDI circuit netlist
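Logic synthesis (STGs & Petrify)
[Figure: decoupled latch controller (Rin, Ain on the input side; Rout, Aout on the output side), its STG specification and its state graph, with the states that have a state coding problem highlighted]
The total number of states is 24, but the four signals provide only 2^4 = 16 binary codes, so some distinct states must share a code.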
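The coding problem can be made concrete with a small check. The sketch below assumes a state graph given as a dictionary of states with binary codes and enabled transitions (a hypothetical format, not Petrify’s) and flags complete state coding (CSC) conflicts: states with equal codes but different enabled non-input transitions. The example states are invented for illustration and are not taken from the slide’s state graph.

```python
from collections import defaultdict


def csc_conflicts(states, inputs):
    """Return (code, s1, s2) for every complete state coding (CSC) conflict.

    states maps a state id to (binary code, set of enabled transition labels).
    Two states conflict when they share a code but enable different sets of
    non-input (output or internal) transitions, so no logic function of the
    code alone can drive the outputs correctly.
    """
    by_code = defaultdict(list)
    for sid, (code, enabled) in states.items():
        non_input = frozenset(t for t in enabled if t.rstrip("+-") not in inputs)
        by_code[code].append((sid, non_input))
    conflicts = []
    for code, group in by_code.items():
        for i, (s1, out1) in enumerate(group):
            for s2, out2 in group[i + 1:]:
                if out1 != out2:
                    conflicts.append((code, s1, s2))
    return conflicts


# Hypothetical fragment of a decoupled-latch state graph; code order Rin Ain Rout Aout.
states = {
    "s0": ("0000", {"Rin+"}),            # initial state: only the input may change
    "s1": ("1000", {"Ain+", "Rout+"}),
    "s4": ("0000", {"Rin+", "Rout+"}),   # input handshake done, Rout+ still pending
}
print(csc_conflicts(states, inputs={"Rin", "Aout"}))
```

The two all-zero states are exactly the kind of clash that Petrify resolves by inserting state signals such as csc0-csc2.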
Logic synthesis (STGs & Petrify)
Output from Petrify (csc0, csc1, csc2 are state encoding signals):
# EQN file for model decoupled-latch
# Estimated area = 16.00
[Rout] = Aout' Rout + csc2;
[Ain] = csc0;
[csc0] = Aout' csc1' csc2' + Rin csc0;
[csc1] = csc1 (csc0 + Rin') + Rout;
[csc2] = csc1' (csc2 + csc0);
# Set/reset pins: reset(Rout) set(csc1)
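To read the equations, note that x' denotes the complement of x and juxtaposition is AND. The sketch below evaluates them and lists the signals that are excited (whose equation value differs from their current value) in a given state. It assumes that the set/reset pins initialise Rout to 0 and csc1 to 1; the helper names are hypothetical.

```python
from __future__ import annotations


def inv(x: int) -> int:
    """Complement: Petrify writes NOT x as x'."""
    return 1 - x


def next_values(s: dict[str, int]) -> dict[str, int]:
    """Evaluate the decoupled-latch equations from the Petrify EQN output."""
    return {
        "Rout": (inv(s["Aout"]) & s["Rout"]) | s["csc2"],
        "Ain": s["csc0"],
        "csc0": (inv(s["Aout"]) & inv(s["csc1"]) & inv(s["csc2"])) | (s["Rin"] & s["csc0"]),
        "csc1": (s["csc1"] & (s["csc0"] | inv(s["Rin"]))) | s["Rout"],
        "csc2": inv(s["csc1"]) & (s["csc2"] | s["csc0"]),
    }


def excited(s: dict[str, int]) -> list[str]:
    """Signals whose equation disagrees with their current value (about to switch)."""
    return sorted(sig for sig, v in next_values(s).items() if v != s[sig])


# Assumed reset state: reset(Rout) gives Rout = 0, set(csc1) gives csc1 = 1.
state = {"Rin": 0, "Ain": 0, "Rout": 0, "Aout": 0, "csc0": 0, "csc1": 1, "csc2": 0}
print(excited(state))   # -> []
state["Rin"] = 1        # environment raises the input request
print(excited(state))   # -> ['csc1']
```

In the assumed reset state the circuit is stable, and raising Rin first excites the state signal csc1, so internal state changes before any output responds.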
Logic synthesis (STGs & Petrify)
The resulting state graph (with csc signals) has 59 states and no coding conflicts (the coding space is 2^7 = 128).
Logic synthesis (STGs & Petrify)
What if the system gets bigger?
[Figure: decoupled latch controller with input handshake Rin/Ain and two output handshakes R1out/A1out and R2out/A2out]
# EQN file for model decoupled-latch.1-2
# Estimated area = 34.00
[R1out] = A1out' R1out + csc2 csc3';
[R2out] = R2out A2out' + csc4;
[Ain] = csc0;
[csc0] = A1out' A2out' csc1 csc2 csc3 csc4' + csc0 (csc1 csc4' + csc3 + Rin);
[csc1] = R2out' (csc0' Rin + csc1);
[csc2] = R1out' csc2 + csc3;
[csc3] = csc0' (R1out' Rin csc2' + csc3);
[csc4] = csc1 (csc4 + csc0);
# Set/reset pins: reset(R1out) reset(R2out) reset(csc1) reset(csc2) reset(csc3)
Logic synthesis (STGs & Petrify)
Logic is asymmetric
Delay grows out of proportion
Pros and cons of Flow 2
• Pros:
  – Guarantees global optimality
  – Allows HDL-to-STG translation for a ‘more pragmatic’ front end (e.g. Blunno & Lavagno: Verilog-to-STG translation) and allows model checking together with synthesis (so the design is provably correct)
• Cons:
  – State space size is a problem
  – Solving state coding in a ‘good way’ is a problem
Main talk outline (now: New design flow: two-level control)
Towards new design flow
• How to combine the advantages of both approaches?
  – Use them at different levels
  – Introduce an intermediate behavioural level: labelled Petri nets (LPNs)
  – Perform semantic (execution-order-based) translation of HDLs to LPNs
  – Use direct mapping for large LPNs
  – Decompose the control and use STGs and logic synthesis at the low level (apply structural methods, e.g. Pastor et al.)
New Design Flow
HDL: standard (VHDL, Verilog) or async-design-specific (Balsa)
  | CDFG and LPN compilation (semantic)
LPNs
  | verification (coherence etc.), optimisation (scheduling, dummies, fan-in/fan-out)
Synthesisable LPN
  | direct mapping
DC netlist
New Design Flow: possible sources of useful translation techniques
• HDL-to-PN translation:
  – VHDL to Extended Timed PNs (Linkoping)
  – VHDL to Control Data Flow Graphs (Lyngby)
  – Verilog to PNs/STGs (Torino)
  – B(PN^2) to M-net translation (PEP tool)
  – …
• But none of them caters for the good PN structure needed for direct mapping from PNs to circuits (most of them are geared towards state-space exploration, especially in model checking)
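Design flow
[Figure: the HDL specification is split (control/data splitting) into a hierarchical control spec and a datapath spec. The control spec is expressed as LPNs and STGs, synthesised by LPN-to-circuit direct mapping and STG-to-circuit synthesis (Petrify & direct mapping) into hierarchical control logic; the datapath spec goes through data logic synthesis into data logic; control & data interfacing yields the HDL implementation. The present focus of the talk is the control synthesis part.]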
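HDL-to-LPN (high-level control)
do
  if (X=A) then
    par OP1; OP2; rap
  else
    seq OP3; OP4; qes
  fi
od
[Figure: the corresponding LPN: a choice between transitions labelled (X=A) and (X<>A); the (X=A) branch runs OP1 and OP2 concurrently, the (X<>A) branch runs OP3 then OP4; dummy transitions (dum) complete the structure]
High level control: Labelled Petri net (LPN)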
Labelled PNs and Datapath
• An LPN is defined as (PN, OP, L), with underlying PN = (P, T, F, M0), operation alphabet OP and labelling function L: T -> OP
• Operations in OP (typically assignments, comparisons, calls to macros such as arbiters) are defined as signatures on the elements of the datapath (e.g. lists of the input/output registers R and operation units U involved in the operation), e.g. op(i) = <R, U>
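A minimal data-structure sketch of this definition, assuming a net stored as input/output place sets per transition (our own representation, not the format of any tool mentioned in the talk). Firing a transition plays the token game and returns its label, i.e. the operation whose req/ack handshake the controller would start. The example net is one plausible reading of the HDL-to-LPN figure, with hypothetical place names and three dummies (join, end of sequence, loop back).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LPN:
    """Labelled Petri net (PN, OP, L) with underlying PN = (P, T, F, M0).

    pre[t] / post[t] encode the flow relation F as input / output place sets,
    label[t] is the labelling function L: T -> OP, marking starts as M0.
    """
    pre: dict[str, set[str]]
    post: dict[str, set[str]]
    label: dict[str, str]
    marking: dict[str, int] = field(default_factory=dict)

    def enabled(self) -> list[str]:
        """Transitions whose input places all hold a token."""
        return sorted(t for t, places in self.pre.items()
                      if all(self.marking.get(p, 0) > 0 for p in places))

    def fire(self, t: str) -> str:
        """Fire t (token game) and return the operation it triggers."""
        if t not in self.enabled():
            raise ValueError(f"transition {t} is not enabled")
        for p in self.pre[t]:
            self.marking[p] -= 1
        for p in self.post[t]:
            self.marking[p] = self.marking.get(p, 0) + 1
        return self.label[t]


# One plausible LPN for the do-if-par-seq example (place names are hypothetical).
net = LPN(
    pre={"t_eq": {"p0"}, "t_ne": {"p0"}, "t_op1": {"p1"}, "t_op2": {"p2"},
         "t_join": {"p3", "p4"}, "t_op3": {"p5"}, "t_op4": {"p6"},
         "t_end": {"p7"}, "t_loop": {"p8"}},
    post={"t_eq": {"p1", "p2"}, "t_ne": {"p5"}, "t_op1": {"p3"}, "t_op2": {"p4"},
          "t_join": {"p8"}, "t_op3": {"p6"}, "t_op4": {"p7"},
          "t_end": {"p8"}, "t_loop": {"p0"}},
    label={"t_eq": "X=A", "t_ne": "X<>A", "t_op1": "OP1", "t_op2": "OP2",
           "t_join": "dum", "t_op3": "OP3", "t_op4": "OP4",
           "t_end": "dum", "t_loop": "dum"},
    marking={"p0": 1},
)
print(net.enabled())   # the choice between the two guards
```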
Labelled PNs and Datapath
• Operations in OP are associated with req/ack handshakes (two-way for assignments, multi-way for comparisons and arbitration), hence the signals opr(i) and opa(i)
• The interface to the actual req and ack signals of the registers in R and the operation units in U is either synthesized via Petrify (low-level control) or hardwired using MUXes and DEMUXes
Low-level control
[Figure: operations OP1, OP3 and OP4 (handshakes OP1r/OP1a, OP3r/OP3a, OP4r/OP4a) are routed to data path 1 (req1/ack1) and OP2 (OP2r/OP2a) to data path 2 (req2/ack2); the STG for OP2 relays OP2r+ to req2+ and ack2+ to OP2a+, then returns to zero through OP2r-, req2-, ack2- and OP2a-]
Low level control: Signal Transition Graphs (STG)
Direct mapping of LPN to David cells
[Figure: the LPN of the HDL example (transitions (X=A), (X<>A), OP1-OP4 and dummies) mapped onto David cells DC1-DC5]
High-level control logic directly mapped from LPN
Basic David cell (DC)
Direct mapping cell library
LPN-to-DC mapping elements
linear
join
fork
controlled choice
arbitrated choice
merge
input test
Gate-level DC implementations
Main talk outline (now: Direct mapping of PNs: event-based and level-based)
Direct mapping vs logic synthesis: conceptual difference
• Logic synthesis uses a Petri net (STG) as a generator of an encoded state space. The circuit structure is not directly related to the net structure (though some correspondence exists and is exploited in structural logic synthesis methods, Pastor et al.)
• Direct mapping considers a PN literally, as a prototype of the circuit structure (cf. Varshavsky’s use of the term ‘modelling circuit’)
Direct mapping vs logic synthesis
• Direct mapping has linear computational complexity but can be area inefficient (inherent one-hot encoding)
• Logic synthesis has problems with state space explosion, and with recognition of repetitive and regular structures (log-based encoding approach)
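The complexity contrast can be illustrated with a toy family of nets made of n independent two-place cycles: the net, and hence a one-cell-per-place direct mapping, grows linearly (2n places), while the encoded state space that logic synthesis has to explore grows as 2^n. The sketch assumes a 1-safe net, so a marking is simply the set of marked places; the net family is our own example.

```python
from collections import deque


def reachable_markings(pre, post, m0):
    """Breadth-first token game over a 1-safe net; markings are sets of marked places."""
    start = frozenset(m0)
    seen, queue = {start}, deque([start])
    while queue:
        m = queue.popleft()
        for t in pre:
            if pre[t] <= m:                        # t is enabled
                m2 = (m - pre[t]) | post[t]        # fire t
                if m2 not in seen:
                    seen.add(m2)
                    queue.append(m2)
    return seen


def parallel_cycles(n):
    """n concurrent two-place cycles: 2n places and 2n transitions."""
    pre, post, m0 = {}, {}, set()
    for i in range(n):
        a, b = f"a{i}", f"b{i}"
        pre[f"up{i}"], post[f"up{i}"] = {a}, {b}
        pre[f"dn{i}"], post[f"dn{i}"] = {b}, {a}
        m0.add(a)
    return pre, post, m0


for n in (2, 4, 8, 12):
    pre, post, m0 = parallel_cycles(n)
    places = {p for ps in pre.values() for p in ps}
    print(f"n={n}: {len(places)} places, {len(reachable_markings(pre, post, m0))} states")
```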
Direct Translation of Petri Nets
• Previous work dates back to the 1970s
• Synthesis into event-based (two-phase) circuits
  – S. Patil, F. Furtek (MIT)
• Synthesis into level-based (four-phase) circuits
  – R. David (’69, translation of FSM graphs to CUSA cells)
  – L. Hollaar (’82, translation from parallel flowcharts)
  – V. Varshavsky et al. (’90, ’96, translation from PNs into an interconnection of David cells)
• See various examples of synthesis in both styles in Yakovlev & Koelmans (Petri net lectures, LNCS, 1998)
Patil’s set of modules
Petri net fragment          Circuit equivalent
place                       wire
marked place                inverter
join                        C-element
merge                       XOR
fork                        fan-out
shared (conflict) place     switch (S), effectively an RGD arbiter
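In two-phase (NRZ) signalling every level change is an event, so the join and merge modules of the table reduce to very small Boolean behaviours. The sketch models only logic levels (no delays or hazards) and is an illustration, not Patil’s own formulation; the function names are ours.

```python
def c_element(a: int, b: int, prev: int) -> int:
    """Muller C-element (join): output copies the inputs when they agree, else holds."""
    return a if a == b else prev


def xor_merge(a: int, b: int) -> int:
    """XOR (merge): in two-phase signalling an event on either input toggles the output."""
    return a ^ b


# Join: the output event waits until both inputs have produced an event.
c = 0
c = c_element(1, 0, c)   # event on a only: no output event yet
print(c)
c = c_element(1, 1, c)   # event on b as well: output toggles
print(c)
# Merge: each input event produces one output event (two events toggle twice).
print(xor_merge(1, 0), xor_merge(1, 1))
```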
Example
[Figure: a one-place buffer Buf(1) with a passive handshake P(ut) (pr, pa) and an active handshake G(et) (gr, ga), connected to its environment; the two-phase (NRZ) protocol on pr, pa, gr, ga; and its two-phase implementation using Patil’s elements, built around a C-element connecting pr, pa, gr and ga with the environment]
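Other useful elements
[Figure: Select, Call and Toggle elements with their Petri net models. Select steers an input event In to output T or F according to the data value D (sel); Call connects two clients (R1/D1, R2/D2) to a shared subprocess (R/D); Toggle steers successive input events alternately to Out1 and Out2]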
Direct synthesis example (modulo-k Up-Down counter)
[Figure: (a) a modulo-k up/down counter with request inputs Up and Down and responses inc, inc', dec, dec'; (b) the mod-k counter LPN, with places CNT and CNT' and arcs of weight k-1, together with the environment LPN]
Direct synthesis example (modulo-k Up-Down counter)
[Figure: a modulo-2 up/down counter (Up1, Down1, inc1, inc1', dec1, dec1') cascaded with a modulo-k/2 up/down counter (Up2, Down2, inc2, inc2', dec2, dec2'), with two MUX_2 elements and the external responses inc, inc', dec, dec']
Decomposition (structural view)
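A behavioural sketch of this decomposition, assuming that inc' and dec' denote the wrap-around responses of the modulo-k counter (our reading of the figures, not stated on the slides). A modulo-2 stage toggles on every request and forwards a carry or borrow to a modulo-k/2 stage only when it wraps; the class names are hypothetical.

```python
class ModCounter:
    """Modulo-k up/down counter; up()/down() return True on wrap-around."""

    def __init__(self, k: int):
        self.k, self.value = k, 0

    def up(self) -> bool:
        self.value = (self.value + 1) % self.k
        return self.value == 0             # wrapped from k-1 to 0 (assumed inc')

    def down(self) -> bool:
        wrapped = self.value == 0          # wraps from 0 to k-1 (assumed dec')
        self.value = (self.value - 1) % self.k
        return wrapped


class CascadedCounter:
    """Modulo-k counter decomposed into a modulo-2 stage and a modulo-k/2 stage."""

    def __init__(self, k: int):
        if k % 2:
            raise ValueError("k must be even")
        self.low, self.high = ModCounter(2), ModCounter(k // 2)

    @property
    def value(self) -> int:
        return 2 * self.high.value + self.low.value

    def up(self) -> bool:
        # The low stage toggles; only when it wraps is a carry passed to the high stage.
        return self.low.up() and self.high.up()

    def down(self) -> bool:
        # Symmetrically, a borrow reaches the high stage only when the low stage wraps.
        return self.low.down() and self.high.down()


ref, cas = ModCounter(6), CascadedCounter(6)
for op in ["up"] * 7 + ["down"] * 3:
    assert getattr(ref, op)() == getattr(cas, op)() and ref.value == cas.value
print(ref.value, cas.value)
```

Applying the same split recursively to the modulo-k/2 stage gives a chain of modulo-2 cells, which is what makes the structure regular and suitable for direct mapping.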
Direct synthesis example (modulo-k Up-Down counter)
[Figure: structure and LPN of the decomposed counter: (a) the modulo-2 up/down counter (places CNT1, CNT1'; handshakes Ur/Ua and Dr/Da with the environment and Uc1, Ua1, Dc1, Da1 with the next stage) and (b) the modulo-k/2 up/down counter (places CNT*, CNT*'; handshakes Ur*/Ua*, Dr*/Da*; arcs of weight k* = k/2-1)]
Direct synthesis example (modulo-k Up-Down counter)
[Figure: two-phase circuit for the modulo-2 stage built from a Toggle, a 2-by-2 decision-wait (DW) element and a Merge, with internal events B+/B-, states CNT/CNT', requests Ur and Dr, and handshakes Uc1, Ua1, Dc1, Da1 to the next stage]
Synthesis into level-based circuits
• David’s method for asynchronous finite state machines
• Hollaar’s extensions to parallel flow charts
• Varshavsky’s method for 1-safe persistent Petri nets: based on associating places with latches; the method works for closed (autonomous) circuits without input choice or arbitration, in which inputs can only be part of handshakes activated by the control logic
David’s original approach
[Figure: fragment of a state machine flow graph with states a, b, c, d, input conditions x1, x’1, x2, x’2 and outputs ya, yb, yc; the CUSA element for storing state b]
Hollaar’s approach
[Figure: fragment of a flow chart (which allows parallelism) with nodes A, B, K, L, M, N, and the corresponding one-hot circuit cell; three successive snapshots show the logic values (0/1) on the cell’s signals]
Varshavsky’s Approach
[Figure: two places p1, p2 and the corresponding cells, linked by a transition that controls an operation (‘To Operation’, ‘Operation Controlled’); three successive snapshots show the cell outputs switching (0->1, 1->0) as the token moves from p1 to p2]
Varshavsky’s Approach
• This method associates places with latches (flip-flops), so the state memory (marking) of the PN is directly mimicked in the circuit’s state memory
• Transitions are associated with controlled actions (e.g. activations of data path units or lower-level control blocks, using handshake protocols)
• Modelling discrepancy (be careful!):
  – in Petri nets, removing a token from the pre-places and adding tokens to the post-places is instantaneous (i.e. there are no intermediate states)
  – in circuits, the “move of a token” has a duration and there is an intermediate state
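A minimal sketch of this discrepancy, assuming an abstract chain of David cells in which each cell is a single latch bit standing for one place (the request/acknowledge wires between cells are omitted). In the circuit the successor is set before the predecessor is reset, so the hand-over passes through a state with two 1s that has no counterpart in the Petri net.

```python
from __future__ import annotations


def move_token(cells: list[int], i: int) -> list[tuple[int, ...]]:
    """Hand a token from cell i to cell i+1 and return the cell states visited."""
    if not (cells[i] == 1 and cells[i + 1] == 0):
        raise ValueError("cell i must be marked and cell i+1 empty")
    trace = [tuple(cells)]
    cells[i + 1] = 1            # successor latch is set: token 'arrives'
    trace.append(tuple(cells))  # intermediate state, both latches hold 1
    cells[i] = 0                # predecessor latch is reset: token 'leaves'
    trace.append(tuple(cells))
    return trace


for state in move_token([1, 0, 0], 0):
    print(state)
```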
Direct mapping of LPNs and STGs
Fast David cell
[Figure: fast David cell built from GasP sections (signals t1, t2, a1, a2, r1', a1', r2', a2', r1'', a1'', r2'', a2''), relying on timing assumptions; a GasP section, and the same with negative gates]
Implementability condition for LPNs
• Autonomous control interpretation: each transition is associated with a handshake to the controlled part (datapath) or is a dummy
• Implementability: any 1-safe labelled PN with autonomous control semantics of transitions and no loops of fewer than three transitions can be directly mapped into a speed-independent control circuit whose behaviour is equivalent (bisimilar) to the PN
• Consistency of labelling: transitions labelled with references to the same datapath blocks must be consistent with the local semantics of those blocks (e.g. they must not be mutually concurrent)
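The structural part of the implementability condition can be checked mechanically. The sketch assumes a net given as dictionaries of input/output place sets per transition (a hypothetical representation) and reports loops through one or two transitions, i.e. loops of fewer than three transitions; it does not check 1-safeness or labelling consistency.

```python
from __future__ import annotations


def short_loops(pre: dict[str, set[str]], post: dict[str, set[str]]) -> list[tuple[str, ...]]:
    """Return net loops passing through fewer than three transitions.

    Transition u follows t when an output place of t is an input place of u;
    a loop of one transition is t -> t, a loop of two is t -> u -> t.
    """
    succ = {t: {u for u in pre if post[t] & pre[u]} for t in pre}
    loops = []
    for t in sorted(pre):
        if t in succ[t]:
            loops.append((t,))
        for u in sorted(succ[t]):
            if u > t and t in succ[u]:      # report each two-transition loop once
                loops.append((t, u))
    return loops


ring2 = ({"a": {"p1"}, "b": {"p2"}}, {"a": {"p2"}, "b": {"p1"}})
ring3 = ({"a": {"p1"}, "b": {"p2"}, "c": {"p3"}},
         {"a": {"p2"}, "b": {"p3"}, "c": {"p1"}})
print(short_loops(*ring2))   # two-transition loop: violates the condition
print(short_loops(*ring3))   # three transitions: acceptable
```

A violating loop can usually be repaired by inserting a dummy transition, which is one reason dummies appear in the mapped LPNs.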
Main talk outline (now: Direct mapping of STGs)
Direct mapping of STGs
[Figure: decoupled latch controller (Rin, Ain, Rout, Aout): STG specification and mapped circuit]
Here all signal transitions are associated with handshakes, and handshake compression must be done before mapping
What about direct mapping of arbitrary STGs
• Associate a latch with each output signal x and some sampling logic with each input; each transition x+ or x- then becomes a set or reset handshake with that element: pull for inputs, push for outputs.
[Figure: STG fragment (inp1+, inp2+, out1+, out2+, inp1-, out1-) in which inp1+/inp1- become pull handshakes with an input sample block for inp1, and out1+/out1- become push handshakes with an output latch for out1]
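What about direct mapping of arbitrary STGs
[Figure: the same STG, with inp2+/inp2- mapped to a pull handshake with an input sample block for inp2 and out1+/out1- to a push handshake with an output latch; the David cells DC1 (for inp2+) and DC3 (for out1+) communicate with these blocks through mux & demux logic]
Problem: long delay between the input event and the output response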
What about direct mapping of arbitrary STGs
• Another problem for direct mapping: STGs may contain self-loops (or read arcs) for testing ‘level-oriented’ inputs and outputs:
[Figure: a place x=1 between x+ and x-, tested by transition y+ through a read arc]
Low latency approach
• Can we connect inputs directly to the control structure to minimise the I/O latency?
[Figure: input inp2 wired directly into the David-cell chain (DC1, DC3) that produces out1+]
The problem of mapping STGs
• Given: a 1-safe STG
• Target: a netlist of David cells, input wires and output flip-flops
• Procedure: use direct mapping of the elements of the underlying PN into elements of the netlist
• Problem: we need an intermediate form of the STG, in which I/O is connected to the control by read arcs only
Device environment interface
[Figure: the STG split into a tracker (the control subnet), output latch subnets, input wire subnets and the environment]
To derive the circuit implementation we only use the tracker and the I/O subnets
Direct mapping and optimisation
[Figure: tracker connected by read arcs to the subnets of signals a, b, c and d (places a=0/a=1, b=0/b=1, c=0/c=1, d=0/d=1 with transitions a+/a-, b+/b-, c+/c-, d+/d-) and to the environment, before and after optimisation; tracker places p1, p5 and transitions (a+), (b+), (c+), (d+)]
• Removing places from the tracker reduces latency if the place between an input and the following output is removed.
• However, coding conflicts are then possible, because tracker places perform state separation.
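Optimisation: coding conflicts
[Figure: tracker fragment p1 -t1- p2 -t2- p3 -t3- p4 -t4- p5 with t1 = a+ (input), t2 = b+ (output), t3 = a- (input), t4 = c+ (output), together with the subnets of input a and outputs b and c]
Input signal a changes twice between p1 and p5, so if all intermediate places are removed the tracker cannot tell the state before a+ from the state after a-.
Keeping p3 (while removing p2 and p4) solves the conflict and preserves low latency.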
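The state-separation argument can be checked on a linear tracker fragment. The sketch assumes the tracker is a simple chain p1 -t1- p2 … and takes the indices of the places that are kept (a hypothetical interface); a segment between two kept places is reported as a conflict if some input changes more than once inside it, as input a does between p1 and p5. The trace follows the slide’s labelling, with t4 read as c+.

```python
from __future__ import annotations


def coding_conflicts(transitions: list[str], inputs: set[str], kept: list[int]) -> list[tuple[str, str]]:
    """Segments of a linear tracker in which some input signal changes more than once.

    transitions[i - 1] is t_i, which lies between places p_i and p_(i+1);
    the first and last places are always kept.
    """
    bounds = sorted(set(kept) | {1, len(transitions) + 1})
    conflicts = []
    for start, end in zip(bounds, bounds[1:]):
        changes: dict[str, int] = {}
        for label in transitions[start - 1:end - 1]:   # t_start .. t_(end-1)
            signal = label[:-1]                        # "a+" -> "a"
            if signal in inputs:
                changes[signal] = changes.get(signal, 0) + 1
        if any(count > 1 for count in changes.values()):
            conflicts.append((f"p{start}", f"p{end}"))
    return conflicts


trace = ["a+", "b+", "a-", "c+"]                  # t1..t4: inp, outp, inp, outp
print(coding_conflicts(trace, {"a"}, kept=[]))    # p2, p3, p4 all removed
print(coding_conflicts(trace, {"a"}, kept=[3]))   # keep p3 only
```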
Irreducible input coding conflicts
• Certain input labellings cannot be implemented in a speed-independent way without timing assumptions (e.g. that input changes are slower than David cell operation) or without changing the I/O interface (introducing new output responses to the environment)
[Figure: inp+ -> inp- -> out+; the states before inp+ and after inp- both have inp=0 and are inseparable for the tracker]
Implementability of STGs
• Sufficient condition:
  – an STG with a 1-safe underlying PN, consistent signal transition labelling (transitions of the same signal are ordered by precedence and + and - alternate) and monotonic input bursts (in each connected input-labelled subgraph each signal changes only once)
• The necessary and sufficient (N&S) condition is an open problem!
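A minimal sketch of the two labelling conditions for the simplest case, an STG whose underlying net is a single cycle of signal transitions (an assumption; general STGs need a search over the net and its connected input-labelled subgraphs). Labels are strings such as 'r+', and the helper names are hypothetical. A single cycle carrying one token is trivially 1-safe, so that part is not checked.

```python
from __future__ import annotations


def signal(label: str) -> str:
    return label[:-1]                     # "a+" -> "a"


def is_consistent(cycle: list[str]) -> bool:
    """Around the cycle, every signal's transitions alternate between + and -."""
    for s in {signal(t) for t in cycle}:
        dirs = [t[-1] for t in cycle if signal(t) == s]
        if any(dirs[i] == dirs[(i + 1) % len(dirs)] for i in range(len(dirs))):
            return False
    return True


def input_bursts(cycle: list[str], inputs: set[str]) -> list[list[str]]:
    """Maximal runs of consecutive input transitions (rotated so runs do not wrap)."""
    k = next((i for i, t in enumerate(cycle) if signal(t) not in inputs), 0)
    bursts, run = [], []
    for t in cycle[k:] + cycle[:k]:
        if signal(t) in inputs:
            run.append(t)
        elif run:
            bursts.append(run)
            run = []
    if run:
        bursts.append(run)
    return bursts


def monotonic_bursts(cycle: list[str], inputs: set[str]) -> bool:
    """Within each input burst, each input signal changes at most once."""
    return all(len({signal(t) for t in b}) == len(b) for b in input_bursts(cycle, inputs))


handshake = ["r+", "a+", "r-", "a-"]      # r is an input, a an output
print(is_consistent(handshake), monotonic_bursts(handshake, {"r"}))
print(monotonic_bursts(["r+", "r-", "a+", "a-"], {"r"}))
```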
Main talk outline (now: Case studies)
Communication channel example
• A duplex delay-insensitive channel for low power and pin efficiency, proposed by Steve Furber (AINT’2002)
• Relatively simple data path (with handshake access via push and pull protocols)
• Sophisticated control (involves arbitration, choice and concurrency)
• Natural two-level control decomposition
• Requires low latency (existing STG and burst-mode (BM) solutions produce logic that is too heavy)
Channel Structure
[Figure: master and slave communicating via N-of-M codes]
N-of-M codes: dual-rail, 3-of-6, 2-of-7
Key protocol symbols (e.g. in dual rail):
Start (01), Ack (10), Slave-Ack (11), Data (01 or 10)
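The code choices can be compared with a little arithmetic: an N-of-M code has C(M, N) codewords. The sketch computes how many whole data bits one symbol can carry and confirms that the dual-rail Slave-Ack symbol (11) lies outside the 1-of-2 data code, whereas Start, Ack and Data reuse data codewords (01, 10) and must be told apart by protocol context. Function names are ours.

```python
from __future__ import annotations

from math import comb, floor, log2


def is_codeword(word: str, n: int) -> bool:
    """A word belongs to an N-of-M code when exactly n of its wires are high."""
    return word.count("1") == n


def capacity(n: int, m: int) -> tuple[int, int]:
    """Number of N-of-M codewords and the whole data bits one symbol can carry."""
    words = comb(m, n)
    return words, floor(log2(words))


for name, n, m in [("dual-rail (1-of-2)", 1, 2), ("3-of-6", 3, 6), ("2-of-7", 2, 7)]:
    words, bits = capacity(n, m)
    print(f"{name}: {words} codewords, {bits} data bit(s) per symbol on {m} wires")

# In dual rail, Slave-Ack (11) is not a 1-of-2 data codeword.
print(is_codeword("01", 1), is_codeword("11", 1))
```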
Protocol Specification
[Figure: Master, Protocol Automaton, Slave]
The protocol can be defined on an imaginary protocol automaton that receives symbols from both sides (it hides all activity internal to the master and the slave)
Controller Overview
[Figure: the high-level control communicates by push and pull handshakes with the data path and low-level control; the low-level logic includes the Tx controller and the sending interface]
LPN model for high level control (master)
[Figure: LPN of the master’s high-level control, showing calls to local arbiters, the Slave-Ack pull, three-way pulls, three-way pushes, pushes, pulls, and dummies inserted for direct DC mapping]
High level control (master) mapped directly from LPN
[Figure: David cell netlist with arbiter1, arbiter2, push and pull handshakes and dummies]
Towards synthesis for higher performance
[Figure: LPN fragment with a dummy between a pull and a push]
Is the dummy in the right place?
It is on the cycle of (output) push and (input) pull:
pull -> dummy -> push -> pull -> dummy -> push -> …
Towards synthesis for higher performance
[Figure: the pull -> push critical path and a non-critical path, with the dummy placed on the non-critical path]
Synthesis rule: don’t insert dummies on critical paths
Synthesis for lower I/O latency (LPN level)
[Figure: the environment (channel) exchanges inputs and outputs with pull logic and push logic; the high-level control sequences pull, push and internal actions; a low latency shortcut links the pull logic directly to the push logic]
Channel Cycle Time
Controller implementation     Simplex mode   Duplex mode
Direct mapping from LPN       7.6 ns         8.3 ns
Logic synthesis from STG      12.7 ns        16.5 ns
• These results were obtained for 0.6 micron CMOS
• Further improvement can be achieved by making more use of low-latency techniques (at the gate level) and by introducing aggressive relative timing in David cells and low-level logic
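The table’s figures can be put in relative terms. The short sketch below simply recomputes, from the reported cycle times, how much shorter the direct-mapped controller’s cycle is and the corresponding cycle rates; it adds no new measurements.

```python
cycle_ns = {   # channel cycle time in ns, 0.6 micron CMOS (from the table above)
    "simplex": {"direct_lpn": 7.6, "logic_stg": 12.7},
    "duplex": {"direct_lpn": 8.3, "logic_stg": 16.5},
}


def speedup(mode: str) -> float:
    """Ratio of the logic-synthesis cycle time to the direct-mapping cycle time."""
    t = cycle_ns[mode]
    return t["logic_stg"] / t["direct_lpn"]


def throughput_mhz(ns: float) -> float:
    """Channel cycles per microsecond, i.e. MHz."""
    return 1000.0 / ns


for mode, t in cycle_ns.items():
    print(f"{mode}: {speedup(mode):.2f}x, "
          f"{throughput_mhz(t['direct_lpn']):.0f} vs {throughput_mhz(t['logic_stg']):.0f} MHz")
```

The advantage of direct mapping grows from about 1.7x in simplex mode to about 2x in duplex mode.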
Case study: VME bus controller
[Figure: STG of the VME bus controller with a read cycle (dsr) and a write cycle (dsw), both using the handshake lds/ldtack and the outputs d and dtack]
Inputs: dsr, dsw, ldtack; outputs: d, lds, dtack
Case study: VME bus controller
[Figure: circuit directly mapped from the STG, with David cells for the read (r1, r2) and write (w1, w2) branches and the merge m, input wires dsr, dsw and ldtack, and output latches for d, lds and dtack]
Case study: VME bus controller
[Figure: circuit generated by logic synthesis (Petrify), with inputs dsr, dsw, ldtack, outputs d, lds, dtack and state signal csc]
Circuit generated by logic synthesis (Petrify):
• Smaller, though comparable in size
• Transistor stacks are larger
Case study: VME bus controller
Latency comparison between our method (Fast DC) and the Petrify solution:
Transition                 Petrify   Fast DC
ldtack+ -> d+              0.35 ns   0.29 ns
ldtack+ -> d-              0.20 ns   0.16 ns
d+ -> dtack+               0.27 ns   0.27 ns
dsw- -> dtack-             0.42 ns   0.44 ns
ldtack- -> lds+ (rd)       0.38 ns   0.21 ns
ldtack+ -> lds+ (wr)       0.38 ns   0.29 ns
dsw- -> lds-               0.33 ns   0.26 ns
Number of transistors      32        56
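A small script summarising the latency table, using only the reported numbers: it counts the transitions on which the Fast DC circuit is faster or slower and compares mean latency and transistor count. Weighting the seven transitions equally is our simplification, not a measure used in the talk.

```python
# Latencies in ns as (Petrify, Fast DC), copied from the table above.
latency_ns = {
    "ldtack+ -> d+": (0.35, 0.29),
    "ldtack+ -> d-": (0.20, 0.16),
    "d+ -> dtack+": (0.27, 0.27),
    "dsw- -> dtack-": (0.42, 0.44),
    "ldtack- -> lds+ (rd)": (0.38, 0.21),
    "ldtack+ -> lds+ (wr)": (0.38, 0.29),
    "dsw- -> lds-": (0.33, 0.26),
}
transistors = {"Petrify": 32, "Fast DC": 56}

faster = [t for t, (p, f) in latency_ns.items() if f < p]
slower = [t for t, (p, f) in latency_ns.items() if f > p]
mean_petrify = sum(p for p, _ in latency_ns.values()) / len(latency_ns)
mean_fast_dc = sum(f for _, f in latency_ns.values()) / len(latency_ns)
area_ratio = transistors["Fast DC"] / transistors["Petrify"]

print(f"Fast DC faster on {len(faster)} of {len(latency_ns)} transitions, slower on {slower}")
print(f"mean latency: Petrify {mean_petrify:.3f} ns, Fast DC {mean_fast_dc:.3f} ns")
print(f"transistor count ratio (Fast DC / Petrify): {area_ratio:.2f}")
```

The summary makes the trade-off explicit: lower latency on most transitions in exchange for 1.75 times as many transistors.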
Conclusion
• Hierarchical (e.g. protocol) controller synthesis can go via back-end LPN/STG models
• Direct mapping from LPNs/STGs yields fast circuits that are easy to analyse and test
• Translation from PNs to David cell netlists is implemented in the tool pn2dc
• Translation from VHDL specs to LPNs and STGs is implemented in the tools fsm2lpn and fsm2stg
• Further work is needed on:
  – a formal link between HDLs and PNs (semantics and equivalence), leading to better synthesis of PNs from HDLs
  – optimisation techniques at the LPN/STG and circuit levels
• See our papers in Async’02 and ISCAS’02
Open problems
• Formally characterise the properties of PNs that make them good for circuit design, such as optimality with respect to I/O response time, worst/average-case cycle time, and positions of silent (dummy) events
• Control (place/transition nets) and datapath kept separate, versus the use of high-level nets for both
• Testing via Petri net specifications (faults in PNs: stuck tokens, transitions …)