
Asynchronous Circuits
Kent Orthner, Wed. March 2nd, 2005
Presentation for: High speed and Low Power VLSI, Dr. Maitham Shams

for High speed and Low Power VLSI, Carleton University. Kent Orthner, March 2nd 2005
Agenda
  What are Asynchronous Circuits?
  Advantages & Disadvantages
  Example Asynchronous Circuit
  GasP
  FPGAs
  Design Project

What are Asynchronous Circuits?
Synchronous Circuits
  Everything is synchronized to a global clock.
  Clock edges determine the time instants at which data is sampled.
    Register inputs are sampled at the rising clock edge.
    Data wires may glitch between clock edges.
  “Worst-case” operation:
    The clock frequency is limited by the speed of the slowest stage.
    The clock frequency must be low enough that the circuit works with worst-case PVT (process, voltage, temperature) and worst-case data.
[Figure: clocked pipeline. Labels: Clock; 9 ns, 4 ns, 6 ns; 10 ns, 10 ns, 10 ns]

What are Asynchronous Circuits?
Asynchronous Circuits
  Eliminate the global clock signal.
  States are defined in terms of input values and internal actions.
  Data transfer is synchronized by other means: handshaking, flow control.
  “Average-case” performance: each block goes as fast as it goes.
[Figure: handshaking pipeline. Labels: Req, Ack; 9 ns, 4 ns, 6 ns; 10 ns, 5 ns, 7 ns]
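The contrast between worst-case and average-case operation can be sketched with the numbers from the two figures. This is only an illustration: it assumes the 9 ns, 4 ns and 6 ns labels are stage logic delays, and it adds a hypothetical 1 ns of margin (synchronous) or handshake overhead (asynchronous) so that the results match the 10 ns and 10/5/7 ns labels.

```python
# Assumed reading of the figures: logic delay of each pipeline stage, in ns.
STAGE_DELAYS_NS = [9, 4, 6]

def synchronous_latency(delays_ns, margin_ns=1):
    """Every stage takes one clock period, set by the slowest stage plus margin."""
    period = max(delays_ns) + margin_ns
    return period, period * len(delays_ns)

def asynchronous_latency(delays_ns, handshake_ns=1):
    """Each stage takes only its own delay plus an assumed handshake overhead."""
    per_stage = [d + handshake_ns for d in delays_ns]
    return per_stage, sum(per_stage)

sync_period, sync_total = synchronous_latency(STAGE_DELAYS_NS)
async_stages, async_total = asynchronous_latency(STAGE_DELAYS_NS)

print(sync_period, sync_total)    # 10 30
print(async_stages, async_total)  # [10, 5, 7] 22
```

Under these assumptions one data item crosses the synchronous pipeline in 30 ns and the asynchronous one in 22 ns, because the two faster stages no longer wait for the slowest one.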

Micropipelines
Each data channel is associated with two abstract control signals:
  Rdy – indicates when the upstream stage has data.
  Ack – indicates when the downstream stage is finished with the previous data.
Data moves through a stage when the upstream stage has data available and the downstream stage is ready for new data.
If no logic processing is being performed, the circuit acts as an elastic FIFO.
[Figure: micropipeline control chain of four C-elements (C). Signals: Rin, Ain, R1, A1, R2, A2, R3, A3, Rout, Aout; data path Din to Dout]
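The "C" blocks in the figure are Muller C-elements. A minimal behavioural sketch, assuming the usual definition (the output follows the inputs when they agree and holds its value otherwise); the class and method names are illustrative only:

```python
class CElement:
    """Behavioural model of a two-input Muller C-element."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        # The output changes only when both inputs agree; otherwise it holds.
        if a == b:
            self.out = a
        return self.out

c = CElement()
trace = [c.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)  # [0, 1, 1, 0]
```

The hold behaviour is what lets a stage wait until both conditions are met (upstream has data, downstream is ready) before it passes an event on.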

Advantages
Performance
  Average-case instead of worst-case
Low Power
  Clock accounts for 30 – 50% of chip dynamic power
  Automatic clock gating in asynchronous circuits
Escape from Metastability
  No concern about clock crossing: circuits are metastable-safe by design
Easier Circuit Synthesis
  No clock distribution, no clock skews, no clock buffering tree analysis
  No timing-driven placement necessary
Technology Scaling Potential
  No circuit retiming/re-pipelining necessary
  Technology-independent, in some ways
  Automatic adaptation to physical properties, PVT
Lower EMI
  Activity in synchronous circuits produces predictable EMI patterns
Ease of composition
  Easier to interface heterogeneous IP cores
  No timing assumptions necessary

Disadvantages
Vulnerable to circuit hazards & glitches
Circuits are larger
  More area for control & handshaking logic, encoding scheme, hazard avoidance
More difficult & less mature than synchronous designs
Benefits not explored on large-scale VLSI
Synchronous designs
  are well understood: it’s easier to think sequentially than concurrently
  provide a simple way to deal with noise and hazards
  are tolerant to glitches
CAD Tools
  Synchronous tools are quite mature
  No such established asynchronous tools

Example Asynchronous Circuit
TOKYO, Japan, February 9, 2005: Epson Develops the World's First Flexible 8-Bit Asynchronous Microprocessor
Seiko Epson Corp. ("Epson") has announced that it has developed the world's first flexible 8-bit asynchronous microprocessor using low-temperature polysilicon thin-film transistors (LTPS-TFTs) on a plastic substrate.
With energy consumption reduced by 70% compared to the synchronous microprocessors now in everyday use, Epson is now researching potential applications for its invention.
Using asynchronous circuit design technology, Epson has been able to:
1. Make a stable 8-bit microprocessor composed of 32,000 LTPS-TFTs,
2. Achieve energy consumption 70% lower than the synchronous design,
3. Reduce electromagnetic radiation by 20 dB.

GasP
A family of asynchronous circuits that provide controls for:
  simple pipelines
  branching and joining
  scatter & gather
  join on demand with arbitration
Throughput in excess of 1.5 G data items / second in 0.35 um
A single wire is used to carry both Ack & Req messages, indicating whether each stage is empty or full.
Relies on careful choice of transistor widths to equalize delay in logic gates.

GasP Circuit
1. If the upstream state conductor is full (low), and the downstream state conductor is empty (high), b and x both conduct, driving the voltage at (1) low.
2. This causes transistor p to turn on, making the data latch momentarily transparent.

GasP Circuit (continued)
3. The low voltage at (2) causes transistor d to turn on, driving the downstream state conductor low (full).
4. This also causes transistor y to turn on, driving the upstream state conductor high (empty).
5. Transistor t turns on, resetting the top of the NAND gate to a high value, causing pass transistor p to turn off.
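The firing rule in steps 1 to 5 can be sketched at the behavioural level. This is a simplified model, not the transistor circuit: each state conductor is a boolean (True = full, which the slides encode as a low voltage), and a stage fires when its upstream conductor is full and its downstream conductor is empty. The function name `step` and the list layout are assumptions made for this example.

```python
def step(full, data):
    """Fire every stage whose upstream conductor is full and downstream is empty.

    full[i] is the state conductor in front of stage i; data[i] is the value
    held there. Returns the indices of the stages that fired.
    """
    # Decide on a snapshot first, so all stages evaluate concurrently.
    fired = [i for i in range(len(full) - 1) if full[i] and not full[i + 1]]
    for i in fired:
        data[i + 1] = data[i]   # latch is momentarily transparent
        full[i + 1] = True      # downstream conductor driven to full
        full[i] = False         # upstream conductor driven to empty
        data[i] = None
    return fired

full = [True, True, False, False]
data = ["b", "a", None, None]
history = []
while True:
    fired = step(full, data)
    if not fired:
        break
    history.append(fired)

print(history)  # [[1], [0, 2], [1]]
print(data)     # [None, None, 'b', 'a']
```

Two adjacent stages can never fire in the same step, because they disagree about the state of the conductor they share; the items bunch up at the output end, which is the elastic FIFO behaviour.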

GasP Circuit (continued)
The propagation of data in the forward direction through the circuit is four gate delays per stage: a b c d
  The transistors for logic functions must be sized such that the logic functions take no more than four gate delays.
The propagation of holes in the reverse direction is two gate delays per stage: x y
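As a rough back-of-the-envelope check, the latencies above can be combined with the throughput quoted earlier. The sketch assumes that one stage cycle is the forward latency plus the reverse latency, and that the 1.5 G data items per second figure applies to this circuit; both are assumptions, not statements from the slides.

```python
FORWARD_GATE_DELAYS = 4          # data moving forward: a b c d
REVERSE_GATE_DELAYS = 2          # holes moving backward: x y
THROUGHPUT_ITEMS_PER_S = 1.5e9   # quoted for 0.35 um

# Assumption: a stage must pass one item forward and one hole backward per cycle.
cycle_gate_delays = FORWARD_GATE_DELAYS + REVERSE_GATE_DELAYS
cycle_time_ps = 1e12 / THROUGHPUT_ITEMS_PER_S
gate_delay_ps = cycle_time_ps / cycle_gate_delays

print(cycle_gate_delays, round(cycle_time_ps, 1), round(gate_delay_ps, 1))
# 6 666.7 111.1
```

Under these assumptions each gate delay would be on the order of 110 ps.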

FPGAs
Commonly built of 4-input look-up tables (LUTs)
  Effectively a small RAM block with 1 data bit and 16 memory locations.
  Any logic function with up to 4 inputs can be made from a 4-input LUT.
  Combinations of LUTs are used to create larger logic functions.
  RAM is programmed at configuration time, or during operation.
A register for each logic element
Connected with a ‘sea of programmable interconnect’
  Configured by SRAM at start-up time
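A 4-input LUT can be modelled as a 16-entry, 1-bit-wide memory addressed by the inputs. The helper names `make_lut4` and `read_lut4`, and the choice of A as the most significant address bit, are assumptions for this sketch:

```python
def make_lut4(func):
    """Build the 16-entry table for a function of four 1-bit inputs (a, b, c, d)."""
    return [
        int(bool(func((addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1)))
        for addr in range(16)
    ]

def read_lut4(table, a, b, c, d):
    """Look up the output: the four inputs form the memory address."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]

and4 = make_lut4(lambda a, b, c, d: a & b & c & d)
xor2 = make_lut4(lambda a, b, c, d: a ^ b)  # c and d are ignored

print(and4)                        # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(read_lut4(xor2, 1, 0, 1, 1)) # 1
```

Both functions occupy one full table, which is why a simple 2-input XOR costs as much LUT delay as a complex 4-input function.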

FPGAs (continued)
Almost exclusively synchronous
  Frequency is limited by the worst-case path from a register, through one or more lookup tables, through the routing matrix, and into the next register.
The delay through a LUT is constant (and worst case!)
  A 2-input XOR function takes as much time as a complex 4-input function.
The path from a register to the next register is very granular
  If the logic function has 5 inputs, then the propagation delay is almost doubled over the 4-input case.
High power
  Clock distribution network goes everywhere.
  Power is consumed to drive logic elements that aren’t used for a given design.

Design Project
16:1 pipelined multiplexer in four stages, using a GasP pipeline.
  Essentially a 4-input LUT
Compare with an equivalent synchronous design with the same gate sizes
  Performance, Power & Energy per cycle, Circuit Size
SPICE simulations, with 0.13 um technology, using TSMC models from MOSIS
Example: Out = ABCD
[Figure: four-stage 16:1 multiplexer. Inputs In0 to In14 = 0, In15 = 1; output Out; select Sel [ABCD] with D = Sel0, C = Sel1, B = Sel2, A = Sel3; three Delay elements on the select path]

Design Project (continued)
Motivation
  The pipeline is shortened when some inputs are not used, leading to reduced propagation delay.
  If GasP latches are at each stage within the LUT, the flip-flop after each LUT is not required.
  The effective operating frequency is now set by the propagation between GasP stages, not between LUTs.
  Performance can be further increased by incorporating GasP FIFO stages into the routing network.
Example: Z = AB
[Figure: the same multiplexer configured for Z = AB. Inputs In0 to In11 = 0, In12 to In15 = 1; output Out; select Sel [ABCD] with D = Sel0, C = Sel1, B = Sel2, A = Sel3; three Delay elements; intermediate values 0, 0, 0, 1]
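The staged multiplexer and the shortened pipeline can be sketched as follows. The model assumes each stage is a row of 2:1 multiplexers that halves the number of candidates, applied in the order Sel0 (D), Sel1 (C), Sel2 (B), Sel3 (A); `mux_stage` and `mux16` are hypothetical names for this example.

```python
from itertools import product

def mux_stage(values, sel):
    """One pipeline stage: each 2:1 mux keeps values[2k + sel]."""
    return [values[2 * k + sel] for k in range(len(values) // 2)]

def mux16(inputs, a, b, c, d):
    """16:1 multiplexer as four stages, selecting In[8a + 4b + 2c + d]."""
    stage = list(inputs)
    for sel in (d, c, b, a):  # Sel0, Sel1, Sel2, Sel3
        stage = mux_stage(stage, sel)
    return stage[0]

OUT_ABCD = [0] * 15 + [1]      # Out = ABCD: only In15 is 1
Z_AB = [0] * 12 + [1] * 4      # Z = AB: In12 to In15 are 1

# For Z = AB, the first two stages give the same result for every C and D,
# so the constants can be fed in two stages later.
after_two_stages = {
    tuple(mux_stage(mux_stage(Z_AB, d), c)) for c, d in product((0, 1), repeat=2)
}
print(after_two_stages)  # {(0, 0, 0, 1)}
```

Because the values after the D and C stages never depend on D or C, a function of only A and B needs just the last two stages, which is the shortening that the motivation refers to.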

Tentative Schedule
Milestone                                                 Date
Background Research                                       February
Design & Implementation of GasP & Synchronous Circuits    Early / Mid March
Testing & Result Collection                               Late March
Class Presentation                                        Early April
Prepare Report                                            April


Classification: Timing
Delay-Insensitive (DI)
  Designed to operate correctly regardless of the delays on gates & wires
  “Unbounded” gate & wire delay model assumed.
  The class of simple DI circuits built out of basic gates is almost empty
  Practical DI circuits can be built with complex components that use timing assumptions within the component. Example: C-Element
Quasi-Delay-Insensitive (QDI)
  Same as DI, but with the isochronic fork delay assumption
  An isochronic fork is a forked wire where all branches have the same or a bounded delay
  Weakest compromise to true DI circuits needed to build practical circuits.
Speed-Independent (SI)
  Unbounded delays for gates and “negligible” (optimistic) delays for wires.
Self-timed
  The circuit contains a number of elements, where each element may be SI internally.
  Communication between regions is assumed to be Delay-Insensitive.

for High speed and Low Power VLSI, Carleton UniversityPage 21 Kent Orthner, March 2nd 2005
Classification: Signaling
Control Signaling
  Request/Acknowledge (Self-Timed) is popular
  Four-phase / Return to Zero / Level signaling
    Req / Ack / Req \ Ack \ : 1 cycle. (/ = rising edge, \ = falling edge)
  Two-phase / Non-RTZ / Transition signaling
    Req / Ack / : 1 cycle. Req \ Ack \ : 1 cycle.
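The difference between the two control protocols can be sketched as event lists on the Req and Ack wires. The function names are illustrative, and the model only records the order of wire transitions, not their timing:

```python
def four_phase(n_transfers):
    """Return-to-zero: every transfer raises and then lowers both wires."""
    events = []
    for _ in range(n_transfers):
        events += [("Req", 1), ("Ack", 1), ("Req", 0), ("Ack", 0)]
    return events

def two_phase(n_transfers):
    """Transition signaling: every transfer toggles each wire once."""
    events, level = [], 0
    for _ in range(n_transfers):
        level ^= 1
        events += [("Req", level), ("Ack", level)]
    return events

print(len(four_phase(2)), len(two_phase(2)))  # 8 4
print(two_phase(2) == four_phase(1))          # True
```

The same four wire transitions carry one transfer in the four-phase protocol and two transfers in the two-phase protocol.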
Data Signaling
  Bundled Data
    Normal wires, one wire per bit.
    Use control signals to indicate when data is valid.
  Dual-rail data
    2 wires per bit, encoding implies data validity
    00 = no data, 01 = 0, 10 = 1, 11 = invalid
    Simple acknowledge control wire
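The dual-rail code listed above can be sketched as a small encoder and decoder. Each bit is represented here as a two-character string for its wire pair; the function names, and the choice to return None while any bit still reads 00, are assumptions for this example.

```python
ENCODE = {0: "01", 1: "10"}
DECODE = {"01": 0, "10": 1}

def encode_word(bits):
    """Encode a list of bits, two wires per bit."""
    return [ENCODE[b] for b in bits]

def decode_word(codes):
    """Return the bits once every pair is valid, or None while data is absent."""
    if any(code == "11" for code in codes):
        raise ValueError("11 is not a legal dual-rail code")
    if any(code == "00" for code in codes):
        return None  # at least one bit has not arrived yet
    return [DECODE[code] for code in codes]

word = encode_word([1, 0, 1])
print(word)                             # ['10', '01', '10']
print(decode_word(word))                # [1, 0, 1]
print(decode_word(["10", "00", "01"]))  # None
```

Validity is carried by the data wires themselves, so the receiver needs no separate request signal and only has to send back an acknowledge.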