Demystifying Data-Driven and Pausible Clocking Schemes
![Page 1: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/1.jpg)
Demystifying Data-Driven and Pausible Clocking Schemes
Robert Mullins
Computer Architecture Group
Computer Laboratory, University of Cambridge
ASYNC 2007, 13th IEEE International Symposium on Asynchronous Circuits and Systems
![Page 2: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/2.jpg)
2
System-Timing: Emerging Challenges
• Current shift is from complex monolithic designs to networks of energy-efficient cores
• Distinct block and system-level timing challenges
• Network-level timing
  – Physically distributed
  – Activity may be sparse
  – Interconnect delay and power are significant
  – Significant variations in temperature, supply voltage and process parameters
Higher-level control, timing and scheduling are naturally event-driven
![Page 3: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/3.jpg)
3
Combining Local and Global Approaches to Timing
• Synchronization-free approaches
• Coping with metastability
  – Timing-safe
    • Allocate a fixed period of time for metastability to resolve, e.g. two flip-flop synchronizer
  – Value-safe
    • Wait for metastability to resolve, e.g. clock stretching or pausing techniques
    • Clock is generated locally
• Value-safe ideas are less well understood, avoided by industry
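The trade-off behind the timing-safe approach can be made concrete with the usual first-order textbook estimate, MTBF = exp(t_r / τ) / (T_w · f_clk · f_data). The sketch below simply evaluates that estimate. Every parameter value in it (`tau`, `t_window`, the clock and data rates) is a hypothetical number chosen for illustration and is not taken from the talk.

```python
import math

def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """First-order MTBF estimate (in seconds) for a flip-flop synchronizer.

    t_resolve: time allowed for metastability to resolve (s)
    tau:       metastability time constant of the flip-flop (s)
    t_window:  width of the metastability window (s)
    f_clk:     sampling clock frequency (Hz)
    f_data:    rate of asynchronous data transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)

# Hypothetical device and traffic numbers, for illustration only.
tau = 20e-12
t_window = 30e-12
f_clk = 1e9
f_data = 100e6

# Allowing a full clock period versus half a period to resolve.
one_cycle = synchronizer_mtbf(1.0 / f_clk, tau, t_window, f_clk, f_data)
half_cycle = synchronizer_mtbf(0.5 / f_clk, tau, t_window, f_clk, f_data)

print(f"MTBF with a full cycle to resolve: {one_cycle:.3e} s")
print(f"MTBF with half a cycle to resolve: {half_cycle:.3e} s")
```

Under this model, a timing-safe design buys reliability with added latency but never reaches zero failure probability. A value-safe scheme waits for resolution instead, so no MTBF figure applies.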
![Page 4: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/4.jpg)
4
Advantages of a value-safe approach
• Efficiency
  – Synchronization delay is minimized
  – Opportunities for optimization
• Robustness
  – Inherently robust, no trade-off against performance
  – Only way to guarantee data is never lost, no MTBF
  – Could still have functional failures if we are delayed too long, i.e. we miss performance requirements
• Transparency
  – Synchronous block is unaffected by clocking wrapper
  – Less true for traditional synchronization and clock-gating approaches
• Simplicity and modularity
  – I aim to illustrate how simple these schemes are
![Page 5: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/5.jpg)
5
Adding an asynchronous interface to a clock generator
[Figure label: CLOCK]
![Page 6: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/6.jpg)
6
Adding an asynchronous interface to a clock generator
[Figure labels: C, Req, Ack, CLOCK]
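Assuming the block labelled C in this figure is a Muller C-element (the usual meaning of that symbol in asynchronous design, though the transcript does not say so), its behaviour can be sketched in a few lines. The class name and two-input interface are illustrative choices, not taken from the slides.

```python
class CElement:
    """Behavioural model of a two-input Muller C-element.

    The output follows the inputs when they agree and holds its
    previous value when they differ.
    """

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:
            self.out = a  # both inputs agree: output switches to that value
        return self.out   # inputs differ: output is held


c = CElement()
trace = [c.update(1, 0), c.update(1, 1), c.update(0, 1), c.update(0, 0)]
print(trace)
```

In a handshaking clock generator, this hold behaviour is what lets an event on one input wait for the matching event on the other before the output toggles.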
![Page 7: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/7.jpg)
7
Adding an asynchronous interface to a clock generator
[Figure labels: C, CLOCK]
![Page 8: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/8.jpg)
8
Adding an asynchronous interface to a clock generator
[Figure labels: C, Req, Grant, MUTEX, CLOCK]
![Page 9: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/9.jpg)
9
Input register driven by a pausible clock
![Page 10: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/10.jpg)
10
[Figure: two clock generator templates side by side. Left labels: C, CLOCK, Ack, Req. Right labels: C, CLOCK, Req, Grant, MUTEX]
Data-Driven Clock
- May need to add a mechanism to ensure block receives enough clock edges, e.g. to flush pipeline
Pausible Clock
- Need to add an explicit sleep mechanism if we want to halt clock generator during periods of inactivity
Helps classify and understand existing techniques. In reality, the design space is a continuum
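The contrast between the two templates can be illustrated with a small event-level model. This is a rough behavioural sketch under simplifying assumptions, not the circuits from the talk: the data-driven clock produces one rising edge per request, the pausible clock free-runs and is delayed only while a request holds the mutex, and the function names, `hold_time` parameter and integer time units are all hypothetical.

```python
def data_driven_edges(request_times, min_period):
    """One rising edge per request, never closer together than min_period."""
    edges = []
    for t in sorted(request_times):
        earliest = edges[-1] + min_period if edges else t
        edges.append(max(t, earliest))
    return edges


def pausible_edges(request_times, period, hold_time, n_edges):
    """Free-running clock whose next rising edge waits while a request holds the mutex."""
    edges = []
    t = period  # first nominal rising edge
    for _ in range(n_edges):
        for r in sorted(request_times):
            if r <= t < r + hold_time:
                t = r + hold_time  # edge is paused until the request releases
        edges.append(t)
        t += period
    return edges


dd = data_driven_edges([3, 4, 20], min_period=5)
pc = pausible_edges([9, 30], period=5, hold_time=3, n_edges=5)
print("data-driven:", dd)
print("pausible:   ", pc)
```

In the first trace no edges occur between requests, which is why a data-driven block may need extra edges to flush its pipeline. In the second the clock keeps running when idle, which is why an explicit sleep mechanism is needed to halt it.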
![Page 11: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/11.jpg)
11
Stretchable Clocks
A type of data-driven clock
1. Rising clock edge is generated
2. Stretch signal may be asserted (synchronously) in response to clk+
3. Low phase of clock is stretched until some operation has completed and stretch signal is removed
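The three steps above can be sketched as a simple timing model. It assumes, for illustration only, that each rising edge may start an operation of known duration and that the next rising edge occurs at the later of the nominal period and the moment the stretch is released. The function and parameter names are hypothetical.

```python
def stretchable_clock(half_period, op_durations):
    """Return rising-edge times for a clock whose low phase can be stretched.

    op_durations[i] is how long the operation started by rising edge i
    keeps the stretch signal asserted (0 means no stretch is requested).
    """
    rising_edges = []
    t = 0
    for duration in op_durations:
        rising_edges.append(t)               # step 1: rising edge is generated
        nominal_next = t + 2 * half_period   # unstretched period
        stretch_release = t + duration       # steps 2-3: stretch held until done
        t = max(nominal_next, stretch_release)
    return rising_edges


edges = stretchable_clock(half_period=5, op_durations=[0, 25, 0, 3])
print(edges)
```

Only the second cycle is lengthened here, because its operation outlasts the nominal period. Short operations leave the clock period unchanged.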
![Page 12: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/12.jpg)
12
Stretchable Clocks
[Figure labels: C, Req, Ack, CLOCK]
![Page 13: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/13.jpg)
13
Stretchable Clocks
[Figure labels: C, Ack, Req, CLOCK]
![Page 14: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/14.jpg)
14
Stretchable Clocks
[Figure labels: C, Ack, Req, CLOCK, Stretch]
![Page 15: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/15.jpg)
15
Stretchable Clocks
[Figure labels: C, Ack, Req, CLOCK, Stretch]
Stretch delays Ack+
![Page 16: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/16.jpg)
16
Stretchable Clocks
[Figure labels: C, Ack, Req, CLOCK, Stretch]
![Page 17: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/17.jpg)
17
Input Ports
• Arbitrated Inputs
  – At most one input can be served per cycle
• Synchronised Inputs
  – Cannot proceed until multiple inputs are ready
• Sampled Inputs
  – Can progress with a variable number of data inputs (or none)
• Need to also choose event to trigger sampling of inputs
• Paper provides implementation details for each input port type for pausible and data-driven clock generators
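The three input port types differ only in which ready inputs are admitted in a given cycle, which a short sketch can make explicit. These functions are an illustrative abstraction, not the implementations from the paper. The fixed-priority choice in `arbitrated` and the "all inputs" reading of `synchronised` are assumptions.

```python
def arbitrated(ready):
    """Admit at most one ready input (lowest index wins here; real arbiters differ)."""
    for index, is_ready in enumerate(ready):
        if is_ready:
            return [index]
    return []


def synchronised(ready):
    """Proceed only when every input in the group is ready; otherwise admit nothing."""
    return list(range(len(ready))) if all(ready) else []


def sampled(ready):
    """Admit whichever inputs happen to be ready at the sampling event, possibly none."""
    return [index for index, is_ready in enumerate(ready) if is_ready]


print(arbitrated([False, True, True]))
print(synchronised([True, False]), synchronised([True, True]))
print(sampled([True, False, True]))
```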
![Page 18: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/18.jpg)
18
Output Ports
• Scheduled
  – Ensure data is output on a particular clock cycle, stall until data is consumed
• Registered
  – Addition of an output register allows next computation to proceed while data is consumed
• Polled
  – Sample output port ready signal and take appropriate action. Clock period is only ever extended to allow metastability to resolve, not because output is blocked.
![Page 19: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/19.jpg)
19
A GALS Wrapper Example
• Free-running clock
• Asynchronous input
  – We know nothing about when data will arrive
  – For simplicity, let's assume we can always accept new data
• Registered output feeding asynchronous FIFO
Simple to combine clock generator, input and output ports
![Page 20: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/20.jpg)
20
A GALS Wrapper Example: Step 1.
Local clock generator with H/S interface
![Page 21: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/21.jpg)
21
A GALS Wrapper Example: Step 2.
Pausible Clock Template
![Page 22: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/22.jpg)
22
A GALS Wrapper Example: Step 3.
Provide registered output port support (stretchable clock template)
![Page 23: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/23.jpg)
23
A GALS Wrapper Example: Step 4.
![Page 24: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/24.jpg)
24
Data-Driven Clocking for On-Chip Networks
• Why is global synchrony limiting for on-chip networks?
  – Reconfigurable networks, adaptive low-voltage interconnect drivers, irregular topologies, …
• Problem with traditional synchronization techniques
  – Latency (could easily double best-case latency, our routers are single-cycle – support VCs < 30FO4)
• Problems with fully-asynchronous implementations
  – Latency (for the router designs we have examined)
  – More difficult to speculate? Scheduling is expensive?
![Page 25: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/25.jpg)
25
Data-Driven Clocking for On-Chip Routers
• Router should be clocked when one or more inputs are valid (or flits are buffered)
• Elevator analogy…
  – Free-running (paternoster) elevator
    • Chain of open compartments
    • Must synchronise before you jump on!
  – Traditional elevator (data-driven clock)
    • Wait for someone to arrive
    • Close doors, decide who is in and who is out
    • Metastability issue again (potentially painful!)
![Page 26: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/26.jpg)
26
Data-Driven Clock with Sampled Inputs
[Figure annotations]
Local Clock Generator Template
Sample inputs when at least one input is ready (and clock is low)
Assert Lock (Close Lift Doors)
Incoming data: either admitted or locked out
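The "close the lift doors" step can be sketched as a function that picks a lock time and then sorts inputs into those admitted and those locked out until the next cycle. This is a simplified reading of the figure annotations. The `lock_delay` parameter, the port names and the tie-breaking rule are hypothetical, and a real circuit would resolve an input arriving right at the lock instant with an arbiter rather than a comparison.

```python
def close_doors(arrival_times, lock_delay):
    """Decide which inputs are sampled in the current clock cycle.

    arrival_times: mapping from input name to arrival time
    lock_delay:    time between the first arrival and asserting the lock
    """
    first_arrival = min(arrival_times.values())
    lock_at = first_arrival + lock_delay
    # Ties at the lock instant are admitted here purely for simplicity.
    admitted = sorted(name for name, t in arrival_times.items() if t <= lock_at)
    locked_out = sorted(name for name, t in arrival_times.items() if t > lock_at)
    return lock_at, admitted, locked_out


result = close_doors({"north": 10, "east": 12, "south": 30}, lock_delay=5)
print(result)
```

Inputs that are locked out are not lost. They simply wait and trigger sampling for a later clock cycle.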
![Page 27: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/27.jpg)
27
Clock Tree Insertion Delays
• Delay from root to leaf of clock tree can be considerable (certainly non-zero!)
• If every clock cycle is the same, this clock insertion delay is not normally an issue
• If we stretch the clock the insertion delay must be considered in our timing analysis (also true for clock gating in synchronous world)
• Not difficult to handle, but can increase time required to admit new data
![Page 28: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/28.jpg)
28
Clock Tree Insertion Delays
Can place logic here
![Page 29: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/29.jpg)
29
Clock Tree Insertion Delays
• How do we handle multi-cycle insertion delays?
• In practice, we would want to avoid very large synchronous blocks
• Need to ensure we admit data on the correct clock cycle
• Cannot cheat and promote data!
We simply remember on which clock cycle data has been scheduled to be admitted
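One way to picture "remembering the clock cycle" is to tag each synchronised item with the index of the root clock edge it was scheduled against, and to admit it only when that same edge reaches the leaves of the clock tree. The class below is a hypothetical sketch of that bookkeeping, not the mechanism from the paper. All names are illustrative.

```python
from collections import deque


class AdmissionTracker:
    """Track which clock edge each item is scheduled to be admitted on."""

    def __init__(self):
        self.root_edges = 0       # edges launched into the clock tree
        self.leaf_edges = 0       # edges that have reached the registers
        self.scheduled = deque()  # (edge index, data), in launch order

    def launch_edge(self):
        self.root_edges += 1

    def schedule(self, data):
        # Data synchronised now belongs to the next edge launched at the root,
        # not to any earlier edge still travelling through the tree.
        self.scheduled.append((self.root_edges + 1, data))

    def edge_arrives(self):
        self.leaf_edges += 1
        admitted = []
        while self.scheduled and self.scheduled[0][0] <= self.leaf_edges:
            admitted.append(self.scheduled.popleft()[1])
        return admitted


tracker = AdmissionTracker()
tracker.launch_edge()            # edge 1 in flight
tracker.launch_edge()            # edge 2 in flight
tracker.schedule("flit A")       # scheduled for edge 3
first = tracker.edge_arrives()   # edge 1 arrives: nothing admitted
tracker.launch_edge()            # edge 3 launched
second = tracker.edge_arrives()  # edge 2 arrives: still nothing
third = tracker.edge_arrives()   # edge 3 arrives: flit A admitted
print(first, second, third)
```

The item is never promoted onto edges 1 or 2, which were already in flight when it was synchronised.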
![Page 30: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/30.jpg)
30
Summary
• Value-safe techniques are simple and robust
  – Powerful framework for composing synchronous sub-systems
  – Build efficient event-driven global communication and scheduling infrastructure?
  – Scope for supporting low-power techniques? (self-timed power-gating, DVFS support, timing-speculation…)
• Scope for exploiting event-driven scheduling and clocking at system-level.
• Synchronization costs are low enough to prompt use in on-chip network applications
• More in the paper, which aims to be a useful survey and hopefully fills some gaps too.
![Page 31: Demystifying Data-Driven and Pausible Clocking Schemes](https://reader035.fdocuments.us/reader035/viewer/2022062302/568167df550346895ddd3eef/html5/thumbnails/31.jpg)
31
Thank You!