Demystifying Data-Driven and Pausible Clocking Schemes Robert Mullins Computer Architecture Group...




Slide 1

Demystifying Data-Driven and Pausible Clocking Schemes
Robert Mullins
Computer Architecture Group, Computer Laboratory, University of Cambridge
ASYNC 2007, 13th IEEE International Symposium on Asynchronous Circuits and Systems

Slide 2: System-Timing: Emerging Challenges
- The current shift is from complex monolithic designs to networks of energy-efficient cores
- Distinct block-level and system-level timing challenges
- Network-level timing:
  - Physically distributed
  - Activity may be sparse
  - Interconnect delay and power are significant
- Significant variations in temperature, supply voltage and process parameters
- Higher-level control, timing and scheduling is naturally event-driven

Slide 3: Combining Local and Global Approaches to Timing
- Synchronization-free approaches
- Coping with metastability:
  - Timing-safe: allocate a fixed period of time for metastability to resolve, e.g. a two flip-flop synchronizer
  - Value-safe: wait for metastability to resolve, e.g. clock stretching or pausing techniques; the clock is generated locally
- Value-safe ideas are less well understood and are avoided by industry

Slide 4: Advantages of a Value-Safe Approach
- Efficiency: synchronization delay is minimized; opportunities for optimization
- Robustness: inherently robust, with no trade-off against performance. This is the only way to guarantee data is never lost (no MTBF). We could still have functional failures if we are delayed too long and don't hit performance requirements
- Transparency: the synchronous block is unaffected by the clocking wrapper. This is less true for traditional synchronization and clock-gating approaches
- Simplicity and modularity: I aim to illustrate how simple these schemes are

Slide 5: Adding an Asynchronous Interface to a Clock Generator

Slide 6

Slide 7

Slide 8

Slide 9: Input Register Driven by a Pausible Clock

Slide 10: Data-Driven Clock vs. Pausible Clock
- Data-driven clock: may need to add a mechanism to ensure the block receives enough clock edges, e.g. to flush the pipeline
- Pausible clock: need to add an explicit sleep mechanism if we want to halt the clock generator during periods of inactivity
- This distinction helps classify and understand existing techniques. In reality, the design space is a continuum

Slide 11: Stretchable Clocks
A type of data-driven clock:
1. A rising clock edge is generated
2. The stretch signal may be asserted (synchronously) in response to clk+
3. The low phase of the clock is stretched until some operation has completed and the stretch signal is removed

Slide 12: Stretchable Clocks

Slide 13: Stretchable Clocks

Slide 14: Stretchable Clocks

Slide 15: Stretchable Clocks

Slide 16: Stretchable Clocks

Slide 17: Input Ports
- Arbitrated inputs: at most one input can be served per cycle
- Synchronised inputs: cannot proceed until multiple inputs are ready
- Sampled inputs: can progress with a variable number of data inputs (or none); we also need to choose the event that triggers sampling of the inputs
- The paper provides implementation details for each input port type for both pausible and data-driven clock generators

Slide 18: Output Ports
- Scheduled: ensure data is output on a particular clock cycle; stall until the data is consumed
- Registered: adding an output register allows the next computation to proceed while the data is consumed
- Polled: sample the output port's ready signal and take appropriate action. The clock period is only ever extended to allow metastability to resolve, not because the output is blocked

Slide 19: A GALS Wrapper Example
- Free-running clock
- Asynchronous input: we know nothing about when data will arrive
- For simplicity, let's assume we can always accept new data
- Registered output feeding an asynchronous FIFO
- The clock generator, input and output ports are simple to combine

Slide 20: A GALS Wrapper Example, Step 1. Local clock generator with H/S (handshake) interface

Slide 21: A GALS Wrapper Example, Step 2. Pausible clock template

Slide 22: A GALS Wrapper Example, Step 3. Provide registered output port support (stretchable clock template)

Slide 23: A GALS Wrapper Example, Step 4.
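To make the three steps of slide 11 concrete, here is a minimal behavioural sketch of when rising edges occur under a stretchable clock. It assumes idealised, fixed high and low phase durations, and that the operation guarded by the stretch signal in each cycle completes a known time after that cycle's rising edge. The function `stretchable_clock_edges` and its parameters are hypothetical; the sketch models timing only, not metastability or any specific circuit from the paper.

```python
from typing import List, Optional, Sequence


def stretchable_clock_edges(
    high: float, low: float, op_durations: Sequence[Optional[float]]
) -> List[float]:
    """Return the rising-edge time of each clock cycle.

    Hypothetical model: op_durations[k] is None if stretch is not asserted in
    cycle k; otherwise it is the time (measured from that cycle's rising edge)
    at which the operation completes and stretch is removed.
    """
    rise = 0.0
    edges: List[float] = []
    for duration in op_durations:
        edges.append(rise)          # 1. a rising clock edge is generated
        fall = rise + high          # the high phase is never altered
        next_rise = fall + low      # nominal (unstretched) low phase
        if duration is not None:    # 2. stretch asserted in response to clk+
            # 3. hold the low phase until the operation completes
            #    (assumes zero delay from stretch removal to the next edge)
            next_rise = max(next_rise, rise + duration)
        rise = next_rise
    return edges


# Cycle 1 asserts stretch for an operation lasting 5 time units.
print(stretchable_clock_edges(1.0, 1.0, [None, 5.0, None]))  # [0.0, 2.0, 7.0]
```

A cycle whose operation finishes before the nominal low phase would have ended is not lengthened, so stretching only costs time in the cycles that actually need it.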
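The three input port types of slide 17 differ in which of the currently ready inputs a clock cycle may consume. The sketch below models only that per-cycle decision, assuming the set of ready inputs has already been resolved safely (in hardware, this is where arbitration and metastability resolution take place). The function names are hypothetical, and the round-robin priority in `arbitrated` is an assumption: the slide only requires that at most one input is served per cycle.

```python
from typing import AbstractSet, Set


def arbitrated(ready: AbstractSet[int], n_ports: int, last: int) -> Set[int]:
    """Serve at most one ready input per cycle.

    Round-robin priority starting after port `last` is an assumption made
    for this sketch.
    """
    for offset in range(1, n_ports + 1):
        port = (last + offset) % n_ports
        if port in ready:
            return {port}
    return set()


def synchronised(ready: AbstractSet[int], required: AbstractSet[int]) -> Set[int]:
    """Proceed only once every required input is ready; otherwise admit nothing."""
    return set(required) if set(required) <= set(ready) else set()


def sampled(ready: AbstractSet[int]) -> Set[int]:
    """Proceed with whatever inputs happen to be ready, possibly none."""
    return set(ready)


ready_now = {0, 2}
print(arbitrated(ready_now, n_ports=3, last=0))  # {2}
print(synchronised(ready_now, {0, 1, 2}))        # set()
print(sampled(ready_now))                        # {0, 2}
```

Note that `sampled` may return an empty set: a sampled port can progress with no data at all, which is why the event that triggers sampling must also be chosen.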
Slide 24: Data-Driven Clocking for On-Chip Networks
- Why is global synchrony limiting for on-chip networks? Reconfigurable networks, adaptive low-voltage interconnect drivers, irregular topologies, …
- Problem with traditional synchronization techniques: latency (this could easily double best-case latency; our routers are single-cycle, support VCs, and run at < 30 FO4)
- Problems with fully-asynchronous implementations: latency (for the router designs we have examined). More difficult to speculate? Scheduling is expensive?

Slide 25: Data-Driven Clocking for On-Chip Routers
- The router should be clocked when one or more inputs are valid (or flits are buffered)
- Elevator analogy:
  - Free-running (paternoster) elevator: a chain of open compartments; you must synchronise before you jump on!
  - Traditional elevator (data-driven clock): wait for someone to arrive, close the doors, decide who is in and who is out. The metastability issue arises again (potentially painful!)

Slide 26: Data-Driven Clock with Sampled Inputs
- Local clock generator template
- Sample inputs when at least one input is ready (and the clock is low)
- Assert Lock (close the lift doors): incoming data is either admitted or locked out

Slide 27: Clock Tree Insertion Delays
- The delay from root to leaf of the clock tree can be considerable (certainly non-zero!)
- If every clock cycle is the same, this clock insertion delay is not normally an issue
- If we stretch the clock, the insertion delay must be considered in our timing analysis (this is also true for clock gating in the synchronous world)
- Not difficult to handle, but it can increase the time required to admit new data

Slide 28: Clock Tree Insertion Delays
- Can place logic here

Slide 29: Clock Tree Insertion Delays
- How do we handle multi-cycle insertion delays?
- In practice, we would want to avoid very large synchronous blocks
- We need to ensure we admit data on the correct clock cycle
- We cannot cheat and promote data!
- We simply remember on which clock cycle data has been scheduled to be admitted

Slide 30: Summary
- Value-safe techniques are simple and robust
- A powerful framework for composing synchronous subsystems
- Build efficient event-driven global communication and scheduling infrastructure?
- Scope for supporting low-power techniques? (self-timed power-gating, DVFS support, timing speculation)
- Scope for exploiting event-driven scheduling and clocking at system level
- Synchronization costs are low enough to prompt use in on-chip network applications
- More in the paper: it aims to be a useful survey and hopefully fills some gaps too

Slide 31: Thank You!
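The "traditional elevator" of slides 25 and 26, a data-driven clock with sampled inputs, can be sketched as an event-level model. It assumes flits arrive at known times, that Lock resolves instantly (a flit arriving exactly at the lock time is admitted), that each clock cycle takes a fixed `period`, and that the router forwards one buffered flit per cycle. `Flit` and `data_driven_clock` are hypothetical names, and clock tree insertion delay (slides 27 to 29) is ignored.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Flit:
    port: int       # input port the flit arrives on
    arrival: float  # time at which that input becomes ready


def data_driven_clock(
    flits: Sequence[Flit], period: float
) -> List[Tuple[float, List[int]]]:
    """Return (lock_time, admitted_ports) for every clock cycle generated.

    Hypothetical behavioural model: the clock runs only while a flit is
    waiting or buffered; each cycle, inputs are sampled when Lock is
    asserted, and one buffered flit is forwarded per cycle.
    """
    pending = sorted(flits, key=lambda f: f.arrival)
    i = 0         # index of the next flit not yet admitted
    buffered = 0  # flits admitted but not yet forwarded
    t = 0.0       # earliest time the next sampling (Lock) can occur
    cycles: List[Tuple[float, List[int]]] = []
    while i < len(pending) or buffered:
        if buffered == 0 and pending[i].arrival > t:
            # Nothing to do: the clock stays halted until an input is ready.
            t = pending[i].arrival
        # Assert Lock ("close the lift doors"): inputs ready by time t are
        # admitted; later arrivals are locked out until the next cycle.
        admitted: List[int] = []
        while i < len(pending) and pending[i].arrival <= t:
            admitted.append(pending[i].port)
            i += 1
        buffered += len(admitted)
        cycles.append((t, admitted))
        buffered -= 1  # assume one buffered flit is forwarded per cycle
        t += period    # next sampling point, once this cycle completes
    return cycles


flits = [Flit(0, 0.0), Flit(1, 0.5), Flit(2, 10.0)]
for lock_time, ports in data_driven_clock(flits, period=1.0):
    print(lock_time, ports)  # prints 0.0 [0], then 1.0 [1], then 10.0 [2]
```

Here the flit on port 1 arrives while the first cycle is in progress, so it is locked out and admitted at the next sampling point. Between t = 2.0 and t = 10.0 no clock cycles are generated at all, which is what distinguishes a data-driven clock from a free-running one.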