An Ultra-Low Power Asynchronous-Logic

Prepared and Presented By: Hossam Hassan MSIS LAB, CBNU An Ultra-Low Power Asynchronous-Logic In-Situ Self-Adaptive VDD System for Wireless Sensor Networks Authors: Tong Lin, Kwen-Siong Chong, Joseph S. Chang, and Bah-Hwee Gwee Journal: IEEE Journal of Solid-State Circuits, vol. 48, no. 2, 2013

Transcript of An Ultra-Low Power Asynchronous-Logic

Page 1: An Ultra-Low Power Asynchronous-Logic

Prepared and Presented By: Hossam Hassan, MSIS LAB, CBNU

An Ultra-Low Power Asynchronous-Logic In-Situ Self-Adaptive VDD System for Wireless Sensor Networks

Authors: Tong Lin, Kwen-Siong Chong, Joseph S. Chang, and Bah-Hwee Gwee
Journal: IEEE Journal of Solid-State Circuits, vol. 48, no. 2, 2013

Page 2: An Ultra-Low Power Asynchronous-Logic

Outline
• Preliminaries
• Wireless Sensor Network
• Node Architecture
• Proposed Idea for Low Power Design
• Self-Adaptive VDD System for Wireless Sensor Networks
• Adaptive Vdd Scaling Systems
• System Design
• Results And Benchmarking

Page 3: An Ultra-Low Power Asynchronous-Logic

Preliminaries • What Is Asynchronous Logic?

• The traditional way of sequencing computation is to use a global time reference (“the clock”).

• Can we compute without a clock?
  • Yes: “asynchronous” or “clockless” logic
  • Also called “self-timed” or “speed-independent”

• Asynchronous system: a collection of modules communicating by handshake protocols

• Can we compute without a clock and without delay assumptions?
  • Quasi-delay-insensitive (QDI) logic

Adopted from: Alain J. Martin, California Institute of Technology

Page 4: An Ultra-Low Power Asynchronous-Logic

Preliminaries • Why Asynchronous and QDI Logic?

• No clock
  • Up to 50% of clock power recovered

• Automatic shut-off of idle parts
  • Perfect clock gating

• No glitches (spurious transitions)
  • Up to 50% of power in combinational circuits

• Automatic adaptation to parameter variations
  • Voltage scaling: delay can be traded directly against energy by scaling the supply voltage

• Flexibility of asynchronous interfaces
  • Better use of concurrency

• Robustness to PVT variations: variations of physical parameters all affect timing.

Adopted from: Alain J. Martin, California Institute of Technology

Page 5: An Ultra-Low Power Asynchronous-Logic

Preliminaries • Disadvantages of Async

• Size overhead (more transistors), e.g. for handshaking
• Poorly understood and rarely taught
• No industrial CAD tools (yet), i.e. custom design
• No well-developed testing procedure (yet), i.e. custom design

Page 6: An Ultra-Low Power Asynchronous-Logic

Preliminaries • Static Logic vs Dynamic Logic

Page 7: An Ultra-Low Power Asynchronous-Logic

Preliminaries • NULL Convention Logic

• NCL is a delay-insensitive (DI) asynchronous (i.e. clockless) paradigm, which means that NCL circuits operate correctly regardless of when circuit inputs become available. NCL circuits are therefore said to be correct-by-construction (i.e. no timing analysis is necessary for correct operation). NCL circuits use dual-rail or quad-rail logic to achieve delay-insensitivity.
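The dual-rail idea mentioned above can be sketched as follows. This is a minimal illustration assuming the common convention of one wire per logic value plus an all-zero NULL (spacer) state; the rail ordering `(rail0, rail1)` and the function names are assumptions for illustration, not taken from the slides.

```python
# Dual-rail encoding sketch: each logical bit is carried on two wires.
# Assumed ordering: (rail0, rail1).
NULL = (0, 0)    # spacer: no data present
DATA0 = (1, 0)   # logical 0: rail0 asserted
DATA1 = (0, 1)   # logical 1: rail1 asserted


def encode(bit):
    """Encode a logical bit as a dual-rail pair."""
    return DATA1 if bit else DATA0


def decode(rails):
    """Return 0 or 1 for valid data, None for NULL; reject the illegal (1, 1) state."""
    if rails == NULL:
        return None
    if rails == (1, 1):
        raise ValueError("both rails asserted: illegal dual-rail state")
    return 1 if rails == DATA1 else 0
```

Because the data itself announces its arrival (a transition from NULL to DATA0 or DATA1), a receiver does not need a clock to know when a bit is valid.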

Page 8: An Ultra-Low Power Asynchronous-Logic

Preliminaries • Pre-Charge Static Logic (PCSL):

• It is an asynchronous-logic Quasi-Delay-Insensitive architecture based on Static-Logic, featuring full-range Dynamic Voltage Scaling, including robust operation in the sub-threshold voltage regime, with simultaneously low hardware overheads, high speed and low power dissipation.

• The PCSL logic circuit achieves this by integrating the Request sub-circuit into the Static-Logic cell.

• During the initial phase, the output of the Static-Logic cell (within the PCSL logic circuit) is pre-charged.

• During the evaluate phase, the Static-Logic cell evaluates its inputs and the PCSL logic circuit outputs the result.

Enable the circuit

State Retention (i.e store the logic output value)

Pre-Charged Static-Logic (PCSL) architecture

Page 9: An Ultra-Low Power Asynchronous-Logic

Preliminaries • Muller C-elements:

• It is a small digital block widely used in the design of asynchronous circuits and systems.

• In a synchronous circuit, the role of the clock is to define points in time where signals are stable and valid. In between the clock ticks, signals may exhibit hazards and may make multiple transitions as the combinational circuit stabilizes.

• In an asynchronous system, the situation is different. The absence of a clock means that signals must be valid all the time: every transition has a meaning, and consequently hazards and races must be avoided.

Muller C Element and corresponding CMOS implementation.

Truth Table for Muller C Element
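The behaviour summarized by the truth table can be sketched in a few lines. This is a behavioural model only (not the CMOS implementation on the slide), and the class name is an illustrative assumption: the output follows the inputs when they agree and holds its previous value when they differ.

```python
class CElement:
    """Behavioural model of a 2-input Muller C-element."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        # Inputs agree: output takes their common value.
        # Inputs differ: output keeps its previous state.
        if a == b:
            self.out = a
        return self.out
```

For example, starting from 0, the input sequence (1,0), (1,1), (0,1), (0,0) produces the outputs 0, 1, 1, 0: the output rises only after both inputs have risen and falls only after both have fallen.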

Page 10: An Ultra-Low Power Asynchronous-Logic

Preliminaries • Filter bank

• In signal processing, a filter bank is an array of band-pass filters that separates the input signal into multiple components, each one carrying a single frequency sub-band of the original signal.

• The process of decomposition performed by the filter bank is called analysis (meaning analysis of the signal in terms of its components in each sub-band); the output of analysis is referred to as a sub-band signal with as many sub-bands as there are filters in the filter bank.

• The reconstruction process is called synthesis, meaning reconstitution of a complete signal resulting from the filtering process.
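As a toy illustration of analysis and synthesis, the sketch below splits a signal into two sub-bands and reconstructs it. It assumes a deliberately simple pair of filters (a two-tap moving average as the low band and its complement as the high band); it is not the band-pass array or the FRM filter bank of the paper, and the function names are hypothetical.

```python
def analyze(signal):
    """Split a signal into a low band (two-tap average) and a high band (two-tap difference)."""
    low, high = [], []
    prev = 0.0  # assume the signal is zero before the first sample
    for x in signal:
        low.append((x + prev) / 2)
        high.append((x - prev) / 2)
        prev = x
    return low, high


def synthesize(low, high):
    """Reconstruct the signal by summing the sub-band signals."""
    return [l + h for l, h in zip(low, high)]
```

Since the two filters are complementary, `synthesize(*analyze(x))` returns the original samples.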

Page 11: An Ultra-Low Power Asynchronous-Logic

Preliminaries • Frequency Response Masking (FRM):

• Frequency-response masking is a technique for designing sharp low-pass, high-pass, band-pass and band-stop filters with arbitrary passband bandwidth.
• Furthermore, it yields linear-phase FIR filters, which have advantages such as guaranteed stability and freedom from phase distortion.
• However, the problem with FIR filters is their high complexity for sharp filters.
• With the frequency-response masking technique, the resulting filter has very sparse coefficients.
• Since only a very small fraction of its coefficient values are nonzero, its complexity is very much lower than that of the infinite word-length minimax optimum filter.
• With an additional multiplier-less design method, the complexity is reduced to a minimum.
• In linear-phase FIR filters, phase is a linear function of frequency.
• They have a symmetric impulse response.
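The sparsity can be illustrated with the interpolation step used in FRM designs, where each delay of a prototype filter is replaced by M delays, i.e. H(z) becomes H(z^M). The sketch below is a minimal illustration under that assumption; the tap values and the factor are hypothetical, and the masking filters that complete an FRM design are omitted.

```python
def interpolate_taps(prototype, factor):
    """Insert (factor - 1) zeros between the taps of a prototype FIR filter."""
    taps = []
    for i, coeff in enumerate(prototype):
        taps.append(coeff)
        if i < len(prototype) - 1:
            taps.extend([0.0] * (factor - 1))
    return taps


# Hypothetical symmetric (linear-phase) prototype taps.
prototype = [0.1, 0.2, 0.4, 0.2, 0.1]
sparse = interpolate_taps(prototype, 4)
nonzero = sum(1 for c in sparse if c != 0.0)
```

Here the interpolated filter has 17 taps but only 5 multiplications per output sample, and the impulse response stays symmetric, so the linear-phase property is preserved.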

Page 12: An Ultra-Low Power Asynchronous-Logic

Preliminaries • Dynamic frequency scaling

• It is a technique in computer architecture whereby the frequency of a microprocessor can be automatically adjusted "on the fly", either to conserve power or to reduce the amount of heat generated by the chip.

• It is commonly used in laptops and other mobile devices, where energy comes from a battery and thus is limited.

• Dynamic voltage scaling:
  • It is another power conservation technique that is often used in conjunction with frequency scaling, as the frequency that a chip may run at is related to the operating voltage.
  • Since increasing power use may increase the temperature, increases in voltage or frequency may increase system power demands.
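Why voltage scaling is so effective can be seen from the standard first-order CMOS switching-power relation, P = a·C·V²·f, which is not stated on the slide and is used here as an assumption. The sketch below uses hypothetical values for the activity factor and capacitance, and ignores leakage.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order CMOS switching power: P = activity * C * VDD^2 * f (leakage ignored)."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical operating points: same frequency, supply halved.
p_nominal = dynamic_power(0.1, 1e-9, 1.2, 1e6)
p_scaled = dynamic_power(0.1, 1e-9, 0.6, 1e6)
ratio = p_scaled / p_nominal
```

Halving the supply voltage at a fixed frequency cuts the switching power to about one quarter, whereas halving the frequency alone only halves it.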

Page 13: An Ultra-Low Power Asynchronous-Logic

Preliminaries • Impact of DVS

Page 14: An Ultra-Low Power Asynchronous-Logic

Wireless Sensor Network
• Spatially distributed autonomous sensors
• Monitor physical or environmental conditions
  • Temperature, sound, etc.
• Pass their data through the network to a main location
• Modern networks are bi-directional, also enabling control of sensor activity
• Applications
  • Battlefield surveillance
  • Industrial process monitoring

Page 15: An Ultra-Low Power Asynchronous-Logic

Wireless Sensor Network
• The WSN is built of “nodes”
  • from a few to several hundred or even thousands
  • each node is connected to one (or sometimes several) sensors

• Each sensor network node typically has several parts
  • a radio transceiver
  • a microcontroller
  • an electronic circuit for interfacing with the sensors
  • an energy source, usually a battery

• As the WSN is typically designed for a multiple-year operational life-span, power is carefully budgeted and, where pertinent, modules are energized only when required, such that the overall average power is typically 10–100 µW.

• Goal: achieve the lowest possible power for the prevailing throughput and circuit conditions (VDD adjusted to within 50 mV of the minimum voltage), yet with high operational robustness and minimal overheads for a WSN.

Page 16: An Ultra-Low Power Asynchronous-Logic

Node Architecture

Page 17: An Ultra-Low Power Asynchronous-Logic

Proposed Idea for Low Power Design
• The signal processor accounts for ~50% of total power consumption
• ‘Sub-threshold Self-Adaptive VDD Scaling’ (SSAVS)
  • Circuits work in the sub-threshold region
  • Supply voltage is adjusted dynamically depending on the processing speed required by the external environment

• Adoption of the Quasi-Delay-Insensitive (QDI) asynchronous-logic protocols, where the circuits therein are self-timed.

• Embodiment of the sub-threshold Pre-Charged-Static-Logic (PCSL) design approach.

• The async SSAVS system has been benchmarked against its conventional sync DVFS system counterpart.

Page 18: An Ultra-Low Power Asynchronous-Logic

Proposed Idea for Low Power Design • Asynchronous logic implementation

• Pre-charged Static Logic (PCSL)
  • Superior to existing asynchronous logic styles in energy, delay and chip area.

Page 19: An Ultra-Low Power Asynchronous-Logic

Self-Adaptive VDD System for Wireless Sensor Networks
• As the WSN is typically designed for a multiple-year operational life-span, power is carefully budgeted and, where pertinent, modules are energized only when required, such that the overall average power is typically 10–100 µW.

• In the WSN depicted in Fig. 1, the overall active/passive operation ratio is approximately 20/80. In the passive mode, only the Sensor Front-End module is continuously energized. The Sensor and the Conditioning Circuits therein are powered directly by the VDD_BAT (2.8 V) battery, via a Low-Dropout (LDO) Regulator.

• The Simple Processor is powered by VDD_NOM (1.2 V) via a power-efficient Buck DC-DC Converter.

• The Simple Processor ascertains whether the input is possibly useful. If it is, the WSN goes into active mode, where it signals the Power Management module to energize the Signal Processor module via VDD_ADJ.

Page 20: An Ultra-Low Power Asynchronous-Logic

Self-Adaptive VDD System for Wireless Sensor Networks
• The voltage of VDD_ADJ, typically in the sub-threshold voltage (sub-Vt) range, is self-adjusted such that the lowest possible voltage is used, enabling ultra-low power operation.
• Signal Processor Module:
  • The Signal Processor module buffers (via a FIFO) the output of the Simple Processor and filters it before final computation by the Microcontroller Unit (MCU).

  • When the MCU ascertains that the filtered signal is useful, the Wireless Transceiver is energized and the processed signal is subsequently transmitted wirelessly.

  • With the wireless transmission expected to be 0.01% active and with a 20/80 WSN active/passive operation, 50% of the overall power is attributed to the Signal Processor module, which is therefore the module of interest in terms of power dissipation.

Page 21: An Ultra-Low Power Asynchronous-Logic

Self-Adaptive VDD System for Wireless Sensor Networks
• The approaches taken to minimize power involve all levels of the design space, from algorithmic design to the hardware level.
• Frequency Response Masking (FRM) technique
  • In the algorithmic design, the filtering in the Signal Processor module embodies the Frequency Response Masking (FRM) technique.

  • This involves the Interpolated Finite Impulse Response (IFIR) Filter and the FRM Filter Bank (FB), and is computationally more efficient than the usual FIR and IIR filter approaches.

• Among ultra-low power design techniques at the hardware level, operation in the sub-Vt region is one of the most effective.
  • This is particularly applicable because the speed of the digital circuits in the Signal Processor is modest: the clocking speed ranges from 1.4 kHz to 1.4 MHz for a sampling rate range from 0.1 kSamples/s (kS/s) to 100 kS/s.
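The two endpoints quoted above imply a constant number of clock cycles per input sample. The short check below is plain arithmetic on the figures from the slide; the function name is illustrative.

```python
def cycles_per_sample(clock_hz, sample_rate_hz):
    """Clock cycles available to process one input sample."""
    return clock_hz / sample_rate_hz


slowest = cycles_per_sample(1_400, 100)          # 1.4 kHz clock at 0.1 kS/s
fastest = cycles_per_sample(1_400_000, 100_000)  # 1.4 MHz clock at 100 kS/s
```

Both ends of the range work out to 14 cycles per sample, so the clock rate scales in direct proportion to the sampling rate.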

Page 22: An Ultra-Low Power Asynchronous-Logic

Self-Adaptive VDD System for Wireless Sensor Networks
• Despite the potential advantages of sub-Vt operation, this region of operation is challenging here for several reasons.
  • First, the WSN is designed to work in a wide range of conditions, including extreme environments (-55 °C to +125 °C).
  • Second, Process, Voltage and Temperature (PVT) variations for fine-dimensioned CMOS processes increase dramatically in sub-Vt operation, and the ensuing delay variations are very severe, possibly intractable. Typically, a very large delay safety margin (for synchronous-logic (sync) circuits) would need to be allowed for.

  • Third, the input signal to the Signal Processor module is variable. From a robust operation perspective, the circuits would need to be designed to meet the worst-case conditions: the fastest input rate and extreme temperatures.

• To design the WSN for ultra-low power operation, a self-adjusting VDD approach operating in the sub-Vt region is adopted. It is termed ‘Sub-threshold Self-Adaptive VDD Scaling’ (SSAVS), and the VDD is dynamically self-adjusted in situ.

Page 23: An Ultra-Low Power Asynchronous-Logic

Self-Adaptive VDD System for Wireless Sensor Networks
• The operation involves ‘dialing up’ VDD when the need for computation increases or when the operating conditions are less favorable, and ‘dialing down’ VDD when the converse holds.
  • Put simply, the lowest VDD is used where possible because, in general, the lower the VDD, the lower the power dissipation due to dynamic and leakage currents.

• The novel self-adjustment is obtained very simply, by exploiting (and comparing) the existing Request and Acknowledge signals of the QDI protocol signaling, and thereafter adjusting VDD_ADJ accordingly. The ensuing overhead is hence very low.

Page 24: An Ultra-Low Power Asynchronous-Logic

Adaptive Vdd Scaling Systems
• The general modality of adaptive VDD scaling systems to reduce power is to adaptively adjust VDD to be as low as possible (with appropriate timing margin) while meeting the throughput requirement for the prevailing operating conditions (including PVT variations).
  • This largely requires the pertinent circuit delay variations to be tracked, observed, or inferred.

• There are many reported techniques, but it can be argued that these tracking, observation and inference techniques are inadequate in terms of robustness, particularly in sub-Vt operation. Further, the hardware/computation overheads are considerable, including the need to scale VDD with the scaling of the clock frequency, i.e. Dynamic Voltage Frequency Scaling (DVFS).

• The proposed idea is to measure the delay directly and compare it against the throughput for the prevailing conditions; VDD is thereafter adjusted accordingly.
  • To enable this, self-timed async QDI logic is adopted, whose dual-rail signaling includes the Request signal, which indicates that the input sample is ready, and the Acknowledge signal, which indicates the completion of the computation.

Page 25: An Ultra-Low Power Asynchronous-Logic

Adaptive Vdd Scaling Systems
• By counting the number of Requests against Acknowledges within a given period, we ascertain whether or not the delay of the circuit is excessive with respect to the throughput for the prevailing conditions.
  • VDD is thereafter adjusted accordingly such that the delay is just slightly less than the interval between input samples, thereby satisfying the throughput.

• Further, as Acknowledge signaling is inherent in QDI async protocols, the computation is uninterrupted while VDD is transitioning during its self-adjustment; in reported adaptive scaling systems, circuit operation typically ceases when VDD is transitioning.
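The counting idea can be sketched as a per-period decision rule. This is a minimal sketch under stated assumptions, not the controller described in the paper: the simple "step up if Acknowledges lag Requests, otherwise step down" policy and the function name are hypothetical, while the 50 mV step and the 50 mV to 1.2 V range are taken from the System Design slides.

```python
STEP_MV = 50        # step size, from the System Design slides
VDD_MIN_MV = 50     # lowest selectable level, from the System Design slides
VDD_MAX_MV = 1200   # highest selectable level, from the System Design slides


def next_vdd_mv(vdd_mv, requests, acknowledges):
    """Choose the VDD for the next update period from the counts of the last one.

    Assumed policy: fewer Acknowledges than Requests means the circuit is too
    slow for the incoming sample rate, so raise VDD by one step; otherwise try
    one step lower.
    """
    if acknowledges < requests:
        return min(vdd_mv + STEP_MV, VDD_MAX_MV)
    return max(vdd_mv - STEP_MV, VDD_MIN_MV)
```

For instance, with 100 Requests in a period (1 kS/s sampled against a 10 Hz update clock) and only 97 Acknowledges, a supply of 300 mV would be raised to 350 mV for the next period.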

Page 26: An Ultra-Low Power Asynchronous-Logic

System Design
• Fig. 2 depicts the proposed SSAVS system within the Power Management module embodying the SSAVS Controller and its associated adjustable VDD means (a Buck DC-DC Converter), and the PCSL-based 8x8-Bit Quad-Channel Async QDI FRM FB within the FRM FB.

• There are two voltage rails in the overall proposed SSAVS system: a fixed VDD_NOM and a variable VDD_ADJ whose sub-Vt voltage typically ranges from 150 mV to 400 mV.

• For ease of illustration, the specific VDD rail is shown in parentheses for the supply rails and for the signals of the various modules.

Page 27: An Ultra-Low Power Asynchronous-Logic

System Design
• In Fig. 2, the input and request signals are first stepped down from VDD_NOM = 1.2 V to VDD_ADJ by the Step-Down Level Converter, and are thereafter buffered by the Async FIFO Buffer (depth of 50) before being input (as Input_FB and Req_FB) to the async FRM FB.

• The FB outputs (Output 1–4) and their associated Acknowledges (combined from Ack 1–4 via the Completion Detection Circuit) are output to the MCU for further processing.

• Acknowledge is also fed back to the Async FIFO Buffer.
• The Request and Acknowledge signals are input to the Power Management module, and Acknowledge is stepped up from VDD_ADJ to VDD_NOM.
• The SSAVS Controller within the Power Management module monitors the number of Request and Acknowledge signals in each period (set by a 10 Hz clock generated by the Update VDD Clock Generator for a target throughput of 1 kS/s).

• The VDD_Code is a 5-bit code that sets one of 24 voltage levels (in the Buck DC-DC Converter) ranging from ‘00000’ = 50 mV to ‘10111’ = 1.2 V (in 50 mV steps) for VDD_ADJ.
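The code-to-voltage mapping stated above is linear and can be written out directly. The sketch below follows the figures on the slide (24 levels, 50 mV steps, ‘00000’ = 50 mV, ‘10111’ = 1.2 V); the function names are illustrative, and rejecting codes above ‘10111’ is an assumption about how unused codes would be treated.

```python
def vdd_code_to_mv(code_bits):
    """Map a 5-bit VDD_Code string to the VDD_ADJ level in millivolts."""
    if len(code_bits) != 5:
        raise ValueError("VDD_Code must be 5 bits")
    code = int(code_bits, 2)
    if code > 23:
        raise ValueError("only 24 levels ('00000' to '10111') are defined")
    return 50 + 50 * code


def mv_to_vdd_code(vdd_mv):
    """Inverse mapping: a level in millivolts to its 5-bit VDD_Code string."""
    if vdd_mv % 50 != 0 or not 50 <= vdd_mv <= 1200:
        raise ValueError("level must be a multiple of 50 mV between 50 mV and 1200 mV")
    return format(vdd_mv // 50 - 1, "05b")
```

The two endpoints check out: code 0 gives 50 mV and code 23 gives 50 + 23 × 50 = 1200 mV.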

Page 28: An Ultra-Low Power Asynchronous-Logic

System Design
• Fig. 3 graphically depicts an example of the self-adjustment of VDD_ADJ.
  • When the WSN is first initiated, the SSAVS Controller outputs VDD_Code = ‘10111’, equivalently VDD_ADJ = 1.2 V, and the speed of the FB far exceeds what the computation requires.
  • The voltage of VDD_ADJ of the FB is adaptively self-adjusted in situ to be as low as possible (within 50 mV) while meeting the throughput for the prevailing operating conditions; on average, the voltage of VDD_ADJ is slightly higher than the actual required minimum.

• Hence, the FB is ultra-low power and highly power-efficient.

Page 29: An Ultra-Low Power Asynchronous-Logic

System Design
• In view of the need for sub-Vt operation, it is imperative to adopt circuits based on the static-logic family to mitigate the effects of critical transistor sizing; dynamic- and pass-logic families are inappropriate.

• ‘Pre-Charged Static-Logic’ (PCSL)
  • The basic architecture comprises an Inverting Static-Logic Cell, three transistors (for output pre-charging during the reset phase and evaluation during the computation phase), and two inverters (for output buffering). The outputs are Q.T (Output True) and Q.F (Output False).

The basic architecture of the proposed async cells, coined ‘Pre-Charged Static-Logic’ (PCSL).

Page 30: An Ultra-Low Power Asynchronous-Logic

System Design
• In PCSL cells, when Request is ‘0’, both outputs are ‘0’. On the other hand, when Request is ‘1’ (indicating that an operation is ready) and the input signals are valid, the operation commences and the ensuing output is obtained.

• The architecture of the PCSL cell integrates the sub-circuit associated with the Request signal and a buffer (for each output) into the standard static-logic library cell (redesigned for dual-rail async), thereby sharing (common) transistors.
  • This reduces the number of transistors, resulting simultaneously in lower power/energy dissipation, faster speed and smaller IC area.
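The Request-gated behaviour described above can be sketched for a 2-input AND/NAND cell. This is a behavioural model only, with assumptions labeled: inputs are modelled as dual-rail pairs ordered `(T, F)`, the function names are hypothetical, and returning ‘0’ on both outputs when Request is ‘1’ but an input is not yet valid is an assumption made for the sketch.

```python
NULL = (0, 0)  # both rails low: no valid data


def is_valid(rails):
    """A dual-rail input is valid when exactly one rail is asserted."""
    return rails in ((1, 0), (0, 1))


def pcsl_and2(request, a, b):
    """Return (Q.T, Q.F) of a 2-input AND/NAND cell for dual-rail inputs a and b."""
    if not request or not (is_valid(a) and is_valid(b)):
        return NULL  # pre-charged state: both outputs '0'
    result = a[0] & b[0]        # AND of the true rails
    return (result, 1 - result)  # Q.T carries AND, Q.F carries NAND
```

With Request at ‘1’ and both inputs true, the cell returns `(1, 0)`; with Request at ‘0’ it returns `(0, 0)` regardless of the inputs.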

Page 31: An Ultra-Low Power Asynchronous-Logic

System Design
• To depict the hardware advantage of the proposed PCSL approach, the 2-input AND/NAND gate can be compared to the same gate realized by three reported static-logic QDI approaches:
  a) the Delay-Insensitive Minterm Synthesis (DIMS) approach,
  b) NULL Convention Logic (NCL) with complex gates (denoted NCL1), and
  c) NCL with fast-reset complex gates (denoted NCL2).

Page 32: An Ultra-Low Power Asynchronous-Logic

System Design
• On the basis of simulations (130 nm CMOS), the energy, delay and IC area of six basic cells of the various approaches are compared. The values for the competing cells are normalized to those of the PCSL cells, whose actual values are shown within parentheses. The average attributes are tabulated in the last row.

• Cells embodying the proposed PCSL approach simultaneously exhibit the lowest energy, shortest delay and smallest IC area.

Page 33: An Ultra-Low Power Asynchronous-Logic

System Design
• With the proposed PCSL QDI realization approach, an 8x8-Bit Quad-Channel Async QDI FRM FB is designed.
  • A semicustom design flow is adopted.
  • Each FB channel comprises an Async Read/Write Controller, an 8x8-Bit Coefficient Memory, an 8x8-Bit Data Memory, an 8-Bit PCSL Multiplier, and a 20-Bit PCSL Adder.
  • To preserve the QDI protocol and proper async handshaking, Datapath Completion Detection (DCD) and Latch Completion Detection (LCD) circuits are included with Muller C-elements (denoted by a ‘C’).

Latch Completion Detection (LCD)

Datapath Completion Detection (DCD)
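The role of completion detection can be sketched generically for a dual-rail word. This is not the DCD or LCD circuit from the paper; it is a behavioural illustration, with hypothetical names, of the usual scheme in which a "done" signal rises only when every bit is valid and falls only when every bit has returned to NULL, holding its value in between as a Muller C-element would.

```python
def bit_valid(rails):
    """Exactly one rail asserted: the bit carries data."""
    return rails in ((1, 0), (0, 1))


def bit_null(rails):
    """Both rails low: the bit is in the spacer state."""
    return rails == (0, 0)


class CompletionDetector:
    """Completion signal for a dual-rail word, with C-element-like hysteresis."""

    def __init__(self):
        self.done = 0

    def update(self, word):
        if all(bit_valid(b) for b in word):
            self.done = 1      # all bits have arrived
        elif all(bit_null(b) for b in word):
            self.done = 0      # all bits have been reset
        return self.done       # otherwise hold the previous value
```

A partially arrived word therefore never raises the completion signal early, which is what allows the handshake to remain correct regardless of individual bit delays.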

Page 34: An Ultra-Low Power Asynchronous-Logic

Results And Benchmarking
• In Scenario 1, the sync DVFS system embodies a temperature sensor; on the basis of the measured temperature and pre-characterization of the sync filter, the clocking frequency is selected accordingly.

Page 35: An Ultra-Low Power Asynchronous-Logic

Results And Benchmarking
• In Scenario 2, the sync DVFS system is much simpler: the clocking frequency is fixed (to the worst case) to accommodate all conditions.

Page 36: An Ultra-Low Power Asynchronous-Logic

Results And Benchmarking
• In Scenario 1, no specific FB is particularly advantageous: the sync DVFS FB and the async SSAVS FB are advantageous in different conditions.

• Nevertheless, the sync FB may be disadvantageous if the temperature sensor overheads associated with DVFS for Scenario 1 are considered.

• In Scenario 2, the async FB is advantageous in terms of reduced delay with respect to VDD and usually lower Eper with respect to VDD; in terms of power dissipation, it is advantageous in some conditions (while the sync FB is advantageous in others).

• Further, in the context of continuous circuit operation and overheads associated with DVS, the proposed SSAVS is advantageous over the conventional DVFS in terms of uninterrupted circuit operation and not requiring external intervention (such as changing clock rate, pre-characterization, etc.).
