IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 46, NO. 6, JUNE … · 2017-08-26 · IEEE JOURNAL OF...


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 46, NO. 6, JUNE 2011 1433

A 100 pJ/bit, (32,8) CMOS Analog Low-Density Parity-Check Decoder Based on Margin Propagation

Ming Gu, Student Member, IEEE, and Shantanu Chakrabartty, Senior Member, IEEE

Abstract—One of the key factors underlying the popularity of low-density parity-check (LDPC) code is its iterative decoding algorithm which is amenable to efficient analog and digital implementation. However, different applications of LDPC codes (e.g. wireless sensor networks) impose different sets of constraints which include speed, bit error rates (BER) and energy efficiency. Our previous work reported an algorithmic framework for designing margin propagation (MP) based LDPC decoders where the BER performance can be traded off with its energy efficiency. In this paper we present an analog current-mode implementation of an MP-based (32,8) LDPC decoder. The implementation uses only addition, subtraction and threshold operations and hence is independent of transistor biasing. Measured results from prototypes fabricated in a 0.5 μm CMOS process verify the functionality of a (32,8) LDPC decoder and demonstrate the trade-off capability which is realized by adapting a system hyper-parameter. When configured as a min-sum LDPC decoder, the proposed implementation demonstrates superior BER performance compared to the state-of-the-art analog min-sum decoder at SNR greater than 3.5 dB. We show that an optimal configuration of the same MP-based decoder can also deliver up to 3 dB improvement in BER compared to the benchmark min-sum LDPC decoder.

Index Terms—Analog decoders, current-mode circuits, error-correcting code, low-density parity-check (LDPC) decoder, margin propagation (MP), piecewise-linear (PWL) approximation.

I. INTRODUCTION

LOW-DENSITY parity-check codes [1] constitute an important class of capacity approaching error-correcting codes which has seen widespread acceptance in emerging communication standards [2]–[4]. One of the key factors behind the success of LDPC codes is its iterative decoding algorithms [5]–[8] which are scalable and hence can be easily mapped onto digital [9]–[12] and analog [13]–[16] hardware. As LDPC codes are applied to a wide range of applications from digital video broadcasting (DVB) [2] to wireless sensor networks (WSNs) [17], each of these applications imposes a different set of constraints on the LDPC encoding and decoding algorithms and hence demands different performance specifications. For instance, a DVB application might require a lower BER than a WSN application, whereas a WSN application might impose a much stricter energy efficiency constraint on the LDPC decoding than its DVB counterpart. This motivates the need for LDPC decoders whose BER performance can potentially be traded off with respect to its energy efficiency. While this trade-off can be achieved in digital LDPC decoders by changing the decoding time and the number of iterations [10], [18], analog LDPC decoders require a system hyper-parameter that can modify the dynamics of the decoding process to achieve this trade-off.

In our previous work, we had reported a margin propagation (MP) based LDPC decoding algorithm [19], [20] and demonstrated that adapting a system hyper-parameter can achieve different levels of message sparsity and hence energy efficiency, however, at the expense of different BERs. In this paper, we present a current-mode implementation of a CMOS analog MP-based LDPC decoder. Like other analog LDPC decoders [14], [15], [21], [22], MP-based decoders can also demonstrate superior power/speed ratio and lower switching noise compared to their digital counterparts [9]–[12]. Also, the use of analog decoding eliminates the need for high speed analog-to-digital (A/D) conversion which is typically required for digital decoder implementations, which could lead to significant savings in power and silicon area. However, there are additional advantages of MP-based decoders when implemented using CMOS analog circuits, which include: 1) the operation of the MP decoder is invariant to the MOS transistor biasing (weak, moderate and strong inversion) [23]; 2) improved dynamic range due to current-mode implementation.

The functionality of the MP-based LDPC decoder is verified and characterized using a (32,8) decoder prototype fabricated in a 0.5 μm standard CMOS process. The measurement results validate our previously hypothesized ability of analog MP-decoders to adapt its BER and energy efficiency by adjustment of a system hyper-parameter. We show that, when configured as a min-sum decoder the BER performance of the proposed implementation outperforms a benchmark state-of-the-art min-sum decoder [14] at SNR levels greater than 3.5 dB. Also, when an optimal value of the system hyper-parameter is chosen, the proposed decoder outperforms the benchmark min-sum decoder by more than 3 dB.

The paper is organized as follows: Section II introduces the conventional LDPC decoding algorithm and describes the basic operational principles underlying the proposed MP-based LDPC decoding technique. In Sections III and IV, the architecture and circuit implementation of the basic decoder modules, namely, the check node module and the variable node module, are described. In Section V, we describe the experimental setup and measurements obtained from fabricated decoder prototypes. Finally, Section VI concludes the paper by discussing new directions and extensions of the proposed architecture to emerging low-voltage technology.

Manuscript received January 04, 2011; revised March 03, 2011; accepted March 14, 2011. Date of publication May 02, 2011; date of current version May 25, 2011. This paper was approved by Associate Editor Stefan Rusu. This work was supported by a research grant from the U.S. National Science Foundation (CCF:0728996).

The authors are with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824 USA (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2011.2134550

0018-9200/$26.00 © 2011 IEEE

Fig. 1. Tanner graph corresponding to a 32-bit (3,4) LDPC code.

II. LDPC CODES AND DECODING ALGORITHMS

A. LDPC Codes and Tanner Graph

The structure of an LDPC code can be graphically represented using a Tanner graph [5], an example of which is shown in Fig. 1. It consists of variable nodes , which are connected to check nodes , using edges. From Fig. 1, it can be seen that the LDPC code consists of 32 variable nodes and 24 check nodes. The number of variable nodes can be interpreted as the length of the codeword transmitted in a communication system. In this paper, the number of edges associated with each check node and variable node (also known as the degree of the node) will be denoted by and . If each check (variable) node has the same , the LDPC code is deemed as a regular LDPC code, otherwise, it is an irregular one [24]. A regular LDPC code can be expressed with the degrees . Thus, the LDPC code represented by Fig. 1 is a 32-bit (3,4) LDPC code, or a (32,8) LDPC code.

Let denote the set of check nodes connected to the variable node and represent the set of check nodes other than that are connected to variable node . Similarly, let denote the set of variable nodes connected to the check node and represent the set of variable nodes other than the node connected to .
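The degree bookkeeping for the code in Fig. 1 can be checked with a few lines of arithmetic. The sketch below is only illustrative: the names `n_var`, `n_chk`, `d_v` and `d_c` are hypothetical identifiers chosen here, not the paper's notation. It verifies that both sides of a regular (3,4) Tanner graph account for the same number of edges. The message length of 8 is taken from the (32,8) label; obtaining it as 32 - 24 assumes that the 24 checks are linearly independent, which the text does not state explicitly.

```python
def edge_count(n_nodes: int, degree: int) -> int:
    """Number of Tanner-graph edges leaving a set of equal-degree nodes."""
    return n_nodes * degree


# Values read from the text: 32 variable nodes, 24 check nodes, (3,4) regular code.
n_var, n_chk = 32, 24
d_v, d_c = 3, 4  # assumed meaning: variable-node degree 3, check-node degree 4

edges_from_variables = edge_count(n_var, d_v)
edges_from_checks = edge_count(n_chk, d_c)

# Every edge joins one variable node to one check node, so the counts must agree.
assert edges_from_variables == edges_from_checks

# Assumption: all 24 checks are independent, so the message length is n_var - n_chk.
message_bits = n_var - n_chk
code_rate = message_bits / n_var

print(edges_from_variables, message_bits, code_rate)
```

With these values both counts come to 96 edges, and the implied code rate is 1/4.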

B. Conventional LDPC Decoding Algorithms

Conventional LDPC decoding algorithms are either based on (a) the sum-product formulation [25] or its approximation commonly referred to as (b) the min-sum formulation [6] or its variants like (c) the normalized and offset min-sum algorithms [26]–[28].

In a sum-product based decoding algorithm, each check node receives messages (log-likelihood ratios) from its set of neighbors (denoted by ) and computes messages to be sent to the variable nodes (denoted by ) according to

(1)

where and equivalently .

Similarly, each variable node receives messages from its neighboring check nodes and re-computes messages that will be sent to the check node (denoted by ) according to

(2)

where denotes the initial value of the message computed using the noisy signal-levels (corresponding to transmitted bits) received over the communication channel.

Equation (1) is typically considered the most computationally intensive part of the LDPC decoding algorithm and several approximation techniques have been proposed in the analog and digital VLSI literature to reduce the computational complexity. One such approach is a min-sum based LDPC decoder which approximates the in (1) according to

(3)

where .
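The three update rules described in this subsection can be sketched in their standard textbook forms. This is an illustration under stated assumptions rather than a reproduction of (1)-(3): the function names are hypothetical, and the check-node rule is written as the usual tanh product over the incoming log-likelihood ratios, with min-sum replacing it by a sign product times the smallest magnitude.

```python
import math
from typing import Sequence


def check_sum_product(incoming: Sequence[float]) -> float:
    """Textbook sum-product check-node update (tanh rule) on incoming LLRs."""
    prod = 1.0
    for llr in incoming:
        prod *= math.tanh(llr / 2.0)
    # Keep the argument strictly inside (-1, 1) so atanh stays finite.
    prod = max(min(prod, 1.0 - 1e-15), -1.0 + 1e-15)
    return 2.0 * math.atanh(prod)


def check_min_sum(incoming: Sequence[float]) -> float:
    """Min-sum approximation: product of signs times the minimum magnitude."""
    sign = 1.0
    for llr in incoming:
        if llr < 0:
            sign = -sign
    return sign * min(abs(llr) for llr in incoming)


def variable_update(channel_llr: float, incoming: Sequence[float]) -> float:
    """Variable-node update: channel LLR plus the other check-node messages."""
    return channel_llr + sum(incoming)


msgs = [2.0, 3.0, -1.0]  # messages from the other neighbours of a degree-4 check
sp = check_sum_product(msgs)
ms = check_min_sum(msgs)
print(sp, ms, variable_update(0.5, [1.0, -2.0]))
```

For any input set the two check-node rules agree in sign, and the min-sum magnitude is never smaller than the sum-product magnitude, which is why normalized and offset variants scale or shift it down.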

C. MP-Based LDPC Decoding

Margin propagation (MP) provides a lower message sparsity and yet a more accurate approximation to (1) [19]. MP approximation has been described in detail in [29] and in this section, we provide a brief outline to support the hardware description provided later in this paper. Given a set of scores , , MP approximation computes a normalization factor according to the constraint:

(4)

where is a rectification operation and represents a hyper-parameter of the approximation. For the sake of consistency, we will express to be the output of a MP function whose inputs will be the set of scores , of size and the hyper-parameter .

In [19], we proved that the MP function is a piecewise-linear (PWL) approximation to a log-sum-exp function according to

(5)

which is illustrated graphically in Fig. 2 for a two-input MP-based function which is compared against an equivalent two-input sum-product and min-sum function. In [19], we showed that the parameter controls a theoretical measure of energy efficiency of an MP-based LDPC decoder. This is shown in Fig. 3 which is obtained using a three-parameter Monte Carlo simulation of a 1000-bit (3,6) LDPC code. It can be seen that for a fixed SNR, there exists an optimal value of (hence energy efficiency) that yields the best BER performance. This indicates that a correct choice of could provide an optimal performance in terms of energy efficiency and BER performance. Also, Fig. 3 could be used as a calibration curve for designing adaptive LDPC decoders with varying specifications of energy efficiency.

Fig. 2. Comparison of 3D LDPC check function for (a) sum-product (b) min-sum and (c) margin propagation with .

Fig. 3. Effect of adapting hyper-parameter on BER performance (for different AWGN channel conditions).
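The constraint described around (4) can be solved in software with a short sorting routine. The sketch below assumes the constraint has the usual margin-propagation form, in which the rectified differences between each score and the normalization factor sum to the hyper-parameter. The function name `margin_propagation` and the argument names `scores` and `gamma` are hypothetical labels introduced here for illustration.

```python
from typing import Sequence


def margin_propagation(scores: Sequence[float], gamma: float) -> float:
    """Return z such that sum(max(s - z, 0) for s in scores) == gamma.

    Assumed form of the MP constraint; gamma must be positive.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    ordered = sorted(scores, reverse=True)
    running = 0.0
    for k, s in enumerate(ordered, start=1):
        running += s
        # Candidate solution if exactly the k largest scores exceed z.
        z = (running - gamma) / k
        # Accept once every remaining score lies at or below z.
        if k == len(ordered) or z >= ordered[k]:
            return z
    raise ValueError("scores must be non-empty")


def residual(scores: Sequence[float], z: float) -> float:
    """Left-hand side of the assumed constraint, used to check a solution."""
    return sum(max(s - z, 0.0) for s in scores)


example = [3.0, 1.0, 0.0]
for g in (1.0, 3.0, 6.0):
    z = margin_propagation(example, g)
    print(g, z, residual(example, z))
```

Under this assumed form, a small hyper-parameter leaves only the largest score above the threshold, so the output approaches the maximum of the scores. A larger value brings more scores above the threshold, so fewer terms are rectified to zero.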

III. IMPLEMENTATION OF MP DECODER CHECK NODES

As shown in (4), MP approximation requires addition, subtraction and rectification. Each of these operations can be implemented in a CMOS process using basic conservation laws (e.g. Kirchhoff's current summation) which scale across transistor operating regimes (weak, moderate and strong inversion). Therefore, the same decoder architecture can be operated at different decoding speeds by scaling the bias current and hence by scaling its power dissipation. In this section, we describe the current-mode circuits which have been used for implementing the check node module of the proposed MP-based LDPC decoder. A differential topology has been adopted to represent the positive and negative log-likelihood ratios and for canceling out any common-mode interference.

We used the check node in Fig. 1 as an example to explain the implementation details. is connected to the variable nodes , , , and . To avoid long mathematical notations we will denote the message from to as and the messages as , as and as . The check function, according to (1) can then be written as:

(6)

If the log-likelihood ratios are represented by their differential forms as , with , and , then (6) can be rewritten as

(7)

Once the check function has been expressed in terms of log-sum-exp functions, we can apply the MP-based approximation according to the procedure described in Section II-C. The MP approximate check function (5) is written as

(8)

where and are given by

(9)

These equations are also applicable for computing messages , , and .

Fig. 4 shows the architecture of the check node implementing (9). The differential messages generated by the variable nodes are selected and the permutations and summations are carried out according to (9). The summation results are then processed by individual MP units as described in (9). The output produced by the respective MP units ( and in (9)) are then propagated to the neighboring variable nodes.

Fig. 5(a) shows the current-mode implementation of (9). The currents in Fig. 5(a) represent the differential variables in (9), , 2, 3. The hyper-parameter in (9) is represented by the current . The circuit has two operational modes: (a) when the reset signal is set to logic low, transistor pulls the gate voltage of transistors low and the output current is set to zero (implying no message is conveyed); (b) when the reset signal is set to logic high, the gate voltage of is set to , which determines the current . At equilibrium, the output current is determined by the MP condition (4).
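The check-node computation described in this section can be mimicked numerically. Expanding the tanh rule for three inputs gives a ratio of two four-term sums of exponentials, one over the combinations that take an even number of negative rails and one over those that take an odd number. The sketch below applies an MP function to each group and returns the two outputs. It is a behavioural illustration under assumptions, not the circuit of Fig. 5. The function names are hypothetical, the MP constraint is assumed to be the rectified-sum form, and the rail encoding in the example (positive rail `max(L, 0)`, negative rail `max(-L, 0)`) is one possible choice, because the constraints on the differential form are not recoverable from the text.

```python
from itertools import product
from typing import List, Sequence, Tuple


def margin_propagation(scores: Sequence[float], gamma: float) -> float:
    """Solve sum(max(s - z, 0)) == gamma for z (assumed MP constraint)."""
    ordered = sorted(scores, reverse=True)
    running = 0.0
    for k, s in enumerate(ordered, start=1):
        running += s
        z = (running - gamma) / k
        if k == len(ordered) or z >= ordered[k]:
            return z
    raise ValueError("scores must be non-empty")


def mp_check_node(pos: Sequence[float], neg: Sequence[float],
                  gamma: float) -> Tuple[float, float]:
    """Differential MP check node: returns (z_plus, z_minus)."""
    even_sums: List[float] = []
    odd_sums: List[float] = []
    # choice[i] == 1 selects the negative rail of input i, 0 the positive rail.
    for choice in product((0, 1), repeat=len(pos)):
        total = sum(neg[i] if c else pos[i] for i, c in enumerate(choice))
        (odd_sums if sum(choice) % 2 else even_sums).append(total)
    return margin_propagation(even_sums, gamma), margin_propagation(odd_sums, gamma)


# Inputs L = (2, 3, -1) encoded on two non-negative rails (assumed encoding).
pos, neg = [2.0, 3.0, 0.0], [0.0, 0.0, 1.0]
z_plus, z_minus = mp_check_node(pos, neg, gamma=1e-9)
print(z_plus - z_minus)  # close to the min-sum result for these inputs
```

Two properties of this sketch are worth noting. With a very small hyper-parameter the differential output approaches the min-sum value, which is consistent with the decoder being configurable as a min-sum decoder. Adding the same offset to both rails of an input shifts every sum equally, so the differential output does not change, which matches the common-mode rejection motivation for the differential topology.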


Fig. 4. Architecture of the check node in Fig. 1.

The PMOS diodes through ensure a unidirectional flow of current, thus implementing the rectification operation in (4). The subtraction operations in (4) are implemented by transistors and the summation constraint is implemented using Kirchhoff's current summation at the node . However, the diodes introduce a threshold voltage drop requiring a larger supply voltage ( ). An alternative low-voltage implementation is shown in Fig. 5(b) which consists of a cascode PMOS through . The implementation eliminates the threshold voltage drop but potentially suffers from current leakage between neighboring stages as shown in Fig. 5(b). This situation could occur when one of the branch currents is much larger than the other. In the proposed implementation, we have chosen the circuit shown in Fig. 5(a). Since the operation of the circuit is based on Kirchhoff's current law, it is functional irrespective of the biasing condition of the transistors. Consequently, the magnitude of current can vary in a wide range from pA to , which only affects the operational speed of the circuit. However, the matching between the transistors is important because it ensures that the constraint in (4) is accurately satisfied.

IV. IMPLEMENTATION OF THE VARIABLE NODE

We will use the variable node to illustrate the basic operation as described by (2). The architecture of the variable node is shown in Fig. 6 and the circuit implementation for computing the message propagated from variable node to check node is shown in Fig. 7. The currents represent the positive part of differential message . Similarly, and represent the and . These currents are summed at node and mirrored through transistors and . Similarly, the current through equals the sum of all the negative portion of the differential message. For brevity, we denote the two summed currents as and as shown in Fig. 7, where

(10)

The two currents and are compared at node and when is greater than , the current through equals the positive part of the differential message propagated from the variable node to the check node. A similar circuit computes the negative part of the differential message. Thus, we realize the differential form of (2) as

if
otherwise.

if
otherwise.

(11)

Note, the transistors and in Fig. 7 act as current limiters where the common-mode current and hence the total power dissipation of the decoder can be limited using the voltage .
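The variable-node behaviour described above (sum the positive rails, sum the negative rails, then compare) can be sketched as follows. This is a behavioural guess at the differential form of (2), since (10) and (11) did not survive in the text. It assumes that each output rail carries the rectified difference of the two summed currents, and it leaves out the current limiters. The function and argument names are hypothetical.

```python
from typing import Sequence, Tuple


def variable_node(pos_inputs: Sequence[float],
                  neg_inputs: Sequence[float]) -> Tuple[float, float]:
    """Behavioural model of a differential variable node.

    pos_inputs / neg_inputs hold the positive and negative rails of the
    channel message and of the other incoming check-node messages.
    Assumption: the output rails are the rectified differences of the sums.
    """
    i_pos = sum(pos_inputs)  # summed positive-rail current
    i_neg = sum(neg_inputs)  # summed negative-rail current
    out_pos = max(i_pos - i_neg, 0.0)  # non-zero only when i_pos > i_neg
    out_neg = max(i_neg - i_pos, 0.0)  # non-zero only when i_neg > i_pos
    return out_pos, out_neg


# Channel message +1.0, one check message +0.5, one check message -2.0.
print(variable_node([1.0, 0.5, 0.0], [0.0, 0.0, 2.0]))
```

In this model at most one output rail is non-zero at any time, so the differential output equals the plain sum in (2) while each current stays non-negative.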

V. MEASUREMENT RESULTS

Fig. 8 shows a system level schematic of the (32,8) LDPC decoder corresponding to the Tanner graph shown in Fig. 1. Besides the variable node and the check node modules, the decoder also integrates the following modules to facilitate output buffering and testing through an external digital interface (FPGA in this paper):

(a) A 6-bit current-mode digital-to-analog converter (DAC), used to digitally program the initial messages (currents) in (10). The MSB of the DAC is denoted by whereas the LSB is denoted by . The architecture of the current DAC (shown in Fig. 8) is based on the popular resistive divider [30] where the current through each branch is twice as large as its neighbor. The output as determined by the bits is a binary weighted sum of currents which are mirrored into each of the variable node modules.

(b) Output comparators, which are used to compute the decoded bits by comparing the differential variable node currents when the decoder has reached equilibrium. For example, if is greater than , the output of the comparator is logic “high” and indicative of the bit “0” being decoded. Note that the comparator needs to be reset before each comparison is made.

(c) Digital sample-and-hold input buffer, a shift register chain which converts the serial input to the decoder IC into parallel bit slices that are processed by the DAC module. As shown in Fig. 8, there are two sets of non-overlapping control signals for the shift register chain: and . and . In one period, when ( and ), the previous stage data is sampled; and when ( and ), the sampled data is held during while the DAC module processes the latched bits.

(d) Digital sample-and-hold output buffer, also a shift register chain, which samples the parallel decoded bits (from the comparator modules) and converts them into a bit-serial format .

Fig. 5. Circuits for implementing the check node module using the MP algorithm: (a) design used in this paper; and (b) an alternate low-voltage circuit.

Fig. 6. Architecture of variable node in Fig. 1.

Fig. 7. Circuit to implement variable node.
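The binary-weighted current summation performed by the 6-bit DAC can be illustrated with a short model. This is a sketch under assumptions: the function name `dac_current`, the MSB-first bit ordering and the unit current `i_lsb` are all hypothetical, and the model ignores mismatch and the mirroring into the variable node modules.

```python
from typing import Sequence


def dac_current(bits: Sequence[int], i_lsb: float) -> float:
    """Ideal 6-bit binary-weighted current DAC.

    bits  : six values in {0, 1}, most significant bit first (assumed order)
    i_lsb : current contributed by the least significant branch
    Each branch carries twice the current of its less significant neighbour.
    """
    if len(bits) != 6 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected six bits, each 0 or 1")
    code = 0
    for b in bits:
        code = (code << 1) | b  # accumulate the binary code, MSB first
    return code * i_lsb


print(dac_current([1, 0, 0, 0, 0, 0], 1.0))  # MSB branch only
print(dac_current([1, 1, 1, 1, 1, 1], 1.0))  # full-scale output
```

With a unit LSB current the MSB branch alone contributes 32 units and the full-scale output is 63 units, so the 64 available levels set the resolution with which the initial messages can be programmed.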

The microphotograph of a prototype MP-based LDPC de-coder fabricated in a 0.5 m standard CMOS process is shownin Fig. 9. Table I summarizes the main specifications of the chip.The measurement setup used to evaluate the fabricated chip

is shown in Fig. 10. The decoder chip is hosted on a daughterboard which is then mounted on a test station mother board. A

Page 6: IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 46, NO. 6, JUNE … · 2017-08-26 · IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 46, NO. 6, JUNE 2011 1433 A 100 pJ/bit, (32,8) CMOS Analog

1438 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 46, NO. 6, JUNE 2011

Fig. 8. System level schematic of the (32,8) LDPC decoder.

TABLE ISPECIFICATIONS OF THE MP-BASED ANALOG LDPC DECODER

second mountable daughter board hosts a field-programmablegate array (FPGA) which is responsible for clock generationand data synchronization. A sample timing diagram for all thedigital signals used by the LDPC chip (shown in Fig. 8) andgenerated by the FPGA are shown in Fig. 11. The FPGA se-lectively programs the on-chip DACs to emulate data receivedover an additive white Gaussian noise (AWGN) channel afterwhich the FPGA enables the analog decoding core. The outputof the decoder is latched on the comparator after a pre-deter-mined decoding time and the latched bit-stream is serially re-trieved by the FPGA. As shown in Fig. 10, the experimentalsetup is also controlled through a Visual Studio interface on a

Fig. 9. Die microphotograph of the chip.

PC. The C script emulates the AWGN channel and generates6-bit signal values at different SNR levels. These bit patterns are

Page 7: IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 46, NO. 6, JUNE … · 2017-08-26 · IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 46, NO. 6, JUNE 2011 1433 A 100 pJ/bit, (32,8) CMOS Analog

GU AND CHAKRABARTTY: A 100 pJ/bit, (32,8) CMOS ANALOG LOW-DENSITY PARITY-CHECK DECODER BASED ON MARGIN PROPAGATION 1439

Fig. 10. Experimental setup of the chip.

Fig. 11. Timing diagram of digital signals used to control the operation of theLDPC decoder chip.

then stored on the SDRAM of the FPGA which uses the valuesto perform real-time decoding. After each noisy transmission,decoded bits are retrieved from the LDPC chip and stored onthe FPGA’s SDRAM. At the end of each Monte Carlo run, theVisual Studio interface transfers the data logged on the SDRAMand computes the overall BER. The MATLAB interface is alsoused to adapt the bias voltages of the decoder through a NationalInstruments data acquisition card.Fig. 12 compares the BER performance of a software imple-

mentation of the MP-based decoder and the measured resultsobtained from the fabricated prototype. For this experiment thedecoding throughput was set to 320 Kbps and the hyper-param-eter was optimally chosen based on iterative experiments asdescribed later. At low SNR, the hardware implementation out-performs its software counterpart which could be attributed tothe continuous-time dynamics of the analog decoder. However,at high SNR the software implementation outperforms its hard-ware counterpart which could be attributed to the limited dy-namic range of the programming DACs and due to the offsetand gain errors introduced by the current mirrors.Fig. 13 shows the measured BER curves which are obtained

under the conditions: (a) when the system is configured tooperate as a min-sum decoder [19] and the decoding

Fig. 12. Comparison of BER of the MP-based LDPC decoder using softwaresimulation and measurements from the fabricated prototypes.

Fig. 13. Comparison of BERs of the measured MP-based LDPC decoder anda benchmark min-sum LDPC decoder.

throughput is set to 320 Kbps; and (b) when the MP-baseddecoder is configured with an optimal setting of the hyper-pa-rameter . For comparison we have included BER resultsreported for a state-of-the-art analog min-sum decoder [14].The results in Fig. 13 show that for , the performance ofMP-based min-sum LDPC decoder is inferior to the benchmarkmin-sum decoder [14] at low SNR, however, at high SNR( 3.5 dB) the performance of MP min-sum decoder outper-forms the benchmark. When , the results show that theMP-based LDPC decoder outperforms the benchmark min-sumdecoder by more than 3 dB.Fig. 14 shows the measured BER curves for different values

of the hyper-parameter . Again, for this experiment the de-coding throughput is set to 320 Kbps. As increases ( in-creases), it can be seen that the BER of the MP-based LDPCdecoder first improves and then degrades, which is consistentwith the BER-SNR- trade-off previously demonstrated onlyusing simulation results (shown in Fig. 3).Fig. 15 shows the measured BER performance (under dif-

ferent SNRs) when the decoding time (inverse of decodingspeed) is varied. For this experiment, is held constant at0.74 V (correspondent to one of the four curves shown inFig. 14). It is seen that as the decoding time is increased, theBER performance improves, which is consistent with the re-sponse of previously reported analog decoders [14]. When the

Page 8: IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 46, NO. 6, JUNE … · 2017-08-26 · IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 46, NO. 6, JUNE 2011 1433 A 100 pJ/bit, (32,8) CMOS Analog

1440 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 46, NO. 6, JUNE 2011

TABLE IICMOS DIGITAL (TOP 4) AND ANALOG (BOTTOM 6) DECODERS

Fig. 14. Comparison of measured BER performances for different values ofthe hyper-parameter .

decoding time is reduced to less than 2.5 s (data throughput12.8 Mb/s), the BER performance starts to decreases rapidly.When decoding time is as low as 833 ns (data throughput38.4 Mb/s), the BER (measured at SNR 7 dB) could be13 times higher than the BER measured at data throughput of12.8 Mb/s.Table II compares the measured specification of the fabri-

cated MP-based LDPC decoder to the specifications of differentdigital and analog decoders reported in literature. The compar-ison shows that the MP-based LDPC decoder has the secondhighest throughput amongst the reported analog decoders. How-ever, the turbo decoder with the highest throughput [31] has

Fig. 15. Measured BER performances of the MP-based LDPC decoderfor different decoding times (throughput).

an energy efficiency much lower than the MP-based design.Moreover, the turbo decoder has a longer codeword length thanours. The benchmark min-sum LDPC decoder [14] has the samecodelength as the proposedMP-based LDPC decoder. However,the former measured the energy efficiency based on the powerconsumption for the whole chip, while the latter based on thecore, which makes the comparison difficult. However, it can beseen that the throughput of theMP-based LDPC decoder is morethan twice that of the benchmark min-sum LDPC decoder. Interms of energy efficiency, the MP-based LDPC decoder has thesecond highest energy efficiency. Whereas the (8,4) HammingTrellis graph decoder, which has the highest energy efficiency,


has a much lower throughput than this implementation. In terms of silicon area, the proposed implementation achieves an integration density comparable to that of [14], considering the differences in the respective technology feature sizes.

The comparison table also shows that state-of-the-art

digital decoders can achieve energy efficiencies as high as those of the analog decoders. This is because these implementations exploit highly parallel architectures [9], [10], early termination techniques [18], post-processing methods [32], and aggressive feature and voltage scaling. However, it should be noted that, unlike analog decoders, digital decoders require an analog-to-digital converter (ADC) to digitize the analog channel information. For instance, the digital decoder reported in [32] requires 4- to 6-bit digital inputs, and the energy efficiency metric (pJ/bit) reported in Table II does not incorporate the power dissipation of the ADC.
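The throughput figures quoted in this section are consistent with one 32-bit codeword being decoded per decoding interval, and the ADC caveat can be made concrete with the same arithmetic. The following is a minimal sketch under stated assumptions: the function names are illustrative, the 1.28 mW decoder power is only the product implied by 100 pJ/bit at 12.8 Mb/s (not a measured entry from Table II), and the 1 mW ADC power is a purely hypothetical number chosen to show how the metric shifts.

```python
def throughput_mbps(codeword_bits, decoding_time_s):
    """Throughput in Mb/s, assuming one full codeword per decoding interval."""
    return codeword_bits / decoding_time_s / 1e6


def energy_per_bit_pj(power_w, rate_mbps, extra_power_w=0.0):
    """Energy per bit in pJ; extra_power_w models a front end such as an ADC."""
    return (power_w + extra_power_w) / (rate_mbps * 1e6) * 1e12


if __name__ == "__main__":
    # (32,8) code: 32 code bits per decoding interval (assumption).
    slow = throughput_mbps(32, 2.5e-6)   # decoding time of 2.5 microseconds
    fast = throughput_mbps(32, 833e-9)   # decoding time of 833 nanoseconds
    print(f"throughput: {slow:.1f} Mb/s and {fast:.1f} Mb/s")

    # Illustrative decoder power implied by 100 pJ/bit at 12.8 Mb/s.
    decoder_power_w = 1.28e-3
    core_only = energy_per_bit_pj(decoder_power_w, slow)

    # Hypothetical ADC power, included only to show the effect on pJ/bit.
    with_adc = energy_per_bit_pj(decoder_power_w, slow, extra_power_w=1e-3)
    print(f"core only: {core_only:.1f} pJ/bit, with ADC: {with_adc:.1f} pJ/bit")
```

Under these assumptions, any front-end power that a digital decoder needs but an analog decoder does not adds directly to the numerator of the pJ/bit metric, which is why the two classes in Table II are not compared on fully equal terms.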

VI. CONCLUSIONS AND DISCUSSIONS

In this paper, we have proposed an implementation of an analog LDPC decoder based on the MP algorithm. The use of MP-based decoding allows trading off the BER performance with respect to the energy efficiency of the decoder and hence can be adapted to a wide range of applications. A proof-of-concept prototype has been fabricated in a 0.5 μm CMOS technology, and measured results validate the trade-off capability of the decoder using a system hyper-parameter. The measured results show that the BER performance of the MP-based decoder outperforms a benchmark state-of-the-art min-sum decoder at SNR levels greater than 3.5 dB and can achieve energy efficiencies greater than 100 pJ/bit at a throughput of 12.8 Mb/s.

Even though we chose the 0.5 μm CMOS technology for implementing the MP decoder, we envision the proposed decoder being attractive for sub-100-nm CMOS processes. The architecture of the decoder is modular and hence can be scaled up to implement longer codes like the ones used in DVB-S2. However, for large analog decoders, the effect of mismatch and environmental variations could be significant and has to be considered during the design phase and ultimately incorporated into the automation and verification tools. Also, implementing a large-scale decoder would imply longer interconnects, which will require larger driving currents. Hence, maintaining strict control over the distribution of currents (or messages) would be the key to ensuring energy efficiency. The MP-based LDPC decoder allows such control through the use of the hyper-parameter. Also, due to aggressive scaling (feature size and power dissipation) of digital CMOS technology, low-voltage analog circuits that can be integrated with low-voltage digital designs are becoming one of the major requirements [33]. Even though the supply voltage has been aggressively scaled, the threshold voltage of the transistor cannot be significantly reduced due to the exponential increase in the sub-threshold leakage power [34]. Therefore, there is a need for analog computing circuits that can operate across multiple biasing conditions, ranging from the weak-inversion and moderate-inversion regimes to the strong-inversion regime. Since the underlying principle of the MP decoder is based on current conservation, the proposed circuits function regardless of the operating region in which the transistors are biased. Thus, using the MP-based technique it is possible to implement analog decoders operating with currents ranging from femtoamperes to microamperes.
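The conservation principle referred to above can be illustrated numerically. The sketch below assumes the reverse water-filling form of margin propagation from the authors' earlier MP work: given input scores L_i and a hyper-parameter γ > 0, find the level z such that the sum of max(L_i − z, 0) equals γ. The function name, the use of floating-point scores, and the example values are illustrative assumptions; this models only the arithmetic, not the current-mode circuit.

```python
def margin_propagation(scores, gamma):
    """Return z such that sum(max(s - z, 0) for s in scores) == gamma.

    Assumes the reverse water-filling form of margin propagation and
    uses only addition, subtraction, comparison and one division.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if gamma <= 0:
        raise ValueError("gamma must be positive")

    ordered = sorted(scores, reverse=True)
    running_sum = 0.0
    for k, value in enumerate(ordered, start=1):
        running_sum += value
        # Candidate level if exactly the k largest scores lie above z.
        z = (running_sum - gamma) / k
        # Accept once every remaining score sits at or below the level.
        if k == len(ordered) or ordered[k] <= z:
            return z


if __name__ == "__main__":
    inputs = [3.0, 1.0, 0.0]
    for gamma in (0.01, 1.0, 3.0, 10.0):
        z = margin_propagation(inputs, gamma)
        residual = sum(max(s - z, 0.0) for s in inputs)
        print(f"gamma={gamma}: z={z:.3f}, conserved sum={residual:.3f}")
```

In this model, a small γ leaves only the largest score above the level, so z approaches the maximum input, while a larger γ lets more inputs contribute. This is one way to picture how a single hyper-parameter could move the decoder between a sparse, max-like (min-sum-like) behavior and a denser one, although the mapping from γ to the bias voltage used in the measurements is not modeled here.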

REFERENCES

[1] R. G. Gallager, “Low-density parity-check codes,” Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, MA, 1963.

[2] ETSI EN 302 307, Digital Video Broadcasting (DVB): Second Generation Framing Structure, Channel Coding and Modulation Systems for Broadcasting, Interactive Services, News Gathering and Other Broadband Satellite Applications ver. 1.1.2, Jun. 2006.

[3] B. Bangerter, E. Jacobsen, M. Ho, A. Stephens, A. Maltsev, A. Rubtsov, and A. Sadri, “Wireless technologies: High-throughput wireless LAN air interface,” Intel Technical J., ser. 3, vol. 7, Aug. 2003.

[4] Y. Rashi, E. Sharon, and S. Litsyn, “LDPC: Efficient Alternative FEC for the TFI-OFDM PHY Proposal,” IEEE 802.15 Working Group for WPAN [Online]. Available: http://www.grouper.ieee.org/groups/802/15/pub

[5] R. M. Tanner, “A recursive approach to low complexity codes,” IEEE Trans. Inf. Theory, vol. 27, no. 5, pp. 533–547, Sep. 1981.

[6] J. Hagenauer, E. Offer, and L. Papke, “Iterative decoding of binary block and convolutional codes,” IEEE Trans. Inf. Theory, vol. IT-42, no. 3, pp. 429–445, Mar. 1996.

[7] D. J. C. MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999.

[8] M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi, and D. A. Spielman, “Improved low-density parity-check codes using irregular graphs,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 585–598, Feb. 2001.

[9] M. M. Mansour and N. R. Shanbhag, “A 640 Mbps 2048-bit programmable LDPC decoder chip,” IEEE J. Solid-State Circuits, vol. 41, pp. 684–698, Mar. 2006.

[10] A. J. Blanksby and C. J. Howland, “A 690-mW 1-Gb/s 1024-b, rate-1/2 low-density parity-check code decoder,” IEEE J. Solid-State Circuits, vol. 37, no. 3, pp. 404–412, Mar. 2002.

[11] A. Darabiha, A. C. Carusone, and F. R. Kschischang, “A 3.3-Gbps bit-serial block-interlaced min-sum LDPC decoder in 0.13-μm CMOS,” in Proc. IEEE Custom Integrated Circuits Conf., 2007, pp. 459–462.

[12] T. Brandon, R. Hang, G. Block, V. C. Gaudet, B. Cockburn, S. Howard, C. Giasson, K. Boyle, P. Goud, S. S. Zeinoddin, A. Rapley, S. Bates, D. Elliott, and C. Schlegel, “A scalable LDPC decoder ASIC architecture with bit-serial message exchange,” Elsevier Integration–The VLSI J., vol. 41, no. 3, pp. 385–398, May 2008.

[13] J. Hagenauer and M. Winklhofer, “The analog decoder,” in Proc. IEEE Int. Symp. Information Theory (ISIT ’98), Cambridge, MA, Aug. 1998, p. 145.

[14] S. Hemati, A. Banihashemi, and C. Plett, “A 0.18-μm CMOS analog min-sum iterative decoder for a (32,8) low-density parity-check (LDPC) code,” IEEE J. Solid-State Circuits, vol. 41, no. 11, pp. 2531–2540, Nov. 2006.

[15] C. Winstead, N. Nguyen, V. Gaudet, and C. Schlegel, “Low-voltage CMOS circuits for analog iterative decoders,” IEEE Trans. Circuits Syst. I, vol. 53, pp. 829–841, Apr. 2006.

[16] V. Gaudet, “Energy efficient circuits for LDPC decoding,” CMOS Emerging Technologies, 2007.

[17] S. B. Qaisar, S. Karande, K. Misra, and H. Radha, “Optimally mapping an iterative channel decoding algorithm to a wireless sensor network,” in Proc. IEEE Int. Conf. Commun., 2007, pp. 3283–3288.

[18] A. Darabiha, A. C. Carusone, and F. R. Kschischang, “Power reduction techniques for LDPC codes,” IEEE J. Solid-State Circuits, vol. 43, no. 8, pp. 1835–1845, Aug. 2008.

[19] M. Gu, K. Misra, H. Radha, and S. Chakrabartty, “Sparse decoding of low density parity check codes using margin propagation,” in Proc. IEEE Globecom, Nov. 2009, pp. 1–6.

[20] M. Gu and S. Chakrabartty, “An adaptive analog low-density parity-check decoder based on margin propagation,” in Proc. IEEE Int. Symp. Circuits and Systems, May 2011, accepted.

[21] C. Winstead, J. Dai, S. Yu, C. Myers, R. Harrison, and C. Schlegel, “CMOS analog MAP decoder for (8,4) Hamming code,” IEEE J. Solid-State Circuits, vol. 39, no. 1, pp. 122–131, Jan. 2004.

[22] H.-A. Loeliger, F. Lustenberger, M. Helfenstein, and F. Tarköy, “Probability propagation and decoding in analog VLSI,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 837–843, Feb. 2001.


[23] S. Chakrabartty, “CMOS analog iterative decoders using margin propagation circuits,” in Proc. IEEE Int. Symp. Circuits and Systems, Greece, 2006, pp. 5003–5006.

[24] T. Richardson, M. Shokrollahi, and R. Urbanke, “Design of capacity approaching irregular low-density parity-check codes,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 619–637, Feb. 2001.

[25] F. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.

[26] J. Chen, A. Dholakia, E. Eleftheriou, M. P. C. Fossorier, and X.-Y. Hu, “Reduced-complexity decoding of LDPC codes,” IEEE Trans. Commun., vol. 53, no. 8, pp. 1288–1299, Aug. 2005.

[27] J. Zhao, F. Zarkeshvari, and A. H. Banihashemi, “On implementation of min-sum algorithm and its modifications for decoding low-density parity-check (LDPC) codes,” IEEE Trans. Commun., vol. 53, no. 4, pp. 549–554, Apr. 2005.

[28] D. Oh and K. K. Parhi, “Min-sum decoder architectures with reduced word length for LDPC codes,” IEEE Trans. Circuits Syst. I, vol. 57, no. 1, pp. 105–114, Jan. 2010.

[29] C. Kong and S. Chakrabartty, “Analog iterative LDPC decoder based on margin propagation,” IEEE Trans. Circuits Syst. II, vol. 54, no. 12, pp. 1140–1144, Dec. 2007.

[30] C.-H. Lin and K. Bult, “A 10-b, 500-MSample/s CMOS DAC in 0.6 mm²,” IEEE J. Solid-State Circuits, vol. 33, no. 12, pp. 1948–1958, Dec. 1998.

[31] V. Gaudet and G. Gulak, “A 13.3-Mb/s 0.35-μm CMOS analog turbo decoder IC with a configurable interleaver,” IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 2010–2015, Nov. 2003.

[32] Z. Zhang, V. Anantharam, M. J. Wainwright, and B. Nikolić, “An efficient 10GBASE-T ethernet LDPC design with low error floors,” IEEE J. Solid-State Circuits, vol. 45, no. 4, pp. 843–855, Apr. 2010.

[33] International Technology Roadmap for Semiconductors, ITRS, 2010 [Online]. Available: http://www.itrs.net

[34] A. Balankutty, S.-A. Yu, Y. Feng, and P. R. Kinget, “A 0.6-V zero-IF/low-IF receiver with integrated fractional-N synthesizer for 2.4-GHz ISM-band applications,” IEEE J. Solid-State Circuits, vol. 45, no. 3, pp. 538–553, Mar. 2010.

[35] D. Vogrig, A. Gerosa, A. Neviani, A. Amat, G. Montorsi, and S. Benedetto, “A 0.35-μm CMOS analog turbo decoder for the 40-bit rate 1/3 UMTS channel code,” IEEE J. Solid-State Circuits, vol. 40, pp. 753–762, Mar. 2005.

Ming Gu received the B.E. and M.E. degrees in automation science and electrical engineering from the Beijing University of Aeronautics and Astronautics, Beijing, China, in 2000 and 2003, respectively, and the M.E. degree in electrical and computer engineering from Michigan State University, East Lansing, MI, in 2008.

From 2003 to 2006, she worked as a Research Engineer in the Center for Space Science and Applied Research, Chinese Academy of Sciences, Beijing, China. She is currently working towards the Ph.D. degree at Michigan State University, East Lansing, MI. Her research interests include mixed-signal VLSI circuit design, error-control coding, and analog signal processing.

Shantanu Chakrabartty (SM’99–M’04–S’09) received the B.Tech. degree from the Indian Institute of Technology, Delhi, India, in 1996, and the M.S. and Ph.D. degrees in electrical engineering from The Johns Hopkins University, Baltimore, MD, in 2002 and 2005, respectively.

He is currently an Associate Professor in the Department of Electrical and Computer Engineering at Michigan State University. From 1996 to 1999 he was with Qualcomm Incorporated, San Diego, CA, and during 2002 he was a visiting researcher at the University of Tokyo, Japan. His work covers different aspects of analog computing, in particular non-volatile circuits, and his current research interests include energy harvesting sensors and neuromorphic and hybrid circuits and systems.

Dr. Chakrabartty was a Catalyst foundation fellow from 1999 to 2004 and is a recipient of the National Science Foundation’s CAREER award and Michigan State University’s Teacher-Scholar Award. He is currently serving as an associate editor for the IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS, associate editor for the Advances in Artificial Neural Systems journal, and a review editor for the Frontiers of Neuromorphic Engineering journal.