Download - FPGA-based Evaluation of LDPC Codeskumar/Sabanci_Seminar.pdfFPGA-based Evaluation of LDPC CodesFPGA-based Evaluation of LDPC Codes Prof. Vijayakumar Bhagavatula ... Log-Likelihood

FPGA-based Evaluation of LDPC CodesFPGA-based Evaluation of LDPC Codes

Prof. Vijayakumar [email protected]

AcknowledgementsDr. Hongwei Song

Dr. Zongwang Li

Dr. Lingyan Sun

Xinde Hu

OutlineOutline

Motivation for using low density parity check (LDPC) codes in data storage systemsStructured LDPC codesSoft output Viterbi algorithm (SOVA)Implementation on FPGA hardwareLDPC code evaluation for magnetic recording channel modelsSummary

Digital Data Storage ChannelDigital Data Storage Channel

Media

WriteHead

ReadHead

Use

r Bits

User BitsWrite

CurrentWrite

Current

Timing Recovery

ReadEqualization

ReadEqualization

EncodingEncoding

DetectionDetectionDecodingDecoding

Tape cartridge Tape cartridge and driveand drive

Analog• Digital data represented by analog media• magnetization changes (magnetic disk, tape, magneto-optic recording)• pits and lands (CD/DVD)• phase change (DVR)• refractive index changes (holographic storage)

Hard Disk Drive Signal Processing TrendsHard Disk Drive Signal Processing Trends

PEAK DETECTMFM

(2,7)

(1,7)

PR4EPR4 NPML with PARITY

d=0 ord=1

Density

Time

ANALOG DIGITAL

E PR4, GEnPR4n

TURBO/LDPC CODES

d=0

Courtesy: H. Thapar

• Normalized densities > 3• Low SNRs; 6 to 10 dB range?• Higher data rates; need faster detection

Partial Response Maximum Likelihood (PRML)

ChannelT

PR Equalizer Viterbi detector kaka

• Forcing ISI to zero causes noise amplification• PR equalization leaves a controlled amount of ISI behind• Maximum likelihood (ML) uses Viterbi algorithm (VA) to unravel the original bits from noisy samples containing controlled ISI• HDD employs PRML

MotivationHard disk drives have enjoyed compound annual density growths as

high as 100%, but are reaching limits due to effects such as super-paramagnetic effect

Advanced coding and signal processing needed to cope with the lower SNRs and higher inter-symbol interference (ISI) of future systems.

Simulations of iterative soft detection/decoding using LDPC codes exhibit 3~5 dB coding gain over uncoded partial response maximum likelihood (PRML) system at bit error rate (BER) ~10-5

Performance of LDPC codes at BER <10-10 remains an open question; in particular, error floor is a concern

Approach: Use FPGA platform to evaluate LDPC coded channels to very low BER levels

LDPC Codes

X is valid codeword if0S HX= =

⎥⎥⎥⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢⎢⎢⎢

⎣

⎡

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

=

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

5

4

3

2

1

0

3

2

1

0

ˆˆˆˆˆˆ

011000001100011010100001

xxxxxx

ssss

H

3c 2c 1c 0c

5b 4b 3b 2b 1b 0b

Rate, block length, minimum distanceGirth: the shortest circle lengthGood structure of H matrix

facilitate LDPC decodingEasier analysis

Tanner Graph

Structured LDPC CodesStructured LDPC Codes

Regular codes: all columns have same number of onesStructured codes: parity check matrix or generator matrix has a structure that offers implementation advantages

Disjoint Difference Sets (DDS) codesArray codesEuclidean geometry codesQuasi-cyclic codes

Disjoint Difference Set (DDS) CodesDisjoint Difference Set (DDS) Codes

Difference set: A set {a1, a2, …,aj} of different residues (mod v) is called a difference set (v, j) if no two of the ordered j(j-1) differences ai-ai’ modulo v are identical.

D={{0, 1, 4}, {0, 2, 7}} is a disjoint difference sets (DDS) with v=13 {0, 1, 4} set: ai-ai’={1 3 4 9 10 12}{0, 2, 7} set: ai-ai’={2 5 6 7 8 11}

1 2

0 0 0 0 0 0 0 0 0 0⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎡ ⎤

⎣ ⎦ ⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

= =

1 1 1 1 1 11 1 1 1 1 1

1 1 1 1 1 11 1 1 1 1 1

1 1 1 1 1 11 1 1 1 1 1

1 1 1 1 1 11 1 1 1 1 1

1 1 1 1 1 11 1 1 1 1 1

1 1 1 1 1 11 1 1 1 1 1

1 1 1 1 1 1

H H H

0 1 4 0 2 7

*H. Song, “Iterative soft detection and decoding for data storage channels,” Ph.D dissertation, Dept. of ECE, Carnegie Mellon University, Pittsburgh, USA, Dec. 2002

D={0, 1, 3} is a difference set (7, 3), ai-ai’={1 2 3 4 5 6}

Array based LDPC codesArray based LDPC codes

2 3 1

2 4 6 2( 1)

1 2( 1) 3( 1) ( 1)( 1)

p

p

j j j j p

−

−

− − − − −

⎡ ⎤⎢ ⎥⎢ ⎥

= ⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

I I I I II σ σ σ σ

H I σ σ σ σ

I σ σ σ σ

11

1

1 p p×

⎡ ⎤⎢ ⎥⎢ ⎥

= ⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

σ

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

1 1 1 1 11 1 1 1 1

1 1 1 1 11 1 1 1 1

1 1 1 1 11 1 1 1 1

1 1 1 1 11 1 1 1 1

1 1 1 1 11 1 1 1 1

1 1 1 1 11 1 1 1 1

1 1 1 1 11 1 1 1 1

1 1 1 1 1

1

2

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦3

HH = H =

H


Quasi-Cyclic (QC) LDPC Code

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

=

ctcc

t

t

HHH

HHHHHH

H

21

22221

11211

Each Hij is either an all-zero matrix or a circulant matrix.(circulant size 5x5)n: codeword length, 4608

k: information bit length, 4096ql: circulant size qlxql, 128

⎥⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢⎢

⎣

⎡

=

0100000100000100000110000

, jiH

Parity check matrix of QC-LDPC code in circulant formLow routing congestion in ASIC

Soft InformationSoft Information

Larger LLR magnitudes represent more confidence

Prob. of bit (+1)Prob. of bit ( -1)Log-Likelihood Ratio (LLR) =loge

-2

-1

0

1

2

0 1 2 3 4 5 6 7 8 9 10Sample Index

LLR

High Confidence of (-1)

Low Confidence of (+1)

High Confidence of (+1)

LDPC DecodingLDPC Decoding

Bit-to-Check update

Check to bit update

LLR

Iterative Detection and DecodingIterative Detection and Decoding

Soft detection and decoding

Exchanging soft information between channel detector and LDPC decoder

3~5 dB Gain in simulations over uncoded PRML system at BER 10-5

Soft ChannelDetector

Soft ChannelDetector

LDPC DecoderLDPC

Decoder

Readback signal

Channel Iteration

Decoded bits

Example Performance Gain

EPR4 (1+D-D2-D3)

AWGN

Codeword length 4608

Code rate 8/9

10 channel iterations

1 LDPC iteration

4.5 5 5.5 6 6.5 7 7.5 8 8.5 9 9.510-6

10-5

10-4

10-3

10-2

10-1

SNR

BER

PRMLLDPC(10,1)

Problem of Error FloorProblem of Error Floor

Error floor is a major problem in applying LDPC codes to the magnetic recoding channel

5 6 7 8 9 10 11

Signal to Noise Ratio [dB]

10-1

10-2

10-3

10-4

10-5

10-6

10-7

10-8

10-9

10-10

10-11

Bit E

rror R

ate

Error Floor

C limitation

LDPC Code Evaluation ApproachLDPC Code Evaluation Approach

ApproachFPGA platformCode evaluation platform for AWGN channelCode Evaluation platform for ideal PR channelError analysis using FPGA

ChallengesHardware throughput and software flexibility Tradeoff between throughput, logic consumption

Reasons for FPGAReasons for FPGA

speed Cost

Programmability

Comparison of Hardware Platforms for DSP Algorithm

FPGA

uPs

ASIC uPs : General Purpose Microprocessors/ Digital Signal Processors

ASIC: Application Specific Integrated Circuit

EPR4 target, depth=15, TS-SOVA

Operations/bit ~350.

Assume 1GHz PC, 3 clock cycles/operation. 1013 bits need 122 days

100Mbps, 1013 bits need 1.15days

High speed: required to achieve BER <10-10

For SOVA only, at BER 10-12, more than 100days for PC and about 1 day for FPGAExperiments: For LDPC codes in AWGN channel, to get BER 10-11, more than one month for PC and about 1 day for FPGA.Reconfigurable

First Platform - System View

Matlab, C

PCI Port

PCI (33MHz) between PC and FPGA

Simulation controller: Matlab and C program

FPGA chip: Xilinx Virtex II 6000

6M gates

144 Memory blocks

Hardware / Software Co-designHardware / Software Co-design

Task Partition:

Generate SamplesCollect Errors Detection/Decoding

High bandwidth

PC FPGA

Collect Errors PCI Port

Generate SamplesDetection/Decoding

High speed sample generation

PC FPGA

PCI Port

Soft Channel DetectorsSoft Channel Detectors

Provide soft information for the LDPC DecoderMaximum-A-Posterior Probability (MAP) Detectors

MAPMax-Log-MapLog-Map

Maximum Likelihood Sequence Detectors (MLSD)Classic SOVA, Two step SOVABi-directional SOVA

Soft Channel Detector ComparisonMAP, Max-Log-Map, Log-Map, Bi-SOVA, TS-SOVA

Window length 32 EPR4 channelAWGNFloating pointN=4608, j=3, rate 8/910 LDPC iterations0 Channel iterations

5 5.5 6 6.510-4

10-3

10-2

10-1

SNR (dB)

BE

R

MAPLOG_MAPMAX_LOG_MAPBI_SOVASOVA

LDPC Decoder Implementations

Fully Parallel ImplementationDirectly map the algorithm to hardware implementation

Highest throughput. (64 Gbps/iteration at 64MHz using 0.16um standard cell CMOS processing with 5 metal layer*)

Complicated wiring dominates the chip area

Design not scalable

* C. Howland, A. Blanksby, “Parallel decoding architectures for low density parity check codes,” The 2001 IEEE International Symposium on Circuits and Systems, vol. 4, page 742 – 745, 2001.

PE

PE

Shared Memory Architecture

PEMemory

Problems:

Code specific design

Different H matrix structures give different interconnections

Large amount of memory consumption

Requirements:

A single decoder for a broad class of codes

Column weight, code rate, block length, H matrix structure

Fit the decoder in a single FPGA

High throughput

0b 1b 2b 4b3b 8b 9b6b 7b5b 10b

0c 1c 2c 3c 4c 5c 6c 7c

11b

Generalized LDPC Decoder Implementation Set H matrix constraint: consists of regular sub matrix

Physical node to virtual node mapping

0,0 0,1 0, 1

1,0 1,1 1, 1

1,0 1,1 1, 1

1 1 0 0 0 0 0 0 1 0 0 0

0 1 1 0 0 0 0 0 0 0 1 0

1 0 0 1 0 0 0 0 0 1 0 0

0 0 1 1 0 0 0 0 0 0 0 1

0 0 0 1 1 1 1 0 0 0 1 1

1 0 0 0 1 0 1 1 0 1 1 0

0 0 1 0 0 1 1 1 1 1 0 0

0 1 0 0 1 1 0 1 1 0 0 1

..

.... .. .. ..

..

k

k

j j j k

H H HH H H

H

H H H

−

−

− − − −

⎡ ⎤⎢ ⎥⎢ ⎥

⎡ ⎤ ⎢ ⎥⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥=⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥⎢ ⎥⎣ ⎦ ⎢ ⎥

⎢ ⎥⎢ ⎥⎣ ⎦

'2b'

1b'0b

'0c

'1c

'0b '

1b '2b

'0c '

1c

AWGN Channel Evaluation Platform

AWGNError Analyzer

PCI_ IF

DEC LLR Cal

PCI Port

PCI handshaking logic

Start, stop, noise variance, max errorBlock error,

Bit error

Fully reconfigurable: Iteration numbers, bit width, block length, Column width of code

High throughput

Small size to fit in one chip

Error floor of DDS code in AWGNError floor of DDS code in AWGN

3.5 4 4.5 5 5.5 6 6.510-10

10-9

10-8

10-7

10-6

10-5

10-4

10-3

10-2

10-1

100

SNR

BE

R/B

LER

DDS j=5, N=4923, M=547, Iterations=50

BLER

BER


DDS Codes in AWGN Channel

3 3.5 4 4.5 5 5.5 6 6.5 710-10

10-9

10-8

10-7

10-6

10-5

10-4

10-3

10-2

10-1

100

SNR

BER

/BLE

R

DDS j=3,4,5, N=4923, M=547, Iterations=50

DDS j=3DDS j=4DDS j=5 AWGN channel,

6 bit LLR, 50 iterationsJ=3 Error Floor at BLER=1e-3J=4 Error Floor at BLER=1e-4J=5 Error Floor at BLER=1e-6


3.5 4 4.5 5 5.5 610-12

10-11

10-10

10-9

10-8

10-7

10-6

10-5

10-4

10-3

10-2

10-1

100

SNR

BE

R/B

LER

Array j=5, N=4637, M=515, Iterations=50

BLER

BER

Error floor of Array code in AWGNError floor of Array code in AWGN

Array Code Evaluation in AWGN Channel

3 3.5 4 4.5 5 5.5 6 6.5 710-12

10-11

10-10

10-9

10-8

10-7

10-6

10-5

10-4

10-3

10-2

10-1

100

SNR

BER

/BLE

R

ARRAY j=3,4,5, Iterations=50

ARRAY j=3, N=4671, M=519ARRAY j=4, N=4716, M=524ARRAY j=5, N=4635, M=515

AWGN channel6 bit LLR, 50 iterationsJ=3 Error Floor at BLER=1e-5J=4 Error Floor at BLER=1e-6J=5 Error Floor at BLER=1e-9

* J. Fan, "Array codes as Low-Density Parity Check Codes" 2nd International Symposium on Turbo Codes and Related Topics (Brest, France), September 2000.

PR Channel Evaluation Platform

Main challenges

Data flow control

Buffer and memory requirement

Clock skew in large design

PCI (33.33 MHz) handshaking

RDG ENC FIFO Interleaver

Precoder

PR+AWGN

SOVADe-interleaver

DecoderInterleaver FIFO

Error distribution Error

CountError Location

Start, stop, noise variance, max error

Error Location, Error number, Error distribution

Error floor of DDS code (column weight 3) in PR channel

4 4.5 5 5.5 6 6.5 7 7.510

-8

10-7

10-6

10-5

10-4

10-3

10-2

10-1

100

SNR

BE

R/B

LER

DDS J=3, N=4923, M=547

AWGN Channel Result at 50 LDPC iteration

PR channel with 25 LDPC iterations and 2 channel iterations

BLER

BER BER

BLER

C Simulation

EPR4 channel7 bit SOVA6 bit LDPC Decoder25 LDPC iterations2 Channel iteration

4 4.5 5 5.5 6 6.5 7 7.510

-10

10-9

10-8

10-7

10-6

10-5

10-4

10-3

10-2

10-1

100

SNR

BE

R/B

LER

DDS J=5, N=4923, M=547

AWGN Channel at 50 LDPC iteration PR Channel with

25 LDPC iterations and 2 channel iterations

BLER

BER

BLER

BER

Error Floor of DDS Code (column weight 5) in PR Channel


4 4.5 5 5.5 6 6.5 7 7.5 810-9

10-8

10-7

10-6

10-5

10-4

10-3

10-2

10-1

100

SNR

BE

R/B

LER

ARRAY j=3, N=4671, M=519

BER

BLER

AWGN Channel Result at 50 LDPC iteration

PR channel with 25 LDPC iterations and 2 channel iterations

BLER

BER

Error Floor of Array Code (column weight 3) in PR Channel


Code length: 5760 Message length: 5120 Code rate: 8/9

Column Weight: 3, 4, 5Row weight: 27,36,45

Max. # iterations: 15

BER/BLER

1. 0E- 11

1. 0E- 10

1. 0E- 09

1. 0E- 08

1. 0E- 07

1. 0E- 06

1. 0E- 05

1. 0E- 04

1. 0E- 03

1. 0E- 02

1. 0E- 01

1. 0E+00

2. 5 3 3. 5 4 4. 5 5Eb/ N0

BER ( FPGA deg5)

BLER ( FPGA deg5)

BER ( FPGA deg 3)

BLER ( FPGA deg 3)

BER ( FPGA deg 4)

BLER ( FPGA deg 4)

Rate 8/9 QC-LDPC Code

BER/BLER

1. 0E- 11

1. 0E- 10

1. 0E- 09

1. 0E- 08

1. 0E- 07

1. 0E- 06

1. 0E- 05

1. 0E- 04

1. 0E- 03

1. 0E- 02

1. 0E- 01

1. 0E+00

0 0. 5 1 1. 5 2 2. 5 3

Eb/ N0 ( dB)

BLER( 30 i t er at i on,Scal ed mi n- sum, FPGA)

BER( 30 i t er at i on,Scal ed mi n- sum, FPGA)

Rate ½PEG QC-LDPC Code with Block Length of 32768 bits

BER/BLERBER/BLER

Low BERs Possible thanks due to FPGA

SOVA+LDPC FPGA SimulatorSOVA+LDPC FPGA Simulator

RDG ENC Interleaver

Write processor

SOVA Deinterleaver

DECinterleaverERR ANY

Read processor

Magnetic Recording Channel

Transition NoiseTransition Noise

Transition Jitter Width Variation

Transitions between bit cells not straightThe “zigzag”s result in transition shift (transition jitter) and pulse width variation

Media NonlinearitiesMedia NonlinearitiesPartial Erasure (PE)

Each bit cell is partially erased by the field of neighboring field (when transition occurs)Results in an amplitude loss in the readback signal

Nonlinear Transition Shift (NLTS)The previous written bit cells interfere with the current writing field Results in transition location shift

Partial Erasure

Original write bubble

Actual write bubble

Original Transition Position

Actual Transition Position

The writing head

NLTS Interference field

NLTS Factor

Partial ErasureFactor

Data Dependent Noise

Data Independent Noise

AWGN Generator

Transition NoiseEPR4 Equalizer

Jitter Noise

Width Variation

Magnetic Recording Channel ModelMagnetic Recording Channel Model

ISI Factor

Input Buffer

From Encoder

To Decoder

Electronic Noise

The ReadbackChannel

AWGN Generator

AWGN Generator

Hardware Resource UsageHardware Resource Usage

1 out of 16 6%

56 out of 168 33%

27308 out of 93184 29%

15936 out of 93184 17%

17897 out of 46592 38%

Number of GCLKs

Number of MULT18X18s

Number of 4 input LUTs

Number of Slice Flip Flops

Number of Slices

The resource utilization (Xilinx Virtex II 8000 -4 )• Total equivalent gate count for design: 3,696,944

Processing speed and throughput• Clock speed 40 Mbits/s• Throughput 180 Mbits/s per iteration, 3.6 Mbits/s with 50 iterations• Time required to reach BER of 10-11: FPGA --- 30 hours

C simulation --- 3,000 hours

Signal-to-Noise Ratio (SNR)Signal-to-Noise Ratio (SNR)

Includes both AWGN and transition noiseSNR defined as the ratio of the power of a single pulse and the height of noise power spectral density

The transition noise is the dominant part of the noise (80% in power)

02

250 0

/ 1 4

t

j t

E NSNRE

PW Nσ

=+

Xilinx Virtex II 8000FPGA ChipAlphaData ADM-XRC-II PCI AdaptorAdaptor

25 LDPC (outer) iterations2 channel (inner) iterations

Number of iterations12dB --- 22 dBSNR

EPR4PR target

Code length 4932, rate 8/9 LDPC codeLDPC Codes

VHDLHardware Programming language

Simulation SettingsSimulation Settings

BER for Different ChannelsBER for Different Channels

6 dB due to realistic channel modeling of Lorentzian pulse shape and the non-adaptive equalizer

2.5 dB due to PR channel

3 dB due to the presence of media nonlinearities

When the PE ratio reaches 0.7, the loss in BER is significant. For the NLTS effect, the threshold is around 0.15T. When ε exceeds 0.15T, a precompensator is necessary for the system.

Partial Erasure and NLTSPartial Erasure and NLTS

PE Effect NLTS Effect

Inefficient region Waterfall Region Error Floor

Each curve represents the ratio of errors corrected during the past 5 iterations In the low SNR region, the noise level is too high for the LDPC code to be effective. In the “waterfall” region, the error rate drops dramatically as SNR increases. Each iteration is able to correct a large portion of error bits left. In the error floor region, only the early iterations in each of the two channel iterations are effective. Some iterations introduce more errors than they correct, shown as negative ratio.

Iteration EfficiencyIteration Efficiency

BER with Fewer IterationsBER with Fewer Iterations

0.6 dB

Reduce the total number of iterations to 10 (in 2 channel iterations) from 50The loss in BER is 0.6 dB in the waterfall regionThe guaranteed throughput is increased by 5

FPGA LDPC Code Platform EvolutionFPGA LDPC Code Platform Evolution

AWGN Channel + LDPC ENDECIdealized EPR4 + LDPCEqualized EPR4 + LDPCLorentzian readback signal (on FPGA) + equalized EPR4 + LDPC

AWGNTransition noiseNonlinear transition shiftPartial erasure

Perpendicular recording channel model (on FPGA)

Perpendicular RecordingPerpendicular Recording

Step response modeled by error functionTransition noise Media nonlinearities

– Nonlinear transition shift (NLTS)

– Partial erasure (PE)

Inter-symbol interferenceElectronic Noise

ln16( ) ( )50dt

tg t E erfPW

=

LDPC Code with Perpendicular RecordingLDPC Code with Perpendicular Recording

BER FER

The effects of major impairments in perpendicular recording systems are similar to that in longitudinal recording except NLTS does not appear to impact perpendicular recording as much.At normalized recording density of 2 (PW50/T =2), the BER reaches 10-12 at SNR of 21.5 dB.

Magnetic recording channel models (both longitudinal and perpendicular recording systems) include transition noise, nonlinear transition shift, partial erasure and electronic noiseFPGA allows us to reach BERs as low as 10-12

At normalized recording density of 2, simulation exhibits 10-12 BER for SNR of around 21 dBThe individual impact of different impairments on coding performance can be investigated

SummarySummary