FPGA-based Evaluation of LDPC CodesFPGA-based Evaluation of LDPC Codes
Prof. Vijayakumar [email protected]
AcknowledgementsDr. Hongwei Song
Dr. Zongwang Li
Dr. Lingyan Sun
Xinde Hu
OutlineOutline
Motivation for using low density parity check (LDPC) codes in data storage systemsStructured LDPC codesSoft output Viterbi algorithm (SOVA)Implementation on FPGA hardwareLDPC code evaluation for magnetic recording channel modelsSummary
Digital Data Storage ChannelDigital Data Storage Channel
Media
WriteHead
ReadHead
Use
r Bits
User BitsWrite
CurrentWrite
Current
Timing Recovery
ReadEqualization
ReadEqualization
EncodingEncoding
DetectionDetectionDecodingDecoding
Tape cartridge Tape cartridge and driveand drive
Analog• Digital data represented by analog media• magnetization changes (magnetic disk, tape, magneto-optic recording)• pits and lands (CD/DVD)• phase change (DVR)• refractive index changes (holographic storage)
Hard Disk Drive Signal Processing TrendsHard Disk Drive Signal Processing Trends
PEAK DETECTMFM
(2,7)
(1,7)
PR4EPR4 NPML with PARITY
d=0 ord=1
Density
Time
ANALOG DIGITAL
E PR4, GEnPR4n
TURBO/LDPC CODES
d=0
Courtesy: H. Thapar
• Normalized densities > 3• Low SNRs; 6 to 10 dB range?• Higher data rates; need faster detection
Partial Response Maximum Likelihood (PRML)
ChannelT
PR Equalizer Viterbi detector kaka
• Forcing ISI to zero causes noise amplification• PR equalization leaves a controlled amount of ISI behind• Maximum likelihood (ML) uses Viterbi algorithm (VA) to unravel the original bits from noisy samples containing controlled ISI• HDD employs PRML
MotivationHard disk drives have enjoyed compound annual density growths as
high as 100%, but are reaching limits due to effects such as super-paramagnetic effect
Advanced coding and signal processing needed to cope with the lower SNRs and higher inter-symbol interference (ISI) of future systems.
Simulations of iterative soft detection/decoding using LDPC codes exhibit 3~5 dB coding gain over uncoded partial response maximum likelihood (PRML) system at bit error rate (BER) ~10-5
Performance of LDPC codes at BER <10-10 remains an open question; in particular, error floor is a concern
Approach: Use FPGA platform to evaluate LDPC coded channels to very low BER levels
LDPC Codes
X is valid codeword if0S HX= =
⎥⎥⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢⎢⎢
⎣
⎡
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
=
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
5
4
3
2
1
0
3
2
1
0
ˆˆˆˆˆˆ
011000001100011010100001
xxxxxx
ssss
H
3c 2c 1c 0c
5b 4b 3b 2b 1b 0b
Rate, block length, minimum distanceGirth: the shortest circle lengthGood structure of H matrix
facilitate LDPC decodingEasier analysis
Tanner Graph
Structured LDPC CodesStructured LDPC Codes
Regular codes: all columns have same number of onesStructured codes: parity check matrix or generator matrix has a structure that offers implementation advantages
Disjoint Difference Sets (DDS) codesArray codesEuclidean geometry codesQuasi-cyclic codes
Disjoint Difference Set (DDS) CodesDisjoint Difference Set (DDS) Codes
Difference set: A set {a1, a2, …,aj} of different residues (mod v) is called a difference set (v, j) if no two of the ordered j(j-1) differences ai-ai’ modulo v are identical.
D={{0, 1, 4}, {0, 2, 7}} is a disjoint difference sets (DDS) with v=13 {0, 1, 4} set: ai-ai’={1 3 4 9 10 12}{0, 2, 7} set: ai-ai’={2 5 6 7 8 11}
1 2
0 0 0 0 0 0 0 0 0 0⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎡ ⎤
⎣ ⎦ ⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦
= =
1 1 1 1 1 11 1 1 1 1 1
1 1 1 1 1 11 1 1 1 1 1
1 1 1 1 1 11 1 1 1 1 1
1 1 1 1 1 11 1 1 1 1 1
1 1 1 1 1 11 1 1 1 1 1
1 1 1 1 1 11 1 1 1 1 1
1 1 1 1 1 1
H H H
0 1 4 0 2 7
*H. Song, “Iterative soft detection and decoding for data storage channels,” Ph.D dissertation, Dept. of ECE, Carnegie Mellon University, Pittsburgh, USA, Dec. 2002
D={0, 1, 3} is a difference set (7, 3), ai-ai’={1 2 3 4 5 6}
Array based LDPC codesArray based LDPC codes
2 3 1
2 4 6 2( 1)
1 2( 1) 3( 1) ( 1)( 1)
p
p
j j j j p
−
−
− − − − −
⎡ ⎤⎢ ⎥⎢ ⎥
= ⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦
I I I I II σ σ σ σ
H I σ σ σ σ
I σ σ σ σ
11
1
1 p p×
⎡ ⎤⎢ ⎥⎢ ⎥
= ⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦
σ
⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦
1 1 1 1 11 1 1 1 1
1 1 1 1 11 1 1 1 1
1 1 1 1 11 1 1 1 1
1 1 1 1 11 1 1 1 1
1 1 1 1 11 1 1 1 1
1 1 1 1 11 1 1 1 1
1 1 1 1 11 1 1 1 1
1 1 1 1 1
1
2
⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦3
HH = H =
H
*H. Song, “Iterative soft detection and decoding for data storage channels,” Ph.D dissertation, Dept. of ECE, Carnegie Mellon University, Pittsburgh, USA, Dec. 2002
Quasi-Cyclic (QC) LDPC Code
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
=
ctcc
t
t
HHH
HHHHHH
H
21
22221
11211
Each Hij is either an all-zero matrix or a circulant matrix.(circulant size 5x5)n: codeword length, 4608
k: information bit length, 4096ql: circulant size qlxql, 128
⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢
⎣
⎡
=
0100000100000100000110000
, jiH
Parity check matrix of QC-LDPC code in circulant formLow routing congestion in ASIC
Soft InformationSoft Information
Larger LLR magnitudes represent more confidence
Prob. of bit (+1)Prob. of bit ( -1)Log-Likelihood Ratio (LLR) =loge
-2
-1
0
1
2
0 1 2 3 4 5 6 7 8 9 10Sample Index
LLR
High Confidence of (-1)
Low Confidence of (+1)
High Confidence of (+1)
LDPC DecodingLDPC Decoding
Bit-to-Check update
Check to bit update
LLR
Iterative Detection and DecodingIterative Detection and Decoding
Soft detection and decoding
Exchanging soft information between channel detector and LDPC decoder
3~5 dB Gain in simulations over uncoded PRML system at BER 10-5
Soft ChannelDetector
Soft ChannelDetector
LDPC DecoderLDPC
Decoder
Readback signal
Channel Iteration
Decoded bits
Example Performance Gain
EPR4 (1+D-D2-D3)
AWGN
Codeword length 4608
Code rate 8/9
10 channel iterations
1 LDPC iteration
4.5 5 5.5 6 6.5 7 7.5 8 8.5 9 9.510-6
10-5
10-4
10-3
10-2
10-1
SNR
BER
PRMLLDPC(10,1)
Problem of Error FloorProblem of Error Floor
Error floor is a major problem in applying LDPC codes to the magnetic recoding channel
5 6 7 8 9 10 11
Signal to Noise Ratio [dB]
10-1
10-2
10-3
10-4
10-5
10-6
10-7
10-8
10-9
10-10
10-11
Bit E
rror R
ate
Error Floor
C limitation
LDPC Code Evaluation ApproachLDPC Code Evaluation Approach
ApproachFPGA platformCode evaluation platform for AWGN channelCode Evaluation platform for ideal PR channelError analysis using FPGA
ChallengesHardware throughput and software flexibility Tradeoff between throughput, logic consumption
Reasons for FPGAReasons for FPGA
speed Cost
Programmability
Comparison of Hardware Platforms for DSP Algorithm
FPGA
uPs
ASIC uPs : General Purpose Microprocessors/ Digital Signal Processors
ASIC: Application Specific Integrated Circuit
EPR4 target, depth=15, TS-SOVA
Operations/bit ~350.
Assume 1GHz PC, 3 clock cycles/operation. 1013 bits need 122 days
100Mbps, 1013 bits need 1.15days
High speed: required to achieve BER <10-10
For SOVA only, at BER 10-12, more than 100days for PC and about 1 day for FPGAExperiments: For LDPC codes in AWGN channel, to get BER 10-11, more than one month for PC and about 1 day for FPGA.Reconfigurable
First Platform - System View
Matlab, C
PCI Port
PCI (33MHz) between PC and FPGA
Simulation controller: Matlab and C program
FPGA chip: Xilinx Virtex II 6000
6M gates
144 Memory blocks
Hardware / Software Co-designHardware / Software Co-design
Task Partition:
Generate SamplesCollect Errors Detection/Decoding
High bandwidth
PC FPGA
Collect Errors PCI Port
Generate SamplesDetection/Decoding
High speed sample generation
PC FPGA
PCI Port
Soft Channel DetectorsSoft Channel Detectors
Provide soft information for the LDPC DecoderMaximum-A-Posterior Probability (MAP) Detectors
MAPMax-Log-MapLog-Map
Maximum Likelihood Sequence Detectors (MLSD)Classic SOVA, Two step SOVABi-directional SOVA
Soft Channel Detector ComparisonMAP, Max-Log-Map, Log-Map, Bi-SOVA, TS-SOVA
Window length 32 EPR4 channelAWGNFloating pointN=4608, j=3, rate 8/910 LDPC iterations0 Channel iterations
5 5.5 6 6.510-4
10-3
10-2
10-1
SNR (dB)
BE
R
MAPLOG_MAPMAX_LOG_MAPBI_SOVASOVA
LDPC Decoder Implementations
Fully Parallel ImplementationDirectly map the algorithm to hardware implementation
Highest throughput. (64 Gbps/iteration at 64MHz using 0.16um standard cell CMOS processing with 5 metal layer*)
Complicated wiring dominates the chip area
Design not scalable
* C. Howland, A. Blanksby, “Parallel decoding architectures for low density parity check codes,” The 2001 IEEE International Symposium on Circuits and Systems, vol. 4, page 742 – 745, 2001.
PE
PE
Shared Memory Architecture
PEMemory
Problems:
Code specific design
Different H matrix structures give different interconnections
Large amount of memory consumption
Requirements:
A single decoder for a broad class of codes
Column weight, code rate, block length, H matrix structure
Fit the decoder in a single FPGA
High throughput
0b 1b 2b 4b3b 8b 9b6b 7b5b 10b
0c 1c 2c 3c 4c 5c 6c 7c
11b
Generalized LDPC Decoder Implementation Set H matrix constraint: consists of regular sub matrix
Physical node to virtual node mapping
0,0 0,1 0, 1
1,0 1,1 1, 1
1,0 1,1 1, 1
1 1 0 0 0 0 0 0 1 0 0 0
0 1 1 0 0 0 0 0 0 0 1 0
1 0 0 1 0 0 0 0 0 1 0 0
0 0 1 1 0 0 0 0 0 0 0 1
0 0 0 1 1 1 1 0 0 0 1 1
1 0 0 0 1 0 1 1 0 1 1 0
0 0 1 0 0 1 1 1 1 1 0 0
0 1 0 0 1 1 0 1 1 0 0 1
..
.... .. .. ..
..
k
k
j j j k
H H HH H H
H
H H H
−
−
− − − −
⎡ ⎤⎢ ⎥⎢ ⎥
⎡ ⎤ ⎢ ⎥⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥=⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥⎢ ⎥⎣ ⎦ ⎢ ⎥
⎢ ⎥⎢ ⎥⎣ ⎦
'2b'
1b'0b
'0c
'1c
'0b '
1b '2b
'0c '
1c
AWGN Channel Evaluation Platform
AWGNError Analyzer
PCI_ IF
DEC LLR Cal
PCI Port
PCI handshaking logic
Start, stop, noise variance, max errorBlock error,
Bit error
Fully reconfigurable: Iteration numbers, bit width, block length, Column width of code
High throughput
Small size to fit in one chip
Error floor of DDS code in AWGNError floor of DDS code in AWGN
3.5 4 4.5 5 5.5 6 6.510-10
10-9
10-8
10-7
10-6
10-5
10-4
10-3
10-2
10-1
100
SNR
BE
R/B
LER
DDS j=5, N=4923, M=547, Iterations=50
BLER
BER
*H. Song, “Iterative soft detection and decoding for data storage channels,” Ph.D dissertation, Dept. of ECE, Carnegie Mellon University, Pittsburgh, USA, Dec. 2002
DDS Codes in AWGN Channel
3 3.5 4 4.5 5 5.5 6 6.5 710-10
10-9
10-8
10-7
10-6
10-5
10-4
10-3
10-2
10-1
100
SNR
BER
/BLE
R
DDS j=3,4,5, N=4923, M=547, Iterations=50
DDS j=3DDS j=4DDS j=5 AWGN channel,
6 bit LLR, 50 iterationsJ=3 Error Floor at BLER=1e-3J=4 Error Floor at BLER=1e-4J=5 Error Floor at BLER=1e-6
*H. Song, “Iterative soft detection and decoding for data storage channels,” Ph.D dissertation, Dept. of ECE, Carnegie Mellon University, Pittsburgh, USA, Dec. 2002
3.5 4 4.5 5 5.5 610-12
10-11
10-10
10-9
10-8
10-7
10-6
10-5
10-4
10-3
10-2
10-1
100
SNR
BE
R/B
LER
Array j=5, N=4637, M=515, Iterations=50
BLER
BER
Error floor of Array code in AWGNError floor of Array code in AWGN
Array Code Evaluation in AWGN Channel
3 3.5 4 4.5 5 5.5 6 6.5 710-12
10-11
10-10
10-9
10-8
10-7
10-6
10-5
10-4
10-3
10-2
10-1
100
SNR
BER
/BLE
R
ARRAY j=3,4,5, Iterations=50
ARRAY j=3, N=4671, M=519ARRAY j=4, N=4716, M=524ARRAY j=5, N=4635, M=515
AWGN channel6 bit LLR, 50 iterationsJ=3 Error Floor at BLER=1e-5J=4 Error Floor at BLER=1e-6J=5 Error Floor at BLER=1e-9
* J. Fan, "Array codes as Low-Density Parity Check Codes" 2nd International Symposium on Turbo Codes and Related Topics (Brest, France), September 2000.
PR Channel Evaluation Platform
Main challenges
Data flow control
Buffer and memory requirement
Clock skew in large design
PCI (33.33 MHz) handshaking
RDG ENC FIFO Interleaver
Precoder
PR+AWGN
SOVADe-interleaver
DecoderInterleaver FIFO
Error distribution Error
CountError Location
Start, stop, noise variance, max error
Error Location, Error number, Error distribution
Error floor of DDS code (column weight 3) in PR channel
4 4.5 5 5.5 6 6.5 7 7.510
-8
10-7
10-6
10-5
10-4
10-3
10-2
10-1
100
SNR
BE
R/B
LER
DDS J=3, N=4923, M=547
AWGN Channel Result at 50 LDPC iteration
PR channel with 25 LDPC iterations and 2 channel iterations
BLER
BER BER
BLER
C Simulation
EPR4 channel7 bit SOVA6 bit LDPC Decoder25 LDPC iterations2 Channel iteration
4 4.5 5 5.5 6 6.5 7 7.510
-10
10-9
10-8
10-7
10-6
10-5
10-4
10-3
10-2
10-1
100
SNR
BE
R/B
LER
DDS J=5, N=4923, M=547
AWGN Channel at 50 LDPC iteration PR Channel with
25 LDPC iterations and 2 channel iterations
BLER
BER
BLER
BER
Error Floor of DDS Code (column weight 5) in PR Channel
EPR4 channel7 bit SOVA6 bit LDPC Decoder25 LDPC iterations2 Channel iteration
4 4.5 5 5.5 6 6.5 7 7.5 810-9
10-8
10-7
10-6
10-5
10-4
10-3
10-2
10-1
100
SNR
BE
R/B
LER
ARRAY j=3, N=4671, M=519
BER
BLER
AWGN Channel Result at 50 LDPC iteration
PR channel with 25 LDPC iterations and 2 channel iterations
BLER
BER
Error Floor of Array Code (column weight 3) in PR Channel
EPR4 channel7 bit SOVA6 bit LDPC Decoder25 LDPC iterations2 Channel iteration
Code length: 5760 Message length: 5120 Code rate: 8/9
Column Weight: 3, 4, 5Row weight: 27,36,45
Max. # iterations: 15
BER/BLER
1. 0E- 11
1. 0E- 10
1. 0E- 09
1. 0E- 08
1. 0E- 07
1. 0E- 06
1. 0E- 05
1. 0E- 04
1. 0E- 03
1. 0E- 02
1. 0E- 01
1. 0E+00
2. 5 3 3. 5 4 4. 5 5Eb/ N0
BER ( FPGA deg5)
BLER ( FPGA deg5)
BER ( FPGA deg 3)
BLER ( FPGA deg 3)
BER ( FPGA deg 4)
BLER ( FPGA deg 4)
Rate 8/9 QC-LDPC Code
BER/BLER
1. 0E- 11
1. 0E- 10
1. 0E- 09
1. 0E- 08
1. 0E- 07
1. 0E- 06
1. 0E- 05
1. 0E- 04
1. 0E- 03
1. 0E- 02
1. 0E- 01
1. 0E+00
0 0. 5 1 1. 5 2 2. 5 3
Eb/ N0 ( dB)
BLER( 30 i t er at i on,Scal ed mi n- sum, FPGA)
BER( 30 i t er at i on,Scal ed mi n- sum, FPGA)
Rate ½PEG QC-LDPC Code with Block Length of 32768 bits
BER/BLERBER/BLER
Low BERs Possible thanks due to FPGA
SOVA+LDPC FPGA SimulatorSOVA+LDPC FPGA Simulator
RDG ENC Interleaver
Write processor
SOVA Deinterleaver
DECinterleaverERR ANY
Read processor
Magnetic Recording Channel
Transition NoiseTransition Noise
Transition Jitter Width Variation
Transitions between bit cells not straightThe “zigzag”s result in transition shift (transition jitter) and pulse width variation
Media NonlinearitiesMedia NonlinearitiesPartial Erasure (PE)
Each bit cell is partially erased by the field of neighboring field (when transition occurs)Results in an amplitude loss in the readback signal
Nonlinear Transition Shift (NLTS)The previous written bit cells interfere with the current writing field Results in transition location shift
Partial Erasure
Original write bubble
Actual write bubble
Original Transition Position
Actual Transition Position
The writing head
NLTS Interference field
NLTS Factor
Partial ErasureFactor
Data Dependent Noise
Data Independent Noise
AWGN Generator
Transition NoiseEPR4 Equalizer
Jitter Noise
Width Variation
Magnetic Recording Channel ModelMagnetic Recording Channel Model
ISI Factor
Input Buffer
From Encoder
To Decoder
Electronic Noise
The ReadbackChannel
AWGN Generator
AWGN Generator
Hardware Resource UsageHardware Resource Usage
1 out of 16 6%
56 out of 168 33%
27308 out of 93184 29%
15936 out of 93184 17%
17897 out of 46592 38%
Number of GCLKs
Number of MULT18X18s
Number of 4 input LUTs
Number of Slice Flip Flops
Number of Slices
The resource utilization (Xilinx Virtex II 8000 -4 )• Total equivalent gate count for design: 3,696,944
Processing speed and throughput• Clock speed 40 Mbits/s• Throughput 180 Mbits/s per iteration, 3.6 Mbits/s with 50 iterations• Time required to reach BER of 10-11: FPGA --- 30 hours
C simulation --- 3,000 hours
Signal-to-Noise Ratio (SNR)Signal-to-Noise Ratio (SNR)
Includes both AWGN and transition noiseSNR defined as the ratio of the power of a single pulse and the height of noise power spectral density
The transition noise is the dominant part of the noise (80% in power)
02
250 0
/ 1 4
t
j t
E NSNRE
PW Nσ
=+
Xilinx Virtex II 8000FPGA ChipAlphaData ADM-XRC-II PCI AdaptorAdaptor
25 LDPC (outer) iterations2 channel (inner) iterations
Number of iterations12dB --- 22 dBSNR
EPR4PR target
Code length 4932, rate 8/9 LDPC codeLDPC Codes
VHDLHardware Programming language
Simulation SettingsSimulation Settings
BER for Different ChannelsBER for Different Channels
6 dB due to realistic channel modeling of Lorentzian pulse shape and the non-adaptive equalizer
2.5 dB due to PR channel
3 dB due to the presence of media nonlinearities
When the PE ratio reaches 0.7, the loss in BER is significant. For the NLTS effect, the threshold is around 0.15T. When ε exceeds 0.15T, a precompensator is necessary for the system.
Partial Erasure and NLTSPartial Erasure and NLTS
PE Effect NLTS Effect
Inefficient region Waterfall Region Error Floor
Each curve represents the ratio of errors corrected during the past 5 iterations In the low SNR region, the noise level is too high for the LDPC code to be effective. In the “waterfall” region, the error rate drops dramatically as SNR increases. Each iteration is able to correct a large portion of error bits left. In the error floor region, only the early iterations in each of the two channel iterations are effective. Some iterations introduce more errors than they correct, shown as negative ratio.
Iteration EfficiencyIteration Efficiency
BER with Fewer IterationsBER with Fewer Iterations
0.6 dB
Reduce the total number of iterations to 10 (in 2 channel iterations) from 50The loss in BER is 0.6 dB in the waterfall regionThe guaranteed throughput is increased by 5
FPGA LDPC Code Platform EvolutionFPGA LDPC Code Platform Evolution
AWGN Channel + LDPC ENDECIdealized EPR4 + LDPCEqualized EPR4 + LDPCLorentzian readback signal (on FPGA) + equalized EPR4 + LDPC
AWGNTransition noiseNonlinear transition shiftPartial erasure
Perpendicular recording channel model (on FPGA)
Perpendicular RecordingPerpendicular Recording
Step response modeled by error functionTransition noise Media nonlinearities
– Nonlinear transition shift (NLTS)
– Partial erasure (PE)
Inter-symbol interferenceElectronic Noise
ln16( ) ( )50dt
tg t E erfPW
=
LDPC Code with Perpendicular RecordingLDPC Code with Perpendicular Recording
BER FER
The effects of major impairments in perpendicular recording systems are similar to that in longitudinal recording except NLTS does not appear to impact perpendicular recording as much.At normalized recording density of 2 (PW50/T =2), the BER reaches 10-12 at SNR of 21.5 dB.
Magnetic recording channel models (both longitudinal and perpendicular recording systems) include transition noise, nonlinear transition shift, partial erasure and electronic noiseFPGA allows us to reach BERs as low as 10-12
At normalized recording density of 2, simulation exhibits 10-12 BER for SNR of around 21 dBThe individual impact of different impairments on coding performance can be investigated
SummarySummary
Top Related