Post on 01-Sep-2018
Advanced Hardware Architecture for
Soft Decoding Reed-Solomon Codes
Stefan Scholl, Norbert Wehn
Microelectronic Systems Design Research Group
TU Kaiserslautern, Germany
Overview
• Soft decoding for the RS(255,239) code
• New hardware architecture
• Goal: large FER gain (over hard decision decoding)
• Algorithm based on information set decoding
• Complexity evaluation on a Virtex 5 FPGA
Motivation: RS / BCH Decoder Hardware
Widely used code: RS(255,239) or its shortened versions
Application areas: NASA / CCSDS, wireless, VDSL, wired, storage, optical (G.709)
Decoding Algorithms for Reed-Solomon
Hard Decoding
Algorithm:
• standard method
• algebraic decoding
• complexity very low: first chip implementations in the 1970s/80s
Soft Decoding
Algorithms:
• Chase Decoding
• Information Set Decoding
• Adaptive Belief Propagation
• Kötter-Vardy
• …
Improved error correction, possible gain: up to 3 dB (depends on length and code rate)
Progress in microelectronics allows for more complexity today!
We consider the widely used RS(255,239),
but RS(255,239) seems to be challenging
State-of-the-art Soft Decoder Hardware
Real & complete hardware implementations for RS(255,239)

Paper                        Year  Algorithm             Gain (over HDD)
An (PhD thesis, MIT)         2010  Low complexity Chase  0.45 dB
Hsu et al (ESSCIRC)          2011  Chase                 0.35 dB
Garcia-Herrero et al (CSSP)  2011  Low complexity Chase  0.3 dB
Kan et al (ISTC)             2008  Adaptive BP           0.75 dB
Heloir et al (NEWCAS)        2012  Stochastic Chase      0.7 dB
Scholl et al (DATE)          2014  Information set       0.75 dB

“low gain” hardware (< 0.5 dB): An, Hsu et al, Garcia-Herrero et al
“medium gain” hardware (0.5 - 1 dB): Kan et al, Heloir et al, Scholl et al
State-of-the-art Hardware Implementations
• Hard decision decoding
• “low gain”: < 0.5 dB
• “medium gain”: 0.5 - 1 dB
• “high gain”: > 1 dB (not yet investigated!)
Literature shows: up to 2 dB gain should be possible
Implemented Algorithm*
Information set decoding approach

Received bits, ordered by reliability from most reliable (left) to least reliable (right):
1 1 0 0 1 0 1 1 0 1 0 0 1 1

Binary image, H =
0 0 0 0 1 0 1 0 1 1 0 0 1 0
1 0 1 1 0 1 0 1 0 0 0 1 1 1
0 0 0 0 1 0 1 0 0 0 1 0 1 1
1 0 1 1 0 0 1 0 1 1 0 1 0 1
1 1 0 1 1 0 0 1 0 1 0 0 0 1
0 0 1 1 0 0 0 1 0 0 1 1 0 0

Diagonalized by Gaussian elimination, H =
1 1 0 1 1 1 0 1 1 0 0 0 0 0
0 1 1 1 0 1 1 1 0 1 0 0 0 0
1 0 0 1 1 1 1 1 0 0 1 0 0 0
1 0 1 1 0 1 0 0 0 0 0 1 0 0
0 1 0 1 1 1 1 0 0 0 0 0 1 0
1 0 1 1 1 0 0 1 0 0 0 0 0 1

Syndrome: 001000
Syndrome weight:
• Small: only errors in the least reliable part
• Large: at least 1 error in the most reliable part

Order 1 processing: tentatively flip each most reliable bit (here: 1912)
Order 2 processing: tentatively flip all combinations of 2 most reliable bits (~2 million cases)

Can be seen as a low complexity variant of ordered-statistics decoding

*A. Ahmed, R. Koetter, and N. R. Shanbhag. Performance analysis of the adaptive parity check matrix based soft-decision decoding algorithm, 2004.
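The syndrome step can be tried out on the toy numbers of this slide. The following is a minimal sketch, assuming the 84 matrix entries are read row by row as a 6 x 14 matrix (which puts the unit columns on the least reliable side); the function names are made up for illustration. The slide's syndrome has weight one, and the sketch reproduces that weight, although the bit order of the printed syndrome depends on how the rows are numbered.

```python
# Toy values copied from the slide: diagonalized 6 x 14 parity check matrix
# and 14 received hard-decision bits, ordered from most to least reliable.
H_DIAG = [
    [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1],
]
RECEIVED = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1]


def syndrome(h, bits):
    """Return H * bits^T over GF(2) as a list of 0/1 values."""
    return [sum(a & b for a, b in zip(row, bits)) % 2 for row in h]


def correct_least_reliable(h, bits):
    """Flip the least reliable bits that are selected by the syndrome.

    The right part of h is an identity matrix, so flipping the i-th bit of
    the least reliable part toggles exactly syndrome bit i. The returned
    word therefore has an all-zero syndrome.
    """
    k = len(bits) - len(h)  # number of most reliable bits
    s = syndrome(h, bits)
    return bits[:k] + [b ^ si for b, si in zip(bits[k:], s)]


s = syndrome(H_DIAG, RECEIVED)
candidate = correct_least_reliable(H_DIAG, RECEIVED)
print("syndrome weight:", sum(s))
print("candidate word: ", candidate)
print("check syndrome: ", syndrome(H_DIAG, candidate))
```

A small syndrome weight means the candidate differs from the received word in only a few of the least reliable positions, which is the "small" case named on the slide.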
Algorithm Improvements
We add further features for improvement (mostly from other literature):
• Use a hard decision decoder (counters potential error floor)
• Use three differently diagonalized parity check matrices (improves FER)
  • Partial overlapping of diagonalized parts
  • Allows for sophisticated architecture (complexity reduction)
• Restrict order 2 processing to “fair” reliable bits (250 out of 1912)
  • Need to determine additional group: fair reliable (besides least and most)
  • Large reduction of processings (factor 60 less)
• Use approximate reliability sorting to enable parallelization (higher speed)
Overall loss due to complexity reduction: < 0.1 dB
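The candidate counts quoted for order 1 and order 2 processing can be checked with a few lines of arithmetic. This sketch assumes that the restricted order 2 processing flips pairs taken only from the 250 fair reliable bits; the variable names are illustrative.

```python
from math import comb

N_BITS = 255 * 8              # binary image of RS(255,239): 2040 bits
N_PARITY = (255 - 239) * 8    # 128 parity bits, i.e. the least reliable part
N_MOST = N_BITS - N_PARITY    # most reliable bits
N_FAIR = 250                  # fair reliable bits used for order 2 (from the slide)

order1_cases = N_MOST                 # one flip per most reliable bit
order2_full = comb(N_MOST, 2)         # all pairs of most reliable bits
order2_fair = comb(N_FAIR, 2)         # pairs restricted to fair reliable bits

print("order 1 cases:          ", order1_cases)
print("order 2 cases (full):   ", order2_full)
print("order 2 cases (fair):   ", order2_fair)
print("order 2 reduction factor:", round(order2_full / order2_fair, 1))
```

Under this assumption the reduction factor comes out slightly below 60, in line with the "factor 60 less" stated above.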
Our New Hardware Architecture
Implementation on Virtex 5 FPGA
Input: 2040 LLRs (one per bit), 8 in parallel, quantization: 6 bits
Output: 2040 bits (hard out), 8 in parallel
Our Hardware Architecture
Sorting
• Finds low and fair reliable bits
• Finds 378 lowest out of 2040 LLRs
• Shift register based insertion sort
• 8 sorters in parallel (approximate sorting)
• Stores bit positions in four memories
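A software model can illustrate why splitting the sort over 8 lanes is only approximate. The sketch below is hypothetical: the slides do not say how the lanes divide the work, so it simply assumes that lane j sees every 8th LLR and keeps its own 48 smallest magnitudes, and that 6-bit LLRs lie in the range -31 to 31. The final merge is a software convenience, not a claim about the hardware.

```python
import random


class InsertionSorter:
    """Model of a fixed-depth insertion sorter that keeps the smallest keys.

    A new (key, position) pair is placed at its sorted location, larger
    entries move one place down and the largest entry drops out.
    """

    def __init__(self, depth):
        self.depth = depth
        self.cells = []  # sorted ascending by key

    def push(self, key, position):
        i = 0
        while i < len(self.cells) and self.cells[i][0] <= key:
            i += 1
        self.cells.insert(i, (key, position))
        del self.cells[self.depth:]  # shift-out of the largest entry


def approx_lowest(llrs, total=378, lanes=8):
    """Approximately select the `total` least reliable bit positions.

    Assumed scheme: lane j sees LLRs j, j + lanes, j + 2 * lanes, ... and
    keeps only its own ceil(total / lanes) smallest magnitudes.
    """
    depth = -(-total // lanes)  # ceiling division
    sorters = [InsertionSorter(depth) for _ in range(lanes)]
    for pos, llr in enumerate(llrs):
        sorters[pos % lanes].push(abs(llr), pos)
    merged = sorted(cell for s in sorters for cell in s.cells)
    return [pos for _, pos in merged[:total]]


random.seed(1)
llrs = [random.randint(-31, 31) for _ in range(2040)]  # assumed 6-bit range
picked = approx_lowest(llrs)
exact = sorted(range(2040), key=lambda p: abs(llrs[p]))[:378]

print("positions selected:        ", len(picked))
print("largest |LLR| (approximate):", max(abs(llrs[p]) for p in picked))
print("largest |LLR| (exact sort): ", max(abs(llrs[p]) for p in exact))
```

The result is approximate because a lane that happens to receive many unreliable bits has to drop some of them, while another lane keeps bits that an exact sort would have rejected.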
Our Hardware Architecture
Gaussian Elimination / Diagonalization
• Original matrix stored in memory
• Diagonalization “on the fly”
• Diagonalization “column wise”
• 2 phases: setup & elimination
• Saves ~70% hardware over state-of-the-art diagonalizations (e.g. systolic arrays)
• Three diagonalizations: exploit overlapping
[Figure: pipelined array eliminator. Columns of the original matrix enter, columns of the eliminated matrix leave. P: fixed pivot positions!]
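For reference, diagonalization over GF(2) can be written in a few lines of software. This is a generic textbook Gauss-Jordan sketch, not the pipelined array of the slide: it uses row swaps, whereas the hardware works with fixed pivot positions. The function name and the small example matrix are made up for illustration.

```python
def diagonalize(h, columns):
    """Row-reduce h over GF(2) so that the given columns become unit columns.

    h is a list of rows (lists of 0/1) and is modified in place. The i-th
    successfully processed column ends up with its single 1 in row i.
    Columns that depend linearly on earlier ones are skipped. Returns the
    columns that were actually turned into unit columns.
    """
    done = []
    for col in columns:
        pivot = len(done)
        if pivot == len(h):
            break
        # Find a row at or below the pivot position with a 1 in this column.
        src = next((r for r in range(pivot, len(h)) if h[r][col]), None)
        if src is None:
            continue  # dependent column, try the next one
        h[pivot], h[src] = h[src], h[pivot]
        # Add the pivot row onto every other row that has a 1 in this column.
        for r in range(len(h)):
            if r != pivot and h[r][col]:
                h[r] = [a ^ b for a, b in zip(h[r], h[pivot])]
        done.append(col)
    return done


H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 1],
]
unit_columns = diagonalize(H, columns=[3, 4, 5])
for row in H:
    print(row)
print("unit columns:", unit_columns)
```

In the decoder the list of columns would be the least reliable bit positions delivered by the sorting stage.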
Our Hardware Architecture
Correction Unit
• Performs order 1 and 2 processing
• Parallelized order 2 processing
• In 1 clock cycle: 1x order 1, 6x order 2
• 3 instances (for 3 matrices)
• Selects best results for output
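Order 1 and order 2 processing can be sketched in software on the toy matrix from the algorithm slides. Flipping a most reliable bit adds the corresponding matrix column to the syndrome, and the remaining syndrome is cancelled in the identity part. The sketch assumes that the candidate with the smallest resulting syndrome weight is selected; the slides only say that the best result is chosen, so this selection rule and all names are assumptions.

```python
from itertools import combinations

# Diagonalized toy matrix [A | I] from the algorithm slide (6 x 14).
H_DIAG = [
    [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1],
]


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def reprocess(h, bits, order2_positions=()):
    """Try no flip, every single flip and the given pairs of flips.

    Returns (syndrome_weight, flipped_positions, candidate_word) for the
    candidate with the smallest syndrome weight (assumed selection rule).
    """
    k = len(bits) - len(h)  # number of most reliable bits
    s0 = [sum(a & b for a, b in zip(row, bits)) % 2 for row in h]

    flip_sets = [()]                                       # order 0
    flip_sets += [(j,) for j in range(k)]                  # order 1
    flip_sets += list(combinations(order2_positions, 2))   # order 2

    best = None
    for flips in flip_sets:
        s = s0
        for j in flips:
            s = xor(s, [row[j] for row in h])  # flipping bit j adds column j
        if best is None or sum(s) < best[0]:
            word = list(bits)
            for j in flips:
                word[j] ^= 1
            word[k:] = xor(word[k:], s)  # cancel the rest in the identity part
            best = (sum(s), flips, word)
    return best


codeword = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
received = list(codeword)
received[2] ^= 1  # one error in the most reliable part

weight, flips, word = reprocess(H_DIAG, received)
print("flipped positions:", flips)
print("syndrome weight:  ", weight)
print("codeword restored:", word == codeword)
```

Passing the fair reliable positions as `order2_positions` restricts the pair search in the way described under Algorithm Improvements.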
Our Hardware Architecture
Syndrome Calculation
Required: syndrome of the diagonalized matrix
Strategy:
• First: calculate syndrome using original matrix
• Second: “diagonalize” syndrome in the Gaussian Elimination
Advantage: allows use of Galois field operations (much faster)
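The strategy works because diagonalization only adds and swaps rows: the same row operations that turn the original matrix into the diagonalized one also turn the original syndrome into the syndrome of the diagonalized matrix. The sketch below illustrates this on a small made-up matrix by appending the syndrome as an extra column. It uses a generic GF(2) elimination and plain bit operations; the Galois field arithmetic mentioned on the slide is not modelled.

```python
def syndrome(h, bits):
    """Return H * bits^T over GF(2)."""
    return [sum(a & b for a, b in zip(row, bits)) % 2 for row in h]


def eliminate(rows, columns):
    """Gauss-Jordan elimination over GF(2) on the given columns, in place."""
    pivot = 0
    for col in columns:
        if pivot == len(rows):
            break
        src = next((r for r in range(pivot, len(rows)) if rows[r][col]), None)
        if src is None:
            continue  # dependent column
        rows[pivot], rows[src] = rows[src], rows[pivot]
        for r in range(len(rows)):
            if r != pivot and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[pivot])]
        pivot += 1


H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 1],
]
received = [1, 0, 1, 1, 0, 1]

# First: syndrome with respect to the original matrix.
s_orig = syndrome(H, received)

# Second: append it as an extra column so it rides through the elimination.
augmented = [row + [s] for row, s in zip(H, s_orig)]
eliminate(augmented, columns=[3, 4, 5])
h_diag = [row[:-1] for row in augmented]
s_diag = [row[-1] for row in augmented]

print("original syndrome:    ", s_orig)
print("diagonalized syndrome:", s_diag)
print("matches direct result:", s_diag == syndrome(h_diag, received))
```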
FPGA Implementations
State-of-the-art soft decoder RS(255,239), gain > 0.5 dB

                              Kan et al    Scholl et al     Heloir et al  THIS WORK (our new architecture)
Algorithm                     Adaptive BP  Information Set  Stoch. Chase  Information Set
Chip                          Stratix II   Virtex 5         Virtex 5      Virtex 5
Flipflops                     n/a          42,000           143,000       70,200
Look-Up Tables                43,700       13,700           117,000       32,400
Throughput                    4 Mbit/s     800 Mbit/s       50 Mbit/s     300 Mbit/s
Communications gain over HDD  0.75 dB      0.75 dB          0.7 dB        1.3 dB

M. Kan et al. Hardware implementation of soft-decision decoding for Reed-Solomon code. In Proc. 5th Int. Turbo Codes and Related Topics Symp., 2008.
S. Scholl and N. Wehn. Hardware Implementation of a Reed-Solomon Soft Decoder based on Information Set Decoding. DATE ’14, 2014.
R. Heloir, C. Leroux, S. Hemati, M. Arzel, and W. J. Gross. Stochastic chase decoder for Reed-Solomon codes. IEEE NEWCAS, 2012.
Summary & Outlook
Summary
• Proposed new RS soft decoder hardware for RS(255,239)
• Based on information set decoding
• Implementation with currently best FER: gain 1.3 dB over HDD
• New “high gain” architecture, besides low & medium gain
• Acceptable complexity
Future Challenges
• Improving implementation efficiency
• Architectures for specific applications’ requirements
• Approach applicable to every linear code
Our new Binary Gaussian Elimination
• Basic operation: adding rows onto other rows to form unit columns
• For our hardware: Two Phase Approach
  1. Setup: configures addition patterns
  2. Elimination: performs actual elimination
• Architecture: Column by column processing with pipelined array
[Figure: pipelined array. Columns from original matrix enter, columns of eliminated matrix leave. P: fixed pivot positions!]
S. Scholl, C. Stumm, and N. Wehn. Hardware Implementations of Gaussian Elimination over GF(2) for Channel Decoding Algorithms. IEEE AFRICON 2013.
Comparison, 128 x 2040 matrix
Design Example: Reed-Solomon (255,239) Code
Binary Matrix Size: 128 x 2040
Implementation on a Xilinx FPGA Chip (Virtex 7)

Architecture                 Look-Up Tables  Flipflops    Throughput
SMITH*                       780k*           260k*
Systolic array               82k             99k          219k matrices / s
proposed                     17k             33k          272k matrices / s
proposed vs. systolic array  -80% saving     -67% saving  +25% increase

* estimated

Efficient Gaussian elimination is the key for efficient soft decoding!
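The saving and increase figures can be recomputed from the table values. This sketch takes the rounded numbers from the slide, so the results agree with the quoted percentages only up to rounding; the dictionary keys are illustrative names.

```python
# Rounded values from the comparison table (128 x 2040 matrix, Virtex 7).
systolic = {"look_up_tables": 82_000, "flipflops": 99_000, "matrices_per_s": 219_000}
proposed = {"look_up_tables": 17_000, "flipflops": 33_000, "matrices_per_s": 272_000}


def change_percent(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


changes = {key: change_percent(proposed[key], systolic[key]) for key in systolic}
for key, value in changes.items():
    print(f"{key:>15}: {value:+.1f} %")
```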