1
ELEC692 VLSI Signal Processing Architecture, Lecture 10
Viterbi Decoder Architecture
2
Outline
• Convolutional Code Structure
– Encoder Structure
– Finite state machine representation
– Trellis diagram
• Decoding Algorithm – Viterbi Decoder
• Viterbi Decoder VLSI Architecture
3
Convolution Code
• Coding – add redundancy to the original data bits for error checking or error correction
– E.g. error checking – parity check code
– Error correction code
• Either block codes or convolutional codes. The classification depends on the presence or absence of memory.
• A block code has no memory.
• Each output codeword of an (n,k) block code depends only on the current buffer.
– k is the # of original data bits and n is the # of encoded bits
• The encoder adds n-k redundant bits to the buffered bits. The added bits are algebraically related to the buffered bits.
• The encoded block contains n bits.
• The ratio k/n is known as the code rate.
4
Convolution Code
• A convolutional coder may process one or more samples during an encoding cycle.
– It is described by 3 integers: n, k, and K.
– k/n = Code Rate (information bits/coded bits).
– But n does not define a block or codeword length.
– K is the constraint length and is a measure of the code redundancy.
– The encoder acts on the serial bit stream as it enters the transmitter.
– Convolutional codes have memory.
– The n-tuple emitted by the encoder is not only a function of an input k-tuple, but is also a function of the previous K-1 input k-tuples.
5
Encoder Structure
• Map k bits to n bits using the previous (K-1)k bits
• Rate k/n code with constraint length K
• n generators or polynomials, each a binary vector K bits long
• The following shows the case where k=1 (easily extendable)
• Example: k=1, n=2, K=3, g1=[101]=1+z^-2, g2=[111]=1+z^-1+z^-2
• The input (b1,b2,…) is shifted through the register; each mod-2 adder produces one output stream, and the streams are interleaved into the output (c1,c2,c3,c4,…)
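The k=1 encoder above can be sketched in software (a minimal behavioural model, not a hardware description; the shift-register and tap conventions are assumed):

```python
# Sketch of the k=1 convolutional encoder: one shift register, one mod-2
# adder per generator vector, outputs interleaved. reg[0] is the newest bit.
def conv_encode(bits, generators, K):
    reg = [0] * K                        # K-stage shift register
    out = []
    for b in bits:
        reg = [b] + reg[:-1]             # shift the new bit in
        for g in generators:             # one mod-2 adder per generator
            out.append(sum(t & r for t, r in zip(g, reg)) % 2)
    return out

# g1 = [1,0,1] = 1 + z^-2, g2 = [1,1,1] = 1 + z^-1 + z^-2, as in the example.
# The impulse response (a single 1 plus K-1 = 2 flushing zeros) interleaves
# the two generator vectors:
print(conv_encode([1, 0, 0], [[1, 0, 1], [1, 1, 1]], K=3))  # -> [1,1, 0,1, 1,1]
```

With g1=[111], g2=[101] (the K=3, rate-1/2 code used in the later slides), the same model reproduces the worked encoding examples.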
6
Basic Channel Coding for Wideband CDMA
Convolutional code is rate 1/3 and rate 1/2, all with constraint length 9
Convolutional Codes
Concatenated Codes
7
Convolutional Encoding
• Let m = m1, m2, …, mi, … denote the input message bits.
• Let U = U1, U2, …, Ui, … denote the codeword sequence, with
Ui = u1i, u2i, …, uni = ith codeword, and
uji = jth binary code symbol of Ui.
• Let Z = Z1, Z2, …, Zi, … denote the demodulated sequence, with Zi = z1i, z2i, …, zni.
• Let m̂ = m̂1, m̂2, …, m̂i, … denote the estimate of the input message bits.
8
Convolutional Encoding/Decoding

Information source → Convolutional Encoder → Modulate → AWGN Channel → Demodulate → Convolutional Decoder → Information sink

m = m1, m2, …, mi, … : input sequence
U = G(m) = U1, U2, …, Ui, … : codeword sequence, where Ui = u1i, …, uji, …, uni
si(t) : transmitted waveform; ŝi(t) : received waveform
Z = Z1, Z2, …, Zi, … where Zi = z1i, …, zji, …, zni, and zji is the jth demodulator output symbol of branch word Zi
m̂ = m̂1, m̂2, …, m̂i, … : decoded message estimate
9
Convolutional Encoding
• A general convolutional encoder with constraint length K and rate k/n consists of a kK-stage shift register and n mod-2 adders.
– K = number of k-bit shifts over which a single information bit can influence the output.
– At each unit of time:
• k bits are shifted into the first k stages of the register
• All bits in the register are shifted k stages to the right
• The outputs of the n adders are sequentially sampled to give the coded bits
• There are n coded bits for each input group of k information or message bits. Hence R = k/n information bits/coded bit is the code rate (k < n).
10
Convolutional Encoder (with constraint length K and rate k/n)

m = m1, m2, …, mi, … : input sequence, shifted in k at a time into a kK-stage shift register (stages 1, 2, 3, …, kK), which feeds n modulo-2 adders (1, 2, …, n).

Codeword sequence U = U1, U2, …, Ui, … where Ui = u1i, …, uji, …, uni = ith codeword branch, and uji = jth binary code symbol of branch word Ui.

Typically binary codes for which k=1 are used. Hence, we will mainly consider rate 1/n codes.
11
Convolutional Codes Representation
• To describe a convolutional code, we must describe the encoding function G(m) that characterizes the relationship between the information sequence m and the output coded sequence U.
• There are 4 popular methods for representation:
– Connection pictorials and connection polynomials
– State Diagram
– Tree Diagram
– Trellis Diagram
12
Connection Representation
• Specify n connection vectors, gi (i=1, …, n), one for each of the n mod-2 adders.
• Each vector has K dimensions and describes the connection of the shift register to the mod-2 adders.
• A 1 in the ith position of the connection vector implies that shift register stage is connected.
• A 0 in the ith position of the connection vector implies no connection exists.
13
Convolutional Encoder (K=3, Rate 1/2)

g1 = 1 1 1, or g1(X) = 1 + X + X^2
g2 = 1 0 1, or g2(X) = 1 + X^2

If the initial register content is 0 0 0 and the input is a single 1 followed by zeros, then the output (or impulse response) sequence is 11 10 11. U1 is the first code symbol (from g1) and U2 the second code symbol (from g2) of each branch word.
14
Example (for the previous code)

Encoder input m = 1 0 1:

Time  Register contents  Output u1 u2
t1    1 0 0              1 1
t2    0 1 0              1 0
t3    1 0 1              0 0
t4    0 1 0              1 0
t5    0 0 1              1 1
t6    0 0 0              0 0

Output sequence: 11 10 00 10 11
Message bits are input at t1, t2, t3; (K-1)=2 zeros are input at t4, t5 to flush the register. Another 0 input at t6 gives 00.
15
State Representation
• The state of a rate 1/n code = contents of the rightmost K-1 stages.
• Knowledge of the state and the next input is necessary and sufficient to determine the next output.
• Codes can be represented by a State Diagram where the states represent the possible contents of the rightmost K-1 stages of the shift register.
• From each state there are only 2 transitions (to the next state), corresponding to the 2 possible input bits.
• The transitions are represented by paths on which we write the output word associated with the state transition.
– A solid line path corresponds to an input bit 0.
– A dashed line path corresponds to an input bit 1.
16
State Diagram for our Code (K=3, Rate 1/2)

Encoder states (rightmost K-1 register bits): a=00, b=10, c=01, d=11.
Transitions (input bit / output branch word):
a=00: 0/00 → a, 1/11 → b
b=10: 0/10 → c, 1/01 → d
c=01: 0/11 → a, 1/00 → b
d=11: 0/01 → c, 1/10 → d
Legend: solid line = input bit 0, dashed line = input bit 1.
17
Example
Assume that m = 1 1 0 1 1 is the input, followed by K-1 = 2 zeros to flush the register. Also assume that the initial register contents are all zeros. Find the output sequence U.

Time  Input bit mi  State at ti  State at ti+1  Branch word u1 u2
t1    1             00           10             1 1
t2    1             10           11             0 1
t3    0             11           01             0 1
t4    1             01           10             0 0
t5    1             10           11             0 1
t6    0             11           01             0 1
t7    0             01           00             1 1

Output sequence: U = 11 01 01 00 01 01 11
18
Tree Diagram Representation
• The tree diagram is similar to the state diagram, except that it adds the dimension of time.
• The code is represented by a tree where each tree branch describes an output word.
– If the input is 0, then we move to the next rightmost branch in the upward direction.
– If the input is 1, then we move to the next rightmost branch in the downward direction.
• Using the tree diagram, one can dynamically describe the encoder as a function of a particular input sequence.
19
Tree Diagram for our Code

[Tree diagram over t1 … t5: from each node (labeled with its state a, b, c, or d) the upper branch is input 0 and the lower branch is input 1, each labeled with its codeword branch. The structure repeats itself after the 3rd branching (at t4). The heavy line represents m = 1 1 0 1 1, giving output codeword U = 11 01 01 00 01.]
20
Trellis Diagram Representation
• In general, the tree structure repeats itself after K branchings (K = constraint length).
• Label each node in the tree by its corresponding state.
• Each transition from a node state produces 2 nodes (2 states).
• Any 2 nodes having the same state label, at the same time, can be merged since all succeeding paths will be indistinguishable.
• The diagram we get by doing so is called the Trellis diagram.
21
Trellis Diagram for our Code

[Trellis over t1 … t6 with states a=00, b=10, c=01, d=11; solid branches = input bit 0, dashed branches = input bit 1, each labeled with its codeword branch (00, 11, 10, 01, …). The trellis structure repeats itself after depth K = 3.]
22
Decoding of Convolutional Code
• Maximum Likelihood Decoding
• Viterbi Algorithm
23
Maximum Likelihood Decoding

• Let U(m) denote one of the possible (say, the mth) transmitted sequences and Z the received sequence.
• The optimum decoder (which minimizes the probability of error) is the one that maximizes P(Z|U(m)). I.e., the optimum decoder chooses the sequence U(j) if

P(Z|U(j)) = max over all U(m) of P(Z|U(m))

• This is known as the Maximum Likelihood Decoder.
24
Maximum Likelihood Metric
• Assume a memoryless channel, i.e., noise components are independent. Then, for a rate 1/n code

P(Z|U(m)) = product over i of P(Zi|Ui(m)) = product over i, product over j=1…n, of P(zji|uji(m))

where Zi is the ith branch of Z. Then the problem is to find a path (each path defines a codeword) through the trellis (or tree) such that

product over i and j=1…n of P(zji|uji(m)) is maximized

or (by taking the log): sum over i and j=1…n of log P(zji|uji(m)) is maximized.
25
Maximum Likelihood Metric

• This function which we need to maximize is known as the log-likelihood function or the log-likelihood metric.
• To find the optimum path, we can compare all possible paths in the tree or trellis and find the path which maximizes the log-likelihood metric. This is known as the brute-force or exhaustive approach.
• The brute-force approach is not practical, as the # of paths grows exponentially as the path length increases.
• The optimum algorithm for solving this problem is the Viterbi Decoding Algorithm, or Viterbi Decoder.
26
Binary Symmetric Channel (BSC)

Input X, output Y; symbols 0 and 1 cross over with probability p:

P(Y=1|X=0) = P(Y=0|X=1) = p
P(Y=0|X=0) = P(Y=1|X=1) = 1-p

p = crossover probability, or channel symbol error probability, or channel BER.
27
Log-Likelihood Metric
• Assume that U(m) and Z are each L bits long and that they differ in dm positions, i.e., the Hamming distance between them is dm. Then

P(Z|U(m)) = p^dm (1-p)^(L-dm)

or log P(Z|U(m)) = -dm log((1-p)/p) + L log(1-p) = -A·dm - B

where A = log((1-p)/p) and B = -L log(1-p) are positive constants (as p < 0.5).
28
Log-Likelihood Metric
• Since A and B > 0, maximizing the log-likelihood metric is equivalent to minimizing the Hamming distance.
• Maximum Likelihood (ML) Decoder (Hard Decision Decoding):
– Choose, in the tree or trellis diagram, the path whose corresponding sequence is at the minimum Hamming distance from the received sequence Z.
– I.e., choose the minimum distance metric.
i.e. Hard-Decision Maximum Likelihood Decoder = Minimum Hamming Distance Decoder
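The brute-force (exhaustive) approach mentioned earlier can be sketched as follows, assuming the k=1 encoder convention used in these notes (practical only for short messages; names are illustrative):

```python
from itertools import product

def hamming(a, b):
    # Hamming distance between two bit sequences
    return sum(x != y for x, y in zip(a, b))

def conv_encode(bits, generators):
    # Same behavioural encoder model as before: shift register + mod-2 adders
    K = len(generators[0])
    reg, out = [0] * K, []
    for b in bits:
        reg = [b] + reg[:-1]
        for g in generators:
            out.append(sum(t & r for t, r in zip(g, reg)) % 2)
    return out

def brute_force_decode(z, generators, L):
    """Try all 2^L messages (each padded with K-1 flushing zeros) and keep
    the one whose codeword is closest in Hamming distance to z."""
    K = len(generators[0])
    best = min(product([0, 1], repeat=L),
               key=lambda m: hamming(conv_encode(list(m) + [0] * (K - 1),
                                                 generators), z))
    return list(best)

# Received sequence from the later decoding example (one channel error):
Z = [1,1, 0,1, 0,1, 1,0, 0,1, 0,1, 1,1]
print(brute_force_decode(Z, [[1,1,1], [1,0,1]], L=5))  # -> [1, 1, 0, 1, 1]
```

The number of candidates doubles with every message bit, which is exactly why the Viterbi algorithm is needed.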
29
Viterbi Decoding (R=1/2 & K=3)

The decoder tries to find the minimum distance path.

Input data sequence m:    1  1  0  1  1  …
Transmitted codeword U:   11 01 01 00 01 …
Received sequence Z:      11 01 01 10 01…

[Trellis over t1 … t6 with states a=00, b=10, c=01, d=11; each branch is labeled with its branch metric, i.e., the Hamming distance between the received word Zi and that branch's codeword. For example, the a→a branches (codeword 00) carry metrics 2, 1, 1, 1, 1 at t1 … t5.]
30
Viterbi Decoder
• Basic idea:
– If any 2 paths in the trellis merge to a single state, one of them can always be eliminated in the search.
– E.g., at time t5, 2 paths merge to (enter) state 00.
• The cumulative Hamming path metric of a given path at ti = sum of the branch Hamming distance metrics along that path up to time ti.
– The upper path metric is 4 and the lower path metric is 1.
– The upper path thus cannot be part of the optimum path, since the lower path which enters the same state has a lower metric.
– This is true because future output branches depend only on the current state and not the previous states.
31
Path Metrics for 2 Merging Paths

[Trellis over t1 … t5 (states a=00, b=10, c=01, d=11) showing two paths that merge at state a=00 at t5: the upper path has path metric 4, the lower path has path metric 1.]
32
Viterbi Decoding
• At time ti, there are 2^(K-1) states in the trellis, where K is the constraint length. (NB: the # of states is an important complexity measure for Viterbi decoders.)
• Each state can be entered by means of 2 states.
• Viterbi decoding consists of computing the metrics for the 2 paths entering each state and eliminating one of them.
• This is done for each of the 2^(K-1) nodes at time ti.
• The decoder then moves to time ti+1 and repeats the process.
33
Viterbi Decoding Example

[Panels (a)–(d): survivor paths and path metrics over the first decoding steps. After t2 the path metrics are a=2, b=0; after extending to t3 they are a=3, b=3, c=2, d=0; after the merge-and-eliminate step at t4 they are a=3, b=3, c=0, d=2.]
34
Viterbi Decoding Example (continued)

[Panels (e)–(h): the process continues through t5 and t6, showing the surviving branches at each step. After the step at t5 the path metrics are a=1, b=1, c=3, d=2; after t6 they are a=2, b=2, c=2, d=1.]
35
Convolutional Codes Distance Properties
• The minimum distance between all pairs of possible codewords is quite important and is related to the error-correcting capability of the code.
• To compute it, we can simply consider distances from the all-zeros sequence (since the code is linear).
• Assume that the all-zeros path is the correct one.
– An error event (or errors) would occur when there exists a path which starts and ends at the a=00 state at time ti (but does not return to the 00 state in between) with a metric that is smaller than that of the all-zeros path at ti. In this case, we say the correct path does not survive.
• The minimum distance of such an error path can be found using an exhaustive search over all possible error events.
36
Trellis Labeled with Distances from the All-Zeros Path

[Trellis over t1 … t6 with states a=00, b=10, c=01, d=11; each branch is labeled with the Hamming weight of its output word, i.e., its distance from the all-zeros branch. The a→a branches carry 0; the other branches carry weights 2, 1, or 0 according to their codewords.]
37
Minimum Distance

• In the previous example there are:
– 1 path with distance 5 (merges at t4), corresponding to the input sequence 1 0 0.
– 2 paths at distance 6 (one merges at t5 and the other at t6). They are 1 1 0 0 and 1 0 1 0 0.
• df = Minimum Free Distance = minimum distance of all arbitrarily long paths that diverge and remerge. df = 5 in this case and the code can correct any t=2 errors.
• A code can correct any t channel errors where t <= (df - 1)/2 (this is an approximation).
38
Formalized Viterbi algorithm

• Uses the maximum likelihood decoding procedure.
• Finds the closest sequence of symbols in the given trellis, using either the Euclidean distance or the Hamming distance as the distance measure.
• The resulting sequence is called the global most-likely sequence.
• For a received N-state sequence v containing L symbols, v={v(0),v(1),…,v(L-1)}, where the first symbol v(0) is received at time instance 0 and the last one v(L-1) is received at time instance L-1, the Viterbi decoder iteratively computes the survivor path entering each state at time instances 1,…,L-1.
• The survivor path for a given state at time instance n is the sequence of symbols closest in distance to the received sequence up to time n.
39
Viterbi algorithm

• Path metrics xi(n) – a metric assigned to each state denoting the distance between the survivor path for state i and the received sequence up to time n.
• Branch metrics aij(n) – the distance between the current received symbol v(n) and the corresponding output symbol in the encoding trellis.
• From time instance n to n+1, the Viterbi algorithm updates the survivor paths and the path metric values xj(n+1) from the survivor path metrics at time instance n and the branch metrics aij(n) in the given trellis as follows:

xj(n+1) = min over i of [xi(n) + aij(n)],  i, j = 1, 2, …, N

• The updating mechanism is based on an optimization algorithm called dynamic programming.
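The update rule can be sketched directly (a minimal sketch; the data structures and names are illustrative, not from a particular decoder):

```python
# One step of x_j(n+1) = min_i [x_i(n) + a_ij(n)].
# `predecessors[j]` lists the states i with a branch into state j;
# `branch_metric(i, j)` returns a_ij(n) for the current received symbol.
def viterbi_step(x, branch_metric, predecessors):
    """Returns the new path metrics and, for each state, the surviving
    predecessor (needed later for trace back)."""
    new_x, survivor = [], []
    for j, preds in enumerate(predecessors):
        candidates = [(x[i] + branch_metric(i, j), i) for i in preds]
        metric, best_i = min(candidates)      # keep the smaller metric
        new_x.append(metric)
        survivor.append(best_i)
    return new_x, survivor
```

Running one step per received symbol, for n = 0 … L-1, yields the survivor path metrics the slides describe.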
40
Viterbi Algorithm
• Let PM(s0=a, sn=b) be the maximum path metric (sum of accumulated branch metrics BM) from s0=a to sn=b.
• Then, we can calculate PM(s0=a, s10=b) easily if we know PM(s0=a, s9=s) for all possible s, particularly those that have a branch to state b in the trellis:
• PM(s0=a, s10=b) = max over s of [ PM(s0=a, s9=s) + BM(s9=s, s10=b) ]
At this point, we can eliminate one of these two paths.
41
Example
• For this encoding trellis – g1(z)=1+z^-2, g2(z)=1+z^-1+z^-2; states S00, S10, S01, S11; branch labels input/output: S00: 0/00 → S00, 1/11 → S10; S10: 0/01 → S01, 1/10 → S11; S01: 0/11 → S00, 1/00 → S10; S11: 0/10 → S01, 1/01 → S11 – assume at time instance n the path metrics for the 4 states are:
– x1(n)=2, x2(n)=0, x3(n)=1, x4(n)=2
– Received symbol is v(n)=11
– Using the Hamming distance as the measure of distance, we have the following branch metrics for all the transitions in the trellis (states numbered 1=S00, 2=S10, 3=S01, 4=S11):

a11(n) = weight(11 xor 00) = 2,  a12(n) = weight(11 xor 11) = 0
a23(n) = weight(11 xor 01) = 1,  a24(n) = weight(11 xor 10) = 1
a31(n) = weight(11 xor 11) = 0,  a32(n) = weight(11 xor 00) = 2
a43(n) = weight(11 xor 10) = 1,  a44(n) = weight(11 xor 01) = 1
42
Example
• The survivor path and its path metric for each state from time n to n+1 are updated.
• There are 2 possible paths entering each state; the one with the larger metric is discarded.
The update process is carried out iteratively from n=1 to n=L.
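One pass of the update rule over this example's trellis can be checked numerically (the state ordering 1=S00, 2=S10, 3=S01, 4=S11 is an assumed reading of the branch-metric table):

```python
def hamming2(a, b):
    # Hamming distance between two 2-bit words
    return bin(a ^ b).count("1")

# Branches (from state i, to state j, output word) of the trellis with
# g1 = 1+z^-2, g2 = 1+z^-1+z^-2:
branches = [(1, 1, 0b00), (1, 2, 0b11),   # from S00
            (2, 3, 0b01), (2, 4, 0b10),   # from S10
            (3, 1, 0b11), (3, 2, 0b00),   # from S01
            (4, 3, 0b10), (4, 4, 0b01)]   # from S11

x = {1: 2, 2: 0, 3: 1, 4: 2}              # path metrics x_i(n) from the slide
v = 0b11                                  # received symbol v(n) = 11

new_x = {}
for i, j, out in branches:
    cand = x[i] + hamming2(v, out)        # x_i(n) + a_ij(n)
    new_x[j] = min(new_x.get(j, cand), cand)
print(new_x)  # -> {1: 1, 2: 2, 3: 1, 4: 1}
```

For instance, S00 is entered from S00 (metric 2+2) and from S01 (metric 1+0), so its survivor comes from S01 with metric 1.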
43
Example

• The global most-likely sequence is the survivor path of the state with the minimum path metric at time L, i.e., the state

ind^-1( min over j { xj(L) } )

– where ind^-1 means "take the index of the corresponding state".
• Optimality is guaranteed because dynamic programming algorithms have the property that the optimum solution from the initial iteration to iteration n+m must consist of the optimum solution from the initial iteration to iteration n and from iteration n to iteration n+m.
44
Example

• Example: see Figure 1.11.
45
Computation in Viterbi algorithm

• Computing the branch metrics aij(n)
• Updating the path metrics
– Requires an addition, comparison and selection (ACS) for every state at each time instance
• Selecting the final state
• Tracing back its survivor path
46
Design and Implementation of Viterbi Decoder
• A real Viterbi decoder needs to consider the following practical problems:
– Arbitrarily long decoding delays cannot be tolerated. The decoder has to output decoded information bits before the entire encoded message has been retrieved.
– Incoming analog signals have to be quantized by an ADC.
– The decoder may be brought on line in the middle of a transmission and will thus not know where one n-bit block ends and the next begins.
• Need block synchronization
47
Block Diagram of a practical Viterbi decoder
48
Quantization
• There is a difference in performance between an un-quantized soft-decision decoder and a hard-decision decoder.
• B-bit quantization provides decoder performance in between.
• B=3 (8 levels) quantization introduces only a slight reduction in performance (~0.25 dB).
49
Block synchronizer
• Segments the received bit stream into n-bit blocks, each block corresponding to a stage in the trellis.
• If the received bits are not properly divided up, the results are disastrous.
• We can use this disastrous nature to help draw the block boundary:
– If the boundary is correct, one or a few partial path metrics will be much lower than the others after a few constraint lengths of branch metric computations.
– If the alignment is wrong, the metrics tend to be random: all paths have similar partial path metrics and there is no dominant path.
– We can use this to detect "out-of-sync" and adjust the block boundary until it is fixed.
– We can use a simple threshold for this detection.
50
Branch Metric (BM) Computer
• Typically based on a look-up table containing the various bit metrics.
• Looks up the n bit metrics associated with each branch and sums them to obtain the branch metric.
• For a symmetric channel, the BM calculation is simpler: the second row of the bit metric table is simply a reversed image of the first row.
• The same look-up function is performed n times per branch for each of the 2MK branches per stage in the trellis.
– An extremely fast decoder may need n·2MK look-up table circuits, or a simple decoder can use the same look-up table n·2MK times.
• The number of bits required for the BM can be reduced by simplification and approximation:

M(r|y), r = 0', 0, 1, 1':
y=0: 5 4 2 0
y=1: 0 2 4 5
(needs 3 bits)

M(r|y), r = 0', 0, 1, 1':
y=0: 3 2 1 0
y=1: 0 1 2 3
(needs only 2 bits)
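A table-based bit-metric lookup for the 2-bit table above can be sketched as follows (function names and the sample inputs are illustrative; only the y=0 row is stored, exploiting the symmetric-channel property):

```python
# 2-bit bit-metric table for received levels r = 0', 0, 1, 1' (indices 0..3)
BIT_METRIC_Y0 = [3, 2, 1, 0]

def bit_metric(r, y):
    # Symmetric channel: the y=1 row is the reversed image of the y=0 row,
    # so one stored row serves both code symbols.
    return BIT_METRIC_Y0[r] if y == 0 else BIT_METRIC_Y0[3 - r]

def branch_metric(received, branch_word):
    """Look up and sum the n bit metrics associated with one branch."""
    return sum(bit_metric(r, y) for r, y in zip(received, branch_word))

# Received levels (strong 0, weak 1) against branch word (0, 1):
print(branch_metric([0, 2], [0, 1]))  # -> 3 + 2 = 5
```

In hardware this is the look-up performed n times per branch; the 2-bit table halves the adder widths relative to the 3-bit table.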
51
Path Metric Updating and Storage
• Basic trellis element (butterfly) of a rate 1/n convolution code: states Sj,t and Sj+2^(M-1),t at time t both connect to states S2j,t+1 and S2j+1,t+1 at time t+1, with branch metrics Mj,2j(rt+1), Mj,2j+1(rt+1), Mj+2^(M-1),2j(rt+1) and Mj+2^(M-1),2j+1(rt+1).
• A common circuit, add-compare-select (ACS), calculates this basic trellis element: two adders form V(Sj,t) + M and V(Sj+2^(M-1),t) + M for each output state, and a compare-and-mux-select stage produces V(S2j,t+1) and V(S2j+1,t+1).
– Parallel or single ACS units can be used depending on the throughput requirement.
52
Information Sequence Updating and Storage
• This unit is responsible for keeping track of the information bits associated with the surviving paths.
• Two basic design approaches:
– register exchange and trace back
– Both need a shift register associated with every trellis node throughout the decoding operations.
53
Decoding depth (or Survivor Path Length)
• The # of bits that a register must be capable of storing is a function of the decoding depth.
• At some point during decoding, the decoder can begin to output information bits.
• The information bits associated with a survivor branch at time t can be released when the decoder begins operation on the branches at time t+δ, where δ is called the decoding depth and is usually set to five to ten times the constraint length of the code.
• The meaning of the survivor path length is that after tracing back to that point, all the shortest paths (survivor paths) from all possible starting states should have merged, and the input corresponding to the transition from the state at time t is decoded.
• The register length needed = the decoding depth δ.
• Once a register is full (t = δ), the oldest bits in the register are output as new bits are entered. The registers are thus FIFOs of fixed length.
54
Example of Trellis
[Figure: a rate 1/3 encoder with input …x2,x1,x0 and output streams …y2(0),y1(0),y0(0); …y2(1),y1(1),y0(1); …y2(2),y1(2),y0(2), and its 4-state trellis (S0, S1, S2, S3) with input/output branch labels such as 0/000, 1/111, 0/110, 0/111, 0/001, 1/001, 1/000, 1/110.]
55
Register Exchange
• The register for a given node at a given time contains the information bits associated with the surviving partial path that terminates at that node.
• As the decoding operations proceed, the contents of the registers in the bank are updated and exchanged as dictated by the surviving branches.
• Simple to implement, but hardware intensive – each register must be able to send and receive strings of bits to and from two other registers.
Register bank contents (states S0, S1, S2, S3):

t=0: -, -, -, -
t=1: 0, 1, -, -
t=2: 00, 01, 10, 11
t=3: 000, 101, 110, 111
t=4: 1100, 1101, 1010, 1011
t=5: 10100, 11001, 11010, 10111
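One register-exchange update can be sketched as follows (an illustrative software model using Python lists for the register bank, not a hardware description):

```python
# Each state's register copies the register of its surviving predecessor
# and appends the input bit labeling the surviving branch.
def register_exchange_step(registers, survivors):
    """survivors[j] = (predecessor state, decoded input bit) for state j."""
    return [registers[pred] + [bit] for pred, bit in survivors]

regs = [[0], [1]]                          # t = 1: registers for S0 and S1
# Suppose at t = 2 both S0 and S1 survive from S0, with inputs 0 and 1:
regs = register_exchange_step(regs, [(0, 0), (0, 1)])
print(regs)  # -> [[0, 0], [0, 1]]
```

The wholesale copying between registers at every stage is what makes this approach hardware intensive.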
56
Trace back
• There is a register for each state, but the contents of the registers do not move back and forth.
• Each register contains the past history of the surviving branches entering that state.
• Information bits are obtained by "tracing" back through the trellis as dictated by this connection history.
• The states in the state diagram (or trellis) are associated with the encoder shift-register contents.
– E.g., state S2 in the example encoder corresponds to the encoder shift-register contents 01.
– In general, a state SXY can be preceded only by state SY0 or SY1.
– A zero or one may thus be used to uniquely designate the surviving branch entering a given state.
57
Trace back register content
Register bank contents (states S0, S1, S2, S3):

t=0: -, -, -, -
t=1: 0, 0, -, -
t=2: 00, 00, 0, 0
t=3: 000, 001, 01, 01
t=4: 0001, 0011, 010, 010
t=5: 00011, 00110, 0100, 0101
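Trace back over such a history can be sketched as follows; the state/bit packing is an assumed convention for a 4-state (2-bit-state) code, and the history values are a made-up illustration:

```python
# history[t][s] holds the one bit that identifies the surviving predecessor
# of state s at stage t (the bit shifted out of the encoder register).
# State s = 2*s1 + s2 with s1 the newest register bit, so the predecessor
# of state (s1, s2) is (s2, h), and the decoded input at this step is s1.
def trace_back(history, final_state):
    state, bits = final_state, []
    for stage in reversed(history):
        bits.append(state >> 1)                   # decoded input = newest bit
        state = ((state & 1) << 1) | stage[state] # step to the predecessor
    return list(reversed(bits))

# Survivor-bit history consistent with the (hypothetical) input 1 1 0 1
# ending in state 2 (= register 10):
history = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
print(trace_back(history, 2))  # -> [1, 1, 0, 1]
```

Unlike register exchange, only one bit per state is written per stage; the work is deferred to the backward trace.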
58
Low Power ACS Unit for IS-95

• E.g. IS-95: rate 1/2, K=9 convolutional code, with generator functions g0 = 753 (octal) and g1 = 561 (octal).

[Figure: the information bit (input) is shifted through the encoder register; the g0 and g1 tap connections produce the two coded outputs c0 and c1.]
59
Branch Metric Calculation

• The path metric coming into state j from state i at recursion n: PM(i,j)n = BM(i,j)n + PMi(n-1).
• The branch metric BM(i,j) is the squared distance between the received noisy symbol yn and the ideal noiseless output symbol of that transition.
• For IS-95 (K = 9, code rate = 1/2), there are 2 competitive paths arriving at each state in each cycle. Branch metric calculation: BMi,j,t = (yt - xi,j)^2, i.e., over the n symbol components,

BMt(m) = sum over l = 0 … n-1 of (yt,l - xl(m))^2

• For IS-95, n=2, so there are only 4 different BMs.
• Carefully examining the rate 1/2 convolution code, we can find that the branch metrics of the two competitive branches are pairwise related – BMt,1(m) = BMt,1(m'), and BMt,0(m) and BMt,0(m') are linked by a corresponding relation (with a factor of 2) – where m can be any one of the 512 possible branches in the trellis.
• Consequently, there is no need for additional additions in the BMU.
Path Metric Calculation
• Partial path metrics of the 2 competitive paths (m1 and m2), from states s1 and s2 to state s at cycle i:
– PMi(m1) = PMi-1(s1) + BMi(s1,s)
– PMi(m2) = PMi-1(s2) + BMi(s2,s)
• After the new partial path metrics are calculated, the following comparison is carried out:
– PMi(s) = min( PMi(m1), PMi(m2) )
• For IS-95 there are 256 states, so 512 Add and 256 Compare-and-Select operations have to be done for every decoded bit.
• Compared with the BMU and SMU, the number of ACS operations is significant, and hence reducing its power consumption is essential.
Conventional ACS Unit
• ONE ACS operation requires reading two path metric values.
• Butterfly operation: states Si1 and Si2 at time t-1 both feed states So1 and So2 at time t.
• The number of read accesses can be reduced if the ACS operations that calculate the survivor paths at So1 and So2 are done together.
63
Bit width requirement for path metrics

• Re-normalization of path metric values is required to avoid overflow.
• This increases the number of unnecessary operations in the ACSU.
• Modular normalization [Shung-1990] – if the path metric memory bit width can represent a range > 2·Dmax, where Dmax is the maximum possible difference between the path metrics, no explicit normalization is required.
• For IS-95, the maximum number of bits required for the path metrics is 9 if the bit precision of the received symbol is 4.
64
ACSU: Modulo Normalization

• All binary values are evenly distributed on a circle; the PMs advance around the circle clockwise as they accumulate:

PMt+1(m) = PMt(m) + BMt(m),  PMt+1(m') = PMt(m') + BMt(m')

• To compare two path metrics, compute the (n-1)th bit of the result of a straightforward 2's complement subtraction of the two 9-bit numbers. E.g., if m1 = (m1,8,…,m1,0), m2 = (m2,8,…,m2,0) and d = (d8,…,d0) = m1 - m2, then

d8 = 1 if m1 < m2, and d8 = 0 otherwise.
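The sign-bit rule can be sketched as follows; a minimal model assuming 9-bit metrics and metrics within half the circle of each other (i.e., bit width representing a range > 2·Dmax):

```python
N_BITS = 9
MASK = (1 << N_BITS) - 1

def modulo_less(m1, m2):
    """True iff m1 < m2 in the modulo (wrap-around) sense."""
    d = (m1 - m2) & MASK                  # 9-bit 2's complement subtraction
    return (d >> (N_BITS - 1)) & 1 == 1   # d8 = 1  <=>  m1 < m2

print(modulo_less(10, 20))   # -> True
print(modulo_less(500, 5))   # -> True: here 5 is a wrapped 517, still ahead of 500
```

No subtraction of a common offset from all 256 path metrics is ever needed; the wrap-around is absorbed by the comparison.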
65
Architecture of conventional ACSU

• PMt-1(sa) and PMt-1(sb) each feed two adders with the branch metrics BMt(sa,S0), BMt(sb,S0), BMt(sa,S1) and BMt(sb,S1); two comparators then select the survivors PMt(S0) and PMt(S1) (conventional butterfly).
• For a butterfly operation, four 9-bit + 5-bit additions and two 9-bit comparisons are needed.
Re-arranging the ACS Calculation in the Butterfly

• For calculating PMt(S0), instead of finding min( PMt-1(sa) + BMt(sa,S0), PMt-1(sb) + BMt(sb,S0) ), we can compare the values of PMt-1(sa) - PMt-1(sb) and BMt(sb,S0) - BMt(sa,S0) instead.
• Similarly, for calculating PMt(S1), we compare the values of PMt-1(sa) - PMt-1(sb) and BMt(sb,S1) - BMt(sa,S1).
• Both computations share PMt-1(sa) - PMt-1(sb).
• One computation can be saved.
• For IS-95, the two values BMt(sb,S0) - BMt(sa,S0) and BMt(sb,S1) - BMt(sa,S1) can be precomputed and stored.
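The rearrangement can be sanity-checked numerically; the function names and example values are illustrative, using the min-metric convention:

```python
def acs_conventional(pm_a, pm_b, bm_a, bm_b):
    # Two additions, then one comparison of the full sums
    return min(pm_a + bm_a, pm_b + bm_b)

def acs_rearranged(pm_a, pm_b, bm_a, bm_b):
    # pm_a + bm_a < pm_b + bm_b  <=>  pm_a - pm_b < bm_b - bm_a
    dpm = pm_a - pm_b                 # shared by both states of the butterfly
    if dpm < bm_b - bm_a:             # bm_b - bm_a can be precomputed
        return pm_a + bm_a
    return pm_b + bm_b

print(acs_conventional(7, 5, 0, 2), acs_rearranged(7, 5, 0, 2))  # -> 7 7
```

Because the path metric difference is computed once and the branch metric differences are constants, only one addition per selected survivor remains.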
The proposed ACSU Architecture

• A subtractor forms PMt-1(sa) - PMt-1(sb); two comparators check it against the precomputed differences BMt(sb,S0) - BMt(sa,S0) and BMt(sb,S1) - BMt(sa,S1); two adders then produce the selected sums PMt(S0) and PMt(S1) (new butterfly).
• For a butterfly operation, one 9-bit subtraction, two 9-bit + 5-bit additions, and two 9-bit to 5-bit comparisons are needed.
Comparison of the two architectures

Conventional ACS:
– PMt,0(m) = PMt-1(sa) + BMt(sa,S0): 9 + 5 bit addition
– PMt,1(m) = PMt-1(sa) + BMt(sa,S1): 9 + 5 bit addition
– PMt,0(m') = PMt-1(sb) + BMt(sb,S0): 9 + 5 bit addition
– PMt,1(m') = PMt-1(sb) + BMt(sb,S1): 9 + 5 bit addition
– PMt(S0) = min(PMt,0(m), PMt,0(m')): 9-bit comparison (subtraction) and select
– PMt(S1) = min(PMt,1(m), PMt,1(m')): 9-bit comparison (subtraction) and select

Proposed ACS:
– ΔPM = PMt-1(sa) - PMt-1(sb): 9-bit subtraction
– comp(ΔPM, BMt(sb,S0) - BMt(sa,S0)): 9 to 6 bit comparison
– comp(ΔPM, BMt(sb,S1) - BMt(sa,S1)): 9 to 5 bit comparison
– PMt(S0) = PMt-1(s*) + BMt(s*,S0): 9 + 5 bit addition
– PMt(S1) = PMt-1(s') + BMt(s',S1): 9 + 5 bit addition
69
Pre-computational architecture
• Further reduces the number of comparisons required during the ACS operation using a pre-computation concept.
• The comparison is between a 9-bit datum and a 6-bit datum. Instead of doing a full 9-bit comparison, we use the 4 MSBs of the 9-bit datum and the sign bit of the 6-bit datum to pre-determine whether the magnitude of the 9-bit datum is larger; if not, a 5-bit comparator compares the magnitude of the 6-bit datum with the 5 LSBs of the 9-bit datum.
70
Pre-computation Architecture
[Figure: a subtractor forms Ni[8:0] = PMi-1(m) - PMi-1(m'); the precomputed difference BMi(sb,S0) - BMi(sa,S0) provides Di[5:0]. Precomputation logic examines Ni[8:5] and the sign bit Di[5]; 1-bit and clock-gated 5-bit registers pipeline the operands, and a 5-bit comparator evaluates Ni[4:0] >= Di[4:0] to produce the selection signal Sel_sa/Sel_sb.]
71
Pre-computation Architecture
• A two-stage pipeline calculates the selection signal.
• At the first stage, Ni[8:5] and Di[5] are used to pre-compute the condition for selecting sa or sb.
• When the condition is detected, the clock signal going to the two 5-bit registers is gated to save the power of the 5-bit comparison.
72
Results
• Both the conventional and the proposed ACSU were synthesized with Synopsys using a MOSIS 0.8 μm technology library.
• Power consumption was estimated using a gate-level power simulator.
• Simulation vectors were generated in compliance with the IS-95 and IS-98 standards.

             Conventional ACSU   Proposed ACSU   % reduction
Power (μW)   477.2               333.3           30.2
Area (μm^2)  777980              623530          19.9
73
Memory Organization for path metrics
• For M state Viterbi decoder, we need to store m path metrics. Since path metrics at time i+1 are computed using path metrics at time i, it seems that it is necessary to double buffer the path metric memory and we need 2*M memory.
• One way to eliminate the double buffer is to use in-place computation.– We need only the metrics of the M present states ( j i , j i - l , … , j i - k + 2 ,
x)-for M choices of x-to compute metrics for the M hypotheses, i.e. next state ( y , j i , j i - l , … , j i - k + 2)-for M choices of y .
– If the M metrics needed are read from memory, then M memory locations become available to store the M newly computed metrics, and no double buffering is required for metrics.
74
In-place computation
• It is natural to treat the contents of the shift register, (ak, ak- 1, …, a l ) , as a k-digit M-ary number and use this number as the address to the memory of path metrics. Such an addressing scheme is inconsistent with writing new metrics over old metrics.
• Consider the example of M = 2, k = 3. – The decoder has eight hypotheses ending in 000=0;001=1;010=2;
…;lll=7.
– Our natural order would store stage i metrics in table locations 0 through 7.
– But the two successors to, say, 000 and 001 are 000 and 100. This means we read metrics from 0 and 1 and write them (by definition of natural order) in 0 and 4. This is not in-place.
75
In-place computation
Suppose the path metrics are originally placed in natural order. After one, two, and three stages of decoding we would see the evolution of the memory organization for the path metrics as follows.

Computing in-place means we want to write the newly computed path metrics back into the locations that held the metrics used to calculate them.

To guarantee in-place computation, we need an addressing scheme that changes after each decoding cycle. E.g., in the first cycle we read inputs 0 and 1, produce outputs 0 and 4, and put them into the locations of 0 and 1. In the second cycle we again need inputs 0 and 1, but now 0 is stored in location 0 while 1 is stored in location 2. So the addressing of the inputs must change every cycle.
76
In-place computation
• From the previous figure, we can see that if the path metric of the hypothesis with shift-register contents (a,b,c) (i.e. the state) at time i is in memory location 4a+2b+c, then the path metric of the hypothesis with shift-register contents (a,b,c) at time i+1 will be in location 4c+2a+b.
• In general, the metrics accessed together are found by generating their natural addresses but rotating the bits of these addresses by i places before reading (or writing) the metrics from (or into) the memory.
• A cyclic shift of i places is identical to a cyclic shift of i modulo k places.
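The rotate-by-i addressing can be checked with a short simulation. This is a sketch under an assumed butterfly convention (states 2j and 2j+1 share successors j and j + 2^(k-1), matching the M = 2, k = 3 example):

```python
def rotl(x: int, r: int, k: int) -> int:
    """Left-rotate the k-bit value x by r positions."""
    r %= k
    return ((x << r) | (x >> (k - r))) & ((1 << k) - 1)

def verify_in_place(k: int) -> bool:
    """Check that storing the metric of state s at address rotl(s, stage)
    makes every ACS butterfly write land exactly on the two locations its
    reads just freed, so no double buffering is needed."""
    n = 1 << k
    for stage in range(2 * k):          # cover the full rotation period twice
        for j in range(n // 2):
            # Butterfly: states 2j and 2j+1 share successors j and j + n/2.
            reads = {rotl(2 * j, stage, k), rotl(2 * j + 1, stage, k)}
            writes = {rotl(j, stage + 1, k), rotl(j + n // 2, stage + 1, k)}
            if reads != writes:
                return False
    return True
```

Each butterfly's two writes reuse exactly the two addresses its reads released, which is the in-place property claimed above.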
77
Example
[Figure: evolution of the path-metric memory (locations 000-111) over successive decoding stages.]

Left rotate by 1 bit:
000->000, 001->010, 010->100, 011->110, 100->001, 101->011, 110->101, 111->111

Left rotate by 2 bits:
000->000, 001->100, 010->001, 011->101, 100->010, 101->110, 110->011, 111->111
78
Survivor path memory organization

• To "prune" the survivor paths, the hypothesis with the lowest path metric is identified and its oldest symbols are the decoder output. The oldest symbols may then be dropped from all survivor paths. A symbol may be pruned from the path memory once each decoding cycle, or p symbols may be pruned away after every p decoding cycles.
79
Survivor path memory organization
• For minimum error rate, the length of the survivor-path memory field should be made as large as possible. A rule of thumb is that four or five constraint lengths is adequate.
• For M = 2, the constraint length is k + 1. A practical case has k = 6, so a survivor-path memory field of 35 bits is implied. It is inconvenient to handle such a long field all at once, although the operations needed are quite simple.
• To store the survivor path, we can use a pointer mechanism to avoid handling the entire field.
• Since each pointer can only point to M ancestors, the pointer can be abbreviated to an M-ary symbol. This M-ary symbol is identical to the M-ary symbol that is appended to the path.
• Thus no extra storage is needed for the pointers, as we can interpret the path-memory contents as pointers.
80
Survivor path memory organization
• During decoding cycle i, an M-ary choice is recorded in the ith digit position of the survivor-path field for each of the M^k surviving hypotheses.
– E.g., for the hypothesis with shift-register contents (ai, ai-1, …, ai-k+1) the symbol stored is x. To find its predecessor we look in digit position i-1 of the memory word whose address is (ai-1, …, ai-k+1, x);
– if we read a y there, we look in digit position i-2 of the memory word whose address is (ai-2, …, ai-k+1, x, y);
– the procedure continues in this manner.
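The pointer-following traceback above can be sketched for M = 2. This is a sketch under an assumed state convention (new bit shifted in at the MSB, so the stored decision bit is both the predecessor pointer and the symbol that left the register); the memory layout `decisions[cycle][state]` is illustrative:

```python
def traceback(decisions, final_state: int, k: int):
    """Follow survivor pointers backward through the decision memory.

    decisions[i][s] holds the 1-bit (M = 2) choice written for state s
    during decoding cycle i. Returns the decoded bits, oldest first.
    """
    mask = (1 << k) - 1
    state = final_state
    decoded = []
    for column in reversed(decisions):
        x = column[state]                   # pointer bit for this state
        decoded.append(x)                   # pointer values are the decoded values
        state = ((state << 1) | x) & mask   # hop to the predecessor state
    decoded.reverse()                       # traceback emits newest-first
    return decoded
```

With this convention each decoded bit is an input bit delayed by k cycles, which is why the decoder can only output symbols older than the traceback depth.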
81
Example of the Survivor path memory organization
M = 2, k = 3
82
Survivor path memory organization
• Whenever two survivor paths agree on k successive pointers , they must necessary converge
• We need to trace back such a path to prune and decode• If the path memory field is L M-ary symbols wide, we
may decode after p decoding cycles, obtaining p decoding symbols, then overwrite new path symbols into the newly freed digital positions on the next p decoding cycles.
• A new symbol will be stored in digital position (i mod L) of the path during decoding cycle i.
83
Survivor Sequence Memory Management supporting simultaneous updating and reading the memory
• Here we discuss several survivor-sequence memory-management schemes that support simultaneously updating and reading the memory.
• The traceback memory is organized as a 2-dimensional structure, with rows and columns.
– # of rows = # of states N = 2^v.
– Each column stores the results of the N comparisons corresponding to one symbol interval, or one stage.
• 3 types of operations inside a trace-back decoder:
– Traceback Read (TB): reading a bit and interpreting it, in conjunction with the present state number, as a pointer that indicates the previous state number. Pointer values are not output as decoded values.
• The traceback runs to a predetermined depth T before being used to initiate the decode-read operation.
84
Survivor Sequence Memory Management supporting simultaneous updating and reading the memory
• 3 types of operations inside a trace-back decoder (cont.):
– Decode Read (DC): same operation as TB, but on older data, with the state number of the first DC in a memory bank determined by the previously completed traceback. Pointer values are the decoded values and are sent to the bit-order-reversing circuit.
• One traceback read of T columns enables decode reads over multiple columns.
– Writing New Data (WR): decisions made by the ACS are written into the locations corresponding to the states.
• Data are written to locations just freed by the DC operations.
• For every set of column write operations (N bits wide), an average of one decode read must be performed.
* Ref: G. Feygin and P. G. Gulak, "Architectural Tradeoffs for Survivor Sequence Memory Management in Viterbi Decoders," IEEE Transactions on Communications, pp. 425-429, March 1993.
85
K-pointer Even Algorithm (K = 3)
86
K-pointer Even Algorithm
• The memory is divided into 2k2 memory banks, each of size T/(k2-1) columns.
• Each read pointer performs the traceback operation in k2-1 memory banks and the decode read in one memory bank.
• Every T stages, a new traceback front is started from the fixed state that has the best path metric.
• Since the traceback depth T must be reached before decoding can be performed, the k2-1 memory banks must together hold at least T columns.
• Total memory required: 2k2 * (T/(k2-1)) columns.
• The decoded bits are generated in reverse order, so a scheme is required for reversing the ordering of the decoded bits.
– A simple two-stack LIFO is used to perform the bit-order reversal.
• Each stack is T/(k2-1) entries deep.
• During decoding, decoded bits are pushed onto one stack while the bits stored on the other stack are popped.
• Upon completion of the decoding of a given memory bank, the stacks switch from pushing to popping and vice versa.
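The two-stack LIFO above can be sketched as a small class. This is an illustrative sketch; the class and method names are assumptions, and in hardware each stack would be T/(k2-1) entries deep:

```python
class BitOrderReverser:
    """Two-stack LIFO that undoes the reversal of traceback output."""

    def __init__(self):
        self.push_stack, self.pop_stack = [], []

    def step(self, bit):
        """Push one reverse-order decoded bit; simultaneously pop one
        corrected-order bit from the other stack (None while it is empty)."""
        self.push_stack.append(bit)
        return self.pop_stack.pop() if self.pop_stack else None

    def switch(self):
        """When a memory bank finishes decoding, swap the stacks' roles."""
        self.push_stack, self.pop_stack = self.pop_stack, self.push_stack
```

While one bank's reversed bits are being pushed, the previous bank's bits pop out in the correct order, so reversal costs no extra decoding cycles.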
87
K-pointer Odd Algorithm
88
K-pointer Odd Algorithm
• There are 2k2-1 memory banks, each of length T/(k2-1).
• Total length = (2k2-1)T/(k2-1).
• A two-stack LIFO structure is also required to perform the bit-order reversal.
• The decode pointer and the write pointer always point to the same column in memory, although the decode pointer reads only one memory location, while the write pointer sequentially updates the memory locations corresponding to all states in a given trellis stage.
• It is necessary to perform the decoding before the new data are written; otherwise memory still in use would be overwritten.
89
One-pointer algorithm
• Unlike the k-pointer algorithms, which use k read pointers to perform the required k reads for every column write operation, a single read pointer with accelerated read operations is used.
• Every time the write counter advances by one column, k column reads occur.
• The acceleration exploits the fact that, among writing new data, traceback read, and decode read, writing new data is the most time-consuming operation: 2^v bits are written every stage, compared with only k bits being read.
• There are k1+1 memory banks, each T/(k1-1) columns long.
• The single read pointer produces the decoded bits in bursts.
– During the decode-read operation in the k1-th memory bank, decoded bits are generated at a rate of k1 per stage.
– A two-stack structure can perform both the bit-order reversal and the burst elimination at the same time.
90
One-pointer algorithm
91
Hybrid algorithm
• Combines features of the k-pointer algorithm and the one-pointer algorithm.
• k column reads per stage are performed using k2 read pointers, each advancing at a rate of k1 columns per stage (k = k1*k2 and k <= T+1).
92
Hybrid algorithm
93
Radix-4 Viterbi Decoder
• Radix-2 Trellis and ACS
Radix-2 trellis 2-way ACS Radix-2 ACS unit
94
Radix-4 ACS
• A 2^v-state trellis can be iterated from time index n-k to n by decomposing the trellis into 2^(v-k) subtrellises, each consisting of k iterations of a 2^k-state trellis.
• Each 2^k-state subtrellis can be collapsed into an equivalent one-stage radix-2^k trellis by applying k levels of look-ahead to the recursive ACS update.
• E.g., an 8-state radix-4 trellis.
95
Radix-4 ACS
• Parallel and serial implementations of the ACS unit:
– Parallel: one ACS butterfly for each pair of states.
– Serial: for large constraint lengths, a parallel implementation may not be feasible; use a single ACS butterfly (or fewer butterflies than states).
• Throughput can be increased if the number of ACS iterations per stage can be reduced.
• The number of ACS iterations is halved using a radix-4 ACS. If the critical path of a radix-4 ACS is the same as that of a radix-2 ACS, a potential twofold speedup is achievable.
• Of course, the potential speedup comes with a complexity increase, since the radix-4 ACS is more complex; radices higher than 4 are therefore not very practical.
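The look-ahead collapse behind the radix-4 ACS can be checked numerically. This is a sketch on an assumed fully connected 2-state toy trellis (chosen only to keep the example small); it shows that one radix-4 step using combined two-stage branch metrics produces the same path metrics as two consecutive radix-2 steps:

```python
from itertools import product

def acs(pms, bms):
    """k-way add-compare-select: minimum candidate metric and winner index."""
    cands = [pm + bm for pm, bm in zip(pms, bms)]
    best = min(cands)
    return best, cands.index(best)

def radix4_equals_two_radix2(pm0, bm_a, bm_b):
    """Compare two radix-2 ACS stages against one collapsed radix-4 stage.

    bm_a[p][m] is the stage-1 branch metric p -> m; bm_b[m][s] is the
    stage-2 branch metric m -> s (hypothetical values supplied by caller).
    """
    # Two consecutive radix-2 stages.
    pm1 = [acs(pm0, [bm_a[p][m] for p in (0, 1)])[0] for m in (0, 1)]
    pm2 = [acs(pm1, [bm_b[m][s] for m in (0, 1)])[0] for s in (0, 1)]
    # One radix-4 stage: minimise over all (p, m) two-step paths at once,
    # using the look-ahead branch metric bm_a[p][m] + bm_b[m][s].
    pm2_r4 = [min(pm0[p] + bm_a[p][m] + bm_b[m][s]
                  for p, m in product((0, 1), repeat=2))
              for s in (0, 1)]
    return pm2 == pm2_r4
```

The 4-way minimum replaces two cascaded 2-way minima, which is exactly why the radix-4 unit halves the iteration count at the cost of a wider compare-select.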
96
Radix-4 ACS (cont.)
Radix-4 trellis    4-way ACS    Radix-4 ACS unit
97
A 4-way ACS Block diagram