[IEEE 2005 Student Conference on Engineering Sciences and Technology - Karachi, Pakistan...

CCECE 2004- CCGEI 2004, Niagara Falls, May/mai 2004 0-7803-8253-6/04/$17.00 ©2004 IEEE

FPGA IMPLEMENTATION OF A LOW COMPLEXITY EFFICIENT TRACEBACK VITERBI DECODER FOR WIRELESS APPLICATIONS

Zohaib Mahtab

Dept. of Computer & Information Systems Engg., NED University of Engg. & Tech., Karachi. Email: [email protected]

Abstract

Convolutional coding is a coding scheme often employed in deep space communications and, more recently, in digital wireless communications. Viterbi decoders are used to decode convolutional codes. Viterbi decoders employed in digital wireless communications are complex and dissipate considerable power. With the proliferation of battery-powered devices such as cellular phones and laptop computers, power dissipation, along with speed and area, is a major concern in VLSI design.

In this project, a novel low-complexity architecture for Viterbi decoders for wireless applications is proposed. The focus of the design is a modified trace back approach for final decoding. A 4-state Viterbi decoder following the proposed architecture is implemented and the synthesis results are presented. Testing is done by simulating the synthesized model in ModelSim.

Keywords: Viterbi Decoder, Trace Back, Convolutional Encoder, Path metric, Branch metric, Hamming distance.

1. INTRODUCTION

Convolutional coding has been used in communication systems including deep space communications [4] and wireless communications. It offers an alternative to block codes for transmission over a noisy channel. An advantage of convolutional coding is that it can be applied to a continuous data stream as well as to blocks of data. IS-95, a wireless digital cellular standard for CDMA (code division multiple access), employs convolutional coding [5]. A third generation wireless cellular standard, under preparation, plans to adopt turbo coding, which stems from convolutional coding.

The Viterbi decoding algorithm [1], proposed in 1967 by Andrew J. Viterbi, is a decoding process for convolutional codes in memory-less noise. The algorithm can be applied to a host of problems encountered in the design of communication systems. The Viterbi decoding algorithm provides both a maximum-likelihood and a maximum a posteriori algorithm. Although widely used, this most popular communications decoding algorithm requires an exponential increase in hardware complexity to achieve greater decode accuracy [6]. Therefore, I have adopted the analysis and implementation of a reduced-complexity decoder approach.

In this paper, a low-complexity design of Viterbi decoders is described at the behavioral level in a high-level hardware description language, Verilog. The behavioral design is synthesized to generate a gate-level design. ModelSim is used to test the behavioral design.

This paper is organized as follows. In section 2 the convolutional encoder is explained. The Viterbi algorithm and the modified trace back unit are described in section 3. Section 4 presents the proposed Viterbi architecture. Its implementation is discussed in section 5 and the synthesis results are given in section 6. The paper is concluded in section 7.

2. CONVOLUTIONAL ENCODER

The convolutional encoder provides powerful error-correcting and encoding capability. It offers an alternative to block codes for transmission over a noisy channel.

Encoded bits are functions of the information bits and the number of memory elements. The information sequence is shifted into and along a shift register k bits at a time. Bits are tapped off at different stages of the shift register and summed in a modulo-2 adder (XOR gate).

The convolutional encoder is defined by two parameters:

1. Constraint length K = number of memory elements in the shift register + 1. This represents the number of locations from which bits can be tapped off.

Figure 1: 4-state Convolutional Encoder

Figure 2: State Diagram for Convolutional Encoder (states S0, S1, S2, S3)

2. Rate = k/n. If there are n modulo-2 adders, then for every k-bit shift there will be an output of n bits.

Most wireless applications define the rate to be 1/2, which means that for every bit that enters the convolutional encoder, 2 bits are produced at the output. For the proposed design, the constraint length is kept at 3; therefore there will be two memory elements (a two-stage shift register) at the encoder side.
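The encoder described above can be sketched in software. The following Python model is an illustration only: the tap positions (111 and 101) are not stated in the text and are inferred here from the next state and output table in section 5.1, and the function name conv_encode is hypothetical.

```python
def conv_encode(bits):
    """Rate-1/2, K = 3 convolutional encoder (assumed taps: 111 and 101)."""
    s1 = s2 = 0  # the two memory elements; s1 holds the most recent bit
    symbols = []
    for u in bits:
        # Two modulo-2 adders give n = 2 output bits per input bit (k = 1).
        symbols.append((u ^ s1 ^ s2, u ^ s2))
        # Shift the register by one bit.
        s1, s2 = u, s1
    return symbols


encoded = conv_encode([1, 0, 1, 1])
```

For the message 1, 0, 1, 1 this model produces the symbols 11, 10, 00, 01, that is, two output bits for every input bit.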

2.1 State Diagram

A convolutional encoder is a Mealy machine, where the output is a function of the current state and the current input. It consists of one or more shift registers and multiple XOR gates. The operation of a convolutional encoder can be easily understood with the aid of a state diagram. Figure 2 represents the state diagram of the encoder shown in Figure 1. It also depicts the state transitions and the corresponding encoded outputs.

As there are two memory elements in the circuit, there are four possible states that the circuit can assume. These four states are represented as S0 through S3. Each state’s information (i.e. the contents of the flip-flops for the state), along with an input, generates an encoded output code. For each state, there can be two outgoing transitions: one corresponding to a ‘0’ input bit and the other corresponding to a ‘1’ input bit.

2.2 The Trellis Diagram

A trellis diagram is an extension of a state diagram that explicitly shows the passage of time. Figure 3 shows a trellis diagram for the encoder given in Figure 1.

In the trellis diagram, nodes correspond to the states of the encoder. From an initial state (S0), the trellis records the possible transitions to the next states for each possible input pattern. For the encoder in Figure 1, each state has two encoded symbols, corresponding to input bits ‘0’ and ‘1’. Figure 3 shows the encoded symbol generated for each transition. At stage t = 1 there are two states, S0 and S1, and each state has two transitions corresponding to input bits ‘0’ and ‘1’. Hence the trellis grows up to the maximum number of states or nodes, which is decided by the number of memory elements in the encoder.
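The growth of the trellis can be checked with a small Python sketch. It rebuilds the transition table from assumed taps (111 and 101, inferred from the branch labels of Figure 3) and lists the states reachable from S0 after each stage. States are numbered here by their register contents (00 = 0, 01 = 1, 10 = 2, 11 = 3), so the state that Figure 3 labels S1 (10) appears as 2. The helper names are hypothetical.

```python
def build_trellis():
    """Return {state: {input_bit: (next_state, output_symbol)}} for K = 3."""
    trellis = {}
    for state in range(4):
        s1, s2 = state >> 1, state & 1  # s1 is the most recent bit
        trellis[state] = {}
        for u in (0, 1):
            output = ((u ^ s1 ^ s2) << 1) | (u ^ s2)  # assumed taps 111, 101
            next_state = (u << 1) | s1                # shift the input bit in
            trellis[state][u] = (next_state, output)
    return trellis


def reachable_states(stages):
    """List the states reachable from state 0 at t = 0, 1, ..., stages."""
    trellis = build_trellis()
    current = {0}
    history = [sorted(current)]
    for _ in range(stages):
        current = {trellis[s][u][0] for s in current for u in (0, 1)}
        history.append(sorted(current))
    return history


history = reachable_states(3)
```

Under these assumptions the trellis holds one state at t = 0, two at t = 1, and all four from t = 2 onward, which is the maximum set by the two memory elements.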

3. VITERBI DECODING ALGORITHM

The Viterbi decoding algorithm [1] is a maximum likelihood decoding process for convolutional codes over a memory-less channel. A decoding algorithm that maximizes the probability p(r|e) is a maximum likelihood (ML) algorithm. The input at the receiver end (r) is the information with redundancy and, possibly, noise. The receiver tries to extract the original information through a decoding algorithm and generates an estimate (e).

One way to view a Viterbi decoder is to construct it as a network of simple identical processors, with one processor for each state in the trellis. Our example system will have 4 node processors. The processor assigned to state S0 will receive input from itself and from the node processor assigned to state S2. It supplies output to itself and to the node processor for state S1, and so on for each state.

Figure 3: Trellis Diagram for the Convolutional Encoder (one stage, from t to t + 1; each state is listed with its register contents and its input/output branch labels)

S0 (00): 0/00, 1/11
S1 (10): 0/10, 1/01
S2 (01): 0/11, 1/00
S3 (11): 0/01, 1/10

Figure 4: Proposed 4 state Viterbi Decoder Architecture (output: DECODED MESSAGE)

Each processor is responsible for the following tasks.

• It must calculate a number called the likelihood metric, which is the Hamming distance between the received sequence and the expected transmitted sequence.

• Each processor must supply, as an output, its likelihood metric to each node processor connected to it on its output side.

• For each of its input paths, the node processor must calculate the Hamming distance between the n-bit code symbol it has just received and the n-bit code symbol it should have received if the path of the transmitted message had just made a transition from the input-side node processor. This is called the likelihood update. The node processor adds the likelihood update to the likelihood metric supplied to it by the source node processor. It then compares the two resulting likelihood metrics and selects the path associated with the input-side node processor having the smallest accumulated Hamming distance (i.e. it selects the most likely path). The node processor then replaces its own likelihood metric with the newly selected likelihood metric (this is the metric of task 1, and it marks the end of the processing cycle).

• Finally, based on which input path it selects, the node processor must trace back and decode the message bit associated with the selected path.
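The processing cycle of a single node processor can be sketched as follows. This is a minimal Python illustration of the likelihood update and path selection described above, not the hardware implementation; the function names and the representation of each input path as a (metric, expected symbol) pair are assumptions.

```python
def hamming(a, b):
    """Number of bit positions in which two code symbols differ."""
    return bin(a ^ b).count("1")


def node_update(received, inputs):
    """Run one processing cycle of a node processor.

    inputs holds one (source_metric, expected_symbol) pair per input path.
    Returns (new_metric, index_of_selected_path).
    """
    # Likelihood update: source metric plus Hamming distance to the symbol
    # expected on that path.
    candidates = [metric + hamming(received, expected)
                  for metric, expected in inputs]
    # Select the path with the smallest accumulated distance; on a tie the
    # first path is kept.
    best = min(range(len(candidates)), key=candidates.__getitem__)
    return candidates[best], best


new_metric, selected = node_update(0b11, [(3, 0b00), (1, 0b11)])
```

In this example the second path wins: its accumulated distance is 1 + 0 = 1, against 3 + 2 = 5 for the first path.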

3.1 The Modified Trace Back Approach

For the proposed design, a modified trace back approach is used to decode the original message bits. It is an alternative way of managing the survivor path register. Traditionally, Viterbi decoders were used with the register transfer method for decoding, but it is not the most cost-effective method of keeping track of the decoded message when a high-speed decoder is required [7]. This is because the register transfer in such a decoder must take place in parallel at each step through the trellis. Consequently, the survivor path registers must be interconnected to permit parallel transfers, and this interconnection can be very costly to implement in an integrated circuit. The modified trace back approach employed in this design is far more cost-effective than the register transfer method for high-speed decoders.

4. PROPOSED ARCHITECTURE

The design of high-performance Viterbi decoders has been investigated intensively in the past three decades [3]. References [3] and [8] present a very good generic architecture of the Viterbi decoder. Here the 4-state architecture of the Viterbi decoder is explained with the low-complexity modified trace back module. The design prototype can easily be extended to a higher number of states.

The major building blocks of a Viterbi decoder are shown in Figure 4. There are seven conceptual blocks of a Viterbi decoder, and the role of each block is described briefly below.

4.1 ROM

Figure 4 block labels: CODED MESSAGE; ACS 0, ACS 1, ACS 2, ACS 3; ROM (input states and outputs from trellis); TRACE BACK MODULE; RAM (path metric and smaller state selection)

Figure 5: Next State and Output Table

The ROM module stores the input states of each present state. The input states are derived from the trellis diagram. Since in the 4-state Viterbi decoder any state has at most two input states, the state numbers of the two input states are recorded in the ROM in binary. The ROM also holds the outputs that are generated on the transitions from the input states to the present state. Like the input states, there are two outputs per present state, each recorded as a binary number.

4.2 Add Compare Select (ACS) Module

There will be four ACS modules, one for each state of the Viterbi decoder. Each ACS will be responsible for interfacing with all the other modules, including the Trace back unit. The major functions that an ACS will perform are:

• To get the two input states of the present state from the ROM.

• To get the path metrics of the input states at that time from the RAM.

• To find the Hamming distance between the output sequence read from the ROM and the message that is received at that instant.

• To add the path metric of each input state to its corresponding Hamming distance.

• To compare the two input states’ added results and find the smaller one; this is the updated path metric.

• To write the updated path metric back into the RAM at the present state’s path metric location for t’ = t + 1.
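Taken together, the ACS functions listed above amount to one trellis stage. The Python sketch below is a behavioral illustration, not the Verilog module: the ROM dictionary is filled from the next state and output table of section 5.1, states are numbered by register contents (00 = 0 to 11 = 3), and the names ROM and acs_stage are hypothetical.

```python
# Hypothetical ROM contents: present state -> two (input_state, output) pairs.
ROM = {
    0: ((0, 0b00), (1, 0b11)),
    1: ((2, 0b10), (3, 0b01)),
    2: ((0, 0b11), (1, 0b00)),
    3: ((2, 0b01), (3, 0b10)),
}


def acs_stage(metrics, received):
    """Compute the path metrics at t + 1 from those at t for all four ACSs."""
    new_metrics, survivors = [], []
    for present in range(4):
        best_metric, best_prev = None, None
        for prev, expected in ROM[present]:
            # Add: path metric of the input state plus the Hamming distance.
            total = metrics[prev] + bin(received ^ expected).count("1")
            # Compare and select: keep the strictly smaller sum.
            if best_metric is None or total < best_metric:
                best_metric, best_prev = total, prev
        new_metrics.append(best_metric)  # value written back for t + 1
        survivors.append(best_prev)      # input state that was selected
    return new_metrics, survivors


metrics_t1, survivors_t1 = acs_stage([0, 7, 7, 7], 0b11)
```

Starting from the initial metrics 0, 7, 7, 7 and a received symbol 11, state 2 (register contents 10) ends with metric 0, as expected for a first message bit of 1.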

4.3 RAM

The RAM stores the path metrics of the states, providing this information to the ACS modules and to the Trace back module. The RAM consists of an array of memories, one for each instant of time. The RAM is also responsible for finding the state with the smallest path metric at any instant of time and providing this information to the Trace back module.

4.4 Trace Back Module

Here the actual decoding is performed based on the information provided by the ACS module and the RAM module. The decoding is performed by traversing through the present state and the previous state and finding the potential input that has caused the transition from the previous state to the present state.

5. IMPLEMENTATION

The proposed architecture of the Adaptive Viterbi decoder is coded in the Verilog® HDL. The software used is Xilinx Project Navigator v 6.3i. The FPGA targeted during the development of the decoder is a Virtex-II device, the xc2v3000 in the fg676 package with a speed grade of -4. The tool used for the synthesis of the design is XST (Verilog). The simulator used is Modelsim® XE-II 5.8c with the full Verilog package.

The individual modules are coded in Verilog, while the top-level module that integrates the individual coded modules is developed as a schematic diagram. In the following sections, a brief description of the implementation is given module by module.

5.1 ROM (ROM_Inputstates)

The ROM_Inputstates module is used to maintain the table of the inputs and outputs of each state. The size of the ROM is 4 x 8.

According to the algorithm [1], for each of its input paths, the node processor (state) must calculate the Hamming distance between the n-bit code symbol Y it has just received and the n-bit code symbol it should have received if the path of the transmitted message had just made a transition from the input-side node processor.

It is required to find the input states of each state, so that the path metrics which those input states hold at that time can then be read from the RAM. The following tables must therefore be maintained in the ROM [2] (the two tables are merged here):

5.1.1 Design Strategy: In order to maintain these two tables, there can be two different approaches:

• Design two separate ROMs, one for the input state table and the other for the output symbols table. Thus there will be two ROMs, each of size 4 x 4.

• Merge the two ROMs and design only one ROM, containing the data of both tables.

Current State    Next State / Output, if Input = 0    Next State / Output, if Input = 1
00               00 / 00                              10 / 11
01               00 / 11                              10 / 00
10               01 / 10                              11 / 01
11               01 / 01                              11 / 10

Figure 6 layout: memories Memory0, Memory1, ..., Memory7 hold the path metrics for t=0, t=1, ..., t=7 (the trace back length); within each memory, State 0 to State 3 are stored at addresses 2'b00, 2'b01, 2'b10 and 2'b11.

Figure 6: Path Metric Memory Management

The latter approach is more efficient and is followed here. Since both ROMs are addressed by the same present state, merging the two tables into one saves this space, and therefore the size of our ROM will be 4 x 8. The Rom_Inputstates module stores the input state information for all four of the Add Compare Select modules. The ACS of the S0 state requires the output word for the state S0 only, and so on. Since each of the four ACSs needs only its respective output word, enable pins are used to distinguish between the output words of the various ACSs. When enable pin ‘0’ of the Rom_Inputstates is made high, the output word of the S0 state is made available. This enable pin ‘0’ is connected to ACS 0, so that it can get its required information whenever desired.
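One possible layout of a merged 4 x 8 ROM word is sketched below in Python. The field order (two 2-bit input states followed by two 2-bit outputs) is an assumption, since the text does not specify it, and pack_word, unpack_word and read_rom are hypothetical names. The table contents follow the next state and output table above, with states numbered by register contents.

```python
def pack_word(in0, in1, out0, out1):
    """Pack two 2-bit input states and two 2-bit outputs into 8 bits."""
    return (in0 << 6) | (in1 << 4) | (out0 << 2) | out1


def unpack_word(word):
    """Split an 8-bit ROM word back into its four 2-bit fields."""
    return (word >> 6) & 3, (word >> 4) & 3, (word >> 2) & 3, word & 3


# One word per present state: 4 words x 8 bits.
ROM_WORDS = [
    pack_word(0, 1, 0b00, 0b11),  # present state 00
    pack_word(2, 3, 0b10, 0b01),  # present state 01
    pack_word(0, 1, 0b11, 0b00),  # present state 10
    pack_word(2, 3, 0b01, 0b10),  # present state 11
]


def read_rom(enabled_state):
    """Return the fields of the word selected by the asserted enable pin."""
    return unpack_word(ROM_WORDS[enabled_state])


fields_for_acs0 = read_rom(0)
```

With this layout, asserting enable 0 returns the input states 0 and 1 with the outputs 00 and 11, which is all that ACS 0 needs.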

5.2 RAM (RAM_Pathmemory)

The Ram_Pathmemory module is designed to store the path metrics of all of the states at various instants. It will be accessed by all the ACSs simultaneously. Besides reading and writing the path metrics, it finds the smallest path metric and the state which possesses that smallest path metric at any instant. This information is used by the Trace back unit, which will be discussed shortly.

5.2.1 Design Considerations:

• All the reads will be made at the positive edge of the clock.

• All the writes are assumed to be at the negative edge of the clock.

• The clocking mechanism will be handled by the ACS. To perform the desired operation at any edge of the clock, the corresponding pin is made high. For example, to read at the positive edge, the read enable pin is made high.

• The memory depth is 4, while each memory vector is 4 bits wide.

• The trace back length is kept at 7.

• A separate memory is implemented for every time instant, up to a maximum of 7. After 7 time intervals the decoding is assumed to be true and the contents of the memory are cleared.

• While finding the state with the smallest path metric, the lower-numbered state is given the highest priority. That is, if more than one state has the same lowest path metric, say State 0 and State 1, State 0 will be considered smaller. This has to be done because there can be no more than one smallest state at a given time in the trace back unit.

5.2.2 Design Strategy:

1) Memory Management: To store the path metric at every time instant for every state, there should be a memory management scheme. The design strategy followed here is that there is a separate 4 x 4 memory for each time instant. Since the trace back length is kept at 7, there are eight 4 x 4 memory modules to handle the path metrics at each time. Each register in a memory is 4 bits wide; this can accommodate a path metric of up to 15 units, which is more than sufficient for a 4-state Viterbi decoder.

2) Memory Initialization: Memory0 of the Ram_Pathmemory represents the contents of the memory at t = 0. To start with, we know that the initial state of the convolutional encoder is 00 (for 4 states). Therefore, according to the algorithm, the path metric of state 0 is kept at the minimum, which is 0000 (since the register is 4 bits wide). The other memory registers, which represent the corresponding state path metrics at t = 0, are kept at the maximum. The maximum here is taken to be 7 = 0111 rather than 15 = 1111, because adding 1 to 1111 without any carry out would make it 0 = 0000.

3) State Sorting: For the trace back unit, it is required to find the smallest path metric and hence the state which possesses this path metric. This is done here using the same data from the input port datain_acsx, instead of assigning it to the memory and then sorting it out from there. The code starts by checking the state 0 path metric; if it is the smallest among all, state_smaller2 is assigned state 0, and so on. These output ports are then connected to the trace back unit.
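The memory initialization and state sorting steps can be illustrated with a short Python sketch. It models the eight 4 x 4 memories as nested lists and is only a behavioral approximation of Ram_Pathmemory; the constant and function names are hypothetical.

```python
TRACE_BACK_LENGTH = 7
MAX_METRIC = 0b0111  # 7, chosen so that 4-bit additions do not wrap to 0


def init_path_memory():
    """Create Memory0..Memory7, each holding four 4-bit path metrics."""
    memory = [[0] * 4 for _ in range(TRACE_BACK_LENGTH + 1)]
    # At t = 0 the encoder is known to be in state 00, so only state 0
    # starts at the minimum metric.
    memory[0] = [0, MAX_METRIC, MAX_METRIC, MAX_METRIC]
    return memory


def smallest_state(metrics):
    """Return the state with the smallest metric; ties go to the lower state."""
    best = 0
    for state in range(1, len(metrics)):
        if metrics[state] < metrics[best]:  # strict comparison keeps the tie rule
            best = state
    return best


memory = init_path_memory()
wrapped = (0b1111 + 1) & 0xF     # what a 4-bit register would hold after 15 + 1
safe = (MAX_METRIC + 2) & 0xF    # 7 plus the largest branch metric still fits
```

The last two lines show the reasoning behind the choice of 7: a metric of 15 wraps to 0 after a single increment and would then look like the best path, whereas 7 plus a two-bit Hamming distance stays within 4 bits.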

Figure 7: Trace Back Table

5.3 ADD COMPARE SELECT

This is the most important module in the whole Viterbi decoder. Its main functions are:

• To get the two input states for each present state.

• To get the output that is produced while transiting from the input states to the present state. This information is drawn from the Rom_Inputstates module.

• To retrieve the path metric corresponding to each input state it has drawn from Rom_Inputstates.

• To calculate the difference between the output transitions and the message, and thus determine the Hamming distance between them. This is done for both of the input states.

• To add this calculated Hamming distance to the corresponding path metric of the input state that was retrieved before. This is done for both of the input states. The added result is called the updated path metric.

• To compare both states’ updated path metrics and determine the smaller one.

• To write the smaller updated path metric back to the present state path metric register at that particular time.

5.3.1 Design Considerations: To implement the above-mentioned functions, the following design assumptions are made:

• Since there are multiple modules interacting with the ACS block, and the result of each interaction depends on its previously read or written data, it is decided to use multiple clocks to cope with the design needs.

• The enable to the Rom_Inputstates will be made high at the positive edge of the clock.

• The data from the Rom_Inputstates will then be read at the negative edge of the clock.

• The same goes for the Ram_Pathmemory module. All the enables will be made active at the positive edge of its clock, and then the data will be read and manipulated at the negative edge of its clock.

5.3.2 Clocking Methodology: The use of multiple clocks within the design greatly reduces the complexity of the Viterbi decoder. Moreover, since all the clocks are synchronized and are derived from the same master clock, there is no overhead involved in maintaining and generating separate clocks. The different clocks used in this design are:

1. clk_rom: For interfacing with the Rom_Inputstates. At the positive edge of the clock, the ROM is enabled. At the negative edge of the clock, the data is read from the ROM.

2. clk_rpm: For interacting with the Ram_Pathmemory module. At the positive edge of the clock, the chip is enabled, the two addresses are placed, and the read signal is made high. At the negative edge of the clock, the corresponding path metric is read back into the ACS.

3. clk_write_rpm: For writing the updated path metric back into the Ram_Pathmemory module. The duty cycle of this clock is very short; it is merely meant to provide a write strobe at the moment when the updated data is available. Shortly after this clock, clk_rpm reads the written result from the memory; therefore the two must be closely synchronized.

5.4 THE MODIFIED TRACE BACK UNIT

The Trace back unit is used to decode the actual message [1]. It is the last block of the Viterbi decoder. The decoding is performed by traversing through the present state and the previous state and finding the potential input that has caused the transition from the previous state to the present state. The trace back length is kept at 7 in order to perform efficient decoding. The potential input that is responsible for the transition between the states is determined from the following table [2]:

                 Input was, Given Next State =
Current State    00 (0)    01 (1)    10 (2)    11 (3)
00 (0)           0         x         1         x
01 (1)           0         x         1         x
10 (2)           x         0         x         1
11 (3)           x         0         x         1

Note: In the above table, x denotes an impossible transition from one state to another state [2].

5.4.1 Design Considerations: The following assumptions are made in designing the Trace back unit.

• No separate ROM is maintained for storing the above table.

• The module is coded as two synchronous procedures which operate at the positive and the negative edge of the same clock “clk_tb”.

• At the positive edge of clk_tb, the table is loaded into the registers.

• At the negative edge of clk_tb, the actual traversing in search of the potential input is performed.

Figure 8: Synthesis Results

5.4.2 Design Strategy: The most important thing in this trace back unit is the design of the above-mentioned table. The table should be stored in such a way that it makes traversing and searching for the potential input possible. Consider the transition from S2 to S0. The transition to state S0 is possible only from state S2 and from state S0 itself. If S2 transits to S0, then the input that was applied is 0, and if S0 transits to S0 again, the input is likewise 0, since the table gives input 0 for every transition into state 00. The table in the module should be maintained in such a way that it becomes possible to find this input state and thus the input combination. A series of eight for loops is used, in which a simple check for the potential input is performed.

5.4.3 Reduced Complexity Approach: In the traditional Viterbi approach, the trace back unit happened to be very complex and power consuming. The major reason was that the trace back was performed for the whole constraint length and for every interval of time. In this reduced-complexity approach, the trace back is performed from time t to time t-1, and this is done only once for each time stamp. Not all the loops are executed at any instant. Only the loop whose time stamp matches the time variable provided by the ACS is executed. This not only reduces the complexity of the design and the power consumption, but also avoids updating the whole decoded-message variable at every instant; only its corresponding bit is updated.
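The single-step trace back can be sketched in Python as follows. This is a hedged illustration of the idea only: the table is copied from section 5.4, the function name trace_back_step is hypothetical, and the handling of an impossible transition (the bit is simply left untouched) is an assumption, since the text does not describe that case.

```python
# Input bit that causes the transition current -> next; None marks an
# impossible transition (an 'x' in the trace back table).
INPUT_TABLE = [
    [0, None, 1, None],   # current state 00
    [0, None, 1, None],   # current state 01
    [None, 0, None, 1],   # current state 10
    [None, 0, None, 1],   # current state 11
]


def trace_back_step(decoded, t, prev_state, present_state):
    """Decode the single bit for the transition from time t-1 to time t.

    Only decoded[t - 1] is written; all other bits are left untouched.
    Returns the decoded bit, or None if the transition is impossible.
    """
    bit = INPUT_TABLE[prev_state][present_state]
    if bit is not None:
        decoded[t - 1] = bit
    return bit


# Illustrative smallest-metric states at t = 0..4 for a noise-free
# transmission of the message 1, 0, 1, 1 (states by register contents).
path = [0, 2, 1, 2, 3]
decoded = [0] * 4
for t in range(1, len(path)):
    trace_back_step(decoded, t, path[t - 1], path[t])
```

Each call touches exactly one bit of the decoded message, which mirrors the point made above that only the bit for the current time stamp is updated.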

6. SIMULATION RESULTS

After synthesizing the module, the simulation is done using the Modelsim® XE-II 5.8c simulator. In this section, the final simulation results of the module as a whole are presented.

6.1 Device Utilization Summary

Figure 8 shows the device utilization summary of the implemented module.

Selected Device: 2v3000fg676-4
Number of Slices: 7 out of 14336 => 0%
Number of Slice Flip Flops: 11 out of 28672 => 0%
Number of bonded IOBs: 56 out of 484 => 11%
Number of GCLKs: 2 out of 16 => 12%

6.2 Timing Diagram

• Speed Grade: -4
• Minimum period: 2.911ns
• Maximum Frequency: 343.525MHz
• Maximum output required time after clock: 5.953ns

7. CONCLUSIONS

Viterbi decoders used in wireless and deep space communications are intricate, time-consuming and costly when implemented in integrated circuits. In particular, the trace back unit accounts for much of the hardware and slows the Viterbi decoder down. In this paper, a very efficient trace back method is investigated, and a 4-state Viterbi decoder is implemented as a prototype for more complex Viterbi decoders. Design strategies are discussed briefly and an overview of the final simulation results is presented. A head-to-head comparison with other traditional architectures is difficult because of the different environments, such as hard decision versus soft decision and the constraints imposed. Nevertheless, the trace back technique can be applied to other low-power designs to further reduce complexity.

Synthesis results (Figure 8):

Design Statistics
  IOs: 62

Macro Statistics
  Registers: 76
    1-bit Register: 8
    2-bit Register: 48
    4-bit Register: 20
  Multiplexers: 170
    2-to-1 multiplexer: 98
    4-bit 4-to-1 multiplexer: 64
    4-bit 8-to-1 multiplexer: 8
  Adders / Subtractors: 1
    32-bit adder: 1
  Comparators: 13
    4-bit comparator lessequal: 13
  Xors: 16
    1-bit xor3: 16

Cell Usage
  BELS: 2
    GND: 1
    VCC: 1
  Flip-Flops/Latches: 11
  Clock Buffers: 2
  IO Buffers: 56

References

[1] Richard B. Wells, Applied Coding and Information Theory for Engineers.

[2] Chip Fleming, “A Tutorial on Convolutional Coding with Viterbi Decoding,” Spectrum Applications.

[3] Samirkumar Ranpara, “A Low-Power Viterbi Decoder Design for Wireless Communications Applications,” ASIC Conference, Sept. 1999, Washington, D.C.

[4] P. H. Kelly and P. M. Chau, “A flexible constraint length, foldable Viterbi decoder,” in IEEE Global Telecommunications Conference Including a Communications Theory Mini-Conference, vol. 1, pp. 631–635, 1993.

[5] I. Kang and A. N. Willson, Jr., “Low-power Viterbi decoder for CDMA mobile terminals,” IEEE J. Solid-State Circuits, vol. 33, pp. 473–482, Mar. 1998.

[6] Sriam Swaminathan, Russell Tessier, Dennis Goecket and Wayne Burleson, “A Dynamically Reconfigurable Adaptive Viterbi Decoder,” University of Massachusetts.

[7] Yun-Nan Chang, Hiroshi, and Keshab K. Parhi, “A 2-Mbps 256 state 10mW Rate 1/3 Viterbi Decoder,” IEEE Journal of Solid State Circuits, June 2000.

[8] Joao Portela, “Power Optimized Viterbi Decoder Implementation through Architectural Transforms.”
