
A Dirty Paper Coding Modem

Tomás Law

Thesis to obtain the Master of Science Degree in

Electrical and Computer Engineering

Supervisor(s): Prof. Paulo Alexandre Crisóstomo Lopes

Examination Committee

Chairperson: Prof. José Eduardo Charters Ribeiro da Cunha Sanguino

Supervisor: Prof. Paulo Alexandre Crisóstomo Lopes

Member of the Committee: Prof. José Manuel Bioucas Dias

June 2018


Declaration

I declare that this document is an original work of my own authorship and that it fulfills all the requirements of the Code of Conduct and Good Practices of the Universidade de Lisboa.


Acknowledgments

I would like to thank my family for supporting me throughout my studies.

I would like to thank Prof. Paulo Lopes for the incredible support during the thesis.

I would also like to thank my friends for the encouragement throughout the entire course and the thesis.


Resumo

Dirty Paper Coding (DPC) is a set of techniques for reducing the effects of interference on a signal along the transmission channel. It uses precoding techniques at the transmitter to shape and transform the original message, and it allows the cancellation of arbitrary interference known beforehand at the transmitter while keeping the power required to transmit the signal constrained. The main objective of this dissertation was the implementation of a complete end-to-end transmission system that employs DPC. Naturally, both ends of the system need to be coordinated and compatible, which involves implementing the encoding techniques at the transmitter and the respective decoding techniques at the receiver. The implemented system was used to perform several simulations in which the noise parameter varies. From these simulations, the bit error rate (BER) and the total number of errors in the decoded message are measured for different signal-to-noise ratio (SNR) conditions. The simulations were performed using the Matlab software.

Keywords: DPC, Transmitter, Receiver, Interference, SNR.


Abstract

Dirty Paper Coding (DPC) is a set of techniques for reducing the effects of interference on a signal across the channel by using precoding techniques at the transmitter to shape and transform the original message. It allows the cancellation of arbitrary known interference at the transmitter while keeping the power required to transmit the signal constrained. The main objective of the dissertation was to implement a complete end-to-end transmission system which employs DPC. Naturally, both ends of the system need to be coordinated and compatible. This involves implementing the encoding techniques performed at the transmitter and the respective decoding techniques at the receiver. The implemented system was used to perform several simulations in which the noise parameter in the channel varies. From these simulations, the Bit Error Rate (BER) and the total number of errors in the decoded message are measured for different Signal to Noise Ratio (SNR) conditions. The simulations were performed using Matlab software.

Keywords: DPC, Transmitter, Receiver, Interference, SNR


Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Nomenclature

1 Introduction
1.1 Motivation
1.2 Topic Overview
1.3 Objectives
1.4 Thesis Outline

2 Background
2.1 Evolution
2.2 Lattices
2.2.1 Modulo lattice
2.2.2 Voronoi region
2.2.3 Shaping gain
2.3 Viterbi algorithm
2.4 Log-Likelihood

3 Implementation
3.1 Designed system
3.1.1 Transmitter
3.1.2 Channel
3.1.3 Receiver
3.2 Repeat-Accumulator Code
3.2.1 Introduction
3.2.2 Structure
3.2.3 Decoding algorithm
3.3 4-PAM Mapping, Dither and Interference
3.3.1 Example
3.4 Viterbi Algorithm
3.4.1 Introduction
3.4.2 Mapping to the Voronoi region
3.4.3 Implementation code
3.4.4 Calculation of the lattice codewords
3.5 Bahl-Cocke-Jelinek-Raviv Algorithm
3.5.1 Introduction
3.5.2 Behaviour
3.5.3 Operations
3.5.4 Initial and terminal states of the decoder
3.5.5 Computation of the L-values

4 Results
4.0.1 Overall results
4.0.2 Summary

5 Conclusions

Bibliography


List of Tables

1.1 Analogies used in Costa's question
3.1 Mapping of the 4-PAM symbols to real values
3.2 General overview of the evolution of the decoder
3.3 Example of the evolution of the state bits of the decoder given an arbitrary input sequence
3.4 Example of the evolution of the outputs of the decoder given an arbitrary input sequence
3.5 Example of the terminating process of the state of the decoder
4.1 SNR values for each run


List of Figures

1.1 General transmission system
1.2 Transmission steps in a communications system
2.1 Lattice representation
2.2 Lattice point λ1 calculation
2.3 Coset of the lattice Λ generated by x
2.4 Representation of the Fundamental Voronoi Region of λ1
3.1 Full overview of the designed system
3.2 Example of the structure of the Repeat-Accumulator Encoder
3.3 L-values permutations in the Interleaver
3.4 Upsampling of the original information sequence
3.5 State transition in a shift-register
3.6 Example of a state transition in a shift-register and possible sk−1 values
3.7 Structure of the shift-register and the generator polynomials to obtain the lattice codewords
3.8 Representation of a trellis diagram for the Viterbi algorithm
3.9 State diagram with 3 states
3.10 Illustration of the trellis processing at the receiver side using BCJR
3.11 Example of possible and impossible state transitions
4.1 Evolution of the number of errors along the iterations for each signal-to-noise ratio
4.2 Evolution of the number of errors along extra iterations for SNR = -0.80271 dB
4.3 Evolution of the number of errors along the iterations for -1.826 dB < SNR < -0.803 dB
4.4 Evolution of the Bit-Error-Rate (BER) for different Signal-Noise Ratio (SNR) values
4.5 Evolution of the turbo cliff
4.6 Evolution of the number of errors along the elements for each iteration for σ = 0.7


Nomenclature

∗ Convolution

Λ Lattice

Rn The set of all n-tuples of real numbers

X Matrix

x Vector

ν Fundamental Voronoi region

⊕ Modulo-two addition

cov{X,Y } Covariance of signals X and Y

gs(Λ) Shaping gain of Λ

mod Λ Modulo lattice Λ operation

p(x) Probability density function

p(x, y) Joint probability density function

P{X} Power of signal X

ACC Accumulator

APP a-posteriori probability

ARQ Automatic Repeat Query

AWGN Additive White Gaussian Noise

BCJR Bahl-Cocke-Jelinek-Raviv

CND Check node decoder

DMC Discrete memoryless channel

DPC Dirty Paper Coding

FEC Forward Error Correction


IL Interleaver

L-value Log-likelihood value

LDPC Low-density parity-check

MIMO Multiple-Input and Multiple-Output

MMSE Minimum Mean-Square Error

PAM Pulse-amplitude modulation

RA Repeat-Accumulate

SISO Soft-In Soft-Out

SNR Signal-to-noise Ratio

VND Variable node decoder

VQ Vector-quantizer


Chapter 1

Introduction

The communication of data between a transmitter and a receiver involves the transmission of information

through some key procedures. Usually, it starts with the generation of the message we wish to send

which is converted into a sequence of symbols suitable for communication. Then, these symbols are

encoded into an appropriate format for transmission over a channel. Finally, after being transmitted

through the channel, the received symbols are decoded into the original message. Yet, the receiver

doesn’t always manage to decode the symbols into the correct original message due to imperfections in

the system as well as noise and interference across the channel. These channels can be of many kinds

such as: cables, optical fiber, wireless channels and even digital storage media.

Figure 1.1: General transmission system

In a typical communications system, as shown in Figure 1.1, interference can be considered as

anything which modifies a signal while it’s in a channel between a transmitter and a receiver. Therefore,

in most situations, it’s considered an unwanted signal and we wish to reduce it as much as possible.

Many different methods can be used to achieve this goal such as: Forward Error Correction (FEC) and

Automatic Repeat Query (ARQ). The FEC method works by encoding the data using an error-correction

code before the transmission where redundancy is added to the signal. This redundancy in the bits will

allow the receiver to recover the original data at the cost of extra bandwidth. The ARQ method works by

forcing the receiver to send messages to the transmitter to acknowledge that it has correctly received a

certain data packet. If, for example, the acknowledgment does not arrive before the timeout duration expires, the transmitter resends the data packet.


Both these methods have different advantages and disadvantages but, when used simultaneously, they

allow the reduction of the probability of error to almost zero.

Dirty Paper Coding (DPC) is a technique for transmitting data efficiently across a channel with interference. It uses several methods to precode the data at the transmitter side so that it becomes less vulnerable to the effects of interference across the channel. It allows the cancellation of the effects of a portion of interference known at the transmitter, without increasing the overall power necessary to transmit the signal. This cancellation will allow us to reach a close-to-capacity transmission rate, that is, a transmission rate close to the maximum rate at which information can be reliably transmitted. Therefore, we achieve a more efficient, but still reliable, signal transmission system.

1.1 Motivation

Nowadays, signal interference problems are a recurring issue in many people's lives. Ranging from connections dropping due to low received signal power to erroneous messages being delivered to the receiver, multiple situations are negatively affected by signal interference. The usage of multiple-input multiple-output (MIMO) systems has been increasing significantly. This led to the use of multi-user MIMO (MU-MIMO), in which the same bandwidth is used by several users. For example, at home, the more devices that connect to the home Wi-Fi, the slower it gets, because most modems can only communicate with one device at a time. In these single-user MIMO modems, each connected device has to wait its turn to send and receive data from the Internet. MU-MIMO modems allow the connection of multiple devices simultaneously, which reduces the time each device has to wait for a signal and speeds up the network. In MU-MIMO systems, interference between users can be reduced by using DPC: the signal sent from a base station to one user can be seen as known interference to the signal sent to another user. Overall, with this project, we intend to increase the quality and reliability of today's digital transmission services, whether in single-user or multi-user communication systems.

1.2 Topic Overview

As initially said, the general steps for transmitting a message across a communications system are the

following:

1. Generate a message W

2. Encode the message into the signal X before transmitting

3. Transmit X across the channel

4. Decode the received signal Y

5. Obtain the decoded message W ′


The received signal Y can be represented by Equation 1.1:

Y = X + S + N, \qquad (1.1)

where:

• S is an arbitrary interference known at the transmitter;

• N is a statistically independent Gaussian random variable representing noise.

This scenario is represented in Figure 1.2.

Figure 1.2: Transmission steps in a communications system

Examining Equation 1.1, we observe that, to cancel the interference S, the receiver could simply

subtract the interference S from the received signal Y , leading to an interference-free signal only affected

by the Gaussian noise N . Yet, this would only be possible if the interference S was known at the receiver

side. Since the interference S is not known at the receiver, we could pre-subtract the interference at the

transmitter. This way, the transmitted signal would be instead:

X ′ = X − S (1.2)

The received signal would be interference free as shown in Equation (1.3):

Y' = X' + S + N = X - S + S + N = X + N \qquad (1.3)

However, this approach raises a problem related to the power constraint. Transmitting the signal X ′ =

X − S corresponds to transmitting the sum of 2 signals. The average power necessary to transmit the

sum of 2 signals is given by Equation (1.4).

P\{X + Y\} = P\{X\} + P\{Y\} + 2\,\text{Cov}\{X, Y\} \qquad (1.4)

where Cov{X, Y} represents the covariance of signals X and Y, and P{X} = E[X^2].

The covariance of two signals is a measure of the strength of the correlation between them. The more positively correlated two signals are, the more positive the covariance will be. Conversely, the more negatively correlated two signals are, the more negative the covariance will be. Ultimately, if two signals are


not correlated at all, the covariance of these two signals is zero. Therefore, when the interference S is

independent of the signal X, the average required power to transmit the signal X ′ would simply be the

sum of the required power to transmit X with the required power to transmit S as shown in Equation

(1.5).

P\{X + S\} = P\{X\} + P\{S\} \qquad (1.5)

Since the interference can be arbitrarily strong, the power required to transmit the signal would need to be correspondingly higher. This makes the transmission of the signal inefficient, which is an issue we will try to solve using DPC.
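As a quick numerical check of Equations (1.4) and (1.5), the following Matlab sketch (with arbitrary example powers, not parameters from this work) estimates the power of the naive pre-subtracted signal X − S when X and S are independent:

    % Power of the sum of two independent signals; the covariance term vanishes.
    % Px and Ps below are arbitrary example values.
    n  = 1e6;                 % number of samples
    Px = 1;                   % power of the intended signal X
    Ps = 10;                  % power of the known interference S
    X  = sqrt(Px) * randn(n, 1);
    S  = sqrt(Ps) * randn(n, 1);
    Xp = X - S;               % naive pre-subtraction at the transmitter
    fprintf('Estimated P{X - S} = %.3f (theory: %.3f)\n', mean(Xp.^2), Px + Ps);

The estimate approaches Px + Ps, confirming that the naive scheme pays the full power of the interference.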

Related to DPC, but from a more general perspective, there are several systems that perform digital watermarking techniques. Digital watermarking consists of embedding one signal, the ”embedded signal” or ”watermark”, within another signal, the ”host signal”. This method allows the transmission of auxiliary information where the embedded signal causes no serious degradation to its host signal. It works by altering the transmitted signal using an embedding function which depends on the original signal and makes it more robust to noise interference.

One of the applications of DPC is in digital watermarking systems. In these systems, some assumptions are made, related to the type of noise affecting the system as well as other factors, but we won't explain them in depth because they are out of the scope of this dissertation. Nonetheless, some important observations will be made regarding this issue at the end.

The concept of DPC originated well before the rise of research in digital watermarking, in 1983, when Costa posed the following problem: ”imagine we have a piece of paper covered with independent dirty spots of Normally distributed intensity, and we write a message on it with a limited amount of ink. The dirty paper, with the message on it, is then sent to someone else, and acquires more Normally distributed dirt along the way. If the recipient cannot distinguish between the ink and the dirt, how much information can we reliably send?” [8]

The problem exposed by Costa in the above paragraph makes an analogy to a real communications

channel scenario. In this problem, a transmitter attempts to transmit a signal across a channel to a

receiver and, across the channel, there will be two interference sources. At the transmitter/encoder side,

the first source is known but the second source is unknown. The analogies used are described in Table 1.1.

At the time, the Shannon-Hartley theorem was well known, and it defined the maximum rate at which information can be transmitted over a communications channel in the presence of noise. According to the theorem, the higher the amount of noise found across the channel, the lower the amount of information that can be transmitted through that channel. In the analogy, the more dirt found while the paper is in transit, the less information can be reliably received. Therefore, one would think the same happens if the dirt was added to the paper before the message was written; that is, even if the first source of noise is known at the encoder, the amount of information that can be reliably received would be reduced. Yet, Costa proved this wrong in his paper [9] and showed


Table 1.1: Analogies used in Costa’s question

Paper                          Channel
writer                         transmitter
ink/dirt                       noise
limited amount of ink          power constraint of the signal
first layer of dirty spots     first known source of noise
dirt acquired along the way    second unknown source of noise
recipient                      receiver

that the capacity C of the system is independent of the first source of noise that is known at the encoder.

It depends only on the signal power P and the noise N found along the channel as shown in Equation

(1.6).

C = \frac{1}{2} \log_2\left(1 + \frac{P}{N}\right) \qquad (1.6)

To satisfy Equation (1.6), a power constraint on the transmitted signal must be verified:

\frac{1}{n} \sum_{i=1}^{n} X_i^2 \leq P \qquad (1.7)

Originally, Costa’s results were only proven for Gaussian noise sources but later they were extended

to arbitrary, deterministic or random interference.
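As an illustrative sketch, Equation (1.6) can be evaluated directly in Matlab (the values of P and N below are arbitrary examples, not thesis parameters); note that the known interference S does not appear in the expression:

    % Capacity of the dirty paper channel, Equation (1.6).
    P = 1;                         % signal power
    N = 0.5;                       % channel noise power
    C = 0.5 * log2(1 + P / N);     % bits per channel use, independent of S
    fprintf('C = %.3f bit/channel use\n', C);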

1.3 Objectives

With this thesis, the main long-term objective is to allow DPC to become a basic building block, integrated with other blocks, in better digital communication systems. In practical terms, the end goal is the implementation of an end-to-end system which encodes a signal at the transmitter, transmits it across a noisy channel and decodes the signal at the receiver. More specifically, we want to achieve this using several DPC techniques at a minimum signal-to-noise ratio (SNR). This end goal can be split into smaller goals, which consist of the successful implementation of the modules that represent the DPC techniques.

The smaller goals mainly consist of implementing both sides of the system:

• Encoder at the transmitter side

• Decoder at the receiver side


1.4 Thesis Outline

This dissertation will start by giving an overall explanation regarding data transmission across channels and pointing out the problems caused by interference. Some examples of solutions to these kinds of problems will be presented, along with the main advantages and disadvantages of each. Then, we will introduce the dirty paper coding solution along with the challenges associated with each element of it, mainly the advantages and disadvantages of following the dirty-paper coding approach compared with other current approaches. A brief presentation regarding the evolution of this approach throughout history, such as the major breakthroughs and developments in this area, will also be given.

Most importantly, before explaining the system implemented to achieve our goal, we will provide some background information about the main concepts involved in this project, whether in the initial implementation section or in the results analysis section. Terms such as lattices are crucial in the implementation of this encoder/decoder, while key indicators such as the shaping gain are critical in the result analysis process. Other elements, such as the log-likelihood, are of very high importance throughout the whole dissertation. Therefore, it's essential to understand these concepts before moving on to the implementation section.

In the implementation section, we start by describing the encoder from two different points of view: the transmitter side and the receiver side. For each side, we explain what we want to achieve and then how we will do it. That is, we show what the goal of each side is from a theoretical point of view and then explain the purpose of each element in the designed system in order to achieve that goal. At the transmitter side, we explain how we transform the original message into the signal we wish to transmit over the channel, including the RA code and the Viterbi algorithm. At the receiver side, we show how we transform the received signal from the channel into the decoded original message, including the BCJR algorithm.

After the whole implementation process is explained, we present the results obtained from the simulations with the system we implemented. These results include not only information about whether or not the input sequence was correctly decoded, but also information about intermediate steps of our algorithm. This way, we can analyze the role that each element of our system plays during the execution of the algorithm. Then, we proceed to explain the results and the behavior of our algorithm.

Finally, a brief summary of the execution of the project and some thoughts about applications of DPC in existing systems will be given. Special attention will be paid to the relationship between DPC and MIMO systems, and future work to be done in this area will be mentioned.


Chapter 2

Background

2.1 Evolution

Initially, Max Costa studied DPC with Gaussian noise and interference and, using the general formula by Gelfand and Pinsker for the capacity of channels with side information known at the transmitter, showed that the capacity is equal to \frac{1}{2}\log_2\left(1 + \frac{P_X}{P_N}\right). This means that, theoretically, the interference doesn't incur any loss in capacity. Only later, ”the connection of the DPC model to precoding for interference cancellation was established, and Costa's result was extended to arbitrary interference, deterministic or random” [10].

Recently, a significant amount of research regarding the application of DPC to broadcast over MIMO

channels has been done. Consequently, DPC has been slowly emerging as a building block in multi-user

communication systems.

Using lattice coding to cancel the effect of interference was introduced by Willems [11] who provided

achievable strategies for the cancellation of the interference based on DPC. Also, Costa proved that,

in the scenario of a Gaussian noncausally known interference, the capacity of the channel is the same

as when there’s no interference at all. To achieve this goal, a scheme based on lattice quantization and

minimum mean-squared error (MMSE) scaling would be used. Initially, the scheme was originally built

upon the concept of trellis shaping and ”syndrome dilution” but, later, it was extended by using capacity-

approaching codes, iterative detection and iterative decoding. After further research, a complete end-to-end dirty paper transmission system that offers considerable gains under low signal-to-noise ratio (SNR) conditions was designed.

2.2 Lattices

As defined in [10], ”a lattice Λ is a discrete subgroup of the Euclidean space Rn”, and it can be regarded as a linear code defined over the real numbers. One use of lattices is in the wireless communications field, because that field involves concepts such as power constraints, for which lattices are a useful tool.


In a strict sense, a lattice is a linear additive subgroup of Rn. To ease the process of understanding the concept of a lattice, we can work with the definition that a lattice is the set of integer linear combinations of basis vectors in n-dimensional space. Therefore, each lattice point λ ∈ Λ can be represented by Equation (2.1).

λ = g1b1 + g2b2 + ...+ gnbn (2.1)

where:

• g1, g2...gn are basis vectors;

• b1, b2, ..., bn are integers.

Since there's no limit on the integer coefficients that multiply the basis vectors, the lattice Λ has infinite size and is

periodic in nZn. From all the normal operations that can be done with basis vectors, for the quantization

method we will implement with the lattices, two are of special importance: the ordinary vector addition

and the squared Euclidean distance. The ordinary vector addition operation states that given two points

xT = (x1, x2, ..., xn) and yT = (y1, y2, ..., yn), the addition of these two points is given by Equation (2.2).

Additionally, since the lattice is linear, if these two points belong to the lattice, then their sum also belongs

to the lattice. That is, if x,y ∈ Λ, then x+ y ∈ Λ.

xT + yT = (x1 + y1, x2 + y2, ..., xn + yn) (2.2)

The Squared Euclidean Distance represents the straight-line squared distance between two points

in Euclidean space defined by Equation (2.3).

\|x - y\|^2 = \sum_{i=1}^{n} (x_i - y_i)^2 \qquad (2.3)

An example of a two-dimensional lattice is shown in Figure 2.1 where we have two generator vectors

g1 and g2.

Figure 2.1: Lattice representation


Alternatively, we can represent the lattice generator vectors in the matrix form G, where each generator vector is a column:

G = \begin{bmatrix} g_1 & g_2 & \cdots & g_n \end{bmatrix}

This way, the lattice points can be represented as shown in Equation (2.4).

λ = G · b (2.4)

where:

• b is a vector of integers.

For example, given the generator vectors g1 = [1 1.5]^T and g2 = [0 −1]^T, we generate the two-dimensional lattice shown in Figure 2.2. The lattice point λ1 can be obtained using the integer vector b1 = [1 2]^T:

\lambda_1 = \begin{bmatrix} 1 & 0 \\ 1.5 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ -0.5 \end{bmatrix}

Figure 2.2: Lattice point λ1 calculation
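A minimal Matlab sketch of Equation (2.4), reproducing the λ1 example above:

    % Lattice point computation, Equation (2.4): lambda = G * b.
    G = [1    0;                % generator vectors g1 and g2 as columns
         1.5 -1];
    b = [1; 2];                 % integer coefficient vector b1
    lambda1 = G * b             % returns [1; -0.5], as in Figure 2.2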

Similarly, we can immediately show that, if we take

b = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \text{and} \quad G = \begin{bmatrix} g_{1,1} & g_{2,1} \\ g_{1,2} & g_{2,2} \end{bmatrix},

then

\lambda = G \cdot b = \begin{bmatrix} g_{1,1} & g_{2,1} \\ g_{1,2} & g_{2,2} \end{bmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}

That is, the origin is always a lattice point. This characteristic will be especially relevant for the modulo lattice operation, which will be explained later.

The lattice Λ can also be translated by x ∈ Rn so, consequently, all lattice points λ are translated

by x. The translated version of Λ defined by the set x + Λ represents a coset. An example coset for a

two-dimensional lattice is shown in Figure 2.3.

Figure 2.3: Coset of the lattice Λ generated by x

For each lattice Λ, we define the nearest neighbor quantizer QΛ(x) in Equation (2.5).

QΛ(x) = λ ∈ Λ if ||x− λ|| ≤ ||x− λ′||, ∀ λ′ ∈ Λ (2.5)

The nearest neighbor quantizer QΛ(x) represents the lattice point that is the closest to the sequence

x. The distance between QΛ(x) and x is calculated using the modulo lattice operation.

2.2.1 Modulo lattice

The modulo lattice operation is used to compute the distance between the lattice point λ that is the

closest to a certain sequence x and the sequence x itself. This sequence is represented by the corre-

sponding coset x+Λ in Euclidean space. Finding the closest lattice point is accomplished by computing

the minimum squared Euclidean distance between x and any of the surrounding lattice points λi. Since

the lattice is infinite, we obviously can't compute the squared Euclidean distance between x and all lattice points, so we only handle the surrounding lattice points. However, a problem arises: how does the algorithm know which lattice points are surrounding the sequence x and which aren't? The solution for

this problem relies on the fact that lattices are periodic in nZn. Therefore, we can translate any sequence

x to the proximity of the lattice point centered at the origin by performing a modulo operation as shown

in Equation (2.6).

x' = x \bmod \mathbb{Z}^n \qquad (2.6)

Then, we can compute the Euclidean distance between the translated sequence x′ and the surrounding lattice points to find which one is the closest, as shown in Equation (2.7). After we perform this operation, the coset leader y is found, which is defined as the unique member of the coset lying in the fundamental Voronoi region.

y = x' \bmod \Lambda = (x \bmod \mathbb{Z}^n) \bmod \Lambda \qquad (2.7)
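A minimal Matlab sketch of the nearest neighbor quantizer of Equation (2.5) and the modulo lattice operation of Equation (2.7), using a brute-force search over nearby lattice points for the small two-dimensional lattice of the earlier example (an illustration of the definitions, not the thesis implementation):

    % Brute-force nearest neighbor quantizer Q_Lambda and mod-Lambda operation.
    G = [1 0; 1.5 -1];                  % lattice generator matrix
    x = [0.7; 0.4];                     % sequence to be reduced
    [B1, B2] = meshgrid(-5:5, -5:5);    % integer coefficients of nearby points
    pts = G * [B1(:)'; B2(:)'];         % candidate lattice points
    [~, idx] = min(sum((pts - x).^2));  % minimum squared distance, Eq. (2.3)
    Q = pts(:, idx);                    % Q_Lambda(x): closest lattice point
    y = x - Q;                          % x mod Lambda: coset leader in V0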

2.2.2 Voronoi region

The fundamental Voronoi region of Λ ⊂ Rn, denoted by V0, is defined as the set of minimum Euclidean norm representatives of the cosets of Λ. The Voronoi region V of a lattice point λ is the space which is closer to λ than to any other lattice point, as represented in Figure 2.4. Also, for any given Λ, all regions have the same volume, which corresponds to the absolute value of the determinant of the lattice generator matrix G, as shown in Equation (2.8).

Figure 2.4: Representation of the Fundamental Voronoi Region of λ1

V (V) = |det(G)| (2.8)


The fundamental Voronoi region of a lattice can be computed by performing the modulo lattice operation over Rn as shown in Equation (2.9).

V0 = Rn mod Λ (2.9)

It can also be computed by Equation (2.10) which corresponds to the set of all points that are closer

to the origin lattice point than to any other lattice point. That is, the sequences x whose nearest neighbor

QΛ(x) is the origin.

\mathcal{V}_0 \triangleq \{x \in \mathbb{R}^n : Q_\Lambda(x) = 0\} \qquad (2.10)

The fundamental Voronoi region is the key element in setting the power constraint on the transmitted signal. The objective of the modulo lattice operation is to map the information sequences into the fundamental Voronoi region V0, because this means that the transmitted sequences correspond to the minimum energy sequences.

2.2.3 Shaping gain

The lattice shaping allows the approximation of the signal into a uniform distribution inside the Voronoi

region. Consequently, the amount of information that can be transmitted is close to what could be

achieved if the source had a Gaussian distribution which achieves the capacity.

A useful way of measuring the gain from using the lattice method is the shaping gain gs(Λ):

g_s(\Lambda)\big|_{dB} = 10 \log_{10} \frac{G(\mathbb{Z}^n)}{G(\Lambda)} = 10 \log_{10} \frac{1}{12\,G(\Lambda)} \qquad (2.11)

where:

• G(Zn) is the normalized second moment of a hypercube of any dimension (no shaping);

• G(Λ) is the normalized second moment of the lattice (using V for shaping).

For a hypercube of any dimension, the normalized second moment is:

G(\mathbb{Z}^n) = \int_{-\frac{1}{2}}^{\frac{1}{2}} x^2 \, dx = \frac{1}{12}

It measures how much more power is needed when using an input uniformly distributed over a cube,

rather than a distribution uniform over the Voronoi region V, in order to obtain the same entropy. This

entropy directly depends on the volume of the Voronoi region because, since the distribution is uniform,

the entropy H(x) is given by:

H(x) = \int_{\mathcal{X}} p(x) \log\left(\frac{1}{p(x)}\right) dx = \log(\text{Volume})


The normalized second moment of the lattice G(Λ) relates the volume of the Voronoi region |V| and

the averaged second moment P (Λ) of the lattice as shown in Equation 2.12.

G(\Lambda) = \frac{P(\Lambda)}{|\mathcal{V}|^{2/n}} \qquad (2.12)

The averaged second moment P(Λ) of the lattice Λ is given by Equation (2.13):

P(\Lambda) = \frac{1}{n\,|\mathcal{V}|} \int_{\mathcal{V}} \|x\|^2 \, dx \qquad (2.13)

It’s easy to see that for any dimension n the region that has the smallest normalized second moment

is the n-sphere.

\lim_{n \to \infty} G(n\text{-sphere}) = \frac{1}{2\pi e} \qquad (2.14)

From Equations 2.11 and 2.14, we can infer that the ultimate shaping gain with respect to a cubic

region is:

g_s(\Lambda)\big|_{dB} \text{ (optimal shaping)} = 10 \log_{10} \frac{2\pi e}{12} \approx 1.53 \text{ dB} \qquad (2.15)

This gain translates to a gain in signal power when we use modulo-lattice coding for the DPC.
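A one-line Matlab check of Equation (2.15):

    % Ultimate shaping gain of the n-sphere over the hypercube, Equation (2.15).
    gs_dB = 10 * log10(2 * pi * exp(1) / 12)   % approximately 1.53 dB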

2.3 Viterbi algorithm

The Viterbi algorithm is a dynamic programming algorithm which provides an efficient way of finding the most likely state sequence of a finite-state discrete-time Markov process in memoryless noise. It offers a recursive optimal solution for estimating the maximum a-posteriori probability of this type of process. Based on the observed output of a certain process, it estimates the most likely state sequence during that process. The estimated state sequence is called the Viterbi path. Besides estimating the state sequence, it allows us to obtain the respective input bit sequence which caused the estimated state sequence.

The sequence of states can be considered as the core element of this algorithm upon which many

calculations and estimations will be performed. Another indispensable notion to understand is that time

is discrete. Therefore, the state xk corresponds to one of a finite number M of states m at time k which

ranges from time 0 to time K. That is:

• 1 ≤ m ≤M

• 0 ≤ k ≤ K

where x0 represents the initial state and xK represents the terminal state of our process.

The underlying key for estimating the state sequence involves taking advantage of a Markov property

which affirms that the future state of a process depends solely on the present state, not on the sequence

of events that preceded it. More specifically, the probability of state xk+1 depends solely on the state xk,


not on the previous states xk−1, xk−2, ..., x0, as described in Equation (2.16). The state transitions will

be dependent on the input of the Markov process.

p(x_{k+1} \,|\, x_0, x_1, \ldots, x_k) = p(x_{k+1} \,|\, x_k) \qquad (2.16)

In the context of our problem, we will be using the Viterbi algorithm for a slightly different purpose than the one it was originally intended for, but this will be detailed in the next chapter. For now, all we need to

know is that we will be using the Viterbi algorithm to find the lattice codeword that most closely matches

the received information sequence. This information sequence corresponds to the sequence of input

bits after several transformations that have been performed. Our lattice points correspond to codewords

generated from the convolutional code and we will map each information sequence to the closest lattice

point. This convolutional code can be modeled by a shift-register in which the shift-register bits represent

the state and the shift-register input represents the input bit uk.

The received sequence is compared to the generated codeword from the shift-register so that we

can estimate the state sequence and, consequently, the input bit sequence. In other words, the Viterbi

algorithm finds what state sequence causes the most similar codeword sequence to the one actually

received.

An explanation with a step-by-step description and evolution of the algorithm will be further detailed

in the next chapter.

2.4 Log-Likelihood

In the algorithms described in this system, the use of Soft-In Soft-Out (SISO) decoders is very important. This type of decoder allows the input and output of each operation to take values other than 0 or 1.

More specifically, all the bits will be represented by a log-likelihood value (L-value) which is calculated

using Equation (2.17).

L(u) = \ln \frac{P(u = 1)}{P(u = -1)} \qquad (2.17)

where:

• P(u = 1) is the probability of the bit u being equal to ’0’;

• P(u = −1) is the probability of the bit u being equal to ’1’.

In this notation, the sign of L(u) indicates the hard decision (bit ’0’ or ’1’) and the magnitude of L(u) indicates the reliability of this decision (how likely it is that the hard decision is correct).

Therefore, when P[u = 1] > P[u = −1], then L(u) > 0. Otherwise, if P[u = 1] < P[u = −1], then L(u) < 0. Similarly, when P[u = 1] >> P[u = −1], then L(u) >> 0. Otherwise, if P[u = 1] << P[u = −1], then L(u) << 0.
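A minimal Matlab sketch of Equation (2.17) and of the hard decision it encodes (the probability below is an arbitrary example):

    % L-value of a bit from its probabilities, Equation (2.17).
    p1 = 0.9;                    % P(u = 1), i.e. probability of bit '0'
    L  = log(p1 / (1 - p1));     % positive sign -> hard decision '0'
    reliability = abs(L);        % magnitude -> confidence in the decision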


Chapter 3

Implementation

3.1 Designed system

We modeled a system which allows the simulation of the three main components of a general transmission system: transmitter, channel and receiver. This way, for testing purposes, we can compare the original message at the transmitter with the decoded message at the receiver. Additionally, several specifications, such as the noise power, can be modified in order to test the limits of our system. The overview of the implemented system is depicted in Figure 3.1.

3.1.1 Transmitter

At the transmitter side, we start with the message we wish to send. The coding of the original message is performed by several elements which can be simplified into two main modules. The first module includes an error-correcting code which consists of a nonsystematic repeat-accumulate (RA) encoder. It's composed of an outer mixture of repetition codes of different rates (variable nodes), an edge interleaver, an inner mixture of single parity-check codes of different rates (check nodes) and a memory-one differential encoder (accumulator). These rates define the distribution of the repetition and check nodes, which is responsible for the code design.

After the repeat-accumulate code has been applied, the second module of the transmitter further transforms the signal by applying a trellis shaping code. This module consists of several signal transformation operations with the main goal of minimizing the energy of the signal before transmitting it across the channel. The signal transformation operations start by forwarding the signal to an upsampler and grouping the upsampled sequence in pairs. Each pair forms a 4-PAM symbol which will be mapped to a real value using the 4-PAM mapper. Before executing the Viterbi algorithm, we add the scaled state interference αS and the dither to the signal. Then, the Viterbi algorithm is used, which will provide us with the closest lattice codeword to the signal, translated to the fundamental Voronoi region. The transformation is performed by transmitting only the distance between each signal symbol and the closest lattice point, so we subtract Viterbi's output from Viterbi's input and perform the modulo lattice operation. This way, we guarantee that the transmitted signal is inside the fundamental Voronoi region and ready to be transmitted.

Figure 3.1: Full overview of the designed system

3.1.2 Channel

Across the channel, the signal will be affected by two sources of interference:

• The state interference S which was known at the transmitter and already pre-subtracted with a

scale factor α;

• An Additive White Gaussian Noise (AWGN) signal n.

3.1.3 Receiver

At the receiver side, after rescaling the signal by factor α and removing the dither u, we can’t apply the

Viterbi algorithm to demodulate the received signal. The Viterbi algorithm ”minimizes the probability of

word error for convolutional codes. The algorithm does not, however, necessarily minimize the probability

of symbol (or bit) error” [12]. The solution involves using the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm


which minimizes the symbol error probability. This decoder is used to estimate the bit probabilities by using the folded Euclidean metric, which will be described ahead. In Figure 3.1, the ie and ia notes represent the flow of information and whether it's extrinsic or a-priori, respectively. The BCJR algorithm receives the a-priori L-values and the received signal from the channel and provides the a-posteriori L-values. Then, the extrinsic information from the values is forwarded to the channel decoder, which will refine the estimations using the redundancy in the coded bit stream. At the CND, the ”box-plus” operation is performed before forwarding the values to the Interleaver (IL). The VND receives the values from the IL, computes new values and forwards them back to the IL. Again, the CND receives values from the IL, computes new values with the ”box-plus” operation and provides new a-priori L-values to the BCJR module. The decoder will iterate over these steps until the bit probabilities converge. These iterations are responsible for conducting the Belief Propagation algorithm, which consists of a sum-product message passing algorithm. That is, we are computing the marginal probabilities of the L-values at each module based on L-values originated from other modules. These L-values represent messages that are transmitted between the different modules of the algorithm, which increase the amount of information known at each module until it converges to a solution.

3.2 Repeat-Accumulator Code

3.2.1 Introduction

Repeat-accumulate (RA) codes are a class of error correcting codes. They are related to turbo codes in the sense that they are also decoded iteratively and allow performance close to capacity. Turbo codes are a class of high-performance forward error correcting codes that allow the code rate to be close to the theoretical maximum. Turbo codes are generated by convolutional coders.

RA codes are considered a competitive alternative to low-density parity-check (LDPC) codes. According to [13], irregular RA codes consist of an outer code that is a mixture of repetition codes, an inner code that consists of parity checks, and an accumulator. More specifically, they consist of a variable node decoder (VND), an interleaver (IL), a check node decoder (CND) and an accumulator (ACC).

The first proposed codes were check-regular, systematic or non-systematic. The difference between systematic and non-systematic codes is that in a systematic code, both the inner and outer decoder receive channel information. In a non-systematic RA code, only the inner decoder receives channel information, which makes this kind of code simpler to implement. The problem is that, without the outer decoder directly receiving the information bits, the decoder finds it hard to reach convergence of the bit probabilities. The solution to the problem is to use check bi-regular codes, in which we no longer need all the check nodes to have the same degree. The check nodes can have either degree 1 or dc for the decoder to converge.


3.2.2 Structure

Our encoder receives a stream of k information bits which enter the first part of our repeat-accumulate coder: the repetition codes. These codes are represented by variable nodes which have different degrees depending on the rate of the repetition codes. After the repetition codes, there's an interleaver, represented by an edge interleaver, which will forward the bits to the parity check codes. These codes are represented by check nodes which, similarly to the variable nodes, have different degrees depending on the rate of the single parity check code. Finally, after the single parity check codes, there's an accumulator represented by a chain of check nodes. Note that we adopted a terminology in which the degree of a node doesn't exactly match the number of edges connected to the node. The degree of a node corresponds to the number of edges connected from the node to the edge interleaver, as represented in Figure 3.2.

Figure 3.2: Example of the structure of the Repeat-Accumulator Encoder

At the decoder side of the RA code, more specifically the inner decoder side, the ”BCJR decoding process” is performed, which will be detailed later in Section 3.5. This algorithm receives as input the a-priori probabilities of the accumulator's input bits in the L-value format. Then, the a-posteriori L-values are fed back from the accumulator to the check nodes. The objective of this RA decoder is to use the redundancy in the bits, which was intentionally created with the variable nodes, to improve the accuracy of the L-values. An iteration of this decoding algorithm will be detailed in the next section.


3.2.3 Decoding algorithm

An iteration of this algorithm starts by subtracting the a-priori L-values from the a-posteriori L-values resulting from the BCJR decoding algorithm, to obtain the extrinsic L-values. In this operation, if the bit probability has already converged to a value, the corresponding a-priori L-value would be −∞ (if bit = ”0”) or +∞ (if bit = ”1”). The same would happen to the a-posteriori L-value. When subtracting them, a ”NaN” (Not a Number) would be assigned to the extrinsic L-value. When this situation occurs, to avoid the problem, the a-priori L-value is simply assigned to the extrinsic L-value. These extrinsic L-values are forwarded from the accumulator to the check nodes. The next step of this process corresponds to forwarding the output L-values from the check nodes to the edge interleaver.

CND output L-values

Each check node has dc + 1 associated L-values, where dc L-values come from the interleaver and 1 comes from the accumulator. These L-values are used as inputs to the CND and they will be used to calculate the output L-values of the CND using Equation (3.1):

L_{i,\text{out}} = \ln \frac{1 - \prod_{j \neq i} \frac{1 - e^{L_{j,\text{in}}}}{1 + e^{L_{j,\text{in}}}}}{1 + \prod_{j \neq i} \frac{1 - e^{L_{j,\text{in}}}}{1 + e^{L_{j,\text{in}}}}} \qquad (3.1)

which we will denote by a ”box-plus” operation, using the L-value notation of [14], obtaining Equation (3.2) instead:

L_{i,\text{out}} = \underset{j \neq i}{\boxplus} \, L_{j,\text{in}} \qquad (3.2)

Note that, in Equation (3.2), one has j ≠ i. This means that, for a CND of degree 3 (4 edges), when calculating the output L-value of the first edge of the CND, the L-values of the second, third and fourth edges should be used. When calculating the output L-value of the second edge of the CND, the L-values of the first, third and fourth edges should be used. This corresponds to the implementation of the Belief Propagation algorithm.

Additionally, in [14], after some equation manipulation and using the relation \tanh\left(\frac{u}{2}\right) = \frac{e^u - 1}{e^u + 1}, Equation (3.1) was simplified into Equation (3.3):

L_{i,\text{out}} = 2 \operatorname{arctanh}\left( \prod_{j=1}^{J} \tanh\left(\frac{L(u_j)}{2}\right) \right) \qquad (3.3)
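A minimal Matlab sketch of the check node update in the arctanh form of Equation (3.3), computing, for each edge i, the ”box-plus” combination of all other input L-values (a direct transcription of the formula, not the thesis code):

    % "Box-plus" combination of all input L-values except the i-th.
    Lin  = [1.2, -0.4, 2.1, 0.7];           % example input L-values of one CND
    Lout = zeros(size(Lin));
    for i = 1:numel(Lin)
        others  = Lin([1:i-1, i+1:end]);    % exclude edge i (j ~= i)
        Lout(i) = 2 * atanh(prod(tanh(others / 2)));
    end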

Now, the CND output L-values are forwarded to the interleaver, where they will undergo permutations, reversing them back to the original redundant bit sequence that was inputted into the interleaver at the encoder side, as shown in Figure 3.3. These L-values are forwarded to the VND and will be used to calculate the VND output L-values.


Figure 3.3: L-values permutations in the Interleaver

VND output L-values

Similarly to the calculation of the CND output L-values, the VND output L-values are obtained from the

VND input L-values. Each VND has dc + 1 edges: dc edges coming from the IL and 1 from the source.

However, the a-priori L-value of the source edge is always zero because we have no a-priori information

about the coded bits. Each VND output L-value is calculated using Equation (3.4).

L_{i,\text{out}} = \sum_{j \neq i}^{d_c} L_{j,\text{in}} \qquad (3.4)

This equation corresponds to adding the L-values of all dc + 1 input edges of the VND, except the one we are calculating the output L-value for. Since each L-value contains information about the hard decision and the reliability of that decision, by adding the input L-values, all this information combined will produce a more complete and reliable L-value.

The L-value of the channel for each input bit i can be easily calculated with Equation (3.5). Therefore,

for each VND, the channel L-value is the sum of the L-values coming from the IL. This channel L-value

provides a probability (in L-value format) of what the channel bit will be (0 or 1).

L_{i,\text{ch}} = \sum_{j=1}^{d_c} L_{j,\text{in}} \qquad (3.5)
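A minimal Matlab sketch of the variable node updates of Equations (3.4) and (3.5), using the usual total-minus-self shortcut (my phrasing; it is equivalent to summing over j ≠ i for each edge):

    % Variable node update: each output is the sum of all other inputs.
    Lin  = [0.5, -1.3, 2.0];     % example L-values arriving from the IL
    Lch  = sum(Lin);             % channel L-value of the bit, Eq. (3.5)
    Lout = Lch - Lin;            % per-edge outputs, Eq. (3.4)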

Due to the redundancy provided by the repetition codes, we can improve the accuracy of the channel L-value and, consequently, obtain a more reliable channel input bit estimation. At this point, we've computed an estimate of the input bit probability but, in order to further improve the accuracy of this value, we should perform more iterations of this decoding algorithm. Now, the following L-values correspond to the ”flow” of the L-values in the opposite direction: the IL-to-ACC direction. The VND output L-values are fed back to the IL, which will reverse the permutation and forward them to the CND. This reverse permutation at the IL is performed by matching the original indexes of the IL inputs to the respective destination indexes at the IL outputs. Then, the ”box-plus” operation will be performed again with the newly received


L-values from the IL. This second ”box-plus” operation calculates the ACC input given the CND inputs, which corresponds to a simple ”box-plus” operation on the L-values of the inputs. The result of this operation provides the a-priori L-values on which the BCJR algorithm will operate.
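A minimal Matlab sketch of the forward and reverse permutations at the IL described above (a generic random permutation, not the interleaver design used in the thesis):

    % Forward and reverse interleaving of L-values with index bookkeeping.
    N    = 8;
    perm = randperm(N);          % interleaver permutation (example)
    L    = randn(1, N);          % L-values entering the IL
    Lil  = L(perm);              % forward permutation (one direction)
    Lrec = zeros(1, N);
    Lrec(perm) = Lil;            % reverse permutation (opposite direction)
    isequal(Lrec, L)             % returns true: original order restored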

3.3 4-PAM Mapping, Dither and Interference

After the RA code has been applied to the input bits, we now have a signal composed of a sequence of coded bits at the output of the accumulator. However, before applying the Viterbi algorithm to find the corresponding codeword, we need to perform some transformations on the signal regarding the bits. We will denote these transformations as signal shaping, and they mainly consist of two important operations: mapping and dither addition. The main purpose of these operations is to transform the existing signal into a signal with some specific characteristics. In our case, we want to reduce the average transmission power.

Put simply, the modulo lattice operation performed on the signal causes the signal to lose half of its information. Therefore, we upsample it by a factor of 2, doubling the information of the signal before the mod-Λ block. This way, we guarantee that this operation is information preserving. The term sign-bit shaping is used because the mod-Λ operation changes the sign-bit values to obtain the shaping gain. The whole process will be explained first and an example will be shown later.

As previously said, before applying the Viterbi algorithm, we have a sequence of coded bits at the output of the accumulator, which we will denote the original information sequence. Sign-bit shaping will now be applied, where this sequence will be upsampled and then mapped to real scalar values. The upsampler generates 4-PAM symbols from the original information sequence by grouping every 3 bits into two pairs of bits. The 2nd bit of each group of 3 bits is repeated in both pairs and corresponds to the most significant bit in each pair. We will denote the resulting sequence as the upsampled information sequence; this upsampling process is described in Figure 3.4.

Figure 3.4: Upsampling of the original information sequence

Each pair of bits from the upsampled information sequence represents a 4-PAM symbol. Then, each

of these symbols will be mapped to a real scalar value using a mapper centered at the origin which

generates values in the set {−1.5,−0.5, 0.5, 1.5}. The mappings from the 4-PAM symbols to the real

values are described in Table 3.1.


Table 3.1: Mapping of the 4-PAM symbols to real values

Symbol   Mapped value
00       -1.5
01       -0.5
10        0.5
11        1.5
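A minimal Matlab sketch of the upsampling and 4-PAM mapping just described (the pairing of bits follows one plausible reading of Figure 3.4, with the sign bit as the most significant bit of each pair; variable names are mine):

    % Upsample groups of 3 bits into two 4-PAM symbols and map them (Table 3.1).
    bits = [0 1 1 1 0 1 0 0 1];            % original information sequence
    pam  = [];
    for k = 1:3:numel(bits)
        g   = bits(k:k+2);                 % [b1 b2 b3]; b2 is the sign bit
        msb = [g(2) g(2)];                 % sign bit repeated as MSB
        lsb = [g(1) g(3)];                 % remaining bits as LSBs
        pam = [pam, 2*msb + lsb - 1.5];    % 00 -> -1.5 ... 11 -> 1.5
    end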

From the resulting sequence of real values, denoted the mapped and upsampled information sequence, we will now subtract the scaled interference and the dither. This scaled interference corresponds to the arbitrary interference known at the transmitter which will be added in the channel. The magnitude of this interference depends on the power of the signal, so it will have a scale factor α, given by:

\alpha = \frac{P_X}{P_X + P_N} \qquad (3.6)

where:

• P_X = E[X^2] is the power of the signal X;

• P_N is the additive Gaussian noise power.

The choice of α is explained in [9].

For the designed system, the power of the signal PX can be calculated using the methods described

in [10]. Basically, we will use Equation (3.7) to compute the signal power PX from the shaping gain gs.

\[ g_s(\Lambda)|_{\mathrm{dB}} = P_{\mathrm{uni},(-2,2]}|_{\mathrm{dB}} - P_X|_{\mathrm{dB}} + 10\log_{10} R_{VQ} \tag{3.7} \]

where:

• P_{uni,(−2,2]} is the power of the uniformly distributed signal in (−2, 2]

• P_X is the power of the (truncated Gaussian-like) shaped signal at the output of the transmitter

• R_{VQ} is the rate of the vector quantizer

P_{uni,(−2,2]} corresponds to the power of a random variable uniformly distributed in the interval (−2, 2] and it can be computed as shown in Equation (3.8). It's distributed in the interval (−2, 2] because the 4-PAM mapping to the values {−1.5, −0.5, 0.5, 1.5} is being performed.

\[ P_{\mathrm{uni},(-2,2]} = \int_{-2}^{2} \frac{x^2}{2-(-2)}\,dx = \frac{4}{3} \simeq 1.249\ \mathrm{dB} \tag{3.8} \]

R_{VQ} = 1/2, as it represents the vector quantizer code rate, that is, the code rate of the code used to generate the lattice.

In Section 2, we inferred the ultimate shaping gain g_s(Λ)|_dB ≈ 1.53 dB. As specified in [10], for the lattice code we used (polynomials 05_8 and 07_8) and for a rate R_{VQ} = 1/2, the measured shaping gain is:

\[ g_s(\Lambda)|_{\mathrm{dB}} \simeq 0.98\ \mathrm{dB} \]


Therefore, we compute the signal power PX as:

\[ P_X|_{\mathrm{dB}} = P_{\mathrm{uni},(-2,2]}|_{\mathrm{dB}} + 10\log_{10} R_{VQ} - g_s(\Lambda)|_{\mathrm{dB}} = 1.249\ \mathrm{dB} + 10\log_{10}\tfrac{1}{2} - 0.98\ \mathrm{dB} \simeq -2.74\ \mathrm{dB} \simeq 0.532 \]

We can now compute α and subtract the scaled interference from the signal. The next step corresponds to applying the dither signal.

The dither consists of a random variable that is intentionally subtracted from the signal. Its main characteristic is its uniform distribution centered at the origin, and it's used to shape the signal with one main objective: to make the signal uniformly distributed in the Voronoi region.

We want to transform the signal into a uniformly distributed signal before sending it across the channel because the mutual information of the channel is maximized for a uniform input. A deeper explanation of this is given in [10], but what we need to retain is that, this way, instead of sending the input signal directly, we send the sum of the signal with the dither. In other words, this allows us to describe the transmitted signal (which will be sent across the channel) as a uniform signal that is independent of the input

signal. This is a key characteristic of the signal which will allow the methods we implemented to operate

correctly.

We will denote the resulting sequence of real values after subtracting off the scaled interference and

the dither by dithered information sequence with scaled interference. One more operation will need to

be performed to transform the signal. The lattice points are represented by 2(C + 2Z^N) and not (C + 2Z^N) because we are shaping with the sign-bit, which has a weight of 2. Since the 4-PAM operation generates values in {−1.5, −0.5, 0.5, 1.5}, after they are replicated by adding multiples of 2, an infinite sequence of values spaced by 1 is obtained: {..., −3.5, −2.5, −1.5, −0.5, 0.5, 1.5, 2.5, 3.5, ...}. Therefore, Viterbi's fundamental Voronoi region is [−1, 1]. The signal values need to be reduced by half before executing

the algorithm. The Viterbi algorithm can now be applied to obtain the lattice codewords corresponding to

the dithered information sequence with scaled interference. To better understand the explanation above,

an example follows.

3.3.1 Example

At the accumulator output, we have the following original information sequence:

original information sequence = [0 1 1 1 0 1 0 0 1]

This original information sequence can be grouped into sequences of 3 bits where the 2nd one is the

sign bit (in boldface):

original information sequence = [0 1 1 1 0 1 0 0 1]


After the upsample on the sign-bit, we obtain the upsampled information sequence:

upsampled information sequence = [1 0 1 1 0 1 0 1 0 0 0 1]

The 4-PAM mapping, according to Table 3.1 will now be applied on the bit pairs to obtain the mapped

and upsampled information sequence:

mapped and upsampled information sequence = [0.5 1.5 −0.5 −0.5 −1.5 −0.5]

The dither and scaled interference will now be subtracted from the signal. To obtain the interference scaling factor for a noise coefficient σ = 0.6, we first need to compute α using Equation (3.6), where P_X = 0.532.

\[ \alpha = \frac{P_X}{P_X + P_N} = \frac{P_X}{P_X + \sigma^2} = \frac{0.532}{0.532 + 0.6^2} = 0.5964 \]
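These quantities can be reproduced with a few lines of MATLAB; the sketch below simply restates the computation under the assumptions already made (R_VQ = 1/2 and the measured g_s ≈ 0.98 dB from [10]):

% Sketch: signal power PX from the shaping gain (Eqs. 3.7-3.8) and the
% scale factor alpha (Eq. 3.6) for the noise coefficient of this example.
Puni_dB = 10*log10(4/3);                    % 1.249 dB, Eq. (3.8)
RVQ = 1/2;  gs_dB = 0.98;                   % measured shaping gain from [10]
PX_dB = Puni_dB + 10*log10(RVQ) - gs_dB;    % about -2.74 dB
PX = 10^(PX_dB/10);                         % about 0.532
sigma = 0.6;                                % noise standard deviation
alpha = PX/(PX + sigma^2)                   % about 0.5964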

Therefore, applying this scale factor to an interference signal with amplitude 10 results in the scaled interference:

scaled interference = [−3 0.2 −2.6 −17.5 −2.9 −8.3]

Similarly, the dither is obtained using a uniformly distributed signal in the interval [−2, 2]:

dither = [1.5 0.3 0.2 −1.4 1.4 0.5]

Now, we will subtract the dither and the scaled interference from the mapped and upsampled information sequence to obtain the dithered information sequence with scaled interference:

mapped and upsampled information sequence = [0.5 1.5 −0.5 −0.5 −1.5 −0.5]

dithered information sequence with scaled interference = [4.84 8.12 2.46 12.83 −8.63 −4.1]

This resulting sequence will now be divided by 2 to adapt the lattice codewords 2(C + 2Z^N) to Viterbi's lattice codewords (C + 2Z^N).

viterbi input sequence = [2.42 4.06 1.23 6.415 −4.315 −2.05]

3.4 Viterbi Algorithm

3.4.1 Introduction

The Viterbi element in our system is placed after the dither and scaled interference addition and receives

the information sequences. It serves a goal different from the original objective of the Viterbi decoder. Instead of finding the input bits, it is used to generate the closest codeword. It will generate codeword sequences based on the generating polynomials to build a convolutional code. These codeword

sequences represent the lattice points.

\[ b_i(D) = g_i(D) * u_{VQ}(D) \tag{3.9} \]

where:

• g_i(D) is a generating polynomial;

• u_VQ(D) ranges over all binary input sequences.

Since b_i represents a bit, either 0 or 1, the fundamental Voronoi region of the lattice is the interval [−1, 1]. Next, we combine every pair of output codewords into a coded sequence C by interlacing them.

In our scenario, the distance between the codewords corresponds to the modulo Euclidean distance between the two sequences. This minimization process is represented by Equation (3.10), where x_k represents an information sequence and b_k represents a codeword from the lattice.

\[ \min \left\{ \sum_{k=1}^{n} \big((x_k - b_k) \bmod 2\big)^2 \right\} \tag{3.10} \]
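One way to realize this folded metric in MATLAB is sketched below (our own helper, assuming the differences are folded symmetrically into [−1, 1), the fundamental Voronoi interval):

% Folded (modulo-2) squared Euclidean distance of Eq. (3.10) between an
% information sequence x and a candidate lattice codeword b (equal-length rows).
function d = folded_dist(x, b)
    e = mod(x - b + 1, 2) - 1;   % fold each difference into [-1, 1)
    d = sum(e.^2);               % accumulate the squared folded differences
end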

As introduced in the previous chapter, this algorithm estimates the most likely sequence of input bits, based on some observed output, by computing all possible outputs for each state and input bit combination. Yet, in our system, the observed output sequence corresponds to a sequence of samples, grouped in pairs. These pairs don't represent a codeword; they are just arbitrary information sequences output after the dither addition. We will denote these information sequences as observed

output. To estimate the most likely state sequence we will be comparing each pair of samples from

the observed output with the pair of samples generated from the convolutional code. That is, we will be

finding the lattice point, represented by a codeword, that is the closest to the observed output. These

lattice points were generated from a specific convolutional code which uses the polynomials 05_8 and 07_8 to create the codewords. These polynomials were chosen by Uri Erez, as mentioned in [10]. Most

likely, there will be information sequences from the observed output that don’t match any of the possible

codewords generated. The purpose of the Viterbi algorithm is to find the codeword that is the closest to

the observed output.

At the end of the Viterbi algorithm, we have a sequence of codewords belonging to the lattice that

closely match the sequence of observed outputs. The estimated virtual input bits that produced such

state sequence can be discarded.

3.4.2 Mapping to the Voronoi region

After the computations to obtain the sequence of codewords have been completed, a question may be

asked: why did we convert the sequence of observed outputs into a lattice codeword? The answer is complex, and it relies on the characteristics of lattice quantization and the Voronoi region properties.


To answer this question, we go back to the beginning. The core property of this dirty-paper-coding technique, regarding the minimization of the interference, rests on the transmission of the signal with limited power. That is, without requiring additional power to cancel the interference. As described in the previous chapter, regarding lattice quantization, we need the signal to be inside the Voronoi region.

This region is the region inside each cell of the lattice grid. And, if we recall the introduction about the

lattices, each point of the grid corresponds to a certain codeword from a convolutional code. Therefore,

returning back to the Viterbi algorithm, if we subtract off the codeword from the observed output by

performing the modulo lattice operation, we guarantee that the result is inside the Voronoi region. Then, we are ready to transmit the power-constrained coded signal across the channel.

3.4.3 Implementation code

As mentioned before, the lattice codewords can be modeled by a shift-register from which we can de-

scribe the state evolution of the process using a trellis diagram. This trellis displays all possible state

transitions for each time instant. It also describes what conditions are necessary for each state transition

and what output codeword is generated in that transition. The convolutional code uses the polynomials 05_8 and 07_8. Therefore, to match the number of bits of the polynomials (3 bits), the shift-register

was designed with 2 state bits which, along with the input bit, equal a total of 3 bits. One important

property of this trellis is that it starts at the all-zero state and it must converge to a terminal all-zero

state. This trellis method was implemented using a set of matrices representing the output codeword as

a function of the state and the input of the shift register.

For each instant and for each state sk, we test all possible previous state transitions that led to that

state. Each state transition consists of a previous state sk−1 and a previous input bit uk−1. Because the

states are being modeled by a shift-register, the most significant bit of the current state s_k corresponds to the previous input bit u_{k−1}. On the other hand, the remaining bits of the current state s_k correspond to the most significant bits of the previous state s_{k−1}. In our case, since our state consists of only 2 bits and there's only 1 input bit at each instant, we have 2 possible previous states s_{k−1}, which differ in the least significant bit.

An illustration of the state transitions described above is shown in Figure 3.5 and an example with

arbitrary values is shown in Figure 3.6.

Figure 3.5: State transition in a shift-register

Figure 3.6: Example of a state transition in a shift-register and possible s_{k−1} values

In the example given in Figure 3.6, we show how we compute the state transitions for a given state s_k = [1 0]. The previous input bit u_{k−1} is immediately obtained from the most significant bit of s_k, which is ”1”. To obtain the previous state s_{k−1}, we simply perform a left arithmetic shift of the current state s_k

where the least significant bit of s_{k−1} is unknown. Then, for each possible value to fill in the unknown bit of s_{k−1}, we calculate the resulting codeword using the convolutional code, as described next.
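Before moving on, the previous-state enumeration just described can be sketched in MATLAB as follows (illustrative variable names, not taken from the actual implementation):

% For a 2-bit state sk = [s1 s0], recover the previous input bit and the two
% candidate previous states of the shift-register (Figures 3.5 and 3.6).
sk = [1 0];
u_prev = sk(1);            % previous input bit = MSB of the current state
sk_prev = [sk(2) 0;        % left-shift sk; the unknown LSB of the previous
           sk(2) 1];       % state takes both possible values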

3.4.4 Calculation of the lattice codewords

For creating our lattice codewords, we used a convolutional code generated by the polynomials (g_1, g_2) = (05_8, 07_8) of memory 2.

\[ g_1 = 05_8 = [g_{1,2}\ g_{1,1}\ g_{1,0}] = [1\ 0\ 1] \tag{3.11a} \]

\[ g_2 = 07_8 = [g_{2,2}\ g_{2,1}\ g_{2,0}] = [1\ 1\ 1] \tag{3.11b} \]

These generator polynomials were used to calculate the output of a shift register whose input is the virtual input bit u, implementing a convolution operation. At a given instant k, the state of the shift-register s_k consists of the input bits from the two previous instants, as shown in Equation (3.12).

\[ s_k = [u(k-1)\ \ u(k-2)] \tag{3.12} \]

We will denote each lattice codeword sequence by (b_0, b_1), which is composed of the output of each generator polynomial as described in Equations (3.13a) and (3.13b).

\[ b_0 = g_1 * u(k) = g_{1,2} \cdot u(k) \oplus g_{1,1} \cdot u(k-1) \oplus g_{1,0} \cdot u(k-2) \tag{3.13a} \]

\[ b_1 = g_2 * u(k) = g_{2,2} \cdot u(k) \oplus g_{2,1} \cdot u(k-1) \oplus g_{2,0} \cdot u(k-2) \tag{3.13b} \]

This calculation is represented in Figure 3.7.
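A small MATLAB sketch of this computation follows, under the assumption that u holds the current input bit u(k) and s the state [u(k−1) u(k−2)] (example values shown):

% Output pair (b0, b1) of Eqs. (3.13a)-(3.13b) for one time instant.
g1 = [1 0 1];  g2 = [1 1 1];     % generator polynomials 05 and 07 (octal)
u = 1;  s = [1 0];               % example input bit and state [u(k-1) u(k-2)]
reg = [u s];                     % shift-register contents [u(k) u(k-1) u(k-2)]
b0 = mod(sum(g1 .* reg), 2);     % modulo-2 convolution with g1
b1 = mod(sum(g2 .* reg), 2);     % modulo-2 convolution with g2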

Then, this calculated codeword is compared to the information sequence of the observed output.

The result of this comparison represents the metric we use to estimate which of the possible previous states is the most likely one. This metric corresponds to the total number of errors propagated along the

respective path. That is, for a certain information sequence, the metric includes:

• Number of differences between the current information sequence of the observed output and the

closest lattice codeword


Figure 3.7: Structure of the shift-register and the generator polynomials to obtain the lattice codewords

• Number of propagated errors since the start until the previous information sequence for the possi-

ble previous state

The comparison between the information sequence and the lattice codeword is done as shown in Equation (3.14).

\[ \sum_{i=1}^{N} \big((xx_i - b_i) \bmod 2\big)^2 \tag{3.14} \]

where:

• xx = [xx_1 xx_0] represents an information sequence

• b = [b_1 b_0] represents a pair of bits of the lattice codeword from Figure 3.7

The operations described above are executed for all instants of the algorithm. The corresponding

trellis is represented in Figure 3.8.

Figure 3.8: Representation of a trellis diagram for the Viterbi algorithm

Figure 3.8 shows the trellis diagram used to describe the Viterbi algorithm where both the initial

and the terminal phases of the algorithm are represented. In this trellis, we represent the transitions

between each state, the input bit necessary to cause that transition and the output generated from that

transition. More specifically, in each state transition, a red branch represents a state transition caused by

an input bit equal to ’0’ and a green branch represents a state transition caused by an input bit equal to


’1’. Over each branch, the sequence of numbers represents the two bits of the codeword generated by the generator polynomials in that state transition. Observing the diagram, we can notice that the initial and terminal states

are computed differently.

Regarding the initial states, as mentioned previously, the trellis starts at the all-zero state. At instant

k = 0:

• The only possible state is s_k = [0 0]

Therefore, at the second instant k = 1 of the algorithm:

• The only possible states are s_k = [0 0] or s_k = [1 0]

Consequently, at the third instant k = 2,

• Although all four states are possible, the only possible previous states are s_{k−1} = [0 0] or s_{k−1} = [1 0]

Regarding the terminal states, the convergence of the states is ensured by setting the input bit to ’0’. At instant k = N − 3, where N represents the total number of information sequences:

• Although all four states are possible, the only possible next states are s_{N−2} = [0 0] or s_{N−2} = [0 1]

To continue converging the process to the all-zero state, at instant k = N − 2:

• The only possible next state is s_{N−1} = [0 0]

At this point, it can be guaranteed that the terminal state is the all-zero state. However, one additional

iteration with the input bit set to ’0’ needs to be assumed so that it matches the structure of the decoder

that we implemented. More specifically, so that the BCJR algorithm terminal state, which has 3 state

bits, is the all-zero state but this will be detailed in Section 3.5. This condition is necessary for the BCJR

algorithm because, in the Markov model used for its inputs, the final state is zero. Although the model

used for BCJR is equivalent to the actual coder, it has an extra state bit. In order for this extra state bit

to be zero, it’s required that the last three virtual bits are ’0’, and not just two as is required for the shift

register used in the actual modulator. So, at instant k = N − 1:

• The only possible next state is s_N = [0 0]

After all iterations of the algorithm have been computed, we now can easily obtain the Viterbi path by

following the sequence of most likely previous states, starting at the terminal all-zero state. Along with

the state sequence, the associated codeword sequence is obtained.

As previously mentioned, the true advantage of this system relies on the transmission of a signal

which is inside the Voronoi region. So, if we subtract the output codeword of the Viterbi algorithm from the resulting information sequence after the dither addition, then the result will be inside the Voronoi region.

So, now we can transmit the resulting signal across the channel and guarantee that it satisfies the power

constraint on the signal power.


3.5 Bahl-Cocke-Jelinek-Raviv Algorithm

3.5.1 Introduction

The Bahl-Cocke-Jelinek-Raviv (BCJR) decoder allows the estimation of the APP of the states and transitions of a Markov source observed through a discrete memoryless channel (DMC). Unlike the Viterbi algorithm, which minimizes the word error probability, BCJR minimizes the symbol/bit error probability. That's the reason for the designation ”bit-wise quantization detector”. The relevance of this

algorithm to our problem is that estimating the a-posteriori symbol probability is part of the solution.

Since our system can be described in a state space, the output of the system is a Markov process, so we will represent it with a state diagram as in Figure 3.9. Yet, our diagram will be much more complex because there will be more than 4 possible states for our decoder.

Figure 3.9: State diagram with 3 states

The number of possible states M is defined by the number of state bits, and the states are indexed by the integer m = 0, 1, ..., M−1. The state of the coder at time t is denoted by S_t, and the transitions between states are described by the transition probability p_t(m|m′) = Pr{S_t = m | S_{t−1} = m′}. For describing the probabilities related to the BCJR states and output, the notation Pr{X} will be adopted instead of p(x). The output of the coder is denoted by X_t and it's described by the probability q_t(X|m′, m) = Pr{X_t = X | S_{t−1} = m′; S_t = m}. This output X_t is transmitted through a noisy DMC, resulting in the output Y_t. The transition probability of the DMC between X_t and Y_t is denoted by R(Y_t, X_t). Since the noise

of the DMC is AWGN, the transition probabilities will be given by the Normal distribution as shown in

Equation (3.15). Our decoder was specifically programmed to start at the initial state S_0 = 0 and end at the terminal state S_τ = 0, where τ represents the last time instant. This was done because these

2 conditions are assumed by the BCJR algorithm. How these 2 conditions were programmed will be

described more extensively later.

\[ R(Y_t, X) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(Y_t - X)^2}{2\sigma^2}} \tag{3.15} \]


3.5.2 Behaviour

The algorithm estimates the APP of the states and transitions by comparing the outputs Y_t with the output produced by each combination of state bits for all possible states in the trellis. Let Y_{t1}^{t2} be the set formed by the values of Y_t from t = t1 to t = t2, and consider a set of measured values from t = 1 to t = τ. For each node there is an associated APP Pr{S_t = m | Y_1^τ}, and for each branch there is the APP Pr{S_{t−1} = m′; S_t = m | Y_1^τ}. To calculate these values, expressions from [12] were used. These

expressions use some auxiliary probabilities which allow us to calculate the desired probabilities. The

calculations defined in the paper will now be explained briefly.

Firstly, to obtain the APP Pr{S_{t−1} = m′; S_t = m | Y_1^τ}, the corresponding joint probability is defined as:

\[ \sigma_t(m', m) = \Pr\{S_{t-1} = m';\ S_t = m;\ Y_1^{\tau}\} \tag{3.16} \]

Then, the following auxiliary probabilities can be defined as:

\[ \alpha_t(m) = \Pr\{S_t = m;\ Y_1^{t}\} \tag{3.17a} \]

\[ \beta_t(m) = \Pr\{Y_{t+1}^{\tau} \,|\, S_t = m\} \tag{3.17b} \]

\[ \gamma_t(m', m) = \Pr\{S_t = m;\ Y_t \,|\, S_{t-1} = m'\} \tag{3.17c} \]

Due to the initial and terminal states of our decoder, we have the following conditions:

• α_0(0) = 1 and α_0(m) = 0 for m ≠ 0;

• β_τ(0) = 1 and β_τ(m) = 0 for m ≠ 0.

Taking advantage of Markov properties, it was shown that σ_t(m′, m) can be computed from α, β and γ with:

\[ \sigma_t(m', m) = \alpha_{t-1}(m') \cdot \gamma_t(m', m) \cdot \beta_t(m) \tag{3.18} \]

Also, simplified expressions for the probabilities (3.17a), (3.17b) and (3.17c) were obtained:

\[ \alpha_t(m) = \sum_{m'=0}^{M-1} \alpha_{t-1}(m') \cdot \gamma_t(m', m) \tag{3.19a} \]

\[ \beta_t(m) = \sum_{m'=0}^{M-1} \beta_{t+1}(m') \cdot \gamma_{t+1}(m, m') \tag{3.19b} \]

\[ \gamma_t(m', m) = \sum_{X} p_t(m|m') \cdot q_t(X|m', m) \cdot R(Y_t, X) \tag{3.19c} \]
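The recursions (3.19a) and (3.19b) map directly onto two loops. The MATLAB sketch below assumes the branch metrics have been precomputed into a T×M×M array gam(t, m′, m) (here filled with placeholder values so the fragment runs), and normalizes at each step, a common numerical-stability measure that is not part of the equations themselves:

% Forward/backward recursions of Eqs. (3.19a)-(3.19b) with S0 = S_tau = 0.
M = 16;  T = 12;                       % e.g. 4 state bits -> M = 16 states
gam = rand(T, M, M);                   % placeholder branch metrics
alf = zeros(T+1, M);  alf(1, 1) = 1;   % alpha_0: all mass on state 0
bet = zeros(T+1, M);  bet(T+1, 1) = 1; % beta_tau: all mass on state 0
for t = 1:T                            % forward pass, Eq. (3.19a)
    for m = 1:M
        alf(t+1, m) = alf(t, :) * reshape(gam(t, :, m), [], 1);
    end
    alf(t+1, :) = alf(t+1, :) / sum(alf(t+1, :));   % normalize for stability
end
for t = T:-1:1                         % backward pass, Eq. (3.19b)
    for m = 1:M
        bet(t, m) = reshape(gam(t, m, :), 1, []) * bet(t+1, :).';
    end
    bet(t, :) = bet(t, :) / sum(bet(t, :));
end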

Figure 3.10 shows a possible implementation of the encoder if the virtual bits u_VQ were known. This is used to determine the state space model. In other words, it describes the state space structure used for soft demapping of the input bits. This encoder model used by the decoder will be denoted


Figure 3.10: Illustration of the trellis processing at the receiver side

by EM. This structure is similar to the one shown in [10], Figure 10, with one important difference.

Instead of upsampling the first coded bit from the accumulator [cACC,1], the second coded bit [cACC,2]

is upsampled. This happens because, at the transmitter side, the 2nd bit of each group of 3 bits was

upsampled so, at the receiver side, the same operation must be performed. As previously introduced,

the decoder has 4 state bits: one coded bit from the accumulator cACC and 3 state bits from the vector

quantizer [uV Q,1 uV Q,2 uV Q,3]. These 3 state bits are used to create 2 bits of the codeword using the

same generator polynomials as the ones used at the transmitter side (058, 078). By examining Figure

3.10, we can see that the computation of the output symbols m_1 and m_2 involves all 3 coded bits from the accumulator [c_ACC,1 c_ACC,2 c_ACC,3], while the state bits only contain 1 coded bit (c_ACC). This may seem like a problem, but it is solved by taking advantage of the fact that the BCJR algorithm allows the output to depend on the previous state.

3.5.3 Operations

The accumulator consists of a memory-one differential encoder with a stream of bits (1 input bit u_ACC at each iteration) as input. The BCJR determines the a-posteriori probabilities of the input bits of the accumulator. These a-posteriori probabilities are obtained using the a-priori L-values at the accumulator.

The a-priori L-values are defined by Equation (3.20).

\[ L_{\text{a-priori}}(u_{ACC,k}) = \ln \frac{p(u_{ACC,k} = 1)}{p(u_{ACC,k} = -1)} \tag{3.20} \]

where:

• p(u_{ACC,k} = 1) is the probability that the input bit u_ACC equals ’0’ at time instant k;

• p(u_{ACC,k} = −1) is the probability that the input bit u_ACC equals ’1’ at time instant k.

Therefore, we can obtain the Equations (3.21a) and (3.21b) for the a-priori probabilities of the bits by


manipulating Equation (3.20).

\[ p(u_{ACC,k} = 1) = \frac{1}{1 + e^{-L_{\text{a-priori}}(u_{ACC,k})}} \tag{3.21a} \]

\[ p(u_{ACC,k} = -1) = \frac{1}{1 + e^{L_{\text{a-priori}}(u_{ACC,k})}} \tag{3.21b} \]
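In MATLAB this conversion is a one-liner in each direction; a sketch with example L-values:

% Bit probabilities of Eqs. (3.21a)-(3.21b) from a-priori L-values.
L = [0.8 -1.2 0];               % example a-priori L-values
p_plus  = 1 ./ (1 + exp(-L));   % p(u = +1), i.e. bit '0'
p_minus = 1 ./ (1 + exp( L));   % p(u = -1), i.e. bit '1'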

The output and new state of the accumulator are the result of an XOR operation between the previous input bit and the previous state of the accumulator. In the implementation of Figure 3.10, the vector quantizer is formed by a 3-bit shift register that receives a bit u_VQ once every 3 input bits. This input bit is denoted ”virtual bit” because its values are undetermined; that is, they make the VQ codeword range

over all possible values and do not correspond to input or output bits. The vector quantizer codeword

(c_{VQ,1}, c_{VQ,2}) is obtained from the convolution of the virtual bits u_VQ with the polynomials (g_1(D), g_2(D)):

\[ c_{VQ,1} = g_1(D) * u_{VQ} \tag{3.22a} \]

\[ c_{VQ,2} = g_2(D) * u_{VQ} \tag{3.22b} \]

The decoder that was designed performs its operations in a cycle of 3; that is, for each one of the 3 input bits, it behaves differently. The output is calculated by performing a 4-PAM mapping of the bits (b_{1,1}, b_{1,0}) for m_1 and (b_{2,1}, b_{2,0}) for m_2. These bits can be obtained from the following expressions:

\[ b_{1,1} = c_{ACC,2} \oplus c_{VQ,1} \tag{3.23a} \]

\[ b_{1,0} = c_{ACC,1} \tag{3.23b} \]

\[ b_{2,1} = c_{ACC,2} \oplus c_{VQ,2} \tag{3.23c} \]

\[ b_{2,0} = c_{ACC,3} \tag{3.23d} \]

In the 4-PAM mapping, these bits are used to obtain m_1 and m_2 as represented in Equations (3.24a) and (3.24b), which corresponds to a conversion from binary to decimal with an additional subtraction of 1.5.

\[ m_1 = 2^1 \cdot b_{1,1} + 2^0 \cdot b_{1,0} - 1.5 \tag{3.24a} \]

\[ m_2 = 2^1 \cdot b_{2,1} + 2^0 \cdot b_{2,0} - 1.5 \tag{3.24b} \]

Regarding the comparisons between the calculated output (m_1, m_2) and the received output Y, the received output was grouped in pairs, where the first member of each pair is compared to m_1 and the second one to m_2. Namely, Y_t for an odd t is compared with m_1 and Y_t for an even t is compared with m_2. This comparison is done using the folded Euclidean metric: instead of just calculating the squared Euclidean distance between Y_t and m_i, the operation mod(Y_t − m_i, 2)^2 is performed.

The BCJR algorithm works by testing all state transitions for every state. Due to the dimension


of our decoder, we decided to test only the possible state transitions which significantly reduced the

simulation time while keeping the results unaltered. The probabilities of the other state transitions are

simply zero. A possible state transition from state Sa to state Sb corresponds to a state transition in which

a combination of the input accumulator bit u_ACC and the input VQ bit u_VQ causes state S_a to transition to state S_b. That is, given the state S_a = [A B C D], the input bits u_ACC = X and u_VQ = Y cause a state transition to state S_b = [A⊕X  Y  B  C], as exemplified in Figure 3.11. Some important events

of the algorithm's evolution are performed in a cycle of 3 input bits. For t = 1, 4, 7, ..., no output is generated. For t = 2, 5, 8, ..., the virtual input bit u_VQ enters the shift register. These operations are described in more detail ahead.

Figure 3.11: Example of possible and impossible state transitions

For t = 1, only the accumulator receives an input bit, which results in the coded bit c_{ACC,1}. The vector quantizer does not receive any input bit, so its state remains the same. At the end of this loop iteration, using (3.23b), we only have b_{1,0}, which corresponds to the least significant bit of the output m_1. No output is produced and γ(m′, m) is computed only with the a-priori probability of the input bit, as shown in Equation (3.25), where u represents the accumulator input bit u_ACC that caused the transition from state m′ to m.

\[ \gamma(m', m) = P[u_{ACC,k} = u] \tag{3.25} \]

For t = 2, both the accumulator and the vector quantizer receive input bits. The accumulator outputs the coded bit c_{ACC,2}, which will be upsampled. The vector quantizer state is shifted as the input virtual bit is shifted into the first slot of the shift-register. Then, the VQ codeword is calculated according to the VQ state and the polynomials (05_8, 07_8). The upsampled bit c_{ACC,2} is used in two XOR operations. In the first XOR operation, c_{ACC,2} is added to the VQ codeword bit c_{VQ,1} using Equation (3.23a), resulting in the sign bit b_{1,1}. In the second XOR operation, c_{ACC,2} is added to the other VQ codeword bit c_{VQ,2} using Equation (3.23c), resulting in the sign bit b_{2,1}. Since the BCJR algorithm allows considering that the output depends on both the current and the previous state, the value of c_{ACC,1} can still be retrieved and, consequently, the bit b_{1,0} can still be calculated. At the end of this loop iteration, we have both bits


(b_{1,1}, b_{1,0}) to compute m_1, as well as b_{2,1}. So, γ(m′, m) is computed using the a-priori probability of the input bit u that caused the transition from state m′ to m, and also by comparing the calculated output m_1 with the received output y from the channel, assigning a probability to the respective state transition depending on whether or not the calculated output m_1 matches the received output. This probability is based on the noise that affected the signal in the channel, so it's calculated assuming a Normal distribution. Therefore, γ(m′, m) is calculated using Equation (3.26).

\[ \gamma(m', m) = P[u_{ACC,k} = u] \cdot \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{\mathrm{mod}(y - m_1,\ 2)^2}{2\sigma^2}} \tag{3.26} \]

where σ^2 represents the effective noise power. As described in [10], this effective noise power at the receiver is calculated using Equation (3.27).

\[ \sigma^2 = (1 - \alpha)^2 P_X + \alpha^2 P_N \tag{3.27} \]

where PX and PN represent the signal and the noise power which are computed as described in a

previous section.
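Putting Equations (3.26) and (3.27) together, one branch metric can be sketched in MATLAB as follows (example values; the symmetric folding into [−1, 1) is one way to realize the mod(·, 2) of Equation (3.26)):

% Branch metric of Eq. (3.26) with the effective noise power of Eq. (3.27).
PX = 0.532;  PN = 0.36;  alpha = 0.5964;  % values from the running example
Pu = 0.5;                                 % a-priori probability of the input bit
y = 0.3;  m1 = -0.5;                      % received sample and candidate output
sigma2 = (1 - alpha)^2*PX + alpha^2*PN;   % effective noise power, Eq. (3.27)
e = mod(y - m1 + 1, 2) - 1;               % difference folded into [-1, 1)
gam = Pu * exp(-e^2/(2*sigma2)) / sqrt(2*pi*sigma2);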

For t = 3, similarly to the first loop iteration, only the accumulator receives an input bit, which results in the coded bit c_{ACC,3}, from which we obtain b_{2,0} using (3.23d). With no virtual input bit, the vector quantizer remains in the same state and the vector quantizer codeword (c_{VQ,1}, c_{VQ,2}) is the same. Again, since BCJR allows using the previous state, the bit c_{ACC,2} can still be retrieved, so the bit b_{2,1} can still be calculated using (3.23c). At the end of this loop iteration, we have both bits (b_{2,1}, b_{2,0}) to compute m_2. Now, repeating the process performed in the

previous loop iteration, γ(m′, m) is computed using the a-priori probability of the input bit u that caused the transition from state m′ to m, and also by comparing the calculated output m_2 with the received output, assigning a probability to the respective state transition depending on whether or not the calculated output m_2 matches the received output. Therefore, it's calculated similarly to the previous loop iteration, using Equation (3.26) with the output m_2 instead.

These 3 loop iterations are executed repeatedly in a cycle until all the bits coming from the channel

are received. The reason why we chose 3 state bits for the VQ, instead of using only 2 state bits and the input bit u_VQ, is that we need to compute the same VQ codeword during 3 loop iterations to use in the calculation of the outputs m_1 and m_2. Since the input bit is only available once every 3 loop iterations, it would otherwise be impossible to compute the VQ codeword in all 3 loop iterations.

To help understand the structure of our decoder, a detailed description of the operations performed at each loop iteration (for a couple of iterations) will be given. Table 3.2 gives a general overview

of the evolution of the bits that establish the state of the decoder. Tables 3.3 and 3.4 show an ex-

ample of the evolution of the EM given an arbitrary input sequence of accumulator and vector quan-

tizer bits where the first table describes the state bits and the second table describes the intermediate

(cV Q,1, cV Q,2, b1,1, b1,0, b2,1, b2,0) and final outputs (m1,m2).

We now perform a detailed step-by-step description of the operations for the example shown in Tables

3.3 and 3.4.


t     k    u_ACC(t)    u_VQ(k)    c_ACC(t)                u_VQ(k−1)    u_VQ(k−2)    u_VQ(k−3)
0     1    u_ACC(0)    ×          c_ACC(0)                u_VQ(0)      u_VQ(−1)     u_VQ(−2)
1     1    u_ACC(1)    u_VQ(1)    u_ACC(0) ⊕ c_ACC(0)     u_VQ(0)      u_VQ(−1)     u_VQ(−2)
2     2    u_ACC(2)    ×          u_ACC(1) ⊕ c_ACC(1)     u_VQ(1)      u_VQ(0)      u_VQ(−1)
3     2    u_ACC(3)    ×          u_ACC(2) ⊕ c_ACC(2)     u_VQ(1)      u_VQ(0)      u_VQ(−1)
4     2    u_ACC(4)    u_VQ(2)    u_ACC(3) ⊕ c_ACC(3)     u_VQ(1)      u_VQ(0)      u_VQ(−1)
5     3    u_ACC(5)    ×          u_ACC(4) ⊕ c_ACC(4)     u_VQ(2)      u_VQ(1)      u_VQ(0)
6     3    u_ACC(6)    ×          u_ACC(5) ⊕ c_ACC(5)     u_VQ(2)      u_VQ(1)      u_VQ(0)
...   ...  ...         ...        ...                     ...          ...          ...

Table 3.2: General overview of the evolution of the decoder (t: time; k: VQ time)

t     k    u_ACC(t)    u_VQ(k)    c_ACC(t)    u_VQ(k−1)    u_VQ(k−2)    u_VQ(k−3)
0     1    1           ×          0           0            0            0
1     1    0           1          1           0            0            0
2     2    1           ×          1           1            0            0
3     2    1           ×          0           1            0            0
4     2    0           0          1           1            0            0
5     3    0           ×          1           0            1            0
6     3    1           ×          1           0            1            0
...   ...  ...         ...        ...         ...          ...          ...

Table 3.3: Example of the evolution of the state bits of the decoder given an arbitrary input sequence

t     c_VQ,1    c_VQ,2    b_{1,1}    b_{1,0}    b_{2,1}    b_{2,0}    m_1      m_2
0     ×         ×         ×          ×          ×          ×          ×        ×
1     0         0         ×          1          ×          ×          ×        ×
2     1         1         0          1          0          ×          −0.5     ×
3     1         1         0          ×          0          0          ×        −1.5
4     1         1         ×          1          ×          ×          ×        ×
5     0         1         1          1          0          ×          1.5      ×
6     0         1         1          ×          0          1          ×        −0.5
...   ...       ...       ...        ...        ...        ...        ...      ...

Table 3.4: Example of the evolution of the outputs of the decoder given an arbitrary input sequence

For t=1:

• The accumulator state c_ACC(1) is obtained from the XOR operation between the previous accumulator input bit u_ACC(0) and the previous accumulator state c_ACC(0). Therefore, c_ACC(1) = 1 ⊕ 0 = 1.

• The vector quantizer receives an input VQ bit u_VQ(1) = 1, which will only be shifted into the VQ shift register in the next iteration.


• According to the current VQ state, the VQ codeword bit c_{VQ,1} is computed using Equation (3.22a), so c_{VQ,1} = [1 0 1] · [0 0 0] = 0. Similarly, c_{VQ,2} is computed using Equation (3.22b), so c_{VQ,2} = [1 1 1] · [0 0 0] = 0.

• No output is calculated in this iteration.

For t=2:

• Similarly to the previous iteration, the accumulator state is given by c_ACC(2) = u_ACC(1) ⊕ c_ACC(1) = 0 ⊕ 1 = 1.

• Now, the previous input bit u_VQ(1) is shifted into the VQ shift register, resulting in the VQ state [1 0 0].

• From the new VQ state, we compute the new VQ codeword bits c_{VQ,1} = [1 0 1] · [1 0 0] = 1 and c_{VQ,2} = [1 1 1] · [1 0 0] = 1.

• Using Equation (3.23a), we calculate b_{1,1} = 1 ⊕ 1 = 0. Since the BCJR allows using the previous state, the value c_ACC(1) = 1 can still be retrieved and, consequently, using Equation (3.23b), we obtain b_{1,0} = 1. Therefore, using Equation (3.24a), we compute m_1 = 2^1 × 0 + 2^0 × 1 − 1.5 = −0.5.

For t=3:

• The accumulator state is given by c_ACC(3) = u_ACC(2) ⊕ c_ACC(2) = 1 ⊕ 1 = 0.

• In this iteration, the VQ is not shifted, resulting in the same VQ state [1 0 0].

• Consequently, we have the same VQ codeword bits c_{VQ,1} = 1 and c_{VQ,2} = 1.

• Using Equation (3.23d), we obtain b_{2,0} = 0. Again, information about the previous state can be retrieved, so the value c_ACC(2) = 1 can still be obtained and, consequently, using Equation (3.23c), we calculate b_{2,1} = 1 ⊕ 1 = 0. Therefore, using Equation (3.24b), we compute m_2 = 2^1 × 0 + 2^0 × 0 − 1.5 = −1.5.

For t=4:

• Perform the same operations as performed in iteration number 1.

For t=5:

• Perform the same operations as performed in iteration number 2.

For t=6:

• Perform the same operations as performed in iteration number 3.


3.5.4 Initial and terminal states of the decoder

Special attention needs to be paid to the termination of the decoder trellis. At the beginning, it was said that both the initial state S_0 and the terminal state S_τ had to be the all-zero state for the BCJR algorithm to work correctly.

1. S_0 = [0 0 0 0]

2. S_τ = [0 0 0 0]

For the first condition, we simply initialize the decoder state with the all-zero state. To guarantee the second condition, more complex operations had to be performed. Recapitulating, the decoder state consists of 1 coded bit from the accumulator (c_ACC) and 3 bits from the vector quantizer state (u_{VQ,1}, u_{VQ,2}, u_{VQ,3}). To have c_ACC = 0, we need to guarantee that the input bit into the accumulator is the same as the accumulator's previous state, so that the XOR operation returns 0 for the next accumulator state.

\[ c_{ACC,\text{nextState}} = u_{ACC} \oplus c_{ACC,\text{currentState}} \tag{3.28} \]

This is done at the encoder side by directly forcing the desired input bit into the accumulator.

To get [u_{VQ,1} u_{VQ,2} u_{VQ,3}] = [0 0 0], we need the vector quantizer to receive the input u_VQ = 0 a total of 3 times. Yet, since that input bit only enters the vector quantizer once every 3 iterations, the terminating process needs to be started 3 × 3 = 9 iterations before the end. Obviously, we didn't need to

start the terminating process for the accumulator state so early (only the last iteration would be enough)

but, for ease of implementation, we directly forced the last 9 bits of the accumulator input at the encoder

side using Equation (3.28). Table 3.5 shows an example of the termination process of our decoder

regarding its state where τ represents the time instant of the last iteration.

t       k    u_ACC(t)    u_VQ(k)    c_ACC(t)    u_VQ(k−1)    u_VQ(k−2)    u_VQ(k−3)
...     ...  ...         ...        ...         ...          ...          ...
τ−10    1    1           ×          0           1            1            0
τ−9     1    1           ×          1           1            1            0
τ−8     1    0           0          0           1            1            0
τ−7     2    0           ×          0           0            1            1
τ−6     2    0           ×          0           0            1            1
τ−5     2    0           0          0           0            1            1
τ−4     3    0           ×          0           0            0            1
τ−3     3    0           ×          0           0            0            1
τ−2     3    0           0          0           0            0            1
τ−1     3    0           ×          0           0            0            0
τ       3    0 or 1      ×          0           0            0            0

Table 3.5: Example of the terminating process of the state of the decoder

Table 3.5: Example of the terminating process of the state of the decoder

We now perform a detailed step-by-step description of the operations for the example shown in Table

3.5.


At t = τ − 10: This is the last iteration before the terminating process starts, and it is executed normally as described in the previous section.

From t = τ − 9 to t = τ: In these last iterations, the accumulator input bit u_ACC is directly defined depending on the current state of the decoder, more specifically, the current accumulator state c_ACC.

Regarding the termination of the decoder, our objective is to guarantee that the terminal state of the

accumulator and the terminal state of the vector quantizer are set to the zero state. Since the next

state of the accumulator is defined by Equation (3.28), we just need to set the accumulator input to be

equal to the current accumulator state so that the next accumulator state is the zero state. Therefore,

uACC(τ − 9) = cACC(τ − 9) = 1. Regarding the vector quantizer, whenever there’s an input VQ bit, we

need to set it to 0. This way, in the end, all 3 VQ state bits are set to 0.

3.5.5 Computation of the L-values

At this point, we've computed the values of γ(m′, m) for all incoming bits from the channel. Using Equations (3.19a) and (3.19b), we can compute the values of α_t(m) and β_t(m), keeping in mind the initial and terminal conditions of the decoder. Finally, we are able to compute the values of σ_t(m′, m) using Equation (3.18). Then, we obtain the desired APP, which will be converted into the log-domain, resulting in the a-posteriori L-values which, similarly to the a-priori L-values, can be defined as

\[ L_{\text{a-posteriori}}(u_{ACC,k}) = \ln \frac{p(u_{ACC,k} = 1)}{p(u_{ACC,k} = -1)} \tag{3.29} \]

where:

• p(u_{ACC,k} = 1) is the probability that the input bit u_ACC equals ’0’ at time instant k;

• p(u_{ACC,k} = −1) is the probability that the input bit u_ACC equals ’1’ at time instant k.

These input bit probabilities are computed using an approach similar to the computation of γ(m′, m). Since, for each state transition, there's a corresponding combination of input bits (u_ACC, u_VQ), we can associate the probability of a state transition with the probability of the corresponding input bits, more specifically, the input bit of the accumulator u_ACC. Consequently, we obtain p(u_{ACC,k} = 1) by summing all the σ_k(m′, m) for the state transitions in which the input bit of the accumulator was ’0’, and p(u_{ACC,k} = −1) by summing all the σ_k(m′, m) for the state transitions in which the input bit of the accumulator was ’1’, as shown in Equations (3.30a) and (3.30b); a small sketch of this computation follows the definitions below.

\[ p(u_{ACC,k} = 1) = \sum_{(m',m) \in T_{k,0}} \sigma_k(m', m) = \sum_{(m',m) \in T_{k,0}} \Pr\{S_{k-1} = m';\ S_k = m;\ Y_1^{\tau}\} \tag{3.30a} \]

\[ p(u_{ACC,k} = -1) = \sum_{(m',m) \in T_{k,1}} \sigma_k(m', m) = \sum_{(m',m) \in T_{k,1}} \Pr\{S_{k-1} = m';\ S_k = m;\ Y_1^{\tau}\} \tag{3.30b} \]

where:

• T_{k,0} represents all the state transitions in which u_{ACC,k} = 0;

• T_{k,1} represents all the state transitions in which u_{ACC,k} = 1.
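As referenced above, here is a minimal MATLAB fragment for Equations (3.29)–(3.30), assuming alf, bet and gam come from the recursions sketched earlier, and that T0 and T1 are lists of [m′ m] index pairs for the transitions driven by u_ACC = ’0’ and u_ACC = ’1’, respectively (illustrative names, not the simulator's actual variables):

% A-posteriori bit probabilities (Eqs. 3.30a-3.30b) and L-value (Eq. 3.29)
% at instant k.
p0 = 0;  p1 = 0;
for i = 1:size(T0, 1)
    p0 = p0 + alf(k, T0(i,1)) * gam(k, T0(i,1), T0(i,2)) * bet(k+1, T0(i,2));
end
for i = 1:size(T1, 1)
    p1 = p1 + alf(k, T1(i,1)) * gam(k, T1(i,1), T1(i,2)) * bet(k+1, T1(i,2));
end
L_post = log(p0 / p1);   % u = +1 in Eq. (3.29) corresponds to bit '0'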

These a-posteriori L-values correspond to the sum of the a-priori L-values L_A with the extrinsic L-values L_E. So, after the BCJR algorithm, the decoder will use the extrinsic L-values and the redundancy of the repeat-accumulate code to refine the accumulator input bit probabilities.


Chapter 4

Results

Our simulations successfully showed that we were able to decode the correct sequence of input bits after transmission over a channel with interference. As expected, the decoder is not perfect: if the noise is too strong, the algorithm is not capable of decoding the correct sequence.

Nonetheless, for a reasonable amount of interference, our simulations showed that the dirty-paper tech-

nique does work and the received signal can be decoded correctly.

The tests were executed with a fixed random message of 6000 bits. The signal power, as calculated in the previous section, is P_X = 0.532. The noise standard deviation σ was the varying factor across the multiple tests we executed.

Since we implemented both sides of the communication system (the transmitter/encoder and the receiver/decoder), for testing purposes, we know exactly which sequence of bits was transmitted and which sequence of bits was received. More specifically, we also know the value of the bits in each element

of the encoder and decoder. Therefore, in the same way the final decoded message should match the

original message, the bit sequences in the decoding process should also match some bit sequences in

the encoding process.

Our simulation results will be largely based on comparing the bit values between the encoder and

decoder. Firstly, we will compare the original message with the final decoded message and then, we will

analyze the evolution of our system throughout each iteration.

4.0.1 Overall results

As introduced previously, we simulated our system for different noise values by varying the noise standard deviation σ between 0.1 and 1 with a step of 0.1. For each of those noise values, using Equation (4.1), we obtained the corresponding SNR values, which are displayed in Table 4.1.

\[ SNR_{\mathrm{dB}} = 10 \log_{10}\!\left(\frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}\right) = 10 \log_{10}\!\left(\frac{P_X}{\sigma^2}\right) \tag{4.1} \]

In each of these runs, we cycled through 10 iterations of our algorithm and the results are displayed

in Figure 4.1.


Table 4.1: SNR values for each run

Run number    σ      SNR (dB)
1             0.1     17.259
2             0.2     11.238
3             0.3      7.717
4             0.4      5.218
5             0.5      3.280
6             0.6      1.696
7             0.7      0.357
8             0.8     −0.803
9             0.9     −1.826
10            1.0     −2.741
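These values follow directly from Equation (4.1); a one-line MATLAB check, using P_X = 0.532 as computed earlier:

% Reproducing the SNR column of Table 4.1 from Eq. (4.1).
PX = 0.532;
sigma = 0.1:0.1:1;
SNR_dB = 10*log10(PX ./ sigma.^2)   % 17.259, 11.238, ..., -2.741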

Figure 4.1: Evolution of the number of errors along the iterations for each signal-to-noise ratio

In Figure 4.1, we display the number of errors between the original message and the decoded mes-

sage where the message contains a total of 6000 bits. Each curve represents a whole run for a specific

SNR value and the evolution can be tracked by observing the x-axis which indicates the iteration number.

By analyzing all the curves, we can see that for some SNR values our decoder manages to obtain the correct sequence of bits for the message. For SNR ≥ 7.717 dB, it succeeded in doing so with only 2 iterations of our algorithm. For lower SNR, some additional iterations were needed to obtain the correct sequence. For SNR = −0.803 dB, we notice that the algorithm is indeed converging to the correct solution (zero errors), but the 10 iterations we performed weren't enough to fully decode the received signal. The complete run for this SNR value is shown in Figure 4.2, where we exceptionally performed more than 10 iterations to show that the algorithm needed more than 10 iterations to converge to the correct solution. While the −0.803 dB curve is almost converging to the correct sequence at the end of 10 iterations, the −1.826 dB curve appears to be converging to the wrong sequence, where almost half of the bits are incorrectly decoded. In this situation, we can assume that the system was not capable of decoding the correct bit sequence from the received signal. The same can be said for SNR = −2.741 dB


where each iteration doesn’t seem to be adding any useful information to the process of decoding the

received signal.

Figure 4.2: Evolution of the number of errors along extra iterations for SNR = -0.80271 dB

The gap between the transmission rate and the channel capacity can now be calculated. For each input bit of the original message, the repeat-accumulate coder generates 6 bits. These 6 bits are upsampled into 8 bits, which are mapped into 4 symbols. That is, 1 input bit generates 4 samples. Therefore, the transmission rate r_t is given by:

r_t = 0.25 bit/sample

Using Equation (1.6), the SNR at which the capacity equals this transmission rate can be computed:

\[ C = \frac{1}{2} \log_2(1 + SNR_{\min}) \iff SNR_{\min} = 2^{2C} - 1 = 0.414 = -3.828\ \mathrm{dB} \]

Analyzing Figure 4.1, the minimum SNR at which it was possible to decode the correct bit sequence

in 10 iterations was SNR = 0.357 dB.

Note that this SNR corresponds to a maximum of 10 iterations of the algorithm. If the algorithm

performed more iterations, the minimum SNR at which it was possible to decode the correct bit sequence

would be lower and, consequently, the capacity gap would be lower. Therefore, the capacity gap is given

by:

\[ \text{capacity gap} = SNR|_{\mathrm{dB}} - SNR_{\min}|_{\mathrm{dB}} = 0.357\ \mathrm{dB} - (-3.828\ \mathrm{dB}) = 4.185\ \mathrm{dB} \]
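The whole computation can be summarized in a few MATLAB lines (a sketch restating the numbers above):

% Capacity gap: rate 0.25 bit/sample vs. the AWGN capacity of Eq. (1.6).
rt = 0.25;                      % transmission rate in bit/sample
SNRmin = 2^(2*rt) - 1;          % 0.414
SNRmin_dB = 10*log10(SNRmin);   % -3.828 dB
gap_dB = 0.357 - SNRmin_dB      % 4.185 dB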

This capacity gap is larger than the one obtained in [10], which was 1.9 dB. However, that value was obtained after performing around 60 to 90 iterations of the algorithm.

We further analyze the values between SNR = −1.826 dB and SNR = −0.803 dB by performing additional runs for intermediate SNR values. The results are displayed in Figure 4.3.

Observing Figure 4.3, we confirm that as the SNR keeps decreasing, the number of iterations nec-

essary to fully decode the received signal keeps increasing. In this chart we can also perceive that the


Figure 4.3: Evolution of the number of errors along the iterations for -1.826 dB < SNR < -0.803 dB

relation between noise and number of errors is not linear. The curve for SNR = −0.911 dB showed a higher number of errors in the first 4 iterations than the curve for SNR = −1.123 dB, even though the noise power was lower. This can be due to random fluctuations in the results, owing to the random nature of the signal used in the simulations.

This behavior is as expected according to the system we implemented because, as explained through-

out the thesis, the decoder works by acquiring new information through the computation of the a-priori

and a-posteriori bit probabilities with the BCJR algorithm and the redundancy provided by the RA code.

Therefore, we can assign the success of the decoding process to two major factors:

• Amount of information needed to achieve the correct sequence;

• Quality of information acquired in each iteration.

Both of these factors are directly affected by the noise because if the noise is high, then:

• The received signal will most likely have many erroneous bits which means we will need much

more information to find the correct sequence;

• The calculations performed by the algorithm will be less accurate because they involve predicting

the evolution of the received bits. Too many erroneous bits might induce the algorithm into the

wrong solution. Therefore, the quality of the information acquired in each iteration is degraded.

So far, we've displayed the evolution of the number of errors, in absolute value, along the iterations of the algorithm. We will now focus on the results of the algorithm after performing 10 iterations. More specifically, we will compute the bit error rate (BER) for each SNR value in Figure 4.4. The BER is

computed as shown in Equation (4.2).

\[ \mathrm{BER} = \frac{\text{number of bit errors}}{\text{total number of bits}} \tag{4.2} \]
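In the simulator, this is a single comparison between the decoded and the original bit vectors; a sketch with illustrative variable names (not the simulator's actual ones):

% BER of Eq. (4.2) between the original and the decoded messages.
BER = sum(decoded(:) ~= original(:)) / numel(original);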


Figure 4.4: Evolution of the Bit-Error-Rate (BER) for different Signal-Noise Ratio (SNR) values

Figure 4.4 displays the BER, on a logarithmic scale, calculated for the same SNR values as in Figure 4.1. Only the 3 lowest SNR values still showed errors at the end of the 10th iteration. The rest of the results displayed no errors at the end of the 10th iteration, so the BER should tend to zero (−∞ on the logarithmic scale). However, even when there were no errors at the end of the 10th iteration, it can't be proved that the BER converges to zero, because not enough tests were performed for the result to have statistical meaning. When there's only one bit error, the BER is given by

\[ \mathrm{BER} = \frac{1}{6000} \simeq 1.67 \times 10^{-4} < 1.0 \times 10^{-3} \]

Therefore, it can be inferred that the BER should be lower than 1.0 × 10^-3, but the same cannot be inferred about the 1.0 × 10^-4 threshold.

Although we don't have many data points in this chart, we can detect a sudden drop in the BER curve at around SNR = −1.7 dB. This sharp increase in the steepness of the curve is known as the turbo cliff and defines the SNR values at which the system starts converging to the correct solution.

In Figure 4.5, we performed additional runs over the SNR values around the turbo cliff so that we can

more effectively analyze its evolution.

Figure 4.5: Evolution of the turbo cliff


Figure 4.5 shows that the BER keeps decreasing as the SNR increases. Initially, it decreases relatively slowly but, at around SNR = −0.693 dB, the curve becomes steeper and steeper. This is the turbo cliff effect, which corresponds to the convergence of iterative decoding towards low BER.

Besides comparing the final decoded message with the original message, we will also analyze the

intermediary elements of the algorithm. To do so, we will compare the bit sequences between the trans-

mitter and the receiver at specific elements of each side. We will denote these comparison points as checkpoints. We placed 5 checkpoints in each iteration, spread throughout the algorithm's elements,

as listed below:

1. BCJR Output: Represents the bit sequence with the a-posteriori bit probabilities that were com-

puted with the BCJR algorithm;

2. CND to IL: Represents the bit sequence that is forwarded to the IL after performing the first ”box-

plus” operation;

3. Input: Represents the bit sequence that corresponds to the final decoded message;

4. IL to CND: Represents the bit sequence that is forwarded from the IL to the CND after processing

the bit sequences with the VNDs;

5. BCJR Input: Represents the bit sequence that will be forwarded to the BCJR algorithm after per-

forming the second ”box-plus” operation.

The results of the comparisons at each of these checkpoints for σ = 0.7 are depicted in Figure 4.6.

Figure 4.6: Evolution of the number of errors along the elements for each iteration for σ = 0.7

Figure 4.6 provides a visual representation of the algorithm’s performance. Each curve represents

an iteration of the algorithm and indicates the percentage of bit errors at each checkpoint. A percentage


is specified instead of the absolute number of errors because the total number of bits processed at

the checkpoints varies. These curves allow us to follow the effectiveness of the elements between the

checkpoints. Between the end of an iteration and the start of the following iteration, we see that the

percentage of errors always decreases, which demonstrates the effectiveness of the BCJR algorithm.

Inside each iteration, we notice that the percentage of errors doesn’t always decrease. This can be

explained by the fact that not all elements of the decoder work individually. Most of them are integrated

with other elements so that, together, they can improve the decoding process. For example, the first

”box-plus” operation performed between the ”BCJR Output” and the ”CND to IL” checkpoints actually

increased the percentage of errors. This happened because this element combines all the extrinsic

information coming from the BCJR with the existing information from the previous iteration so it might

not always represent a better bit estimation. However, after this combined information is processed by

the remaining elements of the repeat-accumulate code, it will improve.

4.0.2 Summary

Recapitulating, at high SNR, the noise doesn’t distort the signal too much so the received signal doesn’t

have too many erroneous bits. The amount of information needed to obtain the correct sequence is

low so very few iterations of the algorithm are needed. Also, the information acquired in each iteration

shows the necessary quality to further improve the decoding process. Slightly reducing the SNR, the noise starts to have a stronger effect on the received signal, causing many bits to be received incorrectly.

Consequently, many iterations for improving the received signal will need to be performed, each of them

consecutively adding useful information to the signal. Lastly, at extremely low SNR, the noise greatly distorts the signal, causing many of the bits to be received incorrectly, so many iterations would be needed to acquire the information required to obtain the correct bit sequence. However, since the noise is too high, the algorithm's accuracy is compromised. It might make incorrect conjectures about the

bit sequence and cause the information acquired to be imprecise. This leads to a propagation of the

errors and induces the algorithm into converging to the wrong solution. In this situation, the decoder is

unable to decode the message correctly.


Chapter 5

Conclusions

The objective of this thesis was to implement an end-to-end system which employed the Dirty Paper

Coding technique. The capability of this system to transmit a message and correctly decode it at the

receiver end was tested under different conditions. The complexity of the system is relatively high and it

directly depends on the vector quantizer memory. This system was implemented with vector quantizers

of memory 2 but higher shaping gains can be achieved by using vector quantizers with larger memory.

However, the complexity of the system will grow exponentially with the code memory.

Overall, the implemented system worked correctly and each component worked as expected by

meeting the objectives we had formulated. That is, even for relatively low SNR conditions, the system still managed to decode the received signal into the correct original message. From the analysis of the results, we can conclude that the iterative decoding process exhibits three distinct behaviors:

1. Converging to the correct solution after very few iterations;

2. Converging slowly to the correct solution, requiring several more iterations;

3. Converging to the wrong solution.

For future incorporation in other systems, the first behavior is the most desirable one since it provides

a quick and correct solution. The second behavior is still acceptable although it obviously depends on

the system specifications and the requirements of the project. The third behavior is not acceptable but it

only occurs when the SNR conditions are too low.

One of DPC's main applications may be in Multiple-Input Multiple-Output (MIMO) systems. These systems allow an increase in the transmission rate by using multiple antennas to transmit the signal. However, these antennas share the same communication medium and cause interference among themselves. The antennas can cooperate in reducing the interference by using DPC, which allows the

removal of noncausally known interference at the transmitter. Therefore, DPC has great potential in

MIMO systems.

Regarding the extension of the implemented system to other domains, since the concepts behind DPC and digital watermarking are very similar, one can conclude that watermarking can greatly benefit from

using side information during the watermark coding process.


In the future, considering the relatively high complexity involved in implementing DPC, the question to ask is: how big a performance boost does DPC offer compared with other strategies?


Bibliography

[1] B. Chen and G. W. Wornell. Digital watermarking and information embedding using dither modula-

tion. In 1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175), pages

273–278, Dec 1998. doi: 10.1109/MMSP.1998.738946.

[2] S.-C. Lin and H.-J. Su. Practical vector dirty paper coding for MIMO Gaussian broadcast channels. IEEE Journal on Selected Areas in Communications, 25(7):1345–1357, September 2007. ISSN 0733-8716. doi: 10.1109/JSAC.2007.070908.

[3] T. Koch. On the capacity of the dither-quantized Gaussian channel. Dec 2013.

[4] A. Khina. The robustness of dirty paper coding and the binary dirty multiple access channel with

common interference. Master’s thesis, Tel-Aviv University, 2010.

[5] G. Ku. Dirty paper coding. Notes from Adaptive Signal Processing and Information Theory Re-

search Group, 2012.

[6] J. Proakis. Digital Communications. McGraw-Hill, 2001. ISBN 9780071181839.

[7] T. M. Thompson. From Error-Correcting Codes through Sphere Packings to Simple Groups. Math-

ematical Association of America, 1983. doi: 10.5948/UPO9781614440215.

[8] I. Cox, M. Miller, and J. Bloom. Digital Watermarking. Morgan Kaufmann Publishers, 1st edition, 2006. ISBN 9780080504599.

[9] M. Costa. Writing on dirty paper. In IEEE Transactions on Information Theory, pages 439–441.

IEEE Information Theory Society, 1983.

[10] U. Erez and S. ten Brink. A Close-to-Capacity Dirty Paper Coding Scheme. In IEEE Transactions on Information Theory, pages 3417–3432. IEEE Information Theory Society, 2005.

[11] F. M. J. Willems. On gaussian channels with side information at the transmitter. In Proc. 9th Symp.

Information Theory in the Benelux, 1988.

[12] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv. Optimal Decoding of Linear Codes for Minimizing

Symbol Error Rate. In IEEE Transactions on Information Theory, pages 284–287. IEEE Information

Theory Society, 1974.


[13] S. ten Brink and G. Kramer. Design of Repeat-Accumulate Codes for Iterative Detection and De-

coding. In IEEE Transactions on Signal Processing, pages 2764–2772. IEEE Signal Processing

Society, 2003.

[14] J. Hagenauer, E. Offer, and L. Papke. Iterative decoding of binary block and convolutional codes.

IEEE Transactions on Information Theory, 42(2):429–445, Mar 1996.
