Information Theory and its Applications


Feature: Information Theory and its Applications

1 Guest Editorial; Geir E. Øien

A Historical Perspective on Information Theory

3 Information Theory: The Foundation of Modern Communications; Geir E. Øien

20 On Shannon and “Shannon’s Formula”; Lars Lundheim

30 Statistical Communication Theory 1948 – 1949; Nic. Knudtzon

Novel Developments on Channel Capacity and Spectral Efficiency

35 The True Channel Capacity of Pair Cables With Respect to Near End Crosstalk; Nils Holte

47 Bounds on the Average Spectral Efficiency of Adaptive Coded Modulation; Kjell J. Hole

53 Breaking the Barriers of Shannon’s Capacity: An Overview of MIMO Wireless Systems; David Gesbert and Jabran Akhtar

Turbo Coding and Iterative Decoding: Theory and Applications

65 An Introduction to Turbo Codes and Iterative Decoding; Øyvind Ytrehus

78 Theory and Practice of Error Control Coding for Satellite and Fixed Radio Systems; Pål Orten and Bjarne Risløw

Modulation, Coding and Beyond

92 A New Look at the Exact BER Evaluation of PAM, QAM and PSK Constellations; Pavan K. Vitthaladevuni and Mohamed-Slim Alouini

106 Performance Analysis of Adaptive Coded Modulation with Antenna Diversity and Feedback Delay; Kjell J. Hole, Henrik Holm and Geir E. Øien

114 Shannon Mappings for Robust Communication; Tor A. Ramstad

Historical Papers

129 Information Theory; Nic. Knudtzon

139 Statistical Communication Theory; Nic. Knudtzon

145 Statistically Optimal Networks; Nic. Knudtzon

Special

153 Multiple Bottom Lines? Telenor’s Mobile Telephony Operations in Bangladesh; Arvind Singhal, Peer J. Svenkerud and Einar Flydal

Status

163 Introduction; Per Hjalmar Lehne

164 UMTS Network Domain Security; Geir M. Køien

Contents

Telektronikk

Volume 98 No. 1 – 2002

ISSN 0085-7130

Editor:

Ola Espvik

Tel: (+47) 913 14 507

email: [email protected]

Status section editor:

Per Hjalmar Lehne

Tel: (+47) 916 94 909

email: [email protected]

Editorial assistant:

Gunhild Luke

Tel: (+47) 415 14 125

email: [email protected]

Editorial office:

Telenor Communication AS

Telenor R&D

NO-1331 Fornebu

Norway

Tel: (+47) 67 89 00 00

email: [email protected]

Editorial board:

Berit Svendsen, CTO Telenor

Ole P. Håkonsen, Professor

Oddvar Hesjedal, Director

Bjørn Løken, Director

Graphic design:

Design Consult AS, Oslo

Layout and illustrations:

Gunhild Luke and Britt Kjus,

Telenor R&D

Prepress and printing:

Optimal as, Oslo

Circulation:

3,750


On February 24, 2001, Claude Elwood Shannon died at the age of 84. As the father of, and most important contributor to, the field of information theory, he ranks as one of the most brilliant and important of all 20th century scientists. In a tribute speech made by Senator John D. Rockefeller in the US Congress after Shannon’s death, his work was referred to as “The Magna Carta of the information age”. Had a Nobel prize been awarded within the information and communication sciences, no one would have been a more worthy candidate than Shannon.

Shannon’s ideas, first presented to the world at large in his seminal 1948 paper “A Mathematical Theory of Communication” in Bell System Technical Journal, have been crucial in enabling the information and communication technological advances which have created today’s information society. As with all true pioneers, Shannon’s way of thinking about information and communication represented a true paradigm shift. For example, prior to the arrival of his seminal papers of the late 40s and 50s, there simply had been no satisfactory way of modelling and analyzing the process of information generation, transfer, and reception from a transmitter to a receiver over a noisy communication channel – which actually is a generic description of how all practical communication systems work.

With Shannon’s introduction of a generic communication system model, his view of information as a probabilistic entity (sidestepping its actual semantic meaning), the insight that the process of information transmission is fundamentally stochastic in nature, and his invention of precise mathematical tools to give a complete performance analysis of his model, the door was suddenly opened to a much more fundamental understanding of the possibilities and limitations of communication systems. In the words of another notable information theorist (and former colleague of Shannon), David Slepian: “Probably no single work in this century has more profoundly altered man’s understanding of communication than C.E. Shannon’s article ‘A mathematical theory of communication’, first published in 1948. The ideas in Shannon’s paper were soon picked up by communication engineers and mathematicians around the world. They were elaborated upon, extended, and complemented with new related ideas. The subject thrived and grew to become a well-rounded and exciting chapter in the annals of science.”

In a world where predictions for the future performance of telecommunication systems sometimes seem to be made more out of marketing concerns than out of scientifically sound judgement, information theory still has a lot to teach us – insights that are sometimes sobering, but may also be encouraging. The optimistic vision suggested by some, that “any telecommunication service can be made available anywhere, anytime, and to anyone” in the future, can fairly easily be shown to have no roots in reality. One example may illustrate this: The claims originally made regarding the available rates and coverage for the upcoming Universal Mobile Telecommunications System (UMTS) so far seem a bit over-optimistic …

However, in some cases information theory can also be used to uncover a performance potential beyond what was previously thought possible, and aid in the design of systems realizing this potential. As an important example of the applicability of Shannon’s results, his theory showed us how to design more efficient communication and storage systems by demonstrating the enormous gains achievable by coding, and by providing the intuition for the correct design of coding systems. The sophisticated coding schemes used in systems as diverse as deep-space communication systems and home compact disc audio systems owe their success to the insights provided by Shannon’s theory.

Shannon published many more important and influential works in a variety of disciplines, including Boolean algebra and cryptography. His work has had an influence on such diverse fields as linguistics, phonetics, psychology, gambling theory, stock trading, artificial intelligence, and digital circuit design. It also has strong links to disciplines such as thermodynamics and biochemistry.

Shannon was also known for his playfulness and eclectic interests, which led to famous stunts such as juggling while riding a unicycle down the halls of Bell Labs. He designed and built chess-playing, maze-solving, juggling, and mind-reading machines. These activities bear out Shannon’s claim that his motivation was always curiosity more than usefulness. In an age where basic research motivated purely by scientific curiosity sometimes seems to be on the losing end as far as public interest and funding are concerned, this is a statement worth remembering. The success of information theory clearly shows how one person’s curiosity can translate into very useful results.

Guest Editorial

Geir E. Øien


Front cover:

Information appears as a change in the detectable pattern

The artist Odd Andersen visualises a set of planes as areas for information to appear. Whatever kind of predictable pattern that already may exist on those planes has no interest. Only when that pattern is changed has information occurred. When part of the pattern of one plane is moved through a transmission channel to another plane and changes its pattern, this is seen as information received by the new plane.

The artist’s generic message: Information has been produced and understood when the pattern of one plane has become changed and detected.

Ola Espvik, Editor in Chief




In Norway, Shannon’s memory and the achievements of information theory were recently honored with a Claude E. Shannon In Memoriam Seminar, arranged by Telenor R&D at Kjeller on August 9, 2001. The seminar drew over 100 participants from Norwegian industry and academia. They came to listen to technical talks by some of Norway’s foremost experts within the fields of information and communication theory, as well as to some historically and philosophically flavored talks. The seminar was highlighted by a unique reminiscence by former Telenor R&D director Nic. Knudtzon, the only Norwegian to have met Shannon at the pioneering time when information theory was actually born. This special issue of Telektronikk was inspired to a great extent by the success of this seminar, and you will find papers by many of the same scientists here – including Dr. Knudtzon’s wonderful reminiscence, as well as three 1950 papers of his which rank as the very first presentations of information theory made in Norway.

When putting together this issue, I set myself one main goal and four sub-goals. The main goal was to assemble a high-quality collection of papers which could serve to dispel the often-heard view that “information theory is not constructive, but is only concerned with theoretical limits which can never be reached in practice”. It is my firm view that information theory can be very constructive, in the sense that it has a lot to say about what one should do and what one should not do when designing communication systems.

In fact, these are particularly exciting times for information theory, because we now finally have available powerful techniques for actually approaching the performance limits predicted by Shannon. One important example is the class of error control codes called “turbo codes”. Why do these codes work so well? Because their construction turns out to be based to a great extent upon the “nonconstructive” proof techniques used by Shannon in his analysis!

Regarding my four sub-goals: first, the issue should place information theory in a historical context, while at the same time introducing its basic principles and discussing their importance. The papers collected under the heading “A Historical Perspective on Information Theory” serve this purpose.

Secondly, I wanted to show that information theory is still very much an active research field of practical importance, and that the Norwegian research community is currently making some important contributions to this research. Thirdly, some of the most important applications of information theory should be highlighted. Finally, the most promising and exciting current developments in communication systems design should be included. I do believe that the final result reflects these goals successfully, and that the standard of the papers within is very high. I hope the reader will find them as inspiring, insightful, and useful as I do. For me, putting together this issue has truly been a labour of love!


1 Introduction

Modern information theory was basically invented by Claude E. Shannon in a series of classic papers [1], [2], [3], and has since been extended and refined by a large number of researchers from all over the world. To some, the field of information theory might seem overly concerned with mathematical and statistical rigor, and with an abundance of long, technically involved mathematical proofs. However, the advantages of this rigor are many. Starting out with mostly simple, but stringently defined, mathematical assumptions – typically corresponding to practical/physical constraints under which real-world communication systems are to be designed – one may derive fundamental limits on the performance of such systems, as well as make statements about the conditions under which a given performance can be attained. In many situations we are also able to obtain useful guidelines for how the practical design should (not to mention should not) be done in order to approach the theoretical performance bounds.

In the following, we shall define and explain some of the most basic concepts and results introduced by Shannon, and try to expand on how they affect the design of communication and information storage systems. Due to space limitations and to ease the reading of this paper, we shall not provide proofs for any of the results presented. Most of the notation and language is based on that of Blahut [4] and the more recent book by Cover and Thomas [5]. Another important text on information theory is Berger’s classic on rate distortion theory [6].

2 Shannon’s Generic Model of a Communication System

Central to the development of information theory is the notion of a generic communication system model which can be used as a unifying framework suitable for describing a wide range of real-world systems. In the generic communication system model proposed by Shannon, information is transmitted from an information source to a user, by means of a transmitter, a communication channel, and a receiver.

Information Theory: The Foundation of Modern Communications

Geir E. Øien

This paper gives a brief tutorial introduction to information theory, the fundamental concepts and theorems of which are at the very heart of modern communications and information technology. The results of information theory tell us:

• how to quantify the information content in a set of data;

• how to model and analyze a wide range of communication channels and their capacity for transmitting information;

• conditions under which error-free representation and transmission of information is possible, and when it is strictly impossible;

• conditions for the design of good ways of (codes for) representing information so as to achieve data compaction and compression, and channel error robustness;

• which minimal quality reduction we may expect for a given transmission rate, a given information source, and a given communication channel;

• how we may split our communication systems into subsystems, in order to simplify design without the loss of theoretical performance.

We will present the basic concepts and most famous theorems of information theory and explain their usefulness in the design and analysis of communications and information storage systems.

Figure 1 Generic communication system model

Geir E Øien (36) received his MSc and PhD degrees from the Norwegian University of Science and Technology (NTNU) in 1989 and 1993, respectively. From 1994 until 1996 he was with Stavanger University College as associate professor. Since 1996 he has been with NTNU, since 2001 as full professor of information theory. Prof. Øien is a member of IEEE and the Norwegian Signal Processing Society. He is author/co-author of more than 40 research papers. His current research interests are in information theory, communication theory, and signal processing, with emphasis on analysis and design of wireless communication systems.

[email protected]

[Figure 1 diagram: Source → Transmitter → Channel → Receiver → User, with noise and impairments entering the channel]


This is depicted in Figure 1. Examples of possible information sources in this context are: human speakers, video cameras, musical instruments, microphones, loudspeakers, and computer keyboards.

The transmitter and receiver perform information coding, which means processing the messages generated by the information source in order to

• represent (encode) the messages in a suitable way during transmission over the channel, and

• regenerate (decode) the messages at the receiver end, with as little deviation from what was originally transmitted as is required for the service under discussion.

Note that the term “transmission” here is intended to cover transmission both in space (between two different locations) and in time (i.e. storage of data on an imperfect medium). Examples of physical communication channels thus range from wireless channels such as satellite links, mobile radio channels, and broadcast channels, via wired links such as optical fibres and copper transmission lines, to magnetic storage media and CDs.

The impairments to the transmitted messages may differ a lot depending on the type of channel, but may include

• thermal noise and atmospheric noise (additive noise);

• interference from other sources and system users;

• reflections and scattering of transmitted radio wave power during propagation through the terrain;

• signal attenuation due to path loss in radio channels or transmission line resistance;

• intersymbol interference due to a lack of available bandwidth;

• Doppler shifts due to relative movements between receiver and transmitter;

• nonlinear effects, e.g. due to nonlinear power amplifier characteristics;

• stains or scratches on a CD-ROM.

Whatever the channel, the aim of information theory is to model these impairments in a quantitative way, mainly by statistical models. The modelling is then used to deduce performance limits, and to devise methods for efficient transmission over the channel – that is, coding algorithms.

The ultimate goal of the coding is to exploit the communication channel as well as possible. This is to say that we want to spend as little as possible of the limited physical resources we have available – time, bandwidth, transmit power, or disc space – on the transmission or storage of information, in order to maximize the number of users, systems, or services that are able to share these resources. At the same time we need to ensure that the quality of the information retrieved at the receiver end is satisfactory.

By “quality” we usually mean something like “degree of similarity to the transmitted message”. The appropriate measure of, and typical demands on, quality depend on the type of information transmitted and on the application or service: For data communications, the criterion might be that the average information bit error probability (bit error rate – BER) should be less than, say, 10^-9 – whereas for speech and video communications, the aural or visual quality perceived by human ears or eyes is the most important thing, and a much higher BER can usually be accepted.

The accepted perceptual quality range also varies with the application, and is different e.g. for mobile telephony (a low-to-medium quality application) and high fidelity audio (a very high quality application). Real-time applications such as two-way speech communication also place constraints on average delay, buffering, probability of no transmission (“outage”), etc.

3 Information – What is it?

One of the most basic questions answered by information theory is “What is information – in a quantitative sense?” When discussing this issue it will be useful to distinguish between two different perspectives:

• How to quantify the information content in the data produced by a source (as discussed in Subsection 3.2).

• How to quantify the information content transmitted from a source to a user by means of a channel (as discussed in Section 5).

3.1 Information as a Measure of Unpredictability

Regardless of which of the two above perspectives is used, an intuitively appealing line of thinking about information content is as follows: An event has a low information content if its “future” can be forecast with a high degree of accuracy, based on knowledge of its “past”.


In this case an observer’s uncertainty about the future of the event is low, which means he will not receive much new information by continued observation of the event.

Thus, an event’s information content is intimately linked to the a priori degree of randomness and uncertainty associated with the event: The more predictable an event is, the less knowledge we need to describe or predict it; thus the less information we receive by observing it.

Conversely, if an event is highly unpredictable, a detailed observation of its actual development is needed in order to describe it, which means its information content is high. This holds regardless of the “physical” content of the event in question – sometimes referred to as the semantic meaning of the information.

The above can be restated as saying that the actual semantic meaning in a message is of no consequence for the amount of information carried by the message. All that matters is the degree of predictability; i.e. its statistical properties.

Note, however, that the predictability of an event is usually strongly linked to our knowledge of physical or statistical properties and models of the event in question. The same event may appear unpredictable to one observer and highly predictable to another, the difference being that only the latter observer has a priori knowledge of the underlying model describing the event.

An example of this is found in language modelling, applied e.g. in speech recognition, where an observer will need knowledge of the structure (usually in the sense of Markov models described by transition probabilities) and vocabulary of a language if he is to attempt predicting future words in a sentence, based on what has already been said.

Figure 2 Example of time-discrete source output. [Plot omitted]

Figure 3 Examples of continuous-time source outputs which are respectively amplitude-continuous (upper curve) and amplitude-discrete (lower curve). [Plots omitted]


In a communication context, at least from the second of the two above perspectives, the “events” under study are usually information messages passed from a source to a user. The degree of uncertainty on the receiver side, and the a priori predictability of the messages, will then depend on the statistical properties of the information source under study, the impairments introduced by the physical communication channel, and the way the transmitter and receiver are designed.

3.2 The Information Content in a Source

Armed with these basic insights, we now address the following question: “What is the information content in a given source that outputs messages which we are interested in storing, compressing, or transmitting?”

Initially, to make the explanations and notation simple while still illuminating the general theory, we will consider an information source that has been digitized, such that its output is discrete both in time (sampled) and amplitude (quantized). Figure 2 shows outputs from a discrete-time source, while Figure 3 shows examples of continuous-time source outputs that are amplitude continuous and discrete, respectively. When both time and amplitude have been discretized we say that the source output consists of discrete-valued samples, and the source is referred to as discrete.

We denote the set of J possible sample values (also called representation levels, or more generally source symbols) by S = {x1, ..., xJ}. This set is called the source alphabet. We assume that the probability of the source output xj is known (e.g. through histogram estimation based on training data) to be pj.

3.2.1 Source Entropy

Let us now introduce a notion which will turn out to be intimately linked to source information content: The entropy of a source S as described above is defined as

H(S) = −∑_{j=1}^{J} pj log2 pj,   (1)

and is measured in information bits per source symbol. Note that the base of the logarithm in the general definition of entropy is arbitrary, but for the entropy to be measured in units of bits per sample, the binary logarithm must be used.

An illustration of the entropy for the special case of a binary source (J = 2) is shown as a function of symbol probability p (the two symbols in this case have probabilities p and 1 – p, since their probabilities must sum to 1) in Figure 4. Note that the entropy reaches its maximum – of 1 information bit per symbol – when the two possible outcomes are equiprobable, i.e. p = 0.5.
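To make the definition concrete, here is a minimal Python sketch (ours, not from the original paper) that evaluates Equation (1) and reproduces the behaviour of the binary-source curve in Figure 4:

    from math import log2

    def entropy(probs):
        # H(S) = -sum_j p_j * log2(p_j), in information bits per symbol.
        # Terms with p_j = 0 contribute nothing (0 * log 0 -> 0).
        return -sum(p * log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))  # 1.0 bit/symbol: the maximum for J = 2
    print(entropy([0.9, 0.1]))  # ~0.47 bits/symbol: a more predictable source
    print(entropy([1.0, 0.0]))  # 0.0 bits/symbol: a deterministic source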

3.2.2 Entropy as a Measure of Information Content

A commonly used information theoretic statement is that “the source entropy (as defined above) is a measure of the average amount of information per symbol produced by the source.” What does this mean? Shannon’s first coding theorem (the noiseless source coding theorem) addresses this question in a precise way [5].

In the simplest version of this theorem, it is assumed that the source in question is memoryless, i.e. that the different source symbols output by the source are statistically independent of each other. Also, for simplicity, we shall assume that we want to represent each source symbol by a distinct binary word, for storage or transmission on a channel where only binary symbols are admitted (such as is the case e.g. on a CD).

We denote by lj the number of bits in a codeword which will be used to represent the source symbol xj. The set of codewords for all j makes up the source code. Reasonable demands on this code are that it should be able to encode any symbol string output by the source without errors, and that it should be uniquely decodable, i.e. an arbitrary encoded binary string should be possible to decode without errors into a unique sequence of source symbols.

In this situation H(S) turns out to be the lower limit for the average number of bits per symbol (average codeword length), l̄, we may use to represent the source output without errors or ambiguities:

l̄ = ∑_{j=1}^{J} pj lj ≥ H(S).   (2)

Figure 4 The entropy of a binary source. [Plot: entropy (information bits/symbol) versus probability p, rising from 0 at p = 0 to a maximum of 1 bit at p = 0.5 and falling back to 0 at p = 1]



In other words, it is not possible to construct a code with a lower average codeword length than H(S) bits per codeword. Conversely, it is theoretically possible to construct codes whose average length comes arbitrarily close to this limit (from above). This is the essence of the noiseless coding theorem.

Thus the notion of entropy as a measure of the source information content is rather obvious if viewed from a data storage viewpoint. Note that the entropy is maximal (log2 J bits/symbol) if all source symbols are equiprobable, i.e. maximally unpredictable. It is minimal (0 bits/symbol) for a deterministic source 1), whose future output “tells us nothing new” and is thus totally redundant.

4 Source Code Construction

Of course, one thing is to know the attainable limit of code efficiency; quite another is to actually find a code approaching this limit. Happily, information theory also provides necessary and sufficient conditions for how to construct the codewords of an optimal code, which actually approaches the above lower codeword length limit as closely as possible [4].

We will here still consider only the (practically quite interesting) case of a binary, variable-length, and prefix-free code – i.e. a code whose codewords are binary strings, where different codewords might have different lengths, and in which no codeword is a prefix of another. The practical interest of this last property is to make the code instantaneously decodable, i.e. each codeword can be immediately decoded without any need for “look-ahead” to other codewords.

A necessary condition for a variable-length prefix-free code to be uniquely decodable is that all of its codewords can be put on a code tree. An example of such a tree is shown, for the example of a source alphabet with four symbols, in Figure 5. The way to obtain a codeword from the tree is to start at the top node and write down the binary-valued labels as the tree is traversed down to one of the end nodes. Each resulting sequence of labels then constitutes one codeword – one for each end node. It is easily seen that the set of codewords thus constructed has the prefix-free property.

4.1 Huffman Coding

As stated previously, there exists a constructive algorithm which may be used to build an optimal code according to the above principles. The resulting code is the well-known Huffman code [7].

The Huffman algorithm is best illustrated by means of an example: Consider a source with 5 different possible symbol outcomes x1, …, x5, with corresponding probabilities given by

p1 = 0.30, p2 = 0.24, p3 = 0.20, p4 = 0.15, p5 = 0.11.   (3)

The Huffman code construction for this case is illustrated in Figure 6. As seen from the figure, the symbol probabilities are initially ordered from the largest to the smallest. The two least probable symbols (to begin with, symbols 4 and 5 in this case) are assigned a distinct binary code symbol each – 0 or 1 – to be able to distinguish between them. Then their probabilities are added, to form the probability of a “quasi-symbol” whose outcome is defined as “one of the two outcomes whose probabilities have just been added”. The code symbols just assigned can be used to distinguish between the two possible quasi-symbol outcomes.

Then the remaining probabilities are re-ordered, with the quasi-symbol probability being inserted at its proper place according to its size. The above procedure is thereafter repeated for the remaining symbols/quasi-symbols. Repeating the procedure several times, in the end there will be only two symbols/quasi-symbols left (such as is the case to the right in Figure 6), of which one is assigned a 1 and the other a 0.

Figure 5 Code tree for the binary prefix-free code {0, 10, 110, 111}. [Tree diagram: each branching is labelled 0 or 1; the label sequence from the top node down to an end node gives a codeword]

1) This is the case e.g. if p1 = 1 and p2 = … = pJ = 0.


The codewords which are used to distinguish between all original symbol outcomes can now finally be deduced by tracing the paths from the last two symbols back to each original symbol, while at the same time noting the binary labels given at each forward repetition of the procedure. In this example such backtracing gives the codes

C2 = {0, 1} for the two-symbol quasi-source,
C3 = {1, 00, 01} for the three-symbol quasi-source,
C4 = {00, 01, 10, 11} for the four-symbol quasi-source,
C5 = {00, 10, 11, 010, 011} for the original five-symbol source.   (4)

C5 is the Huffman code for the source.
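The procedure is easy to mechanize. The following Python sketch (ours, not from the paper) builds a Huffman code for the example source using a priority queue; since Huffman codes are not unique, the codewords may differ from those in (4), but the codeword lengths (2, 2, 2, 3, 3) and the average length of 2.26 bits/symbol are reproduced, close to the source entropy of about 2.24 bits/symbol:

    import heapq
    from math import log2

    def huffman_code(probs):
        # Each heap entry is (probability, tie_breaker, tree); a tree is a
        # symbol (leaf) or a pair of subtrees (internal node).
        heap = [(p, i, sym) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p1, _, t1 = heapq.heappop(heap)   # the two least probable nodes
            p2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (p1 + p2, count, (t1, t2)))
            count += 1
        code = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):       # internal node: branch on 0/1
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:                             # leaf: assign the codeword
                code[tree] = prefix
        walk(heap[0][2], "")
        return code

    probs = {"x1": 0.30, "x2": 0.24, "x3": 0.20, "x4": 0.15, "x5": 0.11}
    code = huffman_code(probs)
    avg_len = sum(probs[s] * len(code[s]) for s in probs)   # 2.26
    entropy = -sum(p * log2(p) for p in probs.values())     # ~2.24
    print(code, avg_len, entropy)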

4.2 Source Extension

In general, to obtain a code with an average codeword length coming arbitrarily close to the entropy, one must combine Huffman coding with source extension, a process in which vectors of source symbols are treated as single symbols from an “extended” source. Using vectors of length n, the extended source has J^n vector symbols. As an example, a third-order (n = 3) extension of a binary source with alphabet {0, 1} will yield a source with binary vector outcomes in the alphabet {000, 001, 010, 011, 100, 101, 110, 111}.

If the source is assumed memoryless, each vector is associated with a probability equal to the product of the probabilities of its individual elements. A Huffman code may now be constructed according to this new, extended set of probabilities. The larger n is used, the closer to the entropy we are able to come; in fact it is easy to show that the following bounds on the average codeword length hold for a Huffman code when n-fold source extension is used [5]:

H(S) ≤ l̄ ≤ H(S) + 1/n   [bits per symbol]   (5)

The above result is dependent on the fact that we know the symbol probabilities. Thus a limitation of the Huffman code is that the source’s probability distribution has to be known or estimated a priori in order for a code to be constructed. If this distribution is estimated incorrectly, the code will be less efficient – the average codeword length will be increased by an amount which can also be quantified exactly using information theoretic notions [5].

However, there also exist source code constructions which are able to asymptotically (as the number of symbols to be coded goes to infinity) reach the entropy bound even without any a priori knowledge of the source’s statistical properties. Such codes are called universal codes. The Lempel-Ziv algorithm [5] for compaction of large data files is by far the most used universal code. Note that universal codes do not necessarily provide compression for short symbol sequences.
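A quick way to see both effects is to run a dictionary-based compressor on long and short inputs. The sketch below uses Python’s zlib module (DEFLATE, i.e. LZ77 dictionary matching plus Huffman coding – a convenient stand-in for the Lempel-Ziv schemes discussed here, not the exact algorithm):

    import zlib

    long_data = b"abracadabra " * 10_000   # highly redundant, 120,000 bytes
    short_data = b"abracadabra"            # only 11 bytes

    print(len(zlib.compress(long_data)))   # a few hundred bytes: large gain
    print(len(zlib.compress(short_data)))  # ~19 bytes: longer than the input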

4.3 Sources with Memory

Most natural information sources are not memoryless; rather, the samples are correlated with each other. Indeed, this is what makes it possible for us as observers to make sense of the source output. A good example is again that of human languages: It is precisely the fact that there exist well-defined grammatical rules and structure, which lead to inter-word and inter-sentence correlation (i.e. memory, or statistical dependencies), which makes it possible to convey and understand information by means of a written text or a spoken message. How does the above theory apply to sources with memory?

4.3.1 Entropy Rate

The answer can loosely be said to be as follows: By applying source extension with sufficiently large vector lengths, any random source is essentially made into a memoryless extended source. There may be correlation or statistical dependency between individual elements within each vector, but not between two different vectors – or “extended symbols” – if these are long enough. To make this argument stringent in the general case, one needs to let the vector length n go to infinity, to account for correlation over arbitrary lags.

Figure 6 Construction of Huffman code for a 5-symbol source. Two arrows pointing to a single node denote the adding of two probabilities to form a “quasi-symbol”. The backtracing to find the codeword 010 for the symbol x4 is shown as dashed arrows, with corresponding binary labels marked by circles. [Diagram: the probability columns 0.30/0.24/0.20/0.15/0.11 merge stepwise into 0.26, 0.44 and 0.56; the resulting codewords are 00, 10, 11, 010, 011]


The minimum average number of bits per original source symbol is now equal to the entropy rate of the source,

H(S) = lim_{n→∞} (1/n) H(S^(n)),   (6)

where H(S^(n)) is the entropy of the source S extended to n-dimensional “supersymbols”. The entropy rate thus replaces the first order entropy as the natural measure of information content in a source with memory.

4.3.2 Markov Sources

One common way of modelling memory in sources is to use Markov models. This is an accepted model e.g. for natural speech production. A (discrete-time, discrete-amplitude) Markov model of order M has the property that its memory stretches back only M time instances. More precisely, the probability of a symbol outcome at time m is dependent on the outcomes at times m – 1, …, m – M, but not further back in time. Each possible set of the M previous outcomes constitutes a model state. The number of possible states is less than or equal to J^M, where J is the number of possible outcomes in the source alphabet. Statistically, the model is completely characterized by transition probabilities which together quantify the probability of each state being followed by each of the other states.

A Markov model is usually visualized by means of a state diagram. A state diagram of a simple 1st order Markov model with J = 3 source symbols a1, a2, a3 is depicted in Figure 7. Here, P_{ai|aj} is by definition the probability of the transition from aj to ai. If there is no arrow between two of the states, it means that that particular transition is impossible to make.

Figure 7 State diagram of a 1st order, 3-state Markov source. [Diagram: states a1, a2, a3 with transition arrows labelled by the probabilities P_{ai|aj}]

For Markov sources, there is a simple expression for the entropy rate. It is simply the expected value – with respect to the state distribution – of the entropy conditioned on being in a given state. Denoting the nth state by sn we obtain:

H(S) = ∑_{n=1}^{J^M} P(sn) H(X|sn),   (7)

where P(sn) = ∑_{l=1}^{J^M} P(sl) P(sn|sl), with ∑_{l=1}^{J^M} P(sl) = 1. This set of equations can be used to solve for the state probabilities P(sl), l = 1, ..., J^M, when the transition probabilities P(sn|sl) are given. The conditional entropy H(X|sn) is given by the usual entropy formula, only with state-conditional symbol probabilities replacing unconditional symbol probabilities:

H(X|sn) = −∑_{j=1}^{J} P(xj|sn) log2 P(xj|sn).   (8)

Equation (7) can now be used to find the ultimate limit of error-free compressibility for a Markov source, and a set of state-conditional Huffman codes can be used to provide efficient source compaction. That is to say, for each state we can construct an optimal Huffman code as before, but according to the conditional symbol probabilities corresponding to the state. The encoder will then switch between the different Huffman codes according to the state of the source.

4.4 Continuous-Amplitude Sources

Most “natural” sources not only have memory, but are analogue in nature, and hence produce data that are continuous both in time (or space) and amplitude. Examples are acoustic and electromagnetic waves as encountered e.g. in audio technology and in radio communications. Sampling and quantization operations convert such sources into time- and amplitude-discrete ones as discussed in previous sections. Of these operations, sampling can preserve all information as long as the source is band limited and the sampling theorem criterion, i.e. sampling at a rate of at least twice the bandwidth of the source, is fulfilled.


Quantization, however, always reduces information, as measured through the source entropy. In fact, while the entropy of a discrete source is always finite, every nontrivial continuous-amplitude source has infinite entropy. This merely reflects the fact that with a continuous probability density, every fixed amplitude value has probability zero, so it is impossible to predict (or even measure) with infinite resolution which values forthcoming samples will have.

The way to see that this is true mathematically is to model the continuous amplitude distribution as the result of a uniform quantization, with a quantization interval ∆ that goes to zero. This is illustrated for a Gaussian source in Figure 8.

The entropy may then be approximated by a sum over the quantized probability distribution. This sum converges to an integral over a source probability density function (pdf) fX(x) as ∆ goes to zero. The expression for the amplitude-continuous source entropy then becomes

H(S) = −∫_{−∞}^{∞} fX(x) log2 fX(x) dx − lim_{∆x→0} log2 ∆x.   (9)

The last term is an infinitely large positive constant, which is common to all analogue sources. This implies that the absolute entropy of an analogue source will be infinite.

The first integral term in the above equation, however, is finite and well defined, and merits its own name and notation: It is the differential entropy, h(S), of the source. The main reason for the importance of this term, which in itself has no fundamental physical interpretation 2), is the fact that several important information theoretic concepts, such as channel capacity, are defined in terms of differences between entropies.

In the continuous case, the difference between two (infinitely large) source entropies is always equal to the difference between the corresponding (finite) differential entropies, whose values can be computed. This situation arises because the term −lim_{∆x→0} log2 ∆x is the same for all continuous sources, and thus is cancelled out whenever a difference is computed.
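As a concrete example: a Gaussian source with variance σ² has differential entropy h(S) = (1/2) log2(2πeσ²) bits, a standard closed form which the following sketch (ours, for illustration) checks by direct numerical integration:

    import numpy as np
    from math import e, log2, pi

    sigma = 1.0
    x = np.linspace(-10.0, 10.0, 200_001)      # fine grid over the real line
    dx = x[1] - x[0]
    f = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * pi * sigma**2)

    h_numeric = -np.sum(f * np.log2(f)) * dx   # -integral of f log2 f
    h_closed = 0.5 * log2(2 * pi * e * sigma**2)
    print(h_numeric, h_closed)                 # both ~2.05 bits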

5 Information Transmission over a Channel

We now consider the second of the two perspectives previously introduced regarding information content: That of quantifying the amount of information transferred to a user from a source by means of a channel. In doing so, we need to introduce another information theoretic notion, that of mutual information. This is a function which can be said to measure the amount of useful source information received by a user. Here, it is important to emphasize the usability of information received, as the total “information content” received can stem from many sources, including noise and interference. This type of “information” is not only superfluous; as we shall see, it actually reduces the useful information.

Mutual information is one of the most fundamental information theoretic concepts. We will show later that it can be used to describe not only the capacity properties of communication channels, but also the compression possibilities for a given source when a certain amount of error (distortion) is accepted in the source representation.

5.1 The Mutual Information Function

Initially consider two given discrete, possibly statistically dependent random variables, X ∈ {x1, ..., xJ} and Y ∈ {y1, ..., yK}. We can think of X as the input to, and Y as the output from, a communication channel; i.e. Y is a “noisy” version of X. X and Y have probability distributions p = [p1, ..., pJ]^T and q = [q1, ..., qK]^T respectively.

2) In some cases, though, it can be seen as a measure of the “relative randomness” of a given source compared to another: For example, for two different Gaussian sources, the one with the largest variance (and hence the largest amplitude variation from sample to sample) will have the largest differential entropy. However, it is not possible to successfully generalize this interpretation to two sources with different functional forms of their probability densities.

Figure 8 A Gaussian pdf with expected value 0, variance 1, with P(0.5 ≤ x ≤ 1) shown coloured, together with the approximation (quantization) pX(0.75) ⋅ 0.5 (i.e. ∆ = 0.5). [Plot omitted]


The conditional probability distribution of xj, given yk, is denoted Pj|k. Pj|k will then describe the probability of xj being transmitted if yk is observed at the receiver.

The mutual information between X and Y is now defined as

I(X;Y) = H(X) – H(X|Y), (10)

where H(X) is the entropy of the source that outputs X, and, by definition,

H(X|Y) = ∑_{k=0}^{K−1} qk H(X|yk) = −∑_{j=0}^{J−1} ∑_{k=0}^{K−1} qk Pj|k log2 Pj|k.   (11)

H(X|Y) should be read “the entropy of X when Y is observed”. It is a conditional entropy, which means that it is based on the a priori knowledge of some information – in this case Y. It is simply a measure of the average information content (uncertainty) which is left in X when Y is known.

Then, the mutual information I(X;Y) can be thought of as the information Y gives about X – the average reduction in the observer’s uncertainty about X which is brought about by observing Y. If X is the input to, and Y is the output from, a given communication channel, H(X|Y) can then be thought of as information “lost” by the channel.

For an ideal (i.e. noiseless) channel Y would of course be equal to X, in which case it is easy to show from the definition that H(X|Y) is zero. Hence I(X;Y) = H(X), as of course is intuitively correct in this case: All transmitted information has been received; X has been fully determined by observing Y.

Correspondingly, if X and Y are independent variables, Y carries no information about X, and knowing Y cannot reduce our uncertainty about X. In a communication context this would correspond to the received signal being completely buried in noise. Hence it is intuitive that H(X|Y) = H(X) – all information is lost during transmission. Thus the intuitive result would be I(X;Y) = 0 in this case. This also turns out to be the case if the above formulae are applied. For most practical channels, 0 < I(X;Y) < H(X).

It is interesting to note that the mutual information is a symmetric function, i.e. I(X;Y) = I(Y;X) = H(Y) – H(Y|X). In some cases, like when computing the channel capacity of a channel with input X and output Y, this might actually be a more useful form of the function. Physically, it means that the transmitter’s degree of uncertainty regarding what will be received is the same as the receiver’s uncertainty regarding what was sent.

The above theory can be beautifully and simply illustrated by means of a Venn diagram, as shown in Figure 9. The joint entropy H(X,Y) referred to in this figure is the total average uncertainty an observer will encounter regarding the simultaneous outcomes of X and Y.
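These identities are easy to check numerically. The following sketch uses a made-up joint distribution (a uniform bit observed through a binary symmetric channel with error probability 0.1) and computes I(X;Y) as H(X) + H(Y) − H(X,Y), which by the diagram equals H(X) − H(X|Y):

    import numpy as np

    # Joint pmf P(X=j, Y=k) for the illustrative example described above.
    p_xy = np.array([[0.45, 0.05],    # P(X=0, Y=0), P(X=0, Y=1)
                     [0.05, 0.45]])   # P(X=1, Y=0), P(X=1, Y=1)

    def H(p):
        # Entropy in bits of a probability vector; zero terms contribute 0.
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    H_X = H(p_xy.sum(axis=1))         # marginal entropy of X
    H_Y = H(p_xy.sum(axis=0))         # marginal entropy of Y
    H_XY = H(p_xy.flatten())          # joint entropy H(X,Y)
    I = H_X + H_Y - H_XY              # mutual information
    print(I)                          # ~0.531 bits, i.e. 1 - H(0.1)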

5.2 Mutual Information in the Continuous Case

Mutual information is an example of a concept defined in terms of a difference between two entropies. If the variables X and Y are continuous random variables (with infinite absolute entropies), this difference reduces to a difference between differential entropies. The formula becomes

I(X;Y) = h(X) − h(X|Y) = −∫∫ fXY(x, y) log2 [fX(x) / fX|Y(x|y)] dx dy,

where fXY(x, y) is the joint pdf of X and Y. Both integrals are taken from −∞ to ∞. Through the use of Bayes’ rule [4] we may manipulate this formula in various ways to incorporate the probability densities that are most practical to use in a given situation. We shall now see how the mutual information function can be used to find the capacity of a general communication channel.

Figure 9 Visualization of the relationship between entropies, conditional entropies, mutual information and joint entropy for two variables X and Y. [Venn diagram: the circles H(X) and H(Y) overlap in I(X;Y); the non-overlapping parts are H(X|Y) and H(Y|X); the union is H(X,Y)]

Figure 10 Channel encoding and decoding as mappings. [Block diagram: information message → channel encoder C → sequence of channel symbols → channel with noise and distortion → sequence of noisy channel symbols → channel decoder D → decoded information message]


6 Channel Capacity and Coding

As discussed in the introduction, most of the physical channels we transmit information over are subject to noise and distortion of various kinds, resulting in errors in the received waveforms/channel symbols. For the sake of simplicity we shall here be content to study memoryless channels, which transmit discrete- or continuous-valued symbols in discrete time intervals 3). Memorylessness of a channel means that the added noise at a given time instance does not influence the channel output in any other time instances.

3) This is a valid description also for a channel transmitting continuous-time waveforms, if it is perfectly bandlimited and the Nyquist criterion is fulfilled.

6.1 Channel Coding

Channel coding means applying a certain mapping (the channel encoder C) from the source alphabet S to a channel codeword alphabet C, and a mapping from C back to S (the channel decoder D). In a communication system the mappings are applied as depicted in Figure 10.

The two mappings should be designed to ensure that transmission errors in the symbol sequences produced by the channel encoder will not necessarily result in errors in the decoded source symbol sequences (information messages) produced by the channel decoder. Examples of classical channel codes are algebraic forward-error-correcting codes such as BCH codes, and convolutional codes [8]. More recently the field has been revolutionized by the advent of iterative decoding, especially as applied to turbo codes and low-density parity check codes. We refer to the paper [9] by Øyvind Ytrehus in this issue of Telektronikk, and the references therein, for a tutorial introduction to these fields.

An important question now is:

Under what circumstances, if any, is it theoretically possible to design a channel coding system (encoder + decoder) such that the overall transmission of information from source to user becomes as reliable as we desire?

The answer lies in the capacity of the channel, and in Shannon’s second coding theorem, also known as the channel coding theorem. In order to introduce this result, we need some notation, and a statistical model for the channels under study.

6.2 Channel Modelling

A discrete memoryless channel may be described by its transition probability distribution, i.e. a description of how probable it is that the various input symbols to the channel emerge from the channel as each of the possible output symbols. Consider a discrete channel having an input alphabet A = {a0, …, aJ−1} and an output alphabet B = {b0, …, bK−1}. A practical example is a channel where BPSK modulation symbols are transmitted (J = 2), while the channel adds continuously distributed noise, and the receiver performs quantization to, say, K = 8 levels, of the noisy (and thus amplitude-continuous) received symbols. Such quantization must be done in order to facilitate the use of digital processing during subsequent decoding.

Such a channel may be described by a probability transition matrix P whose (k,j)-element is Pk|j, the probability that the channel output is bk when the input was aj. It is also common to visualize such a channel by an input-output transition diagram as exemplified in Figure 11.

Figure 11 Input-output transition diagram for a discrete memoryless channel (J = 2, K = 3). [Diagram: inputs a1, a2 connect to outputs b1, b2, b3 via arrows labelled with the transition probabilities Pbk|aj]

For additive-noise channels transmitting continuous-amplitude symbols we may write the channel output as Y = X + Z, where X is the channel input and Z is the additive noise. The probability density function of the noise, fZ(z), then describes the channel.
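To illustrate how such a transition matrix arises in practice, the sketch below simulates BPSK inputs (±1) in additive Gaussian noise and estimates the rows of P for a 3-level output quantizer; the noise level and decision thresholds are arbitrary choices for the example, not values from the paper:

    import numpy as np

    rng = np.random.default_rng(0)
    n, sigma, t = 100_000, 0.8, 0.5

    for a in (-1.0, +1.0):                      # channel input symbol a_j
        y = a + sigma * rng.standard_normal(n)  # additive Gaussian noise
        # Quantize: 0 if y < -t, 1 if -t <= y < t ("erasure"), 2 if y >= t.
        b = np.digitize(y, [-t, t])
        print(a, np.bincount(b, minlength=3) / n)  # one estimated row of P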

6.3 The Channel Capacity of a Memoryless Channel

For a given memoryless channel, we now introduce the notion of channel capacity at cost S. The most obvious “cost” in a communication system is perhaps an upper limit on the average symbol power available for transmission over the channel. Another possible cost is that of bandwidth. For simplicity, we consider the case where the bandwidth is given, and thus is not a subject for optimization.




The capacity at cost S is still defined in terms of a maximum of the mutual information I(P;Q) between channel input and output 4), as

C(S) = max_{P ∈ P_S} I(P;Q)   [bits per channel use]   (12)

where P_S is the set of all possible channel symbol distributions (discrete or continuous depending on the channel) such that the average cost per channel use, E[s], is less than or equal to the constant S. P is our notation for an arbitrary channel symbol distribution, while Q denotes the channel’s statistical model – here assumed given. Note that the probability distribution P which actually achieves the maximum is the distribution of symbols that the channel encoder must approximate if we are to achieve performance close to the capacity limit in practice.

We shall now state the channel coding theorem, which shows the physical significance of the capacity C(S):

The channel coding theorem: Let C(S) be the capacity of a memoryless channel at cost S. For any R < C(S), and for any desired reliability of transmission over the channel (as measured through probability of channel decoding error) there exists a channel code of rate R (information bits per channel symbol) such that the desired reliability may be obtained. For rates R > C(S) no such codes exist, and there will always be a nonzero probability of decoding errors.

What was completely novel about this result, compared to the usual line of thinking before Shannon’s papers, was that it shows that the achievable transmission rate is not a function of the desired degree of communication reliability. Either communication can be made reliable, or it cannot. In the case that it can, it can be made as reliable as we want: Methods for achieving any desired reliability have been proven to exist for transmission rates all the way up to the (constant) channel capacity – at the cost of increased system complexity. Conversely, it has been proven that there are no transmission schemes to be found which can guarantee any degree of reliability if one attempts to transmit above capacity.

The channel coding theorem holds for both discrete and continuous channels. It is, perhaps, chiefly an existence theorem; in other words, it does not exactly tell us how to construct our channel coding systems – beyond imposing necessary constraints on the channel symbol distribution.

However, it has indeed been demonstrated that when one seeks to meet these constraints, by so-called shaping [10], the system performance does improve considerably over that of the more common systems where simple uniform symbol distributions are used regardless of the channel. Also, it is worth mentioning that the high-performance turbo codes are – perhaps almost by accident, but nonetheless – constructed in a way which owes a great deal to techniques used by Shannon when proving the channel coding theorem. It may seem that information theory can be quite constructive after all, if only designers achieve enough insight into its results.

6.3.1 The Delay Problem

A limiting factor of the channel coding theorem, and indeed of many information theoretic results, is that if rates close to the capacity are desired, we may have to resort to coding extremely long blocks of symbols at a time in order to obtain the desired reliability. This will yield a very complex system with long coding delays. Much work has been done on so-called delay-constrained communication systems, and the corresponding capacity limits. The paper by Pål Orten and Bjarne Risløw [11] in this issue of Telektronikk contains a discussion of how additional delay constraints limit the achievable capacity.

6.4 The Capacity of a Binary Symmetric Channel

The simplest example of a communication channel one can think of is the binary symmetric channel. This channel, which models well e.g. the storage on a CD, or an optical fibre with binary modulation, takes binary symbols (denoted 0 and 1, for simplicity) as input and transmits them with a symmetric probability of error, denoted p. That is to say, the probability of a transmitted 0 being received as a 1 is p, as is the probability of a 1 being received as a 0. Note that p can be limited to [0, 0.5] since, if p were higher than 0.5, the bit error rate could be improved by inverting all the received bits. The probability transition diagram for this channel is depicted in Figure 12.

Figure 12 A binary symmetric channel. [Diagram: 0 → 0 and 1 → 1 with probability 1 − p; 0 → 1 and 1 → 0 with probability p]

4) The mutual information is really a function of the input distribution and the channel transition probability properties, not the random input and output variables themselves – therefore we often write it as I(P;Q) instead of I(X;Y).


The channel capacity is achieved when the input symbols are uniformly distributed, i.e. P = {0.5, 0.5}. The maximum mutual information achieved is

C = 1 + p log2(p) + (1 − p) log2(1 − p)   [information bits per channel bit]   (13)

In other words, the capacity is C = 1 − H(p), where H(p) = −p log2(p) − (1 − p) log2(1 − p) is the entropy of the “channel noise source”. The higher p is (up to 0.5), the more noise there is on the channel, and the smaller the capacity is. At p = 0.5 there is equal probability that every received bit is correct or wrong, so there is no chance of correct decoding. Hence, the capacity is zero. At p = 0 and p = 1 the capacity attains its maximum of 1, meaning that the channel is noiseless (in the case of p = 1 every single transmitted bit is inverted by the channel, but of course these “errors” can all easily be corrected by the receiver!). Thus, in these special cases no error protection (error control coding) is needed, and every transmitted channel bit can be an information bit.
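Equation (13) is a one-liner to evaluate; the following sketch reproduces the endpoints and the zero at p = 0.5 discussed above:

    from math import log2

    def bsc_capacity(p):
        # C = 1 - H(p) information bits per channel bit (Equation (13)).
        if p in (0.0, 1.0):
            return 1.0          # noiseless (p = 1 just inverts every bit)
        return 1.0 + p * log2(p) + (1.0 - p) * log2(1.0 - p)

    for p in (0.0, 0.01, 0.1, 0.5):
        print(p, bsc_capacity(p))   # 1.0, ~0.919, ~0.531, 0.0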

6.5 The Capacity of a Gaussian Memoryless Channel

As another very important example, let us consider a memoryless continuous-amplitude channel with additive white Gaussian zero-mean noise (AWGN) of power (variance) N, statistically independent of the channel input. That is to say, the noise power is uniformly distributed in frequency (hence white noise), while the noise samples follow a Gaussian distribution.

This noise model is valid for all cases where there are many independent sources of noise; in particular it models thermal noise in electronic components well. In a cellular mobile radio system with many active users transmitting in the same frequency band, it is also a good approximation when modelling interference between users.

We furthermore assume that the channel isbandlimited to B Hz and that the signals to besent are critically sampled, i.e. at fs = 2B Hz.This is a good model e.g. for a telephone linechannel. For such a channel, Shannon derivedthe following famous formula for the capacityat received signal power S (which is the same astransmitted power if, as is usually done for thischannel model, we assume that no signal attenu-ation is present in the channel):

C(S) = B ⋅ log2(1 + S/N) = B ⋅ log2(1 + SNR), (14)

measured in information bits per second. SNR denotes channel signal-to-noise ratio; i.e. the ratio between received signal power and noise power.
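As a quick numerical illustration of Equation (14) – a sketch with assumed example values, in the same spirit as the previous one – the capacity of a 3000 Hz channel grows only logarithmically with the SNR:

import math

def awgn_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Equation (14): C = B log2(1 + SNR), in information bits per second.
    Note that snr_linear is a power ratio, not a dB value."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

for snr_db in (10, 20, 30, 40):
    snr = 10.0 ** (snr_db / 10.0)
    print(f"SNR = {snr_db} dB: C = {awgn_capacity(3000.0, snr):8.0f} bits/s")

Each extra 10 dB of SNR buys only about B ⋅ log2(10), roughly 10 000 bits/s here, whereas capacity grows linearly with bandwidth at fixed SNR.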

This formula, whose fascinating history is traced in the paper by Lars Lundheim [12] in the present issue of Telektronikk, has become so well known that it can sometimes be found misinterpreted as expressing the capacity of any noisy channel – which it does not. It is visualized in Figure 14, for various values of the channel SNR. From the derivation, not to be done here, it is evident that if we are to actually achieve this capacity in practice, we must use a signalling alphabet with channel codewords that also follow a Gaussian distribution, with zero mean and variance S.


Figure 14 Capacity (bits per second) as a function of channel signal-to-noise ratio (SNR, linear scale) for a memoryless, bandwidth- and power-limited channel with additive Gaussian noise. The capacity is depicted for bandwidths ranging from 1000 Hz (lower curve) to 4000 Hz (upper curve), in steps of 1000 Hz

Figure 13 Channel capacity of a binary symmetric channel, as a function of the transition probability p




The capacity curves shown in Figure 14 seem to imply a linear, unlimited capacity growth as the bandwidth increases. A weakness of this figure, however, is that it does not take into account the practical fact that thermal (white) noise power in a communication system is proportional to the system bandwidth. Thus, operating at the same SNR at different bandwidths means that the transmit power is different for each bandwidth. A more “fair” comparison from a resource point of view is obtained by normalizing the SNR with respect to the bandwidth, i.e. expressing the SNR as the ratio of transmitted signal power S to noise power per Hertz bandwidth, denoted N0 [W/Hz]. Doing this normalization, we obtain the channel capacity on an AWGN channel as follows:

C = B ⋅ log2(1 + S/(N0B)) [information bits/s] (15)

which implies a finite (assuming finite transmit power) asymptotic upper bound on the capacity as the bandwidth is increased to infinity:

C(B = ∞) = S/(N0 ⋅ ln(2)). (16)
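The saturation expressed by Equation (16) is striking when computed numerically. Below is a small sketch, with assumed example values S = 1 W and N0 = 10^-6 W/Hz, showing how the capacity (15) flattens out towards the limit (16) as B grows:

import math

S, N0 = 1.0, 1e-6  # assumed signal power [W] and noise power density [W/Hz]

def capacity(bandwidth_hz: float) -> float:
    """Equation (15): C = B log2(1 + S/(N0 B)), in information bits per second."""
    return bandwidth_hz * math.log2(1.0 + S / (N0 * bandwidth_hz))

for b in (1e5, 1e6, 1e7, 1e8):
    print(f"B = {b:>11,.0f} Hz: C = {capacity(b) / 1e6:.3f} Mbit/s")
print(f"B -> infinity: C = {S / (N0 * math.log(2)) / 1e6:.3f} Mbit/s (Equation (16))")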

For a desired actual transmission rate R ≤ C [information bits/second] this can be used to obtain a lower bound on the transmit energy per information bit which must be used if error-free transmission is to be achieved at this rate. The energy spent per information bit is

Eb = S/R [J/information bit]. (17)

Thus, S = EbR, so, from Equation (15)

R/B ≤ C/B = log2(1 + EbR/(N0B)), (18)

or

Eb ≥ N0 ⋅ (2^(R/B) – 1)/(R/B). (19)

The absolute lower bound for error-free transmission over the channel using any transmission scheme is obtained from this equation by letting the actual transmission rate go to zero:

min Eb = N0 ⋅ ln(2). (20)

This is often expressed as the Shannon limit for reliable transmission on an AWGN channel,

min Eb/N0 = ln(2), (21)

or –1.6 dB on the decibel scale. Figure 15 illustrates the upper bound (18) on the achievable rate per Hz of bandwidth, as a function of the signal-to-noise ratio per information bit. It can be seen that the upper rate bound goes asymptotically to zero as the SNR approaches the Shannon limit.

Figure 15 Capacity limit (R/W) of an AWGN channel as a function of the signal-to-noise ratio per information bit, Eb/N0 [dB], showing the performance bound, the region of realizable systems, and the Shannon limit at –1.6 dB
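Equation (19) and the Shannon limit are easily tabulated. The sketch below (the function name and the sample spectral efficiencies are our own choices) evaluates the minimum Eb/N0 in dB for a few values of R/B:

import math

def min_ebno_db(spectral_efficiency: float) -> float:
    """Equation (19): minimum Eb/N0 (in dB) for error-free transmission
    at a spectral efficiency of R/B bits/s/Hz."""
    r = spectral_efficiency
    return 10.0 * math.log10((2.0 ** r - 1.0) / r)

for r in (4.0, 2.0, 1.0, 0.5, 0.01):
    print(f"R/B = {r:>4}: Eb/N0 >= {min_ebno_db(r):6.2f} dB")

As R/B goes to zero the bound tends to 10 log10(ln 2) = –1.59 dB, i.e. Equations (20)–(21).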

7 Source Compression and Rate Distortion Theory

At the beginning of this paper we studied data compaction, the process of efficient, error-free source representation. However, in many applications we are willing, or even forced, to accept some distortion in the source output – such as blurring or blocking effects in an image, or slightly “synthetic” quality, loss of treble or background noise in a speech signal – in order to fit the source data into a given channel, be that a magnetic disc, a telephone line, or a radio link.

As we have seen, each such channel is characterized by a finite capacity which provides an upper bound on the amount of information per channel symbol (or per unit time) which can be reliably transmitted. If we have a source whose information content is higher than the capacity of the channel we want to use, we essentially have two choices:

1. Transmit at a rate higher than the source entropy rate (and thus higher than the channel capacity). Admittedly, this enables us to send



the source data error-free into the channel, but we are then forced to accept the decoding errors which invariably will occur after transmission over the channel. These errors will typically be outside of our control, and are stochastic in nature.

2. Before transmission, reduce the information content of the source in a controlled way (thus introducing a type of distortion over which we have some amount of control, e.g. low pass filtering or coarser amplitude quantization) until it is lower than the channel capacity. The quality-reduced but hopefully still usable source can then in principle be reliably transmitted over the given channel.

Developing good methods for implementing choice 2 is the task of source compression. In general, source compression is performed by a source coder-decoder pair, which is placed within a complete communication system as shown in Figure 16.

Figure 16 Information encoding and decoding operations separated into source and channel encoding and decoding

The idea that the tasks of source and channel coding can be separated as shown above, without losing overall system optimality, but greatly reducing design complexity, is rooted in the separation principle, originally devised by Shannon. However, it is important to realize that this principle holds only when there are no constraints on computational complexity or overall delay in the system. For real systems where such practical constraints are imposed, there might be performance advantages in a joint design of source and channel coding. The paper by Tor A. Ramstad [13] in this Telektronikk issue addresses this problem.

7.1 Scalar Quantization

The very simplest way of compressing a source’s information content is that of scalar amplitude quantization. A scalar quantizer Q is a nonlinear operation which is used, on a single-symbol basis, to limit the source alphabet to a finite, countable set – i.e. for each real-valued input symbol x, which may be taken from a continuous distribution, the quantizer outputs one of only a limited number N of amplitude levels or reproduction values. The output y = Q(x) is in each case chosen based on a nearest neighbour rule. That is, for each input symbol, the quantizer simply outputs the quantized output level which is closest to the input on the real line. It is common to visualize a scalar quantizer by means of its quantizer characteristic, which simply depicts the input-output relation y = Q(x) as a function.

In Figure 17 a non-uniform scalar quantizer characteristic is shown. A quantizer is said to be non-uniform when the quantization intervals – the intervals between the possible output levels – have varying length. This is beneficial when quantizing sources with non-uniform pdfs. One important example is the CCITT A-law used for speech in Pulse Code Modulation (PCM) for telephony [14]. Shown here is a 3-bit quantizer, which simply means that there are 8 = 2^3 output levels, which can be uniquely indexed by means of 3-bit strings or binary codewords. Thus the information content of the quantized source in this case is not more than 3 bits per source symbol, whereas the information content before quantization was infinitely large.

Figure 17 3-bit non-uniform scalar quantizer characteristic
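A nearest-neighbour scalar quantizer takes only a few lines of code. The sketch below uses a hypothetical 3-bit level set that is denser near zero; the levels are illustrative and are not the actual A-law values:

def quantize(x: float, levels: list[float]) -> float:
    """Nearest-neighbour rule: return the reproduction level closest to x."""
    return min(levels, key=lambda y: abs(x - y))

levels = [-3.0, -1.5, -0.7, -0.2, 0.2, 0.7, 1.5, 3.0]  # 8 = 2^3 illustrative levels
for x in (0.05, -0.9, 2.4):
    print(f"Q({x}) = {quantize(x, levels)}")

Each output can be indexed by a 3-bit codeword, so the quantized source carries at most 3 bits of information per source symbol.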

7.2 Rate Distortion Theory

The branch of information theory that provides the foundation for source compression – i.e. the bounds we search to reach when we compress e.g. an image or an audio signal – is called rate distortion theory. This theory is first and foremost concerned with the following question:



For a given source S, and a given representation rate R [code symbols/source symbol], find the minimal distortion D that can theoretically be attained in the reproduction of S – or, equivalently: For a given maximal distortion D we are willing to accept in the reproduction of S, what is the minimal rate R which is theoretically possible to use?

The answer to the above question(s) is given by the rate distortion (or source distortion) function R(D), which for a given (discrete or continuous) source S is also defined in terms of the mutual information function:

R(D) = min_(Q∈QD) I(S;Q). (22)

Here QD is the set of all possible noisy test channels that yield an average symbol distortion E[d] less than or equal to D when the source S is transmitted through the channel.

That is, one imagines that the distortion due to compression of the source has occurred due to the transmission of the source over some (imaginary) noisy channel, described by a transition probability matrix or noise distribution – a channel that gives rise to a distortion not larger than D. All channels for which this is possible to achieve are considered; then one chooses the one channel where the lowest transmission rate could be used. This minimal rate is then R(D), and the noise characteristics of the corresponding channel tell us how the distortion due to coding should be distributed among the source symbols.

The challenge when designing good compression algorithms is then to find an actual algorithm which results in reproduction noise with the same noise characteristics as those of the imaginary channel mentioned above.

The choice of symbol distortion measure, d, is arbitrary and may be made by the designer. In practice, one often uses the mean square distortion between original and decompressed source symbols, i.e. d = (X – Y)^2, such that E[(X – Y)^2] ≤ D. This is mainly due to the computational tractability and simplicity of this measure.

The exact physical significance of the rate distortion function is summed up in Shannon’s third coding theorem, also termed the source-compression theorem:


The source-compression theorem

Let S be a memoryless source with rate distortion function R(D). Then, for every D > 0 it is possible to find a source code of rate r > R(D) such that the source after being encoded and decoded with this code is reproduced with average symbol distortion less than or equal to D. If r < R(D) there exists no code such that this is possible.

Again, the theorem holds for both discrete and continuous sources. It does not tell us exactly how to find practically realizable source codes that approach the rate-distortion function for a given source. However, from the proof of the theorem, as for the case of channel coding, it is evident that we may have to consider encoding symbol blocks of possibly infinite length n (and hence infinite delay and coder complexity) in order to do this.

As an example, consider a continuous, memoryless source with symbol outcomes following a Gaussian distribution N(0, σ2). This is a reasonably valid model e.g. for sub-band samples coming from a well-designed analysis filterbank [15]. For such a source, the rate distortion function is given by

R(D) = (1/2) ⋅ log2(σ2/D), for 0 ≤ D ≤ σ2 (23)

In Figure 18 this function is depicted, for σ2 = 1.

Figure 18 The rate-distortion function R(D) for a memoryless N(0, 1) source, plotted against the quadratic per-symbol distortion D

Even though closed form expressions for the rate distortion function may be found for only relatively simple source models [6], its basic general properties may be derived from the formal definition:



The function always goes from the rate value H(S) (which here is infinite, since the source is continuous) at distortion 0, to rate 0 at some finite maximal distortion Dmax. Between these two points the function is always strictly decreasing and convex; hence also continuous. This implies that we can be sure that if we increase the rate, it is always possible to obtain a source representation that is strictly improved, at least in a signal-to-noise ratio sense. The relative improvement will be largest at low rates.
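For the Gaussian source these properties are easy to see numerically. Below is a minimal sketch of Equation (23); the function name and the sample distortion values are ours:

import math

def rd_gaussian(distortion: float, variance: float = 1.0) -> float:
    """Equation (23): R(D) = 0.5 log2(variance/D) bits per source symbol;
    R(D) = 0 for D >= Dmax = variance."""
    if distortion >= variance:
        return 0.0
    return 0.5 * math.log2(variance / distortion)

for d in (1.0, 0.5, 0.25, 0.1, 0.01):
    print(f"D = {d}: R(D) = {rd_gaussian(d):.3f} bits/symbol")

Note that halving D always costs exactly half a bit more per symbol, and that R(D) grows without bound as D approaches zero, as seen in Figure 18.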

7.3 OPTA – Optimum Performance Theoretically Attainable

The capacity bound for noisy channels and the rate distortion bound for compression may be combined to produce what is commonly referred to as the Optimum Performance Theoretically Attainable, or OPTA. This is the best performance that can ever be achieved in a system where a certain source is to be transmitted over a given channel. If the rate distortion function of the source is R(D), and the channel capacity is C, the minimal distortion that can be achieved is found by solving the equation C = R(D) with respect to D.

For an AWGN channel where the channel symbol rate is 1/Tc [channel symbols/s] and the source symbol rate is 1/Ts [source symbols/s], this solution can be found in a simple closed form [13]:

Do = S ⋅ (1 + S/(N0B))^(–Ts/Tc), (24)

with corresponding maximal SNR after decoding given by

SNRo = S/Do. (25)

In Figure 19 this optimally achievable SNR after decoding is shown, for various ratios between the source and channel symbol rates. It is seen that the more channel bandwidth spent (bandwidth expansion), the better the fidelity. This is because bandwidth expansion allows for the use of more powerful error control schemes.

Figure 19 Optimal signal-to-noise ratio (dB) after decoding, as a function of the channel signal-to-noise ratio (dB), for a Gaussian memoryless source transmitted over a memoryless, bandlimited AWGN channel. Solid line: Ts = Tc/2 (bandwidth reduction during transmission). Dashed line: Ts = Tc. Dash-dotted line: Ts = 2Tc (bandwidth expansion during transmission)
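Combining Equations (24) and (25) gives SNRo = (1 + S/(N0B))^(Ts/Tc). The sketch below evaluates this for the three symbol-rate ratios of Figure 19; the 20 dB channel SNR is an assumed example value:

import math

def opta_snr_db(channel_snr_linear: float, ts_over_tc: float) -> float:
    """Equations (24)-(25): SNR_o = (1 + channel SNR)^(Ts/Tc), in dB."""
    return 10.0 * math.log10((1.0 + channel_snr_linear) ** ts_over_tc)

gamma = 10.0 ** (20.0 / 10.0)  # assumed channel SNR of 20 dB, on a linear scale
for ratio, label in ((0.5, "bandwidth reduction"),
                     (1.0, "equal symbol rates"),
                     (2.0, "bandwidth expansion")):
    print(f"Ts/Tc = {ratio}: SNR_o = {opta_snr_db(gamma, ratio):5.1f} dB ({label})")

Doubling the number of channel symbols per source symbol doubles the post-decoding SNR in dB, which is the benefit of bandwidth expansion seen in the figure.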

8 Concluding Remarks

This paper has merely scratched the surface of a huge and continuously expanding field, and tried to convey some basic insights into how information theory works. There is a lot more to be gained from studies of information theoretic topics than what could possibly be included here, e.g. error analysis of communication systems, construction and performance analysis of channel and source codes, optimization of quantization schemes, and so on.

Information theory does of course not provide us with all the answers to questions regarding communication system design. However, it does often provide us with important pointers as to what we should or should not do. As an example, consider modern telephone line modem standards utilizing trellis coded modulation (TCM) techniques [16], with throughput rates up to 33.6 kbit/s. Fifteen years ago, it was thought “impossible” to implement such modem rates in practice due to channel noise problems. Yet the capacity theorems of information theory have all the time indicated that the fundamental attainable rate limit for error-free transmission is far higher – approximately 60 kbit/s for today’s telephone lines. The modern high-rate modems would have been unimaginable without the insights that information theory provides, as would the knowledge of the actual potential to be gained.

The same can be said for most modern channel coding, modulation, compression, and compaction techniques. It is thus clear that information theory can be of much more practical use than what is suggested by the perhaps too common picture many communication engineers have of the field, as a set of theoretical bounds, attainable only with the aid of both unknown, and possibly infinitely complex, algorithms. It is a challenge to those of us who are teaching information theoretic concepts to replace this picture with a more positive and useful one.

Finally, we remark that there are still many challenges for information theory, and communication system design problems which might benefit from its insight. We would particularly like to mention the following fields as exciting areas for information theoretic research in the future:



• Design and performance analysis of multiuser communication networks and MAC protocols;

• Time-varying and frequency-dispersive wireless channels, particularly multiple-input-multiple-output (MIMO) channels;

• Diversity issues in general.

It seems quite clear that information theory still has an important role to play if tomorrow’s communication systems are to live up to the expectations placed on them. Shannon’s ideas are likely to cast long shadows into the 21st century.

References

1 Shannon, C E. A mathematical theory of communication. Bell Syst. Tech. J., 27, 379–423 and 623–656, 1948.

2 Shannon, C E. Communication in the presence of noise. Proc. IRE, 37, 10–21, Jan. 1949.

3 Shannon, C E. Coding theorems for a discrete source with a fidelity criterion. IRE Nat. Conv. Rec., 142–163, Mar. 1959.

4 Blahut, R E. Principles and Practice of Information Theory. Reading, MA, Addison-Wesley, 1987.

5 Cover, T M, Thomas, J A. Elements of Information Theory. New York, Wiley, 1991.

6 Berger, T. Rate Distortion Theory. Englewood Cliffs, NJ, Prentice-Hall, 1971.

7 Huffman, D A. A method for the construction of minimum redundancy codes. Proc. IRE, 40, 1098–1101, 1952.

8 Blahut, R E. Theory and Practice of Error Control Codes. Addison-Wesley, 1984.

9 Ytrehus, Ø. An Introduction to Turbo Codes and Iterative Decoding. Telektronikk, 98 (1), 65–77, 2002. (This issue.)

10 Forney, G D Jr. Trellis shaping. IEEE Trans. on Information Theory, 38 (2), 1992.

11 Orten, P, Risløw, B. Theory and Practice of Error Control Coding for Satellite and Fixed Radio Systems. Telektronikk, 98 (1), 78–91, 2002. (This issue.)

12 Lundheim, L. On Shannon and “Shannon’s formula”. Telektronikk, 98 (1), 20–29, 2002. (This issue.)

13 Ramstad, T. Shannon Mappings for Robust Communication. Telektronikk, 98 (1), 114–129, 2002. (This issue.)

14 Haykin, S. Communication Systems. Wiley, 1994 (3rd ed.).

15 Ramstad, T A, Aase, S O, Husøy, J H. Subband Compression of Images – Principles and Examples. North Holland, Elsevier, 1995.

16 Biglieri, E et al. Introduction to Trellis-coded Modulation with Applications. New York, Macmillan, 1991.


On Shannon and “Shannon’s Formula”
Lars Lundheim

The period between the middle of the nineteenth and the middle of the twentieth century represents a remarkable period in the history of science and technology. During this epoch, several discoveries and inventions removed many practical limitations of what individuals and societies could achieve. Especially in the field of communications, revolutionary developments took place such as high speed railroads, steam ships, aviation and telecommunications.

It is interesting to note that as practical limitations were removed, several fundamental or principal limitations were established. For instance, Carnot showed that there was a fundamental limit to how much energy could be extracted from a heat engine. Later this result was generalized to the second law of thermodynamics. As a result of Einstein’s special relativity theory, the existence of an upper velocity limit was found. Other examples include Kelvin’s absolute zero, Heisenberg’s uncertainty principle and Gödel’s incompleteness theorem in mathematics. Shannon’s channel coding theorem, which was published in 1948, seems to be the last one of such fundamental limits, and one may wonder why all of them were discovered during this limited time-span. One reason may have to do with maturity. When a field is young, researchers are eager to find out what can be done – not to identify borders they cannot pass. Since telecommunications is one of the youngest of the applied sciences, it is natural that the more fundamental laws were established at a late stage.

In the present paper we will try to shed some light on developments that led up to Shannon’s information theory. When one compares the generality and power of explanation of Shannon’s paper “A Mathematical Theory of Communication” [1] to alternative theories at the time, one can hardly disagree with J.R. Pierce who states that it “came as a bomb” [4]. In order to see the connection with earlier work, we will therefore focus on one particular case of Shannon’s theory, namely the one which is sometimes referred to as “Shannon’s formula”. As will be shown, this result was discovered independently by several researchers, and serves as an illustration of a scientific concept whose time had come. Moreover, we will try to see how development in this field was spurred by technological advances, rather than theoretical studies isolated from practical life.

Besides the original sources cited in the text, this paper builds on historical overviews, such as [4] and [19]–[23].

“Shannon’s Formula”

Sometimes a scientific result comes quite unexpected as a “stroke of genius” from an individual scientist. More often a result is gradually revealed, by several independent research groups, and at a time which is just ripe for the particular discovery. In this paper we will look at one particular concept, the channel capacity of a band-limited information transmission channel with additive white, Gaussian noise. This capacity is given by an expression often known as “Shannon’s formula”1):

C = W log2(1 + P/N) bits/second. (1)

We intend to show that, on the one hand, this is an example of a result for which time was ripe exactly a few years after the end of World War II. On the other hand, the formula represents a special case of Shannon’s information theory2) presented in [1], which was clearly ahead of its time with respect to the insight generally established.

“Shannon’s formula” (1) gives an expression for how many bits of information can be transmitted without error per second over a channel with a bandwidth of W Hz, when the average signal power is limited to P watt, and the signal is exposed to an additive, white (uncorrelated) noise of power N with Gaussian probability distribution. For a communications engineer of today, all the involved concepts are familiar – if not the result itself. This was not the case in 1948. Whereas bandwidth and signal power were well-established, the word bit was seen in print for the first time in Shannon’s paper.


Lars Lundheim (44) received his Siv.ing. and Dr.ing. degrees from the Norwegian University of Science and Technology (NTNU), Trondheim, in 1984 and 1992, respectively. He has held various research and teaching positions at NTNU, SINTEF, CERN and Trondheim College of Engineering. He is currently employed as Associate Professor at NTNU, doing research and teaching in signal processing for wireless communication.

[email protected]

1) Many mathematical expressions are connected with Shannon’s name. The one quoted here is not the most important one, but perhaps the best known among communications engineers. It is also the one with the most immediately understandable significance at the time it was published.

2) For an introduction to Shannon’s work, see the paper by N. Knudtzon in this issue.



The notion of probability distributions and stochastic processes, underlying the assumed noise model, had been used for some years in research communities, but was not part of an ordinary electrical engineer’s training.

The essential elements of “Shannon’s formula” are:

1. Proportionality to bandwidth W
2. Signal power P
3. Noise power N
4. A logarithmic function

The channel bandwidth sets a limit to how fast symbols can be transmitted over the channel. The signal to noise ratio (P/N) determines how much information each symbol can represent. The signal and noise power levels are, of course, expected to be measured at the receiver end of the channel. Thus, the power level is a function both of transmitted power and the attenuation of the signal over the transmission medium (channel).
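A small worked example may make the interplay of these elements concrete. The sketch below (in Python; the channel figures are assumed, illustrative values, not taken from this paper) evaluates formula (1) for a voice-grade telephone line:

import math

def shannon_capacity(bandwidth_hz: float, p_over_n: float) -> float:
    """Formula (1): C = W log2(1 + P/N), in bits per second."""
    return bandwidth_hz * math.log2(1.0 + p_over_n)

# Assumed example: W = 3400 Hz and P/N = 1000 (i.e. 30 dB):
print(f"C = {shannon_capacity(3400.0, 1000.0):.0f} bits/second")  # about 34 000

Doubling the bandwidth doubles the capacity, whereas doubling the signal-to-noise ratio adds only about one bit per second per Hz of bandwidth.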

The most outstanding property of Shannon’s papers from 1948 and 1949 is perhaps the unique combination of generality of results and clarity of exposition. The concept of an information source is generalized as a symbol-generating mechanism obeying a certain probability distribution. Similarly, the channel is expressed essentially as a mapping from one set of symbols to another, again with an associated probability distribution. Together, these two abstractions make the theory applicable to all kinds of communication systems, man-made or natural, electrical or mechanical.

Independent Discoveries

One indicator that the time was ripe for a fundamental theory of information transfer in the first post-war years is given by the numerous papers attempting such theories published at that time. In particular, three sources give formulas quite similar to (1). The best known of these is the book entitled Cybernetics [2] published by Wiener in 1948. Norbert Wiener was a philosophically inclined and proverbially absent-minded professor of mathematics at MIT. Nonetheless, he was deeply concerned about the application of mathematics in all fields of society. This interest led him to founding the science of cybernetics. This field, which is perhaps best defined by the subtitle of [2]: “Control and Communication in the Animal and the Machine”, included, among other things, a theory for information content in a signal and the transmission of this information through a channel. However, Wiener was not a master of communicating his ideas to the technical community, and even though the relation to Shannon’s formula is pointed out in [2], the notation is cumbersome, and the relevance to practical communication systems is far from obvious.

Reference to Wiener’s work was made explicitly by Shannon in [1]. He also acknowledged the work by Tuller3). William G. Tuller was an employee at MIT’s Research Laboratory for Electronics in the second half of the 1940s. In 1948 he defended a thesis at MIT on “Theoretical Limitations on the Rate of Transmission of Information”4). In his thesis Tuller starts by referring to Nyquist’s and Hartley’s works (see below). Leaning on the use of sampling and quantization of a band-limited signal, and arguing that intersymbol interference introduced by a band-limited channel can in principle be eliminated, he states quite correctly that under noise-free conditions an unlimited amount of information can be transmitted over such a channel. Taking noise into account, he delivers an argument partly based on intuitive reasoning, partly on formal mathematics, arriving at his main result that the information H transmitted over a transmission link of bandwidth B during a time interval T with carrier-to-noise ratio C/N is limited by

Claude Elwood Shannon (1916–2001), the founder of information theory, also had a practical and a playful side. The photo shows him with one of his inventions: a mechanical “mouse” that could find its way through a maze. He is also known for his electronic computer working with Roman numerals and a gasoline-powered pogo stick

3) Shannon’s work was in no way based on Wiener or Tuller; their then unpublished contributions had been pointed out to Shannon after the completion of [1].

4) Later published as RLE Technical report 114 and as a journal paper [5] (both in 1949).


H ≤ 2BT log(1 + C/N). (2)

This expression has a striking resemblance to Shannon’s formula, and would by most readers be considered equivalent. It is interesting to note that for the derivation of (2) Tuller assumes the use of PCM encoding.

A work not referenced by Shannon is the paper by Clavier [16]5). In a similar fashion to Tuller, starting out with Hartley’s work, and assuming the use of PCM coding, Clavier finds a formula essentially equivalent to (1) and (2). A fourth independent discovery is the one by Laplume published in 1948 [17].

Early Attempts at a General Communication Theory

Shannon and the other researchers mentioned above were not the first investigators trying to find a general communication theory. Shannon, Tuller and Clavier all make references to the work done in the 1920s by Nyquist and Hartley.

By 1920 one can safely say that telegraphy as a practical technological discipline had reached a mature level. Basic problems related to sending and receiving apparatus, transmission lines and cables were well understood, and even wireless transmission had been routine for several years. At this stage of development, when only small increases in efficiency are gained by technological improvements, it is natural to ask whether one is close to fundamental limits, and to try to understand these limits. Harry Nyquist, in his paper “Certain Factors Affecting Telegraph Speed” [7], seems to be the first one to touch upon, if not fully identify, some of the issues that were clarified by Shannon twenty years later.

First, it is obvious to Nyquist that the “speed of transmission of intelligence” (which he terms W) is limited by the bandwidth of the channel6). Without much formal mathematical argument, Nyquist derives the following approximate formula for W:

W = K log m (3)

where m is the “number of current values”, which in modern terms would be called “the size of the signalling alphabet”, and K is a constant.

Whereas Nyquist’s paper is mostly concerned with practical issues such as choice of pulse waveform and different variants of the Morse code, a paper presented three years later by Hartley is more fundamental in its approach to the problem. The title is simply “Transmission of Information”, and in the first paragraph the author says that “What I hope to accomplish (...) is to set up a quantitative measure whereby the capacities of various systems to transmit information may be compared”. Even though Nyquist had given parts of the answer in his 1924 paper, this is the first time the question that was to lead up to Shannon’s information theory is explicitly stated.

Compared to Nyquist, Hartley went a couple of steps further. For one thing, he stated explicitly that the amount of information that may be transmitted over a system7) is proportional to the bandwidth of that system. Moreover, he formulated what would later be known as Hartley’s law, that information content is proportional to the product of time (T) and bandwidth (B), and that one quantity can be traded for the other. It should also be mentioned that Hartley argued that the theory for telegraph signals (or digital signals in modern terms) could be generalized to continuous-time signals such as speech or television. Hartley’s law can be expressed as

Amount of information = const ⋅ BT ⋅ log m. (4)

Relations between bandwidth and time similar to the one found by Nyquist were discovered simultaneously by Karl Küpfmüller in Germany [10].

Norbert Wiener (1894–1964) had been Shannon’s teacher at MIT in the early 1930s. By his seminal work Extrapolation, Interpolation and Smoothing of Stationary Time Series, made during World War II, he laid the foundation for modern statistical signal processing. Although Shannon was influenced by Wiener’s ideas, they had little or no contact during the years when they made their contributions to communication theory. Their styles were very different. Shannon was down-to-earth in his papers, giving illustrative examples that made his concepts possible to grasp for engineers, and giving his mathematical expressions a simple, crisp flavour. Wiener would rather use the space in-between crowded formulas for philosophical considerations and esoteric topics like Maxwell’s demon

5) It is, perhaps, strange that neither Shannon nor Clavier have mutual references in their works, since both [3] and [16] were orally presented at the same meeting in New York on December 12, 1947, and printed more than a year afterwards.

6) Proportionality to bandwidth is not explicitly stated by Nyquist in 1924. He has probably been aware of it, and includes it in his more comprehensive paper [8] four years later.

7) Neither Nyquist nor Hartley make explicit distinction between source, channel and destination, as Shannon does twenty years later. This may seem like a trivial omission, but the distinction is essential for Shannon’s general definition of channel capacity, which requires this separation to define quantities such as source entropy and mutual information.


A more mathematically stringent analysis of the relation was carried out in Gabor’s “Theory of communication” [11].

As was pointed out by Tuller [5], a fundamental deficiency of the theories of Nyquist, Hartley, Küpfmüller and Gabor is that their formulas do not include noise. The role of noise is that it sets a fundamental limit to the number of levels that can be reliably distinguished by a receiver. From expressions (3) and (4) we see that both Nyquist and Hartley were aware of the fact that the amount of information depends on the number of distinguishable signal levels (or symbols). However, they seem content to include this number m in their formulas instead of deriving it from a more fundamental quantity, such as the signal-to-noise level. In a short discussion, Nyquist mentions “interference” as one of the limiting factors of the signal alphabet size. Hartley points out the inter-symbol interference due to channel distortion as the most important limiting factor. This is fundamentally wrong, as Tuller remarks, since inter-symbol interference can, in principle, be removed by an equalizer. This is precisely what Nyquist shows in his 1928 paper [8].

Developments in the Understanding of Bandwidth

As we have seen, the work of Nyquist, Hartley and Küpfmüller in the 1920s represents an important step towards the fully developed channel capacity formulas expressed twenty years later. A key insight was the realization that the information transmission rate was limited by system bandwidth. We have argued that this understanding came when the maturity of telegraph technology made it natural to ask if fundamental limitations were within reach. Another, related factor was that the concept of bandwidth, so essential in the cited works, was now clearly understood by the involved researchers. Today, when students of electrical engineering are exposed to Fourier analysis and frequency domain concepts from early on in their education, it seems strange that such a fundamental signal property as bandwidth was not fully grasped until around 1920. We will therefore take a look at how frequency domain thinking was gradually established as communications technology evolved.

If one should assign a birth date to practical (electrical) telecommunications technology, it would be natural to connect it to one of the first successful demonstrations of telegraphy by Vail and Morse in the 1840s. It took only a few years before all developed countries had established systems for telegraph transmission. These systems were expensive, both to set up and to operate, and from the start it was important to make the transmission as cost-effective as possible. This concern is already reflected in the Morse code9) alphabet, which is designed according to the relative frequencies of characters in written English.

It was soon evident that, for transmission on overhead wires, the transmission speed was limited by the telegraph operator and, possibly, the inertia of the receiving apparatus, not by any properties of the transmission medium. The only problem connected to what we today would call the channel was signal attenuation. This was, however, a relatively small problem, since signal retransmission was easily accomplished, either manually or with automatic electromagnetic repeater relays.

For the crossing of rivers or stretches of ocean, the telegraph signals were transmitted using cables. This transmission medium soon proved to be far more problematic than overhead lines. For long spans repeaters were not a practical solution, meaning that the attenuation problem became serious. It was also found that operators

Harry Nyquist (right) (1889–1976) with John R. Pierce (left) and R. Kompfner. Nyquist was born in Nilsby in Värmland, Sweden, and emigrated to the USA in 19078). He earned an MS degree in Electrical Engineering in 1914 and a PhD in physics at Yale in 1917. The same year he was employed by AT&T, where he remained until his retirement in 1954. Nyquist made contributions in very different fields, such as the modelling of thermal noise, the stability of feed-back systems, and the theory of digital communication

8) More about Harry Nyquist’s early years can be found on Lars-Göran Nylén’s homepage, URL: http://members.tripod.com/~lgn75/

9) The Vail Code would be a more proper term, since it was Morse’s assistant Alfred Vail who in 1837 visited a print shop in Morristown to learn from the contents of the type cases which letters were more frequently in use. He then advised Morse to abandon his plan of using a word code involving the construction of a dictionary assigning a number to all English words, and use the more practical character code with unique dash-dot combinations for each letter [19].


would have to restrain themselves and use a lower speed than normal to obtain a distinguishable message at the receiver end. Both these problems were of concern in the planning of the first transatlantic telegraph cable. Expert opinions were divided from the start, but the mathematical analysis of William Thomson (later Lord Kelvin) showed that even though the attenuation would be large, practical telegraphy would be possible by use of sensitive receiving equipment. In particular, Thomson’s analysis explained how the dispersion of the cable sets a limit to the possible signalling speed.

In our connection, Thomson’s work is interesting because it was the first attempt at mathematical analysis of a communication channel. We see that two of the four elements of Shannon’s formula were indirectly taken into account: signal power reduced by the attenuation, and bandwidth limiting the signalling speed. Bandwidth was not explicitly incorporated in Thomson’s theory. This is quite natural, since the relevant relationships were expressible in physical cable constants such as resistance and capacitance. These were parameters that were easily understood and that could be measured or calculated. Bandwidth, on the other hand, was simply not a relevant notion, since engineers of the time had not yet learnt to express themselves in frequency domain terms.

During the nineteenth century topics such as oscillation, wavelength and frequency were thoroughly studied in fields such as acoustics, optics and mechanics. For electrical and telegraph engineers, however, these concepts had little interest from the start.

Resonance was a well-known phenomenon in acoustics. It was therefore an important conceptual break-through when Maxwell showed mathematically how a circuit containing both capacitance and inductance would respond significantly differently when connected to generators producing alternating current of different frequencies. The phenomenon of electrical resonance was then demonstrated in practical experiments by Hertz10).

It is interesting to see how, even before electrical resonance was commonly understood, acoustical resonance was suggested as a means of enhancing the capacity of telegraph systems. As noted above, the telegraph operator represented the bottleneck in transmission by overhead wires. Thus, many ingenious schemes were suggested by which two or more operators could use the same line at a time. As early as 1853 the American inventor M.B. Farmer is reported to have suggested the first system for time division multiplex (TDM) telegraphy. The idea, which was independently set forth several times, was perfected and made practical by the Frenchman J.M.E. Baudot11) around 1878. In parallel with the TDM experiments, several inventors were working with frequency division multiplex (FDM) schemes. These were based on vibrating reeds, kept in oscillation by electromagnets. By assigning a specific frequency to each telegraph operator, and using tuned receivers, independent connections could be established over a single telegraph line. One of the most famous patents is the “harmonic telegraph” by A.G. Bell from the 1870s. It was during experiments with this idea that Bell more or less accidentally discovered a way of making a practical telephone.

By 1890 electrical resonance phenomena were generally understood by scientists and well-informed engineers. Consequently, practical patents on how to use electrical resonance in FDM telegraphy began to be filed. These attempts, which also included some ideas of FDM telephony (not practical at the time), were, however, overshadowed by a new invention: the wireless.

After the first few experiments with wireless telegraphy at the start of the twentieth century, it became clear that sharp resonance circuits, tuned to specific frequencies or wavelengths, were necessary to avoid disturbance among different users. This requirement made it necessary for radio engineers to have a good understanding of frequency and the behaviour of electrical circuits when connected to sources of varying frequency content. It is important to note that the concept of bandwidth was much more elusive than that of frequency. Today, the decomposition of an information-bearing signal into its Fourier components is a routine operation to any electrical or communications engineer. This was not the case 80–100 years ago. At that time, one would commonly assume that a radio transmitter was tuned

Detail from Alexander Graham Bell’s patent for the “Harmonic Telegraph”. An alternating current is generated in the left coil with a frequency given by the vibrating reed c. The reed h to the right will resonate and give a sound only if it is tuned to the same frequency. By this mechanism several users could transmit telegraph signals over the same line by using equipment tuned to different frequencies

10) For more details of the history of electrical resonance, including an early contribution by Thomson, see the paper by Blanchard [20].

11) Baudot is today remembered by the unit baud for measuring the number of symbols per second transmitted through a communication channel.


to one – and only one – frequency. The bandwidth of a telegraph signal was small compared to both the carrier frequencies used and to the width of the resonance circuits employed. A general awareness of bandwidth did not develop until some experience had been gained with telephone transmission.

In parallel with the development of wireless telegraphy, and, gradually, telephony, work continued on FDM in telecommunications systems or “carrier current telephony and telegraphy”12), which was the term used at the time. In these systems, first intended for enhancing the capacity of long-distance wire-bound connections, it became important to minimize the spacing between carrier frequencies and at the same time prevent cross-talk between the channels. To obtain this, traditional resonance circuits were no longer adequate for transmitter or for receiver tuning. What was needed was band-pass filters, sufficiently broad to accept the necessary bandwidth of the modulated speech signal, flat enough to avoid distortion, and with sufficient stop-band attenuation to avoid interference with neighbour channels. This kind of device was developed during the years prior to World War I, and was first patented by G.A. Campbell of Bell Systems in 1917.

With hindsight it is curious to note that before Campbell’s invention, band-limited channels in a strict and well-defined way did not exist! Earlier transmission channels were surely band-limited in the sense that they could only be used in practice for a limited frequency range. However, the frequency response tended to roll off gradually, so as to make the definition of bandwidth, such as is found in Shannon’s formula, questionable.

So, around 1920 it was evident that an information-bearing signal needed a certain bandwidth, whether it was to be transmitted in original or modulated (by a carrier) form. There was, however, some discussion as to how large this bandwidth had to be. The discussion seems to have ceased after Carson’s “Notes on the Theory of Modulation” in 1922. By this time it had been theoretically shown (by Carson) and practically demonstrated that by using so-called single sideband modulation (SSB) a modulated signal can be transmitted in a bandwidth insignificantly larger than the bandwidth of the original (unmodulated) signal. The aim of Carson’s paper was to refute claims that further bandwidth reduction could be obtained by modulating the frequency of the carrier wave instead of the amplitude13). An often quoted remark from the introduction is that, according to Carson, “all such schemes [directed towards narrowing the bandwidth] are believed to involve a fundamental fallacy”. Among others, Gabor [11] takes this statement as a first step towards the understanding that bandwidth limitation sets a fundamental limit to the possible information transfer rate of a system.

The Significance of Noise

We have seen that the work towards a general theory of communication had two major breakthroughs where several investigators made similar but independent discoveries. The first one came in the 1920s with the discovery of the relation between bandwidth, time and information rate. The second one came 20 years later. An important difference between the theories published during these two stages is that in the 1920s the concept of noise was completely lacking.

Why did it take twenty years to fill the gap between Hartley’s law and Shannon’s formula? The only necessary step was to substitute 1 + C/N for m in (4). Why, all of a sudden, did three or more people independently “see the

Example of Campbell’s bandpass filter designs with transfer function. (Original figures from U.S. Patent No 1 227 113)

12) A paper [21] with this title by Colpitts and Blackwell was published in three instalments in Journal of the American Institute of Electrical Engineers in 1921, giving a comprehensive overview both of the history of the subject and the state-of-the-art around 1920.

13) This was an intuitively appealing idea at the time, but sounds almost absurd today, when the bandwidth expansion of FM is a well-known fact.


light” almost at the same time? Why did neither Nyquist, nor Hartley, nor Küpfmüller realize that noise, or more precisely the signal-to-noise ratio, plays as significant a role for the information transfer capacity of a system as does the bandwidth?

One answer might be that they lacked the necessary mathematical tools for an adequate description of noise. At this time Wiener had just completed a series of papers on Brownian motion (1920–24) which would become a major contribution to what was later known as stochastic processes, the standard models for description of noise and other unpredictable signals, and one of Shannon’s favourite tools. These ideas, based on probabilistic concepts, were, however, far from mature enough to be used by even the most sophisticated electrical engineers of the time14). Against this explanation, it may be argued that when Shannon’s formula was discovered, only two15) of the independent researchers used a formal probabilistically based argument. The others based their reasoning on more common-sense reasoning, not resorting to other mathematical techniques than what were tools of the trade in the 1920s.

Another explanation could be that the problem of noise was rather new at the time. Noise is never a problem as long as it is sufficiently small compared to the signal amplitude. Therefore it usually arises in situations where a signal has been attenuated during transmission over a channel. As we have seen, such attenuation had been a problem from the early days of telegraphy. From the beginning, the problem of attenuation was not that noise then became troublesome, but rather that the signal disappeared altogether. (More precisely, it became too weak to activate the receiving apparatus in the case of telegraphy, or too weak to be heard by the human ear in the case of telephony.) This situation changed radically around 1910, when the first practical amplifiers using vacuum tubes were devised. By use of these, long-distance telephony could for the first time be achieved, and ten years later the first commercial radio broadcasting could begin. But, alas, an electronic amplifier is not able to distinguish between signal and noise. So, as a by-product, interference, such as thermal noise16), always present in both the transmission lines and the components of the amplifiers, would be amplified – and made audible – together with the signal. Early amplifiers were not able to amplify the signal very much, due to stability problems.

When the feed-back principle, patented by H.S. Black in 1927, came into use, gains in the order of hundreds or thousands became possible by cascading several amplifier stages. This made noise a limiting factor to transmission systems, important to control, and by the 1930s signal-to-noise ratio had become a common term among communications engineers.

Although the researchers of the 1920s were aware of the practical problem represented by noise and interference, it seems that they did not regard it as a fundamental property of the transmission system, but rather as one of the many imperfections of practical systems that should be disregarded when searching for principal limits of what could be accomplished.

Thus, the fact that noise had just begun to play an active role in communications systems might partly explain why it was not given sufficient attention as a limitation to transmission capacity. However, when one looks at the reasoning used by both Tuller and Clavier (and to some degree Shannon), one will find that their arguments are inspired by two practical ideas, both invented during the 1930s, namely frequency modulation (FM) and pulse code modulation (PCM).

Two Important Inventions

When reading textbooks and taking university courses in engineering, one may get the idea that new products are based on the results of engineering science, which again rely on a thorough understanding of more basic sciences such as physics and chemistry, which finally lean on mathematics as the most basic of all exact knowledge. One can also get the impression that the development in these fields follows the same pattern: technology must wait for physics to explain new phenomena by mathematics made ready for the purpose in advance. Reality is quite different. Time and time again inventors with only coarse knowledge of the physical phenomena they exploit have come up with ingenious problem solutions. Similarly, engineers and physicists have discovered several mathematical results which they have not been able to prove satisfactorily, leaving it to the established mathematicians to “tie up the ends” and provide the necessary comprehensive theory afterwards.

With this in mind, it is interesting to note that the understanding of noise in a theory of channel capacity had to wait for two practical inventions. These inventions illustrated how the signal-to-noise

14) One of the first papers with a rudimentary mathematical treatment of noise was published by Carson in 1925 [12]. It should also be mentioned that Harry Nyquist derived a mathematical model of thermal noise in 1928 [13]. This model was, however, derived without use of probabilistic methods.

15) Shannon and Wiener, of course.

16) Not to mention the “shot noise” generated by travelling electrons in the tubes themselves.


ratio (SNR) and bandwidth of a transmission system could actually be traded one against the other.

We have already mentioned Carson’s 1922 paper, where he showed that frequency modulation (FM) would result in a signal bandwidth considerably larger than what would result from SSB, or even traditional AM. This is undeniably true, and it was therefore natural that most researchers also accepted Carson’s rejection, in the same article, of the idea that FM would be more robust with respect to noise. Among the few who continued working seriously with FM was Edwin Armstrong. After several years of experimentation, and after introducing an amplitude limiter in the receiver, he was able to demonstrate that it was possible to significantly increase the SNR of a radio communication system by using FM at the cost of expanded bandwidth17) [14].

What Armstrong’s results showed was that a trade-off between bandwidth and SNR could in principle be possible. The next step, to realize that the information transfer capacity of a system depended both on bandwidth and SNR, took some time, and needed another invention.

PCM – Pulse Code Modulation – consists of the sampling and quantizing of a continuous waveform. The use of sampling in telephone transmission had been suggested as early as 1903 by W.M. Miner.18) Miner’s motivation was to enhance the capacity of transmission lines by time division multiplex, as had already been done for telegraphy (see above). Miner’s concept contained no form of quantizing and should not be considered a PCM system. This important addition was first introduced by A.H. Reeves in 1937.19) Reeves realized that his system would need more bandwidth than traditional modulation methods. His rationale was the same as Armstrong’s: the combat of noise. Reeves’ radical insight was that by inserting repeaters at suitable intervals along the transmission line, no noise would be added during transmission beyond the quantizing noise introduced by the encoding (modulation) process at the transmitter. The quantizing noise could be made arbitrarily small by using a sufficiently high number of quantization levels.

Implicit in Reeves’ patent lie two important principles, illustrated numerically in the sketch below:

1. An analog signal, such as speech, can be represented with arbitrary accuracy by use of sufficiently frequent sampling, and by quantizing each sample to one of a sufficiently large number of predefined levels.

2. Each quantized sample can be transmitted on a channel with arbitrarily small probability of error, provided the SNR is sufficiently large.
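The following is a minimal sketch of principle 1 using uniform quantization; the level count and sample values are illustrative assumptions, and Reeves’ patent is of course not tied to this particular scheme:

def pcm_encode(samples, num_levels: int, xmax: float):
    """Uniform PCM: map each sample to a level index and a reproduction value."""
    step = 2.0 * xmax / num_levels
    indices = [min(num_levels - 1, max(0, int((x + xmax) / step))) for x in samples]
    reproductions = [-xmax + (i + 0.5) * step for i in indices]
    return indices, reproductions

# 8-level (3-bit) PCM of a few illustrative sample values in [-1, 1]:
idx, rep = pcm_encode([0.05, -0.4, 0.93], num_levels=8, xmax=1.0)
print(idx, [round(r, 3) for r in rep])

Raising num_levels makes the quantizing error arbitrarily small (principle 1); principle 2 concerns the error-free transport of the level indices over the channel.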

Alec Harley Reeves (1902–1971) with two figures from his PCM patent. Although aware that his invention was far ahead of what was possible with the technology of his time, the patent included several circuit solutions to some of the involved functionalities

17) Carson immediately accepted this as a fact, and very soon afterwards presented a paper together with T.C. Fry [15] showing mathematically how this mechanism worked, and that this was due to the amplitude limiter not included in the early proposals refuted by him in 1922.

18) U.S. Patent 745,734.

19) French Patent 852,183.


A result of this, not explicitly stated by Reeves, is that on a noise-free channel, an infinite amount of information can be transmitted in an arbitrarily small bandwidth. This is in sharp contrast to the results of the 1920s, and should be considered as a major reason why “Shannon’s formula” was discovered by so many just at a time when PCM was starting to become well-known.

Concluding Remarks

In this paper we have not written much about Shannon’s general information theory. On the other hand, we have tried to show how the ideas leading up to “Shannon’s formula” gradually emerged, as practical technological inventions made such ideas relevant. We have also seen that after the end of World War II, the subject was sufficiently mature, so that several independent researchers could complete what had been only partially explained in the 1920s.

Against this background, one might be led to conclude that Shannon’s work was only one among the others, and that, by a stroke of luck, he was the first one to publish his results. To avoid such a misunderstanding, we will briefly indicate how Shannon’s work is clearly superior to the others. First, we should make a distinction between the works of Shannon and Wiener ([1]–[3]) and the others ([5], [16], [17]). Both Shannon and Wiener delivered a general information measure based on the probabilistic behaviour of information sources, which they both designate by entropy due to the likeness with similar expressions in statistical mechanics. Shannon, furthermore, uses this concept in his general definition of channel capacity:

C = max[H(x) – Hy(x)].

This expression can be interpreted as the maximum of the difference between the uncertainty about the message before and after reception. The result is given in bit/second and gives an upper bound on how much information can be transmitted without error on a channel. The most astonishing aspect of Shannon's result, which was not even hinted at by Wiener, was perhaps not so much the quantitative expression as the fact that completely error-free information exchange is possible on any channel, as long as the rate is below a certain value (the channel capacity).
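To see the definition at work (an illustrative aside, not from the original paper), H(x) − Hy(x) can be evaluated for the simplest nontrivial channel, the binary symmetric channel, where the maximum over input distributions is attained for equiprobable symbols. A minimal Python sketch, with the crossover probability eps as a free parameter:

import math

def h2(p):
    # Binary entropy function in bits; h2(0) = h2(1) = 0.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# For a binary symmetric channel with crossover probability eps,
# max[H(x) - Hy(x)] = 1 - h2(eps) bits per channel use.
for eps in (0.0, 0.01, 0.1, 0.5):
    print(eps, 1 - h2(eps))

The capacity here comes out in bits per channel use; multiplying by the signalling rate gives the bit/second figure used above. At eps = 0.5 the output is independent of the input and the capacity is zero.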

The entropy concept is absent in all the presentations of the other group, which deal explicitly with a channel with additive noise. All reasoning is based on this special case of a transmission channel. The genius of Shannon was to see that the role of noise (or any other disturbance, be it additive or affecting the signal in any other way) was to introduce an element of uncertainty in the transmission of symbols from source to destination. This uncertainty is adequately modelled by a probability distribution. This understanding was shared by Wiener, but his attention was turned in other directions than Shannon's. According to some, Wiener, "under the misapprehension that he already knew what Shannon had done, never actually found out" [4].

References

1 Shannon, C E. A Mathematical Theory of Communication. Bell Syst. Techn. J., 27, 379–423, 623–656, 1948.

2 Wiener, N. Cybernetics: or Control and Communication in the Animal and the Machine. Cambridge, MA, MIT Press, 1948.

3 Shannon, C E. Communication in the Presence of Noise. Proc. IRE, 37, 10–21, 1949.

4 Pierce, J R. The Early Days of Information Theory. IEEE Trans. on Information Theory, IT-19 (1), 1973.

5 Tuller, W G. Theoretical Limitations on the Rate of Transmission of Information. Proc. IRE, 37 (5), 468–478, 1949.

6 Carson, J R. Notes on the Theory of Modulation. Proc. IRE, 10, 57, 1922.

7 Nyquist, H. Certain factors affecting telegraph speed. Bell Syst. Tech. J., 3, 324–352, 1924.

8 Nyquist, H. Certain topics in telegraph transmission theory. AIEE Trans., 47, 617–644, 1928.

9 Hartley, R V L. Transmission of information. Bell Syst. Techn. J., 7, 535–563, 1928.

10 Küpfmüller, K. Über Einschwingvorgänge in Wellenfiltern. Elektrische Nachrichten-Technik, 1, 141–152, 1924.

11 Gabor, D. Theory of communication. J. IEE, 93 (3), 429–457, 1946.

12 Carson, J R. Selective Circuits and Static Interference. Bell Syst. Techn. J., 4, 265, 1925.

13 Nyquist, H. Thermal Agitation of Electric Charge in Conductors. Phys. Rev., 32, 1928.

14 Armstrong, E H. A Method of Reducing Disturbances in Radio Signaling by a System of Frequency-Modulation. Proc. IRE, 24, 689–740, 1936.


15 Carson, J R, Fry, T C. Variable Frequency Circuit Theory with Application to the Theory of Frequency-Modulation. Bell Syst. Tech. J., 16, 513–540, 1937.

16 Clavier, A G. Evaluation of transmission efficiency according to Hartley's expression of information content. Elec. Commun.: ITT Tech. J., 25, 414–420, 1948.

17 Laplume, J. Sur le nombre de signaux discernables en présence du bruit erratique dans un système de transmission à bande passante limitée. Compt. Rend. Acad. Sci. Paris, 226, 1348–1349, 1948.

18 Carson, J. The statistical energy-frequency spectrum of random disturbances. Bell Syst. Tech. J., 10, July, 374–381, 1931.

19 Oslin, G P. The Story of Telecommunications. Macon, Georgia, 1992.

20 Blanchard, J. The History of Electrical Resonance. Bell Syst. Techn. J., 23, 415–433, 1944.

21 Colpitts, Blackwell. Carrier Wave Telephony and Telegraphy. J. AIEE, April 1921.

22 Bray, J. The Communications Miracle. New York, Plenum Press, 1995.

23 Hagemeyer, F W. Die Entstehung von Informationskonzepten in der Nachrichtentechnik: eine Fallstudie zur Theoriebildung in der Technik in Industrie- und Kriegsforschung. Berlin, Freie Universität Berlin, 1979. (PhD dissertation.)


Statistical Communication Theory 1948 – 1949
Nic. Knudtzon 1)

Introduction

I thank you for the invitation to once again enter this distinguished rostrum here at Telenor's R&D Department!

My presentation will consist of three parts:

1) The general state of affairs, 1948–1949:

In order for you to properly understand and appreciate the topic I have been given, which dates back more than 50 years, I initially need to place you in the world of those times.

2) Statistical communication theory, 1948–1949: Consistent with the original terminology, I shall use statistical communication theory as a common term for Claude Shannon's information theory and Norbert Wiener's mathematical theory of statistically optimal networks. This part of the presentation is an overview of what I saw, heard, and learned about the topic during my stay at the MIT Research Laboratory of Electronics from February 1948 until July 1949. I had the good fortune – as the only Norwegian – to work in this most inspiring environment during those pioneering years. Furthermore I shall give a very condensed reprise of the three papers I presented on the topic at "Studiemøtet i radioteknikk og elektroakustikk" (Symposium of Radio Technology and Electro-acoustics) at Farris Bad in 1950.

3) Reflections: Finally, I will conclude with some reflections on the developments – positive and negative – since those days.

The General State of Affairs, 1948 – 1949

Let us now place ourselves in 1948–1949 and list some notable events during this period:

• The relationship between the western world and the Soviet Union is dominated by the cold war.

• The new German states of Western Germany and Eastern Germany are established.

• Harry Truman unexpectedly wins the presidential election in the USA.

• The Marshall Plan is initiated.

• NATO is established, with Norway as a member.

• Mao Tse-Tung proclaims the People's Republic of China.

• The state of Israel is proclaimed.

• Mahatma Gandhi is murdered by a Hindu fanatic in Delhi.

• Long-playing records are introduced in the USA.

• The number of television receivers is reaching 750,000 in the USA.

• A committee is appointed to assess television in Norway.

• The transistor is invented.

• Claude E. Shannon publishes "A Mathematical Theory of Communication".

• Norbert Wiener publishes the book "Cybernetics".

The State of Science and Research in Norway, 1948

In 1941 we were 30 students who, based on our outstanding results from the matriculation exam, were admitted to The Faculty of Electrical Engineering at The Norwegian Institute of Technology (NTH): 20 to Power Electrical Engineering, and 10 to Electronics. In this wartime period the Faculty had three professors, all in Power Electrical Engineering. Their expertise was also made available to the Electronics students, although the frequency range was limited to 50 Hz. However, in the second half of our studies, laboratory engineer Reno Berg introduced us to frequencies above 50 Hz.

Our most memorable experience was the six weeks during the winter of 1946 when Helmer Dahl (38 at the time) and Matz Jenssen (36) – based on their achievements and experience


Dr. Nic. Knudtzon (80) obtained his Engineering degree from the Technical University of Norway, Trondheim in 1947 and his Doctor's degree from the Technical University in Delft, the Netherlands in 1957. 1948–1949 he was with the Research Laboratory of Electronics, Massachusetts Institute of Technology, working with information theory and experiments. 1950–1955 he was with the Norwegian Defence Research Establishment, Bergen, working on the development of microwave radio links; and from 1955 to 1967 he was Head of the Communications Division at Shape Technical Center in The Hague, Netherlands, where his efforts went into the planning of military telecommunications networks and systems in Western Europe. From 1968 to 1992 he was Director of Research at the Norwegian Telecommunications Administration, working on the planning of future telecommunications systems, networks and services. Dr. Knudtzon has been member of government commissions and various committees, including the Norwegian Research Council, the National Council for Research Libraries, the International Telecommunications Union, EURESCOM, etc.

1) The paper is a transcript of a presentation given by the author at the Telenor seminar "From Shannon's information theory to today's information society: Claude Shannon (1916–2001) In Memoriam" on August 9, 2001. The paper was presented in Norwegian and has been translated by Geir E. Øien.


from UK laboratories during World War II – gave us unforgettable inspiration by providing insights into the enormous progress made in our scientific discipline. The frequencies reached into the GHz range!

My Personal Situation

Please allow me to say a few words about my personal situation, which led me to the USA and to close contact with the pioneers of statistical communication theory.

During his stay at NTH in early 1946, Helmer Dahl gave a lecture on a topic which, in his own words, "had nothing to do with the curriculum": thermal noise as the fundamentally limiting factor of communication systems. This caught my attention to such a degree that the choice of topics both for my main thesis and my doctoral thesis was made there and then! Since NTH had no expertise in this area, I wrote to Helmer Dahl, who at the time was establishing the so-called Department of Radar – in fact their research activities came to be primarily associated with microwave radio links – at the Norwegian Defence Research Establishment (FFI), and asked whether it would be possible for me to do my main thesis under his supervision.

The topic of the thesis was remarkable for Norway at the time: "Give an overview of molecular noise (fluctuation noise), and compute the signal-to-noise ratio associated with pulse-width modulation and frequency modulation for ultra-shortwave transmission." This was the first step in a long-term strategy to determine the choice of modulation scheme for radio links developed by FFI. The two alternatives were frequency modulation, which is compatible with conventional carrier frequency telephony, and pulse modulation with time-shared channels. Hence, in 1947 I was already told to assess fluctuation noise and compute signal-to-noise ratios in a digital system, which could not be efficiently realized in those pre-transistor days, but which was possible to analyse mathematically.

My studies were successful, and FFI sent me to Massachusetts Institute of Technology (MIT) for a period of 5 months, which MIT generously offered to extend by one year. Travelling to the USA in those days was not a trivial affair; I had to take an oath – with my hand on the Bible – as I received my government official visa. Subsequently I travelled by ocean liner over the North Atlantic, in a hurricane.

USA 1948 – 1949

The transition from a war-ridden and run-down Norway, whose reconstruction had not yet properly started, to the dynamic USA was overwhelming. As a small illustration, I may mention that both NTH and MIT were building new libraries at the time: at NTH they dug with spades and transported by horse-carriages; at MIT I saw a bulldozer in action for the first time.

I came to a flourishing and proud USA, the winner of World War II, now acknowledged as the world's leading superpower.

At MIT I worked at the Research Laboratory of Electronics, in the temporary buildings of the Radiation Laboratory, which had been a center for the US research on radar during World War II.

The annual updating within our scientific disciplines took place at the IRE (Institute of Radio Engineers) National Convention in New York, and my first meeting with this mustering of theory and practice, shortly after my arrival in 1948, was a great experience. In my diary I commented: "Good" (!) – particularly on a March 24 session, which included contributions from Claude Shannon and Norbert Wiener, whom I then saw and heard for the first time. Claude Shannon's contribution was "Communications in the presence of noise" [1].

Of the many other impressions, I would here particularly like to emphasize the immediate, friendly American way of communicating and the sense of being taken seriously – no matter your age – when you accepted a challenge. Claude Shannon serves as a typical example in his relations with me. When he, on May 17, 1948, visited the group of 5–6 researchers to which I belonged at MIT, he invited me to visit Bell Telephone Laboratories. During the whole of July 7 he was my cicerone, both at the old location in West Street, and at Murray Hill. There I also met other well-known scientists, among them H. Nyquist and S.O. Rice, whose articles "Mathematical Analysis of Random Noise" [2] and "Statistical Properties of a Sine Wave plus Random Noise" [3] have given me the background for much of my work. On August 10, I personally received from Claude Shannon a copy of the preprint of his pioneering paper "A Mathematical Theory of Communication", and with this treasure I immediately initiated a seminar series in our research group at MIT.

Claude Elwood Shannon (1916 – 2001) acquired his B.Sc. degrees in electrical engineering and mathematics at Michigan University in 1936, and his M.Sc. degree in electrical engineering as well as his Ph.D. degree in mathematics at MIT in 1940. He became a National Research Fellow in 1941. Subsequently he worked at Bell Telephone Laboratories and had an affiliation with


this institution until 1972. He was a professor at MIT from 1956 until 1978, when he became Professor Emeritus. Apart from his monumental contributions in statistical communication theory he also wrote papers in several other areas, while at the same time serving as an advocate of moderation regarding publishing for the sake of publishing itself. The most spectacular thing I have heard about him is his passion for riding a unicycle while juggling. Many have offered the opinion that Shannon was worthy of a Nobel Prize, but it seems he fell outside the six Nobel categories. My meetings with Claude Shannon took place when he was 32–33 years old and at the top of his most productive period, and I remember him as a pleasant and straightforward man.

Norbert Wiener (1894 – 1964) was in 1948 a world-famous 52-year-old mathematician who made his daily rounds in the MIT corridors and laboratories, a bit remote and with a particular sense for absent-mindedness. Due to his interest in statistical communication theory he visited our research group quite often, also observing some of my measurements of the statistical properties of noise. I sporadically followed some of his lectures in mathematics, which could be quite peculiar – particularly when he became absorbed in his own computations at the blackboard, giving us associations to Maxwell's Demon himself.

Norbert Wiener had been affiliated with MIT since 1919, and was a professor there when Claude Shannon acquired his Ph.D. in mathematics.

Statistical Communication Theory

The Development

In 1928, R.V.L. Hartley at Bell Telephone Laboratories concluded in his "Transmission of Information" [5], based on physical (as opposed to psychological) considerations, that the amount of information transmitted in a noiseless system is proportional to the product of bandwidth and transmission time.

In 1946 this result was developed further by D. Gabor at British Thomson-Houston Co. in "Theory of Communication" [6], but he, too, confined himself to a system without noise.

At a meeting of the IRE in New York on December 12, 1947, A.G. Clavier of ITT's Federal Telecommunication Laboratories presented "Evaluation of Transmission Efficiency According to Hartley's Expression of Information Content" [7]. This work contained an analysis of the transmission efficiency of frequency modulation and various kinds of pulse modulation, based on Hartley's definition of information content, but for a system with noise.

At the same IRE meeting in New York, Claude Shannon presented "Communications in the presence of noise" [1], which is identical to his previously mentioned presentation made at IRE's National Convention in 1948. He took as a starting point his figure of a "General Communication System" (cf. Figure 1), used a geometrical viewpoint when solving the problem, and concluded with his formula for the optimal transmission capacity,

C = W log2 ((P + N) / N),

for a communication channel with bandwidth W, average transmit power P, and thermal (Gaussian and white) noise power N. Furthermore, he presented figures showing how practical systems compared in quantitative performance to the optimal performance theoretically attainable. I discuss this paper in depth in [11] and [12].
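To make the formula concrete (an illustrative calculation, not part of Knudtzon's presentation): a telephone-grade channel with W = 3000 Hz and a 30 dB signal-to-noise ratio has a capacity of just under 30 kbit/s. A minimal Python sketch:

import math

def shannon_capacity(W_hz, snr_db):
    # C = W log2((P + N) / N) = W log2(1 + P/N), in bit/s.
    return W_hz * math.log2(1 + 10 ** (snr_db / 10))

print(shannon_capacity(3000, 30))   # approximately 29,900 bit/s

Benchmarks of exactly this kind are what Shannon's figures compared the practical systems of the day against.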

Figure 1 General Communication System (information source – transmitter – channel – receiver – destination, with noise added on the channel)

Figure 2 Example of prediction of filtered fluctuation noise



Then, in July 1948, Claude Shannon's pioneering paper [4] was published. This paper, too, is discussed in [11] and [12]. Claude Shannon, in a footnote (p. 626 in [4]), expresses the view that "Communication theory is heavily indebted to Wiener for much of its basic philosophy and theory ...". He also states (p. 627) that "We may also here refer to Wiener's forthcoming book "Cybernetics" dealing with the general problems of communication and control." Without going further into the matter, it should also be noted that Claude Shannon worked on cryptography during World War II. His report on the subject was not declassified until 1948; it was subsequently published as "Communication Theory of Secrecy Systems" [8].

During World War II Norbert Wiener developed "The Extrapolation, Interpolation, and Smoothing of Stationary Time Series" (National Defence Research Council, Section D2 Report, Feb. 1942), dealing with control of gunfire aimed at targets in motion. This report was also classified, and in addition rather heavy reading, so it stayed fairly unknown before it became the basis for an MIT course from 1947. I belonged to the second class following this course. Norbert Wiener's starting point is an analysis of statistically stationary time series, a topic I discuss in [11] and [13]. For illustration, I have chosen to reproduce here – as Figure 2 – a simplified Figure 5 from [13], showing the result of using a statistically optimal predictor for filtered fluctuation noise – built at MIT by my good friend Charles A. Stutt as part of his Ph.D. work.

In 1948 Norbert Wiener published his book "Cybernetics", about control through communication between, and feedback of, the processes in living organisms, in machines, and in society [9]. The title (derived from the Greek word for steersman) is not only contextually meaningful, it has also been included in the vocabulary of many languages. In Chapter III, "Time Series, Information, and Communication" (pp. 74–112), Norbert Wiener gives the following "certification": "The relevant general theory has been presented in a very satisfactory form by Dr. C. Shannon". Claude Shannon, in his book review of "Cybernetics" [10], writes: "Communication engineers have a charter right and responsibility in several of the roots of this broad field and will find Wiener's treatment interesting reading, filled with stimulating and occasionally controversial ideas. – Professor Wiener, who has contributed much to communication theory, is to be congratulated for writing an excellent introduction to a new and challenging branch of science."

The "Bit" as a Unit of Information

The information unit bit for "binary digit" seems to have been officially introduced in print for the first time in [4], and was "suggested by J.W. Tukey" according to Claude Shannon. John Wilder Tukey (1915 – 2000) was a well-known statistician, a contemporary of Shannon from Bell Telephone Laboratories (birth and death one year before Shannon's), and at the same time professor at Princeton University. He also introduced the term "software", and made important contributions to the development of Fast Fourier Transform (FFT) algorithms.

My 1950 Presentations of Statistical Information Theory

My journal papers [11], [12], and [13] were the first presentations of statistical information theory in Norway.2) I would like to point out one thing regarding these papers: the parts dealing with information theory are heavily influenced by Claude Shannon's own way of presenting his theory. Please note how many fundamental results can be derived using simple explanations. Following Shannon's publications many others have come up with articles and books, many of which try to glorify the subject matter by using needlessly complicated descriptions. My advice has always been: "Read Shannon in the original!"

2) Translator's note: Translated versions of [11], [12] and [13] are found in this volume of Telektronikk.

Reflections

On Mathematical Solutions

It may be worth pointing out that the methods of research have changed considerably as computers have become widely available and obtained greater computational power.

Wiener's solution for statistically optimal filters, as developed during World War II, was based on the choice of minimum mean squared error as an optimality criterion. This led to the Wiener-Hopf equations, which were solvable analytically. Without further comparison intended, I was one of those facing a similar situation in my doctoral work. We were dependent on mastering the analytical solutions; otherwise our research might fail completely, even after years of work. Today, computers are available for obtaining numerical solutions. However, it might be added that uncritical use of computers can also lead to a loss of judgement, intuition, and physical understanding.
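To make the contrast concrete (a latter-day illustration, not part of the 1950 material): in discrete time the same minimum mean squared error criterion leads to the normal equations R w = p, which a computer now solves directly. A minimal sketch, where the observed signal x and the desired signal d are assumed given as equal-length arrays:

import numpy as np

def wiener_fir(x, d, order):
    # Solve the discrete-time Wiener-Hopf (normal) equations R w = p
    # for the FIR filter w minimising E[(d(n) - sum_k w_k x(n-k))^2].
    N = len(x)
    r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(order)])
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])   # Toeplitz autocorrelation matrix
    p = np.array([np.dot(d[k:], x[:N - k]) / N for k in range(order)])
    return np.linalg.solve(R, p)

The analytical elegance of the continuous-time Wiener-Hopf solution is here replaced by brute numerical linear algebra, which is precisely the shift described above.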

Good or Bad?

The development of telecommunications is steadily accelerating, particularly after the advent of transistors, satellites, and optical fibres



in practical systems. The number of services available to the users is already overwhelming, and seems set to increase further.

As with all technology, telecommunications is in many cases a good servant, but it can also be a bad master. Personally, I particularly dislike three things:

• The mindless over-use. I am increasingly surprised to see that people have so much time and money to spend on telecommunication services.

• The lack of quality in the content that is transmitted, be it the shallowness of many mobile phone conversations, bad television programs, or the lack of quality control concerning what becomes available on the Internet.

• The distortion of our language found in e-mail.

However, regardless of

• type of communication link – copper wires, cable, fibres, or radio communication, direct or via satellite, fixed or mobile;

• user equipment – telegraph, telephone, facsimile, television, or computers;

Claude Shannon's fundamental capacity formula defines the optimal channel use, against which the various technical solutions can be measured; see Figure 3. Shannon gave the system planners the basis for quantitative assessments.

References

1 Shannon, C E. Communications in the presence of noise. Proc. of the IRE, 10–21, 1949.

2 Rice, S O. Mathematical analysis of random noise. Bell Syst. Technical Journal, XXIII, 282–332, 1944; and XXIV, 46–156, 1945.

3 Rice, S O. Statistical properties of a sine wave plus random noise. Bell Syst. Technical Journal, XXVII, 109–157, 1948.

4 Shannon, C E. A mathematical theory of communication. Bell Syst. Technical Journal, XXVII, 379–423, 623–656, 1948. (Later reprinted in: Shannon, C E, Weaver, W. The Mathematical Theory of Communication. The University of Illinois Press: Urbana, 1949.)

5 Hartley, R V L. Transmission of information. Bell Syst. Technical Journal, VII, 535–563, 1928.

6 Gabor, D. Theory of communication. Journal of IEE, 93, 429–457, 1946.

7 Clavier, A G. Evaluation of transmission efficiency according to Hartley's expression of information content. Electrical Communication, 25, 414–420, 1948.

8 Shannon, C E. Communication theory of secrecy systems. Bell Syst. Technical Journal, XXVIII, 656–715, 1949.

9 Wiener, N. Cybernetics or The Control and Communication in the Animal and the Machine. New York, The Technology Press, John Wiley, 1948.

10 Shannon, C E. Cybernetics, or Control and Communication in the Animal and the Machine, by Norbert Wiener (book review). Proc. of the IRE, 1305, 1949.

11 Knudtzon, N. Statistisk kommunikasjonsteori. En kortfattet oversikt over problemstillingen. Teknisk Ukeblad, 16. nov. 1950, 883–887.

12 Knudtzon, N. Informasjonsteori. Elektroteknisk Tidsskrift, 30, 373–380, 1950.

13 Knudtzon, N. Statistisk optimale nettverk. Elektroteknisk Tidsskrift, 32, 413–416, 1950.

Figure 3 Comparison of PCM and PPM with ideal performance

The True Channel Capacity of Pair Cables With Respect to Near End Crosstalk
Nils Holte

Previously, the channel capacity of pair cables that use two-way transmission has been calculated by means of a worst-case estimate of near end crosstalk (NEXT). This is a pessimistic estimate because near end crosstalk in frequency bands with some frequency separation is uncorrelated. In this paper the true channel capacity in a pair cable with respect to NEXT is calculated by means of a stochastic crosstalk model. It is shown how the mean and variance of the channel capacity can be calculated both by analytical methods and by simulation. The entire probability distribution of the channel capacity is estimated by means of a Monte Carlo simulation over a large ensemble of cables. Hence, a true estimate of the 1 % confidence limit of the channel capacity with respect to NEXT can be calculated for a given type of cable. The results are compared with traditional estimates. It is shown for a typical example that the worst-case channel capacity is approximately 5 % higher than the traditional estimate and that the average channel capacity is approximately another 10 % higher than the true worst-case estimate.

Nils Holte (55) is professor in Telecommunications at the Norwegian University of Science and Technology (NTNU). He received his Siv.Ing. degree in 1971 and his Dr.Ing. degree in 1976, both from NTNU. His main research interests are adaptive filters, crosstalk in pair cables, and digital communications, with special emphasis on OFDM and xDSL systems. [email protected]

1 Introduction

Different types of xDSL (Digital Subscriber Line) systems used in the existing pair cables of the public access network are some of the major alternatives for implementing fixed broadband access. Symmetrical systems like SHDSL [1] have equal transmission rates in both directions and use two-way transmission within the same frequency band. In this case the dominating noise mechanism is near end crosstalk (NEXT). The traditional method for calculating the channel capacity of pair cables with respect to crosstalk has been to use Shannon's channel capacity formula, where the noise power spectral density is based upon the 99 % confidence limit of the crosstalk power sum. This approach is used in textbooks like Starr, Cioffi and Silverman [2] and also in different xDSL standards, for instance the ITU standard for SHDSL [1]. This method gives a pessimistic estimate for NEXT, because NEXT has significant variations with frequency. The NEXT noise powers in two different frequency bands are uncorrelated for a frequency separation greater than approximately 100 – 200 kHz. This effect was first modelled in detail by Gibbs and Addie [3], who applied their model to single carrier baseband systems.

In the current paper, the basic principles of Gibbs and Addie are generalised and extended to the calculation of channel capacity. This gives a realistic estimate of the achievable bitrate for a multicarrier system using adaptive modulation. The calculations are based both upon analytical methods and a Monte Carlo simulation of random crosstalk coupling coefficients throughout the cables in a large ensemble of cables. By means of the Monte Carlo simulation the entire probability distribution of the channel capacity is calculated. Hence, the 1 % confidence limit of the channel capacity for a given system is estimated directly from the probability distribution of the channel capacity instead of the indirect estimate based upon worst case NEXT (99 % confidence limit).

The new method will primarily be suitable for systems with two-way transmission, where NEXT is the dominating noise mechanism. The results show that the new method gives an increase in bitrate of approximately 5 % compared to the traditional method when applied to a system with bandwidth 400 kHz in a 10 pair cable (10 pair binder group).

This new approach may in principle be used also for far end crosstalk (FEXT). However, in cables with identical propagation constants in all pairs, there will be full correlation between FEXT at different frequencies, so that the new and the traditional method will give identical results.

The paper is organised as follows. First, a NEXT model is presented together with an analytical analysis of NEXT. Then the Monte Carlo simulation is explained, and it is shown how the channel capacity may be calculated. Estimates of average rate and worst-case rate are presented for a multicarrier system example, and the results are compared with results found by using traditional estimates of NEXT. To conclude, it is explained how this new method might contribute to an improvement of the maximum range of transmission systems for the SHDSL application, by using a multicarrier system instead of the standardised single carrier SHDSL system.



2 Near End Crosstalk Model

The near end crosstalk between two pairs in a multipair cable will consist of contributions as illustrated in Figure 2.1. It is assumed that both pairs are terminated by their characteristic impedance Z0 at both ends of the cable.

The near end crosstalk transfer function for high frequencies (f > 100 kHz) between pair no. i and pair no. j of a cable is given by Klein [4]:

HNi,j(f) = V20 / V10 = jβ0 ∫_0^ℓ κNi,j(x) · exp(−2γx) dx,   (2.1)

where

β0 = β0(f) = ω√(LC) is the lossless phase constant of the cable in rad/km
γ = α + jβ is the propagation constant of the cable
α = α(f) is the attenuation constant of a pair in Neper/km
β = β(f) is the phase constant of a pair in rad/km
κNi,j(x) = [Ci,j(x) / C + Li,j(x) / L] / 2 is the normalised NEXT coupling coefficient between pair i and j at position x
Ci,j(x) is the mutual capacitance per unit length between pair i and j
Li,j(x) is the mutual inductance per unit length between pair i and j
C is the capacitance per unit length of the pairs
L is the inductance per unit length of the pairs
ℓ is the cable length.

According to Cravis and Crater [5], the coupling coefficients are stationary, Gaussian, random processes with correlation length less than a few meters. In comparison with actual values of α and β, this means that the coupling coefficients may be modelled as white noise processes with correlation functions:

RNi,j(τ) = E[κNi,j(x) · κNi,j(x + τ)] = kNi,j · δ(τ).   (2.2)

The constants kNi,j will be different for different pair combinations and can be estimated from crosstalk measurements. Coupling coefficients in different pair combinations are statistically independent.

Most other authors [3, 5, 6] treat this coefficient as a random variable. Cravis and Crater [5] assume that kNi,j follows a gamma distribution, while Bradley [6] and Gibbs and Addie [3] assume a log normal distribution, but these assumptions are only approximations. For a given cable design the coefficient kNi,j is a deterministic function of the pair combination, and it is mainly determined by the average distance between the two pairs and the difference in twisting periods between them [7].

A pair cable consisting of N pairs is used as an example. For simplicity, only one specific disturbed pair is taken into account, and this pair is denoted pair no. 1. The procedure will be the same for other disturbed pairs. Measurements for a 0.6 mm pair cross stranded cable using 10 pair binder groups have been averaged over individual pair combinations [8], and the result is shown in Table 2.1.

There are moderate differences in NEXT power sum between the pairs of a binder group (1.5 dB between max and min). The pair with the median of average NEXT power sum has been chosen as the disturbed pair (pair no. 1). Table 2.1 shows the average NEXT between pair no. 1 and all the other pairs in the binder group.

There will be a minor absolute error by using calculations for only one pair, but this error is probably less than 1 dB, and this approach will certainly be sufficient for a comparison of methods.

Figure 2.1 Near end crosstalk between two pairs

Table 2.1 Average NEXT for different pair combinations in a 0.6 mm cross stranded pair cable

Pair combination   Average NEXT at 1 MHz
1–2                54.2 dB
1–3                55.7 dB
1–4                57.1 dB
1–5                57.9 dB
1–6                59.0 dB
1–7                59.0 dB
1–8                59.1 dB
1–9                59.3 dB
1–10               59.6 dB



2.1 Average NEXT Transfer Functions

According to the model given by (2.1) and (2.2), the near end crosstalk transfer function will be a random variable with zero mean. The average NEXT power transfer function for one pair combination is given by:

Pi,j(f) = E[|HNi,j(f)|²] = E[β0² ∫_0^ℓ ∫_0^ℓ κNi,j(x) · κNi,j(y) · exp(−2γx − 2γ*y) dx dy],   (2.3)

where * denotes complex conjugate. The upper integration limit has been increased from ℓ to infinity to simplify the calculations. This can be done because crosstalk coupling at the outer end of the cable does not contribute to NEXT for cable lengths of practical interest. Using (2.2) the result may be expressed:

Pi,j(f) = kNi,j · β0² ∫_0^∞ exp(−4αx) dx = kNi,j · β0² / (4α).   (2.4)

Above 100 kHz, the attenuation and phase constants may be approximated by:

α = α(f) = α1M · √F,   (2.5)

β0 = β = β(f) = β1M · F,   (2.6)

where α1M and β1M are the attenuation and phase constants at 1 MHz, and F is the frequency in MHz. Consequently the NEXT power transfer function is expressed:

Pi,j(f) = (kNi,j · β1M² / (4α1M)) · F^1.5 = kN · F^1.5,   (2.7)

where kN is a constant. This is the result found by Cravis and Crater [5], and it shows that average NEXT increases 15 dB/decade of frequency.

For a cable with many pairs, crosstalk from different pairs will add on a power basis. If the same type of system is used in all pairs of an N-pair cable, the effective crosstalk is given by the crosstalk power sum. Near end crosstalk power sum for pair no. 1 may be expressed:

|HNps(f)|² = Σ_{i=2}^{N} |HN1,i(f)|².   (2.8)

Average NEXT power sum will be:

Pps(f) = E[|HNps(f)|²] = Σ_{i=2}^{N} E[|HN1,i(f)|²] = Σ_{i=2}^{N} P1,i(f) = (β0² / (4α)) · Σ_{i=2}^{N} kN1,i = kNps · F^1.5,   (2.9)

where kNps is a constant. NEXT power sum also increases 15 dB/decade with frequency.

2.2 The Covariance of NEXT

In order to study the correlation between NEXT at different frequencies it is convenient to investigate the covariance of the crosstalk power transfer function. The covariance of the NEXT power transfer function at two frequencies f1 and f2 for one specific pair combination is defined by:

cov[|HNi,j(f1)|², |HNi,j(f2)|²] = E[|HNi,j(f1)|² · |HNi,j(f2)|²] − Pi,j(f1) · Pi,j(f2).   (2.10)

The first term may be expressed:

E[|HNi,j(f1)|² · |HNi,j(f2)|²] = E[β01² · β02² ∫_0^ℓ ∫_0^ℓ ∫_0^ℓ ∫_0^ℓ κNi,j(x) · κNi,j(y) · κNi,j(z) · κNi,j(w) · exp(−2γ1x − 2γ1*y − 2γ2z − 2γ2*w) dx dy dz dw].   (2.11)

The solution is, in correspondence with Gibbs and Addie [3], given by:

E[|HNi,j(f1)|² · |HNi,j(f2)|²] = (kNi,j² · β01² · β02² / 16) · [1 / (α1 · α2) + 4 / ((α1 + α2)² + (β1 − β2)²) + 4 / ((α1 + α2)² + (β1 + β2)²)].   (2.12)

It is common to use the assumption that α << β in order to simplify the results, and hence the last term may be neglected. However, this is only a fair approximation for the lowest frequencies; at 100 kHz the ratio between β and α is less than 10. Using this assumption, the covariance may be expressed:

cov[|HNi,j(f1)|², |HNi,j(f2)|²] = (kNi,j² · β01² · β02² / 4) · 1 / ((α1 + α2)² + (β1 − β2)²).   (2.13)


The normalised covariance between the NEXT power transfer functions for one pair combination at frequencies f1 and f2 is found by division by the average power transfer functions at the two frequencies:

ρ = cov[|HNi,j(f1)|², |HNi,j(f2)|²] / (Pi,j(f1) · Pi,j(f2)) = 4 · α1 · α2 / ((α1 + α2)² + (β1 − β2)²).   (2.14)

The covariance of NEXT power sum is defined by:

cov[|HNps(f1)|², |HNps(f2)|²] = E[(Σ_{i=2}^{N} |HN1,i(f1)|²) · (Σ_{j=2}^{N} |HN1,j(f2)|²)] − Pps(f1) · Pps(f2).   (2.15)

The result is expressed:

cov[|HNps(f1)|², |HNps(f2)|²] = (Σ_{i=2}^{N} kN1,i²) · β01² · β02² / (4 · ((α1 + α2)² + (β1 − β2)²)).   (2.16)

The normalised covariance of NEXT power sum is:

ρps = cov[|HNps(f1)|², |HNps(f2)|²] / (Pps(f1) · Pps(f2)) = ρkN · 4 · α1 · α2 / ((α1 + α2)² + (β1 − β2)²) = ρkN · ρ,   (2.17)

where

ρkN = (Σ_{i=2}^{N} kN1,i²) / (Σ_{i=2}^{N} kN1,i)².   (2.18)

The approximations (2.5) and (2.6) are used for the frequencies F1 and F2 given in MHz. Setting ε = α1M / β1M, the normalised covariance of NEXT power sum is expressed:

ρps = 4 · ρkN · ε² · √(F1 · F2) / (ε² · (√F1 + √F2)² + (F1 − F2)²).   (2.19)

For the 0.6 mm cable used as example, the coefficient ρkN is equal to 0.137 and ε = 0.055. The normalised covariance between NEXT power sum at f1 and f2 for this cable is shown in Figure 2.2.

The figure shows that near end crosstalk at two different frequencies with a large frequency separation is almost uncorrelated. The correlation is low for frequency differences larger than 100 kHz in the frequency range below 500 kHz.
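Equation (2.19) is also easy to evaluate numerically. A minimal Python sketch (illustrative, using the quoted example values ρkN = 0.137 and ε = 0.055) reproduces the behaviour seen in Figure 2.2:

import math

def rho_ps(F1, F2, rho_kN=0.137, eps=0.055):
    # Normalised covariance of NEXT power sum, Equation (2.19).
    # F1 and F2 are frequencies in MHz.
    s1, s2 = math.sqrt(F1), math.sqrt(F2)
    return (4 * rho_kN * eps**2 * s1 * s2 /
            (eps**2 * (s1 + s2)**2 + (F1 - F2)**2))

print(rho_ps(0.2, 0.2))   # equals rho_kN = 0.137 for f1 = f2
print(rho_ps(0.2, 0.4))   # about 0.01 for 200 kHz separation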

2.3 Probability Distributions of NEXT

Crosstalk will vary statistically over an ensemble of different cables of the same type. The NEXT transfer function is a linear function of the coupling coefficients, and hence it will be a complex Gaussian variable with zero mean. In order to calculate the channel capacity of a cable it is desirable to know the probability distributions of NEXT. The real and imaginary parts of the NEXT transfer function for a single pair combination are given by:

ri,j(f) = Re[HNi,j(f)] = β0 ∫_0^ℓ κNi,j(x) · sin(2βx) · exp(−2αx) dx,   (2.20)

qi,j(f) = Im[HNi,j(f)] = β0 ∫_0^ℓ κNi,j(x) · cos(2βx) · exp(−2αx) dx.   (2.21)

A detailed calculation of the probability distribution of the crosstalk transfer function HNi,j(f) is given in Appendix A. Under the assumption that α << β, the second order moments of ri,j(f) and qi,j(f) are given by:

E[ri,j²(f)] = E[qi,j²(f)] = kNi,j · β0² / (8α) = Pi,j(f) / 2,   E[ri,j(f) · qi,j(f)] = 0.   (2.22)


Figure 2.2 The normalised covariance between NEXT power sum at two frequencies as a function of the frequency difference (curves for f1 = 200 kHz, 500 kHz, 1 MHz and 2 MHz)



The power transfer function of NEXT at one frequency for a specific pair combination is denoted zi,j, where zi,j = zi,j(f) = ri,j²(f) + qi,j²(f). Because ri,j(f) and qi,j(f) are independent Gaussian variables with identical variances, the probability density of zi,j will in accordance with Appendix A be given by:

pi,j(zi,j; f) = (1 / Pi,j(f)) · exp(−zi,j / Pi,j(f)),   zi,j ≥ 0.   (2.23)

NEXT power sum for pair no. 1 is denoted z = z(f) = Σ_{i=2}^{N} z1,i(f). The probability distribution of NEXT power sum is given by:

p(z; f) = p1,2(z1,2; f) * p1,3(z1,3; f) * ... * p1,N(z1,N; f),   (2.24)

where * denotes convolution.

For the general case where all pair combinations have different values of kNi,j, the result of the convolution is:

p(z; f) = Σ_{i=2}^{N} [P1,i(f)]^(N−3) · exp(−z / P1,i(f)) / Π_{j=2, j≠i}^{N} [P1,i(f) − P1,j(f)],   z ≥ 0.   (2.25)

For the special case that all pair combinations have the same NEXT level, P(f) = P1,i(f), i = 2, ..., N, the NEXT power sum will be gamma distributed with N − 1 degrees of freedom and probability density given by:

p(z; f) = (1 / Γ(N − 1)) · (z^(N−2) / [P(f)]^(N−1)) · exp(−z / P(f)),   z ≥ 0.   (2.26)

Γ(x) is the gamma function. In practical cables the NEXT level usually varies between different pair combinations, so that (2.25) represents a more realistic case than (2.26).

2.4 Joint Probability Distribution of NEXT at Two Frequencies

In order to calculate estimates of the variance of, for instance, the channel capacity, it is necessary to know the joint probability distributions of NEXT at two different frequencies f1 and f2. The following two-dimensional probability density function of the NEXT power transfer function for a single pair combination is calculated in Appendix B. The variables z1,i,j and z2,i,j are the NEXT power transfer functions between pair i and j at frequencies f1 and f2 respectively:

pi,j(z1,i,j, z2,i,j; f1, f2) = (1 / (Pi,j(f1) · Pi,j(f2) · (1 − ρ))) · exp(−z1,i,j / ((1 − ρ) · Pi,j(f1)) − z2,i,j / ((1 − ρ) · Pi,j(f2))) · I0((2√ρ / (1 − ρ)) · √(z1,i,j · z2,i,j / (Pi,j(f1) · Pi,j(f2)))).   (2.27)

I0(x) is the modified Bessel function of order zero [14], and ρ is given by (2.14).

The two-dimensional probability density function of NEXT power sums z1 = z(f1) and z2 = z(f2) for pair no. 1 is given by:

p(z1, z2; f1, f2) = p1,2(z1,1,2, z2,1,2; f1, f2) * p1,3(z1,1,3, z2,1,3; f1, f2) * ... * p1,N(z1,1,N, z2,1,N; f1, f2).   (2.28)

In this equation * denotes two-dimensional convolution. Given that the two independent sets of variables (x1, x2) and (y1, y2) have the independent probability distributions p1(x1, x2) and p2(y1, y2), the probability distribution of the sums z1 = x1 + y1 and z2 = x2 + y2 may be expressed by the two-dimensional convolution defined by:

p(z1, z2) = p1(x1, x2) * p2(y1, y2) = ∫_{x1=0}^{∞} ∫_{x2=0}^{∞} p1(x1, x2) · p2(z1 − x1, z2 − x2) dx1 dx2.   (2.29)


3 Monte Carlo Simulation of NEXT

A cable is simulated in a computer by dividing the cable into short segments of length ∆x, which is much shorter than the wavelength at the highest frequency under consideration (∆x < λ / 20). Statistically independent crosstalk couplings are drawn according to (2.2) for all cable segments and all pair combinations in each cable, and an ensemble of many different cables may be generated. This is illustrated in Figure 3.1.

Figure 3.1 Illustration of a Monte Carlo simulation of M cables (each cable with pairs no. 1 to N, divided into segments of length ∆x)

Near end crosstalk for a specific pair combination is then calculated from Equation (2.1). This gives results as shown in Figure 3.2 for three individual pair combinations of the 10 pair cable (binder group) described in Table 2.1. This shows the typical peaky nature of NEXT that is observed in measurements. Average NEXT for this cable type is also shown in the figure as a broken line. In accordance with (2.7) this is a straight line with slope 15 dB/decade of the frequency.

The NEXT power sum in different cables of the type described in Table 2.1 has been calculated by simulation, and the result is shown in Figure 3.3 for pair no. 1 of three different cables. Due to the averaging over 9 different pair combinations, the NEXT power sum is less peaky than NEXT for individual pair combinations. The figure also shows the average NEXT power sum, which is a straight line that varies 15 dB/decade with frequency.
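A minimal sketch of one such simulation in Python (illustrative only: the cable length, segment length and the treatment of a single pair combination with coupling constant kN are assumptions for the example, not the paper's exact implementation). It draws white-noise coupling coefficients per segment according to (2.2) and evaluates a discretised version of the integral in (2.1):

import numpy as np

def next_transfer(f_hz, kN, length_km=3.0, dx_km=0.002, rng=None):
    # Discretised Equation (2.1): H = j*beta0 * sum kappa(x)*exp(-2*gamma*x)*dx.
    rng = rng or np.random.default_rng()
    a1M = 15.1 / 8.686                 # attenuation at 1 MHz in Neper/km
    b1M = 31.4                         # phase constant at 1 MHz in rad/km
    F = f_hz / 1e6                     # frequency in MHz
    alpha, beta = a1M * np.sqrt(F), b1M * F     # (2.5) and (2.6); beta0 = beta
    x = np.arange(0.0, length_km, dx_km)
    # White-noise coupling, Equation (2.2): variance kN/dx per segment.
    kappa = rng.normal(0.0, np.sqrt(kN / dx_km), size=x.size)
    gamma = alpha + 1j * beta
    return 1j * beta * np.sum(kappa * np.exp(-2 * gamma * x)) * dx_km

Averaging |next_transfer(f, kN)|² over many draws reproduces the analytical mean (2.4), kN · β0² / (4α); repeating the draw for every pair combination and every cable yields an ensemble like that of Figure 3.1.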

4 Channel Capacity

The theoretical channel capacity of a copper pair can be found by Shannon's channel capacity formula. A realistic estimate of the channel capacity of a pair may be calculated by inserting two factors into Shannon's channel capacity formula. This modified channel capacity will be a realistic estimate of the transmission rate of a multicarrier system using adaptive modulation. According to Starr, Cioffi and Silverman [2], the transmission rate for a system operating in the frequency band [fl, fh] of a cable of length ℓ may be expressed:

R(ℓ) = keff · ∫_{fl}^{fh} log2(1 + η · exp(−2αℓ) / z) df,   (4.1)

where

z = z(f) is the near end crosstalk power sum of the actual pair in the cable
keff ≤ 1 is the ratio between user available bitrate and the total bitrate
η = 10^(−mdB/10), where mdB is the distance to the Shannon bound in dB.

The crosstalk power sum, z, is a random variable, and hence the channel capacity or transmission rate of the system will be a random variable. The moments and probability distribution of the channel capacity can either be calculated by analytical methods or by Monte Carlo simulation.

Figure 3.2 NEXT for three pair combinations generated by a Monte Carlo simulation (pair combinations 1–3 and average NEXT)

Figure 3.3 NEXT power sum for pair no. 1 of three different cables (cables 1–3 and average NEXT power sum)



4.1 Calculation of Channel Capacity by Analytical Methods

The first and second order moments of the channel capacity can be calculated by means of the probability distributions of NEXT power sum found in Section 2.3. The average channel capacity will be:

E[R(ℓ)] = keff · ∫_{fl}^{fh} ∫_0^∞ p(z; f) · log2(1 + η · exp(−2αℓ) / z) dz df.   (4.2)

The variance of the channel capacity will be:

var[R(ℓ)] = E[R(ℓ)²] − E[R(ℓ)]²,   (4.3)

which may be expressed:

var[R(ℓ)] = keff² · ∫_{fl}^{fh} ∫_{fl}^{fh} ∫_0^∞ ∫_0^∞ [p(z1, z2; f1, f2) − p(z1; f1) · p(z2; f2)] · log2(1 + η · exp(−2α1ℓ) / z1) · log2(1 + η · exp(−2α2ℓ) / z2) dz1 dz2 df1 df2.   (4.4)

The integrals of (4.2) and (4.4) cannot be solved analytically and have to be calculated either by approximations or by numerical integration. In Section 6 it is shown that the channel capacity is approximately Gaussian, and hence an approximate analytical probability distribution of the channel capacity is defined by (4.2) and (4.4). However, due to the complexity of both the above integrals and the convolution integral (2.28), the use of Monte Carlo simulation has been chosen in the rest of this paper.

4.2 Calculation of Channel Capacity by Monte Carlo Simulation

An ensemble of M different cables is generated in a Monte Carlo simulation. The channel capacity of cable no. k is found by:

Rk(ℓ) = keff · ∫_{fl}^{fh} log2(1 + η · exp(−2αℓ) / |HNps(f)|²) df,   (4.5)

where |HNps(f)|² is the NEXT power sum of this specific cable.

By means of the above formulas, the channel capacity can be calculated for the ensemble of cables. The average bitrate and the variance of the bitrate are estimated by:

Rav(ℓ) = (1 / M) · Σ_{k=1}^{M} Rk(ℓ),   (4.6)

σR²(ℓ) = (1 / M) · Σ_{k=1}^{M} Rk²(ℓ) − Rav²(ℓ).   (4.7)

The 1 % confidence limit (worst case estimate) of the bitrate, R1%(ℓ), may be estimated from the ensemble Rk(ℓ).

5 System Example

Two-way transmission in pair cables is efficient only at the lowest frequencies. At higher frequencies it is clearly advantageous to use one-way transmission. This is exploited in the standards for ADSL [9] and VDSL [10]. The basic differences between one-way and two-way systems are explained for instance in a tutorial paper by Holte [11]. Two-way transmission is used in HDSL systems and in the newly standardised SHDSL system [1]. The advantages of calculating the channel capacity by this new approach are demonstrated by an example. This example is a system which uses a spectrum similar to that of the SHDSL system with maximum rate, 2.3 Mbit/s. The SHDSL system is a baseband system with a 10 dB bandwidth of approximately 400 kHz. The same bandwidth is used in the example, but because the crosstalk models are not valid below 100 kHz, the bandwidth has been shifted up to the frequency interval from 100 kHz to 500 kHz. Hence the capacity estimates will be pessimistic in comparison with a system operating from 0 – 400 kHz. The main assumptions of the system model used in the simulations are listed below:

Bandwidth: 100 kHz < f < 500 kHz
Modulation: Multicarrier modulation (DMT) [12]
Modulation method in each sub-band: Trellis coded M-QAM, 4 ≤ M ≤ 16384
Corresponding bandwidth efficiency: 1 to 13 bit/s/Hz
Distance to Shannon bound: 9 dB (6 dB margin + 3 dB for TCM)
User available bitrate: 90 % of total bitrate
Cable type: 0.6 mm copper cable (22 AWG)
Cable size: 10 pair binder group
Attenuation constant: 15.1 dB/km at 1 MHz, proportional to √f
Phase constant: 31.4 rad/km at 1 MHz, proportional to f
Noise model: Only NEXT between identical systems within a binder group; all pairs used



Average NEXT power sum: 47.9 dB at 1 MHz, proportional to f^1.5 (pair 1)
99 % confidence limit, NEXT power sum: 44.7 dB at 1 MHz (pair 1)

In order to take into account further practical limitations connected to adaptive modulation, the estimation of transmission rate has been slightly modified in comparison with Equation (4.5). The bandwidth efficiency is set to zero at frequencies where the signal to noise ratio gives a bandwidth efficiency below 1 bit/s/Hz, and the bandwidth efficiency is limited to a maximum of 13 bit/s/Hz. In the results presented in this paper, no quantisation of bandwidth efficiency has been used within the allowed interval. An alternative approach would be to quantise the bandwidth efficiency to integer values in order to simulate adaptive modulation which uses two-dimensional trellis coding. The reason for not using quantisation of bandwidth efficiency in this paper is that it introduces more or less random quantisation effects in the results, and this may be disturbing in relative comparisons. It is recommended to use quantisation of bandwidth efficiency in detailed dimensioning cases.
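In code, this modification amounts to clipping the per-frequency bandwidth efficiency before integrating (4.5) over the band. A sketch under the stated system assumptions (9 dB distance to the Shannon bound, keff = 0.9, 1 – 13 bit/s/Hz), where the arrays next_ps (NEXT power sum) and alpha_ell (attenuation αℓ in Neper) are assumed given per frequency point:

import numpy as np

def adaptive_rate(f_hz, next_ps, alpha_ell, k_eff=0.9, m_db=9.0):
    # Rate per Equation (4.5), with the bandwidth efficiency set to zero
    # below 1 bit/s/Hz and saturated at 13 bit/s/Hz.
    eta = 10 ** (-m_db / 10)
    snr = eta * np.exp(-2 * alpha_ell) / next_ps
    eff = np.log2(1 + snr)
    eff = np.where(eff < 1.0, 0.0, np.minimum(eff, 13.0))
    # Trapezoidal integration over the frequency grid, in bit/s.
    return k_eff * np.sum(0.5 * (eff[1:] + eff[:-1]) * np.diff(f_hz))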

6 Results for Adaptive Modulation

Bitrates for different cable lengths have been estimated by means of the Monte Carlo simulation, based upon the above methods and the crosstalk measurements shown in Table 2.1. The estimated bitrates for 10,000 different cables at cable lengths 1, 2 and 3 km are shown in Figure 6.1 in a normal distribution plot. This result shows that the bitrate of a system with adaptive modulation fits very well with a Gaussian probability distribution.

Figure 6.1 Cumulative distribution of estimated bitrate for a 0.6 mm cable for different cable lengths (normal probability plot)

Both the average bitrate of the simulated cables and the 1 % confidence limit of the bitrate are shown in Figure 6.2 as a function of the cable length.

Figure 6.2 Estimated bitrates for two-way transmission as a function of range for a multicarrier system with adaptive modulation on a 0.6 mm pair cable (curves: average bitrate, 1 % confidence limit, and traditional estimate)

The difference between the two upper curves shows the average increase in bitrate that is obtained by using adaptive modulation instead of offering a guaranteed fixed rate. This increase is approximately 0.25 Mbit/s. The increase in bitrate is typically 10 % at range 3 km. Figure 6.2 also shows the bitrate for the same type of system based upon a traditional estimate of NEXT. This means that the 99 % confidence limit of NEXT power sum is used at all frequencies. The difference between the two lower curves shows the increase in bitrate that is obtained by using the capacity estimates of the new method instead of the traditional approach. The increase in bitrate is approximately 5 % for this example. The increase in bitrate between the two approaches decreases for longer ranges. The reason is that the useful bandwidth decreases with range due to high attenuation in the upper end of the frequency band. Hence, the number of frequency bands that have independent NEXT will decrease as the cable length increases.

7 Implications for New Systems

The newly standardised SHDSL system [1] is a single carrier baseband system using 16-level pulse amplitude modulation and trellis coded modulation, and hence it has a bandwidth efficiency of less than 6 bit/s/Hz. If a new system is implemented which uses adaptive multicarrier modulation and is spectrally compatible with SHDSL, this new system will obtain significantly higher bitrates for the same cable lengths. The increase in bitrate is due to the following effects:

• higher bandwidth efficiency of multicarrier modulation, in particular at the lowest frequencies;

• less excess bandwidth for multicarrier modulation;

• independent NEXT in different frequency bands, as explained in this paper;

• a variable rate system will have a larger average bitrate than a fixed rate system, as shown in Figure 6.2.

8 Conclusions

It has been shown how the true channel capacity of a pair cable with respect to NEXT can be calculated both analytically and by simulation. It is demonstrated that by taking into account the frequency variations of near end crosstalk, the



channel capacity for two-way transmission in pair cables may be increased in comparison with traditional methods. The increase in worst-case transmission rate is approximately 5 % for a typical system example. For the same example it is also shown that the average bitrate for a rate adaptive system may be increased by typically 10 % in comparison with a system with a guaranteed minimum rate. This applies to multicarrier systems with adaptive modulation that use two-way transmission and echo cancelling at all frequencies.

Appendix A
Probability Distributions of NEXT at a Single Frequency

The simplified notation r = ri,j(f) and q = qi,j(f) is used in this appendix for the real and imaginary part of the NEXT transfer function for one pair combination. Because the NEXT transfer function is a linear function of the coupling coefficients and the coupling coefficients are Gaussian with zero mean, r and q will be jointly Gaussian with zero mean. The variance of the real part will be given by:

E[r²] = β0² ∫_0^∞ ∫_0^∞ E[κNi,j(x) · κNi,j(y)] · sin(2βx) · sin(2βy) · exp(−2αx − 2αy) dx dy.   (A.1)

Inserting (2.2) gives:

E[r²] = β0² · kNi,j ∫_0^∞ sin²(2βx) · exp(−4αx) dx.   (A.2)

Solving the integral gives the result:

σr² = E[r²] = (β0² · kNi,j / (8α)) · β² / (α² + β²).   (A.3)

The remaining second order moments of the NEXT transfer function are calculated correspondingly:

E[q²] = β0² · kNi,j ∫_0^∞ cos²(2βx) · exp(−4αx) dx,   (A.4)

E[r · q] = β0² · kNi,j ∫_0^∞ sin(2βx) · cos(2βx) · exp(−4αx) dx,   (A.5)

E[r · q] = (β0² · kNi,j / 8) · β / (α² + β²),   (A.6)

σq² = E[q²] = (β0² · kNi,j / (8α)) · (2α² + β²) / (α² + β²).   (A.7)

The correlation coefficient is defined by:

ρrq = E[r · q] / (σr · σq) = α / √(2α² + β²).   (A.8)

According to Papoulis [13], the probability density of NEXT for a single pair combination may be expressed:

pi,j(r, q) = (1 / (2π · σr · σq · √(1 − ρrq²))) · exp(−(1 / (2 · (1 − ρrq²))) · (r² / σr² − 2ρrq · r · q / (σr · σq) + q² / σq²)).   (A.9)

Insertion of (A.3), (A.7) and (A.8), and using the notation Pi,j = Pi,j(f) for the average NEXT power transfer function, gives:

pi,j(r, q) = (√(α² + β²) / (π · β · Pi,j)) · exp(−((2α² + β²) · r² − 2αβ · r · q + β² · q²) / (β² · Pi,j)).   (A.10)

By a change of variables the probability density may be expressed by the NEXT power transfer function z = r² + q² and the phase angle ϕ = arctan(q / r):


(A.11)

where ϕ = arctan(β / α), z ≥ 0, and 0 ≤ ϕ < 2π.

The phase of the signals in different pairs in acable is random. Hence, the phase angles ofNEXT transfer functions are irrelevant. Theprobability density of the NEXT power transferfunction for one pair combination is given by:

(A.12)

Under the assumption α << β this simplifies toan exponential distribution:

(A.13)
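The distributions above are easy to sanity-check numerically. The following Python sketch (not part of the original paper) samples r and q with the second-order moments (A.3), (A.6) and (A.7), forms z = r² + q², and compares its empirical density with (A.12) and the exponential limit (A.13). The values chosen for α, β and β₀²⋅k_N,i,j are arbitrary illustrative assumptions.

# Monte-Carlo sanity check of (A.3)-(A.13). All parameter values below
# are illustrative assumptions, not values from the paper.
import numpy as np
from scipy.special import i0

rng = np.random.default_rng(1)
alpha, beta = 0.05, 2.0   # hypothetical attenuation/phase constants
k = 1.0                   # stands in for beta_0^2 * k_N,i,j

P = k / (4 * alpha)                                                      # sigma_r^2 + sigma_q^2
var_r = k / (8 * alpha) * beta**2 / (alpha**2 + beta**2)                 # (A.3)
var_q = k / (8 * alpha) * (2 * alpha**2 + beta**2) / (alpha**2 + beta**2)  # (A.6)
cov_rq = k / 8 * beta / (alpha**2 + beta**2)                             # (A.7)

cov = np.array([[var_r, cov_rq], [cov_rq, var_q]])
r, q = rng.multivariate_normal([0.0, 0.0], cov, size=200_000).T
z = r**2 + q**2

s = np.sqrt(alpha**2 + beta**2)
pdf_a12 = lambda z: (s / (beta * P)) * np.exp(-z * s**2 / (beta**2 * P)) \
                    * i0(z * alpha * s / (beta**2 * P))                  # (A.12)
pdf_a13 = lambda z: np.exp(-z / P) / P                                   # (A.13)

hist, edges = np.histogram(z, bins=100, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
for zi in mid[::20]:   # compare a few sample points
    print(f"z={zi:7.1f}  empirical={np.interp(zi, mid, hist):.2e}  "
          f"(A.12)={pdf_a12(zi):.2e}  (A.13)={pdf_a13(zi):.2e}")

Since α << β here, the three columns printed should nearly coincide.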

Appendix B
Joint Probability Distributions of NEXT at Two Different Frequencies

In order to calculate the joint probability distribution of NEXT at two different frequencies f1 and f2, the following simplified notation is introduced in this appendix: r1 = r_i,j(f1), q1 = q_i,j(f1), r2 = r_i,j(f2) and q2 = q_i,j(f2). Furthermore, the parameters α, β, β₀ are given an additional index 1 or 2 to denote the values at f1 and f2 respectively. The vector of NEXT variables is defined by:

v_i,j = [r1 q1 r2 q2]^T,  (B.1)

where T denotes transposition. The vector v_i,j will in correspondence with Appendix A be Gaussian with zero mean, and the probability density function is given by [13]:

p_i,j(v_i,j; f1, f2) = 1 / ((2π)² ⋅ √|C_i,j|) ⋅ exp( −(1/2) ⋅ v_i,j^T C_i,j^−1 v_i,j ),  (B.2)

where C_i,j is the covariance matrix of v_i,j. The elements of C_i,j are calculated in the same way as in Appendix A and are given by:

E[r_k ⋅ r_l] = β₀k ⋅ β₀l ⋅ k_N,i,j ∫₀^∞ sin(2β_k x) ⋅ sin(2β_l x) ⋅ exp(−2α_k x − 2α_l x) dx,  (B.3)

E[r_k ⋅ q_l] = β₀k ⋅ β₀l ⋅ k_N,i,j ∫₀^∞ sin(2β_k x) ⋅ cos(2β_l x) ⋅ exp(−2α_k x − 2α_l x) dx,  (B.4)

E[q_k ⋅ q_l] = β₀k ⋅ β₀l ⋅ k_N,i,j ∫₀^∞ cos(2β_k x) ⋅ cos(2β_l x) ⋅ exp(−2α_k x − 2α_l x) dx,  (B.5)

k, l = 1, 2.

The solutions are expressed:

E[r_k ⋅ r_l] = (β₀k ⋅ β₀l ⋅ k_N,i,j / 4) ⋅ [ (α_k + α_l)/((α_k + α_l)² + (β_k − β_l)²) − (α_k + α_l)/((α_k + α_l)² + (β_k + β_l)²) ],  (B.6)

E[r_k ⋅ q_l] = (β₀k ⋅ β₀l ⋅ k_N,i,j / 4) ⋅ [ (β_k − β_l)/((α_k + α_l)² + (β_k − β_l)²) + (β_k + β_l)/((α_k + α_l)² + (β_k + β_l)²) ],  (B.7)

E[q_k ⋅ q_l] = (β₀k ⋅ β₀l ⋅ k_N,i,j / 4) ⋅ [ (α_k + α_l)/((α_k + α_l)² + (β_k − β_l)²) + (α_k + α_l)/((α_k + α_l)² + (β_k + β_l)²) ].  (B.8)

Under the assumption α << β the last term in the equations (B.6) – (B.8) may be neglected, and for this case the covariance matrix is given by:

         | a   0   c   d |
C_i,j =  | 0   a  −d   c |   (B.9)
         | c  −d   b   0 |
         | d   c   0   b |

Using the notation P1,i,j = P_i,j(f1) and P2,i,j = P_i,j(f2) for average NEXT, the elements of the matrix may be expressed:

a = E[r1²] = E[q1²] = P1,i,j / 2,  (B.10)

b = E[r2²] = E[q2²] = P2,i,j / 2,  (B.11)

c = E[r1 ⋅ r2] = E[q1 ⋅ q2] = √(P1,i,j ⋅ P2,i,j) ⋅ √(α1 ⋅ α2) ⋅ (α1 + α2) / ((α1 + α2)² + (β1 − β2)²),  (B.12)

d = E[r1 ⋅ q2] = −E[q1 ⋅ r2] = √(P1,i,j ⋅ P2,i,j) ⋅ √(α1 ⋅ α2) ⋅ (β1 − β2) / ((α1 + α2)² + (β1 − β2)²),  (B.13)

E[r1 ⋅ q1] = E[r2 ⋅ q2] = 0.  (B.14)

Gibbs and Addie [3] have neglected the term d. This is obviously a significant term, which means that Gibbs and Addie's calculation of joint probability densities at two frequencies is incorrect.

The determinant of C_i,j will be:

|C_i,j| = (a ⋅ b − c² − d²)² = g²,  (B.15)

where g may be expressed:

g = a ⋅ b − c² − d² = (P1,i,j ⋅ P2,i,j / 4) ⋅ ( 1 − 4α1α2 / ((α1 + α2)² + (β1 − β2)²) ).  (B.16)

The inverse covariance matrix is given by:

                   |  b   0  −c  −d |
C_i,j^−1 = (1/g) ⋅ |  0   b   d  −c |   (B.17)
                   | −c   d   a   0 |
                   | −d  −c   0   a |

The probability density of v_i,j may hence be expressed:

p_i,j(v_i,j) = 1 / ((2π)² ⋅ g) ⋅ exp( −(1/(2g)) ⋅ [ b(r1² + q1²) + a(r2² + q2²) − 2c(r1 ⋅ r2 + q1 ⋅ q2) − 2d(r1 ⋅ q2 − q1 ⋅ r2) ] ).  (B.18)

A change of variables to NEXT power transfer functions and NEXT phase angles is carried out. The new variables are z1 = r1² + q1², φ = arctan[q1/r1], z2 = r2² + q2², and θ = arctan[q2/r2], and the probability density may then be written:

p_i,j(z1, φ, z2, θ) = 1 / ((4π)² ⋅ g) ⋅ exp( −(1/(2g)) ⋅ [ b ⋅ z1 + a ⋅ z2 + 2√(z1 ⋅ z2) ⋅ √(c² + d²) ⋅ cos(φ − θ + ψ) ] ),  (B.19)

where ψ = arctan[d/c], z1, z2 ≥ 0, and 0 ≤ φ, θ < 2π. The phase angles of NEXT transfer functions are irrelevant, and the joint probability density of z1 and z2 will be:

p_i,j(z1, z2) = ∫₀^{2π} ∫₀^{2π} p_i,j(z1, φ, z2, θ) dφ ⋅ dθ = (1/(4g)) ⋅ exp( −(b ⋅ z1 + a ⋅ z2)/(2g) ) ⋅ I₀( √(z1 ⋅ z2 ⋅ (c² + d²)) / g ), z1, z2 ≥ 0.  (B.20)

The parameters a, b, c, d are replaced according to (B.10) – (B.13), and the result is expressed by the normalised covariance ρ defined in (2.14). Hence, the joint probability density of the NEXT power transfer function at two frequencies f1 and f2 for one pair combination is given by:

p_i,j(z1, z2) = 1 / (P1,i,j ⋅ P2,i,j ⋅ (1 − ρ)) ⋅ exp( −(1/(1 − ρ)) ⋅ (z1/P1,i,j + z2/P2,i,j) ) ⋅ I₀( (2√ρ/(1 − ρ)) ⋅ √(z1 ⋅ z2/(P1,i,j ⋅ P2,i,j)) ), z1, z2 ≥ 0.  (B.21)
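As a numerical illustration of Appendix B (again our sketch, not code from the paper), the script below builds the covariance matrix (B.9) from (B.10) – (B.13) for assumed parameter values, draws samples of v_i,j, and checks that z1 and z2 come out exponentially distributed with normalised covariance ρ = 4α1α2/((α1 + α2)² + (β1 − β2)²), the value implied by (B.16) and appearing in (B.21).

# Monte-Carlo check of (B.9)-(B.21). Parameter values are illustrative
# assumptions only.
import numpy as np

rng = np.random.default_rng(7)
a1, a2 = 0.05, 0.07    # alpha at f1 and f2 (hypothetical)
b1, b2 = 2.00, 2.05    # beta at f1 and f2 (hypothetical)
P1, P2 = 1.0, 0.8      # average NEXT power transfer at f1 and f2

den = (a1 + a2) ** 2 + (b1 - b2) ** 2
rho = 4 * a1 * a2 / den                                   # implied by (B.16)
a, b = P1 / 2, P2 / 2                                     # (B.10), (B.11)
c = np.sqrt(P1 * P2 * a1 * a2) * (a1 + a2) / den          # (B.12)
d = np.sqrt(P1 * P2 * a1 * a2) * (b1 - b2) / den          # (B.13)

C = np.array([[a, 0, c, d],
              [0, a, -d, c],
              [c, -d, b, 0],
              [d, c, 0, b]])                              # (B.9)

v = rng.multivariate_normal(np.zeros(4), C, size=500_000)
z1 = v[:, 0] ** 2 + v[:, 1] ** 2
z2 = v[:, 2] ** 2 + v[:, 3] ** 2

print("E[z1], E[z2]   :", z1.mean(), z2.mean())           # ~ P1, P2
print("corr(z1, z2)   :", np.corrcoef(z1, z2)[0, 1])      # ~ rho
print("rho from (B.16):", rho)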

References
1 ITU-T. Single-Pair High-Speed Digital Subscriber Line (SHDSL) transceivers. Geneva, 2001. (ITU-T Recommendation G.991.2.)
2 Starr, T, Cioffi, J M, Silverman, P J. Understanding digital subscriber line technology. Upper Saddle River, Prentice Hall, 1999.
3 Gibbs, A J, Addie, R. The covariance of near end crosstalk and its application to PCM system engineering in multipair cable. IEEE Trans. on Commun., COM-27 (2), 1979, 469–477.
4 Klein, W. Die Theorie des Nebensprechens auf Leitungen. Berlin, Springer, 1955.
5 Cravis, H, Crater, T V. Engineering of T1 carrier system repeatered lines. Bell System Techn. Journal, March 1963, 431–486.
6 Bradley, S D. Crosstalk considerations for a 48 channel PCM repeatered line. IEEE Trans. on Communications, COM-23 (7), 1975, 722–728.
7 Holte, N. A Crosstalk Model for Cross-Stranded Cables. Int. Wire & Cable Symp., Cherry Hill, USA, Nov. 1982.
8 Holte, N. Crosstalk in subscriber cables, Final report. Trondheim, SINTEF, 1985. (SINTEF report no. STF44 F85001.) (In Norwegian.)




9 ITU-T. Asymmetric Digital Subscriber Line (ADSL) transceivers. Geneva, 1999. (ITU-T Recommendation G.992.1.)
10 ETSI. Very high speed Digital Subscriber Line (VDSL); Part 1: Functional requirements. Sophia Antipolis, 1999. (ETSI TS 101 270-1, V1.2.1.)
11 Holte, N. Broadband communication in existing copper cables by means of xDSL systems – a tutorial. NORSIG, Trondheim, Norway, October 2001.
12 Bingham, J A C. Multicarrier modulation for data transmission: An idea whose time has come. IEEE Communications Magazine, May 1990, 5–19.
13 Papoulis, A. Probability, random variables, and stochastic processes. 3rd ed., New York, McGraw Hill, 1991.
14 Abramowitz, M, Stegun, I A. Handbook of mathematical functions. New York, Dover, 1970.


Bounds on the Average Spectral Efficiency of Adaptive Coded Modulation
KJELL J. HOLE

An introduction to adaptive coded modulation (ACM) was given in an earlier paper by Hole and Øien [4]. It was shown that ACM may achieve a large average spectral efficiency (ASE) on wireless channels with slowly varying frequency-flat fading. This paper presents new upper bounds on the ASE of ACM with maximum likelihood decoding and sequential decoding. The bounds are compared to the theoretical maximum ASE. It is found that this theoretical maximum provides an optimistic upper bound on the achievable ASE of practical ACM schemes. However, the new bounds indicate that ACM may still provide a large ASE.

Kjell Jørgen Hole (41) received his BSc, MSc and PhD degrees in computer science from the University of Bergen in 1984, 1987 and 1991, respectively. He is currently Senior Research Scientist at the Department of Telecommunications at the Norwegian University of Science and Technology (NTNU) in Trondheim. His research interests are in the areas of coding theory and wireless communications.
[email protected]

I. Introduction
Time-varying channel conditions are an important feature of most wireless communication systems. Future systems must therefore exhibit a high degree of adaptivity on many levels to support traffic flows with large information rates (measured in transmitted information bits per second). Examples of such adaptivity are: power control, code adaptation, bandwidth adaptation, antenna adaptation, and protocol adaptation [1], [2]. This paper deals with code adaptation.

The spectral efficiency of a wireless channel or link is equal to the information rate per unit bandwidth. When the instantaneous spectral efficiency varies due to code adaptation, the average spectral efficiency (ASE) should be maximized. Goldsmith and Varaiya [3] have determined the maximum ASE of a single-user communication system with a slowly varying frequency-flat fading channel, fixed transmit signal power, and perfect channel-state information (CSI) available at the transmitter and receiver.

It is shown in [3] that the maximum ASE may be obtained by a theoretical adaptive coding scheme. Adaptive coded modulation (ACM) with fixed transmit power may be a practical adaptive transmission technique for frequency-flat fading channels with available CSI [4]–[11]. ACM may utilize e.g. a set of classical trellis codes [12]–[17] or a set of turbo-trellis codes [18]–[21] in which each code is designed for good performance on an additive white Gaussian noise (AWGN) channel. If the codes in a set are based on quadrature amplitude modulation (QAM) constellations of different sizes [22], then a low bit error rate (BER) and large ASE may be obtained simultaneously by switching adaptively between the codes according to the CSI. The goal of this paper is to present new upper bounds on the ASE of single-user wireless ACM/QAM channels with frequency-flat fading.

We begin by introducing, in Section II, a communication system model utilizing ACM/QAM on an arbitrary single-user flat-fading channel. Section III then upper bounds the ASE of ACM/QAM both for maximum likelihood decoding (MLD) and for sequential decoding (SD). As an example, the MLD bound is applied to single-user Nakagami fading channels, and both the MLD and SD bounds are compared to the theoretical maximum ASE for Nakagami fading channels [23]. Conclusions are drawn in Section IV.

II. Fading Channels and ACM
The discrete-time system model consists of a transmitter and a receiver communicating over a single-user wireless channel degraded by multipath fading. The fading is assumed to remain nearly constant over hundreds of channel symbols. Pilot symbols are sent repeatedly over the channel to ensure that the receiver is able to fully compensate for the amplitude and phase variations in the received signal, i.e. we assume ideal channel estimation and coherent detection.1) Hence, we may model the stationary and ergodic fading amplitude α(t) (≥ 0) as a stochastic variable with real values.

1) The overhead bandwidth associated with the pilot symbols is ignored.

Let the transmitted signal have complex baseband representation x(t) at time index t ∈ {0, 1, 2, ...}. The received baseband signal is then given by y(t) = α(t)x(t) + n(t) where n(t) denotes complex AWGN. The real and imaginary parts of the noise are statistically independent, both with variance (N0B)/2 where N0 [W/Hz] is the total one-sided noise power spectral density and B [Hz] is the one-sided channel bandwidth.

Denote the average transmit power by S [W]. The instantaneous received carrier-to-noise ratio (CNR) is represented by the stochastic variable γ(t) = α²(t)S/(N0B). (In the sequel we omit the time reference t and refer to α and γ.) Let p(γ) be the probability density function (pdf)

of the CNR γ. We only assume that p(γ) is a continuous function on the interval [0, ∞). The CNR has expectation E[γ] = γ̄ = ΩS/(N0B), where E[α²] = Ω is the average received power gain. We assume that there exists a noiseless and zero-delay feedback channel from the receiver to the transmitter such that both the transmitter and receiver have perfect knowledge of the instantaneous received CNR γ at all times.

Let N quantization levels (or fading regions) represent the time-varying received CNR γ. When ACM is used, one trellis code designed to combat AWGN is assigned to each fading region [4]–[11]. The N fading regions are defined by the thresholds 0 < γ1 < γ2 < ... < γN+1 = ∞. Code n, n ∈ {1, 2, ..., N}, is utilized every time the instantaneous received CNR γ falls in region n, i.e. when γn ≤ γ < γn+1. The ACM system is designed such that the BER never exceeds a given target maximum BER0 for any CNR γ. Details on how to choose thresholds {γn}_{n=1}^{N+1} to achieve BER ≤ BER0 for a given set of N codes can be found in [9].

Fading region n = 1 represents the smallest values of γ for which information is transmitted. No information is sent when γ < γ1 simply because the channel quality is too bad to successfully transmit any information with the available codes. Hence, an ACM scheme experiences an outage during which information must be buffered at the transmitter end. The probability of outage is

Pout(γ1) = ∫₀^{γ1} p(γ) dγ = 1 − ∫_{γ1}^∞ p(γ) dγ.  (1)

III. Bounds on ASE of ACM
This section upper bounds the ASE of ACM utilizing trellis codes for QAM signal constellations.

III.A Bound for MLD
Let 4 ≤ M1 < M2 < ... < MN denote the number of symbols in N two-dimensional QAM constellations of growing size, and let code n be based on the constellation with Mn symbols. For some small fixed L ∈ {1, 2, ...}, the encoder for code n accepts L ⋅ log2(Mn) − c information bits at each time index k = L ⋅ t ∈ {0, L, 2L, ...} and generates L ⋅ log2(Mn) coded bits, 1 ≤ c < L ⋅ log2(Mn). The coded bits specify L modulation symbols in the nth QAM constellation. These symbols are transmitted at time indices k, k + 1, ..., k + L − 1. The L two-dimensional symbols can be viewed as one 2L-dimensional symbol, and for this reason the code is said to be a 2L-dimensional trellis code.

Assuming Nyquist signaling, the time used to transmit one two-dimensional QAM symbol is Ts = 1/B [s]. Since the number of information bits per QAM symbol is log2(Mn) − c/L, the information rate of code n is Rn = (log2(Mn) − c/L)/Ts [bits/s], and the spectral efficiency is Rn/B = log2(Mn) − c/L [bits/s/Hz]. Consequently, the ASE of all N codes is [9], [24]

ASE = Σ_{n=1}^{N} (Rn/B) ⋅ P(γn, γn+1) = Σ_{n=1}^{N} [log2(Mn) − c/L] ⋅ P(γn, γn+1) [bits/s/Hz],  (2)

where

P(γn, γn+1) = ∫_{γn}^{γn+1} p(γ) dγ  (3)

is the probability of the instantaneous CNR γ falling in fading region n.

In principle, infinitely many sets of 2L-dimensional codes are available. We restrict ourselves to code sets which contain N codes and consider all such sets having the same parameters c, L, and {Mn}_{n=1}^{N}. In general, the code sets will correspond to different sets of thresholds {γn} for a common target BER0 [9]. A change of code set therefore implies changing one or more of the associated thresholds. To determine which code set maximizes the ASE under the given conditions, we first prove the following result.

Lemma 1: For given c, L, BER0, and {Mn}, the ASE defined by (2) and (3) does not decrease but may increase when a threshold γn is reduced.

Proof: First, let n > 1 and reduce the nth threshold from γn to γ̃n such that γn−1 ≤ γ̃n < γn. Keep the other thresholds fixed. We study the sum of the (n − 1)th and nth terms in (2). Before the threshold is changed, the sum is given by

Q = R̄n−1 ⋅ P(γn−1, γn) + R̄n ⋅ P(γn, γn+1)

where R̄j = Rj/B = log2(Mj) − c/L for j = n − 1, n, while after the threshold is changed the sum becomes

Q̃ = R̄n−1 ⋅ P(γn−1, γ̃n) + R̄n ⋅ P(γ̃n, γn+1).

The pdf p(γ) in (3) is assumed to be a continuous function and we may write

P(γn−1, γn) = P(γn−1, γ̃n) + P(γ̃n, γn),
P(γ̃n, γn+1) = P(γ̃n, γn) + P(γn, γn+1).


The difference D = Q̃ − Q then becomes

D = [R̄n − R̄n−1] ⋅ P(γ̃n, γn),

where R̄n − R̄n−1 > 0. Since P(γ̃n, γn) ≥ 0, we have D ≥ 0. When n = 1 the difference reduces to

D = R̄1 ⋅ [P(γ̃1, γ2) − P(γ1, γ2)] = R̄1 ⋅ P(γ̃1, γ1),

where R̄1 > 0. Since P(γ̃1, γ1) ≥ 0, we again have D ≥ 0. Q.E.D.

It is possible to lower bound the thresholds γn which can be used for a given BER0. Code n is only active when the instantaneous CNR falls in fading region n, i.e. γ ∈ [γn, γn+1). Hence, for a given region n, the fading channel can be approximated by a bandlimited AWGN channel with CNR at least equal to γn. The BER must never exceed the target maximum BER0 and code n must therefore achieve BER ≤ BER0 on an AWGN channel of CNR γn [9]. The maximum spectral efficiency of this AWGN channel is equal to the Shannon capacity, Cn [bits/s], divided by the bandwidth B [Hz]:

Cn/B = log2(1 + γn) [bits/s/Hz].

Since the spectral efficiency of code n must satisfy Rn/B ≤ Cn/B, we have

γn ≥ Mn/2^(c/L) − 1 ≜ γn^MLD, n = 1, 2, ..., N.

Note that the spectral efficiency Rn/B is obtained for a small finite target BER0 while an arbitrarily small BER is assumed for the maximum spectral efficiency Cn/B. Hence, the lower bound γn^MLD is in fact valid for any fixed BER0. Furthermore, observe that the proof of the Shannon capacity Cn (see e.g. [25, Sec. 10.1]) does not require MLD; however, we use the superscript MLD to denote that γn^MLD is the minimum possible threshold also for MLD.

As an example, let the number of symbols in the nth QAM constellation be Mn = 2^(n+1). The spectral efficiency Rn/B = n + 1 − c/L and the minimum threshold γn^MLD = 2^(n+1−c/L) − 1 are listed (in dB) in Table 1 for c = 1, L = 1, 2, and n = 1, 2, ..., 11.

Theorem 1: For given c, L, and {Mn}, the ASE defined by (2) and (3) is maximized for the minimum thresholds γn^MLD, n = 1, 2, ..., N.

Proof: The ASE defined by (2) and (3) has an absolute maximum at (γ1^MLD, ..., γN^MLD). To see this, reduce the threshold γ1 to its minimum value γ1^MLD. It follows from Lemma 1 that the ASE is not reduced. Repeat the procedure for each of the remaining thresholds γ2, γ3, ..., γN. Q.E.D.

Since the outage probability Pout(γ1) in (1) does not increase but may decrease when the threshold γ1 is reduced, we also have

Corollary 1: The minimum outage probability in an ACM/QAM system is Pout(γ1^MLD).

To obtain an adaptive codec with ASE close to the maximum given by Theorem 1, the information rate Rn of each trellis code n must be close to the Shannon capacity Cn of an AWGN channel with CNR equal to γn^MLD. The capacity Cn is obtained with a Gaussian distribution over continuous-valued channel input symbols. ACM utilizes trellis codes based on equiprobable discrete-valued QAM symbols. However, there exist various techniques, called constellation shaping techniques, that achieve a more Gaussian-like distribution of the coded QAM symbols [17].

Table 1 Spectral efficiency Rn/B [bits/s/Hz] and minimum thresholds γn^MLD [dB] and γn^SD [dB] for c = 1 and various values of n and L

  n     Mn   L   Rn/B   γn^MLD [dB]   γn^SD [dB]
  1      4   1    1.0       0.00         2.39
             2    1.5       2.62         4.80
  2      8   1    2.0       4.77         6.80
             2    2.5       6.68         8.61
  3     16   1    3.0       8.45        10.30
             2    3.5      10.13        11.94
  4     32   1    4.0      11.76        13.53
             2    4.5      13.35        15.09
  5     64   1    5.0      14.91        16.63
             2    5.5      16.46        18.17
  6    128   1    6.0      17.99        19.69
             2    6.5      19.52        21.21
  7    256   1    7.0      21.04        22.73
             2    7.5      22.55        24.24
  8    512   1    8.0      24.07        25.75
             2    8.5      25.58        27.26
  9   1024   1    9.0      27.08        28.76
             2    9.5      28.59        30.27
 10   2048   1   10.0      30.10        31.78
             2   10.5      31.61        33.28
 11   4096   1   11.0      33.11        34.79
             2   11.5      34.62        36.30
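The Rn/B and γn^MLD columns of Table 1 follow directly from the bound γn ≥ Mn/2^(c/L) − 1 with Mn = 2^(n+1) and c = 1. A minimal script (ours, not the author's) that reproduces them:

# Reproduce the MLD columns of Table 1 from gamma_n >= Mn/2^(c/L) - 1.
import math

c = 1
for L in (1, 2):
    for n in range(1, 12):
        M = 2 ** (n + 1)                      # constellation size Mn
        rate = math.log2(M) - c / L           # Rn/B [bits/s/Hz]
        g_mld = M / 2 ** (c / L) - 1          # minimum threshold (linear)
        print(f"n={n:2d} L={L} Rn/B={rate:4.1f} "
              f"gamma_MLD={10 * math.log10(g_mld):6.2f} dB")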


As an example, we apply Theorem 1 to maximize the ASE of Nakagami multipath fading channels whose fading amplitude α has a Nakagami pdf controlled by a real parameter m = Ω²/Var[α²] with constraint m ≥ 1/2 [26, p. 48].2) Here, Var[⋅] denotes the variance. The probability of γ falling in the nth fading region (3) is equal to [24]

P(γn, γn+1) = [ Γ(m, mγn/γ̄) − Γ(m, mγn+1/γ̄) ] / Γ(m),  (4)

where Γ(⋅, ⋅) is the complementary incomplete gamma function [27, Eq. (8.350.2)] and Γ(m) is the gamma function, which equals Γ(m) = (m − 1)! when m is a positive integer.

2) Nakagami is a general fading distribution that reduces to Rayleigh for m = 1. It also approximates the Rician distribution for m > 1, and it can approach the log-normal distribution.

Let m ∈ {1, 2, 4}, c = L = 1, and Mn = 2^(n+1) for n = 1, ..., 11. From Theorem 1, the maximum achievable ASE of ACM/QAM is obtained from (2) and (4) using the N = 11 minimum thresholds γn^MLD for L = 1 in Table 1. The maximum achievable ASE is plotted in Figure 1. Observe that a large ASE is achievable even for Rayleigh fading (m = 1). The difference between the theoretical maximum ASE [23, Eq. (23)], often denoted MASE, and the maximum achievable ASE of ACM/QAM is plotted in Figure 2. The curves show that the use of N = 11 codes (with Shannon capacity performance on continuous input AWGN channels) results in a maximum achievable ASE of about .5 bit/s/Hz less than the theoretical maximum ASE. Nearly the same difference was found for L = 2.

[Figure 1 Maximum achievable ASE of ACM/QAM obtained from (2) and (4) for minimum thresholds γn^MLD in Table 1, c = L = 1, n = 1, 2, ..., 11. Curves for m = 1, 2, 4 versus average CNR from 12 to 30 dB.]

[Figure 2 Difference between theoretical maximum ASE and maximum achievable ASE of ACM/QAM; the difference stays close to 0.5 bits/s/Hz for m = 1, 2, 4 over the 12 to 30 dB CNR range.]
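Curves like those in Figure 1 can be reproduced by evaluating (2) with the Nakagami region probabilities (4). The sketch below is an illustrative reimplementation (not the author's code); it uses scipy.special.gammaincc, which computes the normalised complementary incomplete gamma function Γ(m, x)/Γ(m) directly.

# Maximum achievable ASE from (2) and (4) with MLD thresholds (Theorem 1).
import numpy as np
from scipy.special import gammaincc

def ase(cnr_db, m, c=1, L=1, N=11):
    gbar = 10 ** (cnr_db / 10)   # average CNR, linear scale
    # Region thresholds gamma_n^MLD for n = 1..N, plus gamma_{N+1} = inf.
    g = np.array([2 ** (n + 1 - c / L) - 1 for n in range(1, N + 1)] + [np.inf])
    total = 0.0
    for n in range(1, N + 1):
        p_region = gammaincc(m, m * g[n - 1] / gbar) - gammaincc(m, m * g[n] / gbar)  # (4)
        total += ((n + 1) - c / L) * p_region   # (2) with Rn/B = n + 1 - c/L
    return total

for m in (1, 2, 4):
    print(f"m={m}:", [round(ase(x, m), 2) for x in (12, 20, 30)], "bits/s/Hz")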

III.B Bound for Sequential Decoding
In practice, MLD with the Viterbi algorithm is only implemented for classical QAM trellis codes with relatively short constraint lengths. For a set of codes with large constraint lengths, we may use SD instead [28]–[30]. The channel cutoff rate for each code n, R0(γn) [bits/s/Hz], is then the maximum spectral efficiency at which the average number of computations per decoded information bit is bounded for the lower threshold γn. For the bandlimited complex AWGN channel with Gaussian distributed input, we have [28]

R0(γn) = (log2 e) ⋅ [ 1 + γn/2 − √(1 + (γn/2)²) ] + log2[ (1/2) ⋅ (1 + √(1 + (γn/2)²)) ] [bits/s/Hz].

Theorem 2: If SD with a finite average number of computations per decoded bit is to be used, then the ASE defined by (2) and (3) is maximized for the thresholds

γn^SD ≜ min{ γn : R0(γn) ≥ log2(Mn) − c/L },  (5)

n = 1, 2, ..., N, given c, L, and {Mn}.

Proof: The theorem is an immediate consequence of Lemma 1 and the following observation: When a set of N trellis codes based on QAM constellations with Mn symbols are used


in conjunction with SD, the spectral efficiency of code n given by Rn/B = log2(Mn) − c/L must satisfy Rn/B ≤ R0(γn) where γn is the lower threshold of fading region n. Q.E.D.

Corollary 2: The minimum outage probability for SD is Pout(γ1^SD).

A variation on Newton's method was used to determine the thresholds γn^SD in (5). These values are tabulated in the rightmost column of Table 1 for c = 1, L = 1, 2 and Mn = 2^(n+1), n = 1, ..., 11. For two-dimensional codes (L = 1) and Nakagami fading with m ∈ {1, 2, 4}, the difference between the theoretical maximum ASE [23, Eq. (23)] and achievable ASE (2), (4) is plotted in Figure 3. Nearly 1.06 bits/s/Hz is lost compared to the theoretical maximum ASE. The same is true for four-dimensional codes (L = 2). Consequently, assuming optimal performing codes, an increase in ASE of about .56 bit/s/Hz may be obtained by utilizing MLD instead of SD where MLD is feasible.

[Figure 3 Difference between theoretical maximum ASE and achievable ASE of ACM/QAM with SD, obtained from (2) and (4) for minimum thresholds γn^SD in Table 1, c = L = 1, n = 1, 2, ..., 11. For m = 1, 2, 4 the difference lies near 1.0 to 1.06 bits/s/Hz over the 12 to 30 dB CNR range.]
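The SD thresholds can be reproduced from (5) with any one-dimensional root finder. Since R0(γ) is increasing in γ, the sketch below uses plain bisection instead of the Newton variation mentioned above (a simplifying choice of ours, for illustration only):

# Solve R0(gamma) = log2(Mn) - c/L for the thresholds gamma_n^SD in (5).
import math

def R0(g):
    s = math.sqrt(1 + (g / 2) ** 2)
    return math.log2(math.e) * (1 + g / 2 - s) + math.log2(0.5 * (1 + s))

def gamma_sd(target, lo=1e-6, hi=1e9, iters=200):
    for _ in range(iters):          # bisection; R0 is monotone increasing
        mid = 0.5 * (lo + hi)
        if R0(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi

c, L = 1, 1
for n in range(1, 12):
    target = (n + 1) - c / L        # log2(Mn) - c/L with Mn = 2^(n+1)
    g = gamma_sd(target)
    print(f"n={n:2d} gamma_SD = {10 * math.log10(g):6.2f} dB")   # cf. Table 1

For n = 1 this yields 2.39 dB, matching the first γn^SD entry of Table 1.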

IV. Conclusions
For single-user systems with frequency-flat fading, it seems clear that the theoretical maximum ASE [3], [23] provides an optimistic upper bound on the achievable ASE of practical ACM codecs. It was found that any set of two-dimensional or four-dimensional trellis codes for MLD has an ASE at least .5 bit/s/Hz less than the maximum ASE (see Figure 2) on a Nakagami fading channel. However, ACM may still provide a large ASE.

The ASE of optimal performing ACM with MLD is about .56 bit/s/Hz larger than the ASE of optimal performing ACM with SD. A preliminary investigation indicates that this difference may be smaller for sets of known trellis codes. Hence, ACM with SD may be an interesting alternative to ACM with MLD. More research is needed to determine the real world performance of ACM with SD on wireless channels.

References
1 Meyr, H. Algorithm design and system implementation for advanced wireless communications systems. In: Proc. International Zurich Seminar on Broadband Communications (IZS'2000), Zürich, Switzerland, Feb. 2000.
2 Bose, V, Wetherall, D, Guttag, J. Next century challenges: RadioActive Networks. In: Proc. ACM/IEEE International Conference on Mobile Computing and Networking (MOBICOM'99), Seattle, WA, Aug. 1999.
3 Goldsmith, A J, Varaiya, P P. Capacity of fading channels with channel side information. IEEE Trans. Inform. Theory, 43 (6), 1986–1992, 1997.
4 Hole, K J, Øien, G E. Adaptive coding and modulation: A key to bandwidth-efficient multimedia communications in future wireless systems. Telektronikk, 97 (1), 49–57, 2001.
5 Goldsmith, A J, Chua, S-G. Adaptive coded modulation for fading channels. IEEE Trans. Commun., 46 (5), 595–602, 1998.
6 Lau, V K N, Macleod, M D. Variable rate adaptive trellis coded QAM for high bandwidth efficiency applications in Rayleigh fading channels. In: Proc. 48th IEEE Vehicular Technology Conference (VTC'98), Ottawa, Canada, May 1998, 348–352.
7 Goldsmith, A J. Adaptive modulation and coding for fading channels. In: Proc. IEEE Inform. Theory and Commun. Workshop, Kruger National Park, South Africa, June 1999, 24–26.
8 Goeckel, D L. Adaptive coding for time-varying channels using outdated fading estimates. IEEE Trans. Commun., 47 (6), 844–855, 1999.
9 Hole, K J, Holm, H, Øien, G E. Adaptive multidimensional coded modulation over flat fading channels. IEEE J. Select. Areas Commun., 18 (7), 1153–1158, 2000.



10 Hole, K J, Øien, G E. Spectral efficiency of adaptive coded modulation in urban microcellular networks. IEEE Trans. Veh. Technol., 50 (1), 205–222, 2001.
11 Vishwanath, S, Goldsmith, A J. Exploring adaptive turbo coded modulation for flat fading channels. In: Proc. 52nd IEEE Vehicular Technology Conference (2000 VTC-Fall), Boston, MA, Sept. 2000.
12 Ungerboeck, G. Channel coding with multilevel/phase signals. IEEE Trans. Inform. Theory, IT-28 (1), 55–67, 1982.
13 Forney, G D Jr. et al. Efficient modulation for band-limited channels. IEEE J. Select. Areas Commun., SAC-2 (5), 632–647, 1984.
14 Wei, L-F. Trellis-coded modulation with multidimensional constellations. IEEE Trans. Inform. Theory, 33 (4), 483–501, 1987.
15 Pietrobon, S S, Costello, D J Jr. Trellis coding with multidimensional QAM signal sets. IEEE Trans. Inform. Theory, 39 (2), 325–336, 1993.
16 Wang, F-Q, Costello, D J Jr. New rotationally invariant four-dimensional trellis codes. IEEE Trans. Inform. Theory, 42 (1), 291–300, 1996.
17 Forney, G D Jr., Ungerboeck, G. Modulation and coding for linear Gaussian channels. IEEE Trans. Inform. Theory, 44 (6), 2384–2415, 1998.
18 Le Goff, S, Glavieux, A, Berrou, C. Turbo-codes and high spectral efficiency modulation. In: Proc. IEEE Int. Conf. Commun. (ICC'94), New Orleans, Louisiana, May 1994, 645–649.
19 Robertson, P, Wörz, T. A novel bandwidth efficient coding scheme employing turbo codes. In: Proc. IEEE Int. Conf. Commun. (ICC'96), Dallas, Texas, June 1996, 962–967.
20 Benedetto, S et al. Parallel concatenated trellis-coded modulation. In: Proc. IEEE Int. Conf. Commun. (ICC'96), Dallas, Texas, June 1996, 974–978.
21 Vucetic, B, Yuan, J. Turbo Codes: Principles and Applications. Norwell, MA, Kluwer, 2000.
22 Webb, W T, Hanzo, L. Modern Quadrature Amplitude Modulation. Graham Lodge, London, Pentech Press, 1994.
23 Alouini, M-S, Goldsmith, A J. Capacity of Nakagami multipath fading channels. In: Proc. 47th IEEE Vehicular Technology Conference (VTC'97), Phoenix, Arizona, May 1997, 358–362.
24 Alouini, M-S, Goldsmith, A J. Adaptive M-QAM modulation over Nakagami fading channels. In: Proc. 6th Communications Theory Mini-Conference (CTMC VI) in conjunction with IEEE Global Communications Conference (GLOBECOM'97), Phoenix, Arizona, Nov. 1997, 218–223.
25 Cover, T M, Thomas, J A. Elements of Information Theory. New York, John Wiley, 1991.
26 Stüber, G L. Principles of Mobile Communication. Norwell, MA, Kluwer Academic Publishers, 1996.
27 Gradshteyn, I S, Ryzhik, I M. Table of Integrals, Series, and Products. San Diego, CA, Academic Press, fifth ed., 1994.
28 Wang, F-Q, Costello, D J Jr. Erasure-free sequential decoding of trellis codes. IEEE Trans. Inform. Theory, 40 (6), 1803–1817, 1994.
29 Couturier, S, Costello, D J Jr., Wang, F-Q. Sequential decoding with trellis shaping. IEEE Trans. Inform. Theory, 41 (6), 2037–2040, 1995.
30 Wang, F-Q, Costello, D J Jr. Sequential decoding of trellis codes at high spectral efficiencies. IEEE Trans. Inform. Theory, 43 (6), 2013–2019, 1997.


Breaking the Barriers of Shannon's Capacity: An Overview of MIMO Wireless Systems
DAVID GESBERT AND JABRAN AKHTAR

Appearing a few years ago in a series of information theory articles published by members of the Bell Labs, multiple-input multiple-output (MIMO) systems have evolved quickly to both become one of the most popular topics among wireless communication researchers and reach a spot in today's 'hottest wireless technology' list. In this overview paper, we come back on the fundamentals of MIMO wireless systems and explain the reasons of their success, triggered mainly by the attraction of radio transmission capacities far greater than those available today. We also describe some practical transmission techniques used to signal data over MIMO links and address channel modeling issues. The challenges and limitations posed by deploying this technology in realistic propagation environments are discussed as well.

David Gesbert (32) holds an MSc from the Nat. Inst. for Telecommunications, Evry, France, 1993, and a PhD from Ecole Nat. Superieure des Telecommunications, Paris, 1997. He has worked with France Telecom Research and been a postdoctoral fellow in the Information Systems Lab., Stanford University. In 1998 he took part in the founding team of Iospan Wireless Inc., San Jose, a company promoting high-speed wireless Internet access networks. In 2001 he joined the Signal Processing Group at the Univ. of Oslo as adjunct associate professor. Dr. Gesbert's research interests are in the area of high-speed wireless data / IP networks, smart antennas and MIMO, link layer and
[email protected]

Jabran Akhtar (25) is currently a PhD student at the University of Oslo. His research interests include MIMO systems and space-time coding techniques.
[email protected]

I. Introduction
Digital communications using MIMO (multiple-input multiple-output), sometimes called "volume to volume" wireless links, has emerged as one of the most promising research areas in wireless communications. It also figures prominently on the list of hot technologies that may have a chance to resolve the bottlenecks of traffic capacity in the forthcoming high-speed broadband wireless Internet access networks (UMTS1) and beyond).

1) Universal Mobile Telephone Services.

MIMO systems can be defined simply. Given an arbitrary wireless communication system, MIMO refers to a link for which the transmitting end as well as the receiving end is equipped with multiple antenna elements, as illustrated in Figure 1. The idea behind MIMO is that the signals on the transmit antennas on one end and those of the receive antennas on the other end are "combined" in such a way that the quality (Bit Error Rate) or the data rate (Bit/Sec) of the communication will be improved. MIMO systems use space-time processing techniques in that the time dimension (the natural dimension of transmission signals) is complemented with the spatial dimension brought by the multiple antennas. MIMO systems can be viewed as an extension of the so-called "smart antennas" [1], a popular technology for improving wireless transmission that was first invented in the 70s. However, as we will see here, the underlying mathematical nature of MIMO environments can give performance which goes well beyond that of conventional smart antennas. Perhaps the most striking property of MIMO systems is the ability to turn multipath propagation, usually a pitfall of wireless transmission, into an advantage for increasing the user's data rate, as was first shown in groundbreaking papers by J. Foschini [2], [3].

In this paper, we attempt to explain the promise of MIMO techniques and explain the mechanisms behind them. To highlight the specifics of MIMO systems and give the necessary intuition, we illustrate the difference between MIMO and conventional smart antennas in Section II. A more theoretical (information theory) standpoint is taken in Section III. Practical design of MIMO solutions involves both transmission algorithms and channel modeling to measure their performance. These issues are addressed in Sections IV and V respectively. Radio network level considerations to evaluate the overall benefits of MIMO setups are finally discussed in Section VI.

[Figure 1 Diagram for a MIMO wireless transmission system. The transmitter and receiver are equipped with multiple antenna elements. Coding, modulation and mapping of the signals onto the antennas may be realized jointly or separately.]

II. MIMO Systems: More Than Smart Antennas
In the conventional wireless terminology, smart antennas refer to those signal processing techniques exploiting the data captured by multiple antenna elements located on one end of the link

only, typically at the base station (BTS) where the extra cost and space are more easily affordable. The multiple signals are combined upon transmission before launching into the channel or upon reception. The goal is to offer a more reliable communications link in the presence of adverse propagation conditions such as multipath fading and interference. A key concept in smart antennas is that of beamforming, by which one increases the average signal to noise ratio (SNR) through focusing energy into desired directions. Indeed, if one estimates the response of each antenna element to a desired transmitted signal, one can optimally combine the elements with weights selected as a function of each element response. One can then maximize the average desired signal level and minimize the level of other components (noise and/or interference).

Another powerful effect of smart antennas is called spatial diversity. In the presence of multipath, the received power level is a random function of the user location and, at times, experiences fading. When using antenna arrays, the probability of losing the signal altogether vanishes exponentially with the number of decorrelated antenna elements. The diversity order is defined by the number of decorrelated spatial branches.

When multiple antennas are added at the subscriber's side as well, to form a MIMO link, the conventional benefits of smart antennas are retained, since the optimization of the transmitting and receiving antenna elements can be carried out in a larger space. But in fact MIMO links offer advantages which go far beyond those of smart antennas [4]. Multiple antennas at both the transmitter and the receiver create a matrix channel (of size the number of receive antennas times the number of transmit antennas). The key advantage lies in the possibility of transmitting over several spatial modes of the matrix channel within the same time-frequency slot at no additional power expenditure.

While we use information theory below to demonstrate this rigorously, the best intuition is perhaps given by a simple example of a transmission algorithm over MIMO referred to here as spatial multiplexing, which was initially described in [3], [5]. In Figure 2, a high rate bit stream (left) is decomposed into three independent bit sequences, which are then transmitted simultaneously using multiple antennas. The signals are launched and naturally mixed together into the wireless channel as they use the same frequency spectrum. At the receiver, after having identified the mixing channel matrix through training symbols, the individual bit streams are separated and estimated. This occurs in the same way as three unknowns are resolved from a linear system of three equations. The separation is possible only if the equations are independent, which can be interpreted as each antenna 'seeing' a sufficiently different channel.

[Figure 2 Basic spatial multiplexing (SM) scheme with 3 transmit and 3 receive antennas yielding three-fold improvement in spectral efficiency.]


That is typically the case in the presence of rich multipath. Finally the bits are merged together to yield the original high rate signal.

In general though, one will define the rank of the MIMO channel as the number of independent equations offered by the linear system mentioned above. It is also equal to the algebraic rank of the channel matrix. Clearly the rank is always both less than the number of transmit antennas and less than the number of receive antennas. In turn, the number of independent signals that one may safely transmit through the MIMO system is at most equal to the rank. In this example, the rank is assumed full (equal to three) and the system shows a spectrum efficiency gain of three. This surprising result can be demonstrated from an information theory standpoint.

III. Fundamental Limits of Wireless Transmission
Today's inspiration for research and applications of wireless MIMO systems was mostly triggered by the initial Shannon capacity results obtained independently by Bell Labs researchers E. Telatar [6] and J. Foschini [3], further demonstrating the seminal role of information theory in telecommunications. The analysis of information theory-based channel capacity gives very useful, although idealistic, bounds on the maximum information transfer rate one is able to realize between two points of a communication link modeled by a given channel. Further, the analysis of theoretical capacity gives information on how the channel model or the antenna setup itself may influence the transmission rate. Finally it helps the system designer benchmark transmitter and receiver algorithm performance. Here we examine the capacity aspects of MIMO systems compared with single input single output (SISO), single input multiple output (SIMO) and multiple input single output (MISO) systems.

III.A Shannon Capacity of Wireless Channels
Given a single channel corrupted by an additive white Gaussian noise (AWGN), at a level of SNR denoted by ρ, the capacity (rate that can be achieved with no constraint on code or signaling complexity) can be written as [7]:

C = log2 (1 + ρ) Bit/Sec/Hz (1)

This can be interpreted as an increase of 3 dB in SNR being required for each extra bit per second per Hertz. In practice, wireless channels are time-varying and subject to random fading. In this case we denote by h the unit-power complex Gaussian amplitude of the channel at the instant of observation. The capacity, written as:

C = log2 (1 + ρ | h |2) Bit/Sec/Hz (2)

becomes a random quantity, whose distribution can be computed. The cumulative distribution of this "1 × 1" case (one antenna on transmit and one on receive) is shown on the left in Figure 3. We notice that the capacity takes, at times, very small values, due to fading events.

Interesting statistics can be extracted from the random capacity related with different practical design aspects. The average capacity Ca, the average of all occurrences of C, gives information on the average data rate offered by the link. The outage capacity Co is defined as the data rate that can be guaranteed with a high level of certainty, for a reliable service:

Prob{C ≥ Co} = 99.9..9 %  (3)

We will now see that MIMO systems affect Ca and Co in different ways than conventional smart antennas do. In particular MIMO systems have the unique property of significantly increasing both Ca and Co.

III.B Multiple Antennas at One End
Given a set of M antennas at the receiver (SIMO system), the channel is now composed of M distinct coefficients h = [h0, h1, ..., hM−1] where hi is the channel amplitude from the transmitter to the i-th receive antenna. The expression for the random capacity (2) can be generalized to [3]:

C = log2 (1 + ρhh*) Bit/Sec/Hz (4)

where * denotes the transpose conjugate. In Figure 3 we see the impact of multiple antennas on the capacity distribution with 8 and 19 antennas respectively. Both the outage area (bottom of the curve) and the average (middle) are improved. This is due to the spatial diversity which reduces fading, and to the higher SNR of the combined antennas. However, going from 8 to 19 antennas does not give very significant improvement as spatial diversity benefits quickly level off. The increase in average capacity due to SNR improvement is also limited because the SNR is increasing inside the log function in (4). We also show the results obtained in the case of multiple transmit antennas and one receive antenna, "8 × 1" and "19 × 1", when the transmitter does not know the channel in advance (typical for a frequency duplex system). In such circumstances the outage performance is improved but not the average capacity. That is because multiple transmit antennas cannot beamform blindly.
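A small Monte-Carlo sketch (ours, for illustration) of the statistics discussed here: it estimates the average capacity Ca and an outage capacity Co from (4) for i.i.d. Rayleigh SIMO channels. A 1 % outage level is assumed instead of the "99.9..9 %" of (3) to keep the sample size modest.

# SIMO capacity statistics from (4): C = log2(1 + rho * h h*).
import numpy as np

rng = np.random.default_rng(2)
rho = 10.0   # assumed 10 dB SNR

for M in (1, 8, 19):
    h = (rng.standard_normal((100_000, M))
         + 1j * rng.standard_normal((100_000, M))) / np.sqrt(2)
    C = np.log2(1 + rho * np.sum(np.abs(h) ** 2, axis=1))
    print(f"1 x {M:2d}: Ca = {C.mean():5.2f}, "
          f"Co(1%) = {np.quantile(C, 0.01):5.2f} bits/s/Hz")

Running this shows the saturation described above: Co improves sharply from M = 1 to M = 8, but only marginally from 8 to 19.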

In summary, conventional multiple antenna systems are good at improving the outage capacity performance, attributable to the spatial diversity


effect, but this effect saturates with the number of antennas.

III.C Capacity of MIMO Links
We now consider a full MIMO link as in Figure 1 with respectively N transmit and M receive antennas. The channel is represented by a matrix of size M × N with random independent elements denoted by H. It was shown in [3] that the capacity, still in the absence of transmit channel information, is derived from:

C = log2[ det( IM + (ρ/N) ⋅ H H* ) ] Bit/Sec/Hz  (5)

where ρ is the average SNR at any receiving antenna. In Figure 3 we have plotted the results for the 3 × 3 and the 10 × 10 case, giving the same total of 9 and 20 antennas as previously. The advantage of the MIMO case is significant, both in average and outage capacity. In fact, for a large number M = N of antennas the average capacity increases linearly with M:

Ca ≲ M ⋅ log2 (1 + ρ)  (6)

In general the capacity will grow proportionally with the smallest number of antennas min(N, M), now outside and no longer inside the log function. Therefore in theory, and in the case of idealized random channels, limitless capacities can be realized provided we can afford the cost and space of many antennas and RF chains. In reality the performance will be dictated by the practical transmission algorithms selected and by the physical channel characteristics.

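The linear growth promised by (5) and (6) is easy to observe numerically. The sketch below (an illustration of ours, not code from the paper) Monte-Carlos the capacity (5) for i.i.d. complex Gaussian channel matrices:

# Average MIMO capacity from (5) for i.i.d. Rayleigh channels.
import numpy as np

rng = np.random.default_rng(0)

def avg_capacity(n_tx, n_rx, rho=10.0, trials=2000):
    caps = np.empty(trials)
    for t in range(trials):
        H = (rng.standard_normal((n_rx, n_tx))
             + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
        G = np.eye(n_rx) + (rho / n_tx) * H @ H.conj().T
        caps[t] = np.linalg.slogdet(G)[1] / np.log(2)   # log2 det(G)
    return caps.mean()

for n in (1, 2, 4, 8):
    print(f"{n} x {n}: Ca = {avg_capacity(n, n):5.1f} bits/s/Hz")

Doubling the number of antennas roughly doubles the printed average capacity, as (6) predicts.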

IV. Data Transmission over MIMO Systems
A usual pitfall of information theoretic analysis is that it does not reflect the performance achieved by actual transmission systems, since it is an upper bound realized by algorithms/codes with boundless complexity. The development of algorithms with a reasonable performance/complexity compromise is required to realize the MIMO gains in practice. Here we give the intuition behind key transmission algorithms and compare their performance.

IV.A General Principles
Current transmission schemes over MIMO typically fall into two categories: data rate maximization or diversity maximization schemes. The first kind focuses on improving the average capacity behavior. For example in the case of Figure 2, the objective is just to perform spatial multiplexing, as we send as many independent signals as we have antennas.

More generally, however, the individual streams should be encoded jointly in order to protect transmission against errors caused by channel fading. This leads to a second kind of approach in which one tries also to minimize the outage probability.

Note that if the level of coding is increased between the transmit antennas, the amount of independence between the signals decreases. Ultimately it is possible to code the signals so that the effective data rate is back to that of a single antenna system. Effectively each transmit antenna then sees a differently encoded version of the same signal. In this case the multiple antennas are only used as a source of spatial diversity and not to increase data rate directly.

The set of schemes allowing to adjust and optimize the joint encoding of multiple transmit antennas are called space-time codes (STC). Although STC schemes were originally revealed in [8] in the form of convolutional codes for MISO systems, the popularity of such techniques really took off with the discovery of the so-called space-time block codes (STBC). In contrast to convolutional codes, which require computation-hungry trellis search algorithms at the receiver, STBC can be decoded with much simpler linear operators, at little loss of performance. In the interest of space and clarity we limit ourselves to an overview of STBC below. A more detailed summary of the whole area can be found in [9].

IV.B Maximizing Diversity with Space-Time Block Codes
The field of space-time block coding was initiated by Alamouti [10] in 1998. The objective behind this work was to place two antennas at

[Figure 3 Shannon capacity as function of number of TX × RX antennas, for i.i.d. Rayleigh channels at 10 dB SNR. The plots show the cumulative distribution of capacity for the 1 × 1, SIMO (1 × 8, 1 × 19), MISO (8 × 1, 19 × 1) and MIMO (3 × 3, 10 × 10) configurations. For each curve, the bottom and the middle give an indication of the outage performance and average data rate respectively.]


the transmitter side and thereby provide an order two diversity advantage to a receiver with only a single antenna, with no a priori channel information at the transmitter. The very simple structure of Alamouti's method itself makes it a very attractive scheme that is currently being considered in UMTS standards.

The strategy behind Alamouti's code is as follows. The symbols to be transmitted are grouped in pairs. Because this scheme is a pure diversity scheme and results in no rate increase2), we take two symbol durations to transmit a pair of symbols, such as s0 and s1. We first transmit s0 on the first antenna while sending s1 simultaneously on the second one. In the next time-interval −s1* is sent from the first antenna while s0* from the second one. In matrix notation, this scheme can be written as:

C = (1/√2) ⋅ | s0  −s1* |   (7)
             | s1   s0* |

The rows in the code matrix C denote the antennas while the columns represent the symbol period indexes. As one can observe, the block of symbols s0 and s1 are coded across time and space, giving the name space-time block code to such designs. The normalization factor additionally ensures that the total amount of energy transmitted remains at the same level as in the case of one transmitter.

The two (narrow-band) channels from the two antennas to the receiver can be placed in a vector format as h = [h0, h1]. The receiver collects observations over two time frames in a vector y which can then be written as y = hC + n, or equivalently as ỹ = H̃s + n, where

H̃ = (1/√2) ⋅ | h0    h1  |
              | h1*  −h0* |

s = [s0, s1]^T and n is the noise vector.

Because the matrices C and H̃ are orthogonal by design, the symbols can be separated/decoded in a simple manner from filtering of the observed vector y. Furthermore, each symbol comes with a diversity order of two exactly. Notice finally this happens despite the channel coefficients being unknown to the transmitter.
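As a concrete illustration of (7) and its linear decoder, the following Python sketch (ours, with arbitrary symbol and noise values) transmits one Alamouti block over a random flat-fading channel and recovers the symbol pair by matched filtering with H̃:

# One Alamouti block over a 2x1 flat-fading channel, decoded linearly.
import numpy as np

rng = np.random.default_rng(3)
h = (rng.standard_normal(2) + 1j * rng.standard_normal(2)) / np.sqrt(2)  # h0, h1
s = np.array([1 + 1j, -1 + 1j]) / np.sqrt(2)                             # s0, s1 (QPSK)

# Code matrix (7): rows = antennas, columns = symbol periods.
C = np.array([[s[0], -np.conj(s[1])],
              [s[1],  np.conj(s[0])]]) / np.sqrt(2)

noise = 0.05 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
y = h @ C + noise                       # two received samples

# Equivalent orthogonal system y_tilde = H_tilde @ s + n:
y_t = np.array([y[0], np.conj(y[1])])
H_t = np.array([[h[0], h[1]],
                [np.conj(h[1]), -np.conj(h[0])]]) / np.sqrt(2)

# Orthogonality of H_t reduces decoding to a scaled matched filter.
s_hat = H_t.conj().T @ y_t / (np.linalg.norm(h) ** 2 / 2)
print(np.round(s_hat, 2), "vs", np.round(s, 2))

The printed estimate matches the transmitted pair up to the noise, which is the simple linear separation described above.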

More recently some authors have tried to extend the work of Alamouti to more than two transmit antennas [11], [12]. It turns out however that in that case it is not possible to design a perfectly orthogonal code, except for real valued modulations (e.g. PAM). In the case of a general complex symbol constellation, full-rate orthogonal codes cannot be constructed. This has therefore led to a variety of code design strategies extending Alamouti's work, where one either sacrifices the data rate to preserve a simple decoding structure, or the orthogonality of the code to retain a full data rate [13], [14], [15]. Although transmit diversity codes have mainly been designed with multiple transmit and a single receive antenna in mind, the same ideas can easily be expanded towards a full MIMO setup. The Alamouti code implemented on a system with two antennas at both transmitter and receiver side will for example give a four-order diversity advantage to the user and still has a simple decoding algorithm. However, in a MIMO situation, one would not only be interested in diversity but also in increasing the data rate as shown below.

IV.C Spatial Multiplexing
Spatial multiplexing, or V-BLAST (Vertical Bell Labs Layered Space-Time) [3], [16], can be regarded as a special class of space-time block codes where streams of independent data are transmitted over different antennas, thus maximizing the average data rate over the MIMO system. One may generalize the example given in Section II in the following way: assuming a block of independent data C is transmitted over the N × M MIMO system, the receiver will obtain Y = HC + N. In order to perform symbol detection, the receiver must un-mix the channel, in one of various possible ways. Zero-forcing techniques use a straight matrix inversion, a simple approach that can also result in poor results when the matrix H becomes very ill-conditioned in certain random fading events. The optimum decoding method on the other hand is known as maximum likelihood (ML), where the receiver compares all possible combinations of symbols that could have been transmitted with what is observed:

The complexity of ML decoding is high, andeven prohibitive when many antennas or highorder modulations are used. Enhanced variantsof this, like sphere decoding [17] have recentlybeen proposed. Another popular decoding strat-egy proposed alongside V-BLAST is known asnulling and canceling which gives a reasonabletradeoff between complexity and performance.The matrix inversion process in nulling and can-celing is performed in layers where one esti-mates a symbol, subtracts this symbol estimatefrom Y and continues the decoding successively[3].

Straight spatial multiplexing allows for full inde-pendent usage of the antennas, however it gives

2) Diversity gains can however be used to increase the order of the modulation.



limited diversity benefit and is not always the best transmission scheme for a given BER target. Coding the symbols within a block can result in additional coding and diversity gain, which can help improve the performance, even though the data rate is kept at the same level. It is also possible to sacrifice some data rate for more diversity. Methods to design such codes start from a general structure where one often assumes that a weighted linear combination of symbols may be transmitted from any given antenna at any given time. The weights themselves are selected in different fashions by using analytical tools or optimizing various cost functions [11], [18], [19], [20].

In what follows we compare four transmission strategies over a 2 × 2 MIMO system with ideally uncorrelated elements. All schemes result in the same spectrum efficiency but offer different BER performance.

Figure 4 shows such a plot where the BER of the various approaches are compared: the Alamouti code in (7), spatial multiplexing (SM) with zero forcing (ZF) and with maximum likelihood decoding (ML), and a combined STBC spatial multiplexing scheme [20]. A 4-QAM constellation is used for the symbols except for the Alamouti code, which is simulated under 16-QAM to keep the data rate at the same level. It can be seen from the figure that spatial multiplexing with zero-forcing returns rather poor results, while the curves for the other coding methods are more or less close to each other. Coding schemes, such as Alamouti and the block code, give better results than what can be achieved with spatial multiplexing alone for the case of two antennas. The Alamouti curve has the best slope at high SNR because it focuses entirely on diversity (order four). At lower SNR, the scheme combining spatial multiplexing with some block coding is the best one.

It is important to note that as the number of antennas increases, the diversity effect will give diminishing returns, while the data rate gain of spatial multiplexing remains linear with the number of antennas. Therefore, for a larger number of antennas it is expected that more weight has to be put on spatial multiplexing and less on space-time coding. Interestingly, having a larger number of antennas does not need to result in a larger number of RF chains. By using antenna selection techniques (see for example [21]) it is possible to retain the benefits of a large MIMO array with just a subset of antennas being active at the same time.
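To make the ZF/ML comparison concrete, here is a small Monte-Carlo sketch in the spirit of Figure 4 (our illustration, not the paper's simulation). It measures the symbol-vector error rate of zero-forcing and ML detection for 2 × 2 spatial multiplexing with 4-QAM:

# ZF vs ML detection for 2x2 spatial multiplexing with 4-QAM.
import itertools
import numpy as np

rng = np.random.default_rng(5)
alphabet = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
candidates = np.array(list(itertools.product(alphabet, repeat=2)))  # all 16 pairs

def run(snr_db, trials=20_000):
    sigma = 10 ** (-snr_db / 20)
    err_zf = err_ml = 0
    for _ in range(trials):
        H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
        s = alphabet[rng.integers(0, 4, size=2)]
        n = sigma * (rng.standard_normal(2) + 1j * rng.standard_normal(2)) / np.sqrt(2)
        y = H @ s + n
        # ZF: invert the channel, then slice to the nearest symbol pair.
        s_zf = candidates[np.argmin(np.abs(candidates - np.linalg.solve(H, y)).sum(axis=1))]
        # ML: exhaustive search over all candidate symbol pairs, as in (8).
        s_ml = candidates[np.argmin(np.linalg.norm(y - candidates @ H.T, axis=1))]
        err_zf += np.any(s_zf != s)
        err_ml += np.any(s_ml != s)
    return err_zf / trials, err_ml / trials

for snr in (5, 10, 15):
    print(snr, "dB  ZF/ML vector error rate:", run(snr))

As in Figure 4, ZF degrades whenever H is ill-conditioned, while the exhaustive ML search stays markedly better at the same rate.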

V. Channel Modeling
Channel modeling has always been an important area in wireless communications and this area of research is particularly critical in the case of MIMO systems. In particular, as we have seen earlier, the promise of high MIMO capacities largely relies on decorrelation properties between antennas as well as the full-rankness of the MIMO channel matrix. The performance of MIMO algorithms such as those above can vary enormously depending on the realization or not of such properties. In particular, spatial multiplexing becomes completely inefficient if the channel has rank one. The final aim of channel modeling is therefore to get an understanding, by means of converting measurement data into tractable formulas, of what performance can be reasonably expected from MIMO systems in practical propagation situations. The other role of channel models is to provide the necessary tools to analyze the impact of selected antenna or propagation parameters (spacing, frequency, antenna height, etc.) on the capacity, to influence the system design in the best way. Finally, models are used to try out transmit and receive processing algorithms with more realistic simulation scenarios than those normally assumed in the literature.

V.A Theoretical Models
The original papers on MIMO capacity used an 'idealistic' channel matrix model consisting of perfectly uncorrelated (i.i.d.) random Gaussian elements. This corresponds to a rich multipath environment, yielding maximum excitation of all channel modes. It is also possible to define other types of theoretical models for the channel matrix H, which are not as ideal. In particular we emphasize the separate roles played by antenna correlation (on transmit or on receive)

[Figure 4 Bit Error Rate (BER) comparisons for various transmission techniques over a 2 × 2 MIMO system: Alamouti with linear decoding (16QAM), SM-ZF (4QAM), SM-ML (4QAM) and STBC-ML (4QAM), plotted against SNR per receive antenna. All schemes use the same transmission rate.]



and the rank of the channel matrix. While fully correlated antennas will lead to a low rank channel, the converse is not true in general.

Let us next consider the following MIMO theoretical model classification, starting from Foschini's ideal i.i.d. model, and interpret the performance. In each case below we consider a frequency-flat channel. In the case of broadband, frequency selective channels, a different frequency-flat channel can be defined at each frequency.

• Uncorrelated High Rank (UHR, a.k.a. i.i.d.) model: The elements of H are i.i.d. complex Gaussian.

• Correlated Low Rank (CLR) model: $H = g_{rx} g_{tx}^{*} u_{rx} u_{tx}^{*}$, where $g_{rx}$ and $g_{tx}$ are independent Gaussian coefficients (receive and transmit fading) and $u_{rx}$ and $u_{tx}$ are fixed deterministic vectors of size M × 1 and N × 1, respectively, with unit-modulus entries. This model is obtained when antennas are placed too close to each other or there is too little angular spread at both the transmitter and the receiver. This case yields no diversity or multiplexing gain whatsoever, just receive array / beamforming gain. We may also imagine the case of correlated antennas at the transmitter and decorrelated antennas at the receiver, or vice versa.

• Uncorrelated Low Rank (ULR) (or “pinhole” [22]) model: $H = g_{rx} g_{tx}^{*}$, where $g_{rx}$ and $g_{tx}$ are independent receive and transmit fading vectors with i.i.d. complex-valued components. In this model every realization of H has rank 1 despite uncorrelated transmit and receive antennas. Therefore, although diversity is present, capacity must be expected to be less than in the UHR model since there is no multiplexing gain. Intuitively, in this case the diversity order is equal to min(M, N).
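To make the three model classes concrete, the following minimal Python/numpy sketch (an illustration added for this discussion; the steering-vector phases in the CLR case are arbitrary choices, not values from this paper) generates one realization of each and checks its rank:

    import numpy as np

    def uhr(M, N):
        # UHR / i.i.d. model: rich scattering; full rank with probability one
        return (np.random.randn(M, N) + 1j * np.random.randn(M, N)) / np.sqrt(2)

    def clr(M, N):
        # CLR model: scalar fading coefficients times fixed unit-modulus vectors
        g_rx = (np.random.randn() + 1j * np.random.randn()) / np.sqrt(2)
        g_tx = (np.random.randn() + 1j * np.random.randn()) / np.sqrt(2)
        u_rx = np.exp(1j * 0.7 * np.pi * np.arange(M))   # arbitrary illustrative phases
        u_tx = np.exp(1j * 0.3 * np.pi * np.arange(N))
        return g_rx * np.conj(g_tx) * np.outer(u_rx, np.conj(u_tx))

    def ulr(M, N):
        # ULR / pinhole model: independent fading vectors; rank 1 in every realization
        g_rx = (np.random.randn(M) + 1j * np.random.randn(M)) / np.sqrt(2)
        g_tx = (np.random.randn(N) + 1j * np.random.randn(N)) / np.sqrt(2)
        return np.outer(g_rx, np.conj(g_tx))

    for name, gen in (("UHR", uhr), ("CLR", clr), ("ULR", ulr)):
        print(name, "rank:", np.linalg.matrix_rank(gen(4, 4)))  # typically 4, 1, 1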

V.B Heuristic Models
In practice of course, the complexity of radio propagation is such that MIMO channels will not fall completely into either of the theoretical cases described above. Antenna correlation and matrix rank are influenced by parameters such as the antenna spacing, the antenna height, the presence and disposition of local and remote scatterers, and the degree of line of sight, among others. Figure 5 depicts a general setting for MIMO propagation. The goal of heuristic models is to display a wide range of MIMO channel behaviors through the use of as few relevant parameters as possible, with as much realism as possible.

A good model should give us answers to the following problems: What is the typical capacity of an outdoor or indoor MIMO channel? What are the key parameters governing capacity? Under what simple conditions do we get a full rank channel? If possible the model parameters should be controllable (such as antenna spacing) or measurable (such as the angular spread of multipath [23], [24]), which is not always easy to achieve.

The literature on these problems is still very scarce. For the line-of-sight (LOS) case it has only been shown how very specific arrangements of the antenna arrays at the transmitter and the receiver can maximize the orthogonality between antenna signatures and produce maximum capacity, as reported in [25]. But in a general situation with fading, which is the truly promising case, this work is not applicable.

In the presence of fading, the first step in increasing the model’s realism consists in taking into account the correlation of antennas at either the transmit or receive side. The correlation can be modeled as inversely proportional to the angular spread of the arriving/departing multipath. Experience suggests that higher correlation can be expected at the BTS side because the BTS antenna is usually higher above the clutter, causing reduced angular spread. In contrast, the subscriber’s antenna will be buried in the clutter (if installed at street level) and will experience more multipath angle spread, hence less correlation for the same spacing. The way

Figure 5 MIMO channel propagation. The complicated disposition of the scatterers in the environment will determine the number of excitable modes in the MIMO channel


MIMO models can take correlation into account is similar to how usual smart antenna channel models do it. The channel matrix is pre- (or post-) multiplied by a correlation matrix controlling the antenna correlation as a function of the path angles, the spacing and the wavelength. For example, for a MIMO channel with correlated receive antennas, we have:

(9) $H = R_{\theta_r,d_r}^{1/2} H_0$

where $H_0$ is an ideal i.i.d. MIMO channel matrix and $R_{\theta_r,d_r}$ is the M × M correlation matrix. $\theta_r$ is the receive angle spread and $d_r$ is the receive antenna spacing. Different assumptions on the statistics of the paths’ directions of arrival (DOA) will yield different expressions for $R_{\theta_r,d_r}$ [26], [27], [28]. For uniformly distributed DOAs, we find [27], [26]

(10) $[R_{\theta_r,d_r}]_{m,k} = \frac{1}{S} \sum_{i=-(S-1)/2}^{(S-1)/2} e^{-2\pi j (k-m)\, d_r \cos(\pi/2 + \theta_{r,i})}$

where S (assumed odd) is the number of paths, with corresponding DOAs $\theta_{r,i}$. For “large” values of the angle spread and/or antenna spacing, $R_{\theta_r,d_r}$ will converge to the identity matrix, which gives uncorrelated fading. For “small” values of $\theta_r, d_r$, the correlation matrix becomes rank deficient (eventually rank one), causing fully correlated fading. The impact of the correlation on the capacity was analyzed in several papers, including [29]. Note that it is possible to generalize this model to include correlation on both sides by using two distinct correlation matrices:

(11) $H = R_{\theta_r,d_r}^{1/2} H_0 R_{\theta_t,d_t}^{1/2}$

V.B.1 Impact of Scattering Radius
One limitation of simple models like the one in (11) is that it implies that rank loss of H can only come from rank loss in $R_{\theta_r,d_r}$ or in $R_{\theta_t,d_t}$, i.e. a high correlation between the antennas. However, as suggested by the theoretical model “ULR” above, it may not always be so. In practice such a situation can arise where there is significant local scattering around both the BTS and the subscriber’s antenna and still only a low rank is realized by the channel matrix. That may happen because the energy travels through a narrow “pipe”, i.e. if the scattering radius around the transmitter and receiver is small compared to the traveling distance. This is depicted in Figure 6. This situation is referred to as the pinhole or keyhole channel in the literature [22], [30].

In order to describe the pinhole situation further, so-called double scattering models have been developed that take into account the impact of the scattering radius at the transmitter and at the receiver. The model is based on a simplified version of Figure 5, shown in Figure 7, where only local scatterers contributing to the total aperture of the antenna as seen by the other end are considered. The model can be written as [22]:

(12) $H = \frac{1}{\sqrt{S}}\, R_{\theta_r,d_r}^{1/2} H_{0,r}\, R_{\theta_S,\,2D_r/S}^{1/2} H_{0,t}\, R_{\theta_t,d_t}^{1/2},$

where the presence of two (instead of one) i.i.d. random matrices $H_{0,t}$ and $H_{0,r}$ accounts for the double scattering effect. The matrix $R_{\theta_S,\,2D_r/S}$ dictates the correlation between the scattering elements, considered as virtual receive antennas with virtual aperture $2D_r$. When the virtual aperture is small, either on transmit or receive, the rank of the overall MIMO channel will fall regardless of whether the actual antennas are correlated or not.
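A minimal numpy sketch of (10)-(12), assuming the antenna spacings are expressed in wavelengths and taking the matrix square root via an eigendecomposition (the DOA values in the usage lines are arbitrary illustrative choices):

    import numpy as np

    def corr_matrix(n_ant, d, doas):
        # Eq. (10): [R]_{m,k} = (1/S) sum_i exp(-2*pi*j*(k - m)*d*cos(pi/2 + theta_i))
        idx = np.arange(n_ant)
        km = idx[None, :] - idx[:, None]          # (k - m) for every matrix entry
        R = sum(np.exp(-2j * np.pi * km * d * np.cos(np.pi / 2 + th)) for th in doas)
        return R / len(doas)

    def sqrtm_psd(R):
        # Hermitian square root R^{1/2} via eigendecomposition
        w, V = np.linalg.eigh(R)
        return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

    def correlated_channel(Rr, Rt):
        # Eq. (11): H = Rr^{1/2} H0 Rt^{1/2}
        M, N = Rr.shape[0], Rt.shape[0]
        H0 = (np.random.randn(M, N) + 1j * np.random.randn(M, N)) / np.sqrt(2)
        return sqrtm_psd(Rr) @ H0 @ sqrtm_psd(Rt)

    def double_scattering_channel(Rr, Rs, Rt):
        # Eq. (12): H = (1/sqrt(S)) Rr^{1/2} H0r Rs^{1/2} H0t Rt^{1/2}
        M, S, N = Rr.shape[0], Rs.shape[0], Rt.shape[0]
        H0r = (np.random.randn(M, S) + 1j * np.random.randn(M, S)) / np.sqrt(2)
        H0t = (np.random.randn(S, N) + 1j * np.random.randn(S, N)) / np.sqrt(2)
        return sqrtm_psd(Rr) @ H0r @ sqrtm_psd(Rs) @ H0t @ sqrtm_psd(Rt) / np.sqrt(S)

    # Narrow angle spread -> eigenvalues of R concentrate -> correlated fading
    doas = np.deg2rad(np.linspace(-5, 5, 21))
    Rr = corr_matrix(4, d=0.5, doas=doas)
    print(np.round(np.linalg.eigvalsh(Rr)[::-1], 3))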

Figure 6 An example of pinhole realization. Reflections around the BTS and subscribers cause uncorrelated fading, yet the scatter rings are too small for the rank to build up



V.C Broadband Channels
In broadband applications the channel experiences frequency selective fading. In this case the channel model can be written as H(f), where a new MIMO matrix is obtained at each frequency/sub-band. This type of model is of interest in the case of orthogonal frequency division multiplexing (OFDM) modulation with MIMO. It was shown that the MIMO capacity actually benefits from the frequency selectivity, because the additional paths that contribute to the selectivity will also contribute to a greater overall angular spread and therefore improve the average rank of the MIMO channel across frequencies [31].

V.D Measured Channels
In order to validate the models as well as to foster the acceptance of MIMO systems into wireless standards, a number of MIMO measurement campaigns have been launched in the last two years, mainly led by Lucent and AT&T Labs and by various smaller institutions or companies such as Iospan Wireless in California. More recently Telenor R&D put together its own MIMO measurement capability.

Samples of analysis for UMTS type scenarios can be found in [32], [33], [34]. Measurements conducted at 2.5 GHz for broadband wireless access applications can be found in [35]. So far, the results reported largely confirm the high level of dormant capacity of MIMO arrays, at least in urban or suburban environments. Indoor scenarios lead to even better results due to a very rich multipath structure. Eigenvalue analyses reveal that a large number of the modes of MIMO channels can be exploited to transmit data. Which particular combination of spatial multiplexing and space-time coding will lead to the best performance/complexity trade-off over such channels remains, however, an area of active research.

VI. System Level Issues

VI.A Optimum Use of Multiple Antennas

Multiple antenna techniques are not new in commercial wireless networks. Spatial diversity systems, using two or three antenna elements, co- or cross-polarized, have been in use since the early stages of mobile network deployments. More recently, beamforming-type BTS products equipped with five to ten or more antennas have been offered on the market. These products use diversity to improve the link budget and the beamforming capability to extend the cell range or help in load balancing.

Beyond the information theory aspects addressed earlier, there are significant network-level differences between the beamforming approach and the MIMO approach to using multiple antennas.

While beamforming systems tend to use a larger number of closely spaced antennas, MIMO typically operates with fewer antennas (although the only true constraint is at the subscriber side rather than at the BTS side). Furthermore, the MIMO antennas will use as much space as can be afforded to try to realize decorrelation between the elements, while the directional-based beamforming operation imposes stringent limits on spacing. Also, most MIMO algorithms focus on diversity or data rate maximization rather than just increasing the average SNR at the receiver or reducing interference. Finally, beamforming systems thrive in near line-of-sight environments because the beams can be more easily optimized to match one or two multipaths than a hundred of them. In contrast, MIMO systems turn rich multipath into an advantage and lose their multiplexing benefits in line-of-sight cases.

Figure 7 Double scattering MIMO channel model


Because of these differences, the optimal way of using multiple antenna systems, at least at the BTS, is likely to depend on the situation. The search for compromise solutions, in which the degrees of freedom offered by the multiple antennas are best used at each location, is an active area of work. A key to this problem resides in adaptive techniques, which through the tracking of environment/propagation characteristics are able to pick the right solution at all times.

VI.B MIMO in Broadband Internet Access

One unfavorable aspect of MIMO systems, when compared with traditional smart antennas, lies in the increased cost and size of the subscriber’s equipment. Although a sensible design can extract significant gains with just two or three antennas at the user’s side, it may already prove too much for simple mobile phone devices. Instead, wireless LAN modems, PDAs and other high-speed wireless Internet access devices, fixed or mobile, constitute the real opportunity for MIMO because of less stringent size and algorithmic complexity limitations. In Figure 8 we show the data rates achieved by a fixed broadband wireless access system with 2 × 3 MIMO. The realized user data rates are color coded from 0 to 13.5 Mb/s in a 2 MHz RF channel3), as a function of the user’s location. The access point is located in the middle of an idealized hexagonal cell. Detailed assumptions can be found in [36]. The figure illustrates the advantages over a system with just one transmit antenna and one or two receive antennas. Current studies demonstrating the system level advantages of MIMO in wireless Internet access focus mainly on performance. While very promising,

the evaluation of the overall benefits of MIMO systems, taking into account deployment and cost constraints, is still in progress.

VII. Conclusions
This paper reviews the major features of MIMO links for use in future wireless networks. Information theory reveals the great capacity gains which can be realized from MIMO. Whether we achieve this fully or at least partially in practice depends on a sensible design of transmit and receive signal processing algorithms. More progress in channel modeling will also be needed. In particular, upcoming performance measurements in specific deployment conditions will be key to evaluating precisely the overall benefits of MIMO systems in real-world wireless scenarios such as UMTS.

References

1 Paulraj, A, Papadias, C B. Space-time processing for wireless communications. IEEE Signal Proc. Mag., 14, 49–83, 1997.

2 Foschini, G J, Gans, M J. On limits of wireless communications in a fading environment when using multiple antennas. Wireless Personal Communications, 6, 311–335, 1998.

3 Foschini, G J. Layered space-time architecture for wireless communication. Bell Labs Technical Journal, 1, 41–59, 1996.

4 Sheikh, K et al. Smart antennas for broadband wireless access. IEEE Communications Magazine, Nov 1999.

5 Paulraj, A J, Kailath, T. Increasing capacity in wireless broadcast systems using distributed transmission/directional reception. U.S. Patent, 1994. (No. 5,345,599.)

3) A user gets zero if the link quality does not satisfy the target BER.

Figure 8 User rates in 2 MHz FDD channels in a fixed wireless access system (user rate color scale 0.0 to 13.5 Mbps; panels: 1 × 1, 1 × 2, 2 × 3 antennas). The plots show the relative gains between various numbers of antennas at transmitter × receiver (SISO, SIMO, MIMO)

6 Telatar, I E. Capacity of multi-antenna Gaussian channels. Bell Labs Technical Memorandum, 1995.

7 Proakis, J G. Digital Communications. New York, McGraw-Hill, 1989.

8 Tarokh, V, Seshadri, N, Calderbank, A R. Space-time codes for high data rate wireless communication: Performance criterion and code construction. IEEE Trans. Inf. Theory, 44, 744–765, 1998.

9 Naguib, A, Seshadri, N, Calderbank, R. Increasing data rate over wireless channels. IEEE Signal Processing Magazine, May 2000.

10 Alamouti, S M. A simple transmit diversity technique for wireless communications. IEEE Journal on Selected Areas in Communications, 16, 1451–1458, 1998.

11 Tarokh, V, Jafarkhani, H, Calderbank, A R. Space-time block codes for wireless communications: Performance results. IEEE Journal on Selected Areas in Communications, 17, 1999.

12 Ganesan, G, Stoica, P. Space-time diversity using orthogonal and amicable orthogonal designs. Wireless Personal Communications, 18, 165–178, 2001.

13 Jafarkhani, H. A quasi-orthogonal space-time block code. IEEE Trans. Comm., 49, 1–4, 2001.

14 Tirkkonen, O, Boariu, A, Hottinen, A. Minimal non-orthogonality rate 1 space-time block code for 3+ tx antennas. In: Proc. IEEE Int. Symp. Spread Spectrum Technology, 2000.

15 Tarokh, V, Jafarkhani, H, Calderbank, A R. Space-time block codes from orthogonal designs. IEEE Trans. Inf. Theory, 45, 1456–1467, 1999.

16 Golden, G D et al. Detection algorithm and initial laboratory results using the V-BLAST space-time communication architecture. Electronics Letters, 35, 1, 14–15, 1999.

17 Damen, M O, Chkeif, A, Belfiore, J C. Lattice codes decoder for space-time codes. IEEE Communications Letters, 4, 161–163, 2000.

18 Hassibi, B, Hochwald, B. High-rate codes that are linear in space and time. Submitted to IEEE Trans. on Information Theory, 2000.

19 Sandhu, S, Paulraj, A. Unified design of linear space-time block codes. IEEE Globecom Conference, 2001.

20 Damen, M O, Tewfik, A, Belfiore, J C. A construction of a space-time code based on number theory. IEEE Trans. on Information Theory, March 2002.

21 Molisch, A, Win, M Z, Winters, J, Paulraj, A. Capacity of MIMO systems with antenna selection. In: IEEE Intern. Conf. on Communications, 570–574, 2001.

22 Gesbert, D et al. Outdoor MIMO wireless channels: Models and performance prediction. IEEE Trans. Communications, 2002. To appear.

23 Pedersen, K I, Mogensen, P E, Fleury, B. A stochastic model of the temporal and azimuthal dispersion seen at the base station in outdoor propagation environments. IEEE Trans. on Vehicular Technology, 49, 2000.

24 Rossi, J P, Barbot, J P, Levy, A. Theory and measurements of the angle of arrival and time delay of UHF radiowaves using a ring array. IEEE Trans. on Antennas and Propagation, May 1997.

25 Driessen, P, Foschini, J. On the capacity formula for multiple input multiple output wireless channels: a geometric interpretation. IEEE Trans. Comm., 173–176, Feb 1999.

26 Ertel, R B et al. Overview of spatial channel models for antenna array communication systems. IEEE Personal Communications, 10–22, Feb 1998.

27 Asztély, D. On antenna arrays in mobile communication systems: Fast fading and GSM base station receiver algorithms. Royal Institute of Technology, Stockholm, Sweden, March 1996. (Tech. Rep. IR-S3-SB-9611.)

28 Fuhl, J, Molisch, A F, Bonek, E. Unified channel model for mobile radio systems with smart antennas. IEE Proc.-Radar, Sonar Navig., 145, 32–41, 1998.

29 Shiu, D et al. Fading correlation and its effect on the capacity of multi-element antenna systems. IEEE Trans. Comm., March 2000.


30 Chizhik, D, Foschini, G, Valenzuela, R A. Capacities of multi-element transmit and receive antennas: Correlations and keyholes. Electronics Letters, 1099–1100, 2000.

31 Bölcskei, H, Gesbert, D, Paulraj, A J. On the capacity of wireless systems employing OFDM-based spatial multiplexing. IEEE Trans. Comm., 2002. To appear.

32 Martin, C C, Winters, J, Sollenberger, N. Multiple input multiple output (MIMO) radio channel measurements. In: IEEE Vehicular Technology Conference, Boston (MA), 2000.

33 Ling, J et al. Multiple transmitter multiple receiver capacity survey in Manhattan. Electronics Letters, 37, Aug 2001.

34 Buehrer, R et al. Spatial channel models and measurements for IMT-2000 systems. In: Proc. IEEE Vehicular Technology Conference, May 2001.

35 Pitchaiah, S et al. Modeling of multiple-input multiple-output (MIMO) radio channel based on outdoor measurements conducted at 2.5 GHz for fixed BWA applications. In: Proc. International Conference on Communications, 2002.

36 Gesbert, D et al. Technologies and performance for non-line-of-sight broadband wireless access networks. IEEE Communications Magazine, April 2002.


I. Introduction
Consider the problem of sending a block u of K bits over a noisy channel. With some nonzero probability, these bits will be corrupted by the noise. To combat these effects, the information block is almost always encoded with an error-correcting code.

An error-correcting encoder works by adding N – K extra parity-check bits to the information block, to produce a codeword of N bits. The code is the set of codewords that arises when u ranges over all possible information blocks, and the code rate is the ratio R = K/N. The codeword is transmitted over the channel. At the receiving end, a decoder produces an estimate û of u, based on the received message. If û ≠ u, we have a decoding error; see Figure 1, from [1]1). The parity-check bits are selected with the aim of minimizing the probability of decoding error.

In his seminal paper [2], Shannon introduced the notion of the channel capacity C of a communication channel. He proved the following by a non-constructive argument: For a given communication channel and for any arbitrarily small ε, provided the code length N is sufficiently large, there exists a code of any rate R < C with the property that the decoding error probability with optimum decoding is less than ε. Conversely, for rates larger than the capacity, error free transmission is not possible.

During the last half of the 20th century, intense research efforts were devoted towards designing practical error correcting codes with a performance approaching Shannon’s predictions. However, this goal turned out to be difficult to achieve, even though powerful algebraic constructions were devised [5], [8]. The discovery of turbo codes by Berrou et al. [11] in 1993 represented a major breakthrough. Recent developments in this area have produced implementable codes with performance very close to Shannon’s bounds [36].

Figure 2 shows the relationship between the signal-to-noise ratio (SNR) and the bit error rate (BER) for an additive white Gaussian noise (AWGN) channel. The SNR, expressed in dB, is

(1) $\mathrm{SNR} = 10 \log_{10} \frac{E_b}{N_0},$

where $E_b$ is the average received signal energy per information bit, and $N_0$ is the single-sided power spectral density of the Gaussian noise.2)

For a given code rate, Shannon’s results imply that there is a threshold SNR, which is referred to as the Shannon bound, below which error free transmission is impossible. For a continuous-input AWGN channel, the Shannon bound in dB is given as

(2) $\mathrm{SNR}_C = 10 \log_{10} \frac{2^{2R} - 1}{2R}.$
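To make (2) concrete, a minimal Python check reproduces the –0.55 dB threshold quoted below for rate R = 1/3:

    import math

    def shannon_bound_db(R):
        # Eq. (2): threshold SNR in dB below which error free transmission
        # at code rate R is impossible on the continuous-input AWGN channel
        return 10 * math.log10((2 ** (2 * R) - 1) / (2 * R))

    print(round(shannon_bound_db(1 / 3), 2))   # -0.55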

An Introduction to Turbo Codes and Iterative Decoding

ØYVIND YTREHUS

The discovery of turbo codes by Berrou et al. [11] in 1993 revolutionized the theory of error-correcting codes. The purpose of this paper is

• to provide an introduction to turbo codes;

• to describe the weight distribution properties of turbo codes that allow low error probability, even when the underlying communication channel is poor;

• to explain the low complexity decoding algorithms that make turbo codes so attractive;

• to suggest how to select essential components of the turbo construction, such as interleavers and constituent codes;

• to mention that turbo coding can be used in practical situations, for example in a coded modulation scheme; and finally

• to point out the limitations of turbo codes that, so far, restrict their use in some applications.

Øyvind Ytrehus (42) is professor and currently the Department Chair at the Department of Informatics, University of Bergen. His research interests include error correcting properties and decoding complexity of algebraic codes, convolutional codes, turbo codes, and codes based on graphs; applications of coding theory in communication and storage; and the interaction between coding theory and cryptology.

[email protected]

1) For non-Norwegian readers: The story “God dag mann! – Økseskaft” is about a hearing-impaired ferryman who receives a call from the local policeman. Prior to the visit, the ferryman anticipates the policeman’s questions. However, as the signal-to-noise level of the actual communication channel is too low, this results in an absurd conversation.

2) Throughout this paper, commonly known facts and results will be presented without explicit references. Explanations and derivations of these results can be found in some of the monographs [8], [10], [17], [24], [29].




The BER performance of a typical turbo code used with the iterative turbo decoding algorithm (to be discussed in Section III) is shown in Figure 2. The proximity to the rate-specific Shannon bound varies with the information length N, with the choice of constituent encoders and interleaver, and with details of implementation of the decoding algorithms. However, over the range of such varying parameters, the curves display the characteristics as shown: There is a plateau region at very low SNR, where there is little or no improvement over uncoded transmission; followed by the waterfall region, where the BER drops off rapidly with increasing SNR. Finally there is an error floor at higher SNR, where the BER mainly depends on the probability of decoding to a few most likely error vectors.

Berrou et al.’s turbo codes are parallel concatenated codes, generated by an encoder as shown in Figure 3. The encoder accepts an information block u = (u0, ..., uK-1) consisting of K bits. It will sometimes be convenient to consider u as a sequence $u(D) = \sum_{j=0}^{K-1} u_j D^j$ (where the variable D can be thought of as just a placeholder). The encoder is systematic, meaning that the information block u is visible as an explicit part of the codeword. The rest of the codeword consists of parity check symbols from the two constituent encoders, encoder A and encoder B. In Berrou et al.’s model, which we will consider throughout this paper except for the generalizations in Section V, encoders A and B are identical recursive convolutional encoders. This means that the first parity sequence cA(D) is obtained from the information block as cA(D) = u(D)(f(D)/g(D)), for some fixed binary polynomials f(D) and g(D). The second parity sequence is obtained from the information block as cB(D) = π(u(D))(f(D)/g(D)), where π(u(D)) is the sequence resulting from permuting the coordinates of u(D) according to an interleaver map π. In some cases, 2ν extra bits are appended to the input sequences in order to terminate the constituent trellises (see Section III and [24], [29]). This is not shown in Figure 3. Finally, some of the encoded bits (usually some of the parity check bits) are punctured (deleted) according to a puncturing pattern P, leaving a total of N encoded bits. Hence the turbo code is completely specified by the two constituent encoders, by the interleaver π, by the puncturing pattern P, and by the termination rules.
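As an illustration of this construction, here is a minimal Python sketch of such an encoder (systematic output plus two parity streams; no puncturing or trellis termination). The taps correspond to the 4-state encoder f(D) = 1 + D^2, g(D) = 1 + D + D^2; these particular polynomials are an illustrative assumption, not taken from the paper:

    def rsc_parity(u, f=(1, 0, 1), g=(1, 1, 1)):
        # Parity sequence c(D) = u(D) f(D)/g(D) of a recursive systematic
        # convolutional encoder; f and g list polynomial coefficients,
        # lowest degree first, with g[0] = 1 (the feedback tap).
        nu = len(g) - 1
        state = [0] * nu                      # shift register a_{j-1}, ..., a_{j-nu}
        parity = []
        for bit in u:
            a = bit                           # feedback bit a_j
            for i in range(nu):
                a ^= g[i + 1] & state[i]
            p = f[0] & a                      # parity bit p_j
            for i in range(nu):
                p ^= f[i + 1] & state[i]
            parity.append(p)
            state = [a] + state[:-1]
        return parity

    def turbo_encode(u, pi):
        # Systematic bits, parity from encoder A, and parity from encoder B
        # acting on the interleaved block pi(u)
        return list(u), rsc_parity(u), rsc_parity([u[pi[j]] for j in range(len(u))])

    u = [1, 0, 1, 1, 0, 0, 1, 0]
    pi = [3, 7, 0, 5, 2, 6, 1, 4]             # a toy interleaver map
    print(turbo_encode(u, pi))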

In Section II we investigate the properties of the weight distribution that explain the characteristic error curves as shown in Figure 2. Section III deals with the turbo decoding algorithm. Selection of essential system components is considered in Section IV. Finally, Section V discusses


Figure 2 The connection between the SNR and the BER. The Shannon bound of (2) is shown at approximately –0.55 dB for rate R = 1/3. Also shown are the BER curves for uncoded transmission (of rate 1), a 64-state rate 1/3 convolutional code, and a rate 1/3 turbo code with a simple random interleaver of information length N = 1000. In this example, the blue and the yellow curves intersect at an SNR of about 14 dB and a BER of about 10^-55. However, the turbo code can easily be improved by changing the interleaver

Figure 1 “God dag mann! – Økseskaft!”: Example of decoding error


variations, generalizations, limitations, and applications.

In a short paper about a topic that in a few years has grown into a major research area, there is no space to enter into deeper discussion of the finer points. The reference list contains pointers to the literature for those who want to pursue this subject.

II. Weight Distribution Properties: Performance at the Error Floor

When the BER is not too small, it can be accurately approximated using computer simulation. However, accurate simulation results require the observation of hundreds of error events (see any textbook in statistics or computer simulation), which are particularly hard to obtain for long turbo codes with low error floors.

The Hamming weight of a binary vector is defined as the number of nonzero positions in the vector. It is well known (see for example [10]) that, under the assumption that maximum likelihood decoding is used, the frame error rate (FER; the probability that a given frame contains an error) of any error correcting code can be approximated for high SNRs by the union bound,

(3) $\mathrm{FER} \le \sum_{w=d}^{N} A_w\, Q\!\left(\sqrt{2wR \cdot \mathrm{SNR}}\right),$

where $A_w$ is the number of codewords of Hamming weight w, d is the minimum distance of the code (the minimum nonzero w for which $A_w > 0$), and Q() is the complementary error function,

(4) $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt.$

The set of numbers $\{A_w \mid w = 0, \ldots, N\}$ is called the weight distribution or the weight spectrum of the code. It is sometimes convenient to refer to the weight enumerator function $A(X) = \sum_{w=0}^{N} A_w X^w$. It is also often convenient to consider the conditional input-redundancy weight enumerator function $A_i(Z) = \sum_{z=0}^{N-K} A_{i,z} Z^z$ [14], where $A_{i,z}$ is the number of codewords of information weight i and parity check weight z. Note that

(5) $\sum_{i=0}^{K} Z^i A_i(Z) = \sum_{i=0}^{K} Z^i \sum_{z=0}^{N-K} A_{i,z} Z^z = \sum_{i=0}^{K} \sum_{z=0}^{N-K} A_{i,z} Z^{i+z} = \sum_{w=0}^{N} Z^w \sum_{i=0}^{K} A_{i,w-i} = A(Z).$

Similarly,

(6) $\mathrm{BER} \le \sum_{w=d}^{N} \frac{B_w}{K}\, Q\!\left(\sqrt{2wR \cdot \mathrm{SNR}}\right),$

where $B_w = \sum_{i=1}^{w} i\, A_{i,w-i}$ is the total information weight of all codewords of weight w.
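As an illustration of how (3), (4) and (6) are evaluated in practice, the following Python sketch computes truncated union bounds from the first few terms of a weight spectrum; the spectrum entries in the example are invented purely for illustration:

    import math

    def q_func(x):
        # Eq. (4): Q(x) expressed through the complementary error function
        return 0.5 * math.erfc(x / math.sqrt(2))

    def union_bounds(spectrum, K, R, snr_db):
        # Truncated versions of (3) and (6); spectrum maps w -> (A_w, B_w)
        snr = 10 ** (snr_db / 10)
        fer = sum(A * q_func(math.sqrt(2 * w * R * snr))
                  for w, (A, _) in spectrum.items())
        ber = sum(B / K * q_func(math.sqrt(2 * w * R * snr))
                  for w, (_, B) in spectrum.items())
        return fer, ber

    toy_spectrum = {26: (3, 6), 30: (10, 40)}   # made-up (A_w, B_w) values
    print(union_bounds(toy_spectrum, K=5114, R=1/3, snr_db=2.0))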

The union bounds can be refined. These refinements lead to sharper bounds that also allow us to restrict the summations of equations (3) and (6) to the first few terms:

(7) $\mathrm{FER} \le \sum_{w=d}^{d+m} A_w\, Q\!\left(\sqrt{2wR \cdot \mathrm{SNR}}\right)$ + something that diminishes at high SNR;

(8) $\mathrm{BER} \le \sum_{w=d}^{d+m} \frac{B_w}{K}\, Q\!\left(\sqrt{2wR \cdot \mathrm{SNR}}\right)$ + something that diminishes at high SNR;

where m is some small integer. Thus in the error floor region, instead of performing a costly computer simulation, we would like to determine the first few terms of the weight distribution. For very large SNR, the code’s minimum distance $d_C$ and the total information weight of weight-$d_C$ codewords, $B_{d_C}$, determine the BER quite precisely.

Benedetto and Montorsi [14] considered the case of a uniform interleaver, i.e. an idealized random interleaver. Under this assumption, they showed that the weight enumerator of the (basic, unpunctured) turbo code can be approximated by

(9) $A_i(Z) \approx \frac{A_i^A(Z)\, A_i^B(Z)}{\binom{K}{i}}$

where $A_i^A(Z)$ and $A_i^B(Z)$, respectively, are the conditional input-redundancy weight enumerator functions of the constituent codes.

These results explain the behaviour of turbo codes with a random interleaver, as observed and demonstrated by Berrou et al.:

Figure 3 Turbo encoder. Trellis termination is not shown



• The actual minimum distance of a turbo code is not impressive compared to the minimum distance dC of an arbitrary “classical” error correcting code of similar length and rate. This explains why, for a given SNR in the error floor region, the error floor is flatter for the turbo code than for the “classical” code.

• The number of codewords of moderately low weight (say, of weights not much larger than dC) is extremely small compared to the “classical” code. This latter phenomenon is usually referred to as spectral thinning [14], and explains why the error floor of the turbo code is lower (for moderately low SNRs) than for the “classical” code.

• For random interleavers, the expected minimum distance grows slowly, and the expected number of codewords of low weight is slowly reduced, with the interleaver length K.

An additional observation [14] that arises from (9) is that most low weight codewords are associated with input vectors of weight 2. This observation actually makes (9) obsolete, since it motivates the design of better, non-random interleavers. Interleaver design is discussed in Section IV. For now we conclude that, since (9) does not apply to nonrandom interleavers, we need another way to determine the initial part of the weight distribution for an arbitrary interleaver.

II.A Algorithms for Determination of the Weight Distribution

In this section we consider algorithms for determining the number of codewords of weight ≤ wmax in a particular turbo code with fixed interleaver and constituent codes.

An upper bound on the minimum distance can be obtained by considering only input vectors of low Hamming weight, say of weights 1, 2 or 3. This method can be applied to very large codes, but there is no guarantee that no lower weight codewords exist with higher weight input vectors.

Breiling and Huber [31], refining the previous method, used pre-processing to obtain a list of all input vectors corresponding to constituent codewords of weight “almost” wmax. The algorithm proceeds to attempt to combine input vectors to the two constituent encoders. The algorithm works well for small values of wmax, but is prohibitively complex for larger wmax.

Another approach was followed by Benedetto et al. in [37], developed further by Rosnes and Ytrehus [43]. The idea is to set up a search tree containing all possible input vectors. Each node in the search tree is a constraint set, which determines the value of a subset of the input vector positions. This constraint set specifies a subcode of each constituent code. The algorithm evaluates the minimum distance of each subcode, and thereby also a lower bound t on the minimum distance of the corresponding subset of the turbo code. If t exceeds wmax, the constraint set can be discarded from the search tree. Otherwise, the search tree is expanded with two new nodes, expanding the current constraint set in two new directions. See details in [37]. The efficiency of the evaluation methods is discussed and improved in [43], facilitating the analysis of larger codes and larger values of wmax.

The algorithm in [43] was used to determine the minimum distance of all UMTS turbo codes. The results are shown in Figure 4. As a comment on the first class of algorithms discussed in this section, it can be noted that in 351 of the 5075 cases considered in Figure 4, the minimum distance codewords are actually caused by weight-9 inputs.

As an example of the minimum distance’s impact on the BER, we also show in Figure 5 results for the UMTS code of information length K = 5114. This code has a minimum distance of 26. We also found another turbo code, using the same constituent code but with an interleaver generated by a technique similar to the one described in [42], with minimum distance 36. Note that at target bit error rates larger

Figure 4 The minimum distance of UMTS turbo codes of information length K ranging from 40 to 5114


than 10^-6, the performance of the two turbo codes is almost identical. However, at a target BER of 10^-8, the code with the larger minimum distance has a relative coding gain of almost 1 dB, i.e. it requires 1 dB less SNR to achieve the same BER. The asymptotic coding gain difference is approximately 1.2 dB.

II.B Distance Bounds
Upper bounds on the minimum distance of turbo codes with arbitrary interleavers were derived by Breiling and Huber [40]. They consider only codewords of input weight two and four of the types shown in Figure 6, (a) and (b), respectively. An input vector containing a subvector $1\,0^{mp-1}\,1$ (a one followed by mp – 1 zeros followed by a one), where m is a positive integer and p is an integer called the period of the encoder denominator polynomial g(D), will generate a constituent codeword of parity weight approximately proportional to m. Hence, for example, if the interleaver π maps one such input vector u into another vector π(u) of the same type, as in Figure 6 (a), the corresponding turbo codeword will also have a low overall weight. The approach followed in [40] is to show that for any interleaver π, there must exist some short loops of these two types. One important theoretical consequence of this is that the normalized minimum distance dC/N of a turbo code as described in this paper approaches zero for large N, in contrast to the best possible block codes, according to the Varshamov-Gilbert bound [5]. In practice, observed minimum distances dC of moderate length turbo codes are much lower than Breiling and Huber’s upper bounds. Therefore, all turbo codes will display a significant error floor.

III. Iterative Decoding
In Section II, the analysis of the error probability was made under the assumption of maximum likelihood decoding, which in a strict sense unfortunately seems to be infeasible. However, Berrou and his colleagues re-invented (see Section V) iterative decoding, which for moderate to high SNRs appears to perform very close to maximum likelihood decoding, although a formal proof to this effect has yet to be presented. We will return to this issue in Section IV.

Iterative decoding relies on cooperation between the decoding modules of the two constituent codes involved. Each of the decoding modules uses the available information, namely the received values for each transmitted symbol and certain a priori information presented by the other decoder module. To motivate this discussion, let us return briefly to the situation of Figure 1. In this example, a priori information was assumed to aid the process of decoding. However, as the story sadly demonstrates, “incorrect” a priori information can end up being disastrous for the decoding process. Of course, in a decoding situation there is no way ahead of time to determine which a priori information is “correct” and which is not. Iterative decoding works by using extrinsic information, to be defined below, instead of complete information as a connection between the SISO decoders. During the course of many iterations, on average the contributions of correct extrinsic information will outweigh the negative effects of incorrect extrinsic information.

Iterative decoding is presented in Figure 7. Consider a stochastic variable X that takes a value x with some known probability P(x) (perhaps conditioned on some specific event).

Figure 5 Simulation results and bounds for information length 5114 (log10(BER) versus SNR (dB); simulated curves for the UMTS code and the new code, together with union bounds using terms of weight at most 36 and at most 40, respectively)

Figure 6 Low weight errors made up of (a) weight two inputs, (b) weight four inputs. Here, p = 3


It will be convenient to represent this probability in terms of a log-likelihood ratio (LLR), i.e. a function of the type

(10) $\lambda(x) = \log \frac{P(x)}{1 - P(x)}.$

Abusing notation, but conforming to the literature, we will sometimes refer to LLRs as information3) (i.e. channel information, a priori information, extrinsic information, and complete information) on the given bit x. Using an LLR instead of a probability means that operations become additive rather than multiplicative.

Let λ(x) denote a block of LLRs, i.e. (λ(x1), ..., λ(xm)) for some vector (x1, ..., xm). The receiver de-multiplexes the received sequence into three blocks of LLRs, λ(R:U), λ(R:C)A, and λ(R:C)B, representing the channel information on the transmitted information symbols and the parity check symbols in the two constituent codes, respectively. The determination of the probability distributions requires an accurate estimate of the received SNR, but it is not particularly difficult to obtain such an estimate when the channel is stable. A punctured and non-transmitted symbol is represented by a zero in the corresponding LLR block.

For each constituent code, a soft-in-soft-output (SISO) decoder, to be discussed below, produces the complete a posteriori information λ(Complete) on the block of transmitted symbols, based on the a priori information λ(A), the channel information λ(R), and the structural relationship between the bits as imposed by the constituent code. On a block level,

λ(Complete) = λ(A) + λ(R) + λ(E). (11)

Subtracting λ(A) and λ(R) from λ(Complete), we obtain the extrinsic information λ(E), which can be regarded as the amount of information gained by this pass through the decoder. The extrinsic information λ(E)(uj) on a particular information bit uj is independent of the a priori information λ(A)(uj) on that same bit.

At the start of the decoding process, the blocks λ(R:U) and λ(R:C)A are submitted to SISO A, while the initial a priori information λ(A) is zero for all symbols. Secondly, the blocks λ(R:U) (after an appropriate interleaving) and λ(R:C)B are presented to SISO B, together with the extrinsic information λ(E) from the previous step, which is now used after the interleaving as a priori information λ(A) for SISO B. Subsequently, the extrinsic information from SISO B is in turn de-interleaved and presented to SISO A as a priori information for the second round of decoding. As the decoding progresses, the extrinsic information is gradually refined, and information on the whole block is used to produce a final estimate on each user bit uj, 0 ≤ j < K. Technically, one can argue that a threshold decision on the complete information λj(Complete) should be used to obtain the estimate ûj on bit uj. In practice, after some iterations it seems to make little difference if the current extrinsic information λj(E) is used in place of the complete information.

III.A The SISO Blocks and the BCJR Algorithm

The iterative decoding algorithm relies heavily on the SISO blocks. Several approaches to the SISO problem have been developed, including the maximum a posteriori decoding invented by Bahl, Cocke, Jelinek, and Raviv in [4], subsequently known as the BCJR algorithm, and soft-output Viterbi decoding [15]. Below follows a description of the BCJR algorithm, in the additive version as presented by Benedetto et al. [16]. This version produces the extrinsic information directly.

The BCJR algorithm, like most SISO algorithms, requires a trellis [18] representation of the code, see Figure 8. A trellis is a directed graph. The set of nodes or states of the graph can be partitioned into subsets S0, S1, ..., Sn-1, Sn. The states of Sj belong to the j-th depth. For a block code, and hence for a terminated convolutional code as the constituent codes of a turbo code, the initial and final depths contain one state each, so S0 = {s0} and Sn = {sn}. An edge e from a state in Si-1 terminates at a state in Si, 1 ≤ i ≤ n. Let sS(e) and sE(e) be the states where e starts and ends, respectively.

Figure 7 Turbo decoding

3) This informal concept of information should not be confused with the mathematically defined information function of information theory, which is used in the discussion of density evolution in Section IV.



For the terminated trellis of a binary rate 1/2 recursive convolutional constituent code, n = K + ν. The trellis in Figure 8 has K = 3 and ν = 2. During the first K depths there are two edges leaving each state, but starting at the K-th depth there is just one edge leaving each state, giving a unique terminating path leading to the ending state sn at depth n. Each edge e is associated with the following labels:

• u(e), the information bit associated with e. This is shown by the color in Figure 8 (grey is zero, orange is one).

• c(e), the parity check symbol associated with e. This is not shown in Figure 8.

Consider a path starting in s0 and terminating in sn in the constituent code trellis. Collecting the pair of edge labels at each edge, we obtain an edge label sequence, which corresponds to a codeword of the constituent code. The constituent code consists of the set of 2^K edge label sequences obtained by traversing all possible paths from s0 to sn.

The algorithm requires two passes (or recursions): one forward pass through the trellis, where certain values regarding the j-th depth of the trellis are computed in a recursion based on the same values at the (j – 1)-th depth; and one backward pass. In the forward recursion, the decoder starts the recursion with α0(s0) = 0 and computes, for depth j = 1, ..., n and for each state s at depth j, the value

(12) $\alpha_j(s) = \max^{*}_{e:\,s^E(e)=s} \Big\{ \underbrace{\alpha_{j-1}(s^S(e))}_{\text{preceding state}} + \underbrace{u(e)\,\lambda_j^{(A)}}_{\text{a priori value}} + \underbrace{u(e)\,\lambda_j^{(R:U)} + c(e)\,\lambda_j^{(R:C)}}_{\text{channel information}} \Big\}$

where max*, operating on T numbers a1, ..., aT, is the sum-operation

(13) $\max^{*}_t\, a_t = \log\Big[\sum_{t=1}^{T} e^{a_t}\Big].$
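A small Python sketch of (13); the max-shifted evaluation and the pairwise form max*(a, b) = max(a, b) + log(1 + e^{-|a-b|}) used below are standard implementation devices, stated here as an aside rather than taken from the paper:

    import math

    def max_star(values):
        # Eq. (13): max*(a_1, ..., a_T) = log(sum_t exp(a_t)), shifted for stability
        m = max(values)
        return m + math.log(sum(math.exp(a - m) for a in values))

    def max_star_pair(a, b):
        # Equivalent pairwise recursion popular in SISO implementations
        return max(a, b) + math.log1p(math.exp(-abs(a - b)))

    vals = [0.5, 1.2, -0.3]
    print(max_star(vals), max_star_pair(max_star_pair(0.5, 1.2), -0.3))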

The quantity αj(s) is an LLR representing the probability that the constituent encoder is in trellis state s at time j, conditioned on the available preceding channel information and a priori information pertaining to all symbols until time j. To avoid numerical problems, the values αj(s) are normalized at each depth j by finding m = maxs αj(s), and subtracting m from αj(s) for all states s at depth j.

The backward pass similarly starts the recursion with βn(sn) = 0 and calculates

(14) $\beta_j(s) = \max^{*}_{e:\,s^S(e)=s} \Big\{ \beta_{j+1}(s^E(e)) + u(e)\,\lambda_{j+1}^{(A)} + u(e)\,\lambda_{j+1}^{(R:U)} + c(e)\,\lambda_{j+1}^{(R:C)} \Big\}$

for all states s at depth j, j = n – 1, ..., 1. The quantity βj(s) is an LLR representing the probability that the constituent encoder is in trellis state s at time j, conditioned on the available channel information and a priori information on all symbols after time j. The values βj(s) are normalized in the same way as αj(s).

The final step of the SISO module is to compute, for all j, 0 ≤ j < K, the extrinsic log metric of the j-th information bit, λj(E). For u = 0, 1, compute

(15) $\mu(u, j) = \max^{*}_{e:\,u(e)=u} \Big\{ \alpha_{j-1}(s^S(e)) + \beta_j(s^E(e)) + u(e)\,\lambda_j^{(R:U)} + c(e)\,\lambda_j^{(R:C)} \Big\}.$

Then

(16) $\lambda_j^{(E)} = \mu(1, j) - \mu(0, j).$

Note that the extrinsic information λj(E) on the j-th information bit is independent of the a priori value λj(A) on the j-th bit.

III.B Termination Criteria
Experience, backed by the density evolution arguments to be described in Section IV, shows that the performance of iterative decoding usually improves as the number of iterations increases, up to a certain point. The number of iterations before this convergence is observed depends on the SNR.

Figure 8 A trellis



How many iterations must be carried out? Hagenauer et al. [15] suggested using a threshold on the cross entropy between the extrinsic information produced by the two decoders. Wu et al. [30] suggested using a threshold on the number of sign differences between these two extrinsic values. To avoid some rare cases where the decoder does not converge, these termination criteria must be accompanied by a maximum limit on the number of iterations. The simulated BER performance of these techniques is close to what is obtained with a virtually unlimited number of iterations. The actual number of iterations depends on the constituent codes as well as on the channel SNR, as can also be deduced from the density evolution arguments in the next section. For “typical” constituent codes, [30] claims that the average number of iterations is small – less than four or five – at SNRs lower than the error floor threshold.

IV. Selection of System Components

The turbo code is made up of the constituent codes, the interleaver, and the puncturing schemes. A complete optimization of puncturing schemes seems to be infeasible; on the other hand, among simpler schemes there does not seem to be much of a difference. We will therefore not discuss the issue of puncturing, and focus on the constituent codes and the interleavers.

IV.A Constituent Codes
As for convolutional codes, there is a trade-off between decoding complexity and performance. Decoding complexity is directly related to trellis complexity [18]. At the error floor, complex constituent trellises generally offer better performance, but in the waterfall region the picture is not so clear.

IV.A.1 Performance at the Error Floor
In [20], a computer search is used to find good convolutional constituent codes. The goal of this search was to find recursive convolutional encoders which offer high weight codewords for all weight two input vectors, due to the lessons learnt from equation (9). However, for non-random interleavers, weight two input vectors may be less important. Another design approach is to use codes whose codeword weights grow fast with the active length of the input vectors, i.e. the number of successive time instants when the encoder is not in the zero state. This corresponds to maximizing the minimum average cycle weights of the constituent codes.

IV.A.2 Performance in the Waterfall Region

Recently [33], [35], [39], density evolution considerations have been introduced as a tool for studying the iterative decoding procedure. In this approach, it is assumed that each SISO produces extrinsic information which obeys a Gaussian probability density function, and which is not correlated with the observed received values. These assumptions are reasonable for large interleavers [39].

The technique can be applied to many channel models. We assume an AWGN channel. For a fixed channel SNR, define IA = I(λ(A), u) to be the mutual information between the a priori values and the transmitted information.4) Similarly, we can define IE = I(λ(E), u) to be the mutual information between the extrinsic values produced by the SISO and the transmitted information. Now, consider the function IE = F(IA, SNR). Analytical expressions for this function have been obtained for simple constituent codes [33]. For a general code, analytical expressions are unknown, but the relationships can easily be obtained by computer simulation. This computer simulation is much simpler [39] than a turbo code BER simulation, which requires the observation of a large number of very rare error events. In Figure 9 a typical function IE = F(IA, SNR) is shown for two different SNRs.

4) At this point we will not go into the finer details of information theory. For a presentation of information theory, see any textbook, for example [9]. Note that a mutual information value of 1 means that knowledge of one parameter exactly determines the other. A mutual information value of 0 means that the two parameters are statistically independent.

Figure 9 The connection between a priori information and extrinsic information (mutual information IE at the output of the SISO block versus mutual information IA at its input, for a higher and a lower channel SNR)


Figure 10 contains an EXIT (extrinsic information transfer) chart. The functions F for the first SISO and F^-1 for the second SISO, for some fixed SNR, are plotted in the same coordinate system. The iterative decoding procedure follows a trajectory between these two curves. As shown in Figure 10, at low SNR, the two curves intersect, so that decoding beyond a few iterations will not lead to improved decoding results. At moderate SNR, there is an open tunnel between the two curves, allowing decoding to proceed. The smallest SNR for which the tunnel is open is called the pinch-off limit in [39], which corresponds to the SNR value at which the waterfall region starts. The EXIT charts can be used for multiple purposes, such as searching for and evaluating good constituent codes, and predicting (reasonably accurately) the BER after a prescribed number of decoding iterations. This technique also presents a strong argument that turbo coding is very close to optimal at high SNR.
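The tunnel/intersection behaviour can be illustrated with a toy iteration in Python. The transfer curve below is purely hypothetical (a made-up monotone function standing in for a real SISO characteristic F); it is only meant to show how the trajectory either stalls at a fixed point or climbs towards mutual information 1:

    def transfer(i_a, c):
        # Hypothetical transfer curve I_E = F(I_A, SNR); not derived from any
        # real constituent code. c in (0, 1) plays the role of the channel SNR.
        return c + (1.0 - c) * i_a ** 1.5

    def exit_trajectory(c, iterations=50):
        # Alternate between two identical SISO curves, as in an EXIT chart
        i_e = 0.0
        for _ in range(iterations):
            i_e = transfer(i_e, c)   # SISO A extrinsic -> SISO B a priori
            i_e = transfer(i_e, c)   # SISO B extrinsic -> SISO A a priori
        return i_e

    # Low "SNR": curves intersect, trajectory stalls; higher "SNR": open tunnel
    print(exit_trajectory(0.15), exit_trajectory(0.35))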

Note that the density evolution technique does not take into consideration the effect of a limited minimum distance. Thus it cannot be applied to determine the error floor.

IV.B Interleavers
In principle, any permutation of the K input bits can serve as an interleaver, but not all permutations are equally good. Before we proceed to performance related issues, note that any permutation can be represented by a K × K lookup table. For large K, or in the case where K is adaptive to channel conditions or can be selected by the user (within a range), lookup tables may be inconvenient. In such cases it would be nice to have a simple and fast deterministic algorithm to specify the interleaver. For example, a simple block interleaver provides maximum spreading of the interleaved symbols, which according to both the following subsections is an advantage. However, the regularity of this construction also seems to be disadvantageous both in the waterfall region and especially in the error floor region. As of this writing, the known simple deterministic algorithms produce interleavers with a worse performance than the interleavers discussed below.

The purpose of the interleaver is basically to optimize the performance of the turbo codes. We will consider the waterfall region and the error floor region separately.

IV.B.1 Performance in the Waterfall Region

The density evolution arguments rely on the information that is passed between the constituent SISO modules being independent. Since these SISO modules are linked through the interleavers, this assumption is not entirely correct. A first order cycle is made up of any two numbers i, j, 0 ≤ i < j < K. The length of the cycle is defined as l = l(i, j) = j – i + |π(i) – π(j)|. One such cycle is shown in Figure 11. If l is small, the independence assumption is jeopardized. Indeed, the negative impact on the waterfall performance of an interleaver with many short cycles can be shown by density evolution arguments as well as by simulation.

Hokfeldt et al. [26] designed interleavers with the aim of minimizing the correlation between the extrinsic information produced by successive SISO iterations. The method seeks to minimize the number of short first order cycles. There does not seem to be a significant difference in the waterfall region between these interleavers and those designed in [42], discussed next.

IV.B.2 Performance in the Error Floor Region

In the error floor region, the key parameters determining performance are the code’s minimum distance, and the number of codewords of minimum distance. The interleaver can be

Figure 11 Short interleaver cycles

Figure 10 EXIT chart (axes: a priori information IA of SISO A / extrinsic information IE of SISO B, and a priori information IA of SISO B / extrinsic information IE of SISO A)


designed with the aim of optimizing these parameters.

Interestingly, the simplest step is again to make sure that there are no short first order cycles in the interleaver. Dolinar and Divsalar suggested using s-random interleavers [13]. Their s-random interleavers do not contain interleaver mappings π(i), π(j) if

|i – j| < s and |π(i) – π(j)| < s, simultaneously. (17)

These are pseudo-randomly generated interleavers, where interleaver mappings π(i) are added one by one and discarded during the generation process if they violate (17) together with any of the existing mappings π(j). Crozier [42] redefined the s-random property slightly, by requiring that l = |i – j| + |π(i) – π(j)| ≥ s, and presented two efficient ways to generate powerful interleavers.
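A minimal Python sketch of s-random interleaver generation by rejection, in the spirit of [13] (the greedy placement with restarts is one simple way to realize it, not necessarily the authors’ exact procedure):

    import random

    def s_random_interleaver(K, s, max_restarts=1000):
        # Build pi position by position, rejecting any candidate value that
        # violates (17) against the previous s - 1 positions; restart on failure
        for _ in range(max_restarts):
            candidates = list(range(K))
            random.shuffle(candidates)
            pi = []
            failed = False
            for i in range(K):
                for idx, v in enumerate(candidates):
                    if all(abs(v - pi[j]) >= s
                           for j in range(max(0, i - s + 1), i)):
                        pi.append(candidates.pop(idx))
                        break
                else:
                    failed = True
                    break
            if not failed:
                return pi
        raise RuntimeError("no s-random interleaver found; try a smaller s")

    print(s_random_interleaver(K=100, s=5)[:10])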

Andrews et al. [21] also suggested removing short cycles of the type shown in Figure 6 (a). Breiling et al. [32] designed interleavers for specific pairs of constituent codes, by explicitly avoiding loops that involve known low weight codewords in both constituent codes. The “other” interleaver in Figure 5 was found by using these techniques in combination with the ones in [42].

V. Variations, Generalizations, Applications and Limitations

V.A Variations
Codes can be concatenated in other ways than by parallel concatenation, and still be decoded by the turbo decoding method. This can include two or more constituent codes. Serial concatenation is one possibility.

V.A.1 Serial Concatenation
Serial concatenation [19] is shown in Figure 12. Such constructions can also be decoded with an iterative decoding method. For details see [19]. Serially concatenated codes tend to have larger minimum distances than parallel concatenated codes of comparable complexity and rate. However, research results so far suggest that the turbo decoding algorithm also seems to be less efficient in this case.

V.A.2 Repeat-Accumulate Codes
Divsalar et al. [23] suggested a particularly simple serial concatenation: The inner “code” is a rate 1 accumulator mapping, i.e. its single output bit at time j is simply the modulo 2 sum of all input bits so far. The accumulator mapping can be described by a 2-state trellis, to which a SISO module is applied. The outer code is a rate 1/3 repetition code: each information bit is copied twice in the output. This simple construction performs surprisingly well, especially in the waterfall region. This can be explained by density evolution arguments [33].
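A toy Python sketch of this structure (outer rate 1/3 repetition, interleaver, inner accumulator); it illustrates the construction only and is not the exact code of [23]:

    import random

    def repeat_accumulate_encode(u, pi):
        # Outer code: repeat each information bit three times (rate 1/3)
        repeated = [b for b in u for _ in range(3)]
        # Interleave, then run the rate 1 accumulator (running modulo 2 sum)
        out, acc = [], 0
        for j in range(len(repeated)):
            acc ^= repeated[pi[j]]
            out.append(acc)
        return out

    u = [1, 0, 1, 1]
    pi = list(range(3 * len(u)))
    random.shuffle(pi)
    print(repeat_accumulate_encode(u, pi))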

V.B Generalizations
A turbo code can be viewed as a low-density parity-check (LDPC) code, introduced already in 1962 by Gallager [3], and reinvented by Tanner in 1981 [6]. Apparently, technology was not ripe for these ideas at the time when they were first introduced.

Wiberg et al. [12] observed that turbo codes, as well as LDPC codes, can be described by a Tanner graph. Turbo decoding, as well as iterative decoding of LDPC codes, proceeds as a message passing algorithm on this graph. The graph and the decoding algorithm act by factoring the decoding problem. Generalizing these concepts, we arrive at factor graphs, which cover a vast number of problems, from many application areas, and their solutions, as special cases. See [34] for more details.

In the message passing algorithms, optimal scheduling of the messages is a problem yet to be solved. This has led to the introduction of parallel message passing algorithms, and to the extreme case of parallelism: analog decoders [22], [25].

Figure 12 Serial concatenation (transmitter: the K information bits u are encoded by the outer encoder, interleaved by π and encoded by the inner encoder; after multiplexing and puncturing, N of the N' code bits are sent through the channel. Receiver: demultiplexing, puncturing and log metric calculation produce channel values λ(R) for all N' code bits, and an inner SISO and an outer SISO exchange a priori information λ(A) and extrinsic information λ(E) through π and π^-1, finally delivering λ(E) for the K information bits)


V.C Applications

This paper has focused on turbo codes applied to AWGN channels with BPSK modulation. Turbo codes can of course be applied to a variety of coding and communication situations. We review some of these below.

• Coded modulation is possible with turbo codes as well. In contrast to traditional methods, like trellis coded modulation [7], set partitioning cannot be readily applied. Some challenges remain, concerning issues such as how to map the binary encoded bits to signal constellation points, or rotation invariance. Still, turbo coded modulation produces good results [29].

• Turbo codes have been applied with success to fading channels [29]. This usually requires a combined iterative detection/decoding scheme where equalization [25] and/or estimation of channel parameters [38] are included in the decoding process.

• An example of a completely different application is the use of turbo decoding in correlation attacks on cryptographic functions [27], [28].

McEliece [41] suggests that turbo-like codes will work well in most communication situations.

V.D Limitations

Turbo coding performs close to the Shannon bound even when a very small BER is required. The relatively small minimum distance presents some problems, though. It means that to achieve a small BER at low SNR, a certain minimum information length N is required. Since turbo decoding processes one block at a time, this can cause a decoding delay beyond what is tolerable for some applications.

References

1 Asbjørnsen, P C, Moe, J. God dag mann! – Økseskaft! In: Norske Folkeeventyr, 1841–1844. (In Norwegian.)
2 Shannon, C E. A Mathematical Theory of Communication. Bell System Tech. J., 27, July and October, 379–423, 623–656, 1948.
3 Gallager, R. Low-density parity-check codes. IRE Transactions on Information Theory, IT-8 (1), 21–28, 1962.
4 Bahl, L R et al. Optimal decoding of linear codes for minimizing symbol error rate. IEEE Transactions on Information Theory, IT-20 (2), 284–287, 1974.
5 MacWilliams, F J, Sloane, N J A. The theory of error-correcting codes. Amsterdam, North-Holland, 1977.
6 Tanner, R M. A recursive approach to low complexity codes. IEEE Transactions on Information Theory, IT-27 (5), 533–547, 1981.
7 Ungerboeck, G. Channel coding with multilevel/phase signals. IEEE Transactions on Information Theory, IT-28 (1), 55–67, 1982.
8 Lin, S, Costello, D. Error Control Coding : Fundamentals and Applications. Englewood Cliffs, Prentice Hall, 1983.
9 Johannesson, R. Informationsteori : Grundvalen för (tele-)kommunikation. Lund, Studentlitteratur, 1988. (In Swedish.)
10 Blahut, R. Digital Transmission of Information. New York, Addison-Wesley, 1990.
11 Berrou, C, Glavieux, A, Thitimajshima, P. Near Shannon limit error correcting coding and decoding : Turbo-codes. In: Proc. ICC'93, Geneva, Switzerland, May 1993, 1064–1070.
12 Wiberg, N, Loeliger, H-A, Koetter, R. Codes and iterative decoding on general graphs. European Transactions on Telecommunications, 6 (5), 513–525, 1995.
13 Dolinar, S, Divsalar, D. Weight distributions for turbo codes using random and nonrandom permutations. Pasadena, CA, USA, Jet Propulsion Lab (JPL), 1995. (TDA Progress report, 42-122.)
14 Benedetto, S, Montorsi, G. Unveiling turbo codes: some results on parallel concatenated coding schemes. IEEE Transactions on Information Theory, 42 (2), 409–428, 1996.
15 Hagenauer, J, Offer, E, Papke, L. Iterative decoding of binary block and convolutional codes. IEEE Transactions on Information Theory, 42 (2), 429–445, 1996.
16 Benedetto, S et al. A soft-input soft-output maximum a posteriori (MAP) module to decode parallel and serial concatenated codes. Pasadena, CA, USA, Jet Propulsion Lab (JPL), 1996. (TDA Progress report, 42-127.)
17 Pless, V, Huffman, W C (eds.). Handbook of Coding Theory. Amsterdam, North-Holland, 1998.
18 Vardy, A. Trellis structure of codes. In: Pless, V, Huffman, W C (eds.). Handbook of Coding Theory. Amsterdam, North-Holland, 1998, 1989–2118.
19 Benedetto, S et al. Serial Concatenation of Interleaved Codes : Performance Analysis, Design, and Iterative Coding. IEEE Transactions on Information Theory, 44 (3), 909–926, 1998.
20 Benedetto, S, Garello, R, Montorsi, G. A search for good convolutional codes to be used in the construction of turbo codes. IEEE Transactions on Communications, 44 (9), 1101–1105, 1998.
21 Andrews, K S, Heegard, C, Kozen, D. Interleaver design methods for turbo codes. In: Proc. IEEE Int. Symposium on Information Theory, Cambridge, MA, USA, 1998, 420.
22 Loeliger, H-A et al. Iterative sum-product decoding with analog VLSI. In: Proc. IEEE Int. Symposium on Information Theory, Cambridge, MA, USA, 1998, 146.
23 Divsalar, D, Jin, H, McEliece, R J. Coding theorems for 'turbo-like' codes. In: Proc. 1998 Allerton Conference on Communications, Allerton, IL, USA, Sept. 1998.
24 Heegard, C, Wicker, S B. Turbo Coding. Boston, Kluwer, 1999.
25 Hagenauer, J et al. Decoding and equalization with analog non-linear networks. European Transactions on Telecommunications, 10 (5), 659–690, 1999.
26 Hokfeldt, J, Edfors, O, Maseng, T. Interleaver design for turbo codes based on the performance of iterative decoding. In: Proc. IEEE International Conference on Communications, Vancouver, BC, Canada, June 1999.
27 Johansson, T, Jönsson, F. Fast correlation attacks based on Turbo code techniques. In: Proceedings of Crypto'99, Santa Barbara, CA, USA, August 1999. Lecture Notes in Computer Science, Berlin, Springer, 1999.
28 Fossorier, M P C, Mihaljevic, M J, Imai, H. Critical noise for convergence of iterative probabilistic decoding with belief propagation in cryptographic applications. In: Proceedings of AAECC'13, Honolulu, HI, USA, November 1999. Lecture Notes in Computer Science, Berlin, Springer, 1999.
29 Vucetic, B, Yuan, J. Turbo Codes : Principles and Applications. Boston, Kluwer, 2000.
30 Wu, Y, Woerner, B D, Ebel, W J. A simple stopping criterion for turbo decoding. IEEE Communications Letters, 4 (8), 258–260, 2000.
31 Breiling, M, Huber, J B. A method for determining the distance profile of turbo codes. In: Proc. 3rd ITG Conference on Source and Channel Coding, Munich, Germany, January 2000.
32 Breiling, M, Peeters, S, Huber, J B. Interleaver design using backtracking and spreading methods. In: Proc. IEEE Int. Symposium on Information Theory, Sorrento, Italy, June 2000, 451.
33 Divsalar, D, Dolinar, S, Pollara, F. Low complexity turbo-like codes. In: Proc. 2nd Int. Symposium on Turbo Codes, Brest, France, September 2000, 73–80.
34 Kschischang, F R, Frey, B J, Loeliger, H-A. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47 (2), 498–519, 2001.
35 Richardson, T, Urbanke, R. The capacity of low-density parity-check codes under message-passing decoding. IEEE Transactions on Information Theory, 47 (2), 599–618, 2001.
36 Chung, S Y et al. On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit. IEEE Communications Letters, 5 (2), 58–60, 2001.
37 Garello, R, Pierleoni, P, Benedetto, S. Computing the free distance of turbo codes and serially concatenated codes with interleavers: Algorithms and applications. IEEE Journal on Selected Areas in Communications, 19 (5), 800–812, 2001.
38 Valenti, M C, Woerner, B D. Iterative channel estimation and decoding of pilot symbol assisted turbo codes over flat-fading channels. IEEE Journal on Selected Areas in Communications, 19 (9), 1697–1705, 2001.
39 ten Brink, S. Convergence behavior of iteratively decoded parallel concatenated codes. IEEE Transactions on Communications, 49 (10), 1727–1737, 2001.
40 Breiling, M, Huber, J B. Combinatorial analysis of the minimum distance of turbo codes. IEEE Transactions on Information Theory, 47 (7), 2737–2750, 2001.
41 McEliece, R J. Are turbo-like codes effective on non-standard channels? IEEE Information Theory Society Newsletter, 51 (4), 2001. (Presented at IEEE Int. Symposium on Information Theory, Washington, D.C., USA, June 2001.)
42 Crozier, S N. New High-Spread High-Distance Interleavers for Turbo-Codes. Preprint, 2001.
43 Rosnes, E, Ytrehus, Ø. Algorithms for turbo code weight distribution calculation with applications to UMTS codes. To be presented at IEEE Int. Symposium on Information Theory, Lausanne, Switzerland, July 2002.


Theory and Practice of Error Control Coding for Satellite and Fixed Radio Systems

Pål Orten and Bjarne Risløw

Strong "state-of-the-art" Forward Error Correction (FEC) coding has been extensively applied for both satellite and fixed radio communication. With the discovery of Turbo codes, performance close to the capacity limits for moderate bit error rates became possible. In this paper we discuss various aspects of the capacity theorem and the capacity limits. We describe and present results for some error control coding schemes that are currently applied in satellite and fixed radio systems, and compare these results with the theoretical limits. Finally, we describe and evaluate promising new coding schemes that have not yet been applied in commercial systems.

Pål Orten (35) is Research Manager at Nera Research. He received his Siv.Ing. degree from the Norwegian University of Science and Technology in 1989, and his PhD from Chalmers University of Technology in 1999. From 1990 to 1995 he was Research Scientist at ABB Corporate Research and from 1995 at Nera Research (interrupted by PhD studies). In addition to various research and development activities in channel coding and signal processing at ABB and Nera, he also participated in the European research project FRAMES, which resulted in the definition of the 3G UMTS system in ETSI. His current research focuses on fixed wireless communications and mobile satellite communications. [email protected]

Bjarne Risløw (38) is a Research Scientist at Nera Research. He received his MSc in Electrical Engineering at the Norwegian Institute of Technology in 1988. He has worked as a Research Scientist at SINTEF DELAB (1989–1992), ABB Corporate Research (1993–1994) and from 1995 at Nera Research. At ABB and Nera he has worked with the development of terminals and earth stations for the Inmarsat systems. In the last years he has mostly been involved in wireless broadband access projects, DVB-RCS and LMDS. His main fields of interest are terminal technologies and in particular modem technology, coding and modulation. [email protected]

1 Introduction

With the widespread use of mobile cellular phones, radio communications have become a natural part of life for many people around the world. This has been made possible by advanced technical research enabling cost-effective implementation and reliable communication in a rather harsh environment. In general a radio channel is exposed to many degrading and disturbing effects like interference, multipath propagation causing fading, Doppler shifts, and thermal noise. Obviously, to be able to operate a communication system under such conditions, some method or scheme for error control is required. Many of these advanced error control schemes were developed for deep space communications and satellite communications, where power (and to some extent bandwidth) limits the performance. Other advances in coding theory have been made for applications where bandwidth efficiency is very important. Examples of such systems are cable modems and radio relay communications.

An example of a satellite communication link is shown in Figure 1. Satellite communication systems are characterised by an extremely large distance between transmitter and receiver, since the communication signal is transmitted via a satellite located up to 36,000 km (for Geosynchronous Earth Orbits – GEO) above the earth's surface. Since the intensity of electromagnetic waves (in free space) decays with the square of the radio path length, such a system is clearly power limited. Therefore, satellite systems normally need line of sight to have sufficient link margin for reliable operation. Naturally, sophisticated error control coding schemes are necessary for acceptable performance, as well as for limiting the power consumption in both satellite and terminal. This is especially important for mobile terminals. Another major problem with the long signal path of GEO satellites is the corresponding long delay. This delay might reduce the perceived quality of speech services and cause problems for delay sensitive data communication services. Thus, satellite systems have also been designed with satellites in lower earth orbits. For such systems the delay is reduced, but other problems, like more Doppler and the need for hand-over between satellites, occur. A common and accepted channel model for satellite systems is the Ricean channel model. With handheld phones, and thus rather non-directive antennas, the scattered components may be significant. However, with rather directive antennas, and slowly moving or fixed terminals, the direct ray is very strong compared to the diffuse scattered components. The channel is then quite close to an Additive White Gaussian Noise (AWGN) channel. For maritime or aeronautical applications these assumptions may not hold, and other fading models must be used. More information about satellite channels can be found for instance in [1].

Figure 1 Satellite System (a terminal and an earth station communicate via a satellite, with forward and return channels)

Figure 2 shows a radio link communication system with some high capacity radio hops (typically STM-1 data rates1)) bringing the communication to a city, where the traffic is further distributed to the users by cable or radio transmission. Radio link communication systems are rather different from satellite communication systems, since they are typically used instead of fibre cables in areas where it is impractical or not cost effective to deploy fibre cables. Radio relay systems therefore have to be highly reliable, provide rather high data rates, and also a very low bit error rate. The goal for the radio system design is then to provide close to fibre quality and speed. Obviously, this sets strong requirements on the channel coding that may be applied. Radio links operate with line of sight communications, with antennas normally mounted several meters above the ground. Nevertheless, reflections may occur from oceans and/or mountains, resulting in a multipath fading channel. Also precipitation may cause the signal to fade. A much used channel model for radio link communication is a two-ray model, where the rays are separated by 6.3 ns.

Figure 2 Radio link communication system (a chain of high capacity radio hops forming a city link)

1) STM-1 (Synchronous Transport Module level 1) is the basic transmission rate (155.52 Mbit/s) in SDH (Synchronous Digital Hierarchy).

In this paper we focus on coding for satellite communications and radio link communication systems. Initially, we present the capacity limits of Shannon, as well as the practical limits for a given code rate, modulation method and block length. We then study coding schemes that have been applied in many satellite and radio link systems, and show examples of their performance. We then proceed with a presentation of Turbo codes, which are now implemented in several recently designed satellite systems. Next, we look at coding schemes and performance for high spectral efficiency radio systems. Furthermore, we present coding schemes that have not yet been applied in commercial systems, including two block code based coding schemes with iterative decoding, Turbo Product Codes and Low Density Parity Check Codes. We believe these are strong candidates to be implemented in future satellite and radio systems. Finally, the paper presents a comparison of some of the coding schemes along with a discussion of their properties.

2 The Capacity Theorem and Capacity Limits

2.1 Introduction

Claude Shannon developed the formulas for "the maximum rate of transmission of binary digits over a system when the signal is perturbed by various types of noise" [2]. For the AWGN channel with bandwidth W and signal-to-noise ratio S/N, Shannon showed that it was possible to transmit binary digits at a rate

C = W ⋅ log2(1 + S/N) (1)

with as small probability of errors as desired. This maximum rate is also called the channel capacity. Shannon used a geometrical approach to prove this. Assume that m bits are encoded with M = 2^m different signal functions inside the code sphere of radius √(2TWP), where P is the average power of the codes. Then, by letting T (code length) go to infinity, one would reach the capacity limit. A remarkable point is that the bound holds on average even if the M signals are picked at random within the code sphere.

Many researchers have tried to design random-like codes, but a decoder for a pure random code would be very complex, since the decoder would have to compare the received signal against all M possible transmitted sequences. This decoder would generally have to calculate Euclidean distances (use soft decision). With the discovery of Turbo codes one suddenly found a practical way to decode long random-like codes using soft decisions, but still with limited decoding complexity.

Let us take a closer look at the capacity formula. When signalling at a certain rate Rs, the signal to noise ratio S/N relates to the energy to noise density for each bit, Eb/N0, as:


S/N = (Eb/N0) ⋅ (Rs/W) (2)

The minimum required Eb/N0 when transmitting at capacity (Rs = C) is then:

Eb/N0 = (W/C) ⋅ (2^(C/W) – 1) (3)

The relationship between the minimum required Eb/N0 and the relative bandwidth (W/C) is shown in Figure 3.

It is then easy to see that when the channel bandwidth W goes to infinity, the minimum required Eb/N0 approaches ln(2), or –1.6 dB. This result is not very useful in itself, since unlimited bandwidth will not be available. Nevertheless, it is a fundamental limit. Now, if we let η = C/W be the spectral efficiency (bit/s/Hz), then Eq. (3) becomes Eb/N0 = (2^η – 1) / η, which is the unconstrained limit as plotted in Figure 5.
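The unconstrained limit is straightforward to evaluate numerically; a small sketch (function and parameter names are ours):

import math

def shannon_limit_db(eta):
    """Minimum Eb/N0 in dB at spectral efficiency eta = C/W,
    i.e. Eb/N0 = (2^eta - 1)/eta from Eq. (3)."""
    return 10 * math.log10((2 ** eta - 1) / eta)

print(shannon_limit_db(0.001))   # approaches -1.59 dB as eta -> 0
print(shannon_limit_db(1))       # 0 dB at 1 bit/s/Hz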

Until a few years ago there was no practical way of getting really close to the Shannon limit. With a certain coding scheme a reasonably good coding gain compared to uncoded modulation could be achieved, but we were still far away from the Shannon limit. As an example, concatenation of an inner convolutional and an outer Reed-Solomon code was still about 2–3 dB away from the capacity limit. However, with the discovery of Turbo codes and iterative decoding this gap was closed, and performance within 1 dB of the Shannon limit was achieved with practical code lengths.

2.2 Capacity Limit as Function of Code Rate

Eq. (1) can be re-written to take into account the effect of the code rate. If we let S/N = R ⋅ Eb/N0 ⋅ Rs/W, where R is the code rate, i.e. the ratio between the number of information bits and the actual number of bits transmitted on the channel, and W = 1/2T = Rs/2, we get the capacity in bits per dimension as:

C = (1/2) ⋅ log2(1 + 2R ⋅ Eb/N0) (4)

From the converse to the coding theorem one can derive the inequality [3]

R ⋅ (1 – Hb(p)) ≤ C (5)

where

Hb(p) = –p ⋅ log2(p) – (1 – p) ⋅ log2(1 – p) (6)

Hb(p) is the binary entropy function, where p is the bit error probability. One can now calculate the minimum bit error rate for a specific code rate and Eb/N0; the results are shown in Figure 4.
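The curves in Figure 4 can be reproduced by inverting the binary entropy function numerically; a sketch under the assumption that C is taken from Eq. (4) (the names are ours):

import math

def hb(p):
    """Binary entropy function of Eq. (6), in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def min_ber(rate, eb_n0_db):
    """Smallest p satisfying rate * (1 - hb(p)) <= C, Eq. (5)."""
    c = 0.5 * math.log2(1 + 2 * rate * 10 ** (eb_n0_db / 10))  # Eq. (4)
    if c >= rate:
        return 0.0                        # below capacity: BER can vanish
    target = 1 - c / rate                 # required entropy hb(p)
    lo, hi = 0.0, 0.5                     # hb is increasing on [0, 0.5]
    for _ in range(60):                   # bisection
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if hb(mid) < target else (lo, mid)
    return (lo + hi) / 2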

Furthermore, the Shannon limit for the different code rates can be written as:

Eb/N0 = (2^(2R) – 1) / (2R) (7)

From Eq. (7) and Figure 4 we can see that the capacity limit when using 1/2-rate coding is 0 dB.

Figure 3 Relationship between the minimum required Eb/N0 and bandwidth (W/C) (C/W from 0 to 8 versus Eb/N0 from –2 to 6 dB; the Shannon limit asymptote is –1.6 dB)

Figure 4 Minimum bit error rate as function of code rate (BER from 10^0 down to 10^-6 versus Eb/N0 from –2 to 2 dB; curves for R = 0.75, 0.5, 0.33 and 0.1)



2.3 Capacity Limit with Constrained Input

The bounds calculated earlier in this document are based on unconstrained input, but most systems use some kind of constrained input, like for instance BPSK, QPSK, 8-PSK, or 16-QAM. The channel capacity for equiprobable binary constrained input (±1) [4] can be written as:

C = (1/2) ∫ p(y | 1) ⋅ log2[p(y | 1) / p(y)] dy + (1/2) ∫ p(y | –1) ⋅ log2[p(y | –1) / p(y)] dy (8)

where both integrals run over y from –∞ to ∞.

The probability density function of the received signal including noise, p(y | ±1), has mean ±1 and variance σ² = 1 / (2 ⋅ Es/N0). Adjusting for the code rate, we have Es/N0 = R ⋅ Eb/N0. We can now determine the capacity in bit/s/Hz from Eq. (8). This expression can easily be extended to any square QAM constellation, and the capacity limit for some of these constellations is shown in Figure 5.
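Eq. (8) can for instance be estimated by Monte Carlo integration; a sketch under the noise model just stated (names are ours):

import numpy as np

def bpsk_awgn_capacity(es_n0_db, n=200000, seed=0):
    """Monte Carlo estimate of Eq. (8): capacity in bits per channel
    use for equiprobable inputs +/-1 in Gaussian noise."""
    rng = np.random.default_rng(seed)
    sigma2 = 1 / (2 * 10 ** (es_n0_db / 10))           # noise variance
    y = 1 + np.sqrt(sigma2) * rng.standard_normal(n)   # send x = +1
    # by symmetry C = 1 - E[log2(1 + p(y|-1)/p(y|1))], where the
    # likelihood ratio is exp(-2y/sigma2); logaddexp avoids overflow
    return 1 - np.mean(np.logaddexp(0.0, -2 * y / sigma2)) / np.log(2)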

If we assume QPSK and a code rate of 1/2, this gives a spectral efficiency of 1 bit/s/Hz. The capacity limit in this case is 0.2 dB, while the unconstrained limit from Figure 4 was 0 dB. We see that for higher order modulation schemes the equiprobable square QAM constellation will never reach the Shannon limit; asymptotically this distance will be 1.53 dB. There are two ways to improve the performance: either by making the probability distribution more Gaussian or by increasing the dimension of the signal constellation. This is also called shaping gain, and at high SNR this gain is separable from the coding gain. Methods exist that can easily obtain 1 dB of the theoretical gain.

2.4 Capacity Limit as Function of Block Length

Finally, the actual code length used will limit the capacity. This can be found from the sphere packing bound; the detailed formulas can be found in [5]. In Figure 6 we have plotted the capacity limit as function of block length for 1/2 and 3/4 rate coding.

Note that the influence of the block length is the same for both code rates. The curves are only shifted upwards as the code rate is increased. In order to reach the capacity limit a rather long block length (10^6) is required, but such block lengths are impractical for most systems. Typical block lengths can be the ATM cell size (424 bits) or the MPEG packet size (1504 bits), and if we assume 1/2-rate coding this gives a capacity loss of 1.4 dB and 0.8 dB respectively.


2.5 How to Utilise the Limits

Code Design

Turbo codes have been made which are very close to the Shannon limit at moderate BER levels. These codes and other related codes are not algebraic codes, but depend on some parameters which are found by a cut-and-try process.

Figure 5 Spectral efficiency as function of capacity limit for QAM (spectral efficiency from 0 to 6 bit/s/Hz versus capacity limit Eb/N0 in dB; curves for unconstrained input, BPSK, QPSK, 16-QAM and 64-QAM)

Figure 6 Capacity limit as function of block length (information bits) (capacity limit in dB from –1.5 to 3.5 versus block length N from 10^2 to 10^6; curves and asymptotic limits for R = 0.5 and R = 0.75)


When doing simulations for a set of code parameters, we can easily determine whether a code is good or not by comparing the simulation results against the theoretical limits. Further in this article we will use the limits to compare different coding schemes, since not all codes are directly comparable to each other (different code rates, block lengths).

System Design Level

New codes like Turbo codes offer very large flexibility with respect to code rates and block lengths. When evaluating different coding techniques to be used in a new system, this usually includes extensive and time-consuming simulations. Instead we can use the capacity limits to estimate the performance, since the limits presented offer an easy way to get a first estimate.

3 Channel Coding for Radio Systems

3.1 Introduction

Since the radio channel is subject to various impairments like fading, interference and thermal noise, advanced channel coding is necessary for sufficient performance and range. In addition to the performance criteria, the coding scheme must also allow sufficient system flexibility. For satellite communication systems it might also be necessary to apply adaptive modulation and coding to account for varying quality of service requirements and channel variations. The coding scheme should then be able to provide variable error protection depending on the channel conditions. For fixed radio systems reliability and extremely low error rates are vital, and the flexibility is not easily utilised.

Figure 7 Convolutional feedforward encoder (shift register stages 1 … K; the generator polynomials g0 … gn select which stage contents are added modulo 2 to form each output)

3.2 Coding for Satellite Applications

3.2.1 Viterbi Decoding of Convolutional Codes

Convolutional codes have been very popular for forward error correction in satellite systems. For convolutional codes, dependence between the symbols is obtained by performing a convolution on the data symbols. With more than one such linear combination, redundancy is added, and the code can correct errors. Figure 7 shows a feedforward convolutional encoder with constraint length K and code rate R = 1/n, or R = 1/(n+1) if the systematic2) encoder switch is closed. Which shift register contents to add (modulo 2) is decided by the generator polynomials, gi. The outputs are then multiplexed onto the output line.

A nice feature of convolutional codes is the existence of a maximum likelihood sequence decoding algorithm, originally proposed by Viterbi in 1967 [6] and now known as the Viterbi algorithm. The constraint length, K, of a convolutional code is commonly defined as the number of encoder memory (or delay) elements plus one3). Since the complexity of the Viterbi algorithm increases exponentially with the memory of the encoder, we normally have to choose constraint lengths around 10 or lower to be able to decode with the Viterbi algorithm. The code rate, R, is the relation between the number of information bits, k, and code bits, n, transmitted, such that R = k/n. Rate k/n codes where k is different from one can be constructed by applying multiple shift registers. A problem with this approach is that the number of trellis states becomes 2^(kK). Instead, high rate convolutional codes can be obtained by puncturing. Puncturing means that we delete some code bits of a lower rate 1/n code to obtain a code with a higher rate. For these codes the complexity of the decoding is essentially the same as for a rate 1/n code. Results have shown that the performance loss is also quite low when constructing families of multirate and rate compatible codes by puncturing [7] [8]. An important parameter for good performance of convolutional codes is the free distance, df. The free distance is the distance (or weight) of the path with the shortest distance from the correct path in the trellis.

A convolutional code that has become almost a standard for satellite communications is the rate R = 1/2, constraint length K = 7 convolutional code with generator polynomials 133 and 171 (in octal notation). It is for instance used in a number of Inmarsat systems, and in the Digital Video Broadcasting (DVB) satellite system.

2) When the encoder is systematic, the information sequence appears directly in the coded sequence.
3) Other definitions also exist in the literature.


The free distance of this code is 10. This is the highest possible free distance of a rate 1/2 feed-forward convolutional code with constraint length 7. This code also turns out to be optimum with regard to a more sophisticated criterion [9]. In Table 1 we give the required Eb/N0 to achieve a bit error rate of 10^-6 for this constraint length K = 7 code. Observe that although the code has a rather short constraint length, the performance is less than 5 dB away from the theoretical limit on a BPSK/QPSK channel.
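To make the encoder structure concrete, here is a minimal sketch of an encoder for this (133, 171) code (our own illustration; the function name and the shift register convention are ours, and other equivalent conventions exist):

def conv_encode_133_171(bits):
    """Rate 1/2, K = 7 feedforward convolutional encoder with
    generator polynomials 133 and 171 (octal)."""
    generators = (0o133, 0o171)
    state = 0
    out = []
    for b in bits:
        state = ((state << 1) | b) & 0x7F              # 7-bit encoder state
        for g in generators:
            out.append(bin(state & g).count("1") % 2)  # modulo 2 tap sum
    return out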

3.2.2 Sequential Decoding of Convolutional Codes

As described above, the complexity of the Viterbi decoding algorithm increases exponentially with the constraint length. At the same time the error probability of the convolutional code also decreases exponentially with the constraint length. To have sufficiently low bit error rates at low signal-to-noise ratios, we might have to use a constraint length that makes Viterbi decoding impractical. A possible solution is then to use sequential decoding, see for instance [10] or [11] for details. Sequential decoding is a sub-optimal decoding algorithm, but since the decoding complexity increases linearly with the constraint length instead of exponentially, we can always choose a constraint length that is sufficiently high to meet our BER requirements even if the decoding is sub-optimal.

The basic idea of sequential decoding is to search the code tree sequentially, going along the branch which gave the best metric increment. When the channel is good this is most likely the path that would have been chosen also by the Viterbi algorithm. When the channel is noisier, we will from time to time make an incorrect local decision in our search. This will usually be realised quite soon, since the accumulated metric will now be bad. The decoder will thus back up and try alternative paths which locally gave a worse metric. Obviously, when the channel is very noisy there will be a lot of backward and forward searches before the correct path is found.
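The best-first search at the heart of this procedure can be sketched as follows (a minimal stack-algorithm style illustration of the idea, not a complete decoder; the branch metric, e.g. a Fano metric computed from the received sequence, is assumed to be supplied):

import heapq

def sequential_search(branch_metric, n_levels):
    """Best-first search of a binary code tree: always extend the
    partial path with the best accumulated metric so far."""
    heap = [(0.0, [])]                    # min-heap of (negated metric, path)
    while heap:
        neg_metric, path = heapq.heappop(heap)
        if len(path) == n_levels:
            return path                   # best complete path found
        for symbol in (0, 1):
            inc = branch_metric(path, symbol)
            heapq.heappush(heap, (neg_metric - inc, path + [symbol]))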

Therefore, the decoding time will vary with the channel conditions and is thus a random variable. To cope with this varying decoding complexity, large buffers are required to store data in periods of intense search activity. With finite buffer sizes there is always a certain probability that the decoder has not finished decoding before the buffers are full (buffer overflow). The decoding must then be stopped. This is normally the most critical event with sequential decoding. Since the undetected error probability can be made sufficiently low by choosing a high constraint length, the overflow rate will dominate the total error rate. For sequential decoding the speed of the decoder will therefore have influence on the system performance. A better implementation or a faster hardware or signal processor will improve the error rate, since the decoder has time for more search, and buffer overflows become more rare. To be able to recover quickly from an overflow situation, systematic encoders are often used with sequential decoding. It can be shown that if a channel/modulation parameter called the Pareto exponent is above one, ρ > 1, then the mean value of the number of computations required will be bounded. In this case a sufficiently large constraint length can be chosen to have negligible error rate.

"Inmarsat B High speed data" is a mobile/portable satellite service launched some years ago. This system uses a rate 1/2 systematic convolutional code with constraint length K = 36. Due to the very high constraint length, Viterbi decoding is impossible and sequential decoding is applied. The generator polynomial of the encoder is 714461625313 (octal notation). This code is an optimum distance profile (ODP) convolutional code [12], which makes it well suited for sequential decoding.

There has also been some work on sequential decoding for Rayleigh fading channels [13], and for such channels a higher signal-to-noise ratio is needed for the Pareto exponent to be one.

Table 1 Performance of constraint length 7 convolutional code with Viterbi decoding, compared to capacity limits

Code rate | Eb/N0 to achieve performance at 10^-6 | Capacity for constrained binary input | Capacity limit, block length | Loss compared to constrained input limit
1/2 | 5.0 dB | 0.2 dB | Not applicable | 4.8 dB
3/4 | 6.0 dB | 1.6 dB | Not applicable | 4.4 dB

Table 2 Performance of sequential decoding, compared to capacity limits

Code rate | Eb/N0 where Pareto exponent is equal to 1 | Capacity for constrained binary input | Capacity limit, block length | Loss compared to constrained input limit
1/2 | 2.2 dB | 0.2 dB | Not applicable | 2.0 dB
3/4 | 3.7 dB | 1.6 dB | Not applicable | 2.1 dB


3.2.3 Concatenated Coding

Concatenated coding is a combination of an inner code and an outer code, where the inner code should work well at low signal-to-noise ratios using maximum likelihood decoding (soft decoding), combined with a strong algebraic outer code with low redundancy. A much used combination is a convolutional inner code (K = 7) with Viterbi decoding and a Reed-Solomon outer code. The Reed-Solomon code is non-binary and typically works on 8-bit symbols. The errors from the Viterbi decoder are bursty, and in order to combat these errors a byte interleaver is used to split up the long error bursts. However, for packet transmission and time critical services interleaving cannot be used.

This coding scheme was for a long period the "state-of-the-art" in communication systems and has been used in deep-space communication ("Voyager" [14]) and in the DVB satellite system. The Voyager code uses a (255,223) Reed-Solomon code and DVB-RCS a (204,188) Reed-Solomon code [15]. In the DVB-RCS system one is limited to MPEG packets and cannot use interleaving, while for Voyager one could interleave up to 8 Reed-Solomon blocks. The performance is shown in Table 3.

3.2.4 Turbo Codes

Berrou et al. [16] introduced a new class of error correcting codes, called Turbo Codes. This coding technique is based on Parallel Concatenated Convolutional (PCC) codes, where two parallel convolutional codes are separated by an interleaver, see Figure 8. The component encoders are RSC (Recursive Systematic Convolutional) codes. These encoders can be very short. Probably the most important element in the code is the interleaver, which defines the minimum distance. Designing the interleaver is critical for the performance, and the interleaver should ideally be random or pseudo-random [17]. The Turbo code can easily be applied to any block size and code rate, [16] and [18]. However, one must be especially careful with the interleaver and puncturing map for high code rates [19].

Figure 8 The original Turbo Code as presented by Berrou (two recursive systematic convolutional encoders in parallel, separated by a pseudo-random interleaver and followed by puncturing)

The decoding is iterative and based on soft-input and soft-output.

To achieve this, the Turbo decoding algorithm may apply the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm [20], or the less optimum soft-output Viterbi algorithm (SOVA) [21]. A simplification of the BCJR algorithm has been shown to achieve very good results with very little performance loss [22]; several simplified versions exist. The actual complexity depends on the component codes and the number of iterations required. The Turbo coding/decoding technique has achieved results very close to the Shannon limit.
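The scheduling of such an iterative decoder can be sketched as follows (our own minimal illustration: the SISO modules, the interleaver functions and the LLR sign convention are assumed inputs, not a particular published implementation):

def turbo_decode(llr_sys, llr_par1, llr_par2, siso, interleave, deinterleave,
                 iterations=8):
    """Iterative decoding of a parallel concatenated code: two SISO
    decoders exchange extrinsic information through the interleaver.
    LLR convention assumed here: positive values favour bit 0."""
    n = len(llr_sys)
    ext1 = [0.0] * n
    ext2_deint = [0.0] * n
    for _ in range(iterations):
        # SISO A: systematic + first parity, a priori from SISO B
        ext1 = siso(llr_sys, llr_par1, ext2_deint)
        # SISO B works in the interleaved domain on the second parity
        ext2 = siso(interleave(llr_sys), llr_par2, interleave(ext1))
        ext2_deint = deinterleave(ext2)
    total = [s + a + b for s, a, b in zip(llr_sys, ext1, ext2_deint)]
    return [0 if t >= 0 else 1 for t in total]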

A lot of work has been done on PCC codes, and they have been proposed for a wide range of standards (EDGE, UMTS, DVB-RCS, CCSDS and several more).

Table 3 Performance of Concatenated Codes (Convolutional and Reed-Solomon)

Code | Code rate | Eb/N0 to achieve BER of 10^-5 | Capacity for constrained input | Capacity limit, block length 4) | Loss at 10^-5
Voyager | 0.44 | 2.5 dB | –0.1 dB | 0.2 dB | 2.3 dB
DVB-RCS | 0.46 | 3.5 dB | 0.0 dB | 0.8 dB | 2.7 dB

4) The block length for the Voyager code is based on 8 interleaved Reed-Solomon blocks, while the DVB-RCS code does not use interleaving.


Commercial implementations can be bought, even though most of them are based on Field-Programmable Gate Array (FPGA) solutions, like the Nera Turbo decoder [23] and others [24], [25].

Nera uses both a variant of the original Berrou code and a similar but somewhat less complex Turbo code in its satellite products. The latter is used in the DVB Return Channel System (DVB-RCS), [15] [26]. In Table 4 the performance of these two codes is given for MPEG block sizes (188 bytes). We see that both codes are within 1 dB of the capacity limit.

The disadvantage of the Turbo code is that one might get a flattening of the bit error curve. However, with a good interleaver design this flattening effect can be mitigated and only occurs at very low error rates. The inventors have patented both the Turbo encoding scheme and different decoders.

3.3 Comparison of Coding Techniques

In Figure 9 we compare convolutional coding and concatenated coding with Turbo coding, together with some coding limits. The comparison is done for the MPEG packet size and approximately 1/2-rate coding. We see that all codes achieve a very good coding gain, but the Turbo code gives us an additional 2 dB compared to more traditional coding techniques. Appropriate simulation results for sequential decoding were not available, but results from [13] for a Rayleigh fading channel and a constraint length K = 36 code show that BER = 10^-6 is achieved approximately 1 dB from the theoretical sequential limit. The difference is probably less for an AWGN channel. The sequential limit that is plotted is for 1/2-rate coding, corresponding to a Pareto exponent ρ = 1.

3.4 Coding for Systems with High Spectral Efficiency

3.4.1 System Requirements

The most effective transmission medium, both in terms of reliability and transmission speed, is optical fibre transmission. However, in many cases it is not possible or cost effective to apply fibre cables. The obvious, and often the only, alternative is then radio. When the high data rates that are normally transmitted on fibre are to be transmitted via a radio system, we need modulation schemes with very high spectral efficiency, and coding schemes that can provide virtually error-free communication. The high spectral efficiency requirement means that the code rate must be rather high, otherwise the modulation alphabet must be extremely large. Virtually error-free communication means that the bit error rate must be below 10^-12. In the last few years coding schemes have been found that are quite close to the Shannon limit6). However, these schemes often also have rather large block lengths. Due to attenuation, fading and atmospheric disturbances the range of such a radio transmission scheme is limited to between 50 and 150 km, where the specific range depends on the climatic conditions of the installation site, the topography and the carrier frequency that is applied. At high carrier frequencies the attenuation will be the limiting factor, while at lower frequencies multipath propagation may be the most critical.

Table 4 Performance of the original Turbo and DVB-RCS code

Code | Code rate | Eb/N0 to achieve BER of 10^-6 | Capacity for constrained input | Capacity limit, block length 5) | Loss at 10^-6
Berrou code (variant) | 1/2 | 1.6 dB | 0.2 dB | 1.0 dB | 0.6 dB
DVB-RCS code | 1/2 | 1.8 dB | 0.2 dB | 1.0 dB | 0.8 dB

5) The capacity of 1/2-rate coding with binary constrained input is 0.2 dB, but including the 0.8 dB loss due to the limited block size (MPEG) we get 1.0 dB.
6) The distance to the Shannon limit is still higher for higher order modulation than for binary coding. Some improvement is expected by spectral shaping.
7) We have used the term effective block length to indicate that it is not necessarily a block code that is applied, but for trellis codes the "block length" is not easily defined.

Figure 9 Comparison of coding techniques (BER from 10^-1 down to 10^-7 versus Eb/N0 from 0 to 14 dB; curves for the capacity limit, uncoded transmission, the sequential limit, convolutional coding, concatenated coding and the DVB-RCS Turbo code)


Therefore, in order to connect two distant locations, we need several radio hops. Thus, to avoid a much higher delay than what is present for the fibre system, we also need to limit the processing delay for the radio transmitter and receiver. This puts strong requirements on the allowed effective block length7) of the coding scheme.

Fixed radio systems are normally connected to the backbone network and must therefore at least offer STM-1 rates (155.52 Mbit/s). In most European countries these links must follow a frequency plan with a channel spacing of 28 MHz, 40 MHz or 56 MHz. In Table 5 we give the necessary spectral efficiencies in these cases. It would be beneficial to offer two STM-1 carriers in one channel or even to go to the higher STM-4 rate (622.08 Mbit/s), and to offer as high capacity and spectral efficiency as possible. However, implementation issues limit the possible spectral efficiency upward to around 8 bit/s/Hz.

Table 6 summarises the minimum code rates that can be applied in order to achieve a given spectral efficiency as presented in Table 5 for a given modulation method. We see that if a code rate close to 1/2 is to be used, we need a very large modulation alphabet to achieve sufficiently high spectral efficiency. Such large constellations will have other impacts on the system, like problems with synchronisation and non-linearity effects in the power amplifier and RF front end. Although we may obtain a spectral efficiency of 6.5 with a code rate 1/2 using 8192-QAM, such a radio link will be extremely expensive or difficult to implement with existing technology.

Table 5 Necessary spectral efficiency for STM-1 in different channels

Channel spacing | Practical symbol rate (Mbaud) | Spectral efficiency | Minimum modulation alphabet (QAM)
28 MHz | 24 | 6.5 | 128-QAM
40 MHz | 33 | 4.7 | 32-QAM
56 MHz | 47 | 3.3 | 16-QAM

Table 6 Necessary minimum code rate for a given spectral efficiency and modulation type

Spectral efficiency | Modulation alphabet | Minimum code rate
3.3 | 16-QAM | 0.83
3.3 | 32-QAM | 0.66
3.3 | 64-QAM | 0.55
4.7 | 32-QAM | 0.94
4.7 | 64-QAM | 0.78
4.7 | 256-QAM | 0.58
6.5 | 64-QAM | Not possible
6.5 | 128-QAM | 0.93
6.5 | 256-QAM | 0.81
6.5 | 1024-QAM | 0.65

Figure 10 Trellis Coded Modulation (TCM) and mapping of bits to symbols (of the n input bits, bits X1 … Xm enter a rate m/(m+1) convolutional encoder whose output bits Z0 … Zm select the subset, while the remaining n – m bits Xm+1 … Xn select the signal ai from the subset)

Table 7 Performance and parameters for some specific Nera TCM schemes implemented in radio relay systems (XD means X-dimensional)

Scheme | Overall code rate | Spectral efficiency | Measured Eb/N0 at BER = 10^-6 | Loss compared to unconstrained capacity limit
2D TCM, 32-QAM | 0.80 | 4 | 11.7 dB | 6.0 dB
2D TCM, 64-QAM | 0.83 | 5 | 14.0 dB | 6.1 dB 8)
4D TCM, 128-QAM | 0.93 | 6.5 | 15.7 dB | 4.3 dB

8) Compared against the constrained input limit the loss is 4.8 dB.


3.4.2 Trellis Coded Modulation

Since Ungerboeck published the idea of trellis coded modulation (TCM) [28], combining the coding and modulation operations, TCM has become the obvious solution for spectrally efficient systems like radio relay systems. The codes normally used with TCM are rather short, and thus the delay does not lead to any serious problems. However, the mapping to the constellation points is a critical operation in order to achieve good performance. Figure 10 shows the basic idea of trellis coded modulation, including the mapping operation to constellation symbols. Table 7 presents some performance results for TCM schemes used with radio links.

3.5 Promising New Coding Schemes

The coding schemes described so far have been applied for some time and are in practical use in various systems or prototypes. In this section we describe coding schemes that have so far mostly been subject to research, but are promising candidates with potential for future applications. The first is a serial concatenation of convolutional codes with iterative decoding. Then we look at a class of Turbo codes that are based on product codes (see for instance [29]), which are decoded iteratively. These codes have therefore been named Turbo Product Codes. The third method is a class of codes first proposed by Gallager as early as 1963 [30], known as Low Density Parity Check (LDPC) codes.

3.5.1 Serial Concatenated Convolutional Codes

Serial Concatenated Convolutional Codes (SCCC) were introduced by Benedetto et al. as an alternative to Turbo Codes based on serial concatenation [31]. Basically, the algorithms and much of the theory used for PCC can be applied for these codes. Note, however, that the interleaver and component codes are different. According to [31] the outer code should be a maximum free distance, non-systematic convolutional code (NSC), while the inner code should be recursive. However, a somewhat different design, proposed by ten Brink [32], is to use a simple repetition code (R = 1/2) as the outer code and a rate 1 recursive convolutional code as the inner code.

A switch is used to replace some of the parity bits with systematic bits at a very low rate (1:100). This doping is necessary to "kick-start" the iterative decoder. The code has very low complexity, but still gets within 0.1 dB of the Shannon limit at a BER of 10^-5 for very long block lengths.

Bit error rate (BER) performance results have shown these codes to be asymptotically better than PCC. There are few or no basic patents, due to the many publications, especially by Benedetto. This scheme has slower convergence compared to PCC, which might stem from the fact that the inner and outer decoders work at different signal to noise ratios. Good high code rate performance seems harder to achieve. It is unclear whether this is a fundamental problem or just a lack of good codes.

3.5.2 Turbo Product Codes

The idea of Turbo Product Codes (TPC) is to apply iterative (or Turbo) decoding to the decoding of two concatenated block codes, known as product codes (see for instance [29]). In Figure 12 we show the fundamental idea of product codes. A number of code words are generated using the first component code (the row vectors), and then encoding is performed "vertically" by the second component code. Observe that we also encode the parity symbols, creating parity on the parity. The two codes that are applied may be different or the same. Typically BCH codes, Hamming codes and/or parity check codes are used. We may also apply a third code, requiring a cube for illustration. To be able to perform iterative decoding, we need soft decisions both in and out of the individual decoding processes. Details of this decoder can be found in the paper [33] by Pyndiah et al., who first published the idea of Turbo Product Codes. These codes have since gained much interest due to some attractive properties. Block codes typically perform better than convolutional codes for high code rates. Turbo Product Codes therefore seem to perform better than the convolutional code based Turbo codes for high code rates. Therefore, we consider Turbo Product Codes to be a strong candidate for replacing TCM in the high spectral efficiency radio systems described above.

Figure 11 Serial Concatenated Convolutional code, ten Brink (an outer repetition code and a pseudo-random interleaver feed a rate 1 recursive inner encoder; a switch provides systematic doping of the parity bits)

9) This feature is sometimes erroneously referred to as an error floor. We prefer to call it a levelling out feature, since it is certainly not a floor, and the decrease in error rate is still better than for the uncoded system.


Also, the levelling out effect9) of the bit error rate curve that has been observed for other Turbo codes seems less dramatic for the product codes than for the convolutional codes. There are indications that the levelling out effect can be made negligible for many applications by some smart modifications [34]. For the PCC codes this effect is to some extent controlled or reduced by careful interleaver design, improving the minimum distance of the code.

As described, product codes consist of two or more concatenated block codes as component codes. Therefore, the Turbo Product Codes have the same restriction on the relation between code rate and block length as their component codes. Thus, we cannot get any arbitrary code rate matching any desired block length. The total block length of the product codes is the product of the block lengths of the component codes. This means also that the code rate is given by

R = (k1 ⋅ k2) / (n1 ⋅ n2),

where ki is the number of information bits encoded into a block with code i, and ni is the total number of encoded bits in code i. This lack of flexibility for TPC may be a rather undesired feature for many applications.

Figure 12 The principle of product codes: parity symbols are calculated for a data vector, and then vertically across the previously calculated code words with the second code (an array with data symbols d11 … d35, row parities P16, P17 etc., column parities P41 … P45, and parity on parity P46, P47)
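A small sketch of the encoding in Figure 12 (our own illustration; the single parity check component codes and all names are chosen for brevity and are not from [33]):

def product_encode(data_rows, encode_row, encode_col):
    """Encode every row with the first component code, then every
    column of the result (including the parity columns) with the
    second component code, creating parity on parity."""
    rows = [encode_row(list(r)) for r in data_rows]
    full_cols = [encode_col(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*full_cols)]    # back to row order

spc = lambda bits: bits + [sum(bits) % 2]        # single parity check code
array = product_encode([[1, 0, 1], [0, 1, 1]], spc, spc)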

The authors are not aware of any commercial radio systems in operation today using Turbo Product Codes. However, these codes were seriously considered for several satellite systems that were standardised or planned recently. TPC is also an option in the IEEE 802.16 standard for wireless metropolitan area networks that was ratified in December 2001. Since these codes now start to appear in standards, there are also chip manufacturers that have TPC ASICs under development.

Figure 13 shows the performance of TPC with 64-QAM modulation plotted together with 64-TCM. The spectral efficiency of the 64-TCM scheme is 5, while the spectral efficiency of the 64-QAM modulated TPC is 4.7 (the code rate used is R = 0.779, and two BCH (64,57) component codes are used). Observe that the spectral efficiency is slightly lower for the TPC scheme, but it also significantly outperforms the TCM scheme. Because of the length of the TPC code the delay is also somewhat higher than for TCM.

3.5.3 Low Density Parity Check Codes

Low Density Parity Check (LDPC) codes are block codes that were first presented by Gallager [30], and are also referred to as Gallager codes. These codes were "forgotten" for many years until they were rediscovered by MacKay [35]. LDPC codes probably hold the world record on coding gain, with a performance only 0.005 dB from the Shannon limit [36] (using extremely long code words). These codes have a very sparse parity check matrix, H. A sparse parity matrix means that the number of ones is very low compared to the number of zeros. This is why the codes are referred to as low density parity check (LDPC) codes. The parity check matrix is constructed in a semi-random manner, placing t ones in each column. Research results [35] indicate that it is advantageous to have an overlap of only one (overlap constraint) between the ones of the columns10). In addition we try to achieve as uniform a row weight as possible. A parity check matrix constructed as described above is then transformed into systematic form, and the generator matrix G is easily obtained and used for encoding. For a given G the encoding process is exactly as for any block code.
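A minimal sketch of such a semi-random construction (our own illustration: it balances row weights but does not enforce the overlap constraint of [35]):

import random

def random_regular_H(n, m, t, seed=0):
    """Build an m x n parity check matrix with t ones per column and
    as uniform a row weight as possible."""
    rng = random.Random(seed)
    H = [[0] * n for _ in range(m)]
    row_weight = [0] * m
    for col in range(n):
        # put the t ones of this column in the currently lightest rows
        order = sorted(range(m), key=lambda r: (row_weight[r], rng.random()))
        for r in order[:t]:
            H[r][col] = 1
            row_weight[r] += 1
    return H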


10) This is to avoid short cycles in the graph, which are bad for the decoder performance.

Figure 13 Turbo Product Codes with 64QAM compared to 64TCM (BER from 10^-2 down to 10^-8 versus Eb/N0 from 4 to 18 dB; curves for the capacity limit, TPC with 64-QAM, and 64-TCM)


Gallager codes where the columns of the parity check matrix have the same (uniform) weight are called regular Gallager codes. Similarly, irregular Gallager codes have parity check matrices with non-uniform column weights. There are indications that irregular Gallager codes have better performance than regular Gallager codes [35]. Gallager codes may also be constructed over GF(q). Such non-binary codes may be better than the irregular Gallager codes over GF(2) [35].

Gallager codes are decoded iteratively by a sum-product algorithm. After each iteration the syndrome is calculated, and a new iteration is performed if the syndrome is not zero. When the syndrome is zero, we have found a valid code word. The decoding is then terminated and the information bits are released. If the syndrome is not zero after some predefined maximum number of iterations, the decoding is halted and the information part of the current code word estimate is released. The minimum distance of Gallager codes is high, and, except for very short block lengths, errors are normally only observed when the decoding is terminated because the maximum number of iterations was reached. Thus, Gallager codes have a built-in error detection mechanism that can make them promising candidates for packet data communication with retransmission (hybrid ARQ). Gallager codes need many more iterations in the decoding than the Turbo codes, but the complexity of each iteration is much lower.
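The built-in stopping rule can be sketched as follows (our own illustration; the sum-product iteration itself is assumed to be supplied):

def decode_until_valid(H, llr, iterate, max_iterations=100):
    """Run one decoding iteration at a time and stop as soon as the
    hard decisions satisfy all parity checks (zero syndrome)."""
    hard = [0] * len(llr)
    for _ in range(max_iterations):
        llr = iterate(H, llr)
        hard = [0 if l >= 0 else 1 for l in llr]
        syndrome = (sum(h * b for h, b in zip(row, hard)) % 2 for row in H)
        if not any(syndrome):
            return hard, True             # valid code word found
    return hard, False                    # detected (uncorrected) error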

In contrast to the decoders for many other block codes, the decoder for Gallager codes can easily utilise soft decisions. In the decoder the soft decisions are used to initialise the symbol probabilities. Consequently, there is also limited extra complexity in soft decision decoding of Gallager codes. Figure 14 shows the performance of some LDPC codes with MPEG and ATM packet sizes. The performance is comparable to or better than the performance of the best Turbo codes.

4 Comparison and Discussion

As a comparison of the discussed coding schemes, we have plotted the spectral efficiency of a number of coding techniques as a function of the signal-to-noise ratio (Eb/N0). In Figure 15 we present the spectral efficiency for ATM sized (i.e. 53 bytes or 424 information bits) packets as a function of the required Eb/N0 to achieve a packet error rate equal to 10^-5, where Eb is the bit energy and N0 the noise spectral density. Note that for the LDPC codes we have used bit error rate instead of packet error rate, and therefore the results are slightly too good compared to the other schemes. However, due to the steep curves the SNR difference resulting from this is modest.

Figure 14 Performance of LDPC codes with various packet sizes (MPEG and ATM packet sizes) and various code rates (bit error rate from 10^0 down to 10^-6 versus Eb/N0 from 0 to 7 dB). Results are shown for regular LDPC codes, with t denoting the column weight of the parity check matrix; the curves are for 100 iterations with (N = 848, R = 1/2, t = 3), (N = 530, R = 4/5, t = 5), (N = 3008, R = 1/2, t = 3), (N = 1880, R = 4/5, t = 5) and (N = 477, R = 8/9, t = 5)

Figure 15 Spectral efficiency of various error control schemes for ATM sized packets (spectral efficiency from 0 to 1.0 versus required Eb/N0 from –1 to 7 dB; curves for capacity, concatenated coding, PCC, TPC and LDPC). We have applied a packet error rate of 10^-5, except for the LDPC codes, where a bit error rate of 10^-5 is applied. Modulation is BPSK


We see that the iterative (or Turbo based) schemes perform significantly better than the concatenated schemes. In Figure 16 we have plotted the corresponding results for MPEG packets (i.e. 188 bytes or 1504 information bits). The concatenated coding scheme does not apply soft decisions when decoding the Reed-Solomon code, since this would be rather complex. Soft decisions explain a major part of the performance difference.

Figure 16 Spectral efficiency of various error control schemes for MPEG sized packets (spectral efficiency from 0 to 1.0 versus required Eb/N0 from –1 to 7 dB; curves for capacity, concatenated coding, PCC, TPC and LDPC). We have applied a packet error rate of 10^-5, except for the LDPC codes, where a bit error rate of 10^-5 is applied. Modulation is BPSK

The coding schemes presented also have different degrees of flexibility. The PCC Turbo code can accommodate any block length by changing the interleaver size. We can change the code rate by altering the puncturing pattern. Thus, code rate and block length can be modified independently of each other. This gives great flexibility in the system design. For the Turbo Product Codes the code rates and block lengths are directly given by the code rate and block length of the component codes. Shortening of the code is of course possible, but still we do not have the same flexibility as for the PCC Turbo codes. Note the relatively poor performance of the R = 0.6 TPC code for MPEG packets. This comes from the limited flexibility of the TPC code that is applied; with additional component codes one could achieve improved performance. The LDPC codes are somewhere between these two other Turbo codes in flexibility. A code with any code rate and block length can in principle be designed, but distinct encoders and decoders must be implemented. There are indications of significant loss when puncturing the LDPC codes [37], so changing the code rate by puncturing is not recommended.

References

1 Sheriff, R E, Hu, Y F. Mobile satellite communication networks. West Sussex, John Wiley, 2001.

2 Shannon, C E. Communication in the Presence of Noise. In: Proceedings of the IRE, 37, 10–21, January 1949. (Reprint available in Proceedings of the IEEE, 86 (2), 447–457, 1998.)

3 Blahut, R E. Principles and Practice of Information Theory. Addison-Wesley, 1987.

4 Proakis, J G. Digital Communications. 2nd Edition. New York, McGraw-Hill, 1989.

5 Dolinar, S, Divsalar, D, Pollara, F. Code Performance as a Function of Block Size. Pasadena, California, Jet Propulsion Laboratory, California Institute of Technology, 1998. (TMO Progress Report 42-133.)

6 Viterbi, A J. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, IT-13 (2), 260–269, 1967.

7 Frenger, P et al. Rate compatible convolutional codes for multirate DS-CDMA systems. IEEE Transactions on Communications, 47 (6), 828–836, 1999.

8 Frenger, P et al. Multirate convolutional codes. Gothenburg, Sweden, Chalmers University of Technology, Dept. of Signals and Systems, Communication Systems Group, 1998. (Tech. Rep. 21.)

9 Frenger, P, Orten, P, Ottosson, T. Convolutional codes with optimum distance spectrum. IEEE Communication Letters, 3 (11), 317–319, 1999.

10 Viterbi, A J. Principles of digital communication and coding. New York, McGraw-Hill, 1979.

11 Wicker, S B. Error control systems for digital communication and storage. New Jersey, Prentice Hall, 1995.

12 Johannesson, R. Some long rate one-half binary convolutional codes with an optimum distance profile. IEEE Transactions on Information Theory, IT-22 (5), 629–631, 1976.

13 Orten, P, Svensson, A. Sequential decoding of convolutional codes for Rayleigh fading channels. Wireless Personal Communications, 20 (1), 61–74, 2002. Kluwer Academic Publishers.

Figure 16 Spectral efficiency of various error control schemes for MPEG sized packets. We have applied a packet error rate of 10^-5, except for LDPC codes, where a bit error rate equal to 10^-5 is applied. Modulation is BPSK. (Axes: spectral efficiency versus required Eb/N0 [dB]. Curves: Capacity, Concatenated, PCC, TPC, LDPC.)


14 Costello, D J et al. Applications of Error-Control Coding. IEEE Transactions on Information Theory, 44 (6), 2531–2560, 1998.

15 ETSI. Standard Digital Video Broadcasting (DVB); Interaction channel for Satellite Distribution Systems. Sophia Antipolis, 2000. (EN 301 790 V.1.2.2, 2000-12.)

16 Berrou, C, Glavieux, A, Thitimajshima, P. Near Shannon Limit error correcting coding and decoding: Turbo-Codes (1). In: Proceedings of IEEE International Conference on Communications, Geneva, Switzerland, May 1993, 1064–1070.

17 Crozier, S et al. Performance of Turbo Codes with Relative Prime and Golden Interleaving Strategies. Sixth International Conference on Mobile Satellite Communication (IMSC '99), Ottawa, Canada, June 1999. (www.cra.ca/fec)

18 Pyndiah, R et al. Near optimum decoding of product codes. In: Proceedings of IEEE Global Telecommunications Conference (GLOBECOM '94), San Francisco, Nov–Dec 1994, 1/3, 339–343.

19 Acikel, O F, Ryan, W E. Punctured turbo-codes for BPSK/QPSK channels. IEEE Transactions on Communications, COM-47 (9), 1315–1323, 1999.

20 Bahl, L R et al. Optimal decoding of linear codes for minimizing symbol error rate. IEEE Transactions on Information Theory, IT-20, 284–287, 1974.

21 Hagenauer, J, Offer, E, Papke, L. Iterative decoding of binary block and convolutional codes. IEEE Trans. on Inform. Theory, 42 (2), 429–445, 1996.

22 Pietrobon, S, Barbulescu, A S. A simplification of the modified Bahl decoding algorithm for systematic convolutional codes. Int. Symposium on Information Theory & its Applications, Sydney, Australia, Nov. 1994, 1073–1077.

23 Risløw, B et al. Implementation of a 64 kbit/s Satellite Modem Using Turbo Codes and 16-QAM. In: Proceedings of Norwegian Signal Processing Symposium, Tromsø, May 23–24, 1997.

24 SmallWorld. (May 7, 2002) [online] – URL: http://www.sworld.com.au/

25 TurboConcept. (May 7, 2002) [online] – URL: http://www.turboconcept.com/

26 ETSI. Digital Video Broadcasting (DVB); Interaction channel for Satellite Distribution Systems, Guideline of use of EN 301 790. Sophia Antipolis, 2001. (ETSI Technical Report TR 101 790, 2001-06.)

27 Forney, G D Jr, Ungerboeck, G. Modulation and Coding for Linear Gaussian Channels. IEEE Transactions on Information Theory, 44 (6), 2384–2415, 1998.

28 Ungerboeck, G. Trellis-Coded Modulation with redundant signal sets. Part 1: Introduction. IEEE Communications Magazine, 25 (2), 5–11, 1987.

29 Wilson, S. Digital Modulation and Coding. New Jersey, Prentice Hall, 1996.

30 Gallager, R G. Low density parity-check codes. Cambridge, MA, MIT Press, 1963. (Research Monograph series no 21.)

31 Benedetto, S et al. Serial Concatenation of Interleaved Codes: Performance Analysis, Design and Iterative Decoding. IEEE Transactions on Information Theory, 44 (3), 909–926, 1998.

32 ten Brink, S. Rate one-half code for approaching the Shannon limit by 0.1 dB. IEE Electronics Letters, 36 (1), 1293–1294, 2000.

33 Pyndiah, R. Near optimum decoding of product codes: Block Turbo Codes. IEEE Transactions on Communications, 46 (8), 1003–1010, 1998.

34 Advanced Hardware Architectures. (May 7, 2002) [online] – URL: http://www.aha.com/

35 MacKay, D J. Good error-correcting codes based on very sparse matrices. IEEE Transactions on Information Theory, 45 (2), 399–431, 1999.

36 Chung, S et al. On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit. IEEE Communications Letters, 5 (2), 58–60, 2001.

37 Orten, P. Low density parity check codes for wireless communications. Billingstad, Nera Research, 2000. (Technical Report.)


I. Introduction

In his study of broadcast channels, Cover [1] showed about three decades ago that one strategy to guarantee basic communication in all conditions is to divide the broadcast messages into two or more classes and to give every class a different degree of protection according to its importance. The goal is that the most important information (known as basic or coarse data) must be recovered by all receivers, while the less important information (known as refinement, detail, or enhancement data) can only be recovered by the "fortunate" receivers which benefit either from better propagation conditions (e.g. closer to the transmitter and/or with a direct line-of-sight path) or from better RF devices (e.g. lower noise amplifiers or higher antenna gains). Motivated by this information-theoretic study, many researchers have since shown that one practical way of achieving this goal relies on the idea of hierarchical constellations (known also as embedded, multi-resolution, or asymmetric constellations), which consist of non-uniformly spaced signal points [2], [3], [4]. This concept was studied further in the early nineties for digital video broadcasting systems [3], [5] and has recently gained new relevance with

• the demand to support multimedia services by simultaneous transmission of different types of traffic, each with its own quality requirement [6], [7], [8]; and

• a possible application in the DVB-T standard [9], in which hierarchical modulations can be used on OFDM subcarriers.

The evaluation of the bit error rate (BER) of classical uniform M-ary pulse amplitude modulation (PAM), M-ary quadrature amplitude modulation (QAM), and M-ary phase-shift-keyed (PSK) constellations has long been of interest. For example, exact expressions for the BER of 16-QAM and 64-QAM were derived in [10]. Later on, generic (in M) but approximate BER expressions for uniform M-QAM were developed in [11] and [12]. More recently, Yoon et al. [13] obtained an explicit and generic (in M) closed-form expression for the BER of uniform square QAM. We extended these results to hierarchical square and non-square 4/M-QAM constellations (see [14]). For this particular family of hierarchical constellations, for every channel access, two bits are assigned to the basic information and (log2 M – 2) bits are assigned to the refinement information. However, for more general hierarchical PAM, QAM or PSK constellations, the derivation of exact and generic closed-form expressions becomes more tedious. In this paper, we take a new look at this problem and present an alternative recursive approach for the exact BER computation of generalized hierarchical M-PAM, M-QAM, and M-PSK constellations over additive white Gaussian noise (AWGN) channels. For example, we consider the 2/4-PAM, wherein a BPSK constellation is embedded into a 4-PAM constellation (see Figure 1), and the 2/4/8-PAM constellation, wherein a BPSK constellation is embedded into a 4-PAM constellation, which is in turn embedded into an 8-PAM constellation (see Figure 2). For these hierarchical constellations one information bit is sent for each of the two or three different levels

A New Look at the Exact BER Evaluation of PAM, QAM and PSK Constellations*)

PAVAN K. VITTHALADEVUNI AND MOHAMED-SLIM ALOUINI

Mohamed-Slim Alouini (33) received his Diplôme d'Ingénieur degree from TELECOM Paris and his Diplôme d'Etudes Approfondies in Electronics from the Univ. of Pierre & Marie Curie, Paris, both in 1993. He received his M.S.E.E. from Georgia Tech, Atlanta, in 1995, and his PhD in electrical engineering from Caltech, Pasadena, in 1998. He joined the Department of Electrical and Computer Engineering of the Univ. of Minnesota in 1998, where his current research interests include statistical modeling of multipath fading channels, adaptive and hierarchical modulation techniques, diversity systems, interference mitigation techniques, and digital communication over fading channels.

[email protected]

Pavan K. Vitthaladevuni (23) received his B.Tech. degree in Electrical Engineering from the Indian Institute of Technology (IIT), Madras, India in 1999 and his MSEE degree from the University of Minnesota, Twin Cities in 2001. He is currently a graduate research assistant in the Department of Electrical and Computer Engineering at the University of Minnesota, Twin Cities and is working towards his PhD. His research interests span digital communications over wireless channels and information theory.

[email protected]

*) This work is supported in part by the National Science Foundation (NSF) grant CCR-9983462 and in part by the Center for Transportation Studies (CTS) through the Intelligent Transport Systems (ITS) Institute, Minneapolis, Minnesota.

Hierarchical constellations offer a different degree of protection to the transmitted messages according to their relative importance. As such, they have found interesting applications in digital video broadcasting systems as well as wireless multimedia services. Although a great deal of attention has been devoted in the recent literature to explicit closed-form expressions for the bit error rate (BER) performance of standard pulse amplitude modulation (PAM), quadrature amplitude modulation (QAM), and phase-shift-keying (PSK) constellations, very few results are known on the BER performance of hierarchical constellations. In this paper, we argue that a recursive way of Gray coding a constellation ensures the existence of a recursive algorithm for the exact and generic (in the constellation size) BER computation of generalized hierarchical PAM, QAM, and PSK constellations over additive white Gaussian noise (AWGN) channels. This new approach also provides, as a byproduct, an alternative unified way for the BER evaluation of the well-known standard PAM, QAM, and PSK constellations. Because of its generic nature, this new approach readily allows numerical evaluation for various cases of practical interest.


of protection required, respectively. Apart from solving for the exact BER of these generalized hierarchical constellations, this new approach provides as a byproduct an alternative unified way for the BER evaluation of the well-known standard uniform constellations.

The remainder of this paper is organized as follows. The subsequent section presents some design issues as well as the model and parameters of generalized hierarchical constellations. While Section III describes the recursive algorithm for the exact BER computation of the constellations under consideration, Section IV shows how these general results can be exploited to obtain the BER of standard uniform constellations. Section V illustrates the mathematical formalism by some numerical examples. Next, Section VI outlines some potential applications of these new results. Finally, Section VII concludes the paper with a summary of its main points.

II. System Model and Parameters

II.A General Setup

II.A.1 PAM
Consider a situation wherein we want to achieve unequal error protection for the various bits in a PAM symbol. Given that the information rides on the amplitude of the symbol, we need to construct a specific PAM constellation to achieve the stated goal of unequal error protection. We consider a generalized 2/4/⋅⋅⋅/M-PAM constellation with Karnaugh map style Gray mapping (see Figure 2 for the generalized 8-PAM example). In the case of 8-PAM, we assume that there are 3 streams of data, each of which has a priority (or equivalently, a target BER). We take one bit from each stream to form a symbol of 3 bits. The bit from the bit stream with the highest priority (lowest allowed BER) is assigned to the

most significant position and, as such, is referred to as the most significant bit (MSB). The bit from the bit stream with the second highest priority is assigned the next most significant position, and the bit with the least priority is assigned the least significant position and is referred to as the least significant bit (LSB). In general, if we were to have m bit streams of data (referred to as sub-channels, i1 through im, hereafter) with their respective priorities, we follow a similar procedure in assigning the bits to positions in the m-bit symbol. We would then have M = 2^m symbols, and these can be visualized as points in an M-PAM constellation. In addition to arranging the bits in a symbol with respect to (w.r.t.) their priorities, we need to place these symbols in a hierarchical way in the constellation to achieve unequal error protection. In Figure 2, the fictitious BPSK constellation denoted by the large grey circle represents the highest priority (hereafter referred to as the first level of hierarchy or sub-channel i1). The distance d1 refers to the highest priority. The distance d2 represents the second priority (second level of hierarchy or sub-channel i2). Finally, the distance d3 represents the third priority (third level of hierarchy or sub-channel i3). In other words, the constellation (A through H) can be visualized as a BPSK (S1 and S2) embedded into a 4-PAM (T1 through T4), which is further embedded into an 8-PAM (A through H). Gray coding is also done hierarchically. The fictitious BPSK is coded using 1

Figure 1 Generalized hierarchical 2/4-PAM constellation

Figure 2 Generalized hierarchical 2/4/8-PAM constellations


bit. As shown in Figure 2, S1 is coded as 1. In the next level of hierarchy, the fictitious T1 and T2 are coded as 10 and 11, respectively. Note that we are tagging a 0 or 1 to the right of the Gray code of S1. A and B are then coded as 100 and 101, respectively, by tagging a 0 and 1 to the right of the Gray code for T1. We remind the reader that A through H are the symbols that are transmitted. It helps to visualize the 8-PAM constellation as one that evolves from a 4-PAM, which in turn evolves from a BPSK. This is why we will refer to this 8-PAM constellation as a 2/4/8-PAM. This procedure of Gray coding ensures the greatest protection to the MSB at the cost of the LSB.

In general, given any M-PAM (M = 2^m), this process can be automated as follows. Label the constellation points from left to right (or vice versa) with integers from 0 to M – 1. Then, convert the integer labels to their binary form (for example, if we are dealing with a 16-PAM (16 = 2^4), the binary representation of 4 would be 0100). For the k-th symbol, let the binary equivalent be b1,k b2,k ⋅⋅⋅ bm,k. Then, the corresponding Gray code (g1,k g2,k ⋅⋅⋅ gm,k) of the symbol with integer label (b1,k b2,k ⋅⋅⋅ bm,k) (k = 0, 1, 2, ⋅⋅⋅, M – 1) is given by

g1,k = b1,k,
gi,k = bi,k ⊕ bi−1,k,  i = 2, 3, ⋅⋅⋅, m,    (1)

where ⊕ represents modulo-2 addition. For reader convenience, MATLAB programs generating Gray codes are available at [15].
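As an illustration (our own sketch, not the programs of [15]), Eq. (1) can be implemented in a few lines of MATLAB; gray_codes(k+1,:) is then the Gray label of the k-th constellation point, counted from the left:

m = 3;                                   % bits per symbol, M = 2^m points
M = 2^m;
gray_codes = zeros(M, m);
for k = 0:M-1
    b = bitget(k, m:-1:1);               % binary label b1,k ... bm,k, MSB first
    g = b;
    g(2:m) = xor(b(2:m), b(1:m-1));      % Eq. (1): gi,k = bi,k XOR bi-1,k
    gray_codes(k+1, :) = g;              % e.g. k = 2 gives 011 for m = 3
end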

II.A.2 QAM
A generalized hierarchical square M-QAM constellation (M = 2^(2m)) can be modeled as follows. We assume that there are m bit streams of data {i_k}, k = 1, ⋅⋅⋅, (1/2)log2 M. Each one of these incoming streams carries information of a particular priority. For every channel access, 2 bits are chosen from each level of priority. The 2 highest priority bits are assigned the MSB positions in the in-phase (I) and the quadrature phase (Q), respectively. Bits with lower priorities are assigned the subsequent positions of lower significance. For instance, the 2 bits with the second highest priority are assigned the second most significant positions in the I-phase and Q-phase, and so on, until the 2 least priority bits are assigned the LSB positions in the I- and Q-phase. This can be viewed as a 4/16/64/⋅⋅⋅/M-QAM constellation.

Figure 3 Generalized hierarchical 4/16-QAM constellations (the symbols carry four-bit Gray labels obtained by interleaving the two-bit I-channel and Q-channel labels; d1 and d2 are the first- and second-level distances)


The Gray codes for the symbols are given by g^i_1 g^q_1 g^i_2 g^q_2 ⋅⋅⋅ g^i_m g^q_m, where the superscripts "i" and "q" refer to the in-phase and quadrature phase respectively. In other words, the Gray code for a symbol is obtained by interleaving the Gray codes of the symbol's positions in the I-channel and Q-channel √M-PAMs respectively. For instance, in the 4/16-QAM constellation shown in Figure 3, the symbol S1 has the I-channel Gray code 00 (rightmost) and the Q-channel Gray code 11 (third from the top). So, the Gray code of S1 is 0101.

II.A.3 PSK
The system model is similar to that of an M-PAM constellation, the only difference being that the information rides on the phase of the symbol rather than the amplitude. In addition to arranging the bits in a symbol w.r.t. their priorities, we need to place these symbols in a hierarchical way in the constellation to achieve unequal error protection. In Figure 4, the fictitious BPSK constellation denoted by the * symbol represents the highest priority (hereafter referred to as the first level of hierarchy or sub-channel i1). The angle θ2 represents the second priority (second level of hierarchy or sub-channel i2). Finally, the angle θ3 represents the third priority (third level of hierarchy or sub-channel i3). In other words, the constellation (A through H) can be visualized as a BPSK (S1 and S2) embedded into a QPSK (T1 through T4), which is further embedded into an 8-PSK (A through H). Gray coding is also done hierarchically. The same equations (1) can be used to Gray code a generalized hierarchical M-PSK constellation. For reader convenience, MATLAB programs automating the Gray code generation are available at [16].

II.B System Parameters

II.B.1 PAM

Distances
As we have described in Section II-A.1, the distances we use evolve in a hierarchy. To simplify the notation in our proposed algorithm, we define the distance vector as d = [d1 d2 ⋅⋅⋅ dm] and the priority vector p as

p = [p1 p2 ⋅⋅⋅ pm−1 pm] = [d1/dm  d2/dm  ⋅⋅⋅  dm−1/dm  1].    (2)

This vector controls the relative message priorities: the larger the ratio pi/pi+1, the greater the protection for the bit in position i compared with the bit in position (i + 1).

Energies
The average symbol energy for the generalized M-PAM constellation can be computed as follows. If we denote every point in the constellation by its distance from the origin, then the coordinate will be of the form p aᵀ, where a is a coordinate row vector (with elements ±dm). For example, the symbol S1 in Figure 2 has vector a = [–1 +1 –1]d3. The energy of a point in the constellation with coordinate x can be written x². By construction, we observe that for every point with coordinate (d1 + x) there is a point with coordinate (d1 – x). Also, for every point with coordinate (d1 + d2 + x) there exists a point with coordinate (d1 + d2 – x), and this holds through the hierarchy from level 1 up to level m. Hence, because of this symmetry, the average energy Es can be written as

Es = d1² + d2² + ⋅⋅⋅ + dm² = p pᵀ dm².    (3)

II.B.2 QAM

Distances
As has been mentioned previously, square M-QAM constellations can be viewed as two √M-PAMs in quadrature. Therefore, to describe them, we need two distance vectors, dⁱ and d^q, where the superscripts "i" and "q" again refer to the I-phase and the Q-phase respectively.

Figure 4 Generalized hierarchical 2/4/8-PSK constellations (symbols A through H with hierarchical Gray labels; θ1 = π/2 is the reference angle of the fictitious BPSK (S1, S2), θ2 defines the embedded QPSK (T1 through T4), θ3 the 8-PSK, and r is the symbol amplitude)



In the case of square QAM, these two vectors are identically equal and are given by

dⁱ = d^q = [d1 d2 ⋅⋅⋅ dm].    (4)

Energies
Just as in the case of PAM, we can show for QAM that the average energy is given by

Es = 2(d1² + d2² + ⋅⋅⋅ + dm²) = 2 p pᵀ dm².    (5)

II.B.3 PSK
As we have described in Section II-A.3, a 2/4/⋅⋅⋅/M-PSK (M = 2^m) constellation can be described through a set of angles. To simplify the notation, we define an angle vector as

θ = [θ1 θ2 ⋅⋅⋅ θm]    (6)

and the priority vector p as

p = [p1 p2 ⋅⋅⋅ pm−1 pm] = [θ1/θm  θ2/θm  ⋅⋅⋅  θm−1/θm  1],    (7)

with θ1 = π/2. Note that these definitions are along the same lines as for PAMs. As in the case of PAMs, the p vector controls the relative message priorities: the larger the ratio pi/pi+1, the greater the protection for the bit in position i compared with the bit in position (i + 1). Another important system parameter is the symbol energy of the generalized 2/4/⋅⋅⋅/M-PSK constellation, which is given by Es = r², where r is the symbol amplitude.

II.B.4 Demodulator/Detector

PAM
The demodulator is based on the Maximum Likelihood (ML) rule. The amplitude d̂ of the incoming signal is recovered. This amplitude is then used for decoding the bit streams. We could interpret the decoder in two ways. We could use d̂ to identify the most likely transmitted amplitude d, get the m bits corresponding to this d, and assign them to their respective bit streams (MSB to i1, ⋅⋅⋅, LSB to im). Equivalently, we could use d̂ to decode individual bit streams as shown in Figure 5. We note from Figure 2 that for the 2/4/8-PAM constellation, the MSB is 0 in the right half plane and 1 in the left half plane. So, if the recovered amplitude d̂ is positive, we can directly assign "0" to the MSB. Similarly, the next most significant bit (i2) is 1 if |d̂| ≤ d1 and 0 otherwise. So, if |d̂| ≥ d1, we directly assign "0" to bit i2. More generally, for 2/4/⋅⋅⋅/M-PAM, the demodulator uses the following decision rules:

• For bit i1: If d̂ ≥ 0, i1 = 0; else i1 = 1.
• For bit i2: If |d̂| ≥ d1, i2 = 0; else i2 = 1.
⋅⋅⋅
• For bit im: The decision boundaries are given by vector B, which can be constructed as shown by the MATLAB code in Table I.

This is nothing but ML decoding done on individual bits instead of symbols. Demodulation for QAMs can be done similarly.

PSK
The demodulator is also based on the ML rule. The phase η of the incoming signal is recovered. This phase is then used for decoding the bit streams. We could interpret the decoder in two ways. We could use η to identify the most likely transmitted phase θ, get the m bits corresponding to this θ, and assign them to their respective bit streams (MSB to i1, ⋅⋅⋅, LSB to im). Equivalently, we could use η to decode individual bit streams. We note from Figure 4 that for the 2/4/8-PSK constellation, the MSB is 0 in the upper half plane and 1 in the lower half plane. So if the recovered phase η satisfies 0 < η < π, we can directly assign "0" to the MSB. Similarly, the next most significant bit (i2) is 1 in the left half plane and 0 in the right half plane. So if −π/2 < η < π/2, we directly assign "0" to bit i2. More generally, for 2/4/⋅⋅⋅/M-PSK, the demodulator uses the following decision rules:

• For bit i1: If 0 < η < π, i1 = 0; else i1 = 1.
• For bit i2: If −π/2 < η < π/2, i2 = 0; else i2 = 1.
⋅⋅⋅
• For bit im: The decision boundaries are given by vector B, which can be constructed similarly to what is shown by the MATLAB code in Table I.

Figure 5 Hierarchical 2/4/⋅⋅⋅/M-PAM demodulator (the received signal r(t) is integrated over each symbol interval [nTs, (n+1)Ts], and the recovered amplitude is fed to the per-bit decision rules, producing the bit streams/sub-channels)
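As a compact cross-check (our own sketch, not the programs of [15]), the per-bit decisions above coincide with nearest-symbol ML detection followed by extraction of the Gray bits, since each bit's decision region is a union of symbol regions:

% Nearest-symbol demodulation of a hierarchical 2/4/8-PAM, followed by
% extraction of the Karnaugh-style Gray bits (MSB ... LSB):
d = [4 2 1];                     % distance vector [d1 d2 d3]
b = dec2bin(0:7) - '0';          % integer labels 0..7, MSB first
pos = (2*b - 1) * d.';           % symbol coordinates, increasing left to right
g = [b(:,1), xor(b(:,2:end), b(:,1:end-1))];   % Gray labels, Eq. (1)
dhat = 2.7;                      % example recovered amplitude
[~, idx] = min(abs(pos - dhat)); % ML symbol decision
bits = g(idx, :);                % bits handed to sub-channels i1, i2, i3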



Generation of vector B

DecisionBoundVector = d(1:m-1).'   % column vector of the m-1 outer distances
B = zeros(2^(m-1), 1)
for i = 2^(m-2) : 2^(m-1) - 1
    index = i
    mult = zeros(1, m-1)
    for j = m-1 : -1 : 1
        mult(1, j) = 2*mod(index, 2) - 1
        index = floor(index/2)
    end
    B(i+1, 1) = mult*DecisionBoundVector
end
for i = 2^(m-2) : -1 : 1
    B(i, 1) = -B(2^(m-1) - i + 1, 1)
end

Table I MATLAB pseudo-code for the generation of vector B
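For instance (our own check, not from the paper), running the pseudo-code with m = 3 and d = [4 2 1] gives:

% B = [-6; -2; 2; 6]
% i.e. the LSB decision boundaries lie at -(d1+d2), -(d1-d2), d1-d2 and d1+d2.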

Once again, this is nothing but ML decoding done on individual bits instead of symbols.

III. Exact BER Computation

III.A Exact BER for 2/4/⋅⋅⋅/M-PAM Constellations

III.A.1 2/4-PAM Constellation
Consider the 2/4-PAM constellation shown in Figure 1. The probability of error for bit i1 in symbol 00 is given by (1/2) erfc((d1 + d2)/√N0), where erfc(⋅) is the complementary error function, defined by

erfc(x) = (2/√π) ∫_x^∞ exp(−z²) dz,

and N0/2 is the two-sided power spectral density of the bandpass AWGN. Similarly, the bit error probability for bit i1 in symbol 01 is given by (1/2) erfc((d1 − d2)/√N0). Hence, the average bit error probability for bit i1, Pb(4, (d1, d2), i1), is given by

Pb(4, (d1, d2), i1) = (1/4) erfc((d1 − d2)/√N0) + (1/4) erfc((d1 + d2)/√N0).    (8)

Now, consider the second sub-channel i2. It can be shown that the average bit error probability for the sub-channel i2, Pb(4, (d1, d2), i2), is given by [17]

Pb(4, (d1, d2), i2) = (1/4) [2 erfc(d2/√N0) + erfc((2d1 − d2)/√N0) − erfc((2d1 + d2)/√N0)].    (9)

III.A.2 Generalized 2/4/⋅⋅⋅/M-PAM Constellation
We propose a recursive algorithm for the exact BER computation of generalized 2/4/⋅⋅⋅/M-PAM constellations. We use the generalized 4-PAM constellation as the root of this algorithm, and we have already developed the expressions for the exact BER of bits i1 and i2 in the case of 4-PAM (m = 2). To compute the BER of bit ik, Pb(M, d, ik), (k = 1, 2, ⋅⋅⋅, m), where bit im represents the LSB and m > 2, the pseudo-code of the algorithm was shown to be given by [17]:

• If k < m

Pb(M, d, ik) = (1/2) [Pb(M/2, d+, ik) + Pb(M/2, d−, ik)],    (10)

where

d+ = [d1 d2 ⋅⋅⋅ dm−2  dm−1 + dm]    (11)

and

d− = [d1 d2 ⋅⋅⋅ dm−2  dm−1 − dm]    (12)

are (m − 1) dimensional row vectors (d is an m dimensional row vector). As we proceed through the recursion, the constellation size goes on decreasing until we either reach a stage wherein the vectors d+ and d− are of length 2 (in the case of bits i1 and i2), or bit ik (for k > 2) becomes the LSB. In the former case we use the result from 4-PAM (given in Sect. III-A.1) and come out of the recursion. In the latter case (k = m), we do the following.

• If k = m, or in the event that bit ik becomes the LSB at some stage through the recursion, we need an LSB closed-form expression. It was shown in [17] that the BER of the LSB is given by

Pb(M, d, im) = (1/2^m) (P0 + P1),    (13)

where

P0 = Σ_{i=1}^{2^(m−1)} Σ_{j=1}^{2^(m−1)} (1/2) (−1)^(j+1) erfc[(B(j) − d0(i))/√N0]    (14)

and

P1 = Σ_{i=1}^{2^(m−1)} { 1 + (1/2) Σ_{j=1}^{2^(m−1)} (−1)^j erfc[(B(j) − d1(i))/√N0] },    (15)



where

- B is the vector of decision boundary positions for the LSB, w.r.t. the origin,

- d0 is the vector of positions of the constellation points whose LSB is 0, w.r.t. the origin, and

- d1 is the vector of positions of the constellation points whose LSB is 1, w.r.t. the origin.

The generation of these vectors is shown in Tables I and II. For the reader's convenience, our MATLAB computer programs for PAM (along with a readme file) are available at [15], to allow one to immediately compare the performance of various generalized hierarchical constellations.
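To make the recursion concrete, the following MATLAB sketch (our own illustration under the definitions above, not the programs of [15]) evaluates Pb(M, d, ik) via Eq. (10), with Eqs. (8) and (9) as the 4-PAM base case; the LSB branch of Eqs. (13)-(15) is only stubbed out:

function P = pam_ber(d, k, N0)
% BER of bit ik in a hierarchical 2/4/.../M-PAM with distance vector d
% (M = 2^length(d)), following the recursion of Section III.A.
m = length(d);
if k == m && m > 2
    error('LSB case: use the closed form of Eqs. (13)-(15).');
end
if m == 2                     % 4-PAM root of the recursion
    if k == 1                 % Eq. (8)
        P = 0.25*erfc((d(1)-d(2))/sqrt(N0)) + 0.25*erfc((d(1)+d(2))/sqrt(N0));
    else                      % Eq. (9)
        P = 0.25*(2*erfc(d(2)/sqrt(N0)) + erfc((2*d(1)-d(2))/sqrt(N0)) ...
            - erfc((2*d(1)+d(2))/sqrt(N0)));
    end
else                          % Eqs. (10)-(12): fold the two innermost levels
    dplus  = [d(1:m-2), d(m-1) + d(m)];
    dminus = [d(1:m-2), d(m-1) - d(m)];
    P = 0.5*(pam_ber(dplus, k, N0) + pam_ber(dminus, k, N0));
end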

Generation of vectors d0 and d1

dsym = zeros(2^m, 1)
d0 = zeros(2^(m-1), 1)
d1 = zeros(2^(m-1), 1)
for i = 2^(m-1) : 2^m - 1
    index = i
    mult = zeros(1, m)
    for j = m : -1 : 1
        mult(1, j) = 2*mod(index, 2) - 1
        index = floor(index/2)
    end
    dsym(i+1, 1) = mult*d.'
end
for i = 1 : 2^(m-1)
    dsym(i, 1) = -dsym(2^m - i + 1, 1)
end
d0(1,1) = dsym(1,1);
x = 2; y = 1;
while (x < 2^m - 2)
    d1(y,1)   = dsym(x,1)
    d1(y+1,1) = dsym(x+1,1)
    d0(y+1,1) = dsym(x+2,1)
    d0(y+2,1) = dsym(x+3,1)
    x = x + 4
    y = y + 2
end
d1(y,1)   = dsym(x,1)
d1(y+1,1) = dsym(x+1,1)
d0(y+1,1) = dsym(x+2,1)

Table II MATLAB pseudo-code for the generation of vectors d0 and d1
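As a quick sanity check (our own, not from the paper), m = 3 and d = [4 2 1] yield:

% dsym = [-7 -5 -3 -1 1 3 5 7]
% d0   = [-7; -1; 1; 7]    (coordinates of points whose LSB is 0)
% d1   = [-5; -3; 3; 5]    (coordinates of points whose LSB is 1)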

III.B Extension to Generalized Hierarchical M-QAM Constellations

As mentioned before, QAM constellations can be viewed as two PAMs in quadrature. This fact helps to deduce the BER expressions [17] for QAMs in terms of those of the two PAM constellations. For instance, square QAM constellations use 2 bits for every level of priority (M = 2^(2m)). Figure 6 shows a 4/16/64-QAM constellation.

In the case where every level of priority is not restricted to 2 bits per channel access, many other QAM constellations can arise. The recursive algorithm developed here for the exact BER computation can be readily adapted to treat all these cases. As an example of the applicability of the proposed algorithm, we will consider a family of rectangular M-QAM (i.e. M = 2^(2m+1)) constellations with m + 1 incoming streams of data. For this family of constellations the highest priority level is assigned 1 bit in the in-phase, whereas all the other levels are assigned two bits in a similar fashion as in the square QAM case described above. This can be viewed as a 2/8/32/⋅⋅⋅/M-QAM constellation. As an illustration of this family of constellations, Figure 7 shows a generalized 32-QAM constellation.

III.C Exact BER for 2/4/⋅⋅⋅/M-PSK

III.C.1 2/4-PSK Constellation
Consider the 2/4-PSK constellation shown in Figure 8. For the MSB (bit i1), it can be shown that

Pb(4, (θ1, θ2), i1) = (1/2) erfc(√γ sin θ2),    (16)

where γ = r²/N0 is the symbol carrier-to-noise ratio (CNR). Now, consider the LSB (second sub-channel i2). It can easily be shown that [18]

Pb(4, (θ1, θ2), i2) = (1/2) erfc(√γ cos θ2).    (17)

III.C.2 Generalized 2/4/⋅⋅⋅/M-PSK Constellations
We notice that the decision boundaries for bits i1, i2, ⋅⋅⋅, im in terms of θ1, θ2, ⋅⋅⋅, θm are the same in both 2/4/⋅⋅⋅/M-PSK and 2/4/⋅⋅⋅/M/2M-PSK. So, the protection angles for these bits in the latter case differ from those in the former by ±θm+1. Similar to the PAM case, we propose a recursive algorithm for which we use the generalized 2/4-PSK constellation as a root. We have already developed the expressions for the exact BER of bits i1 and i2 in the case of 2/4-PSK (m = 2). To compute the BER of bits ik, Pb(M, θ, ik), (k = 1, 2, ⋅⋅⋅, m), where bit im represents the LSB and


m > 2, we use a recursive algorithm whose pseudo-code is given as follows:

• If k < m

Pb(M, θ, ik) = (1/2) [Pb(M/2, θ+, ik) + Pb(M/2, θ−, ik)],    (18)

where

θ+ = [θ1 θ2 ⋅⋅⋅ θm−2  θm−1 + θm]    (19)

and

θ− = [θ1 θ2 ⋅⋅⋅ θm−2  θm−1 − θm]    (20)

are (m − 1) dimensional row vectors (note that θ is an m dimensional row vector). As we proceed through the recursion, the constellation size goes on decreasing until we either reach a stage wherein the vectors θ+ and θ− are of length 2 (this is the case with bits i1 and i2) or bit ik (for k > 2) becomes the LSB. In the former case we use the result from 2/4-PSK (given in Section III-C.1) and come out of the recursion. In the latter case, we use the following closed form expression.

• If k = m, or in the event that bit ik becomes the LSB at some stage through the recursion, the BER of the LSB, Pb(M, θ, im), can be written in closed form [18] as

Pb(M, θ, im) = (1/2^m) (P0 + P1),    (21)

where

P0 = Σ_{i=1}^{2^(m−1)} Σ_{j=1}^{2^(m−1)} (−1)^j F(B(j) − φ0(i))    (22)

and

P1 = Σ_{i=1}^{2^(m−1)} Σ_{j=1}^{2^(m−1)} (−1)^(j+1) F(B(j) − φ1(i)),    (23)

where the F-function is defined in [19], [20] as follows:

F(ψ) = −(sgn ψ)/(2π) ∫_0^(π−|ψ|) exp[−γ sin²ψ / sin²θ] dθ,  −π < ψ < π,    (24)

where sgn(⋅) is the sign function. Please refer to [19], [20] for the properties of this function.

In (22) and (23), the vectors B, φ0 and φ1 are defined as follows.


Figure 6 Hierarchical 64-QAM constellation. Gray coding is done hierarchically. For instance, the Gray code for symbol S1 is 001010, while that for symbol T1 is 110000


Figure 7 Hierarchical 32-QAM constellation. The Gray coding is done hierarchically. For instance, the Gray code for symbol S1 is 00100, while that for symbol T1 is 01101


- The elements of B represent the angular positions of the decision boundaries of the LSB, w.r.t. the reference axis.

- The elements of φ0 represent the angular positions of those symbols in the constellation (2^m-PSK) whose LSB is '0', w.r.t. the reference axis.

- The elements of φ1 represent the angular positions of those symbols in the constellation (2^m-PSK) whose LSB is '1', w.r.t. the reference axis.

As an example, see Figure 9 for the generation of the vectors a, φ0 and φ1 in the case of the 2/4/8-PSK constellation. More generally, these vectors can easily be generated for any 2/4/⋅⋅⋅/M-PSK. Note that vector B here is similar to vector B in the case of PAMs, and it can be generated using the same pseudo-code given in Table I, but with vector d replaced by θ. Vectors φ0 and φ1 are similar to the vectors d0 and d1 respectively; they can be generated using the same pseudo-code given in Table II, but with vectors d, d0 and d1 replaced by θ, φ0 and φ1 respectively. For reader convenience, all our MATLAB programs for PSK (along with a readme file) are available at [16].
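For numerical work, the F-function of Eq. (24) is a one-dimensional integral that standard quadrature handles directly; the following MATLAB fragment (our own sketch, not the programs of [16]) evaluates it for a given symbol CNR:

% Numerical evaluation of the F-function in Eq. (24) for -pi < psi < pi:
gamma_cnr = 10;                  % symbol CNR, linear scale
F = @(psi) -sign(psi)/(2*pi) * integral( ...
        @(th) exp(-gamma_cnr * sin(psi).^2 ./ sin(th).^2), ...
        0, pi - abs(psi));
val = F(pi/4);                   % example evaluation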

IV. Special Cases

IV.A Uniform M-PAM
Standard uniform M-PAM constellations can be viewed as a special case of the generalized M-PAMs when p = [2^(m−1) 2^(m−2) ⋅⋅⋅ 4 2 1]. The resulting BER for bit ik, Pb(M, p, ik), obtained by using the recursive algorithm, can be shown to be in agreement with the explicit closed-form expression recently derived in [13] and given by

Pb(M, p, ik) = (1/M) Σ_{j=0}^{(1−2^(−k))M−1} (−1)^⌊j·2^(k−1)/M⌋ (2^(k−1) − ⌊j·2^(k−1)/M + 1/2⌋) erfc[(2j + 1) √(3γ log2 M / (M² − 1))],    (25)

where ⌊⋅⌋ is the floor function. The average BER Pb(M, p) is given by

Pb(M, p) = (1/m) Σ_{k=1}^{m} Pb(M, p, ik).    (26)
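A direct MATLAB transcription of Eq. (25) (our own sketch; γ is the CNR as defined in Section III.C) reads:

% BER of bit ik in uniform M-PAM via the closed form of Eq. (25):
M = 8; k = 2; gamma_cnr = 15;
Pb = 0;
for j = 0 : (1 - 2^(-k))*M - 1
    w = (-1)^floor(j*2^(k-1)/M) * (2^(k-1) - floor(j*2^(k-1)/M + 1/2));
    Pb = Pb + w * erfc((2*j + 1) * sqrt(3*gamma_cnr*log2(M)/(M^2 - 1)));
end
Pb = Pb / M;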

IV.B Uniform Square QAM
Uniform M-QAM can be viewed as a special case of the generalized M-QAMs when p = [2^(m−1) 2^(m−2) ⋅⋅⋅ 4 2 1]. Please refer to Figure 6 for a 4/16/64-QAM example. This can be used when all the incoming bit streams have nearly the same priority.


Figure 8 2/4-PSK constellation

Figure 9 Vectors a, φ0 and φ1 for a 2/4/8-PSK constellation

(Figure 8 shows the 2/4-PSK constellation with reference angle θ1 = π/2 and protection angle θ2. Figure 9 shows the 2/4/8-PSK constellation with its reference axis; the elements of a are of the form ±π/2 ± θ2, and the elements of φ0 and φ1 are of the form ±π/2 ± θ2 ± θ3.)


By symmetry, the in-phase and quadrature phase bits (ik and qk) have the same BER. The average BER for both the in-phase and quadrature phase bits is obtained by averaging the BER of all in-phase bits ik and quadrature bits qk (k = 1, ⋅⋅⋅, (1/2) log2 M), yielding

Pb^s(M, d) = (1/2m) [Σ_{k=1}^{m} Pb(√M, d, ik) + Σ_{k=1}^{m} Pb(√M, d, qk)] = (1/m) Σ_{k=1}^{m} Pb(√M, d, ik),    (27)

where Pb(√M, d, ik) is the BER of bit ik in a √M-PAM constellation.

IV.C Uniform PSK
Uniform M-PSK can be viewed as a special case of the generalized 2/4/⋅⋅⋅/M-PSKs when θ = [π/2 π/4 π/8 ⋅⋅⋅ π/M] (i.e. when p = [2^(m−1) 2^(m−2) 2^(m−3) ⋅⋅⋅ 4 2 1]). The average BER is obtained by averaging the BER of all the bits ik (k = 1, 2, ⋅⋅⋅, log2 M), yielding

Pb(M) = (1/log2 M) Σ_{k=1}^{log2 M} Pb(M, θ, ik),    (28)

where Pb(M, θ, ik) is the BER of bit ik in the constellation, which can be evaluated using (18) and (21). Using these equations in (28), it can also be shown that the exact BER of uniform M-PSK can be written in terms of the Hamming weights W_k^M (k = 1, 2, ⋅⋅⋅, M) of the M symbols as

Pb(M) = (1/log2 M) Σ_{k=1}^{M} W_k^M [F((2k − 1)π/M) − F((2k − 3)π/M)],

in agreement with [21], [22] and [23, Section 4.1]. Note that for constellations Gray coded in the Karnaugh map style (as shown in Section II-A.3), these Hamming weights W_k^M can be calculated in terms of the m bits (m = log2 M) in the Gray codes of the symbols, gk (corresponding to the k-th symbol), as

W_k^M = Σ_{i=1}^{log2 M} g_i,k.
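Combining the Hamming-weight formula with the Gray-code construction of Eq. (1) gives a compact numerical recipe (our own sketch; we map the angle arguments back into (−π, π) before applying Eq. (24), which we assume is the intended interpretation):

% Exact BER of Gray coded uniform M-PSK via Hamming weights:
M = 8; m = log2(M); gamma_cnr = 10;
wrap = @(x) mod(x + pi, 2*pi) - pi;               % map angle into (-pi, pi)
F = @(psi) -sign(wrap(psi))/(2*pi) * integral( ...
        @(th) exp(-gamma_cnr * sin(wrap(psi)).^2 ./ sin(th).^2), ...
        0, pi - abs(wrap(psi)));
b = dec2bin(0:M-1) - '0';
g = [b(:,1), xor(b(:,2:end), b(:,1:end-1))];      % Karnaugh-style Gray codes
W = sum(g, 2);                                    % Hamming weights W_k
Pb = 0;
for k = 1:M
    Pb = Pb + W(k) * (F((2*k - 1)*pi/M) - F((2*k - 3)*pi/M));
end
Pb = Pb / m;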

Figure 10 Comparison of the various methods (Yang and Hanzo [12], Cho and Yoon [13], and the proposed recursive algorithm) for the computation of the BER of the uniform 64-QAM case with p = [4 2 1]. (Axes: average bit error probability versus carrier-to-noise ratio Es/N0 [dB]. Curves: approximate recursive algorithm [12]; exact explicit expression for square QAM [13] and exact recursive algorithm; Monte Carlo simulation; leading term approximation.)



V. Numerical Examples

Since the number of different cases covered by our analysis is quite large, we present in this section only some numerical examples that demonstrate the usefulness of the proposed analytical tools for a variety of scenarios of practical interest. We further note that the analytical expressions derived in the previous sections, and the corresponding numerical results presented in this section, have been verified extensively by Monte

Figure 11 Performance of 4/16/64-QAM with p = [5 2.25 1]. Note that the MSB performs much better than the LSB. (Axes: average bit error probability versus carrier-to-noise ratio Es/N0 [dB]. Curves: MSB in the I and Q channel (bits i1 and q1); bits i2 and q2; LSB in the I and Q channel (bits i3 and q3).)

Figure 12 Performance of generalized 2/4/8/16-PSK with Θ = [π/2 π/6 π/15 π/40], p = [20 6.6667 2.6667 1]. Note that the MSB performs nearly 20 dB better than the LSB at BER ≤ 10^-2. (Axes: average bit error probability versus carrier-to-noise ratio Es/N0 [dB]. Curves: bit i1 (MSB), bit i2, bit i3, bit i4.)


Carlo simulations for the various constellations under consideration. Thus, the reader can be confident in the correctness of the newly derived recursive expressions and in the accuracy of the numerical results illustrated below.

Figure 10 deals with the uniform 64-QAM case and shows the perfect match between the proposed exact recursive algorithm, the exact generic expression obtained by Yoon et al. [13], and the Monte Carlo simulations. The approximate recursive expression obtained in [12] comes very close to the exact result, whereas the leading term approximation1) gives an optimistic result at low CNR. Figure 11 shows the performance of the 4/16/64-QAM constellation. Note that with the priority vector p = [5 2.25 1], the MSB gets a higher protection than the LSB. Figure 12 shows the similar nature of the performance of 2/4/8/16-PSK. Figure 13 shows the performance of a 2/4/8/16/32-PSK constellation, where the p vector is given by p = [β·2⁴ β·2³ β·2² 2 1], with β = 1.3 ≥ 1. This is a situation wherein the 3 most significant bits are protected better than the 2 least significant bits. Also, note that the bits i1, i2, i3 form a uniform 8-PSK, while the bits i4, i5 form a uniform 4-PSK. So, this is a constellation wherein an 8-PSK, embedded in a 32-PSK, is protected by the factor β. These protected bits (i1, i2, i3) are referred to as base bits, and the others (i4, i5) are referred to as the refinement bits. As β is increased, the two sets of curves move further apart.

VI. Potential Applications

VI.A Voice and Data Applications
Four years ago, a new scheme for simultaneous data and voice transmission over fading channels was proposed [24]. It provided high average spectral efficiency for data transmission, while meeting stringent delay requirements for voice transmission. This modem always transmits voice packets, in the form of a BPSK signal, on the Q-channel. When fading is not severe, the modem assigns a significant transmit power to data transmission. This is done by using a large PAM constellation (on the I-channel). As the channel fading gets severe, the modem reduces the size of this constellation, thereby giving more power to voice transmission to meet the stringent delay constraints. We are currently looking into designing an alternative adaptive scheme wherein hierarchical QAM or PSK constellations are used instead, in order to improve the spectral efficiency of simultaneous multimedia transmission.

1) By leading term, we mean just the dominant erfc(⋅) term.

Figure 13 Performance of 2/4/8/16/32-PSK with Θ = [π/2 π/4 π/8 π/20.8 π/41.6] and p = [20.8 10.4 5.2 2 1]. The 3 most significant bits are the base bits and the remaining are refinement bits. Note that on average, the base bits perform 10–15 dB better than the refinement bits at BER ≤ 10^-2. (Axes: average bit error probability versus carrier-to-noise ratio Es/N0 [dB]. Curves: bits i1 and i2; bit i3; average BER for base bits; bit i4; bit i5; average BER for refinement bits.)


VI.B Downlink Multiplexing/Multicasting

As we have seen in the previous section, by virtue of Gray coding we protect the MSB much more than the LSB. Could this property, along with hierarchical channel coding, be used for multicasting information from a base station to mobile units? Information for the user who suffers the least fading could be transmitted on the LSBs of the signal, while that for the user suffering the worst fading could be transmitted on the MSBs of the same signal. The users receive the symbol and look only for the bits in particular positions within the symbol. We are now focusing on the development of this application and the comparison of its spectral efficiency performance with that of existing multicast strategies.

VII. Conclusion

We have argued that the recursive way of Gray coding a PAM/PSK constellation ensures the existence of a recursive algorithm for the computation of the BER. As a result, we have obtained exact and generic (in M) expressions for the BER of generalized hierarchical M-PAM constellations over AWGN channels. These results can be extended easily to fading channels [17], [18]. We have also shown that these expressions can be extended to generalized hierarchical M-QAM constellations. Using the same argument, we have derived similar BER expressions for hierarchical PSK constellations. Because of their generic nature, these new expressions readily allow numerical evaluation for various cases of practical interest. In particular, numerical examples have shown that the leading-term approximation gives significantly optimistic BER values at low CNR but is quite accurate in the high CNR region. Finally, we have suggested possible applications: voice and data integration and downlink multiplexing, to name a few.

References

1 Cover, T. Broadcast channels. IEEE Trans. on Inform. Theory, IT-18, 2–14, 1972.

2 Sundberg, C E W, Wong, W C, Steele, R. Logarithmic PCM weighted QAM transmission over Gaussian and Rayleigh fading channels. IEE Proc., 134, 557–570, 1987.

3 Ramchandran, K et al. Multiresolution broadcast for digital HDTV using joint source/channel coding. IEEE Journal of Selected Areas in Communications, 11, 1993.

4 Wei, L-F. Coded modulation with unequal error protection. IEEE Trans. Commun., COM-41, 1439–1449, 1993.

5 Morimoto, M et al. A study on power assignment of hierarchical modulation schemes for digital broadcasting. IEICE Trans. Commun., E77-B, 1994.

6 Morimoto, M, Okada, M, Komaki, S. A hierarchical image transmission system in a fading channel. In: Proc. IEEE Int. Conf. Univ. Personal Comm. (ICUPC'95), 769–772, 1995.

7 Pursley, M B, Shea, J M. Nonuniform phase-shift-key modulation for multimedia multicast transmission in mobile wireless networks. IEEE Journal of Selected Areas in Communications, SAC-17, 774–783, 1999.

8 Pursley, M B, Shea, J M. Adaptive nonuniform phase-shift-key modulation for multimedia traffic in wireless networks. IEEE Journal of Selected Areas in Communication, SAC-00, 1394–1407, 2000.

9 DVB-T standard: ETS 300 744. Digital Broadcasting Systems for Television, Sound and Data Services: Framing Structure, Channel Coding and Modulation for Digital Terrestrial Television. (ETSI Draft, 1.2.1, EN300 744, 1999-1.)

10 Fitz, M O, Seymour, J P. On the bit error probability of QAM modulation. International Journal of Wireless Information Networks, 1 (2), 131–139, 1994.

11 Lu, J et al. M-PSK and M-QAM BER computation using signal-space concepts. IEEE Trans. Commun., COM-47, 181–184, 1999.

12 Yang, L L, Hanzo, L. A recursive algorithm for the error probability evaluation of M-QAM. IEEE Commun. Letters, 4, 304–306, 2000.

13 Yoon, D, Cho, K, Lee, J. Bit error probability of M-ary quadrature amplitude modulation. In: Proc. IEEE Veh. Technol. Conf. (VTC'2000-Fall), Boston, Massachusetts, 2422–2427, September 2000.

14 Vitthaladevuni, P K, Alouini, M-S. BER computation of generalized hierarchical 4/M-QAM constellations. In: Proc. IEEE International Symposium on Personal, Indoor and Mobile Radio Commun. Conf. (PIMRC'2001), San Diego, California, 1, 85–89, September 2001. (Journal version in IEEE Trans. on Broadcasting, 47 (3), 228–239, 2001.)

15 Vitthaladevuni, P K, Alouini, M-S. MATLAB programs for design and BER computation of generalized hierarchical M-QAM constellations. Available at: http://www.ece.umn.edu/users/pavan/Generalized-Hierarchical-Qam.html

16 Vitthaladevuni, P K, Alouini, M-S. MATLAB programs for design and BER computation of generalized hierarchical M-PSK constellations. Available at: http://www.ece.umn.edu/users/pavan/Generalized-Hierarchical-PSK.html

17 Vitthaladevuni, P K, Alouini, M-S. BER computation of generalized QAM constellations. In: Proc. IEEE Global Commun. Conf. (GLOBECOM'2001), San Antonio, Texas, 1, 632–636, 2001. (Journal version submitted to IEEE Trans. on Information Theory.)

18 Vitthaladevuni, P K, Alouini, M-S. BER computation of generalized hierarchical PSK constellations. In: Proc. IEEE International Conference on Communications (ICC'2002), New York, April 2002. (Journal version submitted to IEEE Trans. Commun.)

19 Pawula, R F, Rice, S O, Roberts, J H. Distribution of the phase angle between two vectors perturbed by Gaussian noise. IEEE Trans. Commun., COM-30, 1828–1841, 1982.

20 Pawula, R F. Distribution of the phase angle between two vectors perturbed by Gaussian noise II. IEEE Trans. Veh. Technol., 50, 576–583, 2001.

21 Lee, P J. Computation of the bit error rate of coherent M-ary PSK with Gray code bit mapping. IEEE Trans. Commun., COM-34, 488–491, 1986.

22 Tellambura, C, Mueller, A J, Bhargava, V K. Analysis of M-ary phase-shift keying with diversity reception for land-mobile satellite channels. IEEE Trans. Veh. Technol., VT-46, 910–922, 1997.

23 Simon, M K, Hinedi, S M, Lindsey, W C. Digital Communication Techniques – Signal Design and Detection. Englewood Cliffs, NJ, Prentice Hall, 1995.

24 Alouini, M-S, Tang, X, Goldsmith, A. An adaptive modulation scheme for simultaneous voice and data transmission over fading channels. IEEE Journal of Selected Areas in Communication, SAC-17, 837–850, May 1999. (See also Proc. of the IEEE Vehicular Technology Conference (VTC'98), 939–943, Ottawa, Ontario, Canada, May 1998.)


I. Introduction

Many authors have studied adaptive (coded) modulation for wireless communications (see [1] and the references therein). In an earlier paper [2], we considered a general adaptive coding scheme for single-user channels with frequency-flat, slowly varying multipath fading. A particular instance of this coding scheme utilizes a set of multidimensional trellis codes designed for additive white Gaussian noise (AWGN) channels of different qualities. A feedback channel makes it possible for the encoder to switch adaptively between these codes based on channel state information (CSI) fed back from the decoder, resulting in an overall scheme with high spectral efficiency.

The output bit-error-rate (BER) of an adaptive coding scheme may increase with growing time delay in the feedback channel and/or increasing terminal speed [3]. Since any implemented feedback channel has nonzero feedback delay, and since it is necessary to allow for mobile terminals, it is important to determine the BER degradation of the adaptive coding scheme proposed in [2]. Alouini and Goldsmith [4] have determined the BER degradation for uncoded adaptive modulation. In this paper, we extend their technique to determine the BER degradation of any instance of the proposed adaptive coding scheme.

We first introduce, in Section II, a mobile wireless channel with Rayleigh fading, where the mobile terminal has multiple receive antennas whose signals are combined using the maximal ratio combining (MRC) method [5, Ch. 5]. For any instance of the adaptive coding scheme and any number of receive antennas, Section III then shows how to determine the BER degradation associated with a nonzero feedback delay and a nonzero terminal speed. As an example, Section IV evaluates a specific adaptive encoder and decoder (codec) utilizing a set of four-dimensional trellis codes. A conclusion is drawn in Section V.

II. System Model and Coding Scheme

The system model consists of a stationary transmitter/receiver (transceiver), a wireless frequency-flat fading channel, and a mobile transceiver, or terminal. It is assumed that the distance between the stationary transceiver and the mobile terminal is no more than a few hundred meters. We will only consider the flow of user information on the downlink. Hence, in our model the feedback channel (or uplink) from the terminal to the stationary transceiver will only be used for CSI.

The stationary transceiver has one transmit antenna, while the mobile terminal has H (≥ 1) receive antennas. Each of the H antenna branches is modeled as a Rayleigh fading channel with ideal coherent detection. It is assumed that the branch signals are statistically independent.

Denoting the transmitted complex baseband signal at time index t ∈ {0, 1, 2, ...} by x(t), the received signal at antenna h ∈ {1, 2, ..., H} can then be written as yh(t) = αh(t) ⋅ x(t) + nh(t). Here, the stationary and ergodic fading envelope αh(t) is a real-valued random variable with a Rayleigh distribution, and nh(t) is complex-valued AWGN with statistically independent real and imaginary components. The total one-sided power spectral density of the AWGN is denoted N0 [W/Hz] and the one-sided channel bandwidth is denoted B [Hz].

Performance Analysis of Adaptive Coded Modulation with Antenna Diversity and Feedback Delay

KJELL J. HOLE, HENRIK HOLM AND GEIR E. ØIEN

A general adaptive coding scheme for spectrally efficient transmission on flat fading channels was introduced by the authors in an earlier paper [2]. An instance of the coding scheme utilizes a set of multidimensional trellis codes designed for additive white Gaussian noise channels of different qualities. A feedback channel between the decoder and encoder makes it possible for the encoder to switch adaptively between these codes based on channel state information fed back from the decoder. In this paper, the adaptive coding scheme is employed in a mobile wireless communication system consisting of a stationary transmitter with one antenna, a wireless Rayleigh fading channel, and a mobile terminal with one or more receive antennas. The bit-error-rate at the output of the decoder is determined for various terminal speeds, time delays in the feedback channel, and numbers of receive antennas. The obtained results indicate that the proposed adaptive coding scheme is well suited for communications over mobile wireless channels with carrier frequencies in the high MHz range, delay spread up to 250 ns, and terminal mobility up to pedestrian speed.

Henrik Holm (30) received his Siv.Ing. and PhD degrees in electrical engineering from the Norwegian University of Science and Technology (NTNU) in 1997 and 2002, respectively. He is currently a post doctoral researcher at NTNU and a guest researcher at the University of Minnesota. His research interests include statistical modelling and robust transmission on wireless channels.

[email protected]

Kjell Jørgen Hole (41) received his BSc, MSc and PhD degrees in computer science from the University of Bergen in 1984, 1987 and 1991, respectively. He is currently Senior Research Scientist at the Department of Telecommunications at the Norwegian University of Science and Technology (NTNU) in Trondheim. His research interests are in the areas of coding theory and wireless communications.

[email protected]


Let S [W] denote the constant average transmit power. The instantaneous received carrier-to-noise ratio (CNR) on antenna branch h at time index t is then

γh(t) = αh²(t) · S / (N0 B),  h = 1, 2, ⋅⋅⋅, H,

with expectation E[γh(t)] = γ̄h = Ω S/(N0 B), where Ω = E[αh²(t)] is assumed independent of h. Thus, γ̄h is also equal for all h.

The mobile terminal implements an MRC combiner to process the H received branch signals [5, p. 316]. Since the branch signals are statistically independent, the instantaneous CNR at the output of the H-branch MRC combiner is given by γ = Σ_{h=1}^{H} γh.1) If we denote E[γ] = γ̄, then γ̄h = γ̄/H, and the gamma probability density function (pdf) of the instantaneous CNR γ at the output of the MRC combiner may be written as [5, Eq. (5.2-14)]

pγ(γ) = (H/γ̄)^H · [γ^(H−1)/(H − 1)!] · exp(−H γ/γ̄),  γ ≥ 0.    (1)
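Equation (1) is simply a Gamma pdf with shape parameter H; for simulation purposes one can therefore draw MRC-output CNRs by summing H exponential branch CNRs (a sketch of ours, assuming MATLAB's Statistics Toolbox):

% Draw samples of the MRC-output CNR of Eq. (1):
H = 2; gbar = 10;                    % number of branches and mean CNR E[gamma]
branch = exprnd(gbar/H, H, 1e5);     % Rayleigh fading gives exponential CNR per branch
gamma_samples = sum(branch, 1);      % Gamma(H, gbar/H), matching Eq. (1)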

It is convenient to view the combination of the H antenna branches and the MRC combiner as a single channel. The instantaneous CNR γ at the output of this channel determines the channel state at a given time. We assume that the mobile terminal has perfect knowledge of γ. The range [0, ∞) of possible CNR values is divided into N + 1 non-overlapping intervals (or fading regions). At any given time the CNR will fall into one of these fading regions, and the associated CSI, i.e. the region index n ∈ {0, 1, ..., N}, is sent to the stationary transceiver via the feedback channel, which is assumed to be error free.

Assume that γ ∈ [0, γ1) in fading region 0, γ ∈ [γn, γn+1) in region n ∈ {1, 2, ..., N − 1}, and γ ∈ [γN, ∞) in region N. Also, assume that the BER must never exceed a target maximum BER0. When γ ∈ [γn, γn+1) we use a multidimensional trellis code, denoted code n ∈ {1, 2, ..., N}, designed to achieve BER ≤ BER0 on an AWGN channel with CNR γ ≥ γn. For γ < γ1, i.e. γ in fading region 0, the channel conditions are so bad that no information is transmitted, and we have an outage during which the information flow is buffered.
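Mapping an instantaneous CNR to its fading region (and hence to the CSI index fed back) then reduces to a threshold comparison, e.g. (our own sketch with hypothetical threshold values):

% Fading-region index: n = 0 signals an outage (no transmission)
thresholds = [2 4 8 16];            % [gamma_1 ... gamma_N], linear scale
gamma_inst = 5.3;                   % current CNR at the MRC output
n = sum(gamma_inst >= thresholds);  % here n = 2, i.e. code 2 is selected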

Let 4 ≤ M1 < M2 < ... < MN denote the number of symbols in N quadrature amplitude modulation (QAM) constellations of growing size, and let code n be based on the constellation with Mn symbols. For some small fixed L ∈ {1, 2, ...}, the encoder for code n accepts L · log2(Mn) − 1 information bits at each time index k = L · t ∈ {0, L, 2L, ...} and generates L · log2(Mn) coded bits. The coded bits specify L modulation symbols in the n-th QAM constellation. These symbols are transmitted at time indices k, k + 1, ..., k + L − 1. The L two-dimensional symbols can be viewed as one 2L-dimensional symbol, and for this reason the code is said to be a 2L-dimensional trellis code. In practice, the N codes are chosen such that they may be encoded and decoded by the same codec [2].

To determine the values of the fading region boundaries (or thresholds) \gamma_n, we need to determine the BER performance of each code. When code n is operating on an AWGN channel of CNR \gamma, the BER-CNR relationship for varying \gamma may be approximated by the expression

BER \approx a_n \cdot \exp\left(-\frac{b_n \gamma}{M_n}\right),   (2)

where a_n (> 0) and b_n (> 0) are constants which depend only on the weight distribution of the code [2]. These constants can be found for any given code by least-squares curve fitting of data from AWGN channel simulations to (2). The fitting must be done separately for each code in the set.
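As an illustration of this fitting step, consider the following sketch (ours, not from the paper; the simulation points are generated synthetically from (2) with assumed "true" parameters, and scipy.optimize.curve_fit stands in for whatever least-squares routine was actually used). Fitting is done on the logarithm of (2), where the model is linear in ln(a_n) and b_n:

    import numpy as np
    from scipy.optimize import curve_fit

    Mn = 4
    rng = np.random.default_rng(0)
    # Hypothetical AWGN simulation results, generated here from (2) with
    # assumed "true" a_n = 190, b_n = 9.8 plus a little measurement noise.
    gamma_lin = 10.0 ** (np.array([8.0, 10.0, 12.0, 14.0, 16.0]) / 10.0)
    ber = 190.0 * np.exp(-9.8 * gamma_lin / Mn) * rng.lognormal(0.0, 0.05, 5)

    def log_ber(gamma, ln_an, bn):
        # Model (2) in log form: ln(BER) = ln(a_n) - b_n * gamma / M_n.
        return ln_an - bn * gamma / Mn

    (ln_an, bn), _ = curve_fit(log_ber, gamma_lin, np.log(ber))
    print(f"fitted a_n = {np.exp(ln_an):.1f}, b_n = {bn:.2f}")  # close to 190 and 9.8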

Plots of BER found in the literature indicate that the approximation in (2) is accurate for any CNR \gamma resulting in BER \lesssim 10^{-1} (see Figure 1 for an example). Unfortunately, for the minimum value \gamma = 0, the approximation reduces to BER \approx a_n, and since a_n can be larger than one, (2) may be of little use for low CNRs. When we only want to approximate the BER at moderate-to-high CNRs, as was done in [2], this is not a problem. However, we need to approximate the BER for any CNR \gamma \ge 0 in this paper, and we will therefore use the following BER expression for code n

BER_n = \begin{cases} a_n \cdot \exp\left(-\frac{b_n \gamma}{M_n}\right), & \gamma \ge \gamma_n^* \\ \frac{1}{2}, & \gamma < \gamma_n^* \end{cases}   (3)

Here, the boundary

\gamma_n^* = \frac{\ln(2 a_n) M_n}{b_n}




is the smallest CNR such that the BER is no larger than 0.5. The boundary was obtained by assuming equality in (2), setting BER = 0.5, and solving for \gamma.

For a true BER between 10^{-1} and 0.5 the exponential expression in (3) tends to produce a larger value than the true BER, assuming that the coded communication system manages to maintain synchronization. In practice, it is difficult to maintain synchronization for a very high BER, and the approximation BER = 0.5 may therefore be close to the true BER of a real system. If the coded system should exhibit a BER > 0.5 for a very low CNR, then all decoded information bits may be flipped to achieve a BER < 0.5. Hence, 0.5 is a reasonable upper bound on the BER.

Assuming a target BER_0 such that \gamma_n > \gamma_n^*, and setting BER_n equal to BER_0 in (3), the thresholds are given by [2]

\gamma_n = \frac{M_n K_n}{b_n}, \quad n = 1, 2, \ldots, N, \qquad \gamma_{N+1} = \infty,   (4)

where K_n = -\ln(BER_0 / a_n).
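As a quick numeric cross-check (our own sketch, reading the a_n and b_n values from Table 1 below), (4) reproduces the tabulated thresholds for BER_0 = 10^{-4}:

    import numpy as np

    # (M_n, a_n, b_n) for the eight codes, from Table 1.
    codes = [(4, 188.7471, 9.8182), (8, 288.8051, 6.8792),
             (16, 161.6898, 7.8862), (32, 142.6920, 7.8264),
             (64, 126.2118, 7.4931), (128, 121.5189, 7.7013),
             (256, 79.8360, 7.1450), (512, 34.6128, 6.9190)]
    BER0 = 1e-4

    for n, (Mn, an, bn) in enumerate(codes, start=1):
        Kn = -np.log(BER0 / an)      # K_n = -ln(BER_0 / a_n)
        gamma_n = Mn * Kn / bn       # threshold (4), linear scale
        print(f"n={n}: gamma_n = {10 * np.log10(gamma_n):.1f} dB")
    # Agrees with the last column of Table 1 to within rounding.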

The probability that \gamma falls in fading region n, P_n = P(\gamma_n \le \gamma < \gamma_{n+1}), is given by [4, Eq. (10)]

P_n = \frac{\Gamma\left(H, \frac{H\gamma_n}{\bar{\gamma}}\right) - \Gamma\left(H, \frac{H\gamma_{n+1}}{\bar{\gamma}}\right)}{(H-1)!},   (5)

where

\Gamma(\upsilon, \mu) = \int_{\mu}^{\infty} t^{\upsilon-1} e^{-t} \, dt   (6)

is the complementary incomplete gamma function [6, Eq. (8.350.2)]. Since H is an integer in (5), the function may be calculated using [6, Eq. (8.352.2)].
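In code, (5) is conveniently evaluated through the regularized complementary incomplete gamma function, which already includes the division by (H-1)!. A minimal sketch (ours, assuming scipy is available):

    import numpy as np
    from scipy.special import gammaincc  # regularized: Gamma(H, x) / (H-1)!

    def region_probability(H, gamma_n, gamma_np1, gamma_bar):
        """P_n from (5) for integer H; gamma_np1 may be np.inf."""
        upper = 0.0 if np.isinf(gamma_np1) else gammaincc(H, H * gamma_np1 / gamma_bar)
        return gammaincc(H, H * gamma_n / gamma_bar) - upper

    # Example: H = 2 antennas, 20 dB average CNR, region [12.4 dB, 14.6 dB).
    gbar = 10 ** (20 / 10)
    print(region_probability(2, 10 ** 1.24, 10 ** 1.46, gbar))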

III. BER Degradation

The BER degradation due to nonzero feedback delay and nonzero terminal speed is determined in this section. It is assumed that the communication system utilizes a set of N trellis codes with known parameters a_n and b_n.

Let the total feedback delay, \tau [s], be the time between the moment the mobile terminal acquires a set of L modulation symbols and the moment the stationary transmitter activates a new code. The total feedback delay is determined by the sum of three delays: i) the processing time needed by the terminal to estimate the instantaneous CNR \gamma and to determine in which fading region n the CNR falls, ii) the time needed to feed back the region index n to the transmitter, and iii) the processing time needed by the transmitter to activate code n.

In a real system, the processing delay i) depends on the technique used to estimate the instantaneous received CNR, whereas the processing delay iii) depends on the encoder complexity. Since the distance between the stationary transceiver and the mobile terminal is assumed to be no more than a few hundred meters, the transmission delay ii) is mainly determined by the communication protocols.

The size of the signal constellation M_n = M_n(\gamma) at time index t is a function of the instantaneous received CNR \gamma, but the constellation is used at time t + \tau when \gamma has changed to \gamma_\tau. Consequently, while the CNR \gamma falls in some fading region n, i.e. \gamma_n \le \gamma < \gamma_{n+1}, the CNR \gamma_\tau may fall outside this region. Substituting \gamma_\tau for \gamma in (3), we can write the BER as a function of \gamma_\tau for a given \gamma:

BER_{n\tau}(\gamma_\tau | \gamma) = \begin{cases} a_n \cdot \exp\left(-\frac{b_n \gamma_\tau}{M_n(\gamma)}\right), & \gamma_\tau \ge \gamma_n^* \\ \frac{1}{2}, & \gamma_\tau < \gamma_n^* \end{cases}   (7)

The average BER for \gamma in fading region n is now given by

\overline{BER}_{n\tau} = \int_{\gamma_n}^{\gamma_{n+1}} \left\{ \int_{0}^{\infty} BER_{n\tau}(\gamma_\tau | \gamma) \, p_{\gamma_\tau | \gamma}(\gamma_\tau | \gamma) \, d\gamma_\tau \right\} p_\gamma(\gamma) \, d\gamma,   (8)


Figure 1 The boxes are BER estimates generated by software simulation and the curves are estimates obtained from (3). The labels indicate the number of symbols in the QAM signal constellations utilized by the 4-dimensional trellis codes

[Figure 1: log10 BER (-4 to -0.5) versus CNR in dB (5 to 30), one curve for each of M = 4, 8, 16, 32, 64, 128, 256, 512]


where p_\gamma(\gamma) is given by (1). Furthermore, p_{\gamma_\tau | \gamma}(\gamma_\tau | \gamma) is the pdf of \gamma_\tau conditioned on \gamma [4]:

p_{\gamma_\tau | \gamma}(\gamma_\tau | \gamma) = \frac{H}{(1-\rho)\bar{\gamma}} \left(\frac{\gamma_\tau}{\rho\gamma}\right)^{(H-1)/2} \cdot I_{H-1}\left(\frac{2H\sqrt{\rho\gamma\gamma_\tau}}{(1-\rho)\bar{\gamma}}\right) \cdot \exp\left(-\frac{H(\rho\gamma + \gamma_\tau)}{(1-\rho)\bar{\gamma}}\right).   (9)

The function I_{H-1}(\cdot) in (9) is the (H - 1)th-order modified Bessel function of the first kind [7, Ch. 9]. The pdf also contains the channel power correlation coefficient \rho at lag \tau. It is shown in Appendix A that \rho is given by the square of the zeroth-order Bessel function of the first kind [7, Ch. 9],

\rho = J_0^2(2\pi f_D \tau),   (10)

for any number of receive antennas. Here, f_D = \upsilon/\lambda [Hz] is the maximum Doppler frequency shift defined by the terminal speed \upsilon [m/s] and the wavelength \lambda [m] of the carrier.

Using (7), the average BER in fading region n given by (8) can be rewritten as the difference between two double integrals,

\overline{BER}_{n\tau} = I_1(n) - I_2(n),

where

I_1(n) \overset{def}{=} \int_{\gamma_n}^{\gamma_{n+1}} \left\{ \int_{0}^{\infty} a_n \exp\left(-\frac{b_n \gamma_\tau}{M_n}\right) p_{\gamma_\tau | \gamma}(\gamma_\tau | \gamma) \, d\gamma_\tau \right\} p_\gamma(\gamma) \, d\gamma   (11)

and

I_2(n) \overset{def}{=} \int_{\gamma_n}^{\gamma_{n+1}} \left\{ \int_{0}^{\gamma_n^*} \left[ a_n \exp\left(-\frac{b_n \gamma_\tau}{M_n}\right) - \frac{1}{2} \right] p_{\gamma_\tau | \gamma}(\gamma_\tau | \gamma) \, d\gamma_\tau \right\} p_\gamma(\gamma) \, d\gamma.   (12)

The double integral I_2(n) is zero for \gamma_n^* = 0, i.e. when parameters a_n and b_n result in good BER approximations for any \gamma_\tau \ge 0. Hence, I_2(n) may be viewed as a "correction term" needed when a_n and b_n are only useful for \gamma_\tau \ge \gamma_n^* > 0.

It is shown in Appendix B that

I_1(n) = \frac{a_n}{(H-1)!} \left(\frac{H}{\bar{\gamma}}\right)^H \frac{\Gamma(H, \beta_n \gamma_n) - \Gamma(H, \beta_n \gamma_{n+1})}{(\omega_n)^H},   (13)


where

\beta_n = \frac{H}{\bar{\gamma}} + \frac{H \rho b_n}{H M_n + \bar{\gamma}(1-\rho) b_n}   (14)

and

\omega_n = \frac{H}{\bar{\gamma}} + \frac{b_n}{M_n}.   (15)
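Equations (13)-(15) transcribe directly into code. In the following sketch (ours) the regularized function gammaincc already divides \Gamma(H, \cdot) by (H-1)!, so the leading factorial in (13) cancels:

    import numpy as np
    from scipy.special import gammaincc

    def I1(an, bn, Mn, gamma_n, gamma_np1, H, gbar, rho):
        """Closed-form double integral (13), with beta_n from (14) and omega_n from (15)."""
        beta_n = H / gbar + H * rho * bn / (H * Mn + gbar * (1 - rho) * bn)
        omega_n = H / gbar + bn / Mn
        diff = gammaincc(H, beta_n * gamma_n) - gammaincc(H, beta_n * gamma_np1)
        return an * (H / gbar) ** H * diff / omega_n ** H

    # Example: code n = 2 with H = 2, 20 dB average branch CNR, rho = 0.99.
    print(I1(288.8051, 6.8792, 8, 10 ** 1.24, 10 ** 1.46, 2, 100.0, 0.99))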

From Appendix C, we have

I_2(n) = S(a_n, b_n) - S\left(\frac{1}{2}, 0\right)   (16)

for

S(a_n, b_n) \overset{def}{=} \frac{a_n (1-\rho)^H}{(H-1)!} \sum_{j=0}^{\infty} \frac{\rho^j}{j! \, (H+j-1)!} \left[\frac{H M_n}{H M_n + (1-\rho)\bar{\gamma} b_n}\right]^{H+j} \cdot \gamma_{inc}\left(H+j, \left[\frac{H}{(1-\rho)\bar{\gamma}} + \frac{b_n}{M_n}\right] \gamma_n^*\right) \cdot \left[\Gamma\left(H+j, \frac{H \gamma_n}{(1-\rho)\bar{\gamma}}\right) - \Gamma\left(H+j, \frac{H \gamma_{n+1}}{(1-\rho)\bar{\gamma}}\right)\right],   (17)

where

\gamma_{inc}(\upsilon, \mu) = \int_{0}^{\mu} t^{\upsilon-1} e^{-t} \, dt   (18)

is the incomplete gamma function [6, Eq. (8.350.1)]. Since H + j is an integer in (17), the function may be calculated using [6, Eq. (8.352.1)].

The average BER over all N codes, denoted by \overline{BER}_\tau, is equal to the expected number of information bits in error per modulation symbol divided by the expected number of transmitted information bits per modulation symbol,

\overline{BER}_\tau = \frac{\sum_{n=1}^{N} i_n \overline{BER}_{n\tau}}{\sum_{n=1}^{N} i_n P_n} = \frac{\sum_{n=1}^{N} i_n \left[ I_1(n) - I_2(n) \right]}{\sum_{n=1}^{N} i_n P_n}.   (19)

Here, i_n = \log_2(M_n) - 1/L is the number of information bits per modulation symbol and P_n is defined by (5). In practice, the double integral I_2(n) can only be approximated since the sum in (17) must be terminated after a finite number of terms. Since each term in the sum is positive, the termination causes the expression in (19) to become an upper bound on the BER. The tightness of the bound improves as the number of terms is increased. We will use the first ten terms in the sum of (17) in the next section.
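A sketch of the truncated series follows (our own transcription of (17)-(18) as reconstructed above, so it should be read with the same caution; scipy's gammainc and gammaincc are the regularized incomplete gamma functions, and gammaln supplies the factorials):

    import numpy as np
    from scipy.special import gammainc, gammaincc, gammaln

    def S(a, b, Mn, gamma_n, gamma_np1, gamma_star, H, gbar, rho, terms=10):
        """First `terms` terms of the series (17); all terms are positive."""
        total = 0.0
        for j in range(terms):
            Hj = H + j
            q = (H * Mn / (H * Mn + (1 - rho) * gbar * b)) ** Hj
            # Regularized lower/upper incomplete gamma: each absorbs one (H+j-1)!.
            ginc = gammainc(Hj, (H / ((1 - rho) * gbar) + b / Mn) * gamma_star)
            Gdiff = gammaincc(Hj, H * gamma_n / ((1 - rho) * gbar)) \
                  - gammaincc(Hj, H * gamma_np1 / ((1 - rho) * gbar))
            # Restore the factor (H+j-1)!/j! left over after regularization.
            coef = rho ** j * np.exp(gammaln(Hj) - gammaln(j + 1))
            total += coef * q * ginc * Gdiff
        return a * (1 - rho) ** H * total / np.exp(gammaln(H))

    # I_2(n) in (16) is then S(a_n, b_n, ...) - S(0.5, 0.0, ...).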



IV. Evaluation of Example Codec

An adaptive codec with eight 4-dimensional trellis codes was described in [2]. The individual codes' BER performances on an AWGN channel were simulated for various CNRs. The obtained BER points (represented by boxes) are shown in Figure 1. Curve fitting with the least squares method was used to obtain the parameters a_n and b_n listed in Table 1. The corresponding BER approximations (3) are plotted in Figure 1. The expression in (4) was used to determine the tabulated thresholds2) \gamma_n (rounded to one decimal digit) for target BER_0 = 10^{-4}.

Using the thresholds \gamma_n, setting L = 2, and \bar{\gamma}_h = 20 dB, the base-10 logarithm of the average BER (19) is plotted as a function of the correlation coefficient \rho in Figure 2 for H \in \{1, 2, 4\} receive antennas. We observe that because the thresholds are chosen according to (4), the instantaneous BER is smaller than the target BER_0 for \gamma_n < \gamma < \gamma_{n+1} and \rho close to one. As a result, the average BER will be below BER_0 for large \rho (see Figure 2).

Let \tau_{max} denote the maximum total delay, or maximum tolerable delay, for a given target BER_0. The expression (10) for \rho can be used to determine the maximum tolerable delay \tau_{max} for different Doppler shifts f_D and targets BER_0. The minimum values of \rho (rounded to three decimal digits) needed to achieve \overline{BER}_\tau \le BER_0 = 10^{-4} are listed in Table 2 for H \in \{1, 2, 4\}.

If we let the carrier frequency be f = 1900 MHz and use the value c = 3 \cdot 10^8 m/s for the speed of light, then the wavelength of the carrier frequency is \lambda = c/f = 3/19 (\approx 0.16) m. A mobile terminal with (pedestrian) speed \upsilon = 1 m/s then has Doppler shift f_D = \upsilon/\lambda = 19/3 (\approx 6.33) Hz. The corresponding maximum tolerable delays \tau_{max} (rounded to one decimal digit) are listed in Table 2.

To see that the fading is nearly constant over many hundred modulation symbols for communications at pedestrian speed, we calculate the number of symbols transmitted during the maximum tolerable delay \tau_{max}. We first need to determine a bandwidth B for which it is reasonable to assume that the fading is frequency-flat. The (rms) delay spread, \sigma_d [s], measures how much a signal component may be delayed during transmission [8, Sec. 2.2.2]. The reciprocal of the delay spread provides a measure of the width of the band of frequencies which are similarly affected by the channel response. The channel

2) The thresholds in Table 1 are larger than the thresholds in [2, Table I] because we have reduced the target BER_0 from 10^{-3} to 10^{-4}. Furthermore, the path memory length of the Viterbi decoder was set to 9 in [2] while a path memory length of 16 was used in this paper.

n Mn an bn γn [dB] ≈

1 4 188.7471 9.8182 7.7

2 8 288.8051 6.8792 12.4

3 16 161.6898 7.8862 14.6

4 32 142.6920 7.8264 17.6

5 64 126.2118 7.4931 20.8

6 128 121.5189 7.7013 23.7

7 256 79.8360 7.1450 26.9

8 512 34.6128 6.9190 29.7

Table 1 Parameters a_n and b_n for example codec and calculated thresholds \gamma_n [dB] for target BER_0 = 10^{-4}

H min. ρ τmax [ms] ≈ τmax/T [symb.] ≈

1 0.997 2.7 1,080

2 0.991 3.4 1,360

4 0.963 6.9 2,760

Table 2 Minimum correlation coefficient \rho needed to achieve \overline{BER}_\tau \le 10^{-4} for average antenna branch CNR \bar{\gamma}_h = 20 dB and different number H of receive antennas. Maximum tolerable delay \tau_{max} [ms] and number of modulation symbols transmitted during \tau_{max} for carrier frequency 1900 MHz, bandwidth 400 kHz, and terminal speed \upsilon = 1 m/s

Figure 2 Base-10 logarithm of average BER for correlation coefficient 0.8 \le \rho < 1, target BER_0 = 10^{-4}, and average antenna branch CNR \bar{\gamma}_h = 20 dB

[Figure 2: log10 BER (-5 to -1.5) versus correlation coefficient (0.80 to 0.98), one curve for each of H = 1, 2, 4]


is therefore approximately frequency-flat if the bandwidth B << 1/\sigma_d.

At 1900 MHz, the multipath delay spread is up to \sigma_d = 250 ns for a cordless phone in indoor and outdoor environments [9]. Hence, we may assume that a channel with bandwidth at least up to B = 400 kHz has frequency-flat fading. The time needed to transmit one symbol at the Nyquist signaling rate is T = 1/B = 2.5 \mu s, resulting in \tau_{max}/T symbols being transmitted during the maximum tolerable delay. Using the rounded values of \tau_{max} in Table 2, we obtain the \tau_{max}/T values listed in the rightmost column of Table 2 for terminal speed \upsilon = 1 m/s and H \in \{1, 2, 4\}.
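Inverting (10) numerically ties these numbers together. The sketch below (ours, using scipy's Bessel function j0 and the brentq root finder) recovers \tau_{max} and \tau_{max}/T for the H = 4 row of Table 2:

    import numpy as np
    from scipy.optimize import brentq
    from scipy.special import j0

    f_D = 19.0 / 3.0     # Doppler shift [Hz] at 1900 MHz and 1 m/s
    T = 2.5e-6           # symbol duration [s] at B = 400 kHz
    rho_min = 0.963      # minimum rho for H = 4 (Table 2)

    # Solve J0^2(2 pi f_D tau) = rho_min on the first lobe of J0.
    tau_max = brentq(lambda tau: j0(2 * np.pi * f_D * tau) ** 2 - rho_min,
                     1e-6, 1.0 / (4 * f_D))
    print(f"tau_max = {tau_max * 1e3:.1f} ms, symbols = {tau_max / T:.0f}")
    # Prints roughly 6.9 ms and about 2760 symbols, as in Table 2.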

V. Conclusion

It has been shown (see Figure 2) that the BER performance may degrade considerably as \rho decreases, which – for a given carrier frequency – corresponds to increasing the terminal speed. However, the degradation can be mitigated by the use of MRC antenna diversity. Still, our results indicate that adaptive coded modulation may be best suited for systems with moderate mobility requirements, with terminals moving at pedestrian speed.

Appendix A – Calculation of \rho

In this appendix we show that the channel power correlation coefficient \rho is given by the expression in (10). The instantaneous received CNR on the channel may be expressed as \gamma = \alpha^2 \cdot K where \alpha^2 is the channel power gain and K = S/(N_0 B). Since E[\gamma] = \sum_{h=1}^{H} E[\gamma_h] = HK\Omega, we have E[\alpha^2] = H\Omega. Assume that \alpha^2 is the power gain at some time t and let \alpha_\tau^2 be the power gain at time t + \tau for \tau > 0. The correlation coefficient \rho between \alpha^2 and \alpha_\tau^2 is then given by

\rho = \frac{cov(\alpha^2, \alpha_\tau^2)}{\sqrt{\sigma_{\alpha^2}^2 \sigma_{\alpha_\tau^2}^2}} = \frac{E[\alpha^2 \alpha_\tau^2] - E[\alpha^2] E[\alpha_\tau^2]}{\sigma_{\alpha^2} \sigma_{\alpha_\tau^2}}.   (20)

The channel gain \alpha^2 is gamma distributed [8, p. 48]. Hence, assuming that the channel power gains \alpha^2 and \alpha_\tau^2 have the same expectations and standard deviations, we have E[\alpha^2]E[\alpha_\tau^2] = (H\Omega)^2 and \sigma_{\alpha^2}^2 = \sigma_{\alpha_\tau^2}^2 = (E[\alpha^2])^2/H = H\Omega^2 in (20).

To calculate E[\alpha^2 \alpha_\tau^2], we first compare two different expressions for the instantaneous received CNR \gamma. When the communication channel is viewed as a Rayleigh fading channel with a H-branch MRC combiner, then \gamma = \sum_{h=1}^{H} \gamma_h = K \sum_{h=1}^{H} \alpha_h^2, where \alpha_h^2 is the power gain on the hth antenna branch. Since we also have \gamma = \alpha^2 \cdot K, it follows that \alpha^2 = \sum_{h=1}^{H} \alpha_h^2 and we can write

E[\alpha^2 \alpha_\tau^2] = E\left[\left(\sum_{h=1}^{H} \alpha_h^2\right)\left(\sum_{i=1}^{H} \alpha_{i,\tau}^2\right)\right] = \sum_{h=1}^{H} E[\alpha_h^2 \alpha_{h,\tau}^2] + \sum_{h=1}^{H} \sum_{i \ne h} E[\alpha_h^2 \alpha_{i,\tau}^2].   (21)

Furthermore, because the signals on different antenna branches (h \ne i) are statistically independent, the covariance cov(\alpha_h^2, \alpha_{i,\tau}^2) = E[\alpha_h^2 \alpha_{i,\tau}^2] - E[\alpha_h^2]E[\alpha_{i,\tau}^2] = 0, or equivalently, E[\alpha_h^2 \alpha_{i,\tau}^2] = E[\alpha_h^2]E[\alpha_{i,\tau}^2] = \Omega^2. The expression in (21) is then equal to

E[\alpha^2 \alpha_\tau^2] = H E[\alpha_h^2 \alpha_{h,\tau}^2] + H(H-1)\Omega^2,

and the correlation coefficient in (20) reduces to

\rho = \frac{E[\alpha_h^2 \alpha_{h,\tau}^2] - \Omega^2}{\Omega^2}.   (22)

Observe that (22) is independent of the number of receive antennas H. In fact, (22) defines the correlation coefficient for a Rayleigh fading channel (without MRC). It is shown in [8, Eq. (2.68)] that the numerator in (22) is equal to \Omega^2 J_0^2(2\pi f_D \tau), and as a result, \rho is given by the expression in (10).

Appendix B – Evaluation of I_1(n)

In the following we calculate the double integral in (11). For the inner integral, BER_n in (3) is fixed since the CNR \gamma is fixed. It follows from (3) that

M_n = -\frac{b_n \gamma}{\ln(BER_n / a_n)}.

Using this expression for M_n and setting D_n = -\ln(BER_n / a_n), the inner integral in (11) is equal to

I_1(n, \gamma) \overset{def}{=} a_n \int_{0}^{\infty} \frac{H}{(1-\rho)\bar{\gamma}} \left(\frac{\gamma_\tau}{\gamma\rho}\right)^{(H-1)/2} \cdot \exp\left(-\frac{H(\rho\gamma + \gamma_\tau)}{(1-\rho)\bar{\gamma}} - \frac{D_n \gamma_\tau}{\gamma}\right) \cdot I_{H-1}\left(\frac{2H\sqrt{\rho\gamma\gamma_\tau}}{(1-\rho)\bar{\gamma}}\right) d\gamma_\tau.   (23)

Introducing the constant

x = \frac{\rho H^2 \gamma^2}{(1-\rho)\bar{\gamma} \left( H\gamma + \bar{\gamma}(1-\rho) D_n \right)}

and making the substitution

z = \left(\frac{H}{(1-\rho)\bar{\gamma}} + \frac{D_n}{\gamma}\right) \gamma_\tau,

the integral (23) can be written as

I_1(n, \gamma) = a_n \left(\frac{H\gamma}{H\gamma + \bar{\gamma}(1-\rho) D_n}\right)^H \cdot \exp\left(-\frac{\rho D_n H\gamma}{H\gamma + \bar{\gamma}(1-\rho) D_n}\right) \cdot \int_{0}^{\infty} \left(\frac{z}{x}\right)^{(H-1)/2} e^{-z-x} I_{H-1}\left(2\sqrt{xz}\right) dz.   (24)

The value of the integral in (24) is equal to Q_H(x, 0) where Q_H(\cdot, \cdot) is the generalized Marcum Q-function of order H [7, Eq. (11.63)]. Since Q_H(x, 0) = Q_1(x, 0) = 1 for all x, we have

I_1(n, \gamma) = a_n \left(\frac{H\gamma}{H\gamma + \bar{\gamma}(1-\rho) D_n}\right)^H \cdot \exp\left(-\frac{\rho D_n H\gamma}{H\gamma + \bar{\gamma}(1-\rho) D_n}\right).   (25)

The double integral in (11) can now be written as

I_1(n) = F(\gamma_n) - F(\gamma_{n+1})   (26)

for

F(\xi) = \int_{\xi}^{\infty} I_1(n, \gamma) \, p_\gamma(\gamma) \, d\gamma.

To calculate F(\xi), we first observe that BER_n in (3) is no longer a constant since \gamma varies. Using the connection D_n = -\ln(BER_n / a_n) = (b_n \gamma)/M_n, it follows from (1) and (25) that

F(\xi) = \frac{a_n}{(H-1)!} \left(\frac{H}{\bar{\gamma}}\right)^H \left(\frac{H M_n}{H M_n + \bar{\gamma}(1-\rho) b_n}\right)^H \cdot \int_{\xi}^{\infty} \gamma^{H-1} \exp(-\beta_n \gamma) \, d\gamma,

where \beta_n is defined by (14). Substituting t = \beta_n \gamma and observing that

\left(\frac{H M_n}{H M_n + \bar{\gamma}(1-\rho) b_n}\right)^H \left(\frac{1}{\beta_n}\right)^H = \left(\frac{1}{\omega_n}\right)^H

for \omega_n defined by (15), we get

F(\xi) = \frac{a_n}{(H-1)!} \left(\frac{H}{\bar{\gamma}}\right)^H \frac{\Gamma(H, \beta_n \xi)}{(\omega_n)^H},   (27)

where \Gamma(\cdot, \cdot) is given by (6). The expression for I_1(n) in (13) is now obtained from (26) and (27).



Appendix C – Evaluation of I_2(n)

We shall calculate the double integral I_2(n) defined by (12). We first split the double integral in two to obtain

I_2(n) = \int_{\gamma_n}^{\gamma_{n+1}} \left\{ \int_{0}^{\gamma_n^*} a_n \exp\left(-\frac{b_n \gamma_\tau}{M_n}\right) p_{\gamma_\tau | \gamma}(\gamma_\tau | \gamma) \, d\gamma_\tau \right\} p_\gamma(\gamma) \, d\gamma   (28)

- \int_{\gamma_n}^{\gamma_{n+1}} \left\{ 2^{-1} \int_{0}^{\gamma_n^*} p_{\gamma_\tau | \gamma}(\gamma_\tau | \gamma) \, d\gamma_\tau \right\} p_\gamma(\gamma) \, d\gamma.   (29)

The second integral (29) is a special case of the first integral (28) with a_n = 2^{-1} and b_n = 0. Hence, we only need to consider the first integral. The pdf p_{\gamma_\tau | \gamma}(\gamma_\tau | \gamma) defined by (9) contains the (H - 1)th-order modified Bessel function of the first kind defined by [7, Eq. (9.28)]

I_\nu(z) = \left(\frac{1}{2}z\right)^\nu \sum_{j=0}^{\infty} \frac{\left(\frac{1}{2}z\right)^{2j}}{(j+\nu)! \, j!}

for \nu an integer. Using this definition, the inner integral in (28) is equal to

a_n \left[\frac{H M_n}{b_n(1-\rho)\bar{\gamma} + H M_n}\right]^H \exp\left(-\frac{H\rho\gamma}{(1-\rho)\bar{\gamma}}\right) \sum_{j=0}^{\infty} \frac{1}{j!(H+j-1)!} \left[\frac{H\rho\gamma}{(1-\rho)\bar{\gamma}}\right]^j \left[\frac{H M_n}{H M_n + (1-\rho)\bar{\gamma} b_n}\right]^j \cdot \gamma_{inc}\left(H+j, \left[\frac{H}{(1-\rho)\bar{\gamma}} + \frac{b_n}{M_n}\right] \gamma_n^*\right),

where \gamma_{inc}(\cdot, \cdot) is defined by (18). The outer integral in (28) is then equal to (17). Since the double integral in (29) is a special case of the double integral in (28), the "correction term" I_2(n) is now given by (16).

References

1 Hole, K J, Øien, G E. Spectral efficiency of adaptive coded modulation in urban microcellular networks. IEEE Trans. Veh. Technol., 50 (1), 205–222, 2001.

2 Hole, K J, Holm, H, Øien, G E. Adaptive multidimensional coded modulation over flat fading channels. IEEE J. Select. Areas Commun., 18 (7), 1153–1158, 2000.

3 Goeckel, D L. Adaptive coding for time-varying channels using outdated fading estimates. IEEE Trans. Commun., 47 (6), 844–855, 1999.




4 Alouini, M-S, Goldsmith, A J. Adaptive M-QAM modulation over Nakagami fading channels. Proc. 6th Communications Theory Mini-Conference (CTMC VI) in conjunction with IEEE Global Communications Conference (GLOBECOM'97), Phoenix, Arizona, Nov. 1997, 218–223.

5 Jakes, W C (ed.). Microwave Mobile Communications. Piscataway, NJ, IEEE Press, second ed., 1994.

6 Gradshteyn, I S, Ryzhik, I M. Table of Integrals, Series, and Products. San Diego, CA, Academic Press, fifth ed., 1994.

7 Temme, N M. Special Functions – An Introduction to the Classical Functions of Mathematical Physics. New York, NY, John Wiley, 1996.

8 Stüber, G L. Principles of Mobile Communication. Norwell, MA, Kluwer, 1996.

9 Ue, T et al. Symbol rate and modulation level controlled adaptive modulation/TDMA/TDD system for high-bit-rate wireless data transmission. IEEE Trans. Veh. Technol., 47 (4), 1134–1147, 1998.


Shannon Mappings for Robust Communication

TOR A. RAMSTAD

Shannon's geometric interpretation of messages and channel representations in communicating time-discrete, amplitude-continuous source symbols is exploited in this paper. The basic idea is to map what we can call source space symbols into channel space symbols, where the two symbols possibly have different dimensions. If we decrease the dimension through this operation, bandwidth reduction is obtained. If, on the other hand, the dimension is increased when mapping from the source to the channel space, bandwidth expansion results. In practical systems several mappings are applied for different dimension changes depending on the importance of the source symbols as well as the available channel resources. The paper also discusses theoretical limits for Gaussian sources and channels, expressed by OPTA (optimal performance theoretically attainable), and compares practical mapping constructions with these limits. Finally, the paper demonstrates how efficient image communication systems can be designed, and shows that much higher robustness towards channel variations can be obtained than is possible for pure digital systems.

Tor A. Ramstad (60) received his Siv.Ing. and Dr.Ing. degrees in 1968 and 1971, respectively, both from the Norwegian University of Science and Technology (NTNU). He has held various positions in the Dep. of Telecommunications at NTNU, where since 1983 he has been a full professor of telecommunications. In 1982–83 he was a visiting associate professor at the University of California, Santa Barbara; he was with the Georgia Institute of Technology in 1989–90 as a visiting adjunct professor, and again at UCSB as a visiting professor in 1997–98. Dr. Ramstad's research interests include multirate signal processing, speech and image processing with emphasis on image and video communications, where joint source-channel coding is a central topic.

[email protected]

1 Shannon Mappings

Modern telecommunications are to a large degree based on the works of Nyquist and Shannon. The present work is no exception, but it is specifically related to a less known general idea presented by Shannon in his famous 1949 paper [33]. There he suggests mapping time discrete signals from a continuous multidimensional source space to a continuous channel space of different dimensionality. By this mechanism one can obtain bandwidth compression, which will increase the number of simultaneous users on a physical channel, or bandwidth expansion if it is necessary to increase the received signal fidelity due to a noisy channel.

When we initially started the work presented in this article, we were not aware of Shannon's idea. When we discovered that Shannon had introduced the geometrical mappings in his 1949 paper [33], we were pleased and assumed we were on the right track. To honor Shannon, we would like to propose to call the mappings used in this paper Shannon mappings. Shannon did not pursue his idea, partly because technology was not ready for designing and implementing such systems at the time. Today the situation is quite different.

The mapping description is very general and leads to a deeper understanding of the general communication problem. One can cast such techniques as quantization and modulation into this framework. The term joint source-channel coding is often encountered in modern literature. The mapping idea can probably incorporate all the different techniques suggested for source-channel coding, and it is probably the most general way of looking at the entire communication problem.

In this article we will present the general ideas, introduce some simple examples of how to construct mappings for special sources and channels, and show complete systems for image transmission.

2 Introduction to the Communication Problem

The basic objective in efficient communication of natural signals like speech, images, and video can be stated as: optimizing for conveying as many signals as possible with a specified quality over a given physical channel. The channel is usually band-limited, it has certain power and/or amplitude constraints, e.g. due to battery lifetime and regulations for radio transmission, and the channel will usually distort the signal by insertion of different noise types, like thermal noise or contamination by crosstalk from other electrical signals, or linear and nonlinear distortions due to non-ideal behavior of the channel.

The exploitation of a given physical channel can be influenced by different measures. These include signal adaptation to the channel characteristics, driving power, relay amplification/regeneration, and optimal receivers. Given the channel capacity, the number of useful signals that can be transmitted is also influenced by signal compression. However, the amount of compression is limited by the unavoidable noise generated when amplitude-continuous signals are compressed.

Under certain simplifying, but rather realistic conditions, the channel capacity can be calculated [33]. To approach the channel capacity, which guarantees lossless transmission of multilevel signals, one has to resort to systems of infinite complexity, which also require infinite


delay. In such a channel, the compression operation can be viewed as a separate problem [32, 34] due to Shannon's separation theorem. To obtain a minimum digital representation, infinite complexity is also required for the codec.

The aim of this paper, however, is to contribute to developing understanding for the possible gains which can be reached by doing joint compression and channel coding, when we constrain the allowed complexity and delay. We stress that very good robustness is obtained for the proposed systems without resorting to explicit error protection.

2.1 Discrete Transmission over Noise-free, Band-limited Channels

Throughout the discussion we assume that all useful signals are either ideally band-limited, or can be made close to that through lowpass filtering without loss of subjective quality. (The passband of the filter can be freely chosen.) This enables us to sample the signal at or above the Nyquist rate without loss of information. If the bandwidth is B, and the minimum necessary sampling frequency is F_s = 2B, then the sampled signal can be transmitted over an ideal and noise-free Nyquist channel with bandwidth W = B. That is, the time discrete signal requires the same bandwidth as the original analog signal.

In practice, we need to relax this assumption somewhat. First of all, some oversampling is necessary to allow for realistic low-pass anti-aliasing filters, and also, any Nyquist channel needs some roll-off factor in order to obtain finite order filters. We can account for both effects by including some oversampling factor \alpha implying that the practical channel bandwidth must be W = \alpha B. \alpha is a design parameter that can be made close to 1 by increasing the filter complexities.

2.2 Limits in Signal Communication

The above discussion on error free transmission specifies the number of samples that can be transmitted for a given channel bandwidth. It does not include limitations induced by noise, which is an omnipresent phenomenon. As we shall see, noise-free transmission would imply infinite channel capacity!

The second important aspect in communications is how signals carry information and how they can be represented by samples that can be conveyed through the physical channel. Actually, from a mathematical point of view any analog signal contains infinite information, it is only when accepting noise contamination that finite information results. However, this is not a disaster, as we shall see that the human observer is quite tolerant towards certain degradations.

2.2.1 Distortion and Rate

Natural signals conveying meaningful information to humans invariably have inherent sample-to-sample dependencies. Such dependencies are usually characterized as redundancy. Furthermore, human perception is not capable of distinguishing signals with certain distortions from the pure original. Even observable distortion is tolerated. Most communication systems rely heavily on the user tolerance to distortions. The amount of imperceivable distortion is often called irrelevancy. The amount of acceptable distortion can be called noise tolerance.

To assess distortion we need distortion measures. Such measures should agree as much as possible with the perception of the receiver. When for natural signals the receiver is a human being, the distortion measure should mimic human perception. This turns out to be rather complex both for visual and auditory perception. Although it does not reflect human perception well, the most common distortion measure is the mean squared error (mse). If the original and reconstructed signals are given by vectors with components x_i, i = 1, 2, 3, \ldots, M and \hat{x}_i, i = 1, 2, 3, \ldots, M, respectively, the mse is defined by

D = \varepsilon\left\{\frac{1}{M}\sum_{i=1}^{M}(x_i - \hat{x}_i)^2\right\},   (1)

where \varepsilon is the expectation operator.

If we have a good signal model, we can, according to Shannon, evaluate the necessary rate for obtaining a given distortion. A Gaussian source of independent and identically distributed samples has a simple rate distortion (R-D) function measured in bits/sample

R = \begin{cases} \frac{1}{2}\log_2\left(\frac{\sigma_X^2}{\sigma_D^2}\right), & \text{for } \sigma_D^2 \le \sigma_X^2, \\ 0, & \text{for } \sigma_D^2 > \sigma_X^2, \end{cases}   (2)

where \sigma_X^2 is the signal power and \sigma_D^2 is the accepted distortion. As the rate can never be negative, the limiting case means that the noise is set equal to the signal. That is, it is preferable to transmit nothing for this case. The expression \sigma_X^2 / \sigma_D^2 is called the signal-to-noise ratio (SNR), and is usually measured in dB.

With correlated sources, the R-D function is more complex. The main point is that due to the interrelation between the samples, a correlated signal can be coded at a lower rate for the same distortion.

Although we measure the rate in bits/sample, the formula does not imply that we must quantize


the signal. This is only a measure of information, that is, it measures the information in a signal when a certain noise level is acceptable.
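As a small numeric illustration of (2) (our own sketch, not from the paper): each additional bit per sample buys about 6.02 dB of SNR for the Gaussian i.i.d. source.

    import math

    def rate_bits_per_sample(snr_db):
        """R-D function (2) for a Gaussian i.i.d. source; SNR = sigma_X^2 / sigma_D^2."""
        snr = 10 ** (snr_db / 10)
        return 0.5 * math.log2(snr) if snr > 1 else 0.0

    for snr_db in (6, 12, 24, 48):
        print(f"{snr_db} dB -> {rate_bits_per_sample(snr_db):.2f} bits/sample")
    # 6 dB -> 1.00, 12 dB -> 1.99, 24 dB -> 3.99, 48 dB -> 7.97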

2.2.2 Channel Capacity

The Nyquist theorem tells us how many samples can be transmitted given the bandwidth. It does not tell us how much information can be conveyed by each sample when the channel has a given signal-to-noise ratio. Shannon's channel capacity provides this measure.

The simplest case is when the channel is memoryless and power limited, that is, the channel power is given by \sigma_C^2, and the channel is corrupted by white, Gaussian noise with power \sigma_N^2. Then the channel has a capacity measured in bits per sample:

C = \frac{1}{2}\log_2\left(1 + \frac{\sigma_C^2}{\sigma_N^2}\right).   (3)

\sigma_C^2 / \sigma_N^2 is called the channel signal-to-noise ratio (CSNR), and is also usually measured in dB.

It is again more complicated to find the capacity of channels with other constraints and memory as well as other noise types. For our purpose this channel will illustrate our main points.

2.2.3 Optimal Performance Theoretically Attainable (OPTA)

OPTA is a measure for the limits for efficient communications. It can be derived from the rate-distortion function and the channel capacity.

The simplest case occurs when we transmit one source sample per channel sample. This means that the source rate has to be equal to the channel capacity

R_s = \frac{1}{2}\log_2\left(\frac{\sigma_X^2}{\sigma_D^2}\right) = C = \frac{1}{2}\log_2\left(1 + \frac{\sigma_C^2}{\sigma_N^2}\right),   (4)

which implies that

\frac{\sigma_X^2}{\sigma_D^2} = 1 + \frac{\sigma_C^2}{\sigma_N^2}.   (5)

This means that the signal-to-noise ratio in the received signal is given by the signal-to-noise ratio on the channel plus 1.

An interesting fact is that unlike almost all other cases, the OPTA can be reached using a very simple implementation.

A possible system model is given by Figure 1. The signal samples are transmitted without any modification over a Nyquist channel with additive Gaussian noise. We reach the OPTA condition for this system when the signal is also Gaussian by selecting

\beta = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_N^2}.   (6)

Remember that \sigma_C^2 = \sigma_X^2 for the above case.

If we try to construct an optimal system based on the separation theorem for this case, we would have to apply an infinite-dimensional vector quantizer for the source coder followed by a channel coder with infinite delay.

In the above case the signal bandwidth and the channel bandwidth are the same. If, for some reason, this is not acceptable, there must be a sample rate change. This is easily understood if we consider rate per time unit rather than rate per symbol. The source produces 2B samples per second, which indicates that its rate is given by 2BR_s bits per second. Likewise, the channel can transmit 2W symbols per second, which gives a capacity of 2WC bits per second. Setting these two rates equal, we obtain OPTA for the general case (still assuming the same simple source and channel models):

2B \cdot \frac{1}{2}\log_2\left(\frac{\sigma_X^2}{\sigma_D^2}\right) = 2W \cdot \frac{1}{2}\log_2\left(1 + \frac{\sigma_C^2}{\sigma_N^2}\right).   (7)

Solving this equation with respect to the source signal-to-noise ratio, we obtain

\frac{\sigma_X^2}{\sigma_D^2} = \left(1 + \frac{\sigma_C^2}{\sigma_N^2}\right)^{W/B}.   (8)

We observe that the bandwidth relation W/B enters the equation. This bandwidth change can be obtained only by dimension change where M source samples are combined into K channel samples, where K/M = W/B.
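The OPTA curves of Figure 2 follow directly from (8); a sketch (ours) evaluating the received SNR in dB for a few bandwidth ratios:

    import math

    def opta_snr_db(csnr_db, w_over_b):
        """Received SNR in dB from (8) for a given CSNR and bandwidth ratio W/B."""
        return 10 * math.log10((1 + 10 ** (csnr_db / 10)) ** w_over_b)

    for ratio in (0.5, 1.0, 2.0):   # 2:1 compression, 1:1, 1:2 expansion (W/B = K/M)
        print(f"W/B = {ratio}: {opta_snr_db(20.0, ratio):.1f} dB at CSNR = 20 dB")
    # W/B = 0.5 -> 10.0 dB, 1.0 -> 20.0 dB, 2.0 -> 40.1 dB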

In Figure 2 the OPTA-curves for different sample compression ratios are given. We have included dimension increase in the plots, which gives sample rate expansion, as well as dimension reduction. We mentioned initially that the

Figure 1 Optimal system with no bandwidth alteration [x(k) plus channel noise n(k) of power \sigma_N^2 is scaled by \beta to give y(k)]




ultimate aim is to make rate or bandwidth reduction, but this can be obtained only if the channel has a sufficient CSNR for the required received signal quality. If the channel is not good enough even without compression, sample rate increase is necessary.

Practical signal sources are usually decomposed into sub-sources with different variances and even time variations. Each source would normally require different SNR, implying that some sub-sources can be compressed, others can be sent unmodified, while some have to be expanded. Actually, some sub-sources are insignificant and can be skipped altogether. This is a natural consequence of the condition that R_s = 0 for \sigma_D^2 = \sigma_X^2 in Equation 2. The average rate required by all these sub-sources is the interesting number for assessing the efficiency of the system.

This theory gives the limits for any single-source single-channel system, but it does not tell us how to get close to these limits. The rest of this paper will provide examples of possible methods where nonlinear mappings are used for dimension change. In most of these methods no bit representation is used, but it is convenient to compare the results to more traditional results where bits give an intermediate representation form.

3 Geometric Description of Dimension Change

Define the input signal as a vector x consisting of M components. The vector can be represented geometrically as a point in a Euclidean space of dimension M. Mathematically we say that x \in R^M. Then the vector components x_1, x_2, \ldots, x_M are the coordinate values of the vector. A dimension change can be performed by a mapping \Psi from the space of dimension M to a space of dimension K (\Psi : R^M \to R^K) as

y = \Psi(x).   (9)

The operator \Psi is generally nonlinear, but we can occasionally make use of linear operators.

Mappings without dimension change (K = M), and mappings with dimension increase (K > M) can be made invertible, whereas dimension reducing mappings (K < M) for all practical purposes involve approximative operations, and are thus not invertible.

It is instructive to split the mapping operation into two parts; an approximation operation q, which maps the complete space onto a subspace, and the dimension-changing operation T, which is an invertible operation:

\Psi = q \circ T.   (10)

The operations are illustrated in Figure 3. The operation q is, of course, an identity operation when \Psi is invertible.

The simplest way of doing dimension reduction is to skip some of the vector components.

Although dimension increase can always be done without approximation, the most common way of increasing the signal dimension is by quantization, which is not an invertible operation.

3.1 Quantization in Source-to-Channel Mappings

Quantization must be applied for making a digital representation from an analog signal. We will discuss the most general form called vector quantization, and show that the principle is very simple and closely related to the mappings discussed in this paper.

Figure 2 OPTA curves with M/K as parameter [SNR (dB, 0 to 50) versus CSNR (dB, 0 to 25) for ratios 1:2, 1:1, 3:2, 2:1, 3:1, 4:1]

Figure 3 Mapping from higher to lower dimension as a two-stage process: (x_1, \ldots, x_M) \to q \to (w_1, \ldots, w_M) \to T \to (y_1, \ldots, y_K). q indicates approximation while T represents the mapping between dimensions


A vector quantizer can be split into an approximation part q and a part which maps to a discrete representation. This is an intermediate representation from which we can map to the channel in many ways.

q is a mapping of x \in R^M to a finite set W = \{w_0, \ldots, w_{L-1} \,|\, w_i \in R^M\} of representation vectors:

q : R^M \to W.   (11)

The mapping q can be defined on a partition of R^M into L non-overlapping, M-dimensional cells C_i according to the rule

x \in C_i \Rightarrow q(x) = w_i, \quad i = 0, 1, \ldots, L - 1,   (12)

where C_i satisfies

\bigcup_{i=0}^{L-1} C_i = R^M \quad \text{and} \quad C_i \cap C_j = \emptyset \ \text{for} \ i \ne j.   (13)

In a VQ setting the collection of representation vectors is referred to as the codebook. The cells C_i, called Voronoi regions, can be thought of as solid polygons in the M-dimensional space R^M.

The index i identifies that vector uniquely and is therefore an efficient representation of the vector. The vector can be reconstructed approximately as x \approx w_i by looking up the representation vector in cell number i. Thus, the bit rate in bits per sample in this scheme is b = \log_2(L)/M. A further bit reduction can also be obtained by entropy coding of the indices, provided the symbols have different probabilities.

The next step is to design the channel representation. The simplest example is transmitting the indices bitwise on a binary channel using K = \log_2(L) binary symbols. We then obtain a mapping \Psi : R^M \to R^{\log_2(L)}. Whether this is a bandwidth expansion or reduction depends on whether \log_2(L) \gtrless M.

With reference to Figure 3 the dimension-changing mapping T consists of the index assignment and the channel representation of the index.

But there are many other ways of making channel representations of the index. One could group the bits in the index into, say b_1 bits, and transmit the message by 2^{b_1}-level signals. This would reduce the bandwidth by a factor of b_1. Often we want to protect the bits using forward error correction (FEC), in which case we add parity bits, thus increasing the necessary channel bandwidth.

Let us analyze the simplest system of all where we use scalar, uniform quantization (M = 1) with b bits per sample and binary signaling on the channel. Uniform quantization requires that all Voronoi regions are of length \Delta on the real line.

The reconstructed signal can then be written

\hat{x} = \text{sign}(x) \, \Delta \sum_{i=1}^{b-1} a_i 2^{i-1},   (14)

where the a_i's represent the bits received with values 0 and 1, and b is the total number of bits when we reserve one bit for the sign of x.

Consider now the result of a bit error when the a_i's are transmitted directly on the binary channel. An error in the least significant bit hardly influences the reconstruction, whereas a bit error in the most significant bit changes the sign of the signal and thus can change the value greatly if the amplitude is large at the same time. This problem relates to what Shannon calls the threshold effect (see below). The advantage of this method is that we require only a modest CSNR to get a very low bit error rate.

If we, on the other hand, transmit the complete index directly using L = 2^b channel levels, a transition from one state to another only makes a change in the least significant bit. This requires a much higher CSNR than for binary transmission, but at the same time guarantees a moderate channel noise influence on every sample.

Even though we can represent the process of going from the real line via quantization to some channel representation in many steps, the two important steps are the approximation step which maps the real line to a discrete subset on the real line, and the mapping from these discrete values to a discrete channel space of possibly higher dimension.

We claimed earlier that dimension increase does not have to involve any approximation. And as a matter of fact, the quantization step is not necessary. We will return to this shortly.

We will now leave methods based on Shannon's separation theorem and study mappings in greater detail and present some important examples. We start with a discussion of methods suggested by Shannon.

3.2 Ideas in Shannon's Paper

Shannon, in his 1949 paper [33], suggests a mapping which is reproduced in Figure 4. If used for signal expansion, the length along the curve from some reference point represents the source sample amplitude, while the two coordinates of any point on the curve gives the two channel samples, which can be represented as a QAM symbol. Channel noise will disturb the



signal. If, in the receiver, the decision device projects the noise corrupted sample down to the closest point on the curve, then small noise samples will only move the signal along the curve and thus produce insignificant change. If the channel noise reaches a certain level, there is a probability of crossing to a neighboring line. This is what Shannon refers to as the threshold effect. This effect will be encountered for all non-linear expanding mappings.

Shannon also mentions that the same mapping can be used for signal compression. Then the two coordinates represent the signal vector, and are approximated by the closest point on the curve. The channel symbol can be selected as the distance along the curve from a reference point to the approximation point on the curve.

We will demonstrate that Shannon-like mappings work quite well for signal expansion, and argue that they may also perform well for compression if the signal is uniformly distributed over the square.

Shannon suggested another algorithm for signal compression. For the special case of 2:1 mapping (\Psi : R_+^2 \to R_+^1) he first represents the two components in decimal or binary form:

x_1 = 0.c_1 c_2 c_3 \ldots
x_2 = 0.d_1 d_2 d_3 \ldots,   (15)

where c_i and d_i, i = 1, 2, 3, \ldots are the digits. The one-dimensional signal y is obtained by picking digits from x_1 and x_2 alternately and inserting them into y as

y = 0.c_1 d_1 c_2 d_2 c_3 d_3 \ldots   (16)

In this way it is possible to obtain an exact representation of any two real numbers by one real number using an infinite number of digits. In a communication system y can be transmitted as one sample.

It is important to note that if x_1 and x_2 are both described by b bits, then y needs 2b bits to be exactly represented. Assume we transmit first the original samples using one multilevel symbol for each component. This would require 2^b levels for each symbol. If y is transmitted as a multilevel symbol, the required bandwidth is reduced by a factor of 2, but the number of levels now needs to be 2^{2b} for exact representation. For transmission over noisy channels this would require approximately twice the signal-to-noise ratio measured in dB for transmitting y compared to transmitting the original symbols x_1 and x_2. So if we define a bandwidth-SNR product (Hz \times dB), it remains unchanged for this mapping operation.

It is easy to generalize this method to any mapping \Psi : R^M \to R^1 or even \Psi : R^M \to R^K.

In order to compare this system with others, we cast the algorithm into geometric form. Assume that the two components have a support x_k \in [0, 1], k = 1, 2, which is indicated by the way we have written the numbers in Equations 15 and 16. Also represent the components with a finite number of bits, which obviously is the same as quantization. Then mappings using 3 and 4 bits are shown in Figure 5 in the upper and lower parts, respectively.

The lines connecting the quantization values indicate how y is constructed. The numeric value of y is proportional to the number of nodes which is traversed from the origin to the actual vector point. From the two figures one easily deduces a way to construct systems with more bits in a systematic way. In the limit when b \to \infty, the whole space will be densely populated by points, and thus the accuracy of the compressed signal is also guaranteed.

Figure 4 One-to-two-dimensional mapping suggested by Shannon [figure annotation: uncertainty due to noise]

Figure 5 Cantor maps for signal dimension change representing the mapping given by Equations 15 and 16 [two panels: x_2 versus x_1 on the unit square]


In the next two sections we will dig deeper into the geometric construction of mappings for dimension change in communication systems.

3.3 Dimension Reduction

Let us discuss further how we can make exact representations of a signal vector by using a vector of a lower dimension. We consider 2:1 mappings which try to represent the 2-D space, which is a plane, by use of a one-dimensional continuous curve. This is somewhat different from the previous example where we used a discrete subset as an intermediate representation. This new problem is related to what is called space-filling curves, or Peano-curves after the inventor.

Assume that the region of support for the 2-D signal is a unit square. Figure 6 shows how the classical Hilbert-curve can be constructed to fill the entire space. To be able to represent any point exactly, an infinite number of iterations has to be used.

A possible one-dimensional representation of any point in the two-dimensional space is the distance from the entrance-point of the curve to the point on the curve which coincides with the coordinates of the vector. It is obvious that this distance will, in general, be infinite. To cope with this problem for practical use, we must resort to a limited number of iterations, which would mean that we can only represent most points of the space approximately, but the distance to all points will then be finite.

3.3.1 Signal Approximation

We can conclude that it is necessary to make approximations when we lower the signal dimensionality and want to transmit the resulting signal components over a channel with finite signal-to-noise ratio.

The subspace generated by q in Figure 3 must have certain topological properties suited for subsequent mapping to the lower dimension in the operator T.

A very good continuous mapping for the case when the region of support is a disk of radius 1, is a double spiral composed of two spirals of Archimedes. The spirals can be described parametrically as

x_1 = \frac{2\Delta\theta}{2\pi}\cos(\theta), \quad x_2 = \frac{2\Delta\theta}{2\pi}\sin(\theta),   (17)

and

x_1 = \frac{2\Delta\theta}{2\pi}\cos(\theta + \pi), \quad x_2 = \frac{2\Delta\theta}{2\pi}\sin(\theta + \pi).   (18)

Figure 6 The first 4 iterations for constructing a space-filling Hilbert-curve

Figure 7 Double spiral of Archimedes. The orange spiral can be represented by positive channel symbols, while the grey spiral can be represented by negative channel symbols. The star represents a 2-D input vector. The channel representation is obtained by approximation to the closest point on the spiral indicated by a circle. The channel adds noise to the channel sample, which in turn causes the representation point to move along the spiral, as indicated by the diamond




Figure 8 Overall system model

Figure 9 Scatter plots of Gaussian signals with standard deviation \sigma_X = 1.0 inside the spiral

[Figure 8: (x_1, \ldots, x_M) \to q \to (w_1, \ldots, w_M) \to T \to (y_1, \ldots, y_K); channel noise (n_1, \ldots, n_K) is added, and the receiver R produces (\hat{x}_1, \ldots, \hat{x}_M). Figure 9: scatter plot over the (x_1, x_2) plane]

\theta is the rotation angle to the point [x_1, x_2] relative the curves' derivative at the origin. 2\Delta\theta/(2\pi) is the radius to the point, while \Delta is the distance between two neighboring crossings of the real axis of either of the two spirals.

A spiral example is shown in Figure 7. Any 2-D vector, as e.g. the value represented by the star, will be approximated to the closest point on one of the spirals, in this case the point indicated by the circle. This is very similar to vector quantization. In VQ the approximations are points in the space, while here it is a continuous curve.
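The approximation step is easy to sketch in code (our own illustration of (17)-(18), not from the paper; it scans the rotation angle \theta on a fine grid rather than solving the projection exactly):

    import numpy as np

    DELTA = 0.5  # distance between real-axis crossings of one spiral (assumption)

    def spiral_point(theta, branch):
        """Point on the double spiral (17)-(18); branch = 0 or 1 adds a pi offset."""
        r = 2 * DELTA * theta / (2 * np.pi)
        phi = theta + branch * np.pi
        return np.array([r * np.cos(phi), r * np.sin(phi)])

    def approximate(x, theta_grid=np.linspace(0.0, 8 * np.pi, 20001)):
        """Closest spiral point to the 2-D vector x; returns (theta, branch)."""
        best = None
        for branch in (0, 1):
            pts = np.stack([spiral_point(t, branch) for t in theta_grid])
            d = np.linalg.norm(pts - x, axis=1)
            i = np.argmin(d)
            if best is None or d[i] < best[0]:
                best = (d[i], theta_grid[i], branch)
        return best[1], best[2]

    theta, branch = approximate(np.array([1.3, -0.7]))
    # The channel value could then be +/- the arc length (or the angle theta),
    # with the sign chosen by the branch.
    print(theta, branch)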

The transform T can be chosen in many ways. One possibility is to make y equal to the length from the approximation point on the spiral along the spiral to the origin. One spiral can be represented by positive amplitudes while the other can be represented by negative channel samples. Or one could select y to be the rotation angle \theta.

To fill the entire space with the two curves we need to let \Delta \to 0. Then the spiral length will be infinite.

It is also easy to conceive systems for making space-filling curves for mappings from higher dimensions to one dimension. A one-dimensional approximate representation of a three-dimensional sphere can be visualized by a ball of yarn. The thread is the curve, and a one-dimensional representation of a point within the sphere could be the length of the thread to the closest possible point on the thread.

T can be generalized to any non-linear length adjustments, as if the thread in the ball of yarn was made from rubber, and could be stretched unevenly.

3.3.2 Channel Noise

In the complete communication chain, the channel noise is the next obstacle when we want to maximize throughput with a given fidelity. A more complete signal chain is shown in Figure 8.

In the model we represent the channel noise by the vector n. At the receiver an approximate inverse operation, R, tries to minimize the noise effect while preserving the original signal as well as possible. The received signal vector is thus

\hat{x} = R \circ (n + T \circ q)(x).   (19)

Let us try to get more insight into the noise problem by studying the spiral example further. Figure 7 shows the effect of approximation and channel noise. As the amplitude of the transmitted signal is corrupted by noise, the reconstruction will be inexact. If the channel sample was originally obtained as the length along the spiral to the approximation point, the received signal will give the length along the spiral to the decoded point as indicated in Figure 7 with a diamond.

How do the two noise contributions interact? It is obvious that the approximation noise depends on the density of the spiral relative to the signal components' standard deviation. We illustrate this in Figure 9. The noise is more severe when the standard deviation of the signal is small relative to the density of the spiral.

But the denser the spiral becomes, the larger the channel amplitudes get. If the channel power (or channel amplitudes, if channel is amplitude limited) is too large, a downscaling is necessary. In the receiver an upscaling must be applied, which also increases the noise with the upscaling factor.

We observe a typical trade-off: If we lower the approximation noise by making the spiral denser, the influence from the channel noise will become more severe, and vice versa. There



exists a balance between the two contributions which will make the system optimal for a given channel signal-to-noise ratio (CSNR). Figure 10 shows results from several simulations when transmitting Gaussian, white noise over a Gaussian, memoryless channel using different spirals and different channels.

It is quite interesting that the closest point from each of the simulated curves to OPTA is in the range of 1 – 2 dB. By studying the background material more closely, another interesting conclusion is that at the closest point to OPTA the relation \sigma_N/\Delta = 0.35 holds approximately for a large range of conditions when no scaling of the PAM symbols was performed.

Another important aspect of the double spiral approximation shown in Figure 9 is that it runs through the origin and covers the plane in a symmetric manner for the negative and positive channel amplitudes (drawn as grey and yellow lines, respectively). This implies that the channel representation is symmetric (provided that the source samples are rotation symmetrically distributed), and that the transmitted power is low because the probability density function is peaky at the origin for Gaussian signals, and will be represented by small channel amplitudes.

We can summarize the important requirements for obtaining a good dimension reducing mapping:

1. The mapping should cover the entire space in such a way that any point is mapped to a close representation point.

2. Probable source symbols should be mapped to channel symbols with low amplitudes to minimize the average channel power.

3. Signals that are close in the channel space should remain close when mapped back to the source space. This will prevent small channel noise samples from inducing large errors in the decoded signal. The opposite is not necessary. Two close source samples may well be mapped to entirely different regions in the channel space.

Based on the above observations, will Shannon's mapping in Figure 4 perform well for dimension reduction? For the region of support shown in the figure, the curve covers the space quite nicely. If we pick the center of the figure as the reference point for zero channel amplitudes, the power requirement will also be satisfied if the probability density function is uniform over the square. On the other hand, if the signal is Gaussian, then the channel power will be higher for this mapping than for the spiral mapping. Finally, the channel noise will perturb the signal in an acceptable way. Altogether, these speculations indicate that Shannon's original mapping would perform fairly well, especially for uniformly distributed signals.

3.4 Dimension Increase

Dimension increase becomes necessary to improve the fidelity in transmitting signal amplitudes over noise prone channels.

A well-known mapping is obtained by transmitting a signal amplitude K times and averaging the output signal. This is therefore a 1:K mapping. This will improve the signal-to-noise ratio in the receiver if each of the samples is contaminated by independent noise components. This is a repetition code.

It is easy to conclude that we gain 3 dB in SNR per doubling of K. A plot of the SNR versus CSNR using a repetition code for K = 2 is shown in Figure 11. There is a striking difference between the obtained performance and OPTA except at very low rates.
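The 3 dB per doubling is quickly verified numerically (our own sketch): averaging K independent noisy copies divides the noise power by K.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(0.0, 1.0, 200_000)
    for K in (1, 2, 4):
        # Repetition code: send x K times, average the K noisy receptions.
        y = np.mean(x + rng.normal(0.0, 0.5, (K, x.size)), axis=0)
        snr_db = 10 * np.log10(np.var(x) / np.mean((y - x) ** 2))
        print(f"K = {K}: SNR = {snr_db:.1f} dB")
    # Each doubling of K adds about 3 dB.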

Let us take a closer look at the above repetition code in terms of geometry and try to find the reason for its poor performance. Figure 12 shows the repetition code when K = 2. Both

Figure 10 SNR versus CSNR for different values of \sigma_X/\Delta = [0.1, 1.0, 2.0, 4.0]. The OPTA-curve is also shown for the 2D-1D mapping

Figure 11 Comparison of the performance of a code where the samples are sent twice to the OPTA-curve for two times bandwidth expansion


channel components are equal, which implies that they lie on a diagonal line in the square channel space. It is immediately clear that the channel space is not well exploited. Most of the space is empty!

3.4.1 Error-free Mapping \Psi : R \to R^2

Although expanding mappings can certainly be devised for dimension change by rational ratios, we limit our discussion to the case \Psi : R \to R^2.

A very simple error-free expanding mapping uses one component that represents a discretized version of the signal and an extra component that represents the difference between the exact signal and the discretized signal.

The discretized signal can be transmitted as aPAM multilevel signal

y1 = K1q(x), (20)

where K1 is a scaling factor. The second compo-nent is the correction term which can be trans-mitted as a continuous PAM signal

y2 = K2(x – q(x)),

where K2 is a second scaling factor. The poweris distributed among the two samples throughthe scaling factors. The quantizer is optimizedboth for decision and representation levels.
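A minimal sketch of this hybrid mapping, assuming a uniform quantizer; the step size and the scaling factors below are illustrative, whereas the paper optimizes them for the operating CSNR:

```python
import numpy as np

Q = 0.5            # quantizer step (hypothetical value)
K1, K2 = 1.0, 4.0  # power allocation between the two components

def q(x):
    """Uniform quantizer with representation levels at the cell midpoints."""
    return Q * (np.floor(x / Q) + 0.5)

def encode(x):
    return K1 * q(x), K2 * (x - q(x))   # discrete part + analog correction

def decode(y1, y2):
    # Re-quantizing y1/K1 rejects small noise on the discrete component;
    # the analog correction is then added back.
    return q(y1 / K1) + y2 / K2

x = 1.23
y1, y2 = encode(x)
print(decode(y1 + 0.01, y2 + 0.01))   # close to x for small channel noise
```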

Two typical resulting mappings (from [2]) that have been optimized for different CSNRs are shown in Figure 13. Observe that the orange lines are not part of the mappings, but illustrate the connection between the different parts. The performance of this type of mapping is given in Figure 14. It performs much better than the repetition code presented in Figure 11!

To approach the mapping suggested by Shannon, every second of the horizontal lines in the previous mapping is reversed, resulting in the optimized system in Figure 15. The performance of this system is slightly worse than the system in Figure 13. The original mapping suggested by Shannon is thus expected to perform quite well for signal expansion.

4 Joint Source Coding and Modulation Incorporating Signal Decomposition

Real-world signals are much more complex than the signals we have studied so far. As a matter of fact, the type of information conceivable by human observers requires structures in the signal that involve statistical variations and sample dependencies. This can be modeled as signals with short-term statistics, such as spectra, that change from position to position.

Figure 12 Repetition expansion from one to two dimensions (Ψ : R → R2)


Figure 13 Optimized 1D-to-2D mapping for CSNR = 20 dB (top) and 3 dB (bottom)


Figure 14 Performance of the 1D-to-2D mapping as a function of the CSNR when the system is optimized for each point on the curve. The upper curve is OPTA


Figure 15 “Shannon-look-alike” mapping


The non-white spectra, which account for sample dependencies, imply that some signal decomposition should be applied to decorrelate the signal before any mapping takes place. The variabilities must be accounted for by some type of adaptivity.

Signal decomposition can be performed in frequency selective filter banks. If non-overlapping bands are implemented, the output subband channels will be uncorrelated. With many channels the bands will be narrow, which implies that each band also will be close to white. The filter bank outputs have different power in each channel, which also varies with position.

A complete system for signal transmission including signal decomposition is shown in Figure 16. In this system the outputs from the filter bank are analyzed and classified. The classification information is used to select different mappings that are pre-optimized for the encoder.

4.1 System Example
In the following we present an example taken from [2]. We give a brief review of the most important aspects of the system. Details can be found in the thesis.

In this system the signal decomposition is performed in a filter bank using “System K” from [1]. This is a separable filter bank which in each dimension first splits the signal into 8 subbands of equal size. The resulting low-pass band is further split dyadically into three stages. The filter coefficients were found by optimizing for coding gain.

Altogether 5 different mappings are used in the coder. These are, in terms of their K:M ratio, 1:4, 1:2, 2:3, 1:1 and 2:1. The dimension reducing mappings are optimized for Laplacian sources, because this corresponds most closely to the actual distribution. The optimization method was developed by Fuldseth [6]. It optimizes a discrete set of points for minimum MSE, taking both the approximation noise and the channel noise into account. That is, the mappings are optimized for a certain CSNR. The 2:1 mapping is shown in Figure 17. Notice that the mappings have a “spiral form”. If we drew an optimized spiral for a Gaussian source, it would look even more like Archimedes’ spirals.

The 1:1 mapping is based on Equation (6), although it is optimal only for Gaussian sources. In [2] it is shown that the deviation from OPTA is slight even when the source is Laplacian. The 1:2 mappings are of the type shown in Figure 13.

A very important part of the coder is the mapping allocation [2] included in Figure 16. This is a mechanism for using the available resources optimally. The resources in this case are bandwidth and channel power. Bandwidth is directly related to the different mapping ratios, while power is a consequence of the mappings, but can be altered by including scaling factors when inserting the signal components into the channel. In this coder the allocation is based on the local variances in each of the subbands.

Some digital side information must be conveyed to the receiver in order to inform about which mappings have been used. Full error protection is provided for this information to make sure that it is not lost before the noise destroys all meaningful information anyway.

Figure 16 Signal communication system including signal decomposition and mapping allocation

Figure 17 2:1 mapping for a Laplacian source. The mapping was optimized for a white, Gaussian channel with CSNR = 23.1 dB. 256 points were used for the optimization


Channel resources are also allocated to this part of the message.

It is difficult to make a completely fair comparison to other systems. In [2] the reference system uses the JPEG2000 part 1 baseline coder (ISO/IEC, 2001), implemented in verification model 8.0. Using QAM modulation with Gray coding and low-density parity check codes with soft decision, performance 3 dB away from the channel capacity can be achieved for an additive, white, Gaussian noise channel [28]. By assuming this model and error-free performance down to this CSNR, and breakdown after this point, the simulation results for two images are shown in Figure 18. PSNR means Peak Signal-to-Noise Ratio, defined as the squared peak signal value divided by the noise power, measured in dB. It is a common quality measure for image evaluation.
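For reference, a minimal sketch of the PSNR computation (the 8-bit peak value 255 is an assumption typical for grey-scale images, not a figure from the paper):

```python
import numpy as np

def psnr_db(img, ref, peak=255.0):
    """PSNR: squared peak value divided by the mean squared error, in dB."""
    mse = np.mean((np.asarray(img, float) - np.asarray(ref, float)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)
```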

Three curves are shown for each image. They represent systems optimized for the CSNR indicated by the circles and crosses. For the reference coder this is the CSNR for which the coder breaks down. This requires a certain bit rate for error protection. The system does not improve for the given design when the channel CSNR increases. This is different for the proposed system.

For the “Goldhill” image, which is quite detailed, the proposed coder outperforms the reference coder everywhere. The situation is different for “Lenna”. This image is much smoother and favors the reference coder. But the proposed coder still offers graceful degradation and can be used for lower CSNRs.

We have argued that the signal-to-noise ratio does not correspond to our perception. Therefore the visual quality of the received signal should also be studied. In the images in Figure 19 the new system is compared to the JPEG2000 system combined with the efficient channel code at CSNR = 20 dB for the two upper images. The lower image is for the new system at CSNR = 17 dB. This is at a point where the JPEG2000 system breaks down, so there is no need to show that image, as it contains only rubbish.

What clearly should be observed is that JPEG2000 blurs the tiles and the brick structure of the walls. The new system maintains more of that even at 17 dB CSNR. Notice, however, that there is a light spot on the roof of the left-hand house which is probably due to a severe channel noise component.

5 Conclusion
This paper uses the geometrical mapping method suggested by Shannon for making channel representations from signal vectors. This is a direct method which does not need the intermediate quantization step for data reduction. What we optimize for is bandwidth and power use, which are the natural resources available. The paper shows results for an image coder recently developed. It gives coding results comparable to the JPEG2000 coder combined with the best channel coding methods available, and it offers much better robustness towards channel changes. It should also be noted that the complexity of the proposed system can be quite low, and there is no extra delay for channel coding for the main portion of the information.

Figure 18 Simulation results for “Goldhill” (upper figure) and “Lenna” (lower figure) including three systems optimized for different CSNRs (marked with star or circle). The dashed lines represent the reference coder, where the circles indicate the breakdown point. The solid lines give the performance of the proposed system. The crosses mark the design point for each curve


Many other systems based on the mapping method have been devised and simulated, including video coders and other channel representations, such as phase modulation.

We believe that the robustness offered, and the possible gains obtained by further development of such systems, could make them good candidates for wireless systems, especially for image and video communication, but also for broadcasting.

The paper has for the most part only referenced the central Shannon papers plus the most important Dr.Ing. theses, because these give the most complete description of the methods. However, there exist several papers which present parts of the methods and results, and others that present aspects not covered in those references. For completeness the following references are therefore provided: [20, 18, 24, 25, 22, 19, 14, 23, 26, 27, 21, 8, 10, 11, 12, 13, 6, 9, 7, 16, 17, 15, 4, 5, 3, 29, 30, 31].

References
1 Balasingham, I. On Optimal Perfect Reconstruction Filter Banks for Image Compression. Trondheim, Norwegian University of Science and Technology, 1998. (PhD thesis.)

2 Coward, H. Joint source-channel coding: Development of methods and utilization in image communications. Trondheim, Norwegian University of Science and Technology, 2002. (PhD thesis.)

3 Coward, H, Ramstad, T A. Bandwidth doubling in combined source-channel coding of memoryless Gaussian sources. In: Proc. IEEE Int. Symp. Intell. Signal Processing Commun. Syst. (ISPACS), 1, 571–576, Honolulu, HI, USA, November 2000.

4 Coward, H, Ramstad, T A. Quantizer optimization in hybrid digital-analog transmission of analog source signals. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Proc. (ICASSP), Istanbul, June 2000, 2636–2640. IEEE.

5 Coward, H, Ramstad, T A. Robust image communication using bandwidth reducing and expanding mappings. In: Thirty Fourth Asilomar Conference on Signals, Systems and Computers, 2, 1384–1388, Pacific Grove, CA, USA, October 2000.

6 Fuldseth, A. Robust Subband Video Compression for Noisy Channels with Multilevel Signaling. Trondheim, Norwegian University of Science and Technology, 1997. (PhD thesis.)

7 Fuldseth, A, Fischer, T R, Ramstad, T A. Channel-optimized trellis-coded vector quantization for channels with a power constraint. In: Proc. Information Theory Workshop (ITW-98), San Diego, USA, February 1998.

8 Fuldseth, A, Lervik, J M. Combined source and channel coding for channels with a power constraint and multilevel signaling. In: Proc. ITG Conference, München, Germany, October 1994, 429–436. ITG.

9 Fuldseth, A, Ramstad, T A. Robust and efficient video communication based on combined source- and channel coding. In: Proc. Nordic Signal Processing Symposium (NORSIG-97), Tromsø, Norway, May 1995, 65–68.

Figure 19 Image coding examples at the rate 0.1 channel symbols per pixel. Upper and lower images result from the proposed coder at CSNR = 20 dB and 17 dB, respectively. The middle image is by the reference coder at CSNR = 20 dB

10 Fuldseth, A, Ramstad, T A. Combined video coding and multilevel modulation. In: Proc. Int. Conf. on Image Processing (ICIP), Lausanne, Switzerland, September 1996, I, 941–944.

11 Fuldseth, A, Ramstad, T A. Robust subband video coding with leaky prediction. In: Seventh IEEE Digital Signal Processing Workshop, Loen, Norway, September 1996, 57–60.

12 Fuldseth, A, Ramstad, T A. Bandwidth compression for continuous amplitude channels based on vector approximation to a continuous subset of the source signal space. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Proc. (ICASSP), 1997, IV, 3093–3096.

13 Fuldseth, A, Ramstad, T A. Channel-optimized subband video coding for channels with a power constraint. In: Proc. Int. Conf. on Image Processing (ICIP), 3, 428–431, Santa Barbara, CA, USA, October 1997.

14 Grøvlen, A, Lervik, J M, Ramstad, T A. Combined digital compression and digital modulation. In: Proc. NORSIG-95 (Signal Processing Symposium), Stavanger, Norway, September 1995, 69–74. IEEE/NORSIG.

15 Hjørungnes, A. Optimal bit and power constrained filter banks. Trondheim, Norwegian University of Science and Technology, 2000. (PhD thesis.)

16 Hjørungnes, A, Ramstad, T A. Linear solution of the combined source-channel coding problem using joint optimal analysis and synthesis filter banks. In: Thirty-First Asilomar Conference on Signals, Systems and Computers, Naval Postgraduate School, San Jose, CA, USA, 2, 990–994. Maple Press, November 1997.

17 Hjørungnes, A, Ramstad, T A. On the performance of linear transmission systems over power constrained, continuous amplitude channels. In: Proc. for the UCSB Workshop on Signal & Image Processing, Santa Barbara, USA, December 1998, 31–35.

18 Lervik, J M. Integrated system design in digital video broadcasting. Piksel’n, 10 (4), 12–22, 1993.

19 Lervik, J M. Joint optimization of digital communication systems: Principles and practice. In: Proc. ITG Conference, München, Germany, October 1994, 115–122. ITG.

20 Lervik, J M, Eriksen, H R, Ramstad, T A. Bandwidth efficient image transmission system based on subband coding. A possible method for HDTV. In: Proc. NOBIM Conf., Lillehammer, Norway, February 1993. (In Norwegian.)

21 Lervik, J M, Fischer, T R. Robust subband image coding for waveform channels with optimum power- and bandwidth allocation. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Proc. (ICASSP), München, Germany, April 1997. IEEE.

22 Lervik, J M, Fuldseth, A, Ramstad, T A. Combined image subband coding and multilevel modulation for communication over power- and bandwidth-limited channels. In: Proc. Workshop on Visual Signal Processing and Communications, New Brunswick, NJ, USA, September 1994, 173–178. IEEE.

23 Lervik, J M, Grøvlen, A, Ramstad, T A. Robust digital signal compression and modulation exploiting the advantages of analog communication. In: Proc. IEEE GLOBECOM, Singapore, November 1995, 1044–1048. IEEE.

24 Lervik, J M, Ramstad, T A. An analog interpretation of compression for digital communication systems. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Proc. (ICASSP), Adelaide, South Australia, April 1994, 5, V-281–V-284. IEEE.

25 Lervik, J M, Ramstad, T A. A new approach to objective evaluation of power- and bandwidth-limited integrated communication systems. In: Proc. Nordic Signal Processing Symposium (NORSIG-94), Ålesund, Norway, June 1994, 49–54. NORSIG.

26 Lervik, J M, Ramstad, T A. Robust image communication using subband coding and multilevel modulation. In: Proc. 1996 Symposium on Visual Communications and Image Processing (VCIP-96), Orlando, FL, USA, March 1996, SPIE 2727, 2, 524–535. SPIE/IEEE.

27 Lervik, J M. Subband Image Communication over Digital Transparent and Analog Waveform Channels. Trondheim, Norway, Norwegian University of Science and Technology, 1996. (PhD thesis.)


28 Myhre, B, Markhus, V, Øien, G E. LDPC coded adaptive multilevel modulation for slowly varying Rayleigh-fading channels. In: Proc. Norwegian Signal Processing Symp. (NORSIG), Trondheim, Norway, October 2001.

29 Ramstad, T A. Efficient and robust communication based on signal decomposition and approximative multi-dimensional mappings between source and channel spaces. In: Proc. NORSIG, Helsinki, Finland, September 1996.

30 Ramstad, T A. Digital image communication. In: Proc. International Workshop on Circuits, Systems and Signal Processing for Communications ’97, Tampere, Finland, April 1997.

31 Ramstad, T A. Robust image and video communication for mobile multimedia. In: NATO Advanced Study Institute, Signal Processing for Multimedia, Il Ciocco, Italy, July 1998.

32 Shannon, C E. A mathematical theory of communication. Bell Syst. Tech. J., 27, 379–423 and 623–656, 1948.

33 Shannon, C E. Communication in the presence of noise. Proc. IRE, 37, 10–21, January 1949.

34 Shannon, C E. Coding theorems for a discrete source with a fidelity criterion. IRE Nat. Conv. Rec., March 1959, 142–163.


Information Theory
A lecture presented at a study session for radio technology and electro-acoustics at Farris Bad, 16–18 June 1950, by Graduate Engineer Nic. Knudtzon, Norwegian Defence Research Establishment, Bergen

This is a translation into English of the paper “Informasjonsteori”, which appeared in Elektroteknisk Tidsskrift 63 (30), pp. 373–380, 1950. The translation was done by Berlitz GlobalNET and final quality control was done by Geir E. Øien.

1 Introduction
Information theory is a branch of statistical communications theory. Figure 1-1 is a schematic representation of a communications system; it consists of an information source, a communications channel and a destination, the communications channel consisting of a transmitter, a connection and a receiver. The information source generates the messages that are to be transmitted. The transmitter converts the messages to a suitable signal for transmission, and in the receiver the inverse operation takes place. The destination is the person or equipment to whom, or to which, the message is addressed. On the way, noise is added. In the following treatment, we will ignore the effects of distortion of messages resulting from non-linear characteristics of the equipment, and other system “errors”.

We cannot know the content of the individual messages beforehand; all we can know is the statistical characteristics of the class of messages we wish to transmit. Telecommunication is therefore a statistical process, and communications systems must be constructed for a specified class of messages with given statistical characteristics.*) In order to make an objective assessment of the ability of a system to transmit information, it is necessary to define a unit of information, just as the volt is the unit of electrical voltage. We will now define such a unit. By means of this we will then study the information in different messages, how these should be converted to achieve an efficient transmission channel, and how much information it is possible to transmit through a channel with a given bandwidth and signal-to-noise ratio.

We are not concerned with the semantic content of the message (its meaning); a communications engineer must be capable of constructing an efficient telegraphy system for Greek, knowing the statistical characteristics of that language, even though he does not understand Greek.

We will attempt to reach our conclusions by means of clear and simple considerations, avoiding the abstract mathematics (probability theory and dimension theory) which would be necessary for exact derivations. Hence, we can only consider very simple examples of practical applications.

2 Classification of Systems According to Message Type
We can distinguish between the following systems:

a) Discrete system. The message and the signal both consist of a series of discrete (discontinuous) symbols. An example of this is ordinary telegraphy, in which the message is a series of letters and the signal consists of dots and dashes.

b) Continuous system. The message and the signal are both continuously varying. An example is ordinary telephony, in which the message consists of pressure variations in the air and the signal is a continuously varying electrical function of time.

c) Hybrid system. Both discrete and continuous signals are present. An example is pulse-code modulation (PCM).


Figure 1-1 A communications system

*) See also “Statistical Communications theory. A brief overview of the problem”, an introductory lecture presented at a study session for radio technology and electro-acoustics, 16–18 June 1950. Teknisk Ukeblad 1950.



Dr. Nic. Knudtzon (80) obtained his Engineering degree from the Technical University of Norway, Trondheim in 1947 and his Doctor’s degree from the Technical University in Delft, the Netherlands in 1957. 1948–1949 he was with the Research Laboratory of Electronics, Massachusetts Institute of Technology, working with information theory and experiments. 1950–1955 he was with the Norwegian Defence Research Establishment, Bergen, working on the development of microwave radio links; and from 1955 to 1967 he was Head of the Communications Division at Shape Technical Center in The Hague, Netherlands, where his efforts went into the planning of military telecommunications networks and systems in Western Europe. From 1968 to 1992 he was Director of Research at the Norwegian Telecommunications Administration, working on the planning of future telecommunications systems, networks and services. Dr. Knudtzon has been member of government commissions and various committees, including the Norwegian Research Council, the National Council for Research Libraries, the International Telecommunications Union, EURESCOM, etc.


The individual messages are generated through a series of selections from a given register, that is to say, by successive selection from a given collection of symbols of one type or another. For example, a written message is produced by selecting letters from the alphabet of the language in question; speech by selecting particular sounds that the speaker can generate with his or her voice. It is worth noting that the register in general consists of a limited number of symbols: hence we only have a few dozen written characters at our disposal, and speech is physiologically restricted to approximately 50 different sounds. The number of possible messages of finite duration is therefore also limited. The larger the number of different symbols and the longer the duration, the more information the individual message will contain. The amount of information in a single message therefore increases with the number of possibilities.

3 Information in a Discrete System
Written text is a typical example of a discrete message. There are indications that signals in the human nervous system are also discrete, and can therefore be represented by a series of choices.

3.1 Unit of Information
The simplest type of choice is a choice between two equal possibilities: 1 or 0, yes or no, heads or tails, or simply, any case of equally possible “either – or” states. We will define the unit of information as the outcome of such a binary elementary choice, and we will designate it 1 bit (abbreviation of “binary digit”). The designation 1 Hartley has also been suggested, after one of the pioneers of information theory. Further, we will postulate that H independent elementary choices provide H bits of information. Based on this, we are able to develop methods of calculating the amount of information in discrete messages.

3.2 Nth-order Choice
The selection of one symbol from a register of N – that is, a group of N possible elements – is called an Nth-order choice. To specify how much information such a choice represents, we must reduce it to a series of independent elementary choices. The number of these necessary to specify one of the N possible elements is by definition equal to the number of bits of information for the Nth-order choice in question.

3.21 Choice of N Elements of Equal Probability
We envisage these as N symbols arranged in a column, as shown for N = 8 in Figure 3-1. To specify one particular element by means of a series of elementary choices, we proceed as follows: First, we divide them into two equal groups, which represents one elementary choice. Then we divide each group into two sub-groups; two elementary choices will be sufficient to specify one of these. The process of successive division is continued until the desired symbol is isolated from the others. We see that

1 elementary choice is needed for N = 2
2 elementary choices are needed for N = 4
3 elementary choices are needed for N = 8
etc.

In general,

HN = log2 N bits (3-1)

are required to specify a choice from N equal possibilities.

The groups which arise after each division can be designated by 0 or 1 respectively, as shown in Figure 3-1.

Figure 3-1 Nth-order choices, elements of equal probability. [The figure shows N = 8 symbols S1–S8 isolated by three successive divisions; the resulting binary numbers are S1 = 000, S2 = 001, S3 = 010, S4 = 011, S5 = 100, S6 = 101, S7 = 110, S8 = 111]


The resulting number of digits for each symbol is then equal to HN, that is, equal to the number of bits necessary to specify the symbols.

The above-mentioned result is strictly speaking only correct if N is a power of 2, so that HN is an integer. If this is not the case, HN will be equal to one of the two integers that are closest to log2 N, depending on which symbol is to be specified.

3.22 Choice of N Elements of Unequal Probability
Each of these possibilities has given (a priori) probabilities. Again, we perform successive divisions into elementary choices, that is, into two groups of elements; the divisions are performed such that the total probability in the two groups is equal.

Let us first consider an example. In Figure 3-2, four symbols Si are given, with probabilities pi (where i = 1, ..., 4). The divisions are performed as shown, and the groups are designated by 0s and 1s. The number of digits is then equal to the number of elementary choices, Hi bits, that are necessary to specify the symbol in question. Hi is different for the different symbols, the infrequent ones representing more information than the frequent ones. We find that

Symbol:  S1  S2  S3  S4
Hi:       1   2   3   3

The average information per choice is thus

HN = (1/2) ⋅ 1 + (1/4) ⋅ 2 + (1/8) ⋅ 3 + (1/8) ⋅ 3 = 7/4 bits

Generally, we find that the number of bits necessary to specify a symbol with a priori probability pi is

Hi = log2 (1/pi) = −log2 pi

Since the sum of the probabilities of all symbols is equal to 1, the average information for the Nth-order choice is

HN = Σi pi log2 (1/pi) = −Σi pi log2 pi, with the sum taken over i = 1, ..., N (3-2)

This expression has a maximum at pi = 1/N, that is to say, in the case of equally likely choices. It is then reduced to Equation (3-1). If the four symbols in the example above had been equally likely, i.e. all pi = 1/4, the information per Nth-order choice would have been

HN = log2 N = 2

which is greater than for any other set of probabilities.

The right-hand side of Equation (3-2) has, except for the sign, the same mathematical form as entropy in thermodynamics for a system whose possible states have the probabilities pi. It has therefore been designated negative entropy.
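As a small check, a minimal sketch of Equation (3-2) applied to the four-symbol example above:

```python
import math

def avg_information_bits(probs):
    """Average information (negative entropy) of an Nth-order choice."""
    assert abs(sum(probs) - 1.0) < 1e-9
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(avg_information_bits([1/2, 1/4, 1/8, 1/8]))  # 1.75 = 7/4 bits
print(avg_information_bits([1/4] * 4))             # 2.0 bits, the maximum
```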

Figure 3-2 Nth-order choices, elements of unequal probability. [The figure shows the four symbols with a priori probabilities pi = 1/2, 1/4, 1/8, 1/8 divided successively into groups of equal total probability, giving the binary numbers S1 = 0, S2 = 10, S3 = 110, S4 = 111 and Hi = 1, 2, 3, 3]


The fact that the base 2 has been included in the above expressions results from our decision to define the unit of information as a choice between 2 equal possibilities. If we instead had chosen to define the unit as a choice between 10 equal possibilities, we would have obtained common logarithms, etc. Mathematically, there is of course no difference. However, in practical terms, there are good arguments for working with binary numbers. An electrical apparatus, for example, does not need to measure the size of a pulse, but only decide whether the pulse is present or not. A switch, or flip-flop circuit, has two possible states; m such elements therefore have 2^m possibilities, and can therefore store m bits of information. Moreover, from experience, it is natural for a person to divide successively into two equal groups, rather than directly into ten groups, for example.

3.3 Discrete Messages and Signals
A discrete message (signal) will generally consist of a series of symbols S1, S2, …, Si, …, SN, having the probabilities p1, p2, …, pi, …, pN, i.e. a series of Nth-order choices. Such a process, which depends on a set of a priori probabilities, is called a stochastic process. It is characterised by certain statistical properties, which are determined by these probabilities. An example of such a discrete message is ordinary text, in which the symbols are selected from the alphabet of the language in question, which has known letter frequencies of occurrence, etc.

In the following treatment, we will assume that the messages (signals) we are concerned with are statistically stationary, that is to say that their statistical characteristics are invariant towards translation in time of the relevant time series. Stated in another way: we assume that the ergodic hypothesis is valid. The statistical characteristics can then be determined as time averages for a single message of sufficient duration, or as the average for all messages in the same class at a given point in time.

3.31 The Symbols are Independent
If the individual messages consist of one symbol that can be selected from N possible symbols, there are N possibilities. If they consist of two symbols, there are N^2 possibilities, and so on. The total number of messages possible with n such Nth-order choices is M = N^n. If these are equally possible, then the average information per message (consisting of n symbols) is therefore

HM = log2 M = n log2 N

and the average information per symbol

HS = log2 N (3-3)

If the choices do not have equal probability, the average information per symbol is, according to the ergodic hypothesis,

HS = −Σi pi log2 pi, with the sum taken over i = 1, ..., N (3-4)

It can be shown that this expression has a maximum value at pi = 1/N, in other words, when all symbols are equally possible.

HS is zero only when all values of pi are zero except one, and this equals 1, i.e. N = 1. The message consists in this case of a series of one single symbol, and is therefore characterised by an a priori probability of 1 (i.e. certainty), and contains no information.

3.32 The Symbols are Dependent
In practice, the symbols are often dependent. Thus, in Norwegian, e is very often followed by n or r in word-endings, and there is therefore a greater conditional probability of n or r occurring after e than, for example, after l.

It can be shown that Equation (3-4) is also valid in the case of dependent choices, if for pi we introduce the conditional probability for symbol Si, based on knowledge of all previous choices.

With dependent choices, the results of Section 3.31 can be considered as a first approximation. They will give too high values of HS, because the uncertainty of each choice is smaller when those choices are dependent.

3.4 Redundancy
We define the redundancy of an information source as

R = (HSmax − HS) / HSmax

where HSmax is the maximum negative entropy we could have obtained for the same symbols (if these were independent and equally probable). The redundancy is an expression of the correlation in the message.

For example, let us consider the English language. There are certain a priori conditional probabilities for the occurrence of digrams (two consecutive letters), trigrams (three consecutive letters), etc, and for certain words. The redundancy for everyday English is found to be approximately 50 %, if we do not consider statistical structures over more than eight letters. This means that half of written English is determined by the structure of the language, while the remainder is chosen freely. The above-mentioned figures have been determined in different ways: by calculating the negative entropy of statistical approximations to English, by eliminating a certain proportion of the letters in a text and letting another person read it, and by cryptographic methods. Here, it is worth mentioning that there is a considerable difference between the redundancy of Shakespeare’s English with a large vocabulary and that of so-called “basic English” with a vocabulary of about 850 words. In the first case, it is possible to express oneself briefly and concisely, and the redundancy is low. In the second case, a large number of words is needed for an exact description, and the redundancy is therefore high. Similar considerations can be made for the different forms of the Norwegian language.
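A minimal sketch of this redundancy measure, computed from single-letter frequencies only (the sample text and the 26-letter alphabet are illustrative assumptions; including digram and longer structures would raise the figure towards the 50 % quoted above):

```python
import math
from collections import Counter

text = "this is a small sample of everyday english text"
letters = [c for c in text if c.isalpha()]
probs = [n / len(letters) for n in Counter(letters).values()]

hs = -sum(p * math.log2(p) for p in probs)   # Equation (3-4)
hs_max = math.log2(26)                       # equally probable alphabet
print(round((hs_max - hs) / hs_max, 2))      # redundancy R
```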



Another example is an address label on a postal package recently received by the author. It read

Forsavarels Farstiningsint, [correct: Forsvarets Forskningsinstitutt]
Avd. for Radar
Bergen ... Norway

Thanks to the redundancy in this message, the package reached its destination.

Also in television signals, redundancy is common; hence the background of the picture can remain unchanged for long periods, while a person in the foreground makes small movements.

In an efficient communications system, we will often remove part of the redundancy at the input to the channel. The more we remove, the more important the remaining symbols become. How much we are to remove therefore depends on the impairments (noise) on the channel.

3.5 Coding
The discrete messages consist of symbols of varying frequency of occurrence, and can therefore be represented by a series of Nth-order choices (which as a rule have unequal probability). Each of these can be reduced to a series of elementary choices; this process is called binary coding. Theoretically, optimal binary coding requires an average of HS elementary choices/symbol, where HS is given by Equation (3-4).

The example in Section 3.22 illustrates a very simple coding process. The binary codes are identical to the numbers that arise from the successive divisions into two groups of equal probability. It follows from this that the most frequently occurring symbol has the shortest code, and that the code becomes longer the more infrequent the symbols are.
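A minimal sketch of this successive-division procedure (essentially Shannon-Fano coding), applied to the example of Section 3.22; the simple splitting rule is an illustrative choice:

```python
def divide(symbols, prefix=""):
    """symbols: list of (name, probability), ordered by falling probability.
    Split into two groups of (nearly) equal total probability and recurse."""
    if len(symbols) == 1:
        return {symbols[0][0]: prefix or "0"}
    total, acc, split = sum(p for _, p in symbols), 0.0, 1
    for i, (_, p) in enumerate(symbols[:-1], start=1):
        acc += p
        if acc >= total / 2:
            split = i
            break
    codes = divide(symbols[:split], prefix + "0")
    codes.update(divide(symbols[split:], prefix + "1"))
    return codes

print(divide([("S1", 0.5), ("S2", 0.25), ("S3", 0.125), ("S4", 0.125)]))
# {'S1': '0', 'S2': '10', 'S3': '110', 'S4': '111'}, as in Figure 3-2
```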

We can think of coding as a “stretching” of the time scale according to the probability of occurrence for the individual symbols, that is, a statistical adaptation of the channel to the class of message that it is to transmit. Of course, this requires a time delay, which in the case of optimal coding can be infinitely long. HS in Equation (3-4) is therefore to be considered as a lower bound in practice. It is, however, generally easy to calculate how close we are to this optimal state and thus indicate the degree of efficiency of the coding. This will depend on the memory of the transmitter.

Of course, we do not need to use binary coding: the Morse code, for example, is quaternary. For English and Norwegian text, this is certainly not bad, as the most commonly occurring letters such as e and t have the shortest codes, and so on. For the Czech language, which has completely different letter frequencies, it is probably less efficient.

4 Information in Continuous and Hybrid Systems
The information content in continuous messages and signals can be determined by reducing them to discrete signals. This is done in two stages: sampling of the signals gives a series of ordinates, which can then be quantized. We shall now look at these two processes in more detail.

4.1 Sampling
The function of time f(t) is assumed to contain only frequency components in the bandwidth 0 – W p/s. Intuitively, we know that f(t) cannot then change to a substantially new value in an interval of time 1/(2W), which is half the period of the highest frequency. On this basis, pulse modulation was introduced, and experiments were performed to examine the intelligibility of signals with sampling frequencies from 1 up to 3 times W. It was found that a factor of around 2 was sufficient to achieve good quality. We will now demonstrate that f(t) is defined exactly by sampling at a frequency of 2W per second.

f(t) is multiplied by a function S(t), consisting of delta pulses repeated at a frequency fr. The product f(t) ⋅ S(t) is then a series of ordinates of f(t) separated by a distance 1/fr. S(t) can be broken down into a Fourier series consisting of a DC component, the fundamental frequency and the higher harmonics.

As shown by the theory of pulse modulation, multiplication by f(t) will result in sidebands around each of these components with the same shape as the spectrum of f(t), that is, with a width of W. This is illustrated in Figure 4-1. If we are to separate the spectrum for f(t) around the DC component from the lower sideband around the fundamental, the two must not be mixed. The condition for this is

fr – W > W

or

fr ≥ 2W

In most practical cases, it is simplest to sample periodically, but this is not necessary. We can, for example, also determine f(t) uniquely by means of ordinates with varying time intervals, or by means of ordinates and derivatives, as long as the frequency of sampling is at least 2W per second.
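A minimal numerical sketch of this result: a signal band-limited below W and sampled at fr = 2W is recovered by sinc interpolation (the test tones and the truncation of the series are illustrative choices):

```python
import numpy as np

W = 10.0                    # bandwidth in cycles per second
fr = 2 * W                  # sampling frequency
n = np.arange(-200, 201)    # sample indices (truncated series)

def f(t):
    """Test signal with components at 3 and 7 p/s, both below W."""
    return np.sin(2 * np.pi * 3.0 * t) + 0.5 * np.cos(2 * np.pi * 7.0 * t)

samples = f(n / fr)

def reconstruct(t):
    """Interpolation: sum of the samples weighted by sinc pulses."""
    return np.sum(samples * np.sinc(fr * t - n))

t0 = 0.0123
print(f(t0), reconstruct(t0))   # agree to within truncation error
```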



If we are interested in the time series f(t) for an interval of T, then

n = 2TW samples (4-1)

will be sufficient to characterize f(t) uniquely and exactly.

4.2 Quantization
Through sampling, we have reduced a continuous function of time f(t) with limited bandwidth to a series of ordinates which exactly define f(t). These can nevertheless vary continuously in amplitude, and hence an infinite number of elementary choices, i.e. an infinite number of bits of information, will be needed to specify any one of them. However, there is no need to specify the amplitudes with greater accuracy than a certain tolerance ∆, which is primarily determined by the level of noise. We will therefore only specify certain amplitude steps, and we call this quantization. If the maximum value of the amplitude of the function is A, we will be able to distinguish between a total of

N = (A + ∆)/∆

levels. We wish to express N in terms of a signal power Ps and a noise power Pn, which are the characteristic expressions for statistical messages and noise. Knowing the amplitude distribution of fluctuation noise, we can, on the basis of experience, set

N = (A + ∆)/∆ = √((Ps + Pn)/Pn) (4-2)

5 The Information Capacity of the Channel

5.1 Derivation and Definition
We will now determine how many bits H of information it is possible to transmit in time T through a channel with bandwidth W and signal-to-noise ratio Ks = Ps / Pn (reduced to receiver input).

Figure 5-1 shows a continuous signal f(t) in a grid where the division along the time axis is the sampling interval 1/(2W) and the division along the amplitude axis is the tolerance ∆.

According to Equation (4-2), there are

N = √((Ps + Pn)/Pn) = √(1 + Ks)

possible levels in each time interval, and for a signal of duration T there are, according to Equation (4-1),

n = 2WT

such time intervals.

In a single time interval there are N possible amplitudes, after two time intervals there are therefore N^2 possibilities, and so on. The total number of possible signals is therefore

N^n = (√(1 + Ks))^(2WT) = (1 + Ks)^(WT)

In other words, there is a limit to the number of signals we can transmit through this type of channel. If they are equally probable, the amount of information per signal in bits is maximum, that is

H = log2 (1 + Ks)^(WT) = WT log2 (1 + Ks) (5-1)

The information the channel can transmit per unit time is therefore

C = H/T = W log2 (1 + Ks) (5-2)

This quantity is called the information capacity of the channel, and its units are bits/sec.

If we are to exploit this capacity fully to transmit a message containing an amount of information H in a time T, optimal coding will be required (see Section 3.5); that is, the transmitter must convert the message to a signal which is completely statistically adapted to the channel. We will call such a communication system an ideal system. It is not possible to transmit more information per unit time than C.
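A minimal sketch of Equation (5-2); the telephone-like figures below (3000 p/s bandwidth, 30 dB signal-to-noise ratio) are illustrative values, not taken from the lecture:

```python
import math

def capacity_bits_per_sec(W, ks_db):
    """C = W * log2(1 + Ks), with Ks given in dB."""
    ks = 10 ** (ks_db / 10)   # signal-to-noise ratio as a power ratio
    return W * math.log2(1 + ks)

print(round(capacity_bits_per_sec(3000, 30)))   # roughly 30,000 bits/sec
```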


Figure 4-1 Frequency components for S(t), with associated sidebands for f(t)

Figure 5-1 Continuous signal



5.2 Discussion of an Ideal System
Equation (5-2) applies to all values of Ks. For Ks << 1, log2 (1 + Ks) ≈ Ks log2 e, i.e.

C ≈ 1.443 W Ks

For Ks >> 1,

C ≈ W log2 Ks

Equation (5-2) shows the relationship between the parameters W, Ks and C. For the same information capacity C in two cases, marked 1 and 2,

(1 + Ks1)^W1 = (1 + Ks2)^W2

1 + Ks1 = (1 + Ks2)^(W2/W1)

or, approximated for large values of Ks,

Ks1 ≈ Ks2^(W2/W1)

We can thus reduce the bandwidth W if we simply increase the signal-to-noise ratio enough. A reduction in W by half will therefore require an increase in Ks to the second power. Conversely, we can increase W and thus manage with a substantially lower Ks.
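A minimal numerical sketch of this exchange (the 30 dB starting point is an illustrative value): halving the bandwidth squares the required signal-to-noise ratio, i.e. doubles its dB figure:

```python
import math

ks2_db = 30.0                  # SNR at full bandwidth W2
ks2 = 10 ** (ks2_db / 10)
ks1 = ks2 ** 2                 # required SNR at W1 = W2 / 2
print(10 * math.log10(ks1))    # 60 dB
```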

The reduction in bandwidth could, for example, be achieved in the following manner: instead of specifying each sample (at intervals of 1/(2W)), every second sample and its derivative are transmitted as a composite number. In this way, the necessary bandwidth can be halved. On the other hand, the signal-to-noise ratio must be raised to the power of 2 to enable us to read the composite number with the same accuracy.

It should be mentioned that the bandwidth enters into the expressions for Ks = Ps / Pn, since, for fluctuation noise,

Pn = W ⋅ 4kTabs

where k is Boltzmann’s constant and Tabs is the absolute temperature.

Equation (5-2) can be derived exactly by means of multi-dimensional geometry.3) This kind of treatment will also lead to more wide-reaching results, such as an explanation of the threshold problem in broadband modulation systems.

6 Applications

6.1 Miscellaneous
There are reasons to believe that the nervous systems of living organisms consist of nerve cells with two possible responses: 0 or 1, exactly like a switch. The signals can therefore be characterised by a series of binary choices, and the concepts and results of information theory open up a possibility of studying the transmission of information in nervous systems.6)

The human central nervous system contains something of the order of 10,000 million cells. The most complicated computer so far built, ENIAC, has 10,000 binary elements, which is about the same number as in the nervous system of an earthworm. The large number of cells in the human brain makes it possible for a signal to be transmitted by several parallel routes, so that a fault in a single cell in a chain does not significantly interfere with the transmission. In an electronic computer, however, such a fault would usually lead to a completely erroneous result in all subsequent calculations, because the signal usually can only travel by one route. The results of information theory have become important in connection with the operation and correction of errors in computers.7)

Moreover, a complete theory for transmitting secret information (cryptography) has been developed.8)

6.2 Practical Communications Systems
The equation

C/W = log2 (1 + Ks)

is represented graphically in Figure 6-1.

Figure 6-1 Graphic representation of C/W = log2 (1 + Ks). Some typical practical systems are indicated



The curve shows how much information it is generally possible to transmit per unit time and per cycle of bandwidth for a signal-to-noise ratio Ks. In general, the ideal values will not be attainable in practice, since this would require optimal coding, which can result in an infinitely long time delay. However, it is possible to calculate how far a practical system will be from the optimal system; hence the diagram indicates points for typical pulse position modulation (PPM) systems and binary, tertiary, and so on, pulse code modulation (PCM) systems, without delay in the transmitter.

We will define the information efficiency η as the ratio of the amount of information per second that is actually transmitted to the maximum amount of information that could have been transmitted through a corresponding ideal system. To put it another way,

η = (inf./sec. in received message) / (information capacity) = (Wm log2 (1 + Km)) / (Ws log2 (1 + Ks)) (6-1)

where the indices m and s represent the message and the signal, respectively.
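A minimal sketch of Equation (6-1); the bandwidths and signal-to-noise ratios below are illustrative values:

```python
import math

def efficiency(wm, km_db, ws, ks_db):
    """Information efficiency: received information rate over capacity."""
    km = 10 ** (km_db / 10)
    ks = 10 ** (ks_db / 10)
    return (wm * math.log2(1 + km)) / (ws * math.log2(1 + ks))

print(efficiency(3000, 40, 3000, 40))    # single-sideband-like case: 1.0
print(efficiency(3000, 40, 30000, 13))   # wideband case: well below 1
```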

Let us consider the information efficiency for a number of practical modulation systems.5)

Amplitude modulation (AM):
Single sideband: Wm = Ws and Km ≈ Ks, or η ≈ 1 = 100 %.
Double sideband: Wm = 0.5 Ws and Km ≈ Ks, or η ≈ 0.5 = 50 %.

Frequency modulation (FM):
η is in the region of 20 – 2 % when the modulation index µ (= frequency deviation/Wm) varies between 1 and 100. η decreases as µ increases; when the modulation index increases, i.e. the bandwidth Ws increases, a better message-to-noise ratio is obtained, but the information efficiency decreases. The reason for this is that the denominator increases with the bandwidth Ws, while the numerator only increases with the log of Km.

Pulse modulation:
With the exception of PCM, the different systems have an information efficiency of the same order of magnitude as FM. This can be explained in the same way, namely that the bandwidth Ws becomes significantly larger than Wm, so as to obtain a desired message-to-noise ratio for a relatively low Ks.

Pulse code modulation:
Here, η is around 40 %. As is known, PCM is the only practical system devised so far for which Km increases with bandwidth Ws. An increase in Ws will therefore not reduce η, as with other types of pulse modulation.

Previously, it has only been possible to determine a communications system's efficiency in transmitting information by means of comprehensibility tests. This is a subjective measure, unless we allow a very large number of people to act as information source or destination, and use statistical methods to find average scores. This would, however, prove expensive, because the system must first be built and then subjected to time-consuming tests. However, on the basis of the general definition in Equation (6-1) for information efficiency η, we are now able to calculate (that is, objectively assess) the ability of an individual communications system to transmit information of various types.

Literature
1 Fano, R M. The Transmission of Information. Massachusetts Institute of Technology, Research Laboratory of Electronics. (Technical Report No. 65.)

2 Shannon, C E. A Mathematical Theory of Communication. Bell Syst. Tech. J., 27, 623–656, 1948.

3 Shannon, C E. Communication in the Presence of Noise. Proc. IRE, 37, 10–21, 1949.

4 Tuller, W G. Theoretical Limitations on the Rate of Transmission of Information. Proc. IRE, 37, 468–478, 1949.

5 Clavier, A G. Evaluation of Transmission Efficiency According to Hartley’s Expression of Information Content. Elec. Com., 25, 414–420, 1948.

6 Wiener, N. Cybernetics. Wiley, 1948.

7 Hamming, R W. Error Detecting and Correcting Codes. Bell Syst. Tech. J., 29, 147–160, 1950.

8 Shannon, C E. Communication Theory of Secrecy Systems. Bell Syst. Tech. J., 28, 656–715, 1949.

Excerpts from the Discussion After the Lecture

Garwick:
The lecture was very interesting. I have noticed a couple of points that need further clarification.



1. The lecturer mentioned that when a time series is to be transmitted without distortion by means of sampling, it is sufficient to sample with a time interval no greater than 1/(2W), where W is the bandwidth. But what is the bandwidth? In the case of a completely square spectrum, this is simple enough, and similarly if the spectrum has some other shape, but is limited such that frequency components above a certain limit do not exist. In practice, the bandwidth will often have infinite extent. The frequency spectrum's sidebands will also have infinite extent, and the proof will therefore not be mathematically valid.

2. During his treatment of quantising, the lecturer stated that:

N = (A + ∆)/∆ = √((Ps + Pn)/Pn)

The left-hand side is correct, but it is not immediately obvious that the right-hand side is equal to the left-hand side, since it is the amplitudes that characterise the noise, and not the power.

Knudtzon:
1. It was assumed that the message was limited in frequency to the bandwidth 0 – W p/s. Whatever the shape of the spectrum within this bandwidth, the bandwidth is therefore W. It is easy to demonstrate mathematically that an infinitely long message can have a limited frequency spectrum. If the message has passed through an ideal low-pass filter whose upper frequency limit is W, the bandwidth will be W, irrespective of the behaviour of the system function in the pass band. In practice, the attenuation in the rejected band can be made large, but not infinite, for all frequencies greater than W. We will therefore introduce a tolerance in the same way as in quantising, and the bandwidth will be that frequency at which the attenuation definitively exceeds the tolerance.

2. I stressed that from our knowledge of the amplitude distribution of fluctuation noise, the two expressions can be equated to each other. In fact, it is precisely the power that characterises the noise. A mathematical treatment based on multi-dimensional geometry has been presented by Shannon.*)

Gaudernack:
The lecturer has pointed out that a piece of information is not completely known until one has full knowledge of its statistical properties, and that this knowledge is necessary to be able to dimension the network statistically correctly. What significance does this have for the practising engineer who wishes to try out using these new methods? What investigations must be carried out before it is possible to say that one knows a class of information completely?

Knudtzon:
I will mention some examples. In telegraphy, it is possible to set up tables showing the frequency of occurrence of letters, digraphs, trigraphs and words. Such tables exist for the English, French, German, Italian and Spanish languages at least. In the case of telephony, one needs the correlation function, which can be measured electronically. The result depends to a large extent upon what one wishes to transmit, for example to what extent the character (intonation, emotion, atmosphere) is to be preserved. These matters are being subjected to extensive measurements at the Psycho-Acoustic Laboratory at Harvard, among other places. Also television signals are being studied: it should, for example, be unnecessary to transmit a stationary background in a television picture continuously.

I will briefly summarise the technique for electronic determination of the correlation function. The autocorrelation function for a time function f(t) is defined as:

ϕ(τ) = lim(T→∞) (1/2T) ∫ from −T to +T of f(t) f(t + τ) dt

The function f(t) is sampled periodically at a frequency of fs in pairs separated by intervals of τk seconds, giving us the samples α1, β1, ..., αn, βn, ... Pulses are generated with heights proportional to αn and widths proportional to βn, such that the areas represent the products αnβn. The average is then taken of the integrated product over the observation time T, and we get the following expression for the autocorrelation for τk:

ϕ(τk) ≈ (1/(T fs)) Σ(n = 1 to T fs) αn βn

Then the measurement is carried out for τk+1, and so on. Correlators of this type have been built at the Research Laboratory of Electronics at MIT.
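A minimal sketch of this sampled estimate of the autocorrelation (the test signal and the sampling figures are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
fs, T = 1000.0, 10.0                 # sampling rate and observation time
t = np.arange(0, T, 1 / fs)
f = np.sin(2 * np.pi * 5.0 * t) + 0.3 * rng.standard_normal(t.size)

def phi(tau_k):
    """Average of products of sample pairs separated by tau_k seconds."""
    lag = int(round(tau_k * fs))
    alpha, beta = f[:f.size - lag], f[lag:]
    return np.mean(alpha * beta)

print(phi(0.0), phi(0.1))   # the 5 p/s sine gives phi(0.1) close to -0.5
```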

Falnes:
The lecturer has treated statistically distributed noise. In practice, impulsive noise is also experienced. Although the average noise voltage is low, strong spikes can completely obliterate a message such as a telegraphy character.

*) Proc. IRE 1949, pp. 10–21.



The Hell System, which has replaced ordinary telegraphy, is less sensitive to impulsive noise, because each character is represented by a larger number of impulses. As the lecturer pointed out, one therefore achieves a better signal-to-noise ratio, at the expense of bandwidth.

Garwick:
I would like to mention some examples which illustrate how the transmission of unnecessary data provides a check of the accuracy of the transmitted signal. First, let us consider how the number 25 can be written in the binary system.

The number is written as 11001. This is the minimum number of necessary characters. If one of them is reproduced incorrectly, the number itself becomes incorrect. In a coded decimal system, the number can be written as 0010, 0101. The first group represents the figure 2, the second group the figure 5. In each group, there are some character combinations which do not represent a number. Thus, if such an impossible combination is received, this indicates that an error has occurred. Even safer systems can be achieved with other methods of coding. This method is used, for example, in a mathematical computer constructed at the Norwegian Defence Research Centre.
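A minimal sketch of this check: in the coded decimal (BCD) system, the bit patterns 1010–1111 never represent a digit, so receiving one of them reveals an error, whereas pure binary offers no such redundancy:

```python
def bcd_decode(groups):
    """Decode 4-bit groups; values above 9 are impossible combinations."""
    digits = []
    for g in groups:
        value = int(g, 2)
        if value > 9:
            raise ValueError(f"error detected in group {g}")
        digits.append(value)
    return int("".join(map(str, digits)))

print(bcd_decode(["0010", "0101"]))       # 25, as in the example
try:
    bcd_decode(["0010", "1101"])          # one flipped bit -> invalid group
except ValueError as e:
    print(e)
```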

Helmer Dahl:
It might be interesting to stress the importance of the derived equation for information capacity from a practical point of view. The equation reads:

C = H/T = W log2 (1 + Ks)

and represents the maximum amount of information that can possibly be transmitted in a given time interval, at a bandwidth W and signal-to-noise ratio Ks. The equation defines in a way the boundary between what is possible and what is impossible. For example, it can prevent us from trying to transmit a message through a channel that is not capable of carrying that message.

On the other hand, we can only achieve the maximum information capacity by using optimal coding, and this can demand expensive and complex technical equipment. Hence, it may well be economically justifiable to use a more primitive type of coding, and therefore a greater bandwidth and output power than is strictly necessary, technically speaking. This is especially the case when one attempts to save power by using large bandwidths, since one then makes use of far greater bandwidths than the information capacity actually requires, in order to obtain simple coders and decoders (modulators and demodulators). This may also have historical causes, since up to now modulation systems have been developed without considering the concept of information capacity, and it is possible that by using information theory, more suitable methods may be found which are both economically and technically efficient.

Nevertheless, there is reason to believe that maximum information capacity and optimum economics only rarely will coincide. In cases where bandwidth is cheap, such as in UHF systems, it may be worthwhile to use large bandwidths in order to save power by simple technical means. On the other hand, it is interesting that in cases where bandwidth is expensive, as in cable communications, single sideband transmission has proved to be a suitable system which approximates closely the theoretical optimum for transmission. Carson demonstrated the special characteristics of this system long ago, on the basis of other criteria. Based on information theory, we can see his results in a broader perspective, as one of many possible optimal solutions.

Knudtzon:
Finally, I would like to mention that in a communications system containing noise it is possible to calculate various negative entropies: at the input to the channel, at the output from the channel, and corresponding expressions for the conditional probabilities between input and output. Many important conclusions can be derived on the basis of these calculations. It can also be demonstrated mathematically that the negative entropy will always decrease on passing through the system, and the average information can therefore never increase in an isolated system. This is completely analogous to the second law of thermodynamics.


Statistical Communication Theory
A Brief Outline of the Problem1)

Graduate Engineer Nic. Knudtzon, M.N.I.F.
Norwegian Defence Research Establishment, Bergen

1) Lecture presented at a study session for radio technology and electro-acoustics at Farris Bad, 16–18 June 1950.
2) Non-continuous.

This is a translation into English of the paper “Statistisk kommunikasjonsteori”, which appeared in Teknisk Ukeblad on November 16, pp. 883–887, 1950. The translation was done by Berlitz GlobalNET and final quality control was done by Geir E. Øien.

Dr. Nic. Knudtzon (80) obtained his Engineering degree from the Technical University of Norway, Trondheim in 1947 and his Doctor’s degree from the Technical University in Delft, the Netherlands in 1957. In 1948–1949 he was with the Research Laboratory of Electronics, Massachusetts Institute of Technology, working with information theory and experiments. In 1950–1955 he was with the Norwegian Defence Research Establishment, Bergen, working on the development of microwave radio links; and from 1955 to 1967 he was Head of the Communications Division at Shape Technical Center in The Hague, Netherlands, where his efforts went into the planning of military telecommunications networks and systems in Western Europe. From 1968 to 1992 he was Director of Research at the Norwegian Telecommunications Administration, working on the planning of future telecommunications systems, networks and services. Dr. Knudtzon has been a member of government commissions and various committees, including the Norwegian Research Council, the National Council for Research Libraries, the International Telecommunications Union, EURESCOM, etc.

The schematic diagram in Figure 1 illustrates a communication system, which consists of an information source, channel and destination, with the channel consisting of sender, transmission and receiver. The information source generates messages, which can either be discrete2), as used in telegraphy, or continuous as in telephony. The sender converts the message to a signal that suits the connection, which is a cable or radio connection. The receiver attempts to retrieve the original message by carrying out the inverse process of the sender. The destination is the person or device that the message is bound for.

Figure 1 Communication system (information source – message – transmitter – signal – noise added – signal + noise – receiver – message + noise – destination; sender, transmission and receiver constitute the channel)

Three things characterise a communication system of this type:

a) It is to transmit information.
b) The equipment and channel have a limited frequency band.
c) Noise occurs, while at the same time signal strength is lost. A certain signal/noise relationship is therefore required in order to achieve a specified message/noise relationship at the destination.

These facts may seem rather obvious to us today. However, it is only in recent years that we have fully taken the consequences of them, and in so doing have arrived at a general communication theory. In order to see this more clearly we will briefly consider some fundamental stages in the development of communication techniques.

The original idea was that the electrical function of time representing the message at receiver-out should be an exact reproduction of the time series for the message at sender-in. Consequently, what was desired was the elimination of the distance between the information source and destination, hence the prefix “tele” (far off) in telegraphy, telephone, television, telemetry etc. It was expected that part of the signal output would be lost in the channel, but the fact that the channel had a frequency-dependent response, just like all other electrical networks, came as a surprise to most of the communication engineers of the time. At that time only slow telegraphy and telephony had been developed, and the systems’ frequency characteristics were determined on the basis of practical tests, most of which were comprehensibility tests. Due to the properties of these special types of signals and the human ear, only the amplitude response was taken into consideration, while the phase response was almost completely ignored. Unfortunately, this became customary and is often still the case today. This further led to electrical networks being studied exclusively in the frequency domain for many years, by means of so-called “sinusoidal analysis” (steady state analysis). The network is then uniquely determined by its system function H(jω), which can be defined as the relationship between a signal’s frequency spectra at output and input. This is generally a complex-valued function.

The next milestone of significance to our discussion was the discovery of the radio valve, which






enabled the amplification of the signals. The losses could now be gained back, and this was of importance for transmission over both cables and radio. However, a new problem arose as different types of noise manifested themselves. Fluctuation noise, which enters via the aerial and is generated in valves and resistors, occurs at all frequencies; its power is proportional to the bandwidth. In order to combat the noise, other modulation methods were introduced in addition to the amplitude modulation (AM), which was known from multiplexing channels, amongst other things. Common to all modulation methods is that a low frequency message can be transmitted as a high frequency signal. Frequency modulation (FM) was the first solution. This idea was first put forward because it was hoped that one could manage with narrower frequency bands than with AM, so that the effect of the noise would be reduced. However, the opposite proved to be true, as practical tests showed that the broader the frequency band became, the more the message/noise relationship improved for the same transmit power. FM is thus an illustrative example of how progress has been made by fumbling. After further experimentation, the different pulse modulation systems were developed one by one; as we know, intense work is currently being carried out on pulse code modulation (PCM). Detailed investigations of all these systems show that for the same message/noise relationship at receiver-out, the sender’s average signal power PS can be reduced when the bandwidth W is increased, provided the signal/noise relationship KS = PS / PN (reduced to sender-out) is higher than a certain threshold value. Thus, noise reduction requires a broad frequency band. It should be added that when calculating the message/noise relationship for the different modulation systems, only messages that are sinusoidal functions of time have so far been considered.

As is known, during pulse modulation, a series of the message’s ordinates are transmitted instead of the entire continuous time series. This process is known as sampling. It can be shown3) that the message is completely and exactly determined by the samples, if these are taken more often than the equivalent of twice the highest frequency component. The samples are quantized in PCM, which means that only certain discrete amplitudes are transmitted. The stages between these are determined by the distortion that is thereby allowed. By sampling and quantizing, we have managed to convert the continuous message to a discrete message.
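The sampling-and-quantization process can be sketched numerically as follows (a minimal Python example with assumed parameters, added for illustration; it is not from the original paper):

import numpy as np

f_max = 3400.0                      # assumed highest frequency component (p/s)
fs = 8000.0                         # sampling rate chosen above 2 * f_max
t = np.arange(0, 0.01, 1 / fs)      # sampling instants
message = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * f_max * t)

levels = 16                         # number of discrete amplitudes (4-bit PCM)
lo, hi = message.min(), message.max()
step = (hi - lo) / (levels - 1)     # the step size sets the allowed distortion
quantized = lo + step * np.round((message - lo) / step)

# The quantization error never exceeds half a step:
print(np.abs(message - quantized).max(), "<=", step / 2)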

Television was developed around the same time as the modulation methods. The television signals consist of a series of pulses; a typical example of how such a pulse is deformed in an electrical network is shown in Figure 2. It is extremely important that the pulses do not overlap. The development of television therefore led to studies of the networks’ responses in the time domain for a pulse input of a suitable form. It has proved to be beneficial in calculations to use the so-called δ pulse, which is defined as follows:

δ(t) = 0 for t ≠ 0,  ∫_{−∞}^{+∞} δ(t) dt = 1

All other time series can be thought of as composed of such δ pulses. The network is subsequently precisely determined by its δ pulse response.

It may be appropriate here to explain the connection between the system function H(jω) and the δ pulse response h(t) for an electrical network. This has been done in detail in the Appendix. The results show that “sinusoidal analysis” and “δ pulse analysis” are closely connected and can be derived from each other. If we choose to work in the frequency domain, it has to be stressed that the amplitude response and the phase response are equally important, and that they cannot be specified independently of each other.
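In discrete time this unique relationship is easy to demonstrate: the frequency response is the Fourier transform of the pulse response, and either determines the other. A minimal Python sketch (the first-order network is an assumed example, not from the paper):

import numpy as np

a = 0.9                                  # assumed smoothing coefficient
N = 256
impulse = np.zeros(N); impulse[0] = 1.0  # discrete stand-in for the delta pulse

# Pulse response of the network y[n] = a*y[n-1] + (1-a)*x[n]
h = np.zeros(N)
y = 0.0
for i in range(N):
    y = a * y + (1 - a) * impulse[i]
    h[i] = y

H = np.fft.fft(h)                 # "sinusoidal analysis": the system function
h_back = np.fft.ifft(H).real      # transforming back recovers the pulse response
print(np.allclose(h, h_back))     # True: h(t) and H(jw) determine each other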

Servo mechanisms were particularly developed during and after World War II, as there was a great need for control equipment for artillery etc. The reason why development in this area has happened so quickly is that it quickly became apparent that the problems had much in common with those that were previously


3) “Information Theory”, lecture presented at a study session for radio technology and electro-acoustics at Farris Bad, 15–18 June 1950. ETT no. 30, 1950.

Figure 2 Typical deformation of a pulse in an electrical network (pulse in – network – deformed pulse out)


encountered in network theory, and thus known results could be utilised. However, the opposite effect has also taken place. An important problem that has had major consequences occurred in connection with anti-aircraft guns, where the target can move quickly and in irregular paths. It was therefore important to be able to predict the position of the target for the length of time that it took for the projectile to reach it. In order to solve this problem, a thorough study had to be made of the aircraft’s path as a function of time. A pilot under fire will naturally steer in curved paths, but the freedom of movement is limited by the construction of the aircraft. Thus, for example, the radius of curvature cannot be made smaller than a certain size depending on the aircraft’s design, weight and speed. The flight path cannot therefore be regarded as known in advance, but the path’s statistical properties can be determined for the types of aircraft that are targeted. Consequently, this is what needs to be borne in mind when constructing the predictor.

We are faced with exactly the same problems in communication engineering. The individual messages that we want to transmit over the system cannot be known in advance if they are to contain any information for the destination. There is therefore no point in transmitting a sinus function or a single δ pulse, as their time response is determined for all time. The message received could therefore be predicted immediately, and the easiest thing would be not to send it at all. “Sinusoidal analysis” and “δ pulse analysis” consequently have no real justification on their own. On the other hand, there is reason to emphasise that only a limited number of signals can be transmitted over a channel with a certain frequency band and signal-to-noise ratio. The communication system is therefore constructed for a class of messages with certain known statistical characteristics. For example, these could be amplitude distribution, correlation functions, power density spectra, letter and word frequencies etc. We will assume that the messages are statistically stationary, i.e. that these characteristics are invariant with respect to a shift of the time scale. Fluctuation noise is of this nature.

Let me now consider the filter problem illustrated in Figure 3. A message fm(t) and noise fn(t) are input to a linear electrical network, which we wish to construct so that the time function fu(t) is as exact a reproduction of the message-in fm(t) as possible. Completely exact rendering is out of the question, due to stray capacitance etc., which limits the frequency band; and if this were not limited, we would get infinitely strong noise. The best option is therefore to construct the system so that we get the least possible error between the actual time series fu(t) and the desired time series fd(t), which in this case with the filter is identical to fm(t). We must first decide what we mean by error. It is natural here that we specify this in the time domain. If we directly take [fu(t) − fd(t)], the error will be a function of time. If we take the mean over a long time, we see that positive and negative deviations will cancel each other out, and this is not what we want. We bypass this problem by taking the mean of the square of the difference. This means that the system will be determined by

lim_{T→∞} (1/2T) ∫_{−T}^{+T} [fu(t) − fd(t)]² dt = minimum

This error criterion is physically sound for many applications and leads to a mathematically solvable problem. We will call the corresponding network statistically optimal. When solving the problem mathematically, it is shown4) that the correlation functions for the classes of time series that occur at input and output determine the network’s system function. The network will always be realisable and stable, as the requirement for this is taken into consideration in the solution. It is also possible to derive an expression for the size of the error. Above we have considered the filter problem where fd(t) = fm(t). Generally, however, we can specify fd(t) as we wish, and determine the relevant network in a similar way as for the filter. For a statistically optimum differentiator we thus set fd(t) = fm'(t), and for a statistically optimum predictor fd(t) = fm(t + α), where α is the desired prediction time. It is worth noting that we can thus, without difficulty, demand a shift α in the time domain.


4) “Statistically Optimal Networks”, lecture presented at a study session for radio technology and electro-acoustics at Farris Bad, 15–18 June 1950. ETT no. 30, 1950.

Figure 3 The filter problem (message fm(t) and noise fn(t) enter a network with system function H(jω) and δ pulse response h(t); the output fu(t) is compared with the desired time function fd(t))


The problem of constructing statistically optimum networks for classes of time functions with known correlation functions is therefore solved in theory, and practical applications have also been found. It is apparent that the problem is posed very differently, and rests on a much sounder foundation, than previously known filter theory.
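The statement that the correlation functions determine the optimal network can be illustrated with a discrete-time least-squares sketch (Python, with assumed toy signals; a short finite filter solved from estimated correlations, not the paper's analytic solution):

import numpy as np

rng = np.random.default_rng(0)
N, taps = 20000, 8
message = np.convolve(rng.standard_normal(N), np.ones(20) / 20, mode="same")
noise = rng.standard_normal(N)
x = message + noise               # f_i(t) = f_m(t) + f_n(t): the filter problem
d = message                       # desired output f_d(t) = f_m(t)

# Estimated auto- and cross-correlations for lags 0..taps-1
phi_ii = np.array([np.dot(x[k:], x[:N - k]) / N for k in range(taps)])
phi_id = np.array([np.dot(d[k:], x[:N - k]) / N for k in range(taps)])

# The minimum mean-square condition in matrix form: R h = p
R = np.array([[phi_ii[abs(i - j)] for j in range(taps)] for i in range(taps)])
h = np.linalg.solve(R, phi_id)

y = np.convolve(x, h)[:N]         # filter output f_u(t)
print(np.mean((x - d) ** 2), np.mean((y - d) ** 2))  # the error shrinks after filtering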

It is important to have a clear view of what we really want to transmit. We shall use telephony as an example. Since the ear does not react to smaller distortions of the phase, an exact reproduction of the microphone current is not essential. We know from experience that a frequency band of at least 4,000 p/s is required to transmit speech of a reasonable quality, so that the user understands what the information source is relaying, and also to some degree gets an impression of its emotions and feelings, as is the case in direct speech. Since a very large part of the conversations over the commercial telephone network convey emotions, emphasis here is put on the transmitted message giving as close to the same impression to the listener as if the information source was addressing him directly. Practice has shown that the price of a telephone channel increases proportionately with the bandwidth, and for financial reasons it is therefore desirable to reduce this. This can be done in a number of ways; we shall briefly discuss some of these below.

a) Analysis-synthesis telephony. The microphone current is analysed in the sender; codes are transmitted for the tone type and the form of the power spectrum, and the codes then drive an “artificial voice” (oscillator set) in the receiver. The necessary bandwidth for the same comprehensibility as that of an ordinary channel is approximately 400 p/s, i.e. a reduction by a factor of 10. Systems of this type were built before the beginning of the 1940s by Bell Telephone Laboratories. They are known as vocoders.

b) Sound code telephony. Speech is physiologically limited to approximately 50 different sounds, and someone who talks quickly can barely express more than 5–6 per second. If every sound is given a code, a frequency band of approximately 40 p/s should be sufficient.

c) Sound group telephony. If we are only interested in transmitting certain sound groups (special sound combinations or words), the bandwidth for special systems can conceivably be further reduced by a factor of 10.

If any of these methods of bandwidth reduction are to be used, the terminal equipment will of course become more expensive, so that they can only be beneficial for long transmissions. However, we are here primarily interested in the fact that it is possible (without time delay) to transmit comprehensible speech over considerably narrower bands than those that are currently in use. This is done at the expense of the different information sources’ individual characters; with drastic bandwidth reduction the listener will understand well enough what is being said, but will not be able to determine whether the information source is male or female, or happy or sad. In many cases, for example in systems for transmitting orders, weather reports etc., the speech’s individual character is of little consequence. By reducing the bandwidth we have thus managed to remove the redundancy in the message. Only rarely will all the redundancy be removed, since redundancy helps to increase the comprehensibility if there are disturbances. Similar considerations can be made for other communication systems, for example for telegraphy and television. In the latter, transmitting the entire visible spectrum (3 × 10^14 p/s) has never been suggested; instead the image is analysed, a code signal is transmitted, and the image is recovered in the receiver by synthesis, all in full analogy with the previously discussed vocoder telephony. The conclusion is that communication systems must be constructed based on knowledge of the information source, the destination and what is to be transmitted. If information is being transmitted to or from a person, we need to know the reaction to different messages for the type of people that will use the system. Work is therefore currently underway in the USA to measure the comprehensibility and quality of messages to the ear and eye under different conditions.

Up to now we have used the expression information without further discussion of what it actually is. We have stated that we shall concentrate attention not so much on the time function as on what the time function represents – and this “what” is the information. We have furthermore indicated that each message is selected from the class of messages that we wish to transmit; information therefore has something to do with selection. The difficulty in trying to give a closer definition of what we mean is not only due to the fact that we have included people in our communication system. Even in cases where text is transmitted from paper roll to paper roll in telegraphy (where people are not counted as either information sources or destinations), we have not yet been able to account for how much information can be transmitted per second. Neither have we been able to say how much information is lost if 1 % of the characters are transmitted incorrectly. We therefore need a measure (i.e. a unit) of information, just like 1 volt is the unit for electric voltage. A unit of this type is defined5) and has been given the designation 1 bit (or 1 Hartley).


With a unit of this type at our disposal, we can logically attack the problem of transmitting information as effectively as possible. Let us consider a simple example from telegraphy. The message consists of letters of varying frequency of occurrence. In Norwegian and English the letter E appears more often than, for example, Z; in an efficient system we should therefore allow the signal for E to be shorter than the signal for Z, as in Morse code. In a telegraphy system for the Czech language, the messages will have other letter frequencies, and a different code is therefore desired. Similar considerations can be made for the other types of communication systems. What is generally involved is converting the message so that it is statistically adapted to the channel. This process is known as coding, and will generally require a time delay in the sender, which consequently must be equipped with a memory of one type or another. In order to avoid misunderstandings, we draw attention to the fact that this form of coding must not be confused with the previously discussed simplification of the message by removing the redundancy.
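The gain from such statistical adaptation can be put in numbers (a toy Python sketch with assumed letter frequencies, added for illustration): the entropy of the letter distribution is the lower bound on the average code length, and a code with short signals for frequent letters approaches it.

import math

freq = {"E": 0.40, "T": 0.30, "Q": 0.28, "Z": 0.02}   # assumed toy frequencies
entropy = -sum(p * math.log2(p) for p in freq.values())
print(round(entropy, 3), "bits per letter on average")

# A prefix-free code adapted to the statistics: frequent letters get short signals
code = {"E": "0", "T": "10", "Q": "110", "Z": "111"}
avg_len = sum(freq[s] * len(code[s]) for s in freq)
print(round(avg_len, 3), "code characters per letter (never below the entropy)")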

It can be shown6) that a channel with bandwidth W p/s and signal-to-noise ratio KS (reduced to sender-out) during the time T can transmit a quantity of information:

H = T W log2 (1 + KS) bits

With optimal coding, the channel's information capacity is thus

C = H/T = W log2 (1 + KS) bits/sec

C expresses the maximum number of bits per second that it is possible to transmit over such a channel. There is thus an optimum that can easily be calculated, but which in practice can only be achieved with optimal coding. Generally speaking, however, it is also possible to calculate how far from this optimum the individual systems are. Previously, a communication system’s effectiveness for transmitting information could only be estimated by carrying out comprehensibility tests, which is a subjective measurement unless many different people are used to represent the information source and destination. Now, on the other hand, we have obtained a tool for objectively calculating a system’s information efficiency.

From the expression above we see that for the same information capacity the bandwidth W can be reduced, if KS, i.e. the sender output, is increased sufficiently. Conversely, an increase in the bandwidth will allow a lower sender output. This concurs exactly with practical experience.
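This trade-off is easily checked numerically (a minimal Python sketch with assumed values, not from the original paper):

import math

def capacity(W, K_s):
    # C = W log2(1 + K_S), in bits per second
    return W * math.log2(1 + K_s)

C = capacity(3000, 1023)       # a 3000 p/s channel at K_S = 1023
print(C)                       # 30000.0 bits/sec

# For the same capacity over twice the bandwidth, the required
# signal-to-noise ratio drops from 1023 to:
print(2 ** (C / 6000) - 1)     # 31.0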

In the introduction we mentioned the three things that characterise a communication system: information, frequency band and signal-to-noise ratio. We have now arrived at a general relationship between them.

We have confined ourselves above to considering electrical communication systems. The points of view that have been expressed can however be applied to communication systems in a much wider context. The science that generally deals with the control and transmission of information in living beings and machines has been given the name Cybernetics (Greek for steersman) by its founder, Norbert Wiener.

What is fundamentally new is that we have found telecommunication to be a statistical process and that a unit is defined for information. Whilst before we considered the different systems individually, we can now look at them collectively. This line of thought is very important; it opens up great possibilities and will undoubtedly gain greater practical importance as we determine statistical characteristics for the different classes of messages we want to transmit.

5) “Information Theory”, l.c.
6) “Information Theory”, l.c.

Appendix

Correlation between “sinusoidal analysis” and “δ pulse analysis”

The system function H(jω) for a network can be defined as the relationship between a signal’s frequency spectrum at output, Fu(jω), and at input, Fi(jω):

H(jω) = Fu(jω) / Fi(jω)   (1)

We select a δ pulse as the signal; it can be regarded as the limit shape for the pulse (a/√π) exp(−a²t²) when a → ∞. The frequency spectrum-in is, in accordance with Fourier’s integral formula,

Fi(jω) = ∫_{−∞}^{+∞} lim_{a→∞} [(a/√π) exp(−a²t²)] exp(−jωt) dt

or, as it is permitted to reverse the order of the limit and the integral,

Fi(jω) = lim_{a→∞} ∫_{−∞}^{+∞} (a/√π) exp(−a²t² − jωt) dt = 1   (2)

The network’s δ pulse response is h(t), i.e. the frequency spectrum-out is

Fu(jω) = ∫_{−∞}^{+∞} h(t) exp(−jωt) dt   (3)

Consequently, with the insertion of (2) and (3) in (1),

H(jω) = ∫_{−∞}^{+∞} h(t) exp(−jωt) dt

or, according to Fourier’s integral formula,

h(t) = (1/2π) ∫_{−∞}^{+∞} H(jω) exp(jωt) dω

There is therefore a unique relationship between H(jω) and h(t).

Literature

Shannon, C E. A Mathematical Theory of Communication. Bell Syst. Techn. J., 27, 379–423, 623–656, 1948.

Shannon, C E. Communication in the Presence of Noise. Proc. IRE, 37, 10–21, 1949.

Wiener, N. Cybernetics. Wiley, 1948.

Wiener, N. Extrapolation, Interpolation, and Smoothing of Stationary Time Series. Wiley, 1949.

Halsey, R J, Swaffield, J. Analysis-Synthesis Telephony with Special Reference to the Vocoder. Proc. Inst. Elec. Eng., III 95, 391–405, 1948.

Statistically Optimal Networks
Summary of lecture presented at a study session for radio technology and electro-acoustics at Farris Bad, 16–18 June 1950
by Graduate Engineer Nic. Knudtzon, Norwegian Defence Research Establishment, Bergen

A more detailed presentation will be issued as a separate publication. Only a brief outline of the problem will be presented here without any details about the mathematical solution.

This is a translation into English of the paper “Statistisk optimale nettverk”, which appeared in Elektroteknisk Tidsskrift 63 (30), 413–416, 1950. The translation was done by Berlitz GlobalNET and final quality control was done by Geir E. Øien.

The individual messages (signals) that are transmitted over a communication system cannot be known in advance if they are to contain any information for the addressee. The only thing that is known is certain statistical characteristics for the class of messages we want to transmit.1)

Previously, electrical networks were constructed on the basis of messages that were a sine curve or a single δ pulse. The responses for these time series are determined for ever, and will not transmit any information. These methods are therefore not based on realistic situations. However, all other time series can conceivably be built up of such sine curves or δ pulses. We have been aware of this for a long time, but until recently we have neglected to take account of the individual components’ statistical weight, i.e. how often the individual frequency components or δ pulses appear in the messages. When we characterise an electrical network, we should combine the system function H(jω) and the δ pulse response h(t) with the statistical characteristics for the class of messages the network is to transmit. As we know, the following unique relationship exists between an electrical network’s system function and δ pulse response:

H(jω) = ∫_{−∞}^{+∞} h(t) ε^{−jωt} dt

h(t) = (1/2π) ∫_{−∞}^{+∞} H(jω) ε^{jωt} dω

A filter separates a desired message from an unwanted message and noise. Previously, work was carried out in the frequency domain, and the filter’s critical frequencies were set more or less randomly on the basis of practical tests, in which it was found which amplitude and phase response separated the unwanted messages from the desired messages most effectively without any substantial distortion. “Most effective” and “substantial” were, however, based on subjective assessments. The calculations, therefore, were not founded on any firm basis. Gradually the need for more complete solutions increased, and then the television problems arose, which due to the nature of the message were different from those previously encountered in telephony and telegraphy. We were then led to proceed tentatively in finding usable solutions in this new field. Communication engineering was in many ways an art, since not only the principle of the constructions but also their application depended a lot on the engineer’s ingenuity and “good nature”, whereas the quality could be discussed depending on personal taste. With the wealth of experience we have gradually gained in this way, very good and efficient networks have undoubtedly emerged. However, we have still not been able to judge how efficient these networks are, because the optimum has remained unknown. Moreover, we have been led to proceed tentatively each time a new type of problem has turned up. In brief, we have missed a general and well-founded theory for networks that will

1) See also “Statistical communication theory. A brief outline of the problem”, Teknisk Ukeblad, 1950.

Figure 1 The network problem (message fm(t) and noise fn(t) enter a network with system function H(jω) and δ pulse response h(t); the output fu(t) is compared with the desired time function fd(t))




transmit information of different kinds. The problem and methods we are now dealing with constitute a step in the development towards this goal.

We will now formulate the network problem in the time domain, see Figure 1. The messages are regarded as being continuous functions of time, where the redundancy is removed to the extent we wish. The information in a message of this type can easily be calculated.2) Our problem is now to create a network in such a way that for certain classes of time functions-in fi(t) we get time functions-out fu(t) which deviate as little as possible from the desired time functions fd(t). The deviation is defined in the time domain as the mean squared error, i.e.

E = lim_{T→∞} (1/2T) ∫_{−T}^{+T} [fu(t) − fd(t)]² dt = minimum   (1)

Networks that are constructed based on this minimum condition are defined as statistically optimal. Some examples of these networks are given in the table below, where fm(t) represents a message and fn(t) represents noise or an unwanted message – both have known statistical characteristics.

fi(t)              fd(t)          network
fm(t) + fn(t)      fm(t)          filter
fm(t)              fm'(t)         differentiator
fm(t)              fm(t + α)      predictor
fm(t) + fn(t)      fm(t + α)      filter-predictor

It is worth noting that we can specify a time shift α in the time domain, and are thereby able to construct networks, known as predictors, which can predict the outcome of fm(t).

It is assumed in the following that

1) all time functions-in are statistically stationary, i.e. their statistical characteristics are invariant upon a translation in time of the associated time function. Commonly occurring messages and noise will most often fulfil this condition;

2) the network is linear. This condition is enforced due to the mathematical difficulties when solving non-linear problems. A solution of the linear problem is however better than none at all.

As we already know, the following relationship exists between time function-out fu(t) and time function-in fi(t) in a linear electrical network3)

fu(t) = ∫_{−∞}^{+∞} fi(t − τ) h(τ) dτ   (2)

where h(t) is the δ pulse response.

Upon insertion of equation (2) in (1), the condition for obtaining a minimum is attained on the following form:

E = ∫_{−∞}^{+∞} h(τ) dτ ∫_{−∞}^{+∞} h(σ) dσ [lim_{T→∞} (1/2T) ∫_{−T}^{+T} fi(t) fi(t − (σ − τ)) dt]
  − 2 ∫_{−∞}^{+∞} h(τ) dτ [lim_{T→∞} (1/2T) ∫_{−T}^{+T} fi(t − τ) fd(t) dt]
  + [lim_{T→∞} (1/2T) ∫_{−T}^{+T} fd(t) fd(t) dt] = minimum   (3)

The expressions in the brackets are all of the form

ϕ12(χ) = lim_{T→∞} (1/2T) ∫_{−T}^{+T} f1(t) f2(t ± χ) dt

This function is known as the correlation function; for f1(t) = f2(t) we get auto-correlation, and for f1(t) ≠ f2(t) we get cross-correlation.

The auto-correlation function ϕ11(χ) is a very important statistical parameter for the associated time function f1(t). Its most important property is expressed in the Wiener–Khintchine theorem: ϕ11(χ) is the Fourier transform of the power density spectrum Φ11(jω), which is defined as the mean output per frequency unit for f1(t). This is expressed mathematically as follows:

Φ11(jω) = ∫_{−∞}^{+∞} ϕ11(χ) cos(ωχ) dχ   (3a)

ϕ11(χ) = (1/2π) ∫_{−∞}^{+∞} Φ11(jω) cos(ωχ) dω   (3b)

Corresponding relations apply to the cross-correlation function. Equation (3a) forms the key equation for the correlation method: the power density spectrum can be determined via the correlation function, which is often easily calculated based on its definition equation by means of statistical methods.
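In discrete form the theorem can be verified directly (a Python sketch on an assumed correlated-noise signal; circular correlation is used so that the identity is exact, which differs slightly from the paper's continuous-time integrals):

import numpy as np

rng = np.random.default_rng(1)
N = 4096
f1 = np.convolve(rng.standard_normal(N), np.ones(8) / 8, mode="same")

X = np.fft.fft(f1)
power_spectrum = np.abs(X) ** 2 / N            # mean output per frequency unit
phi_11 = np.fft.ifft(np.abs(X) ** 2).real / N  # circular auto-correlation

# Wiener-Khintchine: the Fourier transform of phi_11 is the power spectrum
print(np.allclose(np.fft.fft(phi_11).real, power_spectrum))   # True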


2) “Information theory”, lecture presented at a study session for radio technology and electro-acoustics at Farris Bad, 16–18 June 1950. ETT no. 30, 1950.
3) See for example “Theory of Servomechanisms”, R.L. Series No. 25, p 35.


Upon insertion of the correlation functions, equation (3) takes the following form:

E = ∫_{−∞}^{+∞} h(τ) dτ ∫_{−∞}^{+∞} h(σ) dσ ϕii(τ − σ) − 2 ∫_{−∞}^{+∞} h(τ) ϕid(τ) dτ + ϕdd(0) = minimum   (4)

Here the correlation functions are known for the classes of time series we will deal with, and the network is selected so that its δ pulse response h(τ) makes the error E a minimum. This is a variational problem, which is solved by varying h(τ) and using the Euler–Lagrange condition for extremal values. Equation (4) is hereby reduced to the Wiener–Hopf integral equation

∫_{−∞}^{+∞} h(σ) dσ ϕii(τ − σ) − ϕid(τ) = 0 for τ > 0   (5)

When solving this equation, consideration is given to the fact that the network must be stable. The condition for this is h(τ) = 0 for τ < 0, or, expressed in the frequency domain, that the system function H(λ) does not have poles inside the right half plane.

The system function for the statistically optimal network can then be shown to be

H(λ) = [1 / (2π Φii^(V)(λ))] ∫_0^∞ ε^{−λt} dt ∫_{c−j∞}^{c+j∞} [Φid(w) / Φii^(H)(w)] ε^{wt} dw   (6)

where λ = σ + jω and w = x + jy are complex-valued variables. The power spectrum for the time series-in, Φii(λ), is split into factors Φii^(H)(λ) and Φii^(V)(λ), which have all their poles and zeros in the right and left half plane respectively, i.e.

Φii(λ) = Φii^(H)(λ) · Φii^(V)(λ)

When H(λ) is thus determined, we will find a configuration of common network elements that has this system function. This is known as synthesis, and there are several methods for solving this problem. It must be pointed out that the synthesis problem does not have a unique solution; there are several different configurations that have one and the same system function.

It is also possible to find an expression for the size of the error. Just like the system function, this is completely and exclusively determined by the correlation functions (or their Fourier transforms, which are the power spectra) for the classes of time functions that occur at input and that are desired.

As an example we will find the special system function for a predictor. Here

fi(t) = fm(t) and fd(t) = fm(t + α)

where α is the prediction time. Consequently

Φii(λ) = Φmm(λ)

and it can further be shown that

Φid(λ) = Φmm(λ) ε^{λα}

so that, after equation (6),

H(λ) = [1 / (2π Φmm^(V)(λ))] ∫_0^{+∞} ε^{−λt} dt ∫_{c−j∞}^{c+j∞} Φmm^(V)(w) ε^{w(t+α)} dw   (7)

We will apply this equation to a simple example. Let fm(t) be a time series as shown in Figure 2, consisting of a series of identical pulses of the form tε^{−t}. The single pulses are totally independent of each other and occur at random, k per unit time. The correlation function can be shown to be

ϕmm(τ) = (k/4) ε^{−|τ|} (1 + |τ|)

and by applying equation (3a) the power spectrum is found as

Φmm(jω) = k · 1/(1 + ω²)²

This is factorised as follows:

Φmm(jω) = k · 1/(1 + jω)² · 1/(1 − jω)²

and this results in

Φmm^(V)(w) = √k · 1/(1 + w)²

Figure 2 Signal: a train of pulses of the form tε^{−t}



The system function for the optimal predictor for this time series is then

H(λ) = [(1 + λ)² / 2π] ∫_0^{+∞} ε^{−λt} dt ∫_{c−j∞}^{c+j∞} [ε^{w(t+α)} / (1 + w)²] dw

or, calculated, as the complex integration is performed using the residue theorem,

H(λ) = ε^{−α}(1 + α + αλ)

This is a network as shown in Figure 3. No guidelines have previously existed for the construction of predictors.
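The result can be checked numerically: multiplication by λ corresponds to differentiation, so the predictor forms ε^{−α}[(1 + α) fm(t) + α fm'(t)], which for a single pulse tε^{−t} reproduces fm(t + α) exactly. A Python sketch (the discretization is an assumption, not part of the original lecture):

import numpy as np

alpha = 0.5                                  # assumed prediction time
t = np.linspace(0.0, 8.0, 2001)
f = t * np.exp(-t)                           # the pulse of Figure 2
df = np.gradient(f, t)                       # numerical derivative

# H(lambda) = e^{-alpha}(1 + alpha + alpha*lambda) acting on f:
predicted = np.exp(-alpha) * ((1 + alpha) * f + alpha * df)
actual = (t + alpha) * np.exp(-(t + alpha))  # f(t + alpha)

print(np.max(np.abs(predicted - actual)))    # small; limited only by the finite-difference derivative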

Lee & Stutt [3] have carried out a practical demonstration of predictors for filtered fluctuation noise, see the block diagram in Figure 4. The noise source was a radio tube 6D4, and the filter a single resonance circuit with fo = 1080 p/s and Q varying in stages from 10 to 90. The auto-correlation function for such filtered noise can be calculated; it will be a damped cosine function with frequency fo and attenuation proportional to Q. The optimal predictor was then constructed based on equation (7). Figure 5 shows an example of the prediction when Q = 10; the solid curve represents the time function at predictor-out and the dotted one at predictor-in. The predicted function is seen to follow the actual function nicely. Such predictors can be envisaged to have a bearing on the suppression of noise in communication systems.

Similar examples can be found for other types of statistically optimal networks: filters, differentiators, compensators etc., which are determined by equation (6) upon due specification of the desired time series fd(t). This equation therefore has a very general validity.

Literature

1 Wiener, N. Extrapolation, Interpolation and Smoothing of Stationary Time Series. The Technology Press, John Wiley, 1949.

2 Lee, Y W, Wiesner, J B. Correlation Functions and Communication Applications. Electronics, 23, 86–92, 1950.

3 Lee, Y W, Stutt, C A. Statistical Prediction of Noise. Proc. Nat. Elec. Conf., Chicago, 5, 342–365, 1949.

Excerpts from the Discussion After the Lecture

Garwick:

1. Derivation of the system function H(jω) based on the delta function is not mathematically correct. The delta function is not a mathematical function; it does not satisfy Dirichlet’s definition of a function, and cannot be integrated, either as a Riemann or as a Lebesgue integral. When using delta functions, the order of the limit and integral sign is swapped. The limit must be carried out before the integral sign. It is possible that the result will be correct regardless, but this must be checked using other methods.

2. Where the error size E was discussed, there was absolutely no mention of the transmission channel’s noise. Is the intention to imagine that the signal plus noise at input are already amplified so much that the channel’s own noise is negligible?

Figure 3 Predictor circuit for the signal in Figure 2 (with gain elements 1 + α and α)
Figure 4 Demonstration of predictor for noise (noise source – filter – predictor – two-beam oscilloscope)
Figure 5 Example of prediction of filtered fluctuation noise


3. The equation

E = lim_{T→∞} (1/2T) ∫_{−T}^{+T} [fm(t) − fd(t)]² dt

cannot always be used. f(t) will in practice be almost equal to zero for t < T1 and t > T2. The integral can then be divided in three:

(1/2T) ∫_{−T}^{T1} [ ]² dt + (1/2T) ∫_{T1}^{T2} [ ]² dt + (1/2T) ∫_{T2}^{+T} [ ]² dt

When T goes to infinity, this expression goes to zero regardless of the network.

Knudtzon:

1. Communication engineers and mathematicians will probably never agree on the δ function. I refer to a very sound and worthy article by van der Pol.4) I only used the δ function when recapitulating known results from network theory, and could not take time to discuss this in more detail here on the blackboard, so as to avoid displacing the topic of my lectures.

2. Linear networks are assumed. The noise from the network is therefore referred to the input.

3. Statistically stationary time series are expressly assumed. The equation is mathematically correct.

C.B. & H.G.W.

4) Philips Research Reports 1948, pp 174–190.



Special


The social responsibility of business is to increase profits.

Milton Friedman, Nobel-prize-winning economist (quoted in Hood, 1996, p. 16).

Corporate social responsibility is not an occasional activity. It is not like visiting a mausoleum once a year, or hearing a church sermon every Sunday. It has to be completely integrated with the corporation’s business function.

Muhammad Yunus, Managing Director of the Grameen Bank, in a personal interview (May 2, 2001).

The mobile phone is like a cow. It gives me “milk” several times a day. And all I need to do is to keep its battery charged. It does not need to be fed, cleaned, and milked. It has now connected our village with the world.

Parveen Begum, owner and sole dispenser of mobile telephony services in Village Chakalgram, Savar Thana, Bangladesh, in a personal interview (May 2, 2001).

I want my fellow Americans to know that the people of Bangladesh are a good investment. With loans to buy cell phones, entire villages are brought into the information age. I want people throughout the world to know this story.

U.S. President Bill Clinton in an address during his meeting with members of the Village Phone Project in Dhaka, Bangladesh, in March 2000.

The present article investigates2) the mobile telephony operations in Bangladesh of Telenor, the leading Norwegian telecommunications company. Telenor has forged a business and strategic social change partnership with the Grameen Bank, one of the best known development organizations in the world. A historical background on Telenor’s involvement in Bangladesh is provided. Telenor’s business and social accomplishments in Bangladesh are presented, highlighting how a corporation can pursue multiple bottom lines.

Multiple Bottom Lines?1)
Telenor’s Mobile Telephony Operations in Bangladesh

Arvind Singhal, Peer J. Svenkerud and Einar Flydal

The present article distills lessons learned about sound business and corporate social responsibility practices from Telenor’s participation in mobile telephony operations in Bangladesh. Telenor’s mobile telephony operations in Bangladesh provide valuable lessons about how corporations can strategically make forays in uncharted markets, and, while doing so, creatively seek and meet multiple co-existing bottom lines. Telenor is financially, intellectually, and structurally “richer” through its Bangladesh venture, gaining significant new insights and experiences in doing business in “distant” geographies, developing new culturally-based “benchmarking” standards, and experimenting with new business models that integrate telephone “ownership” with “access”. Further, by partnering in Bangladesh with an internationally-known socially-driven development organization (i.e. the Grameen Bank), Telenor has strategically gained global visibility in both corporate and social sectors.

Peer J. Svenkerud (36) is Director of Stakeholder Relations, Telenor ASA. Svenkerud holds a PhD in organizational communication with specific focus on intercultural issues from Ohio University, and has conducted postgraduate studies at Harvard Business School. He is author of numerous peer-reviewed articles in international journals, and top academic papers at international conferences. His research interests center around information diffusion campaigns and the social impact of new communication technologies. Svenkerud was formerly director of Burson-Marsteller in Oslo, and Assistant Professor at the University of New [email protected]

Dr. Arvind Singhal (39) is Presidential Research Professor and Scholar in the School of Interpersonal Communication, College of Communication, Ohio University, where he teaches and conducts research in the areas of diffusion of innovations, mobilizing for change, design and implementation of strategic communication campaigns, and the entertainment-education communication strategy. Dr. Singhal is co-author of four books and has won several Top Paper Awards. He has won the Baker Award for Research at Ohio University twice, and numerous other teaching and research recognitions. Dr. Singhal has served as a consultant to the World Bank and UN programs, as well as private [email protected]

This grocery shop is now the village information hub: The village phone is here during opening hours, and otherwise with the shopkeeper’s wife at home


Telenor Goes to Bangladesh

How did Telenor get involved in Bangladesh? To fully answer this question, a little background on the Grameen Bank operations in Bangladesh is useful. The Grameen (rural) Bank, founded in Bangladesh in 1983 by Professor Muhammad Yunus, is a system of lending small amounts of money to poor women so that they can earn a living through self-employment. No collateral is needed, as the poor do not have any. Instead, the women borrowers are organized in groups of five friends. Each group member must repay her loan on time, while ensuring that the other group members do the same, or else their opportunity for a future loan is jeopardized. This delicate dynamic between “peer pressure” and “peer support” among Grameen borrowers is at the heart of its widespread success (Yunus, 1999). By December 2001, the Grameen Bank had loaned money to about 2.4 million poor women borrowers, and had an enviable loan recovery rate of 95 percent. The idea of micro-lending, based on the Grameen Bank experience, has spread throughout the world, and has everywhere proven effective in achieving a high rate of repayment of the loans.

In the mid-1990s, the Grameen Bank began discussions with various mobile telephony operators around the world, including Telenor, to accomplish its vision of placing one mobile phone in each of the 68,000 villages of Bangladesh. At that time, there was one telephone in Bangladesh for every 400 people, representing one of the lowest telephone densities in the world3). There was virtually no access to telephony services in rural areas, where 85 percent of Bangladesh’s 130 million people lived. Professor Yunus realized that while it was not possible for each rural household to own a telephone, it was possible through mobile telephone technology to provide access to each villager. To operationalize his vision, Professor Yunus established a non-profit organization called Grameen Telecom.

Telenor CEO Tormod Hermansen was intrigued by Professor Yunus’ idea of the village telephone, and believed that Bangladesh, given it only had 500,000 fixed line telephones in urban areas, presented a significant business opportunity for Telenor. Mobile telephony services could address the large unmet demand for telephony in Bangladesh, where the waiting period for a private fixed line connection was ten years. While there were significant “first-mover” advantages to be gained, the business risk was extremely high, given the unpredictable nature of Bangladesh’s political and regulatory environment. Telenor had previously never conducted business in a developing country in Asia, and Bangladesh seemed aeons away from Norway. While Mr. Hermansen’s top advisors were “torn” about whether or not to foray into Bangladesh, Mr. Hermansen was enthusiastic, and provided patronage for the project to move forward4).

So, in 1996, Telenor and Grameen Telecom formed a joint venture company called GrameenPhone Ltd (GP). Telenor provided 51 percent of the equity investment, Grameen Telecom provided 35 percent, Marubeni of Japan provided 9.5 percent, and Gonofone Development Corporation of USA provided the remaining 4.5 percent. The company, GrameenPhone, was awarded a license to operate a nation-wide GSM-900 cellular network on November 11, 1996. GP started its operation on March 26, 1997.

In launching its Bangladesh operations, GrameenPhone knew that its commercial viability depended on meeting the large unmet need for telephony services in urban areas. Since its inception in 1997, GrameenPhone’s subscription has doubled each year, reaching 500,000 subscribers by December 2001. The company turned a profit three years later, in 2000, with even brighter business prospects ahead: demand for mobile telephony services in Bangladesh is estimated at about 5 to 6 million subscribers (out of a population of 130 million people). GrameenPhone’s growing mobile telephony network in the country, and its financial viability, enables the Grameen Telecom Village Phone Project to piggyback on it.

GrameenPhone sells air time in bulk to Grameen Telecom, which re-sells it to members of Grameen Bank in villages. The eventual goal is for one Grameen borrower in each of the nation’s 68,000 villages to become the “telephone lady” for her village. Some 10,000 villages had been covered by February 2002. The village telephone lady operates a mobile pay phone business, with the cheapest cellular rate in the world: 9 cents per minute during peak hours and 6.7

Einar Flydal (53) is cand.polit. from the University of Oslo, 1983, in political science, and a Master of Telecom Strategy from the University of Science and Technology (NTNU), Trondheim, 2002. Apart from 1985–1993, when engaged in IT in education for the Ministry of Education and with small IT companies, Flydal has worked with Telenor in a variety of fields since 1983. He has also worked as a radio freelancer on minorities and music; on statistical indicators at the Chair of Peace and Conflict Research, Univ. of Oslo; and on work organisation on oil platforms at the Work Research Institute, Oslo. Present professional interests: environment, CSR, innovation, and ICT.

[email protected]

More than 10,000 village phone ladies can show you with pride their new means of income and social advancement: the mobile phone


cents in the off-peak. Her “mobile” presence means that all village residents can receive and make telephone calls, obviating the need to install expensive large-scale telephone exchanges and digital switching systems.

Strategic Importance of GrameenPhone

The Telenor – GrameenPhone venture is of tremendous strategic importance to Telenor for at least two compelling reasons:

1 GrameenPhone was a majority stake, start-up venture for Telenor, as opposed to it purchasing a minority holding in an already established telecommunications business venture (as is the case with Telenor’s involvement in Thailand and Malaysia). So Telenor was involved in launching GrameenPhone from day one, thus experiencing the full gamut of pioneering experiences.

2 The Bangladesh venture, metaphorically speaking, was as “distant” as could be from Telenor’s past business ventures. Here was an established, affluent, Norwegian corporation, on the cutting edge of telecommunications technology, adept at doing business in a stable political and regulatory environment, and in a country where the telephone density is the highest in the world, establishing a business venture in a far-away, fledgling, unserved and underserved Bangladeshi market, where the telephone density is about the lowest in the world, and the political and regulatory environment is relatively unstable. When one adds to this the social and cultural “distance” between Norwegians and Bangladeshis, one realizes this venture was bound to yield significant new learnings for Telenor.

Multiple Bottom Lines

Telenor’s involvement in the GrameenPhone mobile telephony project in Bangladesh has yielded an impressive list of business and social accomplishments:

Market Penetration and Return on Equity

• As noted previously, GrameenPhone’s subscription has more than doubled each year to reach 500,000 subscribers by December 2001, which represents the biggest subscriber base and coverage of any mobile telephony operator in Bangladesh, and in the entire South Asia region. Mobile telephony users5) (650,000) in Bangladesh in December 2001 outnumber the country’s fixed-line telephone subscribers (590,000).

• GrameenPhone earned a profit in both 2000 and 2001 and is strategically positioned in 2002 for “explosive” growth. The company’s market value now, estimated by its management at a modest $600 to $800 per subscriber, is $300 to $400 million (U.S.). These numbers suggest that GrameenPhone’s present value to Telenor is about six to seven times its majority (51 percent) equity investment of $40 plus million.

Contribution to the Bangladesh Economy

• GrameenPhone, to date, has invested $160 million in Bangladesh, making it the largest foreign private investor in the country.

• GrameenPhone has to date contributed $75 million to the national treasury of Bangladesh in the form of new telephony tariffs, license fees, fees for leasing of the fiber-optic line, and other such receivables.

• By February 2002, GrameenPhone has directly created about 600 jobs (its employee strength) internally and 10,000 jobs externally in 10,000 Bangladeshi villages through the Village Phone Project of Grameen Telecom.

Service Provision in Unserved and Underserved Areas

• The 10,000 village-based mobile phones, leased or purchased by women members (also called “village telephone ladies”) of the Grameen Bank through a loan, serve 18 million rural inhabitants, who previously did not have access to telephony services. By the end of 2004, the number of village phones will likely increase to 40,000, serving an estimated 75 to 80 million rural inhabitants, about two-thirds of the entire Bangladesh population.

• The village phones, on average, generate 3–4 times more revenues for GrameenPhone than an individual city/township subscription.

• The village telephone ladies, on average, make $70 to $80 per month of net profit from selling mobile telephony services in rural areas, which amounts to three times the per capita GNP of Bangladesh.

• In overall terms, the Village Phone Project (VPP) makes telephony services accessible and affordable to poor, rural Bangladeshis, spurs employment, increases the social status of the village telephone ladies, provides access to market information and to medical services, and represents a tool to communicate with family and friends within Bangladesh and outside. Studies indicate that the VPP has had a tremendous positive economic impact in rural areas, creating a substantial consumer surplus, and immeasurable quality-of-life benefits (Richardson et al., 2000; Bayes et al.,


1999). For instance, the village phone now obviates the need for a rural farmer to make a trip to the city to find out the market price of produce, or to schedule a transport pick-up. The village phone accomplishes the task at about one-fourth the transportation cost and almost instantaneously (as compared to the hours it can take to make the trip), serving as a boon to the rural poor.

Gains in Intellectual and Structural Capital

What gains in intellectual and structural capital have accrued to Telenor through its involvement in GrameenPhone?

Pioneering Experience in Uncharted Markets

• Reaching out to relatively underserved and unserved telecommunications markets, while relatively risky for an organization like Telenor, is a viable business proposition, especially if it wishes to expand operations geographically and pioneer in gaining new business competencies. Being an early entrant in the field of mobile telephony in a relatively unserved and underserved telecommunications market bodes well for Telenor to maintain its dominant market leader position in Bangladesh, and to forge new opportunities elsewhere.

Rethinking Benchmarks

• Telenor has learned that conventional European benchmarking for estimating market potential – using measures of per capita GNP or Western patterns of telecommunications traffic – may be inappropriate or at least inadequate in the Asian context. In Bangladesh, for instance, people spend a much higher percentage of their disposable income on telephony than in Western countries. Also, Asian cultures, by virtue of their “collectivistic” orientation and extended kinship structures, spur more frequent telephone talk between family members and friends, and for extended durations.

Teleaccess Business Models

• In relatively underserved and unserved telecommunication markets, business potential should be evaluated not just on the basis of teledensity (or potential ownership) but also on teleaccessibility (or the potential for providing access to those who cannot afford to own a subscription), as evidenced by the large reach of the Village Phone Project.

Choice of Partner• In relatively “uncharted” markets or new

geographies, it is of critical importance tochoose a suitable local partner who addslong-term complementary value. Telenor’smajor partner in Bangladesh is Grameen Tele-com, a non-governmental organization floatedby the internationally-acclaimed microlendinginstitution, Grameen Bank, which has 2.4million borrowers in 41,000 of the 68,000Bangladeshi villages. In Bangladesh (andoverseas), “Grameen” has tremendous brandequity by virtue of its widespread success inpoverty alleviation, empowerment of ruralwomen, and its well-known credo that “gooddevelopment is good business” (the slogan ofthe the Village Phone Project).

Many in Bangladesh feel that the “Grameen”brand is far more recognized in Bangladeshthan even Coca Cola! So branding the newventure “GrameenPhone” brought instantcredibility to Telenor’s business venture inBangladesh. Also, Telenor’s partnering withGrameen Telecom made possible the VillagePhone Project, whereby Grameen Bank bor-rowers who take loans to lease or purchase themobile telephone sets now settle their monthlytelephone bills through the bank workers. Thealready existing village-based loan disburse-ment and repayment infrastructure of theGrameen Bank allows for handling the logis-tics of the Village Phone Project at a verysmall, additional marginal cost. As noted pre-viously, while the Village Phone Project rep-resents some 10,000 subscribers (2 percent ofGrameenPhone’s 500,000 subscribers), and arelatively small percent of its revenues (6 to 8percent), it yields very high social impact interms of reaching 18 million rural Bangla-deshis who previously did not have access totelephony services, resulting in significantquality-of-life enhancements for them.

In towns, the phone serviceshop may often be splitbetween a front office formen, and a back entrance forwomen. The phone lady infront of her books show peoplethat gender patterns can bechanged


• In an “uncharted” market or “new geography”, a local partner can also play a significant role in familiarizing an organization like Telenor with the various political, regulatory, social, and cultural uncertainties, and in helping to cope with them.

Two-Way Learning

• Telenor’s foray into Bangladesh has not been a one-way flow of capital, technology, and organizational structures from Norway to Bangladesh. Rather, technology transfer, knowledge sharing, and capacity building have occurred in both directions, accruing significant benefits for Telenor. Telenor has helped create a corporate culture at GrameenPhone that is perceived by its Bangladeshi employees as being democratic, relatively non-hierarchical, merit-based, and gender-sensitive. Also, Telenor pioneered in Bangladesh the idea of integrating health, safety, and environmental issues in its business practices: for instance, it has immunized all its employees against the Hepatitis-B virus, and fields a doctor in its corporate office who provides medical consultation to employees and regularly conducts training programs on occupational health and safety. In turn, employees of GrameenPhone have pioneered several operational innovations in Bangladesh that hold tremendous value for Telenor in its greenfield companies and other established markets. For instance, the Customer Relations Division of GrameenPhone developed, in-house, software that cuts the customer phone activation procedure from 19 computer keystrokes to two. This innovation has significantly enhanced employee productivity, obviating the need to hire additional personnel. One employee can now activate up to 2,000 telephone subscriptions a day, as compared to a paltry 150 previously. Telenor is presently sharing this GrameenPhone customer activation software, through a CD-ROM, in other geographies.

Human Resources for a Global Marketplace

• Over 50 Telenor officials have spent varying periods of time in Bangladesh, gaining invaluable experience in living and conducting business in a foreign nation’s political, regulatory, bureaucratic, social, and cultural environment. Such experiences, laced with all kinds of uncertainty, adjustment, acculturation, and new learning, contribute significantly to a corporation’s human resources in a global playing field.

Lessons for Business and Corporate Social Responsibility

In addition to an impressive list of commercial and social accomplishments in Bangladesh, tremendous public relations and promotional benefits accrue to Telenor by cooperating with the Grameen family of companies, which represents an icon of development organizing to the outside world. When the world’s leading agenda-setter, the U.S. President (at that time, Bill Clinton), visits the village telephone ladies in Bangladesh and hails the integrated business and social aspects of their venture (as expressed in the statement quoted at the top of this case study), mass media, policy-makers, corporations, and the public all over the world take notice.

Women’s network group leaving the Grameen Bank weekly micro-repayment meeting. Grameen Bank village facilities: a building in corrugated iron sheets – itself a symbol of progress.


Specifically, Telenor’s foray into Bangladesh highlights the following lessons for its business and corporate social responsibility functions:

#1 Sound business means subscribing to multiple, co-existing, and mutually reinforcing (win-win) bottom lines, which also implies acting as a socially responsible corporation. In Bangladesh, Telenor’s multiple bottom lines included:

• Meeting commercial interests in terms of revenues, profits, and growth.

• Meeting social cause-related interests in terms of serving unserved and underserved markets nationally, and also serving poor, rural, illiterate inhabitants who are traditionally excluded from markets (thus overcoming the digital divide).

• Gaining a substantial amount of experience in overseas operations by doing business in a “distant” geography and an unfamiliar market, which helps build intellectual and structural capital for future ventures.

• Gaining in “image” and “prestige” by partnering in a unique commercial and social experiment with an internationally acclaimed, branded local partner, the Grameen Bank. The “value” of endorsements from such world luminaries as U.S. President Bill Clinton, or the “value” of the GSM Community Award bestowed on the Village Phone Project during the GSM World Congress in Cannes, France, in 2000, is hard to gauge in pure economic terms, pointing to the value of recognizing multiple bottom lines.

#2 Corporate social responsibility does not mean merely “showcasing” one initiative (such as the Bangladesh case), but rather integrating CSR as a competitive asset in all business ventures. Hence, true corporate social responsibility means integrating all business functions with a social imperative, and measuring the effectiveness of the CSR function not just by what Telenor has achieved to date in Bangladesh, but by what more it can achieve in the long term.

At the present time, Telenor’s operations in Bangladesh have centered around only one of its core competencies, i.e. mobile telephony. However, every aspect of Telenor’s business (Internet services, communication satellites for narrow- and broadband services, interactive Web-based services, cable television, telemedicine, fixed and mobile telephony services, and others) holds a strategic business potential in Bangladesh and in other developing country markets. How can this business potential be tapped and leveraged in new geographies?

Telenor’s well-established partnership with the Grameen Bank – which has launched several new information technology companies such as Grameen Telecom, Grameen Communications, Grameen Software, Grameen Cybernet, Grameen Shakti (power), and others – positions Telenor, like no other corporation in the world, to experiment with new initiatives in E-health, E-education, E-commerce, E-banking, and other services that may have a long-term business as well as a social function. For instance, GrameenPhone has 1,800 kilometers of available optical fiber (leased from Bangladesh Railways), which to date has barely been utilized. Can Telenor leverage its relationship with the Grameen family of companies to develop new business ventures with a social imperative?

To profit further from the lessons learned, should Telenor seriously consider looking at Bangladesh as the prime location for establishing an independent R&D and/or Business Development Center, mainly hiring talented Bangladeshi personnel to experiment with new initiatives in E-health, E-commerce, E-education, E-banking, and other need-based applications, to establish ventures in one of the most “unserved” and “underserved” world markets? With its existing physical presence in Bangladesh through GrameenPhone, an already established relationship with a branded local partner in the Grameen Bank, and a tremendous base of already-gained intellectual and structural capital, can Telenor be at the forefront of developing new products and services which can be economically viable and also address the needs of unserved and underserved markets of Asia, Africa, and Latin America? Could Telenor, for instance, in association with the Grameen family of companies, experiment with the synergies that arise from the presence of credit (provided by Grameen Bank), connectivity (provided by GrameenPhone), and energy (provided by Grameen Shakti through solar panels) in unserved and underserved markets? With the microcredit movement growing by leaps and bounds around the world, and with the Grameen family of companies leading this march (Grameen replication efforts are now underway in over 75 countries), could Telenor position itself for new market opportunities like no other corporation?

GrameenPhone’s main income source is the cities. Here, publicity boards in one of Dhaka’s most fashionable hotels, shown to us by GrameenPhone’s information officer, Yamin Bakht.

In summary, Telenor’s mobile telephony operations in Bangladesh, in co-operation with the Grameen system, suggest that a corporation can strategically pursue multiple bottom lines. This does not imply that such a success story is always a clear and neat result of plans and detailed oversight. On the contrary, new options and innovative entrepreneurship also imply trial and error, and new and uncommon problems to solve. Nonetheless, Telenor’s operations in Bangladesh, especially the strategic integration of business and corporate social responsibility functions, have resulted in an exemplary first mile in a long marathon race. Will Telenor consider running the full race?

References

Bayes, A, von Braun, J, Akhter, R. 1999. Village Pay Phones and Poverty Reduction. Bonn, Germany, Center for Development Research.

Hood, J. 1996. The Heroic Enterprise. New York, Free Press.

Richardson, D, Ramirez, R, Haq, M. 2000. Grameen Telecom’s Village Phone Programme in Rural Bangladesh: A Multimedia Case Study. Ottawa, Canada, Canadian International Development Agency.

Yunus, M. 1999. Banker to the Poor. New York, Public Affairs.

Endnotes

1) We thank the following individuals who helped us in implementing the present project: Tormod Hermansen, CEO of Telenor; Beth Tungland, Senior Vice President, Telenor’s Corporate Social Responsibility function; Marit Reutz, Director, Telenor Corporate University; Sigve Brekke, Managing Director, Telenor Asia; Ola Ree, Managing Director of GrameenPhone; Professor Muhammad Yunus, Managing Director of the Grameen Bank; Mr. Khalid Shams, Deputy Managing Director of the Grameen Bank; Syed Yamin Bakht, Additional General Manager of Information, GrameenPhone; and various others.

2) Our data-collection procedures consisted of (1) extensive archival research, including reading of various evaluation reports on the GrameenPhone project (for instance, the Richardson et al., 2000, and the Bayes et al., 1999 reports) and books written on the Grameen Bank, including Professor Muhammad Yunus’ (1999) book, Banker to the Poor, and others; (2) in-depth interviews at Telenor AS, Norway with key individuals involved in the planning and implementation of the Telenor-GrameenPhone project in Bangladesh, and with their Bangladeshi counterparts, including Professor Muhammad Yunus, Managing Director, and Mr. Khalid Shams, Deputy Managing Director of Grameen Bank; and (3) a two-week field visit to Bangladesh for observation of, and in-depth interviews with, key principals at GrameenPhone and the Village Pay Phone projects of Grameen Telecom. Our field-based activities in Bangladesh yielded about 30 in-depth taped interviews, several volumes of field notes, and over 250 photographs.

3) By December 2001 there was one telephone in Bangladesh for every 200 people, largely as a result of Telenor-GrameenPhone’s mobile telephony operations.

4) Several middle and senior managers at Telenor continue to be worried about the sustainability of the GrameenPhone initiative. For the GrameenPhone business to continue growing, large capital investments are continually needed. In 2001, Telenor’s other equity partners in GrameenPhone (Grameen Telecom, Marubeni, and Gonofone) were unable to raise their share of the new investments, which put Telenor in the awkward position of somehow raising (through credit) the needed funds at the last minute.

5) Some 150,000 mobile telephony users are served by competitors of GrameenPhone.

A new initiative: One of several Grameen IT education centers, providing the resources for self-reliant software development and distance services like secretarial work, programming, punching, etc.


Status


Introduction
Per Hjalmar Lehne

In this issue of the Status section of Telektronikk, we focus on one of the most important but often underrated aspects of modern telecommunications, namely security. The section contains only one paper; however, it is a very comprehensive description of UMTS Network Domain Security, written by Geir M. Køien from Telenor R&D, who is a delegate in 3GPP SA3, the group dealing with security.

In his paper, which is a follow-up to a previous paper from 2000, he addresses the further development of the security specifications in UMTS Releases 4 and 5. Work on security for the control plane of the UMTS core network started with Release 4. Here, only security for the Mobile Application Part (MAP) of SS7 was specified. This is referred to as MAPsec. Security for IP-based control plane protocols was scheduled for Release 5.

The paper thus contains a fairly detailed description of MAPsec, as it is specified by SA3, given by one of the key people in this work. Additionally, the Network Domain Security for IP (NDS/IP) is explained, and the limitations of the IP security protocol (IPsec) are addressed. He concludes with what lies ahead for UMTS Network Domain Security, namely the introduction of a Public Key Infrastructure (PKI) to support the use of digital certificates. Secure authentication methods are probably among the most important mechanisms necessary to facilitate wide use of e-commerce in general, and m-commerce in particular.

Per Hjalmar Lehne (44) obtained his MSc from the Norwegian University of Science and Technology in 1988. He has since been with Telenor R&D, working with different aspects of terrestrial mobile communications. From 1988 to 1991 he was involved in standardisation of the ERMES paging system in ETSI, as well as in studies and measurements on EMC. His work since 1993 has been in the area of radio propagation and access technology, especially on smart antennas for GSM and UMTS. He has participated in the RACE 2 Mobile Broadband Project (MBS), COST 231, and COST 259, and is from 2001 vice-chairman of COST 273. His current interests are in 4th generation mobile systems and the use of MIMO technology in terrestrial mobile networks, where he participates in the IST project FLOWS.

[email protected]


1 Introduction

This article is a follow-up to the article “Overview of UMTS security for Release 99” in Telektronikk 1.2000 [1]. During the last two years the UMTS security architecture has evolved to include security for the control plane of the core network, as well as to cover security for the new IP Multimedia Subsystem (IMS) architecture.

This article will focus on describing the services and features of the core network control plane security extensions. These are collectively called Network Domain Security and comprise two technical specifications.

A brief description will also be given of the way forward for the UMTS security specification process.

1.1 Security Features for UMTS Release 4

The work to provide security for the control plane of the UMTS core network started for real with UMTS Release 4 1). This work took place under the work item name of Network Domain Security (NDS), and the goal was to secure all important control plane protocols in the core network. This included both protocols based on the telephone signalling system (SS7) protocol stack and protocols based on the IP protocol stack. It was realized that the SS7-based and IP-based protocols were sufficiently different to warrant separate security solutions. The work to protect IP-based protocols was scheduled for UMTS Release 5.

During the process for Release 1999 it was realized that protecting SS7-based protocols would inevitably mean protecting them at the application layer. The drawback of implementing protection at the application layer is that the target protocol itself has to be updated in a pervasive and non-trivial way. This is an expensive and time-consuming process that would have to be repeated for every target protocol. After careful deliberations it was found that one could not afford to protect more than a selected set of protocols, and in the end it was decided that the only SS7-based protocol to be protected would be the Mobile Application Part (MAP) protocol [2].

UMTS Network Domain Security
Geir M. Køien

Geir Køien (36) has been an employee of Telenor R&D since 1998. He has been working with various mobile systems since 1991 and is interested in security and signalling aspects of mobile systems. He is the Telenor delegate to 3GPP SA3 (Security), where he serves as rapporteur for the Network Domain Security specifications 3GPP TS 33.200 and 3GPP TS 33.210. He is also pursuing a PhD at Agder University College.

[email protected]

The reason that one chose to protect MAP is that MAP is a crucial core network protocol that provides mobility management services and distributes the Authentication Vector (AV) security data from the HLR/AuC to the VLR/SGSN.

The technical specification 3GPP TS 33.200 Network Domain Security; MAP application layer security [3a] was completed for Release 4 by June 2001. The specification, which is often just called MAPsec, contained procedures for secure transport of MAP messages between MAP network elements, but lacked mechanisms for key negotiation and distribution. The key negotiation and distribution procedures are scheduled to be included in the Release 5 version of TS 33.200 [3b].

1.2 Network Domain Security Features for UMTS Release 5

The main Network Domain Security goals for UMTS Release 5 are:

a) to provide Network Domain Security protection for IP-based control plane protocols;

b) to complete the MAP security architecture to include key negotiation and distribution procedures.

The Network Domain Security for IP-based control plane protocols is based on the IETF IPsec protocols. The work was completed when the technical specification 3GPP TS 33.210 Network Domain Security; IP network layer security [4] was approved in March 2002.

2 Network Domain Security; MAP Application Layer Security

In this section an attempt is made to explain the technical realization of the MAP security protection.

2.1 MAPsec Security Services

The security services provided by MAPsec are:

• Cryptographic data integrity of the MAP messages;

• Data origin authentication for the MAP messages;

• Replay protection for the MAP messages;

• Confidentiality (encryption) for the MAP messages.

1) The naming conventions for the releases changed; Release 4 is the release following Release 1999.

2.2 The MAP Security Architecture

The main MAP security architecture consists of the following elements and interfaces:

• Key Administration Centre (KAC) – Release 5
This new network element is responsible for key negotiation and distribution between network operators. The KAC is part of TS 33.200 Release 5 [3b].

• MAP Network Elements (MAP-NE)
The MAP network elements must be updated to support the Zf-interface to participate in secure MAP communication. MAP-NEs conforming to MAPsec Release 5 specifications must also support the Ze-interface.

• Zd-interface (KAC – KAC) – Release 5
The Zd-interface is an IP-based interface that is part of MAPsec Release 5. It is used to negotiate MAPsec Security Associations (SAs 2)) between MAP security domains. The only traffic over the Zd-interface is the Internet Key Exchange (IKE) negotiations of MAPsec security associations. The semantics of the MAPsec SAs are defined in the informational RFC The MAP Security Domain of Interpretation for ISAKMP. At present only a draft version of the MAPsec DoI is available (draft-arkko-map-doi-04.txt [5]). The security services specified by the security association are encoded in the protection profile information element.

• Ze-interface (KAC – MAP-NE) – Release 5
The Ze-interface is an IP-based interface that is part of MAPsec Release 5. This interface provides distribution of security association data from a KAC to a MAP-NE within one operator domain.

• Zf-interface (MAP-NE – MAP-NE) – Release 4
The Zf-interface is a MAP interface that is part of MAPsec Release 4. The MAP-NEs may be from the same security domain or from different security domains (as shown in Figure 1). The MAP-NEs use MAPsec security associations received from a KAC to protect the MAP operations. The MAP operations within the MAP dialogue are protected selectively according to the chosen MAPsec protection profile.

For the purpose of the actual protection, the SecureTransport meta-operation component is used. The original MAP component is encapsulated in SecureTransport.

From an architectural point of view, one should note that the use of IKE for key negotiation introduces an anomaly. To use IKE for the Zd-interface may seem an obvious choice, in that IKE is a protocol specifically designed to carry out key negotiations. However, IPsec/IKE makes the basic assumption that IKE negotiates security associations on behalf of the network element on which it resides. For MAPsec this will not be the case, since the KAC shall not use the MAPsec SA itself. MAPsec security associations are furthermore valid on a network-to-network basis, and not individually between the communicating parties.

This means that one pair of security associations will be used for all MAPsec communication between two domains. In Figure 1, MAP-NE A1 and MAP-NE A2 will use the same security association pair when communicating with MAP-NE B1. This also extends to the case where two MAP-NEs within the same security domain need to engage in MAPsec secured dialogues, but here the security association (SA) pair must be a special initiator-SAself and responder-SAself pair.
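
This domain-level granularity can be illustrated with a small Python sketch (our own illustration; the names and table layout are assumptions, not structures from TS 33.200). The SA lookup is keyed by the pair of security domains, so every MAP-NE in domain A reuses the same SA pair towards domain B:

    # MAPsec SAs are held per pair of security domains, not per pair of
    # network elements (a sketch; all names are illustrative only).
    sa_table = {
        ("domain-A", "domain-B"): ("SA-outbound", "SA-inbound"),
        ("domain-A", "domain-A"): ("initiator-SAself", "responder-SAself"),
    }

    def sa_pair(src_domain: str, dst_domain: str):
        # Every MAP-NE in src_domain gets the same SA pair towards dst_domain.
        return sa_table[(src_domain, dst_domain)]

    print(sa_pair("domain-A", "domain-B"))  # shared by MAP-NE A1 and MAP-NE A2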

Figure 1 Overview of the Zd, Ze and Zf-interfaces (from TS 33.200 [3a])

2) A MAPsec Security Association (SA) is a unidirectional logical control channel for a secured connection. The SA specifies, amongst other things, the security services, the algorithms and the lifetime of the secured connection. Due to different requirements, a MAPsec SA is different from an IPsec SA.

[Figure 1: security domains A and B each contain a KAC and MAP-NEs. The Zd-interface carries the IKE “connection” between the KACs, the Ze-interfaces carry local key distribution from KAC to MAP-NE (secured), and the Zf-interfaces carry MAP operations secured by means of SecureTransport.]


2.3 MAPsec Protection Modes

MAPsec provides three different protection modes. These protection modes regulate the protection level of the original MAP component:

• Protection Mode 0: No Protection
Protection Mode 0 uses the encapsulation provided by MAPsec, but offers no cryptographic protection.

• Protection Mode 1: Integrity, Authenticity
In Protection Mode 1 the protected payload is a concatenation of the cleartext and the Message Authentication Code (MAC) generated by the integrity function f7.

• Protection Mode 2: Confidentiality, Integrity, and Authenticity
Integrity in protection mode 2 is achieved by the same means as for protection mode 1. Confidentiality is achieved by encrypting the cleartext using the encryption function f6.

Note that in protection mode 0 no protection is offered, and that the “protected” payload is identical to the payload of the original MAP message. However, since a protection mode 0 component is encapsulated by means of SecureTransport, it is not identical to the original component when it comes to the processing steps taken by MAP/MAPsec.
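
The shape of the protected payload in each mode can be sketched as follows. This is a minimal illustration, assuming an HMAC as a stand-in for the standardized integrity function f7 and a caller-supplied callable as a stand-in for the encryption function f6; the exact encodings, and what the MAC covers, are fixed by TS 33.200, not by this sketch.

    # A sketch of how the three MAPsec protection modes shape the payload.
    import hashlib
    import hmac

    def protect_component(cleartext: bytes, mode: int, ik: bytes,
                          f6=lambda x: x) -> bytes:
        # f6 stands in for the MAPsec encryption algorithm; HMAC-SHA-1
        # below stands in for the integrity algorithm f7.
        if mode == 0:                 # encapsulation only, no crypto
            return cleartext
        if mode == 1:                 # integrity and authenticity
            return cleartext + hmac.new(ik, cleartext, hashlib.sha1).digest()
        if mode == 2:                 # confidentiality as well
            ciphertext = f6(cleartext)
            return ciphertext + hmac.new(ik, ciphertext, hashlib.sha1).digest()
        raise ValueError("unknown protection mode")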

2.4 MAPsec Protection Profiles

The notion of a MAPsec protection profile was invented to simplify the negotiation of security associations between roaming partners. The idea was to make agreements on the extent of MAPsec protection very simple, both qualitatively and quantitatively. This would make it much easier for roaming partners to create mutually compatible MAPsec security requirements. Unfortunately, the actual construction of the protection profiles has become somewhat counterintuitive, slightly inflexible, and may appear a bit contrived.

Technically, MAPsec protection is specified per MAP operation component. The MAPsec protection profiles are organized by means of protection groups, which are loosely organized around the various transactions (dialogues) that MAP executes. Each protection group defines a set of MAP operations and their protection modes at the operation component level. The concept of “protection level” is introduced to administrate the protection mode at the operation component level. The protection level of an operation determines the protection modes used for the operation’s components according to Table 1.

It shall be noted that not all MAP operations/components are included in the protection groups. The operations/components that are not present in any protection group cannot be protected by means of MAPsec.

The protection profiles are composed of non-overlapping protection groups and are predefined in TS 33.200. The current set of protection groups and protection profiles may be extended in later releases.

2.5 MAPsec Security Policies

MAPsec security policies are structured around the protection profile concept. The extent of security protection is thereby regulated both in terms of which operations to protect and the protection level, by means of choosing the appropriate protection profile. The protection profile concept offers very simple policy management at the expense of flexibility.

Network operators must agree on which protection profile to use in bilateral agreements. These agreements should become part of the standard roaming agreements between operators.

3 Network Domain Security; IP Network Layer Security (NDS/IP)

The most important IP-based protocol to protect in the UMTS core network control plane is the GTP-C protocol [6]. NDS/IP is defined on the network layer and it is therefore easy to adapt NDS/IP to protect GTP-C. In fact, no changes are required to the target protocol.

Technical specification 3GPP TS 33.210 Network Domain Security; IP network layer security [4] defines how the IP-based core network control plane protocols can be protected. The IETF already has a stable and well-defined architecture for IP security (IPsec, RFC-2401 [7]) at the network layer. It was therefore an obvious choice for SA3 to base NDS/IP on the security services afforded by IPsec. For the purpose of the closed domain of NDS/IP, many of the options and services of IPsec were redundant.

Table 1 MAPsec protection levels (from TS 33.200 [3a])

Protection level   Protection mode for   Protection mode for   Protection mode for
                   invoke component      result component      error component

1                  1                     0                     0
2                  1                     1                     0
3                  1                     2                     0
4                  2                     1                     0
5                  2                     2                     0
6                  2                     0                     0


The use of IPsec in NDS/IP is therefore profiled to remove unnecessary functionality. The driving force behind the profiling was to reduce complexity in order to facilitate stable and effortless interoperability.

3.1 NDS/IP Security Services

IPsec offers a set of security services by means of the two security protocols Authentication Header (AH) (RFC-2402 [8]) and Encapsulating Security Payload (ESP) (RFC-2406 [9]). For NDS/IP the IPsec security protocol shall always be ESP, and it shall always be used in tunnel mode. Tunnel mode is an IPsec mode that provides protection for the whole of the original IP packet and is typically used between security gateways. The other mode, transport mode, is targeted specifically towards end-to-end communication between users, where the primary goal is to protect the payload portion of the original packet – i.e. providing protection for application data.
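
Tunnel-mode encapsulation can be pictured with a small sketch. It is purely conceptual, not a byte-accurate ESP encoder; the encrypt and mac callables stand in for the negotiated transforms, and all names are our own:

    # Conceptual ESP tunnel-mode encapsulation: the whole original IP
    # packet is protected, and a new outer header addresses the gateways.
    def esp_tunnel_encapsulate(original_packet: bytes, spi: int, seq: int,
                               encrypt, mac):
        esp_header = (spi, seq)                # SA identifier + replay counter
        ciphertext = encrypt(original_packet)  # hides inner addresses and payload
        icv = mac(esp_header, ciphertext)      # integrity check value
        return ["outer IP header (gateway to gateway)",
                esp_header, ciphertext, icv]

    # Toy stand-ins, for illustration only:
    packet = esp_tunnel_encapsulate(b"original IP packet", spi=0x1001, seq=1,
                                    encrypt=lambda p: p[::-1],
                                    mac=lambda h, c: b"icv")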

The security services provided by NDS/IP are:

a) Connectionless data integrity
b) Replay protection
c) Data origin authentication
d) Data confidentiality for the whole original IP packet (optional)
e) Limited protection against traffic flow analysis when confidentiality is applied

List-1 Security services provided by NDS/IP

When using NDS/IP, the minimum protection level provided shall be integrity protection/message authentication with replay protection. This amounts to services a), b) and c) in List-1. The ESP authentication mechanisms provide these security services.

Confidentiality protection (encryption) is an option when using NDS/IP, and for NDS/IP it shall always be used in conjunction with integrity protection, as recommended in the ESP RFC [9]. The ESP confidentiality mechanisms are used to provide services d) and e).

NDS/IP also specifies the use of the Internet Key Exchange (IKE) protocol (RFC-2409 [10]) for automatic key negotiation and distribution.

3.2 IPsec Limitations

IPsec operates at the IP layer and provides its security services to the transport layer protocols. The transport layer protocols have traditionally been the User Datagram Protocol (UDP) [11] and the Transmission Control Protocol (TCP) [12]. Since IPsec is located at the network layer, its processing rules are based exclusively on information in the IP header. This means that IPsec cannot identify logical connections at higher layers and cannot differentiate its services beyond the information found in the IP headers. For the purpose of the UMTS core network control plane protocols, this will not normally matter. The exception is that IPsec cannot be used to discriminate on the contents of IP tunnels. This means that NDS/IP, located at the UMTS control plane, cannot selectively protect the contents of the GTP-U protocol [6], since it cannot inspect the contents of the tunnel.

As mentioned above, IPsec would normally be used to provide security services to the transport protocols TCP and UDP. This is all fine, except that the new IP multimedia services in UMTS will also use the new Stream Control Transmission Protocol (SCTP) transport layer protocol [13]. SCTP is, amongst other things, capable of supporting multiple IP addresses at each endpoint (multi-homing), and this is currently a problem with IPsec. For the time being one must therefore accept that IPsec, and therefore NDS/IP, cannot be guaranteed to support SCTP. Work is ongoing in the IETF to solve this problem, and NDS/IP will be updated to support SCTP when a solution is finalized for IPsec.

3.3 The NDS/IP Security Architecture

The NDS/IP key management and distribution architecture is based on the IPsec IKE protocol. This protocol provides automated Security Association (SA) negotiation and distribution for the IPsec protocols. During SA negotiation, IKE goes through two phases. Phase-1 is the set-up of the IKE “control channel”, and phase-2 is where the actual IPsec SA negotiation takes place.

The basic idea of the NDS/IP architecture is to provide hop-by-hop security. This is in accordance with the chained-tunnels or hub-and-spoke models of operation. The use of hop-by-hop security also makes it easy to operate separate security policies internally and towards other external security domains.

In NDS/IP only the Security Gateways (SEGs) shall engage in direct communication with entities in other security domains for NDS/IP traffic. The SEGs will then establish and maintain IPsec secured ESP tunnels between security domains.

All NDS/IP traffic from an NE in one security domain towards an NE in a different security domain will be routed via an SEG and will be afforded chained-tunnels security protection towards the final destination.
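
The resulting three-hop path can be sketched as follows (the interface names follow TS 33.210; the function itself is our own illustration):

    # Chained-tunnels (hop-by-hop) model: inter-domain NDS/IP traffic
    # always transits the SEGs, and each hop is protected separately.
    def nds_ip_path(ne_a: str, seg_a: str, seg_b: str, ne_b: str):
        return [
            (ne_a, seg_a, "Zb: intra-domain ESP tunnel (optional)"),
            (seg_a, seg_b, "Za: inter-domain ESP tunnel"),
            (seg_b, ne_b, "Zb: intra-domain ESP tunnel (optional)"),
        ]

    print(nds_ip_path("NE A-1", "SEG A", "SEG B", "NE B-1"))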

Operators need only establish one ESP tunnel between two communicating security domains. This would make for coarse-grained security granularity. Alternatively, the operators may set up separate tunnels for each of the protocols and services that are protected 3).



The following interfaces are defined for protection of native IP-based protocols:

• Za-interface (SEG – SEG)
The Za-interface covers all NDS/IP traffic between security domains. The SEGs use IKE to negotiate, establish and maintain a secure ESP tunnel between them.

• Zb-interface (NE – SEG / NE – NE)
The Zb-interface is located between SEGs and NEs, and between NEs within the same security domain. The Zb-interface is optional. Normally ESP shall be used with both encryption and authentication/integrity, but an authentication/integrity-only mode is allowed. The ESP tunnel shall be used for all control plane traffic that needs security protection. Whether the tunnel is established when needed or a priori is for the security domain operator to decide. The tunnel is subsequently used for exchange of NDS/IP traffic between the NEs.

The security policy established over the Za-interface is subject to roaming agreements. This differs from the security policy enforced over the Zb-interface, which is unilaterally decided by the security domain operator.

There is no NE – NE interface for NEs belonging to separate security domains. This is because it is important to have a clear separation between the security domains. This is particularly relevant when different security policies are employed within the security domain and towards external destinations.

The restriction not to allow secure inter-domain NE – NE communication does not preclude a single physical entity from containing both NE and SEG functionality.

3.4 NDS/IP Encryption Algorithms

IPsec offers a wide set of confidentiality transforms. The mandatory transforms that compliant IPsec implementations must support are the ESP_NULL and the ESP_DES transforms. However, the Data Encryption Standard (DES) transform is no longer considered sufficiently strong in terms of cryptographic strength. For NDS/IP, neither of the mandatory transforms is allowed.

The new Advanced Encryption Standard (AES) [14], developed by NIST, is expected to be available for IPsec shortly. For the purpose of NDS/IP, the ESP_AES transform(s) will be made mandatory when they are approved by the IETF. In the meantime the 3DES transform is to be used.

3.5 NDS/IP Integrity Algorithms

The integrity transforms that compliant IPsec implementations are required to support are the ESP_NULL, the ESP_HMAC_MD5 and the ESP_HMAC_SHA-1 transforms. Since NDS/IP traffic always requires the anti-replay service, the ESP_NULL transform is not allowed in NDS/IP.

3) IPsec will normally discriminate traffic based on the quintuple (source IP address, destination IP address, source port number, destination port number and transport protocol identity).

Figure 2 NDS architecture for IP-based protocols (from TS 33.210 [4]). [Figure: security domains A and B, each with NEs connected to their SEG over Zb-interfaces (and Zb-interfaces between NEs); the SEGs communicate over the Za-interface, which carries the IKE “connection” and the inter-domain ESP tunnel.]


The new Advanced Encryption Standard (AES) [14], developed by NIST, is expected to be available for IPsec shortly. For the purpose of NDS/IP, the ESP AES MAC transform(s) will be made mandatory when they are approved by the IETF. NDS/IP implementations shall also support the ESP_HMAC_SHA-1 transform.
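
To make the integrity mechanism concrete, the following is a minimal sketch of an ESP-style integrity check value computed with HMAC-SHA-1; the common ESP variant, HMAC-SHA-1-96, keeps only the first 96 bits of the 160-bit tag. The key and message below are placeholders, not values from the specifications.

    # HMAC-SHA-1 integrity tag, truncated to 96 bits as in HMAC-SHA-1-96.
    import hashlib
    import hmac

    key = b"ike-negotiated-integrity-key"        # hypothetical SA key material
    message = b"ESP header and protected payload"
    icv = hmac.new(key, message, hashlib.sha1).digest()[:12]  # 96-bit ICV
    print(icv.hex())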

4 The Road Ahead for UMTS Network Domain Security

4.1 Scalability

For the IP-based services, a need is foreseen for an authentication framework for network elements. The basis for this assumption is that authentication for the IPsec IKE phase-1 “control channel” is currently provided by means of pre-shared secrets. Normally, the use of pre-shared secrets scales poorly. For NDS/IP the problem is manageable due to the inherent sectioning and hierarchy introduced by the security domains, and the fact that protection is by means of chained tunnels.

Nevertheless, in an environment where the number of IP-capable network elements is expected to rise dramatically, a more scalable architecture is likely to be needed within the next few years. The rise in the number of IP-addressable network elements is likely to coincide with the migration to IPv6. The migration to IPv6 will also mean that many of the technical obstacles to end-to-end security found in IPv4 will likely disappear or be mitigated.

So at the same time as the number of IP-addressable network elements is expected to rise sharply, it will also become possible to introduce true end-to-end communication. This will theoretically result in the need for authentication between all IP-addressable network entities for all UMTS operators. This amounts to n(n – 1)/2 mutual authentication associations, where n is the number of all IP-addressable UMTS network entities. The need for a truly scalable authentication framework will therefore be strong at that time.
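
As a quick illustration of why pre-shared secrets break down at this scale (the element count below is an assumption chosen for illustration, not a figure from 3GPP):

    # Pairwise pre-shared secrets grow quadratically with the number of
    # IP-addressable elements; n = 1000 is an assumed count.
    n = 1000
    associations = n * (n - 1) // 2
    print(associations)  # 499500 key relationships to provision and manage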

With this as the background, SA3 has started work to prepare for the introduction of a Public Key Infrastructure (PKI) to support digital certificates in the core network. These digital certificates will be used solely for authentication purposes for the core network elements. IPsec/IKE has been designed with support for various authentication methods, and IPsec/IKE can use digital certificates for authentication. This means that IPsec and IKE can still be used for the provision of security services. Furthermore, IPsec/IKE is sufficiently flexible to allow for gradual migration to PKI and digital certificates as the preferred authentication method.

4.2 Authentication Framework

A new work item is being created in SA3 to address this issue. The new work item, which will likely result in a new technical specification, has tentatively been given the title “Network Domain Security; Authentication Framework (NDS/AF)”. The main purpose of NDS/AF is to provide PKI-based entity authentication for network elements participating in NDS/IP secured communication.

The introduction of PKI in NDS/IP will probably be executed in phases. For the first phases, the requirement to always use chained tunnels will likely be kept. This will facilitate smooth migration towards full PKI support in NDS/IP for all operators. For instance, an operator may choose to introduce PKI very early within its own security domain while still supporting the use of pre-shared secrets towards external destinations. Figure 3 shows a case where PKI-based authentication is used for IKE phase-1 over the Za-interface between the SEGs. In this case, operator A may also use PKI internally while operator B uses pre-shared secrets internally.

When PKI is fully integrated in NDS/IP, and when IPv6 is widely deployed in the UMTS core networks, the requirement in NDS/IP to use a chained-tunnel architecture will likely be relaxed to allow for direct end-to-end communication between NDS/IP peers.

Figure 3 IPsec used with chained tunnels, i.e. the hierarchy is still kept; one Certificate Authority (CA) per security domain, which must be cross-certified with roaming partners (from S3-010622 [15]). [Figure: security domains A and B each contain a CA, a repository publishing certificates/CRLs, and an SEG holding a certificate issued by its CA (Cert(SEG A), Cert(SEG B)); the SEGs run IKE phase-1 between them, and the CAs are cross-certified off-line.]


Note that there is no strong need for PKI for MAPsec, since the number of MAPsec security associations is limited by the number of security domains and not by the number of MAP network elements. Therefore, the use of pre-shared secrets for authentication of the MAPsec IKE phase-1 session is unlikely to introduce scalability problems for the foreseen lifetime of MAPsec 4).

5 Summary

During the last two years the development of Network Domain Security in UMTS has come a long way.

Through the standards defined in 3GPP TS 33.200 and 3GPP TS 33.210 it will be possible to protect the MAP protocol, GTP-C, and other IP-based protocols in the UMTS core network control plane.

The work to refine the security protection in the core network will continue. Automated key management will be developed for MAPsec in order to facilitate robust interworking and management for MAP security. Improved authentication services based on PKI are planned for NDS/IP. Support for new transport layer protocols like SCTP, as well as support for new cryptoalgorithms like AES, will be added when they become available from the IETF.

A remaining obstacle to security in the UMTS core network control plane is for operators to actually deploy the security protocols in their networks. Let us hope that operators will adopt the new security standards and use the security services aggressively.

6 Acronyms and Abbreviations

The technical report 3GPP TR 21.905 [16] contains all the official acronyms and abbreviations that apply to more than one technical specification group. In addition, most technical specifications and reports contain a list of abbreviations with relevance to the respective document.

For the purposes of the present document, the following abbreviations apply:

3GPP       3rd Generation Partnership Project (www.3gpp.org)
AES        Advanced Encryption Standard
CR         Change Request
DoI        Domain of Interpretation
f6         MAP encryption algorithm
f7         MAP integrity algorithm
IETF       Internet Engineering Task Force (www.ietf.org)
IKE        Internet Key Exchange
IP         Internet Protocol
IPsec      IP security protocols (as defined in RFC 2401)
ISAKMP     Internet Security Association and Key Management Protocol
KAC        Key Administration Centre
MAC        Message Authentication Code
MAP        Mobile Application Part
MAP-NE     MAP Network Element
MAPsec     MAP security – the MAP security protocol suite
NDS        Network Domain Security
NDS/IP     Network Domain Security; IP network layer security
NDS/MAP    Network Domain Security; MAP application layer security
NE         Network Entity or Network Element
NIST       National Institute of Standards and Technology (www.nist.gov)
PKI        Public Key Infrastructure
RFC        Request For Comment (IETF standards are published as standards track RFCs)
SCTP       Stream Control Transmission Protocol
SEG        Security Gateway
SA         Security Association
SA3        Services and system Architecture: Work Group 3 (Security)
SS7        Signalling System No. 7
TCP        Transmission Control Protocol
UDP        User Datagram Protocol
UMTS       Universal Mobile Telecommunications System

4) Observe that MAP itself may live longer than MAPsec. MAP may today be run over the IP protocol stack. NDS/IP could then conceivably be used to secure MAP instead of MAPsec. On the other hand, MAPsec, which is an application layer solution, can be used for both MAP-over-SS7 and MAP-over-IP.


Appendix A Resources

3GPP Resources

All meeting contributions, technical reports and technical specifications are found at the 3GPP web and ftp sites. The web site is at www.3gpp.org and the ftp site is at ftp.3gpp.org.

On the website, under the “specification” title, one will find information about the current set of specifications, the latest approved versions, etc. This is very convenient. The 3GPP website also contains a lot of other useful information and is well worth a closer look if one is interested in UMTS standardization.

IETF Resources

A number of 3GPP specifications draw on IETF standards. The IETF standards can be found at the www.ietf.org website.

The Email Exploder Lists

For those interested in following the email communications of the technical specification groups (TSGs), there are essentially two ways to do this.

The first is to subscribe to the list(s) of interest. This is done at the list server webpage at http://list.3gpp.org/. You simply pick out the groups of interest to you and subscribe to them. Be warned that many of the groups have a high volume of email contributions.

The second is simply to use the list server web pages to browse the up-to-date archives of all the list server email lists. This is convenient, since one does not have to subscribe to a list to be able to browse it.

7 References

1 Køien, G M. Overview of UMTS security for Release 99. Telektronikk, 96 (1), 102–107, 2000.

2 3GPP. Mobile Application Part (MAP). (3GPP TS 29.002)

3a 3GPP. Network Domain Security; MAP application layer security (Release 4). (3GPP TS 33.200)

3b 3GPP. Network Domain Security; MAP application layer security (Release 5). (3GPP TS 33.200)

4 3GPP. Network Domain Security; IP network layer security (Release 5). Draft. (3GPP TS 33.210)

5 IETF. The MAP Security Domain of Interpretation for ISAKMP. (Work in progress.) http://www.ietf.org/internet-drafts/draft-arkko-map-doi-04.txt (Note: Internet-Drafts are removed after 6 months.)

6 3GPP. GPRS Tunnelling Protocol (GTP) across the Gn and Gp interfaces. (3GPP TS 29.060)

7 IETF. Security Architecture for the Internet Protocol. (RFC-2401)

8 IETF. IP Authentication Header. (RFC-2402)

9 IETF. IP Encapsulating Security Payload (ESP). (RFC-2406)

10 IETF. The Internet Key Exchange (IKE). (RFC-2409)

11 IETF. User Datagram Protocol. (RFC-768)

12 IETF. Transmission Control Protocol. (RFC-793)

13 IETF. Stream Control Transmission Protocol. (RFC-2960)

14 NIST. Specification for the Advanced Encryption Standard (AES). (FIPS-197) (Copies can be obtained at http://csrc.nist.gov/encryption/aes/)

15 3GPP. Using PKI to provide network domain security (Telenor/Nokia). (Temporary document S3-010622.)

16 3GPP. Vocabulary for 3GPP specifications. (3GPP TR 21.905)