Some Useful Coding Techniques for Binary Communication Systems
Transcript of Some Useful Coding Techniques for Binary Communication Systems
Purdue UniversityPurdue e-PubsDepartment of Electrical and ComputerEngineering Technical Reports
Department of Electrical and ComputerEngineering
1-1-1962
Some Useful Coding Techniques for BinaryCommunication SystemsJ. C. HancockPurdue University
J. L. HolsingerPurdue University
Follow this and additional works at: https://docs.lib.purdue.edu/ecetr
This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] foradditional information.
Hancock, J. C. and Holsinger, J. L., "Some Useful Coding Techniques for Binary Communication Systems" (1962). Department ofElectrical and Computer Engineering Technical Reports. Paper 503.https://docs.lib.purdue.edu/ecetr/503
TR-EE62-1CONTRACT NO. AF 33(616)-8283
PURDUE UNIVERSITYSCHOOL OF ELECTRICAL ENGINEERING
Some Useful Coding Techniquesfor
Binary Communication Systems
J. C. Hancock, Principal Investigator
J. L. Holsinger
Communications Sciences LaboratoryJanuary 2, 1962 Lafayette, Indiana
RESEARCH PROJECT PRF 2906 CONTRACT NO. AF 33(616)-8283
FORUNITED STATES AIR FORCE
AERONAUTICAL SYSTEMS DIVISION WRIGHT-PATTERSON AIR FORCE BASE
OHIO
M
SOME USEFUL CODING TECHNIQUES
FOR
BINARY COMMUNICATION SYSTEMS
J. C. Hancock, Principal Investigator
J. L. Holsinger
SCHOOL OF ELECTRICAL ENGINEERING
PURDUE UNIVERSITY
Lafayette, Indiana JANUARY 1962
UNITED STATES AIR FORCE
AERONAUTICAL SYSTEMS DIVISION
WRIGHT-PATTERS ON AIR FORCE BASE
OHIO
PREFACE
On, June 2, 1961 the Communication Sciences Laboratory, School of Electrical
Engineering, Purdue University, "was awarded 1SAE Contract No« 33(616)-8283.
This contract is administered under the Aeronautical Systems Division, Wright-
Patterson Air Force Base, ©hi® by Mr, B. W. Russell,
During the initial phases of this program it seemed desirable to classify
and unify the various coding techniques as they related to digital communication
systems. This report represents the results of this brief study and represents
a minor phase of the overall program.
•Ill**
ABSTRACT
An introduction to coding theory and a discussion of specific coding tech
niques are given as applied to digital communication systems.
The place of coding in a communication system is illustrated and the various
approaches to coding are discussed. The information theory concepts required are
presented along with the First and Second Fundamental Theorems of Shannon, The
relation between Shannon1s. theorems and coding for the noisy and noiseless channel
is discussed. For the noiseless channel the techniques o£ Shannon, Fano, Huff
man, Gilbert-More, Karp and others are discussed. For the noisy channel the
techniques of Hamming, Slepian, Elias, Cowell, Bose-^Chaudhuri, Reed~14uller, Fire,
and Wozencraft are presented. The relationships between the various codes are
given and the advantages and disadvantages of each indicated. Numerous examples
illustrating the use of the codes are given and areas of further research outlined,
»iv*=
TABLE OF CONTENTS
PREFACE
ABSTRACT
TABLE OF CONTENTS LIST OF FIGURES '
CHAPTER I - INTRODUCTION
1*1 Purpose and Structure of the Report
1*2 A General Gommunic at ion System
1*3 Information Theory Concepts
CHAPTER II -CODING FOR THE NOISELESS CHANNEL
2.1 Introdmetion
2*2 Shannon-Fano Encoding
2*3 Shannons Binary Encoding
2*4 Huffiaan Encoding
2.5 Additional Techniques for the Noiseless Channel
CHAPTER III-GODING FOR Til NOISY CHANNEL
3*1 Introduction
3*2 lamming Codes
3*3 Slepian Group Codes
3*4 Elias*s Iterative Coding
3*5 fse of Group Codes in Feedback
3.6 Additional Techniques for the Noisy Channel
3*7 Relationship Between the Coding Techniques Discussed in this Report
3*S Conclusion
BIBLIOGRAPHY
Communication Systems
iii
iv
v
1
1
2i§
24
24
■ 32.
41
ii
88
93
99
-Vf
LIST OF FIGURES
Figure. . Title page
1 A Communication System 3
2 Types of Communication Systems .5
3 Binary Channel Models. 6
4 Binary Communication System Q
5 Entropy of a Binary Source 13
6 Coding for the Noisy. Channel 53
7 A Block Diagram Representation of the RelationshipBetween Various Coding Techniques 105
& Some Advantages and Disadvantages of the Noisy CodingTechniques ]_o6
CHAPTER 1
INTRODUCTION '
The classic work of Shannon (1,2), followed by that of Feinstein, Khinehin,
Fano, Elias, and others laid the foundation for the modern field of Information
Theory. In his original work Shannon proved an important theorem giving a promise
of information transmission capabilities previously considered impossible. Looselyft
stated the theorm is as followss If information is transmitted over a noisy
channel at a rate less .than the channel capacity it is possible to encode the
transmitted message in a manner such that it may be received with an arbitrarily
small error rate. .
Unfortunately Shannon’s proof of this theoremis an existence proof, and does
not give any information about how the encoding is to be accomplished in practice.
The severity of this problem is readily apparent when it is realized that today,
more then a decade after Shannon's original work, communication systems still do
not operate at an information rate or an error rate even close to that theoretic .
cally possible*
At present a large amount of work is being done in an attempt to devise codl
ing techniques that will allow this situation to be improved. Present results
(3,4) indicate that within a few years it will be possible to operate a communica
tion system at an information rate near the channel capacity with an error rate in
the range of one error per day to possibly one error per several hundred years.
Because of this it is increasingly important that more people become aware of
the basic concepts involved in coding.
1*1 Purpose and Structure of the Report
To a person working in the field of coding theory the names Golay, Hamming,
Slepian, Shannon, Fano, Elias, Bose-Ghaudhuri, Huffman, Gilbert “Moore, Wozencraft
* Here as in the next few paragraphs, terms such as information, information rate, capacity, coding, and others should be given their intuitive meaning until more precise definitions are given.
-2'
and Reed-Muller bring to mind several approaches to the solution of the coding
problem. Unfortunately this same preponderance of names givee rise to considerable
confusion in the minds of those not acquainted with the coding field and in addition
points out. the lack of a unified approach to the determination of optimum codes.
It is the aim of this report to alleviate some of this confusion by presenting,
with a minimum of proofs and analyses., the various better known coding schemes
and showing how they are related. Thus, this report will be tutorial in nature
and mil, hopefully,, provide a more, unified picture of the field of coding than
can presently be obtained from the literature.
As is .always the case in a tutorial presentation, an assumption must be made
concerning the background of the reader. In. this report it will be assumed that
the reader is familiar with the representation,of discrete-messages by binary num
bers and with, the basic concepts Of discrete probability theory,. References (5,6,7)
provide an introduction to discrete probability theory for those lacking this back
ground, v ■
The construction of this report is briefly as follows; First, a discussion
of a general communication system will be given, pointing out how coding fits into,
the complete system. Next, several concepts from Information Theory will be.pre
sented, This will involve precise definitions of terms such as information and
channel capacity that are required for a study cf coding theory. Thirdly,.coding ■
techniques for the noiseless binary channel will be discussed. Fourthly, the major
portion, of the.report will discuss coding techniques for the noisy binary channel#
In each case numerous examples will be given to illustrate the material discussed.
1.2 A General Communication System*■*"'* * '» "«"ih«'.i">h.'«■■■*■* f I'*'."''■« *■!'■” *"1 ■ '
A conventional communication system is illustrated in Fig, 1# Here an infor
mation source supplies a message to the transmitter. The transmitter converts the
message to a form suitable for transmission over the channel. (For an RF channel
this usually involves modulating some property of a carrier with the message signal#
MessageMessage
Fig. 1 - A Communication System
TRANSMITTER USER
SOURCE
“4-
For a wire circuit it could involve nothing more than direct transmission of the
message.) At th® channel output there is, in general, a noisy, distorted replica
of the transmitted message., The receiver operates on this signal converting it
into a,form suitable for the user. It is usually desired that the message supplied
to the user be as nearly identical, in some sense, to the source message as is
possible.
In general, a communication system may be either continuous, discrete or both.
An example of a continuous system is a conventional Atl system used for transmitting
voice infoimation. A teletype system represents, a discrete system while a system
for a transmitting W.signals by pulse-code-modulation (PGM) represents a combined
discrete and continuous system* Fig* 2 illustrates the basic differences between.,
the.signals involved in each of these systems.
At present, there has been essentially no work done in coding for continuous
systems,although Shannon's work applies to these as well as discrete systems. 'Be-* .
cause of this, this report will be concerned only with the; discrete, case and, due .
to its widespread use, .only the binary form of this. Thus, the information source
of Fig. 1 will now be considered to produce a sequence of 0's and l*s which represent
the message to be transmitted. It is the function of the transmitter, channel, and
receiver to accept these binary digits (binits),: to reproduce them with as few
errors as possible at the receiver, and to supply the results to the user.
The numerous details involved in.designing a transmitter and receiver to
operate with a.specified channel and to produce a minimum error rate are of no
interest, to the coding theorist. For this reason the transmitter, channel, and
receiver are usually considered as a "black box*' which accepts 0's and l's at its
input and reproduces these, with an occasional error, at its output. This simpli
fied model is illustrated in Fig. 3(a), In this illustration PQ is the transitional
probability that a transmitted Q will be received as a 1, For example if P =0,1
this -model.implies that for every 100 O's transmitted there will be, on the average,
b) A Discrete System
1 . XTeletype
_n_TL 7 -SHffl- ^VP pfipiupr
_n_n i '—rTeletypeI ransmi iL©r XVww wl V vl
c) A Combination Discrete and Continuous System
Analog to_nn_ruL
HP «»A vs OTTl AllDigital
1 rctnpiiiiirtpr
-
. - jin_n_rL Digital to iReceiver Analog
® CZD ®TV Monitor
Fig- 2 — Types of Communication Systems
-6~
General Binary Channel.
■Binary Symmetric" Channel’
c)
Input Output
Fig. 3 ^Binary Channel Models '
-7-
ten of. these received erroneously as l*s. The remaining 0»s will be received
correctly. Similarily P-^ is the transitional probability that a transmitted 1
is received as a 0. These transitional probabilities provide all the information
required to determine the effectiveness of a particular code. Because of this all
further discussion in the report will be based on the model of Fig. 3(a) or on the
closely related models of Fig. 3(b) and 3(c)*
The binary communication system resulting from these simplifications is shown
in Fig. 4(a). Here a source produces a sequence of binary digits'that are trans
mitted over the channel*. Due to noise on the channel some of these are received
in error and thus represent erroneous information. In general this, incorrect in
formation can not be tolerated and some means of eliminating the errors must be
found. This can be accomplished in one of the following three ways:.
1* Use Error Correcting codes that correct an error before the message is
presented to the user. In general this involves the periodic insertion of
so-called."check digits" into the sequence of message digits that are to be
transmitted. Proper use of these cheek digits at the receiver allows the
most probable transmission errors to be corrected. Fig. 4(b) illustrates a
system using this technique. Note that additional equipment, usually a small
special purpose digital computer, is required at both the transmitting and re
ceiving terminals to perform the encoding and decoding operations,
2. Use Error Detecting Codes. These codes provide only for the detection
of errors and as such are normally of use only when provision is made for
the retransmission of incorrectly received messages. This requires the use
of a feedback channel from receiver to transmitter which is not always avail
able. However, recent results (3) indicate that this approach offers the
greatest hope for attaining the information and error rates theoretically
possible,
3* Use . Error Correcting and Detecting codes. With some eodes it is
Errors
Error
for Error CorrectionA Binary Coramunioation Systeia'. 'With.
Fig. 4 - A Binary Communication System
ChannelSource Encoder
... -9-
possible to correct some of the received, errors and to detect, but not
correct^ additional.errors. . When a feedback channel is present these de
tected, but uneorfected, erroneous messages are retransmitted. The ad
vantage of these codes, as compared to error detection only codes, lies in
the reduced capacity required for the feedback channel.
With these techniques it is possible, in principle, to obtain an arbitrarily
small error rate provided only that PQ and F1 are less then 1/2 and that informa
tion is transmitted at a rate below the channel capacity. In practice, an increas
ingly large amount of coding equipment is necessary as. the required, error rate is
decreased* . This means that in most situations a compromise must be made between
equipment cost and allowable error rate. At present there is considerable effort
being expended to discover codes, that require less equipment for a specified error
rate* To date the most promising of the new approaches appears to be that of
sequential coding (4 ) discovered by Wozencraft of MIT and that of error detec
tion coding xfith feedback (3)*
In summary,, the following concepts from this section should be emphasized*
1. This report will be concerned only with binary communication systems and
the coding techniques for these. The binary information source will be con
sidered to produce a sequence of 0*s and l*s that represents the information
to be transmitted and the transmitter-channel-receiver will be represented by
one of the models of Fig. 3*
2. Due to noise on the channel some of the transmitted 0«s will be received
: as l's and vice versa. This effect is included in the models of Pig, 3 through
the probabilities Pq and P^.
3*. Use of suitable encoding techniques at the transmitting station and de
coding techniques at the receiving station allow these errors to be reduced
to an arbitrarily low value.
4. There are essentially two coding methods that can be used to approach
-10-
this low error rate, namely, error correcting codes or error detecting codes
plus a feedback channel for retransmission*
It should be noted that although error detecting and correcting codes form a
major portion of the field of coding theory a second type of coding, used with a
noise free channel, is also of considerable importance and will be discussed later*
1*3 Information theory concepts*
Up to this point the terms, information, channel capacity, information rate,
etc, have been used in an intuitive manner. To be able to discuss further the
concepts of coding and the benefits to be gained from coding it is necessary that
these terms be defined in a precise manner.
Consider the intuitive concept of information. Imagine that a coin is to be
tossed. If the coin is biased so that it is certain to corae up heads then, intui
tively, it would seem that no information would be gained by tossing the coin.
Similar reasoning follows if the coin is certain to come up tails. However, when
either heads or tails may occur some information may be obtained by tossing the
coin* Thus txhe conclusion to be reached is that information can be obtained from
the occurrance of an event only when the probability of that event occurring is
less than 1. Extending this reasoning it seems reasonable to require than any
measure of information be such that the information associated with an event in
crease as the probability of the event decreases. This reasoning plus other
mathematical requirements (pp 80-82, Ref, 8) leads to the following definition of
information, commonly ealled entropy or uncertainty.
- -logg P(l.) bits/event (l)
Here and throughout the report, all logarithms are to the base 2 unless otherwise
noted*
As an example consider the tossing of a biased coin where the probability of
obtaining a head is l/4. From Eq. (1) the information, or entropy associated with
-11-
obtaining a head is H>*iog 1/4 = leg 4 " 2 bits. Similarily the uncertainty
associated with a tail is log 4/3 = 0.415 bits. Since different entropies are
associated with a head and a tail it is more meaningful to speak of the average
uncertainty associated with the toss of the coin. The average uncertainty is just
the uncertainty associated with a head times the, probability that a head .is obtain
ed plus the uncertainty associated with a tail times the probability of a tail* Letting the probability of tails = P(t) and the probability of heads = P(H) the
above statement becomes
H--P(f ) log P(T) - P(H) log P(f|) bits/toss (2)
Note the use of H to denote the average entropy as contrasted to # which repre
sents an individual entropy. For the above example Eq. (2) shows that the entropy
associated with tossing the biased coin is
H = -1/4 log l/4 - 3/4 log 3/4 = .811 bit/toss.
Observe that H as given by Eq. (2) is maximum for P(T) = P(H) = l/2. This is
intuitively satisfying since this represents a condition of maximum uncertainly'
about the outcome of the toss of a coin.
Next, consider a binary source that produces a sequence of 0*s and l*s.
Before each digit is produced there is a certain probability that it will be a 1, denoted P(l), and a corresponding probability that it will be a 0, denoted P(0)»
Since it is assumed that either a 0 or a 1 must be produced, the relation P(l) + P(0) 1 must hold. Analogous to Eq. (2) the amount of information pro
duced this source is defined to be
H(X) = -P(0) log P(0) - P(l) log P(l) bit/binit (3)
or, since P(0) + P(l) = 1
H(x) = -P(0) log P(©) - n-P(O)] log [l-P(0)l bit/binit (4)
-12-
Here the notation H(l) rather than just H is used so that the entropies,
associated, with a source, H(X), can be differentiated from the entropy at the
user, H(x). The reason for this distinction will become clear as the discussion
progresses.
Pig* 5 illustrates this entropy function for values of P(0) between zero and
unity. Observe that H(X) is maximum when a 0 and a 1 are equiprobable and. is
zero when either a 0 or a 1 is certain.
This definition gives the amount of information produced when a single binary
digit (binit) is produced. Thus if the source generates 1 binit/sec. it produces
information at. a rate of H(X) bits/sec. Likewise if m binits/sec are produced
the information rate of the source is mH(X) bits/sec. With this definition it is
possible to specixy unambiguously the amount of information produced.by a binary
source,
A useful generalization of Eq. (4) can be obtained by considering a discrete
Source that can produce any one of m symbols each with a specified probability.
Denoting the m symbols by X^* Xg - - - 2^ and the corresponding probabilities by
P(Xl), P(X2), —- - P(Xm) the entropy of such a source is defined to be
MH(X) - x: fUx) log P(Xj_) bits/symbol (5)
^•1-
Example 1.2-1
Consider a discrete information source that produces the following
. symbols with the.probabilities indicated
A 1/2 D 1/16
B 1/4 E 1/32
G 1/B F 1/32
For this source the average entropy per symbol is
bits/binit
Entropy,-. of. "a Binary Source
- P(D
-14-
H(X) = -1/2 log 1/2 - 1/4 log .1/4---------- 1/32 log 1/32
= 1 15/l6 bits/symbol
It cam be shown (pp 82-49, Ref. 8 ) that the entropy of a source is
maximum when all symbols, a,re equiprobable. Thus for this ease ’
Max H(X) = -1/6 log 1/6 --------- 1/6 log 1/6
= log 6 - 2,58 bits/symbol
From this example it can be seen that more information is produced, on.
a per symbol basis, when a larger -number of symbols are possible.
Denoting the symbols supplied to the user by Y ,the information supplied to
the user is defined ip a manner analogous to that of the source, i.e.,
H(T) = ~P(Y=0) log P(Y=§) - P(1=1) log P(1=1) bits/binit (6)
The relations between P(Y), the channel probabilities, and P(X) can be determined
from Fig. 3(a) and are as followss
P(T=0) = P(X=OXl-P0) + P(X=1) . p1
p(l-l) = P(x=0) PG + P(X=1) (1-P-l) (7)
In certain cases (for example P(X=0) = P(X=l) and PQ = P^ ) H(Y) and H(X)
are equal numerically! however, in general this is not true.
Referring again t© the channel model of Fig, 3(a) note that the various
probabilities (P0, Pp, 1-Pp) indicated are actually, transitional, or con
ditional, payabilities, i.e, PQ represents the probability of receiving a 1
given that a 0 is transmitted, P^ represents the probability of receiving a 1
given that a 0 was transmitted etc.
Thus in a more consistent notation the relations are
-15-
PQ = P(l=l|x=0) P(Y1(X0)
P-L - P(X=Q|X=l) * P(I0|XX)
1-P0 = P(I=0|I=0) = P(Yq|X0) ^
1-P-L =p P(X=l|X=l) » P(I1U1)
With these definitions three additional entropies associated with the source and
user may be defined'as follows;
2 2H(I|X) - - H XL P(XisI.) log ?(!,-. IX-) (9)
i=i j=l 3 ~ 3 '
2 2H(X|I) = -.^1 X ^(X^Xj) log PCX.IXj), (10)
i-1 j=l2 2 , '
H(X,l)=-H 51 P(X.,I,) log P(Xi#I.) (11)i=l j=l J J
The justification of these entropies on an intuitive basis is not as
straight forward as it was for the source and user entropies, H(X) and H(X),
However, some feeling for the meaning of these Hiajr be obtained in the following
manner. Associated with the occurrence of a specified event at both the trans
mitter and the receiver (for example the event a 1 is transmitted and a 1 is re
ceived) there is a definite probability which depends upon the source probability,
P(X±), and the transitional probability P(Xj|Xx) which is
P(Xi,Tj) = PdJ^) P(X±) (12)
From the earlier discussion it appears reasonable to define the information
The symbol ^ is to be read: "is defined as" or, "is, by definition, equal to".
-16-
ass oaia ted with this event as
yf --leg P(Z^sYj) bits/occurrence of X^ and Y^
Taking the average of this information over all possible values of X and Y
gives the expression of Eq» (10) for the average entropy associated with the
joint occurrence of a source and user symbol*
In a similar manner the conditional entropies of Eqs. (9) and (10) are the
average of the individual entropies ‘- log P(Y.|x. ) and. - log P(X. I Y.) respec-
lively* The conditional entropy H(y|X) can be considered as the uncertainty
concerning the received symbol Y when it is known that X has been transmitted* For a nosieless channel Y would be uniquely determined by X and H(Y|X) would be zero, likewise H(X|Y) is the average uncertainty concerning the symbol transmitted, X, where the received symbol, Y, is known. For a noiseless channel
H(x|y) is also zero.
For convenience the five entropies associated with a source-user combina
tion are listed below along with their appropriate interpretations.
H(X-) - A measure of the average uncertainty' of the symbols produced by
the source in terms of bits/source symbol,
H(Y) - A measure of the average uncertainty associated with the received
symbols in the terms of bits/received symbol.
E(X,Y) - The average uncertainty associated with the transmission and re
ception of a symbol in terms of bits/symbol pair*
H(YfX) - The uncertainty concerning the received symbol when the trans
mitted symbol is known.
H(X.|y) - The equivocation of the channel which is a measure of the uncer
tainty concerning the source when the -received symbol is known.
It is readily shown (.pp 101-102, Ref. 8 ) that the following relationships
exist between the various entropies.
-17-
H(X,Y). = H(X) + H(Y|X)
- H(I) + H(X| I)
h(x)2h(x|i)
HCy) £ H(l| X)
(13)
04)
Example 1*2-2
A source produces 0*s anc| l*s with the probabilities P(0) = 1/4*
P(l) = 5/4* The channel transitional probabilities are given by
H(X) = -1/4 log 1/4 - 3/4 log 3/4 =0.811 bit/binit
P(I0) = 1/4 x 0*9 + 3/4 x 0.1 = 0.300P(Tl) = 1/4 x 0.1 + 3/4 x 0.9 = 0,700
H(l) = -0.3 log 0.3 - 0,7 log 0*7 = 0.881 bit/binit
H(Y(X) = " 7p log 0,9 -
log 0*1
0*1 n i~ log 0*10*9x3 n 'T" los G*9
0.9x34
= -0,9 log 0.9 - 0.1 log 0.1
H(Y|X) = 0.469 bit/binit
P(X1,Y1) = P(Y1|X1) P(XX) = 0,9 x 0.75 = *675
P(XpY0) = P(Yq|X1) P(X]_) = 0.1 x 0.75 = *075
P(X ,Yj = P(Y |X.) P(X ) = 0.1 X 0.25 = ,025O X -L O O'
P(Xo,Y0) = P(-0IX0) P(X0) =0.9 X .25 = ,225
Hex.!) = -,675 log .675 - .225 log ,225 -,075 log .075 - *025 log .025
= 1*280 bits/ pair of binits
■18-
PCxL:|x1>
T} ^A ^ Tv/ tt* ' \ r*sf'pCi-l).' ,7
. H(xll) = - ,765 log *965 - »075 log .25. -r ,025 log .035 - .225 iog ,75
' . - 0*399 bit/binit
Note that
H(X*X) = H(X) + H(XIX) = 0.881 + 0,399 » 1.280 bit« H(X) + H(X|X) = 0,811 + 0*469 - 1.280 bit
H(X) - ,811^H(X|X) = 0,399
H(X) = ,881 ^ H(X|X) = 0*469
Note that in this example the uncertainty* H(X)* at the user is greater
than that* H(X) supplied, to the channel. It should be emphasized that this in-
erease in extropy does not. mean that useful information is gained on the channel
but only that the.noise has introduced additional uncertainty into the received
symbols, fhe following discussion concerning the actual information transmitted
through the channel will further clarify this point.
One additional concept* that of mutual information or transinfomation is
required before proceeding further. Mutual information is a measure of tjie
amount of information transmitted through channel and is defined* for.the binary
channel as
2 2Y Y los1=1 j=l
(15)
Straight forward algebraic manipulations show that
l(X,X) = H(X) - H(X|X) bits/binit- H(x) H(X|X) bits/binit (l6)» H(X) + H(X) - H(X,X) . bits/binit
For a noiseless channel H(X) * H(X), H(xlx) = H(xlx) = 0 and the information transmitted through the channel is equal to the source information H(X). Conversely, when the noise on the channel is so great that P(fjJ XQ) = P(XQ|Xj) =l/2
then H(X) - H(X'|x) and the information transmitted through the channel is zero* Since these are intuitively satisfying results this appears to be reasonable defh-
nition for the amount of information transmitted through a channel*
Considering the results of Example 1.2-2 above observe that the information
transmitted through the channel is
-19-
I(X,X) H(X) -H(X|X) = 0.311 - Q.399 = *412 bit/binit = H(X) -H(xjx) - 0.831 - *469 =0*412 bit /binit = H(X) +H(X) -H(X,X) = *811 + .881 - 1.280 = .412 bit/binit
Thus, due to noise on the channel, the information l(X,X), transmitted through
the channel is considerably less than that supplied to it.
With these precise definitions for the amount of information supplied by a
source and the amount of information transmitted by a channel it is now possible
to define precisely what is meant by channel capacity. According to Shannon (l)
the capacity of a discrete, memoryless, channel is given by
C = max I(X,X) = max Uh(X) *H(X IxX]= max Ch(X) -H(XIX)] (17)
where the maximization is with respect to the source probabilities P(X). This
definition is completely general, applying to: all discrete channels and even to
continuous channels when the various entropies are properly defined. The work
in this report will be concerned with only the binary symmetric channel (BSC)
of Fig. 3(b) and the binary erasure channel (BEG) of Fig. 3(c), The calcula
tion of the capacity for these channels is straightforward and is illustrated
Below, .
Example 1,2-3
Determine, the capacity of "the BSG shown below
* Po; loS p0 + Qo l0S Qo
t hus :C- max fH(Y) + PQ log PQ + Qq log Q0]
From previous results H(l) is maximum .when
p(l0) - P(X-l) = 1/2
and has a value of unity. Thus, for the BSC
G = 1+ P0 log F0 + Qq log Qq bits/binit (18)
Since P(Yq) - a Q0 + (l-a) P0 = 1/2, the source probability that will
transmit information over the channel at this rate.is a * l/2.
Example 1.2-4
Determine the capacity for the BEG shown
-21-
O 1
For a BEG the symbols received as x are ignored. Thus
H(x|x) = - aQg log Qq - (1-a) Qq log Qq
= - Q0 log Q0
Since H(x|x) is independent of P(x) it is. necessary?- only to maximize
H(X),
As before H(X) is maximum for equiprobable symbols. The corresponding
entropy is
Max H(X) » - 1/2 Q0 log l/2 Q0 - l/2 Qp log l/2 Qq
= - Qo log 1/2 Qq
G = Qq flog Q0 - log 1/2 Q^J = Q0bits/binit (19)
It is an interesting property of the BEC that for Q0^ 0.23 the channel
capacity is greater than that for the BSC. In addition all digits received as
a 0 or a 1 are correct. Because of this it is in some cases easier to use error
correcting codes with the BEC than with the BSC.
With this material as background it is now possible to give the first and
second fundamental theorems of Shannon, It is because of the conimunication
possibilities promised by these theorems, and the fact that both theorms are
based upon the assumption of appropriate coding techniques, that the current
interest exists in the field of coding theory.
-22-
In the first of Shannon*s theorems channel capacity is considered on a per
second basis rather than on a per binit basis. If m binits/sec ean be trans
mitted over a channel its capacity, on a per second basis, is
0* - rnC bits/sec . (20)
Shannons First Fundamental Theorem applies to a discrete noiseless, (i,e,
the probability of an error is zero) memoiyless channel and is as follows®
Theorem -.Let. a source have an entropy of H(X) bits/souree symbol
.and a channel a capacity of C* bits/sec. Then it is possible to encode
the output of the source in such a way as to transmit over the channel
at an average rate arbitrarily close to C'/H source symbols per second.
It is not possible to transmit at an average rate greater than C’/H.
Considering the symbols of Eixampl.© 1,2—1 and assuming a binary channel
with m * 1 binit/sec, this theorm states that the source symbols, A, B, C* p,
Ej F, can be encoded into binary digits in such a manner that they can be trans
mitted over the. channel at a rate up to but not exceeding 16 source symbols per
31 second®
The importance of this is made clear by noting that since there are 6
symbols to be represented a 3 digit code would appear to be necessary. This
would allow transmission of only 1 symbol/3 seconds which is considerably
less than that indicated by Shannon’s theorem* Optimum coding techniques for
this situation have been developed and will be presented later in this report,
ShWhon’s second fundamental theorem, given earlier in a less precise form,
applies to a noisy, memoiyless channel and is as follows:
Theorem - Let a binary source have an entropy of H(X) bits/binit and
channel a capacity of C bits/binit. Then if H(X) < C there exists a
coding, scheme such that the output of the source may be transmitted over
the channel with arbitrarily small error rate. This is not possible if
-23-
H(X) > G.
The importance of this theorem lies in the fact that it was previously
thought that a reduction in the error rate could be accomplished only through
a corresponding reduction in the information rate. Thus as the error rate -
approached zero so would the information rate. Shannon*s theorem states that
this is not true provided that proper encoding techniques can be obtained.
The determination of these techniques represents the major effort in
coding research at the present time.
r
-24-
GHAPTER II
CODING FOR THE NOISELESS CHANNEL
2,1 Introduction
Before starting a discussion of coding the following definitions, pertain
ing to a noiseless channel, are given,
Source symbol - - One of n possible symbols produced by a message source.
Alphabet - - A list of all n allowable source symbols.
Message -» - A finite sequence of source symbols,.
. Encoding or enciphering: - - By definition this operation occurs at the trans
mitter and is a procedure for associating the source symbols with a corres
ponding set of binary digits in a.one-to-one manner.
Decoding or deciphering - This operation occurs at the receiver and
corresponds to the inverse of encoding, i»e., it is a procedure whereby
the received set of binary digits are associated with the original source
symbols.
Coding - - A general term including both the operation of encoding and that
of decoding.
Code word - - The binary number assigned to a source symbol. This may be
composed of one or more binary digits.
Length of a.Code word - - The number of binits in a code word.
Optimum Code - - A code having the maximum possible efficiency for a given
set of source symbols and probabilities.
The capacity, C», of a noiseless, binary channel is given by Eqs, (IS) and
(20) (with PQ - 0) as a m bits/sec. Shannon’s first theoren states that the
symbols from a source having an entropy of H(X) bits/source symbol can be en-
cpded in sueh a manner that they can be transmitted over this channel at a rateG tup to but not exceeding W/VS" source symbols/sec. Thus if a binary source with
-25'
P(Q) **.P(l) = l/2 is considered the entropy is H(X) 53 1 bit/s our cq symbol and
the symbols can be transmitted at a rate of m source symbols per second.
Obviously this represents straight foreward transmission of binits each having
maximum possible entropy and no coding is possible or necessary. However, if
the source probabilities are such that H(X) =? 0,25 bit/source symbol the theorem
states that 4 m source symbols, or binits, can be transmitted over the channel
each- second. Since the channel can transmit only m binits per second coding is
obviously required for this case. The following example illustrates encoding
for this situation.
Example 2.141
Assume that a coin is to be tossed 100 times at the rate of one toss/
sec and that the results (heads=l or tails = 0) are to be transmitted, in
order, over a noiseless binary channel. If a fair coin.is considered the
probabilities will be P(Q) = P(l) 53 l/2 giving rise to a source entropy,
on a per second basis, of
H»(X) - -1/2 log 1/2 - 1/2 log 1/2 = 1 bit/sec.
Since the capacity of a noiseless binary channel transmitting 1
binit/sec is 1 bit/sec., the entropy of the information supplied to this
channel is equal to the channel capacity and coding is not required,
JJext consider the case where the coin is biased so that the probabilities
are P(Q) = 0.05 and P(l) * 0,95,
Under this condition the source entropy is
H»(X) - “0.05 log 0,15 - 0.95 log 0,95
— 0,286 bit/sec.
Direct transmission of these symbols results in an information input to
■26-
the channel of approximately one-fourth its capacity# Shannon's first
theoranstates that this situation can be improved by suitable coding.
Ideally this coding would lead either to the transmission of nearly 4
times as many source symbols per second over the same channel or the trans
mission of the same number of source symbols per second over a channel
having a capacity of approximately one-fourth that of the original channel.
In either case this would represent a considerable increase in the channel
utilisation. To demonstrate the improvement possible with a relatively
simple code consider the. technique of transmitting only the positions in
which a 0 occurs and assuming that all other positions are l«s, Since
there are 100 positions to be represented, a seven digit binary code is ,
required. Thus if a 0 occurs in positions 7? 25, 63, 75 and 92 the code
sequence to be transmitted is
0000111 0011001 0111111 1001011 1011100
If this experiment were repeated a large number of times the average
number of 0>s appearing would be 5 and the average code length would be 35
binits. Thus a channel operating at a rate of 0.35 binit/sec can be used
to transmit the coded message as compared to a rate of 1 binit/see required
for the uncoded message.
Since no information is gained or lost in the encoding process the
information associated with one binit in the original sequence must be the
same as that associated with 0,35 binit in the code sequence. This gives
an entropy for the code binits of
H (7) = bits/source binitc' 0.35 code binlt/souree binit
= 0.817 bits/Gode binit
A convenience measure of the efficiency of a coding procedure is the
ratio of the average information per code binit to the maximum possible
-.27-
inf orpiatioia per code binit'# The efficiency, y^Q> is thus
H (X) bits/code.binit1biVcode binit (21)
- 81.7*
Later discussion of more sophisticated techniques will show that in
general efficiencies of greater than 95% are readily obtained#
Note that 7|c is equally well a measure of the efficiency of channel
utilization since it is equivalent to the ratio of the actual rate of in
formation transmission to the maximum possible rate of information trans
mission#
This example illustrates encoding for a binary' source# In many case this
binary source would have been obtained by assigning binary digits to each of the
n (n an integer > 2) possible messages of the original message source. An ex
ample of this would be the transmission of English text by assigning 5 binits
to each letter of the alphabet. In general this would not result in binary
sequences for which HC(X) = 1 bit/binit and therefore coding of the binary source
would be required# This two step encoding procedure is rather pointless since
it should be possible to encode the original message in such a manner that
H (X) 0£ I bit/binit. The following example illustrates this point# c
Example 2#l-2
Consider the source of Example 1»2-1. For this source the symbols
and their probabilities were
A 1/2 D 1/16
B 1/4 E 1/32
Cl/8 F 1/32
These six symbols are to be transmitted over a binary channel and
therefore must be represented by binary digits. When assigning binits to
-28-
these symbols it should be realized that if fewer digits are assigned to
the symbols having the greatest probability then, with care, the average-
number of binits per Source symbol will be less than when, an equal number
of binits are assigned to each source symbol.
With this in mind consider the following code.
A 0 D 110 . . .
B 10- E 11110
C 110 F 11111
The average number of binits, 1, required,, for this code is
L = 1^1/2 + 2x1/4 + 3x1/8 + 4x1/16 + 5*1/32 + 5*1/32
- 1 15/16 binits/source symbol
: fhe entropy of this source was previously found to be
H(X) = 1 15/16 bits/source, symbol
The entropy of the code digits is given by
u = H(X) _ t -,rH/ bit 1 source symbolH0tx; 1 15/16 ———— X 1-157V6 biiits
= 1 bit/binit
Since the maximum entropy of a binit is 1 bit/binit, the coding
efficiency is
w- x 100 - ioc$'» c 1 bit/binit
..Note thatp/Q\ = average number of zeros
■ ' average number binits
, tol/2 + M/4 ; + 1*1/32 =• 31/32 X 16/31 - 1/2
Likewise
-29-
p(l) 1x1/4 + 2xl/3 + 3x1/16 + /pcl/32 + 5x1/32—i 15/16 = 31/32 x 16/31 = 1/2
or, equally well,
P(l) = 1-P(0) - 1/2
This illustrates that for this ease the code binits are equiprobable
and independent*
At this point a question mught be raised concerning the reason for assign
ing such a large number of binits to some of the symbols in the._ above example.
For example why not assign the code words in the following, apparently much more
efficient, way?
AO D 01
B 1 E 10
G 00 F 11
This gives an average length of
L - lxl/2 + lxl/4 + 2x1/8 + 2x1/16 + 2xl/32 + 2xl/32
= 1 l/4 binit/source symbol
He(X) - SM - || bits/binit
Since the maximum possible entropy for binary digits is 1 bit/binit if is
obvious that a falacy in this coding scheme must exist and indeed one does.
This is readily demonstrated by considering the code sequences that would be
transmitted for the message symbols A G F E D B* These sequences are as follows
a) original code 01101111111110111010 .
b) alternate code §001110011
Imagine that'these sequences have been received and are to be decoded.
The decoding procedure consists of cheeking the first digit to see if it eorres-
pondg. to a source symbol. If it does not. then the first two symbols are con-,
sidered. This procedure is continued until a group of digits are recognized
that correspond to a source symbol. The symbol is then recorded and the pro
cedure repeated, for the following digits in exactly the same manner.
The following illustrates this procedure for the original code*
digits checked result
0 A
1 meaningless
n meaningless
no G’ l ■ meaningless
11 meaningless
111 meaningless
1111 meaningless
11111 F
1 meaningless
1111 meaningless
11110 E
1 meaningless
11 meaningless
• 111 ' meaningless
1110 D
1 meaningless
10 B
Thus'the transmitted message is
AC F ED B
When this procedure is applied to the alternate code the following sequences
are obtained as possible transmitted messages*
■ AAA BBS AA 33 '
. AAA P E l B
\ G. B B B G F; . . ' .
■ AGF ElB '
: Because of this, imambiguoUS. decoding is not possible'for the alternate
coding scheme and no useful information can be transmitted. It is for this,
reason that an inconsistent value was found for the entropy of the alternate
code.digit*- ,
Since it .is usually desirable to obtain codes having a maximum average
length it is well, to. reconsider the above situation in an attempt to determine
wiiy one obde failed and the other did not. Consider the siutation that would
exist if the alternate code were transmitted with a.space between each of the
code words* For this case the code sequences would be
0 00 11 10 01 1 .
Obviously, no ambiguity exists with this code and the transmitted message
is directly obtained as AQFEBB, Mote, however, that this spacing of code words
was not required for the original code. Thus the property required for un
ambiguous decoding is that the code words can be transmitted in sequence with
out intervening spaces. A code having this property is. described as being
uniquely decipherable* Further consideration will show that the alternate
code does not have this property due to the fact that some of the code words
can be. obtained from others by adding a digit. For example the code word for
0 is obtained from the code wbrd for A by adding a 0* Observe that this situ
ation does not- exist in the original code, i*e, no code word is the prefix of
another code word. It is this property, called the prefix property* that de
termines whether or not a particular code is uniquely decipherable.
From this dis'cussion. the two requirements for an optimum code should be
-32-
clear*
1. The code must be uniquely decipherable i.e,, each code word must have
the prefix property. This condition also insures that H (X)- < 1 bit/binit.
(A proof of this statement is given in Ref. 3, pp 143-151). Because of
this, 7] c, as defined above, can never exceed 10C^.
2* The average length of a code word should be as small as possible con
sistent with the above requirement.
For the sake of completeness it should be.noted that it is possible to con
ceive of codes that do not have the prefix property but are still uniquely
decipherable* For example consider the code
A 1
B 10 C 100,Application of the above decoding procedure to any sequence of these code
words shows tjiis to be a uniquely decipherable code. However, there appears to
be no general method for determining such codes and in addition no known codes
of this type have a higher efficiency than codes having the prefix property.
Thus all codes discussed in the following sections.of this chapter mil have the
prefix property.
The following sections will discuss, in a more or less chronological order,
the various better known techniques used in coding for the noiseless channel.
Since Huffman encoding represents.the optimum (in the source of giving maximum• ( -
efficiency) coding procedure it may seem superfluous to describe some of the
other non-optimum techniques. To delete these, however, would be to defeat the
purpose of this report.
2*2 Shannon^Fano Encoding
The Shannon-Fano encoding procedure (9) appears to have been the first
constructive procedure for determining codes having the prefix property and as
such represents a logical starting point in the discussion of specific coding
techniques.
In essence the procedure is a technique for assigning binary digits to
source symbols in such a manner that the number of binits assigned is inversely
proportional to the probability of the corresponding message symbol. The ■:pro-
cedure consists of fisting the source symbols in the order of nonincreasing prob
ability and then, dividing this group of symbols into two new groups having ap
proximately equal probabilities, A 0 is assigned as the first digit of the
code words in one group and a 1 is assigned to the first digit in the other
group. This subdivision process is then repeated until groups are obtained
that contain ouly one source symbol each* The resulting code will in.all cases
have the prefix property although it will not always have maximum efficiency.
The following, examples illustrate this procedure#
Apply the 3hannon-Fano encoding procedure to the following source.
Symbol A B G D E F
Probability l/2 l/A l/& l/l6 l/32 l/32
Step 1, List the symbols in the order of honincreasing probability.
Step.2,. Divide the list into two groups having probabilities that are
as nearly equal as possible,
A l/2 "^T total prob, - l/2
Example 2,2-1
-34-
Although probabilities of exactly l/2 are obtained for this problem
in general this will not be possible.
Step 3* Assign a 0 as the first digit in the code word for the first
group and a 1 as the first digit in the code word for the second group.
Step-4* Repeat the division and assigning of digits process until single
symbol groups are obtained.
A 1/2
B 1/4
C 1/8 I) 1/16 S 1/32
F 1/32
©
1 0
1 1 0
1 1 1 0
1 1 1 1 0
1 1 1 1 1
1st division
2nd division
■ 3rd division
•4th division
5th division
Observe that this code has the prefix property and from Example 2,1-2 fts
efficiency is 1QCP>. Thus this is an optimum code and no other procedure
can yield a better code. This situation, however, is not typical of.
Shannon-Fano encoding and occurs in this example only because of the
particular source probabilities used. In fact the following proof shows
that 100/a efficiency is possible only when
-n.PQq) = 2 1
where n^ is the number of letters in the i th code word. Note that this
relation exists in the above example.
Taking the logarithm of both sides of Eq, (21), multiplying by. P(X.),
and summing over all i yields N N
- 21 P(2i) loS P(*±) = 21 P(%) n. (23)i=l i=l
The right hand side of this expression is the average length of the code
«r*I
-35
words, L, while the left hand side is the entropy of the source symbols,
H(X). Thus H(X) = L and 71 c ■»' x 100 = 1QC$. Any other sets ofL
probabilities will lead to the condition
P(^)> 2
causing L to be greater than H(X) and . 7[c to be less then 10C$.
Example 2.2-2:
Determine the Shannon-Pano code for the following source.
Source Symbol Xx Xg X^ X^ X$ X6 Xy x8 *9 X10
Probability *3 .2 .2 .1 .05 .05 *0(3 > 03 * 02 . 02
Applying the procedure demonstrated above yields
2nd division
+ 0,1 + .0*1 - 2,0 binits/source symbol
H(X) (0.3 log 0.3 + 0.4 log 0,2 + 0.1 log 0.1
+ 0.1 log 0.005 + 0.06 log 0.03 + 0.04 log -0.02)'
5= 2.743 bit/souree symbol
7lc '* ~^-x 100 - x 100 - 97.9%
h
Although this efiiciency is quite high it will be shown, later that
this code is. not optimum i.e,, it is possible to obtain a higher efficiency
with another coding technique, Obviously any improvement will be small.
Example 2.2-3
Apply the Shannon-Fano encoding procedure to the binary source of
Example 2.1-1 for the case of P(0) - 0.05.
The procedure for encoding, a binary source consists of grouping the
source binits into groups of two or more binits and considering these
groups as new source symbols. Binary code words.are then assigned to
these symbols in the usual manner. •
Consider, the- ease of using 2 binits per group. The possible sequences
two binits in length are 00, 01, 10, 11. Assuming successive binits to be
indpendent, the corresponding probabilities of these .sequences are
-36-
Sequence Probability
00 .05. x .05 =. .0025
01 .05 x .95 - .0475
10 .95 x .05 = .0475
11 .95 x ,95 * ,9025
The Shannon Fano code for these sequences is thus
Sequence
oo
Oode word
01
U
10
10 110
11 111
l x .9025 + 2 X ,0475 + 3 x .0475 + 3 x .0025
1.1475 binit/sequence
Since the source binits are considered in pairs the average entropy
per pair is twice the entropy for a single binit. Thus, from example 2,1-1,
H(Z) - 2 x 0,286 ~ 0*572 bits/source symbol
-37-
71 C=? 0*57% ^ 2.QQ _ IQlc 1.1475 x u . • I/O
This example illustrates that encoding groups of two source binits can
reduce the required channel capacity from 1 bit/sec to 0.5875 bit/sec. An
even greater reduction in channel capacity can be obtained by encoding larger
groups of source symbols. This is illustrated in the following example.
Example 2,2-4
A source produces two independent symbols, A and B, with the probability
= 1/16, P(B) =15/16.
It is desired to encode these so as to obtain a coding efficiency
greater than 7C$,
1. Encoding of single symbols
Symbol Probability code word.
A 1/16 0
B- 15/16 1
L = 1 .
I-I(X) = - (1/16 log 1/16 + 15/16 log 15/16)
0,337 bit/source symbol
?tc " 33,7?
2, Encoding of pairs of symbols
Symbol . Probability code word
BB 225/256 0
AB. 15/256 10
BA 15/256 110
AA 1/256 111
>lc= ^ §J|| 3C 100 = 56,5$
L
L = l/2 ( |||- + + 25I ) = 0.592 binit/source symbol
3* Encoding of 3 symbols
Symbol Probability code wordBBS (15/16)3 0BBA (15/16)2(1/16) 100BAB (15/16)2(l/l6) 101ABBi (15/16)2(l/l6) 110
, ■ . 2BAA (15/16) (1/16) mooABA (15/16) (1/16)2 11101AAB (15/16) (1/16 )2 1U10AAA (l/l6 )3 Hill
f m ( 3375 + 9 x 225 + 46 x 5x t7 v 4096 /
= 0*45® binit/source symbol
Me » |4U « 100 = 73.3$
This illustrates that encoding larger groups of source symbols yields more
efficient;codes * Actuany it is possible to obtain an >fc arbitrarily close to
,100$. by encoding suitably large groups of symbols* However, for this example
the efficiency increases so slowly for groups greater than 4 or 5 binits in
length that the increased cost of the encoding equipment would, in most situa
tions, more than offset the increase in coding efficiency*
A second'fundamental limitation of any coding scheme can be observed in
this example. Note that as larger groups of source symbols are encoded there,
is an increasing. amount of delay between the time that a source symbol is pro-
duced and the time that its code word is transmitted. Thus, depending upon the
particular application, there may also be a limit on the size of the groups that
may be encoded due to a limitation on the allowable delay time;'-.
An alternate method for constructing Shannon-Fano codes consists of using
a coding tree. When using the coding tree it is desired that the probabilities
of symbols whose branches meet at a node point be as nearly equal as. possible,
The use of the code tree is best demonstrated by giving the code tree for some
of the previously derived codes.
-39-
Example 2,2-5
. Determine the Shannon-Fano code for the source of Example 2,2-1 by
means of the code tree.
B 1/4
c i/a
D 1/16
F 1/32
lots that at each node point the symbol probabilities are equal* ' When,
this condition exists a 10©^ efficient code is obtained.
The code words are obtained by starting at the root and progressing
via the branches to the desired symbol noting the 0's and-lfs that are
encountered on each branch. Thus the. code word for C is 110, The result
ing code words derived from this tree are the same as those of Example 2.2—1,
Example 2*2-6
Draw the code tree for the source of Example 2.2-2
-40-
In this ease the symbol probabilities are not equal at each node point and
a 10Q$ efficient code is not obtained. The code symbols derived from this
tree correspond to those found in Example 2.2-2.
-41.
2,3 Shannon*s Binary Encoding
Shannon’s binary encoding procedure (pp* 4G2-4C3 Eef. 1) is-primarily of
theoretical interest -since it has n© practical advantages over other techniques
and often has a lower efficiency. It is presented here for the sake of eomplete-
hesp and because it allows a simple verification of-Shannon’s, first fundamental
theorem# '
The procedure is based upon determining code words that have the prefix
property and satisfy the. following relation.
2~ni > - P(Xi). £.2~ni (24)
where, as before, .h^ is the number of binits in the i th code word. The code
words are determined in the following manner, ,
1, list the symbol probabilities in nonincreasing order and let these be
denoted by P(X-j_), P(2^) - •— PCX^)where
pcy ^poy £ - - ^p(x^)
2* Calculate the numbers
■ k-1 -Pk -ZlP(Xi) k « 2, 3, - » n
i=l
P-,
3, For the k th symbol write P^ as a binary number* of binits where
# In the binary representation of a number less than unity the binary digit weights are „ , 0 „y p—j} •Thus, for. example,
0,$ * *2^ + 2“2 + 0j2~3 0S2"4 + 2~5+
■ ^ mm
-42~
is an integer satisfying Eq. (24). The resulting binary number is the.
k th code word.
It can be shown (p4Q2, Ref • 1) that these code words have the prefix pro
perty. Th@ following example illustrates this procedure.
Example 2.3-1
Debermine.the Shannon binary code for the following source and compare
. it to the corresponding Shannon—Fano code.
Source Symbol ^ X^ X^ X^ Xg
Probability 0.4 0.1 0.01 0.2 0.01 0.03 0.2 0.05
1. list probabilities in noninereasing order.
h \ b . *2 b % V \
0.4 0.8 0.2 0.1 0.0? 0.Q3 0,01 o.ca
2. .. Calculate P. .k*s
P1 = o = .0000 ^ -
P2 = 0.4 = .0110 - -
P3 = 0.6 - .1001- -
P^ = 0.8 - .1100 - -
P^ = 0.9 - .11100 - -
P6 » 0.95 = .111100
P? = 0.98 ^,1111101--
Pg^ 0.99 « .1111110 - ^ '
*43-
3. Determine r^s and code symbols
k.
1 2
2 3
3 3
4 4
5 5
6 67 7
$ 7
Source Symbol
X-L
% .
%
Code word
00
11001111101
on
1111110
11110©
100moo .
For this code
L - 0,8 + 0,6 + 0.6 + 0,4 + 0,25 + 0,18 + 0.0? + 0,©7
■ 2,97 biniis/source symbol
H(X) «* -(0,4 log 0.4 + 0,4 log 0,2 + 0.1 log 0.1
+ 0,05 log 0.05 + 0.Q3 log 0.03 + ©*-08 log 0.02)
- 2,29 bits/source symbol
Thus the entropy of the code digits is
H,(X) = SQQ- *. |$||> - 0.77 bit/binit
L
and
y{c - 7i%
It is readily determined that the Shannon-Fan© code for this source
is as fellows.
-44*-
X,■h
JL,
X,
X,
0
11 1 0
111 1 1 1 0
I o
1111 111
111110
II 0
1 11 1 0
For this code
I s3 2,37 binit/source symbol
= 96*0^ : .
This illustrates the fact that. in general Shannon*s binary encoding
is less efficient than other methods*.
. The theoretical importance of this coding technique lies in the fact that
the condition imposed by Eq, (24) allows bounds to be determined for the average
code length L.
These bounds are readily determined in the following manner* Taking the -
logarithm of Bq. (24), multiplying by P(X^) and summing over all i yields
' n ' n n2^ P^) p(%) log p(x.)>;>r; ■*(&)' (iva) (25). i=l . i=l . . i*l .
loting that
i=l
n- ZL p(%) log P<X) - H(X)
' i=l
-45-
n n nY ?(Xi) (ru-1) = P(2i) n - ^ p(X±) 53 L - Xi^l i=l i»l
allows Eq, (25) to be written as f
L > H(X) > L - 1
. which can be rearranged to give
H(X) > 1 > X >L H(X) . (26)
Here H(X) is the average entropy per encoded symbol and L is the average
number of binits per encoded symbol. If it is assumed that the encoded symbols
represent groups of I independent source symbols the relation between the
entropy of the encoded symbols and that of the source symbol is
H(X) ?= N H.(X) bits/encoded symbol
Where H (X) denotes the entropy associated with a single source symbol, similar
ly
L - N.tg. .binits/encoded, symbol
Using these relations Eq* (26) can be rewritten as
N Hg(X.) + 1 > N "tig ^ N Hg(X),
orH(X)t|>Ls >^(1) (27)
It is this relation that justifies^consideration of Shannon*s binary en
coding procedure, Observe that as larger groups of source'symbols are 'encoded,
the average number of code binits per source symbol approaches the entropy of
the source symbols. However, the condition L_ « H (X) is exactly that required
to obtain 100/1 coding efficiency and the transmission of information at the
-46-
channel capacity. Thus the limiting form (as I becomes infinite) of Eq, (27)
demonstrates that an encoding procedure can be determined for the noiseless
channel that will allow the transmission of information at a rate arbitrarily
close to the channel capacity*
This statement is exactly that of Shannon’s first fundamental theorem and
as such demonstrates the importance of Shannon*s binary encoding procedure. It
should be emphasised, however, that this result does not imply that a more
efficient code cap not be obtained for a given value of N. The above example
illustrates that one can.
2,4 Huffman Encoding
The Huffman encoding procedure (10) is a systematic method for determining
optimum codes in the sense that no. other codes having the prefix property and a
higher efficiency can be determined.
This procedure is slightly more complex than those previously discussed and
is as follows: •
1* list the symbols, to be encoded in the order of nonincreasing probability,
.2, Group the two least probable symbols together and consider these as a
single new symbol whose probability is the sum of the individual probabili-
■ tips,
3* Form a new list of symbols containing the remaining original symbols
and the new symbol. List these in the order of nonincreasing probability
also.
4. Group the two least probable symbols of this list forming a second new
symbol whose probability is the sum of the individual probabilities.
5. Repeat the regrouping and relisting process until a one element group
having a probability of unity is obtained.
6f Assign code binits to the original symbols according to the position
occupied by the symbol in the various subgroups that were formed.
•47'
The following examples illustrate systematic procedures for carrying out
these steps.
Example 2,4-1
leiermine the Huffman code for the following set of source symbols*
Symbols A B C 1 E
Probability l/2 l/6 l/6 1/12 1/12.
Proceeding with steps one through four above gives*
A 1/2 1/2
B l/6 1/6
G 1/6 1/6
D l/12-j »l/6
E 1/12J
The exact location of the resulting symbol is unimportant as long as
no symbols having a greater probability are below it in the' list*
Continuing with step five gives, as a complete result,
A
B
G
D
E
The code binits are determined for each symbol by assigning a 0 to
the code word each time the symbol, or a sub-group containing the symbol,
is the lower element in a subgroup and a 1 when it is the upper element.
For example, the locations of G, or a subgroup containing C, in the
above columns are as follows *
-48-
location: not included upper upper lower
code symbols - 1 1 0
The uniquely decipherable code words are obtained by writing these binits
in the reverse order*
The code words for this, source are thus
A ' 1
B : 0 0
Q 0 1 1 . . y;.
D 0 10 1
E 0 10 0
L =* .1/2 + 2/6 + 3/6 + 4/12 + 4/12 115 2 binit s/s ounce symbol
5 •H(X) ** - 2Z P(%) 1°I P(%) = 1*959 bits/souroe symbol
i=l
HC(X) = -Ay - 0*979 bits/binit L
.■)<«>
An alternate procedure for carrying out Huffman encoding that is similar
to the eoding tree for Shannon-Fan© encoding, has been given by Fano (pp. 75 Ref,
11). Applied to this problem it gives the following result.
A 1/2 -
B 1/6 -
C 1/6-
D 1/12
E 1/12
"1-1/3
/0
1
01/2 1.0
The determination of the code word from this graph is essentially the: same
as above, namely, proceed from the symbol via the most direct path to the terminal
-49'
point noting the Q*s and l*s encountered. The code words are these digits in
reverse order and are the same as those above.
Example 2,4-2
Apply the Htafftaan encoding procedure to the following symbols.
Symbol X^ X^ X^ X^ X^
Probability 1/6 l/6 l/6 l/6 1/6 1/6
Code word Symbol. Probability
0 1 X, 1/6-I1jC — 1/3---1
00 X2 1/6 Ho
1-2/3c
1—-1*
161 * i/6-tx/3—; 0
10 0 X4 1/6 HO
111 1/6 1 1.0 X^ 1/6
t» 1/3 + 1/3 + 1/2 + 1/2 + 1/2 + 1/2 = 2 2/3 binit/symbol
H(X) ** log 6 ~ 2,5S bits/symbol
HC(X) «* = 0,96? bit/binit
7{c = 96,7/
Example 2,4-3
Determine the Huffman code for the source of Example 2,2-2*
Code Word
11 ..
01001011
Symbol
X1
Xn
10101100111001©101001
101000
*3
X4
Z5
*6
Xj
Probability
0.3 0.3, 0.3 0.3 0.3 0.3 ,0.3p40p0.2 0,2 0.2 0,2 0.2 O,2p30\ .30)\,i
0,2 0.2 0.2 0.2 0.2 0.2 \ ,Z0)0.1 0.1 0,1 0.1p,llp.l9jl.20)
;0? \ .10) ,ll\
1.0.
X,s
X-10
L = 0.6 + 0,4 + 0.4 + 0,4 + 0.2 + 0.25 + 0.15 +0,15
+ 0.12 + 0.12
- 2.79 binifs/source symbol
H(X) = 2,743 bits/source symbol
Ho = x 100 ' 98 • J/P
Note-that this gives an efficiency slightly higher than that for the
3Iiannon-Fo.no code previously considered,
2.5 Additional Tsohniques for the Moiseless channel
In the previous discussions it has been assumed that the code having the
greatest efficiency, for a given source, is the best code. This is a valid
assumption when the cost, in time or money, involved in transmitting a 0 is
the same as that for a 1. For this condition, the. total cost involved in trans
mitting a message is minimized when the coding efficiency is maximized. How
ever, when the code symbols have unequal cost the maximization of y[Q> as
■Walr
as previous.defined, does not give the least cost encoding. Under this condition
the Huffman procedure is no longer optimum and other techniques must be considered*
Blackman (12) and Marcus (13) have considered this, problem giving results that
are extensions of the Shannon-Fano and Huffman procedures. Their methods,: how
ever, do hot necessarily give minimum cost encoding,. A recent article by Karp
(14) describes such a technique which involves the use of digital computer. Be
cause of the-, complexity of this procedure reference should be made to the article
for specific details. ■
An additional situation in which Huffman encoding can not be used occurs
when the encoding is to be done in such a manner that the alphabetical order'of
the- source symbols is maintained in the code words. This might occur, for ex
ample, when English text is to be encoded for storage in a computer memory.
Gilbert and Mopre (15) have developed a technique for encoding such sources.
When applied to the English alphabet this t eehnique results in an average code
word’ length of 4*197$ binits/letter as compared to the minimum possible
(Huffman code) of 4*1195 binits/letter. The procedure for determining these
codes, however, is considerably more complex than that for the Huffman code.
-52-
CHAPTER 3
CODING FOR THE NOXSX CHANNEL
3.1 Introduction
The previous chapter considered coding for the noiseless channel. The .
techniques discussed represent methods for approaching the information trans
mission rate given by Shannon*s first fundamental theorem. Inmost practical
situations, however, the entire channel will not be noise free and the second
fundamental theorem must be applied. The coding techniques discussed in this
chapter represent various approaches to the realization of the information and
error rates given by this second theorem.
Unfortunately there is at present no single technique, analogous to the
Huffman procedure for the noiseless channel, that gives a maximum information
rate and a minimum error rate. There are instead a number of procedures, each
having their own advantages and disadvantages, that have been,proposed as a
solution to this problem* The better known and more readily explained of these
techniques will be discussed in this chapter. It should be emphasized, however,
that a large amount of work remains to be done in this area since the techniques
presented all represent essentially trial-and-error solutions to the coding prob
lem* , “ '
In the previous chapter it was shown that the output of a discrete source
could be encoded into binary digits (binits) in such a manner that the resulting
probabilities for a 0 and a 1 were as nearly equal as desired. Thus this chapter:
can consider,without loss of generality, only a binary source for which the symbol
probabilities are equal. The block diagram resulting from this approach is given
in Fig. 6.
If a sequence of 0*s and l*s are transmitted over a noisy binary channel
some will be received in error, Since the source binits are assumed to be pro-
message Mriltsmessagebinits
m discretemessage binits :
cmno
Encoding
DecodingUser
Fig. 6 - Coding For The Noisy Channel
"54"
cluccd independently, all sequences of binits will be equiprobable and there will
be no way in which the erroneous digits can be detected.
One method of alleviating this problem is to transmit, each message digit an
odd number of times and to select at the receiver the digit occurring most often
in each ghoup. For example, assume the sequence to be transmitted is 001 (71116
and that three digits are to be transmitted for each message digit. The trans
mitted sequence is thus
000 000 111 000 111 111 HI 000
Assume the transmission errors are such that the received sequence is
. 100 01© Oil. 000 101 110 111 010
Assigning to each successive group of three binits the symbol appearing most often
in the group yields the original transmitted sequence. Thus this technique
gives error free transmission when only one error occurs within an individual
group. Note, however, that to obtain this improvement in error rate it has been
necessary to reduce the rate of transmitting message digits by a factor of one-
third, Proper selection of the redundant digits allows more efficient error
correction tnan that illustrated. However, the selection of these binits in an
optimum manner represents the major problem in coding for the noisy channel.
This example illustrates an important general characteristic of coding for
the noisy channel, namely, to be able to detect and/or correct an error at the
receiver it is necessary that redundant binits be inserted into the message at
the transmitter. In the above example the second and third binits In each group
are redundant since they are uniquely determined by the first binit. These
redundant digits contain no information. Their effect is thus to reduce the
average information, or entropy, per binit of the transmitted sequence. Be
cause of this it can be stated that for error detection and/or correction to be
-55-
possible it is necessary that the average entropy of the transmitted binits be
less than 1 bit/binit. This should not be too surprising since Shannon’s second
fundamental theorem states that the average entropy per binit for the digits
supplied to the channel must be less than the channel capacity, 0, if error free
transmission is to be theoretically possible,
A second important characteristic of coding for the noisy.channel is the
encoding of groups of message digits. In most codes groups of, say, m, message
binits are encoded by inserting k redundant digits to give a code word of
m = m + k binits, -Such codes in which all code words are of equal length are
commonly called block codes, In the above example m = 1, k = 2 and n - 3,
In summary, the two important properties of codes for use with a noisy
channel are as follows,
1, The probability of error for a received code word* If the coding is
to be of value this must be less than the probability of error for the
message sequence when it is transmitted ucLthout coding,
2, The ratio of message binits, m, to total binits, n, in a code word.
This ratio can never exceed, G, the channel capacity, but should be close
to it for efficient transmission. At present, few codes approach this
ideal while simultaneously giving a low probability of error.
Before proceeding to the discussion of specific coding techniques the
following definitions pertaining to coding for the noisy channel are given.
Memoryless channel — A channel in which the probability of error for a
received binit is independent of the occurrence of previous errors.
Parity check digits - A more descriptive.term that means essentially the
same as the term redundant digits used above.
Code word - A sequence of n binits composed of both message digits and parity check, or simply, check digits.Length of a code word - The number of binits in a code word. Usually all
-56-
code words in a given eode are of equal length.
Weight of a code word - The number of l*s in the word.
Block code - Any code in which all code words are of equal length, n.
GroujD - A collection of elements or symbols having a specific mathematical
■ property. This term will be defined more precisely in section 3.3.2 and is
given here only to indicate that it now has a specific mathematical definition.
Group code — A binary code in which the eode words have the group property.
Systematic code - An n binit block: code in which m digits are'information
digits and k = n - m digits are parity check digits*
linear .code — A mathematical term for n—ary (binary, trinary, etc.) codes
having a specific property. For binary codes the terms linear code and
group code are synonymous.
3.2 Hamming Codes
The error detecting and error correcting codes discovered by Hamming (15)
represent bhe first useful coding techniques for the noisy memoryless channel.
The work of Hamming is best considered in four parts as follows.
1* Coding to provide for single error detection, i.e., SED codes.
2. Coding to provide for single error correction, i.e., SEC codes.
3. Coding which allows single error correction plus double error detection,
i.e., SEC-DED codes.
4. Certain conditions required of code words to obtain higher orders of
detectability and correetability.
All of the following results are based upon the assumption of a binary
source with equiprobable symbols, a binary symmetric.channel (BSC) with PQ < l/2,
and the use of equal length code Xfords.
3.2,1 SEB Codes
Hamming’s SEB codes for n binit code words are readily determined in the
following manners In the first n-1 positions are placed message digits. In the
n th position a 0 or a 1 is placed so that there is an even number of l’s in the
-57'
total code word* The resulting code word allows single error detection (actually
odd error detection) since any single (odd) error would result in an odd number
of l*s in the .received- Code word, Observe that all even errois go undetected.
Since the ratio of message digits to total digits is m/n = 1 - l/n it might
appear desirable to make n as large as possible so as to obtain the maximum
transmission of message digits,, However, as n increases the probability of two
or more errors, and thus an undetected error, increases. Thus when a maximum
probability of an undetected error is specified there is an upper limit on n,
Example 3*2,1-1
-2For a BSG in which PQ = 10 determine the value of n for a Hamming
SBB code that will make the probability of an undetected error approximately 10“3. ©ompare this to the probability of an undetected error without cod
ing. ■
For the SEB code the probability of an undetected error, P(UBE), in a
code word is simply the probability that an even number of errors will occur.
Thus
P(UDE) = P(2 errors) + P(4 errors) + - -
For a BSG the probability of a particular set of two errors out of a2 . n-2transmitted.digits is PQ (1-PQ) , There are a combination of n digits
/ n \*^taken 2 at a time, (2) , different ways in which two errors can occur. Thus
the total probability of two errors in n digits is
,n 2 n~2(?)■?,
Siirdlar reasoning follows for 4, 6, 0 — - errors. Thus
nln'|""'l(n-r Jf
-58-:.,E,
p(ude) = 1 (?) P^
i-even
Use of the binomial expansion (pp 51-52, Ref. 0) allows this to be
written as
P(UDE) * 1/2 ri-2(l-^f + (l-2P0f J (28)
. ©r
P(UDE)- 1/2 [l-2(.?9)B + ]
Substituting values of n gives
n P(lIDl)
3 ,00057
4 .00112.5 . 00229 ,
Thus a value of n=4 meets the specified error probability*
Without coding, an undetected error will occur whenever a message is
received incorrectly. Thus
P(UDE) = l-P(no errors)- 1- d-P0)n
= i- (.99 y*
= 0,03934
The probability of an undetected error has thus been reduced by a
factor of more than 30 while reducing the information rate by only 25$.
The type of check used above to determine the presence of a single error
is called a parity cheek and will be used throughout the discussion of coding
for the noisy channel. The above discussion used an even parity check. lad an
-59'
odd check beer* used,- the n th digit would have been chosen go as to make an odd
number of digits in the code word* This report mil use only even parity checks.
It should be noted that the parity eheck need not always involve a check over
all of th® message digits.but may check only a portion of these. The codes of
the following sections illustrate this.point.
3,2.2 SEQ, Codes.
Hamming SEC code allows the correction of any single error that occurs with
in a particular code word. '.However* when two or more errors occur this procedure
ean cause additional errors to be created in the decoding process. Thus it is
necessary that these codes be used only in siutations where the probability of
two or more errors is negligibly small.
■The construction of SEC code proceeds by first assigning m of the n binits
in a code word to be information digits. For a given ij, m will be considered to
be fixed. The specific location of three digits will be determined later. The
remaining lc = n-m positions are assigned to be parity check digits. The values
of the check digits will be determined in the encoding operation by even parity
checks over the selected information places. The following discussion will de- .
termine how these parity checks are to be made.
Consider the situation in which a code word has been received either with or
without a single error. Assuming the parity check rules to be known, they can
be applied ip order with the condition that for each time the parity check assigns
the value observed in the corresponding check position a 0 will be recorded while
a 1 will be recorded when the two values disagree. Since there are k check digits*
a sequence of k 0>s and l's will be obtained. When this sequence is written from
right to left it'can be considered as a binary number. This number is called
the. cheeking number and shall be required to give the position of any single error
in the code word. The zero value of this number shall, mean that no error has
occurred. Since the code words are n binits long the cheeking number must be
-40-
capable of specifying n + 1 different events. The relation between the number
of check digits, k, and n is thus
„k n + 1
n maximum a k = n-mi 0 . . 1
' 2 0 23/,
■ 1 2O4
' . 5k
J.2Q
3
7 43
' 3'8 4 4910
' 5k
4
11 . 744
12 8 4
With this result the values of Table X may be
determined. This table gives, for a specified,
n, the maximum number of message digits that can
be used while retaining the capability for correct
ing single errors.
Although it appears from this table that
more information can be transmitted by using
larger values of n it should be remembered that
the probability of two or more errors also in
creases with n« Thus an upper bound on n will
also exist for SEC codes when the maximum prob
ability of error is specified. :
It is now necessary to determine the parity
check rules that will allow the operation des—TABLE I
cribed to be obtained. The digits of the checking number are to be obtained by
applying the parity check rules in order and recording, from right to left, the
resulting sequence of ©is and l*s. Since the checking number is to give the
position of any single error in a code word, any position in the code having a 1
on the right side of its binary representation must cause the first parity check
to fail. The binary representations of the various positions are as follows.
Position Binar
1 0001
23
4 : ■ 0100 '
■ 5: ' 0101 v.
./ 0110 ■■■
■ 7 . . oin ; ■ . _ v;■ : ' 1000
■ 9 ; ■ 1001.'
Observe the right hand, binit is a 1 for all odd positions*, Thus the first
parity check apst be over positions 1, 3, 5, 7, 9 •—■ **• Similar reasoning in**
dicates that the second parity check should be over all positions having a 1 in
the second digit from the right. From above these are 2, 3, 6, 7, 10, 11, - *- *».
Likewise the positions for the third parity check are 4, 5, 6, 7, 12, 13, 14, 15
etc. /. ■ ■
These results indicate the positions to be checks in each of the successive
parity cheeks*. It remains to determine exactly where in the n binit sequence
the k parity check digits should be placed. Observe that by placing the check
digits in positions 1, 2* 4, $ etc • each check digit will.be involved, in only
one of the parity check operations determined above. Although this condition is
not required to obtain the SEC property, it greatly simplifies the decoding pro*-
cedure. Thus these positions will, be used. Table IX summarizes these results.
...:■■ -/1 r ■ * '•. •Parity Cheek ,
No.Location ofCheck Digit
PositionsChecked
V:i-.v '
2
3 ...
4 '■■■■;
1
2 ,.'V 4
$
1,3,5,7,9,11---
2,3,6,7,10,11—
4,3,6,7,12,13,14,15 —
&, 9,10,11, U, 13*14,15,24, 25—
-62-
Example 3*2.2-1
As an illustration of the proceeding results consider the Wammi ng SEC
code of n=7» fable X shows that there are 4 message digits and 3 check
digits per code word, fable II shows that the first parity check is over
positions 1, 3j 7 and determines the value for the digit in position. 1$
the second parity check is over positions 2, 3> 6, 7 and determines the
value in the second position] while the third cheek is over 4, 5, 6, J. and
determines the value in position 4* The information positions in this code
are thus 35 5, 6, 7 allowing a total of 2^ = 16 different code words. As
an example of the application of these check rules assume that the digits in
positions 3# 6,7 are 1, 0, 1, 1 respectively* The first parity check
rule thus requires that a © be plaeed in position 1. Likewise, the second
and third parity check rules require a 1 and a 0 in positions 2 and 4 respec
tively. The. resulting code word is 0110011. fable III givesthe code words
when all 16 possible message sequences are considered.
To demonstrate the error correcting capability of this code assume that
code word 6 has been received as
©110101
Applying the first parity check to positions 1, 3> 5* 7 indicates that the
digit in position 1 should be a 1. Since the received digit is a 0 the
first digit of the checking number is 1. Similarly the second parity check
predicts a 0 for position 2 which disagrees with the received digit. Thus
a 1 is written to the left of the 1 obtained above.
Finally, the third check predicts a 0 for position 4 which agrees with
the received value. The resulting check number is thus
0 1 1
which correctly indicates that position 3 is in error.
fo demonstrate the effect of 2 errors consider the situation in which
-63-
code word 9 is received as
' IX 11X00
Applying the parity check rules gives the checking number
y 0 0 1Thus the decoder wpald change the digit in position 1 to a Q causing a new
error to be. created. This demonstrates that the probability of two or more
errors should be'negligibly small when a SEC code is used,
Letter Position
TABLE III
Since a SEC eode is used, to reduce the probability of a code word being
received in error it is useful to determine the amount by which this probability
is reduced* For this code, the probability of correct reception is the proba
bility that either no errors or a single error occurs. From the results of
example 3.2*1-1 this is given by
P(noerror) = (l-Po)n+(J)Po(l-P0)n_1
- (i-p0)n + n p0a-p0f*1
Since '
P(no error) = 1 P( error) ‘»t __' * 1 f Pe
the desired result is
/ 'P - 1.^ (1-P )a - n P (1-P f1”1 e o cr
Without coding* a digits would be transmitted in each word*
The corresponding probability of error is thus
Pa " 1 " »ro>m
Considering specific values ©fa* 7, a * 4, and P * 10 gives the follow
ing results*
With coding
Pe » 1 - (.99)7 - 7(.99)6 * 0.00195
Without coding
P - X - (.99)4 = 0.03936
-64-
-65“
. Thus, SBC coding has reduced the probability of an uncorrected error by a
factor of 15 while reducing the information rate by less than a half (the rate
with coding is essentially 4/7 bits/binit giving a reduction of 43/).
Observe here that a penality has been paid to obtain the error correcting
Capability. In example 3.2,1-1 a reduction in the probability of an undetected
error of 30 times.'was obtained for only a 25/ reduction in the information rate.
The primary advantage of the SEG codes as compared to the SSI) codes of the
last section lies in the fact that SEC codes correct instead of only detecting
the most probable of the received errors. Thus in situations where, message
retransmission is not possible SEG codes can be used to improve the reliability
of transmission. However, a penality is paid for this capability since fewer
message digits can be transmitted in each code word. Because of this the SEP
Codes can be of value when a feedback channel is available. In the following
section a code is discussed which has both error detecting and error correcting
capabilities,.
3*2*3 SEO-DEB codes
In some cases where a low capacity feedback channel is present it might be
advantageous to correct the most probable single errors by means of a SEC code
and to provide.for message retransmission via the feedback channel when more than
a single error occurs. Hamming has suggested such a code which is obtained from
the SEC code.by simply adding an additional digit that is an even parity cheek
oyer all previous digits. For the code words of Table III this involves adding
an 8th column having the following digits
0 .01 ■1
1100
—66-
1100
001
' '.1'The operation of the SUCCEED code is best explained by considering several
cases.
• 1* Wo errors occur. In this case all parity checks are satisfied. For
example if the sequence
1 0 1 10 10 0
is received the check number is found to be 0 0 0. Since an even number of
l»s are present the last parity check is also satisfied. Thus when the last
parity cheek is satisfied and the checking number is zero it is concluded
that no errors have occurred. (Actually this is not completely true since
the errors could be such as to change one code word into another. The
probability of this occurring, however, is considerably less than the corres
ponding probability of a single or double error.)
2. A single error occurs. For this situation the last parity check will
fail. The resulting checking number will indicate the position of an error
with a zero indicating an error in the last check position. For example, .
assume that the sequence
00001110is received. The checking number is found to be 100 and the last check
fails. Thus the error is in position 4.
3, Two errors occur. In this situation the last parity check is satisfied
but a cheeking number is obtained. This indicates that two errors have
occurred but gives no information regarding their location. Thus, if the
received sequence is
-67'
10001011x
the last cheek is satisfied and the checking number is 0 1 1. However, the
errors occurred ia positions 4 and 5 and the checking number is of no use#
4. lore than two errors occur# In this case no useful information is ob
tained and if the number of errors is odd (so that the last check is satis-
fied) it is possible that the resulting cheeking number will cause an addi
tional error to be created.
For the SEC-DED code the probability that a received code word is.either
correct, or. known to be incorrect is simply the probability: that either 0, .
1 or 2 errors have occurred# Thus the probability, F , of receiving an
erroneous wo 3rd and not knowing that it is incorrect is •'
f - 1 - P(no errors) - P(l error) - P(2 errors)
- i - (i - P0)n+1 -U + i) po (i - P@f - PQ2 (l - Pof”1
.—2For the. values considered previously, i#e., n = 7* M. m 4. and P@ = 1© this . gives a P^ of less, than l©"^# Thus the use .of a SEG-DED code.has reduced
the probability of an undetected error by approximately 500 times while
causing a reduction in the information rate of 5Q$*
3.2.4 Code Requirements for Larger values ©f Detecting and Correcting Capability
In his a article (15) Hamming introduced a geometrical model that allows some
conditions to be specified for codes that are to either detect or correct more
than two errors. This model consists of identifying the sequences of 0‘s and l's
in each code word with a point In n-dimensional space. For large values of n
this ia a rather abstract concept that is of value primarily to the mathematician.
However, the case for n = 3 can be readily considered and illustrates the basic ■ 3
concept* For n = 3 the 2 = g possible code words can be associated with the
points of.a 3-dimensional cube in the following manner.
-68
Let the distance between any two of these points,, say X.and X, be D(X,X).
From the above Figure it is clean that the distance between any two points is
equal to the number of digits in which.the two corresponding code words differ.
Thus, for example, the distance between the parts 101 and 010 is. 2, This corre-
sponds to the number ox edges of the cube that must be traversed in going from
one point to the other. '
. Using this concept it is apparent that the effect of an error in the trans
mission of a code word is to move the code point to a new location. Thus if all
points are used as code words the occurrence of an error can not be detected.
However, if code words having a minimum distance of 2 units are chosen a single
error will cause a code point to be moved in only.one coordinate to a . point that
is not defined as a code word. This allows a single error to be detected. From
the above model one such set of symbols would be
0 0 0 0 1 1
1 0 1
110
or, equally well,
0 0 1 0 1 0 10 0 10 0
-69-
Observe now that if the minimum distance between code words is at least 3
units then any single error will leave the displaced point nearer to the correct
point than to any other code point. This means that any single error can be
corrected. This can be generalized to larger minimum distances with the follow
ing results.
MinimumDistance
Resulting code
1 No error detection or correction possible
2 Single error detection (SED)
3 Single error correction (SEC)
4 Single error correction — double error detectxon(SEC—DED)
5 Double error correction (DEC) or,
Single error correction - triple error detection (SEG-TED)
or, quadruple error detection (QED)
6 Double error correction - triple error detection (DEC-TEC)
or, etc.
The procedures discussed in sections 3,2,1, 3*2,2 and 3*2,3 are merely
specific techniques for determining code words having a minimum distance of 2,
3, and,4 respectively. Thus all the code words of Table III will be observed to
have a minimum distance of at least 3 units.
The determination of a set of code words which Is as large as possible
while maintaining a specified minimum distance represents an unsolved problem
for distances greater than 4 units, These results, however, give conditions that
mu,st be met by any coding scheme that may be devised,
3,3 Slepian Group Codes
3,3,1 Introduction
The work of Slepian (16), which is a generalization of results obtained
-70-
earlier by Hamming and Beed-Muller (17), represents a major contribution to the
field of coding theory. In essence Slepian showed the Hamming and Reed-Muller
coo.es bo a subclass of a larger class of codes called gnoup codes. The group
codes have several special features of practical interest. In particular, (1)
the encoding scheme is relatively simple to instrument due to the placement of
the check digits in the last k positions of the code word; (2) the decoder - a
maximum likelihood detector-is the best possible theoretically (i.e. it gives the
lowest possible PQ for a given code) and is reasonably easy to instrument? and
(3) in many cases of practical interest the codes are the best possible theoreti
cally (i.e., no other code of any type which is composed of the same number of
equal.length n-binit code words has a lower P
The Slepian group codes do not, however, allow transmission at a rate near
the channel capacity with an arbitrarly small error rate. Since Elias (17) has
shown that such codes do exist it is clear that additional work remains to be
done. At present nearly all of the block codes being studied are a subclass of
the general group codes discussed by Slepian.
As with the Hamming codes, all discussion of the Slepian codes is based
upon the assumption of a memoryless binary symmetric channel (BSG) with PQ < 1/2
and equiprobable binary source symbols.
The following section will discuss the mathematical properties that are re-
quired for an understanding of subsequent work.
3«3*2 Definition and Properties of a Group.
The following discussion of the definition and properties of a group is
not as rigorous nor as complete as that given by mathematicians. The information
presented, however, will allow the fundamental properties of group codes to be
understood.
In terms of binary words (i.e., sequences of n binary digits) a group is
defined as follows!
•71-
Definition; A collection of binary words is said to form a group if the
product'4, of any two words is also a member of this collection and if the
collection contains the identity element (this elaaent, I, is defined to be
the all-zero n binit sequence), .
Prom this definition it is clear that the 2n possible sequences of n binits
form a group since the product of any number of the sequences is another sequence,
This group is denoted by Bn and contains 2n words or elements.
Other groups having less than 2n elements can also be found from these binary
words, Por example the words
0 0 ©
10 0
0 0 1 10 1
form a group since the product of any number of the words is also contained in
the group, (Note that any word multiplied by itself yields the identity element).
Groups of this type are contained in the larger group B and are defined to be a
subgroup of Bn. The group codes investigated by Slepian are in this category,
3,3,3 Definition of a Group Code
An noplace group code is defined to be a collection of 2m (m<n) n binit
code words that form a group as defined above. Since the group B^ contains all
2n possible sequences of n binits, all n-place group codes are subgroups of Bn*
* The product of two binary words is defined as follows* Let A= a-^, a2? a^,an and B m b-j, b2, - bn be two n-digit binary words. Then the pro
duct AB is defined as
AB ,= ax + b1? a2 + b2, - ~ + ba
where + denotes addition modulo 2, i.e,, 0+0=1 + 1=0, 0+1 = 1+ 0 = 1, Thus if A = 011000 and B » 110110, AB = 101110.
-72.
Ebr simplicity in subsequent discussion such codes will be deonoted as (n,en
codes »
Slepian has shown (pp 219-221, Ref. 16) that there are exactly
J(n^m)(2n-2°) (2n-2X) (2n-22)(2m»2°) (2m~21) - - - -
(2n-2m-1)
(2^2^)(29)
different subgroups of Bn having 2m elements or words and thus N(n,m) possible
(n,m)-codes. Some values of K(n,m) are given in Table 17 below.
11,811
Observe that as n and m become large the number of possible subgroups in
creases rapidly* Since the Slepian group codes are to be selected from these
subgroups, the problem of choosing the best code for a given n and m becomes '
quite difficult for large n and m.
Example 3.3.3-1
Eor n = 3, m * 2 Table 17 shows that there are N(3t2) = 7 possible
(3,2)-codes. . Trial and error methods show that these are as follows:
Words
The determination of these codes for larger values of n and m is not
a simple problem.
Assuming that a (n,m)«>code has been chosen and the 2ra words determined, in
formation is transmitted with this code by selecting blocks of m message digits and associating these in a one-to-one manner with the 2m code words. Then as
each block of ffi message digits is received, the corresponding block of n code
digits is transmitted over the channel. Due to noise on the channel some of
the digits in the received code word will be in error, . The next problem is thus
concerned with the method for correcting these errors using the known property
that the transmitted words formed a group*
3,3,4 Detection of Group Codes-in i iHH|iHr»MnmiiijiimMM!WH i i!;iti)Hii|»i«M>j|ii I i i Willi rn iiihii . iitii^iiiiijiiiii ijwrrownnm n i m m nWa-
It has been stated that the transmitted code words form a subgroup of Bn, .
This means that only 2m of the possible 2n n-binit sequences are transmitted.
However, due to noise on the channel it is possible to receive any of the 2
n-binit sequences* Thus, the detection process must involve associating a number
of received words with each of the transmitted words in such a manner that the
probability of error is minimized, Slepian has shown (pp 222-223, Ref, 16) that
the optimum detection method (i.e,, it gives the least probability of error) is
as described in the following paragraphs,
let the words of a specific (m,m)~eode be A = I = 000 - - (I is the identity
■74-
©lement),,A^-, - - - A^, where u - 2m. The group BQ (i.e* the collection of
all 2 possible received words) can be developed from this subgroup as shown, be
low
I A2 A^ • • • Au
S2 S2 A2 3zS ° * * S2Au
Hiwhere u = 2 , v = 2 , and S^Aj is the product of n-binit sequences as previously
defined* Observe that there are u - 2n elements, or words, in this array. It
can be shown (pp 17, Ref. 19) that this array contains every element of Bn once
and only once if the words S2, % - - Sv are chosen in the following manners ■
For S2 choose any code word not containedin the first row, for S3, any word not
contained in the first two rows, etc. The -various rows, other than the first,
in this array are called cosets and the first word, i.e,, S2, S3 — Sy, in. each
row is called a coset leader.
It can also be shown (p 436, Ref. 8) that if a coset leader is replaced by
any element in the coset, the same coset will result. Thus, the two collections
of words
■H* ■ ^1^39
and
'■W* AZ» %*■ " ^ ^SiAk) An
are bhe. same. (Sote that this does not imply that words in the same position of
each coset are identical but only that the same words are contained somewhere in
-75-
each eoset),
•The weighty ot an element in the above array is defined to be the number
of l*s in the n-binit word located in the i th row and the j th column. With this
definition and in view of the. proceeding paragraph it is possible to rearrange
the array for so that the coset leaders will have the minimum weight in each
coset. Such-an array is defined to be a standard array.
Example 3,3*4-1
A standard array for B, when developed according to the specific4(4,2)-Code 0000, 110©,. 0011, 1111 is as follows
©ooe iioo ©on . . mi0001 1101 001© 111©oio© looo ©in ionono loio. oioi. looi
In the last row any of the elements could have been chosen as coset leaders
since all are of equal weight 2* In the third row either 0100 or 1000 could
have been used, while either 0001 or 0010 could have been used in the
. second row. It should be clear that many such standard arrays could be
obtained by choosing different coset leaders having the same weights.
The detection scheme for a group code used with a BSC is now as follows?
When a word, say Aj, is transmitted, the received word can be any element in Bn»
If the received word lies in column i of the standard array the detector will
indicate that 1^ has been transmitted. For example, the array of Example 3,3.4-1
shows that the received word ©111 will be produced by the detector as ©Oil, 0110
mil be produced as 0000, etc. Since any word in a standard array is at least as
close to the code word at the top of its column as it is to any other transmitted
code word (pp 222-223, Eef, 16) this detection scheme represents maximum likeli
-76-
hood detection, i.e,, the detected symbol is the one most likely to have been
transmitted. It xri.ll be shown later that, for a given group code, this scheme
gives the lowest possible probability of error, i.e., no other method has a
greater average probability that the transmitted word be correctly produced by
the detector.
Observe that this detection scheme requires a knowledge of all 2n possible
received words. This means that detection equipment requirements will grow
exponentially with increasing code length. Since in many practical situations
large code words are required this represents a serious limitation. A later
section of this report irill discuss an alternate mathod for obtaining maximum
likelihood detection that does not have this characteristic.
3.3.5 Frobability of Error for Group Codes
let an arbitrary code word that is to be transmitted over a BSC be denoted
by A and the resulting received word by T. Note that each of these words are
n-plane binary sequences. The digits of T differ from those of A only in the
positions where an error occurred due to noise on the channel. Thus, a new word,
I can be defined as I = AT which will have a 1 in each position in which the digits
of A and T differ, i.e., in each position in which an error occurred. This word,
is also an element of Bn and serves as a record of the noise on the channel during
the transmission. (For example, if A ^ 1010010 and T = 1110110 then M = AT =
0100100 indicating that an error occurred in positions 2 and. 5.) From previous
results it is known that the probability of U being any particular element of B.nis
(1 -
where w is the weight of I, -
Consider now the ease of transmitting with a particular (n,m)-code and
assume that the standard array for this code is known at the receiver. If the
-77-
maximum likelihood detection scheme is used, a transmitted, letter, .A^, will be
produced without error if and only; if the received word is of the form S .A., i.e„,
the received word:must lie in the column of the standard array having A^ as its .
head. Thus there will be no error only if the noise on the channel represented
by N, is one of the coset leaders. In view of this the probability of correct
detection, 1 - P , is just the sum of the probabilities that N is a coset leader* 0 1
-L # 0 # £
I1-'.)”1 W
i=o
where is the weight of Since* for a fixed % the term
P 1 (1- P ) 1o o
is minimum when w^.is minimum (PQ < l/2) and since the coset leaders of a
standard array have minimum weight the probability of correct detection given by
Ed. (3©) is as large as possible. Thus, as previously indicated, maximum/likeli-
.hood detection gives, for a particular code, the greatest average .probability of
correct detection. However, for a specified n and m there are l(n,m) possible
group codes and this result tells nothing about which of these will have minimum
P_» This problem is considered in a later section,V
Example 3.3.5-1
. For the (4,2) code of Example 3,3«4-i the probability of correct
detection is
i.pe - (i-p0)4 * p0(;-?o;J + p0(i-p0)3 * p02u-p0i2
Assuming PQ = 10 gives
-7S~
i 3 ii, 2P6 = 1-C.99) > 0,02 (.99) + 10 (.99)
« .01905
Without coding
. dm , v2P = 1 - (1-P ) = l-(.99)6 O
= .0199
Thus, in this example nothing has been gained by coding. In fact a
loss is involved since the information rate with coding is,only 5Q^ of that
without coding. This illustrates that coding can not be used indiscriminately
to obtain a reduction In. P * ■ .
In general such a situation would be remedied by encoding larger blocks
of message digits.
3.3.6 Generation of Group Codes by Parity Checks
An encoding method has been suggested in Section 3.3*3 in which the 2m(n,m)-
code words are listed in a code book along with the 2m possible m binit sequences.
The sequence of binits from the message source is then divided into blocks of m
binits and the corresponding code word determined from the code book. The re
sulting m binit code word is transmitted over the channel. This procedure suffers from the fact that 2m+1 words must be stored in the encoding device. Thus,
storage requirement will increase exponentially with increasing message block
lengths. A simpler encoding procedure, giving rise to only a linear increase in
equipment requirements, involves the generation of code words by suitable parity
checks over the message digits in a manner similar to the Hamming procedure. Two
concepts are required before this approach ean be discussed: that of a systematic
code and that' of equivalence.
In a systematic code the digits in any word can be divided into two classes:
(X) the information digits (there are m of these in a (n,m)-code), and (2) the Chech digits (n-m^k in number for the (n,m) code). All words in the code have
the same information digit locations and the same cheek digit locations. The m
information locations may be occupied by any of the 2m m-digit binary sequences.
The digits in the check locations are determined by fixed parity checks over
prescribed combinations of the information digits. Thus the Hamming codes are
one example of systematic codes.
Example 3,3*6-1
Consider the (4,2)-eode given by 0000, 1100, 0011, 1111. Assume that
positions 2 and 3 are to be information positions. Appropriate parity
cheeks over these positions mil give the digits in position 1 and A. De
noting a code word by X-j_, X2, X3, the parity check rules can be deter
mined by solving for the unknown constants in the following equations,
X-^ = A^X2 ® Ag^S^j (^0
\ - *3*2® Vi}
Substituting values from the second and third code words above gives the
following simultaneous equations
1 = An * 1 ® Ao * ©1 (a-l)0 - A-l • 0 ® A2 • 1
0 * Ao * 1 <£> A, • § .■ ■ ■ (b-l)
I- A3 • 0©A^» 1
Simultaneous solution of Eqs. (2-1) and (b-l) gives A-^ = A^ - 1,
A2 = Ao = 0. Thus the parity check rules are
-79-
-00-
“IT' _ "VT~\ - x3
If instead the information' positions where chosen to be 1 and 3 the
above procedure would yield for the parity check rules
X2 =
V"
%
*3
Note that for this code positions 1 and 2 or 3 and 4 can not be used for
information since only 2 numbers appear in each position, i*e. either 00
or 11*
Example 3.3.6-2
Consider the (5,3)-eode ©0000,: 10001, 01011, 00111, UplO,: 10110,
01100, 11101 and choose the information positions to be positions 1, 2,
and 3. The general parity check equations to be solved are
X4 = AjX-l © A2X2 © 43X3 (e)^ A4XX © A5X2 © (d)
Using the second, fifth, and seventh code words gives, for the simul
taneous equations,
0 = A^_ • 1 © A2 • 0 © * 0
1 » A^ . 1 © A2 . 1 © A3 , 0 (c—l)
© — A^ * © © A2 » 1 © A3 .1
1 = A^ » 1 © A5 . 0 © A6 . 0
0 = A^ • 1©A5 * 1 © A6 * 0 (d-1)
0 = * 0 © A5 • 1 <0 A6 * 1
■81-.
Simultaneous solution of these gives for the parity cheek equations
h-hsb
X5 - e> %2 ® X3
Two group codes are defined to be equivalent if one can be obtained from
the other by permuting the digit locations* Thus in Example 3.3.3-1;code numbers
1* 3> 4 are equivalent! code numbers 5y 6, 7 are equivalent! and code number 2 is
in a class by itself. In conjunction with this Slepian gives the following im
portant results: (p 210, Ref. 16) . ■
1* Every group code is a systematic code and. vice versa.
2, Every. (n,w)- code is equivalent to a (n,m)- code in which the first m
places are information digits and in which the last n-m - k places are
determined by parity checks over the first m places.
Because of these .results it is now necessary to consider only (n,m)-codes
in which the first m digits are information digits. The general expression for
the k check digits then becomes
m% 53 XT i = m + 1, - - - n (31)
j=l
Here the summation is modulo 2 with the multiplication rules being 0:1 - 1:0
0:0 = 0# 1:1 = 1. The km values for \ • • may be either 0. or 1 and. define the
particular (n,m)-code being used.
Using these results, group codes will now be specified by giving the parity
check rules rather than by listing all Zf code words. The encoding operation
will then be performed by applying these check rules to blocks of m information
digits in the order specified. The k check digits thus obtained will be added
to the m information digits and the resulting n binit sequence transmitted as
the code word.
-82-
Example 3.3.6-3
Consider the (6,3 )*-code. Suitable parity cheek rules are given by
Slepian (Table III, Ref. 16) as
\ - *L © x2
X5 = X .2_ © X3
x6=x2©x3
Since-m ^ 3, there are 2^ ~ $ words in the (6,3Opcode which are as
follows.
Codeword
InformationDigits
Parity cheik Digits
' 1 000 000
2 001 Oil
3 010 101
4 on 110
5 100 no6 101 101
7 no Oil
' 8 m 000
3.3.7 Detection of Group Codes by Parity Cheeks
The detection method presented in section 3.3.4 is analogous to the cpde
book encoding described above, i.e. the standard array lists all possible received
words:and assigns each of these to a transmitted word, As .mentioned previously
the disadvantage of this method lies in, the fact that storage space for 2n words
must be provided at the decoder. Slepian has described a method for obtaining
maximum likelihood detection by means of parity checks over the received code
-83-
words* This approach eliminates the need for storing all possible code words
and results in a simplification of the detection equipment.
Consider the standard array for the group Bn which has been developed about
a specific (n,m)-eode. This code is assumed to have a set of parity check rules
in the form of Eq« (31). For any word in the array, say T, these parity checks
may be applied. The check digit resulting from the i th.parity check may or may
not agree with the digit in the i th position of T. If it does T satisfies the
i th parity check and a 0 is recorded. Otherwise T fails the i.th check and a 1
is recorded. Proceeding in this manner with all k parity checks results^in a k
digit binary sequence which is defined to be R(T), the parity check sequence of
T. (In determining Rtf) the digits are to be written from left to right.as the
parity checks are applied in order, starting with the cheek for position m + 1.)
For example, using ...the parity cheek rules of Example 3 *3.6-3 shows R( 101001) »
100 Since % +'X2 = 1 i + I3 = 0 = X3 and Xp + X3 = 1 = X^. Obviously,-;
Rtf) can be determined for any word in the array. Using this definition of R(T)
Slepian (pp 224-225, Ref, 16) has .proved the -following theorem.
Theorem; Let X, A2, A3, - - - A^, be a (n,m)-code and consider Bn to be
developed in a standard array about this code. Let R(T) be the parity
Qheck sequence for a word T whieh has been foimed in accordance with the
parity check rules of the specified code. Then R(Tj_) = R(T2) if and only
if Tp and T2 lie in the same row of the standard array.
Example 3*3.7-1
Consider the (4,2)-code shown below
0000 1011 0101 111©
0010 . 1001 0111 1100©100 1111 0001 10101000 0011 1101 - 0110
3 © X2. Every word in the second row fails the first parity check
(for the digit in position 3) and satisfies the second check. The parity
check sequence is thus 10, In like manner the parity check sequence for
row 3 is 01 and for row 4 is 11. By definition, all words in the first row
satisfy the parity checks giving a parity check sequence of 00. The follow
ing relations can thus be established between the coset leaders and thq parity
checlfe sequences.
00-^S-l 33 0000
10-^S2 = 001©
01^ s3 = 0100
n-^s4 = 1000
■.Maximum likelihood detection can now be obtained in the following manner.
When a word T is received it is subjected to the k parity checks of the code
being used. This gives a parity check sequence R(T) which places T in a definite
coset and identifies the coset leader, say S^. The product S^T is formed (S^T is
the word that would be at the head of the column containing T in the standard
array) and produced as the detector output. The probability of error for the de
tected word, Pe, is as given by Eq» (30),
Using this detection scheme only (2351 + 2k - 1) words, plus the parity cheek
rules, must be stored by the detector. For large n and m this represents a
considerable reduction from the Zn words required for the original scheme.
The parity check rules for this code can be shown to be X3 53 X]_,
Example 3.3.7-2
Assume that the word 0001 of the (4,2)-code of Example 3.3,7-1 has
been received. The parity check rules are X3 = X^, = X-^ © X, giving a
parity check sequence of 01. From above, this sequence corresponds to
= 0100, , The detected word is thus (0100) (0001) = 0101.
-B5-
3.3»& Determination of Groups Codes Having Minimum Probability of ErrorThe discussion to this point has assumed that the code words, or the parity
check rules, for a given (n,s)-code where known* However, it was previously in
dicated that there are H(n,m) possible (n,m)~codes from which to choose. Since
these codes are used for error correction it is reasonable to require that the
(n,m)~code selected have a minimum Pg when compared to all other codes having the
same n and m» These considerations give rise to the following questions.
1. Mhich of the l(n,m) different subgroups of Bn give a (n,m)-eode, having
a minimum P ?
2, What is the value of the minimum P§?
Unfortunately, the answers to these questions are not known for general
values of n and m« Slepian has, however, determined the answers for several
specific values*
Hie results are presented in Tables II and III of Reference 16 and in Tables
Tr-4 and T-5 of Reference 0, These results will be discussed in this section.
Additional details should be obtained from the references cited,
Bq. (30) gives the probability, 1 - P , of correctly detecting a transmitted
wprd as
leader will be of weight while having a specific configuration. In general
there will be several, say c<^, coset leaders having a weight Wj_. Grouping these
together allows Eq. (32) to be written as
v(32)
i=o
It will be recalled that P0^ (l - P0)n' is the probability that a coset
(33)
-86-
HHEJlSinqe there are 2 = y coset leader the relation
n
i-o
Must hold for any (n^^)*^code# The Maxiy&urii possible nuMber of coset leaders
having a weight w± is the number of ways in which n digits can be divided into
two collections of w^_ l»s and (n-w^) 0*s. Thus
/ / vi a _ ni: ; wp.- 5J7 (n^rjf (34)
In a previous discussion a new word N was defined as N = AT, where A re
presents the transmitted code word and T the received word. It was shown that
N was a record of the errors on the channel during the transmission of A and that
Correct detection was obtained only when N was one of the coset leaders. Thus,
the.1*8 in.a coset leader indicate the position, and the weight of a coset leader
indicates the number of transmission errors that can occur without causing a
detection error. The oC^*s defined above thus give the number of i-fold, errors
that ean be corrected by a given (n,m)-eode.
Tables II and 1-4 of the cited references give values of oC^ for the best
(i.e. they have the minimum possible Pe) (n,m)-codes for values of k = 2, 3, --
“ n - 1, and n » 4,-----10. (These references use Qj_ instead of Pe. Here
— ^ ~ ®i*) binomial coefficients, (wj_), of Eq, (34) representing the
maximum possible number of i—fold errors, are also listed for comparison with the
^i'3*
Example 3.3.3-1
For m - 4 and n - 7 Table II, Ref, 16, shows that all 7 single and none
of the 21 double or 25 triple errors will be corrected. The corresponding
probability of error, as given by this table is
. -§7- •
p0 -1 - (i - r0)7 - ? ?0d - p0>6
Note that this is the same expression as determined for the Hamming SEC
code of Example 3.3.3-1. Since Hamming codes are a subgroup of fche Slepian
group codes and since the above (7,4)-code corrects all single errors this
means that the two codes are equivalent.If instead n m 10 is use£ Table II, Ref. 16# shovrs that all 10 single,
39 of the possible 45 double, 14 of the possible 120 triple, and none of the
possible 210 quadruple errors will be corrected. The resulting Pg is
?e - ~ - (i - P0;1C - 10 ?0 (i - ?0)9 - 39 p/ Cl - ?0)s - 14 ?03 (1 - ?0)7
In addition to knowing the minimum possible Pe for a given n and m it is also
necessary to know the parity check rules that will allow the corresponding best
code words to be generated. These rules are given in Table III and T-5 of Ref.
16 and 8 respectively. The use of these tables is best explained by an example.
Example 3»3.8-2
For the (7>4)-code considered above Table III (Ref. 16) gives the
■ parity check rules as
: 5 1 3 4
6 12 4
7 12 3 '"7
In terms of previous notations this becomes
X5 = X-l ©
x6 = x1s>x2©x4
X? = © Xg ©Xj
Thuss if a particular 4 binit message sequence is 1100 the correspond
ing code word is 1100 Xc X^ X« where
~88~
I5 = 1©0©0=1
x6 = 1 © 1 © 0 = ©
x7 = l®l©0=0
Slepian makes the following observation about the best codes given by Table
III, Ref. 16.
1. The (n,m)-code best for a particular value of PQ is best for all values
Of P0, 0$Po^l/2.
2. Mot all best (n,m)-codes have the greatest possible minimum distance be
tween nearest words.
3. If a (n,m)*-code corrects all errors equal to or less than j and no errors
greater than j + 1, then there exists no 2m word, n digit code of any type
that is better than the (n,m)-code listed. Such codes are defined to be
. optimum codes. Note that all optimum codes are best codes but that best
codes are not necessarily optimum. For example, of the best codes listed
in Table II, Ref. 16, the (11,3)-code is not optimum while the (8,2)-code
is optimum*
3 »4 Elias1s Iterative Coding
At the present time, the iterative encoding and decoding techniques presented
by Elias (20), (21), (22) represent the only practical method for obtaining an
arbitrarly small error rate without using a feedback channel. The procedure is
conceptually quite simple and may be used with either the BSC or the binary
erasure channel (BEG). The following discussion will illustrate the operation
for the BEG. Similar results using the Hamming SEC-DSD code, or any other
systematic code, are obtained for the BSC (20).
Consider a BEG as given in Fig. 3-(c). This channel model differs from the
BSC in that the decision process at the receiver is modified so as to produce an
erasure symbol, x, instead of an erroneous symbol. In practice this would in-
volve the use of two decision levels instead of the single level used with a
BSC. This difference allows error correction to be obtained with the BSC by
using a single parity check encoding procedure equivalent to the Hamming SED
code.
The error correction feature is obtained by dividing the input sequence of
0»s and l*s into blocks of n - 1 binits. In the n th position of each block is
placed a digit resulting from an even parity check over all previous n-1 digits*
This sequence of n binits is transmitted over the BEC. At the receiver there is
a probability* .P0* that a given digit will be received as an erasure. If only one
binit in a single block is received as an erasure the missing digits can be re
inserted by performing an even parity check over the remaining digits* i.e.* if
an even number of l*s remain the erased symbol was a 0 while if an odd number
remains the erased symbol was a 1. For example assume the following blocks
(n = 8) were received,
01110X10
1 0 1 0 0 1 X X
1100X100
0 0 0 1 1 1 0 X
In the first block the erased symbol must have, been a 0 since an even number of
1*S remain. Likewise the erased symbol must have been a 1 in the third and
fourth blocks* No correction is possible in the second block since the erased
symbols could have been either 0 1 or 1 0*
It is clear that this procedure reduces the average number of erasures re
maining in a block. The amount of this reduction is determined as follows; Be
fore correction the probability of exactly z erasures is
P (z erasures) = (§) PQZ (l - PQ)n
The average, or expected* value* X* of a discrete random.variable* X* is
-90-
given by
T~ 2~ % p(%)i
Thus the average number of erasures before correction is
2 - u<(£) PQZ (X - ?Qf~Z (35)Z~i
= n PQ (Series Ho* 194* Ref. 23)
The average number of erasures after correction, zj is given by n
*■ = H z © p0z a - p0f ■
2=2
a*a* ' / _ _ v n** X*= z - n P (l - P ) o o'
■nPc [l-(1-P0)n-1] (36)
The average number of erasures is thus reduced by a factor of |~1 - (l — P03
when the first order correction procedure is used.
Example 3*4-1
-2 -■If n = 7 and PQ = 10 the resulting values are.
a - ?ao~2 = 0,0?
z» = 0.07 £ 1 - (.99)6 ]
- 0. 00409
Thus? compared to the situation with no coding, the average number of
erasures has been reduced by a factor of 15 while reducing the information
rate by only 12.5$* This compares quite favorably with the Hamming SEG
code of Example 3.2.2-1,
Elias’s iteration technique suggests that the average number of erasures can
be further reduced by periodically transmitting blocks of n binits that are
second order parity checks over the digits in the preceeding n^ - 1 blocks* In
this manner correction may be made for most of the double erasures. This pro
cedure is best explained by means of the following example.
Example 3*4-2
Assume that blocks of 8 digits are to be transmitted and that every
8th block is to contain the second order parity check digits. Let the in
put to the encoder be the following 7 sequence of 7 binits each.
0111101
11110001100101
11011100011111
0110010
ooom©
The first order cheek digits are obtained by an even parity cheek
over the digits in each row and are placed at the end of the corresponding
row. The second order check digits are obtained by an even parity cheek
over the digits in each column and are placed at the bottom of the corres
ponding column. Applied to the above sequences this results in the
following array.
-92-
oiinoi i~^\1111000 o
1100101 o
uoiiio i y0011111 1 0110010 1 oooino : jl J,1101101 1.. ' ^- - v- - - - 1
> 2nd order check digits
1st order check digits
The.code words to be transmitted corresponds to the rows in this array.
At the receiver the words, containing the erasures, are, placed in a similar
array. All single erasures are then corrected by parity checks over the
rows in this array. Additional erasures are corrected by checks over the :
columns in the array. Elias (22) has shown that the average number of
erasures, s’*remaining after this second order correction is
where = PQ [l - (1 - P0f~X ]
and
nj - 1 = the number of digits checked by the second order
parity check.
For the values of Example 3.4-1 this gives
F1 - 0.00409 x 1/7 * 0,000584
na-**- 0,00409 [l - (.999426) J
- 0.00409 x 0, 0.000019/V
-93-
In this case the average information rate at the 'channel input is
49/64 bits/binit. This represents a descrease of 12.5$ from the rate for
single order correction and a 23.5$ reduction from a he rate with no correct- :
tion. Corresponding to these reductions the average number of remaining -
erasures has been reduced by a factor 200 times from the single correction
value and by a factor of 3500. from the no correction value.
ISlias has shown.(20) that this iteration procedure can be continued by means
of 3rd, 4th, — - - order parity checks and that in the limit the average number
of erasures will approach zero while the information rate.remains at a usable
non-zero value.
Thus, using this method, it is possible to make the erasure probability as :
small as desirable if the receiver is willing to wait until a sufficiently high
order parity eheck has been received. A unique feature of this technique is the
fact that the erasure probability can be controlled at the receiver without
changing the transmitted code words.
3.5 Use of Group Codes in Feedback Gommunication Systems
Previous discussions have indicated that when a feedback channel (i.e. a
communication link from the receiver to the transmitter) is present it is possible
to use error detecting codes and to request retransmission of erroneous words via
this channel. When possible, this approach has the advantage of requiring less
eoding equipment while simultaneously giving a high infoi’mation rate and a lower
error rate. It should be emphasized, however, that this method does, not exceed
the information rate given by the second fundamental theorem but only provides a
practical means of more closely approaching this rate while maintaining a low
error rate. ■ : ■;
Many investigators (3) (24) (25) (26) (27) (28) (29) (30) have analyzed the .
characteristics of systems using a feedback channel. However^ to quote Peterson
(Ref. 19) ‘
•94-
"The efficient use of feedback in error control has not received the attention in deserves in coding theory. Certainly feedback can greatly simplify error correction, let there is a definite limit to the efficiency of a simple error- detection and retransmission system, for short error-detection codes cannot efficiently detect errors, while if extremely long codes are used retransmission must be performed too frequently. Little is known about the use of the feedback channel in any more sophisticated way."
Thus, the method presented in-this section is not to be considered as the
ultimate answer in coding for the feedback channel* Instead it represents one
approach that illustrates the use of group codes for error detection.
Cowell (30) has investigated the use of group codes in a feedback system in
which the group property is used to correct some errors in the conventional
manner (as described in Sec, 3.3»4) and to detect additional errors. When an
error is detected a request is sent, via the feedback channel, for a retrnas-
mission of the erroneous code word. The procedure for accomplishing this is as
follows: First, a (n,m)-code is assumed and the array (not necessarily in stand
ard form) for the group B^ is developed about this code. The resulting 2n“m = v
coset leaders are then divided into two sets one of which contains the identity
elan cut, I. Let S be the set containing I, Also, let I, Aj_, A2 - - - A^,(u = 2m) represent the code words of the (n,m)-code selected. The decoding
operation is then performed by expressing the received word, T, as the product
of a transmitted word, A, and a noise word, N, i.e., T « AN, (This noise word,
is. the same as previously discussed and is a record of the errors during the
transmission of A.) If the word N is contained in the set S the received word
is decoded as A; otherwise the transmitter is requested, via the feedback channel,
to retransmit the code word. Thus, this decoder corrects all error patterns that
give noise words contained in S and requests retransmission when the noise word
is not contained in S. If S contains all v coset leaders and these are of
minimum weight this corresponds to the maximum likelihood detection previously
discussed. Conversely, if S contains only the identity element retransmission
occurs whenever the received word is not a code word.
-95'
1 - P , of a word being correctly decoded (this includes the case of correct
decoding after numerous retransmission!)is given by
■ Using this decoding, scheme Cowell (30) has shown that the probability,
1 - P* D.Q
(37)
where
D = L P
N
w(N)Cl * V
n - w(l)
w(lA) n-t)N A
n-w(M)
Here the summations are over all noise words, N, contained in the set S and
over all code words, A, contained in the (n,m)~eode» fhe following example illus
trates this
Example 3.5-1
Consider the following (5,2)-code
OOQOO, OHIO, 10101, 11011,
A suitable standard array for this code is as follows
ooooo oiiiG 10101 1101100001 01111 10100 1101000010 01100 10111 1100100100 01010 10001 1111101000 00110 11101 1001110000 11110 00101 0101100011 01101 10110 1100010010 • 11100 00111 01001When S contains all coset leaders of weight 0 or 1 any received word
-96-
lying in.the'first.6 rows of the array will be decoded as one of the code
words* Similarly a received word lying in either of the last two rows
will cause a request for retransmission. For this situation the summation
for B is over the first 6 coset leaders in the array, Thus
B =* (1 * vj* + 5 P0 (1 - PQ)4
>. (1 + 4 P0) (1 - P0)4
Likewise the double summation for 1 - © is over all words in the first
6 rows of the array*. This gives
" . ♦* po3 Cl - V* + 5 p04 (x - p0) * f05
•it (f) p/ Cl - Pol5"1 - 4[po2(1 - Pol3 ♦ p63 (X - P0)2 ]i=0
= 1 w 4 P02(1 - PD)2
where the last step follows from the binomial expansion* The final
expression is thus
3 -P - (1 * 4P0) (1 - Pq)49 1 - 4 P 2 (1 - P )2
o 4 o'(38)
If, instead, S contains only I the expressionsbecome
B - (1-P015
1 - e - (l - P015 + 2 P03 (1 - T0f * P04 (1 - P0)
(39)
-97-
Likewise, if 8 contains all coset leaders the expressions are
D = (l - P0)5 + 5P9(1- P0)4 + 2 P/ (1 - PQ)3
1-0 = 1
1 - Pe = B (40)
which agrees with the value given by Slepian (Table II, ref. 16) for maximum
likelihood detection.
It is instructive to compare, numerically, the cases for 3 containing
only I and for S containing all coset leaders. Assume PQ = IQ"'. Then from
Eq, (39) the probability of error using retransmission only (i.e», 3 - I) is
.. P =?. T 1 .......... ....... . „.. .e -6 , n-3 , .-A1 + 2.1£f° (.99) + 10 (*99)
» 2,165 x 10"7
Conversely, when no retransmission is used (i.e., 3 contains all eoset
leaders) Eq, (40) gives
P = 1 - (.99/ + 5‘10”2 (.99)4 + 2*icf4 (.99)3e
. . = 7.36 x 10"4
Thus when the code is used only for error detection (with error correc
tion being obtained.by retransmitting the erroneous word) the probability of
error is reduced by a factor ef approximately 3400 times,
Cowell (p I69, Ref, 30) has show that this result is true in general. Thus
when a group code may be used for either error correction, error detection (error
correction via a feedback channel is assumed), or for both, the minimum PQ will
be obtained when the code is used only for error detection. This is intuitively
satisfying since for this case the probability of retransmitting code words is
maximum thus introducing a maximum amount of redundancy into the transmitted
sequences.
Due to the pronounced improvement in Pg obtained with error-detection-only
operation a question arises as to the amount by which this type of operation
reduces the information rate. A convenient means of specifying this reduction
is to define the coding efficiency, Ne, as
N = m = number of message digits per code word c average number of digits transmitted "until
a word is decoded
This ratio, when expressed as a decimal gives the information rate at the
channel input. Thus, when no retransmission is used (i.e,, the code is used
only for error correction) the input information rate is
mn (41)
When the code is used for both correction and detection (or detection only)
Gowell has shown that the coding efficiency is given by
m (l ~ ©)n + L ©
where n, m, and © are as previously defined and 1 represents the number of
digits that are lost whenever a retransmission occurs (i.e., digits required to
re-establish synchronization, digits lost because of an in interleaved trans
mission pattern, etc,). -
In determining L the digits of a retransmitted code word are:not included.
Thus the value of L depends directly upon the communication system and only in
directly, if at all, upon the (n,m)~eode.
-99-
Bxample 3*5-2
Assume that the (5,2) code of Example 3*5-1 is used for error detection
—2only, that L = n, and that P0 = 10 . For this situation the coding efficiency
is
Ncm. 1n * T
1-01 + 0
, s5 -6, .2 -*8,= 2 # (*99/ + 2*10 (,99) + 10 (*99)5 2-(.99f - 2«10“6(.99)2 -10“^(»99)
- 0.362
Thus the use of the (5,2)-code for error detection only (as compared
to its use for error correction only) causes a reduction in the information
rate of approximately 10/ while giving a reduction in error rate of 3400
times. This result illustrates the fact that in general the use of a feed
back: system will allow a.greatly reduced error rate for a given information
pqte and code word length* However, very little work has been done in
determining optimum codes for this operation and little is known about the
maximum possible improvement that can be obtained* At present this area
appears to offer the greatest potential for determining practical techniques
that will allow the rates of the second fundamental theorem to be approached
.and as such is an area worthy of much additional research,
3*6 Additional Techniques for Noisy Channel
The coding techniques presented in the preceeding four sections were chosen
primarily for.two reasons; (l) they represent some of the most basic and.better
known of present techniques! and (2) they are relatively easy to explain. This
section will present some of the more advanced techniques. These will not be
-100-
discussed in detail, however, since they require a knowledge of modern algebra
with a strong emphasis on matrix theory.
3.6.1 Bose-Ghaudhuri Codes
Bose-Chaudhuri codes (31) (32) (Chapt. 9, Ref, 19) represent a generaliza
tion of Hamming codes in that a specific procedure is given for constructing a
set of code words when the amount of error correctability is specified. Peter
sen (p 165, Ref. 19) gives the following theorem regarding these codes:
"For any m and. t (mt<n) there is a Bose-Ghaudhuri code of length 2m-l which corrects all combinations of t or fewer errors and has no more than mt parity' check digits,"
Thus, in contrast to the Hamming (which could be constructed only for a
capability up to SJCG-DED) and the Slepian (which must be constructed by some
type of a search through N(n,m)possibilitie^ codes, the Bose-Ghaudhuri Codes
can be constructed for any n, m, and t provided the relations of the above theorem
are satisfied. However, there is at present no general information concerning Pe
for these codes.
The Bose-Ghaudhuri codes are related to the Slepian codes in that they are
a subgroup of cyclic codes-which are in turn a subgroup of the general class of
group codes. The decoding procedure, however, differs considerably from that for
the Slepian codes (33).
3.6.2 Reed-Muller Codes
As indicated previously, the Reed-Muller codes (17) are a subclass of the
group codes considered by Slepian. They differ from the :Slepian codes in that a *
* ' A cyclic code is a special.group code in which a cyclic shift of any code word is another code word. For example if 10110000 is a word of a cyclic code then 01011000, 0010110, etc. must also be code words.
specific.procedure, is available for-determining a set of code words when.the
Where t is the maximum distance between code words and r is the ''order of the
code. For example, if t *». 4 and r == 2 the Reed-Muller code would have n = 16,
m = 11 and, from section 3.2.4? Would be a SEC-DED code. The generation of the
code words for a Eeed-Muller code involves the use of vector algebra and therefore wil^ not be discussed. .The primary advantage of Reed-rMuller codes lies in
the redlitive ease with which decoding equipment can be constructed. Some wprk
has been done at the M.I.T. Lincoln Laboratory (34) in the construction of an
encoder and decoder for a Reed-Muller code with n = 12S, m = 64.
3.6.3 Fire Codesn I, mm >>»»■'■'■. in. ii .']■■!' .■-■■! i,i
The Fire Codes (Sections 10.1 and 10.2, Ref. 19) are designed to detect and/
or correct errors that occur in a single burst.within a code word (i.e., the
errors do not occur independently but instead occur in several consecutive digits)
Other codes such as the Reed-Solomon codes (Sections 9.3 and 10.7, Ref. 19) can
correct more than one burst of errors.
The.conditions under which a Fire Code can be constructed are as follows:
where b- length (in binits) of burst to be corrected
d - length (in binits) of burst to be detected
t 51 an integer b
n - m ^ k = t + b + d ~ 1
When used for detection alone such a code can detect a single burst of
following relations are satisfied
n - m - 1 + (|) + 1 - (h„r^l)
n least common multiple (LCM) of (2^ - 1) and (b + d - 1)
-102-
length no greater than k binits. When used for both detection and correction it
will correct any single burst of length b or less and detect any single burst of
length d or less.
Example 3.6,3-1
For m = 5, b = 5> d - 7 the length <?f the Fire Code is given by
n - LCM of 31 and 11 - 341
Thus
k = 5 + 5 + 7- l-l6
andm = 325 '
This code will correct a burst of 5 errors and detect a burst of 7
errors. Observe the high ratio of m/n for this code. This is a character
istic of codes for burst error detection and correction and is not possible
with codes for independent errors.
Details concerning the construction of Fire Codes should be obtained from
Ref. 19s Section 10.1.
3.6.4 Wogencraft*s. Sequential Coding
All of the codes previously discussed have been block codes. All block
codes have the fault that as n is increased (in an attempt to obtain a greater
m/n ratio and a lower P@) the delay between the time a symbol is produced at
the source and the time it is decoded.at the receiver also increases. Thus in
many situations a maximum allowable delay places an upper bound upon the length
of any block code that might he used. This in turn limits the information rate
and Pe that may be obtained. : A practical method for circumventing this problem
could.offer a considerable potential for more closely approaching the rates of
the second fundamental, theorem.
The sequential encoding and decoding technique discovered by Wozencraft (4)
-103-
(35) represents such a method,,
Since the crux of this method lies in the decoding operation only this will
be considered.
In essence, sequential decoding is accomplished by decoding one received in
fo mation digit at a time. The procedure is as followsi The actual received
sequence is compared with all possible transmitted sequences starting with a 0
and also with all possible transmitted sequences starting with a 1. Unless a
large., number'of errors have occurred the actual received sequence will differ
from all but one sequence in one of these sets by such a large amount that it
can be concluded that the sequence for which the difference is a minimum represents
the transmitted sequence. In this manner the first information digit is deter
mined. It is then recorded and deleted from the sequence. The comparison pro
cedure is then repeated to determine the next information digit, etc*
With this procedure the delay between a generated symbol and a decoded
symbol is greatly reduced for a given P , A second advantage of this method
lies in the fact that decoding equipment requirements grow approximately as
the square of the effective code length while many block decoding schemes in
volve equipment requirements that grow exponentially with increasing code length.
At the present time, sequential coding represents what is probably the most
sophisticated of all techniques and as such is one of the most difficult to under
stand. For the serious worker in this area Ref. 4 gies a thorough discussion-.of
the details involved,
3.7 Relationship Between the Coding Techniques Discussed in this Report.
It is often difficult for a newcomers to the field of coding theory to.
establish just exactly where the numerous coding techniques fit into the overall
picture. The block diagram of Fig. 7 has been prepared to provide such a picture.
Starting at the top, the general area of the study of coding techniques is in
dicated, This area can be divided into essentially two groups? (l) those systems
-104-
that use binary symbols, and (2) all non-binary (or n-ary) systems. Because of
their widespread use, this report has considered only binary systems. Proceeding
with binary systems, there are within this group two further divisions, namely,
coding for the noiseless channel and coding for the noisy channel. From this
further breakdown are indicated between equal and non-equal cost symbols, etc.
Finally, the various coding techniques are indicated under the appropriate blocks.
, For purposes of comparison, Fig. 8 lists some of the advantage and disadvant
ages of the various codes.
3*8 Conclusion
The coding techniques presented in this report represent some useful and
practical methods of coding for both the noisy and noiseless channel. The noise
less procedures of Huffman, Gilbert-Moore and Karp represent optimum (i.e. they
give maximum efficiency) procedure? for the noiseless channel and as such may be
used essentially without qualification. However, the noisy procedures that have
been presented do not have this desirable characteristic. Instead, these pro
cedures represent some of the less mathematical, and thus more readily explained,
better known procedures. In many eases these procedures are well known simply
because they represent the first work in a particular area and not because they
are the best possible techniques. Thus any practical application of noisy coding
should be proceeded by further investigation into some of the later and more ad
vanced techniques. Elias (pp 342-343* Ref. 36) gives an excellent discussion of
some additional factors and methods that should be considered*
Coding Coding■ for ■ - for
Non-Binary BinarySystems . Systems
Not discussedCoding Coding
for the for theNoiseless NoisyChannel Channel
' : ' •
jhannon-Fano
Gilbert-Moore
Huffman
Blackman
Karp
Marcus
Shannon1 s Binary
Coding "with Equal - Cost
Symbols
Coding vriLth Non-Equal-Cost
Symbols
Coding for th< BSC or BEC without memory
Coding for Channel x\iith memory
Reed-Soloman
Fire
Group Codes
(Slepian)
Hamming
Bos e-Chaudhuri
Fijg» 7 - A Block Diagram Presentation Reed-Muller
of the Relationship Between Various Coding Techniques
... CodingTechniques .
. Advantages Disadvantages
Hamming Codes T. Are conceptually the most simple codes,
2, Are reasonably easy to instrument.
1. No procedure for constructing codes having a minimum distance greater
■ than 4. V:;
2. Information rate is small due to the restriction on Tvord length imposed by the requirements that the probability of higher order errors be negligiblysmall. I--.;. ■
Slepian GroupCodes
1.' In some cases are best possible . codes.£
2« The encoding scheme is relatively easy to instrument.
3. The decoding scheme is the bestpossible theoretically and is . irelatively easy to instrument.
1. The procedure for determining parity check rules involves a, search through a large number of possible codes.Because of this, codes for n greater than 12 have not been determined.
2. The procedure for determining coset leaders used in decoding is involvedfor large n. -
Elias * s Iterative Coding .
1. Allows error rate to be made arbitrarily small while giving a useful information rate.
2. For ,moderate Pg requirements the '■ decoding is relatively simple.
3. When used With a BEC the encoding is extremely simple.
1. Both encoder and decoder have large storage requirements when PQ must be small.
2. Transmission at channel capacity is not possible while simultaneously obtaining arbitrarily small P .
Bos e-ChauahuriCodes
1. Provide an explicit procedure for . , constructing codes having a. speci
fied minimum distance between code words .
■ 1. Procedure is'applicable only forW code word lengths of 2P _. i
P = <-} 3j “ ■ • — •
Fig. 8 — Some Advantages and Disadvantages of Various Noisy Coding Techniques
Coding..Techniques
Advantages Disadvantages .
Reed-Muller Codes 1. Decoding procedure is relatively simple to instrument.
2. Frovides an explicit procedure fbr constructing codes having a specified minimum disatance between code words.
1. Conceptually quite complex.
2,. Procedure applies only for code lengths of 2P, p = 2, 3* 4> —
Fire Codes 1. Can correct errors occuring in bursts-with fewer check digits than with codes designed for
’ independent errors.
2, ' Are .'relatively,easy to instrument.
3 , Can be used for sliultaneous detection and correction.
1. Can correct only a- single burst of errors within a given code word.
2. Require a knowledge of modem algebra to -understand.
Sequential Coding 1, Offers nossibiD^ty of obtaining small P w'thout the excessive delay tine of block codes.
2, Decoding equipment.grows slowly ■: witn effective block length as compared to block decoding.
1. Operation extremely difficult to analyze. '
2. Ho information available on Pg. y;
Feedback 37sterns 1. Reduces equipment .complexity for a given information and error rate, v
2.. Allows the use of error—detec- ■ tion-only codes Tilth are easier10 inst i-ument.
1. Requires a feedback channel.
Fig. $ - — (cant.)
BIBLIOGRAPHY
1. G« E. Shannon, r,A Mathematical Theory of Communication," Bell System Technical Joumel, Vol. 27, 1948, pp. 379-423 and pp. 623-$>6.'
\2. 0, E* Shannon, "Communication in The Presence of Noise," Proc. I.R.E.
Vol, 37, 1949, pp, 10-21. '
3. B« Reiffen, ¥» G. Schmidt and H. L. Yadkin, "The Design of an Error-Free Transmission System for Telephone Circuits," Communication and Electronics, July, 1961, pp, 224-230.
4* J« M» Wozencraft a-nd B. Reiffen, "Sequential Decoding," Technology Press and Wiley, 1961,
5* J« C, Hancock, "An Introduction to the Principles of Communication Theory," McGraw-Hill, 1961, pp, 78-89#
6. Laning and Battin, "Random Processes in Automatic Control," McGraw-Hill,1956, Chapter 2,
7# Y. W, Lee, "Statistical Theory of Communication," Wiley, i960, Chapter 3,
8. F. M, Reza, "An Introduction to Information Theory," McGraw-Hill, 1961. .
9* R. Mo Fano, "Tire Transmission of Information," Technical Report 65, M.I.T, Cambridge, Mass., March 17, 1949.
IQ. D. A. Huffman, "A Method for the Construction of Minimum Redundance Codes," Proc, IRE. Sept. 1952, pp, 1098-1101.
11, R, M. Fano, "Transmission of Information," M.I.T. Press and Wiley, 1961,
12, N. M. Blackman, "Mnimum-Gost Encoding of Information," IRE Trans. on Information Theory, Vol. IT-3, pp, 139-149#
13, R» S» Marcus, "Discrete Noiseless Coding," M.S. thesis in electrical engineering, M.I.T., Cambridge, 1957.
14, R* M. Karp, "Minimum-Redundance Goding for the Discrete Noiseless Channel," IRE Trans, on Information Theory, Vol. IT-7, pp, 27-38.
15, R. W. Hamming, Error Detecting and Error Correcting Codes," Bell System Technical Journal, Vol, 29* 1950* pp# 147-160.
16, D. Slepian, "A Class of Binary Signaling Alphabets," Bell System Technical Journal, Vol. 35* 1956, pp. 2Q3-234.
17* I® S. Reed, "A Glass of Multiple-Error-Correcting Codes and the DecodingScheme," IRE Transactions on Information Theory, PGIT - 4* 1954* PP® 38-49.
18. P. Elias, "Coding for Noisy Channels," IRE Convention Record, Part 4, 1955* pp. 37-46.
•109-
19. W* W, Peterson, "Error-Correcting Codes," M.I.T. Press and Wiley, 1961.
20. P.Eliasj "Error-Free Coding," IKE Transaction on' Information Theory. PGIT-4 September, 1954, pp. 29-37.
21* B. J* Baghdady, "Lectures on Communication System Theory,"' McGraw-Hill, 1961, Chapter 13 * ,
i22. Grabbe, Rarno, and Wooldridge, "Handbook of Automation, Computation, and Con
trol," John Wiley & Sons, 1958, Vol* 1, pp. 16-35 to 16-38,
23* L* B. W. Jolley, "Summation of Series,"'Dover Publications, Inc*, 196l*
24. S* S. L.Ghang, "The Theory of Information Feedback Systems," IRE Transactions on Information Theory. Vol* IT - 2, Sept. 1956, pp. 29-40.
25. B* Harris, A* Hauptschein, and L. S. Schwartz,- "Optimum Decision Feedback Circuits," IRE Convention Record. Part 2, 1957, PP» 3-10.
26. B..Harris, A. Hauptschein, K« C* Morgan, and L. S* Schwartz, "Binary Decision Feedback Systems for Maintaining Reliability Under Conditions of Varying Field Strength," Proceeding of the National Electronics Conference. Vol. 13,1957, pp. 126-140:-------- ■------------------------------------
27*. W* B. Bishop and B. L, Buchanan, "Message Redundance versus Feedback forReducing Message Uncertainty," IRE Convention Record. Part 2, 1957, PP» 33-39*
2S* B. Harris and K* C. Morgan, "Binary Symmetric Decision Feedback Systems," Communication and Electronics, No. 38, 1958, pp. 436-443.
29* J. J» Metzner, "Binary Relay Communication with Decision Feedback,11 IRE Convention Record, Part 4, 1959, PP« 112-119.
30* W» E* Cowell, "Thus Use of Group Codes in Error Detection and■Message Retransmission," IRE Transactions on Information Theory, Vol. IT - 7, July,
31. R, C.Bose and D. K. Ray-Chaudhuri, "On a Glass'of Error Correcting Binary Group Codes." Information and Control. Vol. 3, No. 1, March I960,, pp. 68-79.
32. R, C«. Bose and D. K* Ray-Chaudhuri, "Further Results on Error Correcting Binary Group Codes," Ibid,. Vol. 3, No. 3, Sept. I960, pp. 279-290,
33. W* W, Peterson, "Encoding and Error-Correction Procedures for the Bose- Ghaudhuri Codes," IRE Transactions on Information Theory* Vol, IT-6, Sept.,I960, pp. 459-470,
34* K, E. Perry, "An Error-Correcting Encoder and Decoder for Phone Line Data,"IRE Wescon Convention Record. Part 4, Aug,, 1959, PP. 21-26,
35* J* M. Wozencraft, "Sequential Decoding for Reliable Communication,".IKE National Convention Record," Part 2, Mar,, 1957, pp. 11-23.
36. E. J. Baghdady, "Lectures on Communication System Theory," McGraw-Hill, 1961.