XII. PROCESSING AND TRANSMISSION OF INFORMATION*
Prof. E. Arthurs        Prof. W. A. Youngblood    M. Horstein
Prof. P. Elias          D. C. Coll                T. S. Huang
Prof. R. M. Fano        J. E. Cunningham          F. Jelinek
Prof. J. Granlund       J. B. Dennis              T. Kailath
Prof. D. A. Huffman     H. A. Ernst               R. A. Kirsch
Prof. R. C. Jeffrey     E. F. Ferretti            G. E. Mongold, Jr.
Prof. H. Rodgers, Jr.   R. G. Gallager            B. Reiffen
Prof. C. E. Shannon     T. L. Grettenberg         L. G. Roberts
Prof. W. M. Siebert     F. C. Hennie III          W. L. Wells
Prof. J. M. Wozencraft  E. M. Hofstetter          H. L. Yudkin
A. REPRODUCTION OF PICTURES BY COMPUTER DISPLAY
Digital, sampled picture data have been used in digital computers for studying
pictures. The computer operations consist of various recoding processes that are
used to determine upper bounds on the amount of information necessary for specifying
pictures. The recoding and subsequent decoding must be done in such a manner that
the picture quality is not degraded excessively. To determine whether or not a par-
ticular process is successful in this respect, a picture must be resynthesized from the
decoded data.
Hitherto, this resynthesis was done by reading paper tape (upon which a decoded
version of the recoded data had been punched) into a modified facsimile reproducer.
This process took 4.5 hours; during this time the modulated light source was subject
to drift, and photographic distortion was introduced in a subsequent copying process.
A program has been written for the TX-0 computer which will read in paper tapes
of this kind and display the sample values on an addressable-coordinate cathode-ray
oscilloscope. Each point of the oscilloscope display corresponds to a sample point of
the picture, and each point is intensified a number of times equal to the value of the
reflectance at that sample point. This display is photographed with a Polaroid camera
and yields photographs such as those shown in Fig. XII-1. The pictures labeled (a) and
(b) are reproductions of the original data tape; these pictures are quantized to 64 levels;
a 6-digit binary number is required for the specification of each sample point. The
pictures labeled (c) and (d) are reproduced to represent a 16-level quantization; a
4-digit binary number is required for each sample point. The successful reproduction
of the pictures is due, in part, to the excellent cooperation of the TX-0 computer staff
in overcoming display problems.
*This research was supported in part by Purchase Order DDL-B222 with Lincoln
Laboratory, which is supported by the Department of the Army, the Department of the
Navy, and the Department of the Air Force under Contract AF19(604)-5200 with M.I.T.

Fig. XII-1. Pictures reproduced by computer display.

The time required for reproducing one of these pictures is 10 minutes. Provision
is made in the program for masking out any or all of the 6 binary numbers used to
specify the 64 intensity levels of the data, and to add noise or bias as desired.
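The bit-masking provision can be sketched in modern terms. This is an illustrative reconstruction, not the TX-0 program; the sample values and the NumPy-based helper below are assumptions:

```python
import numpy as np

def mask_bits(samples, keep_bits):
    # Keep only the `keep_bits` most significant of the 6 binary digits
    # that specify each 64-level sample, zeroing the remainder.
    drop = 6 - keep_bits
    return (np.asarray(samples) >> drop) << drop

# Requantizing 64-level samples to 16 levels (4 binary digits), as in
# pictures (c) and (d) of Fig. XII-1.
samples = np.array([0, 21, 42, 63])
print(mask_bits(samples, 4))  # [ 0 20 40 60]
```

The other provision mentioned above, adding noise or bias, would amount to adding a random or constant array to the samples and clipping the result to the 0-63 range.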
By means of a program of this sort we can obtain a picture resynthesized from data
that have undergone any desired encoding and decoding operations before display. The
program thus provides a high-speed reproduction of the picture which we can study to
determine the effects of the encoding processes.

J. E. Cunningham
B. THE CAPACITY OF A MULTIPLICATIVE CHANNEL
Consider the memoryless channel whose input is the set of real numbers -∞ < x < ∞,
whose output is the set of real numbers -∞ < y < ∞, and whose transition probability
p(y|x) is a function of the ratio y/x. This defines a multiplicative channel
y = nx (1)
where the probability distribution of the multiplicative noise n is given. We consider
here only symmetric probability densities of n, that is,
p_n(n) = p_n(-n)
We ask, "Does this channel have a capacity when the second moment of x is
constrained?"
We show in this report that the capacity of this channel is indeed infinite, and that
this capacity may be obtained with a suitably shifted version of any nontrivial probability
density function p_x(x).
Note that because p_n(n) is symmetric, p_y(y) will also be symmetric, irrespective
of p_x(x). Thus, without loss of generality, we can confine our attention to the three
variables x', y', n', which may not assume negative values:

         { p_x(x') + p_x(-x')   x' > 0
Pr(x') = { p_x(0)               x' = 0          (2)
         { 0                    x' < 0

         { p_y(y') + p_y(-y')   y' > 0
Pr(y') = { p_y(0)               y' = 0          (3)
         { 0                    y' < 0

         { p_n(n') + p_n(-n')   n' > 0
Pr(n') = { p_n(0)               n' = 0          (4)
         { 0                    n' < 0
Thus
y' = n'x' (5)
Define
w = log y', u = log n', v = log x' (6)
Therefore
w=u+v (7)
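The change of variables in Eqs. 5-7 can be verified numerically; the lognormal densities below are illustrative assumptions only, since the argument holds for any positive n' and x':

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples of the positive variables n' and x'; any positive densities would do.
n = rng.lognormal(size=1000)
x = rng.lognormal(size=1000)

y = n * x                                  # the multiplicative channel (Eq. 5)
w, u, v = np.log(y), np.log(n), np.log(x)  # the transformation of Eq. 6

# Eq. 7: in the log domain the channel is additive with noise u.
assert np.allclose(w, u + v)
```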
Thus, by a change of variables, we have constructed a channel with additive noise u.
Since Pr(n') is known, the density function p_u(u) can be obtained by standard methods.
The second-moment constraint on x and x',

∫_{-∞}^{∞} x² p_x(x) dx = ∫_0^∞ (x')² Pr(x') dx' = σ²          (8)

in terms of the transformed variable v = log x', becomes

∫_{-∞}^{∞} e^{2v} p_v(v) dv = σ²          (9)
We desire a choice of p_x(x) or Pr(x') that is consistent with the constraint of Eq. 8
and that maximizes the mutual information between input and output, I(X;Y), or equivalently,
I(X';Y'). However, v and w are related to x' and y' in one-to-one correspondence.
Thus
I(X';Y') = I(V;W) = H(W) - H(W|V)          (10)
where the H's are the respective entropies.
Since the noise u is additive in its channel, H(W|V) is the noise entropy and is a
constant, being a function only of the density function p_u(u).
Now, since w = u + v, and u and v are independent random variables, H(W) > H(V).
Therefore,
I(V;W) > H(V) - H(W|V)          (11)
Consider any probability density p_v(v) that makes the right-hand side of Eq. 11
equal to an arbitrarily large rate. In general, this will not satisfy the constraint of
Eq. 9.
Define the new density
p'_v(v) = p_v(v+λ)          (12)
Applying the constraint of Eq. 9 to p'_v, we have

∫_{-∞}^{∞} e^{2v} p_v(v+λ) dv          (13)

Making the substitution q = v + λ, we obtain

∫_{-∞}^{∞} e^{2(q-λ)} p_v(q) dq = e^{-2λ} ∫_{-∞}^{∞} e^{2v} p_v(v) dv          (14)

Thus, by suitable choice of the parameter λ, we can be assured that the density p'_v
satisfies the constraint.
However, the entropy of the density p'_v is clearly identical to the entropy of the density p_v.
Thus we have succeeded in transmitting an arbitrarily high rate with arbitrarily low
second moment on the input variable x.
Since x' = e^v, we can find the density of x' associated with the densities p_v and p'_v:

P'r(x') = (1/x') p'_v(log x'),     Pr(x') = (1/x') p_v(log x')          (15)

Since p'_v(v) = p_v(v+λ), we have

P'r(x') = (1/x') p_v(log [e^λ x'])          (16)

Thus the P'r(a) and Pr(b) densities are related by the transformation a = e^{-λ} b. If we
note this relationship between a and b, it is clear that the effect of a linear shift λ > 0
in the v-domain is equivalent in the x'-domain to a compression toward zero and renor-
malization.
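A numerical sketch of the shifting argument, with v drawn from an assumed Gaussian density (any density would do); the factor e^{-2λ} of Eq. 14 appears regardless of the density chosen:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1.5                         # the shift parameter lambda

v = rng.normal(size=100_000)      # samples of v = log x' (assumed density)

# Second moment of x' = e^v is E[e^{2v}] (Eq. 9); the shifted density
# p'_v(v) = p_v(v + lam) replaces each sample v by v - lam.
m2 = np.mean(np.exp(2 * v))
m2_shifted = np.mean(np.exp(2 * (v - lam)))

# Eq. 14: the constraint integral is scaled by exp(-2 lam), while the
# entropy of v, being shift-invariant, is unchanged.
assert np.isclose(m2_shifted / m2, np.exp(-2 * lam))
```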
I wish to acknowledge helpful discussions with Professor C. E. Shannon on this
problem.
B. Reiffen
C. ASYMPTOTIC BEHAVIOR OF OPTIMUM FIXED-LENGTH AND
SEQUENTIAL DICHOTOMIES
One of the classical problems of statistics is the determination of the probability
distribution of a random variable from measurements made on the variable itself. The
simplest, nontrivial version of this problem is the dichotomy - a situation in which the
unknown distribution is known to be one of two possible distributions. In this report
two of the most important solutions of the dichotomy are described and their perform-
ances compared.
1. The Optimum Fixed-Length Dichotomy
Let x be a random variable admitting one of two possible (discrete) probability
distributions, p_0(x) or p_1(x), and let ξ denote the a priori probability that distribution
"0" pertains. It is our object to decide, on the basis of N independent measurements
on x, which distribution is present. To do this with a minimum probability of error (1),
we calculate the a posteriori probability, ξ_N, that "0" is true,

ξ_N = ξ p_0(x_1) ... p_0(x_N) / [ξ p_0(x_1) ... p_0(x_N) + (1-ξ) p_1(x_1) ... p_1(x_N)]          (1)

and decide in favor of "0" if ξ_N > 1/2, and in favor of "1" if ξ_N < 1/2. Since Eq. 1
may be written in the equivalent form,

ξ_N = 1 / [1 + ((1-ξ)/ξ) Λ_N]

where

Λ_N = [p_1(x_1)/p_0(x_1)] ... [p_1(x_N)/p_0(x_N)]

the decision procedure amounts to selecting "0" if Λ_N < ξ/(1-ξ), and "1" otherwise.
This decision rule may be put into a still more useful form by defining

L_N = log Λ_N = Σ_{i=1}^{N} log [p_1(x_i)/p_0(x_i)]

In terms of this quantity, our decision rule reads: select "0" if L_N < log [ξ/(1-ξ)], and
"1" otherwise. The probability of error, p_ε, incurred by this process is

p_ε = ξ P_0{L_N > log [ξ/(1-ξ)]} + (1-ξ) P_1{L_N < log [ξ/(1-ξ)]}          (2)
where the subscripts denote which probability distribution is to be used in the calculation
of the pertinent probability.
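A sketch of this decision rule for an assumed pair of Bernoulli distributions and an assumed prior ξ = 1/2:

```python
import numpy as np

p0 = np.array([0.7, 0.3])   # p0(x) on the alphabet {0, 1} (assumed)
p1 = np.array([0.4, 0.6])   # p1(x) (assumed)
xi = 0.5                    # a priori probability of distribution "0"

def decide(xs):
    # L_N = sum_i log [p1(x_i)/p0(x_i)]; select "0" iff L_N < log(xi/(1-xi)).
    xs = np.asarray(xs)
    L = np.sum(np.log(p1[xs] / p0[xs]))
    return "0" if L < np.log(xi / (1 - xi)) else "1"

print(decide([0] * 10))   # samples typical of p0: prints "0"
print(decide([1] * 10))   # samples typical of p1: prints "1"
```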
Since the x_i are independent, we see that L_N is the sum of N independently and
identically distributed random variables. The problem of computing the asymptotic
behavior of p_ε for large N thus reduces to the classical problem of finding the asymp-
totic behavior of the probability of the "tails" of the distribution of the sum of a large
number of independent random variables. This problem has been solved by Shannon (2),
whose results we shall now state in a form suited to our needs.
2. Asymptotic Behavior of the Tails of the Distribution of the Sum of N
Independently and Identically Distributed Random Variables
Let {y_i} denote a sequence of independent random variables with the common cumu-
lative distribution function, F(x) = P(y_i ≤ x). We now define

μ(s) = log γ(s)

where

γ(s) = ∫_{-∞}^{∞} e^{sx} dF(x)
and
Y_N = Σ_{i=1}^{N} y_i

In terms of these definitions, Shannon's results may be summarized by the following
statement: For any A,

P(Y_N ≤ A)
P(Y_N ≥ A)   ≈ (K/N^{1/2}) e^{μ(s)N}          (3)

as N → ∞. Here s is the root of the equation μ'(s) = 0, and K is a constant that depends
on A but is independent of N. The upper formula holds when the expectation of y is
greater than zero (μ'(0) > 0), and the lower formula holds when the expectation of y is
less than zero (μ'(0) < 0).
3. Asymptotic Behavior of p_ε for the Optimum Fixed-Length Dichotomy
We now apply relation 3 to our fixed-length decision problem, in order to determine
the asymptotic behavior of p_ε. To this end, we need the functions

γ_i(s) = Σ_x p_i(x) [p_1(x)/p_0(x)]^s          i = 0, 1

μ_i(s) = log γ_i(s)          i = 0, 1

We note that γ_0(s) = γ_1(s-1) and, therefore, μ_0(s) = μ_1(s-1). It follows that, if s_0 and
s_1 are the roots of the equations μ_0'(s_0) = 0 and μ_1'(s_1) = 0, respectively, s_1 = s_0 - 1.
This implies that μ_0(s_0) = μ_1(s_1) which, in turn, implies that both terms in Eq. 2 have
the same exponential behavior. In other words, the large-N behavior of p_ε is given by
the formula

p_ε ≈ (K/N^{1/2}) e^{μ_0(s_0)N}          (4)

where s_0 is the root of the equation μ_0'(s_0) = 0, and K is a constant that depends on
ξ but is independent of N. We shall now derive the asymptotic behavior of the optimum
sequential dichotomy and compare it with expression 4.
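For an assumed pair of Bernoulli distributions, the exponent μ_0(s_0) of expression 4 can be located numerically as the minimum of the convex function μ_0(s):

```python
import numpy as np

p0 = np.array([0.7, 0.3])   # assumed Bernoulli p0(x)
p1 = np.array([0.4, 0.6])   # assumed Bernoulli p1(x)

def mu0(s):
    # mu_0(s) = log sum_x p0(x)^(1-s) p1(x)^s
    s = np.atleast_1d(s)[:, None]
    return np.log(np.sum(p0 ** (1 - s) * p1 ** s, axis=1))

s = np.linspace(0.0, 1.0, 10_001)
vals = mu0(s)
s0 = s[np.argmin(vals)]     # root of mu0'(s) = 0: the minimum of convex mu0
exponent = vals.min()       # mu0(s0) < 0 governs p_e ~ (K/sqrt(N)) e^{mu0(s0) N}

# mu0(0) = mu0(1) = 0 since p0 and p1 each sum to one, and the minimum
# lies strictly below zero whenever p0 and p1 differ.
assert abs(vals[0]) < 1e-12 and abs(vals[-1]) < 1e-12
assert exponent < 0 and 0 < s0 < 1
```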
4. The Optimum Sequential Dichotomy
It can be shown (1) that the sequential decision scheme that minimizes the proba-
bility of error for a given expected length of test is of the following form:
At the Nth step of the test compute L_N and

if L_N < log B, accept "0"
if L_N > log A, accept "1"
if log B < L_N < log A, take another measurement and go through the same procedure,
using L_{N+1}.

The constants A and B are to be chosen so as to minimize p_ε for a given
expected length, N̄.
Wald (3) has shown that the asymptotic behavior of p_ε and N̄ as A → ∞ and B → 0
is given by

p_ε ≈ ξ/A + (1-ξ) B

N̄ ≈ -ξ (log B)/I_0 + (1-ξ) (log A)/I_1          (5)

where

I_i = |μ_i'(0)|          (6)
It is now easy to minimize p_ε for a fixed N̄; the result is

p_ε ≈ K' e^{-Ī N̄}          (7)

where

1/Ī = ξ/I_0 + (1-ξ)/I_1

and K' is a constant dependent upon ξ but independent of N̄. Equation 7 is the desired
asymptotic relationship for the sequential test.
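The sequential rule itself can be sketched directly; the thresholds and the Bernoulli pair below are assumptions for illustration:

```python
import numpy as np

p0 = np.array([0.7, 0.3])   # assumed p0(x)
p1 = np.array([0.4, 0.6])   # assumed p1(x)

def sequential_test(xs, log_A, log_B):
    # At the Nth step compute L_N; accept "0" if L_N < log B, accept "1"
    # if L_N > log A, otherwise take another measurement.
    L = 0.0
    for n, x in enumerate(xs, start=1):
        L += np.log(p1[x] / p0[x])
        if L < log_B:
            return "0", n
        if L > log_A:
            return "1", n
    return None, len(xs)    # no decision within the available data

# Each x = 1 adds log(0.6/0.3) ~ 0.69, so log A = 1.0 is crossed at step 2.
print(sequential_test([1, 1, 1, 1], log_A=1.0, log_B=-1.0))  # ('1', 2)
```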
5. Comparison of the Asymptotic Behavior of the Two Dichotomies
We shall now compare the asymptotic behavior of the fixed-length dichotomy given
by expression 4 with that of the sequential dichotomy given by expression 7. Our main
result in this direction is that the exponent in the sequential case is larger than the
exponent in the fixed-length case; that is,
Ī > -μ_0(s_0)          (8)
To derive this, we note that Eq. 6 implies that μ_0'(0) = -I_0 and μ_0'(1) = I_1. Recalling
that s_0 is the root of the equation μ_0'(s_0) = 0, we can make a sketch of μ_0(s), as shown
in Fig. XII-2.

Fig. XII-2. Lower bound for μ_0(s_0).

From the sketch we see that the convexity (μ_0''(s) > 0) of the function
μ_0(s) enables us to use the intersection of the tangents to μ_0(s) at s = 0 and s = 1 as
a lower bound for μ_0(s_0). These tangents, of slopes -I_0 and I_1, intersect at the height
-I_0 I_1/(I_0+I_1). Symbolically,

μ_0(s_0) ≥ -I_0 I_1/(I_0+I_1)          (9)
Expression 9 can be written in two equivalent forms as follows,

-μ_0(s_0) ≤ I_0 I_1/(I_0+I_1) = [1/I_0 + 1/I_1]^{-1}

and, since ξ < 1 and 1-ξ < 1,

[1/I_0 + 1/I_1]^{-1} < [ξ/I_0 + (1-ξ)/I_1]^{-1} = Ī
Inequality 8 follows at once. It shows that for long tests (large N or N̄) the sequential
test has a probability of error that is almost exponentially smaller than that of the
fixed-length test. In other words, not only is the probability of error for the sequential
test less than the corresponding probability of error for the fixed-length test, but the
ratio of the two probabilities goes to zero almost (except for a factor of N^{1/2})
exponentially.
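Inequality 8 can be spot-checked numerically for an assumed Bernoulli pair and prior, computing both exponents from their definitions:

```python
import numpy as np

p0 = np.array([0.7, 0.3])   # assumed p0(x)
p1 = np.array([0.4, 0.6])   # assumed p1(x)
xi = 0.5                    # assumed a priori probability of "0"

# I0 = |mu0'(0)| = sum p0 log(p0/p1), I1 = |mu1'(0)| = sum p1 log(p1/p0).
I0 = np.sum(p0 * np.log(p0 / p1))
I1 = np.sum(p1 * np.log(p1 / p0))

# Sequential exponent (Eq. 7): 1/I_bar = xi/I0 + (1-xi)/I1.
I_bar = 1.0 / (xi / I0 + (1 - xi) / I1)

# Fixed-length exponent -mu0(s0), from the minimum of the convex mu0(s).
s = np.linspace(0.0, 1.0, 10_001)[:, None]
mu0 = np.log(np.sum(p0 ** (1 - s) * p1 ** s, axis=1))
fixed_exponent = -mu0.min()

assert I_bar > fixed_exponent > 0   # inequality 8
```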
E. M. Hofstetter
References
1. D. Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions (John Wiley and Sons, Inc., New York, 1954).
2. C. E. Shannon, Information Theory Seminar Notes, M.I.T., 1956 (unpublished).
3. A. Wald, Statistical Decision Functions (John Wiley and Sons, Inc., New York, 1950).