XII. PROCESSING AND TRANSMISSION OF INFORMATION*
Prof. E. Arthurs        Prof. W. A. Youngblood    M. Horstein
Prof. P. Elias          D. C. Coll                T. S. Huang
Prof. R. M. Fano        J. E. Cunningham          F. Jelinek
Prof. J. Granlund       J. B. Dennis              T. Kailath
Prof. D. A. Huffman     H. A. Ernst               R. A. Kirsch
Prof. R. C. Jeffrey     E. F. Ferretti            G. E. Mongold, Jr.
Prof. H. Rodgers, Jr.   R. G. Gallager            B. Reiffen
Prof. C. E. Shannon     T. L. Grettenberg         L. G. Roberts
Prof. W. M. Siebert     F. C. Hennie III          W. L. Wells
Prof. J. M. Wozencraft  E. M. Hofstetter          H. L. Yudkin
A. REPRODUCTION OF PICTURES BY COMPUTER DISPLAY
Digital, sampled picture data have been used in digital computers for studying
pictures. The computer operations consist of various recoding processes that are
used to determine upper bounds on the amount of information necessary for specifying
pictures. The recoding and subsequent decoding must be done in such a manner that
the picture quality is not degraded excessively. To determine whether or not a par-
ticular process is successful in this respect, a picture must be resynthesized from the
decoded data.
Hitherto, this resynthesis was done by reading paper tape (upon which a decoded
version of the recoded data had been punched) into a modified facsimile reproducer.
This process took 4.5 hours; during this time the modulated light source was subject
to drift, and photographic distortion was introduced in a subsequent copying process.
A program has been written for the TX-0 computer which will read in paper tapes
of this kind and display the sample values on an addressable-coordinate cathode-ray
oscilloscope. Each point of the oscilloscope display corresponds to a sample point of
the picture, and each point is intensified a number of times equal to the value of the
reflectance at that sample point. This display is photographed with a Polaroid camera
and yields photographs such as those shown in Fig. XII-1. The pictures labeled (a) and
(b) are reproductions of the original data tape; these pictures are quantized to 64 levels;
a 6-digit binary number is required for the specification of each sample point. The
pictures labeled (c) and (d) are reproduced to represent a 16-level quantization; a
4-digit binary number is required for each sample point. The successful reproduction
of the pictures is due, in part, to the excellent cooperation of the TX-0 computer staff
in overcoming display problems.
*This research was supported in part by Purchase Order DDL-B222 with Lincoln
Laboratory, which is supported by the Department of the Army, the Department of the
Navy, and the Department of the Air Force under Contract AF19(604)-5200 with M.I.T.

Fig. XII-1. Pictures reproduced by computer display.

The time required for reproducing one of these pictures is 10 minutes. Provision
is made in the program for masking out any or all of the 6 binary numbers used to
specify the 64 intensity levels of the data, and to add noise or bias as desired.
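The bit-masking provision can be sketched in modern terms. This is an illustrative reconstruction, not the TX-0 program; the sample values and the NumPy-based helper below are assumptions:

```python
import numpy as np

def mask_bits(samples, keep_bits):
    # Keep only the `keep_bits` most significant of the 6 binary digits
    # that specify each 64-level sample, zeroing the remainder.
    drop = 6 - keep_bits
    return (np.asarray(samples) >> drop) << drop

# Requantizing 64-level samples to 16 levels (4 binary digits), as in
# pictures (c) and (d) of Fig. XII-1.
samples = np.array([0, 21, 42, 63])
print(mask_bits(samples, 4))  # [ 0 20 40 60]
```

The other provision mentioned above, adding noise or bias, would amount to adding a random or constant array to the samples and clipping the result to the 0-63 range.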
By means of a program of this sort we can obtain a picture resynthesized from data
that have undergone any desired encoding and decoding operations before display. The
program thus provides a high-speed reproduction of the picture which we can study to
determine the effects of the encoding processes.

J. E. Cunningham
B. THE CAPACITY OF A MULTIPLICATIVE CHANNEL
Consider the memoryless channel whose input is the set of real numbers -∞ < x < ∞,
whose output is the set of real numbers -∞ < y < ∞, and whose transition probability
p(y|x) is a function of the ratio y/x. This defines a multiplicative channel
y = nx (1)
where the probability distribution of the multiplicative noise n is given. We consider
here only symmetric probability densities of n, that is,
p_n(n) = p_n(-n)
We ask, "Does this channel have a capacity when the second moment of x is
constrained?"
We show in this report that the capacity of this channel is indeed infinite, and that
this capacity may be obtained with a suitably shifted version of any nontrivial probability
density function p_x(x).
Note that because p_n(n) is symmetric, p_y(y) will also be symmetric, irrespective
of p_x(x). Thus, without loss of generality, we can confine our attention to the three
variables x', y', n', which may not assume negative values:

         { p_x(x') + p_x(-x')   x' > 0
Pr(x') = { p_x(0)               x' = 0          (2)
         { 0                    x' < 0

         { p_y(y') + p_y(-y')   y' > 0
Pr(y') = { p_y(0)               y' = 0          (3)
         { 0                    y' < 0

         { p_n(n') + p_n(-n')   n' > 0
Pr(n') = { p_n(0)               n' = 0          (4)
         { 0                    n' < 0
Thus
y' = n'x' (5)
Define
w = log y', u = log n', v = log x' (6)
Therefore
w=u+v (7)
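The change of variables in Eqs. 5-7 can be verified numerically; the lognormal densities below are illustrative assumptions only, since the argument holds for any positive n' and x':

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples of the positive variables n' and x'; any positive densities would do.
n = rng.lognormal(size=1000)
x = rng.lognormal(size=1000)

y = n * x                                  # the multiplicative channel (Eq. 5)
w, u, v = np.log(y), np.log(n), np.log(x)  # the transformation of Eq. 6

# Eq. 7: in the log domain the channel is additive with noise u.
assert np.allclose(w, u + v)
```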
Thus, by a change of variables, we have constructed a channel with additive noise u.
Since Pr(n') is known, the density function p_u(u) can be obtained by standard methods.
The second-moment constraint on x and x',

∫_{-∞}^{∞} x² p_x(x) dx = ∫_0^∞ (x')² Pr(x') dx' = σ²          (8)

in terms of the transformed variable v = log x', becomes

∫_{-∞}^{∞} e^{2v} p_v(v) dv = σ²          (9)
We desire a choice of p_x(x) or Pr(x') that is consistent with the constraint of Eq. 8
and that maximizes the mutual information between input and output, I(X;Y), or equivalently,
I(X';Y'). However, v and w are related to x' and y' in one-to-one correspondence.
Thus
I(X';Y') = I(V;W) = H(W) - H(W|V)          (10)
where the H's are the respective entropies.
Since the noise u is additive in its channel, H(W|V) is the noise entropy and is a
constant, being a function only of the density function p_u(u).
Now, since w = u + v, and u and v are independent random variables, H(W) > H(V).
Therefore,
I(V;W) > H(V) - H(W|V)          (11)
Consider any probability density p_v(v) that makes the right-hand side of Eq. 11
equal to an arbitrarily large rate. In general, this will not satisfy the constraint of
Eq. 9.
Define the new density
p'_v(v) = p_v(v+λ)          (12)
Applying the constraint of Eq. 9 to p'_v, we have

∫_{-∞}^{∞} e^{2v} p_v(v+λ) dv          (13)

Making the substitution q = v + λ, we obtain

∫_{-∞}^{∞} e^{2(q-λ)} p_v(q) dq = e^{-2λ} ∫_{-∞}^{∞} e^{2v} p_v(v) dv          (14)

Thus, by suitable choice of the parameter λ, we can be assured that the density p'_v
satisfies the constraint.
However, the entropy of the density p'_v is clearly identical to the entropy of the density p_v.
Thus we have succeeded in transmitting an arbitrarily high rate with arbitrarily low
second moment on the input variable x.
Since x' = e^v, we can find the density of x' associated with the densities p_v and p'_v:

P'r(x') = (1/x') p'_v(log x'),     Pr(x') = (1/x') p_v(log x')          (15)

Since p'_v(v) = p_v(v+λ), we have

P'r(x') = (1/x') p_v(log [e^λ x'])          (16)

Thus the P'r(a) and Pr(b) densities are related by the transformation a = e^{-λ} b. If we
note this relationship between a and b, it is clear that the effect of a linear shift λ > 0
in the v-domain is equivalent in the x'-domain to a compression toward zero and renor-
malization.
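A numerical sketch of the shifting argument, with v drawn from an assumed Gaussian density (any density would do); the factor e^{-2λ} of Eq. 14 appears regardless of the density chosen:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1.5                         # the shift parameter lambda

v = rng.normal(size=100_000)      # samples of v = log x' (assumed density)

# Second moment of x' = e^v is E[e^{2v}] (Eq. 9); the shifted density
# p'_v(v) = p_v(v + lam) replaces each sample v by v - lam.
m2 = np.mean(np.exp(2 * v))
m2_shifted = np.mean(np.exp(2 * (v - lam)))

# Eq. 14: the constraint integral is scaled by exp(-2 lam), while the
# entropy of v, being shift-invariant, is unchanged.
assert np.isclose(m2_shifted / m2, np.exp(-2 * lam))
```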
I wish to acknowledge helpful discussions with Professor C. E. Shannon on this
problem.
B. Reiffen
C. ASYMPTOTIC BEHAVIOR OF OPTIMUM FIXED-LENGTH AND
SEQUENTIAL DICHOTOMIES
One of the classical problems of statistics is the determination of the probability
distribution of a random variable from measurements made on the variable itself. The
simplest, nontrivial version of this problem is the dichotomy - a situation in which the
unknown distribution is known to be one of two possible distributions. In this report
two of the most important solutions of the dichotomy are described and their perform-
ances compared.
1. The Optimum Fixed-Length Dichotomy
Let x be a random variable admitting one of two possible (discrete) probability
distributions, p_0(x) or p_1(x), and let ξ denote the a priori probability that distribution
"0" pertains. It is our object to decide, on the basis of N independent measurements
on x, which distribution is present. To do this with a minimum probability of error (1),
we calculate the a posteriori probability, ξ_N, that "0" is true,

ξ_N = ξ p_0(x_1) ... p_0(x_N) / [ξ p_0(x_1) ... p_0(x_N) + (1-ξ) p_1(x_1) ... p_1(x_N)]          (1)

and decide in favor of "0" if ξ_N > 1/2, and in favor of "1" if ξ_N < 1/2. Since Eq. 1
may be written in the equivalent form,

ξ_N = 1 / [1 + ((1-ξ)/ξ) Λ_N]

where

Λ_N = [p_1(x_1)/p_0(x_1)] ... [p_1(x_N)/p_0(x_N)]

the decision procedure amounts to selecting "0" if Λ_N < ξ/(1-ξ), and "1" otherwise.
This decision rule may be put into a still more useful form by defining

L_N = log Λ_N = Σ_{i=1}^{N} log [p_1(x_i)/p_0(x_i)]

In terms of this quantity, our decision rule reads: select "0" if L_N < log [ξ/(1-ξ)], and
"1" otherwise. The probability of error, p_ε, incurred by this process is

p_ε = ξ P_0{L_N > log [ξ/(1-ξ)]} + (1-ξ) P_1{L_N < log [ξ/(1-ξ)]}          (2)
where the subscripts denote which probability distribution is to be used in the calculation
of the pertinent probability.
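A sketch of this decision rule for an assumed pair of Bernoulli distributions and an assumed prior ξ = 1/2:

```python
import numpy as np

p0 = np.array([0.7, 0.3])   # p0(x) on the alphabet {0, 1} (assumed)
p1 = np.array([0.4, 0.6])   # p1(x) (assumed)
xi = 0.5                    # a priori probability of distribution "0"

def decide(xs):
    # L_N = sum_i log [p1(x_i)/p0(x_i)]; select "0" iff L_N < log(xi/(1-xi)).
    xs = np.asarray(xs)
    L = np.sum(np.log(p1[xs] / p0[xs]))
    return "0" if L < np.log(xi / (1 - xi)) else "1"

print(decide([0] * 10))   # samples typical of p0: prints "0"
print(decide([1] * 10))   # samples typical of p1: prints "1"
```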
Since the x_i are independent, we see that L_N is the sum of N independently and
identically distributed random variables. The problem of computing the asymptotic
behavior of p_ε for large N thus reduces to the classical problem of finding the asymp-
totic behavior of the probability of the "tails" of the distribution of the sum of a large
number of independent random variables. This problem has been solved by Shannon (2),
whose results we shall now state in a form suited to our needs.
2. Asymptotic Behavior of the Tails of the Distribution of the Sum of N
Independently and Identically Distributed Random Variables
Let {y_i} denote a sequence of independent random variables with the common cumu-
lative distribution function, F(x) = P(y_i ≤ x). We now define

μ(s) = log γ(s)

where

γ(s) = ∫_{-∞}^{∞} e^{sx} dF(x)
and
Y_N = Σ_{i=1}^{N} y_i

In terms of these definitions, Shannon's results may be summarized by the following
statement: For any A,

P(Y_N ≤ A)
P(Y_N ≥ A)   ≈ (K/N^{1/2}) e^{μ(s)N}          (3)

as N → ∞. Here s is the root of the equation μ'(s) = 0, and K is a constant that depends
on A but is independent of N. The upper formula holds when the expectation of y is
greater than zero (μ'(0) > 0), and the lower formula holds when the expectation of y is
less than zero (μ'(0) < 0).
3. Asymptotic Behavior of p_ε for the Optimum Fixed-Length Dichotomy
We now apply relation 3 to our fixed-length decision problem, in order to determine
the asymptotic behavior of p_ε. To this end, we need the functions

γ_i(s) = Σ_x p_i(x) [p_1(x)/p_0(x)]^s          i = 0, 1

μ_i(s) = log γ_i(s)          i = 0, 1

We note that γ_0(s) = γ_1(s-1) and, therefore, μ_0(s) = μ_1(s-1). It follows that, if s_0 and
s_1 are the roots of the equations μ_0'(s_0) = 0 and μ_1'(s_1) = 0, respectively, s_1 = s_0 - 1.
This implies that μ_0(s_0) = μ_1(s_1) which, in turn, implies that both terms in Eq. 2 have
the same exponential behavior. In other words, the large-N behavior of p_ε is given by
the formula

p_ε ≈ (K/N^{1/2}) e^{μ_0(s_0)N}          (4)

where s_0 is the root of the equation μ_0'(s_0) = 0, and K is a constant that depends on
ξ but is independent of N. We shall now derive the asymptotic behavior of the optimum
sequential dichotomy and compare it with expression 4.
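For an assumed pair of Bernoulli distributions, the exponent μ_0(s_0) of expression 4 can be located numerically as the minimum of the convex function μ_0(s):

```python
import numpy as np

p0 = np.array([0.7, 0.3])   # assumed Bernoulli p0(x)
p1 = np.array([0.4, 0.6])   # assumed Bernoulli p1(x)

def mu0(s):
    # mu_0(s) = log sum_x p0(x)^(1-s) p1(x)^s
    s = np.atleast_1d(s)[:, None]
    return np.log(np.sum(p0 ** (1 - s) * p1 ** s, axis=1))

s = np.linspace(0.0, 1.0, 10_001)
vals = mu0(s)
s0 = s[np.argmin(vals)]     # root of mu0'(s) = 0: the minimum of convex mu0
exponent = vals.min()       # mu0(s0) < 0 governs p_e ~ (K/sqrt(N)) e^{mu0(s0) N}

# mu0(0) = mu0(1) = 0 since p0 and p1 each sum to one, and the minimum
# lies strictly below zero whenever p0 and p1 differ.
assert abs(vals[0]) < 1e-12 and abs(vals[-1]) < 1e-12
assert exponent < 0 and 0 < s0 < 1
```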
4. The Optimum Sequential Dichotomy
It can be shown (1) that the sequential decision scheme that minimizes the proba-
bility of error for a given expected length of test is of the following form:
At the Nth step of the test compute L_N and

if L_N < log B, accept "0"
if L_N > log A, accept "1"
if log B < L_N < log A, take another measurement and go through the same procedure,
using L_{N+1}.

The constants A and B are to be chosen so as to minimize p_ε for a given
expected length, N̄.
Wald (3) has shown that the asymptotic behavior of p_ε and N̄ as A → ∞ and B → 0
is given by

p_ε ≈ ξ/A + (1-ξ) B

N̄ ≈ -ξ (log B)/I_0 + (1-ξ) (log A)/I_1          (5)

where

I_i = |μ_i'(0)|          (6)
It is now easy to minimize p_ε for a fixed N̄; the result is

p_ε ≈ K' e^{-Ī N̄}          (7)

where

1/Ī = ξ/I_0 + (1-ξ)/I_1

and K' is a constant dependent upon ξ but independent of N̄. Equation 7 is the desired
asymptotic relationship for the sequential test.
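The sequential rule itself can be sketched directly; the thresholds and the Bernoulli pair below are assumptions for illustration:

```python
import numpy as np

p0 = np.array([0.7, 0.3])   # assumed p0(x)
p1 = np.array([0.4, 0.6])   # assumed p1(x)

def sequential_test(xs, log_A, log_B):
    # At the Nth step compute L_N; accept "0" if L_N < log B, accept "1"
    # if L_N > log A, otherwise take another measurement.
    L = 0.0
    for n, x in enumerate(xs, start=1):
        L += np.log(p1[x] / p0[x])
        if L < log_B:
            return "0", n
        if L > log_A:
            return "1", n
    return None, len(xs)    # no decision within the available data

# Each x = 1 adds log(0.6/0.3) ~ 0.69, so log A = 1.0 is crossed at step 2.
print(sequential_test([1, 1, 1, 1], log_A=1.0, log_B=-1.0))  # ('1', 2)
```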
5. Comparison of the Asymptotic Behavior of the Two Dichotomies
We shall now compare the asymptotic behavior of the fixed-length dichotomy given
by expression 4 with that of the sequential dichotomy given by expression 7. Our main
result in this direction is that the exponent in the sequential case is larger than the
exponent in the fixed-length case; that is,
Ī > -μ_0(s_0)          (8)
To derive this, we note that Eq. 6 implies that μ_0'(0) = -I_0 and μ_0'(1) = I_1. Recalling
that s_0 is the root of the equation μ_0'(s_0) = 0, we can make a sketch of μ_0(s), as shown
in Fig. XII-2.

Fig. XII-2. Lower bound for μ_0(s_0).

From the sketch we see that the convexity (μ_0''(s) > 0) of the function
μ_0(s) enables us to use the intersection of the tangents to μ_0(s) at s = 0 and s = 1 as
a lower bound for μ_0(s_0). These tangents, of slopes -I_0 and I_1, intersect at the height
-I_0 I_1/(I_0+I_1). Symbolically,

μ_0(s_0) ≥ -I_0 I_1/(I_0+I_1)          (9)
Expression 9 can be written in two equivalent forms as follows,

-μ_0(s_0) ≤ I_0 I_1/(I_0+I_1) = [1/I_0 + 1/I_1]^{-1}

and, since ξ < 1 and 1-ξ < 1,

[1/I_0 + 1/I_1]^{-1} < [ξ/I_0 + (1-ξ)/I_1]^{-1} = Ī
Inequality 8 follows at once. It shows that for long tests (large N or N̄) the sequential
test has a probability of error that is almost exponentially smaller than that of the
fixed-length test. In other words, not only is the probability of error for the sequential
test less than the corresponding probability of error for the fixed-length test, but the
ratio of the two probabilities goes to zero almost (except for a factor of N^{1/2})
exponentially.
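Inequality 8 can be spot-checked numerically for an assumed Bernoulli pair and prior, computing both exponents from their definitions:

```python
import numpy as np

p0 = np.array([0.7, 0.3])   # assumed p0(x)
p1 = np.array([0.4, 0.6])   # assumed p1(x)
xi = 0.5                    # assumed a priori probability of "0"

# I0 = |mu0'(0)| = sum p0 log(p0/p1), I1 = |mu1'(0)| = sum p1 log(p1/p0).
I0 = np.sum(p0 * np.log(p0 / p1))
I1 = np.sum(p1 * np.log(p1 / p0))

# Sequential exponent (Eq. 7): 1/I_bar = xi/I0 + (1-xi)/I1.
I_bar = 1.0 / (xi / I0 + (1 - xi) / I1)

# Fixed-length exponent -mu0(s0), from the minimum of the convex mu0(s).
s = np.linspace(0.0, 1.0, 10_001)[:, None]
mu0 = np.log(np.sum(p0 ** (1 - s) * p1 ** s, axis=1))
fixed_exponent = -mu0.min()

assert I_bar > fixed_exponent > 0   # inequality 8
```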
E. M. Hofstetter
References
1. D. Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions (John Wiley and Sons, Inc., New York, 1954).
2. C. E. Shannon, Information Theory Seminar Notes, M.I.T., 1956 (unpublished).
3. A. Wald, Statistical Decision Functions (John Wiley and Sons, Inc., New York, 1950).