1924 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. 6, NOVEMBER 1997
A Framework for Linear Information Inequalities
Raymond W. Yeung, Senior Member, IEEE
Abstract—We present a framework for information inequalities, namely, inequalities involving only Shannon's information measures, for discrete random variables. A region in IR^{2^n - 1}, denoted by Γ*_n, is identified to be the origin of all information inequalities involving n random variables in the sense that all such inequalities are partial characterizations of Γ*_n. A product from this framework is a simple calculus for verifying all unconstrained and constrained linear information identities and inequalities which can be proved by conventional techniques. These include all information identities and inequalities of such types in the literature. As a consequence of this work, most identities and inequalities involving a definite number of random variables can now be verified by a software called ITIP which is available on the World Wide Web. Our work suggests the possibility of the existence of information inequalities which cannot be proved by conventional techniques. We also point out the relation between Γ*_n and some important problems in probability theory and information theory.
Index Terms—Entropy, I-Measure, information identities, information inequalities, mutual information.
I. INTRODUCTION
SHANNON'S information measures refer to entropies, conditional entropies, mutual informations, and conditional mutual informations. For information inequalities, we refer to those involving only Shannon's information measures for discrete random variables. These inequalities play a central role in converse coding theorems for problems in information theory with discrete alphabets. This paper is devoted to a systematic study of these inequalities. We begin our discussion by examining the two examples below, which exemplify what we call the "conventional" approach to proving such inequalities.
Example 1: This is a version of the well-known data processing theorem. Let X, Y, and Z be random variables such that X → Y → Z form a Markov chain. Then

I(Y; Z) = I(X, Y; Z) − I(X; Z | Y)
        = I(X, Y; Z)
        = I(X; Z) + I(Y; Z | X)
        ≥ I(X; Z).

In the above, the second equality follows from the Markov condition I(X; Z | Y) = 0, while the inequality follows because I(Y; Z | X) is always nonnegative.
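As a numerical sanity check (not a proof), the data processing inequality can be verified on a concrete Markov chain. The Python sketch below is ours, not part of the paper; the channel kernels are arbitrary, and only the factorization p(x, y, z) = p(x)p(y|x)p(z|y) matters:

```python
import math
from itertools import product

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, axes):
    """Marginalize a joint pmf over tuples onto the given axes."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in axes)
        out[key] = out.get(key, 0.0) + p
    return out

def mi(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for the axis groups a and b."""
    return (entropy(marginal(joint, a)) + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))

# A Markov chain X -> Y -> Z: p(x, y, z) = p(x) p(y|x) p(z|y),
# with arbitrary (hypothetical) kernels.
px = {0: 0.6, 1: 0.4}
py_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
pz_y = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.3, 1: 0.7}}
joint = {(x, y, z): px[x] * py_x[x][y] * pz_y[y][z]
         for x, y, z in product([0, 1], repeat=3)}

# Axes: 0 = X, 1 = Y, 2 = Z.  Data processing: I(X;Z) <= I(Y;Z).
assert mi(joint, (0,), (2,)) <= mi(joint, (1,), (2,)) + 1e-12
```

Any choice of kernels gives the same verdict, since the inequality holds for every Markov chain; a single example only illustrates it.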
Manuscript received August 10, 1995; revised February 10, 1997. The material in this paper was presented in part at the 1996 IEEE Information Theory Workshop, Haifa, Israel, June 9–13, 1996.
The author is with the Department of Information Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong.
Publisher Item Identifier S 0018-9448(97)06816-8.
Example 2:

H(X, Y) − 1.04 H(Y) + 0.7 I(Y; X, Z) + 0.04 H(Y | Z)
  = H(X | Y) − 0.04 I(Y; Z) + 0.7 I(Y; X, Z)
  = H(X | Y) + 0.7 I(X; Y | Z) + 0.66 I(Y; Z)
  ≥ 0.

The inequality above follows from the nonnegativity of H(X | Y), I(X; Y | Z), and I(Y; Z).
In the conventional approach, we invoke certain elementary identities and inequalities in the intermediate steps of a proof. Some frequently invoked identities and inequalities are
I(X; Z | Y) = 0   if X → Y → Z form a Markov chain;
I(X; Z) ≤ I(Y; Z)   if X → Y → Z form a Markov chain.
Proving an identity or an inequality using the conventional approach can be quite tricky, because it may not be easy to see which elementary identity or inequality should be invoked next. For certain problems, like Example 1, we may rely on our insight to see how we should proceed in the proof. But of course, most of our insight into problems is developed from hindsight. For other problems like, or even more complicated than, Example 2 (which involves only three random variables), it may not be easy at all to work them out by brute force.
The proof of information inequalities can be facilitated by the use of information diagrams1 [25]. However, the use of such diagrams becomes very difficult when the number of random variables is more than four.
1It was called an I-diagram in [25], but we prefer to call it an information diagram to avoid confusion with an eye diagram in communication theory.
0018–9448/97$10.00 1997 IEEE
In the conventional approach, elementary identities and inequalities are invoked in a sequential manner. In the new framework that we shall develop in this paper, all identities and inequalities are considered simultaneously.
Before we proceed any further, we would like to make a few remarks. Let f and g be any expressions depending only on Shannon's information measures. We shall call them information expressions, and specifically linear information expressions if they are linear combinations of Shannon's information measures. Likewise, we shall call inequalities involving only Shannon's information measures information inequalities. Now f ≥ g if and only if f − g ≥ 0. Therefore, if for any expression we can determine whether it is always nonnegative, then we can determine whether any particular inequality always holds. We note that f = g if and only if f − g ≥ 0 and g − f ≥ 0. Therefore, it suffices to study inequalities.
The rest of the paper is organized as follows. In the next section, we first give a brief review of the I-Measure [25], on which a few proofs will be based. In Section III, we introduce the canonical form of an information expression and discuss its uniqueness. We also define a region called Γ*_n which is central to the discussion in this paper. In Section IV, we present a simple calculus for verifying information identities and inequalities which can be proved by conventional techniques. In Section V, we further elaborate on the significance of Γ*_n by pointing out its relations with some important problems in probability theory and information theory. Concluding remarks are given in Section VI.
II. REVIEW OF THE THEORY OF THE I-MEASURE
In this section, we give a review of the main results regarding the I-Measure. For a detailed discussion of the I-Measure, we refer the reader to [25]. Further results on the I-Measure can be found in [7].
Let X_1, X_2, ..., X_n be jointly distributed discrete random variables, and X̃_i be a set variable corresponding to a random variable X_i. Define the universal set to be Ω = X̃_1 ∪ X̃_2 ∪ ··· ∪ X̃_n, and let F_n be the σ-field generated by X̃_1, X̃_2, ..., X̃_n. The atoms of F_n have the form ∩_{i=1}^n Y_i, where Y_i is either X̃_i or X̃_i^c. Let A be the set of all atoms of F_n except for ∩_{i=1}^n X̃_i^c, which is empty by construction because

∩_{i=1}^n X̃_i^c = (∪_{i=1}^n X̃_i)^c = Ω^c = ∅.   (1)

Note that |A| = 2^n − 1. To simplify notations, we shall use X_G to denote (X_i, i ∈ G) and X̃_G to denote ∪_{i∈G} X̃_i. Let N_n = {1, 2, ..., n}. It was shown in [25] that there exists a unique signed measure μ* on F_n which is consistent with all Shannon's information measures via the following formal substitution of symbols:

H, I ↔ μ*;   , ↔ ∪;   ; ↔ ∩;   | ↔ −

i.e., for any (not necessarily disjoint) nonempty G, G' ⊆ N_n and any G'' ⊆ N_n,

μ*(X̃_G ∩ X̃_{G'} − X̃_{G''}) = I(X_G; X_{G'} | X_{G''}).   (2)

When G = G', we interpret (2) as

μ*(X̃_G − X̃_{G''}) = H(X_G | X_{G''}).   (3)

When G'' = ∅, (2) becomes

μ*(X̃_G ∩ X̃_{G'}) = I(X_G; X_{G'}).   (4)

When G = G' and G'' = ∅, (2) becomes

μ*(X̃_G) = H(X_G).   (5)

Thus (2) covers all the cases of Shannon's information measures.
Let

B = { X̃_G : G is a nonempty subset of N_n }.   (6)

Note that |B| = 2^n − 1. Let k = 2^n − 1. Define arbitrary one-to-one mappings between A and {1, 2, ..., k} and between B and {1, 2, ..., k}, and let

u = [μ*(A_1)  μ*(A_2)  ···  μ*(A_k)]^T   (7)

h = [μ*(B_1)  μ*(B_2)  ···  μ*(B_k)]^T   (8)

where A_i ∈ A and B_i ∈ B for 1 ≤ i ≤ k. Then

h = C_n u   (9)

where C_n is a unique k × k matrix (independent of μ*) with entries

c_{ij} = 1 if A_j ⊆ B_i;   c_{ij} = 0 otherwise.   (10)

An important characteristic of C_n is that it is invertible [25], so we can write

u = C_n^{−1} h.   (11)

In other words, μ* is completely specified by the set of values {μ*(B) : B ∈ B}, namely, all the joint entropies involving X_1, X_2, ..., X_n, and it follows from (5) that μ* is the unique measure on F_n which is consistent with all Shannon's information measures. Note that in general μ* is not nonnegative. However, if X_1 → X_2 → ··· → X_n form a Markov chain, μ* is always nonnegative [7].
As a consequence of the theory of the I-Measure, the information diagram was introduced as a tool to visualize the relationship among information measures [25]. Applications of information diagrams can be found in [7], [25], and [26].
III. THE CANONICAL FORM
In the rest of the paper, we shall assume that X_1, X_2, ..., X_n are the random variables involved in our discussion.
We observe that conditional entropies, mutual informations, and conditional mutual informations can be expressed as a linear combination of joint entropies by using the following identity:

I(X_G; X_{G'} | X_{G''}) = H(X_{G∪G''}) + H(X_{G'∪G''}) − H(X_{G∪G'∪G''}) − H(X_{G''})   (12)

where G and G' are nonempty subsets of N_n, G'' ⊆ N_n, and H(X_∅) is taken to be 0. Thus any information expression can be expressed in terms of the joint entropies. We call this the canonical form of an information expression.
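This reduction to canonical form can be mechanized. The sketch below is our own illustration (not the paper's code): it maps a measure I(X_G; X_{G'} | X_{G''}) to its coefficient vector over the 2^n − 1 joint entropies, using the identity I(X_G; X_{G'} | X_{G''}) = H(X_{G∪G''}) + H(X_{G'∪G''}) − H(X_{G∪G'∪G''}) − H(X_{G''}):

```python
from itertools import combinations

def canonical(measure, n):
    """Coefficient vector of I(X_G; X_G' | X_G'') over the 2^n - 1
    joint entropies H(X_S), S a nonempty subset of {1..n}, using
    I(G;G'|G'') = H(GuG'') + H(G'uG'') - H(GuG'uG'') - H(G'')."""
    G, Gp, Gpp = map(frozenset, measure)
    subsets = [frozenset(c) for k in range(1, n + 1)
               for c in combinations(range(1, n + 1), k)]
    coef = dict.fromkeys(subsets, 0)
    for s, sign in [(G | Gpp, +1), (Gp | Gpp, +1),
                    (G | Gp | Gpp, -1), (Gpp, -1)]:
        if s:                      # H of the empty set is 0
            coef[s] += sign
    return coef

# I(X1; X2 | X3) = H(1,3) + H(2,3) - H(1,2,3) - H(3):
c = canonical(({1}, {2}, {3}), n=3)
assert c[frozenset({1, 3})] == 1 and c[frozenset({3})] == -1
```

Taking G' = G recovers the conditional entropy H(X_G | X_{G''}), and taking G'' = ∅ recovers the mutual information I(X_G; X_{G'}).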
Now for any random variables X_1, X_2, ..., X_n, their joint entropies correspond to a vector h in IR^k, where we regard the 2^n − 1 joint entropies as the coordinates of h. On the other hand, a vector h in IR^k is said to be constructible if there exist random variables X_1, X_2, ..., X_n whose joint entropies are given by h. We are then motivated to define

Γ*_n = { h ∈ IR^k : h is constructible }.

As we shall see, Γ*_n not only gives a complete characterization of all information inequalities, but it is also closely related to some important problems in probability theory and information theory. Thus a complete characterization of Γ*_n is of fundamental importance. To our knowledge, there has not been such a characterization in the literature (see Section V).
Now every information expression can be expressed in canonical form. A basic question to ask is in what sense the canonical form is unique. Toward this end, we shall first establish the following theorem.
Theorem 1: Let f: IR^k → IR be measurable such that { h : f(h) = 0 } has zero Lebesgue measure. Then f cannot be identically zero on Γ*_n.

We shall need the following lemma, which is immediate from the discussion in [26, Sec. 6]. The proof is omitted here.

Lemma 1: Let

Ψ*_n = { u ∈ IR^k : C_n u is constructible }

(cf. (11)). Then the first quadrant of IR^k is a subset of Ψ*_n.

Proof of Theorem 1: If Γ*_n has positive Lebesgue measure, then since { h : f(h) = 0 }, and hence Γ*_n ∩ { h : f(h) = 0 }, has zero Lebesgue measure, Γ*_n − { h : f(h) = 0 } has positive Lebesgue measure. Then Γ*_n cannot be a subset of { h : f(h) = 0 }, which implies that f cannot be identically zero on Γ*_n. Thus it suffices to prove that Γ*_n has positive Lebesgue measure. Using the above lemma, we see that the first quadrant of IR^k, which has positive Lebesgue measure, is a subset of Ψ*_n. Therefore Ψ*_n has positive Lebesgue measure. Since Γ*_n is an invertible linear transformation of Ψ*_n, its Lebesgue measure must also be positive. This proves the theorem.
The uniqueness of the canonical form for very general classes of information expressions follows from this theorem. For example, suppose f and g are two polynomials of the joint entropies such that f(h) = g(h) for all h ∈ Γ*_n. Let p = f − g. If p is not the zero function, then { h : p(h) = 0 } has zero Lebesgue measure. By the theorem, p cannot be identically zero on Γ*_n, which is a contradiction. Therefore p is the zero function, i.e., f = g. Thus we see that the canonical form is unique for polynomial information expressions. We note that the uniqueness of the canonical form for linear information expressions has been discussed in [4] and [2, p. 51, Theorem 3.6].
The importance of the canonical form will become clear in the next section. An application of the canonical form to recognizing the symmetry of an information expression will be discussed in Appendix II-A. We note that any invertible linear transformation of the joint entropies can be used for the purpose of defining the canonical form. Nevertheless, the current definition of the canonical form has the advantage that if S and S' are two sets of random variables such that S ⊆ S', then the joint entropies involving the random variables in S form a subset of the joint entropies involving the random variables in S'.
IV. A CALCULUS FOR VERIFYING
LINEAR IDENTITIES AND INEQUALITIES
In this section, we shall develop a simple calculus for verifying all linear information identities and inequalities involving a definite number of random variables which can be proved by conventional techniques. All identities and inequalities in this section are assumed to be linear unless otherwise specified. Although our discussion will primarily be on linear identities and inequalities (possibly with linear constraints), our approach can be extended naturally to nonlinear cases. For nonlinear cases, the amount of computation required is larger. The question of what linear combinations of entropies are always nonnegative was first raised by Han [5].
A. Unconstrained Identities
Due to the uniqueness of the canonical form for linear information expressions as discussed in the preceding section, it is easy to check whether two expressions f and g are identical. All we need to do is to express f − g in canonical form. If all the coefficients are zero, then f and g are identical; otherwise they are not.
B. Unconstrained Inequalities
Since all information expressions can be expressed in canonical form, we shall only consider inequalities in this form. The following is a simple yet fundamental observation which apparently has not been discussed in the literature.

For any information expression f, f ≥ 0 always holds if and only if Γ*_n ⊆ { h : f(h) ≥ 0 }.

This observation, which follows immediately from the definition of Γ*_n, gives a complete characterization of all unconstrained inequalities (not necessarily linear) in terms of Γ*_n. From this point of view, an unconstrained inequality is simply a partial characterization of Γ*_n.
The nonnegativity of all Shannon's information measures forms a set of inequalities which we shall refer to as the basic inequalities. We observe that in the conventional approach to proving information inequalities, whenever we establish an inequality in an intermediate step, we invoke one of the basic inequalities. Therefore, all information inequalities and conditional information identities which can be proved by conventional techniques are consequences of the basic inequalities. These inequalities, however, are not nonredundant. For example, H(X|Y) ≥ 0 and I(X;Y) ≥ 0, which are both basic inequalities of the random variables X and Y, imply H(X) ≥ 0, again a basic inequality of X and Y.

We shall be dealing with linear combinations whose coefficients are nonnegative. We call such linear combinations nonnegative linear combinations. We observe that any Shannon's information measure can be expressed as a nonnegative linear combination of the following two elemental forms of Shannon's information measures:

i) H(X_i | X_{N_n − {i}});
ii) I(X_i; X_j | X_K), where i ≠ j and K ⊆ N_n − {i, j}.
This can be done by successive (if necessary) applications of the following identities:
(13)
(14)
(15)
(16)
(17)
(18)
(Note that all the coefficients in the above identities are nonnegative.) It is easy to check that the total number of Shannon's information measures of the two elemental forms is equal to

m = n + C(n, 2) · 2^{n−2}.   (19)
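The count in (19) is n + C(n, 2)·2^{n−2}: n conditional entropies of form i), plus C(n, 2) pairs {i, j} with 2^{n−2} choices of K each. A one-line sketch:

```python
from math import comb

def num_elemental(n):
    """n terms H(X_i | X_{N-i}) plus C(n,2) * 2^(n-2) terms I(X_i;X_j|X_K)."""
    return n + comb(n, 2) * 2 ** (n - 2)

assert [num_elemental(n) for n in (2, 3, 4, 5)] == [3, 9, 28, 85]
```

The count grows exponentially in n, which bounds the practical size of the resulting linear programs.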
The inequalities asserting the nonnegativity of the two elemental forms of Shannon's information measures form a proper subset of the set of basic inequalities. We call the inequalities in this smaller set the elemental inequalities. They are equivalent to the basic inequalities, because each basic inequality which is not an elemental inequality can be obtained by adding a certain set of elemental inequalities in view of (13)–(18). The minimality of the elemental inequalities is proved in Appendix I.
If the elemental inequalities are expressed in canonical form, then they become linear inequalities in IR^k. Denote this set of inequalities by Gh ≥ 0, where G is an m × k matrix, and define

Γ_n = { h ∈ IR^k : Gh ≥ 0 }.   (20)

Since the elemental inequalities are satisfied by any X_1, X_2, ..., X_n, we have Γ*_n ⊆ Γ_n. Therefore, if

Γ_n ⊆ { h : b^T h ≥ 0 }

then

Γ*_n ⊆ { h : b^T h ≥ 0 }

i.e., b^T h ≥ 0 always holds. Let e_j, 1 ≤ j ≤ k, be the column k-vector whose jth component is equal to 1 and all the other components are equal to 0. Since a joint entropy can be expressed as a nonnegative linear combination of the two elemental forms of Shannon's information measures, each e_j^T can be expressed as a nonnegative linear combination of the rows of G. This implies that Γ_n is a pyramid in the positive quadrant.
Let b be any column k-vector. Then b^T h, a linear combination of joint entropies, is always nonnegative if

Γ_n ⊆ { h : b^T h ≥ 0 }.

This is equivalent to saying that the minimum of the problem

(Primal) Minimize b^T h, subject to Gh ≥ 0

is zero. Since h = 0 gives b^T h = 0 (the origin is the only corner of Γ_n), all we need to do is to apply the optimality test of the simplex method [19] to check whether the point h = 0 is optimal.
We can obtain further insight into the problem from the Duality Theorem in linear programming [19]. The dual of the above linear programming problem is

(Dual) Maximize y^T · 0, subject to y ≥ 0 and y^T G = b^T

where y is a column m-vector. By the Duality Theorem, the maximum of the dual problem is also zero. Since the cost function in the dual problem is zero, the maximum of the dual problem is zero if and only if the feasible region

S = { y : y ≥ 0 and y^T G = b^T }   (21)
is nonempty.

Theorem 2: S is nonempty if and only if b^T = x^T G for some x ≥ 0, where x is a column m-vector, i.e., b^T is a nonnegative linear combination of the rows of G.
Proof: We omit the simple proof that is nonempty ifand only if for some , where is a column
-vector. Let
(22)
If for some , then . Let
Since it can be expressed as a nonnegative linear combination of the rows of
(23)
can also be expressed as a nonnegative linear combinationsof the rows of . By (22), this implies for some
Thus b^T h ≥ 0 always holds if and only if it is a nonnegative linear combination of the elemental inequalities (in canonical form).
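For n = 2 the matrix G of elemental inequalities is square and invertible, so the criterion above can be checked exactly: solve G^T x = b over the rationals and test whether x ≥ 0. A minimal illustrative sketch (ours, not the paper's implementation):

```python
from fractions import Fraction as F

# Elemental inequalities for n = 2 in canonical coordinates
# h = (H(X), H(Y), H(X,Y)):
#   H(X|Y) >= 0  ->  ( 0, -1,  1) . h >= 0
#   H(Y|X) >= 0  ->  (-1,  0,  1) . h >= 0
#   I(X;Y) >= 0  ->  ( 1,  1, -1) . h >= 0
G = [[F(0), F(-1), F(1)],
     [F(-1), F(0), F(1)],
     [F(1), F(1), F(-1)]]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

# Is b^T h = H(X) + H(Y) - H(X,Y), i.e. I(X;Y), always nonnegative?
# Solve G^T x = b and check x >= 0 (this G is symmetric, so G^T = G).
GT = [list(col) for col in zip(*G)]
x = solve(GT, [F(1), F(1), F(-1)])
assert all(xi >= 0 for xi in x)   # provable from the elemental inequalities
assert x == [F(0), F(0), F(1)]    # it is exactly the I(X;Y) >= 0 row
```

For n ≥ 3, G has more rows than columns and the test becomes the linear programming problem described above.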
We now summarize the results in this section. For information expressions f and g, let b^T h be the canonical form of f − g, and take b^T h as the cost function subject to the elemental inequalities. Then apply the optimality test of the simplex method to the point h = 0. If h = 0 is optimal, then f ≥ g always holds. If not, then f ≥ g may or may not always hold; if it always holds, it is not implied by the elemental inequalities. In other words, it cannot be proved by conventional techniques, namely, by invoking the elemental inequalities.
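When the optimality test fails, a cheap complementary check is to sample random joint distributions and evaluate the candidate expression numerically: a negative value refutes the inequality, while finding none proves nothing. A sketch of ours for the inequality of Example 2, which ITIP verifies as true in Section VI:

```python
import math, random
from itertools import product

def H(joint, axes):
    """Joint entropy in bits of the variables indexed by `axes`."""
    m = {}
    for o, p in joint.items():
        k = tuple(o[i] for i in axes)
        m[k] = m.get(k, 0.0) + p
    return -sum(p * math.log2(p) for p in m.values() if p > 0)

def random_joint(sizes, rng):
    """A random joint pmf over the product of alphabets of given sizes."""
    outcomes = list(product(*(range(s) for s in sizes)))
    w = [rng.random() for _ in outcomes]
    t = sum(w)
    return {o: wi / t for o, wi in zip(outcomes, w)}

# Example 2 (axes: 0 = X, 1 = Y, 2 = Z):
#   H(X,Y) - 1.04 H(Y) + 0.7 I(Y; X,Z) + 0.04 H(Y|Z) >= 0
def expr(j):
    i_y_xz = H(j, (1,)) + H(j, (0, 2)) - H(j, (0, 1, 2))   # I(Y; X,Z)
    h_y_z = H(j, (1, 2)) - H(j, (2,))                      # H(Y|Z)
    return H(j, (0, 1)) - 1.04 * H(j, (1,)) + 0.7 * i_y_xz + 0.04 * h_y_z

rng = random.Random(7)
worst = min(expr(random_joint((2, 2, 2), rng)) for _ in range(200))
assert worst >= -1e-9   # no counterexample found among the samples
```

Such sampling can only refute an inequality; establishing one still requires the calculus above.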
Han has previously studied unconstrained information inequalities involving three random variables [5] as well as information inequalities which are symmetrical in all the random variables involved [6], and explicit characterizations of such inequalities were obtained. A discussion of these results is found in Appendix II.
C. Constrained Inequalities
Linear constraints on X_1, X_2, ..., X_n arise frequently in information theory. Some examples are

1) X_1, X_2, and X_3 are mutually independent if and only if H(X_1, X_2, X_3) = H(X_1) + H(X_2) + H(X_3).
2) X_1, X_2, and X_3 are pairwise independent if and only if I(X_1; X_2) = I(X_2; X_3) = I(X_1; X_3) = 0.
3) X_1 is a function of X_2 if and only if H(X_1 | X_2) = 0.
4) X_1 → X_2 → X_3 → X_4 form a Markov chain if and only if I(X_1; X_3 | X_2) = 0 and I(X_1, X_2; X_4 | X_3) = 0.
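Each constraint of this kind is linear in the joint entropies and thus contributes rows to a constraint matrix. As an illustrative sketch (our own encoding), the Markov chain condition in example 4) becomes two rows over the 2^4 − 1 joint-entropy coordinates, via I(X_G; X_{G'} | X_{G''}) = H(X_{G∪G''}) + H(X_{G'∪G''}) − H(X_{G∪G'∪G''}) − H(X_{G''}):

```python
from itertools import combinations

# Coordinates: the 2^n - 1 nonempty subsets of {1,...,n}, n = 4.
n = 4
subsets = [frozenset(c) for k in range(1, n + 1)
           for c in combinations(range(1, n + 1), k)]
index = {s: i for i, s in enumerate(subsets)}

def row(G, Gp, Gpp=frozenset()):
    """Row encoding I(X_G; X_G' | X_G'') = 0 via
    H(GuG'') + H(G'uG'') - H(GuG'uG'') - H(G'')."""
    G, Gp, Gpp = frozenset(G), frozenset(Gp), frozenset(Gpp)
    r = [0] * len(subsets)
    for s, sign in [(G | Gpp, +1), (Gp | Gpp, +1),
                    (G | Gp | Gpp, -1), (Gpp, -1)]:
        if s:
            r[index[s]] += sign
    return r

# Markov chain X1 - X2 - X3 - X4 (example 4 above):
Q = [row({1}, {3}, {2}),        # I(X1; X3 | X2) = 0
     row({1, 2}, {4}, {3})]     # I(X1, X2; X4 | X3) = 0
assert len(Q[0]) == 2 ** n - 1  # 15 joint-entropy coordinates
```

Stacking such rows gives a constraint matrix of the kind used in the constrained linear program below.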
In order to facilitate our discussion, we now introduce an alternative set of notations for the coordinates of h. We do not distinguish elements and singletons of N_n, and we write unions of subsets of N_n as juxtapositions. For any nonempty G ⊆ N_n, we use h_G to denote H(X_G), i.e., the coordinate of h corresponding to X̃_G (refer to Section II for the definition of X̃_G). We also define, for nonempty G and G',

i_{G;G'|G''} = h_{GG''} + h_{G'G''} − h_{GG'G''} − h_{G''}

to simplify notations. In general, a constraint is given by a subset Φ of IR^k. For instance, for the last example above,

Φ = { h ∈ IR^k : i_{1;3|2} = 0 and i_{12;4|3} = 0 }.
When Φ = IR^k, there is no constraint. (In fact, there is no constraint if Γ*_n ⊆ Φ.) Parallel to our discussion in the preceding subsection, we have the following more general observation:

Under the constraint Φ, for any information expression f, f ≥ 0 always holds if and only if Γ*_n ∩ Φ ⊆ { h : f(h) ≥ 0 }.

Again, this gives a complete characterization of all constrained inequalities in terms of Γ*_n. Thus Γ*_n in fact is the origin of all constrained inequalities, with unconstrained inequalities being a special case. In this and the next subsection, however, we shall confine our discussion to the linear case.
When Φ is a subspace of IR^k, we can easily modify the method in the last subsection by taking advantage of the linear structure of the problem. Let the constraints on h be given by

Qh = 0   (24)

where Q is a q × k matrix (i.e., there are q constraints). Following our discussion in the last subsection, a linear combination of joint entropies b^T h is always nonnegative under the constraint (24) if the minimum of the problem

Minimize b^T h, subject to Gh ≥ 0 and Qh = 0

is zero. Let r be the rank of Q. Since h is in the null space of Q, we can write

h = Q̃h̃   (25)

where Q̃ is a k × (k − r) matrix whose columns form a basis of the orthogonal complement of the row space of Q, and h̃ is a column (k − r)-vector. Then the elemental inequalities can be expressed as

GQ̃h̃ ≥ 0   (26)

and in terms of h̃, Γ_n becomes

Γ̃_n = { h̃ : GQ̃h̃ ≥ 0 }   (27)

which is a pyramid in IR^{k−r} (but not necessarily in the positive quadrant). Likewise, b^T h can be expressed as b^T Q̃h̃.
With the constraints in (24) and all expressions in terms of h̃, b^T h is always nonnegative under the constraint if the minimum of the problem

Minimize b^T Q̃h̃, subject to GQ̃h̃ ≥ 0

is zero. Again, since h̃ = 0 gives b^T Q̃h̃ = 0 (the origin is the only corner of Γ̃_n), all we need to do is to apply the optimality test of the simplex method to check whether the point h̃ = 0 is optimal.
By imposing the constraints in (24), the number of elemental inequalities remains the same, while the dimension of the problem decreases from k to k − r. Again from the Duality Theorem, we see that b^T Q̃h̃ is always nonnegative if

b^T Q̃ = x^T GQ̃

for some x ≥ 0, where x is a column m-vector, i.e., b^T Q̃ is a nonnegative linear combination of the rows of GQ̃, that is, of the elemental inequalities (in terms of h̃).
We now summarize the results in this section. Let the constraints be given in (24). For expressions f and g, let b^T h be the canonical form of f − g. Then take b^T Q̃h̃ as the cost function subject to the elemental inequalities (in terms of h̃) and apply the optimality test to the point h̃ = 0. If h̃ = 0 is optimal, then f ≥ g always holds under the constraints; otherwise it may or may not always hold. If it always holds, it is not implied by the elemental inequalities. In other words, it cannot be proved by conventional techniques.
D. Constrained Identities
We impose the constraints in (24) as in the last subsection.As we have pointed out at the beginning of the paper, twoinformation expressions and are identical if and only if
and always hold. Thus we can apply themethod in the last subsection to verify all constrained identitiesthat can be proved by conventional techniques.
When X_1, X_2, ..., X_n are unconstrained, the uniqueness of the canonical form for linear information expressions asserts that b^T h is identically zero if and only if b = 0. However, when the constraints in (24) on X_1, X_2, ..., X_n are imposed, b^T h being identically zero does not imply b = 0. We give a simple example to illustrate this point. Suppose n = 2 and we impose the constraint H(X_1) = 0. Then every information expression can be expressed in terms of H(X_2) and H(X_1, X_2). Now consider

I(X_1; X_2) = H(X_2) − H(X_1, X_2).   (28)

Note that the coefficients in the above expression are nonzero. But from the elemental inequalities, we have

I(X_1; X_2) = H(X_2) − H(X_1, X_2) ≥ 0   (29)

and

H(X_1 | X_2) = H(X_1, X_2) − H(X_2) ≥ 0   (30)

which imply that H(X_2) − H(X_1, X_2) = 0.

We now discuss a special application of the method described in this subsection. Let us consider the following problem, which is typical in probability theory. Suppose we are given that certain Markov chains hold among the random variables and that certain pairs of them are independent, and we ask whether two particular random variables must then be independent. Each hypothesis and the desired conclusion can be written as a Shannon's information measure being equal to zero, so the problem can be formulated in information-theoretic terms, with the constraints represented by the corresponding conditional mutual informations set to zero, and we want to know whether they imply that the mutual information in question is zero.
Problems of this kind can be handled by the method described in this subsection. Our method can prove any independence relation which can be proved by conventional information-theoretic techniques. The advantage of using an information-theoretic formulation of the problem is that we can avoid manipulating the joint distribution directly, which is awkward [8], if not difficult.
It may be difficult to devise a calculus to handle independence relations of random variables in a general setting,2 because an independence relation is "discrete" in the sense that it is either true or false. On the other hand, the problem becomes a continuous one if it is formulated in information-theoretic terms (because mutual informations are continuous functionals), and continuous problems are in general less difficult to handle. From this point of view, the problem of determining independence of random variables is a discrete problem embedded in a continuous problem.
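The continuity point can be illustrated numerically: along a one-parameter family of joint distributions interpolating between a product distribution and perfectly correlated bits (an example of ours, not from the paper), the mutual information varies continuously and vanishes exactly at independence:

```python
import math

def mi2(eps):
    """I(X;Y) in bits for p(x,y) = (1-eps)/4 + (eps/2 if x == y else 0)
    on {0,1}^2; both marginals stay uniform for every eps in [0, 1]."""
    joint = [(1 - eps) / 4 + (eps / 2 if x == y else 0.0)
             for x in (0, 1) for y in (0, 1)]
    return sum(p * math.log2(p / 0.25) for p in joint if p > 0)

assert mi2(0.0) < 1e-12              # independence: zero mutual information
assert mi2(0.01) < mi2(0.5)          # small coupling, small information
assert abs(mi2(1.0) - 1.0) < 1e-12   # perfectly correlated bits: 1 bit
```

The binary predicate "X and Y are independent" is thus the zero set of a continuous functional, which is what the text means by a discrete problem embedded in a continuous one.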
2A calculus for independence relations has been devised by Massey [9] for the special case when the random variables have a causal interpretation.
V. FURTHER DISCUSSION ON Γ*_n
We have seen that Γ*_n ⊆ Γ_n, but it is not clear whether Γ*_n = Γ_n. If so, Γ*_n, and hence all information inequalities, are completely characterized by the elemental inequalities. In the following, we shall use the notations Γ*_n and Γ_n when we refer to Γ* and Γ for a specific n.
For n = 2, the elemental inequalities in I-Measure notations are

μ*(X̃_1 − X̃_2) ≥ 0,  μ*(X̃_2 − X̃_1) ≥ 0,  and  μ*(X̃_1 ∩ X̃_2) ≥ 0.

It then follows from Lemma 1 that Γ*_2 = Γ_2. Inspired by the current work, the characterization of Γ*_3 and Γ*_4 has recently been investigated by Zhang and Yeung. They have found that Γ*_3 ≠ Γ_3 (therefore Γ*_n ≠ Γ_n in general), but Γ̄*_3, the closure of Γ*_3, is equal to Γ_3 [29]. This implies that all unconstrained (linear or nonlinear) inequalities involving three random variables are consequences of the elemental inequalities of the same set of random variables. However, it is not clear whether the same is true for all constrained inequalities. They have also discovered the following conditional inequality involving four random variables which is not implied by the elemental inequalities:
If I(X_1; X_2) = 0 and I(X_1; X_2 | X_3) = 0, then

I(X_3; X_4) ≤ I(X_3; X_4 | X_1) + I(X_3; X_4 | X_2).

If, in addition,

I(X_3; X_4 | X_1) = 0 and I(X_3; X_4 | X_2) = 0

the above inequality implies that

I(X_3; X_4) = 0.
This is a conditional independence relation which is not implied by the elemental inequalities. However, whether Γ̄*_4 = Γ_4 remained an open problem. Subsequently, they have determined that Γ̄*_4 ≠ Γ_4 by discovering the following unconstrained inequality involving four random variables which is not implied by the elemental inequalities of the same set of random variables [30]:

2I(X_3; X_4) ≤ I(X_1; X_2) + I(X_1; X_3, X_4) + 3I(X_3; X_4 | X_1) + I(X_3; X_4 | X_2).
The existence of the above two inequalities indicates that there may be many information inequalities yet to be discovered. Since most converse coding theorems are proved by means of information inequalities, it is plausible that some of these inequalities yet to be discovered are needed to settle certain open problems in information theory.
In the remainder of the section, we shall further elaborate on the significance of Γ*_n by pointing out its relations with some important problems in probability theory and information theory.
A. Conditional Independence Relations
For any fixed number of random variables, a basic question is what sets of conditional independence relations are possible. In the recent work of Matus and Studeny [17], this problem is formulated as follows. Recall that N_n = {1, 2, ..., n}, and let S(n) be the family of all couples ⟨ij|K⟩, where K ⊆ N_n and ij is the union of two, not necessarily different, singletons i and j of N_n − K. Having a system ξ = (X_i)_{i ∈ N_n} of random variables with subsystems X_K = (X_i)_{i ∈ K}, K ⊆ N_n, we introduce the notation

[ξ] = { ⟨ij|K⟩ ∈ S(n) : X_i ⊥ X_j | X_K }

where X_i ⊥ X_j | X_K is the abbreviation of the statement "X_i is conditionally independent of X_j given X_K." For i = j, X_i ⊥ X_i | X_K means that X_i is determined by X_K. The subsystem X_∅ is presumed to be constant. A subfamily L ⊆ S(n) is called probabilistically (p-) representable if there exists a system ξ, called a p-representation, such that L = [ξ]. The problem is to characterize the class of all p-representable relations. Note that this problem is more general than the application discussed in Section IV-D.
Now X_i ⊥ X_j | X_K is equivalent to I(X_i; X_j | X_K) = 0. If K ∪ {i, j} is a proper subset of N_n, i.e., I(X_i; X_j | X_K) is not of elemental form, then it can be written as a nonnegative combination of the corresponding elemental forms of Shannon's information measure. We observe that I(X_i; X_j | X_K) = 0 if and only if each of the corresponding elemental forms of Shannon's information measures vanishes, and that an elemental form of Shannon's information measure vanishes if and only if the corresponding conditional independence relation holds. Thus it is actually unnecessary to consider ⟨ij|K⟩ with K ∪ {i, j} a proper subset of N_n separately, because such a relation is determined by the other conditional independence relations.
Let us now look at some examples. Forand
As pointed out in the last paragraph, the couplesare actually redundant. Let be a
system of random variables such that is notdeterministic, , , and and are notfunctions of each other. Then it is easy to see that
Thus it is p-representable. On the other hand, the other subfamily is not p-representable, because two of its relations imply a third which it does not contain.
The recent studies on the problem of conditional independence relations were launched by a seminal paper by Dawid [3], in which he proposed four axioms as heuristic properties of conditional independence. In information-theoretic terms, these four axioms can be summarized by the following statement:

I(X; Y, Z | W) = 0 if and only if I(X; Y | W) = 0 and I(X; Z | Y, W) = 0.
Subsequent work on this subject has been done by Pearl and his collaborators in the 1980's, and their work is summarized in the book by Pearl [18]. Their work has mainly been motivated by the study of the logic of integrity constraints from databases. Pearl conjectured that Dawid's four axioms completely characterize the conditional independence structure of any joint distribution. This conjecture, however, was refuted by the work of Studeny [20]. Since then, Matus and Studeny have written a series of papers on this problem [10]–[17], [20]–[24]. So far, they have solved the problem for three random variables, but the problem for four random variables remains open.
The relation between this problem and Γ*_n is the following. Suppose we want to determine whether a subfamily L of S(n) is p-representable. Now each couple ⟨ij|K⟩ corresponds to setting I(X_i; X_j | X_K) to zero. Note that { h : I(X_i; X_j | X_K) = 0 } is a hyperplane containing the origin in IR^k. Thus L is p-representable if and only if there exists an h in Γ*_n which lies on the hyperplanes corresponding precisely to the couples in L. Therefore, the problem of conditional independence relations is a subproblem of the problem of characterizing Γ*_n.
B. Optimization of Information Quantities
Consider minimizing given
(31)
(32)
(33)
This problem is equivalent to the following minimization problem.
Minimize subject to
(34)
(35)
(36)
and
(37)
As no characterization of Γ*_n is available, this minimization problem cannot be solved. Nevertheless, since Γ*_n ⊆ Γ_n, if we replace Γ*_n by Γ_n in the above minimization problem, it becomes a linear programming problem which renders a lower bound on the solution.
C. Multiuser Information Theory
The framework for information inequalities developed in this paper provides new tools for problems in multiuser information theory. Consider the source coding problem in Fig. 1, in which two source random variables are encoded,
Fig. 1. A multiterminal source coding problem.
and the blocks on the left and right are encoders and decoders, respectively. The three remaining random variables are the outputs of the corresponding encoders, and we are interested in the admissible region R of the triple of their coding rates; evidently, the entropies of the encoder outputs give the number of bits needed for the encoders. From the encoding and decoding requirements, we immediately have certain conditional entropies and conditional mutual informations of these variables equal to zero. Now there are five random
variables involved in this problem. Then the intersection of Γ*_5 and the set containing all h satisfying these zero constraints is the set of all possible vectors of the joint entropies involving the five random variables, given that they satisfy the encoding and decoding requirements of the problem as well as the constraints on the joint entropies involving the sources. Then R is given as the projection of this set on the three rate coordinates. In the same spirit as that in the last subsection, an explicit outer bound of R, denoted by R_LP, is given by replacing Γ*_5 by Γ_5.
We refer to an outer bound such as R_LP as an LP (linear programming) bound. This is a new tool for proving converse coding theorems for problems in multiuser information theory. The LP bound has already found applications in the recent work of Yeung and Zhang [28] on a new class of multiterminal source coding problems. We expect that this approach will have impact on other problems in multiuser information theory.
VI. CONCLUDING REMARKS
We have identified the region Γ*_n as the origin of all information inequalities. Our work suggests the possibility of the existence of information inequalities which cannot be proved by conventional techniques, and this has been confirmed by the recent results of Zhang and Yeung [29], [30].
A product from the framework we have developed is a simple calculus for verifying all linear information identities and inequalities involving a definite number of random variables, possibly with linear constraints, which can be proved by conventional techniques; these include all inequalities of such type in the literature. Based on this calculus, a software running on MATLAB called ITIP (Information-Theoretic Inequality Prover) has been developed by Yeung and Yan [27], and it is available on the World Wide Web. The following session from ITIP contains verifications of Examples 1 and 2, respectively, in Section I.

>> ITIP('I(Y; Z) >= I(X; Z)', 'I(X; Z|Y) = 0')
True
>> ITIP('H(X,Y) - 1.04 H(Y) + 0.7 I(Y; X,Z) + 0.04 H(Y|Z) >= 0')
True
We see from (19) that the amount of computation required is moderate when n is not large. Our work gives a partial answer to Han's question of what linear combinations of entropies are always nonnegative [5]. A complete answer to this question is impossible without further characterization of Γ*_n.
The characterization of Γ*_n is a very fundamental problem in information theory. However, in view of the difficulty of some special cases of this problem [15], [17], [29], [30], it is not very hopeful that this problem can be solved completely in the near future. Nevertheless, partial characterizations of Γ*_n may lead to the discovery of some new inequalities which make the solutions of certain open problems in information theory possible.
APPENDIX I
MINIMALITY OF THE ELEMENTAL INEQUALITIES
The elemental inequalities in set-theoretic notations have one of the following two forms:

1) μ(X̃_i − X̃_{N_n − {i}}) ≥ 0;
2) μ(X̃_i ∩ X̃_j − X̃_K) ≥ 0, where i ≠ j and K ⊆ N_n − {i, j}.

They will be referred to as α-inequalities and β-inequalities, respectively.
We are to show that all the elemental inequalities are nonredundant, i.e., none of them is implied by the others. For an α-inequality

μ(X̃_i − X̃_{N_n − {i}}) ≥ 0,   (38)

since it is the only elemental inequality which involves the atom X̃_i − X̃_{N_n − {i}}, it is clearly not implied by the other elemental inequalities. Therefore, we only need to show that all β-inequalities are nonredundant. To show that a β-inequality is nonredundant, it suffices to show that there exists a measure μ on F_n which satisfies all the elemental inequalities except for that one.
We shall show that the $\beta$-inequality

$\mu(\tilde{X}_i \cap \tilde{X}_j - \tilde{X}_K) \geq 0$ (39)
is nonredundant. To facilitate our discussion, we denote $N_n - \{i, j\} - K$ by $K'$, and we let $A_0, A_1, \cdots$ be the atoms in $\tilde{X}_i \cap \tilde{X}_j - \tilde{X}_K$, where

(40)
We first consider the case when $K = N_n - \{i, j\}$, i.e., when $\tilde{X}_i \cap \tilde{X}_j - \tilde{X}_K$ consists of a single atom $A_0$. We construct a measure $\mu$ by

$\mu(A) = -1$ if $A = A_0$, and $\mu(A) = 1$ otherwise (41)

where $A$ ranges over the atoms of $\mathcal{F}_n$. In other words, $A_0$ is the only atom with measure $-1$; all other atoms have measure $1$. Then $\mu(\tilde{X}_i \cap \tilde{X}_j - \tilde{X}_K) = \mu(A_0) = -1 < 0$ is trivially true. It is also trivial to
check that for any $i'$

$\mu(\tilde{X}_{i'} - \tilde{X}_{N_n - \{i'\}}) \geq 0$ (42)

and for any $(i', j', K')$ such that $i' \neq j'$ and $K' \subseteq N_n - \{i', j'\}$

$\mu(\tilde{X}_{i'} \cap \tilde{X}_{j'} - \tilde{X}_{K'}) \geq 0$ (43)

if $\{i', j'\} \neq \{i, j\}$. On the other hand, if $K'$ is a proper subset of $N_n - \{i, j\}$, then $\tilde{X}_i \cap \tilde{X}_j - \tilde{X}_{K'}$ contains at least two atoms, and therefore,

$\mu(\tilde{X}_i \cap \tilde{X}_j - \tilde{X}_{K'}) \geq 0.$ (44)
This completes the proof for the $\beta$-inequality in (39) to be nonredundant when $K = N_n - \{i, j\}$.
We now consider the case when $K \neq N_n - \{i, j\}$. We construct a measure $\mu$ as follows. For the atoms in $\tilde{X}_i \cap \tilde{X}_j - \tilde{X}_K$, let

(45)
For , if  is odd, it is referred to as an odd atom of $\tilde{X}_i \cap \tilde{X}_j - \tilde{X}_K$, and if  is even, it is referred to as an even atom of $\tilde{X}_i \cap \tilde{X}_j - \tilde{X}_K$. For any atom not in $\tilde{X}_i \cap \tilde{X}_j - \tilde{X}_K$, we let

(46)
This completes the construction of $\mu$.

We first prove that

(47)

Consider

where the last equality follows from the binomial formula

$\sum_{k=0}^{m} (-1)^k \binom{m}{k} = 0$ (48)

for $m \geq 1$. This proves (47).

Next we prove that $\mu$ satisfies all $\alpha$-inequalities. We note that for any $i'$, the atom $\tilde{X}_{i'} - \tilde{X}_{N_n - \{i'\}}$ is not in $\tilde{X}_i \cap \tilde{X}_j - \tilde{X}_K$. Thus

$\mu(\tilde{X}_{i'} - \tilde{X}_{N_n - \{i'\}}) \geq 0.$ (49)
It remains to prove that $\mu$ satisfies all $\beta$-inequalities except for (39), i.e., for any $(i', j', K')$ such that $i' \neq j'$ and $K' \subseteq N_n - \{i', j'\}$, with $(i', j', K') \neq (i, j, K)$

$\mu(\tilde{X}_{i'} \cap \tilde{X}_{j'} - \tilde{X}_{K'}) \geq 0.$ (50)

Consider

(51)
The nonnegativity of the second term above follows from (46). For the first term,  is nonempty if and only if

and (52)

If this condition is not satisfied, then the first term in (51) becomes $0$, and (50) follows immediately.
Let us assume that the condition in (52) is satisfied. Then by simple counting, we see that the number of atoms in

is equal to , where

For example, for , there are  atoms in

namely,

where  or  for . We check that
We first consider the case when , i.e.,

Then

contains exactly one atom. If this atom is an even atom of $\tilde{X}_i \cap \tilde{X}_j - \tilde{X}_K$, then the first term in (51) is either  or  (cf. (45)), and (50) follows immediately. If this atom is an odd atom of $\tilde{X}_i \cap \tilde{X}_j - \tilde{X}_K$, then the first term in (51) is equal to . This happens if and only if  and  have one common element, which implies that

is nonempty. Therefore, the second term in (51) is at least , and hence (50) follows.
Finally, we consider the case when . Using the binomial formula in (48), we see that the numbers of odd atoms and even atoms of $\tilde{X}_i \cap \tilde{X}_j - \tilde{X}_K$ in

are the same. Therefore, the first term in (51) is equal to  if

and is equal to  otherwise. The former is true if and only if , which implies that

is nonempty, or that the second term is at least . Thus in either case (50) is true. This completes the proof that (39) is nonredundant.
APPENDIX II
SOME SPECIAL FORMS OF UNCONSTRAINED INFORMATION INEQUALITIES
In this appendix, we shall discuss some special forms of unconstrained linear information inequalities previously investigated by Han [5], [6]. Explicit necessary and sufficient conditions for these inequalities to always hold have been obtained. The relation between these inequalities and the results in the current paper will also be discussed.
A. Symmetrical Information Inequalities
An information expression is said to be symmetrical if it is identical under every permutation among the random variables $X_1, X_2, \cdots, X_n$. For example, for , the expression

is symmetrical. This can be seen by permuting  and  symbolically in the expression. Now let us consider the expression . If we replace  and  by each other, the expression becomes , which is symbolically different from the original expression. However, both expressions are identical to . Therefore, the two expressions are in fact identical, and the expression  is actually symmetrical although it is not readily recognized symbolically.
The symmetry of an information expression in general cannot be recognized symbolically. However, it is readily recognized symbolically if the expression is in canonical form. This is due to the uniqueness of the canonical form as discussed in Section III.
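This observation can be sketched in code: store an expression by its canonical-form coefficients over joint entropies of nonempty subsets, and test invariance under all permutations (a hypothetical helper, not part of ITIP):

```python
# Symmetry test on the canonical form: an expression is symmetrical iff
# its coefficient map over subsets is invariant under every permutation
# of the random variables {0, ..., n-1}.
from itertools import combinations, permutations

def is_symmetrical(coeffs, n):
    for p in permutations(range(n)):
        for s, c in coeffs.items():
            if coeffs.get(frozenset(p[i] for i in s), 0) != c:
                return False
    return True

# Example: I(X0;X1) + I(X0;X2) + I(X1;X2), with each mutual information
# expanded into joint entropies H(i) + H(j) - H(ij) (its canonical form).
n = 3
coeffs = {}
for i, j in combinations(range(n), 2):
    for s, c in ((frozenset({i}), 1), (frozenset({j}), 1), (frozenset({i, j}), -1)):
        coeffs[s] = coeffs.get(s, 0) + c
print(is_symmetrical(coeffs, n))  # → True
```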
Consider a linear symmetrical information expression  (in canonical form). As seen in Section IV-B,  can be expressed as a linear combination of the two elemental forms of Shannon's information measures. It was shown in [5] that every symmetrical expression can be written in the form

where

and, for ,

Note that  is the sum of all Shannon's information measures of the first elemental form, and for ,  is the sum of all Shannon's information measures of the second elemental form conditioning on  random variables.
It follows trivially from the elemental inequalities that  is a sufficient condition for

to always hold. The necessity of this condition can be seen by noting the existence of random variables  for each  such that  and  for all  and . This implies that all unconstrained linear symmetrical information inequalities are consequences of the elemental inequalities. We refer the reader to [5] for a more detailed discussion of symmetrical information inequalities.
B. Information Inequalities Involving Three Random Variables
Consider . Let
and
and let
Since  is an invertible linear transformation of , all linear information expressions can be written as , where
It was shown in [6] that  always holds if and only if the following conditions are satisfied:
(53)
In terms of , the elemental inequalities can be expressed as , where
From the discussion in Section IV-B, we see that  always holds if and only if  is a nonnegative linear combination of the rows of . We leave it as an exercise for the reader to show that  is a nonnegative linear combination of the rows of  if and only if the conditions in (53) are satisfied. Therefore, all unconstrained linear inequalities involving three random variables are consequences of the elemental inequalities. This result also implies that $\Gamma_3$ is the smallest pyramid containing $\Gamma_3^*$.
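The criterion above suggests a direct computational check, which is essentially what ITIP mechanizes. The following sketch (assuming scipy is available; the matrix construction and the name `provable` are illustrative, not ITIP's actual code) tests whether a canonical-form expression $\mathbf{b}^T\mathbf{h} \geq 0$ is a nonnegative combination of the rows of the elemental-inequality matrix for $n = 3$, posed as a linear-programming feasibility problem:

```python
# LP sketch: b.h >= 0 is provable by the elemental inequalities iff
# b = G^T lambda for some lambda >= 0, where the rows of G are the
# elemental inequalities over the 2^n - 1 joint-entropy coordinates.
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

n = 3
subsets = [frozenset(s) for k in range(1, n + 1) for s in combinations(range(n), k)]
idx = {s: i for i, s in enumerate(subsets)}   # 2^3 - 1 = 7 coordinates

rows = []
N = frozenset(range(n))
for i in range(n):                            # H(X_i | X_{N - {i}}) >= 0
    r = np.zeros(len(subsets))
    r[idx[N]] += 1
    r[idx[N - {i}]] -= 1
    rows.append(r)
for i, j in combinations(range(n), 2):        # I(X_i; X_j | X_K) >= 0
    rest = N - {i, j}
    for k in range(len(rest) + 1):
        for K in combinations(sorted(rest), k):
            K = frozenset(K)
            r = np.zeros(len(subsets))
            r[idx[K | {i}]] += 1
            r[idx[K | {j}]] += 1
            r[idx[K | {i, j}]] -= 1
            if K:
                r[idx[K]] -= 1
            rows.append(r)
G = np.array(rows)                            # 9 x 7 for n = 3

def provable(b):
    """True iff b = G^T lambda for some lambda >= 0 (LP feasibility)."""
    res = linprog(np.zeros(len(G)), A_eq=G.T, b_eq=b,
                  bounds=[(0, None)] * len(G), method="highs")
    return bool(res.success)

# I(X0; X1) = H({0}) + H({1}) - H({0,1}) >= 0 should be provable:
b = np.zeros(len(subsets))
b[idx[frozenset({0})]] += 1
b[idx[frozenset({1})]] += 1
b[idx[frozenset({0, 1})]] -= 1
print(provable(b))  # → True
```

Replacing the feasibility test by one over the constrained region handles linear constraints as in the ITIP session of Section VI.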
ACKNOWLEDGMENT
The author wishes to acknowledge the help of a few individuals during the preparation of this paper. They include I. Csiszar, B. Hajek, F. Matus, Y.-O. Yan, E.-h. Yang, and Z. Zhang.
REFERENCES
[1] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[2] I. Csiszar and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.
[3] A. P. Dawid, "Conditional independence in statistical theory (with discussion)," J. Roy. Statist. Soc., Ser. B, vol. 41, pp. 1–31, 1979.
[4] T. S. Han, "Linear dependence structure of the entropy space," Inform. Contr., vol. 29, pp. 337–368, 1975.
[5] ——, "Nonnegative entropy measures of multivariate symmetric correlations," Inform. Contr., vol. 36, pp. 133–156, 1978.
[6] ——, "A uniqueness of Shannon's information distance and related nonnegativity problems," J. Combin., Inform. Syst. Sci., vol. 6, no. 4, pp. 320–331, 1981.
[7] T. Kawabata and R. W. Yeung, "The structure of the I-Measure of a Markov chain," IEEE Trans. Inform. Theory, vol. 38, pp. 1146–1149, May 1992.
[8] J. L. Massey, "Determining the independence of random variables," in 1995 IEEE Int. Symp. on Information Theory (Whistler, BC, Canada, Sept. 17–22, 1995).
[9] ——, "Causal interpretations of random variables," in 1995 IEEE Int. Symp. on Information Theory (Special session in honor of Mark Pinsker on the occasion of his 70th birthday) (Whistler, BC, Canada, Sept. 17–22, 1995).
[10] F. Matus, "Abstract functional dependency structures," Theor. Comput. Sci., vol. 81, pp. 117–126, 1991.
[11] ——, "On equivalence of Markov properties over undirected graphs," J. Appl. Probab., vol. 29, pp. 745–749, 1992.
[12] ——, "Ascending and descending conditional independence relations," in Trans. 11th Prague Conf. on Information Theory, Statistical Decision Functions and Random Processes (Academia, Prague, 1992), vol. B, pp. 181–200.
[13] ——, "Probabilistic conditional independence structures and matroid theory: Background," Int. J. General Syst., vol. 22, pp. 185–196, 1994.
[14] ——, "Extreme convex set functions with many nonnegative differences," Discr. Math., vol. 135, pp. 177–191, 1994.
[15] ——, "Conditional independence among four random variables II," Combin., Prob. Comput., to be published.
[16] ——, "Conditional independence structures examined via minors," Ann. Math. Artificial Intell., submitted for publication.
[17] F. Matus and M. Studeny, "Conditional independence among four random variables I," Combin., Prob. Comput., to be published.
[18] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufman, 1988.
[19] G. Strang, Linear Algebra and Its Applications, 2nd ed. New York: Academic, 1980.
[20] M. Studeny, "Attempts at axiomatic description of conditional independence," in Proc. Work. on Uncertainty Processing in Expert Systems, supplement to Kybernetika, vol. 25, nos. 1–3, pp. 65–72, 1989.
[21] ——, "Multiinformation and the problem of characterization of conditional independence relations," Probl. Contr. and Inform. Theory, vol. 18, pp. 3–16, 1989.
[22] ——, "Conditional independence relations have no finite complete characterization," in Trans. 11th Prague Conf. on Information Theory, Statistical Decision Functions and Random Processes (Academia, Prague, 1992), vol. B, pp. 377–396.
[23] ——, "Structural semigraphoids," Int. J. Gen. Syst., submitted for publication.
[24] ——, "Descriptions of structures of stochastic independence by means of faces and imsets (in three parts)," Int. J. Gen. Syst., submitted for publication.
[25] R. W. Yeung, "A new outlook on Shannon's information measures," IEEE Trans. Inform. Theory, vol. 37, pp. 466–474, May 1991.
[26] ——, "Multilevel diversity coding with distortion," IEEE Trans. Inform. Theory, vol. 41, pp. 412–422, Mar. 1995.
[27] R. W. Yeung and Y.-O. Yan, ITIP, [Online] Available WWW: http://www.ie.cuhk.edu.hk/~ITIP.
[28] R. W. Yeung and Z. Zhang, "Multilevel distributed source coding," in 1997 IEEE Int. Symp. on Information Theory (Ulm, Germany, June 1997), p. 276.
[29] Z. Zhang and R. W. Yeung, "A non-Shannon type conditional information inequality," this issue, pp. 1982–1986.
[30] ——, "On the characterization of entropy function via information inequalities," to be published in IEEE Trans. Inform. Theory.