ANALELE STIINTIFICE ALE UNIVERSITATII ”AL.I.CUZA” IASI, Tomul XLVII, s.I a, Matematica, 2001, f.2.

A COUNTERPART OF JENSEN'S CONTINUOUS INEQUALITY AND APPLICATIONS IN INFORMATION THEORY
BY
S.S. DRAGOMIR∗ and C.J. GOH
Abstract. We derive a counterpart of Jensen's continuous inequality for differentiable convex mappings. This seemingly trivial extension has significant applications in information theory. In particular, we derive several useful bounds for (differential) entropy, joint entropy, conditional entropy, and mutual information arising from the theory of information. We also apply it to derive bounds for entropy measures for mixed populations of random variables.
AMS Subject Classification: 94D17, 26D15, 26D20, 26D99.
Key words: Jensen's inequality, Information theory, Differential Entropy.
1. Introduction. Consider a random variable $X : \Omega \to I \subset \mathbb{R}$ (where $\Omega$ is some sample space) with a corresponding density function $p_X : I \to [0,1]$ which has finite support, so that $p_X(x) = 0$ for all $x \notin I$. For notational clarity, we shall suppress the subscript, i.e., the random variable with which the density function is associated, so that $p(x)$ is the same as $p_X(x)$. Let $Y(X)$ be another random variable which is a function of the random variable $X$, and let $f : \mathbb{R} \to \mathbb{R}$ be some convex differentiable function, so that
$$f(x) - f(y) \ge f'(y)(x - y) \qquad \forall x, y \in \mathbb{R}.\tag{1.1}$$
Define the inner product of two functions $\varphi : I \to \mathbb{R}$ and $\psi : I \to \mathbb{R}$ to be
$$\langle \varphi, \psi \rangle = \int_I \varphi(x)\psi(x)p(x)\,dx\tag{1.2}$$
∗SSD acknowledges the financial support from the University of Western Australia during his visit, when this work was completed.
with
$$\|\varphi\|^2 = \int_I \varphi^2(x)p(x)\,dx.\tag{1.3}$$
The well-known Schwarz inequality is given by
$$|\langle \varphi, \psi \rangle| \le \|\varphi\|\,\|\psi\|\tag{1.4}$$
or
$$|\mathbb{E}(\varphi(X)\psi(X))| \le \left[\mathbb{E}(\varphi(X)^2)\right]^{1/2}\left[\mathbb{E}(\psi(X)^2)\right]^{1/2},\tag{1.5}$$
where $\mathbb{E}(\cdot) := \int_I (\cdot)\,p(x)\,dx$. The following inequality is well known in the literature as the (continuous) Jensen inequality for the convex function $f$:
$$f(\mathbb{E}(Y)) = f\left[\int_I Y(x)p(x)\,dx\right] \le \int_I f(Y(x))p(x)\,dx = \mathbb{E}(f(Y)).\tag{1.6}$$
It is the aim of this paper to derive a counterpart of Jensen's inequality, and to apply it to obtain useful upper bounds for several quantitative measures arising from information theory for continuous random variables. The following is the continuous version of Theorem 2.1 of [1].
Theorem 1.1. Let $f : \mathbb{R} \to \mathbb{R}$ be convex and differentiable, let $Y$ be a random variable (as a function of the random variable $X$), and let $\mathbb{E}(\cdot) = \int_I (\cdot)\,p(x)\,dx$. Then we have
$$0 \le \mathbb{E}[f(Y)] - f[\mathbb{E}(Y)] \le \mathbb{E}[Yf'(Y)] - \mathbb{E}[f'(Y)]\,\mathbb{E}(Y) \le \mathbb{E}[f'(Y)^2]^{1/2}\left(\mathbb{E}(Y^2) - [\mathbb{E}(Y)]^2\right)^{1/2}.\tag{1.7}$$
Proof. The first inequality is just Jensen's, which follows readily from the convexity of $f$. Since $f$ is convex, we have
$$f(\mathbb{E}(Y)) - f(Y) \ge f'(Y)[\mathbb{E}(Y) - Y].$$
Taking expectations on both sides, we have
$$f[\mathbb{E}(Y)] - \mathbb{E}[f(Y)] \ge \mathbb{E}[f'(Y)]\,\mathbb{E}(Y) - \mathbb{E}[Yf'(Y)],$$
which is the second inequality. Furthermore,
$$\mathbb{E}[f'(Y)]\,\mathbb{E}(Y) - \mathbb{E}[Yf'(Y)] = \mathbb{E}[f'(Y)\mathbb{E}(Y) - Yf'(Y)] = \mathbb{E}[f'(Y)(\mathbb{E}(Y) - Y)] \le$$
$$\le \mathbb{E}[f'(Y)^2]^{1/2}\,\mathbb{E}[(Y - \mathbb{E}(Y))^2]^{1/2} \quad \text{(by the Schwarz inequality)}$$
$$= \mathbb{E}[f'(Y)^2]^{1/2}\left(\mathbb{E}(Y^2) - [\mathbb{E}(Y)]^2\right)^{1/2},$$
which is the third inequality.
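As a quick numerical sanity check of the chain (1.7), the sketch below evaluates all three bounds for a small discrete distribution and the convex mapping $f(y) = y^2$; both the distribution and the choice of $f$ are illustrative assumptions, not taken from the paper.

```python
import math

# Discrete sketch of the chain (1.7) with f(y) = y^2 (so f'(y) = 2y).
# The three-point distribution below is an illustrative assumption.
p = [0.2, 0.5, 0.3]          # probabilities
y = [1.0, 2.0, 4.0]          # values of Y

E = lambda vals: sum(pi * v for pi, v in zip(p, vals))
f = lambda t: t * t
df = lambda t: 2.0 * t

# 0 <= E f(Y) - f(E Y) <= E[Y f'(Y)] - E[f'(Y)] E(Y) <= Schwarz bound
jensen_gap = E([f(v) for v in y]) - f(E(y))
counterpart = E([v * df(v) for v in y]) - E([df(v) for v in y]) * E(y)
schwarz = math.sqrt(E([df(v) ** 2 for v in y])) * math.sqrt(E([f(v) for v in y]) - E(y) ** 2)

assert 0.0 <= jensen_gap <= counterpart <= schwarz
print(jensen_gap, counterpart, schwarz)
```

For $f(y) = y^2$ the middle term is exactly twice the Jensen gap ($2\,\mathrm{Var}(Y)$ versus $\mathrm{Var}(Y)$), which makes the ordering easy to see.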
2. Applications in Information Theory: Bounds for Differential Entropy. Given a continuous random variable $X : \Omega \to I$, the uncertainty of the associated random event is given by the entropy function (see McEliece [2]):
$$H_b(X) = \mathbb{E}(-\log_b p(X)) = -\int_I p(x)\log_b(p(x))\,dx.\tag{2.1}$$
Theorem 2.1. With the above assumptions, we have:
$$0 \le \log_b|I| - H_b(X) \le \frac{1}{\ln b}\left(|I|\int_I p(x)^2\,dx - 1\right) = \frac{1}{2\ln b}\int_I\int_I (p(x) - p(y))^2\,dx\,dy,\tag{2.2}$$
where equality holds if and only if p(x) is constant almost everywhere in I.
Proof. In Theorem 1.1, let
$$f(y) = -\log_b y, \qquad f'(y) = -\frac{1}{\ln b}\,\frac{1}{y}, \qquad Y(x) = \frac{1}{p(x)};$$
we have
$$0 \le \int_I p(x)\log_b p(x)\,dx + \log_b\left(\int_I \frac{p(x)}{p(x)}\,dx\right) \le$$
$$\le \int_I\left[-\frac{1}{\ln b}\,p(x)\right]dx + \frac{1}{\ln b}\int_I p(x)^2\,dx\int_I \frac{p(x)}{p(x)}\,dx = \frac{1}{\ln b}\left[|I|\int_I p(x)^2\,dx - 1\right]$$
or
$$0 \le \log_b|I| - H_b(X) \le \frac{1}{\ln b}\left(|I|\int_I p(x)^2\,dx - 1\right) = \frac{1}{2\ln b}\int_I\int_I (p(x) - p(y))^2\,dx\,dy.$$
The following Corollary is immediately obvious.
Corollary 2.2. Let the density p be such that
$$\max_{(x,y)\in I^2} |p(x) - p(y)| \le \frac{\sqrt{2\varepsilon \ln b}}{|I|}\tag{2.3}$$
for some ε > 0, then
$$0 \le \log_b|I| - H_b(X) \le \varepsilon.\tag{2.4}$$
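The two-sided estimate (2.2) can be checked by quadrature. The sketch below uses the illustrative density $p(x) = 1 + 0.5\cos(2\pi x)$ on $I = [0,1]$ with base $b = 2$ (all assumptions for the sketch, not from the paper); here $\log_b|I| = 0$ and the differential entropy is negative, so the gap is positive.

```python
import math

# Midpoint-rule sketch of the bounds in (2.2) on I = [0, 1], base b = 2.
# The density p(x) = 1 + 0.5*cos(2*pi*x) integrates to 1 on [0, 1]; it is
# an illustrative choice, not taken from the paper.
b, n = 2.0, 2000
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]
p = [1.0 + 0.5 * math.cos(2 * math.pi * x) for x in xs]

H = -sum(pi * math.log(pi, b) * h for pi in p)        # H_b(X)
gap = math.log(1.0, b) - H                            # log_b|I| - H_b(X), |I| = 1
bound = (1.0 * sum(pi * pi * h for pi in p) - 1.0) / math.log(b)

assert -1e-9 <= gap <= bound + 1e-9
print(gap, bound)
```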
The above result can be easily extended to the joint entropy of two random variables. Consider two continuous random variables $X : \Omega \to I$ and $Y : \Omega \to J$ with joint density function $p(x,y)$. The joint entropy of $X$ and $Y$ is defined as
$$H_b(X,Y) = -\int_I\int_J p(x,y)\log_b p(x,y)\,dx\,dy.\tag{2.5}$$
The proof of the following Theorem and its Corollary is similar to that of Theorem 2.1.
Theorem 2.3. With the above assumptions, we have
$$0 \le \log_b|I\times J| - H_b(X,Y) \le \frac{1}{2\ln b}\int_I\int_J\int_I\int_J (p(x,y) - p(u,v))^2\,dx\,dy\,du\,dv,\tag{2.6}$$
where equality holds if and only if $p(x,y)$ is constant almost everywhere in $I\times J$.
Corollary 2.4. Let the joint density p(x, y) be such that
$$\max_{(x,y),(u,v)\in I\times J} |p(x,y) - p(u,v)| \le \frac{\sqrt{2\varepsilon \ln b}}{|I\times J|}\tag{2.7}$$
for some ε > 0, then
$$0 \le \log_b|I\times J| - H_b(X,Y) \le \varepsilon.\tag{2.8}$$
3. Results for Conditional Entropy. Consider a triplet of random variables $X : \Omega \to I$, $Y : \Omega \to J$ and $Z : \Omega \to K$. The conditional entropy of $X$ given $Y$ is defined as
$$H_b(X \mid Y) = \int_I\int_J p(x,y)\log_b\frac{1}{p(x \mid y)}\,dx\,dy,\tag{3.1}$$
where $p(x \mid y) = p(x,y)/p(y)$ is the conditional probability of $X$ given $Y = y$. We also define the conditional probability of $Z$ given $Y = y$ as $p(z \mid y)$, and the conditional probability of $Z$ given $Y = y$ and $X = x$ as $p(z \mid x,y)$. Let
$$A(z) := \int_I\int_J \gamma_{xy}(z)\,dx\,dy,\tag{3.2}$$
where
$$\gamma_{xy}(z) := \frac{p(x,y,z)}{p(x \mid y)} \qquad \forall x, y, z.\tag{3.3}$$
Theorem 3.1. Given the assumptions above, we have:
$$0 \le H_b(Z) + \mathbb{E}_Z[\log_b A(Z)] - H_b(X \mid Y) \le$$
$$\le \frac{1}{2\ln b}\int_K \frac{1}{p(z)}\int_I\int_J\int_I\int_J \gamma_{xy}(z)\gamma_{uv}(z)\,(p(x \mid y) - p(u \mid v))^2\,dx\,dy\,du\,dv\,dz,$$
where $\mathbb{E}_Z(\cdot) = \int_K (\cdot)\,p(z)\,dz$.
Proof. To prove the Theorem, we need to modify Theorem 1.1 to read:
$$0 \le \mathbb{E}_{X,Y\mid z}\big(f(W(X,Y))\big) - f\big(\mathbb{E}_{X,Y\mid z}(W(X,Y))\big) \le$$
$$\le \mathbb{E}_{X,Y\mid z}\big(W(X,Y)f'(W(X,Y))\big) - \mathbb{E}_{X,Y\mid z}\big(f'(W(X,Y))\big)\,\mathbb{E}_{X,Y\mid z}\big(W(X,Y)\big).$$
Let
$$f(W) = -\log_b W, \qquad f'(W) = -\frac{1}{\ln b}\,\frac{1}{W}, \qquad W(x,y) = \frac{1}{p(x \mid y)}$$
in the above, we obtain
$$0 \le \int_I\int_J\left[-p(x,y \mid z)\log_b\frac{1}{p(x \mid y)}\right]dx\,dy + \log_b\left(\int_I\int_J \frac{p(x,y \mid z)}{p(x \mid y)}\,dx\,dy\right) \le\tag{3.5}$$
$$\le \frac{1}{\ln b}\int_I\int_J\left[-p(x,y \mid z)\,\frac{p(x \mid y)}{p(x \mid y)}\right]dx\,dy + \frac{1}{\ln b}\int_I\int_J p(x,y \mid z)\,p(x \mid y)\,dx\,dy \int_I\int_J \frac{p(u,v \mid z)}{p(u \mid v)}\,du\,dv.$$
The last two terms on the right-hand side of the second inequality can be rewritten as:
$$\int_I\int_J \frac{p(x,y,z)}{p(z)}\,p(x \mid y)\,dx\,dy \cdot \int_I\int_J \frac{p(u,v,z)}{p(z)\,p(u \mid v)}\,du\,dv - 1 =$$
$$= \frac{1}{p(z)^2}\left(\int_I\int_J \frac{p(x,y,z)}{p(x \mid y)}\,p(x \mid y)^2\,dx\,dy \cdot \int_I\int_J \frac{p(u,v,z)}{p(u \mid v)}\,du\,dv - p(z)^2\right) =$$
$$= \frac{1}{p(z)^2}\left(\int_I\int_J \gamma_{xy}(z)\,p(x \mid y)^2\,dx\,dy \cdot \int_I\int_J \gamma_{uv}(z)\,du\,dv - \left(\int_I\int_J p(x,y,z)\,dx\,dy\right)^2\right) =$$
$$= \frac{1}{2p(z)^2}\int_I\int_J\int_I\int_J \gamma_{xy}(z)\gamma_{uv}(z)\,(p(x \mid y) - p(u \mid v))^2\,dx\,dy\,du\,dv.$$
Substituting this term for the last two terms of (3.5), we get:
$$0 \le \int_I\int_J\left[-\frac{p(x,y,z)}{p(z)}\log_b\frac{1}{p(x \mid y)}\right]dx\,dy + \log_b\left(\int_I\int_J \frac{p(x,y,z)}{p(z)\,p(x \mid y)}\,dx\,dy\right) \le$$
$$\le \frac{1}{2p(z)^2\ln b}\int_I\int_J\int_I\int_J \gamma_{xy}(z)\gamma_{uv}(z)\,(p(x \mid y) - p(u \mid v))^2\,dx\,dy\,du\,dv.$$
Multiplying the above by $p(z)$ and integrating over $z$ gives:
$$0 \le \int_K p(z)\log_b\frac{1}{p(z)}\,dz + \int_K p(z)\log_b\left(\int_I\int_J \frac{p(x,y,z)}{p(x \mid y)}\,dx\,dy\right)dz - \int_I\int_J\int_K p(x,y,z)\log_b\frac{1}{p(x \mid y)}\,dx\,dy\,dz \le$$
$$\le \frac{1}{2\ln b}\int_K \frac{1}{p(z)}\left(\int_I\int_J\int_I\int_J \gamma_{xy}(z)\gamma_{uv}(z)\,(p(x \mid y) - p(u \mid v))^2\,dx\,dy\,du\,dv\right)dz,$$
or
$$0 \le H_b(Z) + \mathbb{E}_Z(\log_b A(Z)) - H_b(X \mid Y) \le \frac{1}{2\ln b}\int_K \frac{1}{p(z)}\left(\int_I\int_J\int_I\int_J \gamma_{xy}(z)\gamma_{uv}(z)\,(p(x \mid y) - p(u \mid v))^2\,dx\,dy\,du\,dv\right)dz,$$
which completes the proof. The following Corollary follows readily from the above:
Corollary 3.2. Let ε > 0 be given. If
$$\max_{(x,y),(u,v)} |p(x \mid y) - p(u \mid v)| \le \sqrt{\frac{2\varepsilon \ln b}{M}},\tag{3.6}$$
where
$$M := \int_K \frac{A(z)^2}{p(z)}\,dz = \int_K \frac{1}{p(z)}\left(\int_I\int_J \gamma_{xy}(z)\,dx\,dy\right)^2 dz,\tag{3.7}$$
then
$$0 \le H_b(Z) + \mathbb{E}_Z(\log_b A(Z)) - H_b(X \mid Y) \le \varepsilon.\tag{3.8}$$
Using Theorem 1.1, we can readily obtain the following result and its corollary:
Theorem 3.3. With the above assumptions, we have:
$$0 \le \log_b|I| - H_b(X \mid Y) \le \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p(y)p(v)\,(p(x \mid y) - p(u \mid v))^2\,dx\,dy\,du\,dv.\tag{3.9}$$
Corollary 3.4. Let ε > 0 be given. If
$$\max_{(x,y),(u,v)} |p(x \mid y) - p(u \mid v)| \le \frac{\sqrt{2\varepsilon \ln b}}{|I|},\tag{3.10}$$
then
$$0 \le \log_b|I| - H_b(X \mid Y) \le \varepsilon.\tag{3.11}$$
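Theorem 3.3 admits a similar numerical check. The sketch below uses a two-point $Y$ with illustrative weights and conditional densities on $I = [0,1]$ (all assumptions, not from the paper); the quadruple integral in (3.9) reduces to first and second moments of the conditional densities.

```python
import math

# Quadrature sketch of Theorem 3.3 on I = [0, 1], base b = 2, with a
# two-point Y.  The weights p(y) and the conditional densities below are
# illustrative assumptions, not taken from the paper.
b, n = 2.0, 2000
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]
py = [0.6, 0.4]
cond = [[1.0 + 0.5 * math.cos(2 * math.pi * x) for x in xs],   # p(x | y1)
        [1.0 - 0.5 * math.cos(2 * math.pi * x) for x in xs]]   # p(x | y2)

# H_b(X | Y) = sum_y p(y) * (-int p(x|y) log_b p(x|y) dx); log_b|I| = 0.
H_cond = sum(py[k] * -sum(q * math.log(q, b) * h for q in cond[k])
             for k in range(2))
gap = 0.0 - H_cond

# int int (p(x|y) - p(u|v))^2 dx du, expanded via first and second moments.
m1 = [sum(q * h for q in c) for c in cond]
m2 = [sum(q * q * h for q in c) for c in cond]
bound = sum(py[k] * py[l] * (m2[k] + m2[l] - 2.0 * m1[k] * m1[l])
            for k in range(2) for l in range(2)) / (2.0 * math.log(b))

assert -1e-9 <= gap <= bound + 1e-9
print(gap, bound)
```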
4. Results for Mutual Information. Given the pair of continuous random variables $X : \Omega \to I$ and $Y : \Omega \to J$ with joint density function $p(x,y)$, the mutual information is defined as [2, p. 24]:
$$I_b(X;Y) := H_b(X) - H_b(X \mid Y) = \int_I\int_J p(x,y)\log_b\frac{p(x,y)}{p(x)p(y)}\,dx\,dy.\tag{4.1}$$
Theorem 4.1. With the above assumptions, we have:
$$0 \le I_b(X;Y) \le \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p(x)p(y)p(u)p(v)\left(\frac{p(x,y)}{p(x)p(y)} - \frac{p(u,v)}{p(u)p(v)}\right)^2 dx\,dy\,du\,dv.\tag{4.2}$$
Moreover, equality holds in both inequalities simultaneously if and only if $X$ and $Y$ are independent.
Proof. To prove this Theorem, we modify Theorem 1.1 to read:
$$0 \le \mathbb{E}_{X,Y}\big(f(W(X,Y))\big) - f\big(\mathbb{E}_{X,Y}W(X,Y)\big) \le \mathbb{E}_{X,Y}\big(W(X,Y)f'(W(X,Y))\big) - \mathbb{E}_{X,Y}f'(W(X,Y))\,\mathbb{E}_{X,Y}W(X,Y),\tag{4.3}$$
where $\mathbb{E}_{X,Y}(\cdot) = \int_I\int_J (\cdot)\,p(x,y)\,dx\,dy$. Let
$$f(W) = -\log_b W, \qquad f'(W) = -\frac{1}{\ln b}\,\frac{1}{W}, \qquad W(x,y) = \frac{p(x)p(y)}{p(x,y)},$$
in the above to obtain
$$0 \le \int_I\int_J p(x,y)\log_b\frac{p(x,y)}{p(x)p(y)}\,dx\,dy + \log_b\left(\int_I\int_J p(x,y)\,\frac{p(x)p(y)}{p(x,y)}\,dx\,dy\right) \le$$
$$\le -\frac{1}{\ln b}\int_I\int_J \frac{p(x)p(y)}{p(x,y)}\,\frac{p(x,y)}{p(x)p(y)}\,p(x,y)\,dx\,dy + \frac{1}{\ln b}\int_I\int_J \frac{p(x,y)^2}{p(x)p(y)}\,dx\,dy \int_I\int_J \frac{p(x)p(y)}{p(x,y)}\,p(x,y)\,dx\,dy$$
or
$$0 \le I_b(X;Y) \le \frac{1}{\ln b}\left(\int_I\int_J \left(\frac{p(x,y)}{p(x)p(y)}\right)^2 p(x)p(y)\,dx\,dy \int_I\int_J p(x)p(y)\,dx\,dy - 1\right) =$$
$$= \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p(x)p(y)p(u)p(v)\left(\frac{p(x,y)}{p(x)p(y)} - \frac{p(u,v)}{p(u)p(v)}\right)^2 dx\,dy\,du\,dv.$$
Clearly, $X$ and $Y$ are independent if and only if $p(x,y) = p(x)p(y)$ almost everywhere, if and only if the last term of the above is 0. This completes the proof.
The following Corollary is immediately obvious:
Corollary 4.2. Let ε > 0 be given. If
$$\max_{(x,y),(u,v)}\left|\frac{p(x,y)}{p(x)p(y)} - \frac{p(u,v)}{p(u)p(v)}\right| \le \sqrt{2\varepsilon \ln b},\tag{4.4}$$
then,
$$0 \le I_b(X;Y) \le \varepsilon.\tag{4.5}$$
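The mutual-information bound (4.2) and Corollary 4.2 can be illustrated on a discrete analogue. The $2\times 2$ joint distribution below is an illustrative assumption, not taken from the paper.

```python
import math

# Discrete sketch of the mutual-information bound (4.2), base b = 2.
# The 2x2 joint distribution is an illustrative assumption.
b = 2.0
pxy = [[0.30, 0.20],
       [0.15, 0.35]]
px = [sum(row) for row in pxy]                  # marginal p(x)
py = [sum(col) for col in zip(*pxy)]            # marginal p(y)

I_b = sum(pxy[i][j] * math.log(pxy[i][j] / (px[i] * py[j]), b)
          for i in range(2) for j in range(2))

r = lambda i, j: pxy[i][j] / (px[i] * py[j])    # the ratio p(x,y)/(p(x)p(y))
bound = sum(px[i] * py[j] * px[u] * py[v] * (r(i, j) - r(u, v)) ** 2
            for i in range(2) for j in range(2)
            for u in range(2) for v in range(2)) / (2.0 * math.log(b))

assert -1e-12 <= I_b <= bound + 1e-12
print(I_b, bound)
```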
Now consider a triplet of random variables $X : \Omega \to I$, $Y : \Omega \to J$ and $Z : \Omega \to K$. Define the mutual information (interpreted as the amount of information $X$ and $Y$ provide about $Z$, see p. 26 of [2])
$$I_b(X,Y;Z) := \int_I\int_J\int_K p(x,y,z)\log_b\frac{p(z \mid x,y)}{p(z)}\,dx\,dy\,dz.\tag{4.6}$$
Theorem 4.3. With the above assumptions, we have:
$$0 \le I_b(X,Y;Z) - I_b(Y;Z) \le$$
$$\le \frac{1}{2\ln b}\int_I\int_J\int_K\int_I\int_J\int_K p(x,y)p(u,v)p(z \mid y)p(w \mid v)\left(\frac{p(z \mid x,y)}{p(z \mid y)} - \frac{p(w \mid u,v)}{p(w \mid v)}\right)^2 dx\,dy\,dz\,du\,dv\,dw,\tag{4.7}$$
where equality holds if and only if $p(z \mid x,y) = p(z \mid y)$ for all $(x,y,z)$ with $p(x,y,z) > 0$.
Proof. We first modify Theorem 1.1 to read:
$$0 \le \mathbb{E}_{XYZ}\big(f(W(X,Y,Z))\big) - f\big(\mathbb{E}_{XYZ}W(X,Y,Z)\big) \le$$
$$\le \mathbb{E}_{XYZ}\big(W(X,Y,Z)f'(W(X,Y,Z))\big) - \mathbb{E}_{XYZ}\big(f'(W(X,Y,Z))\big)\,\mathbb{E}_{XYZ}\big(W(X,Y,Z)\big),$$
where $\mathbb{E}_{XYZ}(\cdot) = \int_I\int_J\int_K (\cdot)\,p(x,y,z)\,dx\,dy\,dz$. Let:
$$f(W) = -\log_b W, \qquad f'(W) = -\frac{1}{\ln b}\,\frac{1}{W}, \qquad W(X,Y,Z) = \frac{p(z \mid y)}{p(z \mid x,y)},$$
in the above to obtain
$$0 \le \log_b\left(\int_I\int_J\int_K \frac{p(z \mid y)}{p(z \mid x,y)}\,p(x,y,z)\,dx\,dy\,dz\right) - \int_I\int_J\int_K p(x,y,z)\log_b\frac{p(z \mid y)}{p(z \mid x,y)}\,dx\,dy\,dz \le$$
$$\le \int_I\int_J\int_K\left[-\frac{1}{\ln b}\,W\,\frac{1}{W}\,p(x,y,z)\right]dx\,dy\,dz + \frac{1}{\ln b}\int_I\int_J\int_K \frac{p(z \mid x,y)}{p(z \mid y)}\,p(x,y,z)\,dx\,dy\,dz \times \int_I\int_J\int_K \frac{p(z \mid y)}{p(z \mid x,y)}\,p(x,y,z)\,dx\,dy\,dz.$$
Noting that
$$\int_I\int_J\int_K \frac{p(z \mid y)}{p(z \mid x,y)}\,p(x,y,z)\,dx\,dy\,dz = \int_I\int_J\int_K p(z \mid y)\,p(x,y)\,dx\,dy\,dz = \int_I\int_J p(x,y)\int_K p(z \mid y)\,dz\,dx\,dy = \int_I\int_J p(x,y)\,dx\,dy = 1,$$
we have,
$$0 \le I_b(X,Y;Z) - I_b(Y;Z) \le \frac{1}{\ln b}\left(\int_I\int_J\int_K \frac{p(z \mid x,y)}{p(z \mid y)}\,p(x,y,z)\,dx\,dy\,dz \times \int_I\int_J\int_K \frac{p(z \mid y)}{p(z \mid x,y)}\,p(x,y,z)\,dx\,dy\,dz - 1\right) =$$
$$= \frac{1}{2\ln b}\int_I\int_J\int_K\int_I\int_J\int_K p(x,y)p(u,v)p(z \mid y)p(w \mid v)\left(\frac{p(z \mid x,y)}{p(z \mid y)} - \frac{p(w \mid u,v)}{p(w \mid v)}\right)^2 dx\,dy\,dz\,du\,dv\,dw,$$
which completes the proof. The following Corollary is immediately obvious:
Corollary 4.4. Let ε > 0 be given. If
$$\max_{(x,y,z),(u,v,w)}\left|\frac{p(z \mid x,y)}{p(z \mid y)} - \frac{p(w \mid u,v)}{p(w \mid v)}\right| \le \sqrt{2\varepsilon \ln b},\tag{4.9}$$
then,
$$0 \le I_b(X,Y;Z) - I_b(Y;Z) \le \varepsilon.\tag{4.10}$$
5. Results for the entropy, joint entropy and conditional entropy of mixed populations. Consider two continuous random variables $X_1, X_2$, both having the same (finite) range $I$ but having different densities. If we mix the two populations representing $X_1$ and $X_2$ together with proportions $\alpha$ and $1-\alpha$ respectively, $0 \le \alpha \le 1$, we derive a new population represented by a random variable $X$ such that
$$p(x) = \alpha p_1(x) + (1-\alpha)p_2(x), \qquad \alpha \in [0,1],$$
where $p_1, p_2$ are the probability densities of $X_1, X_2$ respectively, and $p(x)$ is the probability density of $X$. In this and the following section, we shall study the concavity properties of the entropy of the mixed population, and establish some useful bounds for various important quantitative measures in information theory.
Theorem 5.1. With the above assumptions we have
$$0 \le H_b(X) - \alpha H_b(X_1) - (1-\alpha)H_b(X_2) \le$$
$$\le \frac{1}{2\ln b}\int_I\int_I p(x)p(y)\left(\alpha\left(\frac{p_1(x)}{p(x)} - \frac{p_1(y)}{p(y)}\right)^2 + (1-\alpha)\left(\frac{p_2(x)}{p(x)} - \frac{p_2(y)}{p(y)}\right)^2\right)dx\,dy.\tag{5.1}$$
Proof. We have,
$$H_b(X) - \alpha H_b(X_1) - (1-\alpha)H_b(X_2) =$$
$$= \int_I p(x)\log_b\frac{1}{p(x)}\,dx - \alpha\int_I p_1(x)\log_b\frac{1}{p_1(x)}\,dx - (1-\alpha)\int_I p_2(x)\log_b\frac{1}{p_2(x)}\,dx =\tag{5.2}$$
$$= \alpha\int_I p_1(x)\log_b\frac{p_1(x)}{p(x)}\,dx + (1-\alpha)\int_I p_2(x)\log_b\frac{p_2(x)}{p(x)}\,dx =$$
$$= -\alpha\int_I p_1(x)\log_b\frac{p(x)}{p_1(x)}\,dx - (1-\alpha)\int_I p_2(x)\log_b\frac{p(x)}{p_2(x)}\,dx.$$
By Jensen's inequality we have that
$$\int_I p_1(x)\log_b\frac{p(x)}{p_1(x)}\,dx \le \log_b\int_I p_1(x)\,\frac{p(x)}{p_1(x)}\,dx = \log_b\int_I p(x)\,dx = 0$$
and similarly
$$\int_I p_2(x)\log_b\frac{p(x)}{p_2(x)}\,dx \le 0,$$
from which we obtain the first inequality of (5.1):
$$H_b(X) - \alpha H_b(X_1) - (1-\alpha)H_b(X_2) \ge 0.$$
Using a modification of Theorem 1.1 we have:
$$-\int_I p_1(x)\log_b\frac{p(x)}{p_1(x)}\,dx \le -\log_b\left(\int_I p_1(x)\,\frac{p(x)}{p_1(x)}\,dx\right) +\tag{5.3}$$
$$+ \frac{1}{2\ln b}\int_I\int_I p_1(x)p_1(y)\,\frac{p_1(x)}{p(x)}\,\frac{p_1(y)}{p(y)}\left(\frac{p(x)}{p_1(x)} - \frac{p(y)}{p_1(y)}\right)^2 dx\,dy =$$
$$= \frac{1}{2\ln b}\int_I\int_I \frac{p_1^2(x)\,p_1^2(y)}{p(x)p(y)}\cdot\frac{(p(x)p_1(y) - p(y)p_1(x))^2}{p_1^2(x)\,p_1^2(y)}\,dx\,dy =$$
$$= \frac{1}{2\ln b}\int_I\int_I p(x)p(y)\left(\frac{p_1(y)}{p(y)} - \frac{p_1(x)}{p(x)}\right)^2 dx\,dy$$
and similarly,
$$-\int_I p_2(x)\log_b\frac{p(x)}{p_2(x)}\,dx \le \frac{1}{2\ln b}\int_I\int_I p(x)p(y)\left(\frac{p_2(x)}{p(x)} - \frac{p_2(y)}{p(y)}\right)^2 dx\,dy.\tag{5.4}$$
Consequently, using the identity (5.2) and the inequalities (5.3) and (5.4), we obtain the second inequality of (5.1).
Corollary 5.2. If
$$\max_{x,y}\,\max_{i=1,2}\left|\frac{p_i(x)}{p(x)} - \frac{p_i(y)}{p(y)}\right| \le \sqrt{2\varepsilon \ln b}\tag{5.5}$$
for some ε > 0 (Note: ε depends on α), then
$$0 \le H_b(X) - \alpha H_b(X_1) - (1-\alpha)H_b(X_2) \le \varepsilon.$$
Proof. From (5.5) we have that
$$\alpha\left(\frac{p_1(x)}{p(x)} - \frac{p_1(y)}{p(y)}\right)^2 + (1-\alpha)\left(\frac{p_2(x)}{p(x)} - \frac{p_2(y)}{p(y)}\right)^2 \le 2\varepsilon \ln b,$$
thus, by (5.1),
$$H_b(X) - \alpha H_b(X_1) - (1-\alpha)H_b(X_2) \le \frac{1}{2\ln b}\cdot 2\varepsilon \ln b \int_I\int_I p(x)p(y)\,dx\,dy = \varepsilon.$$
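A discrete sketch of Theorem 5.1 and Corollary 5.2: mix two illustrative densities (assumptions, not taken from the paper) with weight $\alpha = 0.4$ and compare the concavity defect of the entropy with the bound of (5.1).

```python
import math

# Discrete sketch of Theorem 5.1: concavity defect of the entropy of the
# mixture p = a*p1 + (1-a)*p2 against the bound of (5.1).  Base b = 2;
# the two distributions and the weight a are illustrative assumptions.
b, a = 2.0, 0.4
p1 = [0.5, 0.3, 0.2]
p2 = [0.2, 0.4, 0.4]
p = [a * u + (1 - a) * v for u, v in zip(p1, p2)]

H = lambda q: -sum(qi * math.log(qi, b) for qi in q)
defect = H(p) - a * H(p1) - (1 - a) * H(p2)

bound = sum(p[i] * p[j] * (a * (p1[i] / p[i] - p1[j] / p[j]) ** 2
                           + (1 - a) * (p2[i] / p[i] - p2[j] / p[j]) ** 2)
            for i in range(3) for j in range(3)) / (2.0 * math.log(b))

assert -1e-12 <= defect <= bound + 1e-12
print(defect, bound)
```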
We can prove similarly that the joint entropy of two random variables is also concave in the joint density distribution. Let $(X_1, Y_1)$, $(X_2, Y_2)$ and $(X, Y)$ be pairs of random variables such that $X_1, X_2$ and $X$ have the same (finite) range $I$, and $Y_1, Y_2$ and $Y$ have the same (finite) range $J$, and furthermore
$$p(x,y) = \alpha p_1(x,y) + (1-\alpha)p_2(x,y) \qquad \forall x \in I,\ \forall y \in J,$$
where $\alpha \in [0,1]$. The following Theorem and its Corollary can be proved in the same way as Theorem 5.1 and Corollary 5.2:
Theorem 5.3. With the previous assumptions we have
$$0 \le H_b(X,Y) - \alpha H_b(X_1,Y_1) - (1-\alpha)H_b(X_2,Y_2) \le$$
$$\le \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p(x,y)p(u,v)\left(\alpha\left(\frac{p_1(x,y)}{p(x,y)} - \frac{p_1(u,v)}{p(u,v)}\right)^2 + (1-\alpha)\left(\frac{p_2(x,y)}{p(x,y)} - \frac{p_2(u,v)}{p(u,v)}\right)^2\right)dx\,dy\,du\,dv.\tag{5.6}$$
The following Corollary follows from Theorem 5.3 directly.
Corollary 5.4. With the above assumptions and if
$$\max_{(x,y),(u,v)}\,\max_{i=1,2}\left|\frac{p_i(x,y)}{p(x,y)} - \frac{p_i(u,v)}{p(u,v)}\right| < \sqrt{2\varepsilon \ln b}$$
for some $\varepsilon > 0$ (note: this depends on $\alpha$), then we have
$$0 \le H_b(X,Y) - \alpha H_b(X_1,Y_1) - (1-\alpha)H_b(X_2,Y_2) \le \varepsilon.$$
Similar results for conditional entropy can also be established as follows:
Theorem 5.5. With the previous assumptions we have
$$0 \le H_b(X \mid Y) - \alpha H_b(X_1 \mid Y_1) - (1-\alpha)H_b(X_2 \mid Y_2) \le$$
$$\le \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p(x \mid y)p(u \mid v)\Big[\alpha\,p_1(y)p_1(v)\left(\frac{p_1(u \mid v)}{p(u \mid v)} - \frac{p_1(x \mid y)}{p(x \mid y)}\right)^2 +\tag{5.7}$$
$$+ (1-\alpha)\,p_2(y)p_2(v)\left(\frac{p_2(u \mid v)}{p(u \mid v)} - \frac{p_2(x \mid y)}{p(x \mid y)}\right)^2\Big]dx\,dy\,du\,dv.$$
Proof. We have, by definition,
$$H_b(X \mid Y) - \alpha H_b(X_1 \mid Y_1) - (1-\alpha)H_b(X_2 \mid Y_2) =$$
$$= \int_I\int_J p(x,y)\log_b\frac{1}{p(x \mid y)}\,dx\,dy - \alpha\int_I\int_J p_1(x,y)\log_b\frac{1}{p_1(x \mid y)}\,dx\,dy -\tag{5.8}$$
$$- (1-\alpha)\int_I\int_J p_2(x,y)\log_b\frac{1}{p_2(x \mid y)}\,dx\,dy =$$
$$= -\alpha\int_I\int_J p_1(x,y)\log_b\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy - (1-\alpha)\int_I\int_J p_2(x,y)\log_b\frac{p(x \mid y)}{p_2(x \mid y)}\,dx\,dy.$$
Using Jensen's inequality, we have
$$-\int_I\int_J p_1(x,y)\log_b\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy \ge -\log_b\int_I\int_J p_1(x,y)\,\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy =$$
$$= -\log_b\left(\int_I\int_J p_1(y)\,p(x \mid y)\,dx\,dy\right) = -\log_b\left(\int_J p_1(y)\int_I p(x \mid y)\,dx\,dy\right) = -\log_b\left(\int_J p_1(y)\,dy\right) = \log_b 1 = 0,$$
and similarly
$$-\int_I\int_J p_2(x,y)\log_b\frac{p(x \mid y)}{p_2(x \mid y)}\,dx\,dy \ge 0,$$
thus proving the first inequality in (5.7). Furthermore, using a modified version of Theorem 1.1, we have
$$-\int_I\int_J p_1(x,y)\log_b\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy \le$$
$$\le \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p_1(x,y)p_1(u,v)\,\frac{p_1(x \mid y)}{p(x \mid y)}\,\frac{p_1(u \mid v)}{p(u \mid v)}\left(\frac{p(x \mid y)}{p_1(x \mid y)} - \frac{p(u \mid v)}{p_1(u \mid v)}\right)^2 dx\,dy\,du\,dv =$$
$$= \frac{1}{2\ln b}\int_I\int_J\int_I\int_J \frac{p_1(x,y)p_1(u,v)}{p_1(x \mid y)\,p_1(u \mid v)}\cdot\frac{1}{p(x \mid y)\,p(u \mid v)}\,\big(p(x \mid y)p_1(u \mid v) - p_1(x \mid y)p(u \mid v)\big)^2\,dx\,dy\,du\,dv =$$
$$= \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p_1(y)p_1(v)\,p(x \mid y)p(u \mid v)\left(\frac{p_1(u \mid v)}{p(u \mid v)} - \frac{p_1(x \mid y)}{p(x \mid y)}\right)^2 dx\,dy\,du\,dv.$$
A similar inequality for $p_2$ can be obtained. Using these and the identity (5.8), we have
$$0 \le H_b(X \mid Y) - \alpha H_b(X_1 \mid Y_1) - (1-\alpha)H_b(X_2 \mid Y_2) \le$$
$$\le \frac{\alpha}{2\ln b}\int_I\int_J\int_I\int_J p_1(y)p_1(v)\,p(x \mid y)p(u \mid v)\left(\frac{p_1(u \mid v)}{p(u \mid v)} - \frac{p_1(x \mid y)}{p(x \mid y)}\right)^2 dx\,dy\,du\,dv +$$
$$+ \frac{1-\alpha}{2\ln b}\int_I\int_J\int_I\int_J p_2(y)p_2(v)\,p(x \mid y)p(u \mid v)\left(\frac{p_2(u \mid v)}{p(u \mid v)} - \frac{p_2(x \mid y)}{p(x \mid y)}\right)^2 dx\,dy\,du\,dv,$$
which is the second inequality of (5.7). The following Corollary is immediately obvious:
Corollary 5.6. With the above assumptions and if:
$$\max_{u,v,x,y}\,\max_{i=1,2}\left|\frac{p_i(u \mid v)}{p(u \mid v)} - \frac{p_i(x \mid y)}{p(x \mid y)}\right| < \sqrt{2\varepsilon \ln b}, \qquad \varepsilon > 0,$$
then:
$$0 \le H_b(X \mid Y) - \alpha H_b(X_1 \mid Y_1) - (1-\alpha)H_b(X_2 \mid Y_2) < \varepsilon.$$
6. Results for the mutual information of mixed populations. The mutual information between two random variables $X$ and $Y$ was defined in Section 4; we may extend those results further. Consider two pairs of continuous random variables $(X_1, Y_1)$ and $(X_2, Y_2)$ with joint probability densities $p_1(x,y)$ and $p_2(x,y)$ respectively. One can think of $X_1$ and $X_2$ as the inputs to some noisy transmission channel and $Y_1$ and $Y_2$ as the outputs from the channel. Let the range of $X_1$ and $X_2$ be $I$, and the range of $Y_1$ and $Y_2$ be $J$. Define another pair of random variables $(X, Y)$ (having ranges $I$ and $J$ respectively) where the probability density of $X$ is the convex combination
$$p(x) = \alpha p_1(x) + (1-\alpha)p_2(x) \qquad \forall x,$$
where $0 \le \alpha \le 1$. The discrete version of the following result is known [2, p. 28]:
$$\alpha I_b(X_1;Y_1) + (1-\alpha)I_b(X_2;Y_2) \le I_b(X;Y)\tag{6.1}$$
where $Y_1$, $Y_2$ and $Y$ are the channel outputs corresponding to $X_1$, $X_2$ and $X$ respectively. The continuous version (concavity of $I$ with respect to the input probabilities) is established in the following Theorem:
Theorem 6.1. With the above assumptions we have:
$$0 \le I_b(X;Y) - \alpha I_b(X_1;Y_1) - (1-\alpha)I_b(X_2;Y_2) \le$$
$$\le \frac{1}{2\ln b}\int_J\int_J p(y)p(v)\left(\alpha\left(\frac{p_1(v)}{p(v)} - \frac{p_1(y)}{p(y)}\right)^2 + (1-\alpha)\left(\frac{p_2(v)}{p(v)} - \frac{p_2(y)}{p(y)}\right)^2\right)dy\,dv.\tag{6.2}$$
Proof. We have, by definition,
$$\alpha I_b(X_1;Y_1) + (1-\alpha)I_b(X_2;Y_2) - I_b(X;Y) =$$
$$= \alpha\int_I\int_J p_1(x,y)\log_b\frac{p(y \mid x)}{p_1(y)}\,dx\,dy + (1-\alpha)\int_I\int_J p_2(x,y)\log_b\frac{p(y \mid x)}{p_2(y)}\,dx\,dy -\tag{6.3}$$
$$- \int_I\int_J \big(\alpha p_1(x,y) + (1-\alpha)p_2(x,y)\big)\log_b\frac{p(y \mid x)}{p(y)}\,dx\,dy =$$
$$= \alpha\int_I\int_J p_1(x,y)\log_b\frac{p(y)}{p_1(y)}\,dx\,dy + (1-\alpha)\int_I\int_J p_2(x,y)\log_b\frac{p(y)}{p_2(y)}\,dx\,dy.$$
Applying the Jensen inequality to each term of the above sum, we have
$$\int_I\int_J p_1(x,y)\log_b\frac{p(y)}{p_1(y)}\,dx\,dy \le \log_b\left(\int_I\int_J p_1(x,y)\,\frac{p(y)}{p_1(y)}\,dx\,dy\right) =$$
$$= \log_b\left(\int_J \frac{p(y)}{p_1(y)}\int_I p_1(x,y)\,dx\,dy\right) = \log_b\left(\int_J \frac{p(y)}{p_1(y)}\,p_1(y)\,dy\right) = \log_b 1 = 0.$$
Similarly,
$$\int_I\int_J p_2(x,y)\log_b\frac{p(y)}{p_2(y)}\,dx\,dy \le 0.$$
Adding the above yields the first inequality of (6.2). By a modified version of Theorem 1.1 we get that:
$$\alpha I_b(X_1;Y_1) + (1-\alpha)I_b(X_2;Y_2) - I_b(X;Y) \ge$$
$$\ge -\frac{\alpha}{2\ln b}\int_J\int_J p(y)p(v)\left(\frac{p_1(v)}{p(v)} - \frac{p_1(y)}{p(y)}\right)^2 dy\,dv - \frac{1-\alpha}{2\ln b}\int_J\int_J p(y)p(v)\left(\frac{p_2(v)}{p(v)} - \frac{p_2(y)}{p(y)}\right)^2 dy\,dv,$$
which is the second inequality of (6.2). The following Corollary follows directly.
Corollary 6.2. With the above assumptions and if
$$\max_{v,y}\,\max_{i=1,2}\left|\frac{p_i(v)}{p(v)} - \frac{p_i(y)}{p(y)}\right| < \sqrt{2\varepsilon \ln b}\tag{6.4}$$
for some $\varepsilon > 0$ (note: $\varepsilon$ depends on $\alpha$), then we have
$$0 \le I_b(X;Y) - \alpha I_b(X_1;Y_1) - (1-\alpha)I_b(X_2;Y_2) \le \varepsilon.\tag{6.5}$$
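Theorem 6.1 can be sketched with a discrete channel. The $2\times 2$ channel matrix, the two input distributions and the weight $a$ below are illustrative assumptions, not from the paper; the bound is evaluated over the output marginals as in (6.2).

```python
import math

# Discrete sketch of Theorem 6.1, base b = 2: mixing two channel inputs
# p1(x), p2(x) with weight a and comparing I_b of the mixture with the
# mixture of the I_b's.  Channel and inputs are illustrative assumptions.
b, a = 2.0, 0.5
W = [[0.8, 0.2], [0.3, 0.7]]          # channel p(y | x)
p1, p2 = [0.5, 0.5], [0.9, 0.1]       # the two input distributions
p = [a * u + (1 - a) * v for u, v in zip(p1, p2)]

def out(q):                            # output distribution p_Y for input q
    return [sum(q[x] * W[x][y] for x in range(2)) for y in range(2)]

def mi(q):                             # I_b(X;Y) for input q
    qy = out(q)
    return sum(q[x] * W[x][y] * math.log(W[x][y] / qy[y], b)
               for x in range(2) for y in range(2))

defect = mi(p) - a * mi(p1) - (1 - a) * mi(p2)

py, p1y, p2y = out(p), out(p1), out(p2)
r = lambda qy, y: qy[y] / py[y]        # the ratios p_i(y)/p(y) of (6.2)
bound = sum(py[y] * py[v] * (a * (r(p1y, v) - r(p1y, y)) ** 2
                             + (1 - a) * (r(p2y, v) - r(p2y, y)) ** 2)
            for y in range(2) for v in range(2)) / (2.0 * math.log(b))

assert -1e-12 <= defect <= bound + 1e-12
print(defect, bound)
```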
Suppose now that the input probabilities $p(x)$ are fixed and we are given two sets of transition probabilities $p_1(y \mid x)$ and $p_2(y \mid x)$. Let the convex combination of these transition probabilities be
$$p(y \mid x) = \alpha p_1(y \mid x) + (1-\alpha)p_2(y \mid x).$$
The discrete version of the following result is known [2, p. 29]:
$$I_b(X;Y) \le \alpha I_b(X;Y_1) + (1-\alpha)I_b(X;Y_2),$$
where $Y$, $Y_1$, $Y_2$ are the channel outputs corresponding to the transition probabilities $p(y \mid x)$, $p_1(y \mid x)$ and $p_2(y \mid x)$. This result can be extended as follows.
Theorem 6.3. With the above assumptions we have:
$$0 \le \alpha I_b(X;Y_1) + (1-\alpha)I_b(X;Y_2) - I_b(X;Y) \le$$
$$\le \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p(x \mid y)p(u \mid v)\Big(\alpha\,p_1(y)p_1(v)\left(\frac{p_1(u \mid v)}{p(u \mid v)} - \frac{p_1(x \mid y)}{p(x \mid y)}\right)^2 +\tag{6.6}$$
$$+ (1-\alpha)\,p_2(y)p_2(v)\left(\frac{p_2(u \mid v)}{p(u \mid v)} - \frac{p_2(x \mid y)}{p(x \mid y)}\right)^2\Big)dx\,dy\,du\,dv.$$
Proof. By definition, we have:
$$I_b(X;Y) - \alpha I_b(X;Y_1) - (1-\alpha)I_b(X;Y_2) =$$
$$= \alpha\int_I\int_J p_1(x,y)\log_b\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy + (1-\alpha)\int_I\int_J p_2(x,y)\log_b\frac{p(x \mid y)}{p_2(x \mid y)}\,dx\,dy.\tag{6.7}$$
Applying Jensen's inequality to the first term, we get:
$$\int_I\int_J p_1(x,y)\log_b\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy \le \log_b\left(\int_I\int_J p_1(x,y)\,\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy\right) =$$
$$= \log_b\left(\int_I\int_J p_1(y)\,\frac{p(x,y)}{p(y)}\,dx\,dy\right) = \log_b\left(\int_J \frac{p_1(y)}{p(y)}\int_I p(x,y)\,dx\,dy\right) = \log_b\left(\int_J \frac{p_1(y)}{p(y)}\,p(y)\,dy\right) = \log_b\left(\int_J p_1(y)\,dy\right) = 0.$$
Similarly,
$$\int_I\int_J p_2(x,y)\log_b\frac{p(x \mid y)}{p_2(x \mid y)}\,dx\,dy \le 0.$$
Adding the above yields the first inequality of (6.6).
By a modified version of Theorem 1.1, we have:
$$\int_I\int_J p_1(x,y)\log_b\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy \ge \log_b\left(\int_I\int_J p_1(x,y)\,\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy\right) -\tag{6.8}$$
$$- \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p_1(x,y)p_1(u,v)\,\frac{p_1(x \mid y)}{p(x \mid y)}\cdot\frac{p_1(u \mid v)}{p(u \mid v)}\left(\frac{p(x \mid y)}{p_1(x \mid y)} - \frac{p(u \mid v)}{p_1(u \mid v)}\right)^2 dx\,dy\,du\,dv =$$
$$= -\frac{1}{2\ln b}\int_I\int_J\int_I\int_J \frac{p_1(x,y)p_1(u,v)}{p_1(x \mid y)\,p_1(u \mid v)}\cdot\frac{1}{p(x \mid y)\,p(u \mid v)}\,\big(p(x \mid y)p_1(u \mid v) - p_1(x \mid y)p(u \mid v)\big)^2\,dx\,dy\,du\,dv =$$
$$= -\frac{1}{2\ln b}\int_I\int_J\int_I\int_J p_1(y)p_1(v)\,p(x \mid y)p(u \mid v)\left(\frac{p_1(u \mid v)}{p(u \mid v)} - \frac{p_1(x \mid y)}{p(x \mid y)}\right)^2 dx\,dy\,du\,dv.$$
Using these and (6.7) and a similar inequality for p2, we obtain
$$\alpha I_b(X;Y_1) + (1-\alpha)I_b(X;Y_2) - I_b(X;Y) \le$$
$$\le \frac{\alpha}{2\ln b}\int_I\int_J\int_I\int_J p(x \mid y)p(u \mid v)\,p_1(y)p_1(v)\left(\frac{p_1(u \mid v)}{p(u \mid v)} - \frac{p_1(x \mid y)}{p(x \mid y)}\right)^2 dx\,dy\,du\,dv +$$
$$+ \frac{1-\alpha}{2\ln b}\int_I\int_J\int_I\int_J p(x \mid y)p(u \mid v)\,p_2(y)p_2(v)\left(\frac{p_2(u \mid v)}{p(u \mid v)} - \frac{p_2(x \mid y)}{p(x \mid y)}\right)^2 dx\,dy\,du\,dv,$$
and the Theorem is proved. The following Corollary follows directly.
Corollary 6.4. With the above assumptions and if
$$\max_{u,v,x,y}\,\max_{i=1,2}\left|\frac{p_i(u \mid v)}{p(u \mid v)} - \frac{p_i(x \mid y)}{p(x \mid y)}\right| < \sqrt{2\varepsilon \ln b}$$
for some ε > 0 (note: depends on α) then we have:
$$0 \le \alpha I_b(X;Y_1) + (1-\alpha)I_b(X;Y_2) - I_b(X;Y) \le \varepsilon.$$
7. Further bounds based on the ratio of maximum and minimum density. All the corollaries that we have discussed so far yield sufficient conditions for the lower bounds on the respective measures of random variables or mixed populations of random variables. These sufficient conditions are all based on the absolute difference between some densities. We now derive another sufficient condition which is based on the ratio of densities. We show that this leads to similar lower bounds which are in many cases simpler in appearance. The following Theorem is a Corollary of Theorem 2.1.
Theorem 7.1. Let
$$\rho := \max_{x,y}\frac{p(x)}{p(y)}.\tag{7.1}$$
If
$$\rho \le \varphi(\varepsilon) := 1 + \varepsilon\ln b + \sqrt{\varepsilon\ln b\,(\varepsilon\ln b + 2)},\tag{7.2}$$
then
$$0 \le \log_b|I| - H_b(X) \le \varepsilon.\tag{7.3}$$
Proof. If $\rho \le \varphi(\varepsilon)$, then clearly, for all $x, y$,
$$\frac{1}{\varphi(\varepsilon)} \le \frac{p(x)}{p(y)} \le \varphi(\varepsilon),$$
if and only if
$$\left(\frac{p(x)}{p(y)}\right)^2 - (2 + 2\varepsilon\ln b)\left(\frac{p(x)}{p(y)}\right) + 1 \le 0,$$
if and only if
$$\frac{(p(x) - p(y))^2}{p(x)p(y)} = \frac{p(x)}{p(y)} - 2 + \frac{p(y)}{p(x)} \le 2\varepsilon\ln b.$$
Hence from (2.2) of Theorem 2.1, we have
$$0 \le \log_b|I| - H_b(X) \le \frac{1}{2\ln b}\int_I\int_I p(x)p(y)\cdot 2\varepsilon\ln b\,dx\,dy = \varepsilon.$$
The following Theorem is a Corollary of Theorem 2.3, and the proof is similar to that of Theorem 7.1.
Theorem 7.2. Let
$$\rho := \max_{x,y,u,v}\frac{p(x,y)}{p(u,v)}.\tag{7.4}$$
If
$$\rho \le \varphi(\varepsilon),\tag{7.5}$$
then
$$0 \le \log_b|I \times J| - H_b(X,Y) \le \varepsilon.\tag{7.6}$$
The following Theorem is a Corollary of Theorem 3.1, and the proof is similar to that of Theorem 7.1. Notice that this sufficient condition is independent of $M$ (as defined in (3.7)), and is in a simpler form than before.
Theorem 7.3. Let
$$\rho := \max_{x,y,u,v}\frac{p(x \mid y)}{p(u \mid v)}.\tag{7.7}$$
If
$$\rho \le \varphi(\varepsilon),\tag{7.8}$$
then
$$0 \le H_b(Z) + \mathbb{E}_Z(\log_b A(Z)) - H_b(X \mid Y) \le \varepsilon,\tag{7.9}$$
where $H_b(X \mid Y)$ and $A$ are as defined in (3.1) and (3.2) respectively. The following Theorem is a Corollary of Theorem 4.1, and the proof is similar to that of Theorem 7.1.
Theorem 7.4. Let
$$\rho := \max_{x,y,u,v}\frac{p(x)p(y)p(u,v)}{p(u)p(v)p(x,y)}.\tag{7.10}$$
If
$$\rho \le \varphi(\varepsilon),\tag{7.11}$$
then
$$0 \le I_b(X;Y) \le \varepsilon.\tag{7.12}$$
The following Theorem is a Corollary of Theorem 4.3, and the proof is similar to that of Theorem 7.1.
Theorem 7.5. Let
$$\rho := \max_{x,y,z,u,v,w}\frac{p(z \mid x,y)\,p(w \mid v)}{p(w \mid u,v)\,p(z \mid y)}.\tag{7.13}$$
If
$$\rho \le \varphi(\varepsilon),\tag{7.14}$$
then
$$0 \le I_b(X,Y;Z) - I_b(Y;Z) \le \varepsilon.\tag{7.15}$$
REFERENCES

1. Dragomir, S.S. and Goh, C.J. – A Counterpart of Jensen's Discrete Inequality for Differentiable Convex Mappings and Applications in Information Theory, Mathematical and Computer Modelling, to appear.
2. McEliece, R.J. – The Theory of Information and Coding, Addison-Wesley Publishing Company, Reading, 1977.
Received: 19.I.2000

Department of Mathematics
University of Timisoara
RO-1900 Timisoara
ROMANIA

Department of Mathematics
University of Western Australia
Nedlands, WA 6907
AUSTRALIA