
ANALELE ŞTIINŢIFICE ALE UNIVERSITĂŢII ”AL.I.CUZA” IAŞI
Tomul XLVII, s.I a, Matematică, 2001, f.2.

A COUNTERPART OF JENSEN’S CONTINUOUS INEQUALITY

AND APPLICATIONS IN INFORMATION THEORY

BY

S.S. DRAGOMIR∗ and C.J. GOH

Abstract. We derive a counterpart of Jensen’s continuous inequality for differentiable convex mappings. This seemingly trivial extension has significant applications in information theory. In particular, we derive several useful bounds for (differential) entropy, joint entropy, conditional entropy, and mutual information arising from the theory of information. We also apply it to derive bounds for entropy measures for mixed populations of random variables.

AMS Subject Classification: 94D17, 26D15, 26D20, 26D99.
Key words: Jensen’s inequality, Information theory, Differential Entropy.

1. Introduction. Consider a random variable X : Ω → I ⊂ IR (where Ω is some sample space) with a corresponding density function pX : I → [0, 1] which has finite support, so that pX(x) = 0 ∀x ∉ I. For notational clarity, we shall suppress the subscript, i.e., the random variable which the density function is associated with, so that p(x) is the same as pX(x). Let Y (X) be another random variable which is a function of the random variable X, and f : IR → IR be some convex differentiable function, such that

(1.1) f(x) − f(y) ≥ f′(y)(x − y) ∀x, y ∈ IR.

Define the inner product of two functions: φ : I → IR and ψ : I → IR to be

(1.2) 〈φ, ψ〉 = ∫_I φ(x)ψ(x)p(x)dx

∗ SSD acknowledges the financial support from the University of Western Australia during his visit when this work was completed.


with

(1.3) ||φ||² = ∫_I φ²(x)p(x)dx.

The well-known Schwarz inequality is given by

(1.4) |〈φ, ψ〉| ≤ ||φ|| ||ψ||

or

(1.5) |IE(φ(X)ψ(X))| ≤ [IE(φ(X)²)]^(1/2) [IE(ψ(X)²)]^(1/2),

where IE(·) := ∫_I (·)p(x)dx. The following inequality is well-known in the literature as the (continuous) Jensen inequality for the convex function f:

(1.6) f(IE(Y)) = f[∫_I Y(x)p(x)dx] ≤ ∫_I f(Y(x))p(x)dx = IE(f(Y)).

It is the aim of this paper to derive a counterpart of Jensen’s inequality, and apply it to obtain useful upper bounds for several quantitative measures arising from information theory for continuous random variables. The following is the continuous version of Theorem 2.1 of [1].

Theorem 1.1. Let f : IR → IR be convex and differentiable, Y be a random variable (as a function of the random variable X), and IE(·) = ∫_I (·)p(x)dx. Then, we have

(1.7) 0 ≤ IE[f(Y)] − f[IE(Y)] ≤ IE[Y f′(Y)] − IE[f′(Y)]IE(Y) ≤ IE[f′(Y)²]^(1/2) {IE(Y²) − [IE(Y)]²}^(1/2).

Proof. The first inequality is just Jensen’s, which follows readily from the convexity of f. Since f is convex, we have

f(IE(Y)) − f(Y) ≥ f′(Y)[IE(Y) − Y].

Taking expectations on both sides, we have

f[IE(Y)] − IE[f(Y)] ≥ IE[f′(Y)]IE(Y) − IE[Y f′(Y)],


which is the second inequality. Furthermore,

IE[f′(Y)]IE(Y) − IE[Y f′(Y)] = IE[f′(Y)IE(Y) − Y f′(Y)] = IE[f′(Y)(IE(Y) − Y)] ≤
≤ IE[f′(Y)²]^(1/2) IE[(Y − IE(Y))²]^(1/2)   (by the Schwarz inequality)
= IE[f′(Y)²]^(1/2) {IE(Y²) − [IE(Y)]²}^(1/2),

which is the third inequality.
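A quick numerical illustration may help; the following sketch is not part of the original paper. It assumes f(y) = y², the uniform density on I = [0, 1] and Y(X) = e^X (arbitrary choices for the example), and approximates the expectations IE(·) by Riemann sums.

    # Numerical check of the chain (1.7):
    # 0 <= IE[f(Y)]-f(IE Y) <= IE[Yf'(Y)]-IE[f'(Y)]IE(Y) <= IE[f'(Y)^2]^(1/2){IE(Y^2)-[IE(Y)]^2}^(1/2)
    import numpy as np

    n  = 20000
    x  = np.linspace(0.0, 1.0, n)          # support I = [0, 1]
    dx = x[1] - x[0]
    p  = np.ones_like(x)
    p /= np.sum(p) * dx                    # uniform density, renormalised on the grid
    E  = lambda g: np.sum(g * p) * dx      # IE(g(X)) = integral of g(x)p(x) over I

    Y  = np.exp(x)                         # Y as a function of X
    f  = lambda y: y ** 2                  # convex, differentiable
    df = lambda y: 2.0 * y                 # f'

    gap = E(f(Y)) - f(E(Y))                                   # Jensen gap
    mid = E(Y * df(Y)) - E(df(Y)) * E(Y)                      # counterpart bound
    top = np.sqrt(E(df(Y) ** 2)) * np.sqrt(E(Y ** 2) - E(Y) ** 2)
    print(gap, mid, top)
    assert 0.0 <= gap <= mid <= top + 1e-12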

2. Applications in Information Theory: Bounds for Differential Entropy. Given a continuous random variable X : Ω → I, the uncertainty of the associated random event is given by the entropy function (see McEliece [2]):

(2.1) Hb(X) = IE(−logb p(X)) = −∫_I p(x) logb p(x) dx.

Theorem 2.1. With the above assumptions, we have:

(2.2) 0 ≤ logb|I| − Hb(X) ≤ (1/ln b)(|I| ∫_I p(x)²dx − 1) = (1/(2 ln b)) ∫_I ∫_I (p(x) − p(y))² dx dy,

where equality holds if and only if p(x) is constant almost everywhere in I.

Proof. In Theorem 1.1, let

f(y) = −logb y,   f′(y) = −1/(y ln b),   Y(x) = 1/p(x).

Then we have

0 ≤ ∫_I p(x) logb p(x) dx + logb(∫_I (p(x)/p(x)) dx) ≤
≤ ∫_I [−(1/ln b) p(x)] dx + (1/ln b) ∫_I p(x)²dx ∫_I (p(x)/p(x)) dx =
= (1/ln b)[|I| ∫_I p(x)²dx − 1]


or
0 ≤ logb|I| − Hb(X) ≤ (1/ln b)(|I| ∫_I p(x)²dx − 1) = (1/(2 ln b)) ∫_I ∫_I (p(x) − p(y))² dx dy,
where the last equality follows from expanding (p(x) − p(y))² and using ∫_I p(x)dx = 1 and ∫_I dx = |I|.

The following Corollary is immediately obvious.

Corollary 2.2. Let the density p be such that

(2.3) max_{(x,y)∈I²} |p(x) − p(y)| ≤ √(2 ε ln b)/|I|

for some ε > 0. Then

(2.4) 0 ≤ logb|I| − Hb(X) ≤ ε.
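As a sanity check (not part of the paper), the bound of Theorem 2.1 and the sufficient condition of Corollary 2.2 can be evaluated for a concrete density. The sketch below assumes I = [0, 1], b = 2 and p(x) = 1 + 0.2 sin(2πx); |I| is evaluated as the Riemann measure of the grid, which keeps the discrete analogue of the inequalities exact.

    # Illustrative check of Theorem 2.1 / Corollary 2.2 with an assumed density.
    import numpy as np

    b, n = 2.0, 4000
    x  = np.linspace(0.0, 1.0, n)
    dx = x[1] - x[0]
    p  = 1.0 + 0.2 * np.sin(2.0 * np.pi * x)
    p /= np.sum(p) * dx                               # renormalise on the grid

    size_I = n * dx                                   # |I| as the grid's Riemann measure (~1)
    H     = -np.sum(p * np.log(p)) * dx / np.log(b)                # H_b(X)
    gap   = np.log(size_I) / np.log(b) - H                         # log_b|I| - H_b(X)
    bound = (size_I * np.sum(p ** 2) * dx - 1.0) / np.log(b)       # (1/ln b)(|I| int p^2 - 1)

    # Corollary 2.2: if max|p(x)-p(y)| <= sqrt(2 eps ln b)/|I|, the gap is at most eps.
    eps = ((p.max() - p.min()) * size_I) ** 2 / (2.0 * np.log(b))
    print(gap, bound, eps)
    assert 0.0 <= gap <= bound + 1e-12 <= eps + 1e-12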

The above result can be easily extended to the joint entropy for two random variables. Consider two continuous random variables X : Ω → I and Y : Ω → J with joint density function p(x, y). The joint entropy of X and Y is defined as

(2.5) Hb(X, Y) = −∫_I ∫_J p(x, y) logb p(x, y) dx dy.

The proof of the following Theorem and its Corollary is similar to that of Theorem 2.1.

Theorem 2.3. With the above assumptions, we have

(2.6) 0 ≤ logb|I × J| − Hb(X, Y) ≤ (1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J (p(x, y) − p(u, v))² dx dy du dv,

where equality holds if and only if p(x, y) is constant almost everywhere in I × J.

Corollary 2.4. Let the joint density p(x, y) be such that

(2.7) max_{(x,y),(u,v)∈I×J} |p(x, y) − p(u, v)| ≤ √(2 ε ln b)/|I × J|


for some ε > 0, then

(2.8) 0 ≤ logb|I × J | −Hb(X,Y ) ≤ ε.
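A numerical illustration of Theorem 2.3 and Corollary 2.4 (not from the paper) follows, assuming I = J = [0, 1], b = 2 and the joint density p(x, y) = 1 + 0.5 sin(2πx) sin(2πy); |I × J| is again taken as the grid's Riemann measure.

    # Illustrative check of Theorem 2.3 / Corollary 2.4 with an assumed joint density.
    import numpy as np

    b, n = 2.0, 500
    t  = np.linspace(0.0, 1.0, n)
    dt = t[1] - t[0]
    X, Y = np.meshgrid(t, t, indexing="ij")
    p  = 1.0 + 0.5 * np.sin(2.0 * np.pi * X) * np.sin(2.0 * np.pi * Y)
    p /= np.sum(p) * dt * dt

    area = (n * dt) ** 2                                 # |I x J| as the grid's Riemann measure
    Hj   = -np.sum(p * np.log(p)) * dt * dt / np.log(b)  # H_b(X, Y)
    gap  = np.log(area) / np.log(b) - Hj                 # log_b|I x J| - H_b(X, Y)

    # Corollary 2.4: the oscillation of the joint density controls the gap.
    eps = ((p.max() - p.min()) * area) ** 2 / (2.0 * np.log(b))
    print(gap, eps)
    assert 0.0 <= gap <= eps + 1e-12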

3. Results for Conditional Entropy. Consider a triplet of random variables X : Ω → I, Y : Ω → J and Z : Ω → K. The conditional entropy of X given Y is defined as:

(3.1) Hb(X | Y) = ∫_I ∫_J p(x, y) logb(1/p(x | y)) dx dy,

where p(x | y) = p(x, y)/p(y) is the conditional probability of X given Y = y. We also define the conditional probability of Z given Y = y as p(z | y), and the conditional probability of Z given Y = y and X = x as p(z | x, y). Let

(3.2) A(z) := ∫_I ∫_J γ_xy(z) dx dy

where

(3.3) γ_xy(z) := p(x, y, z)/p(x | y) ∀x, y, z.

Theorem 3.1. Given the assumptions above, we have:

0 ≤ Hb(Z) + IE_Z[logb A(Z)] − Hb(X | Y) ≤
≤ (1/(2 ln b)) ∫_K (1/p(z)) ∫_I ∫_J ∫_I ∫_J γ_xy(z) γ_uv(z) (p(x | y) − p(u | v))² dx dy du dv dz,

where IE_Z(·) = ∫_K (·) p(z) dz.

Proof. To prove the Theorem, we need to modify Theorem 1.1 to read,

0 ≤ IE_{X,Y|z}(f(W(X, Y))) − f(IE_{X,Y|z}(W(X, Y))) ≤
≤ IE_{X,Y|z}(W(X, Y) f′(W(X, Y))) − IE_{X,Y|z}(f′(W(X, Y))) IE_{X,Y|z}(W(X, Y)),

where IE_{X,Y|z}(·) = ∫_I ∫_J (·) p(x, y | z) dx dy.


Let

f(W) = −logb W,   f′(W) = −1/(W ln b),   W(x, y) = 1/p(x | y)

in the above, we obtain

(3.5)
0 ≤ ∫_I ∫_J [−p(x, y | z) logb(1/p(x | y))] dx dy + logb(∫_I ∫_J (p(x, y | z)/p(x | y)) dx dy) ≤
≤ (1/ln b) ∫_I ∫_J [−p(x, y | z) (p(x | y)/p(x | y))] dx dy + (1/ln b) ∫_I ∫_J p(x, y | z) p(x | y) dx dy × ∫_I ∫_J (p(u, v | z)/p(u | v)) du dv.

The last two terms on the right-hand side of the second inequality can be rewritten as:

∫_I ∫_J (p(x, y, z)/p(z)) p(x | y) dx dy · ∫_I ∫_J p(u, v, z)/(p(z) p(u | v)) du dv − 1 =
= (1/p(z)²) (∫_I ∫_J (p(x, y, z)/p(x | y)) p(x | y)² dx dy · ∫_I ∫_J (p(u, v, z)/p(u | v)) du dv − p(z)²) =
= (1/p(z)²) (∫_I ∫_J γ_xy(z) p(x | y)² dx dy · ∫_I ∫_J γ_uv(z) du dv − (∫_I ∫_J p(x, y, z) dx dy)²) =
= (1/(2p(z)²)) ∫_I ∫_J ∫_I ∫_J γ_xy(z) γ_uv(z) (p(x | y) − p(u | v))² dx dy du dv.


Substituting this for the last two terms of (3.5), we get:

0 ≤ ∫_I ∫_J [−(p(x, y, z)/p(z)) logb(1/p(x | y))] dx dy + logb(∫_I ∫_J p(x, y, z)/(p(z) p(x | y)) dx dy) ≤
≤ (1/(2p(z)² ln b)) ∫_I ∫_J ∫_I ∫_J γ_xy(z) γ_uv(z) (p(x | y) − p(u | v))² dx dy du dv.

Multiplying the above by p(z) and integrating over z gives:

0 ≤ ∫_K p(z) logb(1/p(z)) dz + ∫_K p(z) logb(∫_I ∫_J (p(x, y, z)/p(x | y)) dx dy) dz − ∫_I ∫_J ∫_K p(x, y, z) logb(1/p(x | y)) dx dy dz ≤
≤ (1/(2 ln b)) ∫_K (1/p(z)) (∫_I ∫_J ∫_I ∫_J γ_xy(z) γ_uv(z) (p(x | y) − p(u | v))² dx dy du dv) dz,

or
0 ≤ Hb(Z) + IE_Z(logb A(Z)) − Hb(X | Y) ≤ (1/(2 ln b)) ∫_K (1/p(z)) (∫_I ∫_J ∫_I ∫_J γ_xy(z) γ_uv(z) (p(x | y) − p(u | v))² dx dy du dv) dz,

which completes the proof. The following Corollary follows readily from the above:

Corollary 3.2. Let ε > 0 be given. If

(3.6) max_{(x,y),(u,v)} |p(x | y) − p(u | v)| ≤ √(2 ε ln b / M)

where

(3.7) M := ∫_K (A(z)²/p(z)) dz = ∫_K (1/p(z)) (∫_I ∫_J γ_xy(z) dx dy)² dz,


then

(3.8) 0 ≤ Hb(Z) + IE_Z(logb A(Z)) − Hb(X | Y) ≤ ε.

Using Theorem 1.1, we can readily obtain the following result and its corollary:

Theorem 3.3. With the above assumptions, we have:

(3.9) 0 ≤ logb|I| − Hb(X | Y) ≤ (1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J p(y) p(v) (p(x | y) − p(u | v))² dx dy du dv.

Corollary 3.4. Let ε > 0 be given. If

(3.10) max_{(x,y),(u,v)} |p(x | y) − p(u | v)| ≤ √(2 ε ln b)/|I|

then

(3.11) 0 ≤ logb|I| −Hb(X | Y ) ≤ ε.
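The bound (3.9) of Theorem 3.3 can likewise be checked numerically. The sketch below (not from the paper) assumes I = J = [0, 1], b = 2 and the joint density p(x, y) ∝ exp(−8(x − y)²) on the unit square, with both sides of (3.9) evaluated on a coarse grid.

    # Illustrative check of Theorem 3.3, eq. (3.9), with an assumed joint density.
    import numpy as np

    b, n = 2.0, 40
    t  = np.linspace(0.0, 1.0, n)
    dt = t[1] - t[0]
    X, Y = np.meshgrid(t, t, indexing="ij")
    pxy = np.exp(-8.0 * (X - Y) ** 2)
    pxy /= np.sum(pxy) * dt * dt                      # joint density on I x J

    py   = np.sum(pxy, axis=0) * dt                   # marginal p(y)
    px_y = pxy / py[None, :]                          # conditional p(x | y)

    size_I = n * dt                                   # |I| as the grid's Riemann measure
    H_cond = -np.sum(pxy * np.log(px_y)) * dt * dt / np.log(b)   # H_b(X | Y)
    gap    = np.log(size_I) / np.log(b) - H_cond

    # right-hand side of (3.9): (1/(2 ln b)) int p(y)p(v)(p(x|y) - p(u|v))^2 dx dy du dv
    diff  = px_y[:, :, None, None] - px_y[None, None, :, :]      # indices (x, y, u, v)
    w     = py[None, :, None, None] * py[None, None, None, :]
    bound = np.sum(w * diff ** 2) * dt ** 4 / (2.0 * np.log(b))

    print(gap, bound)
    assert 0.0 <= gap <= bound + 1e-12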

4. Results for Mutual Information. Given the pair of continuous random variables X : Ω → I and Y : Ω → J with joint density function p(x, y), the mutual information is defined as [2, p. 24]:

(4.1) Ib(X; Y) := Hb(X) − Hb(X | Y) = ∫_I ∫_J p(x, y) logb(p(x, y)/(p(x)p(y))) dx dy.

Theorem 4.1. With the above assumptions, we have:

(4.2) 0 ≤ Ib(X; Y) ≤ (1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J p(x)p(y)p(u)p(v) (p(x, y)/(p(x)p(y)) − p(u, v)/(p(u)p(v)))² dx dy du dv.

Moreover, equality holds in both inequalities simultaneously if and only if X and Y are independent.


Proof. To prove this Theorem, we modify Theorem 1.1 to read:

(4.3) 0 ≤ IE_{X,Y}(f(W(X, Y))) − f(IE_{X,Y} W(X, Y)) ≤ IE_{X,Y}(W(X, Y) f′(W(X, Y))) − IE_{X,Y}(f′(W(X, Y))) IE_{X,Y}(W(X, Y)),

where IE_{X,Y}(·) = ∫_I ∫_J (·) p(x, y) dx dy. Let

f(W) = −logb W,   f′(W) = −1/(W ln b),   W(x, y) = p(x)p(y)/p(x, y),

in the above to obtain

0 ≤ ∫_I ∫_J p(x, y) logb(p(x, y)/(p(x)p(y))) dx dy + logb(∫_I ∫_J p(x, y) (p(x)p(y)/p(x, y)) dx dy) ≤
≤ −(1/ln b) ∫_I ∫_J (p(x)p(y)/p(x, y)) (p(x, y)/(p(x)p(y))) p(x, y) dx dy + (1/ln b) ∫_I ∫_J (p(x, y)²/(p(x)p(y))) dx dy ∫_I ∫_J (p(x)p(y)/p(x, y)) p(x, y) dx dy

or
0 ≤ Ib(X; Y) ≤ (1/ln b)(∫_I ∫_J (p(x, y)/(p(x)p(y)))² p(x)p(y) dx dy ∫_I ∫_J p(x)p(y) dx dy − 1) =
= (1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J p(x)p(y)p(u)p(v) (p(x, y)/(p(x)p(y)) − p(u, v)/(p(u)p(v)))² dx dy du dv.

Clearly, X and Y are independent if and only if p(x, y) = p(x)p(y) almost everywhere, if and only if the last term of the above is 0. This completes the proof.

The following Corollary is immediately obvious:


Corollary 4.2. Let ε > 0 be given. If

(4.4) max_{(x,y),(u,v)} |p(x, y)/(p(x)p(y)) − p(u, v)/(p(u)p(v))| ≤ √(2 ε ln b),

then,

(4.5) 0 ≤ Ib(X;Y ) ≤ ε.
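A numerical illustration of Theorem 4.1 (not part of the paper): the sketch below assumes I = J = [0, 1], b = 2 and a weakly dependent joint density p(x, y) = 1 + 0.4(x − 1/2)(y − 1/2), and compares Ib(X; Y) with the right-hand side of (4.2).

    # Illustrative check of Theorem 4.1, eq. (4.2), with an assumed joint density.
    import numpy as np

    b, n = 2.0, 40
    t  = np.linspace(0.0, 1.0, n)
    dt = t[1] - t[0]
    X, Y = np.meshgrid(t, t, indexing="ij")
    pxy = 1.0 + 0.4 * (X - 0.5) * (Y - 0.5)
    pxy /= np.sum(pxy) * dt * dt

    px = np.sum(pxy, axis=1) * dt                     # marginal p(x)
    py = np.sum(pxy, axis=0) * dt                     # marginal p(y)
    r  = pxy / (px[:, None] * py[None, :])            # p(x, y) / (p(x)p(y))

    I_mut = np.sum(pxy * np.log(r)) * dt * dt / np.log(b)        # I_b(X; Y)

    # right-hand side of (4.2)
    diff  = r[:, :, None, None] - r[None, None, :, :]
    pp    = px[:, None] * py[None, :]
    w     = pp[:, :, None, None] * pp[None, None, :, :]
    bound = np.sum(w * diff ** 2) * dt ** 4 / (2.0 * np.log(b))

    print(I_mut, bound)
    assert 0.0 <= I_mut <= bound + 1e-12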

Now consider a triplet of random variables X : Ω → I, Y : Ω → J and Z : Ω → K. Define the mutual information (interpreted as the amount of information X and Y provide about Z, see p. 26 of [2])

(4.6) Ib(X, Y; Z) := ∫_I ∫_J ∫_K p(x, y, z) logb(p(z | x, y)/p(z)) dx dy dz.

Theorem 4.3. With the above assumptions, we have:

(4.7) 0 ≤ Ib(X, Y; Z) − Ib(Y; Z) ≤ (1/(2 ln b)) ∫_I ∫_J ∫_K ∫_I ∫_J ∫_K p(x, y) p(u, v) p(z | y) p(w | v) (p(z | x, y)/p(z | y) − p(w | u, v)/p(w | v))² dx dy dz du dv dw,

where equality holds if and only if p(z | x, y) = p(z | y) ∀(x, y, z) with p(x, y, z) > 0.

Proof. We first modify Theorem 1.1 to read:

0 ≤ IE_{XYZ}(f(W(X, Y, Z))) − f(IE_{XYZ} W(X, Y, Z)) ≤
≤ IE_{XYZ}(W(X, Y, Z) f′(W(X, Y, Z))) − IE_{XYZ}(f′(W(X, Y, Z))) IE_{XYZ}(W(X, Y, Z)),

where IE_{XYZ}(·) = ∫_I ∫_J ∫_K (·) p(x, y, z) dx dy dz. Let:

f(W) = −logb W,   f′(W) = −1/(W ln b),   W(X, Y, Z) = p(z | y)/p(z | x, y),


in the above to obtain

0 ≤ logb(∫_I ∫_J ∫_K (p(z | y)/p(z | x, y)) p(x, y, z) dx dy dz) − ∫_I ∫_J ∫_K p(x, y, z) logb(p(z | y)/p(z | x, y)) dx dy dz ≤
≤ (1/ln b) ∫_I ∫_J ∫_K [−W (1/W)] p(x, y, z) dx dy dz + (1/ln b) ∫_I ∫_J ∫_K (p(z | x, y)/p(z | y)) p(x, y, z) dx dy dz × ∫_I ∫_J ∫_K (p(z | y)/p(z | x, y)) p(x, y, z) dx dy dz.

Noting that
∫_I ∫_J ∫_K (p(z | y)/p(z | x, y)) p(x, y, z) dx dy dz = ∫_I ∫_J ∫_K p(z | y) p(x, y) dx dy dz =
= ∫_I ∫_J p(x, y) (∫_K p(z | y) dz) dx dy = ∫_I ∫_J p(x, y) dx dy = 1,

we have,

0 ≤ Ib(X, Y; Z) − Ib(Y; Z) ≤
≤ (1/ln b)(∫_I ∫_J ∫_K (p(z | x, y)/p(z | y)) p(x, y, z) dx dy dz × ∫_I ∫_J ∫_K (p(z | y)/p(z | x, y)) p(x, y, z) dx dy dz − 1) =
= (1/(2 ln b)) ∫_I ∫_J ∫_K ∫_I ∫_J ∫_K p(x, y) p(u, v) p(z | y) p(w | v) (p(z | x, y)/p(z | y) − p(w | u, v)/p(w | v))² dx dy dz du dv dw,

which completes the proof. The following Corollary is immediately obvious:

Corollary 4.4. Let ε > 0 be given. If

(4.9) max_{(x,y,z),(u,v,w)} |p(z | x, y)/p(z | y) − p(w | u, v)/p(w | v)| ≤ √(2 ε ln b),


then,

(4.10) 0 ≤ Ib(X,Y ;Z)− Ib(Y ;Z) ≤ ε.

5. Results for the entropy, joint entropy and conditional entropy of mixed populations. Consider two continuous random variables X1, X2 both having the same (finite) range I but having different densities. If we mix the two populations representing X1 and X2 together with proportion α and 1 − α respectively, where 0 ≤ α ≤ 1, we derive a new population represented by a random variable X, such that

p(x) = αp1(x) + (1− α)p2(x), α ∈ [0, 1].

where p1, p2 are the probability densities of X1, X2 respectively, and p(x) is the probability density of X. In this and the following sections, we shall study the concavity properties of the entropy of the mixed population, and establish some useful bounds for various important quantitative measures in information theory.

Theorem 5.1. With the above assumptions we have

(5.1) 0 ≤ Hb(X) − αHb(X1) − (1 − α)Hb(X2) ≤ (1/(2 ln b)) ∫_I ∫_I p(x)p(y) [α (p1(x)/p(x) − p1(y)/p(y))² + (1 − α)(p2(x)/p(x) − p2(y)/p(y))²] dx dy.

Proof. We have,

(5.2)
Hb(X) − αHb(X1) − (1 − α)Hb(X2) =
= ∫_I p(x) logb(1/p(x)) dx − α ∫_I p1(x) logb(1/p1(x)) dx − (1 − α) ∫_I p2(x) logb(1/p2(x)) dx =
= α ∫_I p1(x) logb(p1(x)/p(x)) dx + (1 − α) ∫_I p2(x) logb(p2(x)/p(x)) dx =
= −α ∫_I p1(x) logb(p(x)/p1(x)) dx − (1 − α) ∫_I p2(x) logb(p(x)/p2(x)) dx.


By Jensen’s inequality we have that
∫_I p1(x) logb(p(x)/p1(x)) dx ≤ logb(∫_I p1(x) (p(x)/p1(x)) dx) = logb(∫_I p(x) dx) = 0
and similarly
∫_I p2(x) logb(p(x)/p2(x)) dx ≤ 0,
from which we obtain the first inequality of (5.1):
Hb(X) − αHb(X1) − (1 − α)Hb(X2) ≥ 0.

Using a modification of Theorem 1.1 we have:

(5.3)
−∫_I p1(x) logb(p(x)/p1(x)) dx ≤ −logb(∫_I p1(x) (p(x)/p1(x)) dx) +
+ (1/(2 ln b)) ∫_I ∫_I p1(x)p1(y) (p1(x)/p(x)) (p1(y)/p(y)) (p(x)/p1(x) − p(y)/p1(y))² dx dy =
= (1/(2 ln b)) ∫_I ∫_I (p1(x)²p1(y)²/(p(x)p(y))) ((p(x)p1(y) − p(y)p1(x))²/(p1(x)²p1(y)²)) dx dy =
= (1/(2 ln b)) ∫_I ∫_I p(x)p(y) (p1(y)/p(y) − p1(x)/p(x))² dx dy

and similarly,

(5.4) −∫_I p2(x) logb(p(x)/p2(x)) dx ≤ (1/(2 ln b)) ∫_I ∫_I p(x)p(y) (p2(x)/p(x) − p2(y)/p(y))² dx dy.

Consequently, using the identity (5.2) and the inequalities (5.3) and (5.4), we obtain the second inequality of (5.1).

Corollary 5.2. If

(5.5) max_{x,y} max_{i=1,2} |pi(x)/p(x) − pi(y)/p(y)| ≤ √(2 ε ln b)


for some ε > 0 (Note: ε depends on α), then

0 ≤ Hb(X)− αHb(X1)− (1− α)Hb(X2) ≤ ε.

Proof. From (5.5) we have that

α (p1(x)/p(x) − p1(y)/p(y))² + (1 − α) (p2(x)/p(x) − p2(y)/p(y))² ≤ 2 ε ln b,

thus, by (5.1)

Hb(X) − αHb(X1) − (1 − α)Hb(X2) ≤ (1/(2 ln b)) · 2 ε ln b ∫_I ∫_I p(x)p(y) dx dy = ε.
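The concavity statement of Theorem 5.1 is easy to test numerically. The sketch below is illustrative only (not from the paper); it assumes I = [0, 1], b = 2, α = 0.3 and two simple densities p1, p2 of our own choosing.

    # Illustrative check of Theorem 5.1, eq. (5.1), for an assumed mixed population.
    import numpy as np

    b, n, alpha = 2.0, 1000, 0.3
    x  = np.linspace(0.0, 1.0, n)
    dx = x[1] - x[0]
    normalise = lambda q: q / (np.sum(q) * dx)

    p1 = normalise(1.0 + 0.3 * np.sin(2.0 * np.pi * x))
    p2 = normalise(np.exp(-2.0 * x))
    p  = alpha * p1 + (1.0 - alpha) * p2              # density of the mixed population

    H = lambda q: -np.sum(q * np.log(q)) * dx / np.log(b)
    lhs = H(p) - alpha * H(p1) - (1.0 - alpha) * H(p2)

    # right-hand side of (5.1)
    d1 = (p1 / p)[:, None] - (p1 / p)[None, :]
    d2 = (p2 / p)[:, None] - (p2 / p)[None, :]
    w  = p[:, None] * p[None, :]
    rhs = np.sum(w * (alpha * d1 ** 2 + (1.0 - alpha) * d2 ** 2)) * dx * dx \
          / (2.0 * np.log(b))

    print(lhs, rhs)
    assert 0.0 <= lhs <= rhs + 1e-12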

We can prove similarly that the joint entropy of two random variables is also concave in the joint density.

Let (X1, Y1), (X2, Y2) and (X, Y) be pairs of random variables such that X1, X2 and X have the same (finite) range I, and Y1, Y2 and Y have the same (finite) range J, and furthermore,

p(x, y) = αp1(x, y) + (1− α)p2(x, y) ∀x ∈ I, ∀y ∈ J,

where α ∈ [0, 1]. The following Theorem and its Corollary can be proved in the same way as Theorem 5.1 and Corollary 5.2:

Theorem 5.3. With the previous assumptions we have

(5.6) 0 ≤ Hb(X, Y) − αHb(X1, Y1) − (1 − α)Hb(X2, Y2) ≤
≤ (1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J p(x, y)p(u, v) [α (p1(x, y)/p(x, y) − p1(u, v)/p(u, v))² + (1 − α)(p2(x, y)/p(x, y) − p2(u, v)/p(u, v))²] dx dy du dv.

The following Corollary follows from Theorem 5.3 directly.


Corollary 5.4. With the above assumptions and if

max_{(x,y),(u,v)} max_{i=1,2} |pi(x, y)/p(x, y) − pi(u, v)/p(u, v)| < √(2 ε ln b)

for some ε > 0 (note: this depends on α), then we have

0 ≤ Hb(X,Y )− αHb(X1, Y1)− (1− α)Hb(X2, Y2) ≤ ε.
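An analogous numerical check of Theorem 5.3 (illustrative only, not from the paper), assuming I = J = [0, 1], b = 2, α = 0.6 and two joint densities of our own choosing on the unit square:

    # Illustrative check of Theorem 5.3, eq. (5.6), for an assumed mixed population.
    import numpy as np

    b, n, alpha = 2.0, 40, 0.6
    t  = np.linspace(0.0, 1.0, n)
    dt = t[1] - t[0]
    X, Y = np.meshgrid(t, t, indexing="ij")
    normalise = lambda q: q / (np.sum(q) * dt * dt)

    p1 = normalise(1.0 + 0.3 * np.sin(2.0 * np.pi * X) * np.sin(2.0 * np.pi * Y))
    p2 = normalise(np.exp(-(X + Y)))
    p  = alpha * p1 + (1.0 - alpha) * p2

    H = lambda q: -np.sum(q * np.log(q)) * dt * dt / np.log(b)
    lhs = H(p) - alpha * H(p1) - (1.0 - alpha) * H(p2)

    # right-hand side of (5.6), indices running over (x, y) and (u, v)
    r1, r2, w = (p1 / p).ravel(), (p2 / p).ravel(), p.ravel()
    d1 = r1[:, None] - r1[None, :]
    d2 = r2[:, None] - r2[None, :]
    rhs = np.sum(w[:, None] * w[None, :] *
                 (alpha * d1 ** 2 + (1.0 - alpha) * d2 ** 2)) * dt ** 4 / (2.0 * np.log(b))

    print(lhs, rhs)
    assert 0.0 <= lhs <= rhs + 1e-12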

Similar results for conditional entropy can also be established as follows:

Theorem 5.5. With the previous assumptions we have

(5.7) 0 ≤ Hb(X | Y) − αHb(X1 | Y1) − (1 − α)Hb(X2 | Y2) ≤
≤ (1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J p(x | y)p(u | v) [α p1(y)p1(v) (p1(u | v)/p(u | v) − p1(x | y)/p(x | y))² + (1 − α) p2(y)p2(v) (p2(u | v)/p(u | v) − p2(x | y)/p(x | y))²] dx dy du dv.

Proof. We have, by definition,

(5.8)
Hb(X | Y) − αHb(X1 | Y1) − (1 − α)Hb(X2 | Y2) =
= ∫_I ∫_J p(x, y) logb(1/p(x | y)) dx dy − α ∫_I ∫_J p1(x, y) logb(1/p1(x | y)) dx dy − (1 − α) ∫_I ∫_J p2(x, y) logb(1/p2(x | y)) dx dy =
= −α ∫_I ∫_J p1(x, y) logb(p(x | y)/p1(x | y)) dx dy − (1 − α) ∫_I ∫_J p2(x, y) logb(p(x | y)/p2(x | y)) dx dy.


Using Jensen’s inequality, we have
−∫_I ∫_J p1(x, y) logb(p(x | y)/p1(x | y)) dx dy ≥ −logb(∫_I ∫_J p1(x, y) (p(x | y)/p1(x | y)) dx dy) =
= −logb(∫_I ∫_J p1(y) p(x | y) dx dy) = −logb(∫_J (∫_I p(x | y) dx) p1(y) dy) = −logb(∫_J p1(y) dy) = −logb 1 = 0,

and similarly

−∫_I ∫_J p2(x, y) logb(p(x | y)/p2(x | y)) dx dy ≥ 0,

thus proving the first inequality in (5.7). Furthermore, using a modified version of Theorem 1.1, we have

−∫_I ∫_J p1(x, y) logb(p(x | y)/p1(x | y)) dx dy ≤
≤ (1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J p1(x, y)p1(u, v) (p1(x | y)/p(x | y)) (p1(u | v)/p(u | v)) (p(x | y)/p1(x | y) − p(u | v)/p1(u | v))² dx dy du dv =
= (1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J (p1(x, y)p1(u, v)/(p1(x | y)p1(u | v))) (1/(p(x | y)p(u | v))) (p(x | y)p1(u | v) − p1(x | y)p(u | v))² dx dy du dv =
= (1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J p1(y)p1(v) p(x | y)p(u | v) (p1(u | v)/p(u | v) − p1(x | y)/p(x | y))² dx dy du dv.

A similar inequality for p2 can be obtained. Using these and the identity (5.8), we have

0 ≤ Hb(X | Y) − αHb(X1 | Y1) − (1 − α)Hb(X2 | Y2) ≤
≤ α (1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J p1(y)p1(v) p(x | y)p(u | v) (p1(u | v)/p(u | v) − p1(x | y)/p(x | y))² dx dy du dv +
+ (1 − α) (1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J p2(y)p2(v) p(x | y)p(u | v) (p2(u | v)/p(u | v) − p2(x | y)/p(x | y))² dx dy du dv,

which is the second inequality of (5.7). The following Corollary is immediately obvious:

Corollary 5.6. With the above assumptions and if:

max_{u,v,x,y} max_{i=1,2} |pi(u | v)/p(u | v) − pi(x | y)/p(x | y)| < √(2 ε ln b), ε > 0,

then:

0 ≤ Hb(X | Y )− αHb(X1 | Y1)− (1− α)Hb(X2 | Y2) < ε.

6. Results for the mutual information of mixed populations. The mutual information between two random variables X and Y was defined in Section 4. We may extend the results of Section 4 further. Consider two pairs of continuous random variables (X1, Y1) and (X2, Y2) with joint probability densities p1(x, y) and p2(x, y) respectively. One can think of X1 and X2 as the inputs to some noisy transmission channel and Y1 and Y2 as the outputs from the channel. Let the range of X1 and X2 be I, and the range of Y1 and Y2 be J. Define another pair of random variables (X, Y) (having ranges I and J respectively) where X’s probability density is a convex combination

p(x) = αp1(x) + (1− α)p2(x) ∀x,

where 0 ≤ α ≤ 1. The discrete version of the following result is known [2, p. 28]:

(6.1) αIb(X1;Y1) + (1− α)Ib(X2;Y2) ≤ Ib(X;Y )


where Y1, Y2 and Y are the channel outputs corresponding to X1, X2 and X respectively. The continuous version (concavity of I with respect to the input probabilities) can be established in the following Theorem:

Theorem 6.1. With the above assumptions we have:

(6.2) 0 ≤ Ib(X; Y) − αIb(X1; Y1) − (1 − α)Ib(X2; Y2) ≤
≤ (1/(2 ln b)) ∫_J ∫_J p(y)p(v) [α (p1(v)/p(v) − p1(y)/p(y))² + (1 − α)(p2(v)/p(v) − p2(y)/p(y))²] dy dv.

Proof. We have, by definition,

(6.3)
αIb(X1; Y1) + (1 − α)Ib(X2; Y2) − Ib(X; Y) =
= α ∫_I ∫_J p1(x, y) logb(p(y | x)/p1(y)) dx dy + (1 − α) ∫_I ∫_J p2(x, y) logb(p(y | x)/p2(y)) dx dy −
− ∫_I ∫_J (αp1(x, y) + (1 − α)p2(x, y)) logb(p(y | x)/p(y)) dx dy =
= α ∫_I ∫_J p1(x, y) logb(p(y)/p1(y)) dx dy + (1 − α) ∫_I ∫_J p2(x, y) logb(p(y)/p2(y)) dx dy.

Applying Jensen’s inequality to each term of the above sum, we have:
∫_I ∫_J p1(x, y) logb(p(y)/p1(y)) dx dy ≤ logb(∫_I ∫_J p1(x, y) (p(y)/p1(y)) dx dy) =
= logb(∫_J (p(y)/p1(y)) ∫_I p1(x, y) dx dy) = logb(∫_J (p(y)/p1(y)) p1(y) dy) = logb 1 = 0.

Similarly,
∫_I ∫_J p2(x, y) logb(p(y)/p2(y)) dx dy ≤ 0.


Adding the above yields the first inequality of (6.2). By a modified version of Theorem 1.1 we get that:

αIb(X1; Y1) + (1 − α)Ib(X2; Y2) − Ib(X; Y) ≥
≥ −(α/(2 ln b)) ∫_J ∫_J p(y)p(v) (p1(v)/p(v) − p1(y)/p(y))² dy dv −
− ((1 − α)/(2 ln b)) ∫_J ∫_J p(y)p(v) (p2(v)/p(v) − p2(y)/p(y))² dy dv,

which is the second inequality of (6.2). The following Corollary follows directly.

Corollary 6.2. With the above assumptions and if

(6.4) max_{v,y} max_{i=1,2} |pi(v)/p(v) − pi(y)/p(y)| < √(2 ε ln b)

for some ε > 0 (note: ε depends on α), then we have

(6.5) 0 ≤ Ib(X; Y) − αIb(X1; Y1) − (1 − α)Ib(X2; Y2) ≤ ε.
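Theorem 6.1 can be illustrated numerically with an assumed channel; the sketch below (not from the paper) takes I = J = [0, 1], b = 2, a transition density p(y | x) ∝ exp(−4(y − x)²) and two input densities of our own choosing, mixed with α = 0.4.

    # Illustrative check of Theorem 6.1, eq. (6.2), for an assumed channel and inputs.
    import numpy as np

    b, n, alpha = 2.0, 200, 0.4
    t  = np.linspace(0.0, 1.0, n)
    dt = t[1] - t[0]
    X, Yg = np.meshgrid(t, t, indexing="ij")

    chan = np.exp(-4.0 * (Yg - X) ** 2)                     # transition kernel
    chan /= np.sum(chan, axis=1, keepdims=True) * dt        # rows are densities p(y | x)

    normalise = lambda q: q / (np.sum(q) * dt)
    p1x = normalise(1.0 + 0.5 * t)
    p2x = normalise(2.0 - 1.2 * t)
    px  = alpha * p1x + (1.0 - alpha) * p2x                 # mixed input density

    def mutual_info(qx):
        qxy = qx[:, None] * chan                            # joint density
        qy  = np.sum(qxy, axis=0) * dt                      # output density
        return np.sum(qxy * np.log(qxy / (qx[:, None] * qy[None, :]))) * dt * dt / np.log(b)

    lhs = mutual_info(px) - alpha * mutual_info(p1x) - (1.0 - alpha) * mutual_info(p2x)

    # right-hand side of (6.2), expressed through the output densities
    p1y = np.sum(p1x[:, None] * chan, axis=0) * dt
    p2y = np.sum(p2x[:, None] * chan, axis=0) * dt
    py  = alpha * p1y + (1.0 - alpha) * p2y
    d1  = (p1y / py)[:, None] - (p1y / py)[None, :]
    d2  = (p2y / py)[:, None] - (p2y / py)[None, :]
    rhs = np.sum(py[:, None] * py[None, :] *
                 (alpha * d1 ** 2 + (1.0 - alpha) * d2 ** 2)) * dt * dt / (2.0 * np.log(b))

    print(lhs, rhs)
    assert 0.0 <= lhs <= rhs + 1e-12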

Furthermore, suppose the input probabilities p(x) are fixed and we are given two sets of transition probabilities p1(y | x) and p2(y | x). Let the convex combination of these transition probabilities be

p(y | x) = αp1(y | x) + (1− α)p2(y | x).

The discrete version of the following result is known [2, p. 29]

Ib(X; Y) ≤ αIb(X; Y1) + (1 − α)Ib(X; Y2),

where Y, Y1, Y2 are the channel outputs corresponding to the transition probabilities p(y | x), p1(y | x) and p2(y | x). This result can be extended as follows.

Theorem 6.3. With the above assumptions we have:

(6.6) 0 ≤ αIb(X; Y1) + (1 − α)Ib(X; Y2) − Ib(X; Y) ≤
≤ (1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J p(x | y)p(u | v) [α p1(y)p1(v) (p1(u | v)/p(u | v) − p1(x | y)/p(x | y))² + (1 − α) p2(y)p2(v) (p2(u | v)/p(u | v) − p2(x | y)/p(x | y))²] dx dy du dv.


Proof. By definition, we have:

(6.7)
Ib(X; Y) − αIb(X; Y1) − (1 − α)Ib(X; Y2) =
= α ∫_I ∫_J p1(x, y) logb(p(x | y)/p1(x | y)) dx dy + (1 − α) ∫_I ∫_J p2(x, y) logb(p(x | y)/p2(x | y)) dx dy.

Applying Jensen’s inequality to the first term, we get:

∫_I ∫_J p1(x, y) logb(p(x | y)/p1(x | y)) dx dy ≤ logb(∫_I ∫_J p1(x, y) (p(x | y)/p1(x | y)) dx dy) =
= logb(∫_I ∫_J p1(y) (p(x, y)/p(y)) dx dy) = logb(∫_J (p1(y)/p(y)) ∫_I p(x, y) dx dy) =
= logb(∫_J (p1(y)/p(y)) p(y) dy) = logb(∫_J p1(y) dy) = 0.

Similarly,
∫_I ∫_J p2(x, y) logb(p(x | y)/p2(x | y)) dx dy ≤ 0.

Adding the above yields the first inequality of (6.6).


By a modified version of Theorem 1.1, we have:

(6.8)
∫_I ∫_J p1(x, y) logb(p(x | y)/p1(x | y)) dx dy ≥ logb(∫_I ∫_J p1(x, y) (p(x | y)/p1(x | y)) dx dy) −
− (1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J p1(x, y)p1(u, v) (p1(x | y)/p(x | y)) (p1(u | v)/p(u | v)) (p(x | y)/p1(x | y) − p(u | v)/p1(u | v))² dx dy du dv =
= −(1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J (p1(x, y)p1(u, v)/(p1(x | y)p1(u | v))) (1/(p(x | y)p(u | v))) (p(x | y)p1(u | v) − p1(x | y)p(u | v))² dx dy du dv =
= −(1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J p1(y)p1(v) p(x | y)p(u | v) (p1(u | v)/p(u | v) − p1(x | y)/p(x | y))² dx dy du dv.

Using these and (6.7) and a similar inequality for p2, we obtain

αIb(X; Y1) + (1 − α)Ib(X; Y2) − Ib(X; Y) ≤
≤ (1/(2 ln b)) ∫_I ∫_J ∫_I ∫_J p(x | y)p(u | v) [α p1(y)p1(v) (p1(u | v)/p(u | v) − p1(x | y)/p(x | y))² + (1 − α) p2(y)p2(v) (p2(u | v)/p(u | v) − p2(x | y)/p(x | y))²] dx dy du dv,

and the Theorem is proved. The following Corollary follows directly.

Corollary 6.4. With the above assumptions and if

max_{u,v,x,y} max_{i=1,2} |pi(u | v)/p(u | v) − pi(x | y)/p(x | y)| < √(2 ε ln b)

for some ε > 0 (note: ε depends on α), then we have:

0 ≤ αIb(X;Y1) + (1− α)Ib(X;Y2)− Ib(X;Y ) ≤ ε.


7. Further bounds based on the ratio of maximum and minimum density. All the corollaries that we have discussed so far yield sufficient conditions for the lower bounds on the respective measures of random variables or mixed populations of random variables. These sufficient conditions are all based on the absolute difference between some densities. We now derive another sufficient condition which is based on the ratio of densities. We show that this leads to similar lower bounds which are in many cases simpler in appearance. The following Theorem is a Corollary of Theorem 2.1.

Theorem 7.1. Let

(7.1) ρ := max_{x,y} p(x)/p(y).

If

(7.2) ρ ≤ φ(ε) := 1 + ε ln b + √(ε ln b (ε ln b + 2)),

then

(7.3) 0 ≤ logb|I| −Hb(X) ≤ ε.

Proof. If ρ ≤ φ(ε), then clearly, for all x, y,

1/φ(ε) ≤ p(x)/p(y) ≤ φ(ε),
iff
(p(x)/p(y))² − (2 + 2 ε ln b)(p(x)/p(y)) + 1 ≤ 0,
iff
(p(x) − p(y))²/(p(x)p(y)) = p(x)/p(y) − 2 + p(y)/p(x) ≤ 2 ε ln b.

Hence from (2.2) of Theorem 2.1, we have
0 ≤ logb|I| − Hb(X) ≤ (1/(2 ln b)) ∫_I ∫_I p(x)p(y) · 2 ε ln b dx dy = ε.
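To illustrate Theorem 7.1 (again, not part of the paper): for a given density one can compute ρ, invert ρ = φ(ε) for the smallest admissible ε, and confirm that logb|I| − Hb(X) indeed stays below it. The density below and the inversion formula ε = (ρ − 1)²/(2ρ ln b) are our own choices for the example.

    # Illustrative check of Theorem 7.1 with an assumed density on I = [0, 1], b = 2.
    import numpy as np

    b, n = 2.0, 4000
    x  = np.linspace(0.0, 1.0, n)
    dx = x[1] - x[0]
    p  = 1.0 + 0.2 * np.sin(2.0 * np.pi * x)
    p /= np.sum(p) * dx

    phi = lambda eps: 1.0 + eps * np.log(b) + np.sqrt(eps * np.log(b) * (eps * np.log(b) + 2.0))

    rho = p.max() / p.min()
    eps = (rho - 1.0) ** 2 / (2.0 * rho * np.log(b))   # smallest eps with phi(eps) = rho

    H   = -np.sum(p * np.log(p)) * dx / np.log(b)
    gap = np.log(n * dx) / np.log(b) - H               # log_b|I| - H_b(X) on the grid
    print(rho, phi(eps), gap, eps)
    assert abs(phi(eps) - rho) < 1e-9 and 0.0 <= gap <= eps + 1e-12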

The following Theorem is a Corollary of Theorem 2.3, and the proof is similar to that of Theorem 7.1.


Theorem 7.2. Let

(7.4) ρ := max_{x,y,u,v} p(x, y)/p(u, v).

If

(7.5) ρ ≤ φ(ε),

then

(7.6) 0 ≤ logb|I × J| − Hb(X, Y) ≤ ε.

The following Theorem is a Corollary of Theorem 3.1, and the proof is similar to that of Theorem 7.1. Notice that this sufficient condition is independent of M (as defined in (3.7)), and is in a simpler form than before.

Theorem 7.3. Let

(7.7) ρ := max_{x,y,u,v} p(x | y)/p(u | v).

If

(7.8) ρ ≤ φ(ε),

then

(7.9) 0 ≤ Hb(Z) + IE_Z(logb A(Z)) − Hb(X | Y) ≤ ε,

where Hb(X | Y) and A are as defined in (3.1) and (3.2) respectively.

The following Theorem is a Corollary of Theorem 4.1, and the proof is similar to that of Theorem 7.1.

Theorem 7.4. Let

(7.10) ρ := max_{x,y,u,v} (p(x)p(y)p(u, v))/(p(u)p(v)p(x, y)).

If

(7.11) ρ ≤ φ(ε),


then

(7.12) 0 ≤ Ib(X; Y) ≤ ε.

The following Theorem is a Corollary of Theorem 4.3, and the proof is similar to that of Theorem 7.1.

Theorem 7.5. Let

(7.13) ρ := max_{x,y,z,u,v,w} (p(z | x, y)p(w | v))/(p(w | u, v)p(z | y)).

If

(7.14) ρ ≤ φ(ε),

then

(7.15) 0 ≤ Ib(X,Y ;Z)− Ib(Y ;Z) ≤ ε.

REFERENCES

1. Dragomir, S.S. and Goh, C.J. – A Counterpart of Jensen’s Discrete Inequality for Differentiable Convex Mappings and Applications in Information Theory, Mathematical and Computer Modelling, to appear.

2. McEliece, R.J. – The Theory of Information and Coding, Addison-Wesley Publishing Company, Reading, 1977.

Received: 19.I.2000

Department of Mathematics

University of Timisoara

RO-1900 Timisoara

ROMANIA

Department of Mathematics

University of Western Australia

Nedlands, WA 6907

AUSTRALIA