ANALELE STIINTIFICE ALE UNIVERSITATII ”AL.I.CUZA” IASI, Tomul XLVII, s.I a, Matematica, 2001, f.2.

A COUNTERPART OF JENSEN'S CONTINUOUS INEQUALITY AND APPLICATIONS IN INFORMATION THEORY
BY
S.S. DRAGOMIR∗ and C.J. GOH
Abstract. We derive a counterpart of Jensen's continuous inequality for differentiable convex mappings. This seemingly trivial extension has significant applications in information theory. In particular, we derive several useful bounds for (differential) entropy, joint entropy, conditional entropy, and mutual information arising from the theory of information. We also apply it to derive bounds for entropy measures for mixed populations of random variables.
AMS Subject Classification: 94D17, 26D15, 26D20, 26D99.
Key words: Jensen's inequality, Information theory, Differential Entropy.
1. Introduction. Consider a random variable $X : \Omega \to I \subset \mathbb{R}$ (where $\Omega$ is some sample space) with a corresponding density function $p_X : I \to [0,1]$ which has finite support, so that $p_X(x) = 0$ for all $x \notin I$. For notational clarity, we shall suppress the subscript, i.e., the random variable with which the density function is associated, so that $p(x)$ is the same as $p_X(x)$. Let $Y(X)$ be another random variable which is a function of the random variable $X$, and let $f : \mathbb{R} \to \mathbb{R}$ be some convex differentiable function, so that
$$f(x) - f(y) \ge f'(y)(x - y) \qquad \forall x, y \in \mathbb{R}.\tag{1.1}$$
Define the inner product of two functions $\varphi : I \to \mathbb{R}$ and $\psi : I \to \mathbb{R}$ to be
$$\langle \varphi, \psi \rangle = \int_I \varphi(x)\psi(x)p(x)\,dx\tag{1.2}$$
∗SSD acknowledges the financial support from the University of Western Australia during his visit, when this work was completed.
with
$$\|\varphi\|^2 = \int_I \varphi^2(x)p(x)\,dx.\tag{1.3}$$
The well-known Schwarz inequality is given by
$$|\langle \varphi, \psi \rangle| \le \|\varphi\|\,\|\psi\|\tag{1.4}$$
or
$$|\mathbb{E}(\varphi(X)\psi(X))| \le \left[\mathbb{E}(\varphi(X)^2)\right]^{1/2}\left[\mathbb{E}(\psi(X)^2)\right]^{1/2},\tag{1.5}$$
where $\mathbb{E}(\cdot) := \int_I (\cdot)\,p(x)\,dx$. The following inequality is well known in the literature as the (continuous) Jensen inequality for the convex function $f$:
$$f(\mathbb{E}(Y)) = f\left[\int_I Y(x)p(x)\,dx\right] \le \int_I f(Y(x))p(x)\,dx = \mathbb{E}(f(Y)).\tag{1.6}$$
It is the aim of this paper to derive a counterpart of Jensen's inequality, and to apply it to obtain useful upper bounds for several quantitative measures arising from information theory for continuous random variables. The following is the continuous version of Theorem 2.1 of [1].
Theorem 1.1. Let $f : \mathbb{R} \to \mathbb{R}$ be convex and differentiable, let $Y$ be a random variable (as a function of the random variable $X$), and let $\mathbb{E}(\cdot) = \int_I (\cdot)\,p(x)\,dx$. Then we have
$$0 \le \mathbb{E}[f(Y)] - f[\mathbb{E}(Y)] \le \mathbb{E}[Yf'(Y)] - \mathbb{E}[f'(Y)]\,\mathbb{E}(Y) \le \mathbb{E}[f'(Y)^2]^{1/2}\left(\mathbb{E}(Y^2) - [\mathbb{E}(Y)]^2\right)^{1/2}.\tag{1.7}$$
Proof. The first inequality is just Jensen's, which follows readily from the convexity of $f$. Since $f$ is convex, we have
$$f(\mathbb{E}(Y)) - f(Y) \ge f'(Y)[\mathbb{E}(Y) - Y].$$
Taking expectations on both sides, we have
$$f[\mathbb{E}(Y)] - \mathbb{E}[f(Y)] \ge \mathbb{E}[f'(Y)]\,\mathbb{E}(Y) - \mathbb{E}[Yf'(Y)],$$
which is the second inequality. Furthermore,
$$\mathbb{E}[f'(Y)]\,\mathbb{E}(Y) - \mathbb{E}[Yf'(Y)] = \mathbb{E}[f'(Y)\mathbb{E}(Y) - Yf'(Y)] = \mathbb{E}[f'(Y)(\mathbb{E}(Y) - Y)] \le$$
$$\le \mathbb{E}[f'(Y)^2]^{1/2}\,\mathbb{E}[(Y - \mathbb{E}(Y))^2]^{1/2} \quad \text{(by the Schwarz inequality)}$$
$$= \mathbb{E}[f'(Y)^2]^{1/2}\left(\mathbb{E}(Y^2) - [\mathbb{E}(Y)]^2\right)^{1/2},$$
which is the third inequality.
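As a quick numerical sanity check of the chain (1.7), the sketch below evaluates all three bounds for a small discrete distribution and the convex mapping $f(y) = y^2$; both the distribution and the choice of $f$ are illustrative assumptions, not taken from the paper.

```python
import math

# Discrete sketch of the chain (1.7) with f(y) = y^2 (so f'(y) = 2y).
# The three-point distribution below is an illustrative assumption.
p = [0.2, 0.5, 0.3]          # probabilities
y = [1.0, 2.0, 4.0]          # values of Y

E = lambda vals: sum(pi * v for pi, v in zip(p, vals))
f = lambda t: t * t
df = lambda t: 2.0 * t

# 0 <= E f(Y) - f(E Y) <= E[Y f'(Y)] - E[f'(Y)] E(Y) <= Schwarz bound
jensen_gap = E([f(v) for v in y]) - f(E(y))
counterpart = E([v * df(v) for v in y]) - E([df(v) for v in y]) * E(y)
schwarz = math.sqrt(E([df(v) ** 2 for v in y])) * math.sqrt(E([f(v) for v in y]) - E(y) ** 2)

assert 0.0 <= jensen_gap <= counterpart <= schwarz
print(jensen_gap, counterpart, schwarz)
```

For $f(y) = y^2$ the middle term is exactly twice the Jensen gap ($2\,\mathrm{Var}(Y)$ versus $\mathrm{Var}(Y)$), which makes the ordering easy to see.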
2. Applications in Information Theory: Bounds for Differential Entropy. Given a continuous random variable $X : \Omega \to I$, the uncertainty of the associated random event is given by the entropy function (see McEliece [2]):
$$H_b(X) = \mathbb{E}(-\log_b p(X)) = -\int_I p(x)\log_b(p(x))\,dx.\tag{2.1}$$
Theorem 2.1. With the above assumptions, we have:
$$0 \le \log_b|I| - H_b(X) \le \frac{1}{\ln b}\left(|I|\int_I p(x)^2\,dx - 1\right) = \frac{1}{2\ln b}\int_I\int_I (p(x) - p(y))^2\,dx\,dy,\tag{2.2}$$
where equality holds if and only if p(x) is constant almost everywhere in I.
Proof. In Theorem 1.1, let
$$f(y) = -\log_b y, \qquad f'(y) = -\frac{1}{\ln b}\,\frac{1}{y}, \qquad Y(x) = \frac{1}{p(x)};$$
we have
$$0 \le \int_I p(x)\log_b p(x)\,dx + \log_b\left(\int_I \frac{p(x)}{p(x)}\,dx\right) \le$$
$$\le \int_I\left[-\frac{1}{\ln b}\,p(x)\right]dx + \frac{1}{\ln b}\int_I p(x)^2\,dx\int_I \frac{p(x)}{p(x)}\,dx = \frac{1}{\ln b}\left[|I|\int_I p(x)^2\,dx - 1\right]$$
or
$$0 \le \log_b|I| - H_b(X) \le \frac{1}{\ln b}\left(|I|\int_I p(x)^2\,dx - 1\right) = \frac{1}{2\ln b}\int_I\int_I (p(x) - p(y))^2\,dx\,dy.$$
The following Corollary is immediately obvious.
Corollary 2.2. Let the density p be such that
$$\max_{(x,y)\in I^2} |p(x) - p(y)| \le \frac{\sqrt{2\varepsilon \ln b}}{|I|}\tag{2.3}$$
for some ε > 0, then
$$0 \le \log_b|I| - H_b(X) \le \varepsilon.\tag{2.4}$$
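The two-sided estimate (2.2) can be checked by quadrature. The sketch below uses the illustrative density $p(x) = 1 + 0.5\cos(2\pi x)$ on $I = [0,1]$ with base $b = 2$ (all assumptions for the sketch, not from the paper); here $\log_b|I| = 0$ and the differential entropy is negative, so the gap is positive.

```python
import math

# Midpoint-rule sketch of the bounds in (2.2) on I = [0, 1], base b = 2.
# The density p(x) = 1 + 0.5*cos(2*pi*x) integrates to 1 on [0, 1]; it is
# an illustrative choice, not taken from the paper.
b, n = 2.0, 2000
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]
p = [1.0 + 0.5 * math.cos(2 * math.pi * x) for x in xs]

H = -sum(pi * math.log(pi, b) * h for pi in p)        # H_b(X)
gap = math.log(1.0, b) - H                            # log_b|I| - H_b(X), |I| = 1
bound = (1.0 * sum(pi * pi * h for pi in p) - 1.0) / math.log(b)

assert -1e-9 <= gap <= bound + 1e-9
print(gap, bound)
```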
The above result can be easily extended to the joint entropy of two random variables. Consider two continuous random variables $X : \Omega \to I$ and $Y : \Omega \to J$ with joint density function $p(x,y)$. The joint entropy of $X$ and $Y$ is defined as
$$H_b(X,Y) = -\int_I\int_J p(x,y)\log_b p(x,y)\,dx\,dy.\tag{2.5}$$
The proof of the following Theorem and its Corollary is similar to that of Theorem 2.1.
Theorem 2.3. With the above assumptions, we have
$$0 \le \log_b|I\times J| - H_b(X,Y) \le \frac{1}{2\ln b}\int_I\int_J\int_I\int_J (p(x,y) - p(u,v))^2\,dx\,dy\,du\,dv,\tag{2.6}$$
where equality holds if and only if $p(x,y)$ is constant almost everywhere in $I\times J$.
Corollary 2.4. Let the joint density p(x, y) be such that
$$\max_{(x,y),(u,v)\in I\times J} |p(x,y) - p(u,v)| \le \frac{\sqrt{2\varepsilon \ln b}}{|I\times J|}\tag{2.7}$$
for some ε > 0, then
$$0 \le \log_b|I\times J| - H_b(X,Y) \le \varepsilon.\tag{2.8}$$
3. Results for Conditional Entropy. Consider a triplet of random variables $X : \Omega \to I$, $Y : \Omega \to J$ and $Z : \Omega \to K$. The conditional entropy of $X$ given $Y$ is defined as
$$H_b(X \mid Y) = \int_I\int_J p(x,y)\log_b\frac{1}{p(x \mid y)}\,dx\,dy,\tag{3.1}$$
where $p(x \mid y) = p(x,y)/p(y)$ is the conditional probability of $X$ given $Y = y$. We also define the conditional probability of $Z$ given $Y = y$ as $p(z \mid y)$, and the conditional probability of $Z$ given $Y = y$ and $X = x$ as $p(z \mid x,y)$. Let
$$A(z) := \int_I\int_J \gamma_{xy}(z)\,dx\,dy,\tag{3.2}$$
where
$$\gamma_{xy}(z) := \frac{p(x,y,z)}{p(x \mid y)} \qquad \forall x, y, z.\tag{3.3}$$
Theorem 3.1. Given the assumptions above, we have:
$$0 \le H_b(Z) + \mathbb{E}_Z[\log_b A(Z)] - H_b(X \mid Y) \le$$
$$\le \frac{1}{2\ln b}\int_K \frac{1}{p(z)}\int_I\int_J\int_I\int_J \gamma_{xy}(z)\gamma_{uv}(z)\,(p(x \mid y) - p(u \mid v))^2\,dx\,dy\,du\,dv\,dz,$$
where $\mathbb{E}_Z(\cdot) = \int_K (\cdot)\,p(z)\,dz$.
Proof. To prove the Theorem, we need to modify Theorem 1.1 to read:
$$0 \le \mathbb{E}_{X,Y\mid z}\big(f(W(X,Y))\big) - f\big(\mathbb{E}_{X,Y\mid z}(W(X,Y))\big) \le$$
$$\le \mathbb{E}_{X,Y\mid z}\big(W(X,Y)f'(W(X,Y))\big) - \mathbb{E}_{X,Y\mid z}\big(f'(W(X,Y))\big)\,\mathbb{E}_{X,Y\mid z}\big(W(X,Y)\big).$$
Let
$$f(W) = -\log_b W, \qquad f'(W) = -\frac{1}{\ln b}\,\frac{1}{W}, \qquad W(x,y) = \frac{1}{p(x \mid y)}$$
in the above, we obtain
$$0 \le \int_I\int_J\left[-p(x,y \mid z)\log_b\frac{1}{p(x \mid y)}\right]dx\,dy + \log_b\left(\int_I\int_J \frac{p(x,y \mid z)}{p(x \mid y)}\,dx\,dy\right) \le\tag{3.5}$$
$$\le \frac{1}{\ln b}\int_I\int_J\left[-p(x,y \mid z)\,\frac{p(x \mid y)}{p(x \mid y)}\right]dx\,dy + \frac{1}{\ln b}\int_I\int_J p(x,y \mid z)\,p(x \mid y)\,dx\,dy \int_I\int_J \frac{p(u,v \mid z)}{p(u \mid v)}\,du\,dv.$$
The last two terms on the right-hand side of the second inequality can be rewritten as:
$$\int_I\int_J \frac{p(x,y,z)}{p(z)}\,p(x \mid y)\,dx\,dy \cdot \int_I\int_J \frac{p(u,v,z)}{p(z)\,p(u \mid v)}\,du\,dv - 1 =$$
$$= \frac{1}{p(z)^2}\left(\int_I\int_J \frac{p(x,y,z)}{p(x \mid y)}\,p(x \mid y)^2\,dx\,dy \cdot \int_I\int_J \frac{p(u,v,z)}{p(u \mid v)}\,du\,dv - p(z)^2\right) =$$
$$= \frac{1}{p(z)^2}\left(\int_I\int_J \gamma_{xy}(z)\,p(x \mid y)^2\,dx\,dy \cdot \int_I\int_J \gamma_{uv}(z)\,du\,dv - \left(\int_I\int_J p(x,y,z)\,dx\,dy\right)^2\right) =$$
$$= \frac{1}{2p(z)^2}\int_I\int_J\int_I\int_J \gamma_{xy}(z)\gamma_{uv}(z)\,(p(x \mid y) - p(u \mid v))^2\,dx\,dy\,du\,dv.$$
Substituting this term for the last two terms of (3.5), we get:
$$0 \le \int_I\int_J\left[-\frac{p(x,y,z)}{p(z)}\log_b\frac{1}{p(x \mid y)}\right]dx\,dy + \log_b\left(\int_I\int_J \frac{p(x,y,z)}{p(z)\,p(x \mid y)}\,dx\,dy\right) \le$$
$$\le \frac{1}{2p(z)^2\ln b}\int_I\int_J\int_I\int_J \gamma_{xy}(z)\gamma_{uv}(z)\,(p(x \mid y) - p(u \mid v))^2\,dx\,dy\,du\,dv.$$
Multiplying the above by $p(z)$ and integrating over $z$ gives:
$$0 \le \int_K p(z)\log_b\frac{1}{p(z)}\,dz + \int_K p(z)\log_b\left(\int_I\int_J \frac{p(x,y,z)}{p(x \mid y)}\,dx\,dy\right)dz - \int_I\int_J\int_K p(x,y,z)\log_b\frac{1}{p(x \mid y)}\,dx\,dy\,dz \le$$
$$\le \frac{1}{2\ln b}\int_K \frac{1}{p(z)}\left(\int_I\int_J\int_I\int_J \gamma_{xy}(z)\gamma_{uv}(z)\,(p(x \mid y) - p(u \mid v))^2\,dx\,dy\,du\,dv\right)dz,$$
or
$$0 \le H_b(Z) + \mathbb{E}_Z(\log_b A(Z)) - H_b(X \mid Y) \le \frac{1}{2\ln b}\int_K \frac{1}{p(z)}\left(\int_I\int_J\int_I\int_J \gamma_{xy}(z)\gamma_{uv}(z)\,(p(x \mid y) - p(u \mid v))^2\,dx\,dy\,du\,dv\right)dz,$$
which completes the proof. The following Corollary follows readily from the above:
Corollary 3.2. Let ε > 0 be given. If
$$\max_{(x,y),(u,v)} |p(x \mid y) - p(u \mid v)| \le \sqrt{\frac{2\varepsilon \ln b}{M}},\tag{3.6}$$
where
$$M := \int_K \frac{A(z)^2}{p(z)}\,dz = \int_K \frac{1}{p(z)}\left(\int_I\int_J \gamma_{xy}(z)\,dx\,dy\right)^2 dz,\tag{3.7}$$
then
$$0 \le H_b(Z) + \mathbb{E}_Z(\log_b A(Z)) - H_b(X \mid Y) \le \varepsilon.\tag{3.8}$$
Using Theorem 1.1, we can readily obtain the following result and its corollary:
Theorem 3.3. With the above assumptions, we have:
$$0 \le \log_b|I| - H_b(X \mid Y) \le \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p(y)p(v)\,(p(x \mid y) - p(u \mid v))^2\,dx\,dy\,du\,dv.\tag{3.9}$$
Corollary 3.4. Let ε > 0 be given. If
$$\max_{(x,y),(u,v)} |p(x \mid y) - p(u \mid v)| \le \frac{\sqrt{2\varepsilon \ln b}}{|I|},\tag{3.10}$$
then
$$0 \le \log_b|I| - H_b(X \mid Y) \le \varepsilon.\tag{3.11}$$
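Theorem 3.3 admits a similar numerical check. The sketch below uses a two-point $Y$ with illustrative weights and conditional densities on $I = [0,1]$ (all assumptions, not from the paper); the quadruple integral in (3.9) reduces to first and second moments of the conditional densities.

```python
import math

# Quadrature sketch of Theorem 3.3 on I = [0, 1], base b = 2, with a
# two-point Y.  The weights p(y) and the conditional densities below are
# illustrative assumptions, not taken from the paper.
b, n = 2.0, 2000
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]
py = [0.6, 0.4]
cond = [[1.0 + 0.5 * math.cos(2 * math.pi * x) for x in xs],   # p(x | y1)
        [1.0 - 0.5 * math.cos(2 * math.pi * x) for x in xs]]   # p(x | y2)

# H_b(X | Y) = sum_y p(y) * (-int p(x|y) log_b p(x|y) dx); log_b|I| = 0.
H_cond = sum(py[k] * -sum(q * math.log(q, b) * h for q in cond[k])
             for k in range(2))
gap = 0.0 - H_cond

# int int (p(x|y) - p(u|v))^2 dx du, expanded via first and second moments.
m1 = [sum(q * h for q in c) for c in cond]
m2 = [sum(q * q * h for q in c) for c in cond]
bound = sum(py[k] * py[l] * (m2[k] + m2[l] - 2.0 * m1[k] * m1[l])
            for k in range(2) for l in range(2)) / (2.0 * math.log(b))

assert -1e-9 <= gap <= bound + 1e-9
print(gap, bound)
```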
4. Results for Mutual Information. Given the pair of continuous random variables $X : \Omega \to I$ and $Y : \Omega \to J$ with joint density function $p(x,y)$, the mutual information is defined as [2, p. 24]:
$$I_b(X;Y) := H_b(X) - H_b(X \mid Y) = \int_I\int_J p(x,y)\log_b\frac{p(x,y)}{p(x)p(y)}\,dx\,dy.\tag{4.1}$$
Theorem 4.1. With the above assumptions, we have:
$$0 \le I_b(X;Y) \le \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p(x)p(y)p(u)p(v)\left(\frac{p(x,y)}{p(x)p(y)} - \frac{p(u,v)}{p(u)p(v)}\right)^2 dx\,dy\,du\,dv.\tag{4.2}$$
Moreover, equality holds in both inequalities simultaneously if and only if $X$ and $Y$ are independent.
Proof. To prove this Theorem, we modify Theorem 1.1 to read:
$$0 \le \mathbb{E}_{X,Y}\big(f(W(X,Y))\big) - f\big(\mathbb{E}_{X,Y}W(X,Y)\big) \le \mathbb{E}_{X,Y}\big(W(X,Y)f'(W(X,Y))\big) - \mathbb{E}_{X,Y}f'(W(X,Y))\,\mathbb{E}_{X,Y}W(X,Y),\tag{4.3}$$
where $\mathbb{E}_{X,Y}(\cdot) = \int_I\int_J (\cdot)\,p(x,y)\,dx\,dy$. Let
$$f(W) = -\log_b W, \qquad f'(W) = -\frac{1}{\ln b}\,\frac{1}{W}, \qquad W(x,y) = \frac{p(x)p(y)}{p(x,y)},$$
in the above to obtain
$$0 \le \int_I\int_J p(x,y)\log_b\frac{p(x,y)}{p(x)p(y)}\,dx\,dy + \log_b\left(\int_I\int_J p(x,y)\,\frac{p(x)p(y)}{p(x,y)}\,dx\,dy\right) \le$$
$$\le -\frac{1}{\ln b}\int_I\int_J \frac{p(x)p(y)}{p(x,y)}\,\frac{p(x,y)}{p(x)p(y)}\,p(x,y)\,dx\,dy + \frac{1}{\ln b}\int_I\int_J \frac{p(x,y)^2}{p(x)p(y)}\,dx\,dy \int_I\int_J \frac{p(x)p(y)}{p(x,y)}\,p(x,y)\,dx\,dy$$
or
$$0 \le I_b(X;Y) \le \frac{1}{\ln b}\left(\int_I\int_J \left(\frac{p(x,y)}{p(x)p(y)}\right)^2 p(x)p(y)\,dx\,dy \int_I\int_J p(x)p(y)\,dx\,dy - 1\right) =$$
$$= \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p(x)p(y)p(u)p(v)\left(\frac{p(x,y)}{p(x)p(y)} - \frac{p(u,v)}{p(u)p(v)}\right)^2 dx\,dy\,du\,dv.$$
Clearly, $X$ and $Y$ are independent if and only if $p(x,y) = p(x)p(y)$ almost everywhere, if and only if the last term of the above is 0. This completes the proof.
The following Corollary is immediately obvious:
Corollary 4.2. Let ε > 0 be given. If
$$\max_{(x,y),(u,v)}\left|\frac{p(x,y)}{p(x)p(y)} - \frac{p(u,v)}{p(u)p(v)}\right| \le \sqrt{2\varepsilon \ln b},\tag{4.4}$$
then,
$$0 \le I_b(X;Y) \le \varepsilon.\tag{4.5}$$
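The mutual-information bound (4.2) and Corollary 4.2 can be illustrated on a discrete analogue. The $2\times 2$ joint distribution below is an illustrative assumption, not taken from the paper.

```python
import math

# Discrete sketch of the mutual-information bound (4.2), base b = 2.
# The 2x2 joint distribution is an illustrative assumption.
b = 2.0
pxy = [[0.30, 0.20],
       [0.15, 0.35]]
px = [sum(row) for row in pxy]                  # marginal p(x)
py = [sum(col) for col in zip(*pxy)]            # marginal p(y)

I_b = sum(pxy[i][j] * math.log(pxy[i][j] / (px[i] * py[j]), b)
          for i in range(2) for j in range(2))

r = lambda i, j: pxy[i][j] / (px[i] * py[j])    # the ratio p(x,y)/(p(x)p(y))
bound = sum(px[i] * py[j] * px[u] * py[v] * (r(i, j) - r(u, v)) ** 2
            for i in range(2) for j in range(2)
            for u in range(2) for v in range(2)) / (2.0 * math.log(b))

assert -1e-12 <= I_b <= bound + 1e-12
print(I_b, bound)
```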
Now consider a triplet of random variables $X : \Omega \to I$, $Y : \Omega \to J$ and $Z : \Omega \to K$. Define the mutual information (interpreted as the amount of information $X$ and $Y$ provide about $Z$, see p. 26 of [2])
$$I_b(X,Y;Z) := \int_I\int_J\int_K p(x,y,z)\log_b\frac{p(z \mid x,y)}{p(z)}\,dx\,dy\,dz.\tag{4.6}$$
Theorem 4.3. With the above assumptions, we have:
$$0 \le I_b(X,Y;Z) - I_b(Y;Z) \le$$
$$\le \frac{1}{2\ln b}\int_I\int_J\int_K\int_I\int_J\int_K p(x,y)p(u,v)p(z \mid y)p(w \mid v)\left(\frac{p(z \mid x,y)}{p(z \mid y)} - \frac{p(w \mid u,v)}{p(w \mid v)}\right)^2 dx\,dy\,dz\,du\,dv\,dw,\tag{4.7}$$
where equality holds if and only if $p(z \mid x,y) = p(z \mid y)$ for all $(x,y,z)$ with $p(x,y,z) > 0$.
Proof. We first modify Theorem 1.1 to read:
$$0 \le \mathbb{E}_{XYZ}\big(f(W(X,Y,Z))\big) - f\big(\mathbb{E}_{XYZ}W(X,Y,Z)\big) \le$$
$$\le \mathbb{E}_{XYZ}\big(W(X,Y,Z)f'(W(X,Y,Z))\big) - \mathbb{E}_{XYZ}\big(f'(W(X,Y,Z))\big)\,\mathbb{E}_{XYZ}\big(W(X,Y,Z)\big),$$
where $\mathbb{E}_{XYZ}(\cdot) = \int_I\int_J\int_K (\cdot)\,p(x,y,z)\,dx\,dy\,dz$. Let:
$$f(W) = -\log_b W, \qquad f'(W) = -\frac{1}{\ln b}\,\frac{1}{W}, \qquad W(X,Y,Z) = \frac{p(z \mid y)}{p(z \mid x,y)},$$
in the above to obtain
$$0 \le \log_b\left(\int_I\int_J\int_K \frac{p(z \mid y)}{p(z \mid x,y)}\,p(x,y,z)\,dx\,dy\,dz\right) - \int_I\int_J\int_K p(x,y,z)\log_b\frac{p(z \mid y)}{p(z \mid x,y)}\,dx\,dy\,dz \le$$
$$\le \int_I\int_J\int_K\left[-\frac{1}{\ln b}\,W\,\frac{1}{W}\,p(x,y,z)\right]dx\,dy\,dz + \frac{1}{\ln b}\int_I\int_J\int_K \frac{p(z \mid x,y)}{p(z \mid y)}\,p(x,y,z)\,dx\,dy\,dz \times \int_I\int_J\int_K \frac{p(z \mid y)}{p(z \mid x,y)}\,p(x,y,z)\,dx\,dy\,dz.$$
Noting that
$$\int_I\int_J\int_K \frac{p(z \mid y)}{p(z \mid x,y)}\,p(x,y,z)\,dx\,dy\,dz = \int_I\int_J\int_K p(z \mid y)\,p(x,y)\,dx\,dy\,dz = \int_I\int_J p(x,y)\int_K p(z \mid y)\,dz\,dx\,dy = \int_I\int_J p(x,y)\,dx\,dy = 1,$$
we have,
$$0 \le I_b(X,Y;Z) - I_b(Y;Z) \le \frac{1}{\ln b}\left(\int_I\int_J\int_K \frac{p(z \mid x,y)}{p(z \mid y)}\,p(x,y,z)\,dx\,dy\,dz \times \int_I\int_J\int_K \frac{p(z \mid y)}{p(z \mid x,y)}\,p(x,y,z)\,dx\,dy\,dz - 1\right) =$$
$$= \frac{1}{2\ln b}\int_I\int_J\int_K\int_I\int_J\int_K p(x,y)p(u,v)p(z \mid y)p(w \mid v)\left(\frac{p(z \mid x,y)}{p(z \mid y)} - \frac{p(w \mid u,v)}{p(w \mid v)}\right)^2 dx\,dy\,dz\,du\,dv\,dw,$$
which completes the proof. The following Corollary is immediately obvious:
Corollary 4.4. Let ε > 0 be given. If
$$\max_{(x,y,z),(u,v,w)}\left|\frac{p(z \mid x,y)}{p(z \mid y)} - \frac{p(w \mid u,v)}{p(w \mid v)}\right| \le \sqrt{2\varepsilon \ln b},\tag{4.9}$$
then,
$$0 \le I_b(X,Y;Z) - I_b(Y;Z) \le \varepsilon.\tag{4.10}$$
5. Results for the entropy, joint entropy and conditional entropy of mixed populations. Consider two continuous random variables $X_1, X_2$, both having the same (finite) range $I$ but having different densities. If we mix the two populations representing $X_1$ and $X_2$ together with proportions $\alpha$ and $1-\alpha$ respectively, $0 \le \alpha \le 1$, we derive a new population represented by a random variable $X$ such that
$$p(x) = \alpha p_1(x) + (1-\alpha)p_2(x), \qquad \alpha \in [0,1],$$
where $p_1, p_2$ are the probability densities of $X_1, X_2$ respectively, and $p(x)$ is the probability density of $X$. In this and the following section, we shall study the concavity properties of the entropy of the mixed population, and establish some useful bounds for various important quantitative measures in information theory.
Theorem 5.1. With the above assumptions we have
$$0 \le H_b(X) - \alpha H_b(X_1) - (1-\alpha)H_b(X_2) \le$$
$$\le \frac{1}{2\ln b}\int_I\int_I p(x)p(y)\left(\alpha\left(\frac{p_1(x)}{p(x)} - \frac{p_1(y)}{p(y)}\right)^2 + (1-\alpha)\left(\frac{p_2(x)}{p(x)} - \frac{p_2(y)}{p(y)}\right)^2\right)dx\,dy.\tag{5.1}$$
Proof. We have,
$$H_b(X) - \alpha H_b(X_1) - (1-\alpha)H_b(X_2) =$$
$$= \int_I p(x)\log_b\frac{1}{p(x)}\,dx - \alpha\int_I p_1(x)\log_b\frac{1}{p_1(x)}\,dx - (1-\alpha)\int_I p_2(x)\log_b\frac{1}{p_2(x)}\,dx =\tag{5.2}$$
$$= \alpha\int_I p_1(x)\log_b\frac{p_1(x)}{p(x)}\,dx + (1-\alpha)\int_I p_2(x)\log_b\frac{p_2(x)}{p(x)}\,dx =$$
$$= -\alpha\int_I p_1(x)\log_b\frac{p(x)}{p_1(x)}\,dx - (1-\alpha)\int_I p_2(x)\log_b\frac{p(x)}{p_2(x)}\,dx.$$
By Jensen's inequality we have that
$$\int_I p_1(x)\log_b\frac{p(x)}{p_1(x)}\,dx \le \log_b\int_I p_1(x)\,\frac{p(x)}{p_1(x)}\,dx = \log_b\int_I p(x)\,dx = 0$$
and similarly
$$\int_I p_2(x)\log_b\frac{p(x)}{p_2(x)}\,dx \le 0,$$
from which we obtain the first inequality of (5.1):
$$H_b(X) - \alpha H_b(X_1) - (1-\alpha)H_b(X_2) \ge 0.$$
Using a modification of Theorem 1.1 we have:
$$-\int_I p_1(x)\log_b\frac{p(x)}{p_1(x)}\,dx \le -\log_b\left(\int_I p_1(x)\,\frac{p(x)}{p_1(x)}\,dx\right) +\tag{5.3}$$
$$+ \frac{1}{2\ln b}\int_I\int_I p_1(x)p_1(y)\,\frac{p_1(x)}{p(x)}\,\frac{p_1(y)}{p(y)}\left(\frac{p(x)}{p_1(x)} - \frac{p(y)}{p_1(y)}\right)^2 dx\,dy =$$
$$= \frac{1}{2\ln b}\int_I\int_I \frac{p_1^2(x)\,p_1^2(y)}{p(x)p(y)}\cdot\frac{(p(x)p_1(y) - p(y)p_1(x))^2}{p_1^2(x)\,p_1^2(y)}\,dx\,dy =$$
$$= \frac{1}{2\ln b}\int_I\int_I p(x)p(y)\left(\frac{p_1(y)}{p(y)} - \frac{p_1(x)}{p(x)}\right)^2 dx\,dy$$
and similarly,
$$-\int_I p_2(x)\log_b\frac{p(x)}{p_2(x)}\,dx \le \frac{1}{2\ln b}\int_I\int_I p(x)p(y)\left(\frac{p_2(x)}{p(x)} - \frac{p_2(y)}{p(y)}\right)^2 dx\,dy.\tag{5.4}$$
Consequently, using the identity (5.2) and the inequalities (5.3) and (5.4), we obtain the second inequality of (5.1).
Corollary 5.2. If
$$\max_{x,y}\,\max_{i=1,2}\left|\frac{p_i(x)}{p(x)} - \frac{p_i(y)}{p(y)}\right| \le \sqrt{2\varepsilon \ln b}\tag{5.5}$$
for some ε > 0 (Note: ε depends on α), then
$$0 \le H_b(X) - \alpha H_b(X_1) - (1-\alpha)H_b(X_2) \le \varepsilon.$$
Proof. From (5.5) we have that
$$\alpha\left(\frac{p_1(x)}{p(x)} - \frac{p_1(y)}{p(y)}\right)^2 + (1-\alpha)\left(\frac{p_2(x)}{p(x)} - \frac{p_2(y)}{p(y)}\right)^2 \le 2\varepsilon \ln b,$$
thus, by (5.1),
$$H_b(X) - \alpha H_b(X_1) - (1-\alpha)H_b(X_2) \le \frac{1}{2\ln b}\cdot 2\varepsilon \ln b \int_I\int_I p(x)p(y)\,dx\,dy = \varepsilon.$$
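A discrete sketch of Theorem 5.1 and Corollary 5.2: mix two illustrative densities (assumptions, not taken from the paper) with weight $\alpha = 0.4$ and compare the concavity defect of the entropy with the bound of (5.1).

```python
import math

# Discrete sketch of Theorem 5.1: concavity defect of the entropy of the
# mixture p = a*p1 + (1-a)*p2 against the bound of (5.1).  Base b = 2;
# the two distributions and the weight a are illustrative assumptions.
b, a = 2.0, 0.4
p1 = [0.5, 0.3, 0.2]
p2 = [0.2, 0.4, 0.4]
p = [a * u + (1 - a) * v for u, v in zip(p1, p2)]

H = lambda q: -sum(qi * math.log(qi, b) for qi in q)
defect = H(p) - a * H(p1) - (1 - a) * H(p2)

bound = sum(p[i] * p[j] * (a * (p1[i] / p[i] - p1[j] / p[j]) ** 2
                           + (1 - a) * (p2[i] / p[i] - p2[j] / p[j]) ** 2)
            for i in range(3) for j in range(3)) / (2.0 * math.log(b))

assert -1e-12 <= defect <= bound + 1e-12
print(defect, bound)
```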
We can prove similarly that the joint entropy of two random variables is also concave in the joint density distribution. Let $(X_1, Y_1)$, $(X_2, Y_2)$ and $(X, Y)$ be pairs of random variables such that $X_1, X_2$ and $X$ have the same (finite) range $I$, and $Y_1, Y_2$ and $Y$ have the same (finite) range $J$, and furthermore
$$p(x,y) = \alpha p_1(x,y) + (1-\alpha)p_2(x,y) \qquad \forall x \in I,\ \forall y \in J,$$
where $\alpha \in [0,1]$. The following Theorem and its Corollary can be proved in the same way as Theorem 5.1 and Corollary 5.2:
Theorem 5.3. With the previous assumptions we have
$$0 \le H_b(X,Y) - \alpha H_b(X_1,Y_1) - (1-\alpha)H_b(X_2,Y_2) \le$$
$$\le \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p(x,y)p(u,v)\left(\alpha\left(\frac{p_1(x,y)}{p(x,y)} - \frac{p_1(u,v)}{p(u,v)}\right)^2 + (1-\alpha)\left(\frac{p_2(x,y)}{p(x,y)} - \frac{p_2(u,v)}{p(u,v)}\right)^2\right)dx\,dy\,du\,dv.\tag{5.6}$$
The following Corollary follows from Theorem 5.3 directly.
Corollary 5.4. With the above assumptions and if
$$\max_{(x,y),(u,v)}\,\max_{i=1,2}\left|\frac{p_i(x,y)}{p(x,y)} - \frac{p_i(u,v)}{p(u,v)}\right| < \sqrt{2\varepsilon \ln b}$$
for some $\varepsilon > 0$ (note: this depends on $\alpha$), then we have
$$0 \le H_b(X,Y) - \alpha H_b(X_1,Y_1) - (1-\alpha)H_b(X_2,Y_2) \le \varepsilon.$$
Similar results for conditional entropy can also be established as follows:
Theorem 5.5. With the previous assumptions we have
$$0 \le H_b(X \mid Y) - \alpha H_b(X_1 \mid Y_1) - (1-\alpha)H_b(X_2 \mid Y_2) \le$$
$$\le \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p(x \mid y)p(u \mid v)\Big[\alpha\,p_1(y)p_1(v)\left(\frac{p_1(u \mid v)}{p(u \mid v)} - \frac{p_1(x \mid y)}{p(x \mid y)}\right)^2 +\tag{5.7}$$
$$+ (1-\alpha)\,p_2(y)p_2(v)\left(\frac{p_2(u \mid v)}{p(u \mid v)} - \frac{p_2(x \mid y)}{p(x \mid y)}\right)^2\Big]dx\,dy\,du\,dv.$$
Proof. We have, by definition,
$$H_b(X \mid Y) - \alpha H_b(X_1 \mid Y_1) - (1-\alpha)H_b(X_2 \mid Y_2) =$$
$$= \int_I\int_J p(x,y)\log_b\frac{1}{p(x \mid y)}\,dx\,dy - \alpha\int_I\int_J p_1(x,y)\log_b\frac{1}{p_1(x \mid y)}\,dx\,dy -\tag{5.8}$$
$$- (1-\alpha)\int_I\int_J p_2(x,y)\log_b\frac{1}{p_2(x \mid y)}\,dx\,dy =$$
$$= -\alpha\int_I\int_J p_1(x,y)\log_b\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy - (1-\alpha)\int_I\int_J p_2(x,y)\log_b\frac{p(x \mid y)}{p_2(x \mid y)}\,dx\,dy.$$
Using Jensen's inequality, we have
$$-\int_I\int_J p_1(x,y)\log_b\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy \ge -\log_b\int_I\int_J p_1(x,y)\,\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy =$$
$$= -\log_b\left(\int_I\int_J p_1(y)\,p(x \mid y)\,dx\,dy\right) = -\log_b\left(\int_J p_1(y)\int_I p(x \mid y)\,dx\,dy\right) = -\log_b\left(\int_J p_1(y)\,dy\right) = \log_b 1 = 0,$$
and similarly
$$-\int_I\int_J p_2(x,y)\log_b\frac{p(x \mid y)}{p_2(x \mid y)}\,dx\,dy \ge 0,$$
thus proving the first inequality in (5.7). Furthermore, using a modified version of Theorem 1.1, we have
$$-\int_I\int_J p_1(x,y)\log_b\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy \le$$
$$\le \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p_1(x,y)p_1(u,v)\,\frac{p_1(x \mid y)}{p(x \mid y)}\,\frac{p_1(u \mid v)}{p(u \mid v)}\left(\frac{p(x \mid y)}{p_1(x \mid y)} - \frac{p(u \mid v)}{p_1(u \mid v)}\right)^2 dx\,dy\,du\,dv =$$
$$= \frac{1}{2\ln b}\int_I\int_J\int_I\int_J \frac{p_1(x,y)p_1(u,v)}{p_1(x \mid y)\,p_1(u \mid v)}\cdot\frac{1}{p(x \mid y)\,p(u \mid v)}\,\big(p(x \mid y)p_1(u \mid v) - p_1(x \mid y)p(u \mid v)\big)^2\,dx\,dy\,du\,dv =$$
$$= \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p_1(y)p_1(v)\,p(x \mid y)p(u \mid v)\left(\frac{p_1(u \mid v)}{p(u \mid v)} - \frac{p_1(x \mid y)}{p(x \mid y)}\right)^2 dx\,dy\,du\,dv.$$
A similar inequality for $p_2$ can be obtained. Using these and the identity (5.8), we have
$$0 \le H_b(X \mid Y) - \alpha H_b(X_1 \mid Y_1) - (1-\alpha)H_b(X_2 \mid Y_2) \le$$
$$\le \frac{\alpha}{2\ln b}\int_I\int_J\int_I\int_J p_1(y)p_1(v)\,p(x \mid y)p(u \mid v)\left(\frac{p_1(u \mid v)}{p(u \mid v)} - \frac{p_1(x \mid y)}{p(x \mid y)}\right)^2 dx\,dy\,du\,dv +$$
$$+ \frac{1-\alpha}{2\ln b}\int_I\int_J\int_I\int_J p_2(y)p_2(v)\,p(x \mid y)p(u \mid v)\left(\frac{p_2(u \mid v)}{p(u \mid v)} - \frac{p_2(x \mid y)}{p(x \mid y)}\right)^2 dx\,dy\,du\,dv,$$
which is the second inequality of (5.7). The following Corollary is immediately obvious:
Corollary 5.6. With the above assumptions and if:
$$\max_{u,v,x,y}\,\max_{i=1,2}\left|\frac{p_i(u \mid v)}{p(u \mid v)} - \frac{p_i(x \mid y)}{p(x \mid y)}\right| < \sqrt{2\varepsilon \ln b}, \qquad \varepsilon > 0,$$
then:
$$0 \le H_b(X \mid Y) - \alpha H_b(X_1 \mid Y_1) - (1-\alpha)H_b(X_2 \mid Y_2) < \varepsilon.$$
6. Results for the mutual information of mixed populations. The mutual information between two random variables $X$ and $Y$ was defined in Section 4; we may extend those results further. Consider two pairs of continuous random variables $(X_1, Y_1)$ and $(X_2, Y_2)$ with joint probability densities $p_1(x,y)$ and $p_2(x,y)$ respectively. One can think of $X_1$ and $X_2$ as the inputs to some noisy transmission channel and $Y_1$ and $Y_2$ as the outputs from the channel. Let the range of $X_1$ and $X_2$ be $I$, and the range of $Y_1$ and $Y_2$ be $J$. Define another pair of random variables $(X, Y)$ (having ranges $I$ and $J$ respectively) where the probability density of $X$ is the convex combination
$$p(x) = \alpha p_1(x) + (1-\alpha)p_2(x) \qquad \forall x,$$
where $0 \le \alpha \le 1$. The discrete version of the following result is known [2, p. 28]:
$$\alpha I_b(X_1;Y_1) + (1-\alpha)I_b(X_2;Y_2) \le I_b(X;Y)\tag{6.1}$$
where $Y_1$, $Y_2$ and $Y$ are the channel outputs corresponding to $X_1$, $X_2$ and $X$ respectively. The continuous version (concavity of $I$ with respect to the input probabilities) is established in the following Theorem:
Theorem 6.1. With the above assumptions we have:
$$0 \le I_b(X;Y) - \alpha I_b(X_1;Y_1) - (1-\alpha)I_b(X_2;Y_2) \le$$
$$\le \frac{1}{2\ln b}\int_J\int_J p(y)p(v)\left(\alpha\left(\frac{p_1(v)}{p(v)} - \frac{p_1(y)}{p(y)}\right)^2 + (1-\alpha)\left(\frac{p_2(v)}{p(v)} - \frac{p_2(y)}{p(y)}\right)^2\right)dy\,dv.\tag{6.2}$$
Proof. We have, by definition,
$$\alpha I_b(X_1;Y_1) + (1-\alpha)I_b(X_2;Y_2) - I_b(X;Y) =$$
$$= \alpha\int_I\int_J p_1(x,y)\log_b\frac{p(y \mid x)}{p_1(y)}\,dx\,dy + (1-\alpha)\int_I\int_J p_2(x,y)\log_b\frac{p(y \mid x)}{p_2(y)}\,dx\,dy -\tag{6.3}$$
$$- \int_I\int_J \big(\alpha p_1(x,y) + (1-\alpha)p_2(x,y)\big)\log_b\frac{p(y \mid x)}{p(y)}\,dx\,dy =$$
$$= \alpha\int_I\int_J p_1(x,y)\log_b\frac{p(y)}{p_1(y)}\,dx\,dy + (1-\alpha)\int_I\int_J p_2(x,y)\log_b\frac{p(y)}{p_2(y)}\,dx\,dy.$$
Applying the Jensen inequality to each term of the above sum, we have
$$\int_I\int_J p_1(x,y)\log_b\frac{p(y)}{p_1(y)}\,dx\,dy \le \log_b\left(\int_I\int_J p_1(x,y)\,\frac{p(y)}{p_1(y)}\,dx\,dy\right) =$$
$$= \log_b\left(\int_J \frac{p(y)}{p_1(y)}\int_I p_1(x,y)\,dx\,dy\right) = \log_b\left(\int_J \frac{p(y)}{p_1(y)}\,p_1(y)\,dy\right) = \log_b 1 = 0.$$
Similarly,
$$\int_I\int_J p_2(x,y)\log_b\frac{p(y)}{p_2(y)}\,dx\,dy \le 0.$$
Adding the above yields the first inequality of (6.2). By a modified version of Theorem 1.1 we get that:
$$\alpha I_b(X_1;Y_1) + (1-\alpha)I_b(X_2;Y_2) - I_b(X;Y) \ge$$
$$\ge -\frac{\alpha}{2\ln b}\int_J\int_J p(y)p(v)\left(\frac{p_1(v)}{p(v)} - \frac{p_1(y)}{p(y)}\right)^2 dy\,dv - \frac{1-\alpha}{2\ln b}\int_J\int_J p(y)p(v)\left(\frac{p_2(v)}{p(v)} - \frac{p_2(y)}{p(y)}\right)^2 dy\,dv,$$
which is the second inequality of (6.2). The following Corollary follows directly.
Corollary 6.2. With the above assumptions and if
$$\max_{v,y}\,\max_{i=1,2}\left|\frac{p_i(v)}{p(v)} - \frac{p_i(y)}{p(y)}\right| < \sqrt{2\varepsilon \ln b}\tag{6.4}$$
for some $\varepsilon > 0$ (note: $\varepsilon$ depends on $\alpha$), then we have
$$0 \le I_b(X;Y) - \alpha I_b(X_1;Y_1) - (1-\alpha)I_b(X_2;Y_2) \le \varepsilon.\tag{6.5}$$
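Theorem 6.1 can be sketched with a discrete channel. The $2\times 2$ channel matrix, the two input distributions and the weight $a$ below are illustrative assumptions, not from the paper; the bound is evaluated over the output marginals as in (6.2).

```python
import math

# Discrete sketch of Theorem 6.1, base b = 2: mixing two channel inputs
# p1(x), p2(x) with weight a and comparing I_b of the mixture with the
# mixture of the I_b's.  Channel and inputs are illustrative assumptions.
b, a = 2.0, 0.5
W = [[0.8, 0.2], [0.3, 0.7]]          # channel p(y | x)
p1, p2 = [0.5, 0.5], [0.9, 0.1]       # the two input distributions
p = [a * u + (1 - a) * v for u, v in zip(p1, p2)]

def out(q):                            # output distribution p_Y for input q
    return [sum(q[x] * W[x][y] for x in range(2)) for y in range(2)]

def mi(q):                             # I_b(X;Y) for input q
    qy = out(q)
    return sum(q[x] * W[x][y] * math.log(W[x][y] / qy[y], b)
               for x in range(2) for y in range(2))

defect = mi(p) - a * mi(p1) - (1 - a) * mi(p2)

py, p1y, p2y = out(p), out(p1), out(p2)
r = lambda qy, y: qy[y] / py[y]        # the ratios p_i(y)/p(y) of (6.2)
bound = sum(py[y] * py[v] * (a * (r(p1y, v) - r(p1y, y)) ** 2
                             + (1 - a) * (r(p2y, v) - r(p2y, y)) ** 2)
            for y in range(2) for v in range(2)) / (2.0 * math.log(b))

assert -1e-12 <= defect <= bound + 1e-12
print(defect, bound)
```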
Suppose now that the input probabilities $p(x)$ are fixed and we are given two sets of transition probabilities $p_1(y \mid x)$ and $p_2(y \mid x)$. Let the convex combination of these transition probabilities be
$$p(y \mid x) = \alpha p_1(y \mid x) + (1-\alpha)p_2(y \mid x).$$
The discrete version of the following result is known [2, p. 29]:
$$I_b(X;Y) \le \alpha I_b(X;Y_1) + (1-\alpha)I_b(X;Y_2),$$
where $Y$, $Y_1$, $Y_2$ are the channel outputs corresponding to the transition probabilities $p(y \mid x)$, $p_1(y \mid x)$ and $p_2(y \mid x)$. This result can be extended as follows.
Theorem 6.3. With the above assumptions we have:
$$0 \le \alpha I_b(X;Y_1) + (1-\alpha)I_b(X;Y_2) - I_b(X;Y) \le$$
$$\le \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p(x \mid y)p(u \mid v)\Big(\alpha\,p_1(y)p_1(v)\left(\frac{p_1(u \mid v)}{p(u \mid v)} - \frac{p_1(x \mid y)}{p(x \mid y)}\right)^2 +\tag{6.6}$$
$$+ (1-\alpha)\,p_2(y)p_2(v)\left(\frac{p_2(u \mid v)}{p(u \mid v)} - \frac{p_2(x \mid y)}{p(x \mid y)}\right)^2\Big)dx\,dy\,du\,dv.$$
Proof. By definition, we have:
$$I_b(X;Y) - \alpha I_b(X;Y_1) - (1-\alpha)I_b(X;Y_2) =$$
$$= \alpha\int_I\int_J p_1(x,y)\log_b\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy + (1-\alpha)\int_I\int_J p_2(x,y)\log_b\frac{p(x \mid y)}{p_2(x \mid y)}\,dx\,dy.\tag{6.7}$$
Applying Jensen's inequality to the first term, we get:
$$\int_I\int_J p_1(x,y)\log_b\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy \le \log_b\left(\int_I\int_J p_1(x,y)\,\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy\right) =$$
$$= \log_b\left(\int_I\int_J p_1(y)\,\frac{p(x,y)}{p(y)}\,dx\,dy\right) = \log_b\left(\int_J \frac{p_1(y)}{p(y)}\int_I p(x,y)\,dx\,dy\right) = \log_b\left(\int_J \frac{p_1(y)}{p(y)}\,p(y)\,dy\right) = \log_b\left(\int_J p_1(y)\,dy\right) = 0.$$
Similarly,
$$\int_I\int_J p_2(x,y)\log_b\frac{p(x \mid y)}{p_2(x \mid y)}\,dx\,dy \le 0.$$
Adding the above yields the first inequality of (6.6).
By a modified version of Theorem 1.1, we have:
$$\int_I\int_J p_1(x,y)\log_b\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy \ge \log_b\left(\int_I\int_J p_1(x,y)\,\frac{p(x \mid y)}{p_1(x \mid y)}\,dx\,dy\right) -\tag{6.8}$$
$$- \frac{1}{2\ln b}\int_I\int_J\int_I\int_J p_1(x,y)p_1(u,v)\,\frac{p_1(x \mid y)}{p(x \mid y)}\cdot\frac{p_1(u \mid v)}{p(u \mid v)}\left(\frac{p(x \mid y)}{p_1(x \mid y)} - \frac{p(u \mid v)}{p_1(u \mid v)}\right)^2 dx\,dy\,du\,dv =$$
$$= -\frac{1}{2\ln b}\int_I\int_J\int_I\int_J \frac{p_1(x,y)p_1(u,v)}{p_1(x \mid y)\,p_1(u \mid v)}\cdot\frac{1}{p(x \mid y)\,p(u \mid v)}\,\big(p(x \mid y)p_1(u \mid v) - p_1(x \mid y)p(u \mid v)\big)^2\,dx\,dy\,du\,dv =$$
$$= -\frac{1}{2\ln b}\int_I\int_J\int_I\int_J p_1(y)p_1(v)\,p(x \mid y)p(u \mid v)\left(\frac{p_1(u \mid v)}{p(u \mid v)} - \frac{p_1(x \mid y)}{p(x \mid y)}\right)^2 dx\,dy\,du\,dv.$$
Using these and (6.7) and a similar inequality for p2, we obtain
$$\alpha I_b(X;Y_1) + (1-\alpha)I_b(X;Y_2) - I_b(X;Y) \le$$
$$\le \frac{\alpha}{2\ln b}\int_I\int_J\int_I\int_J p(x \mid y)p(u \mid v)\,p_1(y)p_1(v)\left(\frac{p_1(u \mid v)}{p(u \mid v)} - \frac{p_1(x \mid y)}{p(x \mid y)}\right)^2 dx\,dy\,du\,dv +$$
$$+ \frac{1-\alpha}{2\ln b}\int_I\int_J\int_I\int_J p(x \mid y)p(u \mid v)\,p_2(y)p_2(v)\left(\frac{p_2(u \mid v)}{p(u \mid v)} - \frac{p_2(x \mid y)}{p(x \mid y)}\right)^2 dx\,dy\,du\,dv,$$
and the Theorem is proved. The following Corollary follows directly.
Corollary 6.4. With the above assumptions and if
$$\max_{u,v,x,y}\,\max_{i=1,2}\left|\frac{p_i(u \mid v)}{p(u \mid v)} - \frac{p_i(x \mid y)}{p(x \mid y)}\right| < \sqrt{2\varepsilon \ln b}$$
for some ε > 0 (note: depends on α) then we have:
$$0 \le \alpha I_b(X;Y_1) + (1-\alpha)I_b(X;Y_2) - I_b(X;Y) \le \varepsilon.$$
7. Further bounds based on the ratio of maximum and minimum density. All the corollaries that we have discussed so far yield sufficient conditions for the lower bounds on the respective measures of random variables or mixed populations of random variables. These sufficient conditions are all based on the absolute difference between some densities. We now derive another sufficient condition which is based on the ratio of densities. We show that this leads to similar lower bounds which are in many cases simpler in appearance. The following Theorem is a Corollary of Theorem 2.1.
Theorem 7.1. Let
$$\rho := \max_{x,y}\frac{p(x)}{p(y)}.\tag{7.1}$$
If
$$\rho \le \varphi(\varepsilon) := 1 + \varepsilon\ln b + \sqrt{\varepsilon\ln b\,(\varepsilon\ln b + 2)},\tag{7.2}$$
then
$$0 \le \log_b|I| - H_b(X) \le \varepsilon.\tag{7.3}$$
Proof. If $\rho \le \varphi(\varepsilon)$, then clearly, for all $x, y$,
$$\frac{1}{\varphi(\varepsilon)} \le \frac{p(x)}{p(y)} \le \varphi(\varepsilon),$$
if and only if
$$\left(\frac{p(x)}{p(y)}\right)^2 - (2 + 2\varepsilon\ln b)\left(\frac{p(x)}{p(y)}\right) + 1 \le 0,$$
if and only if
$$\frac{(p(x) - p(y))^2}{p(x)p(y)} = \frac{p(x)}{p(y)} - 2 + \frac{p(y)}{p(x)} \le 2\varepsilon\ln b.$$
Hence from (2.2) of Theorem 2.1, we have
$$0 \le \log_b|I| - H_b(X) \le \frac{1}{2\ln b}\int_I\int_I p(x)p(y)\cdot 2\varepsilon\ln b\,dx\,dy = \varepsilon.$$
The following Theorem is a Corollary of Theorem 2.3, and the proof is similar to that of Theorem 7.1.
Theorem 7.2. Let
$$\rho := \max_{x,y,u,v}\frac{p(x,y)}{p(u,v)}.\tag{7.4}$$
If
$$\rho \le \varphi(\varepsilon),\tag{7.5}$$
then
$$0 \le \log_b|I \times J| - H_b(X,Y) \le \varepsilon.\tag{7.6}$$
The following Theorem is a Corollary of Theorem 3.1, and the proof is similar to that of Theorem 7.1. Notice that this sufficient condition is independent of $M$ (as defined in (3.7)), and is in a simpler form than before.
Theorem 7.3. Let
$$\rho := \max_{x,y,u,v}\frac{p(x \mid y)}{p(u \mid v)}.\tag{7.7}$$
If
$$\rho \le \varphi(\varepsilon),\tag{7.8}$$
then
$$0 \le H_b(Z) + \mathbb{E}_Z(\log_b A(Z)) - H_b(X \mid Y) \le \varepsilon,\tag{7.9}$$
where $H_b(X \mid Y)$ and $A$ are as defined in (3.1) and (3.2) respectively. The following Theorem is a Corollary of Theorem 4.1, and the proof is similar to that of Theorem 7.1.
Theorem 7.4. Let
$$\rho := \max_{x,y,u,v}\frac{p(x)p(y)p(u,v)}{p(u)p(v)p(x,y)}.\tag{7.10}$$
If
$$\rho \le \varphi(\varepsilon),\tag{7.11}$$
then
$$0 \le I_b(X;Y) \le \varepsilon.\tag{7.12}$$
The following Theorem is a Corollary of Theorem 4.3, and the proof is similar to that of Theorem 7.1.
Theorem 7.5. Let
$$\rho := \max_{x,y,z,u,v,w}\frac{p(z \mid x,y)\,p(w \mid v)}{p(w \mid u,v)\,p(z \mid y)}.\tag{7.13}$$
If
$$\rho \le \varphi(\varepsilon),\tag{7.14}$$
then
$$0 \le I_b(X,Y;Z) - I_b(Y;Z) \le \varepsilon.\tag{7.15}$$
REFERENCES

1. Dragomir, S.S. and Goh, C.J. – A Counterpart of Jensen's Discrete Inequality for Differentiable Convex Mappings and Applications in Information Theory, Mathematical and Computer Modelling, to appear.
2. McEliece, R.J. – The Theory of Information and Coding, Addison-Wesley Publishing Company, Reading, 1977.
Received: 19.I.2000

Department of Mathematics
University of Timisoara
RO-1900 Timisoara
ROMANIA

Department of Mathematics
University of Western Australia
Nedlands, WA 6907
AUSTRALIA