Information Theory Handout Bw
3/23/2009
INFORMATION THEORY
Module IV - Part I

Information theory and coding: Discrete messages - amount of information - entropy - information rate. Coding - Shannon's theorem, channel capacity - capacity of the Gaussian channel - bandwidth-S/N trade-off - use of orthogonal signals to attain Shannon's limit - efficiency of orthogonal signal transmission.
AMOUNT OF INFORMATION
Consider a communication system in which the allowable messages are m1, m2, ... with probabilities of occurrence p1, p2, ... Then p1 + p2 + ... = 1. Let the transmitter transmit a message mk with probability pk, and let the receiver correctly identify the message. Then the amount of information conveyed by the system is defined as:

Ik = logb (1/pk) = -logb pk, where b is the base of the logarithm.

The base may be 2, 10 or e. When the base is 2 the unit of Ik is the bit (binary unit); when it is 10 the unit is the Hartley or decit; when the natural logarithmic base is used the unit is the nat. Base 2 is commonly used to represent Ik.
AMOUNT OF INFORMATION
The above units are related as:

1 Hartley = log2 10 = 3.32 bits
1 nat = log2 e = 1.44 bits

The base of 2 is preferred because in binary PCM the possible messages 0 and 1 occur with equal likelihood, and the amount of information conveyed by each bit is log2 2 = 1 bit.
IMPORTANT PROPERTIES OF IK
Ik approaches 0 as pk approaches 1. pk = 1 means the receiver already knows the message and there is no need for transmission, so Ik = 0. E.g., the statement 'the sun rises in the east' conveys no information.
Ik must be a non-negative quantity, since each message contains some information; in the worst case Ik = 0.
The information content of a message having a higher probability of occurrence is less than the information content of a message having a lower probability.
As pk approaches 0, Ik approaches infinity. The information content of a highly improbable event approaches infinity.
NOTES
When the symbols 0 and 1 of a PCM data stream occur with equal likelihood, with probabilities 1/2 each, the amount of information conveyed by each bit is:

Ik(0) = Ik(1) = log2 2 = 1 bit

When the probabilities are different, the less probable symbol conveys more information. Let p(0) = 1/4 and p(1) = 3/4. Then:

Ik(0) = log2 4 = 2 bits
Ik(1) = log2 (4/3) = 0.42 bit

When there are M equally likely and independent messages such that M = 2^N with N an integer, the information in each message is Ik = log2 M = log2 2^N = N bits.
3/23/2009
2
NOTES
In this case, if we are using a binary PCM code for representing the M messages, the number of binary digits required to represent all the 2^N messages is also N. I.e., when there are M (= 2^N) equally likely messages, the amount of information conveyed by each message is equal to the number of binary digits needed to represent all the messages.
When two independent messages mk and mI are correctly identified, the amount of information conveyed is the sum of the information associated with each of the messages individually. When the messages are independent, the probability of the composite message is pk·pI, so:

Ik = log2 (1/pk),  II = log2 (1/pI)
Ik,I = log2 (1/(pk·pI)) = log2 (1/pk) + log2 (1/pI) = Ik + II
EXAMPLE 1
A source produces one of four possible symbols during each interval, having probabilities p(x1) = 1/2, p(x2) = 1/4, p(x3) = p(x4) = 1/8. Obtain the information content of each of these symbols.

ANS:
I(x1) = log2 2 = 1 bit
I(x2) = log2 4 = 2 bits
I(x3) = log2 8 = 3 bits
I(x4) = log2 8 = 3 bits
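The computation in Example 1 is a one-liner in code. A minimal sketch (the helper name `info_content` is our choice, not from the handout):

```python
import math

def info_content(p):
    """Information conveyed by a message of probability p, in bits."""
    return math.log2(1.0 / p)

# The four symbols of Example 1:
probs = {"x1": 1/2, "x2": 1/4, "x3": 1/8, "x4": 1/8}
for sym, p in probs.items():
    print(f"I({sym}) = {info_content(p):g} bits")
```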
AVERAGE INFORMATION, ENTROPY
Suppose we have M different and independent messages m1, m2, ... with probabilities of occurrence p1, p2, ... Suppose further that during a long period of transmission a sequence of L messages has been generated. If L is very large, we may expect that in the L-message sequence we transmitted p1·L messages of m1, p2·L messages of m2, etc. The total information in such a sequence will be:

Itotal = p1·L·log2 (1/p1) + p2·L·log2 (1/p2) + ...

Average information per message interval, represented by the symbol H, is given by:

H = Itotal / L = p1 log2 (1/p1) + p2 log2 (1/p2) + ...
AVERAGE INFORMATION, ENTROPY
Average information is also referred to as entropy. Its unit is information bits/symbol or bits/message.

H = Σ (k=1 to M) pk log2 (1/pk)
AVERAGE INFORMATION, ENTROPY
When pk = 1 there is only a single possible message, and the receipt of that message conveys no information: H = log2 1 = 0.
When pk → 0 the amount of information Ik → ∞, but the average information contributed in this case is:

lim (p→0) p log2 (1/p) = 0

So the average information associated with an extremely unlikely message, as well as with an extremely likely message, is zero. Consider a source that generates two messages with probabilities p and (1 - p). The average information per message is:

H = p log2 (1/p) + (1 - p) log2 (1/(1 - p))

when p = 0, H = 0; when p = 1, H = 0
AVERAGE INFORMATION, ENTROPY

[Plot: H as a function of p, rising from 0 at p = 0 to HMAX = 1 at p = 1/2 and falling back to 0 at p = 1.]
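The shape of this curve is easy to verify numerically. A small sketch (the function name `binary_entropy` is our choice):

```python
import math

def binary_entropy(p):
    """H(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

# H is zero at both extremes and peaks at exactly 1 bit when p = 1/2
samples = [i / 100 for i in range(101)]
peak = max(samples, key=binary_entropy)
```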
AVERAGE INFORMATION, ENTROPY
The maximum value of H may be located by setting dH/dp = 0.

H = p log2 (1/p) + (1 - p) log2 (1/(1 - p)) = -p log2 p - (1 - p) log2 (1 - p)

dH/dp = -log2 p - log2 e + log2 (1 - p) + log2 e
      = log2 (1 - p) - log2 p
      = log2 ((1 - p)/p)
AVERAGE INFORMATION, ENTROPY

Setting dH/dp = 0 gives log2 ((1 - p)/p) = 0, i.e. (1 - p)/p = 1, so 1 - p = p and p = 1/2.

Similarly, when there are 3 messages the average information H becomes maximum when the probability of each of these messages is p = 1/3:

HMAX = (1/3) log2 3 + (1/3) log2 3 + (1/3) log2 3 = log2 3

Extending this, when there are M messages H becomes a maximum when all the messages are equally likely with p = 1/M. In this case each message has probability 1/M and:

Hmax = Σ (1/M) log2 M = log2 M
INFORMATION RATE R
Let a source emit symbols at the rate r symbols/second. Then the information rate of the source is:

R = rH information bits/second

R → information rate
H → entropy of the source
r → rate at which symbols are generated

R = r (symbols/second) × H (information bits/symbol) = rH (information bits/second)
EXAMPLE 1
A discrete source emits one of five symbols once every millisecond with probabilities 1/2, 1/4, 1/8, 1/16 and 1/16 respectively. Determine the source entropy and information rate.

H = Σ (i=1 to 5) Pi log2 (1/Pi)
  = (1/2) log2 2 + (1/4) log2 4 + (1/8) log2 8 + (1/16) log2 16 + (1/16) log2 16
  = 1/2 + 1/2 + 3/8 + 1/4 + 1/4 = 15/8 = 1.875 bits/symbol

Symbol rate r = 1/Tb = 1/10^-3 = 1000 symbols/sec
Information rate R = rH = 1000 × 1.875 = 1875 bits/sec
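Worked examples like this reduce to one sum. A sketch of the calculation (variable names are ours):

```python
import math

def entropy(probs):
    """Average information H = sum p * log2(1/p), in bits per symbol."""
    return sum(p * math.log2(1 / p) for p in probs)

probs = [1/2, 1/4, 1/8, 1/16, 1/16]   # one symbol every millisecond
H = entropy(probs)                    # 1.875 bits/symbol
r = 1000                              # symbols per second
R = r * H                             # information rate, bits per second
```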
EXAMPLE 2
The probabilities of five possible outcomes of an experiment are given as P(x1) = 1/2, P(x2) = 1/4, P(x3) = 1/8, P(x4) = P(x5) = 1/16. Determine the entropy and information rate if there are 16 outcomes per second.

H(X) = Σ (i=1 to 5) P(xi) log2 (1/P(xi))
     = (1/2) log2 2 + (1/4) log2 4 + (1/8) log2 8 + (1/16) log2 16 + (1/16) log2 16
     = 1/2 + 2/4 + 3/8 + 4/16 + 4/16 = 15/8 = 1.875 bits/outcome

Rate of outcomes r = 16 outcomes/sec
Rate of information R = rH(X) = 16 × 15/8 = 30 bits/sec
EXAMPLE 3
An analog signal band-limited to 10 kHz is quantized into 8 levels of a PCM system with probabilities 1/4, 1/5, 1/5, 1/10, 1/10, 1/20, 1/20 and 1/20 respectively. Find the entropy and rate of information.

fm = 10 kHz, fs = 2 × 10 kHz = 20 kHz
Rate at which messages are produced: r = fs = 20 × 10^3 messages/sec

H(X) = (1/4) log2 4 + 2 × (1/5) log2 5 + 2 × (1/10) log2 10 + 3 × (1/20) log2 20
     ≈ 0.5 + 0.929 + 0.664 + 0.648 = 2.74 bits/message

R = rH(X) = 20000 × 2.74 ≈ 54,800 bits/sec
EXAMPLE 4
Consider a telegraph source having two symbols, dot and dash. The dot duration is 0.2 s. The dash duration is 3 times the dot duration. The probability of the dot occurring is twice that of the dash, and the time between symbols is 0.2 s. Calculate the information rate of the telegraph source.

p(dot) = 2p(dash);  p(dot) + p(dash) = 3p(dash) = 1
So p(dash) = 1/3 and p(dot) = 2/3.

H(X) = p(dot) log2 (1/p(dot)) + p(dash) log2 (1/p(dash))
     = 0.667 × 0.585 + 0.333 × 1.585 = 0.92 b/symbol

Average time per symbol:
Ts = P(dot)·t(dot) + P(dash)·t(dash) + t(space)
   = (2/3) × 0.2 + (1/3) × 0.6 + 0.2 = 0.5333 seconds/symbol

Average symbol rate r = 1/Ts = 1.875 symbols/sec
Average information rate R = rH = 1.875 × 0.92 = 1.72 b/sec
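The telegraph calculation above can be sketched the same way; the variable names below are ours:

```python
import math

# Telegraph source of Example 4 (probabilities derived in the text above)
p_dot, p_dash = 2 / 3, 1 / 3
H = p_dot * math.log2(1 / p_dot) + p_dash * math.log2(1 / p_dash)  # bits/symbol

# Average symbol duration: dot 0.2 s, dash 0.6 s, plus 0.2 s spacing
T = p_dot * 0.2 + p_dash * 0.6 + 0.2   # seconds/symbol
r = 1 / T                              # symbols/second
R = r * H                              # information rate, bits/second
```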
SOURCE CODING
Let there be M equally likely messages such that M = 2^N. If the messages are equally likely, the entropy H becomes maximum and is given by:

Hmax = log2 M = log2 2^N = N

The number of binary digits needed to encode each message is also N. So entropy H = N if the messages are equally likely. The average information carried by an individual bit is H/N = 1 bit.
If however the messages are not equally likely, H is less than N and each bit carries less than 1 bit of information. This situation can be corrected by using a code in which not all messages are encoded into the same number of bits.

SOURCE CODING
The more likely a message is, the fewer the number of bits that should be used in its code word.
Let X be a DMS with finite entropy H(X) and an alphabet x1, x2, ..., xm with corresponding probabilities of occurrence p(xi), where i = 1, 2, ..., m. Let the binary code word assigned to symbol xi by the encoder have length ni, measured in bits. The length of a code word is the number of bits in the code word. The average code word length L per source symbol is given by:

L = Σ (i=1 to m) p(xi)·ni = p(x1)n1 + p(x2)n2 + ... + p(xm)nm
SOURCE CODING

[Diagram: source symbols x1, ..., xm with probabilities p(x1), ..., p(xm) entering the encoder/channel, producing outputs y1, ..., yn; code word lengths n1, ..., nm.]
SOURCE CODING
The parameter L represents the average number of bits per source symbol used in the source coding process.
Code efficiency η is defined as η = Lmin/L, where Lmin is the minimum possible value of L. When η approaches unity the code is said to be efficient.
Code redundancy γ is defined as γ = 1 - η.

[Diagram: DMS → source encoder → binary sequence.]
SOURCE CODING
The conversion of the output of a DMS into a sequence of binary symbols (binary codes) is called source coding. The device that performs this is called a source encoder.
If some symbols are known to be more probable than others, then we may assign short code words to frequent source symbols and long code words to rare source symbols. Such a code is called a variable-length code.
As an example, in Morse code the letter E is encoded into a single dot, whereas the letter Q is encoded as '_ _ . _'. This is because in the English language the letter E occurs more frequently than the letter Q.
SHANNON'S SOURCE CODING THEOREM

The source coding theorem states that for a DMS X with entropy H(X), the average code word length per symbol is bounded as:

L ≥ H(X)

and L can be made as close to H(X) as desired for some suitably chosen code. With Lmin = H(X), the code efficiency is:

η = H(X)/L

No code can achieve efficiency greater than 1, but for any source there are codes with efficiency as close to 1 as desired. The proof does not give a method to find the best codes; it just sets a limit on how good they can be.
SHANNON'S SOURCE CODING THEOREM

Proof of the statement 0 ≤ H(X) ≤ N (for M = 2^N messages):

Consider any two probability distributions p0, p1, ..., pM-1 and q0, q1, ..., qM-1 on the alphabet x0, x1, ..., xM-1 of a discrete memoryless channel. Then:

Σ (i=0 to M-1) pi log2 (qi/pi) = (1/ln 2) Σ (i=0 to M-1) pi ln (qi/pi)   ..............(1)

By a special property of the natural logarithm, ln x ≤ x - 1 for x ≥ 0. Applying this property to Eq. (1):
SHANNON'S SOURCE CODING THEOREM

Σ (i=0 to M-1) pi log2 (qi/pi) ≤ (1/ln 2) Σ (i=0 to M-1) pi (qi/pi - 1)
                               = (1/ln 2) [Σ qi - Σ pi]
                               = (1/ln 2)(1 - 1) = 0

Thus we obtain the fundamental inequality:

Σ (i=0 to M-1) pi log2 (qi/pi) ≤ 0   ..............(2)

If there are M equally probable messages x1, x2, ..., xM with probabilities q1 = q2 = ... = qM = 1/M, the entropy of this DMS is given by:

H = Σ (i=1 to M) qi log2 (1/qi) = log2 M   ..............(3)
SHANNON'S SOURCE CODING THEOREM

Also, qi = 1/M for i = 0, 1, ..., M - 1   ..............(4)

Substituting Eq. (4) in Eq. (2):

Σ (i=0 to M-1) pi log2 (1/(M·pi)) ≤ 0

Σ (i=0 to M-1) pi log2 (1/pi) - Σ (i=0 to M-1) pi log2 M ≤ 0

Σ (i=0 to M-1) pi log2 (1/pi) ≤ log2 M   (since Σ pi = 1)

H(X) ≤ log2 M

H(X) ≤ N if M = 2^N

Thus H(X) ≤ L, with equality when the symbols are equally likely.
SHANNON’S SOURCE CODING THEOREM
H(X) = 0 if and only if the probability pi = 1 for some i and the remaining probabilities in the set are all zero. This lower bound on entropy corresponds to no uncertainty.
H(X) = log2 M if and only if pi = 1/M for all i, i.e., all the symbols in the alphabet are equiprobable. This upper bound on entropy corresponds to maximum uncertainty.
Proof of the lower bound: Each probability pi is less than or equal to 1, so each term pi log2 (1/pi) is always non-negative. A term pi log2 (1/pi) is zero if and only if pi = 0 or 1, i.e., pi = 1 for some i and all the others are zero.
CLASSIFICATION OF CODES
Fixed-Length Code: A fixed-length code is one whose code word length is fixed. Codes 1 and 2 of Table 1 are fixed-length codes.
Variable-Length Code: A variable-length code is one whose code word length is not fixed. Shannon-Fano and Huffman codes are examples of variable-length codes. Codes 3, 4 and 5 in Table 1 are variable-length codes.
Distinct Code: A code is distinct if each code word is distinguishable from the other code words. Codes 2, 3, 4, 5 and 6 are distinct codes.
Prefix Code: A code in which no code word can be formed by adding code symbols to another code word is called a prefix code. No code word should be a prefix of another. E.g. codes 2, 4 and 6.

CLASSIFICATION OF CODES
Uniquely Decodable Code: A code is uniquely decodable if the original source sequence can be reconstructed perfectly from the encoded binary sequence. Code 3 of the table is not uniquely decodable, since the binary sequence 1001 may correspond to source sequence x2x3x2 or x2x1x1x2.
A sufficient condition to ensure that a code is uniquely decodable is that no code word is a prefix of another. Thus codes 2, 4 and 6 are uniquely decodable codes.
The prefix-free condition is not a necessary condition for unique decodability, e.g. code 5.
Instantaneous Codes: A code is called instantaneous if the end of any code word is recognizable without examining subsequent code symbols. Prefix-free codes are instantaneous codes, e.g. code 6.
CLASSIFICATION OF CODES

Table 1
xi   Code 1   Code 2   Code 3   Code 4   Code 5   Code 6
x1   00       00       0        0        0        1
x2   01       01       1        10       01       01
x3   00       10       00       110      011      001
x4   11       11       11       111      0111     0001

Fixed-Length Codes: 1, 2
Variable-Length Codes: 3, 4, 5, 6
Distinct Codes: 2, 3, 4, 5, 6
Prefix Codes: 2, 4, 6
Uniquely Decodable Codes: 2, 4, 5, 6
Instantaneous Codes: 2, 4, 6
PREFIX CODING (INSTANTANEOUS CODING)
Consider a discrete memoryless source with alphabet x0, x1, ..., xm-1 and statistics p0, p1, ..., pm-1. Let the code word assigned to source symbol xk be denoted by mk1, mk2, ..., mkn, where the individual elements are 0s and 1s and n is the code word length. The initial part of the code word is represented by mk1, ..., mki for some i ≤ n. Any sequence made up of the initial part of the code word is called a prefix of the code word.
A prefix code is defined as a code in which no code word is a prefix of any other code word. It has the important property that it is always uniquely decodable. But the converse is not always true: a code that does not satisfy the prefix condition may still be uniquely decodable.
EXAMPLE 1 (Contd...)

xi   Code A   Code B   Code C   Code D
x1   00       0        0        0
x2   01       10       11       100
x3   10       11       100      110
x4   11       110      110      111

B and C are not uniquely decodable.
A, C and D satisfy the Kraft inequality; A and D are prefix codes.

A prefix code always satisfies the Kraft inequality. But the converse is not always true.
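Both properties above are mechanical to check. A sketch (helper names `kraft_sum` and `is_prefix_free` are ours), using the binary Kraft sum K = Σ 2^(-ni):

```python
def kraft_sum(lengths):
    """Kraft sum K = sum of 2^(-n_i); K <= 1 is necessary for unique decodability."""
    return sum(2 ** -n for n in lengths)

def is_prefix_free(words):
    """True if no code word is a prefix of another code word."""
    return not any(a != b and b.startswith(a) for a in words for b in words)

code_A = ["00", "01", "10", "11"]
code_B = ["0", "10", "11", "110"]
code_C = ["0", "11", "100", "110"]
code_D = ["0", "100", "110", "111"]
```

Running the checks reproduces the slide's conclusions: A, C and D satisfy the Kraft inequality, while only A and D are prefix codes.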
EXAMPLE
An analog signal is band-limited to fm Hz and sampled at the Nyquist rate. The samples are quantized into 4 levels. Each level represents one symbol. The probabilities of occurrence of these 4 levels (symbols) are p(x1) = p(x4) = 1/8 and p(x2) = p(x3) = 3/8. Obtain the information rate of the source.

Answer:
H(X) = (1/8) log2 8 + (3/8) log2 (8/3) + (3/8) log2 (8/3) + (1/8) log2 8 = 1.8 bits/symbol

Nyquist rate means fs = 2fm, so the rate at which symbols are generated is r = 2fm symbols/second.

R = rH = 2fm (symbols/second) × 1.8 (bits/symbol) = 3.6fm bits/second
EXAMPLE
We are transmitting 3.6fm bits/second. There are four levels; these four levels may be coded using binary PCM as shown below.

Symbol   Probability   Binary digits
Q1       1/8           00
Q2       3/8           01
Q3       3/8           10
Q4       1/8           11

Two binary digits are needed to send each symbol. Since symbols are sent at the rate 2fm symbols/sec, the transmission rate of binary digits will be:

Binary digit rate = 2 (binary digits/symbol) × 2fm (symbols/second) = 4fm binary digits/second

Since one binary digit is capable of conveying 1 bit of information, the above coding scheme is capable of conveying 4fm information bits/sec. But we have seen earlier that we are transmitting only 3.6fm bits of information per second. This means that the information-carrying ability of binary PCM is not completely utilized by this transmission scheme.
EXAMPLE
In the above example, if all the symbols are equally likely, i.e. p(x1) = p(x2) = p(x3) = p(x4) = 1/4, then H = log2 4 = 2 bits/symbol and R = 2fm × 2 = 4fm bits/second.
With binary PCM coding, the maximum information rate is achieved if all messages are equally likely. Often this is difficult to achieve, so we go for alternative coding schemes to increase the average information per bit.
SHANNON-FANO CODING PROCEDURE
(i) List the source symbols in the order of decreasing probability.
(ii) Partition the set into two sets that are as close to equiprobable as possible.
(iii) Assign 0s to the upper set and 1s to the lower set.
(iv) Continue the process, each time partitioning the sets with as nearly equal probabilities as possible, until further partitioning is not possible.
(v) The rows of the table corresponding to each symbol give the Shannon-Fano code.
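The steps above can be sketched directly in code. This is our own minimal implementation, not the handout's; it splits each group at the point that makes the two halves closest to equiprobable, appending 0s to the upper group and 1s to the lower:

```python
def shannon_fano(symbols):
    """symbols: list of (name, probability) pairs. Returns {name: code string}."""
    codes = {name: "" for name, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        running, k, best_diff = 0.0, 1, float("inf")
        # find the partition point that makes the two halves closest to equiprobable
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(2 * running - total)
            if diff < best_diff:
                best_diff, k = diff, i
        for name, _ in group[:k]:
            codes[name] += "0"     # upper set gets 0
        for name, _ in group[k:]:
            codes[name] += "1"     # lower set gets 1
        split(group[:k])
        split(group[k:])

    split(sorted(symbols, key=lambda s: -s[1]))
    return codes

codes = shannon_fano([("m1", 1/2), ("m2", 1/8), ("m3", 1/8), ("m4", 1/16),
                      ("m5", 1/16), ("m6", 1/16), ("m7", 1/32), ("m8", 1/32)])
```

On the eight-message example that follows, this reproduces the codes in the table (m1 → 0, m2 → 100, ..., m8 → 11111).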
Find the Shannon-Fano codes corresponding to eight messages m1, m2, ..., m8 with probabilities 1/2, 1/8, 1/8, 1/16, 1/16, 1/16, 1/32 and 1/32.
SHANNON-FANO CODING

Message   Probability   Code     No. of bits/message
m1        1/2           0        1
m2        1/8           100      3
m3        1/8           101      3
m4        1/16          1100     4
m5        1/16          1101     4
m6        1/16          1110     4
m7        1/32          11110    5
m8        1/32          11111    5
SHANNON-FANO CODING

L = Σ (i=1 to 8) p(xi)·ni
  = (1/2)×1 + 2×(1/8)×3 + 3×(1/16)×4 + 2×(1/32)×5 = 2.31 bits/message

H = Σ (i=1 to 8) p(xi) log2 (1/p(xi))
  = (1/2) log2 2 + 2×(1/8) log2 8 + 3×(1/16) log2 16 + 2×(1/32) log2 32 = 2.31 bits/message

η = H/L = 100%
SHANNON-FANO CODING
There are 6 possible messages m1, m2, ..., m6 with probabilities 0.3, 0.25, 0.2, 0.12, 0.08, 0.05. Obtain the Shannon-Fano codes.

Message   Probability   Code    Length
m1        0.30          00      2
m2        0.25          01      2
m3        0.20          10      2
m4        0.12          110     3
m5        0.08          1110    4
m6        0.05          1111    4
SHANNON-FANO CODING

L = Σ (i=1 to 6) p(xi)·ni
  = 0.3×2 + 0.25×2 + 0.2×2 + 0.12×3 + 0.08×4 + 0.05×4 = 2.38 b/symbol

H = Σ (i=1 to 6) p(xi) log2 (1/p(xi))
  = 0.3 log2 (1/0.3) + 0.25 log2 (1/0.25) + 0.2 log2 (1/0.2) + 0.12 log2 (1/0.12) + 0.08 log2 (1/0.08) + 0.05 log2 (1/0.05)
  = 2.36 b/symbol

η = H/L = 2.36/2.38 = 0.99 = 99%
Redundancy γ = 1 - η = 0.01 = 1%
SHANNON-FANO CODING
A DMS has five equally likely symbols. Construct the Shannon-Fano code.

xi   P(xi)   Code   Length
x1   0.2     00     2
x2   0.2     01     2
x3   0.2     10     2
x4   0.2     110    3
x5   0.2     111    3
SHANNON-FANO CODING

L = Σ (i=1 to 5) p(xi)·ni = 0.2 × (2 + 2 + 2 + 3 + 3) = 2.4 b/symbol

H = Σ (i=1 to 5) p(xi) log2 (1/p(xi)) = 5 × 0.2 log2 (1/0.2) = log2 5 = 2.32 b/symbol

η = H/L = 2.32/2.4 = 0.967 = 96.7%
SHANNON-FANO CODING
A DMS has five symbols x1, x2, x3, x4, x5 with probabilities 0.4, 0.19, 0.16, 0.15, 0.1. Construct the Shannon-Fano code.

xi   P(xi)   Code   Length
x1   0.40    00     2
x2   0.19    01     2
x3   0.16    10     2
x4   0.15    110    3
x5   0.10    111    3
HUFFMAN CODING
(i) List the source symbols in the order of decreasing probability.
(ii) Combine (add) the probabilities of the two symbols having the lowest probabilities and reorder the resultant probabilities. This process is called bubbling.
(iii) During the bubbling process, if the new weight is equal to existing probabilities, the new branch is to be bubbled to the top of the group having the same probability.
(iv) Complete the tree structure and assign a '1' to the branch rising up and a '0' to the branch coming down.
(v) From the final point, trace the path to the required symbol and order the 0s and 1s encountered in the path to form the code.

* It produces the optimum code.
* It has the highest efficiency.
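The merge-and-reorder procedure can be sketched with a priority queue. This is our own sketch, not the handout's construction: its tie-breaking differs from the manual "bubbling" rule, so individual code words may differ from the trees below, but the average code length is still optimal:

```python
import heapq
from itertools import count

def huffman(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> code string."""
    tick = count()  # tie-breaker so the heap never compares the dict payloads
    heap = [(p, next(tick), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        for s in c1:
            c1[s] = "0" + c1[s]           # one branch gets 0
        for s in c2:
            c2[s] = "1" + c2[s]           # the other gets 1
        heapq.heappush(heap, (p1 + p2, next(tick), {**c1, **c2}))
    return heap[0][2]

probs = {"x1": 0.3, "x2": 0.25, "x3": 0.2, "x4": 0.12, "x5": 0.08, "x6": 0.05}
codes = huffman(probs)
avg_len = sum(p * len(codes[s]) for s, p in probs.items())   # 2.38 b/symbol
```

For the 0.3/0.25/0.2/0.12/0.08/0.05 source this gives code lengths 2, 2, 2, 3, 4, 4 and an average length of 2.38 b/symbol, matching the tree example below.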
EXAMPLES OF HUFFMAN CODING

[Huffman tree for probabilities 0.4, 0.2, 0.1, 0.1, 0.1, 0.1; successive merges give 0.4, 0.2, 0.2, 0.2 → 0.4, 0.4, 0.2 → 0.6, 0.4.]

xi   P(xi)   Code
x1   0.4     1 1
x2   0.2     0 0
x3   0.1     1 0 1
x4   0.1     1 0 0
x5   0.1     0 1 1
x6   0.1     0 1 0
EXAMPLES OF HUFFMAN CODING

[Huffman tree for probabilities 0.30, 0.25, 0.20, 0.12, 0.08, 0.05; successive merges give 0.30, 0.25, 0.25, 0.20 → 0.45, 0.30, 0.25 → 0.55, 0.45.]

xi   P(xi)   Code
x1   0.30    1 1
x2   0.25    0 1
x3   0.20    0 0
x4   0.12    1 0 0
x5   0.08    1 0 1 1
x6   0.05    1 0 1 0
EXAMPLES OF HUFFMAN CODING

[Huffman tree for probabilities 0.4, 0.19, 0.16, 0.15, 0.1; successive merges give 0.4, 0.25, 0.19, 0.16 → 0.4, 0.35, 0.25 → 0.6, 0.4.]

xi   P(xi)   Code
x1   0.40    0
x2   0.19    1 1 1
x3   0.16    1 1 0
x4   0.15    1 0 1
x5   0.10    1 0 0
EXAMPLES OF HUFFMAN CODING

[Huffman tree for five equally likely symbols, p = 0.2 each; successive merges give 0.4, 0.2, 0.2, 0.2 → 0.4, 0.4, 0.2 → 0.6, 0.4.]

xi   P(xi)   Code
x1   0.2     1 0
x2   0.2     0 1
x3   0.2     0 0
x4   0.2     1 1 1
x5   0.2     1 1 0
CHANNEL REPRESENTATION
A communication channel may be defined as the path or medium through which the symbols flow to the receiver.
A Discrete Memoryless Channel (DMC) is a statistical model with an input X and an output Y as shown below. During each signalling interval, the channel accepts an input symbol from X, and in response it generates an output symbol from Y.
The channel is discrete when the alphabets of X and Y are both finite.
It is memoryless when the current output depends only on the current input and not on any of the previous inputs.
CHANNEL REPRESENTATION

[Diagram: DMC with m inputs x1, x2, ..., xm and n outputs y1, y2, ..., yn; each input-output path is labelled with a transition probability p(yj|xi).]

A diagram of a DMC with m inputs and n outputs is shown above. The input X consists of input symbols x1, x2, ..., xm. The output Y consists of output symbols y1, y2, ..., yn.
CHANNEL REPRESENTATION
Each possible input-to-output path is indicated along with a conditional probability p(yj|xi), which is the conditional probability of obtaining output yj given that the input is xi, called the channel transition probability.
A channel is completely specified by the complete set of transition probabilities. So a DMC is often specified by a matrix of transition probabilities [P(Y|X)].
CHANNEL MATRIX

            | P(y1|x1)  P(y2|x1)  ...  P(yn|x1) |
[P(Y|X)] =  | P(y1|x2)  P(y2|x2)  ...  P(yn|x2) |
            | ................................. |
            | P(y1|xm)  P(y2|xm)  ...  P(yn|xm) |

The matrix [P(Y|X)] is called the channel matrix. Each row of the matrix specifies the probabilities of obtaining y1, y2, ..., yn given xi, so the sum of the elements in any row should be unity.
CHANNEL MATRIX

Σ (j=1 to n) p(yj|xi) = 1 for all i

If the input probabilities P(X) are represented by the row matrix [P(X)] = [p(x1) p(x2) ... p(xm)], and the output probabilities P(Y) by the row matrix [P(Y)] = [p(y1) p(y2) ... p(yn)], then the output probabilities may be expressed in terms of the input probabilities as:

[P(Y)] = [P(X)] [P(Y|X)]
CHANNEL MATRIX

If [P(X)] is represented as a diagonal matrix

           | p(x1)  0      ...  0     |
[P(X)]d =  | 0      p(x2)  ...  0     |
           | ........................ |
           | 0      0      ...  p(xm) |

then [P(X,Y)] = [P(X)]d [P(Y|X)].

The (i, j) element of the matrix [P(X,Y)] has the form p(xi, yj). The matrix [P(X,Y)] is known as the joint probability matrix, and the element p(xi, yj) is the joint probability of transmitting xi and receiving yj.
LOSSLESS CHANNEL
A channel described by a channel matrix with only one non-zero element in each column is called a lossless channel. In a lossless channel no source information is lost in transmission.

[Diagram: x1 → y1 (3/4), x1 → y2 (1/4); x2 → y3 (1/3), x2 → y4 (2/3); x3 → y5 (1).]

            | 3/4  1/4  0    0    0 |
[P(Y|X)] =  | 0    0    1/3  2/3  0 |
            | 0    0    0    0    1 |
DETERMINISTIC CHANNEL
A channel described by a channel matrix with only one non-zero element in each row is called a deterministic channel.

[Diagram: x1 → y1, x2 → y1, x3 → y2, x4 → y2, x5 → y3, each with probability 1.]

            | 1  0  0 |
            | 1  0  0 |
[P(Y|X)] =  | 0  1  0 |
            | 0  1  0 |
            | 0  0  1 |

Since each row has only one non-zero element, this element must be unity. When a given source symbol is sent over a deterministic channel, it is clear which output symbol will be received.
NOISELESS CHANNEL
A channel is called noiseless if it is both lossless and deterministic. The channel matrix has only one element in each row and each column, and this element is unity. The input and output alphabets are of the same size.

[Diagram: xi → yi with probability 1, for i = 1, 2, ..., m.]
BINARY SYMMETRIC CHANNEL
A binary symmetric channel is defined by the channel diagram shown below, and its channel matrix is given by

            | 1-p  p   |
[P(Y|X)] =  | p    1-p |

[Diagram: x1 = 0 → y1 = 0 with probability 1-p and → y2 = 1 with probability p; x2 = 1 → y2 = 1 with probability 1-p and → y1 = 0 with probability p.]
The channel has two inputs (0 and 1) and two outputs (0 and 1). The channel is symmetric because the probability of receiving a 1 if a 0 is sent is the same as the probability of receiving a 0 if a 1 is sent. This common transition probability is denoted by p.
BINARY SYMMETRIC CHANNEL EXAMPLE 1

[Diagram: x1 → y1 with probability 0.9, x1 → y2 with probability 0.1; x2 → y2 with probability 0.8, x2 → y1 with probability 0.2.]

(i) Find the channel matrix of the binary channel.
(ii) Find p(y1) and p(y2) when p(x1) = p(x2) = 0.5.
(iii) Find the joint probabilities p(x1, y2) and p(x2, y1) when p(x1) = p(x2) = 0.5.
SOLUTION

                  | p(y1|x1)  p(y2|x1) |   | 0.9  0.1 |
Channel matrix =  | p(y1|x2)  p(y2|x2) | = | 0.2  0.8 |

[P(Y)] = [P(X)] [P(Y|X)] = [0.5  0.5] | 0.9  0.1 |  =  [0.55  0.45]
                                      | 0.2  0.8 |

p(y1) = 0.55, p(y2) = 0.45

[P(X,Y)] = [P(X)]d [P(Y|X)] = | 0.5  0   | | 0.9  0.1 |  =  | 0.45  0.05 |
                              | 0    0.5 | | 0.2  0.8 |     | 0.10  0.40 |

| p(x1,y1)  p(x1,y2) |   | 0.45  0.05 |
| p(x2,y1)  p(x2,y2) | = | 0.10  0.40 |

p(x1, y2) = 0.05, p(x2, y1) = 0.1
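These matrix products are easy to reproduce; a minimal sketch using plain lists of rows (the helper `mat_mul` is ours), which also covers the cascade of Example 2 below:

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

P_YX = [[0.9, 0.1],
        [0.2, 0.8]]           # channel matrix [P(Y|X)]
P_X = [[0.5, 0.5]]            # input probabilities as a row matrix

P_Y = mat_mul(P_X, P_YX)      # output probabilities [P(Y)] = [0.55, 0.45]
P_Xd = [[0.5, 0.0],
        [0.0, 0.5]]           # [P(X)] as a diagonal matrix
P_XY = mat_mul(P_Xd, P_YX)    # joint probability matrix [P(X,Y)]
P_ZX = mat_mul(P_YX, P_YX)    # two identical channels in cascade (Example 2)
```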
EXAMPLE 2
Two binary channels of the above example are connected in cascade. Find the overall channel matrix and draw the resultant equivalent channel diagram. Find p(z1), p(z2) when p(x1) = p(x2) = 0.5.

[Diagram: X → Y → Z, each stage a binary channel with transition probabilities 0.9/0.1 and 0.2/0.8.]

SOLUTION

[P(Z|X)] = [P(Y|X)] [P(Z|Y)] = | 0.9  0.1 | | 0.9  0.1 |  =  | 0.83  0.17 |
                               | 0.2  0.8 | | 0.2  0.8 |     | 0.34  0.66 |

[P(Z)] = [P(X)] [P(Z|X)] = [0.5  0.5] | 0.83  0.17 |  =  [0.585  0.415]
                                      | 0.34  0.66 |
EXAMPLE 3
A channel has the channel matrix

            | 1-p  p  0   |
[P(Y|X)] =  | 0    p  1-p |

(i) Draw the channel diagram.
(ii) If the source has equally likely outputs, compute the probabilities associated with the channel outputs for p = 0.2.

[Diagram: x1 = 0 → y1 = 0 (1-p) and y2 = e (p); x2 = 1 → y3 = 1 (1-p) and y2 = e (p), where e denotes erasure.]
SOLUTION
This channel is known as the binary erasure channel (BEC). It has two inputs x1 = 0 and x2 = 1 and three outputs y1 = 0, y2 = e, y3 = 1, where e denotes erasure. This means that the output is in doubt, and hence it should be erased.

[P(Y)] = [0.5  0.5] | 0.8  0.2  0   |  =  [0.4  0.2  0.4]
                    | 0    0.2  0.8 |
EXAMPLE 5

[Diagram: three inputs x1, x2, x3 and three outputs y1, y2, y3, with transition probabilities 1/3, 1/3, 1/3 from x1; 1/4, 1/2, 1/4 from x2; 1/4, 1/4, 1/2 from x3.]

(i) Find the channel matrix.
(ii) Find the output probabilities if p(x1) = 1/2 and p(x2) = p(x3) = 1/4.
(iii) Find the output entropy H(Y).
SOLUTION

            | 1/3  1/3  1/3 |   | 0.33  0.33  0.33 |
[P(Y|X)] =  | 1/4  1/2  1/4 | = | 0.25  0.50  0.25 |
            | 1/4  1/4  1/2 |   | 0.25  0.25  0.50 |

[P(Y)] = [P(X)] [P(Y|X)] = [0.5  0.25  0.25] | 1/3  1/3  1/3 |  =  [7/24  17/48  17/48]
                                             | 1/4  1/2  1/4 |
                                             | 1/4  1/4  1/2 |

H(Y) = Σ (i=1 to 3) p(yi) log2 (1/p(yi))
     = (7/24) log2 (24/7) + (17/48) log2 (48/17) + (17/48) log2 (48/17)
     = 1.58 b/symbol
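The fractions 7/24 and 17/48 can be checked exactly with rational arithmetic; a sketch (variable names are ours):

```python
import math
from fractions import Fraction as F

P_YX = [[F(1, 3), F(1, 3), F(1, 3)],
        [F(1, 4), F(1, 2), F(1, 4)],
        [F(1, 4), F(1, 4), F(1, 2)]]
P_X = [F(1, 2), F(1, 4), F(1, 4)]

# [P(Y)] = [P(X)][P(Y|X)], as an exact row vector
P_Y = [sum(P_X[i] * P_YX[i][j] for i in range(3)) for j in range(3)]

# Output entropy H(Y) in bits per symbol
H_Y = sum(float(p) * math.log2(1 / float(p)) for p in P_Y)
```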
MUTUAL INFORMATION AND CHANNEL CAPACITY OF DMC
Let a source emit symbols x1, x2, ..., xm and the receiver receive symbols y1, y2, ..., yn. The set of symbols yj may or may not be identical to the set of symbols xi, depending on the nature of the receiver. Several types of probabilities will be needed to deal with the two alphabets X and Y.

[Diagram: X = {x1, x2, ..., xm} → CHANNEL → Y = {y1, y2, ..., yn}.]
PROBABILITIES ASSOCIATED WITH A CHANNEL

i.   p(xi) is the probability that the source selects symbol xi for transmission.
ii.  p(yj) is the probability that the symbol yj is received.
iii. p(xi, yj) is the joint probability that xi is transmitted and yj is received.
iv.  p(xi|yj) is the conditional probability that xi was transmitted given that yj is received.
v.   p(yj|xi) is the conditional probability that yj is received given that xi was transmitted.
ENTROPIES ASSOCIATED WITH A CHANNEL

Correspondingly we have the following entropies:

i.   H(X) is the entropy of the transmitter.
ii.  H(Y) is the entropy of the receiver.
iii. H(X,Y) is the joint entropy of the transmitted and received symbols.
iv.  H(X|Y) is the entropy of the transmitter given knowledge of the received symbols.
v.   H(Y|X) is the entropy of the receiver given knowledge of the transmitted symbols.
ENTROPIES ASSOCIATED WITH A CHANNEL

H(X)   = Σ (i) p(xi) log2 (1/p(xi))
H(Y)   = Σ (j) p(yj) log2 (1/p(yj))
H(X|Y) = Σ (j) Σ (i) p(xi, yj) log2 (1/p(xi|yj))
H(Y|X) = Σ (j) Σ (i) p(xi, yj) log2 (1/p(yj|xi))
H(X,Y) = Σ (j) Σ (i) p(xi, yj) log2 (1/p(xi, yj))
RELATIONSHIP BETWEEN ENTROPIES

H(X,Y) = Σ (j) Σ (i) p(xi, yj) log2 (1/p(xi, yj))
       = Σ (j) Σ (i) p(xi, yj) log2 (1/[p(xi|yj) p(yj)])
       = Σ (j) Σ (i) p(xi, yj) [log2 (1/p(xi|yj)) + log2 (1/p(yj))]
       = H(X|Y) + Σ (j) Σ (i) p(xi, yj) log2 (1/p(yj))

Since Σ (i) p(xi, yj) = p(yj):

H(X,Y) = H(X|Y) + Σ (j) p(yj) log2 (1/p(yj)) = H(X|Y) + H(Y)

Similarly:

H(X,Y) = H(Y|X) + H(X)
MUTUAL INFORMATION
If the channel is noiseless, then the reception of some symbol yj uniquely determines the message transmitted. Because of noise, there is a certain amount of uncertainty regarding the transmitted symbol when yj is received. p(xi|yj) represents the conditional probability that the transmitted symbol was xi given that yj is received. The average uncertainty about X when yj is received is represented as:

H(X|Y = yj) = Σ (i) p(xi|yj) log2 (1/p(xi|yj))

The quantity H(X|Y = yj) is itself a random variable that takes on the values H(X|Y = y1), H(X|Y = y2), ..., H(X|Y = yn) with probabilities p(y1), p(y2), ..., p(yn).
MUTUAL INFORMATION
Now the average uncertainty about X when Y is received is

H(X|Y) = \sum_{j=0}^{n-1} p(y_j) \left[ \sum_{i=0}^{m-1} p(x_i|y_j) \log_2 \frac{1}{p(x_i|y_j)} \right]

= \sum_{j=0}^{n-1} \sum_{i=0}^{m-1} p(x_i|y_j)\, p(y_j) \log_2 \frac{1}{p(x_i|y_j)}
H(X|Y) represents the average loss of information about a transmitted symbol when a symbol is received. It is called the equivocation of X with respect to Y. Since p(x_i|y_j)\, p(y_j) = p(x_i, y_j),

H(X|Y) = \sum_{j=0}^{n-1} \sum_{i=0}^{m-1} p(x_i, y_j) \log_2 \frac{1}{p(x_i|y_j)}
If the channel were noiseless, the average amount of information received would be H(X) bits per received symbol; H(X) is the average amount of information transmitted per symbol. Because of channel noise we lose an average of H(X|Y) bits of information per symbol, so the receiver receives on the average H(X) - H(X|Y) bits per symbol. The quantity H(X) - H(X|Y) is denoted I(X;Y) and is called the mutual information:

I(X;Y) = \sum_{i=0}^{m-1} p(x_i) \log_2 \frac{1}{p(x_i)} - \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} p(x_i, y_j) \log_2 \frac{1}{p(x_i|y_j)}
But \sum_{j=0}^{n-1} p(x_i, y_j) = p(x_i), so

I(X;Y) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} p(x_i, y_j) \log_2 \frac{1}{p(x_i)} - \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} p(x_i, y_j) \log_2 \frac{1}{p(x_i|y_j)}

= \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} p(x_i, y_j) \log_2 \frac{p(x_i|y_j)}{p(x_i)}

= \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} p(x_i, y_j) \log_2 \frac{p(x_i, y_j)}{p(x_i)\, p(y_j)} ...........(1)

If we interchange the symbols x_i and y_j, the value of eq (1) is not altered, so I(X;Y) = I(Y;X), i.e.

H(X) - H(X|Y) = H(Y) - H(Y|X)
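The symmetry I(X;Y) = I(Y;X) is easy to confirm numerically. A short sketch (the joint distribution is an illustrative choice, not from the notes):

```python
import math

def mutual_information(p_xy):
    """I(X;Y) = sum_ij p(x_i,y_j) log2[ p(x_i,y_j) / (p(x_i) p(y_j)) ]."""
    n, m = len(p_xy), len(p_xy[0])
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(p_xy[i][j] for i in range(n)) for j in range(m)]
    return sum(p_xy[i][j] * math.log2(p_xy[i][j] / (p_x[i] * p_y[j]))
               for i in range(n) for j in range(m) if p_xy[i][j] > 0)

p_xy = [[0.30, 0.10],
        [0.05, 0.55]]

I_xy = mutual_information(p_xy)
# Interchanging the roles of X and Y (transposing the joint matrix)
# leaves the value unchanged: I(X;Y) = I(Y;X)
p_yx = [[p_xy[i][j] for i in range(2)] for j in range(2)]
I_yx = mutual_information(p_yx)
print(abs(I_xy - I_yx) < 1e-12)   # True
```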
CHANNEL CAPACITY
A particular communication channel has fixed source and destination alphabets and a fixed channel matrix, so the only variable quantity in the expression for the mutual information I(X;Y) is the source probability p(x_i). Consequently, maximum information transfer requires specific source statistics, obtained through source coding. A suitable measure of the efficiency of information transfer through a discrete memoryless channel is obtained by comparing the actual information transfer to the upper bound of such trans-information for the given channel. The information transfer in a channel is characterised by the mutual information, and Shannon named the maximum mutual information the channel capacity.
Channel capacity C = \max I(X;Y)
Channel capacity C is the maximum possible information transmitted when one symbol is transmitted from the transmitter. Channel capacity depends on the transmission medium, the kind of signals, the kind of receiver, etc.; it is a property of the system as a whole.
CHANNEL CAPACITY OF A BSC
[Figure: transition diagram of the BSC — inputs x1 = 0 (probability p) and x2 = 1 (probability 1-p), outputs y1 = 0 and y2 = 1; correct transitions occur with probability 1-\alpha, crossovers with probability \alpha.]

The source alphabet consists of two symbols x1 and x2 with probabilities p(x1) = p and p(x2) = 1-p. The destination alphabet is {y1, y2}. The average error probability per symbol is

p_e = p(x_1)\, p(y_2|x_1) + p(x_2)\, p(y_1|x_2) = p\alpha + (1-p)\alpha = \alpha
The error probability of a BSC is \alpha. The channel matrix is

P(Y|X) = \begin{bmatrix} 1-\alpha & \alpha \\ \alpha & 1-\alpha \end{bmatrix}

The destination entropy H(Y) is

H(Y) = p(y_1) \log_2 \frac{1}{p(y_1)} + p(y_2) \log_2 \frac{1}{p(y_2)}

= p(y_1) \log_2 \frac{1}{p(y_1)} + \left(1 - p(y_1)\right) \log_2 \frac{1}{1 - p(y_1)}

= \Omega[p(y_1)]

where \Omega(x) = x \log_2 \frac{1}{x} + (1-x) \log_2 \frac{1}{1-x}.
[Figure: plot of \Omega(\alpha) versus \alpha — the curve rises from 0 at \alpha = 0 to its maximum of 1 at \alpha = 1/2 and falls back to 0 at \alpha = 1.]
The maximum occurs at x = 0.5, where \Omega_{max} = 1 bit/symbol. The probability of the output symbol y1 is

p(y_1) = \sum_i p(x_i, y_1) = p(y_1|x_1)\, p(x_1) + p(y_1|x_2)\, p(x_2) = p(1-\alpha) + (1-p)\alpha = \alpha + p - 2\alpha p

Hence

H(Y) = \Omega(\alpha + p - 2\alpha p)

The noise entropy is

H(Y|X) = \sum_{j=1}^{2} \sum_{i=1}^{2} p(x_i, y_j) \log_2 \frac{1}{p(y_j|x_i)} = \sum_{i=1}^{2} \sum_{j=1}^{2} p(x_i)\, p(y_j|x_i) \log_2 \frac{1}{p(y_j|x_i)}
H(Y|X) = \sum_{i=1}^{2} p(x_i) \sum_{j=1}^{2} p(y_j|x_i) \log_2 \frac{1}{p(y_j|x_i)}

= p(x_1)\, p(y_1|x_1) \log_2 \frac{1}{p(y_1|x_1)} + p(x_1)\, p(y_2|x_1) \log_2 \frac{1}{p(y_2|x_1)}
+ p(x_2)\, p(y_1|x_2) \log_2 \frac{1}{p(y_1|x_2)} + p(x_2)\, p(y_2|x_2) \log_2 \frac{1}{p(y_2|x_2)}

= p(1-\alpha)\log_2\frac{1}{1-\alpha} + p\alpha \log_2\frac{1}{\alpha} + (1-p)\alpha \log_2 \frac{1}{\alpha} + (1-p)(1-\alpha)\log_2 \frac{1}{1-\alpha}

= (1-\alpha)\log_2\frac{1}{1-\alpha} + \alpha\log_2\frac{1}{\alpha} = \Omega(\alpha)

Hence H(Y|X) = \Omega(\alpha).
The mutual information is

I(X;Y) = H(Y) - H(Y|X) = \Omega(\alpha + p - 2\alpha p) - \Omega(\alpha)

If the noise is small, the error probability \alpha << 1 and the mutual information becomes almost the source entropy:

I(X;Y) \approx \Omega(p) = H(X)

On the other hand, if the channel is very noisy, \alpha = 1/2 and

I(X;Y) = 0

For a fixed \alpha, \Omega(\alpha) is a constant, but the other term \Omega(\alpha + p - 2\alpha p) varies with the source probability. This term reaches its maximum value of 1 when \alpha + p - 2\alpha p = 1/2.
This condition is satisfied for any \alpha if p = 1/2. So the channel capacity of a BSC can be written as

C = 1 - \Omega(\alpha)
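The capacity formula C = 1 - \Omega(\alpha) is easy to evaluate; a minimal Python sketch:

```python
import math

def omega(x):
    """Binary entropy function Omega(x) = x log2(1/x) + (1-x) log2(1/(1-x))."""
    if x in (0.0, 1.0):
        return 0.0
    return x * math.log2(1 / x) + (1 - x) * math.log2(1 / (1 - x))

def bsc_capacity(alpha):
    """Capacity (bits/symbol) of a binary symmetric channel with error prob alpha."""
    return 1 - omega(alpha)

print(bsc_capacity(0.0))   # 1.0 : noiseless channel carries one full bit
print(bsc_capacity(0.5))   # 0.0 : a very noisy channel carries nothing
print(bsc_capacity(0.1))   # ~0.531 bits/symbol for a 10% error rate
```

Note the symmetry bsc_capacity(alpha) = bsc_capacity(1 - alpha): a channel that inverts every bit is as useful as a perfect one.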
SHANNON’S THEOREM ON CHANNEL CAPACITY
i. Given a source of M equally likely messages with M >> 1 which is generating information at a rate R, and given a channel with channel capacity C, then if R <= C there exists a coding technique such that the output of the source may be transmitted over the channel with a probability of error in the received message which may be made arbitrarily small.
ii. Given a source of M equally likely messages with M >> 1 which is generating information at a rate R, then if R > C the probability of error is close to unity for every possible set of M transmitter signals.
DIFFERENTIAL ENTROPY H(X)
Consider a continuous random variable X with probability density function f_X(x). By analogy with the entropy of a discrete random variable we can introduce the definition

h(X) = \int_{-\infty}^{\infty} f_X(x) \log_2 \frac{1}{f_X(x)}\, dx

h(X) is called the differential entropy of X, to distinguish it from the ordinary or absolute entropy H(X). The difference between h(X) and H(X) can be explained as follows.
We can view the continuous random variable X as the limiting form of a discrete random variable that assumes the values x_k = k\Delta x, where k = 0, \pm 1, \pm 2, ... and \Delta x approaches zero. The continuous random variable X assumes a value in the interval (x_k, x_k + \Delta x) with probability f_X(x_k)\Delta x. Hence, permitting \Delta x to approach zero, the ordinary entropy of the continuous random variable X may be written in the limit as

H(X) = \lim_{\Delta x \to 0} \sum_{k=-\infty}^{\infty} f_X(x_k)\,\Delta x\, \log_2 \frac{1}{f_X(x_k)\,\Delta x}

= \lim_{\Delta x \to 0} \left\{ \sum_{k=-\infty}^{\infty} f_X(x_k)\,\Delta x\, \log_2 \frac{1}{f_X(x_k)} - \log_2 \Delta x \sum_{k=-\infty}^{\infty} f_X(x_k)\,\Delta x \right\}
= \int_{-\infty}^{\infty} f_X(x) \log_2 \frac{1}{f_X(x)}\, dx - \lim_{\Delta x \to 0} \log_2 \Delta x \int_{-\infty}^{\infty} f_X(x)\, dx

= h(X) - \lim_{\Delta x \to 0} \log_2 \Delta x \qquad \left( \text{since } \int_{-\infty}^{\infty} f_X(x)\, dx = 1 \right)

In the limit as \Delta x \to 0, -\log_2 \Delta x approaches infinity, so H(X) \to \infty. This implies that the entropy of a continuous random variable is infinitely large: a continuous random variable may assume a value anywhere in the interval -\infty to +\infty, and the uncertainty associated with the variable is on the order of infinity. So we define h(X) as the differential entropy, with the term -\log_2 \Delta x serving as a reference.
EXAMPLE
A signal amplitude X is a random variable uniformly distributed in the range (-1, 1). The signal is passed through an amplifier of gain 2. The output Y is also a random variable, uniformly distributed in the range (-2, 2). Determine the differential entropies of X and Y.

f_X(x) = 1/2 for |x| < 1, and 0 otherwise
f_Y(y) = 1/4 for |y| < 2, and 0 otherwise
h(X) = \int_{-1}^{1} \frac{1}{2} \log_2 2\, dx = 1 \text{ bit}

h(Y) = \int_{-2}^{2} \frac{1}{4} \log_2 4\, dy = 2 \text{ bits}

The differential entropy of Y is twice that of X. But Y = 2X, and a knowledge of X uniquely determines Y, so the average uncertainty about X and Y should be identical: amplification can neither add nor subtract information. Yet h(Y) is twice as large as h(X). This is because h(X) and h(Y) are differential entropies, and they will be equal only if their reference entropies are equal.
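For a uniform density on (a, b) the integral above reduces to log2(b - a), which makes the example a one-liner to check:

```python
import math

def h_uniform(a, b):
    """Differential entropy (bits) of a uniform density on (a, b): log2(b - a)."""
    return math.log2(b - a)

h_X = h_uniform(-1, 1)   # X uniform on (-1, 1)
h_Y = h_uniform(-2, 2)   # Y = 2X, uniform on (-2, 2)

print(h_X, h_Y)          # 1.0  2.0
print(h_Y - h_X)         # 1.0 : the gain of 2 adds exactly log2(2) = 1 bit
```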
The reference entropy R_1 for X is -\log_2 \Delta x and the reference entropy R_2 for Y is -\log_2 \Delta y. In the limit as \Delta x, \Delta y \to 0,

R_1 = \lim_{\Delta x \to 0} (-\log_2 \Delta x), \qquad R_2 = \lim_{\Delta y \to 0} (-\log_2 \Delta y)

R_1 - R_2 = \lim_{\Delta x, \Delta y \to 0} \log_2 \frac{\Delta y}{\Delta x} = \log_2 \frac{dy}{dx} = \log_2 \frac{d(2x)}{dx} = \log_2 2 = 1 \text{ bit}

i.e. R_1 = R_2 + 1 bit. The reference entropy R_1 of X is higher than the reference entropy R_2 of Y. Hence, if X and Y have equal absolute entropies, their differential entropies must differ by 1 bit.
CHANNEL CAPACITY AND MUTUAL INFORMATION
Let a random variable X be transmitted over a channel. Each value of X in a given continuous range is now a message that may be transmitted, e.g. a pulse of height X. The message recovered by the receiver will be a continuous random variable Y. If the channel were noise-free, the received value Y would uniquely determine the transmitted value X. Consider the event that a value of X in the interval (x, x + \Delta x) has been transmitted (\Delta x \to 0). The amount of information transmitted is \log_2 [1 / (f_X(x)\Delta x)], since the probability of this event is f_X(x)\Delta x. Let the value of Y at the receiver be y, and let f_X(x|y) be the conditional pdf of X given Y.
MUTUAL INFORMATION
Then f_X(x|y)\Delta x is the probability that X lies in the interval (x, x + \Delta x) when Y = y, provided \Delta x \to 0. The uncertainty about the event that X lies in (x, x + \Delta x) is \log_2 [1 / (f_X(x|y)\Delta x)]. This uncertainty arises because of channel noise and therefore represents a loss of information. Because \log_2 [1 / (f_X(x)\Delta x)] is the information transmitted and \log_2 [1 / (f_X(x|y)\Delta x)] is the information lost over the channel, the net information received is the difference between the two:

\log_2 \frac{1}{f_X(x)\Delta x} - \log_2 \frac{1}{f_X(x|y)\Delta x} = \log_2 \frac{f_X(x|y)}{f_X(x)}
Comparing with the discrete case, we can write the mutual information between the random variables X and Y as
I(X;Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y) \log_2 \frac{f_X(x|y)}{f_X(x)}\, dx\, dy

= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y) \log_2 \frac{1}{f_X(x)}\, dx\, dy + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y) \log_2 f_X(x|y)\, dx\, dy

= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_X(x)\, f_Y(y|x) \log_2 \frac{1}{f_X(x)}\, dx\, dy + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y) \log_2 f_X(x|y)\, dx\, dy

= \int_{-\infty}^{\infty} f_X(x) \log_2 \frac{1}{f_X(x)}\, dx \int_{-\infty}^{\infty} f_Y(y|x)\, dy + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y) \log_2 f_X(x|y)\, dx\, dy
Now

\int_{-\infty}^{\infty} f_X(x) \log_2 \frac{1}{f_X(x)}\, dx = h(X) \qquad \text{and} \qquad \int_{-\infty}^{\infty} f_Y(y|x)\, dy = 1

so

I(X;Y) = h(X) + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y) \log_2 f_X(x|y)\, dx\, dy

= h(X) - \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y) \log_2 \frac{1}{f_X(x|y)}\, dx\, dy

The second term on the RHS represents the average over x and y of \log_2 [1 / f_X(x|y)], which is the uncertainty about x when y is received.
It is the loss of information over the channel. The average of \log_2 [1 / f_X(x|y)] is the average loss of information over the channel when some x is transmitted and y is received. By definition this quantity is denoted h(X|Y) and is called the equivocation of X with respect to Y:

h(X|Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y) \log_2 \frac{1}{f_X(x|y)}\, dx\, dy

Hence

I(X;Y) = h(X) - h(X|Y)
CHANNEL CAPACITY

That is, when some value of X is transmitted and some value of Y is received, the average information transmitted over the channel is I(X;Y). Channel capacity C is defined as the maximum amount of information that can be transmitted on the average:

C = \max I(X;Y)
MAXIMUM ENTROPY FOR CONTINUOUS CHANNELS
For discrete random variables the entropy is maximum when all the outcomes are equally likely. For continuous random variables there exists a PDF f_X(x) that maximizes h(X). It is found that the PDF that maximizes h(X) is the Gaussian distribution

f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / 2\sigma^2}

Also, the random variables X and Y considered below must have the same mean \mu and the same variance \sigma^2.
Consider an arbitrary pair of random variables X and Y whose PDFs are denoted by f_X(x) and f_Y(x), where x is a dummy variable. Adapting the fundamental inequality

\sum_{k=1}^{m} p_k \log_2 \frac{q_k}{p_k} \le 0

we may write

\int_{-\infty}^{\infty} f_Y(x) \log_2 \frac{f_X(x)}{f_Y(x)}\, dx \le 0

\int_{-\infty}^{\infty} f_Y(x) \log_2 \frac{1}{f_Y(x)}\, dx + \int_{-\infty}^{\infty} f_Y(x) \log_2 f_X(x)\, dx \le 0
h(Y) \le -\int_{-\infty}^{\infty} f_Y(x) \log_2 f_X(x)\, dx ..........(1)

When the random variable X is Gaussian, its PDF is given by

f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / 2\sigma^2} .............(2)

Substituting (2) in (1),

h(Y) \le -\int_{-\infty}^{\infty} f_Y(x) \log_2 \left[ \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / 2\sigma^2} \right] dx
Converting the logarithm to base e using the relation \log_2(x) = \log_2 e \cdot \ln(x),

h(Y) \le -\log_2 e \int_{-\infty}^{\infty} f_Y(x) \left[ -\frac{(x-\mu)^2}{2\sigma^2} - \ln\sqrt{2\pi\sigma^2} \right] dx

= \log_2 e \left\{ \int_{-\infty}^{\infty} \frac{(x-\mu)^2}{2\sigma^2}\, f_Y(x)\, dx + \ln\sqrt{2\pi\sigma^2} \int_{-\infty}^{\infty} f_Y(x)\, dx \right\}
It is given that the random variables X and Y have the properties (i) mean = \mu and (ii) variance = \sigma^2, so

\int_{-\infty}^{\infty} f_Y(x)\, dx = 1, \qquad \int_{-\infty}^{\infty} (x-\mu)^2 f_Y(x)\, dx = \sigma^2

Therefore

h(Y) \le \log_2 e \left[ \frac{1}{2} + \ln\sqrt{2\pi\sigma^2} \right]

= \frac{1}{2}\log_2 e + \log_2 \sqrt{2\pi\sigma^2}

= \frac{1}{2}\log_2 e + \frac{1}{2}\log_2 (2\pi\sigma^2)

h(Y) \le \frac{1}{2}\log_2 (2\pi e \sigma^2)
The maximum value of h(Y) is therefore

h(Y)_{max} = \frac{1}{2}\log_2 (2\pi e \sigma^2)

For a finite variance \sigma^2, the Gaussian random variable has the largest differential entropy attainable by any random variable, and that entropy is uniquely determined by the variance.
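The Gaussian bound can be illustrated numerically: a uniform density on (-a, a) has variance a^2/3 and differential entropy log2(2a), which always falls below (1/2) log2(2 pi e sigma^2) for the same variance. A minimal sketch:

```python
import math

def h_gaussian(sigma2):
    """Differential entropy (bits) of a Gaussian with variance sigma2."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma2)

# Uniform on (-a, a): variance a^2/3, differential entropy log2(2a).
# For the SAME variance the Gaussian bound must be larger.
a = 1.0
sigma2 = a * a / 3           # variance of the uniform density
h_unif = math.log2(2 * a)    # 1 bit

print(h_gaussian(sigma2))    # about 1.25 bits
print(h_gaussian(sigma2) > h_unif)   # True: Gaussian maximizes h for fixed variance
```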
CHANNEL CAPACITY OF A BAND LIMITED AWGN CHANNEL (SHANNON HARTLEY THEOREM)
The channel capacity C is the maximum rate of information transmission over a channel. The mutual information I(X;Y) is given by I(X;Y) = h(Y) - h(Y|X), and the channel capacity is the maximum value of the mutual information I(X;Y). Let a channel be band-limited to B Hz and disturbed by a white Gaussian noise of PSD \eta/2. Let the signal power be S. The disturbance is assumed to be additive, so the received signal is y(t) = x(t) + n(t). Because the channel is band-limited, both the signal x(t) and the noise n(t) are band-limited to B Hz; y(t) is also band-limited to B Hz.
All these signals are therefore completely specified by samples taken at the uniform rate of 2B samples/second. Now we have to find the maximum information that can be transmitted per sample. Let x, n and y represent samples of x(t), n(t) and y(t). The information transmitted per sample is I(X;Y) = h(Y) - h(Y|X). By definition,

h(Y|X) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y) \log_2 \frac{1}{f_Y(y|x)}\, dx\, dy

= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_X(x)\, f_Y(y|x) \log_2 \frac{1}{f_Y(y|x)}\, dx\, dy
h(Y|X) = \int_{-\infty}^{\infty} f_X(x) \left[ \int_{-\infty}^{\infty} f_Y(y|x) \log_2 \frac{1}{f_Y(y|x)}\, dy \right] dx

For a given x, y is equal to a constant x plus n. Hence the distribution of Y when X has a given value is identical to that of n, except for a translation by x. If f_N represents the PDF of the noise sample n,

f_Y(y|x) = f_N(y - x)

Putting y - x = z,

\int_{-\infty}^{\infty} f_Y(y|x) \log_2 \frac{1}{f_Y(y|x)}\, dy = \int_{-\infty}^{\infty} f_N(z) \log_2 \frac{1}{f_N(z)}\, dz
Hence h(Y|X) = h(z) = h(N), independent of x, and

I(X;Y) = h(Y) - h(N)

The mean square value of x(t) is S and the mean square value of the noise is N, so the mean square value of y is

\overline{y^2} = S + N

The maximum of the output entropy h(Y) is obtained when Y is Gaussian and is given by

h(Y)_{max} = \frac{1}{2}\log_2 2\pi e (S + N), \qquad \sigma^2 = S + N
For a white Gaussian noise with mean square value N = \eta B,

h(N) = \frac{1}{2}\log_2 2\pi e N

The channel capacity per sample is

C_S = \max I(X;Y) = \frac{1}{2}\log_2 2\pi e (S+N) - \frac{1}{2}\log_2 2\pi e N = \frac{1}{2}\log_2 \frac{2\pi e (S+N)}{2\pi e N}
The channel capacity per sample is thus

C_S = \frac{1}{2}\log_2 \frac{S+N}{N} = \frac{1}{2}\log_2 \left( 1 + \frac{S}{N} \right)

There are 2B samples per second, so the channel capacity per second is

C = 2B \cdot \frac{1}{2}\log_2 \left( 1 + \frac{S}{N} \right) = B \log_2 \left( 1 + \frac{S}{N} \right) \text{ bits/second}
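The Shannon-Hartley formula is straightforward to evaluate; the bandwidth and SNR below are illustrative numbers (a 3 kHz telephone-grade channel at 30 dB SNR), not values from the notes:

```python
import math

def channel_capacity(B, S, N):
    """Shannon-Hartley capacity in bits/second for bandwidth B (Hz),
    signal power S and noise power N (same units)."""
    return B * math.log2(1 + S / N)

B = 3000.0                      # 3 kHz bandwidth
snr = 10 ** (30 / 10)           # 30 dB SNR -> power ratio of 1000
C = channel_capacity(B, snr, 1.0)
print(C)                        # roughly 29.9 kbit/s
```

Doubling the bandwidth doubles C, while doubling the SNR adds only B bits/second: capacity is linear in B but only logarithmic in S/N.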
CAPACITY OF A CHANNEL OF INFINITE BANDWIDTH
The Shannon-Hartley theorem indicates that a noiseless Gaussian channel, with S/N = \infty, has infinite capacity, since

C = B \log_2 \left( 1 + \frac{S}{N} \right)

However, when the bandwidth B increases the channel capacity does not become infinite as expected, because with an increase in bandwidth the noise power also increases. Thus, for a fixed signal power and in the presence of white Gaussian noise, the channel capacity approaches an upper limit with increase in bandwidth.
Putting N = \eta B,

C = B \log_2 \left( 1 + \frac{S}{\eta B} \right)

= \frac{S}{\eta} \cdot \frac{\eta B}{S} \log_2 \left( 1 + \frac{S}{\eta B} \right)

= \frac{S}{\eta} \log_2 \left( 1 + \frac{S}{\eta B} \right)^{\eta B / S}

Putting x = S / (\eta B), this expression becomes

C = \frac{S}{\eta} \log_2 (1 + x)^{1/x}

As B \to \infty, x \to 0, so

C_\infty = \frac{S}{\eta} \lim_{x \to 0} \log_2 (1 + x)^{1/x}
C_\infty = \frac{S}{\eta} \log_2 e = 1.44\, \frac{S}{\eta}

Putting N = \eta B, this can also be written C_\infty = 1.44\, (S/N)\, B. This equation indicates that we may trade off bandwidth for signal-to-noise ratio and vice versa: for a given C, if S/N is reduced we have to increase the bandwidth, and if the bandwidth is to be reduced we have to increase S/N.
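The saturation of C at 1.44 S/\eta as the bandwidth grows can be seen numerically (unit signal power and noise density are illustrative choices):

```python
import math

def capacity(B, S, eta):
    """C = B log2(1 + S / (eta * B)), with noise power N = eta * B."""
    return B * math.log2(1 + S / (eta * B))

S, eta = 1.0, 1.0
limit = (S / eta) * math.log2(math.e)   # C_inf = 1.44 S/eta

# Capacity grows with B but saturates at the limit
for B in (1, 10, 100, 10000):
    print(B, capacity(B, S, eta))

print(abs(capacity(1e9, S, eta) - limit) < 1e-3)   # True: approaches 1.44 S/eta
```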
ORTHOGONAL SET OF FUNCTIONS
Consider a set of functions g_1(x), g_2(x), ..., g_n(x) defined over the interval x_1 \le x \le x_2 and related to one another as

\int_{x_1}^{x_2} g_i(x)\, g_j(x)\, dx = 0, \qquad i \ne j

If we multiply two of the functions and integrate over the interval x_1 to x_2, the result is zero except when the functions are the same. A set of functions which has this property is described as being orthogonal over the interval from x_1 to x_2. The functions can be compared to vectors v_i and v_j whose dot product is given by |v_i||v_j| \cos\theta.
The vectors v_i and v_j are perpendicular when \theta = 90 degrees, i.e. v_i \cdot v_j = 0; the vectors are then said to be orthogonal. Correspondingly, functions whose integrated product is zero are also orthogonal to one another. Suppose we have an arbitrary function f(x) and we are interested in f(x) only in the range from x_1 to x_2, i.e. over the interval in which the set of functions g_n(x) is orthogonal. We can expand f(x) as a linear sum of the functions g_n(x):

f(x) = c_1 g_1(x) + c_2 g_2(x) + \cdots + c_n g_n(x) + \cdots ...............(2)

where the c's are numerical constants. The orthogonality of the g's makes it easy to compute the coefficients c_n. To evaluate c_n we multiply both sides of eq (2) by g_n(x) and integrate over the interval of orthogonality.
\int_{x_1}^{x_2} f(x)\, g_n(x)\, dx = c_1 \int_{x_1}^{x_2} g_1(x)\, g_n(x)\, dx + c_2 \int_{x_1}^{x_2} g_2(x)\, g_n(x)\, dx + \cdots + c_n \int_{x_1}^{x_2} g_n(x)\, g_n(x)\, dx + \cdots

Because of orthogonality, all of the terms on the right-hand side become zero with a single exception:

\int_{x_1}^{x_2} f(x)\, g_n(x)\, dx = c_n \int_{x_1}^{x_2} g_n^2(x)\, dx

c_n = \frac{\int_{x_1}^{x_2} f(x)\, g_n(x)\, dx}{\int_{x_1}^{x_2} g_n^2(x)\, dx}
If the functions are selected such that

\int_{x_1}^{x_2} g_n^2(x)\, dx = 1

they are said to be normalised, and we have

c_n = \int_{x_1}^{x_2} f(x)\, g_n(x)\, dx ..........(3)

The use of normalised functions has the advantage that the c_n's can be calculated from eq (3) without having to evaluate the integral \int_{x_1}^{x_2} g_n^2(x)\, dx. A set of functions which are both orthogonal and normalised is called an orthonormal set.
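Equation (3) can be demonstrated with a concrete orthonormal set. The sketch below uses g_n(x) = sqrt(2) sin(n pi x) on (0, 1) (a standard orthonormal family, chosen here for illustration; the notes do not specify one) and recovers the expansion coefficients of a known combination by numerical integration:

```python
import math

def integrate(f, a, b, steps=20000):
    """Simple midpoint-rule numerical integration (illustrative accuracy)."""
    h = (b - a) / steps
    return sum(f(a + (k + 0.5) * h) for k in range(steps)) * h

# Orthonormal set on (0, 1): g_n(x) = sqrt(2) sin(n pi x)
def g(n):
    return lambda x: math.sqrt(2) * math.sin(n * math.pi * x)

# Build f(x) = 3 g_1(x) - 0.5 g_2(x), then recover the coefficients via eq (3)
f = lambda x: 3 * g(1)(x) - 0.5 * g(2)(x)

c1 = integrate(lambda x: f(x) * g(1)(x), 0, 1)
c2 = integrate(lambda x: f(x) * g(2)(x), 0, 1)
c3 = integrate(lambda x: f(x) * g(3)(x), 0, 1)

print(c1, c2, c3)   # approximately 3, -0.5, 0
```

The projection onto g_3 comes out as (numerically) zero because f contains no g_3 component, exactly the mechanism used by the correlator bank in the next section.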
MATCHED FILTER RECEPTION OF M-ARY FSK
Let a message source generate M messages, each with equal likelihood. Let each message be represented by one of the orthogonal set of signals s_1(t), s_2(t), ..., s_M(t); the message interval is T. The signals are transmitted over a communication channel where they are corrupted by additive white Gaussian noise (AWGN). At the receiver, a determination of which message has been transmitted is made through the use of M matched filters or correlators. Each correlator consists of a multiplier followed by an integrator; the local inputs to the multipliers are the signals s_i(t). Suppose that in the absence of noise the signal s_i(t) is transmitted and the output of each integrator is sampled at the end of a message interval.
[Figure: matched filter receiver for M-ary FSK — a source of M messages feeds the AWGN channel; the received wave drives a bank of M correlators, each multiplying by s_i(t) and integrating over (0, T) to produce the outputs e_1, e_2, ..., e_M.]
Then, because of the orthogonality condition, all the integrators will have zero output except the ith integrator, whose output will be

\int_0^T s_i^2(t)\, dt

It is adjusted to produce an output of E_s, the symbol energy. In the presence of an AWGN waveform n(t), the output of the lth correlator will be

e_l = \int_0^T [s_i(t) + n(t)]\, s_l(t)\, dt = \int_0^T s_i(t)\, s_l(t)\, dt + \int_0^T n(t)\, s_l(t)\, dt

The quantity

n_l \equiv \int_0^T n(t)\, s_l(t)\, dt

is a Gaussian random variable which has a mean value of zero and a mean square value given by \sigma^2 = \eta E_s / 2. The correlator corresponding to the transmitted message will have output

e_i = \int_0^T [s_i(t) + n(t)]\, s_i(t)\, dt

To determine which message has been transmitted we compare the matched filter outputs e_1, e_2, ..., e_M.
e_i = \int_0^T [s_i(t) + n(t)]\, s_i(t)\, dt = \int_0^T s_i^2(t)\, dt + \int_0^T n(t)\, s_i(t)\, dt = E_s + n_i

We decide that s_i(t) has been transmitted if the corresponding output e_i is larger than the output of every other filter. The probability that some arbitrarily selected output e_l is less than e_i is

p(e_l < e_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{e_i} e^{-e_l^2 / 2\sigma^2}\, de_l ................(1)
The probability that e_1 and e_2 are both smaller than e_i is

p(e_1 < e_i \text{ and } e_2 < e_i) = p(e_1 < e_i)\, p(e_2 < e_i) = [p(e_l < e_i)]^2

The probability p_L that e_i is the largest of the outputs is

p_L = p(e_i > e_1, e_2, e_3, \ldots, e_M) = [p(e_l < e_i)]^{M-1} = \left[ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{e_i} e^{-e_l^2 / 2\sigma^2}\, de_l \right]^{M-1}
Substituting e_i = E_s + n_i,

p_L = \left[ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{E_s + n_i} e^{-e_l^2 / 2\sigma^2}\, de_l \right]^{M-1}

Let x = \frac{e_l}{\sqrt{2}\,\sigma}, so that de_l = \sqrt{2}\,\sigma\, dx; when e_l = -\infty, x = -\infty, and when e_l = E_s + n_i, x = \frac{E_s + n_i}{\sqrt{2}\,\sigma}. Then

p_L = \left[ \frac{1}{\sqrt{\pi}} \int_{-\infty}^{(E_s + n_i)/\sqrt{2}\sigma} e^{-x^2}\, dx \right]^{M-1}
Since \sigma^2 = \eta E_s / 2, we have \sqrt{2}\,\sigma = \sqrt{\eta E_s} and E_s / (\sqrt{2}\,\sigma) = \sqrt{E_s / \eta}, so

p_L = \left[ \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\sqrt{E_s/\eta}\, +\, n_i/\sqrt{2}\sigma} e^{-x^2}\, dx \right]^{M-1} ..........(4)

that is,

p_L = p_L\!\left( \frac{E_s}{\eta},\, M,\, \frac{n_i}{\sqrt{2}\,\sigma} \right)

p_L depends on the two deterministic parameters E_s/\eta and M, and on the single random variable n_i / (\sqrt{2}\,\sigma). To find the probability that e_i is the largest output without reference to the noise output n_i of the ith correlator, we need to average p_L over all possible values of n_i.
This average is the probability that we shall be correct in deciding that the transmitted signal corresponds to the correlator which yields the largest output. Let this probability be p_C; the probability of an error is then p_E = 1 - p_C. Since n_i is a Gaussian random variable with zero mean and variance \sigma^2, the average value of p_L over all possible values of n_i is given by
p_C = \int_{-\infty}^{\infty} p_L\!\left( \frac{E_s}{\eta},\, M,\, \frac{n_i}{\sqrt{2}\,\sigma} \right) \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-n_i^2 / 2\sigma^2}\, dn_i

Putting y = \frac{n_i}{\sqrt{2}\,\sigma} and using eq (4),

p_C = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-y^2} \left( \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\sqrt{E_s/\eta}\, +\, y} e^{-x^2}\, dx \right)^{M-1} dy

and p_e = 1 - p_C.
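The double integral for p_C can be evaluated numerically. In the sketch below the inner integral is written via the error function, (1/sqrt(pi)) * integral of e^{-x^2} up to u equals (1 + erf(u))/2, and the outer integral is approximated by a midpoint rule over a finite span (the step count and span are illustrative accuracy choices):

```python
import math

def p_correct(Es_over_eta, M, steps=4000, span=8.0):
    """Numerically evaluate p_C = (1/sqrt(pi)) * integral of
    e^{-y^2} * Q(y)^(M-1) dy, where Q(y) = (1 + erf(sqrt(Es/eta) + y)) / 2."""
    a = math.sqrt(Es_over_eta)
    h = 2 * span / steps
    total = 0.0
    for k in range(steps):
        y = -span + (k + 0.5) * h          # midpoint of the k-th subinterval
        inner = (1 + math.erf(a + y)) / 2  # inner integral in closed form
        total += math.exp(-y * y) * inner ** (M - 1)
    return total / math.sqrt(math.pi) * h

# Sanity checks: with M = 2 and zero signal energy the receiver is guessing
# between two equally likely messages, so p_e = 1 - p_C = 0.5; raising the
# symbol energy drives the error probability down.
print(1 - p_correct(0.0, 2))     # ~0.5
print(1 - p_correct(4.0, 2))     # small: more energy, fewer errors
```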
EFFICIENCY OF ORTHOGONAL SIGNAL TRANSMISSION
The above equation has been evaluated by numerical integration on a computer.

[Figure: p_e plotted against S_i/(R\eta) for M = 2, 4, 16, 1024, 2048 and the limiting curve M \to \infty, with p_e on a logarithmic axis from 1 down to 10^{-5} and the abscissa running from about 0.7 to 20; the M \to \infty curve falls abruptly at S_i/(R\eta) = \ln 2 \approx 0.69. Here R = (\log_2 M)/T.]
p_e = f\!\left( M,\, \frac{E_s}{\eta} \right)

The abscissa of the plot is

\frac{E_s}{\eta \log_2 M} = \frac{S_i T}{\eta \log_2 M} = \frac{S_i}{R\eta}

where we have put E_s = S_i T (with S_i the signal power) and R = \frac{\log_2 M}{T}.
Efficiency of Orthogonal Signal Transmission :Observations About the Graph
For all M, p_e decreases as S_i/(R\eta) increases; as S_i/(R\eta) \to \infty, p_e \to 0.
For M \to \infty, p_e = 0 provided S_i/(R\eta) \ge \ln 2, and p_e = 1 otherwise.
For fixed M and R, p_e decreases as the noise density \eta decreases.
For fixed M and \eta, p_e decreases as the signal power goes up.
For fixed S_i, \eta and M, p_e decreases as we allow more time T for the transmission of a single message, i.e. as the rate R is decreased.
For fixed S_i, \eta and T, p_e decreases as M decreases.
For a fixed value of S_i/(R\eta), as M, the number of messages, increases, the error probability reduces.
As M \to \infty, the error probability p_e \to 0 provided S_i/(R\eta) \ge \ln 2. At the same time the bandwidth B = 2Mf_s \to \infty. The maximum allowable errorless transmission rate R_{max} is the channel capacity:

R_{max} = \frac{S_i}{\eta \ln 2} = 1.44\, \frac{S_i}{\eta}

The maximum rate obtained for this M-ary FSK is the same as that obtained by Shannon's theorem.
As M increases, the bandwidth increases and the number of matched filters increases, and so does the circuit complexity. Errorless transmission is really possible, as predicted by Shannon's theorem, provided R \le R_{max} = 1.44\, S_i/\eta as M \to \infty. If we are required to transmit information at the same fixed rate R in the presence of a fixed noise power spectral density, with fixed error probability and fixed M, we have to control the signal power S_i.