Question (from exercises 2): Are the following sources likely to be stationary and ergodic?
(i) Binary source, typical sequence aaaabaabbabbbababbbabbbbbabbbbaabbbbbba......
(ii) Quaternary source (4 symbols), typical sequences abbabbabbababbbaabbabbbab… and cdccdcdccddcccdccdccddcccdc…
(iii) Ternary source (3 symbols), typical sequence AABACACBACCBACBABAABCACBAA…
(iv) Quaternary source, typical sequence 124124134124134124124 …
Definitions
• A source is stationary if its symbol probabilities do not change with time, e.g.
  – binary source: Pr(0) = Pr(1) = 0.5, with the probabilities assumed the same at all times
• A source is ergodic if it is stationary and
  (a) no proper subset of it is stationary, i.e. the source does not get locked into a subset of symbols or states
  (b) it is not periodic, i.e. the states do not occur in a regular pattern
  E.g. the output s1 s2 s3 s1 s4 s3 s1 s4 s5 s1 s2 s5 s1 s4 s3 … is periodic because s1 occurs every 3 symbols.
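One informal way to probe these definitions numerically is to estimate symbol frequencies in separate windows of a long output sequence: for a stationary, ergodic source the estimates should agree up to sampling noise, while a source that gets locked into a subset of symbols gives very different estimates per window. A minimal sketch (the function name `window_freqs` is made up here, and the input below is a synthetic string mimicking the two typical sequences of source (ii)):

```python
from collections import Counter

def window_freqs(seq, n_windows=2):
    """Estimate symbol frequencies in equal-sized windows of the sequence."""
    size = len(seq) // n_windows
    windows = [seq[i * size:(i + 1) * size] for i in range(n_windows)]
    return [{s: c / size for s, c in Counter(w).items()} for w in windows]

# Synthetic data echoing source (ii): the two halves use disjoint symbol
# subsets, the signature of a source locked into a subset of its states.
freqs = window_freqs("abbabbabbab" + "cdccdcdccdd", n_windows=2)
print(freqs[0])  # frequencies over a, b only
print(freqs[1])  # frequencies over c, d only
```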
Review
[Diagram: source → encode/transmit → channel → receive/decode → destination; the ideal message/signal path is corrupted by NOISE in the actual channel.]
• measure of information: entropy
• conditional entropy, mutual information
• entropy per symbol (or per second), entropy of a Markov source
• redundancy
• information capacity (ergodic source)
• (i) remove redundancy to maximise information transfer; (ii) use “redundancy” to correct transmission errors
Shannon Source Coding Theorem
N identical, independently distributed random variables, each with entropy H(X), can be compressed into slightly more than N H(X) bits with virtually no risk of information loss; conversely, if they are compressed into fewer than N H(X) bits, it is virtually certain that information will be lost.
Optimal Coding
• Requirements for a code:
  – efficiency
  – uniquely decodable
  – immunity to noise
  – instantaneous
[Diagram: source → encode/transmit → noise-free channel → receive/decode → destination, with alphabets A, X, B.]
• A = output of source = input to encoder, alphabet {a1, a2, … am}
• X = output of transmitter = input to receiver, alphabet {b1, b2, … bn}
• B = output of decoder = input to destination, alphabet {a1, a2, … am}
Definitions
• Coding: conversion of source symbols into a different alphabet for transmission over a channel. Input to encoder = source alphabet = {a1, …, am}; encoder output alphabet = {b1, …, bn}. Coding is necessary if n < m.
• Code word: group of output symbols corresponding to an input symbol (or group of input symbols).
• Code: set (table) of all input symbols (or input words) and the corresponding code words.
• Word length: number of output symbols in a code word.
• Average word length (AWL): L = Σ_{i=1..m} P(a_i) N_i, where N_i = length of the word for symbol a_i.
• Optimal code: has minimum average word length for a given source.
• Efficiency: η = H / (L log2 n), where H is the entropy per symbol of the source.
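The AWL and efficiency definitions can be checked numerically. A small sketch (function names are my own; the probabilities and lengths correspond to the code a → 0, b → 10, c → 110, d → 111 used later in these notes):

```python
import math

def average_word_length(probs, lengths):
    # L = sum_i P(a_i) * N_i
    return sum(p * n for p, n in zip(probs, lengths))

def entropy(probs):
    # H = sum_i p_i * log2(1/p_i), entropy per symbol
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def efficiency(probs, lengths, n=2):
    # eta = H / (L * log2 n) for an n-symbol code alphabet
    return entropy(probs) / (average_word_length(probs, lengths) * math.log2(n))

probs = [1/2, 1/4, 1/8, 1/8]
lengths = [1, 2, 3, 3]
print(average_word_length(probs, lengths))  # 1.75
print(entropy(probs))                       # 1.75
print(efficiency(probs, lengths))           # 1.0
```

Here L equals H, so the code achieves efficiency 1: each word length matches the information content log2(1/p_i) of its symbol.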
Binary encoding
• A (binary) symbol code f is a mapping f : A → {0,1}+ or (abusing notation) f : A+ → {0,1}+, where {0,1}+ = {0, 1, 00, 01, 10, 11, 000, 001, …}
• f is uniquely decodable if it has an inverse, i.e. ∀x, y ∈ A+, x ≠ y ⇒ f(x) ≠ f(y)
• compression is achieved (on average) by assigning
  – shorter encodings to the more probable symbols in A
  – longer encodings to the less probable symbols
• easy to decode if we can identify the end of a codeword as soon as it arrives (instantaneous)
  – no codeword can be a prefix of another codeword
  – e.g. 1 and 10 are prefixes of 101
Prefix codes• no codeword is a prefix of any other codeword.
– also known as an instantaneous or self-punctuating code,
– an encoded string can be decoded from left to right without looking ahead to subsequent codewords
– a prefix code is uniquely decodable (but not all uniquely decodable codes are prefix codes)
– can be written as a tree, leaves = codewords
Example codes (one per column on the original slide):
  • {a → 1, b → 10, c → 100, d → 1000}
  • {a → 0, b → 10, c → 110, d → 111}
  • {a → 00, b → 01, c → 10, d → 11}
  • {a → 0, b → 01, c → 011, d → 111}
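The prefix property of the four example codes can be checked mechanically. A sketch (labels A–D are mine, not from the slide; after sorting, a codeword that is a prefix of any other is necessarily a prefix of its lexicographic successor, so checking adjacent pairs suffices):

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of another."""
    words = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

codes = {
    "code A": ["1", "10", "100", "1000"],
    "code B": ["0", "10", "110", "111"],
    "code C": ["00", "01", "10", "11"],
    "code D": ["0", "01", "011", "111"],
}
for name, words in codes.items():
    print(name, is_prefix_code(words))
# code A: False (1 is a prefix of 10), B: True, C: True, D: False (0 is a prefix of 01)
```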
Limits on prefix codes
• the maximum number of codewords of length l is 2^l
• if we shorten one codeword, we must lengthen others to retain unique decodability
• for any uniquely decodable binary code, the codeword lengths l_i satisfy Σ_i 2^(−l_i) ≤ 1 (Kraft inequality)
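The Kraft sum is a one-liner to evaluate; a sketch (the lengths {1, 2, 3, 3} are those of the prefix code a → 0, b → 10, c → 110, d → 111, while {1, 2, 2, 3} is a hypothetical set chosen to violate the inequality):

```python
def kraft_sum(lengths):
    # sum_i 2^(-l_i); must be <= 1 for any uniquely decodable binary code
    return sum(2 ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0   -> a prefix code with these lengths exists
print(kraft_sum([1, 2, 2, 3]))  # 1.125 -> no uniquely decodable code possible
```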
[Figure: binary tree of all codewords up to length 4 (0, 1, 00, 01, …, 1111), illustrating the 2^l limit on the number of codewords of length l.]
Coding example
• all source digits equally probable
• source entropy = log2 10 = 3.32 bits/sym

  source  code 1  code 2  code 3
  0       0       0000    000
  1       1       0001    001
  2       10      0010    0110
  3       11      0011    0111
  4       100     0100    0100
  5       101     0101    0101
  6       110     0110    100
  7       111     0111    101
  8       1000    1000    110
  9       1001    1001    111

  average word length: 2.6, 4, 3.4

                        code 1       code 2  code 3
  length                variable     fixed   variable
  Kraft sum Σ 2^(−l_i)  2.125 (> 1)  0.625   1.0
  uniquely decodable    no           yes     yes
  instantaneous/prefix  no           yes     yes
  efficiency H/L        —            0.83    0.98
Prefix codes (reminder)
• variable length
• uniquely decodable
• instantaneous
• can be represented as a tree
• no code word is a prefix of another
– e.g. if ABAACA is a code word then A, AB, ABA, ABAA, ABAAC cannot be used as code words
• Kraft inequality: Σ_i 2^(−l_i) ≤ 1
Optimal prefix codes
• if Pr(a1) ≥ Pr(a2) ≥ … ≥ Pr(am), then l1 ≤ l2 ≤ … ≤ lm, where li = length of the word for symbol ai
• at least 2 (up to n) least probable input symbols will have the same prefix and only differ in the last output symbol
• every possible sequence up to lm-1 output symbols must be a code word or have one of its prefixes used as a code word (lm is the longest word length)
• for a binary code, the optimal word length for a symbol is equal to the information content i.e.
li = log2(1/pi)
Converse
• conversely, any set of word lengths {li} implicitly defines a set of symbol probabilities {qi} for which the word lengths {li} are optimal
  q_i = 2^(−l_i) / z, where z = Σ_j 2^(−l_j)
• e.g. the code a → 0, b → 10, c → 110, d → 111 is optimal for the probabilities 1/2, 1/4, 1/8, 1/8
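The implicit probabilities are easy to compute from a set of word lengths; a small sketch (function name is mine), reproducing the 1/2, 1/4, 1/8, 1/8 example:

```python
def implicit_probs(lengths):
    # q_i = 2^(-l_i) / z, with z = sum_j 2^(-l_j) normalising the sum to 1
    z = sum(2 ** -l for l in lengths)
    return [2 ** -l / z for l in lengths]

# lengths of the code a -> 0, b -> 10, c -> 110, d -> 111
print(implicit_probs([1, 2, 3, 3]))  # [0.5, 0.25, 0.125, 0.125]
```

Here z = 1 already (the Kraft inequality is tight), so the implicit probabilities are just 2^(−l_i).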
Compression - How close can we get to the entropy?
• We can always find a binary prefix code with average word length L satisfying H(A) ≤ L < H(A) + 1
• Let ⌈x⌉ be the smallest integer that is ≥ x; clearly ⌈x⌉ < x + 1
• Now consider the code with word lengths l_i = ⌈log2(1/p_i)⌉
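The choice l_i = ⌈log2(1/p_i)⌉ can be verified numerically: the lengths satisfy the Kraft inequality and give H(A) ≤ L < H(A) + 1. A sketch (function name is mine; the probabilities are those of the four-symbol source s1…s4 used in the Huffman example in these notes):

```python
import math

def shannon_lengths(probs):
    # l_i = ceil(log2(1/p_i)), the word lengths used in the argument above
    return [math.ceil(math.log2(1 / p)) for p in probs]

probs = [0.1, 0.25, 0.2, 0.45]
ls = shannon_lengths(probs)                   # [4, 2, 3, 2]
L = sum(p * l for p, l in zip(probs, ls))     # average word length
H = sum(p * math.log2(1 / p) for p in probs)  # source entropy
print(ls, round(L, 2), round(H, 2))
```

Note these lengths are achievable but not necessarily optimal; Huffman coding (next) does better here.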
Huffman prefix code
• used for image compression
• general approach:
  – work out necessary conditions for a code to be optimal
  – use these to construct the code
• from condition (3) of prefix codes (earlier slide), the two least probable symbols share a prefix and differ only in the last output symbol:
  a_m   → x x … x 0 (least probable)
  a_m−1 → x x … x 1 (next least probable)
  therefore assign the final digit first
• e.g. consider the source on the right
Symbol Probability
s1 0.1
s2 0.25
s3 0.2
s4 0.45
Algorithm
1. Lay out all symbols in a line, one node per symbol
2. Merge the two least probable symbols into a single node
3. Add their probabilities and assign this to the merged node
4. Repeat until only one node remains
5. Assign binary code from last node, assigning 0 for the lower probability link at each step
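The five steps above can be sketched with a priority queue. This is a minimal illustration, not code from the notes: a heap keeps the two least probable nodes at the front, and prepending a bit to every leaf under a child implements "assign the final digit first", with 0 going to the lower-probability branch:

```python
import heapq
import itertools

def huffman_code(probs):
    """Build a binary Huffman code; returns {symbol: codeword}."""
    counter = itertools.count()  # tie-breaker so equal probabilities compare safely
    heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)  # least probable node -> bit 0
        p1, _, code1 = heapq.heappop(heap)  # next least probable -> bit 1
        merged = {s: "0" + w for s, w in code0.items()}
        merged.update({s: "1" + w for s, w in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

# the source from the example: merges 0.1+0.2 -> 0.3, 0.25+0.3 -> 0.55, 0.45+0.55 -> 1
code = huffman_code({"s1": 0.1, "s2": 0.25, "s3": 0.2, "s4": 0.45})
print(code)  # s4 -> 0, s2 -> 10, s1 -> 110, s3 -> 111
```

The resulting average word length is 0.45·1 + 0.25·2 + 0.2·3 + 0.1·3 = 1.85 bits/symbol.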
Example
[Figure: the four symbols laid out as nodes with Pr(s1) = 0.1, Pr(s2) = 0.25, Pr(s3) = 0.2, Pr(s4) = 0.45; the two least probable, s1 (0.1) and s3 (0.2), are merged into a node with probability 0.3.]
Example - contd.
[Figure: the 0.3 node and s2 (0.25) merge into a node with probability 0.55; then 0.55 and s4 (0.45) merge into the root node with probability 1.]
Example - step 5
[Figure: working back from the root and assigning 0 to the lower-probability link at each node gives the code s4 → 0, s2 → 10, s1 → 110, s3 → 111.]
Comments
• we can choose a different ordering of 0 and 1 at each node
  – 2^m different codes (m = number of merging nodes, i.e. not symbol nodes)
  – 2^3 = 8 in the previous example
• but the AWL is the same for all of these codes
  – hence the source entropy and efficiency are the same
• what if n (the number of symbols in the code alphabet) is larger than 2?
  – condition (2) says we can group from 2 to n symbols
  – condition (3) effectively says we should use groups as large as possible and end with one composite symbol at the end
Disadvantages of Huffman Code
• we have assumed that the probabilities of our source symbols are known and fixed
  – symbol frequencies may vary with context (e.g. Markov source)
• up to 1 extra bit per symbol is needed
  – could be serious if H(A) ≈ 1 bit!
  – e.g. English: entropy is approx. 1 bit per character
• beyond symbol codes: arithmetic coding
  – move away from the idea that one symbol maps to an integer number of bits
  – e.g. Lempel-Ziv coding
  – not covered in this course
Another question
• consider a message (sequence of characters) from {a, b, c, d} encoded using the code shown
• what is the probability that a randomly chosen bit from the encoded message is 1?
  symbol  probability  codeword
  a       1/2          0
  b       1/4          10
  c       1/8          110
  d       1/8          111
Shannon-Fano theorem
• Channel capacity
  – the entropy (bits/sec) of the encoder is determined by the entropy of the source (bits/sym)
  – if we increase the rate at which the source generates information (bits/sym), we eventually reach the limit of the encoder (bits/sec); at this point the encoder's entropy has reached a limit
  – this limit is the channel capacity
• S-F theorem
  – source has entropy H bits/symbol
  – channel has capacity C bits/sec
  – it is possible to encode the source so that its symbols can be transmitted at up to C/H symbols per second, but no faster
  – (general proof in notes)
[Diagram: source → encode/transmit → channel → receive/decode → destination.]
(Equations for the compression bound above, with l_i = ⌈log2(1/p_i)⌉:)
  Σ_i 2^(−l_i) = Σ_i 2^(−⌈log2(1/p_i)⌉) ≤ Σ_i 2^(−log2(1/p_i)) = Σ_i p_i = 1, so the lengths satisfy the Kraft inequality;
  average word length L = Σ_i p_i ⌈log2(1/p_i)⌉ < Σ_i p_i (log2(1/p_i) + 1) = H(A) + 1.