On the Deletion and Insertion Channels

On the Deletion and Insertion Channels Xudong Ma Ph.D. Candidate Multimedia Communications Lab Electrical and Computer Engineering University of Waterloo March 9, 2005 Multimedia Communications Laboratory Seminar – p. 1/2

Transcript of On the Deletion and Insertion Channels

Page 1: On the Deletion and Insertion Channels

On the Deletion and Insertion Channels

Xudong Ma

Ph.D. Candidate

Multimedia Communications Lab

Electrical and Computer Engineering

University of Waterloo

March 9, 2005

Multimedia Communications Laboratory Seminar – p. 1/21

Page 2: On the Deletion and Insertion Channels

Examples

Serial line with unknown, varying clock speed

hard disk: rotation speed uncertainty

DAT tape

DNA


Page 3: On the Deletion and Insertion Channels

Outline

Diggavi-Grossglauser bound

Drinea-Mitzenmacher bound

Monte Carlo result by Kavcic and Motwani (ISIT 2004)

Single-deletion-correcting codes: Varshamov-Tenengolts codes

Mitzenmacher's concatenated coding scheme

MacKay's watermark-based coding scheme


Page 4: On the Deletion and Insertion Channels

Diggavi-Grossglauser Bound

Given a stationary and ergodic deletion channel with long-term deletion probability $p_d = 1 - \theta$ (with $p_d < 1 - 1/K$) and input alphabet size $K$, the capacity of this channel is lower bounded as

\[ C \ge \log\left(\frac{K}{K-1}\right) + \theta \log(K-1) - H_0(\theta) \tag{1} \]

Proof sketch:

generate a random codebook of $2^{nR}$ i.i.d. codewords

collision errors

atypical error events occur with exponentially small probability
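To get a feel for bound (1), it can be evaluated numerically. The sketch below assumes base-2 logarithms (rates in bits per channel use) and reads $H_0$ as the binary entropy function; the function names are illustrative and do not come from the talk.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H_0(p) in bits, with H_0(0) = H_0(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def dg_lower_bound(p_d: float, K: int = 2) -> float:
    """Right-hand side of bound (1), assuming base-2 logarithms."""
    if K < 2:
        raise ValueError("alphabet size K must be at least 2")
    if not 0.0 <= p_d < 1.0 - 1.0 / K:
        raise ValueError("bound (1) is stated for 0 <= p_d < 1 - 1/K")
    theta = 1.0 - p_d
    return (math.log2(K / (K - 1))
            + theta * math.log2(K - 1)
            - binary_entropy(theta))


if __name__ == "__main__":
    # Tabulate the bound for a binary channel at a few deletion rates.
    for p_d in (0.0, 0.05, 0.1, 0.2, 0.4):
        print(p_d, dg_lower_bound(p_d, K=2))
```

For $K = 2$ the expression collapses to $1 - H_0(p_d)$, which is a quick sanity check on the implementation.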


Page 5: On the Deletion and Insertion Channels

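Diggavi-Grossglauser Bound

Assume the received sequence $y$ has length $m = (\theta - \epsilon)(n - 1)$

Pairwise error probability: the number of length-$n$ sequences containing $y$ as a subsequence is

\[ F(n, |y|, K) = \sum_{j=|y|}^{n} \binom{n}{j} (K-1)^{n-j} \tag{2} \]

\[ P_2 = \frac{F(n, m, K)}{K^n} \le \frac{n}{K^n} \binom{n}{m} (K-1)^{n-m} \le \frac{n}{K^n} 2^{nH(m/n)} (K-1)^{n-m} \tag{3} \]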
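The counting formula (2) depends only on the length of $y$, not on its symbols, which is easy to check by exhaustive enumeration for small parameters. The following is a minimal sketch with illustrative function names; the brute-force routine is only practical for tiny $n$ and $K$.

```python
from itertools import product
from math import comb


def count_supersequences(n: int, m: int, K: int) -> int:
    """F(n, m, K) from equation (2)."""
    return sum(comb(n, j) * (K - 1) ** (n - j) for j in range(m, n + 1))


def is_subsequence(y, x) -> bool:
    """True if y can be obtained from x by deleting symbols."""
    remaining = iter(x)
    # `symbol in remaining` advances the iterator past the first match.
    return all(symbol in remaining for symbol in y)


def brute_force_count(y, n: int, K: int) -> int:
    """Count length-n words over {0, ..., K-1} that contain y."""
    return sum(is_subsequence(y, x) for x in product(range(K), repeat=n))


if __name__ == "__main__":
    # Two different y of the same length give the same count.
    print(count_supersequences(5, 2, 2))
    print(brute_force_count((0, 1), 5, 2), brute_force_count((1, 1), 5, 2))
```

Dividing the count by $K^n$ gives the pairwise error probability $P_2$ for a uniformly drawn competing codeword.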


Page 6: On the Deletion and Insertion Channels

Diggavi-Grossglauser Bound

Apply the union bound

\[ P_e \le 2^{nR} \frac{n}{K^n} 2^{nH(m/n)} (K-1)^{n-m} \le n \left[ 2^R 2^{H(m/n)} \frac{K-1}{K} \frac{1}{(K-1)^{m/n}} \right]^n \tag{4} \]

The error probability goes to zero asymptotically if

\[ 2^R 2^{H(m/n)} \frac{K-1}{K} \frac{1}{(K-1)^{m/n}} < 1 \tag{5} \]

Further improvement: Markov-chain-generated codebook
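Condition (5) turns into an explicit rate constraint once logarithms are taken. The short derivation below assumes base-2 logarithms and that $H$ in (3)-(5) is the same binary entropy function written $H_0$ in (1); both are assumptions about the slides' notation.

```latex
\begin{align*}
% Take \log_2 of both sides of (5):
R + H(m/n) + \log\frac{K-1}{K} - \frac{m}{n}\log(K-1) &< 0 \\
% Rearrange for R:
R &< \log\frac{K}{K-1} + \frac{m}{n}\log(K-1) - H(m/n) \\
% With (\theta-\epsilon)(n-1) received symbols, m/n \to \theta-\epsilon;
% letting \epsilon \to 0 recovers the right-hand side of (1):
R &< \log\frac{K}{K-1} + \theta\log(K-1) - H(\theta)
\end{align*}
```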


Page 7: On the Deletion and Insertion Channels

Drinea-Mitzenmacher Bound

a binary codeword consists of alternating blocks of zeros and ones

the block lengths are i.i.d. with distribution $P$

let $X$ denote the transmitted sequence and $Y$ the received sequence

with each block of $Y$, associate a type $t = (z, s_1, r_1, \dots, s_i, r_i)$ depending on the blocks in $X$

the probability that a block has type $t$ is

\[ \Pr[T = t] = \frac{P_z (1 - d^z)}{1 - x} \left( \prod_{l=1}^{i} P_{s_l} d^{s_l} P_{r_l} \right) (1 - x) \tag{6} \]

where $x = \sum_j P_j d^j$
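One way to sanity-check equation (6) is to confirm that it defines a probability distribution over types. The sketch below assumes a block-length distribution with finite support, stored as a dictionary from length to probability; that representation and the function names are illustrative choices, not part of the talk. Summing (6) over all types with at most `max_pairs` pairs $(s_l, r_l)$ should give $1 - x^{\text{max\_pairs}+1}$, which tends to 1.

```python
from itertools import product


def deletion_mass(P: dict, d: float) -> float:
    """x = sum_j P_j d^j, the probability that a whole block is deleted."""
    return sum(p * d ** j for j, p in P.items())


def type_probability(t: tuple, P: dict, d: float) -> float:
    """Pr[T = t] from equation (6) for t = (z, s_1, r_1, ..., s_i, r_i)."""
    z, rest = t[0], t[1:]
    if len(rest) % 2 != 0:
        raise ValueError("a type needs one z followed by (s, r) pairs")
    x = deletion_mass(P, d)
    prob = P.get(z, 0.0) * (1.0 - d ** z) / (1.0 - x)
    for s, r in zip(rest[0::2], rest[1::2]):
        prob *= P.get(s, 0.0) * d ** s * P.get(r, 0.0)
    return prob * (1.0 - x)


def total_mass(P: dict, d: float, max_pairs: int) -> float:
    """Sum of Pr[T = t] over all types with at most max_pairs pairs."""
    lengths = list(P)
    total = 0.0
    for i in range(max_pairs + 1):
        for t in product(lengths, repeat=2 * i + 1):
            total += type_probability(t, P, d)
    return total


if __name__ == "__main__":
    P = {1: 0.5, 2: 0.5}  # hypothetical block-length distribution
    print(deletion_mass(P, 0.5), total_mass(P, 0.5, max_pairs=3))
```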


Page 8: On the Deletion and Insertion Channels

Drinea-Mitzenmacher Bound

define $F(i, z, r, s)$ to be the family of types that

consist of $2i + 1$ blocks

have a first block of length $z$

satisfy $r = \sum_{l=1}^{i} r_l$

satisfy $s = \sum_{l=1}^{i} s_l$

the probability that a block in the received sequence has length $k \ge 1$ is given by

\[ P_k = \left( \frac{1-d}{d} \right)^{k} \sum_{(i,z,r,s)} \left( \binom{z+r}{k} - \binom{r}{k} \right) d^{z+r+s} P_z Q_{r,i} Q_{s,i} \tag{7} \]


Page 9: On the Deletion and Insertion Channels

Drinea-Mitzenmacher Bound

the expected number of blocks in the received sequence is approximately $B = N(1 - d) / \sum_k k P_k$

a received sequence $Y$ is a typical output for a codeword $X$ if, for every length $1 \le k \le c_1$ and every type $t$ with at most $c_2$ blocks, it contains $\Pr[T = t, K = k] \, B (1 + \beta)$ blocks of length $k$ arising from type $t$, where $c_1$ and $c_2$ are fixed

$\beta = \Theta(1/\sqrt{N})$


Page 10: On the Deletion and Insertion Channels

Drinea-Mitzenmacher Bound

$B_{t,k}$ denotes the number of blocks of length $k$ with type $t$

for a typical output, $B_{t,k} = \Pr[T = t, K = k] \, B (1 + o(1))$

consider all possible ways of choosing the type of each block in the received sequence $Y$ such that $Y$ is a typical output

find the list of all possible input sequences $X$ that yield $Y$

typical set decoding


Page 11: On the Deletion and Insertion Channels

Kavcic and Motwani Result


Page 12: On the Deletion and Insertion Channels

Varshamov-Tenengolts code

For $0 \le a \le n$, the Varshamov-Tenengolts code $VT_a(n)$ consists of all binary vectors $(x_1, \dots, x_n)$ satisfying

\[ \sum_{i=1}^{n} i x_i \equiv a \pmod{n+1} \tag{8} \]

Assume the symbol $s$ in position $p$ is deleted

$L_0$ zeros and $L_1$ ones lie to the left of $s$

$R_0$ zeros and $R_1$ ones lie to the right of $s$

the weight of the received word is $w = L_1 + R_1$

the new checksum is $\sum_{i=1}^{n-1} i x'_i$


Page 13: On the Deletion and Insertion Channels

Varshamov-Tenengolts code

the difference between the original checksum and the new one is at most $n$

if $s = 0$, the difference is $R_1 \le w$

if $s = 1$, the difference is $p + R_1 = 1 + L_0 + L_1 + R_1 = 1 + w + L_0 > w$

The decoding rule follows.
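The two cases above translate directly into a decoder: compare the checksum difference with the weight $w$, then reinsert a 0 with $R_1$ ones to its right, or a 1 with $L_0$ zeros to its left. The following is a minimal sketch of that rule; the function names are illustrative, and the input is assumed to be a codeword of $VT_a(n)$ with exactly one symbol deleted.

```python
def vt_checksum(bits, n: int) -> int:
    """Sum of i * x_i over 1-based positions, reduced mod n + 1."""
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def vt_decode_single_deletion(received, n: int, a: int = 0):
    """Restore a codeword of VT_a(n) from a received word of length n - 1."""
    if len(received) != n - 1:
        raise ValueError("expected exactly one deleted symbol")
    w = sum(received)  # weight of the received word
    # Original checksum minus new checksum; it lies in 0..n, so the
    # residue mod n + 1 determines it exactly.
    difference = (a - vt_checksum(received, n)) % (n + 1)
    out = list(received)
    if difference <= w:
        # A 0 was deleted and difference = R_1: scan from the right
        # until R_1 ones have been passed, then insert the 0 there.
        ones_to_right = difference
        pos = len(out)
        while ones_to_right > 0:
            pos -= 1
            if out[pos] == 1:
                ones_to_right -= 1
        out.insert(pos, 0)
    else:
        # A 1 was deleted and difference = 1 + w + L_0: scan from the
        # left until L_0 zeros have been passed, then insert the 1.
        zeros_to_left = difference - w - 1
        pos = 0
        while zeros_to_left > 0:
            if out[pos] == 0:
                zeros_to_left -= 1
            pos += 1
        out.insert(pos, 1)
    return out
```

The position of the reinserted symbol within a run of equal symbols is arbitrary, since every choice inside the run yields the same word.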


Page 14: On the Deletion and Insertion Channels

Varshamov-Tenengolts code

$VT_0(5) = \{00000, 10001, 01010, 11011, 11100, 00111\}$

10001 is sent, 1001 is received

the weight is $w = 2$

the new checksum is $1 + 4 = 5$, so the difference from the original checksum is $(0 - 5) \bmod 6 = 1 \le w$

we conclude that a zero was deleted

we then conclude that $R_1 = 1$

the decoding result is 10001
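The example can be cross-checked by brute force: enumerate $VT_0(5)$ from its checksum condition and confirm that no two codewords can produce the same word after one deletion. This is a minimal sketch with illustrative function names; exhaustive enumeration is only sensible for small $n$.

```python
from itertools import combinations, product


def vt_code(n: int, a: int = 0):
    """All words of VT_a(n) as bit strings, from the checksum condition."""
    code = []
    for x in product((0, 1), repeat=n):
        checksum = sum(i * bit for i, bit in enumerate(x, start=1))
        if checksum % (n + 1) == a % (n + 1):
            code.append("".join(map(str, x)))
    return code


def deletion_ball(word: str):
    """Every string reachable from `word` by deleting exactly one symbol."""
    return {word[:p] + word[p + 1:] for p in range(len(word))}


def corrects_single_deletion(code) -> bool:
    """True if no two codewords share a one-deletion output."""
    return all(deletion_ball(u).isdisjoint(deletion_ball(v))
               for u, v in combinations(code, 2))


code = vt_code(5)
# Codewords that could have produced the received word 1001.
candidates = [c for c in code if "1001" in deletion_ball(c)]
```

Here `candidates` contains the single word `10001`, in agreement with the decoding result on the slide.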


Page 15: On the Deletion and Insertion Channels

Mitzenmacher scheme

[Block diagram: LDPC Encoder, VT Encoder, Marker Encoder]


Page 16: On the Deletion and Insertion Channels

Marker Code

To solve the synchronization problem, periodically insert a marker

[Figure: the stream 11111111 0000 11111111, in which the marker 0000 separates the 1st codeword (11111111) from the 2nd codeword (11111111)]
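Periodic marker insertion can be sketched in a few lines. The code below is an illustration under stated assumptions, not the scheme from the talk: the function names are hypothetical, bits are held in strings, and `strip_markers` only undoes the insertion on a stream without deletions or insertions, raising an error when a marker is not where it is expected. A real decoder would use the observed marker offset to resynchronize rather than give up.

```python
def insert_markers(bits: str, period: int, marker: str) -> str:
    """Place `marker` between consecutive blocks of `period` bits."""
    if period <= 0:
        raise ValueError("period must be positive")
    blocks = [bits[i:i + period] for i in range(0, len(bits), period)]
    return marker.join(blocks)


def strip_markers(stream: str, period: int, marker: str) -> str:
    """Inverse of insert_markers on a stream with no synchronization errors."""
    step = period + len(marker)
    blocks = []
    for i in range(0, len(stream), step):
        blocks.append(stream[i:i + period])
        found = stream[i + period:i + step]
        # An empty `found` means the stream ended after the last block.
        if found and found != marker:
            raise ValueError(f"marker missing or shifted at offset {i + period}")
    return "".join(blocks)


if __name__ == "__main__":
    # Two 8-bit all-ones codewords separated by the marker 0000.
    sent = insert_markers("1" * 16, period=8, marker="0000")
    print(sent)
    print(strip_markers(sent, period=8, marker="0000"))
```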


Page 17: On the Deletion and Insertion Channels

MacKay code: Encoding

[Block diagram: LDPC Encoder, Sparsifier, Watermark, adder (+)]


Page 18: On the Deletion and Insertion Channels

Sparsifier and Watermark

The sparsifier maps a uniform sequence into a sparse sequence in a block-by-block manner

The watermark is a sequence known to both the encoder and the decoder
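A block-by-block sparsifier followed by watermark addition can be sketched as below. Several things here are assumptions rather than details from the talk: each $k$-bit block is mapped to one of the $2^k$ lowest-weight $n$-bit words, the sparse sequence is combined with the watermark by modulo-2 addition (suggested by the "+" node in the encoding diagram), and all function names are illustrative.

```python
from itertools import combinations


def sparse_table(k: int, n: int):
    """The 2**k lowest-weight n-bit words (assumed mapping, see lead-in)."""
    words = []
    for weight in range(n + 1):
        for ones in combinations(range(n), weight):
            words.append(tuple(1 if i in ones else 0 for i in range(n)))
            if len(words) == 2 ** k:
                return words
    raise ValueError("need k <= n")


def sparsify(bits, k: int, n: int):
    """Map each k-bit block of `bits` to a sparse n-bit block."""
    if len(bits) % k != 0:
        raise ValueError("input length must be a multiple of k")
    table = sparse_table(k, n)
    out = []
    for i in range(0, len(bits), k):
        index = int("".join(map(str, bits[i:i + k])), 2)
        out.extend(table[index])
    return out


def add_watermark(sparse, watermark):
    """Modulo-2 sum of the sparse sequence and the known watermark."""
    if len(sparse) != len(watermark):
        raise ValueError("watermark must match the sparse sequence length")
    return [s ^ w for s, w in zip(sparse, watermark)]
```

Because the sparse sequence is mostly zeros, the transmitted sequence agrees with the watermark in most positions, which is what lets a decoder that knows the watermark track synchronization.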


Page 19: On the Deletion and Insertion Channels

MacKay code: Decoding

[Block diagram: Watermark Decoder, Soft Message, LDPC Decoder]


Page 20: On the Deletion and Insertion Channels

Watermark Decoding

[Figure: watermark decoding diagram with levels labeled 1 to 8, received symbols r1 to r9, and transmitted symbols t]


Page 21: On the Deletion and Insertion Channels

Thank You

Questions?
