On the Deletion and Insertion Channels
Xudong Ma
Ph.D. Candidate
Multimedia Communications Lab
Electrical and Computer Engineering
University of Waterloo
March 9, 2005
Multimedia Communications Laboratory Seminar – p. 1/21
Examples
serial line with an unknown, varying clock speed
hard disk: rotation-speed uncertainty
DAT tape
DNA
Outline
Diggavi-Grossglauser bound
Drinea-Mitzenmacher bound
Monte Carlo result by Kavcic and Motwani (ISIT 2004)
single-deletion-correcting codes: Varshamov-Tenengolts codes
Mitzenmacher's concatenated coding scheme
MacKay's coding scheme based on watermarks
Diggavi-Grossglauser Bound
Given a stationary and ergodic deletion channel with long-term deletion probability $p_d = 1 - \theta$ (with $p_d < 1 - 1/K$) and an input alphabet of size $K$, the capacity of this channel is lower bounded as
$$C \ge \log\left(\frac{K}{K-1}\right) + \theta \log(K-1) - H_0(\theta) \tag{1}$$
where $H_0$ is the binary entropy function and logarithms are base 2.
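Proof sketch:
generate a random codebook of $2^{nR}$ i.i.d. codewords, each symbol uniform over the $K$-ary alphabet
collision error: another codeword also contains the received sequence as a subsequence
atypical error: the number of received symbols deviates from its typical value; this happens with exponentially small probability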
Diggavi-Grossglauser Bound
assume that $(\theta - \epsilon)(n - 1)$ symbols are received
pairwise error probability: the number of length-$n$ sequences over a $K$-ary alphabet that contain a given sequence $y$ as a subsequence is
$$F(n, |y|, K) = \sum_{j=|y|}^{n} \binom{n}{j} (K-1)^{n-j} \tag{2}$$
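with $m = |y|$, the probability that an independently drawn codeword contains $y$ is
$$P_2 = \frac{F(n,m,K)}{K^n} \le \frac{n}{K^n}\binom{n}{m}(K-1)^{n-m} \le \frac{n}{K^n}\, 2^{nH(m/n)} (K-1)^{n-m} \tag{3}$$
the first inequality holds because, for $m \ge n/K$, the $j = m$ term is the largest of the at most $n$ terms in (2); the second uses $\binom{n}{m} \le 2^{nH(m/n)}$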
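The counting formula (2) and the bounds in (3) can be checked numerically for small parameters. The sketch below compares (2) with exhaustive enumeration, which also illustrates that the count does not depend on which $y$ of length $m$ is chosen. The helper names (`F`, `is_subsequence`, `brute_force_count`, `pairwise_error_bounds`) are illustrative, not from the talk.

```python
from itertools import product
from math import comb, log2


def F(n, m, K):
    """Eq. (2): length-n K-ary strings that contain a fixed length-m string as a subsequence."""
    return sum(comb(n, j) * (K - 1) ** (n - j) for j in range(m, n + 1))


def is_subsequence(y, x):
    """True if y can be obtained from x by deleting symbols (greedy left-to-right match)."""
    it = iter(x)
    return all(any(c == s for s in it) for c in y)


def brute_force_count(y, n, K):
    """Count length-n K-ary strings containing y by exhaustive enumeration (small n only)."""
    return sum(is_subsequence(y, x) for x in product(range(K), repeat=n))


def binary_entropy(p):
    """H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def pairwise_error_bounds(n, m, K):
    """Return P2 = F / K^n together with the two upper bounds of Eq. (3)."""
    p2 = F(n, m, K) / K**n
    middle = n / K**n * comb(n, m) * (K - 1) ** (n - m)
    outer = n / K**n * 2 ** (n * binary_entropy(m / n)) * (K - 1) ** (n - m)
    return p2, middle, outer
```

For $n = 6$, $m = 3$, $K = 3$ the formula and the enumeration both give 233 for every $y$ tried. At such small $n$ the bounds in (3) exceed 1, so they only become informative asymptotically.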
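Diggavi-Grossglauser Bound
apply the union bound over the $2^{nR}$ codewords:
$$P_e \le 2^{nR}\,\frac{n}{K^n}\,2^{nH(m/n)}(K-1)^{n-m} \le n\left[2^{R}\,2^{H(m/n)}\,\frac{K-1}{K}\,\frac{1}{(K-1)^{m/n}}\right]^{n} \tag{4}$$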
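The error probability goes to zero as $n \to \infty$ if
$$2^{R}\,2^{H(m/n)}\,\frac{K-1}{K}\,\frac{1}{(K-1)^{m/n}} < 1 \tag{5}$$
taking logarithms and letting $m/n \to \theta$, this holds whenever $R < \log\frac{K}{K-1} + \theta\log(K-1) - H(\theta)$, which is the right-hand side of (1)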
Further improvement: a codebook generated by a Markov chain
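Evaluating the right-hand side of (1) takes only a few lines. This is a minimal sketch, assuming base-2 logarithms (rates in bits per symbol). The helper name `dg_lower_bound` is hypothetical.

```python
from math import log2


def binary_entropy(p):
    """H0(p) in bits, with H0(0) = H0(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def dg_lower_bound(pd, K=2):
    """Right-hand side of Eq. (1) in bits per symbol, with theta = 1 - pd."""
    if not 0.0 <= pd < 1.0 - 1.0 / K:
        raise ValueError("Eq. (1) assumes 0 <= pd < 1 - 1/K")
    theta = 1.0 - pd
    return log2(K / (K - 1)) + theta * log2(K - 1) - binary_entropy(theta)


for pd in (0.01, 0.1, 0.3):
    print(pd, round(dg_lower_bound(pd), 4))
```

For $K = 2$ the expression reduces to $1 - H_0(p_d)$, because $\log(2/1) = 1$, $\log 1 = 0$ and $H_0(1 - p_d) = H_0(p_d)$.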
Drinea-Mitzenmacher Bound
the binary codeword consists of alternating blocks of zeros and ones
the length of each block is i.i.d. with distribution $P$
let $X$ denote the transmitted sequence and $Y$ the received sequence
for each block of $Y$, associate a type $t = (z, s_1, r_1, \ldots, s_i, r_i)$ depending on the blocks in $X$ from which it arises
probability that a block has type $t$:
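$$\Pr[T = t] = \frac{P_z\,(1 - d^{z})}{1 - x}\left(\prod_{l=1}^{i} P_{s_l}\, d^{s_l}\, P_{r_l}\right)(1 - x) \tag{6}$$
where $x = \sum_j P_j d^{j}$ is the probability that a block is deleted entirely ($d$ being the i.i.d. deletion probability, so a block of length $j$ vanishes with probability $d^j$)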
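A direct transcription of (6) shows what a type encodes. The sketch assumes i.i.d. deletions with probability $d$ and a block-length law $P$ with finite support given as a dict. The function names are hypothetical. Summing (6) over all types with $i$ merged pairs gives $x^i(1-x)$, so the total over $i \le I$ is $1 - x^{I+1}$, which `total_mass` lets you check.

```python
from itertools import product
from math import prod


def type_probability(P, d, z, pairs):
    """Pr[T = t] from Eq. (6) for the type t = (z, s_1, r_1, ..., s_i, r_i).

    P     -- dict: block length -> probability (finite support, hypothetical)
    d     -- i.i.d. deletion probability, so a block of length j vanishes w.p. d**j
    z     -- length of the first transmitted block (at least one of its bits survives)
    pairs -- list of (s_l, r_l): a wholly deleted block of length s_l followed by a
             block of length r_l whose surviving bits merge into the same received block
    """
    x = sum(p * d**j for j, p in P.items())  # Pr[a block is deleted entirely]
    head = P[z] * (1 - d**z) / (1 - x)
    merged = prod(P[s] * d**s * P[r] for s, r in pairs)  # empty product is 1
    return head * merged * (1 - x)  # closing factor (1 - x), as in Eq. (6)


def total_mass(P, d, max_pairs):
    """Sum of Eq. (6) over all types with at most max_pairs (s_l, r_l) pairs."""
    lengths = list(P)
    total = 0.0
    for i in range(max_pairs + 1):
        for z in lengths:
            for flat in product(lengths, repeat=2 * i):
                pairs = list(zip(flat[0::2], flat[1::2]))
                total += type_probability(P, d, z, pairs)
    return total
```

With $P = \{1: 0.5, 2: 0.5\}$ and $d = 0.2$ we get $x = 0.12$, and `total_mass(P, 0.2, 3)` agrees with $1 - 0.12^4$ up to rounding.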
Drinea-Mitzenmacher Bound
define $F(i, z, r, s)$ to be the family of types that
consist of $2i + 1$ blocks
have a first block of length $z$
have $r = \sum_{l=1}^{i} r_l$
have $s = \sum_{l=1}^{i} s_l$
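the probability that a block in the received sequence has length $k \ge 1$ is given by
$$P_k = \left(\frac{1-d}{d}\right)^{k} \sum_{(i,z,r,s)} \left[\binom{z+r}{k} - \binom{r}{k}\right] d^{\,z+r+s}\, P_z\, Q_{r,i}\, Q_{s,i} \tag{7}$$
where the bracket counts the ways to keep $k$ of the $z + r$ bits with at least one from the first block, and $Q_{r,i}$ is the probability that $i$ independent block lengths drawn from $P$ sum to $r$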
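Equation (7) can be evaluated by truncating the sum over $i$. This sketch uses the same assumptions as before (finite-support $P$, i.i.d. deletions with $0 \le d < 1$). `convolve_power` computes $Q_{r,i}$ by repeated convolution. All names are illustrative.

```python
from math import comb


def convolve_power(P, i):
    """Q[r] = Pr[sum of i i.i.d. block lengths drawn from P equals r], i.e. Q_{r,i}."""
    Q = {0: 1.0}
    for _ in range(i):
        nxt = {}
        for a, qa in Q.items():
            for b, pb in P.items():
                nxt[a + b] = nxt.get(a + b, 0.0) + qa * pb
        Q = nxt
    return Q


def received_block_pmf(P, d, max_pairs, max_k):
    """Truncated Eq. (7): P_k for 1 <= k <= max_k, summing over types with i <= max_pairs.

    ((1-d)/d)**k * d**(z+r+s) is evaluated as (1-d)**k * d**(z+r+s-k) to avoid dividing by d.
    """
    Pk = {k: 0.0 for k in range(1, max_k + 1)}
    for i in range(max_pairs + 1):
        Q = convolve_power(P, i)  # the same law governs both r and s
        for z, pz in P.items():
            for r, qr in Q.items():
                for s, qs in Q.items():
                    weight = pz * qr * qs
                    for k in range(1, min(z + r, max_k) + 1):
                        ways = comb(z + r, k) - comb(r, k)  # >= 1 survivor from the first block
                        Pk[k] += ways * (1 - d) ** k * d ** (z + r + s - k) * weight
    return Pk
```

As a sanity check, consider the case where every block has length 1 (an alternating codeword). The code then gives $P_1 \approx 1/(1+d)$. This matches the direct argument: the next surviving bit differs from the current one exactly when an even number of bits (possibly zero) is deleted between them. The $P_k$ also sum to 1 up to the truncation error.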
Drinea-Mitzenmacher Bound
the expected number of blocks in the received sequence is approximately $B = N(1-d)/\sum_k k P_k$, where $N$ is the codeword length
a received sequence $Y$ is a typical output for a codeword $X$ if, for every length $1 \le k \le c_1$ and every type $t$ with at most $c_2$ blocks, it contains $\Pr[T = t, K = k]\,B(1 + \beta)$ blocks of length $k$ arising from type $t$; $c_1$ and $c_2$ are fixed
$\beta = \Theta(1/\sqrt{N})$
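The estimate $B \approx N(1-d)/\sum_k kP_k$ comes from counting bits. About $N(1-d)$ bits survive, and they are split into blocks of average length $\sum_k kP_k$. The Monte Carlo sketch below simulates one codeword through an i.i.d. deletion channel and compares the observed number of received blocks with this estimate, using the empirical block-length distribution in place of $P_k$. It uses only the Python standard library. The block-length law, seed and function names are arbitrary choices for illustration.

```python
import random
from collections import Counter
from itertools import groupby


def block_codeword(num_bits, lengths, weights, rng):
    """Alternating blocks of 0s and 1s with i.i.d. lengths drawn from P = (lengths, weights)."""
    bits, bit = [], 0
    while len(bits) < num_bits:
        bits.extend([bit] * rng.choices(lengths, weights)[0])
        bit ^= 1
    return bits[:num_bits]


def iid_deletion(bits, d, rng):
    """Delete each bit independently with probability d."""
    return [b for b in bits if rng.random() >= d]


def received_blocks(bits):
    """Lengths of the maximal runs (blocks) of the received sequence."""
    return [len(list(run)) for _, run in groupby(bits)]


def estimate_block_count(N, d, lengths, weights, seed=0):
    """Return (observed block count, predicted B, empirical P_k)."""
    rng = random.Random(seed)
    y = iid_deletion(block_codeword(N, lengths, weights, rng), d, rng)
    runs = received_blocks(y)
    Pk_hat = {k: c / len(runs) for k, c in Counter(runs).items()}
    mean_len = sum(k * p for k, p in Pk_hat.items())
    predicted_B = N * (1 - d) / mean_len
    return len(runs), predicted_B, Pk_hat
```

With $N = 10^5$ and $d = 0.1$ the ratio of predicted to observed block counts is within a percent of 1. The residual gap is the fluctuation of the number of surviving bits around $N(1-d)$.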
Drinea-Mitzenmacher Bound
let $B_{t,k}$ denote the number of blocks of length $k$ with type $t$
for a typical output, $B_{t,k} = \Pr[T = t, K = k]\,B(1 + o(1))$
consider all possible ways of assigning a type to each block of the received sequence $Y$ such that $Y$ is a typical output
find the list of all possible input sequences $X$ that yield $Y$
typical-set decoding
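The list-building step can be illustrated in its simplest form. For a pure deletion channel, a codeword can have produced $Y$ exactly when $Y$ is a subsequence of it. The sketch below returns that candidate list. It omits the type-statistics test that makes the decoder a typical-set decoder, so it covers only the first filter of the procedure above. The function names are hypothetical.

```python
def is_subsequence(y, x):
    """True if y can be obtained from x by deleting symbols (greedy left-to-right match)."""
    it = iter(x)
    return all(any(c == s for s in it) for c in y)


def candidate_inputs(received, codebook):
    """Codewords that could have produced `received` through deletions alone.

    A typical-set decoder would additionally discard candidates whose block-type
    statistics are atypical; that check is omitted in this sketch.
    """
    return [x for x in codebook if is_subsequence(received, x)]


codebook = ["00000", "10001", "01010", "11011", "11100", "00111"]
print(candidate_inputs("1001", codebook))
```

With a short received word such as `"10"` the list grows to four candidates. This is why the decoder needs the typicality constraints to keep the list small.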
Kavcic and Motwani Result
Varshamov-Tenengolts code
For $0 \le a \le n$, the Varshamov-Tenengolts code $VT_a(n)$ consists of all binary vectors $(x_1, \ldots, x_n)$ satisfying
$$\sum_{i=1}^{n} i\,x_i \equiv a \pmod{n+1} \tag{8}$$
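assume the symbol $s$ in position $p$ is deleted
there are $L_0$ 0s and $L_1$ 1s to the left of $s$
there are $R_0$ 0s and $R_1$ 1s to the right of $s$
the weight of the received vector is $w = L_1 + R_1$
the new checksum is $\sum_{i=1}^{n-1} i\,x'_i$, where $x'$ is the received vector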
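Varshamov-Tenengolts code
the original checksum minus the new one lies between $0$ and $n$, so it equals $\Delta = \left(a - \sum_{i=1}^{n-1} i\,x'_i\right) \bmod (n+1)$
if $s = 0$, the difference is $R_1 \le w$ (each 1 to the right of $s$ moves one position left)
if $s = 1$, the difference is $p + R_1 = 1 + L_0 + L_1 + R_1 = 1 + w + L_0 > w$
decoding rule: if $\Delta \le w$, a 0 was deleted; reinsert it so that exactly $\Delta$ 1s lie to its right. Otherwise a 1 was deleted; reinsert it so that exactly $\Delta - w - 1$ 0s lie to its left.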
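The decoding rule can be written out directly. This is a minimal Python sketch, assuming exactly one deletion occurred and counting checksum positions from 1 as in (8). The function names are hypothetical.

```python
def vt_checksum(bits):
    """sum_i i * x_i with positions counted from 1, as in Eq. (8)."""
    return sum(i * b for i, b in enumerate(bits, start=1))


def vt_decode(y, n, a=0):
    """Recover a codeword of VT_a(n) from y, the codeword with exactly one bit deleted."""
    y = list(y)
    if len(y) != n - 1:
        raise ValueError("expected a length n-1 word (exactly one deletion)")
    w = sum(y)                              # weight of the received word
    delta = (a - vt_checksum(y)) % (n + 1)  # checksum deficiency, in 0..n
    for j in range(n):                      # candidate insertion points 0..n-1
        if delta <= w:
            # A 0 was deleted: reinsert it with exactly delta 1s to its right (R1 = delta).
            if sum(y[j:]) == delta:
                return y[:j] + [0] + y[j:]
        elif y[:j].count(0) == delta - w - 1:
            # A 1 was deleted: reinsert it with exactly delta - w - 1 0s to its left (L0).
            return y[:j] + [1] + y[j:]
    raise ValueError("y is not a single deletion of a VT_a(n) codeword")
```

For example, `vt_decode([1, 0, 0, 1], 5)` returns `[1, 0, 0, 0, 1]`. When the deleted symbol sits inside a run, every insertion point within that run gives the same word, so taking the first matching position is enough.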
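Varshamov-Tenengolts code
$VT_0(5) = \{00000, 10001, 01010, 11011, 11100, 00111\}$; 10001 is sent, 1001 is received
weight $w = 2$
the new checksum is $1 + 4 = 5$, so $\Delta = (0 - 5) \bmod 6 = 1$
since $\Delta \le w$, we conclude that a 0 was deleted
we then conclude that $R_1 = \Delta = 1$: the deleted 0 had exactly one 1 to its right
the decoding result is 10001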
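The codebook listed above and its single-deletion-correcting property can be checked by brute force. The sketch enumerates $VT_a(n)$ from (8) and verifies that no received word of length $n-1$ can arise from two different codewords. The function names are illustrative.

```python
from itertools import product


def vt_code(n, a=0):
    """All binary strings of length n satisfying Eq. (8)."""
    return [
        "".join(map(str, x))
        for x in product((0, 1), repeat=n)
        if sum(i * b for i, b in enumerate(x, start=1)) % (n + 1) == a
    ]


def single_deletions(word):
    """Every string obtainable from `word` by deleting exactly one symbol."""
    return {word[:p] + word[p + 1:] for p in range(len(word))}


def corrects_one_deletion(code):
    """True if no received string can arise from two different codewords."""
    origin = {}
    for c in code:
        for y in single_deletions(c):
            if origin.setdefault(y, c) != c:
                return False
    return True
```

`vt_code(5)` returns the six words of $VT_0(5)$ above in lexicographic order. The disjointness check passes for every $a$ and every $n \le 8$ tried. By contrast, `corrects_one_deletion(["01", "10"])` is `False`, since both words can become `"0"`.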
Mitzenmacher scheme
block diagram: LDPC encoder → VT encoder → marker encoder
Marker Code
to solve the synchronization problem, periodically insert a marker
figure: 1st codeword 11111111, marker 0000, 2nd codeword 11111111, transmitted as 11111111000011111111; the two codewords without a marker: 1111111111111111
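Here is a toy version of the marker idea, matching the example above (all-ones codewords of length 8, marker 0000). The sketch rests on two hypothetical assumptions that real marker decoders cannot make: deletions never hit a marker, and the marker pattern never appears inside a codeword. The function names are illustrative.

```python
def add_markers(codewords, marker="0000"):
    """Transmit codewords with a marker between consecutive codewords."""
    return marker.join(codewords)


def deletions_per_codeword(received, codeword_len, marker="0000"):
    """Toy resynchronisation: split the received stream at the markers.

    Assumes (hypothetically) that markers are never hit by deletions and that the
    marker pattern never occurs inside a received codeword segment.
    """
    return [codeword_len - len(segment) for segment in received.split(marker)]
```

`add_markers(["11111111", "11111111"])` reproduces `11111111000011111111`. If one bit of the first codeword is deleted, `deletions_per_codeword("1111111000011111111", 8)` returns `[1, 0]`, so the loss of synchronization stays confined to the first codeword.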
MacKay code: Encoding
block diagram: LDPC encoder → sparsifier → + watermark (the sparse sequence is added to the watermark)
Sparsifier and Watermark
the sparsifier maps a uniform sequence into a sparse sequence (one with a low density of 1s), block by block
the watermark is a sequence known to both the encoder and the decoder
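To make the sparsifier and watermark concrete, here is a toy encoder. Two parts are assumptions for illustration, not the construction from MacKay's scheme: the 2-bit to 4-bit lookup table, and the use of a seeded `random.Random` generator as the shared watermark. The LDPC stage is left out, and the function names are hypothetical.

```python
import random

# Hypothetical sparsifier: each 2-bit data block maps to a 4-bit word of weight <= 1.
SPARSE_TABLE = {
    (0, 0): (0, 0, 0, 0),
    (0, 1): (1, 0, 0, 0),
    (1, 0): (0, 1, 0, 0),
    (1, 1): (0, 0, 1, 0),
}
DESPARSE_TABLE = {v: k for k, v in SPARSE_TABLE.items()}


def sparsify(bits):
    """Map a (roughly uniform) bit sequence to a sparse one, block by block."""
    if len(bits) % 2:
        raise ValueError("this toy sparsifier needs an even number of bits")
    out = []
    for i in range(0, len(bits), 2):
        out.extend(SPARSE_TABLE[tuple(bits[i:i + 2])])
    return out


def desparsify(bits):
    """Invert the lookup table, 4 channel bits at a time."""
    out = []
    for i in range(0, len(bits), 4):
        out.extend(DESPARSE_TABLE[tuple(bits[i:i + 4])])
    return out


def make_watermark(length, seed):
    """Pseudorandom watermark shared by encoder and decoder via a common seed (assumption)."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(length)]


def encode(bits, seed=2005):
    """Sparsify the data, then add (XOR) the watermark."""
    sparse = sparsify(bits)
    return [s ^ w for s, w in zip(sparse, make_watermark(len(sparse), seed))]


def decode_synchronised(received, seed=2005):
    """Only valid if no insertions/deletions occurred: strip the watermark, invert the table."""
    return desparsify([r ^ w for r, w in zip(received, make_watermark(len(received), seed))])
```

The transmitted word differs from the watermark only where the sparse sequence has a 1. For the data `[0, 1, 1, 0, 1, 1, 0, 0]`, that is 3 positions out of 16, so the receiver sees something close to the known watermark. `decode_synchronised` works only when no insertions or deletions occurred. Recovering alignment in the general case is the job of the watermark decoder.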
MacKay code: Decoding
block diagram: watermark decoder → soft messages → LDPC decoder
Watermark Decoding
figure: alignment between transmitted symbols t at positions 1 to 8 and received symbols r1 to r9
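A watermark decoder has to align the received symbols with the known watermark. The sketch below computes only the forward recursion $\alpha_{i,j} = \alpha_{i-1,j}\,d + \alpha_{i-1,j-1}(1-d)\,e_{i,j}$. Here $\alpha_{i,j}$ is the probability that the first $i$ transmitted symbols produce exactly the first $j$ received symbols, and $e_{i,j}$ is $1-f$ on a match and $f$ on a mismatch. The channel is simplified: deletions only (probability $d$), plus substitutions with probability $f$ standing in for the sparse data. This is an assumption-laden illustration, not MacKay's full decoder, which also models insertions and runs a backward pass to produce soft messages for the LDPC decoder. The function name is hypothetical.

```python
def forward_likelihood(watermark, received, d, f):
    """P(received | watermark) for a deletion + substitution channel (sketch).

    Each transmitted symbol is deleted w.p. d; otherwise it is received, flipped
    w.p. f (standing in for the sparse data added to the watermark).
    alpha[i][j] = P(first i transmitted symbols yield exactly the first j received).
    """
    n, m = len(watermark), len(received)
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(1, n + 1):
        t = watermark[i - 1]
        for j in range(m + 1):
            p = alpha[i - 1][j] * d  # symbol i deleted
            if j > 0:
                emit = (1 - f) if received[j - 1] == t else f
                p += alpha[i - 1][j - 1] * (1 - d) * emit  # symbol i received as r_j
            alpha[i][j] = p
    return alpha[n][m]
```

With $f = 0$ the result is the number of ways $y$ embeds in $x$, times $d^{n-m}(1-d)^m$. For example, $x = 110$ and $y = 1$ give two embeddings, hence $2d^2(1-d)$.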
Thank You
Questions?