CSE 589 Applied Algorithms
Autumn 2001
Lecture 7: Sequitur Conclusion, Image Compression, Vector Quantization, Nearest Neighbor Search
Sequitur Algorithm
Input the first symbol s to create the production S -> s;
repeat
    match an existing rule:
        A -> ...XY...   B -> XY          becomes   A -> ...B...   B -> XY
    create a new rule:
        A -> ...XY...   B -> ...XY...    becomes   A -> ...C...   B -> ...C...   C -> XY
    remove a rule:
        A -> ...B...    B -> X1X2...Xk   becomes   A -> ...X1X2...Xk...
    input a new symbol s:
        S -> X1X2...Xk                   becomes   S -> X1X2...Xk s
until no symbols left
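The loop above is easiest to absorb by running it. Below is a minimal, quadratic-time Python sketch that restores Sequitur's two invariants (digram uniqueness and rule utility) after each input symbol. The dictionary-of-lists grammar and the rule names R1, R2, ... are illustrative assumptions; this is not the linear-time linked-list implementation described on the next slide.

def sequitur(text):
    # Naive Sequitur sketch: after each input symbol, restore the two
    # invariants (digram uniqueness, rule utility).  Quadratic time; the
    # real algorithm uses doubly linked lists plus a digram hash table
    # (next slide) to run in linear time.
    rules = {"S": []}                     # rule name -> right-hand side
    counter = 0

    def occurrences(digram):
        # non-overlapping occurrences of digram over all right-hand sides
        found = []
        for name, rhs in rules.items():
            i = 0
            while i < len(rhs) - 1:
                if (rhs[i], rhs[i + 1]) == digram:
                    found.append((name, i))
                    i += 2
                else:
                    i += 1
        return found

    def repeated_digram():
        for name, rhs in list(rules.items()):
            for i in range(len(rhs) - 1):
                occ = occurrences((rhs[i], rhs[i + 1]))
                if len(occ) >= 2:
                    return (rhs[i], rhs[i + 1]), occ[:2]
        return None, None

    def enforce_utility():
        # remove a rule: inline any rule referenced exactly once
        changed = True
        while changed:
            changed = False
            for name in [n for n in rules if n != "S"]:
                refs = [(rn, i) for rn, rhs in rules.items()
                        for i, sym in enumerate(rhs) if sym == name]
                if len(refs) == 1:
                    rn, i = refs[0]
                    rules[rn][i:i + 1] = rules[name]
                    del rules[name]
                    changed = True
                    break

    for ch in text:
        rules["S"].append(ch)             # input a new symbol
        while True:
            digram, occ = repeated_digram()
            if digram is None:
                break
            whole = [n for n, r in rules.items()
                     if n != "S" and tuple(r) == digram]
            if whole:                     # match an existing rule
                for name, i in occ:
                    if name != whole[0]:
                        rules[name][i:i + 2] = [whole[0]]
                        break
            else:                         # create a new rule
                counter += 1
                new = "R%d" % counter
                rules[new] = list(digram)
                for name, i in sorted(occ, key=lambda t: -t[1]):
                    rules[name][i:i + 2] = [new]
            enforce_utility()
    return rules

print(sequitur("abcabdabcabd"))
# -> {'S': ['R4', 'R4'], 'R1': ['a', 'b'], 'R4': ['R1', 'c', 'R1', 'd']}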
Data Structure Example
S -> CeBCe    A -> be    B -> AA    C -> bB

[Figure: the Sequitur data structure for this grammar. Each right-hand side is a doubly linked list of symbols; each nonterminal carries a reference count (here A, B, and C all have count 2). A digram table indexes every adjacent pair of symbols, and a pointer marks the current digram.]
Basic Encoding of a Grammar

S -> DBD    A -> be    B -> AA    D -> bBe

Grammar symbol code:
    b 000   e 001   S 010   A 011   B 100   D 101   # 110

Grammar code (right-hand sides separated by #):
    D  B  D  #  b  e  #  A  A  #  b  B  e
    101 100 101 110 000 001 110 011 011 110 000 100 001   = 39 bits

$|\text{Grammar Code}| = (s + r - 1)\,\lceil \log_2 (r + a + 1) \rceil$

    r = number of rules
    s = sum of the lengths of the right-hand sides
    a = number of symbols in the original alphabet
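As a sanity check, a few lines of Python reproduce the 39-bit count (a sketch; the grammar and the fixed-length code table are the ones on this slide):

from math import ceil, log2

rules = {"S": "DBD", "A": "be", "B": "AA", "D": "bBe"}

r = len(rules)                                  # number of rules = 4
s = sum(len(rhs) for rhs in rules.values())     # sum of right-hand sides = 10
a = 2                                           # original alphabet {b, e}

bits = ceil(log2(r + a + 1))                    # rules + alphabet + '#' -> 3 bits
print((s + r - 1) * bits)                       # (10 + 4 - 1) * 3 = 39 bits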
Better Encoding of the Grammar
• Nevill-Manning and Witten suggest a more efficient encoding of the grammar that resembles LZ77.
    – The first time a nonterminal is sent, its right-hand side is transmitted instead.
    – The second time a nonterminal is sent, a new production rule is established: a pointer to the previous occurrence is sent along with the length of the rule.
    – Subsequently, the nonterminal is represented by the index of the production rule.
Compression Quality
• Nevill-Manning and Witten 1997

    File    size     compress  gzip  sequitur  PPMC
    bib     111261   3.35      2.51  2.48      2.12
    book    768771   3.46      3.35  2.82      2.52
    geo     102400   6.08      5.34  4.74      5.01
    obj2    246814   4.17      2.63  2.68      2.77
    pic     513216   0.97      0.82  0.90      0.98
    progc    38611   3.87      2.68  2.83      2.49

Files from the Calgary Corpus. Units in bits per character (original: 8 bits per character).
compress - based on LZW
gzip - based on LZ77
PPMC - adaptive arithmetic coding with context
Notes on Sequitur
• Very new and different from the standards.
• Yields compression and hierarchical structure simultaneously.
• With clever encoding, competitive with the best of the standards.
• Practical linear-time encoding and decoding.
• Alternatives: off-line algorithms, e.g. (i) repeatedly find the most frequent digram, or (ii) find the longest repeated substring. (A sketch of (i) follows below.)
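For concreteness, here is a minimal sketch of the first off-line alternative, repeatedly replacing the currently most frequent digram with a fresh rule (the rule names and the greedy left-to-right replacement are illustrative choices):

from collections import Counter

def offline_digram(text):
    # Off-line grammar construction: replace the most frequent digram
    # with a new nonterminal until no digram occurs twice.
    seq, rules, counter = list(text), {}, 0
    while len(seq) >= 2:
        counts = Counter(zip(seq, seq[1:]))
        digram, freq = counts.most_common(1)[0]
        if freq < 2:
            break
        counter += 1
        name = "R%d" % counter
        rules[name] = list(digram)
        out, i = [], 0
        while i < len(seq):                  # greedy non-overlapping pass
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == digram:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

print(offline_digram("abcabdabcabd"))
# -> (['R4', 'R4'], {'R1': ['a', 'b'], 'R2': ['R1', 'c'], ...})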
Exercise
• Apply Sequitur to the string aaaaaaaaaaaaaaaa (16 a's).
• What is the length of the code, assuming a 2-symbol alphabet?
Lossy Image Compression Methods
• Basic theory - trade-off between bit rate and distortion.
• Vector quantization (VQ).
    – Indices into a set of representative blocks can be used to code an image, yielding good compression. Requires training.
• Wavelet compression.
    – An image can be decomposed into a low-resolution version plus the detail needed to recover the original. Sending the most significant bits of the wavelet code yields excellent compression.
JPEG Standard
• JPEG - Joint Photographic Experts Group
    – Current image compression standard. Uses the discrete cosine transform, scalar quantization, and Huffman coding.
    – JPEG 2000 uses wavelet compression.
Barbara

[Figures: the Barbara test image - original, JPEG, VQ, and Wavelet-SPIHT - at a 32:1 compression ratio (.25 bits/pixel, from 8 bits), followed by full-size slides of the JPEG, VQ, SPIHT, and original versions.]
Distortion
• Lossy compression: the decompressed image differs from the original, $\hat{x} \neq x$.

    original x -> Encoder -> compressed y -> Decoder -> decompressed $\hat{x}$

• The common measure of distortion is mean squared error (MSE). Assume x has n real components (pixels):

    $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2$
PSNR
• Peak Signal to Noise Ratio (PSNR) is the standard way to measure fidelity.

    $\mathrm{PSNR} = 10 \log_{10} \frac{m^2}{\mathrm{MSE}}$

  where m is the maximum possible value of a pixel. For gray scale images (8 bits per pixel), m = 255.

• PSNR is measured in decibels (dB). A difference of .5 to 1 dB is said to be perceptible.
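Both measures in code (a sketch, assuming images given as flat sequences of pixel values):

from math import log10

def mse(x, xhat):
    # mean squared error over n pixels
    return sum((a - b) ** 2 for a, b in zip(x, xhat)) / len(x)

def psnr(x, xhat, m=255):
    # peak signal to noise ratio in dB; m is the maximum pixel value
    return 10 * log10(m ** 2 / mse(x, xhat))

print(psnr([100, 120, 130], [101, 118, 131]))   # MSE = 2, about 45.1 dB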
Rate-Fidelity Curve
[Figure: rate-fidelity curve for SPIHT-coded Barbara; PSNR (20-38 dB) plotted against bit rate (0.02-0.92 bits per pixel).]

Properties:
    - Increasing
    - Slope decreasing
PSNR is not Everything
[Figures: two VQ-coded images, both with PSNR = 25.8 dB but visibly different quality.]
PSNR Reflects Fidelity (1)

[Figure: VQ-coded image, PSNR 25.8 dB, .63 bpp, 12.8:1 compression.]
PSNR Reflects Fidelity (2)

[Figure: VQ-coded image, PSNR 24.2 dB, .31 bpp, 25.6:1 compression.]
PSNR Reflects Fidelity (3)

[Figure: VQ-coded image, PSNR 23.2 dB, .16 bpp, 51.2:1 compression.]
Vector Quantization
[Figure: VQ block diagram. The encoder compares each block of the source image against codebook entries 1..n and transmits i, the index of the nearest codeword; the decoder holds the same codebook and looks up each received index to assemble the decoded image.]
Vector Quantization Facts
• The image is partitioned into a x b blocks.
• The codebook has n representative a x b blocks called codewords, each with an index.
• Compression is $\frac{\log_2 n}{ab}$ bpp.
• Example: a = b = 4 and n = 1,024
    – compression is 10/16 = .63 bpp
    – compression ratio is 8 : .63 = 12.8 : 1
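The same arithmetic as a few lines of code (sketch):

from math import log2

def vq_bpp(a, b, n):
    # bits per pixel for a x b blocks and an n-codeword codebook
    return log2(n) / (a * b)

bpp = vq_bpp(4, 4, 1024)    # 0.625 bpp
print(bpp, 8 / bpp)         # compression ratio 12.8 : 1 for 8-bit pixels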
Examples
[Figures: Barbara VQ-coded with codebook size 1,024 at three block sizes - 4 x 4 blocks (.63 bpp), 4 x 8 blocks (.31 bpp), and 8 x 8 blocks (.16 bpp).]
Encoding and Decoding
• Encoding:
    – Scan the a x b blocks of the image. For each block, find the nearest codeword in the codebook and output its index.
    – This is a nearest neighbor search.
• Decoding:
    – For each index, output the codeword with that index into the destination image.
    – This is a table lookup.
(A sketch of both directions follows below.)
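A sketch of both directions in Python, with the nearest-codeword step done by naive linear search (the toy 1 x 2 blocks and the helper names are illustrative):

def dist2(u, v):
    # squared Euclidean distance between two blocks
    return sum((a - b) ** 2 for a, b in zip(u, v))

def vq_encode(blocks, codebook):
    # nearest neighbor search: output the index of the nearest codeword
    return [min(range(len(codebook)), key=lambda i: dist2(b, codebook[i]))
            for b in blocks]

def vq_decode(indices, codebook):
    # table lookup: each index becomes its codeword
    return [codebook[i] for i in indices]

codebook = [(0, 0), (128, 128), (255, 255)]      # toy 1 x 2 codewords
blocks = [(10, 20), (200, 220), (120, 140)]
idx = vq_encode(blocks, codebook)                # [0, 2, 1]
print(idx, vq_decode(idx, codebook))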
The Codebook
• Both encoder and decoder must have the same codebook.
• The codebook must be useful for many images and must be stored somewhere.
• The codebook must be designed properly to be effective.
• Design requires a representative training set.
• These are the major drawbacks of VQ.
Codebook Design Problem
• Input: a training set X of vectors of dimension d, and a number n. (d = a x b, and n is the number of codewords.)
• Output: n vectors c1, c2, ..., cn that minimize the sum of squared distances from each member of the training set to its nearest codeword, that is, minimize

    $\sum_{x \in X} \left\| x - c_{n(x)} \right\|^2 = \sum_{x \in X} \sum_{i=1}^{d} \bigl( x(i) - c_{n(x)}(i) \bigr)^2$

  where $c_{n(x)}$ is the nearest codeword to x.
Algorithms for Codebook Design
• The optimal codebook design problem appears to be NP-hard.
• There is a very effective method, the generalized Lloyd algorithm (GLA), for finding a good local minimum.
• GLA is also known in the statistics community as the k-means algorithm.
• GLA is slow.
GLA
• Start with an initial codebook c1, c2, ..., cn and training set X.
• Iterate (a code sketch follows below):
    – Partition X into X1, X2, ..., Xn, where Xi contains the members of X that are closest to ci.
    – Let xi be the centroid of Xi:

        $x_i = \frac{1}{|X_i|} \sum_{x \in X_i} x$

    – If $\sum_{i=1}^{n} \| x_i - c_i \|$ is sufficiently small, then quit; otherwise continue the iteration with the new codebook c1, c2, ..., cn = x1, x2, ..., xn.
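A compact GLA sketch in Python following the iteration above (vectors as tuples or lists; seeding the codebook with the first n training vectors is an illustrative choice):

def dist2(u, v):
    # squared Euclidean distance
    return sum((a - b) ** 2 for a, b in zip(u, v))

def gla(X, n, eps=1e-6):
    # generalized Lloyd algorithm (k-means) for codebook design
    codebook = [list(x) for x in X[:n]]
    while True:
        # partition X into X1..Xn by nearest codeword
        parts = [[] for _ in range(n)]
        for x in X:
            parts[min(range(n), key=lambda j: dist2(x, codebook[j]))].append(x)
        # move each codeword to the centroid of its partition
        new = []
        for ci, part in zip(codebook, parts):
            if part:
                d = len(part[0])
                new.append([sum(x[k] for x in part) / len(part)
                            for k in range(d)])
            else:
                new.append(ci)              # keep a codeword with an empty cell
        change = sum(dist2(c, m) ** 0.5 for c, m in zip(codebook, new))
        codebook = new
        if change < eps:                    # codewords have stopped moving
            return codebook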
GLA Example (1)-(10)

[Figures: ten snapshots of GLA on a 2-D training set. The legend distinguishes codewords, training vectors, and centroids; each round partitions the training vectors into the sets Xi nearest each codeword ci, computes the centroids xi, and moves the codewords to the centroids.]
Codebook
[Figure: the resulting codebook of 1 x 2 codewords. Note: the codewords spread along the diagonal.]
GLA Advice
• Time per iteration is dominated by the partitioning step, which is m nearest neighbor searches, where m is the training set size.
    – Average time per iteration is O(m log n), assuming d is small.
• Training set size.
    – The training set should have at least 20 training vectors per codeword to get reasonable performance.
    – Too small a training set results in "overtraining".
• The number of iterations can be large.
Nearest Neighbor Search
• Preprocess a set of n vectors V in d dimensions into a search data structure.
• Input: a query vector q.
• Output: the vector v in V that is nearest to q, that is, the v in V minimizing

    $\|v - q\|^2 = \sum_{i=1}^{d} \bigl( v(i) - q(i) \bigr)^2$
NNS in VQ
• Used in codebook design.
    – Used in GLA to partition the training set.
    – Since codebook design is done infrequently, the speed of NNS is not too big an issue there.
• Used in VQ encoding.
    – Codebook size is commonly 1,000 or more.
    – Naive linear search would make encoding too slow.
    – Can we do better than linear search?
Naive Linear Search
• Keep track of the current best vector, best-v, and the best squared distance, best-squared-d.
    – For an unsearched vector v, compute $\|v - q\|^2$ to see if it is smaller than best-squared-d.
    – If so, replace best-v.
    – If d is moderately large, it is a good idea not to compute the squared distance completely. Bail out as soon as, for some k < d,

        $\text{best-squared-d} \le \sum_{i=1}^{k} \bigl( v(i) - q(i) \bigr)^2$
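In code, the bailout is just an early break inside the per-component loop (a sketch):

def nearest_linear(q, codebook):
    # naive linear search with partial-distance bailout
    best_i, best_d2 = -1, float("inf")
    for i, v in enumerate(codebook):
        d2 = 0.0
        for vk, qk in zip(v, q):
            d2 += (vk - qk) ** 2
            if d2 >= best_d2:       # bail out: v can no longer win
                break
        else:                       # loop finished: v is the new best
            best_i, best_d2 = i, d2
    return best_i, best_d2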
Orchard’s Method
• Invented by Orchard (1991).
• Uses a "guess codeword". The guess codeword is the codeword of an adjacent, already coded vector.
• Orchard's principle:
    – If r is the distance between the guess codeword and the query vector q, then the nearest codeword to q must be within a distance of 2r of the guess codeword.
Orchard’s Principle
[Figure: the query at distance r from the guess codeword, with circles of radius r and 2r around the guess. The nearest codeword is within distance r of the query, hence within 2r of the guess codeword.]
Orchard Data Structure
• For each codeword, sort the other codewords by distance to the codeword.

[Figure: the resulting array A[1..8,1..7] for a codebook of 8 codewords. Entry A[i,j] stores the index of, and the distance to, the j-th closest codeword to codeword i; for example, one row lists the indices 6 5 4 2 3 8 7.]
Basic Orchard Algorithm
Let i be the index of the initial guess codeword;
r := b := ||c_i - q||;
best-index := i;
j := 1;
while A[i,j].distance < 2*r do
    k := A[i,j].index;
    if ||c_k - q|| < b then
        best-index := k;
        b := ||c_k - q||;
    j := j + 1;

This algorithm searches all the codewords within distance 2r of the guess codeword.
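A direct Python rendering of the basic algorithm (a sketch; A is the precomputed array from the previous slide, built here for completeness):

from math import sqrt

def dist(u, v):
    # Euclidean distance between two vectors
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def build_orchard(codebook):
    # A[i] = the other codewords as (index, distance-to-codeword-i)
    # pairs sorted by distance: O(n^2 log n) time, O(n^2) space
    return [sorted(((j, dist(ci, cj))
                    for j, cj in enumerate(codebook) if j != i),
                   key=lambda t: t[1])
            for i, ci in enumerate(codebook)]

def orchard_search(q, codebook, A, guess):
    # search every codeword within 2r of the guess codeword
    r = b = dist(codebook[guess], q)
    best = guess
    for k, dk in A[guess]:
        if dk >= 2 * r:          # sorted list: no later entry can qualify
            break
        d = dist(codebook[k], q)
        if d < b:
            best, b = k, d
    return best, b

codebook = [(0, 0), (3, 1), (5, 5), (9, 2), (2, 8)]
A = build_orchard(codebook)
print(orchard_search((4, 4), codebook, A, guess=1))   # nearest is (5, 5)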
Orchard Improvements
• Early bailout using squared distance:
    – The test ||c_k - q|| < b is done with early bailout using the comparison ||c_k - q||^2 < b^2.
• Switching lists:
    – When a nearer codeword is found, switch the search to its list.
    – Care must be taken to avoid computing the distance to the same codeword twice. Marking visited codewords solves this problem.
Switching Lists (1)-(2)

[Figures: two snapshots of list switching. When a nearer codeword becomes the best, the search moves to that codeword's sorted list; the circles of radius r and 2r shrink around the query, and codewords are marked visited or unvisited.]
Orchard Notes
• Very fast.
    – Appears to take O(log n) average time per search, but there is no proof of this performance.
• Requires too much memory.
    – Requires O(n^2) memory for the sorted lists.
    – Modification for large n: store only the m closest codewords per codeword, making an n x m array. If the search runs off the array, revert to linear search.
k-d Tree
• Jon Bentley, 1975.
• A tree used to store spatial data.
    – Nearest neighbor search.
    – Range queries.
    – Fast look-up.
• k-d trees are guaranteed log2 n depth, where n is the number of points in the set.
    – Traditionally, k-d trees store points in d-dimensional space, which are equivalent to vectors in d-dimensional space.
k-d Tree Construction
• If there is just one point, form a leaf with that point.
• Otherwise, divide the points in half by a line perpendicular to one of the axes.
• Recursively construct k-d trees for the two sets of points.
• Division strategies (a construction sketch follows below):
    – divide points perpendicular to the axis with the widest spread.
    – divide in a round-robin fashion.
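A short recursive construction sketch using the widest-spread strategy (points as tuples; the Node fields anticipate the node structure slide later in this lecture):

class Node:
    # internal nodes use axis/value; leaves hold a single point
    def __init__(self, axis=None, value=None, left=None, right=None, point=None):
        self.axis, self.value = axis, value
        self.left, self.right = left, right
        self.point = point

def build_kd(points):
    # one point -> leaf; otherwise split on the axis of widest spread
    if len(points) == 1:
        return Node(point=points[0])
    d = len(points[0])
    axis = max(range(d),
               key=lambda k: max(p[k] for p in points) - min(p[k] for p in points))
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    value = (pts[mid - 1][axis] + pts[mid][axis]) / 2   # divide the points in half
    return Node(axis, value, build_kd(pts[:mid]), build_kd(pts[mid:]))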
k-d Tree Construction (1)-(18)

[Figures: eighteen snapshots of k-d tree construction on nine points a-i in the x-y plane, dividing perpendicular to the widest spread at each step. Splits s1-s8 are added one at a time while the corresponding tree grows, ending with internal nodes s1 (x), s2 (y), s3 (x), s4 (y), s5 (x), s6 (y), s7 (y), s8 (y) and leaves a, b, d, e, g, c, f, h, i.]
k-d Tree Construction Complexity
• First sort the points in each dimension.
    – O(dn log n) time and dn storage.
    – These are stored in A[1..d, 1..n].
• Finding the widest spread and dividing equally into two subsets can be done in O(dn) time.
• Constructing the whole k-d tree takes O(dn log n) time and dn storage.
k-d Tree Codebook Organization

[Figure: a codebook organized as a k-d tree.]
k-d Tree Splitting

Sorted points in each dimension (for the points a-i above):
    x: a d g b e i c h f
    y: a c b d f e h g i

• The max spread is the larger of fx - ax and iy - ay.
• In the selected dimension, the middle point in the list splits the data.
• To build the sorted lists for the other dimensions, scan the sorted list, adding each point to one of two sorted lists.
Node Structure for k-d Trees
• A node has 5 fields:
    – axis (splitting axis)
    – value (splitting value)
    – left (left subtree)
    – right (right subtree)
    – point (holds a point if left and right children are null)
k-d Tree Nearest Neighbor Search
NNS(q, root, p, infinity)    {initial call}

NNS(q: point, n: node, p: ref point, w: ref distance)
    if n.left = n.right = null then {leaf case}
        w' := ||q - n.point||;
        if w' < w then w := w'; p := n.point;
    else if w = infinity then
        if q(n.axis) < n.value then
            NNS(q, n.left, p, w);
            if q(n.axis) + w > n.value then NNS(q, n.right, p, w);
        else
            NNS(q, n.right, p, w);
            if q(n.axis) - w < n.value then NNS(q, n.left, p, w);
    else {w is finite}
        if q(n.axis) - w < n.value then NNS(q, n.left, p, w);
        if q(n.axis) + w > n.value then NNS(q, n.right, p, w);
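The same search as compact, runnable Python (a sketch reusing the Node class and build_kd from the construction sketch earlier; it returns the nearest point and distance instead of using reference parameters, and folds the w = infinity case into a general near-side-first descent):

from math import sqrt, inf

def nns(q, node, best=(None, inf)):
    # returns (nearest point, distance); prunes any subtree whose
    # splitting line is farther from q than the current best distance
    if node.point is not None:                       # leaf case
        w = sqrt(sum((a - b) ** 2 for a, b in zip(q, node.point)))
        return (node.point, w) if w < best[1] else best
    near, far = ((node.left, node.right)
                 if q[node.axis] < node.value
                 else (node.right, node.left))
    best = nns(q, near, best)                        # query's side first
    if abs(q[node.axis] - node.value) < best[1]:     # ball crosses the split?
        best = nns(q, far, best)
    return best

tree = build_kd([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nns((6, 5), tree))                             # -> ((5, 4), ~1.414)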
k-d Tree NNS (1)-(21)

[Figures: twenty-one snapshots of a nearest neighbor search on the k-d tree built above. The search descends to the leaf region containing the query point, records the current best distance w, and backtracks, entering only those subtrees whose splitting line lies within w of the query; the radius-w circle shrinks as closer points are found.]
Notes on k-d NNS
• Has been shown to run in O(log n) average time per search in a reasonable model (assuming d is a constant).
• For VQ, O(log n) appears to be correct in practice.
• Storage for the k-d tree is O(n).
• Preprocessing time is O(n log n), assuming d is a constant.
Alternative: the PCP Tree

• Zatloukal, Johnson, Ladner (1999).
• Organizes the tree using principal component partitioning (a split sketch follows below).
    – Partition the data perpendicular to the line that minimizes the sum of the squared distances of the points to the line. An eigenvector computation is required.
• About as easy to construct as the k-d tree.
• In NNS, processing per node in the PCP tree is more expensive than in the k-d tree, but fewer codewords are searched.
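A sketch of a single principal-component split (assuming numpy; a full PCP tree would recurse on the two halves):

import numpy as np

def pcp_split(points):
    # Split perpendicular to the principal component: the direction of
    # maximum variance, equivalently the line through the centroid that
    # minimizes the summed squared distances of the points to it.
    X = np.asarray(points, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered.T)                   # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric eigensolver
    direction = eigvecs[:, -1]                 # eigenvector of largest eigenvalue
    proj = centered @ direction                # signed position along the axis
    cut = np.median(proj)
    return X[proj <= cut], X[proj > cut]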
Principal Component Partition

[Figure: a point set split by a line perpendicular to its principal component.]
PCP Tree vs. k-d Tree

[Figure: side-by-side partitions of the same point set by a PCP tree and a k-d tree.]
Comparison in Time per Search
[Figure: bar chart of normalized time per search (0-7) versus dimension (4D, 16D, 64D) for Orchard, k-d tree, and PCP tree, with 4,096 codewords.]
NNS Summary
• Orchard
    – fastest
    – excessive memory
    – good guess codeword needed
• k-d tree
    – good in low dimension
    – small storage
    – no guess codeword
• PCP tree
    – best in high dimension
    – fewer codewords searched than k-d
    – small storage
    – no guess codeword
Notes on VQ
• Works well in some applications.
    – Requires training.
• Has some interesting algorithms.
    – Codebook design.
    – Nearest neighbor search.
• Variable length codes for VQ.
    – PTSVQ - pruned tree-structured VQ (Chou, Lookabaugh, and Gray, 1989).
    – ECVQ - entropy-constrained VQ (Chou, Lookabaugh, and Gray, 1989).