Transcript of people.irisa.fr/.../CompressionTools_DIIC3...1011.pdf
History / Table of Content
COMPRESSION
O. Le Meur, [email protected]
Univ. of Rennes 1, http://www.irisa.fr/temics/staff/lemeur/
October 2010
VERSION:
2009-2010: Document creation, done by OLM;
2010-2011: Document updated, done by OLM: major revisions of the part concerning lossless vs lossy coding.
TOOLS FOR IMAGE AND VIDEO COMPRESSION
1 Introduction
2 Entropy Coding
3 Other coding methods
4 Lossless vs lossy coding
5 Distortion/quality assessment
6 Quantization
7 Predictive Coding
8 Transform coding
9 Motion estimation
Why is it required to compress information?
Example (Facts)
Standard definition 720 × 576, 16 bits/pixel, 50 Hz:
6.6 Mbits/image (720 × 576 × 16)
330 Mbits/second...
Entropy Coding
1 Introduction
2 Entropy Coding: Some definitions, Definition of entropy coding, Fano-Shannon coding, Huffman coding, Arithmetic coding
3 Other coding methods
4 Lossless vs lossy coding
5 Distortion/quality assessment
6 Quantization
7 Predictive Coding
8 Transform coding
9 Motion estimation
Some definitions
Definition (Alphabet)
An alphabet is a set of data a1, ..., aN that we might wish to encode.
Definition (Code, Codewords)
A code C is a mapping from an alphabet a1, ..., aN to a set of finite-length binary strings. C(aj) is called the codeword for symbol aj.
Definition (Length of a codeword)
The length l(aj) of a codeword C(aj) is the number of bits of this codeword.
Definition (Fixed length code)
A fixed length code is a code such that l(aj) = l(ai), ∀i, j.
Definition (Variable Length Code (VLC))
A variable length code is a code that is not a fixed length code.
Some definitions
Definition (Prefix code)
A code is called a prefix code (instantaneous code) if no codeword is a prefix of another codeword.
Definition (Optimal prefix code)
Assume an alphabet of N symbols with probabilities p(ai). An optimal prefix code C is a prefix code with minimal average length, that is, if C′ is another prefix code and l′(ai) are the lengths of the codewords of C′, then
∑_{i=1}^{N} l(ai) p(ai) ≤ ∑_{i=1}^{N} l′(ai) p(ai)
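Both properties are easy to check mechanically. Below is a minimal Python sketch (the function names are ours, not from the slides) that tests the prefix property and computes the average length of a code:

```python
from itertools import combinations

def is_prefix_code(codewords):
    """True if no codeword is a prefix of another (instantaneous code)."""
    return not any(a.startswith(b) or b.startswith(a)
                   for a, b in combinations(codewords, 2))

def average_length(code, probs):
    """Average codeword length: sum over symbols of l(ai) * p(ai)."""
    return sum(len(code[s]) * p for s, p in probs.items())
```

For instance, the Fano-Shannon codewords derived later in this document form a prefix code, while {0, 01} does not.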
Entropy coding
Definition (Entropy coding)
Entropy coding converts a vector X of integers from a source S into a binary stream Y. It exploits the redundancies in the statistical distribution of X to reduce the size of Y as much as possible (Variable Length Codes).
Ideally, the codewords are optimal such that H(S) ≤ l ≤ H(S) + 1, with the average length l = ∑_i l(ai) p(ai).
Remark
The lower bound for the number of bits of Y is the Shannon entropy H(S), given by H(S) = −∑_i p(ai) × log2(p(ai)).
This is lossless data compression...
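The entropy bound is a one-liner to compute. A minimal sketch (our helper, using the symbol probabilities of the Fano-Shannon example that follows):

```python
import math

def entropy(probabilities):
    """Shannon entropy H(S) = -sum_i p(ai) * log2(p(ai)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# the eight-symbol source used in the examples below
probs = [0.25, 0.21, 0.15, 0.14, 0.0625, 0.0625, 0.0625, 0.0625]
```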
Fano-Shannon coding
Algorithm:
1. Sort symbols according to their probabilities;
2. Recursively divide them into two (nearly) equiprobable parts;
3. One part is assigned a 0, the other a 1.
Example (A = a0, ..., a7; the probability of each symbol is given below)
a7: p(a7) = 0.0625
a6: p(a6) = 0.0625
a5: p(a5) = 0.0625
a4: p(a4) = 0.0625
a3: p(a3) = 0.14
a2: p(a2) = 0.15
a1: p(a1) = 0.21
a0: p(a0) = 0.25
(The successive binary splits of the sorted symbol list form a code tree; only the resulting codewords are reproduced here.)
C(a7)=1111
C(a6)=1110
C(a5)=1101
C(a4)=1100
C(a3)=101
C(a2)=100
C(a1)=01
C(a0)=00
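The three steps above can be sketched in Python as follows. This is our sketch, not the slides' code; the split rule (choose the cut that makes the two parts as equiprobable as possible) is one common variant, and on this example it reproduces the codewords above:

```python
def shannon_fano(symbols):
    """Fano-Shannon code. symbols: list of (name, probability) pairs,
    sorted by decreasing probability. Returns {name: codeword}."""
    codes = {}

    def split(group, prefix):
        if len(group) == 1:
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(p for _, p in group)
        # choose the cut that makes the two parts as equiprobable as possible
        best_k, best_diff, acc = 1, total, 0.0
        for k in range(1, len(group)):
            acc += group[k - 1][1]
            if abs(2 * acc - total) < best_diff:
                best_k, best_diff = k, abs(2 * acc - total)
        split(group[:best_k], prefix + "0")   # one part is set to 0
        split(group[best_k:], prefix + "1")   # the other to 1

    split(symbols, "")
    return codes
```

On the eight-symbol source above, the resulting average length is 2.79 bits/symbol.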
Huffman coding
David Huffman proposed in 1952 a method for building an optimal prefix code for a given source S. Its average word length l is in the range H(S) ≤ l ≤ H(S) + 1.
The proposed algorithm rests on three principles:
1 if p(X = xj) > p(X = xi), i ≠ j, then l(xj) ≤ l(xi);
2 the two symbols having the smallest probabilities have codewords of the same length;
3 these two codewords share the same first nmax − 1 bits and differ only in their last bit.
Algorithm
1 Sort symbols according to their probabilities;
2 A binary tree is generated from left to right by taking the two least probable symbols and merging them into an equivalent symbol whose probability equals the sum of the two;
3 The process is repeated until just one symbol remains;
4 The tree can then be read backwards, from right to left, assigning different bits to different branches.
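The merge loop can be written compactly with a priority queue. A minimal Python sketch (our code, not the author's; the tie-break counter keeps heap entries comparable when probabilities are equal):

```python
import heapq

def huffman(probs):
    """Build a Huffman code for {symbol: probability}; returns {symbol: codeword}."""
    # each heap entry: (probability, tie-break counter, partial code table)
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # least probable node
        p2, _, c2 = heapq.heappop(heap)   # second least probable node
        # the two merged subtrees receive different leading bits
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]
```

Ties may be broken differently from the slides' tree, so individual codewords can differ, but the average length is the optimal one in every case.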
Huffman coding
Example
a7: 0.0625
a6: 0.0625
a5: 0.0625
a4: 0.0625
a3: 0.14
a2: 0.15
a1: 0.21
a0: 0.25
(Successive merges of the two least probable nodes: 0.0625 + 0.0625 = 0.125, 0.0625 + 0.0625 = 0.125, 0.125 + 0.125 = 0.25, 0.14 + 0.15 = 0.29, 0.21 + 0.25 = 0.46, 0.25 + 0.29 = 0.54, 0.46 + 0.54 = 1.00; only the resulting codewords of the tree are reproduced here.)
C(a7)=1111
C(a6)=1110
C(a5)=1101
C(a4)=1100
C(a3)=011
C(a2)=010
C(a1)=10
C(a0)=00
H(S) = 2.781
l = 2.79
Huffman coding
Huffman coding individually codes each input symbol according to the symbol probabilities. An integer number of bits is associated with each symbol, and this number is never less than 1.
Although Huffman coding is optimal for symbol-by-symbol coding, its efficiency is sometimes not as good as one could expect. This is the case when the probability of one or more symbols is very high.
Example
We assume that A = a0, a1, a2, with p(a0) = 0.02, p(a1) = 0.18, p(a2) = 0.8.
C(a0)=11
C(a1)=10
C(a2)=0
H(S) = 0.8157
l = 1.2
Arithmetic coding
Definition
Arithmetic coding is a lossless encoding method that allows combining multiple symbols into a single codable unit. A message is then encoded as a real number in the interval [0, 1[.
Basic algorithm for arithmetic coding
1 Start with a current interval [L, H[ initialized to [0, 1[.
2 Subdivide it into subintervals, one for each possible symbol. The size of a symbol's subinterval is proportional to the probability of that symbol.
3 Select the subinterval corresponding to the actual symbol and make it the new current interval (it is then subdivided into smaller ones as previously described, so go back to step 2);
4 The process above is repeated until all symbols are encoded or until the maximum precision of the machine is reached.
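The interval-narrowing loop of steps 1-3 can be sketched as follows (our code; it uses floating point, so it is only suitable for short messages, which is exactly the precision limitation mentioned in step 4):

```python
def arithmetic_encode(message, probs):
    """Return the final interval [lo, hi[ after encoding the whole message."""
    # cumulative lower bound of each symbol's subinterval in [0, 1[
    cum, acc = {}, 0.0
    for s, p in probs.items():
        cum[s] = acc
        acc += p
    lo, hi = 0.0, 1.0
    for s in message:
        width = hi - lo
        # zoom into the subinterval of the current symbol
        lo, hi = lo + width * cum[s], lo + width * (cum[s] + probs[s])
    return lo, hi
```

On the example of the next slide, encoding acb yields the interval [0.576, 0.594[.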
Arithmetic coding
Example
We assume that A = a, b, c, with p(a) = 0.6, p(b) = 0.3, p(c) = 0.1. Suppose we want to encode the message acb.
Subdivision of [0, 1[: a → [0, 0.6[, b → [0.6, 0.9[, c → [0.9, 1[.
After a, the interval [0, 0.6[ is subdivided into [0, 0.36[ (a), [0.36, 0.54[ (b), [0.54, 0.6[ (c).
After c, the interval [0.54, 0.6[ is subdivided into [0.54, 0.576[ (a), [0.576, 0.594[ (b), [0.594, 0.6[ (c).
Final interval: [0.576, 0.594[. acb can be coded by the number 0.59375 (= (0.10011)2). This is the shortest binary fraction that lies within the interval.
(0.10011)2 = (1 × 1/2 + 0 × 1/2² + 0 × 1/2³ + 1 × 1/2⁴ + 1 × 1/2⁵)10 = 0.59375
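Finding the shortest binary fraction inside the final interval can be sketched as follows (our helper, not from the slides; it uses the simple in-interval criterion stated above):

```python
import math

def shortest_binary_fraction(lo, hi):
    """Smallest bit count n such that some multiple of 2**-n lies in [lo, hi[.
    Returns (k, n), the fraction being k / 2**n."""
    n = 1
    while True:
        k = math.ceil(lo * 2 ** n)   # smallest n-bit fraction >= lo
        if k / 2 ** n < hi:
            return k, n
        n += 1
```

For [0.576, 0.594[ this gives k = 19, n = 5, i.e. 19/32 = 0.59375 = (0.10011)2 as in the example.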
Arithmetic coding
Example
We assume that A = a, b, c, with p(a) = 0.6, p(b) = 0.3, p(c) = 0.1. Suppose we want to decode the codeword 0.10011 (0.59375).
Subdivision of [0, 1[: a → [0, 0.6[, b → [0.6, 0.9[, c → [0.9, 1[. 0.59375 falls in [0, 0.6[ → a.
[0, 0.6[ is subdivided into [0, 0.36[, [0.36, 0.54[, [0.54, 0.6[. 0.59375 falls in [0.54, 0.6[ → c.
[0.54, 0.6[ is subdivided into [0.54, 0.576[, [0.576, 0.594[, [0.594, 0.6[. 0.59375 falls in [0.576, 0.594[ → b.
Final message: acb
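The decoding loop mirrors encoding: at each step, find which subinterval contains the received number, output that symbol, and zoom in. A minimal sketch (our code; the number of symbols must be known or signalled, e.g. by an end-of-message symbol, so here we pass it explicitly):

```python
def arithmetic_decode(value, probs, n_symbols):
    """Decode n_symbols from the real number 'value' in [0, 1[."""
    # cumulative subinterval [c_lo, c_hi[ of each symbol in [0, 1[
    intervals, acc = {}, 0.0
    for s, p in probs.items():
        intervals[s] = (acc, acc + p)
        acc += p
    out, lo, hi = [], 0.0, 1.0
    for _ in range(n_symbols):
        width = hi - lo
        for s, (c_lo, c_hi) in intervals.items():
            if lo + width * c_lo <= value < lo + width * c_hi:
                out.append(s)                    # value falls in this symbol's slot
                lo, hi = lo + width * c_lo, lo + width * c_hi
                break
    return "".join(out)
```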
Arithmetic coding
The previous description is rather theoretical and difficult to implement. Two drawbacks can be mentioned:
the shrinking current interval requires the use of high-precision arithmetic (especially when the sequence to encode is long, since the intervals become very small);
encoding delay (no output until the entire message has been read).
Practical arithmetic coding [Witten et al., 87]
Let us define an interval [0, 1[. i and s denote the lower and the higher bound of the interval, respectively. f is a bit counter called underflow. Three rules are applied to the interval of the current symbol:
(R1) if the interval satisfies s ≤ 0.5, then i → 2i and s → 2s; send a bit 0, then f bits of 1, and set f = 0;
(R2) if the interval satisfies i ≥ 0.5, then i → 2(i − 0.5) and s → 2(s − 0.5); send a bit 1, then f bits of 0, and set f = 0;
(R3) if the interval satisfies 0.25 ≤ i < 0.5 ≤ s < 0.75, then i → 2(i − 0.25) and s → 2(s − 0.25), and f is incremented (f++).
Arithmetic coding
Example (Arithmetic coding with incremental transmission)
p(a) = 0.6; p(b) = 0.2; p(c) = 0.2. We want to code abc.

Next symbol   i       s       Rule   Code   f
a             0       0.6     --     --     0
b             0.36    0.48    R1     0      0
              0.72    0.96    R2     1      0
              0.44    0.92
c             0.792   0.92    R2     1      0
              0.584   0.84    R2     1      0
              0.168   0.68

Code for the message abc: 0111.
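Rules R1-R3 can be sketched in Python as below. This is our sketch, under the assumptions of the example; flushing the final interval and any pending underflow bits at end of message is omitted, so only the incrementally transmitted bits are returned (intermediate interval values may differ slightly from the table depending on rounding, but the emitted bits for abc are 0111):

```python
def encode_renorm(message, probs):
    """Arithmetic coding with incremental transmission (rules R1-R3).
    End-of-message flushing is omitted for brevity."""
    # cumulative subinterval of each symbol in [0, 1[
    cum, acc = {}, 0.0
    for s, p in probs.items():
        cum[s] = (acc, acc + p)
        acc += p
    i, s_, f, bits = 0.0, 1.0, 0, []      # [i, s_[ is the current interval
    for sym in message:
        w = s_ - i
        i, s_ = i + w * cum[sym][0], i + w * cum[sym][1]
        while True:
            if s_ <= 0.5:                          # R1: send 0, then f ones
                bits += ["0"] + ["1"] * f
                f, i, s_ = 0, 2 * i, 2 * s_
            elif i >= 0.5:                         # R2: send 1, then f zeros
                bits += ["1"] + ["0"] * f
                f, i, s_ = 0, 2 * (i - 0.5), 2 * (s_ - 0.5)
            elif 0.25 <= i and s_ < 0.75:          # R3: underflow, f++
                f, i, s_ = f + 1, 2 * (i - 0.25), 2 * (s_ - 0.25)
            else:
                break
    return "".join(bits)
```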
Other coding methods
1 Introduction
2 Entropy Coding
3 Other coding methods: Run-Length Coding, Lempel-Ziv-Welch (LZW) algorithm
4 Lossless vs lossy coding
5 Distortion/quality assessment
6 Quantization
7 Predictive Coding
8 Transform coding
9 Motion estimation
Run-Length Coding
Definition
A sequence of identical symbols is called a run. Each run is represented by a single codeword (the determination of these codewords is, most of the time, based on Huffman's procedure).
Example
We assume that A = a, b, c. We want to code the message aaaabcbc.
aaaabcbc → (a, 4)(b, 1)(c, 1)(b, 1)(c, 1).
Remark:
Only interesting when there exist large uniform areas (e.g. fax documents)...
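Extracting the runs is a one-liner with itertools.groupby (a minimal sketch; mapping the (symbol, count) pairs to actual codewords, e.g. Huffman ones as noted above, is a separate step):

```python
from itertools import groupby

def run_length_encode(message):
    """Represent each run of identical symbols by a (symbol, run length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(message)]
```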
Lempel-Ziv-Welch (LZW) algorithm
Denition
Lempel-Ziv-Welch (LZW) is a universal lossless data compression algorithm created byLempel, Ziv, and Welch. It was published by Welch in 1984 as an improvedimplementation of the LZ78 algorithm published by Lempel and Ziv in 1978. They areboth dictionary coders, unlike minimum redundancy coders [Welch,84].
Example
Dictionnary Message Coded symbol index New entrya,b,c abababaacb a 0 ab
a,b,c,ab bababaacb b 1 baa,b,c,ab,ba ababaacb ab 3 aba
a,b,c,ab,ba,aba abaacb aba 5 abaaa,b,c,ab,ba,aba,abaa acb a 0 ac
a,b,c,ab,ba,aba,abaa,ac cb c 2 cba,b,c,ab,ba,aba,abaa,ac,cb b b 1
Message abababaacb is coded by 0,1,3,5,0,2,1.
21
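The encoding procedure illustrated above can be sketched as follows (a minimal Python sketch; names are illustrative):

```python
def lzw_encode(message, alphabet):
    # Initialize the dictionary with the single-symbol alphabet.
    dictionary = {s: i for i, s in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in message:
        extended = current + ch
        if extended in dictionary:
            current = extended                      # keep growing the match
        else:
            codes.append(dictionary[current])       # emit code of the longest match
            dictionary[extended] = len(dictionary)  # new entry: match + next char
            current = ch
    if current:
        codes.append(dictionary[current])           # flush the last match
    return codes

print(lzw_encode("abababaacb", "abc"))  # → [0, 1, 3, 5, 0, 2, 1]
```

Note that the new entry is registered before the matching restarts, so later occurrences of the same pattern are coded with a single index.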
Lempel-Ziv-Welch (LZW) algorithm: decoding

To decode an LZW-compressed message, we only need to know the initial dictionary. The additional entries are reconstructed during the decoding process; these entries are always simple concatenations of previous entries.

Example (0,1,3,5,0,2,1)

Dictionary                Index   Received symbol   Message      New entry
a,b,c                     0       a                 a            -
a,b,c                     1       b                 ab           ab
a,b,c,ab                  3       ab                abab         ba
a,b,c,ab,ba               5       ab+a              abababa      aba
a,b,c,ab,ba,aba           0       a                 abababaa     abaa
a,b,c,ab,ba,aba,abaa      2       c                 abababaac    ac
a,b,c,ab,ba,aba,abaa,ac   1       b                 abababaacb   cb

Special case (index 5): there is no dictionary entry for this index yet, because it is precisely the entry being built. In this case, the decoded word is the previously decoded word ab plus its first character a: we decode aba.
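The decoding procedure, including the special case of a not-yet-known index, can be sketched as:

```python
def lzw_decode(codes, alphabet):
    # Initialize the dictionary with the single-symbol alphabet.
    dictionary = {i: s for i, s in enumerate(alphabet)}
    previous = dictionary[codes[0]]
    message = previous
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # Special case: the code refers to the entry currently being
            # built, which must be the previous word plus its first char.
            entry = previous + previous[0]
        message += entry
        # Reconstruct the entry the encoder created at this step:
        # previous word + first character of the current word.
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return message

print(lzw_decode([0, 1, 3, 5, 0, 2, 1], "abc"))  # → abababaacb
```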
Lossless vs lossy coding

1 Introduction
2 Entropy Coding
3 Other coding methods
4 Lossless vs lossy coding
    Limitations of the lossless compression
    Element of Rate/Distortion theory
    Lagrangian formulation of the R-D problem
5 Distortion/quality assessment
6 Quantization
7 Predictive Coding
8 Transform coding
9 Motion estimation
Lossless vs lossy coding: limitations of the lossless compression

Reminder: the goal of lossless coding is to find code words C such that the average length l_C is close to the entropy of the source H(S).

Lossless (distortionless) coding achieves only small compression ratios:
2 or 3 for natural images;
3 or 4 for video sequences.

To increase this ratio, it is necessary to degrade the source quality:
reduction of the source image resolution (HD, SD, CIF, QCIF);
frame dropping...
Lossless vs lossy coding: Element of Rate/Distortion theory

Rate-distortion theory is the branch of information theory that addresses the problem of determining the minimal amount of information that must be communicated over a channel so that the source can be reconstructed at the receiver within a given distortion.

Brief introduction to the distortion measure (see next section):
d(x, y) ≥ 0, where x and y are the transmitted and received data;
d(x, y) = 0 if x = y.
Lossless vs lossy coding: Element of Rate/Distortion theory

Rate/Distortion theory calculates the minimum transmission bit-rate R for a required level of distortion D. Two dual formulations of the problem exist:

Given a maximum rate R0, minimize the distortion D.
Given a maximum level of distortion D0, minimize the rate R.

Both are constrained optimization problems.

[Figure: the R(D) curve, annotated with the entropy/lossless-coding point R(0) (source information), the redundancy, the maximum distortion D0 and the corresponding rate R(D0); two further plots mark R0 and D0 for the two constrained problems.]
Lossless vs lossy coding

The function relating rate and distortion is found as the solution of the following minimization problem:

R(D0) = min_{D ≤ D0} I(X;Y)

where
D0 is the maximum allowed average distortion, with D(X,Y) = E[d(x,y)] = Σ_u Σ_v p(u,v) d(u,v);
I(X;Y) is the mutual information.
Lossless vs lossy coding

R(D0) = min_{D ≤ D0} I(X;Y)
      = min_{D ≤ D0} (H(X) − H(X|Y))
      = H(X) − max_{D ≤ D0} H(X|Y)
      = H(X) − max_{D ≤ D0} H(X − Y|Y)

This relation suggests that the source coder has to produce a distortion X − Y that is statistically independent of the reconstructed signal Y. Of course, this is not always possible!
Lossless vs lossy coding

Shannon lower bound:

R(D0) ≥ H(X) − max_{D ≤ D0} H(X − Y)

Rate-distortion theory tells us that no compression system can operate below the R(D) bound. The closer a practical compression system gets to this lower bound, the better it performs.
Lossless vs lossy coding

R(D) is usually very difficult to compute and can generally be found only approximately. However, the constrained problem above can be solved in closed form in a few cases:

Memoryless Gaussian source;
Gaussian source with memory.
Lossless vs lossy coding: memoryless Gaussian source

Let O = {o(s), s ∈ S} be a random source of discrete observations on a grid S with a Gaussian PDF:

p[o(s) = i] = p_i = (1/(√(2π)σ)) exp(−i²/(2σ²))

The entropy is given by H = −Σ_i p_i log2 p_i. Since log2 p_i = log2(1/(√(2π)σ)) − (i²/(2σ²)) log2 e:

H = −log2(1/(√(2π)σ)) Σ_i p_i + ((log2 e)/(2σ²)) Σ_i p_i i²    (1)
  = log2(√(2π)σ) + (1/2) log2 e                                 (2)
  = (1/2) log2(2πσ²) + (1/2) log2 e                             (3)
  = (1/2) log2(2πeσ²)                                           (4)
Lossless vs lossy coding: memoryless Gaussian source

We suppose that the source is Gaussian with variance σ². With D = E[(X − Y)²]:

I(X;Y) = H(X) − H(X|Y)
       = H(X) − H(X − Y|Y)
       ≥ H(X) − H(X − Y)                              (conditioning reduces entropy)
       ≥ (1/2) log2(2πeσ²) − H(N(0, E[(X − Y)²]))     (the normal distribution maximizes the entropy for a given second moment)
       = (1/2) log2(2πeσ²) − (1/2) log2(2πeD)
       = (1/2) log2(σ²/D)
Lossless vs lossy coding: memoryless Gaussian source

R(D) = (1/2) log2(σ²/D),  for 0 ≤ D ≤ σ²

Equivalently, D(R) = σ² 2^(−2R).

Each additional bit of description reduces the expected distortion by a factor of 4.
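These closed-form expressions are easy to check numerically; a small sketch (function names are illustrative):

```python
import math

def rate(distortion, variance):
    # R(D) = 1/2 * log2(sigma^2 / D), valid for 0 < D <= sigma^2.
    return 0.5 * math.log2(variance / distortion)

def distortion(rate_bits, variance):
    # D(R) = sigma^2 * 2^(-2R): each extra bit divides distortion by 4.
    return variance * 2 ** (-2 * rate_bits)

sigma2 = 1.0
for r in range(4):
    print(r, distortion(r, sigma2))  # 1.0, 0.25, 0.0625, 0.015625
```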
Lossless vs lossy coding: Lagrangian formulation of the R-D problem

The following constrained problems:

min_θ D_θ subject to R_θ ≤ Rmax;
min_θ R_θ subject to D_θ ≤ Dmax;

are transformed into an unconstrained Lagrangian cost function:

J = D + λR, where λ is the Lagrange multiplier.

[Figure: the R(D) curve intersected by lines of constant J = D + λR, of slope −1/λ.]
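In a practical coder, this Lagrangian cost drives mode decision: each candidate coding mode yields an (R, D) pair, and the mode minimizing J = D + λR is selected. A minimal sketch (the candidate pairs are made up for illustration):

```python
def best_operating_point(points, lam):
    # points: list of (rate, distortion) candidates on or above the R-D curve.
    # Pick the one minimizing the Lagrangian cost J = D + lambda * R.
    return min(points, key=lambda rd: rd[1] + lam * rd[0])

candidates = [(0.5, 9.0), (1.0, 4.0), (2.0, 1.0), (4.0, 0.3)]
print(best_operating_point(candidates, lam=0.5))  # small lambda favors low distortion
print(best_operating_point(candidates, lam=8.0))  # large lambda favors low rate
```

Sweeping λ from 0 to ∞ traces out the operating points on the convex hull of the candidate set, which is how an encoder navigates the rate-distortion trade-off.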
Distortion/quality assessment

1 Introduction
2 Entropy Coding
3 Other coding methods
4 Lossless vs lossy coding
5 Distortion/quality assessment
    Taxonomy
    Signal fidelity
    Perceptual metric
    Examples
    Performances
    Extension to the temporal dimension
6 Quantization
7 Predictive Coding
8 Transform coding
9 Motion estimation
Taxonomy

Distortion/quality metrics can be divided into three categories:

1 Full-Reference (FR) metrics, for which both the original and the distorted images are required (benchmarking, compression);
2 Reduced-Reference (RR) metrics, for which a description of the original and the distorted image is required (network monitoring);
3 No-Reference (NR) metrics, for which the original image is not required (network monitoring).

Each category can be divided into two subcategories: metrics based on signal fidelity and metrics based on properties of the human visual system.
Peak Signal to Noise Ratio (PSNR)

The PSNR is the most popular quality metric. This simple metric just measures the mathematical difference between each pixel of the degraded image and the original image.

Definition (PSNR)
Let I and D be the original and impaired images, respectively, each of M pixels and coded with n bits:

PSNR = 10 log10( (2^n − 1)² / MSE ) dB,

with the Mean Squared Error MSE = (1/M) Σ_{(x,y)} (I(x,y) − D(x,y))².

A high value indicates that the amount of impairment is small; a small value indicates a strong degradation.
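The PSNR definition above can be sketched directly (a minimal implementation over flat pixel lists; the sample values are illustrative):

```python
import math

def mse(original, degraded):
    # Mean Squared Error over all pixels (flat lists of equal length).
    return sum((i - d) ** 2 for i, d in zip(original, degraded)) / len(original)

def psnr(original, degraded, bits=8):
    # PSNR = 10 * log10(peak^2 / MSE), with peak = 2^bits - 1.
    peak = 2 ** bits - 1
    return 10 * math.log10(peak ** 2 / mse(original, degraded))

orig = [63, 65, 68, 67]
deg = [56, 72, 72, 72]
print(round(psnr(orig, deg), 2))  # → 32.72
```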
Peak Signal to Noise Ratio (PSNR)

The PSNR is not always well correlated with human judgment (MOS, Mean Opinion Score). The reason is simple: this metric does not take the properties of the human visual system into account.

Example
[Figure: (a) and (b), from [Nadenau,00], show the impact of a Gabor patch on our perception; (c) original image; (d) original + uniform noise.]
Peak Signal to Noise Ratio (PSNR)

Example (these three degraded pictures have the same PSNR...)
[Figure: (a) original; (b) contrast stretched; (c) blur; (d) JPEG.]
Metric based on the error visibility

For this type of metric, the behavior of the visual cells is simulated:

[Diagram: the original image I and the degraded image D each pass through a perceptual color space transform, a PSD, a CSF weighting and a masking stage; the difference between the two outputs is pooled into a quality score.]

PSD: Perceptual Subband Decomposition (Wavelet, Gabor, Fourier);
CSF: Contrast Sensitivity Function.
Metric based on the error visibility

Example
VDP (Visible Differences Predictor) [Daly,93];
WQA (Wavelet-based Quality Assessment) [Ninassi et al.,08a];
VQM (Video Quality Model) [Pinson et al.,04]...
Metric based on the structural similarity

SSIM stands for Structural Similarity index. Image degradations are considered here as perceived structural information loss rather than perceived errors [Wang et al.,04a].

Definition
Let I and D be the original and the degraded images, respectively.

S(x, y) = l(x, y) × c(x, y) × s(x, y)    (5)

The luminance comparison measure: l(x, y) = 2 μx μy / (μx² + μy²)
The contrast comparison measure: c(x, y) = 2 σx σy / (σx² + σy²)
The structural comparison measure: s(x, y) = σxy / (σx σy)

In practice, stabilizing constants C1 and C2 are added:

SSIM(x, y) = (2 μx μy + C1)(2 σxy + C2) / ((μx² + μy² + C1)(σx² + σy² + C2))

SSIM → 1 indicates the best quality; SSIM → 0 indicates a poor quality.
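A minimal global-SSIM sketch (computed over whole pixel lists rather than the sliding window used in practice; the constants C1 and C2 follow the usual choice for 8-bit images but are assumptions here):

```python
def ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    # Global SSIM between two equal-length pixel lists, following
    # SSIM = (2*mu_x*mu_y + C1)(2*sigma_xy + C2) /
    #        ((mu_x^2 + mu_y^2 + C1)(sigma_x^2 + sigma_y^2 + C2)).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n       # variance of x
    vy = sum((b - my) ** 2 for b in y) / n       # variance of y
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

print(ssim([63, 65, 68, 67], [63, 65, 68, 67]))  # identical images → 1.0
print(ssim([63, 65, 68, 67], [56, 72, 72, 72]))  # degraded → below 1
```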
Example of distortion maps

Example
[Figure: (a) original; (b) degraded; distortion maps computed by (c) MSE, (d) WQA, (e) SSIM.]
Impact of the visual masking (from [Ninassi et al.,08b])

Example
[Figure: (a) original; (b) degraded; (c) WQA without masking; (d) WQA with masking.]
PSNR and SSIM computation example

Example (uniform quantization, mid-riser, ∆ = 16, N = 16, 8 bits)

Flat area:
O =
63 65 68 67
63 63 66 66
60 67 65 56
67 65 63 65
Oq =
56 72 72 72
56 56 72 72
56 72 72 56
72 72 56 72
MSE = 35.68; PSNR = 32.6 dB; SSIM = 0.81.

Textured area:
O =
86  97  28  241
87  27  207 149
151 63  156 201
78  148 77  31
Oq =
88  104 24  248
88  24  200 152
152 56  152 200
72  152 72  24
MSE = 23.68; PSNR = 34.38 dB; SSIM = 0.999.

Edges:
O =
45 45  45  45
45 45  167 167
45 167 167 167
45 167 167 167
Oq =
40 40  40  40
40 40  168 168
40 168 168 168
40 168 168 168
MSE = 13; PSNR = 36.99 dB; SSIM = 0.998.
Comparisons with subjective tests (from [Ninassi et al.,08b])

Description of the three subjective experiments: IVC, Toyama1 and Toyama2.

Experiment   Distortions                    Contents / Distorted images   Protocol   Viewing conditions    Display   Observers (#)
IVC          DCT coding, DWT coding, blur   10 / 120                      DSIS       ITU-R BT 500.10, 6H   CRT       French (20)
Toyama1      DCT coding, DWT coding         14 / 168                      ACR        ITU-R BT 500.10, 4H   CRT       Japanese (16)
Toyama2      DCT coding, DWT coding         14 / 168                      ACR        ITU-R BT 500.10, 4H   LCD       French (27)

(DSIS = Double Stimulus Impairment Scale; ACR = Absolute Category Rating.)

             IVC (DSIS)              Toyama2 (ACR)           Toyama1 (ACR)
Metrics      CC     SROCC   RMSE     CC     SROCC   RMSE     CC     SROCC   RMSE
MOSp(WQA)    0.923  0.921   0.48     0.937  0.941   0.38     0.919  0.923   0.514
MOSp(PSNR)   0.768  0.77    0.795    0.699  0.685   0.777    0.685  0.678   0.943
MOSp(SSIM)   0.832  0.844   0.691    0.823  0.826   0.618    0.814  0.82    0.754

MOS = Mean Opinion Score;
CC = linear correlation coefficient [−1, 1];
RMSE = root mean square error [0, +∞[;
SROCC = Spearman rank-order correlation coefficient (a non-parametric measure of correlation) [−1, 1].
Quality metric for video

How do we build our own opinion regarding the quality of a video? We are quick to criticize and slow to forgive...

Example (video quality metrics)

Temporal SSIM [Wang et al.,04b]:
1 reduce the importance of dark areas compared to lighter areas (deemed to be more attractive, which is debatable);
2 reduce the importance of spatial distortions when the dominant motion is high.

Temporal WQA [Ninassi et al.,09]:
1 the HVS integrates most of the visual information during a visual fixation (≈ 250 ms);
2 distortion is evaluated once the area is stabilized on the fovea (motion compensation);
3 the characteristics of the temporal distortions, such as the temporal frequency and the amplitude of the variations, impact the perception.
Quantization

1 Introduction
2 Entropy Coding
3 Other coding methods
4 Lossless vs lossy coding
5 Distortion/quality assessment
6 Quantization
    Scalar quantization
        Principle
        Uniform quantization
        Optimal quantization, Lloyd-Max algorithm
        Examples: uniform, semi-uniform and optimal quantizer
        Summary
    Vector quantization
        Principle
        Voronoi diagram
        Clustering and K-means
7 Predictive Coding
8 Transform coding
9 Motion estimation
Principle

Quantization is a process that represents a large set of values with a smaller one.

Definition (scalar quantization)
Q : X → C = {y_i, i = 1, 2, ..., N}
∀x ∈ X, Q(x) = y_i

N is the number of quantization levels;
X can be continuous (e.g. R) or discrete; C is always discrete (codebook, dictionary);
card(X) > card(C);
since in general x ≠ Q(x), some information is lost (lossy compression).
Uniform quantization

Definition
In uniform quantization, the quantization step size ∆ is fixed, whatever the signal amplitude.

[Figure: staircase input/output characteristic, with decision levels t_i and representative levels y_i.]

The decision levels are uniformly spaced: ∀i ∈ {1, 2, ..., N}, t_{i+1} − t_i = ∆.
The output values are the centers of the quantization intervals: ∀i ∈ {1, 2, ..., N}, y_i = (t_i + t_{i+1})/2.

Example of the nearest-neighbor quantizer (shown in the figure):
Q(x) = ∆ × ⌊x/∆ + 0.5⌋.
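The nearest-neighbor formula above translates directly into code (a minimal sketch):

```python
import math

def uniform_quantize(x, step):
    # Nearest-neighbor uniform quantizer: Q(x) = step * floor(x/step + 0.5).
    # Zero is a representative level, and each output is the center of a
    # bin of width `step`.
    return step * math.floor(x / step + 0.5)

print([uniform_quantize(v, 16) for v in (7, 63, 65)])  # → [0, 64, 64]
```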
Uniform quantization

A uniform quantizer is completely defined by the number of levels, the quantization step, and whether it is a mid-step or mid-riser quantizer.

A mid-step (mid-tread) quantizer: zero is one of the representative levels y_k.
A mid-riser quantizer: zero is one of the decision levels t_k.

[Figure: the two staircase characteristics side by side.]

Usually, a mid-riser quantizer is used if the number of representative levels is even, and a mid-step quantizer if the number of levels is odd.
Uniform quantization with dead zone

[Figure: a uniform quantizer compared with a uniform quantizer whose central interval around zero (the dead zone) is widened.]

Interest: small coefficients are removed by favouring the zero value, which increases the coding efficiency with a small visual impact.
Example of a uniform quantization

Example (original picture quantized to 8, 7, 6 and 4 bits/pixel)
[Figure: (a) original (8 bits per pixel); (b) 7 bpp (∆ = 2); (c) 6 bpp (∆ = 4); (d) 4 bpp (∆ = 16).]
Optimal quantization

Definition (optimal quantization)
An optimal quantization of a random variable X with probability distribution p(x) is obtained by a quantizer that minimizes a given metric:

L∞ norm: D = max |X − Q(X)|;
L1 norm: D = E[|X − Q(X)|];
L2 norm: D = E[(X − Q(X))²], called the Mean-Square Error (MSE). This is the most widely used.

Considering the MSE, we have:
if the random variable X is continuous: D = ∫_{xmin}^{xmax} p(x)(x − Q(x))² dx;
if the random variable X is discrete: D = Σ_{k=1}^{n} p(x_k)(x_k − Q(x_k))².
Example

Example (quantization error)

Hypotheses:
uniform quantization with a quantization step ∆;
X is a random variable with a uniform probability distribution p(x) over [xmin, xmax];
N is the number of representative levels.

[Figure: uniform pdf over [xmin, xmax], divided into N bins of width ∆; the quantization error lies in [−∆/2, ∆/2].]

Quantization step: ∆ = (xmax − xmin)/N
Probability distribution: p(x) = 1/(xmax − xmin) = 1/(N∆)
Example

Example (quantization error)

D = ∫_{xmin}^{xmax} p(x)(x − Q(x))² dx
  = N × ∫_{−∆/2}^{∆/2} p(x) x² dx        (taking 0 as the mid-point of a bin)
  = N × ∫_{−∆/2}^{∆/2} (1/(N∆)) x² dx
  = ∆²/12
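The ∆²/12 result can be checked by Monte Carlo simulation (a quick sketch; the source range and sample count are illustrative):

```python
import random

def quantization_noise(step, samples=200000, seed=0):
    # Quantize uniform samples with a nearest-neighbor quantizer and
    # measure the empirical mean squared error, expected ~ step^2 / 12.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = rng.uniform(0.0, 8.0)
        q = step * round(x / step)   # nearest representative level
        total += (x - q) ** 2
    return total / samples

step = 0.5
print(quantization_noise(step), step ** 2 / 12)  # both ≈ 0.0208
```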
Optimal quantizer

A uniform quantizer is not optimal if the source is not uniformly distributed.

Optimal quantizer: find the decision levels t_i and the representative levels y_i that minimize the distortion D.

To reduce the MSE, the idea is to decrease the bin size where the probability of occurrence is high and to increase it where the probability is low. For N representative levels and a probability density p(x), the distortion is given by:

D = ∫_{xmin}^{xmax} p(x)(x − Q(x))² dx = Σ_{k=1}^{N−1} ∫_{t_k}^{t_{k+1}} p(x)(x − y_k)² dx
Optimal quantizer

The optimal t_i and y_i satisfy ∂D/∂t_i = 0 and ∂D/∂y_i = 0.

Lloyd-Max quantizer [Lloyd,82, Max,60]
∂D/∂t_i = 0 ⇒ t_i = (y_i + y_{i+1})/2: t_i is the midpoint of y_i and y_{i+1}.
∂D/∂y_i = 0 ⇒ y_i = ∫_{t_{i−1}}^{t_i} x p(x) dx / ∫_{t_{i−1}}^{t_i} p(x) dx: y_i is the centroid of the interval [t_{i−1}, t_i].

⇒ given the {t_i}, we can find the corresponding optimal {y_i};
⇒ given the {y_i}, we can find the corresponding optimal {t_i}.

How can we find the optimal t_i and y_i simultaneously?
Lloyd-Max algorithm

The Lloyd-Max algorithm finds the representative levels y_i and the decision levels t_i that meet the previous conditions, with no prior knowledge.

Principle of the iterative process:
1 Start at k = 0 with a set of initial representative levels {y_1^(0), ..., y_N^(0)}.
2 Determine the new decision levels: t_i^(k+1) = (y_i^(k) + y_{i+1}^(k))/2.
3 Compute the distortion D^(k) and the relative error δ^(k).
4 If the stopping criterion δ^(k) < ε is met, stop; otherwise update the representative levels y_i^(k+1) = ∫_{t_{i−1}^(k+1)}^{t_i^(k+1)} x p(x) dx / ∫_{t_{i−1}^(k+1)}^{t_i^(k+1)} p(x) dx and go back to step 2.
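When only a discrete set of samples is available, the centroid integral reduces to the mean of the samples falling in each bin; a minimal Lloyd-Max sketch under that assumption (the initial representative levels are illustrative):

```python
def lloyd_max(samples, reps, iterations=10):
    # reps: initial representative levels y_i, sorted in increasing order.
    for _ in range(iterations):
        # Decision levels: midpoints between consecutive representatives.
        thresholds = [(a + b) / 2 for a, b in zip(reps, reps[1:])]
        # Assign each sample to its bin, then move each representative to
        # the centroid (mean) of its bin; keep it as-is if the bin is empty.
        bins = [[] for _ in reps]
        for x in samples:
            i = sum(x > t for t in thresholds)
            bins[i].append(x)
        reps = [sum(b) / len(b) if b else r for b, r in zip(bins, reps)]
    return reps

X = [0, 0.01, 2.8, 3.4, 1.99, 3.6, 5, 3.2, 4.5, 7.1, 7.9]
print(lloyd_max(X, [1, 3, 5, 7]))  # converges toward {0.005, 2.998, 4.75, 7.5}
```

Run on samples, this iteration is exactly 1-D k-means, which is why the slides later relate vector quantization to clustering.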
Uniform, semi-uniform and optimal quantizer

Suppose we have the following 1D discrete signal:
X = {0, 0.01, 2.8, 3.4, 1.99, 3.6, 5, 3.2, 4.5, 7.1, 7.9}

Example (uniform quantizer: N = 4, mid-riser, ∆ = 2)
[Figure: staircase characteristic over [0, 8] with outputs 1, 3, 5, 7.]
t_i ∈ T = {t0 = 0, t1 = 2, t2 = 4, t3 = 6, t4 = 8}
r_i ∈ R = {r0 = 1, r1 = 3, r2 = 5, r3 = 7}

Check this against the definition of a uniform quantizer...

Quantized vector X̂ = {1, 1, 3, 3, 1, 3, 5, 3, 5, 7, 7} (MSE = 0.42)
Uniform, semi-uniform and optimal quantizer

Suppose we have the following 1D discrete signal:
X = {0, 0.01, 2.8, 3.4, 1.99, 3.6, 5, 3.2, 4.5, 7.1, 7.9}

Example (semi-uniform quantizer: N = 4, mid-riser, ∆ = 2)
[Figure: staircase characteristic with uniform decision levels and non-uniform outputs.]
t_i ∈ T = {t0 = 0, t1 = 2, t2 = 4, t3 = 6, t4 = 8}
r_i ∈ R = {r0 = 2/3, r1 = 3.25, r2 = 4.75, r3 = 7.5}

Quantized vector X̂ = {2/3, 2/3, 3.25, 3.25, 2/3, 3.25, 4.75, 3.25, 4.75, 7.5, 7.5} (MSE = 0.31)
IntroductionEntropy Coding
Other coding methodsLossless vs lossy coding
Distortion/quality assessmentQuantization
Predictive CodingTransform codingMotion estimation
Summary
Scalar quantizationVector quantization
Uniform, semi-uniform and optimal quantizer
Suppose we have the following 1D discrete signal:
X = 0, 0.01, 2.8, 3.4, 1.99, 3.6, 5, 3.2, 4.5, 7.1, 7.9
Example (Lloyd-Max Algorithm)
1 Initial values of the decision levels: T = {t0 = 0, t1 = 2, t2 = 4, t3 = 6, t4 = 8}
2 First iteration: ri = ∫_{ti}^{ti+1} x p(x) dx / ∫_{ti}^{ti+1} p(x) dx.
In this case, the pdf is not known, so we assign each observation to the representative level leading to the smallest distortion. Then we compute the centroid of each group:
group 1: 0, 0.01 | group 2: 1.99, 2.8, 3.2, 3.4, 3.6 | group 3: 4.5, 5 | group 4: 7.1, 7.9
R = {0.005, 2.998, 4.75, 7.5}
3 Second iteration: ti = (r_{i-1} + r_i)/2 gives the new decision levels T = {0, 1.5, 3.87, 6.125, 8}; the representative levels are unchanged: R = {0.005, 2.998, 4.75, 7.5}.
4 Third iteration: the same decision and representative levels are obtained; the algorithm has converged.
Given that T = {0, 1.5, 3.87, 6.125, 8} and R = {0.005, 2.998, 4.75, 7.5}, the quantized vector is X = 0.005, 0.005, 2.998, 2.998, 2.998, 2.998, 4.75, 2.998, 4.75, 7.5, 7.5 (MSE = 0.18).
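Since the pdf is only known through the samples, the Lloyd-Max iteration can be sketched with empirical centroids. This toy implementation assumes no decision cell becomes empty on this data; the intermediate groupings may differ slightly from the slide's, but the fixed point is the same, and the MSE comes out ≈ 0.188, matching the slide's 0.18 up to rounding:

```python
import numpy as np

# Lloyd-Max iteration on the slide's data: centroid of each decision cell,
# then midpoints of consecutive centroids as the new decision levels.
X = np.array([0, 0.01, 2.8, 3.4, 1.99, 3.6, 5, 3.2, 4.5, 7.1, 7.9])
T = np.array([0.0, 2.0, 4.0, 6.0, 8.0])   # initial decision levels

for _ in range(10):
    # representative levels = centroids of the samples in each cell [t_i, t_{i+1})
    R = np.array([X[(X >= T[i]) & (X < T[i + 1])].mean() for i in range(4)])
    # new decision levels = midpoints between consecutive representative levels
    T = np.concatenate(([T[0]], (R[:-1] + R[1:]) / 2, [T[-1]]))

Xq = R[np.clip(np.searchsorted(T, X, side='right') - 1, 0, 3)]
mse = np.mean((X - Xq) ** 2)
print(R)      # converges to [0.005, 2.998, 4.75, 7.5]
print(mse)
```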
[Figure: input/output characteristics of the uniform, semi-uniform, and optimal quantizers, with the samples xi ∈ X marked on the input axis.]
Uniform vs optimal quantizer
Example (N=5)
(a) Original (b) Uniform (c) Optimal
(d) Histo. (e) Decision levels (f) Decision levels
Summary
Scalar quantization
Scalar quantization basically involves two operations:
1 partitioning the range of possible input values into a finite collection of subranges or subsets;
2 for each subset, choosing a representative value to be output when an input value falls in that subrange.
Vector quantization is also based on these two operations, which take place not in a one-dimensional scalar space but in an N-dimensional vector space.
Principle
Vector quantization, also called block quantization, encodes values stemming from a multidimensional vector space into a finite set of values from a discrete subspace of lower dimension.

Definition (Vector quantization)
A vector quantizer maps n-dimensional vectors of the vector space R^n into a finite set of vectors C = {yi, i = 1, 2, ..., N}. Each vector yi is called a code vector or a codeword, and the set of all the codewords is called a codebook C.
Q : R^n → C = {yi, i = 1, 2, ..., N}
x = (x1, x2, ..., xn)^t ↦ yi = (y_{i,1}, y_{i,2}, ..., y_{i,n})^t
n is the size of the input vector x;
N is the number of representative levels;
the output y is a vector of size n.
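A minimal sketch of the mapping Q, with an assumed 2-D codebook of N = 4 codewords (the names and values are illustrative): each input vector is mapped to its nearest codeword in Euclidean distance.

```python
import numpy as np

# Assumed codebook C of N = 4 codewords in R^2 (illustrative values)
C = np.array([[0.0, 0.0], [0.0, 4.0], [4.0, 0.0], [4.0, 4.0]])

def vq(x, codebook):
    d = np.linalg.norm(codebook - x, axis=1)  # distance to every codeword
    return codebook[np.argmin(d)]             # nearest codeword

print(vq(np.array([0.7, 3.1]), C))   # nearest codeword is (0, 4)
```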
Voronoi diagram
The set of Voronoi regions {Ri} partitions the entire space R^n:
⋃_{i=1}^{N} Ri = R^n and Ri ∩ Rj = ∅, ∀i ≠ j.
Example (Example of a 2-dimensional VQ)
[Figure: a 2-dimensional VQ with N = 6 codewords yi; each codeword is a representative level, and any input x falling in the Voronoi region R5 is mapped to its codeword: ∀x ∈ R5, Q(x) = y5.]
Voronoi diagram
Example (Two 1D-scalar quantizations)
[Figure: N = 9 representative levels placed on a rectangular grid in the (x, y) plane.]
One possible approach is to quantize each dimension independently with a scalarquantizer. This results in a rectangular grid of quantization regions.
Pro and cons of vector quantization
Pro
The statistical dependency between the different dimensions of the space is taken into account: a very good candidate when the samples are statistically highly dependent;
For the same number of representative levels, the distortion is lower than that of a scalar quantization.
Cons
The complexity of finding a good partition of the space...
Optimal vector quantization
The questions are:
1 What are the representative levels that best represent a given set of input vectors?
2 How many representative levels should be chosen?
Optimal vector quantization
Let Ri be a Voronoi region. An optimal vector quantization implies that:
Ri = {x ∈ R^n : d(x, yi) ≤ d(x, yj), ∀j ≠ i}
where d(·) is a given distance metric (e.g. the Euclidean distance) and yi is the centroid of the region Ri.
Optimal vector quantization
The questions are:
1 What are the representative levels that best represent a given set of input vectors?
2 How many representative levels should be chosen?
General framework: the Linde-Buzo-Gray (LBG) algorithm [Linde et al., 80]
1 Determine the number of representative levels, N;
2 Select N representative levels at random;
3 Using the Euclidean distance measure, cluster the space around each representative level: for a given input vector, find the representative level that yields the minimum distance;
4 Compute the new set of representative levels;
5 Repeat steps 3 and 4 until the representative levels are almost constant.
This is an extension of the Lloyd-Max algorithm (very similar to k-means).
Clustering and k-means
A cluster is a collection of objects which are similar to one another and dissimilar to the objects belonging to other clusters.
Example (K-means on a picture)
(a) Original (b) N=2 (c) N=5 (d) N=10
Clustering and k-means
K-means algorithm
Repeat the following three steps until convergence (stability of the centroids):
1 determine the centroid coordinates (for the first iteration, random values can be chosen);
2 determine the distance of each data point to the centroids;
3 group the data points based on minimum distance.
Example
Let A = {1, 2, 3, 6, 7, 8, 13, 15, 17}. Three clusters are required.
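The three steps above can be run on this example. Taking the first three points of A as (arbitrary) initial centroids, the iteration converges to the centroids 2, 7 and 15, i.e. the clusters {1, 2, 3}, {6, 7, 8} and {13, 15, 17}:

```python
import numpy as np

# k-means on the slide's example A with K = 3 clusters; the first three
# points are (arbitrarily) taken as initial centroids.
A = np.array([1, 2, 3, 6, 7, 8, 13, 15, 17], dtype=float)
centroids = A[:3].copy()

for _ in range(20):
    # distance of each point to each centroid -> nearest-centroid labels
    labels = np.argmin(np.abs(A[:, None] - centroids[None, :]), axis=1)
    # move each centroid to the mean of its group
    new = np.array([A[labels == k].mean() for k in range(3)])
    if np.allclose(new, centroids):
        break
    centroids = new

print(centroids)   # converges to the centroids 2, 7, 15
```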
Predictive coding
1 Introduction
2 Entropy Coding
3 Other coding methods
4 Lossless vs lossy coding
5 Distortion/quality assessment
6 Quantization
7 Predictive Coding: Principle; Spatial linear prediction; Temporal prediction; Predictive coding and quantization
8 Transform coding
9 Motion estimation
Predictive coding
Definition (Basic predictive coders)
Predictive coding exploits the correlation between adjacent pels (spatially as well as temporally). A prediction of the pel to be encoded is made from previously coded information, already transmitted.
A predictive coder is composed of a predictor, a quantization step and an entropy encoder.
Lossless predictive coding
[Block diagram: the encoder predicts f̂ from previously coded samples held in memory and entropy-codes the prediction error ε = f − f̂ into binary codes; the decoder computes the same prediction and reconstructs f = f̂ + ε.]
Lossless predictive coding
f(n): input sample;
f̂(n): predicted sample;
ε(n) = f(n) − f̂(n) is the prediction error.
A linear prediction at order P is given by: f̂(n) = Σ_{i=1}^{P} ai f(n − i).
Ideally, the coefficients ai minimize the Lα-norm of the prediction error.
Least mean square to compute the coefficients ai
Let:
the input samples F = [f(0) · · · f(N − 1)]^t;
the prediction error E = [ε(0) · · · ε(N − 1)]^t;
the parameter vector Ω = [a1 · · · aP]^t.
The optimal parameters ai are given by the following equation (optimal regarding the MSE):
Ω_OPT = arg min_Ω (1/N) Σ_{n=0}^{N−1} ε(n)²
Since ε(n) = f(n) − Σ_{i=1}^{P} ai f(n − i), stacking the N equations gives
[ε(0); ε(1); ...; ε(N − 1)] = [f(0); f(1); ...; f(N − 1)] − [f(−1) · · · f(−P); f(0) · · · f(−P + 1); ...; f(N − 2) · · · f(N − P − 1)] [a1; ...; aP]
i.e. E = F − ΓΩ.
We have Σ_{n=0}^{N−1} ε(n)² = E^t E and
E^t E = (F − ΓΩ)^t (F − ΓΩ) = F^t F − 2(Γ^t F)^t Ω + Ω^t Γ^t Γ Ω.
The optimal parameters are given by ∂(E^t E)/∂Ω = 0:
Ω_opt = (Γ^t Γ)^{-1} Γ^t F
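As a sketch, the closed-form solution Ω_opt = (Γ^t Γ)^{-1} Γ^t F can be checked on a synthetic signal whose true coefficients are known (the AR(2) signal model used here is an assumption chosen for the test, not part of the slides):

```python
import numpy as np

# Estimate P-th order linear-predictor coefficients by the normal equations
# on a synthetic signal f(n) = 1.5 f(n-1) - 0.7 f(n-2) + small noise.
rng = np.random.default_rng(0)
P, N = 2, 500
f = np.zeros(N + P)            # P leading zeros stand in for f(-1), f(-2)
for n in range(P, N + P):
    f[n] = 1.5 * f[n - 1] - 0.7 * f[n - 2] + 0.1 * rng.standard_normal()

F = f[P:]                                                        # f(0)..f(N-1)
Gamma = np.column_stack([f[P - i:-i] for i in range(1, P + 1)])  # f(n-1), f(n-2)
Omega = np.linalg.solve(Gamma.T @ Gamma, Gamma.T @ F)            # normal equations
print(Omega)   # close to the true coefficients [1.5, -0.7]
```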
Predictive coding
Spatial linear prediction (Intra-picture)
The current value is predicted from its immediate neighbourhood. This is called DPCM (Differential Pulse Code Modulation).
[Template: the causal neighbours of pel (i, j), weighted by a0, a1, a2, a3.]
f̂_{i,j} = a0 f_{i−1,j} + a1 f_{i−1,j−1} + a2 f_{i,j−1} + a3 f_{i+1,j−1}
Example
a0 = 1 and a1,2,3 = 0: f̂_{i,j} = f_{i−1,j};
a0 = a2 = 1/2 and a1 = a3 = 0: f̂_{i,j} = (f_{i−1,j} + f_{i,j−1})/2;
a0 = a2 = 1, a1 = −1 and a3 = 0: f̂_{i,j} = f_{i−1,j} + f_{i,j−1} − f_{i−1,j−1}.
Predictive coding
Example (Classical predictors)
[Template: the causal neighbours of pel (i, j), weighted by a0, a1, a2, a3.]
Linear predictors:
f̂_{i,j} = (f_{i−1,j} + f_{i,j−1})/2, prediction based on the two nearest pixels;
f̂_{i,j} = (2 f_{i−1,j} + f_{i,j−1})/3, prediction based on the two nearest pixels but with a preferred orientation;
f̂_{i,j} = (2/3)(f_{i−1,j} + f_{i,j−1}) − (1/3) f_{i−1,j−1}.
Non-linear predictor used in JPEG-LS:
f̂_{i,j} = min(f_{i−1,j}, f_{i,j−1}) if f_{i−1,j−1} ≥ max(f_{i−1,j}, f_{i,j−1}); max(f_{i−1,j}, f_{i,j−1}) if f_{i−1,j−1} ≤ min(f_{i−1,j}, f_{i,j−1}); f_{i−1,j} + f_{i,j−1} − f_{i−1,j−1} otherwise.
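The JPEG-LS predictor above acts as a simple edge detector. A direct transcription, with a = f_{i−1,j}, b = f_{i,j−1} and c = f_{i−1,j−1}:

```python
def med_predict(a, b, c):
    """JPEG-LS median (MED) predictor: a and b are the two causal
    neighbours, c the diagonal one."""
    if c >= max(a, b):
        return min(a, b)   # c large: likely an edge, pick the smaller neighbour
    if c <= min(a, b):
        return max(a, b)   # c small: likely an edge, pick the larger neighbour
    return a + b - c       # otherwise: planar prediction

# A vertical edge: dark left neighbour (10), bright upper neighbour (200)
print(med_predict(10, 200, 10))   # -> 200
```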
Predictive coding
Example (Spatial linear prediction)
(a) Orig.; (b) PDF of the luma; (c)-(e) prediction errors; (f) PDF of the error.
(a) H = 6.6 bits/symbol, L = 6.7;
(c) H = 2.33 bits/symbol, L = 4.6; simple prediction a0 = 1, otherwise 0;
(d) H = 2.14 bits/symbol, L = 4.2; minimum-variance prediction a0 = 7/8, a1 = −5/8, a2 = 3/4 and a3 = 0;
(e) H = 2.16 bits/symbol, L = 4.3; minimum-entropy prediction a0 = 7/8, a1 = −1/2, a2 = 5/8 and a3 = 0.
Predictive coding
Temporal prediction (Inter-pictures)
Two solutions to deal with the temporal redundancy:
Î(i, j, t + 1) = I(i, j, t) (error = Frame Difference (FD));
Î(i, j, t + 1) = I(i + di, j + dj, t) (error = Displaced Frame Difference (DFD)).
Example
[Figure: the FD between I(t) and I(t + 1), compared to the DFD obtained after motion compensation with the vector ~V = [di, dj]^t.]
Predictive coding
Lossy predictive coding (open-loop or feedforward)
[Block diagram: the encoder predicts f̂ from the previous input samples, quantizes the prediction error ε into εQ and entropy-codes it; the decoder forms its prediction from the decoded samples and adds εQ.]
The predictor is based on the input (before the quantization). Therefore, any error introduced by Q cannot be recovered.
Predictive coding and quantization
Example (Open-loop, simple prediction, quantization 2k, 2k + 1 → 2k)
Encoding side:
f (values):      1   1   2   3   0   2   1   3
f̂ (prediction): 0   1   1   2   3   0   2   1
ε = f − f̂:      1   0   1   1  −3   2  −1   2
εQ:              0   0   0   0  −4   2  −2   2

Decoding side (d = f̂ + εQ, the prediction now based on the decoded values):
εQ:              0   0   0   0  −4   2  −2   2
f̂ (prediction): 0   0   0   0   0  −4  −2  −4
d (decoded):     0   0   0   0  −4  −2  −4  −2
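The two tables can be reproduced with a short simulation (the quantizer maps {2k, 2k+1} to 2k); the decoder output drifts away from f because its predictions are built from quantized values while the encoder's were not:

```python
# Open-loop DPCM: the encoder predicts from the *unquantized* input,
# so the decoder's predictions diverge from the encoder's.
def quantize(e):
    return 2 * (e // 2)          # maps {2k, 2k+1} -> 2k

f = [1, 1, 2, 3, 0, 2, 1, 3]
pred_enc, eq = 0, []
for x in f:                      # encoder: predict = previous input sample
    eq.append(quantize(x - pred_enc))
    pred_enc = x

decoded, pred_dec = [], 0
for e in eq:                     # decoder: can only predict from decoded samples
    d = pred_dec + e
    decoded.append(d)
    pred_dec = d

print(eq)        # [0, 0, 0, 0, -4, 2, -2, 2]
print(decoded)   # [0, 0, 0, 0, -4, -2, -4, -2] -- drifts away from f
```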
Predictive coding and quantization
Lossy predictive Coding (closed-loop or feedback)
The predictor is based on the reconstructed input (after Q). The effect of Q is fed back to the input for adjustment.
[Block diagram: the encoder computes ε = f − f̂, quantizes it into εQ = Q(ε), and forms its prediction f̂ from the locally reconstructed samples f̃ = f̂ + εQ; the decoder performs exactly the same reconstruction fD = f̂ + εQ.]
Reconstruction error: fD − f = εQ − ε
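Repeating the previous open-loop example with the closed loop shows the difference: since fD − f = εQ − ε, the reconstruction error now stays within one quantization step instead of drifting:

```python
# Closed-loop DPCM: the same signal and quantizer as the open-loop example,
# but the prediction is based on the *reconstructed* samples.
def quantize(e):
    return 2 * (e // 2)          # maps {2k, 2k+1} -> 2k

f = [1, 1, 2, 3, 0, 2, 1, 3]
pred, decoded = 0, []
for x in f:
    eq = quantize(x - pred)      # quantized prediction error
    rec = pred + eq              # encoder reconstructs exactly as the decoder
    decoded.append(rec)
    pred = rec

print(decoded)                               # [0, 0, 2, 2, 0, 2, 0, 2]
print([d - x for d, x in zip(decoded, f)])   # errors stay bounded (no drift)
```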
Predictive coding
Summary
Prediction: estimation of a random variable from past or present observable random variables;
Intra-frame prediction to exploit spatial similarities;
Inter-frame prediction to exploit the similarity of temporally successive pictures (motion-compensated or not);
Prediction shapes the error signal (Laplacian, generalized Gaussian);
Simple and efficient;
Prediction is based on quantized samples;
Adaptive intra/inter-frame DPCM.
Transform coding
1 Introduction
2 Entropy Coding
3 Other coding methods
4 Lossless vs lossy coding
5 Distortion/quality assessment
6 Quantization
7 Predictive Coding
8 Transform coding: A brief reminder; Karhunen-Loeve transform; Discrete Fourier transform (DFT); Discrete Cosine transform (DCT); Discrete Wavelet Transform (DWT)
9 Motion estimation
A brief reminder
Definition (Discrete inner product)
Let f and g be two vectors of size N. The inner product is given by:
⟨f, g⟩ = Σ_{n=0}^{N−1} f(n) g(n)

Definition (A basis)
A basis consists of a set of functions/vectors from which any other vector can be generated via linear combinations.

Definition (Discrete orthogonal basis)
A set of functions {ϕm}_{0≤m<N} defines an orthogonal basis B if:
⟨ϕm, ϕn⟩ = 0 if n ≠ m, and c if n = m;
if c = 1, the basis is said to be orthonormal.
The goal
Transform coding is fundamental to achieve a significant bit-rate reduction because it makes it possible to reduce the redundancy of the original signal (energy compaction). Three main features are required:
energy compaction;
decorrelation of the transform coefficients;
conservation of the energy.

Definition (Orthogonal decomposition or forward transform)
Let I be the input picture of size N. This picture can be perfectly represented by a linear combination of a set of orthogonal functions {ϕm}_{0≤m<N}:
I = Σ_{m=0}^{N−1} αm ϕm
where the coefficients αm are given by ⟨I, ϕm⟩.
How to find the best orthogonal basis B? KLT, Fourier, DCT, wavelets...
Karhunen-Loeve transform
Principle
Let X be a square matrix of size N containing the input data (the Xi are random variables):
X = [X1; ...; XN], with row Xi = (x_{i1}, ..., x_{iN}).
The covariance matrix Σ is defined element-wise by:
Σ_{ij} = cov(Xi, Xj) = E[(Xi − µi)(Xj − µj)], with µi = E[Xi].
The KLT basis vectors are the eigenvectors of Σ (diagonalization of Σ): Σ = U D U^t.
D is the eigenvalue matrix: D = diag(d1, ..., dN), with d1 ≥ · · · ≥ dN.
The eigenvectors (column vectors of U) are arranged according to decreasing eigenvalues.
KLT = Principal Component Analysis (PCA)
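A sketch of the KLT on assumed correlated 2-D data (the data model is illustrative): diagonalizing the empirical covariance matrix and projecting onto its eigenvectors, sorted by decreasing eigenvalue, yields decorrelated coefficients:

```python
import numpy as np

# KLT / PCA on two correlated rows of data
rng = np.random.default_rng(1)
x1 = rng.standard_normal(1000)
X = np.vstack([x1, 0.9 * x1 + 0.1 * rng.standard_normal(1000)])

Sigma = np.cov(X)                      # empirical covariance matrix
d, U = np.linalg.eigh(Sigma)           # Sigma = U D U^t (ascending order)
order = np.argsort(d)[::-1]            # reorder by decreasing eigenvalue
U, d = U[:, order], d[order]

Y = U.T @ (X - X.mean(axis=1, keepdims=True))   # KLT coefficients
print(np.cov(Y))   # ~diagonal: the transform coefficients are decorrelated
```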
Karhunen-Loeve transform
Example (Hard thresholding)
(a) Original; (b) 1 EV (MSE=515); (c) 10 EV (MSE=49); (d) 15 EV (MSE=26); (e) 30 EV (MSE=7); (f) 60 EV (MSE=0.86); (g) 120 EV (MSE=10^−4).
EV = eigenvectors.
Karhunen-Loeve transform
Pro
The KL transform coefficients are uncorrelated;
Basis vectors are ordered according to decreasing eigenvalues (compaction of energy).
Cons
The representation (the transform matrix) is signal dependent;
The computation cost is significant.
Discrete Fourier transform (DFT)
Let f be a sampled finite signal (N samples).

Definition (Discrete Fourier basis)
The discrete Fourier basis B = {ϕm}_{0≤m<N} is defined by:
ϕm(n) = (1/√N) exp(2iπmn/N)

Discrete Fourier transform: F(k) = (1/√N) Σ_{n=0}^{N−1} f(n) exp(−2iπkn/N)
Inverse Fourier transform: f(n) = (1/√N) Σ_{k=0}^{N−1} F(k) exp(2iπkn/N)

Extension to 2D
The 2D DFT can be separated into a sequence of two 1D DFTs. The basis functions are then:
ϕ_{m1,m2}(n1, n2) = ϕ_{m1}(n1) ϕ_{m2}(n2) = (1/N) exp(2iπ(m1 n1 + m2 n2)/N)
DFT
Example
(a) Original; (b) Amplitude spectrum; (c) 3 contour lines.
Discrete Cosine transform (DCT)
Definition (Discrete Cosine basis)
The discrete cosine basis B = {ϕm}_{0≤m<N} is defined by:
ϕm(n) = λ(m) √(2/N) cos(π(2n+1)m / 2N)
with λ(m) = 1/√2 if m = 0, and 1 otherwise.

Extension to 2D
The 2D DCT can be separated into a sequence of two 1D DCTs. The basis functions are then:
ϕ_{m1,m2}(n1, n2) = ϕ_{m1}(n1) ϕ_{m2}(n2) = λ(m1) λ(m2) (2/N) cos(π(2n1+1)m1 / 2N) cos(π(2n2+1)m2 / 2N)
with λ(m) = 1/√2 if m = 0, and 1 otherwise.
Discrete Cosine transform (DCT)
DCT and inverse DCT
The DCT of a block I of size N×N is defined by:
DCT(I)(n, m) = (2/N) λ(n) λ(m) Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} cos(π(2i+1)n / 2N) cos(π(2j+1)m / 2N) I(i, j) ⇔ ⟨ϕ_{n,m}, I⟩

The inverse DCT is defined by:
I(n1, n2) = (2/N) Σ_{m1=0}^{N−1} Σ_{m2=0}^{N−1} λ(m1) λ(m2) cos(π(2n1+1)m1 / 2N) cos(π(2n2+1)m2 / 2N) DCT(I)(m1, m2).

[Figure: the two-dimensional DCT basis. The 8×8 source data is transformed into a linear combination of these 64 frequency squares.]
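As a sketch, building the N × N orthonormal DCT-II matrix from the basis above shows that the inverse DCT is simply the transpose, and that an 8×8 block is perfectly reconstructed:

```python
import numpy as np

# Orthonormal DCT-II matrix: C[m, n] = lambda(m) sqrt(2/N) cos(pi (2n+1) m / 2N)
N = 8
n = np.arange(N)
lam = np.where(n == 0, 1 / np.sqrt(2), 1.0)
C = np.sqrt(2 / N) * lam[:, None] * \
    np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))

assert np.allclose(C @ C.T, np.eye(N))   # the basis is orthonormal

I = np.random.default_rng(3).standard_normal((N, N))
coeffs = C @ I @ C.T                     # separable 2-D DCT of an 8x8 block
assert np.allclose(C.T @ coeffs @ C, I)  # inverse DCT reconstructs the block
```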
Discrete Cosine transform (DCT)
Example (Hard thresholding)
[Figure: the block is transformed by the DCT; masks that keep only the lowest (fu, fv) frequencies or discard the smallest coefficients are applied, and the IDCT of each masked spectrum shows little visual degradation even when the highest frequencies are removed.]
Discrete Cosine transform (DCT)
Pro
Good decorrelation of the coefficients;
Good compaction of energy (easy to remove coefficients without visual annoyance);
A lot of almost-null coefficients for the highest frequencies, with the possibility to reorder the coefficients using the zig-zag scan (run-length coding).
Cons
??
Discrete Wavelet Transform (DWT)
Definition (Wavelet [Mallat, 98])
A wavelet ψ is a function of zero average which is dilated with a scale parameter s and translated by u:
ψ_{s,u}(t) = (1/√s) ψ((t − u)/s)
The family {ψ_{j,n}}_{(j,n)∈Z²} is an orthonormal basis of L²(R) (s = 2^j).

Definition (Link with the multiresolution approach)
Because the orthogonal wavelets dilated by s = 2^j carry signal variations at the resolution 2^−j, the wavelet transform can be seen as an iterative process. For each scale j, f ∈ L²(R) is decomposed into:
1 a coarse approximation with a scaling function φ: a_j(n) = ⟨f, φ_{j,n}⟩, with the orthonormal basis {φ_{j,n}}_{n∈Z};
2 a fine approximation (the details) with a wavelet function ψ: d_j(n) = ⟨f, ψ_{j,n}⟩, with the orthonormal basis {ψ_{j,n}}_{n∈Z}.
Wavelets in a basis set are composed of scaled and translated copies (daughterwavelets) of a primary basis function (mother wavelet).
Example (Haar wavelets [Haar, 1910])
Wavelet (mother) function:
ψ(t) = 1 if 0 ≤ t < 1/2; −1 if 1/2 ≤ t < 1; 0 otherwise
Scaling function: φ(t) = 1_[0,1]
The Haar basis is the family {ψ_{n,k}(t) = ψ(2^n t − k)}_{(n,k)∈Z²}.
[Figure: the first dilated and translated Haar wavelets ψ_{0,0}(t), ψ_{0,1}(t), ψ_{1,0}(t), ψ_{1,1}(t).]
A signal f ∈ L²(R) can be decomposed on this orthogonal basis {ψ_{n,k}}_{(n,k)∈Z²}:
f = Σ_{j=−∞}^{+∞} Σ_{n=−∞}^{+∞} ⟨f, ψ_{j,n}⟩ ψ_{j,n}
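One level of the (orthonormalized) Haar transform is just pairwise sums and differences; a short sketch showing perfect reconstruction and energy conservation:

```python
import numpy as np

# One Haar level: pairwise averages -> coarse approximation a,
#                 pairwise differences -> details d (both scaled by 1/sqrt(2)).
def haar_step(f):
    f = np.asarray(f, dtype=float)
    a = (f[0::2] + f[1::2]) / np.sqrt(2)   # scaling (low-pass) coefficients
    d = (f[0::2] - f[1::2]) / np.sqrt(2)   # wavelet (high-pass) coefficients
    return a, d

def inverse_haar_step(a, d):
    f = np.empty(2 * len(a))
    f[0::2] = (a + d) / np.sqrt(2)
    f[1::2] = (a - d) / np.sqrt(2)
    return f

f = np.array([4.0, 2.0, 5.0, 5.0])
a, d = haar_step(f)
assert np.allclose(inverse_haar_step(a, d), f)                 # reversible
assert np.allclose((a**2).sum() + (d**2).sum(), (f**2).sum())  # energy conserved
```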
2D wavelet orthogonal basis
Denition (2D DWT)
A wavelet basis in 2D (L²(R²)) is constructed with separable products of a 1D wavelet ψ and a scaling function φ. Three wavelets are defined:
ϕH(n1, n2) = φ(n1) ψ(n2)
ϕV(n1, n2) = ψ(n1) φ(n2)
ϕD(n1, n2) = ψ(n1) ψ(n2)
These wavelets extract image details at different scales and orientations.
The 2D wavelet transform provides, for each scale 2^−j and each point n = (n1, n2), the coefficients:
a_j(n) = ⟨f, φ²_{j,n}⟩ and d^k_j(n) = ⟨f, ψ^k_{j,n}⟩, with k ∈ {H, V, D}.
Remark: in practice, these coefficients are obtained by using 1D filters applied successively on the rows and then on the columns:
φ is equivalent to a low-pass filter;
ψ is equivalent to a high-pass filter.
2D wavelet orthogonal basis
[Diagram: one stage of the 2D filter bank. The rows of a_j (size Sx × Sy) are filtered by h (low-pass) and g (high-pass) and downsampled by 2 (size Sx × Sy/2); each result is then filtered on the columns by h and g and downsampled by 2, yielding the four subbands a_{j−1}, dH_j, dV_j, dD_j of size Sx/2 × Sy/2.]

Example (3-layer decomposition, CDF 9/7)
[Figure: the first and second iterations of the decomposition; at each level the horizontal (H), vertical (V) and diagonal (D) detail subbands, e.g. dH1, surround the coarse approximation in the (fu, fv) plane.]
2D DWT
Pro
Good decorrelation of the coefficients;
Good compaction of energy (easy to remove coefficients without visual annoyance);
A lot of almost-null coefficients for the highest frequencies, with the possibility to reorder the coefficients (run-length coding);
Used by JPEG2000.
Cons
??
Example (Coefficients of the filters)
Daubechies 5/3: 5 taps for the low-pass, −1, 2, 6, 2, −1, and 3 for the high-pass, 1, 2, 1;
Daubechies 9/7: low-pass 0.0378, −0.0238, −0.1106, 0.3774, 0.8527, 0.3774, ...; high-pass −0.0645, −0.0407, 0.4181, 0.7885, 0.4181, −0.0407, −0.0645.
Lifting scheme [Sweldens, 95]
Goal and principle
The purpose of the lifting scheme is to decompose a function into a sum of a coarseapproximation associated to a correction to the coarse representation.The decomposition is carried out by ltering alternatively the function at odd andeven locations. Interests are:
reversible operation;
simple process.
[Figure: lifting ladder. The input x is split into even samples xe and odd samples xo; the predict stages (P1, P2, ...) and update stages (U1, ...) alternate to produce s1, the coarse representation s, and d1, the fine representation d.]
d1 = xo − P1(xe)

s1 = xe + U1(d1)

Pi and Ui are the i-th predict and update stages, respectively. The number of predict and update stages depends on the filter's size.
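As a concrete instance, the Haar wavelet needs a single predict/update pair. A minimal sketch in Python (an assumed illustration, not the slide's notation): the predict stage P1 is the identity on the even sample, and the update stage U1 adds half the detail.

```python
def haar_lifting_forward(x):
    """One level of the Haar wavelet via lifting.

    Predict: d = x_odd - x_even   (P1(xe) = xe)
    Update:  s = x_even + d / 2   (U1(d) = d / 2, restores the average)
    """
    xe, xo = x[0::2], x[1::2]
    d = [o - e for e, o in zip(xe, xo)]        # detail (fine) coefficients
    s = [e + di / 2 for e, di in zip(xe, d)]   # approximation (coarse)
    return s, d

def haar_lifting_inverse(s, d):
    """Invert by running the same stages backwards with opposite signs."""
    xe = [si - di / 2 for si, di in zip(s, d)]
    xo = [e + di for e, di in zip(xe, d)]
    x = []
    for e, o in zip(xe, xo):
        x += [e, o]
    return x
```

Running the forward then inverse transform recovers the input exactly, which illustrates the "reversible operation" property.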
Motion estimation
1 Introduction
2 Entropy Coding
3 Other coding methods
4 Lossless vs lossy coding
5 Distortion/quality assessment
6 Quantization
7 Predictive Coding
8 Transform coding
9 Motion estimation: Introduction; Block Matching; Hierarchical block matching; Quality of a motion estimator
Introduction
Motivation
To deal with the temporal redundancy of a video sequence.
Motion estimation is a fundamental tool for a number of applications:
Data compression
Filtering such as noise reduction
Frame interpolation (upconversion 24 frames/sec to 60 frames/sec)
De-interlacing (conversion from interlaced to progressive video)
Fundamental assumption
An image of the sequence is defined by I(x, y, t), where (x, y) represents the spatial coordinates and t the time.
Fundamental assumption
The image intensity is conserved along motion trajectories:
I (x , y , t) = I (x + δx , y + δy , t + δt)
Classication:
Feature / Region Matching: the motion field is estimated by correlating features (edges, intensity...) from one frame to another (Block Matching, Phase correlation...);

Gradient-based methods: the motion field is estimated by using spatio-temporal gradients of the image intensity distribution (Pel-recursive method, the Horn-Schunck algorithm...).
Motion models

Motion models

2D translation (2 parameters): this model, dealing only with translation, is used in video coding (it works quite well because the motion between consecutive frames is rather small).

[u; v] = [x; y] + [dx; dy]

Affine model (6 parameters): translation, rotation, scaling and deformation are taken into account to evaluate the displacement.

[u; v] = [dxx, dxy; dyx, dyy] [x; y] + [dx; dy]

From these models, we can estimate:

the global apparent motion (dominant motion): a motion model is defined for the whole image (refers to the apparent motion of the background);

the local apparent motion: a motion model is defined for each pixel (or block) of the image.
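The two models can be written as one function. A small sketch (the function name is illustrative): with D the 2×2 deformation matrix, the identity matrix recovers the pure 2-parameter translation model.

```python
def affine_motion(x, y, D, t):
    """Apply the affine motion model: (u, v) = D (x, y)^T + (dx, dy).

    D is the 2x2 matrix ((dxx, dxy), (dyx, dyy)) capturing rotation,
    scaling and deformation; t = (dx, dy) is the translation.
    """
    (dxx, dxy), (dyx, dyy) = D
    u = dxx * x + dxy * y + t[0]
    v = dyx * x + dyy * y + t[1]
    return u, v
```

With D = ((1, 0), (0, 1)) the call reduces to (u, v) = (x + dx, y + dy), i.e. the translation model.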
Block Matching
Principle
1 Matching is performed by minimizing an error criterion;

2 A search procedure must be defined (exhaustive, non-exhaustive search...);

3 Regularization (smoothness constraint);

4 Spatial resolution of the motion vector (pel, 1/2-pel, 1/4-pel...).
Error criterion
Matching for a block Bi of size N×N

[Figure: block Bi of image t matched in image t−1; displacement ~V = (dx, dy)^T.]

The estimated displacement vector has to minimize a criterion f:
(dx∗, dy∗) = argmin_{(dx,dy)∈W} f(I(x, y, t), I(x + dx, y + dy, t − 1)), W ⊆ I.

MSE or MAD

MSE (Mean Square Error), L2-norm:

MSE_{dx,dy}(x, y) = (1/N²) Σ_{(k,l)∈Bi} (I(x + k, y + l, t) − I(x + k + dx, y + l + dy, t − 1))²

MAD (Mean Absolute Difference), L1-norm:

MAD_{dx,dy}(x, y) = (1/N²) Σ_{(k,l)∈Bi} |I(x + k, y + l, t) − I(x + k + dx, y + l + dy, t − 1)|

Notice that we made the assumption that the pixels of the block undergo a common displacement (coherence constraint).
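The MAD criterion can be sketched directly with NumPy (an assumed illustration; arrays are indexed [row, col], i.e. [y, x], and the displaced block is assumed to stay inside the image):

```python
import numpy as np

def mad(cur, ref, x, y, dx, dy, N):
    """Mean Absolute Difference between the N x N block of image t
    at (x, y) and the block of image t-1 displaced by (dx, dy)."""
    b_cur = cur[y:y + N, x:x + N].astype(np.int64)
    b_ref = ref[y + dy:y + dy + N, x + dx:x + dx + N].astype(np.int64)
    return np.abs(b_cur - b_ref).mean()
```

The MSE variant only replaces the absolute value by a square; MAD is usually preferred in hardware because it avoids the multiplication.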
Search procedure
Different procedures can be used, more or less complex, more or less efficient:

Full search;

Three step search [Koga et al.,81];

Two dimensional logarithmic search [Jain et al.,81];

Orthogonal search [Puri et al.,87];

One-at-a-Time search [Srinivasan,85];

Predictive motion vector field adaptive search technique (PMVFAST) [Tourapis,01].
Search procedure
Full search
A full search is an exhaustive search within the picture or within a predetermined window. The size of the predetermined window determines the maximum displacement range.

(dx∗, dy∗) = argmin_{(dx,dy)∈W} f(I(x, y, t), I(x + dx, y + dy, t − 1)), W ⊆ I.

Let W be a window of size (2N + 1) × (2M + 1); the number of computations needed to find the global minimum is (2N + 1) × (2M + 1).

[Figure: (2N+1) × (2M+1) search window; legend: original pixel (block); pixel (block) to match; best solution (global minimum). Every candidate is tested: OPTIMAL MATCH.]
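A minimal full search over a (2W + 1) × (2W + 1) window, using MAD as the criterion f (a sketch under the assumption that all displaced blocks stay inside the image):

```python
import numpy as np

def full_search(cur, ref, x, y, N, W):
    """Exhaustive block matching: test every displacement in the
    (2W+1) x (2W+1) window and return the one minimizing the MAD.
    By construction this finds the global minimum of the criterion."""
    def mad(dx, dy):
        b_cur = cur[y:y + N, x:x + N].astype(np.int64)
        b_ref = ref[y + dy:y + dy + N, x + dx:x + dx + N].astype(np.int64)
        return np.abs(b_cur - b_ref).mean()

    candidates = [(dx, dy) for dx in range(-W, W + 1)
                           for dy in range(-W, W + 1)]
    return min(candidates, key=lambda v: mad(*v))
```

The cost grows with the window area, which is exactly why the sub-optimal procedures of the next slides exist.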
Search procedure
Three step search [Koga et al.,81]
This algorithm is based on a coarse-to-fine process. Three steps are used:

1 Initial search on 8 pixels at a given distance from the centre;

2 The distance is halved and the centre is moved to the pixel that minimizes the matching criterion;

3 Repeat steps 1 and 2 until the step size is smaller than a given threshold.

[Figure: steps 1, 2 and 3; the procedure may end in a SUBOPTIMAL MATCH (in this case).]
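The three steps can be sketched as follows, with cost(dx, dy) the matching criterion (MAD, say); the starting step of 4 is the classic choice, giving exactly three iterations (4, 2, 1):

```python
def three_step_search(cost, step=4):
    """Coarse-to-fine search: evaluate the centre and its 8 neighbours
    at the current step distance, move the centre to the best candidate,
    halve the step, and repeat until the step reaches 0."""
    cx, cy = 0, 0
    while step >= 1:
        candidates = [(cx + i * step, cy + j * step)
                      for i in (-1, 0, 1) for j in (-1, 0, 1)]
        cx, cy = min(candidates, key=lambda v: cost(*v))
        step //= 2
    return cx, cy
```

Only 3 × 9 = 27 evaluations instead of 81 for a full search over the same ±7 range, at the price of possibly missing the global minimum.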
Search procedure
Variations of the three step search
2D logarithmic search

1 Four pixels (on the cardinal axes) are considered at a given distance (first and second iterations);

2 If the position of best match is at the centre, halve the distance. Otherwise, repeat step 1 from the best candidate;

3 When the distance is 1, all nine blocks around the centre are tested (last iteration).

[Figure: legend: original pixel (block); pixel (block) to match; best solution (global minimum).]
Search procedure
Variations of the three step search
Orthogonal search

1 Two pixels are chosen at a given distance in the horizontal direction (first and 3rd iterations) and the point minimizing the matching criterion is chosen;

2 Take two points in the vertical direction and find the minimum (second and 4th iterations);

3 Halve the distance and go back to 1 if the distance is greater than one. Otherwise, stop (last iteration).

[Figure: legend: original pixel (block); pixel (block) to match; best solution (global minimum).]
Search procedure
Variations of the three step search
One-at-a-time search

1 Two pixels are chosen on either side of the original pixel; the point minimizing the matching criterion is chosen, which defines a direction (right or left);

2 Continue in this direction while the distortion is smaller than that of the previous candidate;

3 Repeat steps 1 and 2 along the vertical axis.

[Figure: legend: original pixel (block); pixel (block) to match; best solution (global minimum).]
Search procedure
PMVFAST, Predictive motion vector field adaptive search technique [Tourapis,01]

[Figure: the predictor ~Vp is built from the vectors ~Va, ~Vb, ~Vc, ~Vd of the spatial neighbours in frame t and the co-located vector ~Ve in frame t − 1.]

~Vp = [MED(vax, vbx, vcx, vdx, vex); MED(vay, vby, vcy, vdy, vey)]

The search around ~Vp then alternates between a small diamond and a large diamond pattern.
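The component-wise median predictor above can be sketched as (function name illustrative):

```python
from statistics import median

def median_predictor(candidates):
    """~Vp: component-wise median of the candidate motion vectors
    (~Va..~Vd from the spatial neighbours in frame t and the
    co-located ~Ve from frame t-1)."""
    return (median(v[0] for v in candidates),
            median(v[1] for v in candidates))
```

The median makes the predictor robust to one outlier vector among the candidates, which is why it is used to initialise the search.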
Regularization
Goal and principle
Regularization is used to improve the robustness and the consistency of the motion field (depending or not on a targeted application).

(dx∗, dy∗) = argmin_{(dx,dy)∈W} f(I(x, y, t), I(x + dx, y + dy, t − 1)) + λ × g(.), W ⊆ I.

The first term is related to the residual from the optical flow equation;

The second is the regularization term:
→ Hidden variables of interest (extracted from the data);
→ Prior knowledge about the application / target.

Example used in a compression scheme:

(dx∗, dy∗) = argmin_{(dx,dy)∈W} f(I(x, y, t), I(x + dx, y + dy, t − 1)) + λ × C(~V_{x,y}, ~V_{x−1,y})

with ~V_{x,y} the motion vector at (x, y).
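A toy illustration of the effect of the λ term, taking C as the L1 distance between the candidate vector and its left neighbour (one possible choice; the slide leaves C generic):

```python
def regularized_cost(f, v, v_left, lam):
    """Data term f(v) plus lambda times a smoothness penalty C,
    here the L1 distance to the neighbouring vector v_left."""
    return f(v) + lam * (abs(v[0] - v_left[0]) + abs(v[1] - v_left[1]))

# Two candidates give the same data cost; the regularization term
# favours the one closest to the neighbouring vector.
f = lambda v: 0.0 if v in [(5, 0), (1, 0)] else 10.0
best = min([(5, 0), (1, 0), (2, 2)],
           key=lambda v: regularized_cost(f, v, (0, 0), 0.5))
```

This is exactly the ambiguity-resolution role the slide describes: when the data term alone cannot decide, the smoothness prior does.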
Spatial resolution of the motion vector: 1/2-pel

Luminance sample interpolation

The displacement estimate can be refined to subpixel accuracy (1/2-pel, 1/4-pel).

The prediction picture is refined to subpixel accuracy.

[Figure: integer pixels A, B, C, D (PEL) and the half-pel positions e (between A and B), f (between A and C) and g (centre).]

e = (A + B + 1)/2

f = (A + C + 1)/2

g = (A + B + C + D + 2)/4

For H.264/AVC, half-sample positions are obtained by applying a 6-tap filter (1, −5, 20, 20, −5, 1).
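The three half-pel formulas in integer arithmetic (the +1 and +2 terms implement rounding before the integer division):

```python
def half_pel(A, B, C, D):
    """Bilinear half-sample values from the four surrounding
    integer pixels A (top-left), B, C and D."""
    e = (A + B + 1) // 2           # between A and B (horizontal)
    f = (A + C + 1) // 2           # between A and C (vertical)
    g = (A + B + C + D + 2) // 4   # centre (diagonal) position
    return e, f, g
```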
Spatial resolution of the motion vector: 1/4-pel

Luminance sample interpolation

Once the half-pel samples are available, the samples at quarter-pel positions are produced by bilinear interpolation.

[Figure: quarter-pel positions ga, gb, ..., gh around the integer (PEL) and half-pel (1/2 PEL) samples.]

For H.264/AVC, quarter-sample positions are obtained by applying a bilinear filter.
Hierarchical block matching
Goal and principle
The basic idea of hierarchical block matching is to perform motion estimation at each level of a pyramid. The estimation starts at the lowest resolution.

[Figure: pyramid from low resolution / low frequencies (rough estimate of the motion information) to high resolution / high frequencies (accurate estimate of the motion information).]
Hierarchical block matching
Components of a HME
1 Pyramid construction

2 Motion estimation (different distances can be used)

3 Coarse-to-fine refinement:

Coarse (lowest resolution): dominant motion estimation, propagated to the next higher level of the pyramid

Fine (highest resolution): local motion; the previously estimated motion is locally refined
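A minimal pyramid construction by 2×2 averaging (one simple choice of low-pass filtering before subsampling; real codecs may use a better filter):

```python
import numpy as np

def build_pyramid(img, levels):
    """List of images from full resolution down to the coarsest level,
    each level obtained by averaging 2x2 blocks of the previous one
    (image sides are assumed divisible by 2 at each level)."""
    pyr = [img.astype(np.float64)]
    for _ in range(levels - 1):
        p = pyr[-1]
        pyr.append((p[0::2, 0::2] + p[0::2, 1::2] +
                    p[1::2, 0::2] + p[1::2, 1::2]) / 4.0)
    return pyr
```

Motion estimation then starts on pyr[-1] (the coarsest level) and the vectors are scaled up and refined at each finer level.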
Hierarchical block matching
Pros and cons
Pros:

Significantly reduced computational load (complexity);

Quite good estimation.

Cons:

Difficult to estimate the motion of small objects;

The initialization step is very important;

Storage resources (to keep the pictures at different resolutions).
Quality of a motion estimator
Different parameters... more or less important for a given application:

Energy of the Displaced Frame Difference (DFD) image;

Entropy of the DFD frame;

Spatial uniformity of the motion vectors (at least over the same moving areas).

For up-conversion, the quality of the reconstructed frame is fundamental.

Ambiguities resulting from intensity conservation:

[Figure: point p in I(t) with two candidate vectors ~V1 and ~V2 towards I(t + 1).]

For the point p, there are two candidates, namely ~V1 and ~V2, that provide the same DFD. To deal with that, we can use an a priori smoothness constraint to resolve the ambiguity in favor of ~V2. But it is not so easy...
Quality of a motion estimator
Example (Backward estimation, 5 levels)
Summary
1 Introduction
2 Entropy Coding
3 Other coding methods
4 Lossless vs lossy coding
5 Distortion/quality assessment
6 Quantization
7 Predictive Coding
8 Transform coding
9 Motion estimation
List of compression tools
Lossless encoding tools
Entropy coding: Huffman, arithmetic, Fano-Shannon;

Lempel-Ziv-Welch and run-length coding.

Lossy encoding tools for reducing the amount of information

Quantization: scalar quantizer and vector quantizer.

Lossless tools to increase the efficiency of the aforementioned tools

Prediction (intra, inter-frame): encode the prediction error (fewer bits);

Transform the image into a new domain (better compaction of the energy).
Suggestion for further reading...
[Daly,93] S. Daly. The visible differences predictor: An algorithm for the assessment of image fidelity. Digital Images and Human Vision, pp. 179-206, 1993, MIT Press.

[Haar, 1910] A. Haar. Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen, 69, pp. 331-371, 1910.

[Jain et al.,81] J. R. Jain and A. K. Jain. Displacement Measurement and Its Application in Interframe Image Coding. IEEE Trans. Commun., Vol. COM-29, 1981.

[Koga et al.,81] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro. Motion Compensated Interframe Coding for Video Conferencing. In Proc. Nat. Telecommun. Conf., 1981.

[Linde et al.,80] Y. Linde, A. Buzo, R. Gray. An Algorithm for Vector Quantizer Design. IEEE Transactions on Communications, Vol. 28, pp. 84-94, 1980.

[Lloyd,82] S. P. Lloyd. Least squares quantization in PCM. Institute of Mathematical Statistics Meeting, Atlantic City, NJ, September 1957; IEEE Transactions on Information Theory, pp. 129-136, March 1982.

[Mallat,98] S. Mallat. A wavelet tour of signal processing. Academic Press, 1998.

[Max,60] J. Max. Quantizing for minimum distortion. IRE Trans. Information Theory, IT-6, pp. 7-12, 1960.
[Nadenau,00] M. Nadenau. Integration of human color vision models into high quality image compression. PhD thesis, EPFL, 2000.

[Ninassi et al.,08a] A. Ninassi, O. Le Meur, P. Le Callet, and D. Barba. On the performance of human visual system based image quality assessment metric using wavelet domain. Proc. SPIE Human Vision and Electronic Imaging XIII, Vol. 6806, pp. 680610-12, 2008.

[Ninassi et al.,08b] A. Ninassi, O. Le Meur, P. Le Callet and D. Barba. Which Semi-Local Visual Masking Model For Wavelet Based Image Quality Metric? ICIP 2008, San Diego, California, USA, 2008.

[Ninassi et al.,09] A. Ninassi, O. Le Meur, P. Le Callet and D. Barba. Considering the temporal variations of spatial visual distortions in video quality assessment. IEEE Signal Processing, special issue on visual media quality assessment, 2009.

[Pinson et al.,04] M. Pinson and S. Wolf. A new standardized method for objectively measuring video quality. IEEE Trans. Broadcasting, Vol. 50, N. 3, pp. 312-322, 2004.

[Puri et al.,87] A. Puri, H. M. Hang, D. L. Schilling. An efficient block-matching algorithm for motion compensated coding. International Conference on Acoustics, Speech and Signal Processing, 1987.

[Srinivasan,85] R. Srinivasan and K. R. Rao. Predictive coding based on efficient motion estimation. IEEE Transactions on Communications, Vol. COM-33, No. 8, pp. 888-896, 1985.
[Sweldens, 95] W. Sweldens. The lifting scheme: a new philosophy in biorthogonal wavelet constructions. Wavelet Applications in Signal and Image Processing III, SPIE 2569, pp. 68-79, 1995.

[Tourapis,01] A. Tourapis, C. Oscar and L. Ming. Predictive motion vector field adaptive search technique (PMVFAST): enhancing block-based motion estimation. Proc. SPIE Vol. 4310, pp. 883-892, Visual Communications and Image Processing, 2001.

[Wang et al.,04a] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Trans. on Image Processing, Vol. 13, pp. 600-612, 2004.

[Wang et al.,04b] Z. Wang, L. Lu and A. Bovik. Video quality assessment based on structural distortion measurement. Signal Processing: Image Communication, Vol. 19, N. 1, 2004.

[Welch,84] T. A. Welch. A technique for high performance data compression. IEEE Computer, Vol. 17, N. 6, pp. 8-19, 1984.

[Witten et al.,87] I. H. Witten, R. M. Neal and J. G. Cleary. Arithmetic coding for data compression. Communications of the ACM, 30(6), pp. 520-540, 1987.