
Transcript of Lecture 3

Page 1: Lecture 3

Digital Imaging 2006/2007

Lecture 3

Image processing II

Ioannis Ivrissimtzis 05 - Mar - 2007

Page 2: Lecture 3

Summary of the lecture

Frequency domain

Basis change

Walsh - Hadamard transform

Tensor product transforms

Discrete Cosine Transform

Image compression

Encoding

Huffman encoding

Image compression

JPEG

Page 3: Lecture 3

Spatial/Frequency domain

The analysis and processing of an image can be done in different domains:

The spatial domain (up to now we have always worked in this domain).

A frequency domain.

Page 4: Lecture 3

Basis change

Example: We have two numbers a,b.

We can represent them separately as a,b.

We can represent them by two different numbers (a+b)/2, (a-b)/2.

If we know a,b we can find (a+b)/2, (a-b)/2.

If we know (a+b)/2, (a-b)/2 we can find a,b (check).
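As a quick illustration, here is this round trip in Python (a minimal sketch; the function names are ours):

```python
# Basis change (a, b) <-> ((a+b)/2, (a-b)/2).
def to_new_basis(a, b):
    return ((a + b) / 2, (a - b) / 2)

def from_new_basis(s, d):
    # s = (a+b)/2 and d = (a-b)/2, so a = s + d and b = s - d.
    return (s + d, s - d)

a, b = 7, 3
s, d = to_new_basis(a, b)            # (5.0, 2.0)
assert from_new_basis(s, d) == (7.0, 3.0)
```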

Page 5: Lecture 3

Basis change

In matrix language we have

$$\begin{pmatrix}a\\b\end{pmatrix} = \begin{pmatrix}1&0\\0&1\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix}$$

for the first representation and

$$\begin{pmatrix}\frac{a+b}{2}\\[4pt]\frac{a-b}{2}\end{pmatrix} = \begin{pmatrix}\frac12&\frac12\\[2pt]\frac12&-\frac12\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix}$$

for the second.

Page 6: Lecture 3

Basis change

We use the terminology:

$$\begin{pmatrix}a\\b\end{pmatrix} = \underbrace{a}_{\text{coefficient}}\underbrace{\begin{pmatrix}1\\0\end{pmatrix}}_{\text{basis}} + b\begin{pmatrix}0\\1\end{pmatrix}, \qquad \begin{pmatrix}a\\b\end{pmatrix} = \underbrace{\frac{a+b}{2}}_{\text{coefficient}}\underbrace{\begin{pmatrix}1\\1\end{pmatrix}}_{\text{basis}} + \frac{a-b}{2}\begin{pmatrix}1\\-1\end{pmatrix}$$

The scalars are the coefficients and the vectors form the basis.

Page 7: Lecture 3

Basis change

Why use the second more complicated basis?

In some applications we may have to work with an incomplete set of coefficients.

A subset of the coefficients of the second basis may give better information about the whole set of coefficients.

In the first basis the first coefficient is a, while in the second basis the first coefficient is (a+b)/2, which is more representative.

In progressive image transmission we prefer the first 10% of the data to give an approximation of the whole image, rather than an exact description of a small part of it.

Page 8: Lecture 3

Basis change

Why use the second more complicated basis?

It could be convenient.

Consider the problem of allocating a fixed amount of money x to two people.

In the first basis we have to work with two variables a,b related by the equation a+b = x.

In the second basis the first coefficient is fixed at x/2 and the only variable is the second coefficient, which controls the difference between the money each person gets.

Page 9: Lecture 3

Example

The four matrices below form a basis for the 4x1 matrices:

$$\begin{pmatrix}1\\0\\0\\0\end{pmatrix},\quad \begin{pmatrix}0\\1\\0\\0\end{pmatrix},\quad \begin{pmatrix}0\\0\\1\\0\end{pmatrix},\quad \begin{pmatrix}0\\0\\0\\1\end{pmatrix}$$

We can write any other 4x1 matrix as a linear combination of them, in a unique way.

Page 10: Lecture 3

Example

Writing a 4x1 matrix in this basis is trivial. We have:

$$\begin{pmatrix}a_1\\a_2\\a_3\\a_4\end{pmatrix} = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}\begin{pmatrix}a_1\\a_2\\a_3\\a_4\end{pmatrix}$$

giving

$$\begin{pmatrix}a_1\\a_2\\a_3\\a_4\end{pmatrix} = a_1\begin{pmatrix}1\\0\\0\\0\end{pmatrix} + a_2\begin{pmatrix}0\\1\\0\\0\end{pmatrix} + a_3\begin{pmatrix}0\\0\\1\\0\end{pmatrix} + a_4\begin{pmatrix}0\\0\\0\\1\end{pmatrix}$$

Page 11: Lecture 3

Example

For example:

$$\begin{pmatrix}3\\0\\1\\8\end{pmatrix} = 3\begin{pmatrix}1\\0\\0\\0\end{pmatrix} + 0\begin{pmatrix}0\\1\\0\\0\end{pmatrix} + 1\begin{pmatrix}0\\0\\1\\0\end{pmatrix} + 8\begin{pmatrix}0\\0\\0\\1\end{pmatrix}$$

We call this basis the natural basis.

Page 12: Lecture 3

Example

A different basis for the 4x1 matrices:

$$\begin{pmatrix}1\\1\\1\\1\end{pmatrix},\quad \begin{pmatrix}1\\1\\-1\\-1\end{pmatrix},\quad \begin{pmatrix}1\\-1\\-1\\1\end{pmatrix},\quad \begin{pmatrix}1\\-1\\1\\-1\end{pmatrix}$$

We can write any other 4x1 matrix as a linear combination of the four matrices above, in a unique way. We call this a basis change.

Page 13: Lecture 3

Example

How do we write a 4x1 matrix in the new basis?

$$\begin{pmatrix}3\\0\\1\\8\end{pmatrix} = \;?\begin{pmatrix}1\\1\\1\\1\end{pmatrix} + \;?\begin{pmatrix}1\\1\\-1\\-1\end{pmatrix} + \;?\begin{pmatrix}1\\-1\\-1\\1\end{pmatrix} + \;?\begin{pmatrix}1\\-1\\1\\-1\end{pmatrix}$$

Page 14: Lecture 3

Example

Let the coefficients $x_1, x_2, x_3, x_4$ be the unknowns:

$$\begin{pmatrix}3\\0\\1\\8\end{pmatrix} = x_1\begin{pmatrix}1\\1\\1\\1\end{pmatrix} + x_2\begin{pmatrix}1\\1\\-1\\-1\end{pmatrix} + x_3\begin{pmatrix}1\\-1\\-1\\1\end{pmatrix} + x_4\begin{pmatrix}1\\-1\\1\\-1\end{pmatrix}$$

Page 15: Lecture 3

Example

We can rewrite the equation as a linear system in matrix form:

$$\begin{pmatrix}3\\0\\1\\8\end{pmatrix} = \begin{pmatrix}1&1&1&1\\1&1&-1&-1\\1&-1&-1&1\\1&-1&1&-1\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\\x_4\end{pmatrix}$$

Page 16: Lecture 3

Example

To solve the system we invert the transformation matrix:

$$\begin{pmatrix}x_1\\x_2\\x_3\\x_4\end{pmatrix} = \begin{pmatrix}1&1&1&1\\1&1&-1&-1\\1&-1&-1&1\\1&-1&1&-1\end{pmatrix}^{-1}\begin{pmatrix}3\\0\\1\\8\end{pmatrix}$$

Page 17: Lecture 3

Example

The inverse of this matrix is

$$\begin{pmatrix}1&1&1&1\\1&1&-1&-1\\1&-1&-1&1\\1&-1&1&-1\end{pmatrix}^{-1} = \frac{1}{4}\begin{pmatrix}1&1&1&1\\1&1&-1&-1\\1&-1&-1&1\\1&-1&1&-1\end{pmatrix}$$

In the literature, the original transformation matrix is sometimes scaled by 1/4 or 1/2 (by $1/N$ or $1/\sqrt{N}$ in general).

Page 18: Lecture 3

Example

We get

$$\begin{pmatrix}x_1\\x_2\\x_3\\x_4\end{pmatrix} = \frac{1}{4}\begin{pmatrix}1&1&1&1\\1&1&-1&-1\\1&-1&-1&1\\1&-1&1&-1\end{pmatrix}\begin{pmatrix}3\\0\\1\\8\end{pmatrix} = \begin{pmatrix}3\\-3/2\\5/2\\-1\end{pmatrix}$$

This is called the Walsh-Hadamard transform of the original matrix.
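A quick numerical check of this computation (a minimal sketch in Python):

```python
import numpy as np

# The sequency-ordered Walsh-Hadamard matrix of order 4.
H = np.array([[1,  1,  1,  1],
              [1,  1, -1, -1],
              [1, -1, -1,  1],
              [1, -1,  1, -1]])

v = np.array([3.0, 0.0, 1.0, 8.0])

# H is symmetric and H @ H = 4*I, so its inverse is H/4.
x = (H / 4) @ v
print(x)                          # [ 3.  -1.5  2.5 -1. ]
assert np.allclose(H @ x, v)      # the coefficients reproduce v
```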

Page 19: Lecture 3

Example

The natural basis

Page 20: Lecture 3

Example

The Walsh-Hadamard basis

Page 21: Lecture 3

Summary of the lecture

Frequency domain

Basis change

Walsh - Hadamard transform

Tensor product transforms

Discrete Cosine Transform

Image compression

Encoding

Huffman encoding

Image compression

JPEG

Page 22: Lecture 3

Walsh-Hadamard transform

We can generalise the previous transform to larger matrices. For example, the Walsh-Hadamard transform of order 8 is given by the matrix

$$\begin{pmatrix}1&1&1&1&1&1&1&1\\1&1&1&1&-1&-1&-1&-1\\1&1&-1&-1&-1&-1&1&1\\1&1&-1&-1&1&1&-1&-1\\1&-1&-1&1&1&-1&-1&1\\1&-1&-1&1&-1&1&1&-1\\1&-1&1&-1&-1&1&-1&1\\1&-1&1&-1&1&-1&1&-1\end{pmatrix}$$

Page 23: Lecture 3

Walsh-Hadamard transform

Notice that the rows are ordered by the number of sign changes. We say that they are ordered by sequency.

$$\begin{array}{c|rrrrrrrr}
0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1\\
2 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1\\
3 & 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1\\
4 & 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1\\
5 & 1 & -1 & -1 & 1 & -1 & 1 & 1 & -1\\
6 & 1 & -1 & 1 & -1 & -1 & 1 & -1 & 1\\
7 & 1 & -1 & 1 & -1 & 1 & -1 & 1 & -1
\end{array}$$

(The left column shows the sequency of each row.)

Page 24: Lecture 3

Walsh-Hadamard transform

The natural basis for 8x1 matrices

Page 25: Lecture 3

Walsh-Hadamard transform

The Walsh-Hadamard basis for 8x1 matrices

Page 26: Lecture 3

Walsh-Hadamard transform

Different components of the natural basis correspond to different pixels of the image.

Different components of the Walsh-Hadamard basis correspond to different “frequencies”.

Page 27: Lecture 3

The general W-H transform

The general Hadamard matrix $H_n$ of order $2^n$ is defined recursively by

$$H_1 = \begin{pmatrix}1&1\\1&-1\end{pmatrix}, \qquad H_{n+1} = \begin{pmatrix}H_n&H_n\\H_n&-H_n\end{pmatrix}$$

A sequency ordering of its rows will give the corresponding Walsh-Hadamard transform.
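A minimal Python sketch of this construction (the function names are ours; `walsh_hadamard(3)` reproduces the 8x8 matrix above):

```python
import numpy as np

def hadamard(n):
    """Hadamard matrix of order 2**n, via H_{n+1} = [[H, H], [H, -H]]."""
    H = np.array([[1, 1], [1, -1]])
    for _ in range(n - 1):
        H = np.block([[H, H], [H, -H]])
    return H

def walsh_hadamard(n):
    """Reorder the rows by sequency (number of sign changes along the row)."""
    H = hadamard(n)
    sequency = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(sequency)]

print(walsh_hadamard(3))   # the sequency-ordered 8x8 matrix shown above
```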

Page 28: Lecture 3

Two dimensional W-H transform

The 2D Walsh-Hadamard transform is the tensor product of the 1D transform.

Example: Every 4x4 greyscale image can be uniquely written in the Walsh-Hadamard basis as a linear combination of these 16 images.

The white squares denote 1’s and the black squares denote -1’s.

Page 29: Lecture 3

Two dimensional W-H transform

[Figure: the 16 Walsh-Hadamard basis images in a 4x4 grid; the rows and columns are indexed by the 1D basis vectors (1,1,1,1), (1,1,-1,-1), (1,-1,-1,1), (1,-1,1,-1).]

How do we compute these sixteen images?

Take the corresponding elements of the 1D basis and find their tensor product.

Page 30: Lecture 3

Two dimensional W-H transform

For instance, the tensor product of the basis vectors (1,1,-1,-1) and (1,-1,-1,1) gives the basis image

$$\begin{pmatrix}1\\1\\-1\\-1\end{pmatrix}\begin{pmatrix}1&-1&-1&1\end{pmatrix} = \begin{pmatrix}1&-1&-1&1\\1&-1&-1&1\\-1&1&1&-1\\-1&1&1&-1\end{pmatrix}$$

Page 31: Lecture 3

Summary of the lecture

Frequency domain

Basis change

Walsh - Hadamard transform

Tensor product transforms

Discrete Cosine Transform

Image compression

Encoding

Huffman encoding

Image compression

JPEG

Page 32: Lecture 3

Tensor product transforms

How can we compute T(F), the W-H transform of a 2D image F?

It may seem that we have to solve a large and complicated linear system.

In fact, the transform is computed directly by

$$T(F) = H\,F\,H'$$

where H is the W-H matrix of the 1D transform and H′ is the transpose of H.

Page 33: Lecture 3

Tensor product transforms

That is, to find the transform of an image, we multiply it from the left with the transformation matrix and from the right with its transpose:

$$\begin{pmatrix}z_{11}&z_{12}&z_{13}&z_{14}\\z_{21}&z_{22}&z_{23}&z_{24}\\z_{31}&z_{32}&z_{33}&z_{34}\\z_{41}&z_{42}&z_{43}&z_{44}\end{pmatrix} = \begin{pmatrix}t_{11}&t_{12}&t_{13}&t_{14}\\t_{21}&t_{22}&t_{23}&t_{24}\\t_{31}&t_{32}&t_{33}&t_{34}\\t_{41}&t_{42}&t_{43}&t_{44}\end{pmatrix}\begin{pmatrix}a_{11}&a_{12}&a_{13}&a_{14}\\a_{21}&a_{22}&a_{23}&a_{24}\\a_{31}&a_{32}&a_{33}&a_{34}\\a_{41}&a_{42}&a_{43}&a_{44}\end{pmatrix}\begin{pmatrix}t_{11}&t_{21}&t_{31}&t_{41}\\t_{12}&t_{22}&t_{32}&t_{42}\\t_{13}&t_{23}&t_{33}&t_{43}\\t_{14}&t_{24}&t_{34}&t_{44}\end{pmatrix}$$

Here Z is the transform of the original image A, T is the transformation matrix, and T′ is the transpose of T.
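A short numerical sketch of the 2D transform and its inverse (assuming the normalised 4x4 matrix T = H/4 from the earlier worked example):

```python
import numpy as np

# The 1D matrices from the worked example: H is the 4x4 Walsh-Hadamard
# matrix and T = H/4 is the transformation matrix (T = H^{-1}).
H = np.array([[1,  1,  1,  1],
              [1,  1, -1, -1],
              [1, -1, -1,  1],
              [1, -1,  1, -1]])
T = H / 4

A = np.arange(16.0).reshape(4, 4)   # a toy 4x4 "image"

Z = T @ A @ T.T          # the 2D transform of A
A_back = H @ Z @ H.T     # invert: T^{-1} = H
assert np.allclose(A_back, A)
```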

Page 34: Lecture 3

Tensor product transforms

To see why this happens we first need to introduce the notion of orthogonality.

We say that a matrix is orthogonal if its inverse is equal to its transpose:

$$\begin{pmatrix}b_{11}&b_{12}&b_{13}&b_{14}\\b_{21}&b_{22}&b_{23}&b_{24}\\b_{31}&b_{32}&b_{33}&b_{34}\\b_{41}&b_{42}&b_{43}&b_{44}\end{pmatrix}^{-1} = \begin{pmatrix}b_{11}&b_{21}&b_{31}&b_{41}\\b_{12}&b_{22}&b_{32}&b_{42}\\b_{13}&b_{23}&b_{33}&b_{43}\\b_{14}&b_{24}&b_{34}&b_{44}\end{pmatrix}$$

Page 35: Lecture 3

Tensor product transforms

Consider an image with all its pixels equal to 0, except one which has value 1. Notice that this is an element of the natural basis. We have:

$$\begin{pmatrix}b_{11}&b_{12}&b_{13}&b_{14}\\b_{21}&b_{22}&b_{23}&b_{24}\\b_{31}&b_{32}&b_{33}&b_{34}\\b_{41}&b_{42}&b_{43}&b_{44}\end{pmatrix}\begin{pmatrix}0&0&0&0\\0&0&1&0\\0&0&0&0\\0&0&0&0\end{pmatrix}\begin{pmatrix}b_{11}&b_{21}&b_{31}&b_{41}\\b_{12}&b_{22}&b_{32}&b_{42}\\b_{13}&b_{23}&b_{33}&b_{43}\\b_{14}&b_{24}&b_{34}&b_{44}\end{pmatrix} = \begin{pmatrix}b_{12}b_{13}&b_{12}b_{23}&b_{12}b_{33}&b_{12}b_{43}\\b_{22}b_{13}&b_{22}b_{23}&b_{22}b_{33}&b_{22}b_{43}\\b_{32}b_{13}&b_{32}b_{23}&b_{32}b_{33}&b_{32}b_{43}\\b_{42}b_{13}&b_{42}b_{23}&b_{42}b_{33}&b_{42}b_{43}\end{pmatrix}$$

Page 36: Lecture 3

Tensor product transforms

This is equivalent to the tensor product of the two corresponding 1D basis images:

$$\begin{pmatrix}b_{12}\\b_{22}\\b_{32}\\b_{42}\end{pmatrix}\begin{pmatrix}b_{13}&b_{23}&b_{33}&b_{43}\end{pmatrix} = \begin{pmatrix}b_{12}b_{13}&b_{12}b_{23}&b_{12}b_{33}&b_{12}b_{43}\\b_{22}b_{13}&b_{22}b_{23}&b_{22}b_{33}&b_{22}b_{43}\\b_{32}b_{13}&b_{32}b_{23}&b_{32}b_{33}&b_{32}b_{43}\\b_{42}b_{13}&b_{42}b_{23}&b_{42}b_{33}&b_{42}b_{43}\end{pmatrix}$$

Page 37: Lecture 3

Tensor product transforms

To put it all together, let B be an orthogonal matrix corresponding to a 1D basis. Then $T = B^{-1}$ is the transform matrix.

Let A be an image. We have to show that $Z = T\,A\,T'$ is the transform of A corresponding to the tensor product of the 1D basis B.

To see this, let $I_{uv}$ and $B_{uv}$ be the (u, v) elements of the 2D natural and transform basis, respectively.

Page 38: Lecture 3

Tensor product transforms

$$A = \sum_{u,v} z_{uv}\,B_{uv} = \sum_{u,v} z_{uv}\,B\,I_{uv}\,B' = \sum_{u,v} z_{uv}\,T^{-1}\,I_{uv}\,(T')^{-1} = T^{-1}\Big(\sum_{u,v} z_{uv}\,I_{uv}\Big)(T')^{-1}$$

so that

$$T\,A\,T' = \sum_{u,v} z_{uv}\,I_{uv} = Z$$

The latter means that the matrix Z indeed gives the coefficients for writing A in the tensor product basis of B.
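A numerical sanity check of this claim (a minimal sketch; the random orthogonal matrix B is our choice, purely for illustration):

```python
import numpy as np

# Check: Z = T A T' holds exactly the coefficients of A in the
# tensor-product basis built from the columns of B = T^{-1}.
rng = np.random.default_rng(0)
B = np.linalg.qr(rng.normal(size=(4, 4)))[0]   # a random orthogonal matrix
T = B.T                                        # T = B^{-1} = B' since B is orthogonal

A = rng.normal(size=(4, 4))
Z = T @ A @ T.T

# Rebuild A as  sum_{u,v} z_uv * (b_u tensor b_v),  b_u the u-th column of B.
A_rebuilt = sum(Z[u, v] * np.outer(B[:, u], B[:, v])
                for u in range(4) for v in range(4))
assert np.allclose(A_rebuilt, A)
```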

Page 39: Lecture 3

Summary of the lecture

Frequency domain

Basis change

Walsh - Hadamard transform

Tensor product transforms

Discrete Cosine Transform

Image compression

Encoding

Huffman encoding

Image compression

JPEG

Page 40: Lecture 3

DCT

The Discrete Cosine Transform (DCT) is given by the matrix with entries

$$T(u,v) = \begin{cases}\sqrt{1/n} & \text{for } u = 0\\[4pt] \sqrt{2/n}\,\cos\dfrac{(2v+1)u\pi}{2n} & \text{otherwise}\end{cases}$$

In fact, there are several types of DCT and this is type II, which is the most popular.

Each type corresponds to different assumptions about the boundary of the image.

Page 41: Lecture 3

Example

The DCT matrix for n=4:

$$T = \begin{pmatrix}\frac12&\frac12&\frac12&\frac12\\[4pt] \sqrt{\tfrac12}\cos\frac{\pi}{8}&\sqrt{\tfrac12}\cos\frac{3\pi}{8}&\sqrt{\tfrac12}\cos\frac{5\pi}{8}&\sqrt{\tfrac12}\cos\frac{7\pi}{8}\\[4pt] \sqrt{\tfrac12}\cos\frac{2\pi}{8}&\sqrt{\tfrac12}\cos\frac{6\pi}{8}&\sqrt{\tfrac12}\cos\frac{10\pi}{8}&\sqrt{\tfrac12}\cos\frac{14\pi}{8}\\[4pt] \sqrt{\tfrac12}\cos\frac{3\pi}{8}&\sqrt{\tfrac12}\cos\frac{9\pi}{8}&\sqrt{\tfrac12}\cos\frac{15\pi}{8}&\sqrt{\tfrac12}\cos\frac{21\pi}{8}\end{pmatrix}$$
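A minimal sketch that builds this matrix for general n (the function name is ours):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row u is sqrt(1/n) if u == 0,
    else sqrt(2/n) * cos((2v+1) u pi / (2n))."""
    T = np.zeros((n, n))
    for u in range(n):
        for v in range(n):
            if u == 0:
                T[u, v] = np.sqrt(1.0 / n)
            else:
                T[u, v] = np.sqrt(2.0 / n) * np.cos((2 * v + 1) * u * np.pi / (2 * n))
    return T

T = dct_matrix(4)
assert np.allclose(T @ T.T, np.eye(4))   # T is orthogonal
```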

Page 42: Lecture 3

Spatial/Frequency domain

The DCT is an example of a transform from the spatial to the frequency domain.

While the elements of the original matrix correspond to pixels, the elements of the transform correspond to frequencies.

The decomposition of a signal into frequencies has many applications (e.g. analysis, processing, compression). It allows us to see properties of the signal which remain hidden in the spatial domain representation.

The frequency domain, even though it is more complicated, is the natural domain of many signals (e.g. sound, light, electric signals), because they consist of waves of different frequencies.

Page 43: Lecture 3

Example

The sound signal is decomposed into several frequencies.

Each frequency can be amplified or attenuated.

http://kaffeine.sourceforge.net/featurepics/equalizer.png

Page 44: Lecture 3

Example

The signal can be digital or analogue.

http://rolls.com/rollsproducts/

Page 45: Lecture 3

2D DCT

The 2D DCT is obtained in the usual way, $Z = T\,A\,T'$, where T is the matrix of the 1D DCT.

For example, the basis of the 4x4 DCT consists of 16 matrices. The (2,3) element of the basis is shown below.

$$\begin{pmatrix}0.250&0.135&-0.250&-0.326\\0.135&0.073&-0.135&-0.176\\-0.250&-0.135&0.250&0.326\\-0.326&-0.176&0.326&0.426\end{pmatrix}$$

(To display such a matrix as an image we add 0.5 to its entries, to put them in the range [0,1].)

Page 46: Lecture 3

DCT frequencies

As is obvious from the definition of the DCT, the values of the u-th row of the matrix lie on the underlying function

$$\cos\frac{(2v+1)u\pi}{2n}$$

We can see the rows of the matrix (which give the basis of the transform) as samples from these functions.

Page 47: Lecture 3

DCT frequencies

The underlying functions for n=8.

Page 48: Lecture 3

DCT frequencies

Notice that the low frequencies correspond to the top rows and the high frequencies to the bottom rows.

Therefore, in the 2D DCT the low frequencies correspond to the top-left of the image’s transform.

Page 49: Lecture 3

Example

The large coefficients are concentrated in the upper left corner of the image of the DCT transform.

DCT

Page 50: Lecture 3

Filters for the DCT

The design of a filter for DCT depends on the distribution of the frequencies in the frequency domain.

Ideal lowpass filter Ideal highpass filter

Page 51: Lecture 3

The design of a filter for DCT depends on the distribution of the frequencies in the frequency domain.

Ideal filters for the DCT

Ideal bandpass filter Ideal bandreject filter

Page 52: Lecture 3

Summary of the lecture

Frequency domain

Basis change

Walsh - Hadamard transform

Tensor product transforms

Discrete Cosine Transform

Image compression

Encoding

Huffman encoding

Image compression

JPEG

Page 53: Lecture 3

Encoding

We want to find a compact computer representation for a message written in the form of a string of symbols

a b a b a b a d a a a b c d a d b a a b

The symbols are also called letters.

The set of all different letters is called the alphabet. The above string uses an alphabet of 4 letters:

{ a , b , c , d }

Page 54: Lecture 3

Encoding

The resulting computer representation will be a string of bits

0 1 0 1 0 0 0 0 1 0 1 1 1 1 1 0 0 1 0 1 0 1

That is, a sequence of 0’s and 1’s.

The process of going from the initial string of symbols to the bit string is called encoding.

The reverse process, from the bits to the string of symbols is called decoding.

The aim is to find a representation as compact as possible, that is, to use as few bits as possible.

Page 55: Lecture 3

Encoding

The coding methods we study here assign a bit string to each letter of the alphabet.

This correspondence between letters and bit strings is called the code.

A simple example of a code is:

a → 00
b → 01
c → 10
d → 11

We say that the code of a is 00, the code of b is 01, …

Page 56: Lecture 3

Encoding

With the above code, the string

“a b a b a b a d a a a b c d a d b a a b”

is encoded as

“0001000100010011000000011011001101000001”

Page 57: Lecture 3

Encoding

The bit string

“0001000100010011000000011011001101000001”

is decoded by taking the bits two-by-two and finding the corresponding letter

“ 00 | 01 | 00 | 01 | 00 | 01 | 00 | 11 | 00 | 00 | 00 | 01 | … “

“a b a b a b a d a a a b … “
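A minimal Python sketch of encoding and decoding with this fixed-length code:

```python
# The fixed-length code from the slides above.
code = {'a': '00', 'b': '01', 'c': '10', 'd': '11'}

def encode(message):
    return ''.join(code[letter] for letter in message)

def decode(bits):
    inverse = {v: k for k, v in code.items()}
    # Take the bits two-by-two and look up the corresponding letter.
    return ''.join(inverse[bits[i:i + 2]] for i in range(0, len(bits), 2))

s = 'abababadaaabcdadbaab'
bits = encode(s)
print(bits)            # 0001000100010011000000011011001101000001
assert decode(bits) == s
```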

Page 58: Lecture 3

Fixed-length code

In the previous example all the letters were encoded with the same number of bits (two bits for each letter). Such codes are called fixed-length codes.

We want to improve on the previous code, that is, to use fewer bits for encoding a given string.

The idea is to use fewer bits for the letters that appear more frequently, and more bits for the letters that appear less frequently.

Page 59: Lecture 3

Variable length code

Consider the code

a → 0
b → 10
c → 110
d → 111

The string

“a b a b a b a d a a a b c d a d b a a b”

is now encoded as

“0100100100111000101101110111100010”

that is, with 34 bits instead of the 40 bits with the previous code.

Page 60: Lecture 3

Prefix codes

The code in the previous slide is prefix free, and this property allows us to decode it back to the original string.

Prefix free means that the code of a letter cannot be the beginning of the code of another letter.

The code in the previous slide is prefix free because

the code of a is 0 and no other letter’s code starts with 0

the code of b is 10 and no other letter’s code starts with 10

the code of c is 110 and no other letter’s code starts with 110

the code of d is 111 and no other letter’s code starts with 111

Page 61: Lecture 3

Prefix codes

To decode a prefix free code:

We start from the beginning of the string and check the first bit, then the first two bits and so on, until we find the code of a letter.

Add this letter to the decoded string. Because of the prefix free property it cannot be the beginning of the code of a different letter.

We continue with the next bits, until the end of the string.

If the process breaks down (i.e. we cannot find a valid code), it means that there was an error in the encoding process.

Page 62: Lecture 3

Prefix codes

In the previous example the string

“0100100100111000101101110111100010”

is decoded as

“0 | 10 | 0 | 10 | 0 | 10 | 0 | 111 | 0 | 0 | 0 | 10 | 110 | 111 | 0 | 111 | 10 | 0 | 0 | 10”

giving

“a b a b a b a d a a a b c d a d b a a b”

Different letters may have codes of different length, but nevertheless we do not need separators to indicate the end of a letter’s code.
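A minimal sketch of this bit-by-bit decoding for the variable-length code above:

```python
code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
inverse = {v: k for k, v in code.items()}

def decode_prefix(bits):
    decoded, current = [], ''
    for bit in bits:
        current += bit
        if current in inverse:        # prefix freeness: at most one match
            decoded.append(inverse[current])
            current = ''
    if current:
        raise ValueError('bit string does not end on a letter boundary')
    return ''.join(decoded)

print(decode_prefix('0100100100111000101101110111100010'))
# abababadaaabcdadbaab
```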

Page 63: Lecture 3

Summary of the lecture

Frequency domain

Basis change

Walsh - Hadamard transform

Tensor product transforms

Discrete Cosine Transform

Image compression

Encoding

Huffman encoding

Image compression

JPEG

Page 64: Lecture 3

Huffman coding

The Huffman encoding is an algorithm for finding an efficient prefix code for a given string.

The Huffman code is optimal: it requires the fewest bits compared to any other prefix code, under the assumption that every letter is encoded and decoded separately.

Page 65: Lecture 3

Huffman coding

STEP 1: (preparatory)

• Count how many times each letter appears in the string.

• Divide these numbers by the total number of letters in the string to find the probability of each letter.

This normalization is not necessary. The algorithm would also work with the actual number of appearances of each letter.

If we do not know the string we encode, we may use an estimate of the probabilities.

Page 66: Lecture 3

Huffman coding

STEP 2:

• Sort the probabilities of the letters in descending order.

• Combine the two letters with the lowest probability. The probability of the compound letter is the sum of the probabilities of its components.

• Sort again the letters and continue combining the two letters with the lowest probability until there are only two letters.

It is important to keep a record of the letters combined at each step. It will be needed at STEP 3, where this process will be reversed.

Page 67: Lecture 3

Huffman coding

STEP 3

• At this stage there are only two (possibly compound) letters. Their first bit will be 0 and 1, respectively.

• While there is a compound letter, split it, and make the next bit of the two constituent (possibly compound) letters to be 0 and 1, respectively.

• Continue splitting, until there is no compound letter left and you have recovered the initial letters.
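A compact Python sketch of STEP 2 and STEP 3 using a priority queue (one possible implementation; `huffman_code` is our name, and the exact bits depend on how ties are broken):

```python
import heapq

def huffman_code(freq):
    """Huffman code from a {letter: count} table, via a priority queue."""
    # Each heap entry: (probability, tie_breaker, {letter: code_so_far}).
    heap = [(p, i, {letter: ''}) for i, (letter, p) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)       # the two least probable letters
        p2, i, c2 = heapq.heappop(heap)
        # Combining two letters = prepending one more bit to their codes.
        merged = {k: '0' + v for k, v in c1.items()}
        merged.update({k: '1' + v for k, v in c2.items()})
        heapq.heappush(heap, (p1 + p2, i, merged))
    return heap[0][2]

code = huffman_code({'a': 450, 'b': 100, 'c': 120, 'd': 30, 'e': 200, 'f': 100})
# One optimal prefix code; the bit assignment may differ from the worked
# example below, but the average length (2.23 bits per letter) is optimal.
```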

Page 68: Lecture 3

Example

Consider a string S consisting of 1000 letters from an alphabet of 6 letters

{ a , b , c , d , e , f }

The frequency with which a letter appears in the message is shown in the following table

Exercise: Find the Huffman code for S.

a b c d e f

450 100 120 30 200 100

Page 69: Lecture 3

Example

STEP 1

We divide by 1000 (the number of letters in the message) to find the probability of each letter

We sort them in descending order

a b c d e f

0.45 0.1 0.12 0.03 0.2 0.1

a e c b f d

0.45 0.2 0.12 0.1 0.1 0.03

Page 70: Lecture 3

ExampleSTEP 2

We combine the two least probable letters and sort again

We combine the two least probable letters and sort again

a e f + d c b

0.45 0.2 0.13 0.12 0.1

a c + b e f + d

0.45 0.22 0.2 0.13

Page 71: Lecture 3

ExampleSTEP 2

We combine the two least probable letters and sort again

We combine the two least probable letters

a e + ( f + d ) c + b

0.45 0.33 0.22

( e + ( f + d ) ) + ( c + b ) a

0.55 0.45

Page 72: Lecture 3

ExampleSTEP 3

We now have only two letters and can proceed with STEP 3. Notice that we do not need any information about the probabilities any more.

We assign the first bit of the two letters

We split the compound letter, assigning the second bit

( e + ( f + d ) ) + ( c + b ) a

0 1

e + ( f + d ) c + b a

00 01 1

Page 73: Lecture 3

ExampleSTEP 3

We split the compound letters, assigning the third bit

We split the compound letter, assigning the fourth bit

e f + d c b a

000 001 010 011 1

e f d c b a

000 0010 0011 010 011 1

Page 74: Lecture 3

Binary trees

Why does the Huffman algorithm produce prefix free codes?

An intuitive way to see this is by considering binary trees.

A binary tree is a special type of graph, that is, nodes connected with edges.

One node of the binary tree, called the root, is at the top.

Each node is connected with either none or exactly two nodes below it.

The node above is called the parent and the two nodes below are called the children.

The nodes with no children are called leaves.

Page 75: Lecture 3

Example of binary tree:

Binary trees

Root

Level 1

Level 2

Level 3

Level 4

Page 76: Lecture 3

The leaves are shown as white circles.

Binary trees

Page 77: Lecture 3

There is a natural way to assign a bit string to each node.

The left child has the bit string of its parent with an additional 0 at the end.

The right child has the bit string of its parent with an additional 1 at the end.

Binary trees

[Figure: the binary tree with its nodes labelled — level 1: 0, 1; level 2: 00, 01; level 3: 000, 001, 010, 011; level 4: 0010, 0011.]

Page 78: Lecture 3

Binary trees

The bit strings of the leaves of a binary tree form a prefix free code.

Indeed, consider any code that is the beginning of a leaf's code.

They all lie on the path joining the leaf with the root.

Thus, they are internal nodes.

[Figure: the same labelled binary tree as on the previous slide.]

Page 79: Lecture 3

Binary trees

The Huffman code is given by the leaves of a binary tree.

Therefore, it is prefix free.

[Figure: the Huffman tree for the example, with leaves a, b, c, d, e, f.]

Page 80: Lecture 3

Huffman coding

STEP 2 of the algorithm determines a binary tree.

STEP 3 of the algorithm reconstructs it.

The algorithm is efficient (in fact, it is optimal) because it pushes the letters with low probability to the bottom of the tree, while the letters with high probability stay at the top levels of the tree.

Page 81: Lecture 3

Summary of the lecture

Frequency domain

Basis change

Walsh - Hadamard transform

Tensor product transforms

Discrete Cosine Transform

Image compression

Encoding

Huffman encoding

Image compression

JPEG

Page 82: Lecture 3

Compression

Two data sets may represent the same information but have different sizes in the computer’s memory.

For example, the same picture can be encoded in different formats (.jpg, .png, .bmp) and have different file sizes.

If the same information is encoded by two data sets consisting of $n_1$ and $n_2$ information units (e.g. bits), we say that the second set has been compressed with a compression ratio

$$C_r = n_1 / n_2$$

Page 83: Lecture 3

Compression

The processing of the initial data into a new data set with smaller size is called compression (or encoding), while the inverse process of recovering the initial data is called decompression (decoding).

Decompression is necessary when the compressed data are in a form

that can not be immediately used.

Usually, the main consideration in compression is the compression ratio. We are looking for large compression ratios, that is, to use as few bits as possible.

Page 84: Lecture 3

Compression

Other considerations in compression are:

The complexity and the time and memory costs of the compression-decompression algorithm.

The resiliency of the algorithm. That is, how do small errors in the compressed data affect the decompressed data?

Page 85: Lecture 3

Compression

Sometimes, depending on the application, we are not interested in recovering the exact initial data but only an approximation of it.

Compression algorithms that can recover the exact initial data are called lossless. Algorithms that cannot recover the exact initial data are called lossy.

Generally, lossy algorithms achieve better compression ratios.

In image processing we can usually afford the loss of some information, so lossy algorithms are common.

Page 86: Lecture 3

Compression

To evaluate a lossy algorithm we need a measure of the information lost by the compression. That is, we need an estimate of the difference between the initial image and the compressed and then decompressed image.

The root-mean-square error.

The root-mean-square signal-to-noise ratio.

Subjective criteria:

Absolute rating scale (e.g. the rating scale of the Television Allocations Study Organization).

Side by side comparison of the original and the compressed and then decompressed image.

Page 87: Lecture 3

Compression

In image compression, the initial image is processed and a new compressed image is obtained, with smaller size but which, nevertheless, carries the same or a comparable amount of information.

That means that the initial data carried a certain amount of redundancy.

We can identify several types of redundancy, from higher type to lower:

Psychovisual redundancy

Interpixel redundancy

Coding redundancy

Page 88: Lecture 3

Summary of the lecture

Frequency domain

Basis change

Walsh - Hadamard transform

Tensor product transforms

Discrete Cosine Transform

Image compression

Encoding

Huffman encoding

Image compression

JPEG

Page 89: Lecture 3

JPEG

JPEG is a compression algorithm using the Discrete Cosine Transform and Huffman encoding.

Subdivide the image into pixel blocks of size 8x8. The blocks are processed one after the other, from left to right and top to bottom.

Assuming that the values of the pixels are integers in the range [0, 255], subtract 128 to bring them into the range [-128, 127]. The reason is that the DCT maps the interval [-128, 127] to itself.

Apply the DCT. The DCT values are computed with 11-bit precision (even though the input has 8-bit precision).

Page 90: Lecture 3

JPEG

Next, we scale and quantize the DCT values, using the scaling array

16 11 10 16 24 40 51 61

12 12 14 19 26 58 60 55

14 13 16 24 40 57 69 56

14 17 22 29 51 87 80 62

18 22 37 56 68 109 103 77

24 35 55 64 81 104 113 92

49 64 78 87 103 121 120 101

72 92 95 98 112 100 103 99
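To make these per-block steps concrete, here is a minimal Python sketch (`jpeg_block` is our name, for illustration; it covers the level shift, the 2D DCT and the quantization, not the final Huffman step):

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II matrix, as defined in the DCT section above.
    u = np.arange(n).reshape(-1, 1)
    v = np.arange(n).reshape(1, -1)
    T = np.sqrt(2.0 / n) * np.cos((2 * v + 1) * u * np.pi / (2 * n))
    T[0, :] = np.sqrt(1.0 / n)
    return T

T8 = dct_matrix(8)

# The scaling array from the slide above.
Q = np.array([[16, 11, 10, 16, 24, 40, 51, 61],
              [12, 12, 14, 19, 26, 58, 60, 55],
              [14, 13, 16, 24, 40, 57, 69, 56],
              [14, 17, 22, 29, 51, 87, 80, 62],
              [18, 22, 37, 56, 68, 109, 103, 77],
              [24, 35, 55, 64, 81, 104, 113, 92],
              [49, 64, 78, 87, 103, 121, 120, 101],
              [72, 92, 95, 98, 112, 100, 103, 99]])

def jpeg_block(block):
    """One 8x8 block of pixels in [0, 255]: level shift, 2D DCT, scale and quantize."""
    shifted = block.astype(float) - 128        # into [-128, 127]
    coeffs = T8 @ shifted @ T8.T               # 2D DCT: Z = T A T'
    return np.round(coeffs / Q).astype(int)    # most high-frequency entries become 0
```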

Page 91: Lecture 3

JPEG

The DCT coefficients at the top left of the array are scaled less.

The actual numbers were obtained experimentally.

DCT

Page 92: Lecture 3

JPEG

Create a sequence of DCT coefficients using the zig-zag pattern:

Finally, encode with Huffman encoding, using a special symbol for the end of the non-zero coefficients.

0 1 5 6 14 15 27 28

2 4 7 13 16 26 29 42

3 8 12 17 25 30 41 43

9 11 18 24 31 40 44 53

10 19 23 32 39 45 52 54

20 22 33 38 46 51 55 60

21 34 37 47 50 56 59 61

35 36 48 49 57 58 62 63
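A minimal sketch that generates this traversal order (the function names are ours):

```python
def zigzag_indices(n=8):
    """Visit an n x n block along anti-diagonals, alternating direction."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 == 1 else diag[::-1])
    return order

# For n = 8 the first entries are (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ...
# matching the numbering in the table above.

def zigzag_scan(block):
    """Read the quantized coefficients of a block into a 1D sequence."""
    return [block[i][j] for i, j in zigzag_indices(len(block))]
```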