Gabriele Monfardini - Corso di Basi di Dati Multimediali (Multimedia Databases course), a.a. 2004-2005

Introduction

A large share of information is in the form of images.

Images are handled by machines as a matrix of digital picture elements, or pixels.

The appearance of an image depends on its image type and its resolution.


Types of images & Resolution

Bilevel (black & white), e.g. faxes

Grayscale

Color

Resolution is measured in dots per inch (dpi):
600 x 600 – current medium-quality laser printer
1200 x 1200 – low-cost phototypesetter
4800 x 4800 – high-resolution phototypesetter
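Type and resolution together fix the raw (uncompressed) size of an image: pixels times bits per pixel. Below is a minimal sketch of that arithmetic. It assumes an A4 page (210 x 297 mm) and the common depths of 1 bit (bilevel), 8 bits (grayscale) and 24 bits (RGB color). The helper `raw_size_bytes` is a hypothetical name used only for illustration.

```python
MM_PER_INCH = 25.4

def raw_size_bytes(width_mm: float, height_mm: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size of a scanned page in bytes (partial bytes rounded up)."""
    # Pixels along each side at the given resolution.
    w_px = round(width_mm / MM_PER_INCH * dpi)
    h_px = round(height_mm / MM_PER_INCH * dpi)
    total_bits = w_px * h_px * bits_per_pixel
    return (total_bits + 7) // 8

if __name__ == "__main__":
    # Same A4 page at 600 dpi with the three image types.
    for bpp, kind in [(1, "bilevel"), (8, "grayscale"), (24, "color")]:
        size = raw_size_bytes(210, 297, 600, bpp)
        print(f"A4 at 600 dpi, {kind}: {size / 2**20:.1f} MiB")
```

The size grows with the square of the dpi and linearly with the bit depth. This is why compression matters even for "simple" bilevel pages.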


Bilevel images: CCITT fax standard

Fax: facsimile.

CCITT: Comité Consultatif International Téléphonique et Télégraphique (International Telegraph and Telephone Consultative Committee). It is part of the ITU (International Telecommunication Union), one of the specialized agencies of the United Nations.
In the late 1970s the CCITT began working on a standard for fax transmission.
1980: CCITT Group 3 standard.

Groups 1 and 2 were earlier attempts. They used simpler encoding and modulation techniques, which resulted in very slow transmission.


CCITT Group 3 - I

It is the most common standard for fax transmission.
It is accepted worldwide: almost every fax machine supports this standard.
It uses compression algorithms for bilevel images.


CCITT Group 3 - II

Paper size: international A4 (not US Letter)
Standard resolution: 204 x 98 dpi (roughly 200 x 100)
High resolution: 204 x 196 dpi (roughly 200 x 200)

1728 bits/line

1188 lines/page

Bilevel image: 1 bit/pixel
Image size at standard resolution: 1728 x 1188 bits, about 2 Mbit
Transmission rate: 4.8 Kbit/s (today usually higher, 14.4 – 33.6 Kbit/s)

At 4.8 Kbit/s and standard resolution, an uncompressed page would take about 430 s. With the Group 3 algorithms it takes only about 1 minute on average.
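The 430 s figure follows directly from the numbers above. Here is a minimal check of that arithmetic, which also shows the compression ratio implied by the "about 1 minute" average. All constants come from the slide.

```python
LINE_BITS = 1728        # pixels (= bits, at 1 bit/pixel) per scan line
LINES_STANDARD = 1188   # lines per page at standard resolution
RATE_BPS = 4800         # 4.8 Kbit/s modem

page_bits = LINE_BITS * LINES_STANDARD
seconds = page_bits / RATE_BPS
print(f"{page_bits} bits, {seconds:.0f} s uncompressed")            # 2052864 bits, 428 s uncompressed
print(f"ratio needed for ~60 s per page: {seconds / 60:.1f}:1")    # ratio needed for ~60 s per page: 7.1:1
```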


Run-length coding

Each scan line is composed of runs, i.e. sequences of pixels of the same color.

Run-length coding counts the number of pixels in each run.
Example: 3w 4b 9w 2b 2w 6b 5w 2b 5w...
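Here is a minimal run-length sketch that reproduces the slide's example. For readability it assumes that a scan line is a string of `'w'` (white) and `'b'` (black) characters. Real fax data is a bit array.

```python
from itertools import groupby

def run_lengths(line: str) -> list[tuple[int, str]]:
    """Split a scan line of 'w'/'b' pixels into (length, color) runs."""
    return [(len(list(group)), color) for color, group in groupby(line)]

line = "w"*3 + "b"*4 + "w"*9 + "b"*2 + "w"*2 + "b"*6 + "w"*5 + "b"*2 + "w"*5
print(" ".join(f"{n}{c}" for n, c in run_lengths(line)))
# 3w 4b 9w 2b 2w 6b 5w 2b 5w
```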


G3 1D

Group 3 one-dimensional coding (G3 1D) is called Modified Huffman (MH) because it encodes run lengths using a predefined Huffman code.
To maintain black/white synchronization, each line begins with a white run, possibly of zero length.
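Because every line starts with white, the colors never need to be transmitted: the runs simply alternate white, black, white, and so on. Here is a minimal sketch of that convention. It keeps the hypothetical `'w'`/`'b'` string representation used for illustration.

```python
from itertools import groupby

def g3_runs(line: str) -> list[int]:
    """Run lengths of a 'w'/'b' line, always starting with a white run.

    Colors are implicit: runs alternate white, black, white, ...
    """
    runs = [len(list(g)) for _, g in groupby(line)]
    if line.startswith("b"):
        runs.insert(0, 0)   # zero-length white run keeps the alternation in sync
    return runs
```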


G3 1D

Example: the runs 3w 4b 9w 2b 2w 6b ... are coded as
1000 011 10100 11 0111 0010 ...

The predefined Huffman codewords were derived from the run-length probabilities observed in typical handwritten documents.
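The example bit string can be reproduced from a lookup table. This sketch contains only the six terminating codewords that the example needs, taken from the ITU-T T.4 Modified Huffman tables. The full tables cover all run lengths from 0 to 63, plus makeup codes. The function name `mh_encode` is hypothetical.

```python
# Subset of the ITU-T T.4 terminating codewords: only the run lengths used in the example.
WHITE_TERM = {2: "0111", 3: "1000", 9: "10100"}
BLACK_TERM = {2: "11", 4: "011", 6: "0010"}

def mh_encode(runs: list[tuple[int, str]]) -> str:
    """Encode alternating (length, color) runs with the table subset above."""
    words = []
    for length, color in runs:
        table = WHITE_TERM if color == "w" else BLACK_TERM
        words.append(table[length])   # KeyError for lengths outside this small subset
    return " ".join(words)

runs = [(3, "w"), (4, "b"), (9, "w"), (2, "b"), (2, "w"), (6, "b")]
print(mh_encode(runs))   # 1000 011 10100 11 0111 0010
```

Note that short black runs get very short codewords (a 2-pixel black run is just `11`), since they are frequent in text.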


G3 1D

Since one line has 1728 pixels (bits), a plain code would need a codeword for every black and white run length up to 1728.
Shorter runs occur more frequently than longer runs, so each run length is instead coded in additive form.

There are terminating and makeup codewords.
Lengths from 0 to 63 are coded with a single terminating codeword.
Longer runs are coded with one or more makeup codewords (for multiples of 64) followed by a terminating codeword.

Each line is terminated with an EOL symbol made of eleven 0s followed by one 1.
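Here is a minimal sketch of the additive split for runs within one G3 line (at most 1728 pixels, which always fits in a single makeup codeword). The function name is hypothetical, and the sketch returns lengths rather than actual codewords.

```python
EOL = "0" * 11 + "1"   # end-of-line symbol: eleven 0s followed by a 1

def split_run(length: int) -> list[int]:
    """Split a run (0..1728) into an optional makeup length and a terminating length.

    Makeup codewords cover multiples of 64; terminating codewords cover 0..63.
    A terminating codeword is always emitted, even when it codes 0.
    """
    if not 0 <= length <= 1728:
        raise ValueError("G3 line runs are between 0 and 1728 pixels")
    makeup, term = divmod(length, 64)
    return ([makeup * 64] if makeup else []) + [term]

print(split_run(200))   # [192, 8]
```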


G3 2D

Group 3 two-dimensional coding (G3 2D) is called Modified READ (MR) because it is a variant of a previously defined code called READ (Relative Element Address Designate).
Many images have a high degree of vertical coherence between consecutive lines.

Therefore changing elements are coded w.r.t. a “nearby” changing element of the same color in the previous (reference) line.


G3 2D

“Nearby” means within an interval of radius 3 pixels (vertical mode).
If the current line has changing elements without a counterpart in the reference line, the coder switches to horizontal mode (1D).
Conversely, if the reference line has a run with no counterpart in the current line, a special pass code is emitted.


G3 2D

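[Figure: a reference line and a current line, with changing elements coded in vertical mode, horizontal mode, and with a pass code]

Horizontal mode: <mode | length of preceding white run | length of black run>

Vertical mode: offset of the changing element w.r.t. the reference line (+2, -2, -1, 0 in the figure), taken from a Huffman table with codewords for -3, -2, -1, 0, +1, +2, +3

Pass code: generated code 0001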
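Here is a simplified sketch of the mode choice for one changing element. The mode codewords are those of the ITU-T T.4 two-dimensional code table. The function name is hypothetical. The sketch deliberately ignores pass-mode detection and the two run-length codewords that follow a horizontal-mode prefix.

```python
# Mode codewords of G3 2D (Modified READ), from the ITU-T T.4 two-dimensional code table.
PASS = "0001"
HORIZONTAL = "001"   # followed by two 1D (Modified Huffman) run codewords, not produced here
VERTICAL = {0: "1", 1: "011", 2: "000011", 3: "0000011",
            -1: "010", -2: "000010", -3: "0000010"}

def vertical_or_horizontal(a1: int, b1: int) -> str:
    """Mode prefix for a changing element at a1 on the current line, given the
    corresponding changing element b1 on the reference line.

    Offset = a1 - b1 (positive: a1 lies to the right of b1).
    """
    offset = a1 - b1
    if abs(offset) <= 3:
        return VERTICAL[offset]   # "nearby": vertical mode
    return HORIZONTAL             # too far: fall back to 1D run coding

print(vertical_or_horizontal(12, 10))   # 000011
```

The most common case (same position, offset 0) costs a single bit, which is where the gain over 1D coding comes from.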


G3 2D

Two-dimensional coding is more prone to transmission errors.

In G3 1D an error may corrupt the rest of the line, but synchronization is restored by the EOL codeword.
In G3 2D an error in a line is likely to propagate to all the following lines coded from it.
For this reason, one line out of every k is coded in 1D as a reference line, and the following k-1 lines are coded in 2D (each w.r.t. the line above).
Standard resolution: k=2
High resolution: k=4
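Here is a minimal sketch of the resulting mode schedule. With one 1D line every k lines, an error can corrupt at most that line and the 2D lines that follow it, up to the next 1D line. The function name is hypothetical.

```python
def line_modes(n_lines: int, k: int) -> list[str]:
    """Coding mode of each line: one 1D line every k lines, the rest coded in 2D."""
    return ["1D" if i % k == 0 else "2D" for i in range(n_lines)]

print(line_modes(6, 2))  # ['1D', '2D', '1D', '2D', '1D', '2D']
print(line_modes(6, 4))  # ['1D', '2D', '2D', '2D', '1D', '2D']
```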


CCITT fax standard: compression performance

Standard resolution (~200x100 dpi):
  G3 1D: 0.13 bits/pixel, 57 s for an A4 page at 4.8 Kbps
  G3 2D (k=2): 0.11 bits/pixel, 47 s for an A4 page at 4.8 Kbps

High resolution (~200x200 dpi):
  G3 2D (k=4): 0.09 bits/pixel, 74 s for an A4 page at 4.8 Kbps

Compression is very good for office documents, where run lengths are long.
It would be very poor for bilevel natural images, where runs are short.
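The bits/pixel figures translate into compression ratios and transmission times. Here is a rough sketch that assumes 1728 x 1188 pixels at standard resolution and twice as many lines at high resolution. The estimates (about 56, 47 and 77 s) are consistent with the table once the rounding of the bits/pixel values is taken into account. The function name is hypothetical.

```python
LINE_BITS = 1728
LINES = {"standard": 1188, "high": 2 * 1188}   # high resolution doubles the lines
RATE_BPS = 4800

def estimate(bits_per_pixel: float, resolution: str) -> tuple[float, float]:
    """Return (compression ratio w.r.t. 1 bit/pixel, seconds per A4 page)."""
    pixels = LINE_BITS * LINES[resolution]
    ratio = 1 / bits_per_pixel
    seconds = pixels * bits_per_pixel / RATE_BPS
    return ratio, seconds

for label, bpp, res in [("G3 1D", 0.13, "standard"),
                        ("G3 2D k=2", 0.11, "standard"),
                        ("G3 2D k=4", 0.09, "high")]:
    ratio, secs = estimate(bpp, res)
    print(f"{label}: {ratio:.1f}:1, about {secs:.0f} s")
```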


Continuous-tone images: why lossless compression?

Lossy compression is often preferred because it yields much smaller files while keeping good quality.
However, there are situations in which an approximation of the image may not be acceptable:

medical images
historical documents
images with legal relevance


Continuous-tone images: lossless compression

GIF standard
PNG standard
JPEG-LS

JPEG-LS is a fairly new standard. The original JPEG standard included a lossless mode, but its performance was not close to the state of the art.
JPEG-LS estimates each pixel value from a fairly simple context: an effective, low-cost solution.
www.hpl.hp.com/loco
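The "simple context" can be made concrete with the median edge detector (MED) predictor of LOCO-I, the algorithm behind JPEG-LS. It predicts a pixel from its left, upper and upper-left neighbours, and the coder then transmits the prediction residual. This sketch covers only the predictor. The context modeling, bias correction and Golomb-Rice coding of the full standard are not shown.

```python
def med_predict(a: int, b: int, c: int) -> int:
    """LOCO-I / JPEG-LS median edge detector (MED) prediction.

    a = left neighbour, b = neighbour above, c = upper-left neighbour.
    """
    if c >= max(a, b):
        return min(a, b)      # likely edge: take the smaller neighbour
    if c <= min(a, b):
        return max(a, b)      # likely edge: take the larger neighbour
    return a + b - c          # smooth area: planar prediction

x = 17
residual = x - med_predict(10, 20, 15)   # the coder transmits this small value
```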


GIF image format - I

Adopted by CompuServe to minimize the time required to download images over a modem link.
The most widely used lossless image format until 1995.
8-bit pixel description: 256-color images, where each pixel value is an index into a color map.


GIF image format - II

The color map can be specified for each image or can be omitted

If specified, it is included as a header in the image file, in uncompressed form.
The color map is composed of 256 24-bit entries, which specify 256 RGB colors.
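Here is a minimal sketch of how a decoder turns 8-bit indices back into colors through the color map. A full 256-entry map costs 256 x 3 = 768 bytes of uncompressed header. The 4-entry map and the function name are hypothetical and only keep the example short.

```python
Color = tuple[int, int, int]   # (R, G, B), 8 bits each: one 24-bit entry

def expand_indices(indices: list[int], color_map: list[Color]) -> list[Color]:
    """Replace each 8-bit pixel index with its 24-bit RGB color-map entry."""
    if len(color_map) > 256:
        raise ValueError("an 8-bit index can address at most 256 entries")
    return [color_map[i] for i in indices]

# Hypothetical small map for illustration: black, white, red, blue.
cmap = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 0, 255)]
print(expand_indices([1, 2, 2, 0], cmap))
```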

The compression scheme used is LZW.
The alphabet symbols are the 256 colors of the color map plus a “clear” code and an “end-of-information” code.
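Here is a simplified LZW sketch over that alphabet, assuming codes 0-255 for the colors, 256 for "clear" and 257 for "end-of-information". The sketch makes several simplifications compared with a real GIF encoder:

- It returns codes as integers instead of packing them with variable bit widths.
- It ignores the 4096-entry table limit.
- It emits "clear" only at the start.

The function names are hypothetical.

```python
def lzw_encode(pixels: list[int], alphabet_size: int = 256) -> list[int]:
    clear, eoi = alphabet_size, alphabet_size + 1
    table = {(i,): i for i in range(alphabet_size)}
    next_code = alphabet_size + 2          # first free code after clear and EOI
    out = [clear]
    w: tuple = ()
    for p in pixels:
        wp = w + (p,)
        if wp in table:
            w = wp                          # keep extending the current string
        else:
            out.append(table[w])            # emit the longest known string
            table[wp] = next_code           # and learn the new one
            next_code += 1
            w = (p,)
    if w:
        out.append(table[w])
    out.append(eoi)
    return out

def lzw_decode(codes: list[int], alphabet_size: int = 256) -> list[int]:
    clear, eoi = alphabet_size, alphabet_size + 1
    table, next_code, prev, out = {}, 0, None, []
    for c in codes:
        if c == clear:                      # reset the dictionary
            table = {i: (i,) for i in range(alphabet_size)}
            next_code, prev = alphabet_size + 2, None
            continue
        if c == eoi:
            break
        if c in table:
            entry = table[c]
        elif c == next_code and prev is not None:
            entry = prev + (prev[0],)       # code defined by this very step
        else:
            raise ValueError(f"invalid code {c}")
        out.extend(entry)
        if prev is not None:
            table[next_code] = prev + (entry[0],)
            next_code += 1
        prev = entry
    return out

print(lzw_encode([1, 1, 1, 1]))   # [256, 1, 258, 1, 257]
```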


GIF image format - III

Although this feature is not widely used, GIF files may contain more than one image, and the images can share the color map.
LZW-coded data is grouped into blocks, each preceded by a byte count, so that an image can be skipped without decompressing it.
In 1995 Unisys announced that royalties would be due on GIF implementations because of a patent it held on LZW.
This catalyzed the development of a new, freely usable lossless image format that incorporated the latest improvements.
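The byte-count blocks work as follows in GIF: data is split into sub-blocks of at most 255 bytes, each preceded by its length, and a zero-length block terminates the sequence. A reader can therefore skip an image by reading only the length bytes. Here is a minimal sketch with hypothetical function names.

```python
def to_sub_blocks(data: bytes) -> bytes:
    """Split data into GIF-style sub-blocks: a length byte (1..255) followed
    by that many bytes, ended by a zero-length block terminator."""
    out = bytearray()
    for i in range(0, len(data), 255):
        chunk = data[i:i + 255]
        out.append(len(chunk))
        out += chunk
    out.append(0)                     # block terminator
    return bytes(out)

def skip_sub_blocks(buf: bytes, pos: int = 0) -> int:
    """Return the offset just past the terminator, reading only the length bytes."""
    while buf[pos] != 0:
        pos += 1 + buf[pos]           # jump over the length byte and its payload
    return pos + 1
```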


PNG image format - I

Portable Network Graphics (pronounced “ping”)

It uses the gzip compression scheme (deflate). Thanks to several improvements, it compresses about 10-30% better than GIF.
By default it encodes the pixels in raster-scan order, but other methods (filters) are available:

horizontal difference, i.e. the difference between the current pixel value and the previous (left) one
vertical difference, i.e. the difference w.r.t. the pixel above
average difference, i.e. the difference w.r.t. the average of the pixel above and the previous (left) pixel
...
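These differences correspond to PNG's None, Sub, Up and Average filters. Here is a minimal sketch on one row, assuming 8-bit grayscale (one byte per pixel). Results are taken modulo 256, as in PNG, so the decoder can invert them exactly. The function name is hypothetical.

```python
def filter_row(kind: str, row: list[int], prev: list[int]) -> list[int]:
    """Apply a PNG-style filter to one row of 8-bit grayscale samples.

    prev is the row above (all zeros for the first row).
    """
    out = []
    for x, value in enumerate(row):
        left = row[x - 1] if x > 0 else 0
        up = prev[x]
        if kind == "none":
            pred = 0
        elif kind == "sub":        # horizontal difference
            pred = left
        elif kind == "up":         # vertical difference
            pred = up
        elif kind == "average":    # difference from the mean of left and above
            pred = (left + up) // 2
        else:
            raise ValueError(kind)
        out.append((value - pred) % 256)
    return out

print(filter_row("sub", [10, 12, 14, 16], [10, 10, 10, 10]))  # [10, 2, 2, 2]
```

On smooth images the filtered bytes cluster around small values, which deflate then compresses better than the raw samples.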


PNG image format - II

It is possible to use more than 256 colors: up to 16-bit grayscale and 48-bit color.
GIF uses one special pixel value to indicate transparency. PNG instead supports a full alpha channel (e.g. 256 levels of transparency per pixel with 8-bit alpha), allowing images to fade progressively into the background.

It seems inevitable that the PNG format will gradually take over the role of standard lossless image format for the WWW, replacing GIF.


Continuous-tone images: why lossy compression?

Digital images are already an approximation of the real analog phenomenon.
Lossy techniques make it possible to obtain very good compression with a modest loss of detail.
This is useful for storing and transmitting images.


Continuous-tone images: lossy compression

JPEG
JPEG2000

JPEG2000 is a new image coding system that uses state-of-the-art compression techniques based on wavelet technology.
File extension: .jp2
With highly compressed files of the same size, the perceived quality of JPEG2000 images is better than that of JPEG images.
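To give a feel for "wavelet technology", here is one level of the reversible LeGall 5/3 lifting transform, the integer wavelet used by JPEG2000's lossless path. The sketch is 1D, single-level, and assumes an even-length signal with symmetric boundary extension. JPEG2000 itself applies the transform in 2D over several levels and follows it with EBCOT coding, none of which is shown. The function names are hypothetical.

```python
def dwt53_forward(x: list[int]) -> tuple[list[int], list[int]]:
    """One level of the reversible 5/3 lifting transform (1D, even length).

    Returns (s, d): low-pass (approximation) and high-pass (detail) samples.
    """
    n = len(x) // 2
    if n == 0 or len(x) != 2 * n:
        raise ValueError("sketch assumes a non-empty, even-length signal")
    even, odd = x[0::2], x[1::2]
    # Predict: detail = odd sample minus the floored mean of its even neighbours.
    d = [odd[k] - (even[k] + even[min(k + 1, n - 1)]) // 2 for k in range(n)]
    # Update: smooth the even samples using the neighbouring details.
    s = [even[k] + (d[max(k - 1, 0)] + d[k] + 2) // 4 for k in range(n)]
    return s, d

def dwt53_inverse(s: list[int], d: list[int]) -> list[int]:
    """Undo the lifting steps in reverse order; integer arithmetic makes it exact."""
    n = len(s)
    even = [s[k] - (d[max(k - 1, 0)] + d[k] + 2) // 4 for k in range(n)]
    odd = [d[k] + (even[k] + even[min(k + 1, n - 1)]) // 2 for k in range(n)]
    x = []
    for e, o in zip(even, odd):
        x += [e, o]
    return x

print(dwt53_forward([5, 5, 5, 5]))   # ([5, 5], [0, 0])
```

Smooth regions yield near-zero details, which is what a later coding stage exploits.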


JPEG format - I

JPEG is a standard defined by the Joint Photographic Experts Group in 1992.
It was conceived to transmit images at 64 Kbps.
It has a lossy mode and a lossless mode (the latter not widely used, and today superseded by the JPEG-LS standard).
In lossy mode it achieves very good quality at about 1 bit/pixel.
Its implementation complexity is reasonable.


JPEG format - II

It can be used with grayscale and color images.
Each channel of the color space (RGB, YUV...) is treated separately.
It allows progressive transmission, which is much better suited to the WWW than raster transmission.


Raster vs. progressive transmission


JPEG Coder - I

[Figure: JPEG coder block diagram: Discrete Cosine Transform → Quantization → Binary Encoder]


JPEG Coder - II

Division of the image into 8x8-pixel squares
Preprocessing
Discrete Cosine Transform of each square
Coefficient quantization
Bit-stream encoding
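Here is a minimal sketch of the first step, splitting a channel into 8x8 blocks in raster order (rows of blocks, left to right). It assumes that the width and height are multiples of 8. Real encoders pad the image edges first. The function name is hypothetical.

```python
def split_blocks(img: list[list[int]], n: int = 8) -> list[list[list[int]]]:
    """Split a 2D image (list of rows) into n x n blocks, in raster order of blocks."""
    h, w = len(img), len(img[0])
    if h % n or w % n:
        raise ValueError("dimensions must be multiples of the block size")
    return [[row[bx:bx + n] for row in img[by:by + n]]
            for by in range(0, h, n) for bx in range(0, w, n)]
```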


Preprocessing: color space transformation & downsampling

Conversion from RGB into YUV.
The Y component represents the brightness of a pixel, while the U and V components together represent hue and saturation.
The human eye can see more detail in the Y component than in U and V, which can therefore be compressed more aggressively.
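JPEG files in the common JFIF format use YCbCr, the digital counterpart of the YUV representation above. Here is a minimal sketch of the JFIF conversion for 8-bit samples, with chroma centred on 128. Encoders also round and clamp the results to 0..255, which the sketch omits.

```python
def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    """JFIF RGB -> YCbCr conversion (8-bit samples, chroma offset 128)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b          # brightness (luma)
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr
```

Grays map to Cb = Cr = 128, so in typical images most of the information ends up in Y.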

4:4:4 – no downsampling
4:2:2 – horizontal downsampling by a factor of 2
4:2:0 – both horizontal and vertical downsampling by a factor of 2
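With 4:2:0 each chroma plane keeps a quarter of its samples. The total therefore drops from 3 samples per pixel to 1 + 1/4 + 1/4 = 1.5, which halves the raw data. Here is a minimal sketch of 4:2:0 downsampling by averaging 2x2 neighbourhoods. Averaging is one common choice; encoders may filter differently. The sketch assumes even dimensions, and the function name is hypothetical.

```python
def downsample_420(chroma: list[list[float]]) -> list[list[float]]:
    """Halve a chroma plane horizontally and vertically by averaging 2x2 blocks."""
    h, w = len(chroma), len(chroma[0])
    return [[(chroma[y][x] + chroma[y][x + 1] + chroma[y + 1][x] + chroma[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]
```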


Discrete Cosine Transform - I

The discrete cosine transform (DCT) is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers.
It is used in JPEG because it is fast and fairly easy to implement efficiently.


Discrete Cosine Transform - II

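$$B(k_1,k_2) = \sum_{i=0}^{N_1-1}\sum_{j=0}^{N_2-1} A(i,j)\,\cos\left[\frac{\pi k_1 (2i+1)}{2N_1}\right]\cos\left[\frac{\pi k_2 (2j+1)}{2N_2}\right]$$

(up to a normalization constant), where the block is $N_1 \times N_2$ pixels (in JPEG, 8x8) and $A(i,j)$ is the value of the pixel at position $(i,j)$.

$B(k_1,k_2)$ is the DCT coefficient at position $(k_1,k_2)$. Low values of $k_1$ correspond to low vertical frequencies, and low values of $k_2$ to low horizontal frequencies.
Generally, the coefficients of the higher frequencies have very low values.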
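Here is a direct (slow, O(N^4)) sketch of the 8x8 DCT with the normalization used by JPEG, scale factor 1/4 and weight 1/√2 for index 0, together with its inverse. With this normalization the transform is orthonormal, so the inverse reconstructs the block exactly up to floating-point error. JPEG also shifts 8-bit samples by -128 before the transform, which is omitted here. Real codecs use fast factorizations instead of these nested sums.

```python
import math

N = 8

def weight(k: int) -> float:
    """Normalization weight: 1/sqrt(2) for the zero frequency, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct2(block: list[list[float]]) -> list[list[float]]:
    """Forward 8x8 DCT; u indexes vertical frequency (rows), v horizontal."""
    return [[0.25 * weight(u) * weight(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]

def idct2(coef: list[list[float]]) -> list[list[float]]:
    """Inverse 8x8 DCT."""
    return [[0.25 * sum(
                weight(u) * weight(v) * coef[u][v]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]
```

A flat block of value 100 puts all its energy into B(0,0) = 800, and every other coefficient is zero.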


Discrete Cosine Transform - III

[Figure: the DCT basis functions for an 8x8 block]

Each 8x8 square is transformed into 64 coefficients.


Discrete Cosine Transform - IV

If the 64 DCT coefficients were known with infinite precision, the pixels of the square could be reconstructed exactly. However:

precision is finite
the coefficients are always quantized
some high-frequency coefficients are not transmitted at all. This allows higher compression without sacrificing too much quality, as the human eye is less sensitive to high frequencies.


Quantization - I

Each component of the resulting DCT matrix is scaled differently, by dividing it by its own factor.
The factor for each component was chosen according to human sensitivity to changes at that frequency.
In practice the matrix of factors is usually:


Quantization - II

Next, all values are rounded to the nearest integer.
This produces many 0s in the high-frequency zone, where the factors are bigger.
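Here is a minimal sketch of the divide-and-round step. It uses the example luminance table from Annex K of the JPEG standard (ITU-T T.81), which is not necessarily the matrix shown on the slide. The sketch rounds halves away from zero. The function names are hypothetical.

```python
import math

# Example luminance quantization table from Annex K of ITU-T T.81 (JPEG).
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def round_half_away(v: float) -> int:
    """Round to the nearest integer, halves away from zero."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))

def quantize(coef: list[list[float]], table: list[list[int]] = Q_LUMA) -> list[list[int]]:
    """Divide each DCT coefficient by its factor and round to the nearest integer."""
    return [[round_half_away(coef[u][v] / table[u][v]) for v in range(8)]
            for u in range(8)]
```

Large high-frequency factors (around 100 in the lower-right corner) turn small coefficients into 0, which is exactly the effect described above.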


Zig-zag scan

Low-frequency coefficients are transmitted before high-frequency coefficients.
This allows progressive visualization of each 8x8 block.
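The zig-zag order walks the anti-diagonals (i + j = constant) of the block and alternates direction on each one. Here is a minimal sketch that generates it by sorting positions. The result matches the standard JPEG order (natural indices 0, 1, 8, 16, 9, 2, ...). The function names are hypothetical.

```python
def zigzag_order(n: int = 8) -> list[tuple[int, int]]:
    """(row, col) positions of an n x n block in zig-zag order."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    # Sort by anti-diagonal; on odd diagonals rows increase, on even ones they decrease.
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else -p[0]))

def zigzag_scan(block: list[list[int]]) -> list[int]:
    """Flatten a block following the zig-zag order."""
    return [block[i][j] for i, j in zigzag_order(len(block))]
```

After quantization, the zeros cluster at the end of this sequence, which makes them cheap to code.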


Raster vs. progressive transmission

Raster transmission: the DCT coefficients of the upper-left block are sent first, then those of all the other blocks in the top row of the image, and so on.

Progressive transmission: first all the (0,0) coefficients, then all the (0,1) coefficients, and so on, following the zig-zag scan order within each block.
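The two orders differ only in which loop is outer. Here is a minimal sketch that lists (block, zig-zag index) pairs. It is a simplification: JPEG's progressive mode groups coefficients into bands per scan (spectral selection) and can also send bits in stages (successive approximation). The function names are hypothetical.

```python
def raster_order(n_blocks: int, n_coef: int = 64) -> list[tuple[int, int]]:
    """All coefficients of block 0, then all of block 1, ..."""
    return [(b, k) for b in range(n_blocks) for k in range(n_coef)]

def progressive_order(n_blocks: int, n_coef: int = 64) -> list[tuple[int, int]]:
    """Zig-zag coefficient 0 of every block, then coefficient 1 of every block, ..."""
    return [(b, k) for k in range(n_coef) for b in range(n_blocks)]

print(progressive_order(3, 2))  # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```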


Binary coding

The DCT(0,0) coefficient usually varies very slowly from one block to the next, as it is proportional to the block's mean value.

For this reason it is convenient to encode its difference from the previous block's value.
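Here is a minimal sketch of this differential coding of the (0,0) coefficients, predicting the first block from 0. JPEG then entropy-codes the differences, which this sketch does not do. The function names are hypothetical.

```python
from itertools import accumulate

def dc_differences(dc_values: list[int]) -> list[int]:
    """Difference of each block's DC coefficient from the previous block's (first vs 0)."""
    return [cur - prev for prev, cur in zip([0] + dc_values[:-1], dc_values)]

def dc_restore(diffs: list[int]) -> list[int]:
    """Decoder side: a running sum recovers the original DC values."""
    return list(accumulate(diffs))

print(dc_differences([50, 52, 51, 51]))   # [50, 2, -1, 0]
```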

Typically the bit stream is coded with Huffman coding.
An arithmetic coding scheme can also be used, gaining some compression at the cost of decoding speed.

The Huffman codes can be predefined, or optimal tables can be built for the image and inserted in the stream.
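"Building optimal tables" means deriving a Huffman code from the symbol frequencies actually found in the image. Here is a minimal sketch of plain Huffman construction. It is not JPEG's own table procedure, which additionally limits codewords to 16 bits and stores tables as counts of codes per length. The function name is hypothetical.

```python
import heapq
from collections import Counter

def huffman_code(symbols) -> dict:
    """Build a prefix code from the symbol frequencies (plain Huffman)."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate case: a single symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)      # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]
```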


JPEG Decoder

[Figure: JPEG decoder block diagram: Binary Decoder → Dequantization → Inverse DCT]

Some values are lost!

Good quality, but reconstruction is not exact.
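The loss happens at dequantization. The decoder can only multiply each quantized level back by its factor, so the rounding error of the coder is never recovered. Here is a minimal sketch on single coefficients, with arbitrary example values and hypothetical function names. It uses Python's built-in `round`, which rounds ties to even.

```python
def quantize_value(coef: float, q: int) -> int:
    return round(coef / q)          # coder: divide and round (information lost here)

def dequantize_value(level: int, q: int) -> int:
    return level * q                # decoder: can only multiply back

for coef, q in [(37, 16), (-30, 11), (40, 99)]:
    level = quantize_value(coef, q)
    print(coef, "->", level, "->", dequantize_value(level, q))
```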


JPEG performance - I


JPEG performance - II

[Figure: the same image shown as the original and JPEG-compressed with quality factors 75, 20 and 3]
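A "quality factor" usually works by scaling the quantization table. The slides do not say which encoder produced these images. This sketch therefore assumes the widely used IJG libjpeg convention:

- The scale is 5000/Q below 50 and 200 - 2Q from 50 upward.
- Entries are clamped to 1..255 for baseline JPEG.

Quality 50 leaves the base table unchanged. Quality 3 pushes most factors to the maximum, which explains the heavy blockiness at that setting. The function name is hypothetical.

```python
def scale_table(base: list[list[int]], quality: int) -> list[list[int]]:
    """Scale a quantization table by a 1..100 quality factor (IJG libjpeg convention)."""
    quality = min(max(quality, 1), 100)
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality   # percent
    return [[min(max((q * scale + 50) // 100, 1), 255) for q in row] for row in base]
```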