
Chapter 1

INTRODUCTION

Digital image processing is an area characterized by the need for extensive experimental work to establish the viability of proposed solutions to a given problem. An important characteristic underlying the design of image processing systems is the significant level of testing and experimentation that is normally required before arriving at an acceptable solution. This characteristic implies that the ability to formulate approaches and quickly prototype candidate solutions plays a major role in reducing the cost and time required to arrive at a viable system implementation.

1.1 What is DIP?

An image may be defined as a two-dimensional function f(x, y), where x and y are spatial coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point. When x, y, and the amplitude values of f are all finite, discrete quantities, we call the image a digital image. The field of DIP refers to processing digital images by means of a digital computer. A digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are called pixels.

Vision is the most advanced of our senses, so it is not surprising that images play the single most important role in human perception. However, unlike humans, who are limited to the visual band of the electromagnetic (EM) spectrum, imaging machines cover almost the entire EM spectrum, ranging from gamma rays to radio waves. They can also operate on images generated by sources that humans are not accustomed to associating with images.

There is no general agreement among authors regarding where image processing stops and other related areas, such as image analysis and computer vision, start. Sometimes a distinction is made by defining image processing as a discipline in which both the input and output of a process are images. This is a limiting and somewhat artificial boundary. The area of image analysis (image understanding) is in between image processing and computer vision.


There are no clear-cut boundaries in the continuum from image processing at one

end to complete vision at the other. However, one useful paradigm is to consider three

types of computerized processes in this continuum: low-, mid-, and high-level processes. Low-level processing involves primitive operations such as image preprocessing to reduce noise, contrast enhancement, and image sharpening. A low-level process is characterized by the fact that both its inputs and outputs are images. Mid-level processing on images involves tasks such as segmentation, description of objects to reduce them to a form suitable for computer processing, and classification of individual objects. A mid-level process is characterized by the fact that its inputs generally are images but its outputs are attributes extracted from those images. Finally, higher-level processing involves "making sense" of an ensemble of recognized objects, as in image analysis, and, at the far end of the continuum, performing the cognitive functions normally associated with human vision.

Digital image processing, as already defined, is used successfully in a broad range of areas of exceptional social and economic value. Images are an everyday aspect of computers now. Web sites on the Internet are generally made up of many pictures, and a large proportion of transmission bandwidth and storage capacity is taken up by computer images. Reducing the storage requirements of an image while retaining its quality is very important; otherwise systems would become completely clogged. Since 1990, the JPEG picture format has been adopted as the standard for photographic images on the Internet. This project looks at another method for compressing images, using the Singular Value Decomposition (SVD).

1.2 What is an Image?

An image is represented as a two-dimensional function f(x, y), where x and y are spatial coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity of the image at that point.

1.3 Coordinate Convention

The result of sampling and quantization is a matrix of real numbers. We use two principal

ways to represent digital images. Assume that an image f(x, y) is sampled so that the

resulting image has M rows and N columns. We say that the image is of size M X N. The


values of the coordinates (x, y) are discrete quantities. For notational clarity and convenience, we use integer values for these discrete coordinates. In many image processing books, the image origin is defined to be at (x, y) = (0, 0). The next coordinate values along the first row of the image are (x, y) = (0, 1). It is important to keep in mind that the notation (0, 1) is used to signify the second sample along the first row. It does not mean that these are the actual values of the physical coordinates when the image was sampled. Under this convention, x ranges from 0 to M-1 and y from 0 to N-1 in integer increments.

The coordinate convention used in the toolbox to denote arrays is different from the preceding paragraph in two minor ways. First, instead of using (x, y), the toolbox uses the notation (r, c) to indicate rows and columns. Note, however, that the order of coordinates is the same as the order discussed in the previous paragraph, in the sense that the first element of a coordinate tuple, (a, b), refers to a row and the second to a column. The other difference is that the origin of the coordinate system is at (r, c) = (1, 1); thus, r ranges from 1 to M and c from 1 to N in integer increments. The IPT documentation refers to the coordinates (r, c) as pixel coordinates. Less frequently, the toolbox also employs another coordinate convention, called spatial coordinates, which uses x to refer to columns and y to refer to rows. This is the opposite of our use of the variables x and y.
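As a quick illustration of the toolbox convention (a minimal sketch, assuming the sample image cameraman.tif that ships with the Image Processing Toolbox is available):

f = imread('cameraman.tif');   % read a sample grayscale image
[M, N] = size(f);              % the image has M rows and N columns
topLeft = f(1, 1);             % pixel coordinates: (r, c) = (1, 1) is the top-left sample, i.e. (x, y) = (0, 0)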

1.4 Images as Matrices

The preceding discussion leads to the following representation for a digitized image

function

           f(0,0)      f(0,1)     ...   f(0,N-1)
f(x, y) =  f(1,0)      f(1,1)     ...   f(1,N-1)
           ...         ...              ...
           f(M-1,0)    f(M-1,1)   ...   f(M-1,N-1)


The right side of this equation is a digital image by definition. Each element of this array

is called an image element, picture element, pixel or pel. The terms image and pixel are

used throughout the rest of our discussions to denote a digital image and its element.

A digital image can be represented naturally as a MATLAB matrix:

       f(1,1)    f(1,2)    ...   f(1,N)
f  =   f(2,1)    f(2,2)    ...   f(2,N)
       ...       ...             ...
       f(M,1)    f(M,2)    ...   f(M,N)

where f(1,1) = f(0,0) (note the use of a monospace font to denote MATLAB quantities). Clearly, the two representations are identical, except for the shift in origin. The notation f(p, q) denotes the element located in row p and column q.

Matrices in MATLAB are stored in variables with names such as A, a, RGB, real_array, and so on. Variables must begin with a letter and contain only letters, numerals, and underscores. As noted in the previous paragraph, all MATLAB quantities are written using monospace characters. We use conventional Roman, italic notation, such as f(x, y), for mathematical expressions.

1.5 Image Types

The toolbox supports four types of images:

Intensity images

Binary images

Indexed images

RGB images

Most monochrome image processing operations are carried out using binary or intensity images, so our initial focus is on these two image types. Indexed and RGB color images are described briefly in the sections that follow.


1.5.1 Intensity Images

An intensity image is a data matrix whose values have been scaled to represent intensities. When the elements of an intensity image are of class uint8 or class uint16, they have integer values in the range [0, 255] and [0, 65535], respectively.

1.5.2 Binary Images

Binary images have a very specific meaning in MATLAB. A binary image is a logical array of 0s and 1s. Thus, an array of 0s and 1s whose values are of a numeric data class, say uint8, is not considered a binary image in MATLAB. A numeric array is converted to binary using the function logical. Thus, if A is a numeric array consisting of 0s and 1s, we create a logical array B using the statement

B = logical(A)

If A contains elements other than 0s and 1s, use of the logical function converts all nonzero quantities to logical 1s and all entries with value 0 to logical 0s. Using relational and logical operators also creates logical arrays. To test whether an array is logical, we use the islogical function: islogical(C). If C is a logical array, this function returns a 1; otherwise it returns a 0. Logical arrays can be converted to numeric arrays using the data class conversion functions.
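A minimal sketch of the conversion and test described above:

A = [0 1 2; 3 0 1];            % numeric array of class double
B = logical(A);                % nonzero entries become logical 1s, zeros become logical 0s
islogical(B)                   % returns 1, since B is a logical array
C = double(B);                 % converting a logical array back to a numeric class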

1.5.3 Indexed Images

An indexed image has two components:

A data matrix of integers, X.

A colormap matrix, map.

Matrix map is an m*3 array of class double containing floating-point values in the range [0, 1]. The length m of the map is equal to the number of colors it defines. Each row of map specifies the red, green, and blue components of a single color. An indexed image uses "direct mapping" of pixel intensity values to colormap values. The color of each pixel is determined by using the corresponding value of the integer matrix X as a pointer into map. If X is of class double, then all of its components with values less than or equal to 1 point to the first row in map, all components with value 2 point to the second row, and so on. If X is of class uint8 or uint16, then all components with value 0 point to the first row in map, all components with value 1 point to the second row, and so on.
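A small sketch of direct mapping with a hypothetical four-color map (ind2rgb expands the indexed image into an RGB array):

map = [0 0 0; 1 0 0; 0 1 0; 0 0 1];   % rows define black, red, green and blue
X = uint8([0 1; 2 3]);                % class uint8: value 0 points to the first row of map
rgb = ind2rgb(X, map);                % resulting 2 x 2 x 3 RGB image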


1.5.4 RGB Images

An RGB color image is an M*N*3 array of color pixels, where each color pixel is a triplet corresponding to the red, green, and blue components of an RGB image at a specific spatial location. An RGB image may be viewed as a "stack" of three gray-scale images that, when fed into the red, green, and blue inputs of a color monitor, produce a color image on the screen. By convention, the three images forming an RGB color image are referred to as the red, green, and blue component images. The data class of the component images determines their range of values. If an RGB image is of class double, the range of values is [0, 1]. Similarly, the range of values is [0, 255] or [0, 65535] for RGB images of class uint8 or uint16, respectively. The number of bits used to represent the pixel values of the component images determines the bit depth of an RGB image. For example, if each component image is an 8-bit image, the corresponding RGB image is said to be 24 bits deep. Generally, the number of bits in all component images is the same. In this case, the number of possible colors in an RGB image is (2^b)^3, where b is the number of bits in each component image. For the 8-bit case the number is 16,777,216 colors.
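A minimal sketch of building an RGB image from three component images (synthetic data of class double in [0, 1]):

R = rand(128); G = rand(128); B = rand(128);   % three grayscale component images
rgb = cat(3, R, G, B);                         % stack into a 128 x 128 x 3 RGB image of class double
b = 8;                                         % bits per component image
numColors = (2^b)^3                            % 16,777,216 possible colors for 24-bit-deep RGB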

1.6 Need for Image Compression

One of the important aspects of image storage is its efficient compression. To make this fact clear, let's see an example. An image of 1024 x 1024 pixels x 24 bits, without compression, would require 3 MB of storage and 7 minutes for transmission, utilizing a

high speed 64 Kbits/s ISDN line. If the image is compressed at a 10:1 compression ratio,

the storage requirement is reduced to 300 KB and the transmission time drops to under 6

seconds. Seven 1 MB images can be compressed and transferred to a floppy disk in less

time than it takes to send one of the original files, uncompressed, over an AppleTalk

network.

In a distributed environment large image files remain a major bottleneck within

systems. Compression is an important component of the solutions available for creating

file sizes of manageable and transmittable dimensions. Increasing the bandwidth is

another method, but the cost sometimes makes this a less attractive solution.
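The uncompressed figures above can be checked with a short calculation (a sketch of the arithmetic only):

bits = 1024 * 1024 * 24;              % raw image size in bits
megabytes = bits / 8 / 2^20           % exactly 3 MB of storage
seconds = bits / 64e3                 % about 393 s, on the order of the 7 minutes quoted above
compressedKB = bits / 8 / 10 / 1024   % about 300 KB at a 10:1 compression ratio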


1.7 Data Compression

Data compression is similar in concept to lossless image compression; however it does

not try to interpret the data as an image. Data compression searches for patterns in a data

stream. Common data compression methods are Deflate and LZW. The lossless compression file format GIF simply turns the image into a long string of data (all the horizontal lines appended) and applies LZW data compression. The lossy file format JPEG uses Huffman encoding to compress the data stream that is output from the Discrete Cosine Transform process. This project will not deal with data

compression. However as the main goal of this project is to compare SVD against JPEG,

a general purpose data compression algorithm is applied to the output data stream to give

meaningful comparisons.


Chapter 2

VIDEO COMPRESSION

Digital video compression technology has been booming for many years. Today, when people chat with their friends through a visual telephone, when people enjoy movies broadcast over the Internet or digital music such as MP3, the convenience that the digital video industry brings cannot be overlooked. All of this should be attributed to advances in mass storage media and streaming video/audio services, which have deeply influenced our daily life. This project is implemented in Simulink. Simulink is a platform for multidomain simulation and Model-Based Design for dynamic systems. It provides an interactive graphical environment and a customizable set of block libraries, and can be extended for specialized applications.

This project implements video compression using motion compensation and Discrete Cosine Transform (DCT) techniques with the Video and Image Processing Blockset. The demo calculates motion vectors between successive frames and uses them to reduce redundant information. Then it divides each frame into submatrices and applies the discrete cosine transform to each submatrix. Finally, the demo applies a quantization technique to achieve further compression. The Decoder subsystem performs the inverse process to recover the original video.

2.1 Why is Digital Video Compressed?

Digital video is compressed because it takes up a staggering amount of room in its

original form. By compressing the video, you make it easier to store. Digital video can be

compressed without impacting the perceived quality of the final product because it affects

only the parts of the video that humans can't really detect.

Compressing video is essentially the process of throwing away data for things we

can't perceive. Standard digital video cameras compress video at a ratio of 5 to 1, and

there are formats that allow you to compress video by as much as 100 to 1. But too much

compression can be a bad thing. The more you compress the more data you throw away.

Throw away too much, and the changes become noticeable. With heavy compression you

can get video that's nearly unrecognizable. When you compress video, always try several


compression settings. The goal is to compress as much as possible until the data loss

becomes noticeable and then notch the compression back a little. That will give you the

right balance between file size and quality. And remember that every video is different.

2.2 Categories of Data Compression Algorithms

Two categories of data compression algorithms can be distinguished: lossless and lossy.

Lossy techniques cause image quality degradation in each compression/decompression

step. Careful consideration of the human visual perception ensures that the degradation is

often unrecognizable, though this depends on the selected compression ratio. In general,

lossy techniques provide far greater compression ratios than lossless techniques. Here

we'll discuss the roles of the following data compression techniques:

2.2.1 Lossless Coding Techniques

Lossless coding guarantees that the decompressed image is absolutely identical to the image before compression. This is an important requirement for some application domains, e.g. medical imaging, where not only high quality is in demand, but unaltered archiving is a legal requirement. Lossless techniques can also be used for the compression of

other data types where loss of information is not acceptable, e.g. text documents and

program executables.

Some compression methods can be made more effective by adding a 1D or 2D delta

coding to the process of compression. These deltas make more effective use of run length

encoding, have (statistically) higher maxima in code tables (leading to better results in

Huffman and general entropy coding), and build greater equal value areas usable for area

coding.

Some of these methods can easily be modified to be lossy. A lossy element fits perfectly into 1D/2D run-length search. Also, logarithmic quantization may be inserted to

provide better or more effective results.

Run Length Encoding: Run length encoding is a very simple method for compression of

sequential data. It takes advantage of the fact that, in many data streams, consecutive

single tokens are often identical. Run-length encoding checks the stream for this fact and inserts a special token each time a chain of more than two equal input tokens is found. This special token advises the decoder to insert the following token n times into its output


stream. The effectiveness of run-length encoding is a function of the number of equal tokens in a row in relation to the total number of input tokens. This relation is very high in undithered two-tone images of the type used for facsimile. Obviously, effectiveness degrades when the input does not contain too many equal tokens. With a rising density of information, the likelihood of two consecutive tokens being the same sinks significantly, as there is always some noise distortion in the input. Run-length coding is

easily implemented, either in software or in hardware. It is fast and very well verifiable,

but its compression ability is very limited.
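A minimal run-length encoder sketch in MATLAB (the function name and its vector-in, vector-out interface are illustrative assumptions, not part of the project code):

function [vals, runs] = rle_encode(x)
% Encode a row vector x as run values and run lengths.
d = [true, diff(x) ~= 0];              % marks the start of each run
vals = x(d);                           % one representative value per run
runs = diff([find(d), numel(x) + 1]);  % length of each run
end

For example, rle_encode([7 7 7 0 0 9]) returns vals = [7 0 9] and runs = [3 2 1].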

Huffman Encoding: This algorithm, developed by D.A. Huffman, is based on the fact

that in an input stream certain tokens occur more often than others. Based on this

knowledge, the algorithm builds up a weighted binary tree according to the tokens' rate of occurrence. Each element of this tree is assigned a new code word, where the length of the code word is determined by its position in the tree. Therefore, the most frequent token, which ends up closest to the root of the tree, is assigned the shortest code. Each less common element is assigned a longer code word. The least frequent element is assigned a code word which may become twice as long as the input token. The compression ratio

achieved by Huffman encoding uncorrelated data becomes something like 1:2.

On slightly correlated data, as on images, the compression rate may become much

higher, the absolute maximum being defined by the size of a single input token and the

size of the shortest possible output token (max. compression = token size[bits]/2[bits]).

While standard palettized images with a limit of 256 colors may be compressed by 1:4 if

they use only one color, more typical images give results in the range of 1:1.2 to 1:2.5.
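A small worked example of the idea, with a hypothetical four-symbol source; the code lengths correspond to the Huffman codes 0, 10, 110 and 111 obtained from the usual greedy tree construction:

p = [0.4 0.3 0.2 0.1];      % symbol probabilities
len = [1 2 3 3];            % Huffman code lengths for these probabilities
avgBits = sum(p .* len)     % 1.9 bits/symbol, versus 2 bits for a fixed-length code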

Entropy Coding: The typical implementation of an entropy coder follows J. Ziv/A.

Lempel's approach. Nowadays, there is a wide range of so-called modified Lempel-Ziv codings. These algorithms all have a common way of working. The coder and the decoder both build up an equivalent dictionary of metasymbols, each of which represents a whole sequence of input tokens. If a sequence is repeated after a symbol was found for it, then only the symbol becomes part of the coded data and the sequence of tokens referenced by the symbol becomes part of the decoded data later. As the dictionary is built up based on

the data, it is not necessary to put it into the coded data, as it is with the tables in a

Huffman coder. Entropy coders are a little tricky to implement, as there are usually a few

tables, all growing while the algorithm runs.


Area Coding: Area coding is an enhanced form of run length coding, reflecting the two

dimensional character of images. This is a significant advance over the other lossless

methods. For coding an image it does not make too much sense to interpret it as a

sequential stream, as it is in fact an array of sequences, building up a two dimensional

object. Therefore, as the two dimensions are independent and of same importance, it is

obvious that a coding scheme aware of this has some advantages. The algorithms for area

coding try to find rectangular regions with the same characteristics. These regions are

coded in a descriptive form as an Element with two points and a certain structure. The

whole input image has to be described in this form to allow lossless decoding afterwards.

Practical implementations use recursive algorithms for reducing the whole area to equal-sized sub-rectangles until a rectangle fulfills the criterion of having the same characteristic for every pixel. This type of coding can be highly effective but it

bears the problem of a nonlinear method, which cannot be implemented in hardware.

Therefore, the performance in terms of compression time is not competitive.

2.2.2 Lossy Coding Techniques

In most applications there is no need for exact restoration of the stored image. This fact can help to make the storage more effective, and this leads to lossy compression

methods. Lossy image coding techniques normally have three components:

Image modeling which defines such things as the transformation to be applied to

the image.

Parameter quantization whereby the data generated by the transformation is

quantized to reduce the amount of information.

Encoding, where a code is generated by associating appropriate codewords with the

raw data produced by the quantization.

Each of these operations is in some part responsible for the compression. Image modeling

is aimed at the exploitation of statistical characteristics of the image (i.e. high correlation,

redundancy). Typical examples are transform coding methods, in which the data is

represented in a different domain (for example, frequency in the case of the Fourier

Transform [FT], the Discrete Cosine Transform [DCT], the Karhunen-Loeve Transform


[KLT], and so on), where a reduced number of coefficients contains most of the original

information. In many cases this first phase does not result in any loss of information.

The aim of quantization is to reduce the amount of data used to represent the information

within the new domain. Quantization is in most cases not a reversible operation:

therefore, it belongs to the so-called 'lossy' methods.

Encoding is usually error free. It optimizes the representation of the information

(helping sometimes to further reduce the bit rate), and may introduce some error detection

codes. In the following sections, a review of the most important coding schemes for lossy

compression is provided. Some methods are described in their canonical form (transform

coding, region based approximations, fractal coding, wavelets, hybrid methods) and some

variations and improvements presented in the scientific literature are reported and

discussed.

Transform Coding (DCT/Wavelets/Gabor): A general transform coding scheme

involves subdividing an N x N image into smaller n x n blocks and performing a unitary transform on each subimage. A unitary transform is a reversible linear transform whose kernel describes a set of complete, orthonormal discrete basis functions. The goal of the transform is to decorrelate the original signal, and this decorrelation generally results in the signal energy being redistributed among only a small set of transform coefficients. In this

way, many coefficients may be discarded after quantization and prior to encoding. Also,

visually lossless compression can often be achieved by incorporating the HVS contrast

sensitivity function in the quantization of the coefficients.

Transform coding can be generalized into four stages:

Image subdivision

Image transformation

Coefficient quantization

Huffman encoding

For a transform coding scheme, logical modeling is done in two steps: a segmentation step, in which the image is subdivided into bidimensional vectors (possibly of different sizes), and a transformation step, in which the chosen transform (e.g. KLT, DCT, or Hadamard) is applied. Quantization can be performed in several ways. Most classical approaches use 'zonal coding', consisting of the scalar quantization of the coefficients belonging to a predefined area (with a fixed bit allocation), and 'threshold coding',


consisting of the choice of the coefficients of each block characterized by an absolute

value exceeding a predefined threshold. Another possibility, that leads to higher

compression factors, is to apply a vector quantization scheme to the transformed

coefficients. The same type of encoding is used for each coding method. In most cases a

classical Huffman code can be used successfully. The JPEG and MPEG standards are

examples of standards based on transform coding.

Vector Quantization: A vector quantizer can be defined mathematically as a transform

operator T from a K-dimensional Euclidean space R^K to a finite subset X in R^K made

up of N vectors. This subset X becomes the vector codebook or more generally, the

codebook. Clearly, the choice of the set of vectors is of major importance. The level of

distortion due to the transformation T is generally computed as the mean square error (MSE) between the "real" vector x in R^K and the corresponding vector x' = T(x) in X.

This error should be such as to minimize the Euclidean distance d. An optimum scalar

quantizer was proposed by Lloyd and Max. Later on Linde, Buzo and Gray resumed and

generalized this method, extending it to the case of a vector quantizer.

The LBG algorithm for the design of a vector codebook always reaches a local

minimum for the distortion function, but often this solution is not the optimal one. A

careful analysis of the LBG algorithm's behavior allows one to detect two critical points:

the choice of the starting codebook and the uniformity of the Voronoi regions'

dimensions. For this reason some algorithms have been designed that give better

performances. With respect to the initialization of the LBG algorithm, for instance, one

can observe that a random choice of the starting codebook requires a large number of

iterations before reaching an acceptable amount of distortion. Moreover, if the starting

point leads to a local minimum solution, the relative stopping criterion prevents further

optimization steps.

Segmentation and Approximation Methods: With segmentation and approximation

coding methods, the image is modeled as a mosaic of regions, each one characterized by a

sufficient degree of uniformity of its pixels with respect to a certain feature (e.g. grey

level, texture); each region then has some parameters related to the characterizing feature

associated with it.

The operations of finding a suitable segmentation and an optimum set of

approximating parameters are highly correlated, since the segmentation algorithm must


take into account the error produced by the region reconstruction (in order to limit this

value within determined bounds). These two operations constitute the logical modeling

for this class of coding schemes; quantization and encoding are strongly dependent on the

statistical characteristics of the parameters of this approximation.

For polynomial approximation, regions are reconstructed by means of polynomial

functions in (x, y); the task of the encoder is to find the optimum coefficients. In texture

approximation, regions are filled by synthesizing a parameterized texture based on some

model (e.g. fractals, statistical methods, Markov Random Fields [MRF]). It must be

pointed out that, while in polynomial approximations the problem of finding optimum

coefficients is quite simple (it is possible to use least squares approximation or similar

exact formulations), for texture based techniques this problem can be very complex.

Fractal Compression: This is a form of vector quantization, and it is a lossy compression. Compression is performed by locating self-similar sections of a video, then using a fractal algorithm to generate those sections. Like the DCT, the discrete wavelet transform mathematically transforms a video into frequency components. The process is performed on the entire video, which differs from the other methods (DCT) that work on smaller pieces of the desired data. The result is a hierarchical representation of a video,

where each layer represents a frequency band.

2.3 Compression Standards

MPEG stands for the Moving Picture Experts Group. MPEG is an ISO/IEC working

group, established in 1988 to develop standards for digital audio and video formats. There

are five MPEG standards being used or in development. Each compression standard was

designed with a specific application and bit rate in mind, although MPEG compression

scales well with increased bit rates. They include:

MPEG-1: Designed for up to 1.5 Mbit/s. A standard for the compression of moving

pictures and audio. This was based on CD-ROM video applications, and is a popular

standard for video on the Internet, transmitted as .mpg files. In addition, level 3 of

MPEG-1 is the most popular standard for digital compression of audio--known as MP3.

MPEG-1 is the standard of compression for VideoCD, the most popular video distribution

format throughout much of Asia.


MPEG-2: Designed for between 1.5 and 15 Mbit/s. The standard on which digital television set-top boxes and DVD compression are based. It is based on MPEG-1, but

designed for the compression and transmission of digital broadcast television. The most

significant enhancement from MPEG-1 is its ability to efficiently compress interlaced

video. MPEG-2 scales well to HDTV resolution and bit rates, obviating the need for an

MPEG-3.

MPEG-4: It is a Standard for multimedia and Web compression. MPEG-4 is based on

object-based compression, similar in nature to the Virtual Reality Modeling Language.

Individual objects within a scene are tracked separately and compressed together to create

an MPEG4 file. This results in very efficient compression that is very scalable, from low

bit rates to very high. It also allows developers to control objects independently in a

scene, and therefore introduce interactivity.

JPEG: JPEG stands for Joint Photographic Experts Group. It is also an ISO/IEC working

group, but works to build standards for continuous tone image coding. JPEG is a lossy compression technique used for full-color or gray-scale images, by exploiting the fact that the human eye will not notice small color changes. JPEG 2000 is an initiative that will provide an image coding system using compression techniques based on the use of wavelet technology.

2.4 Transforms

There are several common transforms used in signal processing, such as the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), the Discrete Wavelet Transform (DWT), and some others as well. The DCT is the most common transform used when processing images and video. The DWT is used in the image compression standard JPEG 2000, and will be used in this application as well. Both the DCT and the DWT will be more thoroughly described. The basic idea of using transforms when processing, for example, a video is to decorrelate the pixels from one another. By doing so, compression is achieved, since the amount of redundant information is minimized. A transform can be seen as a projection onto orthonormal bases, separated in time and/or frequency. By transforming a signal, its energy is separated into subbands. By describing each subband with different precision, higher precision within high-energy subbands and less precision in low-energy subbands, the signal can be compressed.


The one-dimensional DCT transform matrix C of size N x N is defined by

C(k, n) = 1/√N,                          k = 0,          0 ≤ n ≤ N-1
C(k, n) = √(2/N) cos(π(2n+1)k / 2N),     1 ≤ k ≤ N-1,    0 ≤ n ≤ N-1

To transform a matrix Y, the transform matrix C is multiplied with Y, giving the transformed matrix X = CY. The cosine transform is real-valued and orthogonal, which means that C has the properties

C* = C

C^-1 = C^T

The DCT also has excellent energy compaction, which means that the energy of the matrix is concentrated in a small region of the transformed matrix. It also has good decorrelation properties. These properties are very suitable for image and video processing and are therefore widely used (e.g. in JPEG, MPEG and H.263). In the two-dimensional DCT of a typical image block, the compacted energy is concentrated in the upper left corner.
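A short check of these properties using the toolbox (dctmtx is assumed to be available from the Image Processing Toolbox):

C = dctmtx(8);                    % 8 x 8 DCT transform matrix
orthError = norm(C * C' - eye(8)) % close to zero, so C is orthogonal and inv(C) = C'
Y = magic(8);                     % arbitrary 8 x 8 block
X = C * Y * C';                   % two-dimensional DCT of the block; energy gathers in the upper left corner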

2.5 DCT and Video Compression

In the JPEG image compression algorithm, the input image is divided into 8-by-8 or 16-by-16 blocks, and the two-dimensional DCT is computed for each block. The DCT coefficients are then quantized, coded, and transmitted. The JPEG receiver (or JPEG file reader) decodes the quantized DCT coefficients, computes the inverse two-dimensional DCT of each block, and then puts the blocks back together into a single image. For typical images, many of the DCT coefficients have values close to zero; these coefficients can be discarded without seriously affecting the quality of the reconstructed image.

The example code below computes the two-dimensional DCT of 8-by-8 blocks in the input image, discards (sets to zero) all but 10 of the 64 DCT coefficients in each block, and then reconstructs the image using the two-dimensional inverse DCT of each block. The transform matrix computation method is used.

I = imread('cameraman.tif');
I = im2double(I);
T = dctmtx(8);
B = blkproc(I, [8 8], 'P1*x*P2', T, T');
% One possible mask keeping 10 low-frequency coefficients per 8-by-8 block, as described above.
mask = [1 1 1 1 0 0 0 0; 1 1 1 0 0 0 0 0; 1 1 0 0 0 0 0 0; 1 0 0 0 0 0 0 0; zeros(4, 8)];
B2 = blkproc(B, [8 8], 'P1.*x', mask);
I2 = blkproc(B2, [8 8], 'P1*x*P2', T', T);

Although there is some loss of quality in the reconstructed image, it is clearly recognizable, even though almost 85% of the DCT coefficients were discarded. To experiment with discarding more or fewer coefficients, and to apply this technique to other images, try running the demo function dctdemo.

2.6 General Description

The Discrete Cosine Transform (DCT) is a technique that converts a spatial domain

waveform into its constituent frequency components as represented by a set of

coefficients. The process of reconstructing a set of spatial domain samples is called the

Inverse Discrete Cosine Transform (IDCT).

For data compression of video/video frames, usually a block of data is converted

from spatial domain samples to another domain (usually frequency domain), which offers

more compact representation. The optimal transform is the Karhunen-Loeve Transform

(KLT), as it packs most of the block energy into a fewer number of frequency domain

elements, it minimizes the total entropy of the block, and it completely de-correlates its

elements. However, it’s main Disadvantage is that its basis functions are video-

dependent. This complicates the digital implementation. The Discrete Cosine Transform

introduced by Ahmed in 1974, has the next best performance in compaction Efficiency,

while also having video-independent basis functions. Hence DCT is used to provide the


necessary transform and the resultant data is then compressed using quantization and

various coding techniques to offer lossless as well as lossy compression.

Chapter 3

SINGULAR VALUE DECOMPOSITION (SVD)

Singular Value Decomposition (SVD) is said to be a significant topic in linear algebra by

many renowned mathematicians. SVD has many practical and theoretical values; a special feature of SVD is that it can be performed on any real m x n matrix. In this chapter we will demonstrate how to use Singular Value Decomposition (SVD) to factorize and

approximate large matrices, specifically images.

Singular value decomposition takes a rectangular m×n matrix A and calculates three

matrices U, S, and V. S is a diagonal m×n matrix (the same dimensions as A). U and V

are unitary or orthogonal matrices with sizes m×m and n×n respectively. The matrices are

related by the equation

A = U S V^H

Calculating the SVD consists of finding the eigenvalues and eigenvectors of A A^H and A^H A. The eigenvectors of A^H A make up the columns of V; the eigenvectors of A A^H make up the columns of U. The eigenvalues of A^H A or A A^H are the squares of the singular values of A. The singular values are the diagonal entries of the S matrix and are arranged in descending order. The singular values are always real numbers. If the matrix A is a real matrix, then U and V are also real. The decomposition can also be written as the outer product expansion

A = σ1 u1 v1^T + σ2 u2 v2^T + ... + σr ur vr^T

where r is the rank of A. The matrix A can then be approximated by a matrix Â of rank k by truncating this expansion after the first k terms.


The matrix U contains one orthonormal basis. U is also known as the left singular vectors.

The matrix V contains another orthonormal basis. V is also known as the right singular

vectors. The diagonal matrix S contains the singular values.
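A minimal numerical sketch of the factorization (any real matrix will do):

A = rand(6, 4);               % arbitrary real m x n matrix
[U, S, V] = svd(A);           % U is 6 x 6, S is 6 x 4 diagonal, V is 4 x 4
reconErr = norm(A - U*S*V')   % close to zero
sigma = diag(S)               % singular values in descending order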

3.1 Factoring V and S

First we will find V. To eliminate U from the equation A = U S V^T, you simply multiply on the left by A^T:

A^T A = (U S V^T)^T (U S V^T) = V S^T U^T U S V^T

Since U is an orthogonal matrix, U^T U = I, which gives

A^T A = V S^2 V^T

Notice that this is similar to the diagonalization of a matrix A, where A = Q Λ Q^T. But now the symmetric matrix is not A, it is A^T A.

To find V and S we need to diagonalize A^T A by finding its eigenvalues and eigenvectors. The eigenvalues are the squares of the elements of S (the singular values) and the eigenvectors are the columns of V (the right singular vectors).

3.2 Factoring U

Eliminating V from the equation is very similar to eliminating U. Instead of multiplying on the left by A^T, we multiply on the right by A^T. This gives

A A^T = (U S V^T)(U S V^T)^T = U S V^T V S^T U^T

Since V^T V = I, this gives

A A^T = U S^2 U^T

Again we will find the eigenvectors, but this time for A A^T. These are the columns of U (the left singular vectors).

3.3 Properties of the SVD

There are many properties and attributes of the SVD; here we just present the properties that we used in this project.

The singular values σ1, σ2, ..., σn are unique; however, the matrices U and V are not unique.


Since A^T A = V S^T S V^T, V diagonalizes A^T A, and it follows that the v_i are the eigenvectors of A^T A.

Since A A^T = U S S^T U^T, it follows that U diagonalizes A A^T and that the u_i are the eigenvectors of A A^T.

If A has rank r, then v_1, v_2, ..., v_r form an orthonormal basis for the range space of A^T, R(A^T), and u_1, u_2, ..., u_r form an orthonormal basis for the range space of A, R(A).

The rank of the matrix A is equal to the number of its nonzero singular values.

3.4 Using SVD for Image Compression

Image compression deals with the problem of reducing the amount of data required to

represent a digital image. Compression is achieved by the removal of three basic data

redundancies:

Coding redundancy, which is present when less than optimal code words are used;

Interpixel redundancy, which results from correlations between the pixels;

Psychovisual redundancy, which is due to data that is ignored by the human visual system.

To illustrate the SVD image compression process, note that A can be represented by the outer product expansion

A = σ1 u1 v1^T + σ2 u2 v2^T + ... + σr ur vr^T

When compressing the image, the sum is not carried out to the very last singular values; the singular values with small enough magnitude are dropped. (Remember that the singular values are ordered on the diagonal.) The closest matrix of rank k is obtained by truncating that sum after the first k terms:

A_k = σ1 u1 v1^T + σ2 u2 v2^T + ... + σk uk vk^T

The total storage for A_k will be k(m + n + 1).

The integer k can be chosen to be considerably less than n, and the digital image corresponding to A_k will still be very close to the original image. However, different choices of k yield different corresponding images and storage requirements. For typical choices of k, the storage required for A_k will be less than 20 percent.
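A sketch of rank-k image compression with the SVD, using the sample image cameraman.tif as an assumed input and k chosen arbitrarily:

A = im2double(imread('cameraman.tif'));     % grayscale image with values in [0, 1]
[U, S, V] = svd(A);
k = 30;                                     % number of singular values kept
Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';  % rank-k approximation of the image
[m, n] = size(A);
CR = m * n / (k * (m + n + 1))              % compression ratio, as defined in Section 3.8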


3.5 Splitting Image into Smaller Blocks

The SVD process has order n^3, which makes it very slow for large pictures. However if

the picture is broken up into smaller sections and each handled separately, the overall

processing time is much lower. This is not a trade-off situation and in fact, as will be

seen, is necessary for good rates of compression.

The key to SVD compression is using low rank approximations to the image. The

less complicated an image the lower the rank necessary to accurately represent it. For

example a picture that is a single color block can be perfectly represented by a rank 1

SVD.

Let X be an n×n matrix with every entry equal to some constant c ∈ R. Then X = c j j^T, where j = (1, 1, ..., 1)^T. Now if u = v = (1/√n) j and s = cn, it can be seen that X = c j j^T = u s v^T.
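This can be verified numerically (a small sketch):

n = 4; c = 3;
X = c * ones(n);    % constant block
s = svd(X)          % only the first singular value is nonzero, and it equals c*n = 12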

A realistic photo (for example, the test image shown below), however, is generally complex overall, but may contain sections of simpler content. In the test image Frogrock there are areas with simple content. For example, the sky has very little detail. Also, the side of the hut is fairly monotone, while other sections, such as the person, are more complicated. It would make sense to break up this photo so that the simple sections can be represented with low-rank approximations, while the complicated sections have higher rank to include the detail.


Figure 3.1: Frog Rock Test Image

A human can quickly look at a photograph and isolate the sections of high detail

from low detail. However this can be a difficult task for a computer, requiring a lot of

processing. Ideally the picture would be perfectly split into separate regions based on

complexity, but in practice this would be too time consuming and require too much

overhead information to keep track of the regions.

A simple approach is to break the image into smaller blocks of the same size.

Although the blocks won’t perfectly align with the different regions of complexity, if

there are enough blocks then the blocks will generally match regions of complexity. This

is the approach used by JPEG; pictures are divided into blocks of 8× 8 (the JPEG

specification allows block sizes of 16× 16 but this is rarely used).

The second approach used in this project is to have adaptive block sizes. Initially

the picture is broken up into a series of large blocks. Then each block is split into four

quarter size blocks. If less storage is required when the block is split into quarters, then

these new blocks are accepted, otherwise the original block is left. This process can be

repeated on the new blocks, getting smaller and smaller each time.

3.6 Adaptive Block Sizes

For a given rank, the larger the block size the more efficient the storage. For example, a

100× 100 matrix approximated with rank k requires (100 + 100) k = 200k elements. If the

matrix was split into four 50×50 matrices also approximated with rank k, then each matrix


would require (50+50)k = 100k elements. Therefore all four of them would require 400k

elements, which is twice the amount required for the single block. However it is to be

hoped the smaller blocks are simpler and require a lower rank to represent them. An n ×n

block with rank k requires 2nk elements. If this block is split into four quarter blocks of

size 1/2n× 1/2n, with each sub-block having rank k1, k2, k3, k4 respectively, the number

of storage elements is

n (k1+k2+k3+k4)

So the decision as to whether a block is to be subdivided should be based on

n (k1+k2+k3+k4) < 2nk

k1+k2+k3+k4 <2k

Unfortunately, to calculate the values k1, k2, k3, k4, the block has to be divided and

an SVD applied to each sub-block. As a result many more SVDs are performed than are used in the output. This results in a much slower compression time; however, it does not

affect the speed of decompression. The advantage of this adaptive block size technique is

that it can better map the regions of complexity of the picture.
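A sketch of the splitting decision, assuming square power-of-two blocks; rank_for_tol is a hypothetical helper that picks the rank k from a relative singular-value threshold, and is not part of the project code:

function out = adaptive_svd(A, tol, minSize)
% Split a block into quarters only if the quarters need fewer storage elements (k1+k2+k3+k4 < 2k).
k = rank_for_tol(A, tol);
n = size(A, 1);
if n <= minSize
    out = {A, k}; return;
end
h = n / 2;
quads = {A(1:h, 1:h), A(1:h, h+1:end), A(h+1:end, 1:h), A(h+1:end, h+1:end)};
ks = cellfun(@(Q) rank_for_tol(Q, tol), quads);
if sum(ks) < 2 * k
    out = cellfun(@(Q) adaptive_svd(Q, tol, minSize), quads, 'UniformOutput', false);
else
    out = {A, k};
end
end

function k = rank_for_tol(A, tol)
% Number of singular values kept for a given relative threshold.
s = svd(A);
k = max(1, nnz(s > tol * s(1)));
end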

3.7 Mixed Mode

With the dividing block technique we could start with a block the size that fits the whole

picture. The block size needs to be chosen so that it evenly divides into quarters at each

step. So preferably the dimensions should be a power of 2. The picture is therefore

‘padded’ out so that the whole image is a square with the extra pixels being set to zero.

The zero sections of the picture will not require much storage, and in fact any sub square

consisting entirely of zeros will have rank 0.

This technique has the disadvantage that the compression can take a very long

time, as the first few SVDs to perform are on very large blocks. When this technique was

used on the test images, none of the first few blocks were accepted. The blocks had to be

reduced to a small enough size before the algorithm determined that it was not worth

subdividing further. A combination of the fixed block and the subdividing block

techniques solved this problem.

First the picture is divided into moderate size fixed blocks. Then the adaptive

block size technique is applied to each fixed block separately. So there is an upper limit

on the block sizes which saves a lot of unnecessary processing time. Similarly the


algorithms never accepted a block size that was too small, so a lower limit on the block

size could be used. For the test images used, sensible upper and lower bounds were found to

be 64×64 and 8×8. Therefore only four block sizes were allowed: 64×64, 32×32, 16× 16,

and 8×8. So the extra processing required for rejected blocks was not too great.

3.8 Picture Quality

Image Compression Measures: In order to measure the performance of the SVD image

compression method, we can compute the compression factor and the quality of the compressed image. The image compression factor can be computed using the compression

ratio

CR = m*n/ (k (m + n + 1))

Measuring Picture Quality: The original image is represented by the matrix A. The

approximating image is matrix Ậ. It is necessary to have a measure of image quality.

Unfortunately image quality as perceived by the eye is a very subjective measurement. A

human can quickly look at an image and determine that the quality is acceptable or not

acceptable but it is difficult to mathematically represent this. The most common

measurement used in image processing is the Peak Signal to Noise Ratio (PSNR), measured in decibels (dB). Although not a great model of the human eye, it is simple to calculate:

PSNR = 20 log10(max range / RMSE)

RMSE = sqrt( (1/(M*N)) Σ_{i,j} (A(i,j) - Â(i,j))^2 )

Max range is the allowed value range of the pixels. For convenience, pixels will be in the range [0, 1]; hence max range = 1. RMSE is the Root Mean Square Error between the original image A and the approximation Â.
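A short sketch of the quality measurement, assuming A is the original image and Ak the rank-k approximation, both with pixel values in [0, 1]:

maxRange = 1;
rmse = sqrt(mean((A(:) - Ak(:)).^2));      % root mean square error
PSNR = 20 * log10(maxRange / rmse)         % peak signal to noise ratio in dB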

Higher Order SVD: Tensor decomposition was studied in psychometric data analysis

during the 1960s, when data sets having more than two dimensions (generally called

“three-way data sets”) became widely used. A fundamental achievement was brought by

Tucker (1963), who proposed to decompose a 3-D signal using directly a 3-D principal


component analysis (PCA) instead of unfolding the data on one dimension and using the

standard SVD. This three-way PCA is also known as Tucker3 decomposition. In the

1980s, such multidimensional techniques were also applied to chemometrics analysis.

The signal processing community only recently showed interest in the Tucker3

decomposition. The work of Lathauwer et al. (2000) proved that this decomposition is a

multilinear generalization of the SVD to multidimensional data. Studying its properties

with a notation more familiar to the signal processing community, the authors highlighted

its properties concerning the rank, oriented energy, and best reduced-rank approximation.

As the decomposition can have higher dimensions than 3, they called it higher order SVD

(HOSVD). In the following, we adopt this notation and define the HOSVD

decomposition.

Multiple-Level Decomposition: The decomposition process can be iterated, with

successive approximations being decomposed in turn, so that one signal is broken down

into many lower resolution components. This is called the wavelet decomposition tree.

Figure 3.2: Multiple Level Decomposition

Number of Levels: Since the analysis process is iterative, in theory it can be continued

indefinitely. In reality, the decomposition can proceed only until the individual details

consist of a single sample or pixel. In practice, you’ll select a suitable number of levels

based on the nature of the signal, or on a suitable criterion such as entropy.

Recently, the parametric model proposed by Doretto et al. was shown to be a valid

approach for analysis/synthesis of dynamic textures. Each video frame is unfolded into a

column vector and constitutes a point that follows a trajectory as time evolves. The

analysis consists in finding an appropriate space to describe this trajectory and in


identifying the trajectory using methods of dynamical system theory. The first part is

done by using singular value decomposition (SVD) to perform dimension reduction to a

lower dimensional space. The point trajectory is then described using a multivariate auto-

regressive (MAR) process of order 1. Dynamic textures are, thus, modeled using a linear

dynamic system and synthesis is obtained by driving this system with white noise. In this

model, the SVD exploits the temporal correlation between the video frames but the

unfolding operations prevent the possibility of exploiting spatial and chromatic

correlations.

We use the parametric approach of but perform the dynamic texture analysis with

a higher order SVD, which permits to simultaneously decompose the temporal, spatial,

and chromatic components of the video sequence. This approach was proposed by the

authors in [10] and here it is described in detail. Our scheme is depicted in Figure 3.3; the SVD in the analysis is substituted by the HOSVD.

Figure 3.3: Schematic Representation of the Tensor-Based Linear Model Approach for Analysis and Synthesis.


HOSVD is an extension of the SVD to higher dimensions. It is not an optimal tensor decomposition in the sense of least squares data fitting, and it does not have the truncation property of the SVD, where keeping only the first singular values permits one to find the best reduced-rank approximation of a given matrix. Despite this, the approximation obtained

is not far from the optimal one and can be computed much faster. In fact, the computation

of HOSVD does not require iterative alternating least squares algorithms, but needs

standard SVD computation only. The major advantage of the HOSVD is the ability of

simultaneously considering the spatial, temporal, and chromatic correlations. This allows

for a better data modeling than a standard SVD, since dimension reduction can be

performed not only in the time dimension but also separately for spatial and chromatic

content.

The separate analysis of each signal component allows adapting the signal

“compression” given by the dimension reduction to the characteristics of each dynamic

texture. For comparable visual synthesis quality, we, thus, obtain a number of model

coefficients that is on average five times smaller than those obtained using standard SVD.

Creating more compact models is also addressed in, where dynamic texture shape and

visual appearance are jointly addressed, thus enabling the modeling of complex video

sequences containing sharp edges. Both their approach and ours are characterized by a more computationally expensive analysis, but also a fast synthesis. In our case, synthesis can be

done in real-time. This makes our technique very appropriate for applications with

memory constraints, such as mobile devices. We believe that HOSVD is a very promising

technique for other video analysis and approximation applications. Recently, it has been

successfully used in image based texture rendering, face super resolution, and in face

analysis and recognition.

In the framework of video compression and transmission, it is useful to find a way

to analyze/synthesize dynamic textures. An efficient compression, in fact, would open the

possibility of having access to realistic video animations on devices that have strong

constraints in the available bandwidth. This is the case of mobile phones, for instance.

The approaches used to model dynamic textures can be classified into non-parametric and

parametric. In the first case, the analysis and synthesis is conducted directly from a given

representation of the image (the pixel values or a description in a transformed domain obtained using certain bases, such as wavelets, for instance).


In the second case, researchers aim to describe the dynamic texture using

dynamical models. An interesting approach is to consider a linear dynamical system (LDS). In fact, if some simplifications are considered, a closed-form solution for the estimation of the model's parameters can be found for such systems. Unfortunately, the synthesized sequences obtained using this method are not visually appealing when compared to the original sequence. Periodicity (oscillation) has therefore been introduced in the model by forcing the poles of the dynamic system to lie on the unit circle. This solution permits one to

obtain more realistic sequences, but still is based on the same assumptions used for the

construction.

A dynamic texture can be considered as a multidimensional signal. In the case of a

grayscale image video, it can be represented with a 3-D tensor by assigning spatial

information to the first two dimensions and time to the third. In a color video sequence,

chromatic components add another dimension. The input signal then becomes 4-D.

The analysis is done by first decomposing the input signal using the HOSVD and

then by considering the orthogonal matrix derived from the decomposition along the time

dimension. This matrix contains the dynamics of the video sequences, since its columns,

ordered along the time axis, correspond to the weights that control the appearance of the

dynamic texture as time evolves.

3.9 Dynamic Texture Synthesis

Dynamic textures are textures that change over time. Videos including fire, water, smoke,

and so on, are typical examples of dynamic textures. Dynamic texture synthesis is the

process of creating artificial textures. This can be achieved starting either from a

description (model) of a physical phenomenon or from existing video sequences.

The first approach is called physics-based and leads to a description of the

dynamic texture that usually requires few parameters. This approach has been extensively

adopted for the reproduction of synthetic flames or fire, since they are often used in

gaming applications or digital movies. Even though parameter tuning is not always

straightforward, the synthesis results are impressive, but computationally extremely

expensive. This limits the use of this type of model to cases where synthesis can be done

offline, such as during editing in the movie making process.


The second approach is called image-based, as it does not aim at modeling the

physics underlying the natural process, but at replicating existing videos. This can be

done in two ways. In the first, synthesis is done by extracting different clips from the

original video and patching them together to obtain a longer video, ensuring that the

temporal joints are not noticeable and that the dynamic appearance is maintained. This

type of synthesis is called nonparametric or patch-based, since it is not based on a model

and reduces the synthesis to a collage of patches. It has the advantage of ensuring high

visual quality because the synthetic video is composed of the original video frames,

marginally modified by morphing operations only along clip discontinuities. However, the entire synthetic video has to be created in one step and stored in memory, which does not allow for on-the-fly synthesis. In addition, this technique is not flexible: it permits modifying the appearance of single frames, but not the dynamics of the texture.

In the second, a parametric image-based approach is used to build a model of

dynamic textures. The dynamic texture is analyzed and model parameters are computed.

The visual quality of the synthesis is generally lower than that of patch-based techniques,

but the parametric approach is more flexible, more compact in terms of memory

occupation, and usually permits on-the-fly synthesis. Moreover, it can also be used for

other applications, such as segmentation, recognition, and editing.

The term “specificity” indicates whether a given approach is specific to a certain type of dynamic texture, such as fire, water, or smoke, or can be used for all kinds of dynamic textures. The term “flexibility” indicates whether the characteristics of the generated texture can easily be changed during the synthesis.

Physics-based approaches have high specificity, since a model for fire cannot be used for the generation of water or smoke, for instance. However, they also have high flexibility, since the visual appearance of the synthetic texture can be modified by tuning the model parameters.

3.10 Tensor

A tensor is a general name for a multilinear mapping over a set of vector spaces; for example, a vector is a 1-mode tensor and a matrix is a 2-mode tensor. A tensor T is an N-mode tensor where the dimensionality of mode i is di. In the same way as a matrix can be pre-multiplied (mode-1 multiplication) or post-multiplied (mode-2 multiplication) with


another matrix, a matrix can be multiplied with a higher-order tensor with respect to different modes. The mode-n multiplication of a matrix M of size In × dn with a tensor T is denoted T ×n M and results in a tensor U with the same number of modes. The elements of the tensor U are computed in the following way:

U(d1, ..., dn−1, in, dn+1, ..., dN) = Σ over dn of T(d1, ..., dN) · M(in, dn)
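A minimal MATLAB sketch of this mode-n multiplication (the helper name mode_mult is hypothetical and introduced only for illustration):

function U = mode_mult(T, M, n)
% MODE_MULT  Multiply tensor T by matrix M along mode n, i.e. U = T x_n M.
sz    = size(T);
order = [n, 1:n-1, n+1:ndims(T)];                % bring mode n to the front
Tn    = reshape(permute(T, order), sz(n), []);   % mode-n unfolding of T
Un    = M * Tn;                                  % multiply along mode n
sz(n) = size(M, 1);                              % new dimensionality of mode n
U     = ipermute(reshape(Un, sz(order)), order); % fold back into a tensor
end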

Tensor Decomposition: Principal Component Analysis (PCA) is closely related to the Singular Value Decomposition (SVD), a 2-mode tool commonly used in signal processing to reduce the dimensionality of the space and to reduce noise. The SVD decomposes a matrix into three other matrices, such that

A = U S V^T

where the matrix U spans the column space of A, the matrix V spans the row space of A, and S is a diagonal matrix of singular values. The column vectors of U (and likewise of V) are orthonormal, describing a new orthonormal coordinate system for the space spanned by the matrix A. N-mode SVD, or Higher Order SVD (HOSVD), is a generalization of the matrix SVD to tensors. It decomposes a tensor T by orthogonalizing its modes, yielding a core tensor and matrices spanning the vector spaces in each mode of the tensor, i.e.:

T = S ×1 U1 ×2 U2 ... ×N UN

The tensor S is called the core tensor and is analogous to the diagonal singular value matrix in the traditional SVD. However, for the HOSVD, the tensor S is not diagonal; instead, it coordinates the interaction of the mode matrices to produce the original tensor. The matrices Ui are again orthonormal, and the column vectors of Ui span the space of the tensor T flattened with respect to mode i. The row vectors of Ui are the coefficient sets describing each dimension in mode i. These coefficients can be thought of as the coefficients extracted from PCA, but there is a different set of coefficients for each mode in the HOSVD analysis.
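For illustration, a minimal sketch of the HOSVD of a 3-mode tensor, assuming the hypothetical mode_mult helper sketched earlier (the tensor sizes are arbitrary illustrative values):

% HOSVD of a small 3-mode tensor T (e.g. rows x columns x frames of a video).
T = rand(32, 32, 20);
N = ndims(T);
U = cell(1, N);
for n = 1:N
    sz    = size(T);
    order = [n, 1:n-1, n+1:N];
    Tn    = reshape(permute(T, order), sz(n), []);   % mode-n unfolding
    [U{n}, ~, ~] = svd(Tn, 'econ');                  % mode-n singular vectors
end
S = T;                                               % core tensor: project onto U{n}'
for n = 1:N
    S = mode_mult(S, U{n}', n);
end
T_rec = S;                                           % reconstruction T = S x1 U1 x2 U2 x3 U3
for n = 1:N
    T_rec = mode_mult(T_rec, U{n}, n);
end
fprintf('Reconstruction error: %g\n', norm(T(:) - T_rec(:)));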

Dimensionality Reduction: After decomposing the original data tensor to yield the core tensor and mode matrices, we are able to reduce the dimensionality with respect to the mode we want, unlike PCA, where the dimensionality reduction is based only on the variances. By reducing the number of dimensions in one mode and keeping the others intact, we can have more control over the noise reduction, the classification accuracy, and the complexity of the problem. The dimensionality reduction is achieved by deleting the last m column vectors from the desired mode matrix and deleting the corresponding m hyper-planes from the core tensor. The error after dimensionality reduction is bounded by the Frobenius norm of the hyper-planes deleted from the core tensor.
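Continuing the sketch above, truncating the time mode (mode 3) amounts to dropping the trailing columns of U{3} and the corresponding hyper-planes of the core tensor (the rank r3 below is an arbitrary illustrative choice):

% Mode-wise truncation after the HOSVD sketched above.
r3   = 8;                                        % number of components kept along mode 3
U3_r = U{3}(:, 1:r3);                            % drop the last columns of the mode-3 matrix
S_r  = S(:, :, 1:r3);                            % drop the corresponding core hyper-planes
T_hat = mode_mult(mode_mult(mode_mult(S_r, U{1}, 1), U{2}, 2), U3_r, 3);
fprintf('Approximation error (Frobenius norm): %g\n', norm(T(:) - T_hat(:)));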

Chapter 4

INTRODUCTION to MATLAB

MATLAB® is a high-performance language for technical computing. It integrates

computation, visualization, and programming in an easy-to-use environment where

problems and solutions are expressed in familiar mathematical notation. Typical uses

include

Math and computation

Algorithm development

Data acquisition

Modeling, simulation, and prototyping

Data analysis, exploration, and visualization

Scientific and engineering graphics

Application development, including graphical user interface building.

MATLAB is an interactive system whose basic data element is an array that does not

require dimensioning. This allows you to solve many technical computing problems,

especially those with matrix and vector formulations, in a fraction of the time it would

take to write a program in a scalar non-interactive language such as C or Fortran.


The name MATLAB stands for matrix laboratory. MATLAB was originally

written to provide easy access to matrix software developed by the LINPACK and

EISPACK projects. Today, MATLAB engines incorporate the LAPACK and BLAS

libraries, embedding the state of the art in software for matrix computation.

MATLAB has evolved over a period of years with input from many users. In

university environments, it is the standard instructional tool for introductory and

advanced courses in mathematics, engineering, and science. In industry, MATLAB is the

tool of choice for high-productivity research, development, and analysis.

MATLAB features a family of add-on application-specific solutions called

toolboxes. Very important to most users of MATLAB, toolboxes allow you to learn and

apply specialized technology. Toolboxes are comprehensive collections of MATLAB

functions (M-files) that extend the MATLAB environment to solve particular classes of

problems. Areas in which toolboxes are available include signal processing, control

systems, neural networks, fuzzy logic, wavelets, simulation, and many others.

4.1 The MATLAB System

The MATLAB system consists of five main parts:

Development Environment

The MATLAB Mathematical Function

The MATLAB Language

Graphics

The MATLAB Application Program Interface (API)

Development Environment: This is the set of tools and facilities that help you use

MATLAB functions and files. Many of these tools are graphical user interfaces. It

includes the MATLAB desktop and Command Window, a command history, an editor

and debugger, and browsers for viewing help, the workspace, files, and the search path.

The MATLAB Mathematical Function: This is a vast collection of computational

algorithms ranging from elementary functions like sum, sine, cosine, and complex

arithmetic, to more sophisticated functions like matrix inverse, matrix eigenvalues,

Bessel functions, and fast Fourier transforms.


The MATLAB Language: This is a high-level matrix/array language with control flow

statements, functions, data structures, input/output, and object-oriented programming

features. It allows both "programming in the small" to rapidly create quick and dirty

throw-away programs, and "programming in the large" to create complete large and

complex application programs.

Graphics: MATLAB has extensive facilities for displaying vectors and matrices as

graphs, as well as annotating and printing these graphs. It includes high-level functions

for two-dimensional and three-dimensional data visualization, image processing,

animation, and presentation graphics. It also includes low-level functions that allow you

to fully customize the appearance of graphics as well as to build complete graphical user

interfaces in your MATLAB applications.

The MATLAB Application Program Interface (API): This is a library that allows you

to write C and FORTRAN programs that interact with MATLAB. It includes facilities for

calling routines from MATLAB (dynamic linking), calling MATLAB as a computational

engine, and for reading and writing MAT-files.

4.2 MATLAB Working Environment

Getting Help

Using the MATLAB Editor to Create M-Files

MATLAB Desktop

MATLAB Desktop: The MATLAB Desktop is the main MATLAB application window. The desktop contains five sub-windows: the Command Window, the Workspace Browser, the Current Directory window, the Command History window, and one or more Figure Windows, which are shown only when the user displays a graphic.

The command window is where the user types MATLAB commands and

expressions at the prompt (>>) and where the output of those commands is displayed.

MATLAB defines the workspace as the set of variables that the user creates in a work

session. The workspace browser shows these variables and some information about them.

Double-clicking on a variable in the workspace browser launches the Array Editor, which can be used to obtain information about, and in some instances edit, certain properties of the variable.


The current directory tab, present above the workspace tab, shows the contents of the current directory, whose path is shown in the current directory window. For example, in the Windows operating system the path might be C:\MATLAB\Work, indicating that directory "Work" is a subdirectory of the main directory "MATLAB", which is installed in drive C. Clicking on the arrow in the current directory window shows a list of recently used paths. Clicking on the button to the right of the window allows the user to change the current directory.

MATLAB uses a search path to find M-files and other MATLAB-related files, which are organized in directories in the computer file system. Any file run in MATLAB must reside in the current directory or in a directory that is on the search path. By default, the files supplied with MATLAB and MathWorks toolboxes are included in the search path. The easiest way to see which directories are on the search path, or to add or modify the search path, is to select Set Path from the File menu on the desktop, and then use the Set Path dialog box. It is good practice to add any commonly used directories to the search path, to avoid repeatedly having to change the current directory.

The Command History Window contains a record of the commands a user has

entered in the command window, including both current and previous MATLAB

sessions. Previously entered MATLAB commands can be selected and re-executed from

the command history window by right-clicking on a command or sequence of commands. This action launches a menu from which the user can select various options in addition to executing the commands. This is a useful feature when experimenting with various commands in a work session.

Using the MATLAB Editor to Create M-Files: The MATLAB editor is both a text editor

specialized for creating M-files and a graphical MATLAB debugger. The editor can

appear in a window by itself, or it can be a sub window in the desktop. M-files are

denoted by the extension .m, as in pixelup.m. The MATLAB editor window has

numerous pull-down menus for tasks such as saving, viewing, and debugging files.

Because it performs some simple checks and also uses color to differentiate between

various elements of code, this text editor is recommended as the tool of choice for writing

and editing M-functions. To open the editor, type edit at the prompt; typing edit filename opens the M-file filename.m in an editor window, ready for editing. As noted earlier, the file must be in the current directory, or in a directory on the search path.
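As a generic illustration (pixelup.m is only mentioned by name above and its contents are not reproduced here), a minimal M-file might look as follows:

% Contents of a minimal example M-file, saved as imnegative.m (illustrative only).
function g = imnegative(f)
% IMNEGATIVE  Return the negative of a uint8 grayscale image.
g = 255 - f;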

Getting Help: The principal way to get help online is to use the MATLAB help browser,

opened as a separate window either by clicking on the question mark symbol (?) on the

desktop toolbar, or by typing helpbrowser at the prompt in the command window. The

Help Browser is a Web browser integrated into the MATLAB desktop that displays Hypertext Markup Language (HTML) documents. The Help Browser consists of two

panes, the help navigator pane, used to find information, and the display pane, used to

view the information. Self-explanatory tabs other than the navigator pane are used to perform a search.

4.3 Commands

Uigetfile: Open a standard dialog box for retrieving files.
Description: uigetfile displays a modal dialog box that lists files in the current directory and enables the user to select or type the name of a file to be opened. If the filename is valid and the file exists, uigetfile returns the filename when the user clicks Open. Otherwise uigetfile displays an appropriate error message, after which control returns to the dialog box. The user can then enter another filename or click Cancel. If the user clicks Cancel or closes the dialog window, uigetfile returns 0.

Aviinfo: Information about an Audio/Video Interleaved (AVI) file.
Description: fileinfo = aviinfo(filename) returns a structure whose fields contain information about the AVI file specified in the string filename. If filename does not include an extension, then .avi is used. The file must be in the current working directory or in a directory on the MATLAB path.

Aviread: Read an Audio/Video Interleaved (AVI) file.
Description: mov = aviread(filename) reads the AVI movie filename into the MATLAB movie structure mov. If filename does not include an extension, then .avi is used. Use the movie function to view the movie mov.

Frame2im: Convert a movie frame to an indexed image.
Description: [X, Map] = frame2im(F) converts the single movie frame F into the indexed image X and associated colormap Map. The functions getframe and im2frame create a movie frame. If the frame contains true-color data, then Map is empty.

Im2frame: Convert an image to a movie frame.
Description: f = im2frame(X, map) converts the indexed image X and associated colormap map into a movie frame f. If X is a truecolor (m-by-n-by-3) image, then map is optional and has no effect.

Imwrite: Write an image to a graphics file.
Description: imwrite(X, map, filename, fmt) writes the indexed image in X and its associated colormap map to filename in the format specified by fmt. If X is of class uint8 or uint16, imwrite writes the actual values in the array to the file. If X is of class double, imwrite offsets the values in the array before writing, using uint8(X–1). Map must be a valid MATLAB colormap. Note that most image file formats do not support colormaps with more than 256 entries. When writing multiframe GIF images, X should be a 4-dimensional M-by-N-by-1-by-P array, where P is the number of frames to write.

Imread: Read an image from a graphics file.
Description: A = imread(filename, fmt) reads a grayscale or color image from the file specified by the string filename. If the file is not in the current directory, or in a directory on the MATLAB path, specify the full pathname.

Movie: Play recorded movie frames.
Description: movie plays the movie defined by a matrix whose columns are movie frames (usually produced by getframe). movie(M) plays the movie in matrix M once, using the current axes as the default target. To play the movie in the figure instead of the axes, specify the figure handle (or gcf) as the first argument: movie(figure_handle, ...). M must be an array of movie frames (usually from getframe).
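A short example combining these commands (assuming an older MATLAB release in which aviinfo and aviread are available, as used elsewhere in this report):

[filename, pathname] = uigetfile('*.avi');       % let the user pick an AVI file
info = aviinfo(filename);                        % structure with information about the file
mov  = aviread(filename);                        % movie structure array
img  = frame2im(mov(1));                         % first frame as an image array
imwrite(img, 'frame1.bmp');                      % save the first frame to disk
movie(mov);                                      % play the movie in the current axes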

Chapter 5

RESULTS and CONCLUSION

Using Higher Order SVD analysis for dynamic texture synthesis, videos such as Flame, Pond, and Grass are given as input, and the output video obtained is compressed to one third of the size of the input video.
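For reference, the compression figures reported with each sequence follow directly from the file sizes; a brief sketch using the Flame values of Figure 5.1 (PSNR is computed from the mean squared error between the original and compressed frames):

input_file_size   = 20505600;                                  % input Flame video, in bytes
output_file_size  = 6835200;                                   % compressed video, in bytes
compression       = input_file_size / output_file_size;        % = 3
compression_ratio = 100 * output_file_size / input_file_size;  % = 33.3333 percent
% PSNR = 20 * log10(255 / sqrt(MSE)); for the Flame sequence this gives 37.2036 dB.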

Figure 5.1: Output frame for the input Flame video

Description: Figure 5.1 shows one of the output frames obtained from the given input video after compression. The following parameters are obtained from the compressed video:

input_file_size = 20505600
output_file_size = 6835200
compression = 3 (the output file size is one third of the input file size)

compression_ratio = 0.3333 (33.3333%)
PSNR = 37.2036 dB

Figure 5.2: Output frame for the input Pond video

Description: Figure 5.2 shows one of the output frames obtained from the given input video after compression. The following parameters are obtained from the compressed video:

input_file_size = 39744000
output_file_size = 13248000
compression = 3 (the output file size is one third of the input file size)
compression_ratio = 0.3333 (33.3333%)
PSNR = 40.8908 dB

Figure 5.3: Output frame for the input Grass video

Description: Figure 5.3 shows one of the output frames obtained from the given input video after compression. The following parameters are obtained from the compressed video:

input_file_size = 9676800
output_file_size = 3225600
compression = 3 (the output file size is one third of the input file size)
compression_ratio = 0.3333 (33.3333%)
PSNR = 45.4285 dB

Conclusion

Here it is proposed to decompose the multidimensional signal that

represents a dynamic texture by using a tensor decomposition technique. As opposed to

techniques that unfold the multidimensional signal onto a 2-D matrix, our method analyzes the data in its original dimensions. This decomposition, only recently used for applications in image and video processing, makes it possible to better exploit the spatial, temporal, and chromatic correlation between the pixels of the video sequence, leading to an important decrease in model size. Compared to algorithms where the unfolding operations are performed in 2-D, or where the spatial information is exploited by carrying out the analysis in the Fourier domain, this method results in models with on average five times fewer coefficients, while still ensuring the same visual quality. Despite being a suboptimal solution for

the tensor decomposition, the HOSVD ensures close-to-optimal energy compaction and

approximation error. The suboptimality derives from the fact that the HOSVD is computed directly from the SVD, without using expensive iterative algorithms, as is done for the optimal solution. This is an advantage, since the analysis can be done faster and with less computational power. The small number of model parameters permits synthesis to be performed in real time. Moreover, the small memory occupancy favours the use of the HOSVD-based model on architectures characterized by constraints on memory and computational power, such as PDAs or mobile phones.

APPENDIX

Source Code

clear all; clc;
[filename, pathname] = uigetfile('*.avi');
str2 = '.bmp';
file = aviinfo(filename);                        % get information about the video file
frm_cnt = file.NumFrames;                        % number of frames in the video file

for i = 1:frm_cnt
    frm(i) = aviread(filename, i);               % read the video file frame by frame
    frm_name = frame2im(frm(i));
    filename1 = strcat(num2str(i), str2);
    imwrite(frm_name, filename1);                % write each frame as an image file
end

str3 = '.png';
for j = 1:frm_cnt
    filename_1 = strcat(num2str(j), str2);       % name of the j-th frame written above
    D = imread(filename_1);
    if ndims(D) == 3
        D = rgb2gray(D);                         % use the luminance for the SVD step
    end
    [u1, s1, v1] = svd(double(D));               % SVD of the frame data
    im = u1 * s1 * transpose(v1);                % reconstruct the frame from its SVD
    file_2 = strcat(num2str(j), str3);
    imwrite(uint8(im), file_2);                  % cast back to uint8 before writing
end

for k = 1:frm_cnt
    file_2 = strcat(num2str(k), '.bmp');
    v = imread(file_2);
    [Y, map] = rgb2ind(v, 255);                  % convert to an indexed image
    F(k) = im2frame(flipud(Y), map);             % store as a movie frame
end
save F F

mov = aviread(filename);
[h, w, p] = size(mov(1).cdata);
hf = figure('Name', 'INPUT VIDEO');
set(hf, 'position', [150 150 w h]);
movie(gcf, mov);

[h, w, p] = size(F(1).cdata);
hf = figure('Name', 'HOSVD COMPRESSED VIDEO');
set(hf, 'position', [150 150 w h]);
movie(gcf, F);

input_file_size  = frm_cnt * size(frm(1).cdata,1) * size(frm(1).cdata,2) * size(frm(1).cdata,3)
output_file_size = frm_cnt * size(F(1).cdata,1) * size(F(1).cdata,2) * size(F(1).cdata,3)
compression = input_file_size / output_file_size
fprintf('output file size is %d times compression of input file size', compression);
compression_ratio = output_file_size / input_file_size
compression_ratio = compression_ratio * 100
mse  = (sum(double(mov(1).cdata(:,:,1)) - double(F(1).cdata)) .* ...
        sum(double(mov(1).cdata(:,:,1)) - double(F(1).cdata))) / input_file_size;
psnr = 20 * log10(255 / sqrt(max(mse)))
