Chapter 1
INTRODUCTION
Digital image processing is an area characterized by the need for extensive experimental
work to establish the viability of proposed solutions to a given problem. An important
characteristic underlying the design of image processing systems is the significant level of
testing and experimentation that normally is required before arriving at an acceptable
solution. This characteristic implies that the ability to formulate approaches and quickly
prototype candidate solutions generally plays a major role in reducing the cost and time
required to arrive at a viable system implementation.
1.1 What is DIP?
An image may be defined as a two-dimensional function f(x, y), where x and y are spatial
coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or
gray level of the image at that point. When x, y and the amplitude values of f are all finite,
discrete quantities, we call the image a digital image. The field of DIP refers to
processing digital images by means of a digital computer. A digital image is composed of a
finite number of elements, each of which has a particular location and value. These elements
are called pixels.
Vision is the most advanced of our senses, so it is not surprising that images play
the single most important role in human perception. However, unlike humans, who are
limited to the visual band of the electromagnetic (EM) spectrum, imaging machines cover almost the entire
EM spectrum, ranging from gamma rays to radio waves. They can also operate on images
generated by sources that humans are not accustomed to associating with images.
There is no general agreement among authors regarding where image processing
stops and other related areas, such as image analysis and computer vision, start. Sometimes a
distinction is made by defining image processing as a discipline in which both the input and
output of a process are images. This is a limiting and somewhat artificial boundary. The area
of image analysis (image understanding) is in between image processing and computer
vision.
There are no clear-cut boundaries in the continuum from image processing at one
end to complete vision at the other. However, one useful paradigm is to consider three
types of computerized processes in this continuum: low-, mid-, & high-level processes.
Low-level processes involve primitive operations such as image preprocessing to reduce
noise, contrast enhancement and image sharpening. A low-level process is characterized
by the fact that both its inputs and outputs are images. Mid-level processing on images
involves tasks such as segmentation, description of those objects to reduce them to a form
suitable for computer processing, and classification of individual objects. A mid-level
process is characterized by the fact that its inputs generally are images but its outputs are
attributes extracted from those images. Finally, higher-level processing involves "making
sense" of an ensemble of recognized objects, as in image analysis, and, at the far end of the
continuum, performing the cognitive functions normally associated with human vision.
Digital image processing, as already defined, is used successfully in a broad range of areas
of exceptional social and economic value. Images are an everyday aspect of computers
now. Web sites on the Internet are generally made up of many pictures, and a large
proportion of transmission bandwidth and storage capacity is taken up by computer
images. Reducing the storage requirements of an image while retaining its quality is
very important; otherwise systems would become completely clogged. Since 1990, the
JPEG picture format has been adopted as the standard for photographic images on the
Internet. This project looks at another method for compressing images using the Singular
Value Decomposition (SVD).
1.2 What is an Image?
An image is represented as a two dimensional function f(x, y) where x and y are spatial
co-ordinates and the amplitude of ‘f’ at any pair of coordinates (x, y) is called the
intensity of the image at that point.
1.3 Coordinate Convention
The result of sampling and quantization is a matrix of real numbers. We use two principal
ways to represent digital images. Assume that an image f(x, y) is sampled so that the
resulting image has M rows and N columns. We say that the image is of size M X N. The
values of the coordinates (x, y) are discrete quantities. For notational clarity and
convenience, we use integer values for these discrete coordinates. In many image
processing books, the image origin is defined to be at (x, y) = (0, 0). The next coordinate
values along the first row of the image are (x, y) = (0, 1). It is important to keep in mind
that the notation (0, 1) is used to signify the second sample along the first row. It does not
mean that these are the actual values of physical coordinates when the image was
sampled. The following figure shows the coordinate convention. Note that x ranges from 0 to
M-1 and y from 0 to N-1 in integer increments.
The coordinate convention used in the toolbox to denote arrays is different from
the preceding paragraph in two minor ways. First, instead of using (x, y), the toolbox
uses the notation (r, c) to indicate rows and columns. Note, however, that the order of
coordinates is the same as the order discussed in the previous paragraph, in the sense that
the first element of a coordinate tuple, (a, b), refers to a row and the second to a column.
The other difference is that the origin of the coordinate system is at (r, c) = (1, 1); thus, r
ranges from 1 to M and c from 1 to N in integer increments. The IPT documentation refers to
these as pixel coordinates. Less frequently, the toolbox also employs another coordinate convention
called spatial coordinates, which uses x to refer to columns and y to refer to rows. This is
the opposite of our use of the variables x and y.
1.4 Images as Matrices
The preceding discussion leads to the following representation for a digitized image
function
            f(0,0)      f(0,1)      ...   f(0,N-1)
            f(1,0)      f(1,1)      ...   f(1,N-1)
f(x, y) =     .           .                  .
              .           .                  .
            f(M-1,0)    f(M-1,1)    ...   f(M-1,N-1)
The right side of this equation is a digital image by definition. Each element of this array
is called an image element, picture element, pixel or pel. The terms image and pixel are
used throughout the rest of our discussions to denote a digital image and its element.
A digital image can be represented naturally as a MATLAB matrix:
        f(1,1)      f(1,2)      ...   f(1,N)
f =     f(2,1)      f(2,2)      ...   f(2,N)
          .           .                  .
        f(M,1)      f(M,2)      ...   f(M,N)
where f(1,1) = f(0,0) (note the use of a monospace font to denote MATLAB quantities).
Clearly the two representations are identical, except for the shift in origin. f(p,q) denotes
the element located in row p and column q.
Matrices in MATLAB are stored in variables with names such as A, a, RGB, real_array
and so on. Variables must begin with a letter and contain only letters, numerals and
underscores. As noted in the previous paragraph, all MATLAB quantities are written
using monospace characters. We use conventional Roman italic notation, such as f(x, y),
for mathematical expressions.
1.5 Image Types
The toolbox supports four types of images:
Intensity images
Binary images
Indexed images
RGB images
Most monochrome image processing operations are carried out using binary or intensity
images, so our initial focus is on these two image types. Indexed and RGB color images are
discussed later.
1.5.1 Intensity Images
An intensity image is a data matrix whose values have been scaled to represent intensities.
When the elements of an intensity image are of class uint8 or class uint16, they have
integer values in the range [0, 255] and [0, 65535], respectively.
1.5.2 Binary Images
Binary images have a very specific meaning in MATLAB. A binary image is a logical
array of 0s and 1s. Thus, an array of 0s and 1s whose values are of a numeric data class, say
uint8, is not considered a binary image in MATLAB. A numeric array is converted to binary
using the function logical. Thus, if A is a numeric array consisting of 0s and 1s, we create a
logical array B using the statement
B = logical(A)
If A contains elements other than 0s and 1s, use of the logical function converts all
nonzero quantities to logical 1s and all entries with value 0 to logical 0s. Using relational and
logical operators also creates logical arrays. To test whether an array is logical we use the
islogical function: islogical(C). If C is a logical array, this function returns a 1; otherwise it returns a 0.
Logical arrays can be converted to numeric arrays using the data class conversion functions.
1.5.3 Indexed Images
An indexed image has two components:
A data matrix of integers, X.
A color map matrix, map.
Matrix map is an m×3 array of class double containing floating-point values in the
range [0, 1]. The length m of the map is equal to the number of colors it defines. Each
row of map specifies the red, green and blue components of a single color. An indexed
image uses "direct mapping" of pixel intensity values to color map values. The color of
each pixel is determined by using the corresponding value of the integer matrix X as a
pointer into map. If X is of class double, then all of its components with values less than
or equal to 1 point to the first row in map, all components with value 2 point to the second
row, and so on. If X is of class uint8 or uint16, then all components with value 0 point to the
first row in map, all components with value 1 point to the second, and so on.
1.5.4 RGB Images
An RGB color image is an M×N×3 array of color pixels, where each color pixel is a triplet
corresponding to the red, green and blue components of an RGB image at a specific
spatial location. An RGB image may be viewed as a "stack" of three gray-scale images that,
when fed into the red, green and blue inputs of a color monitor,
produce a color image on the screen. By convention, the three images forming an
RGB color image are referred to as the red, green and blue component images. The data
class of the component images determines their range of values. If an RGB image is of
class double, the range of values is [0, 1].
Similarly, the range of values is [0, 255] or [0, 65535] for RGB images of class
uint8 or uint16, respectively. The number of bits used to represent the pixel values of the
component images determines the bit depth of an RGB image. For example, if each
component image is an 8-bit image, the corresponding RGB image is said to be 24 bits
deep. Generally, the number of bits in all component images is the same. In this case the
number of possible colors in an RGB image is (2^b)^3, where b is the number of bits in
each component image. For the 8-bit case the number is 16,777,216 colors.
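The (2^b)^3 figure can be checked directly; a tiny sketch in Python (the helper name is ours, not from the toolbox):

```python
# Number of distinct colors in an RGB image with b bits per component
# image: (2^b)^3. Helper name is illustrative only.
def rgb_color_count(b):
    return (2 ** b) ** 3

print(rgb_color_count(8))  # 16777216, the 24-bit "true color" case
```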
1.6 Need for Image Compression
One of the important aspects of image storage is its efficient compression. To make this
fact clear, let's see an example. An image of 1024 pixels x 1024 pixels x 24 bits would, without
compression, require 3 MB of storage and about 7 minutes for transmission over a
high speed 64 Kbit/s ISDN line. If the image is compressed at a 10:1 compression ratio,
the storage requirement is reduced to 300 KB and the transmission time drops to under 40
seconds. Seven 1 MB images can be compressed and transferred to a floppy disk in less
time than it takes to send one of the original files, uncompressed, over an AppleTalk
network.
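The figures above follow from simple arithmetic; the sketch below reproduces them (Python for illustration; the function names are ours and no protocol overhead is modeled):

```python
# Raw size of a 1024 x 1024 x 24-bit image and its transmission time
# over a 64 kbit/s line, with and without 10:1 compression.
def image_bits(width, height, bit_depth):
    return width * height * bit_depth

def transmission_seconds(bits, line_bits_per_sec):
    return bits / line_bits_per_sec

raw = image_bits(1024, 1024, 24)               # 25165824 bits = 3 MB
print(transmission_seconds(raw, 64_000))       # ~393 s, about 6.5 minutes
print(transmission_seconds(raw / 10, 64_000))  # ~39 s at a 10:1 ratio
```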
In a distributed environment large image files remain a major bottleneck within
systems. Compression is an important component of the solutions available for creating
file sizes of manageable and transmittable dimensions. Increasing the bandwidth is
another method, but the cost sometimes makes this a less attractive solution.
1.7 Data Compression
Data compression is similar in concept to lossless image compression; however, it does
not try to interpret the data as an image. Data compression searches for patterns in a data
stream. Common data compression methods are Deflate and LZW. The lossless
compression file format GIF simply turns the image into a long string of data (all the
horizontal lines appended) and applies LZW data compression. The lossy file format
JPEG uses Huffman encoding to compress the data stream that is output from the
Discrete Cosine Transform process. This project will not deal with data
compression. However, as the main goal of this project is to compare SVD against JPEG,
a general purpose data compression algorithm is applied to the output data stream to give
meaningful comparisons.
Chapter 2
VIDEO COMPRESSION
Digital video compression technology has been booming for many years. Today, when
people chat with their friends through a visual telephone, or enjoy a movie
broadcast over the Internet or digital music such as MP3, the convenience that the
digital video industry brings to us cannot be overlooked. All of this should be attributed to
advances in mass storage media and streaming video/audio services, which have
influenced our daily life deeply. In this project we will be implementing in Simulink.
Simulink is a platform for multidomain simulation and Model-Based Design for dynamic
systems. It provides an interactive graphical environment and a customizable set of block
libraries, and can be extended for specialized applications.
This project demonstrates video compression using motion compensation and Discrete Cosine
Transform (DCT) techniques with the Video and Image Processing Blockset. The demo
calculates motion vectors between successive frames and uses them to reduce redundant
information. Then it divides each frame into submatrices and applies the discrete cosine
transform to each submatrix. Finally, the demo applies a quantization technique to
achieve further compression. The Decoder subsystem performs the inverse process to
recover the original video.
2.1 Why is Digital Video Compressed?
Digital video is compressed because it takes up a staggering amount of room in its
original form. By compressing the video, you make it easier to store. Digital video can be
compressed without impacting the perceived quality of the final product because it affects
only the parts of the video that humans can't really detect.
Compressing video is essentially the process of throwing away data for things we
can't perceive. Standard digital video cameras compress video at a ratio of 5 to 1, and
there are formats that allow you to compress video by as much as 100 to 1. But too much
compression can be a bad thing. The more you compress the more data you throw away.
Throw away too much, and the changes become noticeable. With heavy compression you
can get video that's nearly unrecognizable. When you compress video, always try several
compression settings. The goal is to compress as much as possible until the data loss
becomes noticeable, and then notch the compression back a little. That will give you the
right balance between file size and quality. And remember that every video is different.
2.2 Categories of Data Compression Algorithms
Two categories of data compression algorithms can be distinguished: lossless and lossy.
Lossy techniques cause image quality degradation in each compression/decompression
step. Careful consideration of human visual perception ensures that the degradation is
often unrecognizable, though this depends on the selected compression ratio. In general,
lossy techniques provide far greater compression ratios than lossless techniques. Here
we'll discuss the roles of the following data compression techniques:
2.2.1 Lossless Coding Techniques
Lossless coding guarantees that the decompressed image is absolutely identical to the
image before compression. This is an important requirement for some application
domains, e.g. medical imaging, where not only high quality is in demand but unaltered
archiving is a legal requirement. Lossless techniques can also be used for the compression of
other data types where loss of information is not acceptable, e.g. text documents and
program executables.
Some compression methods can be made more effective by adding a 1D or 2D delta
coding to the process of compression. These deltas make more effective use of run length
encoding, have (statistically) higher maxima in code tables (leading to better results in
Huffman and general entropy coding), and build greater equal value areas usable for area
coding.
Some of these methods can easily be modified to be lossy; a lossy element fits
perfectly into 1D/2D run-length search. Also, logarithmic quantization may be inserted to
provide better or more effective results.
Run Length Encoding: Run length encoding is a very simple method for compression of
sequential data. It takes advantage of the fact that, in many data streams, consecutive
single tokens are often identical. Run length encoding checks the stream for this fact and
inserts a special token each time a chain of more than two equal input tokens is found.
This special input advises the decoder to insert the following token n times into its output
stream. The effectiveness of run length encoding is a function of the number of equal tokens
in a row in relation to the total number of input tokens. This relation is very high in
undithered two-tone images of the type used for facsimile. Obviously, effectiveness
degrades when the input does not contain too many equal tokens. With a rising density of
information, the likelihood of two consecutive tokens being the same sinks
significantly, as there is always some noise distortion in the input. Run length coding is
easily implemented, either in software or in hardware. It is fast and very well verifiable,
but its compression ability is very limited.
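The scheme described above can be sketched in a few lines (Python for illustration; a real codec would pack runs into a byte stream rather than (token, count) pairs):

```python
# Minimal run-length encoder/decoder for a token stream; a sketch of
# the idea, not a production codec.
def rle_encode(tokens):
    out = []
    i = 0
    while i < len(tokens):
        j = i
        while j < len(tokens) and tokens[j] == tokens[i]:
            j += 1
        out.append((tokens[i], j - i))  # (token, run length)
        i = j
    return out

def rle_decode(pairs):
    out = []
    for token, count in pairs:
        out.extend([token] * count)
    return out

data = [0, 0, 0, 0, 1, 1, 0, 0, 0]
packed = rle_encode(data)
print(packed)  # [(0, 4), (1, 2), (0, 3)]
assert rle_decode(packed) == data
```

As the text notes, the method only pays off when runs are long: nine tokens collapse to three pairs here, but noisy data with no repeats would actually grow.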
Huffman Encoding: This algorithm, developed by D.A. Huffman, is based on the fact
that in an input stream certain tokens occur more often than others. Based on this
knowledge, the algorithm builds up a weighted binary tree of tokens according to their rate of
occurrence. Each element of this tree is assigned a new code word, where the length of
the code word is determined by its position in the tree. Therefore, the most frequent token,
which sits closest to the root of the tree, is assigned the shortest code. Each less common
element is assigned a longer code word. The least frequent element is assigned a code
word which may become twice as long as the input token. The compression ratio
achieved by Huffman encoding on uncorrelated data is something like 1:2.
On slightly correlated data, as in images, the compression rate may become much
higher, the absolute maximum being defined by the size of a single input token and the
size of the shortest possible output token (max. compression = token size [bits] / 2 [bits]).
While standard palettized images with a limit of 256 colors may be compressed by 1:4 if
they use only one color, more typical images give results in the range of 1:1.2 to 1:2.5.
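A minimal sketch of the code construction described above (Python's heapq stands in for the weighted tree; the codes produced are illustrative, not the canonical JPEG tables):

```python
# Huffman code construction: repeatedly merge the two lowest-weight
# subtrees; codeword length grows as token frequency falls. Sketch only.
import heapq
from collections import Counter

def huffman_codes(tokens):
    heap = [(freq, i, {sym: ""})
            for i, (sym, freq) in enumerate(Counter(tokens).items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    if len(heap) == 1:  # degenerate single-symbol stream
        return {next(iter(heap[0][2])): "0"}
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
# 'a' (most frequent) gets the shortest codeword
assert len(codes["a"]) <= len(codes["b"]) <= len(codes["c"])
```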
Entropy Coding: The typical implementation of an entropy coder follows J. Ziv and A.
Lempel's approach. Nowadays, there is a wide range of so-called modified Lempel/Ziv
codings. These algorithms all work in a common way. The coder and the decoder
both build up an equivalent dictionary of metasymbols, each of which represents a whole
sequence of input tokens. If a sequence is repeated after a symbol was found for it, then
only the symbol becomes part of the coded data, and the sequence of tokens referenced by
the symbol becomes part of the decoded data later. As the dictionary is built up from
the data, it is not necessary to put it into the coded data, as it is with the tables in a
Huffman coder. Entropy coders are a little tricky to implement, as there are usually a few
tables, all growing while the algorithm runs.
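The dictionary build-up can be sketched as follows (Python, bytes in and integer codes out; a real implementation would also pack the codes into bits):

```python
# Bare-bones LZW compressor over bytes. The dictionary is grown from
# the data itself, so it need not be transmitted. Sketch only.
def lzw_compress(data: bytes):
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    out, current = [], b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate          # keep extending the match
        else:
            out.append(dictionary[current])
            dictionary[candidate] = next_code  # new metasymbol
            next_code += 1
            current = bytes([byte])
    if current:
        out.append(dictionary[current])
    return out

codes = lzw_compress(b"ababababab")
print(len(codes))  # 6 codes for 10 input bytes: repeats are folded
```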
Area Coding: Area coding is an enhanced form of run length coding, reflecting the two-
dimensional character of images. This is a significant advance over the other lossless
methods. For coding an image it does not make much sense to interpret it as a
sequential stream, as it is in fact an array of sequences building up a two-dimensional
object. Therefore, as the two dimensions are independent and of the same importance, it is
obvious that a coding scheme aware of this has some advantages. The algorithms for area
coding try to find rectangular regions with the same characteristics. These regions are
coded in a descriptive form as an element with two points and a certain structure. The
whole input image has to be described in this form to allow lossless decoding afterwards.
Practical implementations use recursive algorithms for reducing the whole area to
equal-sized subrectangles until a rectangle fulfills the criterion of having the
same characteristic for every pixel. This type of coding can be highly effective, but it
bears the problem of being a nonlinear method, which cannot be implemented in hardware.
Therefore, the performance in terms of compression time is not competitive.
2.2.2 Lossy Coding Techniques
In most applications there is no need for exact restoration of the stored image. This fact
can help to make the storage more effective, and in this way we arrive at lossy compression
methods. Lossy image coding techniques normally have three components:
Image modeling, which defines such things as the transformation to be applied to
the image.
Parameter quantization, whereby the data generated by the transformation is
quantized to reduce the amount of information.
Encoding, where a code is generated by associating appropriate codewords with the
raw data produced by the quantization.
Each of these operations is in some part responsible for the compression. Image modeling
is aimed at the exploitation of statistical characteristics of the image (i.e. high correlation,
redundancy). Typical examples are transform coding methods, in which the data is
represented in a different domain (for example, frequency in the case of the Fourier
Transform [FT], the Discrete Cosine Transform [DCT], the Karhunen-Loève Transform
[KLT], and so on), where a reduced number of coefficients contains most of the original
information. In many cases this first phase does not result in any loss of information.
The aim of quantization is to reduce the amount of data used to represent the information
within the new domain. Quantization is in most cases not a reversible operation;
therefore, it belongs to the so-called 'lossy' methods.
Encoding is usually error free. It optimizes the representation of the information
(helping sometimes to further reduce the bit rate), and may introduce some error detection
codes. In the following sections, a review of the most important coding schemes for lossy
compression is provided. Some methods are described in their canonical form (transform
coding, region based approximations, fractal coding, wavelets, hybrid methods) and some
variations and improvements presented in the scientific literature are reported and
discussed.
Transform Coding (DCT/Wavelets/Gabor): A general transform coding scheme
involves subdividing an N×N image into smaller n×n blocks and performing a unitary
transform on each sub image. A unitary transform is a reversible linear transform whose
kernel describes a set of complete, orthonormal discrete basis functions. The goal of the
transform is to decorrelate the original signal, and this decorrelation generally results in the
signal energy being redistributed among only a small set of transform coefficients. In this
way, many coefficients may be discarded after quantization and prior to encoding. Also,
visually lossless compression can often be achieved by incorporating the HVS (human visual system) contrast
sensitivity function in the quantization of the coefficients.
Transform coding can be generalized into four stages:
Image subdivision
Image transformation
Coefficient quantization
Huffman encoding
For a transform coding scheme, logical modeling is done in two steps: a segmentation
step, in which the image is subdivided into two-dimensional vectors (possibly of different
sizes), and a transformation step, in which the chosen transform (e.g. KLT, DCT, or
Hadamard) is applied. Quantization can be performed in several ways. Most classical
approaches use 'zonal coding', consisting of the scalar quantization of the coefficients
belonging to a predefined area (with a fixed bit allocation), and 'threshold coding',
consisting of the choice of the coefficients of each block characterized by an absolute
value exceeding a predefined threshold. Another possibility, which leads to higher
compression factors, is to apply a vector quantization scheme to the transformed
coefficients. The same type of encoding is used for each coding method. In most cases a
classical Huffman code can be used successfully. The JPEG and MPEG standards are
examples of standards based on transform coding.
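Threshold coding as described above reduces, for each block, to a comparison against a fixed threshold; a toy sketch (Python, with a made-up 2×2 "block" standing in for transformed coefficients):

```python
# Threshold coding sketch: keep transform coefficients whose magnitude
# meets the threshold t, zero the rest. Pure Python for illustration.
def threshold_code(block, t):
    return [[v if abs(v) >= t else 0.0 for v in row] for row in block]

block = [[50.0, 3.0], [-12.0, 0.5]]
print(threshold_code(block, 5.0))  # [[50.0, 0.0], [-12.0, 0.0]]
```

Only the surviving coefficients (and their positions) then need to be entropy-coded, which is where the compression comes from.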
Vector Quantization: A vector quantizer can be defined mathematically as a transform
operator T from a K-dimensional Euclidean space R^K to a finite subset X in R^K made
up of N vectors. This subset X becomes the vector codebook or, more generally, the
codebook. Clearly, the choice of the set of vectors is of major importance. The level of
distortion due to the transformation T is generally computed as the mean squared error
(MSE) between the "real" vector x in R^K and the corresponding vector x' = T(x) in X.
This error should be such as to minimize the Euclidean distance d. An optimum scalar
quantizer was proposed by Lloyd and Max. Later on, Linde, Buzo and Gray resumed and
generalized this method, extending it to the case of a vector quantizer.
The LBG algorithm for the design of a vector codebook always reaches a local
minimum of the distortion function, but often this solution is not the optimal one. A
careful analysis of the LBG algorithm's behavior allows one to detect two critical points:
the choice of the starting codebook and the uniformity of the Voronoi regions'
dimensions. For this reason some algorithms have been designed that give better
performance. With respect to the initialization of the LBG algorithm, for instance, one
can observe that a random choice of the starting codebook requires a large number of
iterations before reaching an acceptable amount of distortion. Moreover, if the starting
point leads to a local minimum solution, the relative stopping criterion prevents further
optimization steps.
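One refinement pass of the LBG iteration (nearest-codeword assignment followed by a centroid update) can be sketched on a toy training set; Python for illustration, with a hand-picked rather than random initial codebook:

```python
# One LBG/k-means style refinement step: assign each training vector to
# its nearest codeword (squared-error distortion), then replace each
# codeword by the mean of its cluster. Sketch only.
def nearest(codebook, v):
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))

def lbg_step(codebook, training):
    clusters = [[] for _ in codebook]
    for v in training:
        clusters[nearest(codebook, v)].append(v)
    return [[sum(xs) / len(c) for xs in zip(*c)] if c else cw
            for cw, c in zip(codebook, clusters)]

training = [(0.0, 0.0), (0.2, 0.0), (1.0, 1.0), (1.2, 1.0)]
print(lbg_step([[0.0, 0.5], [1.0, 0.5]], training))
# codewords move toward the two natural clusters
```

Iterating this step until the distortion stops improving is exactly the local-minimum behavior the text describes: the result depends on where the starting codebook was placed.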
Segmentation and Approximation Methods: With segmentation and approximation
coding methods, the image is modeled as a mosaic of regions, each one characterized by a
sufficient degree of uniformity of its pixels with respect to a certain feature (e.g. grey
level, texture); each region then has some parameters related to the characterizing feature
associated with it.
The operations of finding a suitable segmentation and an optimum set of
approximating parameters are highly correlated, since the segmentation algorithm must
take into account the error produced by the region reconstruction (in order to limit this
value within determined bounds). These two operations constitute the logical modeling
for this class of coding schemes; quantization and encoding are strongly dependent on the
statistical characteristics of the parameters of this approximation.
For polynomial approximation, regions are reconstructed by means of polynomial
functions in (x, y); the task of the encoder is to find the optimum coefficients. In texture
approximation, regions are filled by synthesizing a parameterized texture based on some
model (e.g. fractals, statistical methods, Markov Random Fields [MRF]). It must be
pointed out that, while in polynomial approximations the problem of finding optimum
coefficients is quite simple (it is possible to use least squares approximation or similar
exact formulations), for texture based techniques this problem can be very complex.
Fractal Compression: Fractal compression is a form of vector quantization, and it is a lossy
compression technique. Compression is performed by locating self-similar sections of an image, then using a
fractal algorithm to generate those sections. Like the DCT, the discrete wavelet transform
mathematically transforms an image into frequency components. The process is
performed on the entire image, which differs from the other methods (such as the DCT) that work on
smaller pieces of the desired data. The result is a hierarchical representation of an image,
where each layer represents a frequency band.
2.3 Compression Standards
MPEG stands for the Moving Picture Experts Group. MPEG is an ISO/IEC working
group, established in 1988 to develop standards for digital audio and video formats. There
are five MPEG standards being used or in development. Each compression standard was
designed with a specific application and bit rate in mind, although MPEG compression
scales well with increased bit rates. They include:
MPEG-1: Designed for up to 1.5 Mbit/s. A standard for the compression of moving
pictures and audio. This was based on CD-ROM video applications, and is a popular
standard for video on the Internet, transmitted as .mpg files. In addition, layer 3 of
MPEG-1 is the most popular standard for digital compression of audio, known as MP3.
MPEG-1 is the standard of compression for VideoCD, the most popular video distribution
format throughout much of Asia.
MPEG-2: Designed for between 1.5 and 15 Mbit/s. The standard on which digital
television set-top boxes and DVD compression are based. It is based on MPEG-1, but
designed for the compression and transmission of digital broadcast television. The most
significant enhancement over MPEG-1 is its ability to efficiently compress interlaced
video. MPEG-2 scales well to HDTV resolutions and bit rates, obviating the need for an
MPEG-3.
MPEG-4: A standard for multimedia and Web compression. MPEG-4 is based on
object-based compression, similar in nature to the Virtual Reality Modeling Language.
Individual objects within a scene are tracked separately and compressed together to create
an MPEG-4 file. This results in very efficient compression that is very scalable, from low
bit rates to very high. It also allows developers to control objects independently in a
scene, and therefore introduce interactivity.
JPEG: JPEG stands for Joint Photographic Experts Group. It is also an ISO/IEC working
group, but one that works to build standards for continuous-tone image coding. JPEG is a lossy
compression technique used for full-color or gray-scale images, exploiting the fact that
the human eye will not notice small color changes. JPEG 2000 is an initiative that will
provide an image coding system using compression techniques based on
wavelet technology.
2.4 Transforms
There are several common transforms used in signal processing, such as the
Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), the Discrete Wavelet
Transform (DWT) and others as well. The DCT is the most common transform
used when processing images and video. The DWT is used in the image
compression standard JPEG 2000, and will be used in this application as well. Both the
DCT and DWT will be described more thoroughly. The basic idea of using transforms
when processing, for example, an image is to decorrelate the pixels from one another. By doing
so, compression is achieved, since the amount of redundant information is minimized. A
transform can be seen as a projection onto orthonormal bases, separated in time and/or
frequency. By transforming a signal, its energy is separated into sub bands. By
describing each sub band with a different precision, higher precision within high-energy sub
bands and less precision in low-energy sub bands, the signal can be compressed.
C(k, n) = 1/√N,                          k = 0,        0 ≤ n ≤ N-1
C(k, n) = √(2/N)·cos(π(2n+1)k / 2N),     1 ≤ k ≤ N-1,  0 ≤ n ≤ N-1
To transform a matrix Y, the transform matrix C is multiplied with Y, which gives the
transformed matrix X = CY. The cosine transform matrix is real-valued and orthogonal, which
means that C has the properties
C = C*
C⁻¹ = Cᵀ
The DCT also has excellent energy compaction, which means that the energy of the
matrix is concentrated in a small region of the transformed matrix, and it has good
decorrelation properties. These properties are very suitable for image and video processing,
and the DCT is therefore widely used (e.g. in JPEG, MPEG and H.263). The two-dimensional DCT
of an image can be seen in Figure 2.1(a). Note that the energy is compacted toward the
upper left corner.
2.5 DCT and Image Compression
In the JPEG image compression algorithm, the input image is divided into 8-by-8 or 16-
by-16 blocks, and the two-dimensional DCT is computed for each block. The DCT
coefficients are then quantized, coded, and transmitted. The JPEG receiver (or JPEG file
reader) decodes the quantized DCT coefficients, computes the inverse two-dimensional
DCT of each block, and then puts the blocks back together into a single image. For typical
images, many of the DCT coefficients have values close to zero; these coefficients can be
discarded without seriously affecting the quality of the reconstructed image.
The example code below computes the two-dimensional DCT of 8-by-8 blocks in
the input image, discards (sets to zero) all but 10 of the 64 DCT coefficients in each block,
and then reconstructs the image using the two-dimensional inverse DCT of each block.
The transform matrix computation method is used.
I = imread('cameraman.tif');
I = im2double(I);
T = dctmtx(8);
B = blkproc(I, [8 8], 'P1*x*P2', T, T');
% Mask keeping only the 10 lowest-frequency DCT coefficients per block
mask = [1 1 1 1 0 0 0 0
        1 1 1 0 0 0 0 0
        1 1 0 0 0 0 0 0
        1 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0];
B2 = blkproc(B, [8 8], 'P1.*x', mask);
I2 = blkproc(B2, [8 8], 'P1*x*P2', T', T);
Although there is some loss of quality in the reconstructed image, it is clearly
recognizable, even though almost 85% of the DCT coefficients were discarded. To
experiment with discarding more or fewer coefficients, and to apply this technique to
other images, try running the demo function dctdemo.
2.6 General Description
The Discrete Cosine Transform (DCT) is a technique that converts a spatial domain
waveform into its constituent frequency components as represented by a set of
coefficients. The process of reconstructing a set of spatial domain samples is called the
Inverse Discrete Cosine Transform (IDCT).
For data compression of image/video frames, usually a block of data is converted
from spatial domain samples to another domain (usually the frequency domain), which offers
a more compact representation. The optimal transform is the Karhunen-Loeve Transform
(KLT): it packs most of the block energy into the fewest frequency domain
elements, it minimizes the total entropy of the block, and it completely decorrelates its
elements. However, its main disadvantage is that its basis functions are image-dependent,
which complicates the digital implementation. The Discrete Cosine Transform,
introduced by Ahmed in 1974, has the next best compaction efficiency,
while also having image-independent basis functions. Hence the DCT is used to provide the
necessary transform and the resultant data is then compressed using quantization and
various coding techniques to offer lossless as well as lossy compression.
Chapter 3
SINGULAR VALUE DECOMPOSITION (SVD)
Singular Value Decomposition (SVD) is regarded by many renowned mathematicians as a
significant topic in linear algebra. SVD has many practical and theoretical values; a special
feature of SVD is that it can be performed on any real m×n matrix. In this chapter
we demonstrate how to use the SVD to factorize and
approximate large matrices, specifically images.
Singular value decomposition takes a rectangular m×n matrix A and calculates three
matrices U, S, and V. S is a diagonal m×n matrix (the same dimensions as A). U and V
are unitary or orthogonal matrices with sizes m×m and n×n respectively. The matrices are
related by the equation
A=USVH
Calculating the SVD consists of finding the eigenvalues and eigenvectors of AAH and
AHA. The eigenvectors of AHA make up the columns of V; the eigenvectors of AAH make
up the columns of U. The eigenvalues of AHA or AAH are the squares of the singular
values of A. The singular values are the diagonal entries of the S matrix and are arranged
in descending order. The singular values are always real numbers. If the matrix A is a real
matrix, then U and V are also real. The decomposition can be expressed as the outer
product expansion
A = S1U1V1T + S2U2V2T + … + SrUrVrT
where r is the rank of A. The matrix A can be approximated by a matrix Ậ of rank k by
truncating this sum after the first k terms.
The matrix U contains one orthonormal basis. U is also known as the left singular vectors.
The matrix V contains another orthonormal basis. V is also known as the right singular
vectors. The diagonal matrix S contains the singular values.
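A minimal NumPy sketch of this factorization and the properties just listed (illustrative only; the project code itself is MATLAB):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # any real m x n matrix

# numpy returns U (m x m), the singular values s, and V^T (n x n)
U, s, Vt = np.linalg.svd(A)

# Rebuild the m x n diagonal matrix S and verify A = U S V^T
S = np.zeros_like(A)
np.fill_diagonal(S, s)
print(np.allclose(A, U @ S @ Vt))        # True

# Singular values are real, non-negative, in descending order
print(np.all(np.diff(s) <= 0))           # True

# U and V are orthogonal: U^T U = I, V^T V = I
print(np.allclose(U.T @ U, np.eye(5)))   # True
```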
3.1 Factoring V and S
First we will find V. To eliminate U from the equation A = USVT, simply
multiply A on the left by AT:
ATA = (USVT)T(USVT) = VSTUTUSVT
Since U is an orthogonal matrix, UTU = I, which gives
ATA = VS2VT
Notice that this is similar to the diagonalization of a matrix A, where A = QΛQT.
But now the symmetric matrix is not A; it is ATA.
To find V and S we need to diagonalize ATA by finding its eigenvalues and
eigenvectors. The eigenvalues are the squares of the elements of S (the singular
values), and the eigenvectors are the columns of V (the right singular vectors).
3.2 Factoring U
Eliminating V from the equation is very similar to eliminating U. Instead of
multiplying on the left by AT, we multiply on the right by AT. This gives
AAT = (USVT)(USVT)T = USVTVSTUT
Since VTV = I, this gives
AAT = US2UT
Again we find the eigenvectors, but this time for AAT. These are the columns
of U (the left singular vectors).
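The two factorizations above can be checked numerically. This NumPy sketch recovers the singular values from the eigenvalues of ATA and AAT and compares them with a library SVD:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

# Diagonalize A^T A: its eigenvalues are the squared singular values,
# its eigenvectors are the right singular vectors (columns of V).
evals_v, V = np.linalg.eigh(A.T @ A)
# eigh returns ascending order; singular values are sorted descending
order = np.argsort(evals_v)[::-1]
s_from_eig = np.sqrt(np.maximum(evals_v[order], 0.0))

# Compare with the singular values from the library SVD
s_ref = np.linalg.svd(A, compute_uv=False)
print(np.allclose(s_from_eig, s_ref))    # True

# Likewise, A A^T has the same nonzero eigenvalues (plus extra zeros)
evals_u = np.linalg.eigvalsh(A @ A.T)
print(np.allclose(np.sort(evals_u)[::-1][:3], s_ref**2))   # True
```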
3.3 Properties of the SVD
There are many properties and attributes of the SVD; here we present only those
properties used in this project.
The singular values σ1, σ2, …, σn are unique; however, the matrices U and V are not
unique.
Since ATA = VSTSVT, V diagonalizes ATA, and it follows that the vi's are
the eigenvectors of ATA.
Since AAT = USSTUT, U diagonalizes AAT, and the ui's are
the eigenvectors of AAT.
If A has rank r, then v1, v2, …, vr form an orthonormal basis for the range space of
AT, R(AT), and u1, u2, …, ur form an orthonormal basis for the range space of A, R(A).
The rank of matrix A is equal to the number of its nonzero singular values.
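A small sketch of the last property, rank equals the number of nonzero singular values (NumPy, with a tolerance to decide "nonzero" in floating point):

```python
import numpy as np

# A rank-2 matrix built from two outer products
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 0.0, 1.0, 5.0])
A = np.outer(a, a) + np.outer(b, b)      # 4 x 4, rank 2

s = np.linalg.svd(A, compute_uv=False)
# Count singular values that are nonzero relative to the largest
num_nonzero = int(np.sum(s > 1e-10 * s[0]))
print(num_nonzero)                        # 2
print(np.linalg.matrix_rank(A))           # 2
```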
3.4 Using SVD for Image Compression
Image compression deals with the problem of reducing the amount of data required to
represent a digital image. Compression is achieved by the removal of three basic data
redundancies:
Coding redundancy, which is present when less than optimal code words are used;
Interpixel redundancy, which results from correlations between the pixels;
Psychovisual redundancy, which is due to data that is ignored by the human
visual system.
To illustrate the SVD image compression process, note that A can be represented by the
outer product expansion:
A = S1U1V1T + S2U2V2T + … + SrUrVrT
When compressing the image, the sum is not carried out to the very last SVs; the
SVs with small enough values are dropped. (Remember that the SVs are ordered in
decreasing magnitude along the diagonal.) The closest matrix of rank k is obtained by
truncating the sum after the first k terms:
Ak = S1U1V1T + S2U2V2T + … + SkUkVkT
The total storage for Ak will be k(m + n + 1).
The integer k can be chosen to be much smaller than n, and the digital image
corresponding to Ak can still be very close to the original image. However, different
choices of k give different corresponding images and storage requirements. For typical
choices of k, the storage required for Ak will be less than 20 percent of that for the
original image.
3.5 Splitting the Image into Smaller Blocks
The SVD process has order n3, which makes it very slow for large pictures. However, if
the picture is broken up into smaller sections and each is handled separately, the overall
processing time is much lower. This is not a trade-off situation; in fact, as will be
seen, it is necessary for good rates of compression.
The key to SVD compression is using low rank approximations to the image. The
less complicated an image, the lower the rank necessary to accurately represent it. For
example, a picture that is a single color block can be perfectly represented by a rank 1
SVD.
Let X be an n×n matrix with every value equal to some constant c ∈ R. Then X = cjjT,
where j = (1, 1, …, 1)T. Now if u = v = (1/√n)j and s = cn, it can be seen that X = cjjT = usvT.
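A quick check of this rank-1 representation of a constant block (NumPy):

```python
import numpy as np

n, c = 6, 3.5
X = np.full((n, n), c)                     # constant n x n block

j = np.ones(n)
u = v = j / np.sqrt(n)                     # unit vectors from j
s = c * n                                  # single singular value
# Rank-1 reconstruction u * s * v^T recovers X exactly
print(np.allclose(X, s * np.outer(u, v)))  # True
print(np.linalg.matrix_rank(X))            # 1
```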
A realistic photo (for example, the figure shown below), however, is generally
complex overall, but may contain sections of simpler images. In the test image
Frogrock there are areas with simple content: for example, the sky has very little detail,
and the side of the hut is fairly monotone, while other sections, such as the person, are
more complicated. It would make sense to break up this photo so that the simple sections
can be represented with low rank approximations, while the complicated sections have
higher rank to include the detail.
Figure 3.1: Frog Rock Test Image
A human can quickly look at a photograph and isolate the sections of high detail
from low detail. However this can be a difficult task for a computer, requiring a lot of
processing. Ideally the picture would be perfectly split into separate regions based on
complexity, but in practice this would be too time consuming and require too much
overhead information to keep track of the regions.
A simple approach is to break the image into smaller blocks of the same size.
Although the blocks won’t perfectly align with the different regions of complexity, if
there are enough blocks then the blocks will generally match regions of complexity. This
is the approach used by JPEG; pictures are divided into blocks of 8× 8 (the JPEG
specification allows block sizes of 16× 16 but this is rarely used).
The second approach used in this project is to have adaptive block sizes. Initially
the picture is broken up into a series of large blocks. Then each block is split into four
quarter size blocks. If less storage is required when the block is split into quarters, then
these new blocks are accepted, otherwise the original block is left. This process can be
repeated on the new blocks, getting smaller and smaller each time.
3.6 Adaptive Block Sizes
For a given rank, the larger the block size, the more efficient the storage. For example, a
100×100 matrix approximated with rank k requires (100 + 100)k = 200k elements. If the
matrix were split into four 50×50 matrices, each also approximated with rank k, then each matrix
would require (50 + 50)k = 100k elements. All four of them together would therefore require 400k
elements, which is twice the amount required for the single block. However, it is to be
hoped that the smaller blocks are simpler and require a lower rank to represent them. An n×n
block with rank k requires 2nk elements. If this block is split into four quarter blocks of
size n/2 × n/2, with the sub-blocks having ranks k1, k2, k3, k4 respectively, the number
of storage elements is
n(k1 + k2 + k3 + k4)
So the decision as to whether a block should be subdivided is based on
n(k1 + k2 + k3 + k4) < 2nk
i.e. k1 + k2 + k3 + k4 < 2k
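The subdivision decision can be sketched as below. Note that this uses exact numerical ranks for illustration, whereas the project would use the truncated ranks chosen for a given quality; should_split and numerical_rank are hypothetical helper names:

```python
import numpy as np

def numerical_rank(block, tol=1e-10):
    """Rank as the count of singular values above a relative tolerance."""
    s = np.linalg.svd(block, compute_uv=False)
    return int(np.sum(s > tol * s[0])) if s[0] > 0 else 0

def should_split(block):
    """Split an n x n block into quarters only if the quarters'
    ranks satisfy k1 + k2 + k3 + k4 < 2k (the storage criterion)."""
    n = block.shape[0]
    h = n // 2
    k = numerical_rank(block)
    quarters = [block[:h, :h], block[:h, h:], block[h:, :h], block[h:, h:]]
    return sum(numerical_rank(q) for q in quarters) < 2 * k

# A flat block: whole rank 1, quarters rank 1 each -> 4 < 2 fails, keep whole
flat = np.ones((8, 8))
print(should_split(flat))     # False

# A block that is zero except for a rank-1 corner: 1 < 2, so splitting wins
sparse = np.zeros((8, 8))
sparse[:4, :4] = np.outer(np.arange(1, 5), np.arange(1, 5))
print(should_split(sparse))   # True
```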
Unfortunately, to calculate the values k1, k2, k3, k4, the block has to be divided and
an SVD applied to each sub-block. As a result, many more SVDs are performed than are
used in the output. This results in a much slower compression time; however, it does not
affect the speed of decompression. The advantage of this adaptive block size technique is
that it can better map the regions of complexity of the picture.
3.7 Mixed Mode
With the dividing-block technique we could start with a block the size of the whole
picture. The block size needs to be chosen so that it divides evenly into quarters at each
step, so preferably the dimensions should be a power of 2. The picture is therefore
'padded' out so that the whole image is a square, with the extra pixels being set to zero.
The zero sections of the picture will not require much storage, and in fact any sub-square
consisting entirely of zeros will have rank 0.
This technique has the disadvantage that the compression can take a very long
time, as the first few SVDs to perform are on very large blocks. When this technique was
used on the test images, none of the first few blocks were accepted. The blocks had to be
reduced to a small enough size before the algorithm determined that it was not worth
subdividing further. A combination of the fixed block and the subdividing block
techniques solved this problem.
First the picture is divided into moderately sized fixed blocks. Then the adaptive
block size technique is applied to each fixed block separately. There is thus an upper limit
on the block sizes, which saves a lot of unnecessary processing time. Similarly, the
algorithm never accepted a block size that was too small, so a lower limit on the block
size could be used. For the test images used, sensible upper and lower bounds were found to
be 64×64 and 8×8. Therefore only four block sizes were allowed: 64×64, 32×32, 16×16,
and 8×8, so the extra processing required for rejected blocks was not too great.
3.8 Picture Quality
Image Compression Measures: In order to measure the performance of the SVD image
compression method, we can compute the compression factor and the quality of the
compressed image. The image compression factor can be computed using the compression
ratio
CR = m*n / (k(m + n + 1))
Measuring Picture Quality: The original image is represented by the matrix A and the
approximating image by the matrix Ậ. It is necessary to have a measure of image quality.
Unfortunately, image quality as perceived by the eye is a very subjective measurement. A
human can quickly look at an image and determine whether the quality is acceptable or not,
but it is difficult to represent this mathematically. The most common
measurement used in image processing is the Peak Signal to Noise Ratio (PSNR),
measured in decibels (dB). Although not a great model of the human eye, it is simple to
calculate:
PSNR = 10 log10((max range)2 / RMSE2)
RMSE = sqrt((1/(m*n)) * sum over i, j of (A(i, j) − Ậ(i, j))2)
Max range is the allowed value range of the pixels. For convenience, pixels will be in the
range [0, 1]; hence max range = 1. RMSE is the Root Mean Square Error.
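Both measures can be sketched directly from the formulas above (NumPy; compression_ratio and psnr are illustrative helper names):

```python
import numpy as np

def compression_ratio(m, n, k):
    """CR = m*n / (k*(m + n + 1)) for a rank-k SVD approximation."""
    return (m * n) / (k * (m + n + 1))

def psnr(A, A_hat, max_range=1.0):
    """Peak signal-to-noise ratio in dB, with pixels in [0, max_range]."""
    rmse = np.sqrt(np.mean((A - A_hat) ** 2))
    return 10.0 * np.log10(max_range ** 2 / rmse ** 2)

print(round(compression_ratio(512, 512, 64), 2))   # 4.0

A = np.zeros((4, 4))
A_hat = np.full((4, 4), 0.1)     # uniform error of 0.1 -> RMSE = 0.1
print(round(psnr(A, A_hat), 1))  # 20.0
```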
Higher Order SVD: Tensor decomposition was studied in psychometric data analysis
during the 1960s, when data sets having more than two dimensions (generally called
"three-way data sets") became widely used. A fundamental achievement was brought by
Tucker (1963), who proposed to decompose a 3-D signal using a 3-D principal
component analysis (PCA) directly, instead of unfolding the data along one dimension and using the
standard SVD. This three-way PCA is also known as the Tucker3 decomposition. In the
1980s, such multidimensional techniques were also applied to chemometrics analysis.
The signal processing community only recently showed interest in the Tucker3
decomposition. The work of Lathauwer et al. (2000) proved that this decomposition is a
multilinear generalization of the SVD to multidimensional data. Studying its properties
with a notation more familiar to the signal processing community, the authors highlighted
its properties concerning rank, oriented energy, and best reduced-rank approximation.
As the decomposition can have more than three dimensions, they called it the higher order SVD
(HOSVD). In the following, we consider this notation and define the HOSVD
decomposition.
Multiple-Level Decomposition: The decomposition process can be iterated, with
successive approximations being decomposed in turn, so that one signal is broken down
into many lower resolution components. This is called the wavelet decomposition tree.
Figure 3.2: Multiple Level Decomposition
Number of Levels: Since the analysis process is iterative, in theory it can be continued
indefinitely. In reality, the decomposition can proceed only until the individual details
consist of a single sample or pixel. In practice, you’ll select a suitable number of levels
based on the nature of the signal, or on a suitable criterion such as entropy.
Recently, the parametric model proposed by Doretto et al. was shown to be a valid
approach for analysis/synthesis of dynamic textures. Each video frame is unfolded into a
column vector and constitutes a point that follows a trajectory as time evolves. The
analysis consists in finding an appropriate space to describe this trajectory and in
identifying the trajectory using methods of dynamical system theory. The first part is
done by using singular value decomposition (SVD) to perform dimension reduction to a
lower dimensional space. The point trajectory is then described using a multivariate auto-
regressive (MAR) process of order 1. Dynamic textures are, thus, modeled using a linear
dynamic system and synthesis is obtained by driving this system with white noise. In this
model, the SVD exploits the temporal correlation between the video frames but the
unfolding operations prevent the possibility of exploiting spatial and chromatic
correlations.
We use this parametric approach, but perform the dynamic texture analysis with
a higher order SVD, which permits the simultaneous decomposition of the temporal, spatial,
and chromatic components of the video sequence. This approach was proposed by the
authors in [10], and here it is described in detail. Our scheme is depicted in Figure 3.3; the
SVD in the analysis is substituted by the HOSVD.
Figure 3.3: Schematic Representation of the Tensor-Based Linear Model Approach for Analysis and Synthesis.
HOSVD is an extension of the SVD to higher order dimensions. It is not an
optimal tensor decomposition in the sense of least squares data fitting, and it does not have the
truncation property of the SVD, where truncating after the first singular values yields
the best reduced-rank approximation of a given matrix. Despite this, the approximation obtained
is not far from the optimal one and can be computed much faster. In fact, the computation
of the HOSVD does not require iterative alternating least squares algorithms, but needs
standard SVD computations only. The major advantage of the HOSVD is the ability to
simultaneously consider the spatial, temporal, and chromatic correlations. This allows
for better data modeling than a standard SVD, since dimension reduction can be
performed not only in the time dimension but also separately for the spatial and chromatic
content.
The separate analysis of each signal component allows the signal
"compression" given by the dimension reduction to be adapted to the characteristics of each dynamic
texture. For comparable visual synthesis quality, we thus obtain a number of model
coefficients that is on average five times smaller than that obtained using the standard SVD.
Creating more compact models is also addressed in the literature, where dynamic texture shape and
visual appearance are jointly modeled, thus enabling the modeling of complex video
sequences containing sharp edges. Both their approach and ours are characterized by a more
computationally expensive analysis, but also a fast synthesis. In our case, synthesis can be
done in real time. This makes our technique very appropriate for applications with
memory constraints, such as mobile devices. We believe that the HOSVD is a very promising
technique for other video analysis and approximation applications. Recently, it has been
successfully used in image-based texture rendering, face super-resolution, and face
analysis and recognition.
In the framework of video compression and transmission, it is useful to find a way
to analyze/synthesize dynamic textures. An efficient compression, in fact, would open the
possibility of having access to realistic video animations on devices that have strong
constraints on the available bandwidth. This is the case for mobile phones, for instance.
The approaches used to model dynamic textures can be classified into non-parametric and
parametric. In the first case, the analysis and synthesis are conducted directly from a given
representation of the image (the pixel values, or a description in a transformed domain
obtained using certain bases, such as wavelets).
In the second case, researchers aim to describe the dynamic texture using
dynamical models. An interesting approach is to consider a linear dynamical system (LDS).
In fact, if some simplifications are considered, a closed-form solution for the estimation of the
model's parameters can be found for such systems. Unfortunately, the synthesized
sequences obtained using this method are not visually appealing compared to the
original sequence. In later work, periodicity (oscillation) has been introduced into the model by
forcing the poles of the dynamic system to lie on the unit circle. This solution permits
obtaining more realistic sequences, but is still based on the same assumptions used for the
construction.
A dynamic texture can be considered as a multidimensional signal. In the case of a
grayscale image video, it can be represented with a 3-D tensor by assigning spatial
information to the first two dimensions and time to the third. In a color video sequence,
chromatic components add another dimension. The input signal then becomes 4-D.
The analysis is done by first decomposing the input signal using the HOSVD and
then by considering the orthogonal matrix derived from the decomposition along the time
dimension. This matrix contains the dynamics of the video sequences, since its columns,
ordered along the time axis, correspond to the weights that control the appearance of the
dynamic texture as time evolves.
3.9 Dynamic Texture Synthesis
Dynamic textures are textures that change over time. Videos of fire, water, smoke,
and so on are typical examples of dynamic textures. Dynamic texture synthesis is the
process of creating such textures artificially. This can be achieved starting either from a
description (model) of a physical phenomenon or from existing video sequences.
The first approach is called physics-based and leads to a description of the
dynamic texture that usually requires few parameters. This approach has been extensively
adopted for the reproduction of synthetic flames or fire, since they are often used in
gaming applications or digital movies. Even though parameter tuning is not always
straightforward, the synthesis results are impressive, but computationally extremely
expensive. This limits the use of this type of model to cases where synthesis can be done
offline, such as during editing in the movie making process.
The second approach is called image-based, as it does not aim at modeling the
physics underlying the natural process, but at replicating existing videos. This can be
done in two ways. In the first, synthesis is done by extracting different clips from the
original video and patching them together to obtain a longer video, ensuring that the
temporal joints are not noticeable and that the dynamic appearance is maintained. This
type of synthesis is called nonparametric or patch-based, since it is not based on a model
and reduces the synthesis to a collage of patches. It has the advantage of ensuring high
visual quality because the synthetic video is composed of the original video frames,
marginally modified by morphing operations only along clip discontinuities. However,
the entire synthetic video has to be created in one step and stored in memory, thus not
allowing for on-the-fly synthesis. In addition, this technique is not flexible, since it
permits modifying the appearance of single frames, but not the texture dynamics.
In the second, a parametric image-based approach is used to build a model of
dynamic textures. The dynamic texture is analyzed and model parameters are computed.
The visual quality of the synthesis is generally lower than that of patch-based techniques,
but the parametric approach is more flexible, more compact in terms of memory
occupation, and usually permits on-the-fly synthesis. Moreover, it can also be used for
other applications, such as segmentation, recognition, and editing.
The term “specificity” indicates if a given approach is specific to a certain type of
dynamic texture, such as fire, water, or smoke, or can be used for all kinds of dynamic
textures. The term “flexibility” indicates if the characteristics of the generated texture can
easily be changed during the synthesis.
The physics-based approaches have high specificity, since a model for fire cannot be
used for the generation of water or smoke, for instance. They also have high flexibility,
since the visual appearance of the synthetic texture can be modified by tuning the model
parameters.
3.10 Tensor
A tensor is a general name for a multilinear mapping over a set of vector spaces; e.g., a
vector is a 1-mode tensor and a matrix is a 2-mode tensor. A tensor T is an N-mode tensor
where the dimensionality of mode i is di. In the same way as a matrix can be pre-multiplied
(mode-1 multiplication) or post-multiplied (mode-2 multiplication) with
another matrix, a matrix can be multiplied with a higher order tensor with respect to
different modes. The mode-n multiplication of a matrix M of size In×dn with a tensor T is denoted
T ×n M and results in a tensor U with the same number of modes. The elements of the
tensor U are computed in the following way:
U(d1, …, dn−1, i, dn+1, …, dN) = sum over dn of T(d1, …, dN) · M(i, dn)
Tensor Decomposition: Principal Component Analysis (PCA) is a version of Singular
Value Decomposition (SVD), which is a 2-mode tool commonly used in signal
processing to reduce the dimensionality of the space and reduce noise. SVD decomposes
a matrix into three other matrices, such that:
A = USVT
where the matrix U spans the column space of A, the matrix V spans the row space of A,
and S is a diagonal matrix of singular values. The column vectors of the
matrix U (and likewise of V) are orthonormal to each other, describing a new orthonormal
coordinate system for the space spanned by the matrix A. N-mode SVD, or Higher Order
SVD (HOSVD), is a generalization of the matrix SVD to tensors. It decomposes a tensor
T by orthogonalizing its modes, yielding a core tensor and matrices spanning the vector
spaces in each mode of the tensor, i.e.:
T = S ×1 U1 ×2 U2 … ×N UN
The tensor S is called the core tensor and is analogous to the diagonal singular
value matrix in the traditional SVD. However, for the HOSVD, the tensor S is not diagonal,
but coordinates the interaction of the matrices to produce the original tensor. The matrices
Ui are again orthonormal, and the column vectors of Ui span the space of the tensor T
flattened with respect to mode i. The row vectors of Ui are the coefficient sets describing
each dimension in mode i. These coefficients can be thought of as the coefficients extracted
from PCA, but in the HOSVD analysis there are different sets of coefficients for each mode.
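A compact HOSVD sketch via mode unfoldings: each mode matrix Un is taken from the left singular vectors of the mode-n unfolding, and the core tensor is obtained by mode-multiplying with the transposes. This is an illustrative NumPy implementation, not the project's code:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: rows are indexed by `mode`."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_multiply(T, M, mode):
    """Mode-n product T x_n M: contracts mode `mode` of T with M."""
    rest = [s for i, s in enumerate(T.shape) if i != mode]
    Tm = M @ unfold(T, mode)
    return np.moveaxis(Tm.reshape([M.shape[0]] + rest), 0, mode)

def hosvd(T):
    """HOSVD: T = S x1 U1 x2 U2 ... xN UN with orthogonal mode matrices Un."""
    Us = [np.linalg.svd(unfold(T, mode), full_matrices=False)[0]
          for mode in range(T.ndim)]
    S = T
    for mode, U in enumerate(Us):
        S = mode_multiply(S, U.T, mode)   # core tensor
    return S, Us

rng = np.random.default_rng(3)
T = rng.standard_normal((4, 5, 3))
S, Us = hosvd(T)

# Reconstruct T from the core tensor and the mode matrices
R = S
for mode, U in enumerate(Us):
    R = mode_multiply(R, U, mode)
print(np.allclose(T, R))   # True
```

Unlike the matrix SVD, the core tensor S here is dense rather than diagonal; truncating columns of a mode matrix (and the corresponding core slices) gives the dimension reduction described below.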
Dimensionality Reduction: After decomposing the original data tensor to yield the core
tensor and mode matrices, we are able to reduce the dimensionality with respect to
whichever mode we want, unlike PCA, where the dimensionality reduction is based only on the
variances. By reducing the number of dimensions in one mode and keeping the others
intact, we have more control over the noise reduction, classification accuracy, and
complexity of the problem. The dimensionality reduction is achieved by deleting the last
m column vectors from the desired mode matrix and deleting the corresponding m
hyper-planes from the core tensor. The error after dimensionality
reduction is bounded by the Frobenius norm of the hyper-planes deleted from the core
tensor.
Chapter 4
INTRODUCTION TO MATLAB
MATLAB® is a high-performance language for technical computing. It integrates
computation, visualization, and programming in an easy-to-use environment where
problems and solutions are expressed in familiar mathematical notation. Typical uses
include
Math and computation
Algorithm development
Data acquisition
Modeling, simulation, and prototyping
Data analysis, exploration, and visualization
Scientific and engineering graphics
Application development, including graphical user interface building.
MATLAB is an interactive system whose basic data element is an array that does not
require dimensioning. This allows you to solve many technical computing problems,
especially those with matrix and vector formulations, in a fraction of the time it would
take to write a program in a scalar non interactive language such as C or FORTRAN.
The name MATLAB stands for matrix laboratory. MATLAB was originally
written to provide easy access to matrix software developed by the LINPACK and
EISPACK projects. Today, MATLAB engines incorporate the LAPACK and BLAS
libraries, embedding the state of the art in software for matrix computation.
MATLAB has evolved over a period of years with input from many users. In
university environments, it is the standard instructional tool for introductory and
advanced courses in mathematics, engineering, and science. In industry, MATLAB is the
tool of choice for high-productivity research, development, and analysis.
MATLAB features a family of add-on application-specific solutions called
toolboxes. Very important to most users of MATLAB, toolboxes allow you to learn and
apply specialized technology. Toolboxes are comprehensive collections of MATLAB
functions (M-files) that extend the MATLAB environment to solve particular classes of
problems. Areas in which toolboxes are available include signal processing, control
systems, neural networks, fuzzy logic, wavelets, simulation, and many others.
4.1 The MATLAB System
The MATLAB system consists of five main parts
Development Environment
Matlab Mathematical Function
Matlab Language
Graphics
Matlab Application Program Interface
Development Environment: This is the set of tools and facilities that help you use
MATLAB functions and files. Many of these tools are graphical user interfaces. It
includes the MATLAB desktop and Command Window, a command history, an editor
and debugger, and browsers for viewing help, the workspace, files, and the search path.
The MATLAB Mathematical Function: This is a vast collection of computational
algorithms ranging from elementary functions like sum, sine, cosine, and complex
arithmetic, to more sophisticated functions like matrix inverse, matrix eigenvalues,
Bessel functions, and fast Fourier transforms.
The MATLAB Language: This is a high-level matrix/array language with control flow
statements, functions, data structures, input/output, and object-oriented programming
features. It allows both "programming in the small" to rapidly create quick and dirty
throw-away programs, and "programming in the large" to create complete large and
complex application programs.
Graphics: MATLAB has extensive facilities for displaying vectors and matrices as
graphs, as well as annotating and printing these graphs. It includes high-level functions
for two-dimensional and three-dimensional data visualization, image processing,
animation, and presentation graphics. It also includes low-level functions that allow you
to fully customize the appearance of graphics as well as to build complete graphical user
interfaces on your MATLAB applications.
The MATLAB Application Program Interface (API): This is a library that allows you
to write C and FORTRAN programs that interact with MATLAB. It includes facilities for
calling routines from MATLAB (dynamic linking), calling MATLAB as a computational
engine, and for reading and writing MAT-files.
4.2 MATLAB Working Environment
Getting Help
Using the Matlab Editor to Create M-Files
MATLAB Desktop
MATLAB Desktop: The MATLAB Desktop is the main MATLAB application window. The
desktop contains five sub-windows: the command window, the workspace browser, the
current directory window, the command history window, and one or more figure
windows, which are shown only when the user displays a graphic.
The command window is where the user types MATLAB commands and
expressions at the prompt (>>) and where the output of those commands is displayed.
MATLAB defines the workspace as the set of variables that the user creates in a work
session. The workspace browser shows these variables and some information about them.
Double-clicking on a variable in the workspace browser launches the Array Editor, which
can be used to obtain information about, and in some instances edit, certain properties of the
variable.
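The workspace described above can also be inspected from the command window; a minimal sketch:

```matlab
a = 5;                % create variables in the workspace
v = [1 2 3];
who                   % list the variables in the workspace
whos v                % show the size, bytes, and class of v
clear a               % remove a from the workspace
```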
The Current Directory tab, which appears above the Workspace tab, shows the
contents of the current directory, whose path is shown in the current directory window.
For example, in the Windows operating system the path might be
C:\MATLAB\Work, indicating that the directory "work" is a subdirectory of the main directory
"MATLAB", which is installed in drive C. Clicking on the arrow in the current directory
window shows a list of recently used paths. Clicking on the button to the right of the
window allows the user to change the current directory.
MATLAB uses a search path to find M-files and other MATLAB-related files,
which are organized in directories in the computer file system. Any file run in MATLAB
must reside in the current directory or in a directory that is on the search path. By default, the
files supplied with MATLAB and MathWorks toolboxes are included in the search path.
The easiest way to see which directories are on the search path, or to add or modify the
search path, is to select Set Path from the File menu on the desktop, and then use the Set
Path dialog box. It is good practice to add any commonly used directories to the search
path to avoid repeatedly having to change the current directory.
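The search path can also be inspected and modified from the command window; a minimal sketch (the directory name is illustrative):

```matlab
path                        % display the current search path
addpath('C:\MATLAB\Work')   % add a commonly used directory to the path
which pixelup               % show which pixelup.m the path resolves to
rmpath('C:\MATLAB\Work')    % remove the directory again
```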
The Command History Window contains a record of the commands a user has
entered in the command window, including both current and previous MATLAB
sessions. Previously entered MATLAB commands can be selected and re-executed from
the command history window by right-clicking on a command or sequence of commands.
This action launches a menu from which various options can be selected in addition to
executing the commands. This is a useful feature when experimenting with various
commands in a work session.
Using the MATLAB Editor to Create M-Files: The MATLAB editor is both a text editor
specialized for creating M-files and a graphical MATLAB debugger. The editor can
appear in a window by itself, or it can be a sub-window in the desktop. M-files are
denoted by the extension .m, as in pixelup.m. The MATLAB editor window has
numerous pull-down menus for tasks such as saving, viewing, and debugging files.
Because it performs some simple checks and also uses color to differentiate between
various elements of code, this text editor is recommended as the tool of choice for writing
and editing M-functions. Typing edit filename at the prompt opens the M-file
filename.m in an editor window, ready for editing. As noted earlier, the file must be in the
current directory or in a directory on the search path.
Getting Help: The principal way to get help online is to use the MATLAB Help Browser,
opened as a separate window either by clicking on the question mark symbol (?) on the
desktop toolbar, or by typing helpbrowser at the prompt in the command window. The
Help Browser is a web browser integrated into the MATLAB desktop that displays
Hypertext Markup Language (HTML) documents. The Help Browser consists of two
panes: the help navigator pane, used to find information, and the display pane, used to
view the information. Self-explanatory tabs in the navigator pane are used to perform
searches.
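Help is also available directly from the command window; for example:

```matlab
help imread      % display the help text for a function
doc imread       % open the function's reference page in the Help Browser
lookfor movie    % search all help entries for a keyword
```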
4.3 Commands
Uigetfile: Open a standard dialog box for retrieving files.
Description: uigetfile displays a modal dialog box that lists files in the current
directory and enables the user to select or type the name of a file to be opened. If the
filename is valid and the file exists, uigetfile returns the filename when the user clicks
Open. Otherwise, uigetfile displays an appropriate error message, after which control
returns to the dialog box. The user can then enter another filename or click Cancel. If the
user clicks Cancel or closes the dialog window, uigetfile returns 0.
Aviinfo: Information about an Audio/Video Interleaved (AVI) file.
Description: fileinfo = aviinfo(filename) returns a structure whose fields contain
information about the AVI file specified in the string filename. If filename does not
include an extension, then .avi is used. The file must be in the current working directory
or in a directory on the MATLAB path.
Aviread: Read an Audio/Video Interleaved (AVI) file.
Description: mov = aviread(filename) reads the AVI movie filename into the MATLAB
movie structure mov. If filename does not include an extension, then .avi is used. Use the
movie function to view the movie mov.
Frame2im: Convert a movie frame to an indexed image.
Description: [X, Map] = frame2im(F) converts the single movie frame F into the indexed
image X and associated colormap Map. The functions getframe and im2frame create a
movie frame. If the frame contains true-color data, then Map is empty.
Im2frame: Convert an image to a movie frame.
Description: f = im2frame(X, map) converts the indexed image X and associated
colormap map into a movie frame f. If X is a truecolor (m-by-n-by-3) image, then map is
optional and has no effect.
Imwrite: Write an image to a graphics file.
Description: imwrite(X, map, filename, fmt) writes the indexed image in X and its
associated colormap map to filename in the format specified by fmt. If X is of class uint8
or uint16, imwrite writes the actual values in the array to the file. If X is of class double,
imwrite offsets the values in the array before writing, using uint8(X-1). Map must be a
valid MATLAB colormap. Note that most image file formats do not support colormaps
with more than 256 entries. When writing multiframe GIF images, X should be a
4-dimensional M-by-N-by-1-by-P array, where P is the number of frames to write.
Imread: Read an image from a graphics file.
Description: A = imread(filename, fmt) reads a grayscale or color image from the file
specified by the string filename. If the file is not in the current directory, or in a directory
on the MATLAB path, specify the full pathname.
Movie: Play recorded movie frames.
Description: movie plays the movie defined by a matrix whose columns are movie
frames (usually produced by getframe). movie(M) plays the movie in matrix M once,
using the current axes as the default target. If you want to play the movie in the figure
instead of the axes, specify the figure handle (or gcf) as the first argument:
movie(figure_handle, ...). M must be an array of movie frames (usually from getframe).
Chapter 5
RESULTS AND CONCLUSION
By Higher Order SVD analysis for dynamic texture synthesis, videos such as Flame,
Pond, and Grass are given as input, and the obtained output video is compressed by a
factor of 3 relative to the input video.
Figure 5.1: Output Frame for Input Flame Video
Description: This is one of the output frames obtained from the given input video after
compression. The following parameters are obtained from the compressed video:
input_file_size = 20505600
output_file_size = 6835200
compression = 3 (the output file is one third the size of the input file)
compression_ratio = 0.3333 (i.e., 33.3333%)
PSNR = 37.2036
Figure 5.2: Output Frame for Input Pond Video
Description: This is one of the output frames obtained from the given input video after
compression. The following parameters are obtained from the compressed video:
input_file_size = 39744000
output_file_size = 13248000
compression = 3 (the output file is one third the size of the input file)
compression_ratio = 0.3333 (i.e., 33.3333%)
PSNR = 40.8908
Figure 5.3: Output Frame for Input Grass Video
Description: Figure 5.3 is one of the output frames obtained from the given input video
after compression. The following parameters are obtained from the compressed video:
input_file_size = 9676800
output_file_size = 3225600
compression = 3 (the output file is one third the size of the input file)
compression_ratio = 0.3333 (i.e., 33.3333%)
PSNR = 45.4285
Conclusion
Here it is proposed to decompose the multidimensional signal that represents a dynamic
texture by using a tensor decomposition technique. As opposed to techniques that unfold
the multidimensional signal onto a 2-D matrix, our method analyzes the data in its
original dimensions. This decomposition, only recently applied to image and video
processing, makes it possible to better exploit the spatial, temporal, and chromatic
correlation between the pixels of the video sequence, leading to an important decrease in
model size. Compared to algorithms where the unfolding operations are performed in
2-D, or where the spatial information is exploited by carrying out the analysis in the
Fourier domain, this method results in models with, on average, five times fewer
coefficients while ensuring the same visual quality. Despite being a suboptimal solution
for the tensor decomposition, the HOSVD ensures close-to-optimal energy compaction
and approximation error. The suboptimality derives from the fact that the HOSVD is
computed directly from the SVD, without using the expensive iterative algorithms
required for the optimal solution. This is an advantage, since the analysis can be done
faster and with less computational power. The small number of model parameters makes
it possible to perform synthesis in real time. Moreover, the small memory occupancy
favours the use of the HOSVD-based model in architectures with memory and
computational power constraints, such as PDAs or mobile
phones.

APPENDIX
Source Code

clear all; clc;
[filename, pathname] = uigetfile('*.avi');
str2 = '.bmp';
file = aviinfo(filename);          % get information about the video file
frm_cnt = file.NumFrames;          % number of frames in the video file
for i = 1:frm_cnt
    frm(i) = aviread(filename, i);           % read the video file
    frm_name = frame2im(frm(i));
    filename1 = strcat(num2str(i), str2);
    imwrite(frm_name, filename1);            % write image file
end
str3 = '.png';
for j = 1:frm_cnt
    filename_1 = strcat(num2str(j), str2);   % bug fix: use the loop index j, not i
    D = imread(filename_1);
    G = double(rgb2gray(D));     % bug fix: the original passed the filename string to svd;
                                 % svd requires a 2-D matrix, so the grayscale image is used here (assumption)
    [u1, s1, v1] = svd(G);
    im = uint8(u1 * s1 * transpose(v1));     % cast back to uint8 for writing
    file_2 = strcat(num2str(j), str3);
    imwrite(im, file_2);
end
for k = 1:frm_cnt
    file_2 = strcat(num2str(k), '.bmp');
    v = imread(file_2);
    [Y, map] = rgb2ind(v, 255);
    F(k) = im2frame(flipud(Y), map);         % bug fix: index the frame struct array with k
    save F F
end
mov = aviread(filename);
[h, w, p] = size(mov(1).cdata);
hf = figure('Name', 'INPUT VIDEO');
set(hf, 'position', [150 150 w h]);
movie(gcf, mov);
[h, w, p] = size(F(1).cdata);
hf = figure('Name', 'HOSVD COMPRESSED VIDEO');
set(hf, 'position', [150 150 w h]);
movie(gcf, F);
input_file_size = frm_cnt * size(frm(1).cdata,1) * size(frm(1).cdata,2) * size(frm(1).cdata,3)
output_file_size = frm_cnt * size(F(1).cdata,1) * size(F(1).cdata,2) * size(F(1).cdata,3)
compression = (input_file_size / output_file_size)
fprintf('output file size is %d times compression of input file size', compression);
compression_ratio = output_file_size / input_file_size
compression_ratio = compression_ratio * 100
mse = (sum(mov(1).cdata(:,:,1) - F(1).cdata) .* sum(mov(1).cdata(:,:,1) - F(1).cdata)) / input_file_size;
psnr = 20 * log10(255 / sqrt(max(mse)))