
Transcript of segment.pptx

Slide 1

Image Segmentation using Nearest Neighbor Classifier in Matlab

Dr. Rashi Agarwal, IT Dept., UIET, CSJM University, Kanpur

Discrete Cosine Transform

The real part of the Fourier Transform.

1D Forward DCT: given a list of n intensity values I(x), where x = 0, …, n-1, compute the n DCT coefficients:

F(u) = C(u) · sqrt(2/n) · Σ_{x=0..n-1} I(x) · cos[ (2x+1)uπ / 2n ],  where C(0) = 1/√2 and C(u) = 1 for u > 0
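The 1D forward DCT above can be sketched directly in Python (assuming the common orthonormal DCT-II scaling, with C(0) = 1/√2 and C(u) = 1 otherwise):

```python
import math

def dct_1d(I):
    """1D forward DCT-II of n intensity values I(x), x = 0..n-1."""
    n = len(I)
    F = []
    for u in range(n):
        # C(u) normalization: the 1/sqrt(2) factor folds into the u = 0 term
        c = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        F.append(c * sum(I[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                         for x in range(n)))
    return F

# A constant signal puts all its energy in F(0), the DC term;
# the remaining coefficients come out (numerically) zero.
coeffs = dct_1d([100] * 8)
```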

Visualization of 1D DCT Basis Functions

[1D basis functions labeled F(0) through F(7)]

Extend DCT from 1D to 2D

Perform a 1D DCT on each row of the block, then again on each column of the resulting coefficients.
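This row-then-column procedure can be sketched by building the 2D transform out of the 1D one (orthonormal DCT-II scaling assumed, as before):

```python
import math

def dct_1d(v):
    """1D DCT-II with orthonormal scaling."""
    n = len(v)
    return [(math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n))
            * sum(v[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                  for x in range(n))
            for u in range(n)]

def dct_2d(block):
    """2D DCT by separability: 1D DCT on every row, then on every column."""
    rows = [dct_1d(r) for r in block]                 # row pass
    n, m = len(rows), len(rows[0])
    cols = [dct_1d([rows[r][c] for r in range(n)])    # column pass
            for c in range(m)]
    # transpose so F[u][v] = vertical frequency u, horizontal frequency v
    return [[cols[v][u] for v in range(m)] for u in range(n)]

# A flat 8x8 block has only the DC coefficient F(0,0) non-zero.
F = dct_2d([[128] * 8 for _ in range(8)])
```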

Equations for 2D DCT

Visualization of 2D DCT Basis Functions

F(0,0), which contains the lowest frequency in both directions, is called the DC coefficient; it determines the fundamental color of the block. F(0,1) … F(7,7) are called AC coefficients; their frequency is non-zero in one or both directions.

Huffman Coding

Example: for symbol probabilities 0.4, 0.3, 0.1, 0.1, 0.06, 0.04 and code lengths 1, 2, 3, 4, 5, 5:

Average length = 1×0.4 + 2×0.3 + 3×0.1 + 4×0.1 + 5×0.06 + 5×0.04 = 2.2 bits/symbol

MISSISSIPPIRIVER (16 letters)
M:1, I:5, S:4, P:2, R:2, V:1, E:1

Alphabet   Probability       Code
I          5/16 = 0.3125     01
S          4/16 = 0.25       00
P          2/16 = 0.125      100
R          2/16 = 0.125      101
M          1/16 = 0.0625     111
V          1/16 = 0.0625     1101
E          1/16 = 0.0625     1100

Source reduction: at each step the two smallest probabilities merge (0.0625 + 0.0625 = 0.125, then 0.0625 + 0.125 ≈ 0.19, then 0.125 + 0.125 = 0.25, then 0.19 + 0.25 = 0.44, then 0.25 + 0.31 = 0.56), until only two nodes remain and are coded 0 and 1; the codes are then read back out through the merges.

JPEG using DCT

Uncompressed Image
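The same code construction can be sketched with a heap. The exact 0/1 bit patterns depend on how ties between equal probabilities are broken, so this generic builder produces a different (but equally valid) labeling than the slide's table; the code lengths, and hence the average of 2.5625 bits per letter, come out the same:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix-free Huffman code from symbol frequencies."""
    freq = Counter(text)
    # heap entries: (frequency, unique tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)     # smallest probability
        f2, _, high = heapq.heappop(heap)    # next smallest
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

text = "MISSISSIPPIRIVER"
codes = huffman_codes(text)
avg = sum(len(codes[s]) for s in text) / len(text)   # bits per letter
```

For comparison, a fixed-length code for 7 symbols would need 3 bits per letter.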

Let us first consider how much memory an uncompressed bitmap raster image uses, taking a desktop 1280 x 1024 pixel image as an example. Each pixel requires 3 memory locations to store the RGB colours.

So 3 blocks (arrays/grids) of 1280 x 1024 memory locations are used: 1280 x 1024 x 3 = 3,932,160 memory locations.
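As a quick check of the arithmetic:

```python
# 24-bit RGB: three memory locations (bytes) per pixel
width, height, channels = 1280, 1024, 3
raw_size = width * height * channels   # uncompressed size in memory locations
```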

With larger images the huge file sizes create a problem for data storage and transmission over networks. To overcome these problems, data compression is used to reduce the file size.

Lossless and Lossy Compression

The first data compression methods devised were lossless: after compression and decompression you get back the original data. These methods relied on the data being inefficiently coded in the first place to achieve good compression ratios. Graphic image data with lots of fine detail, when compressed using a lossless method, entails lots of processing for little compression effect. New ideas were needed to overcome this problem, which led to a detailed examination of the information stored in an image. An image is shades of light and dark of different hues; the viewer is the human eye and brain. The new ideas centered around exploiting the strengths and weaknesses of the human visual system.

JPEG

In 1987, two groups were combined to form a joint committee, the Joint Photographic Experts Group (JPEG), that would research and produce a single standard. JPEG, unlike other compression methods, is not a single algorithm but may be thought of as a toolkit of image compression methods to suit the user's needs. JPEG uses a lossy compression method that throws useless data away during encoding; this is why lossy schemes manage to obtain superior compression ratios over most lossless schemes. JPEG is designed to discard information the human eye cannot easily see: the eye barely notices slight changes in colour but will pick out slight changes in brightness or contrast.

STEPS

Convert RGB to YCbCr format (luminance & chroma):

Y  =  0.299 R + 0.587 G + 0.114 B
Cb = -0.1687 R - 0.3313 G + 0.5 B + 128
Cr =  0.5 R - 0.4187 G - 0.0813 B + 128
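A minimal sketch of this conversion, using exactly the coefficients above (the output is left unclamped and unrounded for clarity):

```python
def rgb_to_ycbcr(r, g, b):
    """RGB -> YCbCr using the slide's conversion coefficients."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.4187 * g - 0.0813 * b + 128
    return y, cb, cr

# A neutral gray carries no chroma: Cb and Cr sit at the 128 midpoint.
y, cb, cr = rgb_to_ycbcr(128, 128, 128)
```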

Jpeg Chroma Sampling

The luminance channel is retained at full resolution. Both chrominance channels are typically downsampled 2:1 horizontally and either 1:1 or 2:1 vertically.
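A sketch of 2:1 horizontal chroma subsampling, assuming an even image width and averaging each horizontal pair (simple decimation, keeping every other sample, is another common choice):

```python
def downsample_h(channel):
    """2:1 horizontal subsampling: replace each horizontal pair by its average."""
    return [[(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
            for row in channel]

cb = [[100, 104, 200, 204],
      [100, 104, 200, 204]]
small = downsample_h(cb)   # 4 columns shrink to 2; Y stays at full resolution
```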

The luminance and chrominance components of the image are divided up into an array of 8x8 pixel blocks. Padding is provided if required to ensure blocks on the right and bottom edges of the image are full. These 8x8 pixel blocks are fed into a process that performs a forward Discrete Cosine Transform (DCT). The output of this process is a set of 64 values.

Jpeg Quantization

The next step is the quantization process, which is the main source of the lossy compression. The values in the quantization table are chosen to preserve low-frequency information and discard high-frequency (noise-like) detail, as humans are less sensitive to the loss of information in this area.

Each DCT term is divided by the corresponding position in the quantization table and then rounded to the nearest integer. In each table the low-frequency terms are in the top left-hand corner and the high-frequency terms are in the bottom right-hand corner. This is the point at which we control the quality and amount of compression of the JPEG: the lower the quality setting, the greater the divisor, increasing the chance of a zero result.
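The divide-and-round step can be sketched with a toy 2x2 block (the divisors here are illustrative only, not a standard JPEG table):

```python
def quantize(dct_block, qtable):
    """Divide each DCT term by its quantizer and round to the nearest integer."""
    return [[round(f / q) for f, q in zip(frow, qrow)]
            for frow, qrow in zip(dct_block, qtable)]

F = [[236, -23],
     [12,  -4]]               # toy 2x2 "DCT block"
Q = [[16, 24],
     [40, 51]]                # divisors grow toward the high-frequency corner
quantized = quantize(F, Q)    # small high-frequency terms collapse to zero
```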

Camera manufacturers independently choose an arbitrary "image quality" name (or level) to assign to the 64-value quantization matrix they devise, so the names cannot be compared between makes, or even between models from the same manufacturer. The quantization tables used are stored as part of the JPEG file.

The first output is a steady (DC) level of all 64 image pixels averaged together; the other 63 outputs represent the different frequency (AC) levels found in the 8x8 pixel block. If the quantization stage, via the user's compression-level (quality factor) setting, discarded all 63 AC outputs, the resultant image would show 8x8 pixel areas of a single tone. The image would get maximum compression, typically something in excess of 120:1, but you may have dumped a lot of image information to get it.

Jpeg Huffman

After quantization, the 63 AC DCT terms are collected using the zigzag scan method. This collection order takes advantage of the fact that high-frequency terms tend to zero after quantization, improving the chance of getting longer runs of zeros, which is ideal for good run-length compression. The 63 AC components from the DCT process are compressed using lossless run-length encoding. The DC component, however, is treated differently: it is assumed that neighbouring 8x8 blocks will have a similar average value, so instead of storing a large number it stores a small number giving the difference in level from the previous block, thereby requiring less code to store the information.
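A sketch of the zigzag collection order and the run-length pairing described above (the plain (zero-run, value) pairs here are a simplification of JPEG's actual category-based entropy coding):

```python
def zigzag_order(n=8):
    """(row, col) sequence that walks an n x n block along anti-diagonals."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # alternate direction on each anti-diagonal to get the zigzag
        order.extend(diag if s % 2 else diag[::-1])
    return order

def run_length(ac):
    """(zero-run, value) pairs for the non-zero AC terms."""
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs  # trailing zeros are left implicit (end-of-block)

order = zigzag_order()   # 64 positions, starting (0,0), (0,1), (1,0), ...
pairs = run_length([15, 0, 0, -2, 0, 0, 0, 1])

# The DC term is stored as a difference from the previous block's DC value:
dc_prev, dc_this = 50, 48
dc_stored = dc_this - dc_prev   # a small number, cheaper to code
```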