7/27/2019 DATA COMPRESSION AND HUFFMAN ALGORITHM.ppt
1/18
DATA COMPRESSION AND HUFFMAN ALGORITHM
Technical Seminar Paper Submitted by
Presented by
Vineet Agarwala
NATIONAL INSTITUTE OF SCIENCE & TECHNOLOGY
IT200118155
Technical Seminar Under the guidance of
Anisur Rahman
DATA COMPRESSION
Virtually all forms of data (text, numerical, image, video) contain redundant elements.
Data can be compressed by eliminating the redundant elements.
A code is substituted for each eliminated redundant element, where the code is shorter than the eliminated element.
When compressed data is retrieved from storage or received over a communications link, it is expanded back to its original form, based on the code.
Compression is used:
to save storage space
to reduce communications transmission requirements
The art or science of compactly representing information
Digital realm: using fewer bits to represent information
Data = information + redundancy
REDUNDANCY
Most types of computer files are fairly redundant -- they have the same information listed over and over again. File-compression programs simply get rid of the redundancy.
"Ask not what your country can do for you -- ask what you can do for your country."
Ignoring the difference between capital and lower-case letters, roughly half of the phrase is redundant. Nine words -- ask, not, what, your, country, can, do, for, you -- give us almost everything we need for the entire quote.
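The word count behind this claim can be checked with a short sketch. The helper name `word_stats` is made up for this illustration; it lower-cases the quote and counts total versus distinct words.

```python
import re

QUOTE = ("Ask not what your country can do for you -- "
         "ask what you can do for your country.")

def word_stats(text):
    """Return (total word count, sorted distinct words), ignoring case."""
    words = re.findall(r"[a-z]+", text.lower())
    return len(words), sorted(set(words))

total, distinct = word_stats(QUOTE)
print(total, len(distinct))  # 17 words in total, 9 of them distinct
print(distinct)
```

Roughly half of the 17 words are repeats of the nine distinct ones, which is the redundancy a compressor can exploit.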
Compression Techniques
Lossless
Data can be completely recovered after decompression
Recovered data is identical to original
Exploits redundancy in data
Lossy
Data cannot be completely recovered after decompression
Some information is lost forever
Gives more compression than lossless
Discards insignificant data components
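The difference between the two families can be sketched in a few lines. This is only an illustration under assumptions: Python's standard `zlib` module stands in for "a lossless compressor", and the hypothetical `quantize` helper stands in for "discarding insignificant data components".

```python
import zlib

def quantize(samples, step):
    """Lossy step (illustrative): round each sample down to a multiple of step."""
    return [(s // step) * step for s in samples]

# Lossless: the round trip returns exactly the original bytes.
text = b"ask what you can do for your country " * 20
packed = zlib.compress(text)
restored = zlib.decompress(packed)
print(len(text), len(packed), restored == text)

# Lossy: the fine detail of each sample is gone and cannot be recovered.
samples = [12, 13, 200, 203, 77]
coarse = quantize(samples, 16)
print(coarse)  # [0, 0, 192, 192, 64]
```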
IMAGE COMPRESSION
Image compression can be lossy or lossless
Methods for lossless image compression are:
Run-length encoding
Entropy coding
Adaptive dictionary algorithms such as LZW
Methods for lossy compression are:
Reducing the color space to the most common colors in the image. The selected colors are specified in the color palette in the header of the compressed image. Each pixel just references the index of a color in the color palette. This method can be combined with dithering to blur the color borders.
Transform coding. This is the most commonly used method. A Fourier-related transform such as the DCT or the wavelet transform is applied, followed by quantization and entropy coding.
Fractal compression.
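The color-space reduction described above can be sketched as follows. All names here (`build_palette`, `nearest_index`, `to_indexed`) are hypothetical, pixels are assumed to be RGB tuples, and dithering is left out.

```python
from collections import Counter

def build_palette(pixels, size):
    """Pick the `size` most common colors in the image as the palette."""
    return [color for color, _ in Counter(pixels).most_common(size)]

def nearest_index(color, palette):
    """Index of the palette entry closest to `color` (squared RGB distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(color, palette[i])))

def to_indexed(pixels, size):
    """Return (palette, per-pixel palette indices)."""
    palette = build_palette(pixels, size)
    return palette, [nearest_index(p, palette) for p in pixels]

pixels = [(255, 0, 0)] * 5 + [(0, 0, 255)] * 4 + [(250, 5, 5)]
palette, indices = to_indexed(pixels, 2)
print(palette)   # the two most common colors
print(indices)   # the near-red pixel is mapped onto the red palette entry
```

The loss comes from the last pixel: its exact color is replaced by the nearest palette color and cannot be restored.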
JPEG (TRANSFORM COMPRESSION)
JPEG is named after its origin, the Joint Photographic Experts Group
This involves reducing the number of bits per sample or entirely discarding some of the samples
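A minimal one-dimensional sketch of the idea, not the JPEG codec itself: the samples are transformed with an orthonormal DCT, then either some coefficients are discarded entirely or each coefficient is stored more coarsely. The function names and the sample values are assumptions chosen for illustration.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out

def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

def keep_first(coeffs, keep):
    """Discard (zero out) every coefficient after the first `keep`."""
    return [c if k < keep else 0.0 for k, c in enumerate(coeffs)]

def quantize(coeffs, step):
    """Fewer bits per coefficient: snap each one to a multiple of `step`."""
    return [round(c / step) * step for c in coeffs]

samples = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct(samples)
approx = idct(keep_first(coeffs, 3))
print([round(v, 1) for v in approx])
```

Keeping more coefficients can only lower the squared reconstruction error, because the transform is orthonormal.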
MULTIMEDIA COMPRESSION
Multimedia compression is a general term referring to the compression of any type of multimedia, most notably graphics, audio, and video
MPEG (Moving Picture Experts Group): the future of this technology is to encode the compression and decompression algorithms directly into integrated circuits.
The approach used by MPEG can be divided into two types of compression: within-the-frame and between-frame
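The between-frame idea can be illustrated with a toy sketch: store the first frame, then store only what changed in the next one. This is an assumption-laden simplification (frames are flat lists of pixel values, and the helper names are invented); real MPEG coders are far more elaborate.

```python
def frame_delta(previous, current):
    """Per-pixel difference between two frames of equal size."""
    return [c - p for p, c in zip(previous, current)]

def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame and the delta."""
    return [p + d for p, d in zip(previous, delta)]

frame1 = [10, 10, 10, 80, 80, 10, 10, 10]
frame2 = [10, 10, 10, 10, 80, 80, 10, 10]   # the bright object moved one pixel
delta = frame_delta(frame1, frame2)
print(delta)  # [0, 0, 0, -70, 0, 70, 0, 0]
```

Most of the delta is zero, so it compresses much better than the second frame would on its own.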
DATA COMPRESSION ALGORITHMS
LOSSLESS COMPRESSION
Run Length Encoding
Huffman Coding
Delta
LZW
LOSSY COMPRESSION
CS & Q
JPEG
MPEG
RUN-LENGTH ENCODING
Example of run-length encoding. Each run of zeros is replaced by two characters in the compressed file: a zero to indicate that compression is occurring, followed by the number of zeros in the run.
Data files frequently contain the same character repeated many times in a row.
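The zero-run scheme described above can be sketched as follows. The function names and the sample data are made up, and the cap of 255 zeros per run is an assumption (so that the count would fit in one byte); longer runs are simply split.

```python
def rle_zeros_encode(values):
    """Replace each run of zeros with the pair (0, run length)."""
    out = []
    i = 0
    while i < len(values):
        if values[i] == 0:
            run = 0
            # Assumed limit: at most 255 zeros per (0, count) pair.
            while i < len(values) and values[i] == 0 and run < 255:
                run += 1
                i += 1
            out.extend([0, run])
        else:
            out.append(values[i])
            i += 1
    return out

def rle_zeros_decode(codes):
    """Expand every (0, count) pair back into `count` zeros."""
    out = []
    i = 0
    while i < len(codes):
        if codes[i] == 0:
            out.extend([0] * codes[i + 1])
            i += 2
        else:
            out.append(codes[i])
            i += 1
    return out

data = [17, 8, 54, 0, 0, 0, 97, 5, 16, 0, 45, 23, 0, 0, 0, 0, 0, 3, 67]
packed = rle_zeros_encode(data)
print(packed)  # [17, 8, 54, 0, 3, 97, 5, 16, 0, 1, 45, 23, 0, 5, 3, 67]
```

Note that the single zero becomes two values, so isolated zeros make the output slightly longer.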
HUFFMAN ENCODING
This method is named after D.A. Huffman, who developed the procedure in the 1950s.
More than 96% of this file consists of only 31
characters out of 127
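A statistic of this kind can be measured with a short sketch. The helper `top_n_coverage` is hypothetical; it reports what fraction of the data is made up of its `n` most frequent symbols.

```python
from collections import Counter

def top_n_coverage(data, n):
    """Fraction of `data` made up of its n most frequent symbols."""
    if not data:
        return 0.0
    counts = Counter(data)
    return sum(count for _, count in counts.most_common(n)) / len(data)

sample = b"aaaabbc"
print(top_n_coverage(sample, 2))  # 6 of the 7 bytes are 'a' or 'b'
```

When a few symbols dominate like this, giving them short codes and the rare symbols long codes reduces the total size, which is the idea behind Huffman encoding.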
HUFFMAN ENCODING EXAMPLE
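Character frequencies
A: 20% (.20)
B: 9% (.09)
C: 15% (.15)
D: 11% (.11)
E: 40% (.40)
F: 5% (.05)
[Figure: first merge step. The two least frequent characters, B (.09) and F (.05), are joined under a new node BF (.14), with the branches labeled 0 and 1. The remaining nodes are D (.11), C (.15), A (.20) and E (.4).]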
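HUFFMAN ENCODING EXAMPLE (CONTD.)
Codes
A: 010
B: 0000
C: 011
D: 001
E: 1
F: 0001
[Figure: completed Huffman tree, with 0 and 1 labels on the branches]
ABCDEF (1.0): 0 -> ABCDF (.6), 1 -> E (.4)
ABCDF (.6): 0 -> BFD (.25), 1 -> AC (.35)
BFD (.25): 0 -> BF (.14), 1 -> D (.11)
BF (.14): 0 -> B (.09), 1 -> F (.05)
AC (.35): 0 -> A (.20), 1 -> C (.15)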
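The tree construction can be sketched with a priority queue: repeatedly merge the two lightest nodes until one root remains. This is a minimal sketch assuming a non-empty table of single-character symbols. The 0/1 labels it assigns may differ from the slide's, since either labeling of a branch pair is valid, but the code lengths come out the same. `SLIDE_CODES` copies the code table from the slide to show encoding and decoding.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a prefix code from {symbol: weight}; symbols must be strings."""
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    heap = [(w, next(tie), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # lightest node
        w2, _, right = heapq.heappop(heap)   # second lightest node
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a symbol
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Read bits until they match a code; valid because no code prefixes another."""
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)

PERCENT = {"A": 20, "B": 9, "C": 15, "D": 11, "E": 40, "F": 5}
SLIDE_CODES = {"A": "010", "B": "0000", "C": "011",
               "D": "001", "E": "1", "F": "0001"}

lengths = {s: len(c) for s, c in huffman_codes(PERCENT).items()}
print(lengths)
print(sum(PERCENT[s] * n for s, n in lengths.items()) / 100)  # 2.34 bits per character
print(encode("FACE", SLIDE_CODES))  # 00010100111
```

At an average of 2.34 bits per character, the code beats the 3 bits a fixed-length code would need for six characters.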
Run Length Encoding
CTAAAAAGGGTCGTTTTTTGCCCGGGGGCCTCCCCCCC (original: 38 symbols)
CT5A3GTCG6TG3C5GCCT7C (run length encoded: 21 symbols)
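The encoding on this slide follows a simple rule that can be sketched as below: runs of three or more identical symbols become a count followed by the symbol, and shorter runs are copied unchanged. The threshold of three is inferred from the example (the run "CC" is left alone), and the sketch assumes the input contains no digits.

```python
from itertools import groupby

def rle_encode(text, min_run=3):
    """Write runs of at least min_run symbols as <count><symbol>."""
    parts = []
    for symbol, group in groupby(text):
        n = len(list(group))
        parts.append(f"{n}{symbol}" if n >= min_run else symbol * n)
    return "".join(parts)

def rle_decode(encoded):
    """Inverse of rle_encode; digits are always read as a run count."""
    out = []
    digits = ""
    for ch in encoded:
        if ch.isdigit():
            digits += ch
        else:
            out.append(ch * (int(digits) if digits else 1))
            digits = ""
    return "".join(out)

dna = "CTAAAAAGGGTCGTTTTTTGCCCGGGGGCCTCCCCCCC"
packed = rle_encode(dna)
print(packed, len(dna), len(packed))  # CT5A3GTCG6TG3C5GCCT7C 38 21
```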
Run Length Encoding (Contd.)
WWWBWWWWWBWWWBWWWWBWWWWWBWWWBWWWWWBWWBWWWWWWBBBWWWWWWWBWBWWWWW
WWBWWBBWWWWWBWWWWBWWWWBWWWWB
Start of the string: WWWBWWWWWBWWWBWWWWB...
Run length encoded: 3WB5WB3WB4WB...
Possible optimization: 3151314..., but
the optimization requires an escape character: #W3151314...
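When the data uses only two symbols, the runs must alternate, so it is enough to store the first symbol and the run lengths. The sketch below illustrates this under assumptions: the helper names are invented, and the lengths are kept as a list rather than a digit string so that runs of ten or more stay unambiguous.

```python
from itertools import groupby

def alternating_runs(text):
    """Return (first symbol, run lengths) for text over a two-symbol alphabet."""
    runs = [(symbol, len(list(group))) for symbol, group in groupby(text)]
    if len({symbol for symbol, _ in runs}) > 2:
        raise ValueError("more than two distinct symbols")
    first = runs[0][0] if runs else ""
    return first, [n for _, n in runs]

def expand_runs(first, lengths, other):
    """Rebuild the text: runs alternate between `first` and `other`."""
    symbols = (first, other)
    return "".join(symbols[i % 2] * n for i, n in enumerate(lengths))

prefix = "WWWBWWWWWBWWWBWWWWB"
first, lengths = alternating_runs(prefix)
print(first, lengths)  # W [3, 1, 5, 1, 3, 1, 4, 1]
```

The run lengths begin 3, 1, 5, 1, 3, 1, 4, matching the optimized form on the slide.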
Run Length Encoding (Contd.)
Is run length encoding practical for images?
No
Chances of three or more identical consecutive pixels are low for most real images.
Especially images with large color depth.
Yes
Some images do have lots of identical consecutive pixels.
Especially images with low color depth.
RLE is used for fax machines, and by BMP, TIFF and PCX files.
http://www.cs.ucr.edu/~eamonn/new_photo.php
LZW Compression
LZW compression is named after its
developers, A. Lempel and J. Ziv, with later
modifications by Terry A. Welch. It is the
foremost technique for general purpose data
compression due to its simplicity and
versatility
LZW Compression (contd.)
LZW compression flowchart.
The variable, CHAR, is a single byte. The variable, STRING, is a variable length sequence of bytes. Data are read from the input file (box 1 & 2) as single bytes, and written to the compressed file (box 4) as 12 bit codes.
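Since the flowchart itself is not reproduced here, the following is a sketch of the usual LZW loop using the same CHAR and STRING variables. It rests on assumptions: codes 0-255 stand for single bytes, the table stops growing at 4096 entries (the most a 12 bit code can address), and codes are returned as Python integers rather than packed into 12 bit fields.

```python
MAX_CODES = 4096  # 12 bit codes can address 2**12 table entries

def lzw_compress(data):
    """Return the list of LZW codes for a bytes object."""
    if not data:
        return []
    table = {bytes([i]): i for i in range(256)}
    string = data[:1]                       # STRING: longest match so far
    codes = []
    for byte in data[1:]:
        char = bytes([byte])                # CHAR: the next single byte
        if string + char in table:
            string += char                  # keep extending the match
        else:
            codes.append(table[string])     # emit the code for STRING
            if len(table) < MAX_CODES:
                table[string + char] = len(table)
            string = char
    codes.append(table[string])             # flush the final STRING
    return codes

def lzw_decompress(codes):
    """Rebuild the bytes, regrowing the same table one step behind."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):            # code the compressor just created
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        if len(table) < MAX_CODES:
            table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)

print(lzw_compress(b"ABABABA"))  # [65, 66, 256, 258]
```

The decompressor never receives the table; it rebuilds it from the codes themselves, which is why only the codes need to be stored.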
CONCLUSION
Is it possible to create a data compression algorithm that will always compress data?
Is there an optimal data compression algorithm?
Lossless: No, compression rates depend on the data.
Lossy: No, the quality of compression is subjective.
Is data compression really that important?
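For the lossless case, a counting argument shows why no algorithm can shrink every input: there are 2^n different inputs of n bits but only 2^n - 1 bit strings that are shorter, so at least two inputs would have to share an output. The sketch below checks the count and, as an illustration only, feeds pseudo-random bytes to Python's `zlib`, which is used here as a stand-in for any lossless compressor.

```python
import random
import zlib

def shorter_strings(n_bits):
    """Number of distinct bit strings strictly shorter than n_bits bits."""
    return sum(2 ** k for k in range(n_bits))

n = 16
print(2 ** n, shorter_strings(n))  # 65536 inputs, only 65535 shorter outputs

rng = random.Random(0)  # fixed seed so the run is repeatable
noise = bytes(rng.getrandbits(8) for _ in range(4096))
packed = zlib.compress(noise)
print(len(noise), len(packed))  # the "compressed" noise is not smaller
```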