1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.
Date posted: 20-Dec-2015
![Page 1: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/1.jpg)
Data Compression
Engineering Math Physics (EMP)
Steve Lyon
Electrical Engineering
![Page 2: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/2.jpg)
Why Compress?
• Digital information is represented in bits
  – Text: characters (each encoded as a number)
  – Audio: sound samples
  – Image: pixels
• More bits means more resources
  – Storage (e.g., memory or disk space)
  – Bandwidth (e.g., time to transmit over a link)
• Compression reduces the number of bits
  – Use less storage space (or store more items)
  – Use less bandwidth (or transmit faster)
  – Cost is increased processing time/CPU hardware
![Page 3: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/3.jpg)
Do we really need to compress?
• Video
  – TV: 640x480 pixels (ideal US broadcast TV)
  – 3 colors/pixel (Red, Green, Blue)
  – 1 byte (values from 0 to 255) for each color
  – ~900,000 bytes per picture (frame)
  – 30 frames/second gives ~27 MB/sec
  – DVD holds ~5 GB
  – Can store ~3 minutes of uncompressed video on a DVD
• Must compress
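The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and "~5 GB" is assumed to mean 5 x 10^9 bytes.

```python
# Uncompressed video, using the figures quoted on the slide.
width, height = 640, 480        # pixels per frame
bytes_per_pixel = 3             # one byte each for red, green, blue
frames_per_second = 30
dvd_capacity_bytes = 5e9        # assumption: "~5 GB" taken as 5 x 10^9 bytes

frame_bytes = width * height * bytes_per_pixel
bytes_per_second = frame_bytes * frames_per_second
dvd_minutes = dvd_capacity_bytes / bytes_per_second / 60

print(f"{frame_bytes:,} bytes per frame")
print(f"{bytes_per_second / 1e6:.1f} MB per second")
print(f"{dvd_minutes:.1f} minutes of video per DVD")
```

The exact frame size is a little over the slide's rounded ~900,000 bytes, and the result still comes out at roughly three minutes per disc.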
![Page 4: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/4.jpg)
Compression Pipeline
• Sender and receiver must agree
  – Sender/writer compresses the raw data
  – Receiver/reader un-compresses the compressed data
• Example: digital photography
[Figure: digital photography example with compress and uncompress stages]
![Page 5: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/5.jpg)
Two Kinds of Compression
• Lossless
  – Only exploits redundancy in the data
  – So, the data can be reconstructed exactly
  – Necessary for most text documents (e.g., legal documents, computer programs, and books)
• Lossy
  – Exploits both data redundancy and human perception
  – So, some of the information is lost forever
  – Acceptable for digital audio, images, and video
![Page 6: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/6.jpg)
Lossless: Huffman Encoding
• Normal encoding of text
  – Fixed number of bits for each character
• ASCII with seven bits for each character
  – Allows representation of 2^7 = 128 characters
  – Use 97 for ‘a’, 98 for ‘b’, …, 122 for ‘z’
• But, some characters occur more often than others
  – Letter ‘a’ occurs much more often than ‘x’
• Idea: assign fewer bits to more-popular symbols
  – Encode ‘a’ as “000”
  – Encode ‘x’ as “11010111”
![Page 7: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/7.jpg)
Lossless: Huffman Encoding
• Challenge: generating an efficient encoding
  – Smaller codes for popular characters
  – Longer codes for unpopular characters
[Figures: frequency distribution of letters in English text; Morse code]
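One way to generate such an encoding is the standard Huffman construction: repeatedly merge the two least frequent groups of symbols, prefixing one group's codes with "0" and the other's with "1". The sketch below is a minimal illustration, not the slides' own code; the function names and the sample sentence are made up for the example.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a {symbol: bit string} table from the symbol counts in text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:                       # a lone symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (total count, unique tie-breaker, {symbol: code so far}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        count1, _, codes1 = heapq.heappop(heap)   # two least frequent groups
        count2, _, codes2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes1.items()}
        merged.update({sym: "1" + code for sym, code in codes2.items()})
        heapq.heappush(heap, (count1 + count2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(text, code):
    return "".join(code[ch] for ch in text)

def decode(bits, code):
    inverse = {bitstring: sym for sym, bitstring in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:               # prefix-free, so first match is right
            out.append(inverse[current])
            current = ""
    return "".join(out)

text = "a banana and an ant ate bananas"
code = huffman_code(text)
bits = encode(text, code)
print(len(bits), "bits, versus", 7 * len(text), "bits at 7 bits per character")
```

In this sample the frequent letter ‘a’ ends up with a code no longer than the rare letter ‘t’, which is the behavior the slide describes.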
![Page 8: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/8.jpg)
Lossless: Run-Length Encoding
• Sometimes the same symbol repeats
  – Such as “eeeeeee” or “eeeeetnnnnnn”
  – That is, a run of “e” symbols or a run of “n” symbols
• Idea: capture the symbol only once
  – Count the number of times the symbol occurs
  – Record the symbol and the number of occurrences
• Examples
  – So, “eeeeeee” becomes “@e7”
  – So, “eeeeetnnnnnn” becomes “@e5t@n6”
• Useful for fax machines
  – Lots of white, separated by occasional black
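The “@” notation in these examples can be sketched in Python as follows. The slide does not say when a run is worth encoding, so the threshold `MIN_RUN = 4` is an assumption (a three-symbol run would not get any shorter as “@e3”), and the sketch also assumes the input contains no “@” characters or digits.

```python
import re
from itertools import groupby

MIN_RUN = 4  # assumption: shorter runs are copied through unchanged

def rle_encode(text, marker="@"):
    """Replace each long run with marker + symbol + count."""
    pieces = []
    for symbol, group in groupby(text):
        count = len(list(group))
        if count >= MIN_RUN:
            pieces.append(f"{marker}{symbol}{count}")
        else:
            pieces.append(symbol * count)
    return "".join(pieces)

def rle_decode(encoded, marker="@"):
    """Expand marker + symbol + count back into the original run."""
    pattern = re.escape(marker) + r"(.)(\d+)"
    return re.sub(pattern, lambda m: m.group(1) * int(m.group(2)),
                  encoded, flags=re.DOTALL)

print(rle_encode("eeeeeee"))
print(rle_encode("eeeeetnnnnnn"))
```

A production format would also need an escape rule for literal markers and digits, which is omitted here.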
![Page 9: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/9.jpg)
Image Compression
• Benefits of reducing the size
  – Consume less storage space and network bandwidth
  – Reduce the time to load, store, and transmit the image
• Redundancy in the image
  – Neighboring pixels often the same, or at least similar
  – E.g., the blue sky
• Human perception factors
  – Human eye is not sensitive to high spatial frequencies
![Page 10: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/10.jpg)
Approximating arbitrary functions (curves)
How can we represent some arbitrary function by some simple ones?
Ex. This mountain range
![Page 11: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/11.jpg)
Approximating with a sum of cosines
[Figure panels: n=0, constant; n=1, ½ wavelength; n=5, 2½ wavelengths; n=15, 7½ wavelengths]
![Page 12: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/12.jpg)
Approximation with 5 terms
![Page 13: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/13.jpg)
Approximation with 15 terms
![Page 14: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/14.jpg)
Approximation with 45 terms
![Page 15: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/15.jpg)
Approximation with 145 terms
![Page 16: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/16.jpg)
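Discrete cosine transform
• How do we determine the coefficients of each term?
  – How much of “3 wavelengths” vs. “47 wavelengths”?
  – Look at the fit and tweak the coefficients?
    – Maybe for a couple
    – Insane for 145
• Idea: look at
  $$\int_0^{\pi} \cos(n\theta)\,\cos(m\theta)\,d\theta$$
  – = 0 if n ≠ m
  – = π/2 if n = m (or π if n = m = 0)
• So, if
  $$f(x) = \sum_{n=0}^{\infty} f_n \cos(nx)$$
• Then
  $$f_n = \frac{2}{\pi}\int_0^{\pi} f(x)\,\cos(nx)\,dx$$
  (for n = 0 the factor is 1/π, because the n = m = 0 integral is π rather than π/2)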
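The coefficient formula, f_n = (2/π) ∫ f(x) cos(nx) dx over 0 to π (with 1/π in place of 2/π for n = 0), can be tried out numerically. The sketch below approximates the integral with a midpoint rule; the test curve and all function names are invented for the example.

```python
import math

def cosine_coefficients(f, num_terms, samples=2000):
    """Approximate f_n for n = 0 .. num_terms-1 by midpoint-rule integration."""
    dx = math.pi / samples
    xs = [(k + 0.5) * dx for k in range(samples)]
    coeffs = []
    for n in range(num_terms):
        integral = sum(f(x) * math.cos(n * x) for x in xs) * dx
        scale = 1.0 / math.pi if n == 0 else 2.0 / math.pi
        coeffs.append(scale * integral)
    return coeffs

def cosine_series(coeffs, x):
    """Evaluate the sum of f_n * cos(n x)."""
    return sum(c * math.cos(n * x) for n, c in enumerate(coeffs))

def curve(x):
    # Hypothetical test curve built from known cosine terms.
    return 3.0 + 2.0 * math.cos(x) - 0.5 * math.cos(4 * x)

coeffs = cosine_coefficients(curve, 6)
print([round(c, 6) for c in coeffs])
```

Because the test curve is itself a sum of cosines, the recovered coefficients should match the ones used to build it, with the unused terms coming out at essentially zero.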
![Page 17: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/17.jpg)
• Often, most of the information is in the first few f_n (the low frequencies)
• Can manipulate the Fourier coefficients (f_n)
  – Ex. “filter” and keep only the low frequencies: compression
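Keeping only the low-frequency coefficients can be sketched on sampled data with a discrete cosine transform. The version below is a plain-Python, orthonormal DCT-II and its inverse, written out for illustration rather than taken from the slides; the sample signal and the helper names are hypothetical.

```python
import math

def dct(samples):
    """Orthonormal DCT-II of a list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(samples)))
    return out

def idct(coeffs):
    """Inverse of dct(): rebuild the samples from the coefficients."""
    n = len(coeffs)
    out = []
    for i in range(n):
        out.append(sum(math.sqrt((1 if k == 0 else 2) / n) * c
                       * math.cos(math.pi * (i + 0.5) * k / n)
                       for k, c in enumerate(coeffs)))
    return out

def keep_low_frequencies(samples, keep):
    """Zero every coefficient after the first `keep`, then transform back."""
    coeffs = dct(samples)
    filtered = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return idct(filtered)

def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

signal = [10, 11, 13, 16, 20, 25, 31, 38]   # hypothetical smooth samples
errors = {keep: squared_error(signal, keep_low_frequencies(signal, keep))
          for keep in (2, 4, 8)}
for keep, err in errors.items():
    print(keep, "coefficients kept, squared error", round(err, 6))
```

With all eight coefficients the signal is reproduced, and dropping high-frequency terms can only increase the error, which is the trade-off that lossy compression tunes.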
![Page 18: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/18.jpg)
Periodic functions
• Produces a periodic curve
• Cosine transforms are particularly good for representing periodic signals
  – Like sound (music)
![Page 19: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/19.jpg)
Example: Digital Audio
• Sampling the analog signal
  – Sample at some fixed rate
  – Each sample is an arbitrary real number
• Quantizing each sample
  – Round each sample to one of a finite number of values
  – Represent each sample in a fixed number of bits
[Figure: 4-bit representation (values 0-15)]
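Sampling and 4-bit quantization can be sketched as below. The input range of -1.0 to 1.0, the sine-wave test signal, and the rate of 16 samples per cycle are all assumptions chosen for the example.

```python
import math

def quantize(sample, bits=4, lo=-1.0, hi=1.0):
    """Map a real-valued sample in [lo, hi] to an integer code 0 .. 2**bits - 1."""
    levels = 2 ** bits
    clipped = min(max(sample, lo), hi)          # out-of-range samples are clipped
    return round((clipped - lo) / (hi - lo) * (levels - 1))

samples_per_cycle = 16                          # hypothetical sampling rate
analog = [math.sin(2 * math.pi * t / samples_per_cycle)
          for t in range(samples_per_cycle)]    # "arbitrary real numbers"
codes = [quantize(s) for s in analog]           # 4-bit values, 0 to 15
print(codes)
```

Each code fits in four bits, at the price of rounding away the difference between the real sample and the nearest level.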
![Page 20: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/20.jpg)
Example: Digital Audio
• Speech
  – Sampling rate: 8000 samples/second
  – Sample size: 8 bits per sample
  – Rate: 64 kbps
• Compact Disc (CD)
  – Sampling rate: 44,100 samples/second
  – Sample size: 16 bits per sample
  – Rate: 705.6 kbps for mono, 1.411 Mbps for stereo
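These rates are simply the sampling rate times the sample size times the number of channels. A small check, with a helper function name made up for the example:

```python
def bit_rate(samples_per_second, bits_per_sample, channels=1):
    """Uncompressed audio bit rate in bits per second."""
    return samples_per_second * bits_per_sample * channels

speech = bit_rate(8000, 8)
cd_mono = bit_rate(44100, 16)
cd_stereo = bit_rate(44100, 16, channels=2)

print(speech / 1e3, "kbps for speech")
print(cd_mono / 1e3, "kbps for CD mono")
print(cd_stereo / 1e6, "Mbps for CD stereo")
```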
![Page 21: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/21.jpg)
Example: Digital Audio
• Audio data requires too much bandwidth
  – Speech: 64 kbps is too high for a dial-up modem user
  – Stereo music: 1.411 Mbps exceeds most access rates
• Compression to reduce the size
  – Remove redundancy
  – Remove details that humans tend not to perceive
• Example audio formats
  – Speech: GSM (13 kbps), G.729 (8 kbps), and G.723.1 (6.3 and 5.3 kbps)
  – Stereo music: MPEG 1 layer 3 (MP3) at 96 kbps, 128 kbps, and 160 kbps
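The compression ratios implied by these rates follow from dividing the uncompressed rate by the compressed one. A short sketch, using the 64 kbps speech and 1411.2 kbps CD stereo figures from the earlier slide; the helper name is hypothetical:

```python
def compression_ratio(raw_kbps, compressed_kbps):
    """How many times smaller the compressed stream is than the raw one."""
    return raw_kbps / compressed_kbps

CD_STEREO_KBPS = 1411.2
SPEECH_KBPS = 64.0

mp3_ratios = {rate: compression_ratio(CD_STEREO_KBPS, rate) for rate in (96, 128, 160)}
for rate, ratio in mp3_ratios.items():
    print(f"MP3 at {rate} kbps: about {ratio:.1f} to 1")
print(f"G.729 at 8 kbps: {compression_ratio(SPEECH_KBPS, 8):.0f} to 1")
```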
![Page 22: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/22.jpg)
Joint Photographic Experts Group (JPEG)
Lossy compression
[Figure: example images at 108 KB, 34 KB, and 8 KB]
![Page 23: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/23.jpg)
Contrast Sensitivity Curve
![Page 24: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/24.jpg)
How JPEG works 1
• Digital cameras (CCDs) output RGB
  – Eyes most sensitive to intensity
  – Less sensitive to color variations
• Convert image to YCbCr
  – Y = intensity ~ (R+G+B)
    – Gives black & white
    – B&W TVs could use that when color TV first came out
  – Cb ~ (B – Y)
  – Cr ~ (R – Y)
• Sometimes leave as RGB: gives poorer quality JPEG
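The slide gives the RGB to YCbCr conversion only schematically. As an illustration, the sketch below uses the weights commonly used in JPEG/JFIF files for 8-bit values; those specific numbers are not from the slides, and the function name is made up.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to YCbCr using the usual JFIF weights (an assumption here)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b                # intensity: weighted R+G+B
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b     # scaled (B - Y), offset to 128
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b     # scaled (R - Y), offset to 128
    return y, cb, cr

gray = rgb_to_ycbcr(128, 128, 128)
blue = rgb_to_ycbcr(0, 0, 255)
print("gray:", gray)
print("blue:", blue)
```

A gray pixel carries all of its information in Y, with Cb and Cr sitting at the neutral value 128, which is why the two color planes can be compressed harder.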
![Page 25: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/25.jpg)
How JPEG works 2
• Either RGB or YCbCr gives 3 8-bit “planes”
  – Process separately
• Process image in 8-pixel x 8-pixel blocks
  – 2-dimensional discrete cosine transform (DCT), N1 = N2 = 8
    – Just matrix multiplication
    – Produces 8x8 matrix (B) of spatial frequencies
  – “Quantize”: divide each element by a fixed number
    – High-frequency coefficients divided by larger number
    – If result is small, set to 0 (the lossy part)
    – Can be “lossier” on Cb and Cr than on Y
• Lossless compression to squeeze out the 0’s
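The "just matrix multiplication" step can be sketched as B = D A Dᵀ, where A is the 8x8 pixel block and D is the 8x8 DCT matrix. The code below is a plain-Python illustration with an orthonormal scaling; the names are made up, and the level shift that real JPEG encoders apply before the transform is left out.

```python
import math

N = 8  # block size used by JPEG

def dct_matrix(n=N):
    """Rows are the cosine basis functions, lowest frequency first."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (i + 0.5) * k / n)
             for i in range(n)]
            for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    """2D DCT of a square block: B = D * A * D^T."""
    d = dct_matrix(len(block))
    return matmul(matmul(d, block), transpose(d))

flat_block = [[100] * N for _ in range(N)]   # a block of one uniform color
coeffs = dct2(flat_block)
print(round(coeffs[0][0], 6))
```

For a uniform block only the top-left (constant) coefficient is nonzero, so the other 63 entries compress away to nothing.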
![Page 26: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/26.jpg)
Block of pixels (really 8 by 8)
62 65 72 42
45 60 66 51
58 52 60 42
82 65 76 53

↓ 2D Discrete Cosine Transform (DCT)

2D DCT of Block (low frequency toward the top left, high frequency toward the bottom right)
452 -60 -7 2
32 2 -8 3
-20 7 1 0
5 -1 0 5

Quantization Matrix (accentuate the low frequencies)
3 7 11 15
7 11 15 19
11 15 19 23
15 19 23 27

↓ Division and Rounding

Quantized Pixel Matrix
151 -9 -1 0
5 0 0 0
1 0 0 0
0 0 0 0
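The division-and-rounding step can be sketched using the DCT and quantization matrices from this slide. Plain element-by-element rounding reproduces the first row of the slide's quantized matrix; a couple of the smaller entries come out differently, so the slide's figure may apply an extra "set small results to 0" step or may simply be illustrative. The function name is hypothetical.

```python
dct_block = [
    [452, -60, -7, 2],
    [32, 2, -8, 3],
    [-20, 7, 1, 0],
    [5, -1, 0, 5],
]
quantization = [
    [3, 7, 11, 15],
    [7, 11, 15, 19],
    [11, 15, 19, 23],
    [15, 19, 23, 27],
]

def quantize_block(coeffs, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[round(c / q) for c, q in zip(coeff_row, table_row)]
            for coeff_row, table_row in zip(coeffs, table)]

quantized = quantize_block(dct_block, quantization)
zero_count = sum(value == 0 for row in quantized for value in row)
for row in quantized:
    print(row)
print(zero_count, "of 16 entries are zero")
```

Most of the high-frequency entries become zero, which is what the later lossless stage exploits.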
![Page 27: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/27.jpg)
JPEG Artifacts
• JPEG does not compress text or diagrams well
  – Here, same file size as lossless compression (GIF)
  – Get “halos” around letters, lines, etc.
  – Lines and text have sharp edges, which JPEG smears
• Get “blotchy” appearance when heavily compressed
  – Have 8x8 blocks of all one color: only the constant term in the DCT remained
![Page 28: 1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d4c5503460f94a29ccf/html5/thumbnails/28.jpg)
Conclusion
• “Raw” digital information often has many more bits than necessary
  – Redundancies and patterns we can use
  – Information that is imperceptible to people
• Lossless compression
  – Used when we must be able to exactly recreate the original
  – Find common patterns (letter frequencies, repeats, etc.)
• Lossy compression
  – Can get very large compression ratios: a few to 1000’s
  – Exploit redundancy and human perception
    – Remove information we (people) don’t need
  – Too much compression degrades the signals