Hardware Implementation of Transform & Quantization Blocks in H.264/AVC Video Coding Standard
By: Hoda Roodaki
Instructor: Dr. Fakhraei
Custom Implementation of DSP Systems Class Seminar. All materials are copyrights of their respective authors as listed in references.
In the Name of God
Outline
• Video Coding & Standardization
• Video Coding Standards & Application
• H.264/AVC (MPEG-4 Part 10) Standard
• H.264 Drawbacks
• Proposed Method for 4x4 DCT, 8x8 DCT and Quantization
• Conclusion
Video Coding & Standardization
• Efficient digital representation of video signals has been the subject of considerable research over the past twenty years.
Visual communication has become more feasible thanks to:
• Availability of digital transmission links
• Progress in signal processing
• VLSI technology
• Video compression research
Increased commercial interest in video communications
Standardization
Video Coding Standards & Application
Moving Picture Experts Group (MPEG)
• MPEG-1 (1988-1992)
– Audio and video on storage media such as CD-ROM
• MPEG-2 (1993)
– Digital TV: SDTV, HDTV
• MPEG-4 (1994)
– A standard for very low bit rate coding of limited-complexity audio-visual material
ITU-T Video Coding Experts Group (VCEG)
• H.261 (1988-1990)
– Videoconferencing and video-telephone applications over ISDN telephone lines
• H.263 (1996)
– Mobile networks
H.264/AVC (MPEG4-part10) Standard
• In 2001
• With the aim of developing a more efficient compression system, MPEG and VCEG formed the Joint Video Team (JVT)
H.264/AVC (MPEG-4 Part 10) Standard
• Significant improvement in coding efficiency
– Average bit rate reduction of 50% at fixed fidelity compared with earlier video standards
• Error robustness
• Applications
– Broadcast over cable, satellite, cable modem, DSL, terrestrial.
– Interactive or serial storage on optical and magnetic storage devices, DVD, etc.
– Conversational services over ISDN, Ethernet, LAN, DSL, wireless and mobile networks, modems.
– Video-on-demand or multimedia streaming services over cable modem, DSL, ISDN, LAN, wireless networks.
– Multimedia messaging services over DSL, ISDN.
• Broad range of bit rates and picture sizes, ranging from very low bit rate, low frame rate video for mobile and dial-up devices through to entertainment-quality standard-definition television services, HDTV, and beyond.
H.264 Drawbacks
• These aggressive compression techniques increase computational complexity and need an efficient architecture to implement them
• The quantization and transform blocks are two critical parts of the encoder
We need methods that simplify these blocks
Real-Time Applications
Proposed Method for 4x4 DCT [1]
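• The forward 4x4 DCT of a sample block X:

$$Y = A X A^T$$

$$A = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix}$$

$$a = \frac{1}{2}, \quad b = \sqrt{\frac{1}{2}}\cos\frac{\pi}{8}, \quad c = \sqrt{\frac{1}{2}}\cos\frac{3\pi}{8}$$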
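As a minimal sketch of the definition above, the following pure-Python snippet builds A from a, b and c and evaluates Y = A X A^T. The helper names (`matmul`, `transpose`, `dct4x4`) and the sample block `X` are illustrative assumptions, not part of the original slides.

```python
import math

# Coefficients as defined on the slide.
a = 0.5
b = math.sqrt(0.5) * math.cos(math.pi / 8)
c = math.sqrt(0.5) * math.cos(3 * math.pi / 8)

A = [
    [a,  a,  a,  a],
    [b,  c, -c, -b],
    [a, -a, -a,  a],
    [c, -b,  b, -c],
]

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*P)]

def dct4x4(X):
    """Forward 4x4 DCT: Y = A X A^T."""
    return matmul(matmul(A, X), transpose(A))

# Hypothetical sample block, chosen only for illustration.
X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
Y = dct4x4(X)

# A is orthonormal, so A A^T should be the identity (up to rounding).
AAt = matmul(A, transpose(A))
```

Because the first row of A is constant, `Y[0][0]` is a² times the sum of all samples, which gives a quick sanity check on the result.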
Proposed Method for 4x4 DCT(Cont.)
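$$Y = (C X C^T) \otimes E_f = W \otimes E_f$$

$$C = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & d & -d & -1 \\ 1 & -1 & -1 & 1 \\ d & -1 & 1 & -d \end{bmatrix}, \quad E_f = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}, \quad d = \frac{c}{b}$$

$$C = C_1 + d\,C_2$$

$$C_1 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & -1 \\ 1 & -1 & -1 & 1 \\ 0 & -1 & 1 & 0 \end{bmatrix}, \quad C_2 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & -1 \end{bmatrix}$$

$$W = C X C^T$$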
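The factorization above can be checked numerically. The sketch below, under the assumption that ⊗ denotes element-wise multiplication, compares (C X C^T) ⊗ E_f against the direct product A X A^T. The helper names and the sample block `X` are hypothetical.

```python
import math

a = 0.5
b = math.sqrt(0.5) * math.cos(math.pi / 8)
c = math.sqrt(0.5) * math.cos(3 * math.pi / 8)
d = c / b

A = [[a, a, a, a], [b, c, -c, -b], [a, -a, -a, a], [c, -b, b, -c]]
C = [[1, 1, 1, 1], [1, d, -d, -1], [1, -1, -1, 1], [d, -1, 1, -d]]

# E_f[i][j] is the product of the per-row scale factors s[i] * s[j].
s = [a, b, a, b]
E_f = [[s[i] * s[j] for j in range(4)] for i in range(4)]

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*P)]

# Hypothetical sample block, chosen only for illustration.
X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]

W = matmul(matmul(C, X), transpose(C))                    # core transform
Y_scaled = [[W[i][j] * E_f[i][j] for j in range(4)]       # element-wise scaling
            for i in range(4)]
Y_direct = matmul(matmul(A, X), transpose(A))             # reference result

max_err = max(abs(Y_scaled[i][j] - Y_direct[i][j])
              for i in range(4) for j in range(4))
```

Since c/b = cos(3π/8)/cos(π/8) = tan(π/8), the factor d evaluates to √2 − 1, which is the only non-trivial constant left in the core transform.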
Proposed Method for 4x4 DCT(Cont.)
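$$W = (C_1 + d\,C_2)\,X\,(C_1^T + d\,C_2^T)$$

$$W = C_1 X C_1^T + d\,C_1 X C_2^T + d\,C_2 X C_1^T + d^2\,C_2 X C_2^T$$

$$B = C_1 X = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & -1 \\ 1 & -1 & -1 & 1 \\ 0 & -1 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_{11} & X_{12} & X_{13} & X_{14} \\ X_{21} & X_{22} & X_{23} & X_{24} \\ X_{31} & X_{32} & X_{33} & X_{34} \\ X_{41} & X_{42} & X_{43} & X_{44} \end{bmatrix}$$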
BFa
[1]
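The expansion of W into C_1 and C_2 terms can be illustrated as follows. This is only a sketch of the algebra, not the architecture of [1]: it groups the four products into integer matrices P, Q and R (names introduced here for illustration) so that W = P + d·Q + d²·R, and checks the result against the direct product. Every entry of C_1 and C_2 is 0 or ±1, so P, Q and R need only additions and subtractions.

```python
import math

C1 = [[1, 1, 1, 1], [1, 0, 0, -1], [1, -1, -1, 1], [0, -1, 1, 0]]
C2 = [[0, 0, 0, 0], [0, 1, -1, 0], [0, 0, 0, 0], [1, 0, 0, -1]]
d = math.cos(3 * math.pi / 8) / math.cos(math.pi / 8)   # d = c / b

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*P)]

def combine(P, Q, alpha=1):
    """Element-wise P + alpha * Q."""
    return [[P[i][j] + alpha * Q[i][j] for j in range(len(P[0]))]
            for i in range(len(P))]

# Hypothetical integer sample block, chosen only for illustration.
X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]

# Integer-only partial products (hypothetical names P, Q, R).
P = matmul(matmul(C1, X), transpose(C1))
Q = combine(matmul(matmul(C1, X), transpose(C2)),
            matmul(matmul(C2, X), transpose(C1)))
R = matmul(matmul(C2, X), transpose(C2))

# W = P + d*Q + d^2*R
W_split = combine(combine(P, Q, d), R, d * d)

# Reference: W = C X C^T with C = C1 + d*C2.
C = combine(C1, C2, d)
W_direct = matmul(matmul(C, X), transpose(C))

max_err = max(abs(W_split[i][j] - W_direct[i][j])
              for i in range(4) for j in range(4))
```

In this form the irrational factor d appears only when the three integer matrices are combined, which is what makes a multiplier-free datapath plausible.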
Proposed Method for 4x4 DCT(Cont.)
Proposed Method for 4x4 DCT Evaluation
• Synthesized with Xilinx Project Navigator 10.01 for Xilinx Virtex 5 (xc5vlx30).
Typical Implementation, DCT (9 bits): 3737 gates
Typical Implementation, DCT/Quant (16 bits): DCT block requires 294 gates, 65 FFs, 256 bits R/W memory
Proposed Method, DCT/Quant (16 bits): 7000 gates
Proposed Method for 8x8 DCT [2]
• The initial H.264 specification adopted an integer approximation of the 4×4 DCT.
• But the 4×4 block is not sufficient for higher resolutions
8x8 DCT
Significant Compression Performance
Additional Complexity
Proposed Method for 8x8 DCT(Cont.)
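$$C = \frac{1}{8}\begin{bmatrix}
8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\
12 & 10 & 6 & 3 & -3 & -6 & -10 & -12 \\
8 & 4 & -4 & -8 & -8 & -4 & 4 & 8 \\
10 & -3 & -12 & -6 & 6 & 12 & 3 & -10 \\
8 & -8 & -8 & 8 & 8 & -8 & -8 & 8 \\
6 & -12 & 3 & 10 & -10 & -3 & 12 & -6 \\
4 & -8 & 8 & -4 & -4 & 8 & -8 & 4 \\
3 & -6 & 10 & -12 & 12 & -10 & 6 & -3
\end{bmatrix}$$

$$W = C X C^T$$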
Proposed Method for 8x8 DCT(Cont.)
• The 2-D forward 8x8 transform is computed as
– 1-D horizontal (row) transform
– 1-D vertical (column) transform
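The row-column decomposition can be sketched as below. The snippet applies the same 1-D transform first to every row and then to every column, and compares the result with the direct product C X C^T. Exact rational arithmetic (`fractions.Fraction`) is used so that the comparison is exact. The function names and the sample block are illustrative assumptions.

```python
from fractions import Fraction

# Integer matrix of the 8x8 transform; C = M / 8.
M = [
    [8,   8,   8,   8,   8,   8,   8,   8],
    [12, 10,   6,   3,  -3,  -6, -10, -12],
    [8,   4,  -4,  -8,  -8,  -4,   4,   8],
    [10, -3, -12,  -6,   6,  12,   3, -10],
    [8,  -8,  -8,   8,   8,  -8,  -8,   8],
    [6, -12,   3,  10, -10,  -3,  12,  -6],
    [4,  -8,   8,  -4,  -4,   8,  -8,   4],
    [3,  -6,  10, -12,  12, -10,   6,  -3],
]
C = [[Fraction(v, 8) for v in row] for row in M]

def transform_1d(vec):
    """1-D 8-point transform of one row or column: C * vec."""
    return [sum(C[k][n] * vec[n] for n in range(8)) for k in range(8)]

def transform_2d(X):
    """2-D transform as a row pass followed by a column pass."""
    rows = [transform_1d(r) for r in X]             # horizontal (row) pass
    cols = [transform_1d(col) for col in zip(*rows)]  # vertical (column) pass
    return [list(r) for r in zip(*cols)]            # back to row-major order

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

# Hypothetical sample block: values 0..63 in raster order.
X = [[8 * i + j for j in range(8)] for i in range(8)]

W_separable = transform_2d(X)
Ct = [list(r) for r in zip(*C)]
W_direct = matmul(matmul(C, X), Ct)
```

Separability is what allows a hardware design to reuse a single 1-D transform block for both passes.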
Proposed Method for 8x8 DCT(Cont.)
Proposed Method for 8x8 DCT(Cont.)
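$$W = \frac{1}{8}\begin{bmatrix}
8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\
12 & 10 & 6 & 3 & -3 & -6 & -10 & -12 \\
8 & 4 & -4 & -8 & -8 & -4 & 4 & 8 \\
10 & -3 & -12 & -6 & 6 & 12 & 3 & -10 \\
8 & -8 & -8 & 8 & 8 & -8 & -8 & 8 \\
6 & -12 & 3 & 10 & -10 & -3 & 12 & -6 \\
4 & -8 & 8 & -4 & -4 & 8 & -8 & 4 \\
3 & -6 & 10 & -12 & 12 & -10 & 6 & -3
\end{bmatrix}
\begin{bmatrix} x[0] \\ x[1] \\ x[2] \\ x[3] \\ x[4] \\ x[5] \\ x[6] \\ x[7] \end{bmatrix}$$

$$W[0] = \text{FirstRow} \times \text{FirstColumn} = x[0] + x[1] + x[2] + x[3] + x[4] + x[5] + x[6] + x[7]$$

$$W[0] = b[0] + b[1]$$

$$b[0] = a[0] + a[3], \quad b[1] = a[1] + a[2]$$

$$a[0] = x[0] + x[7], \quad a[1] = x[1] + x[6], \quad a[2] = x[2] + x[5], \quad a[3] = x[3] + x[4]$$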
Proposed Method for 8x8 DCT(Cont.)
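$$W[1] = \text{SecondRow} \times \text{FirstColumn} = \frac{3}{2}x[0] + \frac{5}{4}x[1] + \frac{3}{4}x[2] + \frac{3}{8}x[3] - \frac{3}{8}x[4] - \frac{3}{4}x[5] - \frac{5}{4}x[6] - \frac{3}{2}x[7]$$

$$W[1] = (x[1] - x[6]) + (x[2] - x[5]) + \frac{1}{2}(x[0] - x[7]) + (x[0] - x[7]) + \frac{1}{4}(x[1] - x[6]) - \frac{1}{4}(x[2] - x[5]) + \frac{1}{8}(x[3] - x[4]) + \frac{1}{4}(x[3] - x[4])$$

$$W[1] = a[5] + a[6] + \frac{a[4]}{2} + a[4] + \frac{1}{4}\left(a[5] - a[6] + a[7] + \frac{a[7]}{2}\right)$$

$$W[1] = b[4] + \frac{b[7]}{4}$$

where a[4] = x[0] − x[7], a[5] = x[1] − x[6], a[6] = x[2] − x[5], a[7] = x[3] − x[4], b[4] = a[5] + a[6] + a[4]/2 + a[4] and b[7] = a[5] − a[6] + a[7] + a[7]/2.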
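The add-and-scale decomposition of W[0] and W[1] can be checked against the matrix rows with the short sketch below. It uses exact fractions to confirm the identities. In hardware the divisions by 2 and 4 would typically be realized as right shifts, which is an assumption here and introduces truncation that this sketch does not model. The function name `w0_w1_fast` and the test vector are illustrative.

```python
from fractions import Fraction

# First two rows of the integer 8x8 matrix (before the 1/8 scaling).
ROW0 = [8, 8, 8, 8, 8, 8, 8, 8]
ROW1 = [12, 10, 6, 3, -3, -6, -10, -12]

def w0_w1_fast(x):
    """Compute W[0] and W[1] using only additions and divisions by 2 or 4."""
    a = [x[0] + x[7], x[1] + x[6], x[2] + x[5], x[3] + x[4],   # a[0..3]: sums
         x[0] - x[7], x[1] - x[6], x[2] - x[5], x[3] - x[4]]   # a[4..7]: differences
    b0 = a[0] + a[3]
    b1 = a[1] + a[2]
    b4 = a[5] + a[6] + Fraction(a[4], 2) + a[4]
    b7 = a[5] - a[6] + a[7] + Fraction(a[7], 2)
    return b0 + b1, b4 + b7 / 4

def w_direct(row, x):
    """Reference: one output coefficient as (1/8) * row . x."""
    return sum(Fraction(m, 8) * v for m, v in zip(row, x))

# Hypothetical input vector of integer samples.
x = [10, 20, 30, 40, 50, 60, 70, 80]
w0, w1 = w0_w1_fast(x)
```

For this input the sketch gives W[0] = 360 and W[1] = −775/4, matching the direct matrix rows.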
Proposed Method for 8x8 DCT(Cont.)
Architecture of Proposed Algorithm [2]
Proposed Method for 8x8 DCT(Cont.)
1-D Transform Block [2]
H.264 Quantization
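$$Z_{ij} = \mathrm{round}\left(\frac{Y_{ij}}{Qstep}\right)$$

$$Z_{ij} = \mathrm{round}\left(W_{ij} \cdot \frac{PF}{Qstep}\right), \quad PF = a^2 \text{ or } \frac{ab}{2} \text{ or } \frac{b^2}{4}$$

$$Z_{ij} = \mathrm{round}\left(W_{ij} \cdot \frac{MF}{2^{qbits}}\right), \quad \frac{MF}{2^{qbits}} = \frac{PF}{Qstep}$$

$$Z_{ij} = (W_{ij} \cdot MF + f) \gg qbits$$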
Qstep?
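A minimal sketch of the integer form of the quantizer follows. The numeric values are assumptions taken for illustration in the spirit of reference [4]: qbits = 15, Qstep = 0.625, PF = a² = 0.25, and a rounding offset f = 2^(qbits−1) that mimics round-to-nearest. Applying the shift to the magnitude and restoring the sign afterwards is also an assumption, since the slide equation does not show sign handling. The function names are hypothetical.

```python
def multiplication_factor(pf, qstep, qbits):
    """Choose MF so that MF / 2**qbits approximates PF / Qstep."""
    return round(pf * (1 << qbits) / qstep)

def quantize(w, mf, qbits, f):
    """Integer quantization: (|W| * MF + f) >> qbits, with the sign restored."""
    z = (abs(w) * mf + f) >> qbits
    return -z if w < 0 else z

def quantize_reference(w, pf, qstep):
    """Floating-point reference: round(W * PF / Qstep)."""
    return round(w * pf / qstep)

# Illustrative parameter values (assumptions, not taken from the slides).
qbits = 15
qstep = 0.625
pf = 0.25                      # a^2 with a = 1/2
mf = multiplication_factor(pf, qstep, qbits)
f = 1 << (qbits - 1)           # offset chosen to mimic round-to-nearest

z_pos = quantize(100, mf, qbits, f)
z_neg = quantize(-100, mf, qbits, f)
z_small = quantize(3, mf, qbits, f)
```

With these values MF evaluates to 13107, and the integer path reproduces the floating-point reference for the sample coefficients, replacing the division by Qstep with one multiplication and one shift.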
Proposed Quantization Block Architecture
[2]
Proposed Method for 8x8 DCT & Quantization - Evaluation
• In the architecture
– Each input column vector of 8 pixels is fed to the 1-D DCT block over 8 cycles => 64 cycles are required to process all pixel elements in one 8×8 block
– Without multiplication
– The pixel-by-pixel processing can remove redundant processing modules in the integer transform block and quantization block.
– The quantization block is designed to cover all multiplication factors without using a real multiplier.
Proposed Method for 8x8 DCT & Quantization - Evaluation
                           Parallel Implementation   Proposed Method
Critical path delay (ns)   14.598                    8.943
Clock frequency (MHz)      68.5                      111.8
Parallelism                64                        1
Latency                    1                         64
The target device chosen is Xilinx Virtex-II Pro XC2VP30 FPGA.
90% area reduction in Proposed Method [2]
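The trade-off in the table can be made concrete with a little arithmetic. The sketch below derives the clock frequency from the critical path delay and estimates the time to process one 8×8 block, assuming that the "Latency" row counts clock cycles per block. That reading is an assumption, as are the function names.

```python
def clock_mhz(critical_path_ns):
    """Maximum clock frequency in MHz for a given critical path delay in ns."""
    return 1000.0 / critical_path_ns

def block_time_ns(critical_path_ns, cycles_per_block):
    """Time to process one 8x8 block, assuming latency is counted in cycles."""
    return critical_path_ns * cycles_per_block

parallel_mhz = clock_mhz(14.598)
proposed_mhz = clock_mhz(8.943)

parallel_block_ns = block_time_ns(14.598, 1)    # 64-way parallel, 1 cycle
proposed_block_ns = block_time_ns(8.943, 64)    # serial, 64 cycles

slowdown = proposed_block_ns / parallel_block_ns
```

Under this assumption the proposed design needs about 572 ns per block against about 15 ns for the parallel one, which is the speed it gives up in exchange for the reported area reduction.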
Conclusion
• The continuing development of digital video coding has produced H.264/MPEG-4 (Part 10) Advanced Video Coding.
• It provides gains in compression efficiency of up to 50% over a wide range of bit rates and video resolutions compared to previous standards
• In addition, network friendliness and good video quality at high and low bit rates are important features that distinguish H.264 from other standards.
• These advantages come at the cost of considerably higher computational complexity.
Conclusion
• Much work has been done to implement DCT and quantization blocks for H.264.
• 4x4 DCT => a method without any multiplication
– Less complex and faster than the typical method
• 8x8 DCT => a pipelined method without multiplication for DCT & Quantization
– Less complex and smaller in area than the parallel method, but slower
References
• [1] Nandi, S.; Rajan, K.; Biswas, P., "Hardware implementation of 4×4 DCT/quantization block using multiplication and error-free algorithm", TENCON 2009.
• [2] Jeoong Sung Park; Ogunfunmi, T., "A New Hardware Implementation Of The H.264 8×8 Transform And Quantization", IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2009.
• [3] Mohammad Norouzi, Karim Mohammadi, Mohammad Mahdy Azadfar, "Multiplication and Error Free Implementation of H.264 like 4x4 DCT/Quan_IQuan/IDCT using Algebraic Integer Encoding", IJCSNS International Journal of Computer Science and Network Security, Vol. 6, No. 9B, September 2006.
• [4] Iain E G Richardson, "H.264 / MPEG-4 Part 10 White Paper: Transform & Quantization", vcodex, 2003.
• [5] Thomas Wiegand, Gary J. Sullivan, Gisle Bjontegaard, and Ajay Luthra, "Overview of the H.264 / AVC Video Coding Standard", IEEE Transactions on Circuits and Systems for Video Technology, July 2003.
• [6] Thomas Sikora, "Digital Video Coding Standards and Their Role in Video Communications", Signal Processing for Multimedia, J.S. Byrnes (Ed.), IOS Press, 1999.
Thanks For Your Attendance