Customizing Wide-SIMD Architectures for H.264
Sangwon Seo¹, Mark Woh¹, Scott Mahlke¹, Trevor Mudge¹
Vijay Sundaram², Chaitali Chakrabarti²
¹ University of Michigan
² Arizona State University
Outline
Motivation
H.264 Analysis
Proposed Architecture
H.264 Kernel Mappings
Results
Conclusion
Motivation – Smart Phone
Reference Images: http://www.apple.com/iphone/gallery/
Motivation – Inside Smart Phone
Reference Images: http://idannyb.files.wordpress.com/2008/07/xiuvbfueck3gsdum-large.jpg
H.264 Design
Reference Images: I. Richardson, “H.264 and MPEG-4 Video Compression,” Wiley, 2003
H.264 encoder/decoder reference design
H.264 – Analysis
H.264 Kernel Algorithms
Heavy SIMD workload
Different natural SIMD widths
High & Medium Thread-Level Parallelism
Need to support multiple SIMD widths to maximize the SIMD utilization
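The utilization argument above can be illustrated with a small calculation. This is only a sketch under stated assumptions: the lane counts and the set of selectable group widths are hypothetical and are not taken from the slides, and the model assumes that enough independent threads exist to keep every lane group busy.

```python
import math

def lane_utilization(natural_width, simd_width):
    """Fraction of lanes doing useful work when a kernel whose natural
    data-parallel width is `natural_width` runs on `simd_width` lanes."""
    passes = math.ceil(natural_width / simd_width)
    return natural_width / (passes * simd_width)

def best_group_width(natural_width, group_widths=(4, 8, 16, 32)):
    """Pick the group width with the highest utilization for this kernel.

    `group_widths` is a hypothetical set of ways to partition the datapath.
    Ties are broken in favor of the wider group, since it needs fewer passes.
    """
    return max(group_widths,
               key=lambda w: (lane_utilization(natural_width, w), w))

# Kernels with different (hypothetical) natural widths on a fixed 32-lane datapath
for width in (4, 8, 16):
    fixed = lane_utilization(width, 32)
    chosen = best_group_width(width)
    print(width, fixed, chosen, lane_utilization(width, chosen))
```

In this model a kernel with a natural width of 4 keeps only one eighth of a fixed 32-lane datapath busy, while a datapath that can be split into narrower groups keeps every lane of the chosen group busy.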
H.264 – Analysis
Example – Deblocking Filter
Multimedia algorithms operate on two-dimensional data.
Row or column order memory access works well for one set of edges, but not for the other.
Diagonal memory bank system helps to access blocks along a row or a column.
Figure panels: Horizontal Filtering, Vertical Filtering
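One common way to build a diagonal bank layout is to skew each row by one bank. The sketch below assumes that scheme, a hypothetical bank count of 4, and single-ported banks; the slides do not give the actual mapping function, so `diagonal_bank` is an illustration, not the authors' design.

```python
N_BANKS = 4  # hypothetical bank count, chosen to match a 4x4 block

def diagonal_bank(row, col, n_banks=N_BANKS):
    # Skewed placement (assumed): each row is rotated by one bank
    # relative to the row above it.
    return (row + col) % n_banks

def row_major_bank(row, col, n_banks=N_BANKS):
    # Conventional placement for comparison: the bank depends on the column only.
    return col % n_banks

def is_conflict_free(coords, bank_fn):
    # With single-ported banks (assumed), a parallel access needs every
    # element to sit in a different bank.
    banks = [bank_fn(r, c) for r, c in coords]
    return len(set(banks)) == len(banks)

row_access = [(2, c) for c in range(N_BANKS)]     # four pixels along row 2
column_access = [(r, 1) for r in range(N_BANKS)]  # four pixels along column 1

print(is_conflict_free(row_access, diagonal_bank))      # row access, diagonal layout
print(is_conflict_free(column_access, diagonal_bank))   # column access, diagonal layout
print(is_conflict_free(column_access, row_major_bank))  # column access, row-major layout
```

With the skewed layout both the row and the column access touch four distinct banks, whereas the row-major layout sends the whole column access to a single bank. The elements come back in rotated order, which is one reason such a memory is typically paired with a shuffle network.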
H.264 – Analysis
Subgraphs for the inner loops of two kernel algorithms
Large amount of data locality
Large RF power consumption (Read/Write)
Bypass and Temporary buffer support
H.264 – Analysis
Instruction Pairs
Heavy usage of shuffle and arithmetic operations
Add-Shift : round operation
Sub-Abs : SAD operation
Need to fuse the frequently used instruction pairs
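The two instruction pairs named above can be written as lane-wise operations. This is a minimal sketch: the function names, the rounding offset, and the shift amount are illustrative choices, not the instruction set of the proposed architecture.

```python
def add_shift(lanes, offset, shift):
    """Add-Shift pair: add a rounding offset, then shift right, per lane."""
    return [(x + offset) >> shift for x in lanes]

def sub_abs(lanes_a, lanes_b):
    """Sub-Abs pair: absolute difference per lane."""
    return [abs(a - b) for a, b in zip(lanes_a, lanes_b)]

def sad(lanes_a, lanes_b):
    """Sum of absolute differences, built on the Sub-Abs pair."""
    return sum(sub_abs(lanes_a, lanes_b))

# Rounded division by 32: add half the divisor (16), then shift right by 5.
print(add_shift([0, 15, 16, 48, 100], offset=16, shift=5))

# SAD between a current block row and a reference block row.
print(sad([10, 20, 30, 40], [12, 18, 33, 40]))
```

Each helper performs two dependent operations on the same lane, so a fused unit could produce the result without writing the intermediate value back to the register file.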
H.264 – Analysis
Permutation Patterns for Intraprediction
Fixed set of shuffle patterns
Need for programmable shuffle network
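A shuffle pattern can be modeled as a vector of source indices, one per output lane. The sketch below uses the vertical and horizontal 4x4 intra prediction modes as examples; the table layout and the names `PATTERNS` and `predict_4x4` are assumptions for illustration, not the patterns shown on the slide.

```python
def shuffle(src, pattern):
    # Output lane i takes its value from input lane pattern[i].
    return [src[i] for i in pattern]

# A small, fixed table of patterns, as a programmable network might store it.
PATTERNS = {
    # Vertical mode: every row of the 4x4 block repeats the four top neighbors.
    "vertical": [c for _ in range(4) for c in range(4)],
    # Horizontal mode: every row is filled with that row's left neighbor.
    "horizontal": [r for r in range(4) for _ in range(4)],
}

def predict_4x4(mode, neighbors):
    """Return the 16 predicted pixels in row-major order.

    `neighbors` is the top row for "vertical" and the left column
    for "horizontal" (4 pixels in both cases).
    """
    return shuffle(neighbors, PATTERNS[mode])

print(predict_4x4("vertical", [10, 20, 30, 40]))
print(predict_4x4("horizontal", [1, 2, 3, 4]))
```

Because the set of patterns is small and fixed, switching prediction modes amounts to selecting a different entry in the table rather than rewiring the network.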
Modified SIMD architecture
Modified SIMD architecture
Multiple SIMD widths
Thread-Level Parallelism
Modified SIMD architecture
Diagonal Memory Organization
Memory Bank System + Shuffle Network
Modified SIMD architecture
Short-lived values stored in temporary buffers
Modified SIMD architecture
Short-lived values
Fused Operation
Modified SIMD architecture
Shuffle networks are placed at several points in the datapath to align data
Mapping of H.264 Kernels
Intra Prediction
Results
System Breakdown
H.264 CIF video at 30fps
Results
Speedup Breakdown
2.13x performance increase on average
Results
Energy-Delay product comparison
29% energy-delay improvement on average
Results
Comparison with latest H.264 encoders
[17] T. C. Chen et al., “2.8 to 62.7 mW low-power and power-aware H.264 encoder for mobile applications,” 2007 IEEE Symposium on VLSI Circuits, pp. 222–223, June 2007.
[18] M. Bhatnagar, “TMS320DM6446/3 Power Consumption Summary,” Texas Instruments Application Reports, http://focus.ti.com/lit/an/spraad6a/spraad6a.pdf, Feb. 2008.
Conclusion
Key architectural enhancements
SIMD partitioning
Diagonal memory bank system
Bypass and temporary buffer support
Fused operation support
Programmable crossbar
Future work
Image processing algorithms on SIMD architecture
Backup Slides
H.264 – Analysis
Diagonal Memory Organization
Multimedia algorithms operate on two-dimensional data.
Blocks along a row or a column need to be accessed easily.
Mapping of H.264 Kernels
Deblocking Filter
Mapping of H.264 Kernels
Motion Compensation
Mapping of H.264 Kernels
Motion Estimation