Customizing Wide-SIMD Architectures for H.264

Sangwon Seo¹, Mark Woh¹, Scott Mahlke¹, Trevor Mudge¹

Vijay Sundaram², Chaitali Chakrabarti²

¹ University of Michigan

² Arizona State University


Outline

Motivation

H.264 Analysis

Proposed Architecture

H.264 Kernel Mappings

Results

Conclusion


Motivation – Smart Phone


Reference Images : http://www.apple.com/iphone/gallery/


Motivation – Inside Smart Phone


Reference Images : http://idannyb.files.wordpress.com/2008/07/xiuvbfueck3gsdum-large.jpg


H.264 Design

H.264 encoder/decoder reference design

Reference Images : I. Richardson, “H.264 and MPEG-4 Video Compression,” Wiley, 2003


H.264 – Analysis

H.264 Kernel Algorithms

Heavy SIMD workload

Different natural SIMD widths

High & Medium Thread Level Parallelism

Need to support multiple SIMD widths to maximize the SIMD utilization
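The utilization argument can be made concrete with a small sketch. The 32-lane width and the kernel widths below are illustrative assumptions, not figures from the slides, and the model assumes enough independent threads exist to fill every partition.

```python
def simd_utilization(natural_width, lanes, partitioned=False):
    """Fraction of SIMD lanes doing useful work for one kernel.

    natural_width: elements the kernel can process in lockstep.
    lanes: physical SIMD lanes (an assumed value, not from the slides).
    partitioned: if True, the lanes are split into independent groups of
                 natural_width lanes, each group running its own thread.
    """
    if natural_width >= lanes:
        # Simplification: a kernel at least as wide as the machine fills it.
        return 1.0
    if partitioned:
        groups = lanes // natural_width
        return groups * natural_width / lanes
    # Unpartitioned: only natural_width lanes are busy, the rest idle.
    return natural_width / lanes


LANES = 32  # assumed machine width
for width in (4, 8, 16):
    fixed = simd_utilization(width, LANES)
    split = simd_utilization(width, LANES, partitioned=True)
    print(f"natural width {width:2d}: fixed {fixed:.3f}, partitioned {split:.3f}")
```

Under these assumptions a 4-wide kernel uses one eighth of a fixed 32-lane unit, while partitioning the same lanes into eight 4-wide groups keeps all of them busy.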


H.264 – Analysis

Example – Deblocking Filter

Two dimensional data are used for multimedia algorithms.

Row or column order memory access works well for one set of edges, but not for the other.

Diagonal memory bank system helps to access blocks along a row or a column.
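A minimal sketch of why a diagonal (skewed) layout helps, assuming a common skewing scheme in which element (row, col) is stored in bank (row + col) mod N. The slides do not give the exact mapping, so the function names and the 4-bank size are hypothetical.

```python
N_BANKS = 4  # assumed: one bank per element of a 4-wide block edge


def row_major_bank(row, col, n=N_BANKS):
    """Conventional layout: the bank depends only on the column."""
    return col % n


def diagonal_bank(row, col, n=N_BANKS):
    """Skewed layout: each row is rotated by one bank relative to the last."""
    return (row + col) % n


def banks_for_row(bank_fn, row, n=N_BANKS):
    return [bank_fn(row, c, n) for c in range(n)]


def banks_for_column(bank_fn, col, n=N_BANKS):
    return [bank_fn(r, col, n) for r in range(n)]


def conflict_free(banks):
    """True when every element lives in a different bank (one-cycle access)."""
    return len(set(banks)) == len(banks)


print("row-major, row 1:   ", banks_for_row(row_major_bank, 1))
print("row-major, column 1:", banks_for_column(row_major_bank, 1))
print("diagonal,  row 1:   ", banks_for_row(diagonal_bank, 1))
print("diagonal,  column 1:", banks_for_column(diagonal_bank, 1))
```

With the row-major mapping a column access hits the same bank four times, while the diagonal mapping spreads both rows and columns across all banks. The elements arrive rotated, which is one reason a shuffle network is paired with the memory system.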

Figure panels: Horizontal Filtering, Vertical Filtering


H.264 – Analysis

Subgraphs for inner loops of two kernel algorithms

Large amount of data locality

Large RF power consumption (Read/Write)

Bypass and Temporary buffer support


H.264 – Analysis

Instruction Pairs

Heavy usage of shuffle and arithmetic operations

Add-Shift: rounding operation

Sub-Abs: sum of absolute differences (SAD) operation

Need to fuse the frequently used instruction pairs
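The two instruction pairs named above can be sketched as element-wise vector operations. This is an illustrative model in plain Python; the function names are hypothetical and do not correspond to the instruction set of the proposed architecture.

```python
def add_shift(a, b, shift):
    """Fused add followed by arithmetic right shift, per element.

    With b set to 1 << (shift - 1), this computes a / 2**shift rounded
    to the nearest integer, the usual rounding idiom in video filters.
    """
    return [(x + y) >> shift for x, y in zip(a, b)]


def sub_abs(a, b):
    """Fused subtract followed by absolute value, per element."""
    return [abs(x - y) for x, y in zip(a, b)]


def sad(a, b):
    """Sum of absolute differences: a reduction over sub_abs."""
    return sum(sub_abs(a, b))


# Rounding divide by 4: add a bias of 2, then shift right by 2.
values = [5, 6, 7, 8]
bias = [2] * len(values)
print(add_shift(values, bias, 2))

# SAD between a current row and a reference row, as in motion estimation.
current = [10, 20, 30, 40]
reference = [12, 18, 33, 40]
print(sad(current, reference))
```

Fusing each pair means the intermediate sum or difference never has to be written back to the register file, which ties in with the register file power concern raised earlier in the analysis.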


H.264 – Analysis

Permutation Patterns for Intraprediction

Fixed set of shuffle patterns

Need for programmable shuffle network
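As a sketch of what a shuffle pattern looks like, the example below expresses two standard H.264 4x4 intra prediction modes (vertical and horizontal) as index vectors applied by a generic shuffle. The patterns shown on the slide are not recoverable from this transcript, so these two are stand-ins, and the `shuffle` helper is hypothetical.

```python
def shuffle(vector, pattern):
    """Generic permutation/replication: out[i] = vector[pattern[i]]."""
    return [vector[i] for i in pattern]


# The 4x4 predicted block is produced in row-major order (16 elements).
# Vertical mode: every row is a copy of the four pixels above the block.
VERTICAL = [0, 1, 2, 3] * 4

# Horizontal mode: each row repeats the pixel to the left of that row.
HORIZONTAL = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]

top_neighbors = [10, 20, 30, 40]
left_neighbors = [50, 60, 70, 80]

vertical_block = shuffle(top_neighbors, VERTICAL)
horizontal_block = shuffle(left_neighbors, HORIZONTAL)

for r in range(4):
    print(vertical_block[4 * r:4 * r + 4], horizontal_block[4 * r:4 * r + 4])
```

Because each mode is just a different index vector, a programmable network can hold the small set of patterns and select one per prediction mode.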


Modified SIMD architecture


Modified SIMD architecture


Multiple SIMD widths

Thread-Level Parallelism


Modified SIMD architecture


Diagonal Memory Organization

Memory Bank System + Shuffle Network


Modified SIMD architecture


Short-lived values stored in temporary buffers


Modified SIMD architecture


Short-lived values

Fused Operation


Modified SIMD architecture


Shuffle networks are placed at several points in the datapath to align data


Mapping of H.264 Kernels

Intra Prediction


Results

System Breakdown

H.264 CIF video at 30fps
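For a sense of the workload this target implies, the arithmetic below derives the macroblock rate from the standard CIF resolution (352x288) and the 16x16 H.264 macroblock size. Only the "CIF at 30fps" target comes from the slide; the helper names are illustrative.

```python
CIF_WIDTH, CIF_HEIGHT = 352, 288  # standard CIF luma resolution
MACROBLOCK_SIZE = 16              # H.264 macroblocks are 16x16 luma samples
FRAMES_PER_SECOND = 30            # target stated on the slide


def macroblocks_per_frame(width, height, mb_size=MACROBLOCK_SIZE):
    """Number of whole macroblocks covering a frame."""
    return (width // mb_size) * (height // mb_size)


def macroblocks_per_second(width, height, fps, mb_size=MACROBLOCK_SIZE):
    """Sustained macroblock rate required for real-time processing."""
    return macroblocks_per_frame(width, height, mb_size) * fps


per_frame = macroblocks_per_frame(CIF_WIDTH, CIF_HEIGHT)
per_second = macroblocks_per_second(CIF_WIDTH, CIF_HEIGHT, FRAMES_PER_SECOND)
print(per_frame, per_second)
```

That is 396 macroblocks per frame and 11,880 per second, leaving roughly 84 microseconds per macroblock for all kernels combined.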


Results

Speedup Breakdown

2.13x performance increase on average


Results

Energy-Delay product comparison

29% energy-delay improvement on average


Results


Comparison with latest H.264 encoders

[17] T. C. Chen et al., “2.8 to 62.7 mW low-power and power-aware H.264 encoder for mobile applications,” 2007 IEEE Symposium on VLSI Circuits, pp. 222–223, June 2007.

[18] M. Bhatnagar, “TMS320DM6446/3 Power Consumption Summary,” Texas Instruments Application Reports, http://focus.ti.com/lit/an/spraad6a/spraad6a.pdf, Feb. 2008.


Conclusion

Key architectural enhancements

SIMD partitioning

Diagonal memory bank system

Bypass and temporary buffer support

Fused operation support

Programmable crossbar

Future work

Image processing algorithms on SIMD architecture


Backup Slides


H.264 – Analysis

Diagonal Memory Organization

Two dimensional data are used for multimedia algorithms.

Blocks along a row or a column need to be accessed easily.


Mapping of H.264 Kernels

Deblocking Filter


Mapping of H.264 Kernels

Motion Compensation


Mapping of H.264 Kernels

Motion Estimation
