
Lu Hao

Profiling-Based Hardware/Software Co-Exploration for the Design of Video Coding Architectures

Heiko Hübert and Benno Stabernack

Contents

1. Background

2. MEMTRACE profiler

3. Software/Hardware Optimization

4. Conclusion

Background -- profiling

Profiling is used to understand the run-time behavior of applications

Efficient profiling approaches

Software profiling: sampling, instrumentation

Flexible, but with high overhead

Hardware profiling: performance counters

Inexpensive, but more rigid and may not be universally available

Hybrid: combinations of the above

Hold great potential since they combine the advantages of both without the drawbacks
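To make the instrumentation approach concrete, here is a minimal sketch in Python. It is not taken from the paper. The decorator name `instrumented` and the workload `clip_pixel` are hypothetical, chosen only for illustration. Every call is counted and timed, and that bookkeeping is exactly where the overhead of software profiling comes from.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-function statistics gathered by the instrumentation.
call_counts = defaultdict(int)
elapsed_ns = defaultdict(int)

def instrumented(func):
    """Wrap func so that every call is counted and timed (hypothetical helper)."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter_ns()
        try:
            return func(*args, **kwargs)
        finally:
            # This bookkeeping runs on every call: it is the profiling overhead.
            call_counts[func.__name__] += 1
            elapsed_ns[func.__name__] += time.perf_counter_ns() - start
    return wrapper

@instrumented
def clip_pixel(value):
    """Hypothetical workload: clamp a sample to the 8-bit range."""
    return max(0, min(255, value))

clipped = [clip_pixel(v) for v in (-20, 17, 300)]
```

After the run, `call_counts["clip_pixel"]` holds the number of calls and `elapsed_ns["clip_pixel"]` the accumulated wall-clock time, which includes the cost of the wrapper itself.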

An example of hardware profiling

PC – Performance Counter

Background – system analysis

Why do we need profiling? To find an efficient solution, it is very important to adapt the system to the application.

Video coding

MEMTRACE profiler

MEMTRACE delivers cycle-accurate profiling results on a C function level.

The results include clock cycles, various memory access statistics, and optionally energy consumption estimation for reduced instruction set computer (RISC)-based processors.

A focus is placed on memory access analysis, as for data-intensive applications this aspect has a high potential for increasing system efficiency.

MEMTRACE profiling toolflow

MEMTRACE -- Initialization

MEMTRACE – Performance Analysis

MEMTRACE – Post Processing

MEMTRACE backend

MEMTRACE -- Profiling data acquisition

init(): Initializes the profiler. Creates a list of all functions and global variables.

nextInstruction(): Checks whether program execution has changed from one function to another. If so, the cycle count of the previous function is recalculated and the call count of the new function is incremented.

memoryAccess(): Determines whether a load or a store access was performed, and which bit-width (8, 16, or 32 bit) was used.

busActivity(): Identifies the bus status (idle cycle, core access, or DMA access) and increments the appropriate counter of the current function.

cacheMiss(): Called each time a cache miss occurs.

finish(): Called when the ISS (instruction set simulator) terminates the simulation.
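The callbacks listed for profiling data acquisition can be pictured with the following Python sketch. It is an illustration only, not MEMTRACE's actual interface. The class name `ProfilerSketch`, all parameter names, and the address-range lookup are assumptions, and the simulator that would invoke the callbacks is replaced by a few hand-written calls.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class FunctionStats:
    """Counters kept per function (field names are hypothetical)."""
    calls: int = 0
    cycles: int = 0
    cache_misses: int = 0
    accesses: dict = field(default_factory=lambda: defaultdict(int))
    bus: dict = field(default_factory=lambda: defaultdict(int))

class ProfilerSketch:
    def init(self, function_ranges):
        """function_ranges maps a function name to (first_addr, last_addr)."""
        self.ranges = function_ranges
        self.stats = {name: FunctionStats() for name in function_ranges}
        self.current = None      # name of the function being executed
        self.entry_cycle = 0     # cycle at which it was entered

    def _lookup(self, pc):
        for name, (first, last) in self.ranges.items():
            if first <= pc <= last:
                return name
        return None

    def next_instruction(self, pc, cycle):
        name = self._lookup(pc)
        if name != self.current:
            if self.current is not None:
                self.stats[self.current].cycles += cycle - self.entry_cycle
            if name is not None:
                # Simplification: a return into a function also counts as a call.
                self.stats[name].calls += 1
            self.current, self.entry_cycle = name, cycle

    def memory_access(self, is_store, bits):
        if self.current is not None:
            kind = "store" if is_store else "load"
            self.stats[self.current].accesses[(kind, bits)] += 1

    def bus_activity(self, status):
        # status is one of "idle", "core", "dma"
        if self.current is not None:
            self.stats[self.current].bus[status] += 1

    def cache_miss(self):
        if self.current is not None:
            self.stats[self.current].cache_misses += 1

    def finish(self, cycle):
        # Close the cycle count of the function that was running last.
        if self.current is not None:
            self.stats[self.current].cycles += cycle - self.entry_cycle
            self.current = None

profiler = ProfilerSketch()
profiler.init({"main": (0x00, 0x0F), "idct": (0x10, 0x1F)})
profiler.next_instruction(0x00, cycle=0)   # execution starts in main
profiler.next_instruction(0x10, cycle=5)   # control moves to idct
profiler.memory_access(is_store=False, bits=32)
profiler.cache_miss()
profiler.bus_activity("core")
profiler.finish(cycle=12)
```

In this toy run, `main` is charged 5 cycles and `idct` 7 cycles, and the load, the cache miss, and the bus cycle are all attributed to `idct`, the function that was active when they occurred.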

Processor model generator

Interconnection

What can we do with the results of the MEMTRACE profiler?

System partitioning

Computationally intensive functions are well-suited for hardware acceleration in a coprocessor

Control-intensive functions are better suited for software implementation on ASIPs (application-specific instruction-set processors)
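One way to picture how per-function profiling results could feed this partitioning decision is the Python sketch below. The slides do not say how "computationally intensive" or "control-intensive" is measured, so the two metrics used here (share of total cycles and fraction of branch instructions), the thresholds, and all function names and numbers are assumptions made purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class FunctionProfile:
    name: str
    cycles: int        # cycles spent in the function
    instructions: int  # instructions executed
    branches: int      # branch instructions executed

def suggest_partition(profiles, hot_share=0.2, max_branch_ratio=0.1):
    """Hypothetical heuristic: hot functions with few branches go to hardware."""
    total = sum(p.cycles for p in profiles)
    suggestion = {}
    for p in profiles:
        share = p.cycles / total if total else 0.0
        branch_ratio = p.branches / p.instructions if p.instructions else 0.0
        if share >= hot_share and branch_ratio <= max_branch_ratio:
            suggestion[p.name] = "coprocessor"
        else:
            suggestion[p.name] = "software"
    return suggestion

# Made-up profile numbers, not measurements from the paper.
profiles = [
    FunctionProfile("transform", cycles=600_000, instructions=500_000, branches=10_000),
    FunctionProfile("parse_header", cycles=100_000, instructions=80_000, branches=30_000),
    FunctionProfile("main_loop", cycles=300_000, instructions=250_000, branches=60_000),
]
suggestion = suggest_partition(profiles)
```

With these inputs only `transform` is suggested for a coprocessor: it dominates the cycle count and contains few branches. `main_loop` is hot but branch-heavy, so it stays in software.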

Software Optimization

Loop unrolling

For computationally intensive parts, arithmetic optimizations or SIMD instructions can be applied, if such instructions are available in the processor

Video applications
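As a small illustration of loop unrolling, the sketch below computes a sum of absolute differences, first with a plain loop and then unrolled by a factor of four. The kernel and the function names are assumptions chosen for illustration, not an example from the slides. The code shows only the shape of the transformation. In Python it is not expected to run faster, whereas in compiled code for an embedded processor unrolling reduces loop overhead and can expose SIMD opportunities.

```python
def sad_rolled(a, b):
    """Sum of absolute differences, one element per iteration."""
    total = 0
    for i in range(len(a)):
        total += abs(a[i] - b[i])
    return total

def sad_unrolled4(a, b):
    """Same result, but four elements are processed per loop iteration."""
    n = len(a)
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    total = 0
    i = 0
    while i < limit:
        total += (abs(a[i] - b[i]) + abs(a[i + 1] - b[i + 1])
                  + abs(a[i + 2] - b[i + 2]) + abs(a[i + 3] - b[i + 3]))
        i += 4
    while i < n:  # remainder loop for the last n % 4 elements
        total += abs(a[i] - b[i])
        i += 1
    return total

block_a = [10, 20, 30, 40, 50, 60]
block_b = [12, 18, 33, 40, 45, 61]
same = sad_rolled(block_a, block_b) == sad_unrolled4(block_a, block_b)
```

The remainder loop matters: without it, inputs whose length is not a multiple of four would be handled incorrectly.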

Hardware Optimization

Memory subsystem optimizations

External memory

Cache (cache misses)

• The data areas with the most cache misses and the smallest size should be stored in on-chip memory

SRAM

Instruction set architecture optimizations

Frequently used instructions should be considered as targets for optimization during the processor architecture development.
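The rule for on-chip memory placement (most cache misses, smallest size) can be sketched as a greedy selection by cache misses per byte. This is one possible reading of the rule, not the authors' algorithm. The function name `choose_on_chip`, the data-area names, and all numbers are hypothetical.

```python
def choose_on_chip(areas, capacity_bytes):
    """Pick data areas for on-chip memory, best misses-per-byte first.

    areas: list of (name, size_bytes, cache_misses) with size_bytes > 0.
    """
    ranked = sorted(areas, key=lambda area: area[2] / area[1], reverse=True)
    chosen = []
    free = capacity_bytes
    for name, size, _misses in ranked:
        if size <= free:  # skip areas that no longer fit
            chosen.append(name)
            free -= size
    return chosen

# Made-up per-area statistics, as a profiler might report them.
areas = [
    ("coeff_table", 512, 40_000),
    ("frame_buffer", 100_000, 90_000),
    ("line_buffer", 2_048, 30_000),
    ("lookup", 4_096, 1_000),
]
on_chip = choose_on_chip(areas, capacity_bytes=4_096)
```

With 4 KB of on-chip memory, the small and frequently missing `coeff_table` and `line_buffer` are selected, while `frame_buffer` stays in external memory even though it has the most misses in absolute terms.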

Conclusion

Profiling and system analysis

MEMTRACE architecture

• Initialization

• Performance analysis

• Post processing

Hardware/software optimization

• Software

• Hardware

Lu Hao

And questions?

References

[1] H. Hübert and B. Stabernack, "Profiling-based hardware/software co-exploration for the design of video coding architectures," IEEE Transactions on Circuits and Systems for Video Technology, 2009, pp. 1680-1691.

[2] ST Microelectronics, Nomadik STn8820 Mobile Multimedia Application Processor (2008, Feb.). Data brief. [Online]. Available: www.st.com

[3] Broadcom, BCM2820 Low Power, High Performance Application Processor (2006, Sep.). Product brief. [Online]. Available: www.broadcom.com

[4] G. de Micheli and L. Benini, Network on Chips. San Francisco, CA: Morgan Kaufmann, 2006.

[5] H. Hübert, "MEMTRACE: A memory, performance and energy profiler targeting RISC-based embedded systems for data-intensive applications," Ph.D. dissertation, Dept. Elect. Eng. Comput. Sci., Tech. Univ. Berlin, Germany, 2009. [Online]. Available: http://opus.kobv.de/tuberlin/volltexte/2009/2261