Evaluating the Imagine Stream Processor

Jung Ho Ahn, William J. Dally, Brucek Khailany, Ujval J. Kapasi, and Abhishek Das. ISCA 2004.


Page 1: Evaluating the Imagine Stream Processor

Evaluating the Imagine Stream Processor

Jung Ho Ahn, William J. Dally, Brucek Khailany, Ujval J. Kapasi, and Abhishek Das

ISCA 2004

Page 2: Evaluating the Imagine Stream Processor

Motivation

• Provide efficiency of an ASIC

• Provide flexibility of a programmable processor

• Simplify special-purpose processor design

• Lower special-purpose processor design cost

• Provide better applicability

• Target media applications

Page 3: Evaluating the Imagine Stream Processor

Stream Architecture

Page 4: Evaluating the Imagine Stream Processor

Development Board

PowerPC, 150 MHz

2 x Imagine, 200 MHz

FPGA Bridge, 66 MHz

256 MB of SDRAM per Imagine, 100 MHz

Page 5: Evaluating the Imagine Stream Processor

Applications

Page 6: Evaluating the Imagine Stream Processor

Mapping

Page 7: Evaluating the Imagine Stream Processor

Execution on a Single Stream

[Figure labels: SRF, Kernel 1, Input Stream, Output Stream, Iteration 1 … Iteration n]

Page 8: Evaluating the Imagine Stream Processor

Execution of Multiple Kernels

[Figure labels: SRF; Kernel 1, Kernel 2, Kernel 3 (each marked "processing…"); Stream 1, Stream 2, Stream 3, Stream 4]
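
The labels on this slide suggest a chain in which each kernel consumes one stream and produces the next, with intermediate streams held in the SRF rather than written back to memory. The following Python sketch illustrates that idea only. It assumes the ordering Stream 1 → Kernel 1 → Stream 2 → Kernel 2 → Stream 3 → Kernel 3 → Stream 4, which is inferred from the labels; the kernel bodies and the dictionary standing in for the SRF are hypothetical.

```python
def kernel1(stream):
    """Hypothetical kernel: add 1 to every element."""
    return [x + 1 for x in stream]

def kernel2(stream):
    """Hypothetical kernel: double every element."""
    return [x * 2 for x in stream]

def kernel3(stream):
    """Hypothetical kernel: subtract 3 from every element."""
    return [x - 3 for x in stream]

def run_pipeline(stream1):
    """Run three kernels back to back, keeping every stream in one buffer.

    The `srf` dict is only a stand-in for the stream register file:
    each kernel reads its input stream from it and writes its output
    stream back to it, so no intermediate result leaves the buffer.
    """
    srf = {"stream1": list(stream1)}
    srf["stream2"] = kernel1(srf["stream1"])
    srf["stream3"] = kernel2(srf["stream2"])
    srf["stream4"] = kernel3(srf["stream3"])
    return srf["stream4"]
```

Calling `run_pipeline` on a short list applies the three hypothetical kernels in order and returns the contents of the final stream.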

Page 9: Evaluating the Imagine Stream Processor

Application Performance

GOPS: 18%

GFLOPS: 60%

Page 10: Evaluating the Imagine Stream Processor

Sources of Overhead

Page 11: Evaluating the Imagine Stream Processor

Stream Length Effects

Page 12: Evaluating the Imagine Stream Processor

Access Pattern Effects

Page 13: Evaluating the Imagine Stream Processor

Energy Efficiency

Energy consumption per FLOP (normalized to a 0.13 µm, 1.2 V process):

Imagine @ 200 MHz: 277 pJ/FLOP

TI C67x DSP @ 225 MHz: 889 pJ/FLOP (3.2x more)

Intel Pentium M @ 1200 MHz: 3600 pJ/FLOP (13x more)
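
The ratios quoted on this slide follow directly from the per-FLOP energy figures. A minimal Python sketch of that arithmetic, using only the numbers listed above (the dictionary and function names are illustrative, not from the paper):

```python
# Energy per FLOP in picojoules, as listed on the slide
# (normalized to a 0.13 um, 1.2 V process).
ENERGY_PJ_PER_FLOP = {
    "Imagine": 277.0,
    "TI C67x DSP": 889.0,
    "Intel Pentium M": 3600.0,
}

def energy_ratio(processor, baseline="Imagine"):
    """Return how many times more energy per FLOP `processor` uses than `baseline`."""
    return ENERGY_PJ_PER_FLOP[processor] / ENERGY_PJ_PER_FLOP[baseline]

# Print each processor's energy per FLOP relative to Imagine.
for name in ENERGY_PJ_PER_FLOP:
    print(f"{name}: {energy_ratio(name):.1f}x")
```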

Page 14: Evaluating the Imagine Stream Processor

Memory Bandwidth Requirement

Page 15: Evaluating the Imagine Stream Processor

Host Processor Bandwidth Requirement

Page 16: Evaluating the Imagine Stream Processor

Programming Model

Page 17: Evaluating the Imagine Stream Processor

Compiler Optimizations: Stream Ordering

Page 18: Evaluating the Imagine Stream Processor

Compiler Optimizations: SRF Overlapping and Packing

Page 19: Evaluating the Imagine Stream Processor

Compiler Optimizations: Strip-mining
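
Strip-mining in general splits a long loop, or here a long stream, into fixed-size strips so that each strip fits in a limited on-chip buffer. The Python sketch below shows the general technique only; the strip length, the `scale_kernel` function, and all other names are hypothetical and do not reflect Imagine's actual SRF capacity or the output of its compiler.

```python
def strip_mine(stream, strip_len):
    """Yield consecutive strips of at most `strip_len` elements."""
    for start in range(0, len(stream), strip_len):
        yield stream[start:start + strip_len]

def scale_kernel(strip, factor=2):
    """Hypothetical kernel: scale every element of one strip."""
    return [factor * x for x in strip]

def run_strip_mined(stream, strip_len):
    """Process a long stream one strip at a time and concatenate the results."""
    out = []
    for strip in strip_mine(stream, strip_len):
        out.extend(scale_kernel(strip))
    return out
```

The result is the same as running the kernel over the whole stream at once; only the amount of data in flight at any moment changes.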

Page 20: Evaluating the Imagine Stream Processor

Compiler Optimizations: Loop Unrolling and Software Pipelining

Page 21: Evaluating the Imagine Stream Processor

Conclusions

• Provides performance close to that of ASIC and flexibility via programming

• Can sustain between 16% and 60% of the peak arithmetic performance

• Exposed 2-level register file allows compiler to exploit locality

• Broader applicability

• Requires considerable programming effort

• Limited to media applications with regular control flow

Page 22: Evaluating the Imagine Stream Processor

Collab Questions

• How does the performance compare to other processors? (Dan, Marko, Jason, Prateeksha, Chris)

• What is the compiler efficiency? (Mario, Liang)

• How were the design decisions motivated? (Jing, Marisabel)

• How does the programming model compare to that of GPUs? (Greg)

Page 23: Evaluating the Imagine Stream Processor
Page 24: Evaluating the Imagine Stream Processor

Kernels