A New Generation of DSP Architectures Bryan Ackland and Paul D’Arcy Lucent Technologies Paper...


  • A New Generation of DSP Architectures

    Bryan Ackland and Paul D’Arcy, Lucent Technologies

    Paper Review: Babak Noory
    Professor Maitham Shams
    97.575
    March 18, 2002

  • Agenda

    - Look at the evolution of Digital Signal Processors
    - Review the emerging system requirements
    - Summarize recent advances in low power DSP techniques
    - Look at a number of new high performance architectures
    - Describe a bus based multi-core architecture for task level parallelism

  • Introduction: General Purpose Digital Signal Processors

    Introduced in 1980
    - High performance engines
    - MAC (multiply-accumulate) speed advantage of 50:1 over the best microprocessors

    Today
    - Modest performance improvements
    - Outperformed by microprocessors

  • DSP Evolution

    [Figure: Performance of DSPs vs. Microprocessors]

    And yet, DSPs generate over $3 billion for the semiconductor industry every year.

  • DSP Evolution

    - Lower cost
    - Higher MOP/mm² and MOP/mW

    [Figure: Power and Cost of DSPs vs. Microprocessors]

  • Emerging Applications

    Very Low Power Applications
    - Portable applications: functionality such as video and web browsing added to cellular phones, PDAs, and multimedia laptops
    - Average power becomes the main design constraint

    High Performance Applications
    - Embedded applications: digital audio broadcast and smart phones
    - PC-based applications: 3-D graphics and real-time video communications
    - Infrastructure applications: modem head-end and wireless base stations

  • Low Power Techniques: Full Custom Datapath Layout

    - Circuit topology
    - Transistor sizing
    - Layout parasitics

  • Low Power Techniques: Clock Gating

    - System level clock gating: limit data transitions and clock dissipation to active sub-systems
    - Local clock gating: deactivate non-active elements in a sequential circuit

    Operation Mode         Power
    Normal Mode (80 MHz)   120 mW
    Standby (Halt)         21 mW
    Slow Clock (16 kHz)    2.3 mW
    StopClk                30 µW

    Courtesy [4]
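
To see why these modes matter when average power is the constraint, the per-mode figures can be combined into a time-weighted average. The sketch below is only an illustration: the power values come from the table, while the duty cycle (10% normal, 90% StopClk) and the names `MODE_POWER_MW` and `average_power_mw` are assumptions made for the example.

```python
# Power per operating mode in milliwatts, taken from the table above.
MODE_POWER_MW = {
    "normal": 120.0,     # 80 MHz
    "standby": 21.0,     # halt
    "slow_clock": 2.3,   # 16 kHz
    "stop_clk": 0.030,   # 30 uW
}


def average_power_mw(time_fractions):
    """Return the time-weighted average power for a mix of modes.

    time_fractions maps a mode name to the fraction of time spent in it;
    the fractions must sum to 1.
    """
    total = sum(time_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(MODE_POWER_MW[mode] * frac for mode, frac in time_fractions.items())


# Hypothetical duty cycle (an assumption, not from the slides):
# active 10% of the time, clock stopped for the remaining 90%.
duty_cycle = {"normal": 0.10, "stop_clk": 0.90}
print(f"{average_power_mw(duty_cycle):.3f} mW")
```

Under this assumed duty cycle the average is dominated by the active fraction, which is why the idle modes need to be so much cheaper than normal operation.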

  • Low Power Techniques: Minimizing Data Transitions

    - Applicable to circuits where data transitions are well understood
    - Difficult to estimate internal node activity for complex circuits

    Courtesy [3]
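
A small case where the transitions are well understood is a counter. The sketch below counts bit flips for a 4-bit counter in plain binary and in Gray code. Gray coding is chosen here purely as an illustration of reduced switching; the slides do not name a specific encoding, and the function names are hypothetical.

```python
def transitions(sequence):
    """Count the total number of bit flips between consecutive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(sequence, sequence[1:]))


def to_gray(n):
    """Convert a non-negative integer to its reflected binary Gray code."""
    return n ^ (n >> 1)


binary_count = list(range(16))                      # 4-bit binary counter
gray_count = [to_gray(n) for n in binary_count]     # same count, Gray coded

print("binary:", transitions(binary_count))
print("gray:  ", transitions(gray_count))
```

Gray code changes exactly one bit per step, so the 15 steps cost 15 flips, while the binary sequence flips several bits on each carry.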

  • Low Power Techniques: Partitioned Memory Architecture

    Memories occupy a great deal of silicon area, but activity factors in the individual circuits are very low.

    - Adopt hierarchical sub-banking
    - Replace large memory blocks with several smaller blocks
    - Make use of gated clocks to limit switching activity to active blocks
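
As a rough sketch of the sub-banking idea, an address can be split into a bank index and an offset within the bank, so that only the addressed bank needs a clock on a given access. The bank count and size below (4 banks of 1024 words) and the helper names are assumptions for illustration only.

```python
def decode(address, bank_words):
    """Split a flat word address into (bank index, offset within the bank)."""
    return divmod(address, bank_words)


def bank_activity(addresses, bank_words, num_banks):
    """Count how many accesses land in each bank.

    In a gated-clock design only the addressed bank is clocked per access,
    so these counts stand in for per-bank switching activity.
    """
    counts = [0] * num_banks
    for addr in addresses:
        bank, _offset = decode(addr, bank_words)
        if bank >= num_banks:
            raise ValueError(f"address {addr} is outside the memory")
        counts[bank] += 1
    return counts


# Hypothetical memory: 4 banks of 1024 words each.
trace = [0, 1, 2, 1024, 4095]
print(bank_activity(trace, bank_words=1024, num_banks=4))
```

A loop that stays within one bank leaves the other three idle for its whole duration, which is the locality that gated clocks exploit.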

  • Low Power Techniques: Technology & Voltage Scaling

    Adjusting supply voltages to meet performance requirements

    - Mixed voltage and mixed threshold logic families
    - Dynamic voltage scaling: supply voltage and clock speed vary continuously according to processor load
    - Supply cut-off: high-threshold transistors are used to cut off the power when the chip goes into sleep mode
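
The payoff of dynamic voltage scaling follows from the standard first-order model of CMOS switching power, P = a · C · Vdd² · f. The sketch below applies that model to two operating points. The capacitance, voltages, and frequencies are hypothetical, and it assumes the supply can be halved when the clock is halved, which is a simplification of real voltage-frequency curves.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """First-order CMOS switching power in watts: activity * C * Vdd^2 * f."""
    return activity * c_eff * vdd ** 2 * freq


# Hypothetical operating points (assumptions, not from the slides).
full_speed = dynamic_power(c_eff=1e-9, vdd=1.8, freq=100e6)
half_speed = dynamic_power(c_eff=1e-9, vdd=0.9, freq=50e6)

print(f"full speed: {full_speed * 1e3:.1f} mW")
print(f"half speed: {half_speed * 1e3:.1f} mW")
print(f"ratio:      {half_speed / full_speed:.3f}")
```

In this model, halving the frequency alone would halve the power; lowering the voltage with it adds a further factor of four, because power depends on the square of the supply voltage.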

  • Emerging Applications (Revisited)

    Very Low Power Applications
    - Portable applications: functionality such as video and web browsing added to cellular phones, PDAs, and multimedia laptops
    - Average power becomes the main design constraint

    High Performance Applications
    - Embedded applications: digital audio broadcast and smart phones
    - PC-based applications: 3-D graphics and real-time video communications
    - Infrastructure applications: modem head-end and wireless base stations

  • New Class of Architectures

    Minor enhancements in combination with process improvement will not meet the requirements of emerging applications. The new architectures must provide:

    Performance ranging from hundreds of MOPS to tens of GOPS
    - Parallel architectures, many operations/clock

    Large memory and I/O bandwidth
    - Cache hierarchies

    Compiler driven programming environment
    - High-level programming languages

    Scalability
    - Range of cost/performance targets

  • Media Processors

    - Very high performance
    - Very fast memories
    - Yet all programs (save Tri-Media) have been cancelled

    Processor          Architecture                 Clock    Performance  Memory            Programming
    TI C80             4 x 64b DSP + 32b RISC       40 MHz   1.2 GOPS     DRAM, 400 MB/s    Compiler + assembler
    Chromatic MPACT    VLIW/SIMD, 4 ALUs            62 MHz   2.0 GOPS     RAMBUS, 500 MB/s  In-house
    Philips Tri-Media  VLIW, 25 exec. units         100 MHz  4.0 GOPS     SDRAM, 400 MB/s   VLIW compiler
    IBM MFAST          VLIW/SIMD, 4x4 folded array  50 MHz   20 GOPS      SDRAM, 800 MB/s   Compiler + assembler
    Samsung MSP-1      32-way SIMD + 32b RISC       100 MHz  6.4 GOPS     SDRAM, 800 MB/s   Compiler + assembler
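
Dividing each peak performance figure by its clock rate shows how many operations per clock these parallel designs claim. The sketch below does that arithmetic with the clock and GOPS values from the table. How each vendor counts an "operation" is not stated in the slides, so the results are only comparable as listed peak figures.

```python
# (clock in MHz, peak performance in GOPS), copied from the table above.
MEDIA_PROCESSORS = {
    "TI C80": (40, 1.2),
    "Chromatic MPACT": (62, 2.0),
    "Philips Tri-Media": (100, 4.0),
    "IBM MFAST": (50, 20.0),
    "Samsung MSP-1": (100, 6.4),
}


def ops_per_clock(clock_mhz, gops):
    """Peak operations completed per clock cycle."""
    return (gops * 1e9) / (clock_mhz * 1e6)


for name, (clock_mhz, gops) in MEDIA_PROCESSORS.items():
    print(f"{name:18s} {ops_per_clock(clock_mhz, gops):6.1f} ops/clock")
```

Every entry works out to roughly 30 operations per clock or more, which is the "many operations/clock" requirement in concrete terms.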

  • Media Processors: Reasons

    Programmability issues
    - Required large quantities of assembly code
    - Explicit management of task level and instruction level parallelism

    Lack of scalability
    - Single price/performance point (except for C80)

    Difficult market
    - Multimedia applications on PC
    - Caught between high-performance ASICs and software solutions

  • Daytona MIMD Architecture

    Task level parallelism
    - Code and data

    Scalability
    - Bus support for N DSP cores

    [Figure labels: Cache memory; Memory & I/O controller; Ext. mem; I/O; Host]

    Simulation has shown that N can be in the range of 8 to 10 processors.

  • Daytona DSP Core Architecture

    LIW machine
    - 32b SPARC + 64b SIMD

    Instruction level parallelism:
    - 64b instructions
    - 2 x 32b RISC operations
    - 32b RISC + 32b coprocessor extension

    DSP core programming in C

    [Figure labels: Bus interface; 8 kB instruction and data cache; 32b SPARC RISC µP; 64b 8-way SIMD vector coprocessor]
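
To illustrate what an 8-way SIMD operation on a 64-bit word looks like, the sketch below packs eight 8-bit values into one integer and adds two such words lane by lane. The 8-bit lane width is an assumption derived from dividing 64 bits by 8 lanes, and the functions are hypothetical; this is not Daytona's instruction set.

```python
LANES = 8
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1
HIGH_BITS = 0x8080808080808080   # top bit of every 8-bit lane
LOW_BITS = 0x7F7F7F7F7F7F7F7F    # low 7 bits of every 8-bit lane


def pack(values):
    """Pack eight 8-bit values into a 64-bit word, lane 0 in the low byte."""
    if len(values) != LANES:
        raise ValueError("expected exactly 8 lane values")
    word = 0
    for i, value in enumerate(values):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its eight 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def simd_add(a, b):
    """Add two packed words lane by lane, wrapping modulo 256 in each lane.

    The low 7 bits of each lane are added first so no carry can cross a
    lane boundary; the top bit of each lane is then fixed up with XOR.
    """
    low_sum = (a & LOW_BITS) + (b & LOW_BITS)
    return low_sum ^ ((a ^ b) & HIGH_BITS)


x = pack([1, 2, 3, 4, 5, 6, 7, 8])
y = pack([10, 20, 30, 40, 50, 60, 70, 80])
print(unpack(simd_add(x, y)))
```

One word-wide operation here does the work of eight scalar additions, which is the kind of data parallelism a vector coprocessor adds alongside the RISC core.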

  • Conclusions (1)

    The DSP world is changing.

    Emerging applications, in combination with few backward compatibility issues, require new architectures that can maximize:
    - Parallelism
    - Scalability
    - Programmability
    - Generality

    while other measures must be taken to minimize:
    - Cost
    - Time to market

  • Conclusions (2)

    The DSP world is changing.

    What will separate DSPs from general purpose microprocessors in the future will simply be the cost factor.

    Advances in the programmable hardware field are also very promising, and could further change the DSP landscape in the future.

  • References

    [1] A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Kluwer Academic Publishers: Norwell, 1995.
    [2] K. D. Wagner, Clock System Design, IEEE Design & Test of Computers, pp. 9-27, October 1988.
    [3] L. Wanhammar, DSP Integrated Circuits, Academic Press: London, 1999.
    [4] K. Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill: New York, 1993.
    [5] T. Kuroda and T. Sakurai, Overview of Low-Power ULSI Circuit Techniques, IEICE Transactions on Electronics, Vol. E78-C, No. 4, pp. 334-344, April 1995.
    [6] C. Hamacher, Z. Vranesic and S. Zaky, Computer Organization, fifth edition, McGraw-Hill: New York, 2002.
    [7] M. M. Mano, Computer System Architecture, McGraw-Hill: New York, 1993.
