A New Generation of DSP Architectures Bryan Ackland and Paul D’Arcy Lucent Technologies Paper...

Post on 30-Dec-2015


A New Generation of DSP Architectures

Bryan Ackland and Paul D’Arcy, Lucent Technologies

Paper Review

Babak Noory

Professor Maitham Shams

97.575

March 18, 2002

Agenda

1. Look at the evolution of Digital Signal Processors

2. Review the emerging system requirements

3. Summarize recent advances in low power DSP techniques

4. Look at a number of new high performance architectures

5. Describe a bus based multi-core architecture for task level parallelism

Introduction

General Purpose Digital Signal Processors

Introduced in 1980

- High performance engines

- MAC speed advantage of 50:1 over the best micro-processors

Today

- Modest performance improvements

- Outperformed by micro-processors

DSP Evolution

[Figure: Performance of DSPs vs. Microprocessors. Performance (peak MACs, log scale from 1 to 1K) plotted against year (1980 to 2000) for the M68000, 80286, 80386, Pentium, and Pentium MMX microprocessors and the DSP-1, DSP-16, DSP-32C, DSP-1600, and DSP-16210 DSPs.]

And yet, DSPs generate over $3 billion for the semiconductor industry every year.

DSP Evolution

[Figure: Power and Cost of DSPs vs. Microprocessors. Power (mW/MIP, log scale from 1 to 10K) plotted against year (1980 to 2000), with unit prices: M68000 ($200), 80286 ($200), 80386 ($300), Pentium ($500), DSP-1 ($150), DSP-32C ($250), DSP-16A ($15), DSP-1600 (<$10).]

- Lower cost

- Higher MOP/mm2 and MOP/mW

Emerging Applications

Very Low Power Applications

Portable Applications: functionalities such as video and web browsing added to cellular phones, PDAs, and Multimedia Laptops

Average power becomes the main design constraint

High Performance Applications

Embedded Applications: digital audio broadcast and smart phones

PC based Applications: 3-D graphics and real-time video communications

Infrastructure Applications: modem head-end and wireless basestations

Low Power Techniques

1. Full Custom Datapath Layout

Circuit Topology

Transistor Sizing

Layout Parasitics

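Layout Topology   Drain Capacitance
Simple            45.6 fF
Finger            18.7 fF
Ring              10.8 fF

[Figure: three transistor layouts with source (S) and drain (D) regions marked: a) Simple, drawn at width W; b) Finger, drawn at width W/2; c) Ring, drawn at width W/4.]

Courtesy [1]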

[Figure: clock distribution with gating. Recoverable labels: Crystal Oscillator, System Clock, gating (&) elements for Gate CPU Sections 1 to 3, with outputs to boards 1-3, boards 4-6, and boards 7-9.]

Low Power Techniques

2. Clock Gating

System Level Clock Gating: Limit data transition and clock dissipation to active sub-systems

Local Clock Gating: Deactivate non-active elements in a sequential circuit

Courtesy [4]

Operation Mode         Power
Normal Mode (80 MHz)   120 mW
Standby (Halt)         21 mW
Slow Clock (16 kHz)    2.3 mW
StopClk                30 µW
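
The effect of the operating modes on average power can be sketched with a time-weighted sum. The per-mode power figures below come from the table above; the usage profile (fraction of time spent in each mode) is a hypothetical assumption chosen only for illustration, and the sketch ignores any energy spent switching between modes.

```python
# Power per operating mode in milliwatts, taken from the table above.
MODE_POWER_MW = {
    "normal": 120.0,     # Normal Mode (80 MHz)
    "standby": 21.0,     # Standby (Halt)
    "slow_clock": 2.3,   # Slow Clock (16 kHz)
    "stop_clk": 0.030,   # StopClk (30 uW)
}


def average_power_mw(time_fractions):
    """Return the time-weighted average power for a mix of operating modes."""
    total = sum(time_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(MODE_POWER_MW[mode] * frac for mode, frac in time_fractions.items())


# Hypothetical usage profile (an assumption, not from the source):
# 10% active, 20% halted, 70% with the clock stopped.
profile = {"normal": 0.10, "standby": 0.20, "stop_clk": 0.70}
avg = average_power_mw(profile)
print(f"average power: {avg:.3f} mW")
```

Under this assumed profile the average is dominated by the active and standby terms, which is why average power, rather than peak power, is the quantity that clock gating targets.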

Low Power Techniques

3. Minimizing Data Transitions

Applicable to circuits where data transitions are well understood

Difficult to estimate internal node activity for complex circuits

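Signal probabilities: P(A=1) = 0.5, P(B=1) = 0.2, P(C=1) = 0.1

[Figure: two gate-level implementations of an output Z from inputs A, B, and C, each with an internal node x. First circuit: activity at node x = 0.09. Second circuit: activity at node x = 0.0196.]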

Courtesy [3]
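
The two activity figures above can be reproduced with a short calculation. This is a sketch under stated assumptions: the gates are taken to be 2-input AND gates, the inputs are statistically independent, and activity is the probability of a 0 to 1 transition between two independent samples, p(1 - p). With those assumptions, 0.09 corresponds to combining A and B first and 0.0196 to combining B and C first. The function names are illustrative.

```python
# Signal probabilities from the slide.
P = {"A": 0.5, "B": 0.2, "C": 0.1}


def and2(p_a, p_b):
    """P(output = 1) for a 2-input AND gate with independent inputs (assumed)."""
    return p_a * p_b


def activity(p_one):
    """Probability of a 0 -> 1 transition between two independent samples."""
    return (1.0 - p_one) * p_one


def internal_node_activity(first, second):
    """Activity at node x when inputs `first` and `second` feed the first gate."""
    return activity(and2(P[first], P[second]))


for pair in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(pair, round(internal_node_activity(*pair), 4))

# The output Z = A.B.C has the same activity in every ordering.
z_activity = activity(P["A"] * P["B"] * P["C"])
print("Z", round(z_activity, 4))
```

Feeding the two least active inputs into the first gate gives the lowest internal activity, while the output activity is unchanged because the logic function is the same.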

Low Power Techniques

4. Partitioned Memory Architecture

Memories occupy a great deal of silicon area, but activity factors in these individual circuits are very low.

Adopt hierarchical sub-banking

Replace large memory blocks with several smaller blocks

Make use of gated clocks to limit switching activity to active blocks
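
The sub-banking idea above can be sketched as a small behavioural model. Everything here is illustrative: the class name, the word-addressed layout, and the use of the upper address bits as the bank select are assumptions, and the activation counter merely stands in for "this bank's clock was enabled for one access".

```python
class BankedMemory:
    """Word-addressed memory split into equal banks; one bank is enabled per access."""

    def __init__(self, total_words, num_banks):
        if num_banks <= 0 or total_words % num_banks != 0:
            raise ValueError("total_words must be a positive multiple of num_banks")
        self.bank_size = total_words // num_banks
        self.banks = [[0] * self.bank_size for _ in range(num_banks)]
        # Number of accesses for which each bank's (gated) clock was enabled.
        self.activations = [0] * num_banks

    def _enable(self, addr):
        """Decode the address and enable only the bank that holds it."""
        if not 0 <= addr < self.bank_size * len(self.banks):
            raise IndexError("address out of range")
        bank, offset = divmod(addr, self.bank_size)
        self.activations[bank] += 1
        return bank, offset

    def read(self, addr):
        bank, offset = self._enable(addr)
        return self.banks[bank][offset]

    def write(self, addr, value):
        bank, offset = self._enable(addr)
        self.banks[bank][offset] = value


mem = BankedMemory(total_words=1024, num_banks=4)
for addr in range(10):
    mem.write(addr, addr * addr)   # all ten writes land in bank 0
value = mem.read(300)              # address 300 falls in bank 1
print(mem.activations)
```

In this model a monolithic memory would be touched on every access, whereas here three of the four banks stay idle for most of the run.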

Low Power Techniques

5. Technology & Voltage Scaling

Adjusting supply voltages to meet performance requirements

Mixed voltage & mixed threshold logic families

Dynamic voltage scaling: Supply voltage and clock speed vary continuously according to processor load

Supply “cut-off”: high-threshold transistors are used to cut off the power when the chip goes into sleep mode
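
The benefit of dynamic voltage scaling can be sketched with the first-order CMOS relation for dynamic power, P = C·V²·f. The sketch below additionally assumes that the maximum clock frequency scales linearly with supply voltage, which ignores threshold voltage and leakage; the capacitance, nominal voltage, and nominal frequency are hypothetical values, not taken from the source.

```python
# Hypothetical operating point (assumptions for illustration only).
C_EFF = 1e-9    # effective switched capacitance in farads
V_NOM = 3.3     # nominal supply voltage in volts
F_NOM = 80e6    # nominal clock frequency in hertz


def dynamic_power(c_eff, vdd, freq):
    """First-order dynamic power: P = C * V^2 * f."""
    return c_eff * vdd ** 2 * freq


def scaled_operating_point(load):
    """Supply and clock for a given load (0 < load <= 1), assuming f scales with V."""
    if not 0.0 < load <= 1.0:
        raise ValueError("load must be in (0, 1]")
    return V_NOM * load, F_NOM * load


def energy_for_cycles(vdd, cycles):
    """Energy to execute a fixed number of clock cycles at supply vdd."""
    return C_EFF * vdd ** 2 * cycles


cycles = 40e6                              # work per second at 50% load
e_full = energy_for_cycles(V_NOM, cycles)  # run at full speed, then idle
v_half, f_half = scaled_operating_point(0.5)
e_scaled = energy_for_cycles(v_half, cycles)
ratio = e_scaled / e_full
print(f"scaled supply: {v_half:.2f} V at {f_half / 1e6:.0f} MHz")
print(f"energy relative to full-speed operation: {ratio:.2f}")
```

Because energy per cycle depends on V² and not on f, slowing the clock alone saves no energy for a fixed workload; the saving comes from the lower supply voltage that the slower clock permits.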

Emerging Applications (Revisited)

Very Low Power Applications

Portable Applications: functionalities such as video and web browsing added to cellular phones, PDAs, and Multimedia Laptops

Average power becomes the main design constraint

High Performance Applications

Embedded Applications: digital audio broadcast and smart phones

PC based Applications: 3-D graphics and real-time video communications

Infrastructure Applications: modem head-end and wireless basestations

Minor enhancements in combination with process improvement will not meet the requirements of emerging applications. The new architectures must provide:

Performance ranging from hundreds of MOPS to tens of GOPS

- Parallel architectures, many operations/clock

Large memory and I/O bandwidth

- Cache hierarchies

Compiler driven programming environment

- High-level programming languages

Scalability

- Range of cost/performance targets

New Class of Architectures

Media Processors

Processor          Architecture                 Clock    Performance  Memory            Programming
TI C80             4 64b DSP + 32b RISC         40 MHz   1.2 GOPS     DRAM, 400 MB/s    Compiler + Assembler
Chromatics MPACT   VLIW/SIMD, 4 ALUs            62 MHz   2.0 GOPS     RAMBUS, 500 MB/s  In-house
Philips Tri-Media  VLIW, 25 exec. units         100 MHz  4.0 GOPS     SDRAM, 400 MB/s   VLIW Compiler
IBM MFAST          VLIW/SIMD, 4x4 folded array  50 MHz   20 GOPS      SDRAM, 800 MB/s   Compiler + Assembler
Samsung MSP-1      32-way SIMD + 32b RISC       100 MHz  6.4 GOPS     SDRAM, 800 MB/s   Compiler + Assembler
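
The degree of parallelism implied by the table above can be estimated by dividing peak performance by clock rate. This is a rough sketch: it treats the quoted GOPS figures as peak operations per second and says nothing about sustained throughput. The clock and performance values are the ones listed for each processor; the function name is illustrative.

```python
# (clock in MHz, peak performance in GOPS), as listed in the table above.
MEDIA_PROCESSORS = {
    "C80": (40, 1.2),
    "MPACT": (62, 2.0),
    "Tri-Media": (100, 4.0),
    "MFAST": (50, 20.0),
    "MSP-1": (100, 6.4),
}


def ops_per_clock(clock_mhz, gops):
    """Peak operations issued per clock cycle."""
    return (gops * 1e9) / (clock_mhz * 1e6)


parallelism = {
    name: ops_per_clock(clock, gops)
    for name, (clock, gops) in MEDIA_PROCESSORS.items()
}

for name, ops in parallelism.items():
    print(f"{name:10s} {ops:6.1f} operations/clock")
```

Every entry works out to tens or hundreds of operations per clock, which matches the requirement for parallel architectures with many operations per clock.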

Media Processors

Very high performance

Very fast memories

Yet all programs (save Tri-Media) have been cancelled

Reasons:

1. Programmability Issues

- Required large quantities of assembly code

- Explicit management of task level and instruction level parallelism

2. Lack of Scalability

- Single price/performance (except for C80)

3. Difficult Market

- Multimedia applications on PC

- Caught between high-performance ASICs and software solutions

Daytona MIMD Architecture

Task Level Parallelism

Code and data

Scalability

Bus support for N DSP cores

Cache memory

[Figure: N DSP cores, each with its own cache, attached to the STBus together with a Memory & I/O Controller that connects to external memory, I/O, and the host.]

Simulation has shown that N can be in the range of 8 to 10 processors.

Daytona DSP Core Architecture

LIW Machine

32b SPARC + 64b SIMD

Instruction level parallelism:

- 64b instructions

- 2 x 32b RISC operations

- 32b RISC + 32b coprocessor extension

DSP core programming in C

[Figure: DSP core block diagram. A bus interface connects the STBus to an 8 kB instruction and data cache, which serves a 32b SPARC RISC µP and a 64b 8-way SIMD vector coprocessor.]
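
What an 8-way SIMD operation on a 64-bit word looks like can be sketched in software. This is an illustration under an assumption: "64b 8-way" is read here as eight independent 8-bit lanes, and the wrapping lane-wise add below is a generic packed-arithmetic technique, not a description of the Daytona coprocessor's actual instruction set. All names are illustrative.

```python
LANES = 8
LANE_BITS = 8
LANE_MASK = 0xFF
HIGH_BITS = 0x8080808080808080   # top bit of every 8-bit lane
LOW_BITS = 0x7F7F7F7F7F7F7F7F    # lower seven bits of every lane


def pack(values):
    """Pack eight 8-bit values into one 64-bit word (lane 0 in the low byte)."""
    if len(values) != LANES:
        raise ValueError("expected exactly eight lane values")
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its eight 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def simd_add(a, b):
    """Lane-wise wrapping add of two packed words, with no carry between lanes."""
    # Add the low seven bits of each lane; the sum cannot spill into the next lane.
    low = (a & LOW_BITS) + (b & LOW_BITS)
    # Fold in the top bit of each lane with XOR, discarding the carry out.
    return low ^ ((a ^ b) & HIGH_BITS)


a = pack([1, 2, 3, 4, 250, 100, 200, 255])
b = pack([1, 1, 1, 1, 10, 100, 100, 1])
result = unpack(simd_add(a, b))
print(result)
```

A single 64-bit operation produces all eight lane results at once, which is where the coprocessor's share of the per-clock operation count comes from.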

Conclusions (1)

The DSP world is changing

Emerging applications, combined with few backward-compatibility constraints, call for new architectures that maximize:

Parallelism

Scalability

Programmability

Generality

While other measures must be taken to minimize:

Cost

Time to Market

Conclusions (2)

The DSP world is changing

What will separate DSPs from general-purpose microprocessors in the future will simply be the cost factor. Advances in programmable hardware are also very promising and could further change the DSP landscape.

References

[1] A. P. Chandrakasan and R. W. Brodersen, “Low Power Digital CMOS Design,” Kluwer Academic Publishers: Norwell, 1995.

[2] K. D. Wagner, “Clock System Design,” IEEE Design & Test of Computers, pp. 9-27, October 1988.

[3] L. Wanhammar, “DSP Integrated Circuits,” Academic Press: London, 1999.

[4] K. Hwang, “Advanced Computer Architecture: Parallelism, Scalability, Programmability,” McGraw-Hill: New York, 1993.

[5] T. Kuroda and T. Sakurai, “Overview of Low-Power ULSI Circuit Techniques,” IEICE Transactions on Electronics, Vol. E78-C, No. 4, pp. 334-344, April 1995.

[6] C. Hamacher, Z. Vranesic and S. Zaky, “Computer Organization,” fifth edition, McGraw-Hill: New York, 2002.

[7] M. M. Mano, “Computer System Architecture,” McGraw-Hill: New York, 1993.