A New Generation of DSP Architectures, Bryan Ackland and Paul D’Arcy, Lucent Technologies. Paper Review: Babak Noory, Professor Maitham Shams, 97.575, March 18, 2002.

Page 1

A New Generation of DSP Architectures

Bryan Ackland and Paul D’Arcy

Lucent Technologies

Paper Review

Babak Noory

Professor Maitham Shams

97.575

March 18, 2002

Page 2

Agenda

1. Look at the evolution of Digital Signal Processors

2. Review the emerging system requirements

3. Summarize recent advances in low power DSP techniques

4. Look at a number of new high performance architectures

5. Describe a bus-based multi-core architecture for task-level parallelism

Page 3

Introduction

General-Purpose Digital Signal Processors

Introduced in 1980

- High-performance engines

- MAC speed advantage of 50:1 over the best microprocessors

Today

- Modest performance improvements

- Outperformed by microprocessors

Page 4

DSP Evolution

[Figure: Performance of DSPs vs. Microprocessors. Performance (peak MACs, log scale, 1 to 1K) versus year, 1980 to 2000. Microprocessors: M68000, 80286, 80386, Pentium, Pentium MMX. DSPs: DSP-1, DSP-32C, DSP-16, DSP-1600, DSP-16210.]

And yet, DSPs generate over $3 billion for the semiconductor industry every year.

Page 5

DSP Evolution

[Figure: Power and Cost of DSPs vs. Microprocessors. Power (mW/MIP, log scale, 1 to 10K) versus year, 1980 to 2000, with unit prices. Microprocessors: M68000 ($200), 80286 ($200), 80386 ($300), Pentium ($500). DSPs: DSP-1 ($150), DSP-32C ($250), DSP-16A ($15), DSP-1600 (<$10). Annotations: lower cost; higher MOP/mm² and MOP/mW.]

Page 6

Emerging Applications

Very Low Power Applications

Portable Applications: features such as video and web browsing added to cellular phones, PDAs, and multimedia laptops

Average power becomes the main design constraint

High-Performance Applications

Embedded Applications: digital audio broadcast and smart phones

PC-Based Applications: 3-D graphics and real-time video communications

Infrastructure Applications: modem head-ends and wireless base stations

Page 7

Low-Power Techniques

1. Full Custom Datapath Layout

Circuit Topology

Transistor Sizing

Layout Parasitics

Layout Topology | Drain Capacitance
Simple | 45.6 fF
Finger | 18.7 fF
Ring | 10.8 fF

[Figure: transistor layout topologies with source (S) and drain (D) regions: (a) Simple, width W; (b) Finger, W/2; (c) Ring, W/4.]

Courtesy [1]
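Because the energy drawn from the supply on each 0-to-1 transition of a node is roughly C·Vdd², the drain capacitances listed above translate directly into switching energy. The sketch below compares the three layouts; the 1.5 V supply is an assumed value for illustration only, not a figure from the slide.

```python
# Drain capacitances from the layout table, in femtofarads (fF)
DRAIN_CAP_FF = {"Simple": 45.6, "Finger": 18.7, "Ring": 10.8}


def switching_energy_fj(cap_ff: float, vdd: float) -> float:
    """Supply energy per 0->1 transition, E = C * Vdd^2 (fF * V^2 = fJ)."""
    return cap_ff * vdd ** 2


def savings_vs(caps: dict, baseline: str = "Simple") -> dict:
    """Fractional reduction in drain capacitance relative to the baseline layout."""
    base = caps[baseline]
    return {name: 1.0 - c / base for name, c in caps.items()}


if __name__ == "__main__":
    VDD = 1.5  # assumed supply voltage, for illustration only
    for name, cap in DRAIN_CAP_FF.items():
        print(f"{name:6s} {switching_energy_fj(cap, VDD):6.1f} fJ per transition")
    for name, s in savings_vs(DRAIN_CAP_FF).items():
        print(f"{name:6s} {s:5.0%} less drain capacitance than Simple")
```

The finger and ring layouts cut drain capacitance by about 59% and 76% respectively. Since the ratio does not depend on Vdd, the same fractions apply to the drain-node switching energy at any supply voltage.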

Page 8

[Figure: system-level clock gating. A crystal oscillator generates the system clock, which passes through AND gates enabled by the Gate CPU Section 1, 2, and 3 signals to three groups of boards (1-3, 4-6, 7-9).]

Low-Power Techniques

2. Clock Gating

System-Level Clock Gating: limit data transitions and clock dissipation to active subsystems

Local Clock Gating: deactivate idle elements in a sequential circuit

Courtesy [4]
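The effect of gating can be sketched by counting how many clock edges actually reach each section. This is a minimal model under stated assumptions: clock-tree power is taken to be proportional to edges delivered, and the `Section` class and its workload are hypothetical, loosely mirroring the Gate CPU Sections in the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """A gated clock domain, e.g. one 'Gate CPU Section' feeding a group of boards."""
    name: str
    busy_cycles: set = field(default_factory=set)  # hypothetical workload: cycles with useful work


def clock_edges(sections, n_cycles: int, gated: bool) -> dict:
    """Count system-clock edges delivered to each section.

    Ungated: every section sees every edge of the system clock.
    Gated: an enable ANDed with the clock passes edges only in busy cycles.
    """
    result = {}
    for s in sections:
        if gated:
            result[s.name] = sum(1 for c in range(n_cycles) if c in s.busy_cycles)
        else:
            result[s.name] = n_cycles
    return result


if __name__ == "__main__":
    N = 1000
    sections = [
        Section("Section 1", set(range(N))),    # always busy
        Section("Section 2", set(range(100))),  # busy 10% of the time
        Section("Section 3"),                   # idle
    ]
    ungated = sum(clock_edges(sections, N, gated=False).values())
    gated = sum(clock_edges(sections, N, gated=True).values())
    print(ungated, gated)  # 3000 1100
```

A plain AND gate is only safe if the enable cannot change while the clock is high; practical clock-gating cells therefore capture the enable in a latch that is transparent while the clock is low, so the gated clock cannot glitch.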

Operation Mode | Power
Normal Mode (80 MHz) | 120 mW
Standby (Halt) | 21 mW
Slow Clock (16 kHz) | 2.3 mW
StopClk | 30 µW
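Expressing each mode relative to normal operation, and comparing the slow-clock figure with what pure frequency scaling would predict, shows how much of the saving comes from the clock rate alone. In the frequency-scaling function only f changes; activity, capacitance, and supply voltage are assumed fixed.

```python
# Measured power per operating mode, from the table (all in mW)
MODE_POWER_MW = {
    "Normal Mode (80 MHz)": 120.0,
    "Standby (Halt)": 21.0,
    "Slow Clock (16 kHz)": 2.3,
    "StopClk": 0.030,  # 30 uW
}


def reduction_vs_normal(mode: str) -> float:
    """How many times less power a mode draws than normal operation."""
    return MODE_POWER_MW["Normal Mode (80 MHz)"] / MODE_POWER_MW[mode]


def frequency_scaled_mw(p_mw: float, f_from_hz: float, f_to_hz: float) -> float:
    """Power predicted if ALL power were dynamic (P = alpha*C*V^2*f) and only f changed."""
    return p_mw * f_to_hz / f_from_hz


if __name__ == "__main__":
    for mode in MODE_POWER_MW:
        print(f"{mode:22s} {reduction_vs_normal(mode):8.1f}x lower")
    print(frequency_scaled_mw(120.0, 80e6, 16e3))  # 0.024 (mW), vs 2.3 mW measured
```

Scaling 120 mW linearly from 80 MHz down to 16 kHz would predict only about 0.024 mW, roughly a hundredth of the 2.3 mW measured. This suggests that most of the slow-clock power does not scale with clock frequency, and that stopping the clock altogether (StopClk, about 4000x below normal) is what removes it.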

Page 9

Low-Power Techniques

3. Minimizing Data Transitions

Applicable to circuits whose data transition statistics are well understood

Internal node activity is difficult to estimate for complex circuits

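P(A=1) = 0.5, P(B=1) = 0.2, P(C=1) = 0.1

[Figure: two orderings of the same three-input gate chain with internal node x and output Z. (a) A and B drive the first gate, whose output is node x; C joins at the output gate. Activity at node x = 0.09. (b) B and C drive the first gate; A joins at the output gate. Activity at node x = 0.0196.]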

Courtesy [3]
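Both activity figures follow from the 0-to-1 transition probability α = P(x=0)·P(x=1) = (1 − p_x)·p_x. The sketch below reproduces them, assuming the chain is built from two-input AND gates and that the inputs are independent and uncorrelated from cycle to cycle (NAND gates would give the same numbers); the gate type is an assumption, not stated on the slide.

```python
from itertools import combinations

# Signal probabilities from the slide
P_ONE = {"A": 0.5, "B": 0.2, "C": 0.1}


def and_prob(*p: float) -> float:
    """P(output = 1) of an AND gate with independent inputs."""
    result = 1.0
    for q in p:
        result *= q
    return result


def activity_0_to_1(p_one: float) -> float:
    """Probability of a 0->1 transition per cycle, assuming temporally independent values."""
    return (1.0 - p_one) * p_one


def node_x_activity(first_pair) -> float:
    """Activity at internal node x when first_pair drives the first gate of the chain."""
    return activity_0_to_1(and_prob(*(P_ONE[s] for s in first_pair)))


if __name__ == "__main__":
    for pair in combinations(sorted(P_ONE), 2):
        print(pair, round(node_x_activity(pair), 4))
    best = min(combinations(sorted(P_ONE), 2), key=node_x_activity)
    print("lowest-activity ordering feeds", best, "to the first gate")
```

The rule of thumb this illustrates for AND chains: feed the inputs least likely to be 1 into the earliest gate, so the internal node rarely switches.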

Page 10

Low-Power Techniques

4. Partitioned Memory Architecture

Memories occupy a large share of silicon area, but the activity factor of each individual memory circuit is very low.

Adopt hierarchical sub-banking

Replace large memory blocks with several smaller blocks

Make use of gated clocks to limit switching activity to active blocks
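A minimal sketch of sub-banking, under assumed conditions: high-order address bits select one bank, only that bank is activated, and the number of rows on the activated bitlines serves as a rough proxy for access energy. The memory size, one-word-per-row layout, and address trace are all hypothetical.

```python
from collections import Counter


def bank_select(address: int, words_per_bank: int) -> int:
    """High-order address bits choose the sub-bank; only that bank is clocked."""
    return address // words_per_bank


def rows_activated_per_access(total_words: int, n_banks: int) -> int:
    """Rows on the bitlines of the selected bank (hypothetical one-word-per-row layout).

    A rough proxy for access energy: bitline capacitance grows with the rows it spans.
    """
    return total_words // n_banks


if __name__ == "__main__":
    TOTAL = 8192  # hypothetical memory size in words
    for banks in (1, 2, 4, 8):
        print(banks, "bank(s):", rows_activated_per_access(TOTAL, banks), "rows per access")
    trace = [0, 5, 1030, 1031, 4097, 8191]  # hypothetical address trace
    print(Counter(bank_select(a, TOTAL // 8) for a in trace))
```

The model ignores the extra decoding and bank-select logic that sub-banking adds, which is one reason the number of banks cannot usefully grow without limit.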

Page 11

Low-Power Techniques

5. Technology & Voltage Scaling

Adjusting supply voltages to meet performance requirements

Mixed-voltage and mixed-threshold logic families

Dynamic voltage scaling: supply voltage and clock speed vary continuously according to processor load

Supply cut-off: high-threshold transistors cut off the power supply when the chip enters sleep mode
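The payoff of voltage scaling comes from the quadratic Vdd term in switching power, P = α·C·Vdd²·f. The sketch below picks the lowest operating point that still meets the current load; the operating points, activity factor, and capacitance are hypothetical, and a discrete table is a simplification of the continuous adjustment the slide describes.

```python
# Hypothetical (frequency, supply) operating points, sorted by frequency
OPERATING_POINTS = [(25e6, 1.2), (50e6, 1.8), (100e6, 3.3)]


def dynamic_power_w(alpha: float, c_farads: float, vdd: float, f_hz: float) -> float:
    """Switching power P = alpha * C * Vdd^2 * f."""
    return alpha * c_farads * vdd ** 2 * f_hz


def pick_operating_point(required_hz: float):
    """Slowest (lowest-voltage) point that still meets the processor load."""
    for f, v in OPERATING_POINTS:
        if f >= required_hz:
            return f, v
    return OPERATING_POINTS[-1]  # saturate at the fastest point


if __name__ == "__main__":
    ALPHA, C = 0.1, 1e-9  # assumed activity factor and switched capacitance
    full = dynamic_power_w(ALPHA, C, 3.3, 100e6)
    f, v = pick_operating_point(40e6)              # current load needs 40 MHz
    scaled = dynamic_power_w(ALPHA, C, v, f)
    freq_only = dynamic_power_w(ALPHA, C, 3.3, f)  # same clock, no voltage drop
    print(f"{full*1e3:.1f} mW -> {scaled*1e3:.1f} mW (frequency only: {freq_only*1e3:.1f} mW)")
```

Lowering the clock alone reduces power in proportion to f, but the lower supply it permits also cuts the energy per operation (α·C·Vdd²), which is where most of the saving comes from.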

Page 12

Emerging Applications (Revisited)

Very Low Power Applications

Portable Applications: features such as video and web browsing added to cellular phones, PDAs, and multimedia laptops

Average power becomes the main design constraint

High-Performance Applications

Embedded Applications: digital audio broadcast and smart phones

PC-Based Applications: 3-D graphics and real-time video communications

Infrastructure Applications: modem head-ends and wireless base stations

Page 13

Minor enhancements combined with process improvements will not meet the requirements of emerging applications. The new architectures must provide:

Performance ranging from hundreds of MOPS to tens of GOPS

- Parallel architectures, many operations per clock

Large memory and I/O bandwidth

- Cache hierarchies

Compiler-driven programming environment

- High-level programming languages

Scalability

- Range of cost/performance targets

New Class of Architectures

Page 14

Media Processors

Processor | TI C80 | Chromatics MPACT | Philips Tri-Media | IBM MFAST | Samsung MSP-1
Architecture | 4 x 64b DSP + 32b RISC | VLIW/SIMD, 4 ALUs | VLIW, 25 exec. units | VLIW/SIMD, 4x4 folded array | 32-way SIMD + 32b RISC
Clock | 40 MHz | 62 MHz | 100 MHz | 50 MHz | 100 MHz
Performance | 1.2 GOPS | 2.0 GOPS | 4.0 GOPS | 20 GOPS | 6.4 GOPS
Memory | DRAM, 400 MB/s | RAMBUS, 500 MB/s | SDRAM, 400 MB/s | SDRAM, 800 MB/s | SDRAM, 800 MB/s

Programming: Compiler + Assembler; In-house VLIW Compiler; Compiler + Assembler; Compiler + Assembler

Very high performance

Very fast memories

Yet all of these programs except Tri-Media have been cancelled
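Dividing each peak rating in the table by its clock rate shows how much parallelism these designs depend on: each must complete tens to hundreds of operations every cycle. The dictionary below simply restates the table's clock and performance rows.

```python
# (clock in Hz, peak performance in operations/s) from the media processor table
MEDIA_PROCESSORS = {
    "TI C80": (40e6, 1.2e9),
    "Chromatics MPACT": (62e6, 2.0e9),
    "Philips Tri-Media": (100e6, 4.0e9),
    "IBM MFAST": (50e6, 20e9),
    "Samsung MSP-1": (100e6, 6.4e9),
}


def ops_per_clock(clock_hz: float, peak_ops: float) -> float:
    """Operations the chip must complete every cycle to reach its peak rating."""
    return peak_ops / clock_hz


if __name__ == "__main__":
    for name, (clk, ops) in MEDIA_PROCESSORS.items():
        print(f"{name:18s} {ops_per_clock(clk, ops):6.1f} ops/clock")
```

The results range from 30 ops/clock (C80) to 400 ops/clock (MFAST), which is the "many operations/clock" requirement in concrete form, and also why keeping all those units busy demanded so much hand-written assembly.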

Page 15

Reasons:

1. Programmability Issues

- Required large quantities of assembly code

- Explicit management of task-level and instruction-level parallelism

2. Lack of Scalability

- Single price/performance point (except for the C80)

3. Difficult Market

- Multimedia applications on the PC

- Caught between high-performance ASICs and software solutions

Media Processors

Page 16

Task-Level Parallelism

Code and data

Scalability

Bus support for N DSP cores

Cache memory

Daytona MIMD Architecture

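[Figure: Daytona MIMD block diagram. Several DSP cores, each with its own cache, share the STBus; a memory & I/O controller connects the bus to external memory, I/O, and the host.]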

Simulation has shown that N can range from 8 to 10 processors.
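Why the number of cores on one shared bus eventually saturates can be illustrated with a simple bandwidth-bound model. This is not the simulation the paper refers to: it assumes fully independent tasks, ignores arbitration overhead, and uses assumed, normalized traffic figures.

```python
def bus_limited_speedup(n_cores: int, traffic_per_core: float, bus_capacity: float) -> float:
    """Speedup of n_cores over one core on independent tasks sharing one bus.

    Hypothetical model: each core alone generates traffic_per_core units of bus
    traffic (cache misses, I/O) per second; once total demand exceeds bus_capacity,
    cores stall and throughput is capped at bus_capacity / traffic_per_core.
    """
    if n_cores < 1:
        raise ValueError("need at least one core")
    return min(float(n_cores), bus_capacity / traffic_per_core)


if __name__ == "__main__":
    CAPACITY = 1.0    # normalized bus bandwidth (assumed)
    PER_CORE = 0.125  # each core uses 12.5% of the bus on its own (assumed)
    for n in (1, 2, 4, 8, 12, 16):
        print(n, bus_limited_speedup(n, PER_CORE, CAPACITY))
```

In this model, lowering each core's bus traffic raises the saturation point, which is one reason per-core caches matter in a shared-bus design.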

Page 17

LIW machine: 32b SPARC + 64b SIMD

Instruction-level parallelism:

- 64b instructions

- 2 x 32b RISC operations

- 32b RISC + 32b coprocessor extension

DSP core programming in C

Daytona DSP Core Architecture

[Figure: Daytona DSP core block diagram. Blocks: bus interface to the STBus, 8 kB instruction and data cache, 32b SPARC RISC µP, 64b 8-way SIMD vector coprocessor.]
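To illustrate what a 64b 8-way SIMD unit buys, the sketch below models one vector add on packed data. The 8 x 8-bit lane split and the saturating-add semantics are assumptions chosen for illustration (saturation is common in media code, e.g. brightening pixels), not details taken from the Daytona design; Python stands in for what the hardware would do in a single instruction.

```python
LANES, LANE_BITS = 8, 8          # assumed split: 64b register = 8 x 8-bit lanes
LANE_MAX = (1 << LANE_BITS) - 1  # 255


def pack(values):
    """Pack 8 unsigned 8-bit values into one 64-bit word, lane 0 in the low byte."""
    if len(values) != LANES or not all(0 <= v <= LANE_MAX for v in values):
        raise ValueError("expected 8 values in the range 0..255")
    word = 0
    for i, v in enumerate(values):
        word |= v << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its 8 lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MAX for i in range(LANES)]


def simd_add_sat(a, b):
    """Lane-wise unsigned saturating add: one operation, eight additions."""
    return pack([min(x + y, LANE_MAX) for x, y in zip(unpack(a), unpack(b))])


if __name__ == "__main__":
    pixels = pack([10, 50, 100, 150, 200, 240, 250, 255])
    brighten = pack([20] * LANES)
    print(unpack(simd_add_sat(pixels, brighten)))  # [30, 70, 120, 170, 220, 255, 255, 255]
```

Saturation clamps each lane at 255 instead of wrapping around, so bright pixels stay white rather than turning dark.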

Page 18

Conclusions (1)

The DSP world is changing

Emerging applications, combined with few backward-compatibility constraints, call for new architectures that maximize:

Parallelism

Scalability

Programmability

Generality

At the same time, other measures must be taken to minimize:

Cost

Time to Market

Page 19

Conclusions (2)


The DSP world is changing

What will separate DSPs from general-purpose microprocessors in the future will simply be cost. Advances in the programmable hardware field are also very promising and could further change the DSP landscape.

Page 20

References

[1] A. P. Chandrakasan and R. W. Brodersen, “Low Power Digital CMOS Design,” Kluwer Academic Publishers, Norwell, 1995.

[2] K. D. Wagner, “Clock System Design,” IEEE Design & Test of Computers, pp. 9-27, October 1988.

[3] L. Wanhammar, “DSP Integrated Circuits,” Academic Press, London, 1999.

[4] K. Hwang, “Advanced Computer Architecture: Parallelism, Scalability, Programmability,” McGraw-Hill, New York, 1993.

[5] T. Kuroda and T. Sakurai, “Overview of Low-Power ULSI Circuit Techniques,” IEICE Transactions on Electronics, vol. E78-C, no. 4, pp. 334-344, April 1995.

[6] C. Hamacher, Z. Vranesic, and S. Zaky, “Computer Organization,” 5th ed., McGraw-Hill, New York, 2002.

[7] M. M. Mano, “Computer System Architecture,” Prentice Hall, Englewood Cliffs, 1993.