Using Custom Accelerators in Wireless Systems Alex Papakonstantinou, Deming Chen




Illinois Center for Wireless Systems

Wireless SoC Design Trends and Challenges

• Shrinking transistor technologies have turned a single die into a host for systems of extraordinary size and complexity

– All the analog and digital components that were implemented in 3-4 different ICs in past technologies can now fit on a single chip

• Designer Productivity does not rise at the same rate as transistor capacity

– Design reuse and use of Commercial Off-The-Shelf (COTS) Intellectual Property (IP) help meet Time-To-Market (TTM) constraints but have other downsides

• Design space exploration is becoming a daunting task and conflicts with the shrinking TTM requirements

• System customization suffers in terms of functionality/performance/power/area from the “one system fits all” tactic

• Design focus is shifting from single-thread speed optimization to execution parallelization through multi-processor systems

Typical Design Practice & Design Paradigm Shift

• COTS IP modules are integrated to meet the required system functionality

– Usually a generic microprocessor/micro-controller is used for the control part and a separate DSP processor for the signal processing part

– Fixed-functionality IP modules are integrated for the various data-processing tasks

• IP use speeds up the design phase but:

– imposes coarse granularity on optimization decisions regarding functionality, performance and power dissipation

– does not eliminate design time entirely, as interfacing between different IP modules can take up considerable engineering resources

• The design paradigm needs a shift to a higher abstraction level

– Design systems efficiently with higher flexibility and on-demand customization

• Instruction-less custom processor / accelerator:

– Microcode memory stores microcode words which control Functional-Units (FU) and data transfers each cycle

– Program Counter (PC) holds next microcode memory address

– Microcode words do not require any decoding

– FUs customized according to application domain

– Application-custom forwarding paths between FUs can eliminate unnecessary Register File (RF) reads/writes
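
As a rough illustration of the instruction-less organization described above, the following Python sketch steps a program counter through a list of microcode words. Each word directly supplies the control fields for every FU in that cycle, so there is no decode step. The `MicroOp` field layout, the two-FU configuration, the single-cycle latencies and the `run` helper are all assumptions made for this example and are not taken from the EPOS design. The sketch also counts register-file accesses to show how a forwarding path can keep an intermediate value out of the RF.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MicroOp:
    """Control fields for one FU in one cycle (hypothetical layout)."""
    op: str = "nop"                      # "add", "mul" or "nop"
    src_a: Tuple[str, int] = ("rf", 0)   # ("rf", register) or ("fwd", fu index)
    src_b: Tuple[str, int] = ("rf", 0)
    dest: Optional[int] = None           # RF register to write, None to skip


NOP = MicroOp()


def run(microcode, rf):
    """Execute one microcode word per cycle; return (rf, rf_reads, rf_writes)."""
    rf = list(rf)
    fwd = [0] * len(microcode[0])        # last result of each FU (forwarding)
    rf_reads = rf_writes = 0
    pc = 0                               # program counter: next word address
    while pc < len(microcode):
        word = microcode[pc]             # control bits are used as-is
        results = list(fwd)
        pending_writes = []
        for fu, uop in enumerate(word):
            if uop.op == "nop":
                continue
            operands = []
            for kind, index in (uop.src_a, uop.src_b):
                if kind == "rf":
                    operands.append(rf[index])
                    rf_reads += 1
                else:                    # value forwarded from another FU
                    operands.append(fwd[index])
            a, b = operands
            results[fu] = a + b if uop.op == "add" else a * b
            if uop.dest is not None:
                pending_writes.append((uop.dest, results[fu]))
        for dest, value in pending_writes:
            rf[dest] = value
            rf_writes += 1
        fwd = results
        pc += 1
    return rf, rf_reads, rf_writes


# r3 = (r0 + r1) * r2, with the sum forwarded from FU0 to FU1.
program = [
    [MicroOp("add", ("rf", 0), ("rf", 1)), NOP],
    [NOP, MicroOp("mul", ("fwd", 0), ("rf", 2), dest=3)],
]
regs, reads, writes = run(program, [2, 3, 4, 0])
print(regs[3], reads, writes)  # 20 3 1
```

In this example the intermediate sum never touches the register file. Routing it through a register instead would cost one extra RF write and one extra RF read.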

EPOS (Explicitly Parallel Operations System)

• Instruction-Level Parallelism (ILP) extraction:

– The front-end of the IMPACT compiler is used to optimize the HLL description using:

• Traditional compiler techniques

• Superblock and Hyperblock creation

• The EPOS accelerators generated can substitute the generic COTS IP by:

– Offering high customization according to the system requirements

– Providing better performance and power efficiency than a generic DSP-core/microprocessor
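
The superblock creation step named above can be sketched in a few lines. This is a generic textbook-style version, not the IMPACT implementation: it selects a trace by following the most frequently taken edge, then removes side entrances by tail duplication. The control-flow graph, the edge counts, the function names and the `d` suffix for duplicated blocks are assumptions chosen for the example. It assumes an acyclic graph, and profile weights for the duplicated blocks are omitted. Hyperblock formation (if-conversion with predication) is not shown.

```python
def select_trace(cfg, entry):
    """Follow the hottest successor edge from `entry`.

    cfg maps block name -> {successor name: execution count of that edge}.
    """
    trace = [entry]
    block = entry
    while cfg.get(block):
        hottest = max(cfg[block], key=cfg[block].get)
        if hottest in trace:            # stop rather than loop back
            break
        trace.append(hottest)
        block = hottest
    return trace


def tail_duplicate(cfg, trace):
    """Return successor lists in which the trace has a single entry point."""
    prev = {trace[i]: trace[i - 1] for i in range(1, len(trace))}

    def is_side_entry(pred, block):
        return block in prev and pred != prev[block]

    # First trace block (after the head) reached from outside the trace path.
    first = next(
        (i for i in range(1, len(trace))
         if any(trace[i] in succs and is_side_entry(pred, trace[i])
                for pred, succs in cfg.items())),
        None,
    )
    if first is None:
        return {block: list(succs) for block, succs in cfg.items()}

    copy = {block: block + "d" for block in trace[first:]}
    out = {}
    for block, succs in cfg.items():
        # Side entrances are redirected to the duplicated tail.
        out[block] = [copy[s] if s in copy and is_side_entry(block, s) else s
                      for s in succs]
    for block, dup in copy.items():
        # Duplicated blocks chain to each other, not back into the trace.
        out[dup] = [copy.get(s, s) for s in cfg[block]]
    return out


cfg = {
    "BB1": {"BB2": 90, "BB3": 10},
    "BB2": {"BB4": 90},
    "BB3": {"BB4": 10},
    "BB4": {},
}
trace = select_trace(cfg, "BB1")
superblock_cfg = tail_duplicate(cfg, trace)
print(trace)                  # ['BB1', 'BB2', 'BB4']
print(superblock_cfg["BB3"])  # ['BB4d']
```

After the transformation the hot path BB1, BB2, BB4 can only be entered at BB1, which gives the scheduler a longer straight-line region to optimize.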

EPOS-based Wireless SoC Solution

• Each module is mapped directly onto a customized EPOS accelerator

• The interfaces between the EPOS accelerators, as well as between other IP and EPOS modules, are defined in the HLL program and automatically synthesized along with the EPOS datapaths

• Exploration of alternative system implementations becomes efficient and extremely fast

• Each EPOS processor can be re-programmed within the system to execute optimized/modified versions of its original functionality

EPOS Performance Results

• EPOS Configuration used:

– 4xALU

– 1xMUL

– 1xST-Port

– 1xLD-Port

• FU Latencies:

– ALU: 1

– MUL: 3

– LD: 4

– ST: 1

Application   NISC (cycles)   EPOS (cycles)
startup       1002            793
dijkstra      36074           15096
bubble        9691            2916
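
The cycle counts above can be turned into speed-up factors by dividing the NISC count by the EPOS count for each application. The short Python sketch below does only that arithmetic; the variable names are chosen for this example.

```python
# (NISC cycles, EPOS cycles) per application, copied from the results table.
cycles = {
    "startup": (1002, 793),
    "dijkstra": (36074, 15096),
    "bubble": (9691, 2916),
}

# Speed-up of EPOS over NISC = NISC cycles / EPOS cycles.
speedup = {app: nisc / epos for app, (nisc, epos) in cycles.items()}

for app, factor in speedup.items():
    print(f"{app:<9} {factor:.2f}x")
# startup   1.26x
# dijkstra  2.39x
# bubble    3.32x
```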

[Figure: Wireless System with EPOS accelerators. Blocks: Analog Circuits (Amplifier, Filter, ADC), USB EPOS, 802.11g EPOS, Bluetooth EPOS, SRAM, ROM, MCU, FFT EPOS, Crypto EPOS, DCT EPOS, Interrupt Controller, Timers/Counters, DMA Controller]

[Figure: Wireless System with CPU and DSP Core. Blocks: Analog Circuits (Amplifier, Filter, ADC), USB, 802.11g, Bluetooth, SRAM, ROM, CPU, DSP Core, Interrupt Controller, Timers/Counters, DMA Controller, Encryption/Decryption]

[Figure: datapath diagram. Blocks: PC, MicroCode Memory, FU1, FU2, FU3, Data Memory, Register File, Constant, Offset]

[Figure: EPOS Flow. Stages: Superblock/Hyperblock Formation (IMPACT), Scheduling, Register Allocation, Forwarding Network Minimization]
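
To give a feel for the scheduling stage of this flow, the sketch below runs a simple greedy list scheduler against the FU counts and latencies listed under EPOS Performance Results. It is a generic illustration, not the EPOS scheduler: operations are considered in dictionary order rather than by a priority function, every unit is assumed to be fully pipelined (it can start one operation per cycle), and the operation names and the `list_schedule` helper are hypothetical.

```python
UNITS = {"ALU": 4, "MUL": 1, "LD": 1, "ST": 1}     # 4xALU, 1xMUL, 1xLD, 1xST
LATENCY = {"ALU": 1, "MUL": 3, "LD": 4, "ST": 1}   # cycles until result is ready


def list_schedule(ops):
    """Assign an issue cycle to each operation of an acyclic dependence graph.

    ops maps name -> (fu_type, [names of operations it depends on]).
    """
    issue = {}
    cycle = 0
    while len(issue) < len(ops):
        free = dict(UNITS)               # issue slots available this cycle
        for name, (fu, deps) in ops.items():
            if name in issue or free[fu] == 0:
                continue
            # Ready only when every producer has finished by this cycle.
            ready = all(
                dep in issue and issue[dep] + LATENCY[ops[dep][0]] <= cycle
                for dep in deps
            )
            if ready:
                issue[name] = cycle
                free[fu] -= 1
        cycle += 1
    return issue


# y[i] = a[i] * b[i] + c, with c already held in a register.
ops = {
    "ld_a": ("LD", []),
    "ld_b": ("LD", []),
    "mul": ("MUL", ["ld_a", "ld_b"]),
    "add": ("ALU", ["mul"]),
    "st": ("ST", ["add"]),
}
schedule = list_schedule(ops)
print(schedule)
# {'ld_a': 0, 'ld_b': 1, 'mul': 5, 'add': 8, 'st': 9}
```

With a single load port the two loads are serialized, so the multiply cannot issue before cycle 5. In this example the load port, not the four ALUs, limits the schedule.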

[Figure: EPOS accelerator. Blocks: PC, MC Bank1 to MC Bank4, FU1 to FU4, Register File, PRF, SRF1 to SRF4, Data Memory, Offset, Constant]

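[Figure: Superblock formation. Control-flow graph of BB1 to BB4 with edge weights 90 and 10, shown before and after formation; the result contains superblocks SB1 and SB2 and a duplicated block BB4d]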

[Figure: Hyperblock formation. Control-flow graph of BB1 to BB4 with edge weights 55 and 45, shown before and after formation; the result contains hyperblock HB1]

[Figure: Performance Speed-up bar chart comparing NISC and EPOS for startup, dijkstra and bubble-sort; vertical axis from 0 to 3.5]