Using Custom Accelerators in Wireless Systems

Alex Papakonstantinou, Deming Chen

Illinois Center for Wireless Systems

Wireless SoC Design Trends and Challenges

• Shrinking transistor technologies have turned a single die into a host for systems of extraordinary size and complexity

– All the analog and digital components that were implemented in 3-4 different ICs in past technologies can now fit on a single chip

• Designer Productivity does not rise at the same rate as transistor capacity

– Design reuse and use of Commercial Off-The-Shelf (COTS) Intellectual Property (IP) help meet Time-To-Market (TTM) constraints but have other downsides

• Design space exploration is becoming a daunting task and conflicts with the shrinking TTM requirements

• System customization suffers in terms of functionality, performance, power and area under a “one system fits all” approach

• Design focus is shifting from single thread speed optimization to execution parallelization through multi-processor systems

Typical Design Practice & Design Paradigm Shift

• COTS IP modules are integrated to meet the required system functionality

– Usually a generic microprocessor/micro-controller is used for the control part and a separate DSP processor for the signal processing part

– Fixed-functionality IP modules are integrated for the various data-processing tasks

• IP use speeds up the design phase but:

– imposes coarse granularity on optimization decisions regarding functionality, performance and power dissipation

– does not eliminate design time entirely, as interfacing between different IP modules can take up considerable engineering resources

• Design Paradigm needs a shift to higher abstraction level

– Design systems efficiently with higher flexibility and on-demand customization

• Instruction-less custom processor / accelerator:

– Microcode memory stores microcode words which control Functional-Units (FU) and data transfers each cycle

– Program Counter (PC) holds next microcode memory address

– Microcode words do not require any decoding

– FUs customized according to application domain

– Application-custom forwarding paths between FUs can eliminate unnecessary Register File (RF) reads/writes
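The microcode-driven datapath described in these bullets can be illustrated with a small simulation. This is a minimal sketch, not the authors' design: the operand encoding (`"rf"`, `"fwd"`, `"const"`), the `FUControl` and `run` names, single-cycle FU latency and the absence of branches are all assumptions made for illustration. It shows a program counter stepping through microcode words whose fields drive the FUs directly, and a forwarding path that lets an intermediate result skip the register file.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical operand encoding (an assumption, not from the poster):
#   ("rf", i)     -> read register i of the register file
#   ("fwd", name) -> read the value FU `name` produced in the previous cycle
#   ("const", v)  -> constant carried in the microcode word
Source = Tuple[str, object]


@dataclass
class FUControl:
    """Control fields that one microcode word holds for one functional unit."""
    op: Callable[[int, int], int]
    src_a: Source
    src_b: Source
    rf_dest: Optional[int] = None  # None: result lives only on the forwarding path


MicroWord = Dict[str, FUControl]  # FU name -> its control fields for this cycle


def run(microcode: List[MicroWord], rf: List[int]) -> Tuple[List[int], int]:
    """Execute microcode from address 0; return (register file, RF write count)."""
    fu_out: Dict[str, int] = {}  # last result of each FU (forwarding paths)
    rf_writes = 0
    pc = 0  # program counter: address of the next microcode word

    def read(src: Source) -> int:
        kind, value = src
        if kind == "rf":
            return rf[value]
        if kind == "fwd":
            return fu_out[value]
        return value  # "const"

    while pc < len(microcode):
        word = microcode[pc]
        # All FUs read their operands before any result is written back.
        results = {n: c.op(read(c.src_a), read(c.src_b)) for n, c in word.items()}
        for name, ctrl in word.items():
            if ctrl.rf_dest is not None:
                rf[ctrl.rf_dest] = results[name]
                rf_writes += 1
        fu_out.update(results)
        pc += 1  # straight-line code only; branches are left out of this sketch
    return rf, rf_writes


# r3 = (r0 + r1) * r2, with the sum forwarded from the ALU to the multiplier.
program: List[MicroWord] = [
    {"alu": FUControl(operator.add, ("rf", 0), ("rf", 1))},
    {"mul": FUControl(operator.mul, ("fwd", "alu"), ("rf", 2), rf_dest=3)},
]

registers, writes = run(program, [2, 3, 4, 0])
print(registers, writes)  # [2, 3, 4, 20] 1
```

In this example the sum never touches the register file: only the final product is written, so the forwarding path saves one RF write and one RF read compared with routing the intermediate value through a register.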

EPOS (Explicitly Parallel Operations System)

• Instruction-Level Parallelism (ILP) extraction:

– The front-end of the IMPACT compiler is used to optimize the HLL description using:

• Traditional compiler techniques

• Superblock and Hyperblock creation

• The EPOS accelerators generated can substitute the generic COTS IP by:

– Offering high customization according to the system requirements

– Providing better performance and power efficiency than a generic DSP-core/microprocessor
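Superblock creation, one of the ILP-extraction steps listed above, can be sketched in a few lines. This is a simplified illustration of the general technique (trace selection along the most frequent edges, followed by tail duplication to remove side entrances), not the IMPACT implementation. The function names, the `d` suffix for duplicated blocks, the 90/10 edge weights and the assumption of an acyclic control-flow graph are illustrative choices; edge weights on the original tail are not rebalanced.

```python
from typing import Dict, List, Tuple

CFG = Dict[str, Dict[str, int]]  # block -> {successor: edge weight}


def select_trace(cfg: CFG, entry: str) -> List[str]:
    """Follow the heaviest outgoing edge from `entry` until the path ends."""
    trace = [entry]
    while cfg.get(trace[-1]):
        succs = cfg[trace[-1]]
        nxt = max(succs, key=succs.get)
        if nxt in trace:  # defensive stop; the sketch assumes an acyclic CFG
            break
        trace.append(nxt)
    return trace


def form_superblock(cfg: CFG, entry: str) -> Tuple[List[str], CFG]:
    """Return (trace, new CFG) where the trace has a single entry point."""
    trace = select_trace(cfg, entry)
    on_trace = set(trace)

    def side_preds(block: str) -> List[str]:
        # Predecessors outside the trace enter the trace "from the side".
        return [p for p, succs in cfg.items() if block in succs and p not in on_trace]

    new_cfg: CFG = {b: dict(s) for b, s in cfg.items()}
    first = next((i for i in range(1, len(trace)) if side_preds(trace[i])), None)
    if first is None:
        return trace, new_cfg  # already single-entry

    # Tail duplication: copy every trace block from the first side entrance on.
    tail = trace[first:]
    copy = {b: b + "d" for b in tail}
    for b in tail:
        new_cfg[copy[b]] = {copy.get(s, s): w for s, w in cfg[b].items()}
    # Redirect side entrances to the duplicated tail.
    for b in tail:
        for p in side_preds(b):
            new_cfg[p][copy[b]] = new_cfg[p].pop(b)
    return trace, new_cfg


example: CFG = {
    "BB1": {"BB2": 90, "BB3": 10},
    "BB2": {"BB4": 90},
    "BB3": {"BB4": 10},
    "BB4": {},
}
trace, new_cfg = form_superblock(example, "BB1")
print(trace)           # ['BB1', 'BB2', 'BB4']
print(new_cfg["BB3"])  # {'BB4d': 10}
```

After the transformation, the hot path BB1, BB2, BB4 can only be entered at BB1, which lets a scheduler move operations across the former block boundaries. The cold path through BB3 now ends in the copy BB4d.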

EPOS – based Wireless SoC Solution

• Each module is mapped directly onto a customized EPOS accelerator

• The interfaces between the EPOS accelerators, as well as between other IP and EPOS modules, are defined in the HLL program and automatically synthesized along with the EPOS datapaths

• Exploration of alternative system implementations becomes efficient and extremely fast

• Each EPOS processor can be re-programmed within the system to execute optimized/modified versions of its original functionality

EPOS Performance Results

• EPOS Configuration used:

– 4xALU

– 1xMUL

– 1xST-Port

– 1xLD-Port

• FU Latencies:

– ALU: 1

– MUL: 3

– LD: 4

– ST: 1

Application   NISC (cycles)   EPOS (cycles)
startup       1002            793
dijkstra      36074           15096
bubble        9691            2916
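The speed-up implied by these cycle counts is the NISC cycle count divided by the EPOS cycle count. The short sketch below does that arithmetic on the three reported benchmarks; the variable and function names are illustrative only.

```python
# Cycle counts as reported in the results: application -> (NISC, EPOS)
cycles = {
    "startup": (1002, 793),
    "dijkstra": (36074, 15096),
    "bubble": (9691, 2916),
}


def speedups(table):
    """Speed-up of EPOS over NISC: NISC cycles divided by EPOS cycles."""
    return {app: nisc / epos for app, (nisc, epos) in table.items()}


for app, s in speedups(cycles).items():
    print(f"{app}: {s:.2f}x")
# startup: 1.26x
# dijkstra: 2.39x
# bubble: 3.32x
```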

[Figure: EPOS-based wireless system block diagram. Blocks: Analog Circuits (Amplifier, Filter, ADC), USB EPOS, 802.11g EPOS, Bluetooth EPOS, SRAM, ROM, MCU, FFT EPOS, Interrupt Controller, Timers/Counters, DMA Controller, Crypto EPOS, DCT EPOS]

[Figure: conventional wireless system block diagram. Blocks: Analog Circuits (Amplifier, Filter, ADC), USB, 802.11g, Bluetooth, SRAM, ROM, CPU, DSP Core, Interrupt Controller, Timers/Counters, DMA Controller, Encryption/Decryption]

[Figure: instruction-less custom processor block diagram. Blocks: PC, MicroCode Memory, FU1, FU2, FU3, Data Memory, Register File, Constant, Offset]

[Figure: EPOS Flow. Stages: Superblock/Hyperblock Formation (IMPACT), Scheduling, Register Allocation, Forwarding Network Minimization]

[Figure: EPOS accelerator block diagram. Blocks: PC, MC Bank 1 to MC Bank 4, FU1 to FU4, Register File, PRF, SRF1 to SRF4, Data Memory, Offset, Constant]

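[Figure: Superblock formation. Control-flow graph of basic blocks BB1 to BB4 with 90/10 edge weights, shown before and after formation; the result contains a duplicated block BB4d and superblocks SB1 and SB2]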

[Figure: Hyperblock formation. Control-flow graph of basic blocks BB1 to BB4 with 55/45 edge weights, shown before and after formation; the result is hyperblock HB1]

[Figure: Performance Speed-up bar chart. Series: NISC, EPOS. Applications: startup, dijkstra, bubble-sort. Vertical axis: 0 to 3.5]
