Using Custom Accelerators in Wireless Systems
Alex Papakonstantinou, Deming Chen
Illinois Center for Wireless Systems
Wireless SoC Design Trends and Challenges
• Shrinking transistor technologies have turned a single die into a host for systems of extraordinary size and complexity
– All the analog and digital components that took 3-4 separate ICs in past technologies can now fit on a single chip
• Designer productivity does not rise at the same rate as transistor capacity
– Design reuse and Commercial Off-The-Shelf (COTS) Intellectual Property (IP) help meet Time-To-Market (TTM) constraints but have other downsides
• Design space exploration is becoming a daunting task and conflicts with the shrinking TTM requirements
• System customization suffers in terms of functionality/performance/power/area under a “one system fits all” approach
• Design focus is shifting from single thread speed optimization to execution parallelization through multi-processor systems
Typical Design Practice & Design Paradigm Shift
• COTS IP modules are integrated to meet the required system functionality
– Usually a generic microprocessor/micro-controller is used for the control part and a separate DSP processor for the signal processing part
– Fixed-functionality IP modules are integrated for the various data processing tasks
• IP use speeds up the design phase but:
– imposes coarse granularity on optimization decisions regarding functionality, performance and power dissipation
– does not eliminate design time entirely, as interfacing between different IP modules can take up considerable engineering resources
• The design paradigm needs to shift to a higher abstraction level
– Design systems efficiently, with higher flexibility and on-demand customization
• Instruction-less custom processor / accelerator:
– Microcode memory stores microcode words which control Functional-Units (FU) and data transfers each cycle
– Program Counter (PC) holds next microcode memory address
– Microcode words do not require any decoding
– FUs customized according to application domain
– Application-custom forwarding paths between FUs can eliminate unnecessary Register File (RF) reads/writes
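The microcode-driven execution described above can be sketched as a tiny simulator. This is an illustrative model only, not the EPOS hardware: the `MicroOp` fields, the two functional units, and the single-cycle latency are all assumptions made for the example. Each microcode word directly names the FUs to fire and their operand sources, so there is no decode step; the PC simply steps through microcode memory.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical functional units (illustrative names, single-cycle latency assumed).
FUS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

@dataclass
class MicroOp:
    fu: str                 # which FU fires this cycle
    src_a: Tuple[str, int]  # ("rf", register index) or ("const", literal value)
    src_b: Tuple[str, int]
    dst: int                # destination register index

def run(microcode: List[List[MicroOp]], registers: List[int]) -> List[int]:
    """Execute one microcode word per cycle; all ops in a word run in parallel."""
    rf = list(registers)

    def read(src: Tuple[str, int]) -> int:
        kind, value = src
        return rf[value] if kind == "rf" else value

    pc = 0  # program counter: address of the next microcode word
    while pc < len(microcode):
        word = microcode[pc]
        # Read every operand before writing anything, so that all ops in
        # the same word observe the register state from the previous cycle.
        results = [(op.dst, FUS[op.fu](read(op.src_a), read(op.src_b))) for op in word]
        for dst, value in results:
            rf[dst] = value
        pc += 1
    return rf

# Cycle 0: r4 = r0 + r1 and r5 = r2 * r3 in parallel. Cycle 1: r6 = r4 + r5.
program = [
    [MicroOp("add", ("rf", 0), ("rf", 1), 4), MicroOp("mul", ("rf", 2), ("rf", 3), 5)],
    [MicroOp("add", ("rf", 4), ("rf", 5), 6)],
]
final = run(program, [1, 2, 3, 4, 0, 0, 0])
print(final)
```

In this sketch every intermediate value goes through the register file; a forwarding path would instead route the output of one FU straight to the input of another.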
EPOS (Explicitly Parallel Operations System)
• Instruction-Level Parallelism (ILP) extraction:
– The front-end of the IMPACT compiler is used to optimize the HLL description using:
• Traditional compiler techniques
• Superblock and Hyperblock creation
• The generated EPOS accelerators can replace generic COTS IP by:
– Offering high customization according to the system requirements
– Providing better performance and power efficiency than a generic DSP-core/microprocessor
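Superblock creation, mentioned above as part of ILP extraction, can be illustrated with a minimal sketch. It is not the IMPACT implementation: the function name, the CFG encoding, and the greedy "follow the heaviest edge" heuristic are assumptions chosen for brevity. The example CFG uses four basic blocks with branch weights of 90 and 10, as in the superblock formation figure. A trace is selected along the most frequent path, then the tail is duplicated so that the trace has a single entry.

```python
from typing import Dict, List, Tuple

CFG = Dict[str, List[Tuple[str, int]]]  # block -> [(successor, edge weight)]

def form_superblock(cfg: CFG, entry: str) -> Tuple[List[str], CFG]:
    """Greedy trace selection followed by tail duplication (illustrative only)."""
    # 1. Trace selection: keep following the most frequently taken edge.
    trace = [entry]
    while cfg[trace[-1]]:
        succ = max(cfg[trace[-1]], key=lambda edge: edge[1])[0]
        if succ in trace:  # stop at a back edge
            break
        trace.append(succ)

    trace_edges = set(zip(trace, trace[1:]))

    def has_side_entrance(block: str) -> bool:
        return any(
            succ == block and (pred, succ) not in trace_edges
            for pred, edges in cfg.items()
            for succ, _ in edges
        )

    # 2. Find the first non-head trace block that can be entered from the side.
    first = next((i for i in range(1, len(trace)) if has_side_entrance(trace[i])), None)
    new_cfg: CFG = {block: list(edges) for block, edges in cfg.items()}
    if first is None:
        return trace, new_cfg

    # 3. Tail duplication: copy the tail and send side entrances to the copies.
    copy = {block: block + "d" for block in trace[first:]}
    for block, dup in copy.items():
        # Weights are carried over unchanged; a real compiler would re-scale them.
        new_cfg[dup] = [(copy.get(s, s), w) for s, w in cfg[block]]
    for pred, edges in cfg.items():
        new_cfg[pred] = [
            (copy[s], w) if s in copy and (pred, s) not in trace_edges else (s, w)
            for s, w in edges
        ]
    return trace, new_cfg

cfg = {
    "BB1": [("BB2", 90), ("BB3", 10)],
    "BB2": [("BB4", 90)],
    "BB3": [("BB4", 10)],
    "BB4": [],
}
trace, new_cfg = form_superblock(cfg, "BB1")
print(trace)
print(new_cfg)
```

After the transformation, the trace BB1, BB2, BB4 can only be entered at BB1, and the infrequent path through BB3 falls into the duplicate BB4d. Removing side entrances in this way is what lets a scheduler move operations freely within the trace.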
EPOS-based Wireless SoC Solution
• Each module is mapped directly onto a customized EPOS accelerator
• The interfaces between EPOS accelerators, as well as between EPOS modules and other IP, are defined in the HLL program and automatically synthesized along with the EPOS datapaths
• Exploration of alternative system implementations becomes efficient and extremely fast
• Each EPOS processor can be re-programmed within the system to execute optimized/modified versions of its original functionality
EPOS Performance Results
• EPOS Configuration used:
– 4xALU
– 1xMUL
– 1xST-Port
– 1xLD-Port
• FU Latencies:
– ALU: 1
– MUL: 3
– LD: 4
– ST: 1

Application   NISC (cycles)   EPOS (cycles)
startup       1002            793
dijkstra      36074           15096
bubble        9691            2916
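The speed-up implied by the cycle counts above is the NISC count divided by the EPOS count. A short sketch of that arithmetic follows; the variable names are illustrative, and the only data used are the three benchmark rows reported above.

```python
# Cycle counts from the results table: application -> (NISC cycles, EPOS cycles).
cycles = {
    "startup": (1002, 793),
    "dijkstra": (36074, 15096),
    "bubble": (9691, 2916),
}

# Speed-up of EPOS relative to NISC: fewer cycles gives a ratio above 1.
speedup = {app: nisc / epos for app, (nisc, epos) in cycles.items()}

for app, ratio in speedup.items():
    print(f"{app:10s} {ratio:.2f}x")
```

The ratios come out at roughly 1.26x for startup, 2.39x for dijkstra and 3.32x for bubble.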
[Figure: Wireless System block diagram with EPOS accelerators. Analog circuits (amplifier, filter, ADC); USB, 802.11g, Bluetooth, FFT, Crypto and DCT EPOS modules; MCU; SRAM and ROM; interrupt controller; timers/counters; DMA controller]
[Figure: Wireless System block diagram with CPU and DSP core. Analog circuits (amplifier, filter, ADC); USB, 802.11g and Bluetooth modules; SRAM and ROM; interrupt controller; timers/counters; DMA controller; encryption/decryption]
[Figure: Instruction-less processor datapath. PC with incrementer, microcode memory, FU1-FU3, register file, data memory, constant and offset fields]
[Figure: EPOS Flow. Stages: Superblock/Hyperblock Formation (IMPACT), Scheduling, Register Allocation, Forwarding Network Minimization]
[Figure: EPOS accelerator. PC with incrementer, microcode banks MCBank1-MCBank4, FU1-FU4, PRF, SRF1-SRF4, register file, data memory, offset and constant fields]
[Figure: Superblock formation. A control-flow graph of BB1-BB4 with 90/10 branch weights is turned into superblocks SB1 and SB2, with BB4 duplicated as BB4d]
[Figure: Hyperblock formation. A control-flow graph of BB1-BB4 with 55/45 branch weights is turned into hyperblock HB1]
[Figure: Performance Speed-up. Bar chart comparing NISC and EPOS on startup, dijkstra and bubble-sort; vertical axis from 0 to 3.5]