University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power...

20
1 University of Michigan Electrical Engineering and Computer Science A Scalable Low-power A Scalable Low-power Architecture For Software Architecture For Software Radio Radio Scott Mahlke Collaboration between: University of Michigan, Arizona State University, and ARM Ltd.
  • date post

    21-Dec-2015
  • Category

    Documents

  • view

    213
  • download

    0

Transcript of University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power...

Page 1: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

1 University of MichiganElectrical Engineering and Computer Science

A Scalable Low-power Architecture A Scalable Low-power Architecture For Software RadioFor Software Radio

Scott Mahlke

Collaboration between:University of Michigan, Arizona State University, and ARM Ltd.

Page 2: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

2 University of MichiganElectrical Engineering and Computer Science

Anatomy of a Cellular PhoneAnatomy of a Cellular Phone

BluetoothDSP+ASICs

GPSDSP+ASICs

BasebandProcessor

GPP+DSP+ASIC

AnalogFrontend

ASICs

ApplicationProcessorGPP+DSP

PowerManager

Camera

Keyboard

Display

Speaker

Microphone

W-CDMA

Page 3: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

3 University of MichiganElectrical Engineering and Computer Science

Software Defined Radio (SDR)Software Defined Radio (SDR)

• Use software routines instead of ASICs for the physical layer operations of wireless communication system

ASICs(PHY)ASICs(PHY)

ProgrammableHardware

ProgrammableHardware

SoftwareRoutinesSoftwareRoutines

• Rest of the talk– Characteristics of SDR algorithms– SODA architecture for power-efficient SDR– Compilation challenges and approach

Page 4: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

4 University of MichiganElectrical Engineering and Computer Science

Advantages of SDRAdvantages of SDR• Lower costs

– Platform longevity, higher volume– SW has lower development costs

• Time to market– Future protocols will have complex

implementations– Overlap testing/development cycles

• Adaptability– Standards change over time– Multi-mode operation– Sharing hardware resources

UWB EDGE 802.16a

802.16a Bluetooth

802.11b WCDMA 802.11n

SDR

Page 5: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

5 University of MichiganElectrical Engineering and Computer Science

Why is SDR Challenging?Why is SDR Challenging?

1

10

100

1000

0.1 1 10 100

Power (Watts)

Pe

ak

Pe

rfo

rma

nc

e (

Go

ps

)

Better

Pow

er Efficiency

10 Mops/m

W

100 Mops/m

W

1 Mops/m

W

GeneralPurpose

ProcessorsEmbeddedDSPs

Mobile SDRRequirements

Pentium MTI C6x

IBM CellHigh-end

DSPs

Page 6: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

6 University of MichiganElectrical Engineering and Computer Science

The Anatomy of Wireless ProtocolsThe Anatomy of Wireless Protocols

1. Filtering: suppress signals outside frequency band

2. Modulation: map source information onto signal waveforms

3. Channel Estimation: Estimate channel condition for transceivers

4. Error Correction: correct errors induced by noisy channel

LPF-Tx scrambler spreader InterleaverChannelencoder

LPF-Rx

searcher

descrambler despreader combiner

descrambler despreader

...

deinteleaverChanneldecoder

(turbo/viterbi)

Upper layersTransmitter

Receiver

D/A

A/D

FrontendW-CDMA Physical Layer Processing

LPF-Tx

LPF-Rx

scrambler spreader

descrambler despreader

descrambler despreader

combiner

searcher

InterleaverChannelencoder

deinteleaverChanneldecoder

(turbo/viterbi)

Page 7: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

7 University of MichiganElectrical Engineering and Computer Science

0

5000

10000

15000

20000

25000

30000

filtering modulation channelestimation

errorcorrection

[MO

PS

]

Idle State

Control Hold State

Active State

W-CDMA Workload ProfileW-CDMA Workload Profile

• One operation is equivalent to one RISC instruction• Searcher, Turbo decoder, and LPF are dominant workloads• Workload profile varies according to operation state

Page 8: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

8 University of MichiganElectrical Engineering and Computer Science

SDR Kernel CharacteristicsSDR Kernel Characteristics

• 8 to 16-bit precision• Vector operations

– long vectors– constant vector size

• Static data movement patterns

• Scalar operations

Kernels Type of Computation

Vector Width

W-CDMA

Filter Vector 64

Modulation Vector 2560

Channel Est. Vector 320

Error Correction Mixed 8 or 256

802.11a

Filter Vector 33

Modulation (FFT) Vector 64

Channel Est. Mixed 16

Error Correction Mixed 64

Page 9: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

9 University of MichiganElectrical Engineering and Computer Science

SODA System Architecture for 3GSODA System Architecture for 3G

• 4 PEs– static kernel mapping and

scheduling– SIMD+Scalar units

• 1 ARM GPP controller– scalar algorithms and

protocol controls

LocalMem

ExecutionUnit

PE

LocalMem

ExecutionUnit

PE

LocalMem

ExecutionUnit

PE

LocalMem

ExecutionUnit

PE

GlobalMemSystem ArchitectureARM

LocalMem

ExecutionUnit

PE

LocalMem

ExecutionUnit

PE

LocalMem

ExecutionUnit

PE

LocalMem

ExecutionUnit

PE

GlobalMemSystem ArchitectureARM

Page 10: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

10 University of MichiganElectrical Engineering and Computer Science

LocalMem

ExecutionUnit

PE

LocalMem

ExecutionUnit

PE

LocalMem

ExecutionUnit

PE

LocalMem

ExecutionUnit

PE

GlobalMemSystem ArchitectureARM

• 2-Level scratchpad memories– 12KB Local scratchpad memory

for stream queues– 64KB global scratchpad memory

for large buffers

• Low-throughput shared bus– 200MHz 32-bit bus– inter-PE communication using

DMA

SODA System Architecture for 3GSODA System Architecture for 3G

Page 11: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

11 University of MichiganElectrical Engineering and Computer Science

SODA PE ArchitectureSODA PE ArchitecturePE

Scalar pipeline

32x16bit

SSN

Vector to ScalarStage 1

SIMD Memory (8KB)

IR

RF ID

16bit EX

16bit WBALU

Scalar Memory (4KB)

32-waySIMD

IR

ScalarRF

RF ID

EX

WB

IR

AGURF

AGU ALU12bit

Inst.Mem.4KB

SIMD pipeline

AGU pipelineDMA16bit BUS

512bit

Vector to ScalarStage 2

Scalar to Vector

RF ID

16bit EX Multiplier16bit W

BALU

RF ID

16bit EX Multiplier16bit W

BALU

RF ID

16bit EX Multiplier16bit W

BALU

RF ID

16bit EX Multiplier16bit W

BALU

2 issue LIW (400MHz) - SIMD + (Scalar or AGU) DMA: - mem-to-mem transfer - access global memory

Page 12: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

12 University of MichiganElectrical Engineering and Computer Science

SODA PE SIMD PipelineSODA PE SIMD PipelinePE

Scalar pipeline

32x16bit

SSN

Vector to ScalarStage 1

SIMD Memory (8KB)

IR

RF ID

16bit EX

16bit WBALU

Scalar Memory (4KB)

32-waySIMD

IR

ScalarRF

RF ID

EX

WB

IR

AGURF

AGU ALU12bit

Inst.Mem.4KB

SIMD pipeline

AGU pipelineDMA16bit BUS

512bit

Vector to ScalarStage 2

Scalar to Vector

RF ID

16bit EX Multiplier16bit W

BALU

RF ID

16bit EX Multiplier16bit W

BALU

RF ID

16bit EX Multiplier16bit W

BALU

RF ID

16bit EX Multiplier16bit W

BALU

16-bit 16 entries2 read/ 1 write port

RF

EX

16-bitMultiplier

40-bit ACC

16-bit

ALU

16bit

16bitWB

16bit

Page 13: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

13 University of MichiganElectrical Engineering and Computer Science

SODA PE SIMD PipelineSODA PE SIMD PipelinePE

Scalar pipeline

32x16bit

SSN

Vector to ScalarStage 1

SIMD Memory (8KB)

IR

RF ID

16bit EX

16bit WBALU

Scalar Memory (4KB)

32-waySIMD

IR

ScalarRF

RF ID

EX

WB

IR

AGURF

AGU ALU12bit

Inst.Mem.4KB

SIMD pipeline

AGU pipelineDMA16bit BUS

512bit

Vector to ScalarStage 2

Scalar to Vector

RF ID

16bit EX Multiplier16bit W

BALU

RF ID

16bit EX Multiplier16bit W

BALU

RF ID

16bit EX Multiplier16bit W

BALU

RF ID

16bit EX Multiplier16bit W

BALU

SIMD: - 32 wide - predicated exec. - predicated neg.

Memory: - 512bit port - 1 read port - 1 write port - 8 KBytes

Page 14: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

14 University of MichiganElectrical Engineering and Computer Science

SODA PE SIMD Shuffle NetworkSODA PE SIMD Shuffle NetworkPE

Scalar pipeline

32x16bit

SSN

Vector to ScalarStage 1

SIMD Memory (8KB)

IR

RF ID

16bit EX

16bit WBALU

Scalar Memory (4KB)

32-waySIMD

IR

ScalarRF

RF ID

EX

WB

IR

AGURF

AGU ALU12bit

Inst.Mem.4KB

SIMD pipeline

AGU pipelineDMA16bit BUS

512bit

Vector to ScalarStage 2

Scalar to Vector

RF ID

16bit EX Multiplier16bit W

BALU

RF ID

16bit EX Multiplier16bit W

BALU

RF ID

16bit EX Multiplier16bit W

BALU

RF ID

16bit EX Multiplier16bit W

BALU

SIMD Shuffle Network

• Shuffle exchange• Inverse shuffle exchange• Exchange only• Iterative feedback

Page 15: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

15 University of MichiganElectrical Engineering and Computer Science

SODA PE Scalar PipelineSODA PE Scalar PipelinePE

Scalar pipeline

32x16bit

SSN

Vector to ScalarStage 1

SIMD Memory (8KB)

IR

RF ID

16bit EX

16bit WBALU

Scalar Memory (4KB)

32-waySIMD

IR

ScalarRF

RF ID

EX

WB

IR

AGURF

AGU ALU12bit

Inst.Mem.4KB

SIMD pipeline

AGU pipelineDMA16bit BUS

512bit

Vector to ScalarStage 2

Scalar to Vector

RF ID

16bit EX Multiplier16bit W

BALU

RF ID

16bit EX Multiplier16bit W

BALU

RF ID

16bit EX Multiplier16bit W

BALU

RF ID

16bit EX Multiplier16bit W

BALU

Scalar: - One 16-bit datapath - No mult unit Scalar memory: - 16bit port - 1 read/write port - 4 KBytes Scalar-to-Vector Vector-to-Scalar

Page 16: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

16 University of MichiganElectrical Engineering and Computer Science

Power Consumption at 180nmPower Consumption at 180nm

0

200

400

600

800

1000

1200

1400

PE DataMemory

PE SIMDRF

PE SIMDALUs

PE SIMDPipeline

PE Others GlobalMemory

SystemOthers

Po

wer

(m

W)

in 1

80n

m

W-CDMA (2Mbps) 802.11a (24Mbps)

• 180nm ~ 3 W, 26.6 mm2

• 90nm (est) ~ 0.5 W, 6.7 mm2

Page 17: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

17 University of MichiganElectrical Engineering and Computer Science

SDR Compilation StrategySDR Compilation Strategy• Two level application description

– System-level: Concurrent tasks extracted from “C + channels + attributes”

– Kernel-level: Data parallelism extracted from “C + vectors + Matlab operators”

• System compilation – Task level parallelism– Generates tasks, schedules, communication stubs, DMA

requests, timing assertions, synchronization, debug support• Kernel compilation – Data level parallelism

– Lower virtual DLP to physical implementation

Page 18: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

18 University of MichiganElectrical Engineering and Computer Science

Stylized Automatic ParallelizationStylized Automatic Parallelization

void main() { for(int i=0; i<N; ++i) { int x = spread(w[i]); int y = scramb(x); fir(y); }}

void main() { for(int i=0; i<N; ++i) { int x = spread(w[i]) @P0; int y = scramb(x) @P1; fir(y) @P2; }}

1. Task assignment

void t1() { for(int i=0; i<N; ++i) { int x = spread(w[i])@P0; putchannel(X,x); }}void t2() { for(int i=0; i<N; ++i) { int y = scramb(getchan(X))@P1; putchannel(Y,y); }}void t3() { for(int i=0; i<N; ++i) { fir(getchannel(Y)) @ P2; }}main() { fork (t1); fork (t2); fork (t3);}2. Decompose tasks

Page 19: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

19 University of MichiganElectrical Engineering and Computer Science

Kernel Level CompilationKernel Level Compilationtemplate<class T, TAPS, BSIZE>kernel FIR { vector<T, TAPS> z; vector<T, TAPS> coeff;

void set_coeff(vector<T, TAPS> c) { coeff = c; }

void run(channel<T, BSIZE> inbuf, channel<T, BSIZE> outbuf) { T in, out;

for (i = 0; i < BSIZE; i++) { in = inbuf.pop(); z += coeff * in; out = z[0]; outbuf.push(out); z = (z(1:TAPS-1),0); } }};

int main() { FIR<int8, 64, BUF_SIZE> fir65r; ...}

1. object declaration

// in = inbuf.pop();sld(sin, inbuf);agu_incr(inbuf, sizeof(int8));

// z += coeff * in;vdup(vin, sin);vmul(vt1, _fir65r_coeff_v0, vin);vmul(vt2, _fir65r_coeff_v1, vin);vadd(_fir65r_z_v0, _fir65_z_v0, vt1);vadd(_fir65r_z_v1, _fir65_z_v1, vt2);

2. operation translation

Page 20: University of Michigan Electrical Engineering and Computer Science 1 A Scalable Low-power Architecture For Software Radio Scott Mahlke Collaboration between:

20 University of MichiganElectrical Engineering and Computer Science

Final ThoughtsFinal Thoughts• 2G and 3G SDR solutions are achievable in 90nm

– 3.9G 4-10x more performance with mW power consumption• Core technologies for future networks

– OFDM 64 – 2048 point FFT– MIMO – use of multiple antennas for transmission/reception– Low density parity check codes

• Key insight: SDR requires innovation across algorithm, software and hardware

• SDR platforms offer low-cost, longevity, and adaptability