Ziria: Wireless Programming for Hardware Dummies

Božidar Radunović, joint work with Gordon Stewart, Mahanth Gowda, Geoff Mainland, Dimitrios Vytiniotis. http://research.microsoft.com/en-us/projects/ziria/


Page 1: Ziria: Wireless Programming  for Hardware Dummies

Božidar Radunović, joint work with Gordon Stewart, Mahanth Gowda, Geoff Mainland, Dimitrios Vytiniotis

http://research.microsoft.com/en-us/projects/ziria/

Page 2: Layout

- Introduction
- Ziria Programming Language
- Compilation and Execution
- Case Study - WiFi Design
- Conclusions

Page 3: Motivation – why this course?

- Lots of innovation in PHY/MAC design
- Popular experimental platform: GNURadio
  - Relatively easy to program, but slow; no real network deployment
- Modern wireless PHYs require high-rate DSP
- Real-time platforms [SORA, WARP, …]
  - Meet protocol processing requirements, but are difficult to program, offer no code portability, and require lots of low-level hand-tuning

Page 4: Issues for wireless researchers

- CPU platforms (e.g. SORA)
  - Manual vectorization, CPU placement
  - Cache/data sizing optimizations
- FPGA platforms (e.g. WARP)
  - Latency-sensitive design, difficult for new students/researchers to break into
- Portability/readability
  - Manually highly optimized code is difficult to read and maintain
  - Also: practically impossible to target another platform

Difficulty in writing and reusing code hampers innovation.

Page 5: Hardware platforms

- FPGA: the programmer deals with hardware issues (WARP, Airblue)
- CPUs: SORA bricks [MSR Asia], GNURadio blocks

SORA was a huge breakthrough: a design of RX/TX with a PCI interface, 16 Gbps throughput and ~μs latency, plus a very efficient C++ library. We build on top of SORA.

Many other options are now available, e.g. http://myriadrf.org/

Page 6: What is wrong with current tools?

Page 7: Current SDR software tools

- Portable (FPGA/CPU), graphical interface: Simulink, LabView
- CPU-based, C/C++/Python: GnuRadio, SORA
- Control and data separation: CodiPhy [U. of Colorado], OpenRadio [Stanford]
- Specialized languages (DSLs):
  - Stream processing languages: StreamIt [MIT]
  - DSLs for DSP/arrays: Feldspar [Chalmers], Spiral; we put more emphasis on control

Page 8: Issues

- The programming abstraction is tied to the execution model: the programmer has to reason about how the program will be executed/optimized while writing the code
- Verbose programming
- Shared state
- Low-level optimization

We next illustrate these issues on Sora code examples (other platforms have similar problems).

Page 9: Running example: WiFi receiver

[Figure: receiver dataflow graph. Blocks: removeDC, DetectCarrier, ChannelEstimation, InvertChannel, Decode Header, InvertChannel, Decode Packet. Control information passed between stages: Packet start, Channel info, Packet info.]

Page 10: How do we execute this on a CPU?

[Figure: the WiFi receiver graph from Page 9.]

Page 11: Dataflow streaming abstractions

The predominant abstraction today [e.g. SORA, StreamIt, GnuRadio] is that of a “vertex” in a dataflow graph: events (messages) come in, events (messages) come out.

- Reasonable as an abstraction of the execution model
- Unsatisfactory as a programming and compilation model

Why unsatisfactory? It does not expose:
(1) When is vertex state (re-)initialized?
(2) Under which external “control” messages can the vertex change behavior?
(3) How can a vertex transmit “control” information to other vertices?

Page 12: Shared state

static inline
void CreateDemodGraph11a_40M(ISource*& srcAll, ISource*& srcViterbi, ISource*& srcCarrierSense)
{
    CREATE_BRICK_SINK   (drop,  TDropAny,        BB11aDemodCtx);
    CREATE_BRICK_SINK   (fsink, TBB11aFrameSink, BB11aDemodCtx);
    CREATE_BRICK_FILTER (desc,  T11aDesc,        BB11aDemodCtx, fsink);

    typedef T11aViterbi<5000*8, 48, 256> T11aViterbiComm;
    CREATE_BRICK_FILTER (viterbi, T11aViterbiComm::Filter,    BB11aDemodCtx, desc);
    CREATE_BRICK_FILTER (vit0,    TThreadSeparator<>::Filter, BB11aDemodCtx, viterbi);

    // 6M
    CREATE_BRICK_FILTER (di6, T11aDeinterleaveBPSK,  BB11aDemodCtx, vit0);
    CREATE_BRICK_FILTER (dm6, T11aDemapBPSK::filter, BB11aDemodCtx, di6);
    …
    CREATE_BRICK_SINK   (plcp,      T11aPLCPParser,        BB11aDemodCtx);
    CREATE_BRICK_FILTER (sviterbik, T11aViterbiSig,        BB11aDemodCtx, plcp);
    CREATE_BRICK_FILTER (dibpsk,    T11aDeinterleaveBPSK,  BB11aDemodCtx, sviterbik);
    CREATE_BRICK_FILTER (dmplcp,    T11aDemapBPSK::filter, BB11aDemodCtx, dibpsk);
    CREATE_BRICK_DEMUX5 (sigsel,    TBB11aRxRateSel,       BB11aDemodCtx, dmplcp, dm6, dm12, dm24, dm48);
    CREATE_BRICK_FILTER (pilot,     TPilotTrack,           BB11aDemodCtx, sigsel);
    CREATE_BRICK_FILTER (pcomp,     TPhaseCompensate,      BB11aDemodCtx, pilot);
    CREATE_BRICK_FILTER (chequ,     TChannelEqualization,  BB11aDemodCtx, pcomp);
    CREATE_BRICK_FILTER (fft,       TFFT64,                BB11aDemodCtx, chequ);
    CREATE_BRICK_FILTER (fcomp,     TFreqCompensation,     BB11aDemodCtx, fft);
    CREATE_BRICK_FILTER (dsym,      T11aDataSymbol,        BB11aDemodCtx, fcomp);
    CREATE_BRICK_FILTER (dsym0,     TNoInline,             BB11aDemodCtx, dsym);

Shared state: every brick is created with the same context, BB11aDemodCtx.

Page 13: Separation of control and data

void Reset() {
    Next0()->Reset();
    // No need to reset all paths, just reset the path we used in this frame
    switch (data_rate_kbps) {
    case 6000: case 9000:
        Next1()->Reset(); break;
    case 12000: case 18000:
        Next2()->Reset(); break;
    case 24000: case 36000:
        Next3()->Reset(); break;
    case 48000: case 54000:
        Next4()->Reset(); break;
    }
}

Resetting whoever* is downstream
(*we don't know who that is when we write this component)

Page 14: Verbosity

DEFINE_LOCAL_CONTEXT(TBB11aRxRateSel, CF_11RxPLCPSwitch, CF_11aRxVector);
template<TDEMUX5_ARGS>
class TBB11aRxRateSel : public TDemux<TDEMUX5_PARAMS>
{
    CTX_VAR_RO (CF_11RxPLCPSwitch::PLCPState, plcp_state);
    CTX_VAR_RO (ulong, data_rate_kbps);   // data rate in kbps
public:
    …
public:
    REFERENCE_LOCAL_CONTEXT(TBB11aRxRateSel);
    STD_DEMUX5_CONSTRUCTOR(TBB11aRxRateSel)
        BIND_CONTEXT(CF_11RxPLCPSwitch::plcp_state, plcp_state)
        BIND_CONTEXT(CF_11aRxVector::data_rate_kbps, data_rate_kbps)
    {}

- Declarations are written in the host language
- The language is not specialized, so it is often verbose
- This hinders fast prototyping

Page 15: SDR manual optimizations (LUT)

struct _init_lut {
    // lut[i][j]: output byte for input byte i, starting from 7-bit state j (bits taken LSB first)
    void operator()(uchar (&lut)[256][128]) {
        int i, j, k;
        uchar x, s, o;
        for (i = 0; i < 256; i++) {
            for (j = 0; j < 128; j++) {
                x = (uchar)i;
                s = (uchar)j;
                o = 0;
                for (k = 0; k < 8; k++) {
                    // feedback taps at state bits 0 and 3 (x^7 + x^4 + 1)
                    uchar o1 = (x ^ (s) ^ (s >> 3)) & 0x01;
                    s = (s >> 1) | (o1 << 6);   // output bit is shifted back into the state (self-synchronizing)
                    o = (o >> 1) | (o1 << 7);   // collect output bits
                    x = x >> 1;
                }
                lut[i][j] = o;
            }
        }
    }
};

Hand-written bit-fiddling code to create lookup tables for specific computations that must run very fast.

Page 16: Vectorization

[Figure: the WiFi receiver graph from Page 9.]

- Beneficial to process items in chunks
- But how large can chunks be?

Page 17: My own frustrations

- Implemented several PHY algorithms in FPGA
  - Never been able to reuse them: the complexity of interfacing (timing and precision) was higher than rewriting!
- Implemented several PHY algorithms in Sora
  - Better reuse, but still difficult
  - Spent 2h figuring out which internal state variable I hadn't initialized when I borrowed a piece of code from another project

I want tools that allow me to write reusable code and incrementally build ever more complex systems!

Page 18: Improving this situation

New wireless programming platform:
1. Code written in a high-level language
2. Compiler deals with low-level code optimization
3. Same code compiles on different platforms (not there just yet!)

Challenges:
1. Design PL abstractions that are intuitive and expressive
2. Design efficient compilation schemes (to multiple platforms)

What is special about wireless:
1. … that affects abstractions: a large degree of separation between data and control
2. … that affects compilation: the need for high-throughput stream processing

Page 19: Our choice: a domain-specific language

What are domain-specific languages? Examples: Make, SQL.

Benefits:
- The language design captures the specifics of the task
- This enables the compiler to optimize better

Page 20: Why is wireless code special?

- Wireless = lots of signal processing
- Control vs. data flow separation
- Data processing elements: FFT/IFFT, coding/decoding, scrambling/descrambling
  - Predictable execution and performance, independent of the data
- Control flow elements: header processing, rate adaptation

Page 21: Programming model

[Figure: the WiFi receiver graph from Page 9.]

Page 22: How do we want code to look?

Example: IEEE 802.11a scrambler, $S(x) = x^7 + x^4 + 1$

Ziria:

x <- take;
do {
  tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
  scrmbl_st[0:5] := scrmbl_st[1:6];
  scrmbl_st[6] := tmp;
  y := x ^ tmp
};
emit (y)
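For comparison with C/C++, the same per-bit logic can be sketched as follows. This is a minimal C++ sketch that mirrors the Ziria statements one by one; the struct name Scrambler and its methods are illustrative and not part of Ziria or Sora, and the all-ones seed follows the scrmbl_st initializer used in the full scrambler example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Bit-serial 802.11a scrambler, S(x) = x^7 + x^4 + 1.
// st[0..6] plays the role of scrmbl_st in the Ziria code.
struct Scrambler {
    std::array<uint8_t, 7> st{{1, 1, 1, 1, 1, 1, 1}};  // all-ones seed

    uint8_t step(uint8_t x) {
        assert(x <= 1);                                   // one input bit
        uint8_t tmp = st[3] ^ st[0];                      // tmp := scrmbl_st[3] ^ scrmbl_st[0]
        for (int i = 0; i < 6; i++) st[i] = st[i + 1];    // scrmbl_st[0:5] := scrmbl_st[1:6]
        st[6] = tmp;                                      // scrmbl_st[6] := tmp
        return x ^ tmp;                                   // y := x ^ tmp
    }

    // Equivalent of repeat { x <- take; do {...}; emit y } over a finite buffer.
    std::vector<uint8_t> run(const std::vector<uint8_t>& bits) {
        std::vector<uint8_t> out;
        out.reserve(bits.size());
        for (uint8_t b : bits) out.push_back(step(b));
        return out;
    }
};
```

Feeding zeros exposes the scrambling sequence itself; the first 16 output bits are 0000111011110010.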

Page 23: What do we not want to optimize?

- We assume efficient DSP libraries: FFT, Viterbi/Turbo decoding
- The same ones are used in many standards: WiFi, WiMax, LTE
- They are readily available: FPGA (Xilinx, Altera), DSP (coprocessors), CPUs (Volk, Sora libraries, Spiral)
- Most of PHY design is in connecting these blocks

Page 24: Layout (as on Page 2)

Page 25: Ziria: a 2-layer design

- Lower layer
  - Imperative C-like code for manipulating bits, bytes, arrays, etc.
  - NB: you can plug in any C function in this layer
- Higher layer
  - A monadic language for specifying and staging stream processors
  - Enforces a clean separation between control and data flow, and clean state semantics
- The runtime implements the low-level execution model
- The monadic pipeline staging language facilitates aggressive compiler optimizations

Page 26: Ziria: control-aware stream abstractions

- A stream transformer t, of type ST T a b: consumes inStream (a) and produces outStream (b)
- A stream computer c, of type ST (C v) a b: consumes inStream (a), produces outStream (b), and when it terminates produces outControl (v)

Page 27: Staging a pipeline, in diagrams

repeat { v <- (c1 >>> t1) ; t2 >>> t3 }

[Figure: c1 and t1 composed into a computer (C), followed by t2 and t3 composed into a transformer (T).]

- “Vertical composition”: along the data path (“arrows”, >>>)
- “Horizontal composition”: along the control path (“monads”, sequencing with v <- …)

Page 28: Running example: WiFi scrambler

let comp scrambler() =
  var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
  var tmp: bit;
  var y: bit;

  repeat seq {
    x <- take;
    do {
      tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
      scrmbl_st[0:5] := scrmbl_st[1:6];
      scrmbl_st[6] := tmp;
      y := x ^ tmp;
    };
    emit y
  }
in ...

Page 29: Running example: WiFi scrambler (annotated)

Same scrambler code as on Page 28. “let comp scrambler() =” starts defining the computational method; “in <rest of the code>” ends the definition.

Page 30: Running example: WiFi scrambler (annotated)

Same code as on Page 28. The var declarations (scrmbl_st, tmp, y) are local variables; their types are bit and arr[7] bit (an array of bits); {'1,'1,'1,'1,'1,'1,'1} is an array of bit constants.

Page 31: Running example: WiFi scrambler (annotated)

Same code as on Page 28. Special-purpose computers: take and emit.

Page 32: Running example: WiFi scrambler (annotated)

Same code as on Page 28. Imperative (C/Matlab-like) code: the statements inside do { … }.

Page 33: Running example: WiFi scrambler (annotated)

Computers and transformers.

[Figure: repeat wraps the sequence take → do → emit; x enters through take and y leaves through emit.]

Page 34: Whole program

read >>> do_something >>> write

Reads and writes can come from/go to RF, IP, a file, or a dummy source/sink.

Page 35: Computation language primitives

They define control flow. Two groups:
- Transformers
- Computers

Page 36: Transformers

Map:

let f(x : int) =
  var y : int := 42;
  y := y + 1;
  return (x+y);
in
read >>> map f >>> write

Repeat:

let f(x : int) =
  x <- take;
  if (x > 0) then emit 1
in
read >>> repeat f >>> write
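The input/output behaviour of the two transformers can be sketched in C++ over finite buffers. This only illustrates what each pipeline computes, not how Ziria executes streams; it assumes y in f is re-initialized on every call (so f(x) = x + 43), and map_f and repeat_emit_positive are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// f from the slide; y is local to each call, so it restarts at 42 every time.
int f(int x) {
    int y = 42;
    y = y + 1;
    return x + y;
}

// read >>> map f >>> write: exactly one output per input.
std::vector<int> map_f(const std::vector<int>& in) {
    std::vector<int> out;
    for (int x : in) out.push_back(f(x));
    return out;
}

// read >>> repeat { x <- take; if (x > 0) then emit 1 } >>> write:
// each iteration takes one input and emits zero or one outputs.
std::vector<int> repeat_emit_positive(const std::vector<int>& in) {
    std::vector<int> out;
    for (int x : in) {
        if (x > 0) out.push_back(1);
    }
    return out;
}
```

For the input {-1, 0, 5}, map_f gives {42, 43, 48}, while repeat_emit_positive gives just {1}: map is one-in/one-out, whereas a repeat body decides per iteration how much to emit.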

Page 37: Computers

While:

while (!crc > 0) {
  x <- take;
  do { crc := search(x); }
}

If-then-else:

if (rate == CR_12) then emit enc12(x)
else emit enc23(x);

Also: take, emit, for

Page 38: Expression language – data processing

- A mix of C and Matlab
- Can be directly linked to any C function
- A subset of data types (mainly fixed point):

<basetype> ::= bit | bool | double | int | int8 | int16 | int32
             | complex | complex16 | complex32
             | struct TYPENAME
             | arr <basetype> | arr[INTEGER] <basetype> | arr[length(VARNAME)] <basetype>

Page 39: Expression language – example

let build_coeff(pcoeffs:arr[64] complex16, ave:int16, delta:int16) =
  var th:int16;

  th := ave - delta * 26;
  for i in [64-26, 26] {
    pcoeffs[i] := complex16{re=cos_int16(th);im=-sin_int16(th)};
    th := th + delta
  };
  th := th + delta;
  for i in [1,26] {
    pcoeffs[i] := complex16{re=cos_int16(th);im=-sin_int16(th)};
    th := th + delta
  }
in

Annotations: build_coeff is a function; [64-26, 26] is an array range, equivalent to [64-26:64]; complex16 is a fixed-point complex number; cos_int16 and sin_int16 are external C functions.
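To see what build_coeff computes, here is a floating-point C++ sketch. It swaps the fixed-point pieces (complex16, cos_int16, sin_int16) for std::complex<double>, std::cos and std::sin, so it shows the arithmetic rather than the bit-exact fixed-point result. Subcarrier k (-26..-1 stored at indices 38..63, 1..26 stored at indices 1..26) receives exp(-j(ave + k*delta)); the DC slot at index 0 is skipped.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>

using cd = std::complex<double>;

// Floating-point stand-in for build_coeff (not the fixed-point original).
void build_coeff(std::array<cd, 64>& pcoeffs, double ave, double delta) {
    double th = ave - delta * 26;
    for (int i = 64 - 26; i < 64; i++) {   // for i in [64-26, 26]: subcarriers -26..-1
        pcoeffs[i] = cd(std::cos(th), -std::sin(th));
        th += delta;
    }
    th += delta;                            // skip the DC subcarrier (index 0)
    for (int i = 1; i <= 26; i++) {         // for i in [1, 26]: subcarriers 1..26
        pcoeffs[i] = cd(std::cos(th), -std::sin(th));
        th += delta;
    }
}
```

Indices 0 and 27..37 (DC and guard subcarriers) are left untouched.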

Page 40: Libraries

Ziria header:
let external v_sub_complex32(c:arr complex32, a:arr[length(c)] complex32, b:arr[length(c)] complex32) : () in

C method:
int __ext_v_sub_complex32(struct complex32* c, int len, struct complex32* a, int __unused_2, struct complex32* b, int __unused_1)

Libraries (mainly linked to existing Sora libraries): SIMD instructions, FFT and Viterbi, fixed-point trigonometry, visualisation
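An external function following this convention can be sketched in C++ as below. The pattern visible in the C signature is that each Ziria array argument becomes a pointer followed by an int length; the layout of complex32 (two int32_t fields re and im) and the meaning of the int return value are assumptions here, and the library versions are built on SIMD instructions rather than this scalar loop.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of Ziria's complex32: 32-bit integer real and imaginary parts.
struct complex32 {
    int32_t re;
    int32_t im;
};

// Scalar sketch of c := a - b, element-wise. The lengths of a and b are
// ignored because the Ziria header already forces them to equal length(c).
extern "C" int __ext_v_sub_complex32(complex32* c, int len,
                                     complex32* a, int /*len_a*/,
                                     complex32* b, int /*len_b*/) {
    assert(len >= 0);
    for (int i = 0; i < len; i++) {
        c[i].re = a[i].re - b[i].re;
        c[i].im = a[i].im - b[i].im;
    }
    return 0;   // assumption: the return value is not used for a () result
}
```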

Page 41: Frequently asked questions

- Why define a new language? Why not use C/Matlab/<your favourite language>?
- How do you share state?
- Why use “let x = 20+3*z in” instead of “x := 20 + 3*z;”?
- Why “x <- take” and not “x := take”?

Page 42: Question: how do you implement a teleport message?

[Figure: pipeline Frequency mixing → Equalizer → Decoding; a new_freq reconfiguration message must travel back from Decoding to Frequency mixing.]

Page 43: Answer: use repeat to reinitialize in the new state

let comp processor() =
  var new_freq := X;  // initialize
  repeat {
    ret <- (freq_mixing(new_freq) >>> equalizer >>> decoding);
    do { new_freq := ret }
  }

[Figure: Freq_mixing → Equalizer → Decoding inside a repeat loop; the returned value r is fed back to Freq_mixing.]

Page 44: Layout (as on Page 2)

Page 45: How to write a compiler?

Haskell + libraries:
- Parsing, code generation, flexible types, pattern matching
- First version in <2 months
- Easily extensible

Moral: compilers can be a useful tool!

Page 46: Compilation – high-level view

- Expression language -> C code
- Computation language -> execution model
- Numerous optimizations on the way:
  - Vectorization
  - Lookup tables
  - Conventional optimizations: folding, inlining, …

Page 47: Execution model: how to execute code?

[Figure: the WiFi receiver graph from Page 9.]

Page 48: Runtime

[Figure: the runtime drives blocks B1 and B2 through tick() and process(x).]

- Actions: tick(), process(x)
- Return values: YIELD (data_val), SKIP, DONE (control_val)

Q: Why do we need ticks?
A: Example: emit 1; emit 2; emit 3. This component produces output without consuming any input, so process(x) alone could never drive it.
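The tick/process distinction can be made concrete with a small C++ sketch. Status and Step are hypothetical names for the return values listed above (plus the NeedInput case used by the runtime main loop); Emit123 stands for emit 1; emit 2; emit 3 and can only be driven by tick(), whereas PassThrough stands for repeat { x <- take; emit x } and does its work in process(x).

```cpp
#include <cassert>

// Hypothetical encoding of the runtime's return values.
enum class Status { Yield, Skip, Done, NeedInput };

struct Step {
    Status status;
    int value;   // data value for Yield, control value for Done
};

// "emit 1; emit 2; emit 3": consumes no input, so each output is pulled by tick().
struct Emit123 {
    int next = 1;
    Step tick() {
        if (next <= 3) return {Status::Yield, next++};
        return {Status::Done, 0};   // assumption: 0 stands in for the unit result
    }
};

// "repeat { x <- take; emit x }": tick() asks for input, process(x) yields it.
struct PassThrough {
    Step tick() { return {Status::NeedInput, 0}; }
    Step process(int x) { return {Status::Yield, x}; }
};
```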

Page 49: Execution model – example

let comp test1() = repeat { (x:int) <- take; emit x; } in

read[int] >>> test1() >>> test1() >>> write[int]

[Figure: trace of tick() and process(n) calls between the pipeline components, with return values SKIP, YIELD(n) and DONE(n).]

Page 50: Runtime main loop

L1: t.init()                 // init top-level component
L2: whatis := t.tick()
L3: if (whatis == Yield b) then { put_buf(b); goto L2 }
    else if (whatis == Skip) then goto L2
    else if (whatis == Done) then exit()
    else if (whatis == NeedInput) then { x = get_buf(); whatis := t.process(x); goto L3 }

In reality:
- Very few function calls, thanks to a CPS-based translation: every “process” function knows its continuation
- Optimizations: never tick components with a trivial tick(), never generate process() for tick()-only components
- The only indirection is for bind: at different points in time, function pointers point to the correct “process” and “tick”
- A slightly different approach to input/output
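The main loop can be transcribed almost literally into C++. This sketch keeps the goto structure as a loop over the last returned status, uses a hypothetical AddOne component standing in for repeat { (x:int) <- take; emit x + 1 }, and, as an assumption not stated in the pseudocode, stops when the input buffer is exhausted. It deliberately ignores the CPS-based translation that the real generated code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Status { Yield, Skip, Done, NeedInput };
struct Step { Status status; int value; };

// Hypothetical component: repeat { (x:int) <- take; emit x + 1 }.
struct AddOne {
    void init() {}
    Step tick() { return {Status::NeedInput, 0}; }
    Step process(int x) { return {Status::Yield, x + 1}; }
};

template <typename Component>
std::vector<int> run(Component& t, const std::vector<int>& input) {
    std::vector<int> output;
    std::size_t pos = 0;
    t.init();                                     // L1
    Step whatis = t.tick();                       // L2
    for (;;) {                                    // L3: dispatch on whatis
        switch (whatis.status) {
        case Status::Yield:
            output.push_back(whatis.value);       // put_buf(b)
            whatis = t.tick();                    // goto L2
            break;
        case Status::Skip:
            whatis = t.tick();                    // goto L2
            break;
        case Status::Done:
            return output;                        // exit()
        case Status::NeedInput:
            if (pos == input.size()) return output;   // input exhausted
            whatis = t.process(input[pos++]);     // x = get_buf(); process(x); goto L3
            break;
        }
    }
}
```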

Page 51: How about performance?

let comp test1() = repeat { (x:int) <- take; emit x + 1; } in
read[int] >>> test1() >>> test1() >>> write[int]

The compiler turns this into a pipeline of maps:

(((read >>> let auto_map_6(x: int32) = x + 1 in {map auto_map_6})
        >>> let auto_map_7(x: int32) = x + 1 in {map auto_map_7})
        >>> write)

and the generated C code is straight-line:

buf_getint32(pbuf_ctx, &__yv_tmp_ln10_7_buf);
__yv_tmp_ln11_5_buf = auto_map_6_ln2_9(__yv_tmp_ln10_7_buf);
__yv_tmp_ln12_3_buf = auto_map_7_ln2_10(__yv_tmp_ln11_5_buf);
buf_putint32(pbuf_ctx, __yv_tmp_ln12_3_buf);

Page 52: Type-preserving transformations

let block_VECTORIZED (u: unit) =
  var y: int;
  repeat
    let vect_up_wrap_46 () =
      var vect_ya_48: arr[4] int;
      (vect_xa_47 : arr[4] int) <- take1;
      __unused_174 <- times 4 (\vect_j_50.
        (x : int) <- return vect_xa_47[0*4+vect_j_50*1+0];
        __unused_1 <- return y := x+1;
        return vect_ya_48[vect_j_50*1+0] := y);
      emit vect_ya_48
    in vect_up_wrap_46 (tt)

is transformed into:

let block_VECTORIZED (u: unit) =
  var y: int;
  repeat
    let vect_up_wrap_46 () =
      var vect_ya_48: arr[4] int;
      (vect_xa_47 : arr[4] int) <- take1;
      emit let __unused_174 = for vect_j_50 in 0, 4 {
             let x = vect_xa_47[0*4+vect_j_50*1+0] in
             let __unused_1 = y := x+1 in
             vect_ya_48[vect_j_50*1+0] := y
           } in vect_ya_48
    in vect_up_wrap_46 (tt)

The dataflow graph iteration is converted into a tight loop! In this case we got a 3x speedup.
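The effect of this transformation can be sketched in C++: the scalar body y := x + 1 is applied to arr[4] chunks in a tight loop instead of one item per tick/process round trip. The function names are hypothetical, and the sketch assumes the input length is a multiple of 4; the compiler itself has to choose chunk sizes that preserve the program's semantics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar body of repeat { (x:int) <- take; y := x + 1; emit y }.
int body(int x) { return x + 1; }

// Vectorized body: take an arr[4], run the scalar body in a tight loop, emit an arr[4].
std::array<int, 4> body_vect(const std::array<int, 4>& xa) {
    std::array<int, 4> ya{};
    for (std::size_t j = 0; j < 4; j++) ya[j] = body(xa[j]);
    return ya;
}

// Drive the vectorized body over a whole buffer, 4 items per iteration.
std::vector<int> run_vectorized(const std::vector<int>& in) {
    assert(in.size() % 4 == 0);   // assumption: length is a multiple of the chunk size
    std::vector<int> out;
    for (std::size_t i = 0; i < in.size(); i += 4) {
        std::array<int, 4> xa{in[i], in[i + 1], in[i + 2], in[i + 3]};
        std::array<int, 4> ya = body_vect(xa);
        out.insert(out.end(), ya.begin(), ya.end());
    }
    return out;
}
```

The output is identical to the scalar version; only the number of per-item runtime interactions changes.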

Page 53: Transformations on the abstract syntax tree

One of the main benefits of a compiler:
- Computation optimizations: vectorization, inlining, conversion to map, optimizing tick/process, …
- Expression optimizations: lookup tables, inlining, evaluating constant expressions, unrolling loops, …
- Tests: array boundary checks

[Figure: AST of y := x[0,10] inside a seq: the assignment's right-hand side is an EArrRead node on x (of type TArr) with start EVal (VInt 0) and length EVal (VInt 10).]

Page 54: Flexible compiler design

Example: rewrite x[0,length(x)] to x

subarr_inline_step e
  | EArrRead evals estart (LILength n) <- unExp e
  , EVal (VInt 0) <- unExp estart
  , TArr (Literal m) _ <- info evals
  , n == m
  = rewrite evals

This expressionOptimization function reads: if expression e is an EArrRead on array evals, whose type is an array of statically known length m, with start estart == 0 and read length (LILength n) where n == m, then replace e by evals itself. In the example: evals => x, estart => 0, (LILength n) => length(x).

It is easy to add new transformations.

Page 55: Vectorization

Idea: batch processing over multiple data items:

repeat {(x:int) <- take; emit x}   becomes   repeat {(x:arr[64] int) <- take; emit x}

- Modifications of the execution model: possible since the execution model is not hardcoded in the code; we need to respect the operational semantics
- Benefits: LUT (bits -> bytes), lower overhead of the execution model (ticks/processes), faster memcpy, better cache locality

Page 56: Vectorization challenges

[Figure: WiFi TX pipeline. ParseHeader outputs (Len, Rate) and the pipeline branches on the rate. 6 Mbps branch: CRC → scrambler → ½ encoder → interleaver → BPSK. Other branch: CRC → scrambler → ¾ encoder → interleaver → 64 QAM. Each block is annotated with its input/output widths, first per item (e.g. ½ encoder 1 bit → 2 bits, ¾ encoder 3 bits → 4 bits, interleaver 48 or 288 bits, BPSK 1 bit → 1 complex, 64 QAM 6 bits → 1 complex) and then after vectorization (e.g. scrambler 8 bits → 8 bits, ½ encoder 4 bits → 8 bits, ¾ encoder 24 bits → 32 bits, BPSK 8 bits → 8 complex, 64 QAM 12 bits → 2 complex). Len controls how much data each branch processes.]

Page 57: Look-up table (LUT) optimizations

- Key optimization for Sora TX
- Identify a block of expressions that transforms data and has limited input and output size
- Replace it with a LUT
- Similar to FPGA compilation
- Especially beneficial for bit operations

Page 58: LUT optimizations (by example)

let comp scrambler() =
  var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
  var tmp,y: bit;
  repeat {
    (x:bit) <- take;
    do {
      tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
      scrmbl_st[0:5] := scrmbl_st[1:6];
      scrmbl_st[6] := tmp;
      y := x ^ tmp
    };
    emit (y)
  }

After vectorization:

let comp v_scrambler () =
  var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
  var tmp,y: bit;
  var vect_ya_26: arr[8] bit;
  let auto_map_71(vect_xa_25: arr[8] bit) =
    LUT for vect_j_28 in 0, 8 {
      vect_ya_26[vect_j_28] := tmp := scrmbl_st[3]^scrmbl_st[0];
        scrmbl_st[0:+6] := scrmbl_st[1:+6];
        scrmbl_st[6] := tmp;
        y := vect_xa_25[0*8+vect_j_28]^tmp;
        return y
    };
    return vect_ya_26
  in map auto_map_71

Automatic lookup-table compilation:
- Input vars = scrmbl_st, vect_xa_25 = 15 bits
- Output vars = vect_ya_26, scrmbl_st = 2 bytes
- Idea: precompile to a LUT of 2^15 * 2 bytes = 64 KB
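The LUT idea can be reproduced by hand in C++. This sketch packs the 7-bit scrambler state and 8 input bits into a 15-bit index and stores 8 output bits plus the new state in a 16-bit entry, which gives the 2^15 * 2 bytes = 64 KB table described above. It illustrates the technique and is not the code the Ziria compiler emits; the bit order (bit j of a byte is the j-th bit in time) is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One step of the 802.11a scrambler on a packed state (bit k of st = scrmbl_st[k]).
static uint8_t scramble_bit(uint8_t& st, uint8_t x) {
    uint8_t tmp = ((st >> 3) ^ st) & 1;                  // scrmbl_st[3] ^ scrmbl_st[0]
    st = static_cast<uint8_t>((st >> 1) | (tmp << 6));   // shift; scrmbl_st[6] := tmp
    return x ^ tmp;
}

// Index = (state << 8) | input_byte; entry = output_byte | (new_state << 8).
std::vector<uint16_t> build_scrambler_lut() {
    std::vector<uint16_t> lut(1u << 15);
    for (uint32_t idx = 0; idx < lut.size(); idx++) {
        uint8_t st = static_cast<uint8_t>(idx >> 8);
        uint8_t in = static_cast<uint8_t>(idx & 0xFF);
        uint8_t out = 0;
        for (int j = 0; j < 8; j++) {
            out |= static_cast<uint8_t>(scramble_bit(st, (in >> j) & 1) << j);
        }
        lut[idx] = static_cast<uint16_t>(out | (st << 8));
    }
    return lut;
}

// Scramble whole bytes with one table lookup each, threading the state through.
std::vector<uint8_t> scramble_bytes(const std::vector<uint16_t>& lut,
                                    uint8_t& st, const std::vector<uint8_t>& in) {
    std::vector<uint8_t> out;
    out.reserve(in.size());
    for (uint8_t b : in) {
        uint16_t e = lut[(static_cast<uint32_t>(st) << 8) | b];
        out.push_back(static_cast<uint8_t>(e & 0xFF));
        st = static_cast<uint8_t>(e >> 8);
    }
    return out;
}
```

Starting from the all-ones state 0x7F, two zero bytes scramble to 0x70 and 0x4F, which is the sequence 00001110 11110010 read LSB first.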

Page 59: Question

How to implement a bit permutation as a LUT? (idea from SORA)

out_arr := perm({0,2,3,1}, in_arr);
in_arr = {1,2,3,4}, out_arr = {1,4,2,3}   (element i of in_arr moves to position p[i])

Hint: the permutation indices are constants!

Page 60: Answer

let perm8(p : arr[8] int, iarr : arr[8] bit, oarr : arr bit) =
  for i in [0,8] {
    oarr[p[i]] := iarr[i];
  }
in

let perm(p : arr int, iarr : arr bit) =
  var oarr : arr[length(p)] bit;
  var oarrtmp : arr[length(p)] bit;
  var iarr1 : arr[8] bit;

  unroll for j in [0,length(p)/8] {
    let p1 = p[j*8,8];
    iarr1 := iarr[j*8,8];
    perm8(p1,iarr1,oarrtmp);
    oarr := v_or(oarr,oarrtmp);
  };
  return oarr;
in

[Figure: each 8-bit slice iarr1 of iarr is scattered into oarr.]

- Constants! In out_arr := perm({0,2,3,1}, in_arr), p is known at compile time, so each perm8 call depends only on its 8 input bits and can be replaced by a LUT
- LUT size: 2^8 * sizeof(oarr) per 8-bit slice
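The same scatter-and-OR scheme can be sketched in C++ for bit vectors of up to 64 bits packed into a uint64_t. One 256-entry table is built per input byte (each byte position has its own slice of p), and applying the permutation then costs one lookup and one OR per byte. BitPermLut is a hypothetical name and the 64-bit limit is a simplification; the Ziria version works on arrays of any multiple-of-8 length.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit permutation with out[p[i]] = in[i] (the perm8 semantics), bit i = element i.
class BitPermLut {
public:
    explicit BitPermLut(const std::vector<int>& p) : nbytes_(p.size() / 8) {
        assert(p.size() <= 64 && p.size() % 8 == 0);
        for (int q : p) assert(q >= 0 && q < static_cast<int>(p.size()));
        tables_.assign(nbytes_, std::array<uint64_t, 256>{});
        for (std::size_t j = 0; j < nbytes_; j++) {        // one table per input byte
            for (unsigned v = 0; v < 256; v++) {            // 2^8 entries per table
                uint64_t o = 0;
                for (int i = 0; i < 8; i++) {
                    if ((v >> i) & 1u) o |= uint64_t{1} << p[j * 8 + i];
                }
                tables_[j][v] = o;
            }
        }
    }

    // One lookup per input byte, OR-ed together (the v_or step).
    uint64_t apply(uint64_t in) const {
        uint64_t out = 0;
        for (std::size_t j = 0; j < nbytes_; j++) {
            out |= tables_[j][(in >> (8 * j)) & 0xFF];
        }
        return out;
    }

private:
    std::size_t nbytes_;
    std::vector<std::array<uint64_t, 256>> tables_;
};
```

With p = {0,2,3,1,4,5,6,7}, input bit 1 lands on output bit 2 and input bit 3 lands on output bit 1.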

Page 61: Supporting different HW architectures

Work in progress…
- SMP vs. FPGA vs. ASIC
- Pipeline and data parallelism
- SIMD, coprocessors (DSP or ASIC)

Page 62: Pipeline parallelism

ofdm |>>>| decode >>> packetize

is compiled as

ofdm >>> write(q1) >>> read(q1) >>> decode >>> packetize

split across two threads connected by a synchronized queue q1:
- Thread 1, pinned to Core 1: ofdm >>> write(q1)
- Thread 2, pinned to Core 2: read(q1) >>> decode >>> packetize
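The sync queue between the two cores can be sketched as a single-producer/single-consumer ring buffer in C++. This is a generic technique, not Ziria's actual queue implementation: push is called only by the thread running ofdm >>> write(q1), pop only by the thread running read(q1) >>> decode >>> packetize, and the acquire/release ordering on head_ and tail_ is what makes that hand-off safe. Thread creation and core pinning are platform-specific and omitted.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

template <typename T, std::size_t N>
class SpscQueue {
public:
    // Producer side only. Returns false if the queue is full.
    bool push(const T& v) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        std::size_t next = (h + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) return false;
        buf_[h] = v;
        head_.store(next, std::memory_order_release);   // publish the item
        return true;
    }

    // Consumer side only. Returns false if the queue is empty.
    bool pop(T& out) {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire)) return false;
        out = buf_[t];
        tail_.store((t + 1) % N, std::memory_order_release);   // free the slot
        return true;
    }

private:
    static_assert(N >= 2, "need at least one usable slot");
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};   // next slot to write
    std::atomic<std::size_t> tail_{0};   // next slot to read
};
```

One slot is always left empty to distinguish full from empty, so a SpscQueue<T, N> holds at most N - 1 items.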

Page 63: Code examples

Page 64: Performance evaluation

WiFi RX, throughput in Msamples/s (1 sample = 32 bits):

Mode          SORA   Ziria   WiFi requirement
RX 6 Mbps      164     156     40
RX 12 Mbps     125     100     40
RX 24 Mbps      81      67     40
RX 48 Mbps      61      52     40
RX CCA         289     163     40

WiFi TX, throughput in Mbps:

Mode          SORA   Ziria   WiFi requirement
TX 6 Mbps       54      51      6
TX 12 Mbps      98      45     12
TX 24 Mbps     145      53     24
TX 48 Mbps     231      70     48

Page 65: Real-time LTE-like demo

Page 66: Status

- Released to GitHub under Apache 2.0
- WiFi implementation included in the release
- Currently supports the SORA platform
- Essential dependency on CPU/SIMD
- Looking into porting to other CPU-based SDRs

Page 67: Layout (as on Page 2)

Page 68: Motivation

Wireless architecture and design are fragmented:
- EE considers the PHY and parts of the MAC
- CS considers the MAC and above
- Opportunities for synergies are missed

Q: How can we change the PHY to allow better/simpler network designs?

Page 69: Examples

- HARQ and/or rateless codes: no need to care about rate adaptation, just keep sending
- Correlation and detection: detection takes time => huge MAC overheads and terrible efficiency
- Channel impulse response and localization: more precise location information

Page 70: Conventional WiFi design

- PHY: standardized and cannot be changed; a data pipe with correct bits getting in and out
- MAC: where innovation happens
  - Conventional: CSMA
  - Many alternatives

Page 71: Conventional cellular design

- PHY and MAC are standardized
- Several modes of operation, but none can be changed by applications
- IP layer and above: where innovation can happen

Page 72: In reality

We have standard blocks: correlators, scramblers, coders/decoders, interleavers, FFT/IFFT.

Why not allow building networks from these standard blocks?
- Respect certain ground rules
- Allow innovations

Page 73: How to design a wireless transceiver?

Challenges: performance, complexity, (cost/patents)

- CDMA: uses a pseudo-random code against multi-path
  - Not as complicated to implement as OFDM-based systems
  - Difficult to equalise over the whole wide spectrum
- OFDM: uses subcarriers to combat multi-path with greater robustness and less complexity
  - OFDMA can achieve higher spectral efficiency with MIMO than CDMA

Rule of thumb: for data rates >= 10 Mbps, use OFDM

Page 74: OFDM

- OFDM is used in most contemporary PHYs: WiFi, LTE, WiMax, 60 GHz, UWB
- OFDMA is the OFDM variant used in cellular (LTE, WiMax):
  - Multiple users share the same OFDM symbol
  - MAC scheduling at the sub-symbol level
  - Blurred distinction between MAC and PHY

Page 75: WiFi TX overview

OFDM transmitter (@54 Mbps):

emits createSTSinTime();
emits createLTSinTime();
crc216(h.len) >>> scrambler() >>> encode34() >>> interleaver_m64qam() >>> modulate_64qam() >>> add_pilots() >>> ifft()

- Preamble (STS, LTS) for detection and channel estimation
- CRC (with padding) to check for errors
- Scrambler prevents high peaks
- Interleaver decouples errors

Page 76: IFFT example

Page 77: OFDM symbol

Add pilots, perform the IFFT, and add the cyclic prefix.
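The cyclic-prefix step is simple enough to sketch directly in C++. In 802.11a, the IFFT produces 64 time-domain samples per OFDM symbol and the prefix is a copy of the last 16 of them placed in front, for 80 samples in total at 20 MHz. The function name and the use of std::complex<float> are illustrative choices; pilot insertion and the IFFT itself are not shown.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cf = std::complex<float>;

// Prepend a cyclic prefix: copy the last cp_len samples of the symbol to its front.
std::vector<cf> add_cyclic_prefix(const std::vector<cf>& symbol, std::size_t cp_len = 16) {
    assert(cp_len <= symbol.size());
    std::vector<cf> out(symbol.end() - static_cast<std::ptrdiff_t>(cp_len), symbol.end());
    out.insert(out.end(), symbol.begin(), symbol.end());   // followed by the symbol itself
    return out;
}
```

Because the prefix repeats the end of the symbol, a channel echo shorter than the prefix only smears into the prefix, which the receiver drops before the FFT.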

Page 78: Channel impulse response

Page 79: Effect on the OFDM symbol in time

Page 80: Effect on the OFDM symbol in frequency

Page 81: Channel estimation

Page 82: Channel inversion/equalization

Page 83: How to deal with multi-path?

Add a cyclic prefix.

Page 84: Missing cyclic prefix

Page 85: Scrambler

Avoids large peak-to-average power ratios.

Page 86: WiFi RX overview

let comp receiver() =
  seq { (removeDC() >>> t <- detectSTS())
      ; params <- ChannelEstimation()
      ; removeCP() >>> FFT() >>> ChannelEqualization(params)
          >>> PilotTrack() >>> RemovePilots() >>> receiveBits()
      }
in

Page 87: Pilots

Channel estimates change across OFDM symbols because of:
- Channel changes
- Drift in the oscillators between sender and receiver

Pilots are used for channel re-estimation:
- Similar to the initial channel estimation
- Interpolation for the data points
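Interpolating between pilots can be sketched as linear interpolation across subcarrier indices. In 802.11a the four pilots sit at subcarriers -21, -7, 7 and 21; this C++ sketch assumes the per-pilot estimates are already available, holds the nearest pilot's estimate outside the pilot range, and is a generic illustration rather than the method used in the Ziria or Sora WiFi code.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cf = std::complex<float>;

// Channel estimate for every subcarrier in [first, last], linearly interpolated
// between sorted pilot positions; outside the pilots, the nearest estimate is held.
std::vector<cf> interpolate_channel(const std::vector<int>& pilot_idx,
                                    const std::vector<cf>& pilot_h,
                                    int first, int last) {
    assert(!pilot_idx.empty() && pilot_idx.size() == pilot_h.size());
    std::vector<cf> h;
    std::size_t p = 0;
    for (int k = first; k <= last; k++) {
        while (p + 1 < pilot_idx.size() && pilot_idx[p + 1] <= k) p++;
        if (k <= pilot_idx.front()) { h.push_back(pilot_h.front()); continue; }
        if (p + 1 == pilot_idx.size()) { h.push_back(pilot_h.back()); continue; }
        float t = static_cast<float>(k - pilot_idx[p]) /
                  static_cast<float>(pilot_idx[p + 1] - pilot_idx[p]);
        h.push_back(pilot_h[p] + t * (pilot_h[p + 1] - pilot_h[p]));
    }
    return h;
}
```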

Page 88: Carrier sensing and synchronization

- Find where the packet starts
- Accurate timing is needed for the rest of RX
- Also used to estimate the CFO
- Also used in carrier sensing

Page 89: Preamble

Page 90: Detection using correlation

Correlate against a known preamble.

How do we implement this on a CPU with SIMD?
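A plain floating-point version of correlation-based detection looks like this in C++. It slides the known preamble over the received samples and returns the offset with the largest |correlation|^2. A practical detector, including the CPU/SIMD version asked about above, would work in fixed point, normalize by signal energy and compare against a threshold; none of that is shown here, and detect_preamble is a hypothetical name.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cf = std::complex<float>;

// c[n] = |sum_k rx[n + k] * conj(pre[k])|^2; returns the n with the largest c[n].
std::size_t detect_preamble(const std::vector<cf>& rx, const std::vector<cf>& pre) {
    assert(!pre.empty() && rx.size() >= pre.size());
    std::size_t best = 0;
    float best_val = -1.0f;
    for (std::size_t n = 0; n + pre.size() <= rx.size(); n++) {
        cf acc(0.0f, 0.0f);
        for (std::size_t k = 0; k < pre.size(); k++) acc += rx[n + k] * std::conj(pre[k]);
        float val = std::norm(acc);   // |acc|^2, avoids a square root
        if (val > best_val) {
            best_val = val;
            best = n;
        }
    }
    return best;
}
```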

Page 91: Detection

Page 92: Performance of the detector

Page 93: Layout (as on Page 2)

Page 94: Conclusions

- Wireless innovations will happen at the intersection of the PHY and MAC levels
- We need prototypes and test-beds to evaluate ideas
- PHY programming is in its infancy
  - Difficult, with limited portability and scalability
  - Steep learning curve; difficult to compare and extend previous work
- Wireless programming is easy and fun – go for it!

http://research.microsoft.com/en-us/projects/ziria/