Post on 23-Jan-2016
description
September 8, 2009 http://csg.csail.mit.edu/korea L03-1
Combinational Circuits in Bluespec
Arvind Computer Science & Artificial Intelligence LabMassachusetts Institute of Technology
September 8, 2009 L03-2http://csg.csail.mit.edu/korea
Object code(Verilog/C)
Bluespec: Two-Level Compilation
Rules and Actions(Term Rewriting System)
• Rule conflict analysis• Rule scheduling
James Hoe & Arvind@MIT 1997-2000
Bluespec(Objects, Types,
Higher-order functions)
Level 1 compilation• Type checking• Massive partial evaluation and static elaboration
Level 2 synthesis
Lennart Augustsson@Sandburst 2000-2002
Now we call this Guarded Atomic Actions
September 8, 2009 L03-3http://csg.csail.mit.edu/korea
Static Elaboration
.exe
compile
design2 design3design1
elaborate w/params
run1run1run2.1…
run1run1run3.1…
run1run1run1.1…
run w/params
run w/params
run1run1…
run
At compile time Inline function calls and unroll loops Instantiate modules with specific parameters Resolve polymorphism/overloading, perform most
data structure operations
sourceSoftwareToolflow: source
HardwareToolflow:
September 8, 2009 L03-4http://csg.csail.mit.edu/korea
Combinational IFFT
in0
…
in1
in2
in63
in3
in4
Bfly4
Bfly4
Bfly4
x16
Bfly4
Bfly4
Bfly4
…
Bfly4
Bfly4
Bfly4
…
out0
…
out1
out2
out63
out3
out4
Perm
ute
Perm
ute
Perm
ute
All numbers are complex and represented as two sixteen bit quantities. Fixed-point arithmetic is used to reduce area, power, ...
*
*
*
*
+
-
-
+
+
-
-
+
*jt2
t0
t3
t1
September 8, 2009 L03-5http://csg.csail.mit.edu/korea
4-way Butterfly Node
function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k);
BSV has a very strong notion of types Every expression has a type. Either it is declared by
the user or automatically deduced by the compiler The compiler verifies that the type declarations are
compatible
*
*
*
*
+
-
-
+
+
-
-
+
*i
t0
t1
t2
t3
k0
k1
k2
k3
September 8, 2009 L03-6http://csg.csail.mit.edu/korea
BSV code: 4-way Butterflyfunction Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k);
Vector#(4,Complex) m, y, z;
m[0] = k[0] * t[0]; m[1] = k[1] * t[1]; m[2] = k[2] * t[2]; m[3] = k[3] * t[3];
y[0] = m[0] + m[2]; y[1] = m[0] – m[2]; y[2] = m[1] + m[3]; y[3] = i*(m[1] – m[3]);
z[0] = y[0] + y[2]; z[1] = y[1] + y[3]; z[2] = y[0] – y[2]; z[3] = y[1] – y[3];
return(z);endfunction
Polymorphic code: works on any type of numbers for which *, + and - have been defined
*
*
*
*
+
-
-
+
+
-
-
+
*i
m y z
Note: Vector does not mean storage
September 8, 2009 L03-7http://csg.csail.mit.edu/korea
Complex ArithmeticAddition
zR = xR + yR
zI = xI + yI
Multiplication zR = xR * yR - xI * yI
zR = xR * yI + xI * yR
The actual arithmetic for FFT is different because we use a non-standard fixed point representations
September 8, 2009 L03-8http://csg.csail.mit.edu/korea
BSV code for Addition
typedef struct{ Int#(t) r; Int#(t) i;} Complex#(numeric type t) deriving (Eq,Bits); function Complex#(t) \+ (Complex#(t) x, Complex#(t) y); Int#(t) real = x.r + y.r; Int#(t) imag = x.i + y.i; return(Complex{r:real, i:imag});endfunction
September 8, 2009 L03-9http://csg.csail.mit.edu/korea
Combinational IFFT
in0
…
in1
in2
in63
in3
in4
Bfly4
Bfly4
Bfly4
x16
Bfly4
Bfly4
Bfly4
…
Bfly4
Bfly4
Bfly4
…
out0
…
out1
out2
out63
out3
out4
Perm
ute
Perm
ute
Perm
ute
stage_f function
repeat stage_f three times
function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in);
function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data);
September 8, 2009 L03-10http://csg.csail.mit.edu/korea
BSV Code: Combinational IFFTfunction Vector#(64, Complex) ifft
(Vector#(64, Complex) in_data);//Declare vectors Vector#(4,Vector#(64, Complex)) stage_data; stage_data[0] = in_data; for (Integer stage = 0; stage < 3; stage = stage + 1) stage_data[stage+1] = stage_f(stage,stage_data[stage]);return(stage_data[3]);
The for loop is unfolded and stage_f is inlined during static elaboration
Note: no notion of loops or procedures during execution
September 8, 2009 L03-11http://csg.csail.mit.edu/korea
BSV Code: Combinational IFFT- Unfoldedfunction Vector#(64, Complex) ifft
(Vector#(64, Complex) in_data);//Declare vectors Vector#(4,Vector#(64, Complex)) stage_data;
stage_data[0] = in_data; for (Integer stage = 0; stage < 3; stage = stage + 1) stage_data[stage+1] = stage_f(stage,stage_data[stage]); return(stage_data[3]);
Stage_f can be inlined now; it could have been inlined before loop unfolding also.
Does the order matter?
stage_data[1] = stage_f(0,stage_data[0]);stage_data[2] = stage_f(1,stage_data[1]);stage_data[3] = stage_f(2,stage_data[2]);
September 8, 2009 L03-12http://csg.csail.mit.edu/korea
Bluespec Code for stage_ffunction Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in); begin for (Integer i = 0; i < 16; i = i + 1) begin Integer idx = i * 4; let twid = getTwiddle(stage, fromInteger(i)); let y = bfly4(twid, stage_in[idx:idx+3]); stage_temp[idx] = y[0]; stage_temp[idx+1] = y[1]; stage_temp[idx+2] = y[2]; stage_temp[idx+3] = y[3]; end //Permutation for (Integer i = 0; i < 64; i = i + 1) stage_out[i] = stage_temp[permute[i]]; endreturn(stage_out); twid’s are
mathematically derivable constants
September 8, 2009 L03-13http://csg.csail.mit.edu/korea
Higher-order functions:Stage functions f1, f2 and f3function f1(x); return (stage_f(1,x)); endfunction
function f2(x); return (stage_f(2,x)); endfunction
function f3(x); return (stage_f(3,x)); endfunction
What is the type of f1(x) ?
September 8, 2009 L03-14http://csg.csail.mit.edu/korea
Suppose we want to reuse some part of the circuit ...
in0
…
in1
in2
in63
in3
in4
Bfly4
Bfly4
Bfly4
x16
Bfly4
Bfly4
Bfly4
…
Bfly4
Bfly4
Bfly4
…
out0
…
out1
out2
out63
out3
out4
Perm
ute
Perm
ute
Perm
ute
Reuse the same circuit three times to reduce area
But why?
September 8, 2009 http://csg.csail.mit.edu/korea L03-15
Architectural Exploration:Area-Performance tradeoff in 802.11a Transmitter
September 8, 2009 L03-16http://csg.csail.mit.edu/korea
802.11a Transmitter Overview
Controller Scrambler Encoder
Interleaver Mapper
IFFTCyclicExtend
headers
data
IFFT Transforms 64 (frequency domain) complex numbers into 64 (time domain)
complex numbers accounts for 85% area
24 Uncoded
bits
One OFDM symbol (64 Complex Numbers)
Must produce one OFDM symbol every 4 sec
Depending upon the transmission rate, consumes 1, 2 or 4 tokens to produce one OFDM symbol
September 8, 2009 L03-17http://csg.csail.mit.edu/korea
Preliminary results[MEMOCODE 2006] Dave, Gerding, Pellauer, Arvind
Design Lines of RelativeBlock Code (BSV) AreaController 49 0%Scrambler 40 0%Conv. Encoder 113 0%Interleaver 76 1%Mapper 112 11%IFFT 95 85%Cyc. Extender 23 3%
Complex arithmetic libraries constitute another 200 lines of code
September 8, 2009 L03-18http://csg.csail.mit.edu/korea
Combinational IFFT
in0
…
in1
in2
in63
in3
in4
Bfly4
Bfly4
Bfly4
x16
Bfly4
Bfly4
Bfly4
…
Bfly4
Bfly4
Bfly4
…
out0
…
out1
out2
out63
out3
out4
Perm
ute
Perm
ute
Perm
ute
Reuse the same circuit three times to reduce area
September 8, 2009 L03-19http://csg.csail.mit.edu/korea
f g
Design AlternativesReuse a block over multiple cycles
we expect:
Throughput to
Area to
ff g
decrease – less parallelism
The clock needs to run faster for the same throughput hyper-linear increase in energy
decrease – reusing a block
September 8, 2009 L03-20http://csg.csail.mit.edu/korea
Circular pipeline: Reusing the Pipeline Stage
in0
…
in1
in2
in63
in3
in4
out0
…
out1
out2
out63
out3
out4
…
Bfly4
Bfly4
Perm
ute
Stage Counter
September 8, 2009 L03-21http://csg.csail.mit.edu/korea
Superfolded circular pipeline: Just one Bfly-4 node!in0
…
in1
in2
in63
in3
in4
out0
…
out1
out2
out63
out3
out4
Bfly4
Perm
ute
Index == 15?
Index: 0 to 15
64
, 2-w
ay
Muxes
4, 1
6-w
ay
Muxes
4, 1
6-w
ay
DeM
uxes
Stage 0 to 2
September 8, 2009 L03-22http://csg.csail.mit.edu/korea
Pipelining a block
inQ outQ
f2f1 f3Combinational
C
inQ outQ
f2f1 f3Pipeline
P
inQ outQ
f Folded Pipeline
FP
Clock? Area? Throughput?Clock: C < P FP Area: FP < C < P Throughput: FP < C < P
September 8, 2009 L03-23http://csg.csail.mit.edu/korea
Syntax: Vector of Registers
Register suppose x and y are both of type Reg. Then
x <= y means x._write(y._read())
Vector of Int x[i] means sel(x,i) x[i] = y[j] means x = update(x,i, sel(y,j))
Vector of Registers x[i] <= y[j] does not work. The parser thinks it means
(sel(x,i)._read)._write(sel(y,j)._read), which will not type check
(x[i]) <= y[j] parses as sel(x,i)._write(sel(y,j)._read), and works correctly
Don’t ask me why
September 8, 2009 L03-24http://csg.csail.mit.edu/korea
Static vs dynamic expressions
Expressions that can be evaluated at compile time will be evaluated at compile-time 3+4 7
Some expressions do not have run-time representations and must be evaluated away at compile time; an error will occur if the compile-time evaluation does not succeed Integers, reals, loops, lists, functions, …