ASH: A Substrate for Scalable Architectures
![Page 1: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/1.jpg)
ASH: A Substrate for Scalable Architectures
Mihai Budiu
Seth Copen Goldstein
http://www.cs.cmu.edu/~phoenix
CALCM Seminar, March 19, 2002
![Page 2: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/2.jpg)
Resources
![Page 3: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/3.jpg)
CPU Problems
• Complexity
• Power
• Global Signals
• Limited issue window => limited ILP
We propose an architecture with none of these limits
![Page 4: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/4.jpg)
Outline
• Scalability
• Reconfigurable hardware advantages
• A hybrid RH + CPU architecture
• CPU and RH as peers
• Application Specific Hardware
![Page 5: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/5.jpg)
Computational Bandwidth
CPU: FU * clock freq
RH: Unbounded
[Figure: functional units *, +, /; code a=a+b, b=b+c]
![Page 6: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/6.jpg)
Registers
CPU: Fixed
RH: Unbounded
[Figure: registers eax, ebx, ecx, edx; variables i, j, k, l, m; spill to sp[0]]
![Page 7: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/7.jpg)
Register Bandwidth
CPU: Fixed
RH: Unbounded
[Figure: ports R1, R2, R3, W1, W2]
![Page 8: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/8.jpg)
Out-of-Order Execution
CPU: Fetch, Decode, Dispatch, Execute, Commit
In-order
Limited by window
RH: Compiler’s window is unbounded
![Page 9: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/9.jpg)
Outline
• Scalability
• Reconfigurable hardware advantages
• A hybrid RH + CPU architecture
• CPU and RH as peers
• Application Specific Hardware
![Page 10: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/10.jpg)
Hybrid system: CPU+RH
CPU: Low ILP + OS + VM (generic)
RH: High ILP (application-specific)
Memory
Tight coupling
![Page 11: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/11.jpg)
Problem
HLL Program
CPU RH
Memory
Compiler
![Page 12: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/12.jpg)
Our Solution
General: applicable to today’s software
Automatic: compiler-driven [RISC approach]
Scalable: with clock, hardware and program size
Parallelism: exploit application parallelism
• bit-level
• ILP
• pipeline
• loop-level
![Page 13: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/13.jpg)
Outline
• Scalability
• Reconfigurable hardware advantages
• A hybrid RH + CPU architecture
• CPU and RH as peers
• Application Specific Hardware
![Page 14: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/14.jpg)
Peering
a( ) { b( ); }
b( ) { c( ); }
c( ) { d( ); }
d( ) { }
CPU RH
a
b
c
d
Program
![Page 15: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/15.jpg)
marshalling, control transfer
software procedure call
hardware dependent
RH
“RPC”
CPU
a
b
c
d
b’
c’
d’
Stubs built automatically.
![Page 16: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/16.jpg)
Stub Synthesis
Procedures for RH
RH Compiler
Procedures for CPU
Program
Partitioning
Stubs
Configuration
Linker
Executable
![Page 17: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/17.jpg)
Outline
• Scalability
• Reconfigurable hardware advantages
• A hybrid RH + CPU architecture
• CPU and RH as peers
• Application Specific Hardware
![Page 18: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/18.jpg)
Application-Specific Hardware
Reconfigurable hardware
HLL program
Compiler
Circuit
HLL Program
CPU RH
Memory
Compiler
![Page 19: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/19.jpg)
CASH: Compiling for ASH
Memory partitioning
Interconnection net
Circuits
C Program
RH
![Page 20: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/20.jpg)
Asynchronous Computation
[Figure: + operator with data, data ready, and ack signals]
Can extend to locally synchronous, globally asynchronous
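The data / data ready / ack signalling named on this slide can be sketched in software. The following is a minimal model, not the ASH circuit itself: the `channel` struct and the function names `channel_send`, `channel_receive`, and `adder_fire` are hypothetical, introduced only for illustration. It assumes an operator fires once all of its inputs are ready and its output slot is free.

```c
#include <assert.h>
#include <stdbool.h>

/* One producer-to-consumer link: a value plus the two handshake wires. */
typedef struct {
    int  data;        /* payload */
    bool data_ready;  /* raised by the producer when data is valid */
    bool ack;         /* raised by the consumer once it has latched data */
} channel;

/* Producer side: may only send when the previous value was consumed. */
static bool channel_send(channel *ch, int value)
{
    if (ch->data_ready)
        return false;             /* previous value not yet acknowledged */
    ch->data = value;
    ch->ack = false;
    ch->data_ready = true;
    return true;
}

/* Consumer side: latch the value and acknowledge it. */
static bool channel_receive(channel *ch, int *out)
{
    if (!ch->data_ready)
        return false;             /* nothing to consume yet */
    *out = ch->data;
    ch->data_ready = false;
    ch->ack = true;
    return true;
}

/* Adder node: fires only when both inputs are ready and the output is free.
 * No clock is involved; readiness alone decides when the node runs. */
static bool adder_fire(channel *a, channel *b, channel *sum)
{
    int x, y;
    if (!a->data_ready || !b->data_ready || sum->data_ready)
        return false;
    channel_receive(a, &x);
    channel_receive(b, &y);
    return channel_send(sum, x + y);
}
```

Calling `adder_fire` before both inputs have been sent simply returns `false`, which is the software analogue of the operator waiting on its data-ready lines.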
![Page 21: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/21.jpg)
Dataflow Graphs
int plus(int x, int y)
{
return x + y;
}
![Page 22: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/22.jpg)
From Control Flow to Data Flow
![Page 25: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/25.jpg)
Conditionals = Speculation
int cond(int p, int x, int y)
{
int z;
if (p)
z = x;
else
z = y;
return z;
}
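One way to read "Conditionals = Speculation" is that the dataflow version computes both arms and lets the predicate steer a multiplexor. The sketch below assumes that reading; `mux` and `cond_dataflow` are hypothetical names, and the rewrite is only safe because both arms are free of side effects.

```c
#include <assert.h>

/* Original form from the slide: control flow picks one arm. */
int cond(int p, int x, int y)
{
    int z;
    if (p)
        z = x;
    else
        z = y;
    return z;
}

/* Multiplexor node: all three inputs are already-computed values. */
static int mux(int p, int if_true, int if_false)
{
    return p ? if_true : if_false;
}

/* Dataflow-style form: both arms are evaluated unconditionally
 * (speculatively); the predicate only selects which result is kept. */
int cond_dataflow(int p, int x, int y)
{
    int z_then = x;   /* "then" arm, computed regardless of p */
    int z_else = y;   /* "else" arm, computed regardless of p */
    return mux(p, z_then, z_else);
}
```

For any inputs, `cond_dataflow(p, x, y)` returns the same value as `cond(p, x, y)`; the difference is only in when the arms are computed.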
![Page 26: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/26.jpg)
Critical Paths
if (x > 0) y = -x;
else y = b*x;
[Dataflow graph: inputs x, b, 0; operators *, -, >, !; output y]
![Page 27: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/27.jpg)
Executing Lenient Operators
if (x > 0) y = -x;
else y = b*x;
[Dataflow graph: inputs x, b, 0; operators *, -, >, !; output y]
Up to 40% performance improvement.
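A small timing model can illustrate why leniency shortens the critical path for this example. It assumes "lenient" means an operator may produce its output as soon as the inputs that determine the result have arrived. The latencies (`LAT_CMP`, `LAT_NEG`, `LAT_MUL`) are made-up values, and `timed`, `mux_strict`, `mux_lenient`, and `eval` are hypothetical names; none of these numbers come from the slides.

```c
#include <assert.h>

/* A value together with the time at which it becomes available. */
typedef struct { int value; int ready_at; } timed;

static int max2(int a, int b) { return a > b ? a : b; }

/* Strict mux: waits for the predicate and for both data inputs. */
static timed mux_strict(timed p, timed t, timed f)
{
    timed out = { p.value ? t.value : f.value,
                  max2(p.ready_at, max2(t.ready_at, f.ready_at)) };
    return out;
}

/* Lenient mux: waits only for the predicate and the selected input. */
static timed mux_lenient(timed p, timed t, timed f)
{
    timed sel = p.value ? t : f;
    timed out = { sel.value, max2(p.ready_at, sel.ready_at) };
    return out;
}

/* Assumed operator latencies, for illustration only. */
enum { LAT_CMP = 1, LAT_NEG = 1, LAT_MUL = 3 };

/* y = (x > 0) ? -x : b*x, with inputs available at time 0.
 * All three operators start immediately and run in parallel. */
static timed eval(int x, int b, int lenient)
{
    timed p   = { x > 0, LAT_CMP };
    timed neg = { -x,    LAT_NEG };
    timed mul = { b * x, LAT_MUL };
    return lenient ? mux_lenient(p, neg, mul) : mux_strict(p, neg, mul);
}
```

Under these assumed latencies, when `x > 0` the strict mux still waits for the multiply even though its result is discarded, while the lenient mux finishes as soon as the comparison and negation are done. When `x <= 0` the multiply is needed, so both variants finish at the same time.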
![Page 28: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/28.jpg)
Pipelining
Pipelined Cycles
N 903
Y 653
![Page 29: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/29.jpg)
Loop Pipelining
Pipe FIFO Cycles
N 0 903
N 1 903
Y 0 653
Y 1 474
Y 2 408
Y 3 408
![Page 31: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/31.jpg)
ASH Features
• What you code is what you get
– no hidden control logic
– really lean hardware (no CAM, decoders, multiported files, etc.)
• Compiler has complete control
• Dynamic scheduling => latency tolerant
• Naturally exploits ILP, even across loop iterations
![Page 32: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/32.jpg)
Conclusions
• ASH = Compiler-synthesized hardware
• ASH matches program parallelism
• Dynamically scheduled RH
• ASH scales with
– clock frequency
– transistors
– program size
![Page 33: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/33.jpg)
Backup Slides
![Page 34: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/34.jpg)
Reconfigurable Hardware
Universal gates and/or storage elements
Interconnection network
Programmable switches
![Page 35: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/35.jpg)
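Main RH Ingredient: RAM Cell
Universal gate = RAM
[Figure: RAM with contents 0001, address inputs a0 and a1, data output labelled a1 & a2]
Switch controlled by a 1-bit RAM cell
[Figure: switch with data in and control; RAM cell holding 0]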
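The "universal gate = RAM" idea can be sketched as a lookup table: the gate inputs form an address, and the stored bit at that address is the output. The `lut2` type, `lut2_eval`, and the constants below are hypothetical names for illustration. The bit ordering is an assumption: bit k of `bits` holds the output for address k, so the slide's contents 0001, read as addresses 00 through 11, correspond to `0x8` here.

```c
#include <assert.h>
#include <stdint.h>

/* A 2-input lookup table: 4 configuration bits, one per input combination. */
typedef struct { uint8_t bits; } lut2;

/* The two inputs form the address a1a0 that selects one stored bit. */
static int lut2_eval(lut2 l, int a1, int a0)
{
    unsigned addr = ((a1 & 1u) << 1) | (a0 & 1u);
    return (l.bits >> addr) & 1;
}

/* Reconfiguring the gate means rewriting the RAM contents, nothing else. */
static const lut2 LUT_AND = { 0x8 };  /* 1 only at address 11 */
static const lut2 LUT_OR  = { 0xE };  /* 1 at addresses 01, 10, 11 */
static const lut2 LUT_XOR = { 0x6 };  /* 1 at addresses 01, 10 */
```

The same `lut2_eval` function implements AND, OR, or XOR depending only on the four stored bits, which is the sense in which a small RAM acts as a universal gate.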
![Page 36: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/36.jpg)
Stubs
Program:
a( ) { r = b(b_args); }
b(b_args) { }
With b on RH:
a( ) { r = b’(b_args); }
b’(b_args) {
    send_rh(b_args);
    invoke_rh(b);
    r = receive_rh( );
    return r;
}
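The stub pattern on this slide can be exercised in plain C by mocking the CPU-to-RH channel. The names `send_rh`, `invoke_rh`, and `receive_rh` come from the slide; their bodies below are hypothetical in-process mocks, not the real interface. The mailbox, `b_on_rh`, `b_stub` (standing in for b’), and the choice of addition as the body of `b` are likewise assumptions made only so the example runs.

```c
#include <assert.h>

/* Hypothetical mock of the CPU<->RH data channel: a small stack of ints. */
static int rh_mailbox[4];
static int rh_count;

static void send_rh(int v)
{
    assert(rh_count < 4);
    rh_mailbox[rh_count++] = v;
}

static int receive_rh(void)
{
    assert(rh_count > 0);
    return rh_mailbox[--rh_count];
}

/* Stand-in for the circuit implementing b on RH: pops args, pushes result.
 * b is assumed to be an addition, purely for illustration. */
static void b_on_rh(void)
{
    int y = receive_rh();
    int x = receive_rh();
    send_rh(x + y);
}

/* Mock control transfer: "starting the circuit" is just a function call. */
static void invoke_rh(void (*circuit)(void)) { circuit(); }

/* b': software stub that looks like b to its callers. */
static int b_stub(int x, int y)
{
    send_rh(x);              /* marshal the arguments */
    send_rh(y);
    invoke_rh(b_on_rh);      /* transfer control to RH */
    return receive_rh();     /* unmarshal the result */
}

/* The caller is unchanged except that it now calls the stub. */
int a(void) { return b_stub(2, 3); }
```

Because the stub keeps the signature of `b`, the caller `a` does not need to know whether `b` runs in software or on RH.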
![Page 37: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/37.jpg)
Dispatcher Stubs
Program:
a( ) { r = b(b_args); }
b(b_args) { if (x) c( ); return r; }
c( ) { }
Stub for b:
b’(b_args) {
    send_rh(b_args);
    invoke_rh(b);
    while (1) {                 /* independent of b */
        com = get_rh_command( );
        if (! com) break;
        (*com)( );              /* c’s stub */
    }
    r = receive_rh( );
    return r;
}
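The dispatcher loop can be tried out with a mock in which the RH side is modelled as two plain functions, one for each half of `b` around its call to `c`. The names `send_rh`, `invoke_rh`, `receive_rh`, `get_rh_command`, and `return_to_rh` appear on the slides; everything else (the mailbox, `pending`, `rh_x`, `b_on_rh`, `c_stub`, `b_stub`, and the arithmetic performed by `b` and `c`) is a hypothetical stand-in chosen so the sketch runs.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*command)(void);

static int     mailbox[4];  /* mock CPU<->RH data channel (a small stack) */
static int     count;
static command pending;     /* mock: command the RH wants the CPU to run */
static int     rh_x;        /* mock: state kept by b on the RH side */

static void send_rh(int v)   { assert(count < 4); mailbox[count++] = v; }
static int  receive_rh(void) { assert(count > 0); return mailbox[--count]; }
static void invoke_rh(void (*circuit)(void)) { circuit(); }

/* Fetch the next request from RH; NULL means b has finished. */
static command get_rh_command(void)
{
    command com = pending;
    pending = NULL;
    return com;
}

static int c(int v) { return 2 * v; }   /* procedure that stays on the CPU */

/* RH side, second half of b: consume c's result and finish. */
static void return_to_rh(void)
{
    int r = receive_rh();
    send_rh(rh_x + r);
}

/* c's stub: unmarshal, call the real c, marshal, hand control back. */
static void c_stub(void)
{
    int arg = receive_rh();
    int r = c(arg);
    send_rh(r);
    invoke_rh(return_to_rh);
}

/* RH side, first half of b: assumed to compute x ? x + c(x) : 0. */
static void b_on_rh(void)
{
    rh_x = receive_rh();
    if (rh_x) { send_rh(rh_x); pending = c_stub; }  /* ask CPU to run c */
    else      { send_rh(0); }                       /* no call needed */
}

/* b': the dispatch loop never mentions c, so it is independent of b. */
static int b_stub(int x)
{
    command com;
    send_rh(x);
    invoke_rh(b_on_rh);
    while (1) {
        com = get_rh_command();
        if (!com) break;
        (*com)();
    }
    return receive_rh();
}
```

The loop in `b_stub` only runs whatever command the RH side posts, so the same loop would serve any procedure placed on RH, whichever CPU procedures it calls back into.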
![Page 38: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/38.jpg)
C’s Stub
Program:
a( ) { r = b(b_args); }
b(b_args) { if (x) c( ); return r; }
c( ) { }
Stub for c:
c’( ) {
    receive_rh(c_args);
    r = c(c_args);
    send_rh(r);
    invoke_rh(return_to_rh);    /* back */
}
![Page 39: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/39.jpg)
Input to Output
int io(int x)
{
return x;
}
![Page 40: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/40.jpg)
Loops
int loop()
{
int w = 10;
while (w > 0)
w--;
return w;
}
![Page 41: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/41.jpg)
Pointers and Arrays
int a[10];
void pointer(int *p)
{
a[2] += a[4] + *p;
}
![Page 42: ASH: A Substrate for Scalable Architectures](https://reader035.fdocuments.us/reader035/viewer/2022062314/56813e2c550346895da80c7d/html5/thumbnails/42.jpg)
Pointers and Loops
int sum()
{
int s = 0;
int i;
for (i = 0; i < 10; i++)
s += a[i];
return s;
}