Computing Without Processors Thesis Proposal
![Page 1: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/1.jpg)
Computing Without Processors
Thesis Proposal
Mihai Budiu
July 30, 2001
This presentation uses TeXPoint by George Necula
Thesis Committee:
Seth Goldstein, chair
Todd Mowry
Peter Lee
Babak Falsafi, ECE
Nevin Heintze, Agere Systems
![Page 2: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/2.jpg)
2
Four Types of Research
• Solve nonexistent problems
• Solve past problems
• Solve current problems
• Solve future problems
![Page 3: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/3.jpg)
3
The Law
(source: Intel)
![Page 4: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/4.jpg)
4
The Crossover Phenomenon
time
technology
![Page 5: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/5.jpg)
5
Example Crossover
[Plot: access speed (ns) versus time for DRAM and CPU; axis marks at 200 ns and 1980; regions labelled "no caches" and "caches".]
![Page 6: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/6.jpg)
Trouble Ahead for Microarchitecture
![Page 7: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/7.jpg)
7
Signal Propagation
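[Plot: die size and the distance a signal travels in 1 clock, both in mm, versus time; the vertical axis is marked at 20 and the time axis at "now".]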
![Page 8: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/8.jpg)
8
Reliability & Yield
[Plot: defects/chip versus time, with curves for tolerable and occurring defects; "new process" and "now" are marked.]
![Page 9: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/9.jpg)
9
Energy
[Plot: power versus time, with curves for CPU consumption and thermal dissipation; the power axis is marked at 100 W and the time axis at "now".]
![Page 10: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/10.jpg)
10
Instruction-Level Parallelism (ILP)
[Plot: instructions versus time, with curves for fetch and commit; "now" is marked on the time axis.]
![Page 11: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/11.jpg)
11
Premises of this Research
• We will have lots of gates
  – Moore’s law continues
  – Nanotechnology
• Contemporary architectures do not scale
![Page 12: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/12.jpg)
12
Outline
• Motivation
• ASH: Application-Specific Hardware
• The spatial model of computation
• CASH: Compiling for ASH
• Evolutionary path
• Conclusions
• Future work
![Page 13: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/13.jpg)
13
ASH Application-Specific Hardware
Reconfigurable hardware
HLL program
Compiler
Circuit
![Page 14: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/14.jpg)
14
ASH: A Scalable Architecture
-- Thesis Statement --
Application-specific hardware on a reconfigurable-hardware substrate is a solution for the smooth evolution of computer architecture.
We can provide scalable compilers for translating high-level languages into hardware.
![Page 15: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/15.jpg)
15
Example
int f(void)
{
    int i = 0, j = 0;

    for (; i < 10; i++)
        j += i;
    return j;
}
![Page 16: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/16.jpg)
16
Outline
• Motivation
• ASH: Application-Specific Hardware
• The spatial model of computation
• CASH: Compiling for ASH
• Evolutionary path
• Conclusions
• Future work
![Page 17: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/17.jpg)
17
ASH and Nanotechnology
• Build reconfigurable hardware using nanotechnology
• Low power: 10^10 gates use less than 2 W
• Low cost: nanocents/gate
• High density: 10^5x over CMOS
Huge structures
Nano-RAM cell
In yellow: a CMOS RAM cell.
![Page 18: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/18.jpg)
18
A Limit Study of Performance
A graph of the whole program execution:
Memory word
Basic block
Memory write
Memory read
Control-flow transfer
![Page 19: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/19.jpg)
19
Typical Program Graph (g721_e)
Control flow transfer
100% memory cluster
Memory reads
100% code cluster
memcpy
![Page 20: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/20.jpg)
20
Program Graph After Inlining memcpy
memcpy
![Page 21: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/21.jpg)
21
Application Slowdown
[Bar chart: times slower than native (axis from -1 to 11), for 1 clock/square and 5 clocks/square.]
![Page 22: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/22.jpg)
22
How Time Is Spent
[Bar chart: percent of time (0% to 100%) spent idle, in execution, in control flow and in register traffic, for 099.go, 129.compress, 130.li, 132.ijpeg, adpcm_d, adpcm_e, epic_e, g721_Q_d, g721_Q_e, gsm_d, gsm_e, jpeg_d, jpeg_e and mpeg2_d.]
No caches: reads expensive
No speculation
![Page 23: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/23.jpg)
23
Lesson
The spatial model of computation has different properties.
![Page 24: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/24.jpg)
24
Outline
• Motivation
• ASH: Application-Specific Hardware
• The spatial model of computation
• CASH: Compiling for ASH
• Evolutionary path
• Future work
![Page 25: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/25.jpg)
25
CASH: Compiling for ASH
Memory partitioning
Interconnection net
Program to circuits
![Page 26: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/26.jpg)
26
Compilation
1. Program
unsigned reverse(unsigned x)
{
    unsigned r = 0;
    int k;

    for (k = 0; k < 32; k++) {
        r = (r << 1) | (x & 1);   /* append the low bit of x to r */
        x = x >> 1;
    }
    return r;
}
2. Split-phase Abstract Machines
Computations & local storage
Unknown latency ops.
3. Configurations placed independently
4. Placement on chip
Reliability
![Page 27: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/27.jpg)
27
Split-phase Abstract Machines
SAM 1
SAM 2
SAM 3
CFG
Power
![Page 28: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/28.jpg)
28
Hyperblock => SAM
• Single-entry, multiple exit
• May contain loops
![Page 29: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/29.jpg)
29
SAM => FSM
Start Loop
Exit
Exit
Remote memory
Local memory
![Page 30: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/30.jpg)
30
Implementing SAMs
- interesting details -
![Page 31: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/31.jpg)
31
The SAM FSM
Computation
Predicates (control)
Combinational logic
start exit
Register
args results
![Page 32: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/32.jpg)
32
Computation = Dataflow
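• Variables => wires + tokens
• No token store; no token matching
• Local communication only
Programs:
x = a & 7;
...
y = x >> 2;
Circuits:
[Figure: dataflow circuit in which a and 7 feed an & node; its output x and the constant 2 feed a >> node. The connecting wires are labelled "Signals".]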
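The "variables => wires + tokens" idea can be sketched in C for the slide's two statements. This is only an illustrative model, not the CASH implementation: it assumes each wire carries a value plus a one-bit token, and that an operator fires only when every input token is present. The names `wire_t`, `constant`, `op_and`, `op_shr` and `circuit` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A wire carries a value plus a "valid" token; there is no token store. */
typedef struct {
    uint32_t data;
    bool valid;
} wire_t;

/* Constant sources are always valid. */
static wire_t constant(uint32_t v) {
    wire_t w = { v, true };
    return w;
}

/* An operator fires only when both input tokens are present. */
static wire_t op_and(wire_t a, wire_t b) {
    wire_t out = { 0, false };
    if (a.valid && b.valid) {
        out.data = a.data & b.data;
        out.valid = true;
    }
    return out;
}

static wire_t op_shr(wire_t a, wire_t b) {
    wire_t out = { 0, false };
    if (a.valid && b.valid) {
        assert(b.data < 32);            /* shift amount must be in range */
        out.data = a.data >> b.data;
        out.valid = true;
    }
    return out;
}

/* x = a & 7; y = x >> 2; expressed as two operators joined by wire x. */
static wire_t circuit(wire_t a) {
    wire_t x = op_and(a, constant(7));
    return op_shr(x, constant(2));
}
```

Because each operator talks only to the wires attached to it, no central token matching is needed: a missing input simply leaves the output token absent.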
![Page 33: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/33.jpg)
33
Tokens & Synchronization
• Tokens signal operation completion
• Possible implementations:
  – Local: data, valid, ack
  – Global: data, valid, reset
  – Static: data valid
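As a rough illustration of the "local" option (data, valid, ack), the following C sketch models one point-to-point channel. It is an assumption about how such a handshake might behave, not a description of the actual circuits; `channel_t`, `channel_send` and `channel_receive` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One point-to-point channel with a local handshake. */
typedef struct {
    uint32_t data;
    bool valid;  /* raised by the producer: data holds a fresh result */
    bool ack;    /* raised by the consumer: data has been consumed */
} channel_t;

/* Producer side: may send only when the previous token was taken. */
static bool channel_send(channel_t *ch, uint32_t value) {
    assert(ch != NULL);
    if (ch->valid) {
        if (!ch->ack)
            return false;      /* previous token still pending */
        ch->valid = false;     /* complete the previous handshake */
        ch->ack = false;
    }
    ch->data = value;
    ch->valid = true;
    return true;
}

/* Consumer side: takes the token, if any, and acknowledges it. */
static bool channel_receive(channel_t *ch, uint32_t *out) {
    assert(ch != NULL && out != NULL);
    if (!ch->valid || ch->ack)
        return false;          /* nothing new to consume */
    *out = ch->data;
    ch->ack = true;
    return true;
}
```

The producer never overwrites a token that has not been acknowledged, which is what lets operations of unknown latency be chained without a global clock schedule.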
![Page 34: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/34.jpg)
34
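Speculation and Eager Muxes
if (x > 0)
    y = -x;
else
    y = b*x;
[Figure: dataflow circuit for the code above, with operators *, -, > and !, inputs x, b and 0, and output y; regions are labelled "Computation" and "Predicates", and the multiplication is marked "slow".]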
Static-Single Assignment implemented in hardware
ILP
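One way to picture an eager mux for the if/else on this slide is the C sketch below. It assumes that both arms are computed speculatively and that the mux produces its output as soon as the selected input and its predicate have arrived, without waiting for the slow multiply when it is not needed. The types and function names (`token_t`, `eager_mux2`, `speculate`) are illustrative only.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

typedef struct {
    int data;
    bool valid;   /* token: has this value been produced yet? */
} token_t;

/* Two-input eager mux with decoded predicates: the output becomes
 * valid as soon as one true predicate and its data are both present. */
static token_t eager_mux2(token_t pred_a, token_t in_a,
                          token_t pred_b, token_t in_b) {
    token_t out = { 0, false };
    if (pred_a.valid && pred_a.data && in_a.valid) {
        out = in_a;
    } else if (pred_b.valid && pred_b.data && in_b.valid) {
        out = in_b;
    }
    return out;
}

/* if (x > 0) y = -x; else y = b*x;
 * `product` stands for b*x, which may not have arrived yet. */
static token_t speculate(int x, token_t product) {
    assert(x > INT_MIN);                 /* keep -x well defined */
    token_t neg    = { -x, true };       /* fast arm, always ready here */
    token_t p_then = { x > 0, true };
    token_t p_else = { !(x > 0), true };
    return eager_mux2(p_then, neg, p_else, product);
}
```

With `x = 5` the result is available even while the multiply is still pending; with `x = -3` the mux waits until the product token arrives.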
![Page 35: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/35.jpg)
35
Predicates
*q = 2;
• Guard side-effects
  – Memory access
  – Procedure calls
• Control looping
• Decide exit branch
• Select variable definition x=... x=...
...=x
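The "guard side-effects" role can be sketched in C as a predicated store. The enclosing condition `c` is an assumption added for illustration (the slide shows only the store `*q = 2;`), and `store_if` and `guarded_example` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Predicated store: the write happens only when the predicate is true. */
static void store_if(bool pred, int *addr, int value) {
    if (pred) {
        assert(addr != NULL);
        *addr = value;
    }
}

/* Assumed source form:  if (c) *q = 2;
 * The value 2 has no side effects, so it may be computed eagerly;
 * the memory write itself must be guarded by the predicate c. */
static void guarded_example(bool c, int *q) {
    int value = 2;           /* side-effect free: safe to speculate */
    store_if(c, q, value);   /* side effect: must be guarded */
}
```

When the predicate is false the address is never dereferenced, so a speculatively computed (possibly invalid) pointer does no harm.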
![Page 36: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/36.jpg)
36
Computing Predicates
• Correct for irreducible graphs
• Correct even when speculatively computed
• Can be eagerly computed
s t
b
![Page 37: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/37.jpg)
37
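Loops + Dataflow = Pipelining
for (i = 0; i < 10; i++)
    a[i] += i;
[Figure: dataflow circuit for the loop: a counter i starting at 0 and incremented by 1, an adder forming the address from &a[0], a load, an adder and a store; successive elements a[0], a[1], a[2] and a[3] are shown along the circuit.]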
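To make the pipelining claim concrete, here is a small cycle-by-cycle simulation in C of the slide's loop. It is a sketch under simplifying assumptions: three stages (index generation, load, add and store), each taking exactly one cycle, with a latch between stages. The structure and the names `latch_t` and `pipelined_loop` are illustrative and not taken from the proposal.

```c
#include <assert.h>
#include <stdbool.h>

enum { N = 10 };

typedef struct {
    int i;        /* loop index travelling with the token */
    int val;      /* loaded value, once past the load stage */
    bool valid;
} latch_t;

/* Runs  for (i = 0; i < N; i++) a[i] += i;  as a 3-stage pipeline
 * (index generation, load, add+store). Returns the cycle count. */
static int pipelined_loop(int a[N]) {
    latch_t idx = { 0, 0, false };     /* output of the i = i + 1 stage */
    latch_t loaded = { 0, 0, false };  /* output of the load stage */
    int next_i = 0;
    int cycles = 0;

    while (next_i < N || idx.valid || loaded.valid) {
        /* Stages are evaluated last-to-first so each one sees the
         * latch contents written in the previous cycle. */
        if (loaded.valid) {
            assert(loaded.i >= 0 && loaded.i < N);
            a[loaded.i] = loaded.val + loaded.i;   /* add + store */
        }

        loaded = idx;
        if (idx.valid)
            loaded.val = a[idx.i];                 /* load */

        idx.valid = next_i < N;
        if (idx.valid)
            idx.i = next_i++;                      /* i = i + 1 */

        cycles++;
    }
    return cycles;
}
```

In this model the loop finishes in N + 2 cycles rather than 3N, because up to three iterations are in flight at once; the iterations touch different array elements, so there is no memory dependence between them.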
![Page 38: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/38.jpg)
38
Outline
• Motivation
• ASH: Application-Specific Hardware
• The spatial model of computation
• CASH: Compiling for ASH
• Evolutionary path
• Conclusions
• Future work
![Page 39: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/39.jpg)
39
Evolutionary Path
Microprocessors ASH
The problem with ASH: Resources
![Page 40: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/40.jpg)
40
Virtualization
![Page 41: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/41.jpg)
41
CPU+ASH
core computation
support computation + OS + VM
CPU ASH
Memory
![Page 42: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/42.jpg)
42
Outline
• Motivation
• ASH: Application-Specific Hardware
• The spatial model of computation
• CASH: Compiling for ASH
• Evolutionary path
• Conclusions
• Future work
![Page 43: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/43.jpg)
43
ASH Benefits
| Problem | Solution |
| --- | --- |
| Reliability | Configuration around defects |
| Power | Only “useful” gates switching |
| Signals | Localized computation |
| ILP | Statically extracted |
![Page 44: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/44.jpg)
44
Scalable Performance
[Plot: performance versus time for CPU and ASH; "now" is marked on the time axis.]
![Page 45: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/45.jpg)
45
Summary
• Contemporary CPU architecture faces lots of problems
• Application-Specific Hardware (ASH) provides a scalable technology
• Compiling HLL into hardware dataflow machines is an effective solution
![Page 46: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/46.jpg)
46
Timeline
[Gantt chart spanning 06/01 to 12/02, with marks at 09/01, 12/01, 04/02, 06/02 and 09/02; "now" is marked. Tasks:]
• CASH core
• Write thesis
• Hw/sw partitioning (ASH + CPU)
• Cost models
• ASH simulation
• Loop parallelization
• Explore architectural/compiler trade-offs
• Memory partitioning
![Page 47: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/47.jpg)
47
Extras
• Related work
• Reconfigurable hardware
• Other cross-over phenomena
• A CPU + ASH study
• More about predicates
![Page 48: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/48.jpg)
48
Related Work
• Hardware synthesis from HLL
• Reconfigurable hardware
• Predicated execution
• Dataflow machines
• Speculative execution
• Predicated SSA
![Page 49: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/49.jpg)
49
Reconfigurable Hardware
Universal gates and/or storage elements
Interconnection network
Programmable switches
![Page 50: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/50.jpg)
50
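Main RH Ingredient: RAM Cell
Universal gate = RAM
Switch controlled by a 1-bit RAM cell
[Figure: a RAM holding the bits 0001, addressed by inputs a0 and a1, whose data output is labelled "a1 & a2"; and a switch with labels "0", "data in" and "control".]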
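The "universal gate = RAM" idea can be sketched as a two-input lookup table in C. This is an illustrative model only: it assumes the four stored bits are listed in address order 00, 01, 10, 11, so the contents 0001 put a 1 only at address 11, which yields AND. The function `lut2` and the `LUT_*` constants are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 2-input universal gate is a 4 x 1-bit RAM: the inputs form the
 * address and the stored bit is the output. Bit k of `contents` is the
 * RAM cell at address k. */
static bool lut2(uint8_t contents, bool a1, bool a0) {
    assert(contents < 16);               /* only four cells exist */
    unsigned address = ((unsigned)a1 << 1) | (unsigned)a0;
    return (contents >> address) & 1u;
}

/* Configurations: reprogramming the RAM changes the gate. */
enum {
    LUT_AND = 0x8,  /* cells at addresses 0..3 hold 0,0,0,1 */
    LUT_OR  = 0xE,  /* cells at addresses 0..3 hold 0,1,1,1 */
    LUT_XOR = 0x6   /* cells at addresses 0..3 hold 0,1,1,0 */
};
```

For example, `lut2(LUT_AND, true, true)` reads the cell at address 3 and returns true, while the same hardware loaded with `LUT_XOR` returns false for those inputs.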
![Page 51: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/51.jpg)
51
Reconfigurable Computing
• Back to ENIAC-style computation
• Synthesize one machine to solve one problem
![Page 52: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/52.jpg)
52
Efficiency
[Plot: hardware resources versus time, split into used and idle; "now" is marked on the time axis.]
![Page 53: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/53.jpg)
53
Manufacturing Cost
[Plot: cost versus time, with curves labelled "cost" and "affordable"; the cost axis is marked at $3x10^9 and the time axis at "now".]
![Page 54: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/54.jpg)
54
Complexity
[Plot: transistors versus time, with curves for manageable and available; axis marks at 10^8, 10^9 and 10^10, and "now" on the time axis.]
![Page 55: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/55.jpg)
55
CAD Tools
[Plot: manual interventions versus time, with curves for feasible and necessary; "now" is marked on the time axis.]
![Page 56: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/56.jpg)
56
ASH Benefits
| Problem | Solution |
| --- | --- |
| Reliability | Configuration around defects |
| Power | Only “useful” gates switching |
| Signals | Localized computation |
| ILP | Statically extracted |
| Complexity | Hierarchy of abstractions |
| CAD | Compiler + local place & route |
| Efficiency | Circuit customized to application |
| Cost | No masks, no physics, same substrate |
| Performance | Scalable |
![Page 57: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/57.jpg)
57
CPU+ASH Study
• Reconfigurable functional unit on processor pipeline
• Adapted SimpleScalar 3.0
• ASH & CPU use the same memory hierarchy (incl. L1)
• ASH can access CPU registers
• CPU pipeline interlocked with ASH
• Results pending
![Page 58: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/58.jpg)
58
Simplifying Predicates
• Shared implementations
• Control equivalence
a
b
c
![Page 59: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/59.jpg)
59
Deep Speculation
if (p)
    if (q) x = a;
    else x = b;
else x = c;
[Figure: a three-input mux producing x; input a is selected by p&q, b by p&!q, and c by !p.]
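The flattening of the nested conditional into one mux with decoded predicates can be sketched in C as follows. This is an illustrative model that assumes two's-complement integers (so that negating a true selector gives an all-ones mask); `select3` and `nested` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* One-hot mux: each data input is paired with its own decoded predicate.
 * Exactly one predicate is true, so ANDing each input with its selector
 * mask and ORing the results yields the selected value. */
static int select3(bool p, bool q, int a, int b, int c) {
    bool sel_a = p && q;
    bool sel_b = p && !q;
    bool sel_c = !p;
    assert(sel_a + sel_b + sel_c == 1);   /* predicates are one-hot */
    /* -sel is all ones when sel is true, all zeros otherwise */
    return (a & -(int)sel_a) | (b & -(int)sel_b) | (c & -(int)sel_c);
}

/* Reference: the nested conditional from the slide. */
static int nested(bool p, bool q, int a, int b, int c) {
    int x;
    if (p) {
        if (q) x = a;
        else x = b;
    } else {
        x = c;
    }
    return x;
}
```

All three candidate values can be computed in parallel; only the final selection depends on p and q, which is what removes the control dependence between the two tests.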
![Page 60: Computing Without Processors Thesis Proposal](https://reader030.fdocuments.us/reader030/viewer/2022032415/56813459550346895d9b3cb0/html5/thumbnails/60.jpg)
60
Predicates & Tokens
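*q = 2
Predicated tokens
Eliminate speculation
Eliminate wires
[Figure: the store *q = 2 with "ready" and "safe" inputs derived from q, x and ~x, first as separate signals and then combined as "ready & safe"; likewise a predicate P and its token P_ready are shown merged into the single signal P & P_ready.]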