Enabling Technologies for System-on-Chip Development,
November 19-20, 2001, Tampere, Finland http://www.cs.tut.fi/soc/
Reconfigurable Computing Architectures and Methodologies for System-on-Chip;
Reiner Hartenstein, Monday, November 19, 10:15 - 11:00 hrs.
Reiner Hartenstein, University of Kaiserslautern, Germany http://hartenstein.de
Enabling Technologies for Reconfigurable Computing Part 2: Stream-based Computing for RC
Wednesday, November 21, 10.30 – 12.00 hrs.
Reiner Hartenstein
University of Kaiserslautern
November 21, 2001, Tampere, Finland
© 2001, [email protected] http://www.fpl.uni-kl.de
University of Kaiserslautern
Xputer Lab
Schedule
time slot
08.30 – 10.00 Reconfigurable Computing (RC)
10.00 – 10.30 coffee break
10.30 – 12.00 Stream-based Computing for RC
12.00 – 14.00 lunch break
14.00 – 15.30 Resources for RC
15.30 – 16.00 coffee break
16.00 – 17.30 FPGAs: recent developments
>> EDA revolution
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation http://www.uni-kl.de
EDA: where Electronics begins [Richard Newton]
• Dataquest Initiative
New book
• NASDAQ index
EDA index
[Richard Newton]
The End is near
[Figure: transistors/chip (log scale, 10^0 to 10^15) versus year to market (1960 to 2040)]
The end of Hypergrowth ?
Paradigm Shift
Mainstream
Tornado
Development of Hypergrowth Markets [Harper Business 1995]
Makimoto’s 3rd wave
The next EDA Industry Revolution
EDA industry paradigm switching every 7 years [Keutzer / Newton]
• 1978: Transistor entry: Applicon, Calma, CV ...
• 1985: Schematics entry: Daisy, Mentor, Valid ...
• 1992: Synthesis: Cadence, Synopsys ...
• 1999: (Co-) Compilation, Stream-based DPU arrays [Hartenstein]
• 2006
Biggest Mistake in History
Innovation Stalled ? [Richard Newton]
What is next after VHDL ?
What is next after VHDL ?
Motivations
• HDL-savvy designers needed
• New Business Model
• Co-Design never ending
• HDLs ?
• Extended HDLs – how far ?
• Automatic Partitioning
>> Dead Supercomputer
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation http://www.uni-kl.de
Dead Supercomputer Society
• 37 university and corporate R&D projects: 2 or 3 successes…
• All the rest failed to work or to be successful (Research 1985-1995)
Dead Supercomputer Society
• ACRI • Alliant • American Supercomputer • Ametek • Applied Dynamics • Astronautics • BBN • CDC • Convex • Cray Computer • Cray Research • Culler-Harris • Culler Scientific • Cydrome • Dana/Ardent/Stellar/Stardent • DAP (ICL) • Denelcor • Elexsi • ETA Systems • Evans and Sutherland Computer • Floating Point Systems • Galaxy YH-1
• Goodyear Aerospace MPP • Gould NPL • Guiltech • Intel Scientific Computers • International Parallel Machines • Kendall Square Research • Key Computer Laboratories • MasPar • Meiko • Multiflow • Myrias • Numerix • Prisma • Tera • Thinking Machines • Saxpy • Scientific Computer Systems (SCS) • Soviet Supercomputers • Supertek • Supercomputer Systems • Suprenum • Vitesse Electronics
[Gordon Bell, keynote at ISCA 2000]
>> Stream-based Computing
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation http://www.uni-kl.de
Coarse Grain Reconfigurable Arrays vs. Parallel Processes
[Figure: left, an array of processing elements, each an instruction sequencer with an ALU (I-Seq ALU); right, a 3 × 3 array of rALUs with a data sequencer]
• left: Parallelism at Process Level
• right: Parallelism at Datapath Level (reconfigurable or hardwired): no instruction sequencing!
Concurrent Computing
[Figure: several CPUs, each a DPU with its own instruction sequencer, connected by bus(es) or a switch box]
extremely inefficient
Stream-based Computing
[Figure: a chain of four DPUs]
• driven by data streams from / to memory, or from / to a peripheral interface
• transport-triggered execution
• no instruction sequencer inside!
Stream-based Computing: (r)DPU array
for both reconfigurable and hardwired
[Figure: a 3 × 3 array of DPUs, driven by data streams]
>>> extremely high efficiency
• avoiding address computation overhead
• avoiding instruction fetch and interpretation overhead
• high parallelism, massively multiple deep pipelines
• much less configuration memory
• no routing areas to configure functions from CLBs
Systolic Stream-based Computing System
Systolic Array [H. T. Kung, 1980]: an array of DPUs (Data Path Units)
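[Figure: a systolic array of DPUs fed by skewed data streams: inputs x1, x2, x3, coefficients a11 ... a33, initial values y1(0), y2(0), y3(0), and results y1, y2, y3; "-" marks padding slots in the streams]
DPU architecture: multiply-add cell (operators * and +) with inputs a, x and y
Mapping: equations → placement by linear projection or algebraic mapping → data streams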
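The figure above suggests a 3 × 3 matrix-vector product computed by multiply-add DPUs. The following is a minimal sketch of that idea, assuming the array computes y = y(0) + A·x with one DPU per result and a skewed schedule (at time step t, DPU i consumes column t - i). The names `mac_dpu` and `systolic_matvec` and the schedule are illustrative assumptions, not taken from the slides.

```python
from typing import List, Sequence


def mac_dpu(y: float, a: float, x: float) -> float:
    """One DPU operation: multiply-add, as in the DPU architecture (+, *)."""
    return y + a * x


def systolic_matvec(a: Sequence[Sequence[float]],
                    x: Sequence[float],
                    y0: Sequence[float]) -> List[float]:
    """Time-stepped simulation of a linear array of multiply-add DPUs.

    Assumption: DPU i accumulates result y[i]; the x and a streams are
    skewed so that DPU i sees element j at time step t = i + j.
    """
    n = len(x)
    y = list(y0)
    for t in range(2 * n - 1):        # 2n - 1 steps drain the skewed streams
        for i in range(n):            # all DPUs work in the same time step
            j = t - i                 # stream element reaching DPU i at step t
            if 0 <= j < n:            # otherwise the slot is padding ("-")
                y[i] = mac_dpu(y[i], a[i][j], x[j])
    return y


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(systolic_matvec(A, [1, 0, -1], [0, 0, 0]))
```

The inner loop over DPUs is sequential only because this is a software simulation; in the array all DPUs would fire in the same step, which is the "computing in space" part, while the loop over t is the "computing in time" part.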
Computing in space and time
[Figure: the systolic array with its data streams (x1 ... x3, a11 ... a33, y1(0) ... y3(0), y1 ... y3), annotated "computing in space" (placement) and "computing in time" (data streams)]
• placement: systolic arrays etc.
• migration by re-timing and other transformations
• this dichotomy is completely ignored by our CS curricula
General Stream-based Computing System
heterogeneous array of DPUs (data path units)
[Figure: four numbered steps (1 to 4) involving an expression tree, the DPU architectures (multiply-add cell with inputs a, x, y), a Mapper performing simultaneous placement & routing, a heterogeneous DPU array (+, *, sh, xf cells), and a Scheduler forming the data streams]
The same mapper for both: reconfigurable or hardwired
Kress DPSS [1995]
Converging Design Flows
the same synthesis method may be used for mapping an algorithm onto both:
rDPA [Kress, 1995], and DPA [Brodersen, 2000]
this synthesis method is a generalization of systolic array synthesis: super systolic synthesis
terms:
DPU: datapath unit
DPA: datapath array
rDPU: reconfigurable DPU
rDPA: reconfigurable DPA
Super Pipe Networks
systolic array
• applications: regular data dependencies only
• pipeline properties: shape: linear only; resources: uniform only
• mapping and scheduling (data stream formation): linear projection or algebraic synthesis
super-systolic rDPA
• applications and pipeline properties: no restrictions
• mapping: simulated annealing or P&R algorithm
• scheduling (data stream formation): (e.g. force-directed) scheduling algorithm *)
*) KressArray [1995]
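To illustrate mapping by simulated annealing, here is a minimal, generic placement sketch. It is not the KressArray DPSS algorithm: the cost function (Manhattan wire length), the swap move, the geometric cooling schedule and all names (`wire_length`, `anneal_placement`) are assumptions chosen for illustration, and routing is not modelled at all.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Cell = Tuple[int, int]


def wire_length(placement: Dict[str, Cell], nets: Sequence[Tuple[str, str]]) -> int:
    """Sum of Manhattan distances over all two-point nets."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in nets)


def anneal_placement(nodes: Sequence[str], nets: Sequence[Tuple[str, str]],
                     rows: int, cols: int, steps: int = 4000,
                     t_start: float = 2.0, t_end: float = 0.01,
                     seed: int = 0) -> Tuple[Dict[str, Cell], int]:
    """Place operators on a rows x cols DPU grid by simulated annealing."""
    rng = random.Random(seed)
    cells: List[Cell] = [(r, c) for r in range(rows) for c in range(cols)]
    if len(nodes) > len(cells):
        raise ValueError("more operators than DPUs")
    # slots[k] holds the operator placed on cells[k], or None if the DPU is unused
    slots: List[Optional[str]] = list(nodes) + [None] * (len(cells) - len(nodes))
    rng.shuffle(slots)

    def as_placement() -> Dict[str, Cell]:
        return {n: cells[k] for k, n in enumerate(slots) if n is not None}

    cost = wire_length(as_placement(), nets)
    best, best_cost = as_placement(), cost
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        i, j = rng.randrange(len(slots)), rng.randrange(len(slots))
        slots[i], slots[j] = slots[j], slots[i]               # trial move: swap two DPUs
        new_cost = wire_length(as_placement(), nets)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost                                   # accept the move
            if cost < best_cost:
                best, best_cost = as_placement(), cost
        else:
            slots[i], slots[j] = slots[j], slots[i]           # reject: undo the swap
    return best, best_cost


if __name__ == "__main__":
    ops = ["mul1", "mul2", "add1", "sh1", "add2"]
    edges = [("mul1", "add1"), ("mul2", "add1"), ("add1", "sh1"), ("sh1", "add2")]
    placement, cost = anneal_placement(ops, edges, rows=3, cols=3)
    print(placement, cost)
```

Because the irregular expression graph is placed directly on the grid, no linear projection is needed, which is the point of the "no restrictions" row for the super-systolic case.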
>> Stream-based Memory Architecture
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation http://www.uni-kl.de
Hot Research Topic: Memory Architectures
• High Performance Embedded Memory Architectures
• High Performance Memory Communication Architectures [Herz]
• Custom Memory Management Methodology [Catthoor]
• Data Reuse Transformations [Kougia et al.]
• Data Reuse Exploration [Soudris, Wuytack]
Processor Memory Performance Gap
[Figure: performance (log scale, 1 to 1000) versus year (1980 to 2000), CPU versus DRAM]
• µProc: 60% / yr.
• DRAM: 7% / yr.
• Processor-Memory Performance Gap: grows 50% / year
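The 50% figure follows from the two growth rates: 1.60 / 1.07 ≈ 1.495 per year. A minimal sketch of the arithmetic, assuming both curves start from parity and compound annually (the function name `performance_gap` is hypothetical):

```python
def performance_gap(years: int, cpu_growth: float = 0.60,
                    dram_growth: float = 0.07) -> float:
    """Ratio of CPU to DRAM performance after `years`, starting from parity."""
    return ((1.0 + cpu_growth) / (1.0 + dram_growth)) ** years


if __name__ == "__main__":
    for years in (1, 5, 10, 20):
        print(f"after {years:2d} years: gap factor {performance_gap(years):.1f}")
```

Under these assumptions the gap exceeds a factor of 50 after a decade, which is why the chart needs a logarithmic performance axis.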
RAs: Cache does not help
• the memory bandwidth problem is often more dramatic than for microprocessors
• interleaving is not practicable, since it is based on sequential instruction streams
• classical caches do not help, since instruction sequencing is not used
• the problem: throughput of parallel data streams, not of instruction streams
• super pipe networks, not parallel computers!
• Stream-based arrays are a memory bandwidth problem
Synthesizable Memory Communication
Efficient Memory Communication should be directly supported by the Mapper Tools
Optimized Parallel Memory Controller: an example by Nageldinger’s KressArray Xplorer, http://kressarray.de
[Figure legend: sequencers, memory ports, application, not used]
The Disk Farm? or a System On a Card?
• The 500GB disc card (14"): LOTS of bandwidth
• A few disks replaced by >10s Gbytes RAM and a processor
• MicroDrive: 1.7” x 1.4” x 0.2”
 – 1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
 – 2006: 9 GB, 50 MB/s ? (1.6X/yr capacity, 1.4X/yr BW)
• Integrated IRAM processor: 2x height; 16 Mbytes; 1.6 Gflops; 6.4 Gops
• Connected via crossbar switch growing like Moore’s law
• 10,000+ nodes in one rack! 100/board = 1 TB; 0.16 Tflops
[Gordon Bell, Jim Gray, ISCA2000]
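The 2006 MicroDrive projection can be checked from the quoted growth factors. A minimal sketch, assuming simple annual compounding over the seven years from 1999 to 2006 (the helper name `project` is hypothetical):

```python
def project(value: float, yearly_factor: float, years: int) -> float:
    """Compound `value` by `yearly_factor` for `years` years."""
    return value * yearly_factor ** years


if __name__ == "__main__":
    years = 2006 - 1999
    capacity_mb = project(340.0, 1.6, years)    # 1.6X/yr capacity
    bandwidth_mb_s = project(5.0, 1.4, years)   # 1.4X/yr bandwidth
    print(f"capacity:  {capacity_mb / 1000:.1f} GB")
    print(f"bandwidth: {bandwidth_mb_s:.1f} MB/s")
```

Both results land close to the slide's rounded values of 9 GB and 50 MB/s.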
Memory Communication Architecture
• hot research topic in embedded systems
• storage context transformations [Herz, others]
• for low power
• for high performance
• startups provide memory IP or generators
Stream-based Soft Machine
[Figure: block diagram with Compiler, Scheduler, Sequencers (data stream generators), Memory (data memory, multiple memory banks), rDPA, and “instructions”]
>> Design Space Explorers
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation http://www.uni-kl.de
general purpose is unrealistic
• domain-specific Reconfigurable Platforms will be suitable to cope with the 2nd Design Crisis
• fully general purpose reconfigurable sometimes is ... an Illusion ...
• just as the general purpose massively parallel computer system
KressArray Explorer ...
Universal RAs: is it feasible?
The General Purpose (coarse grain) Reconfigurable Array appears to be an Illusion ...
... as obviously is also the Universal Massively Parallel Computer Architecture
... counter-example: Application Domain of Image Processing
-> Design Space Exploration
• Design Space Exploration
 – Design Space Explorers (DSEs)
 – Platform Space Explorers (PSEs)
 – Compiler / PSE symbiosis
 – Parallel computing vs. reconfigurable
Design Space Exploration Systems
Explorer System | year | source | interactive | status evaluation | status generation
DPE | 1991 | [66] | no | abstract models | rule-based
Clio | 1992 | [67] | yes | prediction models | device generator
DIA | 1998 | [68] | yes | prediction from library | rule-based
DSE for RAW | 1998 | [49] | no | analytical models | analytical
ICOS | 1998 | [76] | no | fuzzy logic | greedy search
DSE for Multimedia | 1999 | [77] | no | simulation | branch and bound
Xplorer | 1999 | [11] [50] | yes | fuzzy rule-based | simulated annealing
DSEs: an overview
• For VLSI design in general
• for parallel Computer Systems
• Xplorer is the only one for reconfigurable platforms (also MATRIX ?)
>> KressArray Xplorer
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation http://www.uni-kl.de
KressArray Xplorer (Platform Design Space Explorer)
KressArray DPSS, published at ASP-DAC 1995
[Figure: tool flow of the Xplorer around the DPSS]
• User Interface: Architecture Editor, Mapping Editor
• DPSS: ALE-X Compiler (ALE-X code → expression tree), Mapper, Scheduler (data stream schedule)
• Analysis: Analyzer, Architecture Estimator, Delay Estimator, Power Estimator (statistical data, power data)
• Generators: HDL Generator (VHDL, Verilog), Simulator, Datapath Generator (Kress rDPU layout, design rules)
• Improvement Proposal Generator with Inference Engine (FOX): suggestions, selected by the user
• Other inputs and data: Application Set, KressArray family parameters, intermediate forms 2 and 3
Xplorer: (Design Space) Platform Space Explorer, http://kressarray.de
[Figure: overview of the Xplorer built around the KressArray DPSS]
• Architecture & Mapping Editor
• Statistics
• KressArray DPSS
• Datastream Generator
• HDL Generator, Simulator
• Datapath Generator
• Delay & Power Estimator
• Improvement Proposal Generator
• Inputs: User, Source Input, Application Set
Design Flow of Domain-specific Architecture Optimization
[Figure: flow chart with the boxes Application Set, Application Compilation, Initial Arch. Estimation (or benchmark), Application Selection, Application Mapping, Mapping Analysis, Modification Suggestion, Architecture Modification, Architecture Verification, Optimized Architecture]
Nageldinger’s KressArray Design Space Xplorer, including a Fuzzy Logic Improvement Proposal Generator
accessible by internet: http://kressarray.de (runs best with Netscape 4.6.1)
KressArray Design Space Xplorer
DPSS-N: Data Path Synthesis System
[Figure: tool flow of the Xplorer]
• Components: Editor / User Interface, ALE-X Compiler, Mapper (Technology Mapping, Placement & Routing), Scheduler, Analyser, Architecture Estimation, Module Generator, HDL Generator, Kress IP Library, other IP, User
• Files: ALE-X Code (.alex); Intermediate Format (.map), including configware code; Data Sequencing Code (.seq); Statistical Data (.stat); Kress rDPU Layout (.krs); HDL Description (.v), to Synthesis Environment
>> Machine paradigms
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation http://www.uni-kl.de
Computer: the wrong Machine Paradigm (“von Neumann”)
[Figure: Compiler, Memory, Sequencer with program counter (state register), hardwired Datapath; instructions]
• tightly coupled by compact instruction code
• does not support soft data paths
Xputer: The Soft Machine Paradigm
[Figure: Compiler, Scheduler, Memory, multiple sequencers with data counters, Datapath Array; “instructions”]
• loosely coupled by decision data bits only
• reconfigurable, also for hardwired
Soft Machine Paradigm
[Figure: Xputer: Compiler, Scheduler, Memory, Sequencer with data counter, reconfigurable Datapath; “instructions”]
[Figure: Parallel Xputer: Compiler, Scheduler, multiple Sequencers with data counters, multiple memory banks, reconfigurable Datapath; “instructions”]
Decision data only, i.e. loose coupling
Computer: the wrong Machine Paradigm
[Figure: “von Neumann” machine: Compiler, Memory, Instruction Sequencer with program counter, Decoder, hardwired Datapath; instructions]
• tightly coupled by a compact instruction code
• does not support soft data paths (at run time: no instruction fetch)
Machine Paradigms
machine category | Computer (“v. Neumann”) | Xputer (no transputer!)
driven by | control flow | data streams (no “dataflow”)
engine principles | instruction sequencing | data sequencing
state register | program counter | (multiple) data counter(s)
communication path set-up | at run time | at load time
datapath resource | single ALU | array of ALUs & other rDPUs
datapath operation | sequential | parallel pipe network
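The "state register" row can be made concrete with a small sketch of data sequencing: the state is a data counter that steps through memory addresses and streams the operands through a fixed datapath, with no instruction fetch per operand. The 2-D scan pattern, the function names (`data_sequencer`, `run_stream`) and the example DPU are assumptions for illustration only, not the Xputer's actual sequencer design.

```python
from typing import Callable, Iterator, List, Sequence


def data_sequencer(base: int, rows: int, cols: int, row_stride: int) -> Iterator[int]:
    """Data counter: generate the address stream of a 2-D scan over memory."""
    for r in range(rows):
        for c in range(cols):
            yield base + r * row_stride + c


def run_stream(memory: Sequence[int], addresses: Iterator[int],
               dpu: Callable[[int], int]) -> List[int]:
    """Feed the addressed data stream through a datapath unit set up beforehand."""
    return [dpu(memory[addr]) for addr in addresses]


if __name__ == "__main__":
    memory = list(range(100))          # toy data memory
    double = lambda value: 2 * value   # datapath configured at "load time"
    window = data_sequencer(base=10, rows=2, cols=3, row_stride=10)
    print(run_stream(memory, window, double))
```

The datapath function is chosen once before the stream starts, mirroring "communication path set-up at load time"; only the data counter changes while the stream runs.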
Machine Paradigms
machine category | Computer (“v. Neumann”) | Xputer [8] (no transputer!)
machine paradigm | procedural sequencing: deterministic | procedural sequencing: deterministic
driven by | control flow | data stream(s) (no dataflow [13])
RA support | no | yes
engine principles | instruction sequencing | data sequencing
state register | program counter | (multiple) data counter(s)
communication path set-up | at run time | at load time
datapath resource | single ALU | array of ALUs
datapath operation | sequential | parallel
Fundamental Ideas available
• Data Sequencer Methodology
• Data-procedural Languages (Duality w. v. N.)
• ... supporting memory bandwidth optimization
• Soft Data Path Synthesis Algorithms
• Parallelizing Loop Transformation Methods
• Compilers supporting Soft Machines
• SW / CW Partitioning Co-Compilers
>> Co-Compilation
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation http://www.uni-kl.de
FPGA-Style Mapping for coarse grain reconfigurable arrays
mapping tools compared: Kress DPSS, CHESS, RaPiD, Colt
• placement: simulated annealing; genetic algorithm
• routing: simulated annealing; Pathfinder; greedy algorithm
KressArray DPSS (Datapath Synthesis System)
[Figure: Compiler, Mapper, Scheduler; the Scheduler specifies and assembles the data streams from / to the array]
Changing Models of Computing
[Figure: three models of computing]
• “von Neumann”: (procedural) Software, downloaded to RAM; data path, instruction sequencer, I / O
• contemporary: host with Software downloaded to RAM, plus hardwired accelerator(s): ASICs, designed with CAD
• reconfigurable computing: host with Software downloaded to RAM, plus reconfigurable accelerator(s) with Configware downloaded to RAM
Changing Models of Computation
[Figure: two models of computation]
• contemporary: host (Software via Compiler, RAM) plus hardwired accelerator(s) (ASICs via CAD)
• reconfigurable computing: host plus reconfigurable accelerator(s); a Co-Compiler produces Software and Configware (RAM, RAM)
*) even 80% of hardware people hate their tools
Co-Compilation
Jürgen Becker’s Co-DE-X Co-Compiler [ASP-DAC’95]
[Figure: partitioning compiler]
• Input: high level programming language source (X-C)
• X-C compiler with Analyzer / Profiler and Partitioner, driven by Resource Parameters (supporting different platforms)
• Software path: GNU C compiler; Software running on the µProcessor (Computer Machine Paradigm)
• Configware path: KressArray DPSS; Configware running on Reconfigurable Accelerators (Xputer “Soft” Machine Paradigm)
• µProcessor and Reconfigurable Accelerators are connected by an interface
Co-Compilation
We introduce: Co-Compilation
[Figure: a high level programming language source enters the partitioning compiler, which emits Software running on the µProcessor (Computer Machine Paradigm) and Configware running on Reconfigurable Accelerators (Xputer “Soft” Machine Paradigm); both sides are connected by an interface]
Reconfigurable Architecture (RA) instead of hardwired
Jürgen Becker’s Co-DE-X Co-Compiler
[Figure: X-C source → X-C compiler (Analyzer / Profiler, Partitioner, Resource Parameters) → GNU C compiler for the host (Computer machine paradigm) and DPSS for the KressArray (Xputer machine paradigm)]
• X-C is C language extended by MoPL
• Resource Parameters: supporting different platforms, supporting platform-based design
Loop Transformation Examples
resource parameter driven Co-Compilation
• sequential process on the host: loop 1-16 body endloop
• strip mining (fork / join): loop 1-8 body endloop and loop 9-16 body endloop
• loop unrolling: loop 1-8 body body endloop
• loops with triggers (host / reconf. array): loop 1-8 trigger endloop; loop 1-4 trigger endloop; loop 1-2 trigger endloop
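The two named transformations can be sketched on the 16-iteration loop of the slide. This is a minimal illustration, assuming an arbitrary side-effect-free `body` (here squaring the index); the function names are hypothetical, and the fork / join of the two strips is shown sequentially rather than in parallel.

```python
from typing import List


def body(i: int) -> int:
    """Placeholder loop body (assumption: independent iterations)."""
    return i * i


def original(n: int = 16) -> List[int]:
    """loop 1-16 body endloop"""
    return [body(i) for i in range(1, n + 1)]


def strip_mined(n: int = 16, strip: int = 8) -> List[int]:
    """Strip mining: loop 1-8 body endloop, then loop 9-16 body endloop."""
    out: List[int] = []
    for start in range(1, n + 1, strip):                  # one pass per strip
        for i in range(start, min(start + strip, n + 1)):
            out.append(body(i))
    return out


def unrolled_by_two(n: int = 16) -> List[int]:
    """Loop unrolling: loop 1-8 body body endloop."""
    if n % 2:
        raise ValueError("n must be even for unrolling by two")
    out: List[int] = []
    for i in range(1, n + 1, 2):                          # 8 iterations, 2 bodies each
        out.append(body(i))
        out.append(body(i + 1))
    return out


if __name__ == "__main__":
    assert original() == strip_mined() == unrolled_by_two()
    print(original())
```

Both transformations preserve the result; what changes is how many bodies are available per iteration to be spread over parallel resources, which is where resource parameters would come in.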
History of Loop Transformations
• 1970s - 1980s: at Process Level: Sequential to Parallel Processes, incl. Vectorization
 – David Loveman, 1977, Allen and Kennedy, et al.: Loop Unrolling, Loop Fusion, Strip Mining ....
• 1995/97 [Karin Schmidt / Jürgen Becker]: down to Datapath Level: (Parameter-driven) Time to Time/Space Partitioning
 – e. g.: Transformation from Sequential Process to Super-systolic
• 2000 [Michael Herz]: optimized RA to Memory Communication Bandwidth
 – Multi-dimensional Loop Unrolling / Storage Scheme Optimization supporting burst-mode & parallel Memory Banks
History of Loop Transformations
• For Sequential Programs on Parallel Computers: David Loveman, 1977, Allen and Kennedy, etc.:
Loop Unrolling, Loop Fusion, Strip Mining ....
• For memory communication: Michael Herz (2000): Multi-Level Loop Unrolling to reduce Memory Cycles needed to create RA Data Streams
• For parallel Datapaths: Jürgen Becker (1997):
 – Sequential to Super-Systolic Transformation
 – to optimize Throughput of Reconfigurable Arrays (RAs)
Future Coarse Grain RA Development
• It is indispensable to operate within the Convergence Area of Compilers, Co-Compilers, Architecture and full-custom-style VLSI Design (array cells).
• It is a must that Products come with a Development Platform which encourages users, especially those with a limited Hardware Background.
>> Design Space Explorers
• EDA revolution
• Dead Supercomputer
• Stream-based Computing
• Stream-based Memory Architecture
• Design Space Explorers
• KressArray Xplorer
• Machine paradigms
• Co-Compilation http://www.uni-kl.de
Schedule
time slot
08.30 – 10.00 Reconfigurable Computing (RC)
10.00 – 10.30 coffee break
10.30 – 12.00 Stream-based Computing for RC
12.00 – 14.00 lunch break
14.00 – 15.30 Resources for RC
15.30 – 16.00 coffee break
16.00 – 17.30 FPGAs: recent developments
END