-
DASS 2003 and SDA 2003
Data-Stream-based Reconfigurable Computing
Reiner Hartenstein
Kaiserslautern University of Technology
Dresden, May 8-9, 2003
-
© 2003, [email protected] http://hartenstein.de
Kaiserslautern University of Technology
2
"new" terms
Software: you all know
Hardware: you all know
Morphware: structurally programmable "hardware"
Configware: sources for programming morphware
Flowware*: similar to software, but based on data counter manipulation: data streams instead of instruction streams
(only the terms are "new", not their subject)
clean terminology and taxonomy needed for comprehensibility
*) no relation to "dataflow machine" (dead area)
-
flowware defines ...
... which data item at which time at which port
[Figure: a DPA with input data streams and output data streams; each stream is plotted as data items (x) over time and port #]
flowware history:
1980: data streams (Kung, Leiserson)
1995: super systolic rDPA (Kress)
1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...
(tutorials and courses available on all this)
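The statement that flowware defines "which data item at which time at which port" can be illustrated with a minimal sketch. It is not taken from the talk: the schedule format, the names `FLOWWARE`, `stream_by_port` and `items_at`, and the data item labels are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical flowware "program": (time step, port number, data item).
# The skewed start times mimic data streams entering an array port by port.
FLOWWARE = [
    (0, 0, "x0"),
    (1, 0, "x1"), (1, 1, "y0"),
    (2, 0, "x2"), (2, 1, "y1"), (2, 2, "z0"),
]

def stream_by_port(schedule):
    """Group a schedule into one time-ordered data stream per port."""
    streams = defaultdict(list)
    for time, port, item in sorted(schedule):
        streams[port].append((time, item))
    return dict(streams)

def items_at(schedule, time):
    """Return {port: item} for everything entering the DPA at one time step."""
    return {port: item for t, port, item in schedule if t == time}
```

Under these assumptions, `items_at(FLOWWARE, 2)` lists the three items that enter ports 0, 1 and 2 in the same time step, and `stream_by_port` gives the per-port view of the same schedule.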
-
programming: procedural vs. structural
domain: procedural | structural
computing in ...: time only* | space and time
structural variants: hardwired; reconfigurable; currently emerging
program sources: software*; (hardware +) software**; (hardware +) flowware; configware + flowware
"instruction" fetch: at run time; before fabrication; at loading time
data "fetch": at run time
*) only one source needed
**) software "simulates" flowware
fully hardwired: not programmable (algorithms fixed, resources fixed)
CPU: algorithms variable, resources fixed
reconfigurable: algorithms variable, resources variable
embedded systems: data-stream-based
http://hartenstein.de
-
Digital System Platforms clearly distinguished (1)
platform | program source running on it | machine paradigm
hardware | (not programmable) | none
morphware, fine grain: rGA (FPGA) | configware | none
morphware, coarse grain: rDPU, rDPA (reconfigurable data stream processor) | flowware & configware | anti machine
data stream processor (hardwired) | flowware | anti machine
instruction stream processor | software | von Neumann machine
-
Crusty Computing Sciences
[David Padua, John Hennessy]
shrinking supercomputing conferences
more and more efforts yield only marginal improvements
dataflow machines are dead
98.5% vN-only
this monopoly is dangerous
areas fade away
-
Dead Supercomputer Society
•ACRI •Alliant •American Supercomputer •Ametek •Applied Dynamics •Astronautics •BBN •CDC •Convex •Cray Computer •Cray Research •Culler-Harris •Culler Scientific •Cydrome •Dana/Ardent/Stellar/Stardent
•DAPP •Denelcor •Elexsi •ETA Systems •Evans and Sutherland Computer •Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace MPP •Gould NPL •Guiltech •ICL •Intel Scientific Computers •International Parallel Machines •Kendall Square Research •Key Computer Laboratories
•MasPar •Meiko •Multiflow •Myrias •Numerix •Prisma •Tera •Thinking Machines •Saxpy •Scientific Computer Systems (SCS) •Soviet Supercomputers •Supertek •Supercomputer Systems •Suprenum •Vitesse Electronics
[Gordon Bell, keynote at ISCA 2000]
-
Stealthy CS Crisis
progress in CS stalled by qualification problems in industry and academia
communication barriers between disciplines
exploding design cost and implementation cost
not only in embedded systems: comprehensibility barrier between procedural and structural mind set
severe software quality problems
often hardware people needed to solve CS problems
80% of designers hate their tools ... unusable for SW people
-
What are the Challenges ? (1) [ST microelectronics, MorphICs, Dataquest, eASIC]
[Chart: factor (1, 2) over months (0, 10, 12, 18); labels: 10y, 4y, 90% by 2010]
*) Department of Trade and Industry, London
-
McKinsey Curve: dynamics of R&D disciplines
maturity of a discipline
year
fundamental issues
saturation: limitations met
evangelists create awareness
consolidation
challenges and motivation
CS discipline gets crusted
innovation
evangelists ....
challenges ....
new discipline on top of it .... new CS by innovation
-
History of Computing
[Chart: maturity over the years 1957, 1967, 1977, 1987, 1997, 2007; classical CS: mainframes, PC, ?; new CS: data streams ..., morphware; marker: here?]
technology issue and business model
free rider
it's already existing ...
but awareness still missing ... still ignored by most CS curricula
-
Semiconductor Revolutions
“Mainstream Silicon Application is switching every 10 Years”
[Chart: 1957 to 2007 in 10-year steps; wave labels: standard, custom; TTL; LSI, MSI; µproc., memory; ASICs, accel’s; morphware; mainframes, PC, ?; data streams ...; marker: here?]
technology issue and business model
free rider
-
The next EDA Industry Revolution
EDA industry paradigm switching every 7 years (courtesy [Keutzer / Newton])
1978: Transistor entry: Applicon, Calma, CV ...
1985: Schematics entry: Daisy, Mentor, Valid ...
1992: Synthesis: Cadence, Synopsys ...
1999 to 2006: (Co-) Compilation, Data-Stream-based DPU arrays [Hartenstein]
time of Makimoto’s 3rd wave
-
it's time for a new CS ...
configware flowware
embedded systems: hw/cw/sw co-design
next EDA wave: high level languages
CS crisis: qualification problems
.... a dichotomy of 2 machine paradigms
urging us opportunities
-
Matter & Antimatter
The World of Matter machine paradigm: the Atom
+ + -
The World of Anti Matter machine paradigm: Anti Atom
- - +
-
Matter & Antimatter of Informatics:
Machine paradigm: CPU
Anti Machine paradigm: DPU (nothing central !)
-
Drafting a Road Map
The talk gives a draft of a road map toward a symbiosis of basic computing paradigms
What delays the break-through of Reconfigurable Computing ?
-
Machine paradigms
[Figure, left: von Neumann, the instruction stream machine: memory (M) holding Software delivers an instruction stream to the CPU (DPU + instruction sequencer); I/O]
[Figure, right: data-stream machine: a DPU or rDPU with data stream I/O to asM* blocks, each a memory with a data address generator (data sequencer)]
Legend: Flowware; Configware download (reconf.)
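The contrast between the two machine paradigms can be sketched in a few lines. This is only a toy model under stated assumptions: the four-instruction program, the function names and the fetch counter are hypothetical and serve to show where the state register sits (program counter versus data counter), not how any real machine is built.

```python
def von_neumann_sum_of_squares(memory):
    """Instruction-stream machine: a program counter steps through instructions."""
    program = [("load",), ("square",), ("add",), ("next",)]
    acc, reg, addr, pc, fetches = 0, 0, 0, 0, 0
    while addr < len(memory):
        op = program[pc][0]
        fetches += 1                  # every step costs one instruction fetch
        if op == "load":
            reg = memory[addr]
        elif op == "square":
            reg = reg * reg
        elif op == "add":
            acc += reg
        elif op == "next":
            addr += 1
        pc = (pc + 1) % len(program)  # program counter manipulation
    return acc, fetches

def data_sequencer(start, stride, count):
    """Data address generator: the data counter is the only state register."""
    for i in range(count):
        yield start + i * stride

def data_stream_sum_of_squares(memory):
    """Data-stream machine: the data path operation is fixed before run time."""
    acc = 0
    for addr in data_sequencer(0, 1, len(memory)):
        x = memory[addr]              # data "fetch" at run time
        acc += x * x                  # configured data path, no instruction fetch modelled
    return acc
```

Both functions compute the same result; the model only makes visible that the first spends four modelled instruction fetches per data item while the second is driven purely by the generated data addresses.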
-
heavy anti atoms: DPA = DPU array
[Figure: a DPA drawn as an "anti atom" built from nine DPUs]
flowware: data streams spinning around
-
Machine paradigms
[Figure: the von Neumann instruction stream machine (memory M, CPU = DPU + instruction sequencer, I/O, Software) next to the data-stream machine, now extended to an (r)DPU and an (r)DPA, each with I/O and several memories (M); each asM* is a memory with a data address generator (data sequencer)]
Legend: Flowware; Configware download (reconf.)
-
SNN filter KressArray Mapping Example
rDPA example, array size: 10 x 16 = 160 rDPUs
Legend: rDPU not used; used for routing only (route-through only); operator and routing; port location marker; backbus connect
-
PACT XPP: Reference Module: XPU128 Co-Processor
[Figure: XPP128 ALU-Array of ALU-PAEs; each PAE core with ALU, Ctrl and CFG]
• 2 x PACs (Cluster)
• 128 x ALU-PAEs
• 32 x 1 Kbyte RAM-PAEs
• 8 x I/O Elements
• Full 32 or 24 Bit Design
• 2 Configuration Hierarchies
• Evaluation Board (2001)
• XDS Development Tool with Simulator
• PAE Core is 32- or 24-Bit ALU with DSP-Instruction Set and Controller
• Connections: Inputs + Outputs (Channels) + Events
[Jürgen Becker, Univ. Karlsruhe]
-
Throughput vs. Efficiency
[Chart: MOPS / mW (0.001 to 1000, logarithmic) over µ feature size (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07); T. Claasen et al.: ISSCC 1999]
[Figure: 1 Bit CLB: area used by application vs. resources needed for reconfigurability (cells marked L and S)]
Wiring by abutment: 32 Bit example
*) R. Hartenstein: ISIS 1997
-
Throughput vs. Flexibility
[Chart: MOPS / mW (0.001 to 1000, logarithmic) over µ feature size (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07); T. Claasen et al.: ISSCC 1999; *) R. Hartenstein: ISIS 1997]
[Diagram: throughput vs. flexibility for hardwired, coarse grain, FPGAs, von Neumann]
coarse grain goes far beyond bridging the gap
-
Machine paradigms
[Figure: the von Neumann instruction stream machine next to the data-stream machine with (r)DPU and (r)DPA, I/O and several memories (M); the memories around the array are labelled: embedded memory architecture*; each asM* is a memory with a data address generator (data sequencer)]
Legend: Flowware; Configware download (reconf.)
*) new discipline: came just in time: Herz et al.: Proc IEEE ICECS 2002
-
Configware / Flowware Compilation
[Figure: high level source program; wrapper; rDPA intermediate; mapper producing configware for the r. Data Path Array; scheduler producing flowware for the address generators (data sequencers); memories (M) around the array delivering the data streams]
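The split between mapper (configware) and scheduler (flowware) can be sketched as follows. This is a deliberately naive illustration, not the compiler described in the talk: the intermediate form `GRAPH`, the row-major placement and the delay rule are assumptions made only to show that one source yields two different kinds of code.

```python
# Hypothetical intermediate form: operator name -> (operation, input names).
# Names that are not operators ("a", "b", "c") are external input streams.
GRAPH = {
    "mul": ("*", ["a", "b"]),
    "add": ("+", ["mul", "c"]),
}

def map_operators(graph, columns):
    """Mapper: place each operator on an array cell (row, col); this is the configware."""
    return {op: divmod(i, columns) for i, op in enumerate(graph)}

def schedule_streams(graph, n_items):
    """Scheduler: decide when each input item is sent; this is the flowware."""
    depth = {}

    def level(name):
        if name not in graph:
            return 0                          # external input stream
        if name not in depth:
            depth[name] = 1 + max(level(src) for src in graph[name][1])
        return depth[name]

    flowware = []
    for op, (_, inputs) in graph.items():
        for src in inputs:
            if src not in graph:
                # Delay the stream so it arrives when its consumer is ready.
                delay = level(op) - 1
                flowware += [(t + delay, src, t) for t in range(n_items)]
    return sorted(flowware)                   # (time step, stream, item index)
```

In this sketch the stream `c` is delayed by one step because the adder sits one pipeline level behind the multiplier.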
-
Synthesizable Memory Communication
Efficient Memory Communication should be directly supported by the Mapper Tools
An example by Nageldinger’s KressArray Xplorer: Optimized Parallel Memory Controller
Legend: sequencers; memory ports; application; not used
http://kressarray.de
-
Data-Stream-based Soft Machine
[Figure: Compiler; Scheduler; rDPA; Memory (data memory) built from memory banks; Sequencers (data stream generator); “instructions”]
-
The Disk Farm? or a System On a Card?
The 500GB disc card (14")
LOTS of bandwidth
A few disks replaced by >10s Gbytes RAM and a processor
MicroDrive: 2006: 9 GB, 50 MB/s ? (1.6X/yr capacity, 1.4X/yr BW)
Integrated IRAM processor, growing like Moore’s law: 16 Mbytes; 1.6 Gflops; 6.4 Gops
Connected via crossbar switch
10,000+ nodes in one rack! 100/board = 1 TB; 0.16 Tflops
[Gordon Bell, Jim Gray, ISCA2000]
-
computing paradigms and methodologies
1946: machine paradigm (von Neumann)
1980: data streams (Kung, Leiserson)
1989: anti machine paradigm introduced
1990: anti machine implementation methodology
1990: rDPU (Rabaey)
1994: anti machine high level programming language
1995: super systolic rDPA (Kress)
1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...
1997: configware / software partitioning compiler (Becker)
2000: generator for rDPA with high memory bandwidth
(tutorials and courses available on all this)
-
Digital System Platforms clearly distinguished (2)
platform | program source running on it | machine paradigm
hardware | (not programmable) | none
morphware, fine grain: rGA (FPGA) | configware | none
morphware, coarse grain: rDPU, rDPA (reconfigurable data stream processor) | flowware & configware | anti machine
data stream processor (hardwired) | flowware | anti machine
instruction stream processor | software | von Neumann machine
-
Software Industry
[Chart: 1957 to 2007 in 10-year steps; wave labels: standard, custom; TTL; LSI, MSI; µproc., memory; ASICs, accel’s]
Software Industry’s Secret of Success: Procedural personalization via RAM-based Machine Paradigm
-
Configware Industry ?
[Chart: 1957 to 2007 in 10-year steps; wave labels: standard, custom; TTL; LSI, MSI; µproc., memory; ASICs, accel’s]
Configware Industry: structural personalization: RAM-based before run time
Repeat Success Story by new Machine Paradigm !
-
The Secret of Success: Co-Compilation
not a niche market
[Figure: High level PL source; Analyzer / Profiler; Partitioner with Resource Parameters (supporting different platforms, supporting platform-based design); SW compiler producing SW code for the “vN” machine paradigm; CW compiler producing CW Code for the anti machine paradigm]
could provide the platforms
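The role of the partitioner between profiler and the two compilers can be sketched with a simple greedy rule. This is an assumption for illustration only, not the partitioning algorithm of any co-compiler named in the talk: the function name, the speed-up estimates and the cell budget are all hypothetical stand-ins for "Analyzer / Profiler" results and "Resource Parameters".

```python
def partition(profile, speedup, cw_cells_needed, cells_available):
    """Greedy sketch: move the most profitable tasks to configware while cells remain.

    profile:         task -> measured software run time (from a profiler)
    speedup:         task -> assumed speed-up factor on the morphware
    cw_cells_needed: task -> assumed number of array cells the task would occupy
    cells_available: size of the target array (a resource parameter)
    """
    # Run time saved if the task migrates from software to configware.
    gain = {t: profile[t] * (1 - 1 / speedup[t]) for t in profile}
    software, configware = [], []
    for task in sorted(profile, key=gain.get, reverse=True):
        need = cw_cells_needed[task]
        if gain[task] > 0 and need <= cells_available:
            configware.append(task)       # goes to the CW compiler
            cells_available -= need
        else:
            software.append(task)         # stays with the SW compiler
    return software, configware
```

With a budget of 160 cells and the made-up numbers used in the check below, only the most profitable task fits on the array; the rest stays in software.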
-
thank you
thank you for your patience
-
>>> END
-
Appendix for discussion
-
The Secret of Success: Co-Compilation
not a niche market
[Figure: High level PL source; Analyzer / Profiler; Partitioner with Resource Parameters (supporting different platforms, supporting platform-based design); SW compiler producing SW code for the “vN” machine paradigm; CW compiler producing CW Code for the anti machine paradigm]
should provide the platforms
-
Machine Paradigms
machine category | Computer (the Machine: “v. Neumann”) | The Anti Machine
driven by: | instruction streams | data streams (no “dataflow”)
engine principles | instruction sequencing | sequencing data streams
state register | single program counter | (multiple) data counter(s)
communication path set-up (“instruction fetch”) | at run time | at load time
data path resource | DPU (e.g. single ALU) | DPU or DPA (DPU array) etc.
data path operation | sequential | parallel pipe network etc.
also hardwired implementations*
*) e.g. Bee project, Prof. Brodersen
-
Programming Language Paradigms
language category | Computer Languages | Languages f. Anti Machine
both deterministic | procedural sequencing: traceable, checkpointable | procedural sequencing: traceable, checkpointable
operation sequence driven by: | read next instruction, goto (instr. addr.), jump (to instr. addr.), instr. loop, loop nesting, no parallel loops, escapes, instruction stream branching | read next data item, goto (data addr.), jump (to data addr.), data loop, loop nesting, parallel loops, escapes, data stream branching
state register | program counter | data counter(s)
address computation | massive memory cycle overhead | overhead avoided
instruction fetch | memory cycle overhead | overhead avoided
parallel memory bank access | interleaving only | no restrictions
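The anti machine column of this table (data loops, loop nesting, parallel loops, unrestricted parallel memory bank access) can be illustrated with a small sketch. It is not a real flowware language; the function names and parameters are assumptions, and Python generators merely stand in for hardware data sequencers.

```python
def scan_addresses(base, rows, cols, row_stride, col_stride=1):
    """Nested data loops: the data counter walks a 2-D window through memory."""
    for r in range(rows):              # outer data loop
        for c in range(cols):          # inner data loop
            yield base + r * row_stride + c * col_stride

def parallel_banks(banks, generators):
    """Parallel loops: one sequencer per memory bank, one item per bank per step."""
    for addrs in zip(*generators):
        yield tuple(bank[a] for bank, a in zip(banks, addrs))
```

For example, one sequencer can scan its bank forward while a second scans another bank backward, and both deliver an item in every step; the address patterns are independent of each other.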
-
Jürgen Becker’s Co-DE-X Co-Compiler
[Figure: X-C source; Analyzer / Profiler; Partitioner with Resource Parameters (supporting different platforms, supporting platform-based design); GNU C compiler producing Host Software for the Computer machine paradigm; X-C compiler and DPSS producing KressArray Configware for the Xputer machine paradigm]
X-C is C language extended by MoPL
-
KressArray Family generic Fabrics: a few examples
Select Function Repertory
Select Nearest Neighbour (NN) Interconnect (an example): select mode, number, width of NNports
[Figure: rDPU with NNports; labels: 16, 32, 8, 24, 4, 2]
route-through only
route-through and function
more NNports: rich routing resources
Examples of 2nd Level Interconnect: laid out over the rDPU cell - no separate routing areas !
http://kressarray.de
-
Impact of Makimoto’s wave
[Chart: 1957 to 2007 in 10-year steps; wave labels: standard, custom; TTL; LSI, MSI; µproc., memory; ASICs, accel’s]
Personalization (CAD) before fabrication
Software Industry’s Secret of Success: Procedural personalization via RAM-based Machine Paradigm
Configware Industry: structural personalization: RAM-based before run time
Repeat Success Story by new Machine Paradigm !
-
The Dominance of the Submarine Model ...
Hardware
... indicates, that our CS education system produces zillions of
mentally disabled Persons
(procedural) structurally disabled
… completely disabled to cope with solutions other than software only
It‘s time to attack the software faculty dictatorship. Get involved!
-
However, current CS Education ….
Hardware invisible: under the surface
… is based on the Submarine Model
Brain usage: procedural-only
Algorithm
Assembly Language
procedural high level Programming Language
Hardware
This model disables ...
-
Hardware, Configware
Hardware and Software as Alternatives
Algorithm
Software
partitioning
Software only
Software & Hardw/Configw
procedural structural
Brain Usage: both Hemispheres
Hardw/Configw only
-
Why Coarse Grain instead of FPGA ?
[Chart: transistors / chip (1000 to 100 000 000 000, logarithmic) over 1980, 1990, 2000, 2010; curves: physical, logical, FPGA physical, FPGA logical, FPGA routed; gaps marked ~ 10 and ~ 10 000]
reduced reconfigurability overhead by up to ~ 1000
drastically smaller configuration memory
much faster loading
many more benefits
Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld
-
Second Blossom of CS
progress in CS stalled by qualification problems in industry and academia
Communication barriers between disciplines
Exploding design and implementation cost
Not only in embedded systems: comprehensibility barrier between procedural and structural mind set
Severe software quality problems
Bad hardware / configware design tools: more than 80% of designers hate their tools
-
Procedural vs. structural
progress in CS stalled by qualification problems in industry and academia
like microprocessors, morphware is also RAM-based: the secret of success of the software industry
Could the configware industry repeat this success story ?
Configware will remain a niche market unless it comes along with hardware / configware / software co-design
-
Algorithms and Data Structures People
... have to go beyond pointers, queues, and stacks
-
roadmap
old CS lab course philosophy:
given an application: implement it by a program
new CS freshman lab course environment:
given an application:
a) implement it by writing a program
b) implement it as a morphware prototype
c) partition it into P and Q
c.1) implement P by software
c.2) implement Q by morphware
c.3) implement P / Q communication interface
-
Algorithms and Data Structures
... have to go beyond pointers, queues, and stacks
Extend by including
algorithmic issues in software / morphware / hardware migration
additional levels of parallelism: chaining, pipelining, systolic, super-systolic, wavefront arrays
additional data structures and storage organization: the new distributed memory discipline
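As one small instance of the pipelining and chaining mentioned here, the following sketch models a chain of multiply-accumulate units computing a FIR filter in transposed form. It is only an illustration under stated assumptions: the function name is hypothetical, and a Python list stands in for the pipeline registers between neighbouring DPUs.

```python
def fir_transposed(weights, xs):
    """Pipelined chain of multiply-accumulate units (transposed-form FIR).

    Each input item is offered to all units in the same step; partial sums
    move one register closer to the output per step.
    """
    n = len(weights)
    regs = [0] * (n - 1)               # pipeline registers between units
    out = []
    for x in xs:
        y = weights[0] * x + (regs[0] if regs else 0)
        for i in range(n - 2):
            regs[i] = weights[i + 1] * x + regs[i + 1]   # uses the old regs[i + 1]
        if regs:
            regs[-1] = weights[-1] * x
        out.append(y)
    return out
```

Feeding a unit impulse returns the weights themselves, which is a quick way to check that the chain computes y[t] = sum over k of w[k] * x[t - k].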
-
Computer Organization / Architecture
... have to go beyond von Neumann
Extend by including
nested machines, address generators
the anti machine paradigm
Extended taxonomy of platforms: procedural, structural, hardwired, reconfigurable, hybrid systems
-
Languages and Compilers
... have to go beyond von Neumann
Extend by including
Configware / flowware compilers
Procedural / structural co-compilers
(data-procedural) flowware languages
-
Semiconductor Revolutions
“Mainstream Silicon Application is switching every 10 Years”
[Chart: 1957 to 2007 in 10-year steps; wave labels: standard, custom; TTL; LSI, MSI; µproc., memory; ASICs, accel’s]
1st design crisis: hardware people: new breed (M&C)
2nd design crisis: software people: new breed needed
-
EDA the main bottleneck
-
Biggest Mistake of EDA guess it !
-
Innovation Stalled ? [Richard Newton]
What is next after VHDL ?
-
Flowware and Software
Software: instruction-stream-based – i. e. based on program counter manipulation
Flowware: data-stream-based – i. e. based on data counter manipulation
Software and flowware: introduce them like fraternal twins
-
Models (1)
1. There is a very wide variety of architectures
2. Most papers are badly organized: to show the authors' creativity, less relevant details are often stressed in a confusing mix of abstraction levels
3. Architectures are not described in terms of a common model
4. A common model exists, but it's usually ignored
5. We need a comprehensible taxonomy of architectures
-
Models (2)
1. Reconfigurable instruction set extension
2. Reconfigurable co-processor
2a. FPGA
2b. Coarse grain
I omit 3: hardwired accelerators
I do not talk about reconfigurable instruction set processors
M&C structured VLSI design: max. no. of transistors within regular structures (Craig Mudge: regularity factor)
- structured Configware Design
-
>> history & terminology
• history & terminology
• skyrocketing requirements
• destructive von Neumann monopoly
• high mask cost
• low battery capacity
• new compilation model
• conclusions
http://www.uni-kl.de
-
Terminology: DPU versus CPU ...
• DPU: data path unit
• DPA: DPU array
• GA: gate array
• rDPU: reconfigurable DPU
• rDPA: reconfigurable DPA
• rGA: reconfigurable GA
• DPU is no CPU: there is nothing central - like in a DPA
[Figure: CPU = DPU + instruction sequencer; (r)DPA = array of (r)DPUs]
-
flowware defines ...
... which data item at which time at which port
[Figure: a DPA with input data streams and output data streams; each stream is plotted as data items (x) over time and port #]
flowware manipulates the data counter(s) ...
... software manipulates the program counter
-
History of data-streams
1980: data streams (Kung, Leiserson)
1995: super systolic rDPA (Kress)
1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...
(tutorials and courses available on all this)
-
>> skyrocketing requirements
-
What are the Challenges ? (1) [ST microelectronics, MorphICs, Dataquest, eASIC]
[Chart: factor (1, 2) over months (0, 10, 12, 18); label: 4y]
-
Changing Models of Computing
“von Neumann” (software design): (procedural) Software, downloading to RAM; data path, instruction sequencer, I / O
hardware/software co-design: host with hardwired accelerator(s); Software downloading to RAM; hardware from a hardware spec via CAD
hardware people needed
the problem with typical CS people:
- the dominance of von Neumann
- they cannot partition
- they cannot migrate
-
>> destructive von Neumann monopoly
-
Which machine paradigm ?
von Neumann does not support morphware
-
What about CS people ?
[Chart: 1957 to 2007 in 10-year steps; TTL; LSI, MSI; µproc., memory; ASICs, accel’s; FPGAs; coarse grain; soft CPUs]
CS people: procedural programming languages, compiler, computer architecture
-
Flagship example: annual IEEE ISCA conference series
Resignation?
taken over by the opposition:
Interconnect Fabrics:
vN Parallelism:
the Dataflow Machine is dead
Statistics [David Padua, John Hennessy, et al.]
Reconfigurable Computing
-
There are more Levels of Parallelism
Loop Level (data-stream-based, pipe nets, etc.)
Instruction Level (VLIW etc.)
Logic Level (FPGAs)
RT Level (special architectures etc.)
Process level
-
What are the Challenges ? (2) [ST microelectronics, MorphICs, Dataquest, eASIC]
[Chart: factor (1, 2) over months (0, 10, 12, 18); labels: 10y, 4y, 90% by 2010]
*) Department of Trade and Industry, London
-
Changing Models of Computing
“von Neumann” (software design): (procedural) Software, downloading to RAM; data path, instruction sequencer, I / O
hardware/software co-design: host with hardwired accelerator(s); Software downloading to RAM; Hardware via CAD
configware/software co-design: host with reconf. accelerator(s) (Morphware); Software downloading to RAM; (structural) Configware downloading to RAM
hardware/configware/software co-design
-
no von Neumann bottleneck ?
typical CS people:
• how to provide more performance to these people ?
• think in terms of machine models: sequencing instruction by instruction
• cannot be turned into hardware people
• new machine paradigm needed which does not have a von Neumann bottleneck
• the anti machine has no von Neumann bottleneck
• data streams instead of an instruction stream
• flowware instead of software
-
Just in time
The new distributed memory discipline:
just in time to implement the anti machine.
[3] M. Herz et al. (invited): Memory Organization for
Data-Stream-based Reconfigurable Computing; Proc. ICECS 2002
-
>> high mask cost
-
What are the Challenges ? (3) [ST microelectronics, MorphICs, Dataquest, eASIC]
[Chart: factor (1, 2) over months (0, 10, 12, 18); labels: 30y, 10y, 4y, 3y]
*) Department of Trade and Industry, London
avoid application-specific silicon !
-
Coarse grain vs. Fine grain
Reconfigurability:
coarse grain (PACT AG, Munich)
multi grain (e. g. by slice bundling)
fine grain (FPGAs, rGAs)
-
>> low battery capacity
-
What are the Challenges ? (4) [ST microelectronics, MorphICs, Dataquest, eASIC]
[Chart: factor (1, 2) over months (0, 10, 12, 18); labels: 30y: Battery capacity (1.03/year), 10y, 4y, 3y]
*) Department of Trade and Industry, London
-
Algorithmic cleverness
Very high throughput on slow, low-power FPGAs can be obtained only by algorithmic cleverness, which is not yet taught by CS & CSE at universities: an urgent educational problem.
-
>> new compilation model
-
What are the Challenges ? (5) [ST microelectronics, MorphICs, Dataquest, eASIC]
[Chart: factor (1, 2) over months (0, 10, 12, 18); labels: 30y: Battery capacity (1.03/year), 10y, 4y, 3y, 5y, 2y]
*) Department of Trade and Industry, London
new compilation techniques needed ! supported by a new machine paradigm
-
>> conclusions
• history & terminology • skyrocketing requirements • destructive von Neumann monopoly • high mask cost • low battery capacity • new compilation model • conclusions http://www.uni-kl.de
-
Conclusion
No, we are not ready for the break-through,
since our computing education is obsolete, because of the von Neumann monopoly.
But all ingredients are available to jazz up our CS & CSE curricula
-
>>> thank you
thank you for your patience
-
scalability
The Scalability Problem
The Routing congestion Problem grows with the size of the FPGA
-
SNN filter KressArray Mapping Example
array size: 10 x 16 = 160 rDPUs
Legend: rDPU not used; rDPU used for routing only (route-thru only); operator and routing; port location marker; backbus connect
http://kressarray.de
-
route-thru-only rDPU
3 vert. NNports, 32 bit
http://kressarray.de
Xplorer Plot: SNN Filter Example
+ [13]
2 hor. NNports, 32 bit
operator
result
operand
operand
route thru
backbus connect
-
Conclusion: all knowledge needed is available
• languages
• machine paradigm
• compilation techniques
• anti architectural resources
• sequencing methodology: hw & sw
• hw / sw partitioning methodology
• parallel memory IP core and module generator vendors
courses / embedded tutorials:
full day courses:
• anything else needed
-
... has a chance
Configware Industry has a Chance
-
Conclusions
•the anti machine is the way to go for massive parallelism, also data-intensive applications
•reconfigurable anti machine for high performance with short product life cycles, unstable standards
•reconfigurable for low cost low volume production
•Giga FPGAs highly promising - only by a new design flow: configware could repeat the success of software industry
•sparepart problem: needs new infrastructures
-
Paradigm Shifts: Nick Tredennick‘s view
instruction-stream-based computing: algorithms variable, resources fixed
reconfigurable computing: algorithms variable, resources variable
programmable
why 2 program sources ?
-
Compilation for (r)DPA of anti machine
[Figure: compilation flow for the (r)DPA of the anti machine; labels: high level source program (software notation), expression tree, mapper, scheduler, DPU library, wrapper parameters, code generators, morphware, configware, flowware, streamware]
-
Misleading predictors
Moore's Law is becoming a misleading
predictor of future developments.
-
High mask cost
High mask cost may be avoided
completely by morphware use, or,
partly by GAs (ASICs).
-
Fault tolerance
Morphware is the only way to
obtain fault-tolerant ICs.
-
World-wide services
FPGAs may provide an important
benefit for world-wide services and
all other after sales consequences
-
„Re-configurable Hardware“ ??
„Re-configurable Hardware“ ??
this „Hardware“ is not hard !
We need a concise terminology: a consensus is on the way
it‘s Morphware
Terminology has been highly confusing
-
Super Pipe Networks
systolic array: array applications: regular data dependencies only; pipeline shape: linear only; pipeline resources: uniform only; mapping and scheduling (data stream formation): linear projection or algebraic synthesis
super-systolic rDPA*: array applications, pipeline shape and resources: no restrictions; mapping: simulated annealing or P&R algorithm; scheduling (data stream formation): (e.g. force-directed) scheduling algorithm
*) KressArray [1995]
-
http://kressarray.de
Efficient Memory Communication should be directly supported by the Mapper Tools
sequencers
memory ports
application
not used
Legend: Optimized Parallel Memory Controller
An example by Nageldinger’s KressArray Xplorer
Synthesizable Memory Communication
-
Stream-based Soft Machine
Scheduler Memory
(data memory)
memory bank
memory bank
memory bank
memory bank
memory bank
...
...
“instructions”
rDPA Compiler
Sequencers (data stream
generator)
-
JPEG zigzag scan pattern
[Figure: 8 x 8 pixel map with axes x and y; the zigzag path is drawn in four segments numbered 1 to 4, each labelled "data counter"; both HalfZigZag parts are marked]
*> Declarations
EastScan is step by [1,0] end EastScan;
SouthScan is step by [0,1] end SouthScan;
NorthEastScan is loop 8 times until [*,1] step by [1,-1] endloop end NorthEastScan;
SouthWestScan is loop 8 times until [1,*] step by [-1,1] endloop end SouthWestScan;
HalfZigZag is EastScan loop 3 times SouthWestScan SouthScan NorthEastScan EastScan endloop end HalfZigZag;
goto PixMap[1,1]
HalfZigZag; SouthWestScan uturn (HalfZigZag)
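The scan declarations on this slide describe the address sequence purely by step vectors and loop escapes. As a rough illustration only, the same visiting order can be reproduced in ordinary software. The Python sketch below is not the slide's data-sequencing language and does not model the uturn construct; the function name `zigzag_scan` and the 1-based (x, y) convention are assumptions made for this example.

```python
def zigzag_scan(n=8):
    """Return the zigzag visiting order of an n x n block as 1-based (x, y) pairs."""
    x, y = 1, 1                      # corresponds to: goto PixMap[1,1]
    order = [(x, y)]
    north_east = True                # direction of the next diagonal run
    while (x, y) != (n, n):
        if north_east:
            if x == n:               # right edge reached: one step south, then turn
                y += 1
                north_east = False
            elif y == 1:             # top edge reached: one step east, then turn
                x += 1
                north_east = False
            else:                    # diagonal step by [1,-1]
                x += 1
                y -= 1
        else:
            if y == n:               # bottom edge reached: one step east, then turn
                x += 1
                north_east = True
            elif x == 1:             # left edge reached: one step south, then turn
                y += 1
                north_east = True
            else:                    # diagonal step by [-1,1]
                x -= 1
                y += 1
        order.append((x, y))
    return order
```

Calling `zigzag_scan(8)` yields 64 distinct addresses. Note that this software version spends instructions on every address computation, which is the overhead the data counter hardware is meant to avoid.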
-
Similar Programming Language Paradigms
language category: Computer Languages | Xputer Languages
both: deterministic procedural sequencing: traceable, checkpointable
sequencing driven by (Computer Languages): read next instruction, goto (instruction addr.), jump (to instruction addr.), instruction loop, instruction loop nesting, no parallel loops, instruction loop escapes, instruction stream branching
sequencing driven by (Xputer Languages): read next data object, goto (data addr.), jump (to data addr.), data loop, data loop nesting, parallel data loops, data loop escapes, data stream branching
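To make the idea of data loops and data loop nesting more concrete, here is a minimal Python sketch. It only simulates the principle in software: the operation (a running sum) stays fixed while a data counter walks a rectangular window of a row-major memory. The function name `windowed_sum`, its parameters, and the row-major layout are assumptions of this example, not part of any Xputer language.

```python
def windowed_sum(memory, row_width, x0, y0, win_w, win_h):
    """Sum a win_w x win_h window of a row-major 2-D memory.

    Returns the sum and the list of data addresses in visiting order.
    """
    acc = 0
    visited = []
    for dy in range(win_h):                  # outer data loop (rows)
        for dx in range(win_w):              # nested data loop (columns)
            data_counter = (y0 + dy) * row_width + (x0 + dx)
            visited.append(data_counter)     # "read next data object"
            acc += memory[data_counter]      # the fixed datapath operation
    return acc, visited
```

For example, on a 4 x 4 memory holding the values 0 to 15, `windowed_sum(list(range(16)), 4, 1, 1, 2, 2)` visits the addresses 5, 6, 9 and 10.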
-
Generic GAG Scheme
GAG = Generic Address Generator
[Figure: GAG composed of Limit Stepper, Base Stepper and Address Stepper; signals B0, DA, L0; address A runs between base B and limit L]
-
GAG: Address Stepper
GAG = Generic Address Generator
[Figure: address stepper circuit; labels: + / –, Step Counter, Escape Clause, End Detect, init, tag, Address, endExec, maxStepCount, Limit, Base, stepVector; address A runs in steps DA between base B and limit L]
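The address stepper can be thought of as a small generator of address sequences. The following Python sketch shows one plausible reading of the labels Base, Limit, stepVector and maxStepCount for a one-dimensional address; the exact semantics of the hardware (including the escape clause and tag handling) are not given on the slide, so the behaviour below is an assumption for illustration.

```python
def address_stepper(base, limit, step, max_step_count=None):
    """Yield addresses base, base + step, ... while they stay within the limit.

    max_step_count (optional) caps the number of addresses produced.
    """
    if step == 0:
        raise ValueError("step must be non-zero")
    address = base
    produced = 0
    # For positive steps the limit is an upper bound, for negative steps a lower bound.
    while (address <= limit) if step > 0 else (address >= limit):
        if max_step_count is not None and produced >= max_step_count:
            break                            # assumed meaning of maxStepCount
        yield address
        address += step
        produced += 1
```

For instance, `list(address_stepper(0, 10, 3))` produces the addresses 0, 3, 6 and 9, and a negative step walks downwards from the base.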
-
Generic Sequence Examples
[Figure: seven example address sequences, panels a) to g), generated by a GAG with Limit Slider, Base Slider and Address Stepper (signals B0, DA, L0, A)]
-
Slider Operation Demo Example
[Figure: slider operation in the x-y address space between floor (F) and ceiling (C); labels: address, B0, L0, DA, DB, DL]
-
What are the Challenges ? [ST microelectronics, MorphICs, Dataquest, eASIC]
[Figure: growth factor (axis ticks 1, 2) vs. time in months (axis ticks 0, 10, 12, 18); curve labels: 30y, Battery capacity (1.03/year), 10y, 4y, 3y]
*) Department of Trade and Industry, London
-
What are the Challenges ? [ST microelectronics, MorphICs, Dataquest, eASIC]
[Figure: growth factor (axis ticks 1, 2) vs. time in months (axis ticks 0, 10, 12, 18); curve labels: 30y, Battery capacity (1.03/year), 10y, 4y, 3y]
*) Department of Trade and Industry, London
design complexity: +40%/year (doubling: 2y)
design productivity: +15%/year (doubling: 5y)
[SIA roadmap]
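The doubling periods quoted for design complexity and design productivity follow directly from the annual growth rates: a quantity growing by a constant factor g per year doubles after ln 2 / ln g years. The short Python check below is only a worked calculation; the helper name `doubling_time_years` is an assumption of this example.

```python
import math

def doubling_time_years(annual_growth_factor):
    """Years needed to double at a constant annual growth factor (e.g. 1.40 for +40%/year)."""
    if annual_growth_factor <= 1.0:
        raise ValueError("growth factor must be greater than 1")
    return math.log(2.0) / math.log(annual_growth_factor)

complexity = doubling_time_years(1.40)     # about 2 years
productivity = doubling_time_years(1.15)   # about 5 years
battery = doubling_time_years(1.03)        # between 23 and 24 years
```

The gap between a 2-year and a 5-year doubling period is what makes the design productivity problem grow every year, while battery capacity at 1.03/year hardly moves on the same time scale.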
-
>> Outline
• Morphware
• Changing Models by SoC Development
• New Machine Paradigm needed
• The Dichotomy of Paradigms
• Outlook http://www.uni-kl.de
-
The Morphware Market
Top 4 PLD Manufacturers 2000 (total: $3.7 billion):
Xilinx 42%
Altera 37%
Lattice 15%
Actel 6%
• [Dataquest] > $7 billion by 2003.
• PLD vendors and their alliances provide libraries of “soft IPs”
Configware Market
• fastest growing semiconductor market segment
coarse-grained:
rDPUs: configurable functional blocks
fine-grained:
cLBs, rLBs: configurable logic blocks
PACT AG, Munich, Germany http://pactcorp.com
-
Morphware only: some soft CPU core examples
core | architecture | platform
MicroBlaze, 125 MHz, 70 D-MIPS | 32 bit standard RISC, 32 reg. by 32 LUT RAM-based reg. | Xilinx, up to 100 on one FPGA
Nios, 50 MHz, 22 D-MIPS | 32-bit instr. set | Altera
Nios | 16-bit instr. set | Altera Mercury
Nios | 8 bit | Altera – Mercury
gr1040 | 16-bit |
gr1050 | 32-bit |
My80 | i8080A | FLEX10K30 or EPF6016
DSPuva16 | 16 bit DSP | Spartan-II
Leon, 25 MHz | SPARC |
ARM7 clone | ARM |
uP1232 8-bit | CISC, 32 reg. | 200 XC4000E CLBs
REGIS | 8 bits Instr. + ext. ROM | 2 XILINX 3020 LCA
Reliance-1 | 12 bit DSP | Lattice 4 isp30256, 4 isp1016
Popcorn-1 | 8 bit CISC | Altera, Lattice, Xilinx
Acorn-1 | | 1 Flex 10K20
YARD-1A | 16-bit RISC, 2 opd. Instr. | old Xilinx FPGA Board
xr16 | RISC integer C | SpartanXL
-
soft CPUs in academic teaching
• UCSC: 1990!
• Mälardalen University • Chalmers University • Cornell University • Gray Research • Georgia Tech • Hiroshima City Univ.
• Michigan State • Univ. de Valladolid • Virginia Tech • Washington U. St. Louis • New Mexico Tech • UC Riverside • Tokai University
-
>> New Machine Paradigm needed
• Morphware
• Changing Models by SoC Development
• New Machine Paradigm needed
• The Dichotomy of Paradigms
• Outlook http://www.uni-kl.de
-
>> The Dichotomy of Paradigms
• Morphware
• Changing Models by SoC Development
• New Machine Paradigm needed
• The Dichotomy of Paradigms
• Outlook http://www.uni-kl.de
-
>> Outlook
• Morphware
• Changing Models by SoC Development
• New Machine Paradigm needed
• The Dichotomy of Paradigms
• Outlook
http://www.uni-kl.de
-
Why fine grain ?
•no specific silicon: low production volume (aerospace, automotive, military, industrial controllers, et al.)
• the spare part problem
•design flow
•coming Giga-FPGA
-
Configware Industry vs. Software Industry
can configware industry repeat the success story?
•RAM-based
•Compatibility
•Scalability
•Education problems
-
Problems of Parallelism
Software to rDPA migration:
the area of parallel algorithms needs a complete re-orientation of its scope, far beyond traditional platforms
methodology only in special areas (DSP, wireless ....)
Software to FPGA migration:
enormous speed-ups: factor of 3 to >10 000
algorithmic cleverness missing, no education, no methodology for interconnect estimation
-
Evolution of FPGA and its design flow
User Code Compiler Executable
Netlister Netlist
Place and
Route . .
Bitstream
Schematics/
HDL
HLL Compiler
Compiler HLL
[à la S. Guccione]
CPU core
FPGA core
Memory core Compiler
HLL
soft CPU
© 2002, [email protected] http://KressArray.de
interfaces
CPU core
FPGA core
Memory core
rDPA core
interfaces
soft rDPA
as soon as Giga FPGA is available
-
ASIC emulation
•ASIC emulation / Rapid Prototyping: to replace simulation
•Quickturn (Cadence), IKOS (Synopsys), Celaro (Mentor)
•hours of compilation run: inefficient since netlist-based: ...
• ... ASIC emulators will become obsolete soon
•by RTR: in-circuit execution debugging instead of emulation
•new business model: upgradable morphware is the product
•emulation for solving the spare part problem in many areas
-
Nasty Matter
+ CPU
Data Path
instruction sequencer
RAM
Address Computation Overhead
Instruction Fetch Overhead
central von Neumann bottleneck
extremely power hungry and area inefficient
reconfigurable?
the wrong machine paradigm
-
Matter vs. Antimatter: CPU vs. DPU
[Figure: CPU (+): Data Path with instruction sequencer; DPU (-): Data Path Unit, fed by data streams]
-
+ CPU
Data Path
instruction sequencer
+ simple machine paradigm + scalability
+ relocatability + compatibility
= secret of success of software industry
RAM
RAM-based CPU:
-
Success Factors
property | instruction stream based | data stream based: reconfigurable, fine grain (FPGA) | data stream based: reconfigurable, coarse grain | data stream based: hardwired
RAM-based | yes | yes | yes | (hardwired)
machine paradigm | yes | no (available**) | available | available
compatibility | yes | limited (feasible**) | feasible | feasible
scalability | yes | no (good**) | good* | (hardwired)
code relocatability | yes | no (good**) | good* | (hardwired)
*) if KressArray used
**) mapping coarse grain onto FPGA
(instruction stream based: success of software industry)
• for configware industry is missing:
– FPGA compatibility,
– fully scalable FPGA,
– relocatable configuration code
• rDPUs and rDPAs do much better than FPGAs
-
>>> Problems with Concurrency
• The Computer Architecture Crisis
• The Impact of Reconfigurable Platforms
• The Dichotomy of Models
• Parallelism
• Conclusions http://www.uni-kl.de
-
Parallelism by Concurrency
[Figure: several CPUs, each a Data Path with its own instruction sequencer, connected by bus(es) or a switch box; independent instruction streams]
difficult coordination
massive run time overhead
-
>> The Dominance of Embedded Systems
• The Computer Architecture Crisis
• The Impact of Reconfigurable Platforms
• The Dichotomy of Models
• Parallelism
• Conclusions http://www.uni-kl.de
-
Summary of the Anti Machine Paradigm
• anti language primitives are almost the same (slightly extended)
• anti machine execution potential is dramatically more powerful
• provides drastically more flexibility
• not always replacing von Neumann
-
>> Address Generators for Data Streams
• Introduction
• Smart Address Generators
• Address Generators for Data Streams
• Customized Memory Organization
• Conclusions http://www.uni-kl.de
(data streams introduced earlier in this session)
-
2-D Generic Data Sequence Examples
a) b)
c)
d) e) f) g)
-
Generic GAU (generic address unit) Scheme
GAG = Generic Address Generator
[Figure: GAU composed of Base Slider (B0), Limit Slider (L0) and Address Stepper (DA); address A runs between base B and limit L]
all 3 are copies of the same BSU stepper circuit
-
GAG Slider Model
GAG = Generic Address Generator
[Figure: two GAG instances, each with LimitStepper, BaseStepper and AddressStepper (signals B0, DA, L0, A); the sliders move base B and limit L between floor and ceiling]
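One way to picture the slider model is as a window [base, limit] that moves between a floor and a ceiling, with the address stepper sweeping the window once per position. The Python sketch below illustrates this reading for one dimension. The function name `slider_sweeps`, the parameter names, and the clipping of the window at floor and ceiling are assumptions of this example and are not taken from the GAG design itself.

```python
def slider_sweeps(floor, ceiling, base0, limit0, d_base, d_limit, d_addr, sweeps):
    """Return one address list per sweep while base and limit slide along.

    base0/limit0: initial window, d_base/d_limit: slider steps per sweep,
    d_addr: address step inside the window (must be positive here).
    """
    if d_addr <= 0:
        raise ValueError("d_addr must be positive in this sketch")
    base, limit = base0, limit0
    result = []
    for _ in range(sweeps):
        low = max(base, floor)               # window never goes below the floor
        high = min(limit, ceiling)           # window never goes above the ceiling
        result.append(list(range(low, high + 1, d_addr)))
        base += d_base                       # base slider moves
        limit += d_limit                     # limit slider moves
    return result
```

With `slider_sweeps(0, 9, 0, 3, 2, 2, 1, 3)` a window of four addresses slides by two positions per sweep; a window that has moved past the ceiling yields an empty sweep.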
-
GAG Complex Sequencer Implementation
GAG = Generic Address Generator
[Figure: GAG composed of several GAUs, each with Limit Slider, Base Slider and Address Stepper (signals B0, DA, L0, A); further labels: SDS, VLIW stack]
all of this was published in 1990
-
GAG Slider Operation Demo Example
[Figure: slider operation in the x-y address space between floor (F) and ceiling (C); labels: address, B0, L0, DA, DB, DL]
-
The microelectronics spare part problem
•Original fab line no longer exists
•ICs do not survive storage time
•Demand: several decades of availability
2 1 0.5 0.25 0.13 0.1 0.07 µ feature size
[Hartenstein 2002]
• e. g. car price: ~25% electronics
-
The microelectronics spare part problem
2 1 0.5 0.25 0.13 0.1 0.07 µ feature size
[Hartenstein 2002]
key problem in many application areas: medical, aerospace, automotive, other transportation, military, industrial equipment controllers, et al.
-
Dead Supercomputer Society
•ACRI •Alliant •American Supercomputer •Ametek •Applied Dynamics •Astronautics •BBN •CDC •Convex •Cray Computer •Cray Research •Culler-Harris •Culler Scientific •Cydrome •Dana/Ardent/Stellar/Stardent
•DAPP •Denelcor •Elexsi •ETA Systems •Evans and Sutherland Computer •Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace MPP •Gould NPL •Guiltech •ICL •Intel Scientific Computers •International Parallel Machines •Kendall Square Research •Key Computer Laboratories
•MasPar •Meiko •Multiflow •Myrias •Numerix •Prisma •Tera •Thinking Machines •Saxpy •Scientific Computer Systems (SCS) •Soviet Supercomputers •Supertek •Supercomputer Systems •Suprenum •Vitesse Electronics
[Gordon Bell, keynote at ISCA 2000].
-
CS: young ? dynamic?
.. but the von Neumann Paradigm is still the dominant doctrine ...
Microelectronics is ignored (except falling cost of computational effort)
... still pushing the basic models from the times of mainframe dinosaurs
after >10 technology generations ...
• 1st 4004 • 2nd 8008 • 3rd 8086 • 4th 80286 • 5th 80386 • 6th 80486 • 7th P5 (Pentium) • 8th P6 (Pentium Pro / Pentium II) • 9th Pentium III • 10th .... • 11th • .......
... the vN Microprocessor is a Methuselah, the steam engine of the silicon age.
-
better to go for reconfigurable platforms
• [Dataquest] PLD market > $7 billion by 2003.
• fastest growing segment of semiconductor market
• IP reuse and silicon reuse
• FPGAs are going into every type of application
-
Throughput vs. Flexibility
[Figure: throughput (MOPS / mW, log scale from 0.001 to 1000) vs. µ feature size (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07), contrasting throughput with flexibility; curves: hardwired, anti machine*, FPGAs, von Neumann. Source: T. Claasen et al.: ISSCC 1999]
the anti machine goes far beyond bridging the gap
*) R. Hartenstein: ISIS 1997
-
Why coarse grain ?
-
Terminology
DPU: data path unit
rDPU: reconfigurable DPU
DPA: data path array (DPU array)
rDPA: reconfigurable DPA
RA: reconfigurable array
ISP: instruction set processor
AM: anti machine
AMP: data stream processor*
rAMP: reconfigurable AMP
FPGA: field-programmable gate array
FPL: field-programmable logic
PLD: programmable logic device
CPLD: complex PLD
*) no “dataflow machine”
digital system platforms:
platform category | programming source | machine paradigm
hardware | (not programmable) | none
ISP | software | von Neumann
morphware | configware | FPGA: none
data stream processor (AMP) | streamware | anti machine
reconfigurable AMP (rAMP) | streamware & configware | anti machine
categories of morphware:
morphware use | granularity (path width) | (re)configurable blocks
reconfigurable logic | fine grain (~1 bit) | CLBs
reconfigurable computing | coarse grain (e.g. 32 bits) | rDPUs (e.g. ALU-like)
 | multi granular: by slice bundling | rDPU slices (e.g. 4 bits)
consensus is near
-
>> Problems to be solved
• Configware Market
• FPGA Market
• Embedded Systems (Co-Design)
• Hardwired IP Cores on Board
• Run-Time Reconfiguration (RTR)
• Rapid Prototyping & ASIC Emulation
• Evolvable Hardware (EH)
• Academic Expertise
• ASICs dead
• Soft CPU
• HLLs
• Problems to be solved
-
EDA industry shift into CS mentality [Wojciech Maly]
• patches instead of engineering • innovation stalled many years ago • 85% users hate their tools • netlist-based: do not care about efficiency, ... • ... do not care about transistor density
-
[Jonathan Rose] FPGAs Give You
• Instant Fabrication – Get to Market Fast – Fix ‘em quick
• Zero NRE Charges – Low Risk – Low Cost at good volume
-
The Crisis of Computing Sciences
• Computing Sciences are in a severe crisis
• Computing curricula are obsolete because of strictly enforced „procedural-only“ blinders
• Computer Architecture and related areas have lost leadership in digital system implementation
• CS ignores > 90% µprocessors in embedded systems: 10 times more programmers will write embedded applications than computer software by 2010
• A disruptive promising therapy introduced by new approaches coming with Reconfigurable Computing
-
Ubiquitous embedded systems
20 billion µprocessors (2001)
> 90% in embedded systems
10 times more programmers will write embedded applications than computer software by 2010
That’s where our graduates will go
Embedded systems means:
• hardware / software co-design
• configware / software co-design
• hardware / configware / software co-design
-
The Situation in Computing Sciences
• Computing Sciences are in a severe crisis
• New fundamentals and R&D directions are inevitable
• my mission: getting you involved
• All knowledge needed is readily available ...
• ... even from Computing Sciences
• Silicon application and EDA provide useful concepts
• Reconfigurable Computing has the remedy
-
the edu gap has dramatic consequences
•Key R&D scenes are drying out or dying
•because of a lack of qualified researchers
•the embedded system design crisis gets worse
•because of a lack of qualified designers
•many innovative products cannot be sold
•because of a lack of qualified customers
•the edu gap is widening dramatically
•because of a lack of qualified educators
-
Super Pipe Networks
systolic array: array applications: regular data dependencies only; pipeline shape: linear only; pipeline resources: uniform only; mapping and scheduling (data stream formation): linear projection or algebraic synthesis
super-systolic DPA*: array applications, pipeline shape and resources: no restrictions; mapping: simulated annealing or P&R algorithm; scheduling (data stream formation): (e.g. force-directed) scheduling algorithm
*) KressArray [ASP-DAC-1995]
-
.... it‘s an alternative culture ....
• now the area is going mainstream: a rapidly widening audience of non-specialists gets interested ...
• severe communication gaps due to educational deficits
• not only to users: still many hardware and EDA experts ask: isn’t it just logic design on a strange platform ?
• it is time to clarify and popularize fundamental aspects and to explain, that it is a fundamentally different culture
-
Computer: the wrong Machine Paradigm (“von Neumann”)
[Figure labels: Compiler, RAM (instructions), Sequencer (program counter: state register), hardwired Datapath]
tightly coupled by compact instruction code
“von Neumann” does not support soft data paths
Xputer: The Soft Machine Paradigm (anti machine)
[Figure labels: Compiler, Scheduler, RAM (“instructions”), (multiple) sequencers (data counters), reconfigurable Datapath Array]
loosely coupled by decision data bits only
also for hardwired
-
Semiconductor Revolutions
“Mainstream Silicon Application is switching every 10 Years”
TTL µproc., memory
custom
standard
1957
1967
1977
1987
1997
2007
ASICs, accel’s
LSI, MSI
“The Programmable System-on-a-Chip is the next wave“
Tredennick’s Paradigm Shifts
hardwired: algorithm: fixed, resources: fixed
procedural programming (vN machine paradigm): algorithm: variable, resources: fixed
structural programming (anti machine paradigm): algorithm: variable, resources: variable
-
Impact of Data-stream-based ...
TTL µproc., memory
custom
standard
ASICs, accel’s LSI, MSI
1957
1967
1977
1987
1997
2007
structural personalization:
hardwired before fabrication
Repeat Success Story by new Machine Paradigm !
Embedded Hardware/ Configware Industry
-
Rapidly growing CS education gap
•Our computing curricula are obsolete
•introduction is strictly „procedural-only“
•vN-only use of terms like „computer organisation“, „computer structures“, „computer architecture“
•graduates are not prepared for the real world: most applications for embedded systems (>90% by 2010)
•our graduates are unable to compete with EE graduates
•only a few % of the curricula need to be changed
•my mission: getting you involved
-
Binding Time vs. Computing Domain
time domain (procedural)
Binding time: (Set-up of Communication Channels)
at run time microprocessor parallel computer
time & space (hybrid)
later fabrication step ASICs
space domain (structural)
before fabrication full custom ICs
at loading time
at compile time
Reconfigurable Computing
array processor
programming domain:
supersystolic arrays systolic
arrays
-
Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld
Why Coarse Grain instead of FPGA ?
[Figure: transistors / chip (log scale from 1000 to 100 000 000 000) vs. year (1980 to 2010); curves: physical, logical, FPGA physical, FPGA logical, FPGA routed; gaps marked ~ 10 and ~ 10 000]
reduced reconfigurability overhead by up to ~ 1000
drastically smaller configuration memory
much faster loading
many more benefits
-
What are the differences ?
vN* computing:
• computing in time
• instruction fetch at run time
• procedural programming
• instruction scheduling
Reconfigurable Computing:
• computing in space and time
• “instruction” fetch at compile time
• structural programming
• data scheduling
• i. e. Data-stream-based
• also hardwired implementations**
• “instruction” fetch before fabrication
**) e.g. Bee project, Prof. Brodersen
*) vN stands for “von Neumann”
-
Basics of Binding Time
run time
loading time
compile time
time of “Instruction Fetch”
microprocessor parallel computer
Reconfigurable Computing
“Instruction” generalized: including complex expressions and other datapaths
strong impact on the machine paradigm !
-
Data-stream-based Parallelism
See my other talk
ICECS 2002 IEEE 9th International Conference
on Electronics, Circuits and Systems
Memory Organisation for Datastream-based Reconfigurable Computing
(invited paper)
Michael Herz, Agilent Technologies
Reiner Hartenstein, University of Kaiserslautern Miguel Miranda, Erik Brockmeyer, Francky Catthoor, IMEC, Leuven
Dubrovnik, Croatia September 15-18, 2002
-
Machine paradigms
instruction stream machine (von Neumann): [Figure labels: memory M, I/O, CPU with instruction sequencer and datapath (ALU), instruction stream, Software]
data-stream machine: [Figure labels: memory M, asM*, data address generator (data sequencer), data stream, I/O, datapath unit (DPU or rDPU), (r)DPA and (r)DPU with several memories M, embedded memory architecture*, Configware, Flowware]
-
© 2003, [email protected] http://hartenstein.de
Kaiserslautern University of Technology
171
Synthesizable Memory Communication
http://kressarray.de
Efficient memory communication should be directly supported by the mapper tools.
An example from Nageldinger’s KressArray Xplorer:
[Figure: optimized parallel memory controller. Legend: sequencers, memory ports, application, not used]
-
172
Terminology has been highly confusing
[Figure: growth factor (1, 2) over months (0 to 48); curve labels: 4y, 10y, 30y; battery capacity (1.03/year)]
*) Department of Trade and Industry, London
-
173
Semiconductor Revolutions
“Mainstream Silicon Application is switching every 10 Years”
[Figure: timeline 1957, 1967, 1977, 1987, 1997, 2007, alternating between “standard” and “custom”; labels: TTL; µproc., memory; ASICs, accel’s; LSI, MSI]
“The Programmable System-on-a-Chip is the next wave”
Tredennick’s Paradigm Shifts
• hardwired: algorithm fixed, resources fixed
• procedural programming (vN machine paradigm): algorithm variable, resources fixed
• structural programming (anti machine paradigm): algorithm variable, resources variable
-
174
No vN bottleneck
The anti machine has no von Neumann bottleneck.
-
175
3 different mind sets
[Figure: timeline 1957, 1967, 1977, 1987, 1997, 2007; labels: TTL; µproc., memory; ASICs, accel’s; LSI, MSI; FPGAs; coarse grain; soft CPUs]
• hardware people
• CS people
• new breed needed
Common terminology needed
-
176
Throughput vs. Flexibility
[Figure: MOPS / mW (log scale, 0.001 to 1000) versus feature size in µm (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07); source: T. Claasen et al.: ISSCC 1999; entries: hardwired, von Neumann, FPGAs, anti machine; axis annotations: flexibility, throughput]
The anti machine goes far beyond bridging the gap.
*) R. Hartenstein: ISIS 1997
-
177
Programming sources
Anti machine (data stream machine), reconfigurable or hardwired:
• resources variable: configware (for morphware)
• algorithms variable: flowware (streamware)
von Neumann (instruction stream machine), hardwired only:
• resources fixed: hardware
• algorithms variable: software
-
178
Some soft CPU core examples
core architecture platform
MicroBlaze 125 MHz 70 D-MIPS
32 bit standard RISC
32 reg. by 32 LUT RAM-based reg.
Xilinx up to 100 on one FPGA
Nios 16-bit instr. set
Altera
Mercury
Nios 50 MHz
32-bit instr. set
Altera
2