DASS 2003 and SDA 2003: Data-Stream-based Reconfigurable Computing. Reiner Hartenstein, Kaiserslautern University of Technology. Dresden, May 8-9, 2003

  • DASS 2003 and SDA 2003

    Data-Stream-based Reconfigurable Computing

    Reiner Hartenstein

    Kaiserslautern University of Technology

    Dresden, May 8-9, 2003

  • © 2003, [email protected] http://hartenstein.de

    Kaiserslautern University of Technology

    2

    “New” terms

    Flowware*: similar to software, but based on data counter manipulation: data streams instead of instruction streams

    Configware: sources for programming morphware

    Software: you all know it
    Hardware: you all know it
    Morphware: structurally programmable “hardware”

    (only the terms are “new”, not their subject)

    A clean terminology and taxonomy is needed for comprehensibility.

    *) no relation to the “dataflow machine” (a dead area)

  • Slide 3

    flowware defines ... which data item at which time at which port

    [Figure: input data streams enter the DPA at its ports and output data streams leave it; each stream is drawn over the axes time and port #]

    flowware history:

    1980: data streams (Kung, Leiserson)
    1995: super systolic rDPA (Kress)
    1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...

    (tutorials and courses available on all this)
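    The flowware idea above ("which data item at which time at which port") can be illustrated with a small Python sketch. It assumes a flowware schedule can be modeled as a table that maps (time, port) pairs to data items. The names `schedule_streams` and `items_at` and the per-port start offsets are hypothetical and chosen only for illustration. They do not correspond to any real flowware language.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical model: a flowware schedule is a table (time, port) -> data item.
Schedule = Dict[Tuple[int, int], Any]


def schedule_streams(streams: Dict[int, List[Any]], offsets: Dict[int, int]) -> Schedule:
    """Place each port's data stream on the time axis, starting at that port's offset."""
    schedule: Schedule = {}
    for port, items in streams.items():
        start = offsets.get(port, 0)
        for i, item in enumerate(items):
            schedule[(start + i, port)] = item
    return schedule


def items_at(schedule: Schedule, t: int) -> Dict[int, Any]:
    """Which data item is present at which port at time t (idle ports are omitted)."""
    return {port: item for (time, port), item in schedule.items() if time == t}


if __name__ == "__main__":
    # Two input ports; the stream at port 1 is skewed by one time step,
    # as is typical for systolic input streams.
    s = schedule_streams({0: ["a0", "a1", "a2"], 1: ["b0", "b1"]}, {0: 0, 1: 1})
    for t in range(4):
        print(t, items_at(s, t))
```

    For t = 0..3 this prints {0: 'a0'}, {0: 'a1', 1: 'b0'}, {0: 'a2', 1: 'b1'} and {}. The per-port offsets encode the "at which time" part of the schedule.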

  • Slide 4

    programming: procedural vs. structural

    domain: procedural (computing in time only*) vs. structural (computing in space and time)

    CPU: algorithms variable, resources fixed; program source: software*; “instruction” fetch at run time

    fully hardwired: algorithms fixed, resources fixed; not programmable

    structural, hardwired: program source (hardware +) flowware; structure set before fabrication; data “fetch” at run time

    structural, reconfigurable (currently emerging): algorithms variable, resources variable; program source: configware + flowware; structure set at loading time; data “fetch” at run time

    embedded systems: data-stream-based, (hardware +) flowware or (hardware +) software**

    *) only one source needed
    **) software “simulates” flowware

    http://hartenstein.de

  • Slide 5

    Digital System Platforms clearly distinguished (1)

    platform | program source running on it | machine paradigm
    hardware (not programmable) | none | -
    morphware, fine grain: rGA (FPGA) | configware | -
    morphware, coarse grain: rDPU, rDPA (reconfigurable data stream processor) | flowware & configware | anti machine
    data stream processor (hardwired) | flowware | anti machine
    instruction stream processor | software | von Neumann machine

  • Slide 6

    Crusty Computing Sciences

    [David Padua, John Hennessy]

    shrinking supercomputing conferences

    more and more efforts yield only marginal improvements

    dataflow machines are dead

    98.5% vN-only

    this monopoly is dangerous

    areas fade away

  • Slide 7

    Dead Supercomputer Society

    •ACRI •Alliant •American Supercomputer •Ametek •Applied Dynamics •Astronautics •BBN •CDC •Convex •Cray Computer •Cray Research •Culler-Harris •Culler Scientific •Cydrome •Dana/Ardent/Stellar/Stardent •DAPP •Denelcor •Elexsi •ETA Systems •Evans and Sutherland Computer •Floating Point Systems •Galaxy YH-1 •Goodyear Aerospace MPP •Gould NPL •Guiltech •ICL •Intel Scientific Computers •International Parallel Machines •Kendall Square Research •Key Computer Laboratories •MasPar •Meiko •Multiflow •Myrias •Numerix •Prisma •Tera •Thinking Machines •Saxpy •Scientific Computer Systems (SCS) •Soviet Supercomputers •Supertek •Supercomputer Systems •Suprenum •Vitesse Electronics

    [Gordon Bell, keynote at ISCA 2000]

  • Slide 8

    Stealthy CS Crisis

    progress in CS stalled by qualification problems in industry and academia

    communication barriers between disciplines

    exploding design cost and implementation cost

    not only in embedded systems: comprehensibility barrier between procedural and structural mind set

    severe software quality problems

    hardware people are often needed to solve CS problems

    80% of designers hate their tools ... and the tools are unusable for SW people

  • Slide 9

    What are the Challenges? (1) [ST microelectronics, MorphICs, Dataquest, eASIC]

    [Figure: chart over 0, 10, 12, 18 months with a factor axis (1, 2); annotations: 10y, 4y, 90% by 2010]

    *) Department of Trade and Industry, London

  • Slide 10

    McKinsey Curve: dynamics of R&D disciplines

    [Figure: maturity of a discipline vs. year. Labels along the curve: fundamental issues; challenges and motivation; evangelists create awareness; innovation; consolidation; saturation: limitations met; the CS discipline gets crusted. A new discipline starts on top of it: new CS by innovation (evangelists ..., challenges ...)]

  • Slide 11

    History of Computing

    [Figure: timeline 1957, 1967, 1977, 1987, 1997, 2007 with the eras mainframes, PC and ?, and the maturity curves of classical CS and new CS; labels: data streams ..., technology issue and business model, free rider, morphware, here?]

    morphware: it already exists ... but awareness is still missing ... and it is still ignored by most CS curricula

  • Slide 12

    Semiconductor Revolutions

    “Mainstream Silicon Application is switching every 10 Years”

    [Figure: Makimoto’s wave 1957–2007, alternating between standard and custom: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; morphware (here?). Overlaid labels: data streams ..., mainframes, PC, ?, technology issue and business model, free rider]

  • Slide 13

    The next EDA Industry Revolution

    EDA industry paradigm switching every 7 years:

    1978: Transistor entry: Applicon, Calma, CV ...
    1985: Schematics entry: Daisy, Mentor, Valid ...
    1992: Synthesis: Cadence, Synopsys ...   courtesy [Keutzer / Newton]
    1999: (Co-)Compilation
    2006: Data-Stream-based DPU arrays   [Hartenstein]

    time of Makimoto’s 3rd wave

  • Slide 14

    it’s time for a new CS ...

    urging us:
    CS crisis: qualification problems
    embedded systems: hw/cw/sw co-design
    next EDA wave: high level languages

    opportunities:
    configware, flowware
    ... a dichotomy of 2 machine paradigms

  • Slide 15

    Matter & Antimatter

    The World of Matter machine paradigm: the Atom

    + + -

    The World of Anti Matter machine paradigm: Anti Atom

    - - +

  • Slide 16

    Matter & Antimatter of Informatics:

    machine paradigm: CPU

    Anti Machine paradigm: DPU, nothing central!

  • Slide 17

    Drafting a Road Map

    The talk gives a draft of a road map toward a symbiosis of basic computing paradigms

    What delays the breakthrough of Reconfigurable Computing?

  • Slide 18

    Machine paradigms

    [Figure: the von Neumann machine (instruction stream machine): a CPU (instruction sequencer + DPU) with memory M and I/O, driven by an instruction stream and programmed by Software. The data-stream machine: a DPU or rDPU fed via data stream I/O by a data address generator (data sequencer) and memory (asM), programmed by Configware and Flowware. Legend: download (reconf.)]
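    To contrast the two machine paradigms above, here is a toy Python sketch that computes the same dot product twice. The first version runs on an instruction-stream machine whose program counter selects the next instruction at run time. The second runs on a data-stream machine whose operator is fixed in advance while data counters generate the address streams. The tiny instruction set, the function names and the fetch count are hypothetical illustrations. They do not model any real CPU or rDPU.

```python
from typing import Iterator, List, Tuple


def von_neumann_dot(mem: List[int], a: int, b: int, n: int) -> Tuple[int, int]:
    """Instruction-stream machine: a program counter selects every instruction at run time.
    Returns (dot product, number of instruction fetches)."""
    # Hypothetical 4-instruction loop: exit test, multiply-accumulate, increment, jump back.
    program = [("exit_if_done", 4), ("mac",), ("inc",), ("jmp", 0)]
    pc, i, acc, fetches = 0, 0, 0, 0
    while pc < len(program):
        op = program[pc]            # instruction fetch at run time
        fetches += 1
        pc += 1
        if op[0] == "exit_if_done":
            if i >= n:
                pc = op[1]          # leave the loop
        elif op[0] == "mac":
            acc += mem[a + i] * mem[b + i]
        elif op[0] == "inc":
            i += 1
        elif op[0] == "jmp":
            pc = op[1]
    return acc, fetches


def mac(acc: int, x: int, y: int) -> int:
    """The DPU operator: fixed before run time (the configware part)."""
    return acc + x * y


def data_counter(base: int, count: int) -> Iterator[int]:
    """A data counter stepping through consecutive addresses (the flowware part)."""
    for k in range(count):
        yield base + k


def data_stream_dot(mem: List[int], a: int, b: int, n: int) -> int:
    """Data-stream machine: two address streams drive the fixed operator; no instruction fetch."""
    acc = 0
    for ia, ib in zip(data_counter(a, n), data_counter(b, n)):
        acc = mac(acc, mem[ia], mem[ib])
    return acc


if __name__ == "__main__":
    mem = [1, 2, 3, 4, 5, 6]               # vector a at addresses 0..2, vector b at 3..5
    print(von_neumann_dot(mem, 0, 3, 3))   # (32, 13)
    print(data_stream_dot(mem, 0, 3, 3))   # 32
```

    Both versions return 32. In this toy model the instruction-stream version needs 4n + 1 instruction fetches for n elements, and the data-stream version needs none.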

  • Slide 19

    heavy anti atoms: DPA = DPU array

    [Figure: a DPA built from many DPUs]

    flowware: data streams spinning around
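    A DPA processing data streams follows the tradition of Kung and Leiserson's systolic arrays (1980 in the flowware history). The sketch below is a cycle-by-cycle Python simulation of one classic linear systolic FIR scheme. Weight w[j] stays in cell j, the input stream moves through two registers per cell, and partial sums move through one. The function name and register layout are illustrative assumptions, not a description of any specific rDPA. The design point is that every cell talks only to its neighbour and no instruction is fetched. The data schedule alone produces the result.

```python
from typing import List


def systolic_fir(x: List[float], w: List[float]) -> List[float]:
    """Simulate a linear systolic FIR array clock cycle by clock cycle.
    Returns the full convolution y[m] = sum_j w[j] * x[m - j]."""
    if not w:
        raise ValueError("need at least one weight (one cell)")
    k = len(w)
    x1 = [0.0] * k   # first x register of each cell
    x2 = [0.0] * k   # second x register of each cell (feeds the next cell)
    yr = [0.0] * k   # partial-sum register of each cell
    out: List[float] = []
    for t in range(len(x) + 2 * (k - 1)):
        x_in0 = x[t] if t < len(x) else 0.0   # input data stream, then zeros
        # cell inputs in this cycle, read from the registers of the previous cycle
        xs = [x_in0] + x2[:-1]
        ys = [0.0] + yr[:-1]
        # all cells update their registers simultaneously
        new_yr = [ys[j] + w[j] * xs[j] for j in range(k)]
        x2 = x1[:]   # second register takes the old first register
        x1 = xs      # first register latches the cell input
        yr = new_yr
        if t >= k - 1:   # y[m] leaves the last cell in cycle m + k - 1
            out.append(yr[-1])
    return out


if __name__ == "__main__":
    print(systolic_fir([1, 1], [1, 2]))   # [1.0, 3.0, 2.0]
```

    Cell j sees input x[i] in cycle i + 2j and partial sum y[m] in cycle m + j, so y[m] meets exactly x[m - j] in cell j.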

  • Slide 20

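    Machine paradigms

    [Figure: the von Neumann machine (instruction stream machine): a CPU (instruction sequencer + DPU) with memory M and I/O, driven by an instruction stream and programmed by Software. The data-stream machine: a DPU or rDPU fed via data stream I/O by a data address generator (data sequencer) and memory (asM), programmed by Configware and Flowware; shown both as an (r)DPU and as an (r)DPA, each surrounded by memory banks M M M M M and I/O. Legend: download (reconf.)]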

  • Slide 21

    rDPA example: SNN filter KressArray Mapping Example

    array size: 10 x 16 = 160 rDPUs

    Legend: rDPU not used; rDPU used for routing only (route-through only); operator and routing; port location marker; backbus connect

  • Slide 22

    PACT XPP: Reference Module: XPU128 Co-Processor

    [Figure: XPP128 ALU-Array built from ALU-PAEs; each PAE core contains an ALU and a CtrlALU, with blocks labeled CF, GC and FG]

    • 2 X PACs (Cluster) • 128 X ALU-PAEs • 32 X 1Kbyte RAM-PAEs • 8X I/O Elements

    • Full 32 or 24 Bit Design • 2 Configuration Hierarchies • Evaluation Board (2001) • XDS Development Tool with Simulator

    • PAE Core is 32- or 24-Bit ALU with DSP-Instruction Set and Controller

    • Connections: Inputs + Outputs (Channels) + Events

    [Jürgen Becker, Univ. Karlsruhe]

  • Slide 23

    Throughput vs. Efficiency

    [Figure: MOPS / mW (log scale, 0.001 to 1000) vs. feature size in µm (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07) [T. Claasen et al.: ISSCC 1999]; inset comparing the area used by the application with the resources needed for reconfigurability: 1 Bit CLB vs. wiring by abutment, 32 Bit example*]

    *) R. Hartenstein: ISIS 1997

  • Slide 24

    Throughput vs. Flexibility

    [Figure: MOPS / mW (log scale, 0.001 to 1000) vs. feature size in µm (2 down to 0.07) [T. Claasen et al.: ISSCC 1999], with flexibility and throughput axes placing hardwired, coarse grain, FPGAs and von Neumann; inset: wiring by abutment, 32 Bit example*]

    coarse grain goes far beyond bridging the gap

    *) R. Hartenstein: ISIS 1997

  • Slide 25

    Machine paradigms

    [Figure: the von Neumann machine (instruction stream machine): a CPU (instruction sequencer + DPU) with memory M and I/O, driven by an instruction stream and programmed by Software. The data-stream machine: a DPU or rDPU fed via data stream I/O by a data address generator (data sequencer) and memory (asM), programmed by Configware and Flowware; shown both as an (r)DPU and as an (r)DPA, each surrounded by memory banks M M M M M and I/O; the memory is marked as embedded memory architecture*. Legend: download (reconf.)]

    *) new discipline: came just in time: Herz et al.: Proc IEEE ICECS 2002

  • Slide 26

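    Configware / Flowware Compilation

    [Figure: a high level source program passes through a wrapper into an rDPA intermediate form; the mapper generates the configware for the r. Data Path Array (rDPA), the scheduler generates the flowware for the data sequencer (address generator); memory banks M around the rDPA deliver the data streams]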
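    The data sequencer (address generator) named above turns a few loop parameters into an address stream. A minimal Python sketch follows, assuming a generic generator made of nested data loops, each with an iteration count and an address stride. The function name `address_stream` and this parameterization are hypothetical. Real address generators (for example in Xputer-style machines) may be organized differently. The point is that changing the scan pattern means changing flowware parameters, not the datapath.

```python
from typing import Iterator, Sequence


def address_stream(base: int, counts: Sequence[int], strides: Sequence[int]) -> Iterator[int]:
    """Hypothetical generic address generator (data sequencer): nested data loops,
    outermost loop first, each with an iteration count and an address stride."""
    if len(counts) != len(strides):
        raise ValueError("counts and strides must have the same length")

    def loop(level: int, addr: int) -> Iterator[int]:
        if level == len(counts):        # innermost level reached: emit one address
            yield addr
            return
        for i in range(counts[level]):  # one data counter per loop level
            yield from loop(level + 1, addr + i * strides[level])

    yield from loop(0, base)


if __name__ == "__main__":
    bank = list(range(1000))            # stand-in for one memory bank: bank[a] == a
    # 2 x 3 window of a matrix with row length 8 stored at address 100, scanned row by row ...
    print([bank[a] for a in address_stream(100, (2, 3), (8, 1))])
    # ... and the same window scanned column by column: only the parameters change.
    print([bank[a] for a in address_stream(100, (3, 2), (1, 8))])
```

    The two calls print [100, 101, 102, 108, 109, 110] and [100, 108, 101, 109, 102, 110]. These are the same six data items, delivered in a different order.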

  • Slide 27

    Synthesizable Memory Communication

    Efficient Memory Communication should be directly supported by the Mapper Tools.

    [Figure: an example by Nageldinger’s KressArray Xplorer: Optimized Parallel Memory Controller. Legend: sequencers, memory ports, application, not used]

    http://kressarray.de

  • Slide 28

    Data-Stream-based Soft Machine

    [Figure components: Scheduler; Memory (data memory) made of several memory banks ...; Sequencers (data stream generator); “instructions”; rDPA Compiler]

  • Slide 29

    The Disk Farm? or a System On a Card?

    The 500GB disc card (14"): LOTS of bandwidth; a few disks replaced by >10s Gbytes RAM and a processor

    MicroDrive: 2006: 9 GB, 50 MB/s ? (1.6X/yr capacity, 1.4X/yr BW)

    Integrated IRAM processor, connected via crossbar switch, growing like Moore’s law: 16 Mbytes; 1.6 Gflops; 6.4 Gops; 10,000+ nodes in one rack! 100/board = 1 TB; 0.16 Tflops

    [Gordon Bell, Jim Gray, ISCA 2000]

  • Slide 30

    computing paradigms and methodologies

    1946: machine paradigm (von Neumann)
    1980: data streams (Kung, Leiserson)
    1989: anti machine paradigm introduced
    1990: anti machine implementation methodology
    1990: rDPU (Rabaey)
    1994: anti machine high level programming language
    1995: super systolic rDPA (Kress)
    1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...
    1997: configware / software partitioning compiler (Becker)
    2000: generator for rDPA with high memory bandwidth

    (tutorials and courses available on all this)

  • Slide 31

    Digital System Platforms clearly distinguished (2)

    platform | program source running on it | machine paradigm
    hardware (not programmable) | none | -
    morphware, fine grain: rGA (FPGA) | configware | -
    morphware, coarse grain: rDPU, rDPA (reconfigurable data stream processor) | flowware & configware | anti machine
    data stream processor (hardwired) | flowware | anti machine
    instruction stream processor | software | von Neumann machine

  • Slide 32

    Software Industry

    [Figure: Makimoto’s wave 1957–2007, alternating between standard and custom: TTL; LSI, MSI; µproc., memory; ASICs, accel’s]

    Procedural personalization via RAM-based Machine Paradigm

    Software Industry’s Secret of Success

  • Slide 33

    Configware Industry?

    [Figure: Makimoto’s wave 1957–2007, alternating between standard and custom: TTL; LSI, MSI; µproc., memory; ASICs, accel’s]

    structural personalization: RAM-based before run time

    Repeat Success Story by new Machine Paradigm!

    Configware Industry

  • Slide 34

    The Secret of Success: Co-Compilation

    [Figure: a High level PL source is analyzed by an Analyzer / Profiler and split by a Partitioner (using Resource Parameters) into SW code, compiled by a SW compiler for the “vN” machine paradigm, and CW Code, compiled by a CW compiler for the anti machine paradigm]

    supporting different platforms; supporting platform-based design

    ... could provide the platforms: not a niche market
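    The partitioner in the co-compilation flow above decides which program parts become CW code and which stay SW code, guided by profiling data and resource parameters. The Python sketch below is one hypothetical greedy policy and should not be read as the algorithm of any actual co-compiler. It ranks profiled kernels by run-time share per required rDPU and moves them to configware while a cell budget allows. It then estimates the overall gain with Amdahl's law. The `Kernel` fields, the cell counts and the acceleration factor are all assumed inputs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Kernel:
    """A profiled program part (hypothetical fields, for illustration only)."""
    name: str
    time_share: float      # fraction of total run time on the vN host, from the profiler
    cells: int             # rDPUs the mapper would need for it (assumed estimate)
    mappable: bool = True  # False e.g. for irregular, control-dominated code


def partition(kernels: List[Kernel], cell_budget: int) -> Tuple[List[str], List[str]]:
    """Greedy sketch: kernels with the highest run-time share per rDPU go to configware
    as long as the resource parameter `cell_budget` allows; the rest stays software."""
    sw: List[str] = []
    cw: List[str] = []
    used = 0
    ranked = sorted(kernels, key=lambda k: k.time_share / max(k.cells, 1), reverse=True)
    for kern in ranked:
        if kern.mappable and used + kern.cells <= cell_budget:
            cw.append(kern.name)
            used += kern.cells
        else:
            sw.append(kern.name)
    return sw, cw


def amdahl_speedup(accelerated_share: float, factor: float) -> float:
    """Overall speedup when `accelerated_share` of the run time is sped up by `factor`."""
    return 1.0 / ((1.0 - accelerated_share) + accelerated_share / factor)


if __name__ == "__main__":
    profile = [Kernel("fir", 0.5, 40), Kernel("fft", 0.3, 100),
               Kernel("ctrl", 0.1, 10, mappable=False), Kernel("io", 0.1, 200)]
    sw, cw = partition(profile, cell_budget=160)
    print("software:", sw, "configware:", cw)   # software: ['ctrl', 'io'] configware: ['fir', 'fft']
    print(round(amdahl_speedup(0.8, 10.0), 2))   # 3.57
```

    With these made-up numbers, 80% of the run time moves to configware. Even a tenfold acceleration of that part then yields only about 3.6x overall, which is why the profiler's run-time shares matter as much as the rDPU budget.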

  • Slide 35

    thank you

    thank you for your patience

  • Slide 36

    >>> END

  • Slide 37

    © 2001, [email protected] http://KressArray.de

    University of Kaiserslautern, Xputer Lab

    >>> Appendix

    Appendix for discussion

  • Slide 38

    The Secret of Success: Co-Compilation

    [Figure: a High level PL source is analyzed by an Analyzer / Profiler and split by a Partitioner (using Resource Parameters) into SW code, compiled by a SW compiler for the “vN” machine paradigm, and CW Code, compiled by a CW compiler for the anti machine paradigm]

    supporting different platforms; supporting platform-based design

    ... should provide the platforms: not a niche market

  • Slide 39

    Machine Paradigms

    machine category | Computer (the Machine: “v. Neumann”) | The Anti Machine
    driven by | instruction streams | data streams (no “dataflow”)
    engine principles | instruction sequencing | sequencing data streams
    state register | single program counter | (multiple) data counter(s)
    communication path set-up | at run time (“instruction fetch”) | at load time
    data path resource | DPU (e.g. single ALU) | DPU or DPA (DPU array) etc.
    operation | sequential | parallel pipe network etc.

    The anti machine also has hardwired implementations*.

    *) e.g. Bee project, Prof. Brodersen

  • Slide 40

    Programming Language Paradigms

    language category | Computer Languages | Languages for the Anti Machine
    both | deterministic procedural sequencing: traceable, checkpointable
    operation sequence driven by | read next instruction, goto (instr. addr.), jump (to instr. addr.), instr. loop, loop nesting, no parallel loops, escapes, instruction stream branching | read next data item, goto (data addr.), jump (to data addr.), data loop, loop nesting, parallel loops, escapes, data stream branching
    state register | program counter | data counter(s)
    address computation | massive memory cycle overhead | overhead avoided
    instruction fetch | memory cycle overhead | overhead avoided
    parallel memory bank access | interleaving only | no restrictions
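    The last three rows of the table can be made concrete with a deliberately simple memory-cycle count. The Python sketch assumes a single memory port for the von Neumann case, where every instruction fetch and every data access costs one cycle. It also assumes 6 instructions per iteration for a vector add c[i] = a[i] + b[i] (load, load, add, store, increment, branch). In the data-stream case it assumes that address generators remove all instruction fetches and that independent memory banks can be accessed in the same cycle. All of these figures are hypothetical modeling choices, not measurements.

```python
import math


def vn_memory_cycles(n: int, instr_per_iter: int = 6, data_per_iter: int = 3) -> int:
    """Single-port von Neumann model: every instruction fetch and every data access costs
    one memory cycle; address arithmetic is done by (fetched) instructions."""
    return n * (instr_per_iter + data_per_iter)


def stream_memory_cycles(n: int, data_per_iter: int = 3, banks: int = 1) -> int:
    """Data-stream model: addresses come from address generators, so no instruction fetch;
    with `banks` independent memory banks, up to `banks` data accesses share one cycle."""
    return n * math.ceil(data_per_iter / banks)


if __name__ == "__main__":
    n = 1000  # c[i] = a[i] + b[i]: two loads and one store per iteration
    print(vn_memory_cycles(n))                # 9000
    print(stream_memory_cycles(n))            # 3000
    print(stream_memory_cycles(n, banks=3))   # 1000: a, b and c in three separate banks
```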

  • Slide 41

    © 2001, [email protected] http://www.fpl.uni-kl.de

    Jürgen Becker’s Co-DE-X Co-Compiler

    [Figure: an X-C source is analyzed by an Analyzer / Profiler and split by a Partitioner (using Resource Parameters) into Host Software, compiled by the GNU C compiler for the Computer machine paradigm, and DPSS KressArray Configware, generated by the X-C compiler for the Xputer machine paradigm]

    X-C is the C language extended by MoPL

    supporting different platforms; supporting platform-based design

  • Slide 42

    KressArray Family generic Fabrics: a few examples

    Examples of 2nd Level Interconnect: laid out over the rDPU cell, no separate routing areas!

    Select Function Repertory: route-through only; route-through and function; more NN ports: rich routing resources

    Select Nearest Neighbour (NN) Interconnect, an example: select mode, number, width of NN ports

    [Figure: rDPU examples labeled 2, 4, 8, 16, 24, 32]

    http://kressarray.de

  • Slide 43

    Impact of Makimoto’s wave

    [Figure: Makimoto’s wave 1957–2007, alternating between standard and custom: TTL; LSI, MSI; µproc., memory; ASICs, accel’s]

    Personalization (CAD) before fabrication

    Procedural personalization via RAM-based Machine Paradigm: Software Industry’s Secret of Success

    structural personalization: RAM-based before run time: Repeat Success Story by new Machine Paradigm! Configware Industry

  • Slide 44

    The Dominance of the Submarine Model ...

    Hardware

    ... indicates that our CS education system produces zillions of mentally disabled persons: (procedural) structurally disabled ... completely unable to cope with solutions other than software only

    It’s time to attack the software faculty dictatorship. Get involved!

  • Slide 45

    However, current CS Education ...

    ... is based on the Submarine Model:

    Algorithm
    procedural high level Programming Language
    Assembly Language
    Hardware (invisible: under the surface)

    Brain usage: procedural-only

    This model disables ...

  • Slide 46

    Hardware and Software as Alternatives

    [Figure: an Algorithm is partitioned into Software only, Software & Hardw/Configw, or Hardw/Configw only; Software is procedural, Hardware and Configware are structural]

    Brain Usage: both Hemispheres

  • Slide 47

    Why Coarse Grain instead of FPGA?

    [Figure: transistors / chip (log scale, 1000 to 100 000 000 000) vs. year (1980 to 2010), with the curves FPGA physical, FPGA logical and FPGA routed; annotations ~ 10 and ~ 10 000. Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld]

    reduced reconfigurability overhead by up to ~ 1000

    drastically smaller configuration memory

    much faster loading

    many more benefits

  • Slide 48

    Second Blossom of CS

    progress in CS stalled by qualification problems in industry and academia

    Communication barriers between disciplines

    Exploding design and implementation cost

    Not only in embedded systems: comprehensibility barrier between procedural and structural mind set

    Severe software quality problems

    Bad hardware / configware design tools: more than 80% of designers hate their tools

  • Slide 49

    Procedural vs. structural

    progress in CS stalled by qualification problems in industry and academia

    Like microprocessors, morphware is RAM-based: the secret of success of the software industry

    Could the configware industry repeat this success story?

    Configware will remain a niche market unless it comes along with hardware / configware / software co-design

  • Slide 50

    Algorithms and Data Structures People

    ... have to go beyond pointers, queues, and stacks


  • Slide 51

    roadmap

    old CS lab course philosophy: given an application, implement it by a program -/-

    new CS freshman lab course environment: given an application,

    a) implement it by writing a program
    b) implement it as a morphware prototype
    c) partition it into P and Q:
    c.1) implement P by software
    c.2) implement Q by morphware
    c.3) implement the P / Q communication interface

  • Slide 52

    Algorithms and Data Structures

    ... have to go beyond pointers, queues, and stacks.

    Extend by including:

    algorithmic issues in software / morphware / hardware migration

    additional levels of parallelism: chaining, pipelining, systolic, super-systolic, wavefront arrays

    additional data structures and storage organization: the new distributed memory discipline
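    For the pipelining level of parallelism listed above, the textbook cycle count of an ideal pipeline shows where the gain comes from. It assumes s stages of equal length, one new data item per cycle, n items and no stalls:

```latex
T_{\text{seq}} = n\,s, \qquad
T_{\text{pipe}} = s + n - 1, \qquad
S = \frac{T_{\text{seq}}}{T_{\text{pipe}}} = \frac{n\,s}{s + n - 1}
\;\xrightarrow{\;n \to \infty\;}\; s
```

    For example, with s = 4 stages and n = 100 items the speedup is 400/103 ≈ 3.9, close to the limit of 4. Long data streams are what make pipelined and systolic arrays pay off.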

  • Slide 53

    Computer Organization / Architecture

    ... have to go beyond von Neumann.

    Extend by including: nested machines, address generators, the anti machine paradigm

    Extended taxonomy of platforms: procedural, structural, hardwired, reconfigurable, hybrid systems

  • Slide 54

    Languages and Compilers

    ... have to go beyond von Neumann.

    Extend by including: configware / flowware compilers, procedural / structural co-compilers, (data-procedural) flowware languages

  • Slide 55

    Semiconductor Revolutions

    “Mainstream Silicon Application is switching every 10 Years”

    [Figure: Makimoto’s wave 1957–2007, alternating between standard and custom: TTL; LSI, MSI; µproc., memory; ASICs, accel’s. Marked: 1st design crisis: hardware people, new breed (M&C); 2nd design crisis: software people, new breed needed]

  • Slide 56

    EDA the main bottleneck

  • Slide 57

    Biggest Mistake of EDA: guess it!

  • Slide 58

    Innovation Stalled ? [Richard Newton]

    What is next after VHDL ?

  • Slide 59

    Flowware and Software

    Software: instruction-stream-based, i.e. based on program counter manipulation

    Flowware: data-stream-based, i.e. based on data counter manipulation

    Software and flowware: introduce them like fraternal twins

  • Slide 60

    Models (1)

    1. There is a very wide variety of architectures

    2. Most papers are badly organized: to show the authors’ creativity, less relevant details are often stressed in a confusing mix of abstraction levels

    3. Architectures are not described in terms of a common model

    4. A common model exists, but it is usually ignored

    5. We need a comprehensible taxonomy of architectures

  • Slide 61

    Models (2)

    1. Reconfigurable instruction set extension

    2. Reconfigurable co-processor: 2a. FPGA, 2b. Coarse grain

    I omit 3: hardwired accelerators. I do not talk about reconfigurable instruction set processors.

    M&C structured VLSI design: max. no. of transistors within regular structures (Craig Mudge: regularity factor)

    - structured Configware Design

  • Slide 62

    >> history & terminology

    • history & terminology • skyrocketing requirements • destructive von Neumann monopoly • high mask cost • low battery capacity • new compilation model • conclusions http://www.uni-kl.de

  • Slide 63

    Semiconductor Revolutions

    “Mainstream Silicon Application is switching every 10 Years”

    [Figure: Makimoto’s wave 1957–2007, alternating between standard and custom: TTL; LSI, MSI; µproc., memory; ASICs, accel’s. Marked: 1st design crisis: hardware people, new breed (M&C); 2nd design crisis: software people, new breed needed]

  • Slide 64

    Terminology: DPU versus CPU ...

    • DPU: data path unit • DPA: DPU array • GA: gate array • rDPU: reconfigurable DPU • rDPA: reconfigurable DPA • rGA: reconfigurable GA

    • A DPU is not a CPU: there is nothing central, as in a DPA

    [Figure: CPU = instruction sequencer + DPU, versus (r)DPU and an (r)DPA built from DPUs]

  • Slide 65

    flowware defines ... which data item at which time at which port

    [Figure: input data streams enter the DPA at its ports and output data streams leave it; each stream is drawn over the axes time and port #]

    flowware manipulates the data counter(s) ...

    ... software manipulates the program counter

  • Slide 66

    History of data-streams

    1980: data streams (Kung, Leiserson)
    1995: super systolic rDPA (Kress)
    1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...

    (tutorials and courses available on all this)

  • Slide 67

    >> skyrocketing requirements

    • history & terminology • skyrocketing requirements • destructive von Neumann monopoly • high mask cost • low battery capacity • new compilation model • conclusions http://www.uni-kl.de

  • Slide 68

    What are the Challenges? (1) [ST microelectronics, MorphICs, Dataquest, eASIC]

    [Figure: chart over 0, 10, 12, 18 months with a factor axis (1, 2); annotation: 4y]

  • Slide 69

    Changing Models of Computing

    [Figure: “von Neumann” software design: (procedural) Software is downloaded into RAM for a machine with data path, instruction sequencer and I/O. Hardware/software co-design: software is downloaded into the RAM of a host, and hardwired accelerator(s) are designed by CAD from a spec; hardware people needed]

    the problem with typical CS people:
    - the dominance of von Neumann
    - they cannot partition
    - they cannot migrate

  • >> destructive von Neumann monopoly

  • Which machine paradigm?

    von Neumann does not support morphware

  • What about CS people?

    [Figure: timeline 1957 to 2007 with TTL, LSI/MSI, µprocessors and memory, ASICs and accelerators, FPGAs, coarse grain and soft CPUs, contrasted with the focus of CS people: procedural programming, languages, compilers, computer architecture]

  • Flagship example: annual IEEE ISCA conference series

    Resignation?

    taken over by the opposition:

    Interconnect Fabrics:

    vN Parallelism:

    the Dataflow Machine is dead

    Statistics [David Padua, John Hennessy, et al.]

    Reconfigurable Computing

  • There are more Levels of Parallelism

    Loop Level (data-stream-based, pipe nets, etc.)

    Instruction Level (VLIW etc.)

    Logic Level (FPGAs)

    RT Level (special architectures etc.)

    Process level

  • What are the Challenges? (2) [STMicroelectronics, MorphICs, Dataquest, eASIC]

    [Chart: growth factor vs. time; annotations: 10, 12 and 18 months; 10y; 4y; 90% by 2010]

    *) Department of Trade and Industry, London

  • Changing Models of Computing

    [Figure: three models side by side. "von Neumann": (procedural) software downloaded into RAM and executed by data path and instruction sequencer with I/O: software design. Host with hardwired accelerator(s) via CAD: hardware/software co-design. Host with reconfigurable accelerator(s): software and (structural) configware downloaded into RAM, the configware programming the morphware: configware/software co-design, and hardware/configware/software co-design overall]

  • no von Neumann bottleneck?

    typical CS people:

    • how to provide more performance to these people?

    • think in terms of machine models: sequencing instruction by instruction

    • cannot be turned into hardware people

    • new machine paradigm needed which does not have a von Neumann bottleneck

    • the anti machine has no von Neumann bottleneck

    • data streams instead of an instruction stream

    • flowware instead of software

  • Just in time

    The new distributed memory discipline comes just in time to implement the anti machine.

    [3] M. Herz et al. (invited): Memory Organization for Data-Stream-based Reconfigurable Computing; Proc. ICECS 2002

  • >> high mask cost

  • What are the Challenges? (3) [STMicroelectronics, MorphICs, Dataquest, eASIC]

    [Chart: growth factor vs. time; annotations: 10, 12 and 18 months; 30y, 10y, 4y, 3y]

    *) Department of Trade and Industry, London

    avoid application-specific silicon!

  • Coarse grain vs. Fine grain

    Reconfigurability:

    coarse grain (PACT AG, Munich)

    multi grain (e.g. by slice bundling)

    fine grain (FPGAs, rGAs)

  • >> low battery capacity

  • What are the Challenges? (4) [STMicroelectronics, MorphICs, Dataquest, eASIC]

    [Chart: growth factor vs. time; annotations: 10, 12 and 18 months; 30y, 10y, 4y, 3y; Battery capacity (1.03/year)]

    *) Department of Trade and Industry, London

  • Algorithmic cleverness

    Very high throughput on low-power, slow FPGAs can be obtained only by algorithmic cleverness, which is not yet taught by CS & CSE departments at universities: an urgent educational problem.

  • >> new compilation model

  • What are the Challenges? (5) [STMicroelectronics, MorphICs, Dataquest, eASIC]

    [Chart: growth factor vs. time; annotations: 10, 12 and 18 months; 30y, 10y, 5y, 4y, 3y, 2y; Battery capacity (1.03/year)]

    *) Department of Trade and Industry, London

    New compilation techniques needed! Supported by a new machine paradigm.

  • >> conclusions

  • Conclusion

    No, we are not ready for the breakthrough: because of the von Neumann monopoly, our computing education is obsolete.

    But all the ingredients are available to jazz up our CS & CSE curricula.

  • >>> thank you

    thank you for your patience

  • scalability

    The Scalability Problem

    The routing congestion problem grows with the size of the FPGA.

  • SNN filter KressArray Mapping Example

    array size: 10 x 16 = 160 rDPUs

    http://kressarray.de

    Legend: rDPU not used; used for routing only; operator and routing; port location marker; backbus connect

  • Xplorer Plot: SNN Filter Example

    http://kressarray.de

    [Figure: rDPUs with 3 vertical and 2 horizontal nearest-neighbour ports (NNports, 32 bit each); labels: + [13], operator, operand, operand, result, route thru, route-thru-only rDPU, backbus connect]

  • Conclusion: all knowledge needed is available

    • languages

    • machine paradigm

    • compilation techniques

    • anti architectural resources

    • sequencing methodology: hw & sw

    • hw / sw partitioning methodology

    • parallel memory IP core and module generator vendors

    courses / embedded tutorials:

    full day courses:

    • anything else needed

  • ... has a chance

    Configware Industry has a Chance

  • Conclusions

    • the anti machine is the way to go for massive parallelism, also for data-intensive applications

    • the reconfigurable anti machine for high performance under short product life cycles and unstable standards

    • reconfigurable platforms for low-cost, low-volume production

    • Giga FPGAs are highly promising, but only with a new design flow: configware could repeat the success of the software industry

    • spare part problem: needs new infrastructures

  • Paradigm Shifts: Nick Tredennick’s view

    instruction-stream-based computing: algorithms variable, resources fixed

    reconfigurable computing: algorithms variable, resources variable (programmable)

    why 2 program sources?

  • Compilation for (r)DPA of anti machine

    [Figure: compilation flow from a high level source program (software notation) via an expression tree to a mapper and a scheduler; further blocks: DPU library, wrapper parameters, code generators; outputs: configware (for the morphware) and flowware (streamware)]

  • Misleading predictors

    Moore's Law is becoming a misleading predictor of future developments.

  • High mask cost

    High mask cost may be avoided completely by using morphware, or partly by using gate arrays (GAs), i.e. mask-programmed ASICs.

  • Fault tolerance

    Morphware is the only way to obtain fault-tolerant ICs.

  • World-wide services

    FPGAs may provide an important benefit for world-wide services and all other after-sales consequences.

  • „Re-configurable Hardware“ ??

    this „Hardware“ is not hard!

    We need a concise terminology: a consensus is on the way.

    it’s Morphware

    Terminology has been highly confusing.

  • Super Pipe Networks

    systolic array: pipeline shape: linear only; pipeline resources: uniform only; array applications: regular data dependencies only; mapping and scheduling: linear projection or algebraic synthesis

    super-systolic rDPA*: shape, resources and applications: no restrictions; mapping: simulated annealing or P&R algorithm; scheduling (data stream formation): (e.g. force-directed) scheduling algorithm

    *) KressArray [1995]
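The comparison names linear projection as the classic mapping and scheduling method for systolic arrays. The following minimal Python sketch illustrates that textbook space-time mapping on an assumed example: a 2-D loop nest such as a matrix-vector product y_i = Σ_j a_ij x_j, with dependence vectors (0, 1) and (1, 0). The allocation and schedule vectors are hand-picked assumptions. This is not the KressArray mapper, which uses simulated annealing or P&R instead.

```python
from itertools import product
from typing import Dict, List, Tuple

Vec = Tuple[int, int]


def dot(a: Vec, b: Vec) -> int:
    return a[0] * b[0] + a[1] * b[1]


def linear_projection(n: int, schedule: Vec, alloc: Vec,
                      deps: List[Vec]) -> Dict[Vec, Vec]:
    """Map each index point (i, j) of an n x n loop nest to (processor, time).

    processor = alloc . point     (projection onto a linear array)
    time      = schedule . point  (linear schedule)
    """
    # Every dependence must point strictly forward in time.
    for d in deps:
        if dot(schedule, d) <= 0:
            raise ValueError(f"schedule {schedule} violates dependence {d}")
    placement: Dict[Vec, Vec] = {}
    for point in product(range(n), repeat=2):
        slot = (dot(alloc, point), dot(schedule, point))
        # Two index points on the same processor at the same time = conflict.
        if slot in placement:
            raise ValueError(f"{point} and {placement[slot]} collide at {slot}")
        placement[slot] = point
    return placement


if __name__ == "__main__":
    # y_i += a_ij * x_j: y accumulates along j, x_j is reused along i.
    deps = [(0, 1), (1, 0)]
    mapping = linear_projection(4, schedule=(1, 1), alloc=(0, 1), deps=deps)
    for (pe, t), (i, j) in sorted(mapping.items()):
        print(f"PE {pe}, t={t}: i={i}, j={j}")
```

A schedule such as (1, -1) is rejected because it would run the accumulation of y backwards in time. Allocation (1, 1) with schedule (1, 1) is rejected because points on the same anti-diagonal would collide on one PE.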

  • Synthesizable Memory Communication

    Efficient memory communication should be directly supported by the mapper tools.

    An example by Nageldinger’s KressArray Xplorer (http://kressarray.de):

    [Figure: optimized parallel memory controller; legend: sequencers, memory ports, application, not used]

  • Stream-based Soft Machine

    [Figure: components: compiler; scheduler; "instructions"; data memory made of several memory banks; sequencers (data stream generators); rDPA]

  • JPEG zigzag scan pattern

    *> Declarations

    EastScan is step by [1,0] end EastScan;

    SouthScan is step by [0,1] end SouthScan;

    NorthEastScan is loop 8 times until [*,1] step by [1,-1] endloop end NorthEastScan;

    SouthWestScan is loop 8 times until [1,*] step by [-1,1] endloop end SouthWestScan;

    HalfZigZag is EastScan loop 3 times SouthWestScan SouthScan NorthEastScan EastScan endloop end HalfZigZag;

    goto PixMap[1,1]

    HalfZigZag; SouthWestScan uturn (HalfZigZag)

    [Figure: the zigzag path over the 8 x 8 PixMap (axes x, y), drawn as two HalfZigZag halves; data counter positions numbered 1 to 4]
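To trace the scan, the following Python sketch re-expresses the declarations above with a data counter moved by step vectors. It makes several assumptions that are not the original language semantics. It uses 0-based coordinates instead of PixMap[1,1]. It treats the "until" clause as an escape test before each step. It reads "uturn (HalfZigZag)" as replaying the part above the anti-diagonal backwards, point-reflected through the block centre. The function names mirror the declarations but are otherwise hypothetical.

```python
from typing import Callable, List, Optional, Tuple

N = 8                  # 8 x 8 pixel block
Pos = Tuple[int, int]  # (x, y), 0-based; x grows East, y grows South


def scan(path: List[Pos], step: Pos, max_steps: int = 1,
         until: Optional[Callable[[Pos], bool]] = None) -> None:
    """Move the data counter from path[-1] by `step` up to `max_steps` times,
    escaping as soon as `until` holds for the current position."""
    for _ in range(max_steps):
        x, y = path[-1]
        if until is not None and until((x, y)):
            break
        path.append((x + step[0], y + step[1]))


def east_scan(p: List[Pos]) -> None:
    scan(p, (1, 0))


def south_scan(p: List[Pos]) -> None:
    scan(p, (0, 1))


def north_east_scan(p: List[Pos]) -> None:
    scan(p, (1, -1), N, until=lambda q: q[1] == 0)   # until [*,1]


def south_west_scan(p: List[Pos]) -> None:
    scan(p, (-1, 1), N, until=lambda q: q[0] == 0)   # until [1,*]


def half_zigzag(p: List[Pos]) -> None:
    east_scan(p)
    for _ in range(3):
        south_west_scan(p)
        south_scan(p)
        north_east_scan(p)
        east_scan(p)


def zigzag() -> List[Pos]:
    path: List[Pos] = [(0, 0)]   # goto PixMap[1,1]
    half_zigzag(path)            # upper-left triangle, ends at (7, 0)
    south_west_scan(path)        # main anti-diagonal, ends at (0, 7)
    # Assumed meaning of 'uturn (HalfZigZag)': the part above the anti-diagonal,
    # traversed backwards and point-reflected through the block centre.
    upper = [q for q in path if q[0] + q[1] < N - 1]
    path += [(N - 1 - x, N - 1 - y) for (x, y) in reversed(upper)]
    return path


if __name__ == "__main__":
    # Row-major pixel indices in scan order.
    print([y * N + x for x, y in zigzag()])
```

Under these assumptions, the resulting order starts 0, 1, 8, 16, 9, 2, 3, 10, 17, 24, which is the start of the standard JPEG zigzag sequence.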

  • Similar Programming Language Paradigms

    Both language categories (computer languages and Xputer languages) use deterministic procedural sequencing: traceable, checkpointable.

    sequencing driven by (computer languages vs. Xputer languages):

    • read next instruction vs. read next data object

    • goto (instruction addr.) vs. goto (data addr.)

    • jump (to instruction addr.) vs. jump (to data addr.)

    • instruction loop vs. data loop

    • instruction loop nesting vs. data loop nesting

    • no parallel loops vs. parallel data loops

    • instruction loop escapes vs. data loop escapes

    • instruction stream branching vs. data stream branching

  • GAG = Generic Address Generator

    Generic GAG Scheme

    [Figure: Base Stepper (start value B0), Limit Stepper (start value L0) and Address Stepper (step DA, output address A); the address range [B, L] is marked as the limit]

  • GAG: Address Stepper

    GAG = Generic Address Generator

    [Figure: the address stepper: a +/– unit steps the address A by DA from the base B towards the limit L; a step counter with zero detect, loaded with maxStepCount, and an end detect at the limit feed the escape clause, which signals endExec; further inputs: init, tag, Base, Limit, stepVector]
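The stepper and slider idea can be sketched as a small behavioural model in Python, under simplifying assumptions. The address stepper emits addresses from the base B in steps of DA until the next address would pass the limit L (end detect) or an optional maximum step count is reached (step counter). After each sweep, base and limit sliders shift the window by DB and DL. The names `address_stepper` and `slider_gag` are hypothetical, and the tag and escape-clause handling of the real circuit is not modelled.

```python
from typing import Iterator, List, Optional


def address_stepper(base: int, limit: int, step: int,
                    max_steps: Optional[int] = None) -> Iterator[int]:
    """Yield base, base + step, ... until the address would pass `limit`
    (end detect) or `max_steps` addresses have been emitted (step counter)."""
    if step == 0:
        raise ValueError("step must be non-zero")
    addr, count = base, 0
    while (addr <= limit) if step > 0 else (addr >= limit):
        if max_steps is not None and count >= max_steps:
            return  # escape: step counter exhausted
        yield addr
        addr += step
        count += 1


def slider_gag(b0: int, l0: int, da: int, db: int, dl: int,
               sweeps: int) -> List[int]:
    """Sweep the address stepper over [B, L], then move the base and
    limit sliders by DB and DL; repeat `sweeps` times."""
    out: List[int] = []
    base, limit = b0, l0
    for _ in range(sweeps):
        out.extend(address_stepper(base, limit, da))
        base += db   # base slider (floor)
        limit += dl  # limit slider (ceiling)
    return out


if __name__ == "__main__":
    # Row-by-row scan of a 4 x 4 array stored row-major.
    print(slider_gag(b0=0, l0=3, da=1, db=4, dl=4, sweeps=4))
    # Column-by-column scan of the same array.
    print(slider_gag(b0=0, l0=12, da=4, db=1, dl=1, sweeps=4))
    # Overlapping sliding window of width 3.
    print(slider_gag(b0=0, l0=2, da=1, db=1, dl=1, sweeps=3))
```

Only the five parameters change between a row scan, a column scan and a sliding window. The stepping hardware stays the same, which matches the idea that all three steppers are copies of one circuit.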

  • Generic Sequence Examples

    [Figure: examples a) to g) of generic address sequences, next to the GAG scheme (Limit Slider, Base Slider, Address Stepper; B0, DA, L0, output A)]

  • Slider Operation Demo Example

    [Figure: in an x/y data map, the address moves in steps of DA between the floor F (starting at B0) and the ceiling C (starting at L0), while floor and ceiling slide by DB and DL]

  • What are the Challenges? [STMicroelectronics, MorphICs, Dataquest, eASIC]

    [Chart: growth factor vs. time; annotations: 10, 12 and 18 months; 30y, 10y, 4y, 3y; Battery capacity (1.03/year)]

    *) Department of Trade and Industry, London

  • What are the Challenges? [STMicroelectronics, MorphICs, Dataquest, eASIC]

    [Chart: growth factor vs. time; annotations: 10, 12 and 18 months; 30y, 10y, 4y, 3y; Battery capacity (1.03/year)]

    *) Department of Trade and Industry, London

    design complexity: +40%/year (doubling every 2y)

    design productivity: +15%/year (doubling every 5y)

    [SIA roadmap]
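The doubling times quoted here follow from the annual growth rates via t = ln 2 / ln(1 + r). The short Python check below uses only the rates given on these slides: +40%/year, +15%/year, and the factor 1.03/year for battery capacity. It gives about 2.1, 5.0 and 23.4 years. The first two match the rounded 2y and 5y.

```python
import math


def doubling_time(annual_growth: float) -> float:
    """Years needed to double at a constant annual growth rate (0.40 = +40%/year)."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    rates = {
        "design complexity": 0.40,
        "design productivity": 0.15,
        "battery capacity": 0.03,
    }
    for label, rate in rates.items():
        print(f"{label}: doubles every {doubling_time(rate):.1f} years")
    # How far complexity outgrows productivity within ten years:
    print(f"gap after 10 years: {(1.40 / 1.15) ** 10:.1f}x")
```

The last line shows why the gap matters. At these rates, design complexity outgrows design productivity by roughly a factor of 7 within a decade.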

  • >> Outline

    • Morphware

    • Changing Models by SoC Development

    • New Machine Paradigm needed

    • The Dichotomy of Paradigms

    • Outlook http://www.uni-kl.de

  • The Morphware Market

    Top 4 PLD Manufacturers 2000 (total: $3.7 billion):

    Xilinx 42%

    Altera 37%

    Lattice 15%

    Actel 6%

    • [Dataquest] > $7 billion by 2003.

    • PLD vendors and their alliances provide libraries of “soft IPs”

    Configware Market

    • fastest growing semiconductor market segment

    coarse-grained: rDPUs: configurable functional blocks (e.g. PACT AG, Munich, Germany, http://pactcorp.com)

    fine-grained: CLBs, rLBs: configurable logic blocks

  • Morphware only: some soft CPU core examples

    Spartan-II 16 bit DSP DSPuva16

    FLEX10K30 or EPF6016

    i8080A My80

    32-bit gr1050

    16-bit gr1040

    Altera – Mercury

    8 bit Nios

    Altera

    22 D-MIPS

    32-bit instr. set

    Nios 50 MHz

    Altera

    Mercury

    16-bit instr. set

    Nios

    Xilinx up to 100 on one FPGA

    32 bit standard RISC

    32 reg. by 32 LUT RAM-based reg.

    MicroBlaze 125 MHz 70 D-MIPS

    platform architecture core

    SpartanXL RISC integer C xr16

    old Xilinx FPGA Board

    16-bit RISC, 2 opd. Instr.

    YARD-1A

    1 Flex 10K20 Acorn-1

    Altera, Lattice, Xilinx

    8 bit CISC 1Popcorn-1

    Lattice 4 isp30256, 4 isp1016

    12 bit DSP Reliance-1

    2 XILINX 3020 LCA

    8 bits Instr. + ext. ROM

    REGIS

    200 XC4000E CLBs

    CISC, 32 reg. uP1232 8-bit

    ARM ARM7 clone

    SPARC Leon

    25 MHz

    platform architecture core

  • soft CPUs in academic teaching

    • UCSC: 1990!

    • Mälardalen University • Chalmers University • Cornell University • Gray Research • Georgia Tech • Hiroshima City Univ.

    • Michigan State • Univ. de Valladolid • Virginia Tech • Washington U. St. Louis • New Mexico Tech • UC Riverside • Tokai University

  • >> New Machine Paradigm needed

  • >> The Dichotomy of Paradigms

  • >> Outlook

  • Why fine grain?

    • no specific silicon: low production volume (aerospace, automotive, military, industrial controllers, et al.)

    • the spare part problem

    • design flow

    • coming Giga-FPGA

  • Configware Industry vs. Software Industry

    can the configware industry repeat the success story of the software industry?

    • RAM-based

    • Compatibility

    • Scalability

    • Education problems

  • Problems of Parallelism

    Software to rDPA migration:

    • the area of parallel algorithms needs a complete re-orientation of its scope ...

    • methodology only in special areas (DSP, wireless, ...)

    Software to FPGA migration:

    • enormous speed-ups: a factor of 3 to >10,000

    • algorithmic cleverness missing, no education, no methodology for interconnect estimation

    ... far beyond traditional platforms

  • Evolution of FPGA and its design flow

    [Figure (à la S. Guccione): software flow: user code -> compiler -> executable; FPGA flow: schematics/HDL -> netlister -> netlist -> place and route -> bitstream; later, HLL compilers targeting a soft CPU (CPU core, FPGA core, memory core, interfaces) and, as soon as Giga FPGAs are available, a soft rDPA (CPU core, FPGA core, memory core, rDPA core, interfaces); figure © 2002, [email protected], http://KressArray.de]

  • ASIC emulation

    • ASIC emulation / rapid prototyping: to replace simulation

    • Quickturn (Cadence), IKOS (Synopsys), Celaro (Mentor)

    • hours of compilation run: inefficient, since netlist-based ...

    • ... ASIC emulators will become obsolete soon

    • by RTR: in-circuit execution debugging instead of emulation

    • new business model: upgradable morphware is the product

    • emulation for solving the spare part problem in many areas

  • Nasty Matter

    [Figure: CPU = data path + instruction sequencer, connected to RAM; annotations: address computation overhead, instruction fetch overhead, central von Neumann bottleneck]

    extremely power hungry and area inefficient

    reconfigurable? the wrong machine paradigm

  • Matter vs. Antimatter: CPU vs. DPU

    [Figure: a CPU (data path driven by an instruction sequencer) next to a DPU (data path unit) without instruction sequencer, fed by data streams]

  • RAM-based CPU:

    [Figure: CPU (data path, instruction sequencer) with RAM]

    + simple machine paradigm + scalability

    + relocatability + compatibility

    = secret of success of software industry

  • Success Factors

    property instruction

    stream based

    data stream based

    reconfigurable hardwired fine grain

    (FPGA) coarse grain

    RAM-based yes yes yes (hardwired)

    machine paradigm yes no available available

    compatibility yes limited feasible feasible

    scalability yes no good* (hardwired)

    code relocatability yes no good* (hardwired)

    *) if KressArray used

    **) mapping coarse grain onto FPGA

    good**

    good**

    feasible**

    available**

    What made the software industry successful is still missing for a configware industry:

    – FPGA compatibility

    – fully scalable FPGAs

    – relocatable configuration code

    • rDPUs and rDPAs do much better than FPGAs

  • >>> Problems with Concurrency

    • The Computer Architecture Crisis

    • The Impact of Reconfigurable Platforms

    • The Dichotomy of Models

    • Parallelism

    • Conclusions http://www.uni-kl.de

  • Parallelism by Concurrency

    [Figure: several independent instruction streams, each a data path with its own instruction sequencer, coupled by bus(es) or a switch box]

    difficult coordination

    massive run time overhead

  • >> The Dominance of Embedded Systems

  • Summary of the Anti Machine Paradigm

    • anti language primitives are almost the same (slightly extended)

    • anti machine execution potential is dramatically more powerful

    • provides drastically more flexibility

    • not always replacing von Neumann

  • >> Address Generators for Data Streams

    • Introduction

    • Smart Address Generators

    • Address Generators for Data Streams

    • Customized Memory Organization

    • Conclusions http://www.uni-kl.de

    (data streams introduced earlier in this session)

  • 2-D Generic Data Sequence Examples

    [Figure: 2-D generic data sequence examples a) to g)]

  • GAG = Generic Address Generator

    Generic GAU (generic address unit) scheme

    [Figure: Base Slider (B0), Limit Slider (L0) and Address Stepper (step DA, output address A) within the limit [B, L]]

    all 3 are copies of the same BSU stepper circuit

  • GAG Slider Model

    [Figure: two copies of the GAG (Generic Address Generator), each built from a Limit Stepper, a Base Stepper and an Address Stepper (B0, DA, L0, output A); the sliders set the floor B and the ceiling L of the range [B, L] swept by the address A]

  • GAG Complex Sequencer Implementation

    [Figure: a GAG (Generic Address Generator) built from several GAUs, each with Limit Slider, Base Slider and Address Stepper (B0, DA, L0, output A); further blocks: SDS, VLIW stack]

    all of this was published in 1990

  • GAG Slider Operation Demo Example

    [Figure: in an x/y data map, the address moves in steps of DA between the floor F (starting at B0) and the ceiling C (starting at L0), while floor and ceiling slide by DB and DL]

  • The microelectronics spare part problem

    • The original fab line no longer exists

    • ICs do not survive the storage time

    • Demand: several decades of availability

    • e.g. car price: ~25% electronics

    [Figure: feature size shrinking from 2 µm down to 0.07 µm; Hartenstein 2002]

  • The microelectronics spare part problem

    [Figure: feature size shrinking from 2 µm down to 0.07 µm; Hartenstein 2002]

    key problem in many application areas: medical, aerospace, automotive, other transportation, military, industrial equipment controllers, et al.

  • Dead Supercomputer Society

    [Gordon Bell, keynote at ISCA 2000]:

    • ACRI • Alliant • American Supercomputer • Ametek • Applied Dynamics • Astronautics • BBN • CDC • Convex • Cray Computer • Cray Research • Culler-Harris • Culler Scientific • Cydrome • Dana/Ardent/Stellar/Stardent

    • DAPP • Denelcor • Elexsi • ETA Systems • Evans and Sutherland Computer • Floating Point Systems • Galaxy YH-1 • Goodyear Aerospace MPP • Gould NPL • Guiltech • ICL • Intel Scientific Computers • International Parallel Machines

    • Kendall Square Research • Key Computer Laboratories

    • MasPar • Meiko • Multiflow • Myrias • Numerix • Prisma • Tera • Thinking Machines • Saxpy • Scientific Computer Systems (SCS) • Soviet Supercomputers • Supertek • Supercomputer Systems • Suprenum • Vitesse Electronics

  • CS: young? dynamic?

    ... but the von Neumann paradigm is still the dominant doctrine ...

    Microelectronics is ignored (except for the falling cost of computational effort)

    ... still pushing the basic models from the times of mainframe dinosaurs

    after >10 technology generations ...

    • 1st 4004 • 2nd 8008 • 3rd 8086 • 4th 80286 • 5th 80386 • 6th 80486 • 7th P5 (Pentium) • 8th P6 (Pentium Pro / Pentium II) • 9th Pentium III • 10th .... • 11th • .......

    ... the vN microprocessor is a Methuselah, the steam engine of the silicon age.

  • better to go for reconfigurable platforms

    • [Dataquest] PLD market > $7 billion by 2003.

    • fastest growing segment of semiconductor market

    • IP reuse and silicon reuse

    • FPGAs are going into every type of application

  • Throughput vs. Flexibility

    [Chart: throughput in MOPS/mW (log scale, 0.001 to 1000) over feature size (2 µm down to 0.07 µm) and flexibility, for hardwired, von Neumann, FPGAs and the anti machine*; source: T. Claasen et al.: ISSCC 1999]

    the anti machine goes far beyond bridging the gap

    *) R. Hartenstein: ISIS 1997

  • Why coarse grain?

  • Terminology

    DPU: data path unit • rDPU: reconfigurable DPU • DPA: data path array (DPU array) • rDPA: reconfigurable DPA • RA: reconfigurable array • ISP: instruction set processor • AM: anti machine • AMP: data stream processor* • rAMP: reconfigurable AMP

    FPGA: field-programmable gate array • FPL: field-programmable logic • PLD: programmable logic device • CPLD: complex PLD

    *) not a “dataflow machine”

    digital system platforms (platform category: programming source; machine paradigm):

    • hardware (not programmable): none
    • ISP (instruction set processor): software; von Neumann
    • morphware, e.g. FPGA: configware; none
    • data stream processor (AMP): streamware; anti machine
    • reconfigurable AMP (rAMP): streamware & configware; anti machine

    categories of morphware (use: granularity (path width); (re)configurable blocks):

    • reconfigurable logic: fine grain (~1 bit); CLBs
    • reconfigurable computing: coarse grain (e.g. 32 bits); rDPUs (e.g. ALU-like)
    • multi-granular: by slice bundling; rDPU slices (e.g. 4 bits)

    consensus is near
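Slice bundling, listed above for multi-granular morphware, can be pictured as chaining the carries of several narrow rDPU slices into one wide datapath. The Python sketch below assumes 4-bit adder slices bundled into a 32-bit adder; the names `slice_add` and `bundled_add` are hypothetical, and a real multi-granular array bundles its slices in hardware rather than in a loop.

```python
SLICE_BITS = 4  # width of one rDPU slice (the "e.g. 4 bits" case)
SLICE_MASK = (1 << SLICE_BITS) - 1


def slice_add(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Hypothetical 4-bit adder slice: returns (sum bits, carry out)."""
    total = a + b + carry_in
    return total & SLICE_MASK, total >> SLICE_BITS


def bundled_add(x: int, y: int, width: int = 32) -> tuple[int, int]:
    """Bundle width // SLICE_BITS slices into one wide adder.

    Each slice handles SLICE_BITS bits; its carry out feeds the next
    slice's carry in, like a ripple chain across the bundled slices.
    Returns (sum truncated to width bits, final carry out).
    """
    if width % SLICE_BITS:
        raise ValueError("width must be a multiple of the slice width")
    result, carry = 0, 0
    for i in range(width // SLICE_BITS):
        shift = i * SLICE_BITS
        s, carry = slice_add((x >> shift) & SLICE_MASK, (y >> shift) & SLICE_MASK, carry)
        result |= s << shift
    return result, carry


# Eight 4-bit slices behave like one 32-bit adder:
print(bundled_add(0xFFFFFFFF, 1))  # (0, 1): the overflow appears as the final carry
```

The same slices could instead be bundled as 2 × 16-bit or 4 × 8-bit adders by cutting the carry chain at different points, which is what makes the path width configurable.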

  • Problems to be solved

    • Configware Market

    • FPGA Market

    • Embedded Systems (Co-Design)

    • Hardwired IP Cores on Board

    • Run-Time Reconfiguration (RTR)

    • Rapid Prototyping & ASIC Emulation

    • Evolvable Hardware (EH)

    • Academic Expertise

    • ASICs dead

    • Soft CPU

    • HLLs

    • Problems to be solved

  • EDA industry shift into CS mentality [Wojciech Maly]

    • patches instead of engineering
    • innovation stalled many years ago
    • 85% of users hate their tools
    • netlist-based: do not care about efficiency, ...
    • ... do not care about transistor density

  • [Jonathan Rose] FPGAs Give You

    • Instant Fabrication – Get to Market Fast – Fix ‘em quick

    • Zero NRE Charges – Low Risk – Low Cost at good volume

  • The Crisis of Computing Sciences

    • Computing Sciences are in a severe crisis

    • Computing curricula are obsolete because of strictly enforced „procedural-only“ blinders

    • Computer Architecture and related areas have lost leadership in digital system implementation

    • CS ignores the > 90% of µprocessors that sit in embedded systems: by 2010, 10 times more programmers will write embedded applications than computer software

    • A disruptive, promising therapy: the new approaches that come with Reconfigurable Computing

  • Ubiquitous embedded systems

    20 billion µprocessors (2001)

    > 90% in embedded systems

    10 times more programmers will write embedded applications than computer software by 2010

    That’s where our graduates will go

    Embedded systems means:

    • hardware / software co-design

    • configware / software co-design

    • hardware / configware / software co-design

  • The Situation in Computing Sciences

    • Computing Sciences are in a severe crisis

    • New fundamentals and R&D directions are inevitable

    • my mission: getting you involved

    • All knowledge needed is readily available ...

    • ... even from Computing Sciences

    • Silicon application and EDA provide useful concepts

    • Reconfigurable Computing has the remedy

  • the edu gap has dramatic consequences

    • Key R&D scenes are drying out or dying because of a lack of qualified researchers
    • the embedded system design crisis gets worse because of a lack of qualified designers
    • many innovative products cannot be sold because of a lack of qualified customers
    • the edu gap is widening dramatically because of a lack of qualified educators

  • Super Pipe Networks

    pipeline properties: array shape, array resources, applications; mapping; scheduling (data stream formation)

    • systolic array: shape linear only; resources uniform only; applications with regular data dependencies only; mapping and scheduling by linear projection or algebraic synthesis

    • super-systolic DPA*: no restrictions; mapping by simulated annealing or a P&R algorithm; scheduling by a scheduling algorithm (e.g. force-directed)

    *) KressArray [ASP-DAC-1995]
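To make the systolic row concrete: with uniform dependencies, linear projection maps each point of an algorithm's index space to a processor, and a linear schedule assigns it a time step. The Python sketch below assumes matrix multiplication C = A·B as the example, projection along k onto an n × n array, and the common textbook schedule t = i + j + k; these choices are illustrative assumptions, not taken from the slide. It checks the two conditions a valid space-time mapping needs and then evaluates the schedule step by step.

```python
from itertools import product

Point = tuple[int, int, int]


def systolic_map(n: int) -> dict[Point, tuple[tuple[int, int], int]]:
    """Space-time mapping of C = A @ B onto an n x n processor array.

    Projection along k: index point (i, j, k) is placed on PE (i, j).
    Linear schedule: it fires at time step t = i + j + k.
    """
    return {(i, j, k): ((i, j), i + j + k) for i, j, k in product(range(n), repeat=3)}


def is_valid(mapping) -> bool:
    """Check the two conditions a linear space-time mapping must satisfy."""
    # 1) no PE executes two index points in the same time step
    used = set()
    for slot in mapping.values():
        if slot in used:
            return False
        used.add(slot)
    # 2) each uniform dependency (A moves along j, B along i,
    #    C accumulates along k) must point strictly forward in time
    for (i, j, k), (_, t) in mapping.items():
        for di, dj, dk in ((0, 1, 0), (1, 0, 0), (0, 0, 1)):
            succ = (i + di, j + dj, k + dk)
            if succ in mapping and mapping[succ][1] <= t:
                return False
    return True


def run_systolic(A, B):
    """Evaluate C = A @ B in schedule order (inter-PE data movement not modelled)."""
    n = len(A)
    mapping = systolic_map(n)
    C = [[0] * n for _ in range(n)]
    for step in range(3 * n - 2):  # the last point (n-1, n-1, n-1) fires at 3n - 3
        for (i, j, k), (_, t) in mapping.items():
            if t == step:
                C[i][j] += A[i][k] * B[k][j]
    return C


print(is_valid(systolic_map(3)))                          # True
print(run_systolic([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]
```

A schedule such as t = i + j (ignoring k) fails the first check, because all k-points of one PE would collide in the same step. A super-systolic mapper drops the uniformity requirement, so no closed-form projection of this kind exists; placement is searched instead, e.g. by simulated annealing as the slide states.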

  • ... it’s an alternative culture ...

    • now the area is going mainstream: a rapidly widening audience of non-specialists gets interested ...

    • severe communication gaps due to educational deficits

    • not only among users: many hardware and EDA experts still ask: isn’t it just logic design on a strange platform?

    • it is time to clarify and popularize the fundamental aspects and to explain that it is a fundamentally different culture

  • [Figure: computer vs. Xputer block diagrams; Xputer Lab, University of Kaiserslautern, © 2001]

    Computer: the wrong machine paradigm (“von Neumann”)
    • compiler, RAM holding instructions, sequencer with program counter (state register), hardwired datapath
    • tightly coupled by compact instruction code
    • “von Neumann” does not support soft data paths

    Xputer: the soft machine paradigm (anti machine)
    • compiler and scheduler, RAM, (multiple) sequencers with data counters, reconfigurable datapath array (also for hardwired)
    • “instructions” loosely coupled by decision data bits only

  • Semiconductor Revolutions

    “Mainstream Silicon Application is switching every 10 Years”

    [Figure: mainstream silicon application alternating between standard and custom from 1957 to 2007 in 10-year steps; standard: TTL, then µproc. and memory; custom: LSI and MSI, then ASICs and accel’s]

    “The Programmable System-on-a-Chip is the next wave”

    Tredennick’s Paradigm Shifts:
    • hardwired: algorithm fixed, resources fixed
    • procedural programming (vN machine paradigm): algorithm variable, resources fixed
    • structural programming (anti machine paradigm): algorithm variable, resources variable

  • Impact of Data-stream-based ...

    [Figure: the 1957 to 2007 wave of mainstream silicon application; standard: TTL, µproc., memory; custom: LSI, MSI, ASICs, accel’s]

    structural personalization: hardwired before fabrication

    Repeat the success story with a new machine paradigm!

    Embedded Hardware/Configware Industry

  • Rapidly growing CS education gap

    • Our computing curricula are obsolete: the introduction is strictly „procedural-only“

    • vN-only use of terms like „computer organisation“, „computer structures“, „computer architecture“

    • graduates are not prepared for the real world: most applications are for embedded systems (>90% by 2010)

    • our graduates are unable to compete with EE graduates

    • only a few % of the curricula need to be changed

    • my mission: getting you involved

  • Binding Time vs. Computing Domain

    binding time (set-up of communication channels) vs. programming domain:

    • at run time, time domain (procedural): microprocessor, parallel computer
    • at loading time or compile time, time & space (hybrid): Reconfigurable Computing
    • in a later fabrication step, space domain (structural): ASICs
    • before fabrication, space domain (structural): full custom ICs

    also placed in the chart: array processor, systolic arrays, supersystolic arrays

  • Why Coarse Grain instead of FPGA?

    Sources: Proc. ISSCC, ICSPAT, DAC, DSPWorld

    [Figure: transistors per chip (log scale, 1000 to 100 000 000 000) vs. year (1980 to 2010) for FPGA physical, FPGA logical and FPGA routed; gaps of about 10 and about 10 000 are marked between the curves]

    coarse grain: reconfigurability overhead reduced by up to ~1000
    • drastically smaller configuration memory
    • much faster loading
    • many more benefits

  • What are the differences?

    vN* computing:

    • computing in time

    • instruction fetch at run time

    • procedural programming

    • instruction scheduling

    Reconfigurable Computing:

    • computing in space and time

    • “instruction” fetch at compile time

    • structural programming

    • data scheduling, i.e. data-stream-based

    • also hardwired implementations**, with “instruction” fetch before fabrication

    *) vN stands for “von Neumann”
    **) e.g. the BEE project, Prof. Brodersen
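The contrast between instruction fetch at run time and “instruction” fetch at compile time can be mimicked in software. In the Python analogy below, a hypothetical two-opcode instruction set is decoded either once per data item (von Neumann style) or once before any data arrives (data-stream style, where only data moves at run time); the decode counter makes the difference in binding time visible. It illustrates binding time only and is not a model of real reconfigurable hardware.

```python
import operator
from typing import Callable, Iterable, Iterator

# Hypothetical two-opcode instruction set, for illustration only.
OPS = {"add": operator.add, "mul": operator.mul}
Instr = tuple[str, int]


def decode(instr: Instr, counter: list[int]) -> Callable[[int], int]:
    """Fetch/decode one instruction into an operation and count the decode."""
    counter[0] += 1
    opcode, operand = instr
    fn = OPS[opcode]
    return lambda v: fn(v, operand)


def instruction_stream(program: list[Instr], data: Iterable[int],
                       counter: list[int]) -> Iterator[int]:
    """vN style: every instruction is fetched and decoded again for every item."""
    for item in data:
        for instr in program:
            item = decode(instr, counter)(item)
        yield item


def data_stream(program: list[Instr],
                counter: list[int]) -> Callable[[Iterable[int]], Iterator[int]]:
    """Data-stream style: decode once ("configure"), then only data flows."""
    stages = [decode(instr, counter) for instr in program]

    def datapath(data: Iterable[int]) -> Iterator[int]:
        for item in data:
            for stage in stages:
                item = stage(item)
            yield item

    return datapath


program = [("mul", 3), ("add", 1)]
vn_count, ds_count = [0], [0]
print(list(instruction_stream(program, [1, 2, 3], vn_count)), vn_count)  # [4, 7, 10] [6]
print(list(data_stream(program, ds_count)([1, 2, 3])), ds_count)         # [4, 7, 10] [2]
```

Both styles compute the same results; the decode count grows with the data volume in the first case and stays fixed at the program length in the second.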

  • Basics of Binding Time

    time of “instruction fetch”:
    • run time: microprocessor, parallel computer
    • loading time or compile time: Reconfigurable Computing

    “Instruction” generalized: including complex expressions and other datapaths

    strong impact on the machine paradigm!

  • Data-stream-based Parallelism

    See my other talk:

    Michael Herz (Agilent Technologies), Reiner Hartenstein (University of Kaiserslautern), Miguel Miranda, Erik Brockmeyer, Francky Catthoor (IMEC, Leuven): Memory Organisation for Datastream-based Reconfigurable Computing (invited paper). ICECS 2002, IEEE 9th International Conference on Electronics, Circuits and Systems, Dubrovnik, Croatia, September 15-18, 2002.

  • Machine paradigms

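    [Figure: instruction stream machine vs. data-stream machine]

    von Neumann, instruction stream machine (programmed by Software):
    • CPU: instruction sequencer plus datapath (ALU), fed by an instruction stream from memory (M); I/O

    data-stream machine (programmed by Flowware, plus Configware for reconfigurable parts):
    • asM*: memory with a data address generator (data sequencer) that feeds data streams to a datapath unit (DPU or rDPU); data stream I/O
    • scaled up: an (r)DPA surrounded by several memory blocks (M), each with its own data stream I/O

    *) embedded memory architecture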
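In the data-stream machine the sequencing moves from an instruction sequencer to data address generators: flowware determines which data item reaches which port at which time. A minimal Python sketch, assuming a 2-D nested-loop scan as the flowware of each data counter (the `ScanPattern` fields and function names are hypothetical) and a single DPU fed by two streams from one row-major memory:

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Sequence


@dataclass(frozen=True)
class ScanPattern:
    """Hypothetical flowware for one data counter: a 2-D nested-loop scan."""
    base: int
    outer_count: int
    outer_stride: int
    inner_count: int
    inner_stride: int


def data_counter(p: ScanPattern) -> Iterator[int]:
    """Data address generator: yields one memory address per time step."""
    for i in range(p.outer_count):
        for j in range(p.inner_count):
            yield p.base + i * p.outer_stride + j * p.inner_stride


def run_dpu(memory: Sequence[int], a: ScanPattern, b: ScanPattern,
            op: Callable[[int, int], int]) -> list[int]:
    """One (r)DPU fed by two data streams; no instruction stream is involved.

    zip() stops at the shorter stream, so both patterns should have the
    same length.
    """
    return [op(memory[x], memory[y]) for x, y in zip(data_counter(a), data_counter(b))]


memory = list(range(16))               # a 4 x 4 matrix, stored row-major
row0 = ScanPattern(0, 1, 0, 4, 1)      # addresses 0, 1, 2, 3
col0 = ScanPattern(0, 1, 0, 4, 4)      # addresses 0, 4, 8, 12
products = run_dpu(memory, row0, col0, lambda x, y: x * y)
print(products, sum(products))         # [0, 4, 16, 36] 56
```

The sum is one element of the matrix product. Scanning a different row, column, or a 2 × 2 block (e.g. `ScanPattern(5, 2, 4, 2, 1)` yields addresses 5, 6, 9, 10) changes only the flowware; the datapath stays the same.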

  • Synthesizable Memory Communication

    Efficient memory communication should be directly supported by the mapper tools.

    An example by Nageldinger’s KressArray Xplorer (http://kressarray.de):

    [Figure: Optimized Parallel Memory Controller; legend: sequencers, memory ports, application, not used]

  • Terminology has been highly confusing

    [Figure: growth factor (1 to 2) over 0 to 48 months for battery capacity (1.03/year) and for curves labelled 4y, 10y and 30y; data: Department of Trade and Industry, London]

  • No vN bottleneck

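    The anti machine has no von Neumann bottleneck: no instructions are fetched at run time, and data streams come from several memory blocks, each with its own data address generator.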

  • 3 different mind sets

    [Figure: the 1957 to 2007 wave of mainstream silicon application (TTL; LSI, MSI; µproc., memory; ASICs, accel’s), continued by FPGAs, coarse grain and soft CPUs]

    three mind sets: hardware people, CS people, and a new breed (needed)

    Common terminology needed

  • Programming sources

    • von Neumann (instruction stream machine): resources fixed, algorithms variable; source: software; platform: hardware, hardwired only
    • anti machine (data stream machine): resources variable, algorithms variable; sources: configware and flowware (streamware); platform: reconfigurable (morphware) or hardwired

  • Some soft CPU core examples

    core / architecture / platform:

    • MicroBlaze (125 MHz, 70 D-MIPS) / 32-bit standard RISC, 32 registers by 32 bits, LUT-RAM-based registers / Xilinx, up to 100 on one FPGA
    • Nios / 16-bit instruction set / Altera Mercury
    • Nios (50 MHz) / 32-bit instruction set / Altera

    2