Asynchronous Design Automation Enabling FTL Systems’ StarStream

Transcript of Asynchronous Design Automation Enabling FTL Systems’ StarStream

Copyright 2006 FTL Systems

Asynchronous Design Automation Enabling FTL Systems’ StarStream Compact Supercomputer

StarStream and Asynchronous Merlin research and development solely at commercial expense; evaluation of early release systems funded in part by the US Navy

Specifications subject to change without notice

StarStream and Merlin are trademarks of FTL Systems, all rights reserved

StarStream and Merlin are the subject of FTL’s pending and issued patents

StarStream and Asynchronous versions of Merlin not yet offered for sale

Dr. Robert G Babb II

Northrop Grumman Information Technology

Dr. John Willis

FTL Systems

Outline

1. Company Roles

2. Why Use Asynchronous for High Performance Computing?

3. Design Automation Strategy using Merlin

4. StarStream Compact Supercomputers Enabled

5. Value of Asynchronous Technology

6. Lessons Learned

Company Roles

• FTL Systems: Provides end-to-end design automation flow with behavioral asynchronous capability, StarStream compact supercomputer processors, computers, programming compilers

• Northrop Grumman (IT): Provides government-oriented applications and on-site operational services

• Cisco: Provides high-bandwidth network and file system capability

• Many other companies and North American / European governmental organizations are involved

(Intellectual properties remain owned by respective developers; no endorsements expressed or implied)

Why Use Asynchronous for High Performance Computing?: Power Management

• High performance processors are limited by power in and heat out:

  • Optimal per-die dissipation of 10-40 watts

  • 100-200 watt dissipation feasible with economical technology

  • 90 nm die running at 2 to 6 GHz can exceed 100-200 watts

  • Some power dissipation results from leakage currents (largely speed-independent), addressed by other means

• High die temperature leads to higher cost and lower reliability

• Current designs manage via limiting logic duty cycle (clock-gating…)

• Asynchronous logic facilitates managing power by disabling localized parts of processor on a cycle-by-cycle basis
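The link between logic duty cycle and dissipation can be sketched with the textbook switching-power relation P = activity × C × V² × f. This is a minimal illustration, not FTL's power model: the capacitance, supply voltage, and rate below are hypothetical values chosen only to land in the 100-200 watt range mentioned above, and leakage is deliberately left out.

```python
def dynamic_power_watts(activity, capacitance_f, voltage_v, freq_hz):
    """Switching power P = activity * C * V^2 * f (leakage is not modeled)."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


# Hypothetical values, chosen only for illustration.
C_EFF = 50e-9   # effective switched capacitance in farads
VDD = 1.2       # supply voltage in volts
FREQ = 3e9      # operating rate in hertz

# All logic switching every cycle versus only a quarter of it active.
full = dynamic_power_watts(1.0, C_EFF, VDD, FREQ)
gated = dynamic_power_watts(0.25, C_EFF, VDD, FREQ)
print(f"all logic switching: {full:.0f} W, 25% active: {gated:.0f} W")
```

In this simplified view, disabling idle regions lowers the activity factor, and the switching power falls in proportion.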

Why Use Asynchronous for High Performance Computing?: Complexity Management

• High-end microprocessors use 1,000M to 2,000M transistors

• Supercomputing processors such as StarStream use 5,000M to 10,000M transistors (exceeding current single-die fabrication, even at 45 nm)

• Timing closure of even a 5,000M transistor design running at 1 GHz to 10 GHz is time-consuming (frequently months), expensive (significant part of design cost) and error-prone (field failures often traced to marginal timing at particular voltage, temperature, process point…)

• Asynchronous technology can reduce back-end design rule closure time, money and risk

• However, fully delay-insensitive asynchronous logic is generally not suitable for supercomputing performance requirements

Why Use Asynchronous for High Performance Computing?: Complications, The Down-Side

• Transit times associated with various kinds of completion signals increase the effective cycle time, especially between physically distant parts of a processor

• Most asynchronous logic techniques significantly increase both transistor count and logical wiring complexity (greatly complicating designs that already do not fit on a single die)

• Meta-stability physics associated with merging of local time domains induces statistical reliability problems, especially with large and fast systems

• Standard cell libraries (FPGA & ASIC) are seldom complete for asynchronous logic technologies (resulting in less optimal realization or specialized cell library development)

• Few designers have the expertise or design time to deal with asynchronous design manually

• Most design automation technology does not support complex asynchronous designs using standard design languages (such as VHDL) and design flow interfaces

All are solvable challenges; this talk focuses on a commercial solution to the last two issues
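Why meta-stability bites harder in large, fast systems can be illustrated with the commonly used synchronizer failure model, MTBF = exp(t_r / τ) / (T_w × f_sample × f_event). The sketch below assumes that model applies at each point where two time domains meet; every parameter value and the function name are hypothetical, not measurements from StarStream.

```python
import math


def synchronizer_mtbf_s(resolve_time_s, tau_s, window_s,
                        sample_rate_hz, event_rate_hz, count=1):
    """MTBF = exp(t_r / tau) / (T_w * f_sample * f_event), divided by the
    number of independent crossing points in the system."""
    single = math.exp(resolve_time_s / tau_s) / (
        window_s * sample_rate_hz * event_rate_hz)
    return single / count


# Hypothetical device parameters, for illustration only.
TAU = 20e-12      # resolution time constant in seconds
WINDOW = 20e-12   # metastability window in seconds

# Fewer crossings and more time to resolve, versus many crossings at higher rates.
small_slow = synchronizer_mtbf_s(1.0e-9, TAU, WINDOW, 1e9, 1e8, count=10)
large_fast = synchronizer_mtbf_s(0.5e-9, TAU, WINDOW, 2e9, 2e8, count=1000)
print(f"small, slow system: {small_slow:.3g} s between failures")
print(f"large, fast system: {large_fast:.3g} s between failures")
```

Under these assumptions the time allowed for resolution dominates exponentially, while the number of crossings and the event rates reduce the mean time between failures linearly.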

Design Automation Strategy: Flow

[Figure: design flow]

Flow inputs:

• StarStream behavioral processor & system specification (using standard VHDL with minimal extensions)

• Logic-technology-specific & realization-technology-specific implementations of “type systems”

StarStream implementations:

• FTL Systems’ software-only implementation for compiler testing & early performance evaluation

• Early Release version (runs at greater than 400 MHz with modest cost, risk)

• Asynchronous quasi-custom logic version (runs at greater than 1.5 GHz)

• Clocked logic quasi-custom logic version (runs at greater than 1.5 GHz)

Design Automation Strategy: Abstraction of Logic & Realization Technology

• Conventional asynchronous design practice embodies a specific asynchronous logic design style, and often a realization, directly in the design

• This implies an early (perhaps premature) selection of the best logic technology and realization technology

• It also requires that all designers have an expert knowledge of the asynchronous logic technologies employed

New Approach:

• Capture the behavioral design intent distinct from implementation and realization decisions

• Define one or more systems of data type representations and operator implementations, one for each logic or realization technology

• Design automation tool semi-automatically selects suitable technologies to apply to each segment of the design to best meet design constraints (timing, size, power)
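The selection step in the last bullet can be pictured as a constrained choice among candidate technologies for one design segment. The sketch below is only an assumption about how such a choice might look, not a description of Merlin: the class names, the technology names, the estimate figures, and the "lowest power among feasible candidates" rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Estimated cost of realizing one design segment in one technology."""
    technology: str
    delay_ns: float
    area: float
    power_mw: float


@dataclass(frozen=True)
class Constraints:
    max_delay_ns: float
    max_area: float
    max_power_mw: float


def select_technology(candidates, limits):
    """Return the lowest-power candidate that meets every constraint, or None."""
    feasible = [c for c in candidates
                if c.delay_ns <= limits.max_delay_ns
                and c.area <= limits.max_area
                and c.power_mw <= limits.max_power_mw]
    if not feasible:
        return None  # a designer would relax constraints or intervene here
    return min(feasible, key=lambda c: c.power_mw)


# Hypothetical estimates for a single adder segment.
adder_candidates = [
    Estimate("clocked", delay_ns=0.6, area=1.0, power_mw=12.0),
    Estimate("bundled_data", delay_ns=0.5, area=1.3, power_mw=8.0),
    Estimate("dual_rail", delay_ns=0.9, area=2.2, power_mw=10.0),
]
limits = Constraints(max_delay_ns=0.7, max_area=1.5, max_power_mw=15.0)
chosen = select_technology(adder_candidates, limits)
print(chosen.technology if chosen else "no feasible technology")
```

Returning None when nothing fits mirrors the "semi-automatic" character of the approach: the tool proposes, and the designer resolves the cases it cannot.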

Design Automation Strategy: Example

Processor behavioral model specifies addition of two values

Each visible type system specifies:

1. Data representation (for example a particular approach to representing bundled data)

2. Implementation of spanning set of operators (such as addition) specific to data representation

3. Type conversion operators suitable for implicit or explicit type conversion from other logic or realization type (logic) systems

Combination of localized usage, hardware scheduling, constraints and optimization strategies automatically (or semi-automatically) chooses and applies the most suitable logic and realization technology.

Addition may be expressed using a particular bundled data approach, specific adder algorithm and specific devices (much as a compiler chooses specific registers & assembly)
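The three parts of a type system listed above (data representation, operators, conversions) can be mimicked in a few lines of software. This is a loose analogy in Python rather than VHDL, and none of it comes from FTL's tools: the class names, the 8-bit width, and the two encodings (single-rail bits with a request flag, and dual-rail pairs) are assumptions made for illustration.

```python
WIDTH = 8  # hypothetical word width


class BundledData:
    """Plain binary bits plus a request flag (single-rail bundled data)."""

    @staticmethod
    def encode(value):
        bits = tuple((value >> i) & 1 for i in range(WIDTH))
        return {"bits": bits, "req": 1}

    @staticmethod
    def decode(word):
        return sum(bit << i for i, bit in enumerate(word["bits"]))

    @classmethod
    def add(cls, a, b):
        total = (cls.decode(a) + cls.decode(b)) % (1 << WIDTH)
        return cls.encode(total)


class DualRail:
    """Each bit is a (true_rail, false_rail) pair: 1 -> (1, 0), 0 -> (0, 1)."""

    @staticmethod
    def encode(value):
        return tuple((1, 0) if (value >> i) & 1 else (0, 1)
                     for i in range(WIDTH))

    @staticmethod
    def decode(word):
        return sum(t << i for i, (t, _f) in enumerate(word))

    @classmethod
    def add(cls, a, b):
        # Ripple-carry addition, bit by bit; the final carry is dropped.
        carry, out = 0, []
        for (ta, _fa), (tb, _fb) in zip(a, b):
            s = ta ^ tb ^ carry
            carry = (ta & tb) | (carry & (ta ^ tb))
            out.append((1, 0) if s else (0, 1))
        return tuple(out)


def convert(word, source, target):
    """Type conversion between type systems via the abstract integer value."""
    return target.encode(source.decode(word))


def behavioral_add(x, y, type_system):
    """The behavioral model only says 'x + y'; the type system supplies the rest."""
    ts = type_system
    return ts.decode(ts.add(ts.encode(x), ts.encode(y)))


for ts in (BundledData, DualRail):
    print(ts.__name__, behavioral_add(100, 57, ts))
```

The point of the sketch is that `behavioral_add` never changes: swapping the type system changes the representation and the adder algorithm underneath, which is the separation the slide describes.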

Application of Logic Technology to Design

• Distinct asynchronous logic technologies may be applied to design regions or specific state

• Application via operational constraints or interactive graphical user interface

• Single tool combines both verification (digital and analog) as well as synthesis into a tight iterative loop

[Figure labels: “Flat” regions; objects in region]

StarStream Supercomputers Enabled: Processors

• Nominal 0.5 integer TeraFlops in Early Release, rising to greater than 1.5 TeraFlops with asynchronous or clocked production logic

• Automatically compiled from a common behavioral model: implementation-specific effort is largely verification, production & test costs

• CPU has ~5-8 times the number of transistors in next-generation microprocessors (6,000M to 10,000M transistors)

• Early Release processors use 29 packages, each a large 90 nm die

• Module power and cost are on the high end of the microprocessor range
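As a rough consistency check on the throughput figures, one can scale the Early Release number by the ratio of the quoted operating rates (greater than 400 MHz for Early Release, greater than 1.5 GHz for production logic). The linear scaling with rate is an assumption made here for illustration; the slides do not state how the production figure was derived, and both rates are lower bounds.

```python
# Figures quoted in the slides (the rates are lower bounds).
early_release_tera = 0.5   # nominal Early Release throughput
early_release_hz = 400e6   # Early Release operating rate
production_hz = 1.5e9      # production logic operating rate

# Assumption: throughput scales linearly with operating rate.
scaled_tera = early_release_tera * production_hz / early_release_hz
print(f"scaled estimate: {scaled_tera:.3f} Tera-ops")
```

Under that assumption the estimate sits above the "greater than 1.5" figure quoted for production logic, so the two numbers are at least mutually consistent.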

StarStream Supercomputers Enabled: Systems

• Approx. 19” cube suitable for deskside or standard 19” rack mount

• Cubes assemble via electro-optic switches into PetaOP systems using a few tens of racks

• Fully asynchronous cubes are more easily assembled into multi-cube systems than cubes with inherent clock domains; asynchronous scales!

[Figure dimensions: 12U H × 15” D × 19” W]

StarStream Supercomputers Enabled: Status

• Early Release silicon being populated on boards and in bring-up test now

• Early Release systems expected to be in circulation for software verification and testing mid-2006

• Design focus shifting from Early Release to Production systems

Value of Asynchronous Technology

• As physical size, operating speed and design complexity grow while device dimensions decrease, asynchronous systems become increasingly critical to the feasibility of faster computers.

• Incorporating asynchronous technology into the first systems adds very significant cost over conventional computer system research and development; subsequent systems are likely to be less expensive

• Asynchronous technologies are good for localizing power management

• Expect to have commercially relevant comparative data on asynchronous and clocked logic, from distinct implementations of the same high performance computer system, within the next year

Lessons Learned so Far (1/2)

The Merlin & StarStream effort has been underway for eight years and has already taught several important lessons:

• Largest effort is on compilers, behavioral architecture design & design automation tools; hardware comes down to diligent analog engineering and pre-fabrication verification

• Design teams can be significantly smaller and more efficient (tens rather than hundreds of people), resulting in agility

• Effective use of the approach requires designers with even broader knowledge and deeper experience than conventional approaches (most of those involved are PhDs with 10-25 years of experience; extremely hard for technicians and new graduates to get traction)

• Business model focuses on computer systems; EDA tools are an enabler

Lessons Learned so Far (2/2)

• VHDL is suitable for capturing the designer’s behavioral intent; VHDL’s extensible type system is of significant value. Behavioral synthesis must recognize domains in which VHDL over-constrains timing.

• Significant logic technology and device realization research and development required: handshake latency and growth in transistor count that are fine for low-power, comparatively small systems are not suitable for large, high-performance systems

• Highly automated behavioral synthesis, simulation, formal verification and design rule checking required, with special characteristics supporting asynchronous design. Manual approaches are too error-prone for large designs with high verification and fabrication costs