Course 3: ARIES Login and Getting Started in ARIES ARIES ON DEMAND Training 3.00.
Project ARIES - MIT CSAIL30/04/00 Project Aries 5 Motivation - Area Efficiency K7 Die Photo • In...
Transcript of Project ARIES - MIT CSAIL30/04/00 Project Aries 5 Motivation - Area Efficiency K7 Die Photo • In...
Project ARIESAdvanced RAM Integration for Efficiency and Scalability
Presented by the 8th floor buttheads
(pun intended)
30/04/00 Project Aries 2
Talk Overview
• Motivation and research goals
• Previous work
• System description
• Current work
• Research Opportunities
30/04/00 Project Aries 3
Motivation - System Level
• Many applications have massive memory and/orcomputational requirements– graphics, CAD, physical simulation, factoring, etc.
• Current parallel systems have severe limitations:– difficult to program
– poor network performance
– insufficient scalability
• There is a need for programmable systemsscalable to 1M processors and beyond
30/04/00 Project Aries 4
Motivation - Architecture
• Transistor counts have increased1000x in 20 years– 1978: 29K in Intel 8086
– 1999: 23M in NVIDIA GeForce 2568086
• Suppose someone gave you 100M transistors andyou had never seen a load-store architecture.What would you build?
• There is a need for designs scalable to 1Gtransistors and beyond
30/04/00 Project Aries 5
Motivation - Area Efficiency
K7 Die Photo
• In modern processors < 25% ofchip area devoted to usefulwork (red areas on die shot)
• > 75% devoted to making the25% faster
• Even the 25% is bloated due to– large instruction sets
– complex superscalar design
• ...not necessarily a bad thing
30/04/00 Project Aries 6
Motivation - Scalability
Today
• 1-4 processors per die
• Use available area tomake processor fast– complicated designs
Tomorrow
• N processors per die
• Use available area forlots of processors– simple designs
30/04/00 Project Aries 7
Motivation - RAM Integration
• Logic and DRAM on a single die provides:– lower latency access to memory (~2x)
– much higher bandwidth (~10x)
• What architectural features are enabled by thistechnology?– SRAM-tagged DRAM
– Transactional memory
– Forwarding pointers
30/04/00 Project Aries 8
Research Goals
• Highly programmable parallel system– language
– compiler
– operating system
– architecture
• Manage huge data sets
• Area efficient architecture
• Integration of processor and memory
30/04/00 Project Aries 9
Architectural Goals
• Fast general purpose processor– ~1 GFLOP/processing node
• High performance network– high bandwidth (4GB/s at each node)
– low latency (~10ns across a 1000 node system)
• Compilability
• Scalability– up to 1,000,000 nodes
30/04/00 Project Aries 10
Talk Overview
• Motivation and research goals
• Previous work
• System Description
• Current work
• Research Opportunities
30/04/00 Project Aries 11
Previous Work - Processors
• Bill Dally– J-Machine
– M-Machine
– Imagine
• Anant Agarwal– Alewife
– RAW
• NYU Ultracomputer
• Hydra
• VLIW stuff
• KSR1
• NuMachine
• Hydra
• Tera
• IBM Power4
30/04/00 Project Aries 12
Previous Work - Networks
• Tom Knight– Metro
• Bill Dally/John Poulton– 4GB/s
• Craylink
• SGI
• Charles Leiserson– FAT trees
• SCI
• HiPPi
• CM-5
• Myranet
• Artic
30/04/00 Project Aries 13
Previous Work - Cache
• SHASTA
• DASH
• FLASH
• LimitLESS
30/04/00 Project Aries 14
Previous Work - Languages
• NESL
30/04/00 Project Aries 15
Other Previous Work
• Capabilities
• Guarded Pointers (Dally)
• Containment
• IRAM
• IStore
• SimOS
• Beowulf
• Active Pages
30/04/00 Project Aries 16
Talk Overview
• Motivation and research goals
• Previous work
• System description
• Current work
• Research Opportunities
30/04/00 Project Aries 17
System Description
• Distributed shared memory machine
• No caching of remote data
• Single shared virtual address space
• Large collection of processor-memory nodes
• High Performance FAT Tree network
30/04/00 Project Aries 18
System Description
• OS– Memory management
– threads
• language– ???
30/04/00 Project Aries 19
Top-Level View
• Bunnie should make this slide.. Bunch of cabinetscontaining processor and network boards; 1 diskdrive per network board; high speed serial links;enough wire to tether the moon to tech square.
30/04/00 Project Aries 20
Chip-Level View
• array of processor-memorynodes connected by on-chipnetwork
• high speed signalingtechnology exists (Dally) toprovide large off-chipbandwidth
– e.g. 256 pin pairs at 2Gb/seach = 512 Gb/s
I$ D$
PE
DRAM N
E
T
D$ I$
PE
DRAMN
E
T
I$ D$
PE
DRAM N
E
T
D$ I$
PE
DRAMN
E
T
I$ D$
PE
DRAM N
E
T
D$ I$
PE
DRAMN
E
T
I$ D$
PE
DRAM N
E
T
D$ I$
PE
DRAMN
E
T
I$ D$
PE
DRAM N
E
T
D$ I$
PE
DRAMN
E
T
I$ D$
PE
DRAM N
E
T
D$ I$
PE
DRAMN
E
T
I$ D$
PE
DRAM N
E
T
D$ I$
PE
DRAMN
E
T
I$ D$
PE
DRAM N
E
T
D$ I$
PE
DRAMN
E
T
30/04/00 Project Aries 21
Processor-Memory Node
RAM
Processing Element
RAMRAMRAM
D$ I$
Network
Memory Controller
NetworkInterface
Legend
~ 128 bits
~ 512 bits
~ 32 bits
30/04/00 Project Aries 22
Processing element
• 128 bit multi-context VLIW processor
• Four functional units
• IEEE floating point
• Hardware enforced capabilities
• Hardware Macros???
30/04/00 Project Aries 23
Simple Silicon
• No dynamic superscalar issue
• No out of order issue
• No register renaming
• No branch prediction
• No speculative execution
• No remote-data caching
30/04/00 Project Aries 24
Key Features
• Fast fault tolerant network– blah blah blah
• Augmented DRAM– SRAM tagged DRAM
• fast searching for marked data
• high performance GC
• experimental versioning
– Hardware page tables
30/04/00 Project Aries 25
Key Features
• Fixed virtual address → node mapping– Eliminates global tables
• Multi-striped address → node mapping– Stripe contiguous addresses across nodes at any power
of two granularity
• Memory efficient guarded pointers– Power-of-two schemes waste up to 50% of the virtual
address space
30/04/00 Project Aries 26
Talk Overview
• Motivation and research goals
• Previous work
• System description
• Current work
• Research Opportunities
30/04/00 Project Aries 27
Hardware Design Effort
• Node design– processor
– memory system
• Network design– routing protocol
– topology
• Cycle accurate simulator
30/04/00 Project Aries 28
Prototyping System
• Simulator for complex system issues– not practical to do an accurate, pure software simulation
of a massively parallel processor
• Evaluation of experimental architectural featuresin a realistic environment– network topologies
– language design
– hardware support for programming features
30/04/00 Project Aries 29
Prototyping System
• Modular design– easily reconfigure a single design for multiple network
topologies by recabling and redistributing processorand network cards
– network can be adapted for other performance-orientedplatforms
– enable collaboration between multiple research groupsin different research institutions
30/04/00 Project Aries 30
Modular Design
30/04/00 Project Aries 31
Prototype Node
• Features for fast debug, profiling, reconfiguration
• Architecture to enable experimentation with novelmemory architectures and domains of execution
30/04/00 Project Aries 32
Prototype Node
30/04/00 Project Aries 33
Language Design
• Communication intensive programming
• Leverage “data snapshots” for migratability– e.g. function calls
• Blah blah blah
30/04/00 Project Aries 34
Garbage Collection
• Fast area local GC
• Data migration
• Paging friendly
• Blah blah blah
30/04/00 Project Aries 35
Talk Overview
• Motivation and research goals
• Previous work
• System description
• Current work
• Research Opportunities
30/04/00 Project Aries 36
Research Opportunities
• Processor evaluation– hand coded sample apps
– performance at various design points
• Language design/evaluation– expresses parallelism and communication
– programmability vs. compilability
30/04/00 Project Aries 37
Research Opportunities 2
• Compiler Design– C++ compiler
– PSCHEME compiler
– VLIW scheduling
• Operating System Design– Thread management, page management, memory
management, etc. etc. etc.
30/04/00 Project Aries 38
Research Opportunities 3
• Alternative Execution Domains– secure computing
– efficient dynamic typing
– transactional/speculative computing
– dataflow