Project ARIES - MIT CSAIL30/04/00 Project Aries 5 Motivation - Area Efficiency K7 Die Photo • In...


Page 1

Project ARIES: Advanced RAM Integration for Efficiency and Scalability

Presented by the 8th floor buttheads

(pun intended)

Page 2

30/04/00 Project Aries 2

Talk Overview

• Motivation and research goals

• Previous work

• System description

• Current work

• Research Opportunities

Page 3

Motivation - System Level

• Many applications have massive memory and/or computational requirements

– graphics, CAD, physical simulation, factoring, etc.

• Current parallel systems have severe limitations:

– difficult to program

– poor network performance

– insufficient scalability

• There is a need for programmable systems scalable to 1M processors and beyond

Page 4

Motivation - Architecture

• Transistor counts have increased 1000x in 20 years

– 1978: 29K in Intel 8086

– 1999: 23M in NVIDIA GeForce 256

• Suppose someone gave you 100M transistors and you had never seen a load-store architecture. What would you build?

• There is a need for designs scalable to 1G transistors and beyond

Page 5

Motivation - Area Efficiency

K7 Die Photo

• In modern processors < 25% of chip area devoted to useful work (red areas on die shot)

• > 75% devoted to making the 25% faster

• Even the 25% is bloated due to

– large instruction sets

– complex superscalar design

• ...not necessarily a bad thing

Page 6

Motivation - Scalability

Today

• 1-4 processors per die

• Use available area to make processor fast

– complicated designs

Tomorrow

• N processors per die

• Use available area for lots of processors

– simple designs

Page 7

Motivation - RAM Integration

• Logic and DRAM on a single die provides:

– lower latency access to memory (~2x)

– much higher bandwidth (~10x)

• What architectural features are enabled by this technology?

– SRAM-tagged DRAM

– Transactional memory

– Forwarding pointers

Page 8

Research Goals

• Highly programmable parallel system

– language

– compiler

– operating system

– architecture

• Manage huge data sets

• Area efficient architecture

• Integration of processor and memory

Page 9

Architectural Goals

• Fast general purpose processor

– ~1 GFLOP/processing node

• High performance network

– high bandwidth (4 GB/s at each node)

– low latency (~10 ns across a 1000 node system)

• Compilability

• Scalability

– up to 1,000,000 nodes

Page 10

Talk Overview

• Motivation and research goals

• Previous work

• System Description

• Current work

• Research Opportunities

Page 11

Previous Work - Processors

• Bill Dally

– J-Machine

– M-Machine

– Imagine

• Anant Agarwal

– Alewife

– RAW

• NYU Ultracomputer

• Hydra

• VLIW stuff

• KSR1

• NUMAchine

• Tera

• IBM Power4

Page 12

Previous Work - Networks

• Tom Knight

– Metro

• Bill Dally/John Poulton

– 4GB/s

• CrayLink

• SGI

• Charles Leiserson

– Fat-trees

• SCI

• HIPPI

• CM-5

• Myrinet

• Arctic

Page 13

Previous Work - Cache

• SHASTA

• DASH

• FLASH

• LimitLESS

Page 14

Previous Work - Languages

• NESL

Page 15

Other Previous Work

• Capabilities

• Guarded Pointers (Dally)

• Containment

• IRAM

• IStore

• SimOS

• Beowulf

• Active Pages

Page 16

Talk Overview

• Motivation and research goals

• Previous work

• System description

• Current work

• Research Opportunities

Page 17

System Description

• Distributed shared memory machine

• No caching of remote data

• Single shared virtual address space

• Large collection of processor-memory nodes

• High-performance fat-tree network

Page 18

System Description

• OS

– Memory management

– threads

• language

– ???

Page 19

Top-Level View

• Bunnie should make this slide. Bunch of cabinets containing processor and network boards; 1 disk drive per network board; high speed serial links; enough wire to tether the moon to Tech Square.

Page 20

Chip-Level View

• array of processor-memory nodes connected by on-chip network

• high speed signaling technology exists (Dally) to provide large off-chip bandwidth

– e.g. 256 pin pairs at 2 Gb/s each = 512 Gb/s
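The bandwidth figure above is simple arithmetic, and it can help to see it in bytes as well as bits. The snippet below is only a worked check of the slide's numbers; the function name is made up for illustration, and the bits-to-bytes conversion assumes no encoding overhead on the links, which the slides do not specify.

```python
# Worked arithmetic for the off-chip bandwidth example on this slide.
PIN_PAIRS = 256        # pin pairs (from the slide)
GBIT_PER_PAIR = 2      # Gb/s per pair (from the slide)


def aggregate_bandwidth_gbit(pin_pairs: int, gbit_per_pair: float) -> float:
    """Total off-chip bandwidth in Gb/s, assuming every pair runs at full rate."""
    return pin_pairs * gbit_per_pair


total_gbit = aggregate_bandwidth_gbit(PIN_PAIRS, GBIT_PER_PAIR)
# Assumption: 8 bits per byte with no line-coding overhead.
total_gbyte = total_gbit / 8

print(f"{total_gbit:.0f} Gb/s = {total_gbyte:.0f} GB/s")  # 512 Gb/s = 64 GB/s
```

For comparison, the architectural goals list 4 GB/s at each node, so 64 GB/s is equivalent to sixteen such node links. Whether the pins would actually be divided that way is not stated in the slides.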

[Figure: array of 16 processor-memory nodes; each node is labeled with I$, D$, PE, DRAM, and NET blocks]

Page 21

Processor-Memory Node

[Figure: processor-memory node. Block labels: RAM (four banks), Memory Controller, D$, I$, Processing Element, Network Interface, Network. Legend: bus widths of ~128 bits, ~512 bits, and ~32 bits]

Page 22

Processing element

• 128-bit multi-context VLIW processor

• Four functional units

• IEEE floating point

• Hardware enforced capabilities

• Hardware Macros???

Page 23

Simple Silicon

• No dynamic superscalar issue

• No out of order issue

• No register renaming

• No branch prediction

• No speculative execution

• No remote-data caching

Page 24

Key Features

• Fast fault tolerant network

– blah blah blah

• Augmented DRAM

– SRAM tagged DRAM

• fast searching for marked data

• high performance GC

• experimental versioning

– Hardware page tables
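To make the "fast searching for marked data" bullet more concrete, here is a toy Python model of SRAM-tagged DRAM. It is a sketch under stated assumptions, not the ARIES hardware: the class name, the one-tag-per-word layout, and the method names are all hypothetical. The only point it illustrates is that a search for marked words can scan a small, fast tag store without reading the bulk data.

```python
class TaggedMemory:
    """Toy model: bulk data in 'DRAM', one tag per word in a small 'SRAM'.

    Hypothetical structure for illustration; the slides do not give the
    tag width or layout.
    """

    def __init__(self, num_words: int) -> None:
        self.dram = [0] * num_words        # bulk storage (slow in real hardware)
        self.tags = bytearray(num_words)   # one tag per word (fast in real hardware)

    def write(self, addr: int, value: int, marked: bool = False) -> None:
        """Store a word and set or clear its tag."""
        self.dram[addr] = value
        self.tags[addr] = 1 if marked else 0

    def mark(self, addr: int) -> None:
        """Set the tag of an existing word without touching its data."""
        self.tags[addr] = 1

    def find_marked(self, start: int = 0) -> int:
        """Return the first marked address at or after start, or -1 if none.

        Only the tag array is scanned; the DRAM contents are never read.
        """
        return self.tags.find(1, start)


mem = TaggedMemory(1024)
mem.write(10, 42)                 # unmarked word
mem.write(700, 7, marked=True)    # marked word
mem.mark(900)

print(mem.find_marked())      # 700
print(mem.find_marked(701))   # 900
print(mem.find_marked(901))   # -1
```

A garbage collector could plausibly use such tags as mark bits, which would fit the "high performance GC" bullet, but the slides do not describe the mechanism.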

Page 25

Key Features

• Fixed virtual address → node mapping

– Eliminates global tables

• Multi-striped address → node mapping

– Stripe contiguous addresses across nodes at any power of two granularity

• Memory efficient guarded pointers

– Power-of-two schemes waste up to 50% of the virtual address space
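The two address-space ideas on this slide can be illustrated with a little arithmetic. The following is a minimal Python sketch, not the ARIES design: the function names, the round-robin assignment of stripes to nodes, and passing the stripe size as `log2_stripe` are all assumptions made for illustration. The slides do not say how a stripe granularity is selected or encoded.

```python
def node_for_address(addr: int, log2_stripe: int, num_nodes: int) -> int:
    """Node owning `addr` when stripes of 2**log2_stripe bytes are dealt
    round-robin across `num_nodes` nodes (assumed policy).

    The mapping is a pure function of the address, so no global table
    lookup is needed.
    """
    return (addr >> log2_stripe) % num_nodes


def pow2_waste_fraction(size: int) -> float:
    """Fraction of a power-of-two segment left unused by an object of
    `size` bytes, if segments must be rounded up to a power of two."""
    segment = 1 << (size - 1).bit_length()   # smallest power of two >= size
    return (segment - size) / segment


# 64-byte stripes over 4 nodes: consecutive stripes land on nodes 0, 1, 2, 3, 0, ...
for addr in (0, 64, 128, 192, 256):
    print(addr, node_for_address(addr, log2_stripe=6, num_nodes=4))

# An object one byte larger than a power of two wastes almost half its segment.
print(pow2_waste_fraction(1024))   # 0.0
print(pow2_waste_fraction(1025))   # just under 0.5
```

The second function shows where the "up to 50%" figure comes from: an object of 2^k + 1 bytes needs a 2^(k+1)-byte segment, so the unused fraction approaches one half.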

Page 26

Talk Overview

• Motivation and research goals

• Previous work

• System description

• Current work

• Research Opportunities

Page 27

Hardware Design Effort

• Node design

– processor

– memory system

• Network design

– routing protocol

– topology

• Cycle-accurate simulator

Page 28

Prototyping System

• Simulator for complex system issues

– not practical to do an accurate, pure software simulation of a massively parallel processor

• Evaluation of experimental architectural features in a realistic environment

– network topologies

– language design

– hardware support for programming features

Page 29

Prototyping System

• Modular design

– easily reconfigure a single design for multiple network topologies by recabling and redistributing processor and network cards

– network can be adapted for other performance-oriented platforms

– enable collaboration between multiple research groups in different research institutions

Page 30

Modular Design

Page 31

Prototype Node

• Features for fast debug, profiling, reconfiguration

• Architecture to enable experimentation with novel memory architectures and domains of execution

Page 32

Prototype Node

Page 33

Language Design

• Communication intensive programming

• Leverage “data snapshots” for migratability

– e.g. function calls

• Blah blah blah

Page 34

Garbage Collection

• Fast area local GC

• Data migration

• Paging friendly

• Blah blah blah

Page 35

Talk Overview

• Motivation and research goals

• Previous work

• System description

• Current work

• Research Opportunities

Page 36

Research Opportunities

• Processor evaluation

– hand coded sample apps

– performance at various design points

• Language design/evaluation

– expresses parallelism and communication

– programmability vs. compilability

Page 37

Research Opportunities 2

• Compiler Design

– C++ compiler

– PSCHEME compiler

– VLIW scheduling

• Operating System Design

– Thread management, page management, memory management, etc.

Page 38

Research Opportunities 3

• Alternative Execution Domains

– secure computing

– efficient dynamic typing

– transactional/speculative computing

– dataflow