Stamatis Vassiliadis Symposium The Future of Computing A+A=A


Page 1

Stamatis Vassiliadis Symposium: The Future of Computing

A+A=A

Mateo Valero

Barcelona Supercomputing Center

To Stamatis, my beloved friend

Page 2

The way we all do research ... As seen from HPCA 1999

• Microarchitecture idea → Applications → Compiler → Simulator → Results

• Applications: SPEC, PerfectClub, TPC-D, NAS, Splash …

• Compiler: production, public, custom, …

• Simulator: public, custom, …

• Results: how much we get from our idea

Page 3

The Past Future ... As seen from HPCA 1999

Algorithms

Compiler

Architecture

Hardware

Applications

Absolutely obsessed with pushing the extraction of available ILP on a single core to its limits

Page 4

The Past Future Continued: Advanced ILP Techniques for Superscalar Processors

• Optimized Pipeline

• Cache

• Branch Predictors

• Instruction Collapsing

• Value Prediction

• Reuse

• Assisted/Subordinated Threads

• Trace Cache/Processor

• Control/Data Speculation

• Kilo-instruction Processors

• ………
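Branch prediction, one of the techniques listed above, fits in a few lines. The sketch below is a textbook bimodal predictor in Python: a table of 2-bit saturating counters indexed by low PC bits. The class and function names, the table size and the branch trace are illustrative assumptions and do not come from the talk.

```python
class BimodalPredictor:
    """Bimodal branch predictor: 2-bit saturating counters indexed by low PC bits.

    Counter values 0-1 predict not-taken, 2-3 predict taken.
    The table size (index_bits) is an illustrative assumption.
    """

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # every entry starts weakly not-taken

    def predict(self, pc: int) -> bool:
        return self.table[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def accuracy(predictor: BimodalPredictor, trace: list[tuple[int, bool]]) -> float:
    """Fraction of branches in the trace that the predictor gets right."""
    hits = 0
    for pc, taken in trace:
        hits += predictor.predict(pc) == taken  # predict first, then train
        predictor.update(pc, taken)
    return hits / len(trace)


# Hypothetical trace: one loop-closing branch at PC 0x40,
# taken 9 times and then not taken, for 10 executions of the loop.
trace = [(0x40, i < 9) for _ in range(10) for i in range(10)]
print(accuracy(BimodalPredictor(), trace))  # 0.89
```

The design point is the 2-bit hysteresis. After warm-up, the loop exit costs one misprediction per loop execution. A 1-bit predictor would pay two: one at the exit and one on re-entry.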

Page 5

Distant Parallelism: Non-numerical applications

• (In)Dependent threads: e.g. m88ksim

• Application speed-up: 2.65

[Figure: m88ksim control flow split into threads; labels FETCH, EXE, TIMING; routines check_issue, kill_time, Real_execution, breakpoint?, PC guess, fetch_next, statistics, cmmutime, Sbus2]

Page 6

The “immediate” future: the number of cores doubling every 18 months

“It is better for Intel to get involved in this now so when we get to the point of having 10s and 100s of cores we will have the answers.

There is a lot of architecture work to do to release the potential, and we will not bring these products to market until we have good solutions to the programming problem”

Justin Rattner, Intel CTO

“Now, the grains inside these machines more and more will be multi-core type devices, and so the idea of parallelization won't just be at the individual chip level, even inside that chip we need to explore new techniques like transactional memory that will allow us to get the full benefit of all those transistors and map that into higher and higher performance.”

Bill Gates, Supercomputing 05 keynote

MareNostrum

Most beautiful supercomputer (Fortune magazine, Sept. 2006)

#1 in Europe, #5 in the world

100s of teraflops with a general-purpose Linux supercluster of commodity PowerPC-based blade servers
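The headline rate is easy to turn into numbers. The sketch assumes a 4-core chip in 2007, which is an illustrative baseline and not a figure from the talk. Doubling every 18 months then gives about 6.7 doublings by 2017, roughly a 100x increase. That is consistent with the "10s and 100s of cores" in the quote above.

```python
def projected_cores(base_cores: float, years: float, doubling_months: float = 18.0) -> float:
    """Cores per chip after `years`, if the count doubles every `doubling_months`."""
    return base_cores * 2 ** (years * 12 / doubling_months)


# Assumed starting point: a 4-core chip in 2007, projected 10 years ahead to 2017.
print(round(projected_cores(4, 10)))  # 406
```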

Page 7

Supercomputers will likely have millions of processing cores

Page 8

The “far” future (e.g. 2017) and the big question!

How do we solve the programming problem, a.k.a. how do we program the beast?

• How do we unleash the power of hundreds to millions of cores in a system?

• Computer architects must adapt their thinking. From now on, parallel software requirements will directly drive systems design.

• We need a multidisciplinary top-down approach to this, including

• Applications

• Algorithms

• Debugging

• Programming models

• Programming languages

• Compilers

• Operating Systems

• Runtime environment

… as design drivers for future architectures

Page 9

The holistic view: A + A = A

How do we solve the programming problem, a.k.a. how do we program the beast?

• How do we unleash the power of hundreds to millions of cores in a system?

• Computer architects must adapt their thinking. From now on, many-core software requirements will directly drive processor design.

• We need a multidisciplinary top-down approach to this, including

• Applications

• Algorithms

• Debugging

• Programming models

• Programming languages

• Compilers

• Operating Systems

• Runtime environment

… as design drivers

Applirithms + Adhesive = Architecture

Page 10

Far Future: Applications

• What will be the typical applications in 2017?

• Dwarfs and/versus RMS: is either (or both) the right path to follow?

• Applications are ephemeral but the kernels are forever: the applications may change, the kernels stay the same.

• Will streaming applications require new architectures?

• Are we heading toward special-purpose accelerators for specific applications?

M. Valero, Microsoft Workshop on Multicore, Seattle, June 2007

Page 11

Far Future: Algorithms

• Bad news (for some folks): “Rethink and rewrite the algorithms”

• For manycores, the algorithms need to carefully consider:

• The right level of parallelism

• Load Balancing

• Communication-Computation overlapping

• Speculation (e.g. in message passing)

Source: Jack Dongarra, Microsoft Workshop on Multicore, Seattle, June 2007
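Of the algorithm concerns listed above, load balancing is the easiest to quantify. The sketch compares two strategies on a deliberately skewed workload: a static block partition and greedy dynamic scheduling. It measures the makespan, the finish time of the slowest worker. The task costs and function names are assumptions chosen for illustration and are not taken from the cited source.

```python
import heapq


def static_makespan(costs: list[float], workers: int) -> float:
    """Static block partition: worker w gets one contiguous chunk of tasks."""
    chunk = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[w * chunk:(w + 1) * chunk]) for w in range(workers))


def dynamic_makespan(costs: list[float], workers: int) -> float:
    """Greedy dynamic scheduling: each task goes to the worker that becomes free first."""
    finish_times = [0.0] * workers  # a list of equal values is already a valid heap
    for cost in costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


# Hypothetical workload: one expensive task followed by seven cheap ones.
costs = [8, 1, 1, 1, 1, 1, 1, 1]
print(static_makespan(costs, 2), dynamic_makespan(costs, 2))  # 11 8.0
```

Static partitioning puts the expensive task and three cheap ones on the same worker. Dynamic scheduling lets the other worker absorb the cheap tasks, at the cost of a shared task queue at run time.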

Page 12

Top-Down CMP Design: an initial programmer wishlist

• Easy-to-express parallelism

• Transactional Memory (TM): compared to locks, TM provides an easy-to-use mechanism for ensuring mutual exclusion

• Hide all kinds of non-uniformity from the programmer (heterogeneous cores, non-uniform memory access, …)

• Continue using standard tools

• OpenMP: the industry standard for writing parallel programs on shared memory

• Combined, TM and OpenMP bring ease of use together with familiarity for programming multicores

• BSC-UPC-Microsoft: IWOMP07, MEDEA07

• Stanford: PACT07

• Dataflow model ideally suited to express parallelism

• Cell Superscalar = Distant Parallelism + Data Flow + Out-of-Order Execution

• Supercomputers: MPI + ((OpenMP/Cell Superscalar) + TM)
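The wishlist contrasts TM with locks. To show what TM looks like from the programmer's side, the sketch below is a toy software TM in Python based on optimistic concurrency:

- Reads record a version number.
- Writes are buffered.
- Commit validates the read set and retries on conflict.

Everything here (TVar, Transaction, atomically, the single commit lock) is a hypothetical teaching construct. It is not the API of any TM system mentioned in the talk, and hardware TM works differently.

```python
import threading


class TVar:
    """A transactional variable: a value plus a version number bumped on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


# One global lock guards snapshot reads and the validate-and-commit step.
# Real STMs use far finer-grained schemes; this is only a teaching toy.
_commit_lock = threading.Lock()


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version seen at first read
        self.writes = {}  # TVar -> buffered new value, invisible until commit

    def read(self, tvar):
        if tvar in self.writes:        # read-your-own-writes
            return self.writes[tvar]
        with _commit_lock:             # take a consistent (value, version) pair
            value, version = tvar.value, tvar.version
        self.reads.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value

    def commit(self) -> bool:
        with _commit_lock:
            # Validate: abort if anything we read was changed by another commit.
            if any(tvar.version != seen for tvar, seen in self.reads.items()):
                return False
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True


def atomically(body):
    """Run body(tx) as a transaction, retrying from scratch until it commits."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.commit():
            return result


counter = TVar(0)


def increment(tx):
    tx.write(counter, tx.read(counter) + 1)  # no lock in user code


def worker():
    for _ in range(1000):
        atomically(increment)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 4000
```

The point is the programming model: increment takes no lock at all. Conflict detection and retry replace explicit mutual exclusion, so no update is lost.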

Page 13

Chip organization in 2017: many-core

• How many cores will the processor of 2017 have?

• Will they be homogeneous or heterogeneous? Arrays of simple in-order cores, fewer complex out-of-order cores, or a mix of the two? Consentry and Internet Security

• Is Simultaneous Multithreading just for servers?

• Should we keep optimizing classical OoO implementations, or research how to put radical new approaches such as dataflow or asynchronous architectures into practical use?

[Figure: many-core chip organization: two dies, each with four cache blocks joined by an on-die interconnect, linked to each other and to memory through an off-die interconnect]

Microsoft Workshop on Multicore, Seattle, June 2007
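One way to frame the homogeneous-versus-heterogeneous question above is the analytical model of Hill and Marty ("Amdahl's Law in the Multicore Era"). That model is not part of this talk. In it, a chip has a budget of n base-core resources, and a core built from r of them runs sequential code perf(r) times faster. Here perf(r) is assumed to be sqrt(r), as in that paper. The sketch compares an all-simple symmetric chip with one big core plus many simple ones. The parallel fraction f and the budget are illustrative assumptions.

```python
import math


def perf(r: float) -> float:
    """Sequential speed of a core built from r base-core resources (assumed sqrt(r))."""
    return math.sqrt(r)


def symmetric_speedup(f: float, n: int, r: int) -> float:
    """n/r identical cores of r resources each; f is the parallel fraction."""
    return 1.0 / ((1 - f) / perf(r) + f * r / (perf(r) * n))


def asymmetric_speedup(f: float, n: int, r: int) -> float:
    """One big core of r resources plus n - r simple cores.

    The serial part runs on the big core; the parallel part uses all cores.
    """
    return 1.0 / ((1 - f) / perf(r) + f / (perf(r) + n - r))


# Illustrative budget: 256 base-core resources, 97.5% parallel code.
f, n = 0.975, 256
print(round(symmetric_speedup(f, n, 1), 1))    # 256 simple cores: 34.7
print(round(asymmetric_speedup(f, n, 64), 1))  # 1 big core + 192 simple: 125.0
```

The results are only as good as the assumed f and perf(r). The model shows how the trade-off can be reasoned about, not which organization will win in 2017.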

Page 14

Chip organization in 2017: memory and interconnection network

• How will the latency and bandwidth problems be addressed?

• 3D-integration-aware computer architecture: it is a great future idea. Will it always be a great future idea?

• What is the best many-core interconnect topology?

• How can we evaluate the importance of the interconnection network for applications?

• What obstacles do parallel applications face when I/O does not scale well?

Microsoft Workshop on Multicore, Seattle, June 2007
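Hop-count diameter is one simple first metric for comparing the topologies asked about above. The sketch computes it by breadth-first search for 64 cores connected three ways: as a ring, an 8x8 mesh and an 8x8 torus. The graph builders are hypothetical helpers. Diameter also ignores bandwidth, contention and wire length, which matter just as much for a real many-core interconnect.

```python
from collections import deque


def ring(n: int) -> dict:
    """Bidirectional ring of n nodes."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def mesh(k: int, wrap: bool = False) -> dict:
    """k x k 2D mesh; with wrap=True the edges wrap around, giving a 2D torus."""
    adj = {}
    for x in range(k):
        for y in range(k):
            neighbours = []
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if wrap:
                    nx, ny = nx % k, ny % k
                if 0 <= nx < k and 0 <= ny < k:
                    neighbours.append((nx, ny))
            adj[(x, y)] = neighbours
    return adj


def diameter(adj: dict) -> int:
    """Worst-case hop count between any two nodes (BFS from every node)."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


# 64 cores arranged three ways: ring, 8x8 mesh, 8x8 torus.
print(diameter(ring(64)), diameter(mesh(8)), diameter(mesh(8, wrap=True)))  # 32 14 8
```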

Page 15

An overall picture of the Microsoft Many-core project

[Figure: Applications; Functional and Imperative programming models; Transactional Memory (STM, HTM); Architecture]

• Programming models for future many-core architectures

• Architectural support for programming models

• OpenMP+TM

• HW acceleration for Haskell

• Many-core architecture

• Power-aware

Page 16

An overall picture of the IBM MareIncognito project

• Our 10-100 Petaflop research project for BSC (2010)

• Port/develop applications to reduce time-to-production once installed

• Programming models (MPI, OpenMP+TM, CellSs)

• Tools for application development and to support previous evaluations

• Evaluate node architecture (heavily multicored)

• Evaluate interconnect options

[Figure: project areas: Performance analysis and prediction tools; Processor and node; Load balancing; Interconnect; Application development and tuning; Fine-grain programming models; Model and prototype]

Page 17

Supercomputing and e-Science Consolider program

• 5 Grand Challenge applications

• 22 groups

• 119 senior researchers

[Figure: application areas (Earth Sciences, Astrophysics, Engineering, Material Sciences, Life Sciences) linked to computer science areas (Compilers and tuning of application kernels; Programming models and performance tuning tools; Architectures and hardware technologies); legend: strong interaction, interaction to be created]

Page 18

Education for multi-core

[Figure: “I programming multicores”; multicore-based pacifier]