
1

Third Software Crisis: The Multicore Problem

Saman Amarasinghe, Massachusetts Institute of Technology

2

The “Software Crisis”

“To put it quite bluntly: as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem."

-- E. Dijkstra, 1972 Turing Award Lecture

3

The First Software Crisis

• Time Frame: ’60s and ’70s

• Problem: Assembly Language Programming
– Computers could handle larger, more complex programs

• Needed to get Abstraction and Portability without losing Performance

4

How Did We Solve the First Software Crisis?

• High-level languages for von-Neumann machines
– FORTRAN and C

• Provided “common machine language” for uniprocessors

Common properties of uniprocessors:
– Single flow of control
– Single memory image

Differences:
– Register File
– ISA
– Functional Units

5

The Second Software Crisis

• Time Frame: ’80s and ’90s

• Problem: Inability to build and maintain complex and robust applications requiring multi-million lines of code developed by hundreds of programmers
– Computers could handle larger, more complex programs

• Needed to get Composability, Malleability and Maintainability
– High performance was not an issue: it was left to Moore’s Law

6

How Did We Solve the Second Software Crisis?

• Object Oriented Programming
– C++, C# and Java

• Also…
– Better tools: component libraries, Purify
– Better software engineering methodology: design patterns, specification, testing, code reviews

7

The OO Revolution

• Object Oriented revolution did not come out of a vacuum

• Hundreds of small experimental languages

• Rely on lessons learned from lesser-known languages
– C++ grew out of C, Simula, and other languages
– Java grew out of C++, Eiffel, SmallTalk, Objective C, and Cedar/Mesa [1]

• Depend on results from research community

[1] J. Gosling, H. McGilton, The Java Language Environment

8

Object Oriented Languages

• ABCL • Ada 95 • BETA • Boo • C++ • C# • ColdFusion • Common Lisp • COOL (Object Oriented COBOL) • CorbaScript • Clarion • Corn • D • Dylan • Eiffel • F-Script • Fortran 2003 • Gambas • Graphtalk • IDLscript • incr Tcl • J • JADE

• Java • Lasso • Lava • Lexico • Lingo • Modula-2 • Modula-3 • Moto • Nemerle • NetRexx • Nuva • Oberon (Oberon-1) • Object REXX • Objective-C • Objective Caml • Object Pascal (Delphi) • Oz • Perl 5 • PHP • Pliant • PRM • PowerBuilder

• Python • REALbasic • Revolution • Ruby • Scala • Simula • Smalltalk • Self • Squeak • Squirrel • STOOP (Tcl extension) • Superx++ • TADS • Ubercode • Visual Basic • Visual FoxPro • Visual Prolog • Tcl • ZZT-oop

Source: Wikipedia

9

Language Evolution: From FORTRAN to a few present-day languages

Source: Eric Levenez

10

Origins of C++

[Figure: lineage of C++ on a timeline from 1960 to 1990, distinguishing structural influence from feature influence. Languages shown: Fortran, Algol 60, CPL, BCPL, C, ANSI C, Simula 67, C with Classes, C++, C++arm, C++std, ML, Clu, Algol 68, Ada.]

Source: B. Stroustrup, The Design and Evolution of C++

11

Academic Influence on C++

“Exceptions were considered in the original design of C++, but were postponed because there wasn't time to do a thorough job of exploring the design and implementation issues.

In retrospect, the greatest influence on the C++ exception handling design was the work on fault-tolerant systems started at the University of Newcastle in England by Brian Randell and his colleagues and continued in many places since.”

-- B. Stroustrup, A History of C++

12

Origins of Java

• Java grew out of C++, Eiffel, SmallTalk, Objective C, and Cedar/Mesa

• Example lessons learned:
– Stumbling blocks of C++ removed (multiple inheritance, preprocessing, operator overloading, automatic coercion, etc.)
– Pointers removed based on studies of bug injection
– GOTO removed based on studies of usage patterns
– Objects based on Eiffel, SmallTalk
– Java interfaces based on Objective C protocols
– Synchronization follows monitor and condition variable paradigm (introduced by Hoare, implemented in Cedar/Mesa)
– Bytecode approach validated as early as UCSD P-System (’70s)

Lesser-known precursors essential to Java’s success

Source: J. Gosling, H. McGilton, The Java Language Environment

13

Role of Research in Language Design

1. Invent new features and paradigms

2. Simplify or eliminate bad features

3. Experimental evaluation

14

1. Invent New Features

Programming Concept | First Demonstration
Abstract data types | Simula 67
Assignment operator overloading | C++
BNF (Backus-Naur Form) | Algol 60 syntax
Block nesting with scope | Algol 60
Case statement | Algol W
Chained comparisons | BCPL
Class | Simula 67
Closure | Lisp
Comments | Cobol
Compound statements | Algol 58
Exception handling | PL/I
Explicit typing | Algol 58
Garbage collection | Lisp
Heap allocation | Lisp
Hygienic macros | Scheme R4RS
Inheritance | Simula 67
Lazy evaluation | ISWIM (influencing Haskell)
List comprehension | KRC (influencing Haskell)
Macros | Cobol
Monads | Simula 67
Modules | Modula-2
Object-oriented Programming | Haskell
Operator overloading | Simula 67
Orthogonality | Algol 68
Parametric Polymorphism | Algol 68
Pass by name | ML
Pass by value | Algol 60
Pass by value/result | Fortran
Pattern matching | Algol W
Pointer datatype | Hope (influencing ML)
References | PL/I
Separate compilation | Algol 68
Stack allocation | Fortran II
Stack dynamic variables | Algol 58
Static allocation | Algol 60
Structures (records) | Fortran
Type classes | Cobol
Type inference | Haskell
User-defined data types | ML
Variable decl anywhere in block | C++ (Cfront)
Using C as portable assembler | Algol 68

Source: Pascal Rigaux, Mandriva

15

2. Remove Bad Features

• E. Dijkstra, Go To Statement Considered Harmful

• B.W. Kernighan, Why Pascal is Not my Favorite Programming Language

• T.A. Cargill, The Case Against Multiple Inheritance in C++

• E. A. Lee, The Problem With Threads

• H.J. Boehm, Threads Cannot be Implemented as a Library

• O. Shivers, Higher-order Control-flow Analysis in Retrospect: Lessons Learned, Lessons Abandoned

• Lowe & Ericsson, Component Adaptations and Optimisations Considered Harmful

• F. Steimann, Why Most Domain Models are Aspect Free

16

3. Experimental Evaluation

• To advance the state of the art in programming languages, we need to evaluate the practical impact of ideas

“Multiple inheritance is a great theory that in practice creates as many problems as it solves.”

-- LabVIEW Object-Oriented Programming: The Decisions Behind the Design

17

The Third Software Crisis

• Time Frame: 2010 to ??

• Problem: Sequential performance is left behind by Moore’s law

18

Moore’s Law

[Figure: number of transistors (log scale, 10,000 to 1,000,000,000) by year, 1978 to 2016, for the 8086, 286, 386, 486, Pentium, P2, P3, P4, Itanium and Itanium 2, plotted together with performance relative to the VAX-11/780 (log scale, 1 to 100,000), whose trend segments are labeled 25%/year, 52%/year and ??%/year.]

From David Patterson

From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006


20

Uniprocessor Performance (SPECint)

[Figure: performance relative to the VAX-11/780 (log scale, 1 to 100,000) by year, 1978 to 2016, with trend segments labeled 25%/year, 52%/year and ??%/year, plotted together with the number of transistors (log scale, 10,000 to 1,000,000,000) for the 8086, 286, 386, 486, Pentium, P2, P3, P4, Itanium and Itanium 2.]

• General-purpose unicores have stopped historic performance scaling
– Power consumption
– Wire delays
– DRAM access latency
– Diminishing returns of more instruction-level parallelism

From David Patterson

From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006
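The difference between the 25%/year and 52%/year growth rates on the chart compounds quickly. As a rough illustration only, the sketch below computes the cumulative speedup at each rate over a hypothetical ten-year window. The window length and the function name `cumulative_speedup` are assumptions made for this example, not values read from the chart.

```python
def cumulative_speedup(annual_rate: float, years: int) -> float:
    """Total speedup after compounding `annual_rate` growth for `years` years."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    YEARS = 10  # hypothetical window, chosen only for illustration
    for rate in (0.25, 0.52):
        speedup = cumulative_speedup(rate, YEARS)
        print(f"{rate:.0%}/year over {YEARS} years: {speedup:.1f}x")
```

Over such a window the slower rate yields roughly a 9x improvement and the faster rate roughly 66x, which is why a drop in the annual rate matters so much to software that relied on sequential speedups.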

21

The Third Software Crisis

• Time Frame: 2010 to ??

• Problem: Sequential performance is left behind by Moore’s law

• Needed continuous and reasonable performance improvements
– to support new features
– to support larger datasets

• While sustaining portability, malleability and maintainability without unduly increasing complexity faced by the programmer

→ critical to keep up with the current rate of evolution in software


24

The Third Software Crisis

How to program multicores?

[Figure: number of cores per chip (log scale, 1 to 512) by year, 1970 to 20??. Processors shown: 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, P2, P3, P4, Itanium, Itanium 2, Athlon, Raw, Power4, Opteron, Power6, Niagara, Yonah, PExtreme, Tanglewood, Cell, Intel Tflops, Xbox360, Cavium Octeon, Raza XLR, PA-8800, Cisco CSR-1, Picochip PC102, Broadcom 1480, Opteron 4P, Xeon MP, Ambric AM2045.]

25

How Are We Going to Solve the Third Software Crisis?

1. Build and mobilize the community

2. Seed the proliferation of research into new
– Programming Models
– Languages
– Compilers
– Tools

3. Facilitate Testing and Feedback for Rapid Evolution

4. Create a framework to identify and collect winning ideas for standardization and adoption

5. Support the migration of the dusty deck to the new paradigm

26

1. Build and Mobilize the Community

• Bring the High Performance Languages/Compilers/Tools folks out of the woodwork!
– Then: A few customers with the goal of performance at any cost.
– Then: Had to compete with Moore’s law
– Now: Reasonable performance improvements for the masses

• Bring the folks who won the second crisis
– Then: the focus was improving programmer productivity
– Now: how to maintain performance in a multicore world
– Now: If not solved, all the productivity gains will be lost!


28

2. Proliferation of Research

• Need to generate a lot of new ideas
– We haven’t done much in the last decade!

• Need to look into the areas of
– Programming Models
– Languages
– Compilers
– Tools

29

Experience from High Performance Languages

• In the early ’90s
– HPF, FORTRAN-D etc.
– Low productivity languages
– Ended up low performance languages!
– Architects (and compiler writers) are not good at designing languages

• The next decade
– StreamIt, Brook, Cilk etc.
– Very little work, not a priority

• Now
– HPC effort: Fortress, X10 etc.
– Trying to achieve a lot without many positive or recent examples to follow

30

Why New Languages?

• Paradigm shift in architecture
– From sequential to multicore
– Need a new “common machine language”

• New application domains
– Streaming
– Scripting
– Event-driven (real-time)

• New hardware features
– Transactions
– Introspection
– Scalar Operand Networks or Core-to-core DMA

• New customers
– Mobile devices
– The average programmer!

• Can we achieve parallelism without burdening the programmer?

31

How to design a new language

• Language design is an art form

• A good language
– Is powerful
– Is usable by novice to expert
– Supports abstraction
– Supports modularization
– Supports portability
– Supports malleability
– Increases programmer productivity

• Need to understand and have a lot of experience in all the concepts
– One bad feature can kill a language

• Need many experimental languages before building one with mass acceptance
– So far, a language with mass acceptance has been a once-a-decade event

32

Domain Specific Languages

• There is no single programming domain!
– Many programs don’t fit the OO model (example: streaming)

• Need to identify new programming models/domains
– Develop domain specific end-to-end systems
– Develop languages, tools, applications and a body of knowledge

• Stitching multiple domains together is a hard problem
– A central concept in one domain may not exist in another: shared memory is critical for transactions, but not available in streaming
– Need conceptually simple and formally rigorous interfaces
– Need integrated tools
– But critical for many DOD and other applications

33

Compiler-Aware Language Design: StreamIt Experience

• Some programming models are inherently concurrent
– Coding them using a sequential language is harder than using the right parallel abstraction, and all information on inherent parallelism is lost

• There are win-win situations
– Increasing the programmer productivity while extracting parallel performance

[Figure: diagram with three labeled elements]
– Programmability: boost productivity, enable faster development and rapid prototyping
– Domain specific optimizations: simple and effective optimizations for domain specific abstractions
– Enable parallel execution: target tiled architectures, clusters, DSPs, multicores, graphics processors, …
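To make the point about inherently concurrent models concrete, here is a minimal sketch of a stream program written as a chain of independent filters. It is plain Python, not StreamIt syntax, and the names `scale`, `moving_average` and `pipeline` are hypothetical. Each filter communicates only through its input and output streams, so the structure a compiler would need in order to place stages on separate cores stays visible in the source. The sketch itself runs sequentially.

```python
from typing import Callable, Iterable, Iterator, List

# A filter consumes one stream of numbers and produces another.
Filter = Callable[[Iterator[float]], Iterator[float]]


def scale(factor: float) -> Filter:
    """Stateless filter: multiply every item by `factor`."""
    def run(items: Iterator[float]) -> Iterator[float]:
        for x in items:
            yield x * factor
    return run


def moving_average(window: int) -> Filter:
    """Sliding-window filter: average of the last `window` items."""
    def run(items: Iterator[float]) -> Iterator[float]:
        buf: List[float] = []
        for x in items:
            buf.append(x)
            if len(buf) == window:
                yield sum(buf) / window
                buf.pop(0)  # slide the window forward by one item
    return run


def pipeline(*stages: Filter) -> Callable[[Iterable[float]], Iterator[float]]:
    """Compose filters so that each stage feeds the next."""
    def run(items: Iterable[float]) -> Iterator[float]:
        stream: Iterator[float] = iter(items)
        for stage in stages:
            stream = stage(stream)
        return stream
    return run
```

For example, `list(pipeline(scale(2.0), moving_average(3))([1, 2, 3, 4, 5]))` evaluates to `[4.0, 6.0, 8.0]`. Written as one hand-fused sequential loop, the same computation would hide the stage boundaries that this form keeps explicit.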

34

Parallelizing Compilers: SUIF Experience

• Automatic Parallelism is not impossible
– Can work well in many domains (example: ILP)

• Automatic Parallelism for multiprocessors “almost” worked in the ’90s
– SUIF compiler got the Best SPEC results by automatic parallelization

• But…
– The compilers were not robust
– Clients were impossible (performance at any cost)
– Multiprocessor communication was expensive
– Had to compete with improvements in sequential performance
– The Dogfooding problem

35

Compilers

• Compilers are critical in reducing the burden on programmers
– Identification of data parallel loops can be easily automated, but many current systems (Brook, PeakStream) require the programmer to do it.

• Need to revive the push for automatic parallelization
– Best case: totally automated parallelization hidden from the user
– Worst case: simplify the task of the programmer
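As a small illustration of what a data parallel loop looks like, the sketch below contrasts a loop whose iterations are independent with one that carries a dependence between iterations. All function names are hypothetical, and the thread pool stands in for whatever a parallelizing compiler or runtime would actually generate. In CPython, threads are not expected to speed up this CPU-bound work, so the example shows the transformation rather than a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def saxpy_sequential(a: float, xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Data parallel loop: iteration i reads xs[i], ys[i] and writes only out[i]."""
    out = [0.0] * len(xs)
    for i in range(len(xs)):
        out[i] = a * xs[i] + ys[i]
    return out


def saxpy_parallel(a: float, xs: Sequence[float], ys: Sequence[float],
                   workers: int = 4) -> List[float]:
    """Same computation as a parallel map; valid because iterations are independent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with the inputs.
        return list(pool.map(lambda pair: a * pair[0] + pair[1], zip(xs, ys)))


def prefix_sum(xs: Sequence[float]) -> List[float]:
    """Not data parallel as written: each iteration needs the previous total."""
    out: List[float] = []
    total = 0.0
    for x in xs:
        total += x  # loop-carried dependence on `total`
        out.append(total)
    return out
```

The first loop can be recognized mechanically because no iteration reads a value written by another one. The second cannot be split the same way without changing the algorithm.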

36

Tools

• A lot of progress in tools to improve programmer productivity

• Need tools to
– Identify parallelism
– Debug parallel code
– Update and maintain parallel code
– Stitch multiple domains together

• Need an “Eclipse platform for multicores”

37

3. Facilitate Evaluation and Feedback for Rapid Evolution

[Figure: feedback loop for a language/compiler/tools idea with stages Idea, Implementation and Evaluation, where Evaluation comprises Develop a Program, Functional Debugging, Performance Debugging and Evaluate.]


40

The Dogfooding Problem: CAD Tools vs. OO Languages

• CAD Tools
– Universally hated by the users
– Only a few can hack it
– Very painful to use

• Origins
– Developed by CAD experts
– User community is separate

• Object Oriented Languages
– User friendly
– Universal acceptance
– Use by ordinary programmers
– Huge improvements in programmer productivity

• Origins
– Developed by PL experts
– The compiler is always written using the language/tools
– Rapid feedback

• High Performance Languages
– User community is separate
– Hard to get feedback
– Slow evolution

41

Rapid Evaluation

• Extremely hard to get
– Real users have no interest in flaky tools
– Hard to quantify
– Superficial users vs. deep users will give different feedback
– Fatal flaws as well as amazing uses may not come out immediately

• Need a huge, sophisticated (and expensive) infrastructure
– How to get a lot of application experts to use the system?
– How do you get them to become experts?
– How do you get them to use it for a long time?
– How do you scientifically evaluate?
– How do you get actionable feedback?

• A “Center for Evaluating Multicore Programming Environments”??

42

4. Identify, Collect, Standardize, Adopt

• Good languages/tools cannot be designed by committee

• However, you need a vibrant ecosystem of ideas

• Need a process of natural selection
– Quantify Productivity and Performance
– Competition between multiple teams
– Winner(s) get to design the final language

43

5. Migrate the Dusty Deck

• Help rewrite the huge stack of dusty deck
– Application in use
– Source code available
– Programmer long gone

• Getting the new program to have the same behavior is hard
– “Word pagination problem”

• Can take advantage of many recent advances
– Creating test cases
– Extracting invariants
– Failure oblivious computing

44

Conclusions

• Programming language research is a critical long-term investment
– In the 1950s, the early background for the Simula language was funded by the Norwegian Defense Research Establishment
– In 2002, the designers received the ACM Turing Award “for ideas fundamental to the emergence of object oriented programming.”

• We need to lay the foundation for the programming paradigm of the 21st century

• Switching to multicores without losing the gains in programmer productivity may be the Grandest of the Grand Challenges
– Half a century of work, still no winning solution
– Will affect everyone!

• Need a government-industry partnership to solve this
– Intel is worried: multicores are needed to deliver Moore’s law gains to customers
– Microsoft is concerned: need to program the multicores
– DOD will be affected: from a problem at the labs to one that every programmer needs to tackle (from the smallest embedded device to supercomputers)