Where Architectures and Applications In the

Post on 12-Sep-2021

2 views 0 download

Transcript of Where Architectures and Applications In the

Compiler TechnologyIn the Near and Distant Future

Where Architectures and ApplicationsAre Taking Us

Ken KennedyCenter for High Performance Software

Rice University

http://www.cs.rice.edu/~ken/Presentations/CompilerFuture.pdf

Center for High Performance Software

Center for High Performance Software

My Mentors

Fran AllenJohn Cocke

Jack SchwartzSue GrahamJeff Ullman

Center for High Performance Software

My PhD Students

Randy AllenVasanth BalaDavid CallahanSteve CarrDavid ChaseKeith CooperErvan DarnellRodney FarrowGina GoffMary HallReinhard v. HanxledenPaul HavlakUli KremerLorie LiebrockScott MarksNat McIntosh

Kathryn McKinleyDoug MonkHausi M�llerAllan PorterfieldJay RamanathanCarl RoseneJerry RothAjay SethiTom ShieldsJaspal SubhlokKhalid ThabitLinda TorczonChau-Wen TsengScott WarrenKenny ZadeckLin Zucconi

Center for High Performance Software

Other Collaborators

¥ Faculty and Staff Collaborators

¥ Current and Former Students

Vikram AdveBradley BroomRalph BricknerCorky CartwrightAlok ChoudharyTerry ClarkMani ChandyJack Dongarra

Andrei ErshovHorace FlattIan FosterRob FowlerGeoffrey FoxBob HoodSeema HiranandaniGuohua Jin

Seema HiranandaniChuck KoelbelJohn Mellor-CrummeySanjay RankaDan ReedJoel SaltzBob ThrallDavid Whalley

Don BaumgartnerPreston BriggsZoran BudimlicAlan CarleBen ChaseScott Comer

Ron CytronChen DingMike MauldinPaul MilazzoNenad NedeljkovicDejan Mir�evsky

Mike PalecznyJeff PiperCarrie PorterfieldHariklia TsalapatasJoe Warren

Center for High Performance Software

Context

¥ Explosive Growth of Information TechnologyÑNow represents 20 percent of economy, 35 percent of GDP growthÑEssential to operation of most organizations, especially government

Center for High Performance Software

Context

¥ Explosive Growth of Information TechnologyÑNow represents 20 percent of economy, 35 percent of GDP growthÑEssential to operation of most organizations, especially government

¥ Enormous Demand for Software

¥ Shortage of IT ProfessionalsÑChallenge: double the number of CS graduates

Center for High Performance Software

Context

¥ Explosive Growth of Information TechnologyÑNow represents 20 percent of economy, 35 percent of GDP growthÑEssential to operation of most organizations, especially government

¥ Enormous Demand for Software

¥ Shortage of IT ProfessionalsÑChallenge: double the number of CS graduates

¥ Complex Computer ArchitecturesÑDeep memory hierarchies, high degrees of parallelismÑHeterogeneous, geographically distributed platforms

Ð Changes in performance of nodes and links during execution

Center for High Performance Software

Context

¥ Explosive Growth of Information TechnologyÑNow represents 20 percent of economy, 35 percent of GDP growthÑEssential to operation of most organizations, especially government

¥ Enormous Demand for Software

¥ Shortage of IT ProfessionalsÑChallenge: double the number of CS graduates

¥ Complex Computer ArchitecturesÑDeep memory hierarchies, high degrees of parallelismÑHeterogeneous, geographically distributed platforms

Ð Changes in performance of nodes and links during execution

¥ Complex ApplicationsÑMany diverse components, dynamic, adaptive, unstructured

Center for High Performance Software

Philosophy

¥ Compiler Technology = Off-Line ProcessingÑGoals: improved performance and language usability

Ð Making it practical to use the full power of the language

Center for High Performance Software

Philosophy

¥ Compiler Technology = Off-Line ProcessingÑGoals: improved performance and language usability

Ð Making it practical to use the full power of the languageÑTrade-off: preprocessing time versus execution timeÑRule: performance of both compiler and application must be

acceptable to the end user

Center for High Performance Software

Philosophy

¥ Compiler Technology = Off-Line ProcessingÑGoals: improved performance and language usability

Ð Making it practical to use the full power of the languageÑTrade-off: preprocessing time versus execution timeÑRule: performance of both compiler and application must be

acceptable to the end user

¥ ExamplesÑMacro expansion

Ð PL/I macro facility Ñ 10x improvement with compilation

Center for High Performance Software

Philosophy

¥ Compiler Technology = Off-Line ProcessingÑGoals: improved performance and language usability

Ð Making it practical to use the full power of the languageÑTrade-off: preprocessing time versus execution timeÑRule: performance of both compiler and application must be

acceptable to the end user

¥ ExamplesÑMacro expansion

Ð PL/I macro facility Ñ 10x improvement with compilationÑQuery processing

Ð Dramatic improvement in speed through planning

Center for High Performance Software

Philosophy

¥ Compiler Technology = Off-Line ProcessingÑGoals: improved performance and language usability

Ð Making it practical to use the full power of the languageÑTrade-off: preprocessing time versus execution timeÑRule: performance of both compiler and application must be

acceptable to the end user

¥ ExamplesÑMacro expansion

Ð PL/I macro facility Ñ 10x improvement with compilationÑQuery processing

Ð Dramatic improvement in speed through planningÑCommunication planning in dynamic applications

Ð Develop efficient communication schedules at run time

Center for High Performance Software

Making Languages Usable

It was our belief that if FORTRAN, during itsfirst months, were to translate any reasonableÒscientificÓ source program into an object programonly half as fast as its hand-coded counterpart,then acceptance of our system would be in seriousdanger... I believe that had we failed to produceefficient programs, the widespread use oflanguages like FORTRAN would have been seriouslydelayed.

Ñ John Backus

Center for High Performance Software

A Java Experiment

¥ Scientific Programming In JavaÑGoal: make it possible to use the full object-oriented power for

scientific applicationsÐ Many scientific implementations mimic Fortran style

Center for High Performance Software

A Java Experiment

¥ Scientific Programming In JavaÑGoal: make it possible to use the full object-oriented power for

scientific applicationsÐ Many scientific implementations mimic Fortran style

¥ OwlPack Benchmark SuiteÑThree versions of LinPACK in Java

Ð Fortran styleÐ Lite object-oriented styleÐ Full polymorphism

No differences for type

Center for High Performance Software

A Java Experiment

¥ Scientific Programming In JavaÑGoal: make it possible to use the full object-oriented power for

scientific applicationsÐ Many scientific implementations mimic Fortran style

¥ OwlPack Benchmark SuiteÑThree versions of LinPACK in Java

Ð Fortran styleÐ Lite object-oriented styleÐ Full polymorphism

No differences for type

¥ ExperimentÑCompare running times for different styles on same Java VMÑEvaluate potential for compiler optimization

Center for High Performance Software

Performance Results

0

5

10

15

20

25

30

35

Run Timein

Secs

dgefa dgesl dgedi

Fortran StyleLite OO StyleOO StyleOptimized OONative F90

Results Using JDK 1.2JIT on SUN Ultra 5

Center for High Performance Software

Preliminary Conclusions

¥ Definition of Application Will Become FuzzyÑKnowledge of the computation will be revealed in stagesÑExamples:

Ð Compilation with input data,Ð Compiler-generated run-time preprocessingÐ Optimization with late binding of target platformÐ Compilation based on predefined component libraries

Center for High Performance Software

Preliminary Conclusions

¥ Definition of Application Will Become FuzzyÑKnowledge of the computation will be revealed in stagesÑExamples:

Ð Compilation with input data,Ð Compiler-generated run-time preprocessingÐ Optimization with late binding of target platformÐ Compilation based on predefined component libraries

¥ Performance Will Be More ElusiveÑEven reliable performance will be hard to achieveÑCompiler will need to be even more heroic,

Ð Yet programmer will continue to want control

Center for High Performance Software

Preliminary Conclusions

¥ Definition of Application Will Become FuzzyÑKnowledge of the computation will be revealed in stagesÑExamples:

Ð Compilation with input data,Ð Compiler-generated run-time preprocessingÐ Optimization with late binding of target platformÐ Compilation based on predefined component libraries

¥ Performance Will Be More ElusiveÑEven reliable performance will be hard to achieveÑCompiler will need to be even more heroic,

Ð Yet programmer will continue to want control

¥ Compilers Structure Will Be More FlexibleÑCompilation will be carried out in stages

Center for High Performance Software

Compiling with Data

CompilerCompiler

ApplicationApplication

ProgramProgram

Center for High Performance Software

Compiling with Data

CompilerCompiler

Reduced ApplicationReduced Application

ProgramProgram

Slowly-Changing Data

Slowly-Changing Data

Center for High Performance Software

Compiling with Data

CompilerCompiler

Reduced ApplicationReduced Application

ProgramProgram

Rapidly-Changing Data

Rapidly-Changing Data

Slowly-Changing Data

Slowly-Changing Data

AnswersAnswers

Center for High Performance Software

Run-Time Compilation

CompilerCompiler

ApplicationApplication

ProgramProgram

Center for High Performance Software

Run-Time Compilation

CompilerCompiler

ApplicationApplication

ProgramProgram

Slowly-Changing Data

Slowly-Changing Data Pre-OptimizerPre-Optimizer

Center for High Performance Software

Run-Time Compilation

CompilerCompiler

ApplicationApplication

ProgramProgram

Rapidly-Changing Data

Rapidly-Changing Data

Slowly-Changing Data

Slowly-Changing Data

AnswersAnswers

Pre-OptimizerPre-Optimizer

Center for High Performance Software

Bandwidth as Limiting Factor

¥ Program and Machine BalanceÑProgram Balance: Average number of bytes that must be

transferred in memory per floating point operationÑMachine Balance: Average number of bytes the machine can

transfer from memory per floating point operation

Center for High Performance Software

Bandwidth as Limiting Factor

¥ Program and Machine BalanceÑProgram Balance: Average number of bytes that must be

transferred in memory per floating point operationÑMachine Balance: Average number of bytes the machine can

transfer from memory per floating point operation

Applications Flops L1ÐReg L2ÐL1 MemÐL2Convolution 1 6.4 5.1 5.2Dmxpy 1 8.3 8.3 8.4Mmjki (o2) 1 24.0 8.2 5.9FFT 1 8.3 3.0 2.7SP 1 10.8 6.4 4.9Sweep3D 1 15.0 9.1 7.8SGI Origin 1 4 4 0.8

Center for High Performance Software

Cache and Bandwidth

Center for High Performance Software

Cache and Bandwidth

Memory

L2 Cache 128 Bytes6.25 % Utilization

Center for High Performance Software

Cache and Bandwidth

Memory

L1 Cache 32 Bytes

25 % Utilization

L2 Cache 128 Bytes6.25 % Utilization

Center for High Performance Software

Cache and Bandwidth

Memory

L1 Cache 32 Bytes

25 % Utilization

L2 Cache 128 Bytes6.25 % Utilization

Register 8 Bytes

100 % Utilization

Center for High Performance Software

Dynamic Data Packing

¥ Suppose the Calculation is IrregularÑExample: Molecular Dynamics

Ð Force calculations (pairs of forces)Ð Updating locations (single force per update)

Center for High Performance Software

Dynamic Data Packing

¥ Suppose the Calculation is IrregularÑExample: Molecular Dynamics

Ð Force calculations (pairs of forces)Ð Updating locations (single force per update)

¥ StrategyÑDynamically reorganize data

Ð So locations used together are updated togetherÑDynamically reorganize interactions

Ð So indirect accesses are not neededÑExample: Òfirst touchÓ

Ð Assign elements to cache lines in order of first touch by pairscalculation

Center for High Performance Software

First-Touch Ordering

P1 P2P3P4P5

Original Ordering

Center for High Performance Software

First-Touch Ordering

P1 P1

P2

P1

P3 P4

P2 P2

P3 P5

InteractionPairs

P1 P2P3P4P5

Original Ordering

Center for High Performance Software

First-Touch Ordering

P1 P2P3P4P5

Original Ordering

P1 P1

P2

P1

P3 P4

P2 P2

P3 P5

InteractionPairs

P1 P2 P3 P4 P5

First-Touch Ordering

Center for High Performance Software

Performance Results 1

0

0 . 1

0 . 2

0 . 3

0 . 4

0 . 5

0 . 6

0 . 7

0 . 8

0 . 9

1

Exe. time L1 misses L2 misses TLB misses

Moldyn

original data regrouping base packing opt packing

Center for High Performance Software

Performance Results 2

0

0 . 1

0 . 2

0 . 3

0 . 4

0 . 5

0 . 6

0 . 7

0 . 8

0 . 9

1

Exe. time L1 misses L2 misses TLB misses

Magi

original data regrouping base packing opt packing

Center for High Performance Software

Irregular Multilevel Blocking

¥ Associate a tuple of block numbers with each particleÑOne block number per level of the memory hierarchy

Ð Block number = selected bits of particle address

Particle address

L2 block number

L1 block number

TLB block number

A B C

Center for High Performance Software

Irregular Multilevel Blocking

¥ Associate a tuple of block numbers with each particleÑOne block number per level of the memory hierarchy

Ð Block number = selected bits of particle address

Particle address

L2 block number

L1 block number

TLB block number

¥ For an interaction pair, interleave block numbers for particles

A B C

A B CA B C

¥ Sorting by composite block number Ô multi-level blocking

Center for High Performance Software

Dynamic Optimization

CompilerCompiler

ApplicationApplication

ProgramProgram

Center for High Performance Software

Dynamic Optimization

CompilerCompiler

ApplicationApplication

ProgramProgram

ConfigurationAnd Data

ConfigurationAnd Data

Dynamic Optimizer(Optimizing Loader)

Dynamic Optimizer(Optimizing Loader)

Center for High Performance Software

Dynamic Optimization

CompilerCompiler

ApplicationApplication

ProgramProgram

Rapidly-Changing Data

Rapidly-Changing Data

ConfigurationAnd Data

ConfigurationAnd Data

AnswersAnswers

Dynamic Optimizer(Optimizing Loader)

Dynamic Optimizer(Optimizing Loader)

Center for High Performance Software

Grid Compilation Architecture

¥ Goal: reliable performance under varying load

Whole-ProgramCompiler

Libraries

DynamicOptimizer

Real-timePerformance

Monitor

PerformanceProblem

ServiceNegotiator

Scheduler

GridRuntimeSystem

SourceAppli-cation

Config-urableObject

Program

SoftwareComponent s

Performance Feedback

Negotiation

GrADS Project: Berman, Chien, Cooper, Dongarra, Foster, Gannon, Johnsson, Kennedy, Kesselman, Reed, Torczon, Wolski

Center for High Performance Software

Programming Productivity

¥ ChallengesÑprogramming is hardÑprofessional programmers are in short supplyÑhigh performance will continue to be important

Center for High Performance Software

Programming Productivity

¥ ChallengesÑprogramming is hardÑprofessional programmers are in short supplyÑhigh performance will continue to be important

¥ One Strategy: Make the End User a ProgrammerÑprofessional programmers develop componentsÑusers integrate components using:

Ð problem-solving environments (PSEs)Ð scripting languages (possibly graphical)

examples: Visual Basic, Tcl/Tk, AVS, Khoros

Center for High Performance Software

Programming Productivity

¥ ChallengesÑprogramming is hardÑprofessional programmers are in short supplyÑhigh performance will continue to be important

¥ One Strategy: Make the End User a ProgrammerÑprofessional programmers develop componentsÑusers integrate components using:

Ð problem-solving environments (PSEs)Ð scripting languages (possibly graphical)

examples: Visual Basic, Tcl/Tk, AVS, Khoros

¥ Compilation for High PerformanceÑtranslate scripts and components to common intermediate languageÑoptimize the resulting program using interprocedural methods

Center for High Performance Software

Script-Based Programming

Component Library

Component Library

User LibraryUser Library

ScriptScript

Center for High Performance Software

Script-Based Programming

Component Library

Component Library

User LibraryUser Library

ScriptScript

IntermediateCode

IntermediateCodeTranslatorTranslator

Center for High Performance Software

Script-Based Programming

Component Library

Component Library

User LibraryUser Library

ScriptScript

IntermediateCode

IntermediateCode

GlobalOptimizerGlobal

Optimizer

TranslatorTranslator

Center for High Performance Software

Code GeneratorCode

Generator

Script-Based Programming

Component Library

Component Library

User LibraryUser Library

ScriptScript

IntermediateCode

IntermediateCode

GlobalOptimizerGlobal

Optimizer

TranslatorTranslator

Center for High Performance Software

Telescoping Languages

L1 Class Library

L1 Class Library

Center for High Performance Software

Telescoping Languages

L1 Class Library

L1 Class Library L0 CompilerL0 Compiler

L1 CompilerL1 Compiler

Center for High Performance Software

Telescoping Languages

L1 Class Library

L1 Class Library

L2 ClassLibrary

L2 ClassLibrary

L0 CompilerL0 Compiler

L1 CompilerL1 Compiler

L2 CompilerL2 Compiler

Center for High Performance Software

Telescoping Languages

L1 Class Library

L1 Class Library

L2 ClassLibrary

L2 ClassLibrary

ScriptScript

L0 CompilerL0 Compiler

L1 CompilerL1 Compiler

L2 CompilerL2 CompilerScriptTranslatorScript

TranslatorOptimizedApplicationOptimizedApplication

Center for High Performance Software

Telescoping Languages: Advantages

¥ Compile times can be reasonableÑMore compilation time can be spent on libraries

Ð Amortized over many usesÑScript compilations can be fast

Ð Components reused from scripts may be included in libraries

Center for High Performance Software

Telescoping Languages: Advantages

¥ Compile times can be reasonableÑMore compilation time can be spent on libraries

Ð Amortized over many usesÑScript compilations can be fast

Ð Components reused from scripts may be included in libraries

¥ Compilation at script level can be localÑContext-based optimization of library calls can be achieved through

Ð Inlining, conditional inlining, compiler entry selectionÑSide-effect information can be determined via jump function tables

Ð Encapsulated at an earlier compilation stageÑSensitive library source need only be selectively revealed

Center for High Performance Software

Telescoping Languages: Advantages

¥ Compile times can be reasonableÑMore compilation time can be spent on libraries

Ð Amortized over many usesÑScript compilations can be fast

Ð Components reused from scripts may be included in libraries

¥ Compilation at script level can be localÑContext-based optimization of library calls can be achieved through

Ð Inlining, conditional inlining, compiler entry selectionÑSide-effect information can be determined via jump function tables

Ð Encapsulated at an earlier compilation stageÑSensitive library source need only be selectively revealed

¥ User can retain substantive control over application performance

Center for High Performance Software

Example: HPF Revisited

HPFTranslator

HPFTranslator

Fortran 90Program

Fortran 90Program

GlobalOptimizerGlobal

OptimizerMPI Code GeneratorMPI Code Generator

Center for High Performance Software

Example: HPF Revisited

DistributionLibrary

DistributionLibrary

HPFTranslator

HPFTranslator

Fortran 90Program

Fortran 90Program

GlobalOptimizerGlobal

OptimizerMPI Code GeneratorMPI Code Generator

DistributionPrecompilerDistributionPrecompiler

Center for High Performance Software

Example: HPF Revisited

DistributionLibrary

DistributionLibrary

HPFTranslator

HPFTranslator

Fortran 90Program

Fortran 90Program

GlobalOptimizerGlobal

OptimizerMPI Code GeneratorMPI Code Generator

DistributionPrecompilerDistributionPrecompiler

Distribute (HilbertLib): A,BDo i = 1,100

A(i) = B(i) + CEnddo

A.putBlock(1,100, B.getBlock(1,100) + C)

Center for High Performance Software

Flexible Compiler Architecture

¥ Flexible Definition of ComputationÑParameters

Ð program schemeÐ base library sequence (l1, l2, É, lp)Ð subprogram source files (s1, s2, ..., sn)Ð run history (r1, r2, ..., rk)Ð data sets (d1, d2, ..., dm)Ð target configuration

Center for High Performance Software

Flexible Compiler Architecture

¥ Flexible Definition of ComputationÑParameters

Ð program schemeÐ base library sequence (l1, l2, É, lp)Ð subprogram source files (s1, s2, ..., sn)Ð run history (r1, r2, ..., rk)Ð data sets (d1, d2, ..., dm)Ð target configuration

¥ Compilation = Partial EvaluationÑseveral compilation steps as information becomes available

Center for High Performance Software

Flexible Compiler Architecture

¥ Flexible Definition of ComputationÑParameters

Ð program schemeÐ base library sequence (l1, l2, É, lp)Ð subprogram source files (s1, s2, ..., sn)Ð run history (r1, r2, ..., rk)Ð data sets (d1, d2, ..., dm)Ð target configuration

¥ Compilation = Partial EvaluationÑseveral compilation steps as information becomes available

¥ Program ManagementÑWhen to back out of previous compilation decisions due to changeÑWhen to invalidate certain inputs

Ð Examples: change in library or run history

Center for High Performance Software

Summary

¥ Compilation = Off-Line ProcessingÑGoal: improved performance

Center for High Performance Software

Summary

¥ Compilation = Off-Line ProcessingÑGoal: improved performance

¥ Optimization Enables Language PowerÑPrinciple: encourage rather than discourage use of powerful

featuresÐ Good programming practice should be rewarded

Center for High Performance Software

Summary

¥ Compilation = Off-Line ProcessingÑGoal: improved performance

¥ Optimization Enables Language PowerÑPrinciple: encourage rather than discourage use of powerful

featuresÐ Good programming practice should be rewarded

¥ Target Platforms, Languages, and Apps Becoming More ComplexÑPlatforms: Parallel, heterogeneous, deep memory hierarchiesÑApplications: dynamic, irregular, extensive use of domain librariesÑProgramming: component development, system composition

Center for High Performance Software

Summary

¥ Compilation = Off-Line ProcessingÑGoal: improved performance

¥ Optimization Enables Language PowerÑPrinciple: encourage rather than discourage use of powerful

featuresÐ Good programming practice should be rewarded

¥ Target Platforms, Languages, and Apps Becoming More ComplexÑPlatforms: Parallel, heterogeneous, deep memory hierarchiesÑApplications: dynamic, irregular, extensive use of domain librariesÑProgramming: component development, system composition

¥ Compiler Structure Will Be Correspondingly ComplexÑPartial evaluation in stages with incremental informationÑMechanisms for graceful response to change