Matching Data Intensive Applications and Hardware/Software Architectures
Compiler Technology in the Near and Distant Future
Where Architectures and Applications Are Taking Us
Ken Kennedy
Center for High Performance Software
Rice University
http://www.cs.rice.edu/~ken/Presentations/CompilerFuture.pdf
My Mentors
Fran Allen, John Cocke
Jack Schwartz, Sue Graham, Jeff Ullman
My PhD Students
Randy Allen, Vasanth Bala, David Callahan, Steve Carr, David Chase, Keith Cooper, Ervan Darnell, Rodney Farrow, Gina Goff, Mary Hall, Reinhard v. Hanxleden, Paul Havlak, Uli Kremer, Lorie Liebrock, Scott Marks, Nat McIntosh,
Kathryn McKinley, Doug Monk, Hausi Müller, Allan Porterfield, Jay Ramanathan, Carl Rosene, Jerry Roth, Ajay Sethi, Tom Shields, Jaspal Subhlok, Khalid Thabit, Linda Torczon, Chau-Wen Tseng, Scott Warren, Kenny Zadeck, Lin Zucconi
Other Collaborators
• Faculty and Staff Collaborators
  Vikram Adve, Bradley Broom, Ralph Brickner, Corky Cartwright, Alok Choudhary, Terry Clark, Mani Chandy, Jack Dongarra, Andrei Ershov, Horace Flatt, Ian Foster, Rob Fowler, Geoffrey Fox, Bob Hood, Seema Hiranandani, Guohua Jin, Chuck Koelbel, John Mellor-Crummey, Sanjay Ranka, Dan Reed, Joel Saltz, Bob Thrall, David Whalley
• Current and Former Students
  Don Baumgartner, Preston Briggs, Zoran Budimlic, Alan Carle, Ben Chase, Scott Comer, Ron Cytron, Chen Ding, Mike Mauldin, Paul Milazzo, Nenad Nedeljkovic, Dejan Mirčevsky, Mike Paleczny, Jeff Piper, Carrie Porterfield, Hariklia Tsalapatas, Joe Warren
Context
• Explosive Growth of Information Technology
  - Now represents 20 percent of economy, 35 percent of GDP growth
  - Essential to operation of most organizations, especially government
• Enormous Demand for Software
• Shortage of IT Professionals
  - Challenge: double the number of CS graduates
• Complex Computer Architectures
  - Deep memory hierarchies, high degrees of parallelism
  - Heterogeneous, geographically distributed platforms
    - Changes in performance of nodes and links during execution
• Complex Applications
  - Many diverse components, dynamic, adaptive, unstructured
Philosophy
• Compiler Technology = Off-Line Processing
  - Goals: improved performance and language usability
    - Making it practical to use the full power of the language
  - Trade-off: preprocessing time versus execution time
  - Rule: performance of both compiler and application must be acceptable to the end user
• Examples
  - Macro expansion
    - PL/I macro facility: 10x improvement with compilation
  - Query processing
    - Dramatic improvement in speed through planning
  - Communication planning in dynamic applications
    - Develop efficient communication schedules at run time
Making Languages Usable
It was our belief that if FORTRAN, during its first months, were to translate any reasonable "scientific" source program into an object program only half as fast as its hand-coded counterpart, then acceptance of our system would be in serious danger... I believe that had we failed to produce efficient programs, the widespread use of languages like FORTRAN would have been seriously delayed.
- John Backus
A Java Experiment
• Scientific Programming in Java
  - Goal: make it possible to use the full object-oriented power for scientific applications
    - Many scientific implementations mimic Fortran style
• OwlPack Benchmark Suite
  - Three versions of LinPACK in Java
    - Fortran style
    - Lite object-oriented style
    - Full polymorphism
      No differences for type
• Experiment
  - Compare running times for different styles on same Java VM
  - Evaluate potential for compiler optimization
Performance Results
[Figure: bar chart of run time in seconds (axis 0 to 35) for dgefa, dgesl, and dgedi. Series: Fortran Style, Lite OO Style, OO Style, Optimized OO, Native F90.]
Results using JDK 1.2 JIT on SUN Ultra 5
Preliminary Conclusions
• Definition of Application Will Become Fuzzy
  - Knowledge of the computation will be revealed in stages
  - Examples:
    - Compilation with input data
    - Compiler-generated run-time preprocessing
    - Optimization with late binding of target platform
    - Compilation based on predefined component libraries
• Performance Will Be More Elusive
  - Even reliable performance will be hard to achieve
  - Compiler will need to be even more heroic
    - Yet programmer will continue to want control
• Compiler Structure Will Be More Flexible
  - Compilation will be carried out in stages
Compiling with Data
[Diagram boxes: Program, Slowly-Changing Data, Compiler, Reduced Application, Rapidly-Changing Data, Answers]
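One way to read this slide: the compiler folds the slowly-changing data into the program, leaving a reduced application that needs only the rapidly-changing data to produce answers. The following is a minimal Python sketch of that idea, not the talk's implementation. It assumes a hypothetical polynomial-evaluation "application" whose coefficients change slowly and whose evaluation point changes rapidly; the names `general_poly`, `specialize_poly`, and `reduced` are illustrative only.

```python
def general_poly(coeffs, x):
    """General application: both inputs are supplied at run time."""
    result = 0.0
    for c in reversed(coeffs):   # Horner's rule
        result = result * x + c
    return result


def specialize_poly(coeffs):
    """'Compile' with the slowly-changing data (coeffs) held fixed.

    Generates straight-line Python source with the coefficients folded in
    as constants, then compiles it into a reduced function of x only.
    """
    expr = "0.0"
    for c in reversed(coeffs):
        expr = f"({expr}) * x + {c!r}"
    namespace = {}
    exec(f"def reduced(x):\n    return {expr}\n", namespace)
    return namespace["reduced"]


coeffs = [1.0, 2.0, 3.0]                          # slowly-changing data: 1 + 2x + 3x^2
reduced = specialize_poly(coeffs)                 # the reduced application
answers = [reduced(x) for x in (0.0, 1.0, 2.0)]   # rapidly-changing data
```

The loop and the coefficient list disappear from the reduced function; the cost of generating it pays off only if the coefficients stay fixed across many evaluations, which is the trade-off between preprocessing time and execution time.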
Run-Time Compilation
[Diagram boxes: Program, Compiler, Application, Slowly-Changing Data, Pre-Optimizer, Rapidly-Changing Data, Answers]
Bandwidth as Limiting Factor
• Program and Machine Balance
  - Program Balance: average number of bytes that must be transferred in memory per floating point operation
  - Machine Balance: average number of bytes the machine can transfer from memory per floating point operation

Applications   Flops   L1-Reg   L2-L1   Mem-L2
Convolution    1       6.4      5.1     5.2
Dmxpy          1       8.3      8.3     8.4
Mmjki (o2)     1       24.0     8.2     5.9
FFT            1       8.3      3.0     2.7
SP             1       10.8     6.4     4.9
Sweep3D        1       15.0     9.1     7.8
SGI Origin     1       4        4       0.8
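Dividing each program balance by the machine balance at the same level shows where demand exceeds supply. The sketch below uses only the numbers in the table. The function name `bottleneck` is hypothetical, and reading the largest ratio as a bound on the achievable floating-point rate is an assumption that transfers at different levels overlap perfectly.

```python
LEVELS = ("L1-Reg", "L2-L1", "Mem-L2")

# Bytes per flop demanded by each program (from the table).
PROGRAM_BALANCE = {
    "Convolution": (6.4, 5.1, 5.2),
    "Dmxpy": (8.3, 8.3, 8.4),
    "Mmjki (o2)": (24.0, 8.2, 5.9),
    "FFT": (8.3, 3.0, 2.7),
    "SP": (10.8, 6.4, 4.9),
    "Sweep3D": (15.0, 9.1, 7.8),
}

# Bytes per flop the SGI Origin can supply at each level (from the table).
MACHINE_BALANCE = (4.0, 4.0, 0.8)


def bottleneck(program):
    """Return (level, demand/supply ratio) for the most oversubscribed level."""
    ratios = [d / s for d, s in zip(PROGRAM_BALANCE[program], MACHINE_BALANCE)]
    worst = max(range(len(LEVELS)), key=lambda i: ratios[i])
    return LEVELS[worst], ratios[worst]


for name in PROGRAM_BALANCE:
    level, ratio = bottleneck(name)
    print(f"{name:12s} limited by {level:7s} demand/supply = {ratio:.2f}, "
          f"CPU utilization at most {100 / ratio:.0f}%")
```

For every program in the table the memory-to-L2 level comes out as the most oversubscribed, which is the sense in which bandwidth is the limiting factor.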
Cache and Bandwidth
Memory
L2 Cache: 128 bytes, 6.25% utilization
L1 Cache: 32 bytes, 25% utilization
Register: 8 bytes, 100% utilization
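These percentages are consistent with a single 8-byte value being used out of each block transferred. That reading assumes the byte counts are the transfer sizes at each level, which the slide does not state explicitly. A short check of the arithmetic:

```python
USEFUL_BYTES = 8   # one 8-byte value actually consumed

# Bytes moved per transfer at each level, as listed on the slide.
TRANSFER_BYTES = {"L2 Cache": 128, "L1 Cache": 32, "Register": 8}


def utilization(level):
    """Percentage of the transferred bytes that are actually used."""
    return 100.0 * USEFUL_BYTES / TRANSFER_BYTES[level]


for level in TRANSFER_BYTES:
    print(f"{level}: {utilization(level):g}% utilization")
```

Under this reading, a program that touches one value per block wastes most of the memory bandwidth it consumes.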
Dynamic Data Packing
• Suppose the Calculation is Irregular
  - Example: Molecular Dynamics
    - Force calculations (pairs of forces)
    - Updating locations (single force per update)
• Strategy
  - Dynamically reorganize data
    - So locations used together are updated together
  - Dynamically reorganize interactions
    - So indirect accesses are not needed
  - Example: "first touch"
    - Assign elements to cache lines in order of first touch by pairs calculation
First-Touch Ordering
[Diagram: particles P1 to P5 shown in their original ordering, the list of interaction pairs among them, and the resulting first-touch ordering P1 P2 P3 P4 P5]
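First-touch packing can be sketched in a few lines: walk the interaction pairs in order, place each particle in memory the first time a pair mentions it, then renumber the pairs to match. This is an illustrative sketch only. The function names, the sample data, and the choice to append untouched particles at the end are assumptions, not details from the slides.

```python
def first_touch_order(pairs):
    """Return particle ids in the order the pair list first touches them."""
    order, seen = [], set()
    for a, b in pairs:
        for p in (a, b):
            if p not in seen:
                seen.add(p)
                order.append(p)
    return order


def pack(positions, pairs):
    """Reorder particle data by first touch and renumber the pairs to match."""
    order = first_touch_order(pairs)
    touched = set(order)
    # Particles that no pair touches keep their relative order at the end.
    order += [p for p in range(len(positions)) if p not in touched]
    new_index = {old: new for new, old in enumerate(order)}
    packed = [positions[old] for old in order]
    new_pairs = [(new_index[a], new_index[b]) for a, b in pairs]
    return packed, new_pairs


# Hypothetical example: five particles stored in an arbitrary original order.
positions = ["p3", "p0", "p4", "p1", "p2"]
pairs = [(1, 3), (1, 4), (3, 0), (4, 2)]
packed, new_pairs = pack(positions, pairs)
```

After packing, the pair loop walks the data nearly sequentially, so each cache line fetched is more fully used before it is evicted.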
Performance Results 1
[Figure: bar chart for Moldyn, scale 0 to 1, comparing execution time, L1 misses, L2 misses, and TLB misses. Series: original, data regrouping, base packing, opt packing.]
Performance Results 2
[Figure: bar chart for Magi, scale 0 to 1, comparing execution time, L1 misses, L2 misses, and TLB misses. Series: original, data regrouping, base packing, opt packing.]
Irregular Multilevel Blocking
• Associate a tuple of block numbers with each particle
  - One block number per level of the memory hierarchy
    - Block number = selected bits of particle address
[Diagram: particle address with L2 block number, L1 block number, and TLB block number fields; block-number tuple A B C]
• For an interaction pair, interleave block numbers for particles
[Diagram: the tuples A B C of the two particles combined into one composite block number]
• Sorting by composite block number gives multi-level blocking
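The scheme can be sketched as a sort key. In the Python sketch below, each level's block number is the particle index shifted right (its high-order bits), and the two particles' block numbers are interleaved level by level. The shift amounts, the coarse-to-fine ordering of levels, and all function names are assumptions chosen for illustration; the slides do not give the actual bit fields.

```python
# Hypothetical block sizes, expressed as right-shifts of the particle index,
# listed from the coarsest level of the hierarchy to the finest.
LEVEL_SHIFTS = (6, 4, 2)   # blocks of 64, 16, and 4 particles


def block_tuple(particle):
    """One block number per level: the high-order bits of the index."""
    return tuple(particle >> s for s in LEVEL_SHIFTS)


def composite_key(pair):
    """Interleave the two particles' block numbers, coarsest level first."""
    a, b = (block_tuple(p) for p in pair)
    key = []
    for block_a, block_b in zip(a, b):
        key.extend((block_a, block_b))
    return tuple(key)


def blocked_order(pairs):
    """Sort interactions so pairs that share coarse blocks run together."""
    return sorted(pairs, key=composite_key)


pairs = [(70, 3), (1, 2), (65, 5), (0, 66), (2, 17)]
ordered = blocked_order(pairs)
```

Because the coarsest block numbers of both particles come first in the key, the sort groups interactions by large blocks and then, within each group, by progressively smaller ones.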
Dynamic Optimization
[Diagram boxes: Program, Compiler, Application, Configuration and Data, Dynamic Optimizer (Optimizing Loader), Rapidly-Changing Data, Answers]
Grid Compilation Architecture
• Goal: reliable performance under varying load
[Diagram components: Source Application, Whole-Program Compiler, Libraries, Software Components, Configurable Object Program, Dynamic Optimizer, Service Negotiator, Scheduler, Grid Runtime System, Real-time Performance Monitor. Labeled flows: Performance Problem, Performance Feedback, Negotiation.]
GrADS Project: Berman, Chien, Cooper, Dongarra, Foster, Gannon, Johnsson, Kennedy, Kesselman, Reed, Torczon, Wolski
Programming Productivity
• Challenges
  - programming is hard
  - professional programmers are in short supply
  - high performance will continue to be important
• One Strategy: Make the End User a Programmer
  - professional programmers develop components
  - users integrate components using:
    - problem-solving environments (PSEs)
    - scripting languages (possibly graphical)
      examples: Visual Basic, Tcl/Tk, AVS, Khoros
• Compilation for High Performance
  - translate scripts and components to common intermediate language
  - optimize the resulting program using interprocedural methods
Script-Based Programming
[Diagram boxes: Script, Component Library, User Library, Translator, Intermediate Code, Global Optimizer, Code Generator]
Telescoping Languages
[Diagram boxes: L1 Class Library, L2 Class Library, Script, L0 Compiler, L1 Compiler, L2 Compiler, Script Translator, Optimized Application]
Telescoping Languages: Advantages
• Compile times can be reasonable
  - More compilation time can be spent on libraries
    - Amortized over many uses
  - Script compilations can be fast
    - Components reused from scripts may be included in libraries
• Compilation at script level can be local
  - Context-based optimization of library calls can be achieved through
    - Inlining, conditional inlining, compiler entry selection
  - Side-effect information can be determined via jump function tables
    - Encapsulated at an earlier compilation stage
  - Sensitive library source need only be selectively revealed
• User can retain substantive control over application performance
Example: HPF Revisited
[Diagram boxes: Distribution Library, Distribution Precompiler, HPF Translator, Fortran 90 Program, Global Optimizer, MPI Code Generator]

    Distribute (HilbertLib): A,B
    Do i = 1,100
       A(i) = B(i) + C
    Enddo

    A.putBlock(1,100, B.getBlock(1,100) + C)
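To make the rewriting concrete: the element-wise loop over A and B becomes a pair of block operations on library-managed arrays, so the layout is hidden inside the library. The sketch below is a hypothetical, single-process stand-in for such a distribution library. Only the method names `getBlock` and `putBlock` and the 1-based inclusive bounds come from the slide; the class names, the layout permutation, and the method bodies are assumptions.

```python
class Block(list):
    """A block of values that supports elementwise '+ scalar'."""

    def __add__(self, scalar):
        return Block(v + scalar for v in self)


class DistributedArray:
    """Hypothetical stand-in for an array managed by a distribution library.

    Storage order is hidden behind a layout permutation, so callers see
    only logical indices (1-based and inclusive, as in the Fortran loop).
    """

    def __init__(self, n, layout=None):
        self._layout = layout or list(range(n))   # logical index -> storage slot
        self._data = [0.0] * n

    def getBlock(self, lo, hi):
        return Block(self._data[self._layout[i - 1]] for i in range(lo, hi + 1))

    def putBlock(self, lo, hi, values):
        for i, v in zip(range(lo, hi + 1), values):
            self._data[self._layout[i - 1]] = v


n, C = 100, 2.5
layout = list(reversed(range(n)))      # an arbitrary non-trivial layout
A, B = DistributedArray(n, layout), DistributedArray(n, layout)
B.putBlock(1, n, [float(i) for i in range(1, n + 1)])

# Equivalent of: Do i = 1,100 ; A(i) = B(i) + C ; Enddo
A.putBlock(1, 100, B.getBlock(1, 100) + C)
```

Swapping in a different layout changes nothing in the calling code, which is what lets a new distribution be added as a library rather than as a compiler change.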
Flexible Compiler Architecture
• Flexible Definition of Computation
  - Parameters
    - program scheme
    - base library sequence (l1, l2, ..., lp)
    - subprogram source files (s1, s2, ..., sn)
    - run history (r1, r2, ..., rk)
    - data sets (d1, d2, ..., dm)
    - target configuration
• Compilation = Partial Evaluation
  - several compilation steps as information becomes available
• Program Management
  - When to back out of previous compilation decisions due to change
  - When to invalidate certain inputs
    - Examples: change in library or run history
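One simple way to picture the program-management problem is a staged pipeline that remembers which inputs each stage consumed and re-runs a stage only when those inputs change. The sketch below is an assumption-laden illustration rather than a design from the talk: the `StagedCompiler` class, the three stage names, and the string "artifacts" are all hypothetical.

```python
import hashlib


def fingerprint(*parts):
    """Stable digest of the inputs a compilation stage depends on."""
    h = hashlib.sha256()
    for part in parts:
        h.update(repr(part).encode())
    return h.hexdigest()


class StagedCompiler:
    """Hypothetical manager that re-runs a stage only when its inputs change."""

    def __init__(self):
        self._cache = {}   # stage name -> (input fingerprint, result)
        self.runs = []     # log of the stages actually executed

    def stage(self, name, fn, *inputs):
        key = fingerprint(*inputs)
        cached = self._cache.get(name)
        if cached and cached[0] == key:
            return cached[1]          # earlier decision is still valid
        result = fn(*inputs)          # back out and recompile this stage
        self._cache[name] = (key, result)
        self.runs.append(name)
        return result


def build(compiler, libraries, sources, run_history):
    # Each stage consumes the previous result plus one newly available input.
    libs = compiler.stage("libraries", lambda l: f"lib[{','.join(l)}]", libraries)
    prog = compiler.stage("program", lambda l, s: f"{l}+src[{','.join(s)}]", libs, sources)
    return compiler.stage("tuned", lambda p, r: f"{p}+hist[{len(r)}]", prog, run_history)


c = StagedCompiler()
build(c, ["l1", "l2"], ["s1"], [])
build(c, ["l1", "l2"], ["s1"], ["r1"])            # only the last stage re-runs
final = build(c, ["l1", "l3"], ["s1"], ["r1"])    # a library change invalidates every stage
```

A new run history invalidates only the final stage, while a library change propagates through every stage built on it; deciding how far such invalidation should reach is the question the slide raises.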
Summary
• Compilation = Off-Line Processing
  - Goal: improved performance
• Optimization Enables Language Power
  - Principle: encourage rather than discourage use of powerful features
    - Good programming practice should be rewarded
• Target Platforms, Languages, and Apps Becoming More Complex
  - Platforms: Parallel, heterogeneous, deep memory hierarchies
  - Applications: dynamic, irregular, extensive use of domain libraries
  - Programming: component development, system composition
• Compiler Structure Will Be Correspondingly Complex
  - Partial evaluation in stages with incremental information
  - Mechanisms for graceful response to change