High-Performance Domain-Specific Languages using Delite (ppl.stanford.edu/papers/CGO2012-1.pdf)
High-Performance Domain-Specific Languages using Delite
Kunle Olukotun, Kevin Brown, Hassan Chafi, Zach DeVito, Sungpack Hong, Arvind Sujeeth
Pervasive Parallelism Laboratory
Stanford University
Tutorial Overview
Motivation for tutorial
Lots of interest in DSLs
New ideas: DSLs for productivity and parallelism
New software paradigm: DSL infrastructure
Goals
Introduction to performance oriented DSL development
DSL examples and uses
DSL implementation basics
Delite: DSL infrastructure for DSL compiler development
Intro to Scala: basis for Delite, and important new programming language
2020 Vision for Parallelism
Make parallelism accessible to all programmers
Parallelism is not for the average programmer
Too difficult for the masses to find parallelism, debug, maintain, and get good performance
Need a solution for “Joe/Jane the programmer”
Can’t expose average programmers to parallelism
But auto-parallelization doesn’t work
Three Faces of Computing
Predicting the future
Modeling and simulation (weather, materials, products)
Decide what to build and experiment on, or model instead of building and experimenting ⇒ third pillar of science
Coping with the present (real time)
Embedded systems control (cars, planes, communication)
Virtual worlds (Second Life, Facebook)
Electronic trading (airline reservation, stock market)
Robotics (manufacturing, cars, household)
Understanding the past
Big data set analysis (commerce, web, census, simulation)
Discover trends and develop insight
Explosion of Data Sources
Computing Goals: The 4 Ps
Power efficiency
Performance
Productivity
Portability
Era of Power Limited Computing
Mobile
Battery operated
Passively cooled
Data center
Energy costs
Infrastructure costs
Power and Performance
$$\text{Power} = \frac{\text{Joules}}{\text{Op}} \times \frac{\text{Ops}}{\text{second}}$$
Power: FIXED
Joules/Op: Energy efficiency
Ops/second: Performance
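As a worked instance of the relation Power = (Joules/Op) × (Ops/second): when power is fixed, the only way to raise Ops/second is to lower Joules/Op. The sketch below is illustrative only; the object name, the 100 W budget, and the energy-per-op figures are made-up assumptions, not numbers from the tutorial.

```scala
object PowerBudget {
  /** Ops/second sustainable under a fixed power budget.
    * Rearranged from Power = (Joules/Op) * (Ops/second). */
  def opsPerSecond(powerWatts: Double, joulesPerOp: Double): Double =
    powerWatts / joulesPerOp

  def main(args: Array[String]): Unit = {
    val budgetWatts = 100.0                          // hypothetical fixed power budget
    val baseline = opsPerSecond(budgetWatts, 1e-9)   // assumed 1 nJ per op
    val improved = opsPerSecond(budgetWatts, 1e-10)  // assumed 10x better energy efficiency
    println(f"baseline: $baseline%.3e ops/s, improved: $improved%.3e ops/s")
  }
}
```

With the budget held constant, a 10x reduction in Joules/Op gives a 10x increase in Ops/second, which is why energy efficiency and performance are tied together in the power-limited setting.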
Specialized (Heterogeneous) Hardware
Heterogeneous HW for energy efficiency
Multi-core, ILP, threads, data-parallel engines, custom engines
H.264 encode study
[Figure: Performance and Energy Savings on a log scale (1 to 1000) for 4 cores, + ILP, + SIMD, + custom inst, and ASIC; annotated "~3 orders of magnitude"]
Source: Understanding Sources of Inefficiency in General-Purpose Chips (ISCA’10)
Future performance gains will come mainly from heterogeneous hardware with different specialized resources
DE Shaw Research: Anton
D. E. Shaw et al. SC 2009, Best Paper and Gordon Bell Prize
100 times more power efficient
Molecular dynamics computer
Heterogeneous Parallel Architectures Today
Cray Jaguar
Sun T2
Nvidia Fermi
Altera FPGA
Heterogeneous Parallel Programming
Cray Jaguar: MPI, PGAS
Sun T2: Pthreads, OpenMP
Nvidia Fermi: CUDA, OpenCL
Altera FPGA: Verilog, VHDL
Programmability Chasm
Too many different programming models
[Figure: Applications (Virtual Worlds, Personal Robotics, Data informatics, Scientific Engineering) above the per-platform programming models (Cray Jaguar: MPI, PGAS; Sun T2: Pthreads, OpenMP; Nvidia Fermi: CUDA, OpenCL; Altera FPGA: Verilog, VHDL)]
Hypothesis
It is possible to write one program and
run it on all these machines
Programmability Chasm
[Figure: an Ideal Parallel Programming Language shown between the Applications (Virtual Worlds, Personal Robotics, Data informatics, Scientific Engineering) and the per-platform programming models (Cray Jaguar: MPI, PGAS; Sun T2: Pthreads, OpenMP; Nvidia Fermi: CUDA, OpenCL; Altera FPGA: Verilog, VHDL)]
The Ideal Parallel Programming Language
[Figure: diagram relating Performance, Productivity, and Generality]
Successful Languages
[Figure: diagram relating Performance, Productivity, and Generality]
True Hypothesis ⇒ Domain Specific Languages
[Figure: Domain Specific Languages placed among Performance (Heterogeneous Parallelism), Productivity, and Generality]
Domain Specific Languages
Domain Specific Languages (DSLs)
Programming language with restricted expressiveness for a particular domain
High-level, usually declarative, and deterministic
DSL Benefits
Productivity
• Shield average programmers from the difficulty of parallel programming
• Focus on developing algorithms and applications and not on low-level implementation details
Performance
• Match high-level domain abstraction to generic parallel execution patterns
• Restrict expressiveness to more easily and fully extract available parallelism
• Use domain knowledge for static/dynamic optimizations
Portability and forward scalability
• DSL & Runtime can be evolved to take advantage of latest hardware features
• Applications remain unchanged
• Allows innovative HW without worrying about application portability
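To make two of the points above concrete (matching a domain abstraction to a generic parallel execution pattern, and applications remaining unchanged while the runtime evolves), here is a minimal sketch. It is not Delite's API: `Backend`, `Sequential`, `Chunked`, and `dot` are hypothetical names, and the chunked backend assumes an associative `combine` and a non-empty input.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object PatternSketch {
  // A generic execution pattern (hypothetical interface, not Delite's).
  trait Backend {
    def mapReduce[A, B](xs: IndexedSeq[A])(f: A => B)(combine: (B, B) => B): B
  }

  // Reference implementation: one sequential pass.
  object Sequential extends Backend {
    def mapReduce[A, B](xs: IndexedSeq[A])(f: A => B)(combine: (B, B) => B): B =
      xs.map(f).reduce(combine)
  }

  // Splits the input into chunks and reduces each chunk in its own Future.
  // Assumes `combine` is associative and `xs` is non-empty.
  final class Chunked(chunks: Int) extends Backend {
    def mapReduce[A, B](xs: IndexedSeq[A])(f: A => B)(combine: (B, B) => B): B = {
      val size = math.max(1, (xs.length + chunks - 1) / chunks)
      val partials = xs.grouped(size).toVector.map { chunk =>
        Future(chunk.map(f).reduce(combine))
      }
      Await.result(Future.sequence(partials), Duration.Inf).reduce(combine)
    }
  }

  // "Application" code: written once against the domain operation,
  // independent of which backend executes it.
  def dot(a: IndexedSeq[Double], b: IndexedSeq[Double])(backend: Backend): Double =
    backend.mapReduce(a.indices)(i => a(i) * b(i))(_ + _)
}
```

Calling `PatternSketch.dot(a, b)(PatternSketch.Sequential)` and `PatternSketch.dot(a, b)(new PatternSketch.Chunked(4))` runs the same application code on two execution strategies; swapping in a backend for new hardware would not require touching `dot`.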
Our Approach: Three Views
Little embedded languages
Domain abstractions improve productivity
Domains provide specific knowledge
Smart libraries
Libraries that can compile/optimize themselves
Optimizations cross library call boundaries
Optimizations exploit domain specific knowledge
Smart compilers
Raise abstraction-level of compiler optimization
Loads and stores ⇒ Data structures
Language statements ⇒ Algorithms
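The "smart library" view can be sketched as a library whose calls build a deferred representation instead of computing immediately, so that an optimizer can look across call boundaries. The example below is a toy under stated assumptions: `VecExp`, `scale`, `shift`, and `fuse` are hypothetical names, and map fusion stands in for whatever optimizations a real DSL compiler would apply.

```scala
object SmartLibrarySketch {
  // Deferred representation of vector operations (hypothetical mini-IR).
  sealed trait VecExp
  final case class Const(data: Vector[Double]) extends VecExp
  final case class MapOp(in: VecExp, f: Double => Double) extends VecExp

  // Library "calls" build IR nodes rather than computing eagerly.
  def scale(v: VecExp, k: Double): VecExp = MapOp(v, _ * k)
  def shift(v: VecExp, d: Double): VecExp = MapOp(v, _ + d)

  // Optimization across call boundaries: fuse nested maps into one traversal.
  def fuse(e: VecExp): VecExp = e match {
    case MapOp(inner, g) =>
      fuse(inner) match {
        case MapOp(src, f) => MapOp(src, f andThen g)
        case other         => MapOp(other, g)
      }
    case c: Const => c
  }

  // Number of MapOp nodes, i.e. passes over the data under naive evaluation.
  def passes(e: VecExp): Int = e match {
    case MapOp(in, _) => 1 + passes(in)
    case _: Const     => 0
  }

  // Naive evaluator: one traversal per MapOp node.
  def eval(e: VecExp): Vector[Double] = e match {
    case Const(d)     => d
    case MapOp(in, f) => eval(in).map(f)
  }
}
```

For `shift(scale(Const(Vector(1.0, 2.0)), 3.0), 1.0)`, `passes` is 2 before `fuse` and 1 after, while `eval` returns `Vector(4.0, 7.0)` in both cases. Neither `scale` nor `shift` could perform this fusion alone; it only becomes visible once the calls are represented as data.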
Reinterpreting Levels of Abstraction
[Figure: two abstraction stacks. With an HLL: Problem statement → (Programmer) → Algs. & Data structs. → (Programmer) → Sequential Program (HLL) → (HLL Compiler) → GP ISA. With a DSL: Problem statement → (Programmer) → Algs. & Data structs. (DSL) → (DSL Compiler) → Heterogeneous Parallel Program → (Compilers) → GP ISA, SP ISA]
Bridging the Programmability Chasm
[Figure: Applications (Virtual Worlds, Personal Robotics, Data informatics, Scientific Engineering) are written in Domain Specific Languages (Physics (Liszt), Data Analytics (OptiQL), Graph Alg. (Green Marl), Machine Learning (OptiML), Statistics (R)); each DSL has its own DSL Compiler targeting Heterogeneous Hardware and New Arch.]
Common DSL Infrastructure
[Figure: Applications (Virtual Worlds, Personal Robotics, Data informatics, Scientific Engineering) are written in Domain Specific Languages (Physics (Liszt), Data Analytics (OptiQL), Graph Alg. (Green Marl), Machine Learning (OptiML), Statistics (R)); the per-DSL DSL Compilers share a common DSL Infrastructure that targets Heterogeneous Hardware and New Arch.]
Delite DSL Framework
Embedding Language (Scala) + DSL Framework (Delite)
[Figure: Applications (Virtual Worlds, Personal Robotics, Data informatics, Scientific Engineering) are written in Domain Specific Languages (Physics (Liszt), Data Analytics (OptiQL), Graph Alg. (Green Marl), Machine Learning (OptiML), Statistics (R)), built on the Delite DSL Infrastructure, which targets Heterogeneous Hardware and New Arch. Infrastructure components shown: Staging, Polymorphic Embedding, Static Domain Specific Opt., Task & Data Parallelism, Parallel Runtime (Delite RT), Dynamic Domain Spec. Opt., Locality Aware Scheduling]
Agenda
OptiML: A DSL for Machine Learning (Arvind Sujeeth)
Liszt: A DSL for solving mesh-based PDEs (Zach DeVito)
Green-Marl: A DSL for efficient Graph Analysis (Sungpack Hong)
Scala Tutorial (Hassan Chafi)
DSL Infrastructure Overview (Kevin Brown)
High Performance DSL Implementation Using Delite (Arvind Sujeeth)
Delite Status and Future Directions in DSL Research (Hassan Chafi)
Wrap up (Kunle Olukotun)
Tutorial Wrap Up
Performance oriented DSLs
High productivity, performance and portability
Try out our DSLs (OptiML, Liszt, Green-Marl)
Develop your own DSLs: collaborate with domain experts
Implementing DSLs with Delite
Embedded DSLs in Scala
Mapping to Delite IR
Domain specific optimizations
Optimizations for parallelism
Codegen for SMP and GPU, (Cluster)
Try out Delite, give us feedback
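The implementation steps listed above (embed the DSL in Scala, map programs to an IR, optimize, generate code) can be illustrated with a toy pipeline. This is a sketch under stated assumptions, not Delite's actual IR or code generator: `Exp`, `Lit`, `Sym`, `plus`, `times`, and `codegen` are hypothetical names, and the string output merely stands in for a real backend.

```scala
object StagedDslSketch {
  // Hypothetical IR for a tiny arithmetic DSL; not Delite's actual IR.
  sealed trait Exp {
    // Embedded syntax: ordinary Scala operators build IR nodes
    // through the smart constructors below.
    def +(that: Exp): Exp = plus(this, that)
    def *(that: Exp): Exp = times(this, that)
  }
  final case class Lit(value: Double) extends Exp
  final case class Sym(name: String) extends Exp
  final case class Plus(l: Exp, r: Exp) extends Exp
  final case class Times(l: Exp, r: Exp) extends Exp

  // Smart constructors apply algebraic rewrites while the IR is built.
  def plus(l: Exp, r: Exp): Exp = (l, r) match {
    case (Lit(0.0), e)    => e
    case (e, Lit(0.0))    => e
    case (Lit(a), Lit(b)) => Lit(a + b)
    case _                => Plus(l, r)
  }
  def times(l: Exp, r: Exp): Exp = (l, r) match {
    case (Lit(1.0), e)    => e
    case (e, Lit(1.0))    => e
    case (Lit(a), Lit(b)) => Lit(a * b)
    case _                => Times(l, r)
  }

  // One possible backend: emit a C-like expression string.
  def codegen(e: Exp): String = e match {
    case Lit(v)      => v.toString
    case Sym(n)      => n
    case Plus(l, r)  => s"(${codegen(l)} + ${codegen(r)})"
    case Times(l, r) => s"(${codegen(l)} * ${codegen(r)})"
  }
}
```

With `val x = Sym("x")`, the program `x * Lit(1.0) + Lit(0.0)` is simplified during construction and `codegen` emits `x`, while `(x + Lit(2.0)) * Lit(3.0)` emits `((x + 2.0) * 3.0)` on the JVM. The user writes plain Scala expressions; the rewrites and the choice of target live entirely in the DSL implementation.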
Thanks for attending!