Optimizing Compilers CISC 673 Spring 2009 Potential Languages of the Future Chapel, Fortress, X10
UNIVERSITY OF DELAWARE • COMPUTER & INFORMATION SCIENCES DEPARTMENT
Optimizing Compilers
CISC 673, Spring 2009
Potential Languages of the Future: Chapel, Fortress, X10
John Cavazos
University of Delaware
Overview
  Developed for the DARPA HPCS program (High Productivity Computing Systems)
  Chapel: Cascade High-Productivity Language
  Fortress: The new Fortran?
  X10: A parallel variant of Java
Chapel
  Chapel: Cascade High-Productivity Language
  Characteristics:
    Global-view parallel language
    Support for general parallelism
    Locality-aware
    Object-oriented
    Generic programming
Global vs Fragmented models
  Global-view programming model
    Algorithm/data structures expressed as a whole
    Model executes as a single thread upon entry
    Parallelism introduced through language constructs
    Examples: Chapel, OpenMP, HPF
  Fragmented programming model
    Algorithms expressed on a task-by-task basis
    Explicit decomposition of data structures/control flow
    Examples: MPI, UPC, Titanium
Global vs Fragmented models
  Global-view languages leave detail to the compiler
  Fragmented languages obfuscate code
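The contrast between the two models can be sketched in Python, which is used here only as a neutral illustration language. The function names `scale_global` and `scale_fragmented` are hypothetical and do not come from the slides. The global-view version states the operation once over the whole list and leaves any decomposition to the implementation; the fragmented version makes the programmer compute per-task bounds by hand.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_global(values, factor):
    """Global view: the operation is stated once, over the whole data structure."""
    return [factor * v for v in values]


def scale_fragmented(values, factor, num_tasks):
    """Fragmented view: the programmer decomposes the data per task by hand."""
    n = len(values)
    chunk = (n + num_tasks - 1) // num_tasks  # ceiling division

    def work(task_id):
        # Each task computes its own slice bounds explicitly.
        lo = task_id * chunk
        hi = min(lo + chunk, n)
        return [factor * v for v in values[lo:hi]]

    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        pieces = list(pool.map(work, range(num_tasks)))

    # Reassemble the per-task pieces in task order.
    return [v for piece in pieces for v in piece]
```

Both functions compute the same result; the index arithmetic in the second is the kind of detail the slide describes as obfuscating the algorithm.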
Support for General Parallelism
  “Single level of parallelism”
    Prevalence of the SPMD model
    MPI (very popular): supports coarse-grained parallelism
    OpenMP: supports fine-grained parallelism
  Should support “nested” parallelism
  Should also cleanly support data/task parallelism
Data distribution and Locality
  Hard for the compiler to do a good job of these
  Responsibility of the performance-minded programmer
  Language should provide abstractions to:
    control data distribution
    control locality of interacting variables
Object-oriented Programming
  Proven successful in mainstream languages
  Separating interfaces from implementation
  Enables code reuse
  Encapsulate related code and data
Generic Programming
  Algorithms are written without specifying types
    Types are instantiated later
  Latent types
    Compiler can infer a type from the program’s context
    Variable type inferred from its initialization expression
    Function argument types inferred from the actual arguments at call sites
    If the compiler cannot infer a type, it reports an error
  Chapel is statically typed
    All types are inferred (and type checking is done) at compile time
    For performance reasons
Chapel: Data Parallelism
  // a 2D ARITHMETIC DOMAIN storing indices (1,1) … (m,n)
  var D: domain(2) = [1..m, 1..n];

  // an m x n array of floating point values
  var A: [D] float;

  // an INFINITE DOMAIN storing string indices
  var People: domain(string);

  // array of integers indexed with strings in the People domain
  var Age: [People] int;

  People += "John";   // add string "John" to the People domain
  Age("John") = 62;   // set John's age
Chapel: Data Parallelism
  // FORALL over the integer-tuple indices of domain D
  forall ij in D {
    A(ij) = …;
  }

  // FORALL over the string indices of the People domain
  forall I in People {
    Age(I) = …;
  }

  // Simple example
  forall I in 1..N do
    a(I) = b(I);
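As a rough analogy for the domain, array, and forall ideas on the two slides above, the following Python sketch models a domain as an explicit index set and an array as a mapping declared over that set. The helpers `make_domain` and `forall` and the element values are assumptions made for illustration; they are not Chapel APIs, and a thread pool is only a stand-in for Chapel's data-parallel execution.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def make_domain(m, n):
    """Index set (1,1)..(m,n), analogous to the slide's 2D domain D."""
    return list(product(range(1, m + 1), range(1, n + 1)))


def forall(domain, body):
    """Apply body to every index; iterations may run in any order."""
    with ThreadPoolExecutor() as pool:
        # list() forces all iterations to finish and surfaces any exception.
        list(pool.map(body, domain))


D = make_domain(2, 3)
A = {ij: 0.0 for ij in D}  # "array" declared over domain D


def set_elem(ij):
    i, j = ij
    A[ij] = float(i * 10 + j)  # each iteration writes a distinct index


forall(D, set_elem)

# A domain of strings that grows as indices are added, with an array over it.
People = set()
Age = {}
People.add("John")
Age["John"] = 62
```

The point of the analogy is that the loop is written over the index set itself, so the same `forall` works for integer tuples and for strings.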
Chapel: Task Parallelism
  // begin statement spawns a new task
  begin writeln("output from spawned task");
  writeln("output from main task");

  // cobegin statement
  // synchronization happens at the end of the cobegin block
  cobegin {
    stmt1();
    stmt2();
    stmt3();
  }
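The difference between the two constructs can be approximated with Python threads. In this sketch, `begin` and `cobegin` are hypothetical helper functions named after the Chapel statements; they only mimic the control flow (spawn and continue versus spawn all and wait), not Chapel's tasking runtime.

```python
import threading


def begin(fn, *args):
    """Spawn a task and return immediately; the caller keeps running."""
    t = threading.Thread(target=fn, args=args)
    t.start()
    return t


def cobegin(*stmts):
    """Run each statement as its own task; return only when all have finished."""
    threads = [begin(s) for s in stmts]
    for t in threads:
        t.join()  # the implicit synchronization at the end of the block


log = []
log_lock = threading.Lock()


def stmt(name):
    """Build a small task that records its own name."""
    def run():
        with log_lock:
            log.append(name)
    return run


cobegin(stmt("stmt1"), stmt("stmt2"), stmt("stmt3"))
# All three entries are present here, in an unspecified order.
```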
Chapel: Task Parallelism
  // NOTE: Parallel tasks can coordinate with sync variables
  var finishedMainOutput$: sync bool;
  begin {
    finishedMainOutput$;
    writeln("output from spawned task");
  }
  writeln("output from main task");
  finishedMainOutput$ = true;
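The ordering guarantee in this example comes from the sync variable: the spawned task's read cannot complete until the main task has written a value. A minimal Python sketch of that behavior follows, assuming a one-slot queue as a stand-in for the variable's full/empty state; the `SyncVar` class is hypothetical and is not part of Chapel or of Python's standard library.

```python
import queue
import threading


class SyncVar:
    """One-slot cell: a read blocks until a value is present and then empties the cell."""

    def __init__(self):
        self._slot = queue.Queue(maxsize=1)

    def write(self, value):
        self._slot.put(value)  # blocks while the cell is full

    def read(self):
        return self._slot.get()  # blocks while the cell is empty


output = []
finished_main_output = SyncVar()


def spawned():
    finished_main_output.read()  # waits for the main task's write
    output.append("output from spawned task")


task = threading.Thread(target=spawned)
task.start()
output.append("output from main task")
finished_main_output.write(True)
task.join()
```

Without the read in `spawned`, the two lines could appear in either order; with it, the main task's line always comes first.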
Fortress Overview
  Developed at Sun
  Entirely new language
  Fortress features
    Targeted to scientific computing
    Mathematical notation
    Implicitly parallel whenever possible
      Constructs and annotations to serialize when necessary
    Whenever possible, implement language features in a library
Fortress: Task Parallelism
  For loops
    All iterations can execute in parallel
  do … also do … end
    Can specify parallel tasks
  Tuples
    Set of parallel expressions or functions
Fortress: for loop parallelism
  For loops
    [slide figure not preserved; surviving text: 5 4 6 3 7 2 9 10 1 8]
Fortress: Task Parallelism Examples
  do … also do … end
    do
      factorial(10)
    also do
      factorial(5)
    also do
      factorial(2)
    end
  Tuples
    (factorial(10), factorial(5), factorial(2))
Fortress: atomic expressions
Note: Z can be 2 or 0, but not 1!
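The code for this slide was not preserved, so the following Python sketch is an assumption about the kind of program the note refers to: one task atomically increments two variables, while another atomically reads their sum into z. The variable names and the use of a single lock as a stand-in for Fortress atomic expressions are illustrative choices, not the slide's code.

```python
import threading


def run_once():
    """Run the two tasks once and return the observed value of z."""
    state = {"x": 0, "y": 0, "z": 0}
    atomic = threading.Lock()  # stand-in for an atomic block

    def increment_both():
        with atomic:
            state["x"] += 1
            state["y"] += 1

    def read_sum():
        with atomic:
            state["z"] = state["x"] + state["y"]

    tasks = [
        threading.Thread(target=increment_both),
        threading.Thread(target=read_sum),
    ]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return state["z"]
```

Because each block runs as an indivisible unit, the reader sees either both increments or neither, so `run_once()` yields 0 or 2. Removing the lock from `increment_both` would make the intermediate value 1 observable.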
Fortress Code
Fortress: Regions
  Every thread, object, and array element has an associated region
  Regions hierarchically form a tree
  Regions describe machine resources
X10 Overview
  Developed at IBM
  X10 is an extended subset of Java
  Base language = Java 1.4 language
Fixes some Java limitations
  Java programming model: single uniform heap
    X10 introduces partitioned global address spaces
  Java intra-node and inter-node parallelism is heavyweight
    Threads and messages/processes are too heavyweight
    X10 introduces asynchronous activities
X10 != Java
  Some features removed from the Java language
    Java concurrency -- threads, synchronized
    Java arrays replaced with X10 arrays
    Java dynamic class loading removed
  Some features added to the Java language
    Concurrency -- async, finish, foreach, ateach, etc.
    Distribution -- block, blockCyclic, etc.
    X10 arrays -- distributed arrays according to A.distribution
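To make the distribution names concrete, the Python sketch below computes which place owns each array index under the textbook definitions of block and block-cyclic distributions. The functions `block_owner` and `block_cyclic_owner` are hypothetical helpers; X10's own rules, for instance how it assigns leftover elements when the size does not divide evenly, may differ from this sketch.

```python
def block_owner(i, n, places):
    """Block: n indices are split into contiguous chunks, one chunk per place."""
    chunk = (n + places - 1) // places  # ceiling division
    return i // chunk


def block_cyclic_owner(i, block_size, places):
    """Block-cyclic: fixed-size blocks are dealt out to places round-robin."""
    return (i // block_size) % places


# Eight indices over four places, block distribution:
block_layout = [block_owner(i, 8, 4) for i in range(8)]

# Eight indices over two places, blocks of two, block-cyclic distribution:
cyclic_layout = [block_cyclic_owner(i, 2, 2) for i in range(8)]
```

Here `block_layout` is `[0, 0, 1, 1, 2, 2, 3, 3]` and `cyclic_layout` is `[0, 0, 1, 1, 0, 0, 1, 1]`.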
X10 Concurrency
  Distributed Collections
    Map collection elements to places
    Collection<D,E> is a collection with distribution D and element type E
  Parallel Execution
    foreach (point p: R) S
      Creates |R| async statements in parallel at current place
    async (P) S
      Creates a new activity to execute statement S at place P
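The relationship between places, async, and foreach can be approximated in Python. Everything in this sketch is hypothetical scaffolding: the `Place` class models a place as a small worker pool with its own data, and the `Finish` class models a block that waits for the activities started directly inside it. It does not track activities spawned by other activities, and it is not X10's runtime.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class Place:
    """Stand-in for a place: a worker pool plus data that lives at that place."""

    def __init__(self, ident):
        self.ident = ident
        self.data = {}
        self._pool = ThreadPoolExecutor(max_workers=2)

    def submit(self, fn, *args):
        return self._pool.submit(fn, *args)

    def shutdown(self):
        self._pool.shutdown()


class Finish:
    """Collects activities started inside the block and waits for them on exit."""

    def __init__(self):
        self._futures = []

    def __enter__(self):
        return self

    def async_at(self, place, fn, *args):
        # async (P) S: start one activity running fn at the given place.
        self._futures.append(place.submit(fn, place, *args))

    def foreach(self, place, region, fn):
        # foreach (point p: R) S: one activity per point, all at one place.
        for p in region:
            self.async_at(place, fn, p)

    def __exit__(self, exc_type, exc, tb):
        wait(self._futures)
        for f in self._futures:
            f.result()  # re-raise any exception from an activity
        return False


def store_square(place, p):
    place.data[p] = p * p  # touches only data owned by its own place


places = [Place(0), Place(1)]
with Finish() as f:
    f.foreach(places[0], range(4), store_square)
    f.async_at(places[1], store_square, 10)
for pl in places:
    pl.shutdown()
```

After the block, `places[0].data` holds the squares of 0 through 3 and `places[1].data` holds the single entry for 10, each stored at the place where its activity ran.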
X10: Activities, Places & PGAS