Titanium: Language and Compiler Support for Scientific Computing

Gregory T. Balls University of California - Berkeley

Alex Aiken, Dan Bonachea, Phillip Colella, David Gay, Susan Graham, Paul Hilfinger, Arvind Krishnamurthy, Ben Liblit, Chang Sun Lin, Peter McCorquodale, Carleton Miyamoto, Geoff Pike, Kar Ming Tang, Siu Man Yau, Katherine Yelick

Target Problems

• Many modeling problems in astrophysics, biology, material science, and other areas require
  – Enormous range of spatial and temporal scales
• To solve interesting problems, one needs:
  – Adaptive methods
  – Large-scale parallel machines
• Titanium is designed for methods with
  – Structured grids
  – Locally-structured grids (adaptive mesh refinement, AMR)

Common Requirements

• Algorithms for numerical PDE computations are (compared to linear algebra)
  – communication intensive
  – memory intensive
• AMR makes these harder
  – more small messages
  – more complex data structures
  – most of the programming effort is debugging the boundary cases
  – locality and load balance trade-off is hard

Titanium for Scientific Computing

• The Language
  – Java dialect compiled to C
  – Extensions for serial programming
  – Extensions for parallel programming
• The Compiler
  – Uniprocessor optimizations
  – Parallel optimizations
  – Available architectures

• The Results

Java for Scientific Computing

• Computational scientists work on increasingly complex models
  – Popularized C++ features: classes, overloading, pointer-based data structures
• But C++ is very complicated
  – easy to lose performance and readability
• Java is a better C++
  – Safe: strongly typed, garbage collected
  – Much simpler to implement (research vehicle)
  – Industrial interest as well: IBM HP Java

Data Types

• Primitive scalar types: boolean, double, int, etc.
  – implementations store these in place
  – access is fast, comparable to other languages
• Objects: user-defined and library
  – passed by pointer value
  – have an implicit level of indirection (pointer to the object)
  – simple model, but inefficient for small objects
• Fast Objects (immutable classes)
  – similar to structs in C

Titanium Object Example

immutable class Complex {
    private double real;
    private double imag;
    public Complex(double r, double i) { real = r; imag = i; }
    // Overloaded + returns a new value; the operands are unchanged.
    public Complex operator+(Complex c) {
        return new Complex(c.real + real, c.imag + imag);
    }
    public double getReal() { return real; }
    public double getImag() { return imag; }
}

Complex c = new Complex(7.1, 4.3);
c = c + c;
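For comparison, a rough plain-Java approximation of the class above might look like the sketch below. Standard Java has no `immutable` class modifier and no operator overloading, so the sketch uses a `final` class with `final` fields, and a method named `plus` (a hypothetical stand-in for `operator+`). The class name `ComplexValue` is also an assumption made for illustration.

```java
// Plain-Java approximation of the Titanium Complex example.
// Assumption: `plus` is a hypothetical stand-in for Titanium's operator+.
final class ComplexValue {
    private final double real;
    private final double imag;

    ComplexValue(double r, double i) {
        real = r;
        imag = i;
    }

    // Returns a new value; neither operand is modified.
    ComplexValue plus(ComplexValue c) {
        return new ComplexValue(c.real + real, c.imag + imag);
    }

    double getReal() { return real; }
    double getImag() { return imag; }
}
```

Usage would read `c = c.plus(c)` instead of `c = c + c`. In plain Java the value is still reached through a reference, so this sketch does not by itself give the struct-like in-place storage described for Titanium's immutable classes.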

Arrays in Java

• Arrays in Java are objects
• Only 1D arrays are directly supported
• Multidimensional arrays are slow (they are built as arrays of arrays)
  [Figure: 2d array]
• Subarrays are important in AMR (e.g., interior of a grid)
  – Even C and C++ don’t support these well
  – Hand-coding (array libraries) can confuse optimizer
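To make the two points above concrete, the following sketch shows a grid held as a standard Java array of arrays, and the same kind of grid hand-coded as a flat 1D array with explicit index arithmetic for its interior. The class and method names (`GridLayouts`, `sumNested`, `sumInterior`) are hypothetical and chosen only for this illustration.

```java
// Sketch: two ways to hold a rows-by-cols grid in standard Java.
final class GridLayouts {
    // Array of arrays: each row is a separate object reached through a reference.
    static double sumNested(double[][] a) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                s += a[i][j];
            }
        }
        return s;
    }

    // Hand-coded flat layout: one 1D array plus explicit index arithmetic.
    // The "interior" is everything except the outermost ring of cells.
    static double sumInterior(double[] flat, int rows, int cols) {
        double s = 0.0;
        for (int i = 1; i < rows - 1; i++) {
            for (int j = 1; j < cols - 1; j++) {
                s += flat[i * cols + j];
            }
        }
        return s;
    }
}
```

The flat version avoids the per-row indirection, but the subarray exists only as loop bounds and an index formula scattered through the code, which is the kind of hand-coding the slide warns about.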

Multidimensional Arrays in Titanium

• New multidimensional array added to Java
  – One array may be a subarray of another
    » e.g., a is interior of b, or a is all even elements of b
  – Indexed by Points (tuples of ints)
  – Constructed over a set of Points, called Rectangular Domains (RectDomains)
  – Points, Domains and RectDomains are built-in immutable classes
• Support for AMR and other grid computations
  – domain operations: intersection, shrink, border
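The domain operations listed above can be illustrated with a small plain-Java sketch of a 2D rectangular index set. This is not the Titanium `RectDomain` class: the name `Rect2`, the method names, and the choice of inclusive bounds are assumptions made for this example, and only intersection and shrink are shown.

```java
// Minimal 2D rectangular index set with inclusive bounds (an assumption of this sketch).
final class Rect2 {
    final int loX, loY, hiX, hiY;

    Rect2(int loX, int loY, int hiX, int hiY) {
        this.loX = loX; this.loY = loY; this.hiX = hiX; this.hiY = hiY;
    }

    boolean isEmpty() { return loX > hiX || loY > hiY; }

    // Number of points in the rectangle.
    int size() { return isEmpty() ? 0 : (hiX - loX + 1) * (hiY - loY + 1); }

    // Points common to both rectangles (possibly an empty rectangle).
    Rect2 intersect(Rect2 o) {
        return new Rect2(Math.max(loX, o.loX), Math.max(loY, o.loY),
                         Math.min(hiX, o.hiX), Math.min(hiY, o.hiY));
    }

    // Rectangle with k layers of points removed from every side.
    Rect2 shrink(int k) {
        return new Rect2(loX + k, loY + k, hiX - k, hiY - k);
    }

    boolean contains(int x, int y) {
        return x >= loX && x <= hiX && y >= loY && y <= hiY;
    }
}
```

With a 10-by-10 rectangle `b`, `b.shrink(1)` describes the interior of `b`, which matches the "a is interior of b" case in the bullets. The overlap of two neighbouring grids is `b.intersect(other)`.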

Unordered iteration

• Memory hierarchy optimizations are essential
• Compilers can sometimes do these, but hard in general
• Titanium adds unordered iteration on rectangular domains
    foreach (p in r) { ... }
  – p is a Point
  – r is a RectDomain or Domain
• Foreach simplifies bounds checking as well
• Additional operations on domains and arrays to subset and transform
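Why an unordered loop helps with memory hierarchy optimizations can be seen in a plain-Java sketch: when the loop body does not depend on visiting order, a row-major traversal and a tiled (cache-blocked) traversal of the same index set produce the same result, so a compiler is free to pick either. The names below and the `tile` parameter are hypothetical, and the grids are assumed to be square.

```java
// Sketch: an order-independent loop body gives the same result under
// row-major and tiled (cache-blocked) traversals of the same index set.
final class UnorderedTraversal {
    // Row-major visit of an n-by-n grid: b[i][j] = 2 * a[i][j].
    static void scaleRowMajor(double[][] a, double[][] b) {
        int n = a.length;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                b[i][j] = 2.0 * a[i][j];
    }

    // Same body, visited tile by tile. `tile` is a hypothetical tuning knob.
    static void scaleTiled(double[][] a, double[][] b, int tile) {
        int n = a.length;
        for (int ii = 0; ii < n; ii += tile)
            for (int jj = 0; jj < n; jj += tile)
                for (int i = ii; i < Math.min(ii + tile, n); i++)
                    for (int j = jj; j < Math.min(jj + tile, n); j++)
                        b[i][j] = 2.0 * a[i][j];
    }
}
```

In Java the programmer has to write the tiled version by hand. The point of `foreach` is that the source states only the index set and the body, leaving the traversal order to the compiler.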

SPMD Model

• All processors start together and execute same code, but not in lock-step

• Basic control done using
  – Ti.numProcs(): total number of processors
  – Ti.thisProc(): number of executing processor
• Bulk-synchronous style
    read all particles and compute forces on mine
    Ti.barrier();
    write to my particles using new forces
    Ti.barrier();

• This is neither message passing nor data-parallel
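The bulk-synchronous pattern can be imitated in plain Java with threads, as a rough sketch. Here a `java.util.concurrent.CyclicBarrier` stands in for `Ti.barrier()`, one thread plays the role of each processor, and the "force" computation is replaced by a trivial sum. The class and method names are hypothetical, and the input array is assumed to be non-empty.

```java
// Sketch of the bulk-synchronous pattern using Java threads.
// Assumption: a CyclicBarrier stands in for Ti.barrier(); one thread per element.
final class SpmdSketch {
    static double[] step(double[] values) {
        final int numProcs = values.length;
        final java.util.concurrent.CyclicBarrier barrier =
                new java.util.concurrent.CyclicBarrier(numProcs);
        Thread[] procs = new Thread[numProcs];
        for (int p = 0; p < numProcs; p++) {
            final int thisProc = p;
            procs[p] = new Thread(() -> {
                try {
                    // Read phase: every processor reads all of the data.
                    double sum = 0.0;
                    for (double v : values) sum += v;
                    barrier.await();   // no writes until all reads are done
                    // Write phase: each processor updates only its own element.
                    values[thisProc] = sum - values[thisProc];
                    barrier.await();   // nobody continues until all writes are done
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            procs[p].start();
        }
        for (Thread t : procs) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return values;
    }
}
```

Without the first barrier, a fast thread could overwrite its element while a slower thread is still summing, which is the race the two-barrier structure on the slide is there to prevent.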

Global Address Space

• References (pointers) may be remote
  – useful in building adaptive meshes
  – easy to port shared-memory programs
  – uniform programming model across machines
• Global pointers are more expensive than local
  – True even when data is on the same processor
    » space (processor number + memory address)
    » dereference time (check to see if local)
  – Use local declarations in performance-critical sections
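The two costs named above (extra space for a processor number, and a locality check on every dereference) can be modelled in a few lines of plain Java. This is only an illustration of the idea: every name below is hypothetical, and it is not a description of how the Titanium runtime represents pointers.

```java
// Sketch of a global reference as a (processor, index) pair.
// All names are hypothetical; this is not the Titanium runtime's representation.
final class GlobalRefSketch {
    final double[][] memory;   // memory[p] models the local memory of processor p
    int remoteReads = 0;       // counts dereferences that had to leave the processor

    GlobalRefSketch(double[][] memory) { this.memory = memory; }

    // Global dereference: carries an owner field and pays a locality check,
    // even when the data turns out to be local.
    double readGlobal(int thisProc, int ownerProc, int index) {
        if (ownerProc != thisProc) {
            remoteReads++;     // a real system would communicate here
        }
        return memory[ownerProc][index];
    }

    // Local dereference: the caller promises the data is on this processor,
    // so there is no owner field and no check.
    double readLocal(int thisProc, int index) {
        return memory[thisProc][index];
    }
}
```

Declaring a reference as local corresponds to calling `readLocal`: the check disappears, at the price of a promise that the data really is on the executing processor.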

Example: A Distributed Data Structure

• Data can be accessed across processor boundaries

[Figure: local_grids and all_grids on Proc 0 and Proc 1]

Example: Setting Boundary Conditions

foreach (l in local_grids.domain()) {
    foreach (a in all_grids.domain()) {
        local_grids[l].copy(all_grids[a]);
    }
}
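The loop above is short because the copy does the work of finding which cells overlap. A minimal plain-Java sketch of that idea, assuming the copy transfers only the elements in the intersection of the two grids' index ranges, is shown below in 1D. The class `Grid1D` and its members are hypothetical names for this illustration.

```java
// Sketch: a 1D grid that copies from another grid only where their index ranges overlap.
// Assumption: this mirrors a copy restricted to the intersection of two domains.
final class Grid1D {
    final int lo, hi;          // inclusive index range, possibly including ghost cells
    final double[] data;       // data[i - lo] holds the value at index i

    Grid1D(int lo, int hi, double fill) {
        this.lo = lo;
        this.hi = hi;
        data = new double[hi - lo + 1];
        java.util.Arrays.fill(data, fill);
    }

    double get(int i) { return data[i - lo]; }

    // Copy src into this grid on the overlap of the two ranges; no-op if disjoint.
    void copy(Grid1D src) {
        int from = Math.max(lo, src.lo);
        int to = Math.min(hi, src.hi);
        for (int i = from; i <= to; i++) {
            data[i - lo] = src.data[i - src.lo];
        }
    }
}
```

For example, a grid covering indices 0 to 5 (interior cells 0 to 4 plus one ghost cell at 5) that copies from a neighbour covering 5 to 9 receives only the value at index 5. A grid that does not overlap it at all leaves it unchanged, so the double loop can safely try every pair.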

Sequential Optimizations

• Current optimizations
  – foreach loops
    » within 20% of FORTRAN on many loop-intensive codes
• Optimizations in development
  – Cache blocking
  – Inlining

Parallel Optimizations

• Titanium compiler performs parallel optimizations
  – communication overlap and aggregation
  – fast parallel bulk I/O
• New analyses:
  – synchronization analysis: the parallel analog to control flow analysis for serial code [Gay & Aiken]
  – shared variable analysis: the parallel analog to dependence analysis [Krishnamurthy & Yelick]
  – local qualification inference: automatically inserts local qualifiers [Liblit & Aiken]

Architectures

• Titanium runs on many platforms
  – SP machines, T3Es, Networks of Workstations
• Titanium on Blue Horizon specifics
  – Uses LAPI (not MPI)
  – Allows user to specify threads (procs) per node
  – Performs conservative distributed garbage collection

AMR Gas Dynamics

• Hyperbolic Solver [McCorquodale & Colella]
  – Implementation of Berger-Colella algorithm
  – Mesh generation algorithm included
• 2D Example (3D supported)
  – Mach-10 shock on solid surface at oblique angle

FD-MLC for Poisson Problem

• Finite Difference based Method of Local Corrections [Balls & Colella]

• Example run on 16 processors
  – 1 large high-wavenumber charge
  – 2 smaller star-shaped charges

[Figure: color scale from -6.47x10^-9 through 0 to 1.31x10^-9]

Parallel Performance

• Speedup on Ultrasparc SMP

• EM3D is small kernel
  – relaxation on unstructured mesh
  – shows high parallel efficiency of Titanium system
• AMR speedup limited by
  – small fixed mesh
  – 2-levels, 9 patches

[Figure: speedup plot; y-axis ticks 0 to 8, x-axis ticks 1, 2, 4, 8; series em3d and amr]
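For reference, the usual definitions of speedup and parallel efficiency, which the slide is assumed to follow, are given below. Here $T(p)$ denotes the run time on $p$ processors; the symbols are notation introduced for this note and do not appear in the slides.

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
```

Under these definitions, ideal speedup on 8 processors is $S(8) = 8$, i.e. $E(8) = 1$. A fixed-size problem such as the small AMR mesh has less work per processor as $p$ grows, which is one common reason for $S(p)$ to fall below that line.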

FD-MLC Parallel Performance

• Communication requirement is low (< 5%)
• Scaled speedup experiments are nearly ideal (flat)

[Figures: IBM SP2 at SDSC; Cray T3E at NERSC]
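One way to read "flat", assuming it refers to run time staying roughly constant when the problem size grows in proportion to the processor count, is sketched below. $T(p, M)$ is the run time on $p$ processors for a problem of size $M$, and $N$ is the per-processor problem size; both symbols are notation introduced here.

```latex
T(p,\; p \cdot N) \;\approx\; T(1,\; N)
```

A scaled-speedup experiment that satisfies this relation would plot as a horizontal line of run time against $p$.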

Future Work

• Titanium language and compiler developments
  – Templates
  – Further optimization of serial performance
• Algorithm Development in Titanium
  – Self-gravitating gas dynamics
  – Immersed boundary methods
• Comparison to library approach
  – Performance
  – Code size and readability

Summary

• Language support
  – Arrays, Immutable, Overloading, …
• Compiler optimizations
  – Uniprocessor optimizations
  – Parallel analyses
• Architectures
  – Ported to several different platforms
• Results
  – Several algorithms implemented
  – Good parallel performance