Post on 21-Dec-2015

A Case for Source-Level Transformations in MATLAB

Vijay Menon and Keshav Pingali

Cornell University

The MaJic Project at Illinois/Cornell

• George Almasi
• Luiz De Rose
• David Padua

MATLAB

High-Level Interpreted Language for Numerical Computing
• Matrix is a first-class type
• Library of numerical functions

Application Domains
• Image Processing
• Structural Mechanics
• Computational Finance

The Problem

Development is fast... ~10X as concise as C/Fortran

Performance is slow! ~10X as slow as C/Fortran

Conventional Approach:
• Rewrite
• Compile

Our Approach: Source-Level Optimization

Apply high-level transformations directly on MATLAB codes

Significant performance benefit for:
• interpreted code
• compiled code

Outline

• Overheads in MATLAB
• Conventional Compilation
• Source-Level Optimization
• Comparison
• Implementation Status

Outline

• Overheads in MATLAB
  • Type/Shape Checking
  • Memory Management
  • Array Bounds Checking
• Conventional Compilation
• Source-Level Optimization
• Comparison
• Implementation Status

Type/Shape Checking

MATLAB has no type/shape declarations

Consider: A * B

Interpreter checks to perform multiply (*):

Shape
• Scalar*Scalar
• Scalar*Matrix
• Matrix*Matrix

Type
• Real*Real
• Real*Complex
• Complex*Complex
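The case analysis above can be sketched in MATLAB itself. This is only an illustration of the kind of decision made before a multiply is dispatched, not the interpreter's actual implementation; the variable names `shape_case` and `type_case` are hypothetical.

```matlab
% Illustrative operands: a real 2x2 matrix and a complex scalar.
A = [1 2; 3 4];
B = 2 + 3i;

% Shape check: which of the three shape cases applies to A * B?
if isscalar(A) && isscalar(B)
    shape_case = 'Scalar*Scalar';
elseif isscalar(A) || isscalar(B)
    shape_case = 'Scalar*Matrix';
else
    shape_case = 'Matrix*Matrix';
end

% Type check: which of the three type cases applies?
if isreal(A) && isreal(B)
    type_case = 'Real*Real';
elseif isreal(A) || isreal(B)
    type_case = 'Real*Complex';
else
    type_case = 'Complex*Complex';
end

C = A * B;   % only now can the right multiply routine be chosen
```

Without declarations, neither answer is known until the operands exist, so a check of this kind has to run each time the expression is evaluated.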

Type/Shape Checking

Consider:

for i = 1:n
    y = y + a * x(i)
end

Loops
• perform redundant checks
• magnify interpreter overhead

Memory Management: Dynamic Resizing

Consider:

x(10) = 10;

C/Fortran: x must have >= 10 elements

MATLAB: x is resized if needed
• Memory reallocated
• Data copied

Memory Management: Dynamic Resizing

MATLAB dynamically grows arrays:

for i = 1 : 1000
    x(i) = i;
end

Every iteration triggers resize!
• 1,000 memory allocations
• ~500,000 elements copied

Execution Time:
• x is undefined: 14.2 seconds
• x is already defined: 0.37 seconds
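The allocation and copy counts above can be reproduced with a small cost model. The sketch assumes the simplest growth policy: each out-of-range write allocates a block of exactly the new length and copies every existing element into it. The counters `allocations`, `copied`, and `len` are hypothetical bookkeeping variables, not anything MATLAB exposes.

```matlab
n = 1000;
allocations = 0;   % number of fresh memory blocks requested
copied      = 0;   % total elements moved from old block to new block
len         = 0;   % current length of the simulated array x

for i = 1:n
    if i > len                        % write to x(i) is out of range
        allocations = allocations + 1;
        copied = copied + len;        % old contents copied before the write
        len = i;                      % array now holds exactly i elements
    end
end
```

Under this model the loop performs 1,000 allocations and copies 0 + 1 + ... + 999 = 499,500 elements, which matches the "~500,000" figure.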

Array Bounds Checking

Consider array indexing:

x(i) = y(i);

Failed Bounds Check
• on x(i) can trigger resize
• on y(i) can trigger error
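A minimal sketch of the two outcomes, using small illustrative vectors (the values and the flag `read_failed` are assumptions made for the example):

```matlab
x = [1 2 3];
y = [10 20 30];

% Out-of-range WRITE: x silently grows to 5 elements; x(4) is zero-filled.
x(5) = 7;

% Out-of-range READ: indexing past the end of y raises an error.
read_failed = false;
try
    v = y(5);
catch
    read_failed = true;
end
```

Because the write side changes the array and the read side aborts, neither check can simply be dropped without knowing the index is in range.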

Array Bounds Checking

In a loop:

for i = 3:100
    x(i) = x(i-1) + x(i-2);
end

Interpreter performs redundant checks

Compiler work:
• Nonresizable arrays: Gupta PLDI’90
• Resizable arrays: more difficult

Common Theme

Loops magnify overheads
• every iteration: redundant checks, resizes, …

MATLAB interprets naively
• computes as is
• no reorganization to optimize

Outline

• Overheads in MATLAB
• Conventional Compilation
  • Compile to C/Fortran
  • Rely on C/Fortran compiler for optimization
• Source-Level Optimization
• Comparison
• Implementation Status

MATLAB Compilers

Compile to C/C++/Fortran
• MCC -> C (The MathWorks)
• MATCOM -> C++ (Mathtools)
• FALCON -> F90 (U of Illinois)

Native compiler generates executable code:
• Link back into MATLAB environment
• Run as stand-alone program

The MCC Compiler

Safe Optimization:
• Type Inference - no declarations in MATLAB
• Eliminate Type Checks / Reduce Storage
• Specialize for real input variables
• Always legal!

Unsafe Optimization:
• Assume all data is real
• Eliminate all bounds checks - disallow resizing
• User must ensure legality!

Falcon Benchmarks

Collected by DeRose from MATLAB users at Illinois/NCSA

Element/Loop Intensive
• CN - Crank-Nicolson PDE Solver
• Di - Dirichlet PDE Solver
• FD - Finite Difference PDE Solver
• Ga - Galerkin PDE Solver
• IC - Incomplete Cholesky Factorization

Memory Intensive
• AQ - Adaptive Quadrature w/ Simpson’s Rule
• EC - Euler-Cromer 2 body problem
• RK - Runge-Kutta 2 body problem

Library Intensive
• CG - Conjugate Gradients Iterative Solver
• Mei - 3D Surface Generation
• QMR - Quasi-Minimal Residual
• SOR - Successive Over-Relaxation

MCC: Safe Optimizations

[Figure: bar chart. Y-axis: Execution Time (s), 0 to 80. Benchmarks: AQ, CG, CN, Di, FD, Ga, IC, Mei, EC, RK, QMR, SOR. Series: Interpreted, MCC Safe.]

MCC: Unsafe Optimizations

[Figure: bar chart. Y-axis: Execution Time (s), 0 to 70. Benchmarks: CG, Di, FD, IC, QMR, SOR. Series: Interpreted, MCC Safe, MCC Unsafe All.]

Note: User must ensure legality!

Outline

• Overheads in MATLAB
• Conventional Compilation
• Source-Level Optimization
  • Vectorization
  • Preallocation
  • Expression Optimization
• Comparison
• Implementation Status

Vectorization

Loops are expensive
• Overheads are magnified

Idea: Eliminate Loops
• Map loops to higher-level matrix operations
• Interpreter uses efficient libraries: BLAS, LINPACK/EISPACK

Example of Vectorization

In Galerkin, 98% of execution spent in:

for i = 1:N
    for j = 1:N
        phi(k) = phi(k) + a(i,j)*x(i)*y(j);
    end
end

Vectorized Code

In Optimized Galerkin:

phi(k) = phi(k) + x*a*y';

Fragment Speedup: 260
Program Speedup: 110

Note: Not always possible!
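As a sanity check on the rewrite, the sketch below compares a double loop against the single matrix expression on random data. It assumes `x` and `y` are 1-by-N row vectors, `a` is N-by-N, and the loop body accumulates `a(i,j)*x(i)*y(j)`, which is the sum that `x*a*y'` computes. The size `N = 50` and the names `s_loop` and `s_vec` are illustrative choices.

```matlab
N = 50;
a = rand(N);        % N-by-N matrix
x = rand(1, N);     % row vector
y = rand(1, N);     % row vector

% Loop form: N^2 interpreted iterations, each with its own checks.
s_loop = 0;
for i = 1:N
    for j = 1:N
        s_loop = s_loop + a(i,j) * x(i) * y(j);
    end
end

% Vectorized form: one expression handed to the matrix libraries.
s_vec = x * a * y';

rel_diff = abs(s_loop - s_vec) / abs(s_vec);
```

The two results agree up to floating-point rounding, since the library may sum the terms in a different order than the loop does.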

Effect of Vectorization

[Figure: bar chart. Y-axis: Execution Time (s), 0 to 80. Benchmarks: CN, Di, FD, Ga, IC. Series: Original, Vectorized.]

Preallocation

Eliminate Dynamic Resizing
• Try to predict eventual size of array
• Insert early allocation when possible:

x = zeros(1000,1);

• Resizing will not be triggered

Example of Preallocation

In Euler-Cromer, 87% of time spent in:

for i = 1:N

r(i) = …

th(i) = …

t(i) = …

k(i) = …

p(i) = …

end

Preallocated Code

In Optimized Euler-Cromer:

r = zeros(1,N);

...

for i = 1:N

r(i) = …

end

Fragment Speedup: 7
Program Speedup: 4
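The transformation is only legal if it leaves the results unchanged. The sketch below checks that on a stand-in loop: the slides elide the real loop bodies, so the assignment `i^2` is a hypothetical placeholder, as are the names `r_grown` and `r_pre`.

```matlab
N = 1000;

% Original form: r starts empty and is resized on every iteration.
r_grown = [];
for i = 1:N
    r_grown(i) = i^2;      % placeholder body; the real body is not shown
end

% Preallocated form: one allocation up front, no resizing inside the loop.
r_pre = zeros(1, N);
for i = 1:N
    r_pre(i) = i^2;
end

same_result = isequal(r_grown, r_pre);
```

Wrapping each loop in `tic` and `toc` gives a rough timing comparison; the measured gap depends heavily on the MATLAB version and on N.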

Effect of Preallocation

[Figure: bar chart. Y-axis: Execution Time (s), 0 to 80. Benchmarks: CN, Ga, EC, RK. Series: Original, Preallocated.]

Expression Optimization

MATLAB interprets expressions naïvely in left to right order

Simple restructuring can significantly affect execution time, e.g.:
• A*B*x : O(n^3) flops
• A*(B*x) : O(n^2) flops
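The two orderings can be compared directly. The sketch uses the usual dense estimates, counting one multiply and one add per inner-product term: about 2n^3 flops for a matrix-matrix product and 2n^2 for a matrix-vector product. The size `n = 200` and the variable names are illustrative assumptions.

```matlab
n = 200;
A = rand(n);  B = rand(n);  x = rand(n, 1);

y_left  = A * B * x;      % evaluated left to right as (A*B)*x
y_right = A * (B * x);    % two matrix-vector products

% Approximate operation counts for each ordering.
flops_left  = 2*n^3 + 2*n^2;   % one matrix-matrix, then one matrix-vector
flops_right = 2*n^2 + 2*n^2;   % two matrix-vector products

ratio   = flops_left / flops_right;   % equals (n + 1) / 2
rel_err = norm(y_left - y_right) / norm(y_left);
```

Both orderings give the same vector up to rounding, but the estimated work differs by a factor of about n/2.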

Example of Expression Optimization

In QMR, 70% of execution spent in:

w = A'*q;

A : 420x420 matrix
q, w : 420x1 vectors

A' = transpose(A)

Expression Optimized Code

In Optimized QMR: A'*q == (q'*A)'

w = (q'*A)';

Transpose 2 vectors instead of 1 matrix

Fragment Speedup: 20
Program Speedup: 3
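The identity can be checked numerically at the sizes quoted for QMR. Random data stands in for the benchmark's actual matrix, and `w_orig`, `w_opt`, and `rel_err` are names chosen for the example.

```matlab
n = 420;
A = rand(n);         % stand-in for the 420x420 QMR matrix
q = rand(n, 1);      % 420x1 vector

w_orig = A' * q;       % transposes the full n-by-n matrix
w_opt  = (q' * A)';    % transposes two length-n vectors instead

rel_err = norm(w_orig - w_opt) / norm(w_orig);
```

In MATLAB the `'` operator is the conjugate transpose, and the identity (q'*A)' = A'*q holds for complex data as well, so the rewrite does not depend on A being real.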

Effect of Expression Optimization

[Figure: bar chart. Y-axis: Execution Time (s), 0 to 70. Benchmarks: EC, RK, QMR. Series: Original, Expr. Optimized.]

Summary Source-Level

[Figure: bar chart. Y-axis: Execution Time (s), 0 to 80. Benchmarks: AQ, CG, CN, Di, FD, Ga, IC, Mei, EC, RK, QMR, SOR. Series: Original, Source Optimized.]

Comparison

[Figure: bar chart. Y-axis: Execution Time (s), 0 to 80. Benchmarks: AQ, CG, CN, Di, FD, Ga, IC, Mei, EC, RK, QMR, SOR. Series: Interpreted, MCC Safe, MCC Best, Opt. Interpreted, Opt. MCC Safe, Opt. MCC Best.]

Point #1:

Source optimizations can outperform MCC

[Figure: bar chart. Y-axis: Execution Time (s), 0 to 70. Benchmarks: FD, Ga, IC, QMR. Series: Interpreted, MCC Safe, MCC Best, Opt. Interpreted, Opt. MCC Safe, Opt. MCC Best.]

Point #2:

Source optimizations complement MCC

[Figure: bar chart. Y-axis: Execution Time (s), 0 to 80. Benchmarks: CN, FD, Ga, IC, EC. Series: Interpreted, MCC Safe, MCC Best, Opt. Interpreted, Opt. MCC Safe, Opt. MCC Best.]

Benefits of Source-Level Optimizations

Vectorization
• Directly eliminates loop overhead
• Moves work to hand-optimized BLAS

Preallocation
• Eliminates resizing overhead
• Enables MCC array bounds elimination

Expression Optimization
• Uses algebraic info unavailable in C/Fortran

Implementation Status

Illinois/Cornell MaJic system
• Just-in-time MATLAB interpreter/compiler
• Incorporates Source-Level Transformation

Semantic Optimization (Menon/Pingali ICS’99)
• Vectorization/BLAS call generation
• Expression Optimization

Preallocation/Bounds Check Optimization (Work in progress)

Conclusion

Source-level optimizations are important for enhancing the performance of MATLAB, whether the code is only interpreted or later compiled

THE END

Unsafe Type Check Removal

[Figure: bar chart. Y-axis: Execution Time (s), 0 to 80. Series: Interpreted, MCC Safe, MCC Unsafe Type.]

Correct on 11/12 Codes

Unsafe Bounds Check Removal

[Figure: bar chart. Y-axis: Execution Time (s), 0 to 70. Benchmarks: CG, Di, FD, IC, Mei, QMR, SOR. Series: Interpreted, MCC Safe, MCC Unsafe Bounds.]

Correct on 7/12 Codes