Symbolic Program Transformation for Numerical Codes Vijay Menon Cornell University.

Symbolic Program Transformation for Numerical Codes

Vijay MenonCornell University

Background Compiler/Runtime Support for Numerical

Programs (under Keshav Pingali)– Sparse matrix code generation

– Memory hierarchy optimizations

– Compiler/runtime techniques for MATLAB• MAJIC (with University of Illinois)• MultiMATLAB

Motivation

Using matrix properties to generate efficient code – e.g., Associativity, Commutativity of matrix

operations

Problem: loop notation– Matrix operations hidden within loop nests

Example: LU Factorization with Partial Pivoting

Also called: Gaussian Elimination

Key algorithm for solving systems of linear equations:– To solve A x = b for x:

• => Factor A into L U• => Solve L y = b for y• => Solve U x = y for x

LU Factorization

Problem: Poor cache behavior

do j = 1,N

p(j) = j; do i = j+1,N if (A(i,j)>A(p(j),j)) p(j) = i;

do k = 1,N tmp = A(j,k); A(j,k) = A(p(j),k); A(p(j),k) = tmp;

do i = j+1,N A(i,j) = A(i,j)/A(j,j);

do k = j+1,N do i = j+1,N A(i,k) = A(i,k) - A(i,j)*A(j,k);

Select pivot row:

Swap pivot row with current:

Scale column (to store L):

Update(to compute partial U):

x x x x x x0 x x x x x0 0 5 x x x0 0 3 x x x0 0 7 x x x 0 0 2 x x x

Automatic Blocking Compiler

transformations:– Stripmine– Index-Set Split– Loop Distribute– Tile/Map to BLAS

Problems:– Establishing legality– Mapping to BLAS

do jB = 1,N,B

do j = jB,jB+B-1 p(j) = j; do i = j+1,N if (A(i,j)>A(p(j),j)) p(j) = i; do k = 1,N tmp = A(j,k); A(j,k) = A(p(j),k); A(p(j),k) = tmp; do i = j+1,N A(i,j) = A(i,j)/A(j,j); do k = j+1,jB+B-1 do i = j+1,N A(i,k) = A(i,k) - A(i,j)*A(j,k);

do j = jB,jB+B-1 do k = jB+B,N do i = j+1,N A(i,k) = A(i,k) - A(i,j)*A(j,k);

LU Performance

LAPACK

Distributed w ith BLAS

Distributed

Original LU

Key Points

Conventional techniques (dependence analysis) are insufficient to establish legality

Detection of matrix operations buys extra performance

Overview of this Talk

Fractal Symbolic Analysis– Framework for Symbolically determining

Legality of Program Transformations

Matrixization– Generalization of Vectorization to Matrix

Operations

Fractal Symbolic Analysis

Symbolic test to establish legality of program transformations

for i = 1 : n S1(i); S2(i);

for i = 1 : n S1(i);for i = 1 : n S2(i);

Dependence Analysis Independent operations may be reordered

Legality: No data dependence from S2(m) to S1(l) where l > m

for i = 1 : n S1(i); S2(i);

for i = 1 : n S1(i);for i = 1 : n S2(i);

Symbolic Comparison Dependence Analysis is conservative:

Symbolic execution shows equality:– aout = 2*(ain + bin)

– bout = 2*bin

But, intractable for recurrent loops

s1: a = 2*as2: b = 2*bs3: a = a+b

s3: a = a+bs1: a = 2*as2: b = 2*b

Distillation of LU Given p(j) j, prove:

Dependence analysis: too conservative Symbolic comparison: intractable

for j = 1:n tmp = a(j) a(j) = a(p(j)) a(p(j)) = tmp

for i = j+1:n a(i) = a(i)/a(j)

B1(j):

B2(j):

for j = 1:n for i = j+1:n a(i) = a(i)/a(j)

B1(j):

B2(j):

Simplify– Prove equality of complex programs via

equality of simpler programs– Repeat if necessary

Sufficient, but not necessary– equality of simpler programs -> equality of

original programs

CommutativityS1, S2, S3, …,Sn = Sf(1), Sf(2), Sf(3), …,Sf(n)

i,j 1..n | i < j f(i) > f(j).

Si; Sj = Sj; Si

Permutation Sequence of Adjacent Transpositions

Commutative Legality

Loop Distribution

Legality:1 m < l N.

S2(m); S1(l) = S1(l); S2(m)

for i = 1 : n S1(i); S2(i);

for i = 1 : n S1(i);for i = 1 : n S2(i);

Commutative Legality

Similar conditions:– Statement reordering– Loop interchange – Loop reversal – Loop tiling

Compiler establishes legality of transformation by testing commute condition

Back to Example Given p(j) j, prove:

Simplified comparison =>

for i = j+1:n a(i) = a(i)/a(j)

B1(j):

B2(j):

for j = 1:n for i = j+1:n a(i) = a(i)/a(j)

B1(j):

B2(j):

Simplified Comparison Given p(j) j l > m prove:

Further simplification?

for i = m+1:n a(i) = a(i)/a(m)

tmp = a(l) a(l) = a(p(l)) a(p(l)) = tmp

B2(m):

B1(l):

B2(m):

Too Simple! Given p(j) j l > m i > m prove:

Too conservative - no longer legal! Hypothesis:

– Repeated application -> dependence analysis

a(i) = a(i)/a(m)

Symbolic Comparison Can we symbolically prove?:

Effect can be summarized:– non-recurrent loops– affine indices/loop bounds

B2(m):

B1(l):

B2(m):

Guarded Expressions

Compare symbolic values of live output variable in terms of input variables:

aout (k)=

Each– guard : affine expression defining part of aout

– expr : symbolic expression on input data

guard1 -> expr1

guard2 -> expr2

……guardn -> exprn

Guarded Expressions

For both program blocks:

aout(k) =

Omega Library (integer programming tool) to manipulate/compare affine guards

k m => ain(k)

k = l => ain(p(l))/ain(m)

k = p(l) => ain(l)/ain(m)

else => ain(k)/ain(m)

Summary of Fractal Symbolic Analysis Powerful legality test Explores tradeoffs between

– tractability (dependence analysis)– accuracy (symbolic comparison)

Similar application for LU w/ pivoting– 2 recursive simplification steps– 6 guarded regions/expressions

Prototype implemented in OCAML

Overview of this Talk

Fractal Symbolic Analysis– Framework for Symbolically determining

Legality of Program Transformations

Matrixization– Generalization of Vectorization to Matrix

Operations

Matrixization

Detect Matrix Operations– Map to

• hand-optimized libraries (BLAS)• special-purpose hardware

– Eliminate loop overhead• MATLAB: type/bounds checks

Exploit Matrix Semantics– Ring Properties of Matrix (+,*)

BLAS Performance

100150200250300350400450500

100 200 300 400 500 600 700 800 900 1000size

Compiler Matrix-Matrix ProductBLAS Matrix-Matrix Product (DGEMM)Compiler Matrix-Vector ProductBLAS Matrix-Vector Product (DGEMV)

MATLAB Performance

Crank-Nicholson

Dirichlet FiniteDifference

Galerkin Inc.Cholesky

Original

Vectorized

Compiled

Vect. & Comp.

Galerkin Example

In Galerkin, 98% of time spent in:

for i = 1:N for j = 1:N phi(k) += A(i,j)*x(i)*y(i);

end end

A - Matrix x, y - Vector

Vectorized Code

In Optimized Galerkin:

phi(k) += x*A*y’;

Fragment Speedup: 260 Program Speedup: 110

Conventional Approaches

Syntactic Pattern Replacement (KAPF,VAST,FALCON) Can encode high-level propertiesLimited use on loops

Vector Code Generation Can detect array operations in loopsCannot detect/exploit matrix products

Our Approach Map code to: Abstract Matrix Form (AMF)

Convert to Symmetric AMF formulation

Optimize AMF expressions– factorization– invariant removal

Detect Matrix Products

Map AMF to MATLAB/BLAS

Expansion

Array expressions:

Forall loops:

a(1:m,1:n) -> i j ai,j

for i = 1:n x(i) = x(i) + y(i); -> ixi = i(xi+yi)

Reduction

Sum Reduction Loops

for i = 1:n k = k + x(i); -> k = k + î i xi

Product

Matrix ProductC = A*B -> i jCi,j= P(i jAi,h,ijBh,j)

Product-Reduction Equivalence:

• e1 and e2 are scalar

• e1 is constant w.r.t. ik+1…in

• e2 is constant w.r.t. i1…ik-1

îki1…ine1 * i1…ine2) = Pîk(i1…ike1, ik…ine2)

Other AMF Properties

Distributive Properties– e1· î e2 = î((ie1) ·e2)

• when e1 is constant w.r.t. i

Interchange Properties if(e1, e2,…, en) = f(ie1, ie2,…, ien)

i e = i e • where î=

i j e = j i e

î e = î e

Back to Galerkin Original:

• k = k + î i j (ai,j * xi * yj)

Convert to Symmetric Form:• k = k + î (i j ai,j * i j xi * i j yj)

Optimize:• k = k + î i xi * (i j ai,j * i j yj)

Map to Matrix Operations:• k = k + Pî (i xi , P(i j ai,j, j yj))

=> k = k + x’*A*y;

Other Vectorization Examples

Taken from USENET:

for i = 1:n for j = 1:n C(i,j) = A(i,j)*x(i);

C(1:n,1:n) = A(1:n,1:n) .* repmat(x(1,n),1,n)

for i = 1:n x(i) = y(:,i)*A*y(:,i)’;

x(1:n) = sum(y.*(y*A’),2);

Summary/Status of Matrixization General technique to detect matrix/vector

operations

Implemented in MAJIC– MATLAB interpreter/compiler

Future work:– Extend to more ops: recurrences– JIT Matrixization

Related Work Commutativity Analysis

– Rinard & Diniz– Parallelization of OO programs

High-Level Pattern Replacement– DeRose, Marsolf, …– Exploit high-level matrix properties

• Structure• Symmetry• Orthogonality

Conclusion

High-level symbolic techniques:– Fractal Symbolic Analysis– Matrixization

New techniques to analyze & exploit underlying computation and achieve substantial performance gains

Symbolic Program Transformation for Numerical Codes Vijay Menon Cornell University.

Documents

Transcript of Symbolic Program Transformation for Numerical Codes Vijay Menon Cornell University.

MENON - edu.uowm.gr

Menon Enterprises Maharashtra India

Asha Raman Menon Asha raman menon@yahoo.co.uk …

Cases Status June 201707042017... · name) Doordarshan Bhavan, Copernicus Marg, New Delhi -110001. RTT Cell, DDK, Ahmedabad Shri Vijay Adesara LDC for Hindi Version Smt Mini Menon,

Crispr technology pnk menon

Menon W R M N P PROF. N. R. MADHAVA MENON ASIAN JURAL ...

Manisha Menon Ssis Portfolio

Varsha Menon _Architectural Portfolio

Proptosis by Dr.Ashwin Menon

ITRI II - Menon Costruttori

17c MENON Outing Heteronormativity

c 2016 Harshitha Menon Gopalakrishnan Menoncharm.cs.illinois.edu/newPapers/16-18/thesis.pdf · ADAPTIVE LOAD BALANCING FOR HPC APPLICATIONS BY HARSHITHA MENON GOPALAKRISHNAN MENON

Jayant Menon - cdri.e-khmer.com

Invitation Vijay Hi Vijay

The RAMPPS Course Handbook - Health Education England · Dr. Mohandas Dinesh Anne Hobbins Dr. Jill Horn Julie Megaw Dr. Vijay Menon ... Dr. Manoj Narayan Dr. Mark Radcliffe Tees,

dart2java - Matthias SpringerInternship at Google Seattle 4 buildings in Fremont, ~100 interns in Seattle + Kirkland offices Working in the Dart team (Host: Vijay Menon) Dart Sub-Teams

10 Menon 2007

Krishna Menon- Disarmament

Notes by Drdini Menon

Platon, 3.2 Gorgias, Menon