Symbolic Program Transformation for Numerical Codes Vijay Menon Cornell University.
Posted: 20-Dec-2015
Symbolic Program Transformation for Numerical Codes
Vijay Menon
Cornell University
Background
Compiler/Runtime Support for Numerical Programs (under Keshav Pingali)
– Sparse matrix code generation
– Memory hierarchy optimizations
– Compiler/runtime techniques for MATLAB
  • MAJIC (with University of Illinois)
  • MultiMATLAB
Motivation
Using matrix properties to generate efficient code
– e.g., associativity, commutativity of matrix operations
Problem: loop notation
– Matrix operations hidden within loop nests
Example: LU Factorization with Partial Pivoting
Also called: Gaussian Elimination
Key algorithm for solving systems of linear equations:
– To solve A x = b for x:
  • => Factor A into L U
  • => Solve L y = b for y
  • => Solve U x = y for x
LU Factorization
Problem: Poor cache behavior
do j = 1,N
  ! Select pivot row:
  p(j) = j;
  do i = j+1,N
    if (A(i,j)>A(p(j),j)) p(j) = i;
  ! Swap pivot row with current:
  do k = 1,N
    tmp = A(j,k); A(j,k) = A(p(j),k); A(p(j),k) = tmp;
  ! Scale column (to store L):
  do i = j+1,N
    A(i,j) = A(i,j)/A(j,j);
  ! Update (to compute partial U):
  do k = j+1,N
    do i = j+1,N
      A(i,k) = A(i,k) - A(i,j)*A(j,k);

x x x x x x
0 x x x x x
0 0 5 x x x
0 0 3 x x x
0 0 7 x x x
0 0 2 x x x
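The loop nest above can be transcribed into a short Python sketch for experimentation. It assumes a dense square matrix stored as a list of row lists, uses 0-based indices, and compares magnitudes with abs when selecting the pivot (the slide compares raw values). The name lu_inplace is illustrative only.

```python
def lu_inplace(A):
    """Factor the square matrix A (a list of row lists) in place.

    On return, the strict lower triangle of A holds the multipliers of L
    (unit diagonal implied) and the upper triangle holds U. The returned
    list p records which row was swapped with row j at step j.
    """
    n = len(A)
    p = list(range(n))
    for j in range(n):
        # Select pivot row (assumption: compare magnitudes, not raw values).
        for i in range(j + 1, n):
            if abs(A[i][j]) > abs(A[p[j]][j]):
                p[j] = i
        # Swap pivot row with current row across all columns.
        A[j], A[p[j]] = A[p[j]], A[j]
        # Scale column j below the diagonal (stores L).
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]
        # Update the trailing submatrix (computes partial U).
        for k in range(j + 1, n):
            for i in range(j + 1, n):
                A[i][k] -= A[i][j] * A[j][k]
    return p
```

Applying the recorded swaps, in order, to the rows of the original matrix gives the permuted matrix whose product form is L U.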
Automatic Blocking
Compiler transformations:
– Stripmine
– Index-Set Split
– Loop Distribute
– Tile/Map to BLAS
Problems:
– Establishing legality
– Mapping to BLAS

do jB = 1,N,B
  do j = jB,jB+B-1
    p(j) = j;
    do i = j+1,N
      if (A(i,j)>A(p(j),j)) p(j) = i;
    do k = 1,N
      tmp = A(j,k); A(j,k) = A(p(j),k); A(p(j),k) = tmp;
    do i = j+1,N
      A(i,j) = A(i,j)/A(j,j);
    do k = j+1,jB+B-1
      do i = j+1,N
        A(i,k) = A(i,k) - A(i,j)*A(j,k);
  do j = jB,jB+B-1
    do k = jB+B,N
      do i = j+1,N
        A(i,k) = A(i,k) - A(i,j)*A(j,k);
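One way to build confidence in the blocked loop nest is to run it next to the original on the same input. The sketch below is a hedged Python transcription, not the compiler's output: the names lu_point, lu_blocked and _column_step are hypothetical, indices are 0-based, pivot selection uses abs, and the last panel is clipped with min so that N need not be a multiple of the block size B.

```python
def _column_step(A, p, j, k_end):
    """One step j of LU: pivot, swap, scale, then update columns j+1..k_end-1."""
    n = len(A)
    for i in range(j + 1, n):
        if abs(A[i][j]) > abs(A[p[j]][j]):
            p[j] = i
    A[j], A[p[j]] = A[p[j]], A[j]          # swap whole rows
    for i in range(j + 1, n):
        A[i][j] /= A[j][j]                 # store multipliers of L
    for k in range(j + 1, k_end):
        for i in range(j + 1, n):
            A[i][k] -= A[i][j] * A[j][k]


def lu_point(A):
    """Original (unblocked) factorization: every step updates all columns."""
    n = len(A)
    p = list(range(n))
    for j in range(n):
        _column_step(A, p, j, n)
    return p


def lu_blocked(A, B):
    """Blocked factorization: updates right of the panel are delayed."""
    n = len(A)
    p = list(range(n))
    for jB in range(0, n, B):
        hi = min(jB + B, n)                # assumption: clip the last panel
        for j in range(jB, hi):
            _column_step(A, p, j, hi)      # updates stay inside the panel
        for j in range(jB, hi):            # delayed updates, right of the panel
            for k in range(hi, n):
                for i in range(j + 1, n):
                    A[i][k] -= A[i][j] * A[j][k]
    return p
```

Running both on copies of the same matrix and comparing the pivots and the factored entries is an empirical check only; establishing that the two always agree is the legality question the talk addresses.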
LU Performance
[Figure: MFlops (0 to 500) versus matrix size for LAPACK, Distributed with BLAS, Distributed, and Original LU]
Key Points
Conventional techniques (dependence analysis) are insufficient to establish legality
Detection of matrix operations buys extra performance
Overview of this Talk
Fractal Symbolic Analysis
– Framework for symbolically determining legality of program transformations
Matrixization
– Generalization of vectorization to matrix operations
Fractal Symbolic Analysis
Symbolic test to establish legality of program transformations
Original loop:
  for i = 1 : n
    S1(i);
    S2(i);
Distributed loops:
  for i = 1 : n
    S1(i);
  for i = 1 : n
    S2(i);
Dependence Analysis
Independent operations may be reordered
Legality: No data dependence from S2(m) to S1(l) where l > m
Original loop:
  for i = 1 : n
    S1(i);
    S2(i);
Distributed loops:
  for i = 1 : n
    S1(i);
  for i = 1 : n
    S2(i);
Symbolic Comparison
Dependence analysis is conservative:

  s1: a = 2*a          s3: a = a+b
  s2: b = 2*b    =?    s1: a = 2*a
  s3: a = a+b          s2: b = 2*b

Symbolic execution shows equality:
– aout = 2*(ain + bin)
– bout = 2*bin
But intractable for recurrent loops
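The symbolic execution of the two statement orders can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: every value is a linear combination of the inputs ain and bin, so a pair (coefficient of ain, coefficient of bin) is enough to represent it. The helper names are hypothetical.

```python
def scale(c, v):
    """Multiply the linear form v by the constant c."""
    return (c * v[0], c * v[1])


def add(u, v):
    """Add two linear forms."""
    return (u[0] + v[0], u[1] + v[1])


def s1(env):  # a = 2*a
    return {**env, "a": scale(2, env["a"])}


def s2(env):  # b = 2*b
    return {**env, "b": scale(2, env["b"])}


def s3(env):  # a = a+b
    return {**env, "a": add(env["a"], env["b"])}


def execute(program):
    """Run the statements symbolically, starting from a = ain and b = bin."""
    env = {"a": (1, 0), "b": (0, 1)}
    for stmt in program:
        env = stmt(env)
    return env


first = execute([s1, s2, s3])
second = execute([s3, s1, s2])
print(first, second, first == second)
```

Both orders end with a = 2*ain + 2*bin and b = 2*bin, matching the equality stated on the slide.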
Distillation of LU
Given p(j) ≥ j, prove:

for j = 1:n
  B1(j):
    tmp = a(j)
    a(j) = a(p(j))
    a(p(j)) = tmp
  B2(j):
    for i = j+1:n
      a(i) = a(i)/a(j)

=

for j = 1:n
  B1(j):
    tmp = a(j)
    a(j) = a(p(j))
    a(p(j)) = tmp
for j = 1:n
  B2(j):
    for i = j+1:n
      a(i) = a(i)/a(j)

Dependence analysis: too conservative
Symbolic comparison: intractable
Idea
Simplify
– Prove equality of complex programs via equality of simpler programs
– Repeat if necessary
Sufficient, but not necessary
– equality of simpler programs -> equality of original programs
Commutativity
S1, S2, S3, …, Sn = Sf(1), Sf(2), Sf(3), …, Sf(n)
if
  ∀ i,j ∈ 1..n | i < j ∧ f(i) > f(j).
    Si; Sj = Sj; Si
Permutation = Sequence of Adjacent Transpositions
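The remark that a permutation is a sequence of adjacent transpositions can be made concrete with a bubble sort, which swaps exactly the pairs whose relative order differs between the two sequences. The sketch below is illustrative: statements are identified by their original positions, and commute is a caller-supplied predicate standing in for whatever test establishes Si; Sj = Sj; Si.

```python
def adjacent_transpositions(order):
    """Bubble-sort `order` and return the statement pairs that were swapped.

    `order` lists statement ids in their new order, for example [3, 1, 2].
    Each returned pair (x, y) has x < y and is swapped exactly once.
    """
    seq = list(order)
    swaps = []
    for end in range(len(seq) - 1, 0, -1):
        for pos in range(end):
            if seq[pos] > seq[pos + 1]:
                swaps.append((seq[pos + 1], seq[pos]))
                seq[pos], seq[pos + 1] = seq[pos + 1], seq[pos]
    return swaps


def reordering_is_legal(order, commute):
    """Sufficient test: every pair that changes relative order must commute."""
    return all(commute(x, y) for x, y in adjacent_transpositions(order))


print(adjacent_transpositions([3, 1, 2]))
```

Because the test is sufficient but not necessary, a False result means only that this argument does not apply, not that the reordering is wrong.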
Commutative Legality
Loop Distribution
Legality: ∀ 1 ≤ m < l ≤ N.
  S2(m); S1(l) = S1(l); S2(m)
Original loop:
  for i = 1 : n
    S1(i);
    S2(i);
Distributed loops:
  for i = 1 : n
    S1(i);
  for i = 1 : n
    S2(i);
Commutative Legality
Similar conditions:
– Statement reordering
– Loop interchange
– Loop reversal
– Loop tiling
Compiler establishes legality of transformation by testing commute condition
Back to Example
Given p(j) ≥ j, prove:

for j = 1:n
  B1(j):
    tmp = a(j)
    a(j) = a(p(j))
    a(p(j)) = tmp
  B2(j):
    for i = j+1:n
      a(i) = a(i)/a(j)

=

for j = 1:n
  B1(j):
    tmp = a(j)
    a(j) = a(p(j))
    a(p(j)) = tmp
for j = 1:n
  B2(j):
    for i = j+1:n
      a(i) = a(i)/a(j)

Simplified comparison =>
Simplified Comparison
Given p(j) ≥ j ∧ l > m, prove:

B2(m):
  for i = m+1:n
    a(i) = a(i)/a(m)
B1(l):
  tmp = a(l)
  a(l) = a(p(l))
  a(p(l)) = tmp

=

B1(l):
  tmp = a(l)
  a(l) = a(p(l))
  a(p(l)) = tmp
B2(m):
  for i = m+1:n
    a(i) = a(i)/a(m)

Further simplification?
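The commute condition above can be spot-checked numerically before attempting a proof. The sketch below is an empirical check, not a legality proof: it uses 0-based indices, exact rational arithmetic via fractions.Fraction, nonzero random inputs, and a random pivot vector satisfying p(j) ≥ j. The function names are hypothetical.

```python
from fractions import Fraction
import random


def B1(a, p, l):
    """Swap a[l] with a[p[l]]."""
    a[l], a[p[l]] = a[p[l]], a[l]


def B2(a, m):
    """Divide every element after position m by a[m]."""
    for i in range(m + 1, len(a)):
        a[i] = a[i] / a[m]


def commutes(a_in, p, m, l):
    """Compare B2(m); B1(l) against B1(l); B2(m) on the same input."""
    x = list(a_in)
    B2(x, m)
    B1(x, p, l)
    y = list(a_in)
    B1(y, p, l)
    B2(y, m)
    return x == y


rng = random.Random(1)
n = 6
a = [Fraction(rng.randint(1, 9)) for _ in range(n)]
p = [rng.randint(j, n - 1) for j in range(n)]   # enforces p(j) >= j
ok = all(commutes(a, p, m, l) for m in range(n) for l in range(m + 1, n))
print(ok)
```

Dropping the assumption p(l) ≥ l breaks the check: if the swap can reach position m, the divisor a(m) differs between the two orders.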
Too Simple!
Given p(j) ≥ j ∧ l > m ∧ i > m, prove:

a(i) = a(i)/a(m)
tmp = a(l)
a(l) = a(p(l))
a(p(l)) = tmp

=

tmp = a(l)
a(l) = a(p(l))
a(p(l)) = tmp
a(i) = a(i)/a(m)

Too conservative - no longer legal!
Hypothesis:
– Repeated application -> dependence analysis
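A small counterexample shows why the single-statement comparison fails even though the loop-level comparison holds. The values below are made up for illustration and use 0-based indices; they satisfy p(l) ≥ l, l > m and i > m, with i chosen equal to l.

```python
from fractions import Fraction


def scale_one(a, i, m):
    """a(i) = a(i)/a(m)"""
    a[i] = a[i] / a[m]


def swap(a, l, pl):
    """tmp = a(l); a(l) = a(p(l)); a(p(l)) = tmp"""
    a[l], a[pl] = a[pl], a[l]


a_in = [Fraction(2), Fraction(4), Fraction(6)]
m, l, pl, i = 0, 1, 2, 1

first = list(a_in)
scale_one(first, i, m)
swap(first, l, pl)

second = list(a_in)
swap(second, l, pl)
scale_one(second, i, m)

print(first, second)
```

The scaled element ends up in different positions in the two orders, so the statements do not commute individually; only the whole scaling loop commutes with the swap.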
Symbolic Comparison
Can we symbolically prove?:

B2(m):
  for i = m+1:n
    a(i) = a(i)/a(m)
B1(l):
  tmp = a(l)
  a(l) = a(p(l))
  a(p(l)) = tmp

=

B1(l):
  tmp = a(l)
  a(l) = a(p(l))
  a(p(l)) = tmp
B2(m):
  for i = m+1:n
    a(i) = a(i)/a(m)

Effect can be summarized:
– non-recurrent loops
– affine indices/loop bounds
Guarded Expressions
Compare symbolic values of live output variables in terms of input variables:

aout(k) =
  guard1 -> expr1
  guard2 -> expr2
  …
  guardn -> exprn

Each
– guard : affine expression defining part of aout
– expr : symbolic expression on input data
Guarded Expressions
For both program blocks:

aout(k) =
  k ≤ m => ain(k)
  k = l => ain(p(l))/ain(m)
  k = p(l) => ain(l)/ain(m)
  else => ain(k)/ain(m)

Omega Library (integer programming tool) to manipulate/compare affine guards
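The guarded expression above can be read as a piecewise function of k and checked against direct execution of both statement orders. The sketch below evaluates the guards pointwise on concrete data; it does not use the Omega Library or manipulate guards symbolically. Indices are 0-based, the inputs are made up, and the function names are hypothetical.

```python
from fractions import Fraction


def guarded_a_out(k, a_in, p, l, m):
    """Evaluate the guarded expression for a_out(k), first matching guard wins."""
    if k <= m:
        return a_in[k]
    if k == l:
        return a_in[p[l]] / a_in[m]
    if k == p[l]:
        return a_in[l] / a_in[m]
    return a_in[k] / a_in[m]


def run_b2_then_b1(a_in, p, l, m):
    a = list(a_in)
    for i in range(m + 1, len(a)):
        a[i] = a[i] / a[m]
    a[l], a[p[l]] = a[p[l]], a[l]
    return a


def run_b1_then_b2(a_in, p, l, m):
    a = list(a_in)
    a[l], a[p[l]] = a[p[l]], a[l]
    for i in range(m + 1, len(a)):
        a[i] = a[i] / a[m]
    return a


a_in = [Fraction(v) for v in (3, 5, 7, 2, 9)]
p = [0, 1, 4, 3, 4]      # satisfies p(j) >= j
m, l = 1, 2              # satisfies l > m
summary = [guarded_a_out(k, a_in, p, l, m) for k in range(len(a_in))]
print(summary == run_b2_then_b1(a_in, p, l, m) == run_b1_then_b2(a_in, p, l, m))
```

When p(l) = l the second and third guards overlap, but they yield the same value, so evaluating them in order is harmless.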
Summary of Fractal Symbolic Analysis
Powerful legality test
Explores tradeoffs between
– tractability (dependence analysis)
– accuracy (symbolic comparison)
Similar application for LU w/ pivoting
– 2 recursive simplification steps
– 6 guarded regions/expressions
Prototype implemented in OCaml
Overview of this Talk
Fractal Symbolic Analysis
– Framework for symbolically determining legality of program transformations
Matrixization
– Generalization of vectorization to matrix operations
Matrixization
Detect Matrix Operations
– Map to
  • hand-optimized libraries (BLAS)
  • special-purpose hardware
– Eliminate loop overhead
  • MATLAB: type/bounds checks
Exploit Matrix Semantics
– Ring Properties of Matrix (+,*)
BLAS Performance
[Figure: MFlops (0 to 500) versus matrix size (100 to 1000) for Compiler Matrix-Matrix Product, BLAS Matrix-Matrix Product (DGEMM), Compiler Matrix-Vector Product, and BLAS Matrix-Vector Product (DGEMV)]
MATLAB Performance
[Figure: execution time in seconds (0 to 80) for Crank-Nicholson, Dirichlet, Finite Difference, Galerkin, and Inc. Cholesky; series: Original, Vectorized, Compiled, Vect. & Comp.]
Galerkin Example
In Galerkin, 98% of time spent in:
for i = 1:N
  for j = 1:N
    phi(k) += A(i,j)*x(i)*y(j);
  end
end

A - Matrix
x, y - Vector
Vectorized Code
In Optimized Galerkin:
  phi(k) += x*A*y';
Fragment Speedup: 260
Program Speedup: 110
Conventional Approaches
Syntactic Pattern Replacement (KAPF, VAST, FALCON)
– Can encode high-level properties
– Limited use on loops
Vector Code Generation
– Can detect array operations in loops
– Cannot detect/exploit matrix products
Our Approach
Map code to Abstract Matrix Form (AMF)
Convert to symmetric AMF formulation
Optimize AMF expressions
– factorization
– invariant removal
Detect matrix products
Map AMF to MATLAB/BLAS
Expansion
Array expressions:
  a(1:m,1:n) -> ∀i ∀j a_{i,j}
Forall loops:
  for i = 1:n
    x(i) = x(i) + y(i);
  end
  -> ∀i x_i = ∀i (x_i + y_i)
Reduction
Sum Reduction Loops
  for i = 1:n
    k = k + x(i);
  end
  -> k = k + Σ_î ∀i x_i
Product
Matrix Product
  C = A*B -> ∀i ∀j C_{i,j} = P(∀i ∀h A_{i,h}, ∀h ∀j B_{h,j})
Product-Reduction Equivalence:
  Σ_{î_k} (∀i_1…i_n e1 * ∀i_1…i_n e2) = P_{î_k}(∀i_1…i_k e1, ∀i_k…i_n e2)
  • e1 and e2 are scalar
  • e1 is constant w.r.t. i_{k+1}…i_n
  • e2 is constant w.r.t. i_1…i_{k-1}
Other AMF Properties
Distributive Properties
– e1 · Σ_î e2 = Σ_î ((∀i e1) · e2)
  • when e1 is constant w.r.t. i
Interchange Properties
– ∀i f(e1, e2, …, en) = f(∀i e1, ∀i e2, …, ∀i en)
i e = i e • where î=
i j e = j i e
î e = î e
Back to Galerkin
Original:
  • k = k + Σ_î ∀i ∀j (a_{i,j} * x_i * y_j)
Convert to Symmetric Form:
  • k = k + Σ_î (∀i ∀j a_{i,j} * ∀i ∀j x_i * ∀i ∀j y_j)
Optimize:
  • k = k + Σ_î ∀i x_i * (∀i ∀j a_{i,j} * ∀i ∀j y_j)
Map to Matrix Operations:
  • k = k + P_î (∀i x_i, P(∀i ∀j a_{i,j}, ∀j y_j))
  => k = k + x'*A*y;
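The stages of the derivation above can be mirrored in plain Python to see that they compute the same number. This is a sketch of the arithmetic only, not of the AMF machinery: it assumes the loop body multiplies by y(j), and the helper names (matvec, dot, and the three galerkin_* functions) are hypothetical.

```python
def galerkin_loops(A, x, y):
    """Original form: a double loop accumulating A(i,j)*x(i)*y(j)."""
    total = 0
    for i in range(len(x)):
        for j in range(len(y)):
            total += A[i][j] * x[i] * y[j]
    return total


def galerkin_factored(A, x, y):
    """Optimized form: x(i) does not depend on j, so it leaves the inner sum."""
    return sum(
        x[i] * sum(A[i][j] * y[j] for j in range(len(y)))
        for i in range(len(x))
    )


def matvec(A, v):
    """Matrix-vector product A*v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u, v):
    """Inner product u'*v."""
    return sum(a * b for a, b in zip(u, v))


def galerkin_products(A, x, y):
    """Matrix form: x' * (A * y)."""
    return dot(x, matvec(A, y))


A = [[1, 2], [3, 4]]
x = [5, 6]
y = [7, 8]
print(galerkin_loops(A, x, y), galerkin_factored(A, x, y), galerkin_products(A, x, y))
```

The matrix form replaces the scalar loop nest with one matrix-vector product and one inner product, which is what allows mapping to MATLAB built-ins or BLAS routines.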
Other Vectorization Examples
Taken from USENET:
for i = 1:n
  for j = 1:n
    C(i,j) = A(i,j)*x(i);
  end
end
->
C(1:n,1:n) = A(1:n,1:n) .* repmat(x(1:n),1,n)

for i = 1:n
  x(i) = y(i,:)*A*y(i,:)';
end
->
x(1:n) = sum(y.*(y*A'),2);
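Both rewrites can be checked against their loop forms on small inputs. The Python sketch below is an illustration under stated assumptions: matrices are lists of row lists, x in the first example is treated as a column vector so that x(i) scales row i, and the second example reads y(i,:) as row i of y. The function names are hypothetical.

```python
def scale_rows_loop(A, x):
    """C(i,j) = A(i,j)*x(i), written as a double loop."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            C[i][j] = A[i][j] * x[i]
    return C


def scale_rows_whole(A, x):
    """Same result as A .* repmat(x,1,n): every row i is scaled by x(i)."""
    return [[a * xi for a in row] for row, xi in zip(A, x)]


def quad_forms_loop(Y, A):
    """x(i) = y(i,:)*A*y(i,:)', one quadratic form per row of Y."""
    n = len(A)
    return [
        sum(row[k] * A[k][l] * row[l] for k in range(n) for l in range(n))
        for row in Y
    ]


def quad_forms_whole(Y, A):
    """Same result as sum(y.*(y*A'),2): one product, then row sums."""
    n = len(A)
    YAt = [[sum(row[l] * A[k][l] for l in range(n)) for k in range(n)] for row in Y]
    return [sum(yv * tv for yv, tv in zip(row, trow)) for row, trow in zip(Y, YAt)]


A = [[1, 2], [3, 4]]
Y = [[1, 2], [3, 4]]
x = [5, 6]
print(scale_rows_whole(A, x), quad_forms_whole(Y, A))
```

In the second pair the whole-array form computes y*A' once for all rows instead of forming a separate vector-matrix-vector product per iteration.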
Summary/Status of Matrixization
General technique to detect matrix/vector operations
Implemented in MAJIC
– MATLAB interpreter/compiler
Future work:
– Extend to more ops: recurrences
– JIT Matrixization
Related Work
Commutativity Analysis
– Rinard & Diniz
– Parallelization of OO programs
High-Level Pattern Replacement
– DeRose, Marsolf, …
– Exploit high-level matrix properties
  • Structure
  • Symmetry
  • Orthogonality
Conclusion
High-level symbolic techniques:
– Fractal Symbolic Analysis
– Matrixization
New techniques to analyze & exploit underlying computation and achieve substantial performance gains