LEVERAGING DSLS (ALMOST) FOR FREE
@alvinkcheung
metalift.uwplse.org/strangeloop17.pdf
uwdb.io
uwplse.org
Time
Analytics
Image Processing
Hardware
Parallel Computing
?
Rotate, Blur
Join
Caching, Pipelining
Hash implementation
Embedded algorithmic details
Architecture-specific optimizations
Hash Partitioning
? Your DSL here
Annals of Code Rewrites
• Overhaul of the Mozilla Gecko layout engine (2 years), and now the move to Servo
• Rewrite of the INGRES database into Postgres (3 years)
• Twitter search engine rewrite (2 years)
All of these were major engineering efforts!
Vision: Perform code rewrites automatically
Current focus: Performance-critical code fragments
for ($i = 0; $i < $l1.size(); ++$i) {
  $l2.append($l1[$i] + $c);
}
→
$l2 = $l1.map(e -> e + $c);
List getUsersWithRoles () {
  List users = this.userDao.getUsers();
  List roles = this.roleDao.getRoles();
  List results = new ArrayList();
  for (User u : users) {
    for (Role r : roles) {
      if (u.roleId == r.id)
        results.add(u);
    }
  }
  return results;
}
?
Syntax-Driven Compilation
[Radoi et al, OOPSLA 14]
+ E:localize-fold-accsses
docs.fold(m)({
  case (v34, (i, v8)) =>
    val v4 = v8.split(" ").groupBy(ID).__2;
    val v5 = v4.zip(v34).map({
      case (v2, (v3, v7)) => v7 + v3.size
    });
    v34 ++ v5
})
+ E:map-vertical-fission
docs.fold(m)({
  case (v34, (i, v8)) =>
    val v4 = v8.split(" ").groupBy(ID).__2;
    val v5 = v4.map({
      case (v2, v3) => v3.size
    }).zip(v34).map({
      case (v2, (v9, v7)) => v7 + v9
    });
    v34 ++ v5
})
+ E:identify-map-monoid-plus
docs.fold(m)({
  case (v34, (i, v8)) =>
    (v8.split(" ").groupBy(ID).__2).map({
      case (v2, v3) => v3.size
    }) |+| v34
})
+ E:pull-map-from-fold-by-subexpression-extract
docs.map({
  case (i, v8) =>
    (v8.split(" ").groupBy(ID).__2).map({
      case (v2, v3) => v3.size
    })
}).fold(m)({
  case (v34, (i, v10)) => v10 |+| v34
})
+ E:map-horizontal-fission
docs.map({
  case (i, v8) => v8.split(" ").groupBy(ID).__2
}).map({
  case (i, v11) =>
    v11.map({
      case (v2, v3) => v3.size
    })
}).fold(m)({
  case (v34, (i, v10)) => v10 |+| v34
})
+ E:swap-map-with-fold
val v11 = docs.map({
  case (i, v8) => v8.split(" ").groupBy(ID).__2
}).fold("ZERO-TOKEN")({
  case (v12, (i, v13)) => v12 |+| v13
});
v11.map({
  case (v2, v3) => v3.size
})
+ E:map-horizontal-fission
val v11 = docs.map({
  case (i, v8) => v8.split(" ").groupBy(ID)
}).map({
  case (i, v14) => v14.__2
}).fold("ZERO-TOKEN")({
  case (v12, (i, v13)) => v12 |+| v13
});
v11.map({
  case (v2, v3) => v3.size
})
+ E:map-horizontal-fission
val v11 = docs.map({
  case (i, v8) => v8.split(" ")
}).map({
  case (i, v15) => v15.groupBy(ID)
}).map({
  case (i, v14) => v14.__2
}).fold(Map())({
  case (v12, (i, v13)) => v12 |+| v13
});
v11.map({
  case (v2, v3) => v3.size
})
+ E:swap-map-with-fold
val v14 = docs.map({
  case (i, v8) => v8.split(" ")
}).map({
  case (i, v15) => v15.groupBy(ID)
}).fold(Map())({
  case (v16, (i, v17)) => v16 |+| v17
});
(v14.__2).map({
  case (v2, v3) => v3.size
})
+ E:swap-map-with-fold
val v15 = docs.map({
  case (i, v8) => v8.split(" ")
}).fold(Map())({
  case (v18, (i, v19)) => v18 |+| v19
});
val v14 = v15.groupBy(ID);
(v14.__2).map({
  case (v2, v3) => v3.size
})
+ E:flatMap
val v15 = docs.flatMap({
  case (i, v8) => v8.split(" ")
});
val v14 = v15.groupBy(ID);
(v14.__2).map({
  case (v2, v3) => v3.size
})
+ E:reducebykey
docs.flatMap({
  case (i, v8) => v8.split(" ")
}).map({
  case (j, v6) => (v6, 1)
}).reduceByKey({
  case (v2, v3) => v2 + v3
})
A. WordCount
for (int i = 0; i < docs.length; i++) {
  String[] split = docs[i].split(" ");
  for (int j = 0; j < split.length; j++) {
    String word = split[j];
    Integer prev = m.get(word);
    if (prev == null) prev = 0;
    m.put(word, prev + 1);
  }
}
+ Java to Lambda
Range(0, docs.size).fold(m)({
  case (v34, i) =>
    val split = docs(i).split(" ");
    val v10 = split.size;
    Range(0, v10).fold(v34)({
      case (v33, j) =>
        val v13 = v33(split(j));
        val prev = { if (v13 != null) v13 else 0 };
        v33.updated(split(j), prev + 1)
    })
})
+ S:eliminate-null-check
Range(0, docs.size).fold(m)({
  case (v34, i) =>
    val v6 = docs(i).split(" ");
    val v4 = Range(0, v6.size).groupBy({
      case (j) => v6(j)
    });
    val v5 = v4.map({
      case (v2, v3) =>
        v3.fold(v34(v2))({
          case (v1, j) => v1 + 1
        })
    });
    v34 ++ v5
})
+ S:reduce-monoid-identity-op
Range(0, docs.size).fold(m)({
  case (v34, i) =>
    val v6 = docs(i).split(" ");
    val v4 = Range(0, v6.size).groupBy({
      case (j) => v6(j)
    });
    val v5 = v4.map({
      case (v2, v3) => v34(v2) + v3.size
    });
    v34 ++ v5
})
+ E:localize-group-by-accesses
Range(0, docs.size).fold(m)({
  case (v34, i) =>
    val v4 = docs(i).split(" ").groupBy(ID).__2;
    val v5 = v4.map({
      case (v2, v3) => v34(v2) + v3.size
    });
    v34 ++ v5
})
+ E:localize-map-accesses
Range(0, docs.size).fold(m)({
  case (v34, i) =>
    val v4 = docs(i).split(" ").groupBy(ID).__2;
    val v5 = v4.zip(v34).map({
      case (v2, (v3, v7)) => v7 + v3.size
    });
    v34 ++ v5
})
Brittle
Difficult to get correct
Hard to maintain
SEARCH
[Diagram labels: DSL, Target code, Proof of translation]
PROGRAM SYNTHESIS
Program Compilation vs. Synthesis
Suppose we want to optimize x+x+x+x
With compiler rewrite rules:
x+x+x+x → 4*x → 2^2*x → x<<2
With synthesis search:
Space of programs: x*5, x+3, x+2x, x<<42, x-5, ..., x<<2
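The contrast above can be made concrete with a small sketch of the search side. It is an illustration only: the candidate templates, the constant range, and the test inputs are assumptions chosen for this example, and agreeing on test inputs is weaker than the proof of translation the talk pairs with search.

```python
from itertools import product

# Hypothetical candidate space: a few expression shapes over x and a constant c.
TEMPLATES = {
    "x+{c}": lambda x, c: x + c,
    "x-{c}": lambda x, c: x - c,
    "x*{c}": lambda x, c: x * c,
    "x<<{c}": lambda x, c: x << c,
}

def spec(x):
    """The code fragment we want to replace."""
    return x + x + x + x

def synthesize(tests=range(-8, 9), constants=range(0, 6)):
    """Return every candidate that agrees with spec on all test inputs."""
    found = []
    for (name, fn), c in product(TEMPLATES.items(), constants):
        if all(fn(x, c) == spec(x) for x in tests):
            found.append(name.format(c=c))
    return found

print(synthesize())  # candidates equivalent to x+x+x+x on the test inputs
```

No rewrite rule connects x+x+x+x to x<<2 here; the search simply finds it among the candidates, which is the point of the slide.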
Verified Lifting
[Diagram labels: Source Languages, Infer, Specification Language, Lift, Target DSLs; Search, Validation, Code Generation; Unit Tester, Model Checker, Theorem Prover]
MetaLift
Compiler Generator
  Code Identifier
  Specification Synthesizer
  Theorem Prover
  Code Generator
Spec Language
  • Boolean
  • Arithmetic
  • Classes
  • Lists
  • Arrays
  • …
DSL semantics and search space description
  select :: [(a)] -> ((a)-> Bool) -> [(a)]
  select l f = [t | t <- l, f(t)]
  join :: [(a)] -> [(b)] -> [(a,b)]
  join l1 l2 = [(t1, t2) | t1 <- l1, t2 <- l2]
  ...
Target code fragments
  Java: while (*) { * }
Codegen rules
  translate (select l f) = "select * from" + translate(l) + "where" + translate(f)
  ...
Unit Tests
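A codegen rule of the kind shown for select can be sketched as a recursive function over a tiny spec AST. This is not MetaLift's actual representation: the Table, Pred, and Select classes below are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Table:          # hypothetical leaf node: a named list of tuples
    name: str

@dataclass
class Pred:           # hypothetical leaf node: a predicate already in SQL form
    sql: str

@dataclass
class Select:         # spec node for: select l f
    source: object
    pred: Pred

def translate(e):
    """Apply one codegen rule per spec construct, recursing into sub-terms."""
    if isinstance(e, Table):
        return e.name
    if isinstance(e, Pred):
        return e.sql
    if isinstance(e, Select):
        return "select * from " + translate(e.source) + " where " + translate(e.pred)
    raise TypeError(f"no codegen rule for {e!r}")

print(translate(Select(Table("users"), Pred("roleId = 1"))))
```

Each DSL construct gets one rule, so retargeting to a different DSL means swapping the rules while the synthesized spec stays the same.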
Data Analytics: Java → SQL
List getUsersWithRoles () {
  List users = this.userDao.getUsers();
  List roles = this.roleDao.getRoles();
  List results = new ArrayList();
  for (User u : users) {
    for (Role r : roles) {
      if (u.roleId == r.id)
        results.add(u);
    }
  }
  return results;
}
List getUsersWithRoles () {
  return executeQuery("SELECT u FROM users u, roles r WHERE u.roleId == r.id ORDER BY u.roleId, r.id");
}
Lifted code can be optimized by DBs: 100-1000x speedup
codegen
[PLDI 13, CIDR 14]
1. Define ordered lists and their operations
2. Synthesizer infers spec from source
3. Retarget synthesized spec to SQL
select :: [(a)] -> ((a)-> Bool) -> [(a)]
select l f = [t | t <- l, f(t)]
join :: [(a)] -> [(b)] -> [(a,b)]
join l1 l2 = [(t1, t2) | t1 <- l1, t2 <- l2]
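To see what "infers spec from source" means for this example, the list operations can be written as executable definitions and compared against the original nested loop. This is a sketch under assumptions: users and roles are modeled as dictionaries, and the final projection onto the user is added here because the loop keeps only u.

```python
def select(l, f):
    """select l f = [t | t <- l, f(t)]"""
    return [t for t in l if f(t)]

def join(l1, l2):
    """join l1 l2 = [(t1, t2) | t1 <- l1, t2 <- l2]"""
    return [(t1, t2) for t1 in l1 for t2 in l2]

def get_users_with_roles(users, roles):
    """Loop version, mirroring the Java source."""
    results = []
    for u in users:
        for r in roles:
            if u["roleId"] == r["id"]:
                results.append(u)
    return results

def lifted(users, roles):
    """Candidate spec: select over the join, then keep the user component."""
    matches = select(join(users, roles), lambda p: p[0]["roleId"] == p[1]["id"])
    return [u for (u, _r) in matches]

users = [{"name": "ann", "roleId": 1}, {"name": "bob", "roleId": 2}, {"name": "cy", "roleId": 9}]
roles = [{"id": 1}, {"id": 2}, {"id": 2}]
print(get_users_with_roles(users, roles) == lifted(users, roles))
```

Because the lists are ordered, the spec reproduces both the contents and the order of the loop's result, which is why the generated query carries an ORDER BY clause.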
Leveraging GPUs: Fortran → Halide
procedure sten(imin,imax,jmin,jmax,a,b)
  real,dim(imin:imax,jmin:jmax) :: a
  real,dim(imin:imax,jmin:jmax) :: b
  do j=jmin,jmax
    t = b(imin, j)
    do i=imin+1,imax
      q = b(i,j)
      a(i,j) = q + t
      t = q
    enddo
  enddo
end procedure
int main() {
  ImageParam b(type_of<dbl>(),2);
  Func func;
  Var i, j;
  func(i,j) = b(i-1,j) + b(i,j);
  func.compile_to_file("ex1", b);
  return 0;
}
Lifted code can be executed on GPUs: 17x speedup
codegen
1. Define arrays and their operations
2. Synthesizer infers spec from source
3. Retarget synthesized spec to Halide
[SNAPL 15, PLDI 16]
get a i = a i
store a i e = a // [(i,e)]
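The lifted stencil reads more clearly next to the loop it replaces. The sketch below transliterates the Fortran loop and the summary a(i,j) = b(i-1,j) + b(i,j) into Python, modeling arrays as dictionaries keyed by (i, j); that modeling and the sample input are assumptions made for illustration.

```python
def sten(b, imin, imax, jmin, jmax):
    """Loop version: carries the previous element of each column in t."""
    a = {}
    for j in range(jmin, jmax + 1):
        t = b[(imin, j)]
        for i in range(imin + 1, imax + 1):
            q = b[(i, j)]
            a[(i, j)] = q + t
            t = q
    return a

def lifted(b, imin, imax, jmin, jmax):
    """Lifted spec: every output cell is a pure function of two input cells."""
    return {(i, j): b[(i - 1, j)] + b[(i, j)]
            for j in range(jmin, jmax + 1)
            for i in range(imin + 1, imax + 1)}

# Sample input over i in 0..3 and j in 0..2 (values are arbitrary).
b = {(i, j): 10 * i + j * j for i in range(4) for j in range(3)}
print(sten(b, 0, 3, 0, 2) == lifted(b, 0, 3, 0, 2))
```

The loop hides the stencil behind the loop-carried variables t and q; the lifted form has no such dependence, which is what lets a DSL like Halide schedule it in parallel.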
Hardware: Domino → Programmable Switches
void flowlet (Packet p) {
  p.new_hop = hash3(p.sport, p.dport, p.arrival) % HOPS;
  p.id = hash2(p.sport, p.dport) % FLOWS;
  if (p.arrival - last_time[p.id] > THRESHOLD)
    saved_hop[p.id] = p.new_hop;
  ...
}
Compile 10 well-known data plane algorithms to switches & run at line rate
codegen
1. Define hardware building blocks as instructions
2. Synthesizer infers spec from source
3. Retarget spec to programmable switches
[SIGCOMM 16]
mux c v1 v2 = if c then v1 else v2
pairExec x y f g = (f x, g y)
inc x e = x + e
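As a rough illustration of how such building blocks can express the source program, the three definitions are written out below, and the conditional update of saved_hop from the flowlet code is phrased as a single mux. The THRESHOLD value and the update_saved_hop helper are hypothetical; how the real compiler maps specs onto switch hardware is not shown in the slides.

```python
def mux(c, v1, v2):
    """mux c v1 v2 = if c then v1 else v2"""
    return v1 if c else v2

def pair_exec(x, y, f, g):
    """pairExec x y f g = (f x, g y)"""
    return (f(x), g(y))

def inc(x, e):
    """inc x e = x + e"""
    return x + e

THRESHOLD = 5  # hypothetical value, for illustration only

def update_saved_hop(saved, last_time, arrival, new_hop):
    """The branch in flowlet, written as a mux over the old and new value."""
    return mux(arrival - last_time > THRESHOLD, new_hop, saved)

print(update_saved_hop(saved=1, last_time=10, arrival=20, new_hop=7))
print(pair_exec(1, 2, lambda x: inc(x, 1), lambda y: y * 2))
```

Writing the branch as a mux turns control flow into straight-line data flow, the shape a fixed pipeline of instructions can execute.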
Parallel Frameworks: Sequential Java → Hadoop
// sequential implementation
int regress(Point [] points) {
  int SumXY = 0;
  for (Point p : points) {
    SumXY += p.x * p.y;
  }
  return SumXY;
}
void map(Object key, Point [] value) {
  for (Point p : value)
    emit("sumxy", p.x * p.y);
}
void reduce(Text key, int [] vs) {
  int SumXY = 0;
  for (Integer val : vs)
    SumXY = SumXY + val;
  emit(key, SumXY);
}
Lifted code can be optimized by Hadoop/Spark: 32x speedup
codegen
1. Define semantics of map and reduce
2. Synthesizer infers spec from source
3. Retarget synthesized spec to Hadoop and Spark
map l f = [f t | t <- l]
reduce l f i = foldl f i l
[SYNT 16, SIGMOD 17]
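The map and reduce semantics above are small enough to run directly. The sketch below defines them in Python (named map_ and reduce_ to avoid shadowing built-ins) and checks that the lifted form of regress matches the sequential loop on a sample input; representing points as (x, y) tuples is an assumption made for brevity.

```python
from functools import reduce as foldl

def map_(l, f):
    """map l f = [f t | t <- l]"""
    return [f(t) for t in l]

def reduce_(l, f, i):
    """reduce l f i = foldl f i l"""
    return foldl(f, l, i)

def regress(points):
    """Sequential implementation, mirroring the Java loop."""
    sum_xy = 0
    for (x, y) in points:
        sum_xy += x * y
    return sum_xy

def lifted_regress(points):
    """Lifted spec: map each point to x*y, then reduce with + starting at 0."""
    return reduce_(map_(points, lambda p: p[0] * p[1]), lambda a, b: a + b, 0)

points = [(1, 2), (3, 4), (5, 6)]
print(regress(points), lifted_regress(points))
```

The lambda passed to map_ corresponds to the body of the generated map function, and the lambda passed to reduce_ corresponds to the accumulation in reduce.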
Casper
Compiling Sequential Java to Hadoop
An Illustration of the MetaLift Toolchain
Demo
Casper Architecture
Code Identifier
Specification Synthesizer
Theorem Prover
Code Generator
Target code fragments
Java: while (*) { * }
DSL semantics
map :: [(a)] -> (a -> (k,v)) -> [(k,v)]
map l f = [f t | t <- l]
reduce :: [(k,[v])] -> (v -> v -> v) -> v -> [(k,v)]
reduce l f i = [(k, foldl f i v) | (k,v) <- l]
Code generation
translate (map l f) = "void map(...) { for (t : l) emit(" + translate(f) + "); }"
translate (reduce l f i) = ...
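The keyed map and reduce semantics can be exercised on a word count. One step is not spelled out on the slide: something has to turn map's list of (k, v) pairs into the (k, [v]) groups that reduce consumes. The group function below is an assumption standing in for that shuffle step.

```python
from functools import reduce as foldl

def map_(l, f):
    """map l f = [f t | t <- l], where f returns a (key, value) pair."""
    return [f(t) for t in l]

def group(pairs):
    """Assumed shuffle step: collect values by key, keeping first-seen key order."""
    grouped = {}
    for k, v in pairs:
        grouped.setdefault(k, []).append(v)
    return list(grouped.items())

def reduce_(l, f, i):
    """reduce l f i = [(k, foldl f i v) | (k, v) <- l]"""
    return [(k, foldl(f, vs, i)) for (k, vs) in l]

words = "a b a c b a".split()
counts = reduce_(group(map_(words, lambda w: (w, 1))), lambda x, y: x + y, 0)
print(counts)
```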
Search Space Grammar
e := (e1, e2) | e + e | e - e | var | lit
| e * e | e / e | if e1 then e2 else e3
Automatically generated for each input code fragment
Increases expressivity incrementally
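One way to picture "increases expressivity incrementally" is a generator that enumerates expressions from a grammar and is first run with a small operator set, then with a larger one. The tiers, atoms, and depth bound below are assumptions for illustration, and the sketch covers only the binary-operator productions (tuples and if-then-else are omitted).

```python
from itertools import product

# Hypothetical tiers: start with a small grammar, widen it if no spec is found.
TIERS = [
    ["+", "-"],
    ["+", "-", "*", "/"],
]

def exprs(ops, atoms, depth):
    """All expressions over atoms using ops, nested at most `depth` levels."""
    if depth == 0:
        return list(atoms)
    smaller = exprs(ops, atoms, depth - 1)
    out = list(smaller)
    for op, (lhs, rhs) in product(ops, product(smaller, smaller)):
        out.append(f"({lhs} {op} {rhs})")
    return list(dict.fromkeys(out))  # drop duplicates, keep order

small = exprs(TIERS[0], ["var", "lit"], 1)
large = exprs(TIERS[1], ["var", "lit"], 1)
print(len(small), len(large))
```

Every candidate from the smaller grammar is also in the larger one, so widening the grammar only adds candidates; starting small keeps the common case cheap.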
Synthesis Coordinator
[Diagram labels: Input code fragment, Search space grammar, Candidate Spec Generator, Candidate Spec, Bounded Model Checker, Candidate Spec, Theorem Prover, Correct Specifications]
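The two-stage check in this diagram can be sketched as a loop: a cheap bounded check filters candidates, and only the survivors go to the expensive verifier. Everything concrete below is an assumption for illustration: the spec, the candidate list, and the use of a larger finite domain in place of a real theorem prover. Feeding the verifier's counterexample back into the bounded tests is also an assumption, not something the slides state.

```python
def spec(x):
    """Stands in for the behavior of the input code fragment."""
    return abs(x)

# Hypothetical candidate specs, in the order the generator proposes them.
CANDIDATES = [
    ("x", lambda x: x),
    ("-x", lambda x: -x),
    ("max(x, -x)", lambda x: max(x, -x)),
]

def bounded_check(fn, tests):
    """Cheap filter: agree with the spec on a small set of inputs."""
    return all(fn(x) == spec(x) for x in tests)

def verify(fn):
    """Stand-in for the theorem prover: return a counterexample or None."""
    for x in range(-100, 101):
        if fn(x) != spec(x):
            return x
    return None

def coordinate():
    tests, rejected = [0, 1, 2], []
    for name, fn in CANDIDATES:
        if not bounded_check(fn, tests):
            rejected.append(name)
            continue
        counterexample = verify(fn)
        if counterexample is None:
            return name, tests, rejected
        tests.append(counterexample)  # assumed feedback to the bounded checker
        rejected.append(name)
    return None, tests, rejected

print(coordinate())
```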
Evaluation: Benchmarks
60 benchmarks collected from 5 benchmark suites
Phoenix: Classic MapReduce problems such as WordCount, 3D Histogram, Linear Regression and StringMatch.
Bigλ: Big-Data analytical benchmarks including basic sentiment analysis, database operations and Wikipedia log processing.
Arithmetic: Mathematical functions such as Max, Delta and Conditional Sum.
Statistical: Statistical analysis such as Mean, Covariance and Standard Error.
Fiji: Three image processing kernels: RedToMagenta, Temporal Median and Trails.
Feasibility: Does Casper work?
Benchmark     Translated   Extracted
Phoenix            7           11
Bigλ               6            8
Arithmetic        11           11
Statistical       18           19
Fiji               8           11
Total             50           60
Causes of failures
• 3 caused by calls to unsupported external library methods
• 7 caused by program constructs not currently expressible using MetaLift
Performance: Is Casper competitive?
[Bar chart: speedup (y-axis, 0x to 35x) for MOLD, Spark (Manual), and Spark (Casper) on StringMatch, WordCount, LinearRegression, 3DHistogram, YelpKids, WikipediaPageCount, Covariance, HadamardProduct, DatabaseSelect, and AnscombeTransform]
Mean speedup: 17.6x
Max speedup: 31x
Competitive with expert implementations
Performance Analysis: Is Casper Practical?
Mean compilation time for one benchmark was 5.8 minutes.
Median compilation time for one benchmark was 36.6 seconds.
Mean time to reach first correct translation was only 48 seconds!
Solutions generated, with and without pruning:
Benchmark                        With Pruning   Without Pruning
WordCount                              2              827
3D Histogram                           5              118
Yelp: Kid Friendly Restaurants         1              286