LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K....

36
LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheung uwdb.io uwplse.org

Transcript of LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K....

Page 1: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

LEVERAGING DSLS (ALMOST) FOR FREE

@alvinkcheung

uwdb.iouwplse.org

Page 2: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Time

Analytics

ImageProcessing

Hardware

ParallelComputing

Page 3: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

?

Page 4: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Rotate Blur

Join

CachingPipelining

Hashimplementation

Embedded algorithmic detailsArchitecture-specific optimizations

HashPartitioning

Page 5: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

?YourDSLhere

Page 6: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Annuals of Code Rewrites

• Overhaul of the Mozilla Gecko layout engine (2 years),and now to Servo

• Rewrite of INGRES database into PostGRES (3 years)

• Twitter search engine rewrite (2 years)

All of these were major engineering efforts!

Page 7: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Vision:Perform code rewrites automatically

Current focus: Performance-critical code fragments

Page 8: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

for ($i = 0; $i < $l1.size(); ++$i) {$l2.append($l1[$i] + $c);

}$l2 = $l1.map(e -> e + $c);

List getUsersWithRoles () {List users = this.userDao.getUsers(); List roles = this.roleDao.getRoles(); List results = new ArrayList();for (User u : users) {

for (Role r : roles) {if (u.roleId == r.id)

results.add(u); }}

return results; }

?

Syntax-Driven Compilation

Page 9: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

[Radoi et al, OOPSLA 14]

+ E:localize-fold-accssesdocs . f o l d (m) ( {

c a s e ( v34 , ( i , v8 ) ) =>val v4 = v8 . s p l i t ( " " ) . groupBy ( ID ) . __2 ;val v5 = v4 . z i p ( v34 ) . map ( {

c a s e ( v2 , ( v3 , v7 ) ) =>v7 + v3 . s i z e

} ) ;v34 ++ v5

} )

+ E:map-vertical-fissiondocs . f o l d (m) ( {

c a s e ( v34 , ( i , v8 ) ) =>val v4 = v8 . s p l i t ( " " ) . groupBy ( ID ) . __2 ;val v5 = v4 . map ( {

c a s e ( v2 , v3 ) => v3 . s i z e} ) . z i p ( v34 ) . map ( {

c a s e ( v2 , ( v9 , v7 ) ) => v7 + v9} ) ;

v34 ++ v5} )

+ E:identify-map-monoid-plusdocs . f o l d (m) ( {

c a s e ( v34 , ( i , v8 ) ) =>( v8 . s p l i t ( " " ) . groupBy ( ID ) . __2 ) . map ( {

c a s e ( v2 , v3 ) => v3 . s i z e} ) | + | v34

} )

+ E:pull-map-from-fold-by-subexpression-extractdocs . map ( {

c a s e ( i , v8 ) =>( v8 . s p l i t ( " " ) . groupBy ( ID ) . __2 ) . map ( {

c a s e ( v2 , v3 ) => v3 . s i z e} )

} ) . f o l d (m) ( {c a s e ( v34 , ( i , v10 ) ) =>

v10 | + | v34} )

+ E:map-horizontal-fissiondocs . map ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " ) . groupBy ( ID ) . __2

} ) . map ( {c a s e ( i , v11 ) =>

v11 . map ( {c a s e ( v2 , v3 ) => v3 . s i z e

} )} ) . f o l d (m) ( {

c a s e ( v34 , ( i , v10 ) ) =>v10 | + | v34

} )

+ E:swap-map-with-foldval v11 = docs . map ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " ) . groupBy ( ID ) . __2

} ) . f o l d ( "ZERO�TOKEN" ) ( {c a s e ( v12 , ( i , v13 ) ) =>

v12 | + | v13} ) ;

v11 . map ( {c a s e ( v2 , v3 ) => v3 . s i z e

} )

+ E:map-horizontal-fissionval v11 = docs . map ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " ) . groupBy ( ID )

} ) . map ( {c a s e ( i , v14 ) => v14 . __2

} ) . f o l d ( "ZERO�TOKEN" ) ( {

c a s e ( v12 , ( i , v13 ) ) =>v12 | + | v13

} ) ;v11 . map ( {

c a s e ( v2 , v3 ) =>v3 . s i z e

} )

+ E:map-horizontal-fissionval v11 = docs . map ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " )

} ) . map ( {c a s e ( i , v15 ) =>

v15 . groupBy ( ID )} ) . map ( {

c a s e ( i , v14 ) => v14 . __2} ) . f o l d ( Map ( ) ) ( {

c a s e ( v12 , ( i , v13 ) ) =>v12 | + | v13

} ) ;v11 . map ( {

c a s e ( v2 , v3 ) =>v3 . s i z e

} )

+ E:swap-map-with-foldval v14 = docs . map ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " )

} ) . map ( {c a s e ( i , v15 ) =>

v15 . groupBy ( ID )} ) . f o l d ( Map ( ) ) ( {

c a s e ( v16 , ( i , v17 ) ) =>v16 | + | v17

} ) ;( v14 . __2 ) . map ( {

c a s e ( v2 , v3 ) =>v3 . s i z e

} )

+ E:swap-map-with-foldval v15 = docs . map ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " )

} ) . f o l d ( Map ( ) ) ( {c a s e ( v18 , ( i , v19 ) ) =>

v18 | + | v19} ) ;

val v14 = v15 . groupBy ( ID ) ;( v14 . __2 ) . map ( {

c a s e ( v2 , v3 ) =>v3 . s i z e

} )

+ E:flatMapval v15 = docs . f l a t M a p ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " )

} ) ;val v14 = v15 . groupBy ( ID ) ;( v14 . __2 ) . map ( {

c a s e ( v2 , v3 ) =>v3 . s i z e

} )

+ E:reducebykeydocs . f l a t M a p ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " )

} ) . map ( {c a s e ( j , v6 ) =>

( v6 , 1 )} ) . reduceByKey ( {

c a s e ( v2 , v3 ) =>v2 + v3

} )

+ E:localize-fold-accssesdocs . f o l d (m) ( {

c a s e ( v34 , ( i , v8 ) ) =>val v4 = v8 . s p l i t ( " " ) . groupBy ( ID ) . __2 ;val v5 = v4 . z i p ( v34 ) . map ( {

c a s e ( v2 , ( v3 , v7 ) ) =>v7 + v3 . s i z e

} ) ;v34 ++ v5

} )

+ E:map-vertical-fissiondocs . f o l d (m) ( {

c a s e ( v34 , ( i , v8 ) ) =>val v4 = v8 . s p l i t ( " " ) . groupBy ( ID ) . __2 ;val v5 = v4 . map ( {

c a s e ( v2 , v3 ) => v3 . s i z e} ) . z i p ( v34 ) . map ( {

c a s e ( v2 , ( v9 , v7 ) ) => v7 + v9} ) ;

v34 ++ v5} )

+ E:identify-map-monoid-plusdocs . f o l d (m) ( {

c a s e ( v34 , ( i , v8 ) ) =>( v8 . s p l i t ( " " ) . groupBy ( ID ) . __2 ) . map ( {

c a s e ( v2 , v3 ) => v3 . s i z e} ) | + | v34

} )

+ E:pull-map-from-fold-by-subexpression-extractdocs . map ( {

c a s e ( i , v8 ) =>( v8 . s p l i t ( " " ) . groupBy ( ID ) . __2 ) . map ( {

c a s e ( v2 , v3 ) => v3 . s i z e} )

} ) . f o l d (m) ( {c a s e ( v34 , ( i , v10 ) ) =>

v10 | + | v34} )

+ E:map-horizontal-fissiondocs . map ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " ) . groupBy ( ID ) . __2

} ) . map ( {c a s e ( i , v11 ) =>

v11 . map ( {c a s e ( v2 , v3 ) => v3 . s i z e

} )} ) . f o l d (m) ( {

c a s e ( v34 , ( i , v10 ) ) =>v10 | + | v34

} )

+ E:swap-map-with-foldval v11 = docs . map ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " ) . groupBy ( ID ) . __2

} ) . f o l d ( "ZERO�TOKEN" ) ( {c a s e ( v12 , ( i , v13 ) ) =>

v12 | + | v13} ) ;

v11 . map ( {c a s e ( v2 , v3 ) => v3 . s i z e

} )

+ E:map-horizontal-fissionval v11 = docs . map ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " ) . groupBy ( ID )

} ) . map ( {c a s e ( i , v14 ) => v14 . __2

} ) . f o l d ( "ZERO�TOKEN" ) ( {

c a s e ( v12 , ( i , v13 ) ) =>v12 | + | v13

} ) ;v11 . map ( {

c a s e ( v2 , v3 ) =>v3 . s i z e

} )

+ E:map-horizontal-fissionval v11 = docs . map ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " )

} ) . map ( {c a s e ( i , v15 ) =>

v15 . groupBy ( ID )} ) . map ( {

c a s e ( i , v14 ) => v14 . __2} ) . f o l d ( Map ( ) ) ( {

c a s e ( v12 , ( i , v13 ) ) =>v12 | + | v13

} ) ;v11 . map ( {

c a s e ( v2 , v3 ) =>v3 . s i z e

} )

+ E:swap-map-with-foldval v14 = docs . map ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " )

} ) . map ( {c a s e ( i , v15 ) =>

v15 . groupBy ( ID )} ) . f o l d ( Map ( ) ) ( {

c a s e ( v16 , ( i , v17 ) ) =>v16 | + | v17

} ) ;( v14 . __2 ) . map ( {

c a s e ( v2 , v3 ) =>v3 . s i z e

} )

+ E:swap-map-with-foldval v15 = docs . map ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " )

} ) . f o l d ( Map ( ) ) ( {c a s e ( v18 , ( i , v19 ) ) =>

v18 | + | v19} ) ;

val v14 = v15 . groupBy ( ID ) ;( v14 . __2 ) . map ( {

c a s e ( v2 , v3 ) =>v3 . s i z e

} )

+ E:flatMapval v15 = docs . f l a t M a p ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " )

} ) ;val v14 = v15 . groupBy ( ID ) ;( v14 . __2 ) . map ( {

c a s e ( v2 , v3 ) =>v3 . s i z e

} )

+ E:reducebykeydocs . f l a t M a p ( {

c a s e ( i , v8 ) =>v8 . s p l i t ( " " )

} ) . map ( {c a s e ( j , v6 ) =>

( v6 , 1 )} ) . reduceByKey ( {

c a s e ( v2 , v3 ) =>v2 + v3

} )

[13] M. H. Hall, S. P. Amarasinghe, B. R. Murphy, S.-W. Liao,and M. S. Lam. Detecting coarse-grain parallelism using aninterprocedural parallelizing compiler. Supercomputing ’95,1995.

[14] R. Joshi, G. Nelson, and K. Randall. Denali: A goal-directedsuperoptimizer. PLDI ’02, pp. 304–314, 2002.

[15] R. A. Kelsey. A correspondence between continuation passingstyle and static single assignment form. IR ’95, pp. 13–22,1995.

[16] Y. Klonatos, A. Nötzli, A. Spielmann, C. Koch, and V. Kuncak.Automatic synthesis of out-of-core algorithms. SIGMOD ’13,pp. 133–144, 2013.

[17] K. Knobe and V. Sarkar. Array SSA form and its use inparallelization. POPL ’98, pp. 107–120, 1998.

[18] R. Lämmel. Google’s MapReduce programming model —revisited. Science of Computer Programming, 70(1):1 – 30,2008.

[19] S.-w. Liao. Parallelizing user-defined and implicit reductionsglobally on multiprocessors. ACSAC’06, pp. 189–202, 2006.

[20] E. Meijer, M. Fokkinga, and R. Paterson. Functional program-ming with bananas, lenses, envelopes and barbed wire. FPCA’91, pp. 124–144, 1991.

[21] C. Nugteren and H. Corporaal. Introducing Bones: a paralleliz-ing source-to-source compiler based on algorithmic skeletons.GPGPU-5, pp. 1–10, 2012.

[22] B. C. Oliveira, A. Moors, and M. Odersky. Type classes asobjects and implicits. OOPSLA ’10, pp. 341–360, 2010.

[23] N. Ramsey. Unparsing expressions with prefix and postfixoperators. Software: Practice and Experience, 28(12):1327–1356, 1998.

[24] C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, andC. Kozyrakis. Evaluating MapReduce for multi-core andmultiprocessor systems. HPCA ’07, pp. 13–24, 2007.

[25] M. Ravishankar, J. Eisenlohr, L.-N. Pouchet, J. Ramanujam,A. Rountev, and P. Sadayappan. Code generation for parallelexecution of a class of irregular loops on distributed memorysystems. SC ’12, pp. 72:1–72:11, 2012.

[26] E. Schkufza, R. Sharma, and A. Aiken. Stochastic superopti-mization. ASPLOS ’13, pp. 305–316, 2013.

[27] A. M. Sloane. Lightweight language processing in Kiama.GTTSE III, pp. 408–425. Springer, 2011.

[28] M. M. Strout, L. Carter, and J. Ferrante. Compile-timecomposition of run-time data and iteration reorderings. PLDI’03, pp. 91–102, 2003.

[29] S. d. Swierstra and O. Chitil. Linear, bounded, functionalpretty-printing. J. Funct. Program., 19(1):1–16, Jan. 2009.

[30] V. K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal,M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth,B. Saha, C. Curino, O. O’Malley, S. Radia, B. Reed, andE. Baldeschwieler. Apache Hadoop YARN: Yet anotherresource negotiator. SOCC ’13, pp. 5:1–5:16, 2013.

[31] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, andI. Stoica. Spark: Cluster computing with working sets. Hot-Cloud’10, pp. 10–10, 2010.

A. WordCountfor ( i n t i = 0 ; i < docs . l e n g t h ; i ++) {

S t r i n g [ ] s p l i t = docs [ i ] . s p l i t ( " " ) ;for ( i n t j = 0 ; j < s p l i t . l e n g t h ; j ++) {

S t r i n g word = s p l i t [ j ] ;I n t e g e r p r ev = m. g e t ( word ) ;if ( p r ev == n u l l ) p r ev = 0 ;m. p u t ( word , p r ev + 1 ) ;

}}

+ Java to LambdaRange ( 0 , docs . s i z e ) . f o l d (m) ( {

c a s e ( v34 , i ) =>val s p l i t = docs ( i ) . s p l i t ( " " ) ;val v10 = s p l i t . s i z e ;Range ( 0 , v10 ) . f o l d ( v34 ) ( {

c a s e ( v33 , j ) =>val v13 = v33 ( s p l i t ( j ) ) ;val p rev = { if ( v13 != n u l l ) v13 else 0 } ;v33 . u p d a t e d ( s p l i t ( j ) , p r e v + 1)

} )} )

+ S:eliminate-null-checkRange ( 0 , docs . s i z e ) . f o l d (m) ( {

c a s e ( v34 , i ) =>val v6 = docs ( i ) . s p l i t ( " " ) ;val v4 = Range ( 0 , v6 . s i z e ) . groupBy ( {

c a s e ( j ) => v6 ( j )} ) ;

val v5 = v4 . map ( {c a s e ( v2 , v3 ) =>

v3 . f o l d ( v34 ( v2 ) ) ( {c a s e ( v1 , j ) => v1 + 1

} )} ) ;

v34 ++ v5} )

+ S:reduce-monoid-identity-opRange ( 0 , docs . s i z e ) . f o l d (m) ( {

c a s e ( v34 , i ) =>val v6 = docs ( i ) . s p l i t ( " " ) ;val v4 = Range ( 0 , v6 . s i z e ) . groupBy ( {

c a s e ( j ) => v6 ( j )} ) ;

val v5 = v4 . map ( {c a s e ( v2 , v3 ) =>

v34 ( v2 ) + v3 . s i z e} ) ;

v34 ++ v5} )

+ E:localize-group-by-accessesRange ( 0 , docs . s i z e ) . f o l d (m) ( {

c a s e ( v34 , i ) =>val v4 = docs ( i ) . s p l i t ( " " ) . groupBy ( ID ) . __2 ;val v5 = v4 . map ( {

c a s e ( v2 , v3 ) =>v34 ( v2 ) + v3 . s i z e

} ) ;v34 ++ v5

} )

+ E:localize-map-accessesRange ( 0 , docs . s i z e ) . f o l d (m) ( {

c a s e ( v34 , i ) =>val v4 = docs ( i ) . s p l i t ( " " ) . groupBy ( ID ) . __2 ;val v5 = v4 . z i p ( v34 ) . map ( {

c a s e ( v2 , ( v3 , v7 ) ) =>v7 + v3 . s i z e

} ) ;v34 ++ v5

} )

[13] M. H. Hall, S. P. Amarasinghe, B. R. Murphy, S.-W. Liao,and M. S. Lam. Detecting coarse-grain parallelism using aninterprocedural parallelizing compiler. Supercomputing ’95,1995.

[14] R. Joshi, G. Nelson, and K. Randall. Denali: A goal-directedsuperoptimizer. PLDI ’02, pp. 304–314, 2002.

[15] R. A. Kelsey. A correspondence between continuation passingstyle and static single assignment form. IR ’95, pp. 13–22,1995.

[16] Y. Klonatos, A. Nötzli, A. Spielmann, C. Koch, and V. Kuncak.Automatic synthesis of out-of-core algorithms. SIGMOD ’13,pp. 133–144, 2013.

[17] K. Knobe and V. Sarkar. Array SSA form and its use inparallelization. POPL ’98, pp. 107–120, 1998.

[18] R. Lämmel. Google’s MapReduce programming model —revisited. Science of Computer Programming, 70(1):1 – 30,2008.

[19] S.-w. Liao. Parallelizing user-defined and implicit reductionsglobally on multiprocessors. ACSAC’06, pp. 189–202, 2006.

[20] E. Meijer, M. Fokkinga, and R. Paterson. Functional program-ming with bananas, lenses, envelopes and barbed wire. FPCA’91, pp. 124–144, 1991.

[21] C. Nugteren and H. Corporaal. Introducing Bones: a paralleliz-ing source-to-source compiler based on algorithmic skeletons.GPGPU-5, pp. 1–10, 2012.

[22] B. C. Oliveira, A. Moors, and M. Odersky. Type classes asobjects and implicits. OOPSLA ’10, pp. 341–360, 2010.

[23] N. Ramsey. Unparsing expressions with prefix and postfixoperators. Software: Practice and Experience, 28(12):1327–1356, 1998.

[24] C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, andC. Kozyrakis. Evaluating MapReduce for multi-core andmultiprocessor systems. HPCA ’07, pp. 13–24, 2007.

[25] M. Ravishankar, J. Eisenlohr, L.-N. Pouchet, J. Ramanujam,A. Rountev, and P. Sadayappan. Code generation for parallelexecution of a class of irregular loops on distributed memorysystems. SC ’12, pp. 72:1–72:11, 2012.

[26] E. Schkufza, R. Sharma, and A. Aiken. Stochastic superopti-mization. ASPLOS ’13, pp. 305–316, 2013.

[27] A. M. Sloane. Lightweight language processing in Kiama.GTTSE III, pp. 408–425. Springer, 2011.

[28] M. M. Strout, L. Carter, and J. Ferrante. Compile-timecomposition of run-time data and iteration reorderings. PLDI’03, pp. 91–102, 2003.

[29] S. d. Swierstra and O. Chitil. Linear, bounded, functionalpretty-printing. J. Funct. Program., 19(1):1–16, Jan. 2009.

[30] V. K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal,M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth,B. Saha, C. Curino, O. O’Malley, S. Radia, B. Reed, andE. Baldeschwieler. Apache Hadoop YARN: Yet anotherresource negotiator. SOCC ’13, pp. 5:1–5:16, 2013.

[31] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, andI. Stoica. Spark: Cluster computing with working sets. Hot-Cloud’10, pp. 10–10, 2010.

A. WordCountfor ( i n t i = 0 ; i < docs . l e n g t h ; i ++) {

S t r i n g [ ] s p l i t = docs [ i ] . s p l i t ( " " ) ;for ( i n t j = 0 ; j < s p l i t . l e n g t h ; j ++) {

S t r i n g word = s p l i t [ j ] ;I n t e g e r p r ev = m. g e t ( word ) ;if ( p r ev == n u l l ) p r ev = 0 ;m. p u t ( word , p r ev + 1 ) ;

}}

+ Java to LambdaRange ( 0 , docs . s i z e ) . f o l d (m) ( {

c a s e ( v34 , i ) =>val s p l i t = docs ( i ) . s p l i t ( " " ) ;val v10 = s p l i t . s i z e ;Range ( 0 , v10 ) . f o l d ( v34 ) ( {

c a s e ( v33 , j ) =>val v13 = v33 ( s p l i t ( j ) ) ;val p rev = { if ( v13 != n u l l ) v13 else 0 } ;v33 . u p d a t e d ( s p l i t ( j ) , p r ev + 1)

} )} )

+ S:eliminate-null-checkRange ( 0 , docs . s i z e ) . f o l d (m) ( {

c a s e ( v34 , i ) =>val v6 = docs ( i ) . s p l i t ( " " ) ;val v4 = Range ( 0 , v6 . s i z e ) . groupBy ( {

c a s e ( j ) => v6 ( j )} ) ;

val v5 = v4 . map ( {c a s e ( v2 , v3 ) =>

v3 . f o l d ( v34 ( v2 ) ) ( {c a s e ( v1 , j ) => v1 + 1

} )} ) ;

v34 ++ v5} )

+ S:reduce-monoid-identity-opRange ( 0 , docs . s i z e ) . f o l d (m) ( {

c a s e ( v34 , i ) =>val v6 = docs ( i ) . s p l i t ( " " ) ;val v4 = Range ( 0 , v6 . s i z e ) . groupBy ( {

c a s e ( j ) => v6 ( j )} ) ;

val v5 = v4 . map ( {c a s e ( v2 , v3 ) =>

v34 ( v2 ) + v3 . s i z e} ) ;

v34 ++ v5} )

+ E:localize-group-by-accessesRange ( 0 , docs . s i z e ) . f o l d (m) ( {

c a s e ( v34 , i ) =>val v4 = docs ( i ) . s p l i t ( " " ) . groupBy ( ID ) . __2 ;val v5 = v4 . map ( {

c a s e ( v2 , v3 ) =>v34 ( v2 ) + v3 . s i z e

} ) ;v34 ++ v5

} )

+ E:localize-map-accessesRange ( 0 , docs . s i z e ) . f o l d (m) ( {

c a s e ( v34 , i ) =>val v4 = docs ( i ) . s p l i t ( " " ) . groupBy ( ID ) . __2 ;val v5 = v4 . z i p ( v34 ) . map ( {

c a s e ( v2 , ( v3 , v7 ) ) =>v7 + v3 . s i z e

} ) ;v34 ++ v5

} )

Page 10: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

BrittleDifficult to be correct

Hard to maintain

Page 11: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

SEARCH

Proof of translationTarget code

Page 12: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

SEARCH

Proof of translationTarget code

DSL

<

Page 13: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Proof of translation

PROGRAM SYNTHESISTarget code

DSL

<

Page 14: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Program Compilation vs. Synthesis

Suppose we want to optimize x+x+x+x

With compiler rewrite rules:

With synthesis search:

x+x+x+x 4*x 2^2*x x<<2

x<<2

x*5x+3

x+2x

x<<42Space of programs

x-5

UnitTester

TheoremProver

ModelChecker

Page 15: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

SpecificationLanguage

Infer Lift

Verified Lifting

Validation

CodeGeneration

Source Languages Target DSLs

Search

Page 16: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

MetaLift

CodeIdentifier

SpecificationSynthesizer

TheoremProver

CodeGenerator

• Boolean• Arithmetic

• Classes• Lists• Arrays• …

DSL semanticsSearch space description

Target code fragments

Codegen rules

UnitTests

Compiler GeneratorSpec Language

Page 17: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

MetaLift

CodeIdentifier

SpecificationSynthesizer

TheoremProver

CodeGenerator

• Boolean• Arithmetic

• Classes• Lists• Arrays• …

Target code fragments

Codegen rules

Compiler GeneratorSpec Language

select :: [(a)] -> ((a)-> Bool) -> [(a)]select l f = [t | t <- l, f(t)]

join :: [(a)] -> [(b)] -> [(a,b)]join l1 l2 = [(t1, t2) | t <- l1, t <- l2]...

UnitTests

Page 18: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

MetaLift

CodeIdentifier

SpecificationSynthesizer

TheoremProver

CodeGenerator

• Boolean• Arithmetic

• Classes• Lists• Arrays• …

Codegen rules

Compiler GeneratorSpec Language

select :: [(a)] -> ((a)-> Bool) -> [(a)]select l f = [t | t <- l, f(t)]

join :: [(a)] -> [(b)] -> [(a,b)]join l1 l2 = [(t1, t2) | t <- l1, t <- l2]...

Java: while (*) { * }

UnitTests

Page 19: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

MetaLift

CodeIdentifier

SpecificationSynthesizer

TheoremProver

CodeGenerator

• Boolean• Arithmetic

• Classes• Lists• Arrays• …

select :: [(a)] -> ((a)-> Bool) -> [(a)]select l f = [t | t <- l, f(t)]

join :: [(a)] -> [(b)] -> [(a,b)]join l1 l2 = [(t1, t2) | t <- l1, t <- l2]...

Compiler GeneratorSpec Language

Java: while (*) { * }

translate (select l f) = "select * from" + translate(l) + "where" + translate(f)...

UnitTests

Page 20: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Data Analytics: Java àSQL

List getUsersWithRoles () {List users = this.userDao.getUsers(); List roles = this.roleDao.getRoles(); List results = new ArrayList();for (User u : users) {

for (Role r : roles) {if (u.roleId == r.id)

results.add(u); }}

return results; }

List getUsersWithRoles () { return executeQuery(“SELECT u FROM users u, roles r WHERE u.roleId == r.idORDER BY u.roleId, r.id”);

}

Lifted code can be optimized by DBs100-1000x speedup

codegen

[PLDI 13, CIDR 14]

1.Defineordered listsandtheiroperations

2.Synthesizer infersspecfrom source

3.Retargetsynthesized spectoSQLselect :: [(a)] -> ((a)-> Bool) -> [(a)]select l f = [t | t <- l, f(t)]

join :: [(a)] -> [(b)] -> [(a,b)]join l1 l2 = [(t1, t2) | t <- l1, t <- l2]

Page 21: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Leveraging GPUs: Fortran à Halide

procedure sten(imin,imax,jmin,jmax,a,b) real,dim(imin:imax,jmin:jmax) :: a real,dim(imin:imax,jmin:jmax) :: b do j=jmin,jmax

t = b(imin, j) do i=imin+1,imax

q = b(i,j) a(i,j) = q + t t=q

enddoenddo

end procedure

int main() { ImageParam b(type_of<dbl>(),2); Func func; Var i, j; func(i,j) = b(i-1,j) + b(i,j); func.compile_to_file("ex1", b); return 0;

}

Lifted code can be executed on GPUs17x speedup

codegen

1.Definearraysand theiroperations

2.Synthesizer infersspecfrom source

3.Retargetsynthesized spectoHalide

[SNAPL 15, PLDI 16]

get a i = a istore a i e = a//[(i,e)]

Page 22: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Hardware: Domino à Programmable Switches

void flowlet (Packet p) {p.new_hop = hash3(p.sport, p.dport,

p.arrival) % HOPS;p.id = hash2(p.sport, p.dport) % FLOWS;

if (p.arrival – last_time[p.id]>THRESHOLD)

saved_hop[pkt.id] = p.new_hop;

...}

Compile 10 well-known data plane algorithms to switches & run at line rate

codegen

1.Definehardwarebuilding blocksasinstructions

2.Synthesizer infersspecfrom source

3.Retargetspectoprogrammable switches [SIGCOMM 16]mux c v1 v2 = if c then v1 else v2pairExec x y f g = (f x, g y)inc x e = x + e

Page 23: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Parallel Frameworks: Sequential Java à Hadoop

// sequential implementationint regress(Point [] points) {

int SumXY = 0;for (Point p : points){

SumXY += p.x * p.y;}

return SumXY;}

void map(Object key, Point [] value)

{ for (Point p : points)emit("sumxy", p.x * p.y);

}}

void reduce(Text key, int [] vs) { int SumXY = 0;

for (Integer val : vs)SumXY = SumXY + val;

emit(key, SumXY); }

Lifted code can be optimized by Hadoop/Spark32x speedup

codegen

1.Definesemanticsofmapandreduce

2.Synthesizer infersspecfrom source

3.Retargetsynthesized spectoHadoop andSparkmap l f = [f t | t <- l] reduce l f i = foldl f i l

[SYNT 16, SIGMOD 17]

Page 24: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

CasperCompiling Sequential Java to HadoopAn Illustration of the MetaLift Toolchain

Page 25: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Demo

Page 26: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

CodeIdentifier

SpecificationSynthesizer

TheoremProver

CodeGenerator

Casper Architecture

DSL semanticsSearch space description

Target code fragments

Codegen rules

Page 27: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

CodeIdentifier

SpecificationSynthesizer

TheoremProver

CodeGenerator

Casper Architecture

DSL semanticsSearch space description

Codegen rules

Java: while (*) { * }

Page 28: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

map :: [(a)] -> (a -> (k,v)) -> [(k,v)]map l f = [f t | t <- l]

reduce :: [(k,[v])] -> (v -> v -> v) -> [(k,v)]reduce l f i = [(k, foldl f i v) | (k,v) <- l]

translate (map l f) ="void map(...) { for (t : l) emit(" + translate(f) + "; }"

translate (reduce l f i) = ...

DSL semantics

Code generation

Page 29: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Search Space Grammar

e := (e1, e2) | e + e | e - e | var | lit

| e * e | e / e | if e1 then e2 else e3

Automatically generated for each input code fragment

Increases expressivity incrementally

Page 30: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

CandidateSpecGenerator

BoundedModelChecker

Search space grammar

TheoremProver

Candidate Spec

SynthesisCoordinatorInput code fragment

Correct Specifications

Candidate Spec

CodeIdentifier

SpecificationSynthesizer

TheoremProver

CodeGenerator

Page 31: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Evaluation: Benchmarks

60 benchmarks collected from 5 benchmark suites

Phoenix: Classic MapReduce problems such as WordCount, 3D Histogram, Linear Regression and StringMatch.

Bigλ: Big-Data analytical benchmarks including basic sentiment analysis, database operations and Wikipedia log processing.

Arithmetic: Mathematical functions such as Max, Delta and Conditional Sum.

Statistical: Statistical analysis such as Mean, Covariance and Standard Error.

Fiji: Three image processing kernels: RedToMagenta, Temporal Median and Trails.

Page 32: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Feasibility: Does Casper work?Benchmark Extracted Translated

Phoenix 7 11

Bigλ 6 8

Arithmetic 11 11

Statistical 18 19

Fiji 8 11

Total 50 60

Causes of failures• 3 caused by calls to unsupported to external library methods• 7 caused by program constructs not currently expressible using MetaLift

Page 33: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Performance: Is Casper competitive?

0x

5x

10x

15x

20x

25x

30x

35x

StringMatch WordCount LinearRegression

3DHistogram YelpKids WikipediaPageCount

Covariance HadamardProduct

DatabaseSelect

AnscombeTransform

Speedup

MOLD

Spark(Casper)

Mean speedup: 17.6x Max speedup: 31x

Page 34: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

0x

5x

10x

15x

20x

25x

30x

35x

StringMatch WordCount LinearRegression

3DHistogram YelpKids WikipediaPageCount

Covariance HadamardProduct

DatabaseSelect

AnscombeTransform

Speedup

MOLDSpark(Manual)Spark(Casper)

Competitive with expert implementations

Performance: Is Casper competitive?

Page 35: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.

Performance Analysis: Is Casper Practical?

Mean compilation time for one benchmark was 5.8 minutes.

Median compilation time for one benchmark was 36.6 seconds.

Mean time to reach first correct translation was only 48 seconds!

BenchmarkSolutions Generated

with PruningSolutions Generated

without Pruning

WordCount 2 827

3D Histogram 5 118

Yelp: Kid Friendly Restaurants 1 286

Page 36: LEVERAGING DSLS (ALMOST) FOR FREE @alvinkcheungmetalift.uwplse.org/strangeloop17.pdf · [17] K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. POPL ’98, pp.