Presentation

22
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Scala solution How to make your Scala controll effects a-la Haskell George Leontiev deltamethod GmbH February 28, 2013 (λx.folonexlambda-calcul.us)@ folone.info George Leontiev deltamethod GmbH Scala solution

Transcript of Presentation

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Scala solutionHow to make your Scala controll effects a-la

Haskell

George Leontiev

deltamethod GmbH

February 28, 2013

(λx.folonexlambda-calcul.us)@folone.info

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Introduction

https://github.com/folone/funclub-wordsNote: I intentionally made it more ”interesting” to show more neatscalaz stuffI won’t cover everything though. If something seems strange,please ask.

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Core

Main functionswordCount : : S t r i n g → Map ( S t r i ng , I n t )accep tedCha r s : : Char → BooleanHelper functionst ime : : ( a → IO b ) → IO bc l o s e : : C l o s e a b l e a ⇒ a →IO ( )

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Core

de f accep tedChar s ( c : Char ) = {v a l sum : ( ( ( Boolean , Boolean ) , Boolean ) ) ⇒

Boolean = _ match {ca se ( ( a , b ) , c ) ⇒ a | | b | | c

}v a l fun = ( (_ : Char ) . i s L e t t e r O r D i g i t ) &&&

((_ : Char ) . i sWh i t e s pa c e ) &&&((_ : Char ) == ’− ’)

( fun >>> sum ) ( c )}http://www.haskell.org/arrows/index.html

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Core

de f wordCount ( t e x t : S t r i n g ) : Map [ S t r i ng , I n t ] =t e x t . f i l t e r ( accep tedChar s )

// s p l i t words. toLowerCase . s p l i t (”\\W” ) . t o L i s t// group. groupBy ( i d e n t i t y )// c a l c u l a t e group s i z e s. map { ca se ( key , v a l u e ) ⇒

key . t r im → v a l u e . l e n g t h}

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Typeclass instances

v a l N = 10i m p l i c i t v a l map Ins tances = new Show [ L i s t [ ( S t r i ng , I n t ) ] ] {

o v e r r i d e de f shows ( l : L i s t [ ( S t r i ng , I n t ) ] ) =l . f i l t e r N o t (_ . _1 . i sEmpty )

. so r tBy (−_. _2)

. take (N)

. f o l d L e f t ( ” ” ) { ca se ( acc , ( key , v a l u e ) ) ⇒acc + ”\n” + key + ” : ” + v a l u e

}}

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Executing

// f u n c t i o n : : S t r i n g → IO S t r i n gde f mainIO ( path : S t r i n g ) =

f o r {r e s u l t ← time ( f u n c t i o n ( path ) )_ ← putSt rLn ( r e s u l t )

} y i e l d ( )

de f main ( a r g s : Ar ray [ S t r i n g ] ) = {v a l path = a rg s (0 )// Yuck !mainIO ( path ) . unsa fePer fo rmIO ( )

}

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Executing

package i n f o . f o l o n e . wordsimpor t s c a l a z ._, S ca l a z ._o b j e c t Main {

de f main ( a r g s : Ar ray [ S t r i n g ] ) {v a l path = a rg s (0 )v a l a c t i o n = WordsMemory . mainIO _ |+|

WordsStream . mainIO _ |+|WordMachine . mainIO _

// Yuck !a c t i o n ( path ) . unsa fePer fo rmIO ( )

}}

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

All set

Let’s see how far we can push this solution.

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

First attempt

de f w h o l e F i l e ( path : S t r i n g ) : IO [ S t r i n g ] =IO { Source . f r o m F i l e ( path ) } . b r a c k e t ( c l o s e ) {

s ou r c e ⇒IO {

v a l t e x t = sou r c e . mkStr ingv a l r e s u l t = wordCount ( t e x t )r e s u l t . t o L i s t . shows

}}

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

First attempt

Works fine, but eats all the heap on a large enough file.

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Second attempt

de f byL ine ( path : S t r i n g ) : IO [ S t r i n g ] =IO { Source . f r o m F i l e ( path ) } . b r a c k e t ( c l o s e ) {

s ou r c e ⇒IO {

v a l s t ream = sou r c e . g e t L i n e s . toStreamv a l r e s u l t = st ream . map( wordCount )

. f o l d L e f t (Map . empty [ S t r i ng , I n t ] ) {ca s e ( acc , v ) ⇒ acc |+| v

}r e s u l t . t o L i s t . shows

}}

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Second attempt

Just what is this |+|?

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Typeclasses

i n s t a n c e Show [ ( S t r i ng , I n t ) ] where . . .i n s t a n c e Show Monoid b ⇒ Map a b where . . .

http://debasishg.blogspot.de/2010/06/scala-implicits-type-classes-here-i.html

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Monoids

(S,⊗, 1)∀a, b ∈ S : a ⊗ b ∈ S

∀a, b, c ∈ S : (a ⊗ b)⊗ c = a ⊗ (b ⊗ c)∀a ∈ S : 1⊗ a = a ⊗ 1 = a

http://apocalisp.wordpress.com/2010/06/14/on-monoids/

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Second attempt

Pretty good, but can we do better?

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Iteratees

Scala machines(https://github.com/runarorama/scala-machines)https://dl.dropbox.com/u/4588997/Machines.pdfGave similar performance on a by-line basis.Thought, three times faster if we provide a Process to ssplit it bywords and then monoidally merge single-element Maps.

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Iteratees – same as Stream

de f wordFreq ( path : S t r i n g ) =g e t F i l e L i n e s ( new F i l e ( path ) ,

i d outmap wordCount ) e x e cu t e

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Iteratees – 3x faster

de f s p l i t W o r d s ( t e x t : S t r i n g ) : L i s t [ S t r i n g ] =t e x t . f i l t e r ( accep tedChar s )

. toLowerCase . s p l i t (”\\W” ) . t o L i s t

v a l words : P roce s s [ S t r i ng , S t r i n g ] = ( f o r {s ← awa i t [ S t r i n g ]_ ← t r a v e r s e P l a n _ ( s p l i t W o r d s ( s ) ) ( emit )

} y i e l d ( ) ) r e p e a t e d l y

de f wordCount ( path : S t r i n g ) =g e t F i l e L i n e s ( new F i l e ( path ) ,

( i d s p l i t words ) outmap (_ . f o l d ( l ⇒ (1 , Map . empty [ S t r i ng , I n t ] ) ,

w ⇒ (0 , Map(w → 1 ) ) ) ) ) e x e cu t eGeorge Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Wordcounting software

Scoobi http://nicta.github.com/scoobi/Spark http://spark-project.org/

Scalding https://github.com/twitter/scalding/wiki/Type-safe-api-reference

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

Wordcounting

I did not have time to try to use those. But turns out, this codeshould work for these ”as is”.

George Leontiev deltamethod GmbHScala solution

..........

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

.....

.....

......

.....

......

.....

.....

.

That’s it

Questions?

George Leontiev deltamethod GmbHScala solution