Presentation
-
Upload
george-leontiev -
Category
Technology
-
view
161 -
download
0
Transcript of Presentation
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Scala solutionHow to make your Scala controll effects a-la
Haskell
George Leontiev
deltamethod GmbH
February 28, 2013
(λx.folonexlambda-calcul.us)@folone.info
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Introduction
https://github.com/folone/funclub-wordsNote: I intentionally made it more ”interesting” to show more neatscalaz stuffI won’t cover everything though. If something seems strange,please ask.
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Core
Main functionswordCount : : S t r i n g → Map ( S t r i ng , I n t )accep tedCha r s : : Char → BooleanHelper functionst ime : : ( a → IO b ) → IO bc l o s e : : C l o s e a b l e a ⇒ a →IO ( )
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Core
de f accep tedChar s ( c : Char ) = {v a l sum : ( ( ( Boolean , Boolean ) , Boolean ) ) ⇒
Boolean = _ match {ca se ( ( a , b ) , c ) ⇒ a | | b | | c
}v a l fun = ( (_ : Char ) . i s L e t t e r O r D i g i t ) &&&
((_ : Char ) . i sWh i t e s pa c e ) &&&((_ : Char ) == ’− ’)
( fun >>> sum ) ( c )}http://www.haskell.org/arrows/index.html
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Core
de f wordCount ( t e x t : S t r i n g ) : Map [ S t r i ng , I n t ] =t e x t . f i l t e r ( accep tedChar s )
// s p l i t words. toLowerCase . s p l i t (”\\W” ) . t o L i s t// group. groupBy ( i d e n t i t y )// c a l c u l a t e group s i z e s. map { ca se ( key , v a l u e ) ⇒
key . t r im → v a l u e . l e n g t h}
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Typeclass instances
v a l N = 10i m p l i c i t v a l map Ins tances = new Show [ L i s t [ ( S t r i ng , I n t ) ] ] {
o v e r r i d e de f shows ( l : L i s t [ ( S t r i ng , I n t ) ] ) =l . f i l t e r N o t (_ . _1 . i sEmpty )
. so r tBy (−_. _2)
. take (N)
. f o l d L e f t ( ” ” ) { ca se ( acc , ( key , v a l u e ) ) ⇒acc + ”\n” + key + ” : ” + v a l u e
}}
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Executing
// f u n c t i o n : : S t r i n g → IO S t r i n gde f mainIO ( path : S t r i n g ) =
f o r {r e s u l t ← time ( f u n c t i o n ( path ) )_ ← putSt rLn ( r e s u l t )
} y i e l d ( )
de f main ( a r g s : Ar ray [ S t r i n g ] ) = {v a l path = a rg s (0 )// Yuck !mainIO ( path ) . unsa fePer fo rmIO ( )
}
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Executing
package i n f o . f o l o n e . wordsimpor t s c a l a z ._, S ca l a z ._o b j e c t Main {
de f main ( a r g s : Ar ray [ S t r i n g ] ) {v a l path = a rg s (0 )v a l a c t i o n = WordsMemory . mainIO _ |+|
WordsStream . mainIO _ |+|WordMachine . mainIO _
// Yuck !a c t i o n ( path ) . unsa fePer fo rmIO ( )
}}
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
All set
Let’s see how far we can push this solution.
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
First attempt
de f w h o l e F i l e ( path : S t r i n g ) : IO [ S t r i n g ] =IO { Source . f r o m F i l e ( path ) } . b r a c k e t ( c l o s e ) {
s ou r c e ⇒IO {
v a l t e x t = sou r c e . mkStr ingv a l r e s u l t = wordCount ( t e x t )r e s u l t . t o L i s t . shows
}}
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
First attempt
Works fine, but eats all the heap on a large enough file.
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Second attempt
de f byL ine ( path : S t r i n g ) : IO [ S t r i n g ] =IO { Source . f r o m F i l e ( path ) } . b r a c k e t ( c l o s e ) {
s ou r c e ⇒IO {
v a l s t ream = sou r c e . g e t L i n e s . toStreamv a l r e s u l t = st ream . map( wordCount )
. f o l d L e f t (Map . empty [ S t r i ng , I n t ] ) {ca s e ( acc , v ) ⇒ acc |+| v
}r e s u l t . t o L i s t . shows
}}
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Second attempt
Just what is this |+|?
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Typeclasses
i n s t a n c e Show [ ( S t r i ng , I n t ) ] where . . .i n s t a n c e Show Monoid b ⇒ Map a b where . . .
http://debasishg.blogspot.de/2010/06/scala-implicits-type-classes-here-i.html
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Monoids
(S,⊗, 1)∀a, b ∈ S : a ⊗ b ∈ S
∀a, b, c ∈ S : (a ⊗ b)⊗ c = a ⊗ (b ⊗ c)∀a ∈ S : 1⊗ a = a ⊗ 1 = a
http://apocalisp.wordpress.com/2010/06/14/on-monoids/
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Second attempt
Pretty good, but can we do better?
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Iteratees
Scala machines(https://github.com/runarorama/scala-machines)https://dl.dropbox.com/u/4588997/Machines.pdfGave similar performance on a by-line basis.Thought, three times faster if we provide a Process to ssplit it bywords and then monoidally merge single-element Maps.
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Iteratees – same as Stream
de f wordFreq ( path : S t r i n g ) =g e t F i l e L i n e s ( new F i l e ( path ) ,
i d outmap wordCount ) e x e cu t e
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Iteratees – 3x faster
de f s p l i t W o r d s ( t e x t : S t r i n g ) : L i s t [ S t r i n g ] =t e x t . f i l t e r ( accep tedChar s )
. toLowerCase . s p l i t (”\\W” ) . t o L i s t
v a l words : P roce s s [ S t r i ng , S t r i n g ] = ( f o r {s ← awa i t [ S t r i n g ]_ ← t r a v e r s e P l a n _ ( s p l i t W o r d s ( s ) ) ( emit )
} y i e l d ( ) ) r e p e a t e d l y
de f wordCount ( path : S t r i n g ) =g e t F i l e L i n e s ( new F i l e ( path ) ,
( i d s p l i t words ) outmap (_ . f o l d ( l ⇒ (1 , Map . empty [ S t r i ng , I n t ] ) ,
w ⇒ (0 , Map(w → 1 ) ) ) ) ) e x e cu t eGeorge Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Wordcounting software
Scoobi http://nicta.github.com/scoobi/Spark http://spark-project.org/
Scalding https://github.com/twitter/scalding/wiki/Type-safe-api-reference
George Leontiev deltamethod GmbHScala solution
..........
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
.....
.....
......
.....
......
.....
.....
.
Wordcounting
I did not have time to try to use those. But turns out, this codeshould work for these ”as is”.
George Leontiev deltamethod GmbHScala solution