Scala parallel-collections
Parallel Collections with Scala
Jul 6, 2012 > Vikas Hazrati > @vhazrati
Motivation
Multi-core machines are now popular
Parallel programming remains a formidable challenge.
Implicit Parallelism
scala> val list = (1 to 10000).toList
scala> list.map(_ + 42)
scala> list.par.map(_ + 42)
scala> List(1,2,3,4,5)
res0: List[Int] = List(1, 2, 3, 4, 5)
scala> res0.map(println(_))
1
2
3
4
5
res1: List[Unit] = List((), (), (), (), ())
scala> res0.par.map(println(_))
3
1
4
2
5
res2: scala.collection.parallel.immutable.ParSeq[Unit] = ParVector((), (), (), (), ())
ParArray
ParVector
mutable.ParHashMap
mutable.ParHashSet
immutable.ParHashMap
immutable.ParHashSet
ParRange
ParTrieMap (collection.concurrent.TrieMaps are new in 2.10)
Caution: performance benefits become visible only at around several thousand elements in the collection
Depends on
Machine architecture
JVM vendor and version
Per-element workload
Specific collection – ParArray, ParTrieMap
Specific operation – transformer (filter), accessor (foreach)
Memory management
scala> val parArray = (1 to 1000000).toArray.par
scala> parArray.fold(0)(_+_)
res3: Int = 1784293664
scala> val narArray = (1 to 1000000).toArray
scala> narArray.fold(0)(_+_)
res5: Int = 1784293664
scala> parArray.fold(0)(_+_)
res6: Int = 1784293664
I did not notice a difference on my laptop
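A side note on the number in the session above: 1784293664 is not the true sum of 1 to 1000000. The true sum does not fit in an Int, so the fold wraps around. The sketch below reproduces the arithmetic in plain sequential Scala. The object and value names are made up for illustration.

```scala
object FoldOverflowSketch {
  // Accumulate in a Long so the sum cannot overflow.
  val exact: Long = (1 to 1000000).foldLeft(0L)(_ + _)

  // Closed form n * (n + 1) / 2, computed in Long arithmetic.
  val closedForm: Long = 1000000L * 1000001L / 2

  // Keeping only the low 32 bits gives the value the Int fold printed.
  val wrapped: Int = exact.toInt
}
```

If the real total matters, fold over Long values (for example `(1L to 1000000L)`) rather than Int.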
map, fold and filter
creating a parallel collection
With new:
import scala.collection.parallel.immutable.ParVector
val pv = new ParVector[Int]
By taking a sequential collection and converting it:
val pv = Vector(1,2,3,4,5,6,7,8,9).par
Parallel collections can be converted back to sequential collections with seq
Some collections, such as List, are inherently sequential
They are converted to parallel by copying their elements into a similar parallel collection
An example is List: it is converted into a standard immutable parallel sequence, which is a ParVector.
Overhead!
Array, Vector, HashMap do not have this overhead
how does it work?
Map reduce ?
by recursively “splitting” a given collection, applying an operation on each partition of the collection in parallel, and re-“combining” all of the results that were completed in parallel.
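The split / apply / combine idea can be sketched by hand with Futures. This is only an illustration of the principle, not how the library is implemented. `SplitCombineSketch`, `mapSum` and `Threshold` are hypothetical names, and the cut-off of 1000 is an arbitrary assumption.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SplitCombineSketch {
  // Hypothetical cut-off: partitions this small are processed sequentially.
  val Threshold = 1000

  // Recursively split xs, apply f to each element of every partition,
  // and combine the partial sums once both halves have completed.
  def mapSum(xs: Vector[Int])(f: Int => Int): Future[Long] =
    if (xs.length <= Threshold)
      Future(xs.foldLeft(0L)((acc, x) => acc + f(x)))
    else {
      val (left, right) = xs.splitAt(xs.length / 2)
      val l = mapSum(left)(f)   // both halves are started before either is awaited
      val r = mapSum(right)(f)
      for (a <- l; b <- r) yield a + b
    }

  // Same workload as list.par.map(_ + 42), followed by a sum.
  def run(): Long =
    Await.result(mapSum((1 to 10000).toVector)(_ + 42), 10.seconds)
}
```

Addition is associative, so the way the partitions are grouped does not change the result.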
Side-effecting operations
Non-associative operations
side-effecting operations
scala> var sum = 0
sum: Int = 0
scala> val list = (1 to 1000).toList.par
scala> list.foreach(sum += _); sum
res7: Int = 452474
scala> var sum = 0
sum: Int = 0
scala> list.foreach(sum += _); sum
res8: Int = 497761
scala> var sum = 0
sum: Int = 0
scala> list.foreach(sum += _); sum
res9: Int = 422508
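The three runs above disagree because several threads update `sum` at once without synchronization. A minimal sketch of two ways around that, assuming Scala 2.10 to 2.12 where `.par` ships with the standard library. On 2.13 and later the parallel collections live in the separate scala-parallel-collections module and need `import scala.collection.parallel.CollectionConverters._`. The object name is hypothetical.

```scala
import java.util.concurrent.atomic.AtomicInteger

object SafeSumSketch {
  // Assumption: .par is available without extra imports (Scala 2.10 to 2.12).
  val list = (1 to 1000).toList.par

  // Preferred: let the collection do the reduction, with no shared mutable state.
  val viaSum: Int = list.sum
  val viaFold: Int = list.fold(0)(_ + _)

  // If a side effect is unavoidable, make the shared update atomic.
  val counter = new AtomicInteger(0)
  list.foreach(x => counter.addAndGet(x))
  val viaAtomic: Int = counter.get
}
```

All three values are the same on every run, unlike the `var sum` version.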
non-associative operations
The order in which the function is applied to the elements of the collection can be arbitrary
scala> val list = (1 to 1000).toList.par
scala> list.reduce(_-_)
res01: Int = -228888
scala> list.reduce(_-_)
res02: Int = -61000
scala> list.reduce(_-_)
res03: Int = -331818
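Why subtraction misbehaves can be seen without any parallelism: the result depends on how the elements are grouped, and a parallel reduce is free to group them differently on each run. A small sequential sketch, with hypothetical names:

```scala
object NonAssociativeSketch {
  val xs = List(1, 2, 3, 4)

  // Subtraction: the grouping changes the answer.
  val subLeft: Int  = xs.reduceLeft(_ - _)   // ((1 - 2) - 3) - 4
  val subRight: Int = xs.reduceRight(_ - _)  // 1 - (2 - (3 - 4))

  // Addition is associative: any grouping gives the same answer.
  val addLeft: Int  = xs.reduceLeft(_ + _)
  val addRight: Int = xs.reduceRight(_ + _)
}
```

Here `subLeft` is -8 and `subRight` is -2, while both additions give 10.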
associative but non-commutative
scala> val strings = List("abc","def","ghi","jk","lmnop","qrs","tuv","wx","yz").par
strings: scala.collection.parallel.immutable.ParSeq[java.lang.String] = ParVector(abc, def, ghi, jk, lmnop, qrs, tuv, wx, yz)
scala> val alphabet = strings.reduce(_++_)
alphabet: java.lang.String = abcdefghijklmnopqrstuvwxyz
out of order?
Operations may be out of order
BUT
Recombination of results would be in order
[Figure: a collection A B C is processed out of order (C, A, B); the results are recombined in order as A B C]
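The string example works because concatenation is associative and the partial results are joined in their original order. A sequential sketch that simulates every possible two-way split. It uses `+` for concatenation, which is equivalent to `++` on strings here, and the names are hypothetical.

```scala
object OrderedCombineSketch {
  val strings = Vector("abc", "def", "ghi", "jk", "lmnop", "qrs", "tuv", "wx", "yz")

  // Plain left-to-right reduction.
  val sequential: String = strings.reduce(_ + _)

  // Reduce each half on its own, then join left before right,
  // for every possible split point.
  val regrouped: Seq[String] = (1 until strings.length).map { i =>
    val (left, right) = strings.splitAt(i)
    left.reduce(_ + _) + right.reduce(_ + _)
  }

  // Joining in the wrong order would break the result,
  // because concatenation is not commutative.
  val swapped: String = "def" + "abc"
}
```

Every entry of `regrouped` equals `sequential`, whereas `swapped` shows what an out-of-order recombination would produce.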
performance
In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated.
conversions
Converting parallel to sequential takes constant time
A List is converted to a ParVector
architecture
splitters
Split the collection into non-trivial partitions so that they can be accessed in sequence
combiners
A combiner is a Builder. It combines split lists together.
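To make the two roles concrete, here is a deliberately simplified sketch. `SimpleSplitter`, `VectorSplitter`, `SimpleCombiner` and `mapSketch` are hypothetical stand-ins written for this example. They are not the real traits in scala.collection.parallel, and the recursion below runs sequentially, whereas a real framework would schedule the parts as parallel tasks.

```scala
object SplitterCombinerSketch {
  // Hypothetical splitter: knows its size and how to divide itself.
  trait SimpleSplitter[A] {
    def remaining: Int
    def split: Seq[SimpleSplitter[A]]
    def toVector: Vector[A] // sequential access to this partition
  }

  final class VectorSplitter[A](elems: Vector[A]) extends SimpleSplitter[A] {
    def remaining: Int = elems.length
    def split: Seq[SimpleSplitter[A]] =
      if (elems.length < 2) Seq(this)
      else {
        val (l, r) = elems.splitAt(elems.length / 2)
        Seq(new VectorSplitter(l), new VectorSplitter(r))
      }
    def toVector: Vector[A] = elems
  }

  // Hypothetical combiner: a builder that can also merge with another builder.
  final class SimpleCombiner[A](val result: Vector[A]) {
    def add(x: A): SimpleCombiner[A] = new SimpleCombiner(result :+ x)
    def combine(that: SimpleCombiner[A]): SimpleCombiner[A] =
      new SimpleCombiner(result ++ that.result)
  }

  // Split until the parts are small, map each part, combine in order.
  def mapSketch[A, B](s: SimpleSplitter[A])(f: A => B): SimpleCombiner[B] =
    if (s.remaining <= 2)
      s.toVector.foldLeft(new SimpleCombiner[B](Vector.empty))((c, x) => c.add(f(x)))
    else
      s.split.map(part => mapSketch(part)(f)).reduce(_ combine _)
}
```

For example, `mapSketch(new VectorSplitter(Vector(1, 2, 3, 4, 5)))(_ * 2).result` yields the doubled elements in their original order.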
brickbats
Absence of configuration
Not all algorithms are parallel friendly
unproven
Now, if you want your code to not care whether it receives a parallel or sequential collection, you should prefix it with Gen: GenTraversable, GenIterable, GenSeq, etc. These can be either parallel or sequential.
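A minimal sketch of a method written against a Gen type, assuming Scala 2.10 to 2.12 where `.par` is in the standard library and parallel collections extend the Gen traits. The object and method names are hypothetical.

```scala
import scala.collection.GenSeq

object GenSketch {
  // Accepts either a sequential or a parallel sequence.
  def sumOfSquares(xs: GenSeq[Int]): Int = xs.map(x => x * x).sum

  val fromSequential: Int = sumOfSquares(Vector(1, 2, 3))
  val fromParallel: Int   = sumOfSquares(Vector(1, 2, 3).par)
}
```

Because the caller decides whether the collection is parallel, the function passed to `map` should be free of side effects.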