Scala parallel-collections

Parallel Collections in Scala. What to take note of and what to watch out for.

Page 1: Scala parallel-collections

Parallel Collections with Scala

Jul 6, 2012 > Vikas Hazrati > [email protected] > @vhazrati

Page 2: Scala parallel-collections

Motivation

Multiple-cores

Popular Parallel Programming remains a formidable challenge.

Implicit Parallelism

Page 3: Scala parallel-collections

scala> val list = (1 to 10000).toList

scala> list.map(_ + 42)

scala> list.par.map(_ + 42)

Page 4: Scala parallel-collections

scala> List(1,2,3,4,5)
res0: List[Int] = List(1, 2, 3, 4, 5)

scala> res0.map(println(_))
1
2
3
4
5
res1: List[Unit] = List((), (), (), (), ())

scala> res0.par.map(println(_))
3
1
4
2
5
res2: scala.collection.parallel.immutable.ParSeq[Unit] = ParVector((), (), (), (), ())

Page 5: Scala parallel-collections

ParArray

ParVector

mutable.ParHashMap

mutable.ParHashSet

immutable.ParHashMap

immutable.ParHashSet

ParRange

ParTrieMap (collection.concurrent.TrieMaps are new in 2.10)

Page 6: Scala parallel-collections

Caution: Performance benefits are visible only at around several thousand elements in the collection

Depends on

Machine Architecture

Specific collection – ParArray, ParTrieMap

Per element workload

JVM vendor and version

Specific operation – transformer (filter), accessor (foreach)

Memory Management

Page 7: Scala parallel-collections

scala> val parArray = (1 to 1000000).toArray.par

scala> parArray.fold(0)(_+_)
res3: Int = 1784293664

scala> val narArray = (1 to 1000000).toArray

scala> narArray.fold(0)(_+_)
res5: Int = 1784293664

scala> parArray.fold(0)(_+_)
res6: Int = 1784293664

I did not notice a difference on my laptop

map, fold and filter
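To make the comparison above measurable rather than a matter of impression, the two folds can be timed. This is only a rough sketch: `timed` and `compare` are hypothetical helper names, a single run is distorted by JIT warm-up, and the code assumes Scala 2.13 or later, where parallel collections live in the separate scala-parallel-collections module (on 2.12 and earlier, drop the directive line and the `CollectionConverters` import).

```scala
//> using dep "org.scala-lang.modules::scala-parallel-collections:1.0.4"
// The line above is a scala-cli directive (assumption: scala-cli is the runner).
import scala.collection.parallel.CollectionConverters._

object FoldTiming {
  // Hypothetical helper: runs `body` once, returns its result and elapsed milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000)
  }

  // Folds the same data sequentially and in parallel and prints both timings.
  def compare(): (Int, Int) = {
    val seqArray = (1 to 1000000).toArray
    val parArray = seqArray.par
    val (seqSum, seqMs) = timed(seqArray.fold(0)(_ + _))
    val (parSum, parMs) = timed(parArray.fold(0)(_ + _))
    println(s"sequential: $seqMs ms, parallel: $parMs ms")
    (seqSum, parSum) // both sums wrap around Int in the same way
  }
}
```

Calling `FoldTiming.compare()` several times in a row gives a fairer picture than the first run alone.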

Page 8: Scala parallel-collections

creating a parallel collection

import scala.collection.parallel.immutable.ParVector

With a new

val pv = new ParVector[Int]

Taking a sequential collection and converting it

val pv = Vector(1,2,3,4,5,6,7,8,9).par

Parallel collections can be converted back to sequential collections with seq
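A minimal sketch of the round trip: build a parallel collection, transform it, and come back with seq. The object and method names are made up for illustration, and the code assumes Scala 2.13 or later with the scala-parallel-collections module (on 2.12 and earlier, drop the directive line and the `CollectionConverters` import).

```scala
//> using dep "org.scala-lang.modules::scala-parallel-collections:1.0.4"
// The line above is a scala-cli directive (assumption: scala-cli is the runner).
import scala.collection.parallel.CollectionConverters._
import scala.collection.parallel.immutable.ParVector

object CreateAndConvert {
  // Sequential collection converted with .par
  def fromSequential = Vector(1, 2, 3, 4, 5, 6, 7, 8, 9).par

  // Parallel collection built directly through the companion object
  def direct = ParVector(1, 2, 3)

  // Transform in parallel, then convert back to a sequential collection
  def roundTrip = fromSequential.map(_ + 42).seq
}
```

`CreateAndConvert.roundTrip` should hold the same elements, in the same order, as the sequential `Vector(1, ..., 9).map(_ + 42)`.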

Page 9: Scala parallel-collections

Some collections are inherently sequential

They are converted to parallel by copying their elements into a similar parallel collection

An example is List: it is converted into a standard immutable parallel sequence, which is a ParVector.

Overhead!

Array, Vector, HashMap do not have this overhead
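The conversion targets can be checked at runtime. The sketch below only inspects which parallel class each .par call produces; it does not measure the copying cost. Names are illustrative, and the code assumes Scala 2.13 or later with the scala-parallel-collections module (on 2.12 and earlier, drop the directive line and the `CollectionConverters` import).

```scala
//> using dep "org.scala-lang.modules::scala-parallel-collections:1.0.4"
// The line above is a scala-cli directive (assumption: scala-cli is the runner).
import scala.collection.parallel.CollectionConverters._
import scala.collection.parallel.immutable.ParVector
import scala.collection.parallel.mutable.ParArray

object ConversionTargets {
  // A List has no parallel counterpart, so its elements are copied into a ParVector.
  def listBecomesParVector: Boolean =
    List(1, 2, 3).par.isInstanceOf[ParVector[_]]

  // An Array maps onto its direct parallel counterpart, ParArray.
  def arrayBecomesParArray: Boolean =
    Array(1, 2, 3).par.isInstanceOf[ParArray[_]]
}
```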

Page 10: Scala parallel-collections

how does it work?

Map reduce ?

by recursively “splitting” a given collection, applying an operation on each partition of the collection in parallel, and re-“combining” all of the results that were completed in parallel.
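The split, apply and combine steps can be imitated by hand with standard-library Futures. This is a sketch of the idea only, not how the parallel collections library is implemented; `sumMapped`, `demo` and the threshold value are hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SplitApplyCombine {
  // Hypothetical helper: applies f to every element and sums the results.
  // Chunks of at most `threshold` elements are processed sequentially.
  def sumMapped(xs: Vector[Int], threshold: Int)(f: Int => Long): Future[Long] = {
    require(threshold >= 1, "threshold must be at least 1")
    if (xs.length <= threshold)
      Future(xs.foldLeft(0L)((acc, x) => acc + f(x)))
    else {
      val (left, right) = xs.splitAt(xs.length / 2) // split
      val l = sumMapped(left, threshold)(f)         // apply to each half,
      val r = sumMapped(right, threshold)(f)        // possibly in parallel
      l.zip(r).map { case (a, b) => a + b }         // combine
    }
  }

  // Adds 42 to each of 1..10000 and sums the results.
  def demo(): Long =
    Await.result(sumMapped((1 to 10000).toVector, 1000)(x => x + 42L), 30.seconds)
}
```

Combining with `+` is safe here because addition is associative, so the grouping chosen by the recursion does not change the total.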

Side-effecting operations

Non-associative operations

Page 11: Scala parallel-collections

side-effecting operations

scala> var sum = 0
sum: Int = 0

scala> val list = (1 to 1000).toList.par

scala> list.foreach(sum += _); sum
res7: Int = 452474

scala> var sum = 0
sum: Int = 0

scala> list.foreach(sum += _); sum
res8: Int = 497761

scala> var sum = 0
sum: Int = 0

scala> list.foreach(sum += _); sum
res9: Int = 422508
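The varying totals above come from unsynchronized updates to a shared var. A sketch of three ways to get a stable answer follows: let the collection do the reduction, or, if a side effect is really needed, use an atomic counter. Names are illustrative, and the code assumes Scala 2.13 or later with the scala-parallel-collections module (on 2.12 and earlier, drop the directive line and the `CollectionConverters` import).

```scala
//> using dep "org.scala-lang.modules::scala-parallel-collections:1.0.4"
// The line above is a scala-cli directive (assumption: scala-cli is the runner).
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.parallel.CollectionConverters._

object SideEffectFreeSum {
  def nums = (1 to 1000).toList.par

  // No shared mutable state: the library combines partial sums itself.
  def viaSum: Int = nums.sum
  def viaFold: Int = nums.fold(0)(_ + _)

  // Side effect kept, but each update is atomic, so no increments are lost.
  def viaAtomic: Int = {
    val acc = new AtomicInteger(0)
    nums.foreach(n => acc.addAndGet(n))
    acc.get
  }
}
```

All three should agree on 500500, the sum of 1 to 1000; the first two also avoid contention on a single counter.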

Page 12: Scala parallel-collections

non-associative operations

The order in which the function is applied to the elements of the collection can be arbitrary

scala> val list = (1 to 1000).toList.par

scala> list.reduce(_-_)

res01: Int = -228888

scala> list.reduce(_-_)

res02: Int = -61000

scala> list.reduce(_-_)

res03: Int = -331818
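Why subtraction misbehaves can be seen without any parallelism: a parallel reduce is free to group elements differently on each run, and for a non-associative operation the grouping changes the answer. The sketch below compares two groupings on a plain sequential List (names are illustrative).

```scala
object AssociativityCheck {
  val xs = List(1, 2, 3)

  // Subtraction: grouping matters.
  val minusLeft  = xs.reduceLeft(_ - _)  // (1 - 2) - 3
  val minusRight = xs.reduceRight(_ - _) // 1 - (2 - 3)

  // Addition: grouping does not matter.
  val plusLeft  = xs.reduceLeft(_ + _)   // (1 + 2) + 3
  val plusRight = xs.reduceRight(_ + _)  // 1 + (2 + 3)
}
```

Here `minusLeft` is -4 while `minusRight` is 2, whereas both additions give 6, which is why only associative operations are safe to hand to a parallel reduce or fold.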

Page 13: Scala parallel-collections

associative but non-commutative

scala> val strings = List("abc","def","ghi","jk","lmnop","qrs","tuv","wx","yz").par
strings: scala.collection.parallel.immutable.ParSeq[java.lang.String] = ParVector(abc, def, ghi, jk, lmnop, qrs, tuv, wx, yz)

scala> val alphabet = strings.reduce(_++_)
alphabet: java.lang.String = abcdefghijklmnopqrstuvwxyz
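String concatenation shows the difference between the two properties. In this sequential sketch (names are illustrative), regrouping leaves the result unchanged, which is associativity, while swapping the operands changes it, which is the lack of commutativity.

```scala
object ConcatOrder {
  val parts = List("abc", "def", "ghi")

  // Different groupings, same operand order: the results agree.
  val groupedLeft  = parts.reduceLeft(_ ++ _)  // ("abc" ++ "def") ++ "ghi"
  val groupedRight = parts.reduceRight(_ ++ _) // "abc" ++ ("def" ++ "ghi")

  // Operands swapped at every step: the result differs.
  val swapped = parts.reduceLeft((a, b) => b ++ a)
}
```

Both groupings give "abcdefghi", while `swapped` gives "ghidefabc". A parallel reduce only regroups and never reorders operands, so the concatenation on the slide still comes out as the alphabet.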

Page 14: Scala parallel-collections

out of order?

Operations may be out of order

BUT

Recombination of results would be in order

[Diagram: a collection A B C is processed out of order (C, A, B) and its results are recombined in order as A B C]
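A small sketch of the same point in code: the function passed to map may run on the elements in any order, yet the resulting collection keeps the original element order. The object name is illustrative, and the code assumes Scala 2.13 or later with the scala-parallel-collections module (on 2.12 and earlier, drop the directive line and the `CollectionConverters` import).

```scala
//> using dep "org.scala-lang.modules::scala-parallel-collections:1.0.4"
// The line above is a scala-cli directive (assumption: scala-cli is the runner).
import scala.collection.parallel.CollectionConverters._

object OrderedResults {
  // The multiplications may execute in any order across threads,
  // but each result lands at the position of its source element.
  def doubled: Vector[Int] = Vector(1, 2, 3, 4, 5).par.map(_ * 2).toVector
}
```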

Page 15: Scala parallel-collections

performance

In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated.

Page 16: Scala parallel-collections

conversions

Converting parallel to sequential takes constant time

List is converted to Vector

Page 17: Scala parallel-collections

architecture

splitters

Split the collection into non-trivial partitions so that they can be accessed in sequence

combiners

A combiner is a Builder. It combines split lists together.
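The two roles can be sketched with simplified stand-ins. `VecSplitter` and `VecCombiner` below are hypothetical classes written for illustration; the real Splitter and Combiner traits in scala.collection.parallel have more members and different signatures. The halves are also processed one after the other here, where a real scheduler would hand them to separate threads.

```scala
object SplitterCombinerSketch {
  // Hypothetical splitter: iterates over v(i) until v(until - 1) and can halve itself.
  final class VecSplitter[A](v: Vector[A], private var i: Int, until: Int) {
    def hasNext: Boolean = i < until
    def next(): A = { val a = v(i); i += 1; a }
    def remaining: Int = until - i
    def split: (VecSplitter[A], VecSplitter[A]) = {
      val mid = i + remaining / 2
      (new VecSplitter(v, i, mid), new VecSplitter(v, mid, until))
    }
  }

  // Hypothetical combiner: a builder that can also merge with another builder.
  final class VecCombiner[A](private var buf: Vector[A] = Vector.empty[A]) {
    def +=(a: A): this.type = { buf = buf :+ a; this }
    def combine(that: VecCombiner[A]): VecCombiner[A] = new VecCombiner(buf ++ that.buf)
    def result(): Vector[A] = buf
  }

  // Maps f over the splitter's elements: split down to `threshold` (assumed >= 1),
  // build each chunk, then combine the chunks left to right.
  def mapVia[A, B](s: VecSplitter[A], threshold: Int)(f: A => B): VecCombiner[B] =
    if (s.remaining <= threshold) {
      val c = new VecCombiner[B]()
      while (s.hasNext) c += f(s.next())
      c
    } else {
      val (l, r) = s.split
      mapVia(l, threshold)(f).combine(mapVia(r, threshold)(f))
    }

  def demo(): Vector[Int] =
    mapVia(new VecSplitter(Vector(1, 2, 3, 4, 5), 0, 5), 2)((x: Int) => x * 10).result()
}
```

Because chunks are always combined left before right, the output order matches the input order regardless of how the input was partitioned.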

Page 18: Scala parallel-collections

brickbats

Absence of configuration

Not all algorithms are parallel friendly

unproven

Now, if you want your code to not care whether it receives a parallel or sequential collection, you should prefix it with Gen: GenTraversable, GenIterable, GenSeq, etc. These can be either parallel or sequential.