Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Who am I?
• Engineer at Goldman Sachs
• Technical lead for GS Collections
  – A replacement for the Java Collections Framework

Goals
• Scala programs ought to perform as well as Java programs, but Scala is a little slower

Goals
From RedBlackTree.scala

    /*
     * Forcing direct fields access using the @inline annotation
     * helps speed up various operations (especially
     * smallest/greatest and update/delete).
     *
     * Unfortunately the direct field access is not guaranteed to
     * work (but works on the current implementation of the Scala
     * compiler).
     *
     * An alternative is to implement the these classes using plain
     * old Java code...
     */

Goals
• Scala programs ought to perform as well as Java programs, but Scala is a little slower
• Highlight a few performance problems that matter to me
• Suggest fixes, borrowing ideas and technology from GS Collections

Goals
GS Collections and Scala’s Collections are similar
• Mutable and immutable interfaces with common parent
• Similar iteration patterns at hierarchy root: Traversable and RichIterable
• Lazy evaluation (view, asLazy)
• Parallel-lazy evaluation (par, asParallel)

Agenda
GS Collections and Scala’s Collections are different
• Persistent data structures
• Hash tables
• Primitive specialization
• Fork-join

Persistent Data Structures

Persistent Data Structures
• Mutable collections are similar in GS Collections and Scala – though we’ll look at some differences
• Immutable collections are quite different
• Scala’s immutable collections are persistent
• GS Collections’ immutable collections are not

Persistent Data Structures
Wikipedia:
In computing, a persistent data structure is a data structure that always preserves the previous version of itself when it is modified.
Operations yield new updated structures when they are “modified”

Persistent Data Structures
• Typical examples: List and Sorted Set
• “Add” by constructing O(log n) new nodes from the leaf to a new root.
• Scala sorted set: binary tree
• GSC immutable sorted set: covered later
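The "add by copying the path to a new root" idea can be sketched in a few lines of Java. This is a hypothetical, unbalanced binary search tree written only for illustration (the class name `PersistentTree` is an assumption, and it is not Scala's red-black tree), so the number of new nodes is the depth of the insertion path, which is O(log n) only when the tree happens to be balanced.

```java
// Hypothetical persistent binary search tree of ints (unbalanced, illustration only).
final class PersistentTree {
    final PersistentTree left;
    final int value;
    final PersistentTree right;

    PersistentTree(PersistentTree left, int value, PersistentTree right) {
        this.left = left;
        this.value = value;
        this.right = right;
    }

    // Returns the root of a new version. Only the nodes on the path from the
    // root to the insertion point are re-created; every other subtree is
    // shared between the old version and the new one.
    static PersistentTree add(PersistentTree node, int value) {
        if (node == null) {
            return new PersistentTree(null, value, null);
        }
        if (value < node.value) {
            return new PersistentTree(add(node.left, value), node.value, node.right);
        }
        if (value > node.value) {
            return new PersistentTree(node.left, node.value, add(node.right, value));
        }
        return node; // value already present: reuse the existing subtree
    }

    static boolean contains(PersistentTree node, int value) {
        while (node != null) {
            if (value == node.value) {
                return true;
            }
            node = value < node.value ? node.left : node.right;
        }
        return false;
    }
}
```

After `add`, the previous root is still a complete, valid tree, which is what makes the structure persistent.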

Persistent Data Structures
• Typical examples: List and Sorted Set
• “Prepend” by constructing a new head in O(1) time. LIFO structure.
• Scala immutable list: linked list or stack
• GSC immutable list: array backed
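A minimal sketch of the O(1) prepend, assuming a hand-written cons list (the name `ConsList` is hypothetical and this is not the Scala or GS Collections implementation). The new head simply points at the existing list, so both versions share every node.

```java
// Hypothetical minimal persistent stack: immutable nodes, structural sharing.
final class ConsList<T> {
    private static final ConsList<Object> EMPTY = new ConsList<>(null, null, 0);

    final T head;
    final ConsList<T> tail;
    final int size;

    private ConsList(T head, ConsList<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    @SuppressWarnings("unchecked")
    static <T> ConsList<T> empty() {
        return (ConsList<T>) EMPTY;
    }

    // O(1): allocates exactly one node and reuses the whole existing list as its tail.
    ConsList<T> prepend(T element) {
        return new ConsList<>(element, this, this.size + 1);
    }
}
```

The cost of this design shows up elsewhere: every element needs its own node object, and indexed access or appending at the end requires walking the list.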

Persistent Data Structures
• Extremely important in purely functional languages
• All collections are immutable
• Must have good runtime complexity
• No one seems to miss them in GS Collections

Persistent Data Structures
• Proposal: Mutable, Persistent Immutable, and Plain Immutable
• Mutable: Same as always
• Persistent: Use when you want structural sharing
• (Plain) Immutable: Use when you’re done mutating or when the data never mutates

Persistent Data Structures
Plain old immutable data structures
• Not persistent (no structural sharing)
• “Adding” is much slower: O(n)
• Speed benefits for everything else
• Huge memory benefits

Persistent Data Structures
• Immutable array-backed list
• Immutable: trimmed array
• No nodes: 1/6 memory
• Array backed: cache locality, should parallelize well
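To make the trade-off concrete, here is a hedged sketch of a plain (non-persistent) immutable list over a trimmed array. The class `TrimmedArrayList` and its methods are hypothetical names for illustration, not the GS Collections API. "Adding" copies the whole array, which is the O(n) cost mentioned above, while reads are plain array accesses with no node objects.

```java
import java.util.Arrays;

// Hypothetical plain immutable list backed by a trimmed array (illustration only).
final class TrimmedArrayList<T> {
    private final Object[] items; // exactly size() slots, no spare capacity

    private TrimmedArrayList(Object[] items) {
        this.items = items;
    }

    @SafeVarargs
    static <T> TrimmedArrayList<T> of(T... elements) {
        // Defensive copy into an Object[] so later changes to the caller's array are not visible.
        return new TrimmedArrayList<>(Arrays.copyOf(elements, elements.length, Object[].class));
    }

    int size() {
        return items.length;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        return (T) items[index];
    }

    // O(n): copies every element into a new array one slot longer; nothing is shared.
    TrimmedArrayList<T> with(T element) {
        Object[] copy = Arrays.copyOf(items, items.length + 1);
        copy[items.length] = element;
        return new TrimmedArrayList<>(copy);
    }
}
```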

Measured with Java 8 64-bit and compressed oops

[Chart: Lists: Size in MB by number of elements. X axis: Size (num elements). Series: com.gs.collections.api.list.ImmutableList, scala.collection.immutable.List, scala.collection.mutable.ArrayBuffer.]

Persistent Data Structures
Performance assumptions:
• Iterating through an array should be faster than iterating through a linked list
• Linked lists won’t parallelize well with .par
• No surprises in the results – so we’ll skip
github.com/goldmansachs/gs-collections/tree/master/jmh-tests
github.com/goldmansachs/gs-collections/tree/master/memory-tests

Persistent Data Structures
• Immutable array-backed sorted set
• Immutable: trimmed, sorted array
• No nodes: ~1/8 memory
• Array backed: cache locality

[Chart: Sorted Sets: Size in MB by number of elements. X axis: Size (num elements). Series: com.gs.collections.api.set.sorted.ImmutableSortedSet, com.gs.collections.impl.set.sorted.mutable.TreeSortedSet, scala.collection.immutable.TreeSet, scala.collection.mutable.TreeSet.]

Persistent Data Structures
• Assumptions about contains:
• May be faster when it’s a binary search in an array (good cache locality)
• Will be about the same in mutable / immutable tree (all nodes have same structure)
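The array-backed `contains` can be sketched with the JDK's `Arrays.binarySearch`. This is a simplified illustration over primitive ints (the class `SortedIntArraySet` is a hypothetical name; the GS Collections sorted set holds objects and a comparator), but the lookup has the same shape: O(log n) probes into one contiguous array rather than pointer hops between tree nodes.

```java
import java.util.Arrays;

// Hypothetical immutable sorted set of ints stored in a trimmed, sorted array.
final class SortedIntArraySet {
    private final int[] sorted;

    SortedIntArraySet(int... values) {
        // Sort and drop duplicates once, at construction time.
        this.sorted = Arrays.stream(values).sorted().distinct().toArray();
    }

    // Binary search over the sorted array; a non-negative result means the value was found.
    boolean contains(int value) {
        return Arrays.binarySearch(sorted, value) >= 0;
    }

    int size() {
        return sorted.length;
    }
}
```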

[Chart: Sorted Set contains() throughput (higher is better). Categories: Immutable, Mutable. Series: GSC, Scala.]

Iteration

Persistent Data Structures

    val scalaImmutable: immutable.TreeSet[Int] =
      immutable.TreeSet.empty[Int] ++ (0 to SIZE)

    def serial_immutable_scala(): Unit =
    {
      val count: Int = this.scalaImmutable
        .view
        .filter(each => each % 10000 != 0)
        .map(String.valueOf)
        .map(Integer.valueOf)
        .count(each => (each + 1) % 10000 != 0)
      if (count != 999800)
      {
        throw new AssertionError
      }
    }

Lazy evaluation so we don’t create intermediate collections.
Terminating in .toList or .toBuffer would be dominated by collection creation.
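For comparison, the same lazy pipeline can be written with plain JDK streams, which also pass each element through every stage without building intermediate collections. This is a sketch, not one of the benchmarks from the talk: the class name is hypothetical, and `SIZE = 1_000_000` is an assumption chosen because it is consistent with the expected count of 999,800 (101 multiples of 10,000 are dropped by the first filter, then 100 more values by the second).

```java
import java.util.stream.IntStream;

// Hypothetical JDK-streams version of the serial lazy pipeline.
final class LazyPipelineSketch {
    static final int SIZE = 1_000_000; // assumption, see the note above

    static long count() {
        return IntStream.rangeClosed(0, SIZE)
                .boxed()
                .filter(each -> each % 10_000 != 0)
                .map(String::valueOf)
                .map(Integer::valueOf)
                .filter(each -> (each + 1) % 10_000 != 0)
                .count(); // terminal operation: nothing runs until this call
    }
}
```

Ending in `count()` rather than collecting into a list keeps the measurement focused on iteration instead of on building the result collection.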

Persistent Data Structures

    private final ImmutableSortedSet<Integer> gscImmutable =
            SortedSets.immutable.withAll(Interval.zeroTo(SIZE));

    @Benchmark
    public void serial_immutable_gsc()
    {
        int count = this.gscImmutable
                .asLazy()
                .select(each -> each % 10_000 != 0)
                .collect(String::valueOf)
                .collect(Integer::valueOf)
                .count(each -> (each + 1) % 10_000 != 0);
        if (count != 999_800)
        {
            throw new AssertionError();
        }
    }

Persistent Data Structures
• Assumption: Iterating over an array is somewhat faster

[Chart: Sorted Set Iteration throughput (higher is better). Categories: Serial / Immutable, Serial / Mutable. Series: GSC, Scala.]

Persistent Data Structures
Next up, parallel-lazy evaluation

    this.scalaImmutable.view       becomes   this.scalaImmutable.par
    this.gscImmutable.asLazy()     becomes   this.gscImmutable.asParallel(this.executorService, BATCH_SIZE)

Persistent Data Structures
• Assumption: Scala’s tree should parallelize moderately well
• Assumption: GSC’s array should parallelize very well

Measured on an 8 core Linux VM, Intel Xeon E5-2697 v2
[Chart: Categories: Serial / Immutable, Serial / Mutable, Parallel / Immutable. Series: GSC, Scala. Annotation: 8x.]

Persistent Data Structures
• Scala’s immutable.TreeSet doesn’t override .par, so parallel is slower than serial
• Some tree operations (like filter) are hard to parallelize
• TreeSet.count() should be easy to parallelize using fork/join with some work
• New assumption: Mutable trees won’t parallelize well either

Measured on an 8 core Linux VM, Intel Xeon E5-2697 v2
[Chart: Categories: Serial / Immutable, Serial / Mutable, Parallel / Immutable, Parallel / Mutable. Series: GSC, Scala. Annotation: 8x.]

Persistent Data Structures
• GS Collections follow up:
  – TreeSortedSet delegates to java.util.TreeSet. Write our own tree and override .asParallel()
• Scala follow up:
  – Override .par

Persistent Data Structures
• Proposal: Mutable, Persistent, and Immutable
• All three are useful in different cases
• How common is it to share structure, really?

Hash Tables

Hash Tables
• Scala’s immutable.HashMap is a hash array mapped trie (persistent structure)
• Wikipedia: “achieves almost hash table-like speed while using memory much more economically”
• Let’s see

Hash Tables
• Scala’s mutable.HashMap is backed by an array of Entry pairs
• java.util.HashMap.Entry caches the hashcode
• UnifiedMap<K, V> is backed by Object[], flattened, alternating K and V
• ImmutableUnifiedMap is backed by UnifiedMap

Hash Tables
[Chart: Maps: Size in MB by number of elements. X axis: Size (num elements). Series: com.gs.collections.api.map.ImmutableMap, com.gs.collections.impl.map.mutable.UnifiedMap, java.util.HashMap, scala.collection.immutable.HashMap, scala.collection.mutable.HashMap.]

Hash Tables

    @Benchmark
    public void get()
    {
        int localSize = this.size;
        String[] localElements = this.elements;
        Map<String, String> localScalaMap = this.scalaMap;

        for (int i = 0; i < localSize; i++)
        {
            if (!localScalaMap.get(localElements[i]).isDefined())
            {
                throw new AssertionError(i);
            }
        }
    }

Hash Tables
[Chart: Map get() throughput by map size (higher is better). X axis: Size (num elements). Series: GscImmutableMapGetTest.get, GscMutableMapGetTest.get, ScalaImmutableMapGetTest.get, ScalaMutableMapGetTest.get.]

Hash Tables
[Chart: Map put() throughput by map size (higher is better). X axis: Size (num elements). Series: com.gs.collections.impl.map.mutable.UnifiedMap, scala.collection.immutable.Map, scala.collection.mutable.HashMap.]

Hash Tables

    @Benchmark
    public mutable.HashMap<String, String> mutableScalaPut()
    {
        int localSize = this.size;
        String[] localElements = this.elements;

        mutable.HashMap<String, String> map = new PresizableHashMap<>(localSize);

        for (int i = 0; i < localSize; i++)
        {
            map.put(localElements[i], "dummy");
        }
        return map;
    }

Hash Tables
HashMap ought to have a constructor for pre-sizing

    class PresizableHashMap[K, V](val _initialSize: Int)
      extends scala.collection.mutable.HashMap[K, V]
    {
      private def initialCapacity =
        if (_initialSize == 0) 1
        else smallestPowerOfTwoGreaterThan(
          (_initialSize.toLong * 1000 / _loadFactor).asInstanceOf[Int])

      private def smallestPowerOfTwoGreaterThan(n: Int): Int =
        if (n > 1) Integer.highestOneBit(n - 1) << 1 else 1

      table = new Array(initialCapacity)
      threshold = ((initialCapacity.toLong * _loadFactor) / 1000).toInt
    }
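The pre-sizing arithmetic is easier to follow in isolation. The sketch below restates it in Java with hypothetical names (`PresizeSketch`, `loadFactorPerMille`); it assumes the load factor is an integer in thousandths, as the division by 1000 in the Scala code suggests, and it does not touch any real map implementation.

```java
// Hypothetical restatement of the pre-sizing arithmetic (illustration only).
final class PresizeSketch {
    // Smallest power of two that is >= n, for 1 <= n <= 2^30 (larger n would overflow).
    static int powerOfTwoAtLeast(int n) {
        return n > 1 ? Integer.highestOneBit(n - 1) << 1 : 1;
    }

    // Table length needed to hold expectedSize entries without a resize.
    static int initialCapacity(int expectedSize, int loadFactorPerMille) {
        if (expectedSize == 0) {
            return 1;
        }
        return powerOfTwoAtLeast((int) (expectedSize * 1000L / loadFactorPerMille));
    }

    // Number of entries at which a table of the given length would resize.
    static int threshold(int capacity, int loadFactorPerMille) {
        return (int) (capacity * (long) loadFactorPerMille / 1000);
    }
}
```

For example, with a load factor of 750 per mille, 1,000 expected entries need 1,333 slots, which rounds up to a table of 2,048 with a resize threshold of 1,536, so inserting the 1,000 entries never triggers a rehash.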

Hash Tables
• scala.collection.mutable.HashSet is backed by an array using open-addressing
• java.util.HashSet is implemented by delegating to a HashMap.
• UnifiedSet is backed by Object[], either elements or arrays of collisions
• ImmutableUnifiedSet is backed by UnifiedSet

Hash Tables
[Chart: Sets: Size in MB by number of elements. X axis: Size (num elements). Series: com.gs.collections.api.set.ImmutableSet, com.gs.collections.impl.set.mutable.UnifiedSet, java.util.HashSet, scala.collection.immutable.HashSet, scala.collection.mutable.HashSet.]

Primitive Specialization

Primitive Specialization
• Boxing is expensive
  – Reference + Header + alignment
• Scala has specialization, but none of the collections are specialized
• If you cannot afford wrappers you can:
  – Use primitive arrays (filter, map, etc. will auto-box)
  – Use a Java Collections library

Primitive Specialization
[Chart: Sets: Size in MB by number of elements. X axis: Size (num elements). Series: com.gs.collections.impl.set.mutable.primitive.IntHashSet, com.gs.collections.impl.set.mutable.UnifiedSet, scala.collection.mutable.HashSet.]

Primitive Specialization
• Proposal: Primitive lists, sets, and maps in Scala
• Not Traversable – outside the collections hierarchy
• Fix Specialization on Functions (lambdas)
  – Want for-comprehensions to work well
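As a rough picture of what a primitive collection buys, here is a minimal growable list of ints. The class `SketchIntList` is a hypothetical illustration, not the GS Collections primitive list: values live directly in an `int[]`, so there is no wrapper object, reference, or header per element.

```java
import java.util.Arrays;

// Hypothetical minimal primitive list: ints stored unboxed in a growable array.
final class SketchIntList {
    private int[] items = new int[8];
    private int size;

    void add(int value) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2); // amortized O(1) growth
        }
        items[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return items[index];
    }

    int size() {
        return size;
    }

    // Iterates over raw ints; no boxing happens anywhere in this loop.
    long sum() {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += items[i];
        }
        return total;
    }
}
```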

Fork/Join

Parallel / Lazy / Scala

    val list = this.integers
      .par
      .filter(each => each % 10000 != 0)
      .map(String.valueOf)
      .map(Integer.valueOf)
      .filter(each => (each + 1) % 10000 != 0)
      .toBuffer

    Assert.assertEquals(999800, list.size)

Parallel / Lazy / GSC

    MutableList<Integer> list = this.integersGSC
            .asParallel(this.executorService, BATCH_SIZE)
            .select(each -> each % 10_000 != 0)
            .collect(String::valueOf)
            .collect(Integer::valueOf)
            .select(each -> (each + 1) % 10_000 != 0)
            .toList();

    Verify.assertSize(999_800, list);

Parallel / Lazy / JDK

    List<Integer> list = this.integersJDK
            .parallelStream()
            .filter(each -> each % 10_000 != 0)
            .map(String::valueOf)
            .map(Integer::valueOf)
            .filter(each -> (each + 1) % 10_000 != 0)
            .collect(Collectors.toList());

[Chart: Stacked computation ops/s (higher is better). Categories: Serial Lazy, Parallel Lazy, Parallel hand-coded. Series: Java 8, GS Collections, Scala. Annotation: 8x.]

Fork-Join Merge
• Intermediate results are merged in a tree
• Merging is O(n log n) work and garbage
• Amount of work done by last thread is O(n)

Parallel / Lazy / GSC
In the GSC pipeline above, ParallelIterable.toList() returns a CompositeFastList, a List with an O(1) implementation of addAll()

Parallel / Lazy / GSC

    public final class CompositeFastList<E>
    {
        private final FastList<FastList<E>> lists = FastList.newList();

        public boolean addAll(Collection<? extends E> collection)
        {
            FastList<E> collectionToAdd = collection instanceof FastList
                    ? (FastList<E>) collection
                    : new FastList<E>(collection);
            this.lists.add(collectionToAdd);
            return true;
        }
        ...

CompositeFastList Merge
• Merging is O(1) work per batch
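The same idea can be sketched with only JDK types, to show where the cost moves. This is an assumption-laden illustration (the class `CompositeListSketch` and its methods are hypothetical, and it is not the GS Collections code): merging a batch stores one reference, while indexed reads have to walk the list of batches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical composite list: merges batches by reference instead of copying elements.
final class CompositeListSketch<E> {
    private final List<List<E>> batches = new ArrayList<>();
    private int size;

    // Amortized O(1) per batch: no element of the batch is copied.
    void addBatch(List<E> batch) {
        batches.add(batch);
        size += batch.size();
    }

    int size() {
        return size;
    }

    // O(number of batches): the price paid for the cheap merge.
    E get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        int remaining = index;
        for (List<E> batch : batches) {
            if (remaining < batch.size()) {
                return batch.get(remaining);
            }
            remaining -= batch.size();
        }
        throw new AssertionError("unreachable unless a batch was modified after being added");
    }
}
```

The sketch assumes each worker hands over its batch and never touches it again; a batch that changes after `addBatch` would make the cached size wrong.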

Fork/Join
• Fork-join is general purpose but always requires merge work
• We can get better performance through specialized data structures meant for combining

Summary
GS Collections and Scala’s Collections are different
• Persistent data structures
• Hash tables
• Primitive specialization
• Fork-join

More Information
@motlin
@GoldmanSachs
stackoverflow.com/questions/tagged/gs-collections
github.com/goldmansachs/gs-collections/tree/master/jmh-tests
github.com/goldmansachs/gs-collections/tree/master/memory-tests
github.com/goldmansachs/gs-collections-kata
infoq.com/presentations/java-streams-scala-parallel-collections

Learn more at GS.com/Engineering

© 2015 Goldman Sachs. This presentation should not be relied upon or considered investment advice. Goldman Sachs does not warrant or guarantee to anyone the accuracy, completeness or efficacy of this presentation, and recipients should not rely on it except at their own risk. This presentation may not be forwarded or disclosed except with this disclaimer intact.
