Concurrent data processing with ParallelStreams - David Gomez

ParallelStreams: Concurrent data processing in Java 8. David Gómez G. @dgomezg


Do you remember?

for (int i = 0; i < 100; i++) {
    long start = System.currentTimeMillis();
    List<Integer> even = numbers.parallelStream()
            .filter(n -> n % 2 == 0)
            .sorted()
            .collect(toList());
    System.out.printf(
            "%d elements computed in %5d msecs with %d threads\n",
            even.size(), System.currentTimeMillis() - start,
            Thread.activeCount());
}

4999299 elements computed in 225 msecs with 9 threads
4999299 elements computed in 230 msecs with 9 threads
4999299 elements computed in 250 msecs with 9 threads

Previously on…

Streams? What’s that?

A Stream is…
A convenient way to iterate over collections declaratively

List<Integer> numbers = new ArrayList<Integer>();
for (int i = 0; i < 100; i++) {
    numbers.add(i);
}

List<Integer> evenNumbers = numbers.stream()
        .filter(n -> n % 2 == 0)
        .collect(toList());

Anatomy of a Stream

Source
Intermediate operations: filter, map, order, function
Final operation

Together these stages form the pipeline.
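As a minimal sketch of the three parts named above, the comments below mark the source, the intermediate operations and the final operation of one pipeline. The class name StreamAnatomy, the method name and the sample data are illustrative assumptions, not taken from the talk.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamAnatomy {
    // Hypothetical helper: keeps short names, upper-cases and sorts them.
    static List<String> shortNamesUpperCased(List<String> names) {
        return names.stream()                    // source
                .filter(n -> n.length() <= 4)    // intermediate: filter
                .map(String::toUpperCase)        // intermediate: map
                .sorted()                        // intermediate: order
                .collect(Collectors.toList());   // final (terminal) operation
    }

    public static void main(String[] args) {
        System.out.println(shortNamesUpperCased(
                Arrays.asList("duke", "java", "stream", "fork", "join")));
    }
}
```

Nothing flows through the pipeline until the final operation is called.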

Iterating a Stream

List<Integer> evenNumbers = numbers.stream()
        .filter(n -> n % 2 == 0)
        .collect(toList());

Internal iteration
- No manual Iterator handling
- Concise
- Fluent API: chained sequence processing

Elements computed only when needed
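"Computed only when needed" can be observed with a short-circuiting final operation. The sketch below counts how many elements the filter actually inspects before findFirst() is satisfied on a sequential stream. LazyStreamDemo and elementsInspected are hypothetical names chosen for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class LazyStreamDemo {
    /** Returns how many elements the filter looked at before findFirst() stopped the stream. */
    static int elementsInspected(List<Integer> numbers) {
        AtomicInteger inspected = new AtomicInteger();
        numbers.stream()
                .filter(n -> {
                    inspected.incrementAndGet(); // counting only, for demonstration
                    return n % 2 == 0;
                })
                .findFirst();                    // short-circuits at the first even number
        return inspected.get();
    }

    public static void main(String[] args) {
        // The first even number is the third element, so the rest is never inspected.
        System.out.println(elementsInspected(Arrays.asList(1, 3, 4, 5, 6, 8)));
    }
}
```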

Iterating a Stream

List<Integer> evenNumbers = numbers.parallelStream()
        .filter(n -> n % 2 == 0)
        .collect(toList());

Easy parallelism
- Concurrency is hard to get right!
- Uses ForkJoin
- Process steps should be
  - stateless
  - independent
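To illustrate what "stateless" and "independent" mean in practice, here is a small sketch. The safe variant lets collect() merge the partial results; the unsafe variant appears only as a comment because its outcome is unpredictable. StatelessStepsDemo is a hypothetical name.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class StatelessStepsDemo {
    // Stateless and independent: each element is tested on its own, and
    // collect() combines partial results without any shared mutable state.
    static List<Integer> evens(List<Integer> numbers) {
        return numbers.parallelStream()
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());
    }

    // Anti-pattern (do not run): several threads write to a shared ArrayList,
    // which is not thread-safe, so elements may be lost or an exception thrown.
    //   List<Integer> shared = new ArrayList<>();
    //   numbers.parallelStream().filter(n -> n % 2 == 0).forEach(shared::add);

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.range(0, 1_000).boxed().collect(Collectors.toList());
        System.out.println(evens(numbers).size());
    }
}
```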

Parallel Streams

List<Integer> numbers = new ArrayList<>();
for (int i = 0; i < 10_000_000; i++) {
    numbers.add((int) Math.round(Math.random() * 100));
}

use stream()

//This will use just a single thread
Stream<Integer> evenNumbers = numbers.stream();

or parallelStream()

//The number of threads is selected automatically (by default, based on the available processors)
Stream<Integer> evenNumbers = numbers.parallelStream();

Let’s test it

use stream()

for (int i = 0; i < 100; i++) {
    long start = System.currentTimeMillis();
    List<Integer> even = numbers.stream()
            .filter(n -> n % 2 == 0)
            .sorted()
            .collect(toList());
    System.out.printf(
            "%d elements computed in %5d msecs with %d threads\n",
            even.size(), System.currentTimeMillis() - start,
            Thread.activeCount());
}

5001983 elements computed in 828 msecs with 2 threads
5001983 elements computed in 843 msecs with 2 threads
5001983 elements computed in 675 msecs with 2 threads
5001983 elements computed in 795 msecs with 2 threads

Going parallel

use parallelStream()

for (int i = 0; i < 100; i++) {
    long start = System.currentTimeMillis();
    List<Integer> even = numbers.parallelStream()
            .filter(n -> n % 2 == 0)
            .sorted()
            .collect(toList());
    System.out.printf(
            "%d elements computed in %5d msecs with %d threads\n",
            even.size(), System.currentTimeMillis() - start,
            Thread.activeCount());
}

4999299 elements computed in 225 msecs with 9 threads
4999299 elements computed in 230 msecs with 9 threads
4999299 elements computed in 250 msecs with 9 threads

Previously on…

http://www.slideshare.net/dgomezg/streams-en-java-8

Parallelism Under the hood

Fork/Join Framework

Proposed by Doug Lea

"a style of parallel programming in which problems are solved by (recursively) splitting them into subtasks that are solved in parallel."

Available since Java 7

Used by ParallelStreams

The F/J algorithm

Result solve(Problem problem) {
    if (problem is small)
        directly solve problem
    else {
        split problem into independent parts
        fork new subtasks to solve each part
        join all subtasks
        compose result from subresults
    }
}

as proposed by Doug Lea
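The pseudocode above maps fairly directly onto a RecursiveTask. The following is only a sketch under assumptions: the SumTask class, the array-sum problem and the THRESHOLD value are made up for illustration and would need tuning for real workloads.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.IntStream;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // assumed "small enough" size
    private final int[] data;
    private final int from;
    private final int to;

    SumTask(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {            // problem is small: solve directly
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;             // split into independent parts
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                             // fork one subtask
        long rightSum = right.compute();         // solve the other in this thread
        return left.join() + rightSum;           // join and compose the result
    }

    static long sum(int[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        System.out.println(sum(IntStream.rangeClosed(1, 100_000).toArray()));
    }
}
```

Computing one half in the current thread instead of forking both halves is a common idiom that keeps the current worker busy.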

ForkJoinPool

ExecutorService implementation that
• has a defined number of Workers (threads)
• executes ForkJoinTasks
  • submitted by execute(ForkJoinTask task)
  • or by invoke(ForkJoinTask task)

ForkJoinTask

Abstract class that represents a task to be run concurrently

Every ForkJoinTask can be split (if not small enough) and solved recursively

Two commonly used subclasses
• RecursiveAction if not returning a value
• RecursiveTask if returning a value
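For the case with no return value, a RecursiveAction can be sketched as follows. The DoubleInPlace class, its threshold and the pool size of 2 are illustrative assumptions. The task is handed to the pool with invoke(), which blocks until it completes.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class DoubleInPlace extends RecursiveAction {
    private static final int THRESHOLD = 1_000; // assumed "small enough" size
    private final int[] data;
    private final int from;
    private final int to;

    DoubleInPlace(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            for (int i = from; i < to; i++) {
                data[i] *= 2;                    // small range: work directly
            }
            return;
        }
        int mid = (from + to) >>> 1;
        // Fork both halves and wait for them; each half touches a disjoint range.
        invokeAll(new DoubleInPlace(data, from, mid), new DoubleInPlace(data, mid, to));
    }

    static void run(int[] data) {
        ForkJoinPool pool = new ForkJoinPool(2); // a pool with a defined number of workers
        try {
            pool.invoke(new DoubleInPlace(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        run(data);
        System.out.println(Arrays.toString(data));
    }
}
```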

ForkJoinWorkerThread

Any of the threads created by the ForkJoinPool

Executes ForkJoinTasks

Each one has its own deque of tasks (which allows task stealing)

ForkJoinWorkerThread

Runs the F/J algorithm (the solve(Problem) recursion shown earlier), plus task stealing.

Fork/Join. When to use?

• For computations that can be split into smaller tasks (aka ‘divide and conquer’ algorithms)
• Independent tasks
• Reduction with no contention

ParallelStreams in action!

ParallelStreams

for (int i = 0; i < 100; i++) {
    long start = System.currentTimeMillis();
    List<Integer> even = numbers.parallelStream()
            .filter(n -> n % 2 == 0)
            .sorted()
            .collect(toList());
    System.out.printf(
            "%d elements computed in %5d msecs with %d threads\n",
            even.size(), System.currentTimeMillis() - start,
            Thread.activeCount());
}

4999299 elements computed in 225 msecs with 9 threads
4999299 elements computed in 230 msecs with 9 threads
4999299 elements computed in 250 msecs with 9 threads

Thread.activeCount not accurate

Thread.activeCount() only returns an estimate of the live threads in the current thread group; it does not show the effective number of threads processing the stream

Better count threads involved

Set<String> workerThreadNames = ConcurrentHashMap.newKeySet();

for (int i = 0; i < 100; i++) {
    long start = System.currentTimeMillis();
    List<Integer> even = numbers.parallelStream()
            .filter(n -> n % 2 == 0)
            .peek(n -> workerThreadNames.add(
                    Thread.currentThread().getName()))
            .sorted()
            .collect(toList());
    System.out.printf(
            "%d elements computed in %5d msecs with %d threads\n",
            even.size(), System.currentTimeMillis() - start,
            workerThreadNames.size());
}

Threads usage

ParallelStreams use the common ForkJoinPool

Number of worker threads configured with -Djava.util.concurrent.ForkJoinPool.common.parallelism=n

Useful to keep CPU parallelism under control…

…but …

Limiting parallelism

The same thread-counting loop, run with

-Djava.util.concurrent.ForkJoinPool.common.parallelism=4

5001069 elements computed in 269 msecs with 5 threads

WTF

Limiting parallelism

for (int i = 0; i < 100; i++) {
    long start = System.currentTimeMillis();
    List<Integer> even = numbers.parallelStream()
            .filter(n -> n % 2 == 0)
            .peek(n -> workerThreadNames.add(
                    Thread.currentThread().getName()))
            .sorted()
            .collect(toList());
    System.out.printf(
            "%d elements computed in %5d msecs with %d threads\n",
            even.size(), System.currentTimeMillis() - start,
            workerThreadNames.size());
}
System.out.println("credits to threads: " + workerThreadNames);

5001069 elements computed in 269 msecs with 5 threads
credits to threads: ForkJoinPool.commonPool-worker-0, ForkJoinPool.commonPool-worker-1, ForkJoinPool.commonPool-worker-2, ForkJoinPool.commonPool-worker-3, main

WTF

Threads Involved in ParallelStream

ParallelStreams use the common ForkJoinPool

The thread invoking the ParallelStream is also used as a worker

Caveats:
• ParallelStream processing is synchronous for the invoking thread
• Other threads using the common ForkJoinPool could be affected
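The "one more thread than configured" effect can be checked with a self-contained sketch like the one below. It records the name of every thread that touches an element and prints it next to the common pool's parallelism. CommonPoolThreads and threadsUsed are hypothetical names, and the exact set of threads printed will vary between runs and machines.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CommonPoolThreads {
    /** Names of all threads that processed at least one element of the stream. */
    static Set<String> threadsUsed(List<Integer> numbers) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        numbers.parallelStream()
                .filter(n -> n % 2 == 0)
                .peek(n -> names.add(Thread.currentThread().getName()))
                .sorted()
                .collect(Collectors.toList());
        return names;
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.range(0, 1_000_000).boxed()
                .collect(Collectors.toList());
        // Reflects -Djava.util.concurrent.ForkJoinPool.common.parallelism if it was set.
        System.out.println("common pool parallelism: "
                + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("threads used: " + threadsUsed(numbers));
    }
}
```

When run from main, the printed set should include the calling thread alongside the commonPool workers.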

ParallelStream Hack

ParallelStream can be forced to use a custom ForkJoinPool

ForkJoinPool forkJoinPool = new ForkJoinPool(4);
long start = System.currentTimeMillis();

numbers.parallelStream()
        .filter(n -> n % 2 == 0)
        .sorted()
        .collect(toList());

ParallelStream Hack

ParallelStream can be forced to use a custom ForkJoinPool

ForkJoinPool forkJoinPool = new ForkJoinPool(4);
long start = System.currentTimeMillis();
ForkJoinTask<List<Integer>> task = forkJoinPool.submit(() -> {
    return numbers.parallelStream()
            .filter(n -> n % 2 == 0)
            .sorted()
            .collect(toList());
});
List<Integer> even = task.get();

Task submitted in 1 msecs
5000805 elements computed in 328 msecs with 4 threads
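A runnable version of the hack could look like this. It is a sketch under assumptions: CustomPoolStream and sortedEvens are made-up names, and the trick relies on how the stream implementation forks its subtasks when the terminal operation runs inside a ForkJoinPool worker, which, as far as I know, is observed behaviour rather than a guarantee of the Stream API. The pool is shut down afterwards so its threads do not linger.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

class CustomPoolStream {
    static List<Integer> sortedEvens(List<Integer> numbers, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        // The terminal operation is started from inside the custom pool.
        Callable<List<Integer>> work = () -> numbers.parallelStream()
                .filter(n -> n % 2 == 0)
                .sorted()
                .collect(Collectors.toList());
        try {
            return pool.submit(work).get();      // blocks the caller until done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sortedEvens(Arrays.asList(5, 4, 3, 2, 1, 0), 4));
    }
}
```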

ParallelStream Hack benefits

A custom ExecutorService
• Does not affect other ParallelStreams
• Does not affect Common ForkJoinPool users
• Reduces unpredictable latency due to other Common ForkJoinPool load
• Invoking thread not used as worker (async parallel process)

Problems derived from Common ForkJoinPool

Blocking for IO

If the first URLs get stuck on a connection timeout, overall performance can be affected

Stream<String> urls = Files.lines(Paths.get("urlsToCheck.txt"));
List<String> errors = urls.parallel().filter(url -> {
    //Connect to URL and wait for 200 response or timeout
    return true;
}).collect(toList());

Nested parallelStreams

Outer parallelStream could exhaust ForkJoin Workers:

long start = System.currentTimeMillis();
IntStream.range(0, 10_000).parallel()
        .forEach(i -> {
            results[i][0] = (int) Math.round(Math.random() * 100);
            IntStream.range(1, 9_999)
                    .parallel().forEach((int j) ->
                            results[i][j] = (int) Math.round(Math.random() * 1000));
        });

Process finalized in 22974 msecs
Process finalized in 22575 msecs
Process finalized in 22606 msecs

Nested parallelStreams

Outer parallelStream could exhaust ForkJoin Workers. The same code with a sequential inner stream:

long start = System.currentTimeMillis();
IntStream.range(0, 10_000).parallel()
        .forEach(i -> {
            results[i][0] = (int) Math.round(Math.random() * 100);
            IntStream.range(1, 9_999)
                    .sequential().forEach((int j) ->
                            results[i][j] = (int) Math.round(Math.random() * 1000));
        });

Process finalized in 12491 msecs
Process finalized in 12589 msecs
Process finalized in 12798 msecs
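To reproduce the comparison on your own machine, a self-contained sketch might look like the following. It swaps the random values for a deterministic value(i, j) function (an assumption made here so both variants can be checked against each other) and uses a smaller matrix. NestedStreams, fill and value are hypothetical names, and timings will differ from those on the slides.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class NestedStreams {
    // Deterministic stand-in for the random values used in the slides.
    static int value(int i, int j) {
        return (i * 31 + j) % 1000;
    }

    // Fills a size x size matrix; the outer stream is always parallel,
    // the inner one is parallel or sequential depending on the flag.
    static int[][] fill(int size, boolean innerParallel) {
        int[][] results = new int[size][size];
        IntStream.range(0, size).parallel().forEach(i -> {
            IntStream inner = IntStream.range(0, size);
            (innerParallel ? inner.parallel() : inner.sequential())
                    .forEach(j -> results[i][j] = value(i, j));
        });
        return results;
    }

    public static void main(String[] args) {
        for (boolean innerParallel : new boolean[] {true, false}) {
            long start = System.currentTimeMillis();
            int[][] results = fill(2_000, innerParallel);
            System.out.printf("inner parallel=%b: %d msecs (rows=%d)%n",
                    innerParallel, System.currentTimeMillis() - start, results.length);
        }
        System.out.println(Arrays.deepEquals(fill(100, true), fill(100, false)));
    }
}
```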

Other performance problems

Too much Auto(un)boxing

Unboxing of Integers in every filter call

List<Integer> even = numbers.parallelStream()
        .filter(n -> n % 2 == 0)
        .sorted()
        .collect(toList());

4999464 elements computed in 290 msecs with 8 threads
4999464 elements computed in 276 msecs with 8 threads
4999464 elements computed in 257 msecs with 8 threads
4999464 elements computed in 265 msecs with 8 threads

Less Auto(un)boxing

Unbox once with mapToInt, work on primitive ints, box once with boxed()

List<Integer> even = numbers.parallelStream()
        .mapToInt(n -> n)
        .filter(n -> n % 2 == 0)
        .sorted()
        .boxed()
        .collect(toList());

4999460 elements computed in 160 msecs with 8 threads
4999460 elements computed in 243 msecs with 8 threads
4999460 elements computed in 144 msecs with 8 threads
4999460 elements computed in 140 msecs with 8 threads
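Both variants can be put side by side in a small harness to confirm they produce the same result before comparing timings. This is only a rough sketch: BoxingDemo and its method names are assumptions, and a single System.currentTimeMillis() measurement is not a reliable benchmark.

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

class BoxingDemo {
    // Works on Integer objects throughout the pipeline.
    static List<Integer> withBoxing(List<Integer> numbers) {
        return numbers.parallelStream()
                .filter(n -> n % 2 == 0)
                .sorted()
                .collect(Collectors.toList());
    }

    // Unboxes once, filters and sorts primitive ints, boxes once at the end.
    static List<Integer> withMapToInt(List<Integer> numbers) {
        return numbers.parallelStream()
                .mapToInt(Integer::intValue)
                .filter(n -> n % 2 == 0)
                .sorted()
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = new Random(42).ints(2_000_000, 0, 101).boxed()
                .collect(Collectors.toList());
        long start = System.currentTimeMillis();
        int boxedSize = withBoxing(numbers).size();
        long middle = System.currentTimeMillis();
        int primitiveSize = withMapToInt(numbers).size();
        long end = System.currentTimeMillis();
        System.out.printf("boxed: %d elements in %d msecs%n", boxedSize, middle - start);
        System.out.printf("mapToInt: %d elements in %d msecs%n", primitiveSize, end - middle);
    }
}
```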

Conclusions

ParallelStreams ease concurrent processing, but:
• Understand how they work
• Don’t abuse the default common ForkJoinPool
• Don’t use them when blocking on IO
  • Or use a custom ForkJoinPool
• Avoid unnecessary autoboxing
• Don’t add contention or synchronisation
• Be careful with nested parallel streams
• Use method references when sorting