
Java Benchmarking
as easy as two timestamps

Aleksey Shipilёv
aleksey.shipilev@oracle.com, @shipilev

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Intro


Intro: Warming up...

«How much for instantiating a String?»

long time1 = System.nanoTime();
for (int i = 0; i < 1000; i++) {
    String s = new String("");
}
long time2 = System.nanoTime();
System.out.println("Time: " + (time2 - time1));
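A quick way to see why this is treacherous: run the same timed loop several times in one JVM and watch the numbers shift as the JIT warms up and, eventually, eliminates the dead allocation. A minimal sketch of my own (the run and loop counts are arbitrary choices, not from the slides):

public class NaiveStringBench {
    public static void main(String[] args) {
        for (int run = 0; run < 10; run++) {
            long time1 = System.nanoTime();
            for (int i = 0; i < 1000; i++) {
                String s = new String("");
            }
            long time2 = System.nanoTime();
            // Times typically drop across runs as the JIT compiles the loop,
            // and may collapse to near zero once the unused allocation
            // is optimized away entirely.
            System.out.println("Run " + run + ": " + (time2 - time1) + " ns");
        }
    }
}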


Theory


Theory: Why would people benchmark?

In the name of...
1. Holywar: Node.js – But Java... – Node.js!
2. Marketing: check we are meeting the (release) criteria
3. Engineering: isolate a performance phenomenon, make a reference point for improvements
4. Science: understand the performance model, and predict the future behavior

Theory: In the name of Holywar

My favorite example: the Computer Language Benchmarks Game¹

- Most comparisons are hardly fair: e.g. AOT vs. JIT
- Measures what, exactly? E.g. pidigits measures the speed of FFI to GNU GMP
- Lots of disclaimers that these results misrepresent the real world (alas, nobody reads them or cares enough)
- People love it, since it gives you numbers, which you can then take as your shield and sword in Internet debates

¹ http://benchmarksgame.alioth.debian.org/

Theory: In the name of Marketing

My favorite example: SPEC benchmarks

- Reference benchmark suites, agreed upon by the vendors
- Provide the reference points for which one can set the success criteria, use in adverts, tweet obnoxious competitive data, etc.
- It does not matter how representative they are – it matters that they are The Benchmarks Born at the Fiery Summit of Orodruin

Theory: In the name of Engineering

«If you can’t measure it, you can’t optimize it»

- Need the conditions where the system is running in a predictable state, so we are able to quantify improvements
- These benchmarks usually focus on particular pieces of the system, and have more resolution than «marketing» benchmarks

Theory: In the name of Science

«Science Town PD: To Explain and Predict»

- Derive a sound performance model from the results
- Use the performance model to predict future behavior: keep calm and deploy to production
- The most sweaty, and the most reliable, target for benchmarking


Theory: «Scientific» approach

Ultimate Question

How does a benchmark react to changing the external conditions?

Or: how far is the actual performance model from the mental one?

1. Fool-proofing: do these results even make any sense?
2. Negative control: the benchmark reacts to a change, but shouldn’t?
3. Positive control: the benchmark should react to a change, but doesn’t?
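To make the negative control concrete: introduce a parameter that should be irrelevant to the workload, and check that the score stays put when it changes. A minimal JMH sketch of that idea (the padding parameter is my illustration, not from the slides):

import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class ControlBench {
    // Negative control: this parameter should not affect the score.
    // If the score moves with it, the experimental setup is suspect.
    @Param({"1", "1000"})
    int irrelevantPadding;

    int v;

    @Benchmark
    public int work() {
        return v++;
    }
}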

Theory: «Engineering» approach

Ultimate Question

Why doesn’t my benchmark run faster?

Directly observe if our experimental setup is sane:
1. Where are the bottlenecks?
2. Do we expect those things to be bottlenecks?
3. Are these benchmarks running in the same mode?

Theory: JMH

JMH is a Serious Business:
http://openjdk.java.net/projects/code-tools/jmh/

- When used properly, helps to mitigate VM quirks
- Aids running lots of benchmarks in different conditions
- Internal profiling to quickly triage the issues
- JVM language support: Java, Scala, Groovy, Kotlin
- ...or anything else callable from Java (e.g. Nashorn, etc.)
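For reference, here is roughly what the String example from the intro looks like as a JMH benchmark. A minimal sketch, assuming the JMH annotations and runtime are on the classpath; it would be run from the generated benchmarks.jar, as with the perfasm example later in the deck:

import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class StringBench {
    @Benchmark
    public String newString() {
        // Returning the object hands it to the harness,
        // so the JIT cannot eliminate the allocation as dead code.
        return new String("");
    }
}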

Scientific


Scientific: Story

In this section, we explore some of the methodological implications of doing benchmarks. People tend to think this story is a deal-breaker when trying to build their own benchmark harnesses.

The complete story and narrative is here:
http://shipilev.net/blog/2014/nanotrusting-nanotime/

Models: Model Problem

«Jessie, it’s time to cook some benchmarks...»

«What is the cost of a volatile write?»

It seems like a very easy question... Let’s measure it! Shall we?

Models: Easy...

public class VolatileWrite {
    int v; volatile int vv;

    @Benchmark
    int baseline1() { return 42; }

    @Benchmark
    int incrPlain() { return v++; }

    @Benchmark
    int incrVolatile() { return vv++; }
}

Models: ...does it!

public class VolatileWrite {
    int v; volatile int vv;

    @Benchmark
    int baseline1() { return 42; }      // 2.0 ns

    @Benchmark
    int incrPlain() { return v++; }     // 3.5 ns

    @Benchmark
    int incrVolatile() { return vv++; } // 15.1 ns
}

Models: Fatal Flaw

volatile int vv;

@Benchmark
int incrVolatile() { return vv++; }

- We are measuring a very unfavorable case, when the benchmark is choked by volatiles. We are pushing the system to its «edge» condition. This almost never happens in production.
- What we really need to know is: «What is the volatile cost in realistic conditions?»

Models: Backoffs

@Param int tokens;

volatile int vv;

@Benchmark
int incrVolatile() {
    Blackhole.consumeCPU(tokens); // burn time
    return vv++;
}

- «Burn off» a few cycles before doing the heavy-weight op
- Juggle tokens ⇒ juggle the operation mix
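Assembled into a complete class, the backoff experiment might look like the sketch below; the parameter values swept here are my choice, not from the slides:

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class VolatileBackoff {
    @Param({"0", "1", "2", "4", "8", "16", "32"})
    int tokens;

    int v;
    volatile int vv;

    @Benchmark
    public int incrPlain() {
        Blackhole.consumeCPU(tokens); // burn roughly `tokens` worth of time
        return v++;
    }

    @Benchmark
    public int incrVolatile() {
        Blackhole.consumeCPU(tokens);
        return vv++;
    }
}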

Models: Backoffs

- Take a few baselines while we are at it: which one is correct?

@Benchmark
void baseline_Plain()
    { BH.consumeCPU(tokens); }

@Benchmark
int baseline_Return42()
    { BH.consumeCPU(tokens); return 42; }

@Benchmark
int baseline_ReturnPlain()
    { BH.consumeCPU(tokens); return v; }

Models: Measuring...

«Bender B. Rodriguez regrets using Excel to draw the charts»

[Chart: nsec/op (0–100) vs. backoff (0–30) for baseline_Plain, baseline_Return42, baseline_ReturnV, incrPlain, incrVolatile]

Models: Subtracting baseline_Plain

- Absolute volatile cost gets compensated very well!
- Can we really subtract the baselines?

[Chart: nsec/op (−5 to 15) vs. backoff (0–30) for baseline_Plain, incrPlain, incrVolatile]

Models: Subtracting baseline_Return42

- We added some code in the baseline, and it runs faster?
- Nothing surprising: performance is not usually composable

[Chart: nsec/op (−5 to 15) vs. backoff (0–30) for baseline_Return42, incrPlain, incrVolatile]

Models: WTF is different?

[Chart: nsec/op vs. backoff for baseline_Plain, incrPlain, incrVolatile]

@Benchmark
void base_Plain() {
    BH.consumeCPU(tkns);
}

[Chart: nsec/op vs. backoff for baseline_Return42, incrPlain, incrVolatile]

@Benchmark
int base_Ret42() {
    BH.consumeCPU(tkns);
    return 42;
}

Models: WTF is different?

[Chart: nsec/op vs. backoff for baseline_ReturnV, incrPlain, incrVolatile]

@Benchmark
int base_RetV() {
    BH.consumeCPU(tkns);
    return v;
}

[Chart: nsec/op vs. backoff for baseline_Return42, incrPlain, incrVolatile]

@Benchmark
int base_Ret42() {
    BH.consumeCPU(tkns);
    return 42;
}

Models: Bottom Line

- Different baselines act differently: they are tests themselves!
- Therefore, we can just compare plain and volatile:

[Chart: nsec/op (−5 to 15) vs. backoff (0–30) for incrPlain, incrVolatile]

Models: Conclusion

This is what models are for!

- Explore the system behavior outside the (randomly) chosen configuration points
- Allow predicting the system behavior under future conditions
- Catch experimental setup problems (control!)
- Combinatorial experiments help to create different operation mixes, and derive the individual op costs from their composite performance

Models: You Are Joking, Right?

«Combinatorial experiments help to create different operation mixes, and derive the individual op costs from their composite performance»

System.nanoTime! Measure each part individually!

Timers: Verifying infrastructure

Why not?

// call continuously
public long measure() {
    long startTime = System.nanoTime();
    work();
    return System.nanoTime() - startTime;
}

Timers: Measuring Latency

Latency = time to call System.nanoTime()

@Benchmark
public long latency_nanotime() {
    return System.nanoTime();
}

Timers: Measuring Granularity

Granularity = the minimum non-zero difference between two consecutive calls

private long lastValue;

@Benchmark
public long granularity_nanotime() {
    long cur;
    do {
        cur = System.nanoTime();
    } while (cur == lastValue);
    lastValue = cur;
    return cur;
}
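Outside JMH, the same two probes can be approximated with a plain main() to get a feel for the timer on a given machine. A quick-and-dirty sketch of my own, not a substitute for the harness (loop counts are arbitrary):

public class TimerProbe {
    public static void main(String[] args) {
        // Latency: average cost of a System.nanoTime() call.
        int calls = 1_000_000;
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink += System.nanoTime();
        }
        long end = System.nanoTime();
        System.out.println("latency ~ " + (end - start) / calls + " ns/call");

        // Granularity: minimum non-zero difference between consecutive readings.
        long min = Long.MAX_VALUE;
        long last = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            long cur = System.nanoTime();
            if (cur != last) {
                min = Math.min(min, cur - last);
                last = cur;
            }
        }
        // Printing the sink keeps the first loop from being dead code.
        System.out.println("granularity ~ " + min + " ns (sink=" + sink + ")");
    }
}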

Timers: Typical Case [Linux]

Java(TM) SE Runtime Environment, 1.7.0_45-b18
Java HotSpot(TM) 64-Bit Server VM, 24.45-b08
Linux, 3.13.8-1-ARCH, amd64

Running with 1 threads and [-client]:
  granularity_nanotime: 26.300 +- 0.205 ns
  latency_nanotime:     25.542 +- 0.024 ns

Running with 1 threads and [-server]:
  granularity_nanotime: 26.432 +- 0.191 ns
  latency_nanotime:     26.276 +- 0.538 ns

Timers: Typical Case [Solaris]

Java(TM) SE Runtime Environment , 1.8.0- b132Java HotSpot(TM) 64-Bit Server VM, 25.0-b70SunOS , 5.11, amd64

Running with 1 threads and [-client ]:granularity_nanotime: 29.322 +- 1.293 ns

latency_nanotime: 29.910 +- 1.626 ns

Running with 1 threads and [-server ]:granularity_nanotime: 28.990 +- 0.019 ns

latency_nanotime: 30.862 +- 6.622 ns

Slide 35/75. Copyright c○ 2014, Oracle and/or its affiliates. All rights reserved.

Timers: Typical Case [Windows]

Java(TM) SE Runtime Environment , 1.7.0_51 -b13Java HotSpot(TM) 64-Bit Server VM, 24.51-b03Windows 7, 6.1, amd64

Running with 1 threads and [-client ]:granularity_nanotime: 371,419 +- 1,541 ns

latency_nanotime: 14,415 +- 0,389 ns

Running with 1 threads and [-server ]:granularity_nanotime: 371,237 +- 1,239 ns

latency_nanotime: 14,326 +- 0,308 ns

Slide 36/75. Copyright c○ 2014, Oracle and/or its affiliates. All rights reserved.

Timers: Epic Case [Windows]

Java(TM) SE Runtime Environment , 1.8.0- b132Java HotSpot(TM) 64-Bit Server VM, 25.0-b70Windows Server 2008, 6.0, amd64

Running with 32 threads and [-client ]:granularity_nanotime: 15137.504 +- 97.132 ns

latency_nanotime: 15190.080 +- 1760.500 ns

Running with 32 threads and [-server ]:granularity_nanotime: 15118.159 +- 121.671 ns

latency_nanotime: 15176.690 +- 1504.406 ns

Slide 37/75. Copyright c○ 2014, Oracle and/or its affiliates. All rights reserved.

Timers: Model Experiment

- But if System.nanoTime() is heavy and potentially non-scaling, can we run the system into oblivion?
- Let’s figure out when it starts to Detroit:

@Param
int backoff;

@Benchmark
public long nanotime() {
    Blackhole.consumeCPU(backoff);
    return System.nanoTime();
}

Timers: Seems OK [Linux]

[Chart: System.nanoTime() latency vs. backoff [Linux]; y: System.nanoTime + backoff, nsec (10^1–10^5, log scale); x: threads (0–48); series: log10(backoff) = 0–4]

Timers: Double U. Tee. Eff. [Windows]

[Chart: System.nanoTime() latency vs. backoff [Windows]; y: System.nanoTime + backoff, nsec (10^1–10^5, log scale); x: threads (0–32); series: log10(backoff) = 0–4]

Timers: Paying for Monotonicity [Solaris]

[Chart: System.nanoTime() latency vs. backoff [Solaris]; y: System.nanoTime + backoff, nsec (10^1–10^5, log scale); x: threads (0–32); series: log10(backoff) = 0–4]

Timers: Typical Case [Mac OS X]

Java(TM) SE Runtime Environment, 1.8.0-b132
Java HotSpot(TM) 64-Bit Server VM, 25.0-b70
Mac OS X, 10.9.2, x86_64

Running with 1 threads and [-server]:
  granularity_nanotime: 1009.623 +- 2.140 ns
  latency_nanotime:       44.145 +- 1.449 ns

Running with 4 threads and [-server]:
  granularity_nanotime: 1044.703 +- 32.103 ns
  latency_nanotime:       56.111 +- 3.397 ns

Timers: Summing Up

System.nanoTime is the new String.intern!

- Giving users nanoTime is handing over a loaded gun
- nanoTime may and should be used only in selected cases, when you can foresee all the disadvantages
- Most frequently, the direct measurement is not available, and we have to derive the models from collateral evidence

Timers: Stop Kidding Already?

Our code blocks are heavy enough to keep nanoTime() granularity and latency at bay!

Omission: Heavy Benchmark is Heavy

public long measure() {
    long ops = 0;
    long startTime = System.nanoTime();
    while (!isDone) {
        setup(); // want to skip this
        work();
        ops++;
    }
    return ops / (System.nanoTime() - startTime);
}

Omission: Measuring the Separate Block

public long measure() {
    long ops = 0;
    long realTime = 0;
    while (!isDone) {
        setup(); // skip this
        long time = System.nanoTime();
        work();
        realTime += (System.nanoTime() - time);
        ops++;
    }
    return ops / realTime;
}

Omission: Checking Empty setup()...

Measuring the throughput... it grows past the CPU count?!

[Chart: throughput, ops/us (0–600) vs. # threads (0–32); series: external loop timestamps, sum over per-iteration timestamps]

Omission: Hint

public long measure() {
    long ops = 0;
    long realTime = 0;
    while (!isDone) {
        setup(); // skip this
        long time = System.nanoTime();
        work();
        realTime += (System.nanoTime() - time);
        ops++;
        // ...WHOOPS, WE DE-SCHEDULE HERE...
    }
    return ops / realTime;
}

Omission: Basic Example

- Measuring the operation time, 10 ms/op on average ⇒ each i-th thread thinks its individual throughput is λᵢ = 100 ops/sec
- We have two threads, and therefore ∑ᵢ₌₁ᴺ λᵢ = 200 ops/sec

Omission: A Fistful of Threads More

- Each thread still believes λᵢ = 100 ops/sec!
- Now we have four threads ⇒ ∑ᵢ₌₁ᴺ λᵢ = 400 ops/sec


Omission: Conclusion

"Phillip J. Fry is experiencingthe major safepoint event"

Timers skip the beats, andmay grossly

under/overestimate thedurations.

� Every performance metric thatincludes time is at fault

� Very easy to blow up onoverloaded systems

� Very easy to blow up whenmeasurers coordinate withworkload

Slide 51/75. Copyright c○ 2014, Oracle and/or its affiliates. All rights reserved.

S.S.: (TGIF) Thank God It’s Fibonacci

Is there a problem, officer?

public class FibonacciGen {
    BigInteger n1 = ONE; BigInteger n2 = ZERO;

    @Benchmark
    public BigInteger next() {
        BigInteger cur = n1.add(n2);
        n2 = n1; n1 = cur;
        return cur;
    }
}

S.S.: Timing Each Call...

Whoops, this benchmark has no steady state, indeed:

[Chart: time to call, usec (2.5–12.5) vs. call # (0–100000)]

S.S.: Pitfalls

No steady state – cannot use time-based benchmarks! The longer we measure, the «slower» the result appears:

duration, sec    score, us/op
      1           5.013 ± 0.006
      2           7.087 ± 0.009
      4          10.021 ± 0.017
      8          14.159 ± 0.010

S.S.: Pick Your Poison

Time-based benchmarks:
- Measuring in God knows what conditions
- How should one compare two implementations? (if you are lucky, and your performance model is linear...)

Work-based benchmarks:
- Burning ourselves with timer latency/granularity
- Burning ourselves with omission
- Burning ourselves with transients

S.S.: Conclusion

«The only winning move is not to play at all»

Non-steady-state benchmarks force you to choose between all the bad options.

Non-steady-state benchmarks are a large P.I.T.A.!

S.S.: Palliative Relief

Measure in large batches!

@Setup(Level.Iteration)
public void setup() {
    n1 = BigInteger.ZERO; n2 = BigInteger.ONE;
}

@Benchmark
@Measurement(batchSize = 5000)
public BigInteger next() {
    BigInteger cur = n1.add(n2);
    n2 = n1; n1 = cur;
    return cur;
}
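Batching usually goes hand in hand with single-shot mode, so that each measured sample covers one whole batch and no per-call timestamps are taken. A sketch of how the complete class might be annotated; the iteration counts here are my placeholders, not values from the slides:

import java.math.BigInteger;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.SingleShotTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5, batchSize = 5000)
@Measurement(iterations = 20, batchSize = 5000)
public class FibonacciBatch {
    BigInteger n1, n2;

    @Setup(Level.Iteration)
    public void setup() {
        // Reset the sequence so every batch starts from the same state.
        n1 = BigInteger.ZERO; n2 = BigInteger.ONE;
    }

    @Benchmark
    public BigInteger next() {
        BigInteger cur = n1.add(n2);
        n2 = n1; n1 = cur;
        return cur;
    }
}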

Engineering


Engineering: Comparisons

You want your results to be comparable.

- Every tiny little uncontrolled detail is a free variable
- Libraries are large complexes of tiny details
- Language runtimes are galaxies of tiny details

Engineering: Story

This is a weird story of a Java vs. Scala comparison coming from StackOverflow, where people are bound to the belief that tail-recursion optimization is the best thing that happened in computer science since sliced bread.

The complete story and narrative is here:
http://shipilev.net/blog/2014/java-scala-divided-we-fail/

Engineering: Scala’s @tailrec

@tailrec private def
isDivisible(v: Int, d: Int, l: Int): Boolean = {
  if (d > l) true
  else (v % d == 0) && isDivisible(v, d + 1, l)
}

@Benchmark
def test(): Int = {
  var v = 10
  while (!isDivisible(v, 2, l))
    v += 2
  v
}

Engineering: Java’s absence-of-tailrec

private boolean isDivisible(int v, int d, int l) {
    if (d > l) return true;
    else
        return (v % d == 0) && isDivisible(v, d + 1, l);
}

@Benchmark
public int test() {
    int v = 10;
    while (!isDivisible(v, 2, l))
        v += 2;
    return v;
}

Engineering: Measuring

Benchmark    lim        Score  Score error  Units
-----------------------------------------------------
ScalaBench     1        0.002        0.000  us/op
ScalaBench     5        0.494        0.005  us/op
ScalaBench    10       24.228        0.268  us/op
ScalaBench    15     3457.733       33.070  us/op
ScalaBench    20  2505634.259    15366.665  us/op
JavaBench      1        0.002        0.000  us/op
JavaBench      5        0.252        0.001  us/op
JavaBench     10       12.782        0.325  us/op
JavaBench     15     1615.890        7.647  us/op
JavaBench     20  1053187.731    20502.217  us/op

Engineering: Profiling Java

Result: 12.719 +-(99.9%) 0.284 us/op [Average]

....[Thread state distributions].......................
 91.3% RUNNABLE
  8.7% WAITING

....[Thread state: RUNNABLE]...........................
 58.0%  63.5% n.s.JavaBench.isDivisible
 32.9%  36.1% n.s.JavaBench.test

....[Thread state: WAITING]............................
  8.7% 100.0% <irrelevant>

Engineering: Profiling Scala

Result: 24.076 +-(99.9%) 0.728 us/op [Average]

....[Thread state distributions].......................
 91.4% RUNNABLE
  8.6% WAITING

....[Thread state: RUNNABLE]...........................
 90.6%  99.1% n.s.ScalaBench.test
  0.9%   0.9% n.s.generated.ScalaBench_test.test_avgt_jmhLoop

....[Thread state: WAITING]............................
  8.6% 100.0% <irrelevant>

Engineering: Coarse-grained profilers

Coarse-grained (method-level) profilers are useless in diagnosing the problems in nano- and micro-benchmarks.

Additional penalty points if they are sampling at safepoints.

Engineering: JMH perfasm

java -jar benchmarks.jar ... -prof perfasm

Surprisingly easy to marry these three things:
1. Linux perf provides light-weight PMU sampling
2. JVM debug info maps events back to VM methods
3. -XX:+PrintAssembly maps events back to Java code

Actually, there are lots of good profilers already, but most of the time you don’t need «big guns» to quickly analyze benchmarks.

Engineering: Hottest thing in Scala

One true and solid x86 division:

 clocks   insns  code
-------------------------------------------------------
                 ; n.s.g.ScalaBench_test::test_avgt_jmhLoop
                 ...
  0.27%   0.17%  cltd
  2.24%  17.26%  idiv %ecx
 77.99%  66.44%  test %edx,%edx
                 ...

How can you possibly be 2x faster than this?

Engineering: Hottest thing in Java

 clocks   insns  code
-------------------------------------------------------
                 ; n.s.JavaBench::isDivisible
                 ...
  1.68%   2.76%  cltd
  0.06%   0.16%  idiv %ecx
 27.59%  36.37%  test %edx,%edx
                 ...
  0.04%          cltd
                 idiv %r10d
 12.24%   1.54%  test %edx,%edx
                 ...
  0.01%          callq <recursive-call>

Engineering: Second hottest thing in Java

 clocks   insns  code
-------------------------------------------------------
                 ; n.s.g.JavaBench_test::test_avgt_jmhLoop
                 ...
  1.34%   0.21%  imul $0x55555556,%rdx,%rdx
  1.25%   0.20%  sar  $0x20,%rdx
  1.15%   2.36%  mov  %edx,%esi
  0.95%   1.51%  sub  %r10d,%esi  ; irem
                 ...

Beautiful trick of substituting the remainder with constant multiplication and binary shift!²

² http://www.hackersdelight.org/divcMore.pdf
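To make the trick concrete: for a non-negative int v, v / 3 can be computed with the constant 0x55555556 = ceil(2^32 / 3), and v % 3 then falls out by back-multiplication. A small sketch of the general technique in plain Java (my illustration, not the exact code C2 emits):

public class MagicDivide {
    public static void main(String[] args) {
        int v = 100;
        // q = floor(v * ceil(2^32/3) / 2^32) == v / 3 for v >= 0
        long q = (v * 0x55555556L) >> 32;
        long r = v - q * 3; // remainder, computed without any idiv
        System.out.println(q + " " + r); // prints "33 1"
    }
}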

Engineering: Second hottest thing in Java

clocks insns code-------------------------------------------------------; n.s.g.JavaBench_test::test_avgt_jmhLoop...1.34% 0.21% imul $0x55555556,%rdx,%rdx1.25% 0.20% sar $0x20,%rdx1.15% 2.36% mov %edx,%esi0.95% 1.51% sub %r10d,%esi ; irem

...

Beautiful trick of substituting the remainder with constantmultiplication and binary shift! 2

2http://www.hackersdelight.org/divcMore.pdfSlide 70/75. Copyright c○ 2014, Oracle and/or its affiliates. All rights reserved.

Engineering: Quick Explanation

// inlines twice, specializes for d={2,3}
private boolean isDivisible(int v, int d, int l) {
    ...
    return (v % d == 0) && isDivisible(v, d + 1, l);
}

@Benchmark
public int test() {
    int v = 10;
    while (!isDivisible(v, 2, l))
        v += 2;
    return v;
}

Engineering: Make «d» unpredictable

Benchmark    lim        Score  Score error  Units
-----------------------------------------------------
ScalaBench     1        0.002        0.000  us/op
ScalaBench     5        0.489        0.002  us/op
ScalaBench    10       23.777        0.116  us/op
ScalaBench    15     3379.870        5.737  us/op
ScalaBench    20  2468845.944     2413.573  us/op
JavaBench      1        0.003        0.000  us/op
JavaBench      5        0.465        0.001  us/op
JavaBench     10       22.989        0.095  us/op
JavaBench     15     3453.116       16.390  us/op
JavaBench     20  2518726.451     4374.482  us/op
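The deck does not show the exact change; one plausible way to defeat the d={2,3} specialization is to launder the starting divisor through a mutable field the JIT cannot constant-fold at the call site. A hedged sketch of that idea (the field names are mine):

import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class JavaBenchUnpredictable {
    int l = 20;
    // Reading the starting divisor from a mutable field, rather than
    // passing the literal 2, should prevent the JIT from specializing
    // the recursion for the first constant d values.
    int startD = 2;

    private boolean isDivisible(int v, int d, int l) {
        if (d > l) return true;
        return (v % d == 0) && isDivisible(v, d + 1, l);
    }

    @Benchmark
    public int test() {
        int v = 10;
        while (!isDivisible(v, startD, l))
            v += 2;
        return v;
    }
}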

Engineering: Conclusion

«Days since the last benchmarking accident: 0» (@gvsmirnov)

Benchmarks without analysis make me a really sad panda.

You show me nice charts: Language A vs. Language B, Nashorn vs. Rhino, Graal vs. C2, etc., and all I see is BAYESIAN NOISE.

Fin


Fin: Conclusion

«If you don’t analyze the benchmarks, you’re gonna waste a good time»

Superficial conclusions almost always feed on existing biases, and are almost always wrong.

Benchmarks are for understanding Reality, not for reinforcing your prejudices about the Universe.