Java in High-Performance Computing. Dawid Weiss, Carrot Search; Institute of Computing Science, Poznan University of Technology. GeeCon, Poznań, 05/2010.


Java in High-Performance Computing

Dawid Weiss

Carrot Search, Institute of Computing Science, Poznan University of Technology

GeeCon Poznan, 05/2010


Learn from the mistakes of others. You can’t live long enough to make them all yourself.

— Eleanor Roosevelt


Talk outline

• What is “High performance”?

• What is “Java”?

• Measuring performance (benchmarking).

• HPPC library.

Crosscutting: (un?)common pitfalls and performance killers. Some HotSpot internals.


Divide-and-conquer style algorithm

    for (Example e : examples) {
        if (e.hasQuiz()) e.showQuiz(); else e.showCode();
        e.explain();
        e.deriveConclusions();
    }


— PART I —

High Performance Computing


High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems.

— Wikipedia


Is Java faster than C/C++?
The short answer is: it depends.

— Cliff Click


It’s usually hard to make a fast program run faster.

It’s easy to make a slow program run even slower.

It’s easy to make fast hardware run slow.


For now, HPC

• limited allowed computation time,

• constrained resources (hardware, memory).

Good HPC software ∝ no (obvious) flaws.


— PART II —

What is Java?

(Recall: Is Java faster than C/C++?)


Example 1

    public void testSum1() {
        int sum = 0;
        for (int i = 0; i < COUNT; i++)
            sum += sum1(i, i);
        result = sum;
    }

    public void testSum2() {
        int sum = 0;
        for (int i = 0; i < COUNT; i++)
            sum += sum2(i, i);
        result = sum;
    }

where the body of sum1 and sum2 sums arguments and returns the result and COUNT is significantly large...



    VM                sum1   sum2   sum3   sum4
    sun-1.6.0-20      0.04   2.62   1.05    3.76
    sun-1.6.0-16      0.04   3.20   1.39    4.99
    sun-1.5.0-18      0.04   3.29   1.46    5.20
    ibm-1.6.2         0.08   6.28   0.16   14.64
    jrockit-27.5.0    0.18   0.16   1.16    3.18
    harmony-r917296   0.17   0.35   9.18   22.49

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Ubuntu, dual-core AMD Athlon 5200).


    int sum1(int a, int b) {
        return a + b;
    }

    Integer sum2(Integer a, Integer b) {
        return a + b;
    }

    // sum2 after the compiler has expanded the auto(un)boxing:
    Integer sum2(Integer a, Integer b) {
        return Integer.valueOf(
            a.intValue() + b.intValue());
    }


    int sum3(int... args) {
        int sum = 0;
        for (int i = 0; i < args.length; i++)
            sum += args[i];
        return sum;
    }

    Integer sum4(Integer... args) {
        int sum = 0;
        for (int i = 0; i < args.length; i++) {
            sum += args[i];
        }
        return sum;
    }

    // sum4 as the compiler sees it: varargs become an array parameter.
    Integer sum4(Integer [] args) {
        // ...
    }


Conclusions

• Syntactic sugar may be costly.

• Primitive types are fast.

• Large differences between different VMs.
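
To make the first two conclusions concrete, here is a minimal sketch of the primitive and boxed variants side by side. The class name BoxingDemo and the helper isCachedBox are made up for this illustration; sum2Desugared mirrors the expanded form of sum2 shown on the slide. It only demonstrates that the variants compute the same value and that boxing goes through Integer.valueOf; it makes no timing claims.

```java
public class BoxingDemo {
    // Primitive version: plain integer addition, no objects involved.
    static int sum1(int a, int b) {
        return a + b;
    }

    // Boxed version as it appears in source code.
    static Integer sum2(Integer a, Integer b) {
        return a + b;
    }

    // The same method with the boxing conversions written out by hand.
    static Integer sum2Desugared(Integer a, Integer b) {
        return Integer.valueOf(a.intValue() + b.intValue());
    }

    // Integer.valueOf must return cached objects for values in -128..127,
    // so two boxes of such a value are the same object. Outside that range
    // a new object may be allocated on every call.
    static boolean isCachedBox(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(sum1(20, 22));
        System.out.println(sum2(20, 22));
        System.out.println(sum2Desugared(20, 22));
        System.out.println(isCachedBox(100));
    }
}
```

In a loop over a large COUNT most boxed values fall outside the small-value cache, which is one plausible reason the boxed variants are so much slower on several of the VMs measured above.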


Example 2

Write once, run anywhere!


But it’s the same VM!


It works on my machine!


    private static boolean ready;

    public static void startThread() {
        new Thread() {
            public void run() {
                try {
                    sleep(2000);
                } catch (Exception e) { /* ignore */ }
                System.out.println("Marking loop exit.");
                ready = true;
            }
        }.start();
    }

    public static void main(String[] args) {
        startThread();
        System.out.println("Entering the loop...");
        while (!ready) {
            // Do nothing.
        }
        System.out.println("Done, I left the loop!");
    }


    while (!ready) {
        // Do nothing.
    }

≡?

    boolean r = ready;
    while (!r) {
        // Do nothing.
    }

In most cases true, from a JMM perspective: ready is not volatile, so the compiler is allowed to read it once and never look at it again.
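
One common way to make the loop exit reliably is to declare the flag volatile. The sketch below is an assumption about how the example could be repaired, not code from the talk; the class name VolatileFlag and the method spinUntilReady are hypothetical.

```java
public class VolatileFlag {
    // volatile: every read in the loop must observe the latest write,
    // so the JIT may not replace the field read with a cached local.
    private static volatile boolean ready;

    /** Starts a thread that sets the flag after delayMillis, then spins until it is set. */
    static boolean spinUntilReady(final long delayMillis) {
        ready = false;
        Thread setter = new Thread() {
            @Override
            public void run() {
                try {
                    sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                ready = true;
            }
        };
        setter.start();
        while (!ready) {
            // Do nothing.
        }
        return ready;
    }

    public static void main(String[] args) {
        System.out.println("Entering the loop...");
        spinUntilReady(2000);
        System.out.println("Done, I left the loop!");
    }
}
```

Busy-waiting is kept only to stay close to the original example; in real code a CountDownLatch or a similar java.util.concurrent primitive is usually the better choice.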


JVM Internals. . .


C1 (client compiler):

• fast compilation

• not (much) optimization

C2 (server compiler):

• slow(er) compilation than C1

• a lot of JMM-allowed optimizations


There are hundreds of JVM tuning/diagnostic switches.


My personal favorite:


Conclusions

• Bytecode is far from what is executed.

• A lot going on under the (VM) hood.

• Bad code may work, but will eventually crash.

• HotSpot-level optimizations are good.

• If there is a bug in the HotSpot compiler. . .


Any other diversifying factors?


J2ME

• more VM vendors,

• hardware diversity,

• software and hardware quirks.


Non-JVM target platforms

• Dalvik

• GWT

• IKVM


Conclusions

• There is no “single” Java performance model.

• Performance depends on the VM, environment, class library, hardware.

• Apply benchmark-and-correct cycle.


Benchmarking


Example 3

    public void testSum1() {
        int sum = 0;
        for (int i = 0; i < COUNT; i++)
            sum += sum1(i, i);
        result = sum;
    }

    public void testSum1_2() {
        int sum = 0;
        for (int i = 0; i < COUNT; i++)
            sum += sum1(i, i);
    }


    VM                sum1   sum1_2
    sun-1.6.0-20      0.04   0.00
    sun-1.6.0-16      0.04   0.00
    sun-1.5.0-18      0.04   0.00
    ibm-1.6.2         0.08   0.01
    jrockit-27.5.0    0.17   0.08
    harmony-r917296   0.17   0.11

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Ubuntu, dual-core AMD Athlon 5200).


    java -server -XX:+PrintOptoAssembly -XX:+PrintCompilation ...

     - method holder: 'com/dawidweiss/geecon2010/Example03'
     - access: 0xc1000001 public
     - name: 'testSum1_2'
    ...
    010   pushq  rbp
          subq   rsp, #16    # Create frame
          nop                # nop for patch_verified_entry
    016   addq   rsp, 16     # Destroy frame
          popq   rbp
          testl  rax, [rip + #offset_to_poll_page]    # Safepoint: poll for GC
    021   ret


Conclusions

• Benchmarks must be executed to provide feedback.

• HotSpot is smart and effective at removing dead code.
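
A minimal sketch of how a benchmark loop can keep its result alive so the JIT cannot discard the work, in the spirit of the result field used in testSum1. The class name GuardedBenchmark, the COUNT value and the choice of a volatile guard field are assumptions made for this illustration.

```java
public class GuardedBenchmark {
    // Hypothetical iteration count; the talk only says COUNT is "significantly large".
    static final int COUNT = 10000000;

    // The loop result is published here on every round, so the
    // computation has an observable effect and is not dead code.
    static volatile int guard;

    static int sum1(int a, int b) {
        return a + b;
    }

    /** Runs one benchmark round and returns its duration in nanoseconds. */
    static long timeRoundNanos() {
        long start = System.nanoTime();
        int sum = 0;
        for (int i = 0; i < COUNT; i++)
            sum += sum1(i, i);
        guard = sum; // without this store the whole loop may be eliminated
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // 5 warmup rounds followed by 10 measured rounds, as in the tables above.
        for (int round = 0; round < 15; round++) {
            long nanos = timeRoundNanos();
            String phase = round < 5 ? "warmup" : "measured";
            System.out.printf("%s round %d: %.3f ms%n", phase, round, nanos / 1e6);
        }
    }
}
```

Even with a guard, a compiler is free to simplify the loop itself, so the numbers still need to be sanity-checked against the generated code when they look too good.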


Example 4

    @Test
    public void testAdd1() {
        int sum = 0;
        for (int i = 0; i < COUNT; i++) {
            sum += add1(i);
        }
        guard = sum;
    }

    public int add1(int i) {
        return i + 1;
    }

Note add1 is virtual.


    switch                                testAdd1
    -XX:+Inlining -XX:+PrintInlining      0.04
    -XX:-Inlining                         0.45

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Ubuntu, dual-core AMD Athlon 5200, JRE 1.7b80-debug).


Most Java calls are monomorphic.


HotSpot adjusts to megamorphic calls automatically.


Example 5

    abstract class Superclass {
        abstract int call();
    }

    class Sub1 extends Superclass { int call() { return 1; } }

    class Sub2 extends Superclass { int call() { return 2; } }

    class Sub3 extends Superclass { int call() { return 3; } }

    Superclass[] mixed = initWithRandomInstances(10000);

    Superclass[] solid = initWithSub1Instances(10000);

    @Test
    public void testMonomorphic() {
        int sum = 0;
        int m = solid.length;
        for (int i = 0; i < COUNT; i++)
            sum += solid[i % m].call();
        guard = sum;
    }

    @Test
    public void testMegamorphic() {
        int sum = 0;
        int m = mixed.length;
        for (int i = 0; i < COUNT; i++)
            sum += mixed[i % m].call();
        guard = sum;
    }


    VM                monomorphic   megamorphic
    sun-1.6.0-20      0.19          0.32
    sun-1.6.0-16      0.19          0.34
    sun-1.5.0-18      0.18          0.34
    ibm-1.6.2         0.20          0.30
    jrockit-27.5.0    0.22          0.29
    harmony-r917296   0.27          0.32

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Ubuntu, dual-core AMD Athlon 5200).


Example 6

    @Test
    public void testBitCount1() {
        int sum = 0;
        for (int i = 0; i < COUNT; i++)
            sum += Integer.bitCount(i);
        guard = sum;
    }

    @Test
    public void testBitCount2() {
        int sum = 0;
        for (int i = 0; i < COUNT; i++)
            sum += bitCount(i);
        guard = sum;
    }

    /* Copied from
     * {@link Integer#bitCount}
     */
    static int bitCount(int i) {
        // HD, Figure 5-2
        i = i - ((i >>> 1) & 0x55555555);
        i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
        i = (i + (i >>> 4)) & 0x0f0f0f0f;
        i = i + (i >>> 8);
        i = i + (i >>> 16);
        return i & 0x3f;
    }


    VM              testBitCount1   testBitCount2
    sun-1.6.0-20    0.43            0.43
    sun-1.7.0-b80   0.43            0.43

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Ubuntu, dual-core AMD Athlon 5200).

    VM              testBitCount1   testBitCount2
    sun-1.6.0-20    0.08            0.33
    sun-1.7.0-b83   0.07            0.32

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Windows 7, Intel I7 860).
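
Timing differences like these are only meaningful if both methods compute the same thing. The sketch below is a small equivalence check between the copied bitCount and Integer.bitCount; the class name BitCountCheck and the helper agreesWithJdk are made up for this illustration.

```java
public class BitCountCheck {
    // Same bit-twiddling routine as in Example 6 (HD, Figure 5-2).
    static int bitCount(int i) {
        i = i - ((i >>> 1) & 0x55555555);
        i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
        i = (i + (i >>> 4)) & 0x0f0f0f0f;
        i = i + (i >>> 8);
        i = i + (i >>> 16);
        return i & 0x3f;
    }

    /** Returns true if both implementations agree on every value in [from, to). */
    static boolean agreesWithJdk(int from, int to) {
        for (int i = from; i < to; i++) {
            if (bitCount(i) != Integer.bitCount(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreesWithJdk(-1000000, 1000000));
        System.out.println(bitCount(-1)); // all 32 bits set
    }
}
```

With the results known to match, any remaining gap between the two columns has to come from how each call is compiled on a given machine, not from the algorithm.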


    ... -XX:+PrintInlining ...

    ...
    Inlining intrinsic _bitCount_i at bci:9 in ..Example06::testBitCount1
    Inlining intrinsic _bitCount_i at bci:9 in ..Example06::testBitCount1
    Inlining intrinsic _bitCount_i at bci:9 in ..Example06::testBitCount1
    Example06.testBitCount1: [measured 10 out of 15 rounds]
    round: 0.07 [+- 0.00], round.gc: 0.00 [+- 0.00] ...

    @ 9 com.dawidweiss.geecon2010.Example06::bitCount inline (hot)
    @ 9 com.dawidweiss.geecon2010.Example06::bitCount inline (hot)
    @ 9 com.dawidweiss.geecon2010.Example06::bitCount inline (hot)

    Example06.testBitCount2: [measured 10 out of 15 rounds]
    round: 0.32 [+- 0.01], round.gc: 0.00 [+- 0.00] ...


    ... -XX:+PrintOptoAssembly ...

    {method}
     - klass: {other class}
     - method holder: com/dawidweiss/geecon2010/Example06
     - name: testBitCount1
    ...
    0c2   B13: #  B12 B14 <- B8 B12  Loop: B13-B12 inner stride: ...
    0c2   movl    R10, RDX    # spill
    ...
    0e1   movl    [rsp + #40], R11    # spill
    0e6   popcnt  R8, R8
    ...
    0f5   addl    R9, #7    # int
    0f9   popcnt  R11, R11
    0fe   popcnt  RCX, R9


Conclusions

• Benchmarks must be statistically sound.
  → averages, variance, min, max, warm-up phase

• Account for HotSpot optimisations.

• Account for hardware differences.
  → test-on-target

• Use domain data and real scenarios.

• Inspect suspicious output with debug JVM.

See more: Cliff Click, http://java.sun.com/javaone/2009/articles/rockstar_click.jsp.
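
The statistics named in the first conclusion can be collected with very little code. The sketch below is one possible harness, not the one used for the measurements in this talk; the class name BenchStats, its fields and the measure method are hypothetical.

```java
public class BenchStats {
    final double mean, stdDev, min, max;

    /** Summarises a non-empty array of round times (in seconds). */
    BenchStats(double[] samples) {
        double sum = 0;
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double s : samples) {
            sum += s;
            lo = Math.min(lo, s);
            hi = Math.max(hi, s);
        }
        double m = sum / samples.length;
        double squares = 0;
        for (double s : samples) {
            squares += (s - m) * (s - m);
        }
        mean = m;
        // Sample standard deviation (n - 1 in the denominator).
        stdDev = samples.length > 1 ? Math.sqrt(squares / (samples.length - 1)) : 0.0;
        min = lo;
        max = hi;
    }

    /** Runs warmup rounds that are discarded, then times the measured rounds. */
    static BenchStats measure(Runnable task, int warmupRounds, int measuredRounds) {
        for (int i = 0; i < warmupRounds; i++) {
            task.run();
        }
        double[] seconds = new double[measuredRounds];
        for (int i = 0; i < measuredRounds; i++) {
            long start = System.nanoTime();
            task.run();
            seconds[i] = (System.nanoTime() - start) / 1e9;
        }
        return new BenchStats(seconds);
    }

    // Keeps the benchmarked result alive so the loop is not dead code.
    static volatile long sink;

    public static void main(String[] args) {
        BenchStats s = measure(new Runnable() {
            public void run() {
                long acc = 0;
                for (int i = 0; i < 5000000; i++) acc += Integer.bitCount(i);
                sink = acc;
            }
        }, 5, 10);
        System.out.printf("mean %.4f s, stddev %.4f, min %.4f, max %.4f%n",
            s.mean, s.stdDev, s.min, s.max);
    }
}
```

A large spread between min and max, or a standard deviation close to the mean, is a hint that the warm-up was too short or that something else on the machine interfered with the run.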


HPPC: High Performance Primitive Collections


Motivation

• Primitive types: fast and memory-friendly.

• Optional assertions.

• Single-threaded. No fail-fast.

• Fast, fast, fast iterators, with no GC overhead.

• Open internals (explicit implementation).

• Programmers know what they’re doing.


Why not JCF?

    public interface List<E> extends Collection<E> {
        boolean contains(Object o);    // [-] contract-enforced methods
        Iterator<E> iterator();        // [-] iterators over primitive types?
        Object[] toArray();            // [-] troublesome covariants
        ...
    }


Friendly Competition

• fastutil

• PCJ

• GNU Trove

• Apache Mahout (ported COLT)

• Apache Primitive Collections

All of these have pros and cons and deal with JCF compatibility somehow.


Iterators in fastutil or PCJ

interface IntIterator extends Iterator<Integer> {
    // Primitive-specific method
    int nextInt();
}
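A minimal sketch of what this design implies for an implementor: the same iterator has to serve both the boxed JCF contract (next()) and the primitive accessor (nextInt()). The IntIterator interface mirrors the one on the slide; ArrayIntIterator and the two sum helpers are hypothetical names used only for illustration and are not taken from fastutil or PCJ.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class IntIteratorDemo {
    /** Mirrors the interface on the slide. */
    interface IntIterator extends Iterator<Integer> {
        int nextInt();
    }

    /** Hypothetical array-backed implementation. */
    static final class ArrayIntIterator implements IntIterator {
        private final int[] data;
        private int pos;

        ArrayIntIterator(int[] data) {
            this.data = data;
        }

        @Override
        public boolean hasNext() {
            return pos < data.length;
        }

        @Override
        public int nextInt() {
            if (pos >= data.length) {
                throw new NoSuchElementException();
            }
            return data[pos++];
        }

        /** JCF-compatible path: boxes every element into an Integer. */
        @Override
        public Integer next() {
            return nextInt();
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    /** Uses the primitive accessor: no boxing. */
    static long sumPrimitive(int[] data) {
        long sum = 0;
        for (IntIterator it = new ArrayIntIterator(data); it.hasNext();) {
            sum += it.nextInt();
        }
        return sum;
    }

    /** Uses the plain Iterator view: every element is boxed, then unboxed. */
    static long sumBoxed(int[] data) {
        long sum = 0;
        for (Iterator<Integer> it = new ArrayIntIterator(data); it.hasNext();) {
            sum += it.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sumPrimitive(data) + " " + sumBoxed(data));
    }
}
```

Code that only sees the Iterator<Integer> type, including the for-each loop, silently takes the boxed path.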


Iterators in HPPC

public final class IntCursor {
    public int index;
    public int value;
}

public class IntArrayList implements Iterable<IntCursor> {
    public Iterator<IntCursor> iterator() { ... }
}
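A minimal sketch of how a cursor-based iterator can avoid per-element allocation: the iterator owns one cursor object and overwrites its fields on every step. IntCursor has the same shape as on the slide; TinyIntList is a hypothetical stripped-down list written for this illustration and is not HPPC's actual IntArrayList.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class CursorDemo {
    /** Same shape as the cursor on the slide. */
    static final class IntCursor {
        public int index;
        public int value;
    }

    /** Hypothetical minimal growable list of ints with an open (public) buffer. */
    static final class TinyIntList implements Iterable<IntCursor> {
        public int[] buffer = new int[4];
        private int size;

        void add(int value) {
            if (size == buffer.length) {
                buffer = Arrays.copyOf(buffer, size * 2);
            }
            buffer[size++] = value;
        }

        int size() {
            return size;
        }

        @Override
        public Iterator<IntCursor> iterator() {
            return new Iterator<IntCursor>() {
                // One cursor per iterator, reused for every element.
                private final IntCursor cursor = new IntCursor();
                private int pos;

                @Override
                public boolean hasNext() {
                    return pos < size;
                }

                @Override
                public IntCursor next() {
                    if (pos >= size) {
                        throw new NoSuchElementException();
                    }
                    cursor.index = pos;
                    cursor.value = buffer[pos++];
                    return cursor;
                }

                @Override
                public void remove() {
                    throw new UnsupportedOperationException();
                }
            };
        }
    }

    /** Sums all values using the for-each loop over cursors. */
    static long sum(TinyIntList list) {
        long total = 0;
        for (IntCursor c : list) {
            total += c.value;
        }
        return total;
    }

    public static void main(String[] args) {
        TinyIntList list = new TinyIntList();
        for (int i = 1; i <= 10; i++) {
            list.add(i * 10);
        }
        System.out.println(sum(list));
    }
}
```

The trade-off is that a cursor is only valid until the next call to next(); callers that need to keep a value must copy it out.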


Iterating over list elements in HPPC

for (IntCursor c : list) {
    System.out.println(c.index + ": " + c.value);
}

...or

list.forEach(new IntProcedure() {
    public void apply(int value) {
        System.out.println(value);
    }
});

...or

final int[] buffer = list.buffer;
final int size = list.size();

for (int i = 0; i < size; i++) {
    System.out.println(i + ": " + buffer[i]);
}


The fastest one?


What’s in HPPC?


Open implementation is good.


/**
 * Applies a supplemental hash function to a given
 * hashCode, which defends against poor quality
 * hash functions. [...]
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

HashMap rehashes your (carefully crafted) hash code.
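To see what the supplemental hash is for, the sketch below feeds it keys that differ only in their upper 16 bits and counts how many distinct buckets they hit in a 16-slot table. The hash method is the one quoted above and indexFor is the usual power-of-two mask; the class name and distinctBuckets are hypothetical helpers added for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class SupplementalHashDemo {
    /** The supplemental hash function quoted above. */
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /** Bucket index for a table whose length is a power of two. */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /** Counts the distinct buckets hit by the keys, with or without the supplemental hash. */
    static int distinctBuckets(int[] keys, int tableSize, boolean rehash) {
        Set<Integer> buckets = new HashSet<Integer>();
        for (int key : keys) {
            buckets.add(indexFor(rehash ? hash(key) : key, tableSize));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        // Sixteen keys that differ only above bit 15: 0, 1 << 16, 2 << 16, ...
        int[] keys = new int[16];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i << 16;
        }
        System.out.println("raw:      " + distinctBuckets(keys, 16, false));
        System.out.println("rehashed: " + distinctBuckets(keys, 16, true));
    }
}
```

Without the supplemental hash all sixteen keys share bucket 0, because the mask only looks at the low four bits; with it, the high bits are folded down and the keys spread over more than one bucket. The flip side, which is the point of the slide, is that a hash code you have already tuned gets scrambled again whether you want it or not.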


HPPC approach (example):

public class LongIntOpenHashMap implements LongIntMap {
    // ...
    public LongIntOpenHashMap(int initialCapacity, float loadFactor,
            LongHashFunction keyHashFunction, IntHashFunction valueHashFunction) {
        // ...
    }

Defaults: LongMurmurHash, IntHashFunction.
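The idea of passing the hash function in through the constructor can be sketched with a toy open-addressing map. Everything here is an assumption made for illustration: IntHash, TinyIntIntMap, IDENTITY and MIXING are hypothetical names, the map has a fixed capacity and no removal, and none of it reflects HPPC's actual implementation or its Murmur-based defaults.

```java
public class PluggableHashDemo {
    /** Hypothetical strategy interface for hashing int keys. */
    interface IntHash {
        int hash(int key);
    }

    /** Hypothetical fixed-capacity int-to-int map using linear probing. */
    static final class TinyIntIntMap {
        private final int[] keys;
        private final int[] values;
        private final boolean[] used;
        private final int mask;
        private final IntHash hashFunction;
        private int size;

        TinyIntIntMap(int capacity, IntHash hashFunction) {
            if (capacity < 2 || Integer.bitCount(capacity) != 1) {
                throw new IllegalArgumentException("capacity must be a power of two >= 2");
            }
            this.keys = new int[capacity];
            this.values = new int[capacity];
            this.used = new boolean[capacity];
            this.mask = capacity - 1;
            this.hashFunction = hashFunction;
        }

        /** Returns the slot holding the key, or the free slot where it would go. */
        private int slotOf(int key) {
            int slot = hashFunction.hash(key) & mask;
            while (used[slot] && keys[slot] != key) {
                slot = (slot + 1) & mask;
            }
            return slot;
        }

        void put(int key, int value) {
            int slot = slotOf(key);
            if (!used[slot]) {
                // One slot always stays free so that probing terminates.
                if (size == mask) {
                    throw new IllegalStateException("map is full");
                }
                used[slot] = true;
                keys[slot] = key;
                size++;
            }
            values[slot] = value;
        }

        int getOrDefault(int key, int defaultValue) {
            int slot = slotOf(key);
            return used[slot] ? values[slot] : defaultValue;
        }
    }

    /** Uses the key as its own hash; fine when the caller knows the keys are well distributed. */
    static final IntHash IDENTITY = key -> key;

    /** An arbitrary bit-mixing function, chosen only for illustration. */
    static final IntHash MIXING = key -> {
        int h = key * 0x9E3779B9;
        return h ^ (h >>> 16);
    };

    public static void main(String[] args) {
        TinyIntIntMap map = new TinyIntIntMap(16, MIXING);
        map.put(1, 100);
        map.put(17, 200);
        System.out.println(map.getOrDefault(1, -1) + " " + map.getOrDefault(17, -1));
    }
}
```

The map behaves the same with either function; only the probe lengths differ, and the caller, who knows the key distribution, gets to make that choice.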


Example 7

Frequency count of character bigrams in a given text.


• HPPC:

final char[] CHARS = DATA;
final IntIntOpenHashMap counts = new IntIntOpenHashMap();
for (int i = 0; i < CHARS.length - 1; i++) {
    counts.putOrAdd((CHARS[i] << 16 | CHARS[i + 1]), 1, 1);
}

• JCF, boxed integer types.

final Integer currentCount = map.get(bigram);
map.put(bigram, currentCount == null ? 1 : currentCount + 1);

• JCF, with IntHolder (mutable value object).

• GNU Trove

map.adjustOrPutValue(bigram, 1, 1);

• fastutil, OpenHashMap and LinkedOpenHashMap

map.put(bigram, map.get(bigram) + 1);

• PCJ, OpenHashMap and ChainedHashMap
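For reference, a self-contained version of the JCF variant with boxed integers. The key packing (first char in the upper 16 bits, second in the lower 16) follows the HPPC snippet above; the class name and the pack, unpack and count helpers are hypothetical and added only to make the fragment runnable.

```java
import java.util.HashMap;
import java.util.Map;

public class BigramCount {
    /** Packs two 16-bit chars into a single int key. */
    static int pack(char first, char second) {
        return (first << 16) | second;
    }

    /** Recovers the two characters from a packed key. */
    static String unpack(int bigram) {
        return new String(new char[] {(char) (bigram >>> 16), (char) (bigram & 0xFFFF)});
    }

    /** Counts bigrams with java.util.HashMap; keys and values are boxed on every update. */
    static Map<Integer, Integer> count(char[] chars) {
        final Map<Integer, Integer> map = new HashMap<Integer, Integer>();
        for (int i = 0; i < chars.length - 1; i++) {
            final int bigram = pack(chars[i], chars[i + 1]);
            final Integer currentCount = map.get(bigram);
            map.put(bigram, currentCount == null ? 1 : currentCount + 1);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts = count("abracadabra".toCharArray());
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            System.out.println(unpack(e.getKey()) + ": " + e.getValue());
        }
    }
}
```

Each update here costs two hash lookups (get, then put) plus boxing of the key and the new count, which is the overhead that single-call methods such as putOrAdd or adjustOrPutValue in the primitive libraries are meant to avoid.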


Is Java faster than C/C++?
The short answer is: it depends.

— Cliff Click


Example 8

The same algorithm for building a DFSA automaton accepting a set of strings. Input: 3 565 575 strings, 158M of text.

        gcc -O2    java 1.6.0_20-64
real    63.850s    43.197s
user    63.110s    46.370s
sys      0.240s     0.840s


Summary and Conclusions


Performance checklist (sanity check)

• Algorithms, algorithms, algorithms.

• Proper data structures.

• Spurious GC activity.

• Memory barriers in tight loops.

• CPU cache utilization.

• Low-level, HotSpot-specific code structuring.


HPPC and junit-benchmarks are at: http://labs.carrotsearch.com