Java at Speed - QCon New York


Page 1

© Copyright Azul Systems 2019

© Copyright Azul Systems 2015

@speakjava

Java at Speed: Building a Better JVM

Simon Ritter, Deputy CTO, Azul Systems

Page 2

JVM Performance Graph: Ideal

Page 3

JVM Performance Graph: Reality

[Graph phases, as labelled: bytecodes interpreted; C1 JIT plus profiling; C2 JIT with deoptimisations; steady optimised state; GC pauses]
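The warm-up phases on this graph can be seen, roughly, by timing repeated batches of the same work on a tiered JVM. The following is a minimal sketch rather than anything shown in the talk: the class name `WarmupProbe`, the workload and the batch sizes are all assumptions, and the timings depend on the JVM and hardware.

```java
// Hypothetical probe, not from the talk: times repeated batches of identical
// work so that warm-up tends to show up as falling batch times.
final class WarmupProbe {

    // Arbitrary CPU-bound workload chosen purely for illustration.
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (i * 31L) ^ (acc >>> 3);
        }
        return acc;
    }

    // Runs `batches` batches and returns the elapsed nanoseconds of each one.
    static long[] timeBatches(int batches, int callsPerBatch) {
        long[] nanos = new long[batches];
        long sink = 0;
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerBatch; c++) {
                sink += work(1_000);
            }
            nanos[b] = System.nanoTime() - start;
        }
        if (sink == 42) System.out.println(sink); // keep the result live
        return nanos;
    }

    public static void main(String[] args) {
        long[] t = timeBatches(20, 2_000);
        for (int b = 0; b < t.length; b++) {
            System.out.printf("batch %2d: %d us%n", b, t[b] / 1_000);
        }
    }
}
```

On a typical tiered JVM the first batches are usually the slowest, but this is a rough probe, not a benchmark harness.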

Page 4

Zing: A Better JVM

Page 5

Azul Zing JVM
§ Based on OpenJDK source code
§ Passes all Java SE TCK/JCK tests
  – Drop-in replacement for other JVMs
§ HotSpot collectors replaced with C4
§ Works in conjunction with Zing System Tools
  – Only supported on Linux
§ Falcon JIT compiler
  – C2 replacement
§ ReadyNow! warm-up elimination technology

Page 6

Zing System Tools
§ Enables better memory management for JVM
§ Memory freed by JVM is returned to kernel
§ Allocation of new blocks comes from kernel
  – ZST knows cache status
  – Newly allocated blocks for TLAB are ‘hot’
  – Not like standard JVM
§ Other clever tricks
  – Contingency memory

Page 7

Azul Continuous Concurrent Compacting Collector (C4)

Page 8

C4 Basics
§ Generational (young and old)
  – Uses the same GC collector for both
  – For efficiency rather than pause containment
§ Concurrent, parallel and compacting
§ No STW compacting fallback
§ Algorithm is mark, relocate, remap

Page 9

Loaded Value Barrier
§ Read barrier
  – Tests all object references as they are loaded
§ Enforces two invariants
  – Reference is marked through
  – Reference points to correct object position
§ Allows for concurrent marking and relocation
§ Minimal performance overhead
  – Test and jump (2 instructions)
  – x86 architecture reduces this to one micro-op
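The two invariants can be illustrated with a toy model in Java. This is only a sketch under stated assumptions: `LvbSketch`, `Ref` and the `forwarding` map are hypothetical names, a real barrier is a couple of machine instructions rather than Java objects, and writing the corrected reference back into the slot is an assumption about how the slow path behaves, not something stated on the slide.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a loaded value barrier. All names here are hypothetical.
final class LvbSketch {

    // A reference as the toy collector sees it: a target address plus a flag
    // saying whether the current mark phase has already seen it.
    static final class Ref {
        final int address;
        final boolean markedThrough;

        Ref(int address, boolean markedThrough) {
            this.address = address;
            this.markedThrough = markedThrough;
        }
    }

    // Old address -> new address for objects that have been relocated.
    final Map<Integer, Integer> forwarding = new HashMap<>();
    int slowPathCount = 0;

    // Loads the reference stored in slots[index], enforcing both invariants.
    Ref load(Ref[] slots, int index) {
        Ref ref = slots[index];
        if (ref == null) {
            return null; // nothing to test
        }
        boolean relocated = forwarding.containsKey(ref.address);
        if (ref.markedThrough && !relocated) {
            return ref; // fast path: the test passes, no extra work
        }
        // Slow path: correct the reference, then store it back so the next
        // load of this slot takes the fast path (assumed behaviour).
        slowPathCount++;
        int address = relocated ? forwarding.get(ref.address) : ref.address;
        Ref healed = new Ref(address, true);
        slots[index] = healed;
        return healed;
    }
}
```

In this model a stale reference costs one slow-path visit; every later load of the same slot passes the test immediately.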

Page 10

Concurrent Mark Phase

[Diagram: root set, GC threads and app threads, with several X marks]
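As a rough illustration of what marking from a root set means, here is a single-threaded sketch. It is not C4's algorithm: the concurrency with app threads, which is the point of the slide, is deliberately left out, and the class name `MarkSketch` and the string object ids are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, single-threaded marking sketch over a tiny object graph.
final class MarkSketch {

    // Returns the ids reachable from the roots; anything else is unreachable.
    static Set<String> mark(List<String> roots, Map<String, List<String>> edges) {
        Set<String> marked = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (!marked.add(obj)) {
                continue; // already marked, do not traverse again
            }
            for (String child : edges.getOrDefault(obj, List.of())) {
                work.push(child);
            }
        }
        return marked;
    }
}
```

For example, with edges A -> B, B -> C and X -> A, marking from root A yields A, B and C; X is never reached.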

Page 11

Relocation Phase

[Diagram: compaction of objects A B C D E into A’ B’ C’ D’ E’]

A -> A’ B -> B’ C -> C’ D -> D’ E -> E’
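The A -> A’ entries can be read as a forwarding table built while live objects are copied together. A minimal sketch, assuming a page is modelled as an array with `null` for dead slots; `RelocateSketch` and its method are hypothetical names, not Zing internals.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical compaction sketch: copy live objects to a dense target page
// and remember where each one went.
final class RelocateSketch {

    // Copies the live entries of fromPage into toPage, packed from index 0,
    // and returns the forwarding table (old index -> new index).
    // toPage must have room for every live entry.
    static Map<Integer, Integer> relocate(String[] fromPage, String[] toPage) {
        Map<Integer, Integer> forwarding = new LinkedHashMap<>();
        int next = 0;
        for (int old = 0; old < fromPage.length; old++) {
            if (fromPage[old] == null) {
                continue; // dead or empty slot, nothing to move
            }
            toPage[next] = fromPage[old];
            forwarding.put(old, next);
            next++;
        }
        return forwarding;
    }
}
```

References that still hold an old index can be corrected later by looking them up in the returned table.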

Page 12

Quick Release

[Diagram: PHYSICAL and VIRTUAL memory]

A -> A’ B -> B’ C -> C’ D -> D’ E -> E’

Page 13

Remapping Phase

[Diagram: app threads and GC threads, with three X marks]

A -> A’ B -> B’ C -> C’ D -> D’ E -> E’

Page 14

Zing: Big Heaps, No Problem
§ Scales to 8 TB heap
  – No degradation in pause times
§ Use one big heap, rather than many small heaps
  – Fewer JVMs means more efficiency
§ Zing does not require big heaps
  – But works well with them

Page 15

GC Tuning

Page 16

Non-Zing GC Tuning Options

Page 17

GC Tuning Used To Be Hard

java -Xmx12g -XX:MaxPermSize=64M -XX:PermSize=32M -XX:MaxNewSize=2g -XX:NewSize=1g -XX:SurvivorRatio=128 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=0 -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:ParallelGCThreads=12 -XX:LargePageSizeInBytes=256m …

java -Xms8g -Xmx8g -Xmn2g -XX:PermSize=64M -XX:MaxPermSize=256M -XX:-OmitStackTraceInFastThrow -XX:SurvivorRatio=2 -XX:-UseAdaptiveSizePolicy -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled -XX:CMSMaxAbortablePrecleanTime=10000 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=63 -XX:+UseParNewGC -Xnoclassgc

Page 19

GC Tuning With Zing

java -Xmx1g

java -Xmx10g

java -Xmx100g

java -Xmx2t
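To check what a given `-Xmx` setting actually produced, the running JVM can report its maximum heap and active collectors through standard APIs. A small sketch; the class name `HeapReport` is an assumption, and the collector names printed depend on the JVM in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that reports heap limit and collector names.
final class HeapReport {

    // Maximum heap the JVM will attempt to use, in bytes.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Names of the garbage collectors this JVM exposes via JMX.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.printf("max heap: %d MB%n", maxHeapBytes() / (1024 * 1024));
        System.out.println("collectors: " + collectorNames());
    }
}
```

Running it as `java -Xmx1g HeapReport` and again with a larger value shows the heap limit change while the rest of the command line stays the same.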

Page 20

Measuring Platform Performance
§ jHiccup
§ Spends most of its time asleep
  – Minimal effect on performance
  – Wakes every 1 ms
  – Records the delta between when it expected to wake up and when it did
  – Measured effect is what would be experienced by your application
§ Generates histogram log files
  – These can be graphed for easy evaluation
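The measurement idea (sleep briefly, then record how late the wake-up was) can be sketched in a few lines. This is not jHiccup's code: the class name `HiccupSketch` is an assumption, and a plain array stands in for the histogram log files the tool writes.

```java
// Hypothetical hiccup meter: sleeps for a fixed interval and records how much
// later than requested each wake-up happened.
final class HiccupSketch {

    // Returns, per sample, the wake-up delay in nanoseconds (never negative).
    // If the thread is interrupted, remaining samples are left at zero.
    static long[] measure(int samples, long intervalMillis) {
        long[] late = new long[samples];
        long intervalNanos = intervalMillis * 1_000_000L;
        for (int i = 0; i < samples; i++) {
            long before = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                break;
            }
            long overshoot = System.nanoTime() - before - intervalNanos;
            late[i] = Math.max(0L, overshoot);
        }
        return late;
    }

    // Largest recorded delay, in nanoseconds.
    static long worst(long[] values) {
        long max = 0;
        for (long v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    public static void main(String[] args) {
        long[] late = measure(1_000, 1);
        System.out.printf("worst hiccup: %d us%n", worst(late) / 1_000);
    }
}
```

Because the thread does no work of its own, any delay it sees comes from the platform (JVM pauses, scheduler, hardware), which is why an application on the same JVM would experience it too.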

Page 21

Small Heap, Small Latency

[Chart: Hazelcast 2-node system with 1 GB heap, HotSpot v. Zing]

Page 22

Big Heap, Small Latency

[Chart: Cassandra with 60 GB heap, HotSpot v. Zing]

Page 23

Azul Falcon JIT Compiler

Page 24

Advancing Adaptive Compilation
§ Azul Falcon JVM compiler
  – Based on latest compiler research
  – LLVM project
§ Better performance
  – Better intrinsics
  – More inlining
  – Fewer compiler excludes
§ Replacement for C2 compiler

Page 25

Simple Code Example
§ Simple array summing loop
  – A modern compiler will use vector operations for this
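The slide's code listing is not part of this transcript. A loop of the kind it describes might look like the following; the class and method names are assumptions.

```java
// Hypothetical example of a simple array summing loop.
final class ArraySum {

    // Adds every element of the array; the loop body has no branches, which
    // is the shape a vectorising JIT handles most readily.
    static int sum(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }
}
```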

Page 26

More Complex Code Example
§ Conditional array cell addition loop
  – Hard for compiler to identify for vector instruction use
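Again the original listing is missing from the transcript, so the exact condition used on the slide is unknown. One plausible shape for a conditional array cell addition loop, with hypothetical names and an assumed threshold test:

```java
// Hypothetical example of a loop that adds array cells only under a condition.
final class ConditionalAdd {

    // Adds b[i] into a[i] only where b[i] exceeds the threshold. The branch
    // inside the loop body is what makes vectorisation harder to prove safe.
    static void addWhereAbove(int[] a, int[] b, int threshold) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (b[i] > threshold) {
                a[i] += b[i];
            }
        }
    }
}
```

A vectorising compiler has to turn the per-element branch into masked or blended vector operations to process several cells per iteration.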

Page 27

Traditional JVM JIT

Per element jumps
2 elements per iteration

Page 28

Falcon JIT

Using AVX2 vector instructions
32 elements per iteration

Broadwell E5-2690-v4

Page 29

ReadyNow!

Page 30

Traditional JVM

[Graph: Application Warm-up]

Page 31

ReadyNow! Solution
§ Save JVM JIT profiling information
  – Classes loaded
  – Classes initialised
  – Instruction profiling data
  – Speculative optimisation failure data
§ Data can be gathered over much longer period
  – JVM/JIT profiles quickly
  – Significant reduction in deoptimisations
§ Able to load, initialise and compile most code before main()

Page 32

Effect Of ReadyNow!

[Chart: customer application]

Page 33

ReadyNow! Startup Time

[Two graphs of performance against time: without ReadyNow! and with ReadyNow!]

Class loading, initialising and compile time

Page 34

Falcon Pipeline

[Diagram labels: Zing JVM, bytecode frontend, LLVM, LLVM IR, VM callbacks, queries, responses, compiled methods, machine code]

Page 35

Deterministic Compiler

Given identical input
  – Method for compilation
  – Initial IR (method bytecodes & live profile)
  – Queries and responses

Guarantees identical output
  – Produced machine code

Page 36

Add Compile Stashing

[Diagram labels: Zing JVM, bytecode frontend, LLVM, LLVM IR, VM callbacks, queries, responses, compiled methods, machine code, compile stash]
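If a compiler is deterministic, its output can be cached under a key derived from its inputs. The sketch below illustrates that idea only; it is not how Zing implements compile stashing. `CompileStashSketch`, the string encoding of the profile and query responses, and the use of SHA-256 as the key are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical cache of compiled code keyed by a hash of the compiler inputs.
final class CompileStashSketch {

    private final Map<String, byte[]> stash = new HashMap<>();
    int compilations = 0;

    // Builds a key from everything the (assumed deterministic) compiler reads.
    static String key(byte[] bytecodes, String profile, String queryResponses) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            sha.update(bytecodes);
            sha.update((byte) 0); // separator between the inputs
            sha.update(profile.getBytes(StandardCharsets.UTF_8));
            sha.update((byte) 0);
            sha.update(queryResponses.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is a required algorithm
        }
    }

    // Returns stashed machine code for identical inputs, compiling only on a miss.
    byte[] compile(byte[] bytecodes, String profile, String queryResponses,
                   Supplier<byte[]> compiler) {
        String k = key(bytecodes, profile, queryResponses);
        byte[] cached = stash.get(k);
        if (cached != null) {
            return cached;
        }
        compilations++;
        byte[] code = compiler.get();
        stash.put(k, code);
        return code;
    }
}
```

The cache is only sound because identical inputs guarantee identical output; if any input differs, for example a changed profile, the key changes and the method is compiled again.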

Page 37

Compile Stashing Effect

[Two graphs of performance against time: without compile stashing and with compile stashing]

Up to 80% reduction in compile time and 60% reduction in CPU load

Page 38

Summary

Page 39

JVM Performance Graph: Zing

[Graph annotations: Falcon; ReadyNow! & Compile Stashing; C4]

Page 40

The Zing JVM
§ Start fast
§ Go faster
§ Stay fast
§ Simple replacement for other JVMs
  – No recoding necessary

Try Zing free for 30 days:

azul.com/zingtrial

Page 41

@speakjava

Thank you!

Simon Ritter, Deputy CTO, Azul Systems