HotSpot™: A Huge Step Beyond JIT’s. Zhanyong Wan, May 1st, 2000.



5/1/2000 Zhanyong Wan 2

Sources of Information

From Sun’s web-site
  – HotSpot white paper: http://java.sun.com/products/hotspot/whitepaper.html
  – Various articles on Sun’s web-site: http://java.sun.com/products/hotspot/
From other web-sites
  – Java on Steroids: Sun's High-Performance Java Implementation, U. Hölzle et al. (slides from HotChips IX, August 1997): http://www.cs.ucsb.edu/oocsb/papers/HotChips.pdf
  – The HotSpot Virtual Machine, Bill Venners: http://www.artima.com/designtechniques/hotspot.html
  – HotSpot: A new breed of virtual machine, Eric Armstrong: http://www.javaworld.com/jw-03-1998/f_jw-03-hotspot.html


Overview

Why Java is different
Why JIT is not good enough
What HotSpot does
The HotSpot architecture
  – Memory model
  – Thread model
  – Adaptive optimization
Conclusions


History

1st generation JVM
  – Purely interpreting
  – 30-50 times slower than C++
2nd generation JVM
  – JIT compilers
  – 3-10 times slower than C++
Static compilers
  – Better performance than JIT’s


The Future?

HotSpot
  – Dynamic, fully optimizing compiler
  – Close-to-C++ performance
  – May even exceed the speed of C++ in the future


Questions of Interest

How is it possible that HotSpot runs programs faster than the native code generated by a static optimizing Java compiler?
How does HotSpot score? (The collection of technologies used by HotSpot.)
Where did they get the ideas?
Which of these technologies also apply in other systems (e.g. JIT, static source code/bytecode compiler, C++)?
Can Java be made to surpass the performance of C++, or is this just hype?


Why Java Is Different (from C++)

Granularity of factoring
  – Smaller classes
  – Smaller methods
  – More frequent calls
  – Standard compiler analysis fails
Dynamic dispatch
  – Slower calls for virtual functions
  – Much more frequent than in C++
Sophisticated run-time system
  – Allocation, garbage collection
  – Threads, synchronization
Dynamically changing program
  – Classes loaded/discarded on the fly


Why Java Is Different (cont’d)

Distributed in a portable form
  – A compiler can generate optimal machine code for a particular processor version
    • e.g. Pentium vs. Pentium II
  – Welcomes dynamic compilation (developed in the last decade)!


Find the Java Bottleneck

Time used in a typical Java program executed w/ JDK interpreter:
  – Allocation/GC: 1/6
  – Synchronization: 1/6
  – Byte code: 2/3
  – Native methods: negligible
Performance critical code: the “hot spots”

[Pie chart: Byte codes, Alloc/GC, Synch, Native]
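One rough way to read these fractions is through Amdahl's law. This is an estimate added here for illustration, not something stated on the slide. Assume only the bytecode portion, $p = 2/3$ of the run time, is sped up by a factor $s$, while allocation/GC and synchronization stay as they are:

```latex
\[
S(s) = \frac{1}{(1-p) + p/s}, \qquad p = \tfrac{2}{3}
\quad\Longrightarrow\quad
\lim_{s\to\infty} S(s) = \frac{1}{1 - \tfrac{2}{3}} = 3 .
\]
```

Under that assumption, even an arbitrarily fast bytecode compiler caps the overall gain at about 3x, which suggests why allocation/GC and synchronization have to be attacked as well.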


Why JIT Is Not Good Enough

Compiles on a method-by-method basis, when a method is first invoked
Compilation consumes “user time”
  – Startup latency
  – Dilemma: either good code or fast compiler
    • Gains of better optimization may not justify extra compile time
    • More concerned w/ generating code quickly than w/ generating the quickest code
Root of problem: compilation is too eager


The Baaad Way to Optimize

People try to help: the optimization lore
  – Make methods final or static
  – Large classes/methods
  – Avoid interfaces (interface method invocation much slower than regular dynamic method dispatch)
  – Avoid creating lots of short-lived objects
  – Avoid synchronization (very expensive)
  – Against good OO design!
“Premature optimization is the root of all evil.” (Donald Knuth)


The HotSpot Way to Optimize

Optimize only when you know you have a problem
  1. A program starts off being interpreted
  2. A profiler collects run-time info in the background
  3. After a while, a set of hot spots is identified
  4. A thread is launched to compile the methods in the hot spots
    • Execution of the program is *not* blocked
    • “Take your time!” – fully optimizing
    • Take advantage of the late compilation: run-time info used
  5. Once a method is compiled, it doesn’t need to be interpreted
  6. Native code can be discarded when the hot spots change
    • Keeping the footprint small
    • Bytecode is always kept around
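The six steps above can be sketched as a toy model in Java. This is only an illustration under stated assumptions: the class `AdaptiveMethod`, the `HOT_THRESHOLD` value, and the use of a plain invocation counter as the "profiler" are hypothetical and are not taken from HotSpot itself. A lambda stands in for generated native code.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

/** Toy model of adaptive compilation; names and threshold are illustrative only. */
class AdaptiveMethod {
    static final int HOT_THRESHOLD = 1_000;            // assumed value, not HotSpot's

    private final IntUnaryOperator interpreted;        // slow path, always kept (the "bytecode")
    private final IntUnaryOperator optimized;          // stands in for generated native code
    private final AtomicInteger invocations = new AtomicInteger();
    private volatile IntUnaryOperator compiled;        // null until the compiler thread finishes
    private volatile Thread compilerThread;

    AdaptiveMethod(IntUnaryOperator interpreted, IntUnaryOperator optimized) {
        this.interpreted = interpreted;
        this.optimized = optimized;
    }

    int invoke(int arg) {
        IntUnaryOperator fast = compiled;
        if (fast != null) {
            return fast.applyAsInt(arg);               // step 5: compiled code replaces interpretation
        }
        if (invocations.incrementAndGet() == HOT_THRESHOLD) {   // steps 2-3: a hot spot is found
            Thread t = new Thread(() -> compiled = optimized);  // step 4: compile in the background
            t.setDaemon(true);
            compilerThread = t;
            t.start();                                 // the caller is not blocked
        }
        return interpreted.applyAsInt(arg);            // step 1: keep interpreting meanwhile
    }

    /** Step 6: drop the native code; the interpreted form is still available. */
    void discardCompiledCode() {
        compiled = null;
        invocations.set(0);
    }

    boolean isCompiled() {
        return compiled != null;
    }

    /** Waits for a pending background compilation (used only by the demo). */
    void awaitCompiler() {
        Thread t = compilerThread;
        if (t == null) {
            return;
        }
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AdaptiveMethod square = new AdaptiveMethod(x -> x * x, x -> x * x);
        for (int i = 0; i < HOT_THRESHOLD; i++) {
            square.invoke(i);
        }
        square.awaitCompiler();
        System.out.println("compiled after warm-up: " + square.isCompiled());
        square.discardCompiledCode();
        System.out.println("compiled after discard: " + square.isCompiled());
    }
}
```

The key design point the sketch tries to show is that the interpreted path never goes away, so switching to compiled code and discarding it again are both cheap pointer swaps.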


The HotSpot Way (cont’d)

Tackles each of the bottlenecks
  – Adaptive optimization
  – Fast, accurate garbage collection
  – Fast thread synchronization
Performance
  – 2-3 times faster than JITs
  – Comparable to C++
Most importantly, eliminates the “performance excuse” for poor designs/code


The HotSpot Architecture

Memory model
Thread model
Adaptive compiler


The HotSpot Memory Model

Object references
  – Java 2 SDK: as indirect handles
    • Relocating objects made easy
    • A significant performance bottleneck
  – HotSpot: as direct pointers
    • A performance boost
    • GC must adjust all references to an object when it is relocated
Object headers
  – Java 2 SDK: 3-word
  – HotSpot: 2-word
    • 2 bits for GC mark (reference count removed?)
    • An 8% savings in heap size
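The trade-off between handles and direct pointers can be sketched with a toy heap. This is a minimal illustration, not how either VM lays out memory: the class `ReferenceModels`, the array-based heap, and the one-word objects are all assumptions made for the example.

```java
/** Toy heap contrasting handle-based and direct references; all names are illustrative. */
class ReferenceModels {
    final int[] heap = new int[16];        // one word of payload per "object"
    final int[] handleTable = new int[4];  // handle -> current heap address

    /** Handle access costs two loads: the table entry, then the object. */
    int readViaHandle(int handle) {
        return heap[handleTable[handle]];
    }

    /** Direct access costs a single load. */
    int readDirect(int address) {
        return heap[address];
    }

    /** Moving a handle-referenced object updates exactly one table entry. */
    void relocateViaHandle(int handle, int newAddress) {
        heap[newAddress] = heap[handleTable[handle]];
        handleTable[handle] = newAddress;
    }

    /** Moving a directly referenced object forces the GC to patch every reference to it. */
    int relocateDirect(int[] references, int oldAddress, int newAddress) {
        heap[newAddress] = heap[oldAddress];
        int patched = 0;
        for (int i = 0; i < references.length; i++) {
            if (references[i] == oldAddress) {
                references[i] = newAddress;
                patched++;
            }
        }
        return patched;
    }

    public static void main(String[] args) {
        ReferenceModels m = new ReferenceModels();
        m.heap[2] = 42;
        m.handleTable[0] = 2;
        m.relocateViaHandle(0, 9);
        System.out.println("value via handle after move: " + m.readViaHandle(0));

        int[] refs = {2, 5, 2};            // two of these point at address 2
        int patched = m.relocateDirect(refs, 2, 11);
        System.out.println(patched + " direct references patched");
    }
}
```

Every ordinary read pays the extra load in the handle model, while the direct model pays only when an object actually moves, which is the rarer event.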


Garbage Collection Background

GC traditionally considered inefficient
  – Takes 1/6 of the time in an interpreting JVM
  – Even worse in a JIT VM
Modern GC technology
  – Performs substantially better than explicit freeing
  – How can this be true?
    • Unnecessary copies avoided
    • Memory segmentation, spatial locality


The HotSpot Garbage Collector

A high-level GC framework
  – New collection algorithms can be “plugged-in”
  – Currently has 3 cooperating GC algorithms
Major features
  – Fast allocation and reclamation
  – Fully accurate: guarantees full memory reclamation
  – Completely eliminates memory fragmentation
  – Incremental, no perceivable pauses (usually < 10ms)
  – Small memory overhead
    • 2-bit GC mark per object
    • 2-word object header (instead of 3-word in Java 2 SDK)


The HotSpot GC: Accuracy

A partially accurate (conservative) collector must
  – Either avoid relocating objects
  – Or use handles to refer indirectly to objects (slow)
The HotSpot collector
  – Fully accurate
  – All inaccessible objects can be reclaimed
  – All objects can be relocated
    • Eliminates memory fragmentation
    • Increases memory locality


The HotSpot GC: the Structure

Three cooperating collectors
  – A generational copying collector
    • For short-lived objects
  – A mark-compact “old object” collector
    • For longer-lived objects when the live object set is small
  – An incremental “pauseless” collector
    • For longer-lived objects when the live object set is big


Generational Copying Collector

Observation: the vast majority (often > 95%) of the objects are very short-lived
The way it works
  – A memory area is reserved as an object “nursery”
  – Allocation is just updating a pointer and checking for overflow: extremely fast
  – By the time the nursery overflows, most objects in it are dead; the collector just moves the few survivors to the “old object” memory area
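"Updating a pointer and checking for overflow" can be sketched as a bump-pointer allocator. This is a minimal sketch under assumptions: the class `Nursery`, word-sized units, and the `-1` overflow signal are hypothetical, and copying survivors to the old-object area is left out.

```java
/** Toy bump-pointer nursery; addresses are word offsets. Names are illustrative only. */
class Nursery {
    private final int capacityWords;
    private int top = 0;                     // offset of the next free word

    Nursery(int capacityWords) {
        this.capacityWords = capacityWords;
    }

    /**
     * Allocates sizeWords words and returns the start offset,
     * or -1 when the nursery is full and a collection would be needed.
     */
    int allocate(int sizeWords) {
        if (sizeWords <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        if (sizeWords > capacityWords - top) {
            return -1;                       // overflow check
        }
        int address = top;
        top += sizeWords;                    // the whole allocation: bump one pointer
        return address;
    }

    /** Once survivors have been moved out, the entire nursery is reclaimed in one step. */
    void reset() {
        top = 0;
    }

    int usedWords() {
        return top;
    }

    public static void main(String[] args) {
        Nursery nursery = new Nursery(8);
        System.out.println("first object at " + nursery.allocate(3));
        System.out.println("second object at " + nursery.allocate(4));
        System.out.println("third object fits: " + (nursery.allocate(2) != -1));
        nursery.reset();
        System.out.println("used after reset: " + nursery.usedWords());
    }
}
```

Note that dead objects cost nothing here: reclamation is a single reset, so the collector's work is proportional to the few survivors rather than to the garbage.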


Mark-Compact Collector

Rare case
  – Triggered by low-memory conditions or programmatic requests
Time proportional to the size of the set of live objects
  – Calls for an incremental collector when the size is large


Incremental Pauseless Collector

An alternative to the mark-compact collector
Relatively constant pause time even w/ extremely large data set
Suitable for server applications and soft real-time applications (games, animations)
The way it works
  – The “train” algorithm
  – Breaks up GC pauses into tiny pauses
  – Not a hard real-time algorithm: no guarantee for upper limit on pause times
Side-benefit: better memory locality
  – Tends to relocate tightly-coupled objects together


The HotSpot Thread Model

Native thread support
  – Currently supports Solaris & 32-bit Windows
  – Preemption
  – Multiprocessing
Per-thread activation stack is shared w/ native methods
  – Fast calls between C and Java


Thread Synchronization

Takes 1/6 of the time in an interpreting JVM
  – (I think) the proportion can be even higher for a JIT
HotSpot’s thread synchronization
  – Ultra-fast (“a breakthrough”)
  – Constant time for all uncontended (no rival) synch
  – Fully scalable to multiprocessor
  – Makes fine-grain synch practical, encouraging good OO design
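The slides do not say how the constant-time uncontended case is achieved. One common way to get that property is a fast path made of a single compare-and-swap, sketched below. This is an assumption for illustration, not HotSpot's actual mechanism: the class `FastPathLock` is hypothetical, it spins instead of blocking, and unlike Java monitors it is not reentrant.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Toy lock with a one-CAS uncontended path; illustrative only and not reentrant. */
class FastPathLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();
    private final AtomicInteger contendedAcquires = new AtomicInteger();

    void lock() {
        Thread me = Thread.currentThread();
        if (owner.compareAndSet(null, me)) {
            return;                          // uncontended: one CAS, constant time
        }
        contendedAcquires.incrementAndGet(); // slow path, taken only when there is a rival
        while (!owner.compareAndSet(null, me)) {
            Thread.yield();                  // a real VM would block the thread instead
        }
    }

    void unlock() {
        if (!owner.compareAndSet(Thread.currentThread(), null)) {
            throw new IllegalMonitorStateException("current thread does not hold the lock");
        }
    }

    int contendedAcquires() {
        return contendedAcquires.get();
    }

    public static void main(String[] args) throws InterruptedException {
        FastPathLock lock = new FastPathLock();
        int[] counter = {0};
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                lock.lock();
                try {
                    counter[0]++;
                } finally {
                    lock.unlock();
                }
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("counter = " + counter[0]
                + ", contended acquires = " + lock.contendedAcquires());
    }
}
```

When no other thread competes, `lock()` and `unlock()` each cost one atomic operation regardless of how many locks exist, which is what makes fine-grained locking affordable.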


Adaptive Inlining

Method invocations reduce the effectiveness of optimizers
  – Standard optimizers don’t perform well across method boundaries (need bigger block of code)
  – Inlining is the solution
Inlining has problems
  – Increased memory foot-print
  – Inlining is harder w/ OO languages because of dynamic dispatching (worse in Java than in C++)
HotSpot uses run-time information to
  – Inline only the critical methods
  – Limit the set of methods that might be invoked at a certain point
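Limiting "the set of methods that might be invoked at a certain point" can be illustrated with a call-site profile and a guarded inline. The sketch below is hand-written Java that mimics what a compiler might generate. The names `InliningSketch`, `CallSiteProfile`, and `areaGuarded` are hypothetical, and a real VM does this on compiled code rather than at source level.

```java
import java.util.HashSet;
import java.util.Set;

/** Source-level imitation of profile-guided inlining; all names are illustrative. */
class InliningSketch {
    interface Shape {
        double area();
    }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    /** Run-time profile of the receiver classes observed at one call site. */
    static final class CallSiteProfile {
        private final Set<Class<?>> receivers = new HashSet<>();

        void record(Object receiver) {
            receivers.add(receiver.getClass());
        }

        /** True when only one receiver class has been seen, so inlining is attractive. */
        boolean isMonomorphic() {
            return receivers.size() == 1;
        }
    }

    /** What the call s.area() might become once the profile shows only Square. */
    static double areaGuarded(Shape s) {
        if (s.getClass() == Square.class) {  // cheap guard replaces dynamic dispatch
            Square q = (Square) s;
            return q.side * q.side;          // inlined body of Square.area()
        }
        return s.area();                     // fallback: ordinary dynamic dispatch
    }

    public static void main(String[] args) {
        CallSiteProfile profile = new CallSiteProfile();
        profile.record(new Square(1));
        profile.record(new Square(2));
        System.out.println("monomorphic: " + profile.isMonomorphic());
        System.out.println("guarded area of square: " + areaGuarded(new Square(3)));
        System.out.println("guarded area of circle: " + areaGuarded(new Circle(1)));
    }
}
```

Once the body is inlined, the surrounding optimizer sees one larger block of code, which is the point made in the first bullet of the slide.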


Dynamic Deoptimization

Simple inlining may violate the Java semantics
  – A program can change the patterns of method invocation
  – Java program can change on the fly via dynamic class loading/discarding
  – Optimizations may become invalid
Must be able to deoptimize dynamically!
  – HotSpot can deoptimize (revert back to bytecode?) a hot spot even during the execution of the code for it.
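The idea of an optimization that becomes invalid when a class is loaded can be sketched as follows. This is a simplified illustration with hypothetical names (`DeoptSketch`, `onClassLoaded`); it models only the switch back to the unoptimized path on the next call, not the harder case the slide mentions of reverting code that is currently executing.

```java
/** Toy model of an assumption-dependent optimization and its invalidation; names are illustrative. */
class DeoptSketch {
    interface Greeter {
        String greet();
    }

    static final class English implements Greeter {
        public String greet() { return "hello"; }
    }

    static final class French implements Greeter {
        public String greet() { return "bonjour"; }
    }

    // Assumption the "compiled" code relies on: English is the only Greeter loaded.
    private volatile boolean singleImplementation = true;
    private int deoptimizations = 0;

    String call(Greeter g) {
        if (singleImplementation) {
            return "hello";                  // inlined English.greet(); valid only while the assumption holds
        }
        return g.greet();                    // deoptimized path: full dynamic dispatch
    }

    /** Hook standing in for the VM noticing a newly loaded class. */
    void onClassLoaded(Class<?> loaded) {
        boolean breaksAssumption =
                Greeter.class.isAssignableFrom(loaded) && loaded != English.class;
        if (breaksAssumption && singleImplementation) {
            singleImplementation = false;    // throw away the optimized code
            deoptimizations++;
        }
    }

    int deoptimizations() {
        return deoptimizations;
    }

    public static void main(String[] args) {
        DeoptSketch vm = new DeoptSketch();
        System.out.println(vm.call(new English()));
        vm.onClassLoaded(French.class);      // dynamic class loading invalidates the inlining
        System.out.println(vm.call(new French()));
        System.out.println("deoptimizations: " + vm.deoptimizations());
    }
}
```

Without the invalidation hook, the call with a `French` receiver would still return the inlined English result, which is exactly the semantic violation the slide warns about.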


Fully Optimizing Compiler

Performs all the classic optimizations
  – Dead code elimination
  – Loop invariant hoisting
  – Common sub-expression elimination
  – Constant propagation
  – And more …
Java-specific optimizations
  – Null-check elimination
  – Range-check elimination
Global graph coloring register allocator
Highly portable
  – Relying on a small machine description file
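Two of the listed optimizations, loop invariant hoisting and range-check elimination, can be shown as a before/after pair written by hand. This is only a source-level illustration with hypothetical method names; the compiler performs these transformations on its internal representation, and the comments describe the checks conceptually.

```java
/** Hand-written before/after view of two optimizations; method names are illustrative. */
class OptimizationSketch {
    /** As written: x * y is recomputed, and each a[i] implies a null check and a bounds check. */
    static int sumScaledNaive(int[] a, int x, int y) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * (x * y);
        }
        return sum;
    }

    /** What an optimizer could effectively turn the loop into. */
    static int sumScaledOptimized(int[] a, int x, int y) {
        int scale = x * y;                   // loop-invariant expression hoisted out of the loop
        int n = a.length;                    // one null check here covers the whole loop
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a[i] * scale;             // 0 <= i < n is provable, so the range check can go
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sumScaledNaive(data, 2, 3));
        System.out.println(sumScaledOptimized(data, 2, 3));
    }
}
```

Both methods compute the same value for every input, including on integer overflow, since the arithmetic performed per element is identical.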


Transparent Debugging & Profiling Semantics

Native code generation & optimization fully transparent to the programmer
  – Uses two stacks
    • One real, one simulating
  – Overhead of two stacks?
Pure bytecode semantics: easy debugging & profiling
Question: what’s the point of a transparent profiling semantics?


Performance Evaluation

Micro-benchmarks: not the way
  – No or few method calls/synchronizations
  – Small live data set
  – No correlation w/ real programs
  – Give unrealistic results for HotSpot
SPEC JVM98 benchmark
  – The only industry-standard benchmark for Java
  – Predictive of the performance across a number of real applications


Where are the ideas from?

Mostly from the last decade’s academic work
  – Dynamic compilation
  – Modern GC
  – HotSpot puts them together
Academic research is relevant!


(My) Conclusions

HotSpot is great
  – Many new technologies previously only seen in academia
Java performance may come close to or exceed that of current C++ implementations
However, Sun’s argument that Java can be faster than C++ is not convincing yet:
  – C++ has better control over machine resources
  – Many technologies used in HotSpot can be exploited for C++ as well. Especially:
    • Fast synchronization
    • Dynamic compilation
    • Maybe GC (for some dialects of C++)
  – Whether Java can exceed C++ remains to be tested