JVM Profiling Under da Hood

Richard Warburton - @RichardWarburto
Nitsan Wakart - @nitsanw

Why Profile?

Lies, Damn Lies and Statistical Profiling

Under the Hood

Conclusion

Measure data from your application

Exploratory Profiling

Execution Profiling = Where in code is my application spending time?

CPU Profiling Limitations

● Finds CPU bound bottlenecks

● Many problems not CPU bound
○ Networking
○ Database or External Service
○ I/O
○ Garbage Collection
○ Insufficient Parallelism
○ Blocking & Queuing Effects

Lies, Damn Lies and Statistical Profiling

Different Execution Profilers

● Instrumenting
○ Adds timing code to application

● Sampling
○ Collects thread dumps periodically
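A minimal sketch of the sampling approach, assuming nothing more than the standard `Thread.getAllStackTraces()` call: take a thread dump at a fixed interval and count which method is on top of each stack. The class name `NaiveSampler` and its methods are hypothetical, made up for illustration; real profilers are considerably more involved.

```java
import java.util.HashMap;
import java.util.Map;

/** Naive sampling profiler sketch: counts the top frame of every live thread at a fixed interval. */
final class NaiveSampler {

    /** Takes one thread dump and increments a counter per top-of-stack method. */
    static void sampleOnce(Map<String, Integer> hits) {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] stack = e.getValue();
            if (stack.length == 0) {
                continue; // no Java frames were reported for this thread
            }
            StackTraceElement top = stack[0];
            hits.merge(top.getClassName() + "." + top.getMethodName(), 1, Integer::sum);
        }
    }

    /** Samples up to `samples` times, sleeping `intervalMillis` between samples. */
    static Map<String, Integer> profile(int samples, long intervalMillis) {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            sampleOnce(hits);
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        profile(20, 10).forEach((method, count) -> System.out.println(count + "\t" + method));
    }
}
```

The hit counts only approximate where time goes, which is why the quality of the sampling mechanism matters so much.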

Sampling Profilers

WebServerThread.run()

Controller.doSomething() Controller.next()

Repo.readPerson()

new Person()

View.printHtml()

Periodicity Bias

● Bias from sampling at a fixed interval

● Periodic operations with the same frequency as the samples

● Timed operations

Periodicity Bias

a() ??? a() ??? a() ??? a() ???
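Periodicity bias can be illustrated with a small deterministic simulation. This is a sketch under made-up assumptions: a hypothetical method a() runs for the first 2 ms of every 10 ms, so its true share of time is 20%. The class `PeriodicityBias` and its parameters are illustrative only.

```java
/** Simulates sampling a periodic workload at a fixed interval. */
final class PeriodicityBias {

    /** True if the simulated application is inside a() at time t (a() runs for busyMs at the start of every periodMs). */
    static boolean inA(long t, long periodMs, long busyMs) {
        return t % periodMs < busyMs;
    }

    /** Fraction of samples that land in a() when sampling every intervalMs, starting at offsetMs. */
    static double observedFraction(long periodMs, long busyMs, long intervalMs, long offsetMs, int samples) {
        int hits = 0;
        for (int i = 0; i < samples; i++) {
            if (inA(offsetMs + i * intervalMs, periodMs, busyMs)) {
                hits++;
            }
        }
        return (double) hits / samples;
    }

    public static void main(String[] args) {
        // Sampling interval equal to the workload period: the answer depends only on the phase.
        System.out.println(observedFraction(10, 2, 10, 0, 1000));
        System.out.println(observedFraction(10, 2, 10, 5, 1000));
        // Sampling interval that does not line up with the period.
        System.out.println(observedFraction(10, 2, 7, 0, 1000));
    }
}
```

With the interval equal to the period, the sampler sees a() either always or never; with a 7 ms interval the samples spread across the period and the observed share matches the true 20%.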

Stack Trace Sampling

● JVMTI interface: GetCallTrace
○ Trigger a global safepoint (not on Zing)
○ Collect stack trace

● Large impact on application

● Samples only at safepoints

Example

    private static void outer()
    {
        for (int i = 0; i < OUTER; i++)
        {
            hotMethod(i);
        }
    }

// https://github.com/RichardWarburton/profiling-samples

Example (2)

    private static void hotMethod(final int i)
    {
        for (int k = 0; k < N; k++)
        {
            final int[] array = SafePointBias.array;
            final int index = i % SIZE;
            for (int j = index; j < SIZE; j++)
            {
                array[index] += array[j];
            }
        }
    }

-XX:+PrintSafepointStatistics

ThreadDump 48

Maximum sync time 985 ms

What's a safepoint?

● Java threads poll a global flag
○ At the back edge of ‘uncounted’ loops
○ At method entry/exit

● A safepoint poll can be delayed by:
○ Large methods
○ Long running ‘counted’ loops
○ BONUS: Page faults/thread suspension
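The two loop shapes can be sketched as follows. This assumes typical HotSpot JIT behaviour, where an int-indexed loop with a simple bound is usually treated as 'counted' and may be compiled without a safepoint poll, while a long-indexed loop is usually treated as 'uncounted'. The exact behaviour depends on the JVM version and flags, and the class `LoopShapes` is hypothetical.

```java
/** Two loops that compute the same sum but have different shapes from the JIT's point of view. */
final class LoopShapes {

    /** int index, simple bound: the shape usually treated as a 'counted' loop. */
    static long sumCounted(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    /** long index: usually treated as 'uncounted', so the back edge typically keeps its safepoint poll. */
    static long sumUncounted(int[] data) {
        long sum = 0;
        for (long i = 0; i < data.length; i++) {
            sum += data[(int) i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        java.util.Arrays.fill(data, 1);
        System.out.println(sumCounted(data) == sumUncounted(data));
    }
}
```

A thread stuck in a long-running loop of the first shape may not reach a safepoint until the loop exits, which delays every other thread waiting for the global safepoint.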

Safepoint Bias

WebServerThread.run()

Controller.doSomething() Controller.next()

Repo.readPerson()

new Person()

View.printHtml() ???

Let sleeping dogs lie?

● ‘GetCallTrace’ profilers will sample ALL threads

● Even sleeping threads...
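A small sketch of why this matters, using only the standard `Thread.getAllStackTraces()` and `Thread.getState()` calls: a thread dump includes a thread that does nothing but sleep. The class `SleepyThreads` and the thread name are hypothetical. Note that filtering on `Thread.State.RUNNABLE` is only approximate, since a thread blocked in native I/O is also reported as RUNNABLE.

```java
import java.util.ArrayList;
import java.util.List;

/** Shows that a thread-dump style sample includes sleeping threads. */
final class SleepyThreads {

    /** Names of the threads a thread dump would report, optionally restricted to RUNNABLE ones. */
    static List<String> sampledThreadNames(boolean runnableOnly) {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (!runnableOnly || t.getState() == Thread.State.RUNNABLE) {
                names.add(t.getName());
            }
        }
        return names;
    }

    /** Starts a daemon thread that only sleeps, and returns once it is observed sleeping. */
    static Thread startSleeper(String name) {
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // woken up early: just exit
            }
        }, name);
        sleeper.setDaemon(true);
        sleeper.start();
        while (sleeper.getState() != Thread.State.TIMED_WAITING) {
            Thread.yield();
        }
        return sleeper;
    }

    public static void main(String[] args) {
        Thread sleeper = startSleeper("sleepy-dog");
        System.out.println("all threads:   " + sampledThreadNames(false));
        System.out.println("runnable only: " + sampledThreadNames(true));
        sleeper.interrupt();
    }
}
```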

This Application Mostly Sleeps

JVisualVM snapshot

No CPU? No profile!

JMC profile

Under the Hood

Honest Profiler

https://github.com/richardwarburton/honest-profiler

AsyncGetCallTrace

● Used by Oracle Solaris Studio

● Adapted to open source prototype by Google’s Jeremy Manson

● Unsupported, Undocumented … Underestimated

SIGPROF - Interrupt Handlers

● OS-managed, timer-based interrupt

● Interrupts the thread and directly calls an event handler

● Used by profilers we’ll be talking about

Design

[Diagram: OS Timer Thread, Signal Handlers, Log File, Processor Thread, Console UI, Graphical UI]

“You are in a maze of twisty little stack frames, all alike”

AsyncGetCallTrace under the hood

● A Java thread is ‘possessed’

● You have the PC/FP/SP

● What is the call trace?
○ jmethodId - Java Method Identifier
○ bci - Byte Code Index -> used to find line number
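The bci-to-line step can be sketched as a lookup in a line number table: each entry says at which bytecode index a source line starts, and a bci belongs to the entry with the greatest start index not exceeding it. The class `LineNumberLookup` and the table contents below are hypothetical, not taken from the JVM sources.

```java
import java.util.Map;
import java.util.TreeMap;

/** Maps a bytecode index (bci) to a source line using 'line starts at bci' entries. */
final class LineNumberLookup {

    // start bci -> source line that begins at that bci
    private final TreeMap<Integer, Integer> startBciToLine = new TreeMap<>();

    void addEntry(int startBci, int line) {
        startBciToLine.put(startBci, line);
    }

    /** Returns the line whose entry has the greatest start bci <= the given bci, or -1 if none. */
    int lineFor(int bci) {
        Map.Entry<Integer, Integer> entry = startBciToLine.floorEntry(bci);
        return entry == null ? -1 : entry.getValue();
    }

    public static void main(String[] args) {
        LineNumberLookup table = new LineNumberLookup();
        table.addEntry(0, 10);   // made-up entries for illustration
        table.addEntry(5, 11);
        table.addEntry(12, 13);
        System.out.println(table.lineFor(7));
    }
}
```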

Where Am I?

● Given a PC, what is the current method?

● Is this a Java method?
○ Each method ‘lives’ in a range of addresses

● If not, what do we do?
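The 'range of addresses' idea can be sketched as an ordered map from start address to method: find the range starting at or below the PC, then check that the PC is still inside it. The class `CodeRangeLookup`, the addresses and the method names are hypothetical; this is not how the JVM's code cache is actually implemented.

```java
import java.util.Map;
import java.util.TreeMap;

/** Resolves a program counter to the method whose code range contains it. */
final class CodeRangeLookup {

    private static final class Range {
        final long endExclusive;
        final String method;

        Range(long endExclusive, String method) {
            this.endExclusive = endExclusive;
            this.method = method;
        }
    }

    // start address -> range beginning at that address
    private final TreeMap<Long, Range> byStart = new TreeMap<>();

    void register(long start, long endExclusive, String method) {
        byStart.put(start, new Range(endExclusive, method));
    }

    /** Returns the method containing pc, or null if pc is outside every known range (not a Java method). */
    String methodFor(long pc) {
        Map.Entry<Long, Range> entry = byStart.floorEntry(pc);
        if (entry == null || pc >= entry.getValue().endExclusive) {
            return null;
        }
        return entry.getValue().method;
    }

    public static void main(String[] args) {
        CodeRangeLookup lookup = new CodeRangeLookup();
        lookup.register(0x1000, 0x1100, "A.foo"); // made-up ranges for illustration
        lookup.register(0x2000, 0x2040, "B.bar");
        System.out.println(lookup.methodFor(0x2010));
        System.out.println(lookup.methodFor(0x1800));
    }
}
```

A null result corresponds to the 'If not, what do we do?' case: the PC is in native or runtime code and has to be handled separately.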

Java Method? Which line?

● Given a PC, what is the current line?
○ Not all instructions map directly to a source line

● Given super-scalar CPUs what does PC mean?

● What are the limits of PC accuracy?

“> I think Andi mentioned this to me last year --
> that instruction profiling was no longer reliable.

It never was.”

Exchange between Brendan Gregg and Andi Kleen
http://permalink.gmane.org/gmane.linux.kernel.perf.user/1948

Skid

● PC indicated will be >= the PC at sample time

● Known limitation of instruction profiling

● Leads to harder ‘blame analysis’

Limits of line number accuracy:

Line number (derived from BCI) is the closest attributable BCI to the PC (-XX:+DebugNonSafepoint)

The PC itself is within some skid distance from the actual sampled instruction

The Stack

● Divided into frames
○ frame { sender*, stack*, pc }

● A singly linked list:
root(null, s0, pc1) <- call1(root, s1, pc2) <- call2(call1, s2, pc3)

● Convert to: (jmethodId, lineno)
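Walking such a linked list of frames can be sketched as below. The `Frame` class is a hypothetical stand-in: its `method` and `line` fields represent the (jmethodId, lineno) pair that a real implementation would have to resolve from each frame's pc.

```java
import java.util.ArrayList;
import java.util.List;

/** Walks a singly linked list of frames from the top frame back to the root. */
final class FrameWalk {

    static final class Frame {
        final Frame sender;   // caller's frame, null for the root
        final String method;  // stands in for jmethodId
        final int line;       // stands in for the line number resolved from the pc

        Frame(Frame sender, String method, int line) {
            this.sender = sender;
            this.method = method;
            this.line = line;
        }
    }

    /** Collects up to maxDepth "method:line" entries, starting at the top frame. */
    static List<String> callTrace(Frame top, int maxDepth) {
        List<String> trace = new ArrayList<>();
        for (Frame f = top; f != null && trace.size() < maxDepth; f = f.sender) {
            trace.add(f.method + ":" + f.line);
        }
        return trace;
    }

    public static void main(String[] args) {
        Frame root = new Frame(null, "root", 1);
        Frame call1 = new Frame(root, "call1", 2);
        Frame call2 = new Frame(call1, "call2", 3);
        System.out.println(callTrace(call2, 10));
    }
}
```

The depth limit reflects that a call trace buffer has a fixed size, so deep stacks get truncated.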

A typical stack

● JVM Thread runner infra:
○ JavaThread::run to JavaCalls::call_helper

● Interleaved Java frames:
○ Interpreted
○ Compiled
○ Java to Native and back

● Top frame may be Java or Native

Native frames

● Ignored, but need to navigate through

● Use a dedicated FP register to find sender

● But only if compiled to do so…

● Use a last remembered Java frame instead

See: http://duartes.org/gustavo/blog/post/journey-to-the-stack/

Java Compiled Frames

● C1/C2 produce native code

● No FP register: use set frame size

● Challenge: methods can move (GC)

● Challenge: methods can get recompiled

Java Interpreter frames

● Separately managed by the runtime

● Make an effort to look like normal frames

● Challenge: may be interrupted half-way through construction...

Virtual Frames

● C1/C2 inline code (intrinsics/other methods)

● No data on stack

● Must use JVM debug info

AsyncGetCallTrace Limitations

● Only profiles running threads

● Accuracy of line info limited by reality

● Only reports Java frames/threads

● Must lookup debug info during call

Compilers: Friend or Fiend?

    void safe_reset(void *start, size_t size) {
        char *base = reinterpret_cast<char *>(start);
        char *end = base + size;
        for (char *p = base; p < end; p++) {
            *p = 0;
        }
    }

Compilers: Friend or Fiend?

    safe_reset(void*, unsigned long):
            lea     rdx, [rdi+rsi]
            cmp     rdi, rdx
            jae     .L3
            sub     rdx, rdi
            xor     esi, esi
            jmp     memset
    .L3:
            rep ret

Concurrency Bug

● Even simple concurrency bugs are hard to spot

● Unspotted race condition in the ring buffer

● Spotted thanks to open source & Rajiv Signal

[Diagrams: Writer, Reader]

Extra Credit!

Native Profiling Tools

● Profile native methods

● Profile at the instruction level

● Profile hardware counters

Perf

● A Linux profiling tool

● Can be made to work with Java

● JMH integration

● Ongoing integration efforts

Solaris Studio

● Works on Linux!

● Secret Weapon!

● Give it a go!

ZVision

● Works for Zing

● No HWC support

● Very informative

Conclusion

What did we cover?

● Biases in Profilers

● More accurate sampling

● Alternative Profiling Approaches

Don’t just blindly trust your tooling.

Test your measuring instruments

Open Source enables implementation review

Q & A

@nitsanw
psy-lob-saw.blogspot.co.uk

@richardwarburto
insightfullogic.com
java8training.com
www.pluralsight.com/author/richard-warburton
