Production Profiling: What, Why and How · Logs of GC events Often manual, aids system...
Transcript of Production Profiling: What, Why and How · Logs of GC events Often manual, aids system...
![Page 1: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/1.jpg)
Production Profiling: What, Why and How
Richard Warburton (@richardwarburto)Sadiq Jaffer (@sadiqj)
https://www.opsian.com
![Page 2: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/2.jpg)
Why Performance MattersDevelopment isn’t Production
Profiling vs MonitoringProduction Profiling
Conclusion
![Page 3: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/3.jpg)
Customer Experience
![Page 4: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/4.jpg)
Amazon: 100ms of latency costs 1% of sales
Google: 500ms seconds in search page generation time drops traffic by 20%
Responsive Applications make more Money
![Page 5: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/5.jpg)
Stop Costly Downtime
![Page 6: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/6.jpg)
Reduce Costs
![Page 7: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/7.jpg)
Why Performance MattersDevelopment isn’t Production
Profiling vs MonitoringProduction Profiling
Conclusion
![Page 8: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/8.jpg)
Development isn’t Production
Performance testing in development can be easier
May not have access to production
Tooling often desktop-based
Not representative of production
![Page 9: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/9.jpg)
Unrepresentative Hardware
vs
![Page 10: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/10.jpg)
Unrepresentative Software
![Page 11: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/11.jpg)
Unrepresentative Workloads
vs
![Page 12: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/12.jpg)
The JVM may have very different behaviour in production
Hotspot does adaptive optimisation
Production may optimise differently
![Page 13: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/13.jpg)
![Page 14: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/14.jpg)
Why Performance MattersDevelopment isn’t Production
Profiling vs MonitoringProduction Profiling
Conclusion
![Page 15: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/15.jpg)
Ambient/Passive/System Metrics
Preconfigured numerical measure about the
system
CPU Time Usage / Page-load Times
Cheap and sometimes effective
![Page 16: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/16.jpg)
LoggingRecords arbitrary events emitted by the system being monitored
log4j/slf4j/logback
Logs of GC events
Often manual, aids system understanding, expensive
![Page 17: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/17.jpg)
Coarse Grained Instrumentation
Measures time within some instrumented section of the code
Time spent inside the controller layer of your web-app or performing SQL queries
More detailed and actionable though expensive
![Page 18: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/18.jpg)
Production ProfilingWhat methods use up CPU time?
What lines of code allocate the most objects?
Where are your CPU Cache misses coming from?
Automatic, can be cheap but often isn’t
![Page 19: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/19.jpg)
Where Instrumentation can be blind in the Real World
Problem: Every 5 seconds an HTTP endpoint would be really slow.
Instrumentation: on the servlet request, didn’t even show the pause!
Cause: Tomcat expired its resources cache every 5 seconds, on load one resource
scanned the entire classpath
![Page 20: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/20.jpg)
![Page 21: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/21.jpg)
Surely a better way?
Not just Metrics - Actionable Insights
Diagnostics aren’t Diagnosis
What about Profiling?
![Page 22: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/22.jpg)
Why Performance MattersDevelopment isn’t Production
Profiling vs MonitoringProduction Profiling
Conclusion
![Page 23: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/23.jpg)
How to use Production Profilers
1) Extract relevant time period and apps/machines
2) Choose a type of profile: CPU Time/Wallclock Time/Memory
3) View results to tell you what the dominant consumer of a resource is
4) Fix biggest bottleneck
5) Deploy / Iterate
![Page 24: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/24.jpg)
CPU Time vs Wallclock Time
![Page 25: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/25.jpg)
Profiling Hotspots
![Page 26: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/26.jpg)
Profiling Treeviews
![Page 27: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/27.jpg)
Profiling Flamegraphs
![Page 28: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/28.jpg)
Instrumenting Profilers
Add instructions to collect timings (Eg: JVisualVM Profiler)
Inaccurate - modifies the behaviour of the program
High Overhead - > 2x slower
![Page 29: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/29.jpg)
Sampling/Statistical Profilers
WebServerThread.run()
Controller.doSomething() Controller.next()
Repo.readPerson()
new Person()
View.printHtml() ??? ???
![Page 30: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/30.jpg)
Safepoint Bias after Inlining
WebServerThread.run()
Controller.doSomething() Controller.next()
Repo.readPerson()
new Person()
View.printHtml() ???
![Page 31: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/31.jpg)
Time to Safepoint
-XX:+PrintSafepointStatistics
Thre
ads
Safepoint poll
VM O
pera
tion
![Page 32: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/32.jpg)
Advanced Statistical Profiling in Java
OS Signals to interrupt threads on resource consumption threshold
JVM’s signal handler-safe AsyncGetCallTrace to walk the stack
![Page 33: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/33.jpg)
People are put off by practical as much as technical issues
![Page 34: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/34.jpg)
Barriers to Ad-Hoc Production Profiling
Generally requires access to production
Process involves manual work - hard to automate
Low-overhead open source profilers unsupported
![Page 35: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/35.jpg)
What if we profiled all the time?
![Page 36: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/36.jpg)
Historical Data
Allows for post-hoc incident analysis
Enables correlation with other data/metrics
Performance regression analysis
![Page 37: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/37.jpg)
Putting Samples in Context
Application version
Environment parameters (machine type, CPU, location, etc.)
Ad-hoc profiling we can’t do this
![Page 38: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/38.jpg)
Opsian - Continuous Profiling
Opsian Aggregation
service
Web ReportsJVM Agents
![Page 39: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/39.jpg)
Summary
We can profile in production with low overhead
To overcome practical issues we can profile production all the time
Profiling all the time opens up new capabilities
![Page 40: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/40.jpg)
Why Performance MattersDevelopment isn’t Production
Profiling vs MonitoringProduction Profiling
Conclusion
![Page 41: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/41.jpg)
Performance Matters
Development isn’t Production
Metrics can be unactionable
Instrumentation has high overhead
Continuous Profiling provides insight
![Page 42: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/42.jpg)
We need an attitude shift on profiling + monitoring
![Page 43: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/43.jpg)
ContinuousProactive not Reactive
Systematic not Ad Hoc
![Page 44: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/44.jpg)
Please do Production Profiling.All the time.
![Page 45: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/45.jpg)
Any Questions?https://www.opsian.com/
![Page 46: Production Profiling: What, Why and How · Logs of GC events Often manual, aids system understanding, expensive. ... on the servlet request, didn’t even show the pause! Cause: Tomcat](https://reader034.fdocuments.us/reader034/viewer/2022042310/5ed79a6b30ed446bab02c484/html5/thumbnails/46.jpg)
The End