Deep into your applications, performance & profiling
Transcript of Deep into your applications, performance & profiling
![Page 1: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/1.jpg)
DEEP INTO YOUR APPLICATION ... PERFORMANCE & PROFILING
/ Fabien Arcellier @farcellier
![Page 2: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/2.jpg)
ABOUT @FARCELLIER
Technical Architect, Developer, Life-long learner at Octo Technology
Favourite subjects: DevOps, Performance & Software craftsmanship
![Page 3: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/3.jpg)
WHAT'S ON THE MENU
What does profiling an application mean?
How does it work?
Applying it to a real-world application: memcached
![Page 4: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/4.jpg)
PROFILING IN A FEW WORDS ...
Software profiling is a form of dynamic program analysis that measures, for example:
the space or time complexity of a program
the usage of particular instructions
the frequency and duration of function calls, ...
![Page 5: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/5.jpg)
@copyright wikipedia
TO GET THIS SORT OF REPORT ...
![Page 6: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/6.jpg)
TO GET A BETTER VIEW OF WHAT HAPPENS ON YOUR HARDWARE, ...
@copyright highscalability
![Page 7: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/7.jpg)
TO IMPROVE YOUR APPLICATION PERFORMANCE, ...
@copyright macifcourseaularge
You need measurements to continuously improve your application's performance.
![Page 8: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/8.jpg)
TO UNDERSTAND YOUR APPLICATION, ...
You want to understand what is consuming your CPU.
![Page 9: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/9.jpg)
TO MONITOR YOUR SERVER, ...
[Flame graph with frames: app, __libc_start_main, main, dot, mat_mul]
You want to understand what your CPUs are doing.
![Page 10: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/10.jpg)
AT THE BEGINNING THERE IS A PROGRAM ...
int main(void) { return 0; }
int func1(void) { return 0; }
Use gcc to compile it:
gcc app.c -o app
![Page 11: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/11.jpg)
WITH A SIMPLE SYMBOL TABLE ...
readelf displays information about ELF files.
readelf -s app
45: 0000000000400580 2 FUNC GLOBAL DEFAULT 13 __libc_csu_fini
46: 00000000004004f8 11 FUNC GLOBAL DEFAULT 13 func1
...
57: 0000000000601040 0 NOTYPE GLOBAL DEFAULT 25 _end
58: 0000000000400400 0 FUNC GLOBAL DEFAULT 13 _start
59: 0000000000601038 0 NOTYPE GLOBAL DEFAULT 25 __bss_start
60: 00000000004004ed 11 FUNC GLOBAL DEFAULT 13 main
...
00000000004004ed: virtual address of the symbol
FUNC: type of the symbol
main: name of the symbol
![Page 12: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/12.jpg)
HOW DOES IT WORK?
60: 00000000004004ed 11 FUNC GLOBAL DEFAULT 13 main
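The lookup a profiler performs on such a table can be sketched in shell: given an instruction address, find the FUNC symbol whose address range contains it. This is a minimal sketch, not how perf or gdb are implemented; the function name resolve_addr and the sample address 0x4004f0 are assumptions made for illustration, and the two symbol lines are the func1 and main entries from the readelf output.

```shell
# Hypothetical helper: map an address to a function name using
# `readelf -s`-style lines read from stdin.
# Columns: Num: Value Size Type Bind Vis Ndx Name
resolve_addr() (
  target=$(($1))
  while read -r num value size type bind vis ndx name; do
    # Header lines and non-function symbols are skipped.
    [ "$type" = "FUNC" ] || continue
    start=$((0x$value))
    end=$((start + size))
    # A symbol covers the half-open range [start, start + size).
    if [ "$target" -ge "$start" ] && [ "$target" -lt "$end" ]; then
      echo "$name"
      return 0
    fi
  done
  echo "[unknown]"
  return 1
)

# 0x4004f0 is a made-up address that falls inside main.
printf '%s\n' \
  '46: 00000000004004f8 11 FUNC GLOBAL DEFAULT 13 func1' \
  '60: 00000000004004ed 11 FUNC GLOBAL DEFAULT 13 main' |
  resolve_addr 0x4004f0
```

On a real binary, the same function could be fed directly, for instance with `readelf -s app | resolve_addr 0x4004f0`.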
![Page 13: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/13.jpg)
CAPTURE EVENTS AND ASSOCIATE THEM WITH SYMBOLS
Generally, we can distinguish 3 types of profilers:
Instrumented profiling
Sampling profiling
Event-based profiling (Java, .NET, ...)
![Page 14: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/14.jpg)
INSTRUMENTED PROFILING
Gprof, Callgrind, ...
Pros
Captures all events
Granularity
Cons
Slower than raw execution (20 times slower for callgrind)
Intrusive (modifies the assembly code or emulates a virtual processor)
What they capture and what they show can differ
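The trade-off above can be illustrated with a toy probe in shell. This is only a sketch of the idea, not how gprof or callgrind work internally: every call goes through a wrapper that records one event before running the function, so no call is missed, but every call also pays for the extra work. The names traced, call_counts and TRACE_LOG are hypothetical, and mat_new and mat_mul are empty placeholders.

```shell
# Hypothetical probe: log one event per call, then run the real function.
TRACE_LOG=$(mktemp)
traced() {
  echo "$1" >>"$TRACE_LOG"
  "$@"
}

# Report the exact number of calls per function from the event log.
call_counts() {
  LC_ALL=C sort "$TRACE_LOG" | uniq -c | awk '{ print $2, $1 }'
}

# Placeholder functions standing in for real work.
mat_new() { :; }
mat_mul() { :; }

traced mat_new
traced mat_new
traced mat_new
traced mat_mul
call_counts
```

The report lists mat_mul with 1 call and mat_new with 3 calls: an instrumented run sees every call, however short.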
![Page 15: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/15.jpg)
TOOLING - CALLGRIND
Callgrind is a call-graph analyzer that comes with Valgrind.
Valgrind is a virtual machine using just-in-time (JIT) compilation techniques.
![Page 16: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/16.jpg)
EXAMPLE WITH A MATRIX CALCULATION
You can instrument your execution with callgrind and explore the result in kcachegrind.
![Page 17: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/17.jpg)
SAMPLING PROFILING
Perf, Oprofile, Intel VTune, ...
Pros
~5 or 10% slower than raw execution
Runs on any code
Cons
Some events are invisible
![Page 18: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/18.jpg)
SANDBOX - WRITE MY OWN SAMPLING PROFILER
To understand how simple a sampling profiler is, write your own thread dump using gdb.
gstack() {
  tmp=$(mktemp)
  echo thread apply all bt >"$tmp"
  gdb -batch -nx -q -x "$tmp" -p "$1"
  rm -f "$tmp"
}
Run it at a regular interval to see where your program is spending its time:
while sleep 1; do gstack @pid@ ; done
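Once a few dumps have been collected, the simplest analysis is to count which function is the innermost frame in each sample. The sketch below assumes the usual shape of gdb backtrace lines (`#0  func (...)` or `#0  ADDR in func (...)`); the function name top_frames and the three samples are made up for illustration.

```shell
# Count how often each function is the innermost frame (#0) across samples.
top_frames() {
  awk '$1 == "#0" { name = ($3 == "in") ? $4 : $2; hits[name]++ }
       END { for (n in hits) print hits[n], n }' | LC_ALL=C sort -rn
}

# Made-up excerpt of three samples, in the shape gdb prints backtraces.
top_frames <<'EOF'
#0  mat_mul (a=0x1, b=0x2) at app.c:22
#1  main () at app.c:30
#0  0x00000000004005d0 in dot (a=0x1, b=0x2) at app.c:10
#1  mat_mul (a=0x1, b=0x2) at app.c:20
#2  main () at app.c:30
#0  mat_mul (a=0x1, b=0x2) at app.c:22
#1  main () at app.c:30
EOF
```

With real data, the input would be a file of captured dumps rather than a here-document. The more often a function shows up as frame #0, the more time the program is likely spending there.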
![Page 19: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/19.jpg)
TOOLING - PERF & FLAMEGRAPH
Perf instrumentation is available on Linux 2.6+ (Ubuntu 11.10 & Red Hat 6)
It provides a common interface for hardware counters
Flamegraph is actively developed by Brendan Gregg
![Page 20: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/20.jpg)
EXAMPLE WITH A MATRIX CALCULATION
[Flame graph with frames: app, __libc_start_main, main, dot, mat_mul]
No time is recorded for mat_new, even though it is called 3 times.
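Why a function can be called several times and still never appear can be shown with a small simulation. This is a sketch under stated assumptions: the durations are invented, the sampler is idealized (one tick every fixed period, starting one period after the program starts), and simulate_sampling is a hypothetical name.

```shell
# simulate_sampling PERIOD
# stdin: one "function duration" line per call, in execution order,
# with durations in the same time unit as PERIOD.
simulate_sampling() {
  awk -v period="$1" '
    BEGIN { next_tick = period + 0 }
    {
      end = now + $2
      # Count the sampling ticks that fall inside [now, end).
      while (next_tick < end) { hits[$1]++; next_tick += period }
      calls[$1]++
      now = end
    }
    END {
      for (f in calls)
        printf "%s calls=%d samples=%d\n", f, calls[f], hits[f]
    }
  ' | LC_ALL=C sort
}

# Invented timeline: three very short mat_new calls, then one long mat_mul.
printf '%s\n' 'mat_new 2' 'mat_new 2' 'mat_new 2' 'mat_mul 5000' |
  simulate_sampling 1000
```

In this timeline all five ticks land inside mat_mul and none inside mat_new, so a sampling profiler would report nothing for mat_new even though it ran three times.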
![Page 21: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/21.jpg)
FLAMEGRAPH INSTALLATION
git clone https://github.com/brendangregg/FlameGraph.git
cd FlameGraph
sudo ln -s $PWD/flamegraph.pl /usr/bin/flamegraph.pl
sudo ln -s $PWD/stackcollapse-perf.pl /usr/bin/stackcollapse-perf.pl
sudo ln -s $PWD/stackcollapse-jstack.pl /usr/bin/stackcollapse-jstack.pl
sudo ln -s $PWD/stackcollapse-gdb.pl /usr/bin/stackcollapse-gdb.pl
![Page 22: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/22.jpg)
WHAT HAPPENS INSIDE MEMCACHED?
![Page 23: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/23.jpg)
COMPILE MEMCACHED
git clone https://github.com/memcached/memcached.git
cd memcached
./configure && make
![Page 24: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/24.jpg)
WHAT'S HIDDEN INSIDE THE MEMCACHED BINARY?
readelf -s ./memcached
...
434: 000000000040edf0 10 FUNC GLOBAL DEFAULT 13 slabs_rebalancer_resume
435: 0000000000000000 0 FUNC GLOBAL DEFAULT UND setuid@@GLIBC_2
436: 0000000000000000 0 FUNC GLOBAL DEFAULT UND event_base_loop
437: 0000000000412fd0 315 FUNC GLOBAL DEFAULT 13 pause_threads
438: 00000000004135e0 10 FUNC GLOBAL DEFAULT 13 STATS_LOCK
439: 0000000000000000 0 FUNC GLOBAL DEFAULT UND getaddrinfo@@GLIBC_2
440: 0000000000000000 0 FUNC GLOBAL DEFAULT UND strerror@@GLIBC_2
441: 000000000040f550 201 FUNC GLOBAL DEFAULT 13 do_item_unlink
442: 0000000000000000 0 FUNC GLOBAL DEFAULT UND event_init
443: 0000000000000000 0 FUNC GLOBAL DEFAULT UND sleep@@GLIBC_2
444: 0000000000412b40 247 FUNC GLOBAL DEFAULT 13 assoc_delete
...
![Page 25: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/25.jpg)
WHAT HAPPENS WHEN I WRITE 100 RECORDS TO MEMCACHED?
Run a test with valgrind (not production friendly)
Capture CPU usage with gdb
Capture CPU usage with perf_event
Capture cache misses with perf_event
![Page 26: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/26.jpg)
MEMCACHED - PROFILING WITH CALLGRIND
Understand what happens internally by following the execution trace.
valgrind --tool=callgrind --instr-atstart=no ./memcached
In another terminal:
callgrind_control -i on
php memcacheset.php
callgrind_control -i off
![Page 27: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/27.jpg)
MEMCACHED - PROFILING WITH CALLGRIND
kcachegrind callgrind.out.@pid@
![Page 28: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/28.jpg)
MEMCACHED - PROFILING WITH GDB
./memcached &
while sleep 0.1; do gstack 8748; done > stack.txt
cat stack.txt | stackcollapse-gdb.pl | flamegraph.pl > gdb_graph.svg
In another terminal:
php memcacheset.php
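The collapse step in the pipeline above turns each backtrace into one line of the folded format that flamegraph.pl reads: frames joined by `;` from root to leaf, followed by a count. The sketch below is a toy version of that idea, not the actual stackcollapse-gdb.pl script; the function name collapse_stacks and the sample backtraces are assumptions made for illustration.

```shell
# Toy collapse: read gdb-style backtraces, print "root;...;leaf count".
collapse_stacks() {
  awk '
    function end_sample() {
      if (stack != "") { count[stack]++; stack = "" }
    }
    # Frame lines look like "#N  [ADDR in] FUNC (...) ...".
    /^#[0-9]+/ {
      name = ($3 == "in") ? $4 : $2
      # gdb prints the leaf first, so prepend to get root-to-leaf order.
      stack = (stack == "") ? name : name ";" stack
      next
    }
    # Any other line (blank line, thread header) ends the current sample.
    { end_sample() }
    END { end_sample(); for (s in count) print s, count[s] }
  ' | LC_ALL=C sort
}

# Made-up samples, separated by blank lines.
collapse_stacks <<'EOF'
#0  dot (a=0x1, b=0x2) at app.c:10
#1  mat_mul (a=0x1, b=0x2) at app.c:20
#2  main () at app.c:30

#0  mat_mul (a=0x1, b=0x2) at app.c:22
#1  main () at app.c:30

#0  dot (a=0x1, b=0x2) at app.c:10
#1  mat_mul (a=0x1, b=0x2) at app.c:20
#2  main () at app.c:30
EOF
```

Each distinct stack becomes one line with its sample count, which is what gives a flame graph the width of each box.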
![Page 29: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/29.jpg)
MEMCACHED - PROFILING WITH PERF
We capture events to build the call graph
perf record -g ./memcached
In another terminal:
php memcacheset.php
To show an interactive report:
perf report
To print the report as plain text:
perf report --stdio
![Page 30: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/30.jpg)
MEMCACHED - PROFILING CPU CYCLES WITH PERF
perf script | stackcollapse-perf.pl | flamegraph.pl > graph_stack_missing.svg
Flamegraph
Some information from the kernel is missing.
![Page 31: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/31.jpg)
MEMCACHED - PROFILING CPU CYCLES WITH PERF - WITH KERNEL STACK TRACE
./memcached &
sudo perf record -a -g -p @pid@
In another terminal:
php memcacheset.php
Generate the flamegraph:
perf script | stackcollapse-perf.pl | flamegraph.pl > graph.svg
Flamegraph
![Page 32: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/32.jpg)
MEMCACHED - PROFILING CACHE MISSES WITH PERF
./memcached &
sudo perf record -e cache-misses -a -g -p @pid@
![Page 33: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/33.jpg)
SYSTEM - WHAT IS YOUR SYSTEM DOING?
sudo perf record -a -g
![Page 34: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/34.jpg)
USE FLAMEGRAPH WITH JAVA
You can export a flamegraph from jstack output
Logstash contention flamegraph
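One possible way to wire this up, as a sketch only: take several jstack dumps of the JVM and feed them through stackcollapse-jstack.pl and flamegraph.pl from the FlameGraph repository. The function name sample_java and its arguments are hypothetical, and the sketch assumes jstack and both scripts are on the PATH.

```shell
# sample_java PID COUNT INTERVAL OUT.svg
# Take COUNT thread dumps of a JVM, INTERVAL seconds apart,
# and render them as a flame graph.
sample_java() (
  pid=$1 count=$2 interval=$3 out=$4
  i=0
  while [ "$i" -lt "$count" ]; do
    jstack "$pid"
    sleep "$interval"
    i=$((i + 1))
  done | stackcollapse-jstack.pl | flamegraph.pl >"$out"
)
```

For example, `sample_java @pid@ 100 0.1 java_graph.svg` would take 100 dumps, 0.1 seconds apart. Because jstack dumps every thread whether it is running or waiting, the result is useful for spotting contention as well as CPU usage.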
![Page 35: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/35.jpg)
GOING FURTHER
Perf wiki
Callgrind docs
Brendan Gregg website
How profilers lie: the cases of gprof and KCachegrind
Intel VTune
![Page 36: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/36.jpg)
TO SUMMARIZE
Prefer:
perf when you are looking for a bottleneck or want to watch what happens on a machine
callgrind when you want to understand what happens in the code and performance is not a requirement
![Page 37: Deep into your applications, performance & profiling](https://reader036.fdocuments.us/reader036/viewer/2022062316/58803df91a28abfd0a8b589f/html5/thumbnails/37.jpg)