Nagios Core 4 - NETWAYS in Nagios Core query: “@ \0” ... commands are always handled ... Andreas...
Nagios Core 4 News and improvements
Andreas Ericsson
About me
32 years old
Programming since I was seven
Work as “core architect” at op5
Nagios Core co-maintainer since 2009
Will be found at the bar in the evenings
Nagios Core 4
Goals
Algorithm analysis crash course
Bottleneck analysis of Nagios Core 3
Bottleneck solutions in Nagios Core 4
New features
Future possibilities
Nagios Core 4 goals
Stability
low complexity
testing
Scalability
efficient, reusable, well-tested algorithms
efficient resource usage
Simplicity
useful APIs
no “magic” and no bloat
Algorithm analysis – big Oh
n = 100, one operation = 1 microsecond
O(1) = 0.000001 second
O(lg n) = 0.0000046 seconds (natural logarithm: ln 100 ≈ 4.6)
O(n) = 0.0001 second
O(n * lg n) = 0.00046 seconds
O(n^2) = 0.01 second
O(2^n) = 4*10^16 years
O(n!) = 2.96*10^144 years
Conclusion:
Good algorithms > beefy hardware
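The polynomial rows of the table can be reproduced with a few lines of integer arithmetic. The sketch below is illustrative only: the function names are made up for this example, and it uses the base-2 logarithm (rounded down), so the logarithmic rows come out as 6 and 600 microseconds rather than the 4.6 and 460 the slide gets from the natural logarithm.

```c
#include <stdint.h>

/* Number of halving steps needed to reduce n to 1, i.e. floor(log2(n)). */
unsigned int steps_log2(uint64_t n)
{
    unsigned int steps = 0;

    while (n > 1) {
        n /= 2;
        steps++;
    }
    return steps;
}

/* Estimated cost in microseconds, assuming one operation = 1 microsecond. */
uint64_t cost_constant(uint64_t n)     { (void)n; return 1; }
uint64_t cost_log(uint64_t n)          { return steps_log2(n); }
uint64_t cost_linear(uint64_t n)       { return n; }
uint64_t cost_linearithmic(uint64_t n) { return n * steps_log2(n); }
uint64_t cost_quadratic(uint64_t n)    { return n * n; }
```

For n = 100 the quadratic cost is 10000 microseconds, which is the 0.01 second from the table. The exponential and factorial rows overflow 64-bit integers long before n = 100, which is rather the point of the slide.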
Algorithm analysis – big Oh
[Chart: growth of O(lg n), O(n) and O(n^2) for n = 1 to 10, on a y-axis from 0 to 120]
I/O media comparison
HDD seektime: 5.11ms
SSD seektime: 0.24ms
RAM seektime: 0.000013ms (13ns)
SSD is 21.3 times faster than the (SCSI) HDD
RAM is 393077 times faster than the (SCSI) HDD
RAM is 18461 times faster than SSD
Conclusion
All types of disk access are bad
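The ratios on this slide follow directly from the three seek times. As a small check, with the seek times converted to nanoseconds and a hypothetical helper name:

```c
/* Seek times from the slide, converted to nanoseconds. */
#define HDD_SEEK_NS 5110000.0
#define SSD_SEEK_NS  240000.0
#define RAM_SEEK_NS      13.0

/* How many times faster the fast medium is than the slow one. */
double speedup(double slow_ns, double fast_ns)
{
    return slow_ns / fast_ns;
}
```

speedup(HDD_SEEK_NS, SSD_SEEK_NS) comes out at roughly 21.3, and the two RAM ratios at roughly 393077 and 18461, matching the slide.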
Bottleneck analysis - Test setup
3000 hosts
200 000 services
5 minute check interval
really (really) stupid plugin: check_aok
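The test setup implies a steady scheduling rate. The sketch below assumes that hosts are checked at the same 5 minute interval as services, which the slide does not state; the function name is made up for illustration.

```c
/* Average number of checks that must be scheduled per second when every
 * host and service is checked once per interval. */
unsigned long checks_per_second(unsigned long hosts, unsigned long services,
                                unsigned long interval_seconds)
{
    /* integer division: the result is rounded down */
    return (hosts + services) / interval_seconds;
}
```

With 3000 hosts, 200000 services and a 300 second interval this gives 676, which appears to be where the "676 times per second" figure for add_event() comes from.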
Nagios 3 bottlenecks
configuration parsing
event queue insertion
add_event() runs in O(n) time 676 times per second, although O(lg n) is achievable
macro resolution
strcmp() ~3700 times/sec to handle checks
job spawning and check reaping
heavy on cache-line fills and disk I/O
insufficient parallelization
Nagios 3 check flowchart
Nagios reads scheduling queue
Nagios fork()s a child
child writes half a checkresult file
child fork()s and runs shell
shell parses commandline
shell fork()s and runs plugin
child reads status and output
child completes checkresult file
child creates an “ok-to-read” file
child exits
Nagios reads spooldir
Nagios finds a checkresult file
“ok-to-read”?
read the file (cache miss)
Nagios parses checkresult
remove result and “ok-to-read”
Nagios 3 check flowchart - hotspots
[Figure: the same flowchart as on the previous slide with the hotspots highlighted; the highlighting did not survive extraction. The “read the file” step is annotated “cache miss”.]
Config parsing solution
depth-first search for host and service dependencies
O(n^2) -> O(n): 400000000 -> 20000 operations for 20000 dependencies
group members no longer duplicated
Verify exactly once
Effect: Nagios loads configurations really fast
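A depth-first search visits every object and every dependency once, which is where the O(n) bound comes from. The sketch below shows the general technique applied to detecting circular dependencies. It is not the Nagios Core implementation: the struct, the fixed array sizes and all function names are assumptions made for this example.

```c
#include <stddef.h>

#define MAX_OBJECTS 64   /* hypothetical limits to keep the sketch short */
#define MAX_EDGES   8

enum color { WHITE, GREY, BLACK };   /* unvisited, on current path, done */

struct dep_graph {
    size_t n;                              /* number of objects */
    size_t n_edges[MAX_OBJECTS];           /* dependencies per object */
    size_t edges[MAX_OBJECTS][MAX_EDGES];  /* edges[a][i] = b: a depends on b */
    enum color color[MAX_OBJECTS];
};

/* Record that object `from` depends on object `to`. Returns 0 on success. */
int add_dependency(struct dep_graph *g, size_t from, size_t to)
{
    if (from >= g->n || to >= g->n || g->n_edges[from] >= MAX_EDGES)
        return -1;
    g->edges[from][g->n_edges[from]++] = to;
    return 0;
}

/* Returns 1 if a cycle is reachable from object v, 0 otherwise. */
static int visit(struct dep_graph *g, size_t v)
{
    size_t i;

    if (g->color[v] == GREY)
        return 1;   /* back on the current path: circular dependency */
    if (g->color[v] == BLACK)
        return 0;   /* already verified, never checked again */

    g->color[v] = GREY;
    for (i = 0; i < g->n_edges[v]; i++)
        if (visit(g, g->edges[v][i]))
            return 1;
    g->color[v] = BLACK;
    return 0;
}

/* Examines each object and each dependency at most once. */
int has_circular_dependency(struct dep_graph *g)
{
    size_t v;

    for (v = 0; v < g->n; v++)
        g->color[v] = WHITE;
    for (v = 0; v < g->n; v++)
        if (visit(g, v))
            return 1;
    return 0;
}
```

Marking finished objects BLACK is what gives the "verify exactly once" property: an object that many others depend on is walked only the first time it is reached.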
Event queue solution
Move to priority queue on binary heap
Insertion: O(n) -> O(lg n)
Extract: O(1) -> O(lg n)
43000000 -> 9460 operations per second
Effect: Main nagios process uses (a lot) less CPU
Kudos: libpqueue author Volkan Yazici
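To make the trade-off concrete, here is a minimal binary min-heap keyed on run time. It illustrates the data structure only and is not libpqueue's API; the struct, the fixed capacity and the function names are invented for this sketch.

```c
#include <stddef.h>
#include <time.h>

#define QUEUE_CAPACITY 1024   /* hypothetical fixed capacity */

struct event_queue {
    time_t when[QUEUE_CAPACITY];  /* run times arranged as a binary min-heap */
    size_t size;
};

static void swap_times(time_t *a, time_t *b)
{
    time_t tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Insert a run time: at most O(lg n) swaps up the tree. Returns 0 on success. */
int queue_insert(struct event_queue *q, time_t when)
{
    size_t i;

    if (q->size >= QUEUE_CAPACITY)
        return -1;
    i = q->size++;
    q->when[i] = when;
    while (i > 0 && q->when[(i - 1) / 2] > q->when[i]) {
        swap_times(&q->when[(i - 1) / 2], &q->when[i]);
        i = (i - 1) / 2;
    }
    return 0;
}

/* Remove the earliest run time into *when: at most O(lg n) swaps down the
 * tree. Returns 0 on success, -1 if the queue is empty. */
int queue_extract(struct event_queue *q, time_t *when)
{
    size_t i = 0;

    if (q->size == 0)
        return -1;
    *when = q->when[0];
    q->when[0] = q->when[--q->size];
    for (;;) {
        size_t left = 2 * i + 1, right = left + 1, smallest = i;

        if (left < q->size && q->when[left] < q->when[smallest])
            smallest = left;
        if (right < q->size && q->when[right] < q->when[smallest])
            smallest = right;
        if (smallest == i)
            break;
        swap_times(&q->when[i], &q->when[smallest]);
        i = smallest;
    }
    return 0;
}
```

Extraction gets slower than with a sorted list (O(1) becomes O(lg n)), but since every event is inserted as often as it is extracted, trading an O(n) insert for two O(lg n) operations is a large net win.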
Macros solution
Macro names sorted on startup
Lookups: O(n) -> O(lg n)
65360 -> 3010 operations per second
Effect: Main nagios process uses less CPU
Todo: Cache resolved check commands (when configured to)
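Sorting the names once makes a binary search possible, which cuts the number of strcmp() calls per lookup from O(n) to O(lg n). A minimal sketch using the standard qsort() and bsearch(); the short table and the function names are illustrative, not the Nagios Core macro table.

```c
#include <stdlib.h>
#include <string.h>

/* A handful of macro names, deliberately unsorted; illustrative only. */
static const char *macro_names[] = {
    "SERVICESTATE", "HOSTNAME", "HOSTADDRESS", "SERVICEDESC", "HOSTSTATE"
};
#define MACRO_COUNT (sizeof(macro_names) / sizeof(macro_names[0]))

/* Compares two table entries (pointers to strings) by name. */
static int compare_names(const void *a, const void *b)
{
    return strcmp(*(const char * const *)a, *(const char * const *)b);
}

/* Sort once at startup so that later lookups can use binary search. */
void macro_table_init(void)
{
    qsort(macro_names, MACRO_COUNT, sizeof(macro_names[0]), compare_names);
}

/* Returns the index of `name` in the sorted table, or -1 if it is unknown. */
int macro_lookup(const char *name)
{
    const char **hit = bsearch(&name, macro_names, MACRO_COUNT,
                               sizeof(macro_names[0]), compare_names);

    return hit ? (int)(hit - macro_names) : -1;
}
```

macro_table_init() must run before the first macro_lookup(), since bsearch() gives meaningless results on an unsorted array.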
Checks Solutions
Worker processes run all helper apps (checks, notification, eventhandlers)
fork()s/sec increased (800 with a 300MB process, 13900 with a 1MB process)
Effects:
Drastically reduced I/O load (100% -> 1%)
Drastically reduced CPU usage
Up to ~300000 checks / 5 minutes
Kudos: Sven Nierlein, William Leibzon & Jean Gabès
Worker processes breakdown
Workers are spawned by Nagios
Chosen in round-robin fashion
Workers communicate with Nagios exclusively through libnagios APIs
Todo:
Special-purpose workers calling in
Zero fork()'s
Experimental implementation in op5 labs
Remote workers
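Round-robin selection, as mentioned in the breakdown above, needs nothing more than a rotating index. A minimal sketch with a hypothetical pool struct; the real worker bookkeeping in Nagios Core is more involved.

```c
#include <stddef.h>

struct worker_pool {
    size_t count;   /* number of spawned workers, must be greater than 0 */
    size_t next;    /* index of the worker that gets the next job */
};

/* Returns the index of the worker to use and advances the rotation. */
size_t pool_pick_worker(struct worker_pool *pool)
{
    size_t chosen = pool->next;

    pool->next = (pool->next + 1) % pool->count;
    return chosen;
}
```

With three workers, successive calls return 0, 1, 2, 0 and so on, so jobs are spread evenly without tracking each worker's load.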
Nagios 4 check flowchart - hotspots
Nagios reads scheduling queue
Nagios tells worker to run check
worker receives command
worker parses commandline
“Simple” commandline?
worker fork()s
shell fork()s (only when the commandline is not “simple”)
plugin runs
worker reads status and output
worker sends data back to Nagios
Nagios parses check result
With special-purpose workers
Nagios reads scheduling queue
Nagios tells worker to run check
worker receives command
worker parses commandline
Voodoo
worker sends data back to Nagios
Nagios parses check result
Nagios 3 check flowchart - hotspots
[Figure: repeat of the Nagios 3 hotspot flowchart shown earlier]
Check engine performance comparison
[Bar chart: Centreon, Icinga, Nagios 3, gearman, Shinken and Nagios 4 on a scale from 0 to 350000; the bar values did not survive extraction]
Nagios 4 – New features
Major:
libnagios
Query handler
NERD
Minor:
service parents
hourly_value + minimum_value
$CHECKSOURCE$
libnagios
iobroker – multiplexing library
iocache - bulk reading and writing
kvvec – key value vector handling
dkhash – dual-key hash api
bitmap – set-operations for large sets
squeue – fast scheduling queue
pqueue – priority queue (from Apache)
skiplist – previously in Nagios core only
nsock – simple socket library
libnagios – usage example
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <nagios/lib/libnagios.h>

#define QH_SOCKET_PATH "/opt/monitor/var/nagios.qh"

int main(int argc, char **argv)
{
    int sd, r;
    char buf[4096];

    /* connect to the query handler's unix socket (arguments as on the slide) */
    sd = nsock_unix(QH_SOCKET_PATH, NSOCK_TCP | NSOCK_CONNECT, 0);
    if (sd < 0) {
        printf("Failed to connect to '%s': %s: %m\n",
               QH_SOCKET_PATH, nsock_strerror(sd));
        exit(1);
    }

    /* subscribe to a NERD channel and copy everything it sends to stdout */
    if (nsock_printf(sd, "@nerd subscribe opathchecks") > 0) {
        while ((r = read(sd, buf, sizeof(buf))) > 0)
            write(fileno(stdout), buf, r);
    }

    close(sd);
    return 0;
}
Query handler
General purpose handler for addressable queries in Nagios Core
query: “@<address><SP><query>\0”
“echo” service built in
query_socket=/path/to/nagios.qh in nagios.cfg
Kudos for inspiration: Mathias Kettner
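The query format above is simple enough to build by hand. The helper below is a sketch with a made-up name, using only standard C; it formats the "@<address><SP><query>" string and reports how many bytes to send, counting the terminating NUL as part of the message.

```c
#include <stdio.h>
#include <stddef.h>

/* Formats "@<address> <query>" into buf. Returns the number of bytes to
 * send, including the terminating NUL, or 0 if buf is too small. */
size_t qh_format_query(char *buf, size_t size, const char *address,
                       const char *query)
{
    int len = snprintf(buf, size, "@%s %s", address, query);

    if (len < 0 || (size_t)len >= size)
        return 0;
    return (size_t)len + 1;   /* +1: the NUL byte is part of the message */
}
```

For the built-in echo service, qh_format_query(buf, sizeof(buf), "echo", "hello") produces "@echo hello" and returns 12: eleven visible characters plus the NUL.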
NERD
Nagios Event Radio Dispatcher
Provides real-time data to outside addons
Can reduce I/O load of current addons
Queried as 'nerd' via query-handler
Example queries:
@nerd subscribe hostchecks
@nerd subscribe servicechecks
Todo: Macro support, 'alerts' channel
demo time :)
Other features
Service parents
servicedependencies made easy
hourly_value + minimum_value
$CHECKSOURCE$
Useful when adding remote checking modules
“make dox” and look in Documentation/html
Easter eggs / micro-features
The /dev/null hack
object_cache_file
status_file
nagios-devel package available
libnagios and Nagios Core headers
Addon status
Works:
mod_gearman
modpnp
livestatus (from http://github.com/ageric/livestatus)
merlin
Known bugs, issues and ToDo's
Host latency calculation is messed up
If use_aggressive_host_checks=1, on-demand host checks are still run synchronously
Environment macros are currently not supported
Deprecation notices
Command line
-o (don't verify objects) is removed and will throw an error
-x (don't verify object paths) is deprecated and will produce a warning
Deprecation notices, continued
Object configuration in nagios.cfg is now officially unsupported. Do not rely on it to work
Embedded perl has been removed
Too many reports on memory leaks
Performance improved in workers by removing it, due to smaller memory footprint
Deprecation notices, continued
nagios.cfg:
sleep_time - we now poll until it's time to run the next event
command_check_interval – commands are always handled immediately
last_command_check – as per above
failure_prediction* - never implemented
Everything relating to embedded perl
Deprecation notices, continued
objects
failure_prediction* - this was never implemented
group member exclusions no longer inherited by group-in-group inclusion
group1->members = A,B
group1->group_members = group2
group2->members = !B,C
group1 has A,B,C as members in Nagios 4
group1 had A,C as members in Nagios 3
Special thanks
Ethan Galstad
Daniel Wittenberg
Armin Wolfermann
Joerg Linge
Sven Nierlein
Mark Frost
Robin Sonefors
William Leibzon
Everyone who sent me configs for testing
Questions?
Look me up between sessions
Check out the 'make dox' thingie
Online resources
http://www.github.com/ageric
http://www.op5.com