OSMC 2014: Why we do monitoring wrong | Michael Medin
Transcript of OSMC 2014: Why we do monitoring wrong | Michael Medin
![Page 1: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/1.jpg)
Why we do monitoring wrong
![Page 2: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/2.jpg)
…frustration…
dev not ops
![Page 3: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/3.jpg)
![Page 4: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/4.jpg)
![Page 5: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/5.jpg)
![Page 6: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/6.jpg)
![Page 7: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/7.jpg)
![Page 8: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/8.jpg)
![Page 9: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/9.jpg)
![Page 10: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/10.jpg)
![Page 11: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/11.jpg)
Please don’t be angry!
Sometimes I am busy
![Page 12: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/12.jpg)
![Page 13: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/13.jpg)
![Page 14: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/14.jpg)
![Page 15: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/15.jpg)
TAKE:1
![Page 16: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/16.jpg)
check_disk -w 80 -c 90
![Page 17: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/17.jpg)
Slack
-w 80 -c 90

| Disk size | 1gb | 1tb | 1pb |
|---|---|---|---|
| Slack | 0.2g | 219g | 225 179g |
![Page 18: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/18.jpg)
Better?
-w $ARG1$

| Disk size | 1gb | 1tb | 1pb |
|---|---|---|---|
| Slack | 0.2g | 22g | 2 251g |
| Threshold | 80% | 98% | 99.8% |

Magic?
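
The slack figures on these two slides appear to follow from plain arithmetic: slack is the disk size times the fraction still free when the warning threshold is hit. The sketch below reproduces that calculation. The function name `slack_gb` is hypothetical, and it assumes binary disk sizes (1gb = 2^30 bytes, 1tb = 2^40 bytes) reported in decimal gigabytes, which seems to match the numbers shown (the slides look truncated rather than rounded).

```shell
#!/bin/sh
# slack_gb SIZE_BYTES WARN_PERCENT
# Hypothetical helper: prints the free space, in decimal GB with one decimal,
# that remains when a disk of SIZE_BYTES reaches WARN_PERCENT used.
slack_gb() {
    awk -v size="$1" -v pct="$2" \
        'BEGIN { printf "%.1f\n", size * (100 - pct) / 100 / 1e9 }'
}

slack_gb 1073741824 80       # 1gb disk, warn at 80%: prints 0.2
slack_gb 1099511627776 80    # 1tb disk, warn at 80%: prints 219.9
slack_gb 1099511627776 98    # 1tb disk, warn at 98%: prints 22.0
```

A fixed percentage leaves a very different amount of unused space depending on disk size, which is why the threshold has to grow with the disk to keep the slack reasonable.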
![Page 19: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/19.jpg)
[Chart, y-axis 0 to 3000. Series: Value, Warning, Critical. Annotations: The problem; The first alert; On call staff alerted; Lost time; Things went bad!]
![Page 20: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/20.jpg)
[Chart, y-axis 0 to 3000. Series: Value, Warning, Critical. Annotations: The problem; The first alert; On call staff alerted; Lost time]
![Page 21: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/21.jpg)
No Slack
-w trend-line

| Disk size | 1gb | 1tb | 1pb |
|---|---|---|---|
| Slack | 0g | 0g | 0g |
![Page 22: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/22.jpg)
Magic?
Works With Everything!
![Page 23: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/23.jpg)
TAKE:2
![Page 24: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/24.jpg)
What about capacity planning?
Bounds?
![Page 25: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/25.jpg)
Alarm clock
![Page 26: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/26.jpg)
[Chart, y-axis 0 to 2500. Series: Warning, Critical, HDD 1, HDD 2. Annotations: Full; How long?; > 80%; > 90%]
![Page 27: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/27.jpg)
[Chart, y-axis 0 to 2500. Series: Warning, Critical, HDD 1, HDD 2. Annotation: Full]
warn=full in less than x weeks
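
A rule like "warn when the disk will be full in less than x weeks" can be sketched with a straight-line projection from two usage samples. This is only an illustration under stated assumptions: the function names `weeks_until_full` and `check_trend` are hypothetical, growth is assumed to be linear, and exit status 1 is used for WARNING as in the usual Nagios plugin convention.

```shell
#!/bin/sh
# weeks_until_full CAPACITY USED_THEN USED_NOW DAYS_BETWEEN
# Hypothetical helper: projects linear growth between two samples and prints
# the number of weeks until the disk is full, or "inf" if usage is not growing.
weeks_until_full() {
    awk -v cap="$1" -v used_then="$2" -v used_now="$3" -v days="$4" 'BEGIN {
        rate = (used_now - used_then) / days     # growth per day
        if (rate <= 0) { print "inf"; exit }     # flat or shrinking: never full
        printf "%.1f\n", (cap - used_now) / rate / 7
    }'
}

# check_trend CAPACITY USED_THEN USED_NOW DAYS_BETWEEN WARN_WEEKS
# Returns 1 (WARNING) when the projected time to full is below WARN_WEEKS.
check_trend() {
    weeks=$(weeks_until_full "$1" "$2" "$3" "$4")
    if [ "$weeks" != "inf" ] &&
       awk -v w="$weeks" -v limit="$5" 'BEGIN { exit !(w < limit) }'; then
        echo "WARNING: full in $weeks weeks"
        return 1
    fi
    echo "OK: full in $weeks weeks"
    return 0
}

# 2000 units capacity, 1000 used a week ago, 1100 used now, warn below 4 weeks
check_trend 2000 1000 1100 7 4    # prints "OK: full in 9.0 weeks"
```

With the same capacity, a disk that grew from 1000 to 1500 in a week would be projected to fill in about one week and would trigger the warning, even though it is only 75% full.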
![Page 28: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/28.jpg)
Photo Credit Howard Dickins
Alarm clock
2 hours before work
![Page 29: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/29.jpg)
[Chart, y-axis 0 to 3000. Series: Value, Warning, Critical. Annotations: The first alert; On call staff alerted]
![Page 30: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/30.jpg)
Magic?
No, basic math!
![Page 31: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/31.jpg)
![Page 32: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/32.jpg)
check_disk -w 80 -c 90
![Page 33: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/33.jpg)
[Chart, y-axis 0 to 2500. Series: Value, Warning, Critical. Annotation: Backup]
check_disk check_disk_backup
![Page 34: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/34.jpg)
[Chart, y-axis 0 to 2500. Series: Value, Warning, Critical. Annotation: Backup]
check_disk warn=usage>80% and not_backup
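
One way to read `warn=usage>80% and not_backup` is a single check whose threshold only applies when a "backup" tag is absent, instead of two separate checks. The sketch below illustrates that reading. The function name `check_usage` and the way tags are passed as extra arguments are assumptions made for the example, not the syntax of any particular plugin.

```shell
#!/bin/sh
# check_usage USAGE_PERCENT [TAG...]
# Hypothetical check: warns above 80% usage unless the "backup" tag is set.
check_usage() {
    usage=$1
    shift
    for tag in "$@"; do
        if [ "$tag" = "backup" ]; then
            # A backup is expected to fill the disk for a while: do not alert.
            echo "OK: ${usage}% used (backup tag set, threshold skipped)"
            return 0
        fi
    done
    if [ "$usage" -gt 80 ]; then
        echo "WARNING: ${usage}% used"
        return 1
    fi
    echo "OK: ${usage}% used"
    return 0
}

check_usage 95 backup    # prints "OK: 95% used (backup tag set, threshold skipped)"
```

Without the tag, the same 95% reading would produce a warning, so the spike caused by the backup no longer needs its own check definition.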
![Page 35: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/35.jpg)
Magic?
No, it is tags
![Page 36: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/36.jpg)
Other
TAKE:1
![Page 37: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/37.jpg)
check_load -w 1 -c 2
![Page 38: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/38.jpg)
Bad CPU load?
0% 80% 90% 100%
![Page 39: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/39.jpg)
[Chart, y-axis 0 to 100. Series: Value, Yesterday, Last Week]
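
The chart suggests judging the current value against what was normal yesterday and last week rather than against a fixed percentage. A minimal sketch of that idea follows. The function name `deviates`, the use of a simple average as the baseline, and the tolerance parameter are all assumptions made for illustration.

```shell
#!/bin/sh
# deviates CURRENT YESTERDAY LAST_WEEK TOLERANCE
# Hypothetical helper: exits 0 when CURRENT differs from the average of the
# two reference values by more than TOLERANCE, and 1 otherwise.
deviates() {
    awk -v cur="$1" -v y="$2" -v lw="$3" -v tol="$4" 'BEGIN {
        base = (y + lw) / 2                 # baseline from history
        diff = cur - base
        if (diff < 0) diff = -diff          # absolute difference
        exit !(diff > tol)
    }'
}

# 95% load is unusual if the same time yesterday and last week was 40-50%
if deviates 95 40 50 10; then
    echo "WARNING: load differs from baseline"
fi
```

The same 95% reading would pass quietly on a machine that normally runs at 90%, and a drop to 0% on that machine would be flagged even though no fixed upper threshold would catch it.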
![Page 40: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/40.jpg)
Magic?
No, still math
![Page 41: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/41.jpg)
![Page 42: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/42.jpg)
check_load -w 1 -c 2
![Page 43: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/43.jpg)
High Load???
GOOD BAD
DO WE CARE?
![Page 44: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/44.jpg)
![Page 45: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/45.jpg)
Magic?
No, still math
![Page 46: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/46.jpg)
TAKE:2
![Page 47: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/47.jpg)
check_mem -w 80 -c 90
![Page 48: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/48.jpg)
Bad Memory?
0% 80% 90% 100%
![Page 49: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/49.jpg)
Managed…
Java JVM
.net CLR
![Page 50: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/50.jpg)
check_mem
check_jmx check_counter
check_wmi
![Page 51: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/51.jpg)
check_disk -w 80 -c 90
![Page 52: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/52.jpg)
FULL DISK???
GOOD BAD
DO WE CARE?
![Page 53: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/53.jpg)
Why do we monitor?
Because we can?
Because we do?
Because…
![Page 54: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/54.jpg)
Business!
NOT Technology
![Page 55: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/55.jpg)
IT
BUSINESS
![Page 56: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/56.jpg)
Magic?
No, common sense
![Page 57: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/57.jpg)
TAKE:1
![Page 58: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/58.jpg)
Nagios™ is Old
Easy
Simple
What we always do
![Page 59: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/59.jpg)
Addons
bischeck
Other solutions
“the new stuff”
forks
![Page 60: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/60.jpg)
Why a tool?
fast forward 15 years
Nagios™ Naemon™ could do this!
Why an addon?
![Page 61: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/61.jpg)
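cron

    */5 * * * * wrap.sh mycheck

wrap.sh

    #!/bin/bash
    # Run the check command given as arguments
    "$@"
    # Send an email when the check exits with status 1
    if [ $? -eq 1 ]; then
      send-email.sh
    fi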
![Page 62: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/62.jpg)
![Page 63: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/63.jpg)
![Page 64: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/64.jpg)
TAKE:2
![Page 65: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/65.jpg)
![Page 66: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/66.jpg)
![Page 67: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/67.jpg)
![Page 68: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/68.jpg)
![Page 69: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/69.jpg)
TAKE:1
![Page 70: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/70.jpg)
![Page 71: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/71.jpg)
![Page 72: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/72.jpg)
TAKE:2
![Page 73: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/73.jpg)
Photo by Olga Berrios
![Page 74: OSMC 2014: Why we do monitoring wrong | Michael Medin](https://reader034.fdocuments.us/reader034/viewer/2022042715/5594521d1a28abe14f8b467e/html5/thumbnails/74.jpg)
Information about NSClient++ http://nsclient.org
facebook.com/nsclient
Slides, and examples http://nsclient.org/nscp/conferances
My Blog http://blog.medin.name