Agenda
Concepts
What
Why
Who
How
When – ALWAYS!
Good Practices
What you should know
Monitoring is only one small part of a larger picture that also includes the Service Desk function and the Incident, Problem, Change and Release Management processes, among others. Used alone, it will not transform your datacenter into a state-of-the-art showcase.
Nonetheless, it is indispensable.
Concepts
Technology
Process
People
What is monitoring?
Watch
Alert
Document
Act*
* Automated Response Systems
But Wait!
Why are we doing this???
Shorten the time to return to normal operation
REACTIVE
Prevent interruptions to a process
PROACTIVE
Optimize the use of a resource
ADVANCED
What…
Server
O.S.
Services
Applications
Services
DBMS
Database
Database
Storage
Network
Switch
Router
Firewall
External resources
Plug & Pray
Watch!
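Watching any of the resources above starts with a probe. Below is a minimal Python sketch of the simplest probe type, a reachability check. It assumes a TCP connect test is an acceptable stand-in for an ICMP ping, because sending ICMP from Python usually needs raw-socket (administrator) privileges. The name `tcp_probe` and the 3-second default timeout are illustrative choices, not taken from any specific monitoring product.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and tries each returned address in turn;
        # the connection is closed again as soon as the `with` block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and name-resolution failures are all OSError subclasses.
        return False


# Hypothetical usage: a scheduler calls this every N seconds per watched resource, e.g.
#   tcp_probe("db01.example.internal", 5432)
```

The host names and ports you probe are entirely site-specific. The boolean result is meant to feed the alerting rules rather than notify anyone directly.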
What…
Destination and content depend on:
SEVERITY of the event
TYPE of probe
TIME of Day or Shift
Alert!
What…
SEVERITY
Error
Warning
Informational
TYPE of probe
Ping, TCP
URL, Exists, Available
Current Value, % In Use…
TIME of Day
Shift 1
Shift 2
Night & weekends
Alert!
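A "Current Value, % In Use" probe must be turned into one of the three severities before any routing can happen. The following is a minimal sketch that assumes simple static thresholds. The 80% and 95% defaults and the function name `classify_usage` are hypothetical and should be tuned per resource.

```python
def classify_usage(percent_in_use: float, warn_at: float = 80.0, error_at: float = 95.0) -> str:
    """Map a '% In Use' reading to Informational / Warning / Error.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    if not 0.0 <= percent_in_use <= 100.0:
        raise ValueError(f"percent_in_use out of range: {percent_in_use}")
    if percent_in_use >= error_at:
        return "Error"
    if percent_in_use >= warn_at:
        return "Warning"
    return "Informational"
```

Static thresholds are the easiest to explain and audit. Trend-based or per-time-of-day thresholds are possible refinements, but they are beyond what this slide covers.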
What…
Informational
• Performance Analysis
• Capacity Planning teams
Warning
• Level 2 teams
• Performance
Error
• 24x7 teams
• Operations Center
• Service Desk*
Alert!
Destination depends on SEVERITY
* Remember ESCALATION RULES!
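Routing by severity is essentially a lookup table. The sketch below copies the team names from this slide. The escalation behaviour is only one possible reading of the asterisk and is an assumption: an Error left unacknowledged for a hypothetical 15 minutes also goes to the Service Desk. Replace it with your own documented escalation rules.

```python
from typing import Dict, List

# Severity -> first-line destinations, taken from the slide.
ROUTES: Dict[str, List[str]] = {
    "Informational": ["Performance Analysis", "Capacity Planning"],
    "Warning": ["Level 2", "Performance"],
    "Error": ["24x7", "Operations Center"],
}

# Hypothetical escalation rule: minutes an Error may stay unacknowledged
# before the Service Desk is also notified.
ESCALATE_AFTER_MIN = 15


def destinations(severity: str, minutes_unacknowledged: int = 0) -> List[str]:
    """Return the teams to notify for an alert of the given severity."""
    teams = list(ROUTES[severity])  # copy so callers cannot mutate the table
    if severity == "Error" and minutes_unacknowledged >= ESCALATE_AFTER_MIN:
        teams.append("Service Desk")
    return teams
```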
What…
% In Use, # of jobs, # of tasks
• Performance Analysis
• Capacity Planning teams
Databases, Services, % availability
• Level 2 teams
• Performance
Ping, URL, Services
• 24x7 teams
• Operations Center
• Service Desk*
Alert!
Destination depends on TYPE of Probe
* Remember ESCALATION RULES!
What…
Shift 1
• Shift 1 teams
Shift 2
• Shift 2 teams
Night & weekends
• 24x7 teams
• Manager
Alert!
Destination depends on TIME of Day
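Time-based routing first needs a rule that maps a timestamp to a shift. The deck does not give shift hours, so the boundaries below (Shift 1 from 07:00 to 15:00, Shift 2 from 15:00 to 23:00, everything else plus Saturday and Sunday counted as nights and weekends) are hypothetical. The destinations follow the slide.

```python
from datetime import datetime, time
from typing import List

# Hypothetical shift boundaries: [start, end) in local time.
SHIFT_1 = (time(7, 0), time(15, 0))
SHIFT_2 = (time(15, 0), time(23, 0))

ROUTES_BY_SHIFT = {
    "Shift 1": ["Shift 1 teams"],
    "Shift 2": ["Shift 2 teams"],
    "Night & weekends": ["24x7 teams", "Manager"],
}


def shift_for(ts: datetime) -> str:
    """Return the shift that covers timestamp `ts`."""
    if ts.weekday() >= 5:  # Saturday = 5, Sunday = 6
        return "Night & weekends"
    t = ts.time()
    if SHIFT_1[0] <= t < SHIFT_1[1]:
        return "Shift 1"
    if SHIFT_2[0] <= t < SHIFT_2[1]:
        return "Shift 2"
    return "Night & weekends"


def route_by_time(ts: datetime) -> List[str]:
    """Return the destinations for an alert raised at `ts`."""
    return ROUTES_BY_SHIFT[shift_for(ts)]
```

Holidays are not handled here. A real rota would usually come from a calendar rather than fixed constants.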
What…
Accountability
SLM
Process Improvement
Trends
Audit Trail
Complex Event Review
Document!
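Every item on this slide, from SLM reports to trends and the audit trail, depends on events being recorded in a consistent, machine-readable form. One common approach is to append one JSON object per line (the JSON Lines convention) with a fixed set of fields. This sketch illustrates that approach as an assumption, not a prescription. The field names and the `record_event` helper are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import IO, Optional


def record_event(stream: IO[str], resource: str, probe: str, severity: str,
                 message: str, action: Optional[str] = None,
                 ts: Optional[datetime] = None) -> dict:
    """Append one monitoring event as a single JSON line and return the record."""
    entry = {
        # UTC timestamps keep records comparable across sites and shifts.
        "timestamp": (ts or datetime.now(timezone.utc)).isoformat(),
        "resource": resource,
        "probe": probe,
        "severity": severity,
        "message": message,
        "action": action,  # e.g. the automated response taken, if any
    }
    stream.write(json.dumps(entry) + "\n")
    return entry
```

In practice, `stream` would be a file opened in append mode or a handle to a log shipper. Keeping one record per line makes the history easy to grep and to load for trend analysis.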
What…
Automated Response Scripts
Reboot
Restart
VMs: load balancing / vMotion
Act!
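An automated response script should be bounded: try a fixed number of restarts, verify the result each time, and hand over to a human if it still fails. The sketch below keeps the health check and the restart action as plain callables so the control flow can be tested without touching a real system. These are hypothetical hooks you would wire to your own tooling, such as a service manager or a hypervisor API. The attempt count and settle time are illustrative defaults.

```python
import time
from typing import Callable


def auto_remediate(is_healthy: Callable[[], bool],
                   restart: Callable[[], None],
                   max_attempts: int = 2,
                   settle_seconds: float = 30.0,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Try a bounded number of restarts; return 'escalate' if a human must step in."""
    if is_healthy():
        return "healthy"
    for attempt in range(1, max_attempts + 1):
        restart()
        sleep(settle_seconds)  # give the service time to come back before re-checking
        if is_healthy():
            return f"recovered after {attempt} restart(s)"
    return "escalate"
```

Whatever the outcome, it is worth recording so that an automatic restart does not silently hide a recurring problem.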
Good Practices
Please DO NOT
• Flood the incident team with false positives!
• Generate more than one alert for the same event
• Send an alert to the wrong person
• Forget that people need rest
• Forget that no one reads email at 3 AM
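Two of these mistakes, false positives and duplicate alerts, can be reduced in code. This is a minimal sketch that assumes each probe result is a simple pass/fail per resource. It raises an alert only after a configurable number of consecutive failures (3 here, a hypothetical default), sends that alert once per incident, and emits a single recovery notice when the resource comes back. The `AlertGate` class is illustrative, not part of any particular tool.

```python
from typing import Dict, Optional, Set


class AlertGate:
    """Filter raw probe results into at most one ALERT and one RECOVERED per incident."""

    def __init__(self, failures_needed: int = 3) -> None:
        self.failures_needed = failures_needed
        self._failures: Dict[str, int] = {}  # resource -> consecutive failure count
        self._open: Set[str] = set()         # resources with an alert already sent

    def observe(self, resource: str, ok: bool) -> Optional[str]:
        if ok:
            self._failures[resource] = 0
            if resource in self._open:
                self._open.discard(resource)
                return "RECOVERED"
            return None  # a transient blip that never reached the threshold
        count = self._failures.get(resource, 0) + 1
        self._failures[resource] = count
        if count >= self.failures_needed and resource not in self._open:
            self._open.add(resource)
            return "ALERT"
        return None  # below threshold, or already alerted for this incident
```

The threshold trades detection speed for fewer false alarms. With a 1-minute polling interval and 3 failures, an outage is reported roughly three minutes after it starts.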
Good Practices
Please DO
• Establish a single point of contact (DC status)
• Warn the teams about planned maintenance
• Define clear responsibilities (who does what)
• Define and document escalation procedures
• Enjoy your success!
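Announcing planned maintenance also lets the monitoring system mute alerts that are expected, which keeps the incident team from being flooded. The following is a sketch that assumes maintenance windows are kept as a simple list per resource. The `MaintenanceWindow` type and the `is_suppressed` helper are hypothetical; many monitoring tools provide their own scheduled-downtime features for this.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class MaintenanceWindow:
    resource: str
    start: datetime  # inclusive
    end: datetime    # exclusive


def is_suppressed(resource: str, now: datetime, windows: Iterable[MaintenanceWindow]) -> bool:
    """True if `resource` is inside an announced maintenance window at `now`."""
    return any(w.resource == resource and w.start <= now < w.end for w in windows)
```

Suppressed events should still be recorded, so the audit trail shows what happened during the window even though nobody was paged.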
Questions?