Page 1: Machine Data to Readable Reports - System Monitoring, Alerting and Reporting - Ashley Fisher, Business Systems Analyst, University of the Sunshine Coast   |  ANZTLC15

Machine Data to Readable Reports - System Monitoring, Alerting and Reporting

Ashley Fisher University of the Sunshine Coast, Queensland.

Page 2

Welcome

Ashley Fisher, Business Systems Analyst

University of the Sunshine Coast

Sunshine Coast, Queensland, Australia.

Page 3

• System Health

• Monitoring

• Alerting

• Reporting

Page 4

Microsoft Windows Ahead

While this presentation focuses on Microsoft Windows Server and associated technologies, the concepts and implementation of these systems are similar in other operating environments.

Page 5

Underlying Infrastructure

• USC is Microsoft-centric

• Servers are running on Windows Server 2008 R2

• Authentication through Active Directory

• Currently running Microsoft SQL Server 2008

Page 6

Blackboard Infrastructure

• 5 Environments

• Total 12 Application Servers

• 3 Dedicated Batch Servers

• 4 SQL Clusters, 1 Standalone MSSQL Installation

• 7 F5 BigIP Pools

• 7 TB File Share Storage

• Approx. 12,000 Successful Logins per Day.

Page 7

Mediasite Infrastructure

• 2 Environments

• Total 12 Application Servers

• 2 SQL Clusters

• 8 F5 BigIP Pools

• 9.5 TB File Share Storage

• 380 Recorded Presentations per Week

• Approximately 1,100 hours of content viewed per Day

Page 8

Monitoring Systems In Place

• Nagios – Monitoring Server Availability

• Zabbix (Pictured Left)

– Monitoring Server Availability and Performance

– Currently Proof of Concept

• Splunk – Log Monitoring

Page 9

Splunk captures, indexes and correlates real-time data in a searchable repository from which it can generate graphs, reports, alerts, dashboards and visualizations.

Splunk has a mission of making machine data accessible across an organization by identifying data patterns, providing metrics, diagnosing problems and providing intelligence for business operations. Splunk is a horizontal technology used for application management, security and compliance, as well as business and web analytics.

https://en.wikipedia.org/wiki/Splunk

Page 10

Splunk Interface

Page 11

Blackboard Logging

• 67 log files on each Blackboard host
– A lot of information we can and do use.
– A lot we’re potentially missing.

• Daily rotation of important logs
– Troubleshooting issues across multiple days is frustrating.

• Logs archived weekly, on Monday morning
– As above, but we also need to unzip the archived logs to access the information they contain.
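One way to take the sting out of rotated and archived logs is to search plain and zipped files in a single pass, reading archives in memory rather than unzipping them by hand. The sketch below assumes live logs end in `.log`, weekly archives are `.zip` files in the same directory, and text is UTF-8; the function name `search_logs` and the file names in the usage example are hypothetical, not Blackboard's actual layout.

```python
import zipfile
from pathlib import Path


def search_logs(log_dir, needle):
    """Yield (source, line) for every line containing `needle`.

    Plain *.log files are read directly; *.zip archives are read in memory,
    so nothing needs to be extracted to disk first.
    """
    for path in sorted(Path(log_dir).iterdir()):
        if path.suffix == ".log":
            with path.open(encoding="utf-8", errors="replace") as handle:
                for line in handle:
                    if needle in line:
                        yield path.name, line.rstrip("\n")
        elif path.suffix == ".zip":
            with zipfile.ZipFile(path) as archive:
                for member in archive.namelist():
                    text = archive.read(member).decode("utf-8", errors="replace")
                    for line in text.splitlines():
                        if needle in line:
                            # Report the archive and the member the hit came from.
                            yield f"{path.name}:{member}", line
```

For example, `list(search_logs(r"D:\logs", "12345"))` would return matches from today's log and from every archived week in one list, each tagged with where it was found.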

Page 12

Blackboard Database

• The activity_accumulator table retains a transcript of user activity.

• Using joins against related tables, we can track user login times, course access times, and individual content item interactions.

• USC rotates activity_accumulator data into a backup database every 180 days.
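A minimal sketch of such a join follows. It uses an in-memory SQLite database as a stand-in for SQL Server, and the `users`, `course_main` and `activity_accumulator` definitions are heavily simplified, with assumed column lists; the event type values in the usage test are also assumptions. The helper `user_activity` is hypothetical.

```python
import sqlite3

# Simplified, assumed stand-ins for the Blackboard tables (real ones have many more columns).
SCHEMA = """
CREATE TABLE users (pk1 INTEGER PRIMARY KEY, user_id TEXT);
CREATE TABLE course_main (pk1 INTEGER PRIMARY KEY, course_id TEXT);
CREATE TABLE activity_accumulator (
    user_pk1 INTEGER, course_pk1 INTEGER, event_type TEXT, timestamp TEXT
);
"""

# LEFT JOIN keeps events that have no course (e.g. logins).
QUERY = """
SELECT u.user_id, c.course_id, a.event_type, a.timestamp
FROM activity_accumulator AS a
JOIN users AS u ON u.pk1 = a.user_pk1
LEFT JOIN course_main AS c ON c.pk1 = a.course_pk1
WHERE u.user_id = ? AND a.timestamp BETWEEN ? AND ?
ORDER BY a.timestamp
"""


def user_activity(conn, user_id, start, end):
    """Return (user_id, course_id, event_type, timestamp) rows in time order."""
    return conn.execute(QUERY, (user_id, start, end)).fetchall()
```

Because the table is rotated every 180 days, a query covering older activity would need to run against the backup database as well.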

Page 13

Student Contesting Late Submission Penalty

Students are penalised by a percentage of their received grade for late assignment submissions, and students do contest the penalty from time to time.

• Traditional Method of Investigation
– Database Query (activity_accumulator)
– Individual Host Log Interrogation (Repeat)

• Lots of Steps

• Time Consuming

• Room for Error or Misinterpretation
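The repeated per-host step is where errors creep in: each host's log is read separately and the timeline is pieced together by eye. The sketch below merges already time-ordered logs from several hosts into one timeline. It assumes each line begins with a `YYYY-MM-DD HH:MM:SS` timestamp, which is a hypothetical format, and the host names are made up.

```python
import heapq
from datetime import datetime


def parse_line(host, line):
    """Split a hypothetical 'YYYY-MM-DD HH:MM:SS message' line into a sortable tuple."""
    stamp, message = line[:19], line[20:]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), host, message


def _parsed(host, lines):
    # A named generator binds `host` per stream, avoiding late-binding surprises.
    for line in lines:
        yield parse_line(host, line)


def merged_timeline(logs_by_host):
    """Merge per-host logs (each already in time order) into one timeline."""
    streams = [_parsed(host, lines) for host, lines in logs_by_host.items()]
    return list(heapq.merge(*streams))
```

`heapq.merge` never loads more than one pending line per host into its heap, so the same approach works on large log files opened as iterators.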

Page 14

Student Contesting Late Submission Penalty

Students are penalised by a percentage of their received grade for late assignment submissions, and students do contest the penalty from time to time.

• Intermediate Method
– Database Query (activity_accumulator)
– Log Into Splunk
– Search string:

index="blackboard_prod" "_userpk1_"

• Few Steps

• Easy Training

• Now Dashboarded
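When the search is run often, it can help to build it programmatically and narrow it to the submission window. This is a sketch that assumes `_userpk1_` stands for the student's numeric pk1 value; `earliest`, `latest` and `sort` are standard Splunk search syntax, while the helper `user_search` and its parameters are hypothetical.

```python
from datetime import datetime

# Splunk's default format for absolute earliest/latest time modifiers.
SPLUNK_TIME_FORMAT = "%m/%d/%Y:%H:%M:%S"


def user_search(user_pk1, start, end, index="blackboard_prod"):
    """Build a Splunk search for one user's events between two datetimes."""
    pk1 = str(user_pk1)
    if not pk1.isdigit():
        # Restricting to digits avoids having to escape quotes in the search.
        raise ValueError("user_pk1 should be a numeric primary key")
    return (
        f'index="{index}" "{pk1}" '
        f'earliest="{start.strftime(SPLUNK_TIME_FORMAT)}" '
        f'latest="{end.strftime(SPLUNK_TIME_FORMAT)}" '
        "| sort 0 _time"  # 0 removes sort's default result limit
    )
```

The resulting string can be pasted into the Splunk search bar or used as the base search of a dashboard panel.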

Page 16

Zabbix is an enterprise open source monitoring solution for networks and applications (…) It is designed to monitor and track the status of various network services, servers, and other network hardware.

• Simple checks can verify the availability and responsiveness of standard services such as SMTP or HTTP without installing any software on the monitored host.

• A Zabbix agent can also be installed on UNIX and Windows hosts to monitor statistics such as CPU load, network utilization, disk space, etc.

• As an alternative to installing an agent on hosts, Zabbix includes support for monitoring via SNMP, TCP and ICMP checks, as well as over IPMI, JMX, SSH, Telnet and using custom parameters (…)

https://en.wikipedia.org/wiki/Zabbix
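To illustrate what an agentless "simple check" measures, here is a sketch of a TCP availability and response-time probe run from the monitoring side. It is an illustration of the idea in Python, not Zabbix's implementation; the function name and the 3-second default timeout are assumptions.

```python
import socket
import time


def tcp_response_time(host, port, timeout=3.0):
    """Return the TCP connect time in seconds, or None if the service is unreachable.

    Nothing is installed on the monitored host: the check only opens a
    connection to the service port and closes it again.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:  # refused, unreachable and timed-out connections all land here
        return None
```

For example, `tcp_response_time("mail.example.org", 25)` would time an SMTP port; a `None` result maps to "down", and a number can be graphed or compared against a threshold.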

Page 17

Zabbix Interface: Overview/Landing Page

Page 18

Our Zabbix Environment

• Zabbix takes a strongly template-centred approach to deployment.

• The approach we’ve taken is to have ‘opt-in’ templates available for hosts.

• CPU Load, Memory Use, Network Traffic/Bandwidth and HDD Space checks are in a template added to all hosts with an agent installed.
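The opt-in model can be pictured as a baseline template that every agent host receives, plus extra templates a host explicitly joins. The sketch below is a toy data model of that idea, not Zabbix's configuration API; the item names, the `Host` class and the `core_connectivity` template are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical item names echoing the baseline checks listed above.
BASE_TEMPLATE = ("cpu_load", "memory_use", "network_traffic", "hdd_space")


@dataclass
class Host:
    name: str
    has_agent: bool
    opt_in: list = field(default_factory=list)  # names of opted-in templates


def items_for(host, templates):
    """Baseline checks for every agent host, plus items from each opted-in template."""
    items = list(BASE_TEMPLATE) if host.has_agent else []
    for template_name in host.opt_in:
        for item in templates[template_name]:
            if item not in items:  # a check shared by two templates is only added once
                items.append(item)
    return items
```

Keeping the baseline small and making everything else opt-in means a new host gets sensible coverage immediately, without inheriting checks that do not apply to it.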

Page 19

Zabbix Templates

• Example Template: ‘Core Infrastructure Connectivity’.

When this template is applied to a host, the Zabbix agent on that host pings the template’s end-points locally, so we can see if an individual host cannot connect to the time servers, domain controllers or LDAP servers.
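The logic of that template, run from the host's own point of view, can be sketched as below. The end-point names are hypothetical (the real targets are not listed in the talk), and `ping` simply wraps the system ping command; note that some Windows ping versions exit with status 0 even on "Destination host unreachable", so a production probe may need to inspect the output instead.

```python
import subprocess
import sys
from typing import Callable, Dict, List

# Hypothetical end-points: the real template's targets are not listed in the talk.
CORE_ENDPOINTS: Dict[str, List[str]] = {
    "time server": ["ntp1.example.org", "ntp2.example.org"],
    "domain controller": ["dc1.example.org", "dc2.example.org"],
    "LDAP server": ["ldap1.example.org"],
}


def ping(target: str) -> bool:
    """Send one ICMP echo via the system ping command; True if it exits with status 0."""
    count_flag = "-n" if sys.platform.startswith("win") else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable(probe: Callable[[str], bool] = ping,
                endpoints: Dict[str, List[str]] = CORE_ENDPOINTS) -> Dict[str, List[str]]:
    """Probe every end-point from this host and group the failures by role."""
    failures: Dict[str, List[str]] = {}
    for role, targets in endpoints.items():
        down = [target for target in targets if not probe(target)]
        if down:
            failures[role] = down
    return failures
```

Because the probe is a parameter, the same grouping logic can be tested with a fake probe, or switched to a TCP or NTP query where ICMP is blocked.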

Page 20

Blackboard and Zabbix

• We have multiple Blackboard-specific templates. One is in line with the last example, but it watches the availability and response times of external connectors such as SafeAssign and Collaborate.
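Watching an external connector usually means combining availability with a response-time threshold. The sketch below classifies an HTTP(S) end-point as OK, SLOW or DOWN using only the standard library; the URLs and the 2-second "slow" threshold are assumptions, not the actual SafeAssign or Collaborate addresses or USC's settings.

```python
import time
import urllib.request


def check_endpoint(url, slow_after=2.0, timeout=10.0):
    """Classify an HTTP end-point as ("OK" | "SLOW" | "DOWN", elapsed seconds or None)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # include the body transfer in the measured time
    except OSError:  # URLError, HTTPError (4xx/5xx) and timeouts all derive from OSError
        return "DOWN", None
    elapsed = time.perf_counter() - start
    return ("SLOW" if elapsed > slow_after else "OK"), elapsed
```

Returning the elapsed time as well as the state lets the same check feed both an alert (state) and a response-time graph (elapsed).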

Page 21

Blackboard and Zabbix

Page 22

Blackboard and Zabbix

• One very powerful tool we have is JMX monitoring, which pulls information about the Blackboard application itself.

Page 23

Zabbix Environment Mapping

Zabbix allows you to map relationships between nodes, showing where problems lie and what they affect.

For example, if there were a problem with file03, the line between bbdev01 and file03 would turn red, and file03’s status would change from OK to Problem. This is an easy way to assess what the problem will impact.
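The impact assessment a map gives at a glance can be expressed as a reverse-dependency walk: find every link that touches the failed node, then everything that depends on it directly or indirectly. This is a sketch of the idea, not how Zabbix evaluates maps; only bbdev01 and file03 come from the slide, and the other nodes and links are hypothetical.

```python
from collections import defaultdict, deque

# (dependent, dependency) pairs; only bbdev01 -> file03 comes from the slide.
LINKS = [("bbdev01", "file03"), ("bbdev01", "db-dev"),
         ("bbdev02", "file03"), ("lb-dev", "bbdev01")]


def impact(links, failed):
    """Return (links to draw red, every node affected) when `failed` enters Problem state."""
    dependents = defaultdict(list)
    for node, dependency in links:
        dependents[dependency].append(node)
    red = [(node, failed) for node in dependents[failed]]
    # Breadth-first walk up the dependency chain to find indirect impact.
    affected, queue = set(), deque([failed])
    while queue:
        for node in dependents[queue.popleft()]:
            if node not in affected:
                affected.add(node)
                queue.append(node)
    return red, affected
```

With these links, a problem on file03 turns two links red and affects bbdev01, bbdev02 and, through bbdev01, lb-dev.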

Page 24

Mediasite and Zabbix

• Mediasite is really at the forefront of our monitoring through Zabbix.

• In Nagios, we currently have 5 checks per recorder in production.

• In Zabbix so far, I have 26 individual checks per recorder.

Page 26

The Platforms in Collaboration

• The graph below shows the available space on our production Blackboard file share during the incident.

• Emergency maintenance was carried out on the 15th to increase the allocated disk space.
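A free-space graph like this can also be turned into an early warning by fitting a trend line and estimating when it reaches zero. The sketch below does an ordinary least-squares fit; the sample format (hours, free GB) and the function name are assumptions, and a straight line is only a rough model for storage that fills in bursts.

```python
def hours_until_full(samples):
    """Fit free space against time by least squares and extrapolate to zero.

    `samples` is a list of (hours, free_gb) pairs in time order. Returns the
    hours remaining after the last sample, or None if free space is not falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    cov = sum((t - mean_t) * (f - mean_f) for t, f in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples need distinct timestamps")
    slope = cov / var                     # GB per hour (negative while filling up)
    if slope >= 0:
        return None
    intercept = mean_f - slope * mean_t   # fitted free space at t = 0
    zero_at = -intercept / slope          # time at which the fitted line reaches 0 GB
    return zero_at - samples[-1][0]
```

For instance, free space of 100, 90 and 80 GB at hours 0, 1 and 2 gives a slope of -10 GB per hour, a zero crossing at hour 10, and so 8 hours remaining after the last sample.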

Page 27

The Platforms in Collaboration

• An alert was set up in Splunk to let us know, in real time, when a student submits an assessment larger than 200 MB.
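The alert condition itself is a size threshold applied to submission events. The sketch below assumes a hypothetical `key=value` log line carrying `user` and `bytes` fields, parsed roughly the way Splunk auto-extracts such pairs, and reads "200mb" as 200 × 1024 × 1024 bytes; the real Blackboard event format and field names are not given in the talk.

```python
import re

LIMIT_BYTES = 200 * 1024 * 1024  # "200mb", assuming binary megabytes
KV = re.compile(r'(\w+)=("[^"]*"|\S+)')  # key=value or key="quoted value"


def parse_kv(line):
    """Extract key=value pairs from a raw event line."""
    return {key: value.strip('"') for key, value in KV.findall(line)}


def oversized_submissions(lines, limit=LIMIT_BYTES):
    """Yield (user, size in MB) for submission events larger than `limit` bytes."""
    for line in lines:
        fields = parse_kv(line)
        size = int(fields.get("bytes", 0))
        if size > limit:
            yield fields.get("user"), round(size / (1024 * 1024), 1)
```

In Splunk, the equivalent would be a saved search filtering on the size field with a real-time alert attached; the Python version only shows the threshold logic.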

Page 28

Self-Healing?

https://mediasite.usc.edu.au/Mediasite/Play/4af80791a9784f0bb418be531d7e31671d

The video above was the only way I could think of to present this particular part.

In the video, I have the Zabbix monitoring platform on one side, and a camera feed of the remote Mediasite recorder on the other.

As illustrated in the previous slide, a few checks are deemed “self-healing”, and this is one such scenario. If the Mediasite scheduler service fails or stops, Zabbix picks it up, realises something is not right, and I have it send a command to the recorder to shut the software down and force a restart on the recorder.
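The self-healing flow can be summarised as detect, remediate, verify. The sketch below models it with injected callables so the control flow can be tested without a recorder. In Zabbix itself this would typically be a trigger action running a remote command; here the callables, the re-check count, the 60-second wait and the final "escalate" outcome are all hypothetical additions for illustration.

```python
import time


def self_heal(service_running, stop_software, restart_recorder,
              checks_after_restart=3, wait_seconds=60.0, sleep=time.sleep):
    """Restart a recorder whose scheduler service has stopped, then verify recovery."""
    if service_running():
        return "ok"
    stop_software()           # shut the recorder software down first
    restart_recorder()        # then force a restart of the recorder itself
    for _ in range(checks_after_restart):
        sleep(wait_seconds)   # give the recorder time to boot before re-checking
        if service_running():
            return "recovered"
    return "escalate"         # still down after restarting: hand over to a person
```

Verifying after the restart matters: without it, a recorder stuck in a restart loop would look "handled" while recordings were still being missed.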

Page 29

Questions?