Evaluation of Network Management Systems (NMS)

• Background
• Problem Statement
• Resolution
• Evaluation of NMS solutions
• Recommendation
• Tasks Accomplished
• Tasks Assigned

Rahul Datta, Graduate Student, Vanderbilt University
ISIS, 09/10/06


Background

The Fermi National Accelerator Laboratory (Fermilab) conducts research in lattice QCD (quantum chromodynamics) and operates large clusters of computers for this purpose. The goal is to understand the strong dynamics of quarks and gluons, which is beyond the reach of the traditional perturbative methods of quantum field theory. A central goal of the groups using the computers is to carry out the calculations required to extract the fundamental parameters of the Standard Model of particle physics from experiment.

Fermilab is focusing on building a Cluster Reliability Subsystem. The LQCD computer cluster will be very large and will need to be available 24 hours a day. The cluster should ensure that resources are used to the best possible extent and should attempt to complete started tasks in the presence of hardware and software failures (that is, be fault resilient).


Background (contd.)

Examples of things that can affect availability and performance include:

• power outages, scheduled and unscheduled
• job failures due to failing or failed hardware
• scheduling jobs on faulty nodes
• decreased performance due to hardware deterioration
• decreased performance due to external influences (e.g. air quality)
• inability to diagnose problems (e.g. hardware, OS, batch tools)


Problem Statement

• To determine and specify the requirements placed on an NMS by LQCD-like systems.
• To survey available network management systems (NMS) and select a limited number capable of meeting the requirements to monitor/manage the computer cluster and the devices contained in that network.
  – To measure the performance of the NMS.
  – To ascertain the characteristics and features of the NMS.
• To prototype a limited-scale monitoring/adaptation system.
  – To monitor the utilization and state of all processors and networks in the system.
• To experiment with it and observe what kind of plug-ins or modifications can be made to the NMS.
• To consider a system where pluggable components hook into a message distribution system for routing and delivery to other pluggable components.
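The last item, pluggable components hooked into a message distribution system, can be illustrated with a small publish/subscribe sketch. Everything here is hypothetical and chosen for illustration only: the class names `MessageBus`, `TemperatureMonitor` and `AlarmPresenter`, the topic string, and the 70 C limit are assumptions, not part of any NMS evaluated in this talk.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class MessageBus:
    """Routes each message to every handler subscribed to its topic."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message and return how many handlers received it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


class TemperatureMonitor:
    """Hypothetical pluggable sensor: publishes the readings it is given."""

    def __init__(self, bus: MessageBus) -> None:
        self._bus = bus

    def report(self, node: str, celsius: float) -> None:
        self._bus.publish("phys_attr.temperature",
                          {"node": node, "celsius": celsius})


class AlarmPresenter:
    """Hypothetical pluggable consumer: records readings above a limit."""

    def __init__(self, bus: MessageBus, limit_celsius: float) -> None:
        self.limit_celsius = limit_celsius
        self.alarms: List[str] = []
        bus.subscribe("phys_attr.temperature", self._on_reading)

    def _on_reading(self, message: dict) -> None:
        if message["celsius"] > self.limit_celsius:
            self.alarms.append(
                f"{message['node']} too hot: {message['celsius']} C")


def demo() -> List[str]:
    """Wire one sensor and one consumer together through the bus."""
    bus = MessageBus()
    presenter = AlarmPresenter(bus, limit_celsius=70.0)
    monitor = TemperatureMonitor(bus)
    monitor.report("node01", 55.0)   # below the limit, no alarm
    monitor.report("node02", 82.5)   # above the limit, one alarm
    return presenter.alarms
```

The sensor and the presenter never reference each other; they only share the bus and a topic name, which is what would let new monitors or action takers be plugged in without changing existing components.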


Goal Architecture

[Diagram: Head Node Functions – Final System. Components shown: Coordinator, Archivers, IB Fabric Monitor, IPMI Monitor, Email Monitor, Alarm Presenters, Email Senders, Dcache Monitor, IP Network Monitor, Phys Attr Monitor, User Proc Monitor, Service Monitor, Disk Monitor, Help Ticket Monitor, Job Scanner, Job Checker, Action Takers, Bookkeeping Database. Other labels: PBS, qstat, Database, Acct Log, Maui, To/From Subordinates.]


Goal Architecture (contd.)

[Diagram: Worker Functions – Final System. Components shown: Coordinator, IB HCA Monitor, IPMI Monitor, IP Network Monitor, Phys Attr Monitor, User Proc Monitor, Service Monitor, Storage Monitor, PBS Monitor, Job Resource Monitor, Job Activity Monitor, Job Class/Profile Monitor, Driver Monitor, CPU State Monitor, Uptime Monitor, Action Takers, Bookkeeping Database. Other labels: To/From Manager. Annotations: "Restart services, report success/fail, recycle drivers, reboot machine"; "Activity timing, running, staging, etc."]


Resolution

• Open source, not restricted (distribution, porting, licensing)
• Tools for the user interface
• Kind of communications available
• Heavyweight or lightweight package (resource requirements: memory, processor, bandwidth)
• Synchronization and triggers, memory check
• Plug-ins available, or modules can be built (e.g. sensor modules)
• Effectors, sensors and monitors
• Documentation


Potential NMS solutions

• OpenNMS
• PIKT
• JFFNMS
• Nagios
• Aware
• Net-Policy
• SYSMON

Note: All the network management systems discussed here are open source.

• Given the scope of the research done so far, Net-Policy and SYSMON are not discussed in detail here.


OpenNMS (Open Network Management System)

Platforms supported: Linux, Fermi Linux, CentOS, RHEL 3 & 4, Debian Sarge, SuSE, Red Hat Linux, Mandrake, Solaris, Mac OS X (Panther).

Features:

• GUI (web-based graphical user interface)
• Service polling:
  o OpenNMS is a real-time, event-driven system. Events typically come from SNMP traps, but can come from other sources such as syslog. Event handling has no polling interval as such: if a node goes down, the switch generates an SNMP trap immediately, which gives true real-time network monitoring.
  o OpenNMS can also poll services, including ICMP, NotesHTTP, DominoHTTP, Citrix, LDAP, SNMP, SNMPv2, and many more.
• Network discovery
• Availability reporting


OpenNMS (contd.)

• SNMP data collection
• SNMP trap receiver (over 5000 traps are pre-configured)
• Notification via e-mail, pager, XMPP, Growl, or anything that can be run from a command line
• Supported communications: alarms, sensors, effectors
• Thresholds (based on data collected via SNMP or on response time from a poller)
• Well documented

Written in: Java
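The threshold feature listed above (a threshold on response time from a poller) can be sketched in a few lines. This is not OpenNMS code or its configuration format; the function names, the TCP connect check, and the status strings are assumptions made only to show the idea.

```python
import socket
import time
from typing import Optional


def poll_tcp_response_ms(host: str, port: int,
                         timeout_s: float = 2.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return None
    return (time.monotonic() - start) * 1000.0


def evaluate_threshold(response_ms: Optional[float], warn_ms: float) -> str:
    """Classify one poll result against a response-time threshold."""
    if response_ms is None:
        return "DOWN"
    if response_ms > warn_ms:
        return "THRESHOLD_EXCEEDED"
    return "OK"
```

A poller loop would call `evaluate_threshold(poll_tcp_response_ms(host, 80), warn_ms=200.0)` at each interval and raise an event when the result changes; the host, port and 200 ms limit are placeholders.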


PIKT (Problem Informant/Killer Tool)

Platforms supported: GNU/Linux, Fermi Linux, AIX, FreeBSD, OpenBSD, Digital UNIX

Features:

• Lacks a proper GUI
• Reporting a problem
• Fixing a problem (killing idle user sessions, monitoring user activity, deleting junk files, disk management)
• Scanning a log file (log file analysis)
• Configuring a system (network configuration)
• Auto-configuring a file (automated configuration management)


PIKT (contd.)

Features (contd.):

• Job scheduling (centrally directed scheduling daemon, cron alternative)
• Monitoring system security (checksum differences, change auditing)
• Enhancing the command line (command line macros, remote command execution)
• Lacks proper documentation
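The "checksum differences" idea behind security monitoring can be shown with a short sketch: record a baseline of file checksums, then report files whose checksum later differs. This is an illustration in Python, not PIKT's own script language or implementation; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths: Iterable[Path]) -> Dict[str, str]:
    """Record a checksum for each of the given files."""
    return {str(p): file_checksum(p) for p in paths}


def changed_files(baseline: Dict[str, str],
                  current: Dict[str, str]) -> List[str]:
    """List files that are new, missing, or whose checksum changed."""
    names = set(baseline) | set(current)
    return sorted(n for n in names if baseline.get(n) != current.get(n))
```

A scheduled job would store the baseline once, take a fresh `snapshot` on each run, and report whatever `changed_files` returns.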


JFFNMS (Just For Fun Network Management System)

Platforms supported: GNU/Linux, Fermi Linux, AIX, FreeBSD, OpenBSD, Digital UNIX

Features:

• Web GUI
• Event console: shows events and alarms in the same time-ordered display
• Distributed polling
• Triggers/actions framework for email and other clients
• Map and sub-map support
• Completely administrable via the web; sound alerts in the browser
• Database abstraction framework
• Object oriented
• Sensors


JFFNMS (contd.)

Reports:

• Traffic bytes
• Utilization %
• Packets per second, errors per second, error rate
• Round trip time and packet loss (Cisco and SmokePing)
• Drops
• TCP connections: incoming, outgoing, established, delay
• Number of processors, number of users
• Used memory and disks with aggregation
• Processor utilization and load average
• Temperature

Documentation available

Written in: PHP
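Several of the reports above (utilization %, packets per second, errors per second) are rates derived from two samples of a monotonically increasing counter. The sketch below shows that arithmetic; it is not taken from JFFNMS, and it assumes 32-bit counters that wrap at most once between samples. All function names are hypothetical.

```python
def counter_delta(previous: int, current: int, counter_bits: int = 32) -> int:
    """Difference between two counter samples, allowing one wraparound."""
    if current >= previous:
        return current - previous
    return (1 << counter_bits) - previous + current


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_s: float, link_bps: float,
                        counter_bits: int = 32) -> float:
    """Link utilization over the interval, as a percentage of link speed."""
    bits = counter_delta(prev_octets, curr_octets, counter_bits) * 8
    return 100.0 * bits / (interval_s * link_bps)


def rate_per_second(prev_count: int, curr_count: int,
                    interval_s: float, counter_bits: int = 32) -> float:
    """Generic per-second rate, e.g. packets or errors per second."""
    return counter_delta(prev_count, curr_count, counter_bits) / interval_s
```

For example, 37,500,000 octets transferred in 300 seconds on a 100 Mbit/s link is 300,000,000 bits out of a possible 30,000,000,000, which is 1% utilization.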


Nagios

Platforms supported: Linux, Fermi Linux

Features:

• Monitoring of network services (SMTP, POP3, HTTP, etc.)
• Ability to define a network host hierarchy, allowing detection and distinction of hosts that are down and those that are unreachable
• Notifications via email, pager or other user-defined method
• Ability to define event handlers to be run during service or host events for proactive service resolution
• Ability to acknowledge problems via the web interface
• Supported communications:
  o Simple plugin design allowing users to develop their own host and service checks
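The distinction between "down" and "unreachable" hosts in a host hierarchy can be sketched as follows. This is a simplified illustration of the idea, not Nagios code: the `parents` mapping, the `is_alive` callback and the function name are assumptions.

```python
from typing import Callable, Dict, List


def classify_host(host: str,
                  parents: Dict[str, List[str]],
                  is_alive: Callable[[str], bool]) -> str:
    """Return "UP", "DOWN" or "UNREACHABLE" for a host.

    parents maps each host to the hosts it is reached through; hosts
    attached directly to the monitoring server have no parents.
    """
    if is_alive(host):
        return "UP"
    upstream = parents.get(host, [])
    if not upstream:
        return "DOWN"
    # If any parent responds, the path works, so the host itself is down.
    if any(is_alive(parent) for parent in upstream):
        return "DOWN"
    # Every parent failed too, so the host's own state cannot be determined.
    return "UNREACHABLE"
```

With a switch as the parent of two worker nodes, a failed switch makes both nodes "UNREACHABLE" rather than "DOWN", so only one real problem is reported.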


Nagios (contd.)

Supported communications (contd.):

  o Monitoring of host resources (processor load, disk and memory usage, running processes, log files, etc.)
  o Monitoring of environmental factors such as temperature

Written in: C
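The simple plugin design works because a Nagios check is just a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below is a minimal load check in that style; the script name `check_load.py`, the function names and the output wording are hypothetical, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os

# Exit codes used by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_load(load1: float, warn: float, crit: float):
    """Map a 1-minute load average to an (exit_code, status_line) pair."""
    if load1 >= crit:
        return CRITICAL, f"LOAD CRITICAL - load1={load1:.2f}"
    if load1 >= warn:
        return WARNING, f"LOAD WARNING - load1={load1:.2f}"
    return OK, f"LOAD OK - load1={load1:.2f}"


def main(argv) -> int:
    """Parse WARN and CRIT from argv, print one status line, return the code."""
    try:
        warn, crit = float(argv[1]), float(argv[2])
        load1 = os.getloadavg()[0]
    except (IndexError, ValueError, OSError):
        print("LOAD UNKNOWN - usage: check_load.py WARN CRIT")
        return UNKNOWN
    code, line = check_load(load1, warn, crit)
    print(line)
    return code
```

Saved as a script, the file would end with `sys.exit(main(sys.argv))` (after `import sys`) so that the return value becomes the process exit code that Nagios reads.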


Aware

Platforms supported: Linux, Fermi Linux

Features:

• Open source implementation allows for a robust code base and customization
• Common core engine implements a model of event processing
• A "plug-in" style mechanism allows dynamic addition of handlers
• Agents are composed of a set of running event handlers
• Agents can get their configuration from other agents (e.g., a centrally managed set of agent configurations)
• Agents can communicate with other agents using connection-oriented, connectionless and broadcast-based methods


Aware (contd.)

Features (contd.):

• Supported communications:
  o Sensors: a comprehensive set of sensors that gather relevant information
  o Analyzers: components that process data from the sensors and issue controller commands
  o Controllers: components that change system state (e.g., run programs, change system parameters, control devices)
• Documentation available

Written in: C
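The sensor, analyzer and controller roles described above form a small pipeline: sensors produce readings, analyzers turn readings into commands, and controllers act on commands. The sketch below illustrates that flow only; it does not use Aware's API, and every name in it (`Reading`, `Command`, `analyze_disk`, `RecordingController`, `run_cycle`, the 90% limit) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Reading:
    name: str
    value: float


@dataclass
class Command:
    action: str
    target: str


def analyze_disk(reading: Reading,
                 limit_percent: float = 90.0) -> Optional[Command]:
    """Hypothetical analyzer: request a cleanup when disk usage is too high."""
    if reading.name == "disk_used_percent" and reading.value > limit_percent:
        return Command(action="clean_tmp", target="/tmp")
    return None


class RecordingController:
    """Hypothetical controller: records commands instead of running them."""

    def __init__(self) -> None:
        self.executed: List[Command] = []

    def execute(self, command: Command) -> None:
        self.executed.append(command)


def run_cycle(sensors: Iterable[Callable[[], Reading]],
              analyzer: Callable[[Reading], Optional[Command]],
              controller: RecordingController) -> int:
    """Read every sensor once and return the number of commands issued."""
    issued = 0
    for sensor in sensors:
        command = analyzer(sensor())
        if command is not None:
            controller.execute(command)
            issued += 1
    return issued
```

A real controller would change system state (run a program, change a parameter) where this one only appends to a list, which keeps the sketch safe to run.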


Comparison of the features of the different NMS

Columns: Tool Name | GUI and Status Reports | Service Polling, Network Discovery | Alarms, Sensors, Effectors | Memory, Processor, Bandwidth

Star ratings are shown combined across the feature columns; Memory, Processor, Bandwidth is still to be determined (TBD) for every tool.

Tool Name   Feature ratings                  Memory, Processor, Bandwidth
OpenNMS     * * * * * * * * * * * * * * *    TBD
PIKT        * * * * * * * *                  TBD
JFFNMS      * * * * * * * * * * *            TBD
Nagios      * * * * * * * * * * *            TBD
Aware       * * * * * * * * * * * * *        TBD


Recommendation

• Explore and experiment with the full features of at least 2 or 3 open source network management systems (NMS) before finalizing an NMS.
• Based on the comparative features, OpenNMS has been chosen.


Tasks Accomplished

• Successfully installed OpenNMS on an offsite Fermi Linux machine at ISIS, Vanderbilt University.


Tasks Assigned

• Exploring the features of OpenNMS, for example:
  – Finding a sensor, building it, and installing it.
  – Writing our own sensors, alarms, and effectors.
  – Detecting the temperature difference of the hard drive of at least one of the nodes using OpenNMS.


Useful Links / URLs

• http://www.openxtra.co.uk/resource-center/open_source_network_management_systems.php

• http://www.opennms.org/index.php/Main_Page

• http://jffnms.sourceforge.net/

• http://www.elegant-software.com/software/aware/doc/html/index.html