Page 1: 1 RedIRIS – Alberto Escolano Sánchez alberto.escolano@rediris.es RedIRIS monitoring and operational procedures.

RedIRIS – Alberto Escolano Sánchez, alberto.escolano@rediris.es

RedIRIS monitoring and operational procedures

Agenda

• Part I: Monitoring
– Concepts
– SNMP
– Hardware
– Tools
– Active Monitoring

Concepts

• SNMP (Simple Network Management Protocol)
– RFC 1157
– Protocol developed to manage nodes of an IP network

• UDP (User Datagram Protocol)
– RFC 768
– Most commonly used transport protocol for SNMP

• SMI (Structure of Management Information)
– RFC 1155
– RFC 2578 (version 2)
– Contains the definitions for the structure and identification of management information for the Internet

• MIB (Management Information Base)
– RFC 1156
– RFC 1213 (version 2, MIB-II)
– Together with SNMP and SMI, provides the architecture for managing the Internet

• OID (Object Identifier)
– Sequence of numbers separated by dots that identifies one exact parameter

• NMS (Network Management System)
– Set of applications that monitor and control managed devices
– Can be standard or vendor specific

SNMP

• Protocol used to manage network devices such as switches, routers and servers

• Components

– NMS: Software used to monitor and control managed devices

– SNMP agent: Management software running in the managed device

– Network device: Network node to be managed

• SNMP uses the information provided by MIBs

• MIBs describe the structure of the management data of a network device in a hierarchical way using OIDs

• OIDs identify variables or elements that can be read or written via SNMP

• Network devices generate and send SNMP traps to the management system

• SNMP versions
– SNMPv1: Basic operations and features
  – Simplicity
  – Lack of security
  – RFC 1157
– SNMPv2: Additional operations and features
  – Several versions (SNMPv2p, SNMPv2c, SNMPv2u, SNMPv2*)
  – Improved security
  – Difficult choice between versions
  – e.g. SNMPv2c – RFC 1901
– SNMPv3: Security enhancement
  – Uses features from several SNMPv2 versions
  – Flexible way to define security methods and parameters
  – RFC 2570

• SNMP architecture (diagram)
– The NMS runs the SNMP manager; each managed device (L2 switch, L3 router) runs an SNMP agent; manager and agents each hold MIBs
– SNMP Request and SNMP Response between manager and agents: UDP port 161
– SNMP Trap from the agents to the manager: UDP port 162

• MIB Tree structure

– Each SNMP OID represents an individual object of the MIB

– The MIB can be broken down into a tree structure where OIDs are leaves on the tree

root
  ccitt (0)
  iso (1)
    standard (0)
    identified organization (3)
      dod (6)
        internet (1)
          directory (1)
          mgmt (2)
            mib-II (1)
              interface (2)
              …
          experimental (3)
          private (4)
          security (5)
          snmpv2 (6)
      …
  joint-iso-ccitt (2)

• First approach: How do all these things work?
– Goal: query the inbound octets that have passed through an interface of a switch in the network
– Let's assume SNMP is configured and running properly
– We need the MIB and the OID for the SNMP query, located in the hierarchy of the OID tree
– 1.3.6.1.2.1.2 is the OID for the interface-related data
– 1.3.6.1.2.1.2.2.1.10 is the OID for the ifInOctets parameter value
– Now we need the interface index to refer to it. Let's assume it is 65.
– The full OID is 1.3.6.1.2.1.2.2.1.10.65
– OID translation:
  – .iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifInOctets.65
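The translation from a numeric OID to its textual form can be sketched in a few lines of Python. This is only an illustration: the `OID_NAMES` table is a hand-built excerpt covering the branch used in this example, and `translate_oid` is a hypothetical helper name. A real tool would load the names from the MIB files instead.

```python
# Hand-built excerpt of the OID tree (assumption: only the branch used here).
OID_NAMES = {
    (1,): "iso",
    (1, 3): "org",
    (1, 3, 6): "dod",
    (1, 3, 6, 1): "internet",
    (1, 3, 6, 1, 2): "mgmt",
    (1, 3, 6, 1, 2, 1): "mib-2",
    (1, 3, 6, 1, 2, 1, 2): "interfaces",
    (1, 3, 6, 1, 2, 1, 2, 2): "ifTable",
    (1, 3, 6, 1, 2, 1, 2, 2, 1): "ifEntry",
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 10): "ifInOctets",
}


def translate_oid(numeric_oid: str) -> str:
    """Replace each known prefix of the OID with its name, keep the rest numeric."""
    parts = [int(p) for p in numeric_oid.strip(".").split(".")]
    labels = []
    for depth in range(1, len(parts) + 1):
        prefix = tuple(parts[:depth])
        # Unknown nodes (such as the interface index) stay as numbers.
        labels.append(OID_NAMES.get(prefix, str(parts[depth - 1])))
    return "." + ".".join(labels)


print(translate_oid("1.3.6.1.2.1.2.2.1.10.65"))
```

The trailing 65 is not in the table, so it is kept as the instance (interface index) at the end of the name.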

• Second approach: Numeric OID conversion
– 1.3.6.1.2.1.2.2.1.10.65 is converted using IF-MIB
– IF-MIB partially detailed:

    IF-MIB DEFINITIONS ::= BEGIN

    IMPORTS
        MODULE-IDENTITY, OBJECT-TYPE, Counter32, Gauge32, Counter64,
        Integer32, TimeTicks, mib-2,
        NOTIFICATION-TYPE                        FROM SNMPv2-SMI

    ifMIB MODULE-IDENTITY
        LAST-UPDATED "200006140000Z"
        ORGANIZATION "IETF Interfaces MIB Working Group"
        CONTACT-INFO
        ...

    ifEntry OBJECT-TYPE
        SYNTAX      IfEntry
        MAX-ACCESS  not-accessible
        STATUS      current
        DESCRIPTION
            "An entry containing management information applicable to a
            particular interface."
        INDEX   { ifIndex }
        ::= { ifTable 1 }

– IF-MIB partially detailed (cont.):

    IfEntry ::=
        SEQUENCE {
            ifIndex                 InterfaceIndex,
            ifDescr                 DisplayString,
            ifType                  IANAifType,
            ifMtu                   Integer32,
            ifSpeed                 Gauge32,
            ifPhysAddress           PhysAddress,
            ifAdminStatus           INTEGER,
            ifOperStatus            INTEGER,
            ifLastChange            TimeTicks,
            ifInOctets              Counter32,
            ifInUcastPkts           Counter32,
            ...
        }

    ifInOctets OBJECT-TYPE
        SYNTAX      Counter32
        MAX-ACCESS  read-only
        STATUS      current
        DESCRIPTION
            "The total number of octets received on the interface,
            including framing characters.

            Discontinuities in the value of this counter can occur at
            re-initialization of the management system, and at other
            times as indicated by the value of
            ifCounterDiscontinuityTime."
        ::= { ifEntry 10 }

• Result of the SNMP query
– The object is of type Counter32, so the result of the query is a 32-bit value
– e.g. a real query made to a Cisco switch:
  – .1.3.6.1.2.1.2.2.1.10.65 = Counter32: 36307165
– That result translated into text using IF-MIB:
  – .iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifInOctets.65 = Counter32: 36307165

• Conclusion of the results obtained
– At the time of the query, 36307165 inbound octets in total had passed through the interface with index 65 of the queried device
– To obtain results in bps, the device must be polled periodically; the delta between two consecutive samples, multiplied by 8 and divided by the polling interval, gives the rate
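The delta calculation can be sketched as follows. This is a minimal illustration, not the code of any particular tool. The function name `rate_bps`, the second sample and the 300-second interval are assumptions; only the first sample (36307165) comes from the query above. Because ifInOctets is a Counter32, the counter wraps at 2^32, and the sketch assumes at most one wrap between two samples.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps back to 0 after 2^32 - 1


def rate_bps(prev_octets: int, curr_octets: int, interval_s: float) -> float:
    """Average rate in bits per second between two Counter32 samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # The modulo handles a single counter wrap between the two polls.
    delta_octets = (curr_octets - prev_octets) % COUNTER32_MODULUS
    return delta_octets * 8 / interval_s


# First sample from the slide, second sample assumed, polled 300 s apart.
print(rate_bps(36307165, 36682165, 300))
# Counter wrapped between the two polls (values assumed).
print(rate_bps(4294967000, 704, 10))
```

On fast links a 32-bit counter can wrap more than once within a polling interval, in which case this calculation underestimates the rate.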

Hardware

• The hardware involved in SNMP monitoring comprises all network equipment and servers

• RedIRIS core network
– Layer 2 switches
  – Nortel MERS 8610
  – Cisco Catalyst 6500
– Layer 3 routers
  – Juniper T-320, M-320
  – Juniper MX-480, MX-960
  – Juniper M120, M40e, M20, M10i

• RedIRIS access network
– Layer 2 switches
  – Juniper EX-4200
  – Cisco Catalyst 2960
– Layer 3 routers
  – Juniper M7i

• RedIRIS servers
– Red Hat Enterprise Linux 4.x and 5.x
– Solaris 8 and Solaris 10

• SNMP configuration
– Network equipment (L2, L3)
  – General config parameters
    – SNMP version
    – SNMP communities (RO, RW)
    – SNMP clients
    – TRAPs to send to the SNMP manager
    – Source address to bind TRAP packets
    – Location and contact details
  – TRAP details
    – Vendor specific
    – Vendor MIBs in SNMP manager
    – Categories
      – Authentication
      – Chassis
      – Link
      – VLANs
      – Configuration
      – Routing
      – STP
      – …

– Cisco IOS
  – Parameters configured globally

    snmp-server community public RO
    snmp-server community private RW
    snmp-server trap-source Vlan40
    snmp-server location RedIRIS NOC; Ed. BRONCE, Pza. Manuel Gomez Moreno, s/n, 28020-Madrid
    snmp-server contact RedIRIS NOC; +34 91 2127620; <[email protected]>
    snmp-server enable traps snmp authentication linkdown linkup coldstart warmstart
    snmp-server enable traps vlancreate
    snmp-server enable traps vlandelete
    snmp-server enable traps config
    snmp-server enable traps bridge newroot topologychange
    snmp-server enable traps syslog
    snmp-server host 130.206.1.39 version 2c community
    snmp-server tftp-server-list 80
    snmp-server chassis-id number

– Juniper JUNOS
  – Configured in the dedicated snmp section of the configuration

    snmp {
        location "Centro de Gestion de RedIRIS, C/ Serrano 142 (28006-Madrid)";
        contact "RedIRIS NOC; +34 912127620; +34 629148201; <[email protected]>";
        community <community> {
            authorization read-only;
            clients {
                130.206.1.39/32;
                130.206.1.40/32;
            }
        }
        trap-options {
            source-address lo0;
        }
        /* Notifications */
        trap-group <trap-group-name> {
            version v2;
            categories {
                authentication;
                chassis;
                link;
                remote-operations;
                routing;
                startup;
                rmon-alarm;
            }
            targets {
                130.206.1.39;
            }
        }
    }

– Servers (Solaris, Linux)
  – SNMP manager used in RedIRIS (NET-SNMP)
    – Both client and server features
    – Used for Solaris and Linux systems
    – Available for free (http://www.net-snmp.org/)
  – SNMP config files
    – /etc/snmp/snmpd.conf
      – SNMP daemon config file
      – Listening on UDP port 161

    #ACL
    com2sec local 127.0.0.1/32 <community>
    com2sec myLAN 192.168.1.0/24 <community>

    #ACL assignment for RW and RO groups
    group MyRWGroup v1 local
    group MyRWGroup v2c local
    group MyROGroup v1 myLAN
    group MyROGroup v2c myLAN

    # MIB tree to be queried
    ## name incl/excl subtree mask(optional)
    view all included .1 80

    #group context sec.model sec.level prefix read write notif
    access MyROGroup "" any noauth exact all none none
    access MyRWGroup "" any noauth exact all all all

    # Contact Information
    syslocation RedIRIS NOC; Ed. BRONCE, Pza. Manuel Gomez Moreno, s/n, 28020-Madrid
    syscontact RedIRIS NOC; +34 91 2127620; [email protected]

    – /etc/snmp/snmptrapd.conf
      – TRAP receiver daemon config file
      – Listening on UDP port 162

    # --== SONET/SDH Alarms ==--
    traphandle JUNIPER-SONET-MIB::jnxSonetAlarmSet /usr/local/bin/traptoemail -s chico.rediris.es -f [email protected] [email protected]
    traphandle JUNIPER-SONET-MIB::jnxSonetAlarmCleared /usr/local/bin/traptoemail -s chico.rediris.es -f [email protected] [email protected]

    # --== Links ==--
    traphandle IF-MIB::linkUp /usr/local/bin/traptoemail -s chico.rediris.es -f [email protected] [email protected]
    traphandle IF-MIB::linkDown /usr/local/bin/traptoemail -s chico.rediris.es -f [email protected] [email protected]

    # --== BGP ==--
    traphandle BGP4-MIB::bgpEstablished /usr/local/bin/traptoemail -s chico.rediris.es -f [email protected] [email protected]
    traphandle BGP4-MIB::bgpBackwardTransition /usr/local/bin/traptoemail -s chico.rediris.es -f [email protected] [email protected]

  – traphandle is used to execute a script (traptoemail) when a matching trap arrives
  – traptoemail is a script that processes traps and sends them in a readable form by e-mail to the RedIRIS NOC
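To illustrate what a trap handler has to do, here is a rough Python sketch of the parsing step. It is not the traptoemail script itself. It assumes the input layout that snmptrapd hands to a traphandle program on standard input: a host name line, an address line, then one "OID value" pair per line. The sample text and the helper names `parse_trap` and `format_mail_body` are hypothetical, and the address line is simplified to a bare IP address.

```python
def parse_trap(text: str) -> dict:
    """Split handler input into host, address and a list of (oid, value) pairs."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected at least a host line and an address line")
    varbinds = []
    for ln in lines[2:]:
        # Each remaining line is "OID value"; the value may contain spaces.
        oid, _, value = ln.partition(" ")
        varbinds.append((oid, value.strip()))
    return {"host": lines[0].strip(), "address": lines[1].strip(), "varbinds": varbinds}


def format_mail_body(trap: dict) -> str:
    """Render the parsed trap as the plain-text body of a notification."""
    out = [f"Host: {trap['host']} ({trap['address']})"]
    out += [f"{oid} {value}" for oid, value in trap["varbinds"]]
    return "\n".join(out)


# Assumed sample input, shortened from a linkUp trap.
sample = (
    "EB-Santiago0\n"
    "130.206.204.254\n"
    "SNMPv2-MIB::snmpTrapOID.0 IF-MIB::linkUp\n"
    "IF-MIB::ifName.121 so-3/0/0\n"
)
print(format_mail_body(parse_trap(sample)))
```

A real handler would read `sys.stdin` instead of a sample string and pass the body to an SMTP client. That part is left out here.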

  – SNMP daemons
    – /etc/init.d/snmpd
    – /etc/init.d/snmptrapd
  – Launching options
    – start
    – status (for snmpd)
    – stop
    – restart
    – reload
  – Options in daemon:

    OPTIONS="-c /etc/snmp/snmptrapd.conf -o /var/log/snmptrap.log -u /var/run/snmptrapd.pid -M /usr/local/share/snmp/mibs/ -m ALL"

  – This takes snmptrapd.conf as the config file for the daemon, writes the log and PID files given in the options (/var/log/snmptrap.log and /var/run/snmptrapd.pid) and loads ALL the MIBs found in the defined path

Tools

• trap2email
– Perl script combined with the SNMP trap handler, used to convert SNMP traps to e-mail messages
– Should be launched as an extension of snmptrapd, not as a regular user
– Options
  – -s smtpserver
  – -f fromaddress
  – toaddress
– Line in the /etc/snmp/snmptrapd.conf file

    traphandle IF-MIB::linkUp /usr/local/bin/traptoemail -s chico.rediris.es -f [email protected] [email protected]

– Results

    Host: EB-Santiago0 (130.206.204.254)
    SNMPv2-MIB::sysUpTime.0 112:4:13:18.95
    SNMPv2-MIB::snmpTrapOID.0 IF-MIB::linkUp
    IF-MIB::ifIndex.121 121
    IF-MIB::ifAdminStatus.121 up
    IF-MIB::ifOperStatus.121 up
    IF-MIB::ifName.121 so-3/0/0
    SNMPv2-MIB::snmpTrapEnterprise.0 JUNIPER-CHASSIS-DEFINES-MIB::jnxProductNameM40e
    Interfaz: so-3/0/0
    Descripcion del interfaz: -- Conexion RedIRIS-FCCN I - Num. Adm. 1530000-1022512

• MRTG (The Multi Router Traffic Grapher)
– Tool written in Perl, licensed under the GPL and downloadable for free from the MRTG web site (http://oss.oetiker.ch/mrtg/)
– The tool uses SNMP to query network devices and collect information from them
– The results of the queries are stored (log or RRD)
– Those files are processed and included in an HTML file with PNG graphs
– RedIRIS uses the RRD (Round Robin Database) format to store the data collected
– Example of graph generated with MRTG and RRD data

• MRTG basic components
– mrtg: main program
– cfgmaker: script used to generate the .cfg files the main program needs to generate graphs
– RRDtool: if required. RedIRIS uses RRD, so RRDtool is needed and information is stored in RRD database format
  – RRDtool is a free open-source tool licensed under the GPL
  – Downloadable (http://oss.oetiker.ch/rrdtool/)

• MRTG configuration
– MRTG needs .cfg files to generate the HTML web pages where information is displayed
– cfgmaker [options] [community@]router [[options] [community@]router ...]
– Some options available:

    --ifref=nr       interface references by Interface Number (default)
    --ifref=ip       ... by Ip Address
    --ifref=eth      ... by Ethernet Number
    --ifref=descr    ... by Interface Description
    --ifref=name     ... by Interface Name
    --ifref=type     ... by Interface Type
    --ifdesc=nr      interface description uses Interface Number (default)
    --ifdesc=ip      ... uses Ip Address
    --ifdesc=descr   ... uses Interface Description
    --ifdesc=name    ... uses Interface Name
    --ifdesc=alias   ... uses Interface Alias
    --ifdesc=type    ... uses Interface Type

• MRTG configuration
– Command used in RedIRIS

    ./cfgmaker --global "HtmlDir: /home/mrtg/datos/GAL/html" --global "ImageDir: /home/mrtg/datos/GAL/html/image" --global "LogDir: /home/mrtg/datos/GAL/html/log" --global "LogFormat: rrdtool" --global "PathAdd: /usr/bin/" --global "Options[_]: growright, bits" --snmp-options=:::::2 <community>@eb-santiago0

– Resulting configuration file

    HtmlDir: /home/mrtg/datos/GAL/html
    ImageDir: /home/mrtg/datos/GAL/html/images
    LogDir: /home/mrtg/datos/GAL/html/log
    LogFormat: rrdtool
    PathAdd:/usr/bin/
    #WorkDir:/home/noc/mrtg/html/GAL
    Refresh:300
    Language: Spanish
    Forks: 4
    RunAsDaemon:Yes
    Interval:5
    Background[_]: #e8e7dc
    #---------------------------------------------------------------
    YLegend[cesga]: Bits por segundo
    Options[cesga]: growright, bits
    Target[cesga]: /130.206.204.21:<community>@eb-santiago0.rediris.es:::::2
    MaxBytes[cesga]: 312500000
    Title[cesga]: Línea de acceso CESGA
    PageTop[cesga]:
    <TABLE>
    <TR><TD>Línea:</TD><TD>GigabitEthernet 1000 Mbps</TD></TR>
    <TR><TD>Sistema:</TD><TD>EB-Santiago0</TD></TR>
    <TR><TD>Administrador:</TD><TD>NOC de RedIRIS; +34-91 212 76 20/25; <[email protected]></TD></TR>
    </TABLE>
    #---------------------------------------------------------------
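MRTG's MaxBytes value is expressed in bytes per second, so it relates to a line rate in bits per second by a factor of 8. The small sketch below shows that conversion in both directions. The function names are illustrative and are not part of MRTG.

```python
def max_bytes_for(line_rate_bps: int) -> int:
    """MaxBytes value (bytes per second) for a line rate given in bits per second."""
    return line_rate_bps // 8


def line_rate_for(max_bytes: int) -> int:
    """Line rate in bits per second implied by a MaxBytes value."""
    return max_bytes * 8


print(max_bytes_for(1_000_000_000))   # a 1000 Mbps line
print(line_rate_for(312_500_000))     # the MaxBytes value used in the sample
```

By this arithmetic a 1000 Mbps line corresponds to a MaxBytes of 125000000, while the 312500000 in the sample corresponds to 2.5 Gbps.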

• MRTG results

• MRTG organization in RedIRIS
– Each RedIRIS node has a unique cfg file
– MRTG statistics are divided into several groups
  – RedIRIS10 links
  – External links
  – Multicast statistics
  – BGP peerings
  – Monthly statistics
  – Yearly statistics
  – RedIRIS Central Services
  – Special Projects links
  – Access statistics
    – Alphabetically ordered by institution

• Weathermap
– Combination of several files to generate the map
  – SVG map for output
  – XML file with the status of the network
  – PNG files to display in a web page

• Nagios
– Open-source monitoring tool licensed under the GPL
– Freely downloadable (http://www.nagios.org/)
– Prerequisites needed to install the tool
  – HTTP server (Apache)
  – GCC compiler to build the binaries from source
  – GD development libraries
– On Fedora Linux, for example, all packages can be installed with yum

    yum install httpd
    yum install gcc
    yum install glibc glibc-common
    yum install gd gd-devel

– Download and install Nagios and Nagios Plugins
– Nagios Plugins are needed to check the status of hosts and services
  – HTTP, POP3, FTP, SSH, NTP…
  – CPU Load, Disk Usage, Memory Usage, Users…
  – Servers and Hosts (Unix/Linux, Windows)
  – Routers, Switches
  – …

• Nagios configuration
– Main Configuration File
  – /usr/local/nagios/etc/nagios.cfg
  – File read by the daemon and the CGIs
  – Default file OK for starting
– Resource Files
  – Used to store user-defined macros
  – Referenced in nagios.cfg
– Object Definition Files
  – Used to define hosts, services and everything to be monitored
  – Used to define HOW hosts are monitored
  – Referenced in nagios.cfg
– CGI Configuration File
  – Used to define directives that affect the operation of the CGIs
  – Referenced in nagios.cfg

• Nagios configuration examples
– Main Configuration File: nagios.cfg
  – The default file after installing is OK for starting with the tool
– Resource Files
  – Optional and useful to store usernames, passwords or paths
  – See the resource.cfg file in the sample-config directory of the Nagios installation package
– Object Definition Files
  – Defined in nagios.cfg: cfg_file=<file_name>

    cfg_file=/usr/local/nagios/etc/hosts.cfg
    cfg_file=/usr/local/nagios/etc/services.cfg
    cfg_file=/usr/local/nagios/etc/commands.cfg

  – Example hosts.cfg file

    define host{
        use                     generic-host
        host_name               chico.rediris.es
        alias                   Chico
        address                 130.206.1.3
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   120
        notification_period     24x7
        notification_options    d,u,r
    }

– CGI Configuration File
  – cgi.cfg file located in the config directory

    authorized_for_system_information=nagiosadmin
    authorized_for_configuration_information=nagiosadmin
    authorized_for_system_commands=nagiosadmin
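When many hosts share the same structure, host definitions like the hosts.cfg example above are often generated rather than typed by hand. The following is a minimal sketch of such a generator, assuming the same directives as the example. `render_host` is a hypothetical helper, not something shipped with Nagios.

```python
def render_host(host_name, alias, address, template="generic-host", **directives):
    """Return the text of one Nagios 'define host' block."""
    fields = {"use": template, "host_name": host_name, "alias": alias, "address": address}
    fields.update(directives)  # extra directives, e.g. check_command
    width = max(len(name) for name in fields)
    body = "\n".join(f"    {name.ljust(width)}  {value}" for name, value in fields.items())
    return "define host{\n" + body + "\n}\n"


print(render_host(
    "chico.rediris.es", "Chico", "130.206.1.3",
    check_command="check-host-alive",
    max_check_attempts=10,
    notification_interval=120,
    notification_period="24x7",
    notification_options="d,u,r",
))
```

The output can be written to a file that is referenced from nagios.cfg with a cfg_file= line.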

• Nagios running

• NagVis

– NagVis is a visualization addon for Nagios

– Free GPL software (http://www.nagvis.org/)

– Objects placed in maps updated periodically

– Maps can be organized:

– Geographically

– Physically

– Logically

– By process

– NagVis collects the information from backends

– Default backend delivered with NagVis: NDO (Nagios Data Out) MySQL Backend

– All objects from Nagios can be added to NagVis

– Each map has its own configuration file

• NagVis deployment in RedIRIS

Active Monitoring

• Everything covered so far relates to passive monitoring
– In passive monitoring, devices are periodically polled to collect data

• Active Monitoring – What is it?
– Active requires "action"
– In active monitoring, test packets are injected into the network to measure
  – Throughput
  – Delay

• Active Monitoring – How to do it?
– RedIRIS is currently deploying perfSONAR (PERFormance Service Oriented Network monitoring ARchitecture)
– Information and downloads: http://www.perfsonar.net/
– DANTE vs Internet2 version
  – Java vs Perl

• perfSONAR components

– Client / Server application

– Client-side - perfSONAR UI (User Interface)

– Server-side

– 1 Linux box for throughput measurements (BWCTL)

– 1 Linux box for delay measurements (OWAMP)

– Server installation

– Red Hat Enterprise Linux 5.3 recommended

– May run on any Linux distribution

– RedIRIS tested it on CentOS Linux 5.3

– Set of tools available as RPM binaries and TGZ sources

– Some dependencies are left unresolved

– It is not expensive, but it is hard to deploy

– Client installation

– JAVA graphical client multi-platform available

• perfSONAR UI in action

• perfSONAR services
– Measurement Point Service
  – Creates and/or publishes monitoring information related to active or passive measurements
– Measurement Archive Service
  – Stores and publishes the information received from Measurement Point Services
– Transformation Service
  – Provides the capability to manipulate the stored data of the measurements performed
– Lookup Service
  – Used to discover services and other Lookup Services (LS)
– Topology Service
  – Makes network topology information available to other services
  – Finds the closest Measurement Point (MP)
  – Provides network topology information to the visualization tools
– Authentication Service
  – Controls access to services

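• perfSONAR client interaction (diagram)
– Elements: Client, gLS, Network A (LS A, MA A, nodes a, b, c) and Network B (LS B, MA B, nodes d, e, f)
– Client to gLS: "Where to get info from Networks A and B?" Answer: LS A, LS B
– Client to LS: "Link utilization for IPs a, b, c?" Answer: a, b, c: Net A, MA A
– Client to MA A: "Get link abc utilization", followed by the Response
– Client: Graph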

• perfSONAR tools
– OWAMP (One Way Active Measurement Protocol)
  – Daemon that runs one-way latency tests
  – Provides:
    – A more accurate picture of performance degradation (shows the direction of the degradation and is more sensitive to jitter)
    – A view of the routing (hops, one-way latency)
    – Availability information
    – A time reference for problems
– BWCTL (BandWidth test ConTroLler)
  – Daemon that runs iperf tests, with support for multiple instances
  – Provides:
    – A troubleshooting tool, because it uses the network the same way a user would
    – An archive of the tests performed in which the traffic limit was reached
– More tools
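As a rough illustration of what a one-way latency test reports, the sketch below summarises a list of (send time, receive time) pairs into loss, delay and jitter figures. It is not OWAMP code. The function name, the sample values and the use of milliseconds are assumptions, and it presumes that sender and receiver clocks are synchronised, which one-way measurements depend on.

```python
from statistics import median


def one_way_stats(samples):
    """Summarise (send_ts, recv_ts) pairs; recv_ts is None for a lost packet."""
    delays = [recv - send for send, recv in samples if recv is not None]
    stats = {"sent": len(samples), "lost": len(samples) - len(delays)}
    if not delays:
        return stats
    # Jitter here is the mean absolute difference between consecutive delays.
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    stats["min_delay"] = min(delays)
    stats["median_delay"] = median(delays)
    stats["mean_jitter"] = sum(diffs) / len(diffs) if diffs else 0.0
    return stats


# Assumed timestamps in milliseconds; the third packet never arrived.
samples = [(0, 10), (100, 112), (200, None), (300, 311)]
print(one_way_stats(samples))
```

Running the same summary separately for each direction of a path shows which direction is degrading, which a round-trip measurement cannot tell.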

• Spanish LHC architecture

• perfSONAR web-services (LS web admin interface)

• perfSONAR web-services (LS Basic Configuration)

Agenda

• Part II: Operational Procedures
– Organization
– Incidents
– Maintenance works
– 24x7
– SLAs
– Procedure

Organization

• RedIRIS NOC is structured in levels

– Level 1

– Initial response team

– Monitoring network devices in real time

– Answering ops mailbox and level 1 queue

– Answering customer phone calls

– First attempt at solving problems

– Dealing with carriers directly

– External company support

– Level 2

– Second level response team

– Answering noc mailbox and level 2 queue

– Supporting more complex network problems

– Dealing with vendors

– RedIRIS people

– External company support

Incidents

• Incidents are reported in several ways
– Ticket tool
  – Web interface tool where all incidents are queued
  – Main support tool for the level 1 and level 2 teams
– E-mail
  – RedIRIS ops and noc mailboxes
  – Customer support mailboxes
  – Problem reports from network devices
– Telephone
  – Customers also contact level 1 by phone
– Monitoring tools
  – The whole monitoring platform reports incidents in the network
  – Level 1 continuously checks the monitoring tools
– Logs
  – All machine logs are stored, and processed when problems are detected

Maintenance works

• Different possibilities
– Work scheduled by the network operator
  – Notified 15 days in advance
  – Subject to RedIRIS acceptance
– Work scheduled by RedIRIS
  – Engineering tasks
  – Maintenance tasks
  – New service configuration
– Unscheduled work
  – Due to unexpected problems
  – Network links (fiber cuts, etc.)
  – Network equipment (hardware problems)

• Ticket system notification for all institutions connected to RedIRIS
– Web-based tool used to notify and update information about network problems
– Notifications via e-mail

24x7

• External company provides 24x7x365 monitoring
– Support when RedIRIS staff are not in the office
– Procedures to monitor all RedIRIS equipment
– Procedures to open/close RMAs
– Established hardware replacement procedures
– Interaction with the network operator and hardware vendors

• What they CAN do on the equipment
– Execute "show" commands for monitoring
– Receive SNMP trap notifications
– Console login for hardware replacements

• What they can NOT do on the equipment
– Execute "config" commands
– Modify the running configuration
– Configure new services

SLAs

• Network operator SLA
– Maintenance works MUST be notified 15 days in advance
  – If this is not done, a penalty is applied
– Link stability and quality must be guaranteed
  – No degradation
  – No outages
– There is a penalty for link failures longer than 10 seconds
– There is an established maximum incident response time
– The penalty increases with repeated failures of the same link

• External company SLA
– Dedicated staff guaranteed
– Maximum incident response time
– Hardware stock available

• Hardware vendor SLA
– 4-hour hardware replacement guaranteed
– Engineering support
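The link-failure rule of the network operator SLA (a penalty for failures longer than 10 seconds, growing with repeated failures of the same link) can be checked against an outage log with a few lines of code. The sketch below only counts qualifying failures per link, since the actual penalty amounts are not stated. The record layout, link names and function name are assumptions.

```python
from collections import Counter

PENALTY_THRESHOLD_S = 10  # from the SLA: failures longer than 10 s are penalised


def penalised_failures(outages):
    """Count, per link, the outages that exceed the penalty threshold.

    outages: iterable of (link_name, start_s, end_s) tuples, times in seconds.
    """
    counts = Counter()
    for link, start, end in outages:
        if end - start > PENALTY_THRESHOLD_S:
            counts[link] += 1
    return counts


# Assumed outage log: two long failures on link-a, none on link-b.
log = [
    ("link-a", 0, 5),       # 5 s, below the threshold
    ("link-a", 100, 130),   # 30 s
    ("link-a", 500, 511),   # 11 s
    ("link-b", 0, 10),      # exactly 10 s, not longer than the threshold
]
print(penalised_failures(log))
```

A per-link count is the natural input for an incremental penalty, because repeated failures of the same link are what the SLA escalates on.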

Procedure

• Incidents reported via Trouble Ticket tool

• Web or e-mail managed incidents

• New ticket creation – can also be done by e-mail

• All new incidents are included in the Trouble Ticket system

– e-mail notifications

– phone calls

– Incidents reported by monitoring tools

– New service deployment

• All incidents are stored in a MySQL database

– Reports

– Statistics

– Tracing

• Escalation from Level 1 to Level 2

• Network outage notifications

– Same tool used

• Results – Network tickets opened

• Results – Network ticket tracing

Questions ?