SNMP-based Monitoring of a Computing Cluster · 2005. 4. 1.

  • SNMP-based Monitoring of a Computing Cluster

    Moreno Marzolla

    Dipartimento di Informatica Università di Venezia

    &

    INFN Padova/BaBar collaboration

    Workshop sulle problematiche di calcolo e reti nell’INFN – p.1

    http://www.dsi.unive.it/
    http://www.pd.infn.it/
    http://www.pd.infn.it/~marzolla

  • Talk Outline

    • The Monitoring Challenge

    • Some Existing Tools

    • ASC: the Asynchronous SNMP Collector

    • Conclusions and Future Work

  • The Challenge

    BaBar’s Data Reprocessing suddenly stops. What happened?

    • Client process(es) crashed

    • Server process(es) crashed

    • The local disk failed

    • The CPU melted

    • /tmp overflowed

    • None of the above

  • Requirements

    • Scalable up to [...] machines and more.

    • Easy to configure: for simple tasks, the effort required to configure the tool should be minimal.

    • Extensible: new functionality should be easy to add as it is needed.

    • General: needs to monitor the network switch, tape library, UPS, environmental system...

    • GUI-independent: should work as a regular UNIX dæmon, while still providing convenient user interface(s).

  • Some tools

    There are many freely distributable monitoring tools available.

    MRTG http://people.ee.ethz.ch/~oetiker/webtools/mrtg/

    NGop http://www-isd.fnal.gov/ngop/

    RemStats http://silverlock.dgim.crc.ca/remstats/release/index.html

    Ganglia http://ganglia.sourceforge.net/

    Nagios http://www.nagios.org/

    Snips http://www.netplex-tech.com/software/snips/

    Cricket http://cricket.sourceforge.net/

    Many, many others...

  • What’s wrong?

    So why not simply use one of the many programs available? Because they fail to meet our requirements. In particular:

    • Many of them scale poorly, or not at all.

    • Configuration is highly nontrivial in most cases (lots of different configuration files scattered around...).

    • Many of them require special dæmons running on the monitored hosts, so they can’t handle the network switch, tape library, UPS...

  • The Do-It-Yourself approach

    So we decided to build a tool (ASC, the Asynchronous SNMP Collector) from scratch.

    • The tool is entirely written in C.

    • It uses asynchronous (non-blocking) SNMP requests for data collection. SNMP (Simple Network Management Protocol) is a standard protocol supported by many different pieces of hardware. Even our air conditioning system speaks SNMP...
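
    The non-blocking collection idea can be illustrated with a minimal sketch. It is not ASC's code and it does not speak SNMP: it only shows the general pattern of sending every request first and then waiting on all reply sockets at once with select(), so that one silent host does not delay the others. The names request, request_init and collect_replies are hypothetical, and a real collector would encode and decode SNMP PDUs with an SNMP library.

```c
#include <assert.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* State of one outstanding request (all names here are hypothetical). */
enum { PENDING = 0, REPLIED = 1, FAILED = 2 };

struct request {
    int  fd;          /* datagram socket the reply will arrive on */
    int  state;       /* PENDING, REPLIED or FAILED */
    char reply[64];   /* reply payload, NUL-terminated */
};

void request_init(struct request *r, int fd)
{
    memset(r, 0, sizeof *r);   /* state becomes PENDING, reply is empty */
    r->fd = fd;
}

/*
 * Wait on every pending socket at once and read whichever replies arrive.
 * Gives up when no socket becomes readable for idle_ms milliseconds.
 * Returns the number of replies received.
 */
int collect_replies(struct request *req, int n, int idle_ms)
{
    int received = 0;

    assert(req != NULL && n >= 0);
    for (;;) {
        fd_set rfds;
        struct timeval tv;
        int maxfd = -1, npending = 0;

        /* Rebuild the descriptor set from the requests still pending. */
        FD_ZERO(&rfds);
        for (int i = 0; i < n; i++) {
            if (req[i].state == PENDING) {
                FD_SET(req[i].fd, &rfds);
                if (req[i].fd > maxfd)
                    maxfd = req[i].fd;
                npending++;
            }
        }
        if (npending == 0)
            break;                       /* every request is settled */

        tv.tv_sec = idle_ms / 1000;
        tv.tv_usec = (idle_ms % 1000) * 1000;
        if (select(maxfd + 1, &rfds, NULL, NULL, &tv) <= 0)
            break;                       /* timeout or error: stop waiting */

        for (int i = 0; i < n; i++) {
            if (req[i].state != PENDING || !FD_ISSET(req[i].fd, &rfds))
                continue;
            ssize_t len = recv(req[i].fd, req[i].reply,
                               sizeof req[i].reply - 1, 0);
            if (len >= 0) {
                req[i].reply[len] = '\0';
                req[i].state = REPLIED;
                received++;
            } else {
                req[i].state = FAILED;
            }
        }
    }
    return received;
}
```

    A caller would open one UDP socket per monitored device, send the request on each, call request_init() for each socket, and then call collect_replies() once; requests still marked PENDING afterwards belong to hosts that did not answer in time.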

  • The Do-It-Yourself approach (2)

    • Data are stored in Round Robin Databases, which provide facilities for storing timestamped data at different granularities and for plotting graphs.

    • The configuration file is written in XML.

    • ASC embeds a simple HTTP interface. HTML pages are generated by applying an XSLT stylesheet to an automatically generated XML status file.
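
    To make the round-robin storage idea concrete, here is a minimal sketch of a fixed-size archive that averages samples per time interval and overwrites its oldest slot when it wraps around. It is an illustration only, not the on-disk format or API of any real Round Robin Database package; the names rra, rra_init, rra_update and rra_fetch and the capacity of 8 slots are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

#define RRA_SLOTS 8   /* fixed capacity: old data is overwritten, never grown */

struct rra_slot {
    long     start;   /* start time (seconds) of the interval this slot holds */
    double   sum;     /* sum of the samples seen in that interval */
    unsigned count;   /* number of samples; 0 means the slot is empty */
};

struct rra {
    long step;                        /* granularity in seconds */
    struct rra_slot slot[RRA_SLOTS];
};

void rra_init(struct rra *r, long step)
{
    assert(r != NULL && step > 0);
    memset(r, 0, sizeof *r);
    r->step = step;
}

/* Record sample v taken at time t (t >= 0), averaging within one interval. */
void rra_update(struct rra *r, long t, double v)
{
    long start = t - t % r->step;
    struct rra_slot *s = &r->slot[(start / r->step) % RRA_SLOTS];

    if (s->count == 0 || s->start != start) {   /* empty or stale: reuse it */
        s->start = start;
        s->sum = 0.0;
        s->count = 0;
    }
    s->sum += v;
    s->count++;
}

/* Fetch the average for the interval containing t; returns 0 if not stored. */
int rra_fetch(const struct rra *r, long t, double *avg)
{
    long start = t - t % r->step;
    const struct rra_slot *s = &r->slot[(start / r->step) % RRA_SLOTS];

    if (s->count == 0 || s->start != start)
        return 0;
    *avg = s->sum / s->count;
    return 1;
}
```

    Feeding the same samples into two archives, for example one with a 10-second step and one with a 60-second step, gives the "different granularities" mentioned above: the fine archive keeps recent detail, the coarse one keeps a longer history in the same amount of space.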

  • XML Configuration File

  • XML to HTML Output

    [Diagram: ASC reads the XML configuration file, polls the monitored hosts, and produces an XML status; its HTTPD applies XSLT stylesheets to that status to serve HTML pages.]
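
    The status-file step of this pipeline could look roughly like the sketch below, which writes one XML element per monitored host and escapes the characters that are special in XML. The element and attribute names (status, host, name, state, load) and the host_status structure are hypothetical; the slides do not show ASC's actual status schema.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* What the collector knows about one host (hypothetical fields). */
struct host_status {
    const char *name;
    int         up;     /* non-zero if the host answered the last poll */
    double      load;   /* last load value collected */
};

/* Write s to out, replacing the characters that are special in XML. */
static void xml_escape(FILE *out, const char *s)
{
    while (*s) {
        size_t run = strcspn(s, "&<>\"");   /* length of the safe prefix */
        fwrite(s, 1, run, out);
        s += run;
        switch (*s) {
        case '&': fputs("&amp;", out);  s++; break;
        case '<': fputs("&lt;", out);   s++; break;
        case '>': fputs("&gt;", out);   s++; break;
        case '"': fputs("&quot;", out); s++; break;
        default:  break;                    /* reached the terminating NUL */
        }
    }
}

/* Emit a small XML status document describing n hosts. */
void write_status(FILE *out, const struct host_status *h, int n)
{
    assert(out != NULL && (h != NULL || n == 0));
    fputs("<?xml version=\"1.0\"?>\n<status>\n", out);
    for (int i = 0; i < n; i++) {
        fputs("  <host name=\"", out);
        xml_escape(out, h[i].name);
        fprintf(out, "\" state=\"%s\" load=\"%.2f\"/>\n",
                h[i].up ? "up" : "down", h[i].load);
    }
    fputs("</status>\n", out);
}
```

    An XSLT stylesheet can then turn each host element into an HTML table row, which keeps presentation out of the collector itself.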

  • HTML Interface

  • Conclusions

    ASC is still under development. Are we approaching the goals?

    • Scalability: on the test farm ([...] machines) it is working very well. Will it scale tenfold? (Note that applying stylesheets is not cheap...)

    • Easy configuration: we have a single configuration file, which can optionally be split into different parts and use macros. These are standard features of XML.

  • Facing a scalability limit

    [Diagram: Hosts, three ASC instances, two Proxies and one Top-Level Proxy, arranged in a hierarchy.]
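
    One way to read this hierarchy is that each ASC instance watches a subset of the hosts and each proxy merges the results of the level below it. The sketch below shows that reading as a recursive aggregation over a small tree. The slides do not say what the proxies actually exchange, so the summary contents (counts of hosts up and down), the node structure and the aggregate function are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* What one level reports upward (hypothetical contents). */
struct summary {
    int hosts_up;
    int hosts_down;
};

/* A leaf is an ASC instance; an inner node is a proxy. */
struct node {
    struct summary     local;                 /* own measurements (leaves) */
    const struct node *child[MAX_CHILDREN];   /* lower-level nodes, if any */
    int                nchild;
};

/* Merge a node's own summary with those of everything below it. */
struct summary aggregate(const struct node *n)
{
    assert(n != NULL && n->nchild >= 0 && n->nchild <= MAX_CHILDREN);
    struct summary s = n->local;
    for (int i = 0; i < n->nchild; i++) {
        struct summary c = aggregate(n->child[i]);
        s.hosts_up += c.hosts_up;
        s.hosts_down += c.hosts_down;
    }
    return s;
}
```

    With this layout no single process has to poll every host: the top-level proxy only talks to the proxies, and each proxy only to its own ASC instances.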

  • Work(s) in Progress

    • Implement alarms

    • Understand SNMP traps

    • Active control of ASC through its web interface

    • Implement more XSLT stylesheets

    • Write the documentation
