SNMP-based Monitoring of a Computing Cluster · 2005. 4. 1.
-
SNMP-based Monitoring of a Computing Cluster
Moreno Marzolla
Dipartimento di Informatica Università di Venezia
&
INFN Padova/BaBar collaboration
Workshop sulle problematiche di calcolo e reti nell’INFN – p.1
http://www.dsi.unive.it/
http://www.pd.infn.it/
http://www.pd.infn.it/~marzolla
-
Talk Outline
• The Monitoring Challenge
• Some Existing Tools
• ASC: the Asynchronous SNMP Collector
• Conclusions and Future Work
-
The Challenge
BaBar’s Data Reprocessing suddenly stops. What happened?
• Client process(es) crashed
• Server process(es) crashed
• The local disk failed
• The CPU melted
• /tmp overflowed
• None of the above
-
Requirements
• Scalable up to […] machines and more.
• Easy to configure: For simple tasks, the effort required to configure the tool should be minimal.
• Extensible: New functionality should be easy to add as it is needed.
• General: Needs to monitor the network switch, tape library, UPS, environmental system...
• GUI-independent: Should run as a regular UNIX dæmon while still providing convenient user interface(s).
-
Some tools
There are many freely distributable monitoring tools available.
• MRTG http://people.ee.ethz.ch/~oetiker/webtools/mrtg/
• NGop http://www-isd.fnal.gov/ngop/
• RemStats http://silverlock.dgim.crc.ca/remstats/release/index.html
• Ganglia http://ganglia.sourceforge.net/
• Nagios http://www.nagios.org/
• Snips http://www.netplex-tech.com/software/snips/
• Cricket http://cricket.sourceforge.net/
• Many, many others...
-
What’s wrong?
So why not simply use one of the many programs available?
Because they fail to meet our requirements. In particular:
• Many of them scale poorly or not at all.
• Configuration is highly nontrivial in most cases (lots of different configuration files scattered around...).
• Many of them require special dæmons running on the monitored hosts, so they can’t handle the network switch, tape library, UPS...
-
The Do-It-Yourself approach
So we decided to build a tool (ASC, the Asynchronous SNMP Collector) from scratch.
• The tool is written entirely in C.
• It uses asynchronous (non-blocking) SNMP requests for data collection.
  SNMP (Simple Network Management Protocol) is a standard protocol supported by many different pieces of hardware. Even our air conditioning system speaks SNMP...
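
To illustrate why non-blocking requests matter, here is a minimal sketch of the polling pattern. It is not ASC's code: ASC is written in C and speaks real SNMP, while this Python sketch replaces each SNMP GET with a simulated delay. The agent names, delays, values, and the functions `fake_snmp_get`, `poll_one`, `poll_all`, and `run_demo` are all hypothetical.

```python
import asyncio
import time
from typing import Dict, Optional, Tuple

# Hypothetical agents: name -> (reply delay in seconds, value returned).
AGENTS: Dict[str, Tuple[float, int]] = {
    "node01": (0.2, 17),
    "node02": (0.2, 42),
    "switch": (0.2, 3),
    "ups": (5.0, 99),  # too slow: this one will hit the timeout
}

async def fake_snmp_get(host: str) -> int:
    """Stand-in for one SNMP GET: wait for the agent's delay, then answer."""
    delay, value = AGENTS[host]
    await asyncio.sleep(delay)
    return value

async def poll_one(host: str, timeout: float) -> Tuple[str, Optional[int]]:
    """Query one agent; report None if it does not answer in time."""
    try:
        return host, await asyncio.wait_for(fake_snmp_get(host), timeout)
    except asyncio.TimeoutError:
        return host, None

async def poll_all(timeout: float) -> Dict[str, Optional[int]]:
    """Issue all requests at once, then collect the answers together."""
    pairs = await asyncio.gather(*(poll_one(h, timeout) for h in AGENTS))
    return dict(pairs)

def run_demo(timeout: float = 0.4):
    """Poll every agent once; return the status map and the elapsed time."""
    start = time.monotonic()
    status = asyncio.run(poll_all(timeout))
    return status, time.monotonic() - start
```

Because all requests are outstanding at the same time, one polling round takes roughly as long as the slowest answer (or the timeout), not the sum of all answers. A single unresponsive device therefore does not stall the collection for the rest of the cluster.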
-
The Do-It-Yourself approach (2)
• Data are stored in Round Robin Databases
  These provide facilities for storing timestamped data at different granularities, as well as facilities for plotting graphs.
• The configuration file is written in XML
• ASC embeds a simple HTTP interface
  HTML pages are generated by applying an XSLT stylesheet to an automatically generated XML status file.
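
The round-robin idea (fixed-size storage, several granularities) can be sketched in a few lines. This is only an illustration of the concept under simple assumptions: it is not the on-disk format or API of any particular round-robin database, and the class names `RoundRobinArchive` and `TinyRRD` are made up for this example.

```python
from collections import deque

class RoundRobinArchive:
    """Keeps the last `rows` averages, each over `steps_per_row` samples."""

    def __init__(self, steps_per_row, rows):
        self.steps_per_row = steps_per_row
        self.rows = deque(maxlen=rows)  # oldest row is dropped when full
        self._pending = []

    def update(self, timestamp, value):
        self._pending.append(value)
        if len(self._pending) == self.steps_per_row:
            average = sum(self._pending) / len(self._pending)
            # The row is stamped with the time of its last sample.
            self.rows.append((timestamp, average))
            self._pending.clear()

class TinyRRD:
    """Feeds every sample to several archives of different granularity."""

    def __init__(self, archives):
        self.archives = archives

    def update(self, timestamp, value):
        for archive in self.archives:
            archive.update(timestamp, value)

# Usage: one sample per minute; keep 5 raw samples and 3 four-minute averages.
fine = RoundRobinArchive(steps_per_row=1, rows=5)
coarse = RoundRobinArchive(steps_per_row=4, rows=3)
rrd = TinyRRD([fine, coarse])
for i in range(12):
    rrd.update(timestamp=60 * i, value=float(i))
```

Storage never grows: recent data stay available at full resolution, while older data survive only as coarser averages.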
-
XML Configuration File
-
XML to HTML Output
[Figure: data flow between the Configuration File, ASC, the Monitored Hosts, the XML Status file, the XSLT Stylesheets, HTTPD, and the HTML Pages]
-
HTML Interface
-
Conclusions
ASC is still under development. Are we approaching the goals?
• Scalability: On the test farm ([…] machines) it is working very well. Will it scale tenfold? (Note that applying stylesheets is not cheap...)
• Easy Configuration: We have a single configuration file, which can optionally be split into different parts and use macros. These are standard features of XML.
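
The "macros" mentioned above are presumably XML entities. As a hedged illustration, the sketch below declares two internal entities and lets the parser expand them. The element and attribute names (`asc-config`, `host`, `community`, `interval`) are invented for this example and are not ASC's actual configuration schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: entities declared once, reused for every host.
CONFIG = """<!DOCTYPE asc-config [
  <!ENTITY community "public">
  <!ENTITY interval "60">
]>
<asc-config>
  <host name="node01" community="&community;" interval="&interval;"/>
  <host name="node02" community="&community;" interval="&interval;"/>
</asc-config>
"""

def load_hosts(xml_text):
    """Parse the configuration and return one attribute dict per host."""
    root = ET.fromstring(xml_text)
    return [dict(host.attrib) for host in root.findall("host")]

hosts = load_hosts(CONFIG)
```

Changing the polling interval for every host then means editing a single entity declaration. Splitting the file into parts would rely on external entities (declared with `SYSTEM` and a file name), which this sketch leaves out.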
-
Facing a scalability limit
[Figure: hierarchy of Hosts, several ASC instances, Proxies, and a Top-Level Proxy]
-
Work(s) in Progress
• Implement Alarms
• Understand SNMP Traps
• Active control of ASC through its web interface
• Implement more XSLT stylesheets
• Write the documentation