Rucio monitoring
Teng Li
Rucio monitoring
• Different categories:
  • Internal system health monitoring
    • Graphite / Grafana
  • Transferring / Staging / Deletion monitoring, pilot traces
    • Message queue / Kafka / Elasticsearch / InfluxDB / Grafana / Kibana
  • Or periodic full database dumps for analytics
Internal system health monitoring
• Metrics are sent by the Rucio daemons and collected by Graphite via statsd.
• Activities of various daemons:
  • Judge, Conveyor, Hermes, Kronos, Reaper, Necromancer, Transmogrifier…
• Easy to enrich, but largely undocumented
record_counter(), record_timer(), record_gauge()
[Diagram labels: Judge, Conveyor, Hermes, Kronos; pystatsd; statsd; Graphite; Grafana]
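The counter / timer / gauge idea above can be sketched as follows. This is a minimal illustration, not Rucio's actual implementation: the class `StatsdSketch`, the helper `format_metric`, the `rucio` prefix, and the metric names are all hypothetical. Only the plain-text StatsD line format (`name:value|c`, `|ms`, `|g`) and the conventional UDP port 8125 are standard.

```python
import socket


def format_metric(name, value, kind):
    """Format one metric in the plain-text StatsD line protocol."""
    return f"{name}:{value}|{kind}"


class StatsdSketch:
    """Hypothetical stand-in for a statsd client used by a daemon."""

    def __init__(self, host="127.0.0.1", port=8125, prefix="rucio", transport=None):
        self.prefix = prefix
        if transport is None:
            # Default transport: fire-and-forget UDP datagrams to statsd.
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            transport = lambda line: sock.sendto(line.encode("ascii"), (host, port))
        self._transport = transport

    def _send(self, name, value, kind):
        line = format_metric(f"{self.prefix}.{name}", value, kind)
        self._transport(line)
        return line

    def record_counter(self, name, delta=1):
        # Counter: statsd sums the deltas over each flush interval.
        return self._send(name, delta, "c")

    def record_timer(self, name, millis):
        # Timer: a duration in milliseconds.
        return self._send(name, millis, "ms")

    def record_gauge(self, name, value):
        # Gauge: the latest value wins.
        return self._send(name, value, "g")


if __name__ == "__main__":
    sent = []
    stats = StatsdSketch(transport=sent.append)  # capture instead of sending
    stats.record_counter("judge.rules_evaluated")    # hypothetical metric name
    stats.record_timer("conveyor.submit_time", 12.5)  # hypothetical metric name
    stats.record_gauge("reaper.queue_size", 42)       # hypothetical metric name
    print("\n".join(sent))
```

Passing a `transport` callable keeps the sketch testable without a running statsd instance; leaving it out sends each line over UDP.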
Transferring / Staging / Deletion monitoring
• Messages are generated to record data transfers
• Rucio daemons (Conveyor) generate messages when submitting / staging / queueing / finishing transfers (or from client traces)
• Hermes sends the messages to the broker
• Messages are ingested into Elasticsearch or InfluxDB
• Visualized using Grafana / Kibana
[Diagram labels: Rucio daemons, Msg, Hermes, Broker (RabbitMQ/ActiveMQ…), Logstash, Elasticsearch, Kibana, Kafka, Grafana]
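The message flow described above (daemon creates a message, Hermes delivers it to a broker, a consumer ingests it for dashboards) can be sketched with an in-memory queue standing in for the broker. Everything here is an assumption for illustration: the field names (`event_type`, `created_at`, `payload`), the event type `transfer-done`, and the functions `make_message`, `deliver`, and `ingest` are hypothetical and do not describe Rucio's real message schema.

```python
import json
import queue
from datetime import datetime, timezone


def make_message(event_type, payload):
    """Daemon side: build one event (hypothetical schema)."""
    return {
        "event_type": event_type,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }


def deliver(outbox, broker):
    """Hermes-like step: publish every pending message as JSON, then empty the outbox."""
    sent = 0
    for message in outbox:
        broker.put(json.dumps(message))
        sent += 1
    outbox.clear()
    return sent


def ingest(broker):
    """Consumer side: turn broker messages into flat documents for indexing."""
    docs = []
    while not broker.empty():
        event = json.loads(broker.get())
        docs.append({
            "event_type": event["event_type"],
            "@timestamp": event["created_at"],
            **event["payload"],
        })
    return docs


if __name__ == "__main__":
    broker = queue.Queue()  # stand-in for RabbitMQ / ActiveMQ / Kafka
    outbox = [make_message("transfer-done",
                           {"src": "SITE_A", "dst": "SITE_B", "bytes": 1024})]
    deliver(outbox, broker)
    for doc in ingest(broker):
        print(doc)
```

In a real deployment the queue would be a message broker and the flat documents would be written to Elasticsearch or InfluxDB; the sketch only shows the shape of the hand-offs.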
Transferring / Staging / Deletion monitoring
Different categories of messages:
• transfer
• deletion
• client trace
Over 80 metrics
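As a rough illustration of how the three message categories could feed per-category metrics, the sketch below groups messages by a prefix of their event type and counts them. The event type strings and the prefix-to-category mapping are assumptions made for this example, not a documented Rucio convention.

```python
from collections import Counter

# Hypothetical mapping from event-type prefix to the categories on the slide.
CATEGORY_BY_PREFIX = {
    "transfer": "transfer",
    "deletion": "deletion",
    "trace": "client trace",
}


def category_of(event_type):
    """Map an event type such as 'transfer-done' to its category."""
    prefix = event_type.split("-", 1)[0]
    return CATEGORY_BY_PREFIX.get(prefix, "other")


def count_by_category(messages):
    """Count messages per category."""
    return Counter(category_of(m["event_type"]) for m in messages)


if __name__ == "__main__":
    sample = [
        {"event_type": "transfer-submitted"},
        {"event_type": "transfer-done"},
        {"event_type": "deletion-done"},
        {"event_type": "trace"},
    ]
    print(count_by_category(sample))
```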
Transferring / Staging / Deletion monitoring
• Undocumented
  • Message formats / types
  • How to export, etc.
• Hard to extend
  • Messages are coupled to multiple daemons and the Rucio core
• Needs enrichment to serve as a full DDM monitoring system
  • FTS
  • Site topology
  • Site information
  • etc.
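One way to picture the enrichment step is a lookup that attaches site information to each transfer message before it is indexed. This is only a sketch under assumed names: the `SITE_INFO` table, its `country` and `tier` fields, and the `src` / `dst` message keys are hypothetical placeholders for whatever topology source a deployment actually has.

```python
# Hypothetical site catalogue; a real system would load this from a topology service.
SITE_INFO = {
    "SITE_A": {"country": "XX", "tier": 1},
    "SITE_B": {"country": "YY", "tier": 2},
}


def enrich(message, site_info):
    """Return a copy of the message with source/destination site details added."""
    enriched = dict(message)
    for side in ("src", "dst"):
        info = site_info.get(message.get(side), {})
        for key, value in info.items():
            enriched[f"{side}_{key}"] = value
    return enriched


if __name__ == "__main__":
    transfer = {"event_type": "transfer-done", "src": "SITE_A", "dst": "SITE_B"}
    print(enrich(transfer, SITE_INFO))
```

Keeping enrichment as a separate step on the consumer side avoids coupling it to the daemons that emit the messages; unknown sites simply pass through without extra fields.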