Rucio monitoring - Indico

Rucio monitoring Teng Li

Page 1:

Rucio monitoring

Teng Li

Page 2:

Rucio monitoring

• Different categories:
  • Internal system health monitoring
    • Graphite / Grafana
  • Transferring / Staging / Deletion monitoring, pilot traces
    • Message queue / Kafka / Elasticsearch / InfluxDB / Grafana / Kibana
  • Or periodic full database dumps for analytics

Page 3:

Internal system health monitoring

• Metrics are sent by Rucio daemons and collected by Graphite via statsd.

• Activities of various daemons:
  • Judge, Conveyor, Hermes, Kronos, Reaper, Necromancer, Transmogrifier, …

• Easy to enrich, but largely undocumented.

[Diagram: daemons (Judge, Conveyor, Hermes, Kronos) call record_counter(), record_timer(), record_gauge() via pystatsd; metrics flow to statsd, then Graphite, then Grafana]
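As a rough illustration of what helpers like these end up doing, the sketch below formats metrics in the statsd line protocol (`name:value|type`) and sends them over UDP. The function names mirror the slide, but the bodies, signatures, metric names, and the 127.0.0.1:8125 address are assumptions for illustration only. They are not Rucio's actual implementation, which goes through pystatsd.

```python
import socket
import time

# Assumption: a statsd daemon listening on its conventional UDP port.
STATSD_ADDR = ("127.0.0.1", 8125)


def format_metric(name, value, kind):
    """Build one statsd line: <name>:<value>|<type>."""
    return f"{name}:{value}|{kind}"


def _send(line, addr=STATSD_ADDR):
    # UDP is fire-and-forget: no reply is expected from the daemon.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), addr)
    return line


# Hypothetical stand-ins named after the helpers on the slide.
def record_counter(name, delta=1):
    return _send(format_metric(name, delta, "c"))


def record_gauge(name, value):
    return _send(format_metric(name, value, "g"))


def record_timer(name, millis):
    return _send(format_metric(name, millis, "ms"))


def run_cycle(work):
    """Hypothetical daemon cycle: count it, time it, report the backlog size."""
    start = time.monotonic()
    backlog = work()  # assumed to return the number of items still pending
    elapsed_ms = int((time.monotonic() - start) * 1000)
    record_counter("daemons.judge.cycles")
    record_timer("daemons.judge.cycle_time", elapsed_ms)
    record_gauge("daemons.judge.backlog", backlog)
```

Counters suit event rates, timers suit per-cycle durations, and gauges suit point-in-time values such as queue depth. Graphite then stores each series under the dotted metric name, which is what a Grafana panel would query.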

Page 4:

Internal system health monitoring

Page 5:

Transferring / Staging / Deletion monitoring

• Messages are generated to record data transfers

• Rucio daemons (Conveyor) generate messages when submitting / staging / queueing / finishing transfers (or client traces)

• Hermes sends the messages to the broker

• Messages are ingested into Elasticsearch or InfluxDB

• Visualized using Grafana / Kibana

[Diagram: Rucio daemons → Hermes (messages) → Broker (RabbitMQ / ActiveMQ …) → Logstash / Elasticsearch / Kibana and Kafka / Grafana]
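The message flow on this slide can be sketched in a few lines. This is a minimal in-memory model, not Rucio code: the `outbox` queue, the `add_message` and `hermes_deliver` names, the event types, and the payload fields are all hypothetical, and a plain list stands in for the broker topic.

```python
import json
from collections import deque

# Stand-in for the queue of pending messages that daemons write to.
outbox = deque()


def add_message(event_type, payload):
    """What a daemon such as the Conveyor might do on a state change."""
    outbox.append({"event_type": event_type, "payload": payload})


def hermes_deliver(publish):
    """Drain the outbox and hand each message to the broker as JSON."""
    sent = 0
    while outbox:
        publish(json.dumps(outbox.popleft()))
        sent += 1
    return sent


# Usage: a list plays the role of the broker topic.
broker_topic = []
add_message("transfer-submitted", {"name": "file1", "dst-rse": "SITE_B"})
add_message("transfer-done", {"name": "file1", "dst-rse": "SITE_B", "duration": 42})
delivered = hermes_deliver(broker_topic.append)
```

Keeping delivery in a separate step means daemons only record events, while a single component owns the connection to the broker. Downstream consumers (Logstash, Kafka) would then read the JSON documents from the topic.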

Page 6:

Transferring / Staging / Deletion monitoring

Different categories of messages:

• transfer
• deletion
• client trace

Over 80 metrics

Page 7:

Transferring / Staging / Deletion monitoring

Page 8:

Transferring / Staging / Deletion monitoring

• Undocumented
  • Message formats / types
  • How to export, etc.

• Hard to extend
  • Messages are coupled to multiple daemons and Rucio core

• Needs enrichment to serve as a full DDM monitoring system
  • FTS
  • site topology
  • site information
  • etc.
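One way to picture the enrichment step is a small function that joins each message with external site information before it is indexed. Everything here is an assumption made for illustration: the `SITE_INFO` table, the `enrich` function, and the payload field names are hypothetical, and a real system would pull topology from a site catalogue rather than a hard-coded dictionary.

```python
# Hypothetical topology table keyed by storage element name.
SITE_INFO = {
    "SITE_A": {"country": "CN", "tier": 1},
    "SITE_B": {"country": "FR", "tier": 2},
}


def enrich(message, site_info=SITE_INFO):
    """Return a copy of the message with source/destination site details added."""
    payload = dict(message["payload"])
    for side in ("src", "dst"):
        site = payload.get(f"{side}-rse")
        # Unknown or missing sites simply contribute no extra fields.
        for key, value in site_info.get(site, {}).items():
            payload[f"{side}-{key}"] = value
    return {**message, "payload": payload}


raw = {
    "event_type": "transfer-done",
    "payload": {"src-rse": "SITE_A", "dst-rse": "SITE_B", "bytes": 1024},
}
enriched = enrich(raw)
```

Because `enrich` copies the payload instead of mutating it, the original message stays untouched, which keeps this step decoupled from the daemons that produced it.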