How to see what is happening inside your OpenStack using Elastic Stack and Prometheus
Introduction & Agenda
- About me
  - Csaba Patyi ([email protected])
  - Consultant and Instructor at Component Soft Ltd.
  - 6 years of experience from the Ops side
  - Mirantis and COA Certs
  - Linux Foundation Certs
- Agenda
  - About Component Soft
  - Logging with Elastic Stack
  - Monitoring with Prometheus
  - (Monasca and Ceilosca)
About Component Soft
- Educational Services
  - Bash, Perl, Python courses
  - Red Hat Linux, Advanced Linux
  - Java, Scala and C++ courses
  - OpenStack, Docker, Kubernetes, Ceph
  - Software testing methodologies
  - Apache, Tomcat, MySQL
  - Veritas Storage Foundation and Cluster
  - Authorized Oracle courses
  - Next Gen. Telecom and Networking
- Consulting Services:
  - OpenStack and Ceph consulting and support services
  - Docker and Kubernetes consulting and support services
  - Open Source Full Stack consulting and support services
- Contact Us:
  - Website: http://www.componentsoft.eu/
  - Office and training site: 1116 Budapest, Fehérvári street 126-128
  - Phone: +36-1-487 4040
  - Fax: +36-1-487-4047
  - E-mail: [email protected]
What is OpenStack? - The technical term: a collection of independent but related projects
- Object Store (Swift): provides backstore
- Dashboard (Horizon): provides web UI
- Compute Service (Nova)
- Image Store (Glance): provides OS template
- Block Storage (Cinder): provides volumes
- Identity Service (Keystone): provides authentication & authorization
- Network Service (Neutron): provides connectivity
- Telemetry Service (Ceilometer): measures usage
- Orchestration Service (Heat): automates creation
Distribution of services
[Figure: OpenStack services distributed across node roles: Controller node (3), Network node, Compute node, Storage node. Services shown: horizon, keystone, glance, cinder, nova-api, nova-scheduler, nova-compute, neutron-server, neutron-l2-agent, neutron-l3-agent, swift-proxy, swift-object, swift-account, swift-container, MySQL, RabbitMQ/Qpid.]
VM provisioning in-depth
[Figure: VM provisioning flow in 19 numbered steps between Client, Keystone, nova-api, DB, AMQ, nova-scheduler, nova-conductor, nova-compute, hypervisor, Glance, Neutron and Cinder.]
Hit an error? No problem. Check the logs… Where are they?
Log files per node, by project name and node role:

Project name                       Controller  Network  Compute  Swift Storage
Horizon                                     3        -        -              -
Keystone                                    5        -        -              -
Nova                                        9        2        2              -
Neutron                                     3       10        2              -
Glance                                      2        -        -              -
Cinder                                      5        -        -              1
Ceilometer (+ aodh and gnocchi)            23        1        1              1
Heat                                        4        -        -              -
Swift                                       2        -        -              1
SUM                                        56       26        5              3
3 Controller, 2 Network, 20 Compute and 6 Storage nodes == 31 nodes and 312 log files.

# OpenStack log file locations per service: https://docs.openstack.org/<os-release-name>/config-reference/
# Count the log files under /var/log/ for each project
for i in horizon keystone nova neutron glance cinder ceilometer aodh gnocchi heat swift; do
    COUNT=$(find /var/log/ | grep -E "$i" | grep 'log$' | wc -l)
    echo "$i : $COUNT"
done
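The 312 total can be reproduced from the per-node counts in the table above. The following is a small sketch, assuming the table cells map to node roles the way their column sums suggest (that mapping is reconstructed, not stated on the slide). Under this reading a single network node carries 13 log files, so the 26 in the SUM row would be the two network nodes together.

```python
# Log files per service on ONE node of each role.
# The cell-to-column mapping is an assumption reconstructed from the column sums.
LOGS_PER_NODE = {
    "controller": {"horizon": 3, "keystone": 5, "nova": 9, "neutron": 3,
                   "glance": 2, "cinder": 5, "ceilometer": 23, "heat": 4,
                   "swift": 2},
    "network": {"nova": 2, "neutron": 10, "ceilometer": 1},
    "compute": {"nova": 2, "neutron": 2, "ceilometer": 1},
    "storage": {"cinder": 1, "ceilometer": 1, "swift": 1},
}

# Node counts from the slide: 3 controller, 2 network, 20 compute, 6 storage.
NODE_COUNT = {"controller": 3, "network": 2, "compute": 20, "storage": 6}


def total_log_files(logs_per_node, node_count):
    """Multiply each role's per-node log count by its node count and add up."""
    return sum(sum(logs_per_node[role].values()) * count
               for role, count in node_count.items())


if __name__ == "__main__":
    for role, count in NODE_COUNT.items():
        per_node = sum(LOGS_PER_NODE[role].values())
        print(f"{role}: {per_node} per node x {count} nodes = {per_node * count}")
    print("nodes:", sum(NODE_COUNT.values()))
    print("log files:", total_log_files(LOGS_PER_NODE, NODE_COUNT))
```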
Elastic Stack to the rescue
- Formerly known as the ELK stack
  - Elasticsearch → stores and indexes logs, which makes fast search possible
  - Logstash → transforms incoming logs and sends them to Elasticsearch
  - Kibana → web interface for the Elastic Stack
- New modules/plugins:
  - X-Pack (most of its features only in the paid version)
  - Beats
    - Filebeat → a small agent that collects logs and sends them directly to Elasticsearch or Logstash
    - Metricbeat → a small agent that collects metrics and sends them directly to Elasticsearch or Logstash
    - Etc.
Configuration for Filebeat
# /etc/filebeat/filebeat.yml
filebeat:
  prospectors:
    . . .
    - paths:
        - "/var/log/nova/*.log"
      exclude_files: ['\.gz$']
      document_type: nova
      tags: [openstack_service_logs, nova]
    . . .
output:
  logstash:
    enabled: true
    hosts:
      - logstash-server:5044
    index: filebeat
    bulk_max_size: 50
Really simple Logstash config
# /usr/share/logstash/pipeline/logstash.conf
input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "(?m)^%{TIMESTAMP_ISO8601:date}\s+\d+\s+(?<loglevel>AUDIT|CRITICAL|DEBUG|INFO|TRACE|WARNING|ERROR)\s(?<module>\S+).*$" }
  }
  if [module] == "iso8601.iso8601" {
    drop {}
  }
}

output {
  elasticsearch {
    hosts => "elasticsearch-server:9200"
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
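To see what the grok filter extracts, the same match can be approximated with a plain regular expression. This is only a sketch: the timestamp part is a simplified stand-in for grok's TIMESTAMP_ISO8601 pattern, and the helper names parse_line and is_dropped are made up for illustration. The sample line is the Neutron warning from the multiline slides.

```python
import re

# Rough Python equivalent of the grok expression in the filter block.
# The timestamp part is a simplified stand-in for TIMESTAMP_ISO8601 (assumption).
# Ruby's (?m) lets "." match newlines, which corresponds to re.DOTALL in Python.
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?)"
    r"\s+\d+\s+"  # process ID, matched but not captured
    r"(?P<loglevel>AUDIT|CRITICAL|DEBUG|INFO|TRACE|WARNING|ERROR)\s"
    r"(?P<module>\S+).*$",
    re.DOTALL,
)


def parse_line(message):
    """Return the date, loglevel and module fields, or None if nothing matches."""
    match = LOG_LINE.match(message)
    return match.groupdict() if match else None


def is_dropped(fields):
    """Mirror the conditional that drops events logged by iso8601.iso8601."""
    return fields is not None and fields["module"] == "iso8601.iso8601"


SAMPLE = ("2017-05-29 03:42:56.491 1980 WARNING neutron.db.agents_db "
          "[req-2d2958ae-dcaa-40be-b7d3-6e3513a0f2d3 - - - - -] "
          "Agent healthcheck: found 5 dead agents out of 13:")

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

A line that does not start with a timestamp, such as a continuation line of a multiline message, yields None here; Logstash itself would keep such an event and tag it with _grokparsefailure.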
What about multiline logs?
2017-05-29 03:42:56.491 1980 WARNING neutron.db.agents_db [req-2d2958ae-dcaa-40be-b7d3-6e3513a0f2d3 - - - - -] Agent healthcheck: found 5 dead agents out of 13:
Type                  Last heartbeat       host
Metering agent        2017-05-26 11:13:39  network2.openstack.local
Loadbalancerv2 agent  2017-05-26 11:13:41  network2.openstack.local
L3 agent              2017-05-26 11:13:41  network2.openstack.local
Metadata agent        2017-05-26 11:13:39  network2.openstack.local
DHCP agent            2017-05-26 11:13:10  network2.openstack.local
What about multiline logs?
2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Failed reporting state!
2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "file-location", line 320, in _report_state
2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent True)
2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "another-file-location", line 88, in report_state
2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent return method(context, 'report_state', **kwargs)
2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "another-another-file", line 169, in call
https://play.golang.org/p/vZWQJt5lQ0
Filebeat multiline config example
# /etc/filebeat/filebeat.yml
filebeat:
  prospectors:
    . . .
    - paths:
        - "/var/log/nova/*.log"
      exclude_files: ['\.gz$']
      document_type: nova
      multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3} [0-9]+ (ERROR|WARNING|INFO|DEBUG|TRACE) [0-9A-Za-z._]+ \['
      multiline.negate: true
      multiline.match: after
      tags: [openstack_service_logs, nova]
    . . .
output:
  logstash:
    enabled: true
    hosts:
      - logstash-server:5044
    index: filebeat
    bulk_max_size: 50
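With negate: true and match: after, every line that does not match the pattern is appended to the most recent line that did. The sketch below imitates that grouping in Python so the pattern can be tried against sample lines. It is an illustration of the idea, not Filebeat's implementation, and it assumes a single space between the process ID and the log level in the pattern.

```python
import re

# The multiline.pattern from the Filebeat config, as a Python regex.
PATTERN = re.compile(
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3} [0-9]+ "
    r"(ERROR|WARNING|INFO|DEBUG|TRACE) [0-9A-Za-z._]+ \["
)


def group_multiline(lines):
    """Imitate negate: true / match: after.

    A line matching PATTERN starts a new event; any other line is appended
    to the event before it.
    """
    events = []
    for line in lines:
        if PATTERN.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


PREFIX = ("2017-02-22 08:29:07.810 1495 ERROR "
          "neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent ")
SAMPLE_LINES = [
    PREFIX + "[-] Failed reporting state!",
    PREFIX + "Traceback (most recent call last):",
    PREFIX + 'File "file-location", line 320, in _report_state',
    "2017-05-29 03:42:56.491 1980 WARNING neutron.db.agents_db "
    "[req-2d2958ae-dcaa-40be-b7d3-6e3513a0f2d3 - - - - -] "
    "Agent healthcheck: found 5 dead agents out of 13:",
    "Type Last heartbeat host",
]

if __name__ == "__main__":
    for number, event in enumerate(group_multiline(SAMPLE_LINES), start=1):
        print(f"--- event {number} ---")
        print(event)
```

The traceback lines carry a timestamp but no opening bracket after the module name, so they fail the pattern and are folded into the "Failed reporting state!" event.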
Kibana Demo
Build a test environment for yourself
- https://github.com/itoperatorguy/openstack-elk-docker
What about monitoring and alarming?
- You have to monitor the physical infrastructure
  - Traditional:
    - Zabbix
    - Nagios
    - Icinga
    - Zenoss
    - + a few more
  - New “players”
    - Elastic Stack X-Pack (commercial licence)
    - Prometheus
- You have to monitor the virtual infrastructure
  - “Traditional”
    - Ceilometer
  - New “players”
    - Monasca
    - Ceilosca
Prometheus
- “...an open-source systems monitoring and alerting toolkit originally built at SoundCloud.”
- “joined the Cloud Native Computing Foundation (https://www.cncf.io/) in 2016 as the second hosted project after Kubernetes.”
- Some of the features
  - a multi-dimensional data model (time series identified by metric name and key/value pairs)
  - no reliance on distributed storage; single server nodes are autonomous
  - time series collection happens via a pull model over HTTP
  - targets are discovered via service discovery or static configuration
  - multiple modes of graphing and dashboarding support
- Some of the components
  - the main Prometheus server which scrapes and stores time series data
  - a push gateway for supporting short-lived jobs
  - special-purpose exporters (for HAProxy, StatsD, Graphite, etc.)
  - an alertmanager
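The pull model and the multi-dimensional data model are easiest to picture from the target's side: a target only has to answer HTTP GET requests with its current samples, each identified by a metric name plus key/value labels. The sketch below uses only the Python standard library rather than an official client library. The metric name demo_log_files, its labels, the sample values and port 8000 are all made up for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(name, help_text, samples):
    """Render gauge samples in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical samples: one time series per distinct label combination.
SAMPLES = [
    ({"service": "nova", "role": "controller"}, 9),
    ({"service": "neutron", "role": "network"}, 10),
]


class MetricsHandler(BaseHTTPRequestHandler):
    """Answer scrapes on /metrics; the server never pushes anything itself."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics("demo_log_files", "Log files per service.", SAMPLES).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block and serve scrapes until interrupted (port is an arbitrary choice)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and adding the host and port as a static target would let a Prometheus server scrape it at every scrape interval.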
Prometheus Architecture
Demo deployment of Prometheus
Prometheus exporters for OpenStack
- Consul exporter (official)
- cAdvisor
- ElasticSearch exporter
- Memcached exporter (official)
- MongoDB exporter
- MySQL server exporter (official)
- Node/system metrics exporter (official)
- RabbitMQ exporter
- RabbitMQ Management Plugin exporter
- Ceph exporter
- Gluster exporter
- Apache exporter
- HAProxy exporter (official)
Prometheus configuration
global:
  scrape_interval: 30s
  evaluation_interval: 30s
  labels:
    cluster: swarm
    replica: "1"
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'prometheus-swarm'

rule_files:
  - "alert.rules_nodes"
  - "alert.rules_tasks"
  - "alert.rules_service-groups"
. . .
scrape_configs:
  . . .
  - job_name: 'node-exporter'
    dns_sd_configs:
      - names:
          - 'tasks.node-exporter'
        type: 'A'
        port: 9100
  - job_name: "node"
    scrape_interval: 5s
    static_configs:
      - targets: ['10.10.10.51:9100','10.10.10.52:9100','10.10.10.53:9100','10.10.10.54:9100','10.10.10.55:9100','10.10.10.56:9100','10.10.10.57:9100']
Prometheus query examples
- Show overall CPU usage for a server
  - 100 * (1 - avg by(instance)(irate(node_cpu{mode='idle'}[5m])))
- HTTP request rate, per second, an hour ago
  - rate(api_http_requests_total{status="500"}[5m] offset 1h)
- Disk will fill in 4 hours
  - predict_linear(node_filesystem_free[1h], 4*3600)
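The last query relies on predict_linear, which fits a straight line through the samples in the range by simple linear regression and extrapolates it a given number of seconds ahead. The sketch below shows that idea in Python. It is not Prometheus's implementation, the sample values are made up, and it assumes the extrapolation starts at the timestamp of the last sample. A predicted value at or below zero free bytes means the disk is expected to be full within the chosen horizon.

```python
def predict_linear(samples, seconds_ahead):
    """Least-squares line through (timestamp, value) samples, extrapolated.

    Assumption: "now" is the timestamp of the last sample.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to fit a line")
    times = [t for t, _ in samples]
    values = [v for _, v in samples]
    mean_t = sum(times) / len(times)
    mean_v = sum(values) / len(values)
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t in times)
    slope = covariance / variance  # change in value per second
    intercept = mean_v - slope * mean_t
    return intercept + slope * (times[-1] + seconds_ahead)


# Hypothetical free-space samples over one hour: (seconds, free GiB).
FREE_SPACE = [(0, 100.0), (1800, 90.0), (3600, 80.0)]

if __name__ == "__main__":
    predicted = predict_linear(FREE_SPACE, 4 * 3600)
    print(f"predicted free space in 4 hours: {predicted:.2f} GiB")
```

With these samples the disk loses 10 GiB every 30 minutes, so the remaining 80 GiB are predicted to be gone exactly four hours after the last sample.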
Prometheus Alarm Syntax
ALERT <alert name>
  IF <expression>
  [ FOR <duration> ]
  [ LABELS <label set> ]
  [ ANNOTATIONS <label set> ]
Prometheus Demo
Sources
- OpenStack log file locations per service: https://docs.openstack.org/<os-release-name>/config-reference/
- Multiline solutions for Logstash: https://dzone.com/articles/using-multiple-grok-statements
- Grok pattern names for Logstash: https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns
- Multiline solutions for Filebeat: https://www.elastic.co/guide/en/logstash/current/multiline.html
- Multiline test site for Filebeat: https://play.golang.org/p/uAd5XHxscu
- Docker ELK-stack deployment example: http://elk-docker.readthedocs.io/
- Good Logstash pipeline config examples: https://github.com/sorantis/elkstack
- Prometheus main site: https://prometheus.io/
- Prometheus Docker Swarm usage: https://github.com/bvis/docker-prometheus-swarm
- Prometheus query examples:
  - https://prometheus.io/docs/querying/examples
  - https://github.com/infinityworksltd/prometheus-example-queries
Q & A