with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and...

33
Analyzing Performance of OpenStack with Grafana Dashboards GrafanaCon EU 2018 Alex Krzos Senior Software Engineer March 2nd 2018

Transcript of with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and...

Page 1: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

Analyzing Performance of OpenStack with Grafana Dashboards

GrafanaCon EU 2018

Alex KrzosSenior Software EngineerMarch 2nd 2018

Page 2: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

2

What is OpenStack Example Perf and Scale Analysis

What is the problem? What can be improved?

What is the Monitoring Stack Questions

Dashboards Dashboards DashboardsA dashboard is worth a thousand words

Agenda

Page 3: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

3

Alex Krzos

Senior Software Engineer @ Red Hat Inc

On Red Hat OpenStack Scale and Performance Team

2 years on OpenStack

2 years ManageIQ & RH Satellite Performance

4 years at Cisco System/Network testing

Recently completed Army National Guard Obligation

# whoami

Page 4: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

4

Open Source Cloud Software - Cloud Operating System

64 official projects, 1582 repos in github.com/openstack

What is OpenStack

Page 5: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

5

Major Projects:

● Keystone - Identity● Nova - Compute● Neutron - Networking● Swift - Object Storage● Cinder - Block Storage● Glance - Image Storage● Telemetry - Time-Series Data● Horizon - Dashboard GUI

What is OpenStack

Page 6: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

6

Many services, daemons, databases and messaging bus to monitor

Complexity of service interactions

Varying node counts and node types

Large configuration files

Data capture on testbeds

Errors / Misconfiguration

What are the problems?

Page 7: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

7

What are the problems? - Complexity

Page 8: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

8

What is our Monitoring Stack

Page 9: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

9

Benchmarking Tools

Page 10: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

10

Deploying the Dashboards via Ansible:

# ansible-playbook -i hosts install/grafana-dashboards.yml

Deploying the Dashboards

Page 11: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

11

Originally: .json and .json.j2

Converted to GrafYaml (.yaml, .yaml.j2)

Yaml advantages:

● Less Lines● Less Curly braces and quotes● Comments

GrafYaml Advantages:

● Manages ID / RefID● Defaults reduce lines stored in Yaml● Simple CLI

How Dashboards are Stored

Jinja2 Template Advantages

● Reduce duplication● Reuse by organizing into separate files

GrafYaml+Jinja2 reduced line count:

40,000 lines to 6,500 lines

Page 12: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

12

4 Types of Dashboards

● Static Dashboards (.yaml)● Templated “Static” Dashboards ● Templated General Dashboards● Cloud Specific Dashboards (Per-Cloud)

Dashboards, Dashboards, Dashboards

Page 13: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

13

Cloud Instance Count

Cloud Total Memory Usage

Cloud System Performance Comparison

Cloud Keystone Token Count

Apache Request Latency

Static Dashboards

Page 14: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

14

Static Dashboards - Performance Comparison

Page 15: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

15

Cloud Ceph

Cloud Rabbitmq

Three Node Performance

Gnocchi Status

Gnocchi Performance

Templated Dashboards

Page 16: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

16

Templated Dashboards

Page 17: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

17

Templated General DashboardsNode Types:

● Undercloud● Controller● Ceph Storage● Block/Object Storage● Computes

Page 18: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

18

Exposes Each Node in

Ansible Inventory

Includes:

● CPU● Memory● Disk● Network● Log

Cloud Specific Dashboards - CPU

Page 19: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

19

Cloud Specific Dashboards - Memory

Page 20: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

20

Cloud Specific Dashboards - Disk

Page 21: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

21

Cloud Specific Dashboards - Network

Page 22: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

22

Cloud Specific Dashboards - Log

Page 23: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

Example Performance and Scale Analysis

23

Page 24: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

24

Example - OpenStack Telemetry ScalingOpenStack Pike OpenStack Ocata

Page 25: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

25

Example - Slow API PostThreaded Batch

Page 26: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

26

Example - Coordination LossDaemon lost contact with coordination service

Page 27: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

27

Example - SMIs using more CPUOvercloud-compute-4 has 480 SMIs every 10s resulting in higher CPU util, Set “OS Control” in your BIOS power settings...

Page 28: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

28

Example - Uneven MemoryVisualize distribution of instances in uneven compute memory environment

Page 29: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

29

Example - Boot TimingsUnpatched vs Patched Boot Scenario

Page 30: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

30

Dashboard Storage / Reuse

Queries for different Data Sources

Preserving Data (Snapshotting)

Time Series Data Units

Standard Time Series Benchmarks

Metrics interval

What can be improved?

Page 31: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

Questions?

Page 32: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

32

Browbeat - https://github.com/openstack/browbeat

GrafYaml - https://github.com/openstack-infra/grafyaml

Ansible - https://github.com/ansible/ansible

OpenStack - https://www.openstack.org/

References / Links

Page 33: with Grafana Dashboards Analyzing Performance of OpenStack · 2 What is OpenStack Example Perf and Scale Analysis What is the problem? What can be improved? What is the Monitoring

THANK YOU

plus.google.com/+RedHat

linkedin.com/company/red-hat

youtube.com/user/RedHatVideos

facebook.com/redhatinc

twitter.com/RedHatNews