vSAN Performance Monitor User Manual
TABLE OF CONTENTS
OVERVIEW
REQUIREMENTS
INSTALLATION
CONFIGURATION
STARTUP
TROUBLESHOOTING
SCREENSHOTS
Overview
The vSAN performance monitor is a monitoring and visualization tool built on vSAN
performance metrics. It periodically collects vSAN performance and other metrics from the
configured clusters. The collected data is then visualized in an efficient, user-friendly way.
The vSAN performance monitor comes with preconfigured dashboards which will help
customers evaluate the performance of vSAN clusters, identify and diagnose problems, and
understand current and future bottlenecks. The dashboards are heavily inspired by vSAN
Observer.
The vSAN performance monitor is delivered in a virtual appliance with three major components,
i.e., a Telegraf collector, InfluxDB, and a Grafana frontend.
• Telegraf: the agent that collects metrics from the vSAN clusters and stores them in
InfluxDB.
• InfluxDB: the database that stores the metrics.
• Grafana: the frontend used to visualize the metrics stored in InfluxDB.
Once the appliance is deployed, users make a few simple configuration changes to point the
collector at the target vSAN cluster(s) and then start the service. After that, data is collected
periodically and can be visualized for meaningful insights.
Requirements
• Web Browser: IE8+, Firefox or Chrome
For deploying the client VM
• A vSphere 6.0 / VM version 11 or later environment is required to deploy the client VM.
For the target vCenter you want to monitor
• vSphere 6.0 or later is required for the vCenter to be monitored.
• Clusters with vSAN enabled.
• The vSAN performance service must be turned on in the vCenter you want to monitor.
Please refer to the page for details on enabling the performance service. You can select a
specific vSphere version in the top-right corner of the page.
Installation
1. Log in to the vSphere client and right-click the datacenter or cluster on which you want to
deploy the virtual machine.
2. Choose “Deploy OVF template”.
3. Choose the vSAN-Performance-Monitor as the OVF template and follow the steps to select the
compute resource, storage, and network, and to set up the root password.
4. Once the status of “deploy OVF template” turns to “completed”, start the virtual machine
by clicking Action -> Power -> Power On.
5. After the VM has started successfully, its IP address is displayed. Log in to the VM at that
address to complete the configuration. If you have any problem starting the VM, please refer to
the troubleshooting section.
Configuration
To use the vSAN-performance-monitor to collect and visualize the metrics from one or more
specific vCenters, you will need to update /root/telegraf.conf. The configuration follows
the Telegraf plugin we have published.
1. Open a command line and log in to the VM with ssh as root, using the root password
you set during deployment. The default root password is vmware.
ssh root@<vm-ip>
2. Edit /root/telegraf.conf with vim: vim /root/telegraf.conf
3. You will need to change the following fields:
• vCenter credentials
vcenters = [ "https://<vcenter-ip>/sdk" ] # a list of vCenters to connect to
username = "<name>" # the username for the vCenters
password = "<pwd>" # the password for the vCenters
• vSAN cluster filter
# vSAN performance metrics are collected at the cluster level, and the clusters to be
# monitored are selected using inventory paths.
# You will need to configure the path if your inventory layout is customized.
# You can also modify this field if you want to collect only a subset of clusters.
# By default, all clusters are collected.
vsan_cluster_include = ["/*/host/*"] # inventory path to clusters to collect
• To skip verification of the vCenter’s certificate (in this case, skip the next configuration item):
# In this mode, TLS is susceptible to man-in-the-middle attacks.
# This should be used only for testing.
insecure_skip_verify = true # skip verification of vCenter’s certificate chain
• vCenter certificate verification:
vcenters = [ "https://<vc-name>/sdk"] # use vCenter hostname instead of IP
ssl_ca = "/path/to/ca" # path to CA certificate
Step 1: Download the root CA certificates.
Step 2: In telegraf.conf, set ssl_ca to the path of the CA file you just downloaded. E.g.,
ssl_ca = "/Users/username/Downloads/certs/mac/f076756e.0"
Step 3: Use the vCenter hostname instead of its IP address so that the certificate chain can be verified. E.g.,
vcenters = [ "https://2-10-184-161-2.vmware.com/sdk"]
4. Optional: you can also change the following fields if necessary:
• Interval
interval = "300s" # how often metrics are collected; 300s is recommended
flush_interval = "300s" # how often metrics are sent; 300s is recommended
• Metrics to collect
vsan_metric_include = […] # vSAN performance entities to collect
The default config file collects the following metrics:
"summary.disk-usage", "summary.health", "summary.resync",
"performance.cluster-domclient", "performance.cluster-domcompmgr",
"performance.host-domclient", "performance.host-domcompmgr",
"performance.cache-disk", "performance.disk-group", "performance.capacity-disk",
"performance.virtual-machine", "performance.vscsi", "performance.virtual-disk",
"performance.vsan-host-net", "performance.vsan-vnic-net", "performance.vsan-pnic-net",
"performance.vsan-iscsi-host", "performance.vsan-iscsi-target", "performance.vsan-iscsi-lun",
"performance.lsom-world-cpu", "performance.nic-world-cpu", "performance.dom-world-cpu",
"performance.cmmds-world-cpu", "performance.host-cpu", "performance.host-domowner",
"performance.host-memory-slab", "performance.host-memory-heap"
• Query concurrency
collect_concurrency = 5 # the number of simultaneous queries for collection
discover_concurrency = 5 # the number of simultaneous queries for discovery
• Others
# whether or not to force discovery of new objects on the initial gather call before
# collecting metrics
force_discover_on_init = false
# the interval before (re)discovering objects subject to metrics collection
object_discovery_interval = "300s"
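The vsan_cluster_include filter in step 3 selects clusters by inventory path. The Python sketch below illustrates how a wildcard pattern such as /*/host/* picks clusters. It assumes that * matches exactly one path segment; the plugin's real matching rules may differ. The function name matches_inventory_path and the sample datacenter and cluster names are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_inventory_path(pattern: str, path: str) -> bool:
    """Return True if each segment of `path` matches the same segment of `pattern`.

    Assumption: "*" stands for one whole path segment and never crosses "/".
    """
    pattern_parts = pattern.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(part, pat) for pat, part in zip(pattern_parts, path_parts))


# Hypothetical inventory: two datacenters, three clusters.
clusters = [
    "/DC1/host/ClusterA",
    "/DC1/host/ClusterB",
    "/DC2/host/ClusterC",
]

# Default filter: every cluster in every datacenter.
all_clusters = [c for c in clusters if matches_inventory_path("/*/host/*", c)]
# Narrower filter: only the clusters in one datacenter.
dc1_clusters = [c for c in clusters if matches_inventory_path("/DC1/host/*", c)]
```

Under this assumption, a cluster placed in a custom folder (for example /DC1/host/Folder/ClusterD) would not match the default pattern, which is why a customized inventory layout needs its own path.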
Startup
1. After the config file is edited and saved, run docker-compose up -d to start all
components in detached mode. Run docker ps to verify that Telegraf, InfluxDB and Grafana have
started successfully. There should be three containers with a STATUS of “Up”.
2. Open http://<vm-ip>:3000 in the browser, and you will see a Grafana login page. The default
login credential is admin/admin; you can change the password once you are logged in.
3. After logging in, click the "Dashboards" icon on the left bar and choose "Manage" to view all
available dashboards.
4. With the default collection frequency, data will appear in Grafana after 5 to 10 minutes.
For example, you can click the “vSAN Overview” dashboard to see an overview.
5. To stop, run
docker-compose down
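The check in step 1 (three containers with a STATUS of “Up”) can be scripted. The following Python sketch parses the output of docker ps -a --format "{{.Names}}\t{{.Status}}" and reports containers that are not up. The helper names and the container names in the sample are hypothetical; the names used on the appliance may differ.

```python
import subprocess


def parse_docker_ps(output: str) -> dict[str, str]:
    """Map each container name to its STATUS text from tab-separated docker ps output."""
    statuses = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        statuses[name.strip()] = status.strip()
    return statuses


def containers_not_up(statuses: dict[str, str]) -> list[str]:
    """Return the names of containers whose status does not start with "Up"."""
    return sorted(name for name, status in statuses.items() if not status.startswith("Up"))


def read_docker_ps() -> str:
    """Run docker ps -a on the appliance and return its raw output."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Sample output with hypothetical container names, so the sketch runs without Docker.
sample = "telegraf\tExited (1) 2 minutes ago\ninfluxdb\tUp 5 minutes\ngrafana\tUp 5 minutes\n"
sample_statuses = parse_docker_ps(sample)
failed = containers_not_up(sample_statuses)
```

On the appliance, containers_not_up(parse_docker_ps(read_docker_ps())) would return an empty list when all components are running; any name it returns is a candidate for docker logs.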
Troubleshooting
1. “No host is compatible with the virtual machine”
Users may experience VMware-specific problems when trying to start the fling VM (e.g. "The
guest operating system 'vmwarePhoton64Guest' is not supported").
Solution:
Right-click the VM and click "Compatibility" > "Upgrade VM compatibility" > "Yes". When asked to
choose "Compatible with", keep the default option (e.g. "ESXi 6.7 and later" or
"Workstation 12 and later"), and then press "OK". The VM should now be able to
start.
2. How to view the logs?
Run docker ps -a to list running and stopped containers with their IDs.
Run docker logs <container-id> to view logs.
3. Not all docker containers are successfully started.
If one or more docker containers are not able to start, the most likely reason is that
telegraf.conf is not configured correctly.
Solution:
Run docker ps -a to find out which container failed.
Run docker logs <container-id> to see the reason for the failure.
4. No data points are available in the Grafana UI
With the default collection frequency, data points usually appear in Grafana 5 to 10 minutes
after you run docker-compose up -d. If you do not see any data coming in, please make sure:
• All three containers have started successfully. Please refer to the previous troubleshooting
steps.
• The vSAN performance service is enabled on the target vCenter (see Requirements).
• The vSAN clusters’ inventory paths are configured correctly and the clusters are successfully
discovered during the discovery phase. By default, the inventory path of a vSAN
cluster is /DatacenterName/host/ClusterName, where the host folder is created by the
system. Wildcards can be used to select a group of resources, e.g.
▪ /DatacenterName/host/* for all clusters in a datacenter
▪ /*/host/* for all clusters in all datacenters
If you are not sure about the inventory path of a cluster, please go to https://<vcenter-ip>/mob.
5. Certificate verification issue
• If you see errors like “x509: certificate signed by unknown authority”, please make sure you
have followed the certificate verification instructions in the Configuration section and used
the correct certificate file.
• If you see errors like “vcenter cannot validate certificate for <ipaddress> because it doesn't
contain any IP SANs”, please make sure you are using the vCenter’s hostname instead of its IP
address.
• If certificate verification still fails, you can skip verification by setting
insecure_skip_verify = true. In this mode, TLS is susceptible to man-in-the-middle attacks.
This should be used only for testing.
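The certificate checks above can be expressed as a small pre-flight test on the vCenter settings. The Python sketch below works on a plain dictionary holding the same field names as telegraf.conf (vcenters, ssl_ca, insecure_skip_verify) rather than parsing the file itself. The function names, warning texts and sample addresses are hypothetical and are not part of the plugin.

```python
import ipaddress
from urllib.parse import urlparse


def is_ip_address(host: str) -> bool:
    """Return True if `host` is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def check_tls_settings(settings: dict) -> list[str]:
    """Return warnings for a dict that mirrors the vCenter fields of telegraf.conf."""
    warnings = []
    if settings.get("insecure_skip_verify", False):
        # Verification is disabled, so hostname and CA checks do not apply.
        warnings.append("insecure_skip_verify is true: use for testing only")
        return warnings
    if not settings.get("ssl_ca"):
        warnings.append("ssl_ca is not set: a privately signed certificate may fail verification")
    for url in settings.get("vcenters", []):
        host = urlparse(url).hostname or ""
        if is_ip_address(host):
            warnings.append(f"{url}: use the vCenter hostname instead of an IP address")
    return warnings


# Hypothetical settings: one uses an IP address, the other a hostname.
by_ip = {"vcenters": ["https://192.0.2.10/sdk"], "ssl_ca": "/path/to/ca"}
by_name = {"vcenters": ["https://vc01.example.com/sdk"], "ssl_ca": "/path/to/ca"}

ip_warnings = check_tls_settings(by_ip)
name_warnings = check_tls_settings(by_name)
```

The IP-based entry corresponds to the “doesn't contain any IP SANs” error described above, while the hostname-based entry yields no warnings.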
Screenshots
References
1. Grafana: https://grafana.com/
2. InfluxDB: https://www.influxdata.com/products/influxdb-overview/
3. Telegraf: https://www.influxdata.com/time-series-platform/telegraf/