Posted on 07-Apr-2017
Who We Are
Ilan Rabinovitch, Dir. Technical Community
Datadog
Ovais Tariq, Storage SRE
Uber (formerly at Lithium & Percona)
Agenda
1. About Lithium and MySQL
2. Background: Monitoring Challenges in a Dynamic World
3. Theory: Monitoring 101
4. Practical: Triaging a Real Incident at Lithium
About Lithium Technologies
Lithium’s platform helps brands connect with, engage, and understand their customers
MySQL Architecture / Data Flow
• Multi-tenant SaaS applications
• Typical master-slave replication setup
• MySQL running:
  ○ On bare metal
  ○ In AWS public cloud
  ○ In OpenStack
You’re in the cloud and it's everything you dreamed of!
• Autoscaling
• Infinite Storage
• Managed Databases
• Container Orchestration
• Private Clouds
How much do we measure?
1 instance
• 10 metrics from CloudWatch
1 operating system (e.g., Linux)
• 100 metrics
1 MySQL instance
• ~350 metrics
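The per-host figures above add up quickly once they are multiplied across a fleet. A minimal sketch of that arithmetic, assuming every host carries exactly one cloud instance, one operating system, and one MySQL instance; the helper names and the 200-host fleet size are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

# Approximate per-host metric counts, taken from the slide above.
METRICS_PER_LAYER: dict[str, int] = {
    "cloud_instance": 10,     # metrics from CloudWatch
    "operating_system": 100,  # e.g., Linux
    "mysql_instance": 350,    # "~350", so this is an approximation
}


def metrics_per_host(layers: dict[str, int]) -> int:
    """Sum the metrics collected for every layer on a single host."""
    return sum(layers.values())


def fleet_metrics(hosts: int, layers: dict[str, int]) -> int:
    """Total metrics for a fleet, assuming (hypothetically) identical hosts."""
    if hosts < 0:
        raise ValueError("hosts must be non-negative")
    return hosts * metrics_per_host(layers)


if __name__ == "__main__":
    print(metrics_per_host(METRICS_PER_LAYER))  # 460
    # 200 hosts is a hypothetical fleet size, not a number from the talk.
    print(fleet_metrics(200, METRICS_PER_LAYER))  # 92000
```

Even a modest fleet produces tens of thousands of series, which is part of why static, hand-maintained dashboards stop scaling.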
• Earlier: a typical Nagios and Cacti setup
• Static config and lack of context
• No correlation between alerts and graphs
• No self-service for developers
• High cost of in-house tooling
• Disk space usage
• Threads_connected
• Threads_running
• Connection_errors_internal
• Aborted_connects
• Connection_errors_max_connections
Sources:
● Server Status Variables
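Among the status variables listed, Threads_connected and Threads_running are gauges (point-in-time values), while Aborted_connects and the Connection_errors_* variables are counters that accumulate from server start, so alerting on them usually means alerting on their rate. A minimal sketch, assuming the status is captured as tab-separated text, such as the output of `mysql --batch --skip-column-names -e "SHOW GLOBAL STATUS"`; `parse_status` and `summarize` are hypothetical helpers, and the sample values are made up.

```python
from __future__ import annotations

# Gauges are reported as-is; counters only make sense as deltas over time.
GAUGES = {"Threads_connected", "Threads_running"}
COUNTERS = {
    "Aborted_connects",
    "Connection_errors_internal",
    "Connection_errors_max_connections",
}


def parse_status(text: str) -> dict[str, int]:
    """Parse 'Variable_name<TAB>Value' lines, keeping only integer values."""
    status: dict[str, int] = {}
    for line in text.splitlines():
        name, _, value = line.partition("\t")
        if value.strip().isdigit():
            status[name.strip()] = int(value)
    return status


def summarize(prev: dict[str, int], curr: dict[str, int],
              interval_s: float) -> dict[str, float]:
    """Return gauges from `curr` and counters as per-second rates."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    out: dict[str, float] = {}
    for name in GAUGES & set(curr):
        out[name] = float(curr[name])
    for name in COUNTERS & set(curr) & set(prev):
        delta = curr[name] - prev[name]
        # A negative delta means the counters were reset (e.g., a restart),
        # so this interval is skipped instead of reporting a bogus rate.
        if delta >= 0:
            out[name] = delta / interval_s
    return out


if __name__ == "__main__":
    # Made-up samples taken 60 seconds apart.
    before = parse_status("Threads_connected\t40\nAborted_connects\t100\n")
    after = parse_status("Threads_connected\t42\nAborted_connects\t130\n")
    print(summarize(before, after, 60.0))
    # {'Threads_connected': 42.0, 'Aborted_connects': 0.5}
```

Skipping intervals with negative deltas is a design choice: a restart would otherwise show up as a large negative error rate and could mask a real alert.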
A change in workload characteristics, without an increase in overall workload, affected the schema ‘groupecasino’
• Workload characteristics changed, making it more CPU-bound
• No increase in IO activity
• Increase in the number of read operations
• No change in the types of read operations
• A similar number of range queries, each reading more rows
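The last two points describe a pattern that counter deltas can surface: the number of range queries stays flat while the rows each one reads grows. A minimal sketch, assuming Select_range (joins that used ranges on the first table) approximates the count of range queries and Handler_read_next (requests to read the next row in key order) approximates the rows they read. This ratio is one possible heuristic chosen for illustration, not necessarily how the incident at Lithium was diagnosed; the function names, the 2x threshold, and the sample numbers are hypothetical.

```python
from __future__ import annotations


def rows_per_range_query(prev: dict[str, int],
                         curr: dict[str, int]) -> float | None:
    """Approximate rows read per range query between two status samples.

    Heuristic (assumption): delta(Handler_read_next) / delta(Select_range).
    Returns None when no range queries ran during the interval.
    """
    range_queries = curr["Select_range"] - prev["Select_range"]
    rows_read = curr["Handler_read_next"] - prev["Handler_read_next"]
    if range_queries <= 0:
        return None
    return rows_read / range_queries


def flag_shift(baseline: float, current: float, threshold: float = 2.0) -> bool:
    """Flag when rows per range query grew by more than `threshold` times."""
    return baseline > 0 and current / baseline > threshold


if __name__ == "__main__":
    # Made-up cumulative samples: 600 range queries in each interval,
    # but far more rows read per query in the second one.
    t0 = {"Select_range": 1000, "Handler_read_next": 50_000}
    t1 = {"Select_range": 1600, "Handler_read_next": 80_000}
    t2 = {"Select_range": 2200, "Handler_read_next": 200_000}
    baseline = rows_per_range_query(t0, t1)  # 50.0
    current = rows_per_range_query(t1, t2)   # 200.0
    print(baseline, current, flag_shift(baseline, current))  # 50.0 200.0 True
```

A ratio like this is useful precisely because neither input alone looks alarming: the query count is flat, and the row count rises only alongside normal traffic until it is normalized per query.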
Monitoring 101: Alerting https://www.datadoghq.com/blog/monitoring-101-alerting/
Monitoring 101: Collecting the Right Data https://www.datadoghq.com/blog/monitoring-101-collecting-data/
Monitoring 101: Investigating performance issues https://www.datadoghq.com/blog/monitoring-101-investigation/
Monitoring MySQL Performance Metrics https://www.datadoghq.com/blog/monitoring-mysql-performance-metrics/
Collecting MySQL Metrics https://www.datadoghq.com/blog/collecting-mysql-statistics-and-metrics/