Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
©2016 Couchbase Inc. 1
Monitoring Production Deployments The Tools – LinkedIn
Alex Ma – Principal Architect – Couchbase
Michael Kehoe – Staff Site Reliability Engineer – LinkedIn
Overview
• Monitoring Tools
• Making sense of the data
• External Monitoring Integrations
• Summary
Michael Kehoe
Staff Site Reliability Engineer (SRE) - [email protected]
• Production-SRE team
• Member of CBVT
• Australian!
• Contact
  • linkedin.com/in/michaelkkehoe
  • @matrixtek
Monitoring Tools
Monitoring Tools – Couchbase Web Console
Monitoring Tools – Couchbase REST API
• http://docs.couchbase.com/admin/admin/REST/rest-bucket-stats.html
• GET /pools/default/buckets/[bucket-name]/stats
• JSON output format
• 60 samples per metric
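A minimal sketch of pulling those stats with only the Python standard library. It assumes HTTP basic auth on the admin port (8091) and that each metric's sample list sits under payload["op"]["samples"][metric], as in 4.x-era responses; check the layout against your server version. The helper names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def fetch_bucket_stats(host, bucket, user, password, port=8091, timeout=10):
    """GET /pools/default/buckets/<bucket>/stats and return the decoded JSON."""
    url = (f"http://{host}:{port}/pools/default/buckets/"
           f"{urllib.parse.quote(bucket)}/stats")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize_metric(payload, metric):
    """Return (latest, mean) over one metric's sample list.

    Assumes the samples live under payload["op"]["samples"][metric].
    """
    samples = payload["op"]["samples"][metric]
    if not samples:
        raise ValueError(f"no samples for {metric}")
    return samples[-1], sum(samples) / len(samples)
```

Usage, against a hypothetical node: summarize_metric(fetch_bucket_stats("cb1.example.com", "default", "Administrator", "password"), "cmd_get"). Reporting both the latest sample and the window mean helps separate a momentary spike from a sustained change.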
Monitoring Tools - cbstats
• http://docs.couchbase.com/admin/admin/CLI/cbstats-intro.html
• Command-line tool for viewing stats
• 333+ available stats
• Cumulative and snapshot values
Monitoring Tools - cbstats
• Average value size = ep_value_size / (curr_items_tot - ep_num_non_resident)
• ep_value_size = Amount of RAM used to hold values in this bucket on this node
• curr_items_tot = Total count of active and replica items in this bucket on this node
• ep_num_non_resident = Total number of items not resident in RAM
• 9567135872 / (28733039 - 26582747) = 9567135872 / 2150292 ≈ 4449.23 bytes
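The same calculation can be scripted from cbstats output. A minimal sketch, assuming cbstats prints one "name: value" pair per line; non-integer stats are skipped. Function names are hypothetical.

```python
def parse_cbstats(text):
    """Parse 'name: value' lines from cbstats output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        # rpartition keeps names that themselves contain ':' intact
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        value = value.strip()
        if value.lstrip("-").isdigit():
            stats[name.strip()] = int(value)
    return stats


def average_value_size(stats):
    """ep_value_size / (curr_items_tot - ep_num_non_resident), in bytes."""
    resident = stats["curr_items_tot"] - stats["ep_num_non_resident"]
    if resident <= 0:
        raise ValueError("no resident items")
    return stats["ep_value_size"] / resident
```

Fed the three figures from the slide, average_value_size returns roughly 4449.23.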
Monitoring Tools - cbstats
• cbstats can be pointed at a specific host and port (e.g. host:11210, the data service port)
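A minimal sketch of invoking cbstats against one node from a script. It assumes a cbstats that takes the bucket via -b (as the 4.x/5.x tools do); authentication flags differ between releases, so they are passed through untouched in the hypothetical extra_args parameter.

```python
import subprocess


def cbstats_command(host, bucket, port=11210, group="all", extra_args=()):
    """Build a cbstats invocation aimed at one node's data-service port.

    extra_args carries any authentication flags, which vary by release.
    """
    return ["cbstats", f"{host}:{port}", group, "-b", bucket, *extra_args]


def run_cbstats(host, bucket, timeout=30, **kwargs):
    """Run cbstats and return its raw text output (raises if it fails)."""
    cmd = cbstats_command(host, bucket, **kwargs)
    result = subprocess.run(cmd, capture_output=True, text=True,
                            check=True, timeout=timeout)
    return result.stdout
```

Building the argument list separately keeps the command testable without a cluster, and avoids shell quoting issues with bucket names.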
Monitoring Tools - cbstats
• cbstats timings
  • Histogram showing the timing of a number of internal operations
  • Commit to disk, background IO operations, GET ops
• http://docs.couchbase.com/admin/admin/CLI/CBstats/cbstats-timing.html
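Timing histograms are easiest to alert on once reduced to a percentile. A minimal sketch, assuming the histogram has already been parsed into hypothetical (lower_us, upper_us, count) tuples; the exact cbstats text layout is not reproduced here.

```python
def histogram_percentile(buckets, pct):
    """Estimate a percentile from (lower_us, upper_us, count) buckets.

    Returns the upper bound of the bucket where the cumulative count first
    reaches pct% of the total, i.e. a conservative upper-bound estimate.
    """
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(buckets)
    total = sum(count for _, _, count in ordered)
    if total == 0:
        raise ValueError("empty histogram")
    threshold = total * pct / 100
    running = 0
    for _, upper, count in ordered:
        running += count
        if running >= threshold:
            return upper
    return ordered[-1][1]
```

Reporting the bucket's upper bound errs on the pessimistic side, which is usually what you want for latency alerts.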
Monitoring Tools - Queries
• http://developer.couchbase.com/documentation/server/current/tools/query-monitoring.html
• http://localhost:8093/admin/vitals
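A minimal sketch of polling the query service vitals endpoint on port 8093. Whether credentials are required depends on the server version, so basic auth is optional here; the key names in the usage are illustrative, not a list of what the endpoint returns.

```python
import base64
import json
import urllib.request


def fetch_vitals(host="localhost", port=8093, user=None, password=None, timeout=10):
    """GET /admin/vitals from the query service and return the decoded JSON."""
    request = urllib.request.Request(f"http://{host}:{port}/admin/vitals")
    if user is not None:
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def select_vitals(vitals, prefixes):
    """Keep only the entries whose key starts with one of the given prefixes."""
    wanted = tuple(prefixes)
    return {key: value for key, value in vitals.items() if key.startswith(wanted)}
```

For example, select_vitals(fetch_vitals(), ["request."]) would keep only request-related entries, assuming the endpoint uses dotted key names.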
Monitoring Tools - htop
• htop | top | vmstat | /proc
• Core utilization
• Customization
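Per-core utilization, as shown by htop, can be derived from two snapshots of /proc/stat. A minimal sketch that counts idle plus iowait as not busy and sums the first eight jiffy columns (user through steal); function names are hypothetical.

```python
def read_cpu_times(text):
    """Map each 'cpu'/'cpuN' line of /proc/stat to (busy, total) jiffies."""
    times = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue
        # user nice system idle iowait irq softirq steal
        values = [int(x) for x in fields[1:9]]
        idle = values[3] + values[4]  # idle + iowait
        total = sum(values)
        times[fields[0]] = (total - idle, total)
    return times


def core_utilization(before, after):
    """Percent busy per CPU between two /proc/stat snapshots."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        delta = total1 - total0
        result[cpu] = 100.0 * (busy1 - busy0) / delta if delta else 0.0
    return result
```

Read /proc/stat, sleep for the sampling interval, read it again, and pass both parsed snapshots to core_utilization. A single hot core can hide behind a modest aggregate "cpu" figure, which is why the per-core view matters.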
Monitoring Tools - iostat
• IO utilization
• Average wait times
• Read/write requests
• Determine capacity
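iostat's %util and await come from /proc/diskstats. A minimal sketch of the same arithmetic between two snapshots, assuming the classic 14-field layout (newer kernels append further fields, which are ignored); names are hypothetical.

```python
def parse_diskstats(text):
    """Return {device: (reads, ms_reading, writes, ms_writing, ms_doing_io)}."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        stats[f[2]] = (int(f[3]), int(f[6]), int(f[7]), int(f[10]), int(f[12]))
    return stats


def io_summary(before, after, interval_ms):
    """iostat-style %util, await (ms per completed I/O) and IOPS per device."""
    out = {}
    for dev, (r1, rms1, w1, wms1, io1) in after.items():
        if dev not in before:
            continue
        r0, rms0, w0, wms0, io0 = before[dev]
        ios = (r1 - r0) + (w1 - w0)
        wait = (rms1 - rms0) + (wms1 - wms0)
        out[dev] = {
            "util_pct": min(100.0, 100.0 * (io1 - io0) / interval_ms),
            "await_ms": wait / ios if ios else 0.0,
            "iops": ios * 1000.0 / interval_ms,
        }
    return out
```

Rising await with util_pct near 100 suggests the device is saturated; comparing iops against what the disk is rated for is one way to "determine capacity".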
Monitoring Tools - iftop
• See where traffic is coming from
• Measure replication throughput
• Verify capacity
Making Sense of the Data
Key Statistics
Metrics to Consider:
• Couchbase Server
• Client application
• Disk
• Network
Key Statistics – Couchbase Server
Metrics to Consider:
• Operations
• Cache miss rate (ep_cache_miss_rate)
• Active/replica vBuckets (vb_active_num / vb_replica_num)
• Percentage of items in memory (vb_active_resident_items_ratio)
• Disk queue (ep_diskqueue_items)
• Misdirected requests (ep_num_not_my_vbuckets)
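One way to act on these metrics is a simple threshold table. A minimal sketch; every limit below is hypothetical and workload-dependent. It also assumes the values are gauges or per-sample rates: whether a given stat is a gauge, a rate, or a cumulative counter depends on where you read it (REST samples vs. cbstats), so check before choosing a limit.

```python
# Hypothetical alert thresholds: tune them for your own workload.
THRESHOLDS = {
    "vb_active_resident_items_ratio": ("min", 90),
    "ep_cache_miss_rate": ("max", 5),
    "ep_diskqueue_items": ("max", 1_000_000),
    "ep_num_not_my_vbuckets": ("max", 0),
}


def check_thresholds(latest, thresholds=THRESHOLDS):
    """Return a readable breach message for each metric outside its limit.

    Metrics missing from `latest` are skipped rather than reported.
    """
    breaches = []
    for metric, (kind, limit) in thresholds.items():
        if metric not in latest:
            continue
        value = latest[metric]
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            breaches.append(f"{metric}={value} ({kind} {limit})")
    return breaches
```

Keeping thresholds as data rather than code makes it easy to review them alongside the "what causes them / how to remediate" notes for each metric.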
Key Statistics – Couchbase Client
Metrics to Consider:
• Call-time latency
  • Measure GETs and SETs separately
• Hit rate
  • Is the hit rate what you expected?
• Errors
  • Timeouts retrieving objects
  • Unable to reach Couchbase Server
• See http://developer.couchbase.com/documentation/server/4.0/sdks/java-2.2/event-bus-metrics.html
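The client-side metrics can be captured with a thin wrapper, independent of any SDK's own metrics facility. A minimal sketch around any object exposing get(key) and set(key, value); it treats a get() returning None as a miss, which is an assumption to adapt to how your SDK reports a missing document. InstrumentedClient is a hypothetical name.

```python
import time
from collections import defaultdict


class InstrumentedClient:
    """Record per-operation latency, hit/miss counts and error types."""

    def __init__(self, client):
        self._client = client
        self.latencies_ms = defaultdict(list)
        self.hits = 0
        self.misses = 0
        self.errors = defaultdict(int)

    def _timed(self, op, fn, *args):
        start = time.perf_counter()
        try:
            return fn(*args)
        except Exception as exc:
            # e.g. timeouts or connection failures, counted by exception type
            self.errors[type(exc).__name__] += 1
            raise
        finally:
            self.latencies_ms[op].append((time.perf_counter() - start) * 1000)

    def get(self, key):
        value = self._timed("get", self._client.get, key)
        if value is None:
            self.misses += 1
        else:
            self.hits += 1
        return value

    def set(self, key, value):
        return self._timed("set", self._client.set, key, value)

    def hit_rate(self):
        lookups = self.hits + self.misses
        return self.hits / lookups if lookups else 0.0

    def percentile_ms(self, op, pct):
        """Nearest-rank percentile of recorded latencies for one operation."""
        samples = sorted(self.latencies_ms[op])
        if not samples:
            return None
        index = min(len(samples) - 1, int(round(pct / 100 * (len(samples) - 1))))
        return samples[index]
```

GETs and SETs land in separate latency lists, so a slow write path cannot hide inside a fast read average.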
Key Statistics – Disk
Metrics to Consider:
• Disk space
  • Compaction
  • Rebalance
• Disk IO
  • Can the disk sustain the required IOPS?
  • Disk queue
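A rough disk-space headroom check. A minimal sketch assuming, as a conservative rule of thumb rather than a documented requirement, that compaction may temporarily need free space comparable to the data being rewritten; factor scales that assumption. The data directory (for example the default /opt/couchbase/var/lib/couchbase/data on Linux) should be adjusted to your install.

```python
import os
import shutil


def directory_size(path):
    """Total size in bytes of the regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished mid-walk (e.g. replaced by compaction)
    return total


def compaction_headroom(data_dir, factor=1.0):
    """Return (ok, free_bytes, data_bytes) for the filesystem holding data_dir."""
    data_bytes = directory_size(data_dir)
    free_bytes = shutil.disk_usage(data_dir).free
    return free_bytes >= data_bytes * factor, free_bytes, data_bytes
```

Running this before a rebalance or a manual compaction gives an early warning instead of a failed operation.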
Key Statistics – Network
Metrics to Consider:
• Network connectivity
• Connections
• Capacity/utilization
Key Statistics – Network – Connectivity
• Ping: simple network connectivity test
• Firewalls: make sure you have the correct ports open
  • See http://developer.couchbase.com/documentation/server/current/install/install-ports.html
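Ping only proves ICMP reachability; a firewall can still block the service ports. A minimal sketch that attempts a TCP connection to a few well-known Couchbase ports (8091 admin/REST, 8092 views, 8093 query, 11210 data). It checks reachability only, not service health.

```python
import socket

# A subset of the well-known Couchbase Server ports; see the port list above.
COUCHBASE_PORTS = {8091: "admin/REST", 8092: "views", 8093: "query", 11210: "data"}


def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports=COUCHBASE_PORTS, timeout=2.0):
    """Map each port to True/False reachability from this machine."""
    return {port: port_open(host, port, timeout) for port in ports}
```

Run it from the client hosts and from each cluster node, since firewall rules often differ between those paths.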
Key Statistics – Network – Connections
• File-descriptor limits
• Connections in CLOSE_WAIT state
• Collect stats from /proc/net/tcp
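A minimal sketch of both checks. It counts TCP states from /proc/net/tcp (the fourth column is the state in hex, where 08 is CLOSE_WAIT), optionally filtered to one local port, and reads the "Max open files" row of /proc/<pid>/limits for a given process. Function names are hypothetical.

```python
from collections import Counter

TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def tcp_states(text, local_port=None):
    """Count connection states in /proc/net/tcp text, optionally for one local port."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        port = int(fields[1].rsplit(":", 1)[1], 16)  # local address is IP:PORT in hex
        if local_port is not None and port != local_port:
            continue
        counts[TCP_STATES.get(fields[3].upper(), fields[3])] += 1
    return counts


def max_open_files(limits_text):
    """Parse the 'Max open files' row of /proc/<pid>/limits into (soft, hard) strings."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return soft, hard
    return None
```

A steadily growing CLOSE_WAIT count on 11210 usually means one side is not closing its sockets; comparing the number of entries in /proc/<pid>/fd against the soft limit shows how close a process is to running out of descriptors.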
Key Statistics – Network – Capacity/ Utilization
• Practical network capacity is ~85-90% of theoretical
  • E.g. a 1 Gb/s network interface can do 850-900 Mb/s
• Congested networks are problematic
  • Higher latency on responses
  • Slower replication
• Collect stats from /proc/net/dev
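A minimal sketch of turning two /proc/net/dev snapshots into throughput and comparing it with the practical capacity above. It defaults to the conservative 85% figure and, assuming a full-duplex link, compares the busier direction against it. Names are hypothetical.

```python
def read_net_bytes(text):
    """Map interface -> (rx_bytes, tx_bytes) from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        iface, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def utilization(before, after, seconds, link_bits_per_sec, practical=0.85):
    """Per-interface throughput (bits/s) and share of practical link capacity."""
    usable = link_bits_per_sec * practical
    report = {}
    for iface, (rx1, tx1) in after.items():
        if iface not in before:
            continue
        rx0, tx0 = before[iface]
        rx_bps = (rx1 - rx0) * 8 / seconds
        tx_bps = (tx1 - tx0) * 8 / seconds
        report[iface] = {
            "rx_bps": rx_bps,
            "tx_bps": tx_bps,
            # full duplex: each direction gets the full link rate
            "busiest_pct_of_practical": 100 * max(rx_bps, tx_bps) / usable,
        }
    return report
```

Sustained values near 100% of practical capacity are where the higher response latency and slower replication mentioned above tend to appear.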
Key Statistics – Network – Capacity/ Utilization
• Practical network capacity is ~85-90% of theoretical (1,250 MB/s for the example below, i.e. a 10 Gb/s link)
  • E.g. a 1 Gb/s network interface can do 850-900 Mb/s
Average object size (bytes)            4,096
ID length (bytes)                      32
Metadata size (bytes)                  56
Reads (ops/s)                          100,000
Writes (ops/s)                         60,000
Replica count                          1
Read network utilization (bytes/s)     421,600,000
Write network utilization (bytes/s)    502,080,000
Total network utilization (bytes/s)    923,680,000 (theoretical max 1,250,000,000)
Remaining bandwidth (bytes/s)          326,320,000
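The per-operation byte costs behind the read and write rows are not spelled out. The model below is one decomposition that reproduces them: it is our assumption, not the slide's stated formula. A read costs the key on the request plus key, value, and metadata on the response; a write sends key, value, and metadata once from the client and once more per replica.

```python
def network_estimate(value_size, key_size, meta_size, reads, writes, replicas):
    """Estimated bytes/s for reads, writes and total under the assumed model."""
    item = value_size + key_size + meta_size
    read_bytes = reads * (key_size + item)         # key in request, item in response
    write_bytes = writes * item * (1 + replicas)   # client -> node, node -> each replica
    return read_bytes, write_bytes, read_bytes + write_bytes
```

With the table's inputs, network_estimate(4096, 32, 56, 100_000, 60_000, 1) gives 421,600,000 + 502,080,000 = 923,680,000 bytes/s, and the same function makes it easy to see how adding a replica or growing the average object size moves the total toward the link's practical limit.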
External Monitoring Integrations
External Monitoring Integrations – Write your own
Getting Started
• Use the Couchbase REST API
• Pipe ‘cbstats’ output
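As an example of piping cbstats output into an external system, here is a minimal sketch that converts "name: value" lines into the Graphite plaintext format ("path value timestamp"). Graphite is our choice of target for illustration, and the metric prefix is a hypothetical naming scheme; non-numeric stats are skipped.

```python
import time


def cbstats_to_graphite(lines, prefix, timestamp=None):
    """Convert cbstats 'name: value' lines into Graphite plaintext lines."""
    ts = int(time.time() if timestamp is None else timestamp)
    out = []
    for line in lines:
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        value = value.strip()
        try:
            float(value)  # keep numeric stats only
        except ValueError:
            continue
        out.append(f"{prefix}.{name.strip()} {value} {ts}")
    return out
```

In a script, iterate over cbstats_to_graphite(sys.stdin, "couchbase.node1.default") and print each line, then pipe cbstats output into that script and its output to the Graphite plaintext listener (port 2003 by default).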
Using Couchbase REST API
• Examples
  • Datadog – http://lnkd.in/cb-datadog
  • This example – http://lnkd.in/cb-stats-collector
Using Couchbase CBstats
Summary
• It is important to have monitoring in place
• Understand the metrics you monitor
  • What causes them
  • How to remediate
Thank You!