Monitoring Guide - Couchbase
Monitoring Guide
Proactive monitoring and alerting is essential to managing a healthy Couchbase environment. While the Couchbase Web Console provides detailed statistics and basic alerting functionality, it is not intended to be a real-time dashboard and shouldn't be used as the primary operational monitoring utility.
Integration with external monitoring systems is required for two primary purposes: proactive alerting and high-resolution trending. The external monitoring system should be capable of setting alert thresholds on a per-metric basis. Because the values of most metrics are workload- and environment-specific, you will need to establish a baseline for what is "normal" for your use cases. Trending the Couchbase metrics helps establish these baseline values, and alerts can then be configured for when point-in-time values exceed the "normal" range. Trended metrics also allow Couchbase administrators to observe resource consumption over time, indicating when scaling events will become necessary.
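As a minimal sketch of the baseline idea, assuming you already have a series of trended values for one metric (for example, per-minute averages stored by your monitoring system), the functions below compute a mean baseline and test whether a new point-in-time value exceeds a multiple of it. The names `baseline_mean` and `exceeds_baseline` are hypothetical, and a plain mean is only one way to define "normal"; percentiles or time-of-day baselines may suit spiky workloads better.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for baseline-based alerting (not Couchbase tooling).

# baseline_mean VALUE...
# Prints the arithmetic mean of the numeric arguments with two decimals
# (prints nothing when no values are given).
baseline_mean() {
  printf '%s\n' "$@" | awk 'NF { sum += $1; n++ } END { if (n > 0) printf "%.2f\n", sum / n }'
}

# exceeds_baseline VALUE BASELINE MULTIPLIER
# Returns success (exit 0) when VALUE > BASELINE * MULTIPLIER, failure otherwise.
exceeds_baseline() {
  awk -v v="$1" -v b="$2" -v m="$3" 'BEGIN { exit !(v > b * m) }'
}
```

For example, `exceeds_baseline 450 200 2` succeeds because 450 is above 2 x 200 = 400, which is the "alert at 2x of baseline" pattern used throughout the metric tables in this guide.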
This document describes how to poll the Couchbase REST API to obtain metrics for an external monitoring system, describes which metrics are most important to monitor, and provides guidance on how to interpret those metrics.
Obtaining Couchbase Metrics
Couchbase exposes monitoring metrics via REST APIs with responses returned in JSON format. There are two types of statistical APIs available: Cluster Manager (port 8091/18091) stats and service-specific administrative stats.
Cluster Manager stats provide statistical sampling for a given service and/or entities at a particular interval. Each response from a /stats endpoint contains a timestamp property recording when each sample was taken, which correlates directly with each of the available stats.
Every Cluster Manager endpoint supports two optional query string parameters:
zoom
The zoom parameter determines the interval of samples to return in the response. The zoom parameter provides the following granularity:
zoom=minute (default) - Every second for the last minute (60 samples)
zoom=hour - Every four (4) seconds for the last hour (900 samples)
zoom=day - Every minute for the last day (1440 samples)
zoom=week - Every ten (10) minutes for the last week, which in practice covers eight (8) days (1152 samples)
zoom=year - Every six (6) hours for the last year (1464 samples)
Due to sample frequency, the number of samples returned is plus or minus one (+/-1).
haveTStamp
Requests statistics from this timestamp until the current time. The haveTStamp parameter is specified as UNIX epoch time in milliseconds.
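The haveTStamp value can be derived from the current time. A minimal sketch, assuming a `date` that supports `+%s` (GNU and BSD both do); the function name `have_tstamp_minutes_ago` is hypothetical. The optional second argument lets you pass a fixed "now" in epoch seconds, which keeps the arithmetic reproducible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a haveTStamp value (UNIX epoch milliseconds)
# for a point MINUTES in the past.
# Usage: have_tstamp_minutes_ago MINUTES [NOW_EPOCH_SECONDS]
have_tstamp_minutes_ago() {
  local minutes="$1"
  local now="${2:-$(date +%s)}"   # default: current time in epoch seconds
  echo $(( (now - minutes * 60) * 1000 ))
}

# Example request (not executed here): --get appends the parameter to the
# query string instead of sending it as a request body.
# curl --user Administrator:password --silent --get \
#   --data "haveTStamp=$(have_tstamp_minutes_ago 1)" \
#   http://localhost:8091/pools/default/buckets/travel-sample/stats
```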
Couchbase Professional Services
To limit the results when using the zoom parameter, post-process the results. For example, if you need samples from the last five (5) minutes, set zoom=hour and retrieve the last 75 entries from the JSON list.
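The 75 in this example is five minutes (300 seconds) divided by the 4-second sample interval of zoom=hour. A minimal sketch that generalizes the calculation using the intervals listed above for each zoom level; the function names are hypothetical. With jq, a negative slice such as `.op.samples.cmd_get[-75:]` then keeps only the trailing entries.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: map a zoom level to its sample interval and compute
# how many trailing samples cover a time window.

# zoom_interval_seconds ZOOM -> seconds between samples (per the list above)
zoom_interval_seconds() {
  case "$1" in
    minute) echo 1 ;;
    hour)   echo 4 ;;
    day)    echo 60 ;;
    week)   echo 600 ;;
    year)   echo 21600 ;;
    *)      echo "unknown zoom: $1" >&2; return 1 ;;
  esac
}

# samples_for_window MINUTES ZOOM -> number of trailing samples to keep
samples_for_window() {
  local interval
  interval=$(zoom_interval_seconds "$2") || return 1
  echo $(( $1 * 60 / interval ))
}
```

Remember that the response itself may contain one sample more or less than the nominal count, so the slice is approximate at the edges.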
Polling the APIs
The REST APIs should be polled every minute, either via a local agent or remotely using each node's IP address or hostname. Couchbase REST APIs must be accessed using administrative account credentials; a Read-Only Administrator is recommended for this purpose.
As most of the metrics provided by the REST API are per-node, it is necessary to query every node in the cluster.
Limit the number of requests per API when querying metrics; for example, return all bucket metrics in one request rather than issuing separate requests per metric. Heavy use of the Couchbase REST APIs can increase CPU utilization on the cluster.
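A minimal sketch of polling every node, assuming curl and jq are installed and that the node list can be read from the `nodes[].hostname` field of `/pools/default` (verify this against your Couchbase version). `CB_HOST`, `CB_USER`, `CB_PASS`, and the function names are hypothetical placeholders; point `CB_USER` at the Read-Only Administrator recommended above. Each node costs one request that returns all of a bucket's stats, in line with the guidance on limiting requests.

```shell
#!/usr/bin/env bash
# Hypothetical polling sketch; requires curl and jq.
# Placeholders: override with a Read-Only Administrator account.
CB_HOST="${CB_HOST:-http://localhost:8091}"
CB_USER="${CB_USER:-ro_admin}"
CB_PASS="${CB_PASS:-password}"

# node_bucket_stats_url BASE BUCKET NODE
# Prints the per-node bucket stats endpoint for NODE (host:port).
node_bucket_stats_url() {
  printf '%s/pools/default/buckets/%s/nodes/%s/stats\n' "$1" "$2" "$3"
}

# list_nodes: prints one host:port per cluster node (assumes .nodes[].hostname).
list_nodes() {
  curl --silent --user "$CB_USER:$CB_PASS" "$CB_HOST/pools/default" |
    jq -r '.nodes[].hostname'
}

# poll_bucket BUCKET: one request per node, each returning all stats for
# BUCKET at one-second granularity (zoom=minute).
poll_bucket() {
  local bucket="$1" node
  for node in $(list_nodes); do
    echo "== $node"
    curl --silent --user "$CB_USER:$CB_PASS" --get --data zoom=minute \
      "$(node_bucket_stats_url "$CB_HOST" "$bucket" "$node")"
    echo
  done
}

# Example (run once per minute from cron or an agent): poll_bucket travel-sample
```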
Couchbase Service Discovery
Some monitoring systems are capable of discovering new monitoring targets and automatically defining the monitoring profile to be applied. Couchbase supports this by exposing cluster membership, MDS (Multi-Dimensional Scaling) service assignment, and service ports via the Data Service Node API.
Metrics and Services to Monitor
Each section in the list below describes the monitoring metrics exposed by a Couchbase service, with a description of each metric, guidance on interpreting it, and possible operational responses. Alerts should be configured to be sent from the external monitoring system when metric values fall outside the expected range.
Each guide contains examples of how to call an endpoint and parse the results. These examples use jq, a lightweight command-line JSON processor; it is not required and is used for illustration only. It can be downloaded at https://stedolan.github.io/jq/download
Monitoring: Operating System
Monitoring: Nodes
Monitoring: Data Service
Monitoring: XDCR
Monitoring: Query Service
Monitoring: Index Service
Monitoring: FTS Service
Monitoring: Eventing Service
Monitoring: Logs
Reference Implementations
Couchbase provides a reference monitoring implementation to demonstrate interacting with the available REST APIs.
A sample Nagios plugin is available here.
A complete dockerized monitoring environment is available here.
Third Party Integrations
The following monitoring systems have plugins available for Couchbase. Note that these are third-party integrations and may be incomplete or may not follow the best practices set forth in this document.
Couchbase Node Exporter for Prometheus; see the Prometheus Integration Guide for details
AppDynamics
DataDog
Dynatrace
New Relic
SignalFx
Sensu
ManageEngine
Monitoring: Data Service
Buckets Overview
The buckets overview provides all available buckets, high-level system information, and resource utilization for each bucket in the cluster.
Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-buckets-summary.html
Insecure: http://localhost:8091/pools/default/buckets
Secure: https://localhost:18091/pools/default/buckets
Example
The following example illustrates retrieving all of the buckets in a cluster and displaying basic stats about each bucket.
# --get appends --data to the URL as a query string; with --request GET
# curl would send skipMap=true as a request body instead.
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data skipMap=true \
  http://localhost:8091/pools/default/buckets | \
jq -r '.[] |
  "Bucket: " + .name + "\n" +
  "Quota Used: " + (.basicStats.quotaPercentUsed | tostring) + "%\n" +
  "Ops/Sec: " + (.basicStats.opsPerSec | tostring) + "\n" +
  "Disk Fetches: " + (.basicStats.diskFetches | tostring) + "\n" +
  "Item Count: " + (.basicStats.itemCount | tostring) + "\n" +
  "Disk Used: " + (.basicStats.diskUsed / 1024 / 1024 | tostring) + "MB\n" +
  "Data Used: " + (.basicStats.dataUsed / 1024 / 1024 | tostring) + "MB\n" +
  "Memory Used: " + (.basicStats.memUsed / 1024 / 1024 | tostring) + "MB\n"
'
Note: The skipMap query string parameter is a boolean value that can be used to include or exclude the current vBucket distribution map for the buckets.
Individual Bucket-Level Stats
Bucket metrics provide detailed information about resource consumption, application workload, and internal operations at the bucket level. The following bucket stats are available via the cluster-wide or per-node endpoints listed below.
Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-bucket-stats.html
Insecure: http://localhost:8091/pools/default/buckets/BUCKET/stats
Secure: https://localhost:18091/pools/default/buckets/BUCKET/stats
Available Stats
Stat name: Description
avg_active_timestamp_drift: Average drift (in seconds) per mutation on active vBuckets
avg_bg_wait_time: Average background fetch time in microseconds
avg_disk_commit_time: Average disk commit time in seconds as from disk_update histogram of timings
avg_disk_update_time: Average disk update time in microseconds as from disk_update histogram of timings
avg_replica_timestamp_drift: Average drift (in seconds) per mutation on replica vBuckets
bg_wait_count: Number of background fetch operations
bg_wait_total: Background fetch time in microseconds
bytes_read: Number of bytes per second sent into this bucket
bytes_written: Number of bytes per second sent from this bucket
cas_badval: Number of CAS operations per second using an incorrect CAS ID for data that this bucket contains
cas_hits: Number of CAS operations per second for data that this bucket contains
cas_misses: Number of CAS operations per second for data that this bucket does not contain
cmd_get: Number of get operations serviced by this bucket
cmd_lookup: Number of lookup sub-document operations serviced by this bucket
cmd_set: Number of set operations serviced by this bucket
couch_docs_actual_disk_size: The size of all data files for this bucket, including the data itself, metadata and temporary files
couch_docs_data_size: The size of active data in this bucket
couch_docs_disk_size: The size of active data in this bucket on disk
couch_docs_fragmentation: How much fragmented data there is to be compacted compared to real data for the data files in this bucket
couch_spatial_data_size: The size of all active items in all the spatial indexes for this bucket on disk
couch_spatial_disk_size: The size of all active items in all the spatial indexes for this bucket on disk
couch_spatial_ops: All the spatial index reads
couch_total_disk_size: The total size on disk of all data and view files for this bucket
couch_views_actual_disk_size: The size of all active items in all the indexes for this bucket on disk
couch_views_data_size: The size of active data for all the view indexes in this bucket
couch_views_disk_size: The size of active data for all the view indexes in this bucket on disk
couch_views_fragmentation: How much fragmented data there is to be compacted compared to real data for the view index files in this bucket
couch_views_ops: All the view reads for all design documents, including scatter gather
curr_connections: Number of connections to this server, including connections from external client SDKs, proxies, DCP requests and internal statistic gathering
curr_items: Number of unique items in this bucket (only active items, not replica)
curr_items_tot: Total number of items in this bucket (including replicas)
decr_hits: Number of decrement operations per second for data that this bucket contains
decr_misses: Number of decrement operations per second for data that this bucket does not contain
delete_hits: Number of delete operations per second for this bucket
delete_misses: Number of delete operations per second for data that this bucket does not contain
disk_commit_count: The number of disk commits
disk_commit_total: The total time spent committing to disk
disk_update_count: The total number of disk updates
disk_update_total: The total time spent updating disk
disk_write_queue: Number of items waiting to be written to disk in this bucket
ep_active_ahead_exceptions: Total number of ahead exceptions for all active vBuckets
ep_active_hlc_drift: The sum of total_abs_drift for the node's active vBuckets
ep_active_hlc_drift_count: The sum of total_abs_drift_count for the node's active vBuckets
ep_bg_fetched: Number of reads per second from disk for this bucket
ep_cache_miss_rate: Percentage of reads per second to this bucket from disk as opposed to RAM
ep_clock_cas_drift_threshold_exceeded
ep_data_read_failed: Number of disk read failures
ep_data_write_failed: Number of disk write failures
ep_dcp_2i_backoff: Number of backoffs for index DCP connections
ep_dcp_2i_count: Number of internal secondary index DCP connections in this bucket
ep_dcp_2i_items_remaining: Number of secondary index items remaining to be sent to consumer in this bucket
ep_dcp_2i_items_sent: Number of secondary index items per second being sent for a producer for this bucket
ep_dcp_2i_producer_count: Number of secondary index senders for this bucket
ep_dcp_2i_total_backlog_size: Total size in bytes of the DCP backlog for secondary indexes
ep_dcp_2i_total_bytes: Number of bytes per second being sent for secondary indexes DCP connections
ep_dcp_cbas_backoff: Number of backoffs for Analytics DCP connections
ep_dcp_cbas_count: Number of internal Analytics DCP connections in this bucket
ep_dcp_cbas_items_remaining: Number of Analytics items remaining to be sent to consumer in this bucket
ep_dcp_cbas_items_sent: Number of Analytics items per second being sent for a producer for this bucket
ep_dcp_cbas_producer_count: Number of Analytics senders for this bucket
ep_dcp_cbas_total_backlog_size: Total size in bytes of the DCP backlog for Analytics
ep_dcp_cbas_total_bytes: Number of bytes per second being sent for Analytics DCP connections
ep_dcp_eventing_backoff: Number of backoffs for Eventing DCP connections
ep_dcp_eventing_count: Number of internal Eventing DCP connections in this bucket
ep_dcp_eventing_items_remaining: Number of Eventing items remaining to be sent to consumer in this bucket
ep_dcp_eventing_items_sent: Number of Eventing items per second being sent for a producer for this bucket
ep_dcp_eventing_producer_count: Number of Eventing senders for this bucket
ep_dcp_eventing_total_backlog_size: Total size in bytes of the DCP backlog for Eventing
ep_dcp_eventing_total_bytes: Number of bytes per second being sent for Eventing DCP connections
ep_dcp_fts_backoff: Number of backoffs for FTS DCP connections
ep_dcp_fts_count: Number of internal FTS DCP connections in this bucket
ep_dcp_fts_items_remaining: Number of FTS items remaining to be sent to consumer in this bucket
ep_dcp_fts_items_sent: Number of FTS items per second being sent for a producer for this bucket
ep_dcp_fts_producer_count: Number of FTS senders for this bucket
ep_dcp_fts_total_backlog_size: Total size in bytes of the DCP backlog for FTS
ep_dcp_fts_total_bytes: Number of bytes per second being sent for FTS DCP connections
ep_dcp_other_backoff: Number of backoffs for other DCP connections
ep_dcp_other_count: Number of other DCP connections in this bucket
ep_dcp_other_items_remaining: Number of items remaining to be sent to consumer in this bucket
ep_dcp_other_items_sent: Number of items per second being sent for a producer for this bucket
ep_dcp_other_producer_count: Number of other senders for this bucket
ep_dcp_other_total_backlog_size: Total size in bytes of the DCP backlog for other DCP connections
ep_dcp_other_total_bytes: Number of bytes per second being sent for other DCP connections for this bucket
ep_dcp_replica_backoff: Number of backoffs for replication DCP connections
ep_dcp_replica_count: Number of internal replication DCP connections in this bucket
ep_dcp_replica_items_remaining: Number of replication items remaining to be sent to consumer in this bucket
ep_dcp_replica_items_sent: Number of replication items per second being sent for a producer for this bucket
ep_dcp_replica_producer_count: Number of replication senders for this bucket
ep_dcp_replica_total_backlog_size: Total size in bytes of the DCP backlog for replication
ep_dcp_replica_total_bytes: Number of bytes per second being sent for replication DCP connections
ep_dcp_views+indexes_backoff: Number of backoffs for view/index DCP connections
ep_dcp_views+indexes_count: Number of internal view/index DCP connections in this bucket
ep_dcp_views+indexes_items_remaining: Number of view/index items remaining to be sent to consumer in this bucket
ep_dcp_views+indexes_items_sent: Number of view/index items per second being sent for a producer for this bucket
ep_dcp_views+indexes_producer_count: Number of view/index senders for this bucket
ep_dcp_views+indexes_total_backlog_size: Total size in bytes of the DCP backlog for views/indexes
ep_dcp_views+indexes_total_bytes: Number of bytes per second being sent for views/indexes DCP connections
ep_dcp_views_backoff: Number of backoffs for view DCP connections
ep_dcp_views_count: Number of internal view DCP connections in this bucket
ep_dcp_views_items_remaining: Number of view items remaining to be sent to consumer in this bucket
ep_dcp_views_items_sent: Number of view items per second being sent for a producer for this bucket
ep_dcp_views_producer_count: Number of view senders for this bucket
ep_dcp_views_total_backlog_size: Total size in bytes of the DCP backlog for views
ep_dcp_views_total_bytes: Number of bytes per second being sent for view DCP connections
ep_dcp_xdcr_backoff: Number of backoffs for XDCR DCP connections
ep_dcp_xdcr_count: Number of internal XDCR DCP connections in this bucket
ep_dcp_xdcr_items_remaining: Number of XDCR items remaining to be sent to consumer in this bucket
ep_dcp_xdcr_items_sent: Number of XDCR items per second being sent for a producer for this bucket
ep_dcp_xdcr_producer_count: Number of XDCR senders for this bucket
ep_dcp_xdcr_total_backlog_size: Total size in bytes of the DCP backlog for XDCR
ep_dcp_xdcr_total_bytes: Number of bytes per second being sent for XDCR DCP connections
ep_diskqueue_drain: Total number of items per second being written to disk in this bucket
ep_diskqueue_fill: Total number of items per second being put on the disk queue in this bucket
ep_diskqueue_items: Total number of items waiting to be written to disk in this bucket
ep_flusher_todo: Number of items currently being written
ep_item_commit_failed: Number of times a transaction failed to commit due to storage errors
ep_kv_size: Total amount of user data cached in RAM in this bucket
ep_max_size: The maximum amount of memory this bucket can use
ep_mem_high_wat: High water mark for auto-evictions
ep_mem_low_wat: Low water mark for auto-evictions
ep_meta_data_memory: Total amount of item metadata consuming RAM in this bucket
ep_num_non_resident: The number of non-resident items
ep_num_ops_del_meta: Number of delete operations per second for this bucket as the target for XDCR
ep_num_ops_del_ret_meta: Number of delRetMeta operations
ep_num_ops_get_meta: Number of metadata read operations per second for this bucket as the target for XDCR
ep_num_ops_set_meta: Number of set operations per second for this bucket as the target for XDCR
ep_num_ops_set_ret_meta
ep_num_value_ejects: Total number of items per second being ejected to disk in this bucket
ep_oom_errors: Number of times unrecoverable OOMs happened while processing operations
ep_ops_create: Total number of new items being inserted into this bucket
ep_ops_update: Number of items updated on disk per second for this bucket
ep_overhead: Extra memory used by transient data like persistence queues, replication queues, checkpoints, etc.
ep_queue_size: Number of items queued for storage
ep_replica_ahead_exceptions: Total number of ahead exceptions for all replica vBuckets
ep_replica_hlc_drift: The sum of total_abs_drift for the node's replica vBuckets
ep_replica_hlc_drift_count: The sum of total_abs_drift_count for the node's replica vBuckets
ep_resident_items_rate: Percentage of all items cached in RAM in this bucket
ep_tmp_oom_errors: Number of back-offs sent per second to client SDKs due to "out of memory" situations from this bucket
ep_vb_total: Total number of vBuckets for this bucket
evictions: Number of items per second evicted from this bucket
get_hits: Number of get operations per second for data that this bucket contains
get_misses: Number of get operations per second for data that this bucket does not contain
hibernated_requests: Number of hibernated requests
hibernated_waked: Number of times hibernated waked
hit_ratio: Percentage of get requests served with data from this bucket
incr_hits: Number of increment operations per second for data that this bucket contains
incr_misses: Number of increment operations per second for data that this bucket does not contain
mem_used: Amount of memory used
misses: Total number of operations per second for data that this bucket does not contain
ops: Total number of operations per second (including XDCR) to this bucket
rest_requests
swap_total
swap_used
vb_active_eject: Number of items per second being ejected to disk from "active" vBuckets in this bucket
vb_active_itm_memory: Amount of active user data cached in RAM in this bucket
vb_active_meta_data_memory: Amount of active item metadata consuming RAM in this bucket
vb_active_num: Number of vBuckets in the "active" state for this bucket
vb_active_num_non_resident: Number of non-resident items
vb_active_ops_create: New items per second being inserted into "active" vBuckets in this bucket
vb_active_ops_update: Number of items updated on disk per second for this bucket
vb_active_queue_age: Sum of disk queue item age in milliseconds for "active" vBuckets
vb_active_queue_drain: Number of active items per second being written to disk in this bucket
vb_active_queue_fill: Number of active items per second being put on the active item disk queue in this bucket
vb_active_queue_size: Number of active items waiting to be written to disk in this bucket
vb_active_resident_items_ratio: Percentage of active items cached in RAM in this bucket
vb_active_sync_write_aborted_count: Number of vBucket writes aborted
vb_active_sync_write_accepted_count: Number of vBucket writes accepted
vb_active_sync_write_committed_count: Number of vBucket writes committed
vb_avg_active_queue_age: Average age in seconds of active items in the active item queue for this bucket
vb_avg_pending_queue_age: Average age in seconds of pending items in the pending item queue for this bucket (should be transient during rebalancing)
vb_avg_replica_queue_age: Average age in seconds of replica items in the replica item queue for this bucket
vb_avg_total_queue_age: Average age in seconds of all items in the disk write queue for this bucket
vb_pending_curr_items: Number of items in "pending" vBuckets in this bucket (should be transient during rebalancing)
vb_pending_eject: Number of items per second being ejected to disk from "pending" vBuckets in this bucket (should be transient during rebalancing)
vb_pending_itm_memory: Amount of pending user data cached in RAM in this bucket (should be transient during rebalancing)
vb_pending_meta_data_memory: Amount of pending item metadata consuming RAM in this bucket (should be transient during rebalancing)
vb_pending_num: Number of vBuckets in the "pending" state for this bucket (should be transient during rebalancing)
vb_pending_num_non_resident: Number of non-resident items
vb_pending_ops_create: New items per second being inserted into "pending" vBuckets in this bucket (should be transient during rebalancing)
vb_pending_ops_update: Number of items updated on disk per second for this bucket
vb_pending_queue_age: Sum of disk queue item age in milliseconds
vb_pending_queue_drain: Number of pending items per second being written to disk in this bucket (should be transient during rebalancing)
vb_pending_queue_fill: Number of pending items per second being put on the pending item disk queue in this bucket (should be transient during rebalancing)
vb_pending_queue_size: Number of pending items waiting to be written to disk in this bucket (should be transient during rebalancing)
vb_pending_resident_items_ratio: Percentage of items in pending state vBuckets cached in RAM in this bucket
vb_replica_curr_items: Number of items in "replica" vBuckets in this bucket
vb_replica_eject: Number of items per second being ejected to disk from "replica" vBuckets in this bucket
vb_replica_itm_memory: Amount of replica user data cached in RAM in this bucket
vb_replica_meta_data_memory: Amount of replica item metadata consuming RAM in this bucket
vb_replica_num: Number of vBuckets in the "replica" state for this bucket
vb_replica_num_non_resident: Number of non-resident items
vb_replica_ops_create: New items per second being inserted into "replica" vBuckets in this bucket
vb_replica_ops_update: Number of items updated on disk per second for this bucket
vb_replica_queue_age: Sum of disk queue item age in milliseconds for "replica" vBuckets
vb_replica_queue_drain: Number of replica items per second being written to disk in this bucket
vb_replica_queue_fill: Number of replica items per second being put on the replica item disk queue in this bucket
vb_replica_queue_size: Number of replica items waiting to be written to disk in this bucket
vb_replica_resident_items_ratio: Percentage of replica items cached in RAM in this bucket
vb_total_queue_age: Sum of disk queue item age in milliseconds
xdc_ops: Incoming XDCR operations per second for this bucket
GET Cluster-Wide Individual Bucket Stats
These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster; the best practice is to monitor each node individually.
Insecure: http://localhost:8091/pools/default/buckets/BUCKET/stats
Secure: https://localhost:18091/pools/default/buckets/BUCKET/stats
Example: With an average for all samples
BUCKET="travel-sample"
# --get sends zoom=minute as a query string parameter
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/$BUCKET/stats | \
jq -r '.op.samples | to_entries[] | select(.key != "timestamp") |
  .key + ": " + (.value | add / length | tostring)'
GET Node-Level Individual Bucket Stats
Each node in the cluster running the data service should be monitored individually using the endpoint listed below.
Insecure: http://localhost:8091/pools/default/buckets/BUCKET/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/BUCKET/nodes/NODE/stats
Example: Stats for Individual Node
The following example demonstrates how to retrieve the bucket stats for a specific node.
BUCKET="travel-sample"
NODE="172.17.0.2:8091"
# --get sends zoom=minute as a query string parameter
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/$BUCKET/nodes/$NODE/stats | \
jq -r -c '.op.samples |
  "cmd_get: " + (.cmd_get | add / length | tostring) +
  "\ncmd_set: " + (.cmd_set | add / length | tostring) +
  "\ncurr_connections: " + (.curr_connections | add / length | tostring) +
  "\ncurr_items: " + (.curr_items | add / length | tostring) +
  "\ncurr_items_tot: " + (.curr_items_tot | add / length | tostring) +
  "\ndecr_hits: " + (.decr_hits | add / length | tostring) +
  "\ndecr_misses: " + (.decr_misses | add / length | tostring) +
  "\ndelete_hits: " + (.delete_hits | add / length | tostring) +
  "\ndelete_misses: " + (.delete_misses | add / length | tostring) +
  "\nep_bg_fetched: " + (.ep_bg_fetched | add / length | tostring) +
  "\nevictions: " + (.evictions | add / length | tostring) +
  "\nget_hits: " + (.get_hits | add / length | tostring) +
  "\nget_misses: " + (.get_misses | add / length | tostring) +
  "\nhit_ratio: " + (.hit_ratio | add / length | tostring) +
  "\nincr_hits: " + (.incr_hits | add / length | tostring) +
  "\nincr_misses: " + (.incr_misses | add / length | tostring) +
  "\nmisses: " + (.misses | add / length | tostring) +
  "\nops: " + (.ops | add / length | tostring) +
  "\nxdc_ops: " + (.xdc_ops | add / length | tostring)
'
Key Metrics to Monitor
Each entry lists the Couchbase metric, its description, and the recommended response.
mem_used, ep_kv_size, ep_mem_high_wat
Description: These four metrics together give insight into how memory is used by the data service. mem_used is the actual memory utilization, whereas ep_kv_size is the sum of the metadata and values expected to be in RAM; mem_used/ep_kv_size therefore represents fragmentation within the KV engine. mem_used/memoryTotal should be less than 90%. ep_kv_size/ep_mem_high_wat represents your quota utilization, and ep_mem_high_wat is the maximum RAM the bucket is expected to use.
Response: The amount of fragmentation (mem_used/ep_kv_size) you should expect will depend on the workload, but in general, alert if this value exceeds 115%. If mem_used/memoryTotal is consistently near 90%, that is a trigger to add memory or nodes to the cluster. If this value approaches 100%, you could face an out-of-memory error, and the Couchbase process could be killed or crash. Once ep_kv_size reaches ep_mem_high_wat, Couchbase will start ejecting data to disk. This may be expected depending on your use case, but caching use cases will always want ep_kv_size to be lower than ep_mem_high_wat.
ep_meta_data_memory
Description: The amount of memory used specifically for document metadata. In Value Ejection mode, it's possible for document metadata to displace document values in cache, reducing cache hit rates and increasing latencies.
Response: Create a baseline for ep_meta_data_memory/ep_mem_high_wat. If this value exceeds 30% and vb_active_resident_items_ratio is not 100%, consider configuring Full Ejection on the bucket.
ep_queue_size
Description: The amount of data waiting to be written to disk. A large value typically indicates the server is disk IO bound. If this value exceeds 1,000,000 items, the server will start sending tmp_oom (backoff) messages to the application.
Response: Create a baseline for this value, as "normal" will depend on your workload and available disk IO. Alert at 2x the baseline. You may need to add nodes or increase the per-node disk IO.
ep_flusher_todo
Description: The number of items currently being written to disk. Combined with ep_queue_size, this represents the total disk write queue on the server.
Response: Create a baseline for this value, as "normal" will depend on your workload and available disk IO. Alert at 2x the baseline.
vb_avg_total_queue_age
Description: The average time in seconds that a write waits in the queue before being persisted to disk. This represents the local node's exposure to potential data loss.
Response: Create a baseline for this value, as "normal" will depend on your workload and available disk IO. Alert at 2x the baseline.
ep_dcp_replica_items_remaining
Description: The number of items in the inter-node replication queue. This represents the cluster's exposure to potential data loss.
Response: Create a baseline for this value, as "normal" will depend on your workload and available network IO. Alert at 2x the baseline.
ops
Description: The total number of KV operations occurring against the node.
Response: Create a baseline for this value, as "normal" will depend on your workload. Alert at 2x the baseline. Abnormally high operations could mean an unexpected change to the application or unusual application traffic patterns.
cmd_get
Description: The number of KV GET operations occurring against the node.
Response: Create a baseline for this value, as "normal" will depend on your workload. Alert at 3x the baseline. Abnormally high operations could mean an unexpected change to the application or unusual application traffic patterns.
cmd_set
Description: The number of KV SET operations occurring against the node.
Response: Create a baseline for this value, as "normal" will depend on your workload. Alert at 2x the baseline. Abnormally high operations could mean an unexpected change to the application or unusual application traffic patterns.
delete_hits
Description: The number of KV DELETE operations occurring against the node.
Response: Create a baseline for this value, as "normal" will depend on your workload. Alert at 2x the baseline. Abnormally high operations could mean an unexpected change to the application or unusual application traffic patterns.
ep_bg_fetched
Description: The number of items fetched from disk (cache misses).
Response: This value should be close to 0. Establish a baseline for this metric and alert at 2x the baseline.
curr_connections
Description: The number of client (SDK) connections to Couchbase. More connections will result in increased CPU utilization.
Response: Create a baseline for your environment. Alert at 2x the baseline. Couchbase will begin rejecting connections above 30,000.
curr_items
Description: The number of items currently active on this node. During warmup, this will be 0 until warmup is complete.
Response: Once a baseline number of objects has been established, substantial changes from the baseline could indicate unexpected failures within Couchbase or an application bug.
vb_active_resident_items_ratio
Description: The percentage of active data that is memory resident.
Response: For caching use cases, this value should be close to 100%. If this value falls below 100% and ep_bg_fetched is greater than 0, the bucket needs more RAM. The value should never be less than 15%.
vb_replica_resident_items_ratio
Description: The percentage of replica data that is memory resident. A higher percentage for this value ensures lower latency data access following a failover.
Response: Set the threshold based on business requirements for object latency during a failure scenario. The value should never be less than 15%.
ep_tmp_oom_errors
Description: Number of times temporary OOMs were sent to a client. Represents high transient memory pressure within the system.
Response: This error indicates temporary memory pressure after the server has reached ep_mem_high_wat and is ejecting values that were not recently accessed. Frequent errors indicate the need to scale the cluster.
ep_oom_errors
Description: Number of times permanent OOMs were sent to a client. Represents very high, consistent memory pressure within the system.
Response: This error indicates the bucket has exceeded its total memory allocation and immediately requires that additional memory or nodes be added.
ep_dcp_views_items_remaining, ep_dcp_2i_items_remaining
Description: The number of documents awaiting indexing for views and GSI.
Response: Create a baseline for this value, as "normal" will depend on your workload and available disk IO. Alert at 2x the baseline.
ep_dcp_replica_backoff
Description: Indicates the number of times an internal replication was instructed to slow down.
Response: Alert if this value is greater than zero. This indicates a resource constraint within the cluster that should be investigated.
ep_dcp_xdcr_backoff
Description: Indicates the number of times an XDCR replication was instructed to slow down.
Response: Should be monitored as a rate. Create a baseline for your environment, as "normal" will depend on workload patterns and XDCR bandwidth limits. Alert at 2x the baseline.
couch_docs_fragmentation
Description: The percentage of data file fragmentation.
Response: By default, compaction should start when this value hits 30%. If this value consistently exceeds 30%, this typically indicates disk IO contention or a problem with compaction starting that should be investigated.
couch_views_fragmentation
Description: The percentage of View index fragmentation.
Response: By default, compaction should start when this value hits 30%. If this value significantly exceeds 30%, this typically indicates disk IO contention or a problem with compaction starting that should be investigated.
vb_replica_num
Description: The number of replica vBuckets.
Response: If this value falls below roughly (1024 * the number of configured replicas) / the number of data service nodes, a rebalance is required.
vb_active_num
Description: The number of active vBuckets.
Response: This value should be approximately 1024 / the number of data service nodes. If it is not, this indicates a node failure and that a failover + rebalance is required.
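A minimal sketch of evaluating the memory and vBucket guidance in the table above, assuming the monitoring system has already collected mem_used, ep_kv_size, and the node's memoryTotal as plain numbers. The thresholds (115% fragmentation, 90% memory utilization) come from the table; the function names are hypothetical. `expected_active_vbuckets` uses integer division, so when 1024 does not divide evenly by the node count, individual nodes can differ slightly from this figure.

```shell
#!/usr/bin/env bash
# Hypothetical checks for the memory and vBucket guidance (illustrative only).

# pct NUMERATOR DENOMINATOR -> percentage with one decimal place;
# fails (exit 1) when the denominator is zero.
pct() {
  awk -v n="$1" -v d="$2" 'BEGIN { if (d == 0) exit 1; printf "%.1f\n", 100 * n / d }'
}

# check_memory MEM_USED EP_KV_SIZE MEMORY_TOTAL
# Prints one WARN line per threshold that is crossed.
check_memory() {
  local frag util
  frag=$(pct "$1" "$2") || return 1   # mem_used / ep_kv_size
  util=$(pct "$1" "$3") || return 1   # mem_used / memoryTotal
  if awk -v f="$frag" 'BEGIN { exit !(f > 115) }'; then
    echo "WARN fragmentation ${frag}% > 115%"
  fi
  if awk -v u="$util" 'BEGIN { exit !(u >= 90) }'; then
    echo "WARN memory utilization ${util}% >= 90%"
  fi
}

# expected_active_vbuckets DATA_NODES -> approximate vb_active_num per node
expected_active_vbuckets() {
  echo $(( 1024 / $1 ))
}
```

For example, `check_memory 1200 1000 10000` reports 120.0% fragmentation but no utilization warning, since 1200 is only 12% of the node's memory.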
Example
The following example illustrates getting the verbose stats for an individual bucket.
BUCKET='travel-sample'
# output the stats for the bucket (--get sends zoom=minute as a query string)
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/$BUCKET/stats | \
jq -r -c '.op.samples | to_entries | sort_by(.key) | .[] |
  "" + (.key) + ": " + (.value | add / length | tostring)'
Example
The following example illustrates getting an individual stat for a single bucket.
BUCKET='travel-sample'
STAT='cmd_get'
# output the average of the stat for each node (--get sends zoom as a query string)
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/$BUCKET/stats/$STAT | \
jq -r -c '.nodeStats | to_entries | sort_by(.key) | .[] |
  "" + (.key) + ": " + (.value | add / length | tostring)'
Example
This example shows how to retrieve all stats for all buckets.
#loopovereachofthebuckets
forbucketin$(curl\
--userAdministrator:password\
--silent\
--requestGET\
Monitoring:DataService
20CouchbaseProfessionalServices
jq-r'.[]|.name')
do
echo""
echo"Bucket:$bucket"
echo"================================================================"
#outputthestatsforthebucket
curl\
--userAdministrator:password\
--silent\
--requestGET\
--datazoom=minute\
http://localhost:8091/pools/default/buckets/$bucket/stats|\
jq-r-c'.op.samples|to_entries|sort_by(.key)|.[]|
""+(.key)+":"+(.value|add/length|tostring)'
done
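Each of the jq filters in these examples collapses a stat's array of samples (60 one-second samples at zoom=minute) into a single mean with `add/length`. A one-minute mean can hide a brief spike, so it may be worth recording the maximum alongside it. The following is a minimal bash/awk sketch, assuming the sample values have already been extracted one per argument (for example with `jq -r '.op.samples.cmd_get[]'`); `summarize_samples` is a hypothetical helper name, not a Couchbase utility.

```shell
#!/usr/bin/env bash
# Sketch: summarize one stat's samples as mean and maximum.
# summarize_samples is a hypothetical helper; pass the sample values as arguments.
summarize_samples() {
  (( $# > 0 )) || { echo "no samples"; return 1; }
  printf '%s\n' "$@" | awk '
    NR == 1 || $1 > max { max = $1 }   # track the largest sample
    { sum += $1 }                       # running total for the mean
    END { printf "avg=%g max=%g\n", sum / NR, max }'
}

summarize_samples 10 12 11 95 10 12   # prints: avg=25 max=95
```

Here the mean (25) sits well below the single spike (95) that an alert on the maximum would catch.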
Monitoring: Eventing Service
Eventing Service-Level Stats
The Eventing stats are an aggregate for all of the Eventing Functions deployed, either for the entire cluster or a specific node.
Available Stats
Stat name | Description
eventing/bucket_op_exception_count | Total number of bucket operations inside of an Eventing function that have resulted in an exception
eventing/checkpoint_failure_count | Total number of failures when checkpointing the last processed sequence numbers by the V8 worker. Failures are retried using exponential backoff until timeout.
eventing/dcp_backlog | Remaining mutations to process
eventing/failed_count | Total number of failed Eventing function operations
eventing/n1ql_op_exception_count | Total number of N1QL operations inside of an Eventing function that have resulted in an exception
eventing/on_delete_failure | Total number of OnDelete handler executions that have failed for all functions
eventing/on_delete_success | Total number of OnDelete handler executions that have succeeded for all functions
eventing/on_update_failure | Total number of OnUpdate handler executions that have failed for all functions
eventing/on_update_success | Total number of OnUpdate handler executions that have succeeded for all functions
eventing/processed_count | Total number of mutations that have been processed
eventing/timeout_count | Total number of handler executions that were terminated because the handler ran longer than the configured script timeout
GET Cluster Eventing Service Stats
These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster; the best practice is to monitor each node individually.
Insecure: http://localhost:8091/pools/default/buckets/@eventing/stats
Secure: https://localhost:18091/pools/default/buckets/@eventing/stats
Example
The following example demonstrates how to retrieve the eventing service stats for the cluster.
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@eventing/stats | \
  jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
    select(.key | split("/") | length == 2) |
    "" + (.key) + ":" +
    (.value | add / length | tostring)'
GET Node-Level Eventing Service Stats
Each node in the cluster running the eventing service should be monitored individually using the endpoint listed below.
Insecure: http://localhost:8091/pools/default/buckets/@eventing/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/@eventing/nodes/NODE/stats
Example: Stats for Individual Node
The following example demonstrates how to retrieve the eventing service stats for a specific node in the cluster.
NODE="172.17.0.2:8091"
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@eventing/nodes/$NODE/stats | \
  jq -r -c '.op.samples |
    "eventing/bucket_op_exception_count:" +
    (.["eventing/bucket_op_exception_count"] | add / length | tostring) +
    "\neventing/checkpoint_failure_count:" +
    (.["eventing/checkpoint_failure_count"] | add / length | tostring) +
    "\neventing/dcp_backlog:" +
    (.["eventing/dcp_backlog"] | add / length | tostring) +
    "\neventing/failed_count:" +
    (.["eventing/failed_count"] | add / length | tostring) +
    "\neventing/n1ql_op_exception_count:" +
    (.["eventing/n1ql_op_exception_count"] | add / length | tostring) +
    "\neventing/on_delete_failure:" +
    (.["eventing/on_delete_failure"] | add / length | tostring) +
    "\neventing/on_delete_success:" +
    (.["eventing/on_delete_success"] | add / length | tostring) +
    "\neventing/on_update_failure:" +
    (.["eventing/on_update_failure"] | add / length | tostring) +
    "\neventing/on_update_success:" +
    (.["eventing/on_update_success"] | add / length | tostring) +
    "\neventing/processed_count:" +
    (.["eventing/processed_count"] | add / length | tostring) +
    "\neventing/timeout_count:" +
    (.["eventing/timeout_count"] | add / length | tostring)'
Example: Stats for Each Node Separately
# loop over each node running the eventing service
for node in $(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] |
    select(.services | contains(["eventing"]) == true) |
    .hostname'
)
do
  echo "$node Function Stats"
  echo "-------------------------------------------------------"
  # get the eventing stats for the specific node
  curl \
    --user Administrator:password \
    --silent \
    --get \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/@eventing/nodes/$node/stats | \
    jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
      select(.key | split("/") | length == 2) |
      "" + (.key | split("/")[1]) + ":" +
      (.value | add / length | tostring)'
done
Key Metrics to Monitor
Couchbase Metric | Description | Response
eventing/bucket_op_exception_count, eventing/failed_count, eventing/n1ql_op_exception_count, eventing/on_delete_failure, eventing/on_update_failure, eventing/timeout_count | Any exceptions/failures should be monitored | For these values "normal" is 0; any value other than 0 indicates that exceptions are being thrown and should be investigated.
eventing/dcp_backlog | The number of items to be processed. | Create a baseline for this value, as "normal" will depend upon your workload and number of functions. Alert at 2x the baseline.
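The two alert rules in this table, "normal is 0" and "alert at 2x the baseline", can be expressed as small predicates that a monitoring script calls on each polled value. A minimal bash/awk sketch, assuming the value has already been fetched (for example from the node-level endpoint above); `exceeds_baseline`, `is_nonzero`, and the baseline number are hypothetical and for illustration only, since the real baseline has to come from your own trended data.

```shell
#!/usr/bin/env bash
# Sketch: the two alert rules from the Key Metrics table.
# Function names and the baseline number are hypothetical; take the real
# baseline from your own trended data.

# exceeds_baseline VALUE BASELINE [MULTIPLIER]
# Succeeds (exit 0) when VALUE >= MULTIPLIER * BASELINE; MULTIPLIER defaults to 2.
exceeds_baseline() {
  awk -v v="$1" -v b="$2" -v m="${3:-2}" 'BEGIN { exit !(v >= m * b) }'
}

# is_nonzero VALUE: succeeds when VALUE is anything other than 0.
is_nonzero() {
  awk -v v="$1" 'BEGIN { exit !(v != 0) }'
}

BASELINE_DCP_BACKLOG=5000   # hypothetical baseline for eventing/dcp_backlog

if exceeds_baseline 12000 "$BASELINE_DCP_BACKLOG"; then
  echo "ALERT eventing/dcp_backlog is at or above 2x its baseline"
fi
if is_nonzero 3; then
  echo "ALERT eventing/failed_count is non-zero"
fi
```

awk is used for the comparisons because the averaged values produced by `add/length` are often fractional, which bash integer arithmetic cannot handle.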
Eventing Function-Level Stats
The Eventing stats for a specific function are available only once the function has been deployed. The same stats that are available for the service as a whole are also available on a per-function basis, and can be retrieved for the entire cluster or a specific node in the cluster.
Available Stats
Stat name | Description
eventing/function_name/bucket_op_exception_count | Total number of operations inside of an Eventing function that have resulted in an exception for the function
eventing/function_name/checkpoint_failure_count | Total number of checkpoint failures for the function
eventing/function_name/dcp_backlog | Remaining mutations to process
eventing/function_name/failed_count | Total number of failed Eventing function operations for the function
eventing/function_name/n1ql_op_exception_count | Total number of N1QL operations inside of an Eventing function that have resulted in an exception for the function
eventing/function_name/on_delete_failure | Total number of OnDelete handler executions that have failed for the function
eventing/function_name/on_delete_success | Total number of OnDelete handler executions that have succeeded for the function
eventing/function_name/on_update_failure | Total number of OnUpdate handler executions that have failed for the function
eventing/function_name/on_update_success | Total number of OnUpdate handler executions that have succeeded for the function
eventing/function_name/processed_count | Total number of mutations that have been processed for the function
eventing/function_name/timeout_count | Total number of handler executions that have resulted in a timeout for the function
GET Cluster Eventing Function Stats
These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster; the best practice is to monitor each node individually.
Insecure: http://localhost:8091/pools/default/buckets/@eventing/stats
Secure: https://localhost:18091/pools/default/buckets/@eventing/stats
Example
The following example demonstrates how to retrieve the per-function eventing stats for the cluster.
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@eventing/stats | \
  jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
    select(.key | split("/") | length == 3) |
    "" + (.key) + ":" +
    (.value | add / length | tostring)'
GET Eventing Function Stats per Node
Each node in the cluster running the eventing service should be monitored individually. Because functions can be dynamic, it will be easier from a manageability standpoint to monitor the aggregate stats of the service; however, each individual function can be monitored if you so choose.
Insecure: http://localhost:8091/pools/default/buckets/@eventing/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/@eventing/nodes/NODE/stats
Example
The following example demonstrates how to retrieve the stats for each eventing function on a specific node.
NODE="172.17.0.2:8091"
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@eventing/nodes/$NODE/stats | \
  jq -r '.op.samples as $stats
    | $stats | [
        keys | .[] | select(. | split("/") | length == 3) | split("/")[1]
      ] | sort | unique as $funcs
    | $funcs | .[] |
    "Function: " + . +
    "\n----------------------------------------------------------------" +
    "\nbucket_op_exception_count:" +
      ($stats["eventing/" + . + "/bucket_op_exception_count"] | add | tostring) +
    "\ncheckpoint_failure_count:" +
      ($stats["eventing/" + . + "/checkpoint_failure_count"] | add | tostring) +
    "\ndcp_backlog:" +
      ($stats["eventing/" + . + "/dcp_backlog"] | add | tostring) +
    "\nfailed_count:" +
      ($stats["eventing/" + . + "/failed_count"] | add | tostring) +
    "\nn1ql_op_exception_count:" +
      ($stats["eventing/" + . + "/n1ql_op_exception_count"] | add | tostring) +
    "\non_delete_failure:" +
      ($stats["eventing/" + . + "/on_delete_failure"] | add / length | tostring) +
    "\non_delete_success:" +
      ($stats["eventing/" + . + "/on_delete_success"] | add / length | tostring) +
    "\non_update_failure:" +
      ($stats["eventing/" + . + "/on_update_failure"] | add / length | tostring) +
    "\non_update_success:" +
      ($stats["eventing/" + . + "/on_update_success"] | add / length | tostring) +
    "\nprocessed_count:" +
      ($stats["eventing/" + . + "/processed_count"] | add / length | tostring) +
    "\ntimeout_count:" +
      ($stats["eventing/" + . + "/timeout_count"] | add | tostring)
  '
Key Metrics to Monitor
Couchbase Metric | Description | Response
eventing/func_name/bucket_op_exception_count, eventing/func_name/failed_count, eventing/func_name/n1ql_op_exception_count, eventing/func_name/on_delete_failure, eventing/func_name/on_update_failure, eventing/func_name/timeout_count | Any exceptions/failures should be monitored | For these values "normal" is 0; any value other than 0 indicates that exceptions are being thrown and should be investigated.
eventing/func_name/dcp_backlog | The number of items to be processed. | Create a baseline for this value, as "normal" will depend upon your workload and number of functions. Alert at 2x the baseline.
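Because the examples in this section print one `name:value` pair per line, a monitoring script can scan that output for the zero-tolerance counters listed in the table. The following bash/awk sketch is one way to do it; `report_nonzero_failures` is a hypothetical name, and matching stat names by suffix is an assumption made so the same filter works for service-level names (`eventing/failed_count`), per-function names (`eventing/my_func/failed_count`), and the bare names printed by the per-function example (`failed_count`).

```shell
#!/usr/bin/env bash
# Sketch: read "name:value" lines, as printed by the jq examples in this
# guide, and report any zero-tolerance failure/exception counter that is not 0.
# report_nonzero_failures is a hypothetical helper name.
# Exit status: 1 when at least one counter is non-zero, otherwise 0.
report_nonzero_failures() {
  awk -F: '
    # match by suffix so prefixed and bare stat names are both recognised
    $1 ~ /(bucket_op_exception_count|failed_count|n1ql_op_exception_count|on_delete_failure|on_update_failure|timeout_count)$/ && $2 != 0 {
      print "ALERT " $1 "=" $2
      found = 1
    }
    END { exit (found ? 1 : 0) }'
}
```

For example, `printf 'failed_count:0\ntimeout_count:2\n' | report_nonzero_failures` prints `ALERT timeout_count=2` and exits with status 1. A stat that jq rendered as `null` (because it was missing from the response) is also reported, since it does not compare equal to 0.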
Monitoring: Full-Text Search Service
GET Full-Text Search Indexes
Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-fts-indexing.html#index-definition
http://localhost:8094/api/index
Retrieve all index definitions and configurations.
Example
The following example illustrates how to retrieve each FTS index name.
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8094/api/index | \
  jq -r '.indexDefs.indexDefs | keys | .[]'
FTS Service-Level Stats
Available Stats
Stat name | Description
fts_curr_batches_blocked_by_herder | The number of batches blocked by the herder
fts_num_bytes_used_ram | The number of bytes used in memory for the FTS service
fts_total_queries_rejected_by_herder | The number of queries rejected by the herder
GET Cluster FTS Service Stats
These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster; the best practice is to monitor each node individually.
Insecure: http://localhost:8091/pools/default/buckets/@fts/stats
Secure: https://localhost:18091/pools/default/buckets/@fts/stats
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@fts/stats | \
  jq -r '.op.samples |
    "fts_num_bytes_used_ram:" + (.fts_num_bytes_used_ram | add / length | tostring)'
GET Node-Level FTS Service Stats
Each node in the cluster running the FTS service should be monitored individually using the endpoint listed below.
Insecure: http://localhost:8091/pools/default/buckets/@fts/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/@fts/nodes/NODE/stats
Example: Stats for Individual Node
The following example demonstrates how to retrieve the FTS service stats for a specific node.
NODE="172.17.0.2:8091"
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@fts/nodes/$NODE/stats | \
  jq -r '.op.samples |
    "fts_num_bytes_used_ram:" + (.fts_num_bytes_used_ram | add / length | tostring)'
Example: Stats for Each Node Separately
# loop over each node running the FTS service
for node in $(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] |
    select(.services | contains(["fts"]) == true) |
    .hostname'
)
do
  echo "$node FTS Stats"
  echo "-------------------------------------------------------"
  # get the FTS stats for the specific node
  curl \
    --user Administrator:password \
    --silent \
    --get \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/@fts/nodes/$node/stats | \
    jq -r '.op.samples |
      "fts_num_bytes_used_ram:" + (.fts_num_bytes_used_ram | add / length | tostring)'
done
Individual FTS-Level Stats
The FTS stats for specific indexes are available only under the bucket that the index is created on. The same stats that are available for the service as a whole are also available on a per-index basis and can be retrieved for the entire cluster or a specific node in the cluster.
Available Stats
Stat name | Description
fts/indexName/avg_queries_latency | The average query latency in milliseconds
fts/indexName/doc_count | The number of documents in the index
fts/indexName/num_bytes_used_disk | Total disk file size used by the index
fts/indexName/num_files_on_disk | Number of files for the index on disk
fts/indexName/num_mutations_to_index | The number of documents pending indexing
fts/indexName/num_pindexes_actual | Number of index partitions (including replica partitions)
fts/indexName/num_pindexes_target | Number of index partitions expected (including replica partitions)
fts/indexName/num_recs_to_persist | Number of index records not yet persisted to disk
fts/indexName/num_root_filesegments | The number of root file segments
fts/indexName/num_root_memorysegments | The number of root memory segments
fts/indexName/total_bytes_indexed | Number of FTS bytes indexed per second
fts/indexName/total_bytes_query_results | Number of bytes returned in results per second
fts/indexName/total_compaction_written_bytes | Number of compaction bytes written per second
fts/indexName/total_queries | The number of queries per second
fts/indexName/total_queries_error | The number of query errors per second
fts/indexName/total_queries_slow | The number of slow queries per second (> 5s)
fts/indexName/total_queries_timeout | The number of queries per second that resulted in a timeout
fts/indexName/total_request_time | Total time spent servicing requests
fts/indexName/total_term_searchers | Number of term searchers started per second
GET Cluster Individual FTS Stats
These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster; the best practice is to monitor each node individually.
Insecure: http://localhost:8091/pools/default/buckets/@fts-BUCKET/stats
Secure: https://localhost:18091/pools/default/buckets/@fts-BUCKET/stats
Example
The following example demonstrates how to retrieve the per-index FTS stats for a bucket across the cluster.
BUCKET="travel-sample"
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@fts-$BUCKET/stats | \
  jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
    select(.key | split("/") | length == 3) |
    "" + (.key) + ":" +
    (.value | add / length | tostring)'
GET Individual FTS Stats per Node
Each node in the cluster running the FTS service should be monitored individually.
Insecure: http://localhost:8091/pools/default/buckets/@fts-BUCKET/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/@fts-BUCKET/nodes/NODE/stats
Example: Stats for Individual Node
The following example demonstrates how to retrieve all of the FTS stats for a specific index in a bucket for a specific node.
NODE="172.17.0.2:8091"
BUCKET="travel-sample"
INDEX="demo"
# get the FTS stats for the bucket
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@fts-$BUCKET/nodes/$NODE/stats | \
  jq -r --arg index "$INDEX" '.op.samples |
    "avg_queries_latency:" +
    (.["fts/" + $index + "/avg_queries_latency"] | add / length | tostring) +
    "\ndoc_count:" +
    (.["fts/" + $index + "/doc_count"] | add / length | tostring) +
    "\nnum_bytes_used_disk:" +
    (.["fts/" + $index + "/num_bytes_used_disk"] | add / length | tostring) +
    "\nnum_mutations_to_index:" +
    (.["fts/" + $index + "/num_mutations_to_index"] | add | tostring) +
    "\nnum_pindexes_actual:" +
    (.["fts/" + $index + "/num_pindexes_actual"] | add | tostring) +
    "\nnum_pindexes_target:" +
    (.["fts/" + $index + "/num_pindexes_target"] | add | tostring) +
    "\nnum_recs_to_persist:" +
    (.["fts/" + $index + "/num_recs_to_persist"] | add | tostring) +
    "\ntotal_bytes_indexed:" +
    (.["fts/" + $index + "/total_bytes_indexed"] | add / length | tostring) +
    "\ntotal_bytes_query_results:" +
    (.["fts/" + $index + "/total_bytes_query_results"] | add / length | tostring) +
    "\ntotal_compaction_written_bytes:" +
    (.["fts/" + $index + "/total_compaction_written_bytes"] | add / length | tostring) +
    "\ntotal_queries:" +
    (.["fts/" + $index + "/total_queries"] | add | tostring) +
    "\ntotal_queries_error:" +
    (.["fts/" + $index + "/total_queries_error"] | add | tostring) +
    "\ntotal_queries_slow:" +
    (.["fts/" + $index + "/total_queries_slow"] | add | tostring) +
    "\ntotal_queries_timeout:" +
    (.["fts/" + $index + "/total_queries_timeout"] | add | tostring) +
    "\ntotal_request_time:" +
    (.["fts/" + $index + "/total_request_time"] | add | tostring) +
    "\ntotal_term_searchers:" +
    (.["fts/" + $index + "/total_term_searchers"] | add | tostring)'
Example: Stats for Every Bucket on an Individual Node
The following example demonstrates how to retrieve all of the FTS stats for every bucket in the cluster, for a single node.
NODE="172.17.0.2:8091"
# loop over each of the buckets that have FTS indexes
for bucket in $(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8094/api/index | \
  jq -r '.indexDefs.indexDefs | [to_entries[] | .value.sourceName] | sort
    | unique | .[]')
do
  echo ""
  echo "Bucket: $bucket"
  echo "================================================================"
  # get the FTS stats for the bucket, then in jq:
  # 1. reduce the samples object by looping over each property, keeping only
  #    the index-specific (non-replica) stat properties and either summing
  #    or averaging their samples
  # 2. get all of the unique index names
  # 3. loop over each index and output its stats
  curl \
    --user Administrator:password \
    --silent \
    --get \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/@fts-$bucket/nodes/$NODE/stats | \
    jq -r '
      reduce (.op.samples | to_entries[]) as {$key, $value} (
        {};
        if ($key | split("/") | length == 3)
          and ($key | contains("replica") | not)
        then
          ($key | split("/")[2]) as $stat
          | if [
              "num_mutations_to_index", "num_pindexes_actual",
              "num_pindexes_target", "num_recs_to_persist", "total_queries",
              "total_queries_error", "total_queries_slow", "total_queries_timeout",
              "total_request_time", "total_term_searchers"
            ] | any(. == $stat)
            then .[$key] += ($value | add)
            else .[$key] += ($value | add / length * 100 | round / 100)
            end
        else
          .
        end
      ) | . as $stats |
      $stats | keys | map(split("/")[1]) | sort | unique as $indexes |
      $indexes | .[] |
      "Index: " + . +
      "\n----------------------------------------------------------------" +
      "\navg_queries_latency:" +
        ($stats["fts/" + . + "/avg_queries_latency"] | tostring) +
      "\ndoc_count:" +
        ($stats["fts/" + . + "/doc_count"] | tostring) +
      "\nnum_bytes_used_disk:" +
        ($stats["fts/" + . + "/num_bytes_used_disk"] | tostring) +
      "\nnum_mutations_to_index:" +
        ($stats["fts/" + . + "/num_mutations_to_index"] | tostring) +
      "\nnum_pindexes_actual:" +
        ($stats["fts/" + . + "/num_pindexes_actual"] | tostring) +
      "\nnum_pindexes_target:" +
        ($stats["fts/" + . + "/num_pindexes_target"] | tostring) +
      "\nnum_recs_to_persist:" +
        ($stats["fts/" + . + "/num_recs_to_persist"] | tostring) +
      "\ntotal_bytes_indexed:" +
        ($stats["fts/" + . + "/total_bytes_indexed"] | tostring) +
      "\ntotal_bytes_query_results:" +
        ($stats["fts/" + . + "/total_bytes_query_results"] | tostring) +
      "\ntotal_compaction_written_bytes:" +
        ($stats["fts/" + . + "/total_compaction_written_bytes"] | tostring) +
      "\ntotal_queries:" +
        ($stats["fts/" + . + "/total_queries"] | tostring) +
      "\ntotal_queries_error:" +
        ($stats["fts/" + . + "/total_queries_error"] | tostring) +
      "\ntotal_queries_slow:" +
        ($stats["fts/" + . + "/total_queries_slow"] | tostring) +
      "\ntotal_queries_timeout:" +
        ($stats["fts/" + . + "/total_queries_timeout"] | tostring) +
      "\ntotal_request_time:" +
        ($stats["fts/" + . + "/total_request_time"] | tostring) +
      "\ntotal_term_searchers:" +
        ($stats["fts/" + . + "/total_term_searchers"] | tostring) +
      "\n"
    '
done
Key Metrics to Monitor
Couchbase Metric | Description | Response
avg_queries_latency | The average query latency | Create a baseline for this value, as "normal" will depend on the size of the index. Alert at 2x the baseline; this would indicate a slowdown in scans of the index.
total_queries | The number of query requests to the index | Create a baseline for this value, as "normal" will depend on the request volume. Alert at 2x the baseline; this would indicate a dramatic increase in requests.
total_queries_error, total_queries_timeout | The number of query errors and timeouts for the index | Alert at any value greater than 0, as this indicates failed requests.
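Creating a baseline, as this table recommends for avg_queries_latency and total_queries, can be as simple as averaging previously trended values. A minimal bash/awk sketch under that assumption: the baseline here is a plain mean, whereas an external monitoring system may use percentiles or time-of-day baselines instead. `check_against_history` is a hypothetical helper, and the numbers in the usage line are illustrative, not measurements.

```shell
#!/usr/bin/env bash
# Sketch: derive a baseline as the mean of previously trended values and
# flag the current value once it reaches 2x that baseline.
# check_against_history is a hypothetical helper name.

# check_against_history CURRENT HISTORY_VALUE...
check_against_history() {
  local current=$1
  shift
  (( $# > 0 )) || { echo "NO BASELINE"; return 1; }
  printf '%s\n' "$@" | awk -v cur="$current" '
    { sum += $1 }
    END {
      baseline = sum / NR                       # plain mean as the baseline
      status = (cur >= 2 * baseline) ? "ALERT" : "OK"
      printf "%s current=%g baseline=%g\n", status, cur, baseline
    }'
}

# illustrative avg_queries_latency history (ms), not real measurements
check_against_history 6 2 3 2 3   # prints: ALERT current=6 baseline=2.5
```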
FTS Aggregate Stats
The FTS aggregate stats for a specific bucket are available only under the bucket that the indexes exist on, and are a total of all of the indexes for that bucket in the cluster or node.
Available Stats
Stat name | Description
fts/doc_count | The number of documents in all FTS indexes
fts/num_bytes_used_disk | Total disk file size used by the indexes
fts/num_files_on_disk | The number of index files on disk
fts/num_mutations_to_index | The number of documents pending indexing
fts/num_pindexes_actual | Number of index partitions (including replica partitions)
fts/num_pindexes_target | Number of index partitions expected (including replica partitions)
fts/num_recs_to_persist | Number of index records not yet persisted to disk
fts/num_root_filesegments | Number of root file segments
fts/num_root_memorysegments | Number of root memory segments
fts/total_bytes_indexed | Number of FTS bytes indexed per second
fts/total_bytes_query_results | Number of bytes returned in results per second
fts/total_compaction_written_bytes | Number of compaction bytes written per second
fts/total_queries | The number of queries per second
fts/total_queries_error | The number of query errors per second
fts/total_queries_slow | The number of slow queries per second (> 5s)
fts/total_queries_timeout | The number of queries per second that resulted in a timeout
fts/total_request_time | Total time spent servicing requests
fts/total_term_searchers | Number of term searchers started per second
GET Cluster FTS Aggregate Stats
Insecure: http://localhost:8091/pools/default/buckets/@fts-BUCKET/stats
Secure: https://localhost:18091/pools/default/buckets/@fts-BUCKET/stats
Example: Stats for Cluster
The following example demonstrates how to retrieve all of the FTS aggregate stats for a specific bucket in the entire cluster.
BUCKET="travel-sample"
# get the FTS stats for the bucket
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@fts-$BUCKET/stats | \
  jq -r '.op.samples |
    "doc_count:" + (.["fts/doc_count"] | add / length | tostring) +
    "\nnum_bytes_used_disk:" + (.["fts/num_bytes_used_disk"] | add / length | tostring) +
    "\nnum_mutations_to_index:" + (.["fts/num_mutations_to_index"] | add / length | tostring) +
    "\nnum_pindexes_actual:" + (.["fts/num_pindexes_actual"] | add / length | tostring) +
    "\nnum_pindexes_target:" + (.["fts/num_pindexes_target"] | add / length | tostring) +
    "\ntotal_bytes_indexed:" + (.["fts/total_bytes_indexed"] | add / length | tostring) +
    "\ntotal_bytes_query_results:" + (.["fts/total_bytes_query_results"] | add / length | tostring) +
    "\ntotal_compaction_written_bytes:" + (.["fts/total_compaction_written_bytes"] | add / length | tostring) +
    "\ntotal_queries:" + (.["fts/total_queries"] | add / length | tostring) +
    "\ntotal_queries_error:" + (.["fts/total_queries_error"] | add / length | tostring) +
    "\ntotal_queries_slow:" + (.["fts/total_queries_slow"] | add / length | tostring) +
    "\ntotal_queries_timeout:" + (.["fts/total_queries_timeout"] | add / length | tostring) +
    "\ntotal_request_time:" + (.["fts/total_request_time"] | add | tostring) +
    "\ntotal_term_searchers:" + (.["fts/total_term_searchers"] | add | tostring)'
GET FTS Aggregate Stats per Node
Insecure: http://localhost:8091/pools/default/buckets/@fts-BUCKET/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/@fts-BUCKET/nodes/NODE/stats
Example: Aggregate Stats for Individual Node
The following example demonstrates how to retrieve all of the FTS aggregate stats for a specific bucket on a specific node.
BUCKET="travel-sample"
NODE="172.17.0.2:8091"
# get the FTS stats for the bucket
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@fts-$BUCKET/nodes/$NODE/stats | \
  jq -r '.op.samples |
    "doc_count:" + (.["fts/doc_count"] | add / length | tostring) +
    "\nnum_bytes_used_disk:" + (.["fts/num_bytes_used_disk"] | add / length | tostring) +
    "\nnum_mutations_to_index:" + (.["fts/num_mutations_to_index"] | add / length | tostring) +
    "\nnum_pindexes_actual:" + (.["fts/num_pindexes_actual"] | add / length | tostring) +
    "\nnum_pindexes_target:" + (.["fts/num_pindexes_target"] | add / length | tostring) +
    "\ntotal_bytes_indexed:" + (.["fts/total_bytes_indexed"] | add / length | tostring) +
    "\ntotal_bytes_query_results:" + (.["fts/total_bytes_query_results"] | add / length | tostring) +
    "\ntotal_compaction_written_bytes:" + (.["fts/total_compaction_written_bytes"] | add / length | tostring) +
    "\ntotal_queries:" + (.["fts/total_queries"] | add / length | tostring) +
    "\ntotal_queries_error:" + (.["fts/total_queries_error"] | add / length | tostring) +
    "\ntotal_queries_slow:" + (.["fts/total_queries_slow"] | add / length | tostring) +
    "\ntotal_queries_timeout:" + (.["fts/total_queries_timeout"] | add / length | tostring) +
    "\ntotal_request_time:" + (.["fts/total_request_time"] | add | tostring) +
    "\ntotal_term_searchers:" + (.["fts/total_term_searchers"] | add | tostring)'
Monitoring: Index Service
Index Status
The index status API displays all index definitions, their node placement, and their status within the cluster.
Insecure: http://localhost:8091/indexStatus
Secure: https://localhost:18091/indexStatus
Response:
{
  "indexes": [
    {
      "storageMode": "plasma",
      "partitioned": false,
      "instId": 4607548507687231469,
      "hosts": ["127.0.0.1:8091"],
      "progress": 100,
      "definition": "CREATE INDEX `def_airportname` ON `travel-sample`(`airportname`) WITH {\"defer_build\":true}",
      "status": "Ready",
      "bucket": "travel-sample",
      "index": "def_airportname",
      "id": 15764219156300962421
    },
    {
      "storageMode": "plasma",
      "partitioned": false,
      "instId": 11862384293590784556,
      "hosts": ["127.0.0.1:8091"],
      "progress": 100,
      "definition": "CREATE INDEX `def_city` ON `travel-sample`(`city`) WITH {\"defer_build\":true}",
      "status": "Ready",
      "bucket": "travel-sample",
      "index": "def_city",
      "id": 2037567312091921182
    }
  ],
  "version": 45110879,
  "warnings": []
}
Key Metrics to Monitor
Couchbase Metric | Description | Response
status | Indicates whether an index is in a "Ready" or "Building" state. | Alert if the value is not "Ready" or "Building".
Example
The following example illustrates outputting each index name and status.
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/indexStatus | \
  jq -r '.indexes | sort_by(.bucket) | .[] | .bucket + ":" + .index + " ("
    + .status + ")"'
This example shows how to output all indexes whose status is not "Ready" or "Building".
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/indexStatus | \
  jq -r '.indexes | map(select(
    (.status != "Ready" and .status != "Building")
  )) | .[] | .bucket + ":" + .index + " (" + .status + ")"'
Index Service-Level Stats
The following Index service stats are available via the Cluster-Wide or Per-Node endpoints listed below.
Available Stats
Stat name | Description
index_memory_quota | The cluster-wide memory quota
index_memory_used | The amount of memory currently used by the indexing service
index_ram_percent | The percentage of index entries in RAM
index_remaining_ram | The amount of memory remaining
GET Cluster Index Service Stats
These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster; the best practice is to monitor each node individually.
Insecure: http://localhost:8091/pools/default/buckets/@index/stats
Secure: https://localhost:18091/pools/default/buckets/@index/stats
Example
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@index/stats | \
  jq -r '.op.samples | to_entries[] | select(.key != "timestamp") |
    .key + ":" + (.value | add / length | tostring)'
GET Node-Level Index Service Stats
Each node in the cluster running the index service should be monitored individually using the endpoint listed below.
Insecure: http://localhost:8091/pools/default/buckets/@index/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/@index/nodes/NODE/stats
Example: Stats for Individual Node
The following example demonstrates how to retrieve the index service stats for a specific node.
NODE="172.17.0.2:8091"
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@index/nodes/$NODE/stats | \
  jq -r -c '.op.samples |
    "index_memory_quota:" + (.index_memory_quota | add / length | tostring) +
    "\nindex_memory_used:" + (.index_memory_used | add / length | tostring) +
    "\nindex_ram_percent:" + (.index_ram_percent | add / length | tostring) +
    "\nindex_remaining_ram:" + (.index_remaining_ram | add / length | tostring)'
Example: Stats for Each Node Separately
# loop over each node running the index service
for node in $(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] |
    select(.services | contains(["index"]) == true) |
    .hostname'
)
do
  echo "$node Index Stats"
  echo "-------------------------------------------------------"
  # get the index stats for the specific node
  curl \
    --user Administrator:password \
    --silent \
    --get \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/@index/nodes/$node/stats | \
    jq -r '.op.samples | to_entries[] | select(.key != "timestamp") |
      .key + ":" + (.value | add / length | tostring)'
done
Key Metrics to Monitor
Couchbase Metric | Description | Response
index_remaining_ram | The amount of memory remaining. | Alert if this value is 20% or less of the index memory quota, as this indicates index growth and that the index service will need to be expanded with new index nodes.
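The 20% rule needs index_remaining_ram expressed relative to index_memory_quota, both of which the node-level endpoint above returns. A minimal bash sketch, assuming both values are in the same unit and have already been fetched; `check_index_ram` is a hypothetical helper. It drops any decimals from the jq averages and rounds the percentage down, so a value of 20.9% is reported as 20% and triggers the alert.

```shell
#!/usr/bin/env bash
# Sketch: express index_remaining_ram as a percentage of index_memory_quota
# and alert at 20% or less. Assumes both values use the same unit.
# check_index_ram is a hypothetical helper name.

# check_index_ram REMAINING QUOTA [THRESHOLD_PERCENT]
check_index_ram() {
  # jq averages may be fractional; drop the decimals for bash integer math
  local remaining=${1%.*} quota=${2%.*} threshold=${3:-20}
  (( quota > 0 )) || { echo "UNKNOWN quota=$2"; return 1; }
  local pct=$(( remaining * 100 / quota ))   # rounded down
  if (( pct <= threshold )); then
    echo "ALERT index_remaining_ram at ${pct}% of quota"
  else
    echo "OK index_remaining_ram at ${pct}% of quota"
  fi
}

check_index_ram 536870912 2147483648   # prints: OK index_remaining_ram at 25% of quota
```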
Individual Index-Level Stats
The Index stats for specific indexes are available only under the bucket that the index is created on. The same stats that are available for the service as a whole are also available on a per-index basis and can be retrieved for the entire cluster or a specific node in the cluster.
Available Stats
Stat name | Description
index/indexName/avg_item_size | The average index entry size
index/indexName/avg_scan_latency | The average latency when scanning the index
index/indexName/cache_hits | The number of in-memory hits to the index
index/indexName/cache_miss_ratio | The ratio of misses to hits
index/indexName/cache_misses | The number of in-memory misses to the index
index/indexName/data_size | The total data size of the index
index/indexName/data_size_on_disk | The total size of the index data on disk
index/indexName/disk_overhead_estimate | The size of stale data on disk due to fragmentation
index/indexName/disk_size | The size of the index on disk
index/indexName/frag_percent | The index fragmentation percentage
index/indexName/index_frag_percent | The index fragmentation percentage
index/indexName/index_resident_percent | The percentage of the index that is memory resident
index/indexName/items_count | The number of items in the index
index/indexName/log_space_on_disk | The size of the log files on disk
index/indexName/memory_used | The amount of memory used by the index
index/indexName/num_docs_indexed | The number of items indexed since the last restart
index/indexName/num_docs_pending | The number of items pending indexing
index/indexName/num_docs_pending+queued | The number of documents that are pending or queued for indexing
index/indexName/num_docs_queued | The number of documents that are queued for indexing
index/indexName/num_requests | The number of requests to the index
index/indexName/num_rows_returned | The average number of rows returned by a scan
index/indexName/raw_data_size | The raw uncompressed data size
index/indexName/recs_in_mem | The number of records in the index that are in memory
index/indexName/recs_on_disk | The number of records not in memory
index/indexName/scan_bytes_read | The average number of bytes read per scan
index/indexName/total_scan_duration | The total time spent scanning
GET Cluster Individual Index Stats
These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster; the best practice is to monitor each node individually.
Insecure: http://localhost:8091/pools/default/buckets/@index-BUCKET/stats
Secure: https://localhost:18091/pools/default/buckets/@index-BUCKET/stats
Example
The following example demonstrates how to retrieve the stats for each index in a bucket across the entire cluster.
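BUCKET="travel-sample"
# --get sends the --data values as query string parameters;
# keys with three parts are per-index stats: index/<indexName>/<stat>
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  "http://localhost:8091/pools/default/buckets/@index-$BUCKET/stats" | \
  jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
    select(.key | split("/") | length == 3) |
    .key + ": " + (.value | add / length | tostring)'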
GET Individual Index Stats per Node
Each node in the cluster running the index service should be monitored individually.
Insecure: http://localhost:8091/pools/default/buckets/@index-BUCKET/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/@index-BUCKET/nodes/NODE/stats
Example: Stats for Individual Node
The following example demonstrates how to retrieve all of the index stats for a specific index in a bucket for a specific node.
NODE="172.17.0.2:8091"
BUCKET="travel-sample"
INDEX="def_faa"
# get the stats for one index on one node; counters are summed across the
# samples (add), gauges are averaged (add / length)
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  "http://localhost:8091/pools/default/buckets/@index-$BUCKET/nodes/$NODE/stats" | \
  jq -r --arg index "$INDEX" '.op.samples |
    "avg_item_size: " + (.["index/" + $index + "/avg_item_size"] | add / length | tostring) +
    "\navg_scan_latency: " + (.["index/" + $index + "/avg_scan_latency"] | add / length | tostring) +
    "\ncache_hits: " + (.["index/" + $index + "/cache_hits"] | add | tostring) +
    "\ncache_miss_ratio: " + (.["index/" + $index + "/cache_miss_ratio"] | add / length | tostring) +
    "\ncache_misses: " + (.["index/" + $index + "/cache_misses"] | add | tostring) +
    "\ndata_size: " + (.["index/" + $index + "/data_size"] | add / length | tostring) +
    "\ndisk_overhead_estimate: " + (.["index/" + $index + "/disk_overhead_estimate"] | add / length | tostring) +
    "\ndisk_size: " + (.["index/" + $index + "/disk_size"] | add / length | tostring) +
    "\nfrag_percent: " + (.["index/" + $index + "/frag_percent"] | add / length | tostring) +
    "\nindex_frag_percent: " + (.["index/" + $index + "/index_frag_percent"] | add / length | tostring) +
    "\nindex_resident_percent: " + (.["index/" + $index + "/index_resident_percent"] | add / length | tostring) +
    "\nitems_count: " + (.["index/" + $index + "/items_count"] | add / length | tostring) +
    "\nmemory_used: " + (.["index/" + $index + "/memory_used"] | add / length | tostring) +
    "\nnum_docs_indexed: " + (.["index/" + $index + "/num_docs_indexed"] | add | tostring) +
    "\nnum_docs_pending+queued: " + (.["index/" + $index + "/num_docs_pending+queued"] | add | tostring) +
    "\nnum_docs_queued: " + (.["index/" + $index + "/num_docs_queued"] | add | tostring) +
    "\nnum_requests: " + (.["index/" + $index + "/num_requests"] | add | tostring) +
    "\nnum_rows_returned: " + (.["index/" + $index + "/num_rows_returned"] | add | tostring) +
    "\nrecs_in_mem: " + (.["index/" + $index + "/recs_in_mem"] | add / length | tostring) +
    "\nrecs_on_disk: " + (.["index/" + $index + "/recs_on_disk"] | add / length | tostring) +
    "\nscan_bytes_read: " + (.["index/" + $index + "/scan_bytes_read"] | add | tostring) +
    "\ntotal_scan_duration: " + (.["index/" + $index + "/total_scan_duration"] | add | tostring)
  '
Example: Stats for All Buckets on an Individual Node
The following example demonstrates how to retrieve all of the index stats, for every bucket in the cluster, for a single node.
NODE="172.17.0.2:8091"
# loop over each of the buckets that has indexes
for bucket in $(curl \
    --user Administrator:password \
    --silent \
    --request GET \
    http://localhost:8091/indexStatus | \
    jq -r '[.indexes[] | .bucket] | sort | unique | .[]')
do
  echo ""
  echo "Bucket: $bucket"
  echo "================================================================"
  # get the index stats for the bucket on this node, then:
  # 1. reduce the samples object by looping over each property, keeping only
  #    per-index stat properties (index/<name>/<stat>, excluding replicas),
  #    and either sum (counters) or average (gauges, rounded to 2 decimals)
  # 2. get all of the unique index names
  # 3. loop over each index and output its stats
  curl \
    --user Administrator:password \
    --silent \
    --get \
    --data zoom=minute \
    "http://localhost:8091/pools/default/buckets/@index-$bucket/nodes/$NODE/stats" | \
    jq -r '
      ["cache_hits", "cache_misses", "num_docs_indexed", "num_docs_pending",
       "num_docs_pending+queued", "num_docs_queued", "num_requests",
       "num_rows_returned", "scan_bytes_read", "total_scan_duration"] as $counters |
      reduce (.op.samples | to_entries[]) as {$key, $value} ({};
        if ($key | split("/") | length == 3) and ($key | contains("replica") | not) then
          ($key | split("/")[2]) as $stat |
          if ($counters | index($stat)) != null then
            .[$key] += ($value | add)
          else
            .[$key] += ($value | add / length * 100 | round / 100)
          end
        else
          .
        end
      ) as $stats |
      $stats | keys | map(split("/")[1]) | unique | .[] as $index |
      "Index: " + $index +
      "\n----------------------------------------------------------------\n" +
      (["avg_item_size", "avg_scan_latency", "cache_hits", "cache_miss_ratio",
        "cache_misses", "data_size", "disk_overhead_estimate", "disk_size",
        "frag_percent", "index_frag_percent", "index_resident_percent",
        "items_count", "memory_used", "num_docs_indexed", "num_docs_pending",
        "num_docs_pending+queued", "num_docs_queued", "num_requests",
        "num_rows_returned", "recs_in_mem", "recs_on_disk", "scan_bytes_read",
        "total_scan_duration"]
        | map(. + ": " + ($stats["index/" + $index + "/" + .] | tostring))
        | join("\n")) + "\n"
    '
done
Key Metrics to Monitor
Couchbase Metric | Description | Response
avg_item_size | The average index entry size | Create a baseline for this value, as "normal" will depend on the size of the indexed data. Alert at 2x the baseline. This would indicate a dramatic data model change.
avg_scan_latency | The average scan latency | Create a baseline for this value, as "normal" will depend on your workload. Alert at 2x the baseline. This would indicate a slowdown for scans of the index.
index_resident_percent | The percentage of the index that is memory resident | Create a baseline for this value, as "normal" will depend on SLAs and hardware configuration. Alert at a 5-10% deviation from the baseline.
num_requests | The number of index scan requests to the index | Create a baseline for this value, as "normal" will depend on your workload. Alert at 2x the baseline. This would indicate a dramatic increase in requests.
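Once a baseline has been recorded, the "alert at 2x the baseline" and "5-10% deviation" rules in the table above reduce to simple numeric comparisons. The following is a minimal bash sketch meant to be sourced by a monitoring script; the function names are hypothetical, the current value is assumed to have already been extracted (for example with one of the jq pipelines above), and the baseline is assumed to come from your own external trending system. awk performs the comparisons because bash arithmetic is integer-only.

```shell
# Hypothetical helpers for baseline-relative alerting on index metrics.
# Baseline values are assumed to come from an external trending system.

# exceeds_factor CURRENT BASELINE FACTOR
# Succeeds (exit status 0) when CURRENT >= FACTOR * BASELINE.
exceeds_factor() {
  awk -v cur="$1" -v base="$2" -v factor="$3" \
    'BEGIN { exit !((cur + 0) >= (factor + 0) * (base + 0)) }'
}

# deviates_pct CURRENT BASELINE PCT
# Succeeds when CURRENT differs from BASELINE by at least PCT percent,
# in either direction. A zero baseline never triggers.
deviates_pct() {
  awk -v cur="$1" -v base="$2" -v pct="$3" 'BEGIN {
    diff = (cur + 0) - (base + 0)
    if (diff < 0) diff = -diff
    exit !((base + 0) != 0 && diff * 100 >= (pct + 0) * (base + 0))
  }'
}

# check_index_metric NAME CURRENT BASELINE
# Prints "ALERT ..." or "OK ..." using the thresholds from the table above.
check_index_metric() {
  local name="$1" current="$2" baseline="$3" fired=""
  case "$name" in
    avg_item_size|avg_scan_latency|num_requests)
      exceeds_factor "$current" "$baseline" 2 && fired=1 ;;
    index_resident_percent)
      # Uses the low end of the suggested 5-10% band; tune per environment.
      deviates_pct "$current" "$baseline" 5 && fired=1 ;;
    *)
      echo "UNKNOWN $name"
      return 1 ;;
  esac
  if [ -n "$fired" ]; then
    echo "ALERT $name=$current (baseline $baseline)"
  else
    echo "OK $name=$current"
  fi
}
```

A job would typically call check_index_metric once per index and node, and raise an alert on any output line that starts with ALERT.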
Index Aggregate Stats
The Index aggregate stats for a specific bucket are available only under the bucket that the indexes exist on, and are a total of all of the indexes for that bucket in the cluster or node.
Available Stats
Stat name                           Description
index/cache_hits                    The number of in-memory hits to the index
index/cache_misses                  The number of in-memory misses to the index
index/data_size                     The total data size of the index
index/data_size_on_disk             The total data size on disk
index/disk_overhead_estimate        The size of stale data on disk due to fragmentation
index/disk_size                     The size of the index on disk
index/frag_percent                  The index fragmentation percentage
index/fragmentation                 The index fragmentation percentage
index/items_count                   The number of items in the index
index/memory_used                   The amount of memory used by the index
index/num_docs_indexed              The number of items indexed since the last restart
index/num_docs_pending              The number of documents that are pending indexing
index/num_docs_queued               The number of documents that are queued for indexing
index/num_requests                  The number of requests to the index
index/num_rows_returned             The average number of rows returned by a scan
index/raw_data_size                 The raw uncompressed data size
index/recs_in_mem                   The number of records in the index that are in memory
index/recs_on_disk                  The number of records not in memory
index/scan_bytes_read               The average number of bytes read per scan
index/total_scan_duration           The total time spent scanning
GET Cluster Index Aggregate Stats
Insecure: http://localhost:8091/pools/default/buckets/@index-BUCKET/stats
Secure: https://localhost:18091/pools/default/buckets/@index-BUCKET/stats
Example: Stats for Cluster
The following example demonstrates how to retrieve all of the index aggregate stats for a specific bucket in the entire cluster.
BUCKET="travel-sample"
# get the index aggregate stats for the bucket;
# keys with two parts are bucket-level aggregates: index/<stat>
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  "http://localhost:8091/pools/default/buckets/@index-$BUCKET/stats" | \
  jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
    select(.key | split("/") | length == 2) |
    (.key | split("/")[1]) + ": " +
    (.value | add / length | tostring)'
GET Index Aggregate Stats per Node
Insecure: http://localhost:8091/pools/default/buckets/@index-BUCKET/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/@index-BUCKET/nodes/NODE/stats
Example: Aggregate Stats for Individual Node
The following example demonstrates how to retrieve all of the index aggregate stats for a specific bucket for a specific node.
BUCKET="travel-sample"
NODE="172.17.0.2:8091"
# get the index aggregate stats for the bucket on this node; counters are
# summed across the samples (add), gauges such as sizes, record counts and
# percentages are averaged (add / length)
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  "http://localhost:8091/pools/default/buckets/@index-$BUCKET/nodes/$NODE/stats" | \
  jq -r '.op.samples |
    "cache_hits: " + (.["index/cache_hits"] | add | tostring) +
    "\ncache_misses: " + (.["index/cache_misses"] | add | tostring) +
    "\ndata_size: " + (.["index/data_size"] | add / length | tostring) +
    "\ndisk_overhead_estimate: " + (.["index/disk_overhead_estimate"] | add / length | tostring) +
    "\ndisk_size: " + (.["index/disk_size"] | add / length | tostring) +
    "\nfrag_percent: " + (.["index/frag_percent"] | add / length | tostring) +
    "\nfragmentation: " + (.["index/fragmentation"] | add / length | tostring) +
    "\nitems_count: " + (.["index/items_count"] | add / length | tostring) +
    "\nmemory_used: " + (.["index/memory_used"] | add / length | tostring) +
    "\nnum_docs_indexed: " + (.["index/num_docs_indexed"] | add | tostring) +
    "\nnum_docs_pending: " + (.["index/num_docs_pending"] | add | tostring) +
    "\nnum_docs_queued: " + (.["index/num_docs_queued"] | add | tostring) +
    "\nnum_requests: " + (.["index/num_requests"] | add | tostring) +
    "\nnum_rows_returned: " + (.["index/num_rows_returned"] | add | tostring) +
    "\nrecs_in_mem: " + (.["index/recs_in_mem"] | add / length | tostring) +
    "\nrecs_on_disk: " + (.["index/recs_on_disk"] | add / length | tostring) +
    "\nscan_bytes_read: " + (.["index/scan_bytes_read"] | add | tostring) +
    "\ntotal_scan_duration: " + (.["index/total_scan_duration"] | add | tostring)
  '
Monitoring: Logs
Built-in Email Alerts and Logs
Couchbase provides several built-in alerts for when Couchbase is approaching a critical failure or when a critical failure has occurred. It is recommended to enable the built-in email alerts and configure them to be sent to multiple recipients or a distribution list. These alerts should be treated as a fail-safe that backs up proactive alerting from an external monitoring service.
Some environments do not permit Couchbase nodes to send email. The following table provides the log-based equivalent of the built-in Couchbase email alerts.
Logs can be monitored via REST using the http://<server>:8091/logs endpoint (https://<server>:18091/logs when using TLS) or via the /opt/couchbase/var/lib/couchbase/logs/info.log file. Alerts can be generated by applying a regular expression to match either the module/code combination or the string noted below.
Available Alerts
Alert | Description | Code
Node was auto-failed-over | The sending node has been failed over automatically. | auto_failover_node
Maximum number of auto-failed-over nodes was reached | The auto-failover system stops auto-failover when the maximum number of spare nodes available has been reached. | auto_failover_maximum_reached
Node wasn't auto-failed-over as other nodes are down at the same time | Auto-failover does not take place if there is already a node down. | auto_failover_other_nodes_down
Node was not auto-failed-over as there are not enough nodes in the cluster running the same service | Auto-failover cannot be supported with fewer than three nodes. | auto_failover_cluster_too_small
Node was not auto-failed-over as auto-failover for one or more services running on the node is disabled | Auto-failover does not take place on a node when auto-failover is disabled for one or more of the services running on it. | auto_failover_disabled
Node's IP address has changed unexpectedly | The IP address of the node has changed, which may indicate a network interface, operating system, or other network or system failure. | ip
Disk space used for persistent storage has reached at least 90% of capacity | The disk device configured for storage of persistent data is nearing full capacity. | disk
Metadata overhead is more than 50% | The amount of data required to store the metadata information for your dataset is now greater than 50% of the available RAM. | overhead
Bucket memory on a node is entirely used for metadata | All the available RAM on a node is being used to store the metadata for the objects stored. This means that there is no memory available for caching values. With no memory left for storing metadata, further requests to store data will also fail. Only applicable to buckets configured for value-only ejection. | ep_oom_errors
Writing data to disk for a specific bucket has failed | The disk or device used for persisting data has failed to store persistent data for a bucket. | ep_item_commit_failed
Writing event to audit log has failed | Writing an event to the audit log has failed. | audit_dropped_events
Approaching full Indexer RAM warning | The indexer RAM usage is approaching its limit threshold. | indexer_ram_max_usage
Remote mutation timestamp exceeded drift threshold | The timestamp of a remote mutation exceeded the drift threshold. | ep_clock_cas_drift_threshold_exceeded
Communication issues among some nodes in the cluster | Some nodes within the cluster are experiencing communication issues. | communication_issue
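For environments that scan info.log or the /logs output instead of receiving email, the codes in the table above can be folded into one regular expression. The sketch below is a bash helper meant to be sourced; it assumes the alert's code appears verbatim as a whole word in the log line, which you should verify against your own log output, and the function and variable names are hypothetical. Very short codes such as ip, disk and overhead can also match ordinary words, so in production it is safer to pair them with the module name.

```shell
# Alert codes from the table above, joined into one extended regular expression.
ALERT_CODES='auto_failover_node|auto_failover_maximum_reached|auto_failover_other_nodes_down|auto_failover_cluster_too_small|auto_failover_disabled|ip|disk|overhead|ep_oom_errors|ep_item_commit_failed|audit_dropped_events|indexer_ram_max_usage|ep_clock_cas_drift_threshold_exceeded|communication_issue'

# match_alert_code LINE
# Prints the first alert code found in LINE as a whole word, or nothing.
match_alert_code() {
  # -o prints only the matched text, -w requires whole-word matches,
  # -E enables the | alternation used in ALERT_CODES.
  printf '%s\n' "$1" | grep -owE "$ALERT_CODES" | head -n 1
}

# scan_log FILE
# Prints "code: line" for every line of FILE that mentions an alert code.
scan_log() {
  local line code
  while IFS= read -r line; do
    code=$(match_alert_code "$line")
    if [ -n "$code" ]; then
      printf '%s: %s\n' "$code" "$line"
    fi
  done < "$1"
}
```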
Logs API
The same log file messages that are available in the Admin UI (http://localhost:8091/ui/index.html#!/logs) are available via a REST API as well.
Insecure: http://localhost:8091/logs
Secure: https://localhost:18091/logs
API Parameters
The Logs API supports the following query string parameters:
Param       Description
limit       An integer greater than 0 that limits the overall number of messages returned
sinceTime   Epoch timestamp in milliseconds to start returning messages from
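Because sinceTime is an epoch timestamp in milliseconds, a poller has to convert its look-back window before calling the Logs API. A minimal bash sketch with hypothetical function names; it assumes a date command that supports +%s (GNU and BSD both do), and uses curl's --get so that the parameters are sent in the query string. The credentials and host are the same placeholders used throughout this guide.

```shell
# since_time_ms MINUTES
# Prints the epoch time, in milliseconds, of MINUTES minutes ago.
since_time_ms() {
  local minutes_ago="$1" now_s
  now_s=$(date +%s)
  echo $(( (now_s - minutes_ago * 60) * 1000 ))
}

# recent_logs MINUTES [LIMIT]
# Fetches the log messages of the last MINUTES minutes (at most LIMIT,
# default 100). --get sends the --data values as query string parameters.
recent_logs() {
  curl \
    --user Administrator:password \
    --silent \
    --get \
    --data "sinceTime=$(since_time_ms "$1")" \
    --data "limit=${2:-100}" \
    http://localhost:8091/logs
}
```

For example, recent_logs 5 | jq -r '.list[] | select(.type == "critical") | .text' prints the critical messages logged in the last five minutes.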
Log Response Properties
Property     Description
code         A code specified by the module, or 0
module       The module that generated the log message
node         The node that the message came from
serverTime   An ISO-8601 timestamp of when the message was logged
shortText    A short string describing the log entry, most commonly "message", "node up", or "node down"
text         The detailed log message
tstamp       An Epoch timestamp of when the message was logged
type         The type of log message; values can be: info, warning, critical
Example: All Log Messages
# --get sends limit=100 as a query string parameter
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] |
    "[" + .type + "] " + .serverTime +
    " Module: " + .module +
    " Code: " + (.code | tostring) +
    " Message: " + .text
  '
Example: Critical Messages Only
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] | select(.type == "critical") |
    "[" + .type + "] " + .serverTime +
    " Module: " + .module +
    " Code: " + (.code | tostring) +
    " Message: " + .text
  '
Example: Warning Messages Only
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] | select(.type == "warning") |
    "[" + .type + "] " + .serverTime +
    " Module: " + .module +
    " Code: " + (.code | tostring) +
    " Message: " + .text
  '
Example: Critical or Warning Messages Only
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] | select(.type == "critical" or .type == "warning") |
    "[" + .type + "] " + .serverTime +
    " Module: " + .module +
    " Code: " + (.code | tostring) +
    " Message: " + .text
  '
Alerts API
Critical alerts that trigger email alerts are also displayed to users in the Admin UI upon logging in. These alerts can optionally be monitored via the REST API if email is not an option.
Insecure: http://localhost:8091/pools/default
Secure: https://localhost:18091/pools/default
Alerts are located at the root of the response payload in the "alerts" property, which is an array.
Alert Properties
Property     Description
msg          The alert message and details
serverTime   The time the alert was issued
Example: Retrieve All Alerts
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/default | \
  jq -r '.alerts[] | .serverTime + " - " + .msg'
Monitoring: Nodes
GET Nodes Overview
http://localhost:8091/pools/nodes
Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-node-get-info.html
Response
{
  "nodes": [
    {
      "hostname": "10.112.170.101:8091",
      "thisNode": true,
      "ports": {
        "sslProxy": 11214,
        "httpsMgmt": 18091,
        "httpsCAPI": 18092,
        "proxy": 11211,
        "direct": 11210
      },
      "services": ["fts", "index", "kv", "n1ql", "cbas", "eventing"]
    }
  ]
}
Each node in the cluster is listed in the "nodes" array. The thisNode attribute indicates the node you have executed the query against. Using this output, a monitoring agent can discover new nodes within the cluster, and which services are assigned to those nodes, in order to automatically apply the correct monitoring profile.
Key Metrics to Monitor
Couchbase Metric | Description | Response
status | A meta metric that indicates overall node health. | Alert if the value is "unhealthy".
clusterMembership | Indicates whether the node is an active participant in cluster operations. Possible values are "active", "inactiveAdded", and "inactiveFailed". | Alert on "inactiveFailed" and investigate the cause of the node failure.
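The two rules in the table above can be applied to every node from a single /pools/nodes call. A minimal bash sketch meant to be sourced; node_alerts and check_cluster_nodes are hypothetical names, the credentials are placeholders, and check_cluster_nodes additionally requires curl and jq.

```shell
# node_alerts HOSTNAME STATUS MEMBERSHIP
# Applies the status and clusterMembership rules to one node and prints an
# alert line for each rule that fires.
node_alerts() {
  local host="$1" status="$2" membership="$3"
  if [ "$status" = "unhealthy" ]; then
    echo "ALERT $host: status is unhealthy"
  fi
  if [ "$membership" = "inactiveFailed" ]; then
    echo "ALERT $host: clusterMembership is inactiveFailed"
  fi
}

# check_cluster_nodes
# Feeds every node listed by /pools/nodes through node_alerts.
check_cluster_nodes() {
  curl \
    --user Administrator:password \
    --silent \
    --request GET \
    http://localhost:8091/pools/nodes | \
    jq -r '.nodes[] | [.hostname, .status, .clusterMembership] | @tsv' | \
    while IFS=$'\t' read -r host status membership; do
      node_alerts "$host" "$status" "$membership"
    done
}
```

Because node_alerts prints nothing for a healthy node, any output from check_cluster_nodes can be treated as an alert.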
Example
This example illustrates retrieving the status of each node in the cluster.
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] | .hostname + " (" + .status + ")"'
Example
The following example displays the cluster membership of each node.
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] | .hostname + " (" + .clusterMembership + ")"'
Example
Show the services and system stats for each node in the cluster.
# the swap percentage is guarded so hosts without swap (swap_total = 0)
# do not cause a divide-by-zero error in jq
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] | .hostname + " (" + (.services | join(", ")) + ")\n" +
    "cpu_utilization_rate: " +
      (.systemStats.cpu_utilization_rate | tostring) + "%\n" +
    "swap_total: " +
      (.systemStats.swap_total / 1024 / 1024 | tostring) + " MB\n" +
    "swap_used: " +
      (.systemStats.swap_used / 1024 / 1024 | tostring) + " MB (" +
      (if .systemStats.swap_total > 0
       then (.systemStats.swap_used / .systemStats.swap_total) * 100
       else 0 end | tostring) +
      "%)\n" +
    "mem_total: " +
      (.systemStats.mem_total / 1024 / 1024 | tostring) + " MB\n" +
    "mem_free: " +
      (.systemStats.mem_free / 1024 / 1024 | tostring) + " MB (" +
      ((.systemStats.mem_free / .systemStats.mem_total) * 100 | tostring) + "%)"
  '
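The operating system thresholds given later in this guide (free + cache memory at least 20%, zero swap, CPU utilization not above 90%) can be applied to the systemStats values shown above. A minimal bash sketch with a hypothetical function name; it assumes mem_free is an acceptable stand-in for "free + cache" memory, which you should confirm for your platform, and since one sample is not "sustained" CPU, the CPU check should in practice be evaluated over a window.

```shell
# check_system_stats CPU_RATE SWAP_USED MEM_FREE MEM_TOTAL
# CPU_RATE is a percentage; memory and swap values are in bytes, as returned
# in the systemStats object. Prints one ALERT line per threshold crossed.
check_system_stats() {
  awk -v cpu="$1" -v swap_used="$2" -v mem_free="$3" -v mem_total="$4" 'BEGIN {
    cpu += 0; swap_used += 0; mem_free += 0; mem_total += 0
    # Assumption: mem_free stands in for "free + cache" memory.
    if (mem_total > 0 && mem_free * 100 < 20 * mem_total)
      printf "ALERT free memory %.1f%% is below 20%%\n", mem_free * 100 / mem_total
    # Any swap usage at all is an alert condition.
    if (swap_used > 0)
      printf "ALERT swap in use: %.0f bytes\n", swap_used
    # One sample is not "sustained"; evaluate this over a window in practice.
    if (cpu > 90)
      printf "ALERT CPU utilization %.1f%% exceeds 90%%\n", cpu
  }'
}
```

The inputs can be produced per node with jq -r '.nodes[] | .systemStats | [.cpu_utilization_rate, .swap_used, .mem_free, .mem_total] | @tsv' and read into the function in a while-read loop.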
Monitoring: Query Service
Query Service-Level Stats
The following Query stats are available via the cluster-wide or per-node endpoints listed below.
Available Stats
Stat name                  Description
query_avg_req_time         The average total request time.
query_avg_svc_time         The average time of the query service for requests.
query_avg_response_size    The average size in bytes of the response.
query_avg_result_count     The average number of results being returned.
query_active_requests      The number of active requests.
query_errors               The number of queries resulting in an error.
query_invalid_requests     The number of invalid/incorrectly formatted queries.
query_queued_requests      The number of query requests that have been queued.
query_request_time         The current request duration.
query_requests             The current number of requests per second.
query_requests_1000ms      The number of queries taking longer than 1000ms.
query_requests_250ms       The number of queries taking longer than 250ms.
query_requests_5000ms      The number of queries taking longer than 5000ms.
query_requests_500ms       The number of queries taking longer than 500ms.
query_result_count         The number of results returned.
query_result_size          The total size of the query results.
query_selects              The number of selects being executed.
query_service_time         The time spent by the query service to service the request.
query_warnings             The number of query warnings generated.
GET Cluster Query Service Stats
These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster; the best practice is to monitor each node individually.
Insecure: http://localhost:8091/pools/default/buckets/@query/stats
Secure: https://localhost:18091/pools/default/buckets/@query/stats
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@query/stats | \
  jq -r '.op.samples | to_entries[] | select(.key != "timestamp") |
    .key + ": " + (.value | add / length | tostring)'
GET Node-Level Query Service Stats
Each node in the cluster running the query service should be monitored individually using the endpoint listed below.
Insecure: http://localhost:8091/pools/default/buckets/@query/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/@query/nodes/NODE/stats
Example: Stats for Individual Node
The following example demonstrates how to retrieve the query service stats for a specific node.
NODE="172.17.0.2:8091"
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  "http://localhost:8091/pools/default/buckets/@query/nodes/$NODE/stats" | \
  jq -r -c '.op.samples |
    "query_avg_req_time: " + (.query_avg_req_time | add / length | tostring) +
    "\nquery_avg_svc_time: " + (.query_avg_svc_time | add / length | tostring) +
    "\nquery_avg_response_size: " + (.query_avg_response_size | add / length | tostring) +
    "\nquery_avg_result_count: " + (.query_avg_result_count | add / length | tostring) +
    "\nquery_active_requests: " + (.query_active_requests | add | tostring) +
    "\nquery_errors: " + (.query_errors | add | tostring) +
    "\nquery_invalid_requests: " + (.query_invalid_requests | add | tostring) +
    "\nquery_queued_requests: " + (.query_queued_requests | add | tostring) +
    "\nquery_request_time: " + (.query_request_time | add | tostring) +
    "\nquery_requests: " + (.query_requests | add | tostring) +
    "\nquery_requests_1000ms: " + (.query_requests_1000ms | add | tostring) +
    "\nquery_requests_250ms: " + (.query_requests_250ms | add | tostring) +
    "\nquery_requests_5000ms: " + (.query_requests_5000ms | add | tostring) +
    "\nquery_requests_500ms: " + (.query_requests_500ms | add | tostring) +
    "\nquery_result_count: " + (.query_result_count | add | tostring) +
    "\nquery_result_size: " + (.query_result_size | add | tostring) +
    "\nquery_selects: " + (.query_selects | add | tostring) +
    "\nquery_service_time: " + (.query_service_time | add | tostring) +
    "\nquery_warnings: " + (.query_warnings | add | tostring)'
Example: Stats for Each Node Separately
# loop over each node that is running the query (n1ql) service
for node in $(curl \
    --user Administrator:password \
    --silent \
    --request GET \
    http://localhost:8091/pools/nodes | \
    jq -r '.nodes[] |
      select(.services | contains(["n1ql"]) == true) |
      .hostname'
)
do
  echo "$node Query Stats"
  echo "-------------------------------------------------------"
  # get the query stats for the specific node
  curl \
    --user Administrator:password \
    --silent \
    --get \
    --data zoom=minute \
    "http://localhost:8091/pools/default/buckets/@query/nodes/$node/stats" | \
    jq -r '.op.samples | to_entries[] | select(.key != "timestamp") |
      .key + ": " + (.value | add / length | tostring)'
done
Key Metrics to Monitor
Couchbase Metric | Description | Response
query_avg_svc_time | The average time of the query service for requests. | Create a baseline for this value, as "normal" will depend on workload. Alert at 2x the baseline. This would indicate that more query nodes may be needed, or that indexes are performing slowly and require investigation.
query_requests | The number of query requests per second. | Create a baseline for this value, as "normal" will depend on workload. Alert at 2x the baseline. This would indicate an increase in query traffic.
Monitoring: Operating System
Operating System Metrics
Just as monitoring Couchbase and its individual services, buckets, indexes, etc. is essential to a solid understanding of overall cluster health, it is also important to monitor the operating system and its various stats on each node in the cluster. Each operating system has different means of retrieving these metrics, and many monitoring solutions collect them out of the box.
OS Metric | Response
Free RAM | Free + cache memory should always be at least 20% of total system memory. If free + cache memory falls below 20%, scale the cluster.
Swap usage | Swap usage should always be zero. If swap is used, it means the OS is under very high memory pressure and unable to purge dirty pages fast enough, and the cluster should be scaled.
Memcached process RAM usage | Create a baseline for this value, as "normal" will depend upon your working set. Alert if this value exceeds 150% of baseline. This may indicate an unusual increase in write traffic, reading of typically cold data, or possible malloc fragmentation. Confirm the Couchbase resident ratios are still correct. Add memory or scale the cluster if necessary.
Beam.smp process RAM usage | Create a baseline for this value, as "normal" will depend upon your cluster size and API activity levels. Alert if this value exceeds 120% of baseline. This may indicate a memory leak in the beam process. Contact Couchbase Support if it is larger than a few gigabytes.
IO utilization (iostat) | Create a baseline for this value, as "normal" will depend upon your workload and available disk IO. Overall sustained IO utilization should not exceed 90% of total IO capacity.
Total CPU utilization | Create a baseline for this value, as "normal" will depend upon your workload. Sustained CPU utilization above 90% indicates a need to scale the cluster.
Couchbase service CPU utilization | Create a baseline for these values, as "normal" will depend upon your workload. Alert if this value exceeds 2x the baseline.
Beam.smp CPU utilization | Create a baseline for this value, as "normal" will depend upon your workload. Alert if this value exceeds 2x the baseline.
%steal CPU | This value should always be zero. Anything above zero indicates the VM hypervisor is oversubscribed. Additional physical hosts should be added, or collocated VMs should be migrated to other hosts.
Network utilization | Create a baseline for this value, as "normal" will depend upon your workload. Alert if this value exceeds 120% of baseline. If the sustained utilization is above 80% of the total available bandwidth, it indicates the need to scale the cluster.
Presence of beam.smp process | Alert if beam.smp is not present. This indicates Couchbase is offline and needs to be restarted.
Presence of service processes | Alert if data/index/query/fts/eventing/analytics processes are not present. This indicates Couchbase is either offline or starting up, or that services may have crashed and need to be restarted. Below are the processes by service:
    Data Service: memcached
    Data Service: projector
    Data Service: goxdcr
    Index Service: indexer
    Query Service: cbq-engine
    Full Text Search Service: cbft
    Eventing Service: eventing-producer
    Eventing Service: eventing-consumer
    Analytics Service: cbas
NTP clock skew | Couchbase requires all cluster nodes (and any replicated clusters) to have their system clocks synchronized to a common clock source. Monitor clock skew on each server and alert if it is more than 1 minute out of sync.
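The process list above can be turned into a presence check that runs locally on each node. A minimal bash sketch meant to be sourced; the function names are hypothetical, the mapping from the short service names reported in the "services" array of /pools/nodes (kv, index, n1ql, fts, eventing, cbas) to process names is an assumption based on the tables in this guide, and pgrep (procps) is assumed to be installed. On Linux, pgrep -x matches the kernel's process name, which is truncated to 15 characters, so longer names are truncated before matching.

```shell
# service_processes SERVICE
# Prints the process names expected for SERVICE (as named in the "services"
# array of /pools/nodes); returns 1 for an unknown service.
service_processes() {
  case "$1" in
    kv)       echo "memcached projector goxdcr" ;;
    index)    echo "indexer" ;;
    n1ql)     echo "cbq-engine" ;;
    fts)      echo "cbft" ;;
    eventing) echo "eventing-producer eventing-consumer" ;;
    cbas)     echo "cbas" ;;
    *)        return 1 ;;
  esac
}

# missing_processes SERVICE...
# Prints each expected process (always including beam.smp) that is not
# running on this host. Any output should raise an alert.
missing_processes() {
  local svc proc expected="beam.smp"
  for svc in "$@"; do
    expected="$expected $(service_processes "$svc")"
  done
  for proc in $expected; do
    # The kernel process name is at most 15 characters, so match on that prefix.
    if ! pgrep -x "${proc:0:15}" > /dev/null; then
      echo "$proc"
    fi
  done
}
```

For example, missing_processes kv index n1ql should print nothing on a healthy node running the data, index and query services.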
Couchbase System Stats
The following Operating System stats are available via the cluster-wide or per-node endpoints listed below.
Available Stats
Stat name              Description
allocstall             Number of allocations stalled when reclaiming
cpu_cores_available    Number of CPU cores available in the cluster or the node
cpu_irq_rate           The CPU interrupt request rate
cpu_stolen_rate        CPU steal rate
cpu_idle_ms            The amount of time the CPU has been idle
cpu_local_ms
cpu_utilization_rate   Max CPU utilization %
hibernated_requests    Idle streaming requests
hibernated_waked       Streaming wakeups/sec
mem_actual_free        Amount of RAM available on this server
mem_actual_used        Amount of RAM used on this server
mem_free               Amount of RAM available on this server
mem_limit              The limit for RAM
mem_total              Total amount of RAM on this server
mem_used_sys           Amount of RAM available to the OS
odp_report_failed
rest_requests          Management port reqs/sec
swap_total             Amount of swap space available on this server
swap_used              Amount of swap space in use on this server
GET Cluster System Stats
These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster; the best practice is to monitor each node individually.
Insecure: http://localhost:8091/pools/default/buckets/@system/stats
Secure: https://localhost:18091/pools/default/buckets/@system/stats
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@system/stats | \
  jq -r '.op.samples | to_entries[] | select(.key != "timestamp") |
    .key + ": " + (.value | add / length | tostring)'
GET Node-Level OS Stats
Each node in the cluster should be monitored individually using the endpoint listed below.
Insecure: http://localhost:8091/pools/default/buckets/@system/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/@system/nodes/NODE/stats
Example: Stats for Individual Node
The following example demonstrates how to retrieve the system stats for a specific node.
NODE="172.17.0.2:8091"
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  "http://localhost:8091/pools/default/buckets/@system/nodes/$NODE/stats" | \
  jq -r -c '.op.samples |
    "cpu_idle_ms: " + (.cpu_idle_ms | add / length | tostring) +
    "\ncpu_local_ms: " + (.cpu_local_ms | add / length | tostring) +
    "\ncpu_utilization_rate: " + (.cpu_utilization_rate | add / length | tostring) +
    "\nhibernated_requests: " + (.hibernated_requests | add / length | tostring) +
    "\nhibernated_waked: " + (.hibernated_waked | add / length | tostring) +
    "\nmem_actual_free: " + (.mem_actual_free | add / length | tostring) +
    "\nmem_actual_used: " + (.mem_actual_used | add / length | tostring) +
    "\nmem_free: " + (.mem_free | add / length | tostring) +
    "\nmem_total: " + (.mem_total | add / length | tostring) +
    "\nmem_used_sys: " + (.mem_used_sys | add / length | tostring) +
    "\nrest_requests: " + (.rest_requests | add / length | tostring) +
    "\nswap_total: " + (.swap_total | add / length | tostring) +
    "\nswap_used: " + (.swap_used | add / length | tostring)'
Example: Stats for Each Node Separately
# loop over each of the nodes in the cluster
for node in $(curl \
    --user Administrator:password \
    --silent \
    --request GET \
    http://localhost:8091/pools/nodes | \
    jq -r '.nodes[] | .hostname'
)
do
  echo "$node OS Stats"
  echo "-------------------------------------------------------"
  # get the system stats for the specific node
  curl \
    --user Administrator:password \
    --silent \
    --get \
    --data zoom=minute \
    "http://localhost:8091/pools/default/buckets/@system/nodes/$node/stats" | \
    jq -r '.op.samples | to_entries[] | select(.key != "timestamp") |
      .key + ": " + (.value | add / length | tostring)'
done
Monitoring: XDCR
Replication Status
The tasks endpoint provides cluster-wide information on operations such as rebalance, XDCR replications, etc. The response is an array that needs to be filtered for items where .type == "xdcr".
Insecure: http://localhost:8091/pools/default/tasks
Secure: https://localhost:18091/pools/default/tasks
Response:
[
  {
    "cancelURI": "/controller/cancelXDCR/20763b82bb6b517bd0d15d9f6b78c13c%2Ftravel-sample%2Fdemo",
    "settingsURI": "/settings/replications/20763b82bb6b517bd0d15d9f6b78c13c%2Ftravel-sample%2Fdemo",
    "status": "running",
    "replicationType": "xmem",
    "continuous": true,
    "filterExpression": "",
    "id": "20763b82bb6b517bd0d15d9f6b78c13c/travel-sample/demo",
    "pauseRequested": false,
    "source": "travel-sample",
    "target": "/remoteClusters/20763b82bb6b517bd0d15d9f6b78c13c/buckets/demo",
    "type": "xdcr",
    "recommendedRefreshPeriod": 10,
    "changesLeft": 0,
    "docsChecked": 0,
    "docsWritten": 31591,
    "maxVBReps": null,
    "errors": []
  }
]
KeyMetricstoMonitor
CouchbaseMetric Description Response
statusIndicateswhetherareplicationisina"running","paused",or"notRunning"state.
Alertifthevalueis"paused"or"notRunning".
Note:The replicationIdiscomposedof3parts,delimitedbya /:
SampleReplicationId: 6f76c2a07245aef856db44a8e361032/travel-sample/default
1. Remote Cluster ID
2. Source Bucket
3. Target Bucket
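Because the three parts are separated by "/" and bucket names cannot contain "/", a replicationId can be split with plain bash, with no jq needed. This is a minimal sketch; split_replication_id is a hypothetical helper name, not part of any Couchbase tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split an XDCR replicationId of the form
# <remote cluster id>/<source bucket>/<target bucket> into its parts.
split_replication_id() {
  local id="$1" remote source target
  # read splits on IFS, so "/" separates the three fields
  IFS='/' read -r remote source target <<< "$id"
  printf 'remote_cluster=%s\nsource_bucket=%s\ntarget_bucket=%s\n' \
    "$remote" "$source" "$target"
}

# usage with the replicationId from the tasks response
split_replication_id '20763b82bb6b517bd0d15d9f6b78c13c/travel-sample/demo'
```

The output uses key=value lines so it can be consumed directly by most monitoring agents or sourced into further scripts.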
Example
The following example outputs the replication ID and status of each replication.
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/default/tasks | \
jq -r 'map(select(.type | contains("xdcr"))) |
  .[] | .id + " (" + .status + ")"'
This example outputs all replications whose status is "paused" or "notRunning":
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/default/tasks | \
jq -r 'map(select(
    (.type | contains("xdcr"))
    and
    (.status == "paused" or .status == "notRunning")
  )) | .[] | .id + " (" + .status + ")"'
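To feed the filter above into an external monitoring system, it can be wrapped in a check that reports failure through its exit code. This is a sketch: the function name check_xdcr_status is hypothetical, and the exit codes (0 for OK, 2 for critical) follow the common Nagios plugin convention as an assumption; adapt them to whatever your monitoring system expects. The function reads the tasks JSON on stdin so it can be tested without a live cluster.

```shell
#!/usr/bin/env bash
# Hypothetical check: reads the /pools/default/tasks JSON on stdin,
# prints every XDCR replication whose status is "paused" or "notRunning",
# and returns 2 if at least one is found, 0 otherwise.
check_xdcr_status() {
  local bad
  bad="$(jq -r '.[]
    | select(.type == "xdcr")
    | select(.status == "paused" or .status == "notRunning")
    | .id + " (" + .status + ")"')"
  if [ -n "$bad" ]; then
    echo "CRITICAL: XDCR replications not running:"
    echo "$bad"
    return 2
  fi
  echo "OK: all XDCR replications running"
  return 0
}

# usage against a live cluster (credentials as in the examples above):
# curl --user Administrator:password --silent \
#   http://localhost:8091/pools/default/tasks | check_xdcr_status
```

Filtering on .type first matters because other tasks in the same array (for example a rebalance) can also report a "notRunning" status.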
Per Replication Stats
The XDCR stats are an aggregate for all of the configured replications, either for the entire cluster or for a specific node.
Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-xdcr-statistics.html
Available Stats
Stat name | Description
replication_changes_left | The total number of changes left across all replications for the bucket
replication_docs_rep_queue | The total number of documents in the replication queue for all replications for the bucket
replications/replicationId/bandwidth_usage | Bandwidth used during replication, measured in bytes per second
replications/replicationId/changes_left | Number of mutations to be replicated to the remote cluster
replications/replicationId/data_replicated | Size of data replicated, in bytes
replications/replicationId/datapool_failed_gets | Number of failed gets from the pool
replications/replicationId/dcp_datach_length |
replications/replicationId/dcp_dispatch_time |
replications/replicationId/deletion_docs_written | The number of deletes that have been written to the target cluster
replications/replicationId/deletion_failed_cr_source | The number of deletes that have failed conflict resolution on the source due to optimistic replication
replications/replicationId/deletion_filtered | The number of deletes that have been filtered
replications/replicationId/deletion_received_from_dcp | The number of deletes that have been received from DCP
replications/replicationId/docs_checked | Number of documents checked for changes
replications/replicationId/docs_failed_cr_source | The number of documents that have failed conflict resolution on the source due to optimistic replication
replications/replicationId/docs_filtered | Number of documents that have been filtered out and not replicated to the target cluster
replications/replicationId/docs_opt_repd | Number of documents sent optimistically
replications/replicationId/docs_processed | The number of documents processed
replications/replicationId/docs_received_from_dcp | Number of documents received from DCP
replications/replicationId/docs_rep_queue | Number of documents in the replication queue
replications/replicationId/docs_unable_to_filter | The number of documents where filtering could not be processed
replications/replicationId/docs_written | Number of documents written to the target cluster
replications/replicationId/expiry_docs_written | The number of expiry documents written to the target cluster
replications/replicationId/expiry_failed_cr_source | The number of expiries that have failed conflict resolution on the source due to optimistic replication
expiry_filtered | The number of expiry documents that have been filtered out and not replicated to the target cluster
replications/replicationId/expiry_received_from_dcp | The number of expiry documents that have been received
replications/replicationId/expiry_stripped | The number of expiry documents removed from replicating
replications/replicationId/num_checkpoints | Number of checkpoints issued in the replication queue
replications/replicationId/num_failedckpts | Number of checkpoints failed during replication
replications/replicationId/percent_completeness | Percentage of checked items out of all checked and to-be-replicated items
replications/replicationId/rate_doc_checks |
replications/replicationId/rate_doc_opt_repd |
replications/replicationId/rate_received_from_dcp | Number of documents received from DCP per second
replications/replicationId/rate_replicated | Rate of documents being replicated, measured in documents per second
replications/replicationId/resp_wait_time |
replications/replicationId/set_docs_written | The number of sets that have been written to the target cluster
replications/replicationId/set_failed_cr_source | The number of sets that have failed conflict resolution on the source due to optimistic replication
replications/replicationId/set_filtered | Number of sets that have been filtered out and not replicated to the target cluster
replications/replicationId/set_received_from_dcp | The number of sets that have been received from DCP
replications/replicationId/size_rep_queue | Size of the replication queue, in bytes
replications/replicationId/throttle_latency | Throttle latency
replications/replicationId/throughput_throttle_latency | Throughput throttle latency
replications/replicationId/time_committing | Seconds elapsed during replication
replications/replicationId/wtavg_docs_latency | Weighted average latency for sending replicated changes to the target cluster
replications/replicationId/wtavg_meta_latency | Weighted average time for requesting document metadata. XDCR uses this for conflict resolution prior to sending the document into the replication queue
GET Cluster-Wide Bucket XDCR Stats
These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster; the best practice is to monitor each node individually.
Insecure: http://localhost:8091/pools/default/buckets/@xdcr-BUCKET/stats
Secure: https://localhost:18091/pools/default/buckets/@xdcr-BUCKET/stats
Example: Single Bucket
This example outputs the XDCR stats for a specific bucket.
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@xdcr-travel-sample/stats | \
jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
  select(.key | split("/") | length > 1) |
  .key + ": " + (.value | add / length | tostring)'
Example: All Replications
This example outputs all XDCR stats for every bucket that has one or more replications configured.
# loop over each of the buckets that have xdcr replications
for bucket in $(curl \
    --user Administrator:password \
    --silent \
    --request GET \
    http://localhost:8091/pools/default/tasks | \
  jq -r '[.[] | select(.type == "xdcr") | .source] | sort | unique | .[]')
do
  echo ""
  echo "Bucket: $bucket"
  echo "================================================================"
  # get the xdcr stats for the bucket
  curl \
    --user Administrator:password \
    --silent \
    --get \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/@xdcr-$bucket/stats | \
  jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
    select(.key | split("/") | length > 1) |
    .key + ": " + (.value | add / length | tostring)'
done
GET Node-Level Bucket XDCR Stats
Each data node in the cluster should be monitored individually using the endpoint listed below.
Insecure: http://localhost:8091/pools/default/buckets/@xdcr-BUCKET/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/@xdcr-BUCKET/nodes/NODE/stats
Example: Single Bucket
This example outputs the XDCR stats for a specific node and bucket.
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@xdcr-travel-sample/nodes/172.17.0.2:8091/stats | \
jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
  select(.key | split("/") | length > 1) |
  .key + ": " + (.value | add / length | tostring)'
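The examples in this section average every sample in the zoom window (add / length), which suits trending. For point-in-time alerting, the most recent sample may be more useful. The following is a minimal sketch that prints both for each per-replication key, assuming the same .op.samples response shape; summarize_xdcr_samples is a hypothetical name, and the assumption that the last array element is the newest sample should be confirmed against the timestamp array in your responses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an XDCR stats response on stdin and, for each
# per-replication key (keys containing "/"), print the average of the
# samples and the last sample in the array.
summarize_xdcr_samples() {
  jq -r '.op.samples | to_entries | sort_by(.key) | .[]
    | select(.key | split("/") | length > 1)
    | select(.value | length > 0)
    | .key + ": avg=" + (.value | add / length | tostring)
        + " last=" + (.value | last | tostring)'
}

# usage against a live node:
# curl --user Administrator:password --silent --get --data zoom=minute \
#   http://localhost:8091/pools/default/buckets/@xdcr-travel-sample/nodes/172.17.0.2:8091/stats \
#   | summarize_xdcr_samples
```

Empty sample arrays are skipped, because add on an empty array yields null and dividing null by zero would make jq abort.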
Example: All Replications
This example outputs all XDCR stats on a single node for every bucket that has one or more replications configured.
# loop over each of the buckets that have xdcr replications
for bucket in $(curl \
    --user Administrator:password \
    --silent \
    --request GET \
    http://localhost:8091/pools/default/tasks | \
  jq -r '[.[] | select(.type == "xdcr") | .source] | sort | unique | .[]')
do
  echo ""
  echo "Bucket: $bucket"
  echo "================================================================"
  # get the xdcr stats for the bucket on node 172.17.0.2
  curl \
    --user Administrator:password \
    --silent \
    --get \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/@xdcr-$bucket/nodes/172.17.0.2:8091/stats | \
  jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
    select(.key | split("/") | length > 1) |
    .key + ": " + (.value | add / length | tostring)'
done
Example: All Replications for Each Node
This example outputs all XDCR stats on every data node for every bucket that has one or more replications configured.
# get all of the buckets in the cluster that have 1 or more
# xdcr replications configured
buckets=$(curl \
    --user Administrator:password \
    --silent \
    --request GET \
    http://localhost:8091/pools/default/tasks | \
  jq -r '[.[] | select(.type == "xdcr") | .source] | sort | unique | .[]')
# get all of the nodes in the cluster running the data service
nodes=$(curl \
    --user Administrator:password \
    --silent \
    --request GET \
    http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] |
    select(.services | contains(["kv"])) |
    .hostname'
)
# loop over each of the buckets (bucket names and hostnames contain no
# whitespace, so word splitting on the unquoted variables is safe here)
for bucket in $buckets
do
  echo ""
  echo "Bucket: $bucket"
  echo "================================================================"
  # loop over each of the data nodes in the cluster
  for node in $nodes
  do
    echo "Node: $node"
    echo "----------------------------------------------------------------"
    # get the xdcr stats for the bucket on the node
    curl \
      --user Administrator:password \
      --silent \
      --get \
      --data zoom=minute \
      http://localhost:8091/pools/default/buckets/@xdcr-$bucket/nodes/$node/stats | \
    jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
      select(.key | split("/") | length > 1) |
      .key + ": " + (.value | add / length | tostring)'
    echo ""
  done
done
Key Metrics to Monitor
Couchbase Metric | Description | Response
changes_left | The number of items pending XDCR replication. This can be used to approximate the degree of eventual consistency between clusters. | Create a baseline for this value, as "normal" will depend on workload, XDCR configuration, and available bandwidth. Alert at 2x the baseline; this may indicate a resource bottleneck.
bandwidth_usage | The amount of bandwidth, in bytes per second, used for XDCR replication. | An alert value for this metric should be based on the network interconnect capacity between the clusters and the percentage of the interconnect that XDCR is expected or allowed to consume.
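The two response rules above can be expressed as simple threshold checks. This is a sketch under stated assumptions: the baseline, the link capacity (given here in megabits per second), and the allowed share are values you must supply for your environment, and the function names check_changes_left and check_bandwidth are hypothetical. Bash arithmetic is integer-only, so averaged values from jq such as 1234.5 should be truncated first, for example with ${value%.*}.

```shell
#!/usr/bin/env bash
# Hypothetical check: returns 1 (alert) when changes_left is at or
# above 2x the supplied baseline, 0 otherwise.
check_changes_left() {
  local current="$1" baseline="$2"
  if (( current >= 2 * baseline )); then
    echo "ALERT: changes_left=$current is >= 2x baseline ($baseline)"
    return 1
  fi
  echo "OK: changes_left=$current (baseline $baseline)"
}

# Hypothetical check: returns 1 (alert) when bandwidth_usage (bytes per
# second) exceeds the allowed percentage of the interconnect capacity
# (assumed to be expressed in megabits per second).
check_bandwidth() {
  local bytes_per_sec="$1" link_mbps="$2" allowed_pct="$3"
  local used_bps=$(( bytes_per_sec * 8 ))      # bytes/s -> bits/s
  local link_bps=$(( link_mbps * 1000000 ))    # Mbit/s -> bits/s
  local used_pct=$(( used_bps * 100 / link_bps ))
  if (( used_pct > allowed_pct )); then
    echo "ALERT: XDCR using ${used_pct}% of link (allowed ${allowed_pct}%)"
    return 1
  fi
  echo "OK: XDCR using ${used_pct}% of link"
}
```

For example, a bandwidth_usage of 50000000 bytes per second on a 1000 Mbit/s interconnect is 400 Mbit/s, or 40% of the link, which would trip an alert configured at 30% but not one configured at 50%.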
GET Per Node Individual Stat for a Replication
Each XDCR replication stat can be retrieved individually. The entire key must be URL-encoded, with each "/" replaced by %2F.
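A minimal sketch of the encoding step, assuming the key layout replications/<remote cluster ID>/<source bucket>/<target bucket>/<stat name> used in the example that follows; encode_stat_key is a hypothetical helper name. Only "/" is replaced, as described above; other characters that are special in URLs (for example "%", which bucket names may contain) would need additional encoding.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a per-replication stat key and URL-encode
# it by replacing every "/" with %2F.
encode_stat_key() {
  local remote="$1" source="$2" target="$3" stat="$4"
  local key="replications/$remote/$source/$target/$stat"
  # ${var//pattern/replacement} replaces all matches; \/ matches "/"
  printf '%s\n' "${key//\//%2F}"
}

# usage: append the encoded key to the bucket stats URL
# encode_stat_key 20763b82bb6b517bd0d15d9f6b78c13c travel-sample demo percent_completeness
```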
Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-xdcr-statistics.html
Example
This example requests an individual stat for a single replication and displays the results for each data node in the cluster.
# set the replication info
REMOTE_CLUSTER='20763b82bb6b517bd0d15d9f6b78c13c'
SOURCE_BUCKET='travel-sample'
TARGET_BUCKET='demo'
STAT_NAME='percent_completeness'
# build the url; the stat key replications/<id>/<stat> is URL-encoded
STAT_URL="http://localhost:8091/pools/default/buckets/$SOURCE_BUCKET/stats"
STAT_URL="$STAT_URL/replications%2F$REMOTE_CLUSTER%2F$SOURCE_BUCKET"
STAT_URL="$STAT_URL%2F$TARGET_BUCKET%2F$STAT_NAME"
curl \
  --user Administrator:password \
  --silent \
  "$STAT_URL" | \
jq -r '.nodeStats | to_entries | .[] |
  (.key | split(":") | .[0]) + ": " + (.value | add / length | tostring)'
GET Remote Cluster Information
The remote cluster ID in a replicationId is a uniquely generated ID and does not convey the remote cluster details. All configured remote clusters and their associated IDs can be retrieved from the REST API.
Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-xdcr-get-ref.html
Insecure: http://localhost:8091/pools/default/remoteClusters
Secure: https://localhost:18091/pools/default/remoteClusters
Example
This example lists all configured remote clusters and their details.
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/default/remoteClusters | \
jq -r '.'
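Since the first part of a replicationId is the remote cluster ID, the remoteClusters response can be used to translate it into a readable name. This sketch assumes each entry in the response carries uuid, name, hostname, and deleted fields; verify that against the documentation linked above for your version. list_remote_clusters is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the /pools/default/remoteClusters JSON on
# stdin and print "uuid -> name (hostname)" for every remote cluster
# reference that is not marked as deleted.
list_remote_clusters() {
  jq -r '.[] | select(.deleted != true)
    | .uuid + " -> " + .name + " (" + .hostname + ")"'
}

# usage against a live cluster:
# curl --user Administrator:password --silent \
#   http://localhost:8091/pools/default/remoteClusters | list_remote_clusters
```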
Bucket XDCR Operations
GET Bucket Incoming XDCR Operations
To retrieve the incoming write operations that occur on a target cluster due to replication, make the request against your target cluster and bucket.
Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-xdcr-statistics.html#rest-xdcr-stats-operations
Insecure: http://localhost:8091/pools/default/buckets/BUCKET/stats
Secure: https://localhost:18091/pools/default/buckets/BUCKET/stats
Available Stats
Stat name | Description
ep_num_ops_get_meta | The number of metadata read operations per second for the bucket as the target for XDCR
ep_num_ops_set_meta | The number of set operations per second for the bucket as the target for XDCR
ep_num_ops_del_meta | The number of delete operations per second for the bucket as the target for XDCR
xdc_ops | Total XDCR operations per second for this bucket (measured from the sum of the statistics ep_num_ops_del_meta, ep_num_ops_get_meta, and ep_num_ops_set_meta)
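Because xdc_ops is described as the sum of the three *_meta statistics, that relationship can be checked sample by sample. A sketch, assuming the four arrays in .op.samples are aligned to the same timestamps; compare_xdc_ops is a hypothetical name. If the relationship holds for your version, the two columns should agree closely.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a bucket stats response on stdin and, for
# each sample, print the sum of the three *_meta rates next to the
# reported xdc_ops value.
compare_xdc_ops() {
  jq -r '.op.samples
    | [.ep_num_ops_get_meta, .ep_num_ops_set_meta,
       .ep_num_ops_del_meta, .xdc_ops]
    | transpose | .[]
    | "computed=" + ((.[0] + .[1] + .[2]) | tostring)
      + " reported=" + (.[3] | tostring)'
}

# usage against a live cluster:
# curl --user Administrator:password --silent --get --data zoom=minute \
#   http://localhost:8091/pools/default/buckets/travel-sample/stats | compare_xdc_ops
```

transpose turns the four per-stat arrays into one row per sample, which keeps each computed sum next to the matching reported value.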
Example
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/travel-sample/stats | \
jq -r '.op.samples |
  "ep_num_ops_get_meta: " + (.ep_num_ops_get_meta | add / length | tostring) +
  "\nep_num_ops_set_meta: " + (.ep_num_ops_set_meta | add / length | tostring) +
  "\nep_num_ops_del_meta: " + (.ep_num_ops_del_meta | add / length | tostring) +
  "\nxdc_ops: " + (.xdc_ops | add / length | tostring)'
GET XDCR Timestamp-based Conflict Resolution Stats
When using buckets configured with Timestamp-based Conflict Resolution, it is important to monitor the drift-related statistics. When a cluster is the destination for XDCR traffic, its active vBuckets calculate drift from their remote cluster peers.
It is normal for a cluster with closely synchronized clocks to show some drift; in general, the drift reflects how long it took a mutation to be replicated and should remain steady. It is also normal for the active vBucket drift to be zero if no XDCR relationship exists (or if no XDCR traffic is flowing).
Documentation: https://docs.couchbase.com/server/6.0/learn/clusters-and-availability/xdcr-monitor-timestamp-conflict-resolution.html
Insecure: http://localhost:8091/pools/default/buckets/BUCKET/stats
Secure: https://localhost:18091/pools/default/buckets/BUCKET/stats
Available Stats
Stat name | Description
avg_active_timestamp_drift |
avg_replica_timestamp_drift |
ep_active_hlc_drift | The sum of total_abs_drift for the node's active vBuckets
ep_active_hlc_drift_count | The sum of total_abs_drift_count for the node's active vBuckets
ep_replica_hlc_drift | The sum of total_abs_drift for the node's replica vBuckets
ep_replica_hlc_drift_count | The sum of total_abs_drift_count for the node's replica vBuckets
ep_active_ahead_exceptions | The sum of drift_ahead_exceeded for the node's active vBuckets
ep_replica_ahead_exceptions | The sum of drift_ahead_exceeded for the node's replica vBuckets
ep_clock_cas_drift_threshold_exceeded |
Example
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/travel-sample/stats | \
jq -r '.op.samples |
  "avg_active_timestamp_drift: " +
    (.avg_active_timestamp_drift | add / length | tostring) +
  "\navg_replica_timestamp_drift: " +
    (.avg_replica_timestamp_drift | add / length | tostring) +
  "\nep_active_hlc_drift: " +
    (.ep_active_hlc_drift | add / length | tostring) +
  "\nep_active_hlc_drift_count: " +
    (.ep_active_hlc_drift_count | add / length | tostring) +
  "\nep_replica_hlc_drift: " +
    (.ep_replica_hlc_drift | add / length | tostring) +
  "\nep_replica_hlc_drift_count: " +
    (.ep_replica_hlc_drift_count | add / length | tostring) +
  "\nep_active_ahead_exceptions: " +
    (.ep_active_ahead_exceptions | add / length | tostring) +
  "\nep_clock_cas_drift_threshold_exceeded: " +
    (.ep_clock_cas_drift_threshold_exceeded | add / length | tostring)'
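The text above notes that drift should remain steady, so the spread of the drift averages across the sampled window is one possible signal to trend or alert on. A minimal sketch, assuming the same .op.samples response shape; drift_spread is a hypothetical name, and any threshold on the min/max gap would need to come from your own baseline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a bucket stats response on stdin and print
# the min, max, and average of each drift average over the window.
# Stats that are missing or have no samples are skipped.
drift_spread() {
  jq -r '.op.samples
    | ["avg_active_timestamp_drift", "avg_replica_timestamp_drift"][] as $k
    | (.[$k] // []) as $v
    | select($v | length > 0)
    | $k + ": min=" + ($v | min | tostring)
        + " max=" + ($v | max | tostring)
        + " avg=" + ($v | add / length | tostring)'
}

# usage against a live cluster:
# curl --user Administrator:password --silent --get --data zoom=minute \
#   http://localhost:8091/pools/default/buckets/travel-sample/stats | drift_spread
```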