Monitoring Guide - Couchbase


Monitoring Guide

Proactive monitoring and alerting are essential to managing a healthy Couchbase environment. While the Couchbase Web Console provides detailed statistics and basic alerting functionality, it is not intended to be a real-time dashboard and shouldn't be used as the primary operational monitoring utility.

Integration with external monitoring systems is required for two primary purposes: proactive alerting and high-resolution trending. The external monitoring system should be capable of setting alert thresholds on a per-metric basis. As the values of most metrics are workload- and environment-specific, they require establishing a baseline for what is "normal" for your use cases. Trending the Couchbase metrics will help establish the baseline values, and alerts can be configured when point-in-time values exceed the "normal" range. Trended metrics also allow Couchbase administrators to observe resource consumption over time, informing when scaling events will become necessary.
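As a rough illustration of the baseline idea, the sketch below averages a set of trended samples and tests whether a point-in-time value exceeds a multiple of that baseline. The function names `baseline_mean` and `exceeds_baseline` are hypothetical helpers, not part of Couchbase or of any monitoring product, and the default multiplier of 2 is an assumption chosen for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# baseline_mean VALUE...: print the arithmetic mean of the given samples,
# formatted with two decimal places.
baseline_mean() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.2f\n", sum / NR }'
}

# exceeds_baseline VALUE BASELINE [MULTIPLIER]
# Exit status 0 when VALUE is greater than BASELINE * MULTIPLIER.
# MULTIPLIER defaults to 2 (an assumption for this sketch).
exceeds_baseline() {
  awk -v v="$1" -v b="$2" -v m="${3:-2}" 'BEGIN { exit !(v > b * m) }'
}
```

A monitoring agent could call `baseline_mean` on a window of historical samples and then run `exceeds_baseline "$current" "$baseline"` on each poll, raising an alert when the call succeeds. A plain mean is only one possible baseline; percentiles or time-of-day baselines may suit bursty workloads better.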

This document describes how to poll the Couchbase REST API to obtain metrics for an external monitoring system, describes which metrics are most important to monitor, and provides guidance on how to interpret those metrics.

Obtaining Couchbase Metrics

Couchbase exposes monitoring metrics via REST APIs with responses returned in JSON format. There are two types of statistical APIs available: Cluster Manager (port 8091/18091) stats and service-specific administrative stats.

Cluster Manager stats provide statistical sampling for a given service and/or entities at a particular interval. Each response from a /stats endpoint contains a timestamp property recording when each sample was taken; its entries correlate directly with the entries of each available stat.

Every Cluster Manager endpoint supports two optional query string parameters:

zoom

The zoom parameter determines the interval of samples to return in the response. The zoom parameter provides the following granularity:

zoom=minute (default) - Every second for the last minute (60 samples)
zoom=hour - Every four (4) seconds for the last hour (900 samples)
zoom=day - Every minute for the last day (1440 samples)
zoom=week - Every ten (10) minutes for the last week, actually eight (8) days (1152 samples)
zoom=year - Every six (6) hours for the last year (1464 samples)

Due to sample frequency, the number of samples returned is plus or minus one (+-1).

haveTStamp

Requests statistics from this timestamp until the current time. The haveTStamp parameter is specified as UNIX epoch time in milliseconds.
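Because haveTStamp expects milliseconds while most shell tools report seconds, a small conversion step is usually needed. The sketch below shows one way to do it; `epoch_ms_minutes_ago` is a hypothetical helper name, and it assumes a `date` command that supports the `+%s` format.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only.

# epoch_ms_minutes_ago MINUTES: print the UNIX epoch time, in milliseconds,
# for the moment MINUTES minutes before now.
epoch_ms_minutes_ago() {
  local minutes="$1"
  local now_s
  now_s=$(date +%s)                       # current time in whole seconds
  echo $(( (now_s - minutes * 60) * 1000 ))
}
```

The result could then be passed as a query string parameter, for example `--get --data haveTStamp="$(epoch_ms_minutes_ago 5)"` on a curl call to a stats endpoint.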

Couchbase Professional Services

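To limit the results when using the zoom parameter, post-process the results. For example, if you need samples from the last five (5) minutes, set the zoom parameter to hour (one sample every four seconds) and retrieve the last 75 entries from the JSON list (300 seconds / 4 seconds per sample = 75 samples).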
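The arithmetic behind that example can be generalized. The following sketch maps each zoom level to the sampling interval listed earlier and computes how many trailing samples cover a given time window. Both function names are hypothetical and exist only for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# zoom_interval_seconds ZOOM: print the sampling interval, in seconds,
# for a zoom level (values taken from the zoom list in this guide).
zoom_interval_seconds() {
  case "$1" in
    minute) echo 1 ;;
    hour)   echo 4 ;;
    day)    echo 60 ;;
    week)   echo 600 ;;
    year)   echo 21600 ;;
    *)      echo "unknown zoom level: $1" >&2; return 1 ;;
  esac
}

# samples_for_window ZOOM SECONDS: print how many trailing samples are
# needed to cover a window of SECONDS at the given zoom level.
samples_for_window() {
  local interval
  interval=$(zoom_interval_seconds "$1") || return 1
  # Round up so that a partial interval still counts as one sample.
  echo $(( ($2 + interval - 1) / interval ))
}
```

For the five-minute case, `samples_for_window hour 300` yields 75. With jq, a negative array slice such as `.[-75:]` is one way to keep only that many trailing entries of a sample list.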

Polling the APIs

The REST APIs should be polled every minute, either via a local agent or remotely using each node's IP address or hostname. Couchbase REST APIs must be accessed using administrative account credentials; a Read-Only Administrator is recommended for this purpose.

As most of the metrics provided by the REST API are per-node, it is necessary to query every node in the cluster.

Limit the number of requests per API when querying metrics, i.e. return all bucket metrics in one request rather than issuing separate requests per metric. Heavy use of the Couchbase REST APIs can have CPU utilization impacts on the cluster.
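A polling loop along these lines might look like the sketch below. It issues one request per node per cycle against the per-node bucket stats endpoint documented in the Data Service section of this guide. The function names and the `CB_USER` and `CB_PASS` environment variables are hypothetical, and the sketch assumes each node is addressed as host:port on the insecure port.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# node_stats_url NODE BUCKET: print the per-node bucket stats URL,
# where NODE is given as host:port (for example 172.17.0.2:8091).
node_stats_url() {
  printf 'http://%s/pools/default/buckets/%s/nodes/%s/stats\n' "$1" "$2" "$1"
}

# poll_nodes BUCKET NODE...: fetch all bucket stats for each node with a
# single request per node, rather than one request per metric.
# CB_USER and CB_PASS are assumed to hold read-only admin credentials.
poll_nodes() {
  local bucket="$1"
  shift
  local node
  for node in "$@"; do
    curl --silent --user "$CB_USER:$CB_PASS" \
      --get --data zoom=minute \
      "$(node_stats_url "$node" "$bucket")"
    echo
  done
}
```

Scheduling `poll_nodes travel-sample 172.17.0.2:8091 172.17.0.3:8091` once a minute (from cron or an agent) would match the polling interval recommended above; the node addresses here are placeholders.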

Couchbase Service Discovery

Some monitoring systems are capable of discovering new monitoring targets and automatically defining the monitoring profile to be applied. Couchbase supports this by exposing cluster membership, MDS service assignment, and service ports via the Data Service Node API.

Metrics and Services to Monitor

Each section in the list below describes the monitoring metrics exposed by a Couchbase service, with a description of each metric and possible operational responses. Alerts should be configured to be sent from the external monitoring system when metric values fall outside the expected range. Guidance on interpreting the metrics and possible operational responses is provided.

Each guide contains examples of how to call an endpoint and parse the results. These examples use a tool called jq, a lightweight command-line parser for JSON. It is not required and is used for example purposes only. It can be downloaded at https://stedolan.github.io/jq/download

Monitoring: Operating System
Monitoring: Nodes
Monitoring: Data Service
Monitoring: XDCR
Monitoring: Query Service
Monitoring: Index Service
Monitoring: FTS Service
Monitoring: Eventing Service
Monitoring: Logs

Reference Implementations

Couchbase provides a reference monitoring implementation to demonstrate interacting with the available REST APIs.

A sample Nagios plugin is available here.
A complete dockerized monitoring environment is available here.

Third Party Integrations

The following monitoring systems have plugins available for Couchbase. Note that these are third party integrations and may not be complete or follow the best practices set forth in this document.

Couchbase Node Exporter for Prometheus, see the Prometheus Integration Guide for details
AppDynamics
DataDog
Dynatrace
New Relic
SignalFx
Sensu
ManageEngine


Monitoring: Data Service

Buckets Overview

The buckets overview lists all available buckets, along with high-level system information and resource utilization for each bucket in the cluster.

Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-buckets-summary.html
Insecure: http://localhost:8091/pools/default/buckets
Secure: https://localhost:18091/pools/default/buckets

Example

The following example illustrates retrieving all of the buckets in a cluster and displaying basic stats about each bucket.

# --get makes curl send the --data pair as a query string parameter
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data skipMap=true \
  http://localhost:8091/pools/default/buckets | \
  jq -r '.[] |
    "Bucket: " + .name + "\n" +
    "Quota Used: " + (.basicStats.quotaPercentUsed | tostring) + "%\n" +
    "Ops / Sec: " + (.basicStats.opsPerSec | tostring) + "\n" +
    "Disk Fetches: " + (.basicStats.diskFetches | tostring) + "\n" +
    "Item Count: " + (.basicStats.itemCount | tostring) + "\n" +
    "Disk Used: " + (.basicStats.diskUsed / 1024 / 1024 | tostring) + "MB\n" +
    "Data Used: " + (.basicStats.dataUsed / 1024 / 1024 | tostring) + "MB\n" +
    "Memory Used: " + (.basicStats.memUsed / 1024 / 1024 | tostring) + "MB\n"
  '

Note: The skipMap query string parameter is a boolean value that can be used to include or exclude the current vBucket distribution map for the buckets.

Individual Bucket-Level Stats

Bucket metrics provide detailed information about resource consumption, application workload, and internal operations at the bucket level. The following bucket stats are available via the Cluster-Wide or Per-Node endpoints listed below.

Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-bucket-stats.html
Insecure: http://localhost:8091/pools/default/buckets/BUCKET/stats
Secure: https://localhost:18091/pools/default/buckets/BUCKET/stats

Available Stats

Stat name: Description
avg_active_timestamp_drift: Average drift (in seconds) per mutation on active vBuckets
avg_bg_wait_time: Average background fetch time in microseconds
avg_disk_commit_time: Average disk commit time in seconds, from the disk_update histogram of timings
avg_disk_update_time: Average disk update time in microseconds, from the disk_update histogram of timings
avg_replica_timestamp_drift: Average drift (in seconds) per mutation on replica vBuckets
bg_wait_count: Number of background fetch operations
bg_wait_total: Background fetch time in microseconds
bytes_read: Number of bytes per second sent into this bucket
bytes_written: Number of bytes per second sent from this bucket
cas_badval: Number of CAS operations per second using an incorrect CAS ID for data that this bucket contains
cas_hits: Number of CAS operations per second for data that this bucket contains
cas_misses: Number of CAS operations per second for data that this bucket does not contain
cmd_get: Number of get operations serviced by this bucket
cmd_lookup: Number of lookup sub-document operations serviced by this bucket
cmd_set: Number of set operations serviced by this bucket
couch_docs_actual_disk_size: The size of all data files for this bucket, including the data itself, metadata and temporary files
couch_docs_data_size: The size of active data in this bucket
couch_docs_disk_size: The size of active data in this bucket on disk
couch_docs_fragmentation: How much fragmented data there is to be compacted compared to real data for the data files in this bucket
couch_spatial_data_size: The size of all active items in all the spatial indexes for this bucket on disk
couch_spatial_disk_size: The size of all active items in all the spatial indexes for this bucket on disk
couch_spatial_ops: All the spatial index reads
couch_total_disk_size: The total size on disk of all data and view files for this bucket
couch_views_actual_disk_size: The size of all active items in all the indexes for this bucket on disk
couch_views_data_size: The size of active data for all the view indexes in this bucket
couch_views_disk_size: The size of active data for all the view indexes in this bucket on disk
couch_views_fragmentation: How much fragmented data there is to be compacted compared to real data for the view index files in this bucket
couch_views_ops: All the view reads for all design documents, including scatter gather
curr_connections: Number of connections to this server, including connections from external client SDKs, proxies, DCP requests and internal statistic gathering
curr_items: Number of unique items in this bucket (only active items, not replica)
curr_items_tot: Total number of items in this bucket (including replicas)
decr_hits: Number of decrement operations per second for data that this bucket contains
decr_misses: Number of decrement operations per second for data that this bucket does not contain
delete_hits: Number of delete operations per second for this bucket
delete_misses: Number of delete operations per second for data that this bucket does not contain
disk_commit_count: The number of disk commits
disk_commit_total: The total time spent committing to disk
disk_update_count: The total number of disk updates
disk_update_total: The total time spent updating disk
disk_write_queue: Number of items waiting to be written to disk in this bucket
ep_active_ahead_exceptions: Total number of ahead exceptions for all active vBuckets
ep_active_hlc_drift: The sum of total_abs_drift for the node's active vBuckets
ep_active_hlc_drift_count: The sum of total_abs_drift_count for the node's active vBuckets
ep_bg_fetched: Number of reads per second from disk for this bucket
ep_cache_miss_rate: Percentage of reads per second to this bucket from disk as opposed to RAM


ep_clock_cas_drift_threshold_exceeded
ep_data_read_failed: Number of disk read failures
ep_data_write_failed: Number of disk write failures
ep_dcp_2i_backoff: Number of backoffs for index DCP connections
ep_dcp_2i_count: Number of internal secondary index DCP connections in this bucket
ep_dcp_2i_items_remaining: Number of secondary index items remaining to be sent to consumer in this bucket
ep_dcp_2i_items_sent: Number of secondary index items per second being sent for a producer for this bucket
ep_dcp_2i_producer_count: Number of secondary index senders for this bucket
ep_dcp_2i_total_backlog_size: Total size in bytes of the DCP backlog for secondary indexes
ep_dcp_2i_total_bytes: Number of bytes per second being sent for secondary indexes DCP connections
ep_dcp_cbas_backoff: Number of backoffs for Analytics DCP connections
ep_dcp_cbas_count: Number of internal Analytics DCP connections in this bucket
ep_dcp_cbas_items_remaining: Number of Analytics items remaining to be sent to consumer in this bucket
ep_dcp_cbas_items_sent: Number of Analytics items per second being sent for a producer for this bucket
ep_dcp_cbas_producer_count: Number of Analytics senders for this bucket
ep_dcp_cbas_total_backlog_size: Total size in bytes of the DCP backlog for Analytics
ep_dcp_cbas_total_bytes: Number of bytes per second being sent for Analytics DCP connections
ep_dcp_eventing_backoff: Number of backoffs for Eventing DCP connections
ep_dcp_eventing_count: Number of internal Eventing DCP connections in this bucket
ep_dcp_eventing_items_remaining: Number of Eventing items remaining to be sent to consumer in this bucket
ep_dcp_eventing_items_sent: Number of Eventing items per second being sent for a producer for this bucket
ep_dcp_eventing_producer_count: Number of Eventing senders for this bucket
ep_dcp_eventing_total_backlog_size: Total size in bytes of the DCP backlog for Eventing
ep_dcp_eventing_total_bytes: Number of bytes per second being sent for Eventing DCP connections
ep_dcp_fts_backoff: Number of backoffs for FTS DCP connections
ep_dcp_fts_count: Number of internal FTS DCP connections in this bucket
ep_dcp_fts_items_remaining: Number of FTS items remaining to be sent to consumer in this bucket
ep_dcp_fts_items_sent: Number of FTS items per second being sent for a producer for this bucket
ep_dcp_fts_producer_count: Number of FTS senders for this bucket
ep_dcp_fts_total_backlog_size: Total size in bytes of the DCP backlog for FTS
ep_dcp_fts_total_bytes: Number of bytes per second being sent for FTS DCP connections

ep_dcp_other_backoff: Number of backoffs for other DCP connections
ep_dcp_other_count: Number of other DCP connections in this bucket
ep_dcp_other_items_remaining: Number of items remaining to be sent to consumer in this bucket
ep_dcp_other_items_sent: Number of items per second being sent for a producer for this bucket
ep_dcp_other_producer_count: Number of other senders for this bucket
ep_dcp_other_total_backlog_size: Total size in bytes of the DCP backlog for other DCP connections
ep_dcp_other_total_bytes: Number of bytes per second being sent for other DCP connections for this bucket
ep_dcp_replica_backoff: Number of backoffs for replication DCP connections
ep_dcp_replica_count: Number of internal replication DCP connections in this bucket
ep_dcp_replica_items_remaining: Number of replication items remaining to be sent to consumer in this bucket
ep_dcp_replica_items_sent: Number of replication items per second being sent for a producer for this bucket
ep_dcp_replica_producer_count: Number of replication senders for this bucket
ep_dcp_replica_total_backlog_size: Total size in bytes of the DCP backlog for replication
ep_dcp_replica_total_bytes: Number of bytes per second being sent for replication DCP connections
ep_dcp_views+indexes_backoff: Number of backoffs for view/index DCP connections
ep_dcp_views+indexes_count: Number of internal view/index DCP connections in this bucket
ep_dcp_views+indexes_items_remaining: Number of view/index items remaining to be sent to consumer in this bucket
ep_dcp_views+indexes_items_sent: Number of view/index items per second being sent for a producer for this bucket
ep_dcp_views+indexes_producer_count: Number of view/index senders for this bucket
ep_dcp_views+indexes_total_backlog_size: Total size in bytes of the DCP backlog for views/indexes
ep_dcp_views+indexes_total_bytes: Number of bytes per second being sent for views/indexes DCP connections
ep_dcp_views_backoff: Number of backoffs for view DCP connections
ep_dcp_views_count: Number of internal view DCP connections in this bucket
ep_dcp_views_items_remaining: Number of view items remaining to be sent to consumer in this bucket
ep_dcp_views_items_sent: Number of view items per second being sent for a producer for this bucket
ep_dcp_views_producer_count: Number of view senders for this bucket
ep_dcp_views_total_backlog_size: Total size in bytes of the DCP backlog for views
ep_dcp_views_total_bytes: Number of bytes per second being sent for view DCP connections
ep_dcp_xdcr_backoff: Number of backoffs for XDCR DCP connections
ep_dcp_xdcr_count: Number of internal XDCR DCP connections in this bucket
ep_dcp_xdcr_items_remaining: Number of XDCR items remaining to be sent to consumer in this bucket
ep_dcp_xdcr_items_sent: Number of XDCR items per second being sent for a producer for this bucket
ep_dcp_xdcr_producer_count: Number of XDCR senders for this bucket
ep_dcp_xdcr_total_backlog_size: Total size in bytes of the DCP backlog for XDCR
ep_dcp_xdcr_total_bytes: Number of bytes per second being sent for XDCR DCP connections

ep_diskqueue_drain: Total number of items per second being written to disk in this bucket
ep_diskqueue_fill: Total number of items per second being put on the disk queue in this bucket
ep_diskqueue_items: Total number of items waiting to be written to disk in this bucket
ep_flusher_todo: Number of items currently being written
ep_item_commit_failed: Number of times a transaction failed to commit due to storage errors
ep_kv_size: Total amount of user data cached in RAM in this bucket
ep_max_size: The maximum amount of memory this bucket can use
ep_mem_high_wat: High water mark for auto-evictions
ep_mem_low_wat: Low water mark for auto-evictions
ep_meta_data_memory: Total amount of item metadata consuming RAM in this bucket
ep_num_non_resident: The number of non-resident items
ep_num_ops_del_meta: Number of delete operations per second for this bucket as the target for XDCR
ep_num_ops_del_ret_meta: Number of delRetMeta operations
ep_num_ops_get_meta: Number of metadata read operations per second for this bucket as the target for XDCR
ep_num_ops_set_meta: Number of set operations per second for this bucket as the target for XDCR


ep_num_ops_set_ret_meta
ep_num_value_ejects: Total number of items per second being ejected to disk in this bucket
ep_oom_errors: Number of times unrecoverable OOMs happened while processing operations
ep_ops_create: Total number of new items being inserted into this bucket
ep_ops_update: Number of items updated on disk per second for this bucket
ep_overhead: Extra memory used by transient data like persistence queues, replication queues, checkpoints, etc.
ep_queue_size: Number of items queued for storage
ep_replica_ahead_exceptions: Total number of ahead exceptions for all replica vBuckets
ep_replica_hlc_drift: The sum of total_abs_drift for the node's replica vBuckets
ep_replica_hlc_drift_count: The sum of total_abs_drift_count for the node's replica vBuckets
ep_resident_items_rate: Percentage of all items cached in RAM in this bucket
ep_tmp_oom_errors: Number of back-offs sent per second to client SDKs due to "out of memory" situations from this bucket
ep_vb_total: Total number of vBuckets for this bucket
evictions: Number of items per second evicted from this bucket
get_hits: Number of get operations per second for data that this bucket contains
get_misses: Number of get operations per second for data that this bucket does not contain
hibernated_requests: Number of hibernated requests
hibernated_waked: Number of times hibernated requests were woken
hit_ratio: Percentage of get requests served with data from this bucket
incr_hits: Number of increment operations per second for data that this bucket contains
incr_misses: Number of increment operations per second for data that this bucket does not contain
mem_used: Amount of memory used
misses: Total number of operations per second for data that this bucket does not contain
ops: Total number of operations per second (including XDCR) to this bucket
rest_requests
swap_total
swap_used


vb_active_eject: Number of items per second being ejected to disk from "active" vBuckets in this bucket
vb_active_itm_memory: Amount of active user data cached in RAM in this bucket
vb_active_meta_data_memory: Amount of active item metadata consuming RAM in this bucket
vb_active_num: Number of vBuckets in the "active" state for this bucket
vb_active_num_non_resident: Number of non-resident items
vb_active_ops_create: New items per second being inserted into "active" vBuckets in this bucket
vb_active_ops_update: Number of items updated on disk per second for this bucket
vb_active_queue_age: Sum of disk queue item age in milliseconds for "active" vBuckets
vb_active_queue_drain: Number of active items per second being written to disk in this bucket
vb_active_queue_fill: Number of active items per second being put on the active item disk queue in this bucket
vb_active_queue_size: Number of active items waiting to be written to disk in this bucket
vb_active_resident_items_ratio: Percentage of active items cached in RAM in this bucket
vb_active_sync_write_aborted_count: Number of vBucket writes aborted
vb_active_sync_write_accepted_count: Number of vBucket writes accepted
vb_active_sync_write_committed_count: Number of vBucket writes committed
vb_avg_active_queue_age: Average age in seconds of active items in the active item queue for this bucket
vb_avg_pending_queue_age: Average age in seconds of pending items in the pending item queue for this bucket; should be transient during rebalancing
vb_avg_replica_queue_age: Average age in seconds of replica items in the replica item queue for this bucket
vb_avg_total_queue_age: Average age in seconds of all items in the disk write queue for this bucket
vb_pending_curr_items: Number of items in "pending" vBuckets in this bucket; should be transient during rebalancing
vb_pending_eject: Number of items per second being ejected to disk from "pending" vBuckets in this bucket; should be transient during rebalancing
vb_pending_itm_memory: Amount of pending user data cached in RAM in this bucket; should be transient during rebalancing
vb_pending_meta_data_memory: Amount of pending item metadata consuming RAM in this bucket; should be transient during rebalancing


vb_pending_num: Number of vBuckets in the "pending" state for this bucket; should be transient during rebalancing
vb_pending_num_non_resident: Number of non-resident items
vb_pending_ops_create: New items per second being inserted into "pending" vBuckets in this bucket; should be transient during rebalancing
vb_pending_ops_update: Number of items updated on disk per second for this bucket
vb_pending_queue_age: Sum of disk queue item age in milliseconds
vb_pending_queue_drain: Number of pending items per second being written to disk in this bucket; should be transient during rebalancing
vb_pending_queue_fill: Number of pending items per second being put on the pending item disk queue in this bucket; should be transient during rebalancing
vb_pending_queue_size: Number of pending items waiting to be written to disk in this bucket; should be transient during rebalancing
vb_pending_resident_items_ratio: Percentage of items in pending state vBuckets cached in RAM in this bucket
vb_replica_curr_items: Number of items in "replica" vBuckets in this bucket
vb_replica_eject: Number of items per second being ejected to disk from "replica" vBuckets in this bucket
vb_replica_itm_memory: Amount of replica user data cached in RAM in this bucket
vb_replica_meta_data_memory: Amount of replica item metadata consuming RAM in this bucket
vb_replica_num: Number of vBuckets in the "replica" state for this bucket
vb_replica_num_non_resident: Number of non-resident items
vb_replica_ops_create: New items per second being inserted into "replica" vBuckets in this bucket
vb_replica_ops_update: Number of items updated on disk per second for this bucket
vb_replica_queue_age: Sum of disk queue item age in milliseconds for "replica" vBuckets
vb_replica_queue_drain: Number of replica items per second being written to disk in this bucket
vb_replica_queue_fill: Number of replica items per second being put on the replica item disk queue in this bucket
vb_replica_queue_size: Number of replica items waiting to be written to disk in this bucket
vb_replica_resident_items_ratio: Percentage of replica items cached in RAM in this bucket
vb_total_queue_age: Sum of disk queue item age in milliseconds


xdc_ops: Incoming XDCR operations per second for this bucket

GET Cluster-Wide Individual Bucket Stats

These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster and the best practice is to monitor each node individually.

Insecure: http://localhost:8091/pools/default/buckets/BUCKET/stats
Secure: https://localhost:18091/pools/default/buckets/BUCKET/stats

Example: With an average for all samples

BUCKET="travel-sample"

# --get makes curl send the --data pair as a query string parameter
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  "http://localhost:8091/pools/default/buckets/$BUCKET/stats" | \
  jq -r '.op.samples | to_entries[] | select(.key != "timestamp") |
    .key + ": " + (.value | add / length | tostring)'

GET Node-Level Individual Bucket Stats

Each node in the cluster running the data service should be monitored individually using the endpoint listed below.

Insecure: http://localhost:8091/pools/default/buckets/BUCKET/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/BUCKET/nodes/NODE/stats

Example: Stats for Individual Node

The following example demonstrates how to retrieve the bucket stats for a specific node.

BUCKET="travel-sample"
NODE="172.17.0.2:8091"

# --get makes curl send the --data pair as a query string parameter
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data zoom=minute \
  "http://localhost:8091/pools/default/buckets/$BUCKET/nodes/$NODE/stats" | \
  jq -r -c '.op.samples |
    "cmd_get: " + (.cmd_get | add / length | tostring) +
    "\ncmd_set: " + (.cmd_set | add / length | tostring) +
    "\ncurr_connections: " + (.curr_connections | add / length | tostring) +
    "\ncurr_items: " + (.curr_items | add / length | tostring) +
    "\ncurr_items_tot: " + (.curr_items_tot | add / length | tostring) +
    "\ndecr_hits: " + (.decr_hits | add / length | tostring) +
    "\ndecr_misses: " + (.decr_misses | add / length | tostring) +
    "\ndelete_hits: " + (.delete_hits | add / length | tostring) +
    "\ndelete_misses: " + (.delete_misses | add / length | tostring) +
    "\nep_bg_fetched: " + (.ep_bg_fetched | add / length | tostring) +
    "\nevictions: " + (.evictions | add / length | tostring) +
    "\nget_hits: " + (.get_hits | add / length | tostring) +
    "\nget_misses: " + (.get_misses | add / length | tostring) +
    "\nhit_ratio: " + (.hit_ratio | add / length | tostring) +
    "\nincr_hits: " + (.incr_hits | add / length | tostring) +
    "\nincr_misses: " + (.incr_misses | add / length | tostring) +
    "\nmisses: " + (.misses | add / length | tostring) +
    "\nops: " + (.ops | add / length | tostring) +
    "\nxdc_ops: " + (.xdc_ops | add / length | tostring)
  '

Key Metrics to Monitor

Couchbase Metric, Description, Response

mem_used, ep_kv_size, ep_mem_high_wat
Description: These metrics together give insight into how memory is used by the data service. mem_used / ep_kv_size represents fragmentation within the KV engine: mem_used is the actual memory utilization, whereas ep_kv_size is the sum of the metadata and values expected to be in RAM. mem_used / memoryTotal should be less than 90%. ep_kv_size / ep_mem_high_wat represents your quota utilization. ep_mem_high_wat is the maximum RAM the bucket is expected to use.
Response: The amount of fragmentation (mem_used / ep_kv_size) you should expect will depend on the workload, but in general, alert if this value exceeds 115%. If mem_used / memoryTotal is consistently near 90%, that is a trigger to add additional memory or nodes to the cluster. If this value approaches 100%, then you could face an Out of Memory error and the Couchbase process could be killed or crash. Once ep_kv_size = ep_mem_high_wat, Couchbase will start ejecting data to disk. This may be expected depending on your use case, but caching use cases will always want ep_kv_size to be lower than ep_mem_high_wat.

ep_meta_data_memory
Description: The amount of memory used specifically for document metadata. In Value Ejection mode, it's possible for document metadata to displace document values in cache, reducing cache hit rates and increasing latencies.
Response: Create a baseline for ep_meta_data_memory / ep_mem_high_wat. If this value exceeds 30% and vb_active_resident_items_ratio is not 100%, consider configuring Full Ejection on the bucket.

ep_queue_size
Description: The amount of data waiting to be written to disk. A large value typically indicates the server is disk IO bound. If this value exceeds 1,000,000 items, the server will start sending tmp_oom (backoff) messages to the application.
Response: Create a baseline for this value, as "normal" will depend on your workload and available disk IO. Alert at 2x of baseline. You may need to add nodes or increase the per-node disk IO.

ep_flusher_todo
Description: The number of items currently being written to disk. Combined with ep_queue_size, this represents the total disk write queue on the server.
Response: Create a baseline for this value, as "normal" will depend on your workload and available disk IO. Alert at 2x of baseline.

vb_avg_total_queue_age
Description: The average time in seconds that a write is in queue before persisting to disk. This represents the local node's exposure to potential data loss.
Response: Create a baseline for this value, as "normal" will depend on your workload and available disk IO. Alert at 2x of baseline.

ep_dcp_replica_items_remaining
Description: The number of items in the inter-node replication queue. This represents the cluster's exposure to potential data loss.
Response: Create a baseline for this value, as "normal" will depend on your workload and available network IO. Alert at 2x of baseline.

ops
Description: The total number of KV operations occurring against the node.
Response: Create a baseline for this value, as "normal" will depend on your workload. Alert at 2x of baseline. Abnormally high operations could mean an unexpected change to the application or unusual application traffic patterns.

cmd_get
Description: The number of KV GET operations occurring against the node.
Response: Create a baseline for this value, as "normal" will depend on your workload. Alert at 3x of baseline. Abnormally high operations could mean an unexpected change to the application or unusual application traffic patterns.

cmd_set
Description: The number of KV SET operations occurring against the node.
Response: Create a baseline for this value, as "normal" will depend on your workload. Alert at 2x of baseline. Abnormally high operations could mean an unexpected change to the application or unusual application traffic patterns.

delete_hits
Description: The number of KV DELETE operations occurring against the node.
Response: Create a baseline for this value, as "normal" will depend on your workload. Alert at 2x of baseline. Abnormally high operations could mean an unexpected change to the application or unusual application traffic patterns.

ep_bg_fetched
Description: The number of items fetched from disk (cache misses).
Response: This value should be close to 0. Establish a baseline for this metric and alert at 2x of baseline.

curr_connections
Description: The number of client (SDK) connections to Couchbase. More connections will result in increased CPU utilization.
Response: Create a baseline for your environment. Alert at 2x of baseline. Couchbase will begin rejecting connections above 30,000.

curr_items
Description: The number of items currently active on this node. During warmup, this will be 0 until complete.
Response: Once a baseline number of objects has been established, substantial changes to the baseline could indicate unexpected failures within Couchbase or an application bug.

vb_active_resident_items_ratio
Description: The percentage of active data that is memory resident.
Response: For caching use cases, this value should be close to 100%. If this value falls below 100% and ep_bg_fetched is greater than 0, this indicates the bucket needs more RAM. The value should never be less than 15%.

vb_replica_resident_items_ratio
Description: The percentage of replica data that is memory resident. A higher percentage for this value will ensure lower latency data access following a failover.
Response: [...] on business requirements for object latency during a failure scenario. The value should never be less than 15%.

ep_tmp_oom_errors
Description: Number of times temporary OOMs were sent to a client. Represents high transient memory pressure within the system.
Response: This error indicates temporary memory pressure after the server has reached ep_mem_high_wat and is ejecting not recently accessed values. Frequent errors indicate the need to scale the cluster.

ep_oom_errors
Description: Number of times permanent OOMs were sent to a client. Represents very high consistent memory pressure within the system.
Response: This error indicates the bucket has exceeded its total memory allocation and immediately requires additional memory or nodes to be added.

ep_dcp_views_items_remaining, ep_dcp_2i_items_remaining
Description: The number of documents awaiting indexing for views and GSI.
Response: Create a baseline for this value, as "normal" will depend on your workload and available disk IO. Alert at 2x of baseline.

ep_dcp_replica_backoff
Description: Indicates the number of times an internal replication was instructed to slow down.
Response: Alert if this value is greater than zero. This indicates a resource constraint within the cluster that should be investigated.

ep_dcp_xdcr_backoff
Description: Indicates the number of times an XDCR replication was instructed to slow down.
Response: Should be monitored as a rate. Create a baseline for your environment, as "normal" will depend on workload patterns and XDCR bandwidth limits. Alert at 2x of baseline.

couch_docs_fragmentation
Description: The percentage of data file fragmentation.
Response: By default, compaction should start when this value hits 30%. If this value consistently exceeds 30%, then this typically indicates disk IO contention or a problem with compaction starting that should be investigated.

couch_views_fragmentation
Description: The percentage of View index fragmentation.
Response: By default, compaction should start when this value hits 30%. If this value significantly exceeds 30%, then this typically indicates disk IO contention or a problem with compaction starting that should be investigated.

vb_replica_num
Description: The number of replica vBuckets.
Response: If this value falls below (1024 * the number of configured replicas) / the number of servers, it indicates that a rebalance is required.

vb_active_num
Description: The number of active vBuckets.
Response: This value should always equal 1024 / the number of servers. If it does not, it indicates a node failure and that a failover + rebalance is required.

Example

The following example illustrates getting the verbose stats for an individual bucket.

BUCKET='travel-sample'

# output the stats for the bucket
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/$BUCKET/stats | \
  jq -r -c '.op.samples | to_entries | sort_by(.key) | .[] |
    (.key) + ": " + (.value | add / length | tostring)'

Example

The following example illustrates getting an individual stat for a single bucket.

BUCKET='travel-sample'
STAT='cmd_get'

# output the stat for the bucket, per node
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/$BUCKET/stats/$STAT | \
  jq -r -c '.nodeStats | to_entries | sort_by(.key) | .[] |
    (.key) + ": " + (.value | add / length | tostring)'

Example

This example shows how to retrieve all stats for all buckets.

# loop over each of the buckets
for bucket in $(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/default/buckets | \
  jq -r '.[] | .name')
do
  echo ""
  echo "Bucket: $bucket"
  echo "================================================================"
  # output the stats for the bucket
  curl \
    --user Administrator:password \
    --silent \
    --request GET \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/$bucket/stats | \
    jq -r -c '.op.samples | to_entries | sort_by(.key) | .[] |
      (.key) + ": " + (.value | add / length | tostring)'
done
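The jq expression `add / length` used in the examples above collapses the array of samples returned for each stat into a single average. As a rough sketch of the same reduction outside of jq (the function name `average_samples` is hypothetical), the samples can be passed as arguments and averaged with awk:

```shell
#!/bin/sh
# average_samples VALUE...
# Prints the arithmetic mean of the given samples, or 0 when none are given.
average_samples() {
  if [ "$#" -eq 0 ]; then
    echo "0"
    return 0
  fi
  printf '%s\n' "$@" | awk '{ sum += $1 } END { print sum / NR }'
}
```

An average suits gauge-style stats. For stats that are running totals, the last sample or the difference between the first and last sample may be more meaningful than the mean.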


Monitoring: Eventing Service

Eventing Service-Level Stats

The Eventing stats are an aggregate for all of the Eventing Functions deployed, either for the entire cluster or a specific node.

Available Stats

Stat name  Description

eventing/bucket_op_exception_count  Total number of bucket operations inside of an Eventing function which have resulted in an exception

eventing/checkpoint_failure_count  Total number of failures when checkpointing last processed sequence numbers by the V8 worker. Failures are retried using exponential backoff until timeout.

eventing/dcp_backlog  Remaining mutations to process

eventing/failed_count  Total number of failed Eventing function operations

eventing/n1ql_op_exception_count  Total number of N1QL operations inside of an Eventing function which have resulted in an exception

eventing/on_delete_failure  Total number of OnDelete handler executions that have failed for all functions

eventing/on_delete_success  Total number of OnDelete handler executions that have succeeded for all functions

eventing/on_update_failure  Total number of OnUpdate handler executions that have failed for all functions

eventing/on_update_success  Total number of OnUpdate handler executions that have succeeded for all functions

eventing/processed_count  Total number of mutations that have been processed

eventing/timeout_count  Total number of handler executions that were terminated because the handler ran longer than the configured script timeout

GET Cluster Eventing Service Stats

These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster and the best practice is to monitor each node individually.

Insecure: http://localhost:8091/pools/default/buckets/@eventing/stats

Secure: https://localhost:18091/pools/default/buckets/@eventing/stats

Example

The following example demonstrates how to retrieve the eventing service stats for the cluster.

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@eventing/stats | \
  jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
    select(.key | split("/") | length == 2) |
    (.key) + ": " +
    (.value | add / length | tostring)'

GET Node-Level Eventing Service Stats

Each node in the cluster running the eventing service should be monitored individually using the endpoint listed below.

Insecure: http://localhost:8091/pools/default/buckets/@eventing/nodes/NODE/stats

Secure: https://localhost:18091/pools/default/buckets/@eventing/nodes/NODE/stats
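The cluster-wide and per-node endpoints in this guide follow one pattern: `@SERVICE/stats` for the cluster and `@SERVICE/nodes/NODE/stats` for a node, on port 8091 (insecure) or 18091 (secure). The sketch below builds these URLs; `stats_url` is a hypothetical helper, and it assumes the Cluster Manager is reachable on localhost as in the examples here.

```shell
#!/bin/sh
# stats_url SERVICE [NODE] [secure]
# Prints the stats URL for a service (e.g. eventing, fts, index).
# With NODE set, the per-node endpoint is used; pass "" to skip it.
stats_url() {
  if [ "${3:-insecure}" = "secure" ]; then
    base="https://localhost:18091"
  else
    base="http://localhost:8091"
  fi
  if [ -n "${2:-}" ]; then
    echo "$base/pools/default/buckets/@$1/nodes/$2/stats"
  else
    echo "$base/pools/default/buckets/@$1/stats"
  fi
}
```

For example, `curl --user Administrator:password --silent "$(stats_url eventing 172.17.0.2:8091)"` would query the per-node eventing endpoint shown above.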

Example: Stats for Individual Node

The following example demonstrates how to retrieve the eventing service stats for a specific node in the cluster.

NODE="172.17.0.2:8091"

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@eventing/nodes/$NODE/stats | \
  jq -r -c '.op.samples |
    "eventing/bucket_op_exception_count: " +
      (.["eventing/bucket_op_exception_count"] | add / length | tostring) +
    "\neventing/checkpoint_failure_count: " +
      (.["eventing/checkpoint_failure_count"] | add / length | tostring) +
    "\neventing/dcp_backlog: " +
      (.["eventing/dcp_backlog"] | add / length | tostring) +
    "\neventing/failed_count: " +
      (.["eventing/failed_count"] | add / length | tostring) +
    "\neventing/n1ql_op_exception_count: " +
      (.["eventing/n1ql_op_exception_count"] | add / length | tostring) +
    "\neventing/on_delete_failure: " +
      (.["eventing/on_delete_failure"] | add / length | tostring) +
    "\neventing/on_delete_success: " +
      (.["eventing/on_delete_success"] | add / length | tostring) +
    "\neventing/on_update_failure: " +
      (.["eventing/on_update_failure"] | add / length | tostring) +
    "\neventing/on_update_success: " +
      (.["eventing/on_update_success"] | add / length | tostring) +
    "\neventing/processed_count: " +
      (.["eventing/processed_count"] | add / length | tostring) +
    "\neventing/timeout_count: " +
      (.["eventing/timeout_count"] | add / length | tostring)'

Example: Stats for Each Node Separately

# loop over each of the nodes running the eventing service
for node in $(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] |
    select(.services | contains(["eventing"]) == true) |
    .hostname'
  )
do
  echo "$node Function Stats"
  echo "-------------------------------------------------------"
  # get the eventing stats for the specific node
  curl \
    --user Administrator:password \
    --silent \
    --request GET \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/@eventing/nodes/$node/stats | \
    jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
      select(.key | split("/") | length == 2) |
      (.key | split("/")[1]) + ": " +
      (.value | add / length | tostring)'
done

Key Metrics to Monitor

Couchbase Metric  Description  Response

eventing/bucket_op_exception_count
eventing/failed_count
eventing/n1ql_op_exception_count
eventing/on_delete_failure
eventing/on_update_failure
eventing/timeout_count

Any exceptions or failures should be monitored.

For these values "normal" is 0. Any value other than 0 indicates that exceptions are being thrown and should be investigated.

eventing/dcp_backlog

The number of items to be processed.

Create a baseline for this value, as "normal" will depend on your workload and number of functions. Alert at 2x of baseline.
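The "alert at 2x of baseline" rule can be expressed as a small comparison. This is only a sketch: `exceeds_baseline` is a hypothetical helper, the baseline is assumed to have been measured separately by trending the metric, and awk is used so that fractional averages compare correctly.

```shell
#!/bin/sh
# exceeds_baseline VALUE BASELINE [FACTOR]
# Succeeds (exit 0) when VALUE is greater than BASELINE * FACTOR.
# FACTOR defaults to 2, matching the "2x of baseline" guidance.
exceeds_baseline() {
  awk -v v="$1" -v b="$2" -v f="${3:-2}" 'BEGIN { exit !(v > b * f) }'
}
```

For example, with a measured baseline of 1000 for eventing/dcp_backlog, `exceeds_baseline 2500 1000 && echo "ALERT"` would raise an alert, while a value of 1500 would not.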

Eventing Function-Level Stats

The Eventing stats for a specific function are available only once the function has been deployed. The same stats that are available for the service as a whole are also available on a per-function basis and can be retrieved for the entire cluster or a specific node in the cluster.

Available Stats

Stat name  Description

eventing/function_name/bucket_op_exception_count  Total number of bucket operations inside of an Eventing function which have resulted in an exception for the function

eventing/function_name/checkpoint_failure_count  Total number of checkpoint failures for the function

eventing/function_name/dcp_backlog  Remaining mutations to process

eventing/function_name/failed_count  Total number of failed Eventing function operations for the function

eventing/function_name/n1ql_op_exception_count  Total number of N1QL operations inside of an Eventing function which have resulted in an exception for the function

eventing/function_name/on_delete_failure  Total number of OnDelete handler executions that have failed for the function

eventing/function_name/on_delete_success  Total number of OnDelete handler executions that have succeeded for the function

eventing/function_name/on_update_failure  Total number of OnUpdate handler executions that have failed for the function

eventing/function_name/on_update_success  Total number of OnUpdate handler executions that have succeeded for the function

eventing/function_name/processed_count  Total number of mutations that have been processed for the function

eventing/function_name/timeout_count  Total number of handler executions that have resulted in a timeout for the function


GET Cluster Eventing Function Stats

These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster and the best practice is to monitor each node individually.

Insecure: http://localhost:8091/pools/default/buckets/@eventing/stats

Secure: https://localhost:18091/pools/default/buckets/@eventing/stats

Example

The following example demonstrates how to retrieve the eventing function stats for the cluster. Function-level stat keys have three segments (eventing/function_name/stat), so the filter keeps only keys of that length.

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@eventing/stats | \
  jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
    select(.key | split("/") | length == 3) |
    (.key) + ": " +
    (.value | add / length | tostring)'
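The examples in this section separate service-level stats from function-level stats by counting the "/"-separated segments of each key: two segments for the service, three for a function. A minimal sketch of that classification, using a hypothetical helper named `stat_level`, is shown below.

```shell
#!/bin/sh
# stat_level KEY
# Prints "service" for keys such as eventing/dcp_backlog,
# "function" for keys such as eventing/my_func/dcp_backlog,
# and "other" for anything else (for example, timestamp).
stat_level() {
  segments=$(printf '%s\n' "$1" | awk -F/ '{ print NF }')
  case "$segments" in
    2) echo "service" ;;
    3) echo "function" ;;
    *) echo "other" ;;
  esac
}
```

This mirrors the jq filters `split("/") | length == 2` and `split("/") | length == 3` used in the examples; the function name `my_func` in the comment is a placeholder.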

GET Eventing Function Stats per Node

Each node in the cluster running the eventing service should be monitored individually. However, as functions can be dynamic, from a manageability standpoint it will be easier to monitor the aggregate stats of the service. Each individual function can still be monitored if you so choose.

Insecure: http://localhost:8091/pools/default/buckets/@eventing/nodes/NODE/stats

Secure: https://localhost:18091/pools/default/buckets/@eventing/nodes/NODE/stats

Example

The following example demonstrates how to retrieve the specific eventing function stats for the node.

NODE="172.17.0.2:8091"

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@eventing/nodes/$NODE/stats | \
  jq -r '.op.samples as $stats
    | $stats | [
        keys | .[] | select(. | split("/") | length == 3) | split("/")[1]
      ] | sort | unique as $funcs
    | $funcs | .[] |
    "Function: " + . +
    "\n----------------------------------------------------------------" +
    "\nbucket_op_exception_count: " +
      ($stats["eventing/" + . + "/bucket_op_exception_count"] | add | tostring) +
    "\ncheckpoint_failure_count: " +
      ($stats["eventing/" + . + "/checkpoint_failure_count"] | add | tostring) +
    "\ndcp_backlog: " +
      ($stats["eventing/" + . + "/dcp_backlog"] | add | tostring) +
    "\nfailed_count: " +
      ($stats["eventing/" + . + "/failed_count"] | add | tostring) +
    "\nn1ql_op_exception_count: " +
      ($stats["eventing/" + . + "/n1ql_op_exception_count"] | add | tostring) +
    "\non_delete_failure: " +
      ($stats["eventing/" + . + "/on_delete_failure"] | add / length | tostring) +
    "\non_delete_success: " +
      ($stats["eventing/" + . + "/on_delete_success"] | add / length | tostring) +
    "\non_update_failure: " +
      ($stats["eventing/" + . + "/on_update_failure"] | add / length | tostring) +
    "\non_update_success: " +
      ($stats["eventing/" + . + "/on_update_success"] | add / length | tostring) +
    "\nprocessed_count: " +
      ($stats["eventing/" + . + "/processed_count"] | add / length | tostring) +
    "\ntimeout_count: " +
      ($stats["eventing/" + . + "/timeout_count"] | add | tostring)
  '

Key Metrics to Monitor

Couchbase Metric  Description  Response

eventing/func_name/bucket_op_exception_count
eventing/func_name/failed_count
eventing/func_name/n1ql_op_exception_count
eventing/func_name/on_delete_failure
eventing/func_name/on_update_failure
eventing/func_name/timeout_count

Any exceptions or failures should be monitored.

For these values "normal" is 0. Any value other than 0 indicates that exceptions are being thrown and should be investigated.

eventing/func_name/dcp_backlog

The number of items to be processed.

Create a baseline for this value, as "normal" will depend on your workload and number of functions. Alert at 2x of baseline.


Monitoring: Full-Text Search Service

GET Full-Text Search Indexes

Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-fts-indexing.html#index-definition

http://localhost:8094/api/index

Retrieves all index definitions and configurations.

Example

The following example illustrates how to retrieve each FTS index name.

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8094/api/index | \
  jq -r '.indexDefs.indexDefs | keys | .[]'

FTS Service-Level Stats

Available Stats

Stat name  Description

fts_curr_batches_blocked_by_herder  The number of batches blocked by the herder

fts_num_bytes_used_ram  The number of bytes used in memory for the FTS service

fts_total_queries_rejected_by_herder  The number of queries rejected by the herder

GET Cluster FTS Service Stats

These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster and the best practice is to monitor each node individually.

Insecure: http://localhost:8091/pools/default/buckets/@fts/stats

Secure: https://localhost:18091/pools/default/buckets/@fts/stats

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@fts/stats | \
  jq -r '.op.samples |
    "fts_num_bytes_used_ram: " + (.fts_num_bytes_used_ram | add / length | tostring)'

GET Node-Level FTS Service Stats

Each node in the cluster running the FTS service should be monitored individually using the endpoint listed below.

Insecure: http://localhost:8091/pools/default/buckets/@fts/nodes/NODE/stats

Secure: https://localhost:18091/pools/default/buckets/@fts/nodes/NODE/stats

Example: Stats for Individual Node

The following example demonstrates how to retrieve the FTS service stats for a specific node.

NODE="172.17.0.2:8091"

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@fts/nodes/$NODE/stats | \
  jq -r '.op.samples |
    "fts_num_bytes_used_ram: " + (.fts_num_bytes_used_ram | add / length | tostring)'

Example: Stats for Each Node Separately

# loop over each of the nodes running the FTS service
for node in $(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] |
    select(.services | contains(["fts"]) == true) |
    .hostname'
  )
do
  echo "$node FTS Stats"
  echo "-------------------------------------------------------"
  # get the FTS stats for the specific node
  curl \
    --user Administrator:password \
    --silent \
    --request GET \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/@fts/nodes/$node/stats | \
    jq -r '.op.samples |
      "fts_num_bytes_used_ram: " + (.fts_num_bytes_used_ram | add / length | tostring)'
done

Individual FTS-Level Stats

The FTS stats for specific indexes are available only under the bucket that the index is created on. The same stats that are available for the service as a whole are also available on a per-index basis and can be retrieved for the entire cluster or a specific node in the cluster.

Available Stats

Stat name  Description

fts/indexName/avg_queries_latency  The average query latency in milliseconds

fts/indexName/doc_count  The number of documents in the index

fts/indexName/num_bytes_used_disk  Total disk file size used by the index

fts/indexName/num_files_on_disk  Number of files for the index on disk

fts/indexName/num_mutations_to_index  The number of documents pending indexing

fts/indexName/num_pindexes_actual  Number of index partitions (including replica partitions)

fts/indexName/num_pindexes_target  Number of index partitions expected (including replica partitions)

fts/indexName/num_recs_to_persist  Number of index records not yet persisted to disk

fts/indexName/num_root_filesegments  The number of root file segments

fts/indexName/num_root_memorysegments  The number of root memory segments

fts/indexName/total_bytes_indexed  Number of FTS bytes indexed per second

fts/indexName/total_bytes_query_results  Number of bytes returned in results per second

fts/indexName/total_compaction_written_bytes  Number of compaction bytes written per second

fts/indexName/total_queries  The number of queries per second

fts/indexName/total_queries_error  The number of query errors per second

fts/indexName/total_queries_slow  The number of slow queries per second (> 5s)

fts/indexName/total_queries_timeout  The number of queries per second that resulted in a timeout

fts/indexName/total_request_time  Total time spent servicing requests

fts/indexName/total_term_searchers  Number of term searchers started per second

GET Cluster Individual FTS Stats

These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster and the best practice is to monitor each node individually.

Insecure: http://localhost:8091/pools/default/buckets/@fts-BUCKET/stats

Secure: https://localhost:18091/pools/default/buckets/@fts-BUCKET/stats

Example

The following example demonstrates how to retrieve the individual FTS index stats for a bucket across the cluster.

BUCKET="travel-sample"

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@fts-$BUCKET/stats | \
  jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
    select(.key | split("/") | length == 3) |
    (.key) + ": " +
    (.value | add / length | tostring)'

GET Individual FTS Stats per Node

Each node in the cluster running the FTS service should be monitored individually.

Insecure: http://localhost:8091/pools/default/buckets/@fts-BUCKET/nodes/NODE/stats

Secure: https://localhost:18091/pools/default/buckets/@fts-BUCKET/nodes/NODE/stats

Example: Stats for Individual Node

The following example demonstrates how to retrieve all of the FTS stats for a specific index in a bucket for a specific node.

NODE="172.17.0.2:8091"
BUCKET="travel-sample"
INDEX="demo"

# get the FTS stats for the bucket
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@fts-$BUCKET/nodes/$NODE/stats | \
  jq -r --arg index "$INDEX" '.op.samples |
    "avg_queries_latency: " +
      (.["fts/" + $index + "/avg_queries_latency"] | add / length | tostring) +
    "\ndoc_count: " +
      (.["fts/" + $index + "/doc_count"] | add / length | tostring) +
    "\nnum_bytes_used_disk: " +
      (.["fts/" + $index + "/num_bytes_used_disk"] | add / length | tostring) +
    "\nnum_mutations_to_index: " +
      (.["fts/" + $index + "/num_mutations_to_index"] | add | tostring) +
    "\nnum_pindexes_actual: " +
      (.["fts/" + $index + "/num_pindexes_actual"] | add | tostring) +
    "\nnum_pindexes_target: " +
      (.["fts/" + $index + "/num_pindexes_target"] | add | tostring) +
    "\nnum_recs_to_persist: " +
      (.["fts/" + $index + "/num_recs_to_persist"] | add | tostring) +
    "\ntotal_bytes_indexed: " +
      (.["fts/" + $index + "/total_bytes_indexed"] | add / length | tostring) +
    "\ntotal_bytes_query_results: " +
      (.["fts/" + $index + "/total_bytes_query_results"] | add / length | tostring) +
    "\ntotal_compaction_written_bytes: " +
      (.["fts/" + $index + "/total_compaction_written_bytes"] | add / length | tostring) +
    "\ntotal_queries: " +
      (.["fts/" + $index + "/total_queries"] | add | tostring) +
    "\ntotal_queries_error: " +
      (.["fts/" + $index + "/total_queries_error"] | add | tostring) +
    "\ntotal_queries_slow: " +
      (.["fts/" + $index + "/total_queries_slow"] | add | tostring) +
    "\ntotal_queries_timeout: " +
      (.["fts/" + $index + "/total_queries_timeout"] | add | tostring) +
    "\ntotal_request_time: " +
      (.["fts/" + $index + "/total_request_time"] | add | tostring) +
    "\ntotal_term_searchers: " +
      (.["fts/" + $index + "/total_term_searchers"] | add | tostring)'

Example: Stats for Individual Node

The following example demonstrates how to retrieve all of the FTS stats, for every bucket in the cluster, for a single node.

NODE="172.17.0.2:8091"

# loop over each of the buckets that has indexes
for bucket in $(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8094/api/index | \
  jq -r '.indexDefs.indexDefs | [to_entries[] | .value.sourceName] | sort | unique | .[]')
do
  echo ""
  echo "Bucket: $bucket"
  echo "================================================================"
  # get the FTS stats for the bucket
  # 1. reduce the samples object by looping over each property, only working
  #    with properties that are index-specific stats, and either sum or
  #    average the samples
  # 2. get all of the unique index keys
  # 3. loop over each index and output the stats
  curl \
    --user Administrator:password \
    --silent \
    --request GET \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/@fts-$bucket/nodes/$NODE/stats | \
    jq -r '
      reduce (.op.samples | to_entries[]) as {$key, $value} (
        {};
        if (
          ($key | split("/") | length == 3)
          and ($key | contains("replica") | not)
        ) then
          ($key | split("/") | .[2]) as $stat |
          if ([
            "num_mutations_to_index", "num_pindexes_actual",
            "num_pindexes_target", "num_recs_to_persist", "total_queries",
            "total_queries_error", "total_queries_slow", "total_queries_timeout",
            "total_request_time", "total_term_searchers"
          ] | any(.[]; . == $stat)) then
            # these stats are summed across the samples
            .[$key] += ($value | add)
          else
            # all other stats are averaged and rounded to two decimal places
            .[$key] += ($value | add / length | (. * 100 | round) / 100)
          end
        else
          .
        end
      ) | . as $stats |
      $stats | keys | map(split("/")[1]) | sort | unique as $indexes |
      $indexes | .[] |
      "Index: " + . +
      "\n----------------------------------------------------------------" +
      "\navg_queries_latency: "
        + ($stats["fts/" + . + "/avg_queries_latency"] | tostring) +
      "\ndoc_count: "
        + ($stats["fts/" + . + "/doc_count"] | tostring) +
      "\nnum_bytes_used_disk: "
        + ($stats["fts/" + . + "/num_bytes_used_disk"] | tostring) +
      "\nnum_mutations_to_index: "
        + ($stats["fts/" + . + "/num_mutations_to_index"] | tostring) +
      "\nnum_pindexes_actual: "
        + ($stats["fts/" + . + "/num_pindexes_actual"] | tostring) +
      "\nnum_pindexes_target: "
        + ($stats["fts/" + . + "/num_pindexes_target"] | tostring) +
      "\nnum_recs_to_persist: "
        + ($stats["fts/" + . + "/num_recs_to_persist"] | tostring) +
      "\ntotal_bytes_indexed: "
        + ($stats["fts/" + . + "/total_bytes_indexed"] | tostring) +
      "\ntotal_bytes_query_results: "
        + ($stats["fts/" + . + "/total_bytes_query_results"] | tostring) +
      "\ntotal_compaction_written_bytes: "
        + ($stats["fts/" + . + "/total_compaction_written_bytes"] | tostring) +
      "\ntotal_queries: "
        + ($stats["fts/" + . + "/total_queries"] | tostring) +
      "\ntotal_queries_error: "
        + ($stats["fts/" + . + "/total_queries_error"] | tostring) +
      "\ntotal_queries_slow: "
        + ($stats["fts/" + . + "/total_queries_slow"] | tostring) +
      "\ntotal_queries_timeout: "
        + ($stats["fts/" + . + "/total_queries_timeout"] | tostring) +
      "\ntotal_request_time: "
        + ($stats["fts/" + . + "/total_request_time"] | tostring) +
      "\ntotal_term_searchers: "
        + ($stats["fts/" + . + "/total_term_searchers"] | tostring) +
      "\n"
    '
done

Key Metrics to Monitor

Couchbase Metric  Description  Response

avg_queries_latency

The average query latency.

Create a baseline for this value, as "normal" will depend on the size of the index. Alert at 2x of the baseline. This would indicate a slowdown in queries against the index.

total_queries

The number of query requests to the index.

Create a baseline for this value, as "normal" will depend on the query volume. Alert at 2x of the baseline. This would indicate a dramatic increase in requests.

total_queries_error
total_queries_timeout

The number of query errors and timeouts for the index.

Alert at any value greater than 0, as this indicates failed requests.
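For metrics where "normal" is 0, such as the error and timeout counters above, a check only needs to report any value greater than zero. The sketch below assumes the values have already been extracted (for example with the jq filters shown earlier) and are passed as NAME=VALUE pairs; `report_nonzero` is a hypothetical helper name.

```shell
#!/bin/sh
# report_nonzero NAME=VALUE...
# Prints one "ALERT" line per pair whose value is greater than 0.
# Returns 1 when at least one alert was printed, 0 otherwise.
report_nonzero() {
  status=0
  for pair in "$@"; do
    name=${pair%%=*}
    value=${pair#*=}
    if awk -v v="$value" 'BEGIN { exit !(v > 0) }'; then
      echo "ALERT: $name=$value"
      status=1
    fi
  done
  return $status
}
```

For example, `report_nonzero total_queries_error=0 total_queries_timeout=3` prints a single alert line for total_queries_timeout and returns a non-zero exit status that a monitoring agent can act on.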

FTS Aggregate Stats

The FTS aggregate stats for a specific bucket are available only under the bucket that the indexes exist on, and are a total across all of the indexes for that bucket in the cluster or on a node.

Available Stats

Stat name  Description

fts/doc_count  The number of documents in all FTS indexes

fts/num_bytes_used_disk  Total disk file size used by the indexes

fts/num_files_on_disk  The number of index files on disk

fts/num_mutations_to_index  The number of documents pending indexing

fts/num_pindexes_actual  Number of index partitions (including replica partitions)

fts/num_pindexes_target  Number of index partitions expected (including replica partitions)

fts/num_recs_to_persist  Number of index records not yet persisted to disk

fts/num_root_filesegments  Number of root file segments

fts/num_root_memorysegments  Number of root memory segments

fts/total_bytes_indexed  Number of FTS bytes indexed per second

fts/total_bytes_query_results  Number of bytes returned in results per second

fts/total_compaction_written_bytes  Number of compaction bytes written per second

fts/total_queries  The number of queries per second

fts/total_queries_error  The number of query errors per second

fts/total_queries_slow  The number of slow queries per second (> 5s)

fts/total_queries_timeout  The number of queries per second that resulted in a timeout

fts/total_request_time  Total time spent servicing requests

fts/total_term_searchers  Number of term searchers started per second

GET Cluster FTS Aggregate Stats

Insecure: http://localhost:8091/pools/default/buckets/@fts-BUCKET/stats

Secure: https://localhost:18091/pools/default/buckets/@fts-BUCKET/stats

Example: Stats for Cluster

The following example demonstrates how to retrieve all of the FTS aggregate stats for a specific bucket in the entire cluster.

BUCKET="travel-sample"

# get the FTS stats for the bucket
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@fts-$BUCKET/stats | \
  jq -r '.op.samples |
    "doc_count: " + (.["fts/doc_count"] | add / length | tostring) +
    "\nnum_bytes_used_disk: " + (.["fts/num_bytes_used_disk"] | add / length | tostring) +
    "\nnum_mutations_to_index: " + (.["fts/num_mutations_to_index"] | add / length | tostring) +
    "\nnum_pindexes_actual: " + (.["fts/num_pindexes_actual"] | add / length | tostring) +
    "\nnum_pindexes_target: " + (.["fts/num_pindexes_target"] | add / length | tostring) +
    "\ntotal_bytes_indexed: " + (.["fts/total_bytes_indexed"] | add / length | tostring) +
    "\ntotal_bytes_query_results: " + (.["fts/total_bytes_query_results"] | add / length | tostring) +
    "\ntotal_compaction_written_bytes: " + (.["fts/total_compaction_written_bytes"] | add / length | tostring) +
    "\ntotal_queries: " + (.["fts/total_queries"] | add / length | tostring) +
    "\ntotal_queries_error: " + (.["fts/total_queries_error"] | add / length | tostring) +
    "\ntotal_queries_slow: " + (.["fts/total_queries_slow"] | add / length | tostring) +
    "\ntotal_queries_timeout: " + (.["fts/total_queries_timeout"] | add / length | tostring) +
    "\ntotal_request_time: " + (.["fts/total_request_time"] | add | tostring) +
    "\ntotal_term_searchers: " + (.["fts/total_term_searchers"] | add | tostring)'

GET FTS Aggregate Stats per Node

Insecure: http://localhost:8091/pools/default/buckets/@fts-BUCKET/nodes/NODE/stats

Secure: https://localhost:18091/pools/default/buckets/@fts-BUCKET/nodes/NODE/stats

Example: Aggregate Stats for Individual Node

The following example demonstrates how to retrieve all of the FTS aggregate stats for a specific bucket on a specific node.

BUCKET="travel-sample"
NODE="172.17.0.2:8091"

# get the FTS stats for the bucket
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@fts-$BUCKET/nodes/$NODE/stats | \
  jq -r '.op.samples |
    "doc_count: " + (.["fts/doc_count"] | add / length | tostring) +
    "\nnum_bytes_used_disk: " + (.["fts/num_bytes_used_disk"] | add / length | tostring) +
    "\nnum_mutations_to_index: " + (.["fts/num_mutations_to_index"] | add / length | tostring) +
    "\nnum_pindexes_actual: " + (.["fts/num_pindexes_actual"] | add / length | tostring) +
    "\nnum_pindexes_target: " + (.["fts/num_pindexes_target"] | add / length | tostring) +
    "\ntotal_bytes_indexed: " + (.["fts/total_bytes_indexed"] | add / length | tostring) +
    "\ntotal_bytes_query_results: " + (.["fts/total_bytes_query_results"] | add / length | tostring) +
    "\ntotal_compaction_written_bytes: " + (.["fts/total_compaction_written_bytes"] | add / length | tostring) +
    "\ntotal_queries: " + (.["fts/total_queries"] | add / length | tostring) +
    "\ntotal_queries_error: " + (.["fts/total_queries_error"] | add / length | tostring) +
    "\ntotal_queries_slow: " + (.["fts/total_queries_slow"] | add / length | tostring) +
    "\ntotal_queries_timeout: " + (.["fts/total_queries_timeout"] | add / length | tostring) +
    "\ntotal_request_time: " + (.["fts/total_request_time"] | add | tostring) +
    "\ntotal_term_searchers: " + (.["fts/total_term_searchers"] | add | tostring)'


Monitoring: Index Service

Index Status

The index status API displays all index definitions, node placement and status within the cluster.

Insecure: http://localhost:8091/indexStatus

Secure: https://localhost:18091/indexStatus

Response:

{
  "indexes": [
    {
      "storageMode": "plasma",
      "partitioned": false,
      "instId": 4607548507687231469,
      "hosts": ["127.0.0.1:8091"],
      "progress": 100,
      "definition": "CREATE INDEX `def_airportname` ON `travel-sample`(`airportname`) WITH { \"defer_build\":true }",
      "status": "Ready",
      "bucket": "travel-sample",
      "index": "def_airportname",
      "id": 15764219156300962421
    },
    {
      "storageMode": "plasma",
      "partitioned": false,
      "instId": 11862384293590784556,
      "hosts": ["127.0.0.1:8091"],
      "progress": 100,
      "definition": "CREATE INDEX `def_city` ON `travel-sample`(`city`) WITH { \"defer_build\":true }",
      "status": "Ready",
      "bucket": "travel-sample",
      "index": "def_city",
      "id": 2037567312091921182
    }
  ],
  "version": 45110879,
  "warnings": []
}

Key Metrics to Monitor

Couchbase Metric  Description  Response

status

Indicates whether an index is in a "Ready" or "Building" state.

Alert if the value is not "Ready" or "Building".

Example

The following example illustrates outputting each index name and status.

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/indexStatus | \
  jq -r '.indexes | sort_by(.bucket) | .[] | .bucket + ": " + .index + " (" + .status + ")"'

This example shows outputting all indexes whose status is not "Ready" or "Building".

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/indexStatus | \
  jq -r '.indexes | map(select(
    (.status != "Ready" and .status != "Building")
  )) | .[] | .bucket + ": " + .index + " (" + .status + ")"'
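The same status rule can also be applied in the shell when the index list has already been flattened to one "bucket index status" line per index. The following is a sketch under that assumption; the helper names `index_status_ok` and `unhealthy_indexes` are hypothetical, and "Error" in the usage note is only a placeholder for any status other than "Ready" or "Building".

```shell
#!/bin/sh
# index_status_ok STATUS
# Succeeds (exit 0) only for the two states treated as healthy above.
index_status_ok() {
  case "$1" in
    Ready|Building) return 0 ;;
    *) return 1 ;;
  esac
}

# unhealthy_indexes
# Reads whitespace-separated "bucket index status" lines from stdin and
# prints the indexes whose status should raise an alert.
unhealthy_indexes() {
  while read -r bucket index status; do
    index_status_ok "$status" || echo "$bucket:$index ($status)"
  done
}
```

For example, piping the two lines `travel-sample def_city Ready` and `travel-sample def_name Error` into `unhealthy_indexes` prints only the second index. Such lines could be produced from the indexStatus response with a jq filter like `.indexes[] | [.bucket, .index, .status] | @tsv`.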

Index Service-Level Stats

The following Index service stats are available via the cluster-wide or per-node endpoints listed below.

Available Stats

Stat name  Description

index_memory_quota  The cluster-wide memory quota.

index_memory_used  The amount of memory currently used by the indexing service.

index_ram_percent  The percentage of index entries in RAM.

index_remaining_ram  The amount of memory remaining.


GET Cluster Index Service Stats

These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster and the best practice is to monitor each node individually.

Insecure: http://localhost:8091/pools/default/buckets/@index/stats

Secure: https://localhost:18091/pools/default/buckets/@index/stats

Example

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@index/stats | \
  jq -r '.op.samples | to_entries[] | select(.key != "timestamp") |
    .key + ": " + (.value | add / length | tostring)'

GET Node-Level Index Service Stats

Each node in the cluster running the index service should be monitored individually using the endpoint listed below.

Insecure: http://localhost:8091/pools/default/buckets/@index/nodes/NODE/stats

Secure: https://localhost:18091/pools/default/buckets/@index/nodes/NODE/stats

Example: Stats for Individual Node

The following example demonstrates how to retrieve the index service stats for a specific node.

NODE="172.17.0.2:8091"

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@index/nodes/$NODE/stats | \
  jq -r -c '.op.samples |
    "index_memory_quota: " + (.index_memory_quota | add / length | tostring) +
    "\nindex_memory_used: " + (.index_memory_used | add / length | tostring) +
    "\nindex_ram_percent: " + (.index_ram_percent | add / length | tostring) +
    "\nindex_remaining_ram: " + (.index_remaining_ram | add / length | tostring)'

Example: Stats for Each Node Separately

# loop over each of the nodes running the index service
for node in $(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] |
    select(.services | contains(["index"]) == true) |
    .hostname'
  )
do
  echo "$node Index Stats"
  echo "-------------------------------------------------------"
  # get the index stats for the specific node
  curl \
    --user Administrator:password \
    --silent \
    --request GET \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/@index/nodes/$node/stats | \
    jq -r '.op.samples | to_entries[] | select(.key != "timestamp") |
      .key + ": " + (.value | add / length | tostring)'
done

Key Metrics to Monitor

Couchbase Metric: index_remaining_ram
Description: The amount of memory remaining.
Response: Alert if this value is 20% or less, as this is indicative of index growth and the index nodes will need to be expanded.
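
The 20% threshold above can be turned into a simple check. The following is a minimal sketch rather than a definitive implementation: it assumes the percentage is measured against index_memory_quota (the guide does not state what the 20% is relative to), and the function name index_ram_status is hypothetical.

```shell
# Hypothetical helper: classify index_remaining_ram against a percentage
# threshold. Assumption: the percentage is relative to index_memory_quota.
# usage: index_ram_status REMAINING_BYTES QUOTA_BYTES [THRESHOLD_PERCENT]
index_ram_status() {
  local remaining="$1" quota="$2" threshold="${3:-20}"
  awk -v r="$remaining" -v q="$quota" -v t="$threshold" 'BEGIN {
    # a quota of zero or less cannot be evaluated
    if (q <= 0) { print "UNKNOWN"; exit }
    # compare r/q*100 <= t without dividing, to avoid rounding surprises
    if (r * 100 <= t * q) print "ALERT"; else print "OK"
  }'
}
```

The two arguments would typically be the averaged index_remaining_ram and index_memory_quota values returned by the node-level endpoint, for example index_ram_status 104857600 1073741824.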

Individual Index-Level Stats

The stats for a specific index are available only under the bucket that the index is created on. The same stats that are available for the service as a whole are also available on a per-index basis and can be retrieved for the entire cluster or a specific node in the cluster.


Available Stats

Stat name                                Description
index/indexName/avg_item_size            The average index entry size
index/indexName/avg_scan_latency         The average latency when scanning the index
index/indexName/cache_hits               The number of in-memory hits to the index
index/indexName/cache_miss_ratio         The ratio of misses to hits
index/indexName/cache_misses             The number of in-memory misses to the index
index/indexName/data_size                The total data size of the index
index/indexName/data_size_on_disk        The total size of the index data on disk
index/indexName/disk_overhead_estimate   The size of stale data on disk due to fragmentation
index/indexName/disk_size                The size of the index on disk
index/indexName/frag_percent             The index fragmentation percentage
index/indexName/index_frag_percent       The index fragmentation percentage
index/indexName/index_resident_percent   The percentage of the index that is memory resident
index/indexName/items_count              The number of items in the index
index/indexName/log_space_on_disk        The size of the log files on disk
index/indexName/memory_used              The amount of memory used by the index
index/indexName/num_docs_indexed         The number of items indexed since the last restart
index/indexName/num_docs_pending         The number of items pending indexing
index/indexName/num_docs_pending+queued  The number of documents that are pending or queued for indexing
index/indexName/num_docs_queued          The number of documents that are queued for indexing
index/indexName/num_requests             The number of requests to the index
index/indexName/num_rows_returned        The average number of rows returned by a scan
index/indexName/raw_data_size            The raw uncompressed data size
index/indexName/recs_in_mem              The number of records in the index that are in memory
index/indexName/recs_on_disk             The number of records not in memory
index/indexName/scan_bytes_read          The average number of bytes read per scan
index/indexName/total_scan_duration      The total time spent scanning

GET Cluster Individual Index Stats

These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster and the best practice is to monitor each node individually.

Insecure: http://localhost:8091/pools/default/buckets/@index-BUCKET/stats
Secure: https://localhost:18091/pools/default/buckets/@index-BUCKET/stats

Example

The following example demonstrates how to retrieve the per-index stats for a bucket across the cluster.

BUCKET="travel-sample"

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@index-$BUCKET/stats | \
  jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
    select(.key | split("/") | length == 3) |
    .key + ": " +
    (.value | add / length | tostring)'

GET Individual Index Stats per Node

Each node in the cluster running the index service should be monitored individually.

Insecure: http://localhost:8091/pools/default/buckets/@index-BUCKET/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/@index-BUCKET/nodes/NODE/stats

Example: Stats for Individual Node

The following example demonstrates how to retrieve all of the index stats for a specific index in a bucket for a specific node.

NODE="172.17.0.2:8091"
BUCKET="travel-sample"
INDEX="def_faa"

# get the index stats for the bucket
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@index-$BUCKET/nodes/$NODE/stats | \
  jq -r --arg index "$INDEX" '.op.samples |
    "avg_item_size: " + (.["index/" + $index + "/avg_item_size"] | add / length | tostring) +
    "\navg_scan_latency: " + (.["index/" + $index + "/avg_scan_latency"] | add / length | tostring) +
    "\ncache_hits: " + (.["index/" + $index + "/cache_hits"] | add | tostring) +
    "\ncache_miss_ratio: " + (.["index/" + $index + "/cache_miss_ratio"] | add / length | tostring) +
    "\ncache_misses: " + (.["index/" + $index + "/cache_misses"] | add | tostring) +
    "\ndata_size: " + (.["index/" + $index + "/data_size"] | add / length | tostring) +
    "\ndisk_overhead_estimate: " + (.["index/" + $index + "/disk_overhead_estimate"] | add / length | tostring) +
    "\ndisk_size: " + (.["index/" + $index + "/disk_size"] | add / length | tostring) +
    "\nfrag_percent: " + (.["index/" + $index + "/frag_percent"] | add / length | tostring) +
    "\nindex_frag_percent: " + (.["index/" + $index + "/index_frag_percent"] | add / length | tostring) +
    "\nindex_resident_percent: " + (.["index/" + $index + "/index_resident_percent"] | add / length | tostring) +
    "\nitems_count: " + (.["index/" + $index + "/items_count"] | add / length | tostring) +
    "\nmemory_used: " + (.["index/" + $index + "/memory_used"] | add / length | tostring) +
    "\nnum_docs_indexed: " + (.["index/" + $index + "/num_docs_indexed"] | add | tostring) +
    "\nnum_docs_pending+queued: " + (.["index/" + $index + "/num_docs_pending+queued"] | add | tostring) +
    "\nnum_docs_queued: " + (.["index/" + $index + "/num_docs_queued"] | add | tostring) +
    "\nnum_requests: " + (.["index/" + $index + "/num_requests"] | add | tostring) +
    "\nnum_rows_returned: " + (.["index/" + $index + "/num_rows_returned"] | add | tostring) +
    "\nrecs_in_mem: " + (.["index/" + $index + "/recs_in_mem"] | add / length | tostring) +
    "\nrecs_on_disk: " + (.["index/" + $index + "/recs_on_disk"] | add / length | tostring) +
    "\nscan_bytes_read: " + (.["index/" + $index + "/scan_bytes_read"] | add | tostring) +
    "\ntotal_scan_duration: " + (.["index/" + $index + "/total_scan_duration"] | add | tostring)
  '

Example: Stats for Every Bucket on an Individual Node

The following example demonstrates how to retrieve all of the index stats for every bucket in the cluster for a single node.

NODE="172.17.0.2:8091"

# loop over each of the buckets that has indexes
for bucket in $(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/indexStatus | \
  jq -r '[.indexes[] | .bucket] | sort | unique | .[]')
do
  echo ""
  echo "Bucket: $bucket"
  echo "================================================================"
  # get the index stats for the bucket, then in jq:
  # 1. reduce the samples object by looping over each property, only working
  #    with properties that are index-specific stats, and either sum or
  #    average the samples
  # 2. get all of the unique index keys
  # 3. loop over each index and output the stats
  curl \
    --user Administrator:password \
    --silent \
    --request GET \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/@index-$bucket/nodes/$NODE/stats | \
    jq -r '
      # roundit is not defined in the original example; it is assumed here to
      # scale by 100 and round, so that roundit / 100.0 keeps two decimals
      def roundit: . * 100.0 | round;
      reduce (.op.samples | to_entries[]) as {key: $key, value: $value} (
        {};
        if (
          ($key | split("/") | length == 3)
          and ($key | contains("replica") | not)
        ) then
          ($key | split("/") | .[2]) as $stat |
          if ([
            "cache_hits", "cache_misses", "num_docs_indexed", "num_docs_pending",
            "num_docs_pending+queued", "num_docs_queued", "num_requests",
            "num_rows_returned", "scan_bytes_read", "total_scan_duration"
          ] | map(. == $stat) | any) then
            .[$key] += ($value | add)
          else
            .[$key] += ($value | add / length | roundit / 100.0)
          end
        else
          .
        end
      ) | . as $stats |
      $stats | keys | map(split("/")[1]) | sort | unique as $indexes |
      $indexes | .[] |
      "Index: " + . +
      "\n----------------------------------------------------------------" +
      "\navg_item_size: " + ($stats["index/" + . + "/avg_item_size"] | tostring) +
      "\navg_scan_latency: " + ($stats["index/" + . + "/avg_scan_latency"] | tostring) +
      "\ncache_hits: " + ($stats["index/" + . + "/cache_hits"] | tostring) +
      "\ncache_miss_ratio: " + ($stats["index/" + . + "/cache_miss_ratio"] | tostring) +
      "\ncache_misses: " + ($stats["index/" + . + "/cache_misses"] | tostring) +
      "\ndata_size: " + ($stats["index/" + . + "/data_size"] | tostring) +
      "\ndisk_overhead_estimate: " + ($stats["index/" + . + "/disk_overhead_estimate"] | tostring) +
      "\ndisk_size: " + ($stats["index/" + . + "/disk_size"] | tostring) +
      "\nfrag_percent: " + ($stats["index/" + . + "/frag_percent"] | tostring) +
      "\nindex_frag_percent: " + ($stats["index/" + . + "/index_frag_percent"] | tostring) +
      "\nindex_resident_percent: " + ($stats["index/" + . + "/index_resident_percent"] | tostring) +
      "\nitems_count: " + ($stats["index/" + . + "/items_count"] | tostring) +
      "\nmemory_used: " + ($stats["index/" + . + "/memory_used"] | tostring) +
      "\nnum_docs_indexed: " + ($stats["index/" + . + "/num_docs_indexed"] | tostring) +
      "\nnum_docs_pending: " + ($stats["index/" + . + "/num_docs_pending"] | tostring) +
      "\nnum_docs_pending+queued: " + ($stats["index/" + . + "/num_docs_pending+queued"] | tostring) +
      "\nnum_docs_queued: " + ($stats["index/" + . + "/num_docs_queued"] | tostring) +
      "\nnum_requests: " + ($stats["index/" + . + "/num_requests"] | tostring) +
      "\nnum_rows_returned: " + ($stats["index/" + . + "/num_rows_returned"] | tostring) +
      "\nrecs_in_mem: " + ($stats["index/" + . + "/recs_in_mem"] | tostring) +
      "\nrecs_on_disk: " + ($stats["index/" + . + "/recs_on_disk"] | tostring) +
      "\nscan_bytes_read: " + ($stats["index/" + . + "/scan_bytes_read"] | tostring) +
      "\ntotal_scan_duration: " + ($stats["index/" + . + "/total_scan_duration"] | tostring) +
      "\n"
    '
done

Key Metrics to Monitor

Couchbase Metric: avg_item_size
Description: The average index entry size
Response: Create a baseline for this value, as "normal" will depend on the size. Alert at 2x of the baseline. This would indicate a dramatic model change.

Couchbase Metric: avg_scan_latency
Description: The average scan latency
Response: Create a baseline for this value, as "normal" will depend on the size. Alert at 2x of the baseline. This would indicate a slowdown of scans against the index.

Couchbase Metric: index_resident_percent
Description: The percentage of the index that is memory resident
Response: Create a baseline for this value, as "normal" will depend on SLAs and hardware configuration. Alert at 5-10% deviation from the baseline.

Couchbase Metric: num_requests
Description: The number of index scan requests to the index
Response: Create a baseline for this value, as "normal" will depend on the amount. Alert at 2x of the baseline. This would indicate a dramatic increase in requests.
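
The baseline rules above ("alert at 2x of the baseline", "alert at 5-10% deviation") can be sketched as two small comparison helpers. This is only an illustration under stated assumptions: the function names are hypothetical, the baseline is assumed to have been recorded separately by the monitoring system, and the deviation is assumed to be relative to the baseline rather than in absolute percentage points.

```shell
# Hypothetical helper: alert when a value reaches FACTOR times its baseline.
# usage: baseline_multiple_status VALUE BASELINE [FACTOR]
baseline_multiple_status() {
  awk -v v="$1" -v b="$2" -v f="${3:-2}" 'BEGIN {
    if (v >= b * f) print "ALERT"; else print "OK"
  }'
}

# Hypothetical helper: alert when a value deviates from its baseline, in
# either direction, by PERCENT of the baseline or more.
# usage: baseline_deviation_status VALUE BASELINE [PERCENT]
baseline_deviation_status() {
  awk -v v="$1" -v b="$2" -v p="${3:-10}" 'BEGIN {
    d = v - b
    if (d < 0) d = -d
    # d / b * 100 >= p, written without division
    if (d * 100 >= b * p) print "ALERT"; else print "OK"
  }'
}
```

For example, baseline_multiple_status could be fed the averaged avg_scan_latency or num_requests value for an index, and baseline_deviation_status the averaged index_resident_percent.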

Index Aggregate Stats

The Index aggregate stats for a specific bucket are available only under the bucket that the indexes exist on and are a total of all of the indexes for that bucket in the cluster or node.

Available Stats

Stat name                     Description
index/cache_hits              The number of in-memory hits to the index
index/cache_misses            The number of in-memory misses to the index
index/data_size               The total data size of the index
index/data_size_on_disk       The total data size on disk
index/disk_overhead_estimate  The size of stale data on disk due to fragmentation
index/disk_size               The size of the index on disk
index/frag_percent            The index fragmentation percentage
index/fragmentation           The index fragmentation percentage
index/items_count             The number of items in the index
index/memory_used             The amount of memory used by the index
index/num_docs_indexed        The number of items indexed since the last restart
index/num_docs_pending        The number of documents that are pending or queued for indexing
index/num_docs_queued         The number of documents that are queued for indexing
index/num_requests            The number of requests to the index
index/num_rows_returned       The average number of rows returned by a scan
index/raw_data_size           The raw uncompressed data size
index/recs_in_mem             The number of records in the index that are in memory
index/recs_on_disk            The number of records not in memory
index/scan_bytes_read         The average number of bytes read per scan
index/total_scan_duration     The total time spent scanning

GET Cluster Index Aggregate Stats

Insecure: http://localhost:8091/pools/default/buckets/@index-BUCKET/stats
Secure: https://localhost:18091/pools/default/buckets/@index-BUCKET/stats

Example: Stats for Cluster

The following example demonstrates how to retrieve all of the index aggregate stats for a specific bucket in the entire cluster.

BUCKET="travel-sample"

# get the index stats for the bucket
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@index-$BUCKET/stats | \
  jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
    select(.key | split("/") | length == 2) |
    (.key | split("/")[1]) + ": " +
    (.value | add / length | tostring)'

GET Index Aggregate Stats per Node

Insecure: http://localhost:8091/pools/default/buckets/@index-BUCKET/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/@index-BUCKET/nodes/NODE/stats

Example: Aggregate Stats for Individual Node

The following example demonstrates how to retrieve all of the index aggregate stats for a specific bucket on a specific node.

BUCKET="travel-sample"
NODE="172.17.0.2:8091"

# get the index stats for the bucket
# (size and record-count gauges are averaged over the samples, counters are summed)
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@index-$BUCKET/nodes/$NODE/stats | \
  jq -r '.op.samples |
    "cache_hits: " + (.["index/cache_hits"] | add | tostring) +
    "\ncache_misses: " + (.["index/cache_misses"] | add | tostring) +
    "\ndata_size: " + (.["index/data_size"] | add / length | tostring) +
    "\ndisk_overhead_estimate: " + (.["index/disk_overhead_estimate"] | add / length | tostring) +
    "\ndisk_size: " + (.["index/disk_size"] | add / length | tostring) +
    "\nfrag_percent: " + (.["index/frag_percent"] | add / length | tostring) +
    "\nfragmentation: " + (.["index/fragmentation"] | add / length | tostring) +
    "\nitems_count: " + (.["index/items_count"] | add / length | tostring) +
    "\nmemory_used: " + (.["index/memory_used"] | add / length | tostring) +
    "\nnum_docs_indexed: " + (.["index/num_docs_indexed"] | add | tostring) +
    "\nnum_docs_pending: " + (.["index/num_docs_pending"] | add | tostring) +
    "\nnum_docs_queued: " + (.["index/num_docs_queued"] | add | tostring) +
    "\nnum_requests: " + (.["index/num_requests"] | add | tostring) +
    "\nnum_rows_returned: " + (.["index/num_rows_returned"] | add | tostring) +
    "\nrecs_in_mem: " + (.["index/recs_in_mem"] | add / length | tostring) +
    "\nrecs_on_disk: " + (.["index/recs_on_disk"] | add / length | tostring) +
    "\nscan_bytes_read: " + (.["index/scan_bytes_read"] | add | tostring) +
    "\ntotal_scan_duration: " + (.["index/total_scan_duration"] | add | tostring)
  '


Monitoring: Logs

Built-in Email Alerts and Logs

Couchbase provides several built-in alerts for when Couchbase is approaching a critical failure or when a critical failure has occurred. It is recommended to enable the built-in email alerts and configure them to be sent to multiple recipients or a distribution list. These alerts should be treated as a fail-safe to proactive alerting from an external monitoring service.

Some environments do not permit Couchbase nodes to send email. The table below provides the log-based equivalent of the built-in Couchbase email alerts.

Logs can be monitored via REST using the http://<server>:8091/logs endpoint (https://<server>:18091/logs when using TLS) or via the /opt/couchbase/var/lib/couchbase/logs/info.log file. Alerts can be generated by applying a regular expression to match either the module/code combination or the string noted below.

Available Alerts

Alert: Node was auto-failed-over
Description: The sending node has been failed over automatically.
Code: auto_failover_node

Alert: Maximum number of auto-failed-over nodes was reached
Description: The auto-failover system stops auto-failover when the maximum number of spare nodes available has been reached.
Code: auto_failover_maximum_reached

Alert: Node wasn't auto-failed-over as other nodes are down at the same time
Description: Auto-failover does not take place if there is already a node down.
Code: auto_failover_other_nodes_down

Alert: Node was not auto-failed-over as there are not enough nodes in the cluster running the same service
Description: Auto-failover cannot be supported with fewer than three nodes.
Code: auto_failover_cluster_too_small

Alert: Node was not auto-failed-over as auto-failover for one or more services running on the node is disabled
Description: Auto-failover does not take place on a node when auto-failover is disabled for one or more services running on that node.
Code: auto_failover_disabled

Alert: Node's IP address has changed unexpectedly
Description: The IP address of the node has changed, which may indicate a network interface, operating system, or other network or system failure.
Code: ip

Alert: Disk space used for persistent storage has reached at least 90% of capacity
Description: The disk device configured for storage of persistent data is nearing full capacity.
Code: disk

Alert: Metadata overhead is more than 50%
Description: The amount of data required to store the metadata information for your dataset is now greater than 50% of the available RAM.
Code: overhead

Alert: Bucket memory on a node is entirely used for metadata
Description: All the available RAM on a node is being used to store the metadata for the objects stored. This means that there is no memory available for caching values. With no memory left for storing metadata, further requests to store data will also fail. Only applicable to buckets configured for value-only ejection.
Code: ep_oom_errors

Alert: Writing data to disk for a specific bucket has failed
Description: The disk or device used for persisting data has failed to store persistent data for a bucket.
Code: ep_item_commit_failed

Alert: Writing event to audit log has failed
Description: The audit log event writing has failed.
Code: audit_dropped_events

Alert: Approaching full Indexer RAM warning
Description: Indexer RAM usage is approaching the configured limit threshold.
Code: indexer_ram_max_usage

Alert: Remote mutation timestamp exceeded drift threshold
Description: The remote mutation timestamp exceeded the drift threshold.
Code: ep_clock_cas_drift_threshold_exceeded

Alert: Communication issues among some nodes in the cluster
Description: There are communication issues between some of the nodes within the cluster.
Code: communication_issue
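
As a rough illustration of the regular-expression approach described above, the sketch below filters log lines for the alert codes in the table. It is an assumption-laden example, not a definitive implementation: the guide does not show the exact layout of info.log lines, so this only does substring matching, and the names ALERT_CODE_PATTERN and filter_alert_lines are hypothetical. The short codes ip, disk, and overhead are deliberately left out of the pattern because as bare substrings they would match many unrelated lines; they would need a stricter, format-aware pattern.

```shell
# Hypothetical pattern covering the longer, unambiguous alert codes.
ALERT_CODE_PATTERN='auto_failover_(node|maximum_reached|other_nodes_down|cluster_too_small|disabled)|ep_oom_errors|ep_item_commit_failed|audit_dropped_events|indexer_ram_max_usage|ep_clock_cas_drift_threshold_exceeded|communication_issue'

# Hypothetical helper: read log lines on stdin and print only those that
# mention one of the alert codes. grep exits non-zero when nothing matches,
# so "|| true" keeps an empty result from being treated as a failure.
filter_alert_lines() {
  grep -E "$ALERT_CODE_PATTERN" || true
}
```

A typical invocation would be filter_alert_lines < /opt/couchbase/var/lib/couchbase/logs/info.log, with any output forwarded to the external monitoring system.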

Logs API

The same log file messages that are available in the Admin UI (http://localhost:8091/ui/index.html#!/logs) are available via a REST API as well.

Insecure: http://localhost:8091/logs
Secure: https://localhost:18091/logs

API Parameters

The Logs API supports the following query string parameters:

Param      Description
limit      An integer greater than 0 that limits the overall number of messages returned
sinceTime  Epoch timestamp in milliseconds to start returning messages from
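
Because sinceTime is expressed in milliseconds, a polling agent has to convert from the seconds that most shells report. The helper below is a minimal sketch of that conversion; the function name since_time_ms is hypothetical, and taking the current time as an argument is simply a design choice that keeps the arithmetic easy to check.

```shell
# Hypothetical helper: epoch timestamp in milliseconds for a point
# MINUTES_AGO minutes before NOW_EPOCH_SECONDS.
# usage: since_time_ms NOW_EPOCH_SECONDS MINUTES_AGO
since_time_ms() {
  local now="$1" minutes="$2"
  echo $(( (now - minutes * 60) * 1000 ))
}

# Possible usage (not executed here): fetch up to 100 messages from the
# last five minutes, passing both parameters in the query string.
#   since="$(since_time_ms "$(date +%s)" 5)"
#   curl --user Administrator:password --silent \
#     "http://localhost:8091/logs?limit=100&sinceTime=$since"
```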


Log Response Properties

Property    Description
code        A code specified by the module or 0
module      The module that generated the log message
node        The node that the message came from
serverTime  An ISO-8601 timestamp of when the message was logged
shortText   A short string describing the log entry, most commonly "message", "node up", or "node down"
text        The detailed log message
tstamp      An Epoch timestamp of when the message was logged
type        The type of log message, values can be: info, warning, critical

Example: All Log Messages

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] |
    "[" + .type + "] " + .serverTime +
    " Module: " + .module +
    " Code: " + (.code | tostring) +
    " Message: " + .text
  '

Example: Critical Messages Only

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] | select(.type == "critical") |
    "[" + .type + "] " + .serverTime +
    " Module: " + .module +
    " Code: " + (.code | tostring) +
    " Message: " + .text
  '

Example: Warning Messages Only

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] | select(.type == "warning") |
    "[" + .type + "] " + .serverTime +
    " Module: " + .module +
    " Code: " + (.code | tostring) +
    " Message: " + .text
  '

Example: Critical or Warning Messages Only

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] | select(.type == "critical" or .type == "warning") |
    "[" + .type + "] " + .serverTime +
    " Module: " + .module +
    " Code: " + (.code | tostring) +
    " Message: " + .text
  '

Alerts API

Critical alerts that trigger email alerts are also displayed to users in the Admin UI upon logging in. These alerts can optionally be monitored, should email not be an option.

Insecure: http://localhost:8091/pools/default
Secure: https://localhost:18091/pools/default

Alerts are located at the root of the response payload in a property "alerts", which is an array.

Alert Properties

Property    Description
msg         The alert message and details
serverTime  The time the alert was issued

Example: Retrieve All Alerts

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/default | \
  jq -r '.alerts[] | .serverTime + " - " + .msg'


Monitoring: Nodes

GET Nodes Overview

http://localhost:8091/pools/nodes

Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-node-get-info.html

Response

{
  "nodes": [
    {
      "hostname": "10.112.170.101:8091",
      "thisNode": true,
      "ports": {
        "sslProxy": 11214,
        "httpsMgmt": 18091,
        "httpsCAPI": 18092,
        "proxy": 11211,
        "direct": 11210
      },
      "services": ["fts", "index", "kv", "n1ql", "cbas", "eventing"]
    }
  ]
}

Each node in the cluster is listed in the "nodes" array. The thisNode attribute indicates the node you have executed the query against. Using this output, a monitoring agent can discover new nodes within the cluster and which services are assigned to those nodes in order to automatically apply the correct monitoring profile.

Key Metrics to Monitor

Couchbase Metric: status
Description: This is a meta metric that indicates overall node health.
Response: Alert if the value is "unhealthy".

Couchbase Metric: clusterMembership
Description: Indicates whether the node is an active participant in cluster operations. Possible values are "active", "inactiveAdded", and "inactiveFailed".
Response: Alert on "inactiveFailed" and investigate the cause of the node failure.
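
The two rules above can be combined into a single pass over the node list. The following is a minimal sketch, assuming the hostname, status, and clusterMembership of each node have already been extracted into whitespace-separated columns; the function name node_alerts and the output wording are hypothetical.

```shell
# Hypothetical helper: read "hostname status clusterMembership" lines on
# stdin and print one ALERT line per rule that a node violates.
node_alerts() {
  awk '{
    if ($2 == "unhealthy") print "ALERT " $1 " status=" $2
    if ($3 == "inactiveFailed") print "ALERT " $1 " clusterMembership=" $3
  }'
}

# Possible usage (not executed here), using jq to produce the columns:
#   curl --user Administrator:password --silent \
#     http://localhost:8091/pools/nodes | \
#     jq -r '.nodes[] | [.hostname, .status, .clusterMembership] | @tsv' | \
#     node_alerts
```

Healthy, active nodes produce no output, so an empty result can be treated as "all clear" by the calling monitoring agent.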

Example

This example illustrates retrieving the status of each node in the cluster.

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] | .hostname + " (" + .status + ")"'

Example

The following example displays the cluster membership of each node.

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] | .hostname + " (" + .clusterMembership + ")"'

Example

Show the services and system stats for each node in the cluster.

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] | .hostname + " (" + (.services | join(", ")) + ")\n" +
    "  cpu_utilization_rate: " +
    (.systemStats.cpu_utilization_rate | tostring) + "%\n" +
    "  swap_total: " +
    (.systemStats.swap_total / 1024 / 1024 | tostring) + "MB\n" +
    "  swap_used: " +
    (.systemStats.swap_used / 1024 / 1024 | tostring) + "MB (" +
    # guard against division by zero on nodes with no swap configured
    (if .systemStats.swap_total > 0
     then (.systemStats.swap_used / .systemStats.swap_total) * 100
     else 0 end | tostring) +
    "%)\n" +
    "  mem_total: " +
    (.systemStats.mem_total / 1024 / 1024 | tostring) + "MB\n" +
    "  mem_free: " +
    (.systemStats.mem_free / 1024 / 1024 | tostring) + "MB (" +
    ((.systemStats.mem_free / .systemStats.mem_total) * 100 | tostring) + "%)"
  '


Monitoring: Query Service

Query Service-Level Stats

The following Query stats are available via the Cluster-Wide or Per-Node Endpoints listed below.

Available Stats

Stat name                Description
query_avg_req_time       The average total request time.
query_avg_svc_time       The average time of the query service for requests.
query_avg_response_size  The average size in bytes of the response.
query_avg_result_count   The average number of results being returned.
query_active_requests    The number of active requests.
query_errors             The number of queries resulting in an error.
query_invalid_requests   The number of invalid/incorrectly formatted queries.
query_queued_requests    The number of query requests that have been queued.
query_request_time       The current request duration.
query_requests           The current number of requests per second.
query_requests_1000ms    The number of queries greater than 1000ms.
query_requests_250ms     The number of queries greater than 250ms.
query_requests_5000ms    The number of queries greater than 5000ms.
query_requests_500ms     The number of queries greater than 500ms.
query_result_count       The number of results returned.
query_result_size        The query result size.
query_selects            The number of selects being executed.
query_service_time       The time spent by the query service to service the request.
query_warnings           The number of query warnings generated.

GET Cluster Query Service Stats

These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster and the best practice is to monitor each node individually.

Insecure: http://localhost:8091/pools/default/buckets/@query/stats
Secure: https://localhost:18091/pools/default/buckets/@query/stats

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@query/stats | \
  jq -r '.op.samples | to_entries[] | select(.key != "timestamp") |
    .key + ": " + (.value | add / length | tostring)'

GET Node-Level Query Service Stats

Each node in the cluster running the query service should be monitored individually using the endpoint listed below.

Insecure: http://localhost:8091/pools/default/buckets/@query/nodes/NODE/stats
Secure: https://localhost:18091/pools/default/buckets/@query/nodes/NODE/stats

Example: Stats for Individual Node

The following example demonstrates how to retrieve the query service stats for a specific node.

NODE="172.17.0.2:8091"

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@query/nodes/$NODE/stats | \
  jq -r -c '.op.samples |
    "query_avg_req_time: " + (.query_avg_req_time | add / length | tostring) +
    "\nquery_avg_svc_time: " + (.query_avg_svc_time | add / length | tostring) +
    "\nquery_avg_response_size: " + (.query_avg_response_size | add / length | tostring) +
    "\nquery_avg_result_count: " + (.query_avg_result_count | add / length | tostring) +
    "\nquery_active_requests: " + (.query_active_requests | add | tostring) +
    "\nquery_errors: " + (.query_errors | add | tostring) +
    "\nquery_invalid_requests: " + (.query_invalid_requests | add | tostring) +
    "\nquery_queued_requests: " + (.query_queued_requests | add | tostring) +
    "\nquery_request_time: " + (.query_request_time | add | tostring) +
    "\nquery_requests: " + (.query_requests | add | tostring) +
    "\nquery_requests_1000ms: " + (.query_requests_1000ms | add | tostring) +
    "\nquery_requests_250ms: " + (.query_requests_250ms | add | tostring) +
    "\nquery_requests_5000ms: " + (.query_requests_5000ms | add | tostring) +
    "\nquery_requests_500ms: " + (.query_requests_500ms | add | tostring) +
    "\nquery_result_count: " + (.query_result_count | add | tostring) +
    "\nquery_result_size: " + (.query_result_size | add | tostring) +
    "\nquery_selects: " + (.query_selects | add | tostring) +
    "\nquery_service_time: " + (.query_service_time | add | tostring) +
    "\nquery_warnings: " + (.query_warnings | add | tostring)'

Example: Stats for Each Node Separately

# loop over each of the nodes running the query service
for node in $(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] |
    select(.services | contains(["n1ql"]) == true) |
    .hostname'
  )
do
  echo "$node Query Stats"
  echo "-------------------------------------------------------"
  # get the query stats for the specific node
  curl \
    --user Administrator:password \
    --silent \
    --request GET \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/@query/nodes/$node/stats | \
    jq -r '.op.samples | to_entries[] | select(.key != "timestamp") |
      .key + ": " + (.value | add / length | tostring)'
done

Key Metrics to Monitor

Couchbase Metric: query_avg_svc_time
Description: The average time of the query service for requests.
Response: Create a baseline for this value, as "normal" will depend on workload. Alert at 2x of the baseline. This would indicate that more query nodes may be needed or indexes are performing slowly and require investigation.

Couchbase Metric: query_requests
Description: The number of query requests per second.
Response: Create a baseline for this value, as "normal" will depend on workload. Alert at 2x of the baseline. This would indicate an increase in query traffic.


Monitoring: Operating System

Operating System Metrics

Just as monitoring Couchbase and its individual services, buckets, indexes, etc. is extremely important for a solid understanding of overall cluster health, it is also important to monitor the operating system and various stats for each node in the cluster. Each operating system has different means of retrieving these metrics, and many monitoring solutions collect them out of the box.

OS Metric: Free RAM
Response: Free + cache memory should always be at least 20% of total system memory. If free + cache memory falls below 20%, scale the cluster.

OS Metric: Swap usage
Response: Swap usage should always be zero. If swap is used, it means the OS is under very high memory pressure and unable to purge dirty pages fast enough, and the cluster should be scaled.

OS Metric: Memcached process RAM usage
Response: Create a baseline for this value, as "normal" will be dependent upon your working set. Alert if this value exceeds 150% of baseline. This may indicate an unusual increase in write traffic, reading of typically cold data, or possible malloc fragmentation. Confirm the Couchbase resident ratios are still correct. Add memory or scale the cluster if necessary.

OS Metric: Beam.smp process RAM usage
Response: Create a baseline for this value, as "normal" will be dependent upon your cluster size and API activity levels. Alert if this value exceeds 120% of baseline. This may indicate a memory leak in the beam process. Contact Couchbase Support if larger than a few gigabytes.

OS Metric: IO utilization (iostat)
Response: Create a baseline for this value, as "normal" will be dependent upon your workload and available disk IO. Overall sustained IO utilization should not exceed 90% of total IO capacity.

OS Metric: Total CPU utilization
Response: Create a baseline for this value, as "normal" will be dependent upon your workload. Sustained CPU utilization > 90% indicates a need to scale the cluster.

OS Metric: Couchbase service CPU utilization
Response: Create a baseline for these values, as "normal" will be dependent upon your workload. Alert if this value exceeds 2x of baseline.

OS Metric: Beam.smp CPU utilization
Response: Create a baseline for this value, as "normal" will be dependent upon your workload. Alert if this value exceeds 2x of baseline.

OS Metric: %steal CPU
Response: This value should always be zero. Anything above zero indicates the VM hypervisor is oversubscribed. Additional physical hosts should be added or collocated VMs should be migrated to other hosts.

OS Metric: Network utilization
Response: Create a baseline for this value, as "normal" will be dependent upon your workload. Alert if this value exceeds 120% of baseline. If the sustained utilization is above 80% of the total available bandwidth, it indicates the need to scale the cluster.

OS Metric: Presence of beam.smp process
Response: Alert if beam.smp is not present. This indicates Couchbase is offline and needs to be restarted.

OS Metric: Presence of service processes
Response: Alert if data/index/query/fts/eventing/analytics processes are not present. This indicates Couchbase is either offline, starting up, or services may have crashed and need to be restarted. Below are the processes by service:
  Data Service: memcached
  Data Service: projector
  Data Service: goxdcr
  Index Service: indexer
  Query Service: cbq-engine
  Full Text Search Service: cbft
  Eventing Service: eventing-producer
  Eventing Service: eventing-consumer
  Analytics Service: cbas

OS Metric: NTP clock skew
Response: Couchbase requires all cluster nodes (and any replicated clusters) to have their system clocks synchronized to a common clock source. Monitor clock skew on each server and alert if it is more than 1 minute out of sync.
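
The first two rules in the list above (free + cache memory of at least 20%, and zero swap usage) are simple enough to sketch as a shell check. This is an illustrative example only: the function name os_memory_alerts and the message wording are hypothetical, and how the four input values are gathered depends on the operating system (on Linux they could be derived from the MemFree, Cached, MemTotal, SwapTotal, and SwapFree fields of /proc/meminfo).

```shell
# Hypothetical helper: apply the free+cache and swap rules to values that
# have already been collected, all in the same unit (for example kB).
# usage: os_memory_alerts FREE CACHED TOTAL SWAP_USED
os_memory_alerts() {
  awk -v f="$1" -v c="$2" -v t="$3" -v s="$4" 'BEGIN {
    # (free + cache) / total < 20%, written without division
    if ((f + c) * 100 < t * 20) print "ALERT free+cache below 20% of total memory"
    if (s > 0) print "ALERT swap in use"
  }'
}
```

No output means both rules are satisfied; each violated rule produces its own line so the calling agent can raise separate alerts.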

Couchbase System Stats

The following Operating System stats are available via the Cluster-Wide or Per-Node Endpoints listed below.

Available Stats

Stat name             Description
allocstall            Number of allocations stalled when reclaiming
cpu_cores_available   Number of CPU cores available in the cluster or the node
cpu_irq_rate          The CPU interrupt request rate
cpu_stolen_rate       CPU steal rate
cpu_idle_ms           The amount of time the CPU has been idle
cpu_local_ms
cpu_utilization_rate  Max CPU utilization %
hibernated_requests   Idle streaming requests
hibernated_waked      Streaming wakeups/sec
mem_actual_free       Amount of RAM available on this server
mem_actual_used       Amount of RAM used on this server
mem_free              Amount of RAM available on this server
mem_limit             The limit for RAM
mem_total             Total amount of RAM on this server
mem_used_sys          Amount of RAM available to the OS
odp_report_failed
rest_requests         Management port reqs/sec
swap_total            Amount of swap space available on this server
swap_used             Amount of swap space in use on this server


GET Cluster System Stats

These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster and the best practice is to monitor each node individually.

Insecure: http://localhost:8091/pools/default/buckets/@system/stats

Secure: https://localhost:18091/pools/default/buckets/@system/stats

# average each system stat over the samples returned for the last minute
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@system/stats | \
  jq -r '.op.samples | to_entries[] | select(.key != "timestamp") |
    .key + ": " + (.value | add / length | tostring)'

GET Node-Level OS Stats

Each node in the cluster should be monitored individually using the endpoint listed below.

Insecure: http://localhost:8091/pools/default/buckets/@system/nodes/NODE/stats

Secure: https://localhost:18091/pools/default/buckets/@system/nodes/NODE/stats

Example: Stats for Individual Node

The following example demonstrates how to retrieve the system stats for an individual node.

NODE="172.17.0.2:8091"

# average each listed stat over the samples returned for the last minute
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@system/nodes/$NODE/stats | \
  jq -r -c '.op.samples |
    "cpu_idle_ms: " + (.cpu_idle_ms | add / length | tostring) +
    "\ncpu_local_ms: " + (.cpu_local_ms | add / length | tostring) +
    "\ncpu_utilization_rate: " + (.cpu_utilization_rate | add / length | tostring) +
    "\nhibernated_requests: " + (.hibernated_requests | add / length | tostring) +
    "\nhibernated_waked: " + (.hibernated_waked | add / length | tostring) +
    "\nmem_actual_free: " + (.mem_actual_free | add / length | tostring) +
    "\nmem_actual_used: " + (.mem_actual_used | add / length | tostring) +
    "\nmem_free: " + (.mem_free | add / length | tostring) +
    "\nmem_total: " + (.mem_total | add / length | tostring) +
    "\nmem_used_sys: " + (.mem_used_sys | add / length | tostring) +
    "\nrest_requests: " + (.rest_requests | add / length | tostring) +
    "\nswap_total: " + (.swap_total | add / length | tostring) +
    "\nswap_used: " + (.swap_used | add / length | tostring)'

Example: Stats for Each Node Separately

# loop over each of the nodes in the cluster
for node in $(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] |
    .hostname'
)
do
  echo "$node OS Stats"
  echo "-------------------------------------------------------"
  # get the system stats for the specific node
  curl \
    --user Administrator:password \
    --silent \
    --request GET \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/@system/nodes/$node/stats | \
    jq -r '.op.samples | to_entries[] | select(.key != "timestamp") |
      .key + ": " + (.value | add / length | tostring)'
done


Monitoring: XDCR

Replication Status

The tasks endpoint will provide cluster-wide information on operations such as rebalance, XDCR replications, etc. The response is an array that will need to be filtered for items containing [].type == "xdcr".

Insecure: http://localhost:8091/pools/default/tasks

Secure: https://localhost:18091/pools/default/tasks

Response:

[
  {
    "cancelURI": "/controller/cancelXDCR/20763b82bb6b517bd0d15d9f6b78c13c%2Ftravel-sample%2Fdemo",
    "settingsURI": "/settings/replications/20763b82bb6b517bd0d15d9f6b78c13c%2Ftravel-sample%2Fdemo",
    "status": "running",
    "replicationType": "xmem",
    "continuous": true,
    "filterExpression": "",
    "id": "20763b82bb6b517bd0d15d9f6b78c13c/travel-sample/demo",
    "pauseRequested": false,
    "source": "travel-sample",
    "target": "/remoteClusters/20763b82bb6b517bd0d15d9f6b78c13c/buckets/demo",
    "type": "xdcr",
    "recommendedRefreshPeriod": 10,
    "changesLeft": 0,
    "docsChecked": 0,
    "docsWritten": 31591,
    "maxVBReps": null,
    "errors": []
  }
]

Key Metrics to Monitor

Couchbase Metric    Description    Response

status

Indicates whether a replication is in a "running", "paused", or "notRunning" state.

Alert if the value is "paused" or "notRunning".
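
The alert rule for status can be sketched as a small shell function. The name `replication_needs_alert` is hypothetical; the status strings are the three listed above, and any other value is treated as not alerting, which is an assumption you may want to change.

```shell
#!/bin/sh
# replication_needs_alert STATUS
# Succeeds (exit status 0) when STATUS is "paused" or "notRunning".
replication_needs_alert() {
  case "$1" in
    paused|notRunning) return 0 ;;
    *) return 1 ;;
  esac
}

# Walk through the three documented states.
for status in running paused notRunning; do
  if replication_needs_alert "$status"; then
    echo "$status: alert"
  else
    echo "$status: ok"
  fi
done
```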

Note: The replicationId is composed of 3 parts, delimited by a /:

Remote Cluster ID
Source Bucket
Target Bucket

Sample Replication Id: 6f76c2a07245aef856db44a8e361032/travel-sample/default
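
As an illustration of the three-part structure, the following sketch splits a replicationId with POSIX parameter expansion. The function name `split_replication_id` is hypothetical, and the sketch assumes that none of the three parts contains a / itself.

```shell
#!/bin/sh
# split_replication_id ID
# Prints the remote cluster ID, source bucket, and target bucket,
# one per line. Assumes no part contains a "/".
split_replication_id() {
  id="$1"
  remote_cluster="${id%%/*}"    # text before the first "/"
  rest="${id#*/}"               # text after the first "/"
  source_bucket="${rest%%/*}"   # text before the next "/"
  target_bucket="${rest#*/}"    # everything that remains
  printf '%s\n%s\n%s\n' "$remote_cluster" "$source_bucket" "$target_bucket"
}

split_replication_id '6f76c2a07245aef856db44a8e361032/travel-sample/default'
```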

Example

The following example illustrates outputting the replication ID and status.

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/default/tasks | \
  jq -r 'map(select(.type | contains("xdcr"))) |
    .[] | .id + " (" + .status + ")"'

This example shows outputting all replications whose status is "paused" or "notRunning".

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/default/tasks | \
  jq -c 'map(select(
    (.type | contains("xdcr"))
    and
    (.status | contains("paused") or contains("notRunning"))
  )) | .[] | .id + " (" + .status + ")"'

Per Replication Stats

The XDCR stats are an aggregate for all of the configured replications, either for the entire cluster or a specific node.

Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-xdcr-statistics.html

Available Stats

Stat name                                               Description
replication_changes_left                                The total number of changes left across all replications for the bucket
replication_docs_rep_queue                              The total number of documents in replication queue for all replications for the bucket
replications/replicationId/bandwidth_usage              Bandwidth used during replication, measured in bytes per second
replications/replicationId/changes_left                 Number of mutations to be replicated to the remote cluster
replications/replicationId/data_replicated              Size of data replicated in bytes
replications/replicationId/datapool_failed_gets         Number of failed gets from the pool
replications/replicationId/dcp_datach_length
replications/replicationId/dcp_dispatch_time
replications/replicationId/deletion_docs_written        The number of docs deleted that have been written to the target cluster
replications/replicationId/deletion_failed_cr_source    The number of deletes that have failed conflict resolution on the source due to optimistic replication
replications/replicationId/deletion_filtered            The number of deletes that have been filtered
replications/replicationId/deletion_received_from_dcp   The number of deletes that have been received from DCP
replications/replicationId/docs_checked                 Number of documents checked for changes
replications/replicationId/docs_failed_cr_source        The number of docs that have failed conflict resolution on the source due to optimistic replication
replications/replicationId/docs_filtered                Number of documents that have been filtered out and not replicated to target cluster
replications/replicationId/docs_opt_repd                Number of documents sent optimistically
replications/replicationId/docs_processed               The number of documents processed
replications/replicationId/docs_received_from_dcp       Number of documents received from DCP
replications/replicationId/docs_rep_queue               Number of documents in replication queue
replications/replicationId/docs_unable_to_filter        The number of documents where filtering could not be processed
replications/replicationId/docs_written                 Number of documents written to the target cluster
replications/replicationId/expiry_docs_written          The number of expiry documents written to the target cluster
replications/replicationId/expiry_failed_cr_source      The number of expiries that have failed conflict resolution on the source due to optimistic replication
replications/replicationId/expiry_filtered              The number of expiry documents that have been filtered out and not replicated to the target cluster
replications/replicationId/expiry_received_from_dcp     The number of expiry documents that have been received from DCP
replications/replicationId/expiry_stripped              The number of expiry documents removed from replicating
replications/replicationId/num_checkpoints              Number of checkpoints issued in replication queue
replications/replicationId/num_failedckpts              Number of checkpoints failed during replication
replications/replicationId/percent_completeness         Percentage of checked items out of all checked and to-be-replicated items
replications/replicationId/rate_doc_checks
replications/replicationId/rate_doc_opt_repd
replications/replicationId/rate_received_from_dcp       Number of documents received from DCP per second
replications/replicationId/rate_replicated              Rate of documents being replicated, measured in documents per second
replications/replicationId/resp_wait_time
replications/replicationId/set_docs_written             The number of sets that have been written to the target cluster
replications/replicationId/set_failed_cr_source         The number of sets that have failed conflict resolution on the source due to optimistic replication
replications/replicationId/set_filtered                 Number of sets that have been filtered out and not replicated to target cluster
replications/replicationId/set_received_from_dcp        The number of sets that have been received from DCP
replications/replicationId/size_rep_queue               Size of replication queue in bytes
replications/replicationId/throttle_latency             Throttle latency
replications/replicationId/throughput_throttle_latency  Throughput throttle latency
replications/replicationId/time_committing              Seconds elapsed during replication
replications/replicationId/wtavg_docs_latency           Weighted average latency for sending replicated changes to target cluster
replications/replicationId/wtavg_meta_latency           Weighted average time for requesting document metadata. XDCR uses this for conflict resolution prior to sending the document into the replication queue

GET Cluster-Wide Bucket XDCR Stats

These endpoints are informational and should not be used for monitoring, as they are an aggregate for the entire cluster and the best practice is to monitor each node individually.

Insecure: http://localhost:8091/pools/default/buckets/@xdcr-BUCKET/stats

Secure: https://localhost:18091/pools/default/buckets/@xdcr-BUCKET/stats


Example: Single Bucket

This example will output the XDCR stats for a specific bucket.

# keep only the per-replication keys (those containing a "/") and
# average each one over the returned samples
curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@xdcr-travel-sample/stats | \
  jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
    select(.key | split("/") | length > 1) |
    .key + ": " +
    (.value | add / length | tostring)'

Example: All Replications

This example will output all XDCR stats for every bucket that has one or more replications configured.

# loop over each of the buckets
for bucket in $(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/default/tasks | \
  jq -r '[ .[] | select(.type == "xdcr") | .source ] | sort | unique | .[]')
do
  echo ""
  echo "Bucket: $bucket"
  echo "================================================================"
  # get the xdcr stats for the bucket
  curl \
    --user Administrator:password \
    --silent \
    --request GET \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/@xdcr-$bucket/stats | \
    jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
      select(.key | split("/") | length > 1) |
      .key + ": " +
      (.value | add / length | tostring)'
done

GET Node-Level Bucket XDCR Stats

Each data node in the cluster should be monitored individually using the endpoint listed below.

Insecure: http://localhost:8091/pools/default/buckets/@xdcr-BUCKET/nodes/NODE/stats

Secure: https://localhost:18091/pools/default/buckets/@xdcr-BUCKET/nodes/NODE/stats

Example: Single Bucket

This example will output the XDCR stats for a specific node and bucket.

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/@xdcr-travel-sample/nodes/172.17.0.2:8091/stats | \
  jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
    select(.key | split("/") | length > 1) |
    .key + ": " +
    (.value | add / length | tostring)'

Example: All Replications

This example will output all XDCR stats for a single node for every bucket that has one or more replications configured.

# loop over each of the buckets
for bucket in $(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/default/tasks | \
  jq -r '[ .[] | select(.type == "xdcr") | .source ] | sort | unique | .[]')
do
  echo ""
  echo "Bucket: $bucket"
  echo "================================================================"
  # get the xdcr stats for the bucket
  curl \
    --user Administrator:password \
    --silent \
    --request GET \
    --data zoom=minute \
    http://localhost:8091/pools/default/buckets/@xdcr-$bucket/nodes/172.17.0.2:8091/stats | \
    jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
      select(.key | split("/") | length > 1) |
      .key + ": " +
      (.value | add / length | tostring)'
done

Example: All Replications for Each Node

This example will output all XDCR stats for each data node for every bucket that has one or more replications configured.

# get all of the buckets in the cluster that have 1 or more
# xdcr replications configured
buckets=$(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/default/tasks | \
  jq -r '[ .[] | select(.type == "xdcr") | .source ] | sort | unique | .[]')

# get all of the nodes in the cluster running the data service
nodes=$(curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/nodes | \
  jq -r '.nodes[] |
    select(.services | contains(["kv"]) == true) |
    .hostname'
)

# loop over each of the buckets
# ($buckets and $nodes are newline-separated strings, not arrays,
# so they are expanded unquoted and split on whitespace)
for bucket in $buckets
do
  echo ""
  echo "Bucket: $bucket"
  echo "================================================================"
  # loop over each of the nodes in the cluster
  for node in $nodes
  do
    echo "Node: $node"
    echo "----------------------------------------------------------------"
    # get the xdcr stats for the bucket on the node
    curl \
      --user Administrator:password \
      --silent \
      --request GET \
      --data zoom=minute \
      http://localhost:8091/pools/default/buckets/@xdcr-$bucket/nodes/$node/stats | \
      jq -r '.op.samples | to_entries | sort_by(.key) | .[] |
        select(.key | split("/") | length > 1) |
        .key + ": " +
        (.value | add / length | tostring)'
    echo ""
  done
done

Key Metrics to Monitor

Couchbase Metric    Description    Response

changes_left

The number of items pending XDCR replication. This can be used to approximate the degree of eventual consistency between clusters.

Create a baseline for this value, as "normal" will depend on workload, XDCR configuration, and available bandwidth. Alert at 2x of baseline. This may indicate a resource bottleneck.

bandwidth_usage

The amount of bandwidth in bytes used for XDCR replication.

An alert value for this metric should be based on the network interconnect capacity between the clusters and the percentage of the interconnect XDCR is expected or allowed to consume.
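
As a worked example of the bandwidth_usage guidance, the sketch below derives an alert value from an interconnect capacity and an allowed percentage. The function name `bandwidth_threshold`, the 1 Gbit/s link, and the 40% allowance are all hypothetical; substitute figures from your own environment.

```shell
#!/bin/sh
# bandwidth_threshold LINK_BYTES_PER_SEC ALLOWED_PERCENT
# Prints the bytes-per-second value above which to alert.
bandwidth_threshold() {
  awk -v cap="$1" -v pct="$2" 'BEGIN { printf "%d\n", cap * pct / 100 }'
}

# Hypothetical: a 1 Gbit/s interconnect is 125000000 bytes per second,
# and XDCR is allowed to consume 40% of it.
bandwidth_threshold 125000000 40
```

With these inputs the function prints 50000000, so an alert would fire when bandwidth_usage stays above 50,000,000 bytes per second.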

GET Per Node Individual Stat for a Replication

Each XDCR replication stat can be retrieved individually. The entire key must be URL-encoded, where /'s are replaced with %2F.

Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-xdcr-statistics.html
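
To illustrate the encoding rule, the following sketch assembles a stat key from its parts and replaces each / with %2F. The function name `xdcr_stat_key` is hypothetical, and the sketch only handles the / character; it assumes the bucket names contain nothing else that needs URL-encoding.

```shell
#!/bin/sh
# xdcr_stat_key REMOTE_CLUSTER_ID SOURCE_BUCKET TARGET_BUCKET STAT_NAME
# Prints the stat key with every "/" replaced by "%2F".
xdcr_stat_key() {
  printf 'replications/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4" | sed 's|/|%2F|g'
}

xdcr_stat_key 20763b82bb6b517bd0d15d9f6b78c13c travel-sample demo percent_completeness
```

The printed key can then be appended to the bucket's stats URL after a trailing /.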

Example

This example shows requesting an individual stat for a single replication and displays the results for each data node in the cluster.

# set the replication info
REMOTE_CLUSTER='20763b82bb6b517bd0d15d9f6b78c13c'
SOURCE_BUCKET='travel-sample'
TARGET_BUCKET='demo'
STAT_NAME='percent_completeness'

# build the url
STAT_URL="http://localhost:8091/pools/default/buckets/$SOURCE_BUCKET/stats"
STAT_URL="$STAT_URL/replications%2F$REMOTE_CLUSTER%2F$SOURCE_BUCKET"
STAT_URL="$STAT_URL%2F$TARGET_BUCKET%2F$STAT_NAME"

# print the host of each node followed by its average value for the stat
curl \
  --user Administrator:password \
  --silent \
  "$STAT_URL" | \
  jq -r '.nodeStats | to_entries | .[] |
    (.key | split(":") | .[0]) + ": " + (.value | add / length | tostring)'

GET Remote Cluster Information

The replicationId is a uniquely generated ID and does not convey the remote cluster details. All configured remote clusters and their associated IDs can be retrieved from the REST API.

Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-xdcr-get-ref.html

Insecure: http://localhost:8091/pools/default/remoteClusters

Secure: https://localhost:18091/pools/default/remoteClusters

Example

This example shows retrieving all of the configured remote clusters.

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/default/remoteClusters | \
  jq -r '.'

Bucket XDCR Operations

GET Bucket Incoming XDCR operations

To retrieve the incoming write operations that occur on a target cluster due to replication, make the request on your target cluster and bucket.

Documentation: https://docs.couchbase.com/server/6.0/rest-api/rest-xdcr-statistics.html#rest-xdcr-stats-operations

Insecure: http://localhost:8091/pools/default/buckets/BUCKET/stats

Secure: https://localhost:18091/pools/default/buckets/BUCKET/stats

Available Stats

Stat name            Description
ep_num_ops_get_meta  The number of metadata read operations per second for the bucket as the target for XDCR
ep_num_ops_set_meta  The number of set operations per second for the bucket as the target for XDCR
ep_num_ops_del_meta  The number of delete operations per second for the bucket as the target for XDCR
xdc_ops              Total XDCR operations per second for this bucket (measured from the sum of the statistics: ep_num_ops_del_meta, ep_num_ops_get_meta, and ep_num_ops_set_meta)

Example

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/travel-sample/stats | \
  jq -r '.op.samples |
    "ep_num_ops_get_meta: " + (.ep_num_ops_get_meta | add / length | tostring) +
    "\nep_num_ops_set_meta: " + (.ep_num_ops_set_meta | add / length | tostring) +
    "\nep_num_ops_del_meta: " + (.ep_num_ops_del_meta | add / length | tostring) +
    "\nxdc_ops: " + (.xdc_ops | add / length | tostring)'

GET XDCR Timestamp-based Conflict Resolution Stats

When using buckets configured with Timestamp-based Conflict Resolution, it is important to monitor the drift-related statistics. When a cluster is the destination for XDCR traffic, active vBuckets will calculate drift from their remote cluster peers.

It is normal for a cluster with closely synchronized clocks to show some drift; in general it shows how long it took a mutation to be replicated, and it should remain steady. It is also normal for the active vBucket drift to be zero if no XDCR relationship exists (or if no XDCR traffic is flowing).

Documentation: https://docs.couchbase.com/server/6.0/learn/clusters-and-availability/xdcr-monitor-timestamp-conflict-resolution.html

Insecure: http://localhost:8091/pools/default/buckets/BUCKET/stats

Secure: https://localhost:18091/pools/default/buckets/BUCKET/stats

Available Stats

Stat name                              Description
avg_active_timestamp_drift
avg_replica_timestamp_drift
ep_active_hlc_drift                    The sum of total_abs_drift for the node's active vBuckets
ep_active_hlc_drift_count              The sum of total_abs_drift_count for the node's active vBuckets
ep_replica_hlc_drift                   The sum of total_abs_drift for the node's replica vBuckets
ep_replica_hlc_drift_count             The sum of total_abs_drift_count for the node's replica vBuckets
ep_active_ahead_exceptions             The sum of drift_ahead_exceeded for the node's active vBuckets
ep_replica_ahead_exceptions            The sum of drift_ahead_exceeded for the node's replica vBuckets
ep_clock_cas_drift_threshold_exceeded
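
Because ep_active_hlc_drift is described as a sum and ep_active_hlc_drift_count as the matching count, one way to turn the pair into a single trendable number is to divide the sum by the count. The sketch below does that; it is an assumption about how to combine the two stats rather than something this guide specifies, the function name `average_drift` and the sample values are hypothetical, and the unit of the result is whatever unit the drift sum is reported in.

```shell
#!/bin/sh
# average_drift DRIFT_SUM DRIFT_COUNT
# Prints DRIFT_SUM / DRIFT_COUNT, or 0 when the count is zero
# (for example when no XDCR traffic is flowing).
average_drift() {
  awk -v sum="$1" -v count="$2" \
    'BEGIN { if (count > 0) print sum / count; else print 0 }'
}

# Hypothetical sample: a drift sum of 1500 over 300 counted updates.
average_drift 1500 300
```

The zero-count guard matters because, as noted above, drift is expected to be zero when no XDCR relationship exists.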

Example

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  --data zoom=minute \
  http://localhost:8091/pools/default/buckets/travel-sample/stats | \
  jq -r '.op.samples |
    "avg_active_timestamp_drift: " +
    (.avg_active_timestamp_drift | add / length | tostring) +
    "\navg_replica_timestamp_drift: " +
    (.avg_replica_timestamp_drift | add / length | tostring) +
    "\nep_active_hlc_drift: " +
    (.ep_active_hlc_drift | add / length | tostring) +
    "\nep_active_hlc_drift_count: " +
    (.ep_active_hlc_drift_count | add / length | tostring) +
    "\nep_replica_hlc_drift: " +
    (.ep_replica_hlc_drift | add / length | tostring) +
    "\nep_replica_hlc_drift_count: " +
    (.ep_replica_hlc_drift_count | add / length | tostring) +
    "\nep_active_ahead_exceptions: " +
    (.ep_active_ahead_exceptions | add / length | tostring) +
    "\nep_clock_cas_drift_threshold_exceeded: " +
    (.ep_clock_cas_drift_threshold_exceeded | add / length | tostring)'
