How to set up orchestrator to manage thousands of MySQL servers

Simon J Mudd | 3rd October 2017

Session Summary

• What is orchestrator and why use it?
• What happens as you monitor more servers?
• Features added to make it scale and improve usability
• Using orchestrator at smaller scale
• Way forward

Booking.com

• One of the largest travel e-commerce sites in the world
  • Part of the Priceline Group (NASDAQ: PCLN)
  • We offer accommodation in 228 countries
  • Our website and customer service are available in 40 languages
  • More than 15,000 employees in 204 offices in 70 countries
• We use thousands of MySQL servers
  • We use orchestrator to manage the topology and handle master and intermediate master failures

What is Orchestrator and why use it?

Orchestrator

• Written by Shlomi Noach
  • He started on this at Outbrain and is now working at github.com
• He introduced Booking.com to orchestrator about 3 years ago, when we were looking for something to handle failovers automatically

Orchestrator

• Periodically monitors MySQL servers and checks their health
• Handles master failover, but also does much more…
• GUI to manage and visualise the topology: very handy
• CLI to do the same tasks: used for scripting
• API calls to run at a distance
• Needs a DB backend to store state
  • Normally MySQL, but can be SQLite
• Written in Go

Orchestrator

What failures does it handle?
• Master failures
  • Optional hooks to external systems which need to be aware of these failures
• Intermediate master failures
• Does not care about leaf slaves or applications
• Works with Oracle or MariaDB GTID
• Works without GTID: can add Pseudo-GTID (events injected on the master are used to find a match), so no need to migrate
• Handles multi-level topologies

Orchestrator GUI

[Three slides titled "Orchestrator GUI"; no text content survives in this transcript]

Orchestrator

Topology Management
• Drag and drop using the GUI
  • Move one slave about
  • Move all slaves
• Scriptable relocation from the command line or using API calls
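
Scripted relocation through the API could look roughly like the sketch below. It assumes the relocate call has the form /api/relocate/<host>/<port>/<belowHost>/<belowPort>; check this against the API documentation of the orchestrator version you run. The base URL and all host names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical endpoint; in practice this would be your own orchestrator URL.
ORCHESTRATOR_API = "http://orchestrator.example.com:3000/api"


def relocate_url(host, port, below_host, below_port, api=ORCHESTRATOR_API):
    """Build the URL asking orchestrator to move host:port below below_host:below_port.

    The path layout is an assumption; verify it against your orchestrator version.
    """
    return "{}/relocate/{}/{}/{}/{}".format(
        api.rstrip("/"),
        quote(host, safe=""), int(port),
        quote(below_host, safe=""), int(below_port),
    )


def relocate(host, port, below_host, below_port, timeout=10):
    """Call the API and return the decoded JSON reply (performs a network request)."""
    url = relocate_url(host, port, below_host, below_port)
    with urlopen(url, timeout=timeout) as reply:
        return json.load(reply)
```

Keeping the URL construction separate from the network call makes the script easy to dry-run before pointing it at a production topology.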

What happens as you monitor more servers?

• Integration needed with internal infrastructure
• Deployment: tell orchestrator to discover and forget servers*
• Determine candidate masters
• Handle special cases:
  • Testing MySQL versions, special setups (black- or white-list servers or clusters)
• Make orchestrator HA
• Monitor orchestrator behaviour and performance
• Provide wider access to different types of user

* It can automatically detect new servers in an existing cluster, but it cannot detect new clusters without help

Integration with Internal Infrastructure

• Populate the metadata db on the master to:
  • Map host or instance names to more familiar cluster names
  • Determine replication delay
  • Configure acceptable levels of replication delay
• Add and remove servers/instances as they are deployed or removed from service
• Set up Pseudo-GTID (if not using GTID)

Integration with Internal Infrastructure

• Add failover hooks for monitoring, notification and to take site-specific actions (tell other systems about the new master)
• Selection of candidate masters
  • Blacklisting servers which are not suitable: backup servers, test servers, servers in the wrong network areas…

Better Visibility

• Improve orchestrator deployment visibility
  • For each running app: show host, version, uptime
  • Show the active node and how long it has been active
• Auditing of MySQL failures and recovery via the GUI is good and improving
  • No need to search the logs

Features added to scale and improve usability

Performance

We found bottlenecks, especially on startup
• Trying to discover several thousand MySQL servers at once and update the backend at the same time → max_connections exceeded
• Multiple goroutines trying to poll the same stuck server

Solution:
• FIFO discovery queue which avoids duplicates and limits maximum discovery concurrency
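
The idea of a de-duplicating FIFO queue with bounded concurrency can be sketched as follows. This is an illustration in Python, not orchestrator's Go implementation; the class and function names are invented for the example.

```python
import queue
import threading


class DiscoveryQueue:
    """FIFO queue that ignores keys already queued or being processed."""

    def __init__(self):
        self._fifo = queue.Queue()
        self._pending = set()          # keys queued or currently being discovered
        self._lock = threading.Lock()

    def push(self, key):
        """Queue key unless it is already pending; return True if it was added."""
        with self._lock:
            if key in self._pending:
                return False
            self._pending.add(key)
        self._fifo.put(key)
        return True

    def pop(self):
        """Block until an item is available and return it (FIFO order)."""
        return self._fifo.get()

    def release(self, key):
        """Mark key as finished so that it may be queued again later."""
        with self._lock:
            self._pending.discard(key)

    def stop_workers(self, count):
        """Queue one None sentinel per worker, behind all pending keys."""
        for _ in range(count):
            self._fifo.put(None)


def run_discoveries(keys, discover, max_concurrency=4):
    """Run discover(key) for each distinct key with at most max_concurrency threads."""
    dq = DiscoveryQueue()
    for key in keys:
        dq.push(key)

    def worker():
        while True:
            key = dq.pop()
            if key is None:            # sentinel: no more work for this worker
                return
            try:
                discover(key)
            finally:
                dq.release(key)

    workers = [threading.Thread(target=worker) for _ in range(max_concurrency)]
    for w in workers:
        w.start()
    dq.stop_workers(len(workers))
    for w in workers:
        w.join()
```

Because a key stays in the pending set until release() is called, a stuck server occupies at most one worker, and the fixed worker count caps how many backend connections discovery can open at once.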

Performance

How to figure out what's going on?
• Understanding logging is hard at this scale: too much noise
• No discovery metrics to see problems at server or aggregate level

Solution:
• Collect discovery metrics and keep them for N seconds
• Log discovery times in debug mode
• Provide an interface to retrieve raw or aggregate values to use in monitoring systems
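
A minimal sketch of "collect metrics, keep them for N seconds, expose raw or aggregate values" is shown below. The class name, the sample layout and the choice of a nearest-rank 95th percentile are assumptions made for the illustration, not orchestrator's actual data structures.

```python
import math
import statistics
import time
from collections import deque


class DiscoveryMetrics:
    """Keeps (timestamp, host, seconds) samples for a limited retention period."""

    def __init__(self, retention_seconds=120, clock=time.time):
        self.retention_seconds = retention_seconds
        self._clock = clock
        self._samples = deque()        # oldest sample first

    def _expire(self):
        """Drop samples older than the retention period."""
        cutoff = self._clock() - self.retention_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def record(self, host, seconds):
        """Store how long one discovery of host took."""
        self._expire()
        self._samples.append((self._clock(), host, seconds))

    def raw(self):
        """Return the samples still inside the retention period."""
        self._expire()
        return list(self._samples)

    def aggregate(self):
        """Return count, mean, median, 95th percentile and max of the timings."""
        timings = sorted(sample[2] for sample in self.raw())
        if not timings:
            return {"count": 0}
        rank = max(1, math.ceil(0.95 * len(timings)))   # nearest-rank percentile
        return {
            "count": len(timings),
            "mean": statistics.mean(timings),
            "median": statistics.median(timings),
            "p95": timings[rank - 1],
            "max": timings[-1],
        }
```

A monitoring system would poll aggregate() (or raw() for per-server detail) at an interval shorter than the retention period so that no samples are lost between polls.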

Performance

[Graph: Discovery (Poll) times]

Performance

[Graph: Discovery (Poll) counters]

Performance

• A client upgrade might upgrade the database which other, older apps were still using

Solution:
• Make auto-upgrade of the database optional so the DBA controls this

Performance

• Cross-zone (DC) access changes the performance profile significantly and caused problems
  • orchestrator apps are supposed to be easy to replace, and location should not matter
  • Latency can be a real enemy

Solution:
• Batch updates of some data into a smaller number of larger inserts
• Collect metrics on these timings
• Catch discoveries which take too long (internal code bottlenecks)
• Visibility of the metrics made it easier to locate causes
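
Batching many single-row writes into a smaller number of multi-row inserts can be sketched as below. The table and column names are hypothetical, and an in-memory SQLite database stands in for the real backend; the point is only that each batch costs one round trip instead of one per row.

```python
import sqlite3


def batched_inserts(table, columns, rows, batch_size=100):
    """Yield (sql, params) pairs, each inserting up to batch_size rows at once."""
    column_list = ", ".join(columns)
    row_placeholder = "(" + ", ".join(["?"] * len(columns)) + ")"
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        sql = "INSERT INTO {} ({}) VALUES {}".format(
            table, column_list, ", ".join([row_placeholder] * len(batch))
        )
        # Flatten the rows so the parameters line up with the placeholders.
        params = [value for row in batch for value in row]
        yield sql, params


def write_all(conn, table, columns, rows, batch_size=100):
    """Execute the batched inserts and return how many statements were sent."""
    statements = 0
    for sql, params in batched_inserts(table, columns, rows, batch_size):
        conn.execute(sql, params)
        statements += 1
    conn.commit()
    return statements
```

With 250 rows and a batch size of 100 this sends 3 statements rather than 250, which matters most when each statement pays a cross-zone round trip.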

Performance

• Special connection settings
  • "MySQLOrchestratorMaxPoolConnections": controls the Go connection pool size
  • "MySQLConnectTimeoutSeconds": 1
    • Don't waste time waiting to connect to a dead server

Performance

Golang specific-isms
• Orchestrator by default uses database/sql, which by default sends a query with parameters using MySQL's Prepare/Execute syntax
  • This generates 2 round trips (RTTs) and on slower connections can affect the elapsed time to complete a query
  • Options exist to disable this by interpolating parameter values prior to sending the SQL
• Go (orchestrator code) is quite happy to try to poll 10,000 servers at once
  • Sometimes that is not sensible
  • Throttling to avoid a thundering herd is necessary
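
The cost of the extra round trip is simple arithmetic, sketched below. The query counts and latencies are illustrative assumptions, not measurements from the talk. In the go-sql-driver/mysql driver, client-side interpolation is enabled with the interpolateParams DSN option.

```python
def network_seconds(queries, rtt_ms, round_trips_per_query):
    """Network time only, for queries issued one after another on one connection."""
    return queries * round_trips_per_query * rtt_ms / 1000.0


def prepare_execute_overhead(queries, rtt_ms):
    """Extra seconds spent when each query needs 2 round trips instead of 1."""
    return network_seconds(queries, rtt_ms, 2) - network_seconds(queries, rtt_ms, 1)
```

For example, 1,000 sequential queries over a hypothetical 20 ms cross-zone link spend 40 seconds on the network with Prepare/Execute and 20 seconds with interpolated parameters; on a 0.5 ms local link the same difference is only half a second, which is why the problem shows up mainly across zones.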

Orchestrator HA

• More than one orchestrator server per zone/DC
  • Some upgrades are really easy: just restart with the new binaries
• Common endpoint via a load balancer
  • Simpler for users
  • Works for API calls and may simplify firewall rules

Orchestrator HA

[Diagram labels: Load Balancer; app1, app2, app3, app4; nginx1, nginx2, nginx3, nginx4; backend; Zone 1, Zone 2]

Orchestrator HA

Might I have more than one orchestrator cluster?
• Yes, for active development
  • As a side-effect this gives us extra redundancy
  • Development load is too small to catch many issues
  • Recoveries are disabled globally on this cluster, but monitoring works the same
• Compliance regulations may require segregation of different networks

Orchestrator HA

Solution
• Move from using the orchestrator binary to using the cluster API interface
  • Recently migrated to the new orchestrator-client command, which solves the same problem and was needed for orchestrator/raft access
• Simplifies configuration
• Allows easy access to more than one orchestrator cluster
• Orchestrator upgrades with db backend changes are easier

Orchestrator API

Enhancements to API calls
• Bulk retrieval of instance information and promotion rules
• Asynchronous discovery call (e.g. bootstrap new cluster)
• More monitoring information available
  • Discovery timing metrics
  • Discovery queue metrics
  • Backend write metrics
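
Bootstrapping a new cluster through the asynchronous discovery call might be scripted along these lines. The sketch assumes an endpoint of the form /api/async-discover/<host>/<port>; confirm the exact path for your orchestrator version. The base URL and instance names are hypothetical.

```python
from urllib.parse import quote


def async_discover_urls(api_base, instances):
    """Return one discovery URL per distinct (host, port) pair, in input order."""
    seen = set()
    urls = []
    for host, port in instances:
        key = (host.lower(), int(port))
        if key in seen:                # do not ask for the same instance twice
            continue
        seen.add(key)
        urls.append("{}/async-discover/{}/{}".format(
            api_base.rstrip("/"), quote(key[0], safe=""), key[1]
        ))
    return urls
```

A deployment tool would request each URL once when a server enters service; since the call is asynchronous, the tool does not have to wait for the poll of each new instance to finish.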

Special Cases

• Testing MySQL 8.0 or MariaDB 10.3?
  • "Let's not promote to this box"
  • The same applies while testing new minor versions, of course
• Some topologies have slaves with aggregate data
  • Do not treat them as a normal box: they should not be candidate masters
• Orchestrator cannot handle GR or multi-source replication yet
  • Best to avoid these boxes (for automatic failover) until we have solutions
  • Patches welcome to solve such missing functionality

Special Cases

Handling TLS connections
• Orchestrator could handle using TLS or not using it, but…
• Some servers need to be accessed by TLS, others don't (ODBC access or more security-sensitive systems)
  • Orchestrator could not handle this
• Code added to recognise the error and automatically switch to TLS:
  • Error 3159: Connections using insecure transport are prohibited while --require_secure_transport=ON
• Global OFF button: gives you peace of mind
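
The "recognise error 3159 and switch to TLS" behaviour can be sketched as below. The connect callable, the ConnectError type and the set used to remember TLS-only servers are all hypothetical stand-ins; only the error number comes from the slide.

```python
ER_SECURE_TRANSPORT_REQUIRED = 3159    # MySQL: insecure transport prohibited


class ConnectError(Exception):
    """Hypothetical error type carrying the MySQL error number."""

    def __init__(self, errno, message=""):
        super().__init__(message)
        self.errno = errno


def connect_with_tls_fallback(connect, host, port, tls_required):
    """Connect without TLS first; on error 3159 remember the server and retry with TLS.

    connect(host, port, use_tls) is a caller-supplied callable (an assumption of
    this sketch). tls_required is a set of (host, port) pairs known to need TLS.
    """
    key = (host, port)
    if key in tls_required:
        return connect(host, port, True)
    try:
        return connect(host, port, False)
    except ConnectError as err:
        if err.errno != ER_SECURE_TRANSPORT_REQUIRED:
            raise                       # any other failure is a real error
        tls_required.add(key)
        return connect(host, port, True)
```

Remembering which servers demanded TLS means the failed plain-text attempt is paid only once per server rather than on every poll.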

Provide Wider User Access

• Orchestrator fan club
  • Different groups of users like orchestrator
  • DBAs, developers, sysadmins, auditors, managers
• Use nginx (or similar)
  • Provides authentication
  • Provides TLS
  • The combination can be used with unix groups to allow user or admin access to orchestrator
• Combined with a load balancer, this provides easy access for users and also for applications (using API calls)

Monitoring

Some things to monitor
• Orchestrator process (and nginx)
• Orchestrator cluster endpoint
• Successful or failed discoveries per minute
• Discovery queue sizes
• Discovery timings
  • Aggregate data gives mean, median and percentiles
• Discoveries exceeding InstancePollSeconds
• When changing the active orchestrator node these values may change

Booking.com contributions

Commits to the public orchestrator repo
• Simon: 170
• Dmitry: 40
• Mauro: 15
• Daniël: 8
• Shlomi: many (while working at Booking.com)

Using orchestrator at smaller scale

Not mentioned here, but
• Consider use of SQLite: a good starting point, single binary
• Consider use of SQLite/raft
  • Provides HA
  • All nodes monitor all MySQL servers
• The only difference is the db backend
• Not sure where the scaling limits are

Configuration settings

Settings to be considered, broken down by function

Configuration settings

• MySQL backend
  • "MySQLOrchestratorCredentialsConfigFile": "/path/.my-orchestratordb.cnf"
  • "MySQLOrchestratorDatabase": "orchestrator"
  • "MySQLOrchestratorHost": "orchestratordb.example.com"
  • "MySQLOrchestratorPort": 3306
  • "MySQLOrchestratorMaxPoolConnections": 100
  • "MySQLConnectTimeoutSeconds": 1
• SQLite backend
  • "BackendDB": "sqlite"
  • "SQLite3DataFile": "/var/lib/orchestrator/orchestrator.db"
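
When many orchestrator hosts are deployed by tooling, the backend part of the configuration can be generated rather than hand-edited. The sketch below only uses the setting names and sample values listed on the slide; the helper functions are invented for the example, and a real configuration file contains many more settings.

```python
import json

# Values copied from the slide; adjust them for your own site.
MYSQL_BACKEND = {
    "MySQLOrchestratorCredentialsConfigFile": "/path/.my-orchestratordb.cnf",
    "MySQLOrchestratorDatabase": "orchestrator",
    "MySQLOrchestratorHost": "orchestratordb.example.com",
    "MySQLOrchestratorPort": 3306,
    "MySQLOrchestratorMaxPoolConnections": 100,
    "MySQLConnectTimeoutSeconds": 1,
}

SQLITE_BACKEND = {
    "BackendDB": "sqlite",
    "SQLite3DataFile": "/var/lib/orchestrator/orchestrator.db",
}


def backend_config(kind, **overrides):
    """Return a copy of the backend settings for kind, with overrides applied."""
    templates = {"mysql": MYSQL_BACKEND, "sqlite": SQLITE_BACKEND}
    if kind not in templates:
        raise ValueError("unknown backend: {!r}".format(kind))
    config = dict(templates[kind])
    unknown = set(overrides) - set(config)
    if unknown:
        # Refuse misspelt setting names instead of silently writing them out.
        raise KeyError("unknown settings: {}".format(sorted(unknown)))
    config.update(overrides)
    return config


def render(config):
    """Serialise the settings as a JSON fragment."""
    return json.dumps(config, indent=2, sort_keys=True)
```

Rejecting unknown keys is a deliberate choice here: a misspelt setting name in a generated file is easy to miss across many hosts.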

Configuration settings

• Pseudo-GTID settings (if using Pseudo-GTID)
  • PseudoGTIDPattern
  • PseudoGTIDMonotonicHint
  • DetectPseudoGTIDQuery

Configuration settings

• Cluster and host settings
  • Query the metadata db (populated externally) to detect clusters
  • DetectClusterAliasQuery
  • DetectClusterDomainQuery
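
The idea behind a cluster alias query is that each master carries a small, externally populated metadata table, and a single-value query against it yields the friendly cluster name. The sketch below models this with an in-memory SQLite database standing in for a master; the table name, column names and query text are hypothetical and site specific.

```python
import sqlite3

# Hypothetical query text; the real schema is whatever your site populates.
DETECT_CLUSTER_ALIAS_QUERY = "SELECT cluster_name FROM cluster_info LIMIT 1"


def detect_cluster_alias(conn, query=DETECT_CLUSTER_ALIAS_QUERY):
    """Run the single-value query and return the alias, or "" if none is set."""
    row = conn.execute(query).fetchone()
    return row[0] if row and row[0] else ""


def make_fake_master(cluster_name=None, max_lag_seconds=30):
    """Create an in-memory stand-in for a master holding the metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cluster_info (cluster_name TEXT, max_lag_seconds INTEGER)"
    )
    if cluster_name is not None:
        conn.execute(
            "INSERT INTO cluster_info (cluster_name, max_lag_seconds) VALUES (?, ?)",
            (cluster_name, max_lag_seconds),
        )
    return conn
```

Returning an empty string when no metadata exists lets the caller fall back to showing the raw host name.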

Configuration settings

• Recovery settings
  • Regexp filters: very site dependent
  • RecoverMasterClusterFilters: white-list masters by cluster name
  • RecoverIntermediateMasterClusterFilters
  • PromotionIgnoreHostnameFilters: exclude servers from being promoted*
  • RecoveryIgnoreHostnameFilters: exclude special servers from recovery

* Does not scale well
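
How lists of regexp filters decide which clusters are recovered and which hosts may be promoted can be sketched as follows. The patterns and names are hypothetical, and the sketch assumes unanchored matching (a filter matches if it is found anywhere in the name). Orchestrator itself uses Go regular expressions, so check its documentation for the exact matching rules.

```python
import re


def matches_any(filters, value):
    """True if any regular expression in filters matches somewhere in value."""
    return any(re.search(pattern, value) for pattern in filters)


def may_recover_master(cluster_name, recover_master_cluster_filters):
    """White-list semantics: recover only clusters matched by a filter."""
    return matches_any(recover_master_cluster_filters, cluster_name)


def promotable(hostname, promotion_ignore_hostname_filters):
    """Ignore-list semantics: a host matched by any filter is never promoted."""
    return not matches_any(promotion_ignore_hostname_filters, hostname)
```

For instance, with the filters ["^prod-", "^billing$"] a cluster named prod-search would be recovered automatically while test-search would not.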

Configuration settings

• Failover settings
  • OnFailureDetectionProcesses: what to do when a failure is detected
  • PreFailoverProcesses: what to do prior to starting recovery
  • PostFailoverProcesses: what to do after completing recovery
  • PostUnsuccessfulFailoverProcesses: what to do if recovery fails
  • PostMasterFailoverProcesses: what to do after master recovery
  • PostIntermediateMasterFailoverProcesses: what to do after intermediate master (IM) recovery
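
A failover hook is typically a small script that tells other systems what happened. The sketch below assumes the hook command is configured to pass the failure type, the failed instance and the promoted instance as three arguments. How orchestrator substitutes such values into hook commands is not covered on the slide, so consult its documentation. The script name and message format are invented for the example.

```python
import json


def build_notification(argv):
    """Turn hook arguments into a JSON message for a monitoring/notification system.

    Expects argv to look like:
        [script, failure_type, failed_instance, successor_instance]
    (an assumed calling convention for this sketch).
    """
    if len(argv) != 4:
        raise ValueError("usage: notify.py FAILURE_TYPE FAILED_INSTANCE SUCCESSOR")
    _, failure_type, failed, successor = argv
    return json.dumps(
        {"event": "failover", "type": failure_type, "failed": failed, "new_master": successor},
        sort_keys=True,
    )
```

A real hook would pass sys.argv to this function and then send the message on (for example to a chat channel or a service-discovery system) and exit non-zero on failure so that the problem is visible.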

Configuration settings

• Authentication settings (e.g. if using nginx with LDAP)
  • "AuthenticationMethod": "proxy",
  • "HTTPAuthUser": "user1",
  • "HTTPAuthPassword": "pass1",
  • "AuthUserHeader": "SomeHeader",
  • "PowerAuthUsers": ["api-user1", "api-user2", "realuser1"],
  • "PowerAuthGroups": ["special_sysadmins", "dbas"],

* Does not scale well

Configuration settings

• Environment settings (e.g. shorten/simplify hostnames)
  • "DataCenterPattern"
  • "PhysicalEnvironmentPattern"
  • "RemoveTextFromHostnameDisplay": ".example.com:3306"
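
The effect of the environment settings can be illustrated with a short sketch. It assumes that RemoveTextFromHostnameDisplay is plain text stripped from the displayed name and that DataCenterPattern is a regular expression whose first capture group is the data centre name. Both assumptions should be checked against the orchestrator documentation. The host naming scheme is hypothetical.

```python
import re


def display_name(instance_key, remove_text):
    """Shorten host:port for display by removing the configured text."""
    return instance_key.replace(remove_text, "")


def data_center(hostname, pattern):
    """Return the first capture group of pattern, or "" when it does not match."""
    match = re.search(pattern, hostname)
    return match.group(1) if match and match.groups() else ""
```

With a hypothetical scheme such as db-12.ams1.example.com, removing ".example.com:3306" leaves db-12.ams1 on screen, and the pattern [.]([^.]+)[.]example[.]com extracts ams1 as the data centre.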

Way Forward

• Improvements needed to tackle problems at both ends of the scale
  • Smaller installations: for getting on board
  • Larger installations: to allow for further scaling

Way Forward

• Simplify configuration and entry to orchestrator
  • Shlomi is doing a very good job with SQLite and raft setups
  • Configuration could be simpler and more automatic for most people
  • Need to standardise orchestrator setups more?
• Extend functionality to cover more of the MySQL ecosystem
  • AWS and other cloud systems
  • Group Replication or Galera
  • Multi-source

Way Forward

• Distribution of discoveries amongst all orchestrator nodes
  • Orchestrator/raft: all nodes monitor all MySQL servers
    • Raft usage recommends having several nodes
  • Orchestrator/MySQL: one node monitors all MySQL servers
  • Better: distribute monitoring amongst available nodes
    • Avoids unnecessary load on monitored servers
    • Reduces work on busy orchestrator apps
    • Useful for small and large installations
    • Efficient balancing is harder
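
One possible way to distribute monitoring amongst the available nodes is rendezvous (highest-random-weight) hashing, sketched below. This is an illustration of the general idea only: the slide describes a wish, not an existing orchestrator feature, and the node and instance names are hypothetical.

```python
import hashlib


def owner(instance_key, nodes):
    """Pick the node responsible for instance_key using rendezvous hashing."""
    def score(node):
        # Every node gets a pseudo-random score per instance; highest score wins.
        return hashlib.sha256("{}|{}".format(node, instance_key).encode()).hexdigest()
    return max(sorted(nodes), key=score)


def assignments(instance_keys, nodes):
    """Map each node to the list of instances it should poll."""
    result = {node: [] for node in nodes}
    for key in instance_keys:
        result[owner(key, nodes)].append(key)
    return result
```

Each instance is polled by exactly one node, and when a node disappears only its own instances move to other nodes. This balances by instance count only; it ignores how expensive each server is to poll, which is part of why efficient balancing is harder.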

Way Forward

• Reduce recovery time
  • Shortening the time from detection to recovery would be good, as it reduces downtime
  • It should be possible to react to a failure event (knowing the state of other servers) immediately
    • State is currently stored in the backend db
    • The analysis and detection phase happens independently of server polling
  • With a reduced default poll time of 5 seconds, recovery is likely to be triggered within 10 seconds
    • Not critical for most people?

Way Forward

• Further work needed to scale more
  • Bottlenecks still exist
  • Larger installations keep growing
• Improve monitoring
  • External API calls
  • Add internal metrics

Conclusion

Does it work?

I checked for failures over a recent period
• 6 master failures
• About 40 intermediate master failures
• No-one called up
• No harm was done

Questions?

Thanks

Simon J Mudd