Breda Development Meetup 2016-06-08 - High Availability

High Availability Breda Development Meetup Bas Peters - june 8, 2016

Transcript of Breda Development Meetup 2016-06-08 - High Availability

Page 1: Breda Development Meetup 2016-06-08 - High Availability

High Availability, Breda Development Meetup, Bas Peters - June 8, 2016

Page 2: Breda Development Meetup 2016-06-08 - High Availability
Page 3: Breda Development Meetup 2016-06-08 - High Availability

Uptime percentage target    Max downtime per year

90%                         36.5 days
99%                         3.65 days
99.5%                       1.83 days
99.9%                       8.76 hours
99.99%                      52.56 minutes
99.999%                     5.26 minutes
99.9999%                    31.5 seconds
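The downtime figures follow directly from the uptime target. A minimal sketch of the arithmetic, assuming a 365-day year (the function names are illustrative and not from the slides):

```python
def max_downtime_seconds(uptime_percent: float, days_per_year: float = 365.0) -> float:
    """Return the allowed downtime per year, in seconds, for an uptime target."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    seconds_per_year = days_per_year * 24 * 60 * 60
    return seconds_per_year * (100.0 - uptime_percent) / 100.0


def humanize(seconds: float) -> str:
    """Format a duration using the largest unit that fits (days, hours, minutes)."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.2f} {unit}"
    return f"{seconds:.2f} seconds"


if __name__ == "__main__":
    for target in (90, 99, 99.5, 99.9, 99.99, 99.999, 99.9999):
        print(f"{target}% -> {humanize(max_downtime_seconds(target))}")
```

Each extra "nine" divides the yearly downtime budget by ten, which is why the step from 99.9% to 99.99% moves the budget from hours to minutes.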

Page 4: Breda Development Meetup 2016-06-08 - High Availability

HA is Redundancy

o RAID: Disk crash? Another disk still works!
o Virtualization: Physical host crashes? VM available on other physical host!
o Clustering: Server crashes? Another server still works!
o Power: Power outage? Redundant power supply!
o Network: Switch or NIC crashes? 2nd network route available!
o Geographical: Datacenter offline? Another DC available to perform work!

Page 5: Breda Development Meetup 2016-06-08 - High Availability

Traditional setup

Diagram components: end user, router, server.

Page 6: Breda Development Meetup 2016-06-08 - High Availability

Traditional setup - enhanced

Diagram components: end user, router, application server, database server.

Page 7: Breda Development Meetup 2016-06-08 - High Availability

Adding redundancy

Diagram components: end user, router, load balancer, application server 1, application server 2, database server.

Page 8: Breda Development Meetup 2016-06-08 - High Availability

Enhanced redundancy

Diagram components: end user, router, router (backup), load balancer, load balancer (backup), application server 1, application server 2, database server.

Page 9: Breda Development Meetup 2016-06-08 - High Availability

Database redundancy

Diagram components: end user, router, router (backup), load balancer, load balancer (backup), application server 1, application server 2, database server 1, database server 2.

Page 10: Breda Development Meetup 2016-06-08 - High Availability

Datacenter redundancy

Diagram components: end user, router, router (backup), load balancer, load balancer (backup), application server 1, application server 2, database server 1, database server 2, spread over datacenter 1 and datacenter 2.

Page 11: Breda Development Meetup 2016-06-08 - High Availability

States and sessions

o Multiple requests can be served by different backend servers
o Store the session in a database or NoSQL cache
o The load balancer can “stick” a single backend server to a user…
o ...but not in all cases!

Diagram components: backend servers app1, app2, app3 and app4; requests 1, 2 and 3.
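As an illustration of keeping session state outside the backend servers, here is a minimal sketch. The `SharedSessionStore` class is a hypothetical in-memory stand-in for the database or NoSQL cache mentioned above, and `Backend` is an equally hypothetical application server; neither name comes from the talk.

```python
import uuid


class SharedSessionStore:
    """In-memory stand-in for a database or NoSQL cache that every backend can reach."""

    def __init__(self):
        self._sessions = {}

    def create(self) -> str:
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {}
        return session_id

    def load(self, session_id: str) -> dict:
        # Return a copy so a backend must save explicitly, as with a remote store.
        return dict(self._sessions.get(session_id, {}))

    def save(self, session_id: str, data: dict) -> None:
        self._sessions[session_id] = dict(data)


class Backend:
    """A backend server that keeps no session state of its own."""

    def __init__(self, name: str, store: SharedSessionStore):
        self.name = name
        self.store = store

    def handle(self, session_id: str) -> int:
        session = self.store.load(session_id)
        session["hits"] = session.get("hits", 0) + 1
        session["last_served_by"] = self.name
        self.store.save(session_id, session)
        return session["hits"]


if __name__ == "__main__":
    store = SharedSessionStore()
    app1, app2 = Backend("app1", store), Backend("app2", store)
    sid = store.create()
    print(app1.handle(sid), app2.handle(sid), app1.handle(sid))
```

Because the counter lives in the shared store, it keeps increasing no matter which backend serves the request, so the load balancer does not need sticky sessions for correctness.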

Page 12: Breda Development Meetup 2016-06-08 - High Availability

Local storage

o Avoid storing meaningful persistent user content on a local server
o Application-level caching is useful as long as it is not destructive
o Synchronization of contents between backend servers is a pain
o Use the database for storage where possible

…There are possibilities to share storage amongst backend servers

Page 13: Breda Development Meetup 2016-06-08 - High Availability

Shared storage - NAS

o Network Attached Storage
o A NAS handles the complete filesystem
o Relies on protocols like:
    NFS: Network File System
    SMB/CIFS: Windows file sharing
o Simple to implement
o Redundancy is very hard to achieve, often a single point of failure
o Performance is mediocre and bottlenecks can occur

Page 14: Breda Development Meetup 2016-06-08 - High Availability

Shared storage - SAN

o Storage Area Network
o A SAN handles only the “block level” part of the filesystem
o Relies on protocols like:
    iSCSI: IP-based SCSI
    Fibre Channel: Optical fiber transport protocol
    AoE: ATA over Ethernet
o Hard to implement, expensive
o Redundancy can be achieved to avoid a single point of failure
o Performance and scalability are (reasonably) good

Page 15: Breda Development Meetup 2016-06-08 - High Availability

Shared storage – Cluster Filesystem

o Filesystem shared on multiple servers using special software/drivers
o Windows implementation:
    DFS: Windows Distributed File System
o Linux implementations:
    HDFS: Hadoop Distributed Filesystem
    Ceph: Object Storage Platform
    GlusterFS: Red Hat Cluster Filesystem
o Relatively easy to implement
o Redundancy can easily be achieved
o Performance and scalability are (reasonably) good

Page 16: Breda Development Meetup 2016-06-08 - High Availability

Database High Availability

o High Availability on RDBMS (relational database management systems) is often the most difficult thing in a highly available setup
o Hardware resources and data need to be redundant
o Remember that it isn’t just data, it is constantly changing data
o High Availability means the operation can continue uninterrupted, not by restoring a new/backup server

Page 17: Breda Development Meetup 2016-06-08 - High Availability

Database HA - Replication

o Asynchronous by default
o One master, many slaves
o No write scale-out possible
o Difficult to recover from a failover situation
o Prone to inconsistency when not used properly

Page 18: Breda Development Meetup 2016-06-08 - High Availability

Database HA - Sharding

o Separate data over multiple database back-ends using keyed distribution
o Multi-master setup possible
o Excellent scalability
o Redundancy needs to be obtained through a complementary methodology
o Requires more complex application logic
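Keyed distribution can be sketched as a stable hash of the record key modulo the number of shards. This is one simple way to do it, not necessarily the scheme the speaker had in mind; the shard names below are hypothetical placeholders.

```python
import hashlib

# Hypothetical shard identifiers; in practice these would map to database back-ends.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard for a key using a stable hash, so every app server agrees."""
    if not shards:
        raise ValueError("at least one shard is required")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for user in ("user:1", "user:2", "user:3"):
        print(user, "->", shard_for(user))
```

Note that plain modulo hashing remaps most keys when the number of shards changes, and it provides no redundancy by itself, which is the "complementary methodology" point in the list above.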

Page 19: Breda Development Meetup 2016-06-08 - High Availability

Database HA – Clustering I

o Synchronous by default
o Multi-master setup possible
o Write scale-out possible
o Near-automatic fault recovery
o Requires code-level replication conflict resolving

Page 20: Breda Development Meetup 2016-06-08 - High Availability

Database HA – Clustering II

Clustering for Microsoft SQL (from 2012)

o AlwaysOn Availability Groups
o Each node requires WSFC (Windows Server Failover Clustering)
o Asynchronous and synchronous commit mode supported
o Up to 8 “warm” availability replicas can be set up
o These replicas can be used for read transactions and backups
o Availability group listener to automatically redirect clients to the best available server
o Not a “real” cluster, no master-master replication possible

Page 21: Breda Development Meetup 2016-06-08 - High Availability

Database HA – Clustering III

Clustering for MySQL (MariaDB)

o Galera (wsrep) plugin to enable clustering (included in MariaDB 10.1 by default)
o Asynchronous and synchronous commit mode supported
o Multi-master synchronous replication
o Read and write scalability
o Automatic membership control, node joining and dropping
o No listener functionality that redirects clients to available nodes

Page 22: Breda Development Meetup 2016-06-08 - High Availability

Clustering – Quorum I

”A quorum is the minimum number of members of a deliberative assembly necessary to conduct the business of that group”
- Wikipedia

Page 23: Breda Development Meetup 2016-06-08 - High Availability

Clustering – Quorum II

o Node Majority: Each node that is available and in communication can vote. The cluster functions only with a majority of the votes.
o When a network partition occurs, the nodes in the minority part will go into lockdown to avoid a “split brain” situation
o When a network partition resolves, the minority part will rejoin the active cluster after a state transfer to retrieve the data that was changed in the meantime
o A cluster should contain an odd number of nodes to prevent a total lockdown during a node failure or network partition

Page 24: Breda Development Meetup 2016-06-08 - High Availability

Clustering – Scenario 1

o Node A is gracefully stopped
o Other nodes receive a “leave” message and the quorum is reduced by 1
o Cluster is online
o Node B and C continue to serve requests because they have the majority of votes (2 of 2)

Page 25: Breda Development Meetup 2016-06-08 - High Availability

Clustering – Scenario 2

o Node A and B are gracefully stopped
o Node C receives “leave” messages from A and B and the quorum is reduced by 2
o Cluster is online
o Node C continues to serve clients since it has the majority of votes in the quorum (1 of 1)

Page 26: Breda Development Meetup 2016-06-08 - High Availability

Clustering – Scenario 3

o All nodes are gracefully stopped
o Cluster is offline
o There is a potential problem in starting the cluster again. The most recent (last stopped) node should be used to bootstrap the cluster or there is potential data loss

Page 27: Breda Development Meetup 2016-06-08 - High Availability

Clustering – Scenario 4

o Node A disappears from the cluster due to unforeseen circumstances
o Node B and C will try to reconnect to A but will eventually remove A from the cluster, maintaining the quorum (3)
o Cluster is online
o Node B and C continue to serve requests because they have the majority of votes (2 of 3)

Page 28: Breda Development Meetup 2016-06-08 - High Availability

Clustering – Scenario 5

o Node A and B disappear from the cluster due to unforeseen circumstances
o Node C will try to reconnect to A and B but will eventually remove both from the cluster, maintaining the quorum (3)
o Cluster is offline
o The cluster is offline because Node C cannot acquire a majority of the votes (1 of 3) and will remain in lockdown

Page 29: Breda Development Meetup 2016-06-08 - High Availability

Clustering – Scenario 6

o All nodes disappear from the cluster due to unforeseen circumstances
o Cluster is offline (obviously)
o This is a potential problem as the node with the most recent data should be used to bootstrap the cluster again to avoid data loss

Page 30: Breda Development Meetup 2016-06-08 - High Availability

Clustering – Scenario 7

o A network split causes Node A, B and C to lose connectivity with Node D, E and F
o Cluster is offline
o Node A, B and C have no majority (3 of 6) and Node D, E and F also have no majority (3 of 6). All nodes go into lockdown
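The scenarios above can be reproduced with a toy model of node-majority voting. This is a simplified sketch, not the behaviour of any specific cluster product: the `Cluster` class and its method names are illustrative, a graceful stop shrinks the quorum size, and an unexpected loss does not.

```python
class Cluster:
    """Toy node-majority model, seen from the nodes that are still running."""

    def __init__(self, nodes):
        self.members = set(nodes)    # nodes that count toward the quorum size
        self.reachable = set(nodes)  # nodes that can currently vote together

    def graceful_stop(self, node):
        # A "leave" message is sent, so the quorum size is reduced.
        self.members.discard(node)
        self.reachable.discard(node)

    def lose(self, node):
        # Crash or network split: no "leave" message, quorum size is kept.
        self.reachable.discard(node)

    def is_online(self) -> bool:
        # Strict majority; an exact tie (for example 3 of 6) is not enough.
        return 2 * len(self.reachable) > len(self.members)


if __name__ == "__main__":
    split = Cluster("ABCDEF")
    for node in "DEF":
        split.lose(node)
    print("3 of 6 online?", split.is_online())
```

The strict inequality in `is_online` is what makes an even split lock down both halves, and it is the reason an odd number of nodes is recommended.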

Page 31: Breda Development Meetup 2016-06-08 - High Availability

Clustering – Multiple Datacenters I

Diagram components: datacenters DC1 and DC2; nodes node1, node2 and node3.

Page 32: Breda Development Meetup 2016-06-08 - High Availability

Clustering – Multiple Datacenters II

Diagram components: datacenters DC1 and DC2; nodes node1, node2, node3 and node4.

Page 33: Breda Development Meetup 2016-06-08 - High Availability

Clustering – Multiple Datacenters III

Diagram components: datacenters DC1, DC2 and DC3; nodes node1, node2 and node3.

Page 34: Breda Development Meetup 2016-06-08 - High Availability

Clustering – Multiple Datacenters IV

Diagram components: datacenters DC1, DC2 and DC3; nodes node1, node2, node3, node4, node5 and node6.

Page 35: Breda Development Meetup 2016-06-08 - High Availability

Health Endpoint Monitoring

o Monitor applications for availability in a HA pool
o Monitor middle-tier services for availability
o Automatic removal of misbehaving endpoints from the pool
o Endpoints that are healthy again after a service interruption are automatically re-added

Page 36: Breda Development Meetup 2016-06-08 - High Availability

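Application Health Check

Diagram: the load balancer sends a status request to the application node, which answers 200 (OK), response time: 50 ms.

Checks on the application node:

o Storage available
o Code can be executed
o Database reachable
o Service A running
o Service B running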
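A health endpoint of this kind typically runs every check and folds the results into a single status code. The sketch below assumes that any failing check should produce 503; the slide itself only shows the 200 (OK) case, and the check names and callables are hypothetical.

```python
from typing import Callable, Dict, Tuple


def health_status(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, bool]]:
    """Run every check and return (HTTP status, per-check results)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failed check, not a crashed endpoint.
            results[name] = False
    status = 200 if all(results.values()) else 503  # 503 is an assumption
    return status, results


if __name__ == "__main__":
    status, details = health_status({
        "storage_available": lambda: True,
        "database_reachable": lambda: True,
        "service_a_running": lambda: True,
    })
    print(status, details)
```

The load balancer only needs the status code to decide whether the node stays in the pool; the per-check details are there for whoever has to diagnose the failure.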

Page 37: Breda Development Meetup 2016-06-08 - High Availability

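Database Health Check

Diagram: the load balancer sends a status request to the database node, which answers 200 (OK), response time: 50 ms.

Checks on the database node:

o Database running
o Simple query can be executed
o Local database node is a healthy cluster node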

Page 38: Breda Development Meetup 2016-06-08 - High Availability

Monitoring Strategy

Diagram components: a load balancer in front of app server 1, app server 2 and app server 3; two DB load balancers, each in front of db node 1, db node 2 and db node 3. Pool listings shown in the diagram: app server 1 and app server 2; DB node 1 and DB node 3.

Page 39: Breda Development Meetup 2016-06-08 - High Availability

Design Patterns for HA environments

o Safeguard performance
o Increase fault tolerance
o Improve consistency

Page 40: Breda Development Meetup 2016-06-08 - High Availability

Queue based load leveling pattern I

o Temporal decoupling
o Load leveling
o Load balancing
o Loose coupling

Diagram: tasks -> message queue -> service. Requests are received at a variable rate; messages are processed at a more consistent rate.
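The leveling effect can be sketched with the standard-library `queue` module: producers enqueue a burst at once, while the service drains at most a fixed number of messages per tick. The `drain` helper and the `max_per_tick` parameter are illustrative names, not part of the talk, and a real setup would use a message broker rather than an in-process queue.

```python
import queue


def drain(q: "queue.Queue", handler, max_per_tick: int) -> int:
    """Process at most max_per_tick queued messages and return how many were handled."""
    processed = 0
    while processed < max_per_tick:
        try:
            message = q.get_nowait()
        except queue.Empty:
            break  # nothing left to do in this tick
        handler(message)
        q.task_done()
        processed += 1
    return processed


if __name__ == "__main__":
    q = queue.Queue()
    for i in range(5):  # a burst of five requests arrives at once
        q.put(f"task-{i}")
    handled = []
    while not q.empty():
        print("tick processed", drain(q, handled.append, max_per_tick=2))
```

The burst is absorbed by the queue and the service sees a steady load, at the cost of extra latency for the messages at the back of the queue.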

Page 41: Breda Development Meetup 2016-06-08 - High Availability

Queue based load leveling pattern II

When to use?

o Any type of application or service that is subject to overloading

When not to use?

o Not suitable if a response with minimal latency is expected from the application or service

Page 42: Breda Development Meetup 2016-06-08 - High Availability

Throttling pattern I

o Reject or delay requests to the application when a certain number of requests in a certain amount of time is reached
o Disable or degrade functionality of selected nonessential services so that essential services can run unimpeded with sufficient resources
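The first bullet, rejecting requests once a limit per time window is reached, can be sketched as a sliding-window counter. The class name, parameters and injectable clock below are assumptions made for illustration; other algorithms such as token buckets serve the same purpose.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Allow at most max_requests calls within any window_seconds interval."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock          # injectable so the behaviour can be tested
        self._stamps = deque()       # timestamps of accepted requests

    def allow(self) -> bool:
        now = self._clock()
        # Forget accepted requests that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) < self._max:
            self._stamps.append(now)
            return True
        return False                 # caller should reject or delay the request


if __name__ == "__main__":
    throttle = SlidingWindowThrottle(max_requests=2, window_seconds=1.0)
    print([throttle.allow() for _ in range(3)])
```

Keeping one throttle instance per tenant or API key is a common way to stop a single client from monopolizing the application.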

Page 43: Breda Development Meetup 2016-06-08 - High Availability

Throttling pattern II

When to use?

o To ensure that a system continues to meet service level agreements
o To prevent a single tenant from monopolizing the resources provided by an application
o To handle bursts in activity
o To help cost-optimize a system by limiting the maximum resource levels needed to keep it functioning

Page 44: Breda Development Meetup 2016-06-08 - High Availability

Retry pattern

o Enable the application to handle anticipated, temporary failures
o Transparently retry an operation that has previously failed, in the expectation that the cause of the failure is transient
o Especially useful in micro-service and cloud architectures
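A minimal retry helper might look like the sketch below. It assumes exponential backoff between attempts, which the slides do not prescribe, and uses a hypothetical `TransientError` exception to mark failures that are worth retrying.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are expected to be temporary."""


def retry(operation, attempts: int = 3, base_delay: float = 0.1, sleep=time.sleep):
    """Call operation until it succeeds or attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # give up and surface the last failure to the caller
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] < 3:
            raise TransientError("temporary glitch")
        return "ok"

    print(retry(flaky))
```

Only errors known to be transient are retried; anything else propagates immediately, so permanent failures are not hidden behind repeated attempts.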

Page 45: Breda Development Meetup 2016-06-08 - High Availability

Deployments

Highly available environments bring additional challenges to software deployments:

o How to perform atomic releases?
o How to roll back a faulty release quickly?
o How to release new software without any downtime?

Page 46: Breda Development Meetup 2016-06-08 - High Availability

Basic deployment

Diagram components: load balancer, application server 1, application server 2, database cluster.

1. Replace application code on app server 1
2. Replace application code on app server 2
3. Apply database changes

DONE!

Page 47: Breda Development Meetup 2016-06-08 - High Availability

Enhanced deployment

Diagram components: load balancer, application server 1, application server 2, database cluster.

1. Remove app server 1 from the pool
2. Replace application code on app server 1
3. Enable app server 1 in the pool and disable app server 2
4. Replace application code on app server 2
5. Enable app server 2 in the pool
6. Apply database changes

DONE!

Page 48: Breda Development Meetup 2016-06-08 - High Availability

A/B Deployments I

Diagram components: load balancer, application server 1, application server 2.

o www.live.nl: appserver 1 - A, appserver 2 - A
o www.shadow.nl: appserver 1 - B, appserver 2 - B
o Each application server runs webserver A from /deploy/A and webserver B from /deploy/B

Page 49: Breda Development Meetup 2016-06-08 - High Availability

A/B Deployments II

Diagram: the load balancer routes each request to the application server based on the requested hostname.

o Request for www.live.nl: “www.live.nl is being served by pool A”. Webserver A code resides at /deploy/A
o Request for www.shadow.nl: “www.shadow.nl is being served by pool B”. Webserver B code resides at /deploy/B

Page 50: Breda Development Meetup 2016-06-08 - High Availability

A/B Deployments III

Diagram: load balancer; www.live.nl: Pool A -> B; www.shadow.nl: Pool B -> A.

By swapping Pool A with Pool B in the load balancer, the entire backends are switched instantaneously.

This enables seamless deployment without downtime.
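The pool swap amounts to exchanging two entries in the load balancer's routing table. The sketch below models that with plain dictionaries; the `LoadBalancer` class is purely illustrative, and only the hostnames and `/deploy/A`, `/deploy/B` paths come from the slides.

```python
class LoadBalancer:
    """Toy routing table mapping hostnames to pools, and pools to deploy paths."""

    def __init__(self):
        self.routes = {"www.live.nl": "A", "www.shadow.nl": "B"}
        self.pools = {"A": "/deploy/A", "B": "/deploy/B"}

    def resolve(self, host: str) -> str:
        """Return the code path that serves the given hostname."""
        return self.pools[self.routes[host]]

    def swap(self) -> None:
        """Exchange the live and shadow pools in one assignment."""
        live, shadow = "www.live.nl", "www.shadow.nl"
        self.routes[live], self.routes[shadow] = self.routes[shadow], self.routes[live]


if __name__ == "__main__":
    lb = LoadBalancer()
    print("before:", lb.resolve("www.live.nl"))
    lb.swap()  # release: shadow becomes live
    print("after: ", lb.resolve("www.live.nl"))
    lb.swap()  # rollback is the same operation
    print("rolled back:", lb.resolve("www.live.nl"))
```

Because a rollback is just a second swap, a faulty release can be reverted as quickly as it was released, provided the database changes remain compatible with both versions.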

Page 51: Breda Development Meetup 2016-06-08 - High Availability

Deployment best practices

o Never introduce backwards-breaking changes to the database
o Thoroughly test the shadow-live environment as it is the closest to the real live deployment
o Maintain tight release versioning, based on semantic versioning
o Releasing at the end of the day and on a Friday is not recommended

Page 52: Breda Development Meetup 2016-06-08 - High Availability

Questions?

Page 53: Breda Development Meetup 2016-06-08 - High Availability

WWW.CMTELECOM.COM

THANKS FOR LISTENING!