Chapter 5: Availability
Len Bass, Paul Clements, Rick Kazman, distributed under Creative Commons Attribution License
Chapter Outline
What is Availability?
Availability General Scenario
Tactics for Availability
A Design Checklist for Availability
Summary
What is Availability?
Availability refers to a property of software that it is there and ready to carry out its task when you need it to be.
This is a broad perspective and encompasses what is normally called reliability.
Availability builds on reliability by adding the notion of recovery (repair).
Fundamentally, availability is about minimizing service outage time by mitigating faults.
Principal properties
Availability: The probability that the system will be up and running and able to deliver useful services to users.
Reliability: The probability that the system will correctly deliver services as expected by users.
Safety: A judgment of how likely it is that the system will cause damage to people or its environment.
Security: A judgment of how likely it is that the system can resist accidental or deliberate intrusions.
Chapter 11 Security and Dependability
Availability and reliability
It is sometimes possible to incorporate system availability under system reliability. Obviously, if a system is unavailable it is not delivering the specified system services.
However, it is possible to have systems with low reliability that must be available. So long as system failures can be repaired quickly and do not damage data, some system failures may not be a problem.
Availability is therefore best considered as a separate attribute reflecting whether or not the system can deliver its services.
Availability takes repair time into account, if the system has to be taken out of service to repair faults.
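One common way to make the role of repair time concrete is the steady-state ratio availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair. The slides do not state this formula, so it is used here as an assumption, and the function and parameter names are hypothetical. The sketch shows how a system that fails often can still be more available than one that fails rarely, provided repairs are fast.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, assuming MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Fails every 10 hours on average, but each repair takes 0.01 hours (36 seconds).
frequent_but_fast = steady_state_availability(10.0, 0.01)

# Fails only every 1000 hours, but each repair takes 50 hours.
rare_but_slow = steady_state_availability(1000.0, 50.0)

print(f"{frequent_but_fast:.4f}")  # 0.9990
print(f"{rare_but_slow:.4f}")      # 0.9524
```

The first system is the less reliable of the two, yet it is the more available one, which is the distinction the slide draws.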
Causes of failure
Hardware failure: Hardware fails because of design and manufacturing errors or because components have reached the end of their natural life.
Software failure: Software fails due to errors in its specification, design or implementation.
Operational failure: Human operators make mistakes. Now perhaps the largest single cause of system failures in socio-technical systems.
Dependability attribute
Safe system operation depends on the system being available and operating reliably.
A system may be unreliable because its data has been corrupted by an external attack.
Denial of service attacks on a system are intended to make it unavailable.
If a system is infected with a virus, you cannot be confident in its reliability.
Sample Concrete Availability Scenario
The heartbeat monitor determines that the server is nonresponsive during normal operations. The system informs the operator and continues to operate with no downtime.
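The detection half of this scenario can be sketched as follows. This is a minimal illustration, not the slides' design: the class name, the timeout value, and the server names are all hypothetical, and the clock is passed in explicitly so the example stays deterministic.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat time per server and flags silent ones."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self._last_seen: Dict[str, float] = {}

    def record_heartbeat(self, server: str, now: float) -> None:
        self._last_seen[server] = now

    def unresponsive(self, now: float) -> List[str]:
        """Servers whose last heartbeat is older than the timeout."""
        return sorted(
            name
            for name, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=3.0)
monitor.record_heartbeat("server-a", now=0.0)
monitor.record_heartbeat("server-b", now=0.0)
monitor.record_heartbeat("server-a", now=2.0)  # server-b stays silent

for name in monitor.unresponsive(now=4.0):
    print(f"operator notice: {name} missed its heartbeat")
```

Continuing "with no downtime" would additionally need a recovery tactic, such as a redundant spare taking over; the monitor only covers detection and notification.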
Goal of Availability Tactics
A failure occurs when the system no longer delivers a service consistent with its specification; this failure is observable by the system's actors.
A fault (or combination of faults) has the potential to cause a failure.
Availability tactics enable a system to endure faults so that services remain compliant with their specifications.
The tactics keep faults from becoming failures or at least bound the effects of the fault and make repair possible.
Availability Tactics
Detect Faults
Ping
Timestamp: used to detect incorrect sequences of events, primarily in distributed message-passing systems.
Sanity Checking: checks the validity or reasonableness of a component's operations or outputs; typically based on a knowledge of the internal design, the state of the system, or the nature of the information under scrutiny.
Condition Monitoring: checking conditions in a process or device, or validating assumptions made during the design.
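As an illustration of the timestamp tactic, the sketch below flags events that arrive with a timestamp no greater than one already seen. It assumes each event carries an integer logical timestamp assigned by its sender; the function name and the sample events are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def out_of_sequence(events: Iterable[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Return events whose timestamp is not greater than the highest seen so far."""
    suspicious: List[Tuple[int, str]] = []
    highest_seen: Optional[int] = None
    for timestamp, payload in events:
        if highest_seen is not None and timestamp <= highest_seen:
            suspicious.append((timestamp, payload))
        else:
            highest_seen = timestamp
    return suspicious


received = [(1, "open"), (2, "write"), (5, "close"), (3, "write"), (6, "open")]
print(out_of_sequence(received))  # [(3, 'write')]
```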
Voting: to check that replicated components are producing the same results. Comes in various flavors: replication, functional redundancy, analytic redundancy.
Exception Detection: detection of a system condition that alters the normal flow of execution, e.g. system exception, parameter fence, parameter typing, timeout.
Self-test: procedure for a component to test itself for correct operation.
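A simple form of voting can be sketched as a strict-majority vote over the outputs of replicated components. The function name is hypothetical, and real voters may use other rules (for example, tolerance bands for numeric outputs); this sketch only compares results for equality.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of replicas, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None


print(majority_vote([42, 42, 41]))  # 42
print(majority_vote([1, 2, 3]))     # None
```

A result of None signals that the replicas disagree too much to mask the fault, so it has to be reported instead.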
Recover from Faults (Preparation & Repair)
Active Redundancy (hot spare): all nodes in a protection group receive and process identical inputs in parallel, allowing redundant spare(s) to maintain synchronous state with the active node(s). A protection group is a group of nodes where one or more nodes are "active," with the remainder serving as redundant spares.
Passive Redundancy (warm spare): only the active members of the protection group process input traffic; one of their duties is to provide the redundant spare(s) with periodic state updates.
Spare (cold spare): redundant spares of a protection group remain out of service until a fail-over occurs, at which point a power-on-reset procedure is initiated on the redundant spare prior to its being placed in service.
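The warm-spare variant can be sketched with two in-process objects standing in for nodes. All class names, the `sync_every` parameter, and the counting workload are hypothetical; the point is only that the spare receives periodic state updates and therefore may lag the active node at fail-over.

```python
import copy
from typing import Dict


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state: Dict[str, int] = {}

    def process(self, key: str, amount: int) -> None:
        self.state[key] = self.state.get(key, 0) + amount


class WarmSpareGroup:
    """Passive redundancy: only the active node processes input."""

    def __init__(self, active: Node, spare: Node, sync_every: int) -> None:
        self.active = active
        self.spare = spare
        self.sync_every = sync_every
        self._since_sync = 0

    def handle(self, key: str, amount: int) -> None:
        self.active.process(key, amount)
        self._since_sync += 1
        if self._since_sync >= self.sync_every:
            # Periodic state update from the active node to the spare.
            self.spare.state = copy.deepcopy(self.active.state)
            self._since_sync = 0

    def fail_over(self) -> None:
        """Promote the spare; inputs since the last sync are not reflected in it."""
        self.active, self.spare = self.spare, self.active
        self._since_sync = 0


group = WarmSpareGroup(Node("a"), Node("b"), sync_every=2)
for _ in range(3):
    group.handle("orders", 1)
group.fail_over()
print(group.active.name, group.active.state)  # b {'orders': 2}
```

With active redundancy the spare would process every input itself and hold the full count of 3; the warm spare trades that freshness for less duplicated work.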
Exception Handling: dealing with the exception by reporting it or handling it, potentially masking the fault by correcting the cause of the exception and retrying.
Rollback: revert to a previous known good state, referred to as the "rollback line".
Software Upgrade: in-service upgrades to executable code images in a non-service-affecting manner.
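Rollback can be sketched as a store that keeps one checkpointed copy of its state. The class and its single-checkpoint policy are assumptions made for illustration; the simulated fault is raised deliberately halfway through an update.

```python
import copy
from typing import Any, Dict


class CheckpointedStore:
    """Keeps one known good copy of the state to roll back to."""

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}
        self._rollback_line: Dict[str, Any] = {}

    def checkpoint(self) -> None:
        self._rollback_line = copy.deepcopy(self.state)

    def rollback(self) -> None:
        self.state = copy.deepcopy(self._rollback_line)


store = CheckpointedStore()
store.state["balance"] = 100
store.checkpoint()

try:
    store.state["balance"] -= 30
    store.state["pending"] = ["transfer-1"]
    raise RuntimeError("simulated fault in the middle of an update")
except RuntimeError:
    store.rollback()

print(store.state)  # {'balance': 100}
```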
Retry: where a failure is transient, retrying the operation may lead to success.
Ignore Faulty Behavior: ignoring messages sent from a source when it is determined that those messages are spurious.
Degradation: maintains the most critical system functions in the presence of component failures, dropping less critical functions.
Reconfiguration: reassigning responsibilities to the resources left functioning, while maintaining as much functionality as possible.
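The retry tactic can be sketched as a bounded loop around an operation. `TransientError`, `retry`, and `flaky_read` are hypothetical names; which failures count as transient is an application decision that the sketch simply assumes has been made.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(operation: Callable[[], T], max_attempts: int) -> T:
    """Call operation until it succeeds or max_attempts is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the last failure
    raise AssertionError("unreachable")


calls = {"count": 0}


def flaky_read() -> str:
    """Fails on the first two calls, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary glitch")
    return "payload"


print(retry(flaky_read, max_attempts=5))  # payload
```

Bounding the number of attempts matters: an unbounded retry turns a permanent fault into a hang. Production code would usually also wait between attempts, which is omitted here to keep the sketch short.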
Recover from Faults (Reintroduction)
Shadow: operating a previously failed or in-service upgraded component in a "shadow mode" for a predefined time prior to reverting the component back to an active role.
State Resynchronization: partner to active redundancy and passive redundancy where state information is sent from active to standby components.
Escalating Restart: recover from faults by varying the granularity of the component(s) restarted and minimizing the level of service affected.
Non-stop Forwarding: functionality is split into supervisory and data. If a supervisor fails, a router continues forwarding packets along known
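Escalating restart can be sketched as trying restart actions from the narrowest scope to the broadest and stopping at the first one that clears the fault. The level names and the `make_level` helper are hypothetical; a real system would run health checks rather than return a fixed boolean.

```python
from typing import Callable, List, Optional, Tuple

# Each level pairs a label with a restart action that reports success.
RestartLevel = Tuple[str, Callable[[], bool]]


def escalating_restart(levels: List[RestartLevel]) -> Optional[str]:
    """Try restarts from narrowest to broadest scope; return the level that worked."""
    for name, restart in levels:
        if restart():
            return name
    return None  # every level failed; someone else has to step in


attempted: List[str] = []


def make_level(name: str, succeeds: bool) -> RestartLevel:
    """Build a fake restart action that records that it was attempted."""
    def restart() -> bool:
        attempted.append(name)
        return succeeds
    return (name, restart)


levels = [
    make_level("worker thread", False),
    make_level("process", True),
    make_level("machine", True),
]
print(escalating_restart(levels))  # process
print(attempted)                   # ['worker thread', 'process']
```

The machine-level restart is never attempted, which is how the tactic minimizes the level of service affected.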
Prevent Faults
Removal From Service: temporarily placing a system component in an out-of-service state for the purpose of mitigating potential system failures.
Transactions: bundling state updates so that asynchronous messages exchanged between distributed components are atomic, consistent, isolated, and durable.
Predictive Model: monitor the state of health of a process to ensure that the system is operating within nominal parameters; take corrective action when conditions are detected that are predictive of likely future faults.
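The atomicity part of the transactions tactic can be illustrated with Python's standard `sqlite3` module. This is a single-database sketch, not a distributed transaction, and the table, account names, and `transfer` function are hypothetical. Both updates commit together or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(db: sqlite3.Connection, source: str, target: str, amount: int) -> bool:
    """Move amount between accounts; both updates commit or neither does."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            db.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            db.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The debit would make alice's balance negative, so the credit is undone too.
ok = transfer(conn, "alice", "bob", 150)
balances = sorted(conn.execute("SELECT name, balance FROM accounts").fetchall())
print(ok)        # False
print(balances)  # [('alice', 100), ('bob', 0)]
```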
Exception Prevention: preventing system exceptions from occurring by masking a fault, or preventing it via smart pointers, abstract data types, wrappers.
Increase Competence Set: designing a component to handle more cases (faults) as part of its normal operation.
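One way to read "increase competence set" is a component that treats bad input as an ordinary case instead of raising an exception. The sketch below is an assumption-laden illustration: the `timeout_seconds` key, the default value, and the function name are all hypothetical.

```python
from typing import Mapping


def read_timeout(config: Mapping[str, str], default: float = 5.0) -> float:
    """Return the configured timeout; missing or malformed values are normal cases."""
    raw = config.get("timeout_seconds")
    if raw is None:
        return default  # key absent: handled, not an error
    try:
        value = float(raw)
    except ValueError:
        return default  # not a number: handled, not an error
    return value if value > 0 else default


print(read_timeout({"timeout_seconds": "2.5"}))  # 2.5
print(read_timeout({"timeout_seconds": "abc"}))  # 5.0
print(read_timeout({}))                          # 5.0
```

The trade-off is that silently falling back to a default can hide a misconfiguration, so such a component would normally also log the fallback.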
Design Checklist for Availability
Allocation of Responsibilities
Determine the system responsibilities that need to be highly available. Ensure that additional responsibilities have been allocated to detect an omission, crash, incorrect timing, or incorrect response.
Ensure that there are responsibilities to:
log the fault
notify appropriate entities (people or systems)
disable source of events causing the fault
be temporarily unavailable
Coordination Model
Determine the system responsibilities that need to be highly available. With respect to those responsibilities:
Ensure that coordination mechanisms can detect an omission, crash, incorrect timing, or incorrect response. Consider, e.g., whether guaranteed delivery is necessary. Will the coordination work under degraded communication?
Ensure that coordination mechanisms enable the logging of the fault, notification of appropriate entities, disabling of the source of the events causing the fault, fixing or masking the fault, or operating in a degraded mode.
Ensure that the coordination model supports the replacement of the artifacts (processors, communications channels, persistent storage, and processes). E.g., does replacement of a server allow the system to continue to operate?
Data Model
Determine which portions of the system need to be highly available. Within those portions, determine which data abstractions could cause a fault of omission, a crash, incorrect timing behavior, or an incorrect response.
For those data abstractions, operations, and properties, ensure that they can be disabled, be temporarily unavailable, or be fixed or masked in the event of a fault.
E.g., ensure that write requests are cached if a server is temporarily unavailable and performed when the server is returned to service.
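The write-caching example can be sketched as a small buffer in front of a server. The class name is hypothetical and the server is simulated by an in-memory dictionary; queued writes are replayed in arrival order once the server is back.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class BufferedWriter:
    """Queues writes while the server is down and replays them in order later."""

    def __init__(self) -> None:
        self.server_up = True
        self.server_data: Dict[str, str] = {}  # stands in for the real server
        self._pending: Deque[Tuple[str, str]] = deque()

    def write(self, key: str, value: str) -> None:
        if self.server_up:
            self.server_data[key] = value
        else:
            self._pending.append((key, value))

    def server_restored(self) -> None:
        """Replay queued writes in arrival order, then resume direct writes."""
        while self._pending:
            key, value = self._pending.popleft()
            self.server_data[key] = value
        self.server_up = True


writer = BufferedWriter()
writer.write("a", "1")
writer.server_up = False   # simulate the server going out of service
writer.write("a", "2")
writer.write("b", "3")
print(writer.server_data)  # {'a': '1'}
writer.server_restored()
print(writer.server_data)  # {'a': '2', 'b': '3'}
```

The queue lives only in memory here, so a crash of the writer itself would lose the pending writes; a durable queue would be needed to mask that fault as well.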
Mapping Among Architectural Elements
Determine which artifacts (processors, communication channels, storage, processes) may produce a fault: omission, crash, incorrect timing, or incorrect response.
Ensure that the mapping (or re-mapping) of architectural elements is flexible enough to permit the recovery from the fault. This may involve a consideration of:
which processes on failed processors need to be re-assigned at runtime
which processors, data stores, or communication channels can be activated or re-assigned at runtime
how data on failed processors or storage can be served by replacement units
Resource Management
Determine what critical resources are necessary to continue operating in the presence of a fault: omission, crash, incorrect timing, or incorrect response. Ensure there are sufficient remaining resources in the event of a fault to log the fault; notify appropriate entities (people or systems); disable source of events causing the fault; fix or mask the fault/failure; operate normally, in startup, shutdown, repair mode, degraded operation, and overloaded operation.
Determine the availability time for critical resources, what critical resources must be available during specified time intervals, time intervals during which the critical resources may be in a degraded mode, and repair time for critical resources. Ensure that the
Binding Time
Determine how and when architectural elements are bound. If late binding is used to alternate between components that can themselves be sources of faults (e.g. processes, processors, communication channels), ensure the chosen availability strategy is sufficient to cover faults introduced by all sources. E.g.:
If late binding is used to switch between processors that will be the subject of faults, will the fault detection and recovery mechanisms work for all possible bindings?
If late binding is used to change the definition or tolerance of what constitutes a fault (e.g., how long a process can go without responding before a fault is assumed), is the recovery strategy chosen sufficient to handle all cases? For
Choice of Technology
Determine the available technologies that can (help) detect faults, recover from faults, re-introduce failed components.
Determine what technologies are available that help the response to a fault (e.g., event loggers).
Determine the availability characteristics of chosen technologies themselves: What faults can they recover from? What faults might they introduce into the system?
Summary
Availability refers to the ability of the system to be available for use when a fault occurs.
The fault must be recognized (or prevented) and then the system must respond.
The response will depend on the criticality of the application and the type of fault, and can range from "ignore it" to "keep on going as if it didn't occur."
Tactics for availability are categorized into detect faults, recover from faults and prevent faults.
Detection tactics depend on detecting signs of life from various components.
Recovery tactics are retrying an operation or maintaining redundant data or computations.
Prevention tactics depend on removing elements from service or limiting the scope of faults.
All availability tactics involve the coordination model.