Chapter 5 hdhjshjdhsjdhs

  • 7/24/2019 Chapter 5 hdhjshjdhsjdhs

    Chapter 5: Availability

    Len Bass, Paul Clements, Rick Kazman, distributed under Creative Commons Attribution License

    Chapter Outline

    What is Availability?

    Availability General Scenario

    Tactics for Availability

    A Design Checklist for Availability

    Summary

    What is Availability?

    Availability refers to a property of software that it is there and ready to carry out its task when you need it to be.

    This is a broad perspective and encompasses what is normally called reliability.

    Availability builds on reliability by adding the notion of recovery (repair).

    Fundamentally, availability is about minimizing service outage time by mitigating faults.

    Principal properties

    Availability: The probability that the system will be up and running and able to deliver useful services to users.

    Reliability: The probability that the system will correctly deliver services as expected by users.

    Safety: A judgment of how likely it is that the system will cause damage to people or its environment.

    Security: A judgment of how likely it is that the system can resist accidental or deliberate intrusions.

    Chapter 11 Security and Dependability

    Availability and reliability

    It is sometimes possible to incorporate system availability under system reliability. Obviously, if a system is unavailable, it is not delivering the specified system services.

    However, it is possible to have systems with low reliability that must be available. So long as system failures can be repaired quickly and do not damage data, some system failures may not be a problem.

    Availability is therefore best considered as a separate attribute reflecting whether or not the system can deliver its services.

    Availability takes repair time into account if the system has to be taken out of service to repair faults.
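
A standard way to quantify how repair time enters the measure is steady-state availability, MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair. The Python sketch below applies that formula; the function names and the 1000-hour MTBF are illustrative assumptions, not figures from these slides.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of outage in a 365-day year at the given availability."""
    return (1.0 - availability) * 365 * 24


if __name__ == "__main__":
    # Same failure rate, two different repair times (illustrative values).
    for mttr in (10.0, 0.1):
        a = steady_state_availability(mtbf_hours=1000.0, mttr_hours=mttr)
        print(f"MTTR={mttr} h: availability={a:.4%}, "
              f"downtime={downtime_hours_per_year(a):.1f} h/year")
```

With the same MTBF, cutting MTTR from 10 hours to 0.1 hours raises availability from roughly 99.0% to roughly 99.99%, which is why fast recovery matters as much as avoiding failures.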

    Causes of failure

    Hardware failure: Hardware fails because of design and manufacturing errors or because components have reached the end of their natural life.

    Software failure: Software fails due to errors in its specification, design, or implementation.

    Operational failure: Human operators make mistakes. This is now perhaps the largest single cause of system failures in socio-technical systems.

    Dependability attributes

    Safe system operation depends on the system being available and operating reliably.

    A system may be unreliable because its data has been corrupted by an external attack.

    Denial of service attacks on a system are intended to make it unavailable.

    If a system is infected with a virus, you cannot be confident in its reliability.

    Sample Concrete Availability Scenario

    The heartbeat monitor determines that the server is nonresponsive during normal operations. The system informs the operator and continues to operate with no downtime.
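
The detection half of this scenario can be sketched in Python as below. It assumes the server calls a `beat()` method periodically and that the server is treated as nonresponsive once no heartbeat has arrived within a timeout; `HeartbeatMonitor`, its methods, and the `notify_operator` callback are hypothetical names. Continuing with no downtime is the job of the redundancy tactics, not of this sketch.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Declares a monitored server nonresponsive when heartbeats stop.

    Hypothetical sketch: the server calls beat() periodically; the monitor
    calls check() on its own schedule and alerts the operator once per outage.
    """

    def __init__(self, timeout_s: float, clock: Callable[[], float],
                 notify_operator: Callable[[str], None]) -> None:
        self.timeout_s = timeout_s
        self.clock = clock                  # injected so tests are deterministic
        self.notify_operator = notify_operator
        self.last_beat = clock()
        self.alerted = False

    def beat(self) -> None:
        # A heartbeat shows the server was alive at this instant.
        self.last_beat = self.clock()
        self.alerted = False

    def check(self) -> bool:
        """Return True if a heartbeat arrived within the timeout."""
        alive = self.clock() - self.last_beat <= self.timeout_s
        if not alive and not self.alerted:
            self.notify_operator("server nonresponsive")
            self.alerted = True
        return alive


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_s=3.0, clock=time.monotonic,
                               notify_operator=print)
    monitor.beat()
    print(monitor.check())
```

Injecting the clock keeps the timeout logic testable without real waiting; a production monitor would call `check()` from a timer or event loop.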

    Goal of Availability Tactics

    A failure occurs when the system no longer delivers a service consistent with its specification; this failure is observable by the system's actors.

    A fault (or combination of faults) has the potential to cause a failure.

    Availability tactics enable a system to endure faults so that services remain compliant with their specifications. The tactics keep faults from becoming failures or at least bound the effects of the fault and make repair possible.

    Goal of Availability Tactics

    Availability Tactics

    Detect Faults

    Ping

    Detect Faults

    Timestamp: used to detect incorrect sequences of events, primarily in distributed message-passing systems.

    Sanity Checking: checks the validity or reasonableness of a component's operations or outputs; typically based on a knowledge of the internal design, the state of the system, or the nature of the information under scrutiny.

    Condition Monitoring: checking conditions in a process or device, or validating assumptions made during the design.
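
The timestamp and sanity-checking tactics can be sketched together in Python. This assumes each message carries a per-sender sequence number (a simple logical timestamp) that should strictly increase, and that a plausible value range for the data is known from the design; `Message`, `SequenceChecker`, `is_sane`, and the -40 to 125 degree range are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    seq: int          # per-sender logical timestamp (hypothetical field)
    reading: float    # e.g., a temperature in degrees Celsius (hypothetical)


class SequenceChecker:
    """Timestamp tactic: flags messages that arrive out of order or twice."""

    def __init__(self) -> None:
        self.last_seq: dict[str, int] = {}

    def in_order(self, msg: Message) -> bool:
        previous = self.last_seq.get(msg.sender)
        if previous is not None and msg.seq <= previous:
            return False      # duplicated, replayed, or reordered message
        self.last_seq[msg.sender] = msg.seq
        return True


def is_sane(msg: Message, low: float = -40.0, high: float = 125.0) -> bool:
    """Sanity check: is the reading plausible for this (hypothetical) sensor?"""
    return low <= msg.reading <= high


if __name__ == "__main__":
    checker = SequenceChecker()
    for m in (Message("s1", 1, 21.5), Message("s1", 3, 22.0),
              Message("s1", 2, 21.8), Message("s1", 4, 900.0)):
        print(m.seq, checker.in_order(m), is_sane(m))
```

In this run, message 2 is flagged because it arrives after message 3, and message 4 arrives in order but fails the sanity check. The sketch does not flag gaps (1 then 3); detecting lost messages would need an extra rule.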

    Detect Faults

    Voting: checking that replicated components are producing the same results. Voting comes in various flavors: replication, functional redundancy, and analytic redundancy.

    Exception Detection: detection of a system condition that alters the normal flow of execution, e.g., system exception, parameter fence, parameter typing, timeout.

    Self-test: a procedure for a component to test itself for correct operation.
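
Voting over replicated components can be sketched as a majority voter: each replica computes the same result, and the voter accepts the value that more than half of them agree on. A minimal Python sketch; `majority_vote` is a hypothetical name, and it assumes outputs can be compared for exact equality (voters on floating-point data often compare within a tolerance instead).

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Optional


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of replicas, else None.

    None means no majority exists: the fault cannot be masked by voting and
    should be reported instead.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count * 2 > len(outputs) else None


if __name__ == "__main__":
    print(majority_vote([42, 42, 41]))  # the faulty third replica is outvoted
    print(majority_vote([1, 2, 3]))     # no majority: report the fault
```

Three replicas tolerate one faulty result; a tie (for example two against two) yields no majority, so the voter detects the fault but cannot mask it.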

    Recover from Faults (Preparation and Repair)

    Active Redundancy (hot spare): all nodes in a protection group receive and process identical inputs in parallel, allowing redundant spare(s) to maintain synchronous state with the active node(s). A protection group is a group of nodes where one or more nodes are "active," with the remainder serving as redundant spares.

    Passive Redundancy (warm spare): only the active members of the protection group process input traffic; one of their duties is to provide the redundant spare(s) with periodic state updates.

    Spare (cold spare): redundant spares of a protection group remain out of service until a fail-over occurs, at which point a power-on-reset procedure is initiated on the redundant spare prior to its being placed in service.
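
Passive redundancy can be sketched as a warm-standby pair: only the active node handles requests, and it pushes a copy of its state to the spare every few requests. On fail-over the spare resumes from its last update. A minimal Python sketch; `Node`, `WarmStandbyPair`, and the `update_every` interval are hypothetical, and state is a single counter for illustration.

```python
import copy


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state: dict[str, int] = {"counter": 0}


class WarmStandbyPair:
    """Active node handles traffic; spare receives periodic state updates."""

    def __init__(self, update_every: int = 3) -> None:
        self.active = Node("primary")
        self.spare = Node("spare")
        self.update_every = update_every
        self.requests_since_update = 0

    def handle_request(self) -> int:
        self.active.state["counter"] += 1
        self.requests_since_update += 1
        if self.requests_since_update >= self.update_every:
            self.sync_spare()
        return self.active.state["counter"]

    def sync_spare(self) -> None:
        # Periodic state update: one of the active node's duties.
        self.spare.state = copy.deepcopy(self.active.state)
        self.requests_since_update = 0

    def fail_over(self) -> None:
        # The spare becomes active with its last received state; the failed
        # node would need repair and resynchronization before serving as spare.
        self.active, self.spare = self.spare, self.active
        self.requests_since_update = 0


if __name__ == "__main__":
    pair = WarmStandbyPair(update_every=3)
    for _ in range(5):
        pair.handle_request()
    pair.fail_over()
    print(pair.active.name, pair.active.state)
```

In this run the spare takes over with a counter of 3, not 5: the two requests handled after the last state update are lost. That loss window is the trade-off against active redundancy, where spares process every input.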

    Recover from Faults (Preparation and Repair)

    Exception Handling: dealing with the exception by reporting it or handling it, potentially masking the fault by correcting the cause of the exception and retrying.

    Rollback: revert to a previous known good state, referred to as the "rollback line".

    Software Upgrade: in-service upgrades to executable code images in a non-service-affecting manner.
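
Rollback can be sketched with checkpoints: the component saves a copy of its state whenever it is known to be good, and on a fault it reverts to the most recent checkpoint (the rollback line). A minimal Python sketch; `CheckpointedStore` and its methods are hypothetical, and deciding when state is "known good" is left to the caller.

```python
import copy
from typing import Any


class CheckpointedStore:
    """Key-value state that can revert to the last known good checkpoint."""

    def __init__(self) -> None:
        self.state: dict[str, Any] = {}
        self._rollback_line: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self.state[key] = value

    def checkpoint(self) -> None:
        # Record the current state as the new rollback line.
        self._rollback_line = copy.deepcopy(self.state)

    def rollback(self) -> None:
        # Discard everything written since the last checkpoint.
        self.state = copy.deepcopy(self._rollback_line)


if __name__ == "__main__":
    store = CheckpointedStore()
    store.put("balance", 100)
    store.checkpoint()
    try:
        store.put("balance", -50)
        if store.state["balance"] < 0:
            raise ValueError("invariant violated")
    except ValueError:
        store.rollback()
    print(store.state)
```

Here the invalid write is detected and discarded, and the store returns to `{'balance': 100}`.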

    Recover from Faults (Preparation and Repair)

    Retry: where a failure is transient, retrying the operation may lead to success.

    Ignore Faulty Behavior: ignoring messages sent from a source when it is determined that those messages are spurious.

    Degradation: maintaining the most critical system functions in the presence of component failures, dropping less critical functions.

    Reconfiguration: reassigning responsibilities to the resources left functioning, while maintaining as much functionality as possible.
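
Retry is only appropriate when the failure is transient. A minimal Python sketch with exponential backoff; it assumes transient failures surface as a hypothetical `TransientError` exception, and the attempt limit and delays are illustrative. The `sleep` function is injectable so tests need not wait.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that may succeed if retried."""


def retry(operation: Callable[[], T], attempts: int = 4,
          base_delay_s: float = 0.1,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation, retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # not transient after all: hand over to another tactic
            sleep(base_delay_s * 2 ** attempt)  # 0.1 s, 0.2 s, 0.4 s, ...
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientError("timeout")
        return "ok"

    print(retry(flaky), calls["n"])
```

Errors other than `TransientError` propagate immediately, since retrying a permanent fault only delays its detection.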

    Recover from Faults (Reintroduction)

    Shadow: operating a previously failed or in-service upgraded component in a "shadow mode" for a predefined time prior to reverting the component back to an active role.

    State Resynchronization: partner to active redundancy and passive redundancy, where state information is sent from active to standby components.

    Escalating Restart: recover from faults by varying the granularity of the component(s) restarted and minimizing the level of service affected.

    Non-stop Forwarding: functionality is split into supervisory and data. If a supervisor fails, a router continues forwarding packets along known
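
Escalating restart can be sketched as a loop over restart levels ordered from the finest granularity (least service affected) to the coarsest, stopping at the first level after which the component is healthy. A minimal Python sketch; the level names and the `restart` and `healthy` callbacks are hypothetical stand-ins for real restart and health-check mechanisms.

```python
from typing import Callable, Optional, Sequence


def escalating_restart(levels: Sequence[str],
                       restart: Callable[[str], None],
                       healthy: Callable[[], bool]) -> Optional[str]:
    """Restart at increasing granularity until the component recovers.

    Returns the level that restored service, or None if even the coarsest
    restart failed (the fault must then be escalated to an operator).
    """
    for level in levels:
        restart(level)
        if healthy():
            return level
    return None


if __name__ == "__main__":
    # Hypothetical levels, finest first; a thread restart is not enough here.
    performed: list[str] = []
    result = escalating_restart(
        ["thread", "process", "node"],
        restart=performed.append,
        healthy=lambda: "process" in performed,
    )
    print(result, performed)
```

Here the thread-level restart does not help, so the sketch escalates to a process restart, which succeeds; the node is never restarted.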

    Prevent Faults

    Removal From Service: temporarily placing a system component in an out-of-service state for the purpose of mitigating potential system failures.

    Transactions: bundling state updates so that asynchronous messages exchanged between distributed components are atomic, consistent, isolated, and durable.

    Predictive Model: monitoring the state of health of a process to ensure that the system is operating within nominal parameters; taking corrective action when conditions are detected that are predictive of likely future faults.
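
A predictive model can be as simple as fitting a trend to a health metric and acting before it crosses a limit. A minimal Python sketch, assuming a metric such as memory usage is sampled at regular intervals and grows roughly linearly; the least-squares line, the limit, and the horizon are illustrative choices, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def samples_until_limit(history: Sequence[float], limit: float) -> Optional[float]:
    """Estimate how many more sampling intervals until `limit` is reached.

    Fits a least-squares line to the samples; returns None if the metric
    is not rising (no fault predicted).
    """
    n = len(history)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den
    if slope <= 0:
        return None
    return max(0.0, (limit - history[-1]) / slope)


def needs_corrective_action(history: Sequence[float], limit: float,
                            horizon: float) -> bool:
    """True if the limit is predicted to be reached within `horizon` samples."""
    eta = samples_until_limit(history, limit)
    return eta is not None and eta <= horizon


if __name__ == "__main__":
    memory_mb = [512, 530, 551, 569, 590]   # illustrative samples
    if needs_corrective_action(memory_mb, limit=1024, horizon=30):
        print("schedule a restart before memory is exhausted")
```

The corrective action (here only a printed message) could be removal from service or a planned restart, taken while the process is still within nominal parameters.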

    Prevent Faults

    Exception Prevention: preventing system exceptions from occurring by masking a fault, or preventing it via smart pointers, abstract data types, or wrappers.

    Increase Competence Set: designing a component to handle more cases (faults) as part of its normal operation.

    Design Checklist for Availability

    Allocation of Responsibilities

    Determine the system responsibilities that need to be highly available. Ensure that additional responsibilities have been allocated to detect an omission, crash, incorrect timing, or incorrect response. Ensure that there are responsibilities to:

    log the fault

    notify appropriate entities (people or systems)

    disable source of events causing the fault

    be temporarily unavailable

    Design Checklist for Availability

    Coordination Model

    Determine the system responsibilities that need to be highly available. With respect to those responsibilities:

    Ensure that coordination mechanisms can detect an omission, crash, incorrect timing, or incorrect response. Consider, e.g., whether guaranteed delivery is necessary. Will the coordination work under degraded communication?

    Ensure that coordination mechanisms enable the logging of the fault, notification of appropriate entities, disabling of the source of the events causing the fault, fixing or masking the fault, or operating in a degraded mode.

    Ensure that the coordination model supports the replacement of the artifacts (processors, communications channels, persistent storage, and processes). E.g., does replacement of a server allow the system to continue to operate?

    Design Checklist for Availability

    Data Model

    Determine which portions of the system need to be highly available. Within those portions, determine which data abstractions could cause a fault of omission, a crash, incorrect timing behavior, or an incorrect response. For those data abstractions, operations, and properties, ensure that they can be disabled, be temporarily unavailable, or be fixed or masked in the event of a fault. E.g., ensure that write requests are cached if a server is temporarily unavailable and performed when the server is returned to service.
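
The write-caching example in this checklist item can be sketched as a small write-behind queue in Python. `FlakyServer`, `ServerUnavailable`, and `CachingWriter` are hypothetical names, and the sketch assumes cached writes can be replayed in their original order without conflicting with writes made elsewhere in the meantime.

```python
from collections import deque
from typing import Any


class ServerUnavailable(Exception):
    """Hypothetical error raised while the server is out of service."""


class FlakyServer:
    """Hypothetical stand-in for a remote data server."""

    def __init__(self) -> None:
        self.up = True
        self.data: dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        if not self.up:
            raise ServerUnavailable(key)
        self.data[key] = value


class CachingWriter:
    """Queues writes while the server is down; replays them in order later."""

    def __init__(self, server: FlakyServer) -> None:
        self.server = server
        self.pending: deque[tuple[str, Any]] = deque()

    def write(self, key: str, value: Any) -> None:
        self.pending.append((key, value))
        self.flush()

    def flush(self) -> None:
        while self.pending:
            key, value = self.pending[0]
            try:
                self.server.write(key, value)
            except ServerUnavailable:
                return                 # keep remaining writes cached
            self.pending.popleft()     # drop only after a successful write


if __name__ == "__main__":
    server = FlakyServer()
    writer = CachingWriter(server)
    server.up = False
    writer.write("a", 1)
    writer.write("b", 2)      # both writes are cached
    server.up = True          # server returned to service
    writer.flush()
    print(server.data, len(writer.pending))
```

A write leaves the queue only after the server accepts it, so a failure partway through a flush leaves the remaining writes cached in their original order.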

    Design Checklist for Availability

    Mapping Among Architectural Elements

    Determine which artifacts (processors, communication channels, storage, processes) may produce a fault: omission, crash, incorrect timing, or incorrect response.

    Ensure that the mapping (or re-mapping) of architectural elements is flexible enough to permit the recovery from the fault. This may involve a consideration of:

    which processes on failed processors need to be re-assigned at runtime

    which processors, data stores, or communication channels can be activated or re-assigned at runtime

    how data on failed processors or storage can be served by replacement units
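
Re-assigning the processes of a failed processor at runtime can be sketched as a reconfiguration step over a process-to-processor map. A minimal Python sketch; `remap` is a hypothetical function, and it assumes every surviving processor can host any process and uses a least-loaded placement policy, both illustrative assumptions rather than requirements of the checklist.

```python
from collections import Counter


def remap(assignment: dict[str, str], failed: str,
          processors: list[str]) -> dict[str, str]:
    """Move every process off `failed` onto the least-loaded survivors."""
    survivors = [p for p in processors if p != failed]
    if not survivors:
        raise RuntimeError("no processors left to host the processes")
    new = {proc: cpu for proc, cpu in assignment.items() if cpu != failed}
    load = Counter({cpu: 0 for cpu in survivors})
    load.update(new.values())          # count processes already on survivors
    for proc in sorted(p for p, cpu in assignment.items() if cpu == failed):
        # Ties are broken by processor name so the result is deterministic.
        target = min(survivors, key=lambda cpu: (load[cpu], cpu))
        new[proc] = target
        load[target] += 1
    return new


if __name__ == "__main__":
    before = {"db": "cpu1", "web": "cpu1", "cache": "cpu2"}
    print(remap(before, failed="cpu1", processors=["cpu1", "cpu2", "cpu3"]))
```

For example, if `cpu1` fails while hosting `db` and `web`, and `cpu2` already hosts `cache`, the sketch moves `db` to the idle `cpu3` and `web` to `cpu2`.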

    Design Checklist for Availability

    Resource Management

    Determine what critical resources are necessary to continue operating in the presence of a fault: omission, crash, incorrect timing, or incorrect response. Ensure there are sufficient remaining resources in the event of a fault to log the fault; notify appropriate entities (people or systems); disable source of events causing the fault; fix or mask the fault/failure; operate normally, in startup, shutdown, repair mode, degraded operation, and overloaded operation.

    Determine the availability time for critical resources, what critical resources must be available during specified time intervals, time intervals during which the critical resources may be in a degraded mode, and repair time for critical resources. Ensure that the

    Design Checklist for Availability

    Binding Time

    Determine how and when architectural elements are bound. If late binding is used to alternate between components that can themselves be sources of faults (e.g., processes, processors, communication channels), ensure the chosen availability strategy is sufficient to cover faults introduced by all sources. E.g.:

    If late binding is used to switch between processors that will be the subject of faults, will the fault detection and recovery mechanisms work for all possible bindings?

    If late binding is used to change the definition or tolerance of what constitutes a fault (e.g., how long a process can go without responding before a fault is assumed), is the recovery strategy chosen sufficient to handle all cases? or

    Design Checklist for Availability

    Choice of Technology

    Determine the available technologies that can (help) detect faults, recover from faults, or reintroduce failed components. Determine what technologies are available that help the response to a fault (e.g., event loggers). Determine the availability characteristics of chosen technologies themselves: What faults can they recover from? What faults might they introduce into the system?

    Summary

    Availability refers to the ability of the system to be available for use when a fault occurs.

    The fault must be recognized (or prevented), and then the system must respond.

    The response will depend on the criticality of the application and the type of fault, and can range from "ignore it" to "keep on going as if it didn't occur."

    Summary

    Tactics for availability are categorized into detect faults, recover from faults, and prevent faults.

    Detection tactics depend on detecting signs of life from various components.

    Recovery tactics are retrying an operation or maintaining redundant data or computations.

    Prevention tactics depend on removing elements from service or limiting the scope of faults.

    All availability tactics involve the coordination model.
