How to use SHARPE and SPNP - Duke...

159
How to use SHARPE and SPNP Dr. Dan (Dong-Seong) Kim University of Canterbury, New Zealand [email protected] http://www.cosc.canterbury.ac.nz/dongseong.kim

Transcript of How to use SHARPE and SPNP - Duke...

  • How to use SHARPE and SPNP

    Dr. Dan (Dong-Seong) Kim

    University of Canterbury, New Zealand

    [email protected] http://www.cosc.canterbury.ac.nz/dongseong.kim

    mailto:[email protected]

  • University of Canterbury

    University of Canterbury (UC) • originated in 1873 in the centre of Christchurch as

    Canterbury College, the first constituent college of the University of New Zealand

    • Ernest Rutherford – physicist – Nobel Prize in chem.

    • John Key– politician currently Prime Minister of New Zealand

    Computer Science and Software Engineering department at UC has been ranked in the top 101-150 Computer Science departments in the 2011 International QS World University Rankings.

    http://en.wikipedia.org/wiki/Christchurchhttp://en.wikipedia.org/wiki/University_of_New_Zealandhttp://en.wikipedia.org/wiki/University_of_New_Zealandhttp://en.wikipedia.org/wiki/Ernest_Rutherfordhttp://en.wikipedia.org/wiki/John_Keyhttp://en.wikipedia.org/wiki/John_Keyhttp://en.wikipedia.org/wiki/Prime_Minister_of_New_Zealandhttp://en.wikipedia.org/wiki/Prime_Minister_of_New_Zealandhttp://en.wikipedia.org/wiki/Prime_Minister_of_New_Zealandhttp://www.topuniversities.com/university-rankings/world-university-rankings/2011/subject-rankings/technology/computer-science-information-systems?page=4http://www.topuniversities.com/university-rankings/world-university-rankings/2011/subject-rankings/technology/computer-science-information-systems?page=4

  • About myself

    Lecturer (Assistant Professor in US) since Aug. 2011 • Full time/permanent

    • Computer science and software engineering Dept.

    • Research/teaching: Computer and Network Security

    Postdoc at Duke U. from June 2008- July 2011 • (Kishor S. Trivedi group)

    U of Maryland, USA in 2007 • Virgil D. Gligor group (former ACM SIGSAC chair)

    Studied at KAU in Korea (BS, MS, PhD) • JongSou Park group (Penn. State PhD)

  • Security

    Hardware/ Software

    Network

    Ubiquitous computing

    Cyber physical systems

    Intrusion Detection /Tolerance systems

    Sensor Nets Dependability and Security

    Cloud/VDC

    Fault tolerance Reliability of Sat./UAV

    Embedded System

    REASSURE analysis (REliability, Availability, Security, SUrvivability, REsilience)

    Smart Grid

  • Outline - SHARPE

    Brief Intro. To SHARPE

    Download

    Analytic Modeling using SHARPE • Non-state space models

    o Reliability Block Diagram (RBD)

    o Fault Tree

    o Reliability Graph

    • State space models o Continuous Time Markov Chains (CTMC)

    o Stochastic Petri nets/Stochastic Reward nets

    • Hierarchical Models

    • Others

  • 6/81

    Copyright © by Kishor S. Trivedi

    Health & Medicine

    Communication

    Avionics

    Entertainment Banking

    Motivation: Dependence on Computer Systems

  • Dependability– An umbrella term

    Trustworthiness of a computer system such that reliance can justifiably be placed on the service it delivers

    Dependability

    Attributes

    Availability Reliability Safety Maintainability

    Fault Prevention Fault Removal Fault Tolerance Fault Forecasting

    Means

    Threats Faults Errors Failures

  • IFIP WG10.4

    Failure occurs when the delivered service no longer complies with the desired output.

    Error is that part of the system state which is liable to lead to subsequent failure.

    Fault is adjudged or hypothesized cause of an error.

    Faults are the cause of errors that may lead to failures

    Fault Error Failure

  • Reliability Availability

    Dependability Measures

    Dependability Attributes or Measures

    • Reliability: “The ability of a system to perform a required function

    under given conditions for a given time interval.” No recovery is

    assumed after system fails (there can be recovery after a component

    failure)

    • Availability: “The ability of a system to be in a state to perform a

    required function at a given instant of time or at any instant of time

    within a given time interval.“

  • Reliability, Availability, Performance

    Reliability-Based

    • Availability: Steady-state, Transient, Interva

    Downtime

    • Reliability: R(t), System MTTF

    Performance

    • Throughput, Loss Probability, Response Time

    “Does it work, and for how long?''

    “Given that it works, how well does it work?''

  • Composite Performance and Availability

    Need Techniques and Tools That Can Evaluate • Performance, Availability and Their

    Combinations

    “How much work will be done(lost) in a

    given interval including the effects of

    failure/repair/contention?''

    Performability

  • Methods of Evaluation

    Measurement-Based

    • Most believable, most expensive

    • Not always possible or cost effective during system

    design

    Model-Based

    • Less believable, Less expensive

    • Discrete-Event Simulation vs. Analytic

  • Numerical solution tool

    Close-form

    solution

    Evaluation Methods

    Model-based

    Discrete-event simulation

    Hybrid

    Analytic Models

    Quantitative evaluation

    Measurement-based

  • Overview of SHARPE

    SHARPE: Symbolic-Hierarchical Automated Reliability and Performance Evaluator

    Well-known modeling tool (Installed at over 300 Sites; companies and universities)

    Combines flexibility of Markov models and efficiency of combinatorial models

    Ported to most architectures and operating systems

    Used for Education, Research, Engineering Practice

  • Overview of SHARPE (cont.)

    Graphical User Interface is now available

    Used for analysis of performance, dependability

    and performability

    Hierarchy facilitates largeness & stiffness

    avoidance

    Steady-state as well as transient analysis

    Written in C language

  • Architecture of SHARPE

    Fault tree

    Reliability graph

    Reliability

    Block

    Diagrams

    Task graph Pfqn, Mfqn

    Hierarchical & Hybrid Compositions

    Semi-Markov chain

    Markov chain

    Petri net

    (GSPN & SRN)

    Availability/Reliability Performance Performability

  • Non-state space models

    Analytic models

    Non-state space model types

    Series Parallel reliability block diagrams

    (RBDs)

    Non-SP reliability block diagrams

    (reliability graph: Relgraph)

    Fault trees (FTs)

    Fault trees with repeated events

  • Non-state space models (cont.)

    Reliability block diagrams, Fault trees and

    Reliability graphs

    • Commonly used for reliability and availability

    • These model types are similar in that they capture

    conditions that make a system fail in terms of the

    structural relationships between the system

    components.

  • Non-state space models (cont.)

    Non-state space modeling techniques (like RBDs, relgraphs and FTs) are easy to use and assuming statistical independence solve for system reliability, system availability and system MTTF; can find bottlenecks

    Each component can have attached to it

    • A probability of failure

    • A failure rate

    • A distribution of time to failure

    • Steady-state and instantaneous unavailability

  • Non-state space models (cont.)

    can be solved using fast algorithms assuming stochastic independence between system components (all implemented in SHARPE) • Sum of Disjoint Products (SDP) algorithms.

    • Binary Decision Diagrams (BDD) algorithms.

    • Factoring (conditioning) algorithms.

    • Series-parallel composition algorithms.

    • Bounding algorithm for relgraphs

    Failure/Repair Dependencies are often present; • RBDs, FTs cannot easily handle these (e.g., shared repair,

    warm/cold spares, imperfect coverage, non-zero switching time, travel time of repair person, reliability with repair).

  • Non-state space models (cont.)

    Reliability block diagrams (Control-voice channel example)

    Fault Tree (Control-voice channel example)

  • Reliability block diagrams

  • 2 Control and 3 Voice Channels Example

    control

    control

    voice

    voice

    voice

  • Description

    Each control channel has a reliability Rc(t)

    Each voice channel has a reliability Rv(t)

    System is up if at least one control channel and at least 1 voice channel are up.

    Reliability:

    ]))(1(1][))(1(1[)( 32 tRtRtR vc

  • Exercise: (non-) virtualized vs. virtualizedsystem

    H. Ramasamy and M. Schunter, ‚Architecting Dependable Systems Using Virtualization,‛ In Workshop on DSN-2007.

  • Fault Tree

  • Markov chains

  • Markov chains

    To model complex interactions between components, use models like Markov chains or more generally state space models.

    Markov reliability models will have one or more absorbing states;

    • Markov availability models will have n absorbing states

    Many examples of dependencies among system components have been observed in practice and captured by continuous-time Markov chains (CTMCs).

  • Modeling Taxonomy

    Abstract models

    Discrete-event simulation

    Hybrid

    Analytic models

    Non-state-space models

    State-space models

  • State-Space Models

    States and labeled state transitions

    State can keep track of:

    • Number of functioning resources of each type

    • States of recovery for each failed resource

    • Number of tasks of each type waiting at each resource

    • Allocation of resources to tasks

    A transition:

    • Can occur from any state to any other state

    • Can represent a simple or a compound event

  • Markov Availability model WebSphere AP Server

    Application server and proxy server (with escalated levels of recovery)

    • Delay and imperfect coverage in each step of recovery modeled

    UA UR UB(1-r)rm

    rrmqra

    (1-q)ra

    bbm

    RE(1-b)bm

    m

    UOUPg ed2

    1D

    ed2dd1

    (1-e)d2

    UN

    dm

    1N

    (1-d)d1

    (1-e)d2

    2N(1-d)d1

    (1-e)d2

    ed2 dd1

    Failure detection By WLM

    By Node Agent

    Manual detection

    Recovery Node Agent

    Auto process restart

    Manual recovery Process restart

    Node reboot

    Repair

  • State-Space model taxonomy

    (discrete) State space

    models

    Markovian models

    non-Markovian models

    discrete-time Markov chains (DTMC)

    continuous-time Markov chains (CTMC)

    Markov reward models (MRM)

    Semi-Markov process (SMP)

    Markov regenerative process

    Non-Homogeneous Markov

    Can relax the assumption of exponential distributions

  • Problem with State Space Models

    State space explosion problem or the largeness problem

    Stochastic Petri nets (SPNs) and related formalisms for easy specification and automated generation/solution of underlying Markov model

    Or use hierarchical (Multilevel) model composition.

    • e.g. Upper level : FT or RBD, lower level: Markov chains

    • Many practical examples of the use of hierarchical models exist

  • Markov Reward Models (MRMs)

    Modeling any system with a pure reliability / availability model can lead to incomplete, or, at least, less precise results. • Gracefully degrading systems may be able to

    survive the failure of one or more of their active components and continue to provide service at a reduced level.

    • Markov reward model is commonly used technique for the modeling of gracefully degradable system

  • Two-State Markov Availability

    Model in SHARPE

  • Availability measures

    Steady-state Availability

    Steady-state Unavailability

    Transient Availability

    Average Cumulative Downtime

  • Snapshot of the GUI

  • bind

    lambda 0.0033

    mu 1

    end

    * We define a model of type Markov with the name m2_state

    markov m2_state readprob

    *specify each transition and its rate

    1 0 lambda

    0 1 mu

    end

    * specify that state 1 is the initial state with probability 1

    1 1

    end * define variable A as the steady-state probability of

    * the Markov chain m2_state being in state 1 var A prob(m2_state,1)

  • var U prob(m2_state,0)

    var downtime 60*8760*U

    *ask sharpe to compute and print A, U and downtime

    expr A, U, downtime

    *define function A(t) as the transient probability for the Markov chain

    * to be in the up state 1 at time t

    func A(t) tvalue(t;m2_state,1)

    * we ask SHARPE to compute and print pointwise availability

    * for time points 0 through 1000 in steps of 50 hours

    loop t,0,1000,50

    expr A(t)

    end

    end

  • Outputs from SHARPE

    A: 9.9668e-01

    U: 3.3223e-03

    Downtime: 1.7462e+03

  • 0.995

    0.996

    0.997

    0.998

    0.999

    1

    1.001

    0 50 100

    150

    200

    250

    300

    350

    400

    450

    500

    550

    600

    650

    700

    750

    800

    850

    900

    950

    1000

    Time

    insta

    nta

    neo

    us a

    vailab

    ilit

    y A

    (t)

    Outputs

  • Two component system: Markov availability model

    Assume we have a two-component parallel redundant system with repair rate m.

    Assume that the failure rate of both components is .

    When both the components have failed, the system is considered to have failed.

  • Markov availability model (Cont.)

    Let the number of properly functioning components be the state of the system. The state space is {0,1,2} where 0 is the system down state.

    We wish to examine effects of shared vs. non-shared repair

  • 2 1 0

    2

    m

    m2

    2 1 0

    2

    m

    m

    Non-shared (independent)

    repair

    Shared repair

    Markov availability model (Cont.)

  • Note: Non-shared case can be modeled &

    solved using a RBD or a FT but shared case

    needs the use of Markov chains.

    m

    m

    A

    2)1(1m

    m

    sysA

    Markov availability model (Cont.)

  • Markov availability model (Cont.)

  • 72

    markov shared

    2 1 2*lambda

    * Could be also written

    * 2 1 2/MTTF

    1 0 lambda

    1 2 mu

    0 1 mu

    end

    Shared Case

  • bind mu 1

    lambda 0.1

    end var U prob(shared,0)

    var downtime 60*8760*U

    loop j ,2, 5, 0.5

    bind lambda 1.0 *10^-j

    expr downtime

    end

    end

  • Markov availability model (Cont.)

  • Non-shared case

    Markov availability model (Cont.)

  • markov non_shared

    2 1 2*lambda

    1 0 lambda

    0 1 mu

    1 2 2*mu

    end

    Non-shared Case

  • bind mu 1

    end var U prob(non_shared,0) var downtime 60*8760*U

    loop j ,2, 5, 0.5

    bind lambda 1.0 *10^-j

    expr downtime

    end

    end

  • Copyright © 2008 by K.S. Trivedi 78

    Markov availability model (Cont.)

    Non-shared case

  • Comparing Shared/Non-shared cases

  • CTMC: WFS example

    WFS Example

  • A Workstations-Fileserver Example

    Computing system consisting of:

    • A file-server

    • Two workstations

    • Computing network connecting them

    System operational as long as:

    • One of the Workstations

    and

    • The file-server are operational

    Computer network is assumed to be fault-free.

  • The WFS Example

  • Assuming exponentially distributed times to

    failure

    • w : failure rate of workstation

    • f : failure rate of file-server

    Assume that components are repairable

    • mw: repair rate of workstation

    • mf : repair rate of file-server

    File-server has (preemptive) priority for repair

    over workstations (such repair priority cannot

    be captured by non-state-space models)

    Markov Chain for WFS Example

  • Markov Availability Model for WFS

    0_0

    2_1 1_1

    1_0 2_0

    0_1

    f

    2w

    2w

    w

    mw mw

    w

    mf mf mf f f

    Since each state is reachable from every other state, the

    CTMC is irreducible. Furthermore, all states are positive

    recurrent (since it is a finite state CTMC).

  • In the previous figure, the label (i_j) of each state is

    interpreted as follows: i represents the number of

    workstations that are still functioning and j is ‘1’ or ‘0’

    depending on whether the file-server is up or down

    respectively.

    Note that in the text, no component failures are

    allowed from system failure states; this is commonly

    assumed by many engineers in practice. Here we

    allow component failures from system failure states to

    show that this situation can also be modeled.

    Markov Availability Model for WFS (cont.)

  • Model in SHARPE GUI

  • Outputs; steady state

  • Analysis Frame

  • Output Generated by SHARPE

  • Markov Reward Models (MRMs)

  • Markov Reward Models (MRMs)

    Continuous Time Markov Chains are useful

    models for performance as well as availability

    prediction

    Extension of CTMC to Markov reward models

    make them even more useful

    Attach a reward rate (or a weight) ri to state i of

    CTMC. Let Z(t)=rX(t) be the instantaneous reward

    rate of CTMC at time t

  • Markov Reward Models (MRMs) (Continued)

    Define Y(t) the accumulated reward in the interval (0,t]

    Computing the expected values of these measures is easy

    For computing distribution of Y(t) see the Perfomability monograph edited by Haverkort, Marie, Rubino & Trivedi

    t

    dZtY0

    )()(

  • 3-State Markov Reward Model with Sample

    Paths of X(t) and Z(t) Processes

    r1 = 1.7

    r2 = 1.0

    r3 = 0.0

    1 2 3

    2

    m2 m3

  • CTMC: Security models

  • Legend

    R

    IR CR IA CA

    CIA

    A

    Dependability and Security Models

    A Dependability and Security Model Classification

    A Availability

    C Confidentiality

    I Integrity

    R Reliability

  • Security Modeling Taxonomy

    Analytic models

    Non-state space models

    State-space models

    Attack tree

    SMP

    CTMC

  • Security compromise of smart card

    Confidentiality compromise

    Availability compromise

    Integrity compromise

    Get PIN Unauthorized access

    Dump communication

    Applications/algorithm Data Protocol

    Block access

    Hardware damage

    Denial Of service

    Non-state space model: Attack Tree

    Fault tree -> attack tree

    I1 I2 I3

  • Non-state space model: Attack graph

    Reliability graph -> attack graph

    Will be covered in more detail • In security talk

  • Probabilistic Security Quantification

    State transition diagram of system security states/ attacker behavior, and use Markov chains, Semi Markov Process, Stochastic Petri Nets

    Attack (Response/countermeasure) Trees for incorporating both attacker and system behavior.

    Hierarchical and fixed-point iterative models in future?

  • Security Quantification of SITAR

    Security vs. attack rate

    0.9550.96

    0.9650.97

    0.9750.98

    0.9850.99

    0.9951

    0.33

    2.33

    4.33

    6.33

    8.33

    10.3

    3

    12.3

    3

    threat level 3

    threat level 1

    Mean time to security failure vs. attack rate

    0

    5000

    10000

    15000

    20000

    25000

    30000

    1 2 3 4 5 6 7 8 9 10 11 12 13

    threat level 3

    threat level 1

    System security

    Mean time to severe security failure

    Threat level 1

    Threat level 3

  • State Space Explosion

    State space explosion can be handled in two ways: • Large model tolerance must apply to specification, storage

    and solution of the model. If the storage and solution problems can be solved, the specification problem can be solved by using more concise (and smaller) model specifications that can be automatically transformed into Markov models (GSPN and SRN models).

    • Large models can be avoided by using hierarchical model composition.

    Ability of SHARPE to combine results from different kinds of models • Possibility to use state-space methods for those parts of a

    system that require them, and use non-state-space methods for the more ‚well-behaved‛ parts of the system.

  • Dependability and Security Evaluation Methods

    Model-based

    Discrete-event simulation

    Hybrid

    Analytic Models

    Quantitative evaluation

    Measurement-based

    Hierarchical models

    largeness

    Combinatorial models

    Efficiency, simplicity

    State-space models

    Dependency capture

  • VDC example

  • Introduction

    VMware Virtualization • w/o virtualization vs. w virtualization

  • Introduction

    VM live migration (e.g., VMotion in VMware) • enables the Live Migration of Virtual Machines

    across hosts

    How useful is this ?

  • Introduction

    VMware

    CPU utilization CPU utilization

    VMware VMware

    CPU utilization

    OS

    App App

    OS

    App

    OS

    App

    OS

    App

    OS

    Call for Upgrade

    Upgrade Completion

    VM live migration (VMotion) (cont)

    host1 host2 host3

  • Introduction

    VMotion

    Bottle App

    ESX Server 1 ESX Server 3 ESX Server 2

    Bottle App Bottle App

    VM live migration (VMotion) (cont) • automatic resource allocation to bottleneck

  • Introduction

    VMware HA (High Availability) • provides easy to use, cost effective high

    availability.

    Resource Pool X Host failure

  • Revisit (non-) virtualized vs. virtualizedsystem

    H. Ramasamy and M. Schunter, ‚Architecting Dependable Systems Using Virtualization,‛ In Workshop on DSN-2007.

  • A System Architecture

    VMM1

    Host1

    VMM2

    Host2

    VM2

    APP2

    VM1

    APP1

    SAN

    CPU

    Mem

    Power

    NIC

    Cooler

    Hardware

  • Failure classification in virtualized two hosts

    Failures

    Hardware failures Software failures

    (non-aging related Mandel bugs)

    VMM Application VM Power

    failure

    Memory

    failure

    CPU

    failure

    Network

    failure

    SAN

    failure

    Cooler

    failure

    How can you represent this system using stochastic model?

    Any idea?

  • A system availability model of virtualized two nodes (old)

    System Failure

    AND

    HW1

    CPU Mem Net Pwr

    host1

    VMM1 HW2

    CPU Mem Net Pwr

    host2

    VMM2

    SAN SAN

    VMs VMs

  • A system hierarchical model of virtualized two hosts

    System Failure

    VM sub SAN

    HW1

    CPU1 Mem1 NIC1 Pwr1

    Host1

    VMM1

    Coo1

    HW2

    CPU2 Mem2 NIC2 Pwr2

    Host2

    VMM2

    Coo2

    AND

    OR

    Compute equivalent failure and recovery rate in one host (subtree) in the fault tree And use it in VM Markov model

  • Markov submodel

    CPU submodel

    D1 RP

    VMM1

    host1

    VMM2

    host2

    VM2

    APP2

    VM1

    APP1

    SAN

    CPU Mem Power NIC Cooler

  • MTTFeq and MTTRequ computation of VM host

    Top level is a host fault subtree (boxed)

    1. Solve Markov submodels such as CPU, MEM, NIC, PWR, COO in the host fault subtree. • We can compute MTTFeq, MTTReq of the Markov

    submodels

    2. Compute MTTFeq and MTTReq of a host fault subtree model

    3. Use MTTFeq and MTTReq values into VM sub Markov model to be shown

    Ref: Dazhi Wang, MS Thesis

  • System flow review

    A host fault subtree

    VM sub Markov chain model

    2) MTTF_eq/MTTR_eq

    of a host fault subtree

    HW1

    CPU Mem NIC Pwr

    Host

    1

    VMM1

    Coo 1) MTTF_eq/MTTR_eq of

    Markov submodels

    Measure of interests

    (availability)

    CPU Mem NIC Pwr VMM Coo SAN

    Input Parameters values of the Markov submodels

    System Failure

    VM sub SAN

    HW1

    CPU1 Mem1 NIC1 Pwr1

    Host1

    VMM1

    Coo1

    HW2

    CPU2 Mem2 NIC2 Pwr2

    Host2

    VMM2

    Coo2

    AND

    OR

    3) Use MTTFeq and MTTReq

    values into VM sub Markov

    model to be shown

  • VMware HA (High Availability)

    • After a host failure detection, restart VM on the other host

    VM migration

    • After the host is repaired, VM is moved to on the original host

    Resource Pool

    Host failure detected!

    x Host failure

    RP Host repair

    started

    UP Host repaired

    restart

    Revisit: HA and VM migration

  • Availability model of virtualized two nodes system : VMsub model

    Up state

    Down state

    Host failure

    Host failure

    detected

    VM is restarted

    on other host

    Host is repaired

    VM is migrated

    130

  • 131/86

    Input Parameters values in Markov submodels

    For execution only, real value will be used

    alpha_sp 2

    lambda_cpu 2/8760

    mu_cpu 2

    lambda_mem 2/8760

    mu_mem 2

    lambda_net 2/8760

    mu_net 2

    mu2_net 1

    lambda_pwr 2/8760

    mu_pwr 2

    mu2_pwr 1

    lambda_coo 2/8760

    lambda2_coo 2/8760

    alpha_coo 1

    mu2_coo 1

    mu_coo 2

    delta_vmm 1

    b_vmm 0.9

    beta_vmm 6

    mu_vmm 1

    lambda_vmm 1/2880

  • 132/86

    1) Compute MTTFeq and MTTReq of Markov submodels in the host fault subtree

    Results from SHARPE execution

    MTTFeq MTTReq

    CPU 2.19050000e+003 5.00000000e-001

    Memory 1.09550000e+003 5.00000000e-001

    Network 2.19049994e+003 9.99942929e-001

    Power 2.19000000e+003 4.99942929e-001

    Cooler 4.38099977e+003 5.00399446e-001

    VMM 2.88000000e+003 1.31666667e+000

  • 133/86

    2) Compute MTTFeq and MTTReq of host fault subtree model

    MTTF equivalent : 3.49899881e+002

    MTTR equivalent : 6.79629790e-001

    They are used in VM sub Markov chain model in the next slides.

  • 134/86

    Steady state availability of system fault

    tree

    Steady state availability : 9.99770797e-001 -------------------------------------------

    Down time : 1.20469077e+002 (min)

  • Other examples index

    Non-state space models

    • Reliability Block Diagram (RBD): voice, IBM, NEC, CISCO

    • Fault Tree: the same

    • Reliability Graph

    State space models

    • Continuous Time Markov Chains (CTMC): o Simple two components failure-recovery: Bluebook

    o Other failure-recovery model: IBM, NEC-VDC

    o Software rejuvenation

    o Security model: SITAR

    o UAV model

    • Stochastic Petri nets/Stochastic Reward nets o NEC-VDC, SITAR, NEC-warm rejuvenation, Disaster tolerance

  • Reliability models in practice

  • Availability models in practice

    Expected interval availability

  • Stochastic Petri nets/SRN

  • UUXUUX UFaXUUX UUXUFaX

    UUXUDaX h h

    hd hd

    vr vr

    vdvd

    v v

    a a

    ad ad

    UDaXUUX

    UPaXUUX UUXUPaX FUXUUX UUXFUX

    DXXUUR UURDXX

    2am

    1a ac m

    2am

    (1 ) 1a ac m (1 ) 1a ac m

    UFvXUUX

    UDvXUUX

    DXXUUU

    hm

    UUUDXX

    UUXUFvX

    UUXUDvX

    UXXFUU

    hh

    UUUUXX

    FUUUXX

    hd hd

    DUUUXX UXXDUU

    hmvm vm

    vr

    1a ac m

    2 vr2

    UPvXUUX UUXUPvX

    UXXUUU

    (1− cv)r v

    μv

    cvr

    v

    (1− cv)r v

    μv

    cvr

    v

    Model with only 1st Host Failure (IEEE-TR)

  • UUXUUX UFaXUUX UUXUFaX

    UUXUDaX h h

    hd hd

    vr vr

    vdvd

    v v

    a a

    ad ad

    UDaXUUX

    UPaXUUX UUXUPaX FUXUUX UUXFUX

    DXXUUR UURDXX

    2am

    1a ac m

    2am

    (1 ) 1a ac m (1 ) 1a ac m

    UFvXUUX

    UDvXUUX

    DXXFUU

    hm

    UUUDXX

    UUXUFvX

    UUXUDvX

    UXXFUU

    hh

    UUUUXX

    FUUUXX

    hd hd

    DUUUXX UXXDUU

    hmvm vm

    vr

    1a ac m

    2 vr2

    UPvXUUX UUXUPvX

    DXXURR

    (1− cv)r v

    μv

    cvr

    v

    (1− cv)r v

    μv

    cvr

    v

    DXXDUU FUUDXX URRDXX DUUDXX

    DXXUUU

    UXXUUU

    hhdhm

    vr

    2

    hm

    2 hd

    2

    vr2

    UXXURR URRUXX

    hm

    UXXUUR

    vr2

    h

    UURUXX

    vr2

    hm

    vrvr

    Model with 2nd Host Failure

  • Sensitivity of MTTF_VMs

    Parameter Sensitivity – Model with only 1st Failure

    Sensitivity – Model with 2nd Failure

    lambda_h -1.99866136e+00 -1.99869163e+00

    m_v 9.99931045e-01 5.20949060e-02

    lambda_a 1.04508774e-03 1.04508774e-03

    lambda_v 3.25669697e-05 3.25669696e-05

    r_v -2.00199738e-05 -2.24829226e-05

    mu_v -1.72413369e-05 -1.72413369e-05

    The order of parameters in the sensitivity ranking is the same for both models

    Obs: In the paper, the values for sens. of some parameters were wrong, due to a computation error that was detected and fixed now. But only the order of r_v and mu_v was swaped with this fix. (6th -> 5th, 5th -> 6th in the ranking)

  • Unavailability and MTTF_VMs

    Metric Model with only 1st Failure Model with 2nd Failure

    Unavailability 3.96688083e-10 4.01031499e-07

    MTTF 3.85133268e+07 2.00774300e+06

    MTTF and Unavailability are larger in the second model

  • SRN models

    Refer to Trivedi’s SPN lecture notes if necc.

    Open the SHARPE project file.

  • SPN model for WFS example

  • Modeling Practices using SHARPE

  • Performance models in practice

  • Performability models in practice

  • SPNP: Stochastic Petri Net Package.

    Version 6.0

  • Outline - SPNP

    Brief Intro. To SPNP

    Download

    Analytic Modeling using SPNP • Stochastic Petri nets/Stochastic Reward nets

    • Others

  • SPNP

    Modeling tool well-known (Installed at over 180 Sites;

    companies & universities)

    Ported to Most Architectures and Operating Systems

    Used For Performance, Dependability and

    Performability

    Steady-State as well as Transient Analysis

    Written in C Language

    SPNP GUI supports animantion

  • Stochastic Reward Nets

    Bipartite graph with two kinds of nodes:

    places and transitions

    Timed and immediate transitions

    Inhibitor arc, variable cardinality arc,

    guard function, and priority

    Reward rate based measures

    Fluid stochastic Petri net: fluid places and

    fluid arcs

  • Characteristics

    Structural characteristics:

    • Marking dependency

    • Resampling policies, for general distributions

    Stochastic characteristics:

    • Allow definition of reward rates in terms of net level entities

    • Automatically generate the reward rates for the markings

  • Characteristics (cont.)

    Non- Markovian SPNs as well as Fluid Stochastic Petri

    Nets (FSPNs) can be described and solved

    Besides the analytic numeric solution of Markovian

    models discrete-event simulation is now available

    A user-friendly GUI Interface is now available

  • Transition Type

    Poisson

    Binominal

    Negative Binominal

    2-stageHyperexponential

    3-stageHypoexponential

    Erlang

    Pareto

    Cauchy

    Transition Type

    Exponential

    Deterministic(Constant)

    Uniform(a,b)

    Geometric(p)

    Weibull(a,b)

    Lognormal

    Truncated Normal

    Gamma

    Beta

    Available Distribution functions

  • Solution Techniques

    SRN Model

    Markovian Nets Non-Markovian Nets FSPNs

    Extended Reachability Graph

    (ERG)

    Generate All Markings

    (tangible + vanishing)

    Generate Reward Rates

    for Tangible Markings

    make correspondance list

    schedule transition events

    schedule fluid events

    change states

    Compute statistics

    and confidence interval

    Discrete

    Event

    Simulation

    (DES)

    modify

    clock

  • Steps in Analytic Numeric Analysis of Markovian SRN

    Generates all markings of the SRN by considering all the enabled transitions in each marking. Classify the markings as Tangible and Vanishing markings.

    Stochastic Reward Nets

    Extended Reachability Graphs

    Markov Reward Model

    Solve MRM for Steady-State

    Transient behavior using

    known methods

    Eliminates the vanishing markings

  • Discrete Event Simulation

    FSPN (Fluid Stochastic Petri net)

    Used as a model for: • Systems involving fluid variables

    • Approximation of models with a large number of tokens

    No need to generate the reachability graph

    Possibility to give the number of replications or the desired relative error.

  • SRN models in SPNP

    Some examples

  • References

    K. Trivedi’s lecture notes on SHARPE demonstration

    SHARPE portal

    SPNP site at K. Trivedi’s homepage.