How to use SHARPE and SPNP
Dr. Dan (Dong-Seong) Kim
University of Canterbury, New Zealand
http://www.cosc.canterbury.ac.nz/dongseong.kim
-
University of Canterbury
University of Canterbury (UC) • originated in 1873 in the centre of Christchurch as
Canterbury College, the first constituent college of the University of New Zealand
• Ernest Rutherford – physicist – Nobel Prize in chem.
• John Key– politician currently Prime Minister of New Zealand
Computer Science and Software Engineering department at UC has been ranked in the top 101-150 Computer Science departments in the 2011 International QS World University Rankings.
-
About myself
Lecturer (Assistant Professor in US) since Aug. 2011 • Full time/permanent
• Computer science and software engineering Dept.
• Research/teaching: Computer and Network Security
Postdoc at Duke U. from June 2008- July 2011 • (Kishor S. Trivedi group)
U of Maryland, USA in 2007 • Virgil D. Gligor group (former ACM SIGSAC chair)
Studied at KAU in Korea (BS, MS, PhD) • JongSou Park group (Penn. State PhD)
-
Security
Hardware/ Software
Network
Ubiquitous computing
Cyber physical systems
Intrusion Detection /Tolerance systems
Sensor Nets Dependability and Security
Cloud/VDC
Fault tolerance Reliability of Sat./UAV
Embedded System
REASSURE analysis (REliability, Availability, Security, SUrvivability, REsilience)
Smart Grid
-
Outline - SHARPE
Brief Intro. To SHARPE
Download
Analytic Modeling using SHARPE
• Non-state space models
o Reliability Block Diagram (RBD)
o Fault Tree
o Reliability Graph
• State space models
o Continuous Time Markov Chains (CTMC)
o Stochastic Petri nets/Stochastic Reward nets
• Hierarchical Models
• Others
-
Copyright © by Kishor S. Trivedi
Health & Medicine
Communication
Avionics
Entertainment Banking
Motivation: Dependence on Computer Systems
-
Dependability– An umbrella term
Trustworthiness of a computer system such that reliance can justifiably be placed on the service it delivers
Dependability
• Attributes: Availability, Reliability, Safety, Maintainability
• Means: Fault Prevention, Fault Removal, Fault Tolerance, Fault Forecasting
• Threats: Faults, Errors, Failures
-
IFIP WG10.4
A failure occurs when the delivered service no longer complies with the desired output.
An error is that part of the system state which is liable to lead to subsequent failure.
A fault is the adjudged or hypothesized cause of an error.
Faults are the cause of errors that may lead to failures
Fault → Error → Failure
-
Reliability Availability
Dependability Measures
Dependability Attributes or Measures
• Reliability: “The ability of a system to perform a required function
under given conditions for a given time interval.” No recovery is
assumed after system fails (there can be recovery after a component
failure)
• Availability: “The ability of a system to be in a state to perform a
required function at a given instant of time or at any instant of time
within a given time interval.”
-
Reliability, Availability, Performance
Reliability-Based
• Availability: Steady-state, Transient, Interval
Downtime
• Reliability: R(t), System MTTF
Performance
• Throughput, Loss Probability, Response Time
“Does it work, and for how long?”
“Given that it works, how well does it work?”
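For reference, the measures named on this slide have standard textbook definitions, sketched below. The symbols T (time to system failure) and MTTR (mean time to repair) are notation introduced for this note, and the steady-state formula assumes the system simply alternates between up and down periods.

```latex
R(t) = P(T > t), \qquad
\mathrm{MTTF} = \int_0^{\infty} R(t)\,dt, \qquad
A_{ss} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
```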
-
Composite Performance and Availability
Need Techniques and Tools That Can Evaluate • Performance, Availability and Their
Combinations
“How much work will be done (lost) in a
given interval including the effects of
failure/repair/contention?”
Performability
-
Methods of Evaluation
Measurement-Based
• Most believable, most expensive
• Not always possible or cost effective during system
design
Model-Based
• Less believable, Less expensive
• Discrete-Event Simulation vs. Analytic
-
Evaluation Methods
Quantitative evaluation
• Measurement-based
• Model-based
o Discrete-event simulation
o Analytic models: closed-form solution or numerical solution tool
o Hybrid
-
Overview of SHARPE
SHARPE: Symbolic-Hierarchical Automated Reliability and Performance Evaluator
Well-known modeling tool (Installed at over 300 Sites; companies and universities)
Combines flexibility of Markov models and efficiency of combinatorial models
Ported to most architectures and operating systems
Used for Education, Research, Engineering Practice
-
Overview of SHARPE (cont.)
Graphical User Interface is now available
Used for analysis of performance, dependability
and performability
Hierarchy facilitates largeness & stiffness
avoidance
Steady-state as well as transient analysis
Written in C language
-
Architecture of SHARPE
Fault tree
Reliability graph
Reliability
Block
Diagrams
Task graph Pfqn, Mfqn
Hierarchical & Hybrid Compositions
Semi-Markov chain
Markov chain
Petri net
(GSPN & SRN)
Availability/Reliability Performance Performability
-
Non-state space models
Analytic models
Non-state space model types
Series Parallel reliability block diagrams
(RBDs)
Non-SP reliability block diagrams
(reliability graph: Relgraph)
Fault trees (FTs)
Fault trees with repeated events
-
Non-state space models (cont.)
Reliability block diagrams, Fault trees and
Reliability graphs
• Commonly used for reliability and availability
• These model types are similar in that they capture
conditions that make a system fail in terms of the
structural relationships between the system
components.
-
Non-state space models (cont.)
Non-state space modeling techniques (like RBDs, relgraphs and FTs) are easy to use and, assuming statistical independence, can be solved for system reliability, system availability and system MTTF; they can also be used to find bottlenecks
Each component can have attached to it
• A probability of failure
• A failure rate
• A distribution of time to failure
• Steady-state and instantaneous unavailability
-
Non-state space models (cont.)
Non-state space models can be solved using fast algorithms, assuming stochastic independence between system components (all implemented in SHARPE):
• Sum of Disjoint Products (SDP) algorithms.
• Binary Decision Diagrams (BDD) algorithms.
• Factoring (conditioning) algorithms.
• Series-parallel composition algorithms.
• Bounding algorithm for relgraphs
Failure/repair dependencies are often present:
• RBDs and FTs cannot easily handle these (e.g., shared repair, warm/cold spares, imperfect coverage, non-zero switching time, travel time of repair person, reliability with repair).
-
Non-state space models (cont.)
Reliability block diagrams (Control-voice channel example)
Fault Tree (Control-voice channel example)
-
Reliability block diagrams
-
2 Control and 3 Voice Channels Example
control
control
voice
voice
voice
-
Description
Each control channel has a reliability Rc(t)
Each voice channel has a reliability Rv(t)
System is up if at least one control channel and at least 1 voice channel are up.
Reliability:
$R(t) = [1 - (1 - R_c(t))^2]\,[1 - (1 - R_v(t))^3]$
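The reliability expression for this example (at least one of two control channels and at least one of three voice channels must be up) can be checked numerically with a minimal sketch. The function name and the channel reliabilities 0.9 and 0.8 are hypothetical, chosen only for illustration.

```python
def system_reliability(r_control: float, r_voice: float,
                       n_control: int = 2, n_voice: int = 3) -> float:
    """Parallel control channels in series with parallel voice channels.

    r_control and r_voice are the channel reliabilities Rc(t) and Rv(t)
    evaluated at one fixed time t; channels are assumed independent.
    """
    control_ok = 1.0 - (1.0 - r_control) ** n_control  # at least one control channel up
    voice_ok = 1.0 - (1.0 - r_voice) ** n_voice        # at least one voice channel up
    return control_ok * voice_ok


if __name__ == "__main__":
    # Hypothetical channel reliabilities, not taken from the slides.
    print(f"{system_reliability(0.9, 0.8):.5f}")
```

With these illustrative inputs the two parallel groups contribute 0.99 and 0.992, so the system reliability is their product.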
-
Exercise: (non-) virtualized vs. virtualized system
H. Ramasamy and M. Schunter, “Architecting Dependable Systems Using Virtualization,” In Workshop on DSN-2007.
-
Fault Tree
-
Markov chains
-
Markov chains
To model complex interactions between components, use models like Markov chains or more generally state space models.
Markov reliability models will have one or more absorbing states;
• Markov availability models will have no absorbing states
Many examples of dependencies among system components have been observed in practice and captured by continuous-time Markov chains (CTMCs).
-
Modeling Taxonomy
Abstract models
Discrete-event simulation
Hybrid
Analytic models
Non-state-space models
State-space models
-
State-Space Models
States and labeled state transitions
State can keep track of:
• Number of functioning resources of each type
• States of recovery for each failed resource
• Number of tasks of each type waiting at each resource
• Allocation of resources to tasks
A transition:
• Can occur from any state to any other state
• Can represent a simple or a compound event
-
Markov Availability model WebSphere AP Server
Application server and proxy server (with escalated levels of recovery)
• Delay and imperfect coverage in each step of recovery modeled
[Figure: CTMC state diagram with states UP, UO, UA, UR, UB, RE, 1D, UN, 1N and 2N; the transition-rate labels did not survive extraction.]
Legend:
• Failure detection: by WLM, by Node Agent, manual detection
• Recovery: Node Agent auto process restart; manual recovery by process restart or node reboot; repair
-
State-Space model taxonomy
(discrete) State space
models
Markovian models
non-Markovian models
discrete-time Markov chains (DTMC)
continuous-time Markov chains (CTMC)
Markov reward models (MRM)
Semi-Markov process (SMP)
Markov regenerative process
Non-Homogeneous Markov
Can relax the assumption of exponential distributions
-
Problem with State Space Models
State space explosion problem or the largeness problem
Stochastic Petri nets (SPNs) and related formalisms for easy specification and automated generation/solution of underlying Markov model
Or use hierarchical (Multilevel) model composition.
• e.g. Upper level : FT or RBD, lower level: Markov chains
• Many practical examples of the use of hierarchical models exist
-
Markov Reward Models (MRMs)
Modeling any system with a pure reliability / availability model can lead to incomplete, or, at least, less precise results.
• Gracefully degrading systems may be able to survive the failure of one or more of their active components and continue to provide service at a reduced level.
• A Markov reward model is a commonly used technique for modeling gracefully degradable systems
-
Two-State Markov Availability
Model in SHARPE
-
Availability measures
Steady-state Availability
Steady-state Unavailability
Transient Availability
Average Cumulative Downtime
-
Snapshot of the GUI
-
bind
lambda 0.0033
mu 1
end
* We define a model of type Markov with the name m2_state
markov m2_state readprob
*specify each transition and its rate
1 0 lambda
0 1 mu
end
* specify that state 1 is the initial state with probability 1
1 1
end
* define variable A as the steady-state probability of
* the Markov chain m2_state being in state 1
var A prob(m2_state,1)
-
var U prob(m2_state,0)
var downtime 60*8760*U
*ask sharpe to compute and print A, U and downtime
expr A, U, downtime
*define function A(t) as the transient probability for the Markov chain
* to be in the up state 1 at time t
func A(t) tvalue(t;m2_state,1)
* we ask SHARPE to compute and print pointwise availability
* for time points 0 through 1000 in steps of 50 hours
loop t,0,1000,50
expr A(t)
end
end
-
Outputs from SHARPE
A: 9.9668e-01
U: 3.3223e-03
Downtime: 1.7462e+03
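For a two-state up/down chain the steady-state measures have a simple closed form, which makes the SHARPE output easy to cross-check. The sketch below is not SHARPE code; the function name is hypothetical. Note that the printed values A, U and downtime correspond to a failure rate of 1/300 per hour, whereas the rounded value 0.0033 shown in the bind block gives slightly different numbers.

```python
MINUTES_PER_YEAR = 60 * 8760


def two_state_measures(lam: float, mu: float):
    """Return (availability, unavailability, downtime in minutes per year)."""
    availability = mu / (lam + mu)
    unavailability = lam / (lam + mu)
    return availability, unavailability, MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for lam in (0.0033, 1 / 300):
        a, u, d = two_state_measures(lam, 1.0)
        print(f"lambda={lam:.6f}  A={a:.4e}  U={u:.4e}  downtime={d:.4e}")
```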
-
Outputs
[Figure: instantaneous availability A(t) versus time, for time from 0 to 1000 in steps of 50; vertical axis from 0.995 to 1.001.]
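The transient availability of a two-state chain that starts in the up state also has a well-known closed form, A(t) = μ/(λ+μ) + λ/(λ+μ)·exp(−(λ+μ)t). The sketch below evaluates it at the same time points as the loop in the script; it is an independent check written for this note, not a description of how SHARPE computes tvalue.

```python
import math


def transient_availability(t: float, lam: float, mu: float) -> float:
    """P(system is up at time t), given it is up at time 0."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)


if __name__ == "__main__":
    # Same parameter values and time grid as the SHARPE script.
    for t in range(0, 1001, 50):
        print(t, f"{transient_availability(t, 0.0033, 1.0):.6f}")
```

With a repair rate of 1 per hour the exponential term vanishes within a few hours, so on a 50-hour grid every point after t = 0 already sits at the steady-state value.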
-
Two component system: Markov availability model
Assume we have a two-component parallel redundant system with repair rate μ.
Assume that the failure rate of each component is λ.
When both the components have failed, the system is considered to have failed.
-
Markov availability model (Cont.)
Let the number of properly functioning components be the state of the system. The state space is {0,1,2} where 0 is the system down state.
We wish to examine effects of shared vs. non-shared repair
-
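Markov availability model (Cont.)
Non-shared (independent) repair: 2 → 1 at rate 2λ, 1 → 0 at rate λ, 1 → 2 at rate μ, 0 → 1 at rate 2μ
Shared repair: 2 → 1 at rate 2λ, 1 → 0 at rate λ, 1 → 2 at rate μ, 0 → 1 at rate μ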
-
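Markov availability model (Cont.)
Note: The non-shared case can be modeled and solved using an RBD or an FT, but the shared case needs the use of Markov chains.
For the non-shared case, with component availability A:
$A = \frac{\mu}{\lambda + \mu}$
$A_{sys} = 1 - (1 - A)^2 = 1 - \left(1 - \frac{\mu}{\lambda + \mu}\right)^2$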
-
Markov availability model (Cont.)
-
markov shared
2 1 2*lambda
* Could be also written
* 2 1 2/MTTF
1 0 lambda
1 2 mu
0 1 mu
end
Shared Case
-
bind
mu 1
lambda 0.1
end
var U prob(shared,0)
var downtime 60*8760*U
loop j ,2, 5, 0.5
bind lambda 1.0 *10^-j
expr downtime
end
end
-
Markov availability model (Cont.)
-
Non-shared case
Markov availability model (Cont.)
-
markov non_shared
2 1 2*lambda
1 0 lambda
* with independent repair, both failed components are repaired in parallel
0 1 2*mu
1 2 mu
end
Non-shared Case
-
bind
mu 1
end
var U prob(non_shared,0)
var downtime 60*8760*U
loop j ,2, 5, 0.5
bind lambda 1.0 *10^-j
expr downtime
end
end
-
Copyright © 2008 by K.S. Trivedi
Markov availability model (Cont.)
Non-shared case
-
Comparing Shared/Non-shared cases
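Both three-state chains are birth-death processes, so their steady-state unavailability can be written in closed form and the two repair policies compared directly. The sketch below assumes the transition structure described for this example (failure rates 2λ and λ; repair rate out of state 0 equal to 2μ for independent repair and μ for shared repair) and sweeps λ over the same grid as the SHARPE loop. Function names are hypothetical.

```python
MINUTES_PER_YEAR = 60 * 8760


def unavailability_non_shared(lam: float, mu: float) -> float:
    """Two independent repairable components in parallel: (1 - A)^2."""
    return (lam / (lam + mu)) ** 2


def unavailability_shared(lam: float, mu: float) -> float:
    """Single repair facility: pi2 : pi1 : pi0 = 1 : 2*rho : 2*rho^2."""
    rho = lam / mu
    return 2 * rho ** 2 / (1 + 2 * rho + 2 * rho ** 2)


if __name__ == "__main__":
    mu = 1.0
    for step in range(7):  # j = 2.0, 2.5, ..., 5.0
        j = 2 + 0.5 * step
        lam = 10.0 ** (-j)
        d_ns = MINUTES_PER_YEAR * unavailability_non_shared(lam, mu)
        d_sh = MINUTES_PER_YEAR * unavailability_shared(lam, mu)
        print(f"lambda=1e-{j:<3}  non-shared={d_ns:.4e} min/yr  shared={d_sh:.4e} min/yr")
```

For small λ/μ the shared-repair downtime is close to twice the non-shared downtime, because the only difference is the halved repair rate out of the system-down state.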
-
CTMC: WFS example
WFS Example
-
A Workstations-Fileserver Example
Computing system consisting of:
• A file-server
• Two workstations
• Computing network connecting them
System operational as long as:
• One of the Workstations
and
• The file-server are operational
Computer network is assumed to be fault-free.
-
The WFS Example
-
Assuming exponentially distributed times to failure
• λw : failure rate of workstation
• λf : failure rate of file-server
Assume that components are repairable
• μw : repair rate of workstation
• μf : repair rate of file-server
File-server has (preemptive) priority for repair over workstations (such repair priority cannot be captured by non-state-space models)
Markov Chain for WFS Example
-
Markov Availability Model for WFS
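[Figure: CTMC for the WFS example with states 2_1, 1_1, 0_1, 2_0, 1_0 and 0_0. Transitions: workstation failures 2_j → 1_j at rate 2λw and 1_j → 0_j at rate λw; file-server failures i_1 → i_0 at rate λf; file-server repairs i_0 → i_1 at rate μf; workstation repairs 0_1 → 1_1 and 1_1 → 2_1 at rate μw.]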
Since each state is reachable from every other state, the
CTMC is irreducible. Furthermore, all states are positive
recurrent (since it is a finite state CTMC).
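Because the chain is irreducible and finite, its steady-state probabilities are the unique solution of the balance equations together with the normalization condition. The sketch below solves them in plain Python for the six WFS states. The transition list is reconstructed from the state labels and the repair-priority rule (no workstation repair while the file-server is down), and the numeric rates are hypothetical placeholders, not values from the slides.

```python
def steady_state(states, rates):
    """Solve pi Q = 0 with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(states)
    idx = {s: k for k, s in enumerate(states)}
    # Row j holds the balance equation for state j; last column is the RHS.
    a = [[0.0] * (n + 1) for _ in range(n)]
    for (src, dst), rate in rates.items():
        i, j = idx[src], idx[dst]
        a[j][i] += rate  # probability flow into dst from src
        a[i][i] -= rate  # probability flow out of src
    a[n - 1] = [1.0] * n + [1.0]  # replace one equation by normalization
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return {s: a[idx[s]][n] / a[idx[s]][idx[s]] for s in states}


# Hypothetical rates per hour (assumptions for illustration only).
lw, lf, mw, mf = 0.0001, 0.00005, 1.0, 0.5
WFS_STATES = ["2_1", "1_1", "0_1", "2_0", "1_0", "0_0"]
WFS_RATES = {
    ("2_1", "1_1"): 2 * lw, ("2_1", "2_0"): lf,
    ("1_1", "0_1"): lw, ("1_1", "1_0"): lf, ("1_1", "2_1"): mw,
    ("0_1", "1_1"): mw, ("0_1", "0_0"): lf,
    ("2_0", "2_1"): mf, ("2_0", "1_0"): 2 * lw,
    ("1_0", "1_1"): mf, ("1_0", "0_0"): lw,
    ("0_0", "0_1"): mf,
}

if __name__ == "__main__":
    pi = steady_state(WFS_STATES, WFS_RATES)
    # System is up when the file-server and at least one workstation are up.
    print("availability =", pi["2_1"] + pi["1_1"])
```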
-
In the previous figure, the label (i_j) of each state is
interpreted as follows: i represents the number of
workstations that are still functioning and j is ‘1’ or ‘0’
depending on whether the file-server is up or down
respectively.
Note that in the text, no component failures are
allowed from system failure states; this is commonly
assumed by many engineers in practice. Here we
allow component failures from system failure states to
show that this situation can also be modeled.
Markov Availability Model for WFS (cont.)
-
Model in SHARPE GUI
-
Outputs; steady state
-
Analysis Frame
-
Output Generated by SHARPE
-
Markov Reward Models (MRMs)
-
Markov Reward Models (MRMs)
Continuous Time Markov Chains are useful models for performance as well as availability prediction
Extension of CTMC to Markov reward models makes them even more useful
Attach a reward rate (or a weight) r_i to state i of the CTMC. Let Z(t) = r_{X(t)} be the instantaneous reward rate of the CTMC at time t, where X(t) is the state of the CTMC at time t
-
Markov Reward Models (MRMs) (Continued)
Define Y(t) as the accumulated reward in the interval (0,t]:
$Y(t) = \int_0^t Z(\tau)\, d\tau$
Computing the expected values of these measures is easy
For computing the distribution of Y(t), see the Performability monograph edited by Haverkort, Marie, Rubino & Trivedi
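The expected values mentioned here follow directly from the state probabilities of the CTMC. In the sketch below, P_i(t) = P(X(t) = i) and π_i (its steady-state limit) are notation introduced for this note.

```latex
E[Z(t)] = \sum_i r_i \, P_i(t), \qquad
\lim_{t \to \infty} E[Z(t)] = \sum_i r_i \, \pi_i, \qquad
E[Y(t)] = \sum_i r_i \int_0^t P_i(\tau)\, d\tau
```

As a special case, choosing r_i = 1 for up states and r_i = 0 for down states makes E[Z(t)] the instantaneous availability and E[Y(t)] the expected uptime in (0,t].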
-
3-State Markov Reward Model with Sample
Paths of X(t) and Z(t) Processes
r1 = 1.7
r2 = 1.0
r3 = 0.0
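[Figure: three-state CTMC with states 1, 2 and 3, shown with sample paths of X(t) and Z(t); the transition-rate labels did not survive extraction.]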
-
CTMC: Security models
-
Legend
R
IR CR IA CA
CIA
A
Dependability and Security Models
A Dependability and Security Model Classification
A Availability
C Confidentiality
I Integrity
R Reliability
-
Security Modeling Taxonomy
Analytic models
Non-state space models
State-space models
Attack tree
SMP
CTMC
-
Security compromise of smart card
Confidentiality compromise
Availability compromise
Integrity compromise
Get PIN Unauthorized access
Dump communication
Applications/algorithm Data Protocol
Block access
Hardware damage
Denial Of service
Non-state space model: Attack Tree
Fault tree -> attack tree
I1 I2 I3
-
Non-state space model: Attack graph
Reliability graph -> attack graph
Will be covered in more detail in the security talk
-
Probabilistic Security Quantification
State transition diagram of system security states/ attacker behavior, and use Markov chains, Semi Markov Process, Stochastic Petri Nets
Attack (Response/countermeasure) Trees for incorporating both attacker and system behavior.
Hierarchical and fixed-point iterative models in future?
-
Security Quantification of SITAR
[Figure: System security vs. attack rate, with curves for threat level 1 and threat level 3; vertical axis from 0.955 to 1.]
[Figure: Mean time to severe security failure vs. attack rate, with curves for threat level 1 and threat level 3; vertical axis from 0 to 30000.]
-
State Space Explosion
State space explosion can be handled in two ways:
• Large model tolerance must apply to specification, storage and solution of the model. If the storage and solution problems can be solved, the specification problem can be solved by using more concise (and smaller) model specifications that can be automatically transformed into Markov models (GSPN and SRN models).
• Large models can be avoided by using hierarchical model composition.
Ability of SHARPE to combine results from different kinds of models
• Possibility to use state-space methods for those parts of a system that require them, and use non-state-space methods for the more “well-behaved” parts of the system.
-
Dependability and Security Evaluation Methods
Model-based
Discrete-event simulation
Hybrid
Analytic Models
Quantitative evaluation
Measurement-based
Hierarchical models
largeness
Combinatorial models
Efficiency, simplicity
State-space models
Dependency capture
-
VDC example
-
Introduction
VMware Virtualization • w/o virtualization vs. w virtualization
-
Introduction
VM live migration (e.g., VMotion in VMware) • enables the Live Migration of Virtual Machines
across hosts
How useful is this ?
-
Introduction
VM live migration (VMotion) (cont)
[Figure: three VMware hosts (host1, host2, host3), each running VMs (App on OS) with its CPU utilization shown, illustrating live migration around a host upgrade (Call for Upgrade, Upgrade Completion).]
-
Introduction
VM live migration (VMotion) (cont)
• automatic resource allocation to bottleneck
[Figure: VMotion moving a bottlenecked application across ESX Server 1, ESX Server 2 and ESX Server 3.]
-
Introduction
VMware HA (High Availability) • provides easy to use, cost effective high
availability.
Resource Pool X Host failure
-
Revisit (non-) virtualized vs. virtualized system
H. Ramasamy and M. Schunter, “Architecting Dependable Systems Using Virtualization,” In Workshop on DSN-2007.
-
A System Architecture
VMM1
Host1
VMM2
Host2
VM2
APP2
VM1
APP1
SAN
CPU
Mem
Power
NIC
Cooler
Hardware
-
Failure classification in virtualized two hosts
Failures
• Hardware failures: power failure, memory failure, CPU failure, network failure, SAN failure, cooler failure
• Software failures (non-aging related Mandelbugs): VMM, application, VM
How can you represent this system using a stochastic model?
Any ideas?
-
A system availability model of virtualized two nodes (old)
System Failure
AND
HW1
CPU Mem Net Pwr
host1
VMM1 HW2
CPU Mem Net Pwr
host2
VMM2
SAN SAN
VMs VMs
-
A system hierarchical model of virtualized two hosts
System Failure
VM sub SAN
HW1
CPU1 Mem1 NIC1 Pwr1
Host1
VMM1
Coo1
HW2
CPU2 Mem2 NIC2 Pwr2
Host2
VMM2
Coo2
AND
OR
Compute the equivalent failure and recovery rates of one host (subtree) in the fault tree, and use them in the VM Markov model
-
Markov submodel
CPU submodel
D1 RP
VMM1
host1
VMM2
host2
VM2
APP2
VM1
APP1
SAN
CPU Mem Power NIC Cooler
-
MTTFeq and MTTReq computation of VM host
Top level is a host fault subtree (boxed)
1. Solve the Markov submodels (CPU, MEM, NIC, PWR, COO) in the host fault subtree.
• We can compute MTTFeq and MTTReq of each Markov submodel
2. Compute MTTFeq and MTTReq of the host fault subtree model
3. Use the MTTFeq and MTTReq values in the VM sub Markov model (shown later)
Ref: Dazhi Wang, MS Thesis
-
System flow review
1) Compute MTTF_eq/MTTR_eq of the Markov submodels (CPU, Mem, NIC, Pwr, VMM, Coo, SAN) from their input parameter values
2) Compute MTTF_eq/MTTR_eq of a host fault subtree (Host1: VMM1 and HW1, where HW1 covers CPU, Mem, NIC, Pwr and Coo)
3) Use the MTTF_eq and MTTR_eq values in the VM sub Markov chain model (shown later)
Measure of interest: availability
[Figure: system fault tree with top event System Failure, AND and OR gates, and inputs VM sub, SAN, Host1 (HW1: CPU1, Mem1, NIC1, Pwr1, Coo1; VMM1) and Host2 (HW2: CPU2, Mem2, NIC2, Pwr2, Coo2; VMM2).]
-
VMware HA (High Availability)
• After a host failure detection, restart VM on the other host
VM migration
• After the host is repaired, VM is moved back to the original host
Resource Pool
Host failure detected!
x Host failure
RP Host repair
started
UP Host repaired
restart
Revisit: HA and VM migration
-
Availability model of virtualized two nodes system : VMsub model
Up state
Down state
Host failure
Host failure
detected
VM is restarted
on other host
Host is repaired
VM is migrated
-
Input Parameters values in Markov submodels
For execution only, real value will be used
alpha_sp 2
lambda_cpu 2/8760
mu_cpu 2
lambda_mem 2/8760
mu_mem 2
lambda_net 2/8760
mu_net 2
mu2_net 1
lambda_pwr 2/8760
mu_pwr 2
mu2_pwr 1
lambda_coo 2/8760
lambda2_coo 2/8760
alpha_coo 1
mu2_coo 1
mu_coo 2
delta_vmm 1
b_vmm 0.9
beta_vmm 6
mu_vmm 1
lambda_vmm 1/2880
-
1) Compute MTTFeq and MTTReq of Markov submodels in the host fault subtree
Results from SHARPE execution
MTTFeq MTTReq
CPU 2.19050000e+003 5.00000000e-001
Memory 1.09550000e+003 5.00000000e-001
Network 2.19049994e+003 9.99942929e-001
Power 2.19000000e+003 4.99942929e-001
Cooler 4.38099977e+003 5.00399446e-001
VMM 2.88000000e+003 1.31666667e+000
-
2) Compute MTTFeq and MTTReq of host fault subtree model
MTTF equivalent : 3.49899881e+002
MTTR equivalent : 6.79629790e-001
They are used in VM sub Markov chain model in the next slides.
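The slides do not spell out how the host-level equivalents are obtained from the submodel results. One plausible reconstruction, sketched below, treats the host as a series (OR-gate) combination of independent submodels: the equivalent failure rate is the sum of the submodel failure rates, the host availability is the product of the submodel availabilities, and MTTReq follows from the two. This is an assumption made for this note, not a documented SHARPE procedure, although it does reproduce the two reported host values closely.

```python
# (MTTFeq, MTTReq) pairs copied from the submodel results table.
SUBMODELS = {
    "CPU":     (2.19050000e+003, 5.00000000e-001),
    "Memory":  (1.09550000e+003, 5.00000000e-001),
    "Network": (2.19049994e+003, 9.99942929e-001),
    "Power":   (2.19000000e+003, 4.99942929e-001),
    "Cooler":  (4.38099977e+003, 5.00399446e-001),
    "VMM":     (2.88000000e+003, 1.31666667e+000),
}


def series_equivalent(parts):
    """Equivalent (MTTF, MTTR) of independent components in series."""
    parts = list(parts)
    failure_rate = sum(1.0 / mttf for mttf, _ in parts)
    availability = 1.0
    for mttf, mttr in parts:
        availability *= mttf / (mttf + mttr)
    mttf_eq = 1.0 / failure_rate
    mttr_eq = mttf_eq * (1.0 - availability) / availability
    return mttf_eq, mttr_eq


if __name__ == "__main__":
    mttf_eq, mttr_eq = series_equivalent(SUBMODELS.values())
    print(f"MTTF equivalent : {mttf_eq:.6e}")
    print(f"MTTR equivalent : {mttr_eq:.6e}")
```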
-
Steady state availability of system fault tree
Steady state availability : 9.99770797e-001
Down time : 1.20469077e+002 (min)
-
Other examples index
Non-state space models
• Reliability Block Diagram (RBD): voice, IBM, NEC, CISCO
• Fault Tree: the same
• Reliability Graph
State space models
• Continuous Time Markov Chains (CTMC):
o Simple two components failure-recovery: Bluebook
o Other failure-recovery model: IBM, NEC-VDC
o Software rejuvenation
o Security model: SITAR
o UAV model
• Stochastic Petri nets/Stochastic Reward nets
o NEC-VDC, SITAR, NEC-warm rejuvenation, Disaster tolerance
-
Reliability models in practice
-
Availability models in practice
Expected interval availability
-
Stochastic Petri nets/SRN
-
[Figure: CTMC state diagram; the state and transition-rate labels did not survive extraction in readable form.]
Model with only 1st Host Failure (IEEE-TR)
-
[Figure: CTMC state diagram extended with second-host-failure states; the state and transition-rate labels did not survive extraction in readable form.]
Model with 2nd Host Failure
-
Sensitivity of MTTF_VMs
Parameter    Sensitivity (model with only 1st failure)    Sensitivity (model with 2nd failure)
lambda_h     -1.99866136e+00                              -1.99869163e+00
m_v           9.99931045e-01                               5.20949060e-02
lambda_a      1.04508774e-03                               1.04508774e-03
lambda_v      3.25669697e-05                               3.25669696e-05
r_v          -2.00199738e-05                              -2.24829226e-05
mu_v         -1.72413369e-05                              -1.72413369e-05
The order of parameters in the sensitivity ranking is the same for both models
Note: In the paper, the sensitivity values of some parameters were wrong due to a computation error that has since been detected and fixed. The fix only swaps the order of r_v and mu_v in the ranking (6th -> 5th, 5th -> 6th).
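The claim that both models rank the parameters identically can be verified by sorting on the absolute sensitivity values from the table. The sketch below copies those values; the dictionary and function names are hypothetical.

```python
# parameter: (model with only 1st failure, model with 2nd failure)
SENSITIVITIES = {
    "lambda_h": (-1.99866136e+00, -1.99869163e+00),
    "m_v":      (9.99931045e-01, 5.20949060e-02),
    "lambda_a": (1.04508774e-03, 1.04508774e-03),
    "lambda_v": (3.25669697e-05, 3.25669696e-05),
    "r_v":      (-2.00199738e-05, -2.24829226e-05),
    "mu_v":     (-1.72413369e-05, -1.72413369e-05),
}


def ranking(model_index: int):
    """Parameters ordered from most to least influential (by magnitude)."""
    return sorted(SENSITIVITIES,
                  key=lambda p: abs(SENSITIVITIES[p][model_index]),
                  reverse=True)


if __name__ == "__main__":
    print(ranking(0))
    print(ranking(1))
    print("same order:", ranking(0) == ranking(1))
```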
-
Unavailability and MTTF_VMs
Metric            Model with only 1st Failure    Model with 2nd Failure
Unavailability    3.96688083e-10                 4.01031499e-07
MTTF              3.85133268e+07                 2.00774300e+06
Unavailability is larger and MTTF is smaller in the second model
-
SRN models
Refer to Trivedi’s SPN lecture notes if necessary.
Open the SHARPE project file.
-
SPN model for WFS example
-
Modeling Practices using SHARPE
-
Performance models in practice
-
Performability models in practice
-
SPNP: Stochastic Petri Net Package.
Version 6.0
-
Outline - SPNP
Brief Intro. To SPNP
Download
Analytic Modeling using SPNP • Stochastic Petri nets/Stochastic Reward nets
• Others
-
SPNP
Well-known modeling tool (installed at over 180 sites;
companies & universities)
Ported to Most Architectures and Operating Systems
Used For Performance, Dependability and
Performability
Steady-State as well as Transient Analysis
Written in C Language
SPNP GUI supports animation
-
Stochastic Reward Nets
Bipartite graph with two kinds of nodes:
places and transitions
Timed and immediate transitions
Inhibitor arc, variable cardinality arc,
guard function, and priority
Reward rate based measures
Fluid stochastic Petri net: fluid places and
fluid arcs
-
Characteristics
Structural characteristics:
• Marking dependency
• Resampling policies, for general distributions
Stochastic characteristics:
• Allow definition of reward rates in terms of net level entities
• Automatically generate the reward rates for the markings
-
Characteristics (cont.)
Non-Markovian SPNs as well as Fluid Stochastic Petri Nets (FSPNs) can be described and solved
Besides the analytic-numeric solution of Markovian models, discrete-event simulation is now available
A user-friendly GUI is now available
-
Transition Type
Poisson
Binomial
Negative Binomial
2-stage Hyperexponential
3-stage Hypoexponential
Erlang
Pareto
Cauchy
Transition Type
Exponential
Deterministic(Constant)
Uniform(a,b)
Geometric(p)
Weibull(a,b)
Lognormal
Truncated Normal
Gamma
Beta
Available Distribution functions
-
Solution Techniques
SRN Model
Markovian Nets Non-Markovian Nets FSPNs
Extended Reachability Graph
(ERG)
Generate All Markings
(tangible + vanishing)
Generate Reward Rates
for Tangible Markings
make correspondence list
schedule transition events
schedule fluid events
change states
Compute statistics
and confidence interval
Discrete
Event
Simulation
(DES)
modify
clock
-
Steps in Analytic Numeric Analysis of Markovian SRN
1. Stochastic Reward Net → Extended Reachability Graph: generate all markings of the SRN by considering all the enabled transitions in each marking, and classify the markings as tangible or vanishing.
2. Extended Reachability Graph → Markov Reward Model: eliminate the vanishing markings.
3. Solve the MRM for steady-state or transient behavior using known methods.
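The first of these steps, generating all markings and classifying them, can be illustrated with a small breadth-first search. The sketch below is not SPNP code: the three-place net (two components that fail, pass through an immediate detection transition, and are repaired) is a hypothetical example, and elimination of vanishing markings and the numerical solution are left out.

```python
from collections import deque

PLACES = ("up", "detect", "down")
# name: (kind, input arcs, output arcs); all arcs have the multiplicity shown.
TRANSITIONS = {
    "fail":   ("timed",     {"up": 1},     {"detect": 1}),
    "notice": ("immediate", {"detect": 1}, {"down": 1}),
    "repair": ("timed",     {"down": 1},   {"up": 1}),
}


def enabled(marking):
    """Enabled transitions; immediate transitions pre-empt timed ones."""
    tokens = dict(zip(PLACES, marking))
    names = [n for n, (_, ins, _) in TRANSITIONS.items()
             if all(tokens[p] >= k for p, k in ins.items())]
    immediate = [n for n in names if TRANSITIONS[n][0] == "immediate"]
    return immediate or names


def fire(marking, name):
    """Marking reached by firing one enabled transition."""
    tokens = dict(zip(PLACES, marking))
    _, ins, outs = TRANSITIONS[name]
    for p, k in ins.items():
        tokens[p] -= k
    for p, k in outs.items():
        tokens[p] += k
    return tuple(tokens[p] for p in PLACES)


def reachability_graph(initial):
    """Breadth-first generation of all reachable markings and edges."""
    seen, edges, queue = {initial}, [], deque([initial])
    while queue:
        marking = queue.popleft()
        for name in enabled(marking):
            nxt = fire(marking, name)
            edges.append((marking, name, nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, edges


def is_vanishing(marking):
    """A marking is vanishing if some immediate transition is enabled in it."""
    return any(TRANSITIONS[n][0] == "immediate" for n in enabled(marking))


if __name__ == "__main__":
    markings, edges = reachability_graph((2, 0, 0))
    for m in sorted(markings, reverse=True):
        print(m, "vanishing" if is_vanishing(m) else "tangible")
```

Only the tangible markings become states of the underlying Markov reward model; the vanishing ones take zero time and are removed in the next step.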
-
Discrete Event Simulation
FSPN (Fluid Stochastic Petri net)
Used as a model for: • Systems involving fluid variables
• Approximation of models with a large number of tokens
No need to generate the reachability graph
Possibility to give the number of replications or the desired relative error.
-
SRN models in SPNP
Some examples
-
References
K. Trivedi’s lecture notes on SHARPE demonstration
SHARPE portal
SPNP site at K. Trivedi’s homepage.