RESISTing Reliability Degradation through Proactive Reconfiguration
D. Cooray, S. Malek, R. Roshandel, and D. Kilgore
Summarized by Haoliang Wang
September 28, 2015
Motivation
An emerging class of systems: situated software systems
◦ Predominantly pervasive, embedded, and mobile
◦ The software is subject to dynamic contextual changes
◦ Most applications, such as emergency response, are mission-critical, so reliability matters
Reliability analysis at design time is insufficient
◦ System reliability (and other QoS attributes) depends on runtime characteristics
◦ Adaptation at runtime is necessary
Adaptation using a reactive approach
◦ Adapts to changes only after degradation has occurred, which is not good enough
◦ Prediction-based proactive adaptation is preferred
Challenges
§ Proactively reconfigure the system before performance degradation
§ Effectively estimate the reliability of a complex system at runtime
§ Determine the optimal system architecture at runtime
RESIST Framework
Resilient Situated Software System
◦ Component-level Reliability Analyzer
◦ Configuration Reliability Analyzer
◦ Configuration Selector
Context-Aware Middleware
◦ Provides support for execution, monitoring, and adaptation of a software system
RESIST Framework (Cont.)
RESIST is a Goal Management layer solution in the three-layer architectural model for self-managed systems
RESIST Framework (Cont.)
System Model
◦ The system is divided into several functional components, each with its own reliability
◦ Each component is allocated to a process
◦ The system reliability is determined by the architecture, the individual components, and the context
Failure Model
◦ Fail-stop: failures are detectable by middleware facilities
◦ Component failure: effects are contained within the boundary of the component
◦ Process failure: occurs when one of its components exits prematurely; the other components running in that process also fail
Component-level Analysis
Discrete Time Markov Chain (DTMC)
◦ Used to estimate the component reliability
◦ A stochastic process with a set of states S = {S1, S2, S3, …, SN}
◦ Transition matrix A = {aij}, where aij is the probability of transitioning from Si to Sj
◦ Reliability of the component is computed by solving for the steady-state probability of not being in any failure state
How to derive the transition matrix A?
Component-level Analysis (Cont.)
Hidden Markov Models (HMMs)
◦ Learn from runtime data and estimate the transition probability matrix
◦ A stochastic process with a set of states S = {S1, S2, S3, …, SN}
◦ Transition matrix A = {aij}, where aij is the probability of transitioning from Si to Sj
◦ A set of observations O = {O1, O2, O3, …, OM}
◦ Observation matrix E = {eik}, where eik is the probability of observing event Ok in state Si
The Baum-Welch algorithm is used to train the HMM and obtain the converged transition matrix A
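The training step can be sketched in plain Python. This is a minimal illustration of Baum-Welch re-estimation for a discrete HMM with a single observation sequence, not the authors' MATLAB implementation. The function name `baum_welch`, the random initialization, the fixed iteration count, and the synthetic observation sequence are all assumptions made for the example.

```python
import math
import random

def baum_welch(obs, n_states, n_symbols, iterations=30, seed=0):
    """Fit a discrete HMM to one observation sequence.

    Returns (pi, A, E, history), where history holds the log-likelihood
    measured at the start of each iteration.
    """
    rng = random.Random(seed)

    def random_row(n):
        # Random strictly positive row, normalized to sum to 1.
        weights = [rng.random() + 0.1 for _ in range(n)]
        total = sum(weights)
        return [w / total for w in weights]

    pi = random_row(n_states)
    A = [random_row(n_states) for _ in range(n_states)]
    E = [random_row(n_symbols) for _ in range(n_states)]
    S, T = range(n_states), len(obs)
    history = []

    for _ in range(iterations):
        # Forward pass, rescaled at every step to avoid numerical underflow.
        alpha, scale = [], []
        for t in range(T):
            if t == 0:
                row = [pi[i] * E[i][obs[0]] for i in S]
            else:
                row = [sum(alpha[t - 1][i] * A[i][j] for i in S) * E[j][obs[t]]
                       for j in S]
            c = sum(row)
            scale.append(c)
            alpha.append([v / c for v in row])
        history.append(sum(math.log(c) for c in scale))

        # Backward pass, reusing the forward scaling factors.
        beta = [[1.0] * n_states for _ in range(T)]
        for t in range(T - 2, -1, -1):
            beta[t] = [sum(A[i][j] * E[j][obs[t + 1]] * beta[t + 1][j] for j in S)
                       / scale[t + 1] for i in S]

        # gamma[t][i]: probability of state i at time t given all observations.
        gamma = [[alpha[t][i] * beta[t][i] for i in S] for t in range(T)]

        # Re-estimate the transition matrix from expected transition counts.
        new_A = []
        for i in S:
            denom = sum(gamma[t][i] for t in range(T - 1))
            new_A.append([
                sum(alpha[t][i] * A[i][j] * E[j][obs[t + 1]] * beta[t + 1][j]
                    / scale[t + 1] for t in range(T - 1)) / denom
                for j in S])

        # Re-estimate the observation matrix and the initial distribution.
        new_E = []
        for i in S:
            denom = sum(gamma[t][i] for t in range(T))
            new_E.append([sum(gamma[t][i] for t in range(T) if obs[t] == k) / denom
                          for k in range(n_symbols)])
        pi, A, E = gamma[0], new_A, new_E

    return pi, A, E, history

# Synthetic observation sequence (assumption: events are encoded as 0, 1, 2).
_rng = random.Random(1)
observations = [_rng.randrange(3) for _ in range(80)]
pi, A, E, history = baum_welch(observations, n_states=2, n_symbols=3)
```

Each row of the returned `A` is a probability distribution over next states, and the log-likelihood in `history` should not decrease from one iteration to the next, which is the usual convergence check for this kind of expectation-maximization procedure.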
Component-level Analysis (Cont.)
An example of estimating component reliability
◦ A robot controller behavior model
◦ States S = {idle, estimating, planning, moving, failed}
◦ Running the Baum-Welch algorithm on the observation sequence yields the transition matrix A
◦ Solve for the steady-state probability vector [0.1966, 0.2238, 0.3849, 0.1914, 0.0033]
◦ Controller component reliability is 1 - 0.0033 = 99.67%
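The steady-state step in this example can be sketched as follows. The transition matrix from the slides is not reproduced in this transcript, so the matrix below is hypothetical and will not yield the vector quoted above; the function name `steady_state` and the use of power iteration are likewise assumptions for illustration.

```python
def steady_state(A, tol=1e-12, max_iter=100_000):
    """Approximate the stationary distribution pi with pi = pi * A by power iteration."""
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")

STATES = ["idle", "estimating", "planning", "moving", "failed"]

# Hypothetical transition matrix (rows sum to 1). The failed state is assumed
# to recover to idle so that the chain has a unique stationary distribution.
A = [
    [0.20, 0.79, 0.00, 0.00, 0.01],  # idle
    [0.00, 0.10, 0.89, 0.00, 0.01],  # estimating
    [0.00, 0.00, 0.50, 0.49, 0.01],  # planning
    [0.30, 0.29, 0.00, 0.40, 0.01],  # moving
    [1.00, 0.00, 0.00, 0.00, 0.00],  # failed
]

pi = steady_state(A)
# Reliability is the steady-state probability of not being in the failed state.
reliability = 1.0 - pi[STATES.index("failed")]
```

Power iteration is used here only because it needs no external libraries; solving the linear system pi (A - I) = 0 together with the normalization constraint would give the same vector.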
Component-level Analysis (Cont.)
Estimate the near future by incorporating the context
◦ Define a set of contextual parameters C = {C1, C2, …, Cx}
◦ If akj is a transition probability from state Sk to state Sj in matrix A that is affected by changes in a specific contextual parameter Cn, then a’kj = μ(akj, ΔCn), where μ is a context-specific function quantifying the impact of the contextual change on the transition probability
◦ The remaining transition probabilities in the row are adjusted proportionately such that a’kj + akf + Σa’km = 1
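A small sketch of the row adjustment described above. The linear form of `mu`, its `sensitivity` parameter, and the example row are hypothetical; the slides only state that μ is context-specific. The sketch also assumes that the unprimed term akf is the transition into the failure state and is held fixed while the other entries are rescaled.

```python
def mu(a_kj, delta_c, sensitivity=0.5):
    """Hypothetical context function: linear in the contextual change, clamped to [0, 1]."""
    return min(1.0, max(0.0, a_kj + sensitivity * delta_c))

def adjust_row(row, j, f, new_akj):
    """Set row[j] to new_akj, keep row[f] unchanged, and rescale the remaining
    entries proportionally so that the row still sums to 1."""
    others = [m for m in range(len(row)) if m not in (j, f)]
    old_mass = sum(row[m] for m in others)
    new_mass = 1.0 - new_akj - row[f]
    if new_mass < 0.0:
        raise ValueError("adjusted probability leaves no mass for the other transitions")
    if old_mass == 0.0 and new_mass > 0.0:
        raise ValueError("no remaining transitions to absorb the leftover mass")
    adjusted = list(row)
    adjusted[j] = new_akj
    for m in others:
        adjusted[m] = row[m] * new_mass / old_mass if old_mass > 0.0 else 0.0
    return adjusted

# Hypothetical 'moving' row over (idle, estimating, planning, moving, failed).
moving_row = [0.30, 0.29, 0.00, 0.40, 0.01]
# A contextual change of +0.2 raises the moving -> estimating probability.
adjusted = adjust_row(moving_row, j=1, f=4, new_akj=mu(moving_row[1], 0.2))
```

Rescaling by the ratio of new to old remaining mass preserves the relative proportions of the unaffected transitions, which is one straightforward reading of "adjusted proportionately".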
Configuration-level Analysis
Markov-based system-level reliability estimation
◦ System reliability is estimated compositionally from the reliability of individual components
◦ Map the components and the interactions between them into a DTMC, where a state is one or more components in concurrent execution
◦ System reliability is computed as

$$R = (-1)^{k+1} R_k \frac{|E|}{|I - M|}$$

where $M$ is a $k \times k$ matrix whose elements are

$$M(i, j) = \begin{cases} R_i P_{ij}, & s_i \text{ reaches } s_j \text{ and } i \neq k \\ 0, & \text{otherwise} \end{cases}$$

where $R_i$ is the reliability of state $s_i$, $P_{ij}$ is the transition probability from $s_i$ to $s_j$, and $|E|$ is the determinant of the matrix that remains after excluding the last row and the first column of $(I - M)$
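The formula above can be evaluated without computing determinants explicitly: by Cramer's rule, the ratio $(-1)^{k+1}|E| / |I - M|$ is the entry in the first row and last column of $(I - M)^{-1}$, so it can be read off the solution of a linear system. The sketch below takes that route in plain Python; the function names and the three-state example are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def system_reliability(R, P):
    """R[i]: reliability of state s_i; P[i][j]: transition probability s_i -> s_j.

    The first state is the entry state and the last state (index k-1) is the
    final state, whose row in M is zero.
    """
    k = len(R)
    M = [[R[i] * P[i][j] if i != k - 1 else 0.0 for j in range(k)]
         for i in range(k)]
    i_minus_m = [[(1.0 if i == j else 0.0) - M[i][j] for j in range(k)]
                 for i in range(k)]
    unit_last = [0.0] * (k - 1) + [1.0]
    # x[0] equals the (first row, last column) entry of (I - M)^-1.
    x = solve(i_minus_m, unit_last)
    return x[0] * R[k - 1]

# Hypothetical pipeline s1 -> s2 -> s3: every run must pass through all states.
R = [0.90, 0.80, 0.95]
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
pipeline = system_reliability(R, P)

# Hypothetical branch: s1 goes to s2 or directly to s3 with equal probability.
P_branch = [[0.0, 0.5, 0.5],
            [0.0, 0.0, 1.0],
            [0.0, 0.0, 0.0]]
branch = system_reliability(R, P_branch)
```

For the pure pipeline the result reduces to the product of the three state reliabilities, which gives a quick sanity check on the model.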
Configuration-level Analysis (Cont.)
An example of estimating system reliability
◦ Suppose the initial component reliabilities of the Controller and the Navigator are C = 0.9967 and N = 0.9751, and assume the other components are 100% reliable
◦ Based on the observed data, we can obtain the transition probability for each state and therefore M
◦ Solving the model yields a system reliability of 93.85%
Configuration-level Analysis (Cont.)
Impact of architectural style
◦ E.g., replicating components to improve system reliability
Configuration-level Analysis (Cont.)
Impact of deployment architecture
◦ E.g., reallocating components to different processes to improve system reliability
Configuration Selection
Configuration selection as an optimization problem
◦ The optimal configuration in RESIST is defined as one that satisfies the system’s reliability requirement while improving other quality attributes of concern
◦ In other words, given the decision variables
  $p_i \in \mathbb{Z}^+$, the number of replicas for component $i$
  $x_{ij} \in \{0, 1\}$, indicating whether component $i$ is placed on process $j$
the objective is to find an architectural configuration $C^*$ such that

$$C^* = \arg\max_{C} \sum_{\forall q \in \text{Quality Objectives}} U_q(C)$$

$$\text{s.t.} \quad \forall i \in \{1, \dots, t\}: \; p_i \le w_i, \; w_i \in \mathbb{Z}^+$$

$$\forall i \in \{1, \dots, t\}: \; \sum_{j=1}^{m} x_{ij} = 1$$

$$R(C) \ge \delta, \quad \delta \in \mathbb{R}, \; 0 < \delta \le 1$$

where $U_q$ is a utility function indicating the preference for quality attribute $q$, $R(C)$ is the expected reliability of a given architecture $C$, $t$ is the number of components, and $m$ is the number of processes
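To make the optimization problem concrete, here is a minimal exhaustive-search sketch. It is not the selection algorithm used by RESIST; the function `select_configuration`, the toy reliability model, and the single utility function are all hypothetical. A placement is encoded as one process index per component, which satisfies the "exactly one process per component" constraint by construction, and replica counts range from 0 (no extra copies, an assumption) up to a uniform bound.

```python
from itertools import product

def select_configuration(n_components, n_processes, max_replicas,
                         utilities, reliability, delta):
    """Return the feasible configuration with the highest total utility.

    A configuration is a pair (placement, replicas): placement[i] is the
    process hosting component i, replicas[i] its number of extra replicas.
    """
    best, best_score = None, float("-inf")
    for placement in product(range(n_processes), repeat=n_components):
        for replicas in product(range(max_replicas + 1), repeat=n_components):
            config = (placement, replicas)
            # Reliability requirement R(C) >= delta acts as a hard constraint.
            if reliability(config) < delta:
                continue
            score = sum(u(config) for u in utilities)
            if score > best_score:
                best, best_score = config, score
    return best, best_score

BASE_RELIABILITY = [0.95, 0.99]  # hypothetical per-component reliabilities

def toy_reliability(config):
    """Toy model: components in series, replicas in parallel, placement ignored."""
    _, replicas = config
    result = 1.0
    for r, p in zip(BASE_RELIABILITY, replicas):
        result *= 1.0 - (1.0 - r) ** (1 + p)
    return result

def fewer_replicas(config):
    """Hypothetical utility: prefer configurations that use fewer replicas."""
    return -sum(config[1])

best, score = select_configuration(
    n_components=2, n_processes=2, max_replicas=2,
    utilities=[fewer_replicas], reliability=toy_reliability, delta=0.98)
```

In this toy setting the search settles on one extra replica for the less reliable component, since that is the cheapest way to reach the 0.98 threshold.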
Configuration Selection (Cont.)
Configuration reliability R(C)
◦ Assume a component may either be replicated or share a process with other components. This is expressed with a binary variable $q_i$, where $q_i = 1$ if the $i^{th}$ component shares a process and $q_i = 0$ otherwise:

$$q_i = 1 - \sum_{j=1}^{m} x_{ij} \prod_{k \neq i}^{t} (1 - x_{kj})$$

◦ Thus, the effective reliability of component $i$ is

$$r_i^{eff} = q_i \, r_i^{share} + (1 - q_i) \, r_i^{rep}$$

where

$$r_i^{share} = \sum_{j=1}^{m} r_i x_{ij} \prod_{k \neq i}^{t} \left[ r_k x_{kj} + (1 - x_{kj}) \right]$$

$$r_i^{rep} = 1 - (1 - r_i)^{1 + p_i}$$

with $t$ the number of components and $m$ the number of processes
◦ Finally, the system reliability can be computed as specified in the configuration-level analysis
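The effective-reliability formulas translate almost directly into code. The sketch below is an illustration under the same definitions (`r[i]` is the base reliability of component i, `x[i][j]` is 1 if component i runs on process j, `p[i]` is its replica count); the function name and the example deployment are hypothetical.

```python
def effective_reliability(r, x, p):
    """Return the effective reliability of every component."""
    t, m = len(r), len(x[0])
    result = []
    for i in range(t):
        alone = 0    # becomes 1 if component i has its process to itself
        share = 0.0  # joint reliability of everything on component i's process
        for j in range(m):
            others_absent = 1
            group = 1.0
            for k in range(t):
                if k != i:
                    others_absent *= 1 - x[k][j]
                    group *= r[k] * x[k][j] + (1 - x[k][j])
            alone += x[i][j] * others_absent
            share += r[i] * x[i][j] * group
        q = 1 - alone  # q = 1 when the component shares a process
        rep = 1.0 - (1.0 - r[i]) ** (1 + p[i])
        result.append(q * share + (1 - q) * rep)
    return result

# Hypothetical deployment: components 0 and 1 share process 0,
# component 2 runs alone on process 1 with one replica.
r = [0.90, 0.80, 0.95]
x = [[1, 0],
     [1, 0],
     [0, 1]]
p = [0, 0, 1]
eff = effective_reliability(r, x, p)
```

Components that share a process inherit the joint reliability of that process, which reflects the failure model in which a process failure takes down every component running in it.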
Configuration Selection (Cont.)
Time-complexity analysis
◦ Suppose we have
  P = number of processes
  C = number of components
  N = maximum number of replicas
◦ This implies that there are
  $O(P^C)$ ways of allocating components to processes
  $O(N^C)$ ways of replicating components
◦ Therefore, the total number of possible configurations is $O((NP)^C)$, which grows exponentially with the number of components (NP problem)
However, the solution space may be significantly pruned by imposing architectural constraints
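The size of the search space, and the effect of a constraint on it, can be checked by brute-force enumeration for small inputs. The function `count_configurations` and the example constraint are hypothetical; replica counts are taken from 1 to N so that each component has exactly N choices.

```python
from itertools import product

def count_configurations(C, P, N, allowed=None):
    """Enumerate all (placement, replicas) pairs and count those passing `allowed`."""
    total = kept = 0
    for placement in product(range(P), repeat=C):
        for replicas in product(range(1, N + 1), repeat=C):
            total += 1
            if allowed is None or allowed(placement, replicas):
                kept += 1
    return total, kept

def separate_first_two(placement, replicas):
    """Hypothetical architectural constraint: components 0 and 1 must not share a process."""
    return placement[0] != placement[1]

total, kept = count_configurations(C=3, P=2, N=2, allowed=separate_first_two)
```

With 3 components, 2 processes, and up to 2 replicas, the enumeration visits (2 * 2)^3 = 64 configurations, and this single constraint already removes half of them.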
Evaluation
Implementation
◦ Mobile emergency response system prototype
◦ XTEAM is used to control the system’s operational profile
◦ Prism-XM is used to gather the runtime data
◦ MATLAB is used to generate and solve the HMM
Evaluation Criteria
◦ Validity of reliability predictions
◦ Effectiveness of proactive reconfiguration
◦ Performance overhead
Evaluation (Cont.)
Validity of Reliability Prediction
◦ Bump Probability is used as the contextual parameter; it affects the transition probability from the moving state to the estimating state
Evaluation (Cont.)
Proactive Reconfiguration
Evaluation (Cont.)
Overhead of Component Reliability Analysis
Summary
RESIST is a framework that maintains the reliability of a situated software system through proactive reconfiguration of the software architecture
Three major components
◦ Component reliability analysis
◦ Configuration reliability analysis
◦ Configuration selector
Three key contributions
◦ Incorporation of multiple sources of information, particularly contextual information
◦ Automatic discovery of the optimal architectural configuration
◦ Proactive adaptation of the system before its reliability degrades