Hierarchical Model-based Autonomic Control of Software Systems
Marin Litoiu, IBM CAS Toronto; Murray Woodside and Tao Zheng, Carleton University
IBM Centre for Advanced Studies, Toronto
© 2005 IBM Corporation. DEAS 05, St. Louis, May 21, 2005
Outline
Motivation
Hierarchical Control
Performance Models
Conclusions
A Typical Deployment: Data Centres
[Diagram: clients connect through a web server to an application server and a data server; two such tiers make up the data centre. SLAs attach to application components.]
Self-optimization

Automated allocation of software and hardware resources to accomplish a performance goal by optimizing a cost function in the presence of
– Workload variations
– Perturbations
– Changes in the environment
Aims at
– Reducing the cost of ownership
– Improving the QoS (dependability)

[Chart: response time over time under a varying workload.]
Outline
Motivation
Hierarchical Control
Performance Models
Conclusions
Hierarchical Control (1)

[Diagram: a three-level hierarchy of autonomic components. Each component wraps a managed resource exposed as (Web) services in the standard autonomic loop (sensors, monitor, management unit, effectors) and pairs a Model Builder with a controller and its model, driven by goals (SLAs/SLOs):
– Level 1: Component tuning (Tuning Manager / Component Controller, C model)
– Level 2: Application tuning (Load Balance Manager / Application Controller, A model)
– Level 3: Provisioning (Provisioning Controller, P model)
A Load Monitor feeds the hierarchy; the components compose into an autonomic application and, in turn, an autonomic system.]
Hierarchical Control (2)
Hierarchical Control (3)

The controller
– Monitors the managed component's performance metrics, the input workload, and the setpoints (SLOs)
– Uses the performance model to estimate future metrics and future adjustments of controlled parameters
– If the future workload cannot be accommodated by local adjustments, alerts the upper level; otherwise performs the local adjustments
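The decision logic above can be sketched as one control interval; the model interface (`predict`, `best_local_adjustment`) and parameter names are illustrative assumptions, not the actual controller API.

```python
# A minimal sketch of one control interval at a given level of the
# hierarchy, assuming a hypothetical model object; these names are
# illustrative, not the tuning-manager interface from the slides.

def control_step(workload, slo, model, apply_adjustment, alert_upper_level):
    """Monitor, predict with the model, adjust locally or escalate."""
    predicted = model.predict(workload)           # estimate future metric
    if predicted <= slo:
        return "ok"                               # setpoint met
    adjustment = model.best_local_adjustment(workload, slo)
    if adjustment is not None:
        apply_adjustment(adjustment)              # local adjustment suffices
        return "adjusted"
    alert_upper_level(workload)                   # cannot accommodate locally
    return "escalated"
```

The escalation branch is what gives the hierarchy its homeostatic levers: each level handles what it can and delegates the rest upward.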
Hierarchical Control (4): Advantages

The targeted systems are hierarchical
– Structure is hierarchical (containment)
– Goals are hierarchical
– Authority is hierarchical
Provides several homeostatic control levers: if the 1st fails, engage the 2nd; if the 2nd fails, engage the 3rd…
Solves time-scale and scope issues
Reduces cognition, control, and communication complexity

"When in doubt, mumble; when in trouble, delegate; when in charge, ponder."
Agenda
Motivation
Hierarchical Control
Performance Models
Conclusions
The Role of Performance Models
"All models are wrong; some models are useful."
Prediction (forecast) role: tell what will happen in the future
– If the workload increases by 100 users, the response time will increase by 5 s
Estimation role: tell what happens now
– If I increase the number of threads or servers, or alter the software architecture, what is the estimated change in performance?
Problem determination role
– Where is the bottleneck?
Queuing Network Models
MVA recursion over the K devices:

R(N) = Σ_{i=1..K} D_i [1 + Q_i(N-1)]
X = N / R(N)
U_i = X · D_i

D_i = service demand at device i
X = throughput
U_i = utilization of device i
Q_i = queue length at device i
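The recursion runs directly as exact MVA; a sketch, with illustrative service demands (5, 20, and 10 ms) rather than measurements from the auction application:

```python
# Exact Mean Value Analysis for a closed queueing network with K devices.
# The demand values below are illustrative, not measured.

def mva(demands, n_users):
    """Return (R, X, utilizations) for n_users customers."""
    K = len(demands)
    Q = [0.0] * K                          # queue lengths at N = 0
    R = X = 0.0
    for n in range(1, n_users + 1):
        # residence time at device i: R_i = D_i * (1 + Q_i(N-1))
        Ri = [demands[i] * (1.0 + Q[i]) for i in range(K)]
        R = sum(Ri)                        # total response time R(N)
        X = n / R                          # throughput X = N / R(N)
        Q = [X * Ri[i] for i in range(K)]  # Q_i(N) = X * R_i(N)
    U = [X * d for d in demands]           # U_i = X * D_i
    return R, X, U

# web, app, and data server demands in seconds, 100 clients
R, X, U = mva([0.005, 0.020, 0.010], 100)
```

As the population grows, throughput saturates at 1/D_max and the bottleneck device approaches full utilization, which is the non-linearity the measured-vs-model chart below exhibits.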
[Chart: Scalability of Auction Application: response time [ms] vs. number of clients (1 to 100), measured vs. model, for the client, web server, app server, data server deployment.]
Layered Queuing Models (LQM)
Layered Queuing Models (LQM) are analytic performance models that
– Extend Queuing Network Models (QNMs)
– Model queuing at software components: threading and data connection pools, locks and critical sections
– Model multiple classes of requests
LQM structure
– Software resource interactions: synchronous, asynchronous, forward call
– Demands at hardware resources for each class of request; one user per class in the system
– Queuing centers: CPU, disk, network, threading and data connection pools…

[Diagram: clients (Layer 0) call the web server, app server, and data server (Layer 1); each node has its own CPU and disk.]
Linearized Dynamic Models
Linear models, identified by regression:
x(k) = A x(k-1) + B u(k)
y(k) = C x(k) + D u(k)
x, u, y are vectors; A, B, C, D are matrices
Consider multiple input, output, and state variables
Take the system's tendencies into account
Advantages
– Take advantage of controller design techniques from system control
– Capture the transient behaviour
Disadvantages
– Assume the system is linear
– The matrices A, B, C, D are identified experimentally
– Any change in the system invalidates the model
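A toy simulation of the state-space recurrence above, in the scalar case (one state, one input, one output); the matrix values are illustrative, not identified from any system.

```python
# Simulate x(k) = A x(k-1) + B u(k), y(k) = C x(k) + D u(k) with scalar
# "matrices". Values are illustrative, not experimentally identified.

def simulate(A, B, C, D, x0, inputs):
    """Return the output sequence y(1..n) for the given input sequence."""
    x, ys = x0, []
    for u in inputs:
        x = A * x + B * u            # state update
        ys.append(C * x + D * u)     # output equation
    return ys

# step response of a stable first-order system (|A| < 1)
y = simulate(A=0.8, B=0.2, C=1.0, D=0.0, x0=0.0, inputs=[1.0] * 50)
```

For a constant input the state converges to B/(1 - A) when |A| < 1; this transient approach to a new steady state is exactly what the static queueing formulas cannot capture.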
[Chart: Scalability of Auction Application: response time [ms] vs. number of clients (1 to 100), measured vs. a linear model R(N) = C·N.]
Threshold and Policy Models
Decision is part of the model
– Monitor an output variable (utilization, queue length, …)
– Increase/decrease a state variable
– "If the number of users increases by 10, increase the number of threads by 5"
– "If the utilization is greater than 50%, then provision a new server"
More complex models
– Monitor more output variables
– Increase/decrease one output variable
Advantages
– Very simple
– Very fast
Disadvantages
– Not globally optimal, sometimes not even locally optimal
– Not able to handle changes in the system
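A sketch of a threshold policy like the one quoted above; the 50% upper threshold is from the slide, while the lower threshold and pool bounds are illustrative assumptions.

```python
# One threshold decision over a server pool. The 50% upper threshold is
# the slide's rule; lower threshold and bounds are illustrative.

def adjust_pool(pool_size, utilization, upper=0.50, lower=0.20,
                min_size=1, max_size=10):
    """Return the new pool size after one threshold decision."""
    if utilization > upper and pool_size < max_size:
        return pool_size + 1      # over-utilized: provision a new server
    if utilization < lower and pool_size > min_size:
        return pool_size - 1      # under-utilized: release a server
    return pool_size              # inside the dead band: do nothing
```

The gap between `lower` and `upper` acts as a dead band that prevents oscillation; its placement is exactly the hand-tuning that makes such rules simple and fast but not globally optimal.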
[Diagram: a threshold policy over output measures y_1 and y_2: add a unit to the pool when the measure falls below a threshold M, remove a unit when it crosses the opposite threshold.]
Tradeoffs

Layered Queuing Models
– Model non-linearities across wide domains
– Appropriate at the application and system level
– Work with mean values
– Difficult to obtain service times for individual transactions (Kalman filters seem to help)
Dynamic Models
– Model transient behaviour
– Appropriate at the component level
– For mean values, can be deduced from LQMs
– In general, built experimentally (hard)
– Enable system control techniques
Threshold and Policy Models
– Can capture rules of thumb and domain experience
– Can be deduced from the queuing or dynamic models
– Appropriate at any level, as a default model
– Simple and fast
– Hard to maintain…
Conclusions

Hierarchical control
– Solves the problem of timescale and scope
– Provides flexibility in the choice of control algorithms
– Supports accelerated decision making by LQMs at the upper levels
Self-optimization = component tuning + application tuning (load balancing) + provisioning
Models for self-optimization
– Threshold or policy-based models
– Linearized dynamic models
– Network or layered queuing models
Further work
– End-to-end evaluation
– Optimization
– Stability
Backup slides
Model Builder

[Diagram: the Monitor feeds measured performance z into a Filter, which updates the System Model. x, P: new parameters and covariances; H: sensitivities; y: predicted performance; e: prediction error.]

Prediction:
x̂_old = previous estimate of x
ŷ = prediction of the observation y, based on x̂_old and the model

Feedback:
z = new observation vector
x̂_new = x̂_old + K (z - ŷ)
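The feedback step x̂_new = x̂_old + K (z - ŷ) can be sketched for a single parameter; the identity observation model and the fixed gain below are illustrative stand-ins, not the filter used for LQM calibration.

```python
# Scalar sketch of the model-builder feedback loop: predict the
# observation from the current parameter estimate, then correct the
# estimate with the gain K. Model and gain are illustrative.

def kalman_step(x_old, z, predict_y, gain):
    """Return x_new = x_old + K * (z - y_hat)."""
    y_hat = predict_y(x_old)      # predicted performance
    e = z - y_hat                 # prediction error
    return x_old + gain * e       # corrected parameter estimate

# track a parameter whose observation model is the identity
x = 0.0
for z in [2.0] * 20:              # repeated noiseless observations
    x = kalman_step(x, z, predict_y=lambda p: p, gain=0.5)
```

In the full filter the gain K is computed from the covariances P and sensitivities H shown in the diagram; a fixed gain suffices to show the convergence behaviour.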
Dynamic Models Link AC to System Control

1789: Watt's fly-ball governor
1868: Maxwell's "On Governors"
1931: Black and Nyquist's electronic negative feedback amplifier
1952: Bellman's optimal control (dynamic programming)
1960: Kalman filter
1970s: time-sharing OS
1978: Cerf et al., TCP/IP
2001: Autonomic Computing (AC)

Experimental AC, Classic AC, Modern AC

[Diagram: Watt's steam-valve governor as a feedback loop, bridging Automatic Control and Autonomic Computing.]
Solvers for LQMs
Algorithms
– Layers of QNMs
– The output of a lower level is used as input for the upper level
– Continue until a fixed point is reached
Public solvers
– Method of Layers (Rolia, Sevcik)
  • Based on the Linearizer approximation algorithm
– LQNS (Woodside)
  • Based on approximate MVA
– APERA (Litoiu)
  • Based on approximate MVA, two layers; available on alphaWorks

[Diagram: Layer 2 over Layer 1 over Layer 0 (hardware).]
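The layered iteration above is a fixed-point computation; a generic sketch, with a toy contraction standing in for a real layer submodel:

```python
# Generic fixed-point iteration, as used to couple the per-layer QNM
# solutions: re-solve until the outputs stop changing. The example
# function is a toy contraction, not a real layer submodel.

def fixed_point(f, x0, tol=1e-9, max_iter=1000):
    """Iterate x = f(x) until successive values agree within tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# toy coupling: a software server's effective service time grows with
# the contention its own load induces at the layer below
s = fixed_point(lambda s: 0.1 + 0.4 * s, x0=0.1)
```

Convergence is guaranteed here because the map is contractive; the real solvers rely on the same behaviour of the approximate MVA layer solutions.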
LQMs: Predicting the Application Response Time

[Chart: response time R (ms), up to 10,000 ms, for five workload mixes of 800 users, each concentrating 796 users in a different class: N = (796,1,1,1,1), (1,796,1,1,1), (1,1,796,1,1), (1,1,1,796,1), (1,1,1,1,796).]

• Depending on the class mix, response time varies widely even when the number of users is constant (800)
• Each class may reach its maximum response time for a different workload mix
• Workload mixes produce changes in the bottlenecks
LQMs: Predicting the Threading Level in WAS* (100 clients)

[Chart: monitored number of threads with HTTP Connection: Keep-Alive, monitored number of threads with HTTP Connection: close, and the estimated average number of threads.]

* From Litoiu M., "Migrating to Web services: a performance engineering approach," Journal of Software Maintenance and Evolution: Research and Practice, No. 16, pp. 51–70, 2004.
Component Tuning

[Diagram: the autonomic loop at the component level: sensors feed a Monitor; the Controller (Analyze, Optimize, with a Model) drives Execute through effectors on the component; y is the observation, u the control, z the disturbance; the setpoint comes from the application-tuning level.]

Example control parameters
– Threading level
– Admission control
– Scheduling
– Session length
– Live versus closed connections
Specialized controllers, specialized decision making
How general can you be at the component level?
How can one systematically inject variability at this level?
Application Tuning

[Diagram: the same autonomic loop at the application level, managing the application's software components; the setpoint comes from the provisioning level.]

– Load balancing
– Horizontal scaling
– Vertical scaling
Inter-component optimization
– DB2 connection pool in WAS
– Component allocation
– Scheduling
Controllers and models
– Become more complex
– Become more general
Provisioning

[Diagram: the autonomic loop at the provisioning level, managing the applications and driven by the SLA.]

Hardware and software provisioning
– Add/remove hardware
– Add/remove software components
– Needs long-term prediction
Building Accurate Models
"If the map and the terrain don't match, trust the terrain." (Swiss Army rule)
THANKS!